What it is
Where MCP standardises how a single agent calls tools, agent frameworks orchestrate many agents talking to each other. LangGraph (LangChain) models agent flows as state machines on a graph. AutoGen (Microsoft) and its fork AG2 model multi-agent conversations with explicit role assignments. CrewAI provides the high-level team abstraction (“Researcher / Planner / Critic / Writer”). MetaGPT ships the same idea as a simulated software team. Together, they are how teams ship the kind of system Anthropic’s CEO calls “a virtual coworker.”
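To make “state machine on a graph” concrete, here is a minimal sketch in plain Python. It does not use LangGraph’s actual API: the State class, the node functions, and the NODES and EDGES tables are hypothetical names chosen for illustration. Nodes are agent roles that update a shared state, and edges (including one conditional edge) choose the next node until the graph reaches its end.

```python
# Hypothetical illustration of an agent graph; this is NOT LangGraph's API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    """Shared state handed from node to node; every agent appends to messages."""
    task: str
    messages: List[str] = field(default_factory=list)
    revisions: int = 0


def researcher(s: State) -> State:
    s.messages.append(f"researcher: notes on {s.task}")
    return s


def planner(s: State) -> State:
    s.messages.append(f"planner: outline for {s.task}")
    return s


def writer(s: State) -> State:
    s.messages.append(f"writer: draft v{s.revisions + 1}")
    return s


def critic(s: State) -> State:
    s.revisions += 1
    s.messages.append(f"critic: review of draft v{s.revisions}")
    return s


NODES: Dict[str, Callable[[State], State]] = {
    "researcher": researcher,
    "planner": planner,
    "writer": writer,
    "critic": critic,
}

# Each edge inspects the state and names the next node; None means END.
EDGES: Dict[str, Callable[[State], Optional[str]]] = {
    "researcher": lambda s: "planner",
    "planner": lambda s: "writer",
    "writer": lambda s: "critic",
    # Conditional edge: send the draft back to the writer until it has two reviews.
    "critic": lambda s: "writer" if s.revisions < 2 else None,
}


def run(state: State, start: str = "researcher") -> State:
    """Execute the state machine from `start` until an edge returns None."""
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Calling run(State(task="report")) visits researcher, planner, writer and critic, loops back to the writer once, and stops after the second critique with six messages in the shared state. The design point is that the whole conversation lives in one state object the orchestrator keeps.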
What goes wrong
The orchestrator is usually a long-running, stateful Python service that holds the full conversation history of every agent it has ever coordinated. That state typically lives on disk or in a Redis-backed checkpoint store. When the orchestrator’s HTTP control plane is exposed without authentication, an attacker can read every agent’s history (which often contains intermediate tool outputs and customer data) and can frequently inject new messages into a running conversation. The attack surface is every tool any agent has ever been given, multiplied by the orchestrator’s lifetime.
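The “multiplied by the orchestrator’s lifetime” point follows from how checkpointing accumulates state: records are appended per thread and rarely pruned. The sketch below is a hypothetical in-memory stand-in for a disk or Redis checkpointer; Checkpoint, CheckpointStore and simulate are illustrative names, not any framework’s API. It shows that one unauthenticated “dump everything” call returns a number of records proportional to runs per day times days of uptime.

```python
# Hypothetical in-memory stand-in for a checkpoint store; not a real framework API.
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Checkpoint:
    thread_id: str
    agent: str
    content: str  # in real deployments this can hold tool outputs and customer data


class CheckpointStore:
    def __init__(self) -> None:
        self._threads: DefaultDict[str, List[Checkpoint]] = defaultdict(list)

    def save(self, cp: Checkpoint) -> None:
        # Append-only and never pruned: history only grows while the service runs.
        self._threads[cp.thread_id].append(cp)

    def dump_all(self) -> List[Checkpoint]:
        # What an unauthenticated "list every thread's state" call would hand back.
        return [cp for cps in self._threads.values() for cp in cps]


def simulate(store: CheckpointStore, days: int, runs_per_day: int, steps_per_run: int) -> None:
    """Record one checkpoint per agent step for every run on every day."""
    for day in range(days):
        for run in range(runs_per_day):
            thread_id = f"thread-{day}-{run}"
            for step in range(steps_per_run):
                store.save(Checkpoint(thread_id, f"agent-{step}", f"tool output {step}"))
```

With illustrative numbers of 10 runs a day and 4 agent steps per run, 30 days of uptime leaves 1,200 checkpoints behind a single exposed endpoint, and doubling the uptime doubles that count.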
How we test
We probe LangGraph’s /threads and /runs endpoints, AutoGen’s WebSocket control surface, and CrewAI’s REST API to enumerate the conversation inventory. Conversation IDs and timestamps tell us how long the orchestrator has been running (at minimum) and how active the operator’s deployment is. We do not read message bodies. The agent role catalogue, which can be extracted from configuration without reading any conversation, is sufficient evidence to attribute a deployment to its operator.
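The metadata-only analysis can be sketched as follows. The record field names (thread_id, created_at) and the shape of the role configuration are assumptions for illustration; real /threads responses and CrewAI configuration files may differ, and the configuration is assumed to be already parsed into a dict. Each record is reduced to its ID and timestamp before anything else happens, so message bodies are never inspected. The earliest-to-latest span is only a lower bound on how long the orchestrator has been running.

```python
# Hypothetical field names; real API responses may use different keys.
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, List, Tuple

Meta = Tuple[str, datetime]


def metadata_only(records: Iterable[Dict[str, Any]]) -> List[Meta]:
    """Keep only (thread ID, creation time); every other field is discarded unread."""
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z", so
    # explicit offsets such as "+00:00" are assumed here.
    return [(str(r["thread_id"]), datetime.fromisoformat(r["created_at"])) for r in records]


def summarise(meta: List[Meta], window: timedelta = timedelta(days=1)) -> Dict[str, Any]:
    """Estimate minimum uptime and recent activity from timestamps alone."""
    if not meta:
        raise ValueError("no records to summarise")
    times = sorted(t for _, t in meta)
    first, last = times[0], times[-1]
    return {
        "threads": len(meta),
        # Lower bound on uptime: the orchestrator existed at least this long.
        "observed_span": last - first,
        # Threads created within `window` of the most recent one.
        "recent_threads": sum(1 for t in times if t >= last - window),
    }


def role_catalogue(agent_config: Dict[str, Dict[str, Any]]) -> List[str]:
    """List declared agent roles from configuration, never from conversations."""
    return sorted({cfg["role"] for cfg in agent_config.values() if "role" in cfg})
```

Dropping every field except the ID and timestamp at the first step, rather than filtering later, keeps the “we do not read message bodies” rule enforced by construction.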