What it is
Voice agents pair an LLM with real-time speech-to-text, text-to-speech, and phone-call infrastructure. Vapi and Retell are the leading managed platforms; both are used to build customer-support and outbound-sales bots that sound human on a phone call. LiveKit Agents is an open-source real-time framework, and Pipecat (Daily.co) is a Python-native voice agent framework. Behind every “AI agent answered my call” experience is one of these, orchestrating Whisper, GPT/Claude, and a TTS engine on a sub-200ms budget.
What goes wrong
Voice agent control planes hold the most invasive credentials in the AI stack: a phone number, the ability to place outbound calls, and a recording of every call the operator has placed or received. When a control plane is exposed without authentication, an attacker gets a phone number billed to the operator’s account and a verbatim audio archive of every customer conversation, including account verification phrases, credit card readbacks, and the medical or legal context the customer thought was private.
How we test
We probe the dashboard and admin APIs (Vapi’s /v1/calls, LiveKit’s WebSocket control endpoint, Pipecat’s status server). Call counts and duration distributions characterise the operator’s traffic. We never trigger outbound calls. Recording filenames or call IDs are sufficient attribution evidence: most operators name calls by their internal campaign ID, which identifies the team without our needing to listen to anything.
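
The metadata-only approach described above can be sketched as follows. This is a minimal illustration, not our actual tooling: the record fields (`id`, `duration_s`, `recording_url`) and the `<campaign>-<sequence>` naming convention are assumptions made for the example and do not reflect any specific platform's response schema. The function reads only call IDs and durations, and never touches the recording field.

```python
from collections import Counter
from statistics import median


def summarise_calls(calls):
    """Summarise call metadata without opening any audio.

    `calls` is a list of dicts with hypothetical keys:
      id          - call identifier, assumed to look like "<campaign>-<sequence>"
      duration_s  - call length in seconds (may be missing)
    Any other key (for example a recording URL) is ignored on purpose.
    """
    durations = sorted(
        c["duration_s"] for c in calls if c.get("duration_s") is not None
    )
    # Assumed convention: everything before the last "-" is the campaign ID.
    campaigns = Counter(c["id"].rsplit("-", 1)[0] for c in calls)
    return {
        "call_count": len(calls),
        "median_duration_s": median(durations) if durations else None,
        "max_duration_s": durations[-1] if durations else None,
        "campaigns": dict(campaigns),
    }


# Illustrative records only; these are made-up values, not real findings.
sample_calls = [
    {"id": "billing-q3-0001", "duration_s": 62, "recording_url": "ignored"},
    {"id": "billing-q3-0002", "duration_s": 184, "recording_url": "ignored"},
    {"id": "renewals-0001", "duration_s": 35, "recording_url": "ignored"},
]
summary = summarise_calls(sample_calls)
```

Grouping by the campaign prefix is what makes attribution possible from identifiers alone: the counts per campaign point to a team, and the duration summary characterises the traffic, with no audio ever fetched.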