Corpus index built 2026-06-07 23:58 UTC

§ THE STACK / AGENT LAYER

Voice Agents

Vapi, Retell, LiveKit Agents, real-time voice + LLM

How LLMs reach out and take action: call APIs, browse the web, drive workflows.

What it is

Voice agents pair an LLM with real-time speech-to-text (STT), text-to-speech (TTS), and telephony infrastructure. Vapi and Retell are the leading managed platforms; both are used to build customer-support and outbound-sales bots that sound human on a phone call. LiveKit Agents is an open-source real-time framework, and Pipecat (from Daily.co) is a Python-native voice agent framework. Behind a typical “AI agent answered my call” experience is one of these, orchestrating an STT model such as Whisper, an LLM such as GPT or Claude, and a TTS engine on a sub-200ms latency budget.
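To make the latency budget concrete, here is a minimal sketch of how per-stage latencies add up against the 200 ms figure. The stage names and every number are hypothetical placeholders chosen for illustration; they are not measurements of Vapi, Retell, LiveKit Agents, Pipecat, or any model.

```python
from dataclasses import dataclass

BUDGET_MS = 200.0  # the "sub-200ms" figure quoted in the text


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # hypothetical value, not a benchmark


def budget_report(stages, budget_ms=BUDGET_MS):
    """Sum per-stage latencies and compare the total with the budget."""
    total = sum(stage.latency_ms for stage in stages)
    return {
        "total_ms": total,
        "headroom_ms": budget_ms - total,
        "within_budget": total <= budget_ms,
    }


# Placeholder pipeline: speech in, first LLM token, first audio out.
pipeline = [
    Stage("speech_to_text", 60.0),
    Stage("llm_first_token", 90.0),
    Stage("text_to_speech_first_audio", 40.0),
]
report = budget_report(pipeline)
```

With these placeholder values the three stages leave only 10 ms of headroom, which is why a budget this tight is usually spent on streaming each stage rather than waiting for complete outputs.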

What goes wrong

Voice agent control planes hold the most invasive capabilities in the AI stack: a phone number, the ability to place outbound calls, and a recording of every call the operator has placed or received. When a control plane is exposed without authentication, an attacker gets a free phone number billed to the operator and a verbatim audio archive of every customer conversation, including account-verification phrases, credit card readbacks, and the medical or legal context the customer thought was private.

How we test

We probe the dashboard and admin APIs (Vapi’s /v1/calls, LiveKit’s WebSocket control endpoint, Pipecat’s status server). Call counts and duration distributions characterise the operator’s traffic. We never trigger outbound calls, and we do not need to listen to any audio: recording filenames or call IDs are sufficient attribution evidence, because most operators name calls by their internal campaign ID, which identifies the team.
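The metadata-only approach can be sketched as below. This is an illustration under stated assumptions, not our actual tooling: the record fields (`name`, `duration_s`) and the `<campaign>-<sequence>` naming convention are hypothetical, and the code works on records already in hand, making no network requests and touching no audio.

```python
import re
from collections import Counter
from statistics import median

# Hypothetical naming convention: "<campaign>-<sequence>", e.g. "renewals-q2-0007".
CAMPAIGN_RE = re.compile(r"^(?P<campaign>.+)-\d+$")


def summarise_calls(records):
    """Characterise traffic from call metadata alone."""
    durations = sorted(record["duration_s"] for record in records)
    campaigns = Counter()
    for record in records:
        match = CAMPAIGN_RE.match(record["name"])
        # Names that do not fit the assumed convention are counted separately.
        campaigns[match.group("campaign") if match else "unattributed"] += 1
    return {
        "call_count": len(records),
        "median_duration_s": median(durations) if durations else None,
        "max_duration_s": durations[-1] if durations else None,
        "campaigns": dict(campaigns),
    }


# Made-up records for illustration only.
sample = [
    {"name": "renewals-q2-0007", "duration_s": 42},
    {"name": "renewals-q2-0008", "duration_s": 180},
    {"name": "support-0101", "duration_s": 95},
    {"name": "adhoc", "duration_s": 12},
]
summary = summarise_calls(sample)
```

The campaign prefix is what carries the attribution: in this made-up sample, two of the four calls belong to a `renewals-q2` campaign, which points to a team without any recording being opened.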

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.