What it is
An LLM gateway is a reverse proxy for model APIs. The operator wires up keys for OpenAI, Anthropic, Google, Mistral, their own Ollama box, and a handful of fine-tunes; the gateway exposes a single OpenAI-compatible endpoint and handles routing, rate-limiting, fallback, observability, and cost accounting. LiteLLM is the Python-native one (most common in research); OneAPI is the Go/Chinese-ecosystem one (most common in commercial deployments). Portkey, Helicone-Proxy, and APISIX-AI sit in the same niche.
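The routing-and-fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not how LiteLLM or OneAPI implement it: the Gateway class, the alias "chat-default", and the two stand-in backends are hypothetical, and the per-provider request counter is only a placeholder for real cost accounting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type for illustration: a backend takes a prompt, returns a completion.
Backend = Callable[[str], str]


@dataclass
class Gateway:
    # Maps a public model alias to an ordered list of (provider name, backend).
    routes: Dict[str, List[Tuple[str, Backend]]] = field(default_factory=dict)
    # Requests served per provider; a stand-in for cost accounting.
    usage: Dict[str, int] = field(default_factory=dict)

    def complete(self, alias: str, prompt: str) -> str:
        errors = []
        for provider, backend in self.routes.get(alias, []):
            try:
                result = backend(prompt)
            except Exception as exc:
                # This provider failed; record why and fall back to the next one.
                errors.append(f"{provider}: {exc}")
                continue
            self.usage[provider] = self.usage.get(provider, 0) + 1
            return result
        raise RuntimeError(f"no backend succeeded for {alias!r}: {errors}")


def flaky(prompt: str) -> str:
    # Stand-in for a hosted provider that is currently failing.
    raise TimeoutError("upstream timed out")


def local(prompt: str) -> str:
    # Stand-in for a self-hosted model.
    return f"echo: {prompt}"


gw = Gateway(routes={"chat-default": [("primary", flaky), ("ollama", local)]})
```

Calling gw.complete("chat-default", "hi") tries the failing primary backend first, falls back to the local one, and records the request against "ollama" in gw.usage. The caller only ever sees the alias, which is why a single exposed endpoint gives access to every provider behind it.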
What goes wrong
The gateway holds the operator’s entire AI billing relationship. If it’s exposed without auth, an attacker can route arbitrary prompts through any of the configured providers, which means burning the operator’s quota, running up usage charges on premium models, and exfiltrating embedded prompts that may contain customer data. Worse, the admin panel typically lists every model alias, the keys behind each one, and the per-user and per-team budgets. The attacker learns the operator’s whole AI org chart before issuing a single request.
How we test
We confirm the gateway by the shape of its /v1/models response (LiteLLM’s differs from that of a vanilla OpenAI proxy), then check whether /health/readiness and /key/info can be reached without an admin key. When the key endpoint answers unauthenticated requests, it returns the operator’s full virtual-key inventory, including budget caps and team assignments. We do not issue paid completions: the catalogue is enough to demonstrate the quota-drain risk and to identify the operator.
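A minimal sketch of that probe sequence, assuming plain unauthenticated GET requests against the three paths named above. The function names and the result labels ("open", "auth-required", "inconclusive", "unreachable") are hypothetical, and the sketch does not attempt the response-shape fingerprinting, which depends on details not given here. No completion endpoint is ever called.

```python
import json
import urllib.error
import urllib.request

# Paths named in the text; all are read-only and incur no provider charges.
PROBE_PATHS = ["/v1/models", "/health/readiness", "/key/info"]


def classify(status: int, body: str) -> str:
    """Label a probe response. Labels are illustrative, not a standard."""
    if status in (401, 403):
        return "auth-required"
    if status != 200:
        return "inconclusive"
    try:
        json.loads(body)
    except ValueError:
        # A 200 with a non-JSON body (for example an HTML page) proves nothing.
        return "inconclusive"
    return "open"


def probe(base_url: str, timeout: float = 5.0) -> dict:
    """Send one unauthenticated GET per probe path and classify each result."""
    results = {}
    for path in PROBE_PATHS:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", "replace")
                results[path] = classify(resp.status, body)
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses arrive as exceptions; classify by status code.
            results[path] = classify(err.code, "")
        except OSError:
            # Covers URLError, refused connections, and timeouts.
            results[path] = "unreachable"
    return results
```

An "open" result on /key/info is the finding that matters. The other two paths mainly serve to confirm that the host is a gateway at all.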