Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All research

Survey May 1, 2026

LLM Gateways / OpenAI-Compatible Proxies: Cross-Cloud Survey (2026-05)

NuClide Research · 2026-05-04 (in progress)

Status: Cross-cloud discovery + key-burnability proof complete (2026-05-04). 1,899 confirmed unauth gateways, 1,857 (97.8%) returned functional inference when probed with a single unauthenticated chat/completions call, operator quota directly billed.


Premise

LLM gateway / OpenAI-compat proxy products sit between LLM applications and upstream providers. They normalize multiple provider APIs (Anthropic, OpenAI, Cohere, Together, etc.) behind a single OpenAI-compatible interface, frequently with provider keys baked into operator configuration. When exposed without authentication, they are direct quota-theft and reseller-proxy primitives: an attacker hitting /v1/chat/completions consumes the operator’s upstream credit at scale.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7044, S7068, S7075, T5904, T5919
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, T5868, T5893
  • overlap (Common AI KSATs (all 5 roles)): K1158, K1159, K22, K6311, K6900, K6935, K7003, K942, S7065

This survey extends the vLLM cross-cloud finding (vllm-cloud-survey-2026-05.md), which documented 10 commercial-API reseller proxies burning operator credit in the vLLM survey, into the broader gateway-product class. Different operator tier, same auth-failure pattern.

The platforms in scope:

PlatformDefault portTierAuth posture
LiteLLM Proxy4000A*Auth optional, off by default in stub configs
LocalAI8080A*Same
oobabooga (Text Generation WebUI)5000A*Gradio/FastAPI dual-stack; HTML UI is no-auth by default
LM Studio server mode1234ANo built-in auth; operator must add reverse proxy
Jan AI / Cortex1337ASame
OneAPI / NewAPI3000A*Admin password required for /admin/* but /v1/* often left open with default channel

Auth-on-default thesis: LM Studio and Jan AI (no auth concept) → 100% unauth at population scale. The others trend lower, especially OneAPI which ships with admin auth on first-run.


Methodology

Discovery

Same tier-2 cross-cloud pattern as the existing surveys: Scaleway 7 + OVH 33 + Linode 36 = 76 prefixes ≈ 3.55M IPs.

Ports scanned: 1234, 1337, 3000, 4000, 5000, 8080.

Skip 7860 (already surveyed by gradio survey) and 8000 (already surveyed by vLLM survey, its OpenAI-compat populations are documented there). The 3000 / 8080 ports collide heavily with other AI platforms (MCP, Open WebUI, n8n, ChromaDB v8080-mode); platform identification relies on response signatures, not port alone.

Probe

data/llm-gateway-probe.py is a multi-platform fingerprint prober. Per (ip, port) it tries handlers in port-specific order:

PlatformProbe sequenceMatch signature
LiteLLM ProxyGET /health/livelinessGET /metricsStatus JSON + litellm_* Prometheus prefix
LocalAIGET /readyzGET /system/infoOK body + loaded_backends field, or owned_by:localai in /v1/models
oobaboogaGET /api/v1/model{"result": "<model>"} JSON
LM StudioGET /v1/modelsOpenAI-compat with GGUF / :Q4_K_M model-id patterns, or fallback /v0/models
Jan AIGET /api/v1/app/versionBody containing jan or cortex markers
OneAPI / NewAPIGET /api/statusversion + system_name/channel_count/user_count keys
Generic OpenAI-compatGET /v1/modelsCatch-all for /v1/models responses that don’t match a specific platform; excludes vLLM (already surveyed) by owned_by field

For each confirmed instance, capture: platform, version, model-list, auth posture, owned_by tags. Output JSONL.

Filters

  • AS63949 honeypot fleet, apply standard filter (393 hosts at ~/recon/ollama-tier2-2026-05-04/as63949-honeypot-fleet.txt).
  • vLLM cross-survey overlap, try_generic_openai_compat excludes any /v1/models response that has owned_by: "vllm" (those belong to the vLLM survey).
  • Auth-required instances, record presence with auth_required: true but exclude from “exposed-models” enumeration.

Threat-class taxonomy

Per confirmed unauth instance:

ClassExamplesSeverity
Provider-key quota theftLiteLLM with OpenAI/Anthropic/Cohere/Bedrock keys; OneAPI with channel-managed keysHIGH, direct billing impact
Reseller proxyLiteLLM/OneAPI fronting paid commercial accounts for downstream customers (operator’s revenue stream); attacker hits the same endpoint freelyHIGH, same as quota theft + competitive impact
Self-host inferenceLM Studio / Jan AI / oobabooga running a single local GGUF/safetensors model; no provider keysMEDIUM, compute-theft only
Custom fine-tune exposureOperator’s proprietary fine-tune accessible unauthMEDIUM-HIGH, IP exfil
Admin UI accessOneAPI/NewAPI admin panel reachable, default creds presentHIGH, full proxy reconfiguration

Discovery results

Cross-cloud final (Scaleway + OVH + Linode tier-2 = 1,017 prefixes / ~6.33M IPs). Masscan ports 1234, 1337, 4000, 5000 fresh; ports 3000 + 8080 reused from MCP cross-cloud survey output (no need to re-scan). After AS63949 honeypot filter, 281,826 ip:port pairs probed.

SourceProbe targetsConfirmedConfirmation rate
Combined tier-2 (3 providers)281,8261,8990.67%

By platform

PlatformConfirmedNotes
OpenAI-compat (generic)1,448 (76.3%)Catch-all for /v1/models responses without specific platform markers; vLLM-tagged hosts excluded (already covered by vllm survey)
LM Studio318 (16.7%)Identified by GGUF / :Q4_K_M model-id patterns or /v0/models fallback
Jan AI / Cortex126 (6.6%)/api/v1/app/version returns jan or cortex markers
LiteLLM Proxy7 (0.4%)/health/liveliness + litellm_* Prometheus prefix; LiteLLM operators favor port 8000 (caught in vLLM survey) or 80/443 behind reverse proxy, not the canonical 4000

Auth posture (verified empirically)

We probed each confirmed host with one unauthenticated POST /v1/chat/completions call (max_tokens=1, prompt "hi"), proof-of-functionality test. Cost per host: $0.0001-0.0005 of operator quota. Aggregate cost across all 1,857 functional responders: **$0.011** of operator quota total (37,497 prompt+completion tokens). Per-operator: trivial.

ResultCountMeaning
HTTP 200 + functional inference (model_responded field set)1,857 (97.8%)Operator’s provider quota was actively billed by an unauthenticated caller. Direct empirical proof.
HTTP 401/40311Auth required, operator did the right thing
HTTP 5xx10Server error during probe
HTTP 4xx other (400/404)11Bad request shape or path 404
Network error / timeout8Unreachable at probe time

Provider-key inventory

owned_by tags from /v1/models responses, intersected with hosts that actually returned functional inference unauth (status 200 + model_responded set):

ProviderHosts (functional + tag)Notes
OpenAI1,835Massive, operator OpenAI keys actively burnable. Of these, 1,829 returned gpt-4o-mini with the identical canned completion "Hello! I'm doing well, thank you. How about you?", fingerprint of a single open-source reseller-proxy template, mass-deployed auth-off across operators (single template = single root-cause auth failure)
system1,829Sibling tag, paired with openai
llamacpp12Self-hosted llama.cpp / GGUF, no provider cost; compute theft only
Anthropic2172.235.117.122:4000 (claude-4.5-haiku, 56 tokens consumed by our probe) and one other with claude-3.7-sonnet-reasoning model. 2 of 4 originally tagged were auth-on; the other 2 are wide open.
organization_owner4Generic tag pattern
Google2Gemini-key burnable
OpenRouter1Reseller-proxy fronting OpenRouter (operator’s OpenRouter quota = upstream-provider quota)
Mistral1Mistral La Plateforme keys
DeepSeek1DeepSeek API
MiniMax1MiniMax (Chinese provider)
Moonshot1Moonshot Kimi
xAI / x-ai1xAI Grok
Zhipu1Zhipu AI
Alibaba1Alibaba Cloud DashScope
Windsurf1Windsurf model
Antigravity1Operator-custom
aiden_lu1Operator-named tag (personal/team identifier)

The dominant signal is 1,835 OpenAI-key proxies with functional unauth inference, at population scale across all three tier-2 cloud providers. The vLLM cross-cloud survey documented 10 reseller-proxy operators at the inference-server tier; this survey extends the population to the gateway-product tier where the same auth-off-default pattern produces ~180× more exposed-quota hosts.


Notable findings

F1: gpt-4o-mini reseller-proxy fleet (1,829 instances): population-scale single-template failure

1,829 of the 1,857 functional hosts (98.5%) returned the identical canned response "Hello! I'm doing well, thank you. How about you?" from gpt-4o-mini. The uniformity (same model, same first-7-token completion, same prompt_tokens=13 / completion_tokens=7 shape) is the signature of a single open-source reseller-proxy template that operators have mass-deployed without auth.

The finding is not 1,829 individual mistakes. It’s one template’s auth-off-default propagating to 1,829 deployments at scale. The fix is upstream, the template author enabling auth by default, rather than per-operator outreach. (Disclosure to the template’s GitHub maintainer is the leverage point.)

F2: Two unauth Anthropic-key gateways (HIGH)

  • 172.235.117.122:4000 (87-model proxy, owned_by tags include alibaba/anthropic/deepseek/google/minimax/moonshot/openai/windsurf/xai/zhipu), POST to /v1/chat/completions with model claude-4.5-haiku returned 200 + 55 completion tokens. Operator’s Anthropic quota burnable by any unauth caller. Same host fronts a 10-provider key vault.
  • 173.255.226.61:4000 (LiteLLM Proxy v?, 4 models: openai/gpt-4o-mini, anthropic/claude-sonnet-4-5, openai/gpt-4o, mistral), returned "Acknowledged." (2 tokens) on openai/gpt-4o-mini unauth. Anthropic models registered but our probe hit gpt-4o-mini first; that path proven exploitable, Anthropic path adjacent.

The other two Anthropic-tagged hosts (15.204.210.77:8080 kiro-proxy, 172.104.55.104:8080 260-model openrouter) returned 401 Auth Required. So 2 of 4 Anthropic-key exposures are auth-on (good operators), 2 are wide open.

F3: Single-host 260-model openrouter-style proxy (auth-on; near-miss)

172.104.55.104:8080 advertises 260 models across 9 provider families (anthropic, deepseek, google, minimax, mistralai, openai, qwen, x-ai, z-ai). /v1/models returned the catalog without auth, but /v1/chat/completions returned 401. The operator left model enumeration open but gated inference, partial auth posture. The catalog itself is competitor intelligence (operator’s commercial-API key inventory disclosed) but the keys aren’t burnable.

F4: LM Studio / Jan AI self-host fleets

  • 318 LM Studio servers identified by GGUF/quantization-suffix model IDs in /v1/models. These are personal-developer-grade exposures: someone’s /Users/<name>/Downloads/some-model-q4.gguf running on their own GPU, exposed to the public internet without auth. Compute theft (operator’s GPU + electricity) but no provider-key cost.
  • 126 Jan AI / Cortex servers, same shape, different vendor. /api/v1/app/version returns jan or cortex markers.
  • Self-host model inventory includes uncensored fine-tunes (Qwen3.5-9B-Uncensored-HauhauCS-Aggressive, multiple *Q4_K_M.gguf builds), content/safety class is the same as the Ollama survey’s abliterated-model finding, just different platform.

F5: aiden_lu-tagged operator (named owned_by tag)

One host has owned_by: aiden_lu in its model registry, a personal-name tag. The operator named their model registry after themselves (or a team member named Aiden). Identifies the operator without WHOIS/cert work.

F6: Inference-functionality is universal

97.8% of confirmed unauth gateways accept and execute inference requests without any auth header. There is no “auth-off but not actually exploitable” in this population, if /v1/models is unauth, /v1/chat/completions almost always is too. The auth posture is binary at the gateway tier.


Aggregate proof-of-functionality stats

Across the 1,857 functional hosts:

  • Total prompt tokens billed (across all operators): 24,584
  • Total completion tokens billed: 12,913
  • Total quota consumed: 37,497 tokens
  • Estimated USD cost (across all operators combined): ~$0.011 (gpt-4o-mini-equivalent pricing)
  • Average per-host cost: $0.000006 (~$6 per million such calls)
  • Disclosure threshold compliance: well below the $1 threshold typical of coordinated-disclosure PoC

Evidence pack: evidence/llm-gateway-tier2-2026-05-04/, full JSONL + CSV breakdown, no key strings (none extracted).


Cross-reference: vLLM survey

The vLLM cross-cloud survey (vllm-cloud-survey-2026-05.md) already documented 10 commercial-API reseller proxies burning operator credit on every external prompt, Grok2API, Kiro-Go, AgentBar, etc. Those instances served via vLLM-fronted OpenAI-compat endpoints. This survey extends the population to gateway-product instances (LiteLLM, OneAPI, etc.) where the same pattern recurs but at a different operator tier.

The combined “OpenAI-compat reseller proxy” population spans:

  • vLLM-fronted (vLLM survey: 10)
  • LiteLLM-fronted (this survey: 7)
  • LM Studio self-host (this survey: 318, compute-only, no provider-key burnable)
  • Jan AI / Cortex self-host (this survey: 126, compute-only)
  • Generic OpenAI-compat (this survey: 1,448, of which 1,829 burnable provider-keyed instances)
  • Combined population of “OpenAI-compat reseller / proxy / self-host with unauth”: 10 (vllm) + 7 (litellm) + 1,829 (generic-burnable) + 444 (LM Studio + Jan AI compute-only) = 2,290 unauth gateway instances at population scale

Honest negative space

  • OpenAI-compat is a coarse signal. Many of the gateway products will look identical at /v1/models until probed at platform-specific endpoints. The probe handles this with a port-ordered handler list, but a misconfigured or atypical deployment may skip platform fingerprinting and fall through to “generic OpenAI-compat” classification. That bucket is therefore large and somewhat lossy.
  • Reverse-proxied gateways behind nginx/Caddy with basic auth at the proxy layer will return 401 at all paths and be classified as auth-on. The operator has done the right thing; we count them only as auth-on aggregate, no platform breakdown possible.
  • LM Studio / Jan AI desktop builds are sometimes exposed via SSH tunnel forwarding rather than public IP; this survey does not capture tunneled exposures.
  • vLLM overlap is explicitly excluded to avoid double-counting with that survey.

Disclosure plan

For each unauthenticated instance with high-severity threat classes (provider-key quota theft, reseller proxy fronting paid commercial accounts), draft coordinated-disclosure email per the standard NuClide template, routed via WHOIS-derived institution identification.


See also