Survey Jun 5, 2026

Cat-03 Model Serving & Inference — Survey 2026-06-05

NuClide Research · 2026-06-05 · Consumer OpenAI-compat inference servers. 6 verified unauthenticated exposures; a high false-positive rate in the verification sample drove most aimap CRITICAL/HIGH candidates down to FP or surface-only.

Summary

Survey of 5,018 IPs across 17 Shodan and 9 Censys queries targeting Cat-03 (model serving and inference: llama.cpp, KoboldCpp, LM Studio, vLLM, SillyTavern, faster-whisper, One API, New API, Open WebUI, SGLang, GPT4All, HuggingFace TGI). 158 hosts responded live; aimap fingerprinted 72 services and flagged 20 CRITICAL / 19 HIGH. Verification of the flagged candidates refuted the majority: the One API/New API default-credential thesis did not hold at population scale (0/9), and four “GPT Researcher”, one “Lunary”, one “h2oGPT”, and two TTS fingerprints were misattributions. Six services on five hosts were confirmed genuinely unauthenticated with a 200-with-data read (F5 and F6 share one host). The most material finding is an unauthenticated Ollama instance proxying a paid Ollama Connect cloud subscription (deepseek-v4-pro:cloud), callable by any internet host.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, K7054, S7068, S7070, S7075, S7076, T5858, T5904, T5919
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, K7052, S7056, S7067, T5854, T5882, T5893, T5904
  • overlap (Common AI KSATs (all 5 roles)): K108, K1158, K1159, K22, K6311, K6900, K6935, K7003, K7041, K7051, K942, T5896

Thesis fit

The headline result is a verification-stage correction, not a discovery. This survey is a clean illustration of the load-bearing-verification principle: the scan produced 39 CRITICAL/HIGH candidates; the verified-true unauth set is 6. The auth-on-default thesis (Insight #40) is confirmed here — One API / New API enforce first-run password setup and the population rejected the documented default — the opposite of the pre-verification framing. Cand #79 (Ollama Connect subscription hijack) is confirmed with hard proof.

DCWF KSAT coverage

This survey produces evidence mapped to the following DoD Cyber Workforce Framework AI work-role KSATs:

  • 672 AI T&E Specialist: T5919 (adversarial test in operationally realistic environments — marker-probe verify step), T5904 / T5858 (risk assessment), K7003 (AI security risks/threats/vulnerabilities), K7004 (T&E frameworks — the chain), K7044 (T&E V&V tools — aimap + VisorCAS), K7054 (robustness/resilience tools), S7068 (org/project-level AI risks), S7075 (testing ML algorithms/AI solutions).
  • 733 AI Risk & Ethics Specialist: T5893 / T5882 (Responsible AI practices — fingerprint-not-exfiltrate), T5904 (technical/societal risk), K7040 (PHI/PII considerations — bag-of-fields, no record reads), K7051 (ML blind spots / edge cases — the FP catalog IS this).
  • Gaps (publish as such): S7056 (bias/ethics assessment), S7076 (bias in datasets/outputs) — infrastructure survey, model outputs not exercised.

Per-finding tag examples:

  • F2 Ollama 121.153.39.157:11434 → K7003 + T5919 + K7040 (40 models enumerated, no inference invoked)
  • F5+F6 home rig 108.210.175.159 → K7003 + K7051 (front-end-secured / backend-exposed asymmetry = ML blind-spot at architecture level)

Per-finding entries — VERIFIED UNAUTH (200-with-data)

F1. 121.28.161.118:3000 — One API (operator outlier, not class default)

What was found

POST /api/user/login with {"username":"root","password":"123456"} returned HTTP 200 with data.role: 100 (root admin). Confirmed admin session on a single host.

Why it is bad

Default credentials on this one host expose the admin panel of an LLM API multiplexer (user + upstream-key management). Verified: admin credential works, role:100 returned. NOT exercised: upstream API keys in the admin UI were not read (restraint ethic).

Caveat (corrected)

This was the ONLY host of the surveyed One API / New API population that accepted root/123456. A 9-host population sweep returned 0/9 (see Cand #78, refuted). This is a lazy-operator outlier, not a class-level default-credential condition.

Who it affects

Bare IP, no RDNS, no bounty program. aimap-profile: unclassified.


F2. 121.153.39.157:11434 — Ollama (40 models, cloud-proxied)

What was found

GET /api/tags → 200 with 40 models including qwen3-embedding:latest, minimax-m2.5:cloud, deepseek-v4-pro:cloud. The :cloud models are Ollama Connect remote models served through the operator’s subscribed account.

Why it is bad

Unauthenticated model inventory disclosure plus an active Ollama Connect subscription reachable by any host. Listing verified; inference not exercised.

Who it affects

ai-open.kr — Korean research lab. aimap-profile: research_lab, MX mail.ai-open.kr.


F3. 176.9.85.172:7860 — ComfyUI

What was found

GET /system_stats → 200, {"ram_total":67108864000,"cuda_version":"12.8","comfyui_version":"0.17.0"}. The ram_total value is 64,000 MiB (nominally 64 GB of RAM); CUDA is 12.8.

Why it is bad

Unauth hardware/profile disclosure. No workflow or model-weight access exercised.

Who it affects

your-server.de Hetzner VPS. aimap-profile: unclassified.


F4. 51.15.140.250:8000 — vLLM 0.12.0

What was found

GET /v1/models → 200, {"id":"meta-llama/Llama-3.2-3B-Instruct","owned_by":"vllm","max_model_len":512}. GET /version → {"version":"0.12.0"}.

Why it is bad

Open vLLM inference server, no auth. Model name + version disclosed; free inference available. Synthesis not exercised.

Who it affects

Scaleway range (51.15.x). Bare IP.


F5+F6. 108.210.175.159 — home AI-roleplay rig: front door locked, inference backends open (MOST MATERIAL)

A single AT&T residential host (108-210-175-159.lightspeed.nworla.sbcglobal.net, Kenner LA) running an enthusiast local-LLM stack on home broadband, alongside a Minecraft server. This is the survey’s best chain: the operator secured the UI and left the backends it depends on wide open.

What was found

Four services on one box (Shodan saw three; the fourth was Shodan-dark and surfaced only on active probe):

| Port | Service | Auth | Source |
|---|---|---|---|
| 8000 | SillyTavern (chat front-end) | Basic auth ON (HTTP 401, WWW-Authenticate: Basic realm="SillyTavern") | Shodan |
| 5001 | KoboldCpp / KoboldAI Lite | none (Server: KoboldCppServer 1, access-control-allow-origin: *) | Shodan |
| 11434 | Ollama 0.17.4, cloud-proxied | none | active probe only (Shodan-dark) |
| 25565 | Minecraft 1.21.11 “New Bloodsquirrelia” | n/a | Shodan |

  • KoboldCpp :5001 — GET /openai_api/v1/models → 200, {"id":"koboldcpp/gemma-4-31B-it-UD-Q8_K_XL","owned_by":"koboldcpp"}. Wildcard CORS, allow-headers include apikey, genkey, Authorization.
  • Ollama :11434 — GET /api/tags → 200, 5 models, one cloud-proxied: {"name":"deepseek-v4-pro:cloud","remote_model":"deepseek-v4-pro","remote_host":"https://ollama.com:443"}. Local: deepseek-r1:70b, deepseek-r1:32b, llama3:latest, smollm2:135m.

Why it is bad

The operator is not naive — SillyTavern (the thing they log into via browser) enforces basic auth. But SillyTavern is only the front-end; it drives the inference backends, and those (KoboldCpp :5001, Ollama :11434) are independently internet-reachable with no auth. An attacker never touches the authed front door:

SillyTavern :8000  [Basic auth OK]  --drives-->  KoboldCpp :5001  [unauth]
                                          \-----> Ollama   :11434 [unauth, cloud-proxied]

Direct requests to :5001 or :11434 yield the same model access SillyTavern has, including unauthenticated access to a local 70B model. They also expose the paid Ollama Connect cloud subscription (deepseek-v4-pro:cloud → ollama.com) to theft: billing and resource drain with zero operator awareness. The auth on SillyTavern is decorative once the dependency graph is exposed. KoboldCpp’s wildcard CORS additionally makes :5001 drive-by-capable from a malicious page loaded in the operator’s browser.

visorgoose flagged a Connect takeover URL embedding an ed25519 key (host DESKTOP-8SFE9EN). The key was NOT fetched or used (restraint ethic); the cloud-model listing alone proves the subscription surface.

Who it affects

AT&T residential broadband (home user), Kenner LA. Not a business or infra target. aimap mislabelled :5001 as h2oGPT (see FP catalog); the host is genuinely unauth, the fingerprint name was wrong.

Tool attribution

  • :5001 discovered by aimap (mislabelled h2oGPT); :11434 discovered by visorgoose as a shadow port beyond the corpus portlist (Shodan-dark — confirms Insight #77, scanner non-skippable after Shodan)
  • Verified: GET /openai_api/v1/models (KoboldCpp) and GET /api/tags (Ollama cloud-proxied) — both 200 unauth
  • :8000 SillyTavern basic-auth state and :25565 Minecraft from Shodan host record

Surface-open (access not exercised, not findings)

| Host | Service | State |
|---|---|---|
| 108.62.161.37:8080 | sub2api | 401 API_KEY_REQUIRED; key-gated (downgraded from CRITICAL) |
| 80.225.185.157:8080 | sub2api | 401 API_KEY_REQUIRED; key-gated (downgraded from CRITICAL) |
| 31.192.104.158:8000 | MCP Server | uvicorn streamable-HTTP; mints mcp-session-id pre-auth; initialize not sent |
| 15.235.9.143:3000 | Grafana 12.4.2 | login surface exposed; anonymous org access OFF (401) |

False-positive catalog (verification refutations)

The main reusable output of this survey, in the codify-every-survey sense, is the FP set. Of the actively verified CRITICAL/HIGH candidates, the following were refuted:

| Candidate | aimap label | Reality | FP class |
|---|---|---|---|
| 9 hosts (port 3000) | One API/New API default-cred | 0/9 accept root/123456; New API ships first-run password setup | thesis over-extrapolation |
| 129.213.81.173:8888 | unauth Jupyter | /api/contents → 403; token-protected | non-404 version banner read as open |
| 37.59.123.209:3000 | Flowise unauth | /api/v1/chatflows → 401 | auth-on-default holds |
| 5.9.249.102:3000 | Lunary | CheckRef (scholarly reference app) | /api/v1/health + port 3000 too generic |
| 15.235.9.143 / 88.198.67.137 / 34.47.31.176 / 91.99.202.219 :8000 | GPT Researcher (x4) | all the same Gradio “Whisper Playground” | /api/report 405 (Gradio FastAPI catch-all) read as endpoint-exists |
| 108.210.175.159:5001 | h2oGPT | KoboldCpp / KoboldAI Lite | label wrong (host IS unauth, see F5) |
| 61.171.112.92:8000 | Coqui XTTS + Chatterbox TTS | ZenTao project-management app | two TTS fingerprints collided on port 8000 |
| 172.182.235.102:3000 | Grafana | HTTPS-only; plain-HTTP → 400 | scheme mismatch |
| 121.28.161.118 (10250/8001) | menlohunt kubelet /exec + K8s Dashboard | 10250 connection-refused; 8001 is a One API key proxy | menlohunt asserted /exec on a port with no listener |
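
The refutations in this catalog share one rule: only a 200 response that carries real data counts as verified unauthenticated access. The following is a minimal sketch of that rule, assuming the probe result is already available as a status code and a body string. The function name `classify_probe` and the verdict labels are illustrative assumptions, not part of aimap or any other tool named in this survey.

```python
import json

def classify_probe(status: int, body: str) -> str:
    """Map one HTTP probe result to a verification verdict (hypothetical labels)."""
    if status in (401, 403):
        return "auth_gated"        # e.g. the Flowise 401 and Jupyter 403 rows
    if status != 200:
        return "inconclusive"      # e.g. a 405 catch-all or a 400 scheme mismatch
    try:
        data = json.loads(body)
    except ValueError:
        return "inconclusive"      # 200 but not JSON: no data read was proven
    # A 200 with an empty object or list is a surface, not a read.
    return "verified_unauth" if data else "inconclusive"

print(classify_probe(405, ""))
print(classify_probe(200, '{"id":"meta-llama/Llama-3.2-3B-Instruct"}'))
```

The sketch treats JSON as the only accepted proof of data, which is an assumption that fits the JSON APIs in this survey; an HTML-serving target would need its own positive marker.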

Survey statistics

| Metric | Count |
|---|---|
| Shodan IPs | 4,284 |
| Censys IPs | 1,025 |
| Combined unique | 5,018 |
| Live hosts | 158 (3.1%) |
| Services fingerprinted | 72 |
| aimap CRITICAL / HIGH candidates | 20 / 19 |
| Verified unauth (hard proof) | 6 |
| Surface-open (access not exercised) | 4 |
| Refuted in verification sample | ~21 candidates across 9 FP classes |
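
The derived figures can be rechecked from the raw counts. This short sketch recomputes the source overlap, the live rate, and the ratio of verified findings to flagged candidates. The variable names are illustrative, and the last ratio is only a rough yield figure, since not every verified finding originated from an aimap flag (the F6 Ollama port was found by visorgoose).

```python
shodan_ips, censys_ips, combined_unique = 4284, 1025, 5018
live_hosts = 158
critical, high = 20, 19
verified_unauth = 6

# IPs returned by both Shodan and Censys (inclusion-exclusion).
overlap = shodan_ips + censys_ips - combined_unique
live_rate = live_hosts / combined_unique
flagged = critical + high
verified_ratio = verified_unauth / flagged

print(f"overlap between sources: {overlap} IPs")            # 291
print(f"live rate: {live_rate:.1%}")                         # 3.1%
print(f"flagged candidates: {flagged}")                      # 39
print(f"verified per flagged candidate: {verified_ratio:.1%}")  # 15.4%
```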

Candidate insights

Cand #78 — REFUTED. One API / New API do NOT ship an exploitable factory default at population scale. A 9-host sweep returned 0/9 for root/123456; New API requires first-run password setup (setup:true, no shipped password to leave unchanged). The single confirmed host (F1) is an operator outlier. This is a textbook verification-stage save: the pre-verification framing (“10/10 class-level default”) was an extrapolation from one host and is false. Auth-on-default (Insight #40) is confirmed, not bucked.

Cand #79 — CONFIRMED (hard proof). Ollama Connect cloud-model proxying is an unauthenticated subscription-hijack surface. Verified on 108.210.175.159:11434 (deepseek-v4-pro:cloud → ollama.com, unauth) and 121.153.39.157:11434 (cloud models in a 40-model listing). The :cloud suffix + remote_host field in unauth /api/tags is a reliable marker. Impact: paid cloud-subscription drain with zero operator awareness. Related to Insight #49 (shared Ollama Connect portfolio).
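
The marker described in Cand #79 can be checked offline against a saved /api/tags body. This is a minimal sketch, assuming the response is a JSON object with a top-level `models` array whose entries carry `name` and, for cloud-proxied models, `remote_host` (the shape shown in the F6 excerpt). The function name `cloud_proxied_models` is hypothetical.

```python
import json

def cloud_proxied_models(tags_body: str) -> list[str]:
    """Return model names in an /api/tags body that look cloud-proxied."""
    models = json.loads(tags_body).get("models", [])
    hits = []
    for model in models:
        name = model.get("name", "")
        # Require both signals: the :cloud suffix and a remote_host field.
        if name.endswith(":cloud") and model.get("remote_host"):
            hits.append(name)
    return hits

# Sample built from the values reported for 108.210.175.159:11434.
sample = json.dumps({"models": [
    {"name": "deepseek-v4-pro:cloud", "remote_model": "deepseek-v4-pro",
     "remote_host": "https://ollama.com:443"},
    {"name": "deepseek-r1:70b"},
    {"name": "llama3:latest"},
]})
print(cloud_proxied_models(sample))  # ['deepseek-v4-pro:cloud']
```

Requiring both signals keeps a locally tagged model that merely uses a `:cloud` tag name from being counted as a subscription surface.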

Cand #80 — RETRACTED. The “Indonesian government AI exposure” hosts (jatengprov.go.id, kaltaraprov.go.id) are NOT in the Cat-03 corpus. They surfaced because VisorScuba assess scores ledger-wide over nuclide.db (all prior surveys, ~25k events) with no per-survey filter; those are prior Ollama-survey carryover. Not a Cat-03 finding. Lesson: scope VisorScuba output to the survey’s own ingested events before attributing a finding to the survey.

Cand #81 (new) — Framework catch-all FP class recurs in model-serving. Three of this survey’s FPs (GPT Researcher via Gradio /api/report 405, Lunary via generic /api/v1/health, TTS via ZenTao on :8000) are the same structural class as the dcm4chee ASP.NET-catchall and CVAT-IAP-200 FPs: a generic web framework echoing a truthy/non-404 status on the probed path, read by the fingerprint as “endpoint exists.” Fix pattern: anchor fingerprints on a positive body marker (vendor string / real JSON shape), not a non-404 status, and add framework negative-matches (e.g. gradio_config, ZenTao zentaosid cookie).
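
The fix pattern in Cand #81 could look roughly like the sketch below: a fingerprint matches only on a 200 with a positive body marker, and is vetoed by any framework negative-match. The `Fingerprint` class, the `matches` function, and the positive marker string `gpt-researcher` are assumptions made for illustration; they are not aimap's actual rule format or a confirmed GPT Researcher response. The negative markers are the two named in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Fingerprint:
    name: str
    positive_markers: list[str]                                 # must all appear in the body
    negative_markers: list[str] = field(default_factory=list)  # framework tells

def matches(fp: Fingerprint, status: int, body: str, cookies: str = "") -> bool:
    haystack = body + "\n" + cookies
    if any(neg in haystack for neg in fp.negative_markers):
        return False   # a generic framework answered, not the product
    if status != 200:
        return False   # a non-404 status alone (405, 403, ...) proves nothing
    return all(pos in body for pos in fp.positive_markers)

fp = Fingerprint(
    name="GPT Researcher",
    positive_markers=["gpt-researcher"],            # placeholder marker (assumption)
    negative_markers=["gradio_config", "zentaosid"],
)

print(matches(fp, 405, ""))                                      # False
print(matches(fp, 200, "window.gradio_config = {}; gpt-researcher"))  # False
print(matches(fp, 200, '{"app": "gpt-researcher"}'))             # True
```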

Cand #82 (new) — Front-end-secured / backend-exposed asymmetry in enthusiast local-LLM stacks. The hobbyist roleplay stack (SillyTavern + KoboldCpp + Ollama) exhibits a recurring failure shape: the operator authenticates the UI they personally log into (SillyTavern basic auth) and leaves the inference backends that UI depends on (KoboldCpp :5001, Ollama :11434) unauthenticated and independently internet-reachable. The defender tried — this is not naivety — but secured the front door while the dependency graph stayed open. An attacker bypasses the authed UI entirely by hitting the backends directly. The attack surface is the dependency graph, not the front door. Verified on 108.210.175.159 (F5+F6). Severity is amplified when a backend carries a paid Ollama Connect cloud subscription (subscription theft, Cand #79) and/or wildcard CORS (browser drive-by). Defensive framing for reporting: securing the UI is necessary but not sufficient; every inference backend needs its own auth and bind-to-localhost.
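
The asymmetry in Cand #82 can be expressed as a check over a host's observed services. The sketch below flags unauthenticated backends on any host whose front-end is authenticated. The `Service` record, its `role` labels, and the function name `exposed_backends` are hypothetical; the sample data restates the F5+F6 observations.

```python
from dataclasses import dataclass

@dataclass
class Service:
    port: int
    name: str
    role: str    # "frontend" or "backend" (illustrative labels)
    auth: bool   # True if the probe was challenged, e.g. HTTP 401

def exposed_backends(services: list[Service]) -> list[Service]:
    """Backends reachable without auth on a host whose front-end is authed."""
    frontend_authed = any(s.role == "frontend" and s.auth for s in services)
    if not frontend_authed:
        return []   # a different failure shape: nothing on the host is secured
    return [s for s in services if s.role == "backend" and not s.auth]

host = [
    Service(8000, "SillyTavern", "frontend", True),
    Service(5001, "KoboldCpp", "backend", False),
    Service(11434, "Ollama", "backend", False),
]
print([s.port for s in exposed_backends(host)])  # [5001, 11434]
```

Returning an empty list when the front-end is also open is a deliberate choice in this sketch: it isolates the specific "front door locked, backends open" shape rather than reporting every unauthenticated service.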