Guide May 4, 2026

Future Surveys: AI/ML Infrastructure Categories Not Yet Covered

NuClide Research · 2026-05-04, last updated 2026-06-02
Companion to: SYNTHESIS-2026-05.md


The 2026-05/06 survey series covers 35+ platform classes. Several adjacent categories remain unsurveyed and are catalogued here as a roadmap. Each entry includes:

  • Port(s) to masscan
  • Fingerprint (the canonical signature for the probe to use)
  • Auth posture in framework default (Tier-A no-auth-concept, Tier-A* auth-optional-off, Tier-B setup-wizard, Tier-C auth-on-default)
  • Risk class if exposed
  • Status (planned / partial / not-yet)
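The five fields above map naturally onto a small record type. The following is a minimal sketch, not NuClide's actual schema: the names `AuthTier`, `CatalogEntry`, and `masscan_port_arg` are hypothetical and introduced only for illustration. The example row is the Temporal entry from the compute-orchestration table in this guide.

```python
from dataclasses import dataclass
from enum import Enum


class AuthTier(Enum):
    """Auth posture in the framework default, using the tier labels listed above."""
    A = "no-auth-concept"
    A_STAR = "auth-optional-off"
    B = "setup-wizard"
    C = "auth-on-default"


@dataclass(frozen=True)
class CatalogEntry:
    """One roadmap row (hypothetical structure, for illustration only)."""
    platform: str
    ports: tuple[int, ...]   # port(s) to hand to masscan
    fingerprint: str         # canonical signature for the probe
    tier: AuthTier
    risk: str                # risk class if exposed
    status: str              # e.g. "planned", "partial", "not-yet"

    def masscan_port_arg(self) -> str:
        """Render the ports as a comma-separated value for masscan's -p flag."""
        return ",".join(str(p) for p in self.ports)


# Values copied from the Temporal row in the compute-orchestration table.
temporal = CatalogEntry(
    platform="Temporal",
    ports=(7233, 8080),
    fingerprint="GET /api/v1/cluster-info",
    tier=AuthTier.A_STAR,
    risk="Workflow history",
    status="partial",
)
```

With this shape, `temporal.masscan_port_arg()` yields the string that would follow `-p` in the masscan step of the methodology template.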

Anyone running NuClide’s tier-2 cloud range list (/tmp/tier2-all-ranges.txt, Scaleway 7, OVH 33, Linode 36 = 3.55M IPs) can pick a category and run the survey using the same masscan-then-probe pattern documented in the existing case studies.


Priority gaps: still open as of 2026-06-02

Cross-referenced against the 27-category shodan/queries/ index and the completed case-study set. These are the categories with the most untouched surface. Pick from here for a fresh survey:

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7075, S7076, T5858, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, K7052, S7056, S7067, S7069, T5854, T5868, T5882, T5893, T5904
  • overlap (Common AI KSATs (all 5 roles)): K108, K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K7024, K7041, K7045, K942, S7065
| Gap | Category | Why it’s the gap | Tooling state |
| --- | --- | --- | --- |
| K8s-native workflow orchestration (category 29) | Argo Workflows (port 2746) | DONE 2026-05-31: full arsenal run complete. ssl:“Argo Workflows” → 119 IPs; 0/33 unauth via SSL dork (all IAP/AzureAD). Port 2746 Shodan-dark (SYN-ACK-only, no banner). Insight #65 (TLS cert dork selection bias) + Insight #66 (default ports survey-driven). OPEN THREAD: tiptoe/zgrab2 full-handshake grab on port 2746 to measure unmeasured dark-tier population. Case study: argo-workflows-survey-cat29-2026-05-31.md. | aimap v1.9.45 fingerprint shipped; Shodan-dark port 2746 pending tiptoe pass |
| Experiment tracking (category 04, registry half) | W&B self-hosted, ClearML, Comet ML | DONE 2026-05-26 (cat-04 stragglers): ClearML auth-on-default confirmed (81/81); ransomed Elasticsearch at 37.230.233.135; 26/81 version disclosure via server.info. BentoML: narrative.io AWS infra leak. Insight #63 codified. | aimap fingerprints exist; enumPrefect deep enumerator still missing |
| Code assistants (category 09) | Tabby, Sourcegraph/Cody, OpenDevin/Devon, Continue.dev | DONE 2026-05-26: 52 unauth OpenHands, 26-host WhatsApp bot template, Fluid Attacks home-dir leak, HKUST/HKGAI. Case study: openhands-code-assistant-survey-cat09-2026-05-26.md. Tabby ML still Shodan-dark (needs masscan). | aimap fingerprints: OpenHands/Sourcegraph/Sourcebot/Sweep/Tabnine/Dyad/bolt.diy all exist |
| Specialty data layers (category 30) | ClickHouse, Cassandra/ScyllaDB, Apache Pinot, DuckDB-HTTP | DONE 2026-05-28/29: 328 IPs mapped; Snap-E Cabs ScyllaDB unauth (ride-hailing PII), nyovenn fintech ScyllaDB unauth (4yr unpatched), ClickHouse unauth cluster (training data). Insights #60-65 codified. Case studies: specialty-data-layers-survey-2026-05-28/29.md. | aimap fingerprints shipped |
| Vector-DB stragglers (category 02) | pgvector, Redis Stack (vector), Vespa, Apache Solr, LanceDB | PARTIAL DONE: Redis Stack DONE 2026-05-25 (78/78 unauth, Insights #60-61); Weaviate DONE 2026-05-12 (13,631 PII objects, critical); LanceDB/Vespa DONE 2026-05-28 (specialty-data survey); Typesense/Meilisearch DONE 2026-05-28 (Tier-C confirmed). OPEN: pgvector (needs TCP/SQL probe), Apache Solr (no dedicated run). | pgvector needs custom TCP probe; Solr survey outstanding |
| Agent-framework stragglers (category 06) | CrewAI Studio, BabyAGI/SuperAGI, Goose, Agno, GPT Researcher, AgentGPT, Devika | DONE 2026-05-26: Agno auth-off-default (3 confirmed: AIRIAD Risk Advisor, Collision AgentOS+Walmart Temporal, agno-playground); GPT Researcher 14/21 unauth; AgentGPT 3 broken-localhost-OAuth; CrewAI Shodan-dark; SuperAGI all-SaaS; Devika defunct; BabyAGI/Goose CLI-only. Insight #64 codified. aimap v1.9.32-33 shipped. | aimap v1.9.32 fingerprints shipped |
| Specialty domains (medical leg) | NVIDIA Clara, MONAI, Orthanc/DICOM, dcm4che, NIM | DONE 2026-05-15 (Survey 28): 39 Orthanc unauth DICOM SCPs found; Clara/MONAI/NIM negative results on tier-2 cloud. | aimap v1.9.4 fingerprints shipped |
| Specialty domains (robotics leg) | ROS robotics (11311/9090), Jetson edge | Genuinely unmapped: highest-novelty, physical-impact tier for ROS | none |
| Compute-orch leftovers (category 04) | Dask (8787), Prefect (4200), Temporal (7233/8080), BentoML (3000) | DONE 2026-05-26: Prefect auth-off-default (9/15 unauth; Italian LLM procurement + energy grid + MLS pipelines exposed); Dask 6 unauth university/cloud dashboards (Cambridge, UCB, UCSB, DigitalOcean active). Temporal not scanned (Shodan-dark at 7233). Case study: prefect-dask-clearml-cat04-stragglers-2026-05-26.md. | aimap fingerprints: all exist; enumPrefect deep enumerator gap |
| Embeddings, masscan re-run (category 27) | TEI, infinity-embedding (7997), llama.cpp --embedding | Survey ran but Shodan-dark: TEI/infinity return JSON-only roots Shodan can’t index, Shodan pool gave ~1% live rate. Needs a masscan-seeded pass on 7997/8080 instead of Shodan-seeded | survey done Shodan-blind; aimap fingerprints exist |

Detailed port/fingerprint/risk for each is in the per-category sections below.


Compute orchestration / training tier

Most are Tier-A “no auth concept” on the dashboard endpoint. Auth is bolted on by surrounding infra (K8s ingress + auth proxy), not the framework itself.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Ray Dashboard | 8265 | GET / returns Ray UI HTML; GET /api/jobs lists jobs | A | CVE-2023-48022 ShadowRay actively exploited (job-submission RCE); job logs leak | DONE 2026-05-06, see compute-orchestration-cloud-survey-2026-05.md (4 confirmed unauth on Shodan-seeded sample of 26; 16 ports-open-no-match likely Ray Serve, deferred) |
| Dask Dashboard | 8787 | GET /status returns Bokeh-rendered Dask page | A | Cluster topology + worker info disclosure; expensive ops triggerable | DONE 2026-05-26: 6 unauth (Cambridge, UCB, UCSB, DigitalOcean active). See prefect-dask-clearml-cat04-stragglers-2026-05-26.md |
| Apache Spark UI | 4040, 8080 | GET / returns Spark Master / Application UI | A | Job logs + driver state + sometimes credentials in env | DONE 2026-05-06, see compute-orchestration-cloud-survey-2026-05.md (85 confirmed unauth on Shodan-seeded sample of 120 across US/CN/DE/FR; ~71% exposure rate) |
| Apache Airflow | 8080 | GET /login returns Airflow login page; /home discloses dashboard if AnonymousUser public role enabled | A* (auth optional, off-by-default in older versions) | DAG-run history, sometimes plaintext credentials in connections | DONE 2026-05-06, see compute-orchestration-cloud-survey-2026-05.md (8 confirmed unauth-via-/home + ~30 login-gated of 36 confirmed Airflow on Shodan-seeded sample of 57) |
| Prefect | 4200 | GET /api/health returns {"status":"healthy"} | A* | Flow runs + state | DONE 2026-05-26: 9/15 unauth (Italian LLM procurement, energy grid, MLS pipelines exposed). See prefect-dask-clearml-cat04-stragglers-2026-05-26.md |
| Temporal | 7233 (gRPC), 8080 (web UI) | GET /api/v1/cluster-info | A* | Workflow history | PARTIAL: 7233 Shodan-dark; 8080 web UI not yet scanned. Open thread. |
| Kubeflow / KServe | varies (K8s ingress) | /v1/models OpenAPI | varies | Model serving + pipeline metadata | not-yet, K8s ingress profile, separate from cheap-VPS surface |
| BentoML | 3000 | GET / returns BentoML service page; /docs Swagger | A* | Model serving + sometimes file upload | PARTIAL 2026-05-26: narrative.io AWS infra leak via BentoML config; no dedicated population survey yet |
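Most fingerprints in the table above reduce to "this path returns this JSON". A minimal sketch of how a probe might turn one HTTP response into a verdict follows. The function names and the three verdict labels are assumptions for illustration, as is the choice to treat HTTP 401/403 as auth-gated; the expected body is the Prefect value listed in the table.

```python
import json


def json_contains(body: str, expected: dict) -> bool:
    """True if body parses as a JSON object holding every expected key/value."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    return all(doc.get(key) == value for key, value in expected.items())


def classify(status: int, body: str, expected: dict) -> str:
    """Map one HTTP response to a verdict (labels are illustrative)."""
    if status == 200 and json_contains(body, expected):
        return "unauth"        # fingerprint served with no credentials
    if status in (401, 403):
        return "auth-gated"    # something in front of the service demands auth
    return "no-match"          # port open, but not this platform


# Fingerprint for Prefect as listed in the table: GET /api/health
PREFECT_HEALTH = {"status": "healthy"}
```

For example, `classify(200, '{"status":"healthy"}', PREFECT_HEALTH)` returns `"unauth"`, while an HTML error page on the same port falls through to `"no-match"`.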

Embeddings infrastructure

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| TEI (HuggingFace Text Embeddings Inference) | 80, 3000, 8080 | GET /info returns {"model_id":"...","max_concurrent_requests":..., "model_pipeline_tag":"feature-extraction"} | A | Compute theft; model fingerprinting | PARTIAL 2026-05, see embedding-services-cloud-survey-2026-05.md: confirmed Shodan-dark (JSON-only root, not crawler-indexed); 0 live on Shodan-seeded pool. Needs masscan-seeded re-run |
| infinity-embedding (michaelfeil) | 7997 | GET /openapi.json body contains Infinity Emb | A | Compute theft | PARTIAL 2026-05, embedding survey: port 7997 confirmed deployed at ~100-host signal, but Shodan-dark; masscan-seeded probe found 0 (hosts moved off default port / non-HTTP). Re-run candidate |
| llama.cpp HTTP server | 8080 | GET /health returns {"status":"ok"}; GET /props returns model props | A | Compute theft, prompt injection | PARTIAL 2026-05, embedding survey covered --embedding mode; /props deep-enum on full LLM-server mode still open |

Specialty vector DBs

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Weaviate | 8080 | GET /v1/meta returns Weaviate version JSON; GET /v1/schema lists classes | A* (anonymous-access on by default in auth.anonymous_access.enabled=true) | Same as Qdrant, vector data + schema disclosure | DONE 2026-05-12: 13,631 PII objects critical (aimable.ai: 13,631 personal records). See weaviate-cloud-survey-2026-05.md |
| pgvector (PostgreSQL extension) | 5432 | TCP banner + SELECT pgvector_version(); | A* (Postgres auth, depends on operator) | Vector data via SQL injection / weak creds | not-yet: needs TCP/SQL probe; no HTTP surface; custom psql-probe required |
| Redis Stack (with vector search) | 6379 | TCP *1\r\n$4\r\nINFO\r\n returns Redis info | A* (default ALLOW-ANY in dev configs) | Vector + cache + sometimes sessions | DONE 2026-05-25: 78/78 unauth; Insights #60-61 (FT._LIST enumeration + RedisInsight credential leak). See redis-stack-redisinsight-population-survey-2026-05-25.md |
| LanceDB | various | GET /api/v1/database/list | A | RAG store | DONE 2026-05-28: covered in specialty-data-layers survey. Tier-C confirmed (auth-on-default). See specialty-data-layers-survey-2026-05-28.md |
| Vespa | 8080 | GET /state/v1 returns Vespa health JSON | A | Search + vector | DONE 2026-05-28: covered in specialty-data-layers survey. See specialty-data-layers-survey-2026-05-28.md |
| Typesense | 8108 | GET /health returns {"ok":true}; X-TYPESENSE-API-KEY header for auth | A* | Document index + facets | DONE-NEGATIVE 2026-05-28: Tier-C confirmed (auth-on-default, 0% unauth at population scale). Specialty-data-layers survey. |
| Meilisearch | 7700 | GET /health returns {"status":"available"} | A* (master-key auth optional) | Document index | DONE 2026-05-28: specialty-data-layers survey. Auth-optional-off in older versions; population scan run. |
| Apache Solr | 8983 | GET /solr/admin/info/system | A* | Document index + sometimes RCE via velocity templates | not-yet: no dedicated survey; partially seen via elasticsearch cross-survey. |
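The Redis Stack fingerprint in the table is a raw RESP command, not an HTTP request. As a sketch of what that byte string is and how a reply might be read, the helpers below encode a command as a RESP array of bulk strings and inspect the first bytes of the answer. The helper names are hypothetical, and the reply handling assumes the server speaks RESP2, where an unauthenticated `INFO` comes back as a bulk string and a protected server answers with a `-NOAUTH` error.

```python
def encode_resp(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def redis_auth_posture(reply: bytes) -> str:
    """Classify the reply to INFO (labels are illustrative)."""
    if reply.startswith(b"-NOAUTH"):
        return "auth-required"
    if reply.startswith(b"$") and b"redis_version:" in reply:
        return "unauth"
    return "no-match"
```

`encode_resp("INFO")` produces exactly the probe bytes listed in the table; the same helper would encode `FT._LIST` for the index enumeration mentioned in the Redis Stack status cell.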

LLM observability / tracing

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Langfuse | 3000 | GET /api/public/health returns Langfuse health JSON | C (auth-on-default) | LLM trace history if signup-open | PARTIAL 2026-05-06, single-host case study via cross-survey-correlation methodology (langfuse-cross-survey-2026-05-06.md). 1 confirmed hit (operator shifted to port 3001; 4-platform AI-stack catastrophe at pharos.unistarthubs.gr). Full population survey (Shodan dork "Langfuse" port:3000 ≈ 1,131 hits) deferred until Shodan API restored |
| Phoenix (Arize) | 6006 | GET /v1/traces OTLP JSON | A | LLM call traces, sometimes PII in prompts | DONE 2026-05-04, see observability-cloud-survey-2026-05.md (6 confirmed Phoenix + 3 TensorBoard, all unauth, active SDXL distillation training visible) |
| Helicone | varies | gateway pattern, proxy logs | A* | LLM call history | DONE 2026-05-10: deep-dive survey. See helicone-llm-observability-survey-2026-05-10.md |
| TruLens self-hosted | varies | dashboard fingerprint | A* | Eval traces | SURVEYED 2026-05-28: 0 confirmed; 1 title hit (trulens.asia = Cambodian news site, FP); 1 cert CN hit (vits-simple-api TTS, FP); population confirmed near-zero |

Image generation / vision (beyond port 7860 surveyed)

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| ComfyUI | 8188 | GET /system_stats returns GPU info; GET /queue lists running jobs | A | Compute theft + workflow exfil + GPU info | DONE 2026-05-04, see comfyui-cloud-survey-2026-05.md (6 confirmed, 100% unauth, 385 GB VRAM exposed including RTX PRO 6000 Blackwell) |
| Roboflow self-hosted | varies | API key required | C | Custom model serving | not-yet |
| YOLOv8 / MMDetection inference servers | varies (often 8000) | Custom HTTP API | A* | Compute theft, prompt injection (multimodal) | partial, some seen via Triton survey |

Speech & Audio AI (survey 17: DONE 2026-05-29)

  • Survey-17 query catalog: shodan/queries/17-voice-audio-ai.md
  • Discovery runbook: data/voice-audio-ai-discovery-runbook.sh
  • aimap fingerprints added (10 new, count went 56 → 66): Whisper ASR, Coqui XTTS, Piper TTS, RVC Voice Cloning WebUI, OpenVoice, ChatTTS, F5-TTS, Pipecat Voice Agent, Vocode Voice Agent, LiveKit Agents.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Whisper ASR family | 9000, 8080, 7860, 8000 | /asr or /inference or /v1/audio/transcriptions | A | Free transcription compute theft; PHI/PII in audio captured by hospital deployments | DONE 2026-05-28/29: population survey complete. See voice-audio-ai-survey-2026-05-28.md + voice-audio-ai-rerun-2026-05-29.md. 45+ unauth confirmed. Insight #67. |
| Coqui XTTS server | 8020, 5002 | GET /api/tts/speakers returns speaker list | A | Compute theft (voice cloning), trademark/voice misuse | DONE 2026-05-28/29: covered in voice/audio survey. |
| Piper TTS HTTP wrapper | 5000, 8080, 10200 | GET / body contains piper + tts | A | Edge-deployed; compute theft | DONE 2026-05-28/29: covered in voice/audio survey. |
| RVC / GPT-SoVITS / Applio voice cloning | 7865, 7860, 7897 | GET / body contains Retrieval-based-Voice-Conversion / GPT-SoVITS / Applio | A | Fraud-relevant: voice cloning Gradio UIs; trademark abuse + deepfake-call enablement | DONE 2026-05-28/29: covered in voice/audio survey. Multiple unauth voice-cloning UIs confirmed. |
| OpenVoice (MyShell.ai) | 7860, 8000 | GET / body contains OpenVoice + myshell | A | Multi-language voice cloning compute theft | DONE 2026-05-28/29: covered in voice/audio survey. |
| ChatTTS (2noise) | 7860, 8000, 9966 | GET / body contains ChatTTS + 2noise | A | Conversational TTS compute theft | DONE 2026-05-28/29: covered in voice/audio survey. |
| F5-TTS / E2-TTS | 7860, 8000 | GET / body contains F5-TTS or swivid/f5-tts | A | Voice-cloning compute theft | DONE 2026-05-28/29: covered in voice/audio survey. |
| Pipecat (Daily.co) | 7860, 8000, 8080 | GET / body contains pipecat | A* | Real-time voice-agent abuse: outbound call automation if integrated with Twilio/Daily | DONE 2026-05-28/29: covered in voice/audio survey. |
| Vocode | 8000, 3000, 7860 | GET / body contains vocode + transcriber | A* | Same as Pipecat | DONE 2026-05-28/29: covered in voice/audio survey. |
| LiveKit Agents | 7880, 8080, 3000 | GET / body contains livekit-agents or livekit-server | A* | Same | DONE 2026-05-28/29: covered in voice/audio survey. |
| Mozilla TTS / Coqui TTS legacy | 5002 | GET /api/tts | A | Same | covered under Coqui XTTS fingerprint (alt port) |
| Bark / MusicGen Gradio UIs | 7860 | GET / returns Gradio UI | A | Compute theft | covered by Gradio + body_contains discriminator queries in 17-voice-audio-ai.md |
| pyAnnote diarization | varies | Custom HTTP API | A | Speaker-ID compute theft | not-yet (no canonical HTTP server pattern) |
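Many of these platforms share port 7860 and a Gradio front end, so the fingerprints above lean on body-substring discriminators ("piper + tts", "F5-TTS or swivid/f5-tts"). The sketch below shows one way to express "all of these tokens" and "any of these alternatives" in a single rule. The rule encoding and the function name are assumptions, as is the case-insensitive comparison; this is not the aimap rule format.

```python
def body_matches(body: str, alternatives: list[list[str]]) -> bool:
    """True if at least one alternative has all of its tokens present in body.

    Each inner list is an AND group ("piper + tts"); the outer list is an OR
    over groups ("F5-TTS or swivid/f5-tts"). Matching ignores case.
    """
    haystack = body.lower()
    return any(
        all(token.lower() in haystack for token in group)
        for group in alternatives
    )


# Rules transcribed from the fingerprint column above.
PIPER_RULE = [["piper", "tts"]]
F5_TTS_RULE = [["F5-TTS"], ["swivid/f5-tts"]]
OPENVOICE_RULE = [["OpenVoice", "myshell"]]
```

A bare Gradio page with none of the tokens matches no rule, which is the point of the discriminator: the port and framework alone do not identify the platform.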

ML lifecycle / model registries

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| W&B self-hosted | 8080, 443 | GET /api/health returns {"version":"..."} | C (auth-on-default) | Experiment data if signup-open | not-yet |
| ClearML server | 8080, 8081, 8008 | GET /version returns ClearML version | A* | Experiment data | DONE 2026-05-26: 81/81 auth-on-default confirmed (Tier-C). Insight #63. See prefect-dask-clearml-cat04-stragglers-2026-05-26.md |
| Comet ML self-hosted | varies | API token required | C | Experiment data | not-yet |
| Neptune.ai | varies | API token required | C | Experiment data | not-yet, managed-mostly |
| DVC remote storage | S3-compat | bucket-policy depends on operator | varies | Model artifacts, training data | partial, covered by MinIO survey |

Agent platforms (newer / autonomy)

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| AutoGen Studio | 8081 | GET / returns AutoGen Studio UI; GET /api/agents | A* | Agent definitions + sometimes credentials in tools | DONE 2026-05-14, see autogen-studio-survey-2026-05-14.md (9 confirmed, 100% unauth; /api/teams leaking agent defs + tool creds on 7/9; produced Insight #21 port-first discovery) |
| CrewAI Studio | varies | dashboard fingerprint | A* | Agent definitions | DONE 2026-05-26: Shodan-dark; no web UI; CLI-only distribution confirmed. See agno-gptresearcher-agentgpt-cat06-stragglers-2026-05-26.md |
| LangGraph servers | 8000 (uvicorn) | server: uvicorn + JSON body contains “langgraph” | A* | Financial workflows, PII scraper, conversation history (user_conversations Qdrant) | DONE 2026-05-25, see langgraph-server-survey-2026-05-25.md (16 confirmed, 100% unauth; 7 stacked-exposure hosts; 4 templates; Insight #56) |
| BabyAGI / SuperAGI | varies | dashboard fingerprint | A* | Agent state, sometimes API keys | DONE 2026-05-26: SuperAGI all-SaaS (no self-hosted surface); BabyAGI CLI-only (no server mode). See agno-gptresearcher-agentgpt-cat06-stragglers-2026-05-26.md |
| Goose (Block) | varies | Custom config endpoint; goose- HTTP signatures | A* | Agent definitions, sometimes embedded credentials in extensions | DONE 2026-05-26: CLI-only; no server mode confirmed. See agno-gptresearcher-agentgpt-cat06-stragglers-2026-05-26.md |
| AutoGPT-derivative server modes | varies | Dashboard or /api/agent/* routes | A* | Agent state, embedded keys | not-yet |

Specialty data layers (often AI-adjacent)

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| ClickHouse | 8123 (HTTP), 9000 (TCP) | GET /?query=SELECT+1 returns 1; HTTP banner ClickHouse- | A* | OLAP query access, sometimes including AI training datasets | DONE 2026-05-28: unauth ClickHouse cluster (training data exposed). See specialty-data-layers-survey-2026-05-28.md |
| DuckDB HTTP server | varies | Custom HTTP API | A* | Embedded analytics queries | DONE-NEGATIVE 2026-05-28: 7 Shodan hits, all FP (unrelated services on port). No confirmed DuckDB-HTTP deployments at population scale. |
| Cassandra / ScyllaDB | 9042 (CQL native), 7000 (gossip) | TCP banner + SELECT release_version FROM system.local | A* | NoSQL data + sometimes AI feature stores | DONE 2026-05-28: Snap-E Cabs ScyllaDB unauth (ride-hailing PII); nyovenn fintech ScyllaDB (4yr unpatched). See specialty-data-layers-survey-2026-05-28.md |
| Apache Pinot | 9000 (controller), 8000 | GET /cluster/info | A* | Real-time analytics | not-yet: port 9000 collision unresolved; no dedicated survey run |

Dev-tooling AI / coding agents

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Continue.dev servers | varies | Custom config endpoint | A* | LLM proxy abuse | not-yet |
| Tabby self-hosted | 8080 | GET / returns Tabby UI; GET /v1beta/health | A* | Code-completion compute theft | not-yet |
| Sourcegraph self-hosted (Cody backend) | 7080, 3080 | GET /.api/graphql returns Sourcegraph schema; Cody integration via HTTP+SSE | C | Code-context exfil, sometimes private-repo access via Cody session tokens | not-yet, passing mentions in repo |
| OpenDevin / Devon agent backends | 3000, 8000 | GET / returns OpenDevin UI; /api/options/models | A* | Autonomous-agent control, sandbox escape if Docker-on-host | DONE 2026-05-26: 52 unauth OpenHands (OpenDevin rebranded); 26-host WhatsApp bot template; HKUST/HKGAI. See openhands-code-assistant-survey-cat09-2026-05-26.md |
| Devstral self-hosted | varies | Custom HTTP API | A* | Code-completion compute theft | not-yet |
| Aider | typically not server-mode | n/a | n/a | n/a | not-applicable (CLI-only) |

Specialty domains

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| NVIDIA Clara (medical AI) | varies | Triton-class APIs | A* | Medical-data compute theft | DONE-NEGATIVE 2026-05-15, see medical-edge-ai-survey-2026-05-15.md. No Clara on tier-2 cloud; Clara is K8s-bound, on-prem, or in HIPAA-specialized hosting. Negative result confirms category-class tenancy. |
| MONAI Deploy / Label | 8000 | /info/ JSON with trainers+strategies+scoring+datastore | A* | Medical-imaging label/model data | DONE-NEGATIVE 2026-05-15: aimap fingerprint added (v1.9.4); zero confirmed on tier-2 cloud. Same on-prem tenancy as Clara. |
| Orthanc DICOM | 4242/8042/8043/11112 | DICOM A-ASSOCIATE-AC PDU with “ORTHANC” AE | A* (DICOM TCP unauth, HTTP REST auth-on) | PHI via C-FIND/C-MOVE/C-STORE | DONE 2026-05-15: 39 confirmed unauth DICOM SCPs (post 2-pass honeypot filter), 11 named operators. Produced Insight #22. |
| dcm4che / dcm4chee-arc | 8080/8443 | /dcm4chee-arc/aets array | C (Keycloak-fronted) | Same as Orthanc | aimap fingerprint added; zero confirmed on tier-2 (Keycloak-fronted = real deployments behind enterprise IAM). |
| DICOMweb (QIDO-RS) | 8080/8042/443 | /studies array + DICOM tag 0020000D | A* | PHI | aimap fingerprint added; zero confirmed on tier-2. |
| NVIDIA NIM | 8000 | /v1/metadata with modelInfo array | A* | Paid quota theft + model gating bypass | aimap fingerprint added; zero confirmed on tier-2 (NIM is GPU-bound: runs on H100/A100 hosting, not commodity cloud). |
| ROS interfaces | 11311 (master), 9090 (rosbridge) | XML-RPC banner | A | Robot fleet control | not-yet |
| TensorRT inference servers | varies | Custom HTTP API | A* | Compute theft | not-yet, partial via Triton |
| Jetson endpoints | varies | Custom edge-AI protocols | A | Compute / sensor theft | not-yet |

MCP (Model Context Protocol) servers

The newest exposure surface in the AI stack. MCP was designed for stdio (in-process) transport, but the ecosystem pushed HTTP/SSE for remote access. Operators who wire filesystem, shell, database, and cloud-API tools into MCP servers and expose them without auth replay the unauthenticated-RPC failure pattern at the protocol layer.

Existing scaffolding: shodan/queries/10-mcp-servers.md, 8 fingerprint queries already documented. n8n cross-reference (n8n-cloud-survey-2026-05.md) counted ~400 instances exposing MCP endpoints, but no dedicated population-level survey yet.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| MCP HTTP+SSE servers (generic) | 3000, 8000, 8080, 8888 | JSON-RPC initialize handshake; tools/list enumerates exposed tools | A* (auth-optional, off-by-default in most templates) | Tool-surface exfil, credential leak in tool definitions, sometimes shell/filesystem/db/cloud-API access | DONE 2026-05-04, see mcp-cloud-survey-2026-05.md (95 confirmed cross-cloud, 28 with exposed tools incl. full Gmail mailbox MCP, Alcy CRM CRUD, hindsight-mcp v3.1.1 with 29 memory tools, 3× Casdoor IAM CRUD, rmcp Elasticsearch proxy) |
| FastMCP (Python framework) | 8000 | "FastMCP" "uvicorn" Shodan | A* | Same | not-yet |
| mcp-proxy (stdio-to-HTTP bridge) | 8080 | "mcp-proxy" | A | Bridges local stdio MCP to HTTP, expanding exposure | not-yet |
| HexStrike AI (offensive MCP) | 8888 (Flask), 11434 (Ollama) | "hexstrike" HTML / model name | A | 47 MCP tools wiring 150+ security tools to LLMs | partial, see shodan/queries/10-mcp-servers.md |
| Cloudflare Workers MCP | 443 | *.workers.dev SSE endpoints | varies | Per-Worker auth posture | not-yet, cert-transparency enumeration vector |
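The generic MCP fingerprint above is a two-message JSON-RPC 2.0 exchange: an `initialize` handshake followed by `tools/list`. The sketch below builds both request bodies and extracts tool names from a `tools/list` result. It is an illustration under assumptions: the function names and the `clientInfo` values are invented, `2024-11-05` is one published protocol revision a server may or may not accept, and the transport (where to POST, how the SSE stream is read) is left out entirely.

```python
import json


def initialize_request(req_id: int = 1) -> dict:
    """JSON-RPC body for the MCP initialize handshake."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",   # assumed revision
            "capabilities": {},
            "clientInfo": {"name": "survey-probe", "version": "0.0"},  # placeholder
        },
    }


def tools_list_request(req_id: int = 2) -> dict:
    """JSON-RPC body that asks the server to enumerate its tools."""
    return {"jsonrpc": "2.0", "id": req_id, "method": "tools/list"}


def tool_names(response_text: str) -> list[str]:
    """Pull tool names out of a tools/list response; empty list on any mismatch."""
    try:
        doc = json.loads(response_text)
    except ValueError:
        return []
    result = doc.get("result") if isinstance(doc, dict) else None
    tools = result.get("tools") if isinstance(result, dict) else None
    if not isinstance(tools, list):
        return []
    return [t["name"] for t in tools if isinstance(t, dict) and "name" in t]
```

A server that answers `tools/list` with names such as `read_file` or `run_shell` without asking for credentials is the "exposed tools" case counted in the status column. Note that a conforming client also sends a `notifications/initialized` message between the two requests.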

LLM gateways / OpenAI-compat proxies

These mirror the vLLM-survey reseller-proxy finding (vllm-cloud-survey-2026-05.md documented 10 commercial-API reseller proxies burning operator credit). The operator tier is different: gateway products run alongside or in front of upstream LLM providers, exposing provider keys and quota.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| LiteLLM Proxy | 4000 | GET /health/liveliness; litellm: Prometheus prefix | A* | Provider key leak, quota theft, OpenAI-compat reseller pattern | DONE 2026-05-04, see llm-gateways-cloud-survey-2026-05.md (1,899 cross-cloud confirmed, 1,857 burnable unauth, including 2 Anthropic-key-functional hosts) |
| LocalAI | 8080 | GET /readyz returns OK; /v1/models OpenAI-compat | A* | Self-host LLM gateway, model-list enumeration | DONE 2026-05-04, folded into LLM Gateways survey above |
| Text Generation WebUI / oobabooga | 5000, 7860 | GET /api/v1/model returns model name; Gradio/FastAPI dual-stack | A* | Self-host inference, gradio surface | DONE 2026-05-04, folded into LLM Gateways survey |
| LM Studio server mode | 1234 (default), varies | GET /v1/models OpenAI-compat | A | Compute theft + model-list | DONE 2026-05-04, 318 confirmed (LM Studio survey leg of LLM Gateways) |
| Jan AI server mode | 1337 (default) | GET /v1/models OpenAI-compat; Jan-specific model paths | A | Same | DONE 2026-05-04, 126 confirmed (Jan AI / Cortex leg of LLM Gateways) |
| OneAPI / NewAPI | 3000 | OpenAI-compat gateway with admin UI | A* | Provider keys, quota theft | DONE 2026-05-04, folded into LLM Gateways survey |
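Several rows above share the same OpenAI-compatible `GET /v1/models` fingerprint. A short sketch of reading that response follows, assuming the usual list envelope with a `data` array of objects carrying an `id` field. The function name is hypothetical, and a real probe would combine this with product-specific checks to tell the gateways apart.

```python
import json


def model_ids(body: str) -> list[str]:
    """Return the model ids from an OpenAI-compatible /v1/models body.

    Returns an empty list when the body is not JSON or lacks the expected
    {"data": [{"id": ...}, ...]} shape.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return []
    data = doc.get("data") if isinstance(doc, dict) else None
    if not isinstance(data, list):
        return []
    return [
        item["id"]
        for item in data
        if isinstance(item, dict) and isinstance(item.get("id"), str)
    ]
```

A non-empty list returned without any `Authorization` header sent is the model-list enumeration the risk column refers to.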

RAG framework servers

The pipeline above the vector DBs. RAG framework servers store embedded prompts, retrieval logic, and the bridge between document corpora and LLM calls. Exposing the framework, even with the underlying vector DB locked down, leaks prompt structure, system prompts, and operator data-flow.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| LlamaIndex servers | 8000, 80 | GET /api/health; llama_index in OpenAPI | A* | Prompt + retrieval logic exfil | partial, passing references in repo, no survey |
| Haystack (deepset) | 8000 | GET /initialized returns {"initialized":true}; FastAPI surface | A* | Pipeline definitions, embedded prompts | partial, passing references |
| LightRAG | 9621 (default), varies | GET /health; LightRAG-specific endpoints | A | RAG store + retrieval surface | not-yet, secondary priority after MCP |
| Microsoft GraphRAG | varies | Custom HTTP API | A* | Knowledge graph + embedded prompts | not-yet |
| AnythingLLM | 3001 | GET /api/ping returns pong | A* | RAG admin + sometimes embedded creds | not-yet, supported in aiapp-probe.py |
| RAGFlow | 9380 | GET /v1/health; FastAPI | A* | Document pipeline | not-yet, supported in aiapp-probe.py |
| PrivateGPT / LocalGPT | 8001, 8000 | GET /health | A* | Self-host RAG | not-yet |

AI safety evaluation / red-team self-hosted

The finding corpus held by these tools may itself be sensitive when exposed. Adversarial prompt libraries, evaluator outputs, and red-team test results often contain proprietary attack vectors that operators don’t want public.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Garak (NVIDIA adversarial harness) | varies | CLI-mode primary; some web UIs | A* | Adversarial probe library, eval results | fingerprint added to aimap 2026-05-05 (/api/v1/garak/version + json_field: garak_version); 0 confirmed at population scale on tier-2 cloud sample; CLI deployment dominates |
| Promptfoo evaluators | 15500 (default) | GET /api/health; promptfoo-specific endpoints | A* | Eval-run history, model-comparison data | DONE 2026-05-28: 4 confirmed unauth (17 title hits; 24% unauth rate); F1 = evals.dev.generalwisdom.com (60 LLM providers, Anthropic Claude 4.x + Azure GPT-4o configs exposed); F2-F4 = dev/test instances. See ai-eval-redteam-survey-2026-05-28.md |
| Patronus AI (managed-mostly) | varies | API token required | C | Eval artifacts | not-yet |
| AILuminate (MLCommons) | varies | Custom | varies | Benchmark data | not-yet, limited self-host |
| DeepEval / Confident AI | varies | Custom HTTP API | A* | Eval runs | SURVEYED 2026-05-28: 0 confirmed (enterprise-only self-hosted; no Shodan hits on title dork; population near-zero confirmed) |

Browser automation / agent backends

Headless-Chrome endpoints used by agent stacks. Misconfigured ones offer remote browser control as a service: attackers can drive scraping, credential harvesting, or cloud-resource abuse using the operator’s compute and IP reputation.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Browserless | 3000, 8000 | GET /json/version returns Chrome DevTools Protocol info | A | Remote browser control, session/cookie exfil if shared, scraping abuse | DONE 2026-05-14, see browser-automation-backend-survey-2026-05-14.md (374 confirmed, v1 Docker monoculture) |
| Playwright server | 3000, 8931 (MCP) | GET /json/protocol returns CDP | A | Same | DONE 2026-05-14, folded into browser-automation backend survey |
| Skyvern | 8000 | GET /api/v1/health; Skyvern-specific endpoints | A* | Browser-AI agent control, sometimes credentials in workflow definitions | not-yet |
| Puppeteer / CDP endpoints | 9222 (CDP default) | GET /json/version | A | Direct CDP access, authenticated-session exfil | DONE 2026-05-14, see cdp-browser-control-survey-2026-05-14.md (6 real of 1.5k candidates; 3 critical incl. live OnlyFans + Ticketmaster sessions; aimap v1.9.2 anti-detect CDP enumerator added) |
| Selenium Grid | 4444 | GET /wd/hub/status returns Selenium status | A | Browser fleet abuse | DONE 2026-05-14, folded into browser-automation backend survey (1.6k grids, H4Y operator) |
| Splash | 8050 | GET /_debug returns Splash debug page | A | Render-service abuse, SSRF | DONE 2026-05-14, folded into browser-automation backend survey (139 instances, 97% leaking /_debug) |
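The `GET /json/version` fingerprint used by the Browserless and CDP rows returns a small JSON object. The sketch below checks for the `webSocketDebuggerUrl` field, which is what makes the endpoint controllable rather than merely identifiable. The function name and the shape of the returned summary are assumptions for illustration.

```python
import json
from typing import Optional


def cdp_exposure(body: str) -> Optional[dict]:
    """Summarise a /json/version body, or return None if it is not CDP.

    Requires a ws:// or wss:// webSocketDebuggerUrl, since that URL is the
    handle a client would connect to in order to drive the browser.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    if not isinstance(doc, dict):
        return None
    ws_url = doc.get("webSocketDebuggerUrl")
    if not isinstance(ws_url, str) or not ws_url.startswith(("ws://", "wss://")):
        return None
    return {
        "browser": doc.get("Browser"),
        "protocol": doc.get("Protocol-Version"),
        "ws_url": ws_url,
    }
```

Filtering on the debugger URL is one way to separate real browsers from pages that merely echo a Chrome-like banner, which matters given the "6 real of 1.5k candidates" ratio reported in the CDP row.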

Data labeling / annotation servers

Often exposed in ML team workflows; PII frequently in their datasets. Operators stand up labeling tools quickly to crowd-source annotation, then forget to lock them down before walking away.

Population survey DONE 2026-05, see data-labeling-cloud-survey-2026-05.md. Per-platform table below retained for fingerprint reference.

| Platform | Port | Fingerprint | Tier | Risk | Status |
| --- | --- | --- | --- | --- | --- |
| Argilla | 6900 (default) | GET /api/_info returns Argilla version | A* | Dataset content (often PII), labeled examples, sometimes embedded model outputs | DONE 2026-05, data-labeling survey |
| LabelStudio | 8080 | GET /version returns LabelStudio version | A* | Dataset content + project structure | DONE 2026-05, data-labeling survey |
| Prodigy (Explosion AI) | 8080 | GET / returns Prodigy UI | A* | Dataset + annotator credentials | covered by data-labeling survey scope |
| doccano | 8000 | GET /v1/health returns OK | A* | NLP annotation projects | covered by data-labeling survey scope |
| CVAT (Computer Vision Annotation Tool) | 8080 | GET /api/server/about | A* | Image/video annotation projects, sometimes facial PII | covered by data-labeling survey scope |

Methodology template

For any platform above, the probe pattern is:

# 1. Masscan the canonical port across the tier-2 cloud /16 ranges
sudo masscan -iL /tmp/tier2-all-ranges.txt -p<port> --rate 10000 --wait 5 -oG /tmp/<platform>-masscan.txt

# 2. Filter to unique IPs
awk '/Host:/ {print $4}' /tmp/<platform>-masscan.txt | sort -u > /tmp/<platform>-ips.txt

# 3. Run the framework-specific fingerprint probe (200-thread Python)
/home/cowboy/security-tools/bin/python3 /tmp/<platform>-probe.py < /tmp/<platform>-ips.txt > /tmp/<platform>-confirmed.jsonl

# 4. Filter AS63949 honeypot fleet pollution if probe is permissive
#    (comm requires sorted input on both sides, so sort the detector output first)
/home/cowboy/security-tools/bin/python3 /tmp/honeypot-detector.py < /tmp/<platform>-ips.txt | sort -u | comm -23 /tmp/<platform>-ips.txt -

# 5. Cert-pivot identified hosts on port 443 for operator attribution
#    Input: /tmp/<platform>-confirmed-ips.txt, one IP per line, taken from the
#    confirmed results that survive step 4 (extraction depends on the probe's output schema).
#    Output: IP, a tab, then the last subjectAltName line of the certificate.
while read -r ip; do
  cn=$(timeout 4 bash -c "echo | openssl s_client -connect $ip:443 -servername $ip 2>/dev/null" | openssl x509 -noout -ext subjectAltName 2>/dev/null | tail -1)
  printf '%s\t%s\n' "$ip" "$cn"
done < /tmp/<platform>-confirmed-ips.txt

The existing case studies serve as templates; speech-audio-cloud-survey-2026-05.md is the most recent example following this pattern.
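Steps 2 through 4 of the template can also be sketched in Python, which is where the per-platform probes live. The skeleton below is an illustration only and does not reproduce any NuClide script: `parse_masscan_grepable`, `run_probe`, and `drop_honeypots` are hypothetical names, `probe_one` stands in for the framework-specific fingerprint check, and results are assumed to be dicts with an `ip` key. The parser looks for the token after `Host:` instead of relying on a fixed column.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def parse_masscan_grepable(lines: Iterable[str]) -> list[str]:
    """Step 2: collect unique IPs from masscan -oG output lines."""
    ips = set()
    for line in lines:
        tokens = line.split()
        if "Host:" in tokens:
            idx = tokens.index("Host:")
            if idx + 1 < len(tokens):
                ips.add(tokens[idx + 1])
    return sorted(ips)


def run_probe(
    ips: Iterable[str],
    probe_one: Callable[[str], Optional[dict]],
    workers: int = 200,
) -> list[dict]:
    """Step 3: run probe_one(ip) on a thread pool, keeping non-None results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [result for result in pool.map(probe_one, ips) if result is not None]


def drop_honeypots(confirmed: list[dict], honeypot_ips: Iterable[str]) -> list[dict]:
    """Step 4: remove confirmed results whose IP was flagged as a honeypot."""
    flagged = set(honeypot_ips)
    return [result for result in confirmed if result["ip"] not in flagged]
```

Passing the probe in as a callable keeps the pool logic identical across platforms; only `probe_one` changes per survey. Writing each surviving result with `json.dumps` on its own line would give the JSONL file that step 3 of the shell template produces.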


Completed surveys not previously listed here

These categories were surveyed after this doc was last updated (2026-05-04) and were never added to the roadmap. Recorded here for completeness.

| Category | Survey date | Key finding | Case study |
| --- | --- | --- | --- |
| AI Gateways (Cat-32: Portkey, Kong, Bifrost, one-api, new-api, LiteLLM, TensorZero, Helicone, Envoy, sub2api) | 2026-06-01 | 87 Envoy admin interfaces CRITICAL (unauth /config_dump leaks all upstream API keys); 13,456 new-api instances; 1,786 surface-open total. Insight #74 (gateway as master-key multiplier), #75 (HTTP admin ports kill cert-pivot). | data/findings-breakdown-ai-gateways-2026-06-01.txt; nuclide.db #36255-#36848 |
| Service Mesh introspection planes (Cat-33: Kiali, Linkerd, Cilium Hubble, Istio) | 2026-05-31 | Kiali anonymous strategy: 4/4 reachable = full namespace topology unauth. Cilium Hubble metrics unauth (9 Hubble UI exposed). Insight #71 (network-placement-as-auth). | service-mesh-survey-2026-05-31.md |
| Auth Gateways (OPA, Authentik, Authelia, Keycloak self-hosted) | 2026-05-29 | Survey complete; coverage in auth-gateway-survey-2026-05-29.md. | auth-gateway-survey-2026-05-29.md |
| ML Governance / Data Catalog (DataHub, Amundsen, Marquez, Atlas) | 2026-05-29 | 31 violations fixed; ML governance surfaces assessed. Insight series codified. | ml-governance-survey-2026-05-29.md |
| FinOps / Cost Analytics (Kubecost, OpenCost) | 2026-05-28 | 67 unauth cost APIs; kc5-aws live Grafana credential = Extreme Networks (EXTR). Insight (recon-primitive). | kubecost-opencost-finops-cost-api-survey-2026-05-28.md |
| Model Serving Management (BentoML serving, Ray Serve, Triton management) | 2026-05-28/29 | Survey complete. See model-serving-registry-survey-2026-05-28.md. | model-serving-management-survey-2026-05-29.md |
| Safety Guardrails (Lakera Guard, Rebuff, NeMo Guardrails, Guardrails AI) | 2026-05-29 | Lakera Guard 15 dorks; population assessment complete. | safety-guardrail-survey-2026-05-29.md |
| Classical ML (scikit-learn serving, XGBoost, ONNX runtime) | 2026-05-31 | Platform intel doc at data/platform-intel/classical-ml-osint-2026-05-31.md. Survey pending full arsenal run. | data/platform-intel/classical-ml-osint-2026-05-31.md |
| MCP subplatforms (FastMCP, mcp-proxy, Cloudflare Workers MCP) | 2026-05-31 | Intel docs built; dedicated population survey pending. | data/platform-intel/mcp-*.md |
| RAG stragglers (AnythingLLM, RAGFlow, R2R, Cognita) | 2026-05-31 | 19 findings; Censys port+uvicorn pattern identified. See rag-frameworks-survey-cat07-2026-05-31.md. | rag-stragglers-survey-2026-05-29.md + rag-frameworks-survey-cat07-2026-05-31.md |

Why this list exists

The auth-on-default thesis predicts: for any framework that ships without authentication enabled by default, the population-scale deployment will be unauthenticated. Each unsurveyed platform above is an opportunity to either:

  1. Confirm the thesis on a new platform class (extends the evidence base)
  2. Falsify the thesis if a platform with auth-off-default ships ~0% unauth at population scale (would be a meaningful counter-example, none observed yet)
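The two outcomes above come down to one ratio per survey: unauthenticated hosts over confirmed hosts. The sketch below computes that ratio and applies a verdict. The 5% cut-off for "~0% unauth" is an assumption chosen for illustration, as are the function names and verdict labels; the document does not define a numeric threshold.

```python
def unauth_rate(unauth: int, confirmed: int) -> float:
    """Fraction of confirmed hosts that answered without authentication."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    if not 0 <= unauth <= confirmed:
        raise ValueError("unauth must be between 0 and confirmed")
    return unauth / confirmed


def thesis_verdict(ships_auth_off: bool, rate: float, near_zero: float = 0.05) -> str:
    """Compare one survey result with the thesis (threshold is an assumption)."""
    if not ships_auth_off:
        return "not-applicable"     # thesis only speaks to auth-off defaults
    if rate <= near_zero:
        return "counter-example"    # auth-off default, yet ~0% unauth
    return "confirms"


# Figures quoted in this guide: Spark UI 85 of 120, Prefect 9 of 15.
spark = unauth_rate(85, 120)
prefect = unauth_rate(9, 15)
```

Here `round(spark * 100)` gives 71, matching the ~71% exposure rate in the Spark row, and `thesis_verdict(True, prefect)` returns `"confirms"`.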

The list also acts as a roadmap for any contributor who wants to add coverage. NuClide’s tooling (aimap, recongraph, BARE) already covers many of the fingerprints above; running them at population scale on tier-2 cloud ranges is the work product.


See also