Corpus index built 2026-06-07 23:58 UTC
research corpus · SOURCE DATA · 37 categories · 9 layers · 84 methodology insights

The research corpus.

§ 01 Reference topology
Layer 09 · Chat UIs · user
Open WebUI, AnythingLLM, LobeChat, LibreChat, custom front-ends
3,400+ unauthenticated chat front-ends
Layer 08 · Agent / RAG APIs · orchestration
LiteLLM, LangServe, LangFlow, Flowise, custom routers
1,200+ open Agent / RAG endpoints
Layer 07 · Model servers · inference
Ollama, llama.cpp, vLLM, TGI, Triton, LocalAI
16,473 unauthenticated Ollama · 1,200+ vLLM
Layer 06 · Vector DBs · retrieval
Qdrant, Milvus, Weaviate, Chroma, Pinecone (hosted)
2,100+ open vector indices
Layer 05 · Search / docs · retrieval
Elasticsearch, ClickHouse, Solr, Meilisearch, Typesense
5,037 ES with dense_vector schema
Layer 04 · Browser automation · agents
Browserless, Selenium Grid, Playwright, CDP proxies, ComfyUI
548 unauthenticated ComfyUI · 6 live CDP sessions
Layer 03 · Data layer · storage
Postgres, MongoDB, MinIO / S3, Redis, etcd, Vault
3,014 etcd · 912 Vault · 4,105 Consul
Layer 02 · Orchestration · compute
Kubernetes, Docker Compose, Nomad, systemd
Docker defaults are the proximate cause across most layers above
Layer 01 · GPU compute · hardware
H100, H200, L40S, A100, RTX 5090, consumer cards
10× L40S in one fleet observed
the public IPv4 internet
§ 02 Ports
port · service · note
80 / 443 · Generic HTTP(S), Dify, Flowise, reverse-proxied everything · Filter by `http.title:` / HTML fingerprint
1337 · Jan, Devika · Hacker-cute defaults
1984 · LangSmith
2375 · Docker daemon (unauth) · RCE → host foothold
2379 · etcd (Milvus metadata), Kubernetes control plane
3000 · Flowise, Open WebUI, AnythingLLM, AgentGPT, SuperAGI, Langfuse, Promptfoo, OpenDevin, Grafana · Most crowded port in AI
3001 · AnythingLLM
4000 · LiteLLM Proxy · Provider keys live here
4040 · Apache Spark UI · Often co-deployed with ML pipelines
4317 · OpenTelemetry gRPC (OTLP) · LLM observability transport
4318 · OpenTelemetry HTTP (OTLP) · LLM observability transport
4567 · Rivet
5000 · MLflow · Models, artifacts, experiments
5001 · KoboldCpp
5050 · pgAdmin · Often default creds
5432 · PostgreSQL + pgvector, Supabase, Neon, Timescale
5500 · ChromaDB (alt)
5601 · Kibana, OpenSearch Dashboards · Vector index admin
5678 · n8n · AI workflow automation
6006 · Phoenix/Arize, TensorBoard · Traces + training viz
6333 · Qdrant (HTTP) · Snapshots downloadable
6334 · Qdrant (gRPC)
6379 · Redis / Redis Stack (vector search) · Often no auth
6443 · Kubernetes API server · ML workload orchestration
6900 · Argilla · RLHF/annotation data
7474 · Neo4j Browser · Graph memory stores
7501 · Lightning AI
7687 · Neo4j Bolt, Memgraph
7860 · Gradio, LangFlow, unsloth, text-generation-webui · HuggingFace Spaces default
7997 · Infinity (embeddings)
8000 · LangChain, vLLM, Triton, FastAPI generic, ChromaDB, AutoGPT, BentoML, Ray Serve, MetaGPT, Mem0, many `/v1/*` OpenAI-compat · Single most common LLM port
8001 · RedisInsight
8008 · ClearML
8080 · LocalAI, llama.cpp, Vespa, BabyAGI, Axolotl, Determined AI, Kubeflow, Airflow, Helicone, Dgraph, NVIDIA, Vast.ai, HF TEI/TGI, Phidata · Generic "alt-HTTP"
8081 · mongo-express
8088 · Hadoop YARN ResourceManager · Training data pipelines
8089 · Splunk HEC · Sometimes LLM log sink
8108 · Typesense · API key enumeration risk
8123 · LangGraph Studio, ClickHouse
8161 · ActiveMQ Web Console · ML pipeline message broker
8265 · Ray Dashboard · Cluster job submission, RCE
8443 · SageMaker Notebook, alt-HTTPS
8501 · Streamlit
8529 · ArangoDB
8787 · Cloudflare AI Gateway, Portkey, RStudio Server
8882 · Marqo
8888 · Jupyter, RunPod · RCE if no token
9000 · MinIO (Milvus backing), Portainer · Vector blobs in buckets
9090 · Prometheus · Every ML stack exports metrics here
9091 · Milvus metrics, Zilliz
9092 · Apache Kafka · LLM event streams, training pipelines
9200 · Elasticsearch / OpenSearch · `dense_vector` / kNN
9400 · NVIDIA DCGM · GPU telemetry
9870 · Hadoop NameNode (HDFS) · Training data at rest
9998 · Apache Tika · Document ingestion
10250 · Kubelet · K8s node attack surface
11434 · Ollama · Most-exposed LLM runtime in 2025-26
19530 · Milvus (gRPC)
27017 · MongoDB · Increasingly used as vector store
50070 · Hadoop NameNode (legacy)
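
Because several ports in the table are shared by many services, a port number alone is only a hint. The sketch below shows one way to encode that idea as a lookup. The dictionary is a hand-copied subset of the table, and the names `PORT_CANDIDATES`, `candidates_for`, and `needs_fingerprint` are illustrative rather than part of any tool described here.

```python
# A hand-copied subset of the port table above; not a complete mapping.
PORT_CANDIDATES = {
    3000: ["Flowise", "Open WebUI", "AnythingLLM", "Langfuse", "Grafana"],
    4000: ["LiteLLM Proxy"],
    7860: ["Gradio", "LangFlow", "text-generation-webui"],
    8000: ["vLLM", "Triton", "ChromaDB", "FastAPI generic"],
    8888: ["Jupyter", "RunPod"],
    11434: ["Ollama"],
}


def candidates_for(port):
    """Return the services the table lists for a port (empty list if unknown)."""
    return PORT_CANDIDATES.get(port, [])


def needs_fingerprint(port):
    """A port shared by several services cannot identify one on its own."""
    return len(candidates_for(port)) > 1


for port in (3000, 11434):
    label = "multiple candidates" if needs_fingerprint(port) else "single candidate"
    print(port, candidates_for(port), label)
```

Even a single table entry is only a starting point; an application-level fingerprint is still needed to confirm what is actually listening.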
§ 03 Catalogue
01
The agent layer
orchestration & tool-calling · 5 categories

Where the LLM gets hands. Agent frameworks coordinate multi-step reasoning; MCP servers expose tools; browser and voice agents pipe model output into real-world action.

Agent Frameworks

T1

LangGraph, AutoGen, CrewAI, AG2

ports 3000 · 8000
LangGraph / AutoGen / CrewAI / AG2
What it is

Where MCP standardises one agent calling tools, agent frameworks orchestrate many agents talking to each other. LangGraph (LangChain) models agent flows as state machines on a graph. AutoGen (Microsoft) and its fork AG2 model multi-agent conversations with explicit role assignments. CrewAI is the high-level "Researcher / Planner / Critic / Writer" team abstraction. MetaGPT ships the same idea as a software-team simulation. Together they are how teams ship the kind of system Anthropic's CEO calls "a virtual coworker."
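
To make the "state machine on a graph" idea concrete, here is a minimal sketch in plain Python. It does not use the LangGraph, AutoGen, or CrewAI APIs; the node functions, the `NODES` table, and the `run` helper are all hypothetical. Each node mutates a shared state dictionary and returns the name of the next node, which is the part an orchestrator has to persist between steps.

```python
def researcher(state):
    """Add one note, then hand over to the critic."""
    state["notes"].append("fact")
    return "critic"


def critic(state):
    """Send work back until there are at least two notes."""
    state["reviews"] += 1
    return "writer" if len(state["notes"]) >= 2 else "researcher"


def writer(state):
    """Produce the draft and end the run."""
    state["draft"] = " ".join(state["notes"])
    return None  # None marks a terminal node


NODES = {"researcher": researcher, "critic": critic, "writer": writer}


def run(start, state, max_steps=20):
    """Walk the graph from `start`, returning the sequence of visited nodes."""
    trace = []
    node = start
    while node is not None and len(trace) < max_steps:
        trace.append(node)
        node = NODES[node](state)
    return trace


state = {"notes": [], "reviews": 0, "draft": None}
print(run("researcher", state))
```

The shared `state` dictionary is the toy equivalent of the checkpoint store: whoever can read it sees every intermediate result.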

What goes wrong

The orchestrator process is a long-running stateful Python service that holds the entire conversation graph between every agent it has ever coordinated. The state typically lives on disk or in a Redis-backed checkpoint store. When the orchestrator's HTTP control plane is exposed without auth, an attacker reads every agent's history (which often contains intermediate tool outputs and customer data) and can frequently inject new messages into a running conversation. The attack surface is every tool every agent has ever been given multiplied by the orchestrator's lifetime.

How we test

We probe LangGraph's /threads and /runs endpoints, AutoGen's WebSocket control surface, and CrewAI's REST API for the conversation inventory. Conversation IDs and timestamps tell us how long the orchestrator has been running and how active the operator's deployment is. We do not read message bodies. The agent role catalogue (extractable from configuration without reading conversations) is sufficient operator-attribution evidence.

Browser Agents

T1

Headless browsers driven by LLMs

ports 3000 · 4444 · 9222 surveyed
153 unauth · 100% at platform endpoint
What it is

Browser agents pair an LLM with a headless browser (Chromium, Playwright, Puppeteer) so the model can see a webpage, reason about it, and click. It's the natural answer to a real problem: most of the world's data lives behind JavaScript, and most of the world's tools live behind UIs that have no API. Frameworks like browser-use, Stagehand, and the Anthropic Computer-Use harness all share this shape: a screenshot, a model decision, an action.

What goes wrong

The agent process is an entire web browser running with the operator's full session context: cookies, saved logins, residential IP, sometimes payment methods. When the agent's control plane (the API that accepts "go do X") is exposed to the public internet without auth, anyone can drive that browser through the operator's identity. We've also found stale instances where the browser left a session pinned for hours after the agent's last task; an attacker who finds the open port inherits whatever the human last logged into.

How we test

We probe for known framework footprints (the WebSocket port browser-use opens, the screenshot endpoint of the Computer-Use sample server, the Stagehand remote-control API) and confirm reachability with a benign read of agent state: current URL, session-cookie count, and last-action history. From there we identify the operator by inspecting which site the agent last interacted with, without ever issuing an action.

MCP Servers

T1

Model Context Protocol, tool-calling agents

ports varies surveyed
95 unauth · 28 with non-empty tools/list
What it is

The Model Context Protocol is Anthropic's open standard for letting an LLM call tools: read files, query a database, send mail, push to GitHub. An MCP server is a small process that lists "tools" (functions with JSON schemas) over a local stdio pipe or, for remote deployments, an HTTP connection with server-sent events; any compatible client (Claude Desktop, Cursor, Cline) can connect, enumerate the catalogue, and invoke tools on the model's behalf. It's an elegant design: a single protocol for all of agentic tool-calling. And it has spread fast.

What goes wrong

The protocol assumes the network boundary handles authentication. A tremendous amount of operator effort goes into the tool definitions and almost none into the transport: most deployments expose /sse directly to the public internet with no auth at all. Once the initialize handshake completes, the first request a client typically sends is tools/list, and the server answers with the entire tool catalogue in plaintext: names, descriptions, parameter schemas. From there an attacker calls anything they want, with the operator's credentials baked into the server-side handlers.

How we test

Our deep MCP enumerator opens an SSE channel, walks the JSON-RPC handshake, captures the tool list, and probes each tool's schema with synthetic arguments to confirm reachability. We classify the catalogue by sensitivity (file-system access, mail/IM connectors, IAM/cluster operations) and follow up with single high-signal invocations to validate exploitability. We then map the tool implementations back to the operator (often via leaked tokens, repository references, or Claude Desktop config patterns) so the disclosure reaches the actual maintainer rather than the abuse desk of an unrelated cloud.
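
The catalogue-classification step can be sketched offline against an already-captured `tools/list` result. The JSON-RPC envelope and the `tools` / `name` / `description` fields follow the MCP message shape; the keyword table and the function names are assumptions made for illustration, and a real classifier would need something less crude than substring matching.

```python
import json


def jsonrpc_request(method, request_id, params=None):
    """Serialise a JSON-RPC 2.0 request such as tools/list."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Hypothetical keyword buckets; tune these for real use.
SENSITIVITY_KEYWORDS = {
    "filesystem": ("file", "directory", "path"),
    "messaging": ("mail", "slack", "message"),
    "cluster": ("iam", "kubectl", "kubernetes"),
}


def classify_tools(tools_list_result):
    """Map each tool name to the sensitivity classes its text matches."""
    classes = {}
    for tool in tools_list_result.get("tools", []):
        text = (tool.get("name", "") + " " + tool.get("description", "")).lower()
        classes[tool["name"]] = sorted(
            label
            for label, keywords in SENSITIVITY_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)
        )
    return classes


captured = {
    "tools": [
        {"name": "read_file", "description": "Read a file from disk"},
        {"name": "send_email", "description": "Send mail via SMTP"},
        {"name": "add", "description": "Add two numbers"},
    ]
}
print(jsonrpc_request("tools/list", 2))
print(classify_tools(captured))
```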

Voice Agents

T2

Vapi, Retell, LiveKit Agents, Pipecat

ports WebRTC
Vapi / Retell / LiveKit / Pipecat
What it is

Voice agents pair an LLM with real-time speech-to-text, text-to-speech, and phone-call infrastructure. Vapi and Retell are the managed-platform leaders, both used to build customer-support and outbound-sales bots that sound like humans on a phone call. LiveKit Agents is the open-source real-time framework. Pipecat (Daily.co) is the Python-native voice agent framework. Behind every "AI agent answered my call" experience is one of these orchestrating Whisper, GPT/Claude, and a TTS engine on a sub-200ms budget.

What goes wrong

Voice agent control planes hold the most invasive credentials in the AI stack: a phone number, an outbound-calling ability, and a recording of every call the operator has placed or received. When exposed without auth, an attacker gets a free phone number with the operator's billing relationship and a verbatim audio archive of every customer conversation, including account verification phrases, credit card readbacks, and the medical or legal context the customer thought was private.

How we test

We probe the dashboard and admin APIs (Vapi's /v1/calls, LiveKit's WebSocket control endpoint, Pipecat's status server). Call counts and duration distributions characterise the operator's traffic. We never trigger outbound calls. Recording filenames or call IDs are sufficient attribution evidence; most operators name calls by their internal campaign ID which identifies the team without our needing to listen to anything.

Workflow Automation

T1

n8n, Flowise, LLM-native flows

ports 5678 · 3000 surveyed
n8n + Flowise cloud sweep · 2026-05
What it is

n8n and Flowise are the Zapier of the AI era: visual builders where every node can be an HTTP call, a database query, an LLM call, or a downstream automation. They are how non-engineers ship real agentic systems: drag a Gmail node, an OpenAI node, a Postgres node onto a canvas, click run. The expressive power is genuinely impressive, and that's why they have caught on in startups, marketing teams, and internal-tools shops.

What goes wrong

Every workflow is a JSON document containing the credentials of every service it touches. The default n8n install exposes the editor at / with no auth on first boot; the operator is supposed to enable basic auth themselves. Many don't. Flowise has the same shape: visit the IP, see the canvas, see the API keys baked into the OpenAI node, see which CRM is wired to which Gmail account. A single exposed instance can leak the API keys for the operator's entire SaaS stack, plus a list of every workflow they run.

How we test

We fingerprint the editor by its asset bundle, then read the workflow list through the public REST API (no auth in the default config). Each workflow's JSON exposes credentials by reference. We resolve the reference through the credentials endpoint and confirm the secret is present without ever exfiltrating it. We catalogue the workflow names because they tell the operator's story better than any banner: "Daily-report-to-CEO", "Sync-Stripe-to-Notion", etc.
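
An operator can run the same inventory against their own exported workflow. The sketch below assumes the exported n8n JSON keeps a `nodes` list in which each node may carry a `credentials` object keyed by credential type, with an `id` and `name` per entry; treat that shape, and the helper name `credential_references`, as assumptions to verify against your own export. It lists references only and never touches secret values.

```python
def credential_references(workflow):
    """Return (node name, credential type, credential name) for each reference."""
    references = []
    for node in workflow.get("nodes", []):
        for cred_type, ref in node.get("credentials", {}).items():
            references.append((node.get("name"), cred_type, ref.get("name")))
    return references


# Hypothetical exported workflow, trimmed to the fields used above.
workflow = {
    "name": "Sync-Stripe-to-Notion",
    "nodes": [
        {"name": "Stripe", "credentials": {"stripeApi": {"id": "3", "name": "Stripe prod"}}},
        {"name": "Set fields"},
        {"name": "Notion", "credentials": {"notionApi": {"id": "9", "name": "Notion workspace"}}},
    ],
}

for entry in credential_references(workflow):
    print(entry)
```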

02
The application layer
what the user touches · 5 categories

The surfaces operators put in front of users — chat UIs, generation studios, code agents, inference demos, hosted notebooks. Almost all of them ship with permissive defaults that survive into production.

Chat UIs

T2

Open WebUI, ChatGPT-style frontends

ports 3000 · 3001 · 8080 surveyed
1,170 instances · 0% unauth (open-signup is the failure mode)
What it is

Open WebUI (formerly Ollama WebUI) is the most popular self-hosted chat interface for local LLMs. It looks like ChatGPT, talks to Ollama or any OpenAI-compatible backend, supports multi-user accounts, RAG document upload, and has become the de-facto control panel for self-hosted AI. LibreChat, Chatbot UI, and a handful of others share the niche.

What goes wrong

Open WebUI ships with open registration enabled by default: visit the URL, click "Sign up", you're inside. The first user is silently promoted to administrator, and admin accounts can read every other user's chat history, upload arbitrary RAG documents into the shared knowledge base, and route prompts through any configured backend at the operator's expense. When the operator never bothers to disable signups (and very few do), anyone who finds the IP becomes a peer user with full access to the shared multi-tenant corpus.

How we test

We confirm Open WebUI by its /manifest.json and the very specific bundle hash of its frontend, then test the registration endpoint with a benign account creation. We do not enumerate other users' chats; the proof of exposure is the successful account itself, which we screenshot and report. Where the deployment connects to a backend gateway (LiteLLM, OneAPI), we note which provider's API key the operator is paying for. That's the quota-drain story that makes the disclosure land.

Code Agents

T2

Aider, OpenHands, Continue, SWE-agent

ports 3000 · 8000
Tabby / Refact / Cody — not yet surveyed standalone
What it is

Code agents pair an LLM with a development environment. The model reads the codebase, edits files, runs tests, and submits PRs. Aider (Paul Gauthier) is the terminal-native pair-programming agent. OpenHands (formerly OpenDevin) is the all-in-one autonomous-developer platform. Continue.dev is the IDE plugin the model drives from inside VS Code or JetBrains. SWE-agent (Princeton) is the research-grade benchmark agent. Cline and Roo Code sit in the same niche. Together they are how the "AI writes the code" workflow actually ships.

What goes wrong

A code agent runs as the operator inside a development environment with the operator's git credentials, SSH keys, cloud SDK config, and shell history. When the agent's web UI or REST control plane is exposed without auth, an attacker drives the same shell. They can read the codebase (including secrets in .env files the agent has visibility into), commit and push to the operator's repos, deploy via the operator's CI hooks, and pivot via any SSH credential the agent's environment carries. Most installs assume "this is on my laptop" and never reconsider when the operator deploys to a remote workstation.

How we test

We probe the agent's web UI for the framework signature (OpenHands, Continue server, Aider's --browser mode all have distinct asset bundles) and read the workspace path from the status endpoint. The workspace path is enough to characterise the operator (corporate hostname, repo name, often the user's home directory). We never invoke the agent. Workspace metadata is sufficient attribution evidence.

Generation Studios

T1

ComfyUI, image/video pipelines

ports 7860 · 8188 surveyed
548 unauth ComfyUI · 385 GB VRAM observed
What it is

ComfyUI is a node-graph editor for diffusion models: Stable Diffusion, FLUX, Stable Video, AnimateDiff. You wire model loaders, samplers, conditioning nodes, and post-processing into a graph and run it. Automatic1111 is the older single-page-app cousin. Both are how artists, hobbyists, and a growing number of commercial studios actually generate images and short video at scale on self-hosted hardware.

What goes wrong

ComfyUI ships with no authentication and an HTTP API that accepts arbitrary Python custom-node code execution as a feature. Workflow JSON files include the full graph (model paths, LoRA weights, sometimes seeds and prompts), which is enough to reconstruct an operator's creative process or extract proprietary fine-tuned weights. The /object_info endpoint enumerates every loaded model and custom node; the queue endpoint accepts arbitrary workflows from anyone who can reach the port.

How we test

We probe /object_info for the model inventory, /queue for currently-running jobs (often labelled with the operator's project name), and /history for the last N generations. The history endpoint is particularly attribution-rich: it contains thumbnails of past outputs, which on commercial deployments is the operator's actual product pipeline. We never enqueue jobs. The read surface alone is sufficient evidence.

Inference UIs

T2

Gradio, Streamlit, model demos

ports 7860 · 8501 surveyed
Gradio + Streamlit · port-7860 sweep 2026-05
What it is

Gradio and Streamlit are the two ways researchers turn a script into a web app in one afternoon. Gradio (now Hugging Face) gives you a chat interface or an image-uploader for any function in three lines of Python. Streamlit (now Snowflake) gives you a full dashboard. Both are aimed at the same need: "I built a model, my collaborator wants to play with it, can I have a UI by tomorrow?"

What goes wrong

Both frameworks make sharing easy and authentication invisible. A Gradio app with share=True becomes a tunnelled public URL with no password. A Streamlit app started with streamlit run listens on 0.0.0.0 by default. The model behind the UI typically processes uploaded files. The exposure is "an attacker uploads a file my code unpickles, my code reads from S3, my code calls a paid API." The UI is the rendered version of an entire backend pipeline, and that pipeline runs as the operator.

How we test

We fingerprint Gradio by the /info endpoint (it advertises the function signatures of every Python callable wired into the UI) and Streamlit by the WebSocket handshake on /_stcore/stream. From there the API surface tells the story: a Gradio app exposing an image-classification function is a model demo; a Gradio app exposing database-query is the operator using Gradio as an internal admin panel they didn't realise was public.

Notebooks

T1

Jupyter, ML research environments

ports 8443 · 8888 surveyed
university JupyterHub corpus · ongoing
What it is

Jupyter is where most of modern machine-learning research happens. A notebook is a live Python (or R, or Julia) shell with rich output (plots, tables, images) that runs inside a kernel an operator can leave running for days. JupyterLab is the polished IDE on top, JupyterHub the multi-user variant. Every ML grad student, every model fine-tuner, every quantitative analyst lives in this stack.

What goes wrong

A Jupyter server with no token (or a token shared in a public Slack, or a token from a screencast, or a token in a Docker Compose file pushed to GitHub) is a remote Python shell with the operator's full filesystem, GPU, and cloud credentials available via the imported boto3/google-cloud SDKs. The exposure isn't the notebook. It's the kernel behind it. Anyone reaching the port can spawn a new kernel and run arbitrary code under the operator's identity.

How we test

We probe for the token-prompt page, then the API at /api/sessions to enumerate live kernels (this works without auth in surprisingly many configs, and the response is a perfect operator-attribution payload: kernel paths contain user homedirs, repo names, and dataset filenames). We never spawn a new kernel on the target. The session list alone is sufficient to attribute, draft the disclosure, and demonstrate impact in evidence form.
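
The session list is a JSON array, so summarising it needs no further requests. The sketch below works on an already-captured response and assumes each entry carries a `path` and a nested `kernel` object with an `execution_state`, as in the Jupyter Server sessions model; `summarise_sessions` and the sample data are hypothetical.

```python
def summarise_sessions(sessions):
    """Count sessions and busy kernels, and collect the notebook paths."""
    paths = sorted(s.get("path", "") for s in sessions)
    busy = sum(
        1 for s in sessions
        if s.get("kernel", {}).get("execution_state") == "busy"
    )
    return {"count": len(sessions), "busy": busy, "paths": paths}


# Hypothetical captured response, trimmed to the fields used above.
captured = [
    {"path": "thesis/train.ipynb", "kernel": {"name": "python3", "execution_state": "busy"}},
    {"path": "scratch.ipynb", "kernel": {"name": "python3", "execution_state": "idle"}},
]
print(summarise_sessions(captured))
```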

03
The gateway layer
routing, rerank, retrieval glue · 3 categories

The connective tissue between application and model. Gateways route across providers and manage keys; RAG frameworks join retrieval and generation; rerankers reorder candidate documents before they hit the model.

LLM Gateways

T1

LiteLLM, OneAPI, model routing

ports 4000 · 8000 · 8787 surveyed
1,899 unauth · 1,829 returned identical canned response
What it is

An LLM gateway is a reverse proxy for model APIs. The operator wires up keys for OpenAI, Anthropic, Google, Mistral, their own Ollama box, and a handful of fine-tunes; the gateway exposes a single OpenAI-compatible endpoint and handles routing, rate-limiting, fallback, observability, and cost accounting. LiteLLM is the Python-native one (most common in research); OneAPI is the Go/Chinese-ecosystem one (most common in commercial deployments). Portkey, Helicone-Proxy, and APISIX-AI sit in the same niche.
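
Routing with fallback is the core of what a gateway does, and a toy version fits in a few lines. This is a sketch of the general technique, not LiteLLM's or OneAPI's implementation: the `route` function, the `ProviderError` class, and the stub providers are all hypothetical, and real gateways add rate-limiting, budgets, and logging around the same loop.

```python
class ProviderError(Exception):
    """Raised by a provider stub when a call fails."""


def route(prompt, model_alias, routes, providers):
    """Try each provider configured for the alias in order; return the first success."""
    errors = []
    for provider_name in routes[model_alias]:
        try:
            return provider_name, providers[provider_name](prompt)
        except ProviderError as exc:
            errors.append((provider_name, str(exc)))
    raise ProviderError(f"all providers failed: {errors}")


def rate_limited(prompt):
    raise ProviderError("rate limited")


def echo_upper(prompt):
    return prompt.upper()


routes = {"fast": ["primary", "backup"]}
providers = {"primary": rate_limited, "backup": echo_upper}
print(route("hi", "fast", routes, providers))
```

The `providers` table is where the upstream API keys would live in a real deployment, which is why an unauthenticated gateway exposes the whole billing relationship at once.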

What goes wrong

The gateway holds the operator's entire AI billing relationship. If it's exposed without auth, an attacker can route arbitrary prompts through any of the configured providers: burning the operator's quota, exfiltrating embedded prompts that may contain customer data, and racking up usage charges on premium models. Worse: the admin panel typically lists every model alias, the keys behind them, and the per-user/per-team budget. The attacker learns the operator's whole AI org chart before issuing a single request.

How we test

We confirm the gateway by its /v1/models response shape (LiteLLM's is distinct from a vanilla OpenAI proxy), then check /health/readiness and /key/info for admin-key reachability. The key endpoint, when unauthenticated, returns the operator's full virtual-key inventory including budget caps and team assignments. We do not issue paid completions. The catalogue is enough to demonstrate the quota-drain risk and identify the operator.

RAG Frameworks

T1

LangChain, LlamaIndex, Dify, retrieval pipelines

ports 3000 · 7860 · 9380 surveyed
~98% port-9380 are custom FastAPI RAG · 51% leak /openapi.json
What it is

Retrieval-Augmented Generation is how an LLM gets access to documents it wasn't trained on: your company wiki, last week's invoices, a PDF of your medical history. A RAG pipeline chains a document loader, an embedder, a vector store, a retriever, and the LLM call. Frameworks that package this into one runtime include Dify (the most polished, Chinese-origin), Flowise (visual builder on top of LangChain), Haystack (Deepset's enterprise stack), Quivr, and Verba. The pipeline is what turns a model into a product.

What goes wrong

Most RAG deployments are research artefacts that grew into prototypes that grew into production. Dify ships with admin@admin.com / password as the seed account; a fresh Flowise install exposes the canvas and every workflow's embedded API keys; Haystack's REST API is unauthenticated by default and its /query endpoint will dutifully retrieve and return any document the embedder has indexed. The corpus exposed this way ranges from public PDFs all the way to attorney-client communications, internal sales decks, and patient records.

How we test

We probe each framework's signature endpoints: Dify's /console/api/setup for the seed-account state, Flowise's /api/v1/chatflows for the workflow catalogue, Haystack's /query for the indexed corpus reach. When the retriever is reachable, we issue a single low-volume query (e.g. "summary") to confirm the corpus contains real content, capture the document titles and sources from the response, and stop. Title metadata is enough to attribute the operator and characterise the data class without reading the documents themselves.

Rerankers

T1

Cohere, Jina, BGE, Infinity reranker servers

ports 7997 · 8080
Cohere / Jina / BGE / Infinity — adjacent to embeddings
What it is

A reranker is the quality filter that sits between vector retrieval and the LLM. The vector DB returns the top-50 candidate documents fast but loosely; the reranker re-scores them with a smaller cross-encoder model that actually reads each document against the query and orders them by real relevance. Cohere Rerank (managed) and Jina Reranker (open source) are the two most common; BGE-Reranker (BAAI) is the strong open default; Infinity serves rerankers alongside its embeddings. Most production RAG stacks have one in the middle and most teaching examples skip it entirely.
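
The two-stage shape described above can be sketched with toy scorers. Word overlap stands in for the fast vector search, and a precision-style score stands in for the cross-encoder; neither is a real model, and the function names are illustrative. The point is only that stage two can reorder what stage one returned.

```python
def retrieve(query, corpus, k):
    """Stage 1: cheap, loose ranking by number of shared words."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def toy_cross_score(query, doc):
    """Stand-in for a cross-encoder: fraction of document words found in the query."""
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)


def rerank(query, candidates, score_fn, top_n):
    """Stage 2: re-score every (query, document) pair and keep the best."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


corpus = [
    "how to reset a router and change the wifi password",
    "reset password",
    "the cat sat",
]
candidates = retrieve("reset password", corpus, k=2)
print(rerank("reset password", candidates, toy_cross_score, top_n=1))
```

Both of the first two documents tie in stage one, so the long router article comes back first; the second stage promotes the short, on-topic document.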

What goes wrong

Reranker servers ship the same way embedding servers do: OpenAI-compatible HTTP, no auth, on the assumption that only the upstream RAG pipeline calls them. When exposed they leak two things: (1) the model identifier, which indicates how seriously the operator is doing RAG, and (2) the queries the operator is processing, since some servers log recent inputs to a status endpoint for debugging. The query log is the more damaging signal because queries often contain the original user prompt verbatim.

How we test

We probe /v1/rerank for the version banner and /v1/models for the model inventory. We do not submit reranking workloads. Where a debug or status endpoint exposes recent traffic we capture only the count and timing, not the query content. The model identifier and traffic profile together characterise the operator's RAG seriousness without our ever reading queries.

04
The model layer
inference runtimes · 6 categories

Where the GPU melts. The runtimes that load weights and serve tokens — Ollama (no-auth by design), vLLM (no-auth by default), llama.cpp, Triton, embedding servers, speech & audio models.

Embedding Servers

T1

TEI, Infinity, sentence-transformers

ports 7997 · 8080
surfaces inside RAG / inference surveys
What it is

Embedding servers turn text (or images, or audio) into high-dimensional vectors that vector databases can search. Every RAG pipeline has one of these in the middle. Text Embeddings Inference (TEI) is Hugging Face's production-grade Rust runtime; Infinity (Michael Feil) is the fast Python alternative; the original sentence-transformers library ships its own HTTP server; Ollama also serves embeddings via /api/embeddings for the small-deployment crowd. They look like miniature inference servers because that's exactly what they are.
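
What a vector database does with those vectors can be shown with cosine similarity over a tiny in-memory index. The two-dimensional vectors and the `nearest` helper are made up for illustration; real embeddings have hundreds or thousands of dimensions and real stores use approximate search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, index, k=1):
    """Return the k document names whose vectors are most similar to the query."""
    return sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)[:k]


# Hypothetical two-dimensional "embeddings".
index = {"invoice": [1.0, 0.0], "wiki": [0.0, 1.0]}
print(nearest([0.9, 0.1], index))
```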

What goes wrong

Embedding servers expose an OpenAI-compatible /v1/embeddings endpoint by default and are typically deployed without auth, on the assumption that "only my RAG pipeline talks to it." When the host ends up reachable on a public IP an attacker gets a free embedding service, useful for their own RAG pipelines. More damaging: the model loaded by the server is often a fine-tuned variant trained on the operator's private corpus, and those custom weights are often what makes the operator's product different from the generic alternative.

How we test

We hit /v1/models (or the /info endpoint TEI exposes) for the model inventory and tokenizer metadata, then capture the model identifier. If the model name maps to a known Hugging Face artefact we attribute via the publishing org. If it's a private fine-tune we capture the architecture and tokenizer fingerprint, which is sufficient evidence of operator intellectual property without our needing to issue any embedding requests.

llama.cpp

T1

C++ inference runtime, frequently co-deployed on Ollama port :11434

ports 8080 · 11434
often co-deployed on Ollama port :11434
What it is

llama.cpp is the C++ reference implementation of LLaMA inference, the project that pioneered GGUF quantization and runs LLMs on commodity CPU + small GPU hardware. Its built-in HTTP server (llama-server) exposes an OpenAI-compatible API at /v1/models, /v1/chat/completions, plus the platform-native /props and /completion endpoints. Operators frequently co-deploy llama.cpp on the same port as Ollama (:11434) so existing Ollama clients can swap backends transparently.

What goes wrong

llama.cpp has no built-in authentication. The framework's design assumption (same as Ollama, vLLM, Triton) is that auth comes from a reverse proxy. Population-scale surveys find ~70% of :11434 ports running llama.cpp instead of (or alongside) Ollama, all unauthenticated. The /props endpoint discloses the loaded chat template (sometimes a custom-trained one), the model's n_ctx, the total slots, and the operator's quantization config. /completion accepts arbitrary prompts and burns operator compute. When the operator has loaded a custom-finetuned model (Xiyan_FT_14B, Baichuan_32B_medical, etc.), the model itself is operator IP.

How we test

We use three signals to distinguish llama.cpp from co-deployed Ollama: /v1/models should return JSON with "owned_by":"llamacpp", /props returns the server-info JSON with default_generation_settings and chat_template, and the HTTP Server: header reads llama.cpp on most builds. We never POST to /completion or /v1/chat/completions; the model identity and config disclosure are the finding. The llama.cpp fingerprint was added to aimap in v1.9.4 (2026-05-15) after a field instance was caught running a custom BitNet-b1.58-2B-4T on a Contabo SG host.
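
The fingerprint signals above reduce to a small pure function over already-captured responses. This is a sketch rather than aimap's actual implementation: `looks_like_llamacpp` and its argument names are hypothetical, and it assumes /v1/models uses the OpenAI-style `data` list.

```python
def looks_like_llamacpp(models_json=None, props_json=None, server_header=None):
    """Return the names of the llama.cpp signals present in the captured data."""
    signals = []
    if models_json and any(
        m.get("owned_by") == "llamacpp" for m in models_json.get("data", [])
    ):
        signals.append("owned_by")
    if props_json and "default_generation_settings" in props_json and "chat_template" in props_json:
        signals.append("props")
    if server_header and "llama.cpp" in server_header:
        signals.append("server_header")
    return signals


# Hypothetical captured data, trimmed to the fields used above.
print(looks_like_llamacpp(
    models_json={"object": "list", "data": [{"id": "model.gguf", "owned_by": "llamacpp"}]},
    props_json={"default_generation_settings": {}, "chat_template": "..."},
    server_header="llama.cpp",
))
```

Returning the list of matched signals, rather than a single boolean, keeps the evidence visible when only one or two of the three are present.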

Ollama

T1

Local-LLM runtime, no auth by design

ports 11434 surveyed
16,473 unauth instances · 100% auth-off
What it is

Ollama is the easiest way to run a large language model on your own hardware. One binary, one command: ollama pull llama3 and you have a local OpenAI-style API on port 11434. It pulls quantised model weights from its own registry, manages the GPU layout, and serves an OpenAI-compatible chat endpoint. It is genuinely beautifully engineered, and it is the reason most of the world's self-hosted AI exists.

What goes wrong

The framework has no authentication concept by design. The maintainers' position is that auth is an upstream concern (run it behind a reverse proxy, behind Tailscale, behind your firewall). Most operators don't. Any host running Ollama on a public IP is a free, unauthenticated, unlimited model endpoint: an attacker can list the model inventory at /api/tags, chat through /api/chat, and even pull arbitrary new models via /api/pull, which silently downloads gigabytes onto the operator's disk and bills any attached cloud egress.

How we test

We hit /api/tags to enumerate the loaded models (this is the population-scale fingerprint behind our cross-cloud surveys), capture the response, and attribute via the kind of models loaded: a host serving gemma3:e4b and nothing else is a hobbyist; a host serving fifteen fine-tuned variants of llama3:70b with custom Modelfiles is a commercial operator. We do not issue chat completions. We do not call /api/pull. The model inventory tells the whole story.
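
The inventory heuristic can be sketched against a captured /api/tags response, which carries a `models` list with a `name` per entry. The labels and the `commercial_threshold` value are assumptions chosen for illustration, not the thresholds used in the surveys.

```python
def classify_inventory(tags_response, commercial_threshold=10):
    """Label a model inventory by size; the thresholds are illustrative only."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    families = sorted({name.split(":", 1)[0] for name in names})
    if not names:
        label = "empty"
    elif len(names) == 1:
        label = "hobbyist-like"
    elif len(names) >= commercial_threshold:
        label = "commercial-like"
    else:
        label = "unclear"
    return {"count": len(names), "families": families, "label": label}


# Hypothetical captured responses.
single = {"models": [{"name": "llama3:latest"}]}
many = {"models": [{"name": f"llama3-ft{i}:70b"} for i in range(15)]}
print(classify_inventory(single)["label"], classify_inventory(many)["label"])
```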

Speech & Audio

T1

Whisper, Piper, RVC, Coqui, voice servers

ports 9000 surveyed
Whisper.cpp survey on :9000 · 2026-05
What it is

Speech models translate between text and audio in both directions. Whisper (OpenAI) is the universal speech-to-text engine; Piper is Rhasspy's tiny fast TTS; Coqui XTTS is the high-quality multi-speaker TTS that survived the company's death; RVC (Retrieval-based Voice Conversion) is the model that turns one person's voice into another's. Servers like wyoming-piper, openedai-speech, and the Whisper-server reference deployments wrap these models in HTTP APIs.

What goes wrong

The model itself isn't the exposure. The audio it processes is. Whisper servers exposed without auth become free transcription endpoints; we've found deployments where the operator was clearly using their server to transcribe internal meetings, with the audio paths in the request log telling the story. RVC servers carry an additional risk: the operator's trained voice models are stored on disk and served via the API. An attacker can pull a celebrity or executive voice model, then synthesise arbitrary speech in that voice.

How we test

We probe /v1/audio/transcriptions and /api/voices for the model and voice inventory, then characterise what kind of audio the operator processes by the file-extension distribution in the recent-jobs endpoint. We never submit audio. The voice-model catalogue is sufficient to identify problematic deployments (any voice model whose name matches a real person warrants disclosure to that person's representation, not just the cloud abuse desk).

Triton Inference Server

T1

NVIDIA model serving

ports 8000 · 8001 · 8002 surveyed
100% unauth across surveyed Class-A tier
What it is

Triton is NVIDIA's enterprise inference server: the heavyweight runtime designed for production model serving across every hardware target NVIDIA makes. It supports TensorRT, ONNX, PyTorch, TensorFlow, vLLM, and Python backends; it runs ensemble pipelines across models; it has a binary protocol (gRPC) and an HTTP/REST one. When you see a tritonserver container in a Kubernetes deployment, you're looking at someone serious about ML throughput.

What goes wrong

Triton's HTTP endpoints (/v2/models, /v2/repository/index, /v2/health/ready) are unauthenticated by design (NVIDIA's position: enforce auth at the ingress). The model repository index is a verbatim list of model names, their versions, their backends, and their state. For commercial operators these names are their intellectual property: fraud-detection-v3, recommender-cold-start-v7, biometric-match-v2. We've found Triton instances exposing classifier models that are clearly pulled from the operator's product, alongside the safety classifiers the operator hopes nobody bypasses.

How we test

We hit /v2 for the version banner, /v2/repository/index for the catalogue, and /v2/models/{name} for the model metadata (which exposes input/output tensor names, datatypes, and shapes, sufficient to reverse-engineer the model's purpose without ever invoking it). When the model is a published architecture (a known LLM, a known vision backbone) we do not issue inference. When it's a custom fine-tune we capture only the metadata.
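
As a rough illustration of how far metadata alone goes, the sketch below summarises a captured repository index and renders a model's tensor signature. It assumes the index body is a JSON array of objects with `name`, `version`, and `state`, and that the per-model body carries `inputs` and `outputs` lists with `name`, `datatype`, and `shape`; the function names are ours, for illustration only.

```python
import json
from collections import Counter

def summarise_repository_index(index_json: str) -> dict:
    """Collapse a captured repository index into model names and state counts."""
    entries = json.loads(index_json)
    names = sorted({entry["name"] for entry in entries})
    states = Counter(entry.get("state", "UNKNOWN") for entry in entries)
    return {"models": names, "states": dict(states)}

def tensor_signature(metadata_json: str) -> list:
    """Render input/output tensors as readable lines, e.g. 'input image: FP32 [...]'."""
    meta = json.loads(metadata_json)
    lines = []
    for direction in ("inputs", "outputs"):
        for tensor in meta.get(direction, []):
            # direction[:-1] turns "inputs" into "input", "outputs" into "output".
            lines.append(
                f'{direction[:-1]} {tensor["name"]}: {tensor["datatype"]} {tensor["shape"]}'
            )
    return lines
```

A signature such as a single image-shaped FP32 input and a two-element output is usually enough to call a model a binary image classifier without sending it a single tensor.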

vLLM

T1

High-throughput batched inference

ports 8000 surveyed
1,200+ unauth · 100% in Class-A 2026-05 sweep
What it is

vLLM is the inference engine of choice when you actually have GPUs and want to serve a model at scale. It implements PagedAttention (a memory-management technique that lets a single GPU host serve dozens of concurrent requests without OOMing) plus continuous batching, speculative decoding, and prefix caching. It's what most commercial fine-tune deployments and university research clusters reach for once "Ollama on a laptop" stops being enough.

What goes wrong

vLLM exposes an OpenAI-compatible API on port 8000 by default. There is an --api-key flag. Most operators don't set it. An exposed vLLM instance is a free GPU compute pool serving whichever model the operator loaded (often a 70B parameter fine-tune that costs $5k/month to host on commercial infra), with token throughput high enough to be useful for an attacker running their own quota-heavy workloads. The /v1/models endpoint reveals the model name and architecture, which is often enough to identify the operator's research lab.

How we test

We probe /v1/models for the model inventory and /metrics for the Prometheus exposition (vLLM publishes detailed per-model token statistics here, including average request size, which is useful for inferring deployment age and traffic). For research instances we map the model name back to the publishing institution via Hugging Face. Disclosure goes to the lab's security contact directly, not the cloud abuse desk.
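
The Hugging Face mapping usually starts from the model ID itself. A minimal sketch, assuming the /v1/models body follows the OpenAI list shape (a `data` array whose entries carry an `id`) and that the ID uses the `namespace/model` convention; the function name is hypothetical, and local filesystem paths or bare names simply yield no namespace.

```python
import json

def publishing_namespaces(models_json: str) -> dict:
    """Map each served model ID to its Hugging Face style namespace, if any."""
    data = json.loads(models_json).get("data", [])
    result = {}
    for entry in data:
        model_id = entry.get("id", "")
        namespace, sep, _ = model_id.partition("/")
        # A leading "/" (a local path) or no "/" at all means no namespace.
        result[model_id] = namespace if sep and namespace else None
    return result
```

The namespace is only a lead: it still has to be confirmed against the organisation's public Hugging Face profile before it goes into a disclosure.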

05
The data layer
storage, search, weights, telemetry 15 categories

Everything beneath the model. Vector indices, full-text engines, OLAP backends, object storage, model hubs, fine-tuning runtimes, MLOps trackers, backup snapshots — and the GPU pools that move it all.

Agent Memory

T1

Mem0, long-term memory for agents

ports 6379 · 7474 · 7687
Neo4j + Redis + Bolt — GraphRAG memory
What it is

A bare LLM has no memory between conversations. Agent-memory frameworks fix that. Mem0 is the runaway leader. It watches an agent's conversation, extracts the facts worth remembering ("user prefers vegetarian", "user lives in Denver", "user's company uses Postgres"), stores them in a vector DB, and re-injects the relevant ones into future prompts. Letta (formerly MemGPT), Zep, and Mem-Agent occupy the same niche. Together they are how an agent goes from goldfish to colleague.

What goes wrong

The memory store is a verbatim record of the operator's most-used agents' private context: user preferences, business facts, sometimes credentials and PII the user mentioned in passing. Mem0's REST API exposes /v1/memories/ with no authentication in the default Docker compose. Each memory record is attributed to a user_id, so the data is also indexed by the operator's identity scheme. That makes it both more useful for the user and more useful for an attacker who can now query "all memories about user 47".

How we test

We list memories via the unauthenticated API, capture the first few records' metadata (timestamps, user IDs, memory categories), and stop. We do not page through the corpus. The memory categories alone (preferences, work-history, medical) characterise the data class for the disclosure without our needing to read individual entries.

Backup & Snapshot Services

T2

Velero, Restic REST, Barman, Longhorn, model weights in unprotected snapshots

surveyed
269 GB Qdrant snapshots exposed in 2026-05 backup survey
What it is

Backups are easy to forget about. That's why they're dangerous. The popular ML and Kubernetes backup stack: Velero snapshots Kubernetes cluster state plus the persistent volumes underneath; Restic is the encrypted-by-default file backup tool whose REST server mode listens on a public port for incoming snapshots; Barman does Postgres-specific backup-and-restore; Longhorn (Rancher) is the Kubernetes block-storage layer that snapshots volumes on a schedule; BorgBackup sits in the same niche as Restic. In an ML deployment these tools are how the operator's model weights, training datasets, and vector-DB volumes are persisted between restarts.

What goes wrong

A backup is a verbatim copy of the system at rest. And at rest, every secret is unencrypted and every model file is intact. Restic's REST server, when exposed without HTTP auth, lets an attacker download every snapshot the operator has ever taken (which is usually the entire model registry plus the training data). Velero exposes its API through the Kubernetes API server, so a misconfigured cluster RBAC turns into a one-step model-exfiltration primitive. Longhorn's UI ships without auth on port 80 and lists every volume by name (model-weights-pvc, training-data-pvc), pointing attackers exactly where to chain next.

How we test

We probe Restic REST /snapshots for the snapshot inventory (this works without auth in the default config), Longhorn /v1/volumes for the volume list, Velero's BackupStorageLocation objects via the Kubernetes API. We do not download snapshots. The metadata (snapshot IDs, volume names, timestamps, sizes) is sufficient evidence and avoids us ever touching the model files themselves. A snapshot called mlflow-pvc measuring 240GB on a research host tells the disclosure story without any further reach.

Compute Orchestration

T1

RunPod, Ray, Volcano, Kubeflow, SkyPilot

ports 4040 · 5000 · 8265 · 4200
Spark · Airflow · Ray · Dask · Prefect — discovery runbook drafted
What it is

You can't fine-tune a 70B model on a laptop. ML compute orchestrators are how teams rent and schedule expensive GPUs. RunPod (managed) lets a researcher spin up an 8xA100 pod from a Jupyter button; Ray (Anyscale) is the Python-native distributed-compute framework; Volcano is the Kubernetes GPU scheduler; Kubeflow wraps both for an MLOps workflow; SkyPilot abstracts cloud GPU provisioning across providers. Each is the layer between "I need 80GB of VRAM" and "the GPU is now running my code."

What goes wrong

These systems hold very expensive credentials. RunPod API keys map to billable GPU pods; Ray clusters mount the operator's full SSH agent and kubeconfig; Kubeflow Pipelines runs as a service account with cluster-wide read on most installs. An exposed Ray dashboard is a one-click ray submit endpoint that runs arbitrary Python on the operator's GPU fleet. An exposed RunPod control plane lets an attacker spin up new pods for arbitrary workloads on the operator's bill. The cost vector here is real: we have seen disclosures involving five-figure unauthorised GPU rentals.

How we test

We probe Ray's dashboard /api/version, Kubeflow's /pipeline endpoint, and SkyPilot's API server for fingerprints. Where reachable, we list jobs (no submit, no cancel) to characterise what the operator runs and how much GPU they have available. Job names typically include the model architecture and training step, which is enough to attribute the operator and characterise the loss vector for the disclosure.

Container Orchestration

T1

Docker daemon, etcd, HashiCorp Vault, HashiCorp Consul, Portainer, Argo CD, Kubelet. The substrate AI runs on.

ports 2375 · 2379 · 6443 · 10250
3,014 etcd · 912 Vault · 4,105 Consul
What it is

Every modern LLM deployment runs on container infrastructure. The substrate layer is technically not LLM-specific: the Docker daemon, etcd (k8s/standalone), HashiCorp Vault (secrets), HashiCorp Consul (service mesh), Portainer (UI), Argo CD (continuous deployment), and the kubelet itself. Unauthenticated exposure here is sometimes more impactful than exposure of the LLM service it carries. Docker socket exposure = container escape = root on host. etcd unauth = full k8s state dump. Vault uninitialized = anyone calls /v1/sys/init and becomes the operator.

What goes wrong

The framework defaults vary across the layer:

- Docker daemon on TCP 2375 ships without auth in the official documentation's "remote API" examples; operators copy-paste the config and forget the TLS step. Population-scale unauth rate: high.
- etcd v2 API (/v2/keys) ships without auth in older deployments; v3 default is gRPC-auth-on but operators frequently turn it off.
- Vault is auth-on-default at the framework layer; the only unauth surface is the /v1/sys/init bootstrap endpoint, which is intentionally open until the first init call. Uninitialized Vaults are a one-shot full-takeover surface.
- Consul ships with ACLs disabled by default in framework config (Tier-A); 100% of reachable Consul instances at population scale have ACL off.
- Argo CD is auth-on-default (Tier-C). 99.93% of the population is properly gated; ~0.07% set the anonymous-read template-config and leak app inventories.

How we test

Each substrate platform has its own identity-and-state probe. Docker: GET /version. etcd: GET /version + GET /v2/keys?recursive=false (top-level keys only). Vault: GET /v1/sys/seal-status + GET /v1/sys/init (sealed / unsealed / uninitialized). Consul: GET /v1/agent/self + GET /v1/catalog/services. We never read secret values, never issue PUT or DELETE requests, and never POST to /v1/sys/init. The presence of the substrate at the public boundary is the finding; the operator's k8s topology, secret-engine mounts, and service catalog leak as metadata even when the data layer is gated.
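
The three Vault states can be read off a single captured seal-status body. A minimal sketch, assuming the response carries boolean `initialized` and `sealed` fields (older Vault versions may answer an uninitialized instance differently); the function name is ours.

```python
import json

def vault_state(seal_status_json: str) -> str:
    """Classify a captured GET /v1/sys/seal-status body into one of three states."""
    status = json.loads(seal_status_json)
    if not status.get("initialized", False):
        # Nobody has run the bootstrap yet: the one-shot takeover surface.
        return "uninitialized"
    # Treat a missing "sealed" field as sealed, the conservative reading.
    return "sealed" if status.get("sealed", True) else "unsealed"
```

Only the "uninitialized" result changes the severity; sealed and unsealed instances are recorded as presence findings.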

Data Labeling

T2

Label Studio, Argilla, CVAT, Doccano, Prodigy

ports 6900 · 8080 surveyed
348 confirmed · ~99% auth-on at /v1/projects (auth-off thesis breaks here)
What it is

Models learn from labels. A data-labeling platform is the editing environment where humans annotate the raw data: boxes around objects, classifications on text, transcriptions of audio, span-level reasoning traces for RLHF. Label Studio (HumanSignal) is the universal multi-modal one; Argilla (Hugging Face) is the LLM-centric one; CVAT (Intel/Roboflow) owns computer-vision; Doccano is the lightweight NLP option; Prodigy (Explosion) is the paid serious one. The dataset that comes out of these tools is what the next model gets trained on. The labelling stack is upstream of model behaviour itself.

What goes wrong

The platform exposes two things. First, the raw data being labelled: often unredacted medical images, customer support transcripts, legal documents. Second, the labels themselves, which encode the operator's labelling rubric and frequently the model bias they are trying to amplify or correct. Default deployments have weak credentials (admin/admin is alarmingly common in the Label Studio Docker Compose examples) or token-based auth that operators share in Slack and forget to rotate.

How we test

We confirm the platform via its /version endpoint, then list projects wherever the API surface answers without authentication (Label Studio's /api/projects, on the small share of installs that leave auth off). Project names plus task counts tell the story: a project called "medical-imaging-batch-7" with 12,000 tasks is a healthcare operator; a project called "red-team-prompts" with a few hundred tasks is an AI lab's safety team. We never download tasks. The metadata characterises both the data class and the operator function.

Document Parsers

T1

Unstructured, LlamaParse, marker, MinerU, Docling

ports 9998
Apache Tika · ingestion-time exposure
What it is

Before a document gets embedded, it has to be turned into clean text. PDFs have layout. Word documents have tables. Slide decks have hierarchy. The document-parsing layer extracts all of that into the markdown-or-JSON the embedder expects. Unstructured.io is the multi-format incumbent. LlamaParse (LlamaIndex) is the cloud-API competitor optimised for RAG. marker is the open PDF-to-markdown specialist; MinerU (OpenDataLab) is the high-quality alternative. Docling (IBM) is the newer research-grade option. Every serious RAG pipeline runs documents through one of these before they ever reach the vector DB.

What goes wrong

The parser server processes a stream of operator-uploaded documents and caches them on disk. When the parser is exposed without auth, the document queue and the parsed-output cache are both reachable. That cache often contains documents the operator has marked private. Internal contracts, HR files, partner agreements, all sitting in plaintext markdown form on the parser's disk. The parser is also a high-CPU service: an attacker can submit large or malicious PDFs to either burn the operator's compute or trigger known parsing-library RCEs.

How we test

We probe the parser's REST endpoint for the version banner and check the status endpoint for queue depth and recent-job filenames. The filenames characterise the operator's document inventory and frequently identify the legal or business unit the parser is serving. We do not submit documents.

Fine-tuning Runtimes

T2

Axolotl, LLaMA-Factory, Unsloth, torchtune

Axolotl / LoRA / unsloth runtimes
What it is

Fine-tuning is the bridge between a generic foundation model and the operator's actual product. Axolotl (Wing Lian) is the YAML-driven post-training framework most labs reach for. LLaMA-Factory is the all-in-one Chinese-ecosystem fine-tuner with a polished WebUI. Unsloth makes LoRA fine-tuning faster and cheaper on consumer GPUs. torchtune (PyTorch) is Meta's official lightweight option. TRL (Hugging Face) provides the underlying RLHF/DPO trainers. Together they are where the operator's training data, base model choice, and tuning recipe live.

What goes wrong

A fine-tuning host is a workstation with the operator's training corpus on local disk, their Hugging Face token in the environment, their base model weights downloaded to a local cache, and their output adapter weights in a results directory. LLaMA-Factory's WebUI ships without auth on first boot; Axolotl jobs running under tmux/screen leave the dataset filename visible in the process list of any reachable monitoring endpoint. The exposure is the operator's entire training strategy: what data, what base model, what hyperparameters, what they're trying to teach the model to do.

How we test

We probe LLaMA-Factory's WebUI for the version banner and the recent-jobs endpoint, Axolotl's Prometheus metrics for active runs, and any job scheduler integration that surfaces the dataset path. Job names and dataset filenames tell the story without our needing to read training data.

GPU Compute & Telemetry

T1

Run:AI, NVIDIA DCGM-exporter, Bright Cluster Manager, Slurm REST. Fleet metrics + scheduling.

ports 8265 · 9400
NVIDIA DCGM + Ray + Vast.ai + RunPod
What it is

The GPU-compute tier is the metrics and scheduling plane beneath every LLM training and inference deployment. NVIDIA's DCGM-exporter publishes Prometheus metrics from each GPU (utilization, memory, temperature, power), with a Hostname tag the operator sets to identify the box. Run:AI (now NVIDIA Run:AI) and NVIDIA Bright Cluster Manager orchestrate fleets of GPUs across clusters. Slurm REST is the HPC-tier scheduler.

What goes wrong

DCGM-exporter is a Prometheus exporter. The framework assumes the metrics endpoint sits inside a private network. There is no application-level authentication; auth is meant to come from the operator's network ACL. Operators who expose :9400 to the public internet inherit "no auth" by deployment-config mistake, not framework-default mistake. The leak is rich: GPU model, operator-set hostname, utilization timeline. The combination fingerprints what's being trained (LLM training, CV training, and inference each have a distinct utilization signature). Operators running H100, H200, A100, RTX PRO 6000 Blackwell-class hardware are exposing six-figure compute fleets at the metrics layer.

How we test

We probe :9400/metrics and parse the Prometheus text for DCGM_FI_DEV_GPU_UTIL, modelName="...", and Hostname="..." labels. Operator hostnames are operator-attribution-rich (video-gpu007-mojo-mia.vs3.com discloses a video-AI rental operator with a Miami location). We do not scrape the time-series; instantaneous metrics suffice for severity. Run:AI dashboards, Bright Cluster Manager, and Slurm REST get their own fingerprint pathways; for each, we read identity-only and never invoke a job-submission endpoint.
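
The parsing step is plain Prometheus text handling. The sketch below pulls the `Hostname` and `modelName` labels and the instantaneous value out of each DCGM_FI_DEV_GPU_UTIL sample in a captured /metrics body. The function name and regular expressions are illustrative assumptions; the label regex does not handle a literal `}` inside a label value, which we have not seen in exporter output.

```python
import re

# One sample line: metric name, {labels}, then the value.
SAMPLE_RE = re.compile(r'^DCGM_FI_DEV_GPU_UTIL\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
# key="value" pairs, allowing backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def parse_gpu_util(metrics_text: str) -> list:
    """Return one row per GPU utilisation sample in a Prometheus text body."""
    rows = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comments (# HELP / # TYPE) and other metrics are skipped
        labels = dict(LABEL_RE.findall(match.group("labels")))
        rows.append({
            "hostname": labels.get("Hostname"),
            "model": labels.get("modelName"),
            "util_percent": float(match.group("value")),
        })
    return rows
```

One pass over a single scrape is all this needs, which matches the instantaneous-only rule above.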

Medical / Edge AI

T2

DICOM, MONAI, Orthanc, dcm4che, NVIDIA NIM, NVIDIA Clara. Clinical and edge model serving.

ports 4242 · 8042 · 11112
DICOM / Orthanc / MONAI / NVIDIA Clara · NIM
What it is

The medical-AI tier covers everything from a DICOM image archive (the canonical storage format for medical imaging) to specialty inference servers tuned for clinical workloads. Orthanc is the most-deployed open-source DICOM PACS. dcm4che / dcm4chee-arc is the Java-based enterprise option. DICOMweb (QIDO-RS, WADO-RS, STOW-RS) is the HTTP-API standard. MONAI Label is the NVIDIA-sponsored medical-imaging annotation server. NVIDIA NIM is NVIDIA's containerized model-serving platform increasingly used for clinical inference.

What goes wrong

DICOM servers ship without auth by default on the DICOM protocol port (104 / 11112) and frequently on the HTTP plugin (DICOMweb on :8042 for Orthanc, varied for dcm4che). Operators frequently use the protocol port as the public boundary, with no auth, because that's what every DICOM tutorial in 2010-2018 told them to do. Once reachable, the QIDO-RS endpoint discloses studies (patient identifiers, accession numbers, modality, study date). Orthanc's REST API exposes the same plus image data. MONAI Label's /info/ discloses loaded trainers and datasets, operator-attribution-rich for any deployment doing custom fine-tuning.

How we test

For each platform we probe the documented identity endpoint: Orthanc's /system, dcm4che's /dcm4chee-arc/aets, DICOMweb's /studies (with ?limit=1), MONAI's /info/, NIM's /v1/metadata. We confirm protocol shape (DICOM tag 0020000D is the StudyInstanceUID; its presence in a JSON ?limit=1 response is the high-confidence DICOM marker). We never fetch image data; the study and series counts plus the operator-attribution metadata are sufficient for severity. Disclosure pathways are clinical-data adjacent (HIPAA / GDPR / equivalent) and follow the hold-cluster-detail rule until acknowledged.
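
The protocol-shape check reduces to one lookup. A minimal sketch, assuming the ?limit=1 response is a DICOM JSON array whose elements are keyed by eight-character tag strings; the function name is hypothetical, and the check deliberately ignores the tag's value.

```python
import json

STUDY_INSTANCE_UID = "0020000D"  # DICOM tag (0020,000D)

def looks_like_dicomweb(body: str) -> bool:
    """True when a captured /studies?limit=1 body has the DICOM JSON shape."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # HTML error pages, empty bodies, and similar
    if not isinstance(payload, list) or not payload:
        return False
    first = payload[0]
    # Presence of the tag is the marker; its value is never inspected here.
    return isinstance(first, dict) and STUDY_INSTANCE_UID in first
```

Checking for the key rather than reading its value keeps the probe on the identity side of the line.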

MLOps & Model Registries

T1

MLflow, W&B, Kubeflow, Metaflow

ports 1984 · 5000 · 8008 surveyed
MLflow surveyed · 18% had attacker-injected CVE-2023-1177 artifacts
What it is

When you train models for a living you need to track every experiment: hyperparameters, metrics, artefacts, the model file itself. MLflow (Databricks) is the open default; Weights & Biases is the polished SaaS incumbent; Kubeflow Pipelines, Metaflow (Netflix), and Comet sit in the same niche. The tracking server is the operator's training history, their model registry, and increasingly the artefact store from which production deployments pull.

What goes wrong

MLflow's tracking server ships with authentication off by default. The maintainers' guidance is to deploy behind a reverse proxy. Most operators don't, and the consequences are richer than they look. CVE-2023-1177 turns the artefact-fetch endpoint into a path-traversal file-read primitive: an attacker creates an experiment with a crafted artefact URI and reads any file the MLflow process can read. We have found instances where this CVE was actively being exploited by other parties: the attacker's experiment runs were sitting alongside the operator's, with names like "recon", "shell", "check". The operator's model files, training data paths, and cloud-credential filenames all leak in the same way.

How we test

We probe /api/2.0/mlflow/experiments/search for the experiment inventory and /api/2.0/mlflow/runs/search for run-level metadata. We do not invoke the artefact-fetch endpoint. Operator attribution comes from the experiment names, the model architecture (visible in the params), and frequently the dataset URIs which point at named S3 buckets. Where we see signs of third-party exploitation we escalate the disclosure with the threat-actor evidence included.
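
Spotting third-party activity in the experiment inventory can be sketched as a name filter. This assumes the search response carries an `experiments` array whose entries have a `name`; the name set is limited to the three examples quoted above, and the function name is ours. It is a triage aid, not a detection rule: a hit only prompts a closer manual look.

```python
import json

# Names observed on exploited instances, taken from the examples above.
SUSPICIOUS_NAMES = {"recon", "shell", "check"}

def flag_experiments(search_json: str) -> list:
    """Return experiment names that match the known attacker-style names."""
    experiments = json.loads(search_json).get("experiments", [])
    return [
        exp["name"]
        for exp in experiments
        if exp.get("name", "").strip().lower() in SUSPICIOUS_NAMES
    ]
```

Exact matching is intentional: an operator's own "health-check-v2" experiment should not be flagged.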

Model Hubs & Registries

T2

HF Hub mirrors, ModelScope, Replicate, BentoML

HF / ModelScope / Replicate / BentoML
What it is

Models live somewhere. The Hugging Face Hub is the public default; ModelScope (Alibaba) is the Chinese equivalent; Replicate hosts serverless fine-tunes; BentoML's BentoCloud is the Python-native deployment registry. Most large operators run a private mirror of one of these so internal teams can pull models without going to the public internet, and so the operator's own fine-tunes can be versioned and served through the same interface every framework already speaks.

What goes wrong

A private hub mirror is a webhook target, a model-weights store, and an API token issuer all in one. When the mirror is exposed without auth, an attacker pulls every model the operator has uploaded, including private fine-tunes that contain the operator's training-data leakage. Worse, most mirrors implement the same /api/models/upload endpoint as the upstream, so an attacker can push a malicious model into the operator's namespace and wait for an internal team to pull it. The supply-chain risk is real: pickle-based PyTorch checkpoints can execute arbitrary Python on load.

How we test

We probe /api/models (HF-compatible) or /api/v1/models (ModelScope) for the inventory and read the model card metadata. Model names plus sizes characterise the operator and identify private fine-tunes. We do not download weights. Where the upload endpoint is reachable we confirm reachability with an OPTIONS request and stop.

Object Storage

T2

MinIO, S3, model & dataset stores

ports 9000 surveyed
852 MinIO surveyed · 0% anonymous-list (auth works here)
What it is

Models and datasets are big (gigabytes to terabytes per artefact), and the universal storage substrate for them is S3-compatible object storage. MinIO is the self-hosted on-prem option (also bundled with most RAG distributions like Dify); AWS S3, Google Cloud Storage, and Cloudflare R2 are the cloud variants; Garage and SeaweedFS are the smaller open alternatives. Every model registry, every fine-tuning job, every RAG document loader writes through one of these.

What goes wrong

MinIO ships with the credentials minioadmin / minioadmin and a public console on port 9001. Most operators change the password but leave the console reachable; many leave the API on port 9000 with a public bucket policy that reveals the bucket inventory. The buckets are typically named after the project (model-weights, training-data-2026, customer-uploads), and the keys inside them describe the artefact lifecycle. S3 buckets exhibit the same pattern at a different scale: misconfigured bucket policies, public ACLs from old aws s3 sync --acl public-read mistakes, and the now-classic "bucket name is the company name plus production" enumeration vulnerability.

How we test

We list buckets through the unauthenticated MinIO admin API where reachable, and check S3 buckets via probabilistic name enumeration (no brute-force, just the patterns that fall out of the operator's known naming conventions). We confirm exposure with a single HEAD against a bucket-listing URL; we do not download objects. Bucket names plus their key-prefix structure are the disclosure evidence.

OLAP / Analytics Backends

T2

ClickHouse, Apache Cassandra, ScyllaDB, Apache Pinot. The trace / log / analytics tier beneath observability.

ports 8123 · 9000 · 9042
ClickHouse / Cassandra / ScyllaDB / Pinot — in progress
What it is

Every modern LLM observability stack writes its traces, metrics, and call-history into a columnar OLAP backend. ClickHouse is dominant: it's the storage tier under SigNoz, Phoenix-on-OTLP, PostHog product analytics, Plausible, and many custom in-house observability platforms. Cassandra, ScyllaDB, and Apache Pinot fill adjacent niches. When the upstream observability tool is itself unauthenticated, the OLAP backend is also typically reachable on the same host, often on its own default port.

What goes wrong

ClickHouse's official Docker image creates a default user with no password. The operator must set CLICKHOUSE_USER and CLICKHOUSE_PASSWORD at container start, or modify users.xml. At population scale, ~18% of reachable ClickHouse instances skip this step. The exposure surface is the operator's entire app schema: database names disclose what the operator stores (signoz_traces, posthog, plausible_events_db, custom vllm_service, ai_hedge_fund, scentedai_fragid_new). LLM call traces, with full prompt and response bodies, often land here.

How we test

We send GET /ping to confirm a ClickHouse server, then GET /?query=SELECT+version() (read-only sanity check) and GET /?query=SHOW+DATABASES+FORMAT+JSON for the database list. Database and table names are the finding; we never SELECT * FROM any user table. For Cassandra/Scylla, the TCP banner on port 9042 confirms identity. The classification is intel-disclosure-tier, not RCE-tier. For observability backends, the disclosed information is exactly the LLM call history the operator is trying to keep private.
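
Turning the database list into a finding is a lookup against known schema names. A minimal sketch, assuming the SHOW DATABASES ... FORMAT JSON body has a `data` array of `{"name": ...}` rows; the product mapping covers only the three schema names mentioned above, and the function name and built-in list are illustrative.

```python
import json

# Schema names from the examples above, mapped to the upstream product.
KNOWN_SCHEMAS = {
    "signoz_traces": "SigNoz",
    "posthog": "PostHog",
    "plausible_events_db": "Plausible",
}
# Databases present on a stock server; these carry no operator signal.
BUILTIN = {"default", "system", "information_schema"}

def classify_databases(show_databases_json: str) -> dict:
    """Split a captured database list into known products and custom names."""
    rows = json.loads(show_databases_json).get("data", [])
    known, custom = {}, []
    for row in rows:
        name = row.get("name", "")
        if not name or name.lower() in BUILTIN:
            continue
        if name in KNOWN_SCHEMAS:
            known[name] = KNOWN_SCHEMAS[name]
        else:
            custom.append(name)
    return {"known": known, "custom": custom}
```

The custom names are the attribution-rich part; the known ones mainly tell us which observability tool sits upstream.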

Search Engines

T2

Elasticsearch, Apache Solr, Meilisearch, Typesense, Vespa. Full-text + vector search backends.

ports 5601 · 9200 surveyed
5,037 ES with dense_vector schema · Sanctionscanner 79M KYB
What it is

Search engines power both the classic full-text retrieval tier (Elasticsearch, Apache Solr, Vespa) and the modern vector-similarity tier that LLM apps lean on for retrieval-augmented generation. The line between them has blurred since 2022: every mainstream engine (Elastic, OpenSearch, Solr 9, Vespa, Meilisearch, Typesense) now ships dense-vector indices alongside their inverted-index core. Many production RAG pipelines store their LangChain or LlamaIndex document chunks here rather than in a dedicated vector DB.

What goes wrong

The official Docker images ship with auth off by default. The operator must opt into security: set xpack.security.enabled=true for Elasticsearch, configure Solr's security.json to enable the basic-auth plugin, or set the Meilisearch master key via environment variable. Across population-scale surveys, ~54% of reachable Elasticsearch instances skip the step entirely. Solr's older Docker tags (solr:7.x) compound the problem with multiple unauthenticated remote-code-execution CVEs: CVE-2019-17558 (Velocity Template SSTI), CVE-2019-0193 (DataImportHandler), CVE-2019-12409 (JMX-RMI). The data layer itself discloses operator app schema via index and core names long before any document is read.

How we test

We probe each engine's identity endpoint (/ for Elasticsearch's version JSON, /solr/admin/info/system for Solr, /health for Meili and Typesense, /state/v1 for Vespa), confirm version, and then call the documented listing endpoint (/_cat/indices, /solr/admin/cores, /indexes, /collections). Index and core names are the finding: operators name things like rag-document-chunks, spring-ai-document-index, entity_vectors, kb_documents_v1. Disclosure of the operator's app architecture happens before any document fetch. We never run free-text queries against the index; the names alone justify the severity claim.
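
Index-name triage can be sketched as a token match rather than a substring match, which avoids flagging `storage-logs` just because it contains "rag". The sketch assumes the Elasticsearch listing was requested as JSON (for example with a `format=json` parameter, which is our assumption and not stated above) so that each row has an `index` field; the hint tokens are derived from the example names above and are illustrative.

```python
import json
import re

# Tokens suggested by the example names above; an illustrative list only.
HINT_TOKENS = {
    "rag", "chunk", "chunks", "vector", "vectors",
    "embedding", "embeddings", "kb", "document", "documents",
}

def flag_indices(cat_indices_json: str) -> list:
    """Return index names whose tokens suggest RAG or vector content."""
    flagged = []
    for row in json.loads(cat_indices_json):
        name = row.get("index", "")
        if not name or name.startswith("."):
            continue  # dot-prefixed indices are engine-internal
        # Split on separators so only whole tokens can match.
        tokens = set(re.split(r"[-_.]", name.lower()))
        if tokens & HINT_TOKENS:
            flagged.append(name)
    return flagged
```

The same function works for Solr core names or Meilisearch index names once they are reshaped into rows with an `index` key.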

Vector Databases

T1

ChromaDB, Qdrant, Milvus, Weaviate, Pinecone, pgvector, Elasticsearch

ports 5500 · 6333 · 6334 · 19530 · 9091 surveyed
142 unauth in Class-A · OnlyFans Milvus = 1.21M facial embeddings
What it is

A vector database stores high-dimensional embeddings (numerical fingerprints of text, images, and audio) and answers nearest-neighbour queries against them. It is the memory of every RAG system. The popular ones each carve out a slightly different niche: ChromaDB (the developer-friendly default), Qdrant (Rust-fast, popular in production), Milvus (the heavyweight enterprise option), Weaviate (schema-rich), Pinecone (managed-only), pgvector (Postgres extension), Elasticsearch with its dense_vector type (the one that already lives in your stack).

What goes wrong

Every popular vector DB ships with authentication off by default and a public listen socket. The exposure isn't theoretical, it's the contents. Every collection is named (often after the project: customer-support-knowledge, legal-discovery-q4, patient-notes-v2); every collection contains the embedded text in its metadata; many collections also contain the original source URL or document ID. Reading a single collection lets an attacker reconstruct most of the operator's internal corpus, plus the prompts the operator has been embedding (which often are customer queries).

How we test

We hit the heartbeat endpoint to confirm the engine, list collections via the unauthenticated metadata API, and read the first record's metadata only (never the raw vectors, never the bulk content). The collection names plus the metadata-schema fields are sufficient evidence of exposure. For operators we already know (universities, medical centres, financial institutions) we draft the disclosure on collection names alone, which cleanly avoids touching the contents in any reportable way.

06
The observability layer
tracing, safety, prompt registries 3 categories

How operators see what their model is doing. LLM-specific tracing, prompt registries, safety eval harnesses. Often misconfigured to publish what the operator thinks they are observing privately.

AI Safety & Evals

T3

Guardrails, Nemo Guardrails, EvalHarness, lm-eval

surveyed
0 confirmed at population scale after methodology correction (substring FPs)
What it is

Eval harnesses measure model behaviour against benchmarks: capabilities, biases, refusal patterns, jailbreak resistance. lm-eval-harness (EleutherAI) is the universal capability-eval; EleutherAI's safety eval forks track refusal/harm rates; NVIDIA Nemo Guardrails and Guardrails AI sit in front of production models and constrain output in real time; Inspect (UK AI Safety Institute) and Anthropic's evals are the research-grade options. Together they are how a serious AI deployment knows whether the model is doing what it's supposed to.

What goes wrong

Eval and guardrail systems hold the operator's threat model: the prompts they consider harmful, the responses they consider unacceptable, the policy they want enforced. When an eval-harness server is exposed unauthenticated, an attacker reads the full set of red-team prompts the operator uses, learns which of those prompts the model currently fails, and gets a precise roadmap to bypass the operator's guardrails. The exposure of a guardrail configuration is also a disclosure of the policy boundary itself.

How we test

We probe for harness control endpoints (lm-eval's WebSocket UI, Nemo Guardrails' REST API on port 8000) and read the policy/eval inventory via the unauthenticated metadata endpoints. We never run new evals. The eval names alone are the disclosure evidence. Names like "jailbreak-bench", "medical-refusals", "copyright-output" characterise the operator's concerns and identify their team without our needing to read prompt bodies.

LLM Observability

T1

Langfuse, Helicone, LangSmith, Phoenix, Lunary

ports 3000 · 6006 · 8787 surveyed
6 Phoenix + 3 TensorBoard, all unauth · live SDXL training observed
What it is

Once an operator runs an LLM in production they need to see what it's doing. LLM observability platforms record every prompt, every completion, every tool call, every retrieved document, with token-cost and latency overlays. Langfuse is the open self-hostable leader; Helicone is the proxy-based one; LangSmith (LangChain) is the SaaS option; Arize AI's Phoenix is the open-source agent development & evaluation platform; Lunary sits in the same space. Together they are the AI equivalent of Datadog, the system of record for everything the model has done.

What goes wrong

The trace store is the operator's most sensitive AI artefact. It contains every customer prompt verbatim (which is often customer PII), every retrieved document (which is often the operator's private corpus), and every tool call with full arguments (which is often credentials in plain text). Langfuse ships with a project-key model that operators sometimes bypass by enabling the public-projects feature for "share a trace with my colleague" workflows and forgetting to disable it. The traces become indexed and crawlable by default after that.

How we test

We probe /api/public/projects and /api/public/traces for the trace inventory; the response shape confirms Langfuse and reveals project names along with first-seen and last-seen timestamps. We never read trace bodies. Project names attribute the operator (most are "customer-support-prod", "sales-enrichment", etc.) and the date range characterises the corpus volume. Trace counts in the millions on a single project warrant priority disclosure.
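
The triage rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the parameter names are hypothetical rather than the platform's actual response fields, timestamps are assumed to be ISO 8601 strings without a timezone suffix, and the one-million threshold is our reading of "in the millions", not a documented cut-off.

```python
from datetime import datetime

# Assumption: "trace counts in the millions" is read as >= 1,000,000.
PRIORITY_TRACE_COUNT = 1_000_000


def triage_project(name: str, first_seen: str, last_seen: str,
                   trace_count: int) -> dict:
    """Summarise one project from inventory-level fields only.

    No trace bodies are involved: the inputs are the project name,
    the first-seen and last-seen timestamps, and a trace count.
    """
    first = datetime.fromisoformat(first_seen)
    last = datetime.fromisoformat(last_seen)
    return {
        "project": name,
        "span_days": (last - first).days,  # whole days covered by the corpus
        "trace_count": trace_count,
        "priority": trace_count >= PRIORITY_TRACE_COUNT,
    }
```

The date span and the count together characterise corpus volume, and the boolean flag marks the projects that would go to the front of the disclosure queue.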

Prompt Management

T2

PromptLayer, Promptly, Pezzo, Agenta

ports 3000 · 8000
Langfuse · Phoenix · custom registries
What it is

Prompts are code, and code that lives inside f-strings is hard to manage at scale. Prompt management platforms version, A/B-test, and govern prompts the same way GitHub manages source. PromptLayer is the SaaS leader. Pezzo, Promptly, and Agenta are the open-source alternatives. LangSmith (LangChain) and Langfuse also overlap into this space from the observability side. Together they are how a prompt becomes a versioned, deployable artifact instead of a magic string copy-pasted into twelve services.

What goes wrong

The prompt store is a verbatim record of every system prompt and template the operator has written, including the ones they tried and rejected. When the platform is exposed without auth, an attacker reads the operator's entire prompt history, including the jailbreak-resistance prompts, the tone-of-voice instructions, the proprietary chain-of-thought patterns, and any embedded keys or URLs the operator pasted into prompt bodies. We have also seen credential-bearing webhooks defined in PromptLayer-style platforms, leaking the operator's downstream integrations.

How we test

We probe /api/prompts, /v1/prompts, or the platform-specific equivalent for the prompt inventory and read prompt names plus version counts. We do not read prompt bodies. The names alone ("customer-support-system-v3", "jailbreak-defense", "tone-formal") characterise the operator's product strategy without our needing to see the actual text.
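
Reducing an inventory to names plus version counts can be sketched as below. This is a minimal sketch, assuming the inventory has already been parsed into one dict per prompt version with a hypothetical `name` key; actual platforms shape this response differently, and any `body` field that happens to be present is simply never read.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping, Tuple


def prompt_inventory(entries: Iterable[Mapping[str, Any]]) -> List[Tuple[str, int]]:
    """Return (prompt name, version count) pairs, sorted by name.

    Assumption: each entry is one stored version of a prompt and carries
    its name under "name". No other field is accessed.
    """
    counts = Counter(entry["name"] for entry in entries)
    return sorted(counts.items())
```

For example, three stored versions of "customer-support-system" and one of "jailbreak-defense" reduce to two pairs, which is all the evidence the disclosure needs.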

§ 04 Methodology insights
84 methodology insights codified in the corpus; each lives at /methodology/insight-NN-*
  1. 01

    Protocol-strict surveys self-filter honeypots

    The protocol-shape gate is a stronger honeypot filter than IP-based blocklists.

  2. 02

    Single-template auth-off failures propagate at population scale

    Pattern detection on response uniformity is a powerful "single root-cause / many victims" classifier.

  3. 03

    Capabilities-object tool-schema leak

    Auth-gated invocation surfaces still leak structural information at the unauthenticated handshake layer.

  4. 04

    WHOIS-driven contact resolution is non-negotiable

    ARIN/RIPE/APNIC OrgName + OrgAbuseEmail from IP-WHOIS is the authoritative input for any disclosure recipient derivation. Filename-friendly identifiers are not institution-domain mappings.

  5. 05

    Same-day-remediation feedback loop

    Structured disclosures with embedded one-line fixes have an order-of-magnitude faster remediation rate than vague advisories.

  6. 06

    Single-word substring matching is unsound at population scale

    A platform fingerprint must require, at minimum: (a) a specific endpoint that the platform alone serves, (b) structured response (JSON parse + named field, or specific HTML title format), (c) anchored keyword match conjoined with (a) and (b).

  7. 07

    Shodan-facet bucketing inherits the substring-FP class

    Shodan's http.html: and product: matches are themselves substring-style filters at the indexer level. Apply Insight #6's conjunctive-matcher rule at the seed layer, not just the probe layer.

  8. 08

    Auth-bypass-via-misconfiguration is missed by entry-point-only fingerprints

    For application-tier surveys (RAG framework, LLM orchestration, BI dashboards, anything with a documented public-role config), entry-point fingerprints are insufficient. The probe must follow redirects and check for authenticated-state-only tokens on the post-redirect target.

  9. 09

    Cross-survey-correlation is a Shodan-free discovery vector with stacked-finding bias

    The existing nuclide.db ledger of confirmed exposures is itself a discovery substrate. Every IP NuClide has previously confirmed running an unauth Tier-A platform is a candidate for additional unauth platforms on adjacent ports. Cross-survey-correlation probes must always sweep alt-ports, not…

  10. 10

    Research/lab-instrument vendors ship web stacks with auth-disabled defaults

    Population-scale exposure is the default-config decision of the vendor, not a misconfiguration by the operator. Vendor-template means population-scale exposure.

  11. 11

    Source code is authoritative; bug reports are framing

    When a bug report claims that a vendor wrote X to a config, verify against the vendor's source repository and current release tag before accepting the framing. Config mutators that preserve keys they don't manage are a misattribution attack surface; the right verification path is grep on the…

  12. 12

    Hostname-routed SSO doesn't protect the IP-direct shadow

    When an operator deploys SSO at the application layer (authentik, OAuth proxy, Keycloak, oauth2-proxy, Traefik forward-auth, etc.) and binds it via the reverse proxy's hostname routing, every service that listens on the underlying host's IP, at any port, answers requests by IP and bypasses the…

  13. 13

    Shipping defaults are load-bearing for population-scale security posture

    When two products in the same category have similar customer overlap but ship with opposite security defaults, the population-scale security outcomes follow the defaults. Not the operators. A single env-var default (AUTH_ENABLE=False vs no toggle at all) can produce population-scale…

  14. 14

    Recon yield aligns with port-class operator intent, not port number

    When sweeping IP-direct-shadow ports for hidden surfaces on hosts already fronted by an SSO reverse proxy (see Insight #12), the productive selector is what class of service the operator was deploying, not the port's formal IANA assignment, popularity rank, or even whether the port number is…

  15. 15

    Shodan dork hits are not platform instances (the 50% rule)

    The number of hits returned by a Shodan dork is not the number of platform instances. Across the AI/LLM infrastructure surveys in 2026-04 and 2026-05, the population of hits that match a single-token title-based dork contains roughly half false positives, services that are not the target…

  16. 16

    A 200 from a platform endpoint is identity, not auth state

    When a platform endpoint returns HTTP 200 to an unauthenticated probe, that response confirms platform identity (the platform is alive at the URL, accepts requests, and chose to answer), but it does NOT classify the auth posture. The fingerprint must observe the actual data layer behind the…

  17. 17

    Platform-class operators are mono-platform at population scale

    When two platforms solve the same problem (e.g. LLM observability, vector storage, prompt management), operators install one of them per host. Across 789 hosts spanning four AI-observability platforms (Phoenix + Langfuse + Helicone + LangSmith), there are zero genuine IP-level overlaps. The…

  18. 18

    Storage-tier hygiene exceeds tracker-tier hygiene at population scale

    Across 49 cloud-provider buckets extracted from the artifact URIs of 120 critically-exposed unauthenticated MLflow trackers, 48 buckets (97.96%) are locked at the storage tier. One container has an anonymous-list ACL, and it was empty at probe time.

  19. 19

    SPA + headless API is a high-severity exposure tell

    When a single-page application is hosted on a CDN platform (Vercel, Cloudflare Pages, Netlify, GitHub Pages, etc.) and its bundled JavaScript calls a same-brand API host of the form https://api.<brand>.<tld>/..., the API host is almost always on infrastructure the operator manages directly, and…

  20. 20

    aimap's AI-service classifier needs the ML data tier, not just the inference tier

    aimap classifies a target by what AI/ML services it can fingerprint on that target's open ports. The catalog has been built incrementally around the inference and observability tiers: Ollama, vLLM, llama.cpp, MLflow, Phoenix, Langfuse, LangSmith, Helicone, Open WebUI, ChromaDB, Qdrant, Milvus,…

  21. 21

    Port-first discovery beats brand-dork discovery for low-footprint platforms

    The standard population survey is dork-then-confirm: write a Shodan dork that matches the platform's brand string, harvest the hits, confirm each one. That works when the platform's web frontend carries Shodan-indexable distinctive text.

  22. 22

    Protocol-strict handshakes are the only verifier for multi-protocol honeypot fleets

    Insight #1 established that protocol-strict handshakes filter honeypots: an exact JSON-RPC initialize envelope dropped AS63949 Linode honeypot pollution from 91.6% to 1.1% in the MCP survey. The medical/edge AI survey extends this, and surfaces the second-order pattern: modern honeypot fleets…

  23. 23

    Discovery-channel coverage is multiplicative

    A population survey can be sourced two ways: masscan-on-cloud-prefixes (scope a set of cloud /16 ranges, scan a port across all of them) or Shodan-walk (page through the Shodan-indexed result set for a brand dork or service-product facet). Each method has a coverage profile, and those profiles…

  24. 24

    Operator workload visibility via Ollama /api/show Modelfile SYSTEM prompts

    When Ollama is unauthenticated, the /api/tags endpoint discloses what models the operator installed. That is the canonical finding.

  25. 25

    Tier-C platforms produce ~0% unauth at population scale

    The auth-on-default thesis is falsifiable: a Tier-C platform (auth-on-default in framework) that landed at 5–25% unauth at population scale would break it. None have. The cumulative evidence base across the 2026-05 survey series:

  26. 26

    Shodan-facet FP rate escalates with token commonality

    Codified by Insight #15 (http.title:"LiteLLM API" → 5,391 hits, 2,710 real LiteLLM = 50% FP). Sharpened by the 2026-05-15 RVC voice-cloning survey (http.title:"RVC" → ~34 hits, ~6 real = ~82% FP). Now further sharpened by the 2026-05-16 ComfyUI survey:

  27. 27

    Docker-image-template dominance

    Three independent surveys on 2026-05-16 surfaced the same shape:

  28. 28

    A population state is not a daily rate (RETRACTED)

    The first version of this insight claimed 71.6% of the 5,037-host population was wiped by an automated extortion campaign in a 24-hour window. That framing is wrong as a 24-hour event rate. The corrected numbers come from re-probing the same host list 24 hours later.

  29. 29

    Snapshot vs delta

    A single observation of a population says one thing. Two observations say another. When a campaign has been running long enough to saturate the population, the snapshot reports history. Only the delta reports today.

  30. 30

    Multi-port identical responses identify honeypot fleets

    A real service occupies one port. A honeypot fleet that ships the same canned response on every port it has open is identifiable by that uniformity alone, with no need to decode any specific protocol.

  31. 31

    App-builder tools brand the OUTPUT, not the AGENT — anchor on agent API contract

    Source: code-assistants survey verification, 2026-05-18. Extends Insight #6 (conjunctive marker-anchored matchers) and Insight #15 (~50% real-rate on single-token dorks).

  32. 32

    Multi-service deception fleets emulate target-specific services for Shodan scanners; filter on body markers, not title

    Source: Jetson/TensorRT-edge population survey, 2026-05-18. Two distinct deception fleets surfaced in one survey: 22 hosts emulating Triton, 576 hosts emulating Shinobi. Distinct from the AS63949 Linode honeypot fleet documented in Insight #1's source case.

  33. 33

    Side-channel attribution via Docker registry catalog content when direct fingerprinting fails

    When the direct fingerprint for a target class (Shodan dork on title, body, port, banner) returns mostly false positives at population scale, look for an adjacent service the operator runs whose content reveals what the direct probe could not. Docker Registry V2 is the canonical such service:…

  34. 34

    Persistence without pressure — operator-unauth populations don't self-remediate

    Source: code-assistants population follow-up survey, 2026-05-18. Cross-referenced against Insight #28 (extortion-driven decay).

  35. 35

    Side-channel attribution has high precision and low recall; it is for targeted investigation, not population discovery

    Insight #33 establishes that operator-class attribution via adjacent-service content (Docker Registry /v2/_catalog) works when the operator's content carries class signals. The yield is high when the population is already selected for the class, and very low when the population is not.

  36. 36

    PaaS deployment automation bakes build-time env-vars into client JS bundles; secrets prefixed with NEXT_PUBLIC_ / VITE_ leak to every visitor

    When an operator deploys a Next.js or Vite app via a self-hosted PaaS (Dokploy, Coolify, Caprover, Easypanel) and declares a secret like LANGFUSE_SECRET_KEY with one of:

  37. 37

    Asymmetric auth gating: the dashboard requires login but the API does not; observability platforms accept unauthenticated trace ingestion even when the UI is locked

    Many AI observability + telemetry platforms ship with two distinct authentication surfaces on the same port:

  38. 38

    Hard-proof verification chain for exfiltrated-credential class findings; six steps from HTML-exposed key to verified operator data

    A finding involving a credential exposed in public HTML cannot be tiered without traversing the six-step verification chain. Each step verifies a discrete claim. Tier promotion happens at each step; the finding's final tier is determined by the deepest step verified.

  39. 39

    Pooled-account upstream proxy as attribution-laundering layer; one paid API account fans out to N unauthorized end-customers through a middle-tier relay

    A subset of LLM-resale fraud operations route through a three-tier architecture that flattens attribution from the upstream vendor's perspective:

  40. 40

    Auth-on-default thesis shifts rightward in successor OSS generations

    Codified: 2026-05-19 (sub2api population survey) Family: Insight #25 (auth-on-default thesis), Insight #36 (PaaS build-arg secret baking), Insight #39 (pooled-account attribution laundering) Falsifiability tier: medium — pattern needs at least one more successor-generation pair to confirm or break

  41. 41

    Admin-endpoint field-name enumeration is the Stage-2-deep verify primitive; secret-class field names at documented paths are the finding, no value read required

    For admin-style endpoints that return a long structured JSON dump (Envoy /config_dump, Spring Actuator /env and /configprops, Kong admin /config, Consul /v1/agent/self, Vault /sys/config/state/sanitized, Traefik /api/rawdata, NATS /varz), the Stage-2-deep verify primitive is enumeration of…

  42. 42

    LLM gateway model-name mismatch: proxies advertise premium model IDs while serving different backends. /v1/model/info is the authoritative discriminator; the motive (convenience alias vs fraud) requires per-host verification.

    Initial framing of "fraud" was incorrect. The operator is Jo Lab (jolab.ai, jolab.app), an academic biomedical AI research lab marketing "AI for Disease Prediction & Early Diagnosis." swatweb.org is their SWAT-web Sliding Window Association Test bioinformatics tool. No customer-facing "Claude…

  43. 43

    VisorSD multi-ASN grouped-OR query construction returns zero even when Shodan direct returns hundreds; the bug is in VisorSD's query templating, not Shodan's index.

    VisorSD's multi-ASN grouped-OR query construction can silently return zero where Shodan direct queries return hundreds. A zero-result VisorSD run against a known-populated ASN is a tooling failure signal, not a population signal. Always cross-validate a zero VisorSD result with a direct Shodan…

  44. 44

    Parallel aimap passes cannibalize each other's throughput via client-side socket pool contention; default to sequential or staged execution with the largest corpus running alone first.

    Running multiple aimap processes in parallel against large corpora degrades total throughput by roughly 3× compared to sequential execution, and can cause complete hangs (zero output after 36+ minutes). The bottleneck is client-side socket pool exhaustion: N concurrent aimap binaries each…

  45. 45

    Niche Shodan dork yield follows a stable class hierarchy: Server-header > frontend-bundle-ID body > route-slug body. Route-slug dorks fail because Shodan crawls root HTML, not JS bundle source.

    Shodan dork yield for AI/LLM infrastructure follows a stable three-tier class hierarchy:

  46. 46

    TLS certificate subject CN is a precise operator-attribution surface; operators who embed platform brand names in cert CN are doing intentional TLS termination, making cert-CN dorks stable against CDN proxying and more precise than HTML body matching.

    An operator who names a TLS certificate after the AI platform they're running (openai.mycompany.com, litellm-prod, ollama-inference) has:

  47. 47

    TLS cert subject CN is an operator-attribution surface, NOT a platform-confirmation or auth-state surface. CN-identified operators are the intentionally-configured class; they are inversely correlated with auth-off-default posture.

    Two populations. Inverse correlation with auth posture.

  48. 49

    Ollama-Cloud-signin × public-exposure = LLMjacking surface; the operator's Ollama Cloud subscription quota is billable by any public caller

    An Ollama instance meeting BOTH of these conditions exposes the signed-in operator's Ollama Cloud subscription quota to public invocation:

  49. 50

    OVMS Backend Co-location: FastAPI Wrapper + OpenVINO Model Server Both Exposed

    Custom FastAPI embedding services often sit in front of an Intel OpenVINO Model Server (OVMS) backend on a co-located port. When the FastAPI wrapper is exposed without auth, the OVMS backend is also exposed without auth — and on a different port than the wrapper.

  50. 51

    A port number names a candidate, not a finding

    Codified: 2026-05-21 (global university LLM-exposure map, service-verification pass) Family: Insight #25 (auth-on-default thesis), Insight #16 (no status code is identity). This is the population-scale measurement of the METHODOLOGY's load-bearing claim that verification, not scanning, produces…

  51. 52

    An HTTP 200 at an API path is not that API

    Codified: 2026-05-21 (global university LLM-exposure map, per-host arsenal triage) Family: Insight #16 (no status code is identity), Insight #51 (a port number names a candidate). This is the layer-7 analogue of #51: where #51 is a TCP connect mistaken for a service, #52 is an HTTP 200 mistaken…

  52. 53

    A hostname label is not a cloud project identifier

    Codified: 2026-05-21 (global university LLM-exposure map, per-host arsenal triage, Firebase candidate verification) Family: Insight #51 (a port number names a candidate), Insight #52 (an HTTP 200 is not that API), Insight #16 (no status code is identity). This one is the attribution-stage…

  53. 54

    Metabase setup-token: a self-authorizing credential class

    Codified: 2026-05-21 (embedding-tier2-2026-05-21 session — masscan sweep of OVH/Scaleway tier-2 cloud ranges) Family: Insight #39 (install-wizard-open / pooled-account attribution laundering), Insight #16 (no status code is identity), Insight #25 (auth-on-default thesis) Falsifiability tier:…

  54. 55

    Auth-gated API + Open Signup = Uncontrolled Account Creation

    Date: 2026-05-22 Survey anchor: Agenta LLMOps (14-host population) Finding class: First-party authentication configuration

  55. 56

    LangGraph self-identifying JSON root as primary fingerprint

    Date codified: 2026-05-25 Survey anchor: LangGraph Server population survey File: case-studies/commercial/langgraph-server-survey-2026-05-25.md

  56. 57

    Partial-auth failure: auth on collection endpoints, none on individual resource endpoints

    Date codified: 2026-05-25 Survey anchor: Survey-38 LangGraph — Stock.ai / EMOR AI (20.193.252.230) File: case-studies/commercial/stock-ai-emor-partial-auth-2026-05-25.md

  57. 58

    Vite dev server left running in production exposes full TypeScript source

    Date codified: 2026-05-25 Survey anchor: Survey-38 LangGraph — Assistent Tècnic Intel·ligent / Docu Companion (157.180.21.126) File: case-studies/commercial/docu-companion-vite-dev-server-2026-05-25.md

  58. 59

    Date: 2026-05-25 Survey anchor: n8n discovery, 38.102.86.8

  59. 60

    Redis Stack FT._LIST as Vector-Tier Enumeration Primitive

    Date: 2026-05-25 Survey anchor: Redis Stack / RedisInsight population survey (2026-05-25)

  60. 61

    RedisInsight /api/databases Returns Redis Passwords in Plaintext

    Date: 2026-05-26 Survey anchor: Redis Stack / RedisInsight population survey (2026-05-25)

  61. 62

    Survey anchor: Cat-09 code assistants, 2026-05-26 Codified: 2026-05-26 Status: Confirmed, population-verified

  62. 63

    Date: 2026-05-26 Survey anchor: Cat-04 stragglers — Prefect, Dask, ClearML, BentoML Status: Confirmed

  63. 64

    Date: 2026-05-26 Survey anchor: Cat-06 stragglers — Agno (AIRIAD Risk Advisor, Collision Analysis AgentOS) Status: Confirmed

  64. 65

    Date: 2026-05-27 Survey: Argo Workflows (Category 29 — K8s Workflow Orchestration) Anchoring data: 67 confirmed instances (ssl:"ArgoProj" population, 0 auth-bypass); 200 additional instances (ssl:"Argo Workflows" population, auth status pending)

  65. 66

    Date: 2026-05-27 Survey: Argo Workflows (Category 29), aimap v1.9.35 fix Anchoring data: 156 hosts, 111 on port 443, 0 on port 2746

  66. 67

    Voice/audio AI API servers are Shodan-dark behind JSON-only roots; only the demo UI indexes

    For the entire voice/audio AI category, the highest-severity surfaces are the ones Shodan cannot see. The OpenAI-compatible TTS/ASR API servers (GPT-SoVITS, Orpheus, Kokoro's API path, Deepgram on-prem, WhisperLive) return a JSON-only root or a non-root JSON status endpoint that the Shodan…

  67. 68

    The verification-rung grid. Label every claim by a depth-and-breadth pair, and never use language above the rung its evidence reached

    Every finding carries a verification status expressed as a pair: an inner rung (depth, code vs live) and an outer rung (breadth, host vs population). The two axes are logically orthogonal, so they must not be collapsed into one ladder. The claim language is bound to the pair. State the pair in…

  68. 69

    A curated-port scan's negative is not a host-level negative; run a full-range population (Censys) as a standing complement

    When aimap (our AI-intent-curated port scanner) reports "no AI/ML service," that is a true statement about the ports and fingerprints it checked, not a statement about the host. The two are easy to conflate, and conflating them ships a confident, wrong "clean host" conclusion.

  69. 70

    Censys is a dual primitive: full-range ports give identity, protocol decoders give auth-state; never conflate the label with the decoder

    A Censys cross-reference returns two separable things, and treating them as one ships a wrong number. The first is identity: the full-range port sweep shows which services a host actually runs, including the data tier and second apps a curated AI-port scan never touches. The second is…

  70. 71

    The auth-on-default thesis has, until now, measured platforms that have an authentication layer and ship it on or off by default (Phoenix ENABLE_AUTH=False vs Langfuse no-toggle). Service-mesh introspection planes are a different and worse class: they have no authentication layer at all. Their…

  71. 72

    There is a failure class between "auth off by default" (#13) and "no auth layer at all" (#71): a platform that ships real authentication and a real authorization layer, both on by default, and then ships a self-registration knob that defaults open. The data endpoints are correctly gated, the…

  72. 73

    A fingerprinter that does not send the platform's content-negotiation header will get zero results from a platform that uses header-based API versioning, even when the platform is present, exposed, and unauthenticated at the identity endpoint. The absence is a tool artifact, not a population fact.

  73. 74

    An exposed AI gateway is categorically different from an exposed model server. A single unauth Ollama instance leaks one operator's inference surface. A single unauth AI gateway yields every upstream LLM provider API key the operator has wired in, across every provider (OpenAI, Anthropic,…

  74. 75

    Cert-pivot (VisorGraph / crt.sh) only works on HTTPS endpoints: there is no TLS handshake to intercept and no certificate to extract from a plaintext HTTP port. AI gateway admin APIs run HTTP-only by design:

  75. 76

    Survey: Cat-31 Data Labeling (Extended), 2026-06-01.

  76. 77

    Survey: MCP server population survey, 2026-06-02.

  77. 78

    Survey: Single-host ad-hoc assessments, xTom Japan (AS3258), 2026-06-05/06.

  78. 79

    Survey: Cat-OW (Open WebUI population survey) calibration pass, 2026-06-06.

  79. 80

    Status: confirmed at n=31 known-stage subset; validation at n>=100 blocked on funding-stage data, not method.

  80. 81

    Codified: 2026-06-07. Lane 1B of the 9-item plan. Source survey: data/platform-intel/mta-fingerprint-catalog-2026-06-07.md (6 MTAs probed; 4 live in docker, 2 source-characterized). Family: reference-haraka-docker-compose-leak (parent observation), Insight #78 (shared deployment kit operator…

  81. 82

    Codified: 2026-06-07. Cat-33 Phase 3B Lane B survey. Promoted to HIGH: 2026-06-07 (later same day). Cat-33 Phase 5 Lane D Slice B extension. 6/6 strict confirmations across two independent surveys. Source: data/platform-intel/cat33-lane-b-vendors-2026-06-07.md (Lane B, 3 vendors) +…

  82. 83

    Codified: 2026-06-07. Cat-33 Phase 3B Lane C survey. Source: data/platform-intel/cat33-lane-c-vendors-2026-06-07.md (3 vendors). Family: Insight #75 (HTTP admin ports kill cert-pivot), Insight #65 (TLS cert dork selection bias), Insight #71 (network placement as auth). Falsifiability tier:…

  83. 84

    Codified: 2026-06-07. Cat-33 Phase 5 Lane D Slice D survey over LiteLLM cloud-native guardrail hooks. Source: data/platform-intel/cat33-lane-d-slice-d-cloud-deltas-2026-06-07.md (3 cloud-native + 1 OSS framework). Family: Insight #74 (gateway-as-master-key-multiplier), Insight #78…

  84. 85

    Codified: 2026-06-07. Cat-33 Phase 5 Lane D Slice C survey. Source: data/platform-intel/cat33-lane-d-slice-c-specialized-2026-06-07.md (10 vendors probed, 2 confirmed stubs). Family: Insight #17 (platform-class-operators-are-mono-platform), Insight #51…
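
The conjunctive-matcher rule from Insight #6 can be sketched as a pure function over an already-captured response. This is a minimal sketch, not the corpus's actual matcher: the function name, the parameters, and the example platform values in the usage note are hypothetical, and the anchored keyword check is applied to the named field's value as one reasonable reading of condition (c).

```python
import json
import re


def conjunctive_match(path: str, body: str, *, expected_path: str,
                      required_field: str, keyword_pattern: str) -> bool:
    """Return True only when conditions (a), (b) and (c) all hold."""
    # (a) The response must come from the platform-specific endpoint.
    if path != expected_path:
        return False
    # (b) The body must parse as JSON and contain the named field.
    try:
        doc = json.loads(body)
    except (ValueError, TypeError):
        return False
    if not isinstance(doc, dict) or required_field not in doc:
        return False
    # (c) Anchored keyword match: the whole field value must match,
    #     so a bare substring hit elsewhere in the body cannot pass.
    value = doc[required_field]
    return isinstance(value, str) and re.fullmatch(keyword_pattern, value) is not None
```

With a hypothetical platform that answers `/api/status` with `{"service": "exampled 1.2"}`, the matcher accepts that response but rejects an HTML page that merely contains the word "exampled", which is the substring false-positive class the insight describes.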

37 categories · nine-layer topology · one public IPv4 internet compiled live from the AI-LLM-Infrastructure-OSINT corpus