Shodan Queries: AI/ML Infrastructure, NuClide Reference

Living catalogue of Shodan dorks for fingerprinting exposed AI/ML control-plane infrastructure.

Polished PDF reference: Shodan_AI_Reference.pdf, v2.1, April 2026 (markdown ahead at v2.2) Living markdown source: see queries/, these are the files to PR against.

How to Read the Tables

Every query is tagged with an exposure tier. Tiers let you triage Shodan result sets without re-deriving risk for every entry.

Tier	Meaning	How to use
T1	Unauthenticated by default	Service ships with no auth, or trivially-bypassed auth. A positive hit is typically a live, interactive target. Treat as immediately actionable.
T2	Requires misconfiguration	Service has auth by default but is commonly deployed without it, or has known auth-bypass CVEs. Positive hits need one additional probe to confirm exposure.
T3	Recon / fingerprint only	Identifies the presence of the service. Does not indicate auth status. Use for inventory, trend analysis, and pivoting.

Index

#	Category	Examples
1	LLM Orchestration Platforms	Flowise, Langflow, Dify, Open WebUI, Ollama, n8n, Clawdbot
2	Vector Databases	ChromaDB, Qdrant, Weaviate, Milvus, pgvector, MinIO/Harbor (artifact stores)
3	Model Serving & Inference	vLLM, Triton, TGI, llama.cpp, LM Studio, GPT4All, NVIDIA NIM
4	Training, Fine-Tuning & Experiments	MLflow, Kubeflow, Ray, ClearML, Argilla, Feast
5	AI Gateways, Proxies & Monitoring	LiteLLM, Portkey, Langfuse, Helicone
6	Agent Frameworks	SuperAGI, OpenDevin, MetaGPT, Clawdbot, AutoGen
7	RAG Stacks & Self-Hosted AI Apps (new)	h2oGPT, Danswer/Onyx, Quivr, Khoj, RAGFlow, LibreChat
8	Image Generation & Diffusion (new)	ComfyUI, Stable Diffusion, AUTOMATIC1111, InvokeAI, Fooocus
9	AI Code Assistants (new)	Tabby, self-hosted Cody, Continue, Refact, FauxPilot
10	MCP Servers (new)	Model Context Protocol over HTTP/SSE, filesystem, shell, DB tool surfaces
11	Credential Leaks & Misconfigs	OpenAI/Anthropic/Groq/Gemini keys, `.env` exposure, HF tokens
12	Container & Orchestration Infrastructure (expanded)	Docker daemon, Kubernetes, kubelet, etcd, Consul, Vault
13	Backup / Snapshot Exposure (new)	Qdrant snapshots, Weaviate backups, ES snapshots, HTTP-served dumps
14	GPU & Compute Dashboards	NVIDIA DCGM, RunPod, Vast.ai, GPUStack
15	Fingerprinting Canaries	Favicon hashes, generic FastAPI/OpenAI-style detection
16	BI / Dashboard / Visualization	Metabase, Apache Superset, Redash, Grafana
17	Speech & Audio AI	Whisper ASR, Coqui XTTS, Piper TTS, RVC, OpenVoice, Pipecat, LiveKit
18	Jupyter Notebook / JupyterHub	Jupyter Notebook, JupyterLab, JupyterHub (inventory + CVE-2026-33709 targeting)
19	Streamlit Data Apps	Streamlit port 8501; 551 confirmed unauth in survey, 100% no-auth-concept
20	Gradio / Stable Diffusion WebUI / Langflow	Gradio port 7860, A1111, Langflow, HuggingFace demos
21	Browser Automation / Agent Backends	Selenium Grid, Chrome DevTools Protocol (CDP), Browserless, Playwright, Skyvern
22	Data Labeling / Annotation Servers	doccano, Argilla, Label Studio, Prodigy, CVAT
23	AI Safety Evaluation / Red-Team Self-Hosted	Promptfoo, LangSmith, NeMo Guardrails, DeepEval, Garak, Lakera Guard
24	LLM Observability / Training Telemetry	Phoenix (Arize), TensorBoard, W&B self-hosted, ClearML
25	Elasticsearch / OpenSearch	ES 7.x auth-off, OpenSearch, Kibana; 42 unauth in survey incl. 79M KYB record AML platform
26	Mem0 / Agent Long-Term Memory Collections	Mem0 collection patterns on unauth Qdrant/ChromaDB backends
A	Appendix, High-Severity CVE Cross-Reference (new)	Ray, MLflow, Flowise, Ollama, ComfyUI, kubelet, etc.

Search across all queries

grep -rn "qdrant"   queries/
grep -rn "port:8000" queries/
grep -rn "API_KEY"  queries/
grep -rn " T1 "     queries/    # all T1 (unauth-by-default) queries

Adding a new query

Find the category file it belongs in (or open an issue to propose a new category).
Add a row to the appropriate Markdown table, include a tier (T1/T2/T3).
Add a Notes cell when the query reveals something specific, auth state, version, snapshot exposure, default credentials.
Open a PR. See CONTRIBUTING.md.

Versioning

The PDF reference is regenerated periodically from the markdown sources. Check the date on the cover page; the markdown is always more current.

Current PDF: v2.1 · April 2026

v2.0 added four new sections (RAG Stacks, Image Generation, AI Code Assistants, MCP Servers), expanded Container/Orchestration to cover k8s/kubelet/etcd/Docker Registry v2, tagged every query with an exposure tier, and added Appendix A.
v2.1 folds in a new Object Storage & Artifact Stores subsection under Vector Databases (MinIO, Harbor, image registries where AI models, vectors, and snapshots live), adds ClickHouse / Cassandra / txtai / Feast / Tecton entries, introduces GPT4All / NVIDIA NIM / AutoGen coverage, and ships a terminology primer for readers newer to the stack.
v2.2 adds Audio/Speech/Vision inference (whisper.cpp, faster-whisper, Coqui, Piper, Bark, Vocode, PaddleOCR) + SGLang / LMDeploy / Aphrodite / Seldon under §3; Dagster, Weights & Biases self-hosted, wandb-local, CVAT, Doccano, Humanloop, Kubeflow Pipelines under §4; PromptLayer, Kong/Tyk AI plugins, Unify router under §5; OpenHands and AutoGPT-Next-Web under §6; a transport-agnostic MCP jsonrpc/tools/list fingerprint under §10; and Mistral / DeepSeek / raw sk-ant- key leaks plus .claude/settings.json exposure under §11.