Shodan Queries: AI/ML Infrastructure
Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/readme
Living catalogue of Shodan dorks for fingerprinting exposed AI/ML control-plane infrastructure.
Polished PDF reference: Shodan_AI_Reference.pdf, v2.1, April 2026 (markdown ahead at v2.2)
Living markdown source: see queries/. These are the files to open PRs against.
How to Read the Tables
Every query is tagged with an exposure tier. Tiers let you triage Shodan result sets without re-deriving risk for every entry.
| Tier | Meaning | How to use |
|---|---|---|
| T1 | Unauthenticated by default | Service ships with no auth, or with trivially bypassed auth. A positive hit is typically a live, interactive target. Treat as immediately actionable. |
| T2 | Requires misconfiguration | Service has auth by default but is commonly deployed without it, or has known auth-bypass CVEs. Positive hits need one additional probe to confirm exposure. |
| T3 | Recon / fingerprint only | Identifies the presence of the service. Does not indicate auth status. Use for inventory, trend analysis, and pivoting. |
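As a rough illustration of how the tiers can drive triage, the sketch below buckets result rows by tier so that T1 hits are reviewed first. The `Hit` record, its field names, and the `triage` helper are hypothetical; the catalogue itself does not define a result format, and the action strings are paraphrases of the table above.

```python
from dataclasses import dataclass

# Tier meanings paraphrased from the table above.
TIER_ACTIONS = {
    "T1": "immediately actionable: likely a live, interactive target",
    "T2": "needs one additional probe to confirm exposure",
    "T3": "inventory only: says nothing about auth status",
}


@dataclass
class Hit:
    """Hypothetical record for one result row (not a Shodan API type)."""
    host: str
    query: str
    tier: str


def triage(hits):
    """Group hits by tier in T1, T2, T3 order; reject unknown tiers."""
    buckets = {tier: [] for tier in TIER_ACTIONS}
    for hit in hits:
        if hit.tier not in buckets:
            raise ValueError(f"unknown tier {hit.tier!r} for {hit.host}")
        buckets[hit.tier].append(hit)
    return buckets
```

Iterating over the returned dictionary visits T1 first, so a reviewer can work through the most urgent bucket before moving on to hits that need a confirming probe.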
Index
| # | Category | Examples |
|---|---|---|
| 1 | LLM Orchestration Platforms | Flowise, Langflow, Dify, Open WebUI, Ollama, n8n, Clawdbot |
| 2 | Vector Databases | ChromaDB, Qdrant, Weaviate, Milvus, pgvector, MinIO/Harbor (artifact stores) |
| 3 | Model Serving & Inference | vLLM, Triton, TGI, llama.cpp, LM Studio, GPT4All, NVIDIA NIM |
| 4 | Training, Fine-Tuning & Experiments | MLflow, Kubeflow, Ray, ClearML, Argilla, Feast |
| 5 | AI Gateways, Proxies & Monitoring | LiteLLM, Portkey, Langfuse, Helicone |
| 6 | Agent Frameworks | SuperAGI, OpenDevin, MetaGPT, Clawdbot, AutoGen |
| 7 | RAG Stacks & Self-Hosted AI Apps (new) | h2oGPT, Danswer/Onyx, Quivr, Khoj, RAGFlow, LibreChat |
| 8 | Image Generation & Diffusion (new) | ComfyUI, Stable Diffusion, AUTOMATIC1111, InvokeAI, Fooocus |
| 9 | AI Code Assistants (new) | Tabby, self-hosted Cody, Continue, Refact, FauxPilot |
| 10 | MCP Servers (new) | Model Context Protocol over HTTP/SSE, filesystem, shell, DB tool surfaces |
| 11 | Credential Leaks & Misconfigs | OpenAI/Anthropic/Groq/Gemini keys, .env exposure, HF tokens |
| 12 | Container & Orchestration Infrastructure (expanded) | Docker daemon, Kubernetes, kubelet, etcd, Consul, Vault |
| 13 | Backup / Snapshot Exposure (new) | Qdrant snapshots, Weaviate backups, ES snapshots, HTTP-served dumps |
| 14 | GPU & Compute Dashboards | NVIDIA DCGM, RunPod, Vast.ai, GPUStack |
| 15 | Fingerprinting Canaries | Favicon hashes, generic FastAPI/OpenAI-style detection |
| 16 | BI / Dashboard / Visualization | Metabase, Apache Superset, Redash, Grafana |
| 17 | Speech & Audio AI | Whisper ASR, Coqui XTTS, Piper TTS, RVC, OpenVoice, Pipecat, LiveKit |
| 18 | Jupyter Notebook / JupyterHub | Jupyter Notebook, JupyterLab, JupyterHub (inventory + CVE-2026-33709 targeting) |
| 19 | Streamlit Data Apps | Streamlit port 8501; 551 confirmed unauth in survey, 100% no-auth-concept |
| 20 | Gradio / Stable Diffusion WebUI / Langflow | Gradio port 7860, A1111, Langflow, HuggingFace demos |
| 21 | Browser Automation / Agent Backends | Selenium Grid, Chrome DevTools Protocol (CDP), Browserless, Playwright, Skyvern |
| 22 | Data Labeling / Annotation Servers | doccano, Argilla, Label Studio, Prodigy, CVAT |
| 23 | AI Safety Evaluation / Red-Team Self-Hosted | Promptfoo, LangSmith, NeMo Guardrails, DeepEval, Garak, Lakera Guard |
| 24 | LLM Observability / Training Telemetry | Phoenix (Arize), TensorBoard, W&B self-hosted, ClearML |
| 25 | Elasticsearch / OpenSearch | ES 7.x auth-off, OpenSearch, Kibana; 42 unauth in survey incl. 79M KYB record AML platform |
| 26 | Mem0 / Agent Long-Term Memory Collections | Mem0 collection patterns on unauth Qdrant/ChromaDB backends |
| A | Appendix, High-Severity CVE Cross-Reference (new) | Ray, MLflow, Flowise, Ollama, ComfyUI, kubelet, etc. |
Search across all queries
grep -rn "qdrant" queries/
grep -rn "port:8000" queries/
grep -rn "API_KEY" queries/
grep -rn " T1 " queries/ # all T1 (unauth-by-default) queries
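When a plain-text match such as " T1 " is too loose (it also matches prose that mentions a tier), a cell-aware filter may help. The sketch below is one possible approach, assuming the files under queries/ use a `.md` extension and pipe-delimited table rows with the tier in a cell of its own; `split_row` and `rows_with_tier` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def split_row(line):
    """Split a Markdown table row into stripped cells; return None for non-rows."""
    line = line.strip()
    if not line.startswith("|") or not line.endswith("|"):
        return None
    cells = [cell.strip() for cell in line[1:-1].split("|")]
    # Skip separator rows such as |---|---| (only dashes, colons, spaces).
    if all(set(cell) <= set("-: ") for cell in cells):
        return None
    return cells


def rows_with_tier(root, tier):
    """Yield (path, line_number, cells) for table rows with a cell equal to tier."""
    for path in sorted(Path(root).rglob("*.md")):  # assumption: .md sources
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            cells = split_row(line)
            if cells and tier in cells:
                yield path, number, cells
```

Calling `rows_with_tier("queries", "T1")` would then list only table rows tagged T1. The splitter is deliberately naive: a query containing a literal pipe character would be split into extra cells, so treat the output as a starting point rather than a definitive parse.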
Adding a new query
- Find the category file it belongs in (or open an issue to propose a new category).
- Add a row to the appropriate Markdown table and include a tier (T1/T2/T3).
- Add a Notes cell when the query reveals something specific: auth state, version, snapshot exposure, default credentials.
- Open a PR. See CONTRIBUTING.md.
Versioning
The PDF reference is regenerated periodically from the markdown sources. Check the date on the cover page; the markdown is always more current.
Current PDF: v2.1 · April 2026
- v2.0 added four new sections (RAG Stacks, Image Generation, AI Code Assistants, MCP Servers), expanded Container/Orchestration to cover k8s/kubelet/etcd/Docker Registry v2, tagged every query with an exposure tier, and added Appendix A.
- v2.1 folds in a new Object Storage & Artifact Stores subsection under Vector Databases (MinIO, Harbor, image registries where AI models, vectors, and snapshots live), adds ClickHouse / Cassandra / txtai / Feast / Tecton entries, introduces GPT4All / NVIDIA NIM / AutoGen coverage, and ships a terminology primer for readers newer to the stack.
- v2.2 adds Audio/Speech/Vision inference (whisper.cpp, faster-whisper, Coqui, Piper, Bark, Vocode, PaddleOCR) plus SGLang / LMDeploy / Aphrodite / Seldon under §3; Dagster, Weights & Biases self-hosted, wandb-local, CVAT, Doccano, Humanloop, Kubeflow Pipelines under §4; PromptLayer, Kong/Tyk AI plugins, Unify router under §5; OpenHands and AutoGPT-Next-Web under §6; a transport-agnostic MCP jsonrpc/tools/list fingerprint under §10; and Mistral / DeepSeek / raw sk-ant- key leaks plus .claude/settings.json exposure under §11.