
Shodan Queries: AI/ML Infrastructure

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/readme

Living catalogue of Shodan dorks for fingerprinting exposed AI/ML control-plane infrastructure.

Polished PDF reference: Shodan_AI_Reference.pdf, v2.1, April 2026 (markdown ahead at v2.2)

Living markdown source: see queries/; these are the files to PR against.

How to Read the Tables

Every query is tagged with an exposure tier. Tiers let you triage Shodan result sets without re-deriving risk for every entry.

| Tier | Meaning | How to use |
| --- | --- | --- |
| T1 | Unauthenticated by default | Service ships with no auth, or trivially-bypassed auth. A positive hit is typically a live, interactive target. Treat as immediately actionable. |
| T2 | Requires misconfiguration | Service has auth by default but is commonly deployed without it, or has known auth-bypass CVEs. Positive hits need one additional probe to confirm exposure. |
| T3 | Recon / fingerprint only | Identifies the presence of the service. Does not indicate auth status. Use for inventory, trend analysis, and pivoting. |
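
The tiers above can be read as a simple dispatch rule when triaging a result set. The following is a minimal sketch, not part of the repository: the function name `tier_next_step` and the short action labels are hypothetical, chosen only to mirror the "How to use" column.

```shell
# Map an exposure tier to the follow-up step described in the tier table.
# tier_next_step is a hypothetical helper name, not part of the repository.
tier_next_step() {
  case "$1" in
    T1) echo "actionable: likely live, interactive target" ;;
    T2) echo "probe: one additional check to confirm exposure" ;;
    T3) echo "inventory: presence only, auth status unknown" ;;
    *)  echo "unknown tier: $1" >&2; return 1 ;;
  esac
}
```

Calling `tier_next_step T2` prints the T2 label; any value other than T1, T2, or T3 returns a non-zero status, so an untagged entry is not silently treated as low risk.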

Index

| # | Category | Examples |
| --- | --- | --- |
| 1 | LLM Orchestration Platforms | Flowise, Langflow, Dify, Open WebUI, Ollama, n8n, Clawdbot |
| 2 | Vector Databases | ChromaDB, Qdrant, Weaviate, Milvus, pgvector, MinIO/Harbor (artifact stores) |
| 3 | Model Serving & Inference | vLLM, Triton, TGI, llama.cpp, LM Studio, GPT4All, NVIDIA NIM |
| 4 | Training, Fine-Tuning & Experiments | MLflow, Kubeflow, Ray, ClearML, Argilla, Feast |
| 5 | AI Gateways, Proxies & Monitoring | LiteLLM, Portkey, Langfuse, Helicone |
| 6 | Agent Frameworks | SuperAGI, OpenDevin, MetaGPT, Clawdbot, AutoGen |
| 7 | RAG Stacks & Self-Hosted AI Apps (new) | h2oGPT, Danswer/Onyx, Quivr, Khoj, RAGFlow, LibreChat |
| 8 | Image Generation & Diffusion (new) | ComfyUI, Stable Diffusion, AUTOMATIC1111, InvokeAI, Fooocus |
| 9 | AI Code Assistants (new) | Tabby, self-hosted Cody, Continue, Refact, FauxPilot |
| 10 | MCP Servers (new) | Model Context Protocol over HTTP/SSE, filesystem, shell, DB tool surfaces |
| 11 | Credential Leaks & Misconfigs | OpenAI/Anthropic/Groq/Gemini keys, .env exposure, HF tokens |
| 12 | Container & Orchestration Infrastructure (expanded) | Docker daemon, Kubernetes, kubelet, etcd, Consul, Vault |
| 13 | Backup / Snapshot Exposure (new) | Qdrant snapshots, Weaviate backups, ES snapshots, HTTP-served dumps |
| 14 | GPU & Compute Dashboards | NVIDIA DCGM, RunPod, Vast.ai, GPUStack |
| 15 | Fingerprinting Canaries | Favicon hashes, generic FastAPI/OpenAI-style detection |
| 16 | BI / Dashboard / Visualization | Metabase, Apache Superset, Redash, Grafana |
| 17 | Speech & Audio AI | Whisper ASR, Coqui XTTS, Piper TTS, RVC, OpenVoice, Pipecat, LiveKit |
| 18 | Jupyter Notebook / JupyterHub | Jupyter Notebook, JupyterLab, JupyterHub (inventory + CVE-2026-33709 targeting) |
| 19 | Streamlit Data Apps | Streamlit port 8501; 551 confirmed unauth in survey, 100% no-auth-concept |
| 20 | Gradio / Stable Diffusion WebUI / Langflow | Gradio port 7860, A1111, Langflow, HuggingFace demos |
| 21 | Browser Automation / Agent Backends | Selenium Grid, Chrome DevTools Protocol (CDP), Browserless, Playwright, Skyvern |
| 22 | Data Labeling / Annotation Servers | doccano, Argilla, Label Studio, Prodigy, CVAT |
| 23 | AI Safety Evaluation / Red-Team Self-Hosted | Promptfoo, LangSmith, NeMo Guardrails, DeepEval, Garak, Lakera Guard |
| 24 | LLM Observability / Training Telemetry | Phoenix (Arize), TensorBoard, W&B self-hosted, ClearML |
| 25 | Elasticsearch / OpenSearch | ES 7.x auth-off, OpenSearch, Kibana; 42 unauth in survey incl. 79M KYB record AML platform |
| 26 | Mem0 / Agent Long-Term Memory Collections | Mem0 collection patterns on unauth Qdrant/ChromaDB backends |
| A | Appendix: High-Severity CVE Cross-Reference (new) | Ray, MLflow, Flowise, Ollama, ComfyUI, kubelet, etc. |

Search across all queries

grep -rn "qdrant"   queries/
grep -rn "port:8000" queries/
grep -rn "API_KEY"  queries/
grep -rn " T1 "     queries/    # all T1 (unauth-by-default) queries
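
The same grep pattern can be wrapped to give a per-tier count across the catalogue. This is a rough sketch under one assumption: that each tier appears as a space-delimited table cell (for example `| T1 |`), which is what the ` T1 ` search above already relies on. The helper names `count_tier` and `tier_summary` are hypothetical.

```shell
# Count table rows tagged with a given tier under a directory.
# Assumes the tier appears as a space-delimited cell (e.g. "| T1 |"),
# the same pattern the grep for " T1 " relies on.
# count_tier and tier_summary are hypothetical helper names.
count_tier() {
  ct_dir="$1"
  ct_tier="$2"
  # -h suppresses the filename prefix; wc -l then counts matching lines.
  grep -rh -- " $ct_tier " "$ct_dir" | wc -l | tr -d ' '
}

# Print one "TIER COUNT" line per tier for a directory (defaults to queries/).
tier_summary() {
  ts_dir="${1:-queries}"
  for ts_tier in T1 T2 T3; do
    echo "$ts_tier $(count_tier "$ts_dir" "$ts_tier")"
  done
}
```

Running `tier_summary queries` from the repository root would print three lines, one per tier. Prose that happens to mention a tier surrounded by spaces is counted too, so treat the numbers as approximate.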

Adding a new query

  1. Find the category file it belongs in (or open an issue to propose a new category).
  2. Add a row to the appropriate Markdown table and include a tier (T1/T2/T3).
  3. Add a Notes cell when the query reveals something specific: auth state, version, snapshot exposure, or default credentials.
  4. Open a PR. See CONTRIBUTING.md.
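
Steps 2 and 3 can be sketched as a small helper that formats a row and refuses entries without a valid tier. The three-column layout (query, tier, notes) is an assumption for illustration only; the real column order is whatever the header of the target category file defines, and `format_query_row` is a hypothetical name.

```shell
# Build one Markdown table row for a new query and reject untagged entries.
# The three-column layout (query | tier | notes) is an assumption for
# illustration; match the header of the category file being edited.
# format_query_row is a hypothetical helper name.
format_query_row() {
  fq_query="$1"
  fq_tier="$2"
  fq_notes="${3:-}"
  case "$fq_tier" in
    T1|T2|T3) ;;
    *) echo "tier must be T1, T2, or T3 (got '$fq_tier')" >&2; return 1 ;;
  esac
  if [ -z "$fq_query" ]; then
    echo "query must not be empty" >&2
    return 1
  fi
  # printf with %s prints the query verbatim, unlike echo, which may
  # interpret backslashes in some shells.
  printf '| `%s` | %s | %s |\n' "$fq_query" "$fq_tier" "$fq_notes"
}
```

For example, `format_query_row 'port:8000' T3 'example note'` prints a row ready to paste into a table. A query containing a literal `|` would need escaping by hand, which this sketch does not attempt.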

Versioning

The PDF reference is regenerated periodically from the markdown sources. Check the date on the cover page; the markdown is always more current.

Current PDF: v2.1 · April 2026

  • v2.0 added four new sections (RAG Stacks, Image Generation, AI Code Assistants, MCP Servers), expanded Container/Orchestration to cover k8s/kubelet/etcd/Docker Registry v2, tagged every query with an exposure tier, and added Appendix A.
  • v2.1 folds in a new Object Storage & Artifact Stores subsection under Vector Databases (MinIO, Harbor, image registries where AI models, vectors, and snapshots live), adds ClickHouse / Cassandra / txtai / Feast / Tecton entries, introduces GPT4All / NVIDIA NIM / AutoGen coverage, and ships a terminology primer for readers newer to the stack.
  • v2.2 adds Audio/Speech/Vision inference (whisper.cpp, faster-whisper, Coqui, Piper, Bark, Vocode, PaddleOCR) and SGLang / LMDeploy / Aphrodite / Seldon under §3; Dagster, Weights & Biases self-hosted, wandb-local, CVAT, Doccano, Humanloop, and Kubeflow Pipelines under §4; PromptLayer, Kong/Tyk AI plugins, and the Unify router under §5; OpenHands and AutoGPT-Next-Web under §6; a transport-agnostic MCP jsonrpc/tools/list fingerprint under §10; and Mistral / DeepSeek / raw sk-ant- key leaks plus .claude/settings.json exposure under §11.