Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All reference

Reference

3. Model Serving & Inference

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/queries/03-model-serving

Section verified: April 22, 2026 11:38

The runtime layer that exposes models over HTTP. Most modern serving stacks emulate the OpenAI API surface (/v1/chat/completions, /v1/models, /v1/embeddings), making them trivially fingerprintable, and often trivially abusable as free compute (LLMjacking).

Inference Servers

Shodan QueryNotes
"SillyTavern"9,030 hits, LLM frontend (chat/RP workloads); most-exposed model-serving surface in this catalogue
http.html:"/v1/chat/completions"6,238 hits, canonical OpenAI-compat fingerprint (path appears in HTML responses; covers vLLM, llama.cpp, LMDeploy, LM Studio, most inference servers)
http.html:"/v1/models"6,089 hits, same category, paired canonical OpenAI-compat path
"llama.cpp"1,483 hits, most-deployed specific server (bare banner match)
"llama.cpp" port:8080461 hits, llama.cpp on default port
"vLLM"356 hits, banner match
"/v1/chat/completions"225 hits, banner-level path leak
"/v1/models"216 hits, banner-level path leak
http.html:"vllm"188 hits, vLLM in HTML body
http.html:"llama.cpp"81 hits, llama.cpp in HTML
"sglang"60 hits, SGLang structured-generation server, OpenAI-compat
"koboldcpp"59 hits, KoboldCpp banner (all ports)
"koboldcpp" port:500154 hits, KoboldCpp default port
("koboldcpp" OR "koboldai")47 hits, grouped OR catches AI Horde workers (unparenthesized breaks)
"koboldai"46 hits, KoboldAI web UI
"Seldon"33 hits, Seldon Core model deployment (k8s)
http.html:"lm studio"173 hits, LM Studio in HTML body (best LM Studio fingerprint)
port:1234 "model"71 hits, LM Studio default port with model term
http.html:"sglang"74 hits, SGLang in HTML body
http.html:"seldon"20 hits, Seldon in HTML body
http.title:"Seldon"10 hits, Seldon title match
"LM Studio"12 hits, LM Studio banner match
http.html:"nvidia nim"12 hits, NVIDIA NIM microservices in HTML
http.html:"/v1/embeddings"90 hits, embeddings path in HTML
"/v1/embeddings"8 hits, embeddings path banner leak
"Triton Inference Server"7 hits, NVIDIA Triton (use exact phrase; bare “Triton” returns 1,809 polluted)
"lmdeploy"5 hits, InternLM LMDeploy engine
"vLLM" port:80002 hits, vLLM on default port
port:8000 "openai" "model"1 hit, OpenAI-compat on 8000 with model list
http.html:"GPT4All"2 hits, GPT4All in HTML body
"llama" "completion"2 hits, llama completion term combo (no port)
"GPT4All"1 hit, GPT4All desktop server banner
"NVIDIA NIM"1 hit, NIM exact phrase (2026 deployments still sparse on public internet)
"aphrodite"362 hits, ⚠️ bare term is Greek-mythology noise, not the vLLM fork
"Aphrodite Engine"1 hit, exact phrase (specific-but-rare)

Canonical query recommendation: For OpenAI-compatible inference discovery, http.html:"/v1/chat/completions" (6,238) is the best single query, it catches every server emulating the OpenAI API surface regardless of underlying implementation. Pair with http.html:"/v1/models" for a narrow overlap.

GET /v1/models HTTP/1.1  {"object":"list","data":[{"id":"meta-llama/Llama-3.1-70B-Instruct",
  "object":"model","created":1713300000,"owned_by":"vllm"}]}

Typical vLLM response on an exposed port. Model name reveals what’s hosted; /v1/chat/completions on the same port is reachable with no auth.

Embedding / Reranking

Shodan QueryNotes
http.html:"TEI"332 hits, TEI in HTML (best TEI fingerprint)
http.html:"jina"83 hits, Jina in HTML
http.html:"sentence-transformers"31 hits, sentence-transformers in HTML
"text-embeddings-inference"5 hits, TEI banner match
"sentence-transformers"5 hits, banner match
"Jina" "embeddings"4 hits, Jina banner + term
"Jina" "embeddings" port:80802 hits, Jina on default port
"infinity" "embedding"1 hit, Infinity embedding server
"Cohere"1,603 hits, bare brand name, contaminated by SaaS references / unrelated docs

Pollution flagged: http.html:"infinity" alone returns 7,794 hits, but "infinity" "embedding" collapses to 1 hit. “Infinity” is a generic English word polluting 7,793 unrelated hits. Do not use http.html:"infinity" as a fingerprint.

Cohere note: Cohere’s rerank is primarily a cloud API product; self-hosted rerank instances are rare on the public internet. Bare "Cohere" at 1,603 is brand-name noise, not server fingerprints, narrow before trusting.

Audio / Speech / Vision Inference

Shodan QueryNotes
http.title:"Whisper"444 hits, Whisper ASR web UI
http.html:"faster-whisper"190 hits, faster-whisper in HTML
http.title:"Bark"120 hits, Suno Bark text-to-audio
http.title:"Piper"103 hits, Piper TTS title
http.html:"coqui"55 hits, Coqui in HTML
"faster-whisper"32 hits, faster-whisper banner
"PaddleOCR"28 hits, PaddleOCR service (banner form)
"whisper.cpp" "/inference"16 hits, whisper.cpp HTTP server, free ASR compute
"coqui" "tts"13 hits, Coqui TTS banner + term
"piper" "tts"8 hits, Piper neural TTS (RPi/edge)
"vocode"4 hits, Vocode voice-agent framework
http.html:"vocode"2 hits, Vocode in HTML
"paddleocr" port:80801 hit, PaddleOCR (default port, narrow)