27. Embedding Services

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/queries/27-embedding-services

Section verified: 2026-05-09

The vector-conversion layer that sits between raw text and vector databases. Embedding servers ingest documents or queries and return dense float vectors; without them, RAG pipelines and semantic search cannot run. Every observed implementation ships with authentication off. The attack class is compute theft plus embedding oracle (pre-computing query vectors to probe downstream vector stores).

Survey note: Embedding services are Shodan-dark compared to LLM inference servers. The root / path of canonical servers (TEI, infinity) returns either a redirect or API JSON rather than HTML, so Shodan’s HTTP crawler indexes thin banners. Model-name queries (BAAI/bge, nomic-embed) are the highest-signal Shodan approach; the population-scale survey is masscan-driven on tier-2 cloud ranges, not Shodan-driven.


HuggingFace Text Embeddings Inference (TEI)

The canonical standalone embedding server from HuggingFace. Single Rust binary, ships without auth, exposes /info (model metadata), /embed (POST), /rerank (optional), and /metrics (Prometheus).

Warning — Docker Registry false positive: "text-embeddings-inference" in Shodan banner (6 hits) matches Docker Registry catalog responses that list ghcr.io/huggingface/text-embeddings-inference as a cached image. These are not live TEI servers. Narrow with port constraints or model_pipeline_tag checks.
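The narrowing step above can be sketched as a check on the response body. This is a minimal illustration, assuming the body has already been fetched; the function names are hypothetical. It relies on the Docker Registry HTTP API v2 catalog shape, a JSON object with a repositories array.

```python
import json


def is_registry_catalog(body: str) -> bool:
    """True if the body parses as a Docker Registry v2 catalog object."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    return isinstance(data.get("repositories"), list)


def is_tei_false_positive(body: str) -> bool:
    """True for a catalog that lists the TEI image but carries no TEI metadata."""
    return (
        is_registry_catalog(body)
        and "text-embeddings-inference" in body
        and "model_pipeline_tag" not in body
    )
```

A host that passes is_tei_false_positive is a registry caching the image, not a live embedding server.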

Shodan Query | Hits | Notes
http.html:"text-embeddings-inference" | 2 | Low: Shodan rarely indexes the HTML body of API-only roots
"text-embeddings-inference" | 6 | FP-heavy: mostly Docker Registry catalogs, not live TEI
product:"Text Embeddings Inference" | 0 | No Shodan product facet registered
port:80 http.html:"embed" http.html:"model_id" | 0 | TEI /info fields not indexed in HTML

Live fingerprint (aimap / curl): GET /info → {"model_id": "BAAI/bge-small-en-v1.5", "model_pipeline_tag": "feature-extraction", "max_concurrent_requests": 512, "max_batch_total_tokens": 16384, "version": "2.x.x"}. The model_pipeline_tag: "feature-extraction" field is unique to TEI and not present in any LLM inference server.

Canonical aimap fingerprint: status_code:200 + json_field:model_pipeline_tag + body_contains:feature-extraction on GET /info.
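The three conditions of this fingerprint can be written as a small predicate. This is a sketch, not aimap's actual implementation; the function name looks_like_tei and its inputs (a status code and raw body already fetched from GET /info) are assumptions made for illustration.

```python
import json


def looks_like_tei(status_code: int, body: str) -> bool:
    """Hypothetical check mirroring status_code:200 + json_field:model_pipeline_tag
    + body_contains:feature-extraction on a GET /info response."""
    if status_code != 200:
        return False
    if "feature-extraction" not in body:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    # json_field: the key must exist at the top level of a JSON object.
    return isinstance(data, dict) and "model_pipeline_tag" in data
```

Requiring both the JSON key and the substring keeps the check from firing on HTML pages that merely mention "feature-extraction".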

Ports: 80 (Docker default internal), 8080 (common Docker mapping -p 8080:80), 3000 (older versions).


infinity-embedding (michaelfeil/infinity)

OpenAI-compatible embedding server. Default port 7997. FastAPI with /v1/embeddings (POST), /v1/models (GET), and /openapi.json. OpenAPI title is "Infinity Emb".

Shodan Query | Hits | Notes
http.html:"infinity_emb" | 0 | Python package name not indexed
"Infinity Emb" | 0 | Not in Shodan index
port:7997 http.html:"embedding" | 0 | Default port not crawled
http.html:"infinity" http.html:"/v1/embeddings" | 1 | Combined term, 1 confirmed hit

Live fingerprint: GET /openapi.json → body contains "Infinity Emb". Alt: GET /v1/models → JSON with data[] and infinity_emb in model paths.

Canonical aimap fingerprint: status_code:200 + body_contains:Infinity Emb on GET /openapi.json.
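A comparable sketch for the infinity fingerprint follows. The function names are hypothetical and the body is assumed to have been fetched from GET /openapi.json already. The second helper reads the title from info.title, which is where the OpenAPI format stores it.

```python
import json
from typing import Optional


def looks_like_infinity(status_code: int, body: str) -> bool:
    """Hypothetical check mirroring status_code:200 + body_contains:Infinity Emb."""
    return status_code == 200 and "Infinity Emb" in body


def openapi_title(body: str) -> Optional[str]:
    """Return info.title from an OpenAPI document, or None if it is absent."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    info = data.get("info")
    if not isinstance(info, dict):
        return None
    title = info.get("title")
    return title if isinstance(title, str) else None
```

Comparing openapi_title against "Infinity Emb" is stricter than the substring match and may be preferable when the document is well-formed JSON.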


Custom FastAPI Embedding APIs

The dominant shape in the wild. Operators wrap BAAI/bge, nomic-embed, multilingual-e5, and other embedding models in custom FastAPI services. Root GET / returns a JSON status object that leaks: embedding model name, embedding dimension, reranker model, LLM backend, and internal filesystem paths (index_dir, docs_dir). Auth-off on every observed instance.

Fingerprint by model name (Shodan-indexed because model names appear in HTML pages that embed API response data, e.g., Swagger UI and React dashboards):

Shodan Query | Hits | Notes
http.html:"BAAI/bge" | 41 | Best signal: BAAI/bge family dominates; Contabo/Hetzner/Scaleway hosts
http.html:"nomic-embed" | 22 | nomic-embed-text model family; uvicorn-served FastAPI
http.html:"multilingual-e5" | 27 | intfloat multilingual-e5 family
http.html:"all-MiniLM" | 404 | sentence-transformers MiniLM; high count, heavily polluted by Reposify/honeypot fleet
http.html:"sentence-transformers" | 31 | sentence-transformers library reference
"sentence-transformers" | 4 | Banner match, higher confidence
http.html:"jina" http.html:"embedding" | 12 | Jina embedding models in HTML
http.html:"jinaai" | 9 | Jina AI package name in page source

Pollution note: http.html:"all-MiniLM" at 404 hits is dominated by Server: Reposify honeypots returning identical Content-Length: 3151 across disparate IPs and ASNs. Filter: exclude server:"Reposify" and Content-Length:3151 responses.
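The filter can be applied to exported results along these lines. The record shape (a dict with a "headers" mapping) is a hypothetical export format, and treating either marker alone as sufficient to drop a record is an assumption; a stricter variant would require both.

```python
from typing import Dict, Iterable, List


def _header(headers: Dict[str, str], name: str) -> str:
    """Case-insensitive header lookup; returns '' when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return str(value).strip()
    return ""


def drop_reposify_noise(records: Iterable[dict]) -> List[dict]:
    """Remove records with Server: Reposify or Content-Length: 3151."""
    kept = []
    for record in records:
        headers = record.get("headers", {})
        is_reposify = "reposify" in _header(headers, "Server").lower()
        is_3151 = _header(headers, "Content-Length") == "3151"
        if is_reposify or is_3151:
            continue
        kept.append(record)
    return kept
```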

Fingerprint by endpoint shape:

Shodan Query | Hits | Notes
http.html:"/v1/embeddings" -http.html:"chat" | 46 | OpenAI-compat embedding-only (excludes LLM gateways)
http.html:"/embed" http.html:"model" | 1,541 | Too broad: includes any page with "embed" + "model"
http.html:"/embedding" http.html:"llama" | 480 | llama.cpp /embedding endpoint in HTML
http.html:"fastembed" | 5 | fastembed library reference

Live fingerprint (aimap): GET / → json_field:embedding_dimension (OpenVINO pattern) OR json_field:embed (RAG config pattern). GET /health → json_field:embedding_dimension.
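A sketch of the custom-wrapper check, under two assumptions: json_field is read as "key present at the top level of a JSON object", and the response has already been fetched from GET / or GET /health. The function name is hypothetical.

```python
import json


def looks_like_embedding_api(status_code: int, body: str) -> bool:
    """Hypothetical check for json_field:embedding_dimension OR json_field:embed."""
    if status_code != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    # Either key is enough; both patterns were observed on status objects.
    return "embedding_dimension" in data or "embed" in data
```

Because the keys are generic, a match is a lead for follow-up probing rather than a confirmed identification.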

Ports: 8000, 8001 (most common), 8002, 8080, 8100, 5000.


Sentence-Transformers / bert-as-service

Older embedding infrastructure. bert-as-service uses ZMQ (port 5555 pull, 5556 push) rather than HTTP — not directly Shodan-scannable. Newer sentence-transformers HTTP wrappers are indistinguishable from Custom FastAPI Embedding APIs above.

Shodan Query | Hits | Notes
http.html:"sentence-transformers" | 31 | Library reference in page HTML
"sentence-transformers" | 4 | Banner match
port:5555 "sentence" | 0 | bert-as-service ZMQ not HTTP-indexed

Jina Embeddings Self-Hosted

Jina provides self-hosted jina-embeddings-v3 and jina-reranker via their jinaai package. Typically runs on 8080/8000, FastAPI.

Shodan Query | Hits | Notes
http.html:"jinaai" | 9 | Package name in page source
http.html:"jina" http.html:"embedding" | 12 | Combined term
"jina" "embeddings" | 4 | Banner match
"Jina" "embeddings" port:8080 | 2 | Default port, higher confidence

llama.cpp Embedding Server Mode

llama.cpp exposes /embedding (POST) when started with --embedding flag. Frequently co-deployed with the chat endpoint on the same instance.

Shodan Query | Hits | Notes
http.html:"/embedding" http.html:"llama" | 480 | llama.cpp embedding endpoint in HTML
"llama.cpp" http.html:"/embedding" | 0 | Banner + HTML combined; 0 because banner and HTML are rarely co-indexed

OpenAI-compat Embedding Endpoints (Generic)

Any service emulating the OpenAI /v1/embeddings interface. Catches TEI, infinity, LocalAI, and custom wrappers.

Shodan Query | Hits | Notes
http.html:"/v1/embeddings" -http.html:"chat" | 46 | Embedding-only, excludes LLM gateways
http.html:"/v1/embeddings" | 90 | Includes LLM gateways that also serve embeddings
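Once a candidate host is found, a response from its /v1/embeddings endpoint can be checked against the OpenAI response shape: a JSON object whose data array holds items with an embedding list of floats. The validator below is a sketch with a hypothetical name; it assumes the default float encoding, so a server returning base64-encoded vectors would not pass.

```python
import json
from typing import Any


def _is_vector(value: Any) -> bool:
    """True for a non-empty list of numbers (bools excluded)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value)
    )


def is_openai_embedding_response(body: str) -> bool:
    """True if the body has a non-empty data[] where every item carries a vector."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    items = data.get("data")
    if not isinstance(items, list) or not items:
        return False
    return all(isinstance(i, dict) and _is_vector(i.get("embedding")) for i in items)
```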

Round-2 expansion (2026-05-09 — 144 queries run, 818 unique IPs surfaced)

After the initial section, a comprehensive Shodan sweep ran 144 total queries (46 baseline + 50 zero-hit variants + 48 anchored/extension). Headline new findings worth promoting to first-class queries:

New Tier 1 model families (round-2 verified)

Query | Hits | Notes
http.html:"bge-m3" | 56 | BAAI BGE-M3 multi-functional (dense+sparse+ColBERT); biggest single new family
http.html:"text-embedding-3-large" | 55 | OpenAI v3-large model name in proxy gateways
http.html:"feature-extraction" | 64 | TEI's HF tag in HTML (without model_ prefix that Round 1 used)
http.html:"e5-large" | 15 | intfloat/e5-large family
http.html:"bge-reranker" | 14 | Reranker co-deployed with embedder
http.html:"jina-embeddings" | 13 | The -v suffix from Round 1 was the bug
http.html:"mxbai-embed" | 12 | mixedbread-ai/mxbai-embed-large
http.html:"e5-base" | 8 | intfloat/e5-base
http.html:"max_input_length" | 8 | TEI parameter; closest indexable structural marker
http.html:"text-embedding-ada" | 7 | OpenAI ada-002 in proxy gateways
http.html:"openai/clip" | 7 | CLIP image embedding (multimodal)
http.html:"siglip" | 8 | Google SigLIP image embedding
http.html:"Snowflake/arctic" | 6 | Snowflake arctic-embed (namespace required)
http.html:"intfloat/e5" | 4 | Catches whole intfloat/e5 family
http.html:"jina-reranker" | 4 | Jina reranker co-deployed
http.html:"clip-vit-base-patch32" / clip-vit-large | 3+3 | CLIP variants
http.html:"jina-embeddings-v3" | 2 | Jina v3 explicit
http.html:"pipeline_tag" | 2 | Closest TEI structural marker
http.html:"feature-extraction" http.html:"model_id" | 1 | TEI signature pair

New Tier 5 platforms (embedding-hosting)

Query | Hits | Notes
http.html:"xinference" | 484 | Xinference, a Chinese multi-model platform (98% confirmed via title="Xinference")
http.html:"localai" | 190 | LocalAI multi-model (versions in title: v3.0.0, v3.8.0, v3.9.0, v3.12.1)
http.html:"nvidia" http.html:"embedding" | 13 | Anchored; many empty titles = API-only embedders, needs aimap

Documented false-positive queries (do NOT use)

Query | Apparent hits | Why FP
http.html:"stella" | 865 | AMEX Stella voice assistant, Genshin Impact servers, Italian hotels (Stella di Mare), Japanese 株式会社ステラ; name collision class
http.html:"infinity" | 8,330 | Generic word: telecom, gaming, ISP. Use only anchored with + embed, followed by aimap confirmation
http.html:"NIM" | 883 | Network adapters (NIM cards), VirtualBox NIM driver, Polish parishes (Nim = "no" in some Slavic langs)
http.html:"cohere" | 157 | SDK references, "cohere" as adjective
http.html:"voyage" http.html:"embed" | 181 | "voyage" = trip in French; matches French tourism sites; only voyageai (7) is real
http.html:"truncate" http.html:"embed" | 113 | truncate is a CSS class, embed is an HTML tag; appears on every page using truncated text
http.html:"ColBERT" | 68 | "Colbert" surname, John D. Colbert & Associates, etc.

Confirmed truly Shodan-dark (no variant fired)

  • michaelfeil/infinity, infinity-emb — infinity-embedding’s package metadata never appears in HTML; only model + endpoint references do
  • model_pipeline_tag, max_batch_total_tokens, Infinity Emb (banner/title) — TEI/infinity API JSON / OpenAPI title not indexed
  • embed-multilingual-v3, embed-english-v3, voyage-large-2, voyage-2/3 — Cohere/Voyage are SaaS-native; rarely operator-deployed
  • NV-Embed, nemo-retriever, nvidia/NV-Embed — NVIDIA NIM not population-deployed in self-host space
  • hkunlp/instructor, instructor-large/xl — instructor-models HTML-rare
  • Alibaba-NLP/gte, thenlper/gte, gte-base/multilingual/Qwen2-7B — GTE family Shodan-dark despite known popularity in Chinese ecosystem

Methodology Notes

Why masscan supplements Shodan here: TEI, infinity, and custom FastAPI embedding servers all return API JSON at GET /, not HTML. Shodan’s crawler indexes the root path HTTP response; a JSON blob with {"model_pipeline_tag":"feature-extraction"} looks like a non-page and gets minimal indexing. The port:7997 -http.html:"chat" query (630 hits) demonstrates this: 630 hosts on infinity’s default port, but port:7997 + html:openapi/v1/model/embed returns 0 — meaning Shodan recorded these hosts only via banner/TLS, not HTML body. Port-targeted aimap fingerprinting is the only way to confirm these. Shodan handles the population that exposes HTML dashboards; masscan/aimap handle the API-only population.

aimap fingerprints (3 new — count 66 → 69):

  • HuggingFace TEI: GET /info → json_field:model_pipeline_tag + body_contains:feature-extraction
  • infinity-embedding: GET /openapi.json → body_contains:Infinity Emb
  • Embedding API: GET / → json_field:embedding_dimension OR json_field:embed
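The three fingerprints can be chained into a single multi-probe pass. The sketch below is not aimap's code: classify_embedding_service and the injected fetch callable (path in, status code and body out) are assumptions chosen so the logic can be exercised without a network. The probe order, most specific first, is also an assumption.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical transport: takes a path, returns (status_code, body).
Fetch = Callable[[str], Tuple[int, str]]


def _json_object(body: str) -> dict:
    """Parse the body as a JSON object; return {} for anything else."""
    try:
        data = json.loads(body)
    except ValueError:
        return {}
    return data if isinstance(data, dict) else {}


def classify_embedding_service(fetch: Fetch) -> Optional[str]:
    """Try each fingerprint in turn and return the first matching label."""
    status, body = fetch("/info")
    if (status == 200 and "feature-extraction" in body
            and "model_pipeline_tag" in _json_object(body)):
        return "HuggingFace TEI"

    status, body = fetch("/openapi.json")
    if status == 200 and "Infinity Emb" in body:
        return "infinity-embedding"

    status, body = fetch("/")
    fields = _json_object(body)
    if status == 200 and ("embedding_dimension" in fields or "embed" in fields):
        return "Embedding API"
    return None
```

In practice fetch would wrap an HTTP client with a short timeout; passing it in keeps the classification logic testable on canned responses.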

Threat class: Compute theft (GPU/CPU cost borne by operator) + embedding oracle (attacker pre-computes query vectors to probe downstream vector DBs semantically without holding the embedding key). Severity: medium on auth-off; elevated to high when paired with an exposed vector DB on the same host (dual-stack attack surface).
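The severity rule can be stated as a two-input function. The name and the "not rated" value for authenticated hosts are assumptions for illustration; the rating above covers only auth-off services.

```python
def embedding_exposure_severity(auth_required: bool, vector_db_on_same_host: bool) -> str:
    """Medium on auth-off; high when an exposed vector DB shares the host."""
    if auth_required:
        # Assumption: hosts that enforce auth fall outside this rating.
        return "not rated"
    return "high" if vector_db_on_same_host else "medium"
```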

Key population finding from Shodan sample: Custom FastAPI wrappers around BAAI/bge and nomic-embed dominate over canonical TEI and infinity deployments. Operators build their own embedding servers rather than deploying the reference implementation, which means varied endpoint shapes and no canonical fingerprint covers the full population — aimap’s multi-probe approach is necessary.

