
Survey May 1, 2026

Embedding Services: Cross-Cloud Survey (2026-05)

NuClide Research · 2026-05-09

Status: Discovery and the Shodan query sweep are complete (2026-05-09). aimap Phase 1 and asyncio fingerprinting are complete; aimap Phase 2 hung on slow responders and was replaced with a focused asyncio probe. Shodan host enrichment on the AI-tagged / port-7997 subset is complete. 818 unique IPs surfaced; 667 had ≥1 open port confirmed active; 93 services were live-confirmed on the 440-IP priority subset. Two disclosure-warranted findings: the Klinikken.ai medical AI auth bypass (F1, escalated to CRITICAL) and the GraphRAG Process Safety stack on Scaleway FR (F9, HIGH). The auth-off pattern holds across all platform classes observed.


Premise

Embedding servers are the vector-conversion layer between raw text and vector databases. They ingest documents or queries and return dense float vectors; without them, RAG pipelines and semantic search cannot run. Every observed real-world implementation ships auth-off.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, S7076, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, K7052, S7056, T5854, T5868, T5893
  • overlap (Common AI KSATs (all 5 roles)): K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K7041

Attack classes:

Class | Mechanism | Severity
Compute theft | Caller issues unlimited /embed requests at operator’s GPU/CPU cost | MEDIUM
Embedding oracle | Attacker pre-computes query vectors to probe downstream vector stores semantically without holding the model locally | HIGH
Dual-stack escalation | Exposed embedding server + exposed vector DB on same host = full RAG data extraction chain | HIGH
Reranker oracle | Co-deployed reranker reveals ranking signal for downstream RAG answer manipulation | MEDIUM

Embedding oracles are the least-discussed attack class in this tier and the most operationally significant: an attacker who can query your embedding model can blind-probe any vector database that uses the same model for indexing.


Scope

Platform | Default Ports | Auth Posture
HuggingFace TEI | 80, 8080, 3000 | None
infinity-embedding (michaelfeil) | 7997 | None
Custom FastAPI embedding wrappers | 8000, 8001, 8002, 5000 | None
Sentence-transformers HTTP wrappers | 8000, 8080 | None
Jina Embeddings self-hosted | 8080, 8000 | None
llama.cpp --embedding mode | 8080 | None
Xinference (hosting platform) | 9997, 80 | Optional API key, off by default
LocalAI (hosting platform) | 8080 | None

Methodology

Discovery: 3-round Shodan query sweep

144 total Shodan queries across 3 rounds:

  • Round 1 (46 queries): Model-family strings (BAAI/bge, nomic-embed, multilingual-e5), endpoint shapes (/v1/embeddings, /embedding), platform names, library references
  • Round 2 (50 queries): Zero-hit variants expanded. Found bge-m3 (56), feature-extraction (64), text-embedding-3-large (55), mxbai-embed (12), jina-embeddings (13), siglip (8)
  • Round 3 (48 queries): Anchored broad strings. Confirmed Xinference (484), LocalAI (190); dropped stella/voyage/ColBERT/truncate+embed as FP classes

Shodan-dark note: TEI’s API JSON (model_pipeline_tag field), infinity-embedding’s OpenAPI title ("Infinity Emb"), and bare FastAPI roots return JSON not HTML. Shodan’s crawler indexes HTML pages; these servers are invisible via Shodan query search and require port-targeted aimap probing.

Deduplication and enrichment

  • Consolidated: 993 ip:port pairs → 818 unique IPs
  • InternetDB bulk enrichment (free): 571/814 indexed, 133 honeypot-tagged, 92 Shodan-tagged “ai”, 307 with known CVEs
  • Honeypot filter applied: AS63949 Linode fleet (salt wW0sffoqsk.EM) + multi-port synthetic signature excluded
  • Shodan host API: used on the 92 AI-tagged IPs plus top port-7997 hosts (100 IPs, 100 query credits)
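The consolidation step (ip:port pairs collapsed to unique IPs) can be sketched as follows. This is a minimal illustration, not the survey's actual tooling; the function name `unique_ips` is hypothetical and the addresses are documentation-range placeholders, not survey data.

```python
from collections import defaultdict

def unique_ips(pairs):
    """Group "ip:port" strings by IP; returns {ip: sorted list of ports}."""
    ports_by_ip = defaultdict(set)
    for pair in pairs:
        # rpartition splits on the last colon, so the port is always the tail.
        ip, _, port = pair.rpartition(":")
        ports_by_ip[ip].add(int(port))
    return {ip: sorted(ports) for ip, ports in ports_by_ip.items()}

# Toy input (RFC 5737 documentation addresses), including one duplicate pair.
pairs = ["192.0.2.10:8000", "192.0.2.10:8001", "192.0.2.11:7997", "192.0.2.10:8000"]
hosts = unique_ips(pairs)
print(len(pairs), "pairs ->", len(hosts), "unique IPs")
```

Keeping the port list per IP, rather than only the set of IPs, preserves the information needed later for port-targeted probing.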

Fingerprinting

aimap v1.7.0 with 3 new embedding fingerprints (69 total):

Fingerprint | Probe | Match condition
HuggingFace TEI | GET /info | json_field:model_pipeline_tag + body_contains:feature-extraction
infinity-embedding | GET /openapi.json | body_contains:Infinity Emb
Embedding API | GET / | json_field:embedding_dimension OR json_field:embed
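To make the match conditions concrete, here is a minimal sketch of how the three rules could be evaluated against a response body. It is not aimap's implementation: the helper names are hypothetical, and it assumes `json_field` means a top-level key in a JSON object.

```python
import json

def json_field(body, name):
    """True if body parses as a JSON object with a top-level key `name` (assumed semantics)."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and name in doc

def body_contains(body, needle):
    """True if the raw response body contains the literal substring."""
    return needle in body

def classify(path, body):
    """Return the fingerprint name matched for a probe path and body, else None."""
    if path == "/info" and json_field(body, "model_pipeline_tag") \
            and body_contains(body, "feature-extraction"):
        return "HuggingFace TEI"
    if path == "/openapi.json" and body_contains(body, "Infinity Emb"):
        return "infinity-embedding"
    if path == "/" and (json_field(body, "embedding_dimension") or json_field(body, "embed")):
        return "Embedding API"
    return None

# Illustrative bodies only; real responses carry more fields.
print(classify("/", '{"embed": "nomic-embed-text"}'))
print(classify("/openapi.json", '{"info": {"title": "Infinity Emb"}}'))
print(classify("/", "<html>hello</html>"))
```

Tying each rule to its probe path avoids matching a generic string on an unrelated endpoint.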

Note: Phase 2 fingerprinting was made concurrent in this session (80 goroutines, matching the -threads flag); it was previously sequential. MatchFingerprints now uses a goroutine pool with a semaphore.
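The semaphore-bounded pool pattern mentioned in the note can be sketched in Python with asyncio.Semaphore. This is an analog for illustration, not aimap's Go code; `run_bounded` and `fake_match` are hypothetical names, and the worker only sleeps instead of touching the network.

```python
import asyncio

async def run_bounded(items, worker, limit=80):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(item):
        async with sem:  # blocks while `limit` workers are already running
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(i) for i in items))

async def fake_match(host):
    """Stand-in for a fingerprint match; sleeps instead of probing."""
    await asyncio.sleep(0.01)
    return (host, "no-match")

results = asyncio.run(run_bounded([f"host-{n}" for n in range(200)], fake_match, limit=80))
print(len(results))
```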


Discovery Results

Pool summary

Metric | Count
Total ip:port pairs | 993
Unique IPs | 818
IPs with ≥1 open port (aimap Phase 1) | 667
Open ports found | 4,484
Honeypots filtered | 133
Shodan “ai”-tagged | 92
IPs with known CVEs | 307

Geographic distribution

Country | IPs
United States | 270 (33%)
China | 203 (25%)
Germany | 98 (12%)
Singapore | 47 (6%)
United Kingdom | 43 (5%)
India | 27
France | 25
Korea, Republic of | 25
Finland | 19
Canada | 16
Canada16

Infrastructure providers

Provider | IPs | Notes
Aliyun (all properties) | ~228 (28%) | Dominant single provider; 4 ASNs combined
Linode | 87 | Largely the AS63949 honeypot fleet
Hetzner Online | 58 |
Google LLC | 28 |
DigitalOcean | 26 |
Contabo | 24 |
Amazon | ~40 | Multiple ASNs

Aliyun dominance (28% of pool) mirrors the Chinese self-hosted AI deployment pattern observed in the RAG/Xinference survey: Chinese operators are the largest single constituency deploying self-hosted embedding infrastructure.

Port distribution

Port | Count | Service
443 | 135 | HTTPS (reverse-proxied)
7997 | 100 | infinity-embedding default, confirmed deployed
8080 | 84 | Nginx/FastAPI generic
80 | 77 | HTTP direct
8000 | 69 | FastAPI default
11434 | 47 | Ollama (embedding-capable)
8001 | 41 | Alt FastAPI
5000 | 23 | Flask/older FastAPI
3000 | 22 | TEI/Node proxy

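Port 7997 open on 100 hosts is the strongest infrastructure signal in the Shodan data: infinity-embedding’s non-standard default port is in use at population scale even though the framework is Shodan-dark at the HTML level. The probe results later in this survey qualify the signal: none of these hosts returned the Infinity Emb fingerprint, so the port count alone does not confirm infinity-embedding deployments.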


Targeted Probe Results (direct HTTP: live confirmation)

Fast targeted probe against 408 priority-port IPs (ports 7997, 80, 8080, 3000, 8000, 8001, 8002, 5000 only) using embedding-specific fingerprint paths.

Service | Confirmed | Notes
Embedding API (custom FastAPI) | 4 | embedding_dimension field in JSON root
HuggingFace TEI | 0 | Shodan-dark confirmed; no live matches on port 8080/3000/80
infinity-embedding | 0 | Port 7997 hosts respond with non-HTTP (Socks4A, IRC, binary)

Live rate: 1% of the Shodan-visible pool (4 confirmations across 408 IPs). This is expected, since Shodan data is days to weeks old. TEI and infinity-embedding are confirmed Shodan-dark: both return JSON-only roots, which Shodan does not index.

Confirmed hosts (priority probe)

IP | Port | Service | Finding
46.4.204.44 | 8001 | Embedding API | BAAI/bge-m3 (1024-dim), OpenVINO-int8-throughput backend, model_loaded:true
37.27.185.38 | 8001 | Embedding API | Klinikken.ai Vector Database, healthcare AI (see Notable Finding F1 below)
161.118.173.64 | 8000 | Embedding API | Website FAQ chatbot, e5-large-v2 + pgvector + llama3 (DB disconnected)
161.118.173.64 | 80 | Embedding API | Same host, dual-port binding

aimap Fingerprinting + Asyncio Probe Results (440 priority IPs)

aimap Phase 1 confirmed 1,924 open ports across 362/440 priority hosts (AI-tagged + port-7997 + EU/US, no Chinese). Phase 2 fingerprinting hung on slow IPv6 / TLS responders despite the concurrent-goroutine fix; the Go HTTP client’s 1 s timeout did not reliably cancel established connections in the slow-trickle-response case. Methodology pivot: replaced Phase 2 with a focused asyncio probe (/tmp/embed-probe.py) using strict 1.5 s connect / 2 s read / 10 s host-deadline timeouts. The asyncio probe finished 6,160 probes in ~3 minutes (vs aimap’s 10+ min hang) and cleanly cancelled stuck connections at the asyncio.wait_for layer.
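The layered-timeout structure described above (connect, per-read, and per-host deadline, with cancellation at asyncio.wait_for) could look roughly like the sketch below. The original /tmp/embed-probe.py is not reproduced in this survey, so the function names, the raw HTTP/1.0 request, and the 64 KiB cap are all assumptions for illustration.

```python
import asyncio

CONNECT_TIMEOUT = 1.5   # seconds allowed for the TCP connect
READ_TIMEOUT = 2.0      # seconds allowed for each individual read
HOST_DEADLINE = 10.0    # hard cap on all work for a single host
MAX_BYTES = 65536       # stop reading after this much body (assumed cap)

async def fetch(host, port, path="/", connect_timeout=CONNECT_TIMEOUT,
                read_timeout=READ_TIMEOUT):
    """Send one HTTP/1.0 GET; return raw response bytes, or None on timeout/error."""
    writer = None
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), connect_timeout)
        writer.write(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
        await writer.drain()
        body = b""
        while len(body) < MAX_BYTES:
            # A slow-trickle server passes each per-read timeout; only the
            # host deadline in probe_host() stops it.
            chunk = await asyncio.wait_for(reader.read(4096), read_timeout)
            if not chunk:
                break
            body += chunk
        return body
    except (asyncio.TimeoutError, OSError):
        return None
    finally:
        if writer is not None:
            writer.close()  # also runs when the host deadline cancels this task

async def probe_host(host, ports, path="/", deadline=HOST_DEADLINE):
    """Probe ports sequentially on one host under a single overall deadline."""
    results = {}

    async def all_ports():
        for port in ports:
            results[port] = await fetch(host, port, path)

    try:
        await asyncio.wait_for(all_ports(), deadline)
    except asyncio.TimeoutError:
        pass  # keep whatever finished before the deadline
    return results
```

For a host you are authorized to test, a call such as `asyncio.run(probe_host("127.0.0.1", [8000, 8001]))` returns a dict of port to raw bytes (or None). The per-host deadline is what bounds the slow-trickle case, because each small read succeeds within the read timeout.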

Confirmed services (asyncio probe, 440 IPs × 14 ports)

Service | Confirmed | % of pool
OpenAI-compat /v1/models (LLM gateways, embedding-capable) | 41 | 9.3%
LocalAI (ApplicationConfig JSON root) | 28 | 6.4%
Ollama (port 11434, embedding-capable) | 19 | 4.3%
Jina (GRM-MCP API) | 2 | 0.5%
Embedding API (embedding_dimension JSON field) | 2 | 0.5%
Embedding API (embed JSON field) | 1 | 0.2%
Total live confirmations | 93 | 21.1%

Live rate 21% of priority subset (93 / 440). Substantially higher than the 1% rate from the unfiltered 818-IP pool, validating the AI-tag + port-7997 + EU/US filter as a high-signal subset.
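As a quick arithmetic check, the percentages in the confirmed-services table follow directly from the counts and the 440-IP pool. The labels below are shortened for readability; the counts are the ones reported above.

```python
POOL = 440  # priority subset size

confirmed = {
    "OpenAI-compat /v1/models": 41,
    "LocalAI": 28,
    "Ollama": 19,
    "Jina": 2,
    "Embedding API (embedding_dimension)": 2,
    "Embedding API (embed)": 1,
}

total = sum(confirmed.values())
for name, count in confirmed.items():
    print(f"{name}: {count} ({count / POOL:.1%})")
print(f"Total: {total} ({total / POOL:.1%})")
```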

Cross-validation against Shodan host enrichment

The asyncio probe’s live counts roughly match the Shodan host enrichment ratios:

Service | Shodan enrichment (100-IP AI-tagged sample) | asyncio live (440 IPs)
LocalAI | 43 | 28
Ollama | 16 | 19
Xinference | 1 | 0 (probe scope)

LocalAI’s drop from 43 → 28 reflects natural churn between Shodan’s index time (days/weeks) and live probe (now). The Ollama count went up because the 440-IP set includes more port-11434 hosts than the 100-IP AI-tagged subset.

Port 7997 (infinity-embedding): re-confirmed Shodan-dark

Zero infinity-embedding confirmations on the asyncio probe across all 440 IPs × port 7997. The probe sent GET /openapi.json with the canonical Infinity Emb body match. Hosts on port 7997 either:

  1. Don’t respond to HTTP (port open, non-HTTP service or firewalled at L7)
  2. Are honeypot synthetic responses (Socks4A/IRC/binary noise per AS63949 fleet signature)
  3. Are infinity instances that have moved off the standard port

Hypothesis revision: port 7997 alone is not a reliable infinity-embedding signal at population scale. The 100 Shodan-visible port-7997 hosts may include a substantial portion of synthetic / honeypot / off-target services. Future surveys should require the Infinity Emb body match on GET /openapi.json as a positive condition, not the port alone.

Confirmed embedding API hosts (asyncio probe)

IP | Port | Service | Notes
46.4.204.44 | 8001 | Embedding API | BAAI/bge-m3, OpenVINO-int8, model_loaded=true
37.27.185.38 | 8001 | Embedding API | Klinikken.ai medical AI, auth bypass (F1)
51.159.4.28 | 8000 | Embedding API | GraphRAG Process Safety API, full stack exposure (F9)

Shodan Host Enrichment (100 IPs)

Full Shodan records pulled on 92 AI-tagged (non-honeypot) + top port-7997 hosts. All 100 IPs indexed.

Confirmed services (Shodan product field)

Platform | IPs | Notes
LocalAI | 43 | Product = “LocalAI” in Shodan banner
Ollama | 16 | Port 11434, embedding-capable
Xinference | 1 | Title-confirmed (most Xinference lack “ai” tag)

LocalAI version distribution (from 100-IP sample):

Version | Count
v3.0.0 | 4
v2.25.0 | 4
v3.8.0 | 3
v3.12.1 | 3
v3.9.0 | 3
v3.10.1 | 2
v2.20.1 | 2
v3.5.0 | 2
Other v3.x | 5
Other v2.x | 3

Of the 31 versioned LocalAI instances tabulated above, 22 (71%) are on v3.x. The full git commit hash is included in the Shodan title, enabling precise version tracking.

Port 7997 (infinity-embedding): Shodan-dark confirmed

39 of 100 hosts had port 7997 open in Shodan records. None showed an infinity-embedding product or title in its Shodan banner. The banners were: Socks4A proxy, SSH, IRC-like services, and binary noise (AS63949 honeypot signature). Shodan recorded the port as open but did not fingerprint an HTTP service, which is consistent with infinity-embedding’s JSON API root being invisible to Shodan’s HTML-based indexing. aimap’s GET /openapi.json probe is the only reliable fingerprint.

Fleet patterns

Subnet | Count | Notes
144.91.80.0/24 | 4 | Sequential IPs .220-.223, all Ollama port 11434; single operator’s Ollama cluster
185.28.47.0/24 | 4 | Mixed service fleet

The 144.91.80.220-223 cluster mirrors the browser-agent survey’s fleet propagation pattern: a single operator deploying 4 identical Ollama instances on sequential IPs.
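Sequential-IP fleets like this can be surfaced mechanically by grouping addresses per /24 and looking for consecutive runs. The sketch below is one possible approach, not the method used in the survey; `sequential_runs` is a hypothetical helper, it assumes IPv4 input, and the addresses are documentation-range placeholders.

```python
import ipaddress
from collections import defaultdict

def sequential_runs(ips, min_len=3):
    """Return (subnet, first_ip, last_ip) for runs of >= min_len consecutive IPv4 addresses."""
    by_subnet = defaultdict(set)
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        subnet = ipaddress.ip_network(f"{addr}/24", strict=False)
        by_subnet[subnet].add(int(addr))

    runs = []
    for subnet, values in by_subnet.items():
        ordered = sorted(values)
        start = prev = ordered[0]
        # The trailing None is a sentinel that flushes the final run.
        for value in ordered[1:] + [None]:
            if value is not None and value == prev + 1:
                prev = value
                continue
            if prev - start + 1 >= min_len:
                runs.append((str(subnet),
                             str(ipaddress.ip_address(start)),
                             str(ipaddress.ip_address(prev))))
            if value is not None:
                start = prev = value
    return runs

sample = ["198.51.100.220", "198.51.100.221", "198.51.100.222",
          "198.51.100.223", "198.51.100.7", "203.0.113.5"]
print(sequential_runs(sample))
```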

Geographic distribution (AI-tagged subset)

Country | Count
Germany | 22
China | 21
United States | 12
Singapore | 5
Japan | 5
Latvia | 4
Israel | 4

The AI-tagged subset is more European than the full pool (Germany 22%, vs 12% in the overall 818-IP pool). The Contabo/Hetzner bias is consistent with European self-hosted AI operator demographics.


Key Findings

F1: Klinikken.ai. Psychotherapy session-notes corpus exposed via embedding-proxy auth bypass [CRITICAL. DISCLOSURE IN FLIGHT]

Host: 37.27.185.38:8001 (Hetzner DE, static.38.185.27.37.clients.your-server.de)

Operator: Klinikken.ai ApS, CVR 45899071, Faxe, Denmark

Severity: escalated 2026-05-09 14:11 UTC from HIGH (architectural finding only) to CRITICAL after Test A confirmed the corpus is populated psychotherapy session content.

Klinikken.ai is a Danish clinical AI platform serving health clinics. Their self-hosted vector database API is publicly exposed without authentication. The system stores psychotherapy session notes: each session generates one Qdrant collection named notes_therapist_<therapist_id>_session_<32_hex_uuid>, holding 1–6 chunked text vector points.

  • Embedding API (port 8001): Full CRUD, no auth. Endpoints: POST /upload, POST /search, POST /delete, GET /collections/{user_id}, DELETE /collections/{user_id}/{collection_name}
  • Qdrant backend (port 6333): Reachable but auth-gated; the proxy on 8001 strips that gate
  • Auth bypass: FastAPI proxy bakes the Qdrant API key in and serves data unauthenticated
  • Broken access control: user_id is described in the OpenAPI as "Bruger ID for isolation" ("User ID for isolation") but is caller-supplied. Test A guessed user_id=1 and retrieved 28 populated session-notes collections, ~78 chunked text points, ≥11 distinct therapist IDs visible in metadata (raw therapist IDs / session UUIDs withheld from this case study pending operator notification, held in ~/recon/embedding-shodan-2026-05-09/disclosures-unredacted/test-a-result.json)
  • Model: paraphrase-multilingual-MiniLM-L12-v2. Multilingual sentence-transformer, consistent with Danish-language clinical content
  • Tagline (Danish, from /openapi.json): “Embeddings og semantic search service med bruger-isolation” (“Embeddings and semantic search service with user isolation”). The operator named user-isolation as the design property; the implementation does not enforce it
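For contrast, the control that is missing here is to derive the user identity from a verified credential instead of trusting the caller-supplied user_id. The sketch below is a generic illustration of that idea using an HMAC-signed token; it is not the operator's code, and the token format, secret, and function names are all hypothetical. A production service would normally rely on an established authentication scheme.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side signing key

def _sign(user_id: str) -> str:
    return hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()

def issue_token(user_id: str) -> str:
    """Server side: bind a user_id to a signature the caller cannot forge."""
    return f"{user_id}.{_sign(user_id)}"

def authenticated_user(token: str):
    """Return the user_id proven by the token, or None if the signature is invalid."""
    user_id, _, signature = token.rpartition(".")
    if not user_id:
        return None
    # compare_digest avoids leaking the match position through timing.
    if hmac.compare_digest(signature.encode(), _sign(user_id).encode()):
        return user_id
    return None

def may_access(token: str, requested_user_id: str) -> bool:
    """Allow a request only when the path's user_id equals the authenticated identity."""
    user = authenticated_user(token)
    return user is not None and user == requested_user_id

token = issue_token("42")
print(may_access(token, "42"))   # own partition
print(may_access(token, "1"))    # someone else's partition
```

With a check like `may_access` in front of every collection route, guessing another user_id no longer selects another tenant's data.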

Data class: GDPR Article 9 special-category mental-health data (psychotherapy session content). Danish Sundhedsloven §40 patient confidentiality applies. Article 33 breach-notification 72-hour clock starts on the controller at moment-of-awareness (= delivery of disclosure).

Impact: Any unauthenticated caller can:

  • read therapy session content via POST /search with score_threshold=0 and limit=100 (corpus dump primitive)
  • inject malicious content into any therapist’s session collection via POST /upload (LLM-poisoning vector that surfaces in clinical chatbot responses to patients)
  • destroy any therapist’s session collection via DELETE /collections/{user_id}/{collection_name} (data-integrity loss + clinical-record obstruction concerns)
  • enumerate the customer/tenant space via GET /collections/{user_id} with caller-supplied user_id

Marketing-vs-implementation contradiction: Klinikken.ai’s homepage claims “GDPR-sikker. Hostet i EU” (“GDPR-secure. Hosted in the EU”), “Lever op til GDPR – ingen cookies, IP-adresser eller persondata” (“Complies with GDPR: no cookies, IP addresses, or personal data”), and “Brugeren er 100% anonym” (“The user is 100% anonymous”), and contrasts itself with ChatGPT’s data-protection posture. The exposure we observed contradicts each of these claims at the technical layer: therapist IDs and session UUIDs are persistent personal-identifier metadata stored on a publicly reachable retrieval API, with no user authentication and no access control on the partition primitive their own OpenAPI documents as "for isolation". The host stack (Hetzner DE on 37.27.185.38) matches the operator’s stated production tier (“platformen kører i Finland, AI i Tyskland”, i.e. “the platform runs in Finland, AI in Germany”), so this is not a forgotten dev box.

Disclosure status: In flight as of 2026-05-09. Coordinated disclosure to Klinikken.ai ApS (operator) → Hetzner abuse (host) → Datatilsynet DK (supervisory authority, if 72-hour Article 33 clock requires escalation). Public disclosure (full unredacted technical detail incl. raw therapist IDs / session UUIDs / collection-name list / search PoC) withheld until operator acknowledgment or 72-hour silence per coordinated-disclosure norms.


F2: Xinference. 484-hit dominant platform, 98% title-confirmed

http.html:"xinference" returned 484 unique IPs. Cross-validation against page title (title:"Xinference") confirmed 98%+ are genuine Xinference deployments. Xinference is a Chinese multi-model serving platform (Xorbits/Xorbits-IO project) that supports embedding models alongside LLMs and image generation.

Attack surface: Xinference’s API is auth-optional (API key off by default). The /v1/embeddings endpoint accepts model_uid as parameter. Any caller can enumerate available models, compute embeddings, and use them as oracles against downstream vector DBs. Admin panel (/v1/cluster) exposes node topology.

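F3: Port 7997. 100 hosts on infinity-embedding’s default port, none probe-confirmed

infinity-embedding (michaelfeil/infinity) uses port 7997 as its non-standard default, which makes it a candidate for identification by port scan even though Shodan HTML queries return 0. The 100 hosts found on this port were initially treated as confirmed or near-confirmed infinity deployments; the targeted and asyncio probes reported above returned zero Infinity Emb body matches, so the port alone is not sufficient evidence.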

Shodan-dark problem: infinity’s API returns JSON (GET /openapi.json returns {"info": {"title": "Infinity Emb"}}), which Shodan doesn’t index. The only Shodan signal is the port itself. aimap’s GET /openapi.json + body_contains:Infinity Emb is the definitive fingerprint.

F4: CVE exposure. 307/818 IPs carry known vulnerabilities

InternetDB reports 307 IPs in the embedding pool have known CVEs. Top CVEs:

CVE | Hosts | Description
CVE-2025-23419 | 210 | nginx TLS session ticket reuse (shared memory across workers)
CVE-2023-44487 | 209 | HTTP/2 Rapid Reset (DoS amplification)
CVE-2021-3618 | 157 | nginx ALPN mismatch
CVE-2021-23017 | 152 | nginx resolver heap overflow
CVE-2013-4365 and older | 145+ | Apache legacy / mod_security era

CVE-2023-44487 (HTTP/2 Rapid Reset) on 209 embedding hosts means attackers can DoS the embedding layer specifically, degrading entire RAG pipelines without touching the LLM or vector DB. Combined with auth-off, no authentication is needed to trigger the DoS.

F5: Custom FastAPI wrappers dominate over canonical implementations

Model-name queries (BAAI/bge at 41, nomic-embed at 22, multilingual-e5 at 27) all returned non-TEI, non-infinity servers. Operators are wrapping models in custom FastAPI services, each with its own endpoint shapes, response schemas, and field names, so no single canonical fingerprint covers the population. The dominant pattern: operators copy open-source RAG templates and add an embedding endpoint alongside the LLM gateway, inheriting auth-off from the template.

F6: Honeypot mimicry. “Xinference” on Redis port (port 6379)

Host 43.133.13.81 (Japan/Asia Pacific Network, 1,000 ports) is tagged honeypot in Shodan. Its Shodan record shows title:"Xinference" on port 6379 (Redis default). This is honeypot service mimicry: the honeypot operator scripted responses that return Xinference-looking HTML on non-standard ports to catch scanners. Filtering rule: any Xinference hit on port 6379 is a honeypot. Cross-check Shodan tag before treating as genuine.
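The filtering rule can be written down as a small predicate. This is a sketch under assumptions: the record is a plain dict and the field names `tags`, `port`, and `title` are hypothetical stand-ins for whatever the enrichment output actually uses.

```python
REDIS_PORT = 6379

def is_probable_honeypot(record):
    """Apply the two checks from this finding to one Shodan-style record (a dict)."""
    tags = set(record.get("tags", []))
    if "honeypot" in tags:
        return True
    # Xinference is an HTTP service; its title appearing on the Redis
    # default port is treated as scripted mimicry.
    title = record.get("title", "").lower()
    return record.get("port") == REDIS_PORT and "xinference" in title

# Toy records with documentation-range addresses.
records = [
    {"ip": "192.0.2.1", "port": 9997, "title": "Xinference", "tags": []},
    {"ip": "192.0.2.2", "port": 6379, "title": "Xinference", "tags": []},
    {"ip": "192.0.2.3", "port": 9997, "title": "Xinference", "tags": ["honeypot"]},
]
genuine = [r["ip"] for r in records if not is_probable_honeypot(r)]
print(genuine)
```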

F7: Tor-associated Ollama cluster (Latvia/MAXKO fleet)

The Latvia/SIA RixHost fleet (185.28.47.x) and MAXKO Hosting operator (South Africa/Croatia) show Ollama instances tagged with tor and database. These are likely privacy-focused VPS providers offering “anonymous AI” services where users submit embedding jobs through Tor-onion frontends to unauth Ollama backends. From the embedding oracle perspective: the Tor layer protects the USER, not the operator. The embedding API itself is auth-off at the HTTP layer.

F8: Aliyun / Chinese cloud operator concentration

28% of the embedding server pool is on Aliyun. Adding the Korean (25 IPs) and Singaporean (47 IPs) presence brings the share on Asian cloud providers to roughly 37% by the figures above (about 300 of 818 IPs, assuming those counts do not overlap with the Aliyun total). This population skews younger (more recently deployed), runs newer frameworks (Xinference, bge-m3 family), and is more likely to have UI dashboards that make Shodan indexing possible.

F9: GraphRAG Process Safety API. Full multi-stack exposure on Scaleway FR [DISCLOSURE WARRANTED]

Host: 51.159.4.28 (51-159-4-28.rev.poneytelecom.eu, Scaleway dedicated, Paris FR)

Surfaced by the asyncio probe (/-root JSON match on embed key). Host runs a French industrial process safety RAG stack with five auth-off services on the same VPS:

Port | Service | Status
8000 | GraphRAG Process Safety API v3.0.0 | 200 (OpenAPI public, 19 endpoints)
11434 | Ollama (qwen2.5:7b LLM, nomic-embed-text embedder) | 200
6333 | Qdrant vector DB | 200
3000 | Web UI (likely Open WebUI) | 200
9000 | Object storage (likely MinIO) | 307

JSON root response (port 8000):

{"message":"GraphRAG Process Safety API","version":"3.0.0",
 "llm_model":"qwen2.5:7b","embed":"nomic-embed-text",
 "dossier_local":"/home/<redacted>/<redacted>","status":"running"}

The JSON root leaks operator information (Linux username + folder name redacted in this public case study to avoid pre-disclosure operator re-identification; held unredacted in ~/recon/embedding-shodan-2026-05-09/disclosures-unredacted/). The OpenAPI spec exposes 19 endpoints including /webhook/drive/initial-sync (Google Drive root-folder ingest), /chat, /history, /me, /dossier/scan, /reindex, /internal/notify. French OpenAPI descriptions (“Synchronisation initiale complète”, i.e. “Full initial synchronization”) confirm French operator/scope.

Process Safety domain: GraphRAG is Microsoft’s knowledge-graph + RAG framework; “Process Safety” in industrial context typically covers chemical/oil-gas/manufacturing safety procedures, hazard analyses (HAZOP), incident reports, and equipment safety protocols. RAG-indexed process safety documentation is operationally sensitive. Vendor confidential procedures, plant-specific equipment configurations, and incident-response playbooks all appear in Process Safety document corpora.

Threat class: High. Auth-off across the entire stack (orchestrator + LLM + vector DB + storage). Multi-port stacked exposure mirrors the Klinikken.ai pattern. Operator’s personal Google Drive content is being ingested via the webhook layer.

Disclosure path: Scaleway France (abuse@scaleway.com) + operator email (registrant lookup pending). French DPA is CNIL if PII confirmed. Disclosure draft to be authored as disclosures/GRAPHRAG-PROCESS-SAFETY-2026-05-09.md.


F10: Embedding oracle attack chain

The combination of:

  1. Auth-off embedding server (compute cost borne by operator)
  2. Known model in use (disclosed by /info or Shodan HTML)
  3. Exposed vector DB on same host (cross-referenced against 02-vector-databases.md survey)

…creates a complete embedding oracle attack chain. Attacker queries the embedding server to pre-compute vectors for target documents/queries, then uses those vectors to probe the vector DB semantically (nearest-neighbor search reveals what documents the victim’s RAG system contains, without direct DB access). This chain requires zero credentials and zero vulnerability exploitation. It’s pure authorized-feature abuse.
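From the defender's or surveyor's side, the third condition (embedding server and vector DB on the same host) reduces to an intersection of two result sets keyed by IP. The sketch below shows that correlation step only; the function name, data shapes, and service labels are hypothetical, and the addresses are documentation-range placeholders.

```python
def dual_stack_hosts(embedding_hosts, vector_db_hosts):
    """Return IPs present in both survey result sets, with the services seen on each side."""
    shared = embedding_hosts.keys() & vector_db_hosts.keys()
    return {
        ip: {"embedding": embedding_hosts[ip], "vector_db": vector_db_hosts[ip]}
        for ip in sorted(shared)
    }

# Toy inputs: {ip: [service labels]} from two separate surveys.
embedding = {"192.0.2.10": ["Embedding API:8001"], "192.0.2.11": ["Ollama:11434"]}
vector_db = {"192.0.2.10": ["Qdrant:6333"], "192.0.2.99": ["Milvus:19530"]}

flagged = dual_stack_hosts(embedding, vector_db)
print(flagged)
```

An operator can run the same check against their own inventory to find hosts where both layers are reachable and prioritize them for access control.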


Threat-class realization

Class | Realized? | Scope
Compute theft (GPU/CPU billing) |  | All unauth embedding servers (~100% of confirmed)
Embedding oracle (vector DB probing) |  | Any host with collocated vector DB
Embedding API abuse (rate-unlimited) |  | All unauth
HTTP/2 DoS via CVE-2023-44487 |  | 209 hosts
Dual-stack RAG data extraction | ⚠️ Requires correlation with vector DB survey | Subset

Survey gap: Shodan-dark population

The masscan-supplemented population (TEI, infinity, custom FastAPI roots returning JSON) is not captured here. A port-targeted aimap sweep of tier-2 cloud prefixes on ports 7997, 8000, 8001, 8002, 8080, 3000 would surface the full population. Estimate: 3-5× the Shodan-visible count, concentrated in port-7997 (infinity) and port-8000/8001 (custom FastAPI).


See also