Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All reference

Reference

2. Vector Databases

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/queries/02-vector-databases

Section verified: April 30, 2026

The storage layer for RAG, embeddings, and long-term LLM memory. Many of these ship without authentication enabled by default, exposed instances often disclose collection names, schema, embedding model, and the LLM provider keys used to generate vectors.

ChromaDB

Shodan QueryNotes
"chroma"2,311 hits, ⚠️ noisy; “chroma” collides with chroma-subsampling, color-correction UIs, Razer Chroma; use as last resort
product:"Chroma"1,838 hits, canonical product facet; sampled hits on 8000/8100 confirm vector DB
product:"Chroma" port:80001,139 hits, traditional default port
product:"Chroma" "uvicorn"353 hits, highest-confidence: Chroma ASGI server + product facet
"chroma" "uvicorn" port:8000230 hits, same intent, no product-facet dependency
http.title:"Chroma"94 hits, title match, some non-DB Chroma products mixed in
http.html:"chromadb"76 hits, name-specific HTML body match
"chromadb"47 hits, name-specific banner match (46 on 2026-04-30 reverify)
"server: uvicorn" "/api/v2"39 hits (2026-04-30), Chroma 0.6+ ASGI fingerprint; v2 API path + uvicorn header narrows to current-era unauth-by-default deployments
http.html:"/api/v1/heartbeat"22 hits, heartbeat path leaked into HTML response
"chroma" port:810013 hits, post-0.5 default port
product:"Chroma" port:810013 hits, product facet on new default
"chroma" "0.6"12 hits, current version era
"chromadb" port:80005 hits, narrow
"chroma" "0.5"2 hits, transitional auth-optional era
"chroma" "0.4"1 hit, pre-auth legacy era, nearly extinct

Fingerprint note: Shodan does not index arbitrary HTTP endpoint paths (/api/v1/heartbeat, /api/v1/collections, /openapi.json) in the root banner, it crawls / and headers only. Path-based fingerprints only work when the path string appears inside HTML response bodies (see http.html:"/api/v1/heartbeat" above). This is a general lesson applicable across the catalogue.

Status-code caveat: Chroma returns 404 at / regardless of auth configuration (no root handler registered), so http.status:200 and http.status:401 both return 0 for product:"Chroma". Auth-enabled vs unauth cannot be distinguished from Shodan banner data alone, probe /api/v1/heartbeat live to tell them apart.

{"nanosecond heartbeat": 1713384722842816000}

What a T1 Chroma heartbeat response looks like. No auth, full API reachable.

Qdrant

Shodan QueryNotes
http.html:"qdrant"949 hits, canonical fingerprint; HTML-body match catches what banner doesn’t
http.html:"qdrant" port:443331 hits, TLS-fronted deployments (proxy or native)
http.html:"qdrant" port:80221 hits, plaintext proxy-fronted
"qdrant"189 hits, banner-level match, narrower subset
"qdrant" "vector"31 hits, banner + domain term
http.html:"qdrant" port:633327 hits, direct default-port exposure; highest-risk subset, no proxy auth layer
"qdrant" "dashboard"17 hits, web dashboard accessible, full read/write
http.title:"Qdrant"13 hits, title match
http.html:"Qdrant Web UI"6 hits, dashboard UI marker
http.html:"Qdrant Dashboard"3 hits, specific dashboard HTML
"qdrant" "collections"1 hit, banner + API term
http.html:"qdrant-version"1 hit, response header string leaked into HTML

Fingerprint field lesson (generalizable): "qdrant" bare returns 189 hits but "qdrant" port:6333 returns 0, not a bug. Bare "<term>" matches Shodan’s banner text (headers + initial response). On port 6333, Qdrant’s REST root returns JSON without the literal word “qdrant” in headers, so banner match misses. The http.html: field parses response bodies and does catch it. Always try both "<term>" and http.html:"<term>".

Path indexing: Qdrant’s /telemetry, /snapshots, and /points/search endpoints don’t surface in Shodan at all, even the terms “telemetry”, “points”, “snapshots” paired with “qdrant” return 0 on every combination tested. These endpoints require live probing, not Shodan queries.

Deployment shift: 922 of 949 Qdrant instances are no longer on port 6333 (80+443+proxies = ~90% of exposure). Same pattern as n8n and Flowise.

Weaviate

Shodan QueryNotes
http.html:"weaviate"1,647 hits, canonical fingerprint
"weaviate"1,326 hits, banner match, anonymous access by default
http.html:"weaviate" port:8080899 hits, ~55% of instances still on default port; direct unauth access likely
"weaviate" port:8080564 hits, banner + default port
"weaviate" port:80206 hits, plaintext proxy-fronted
"weaviate" port:443118 hits, TLS-fronted
"weaviate" "meta"19 hits, meta endpoint term in banner
http.title:"Weaviate"10 hits, title match
http.title:"Weaviate Console"3 hits, console UI title
http.html:"Weaviate Console"3 hits, console UI HTML
"weaviate" "schema"1 hit, schema term in banner

Deployment anomaly: Weaviate is the only vector DB verified so far that did not mass-migrate off its default port. 899 of 1,647 instances (~55%) remain directly exposed on 8080, vs Qdrant (3% on 6333) and Flowise (~0% on 3000). Likely explanation: Weaviate is more often run behind application backends rather than user-facing reverse proxies, the embedding service sits between an app and the data, not between a user and a UI. The practical impact: direct 8080 hits are higher-confidence “this is a real Weaviate API, no auth layer in front” signals.

Path indexing: All 5 original /v1/schema, /v1/objects, /v1/meta, /v1/nodes, /v1/backups queries returned 0. Shodan doesn’t crawl these paths. Live-probing the /v1/meta endpoint on confirmed Weaviate hosts is required to enumerate version + module config.

Module fingerprint gap: "text2vec-openai" and "generative-openai" both return 0 alone and paired, module config is server-side metadata, not reflected in root banner or HTML. To detect OpenAI-key-backed Weaviate deployments, probe /v1/meta live on the 899 port:8080 hits.

Auth reality check: Weaviate ships with anonymous access enabled unless AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false is explicitly set. With 899 direct-exposure hits, assume most are unauthenticated absent contrary evidence.

Milvus

Shodan QueryNotes
"Attu"2,071 hits, admin GUI banner, unique to Milvus
"milvus"1,617 hits, banner match
http.html:"milvus"1,571 hits, canonical HTML fingerprint
http.html:"Attu"1,497 hits, admin GUI HTML
http.title:"Attu"1,492 hits, admin GUI title
product:"Milvus"1,486 hits, Shodan product facet
http.title:"Attu" "Milvus"1,483 hits, highest-confidence admin GUI
"milvus" "Attu" port:8000751 hits, Attu on port 8000 with Milvus banner
"milvus" port:8000751 hits, Milvus banner on 8000
http.title:"Attu" port:3000301 hits, Attu on its own default port 3000
"milvus" http.status:20057 hits, Milvus banner with 200 response
"milvus" "MinIO"50 hits, Milvus + object-storage backend co-located
http.title:"Milvus"36 hits, Milvus in title
"milvus" "etcd"31 hits, Milvus + metadata store
"milvus" port:900011 hits, MinIO port
"milvus" "metrics"8 hits, metrics term in banner
"milvus" port:2379 "etcd"6 hits, etcd metadata store exposed
"milvus" port:90911 hit, legacy HTTP proxy port

gRPC blind spot (important): Milvus’s primary API runs on port 19530 over gRPC (HTTP/2 + protobuf binary framing). Shodan’s banner grab cannot read the “milvus” string from a gRPC banner, it’s not HTTP text. port:19530 returns 522 baseline hits that likely include Milvus, but "milvus" port:19530 returns 0. These 522 hits cannot be confirmed as Milvus from Shodan alone; live probing via a gRPC reflection request or a milvus.proto.milvus.MilvusService/DescribeCollection RPC is required.

Path indexing: Original queries using /api/v1/health and /metrics on port 9091 all returned 0 (same Shodan crawl-path limitation documented in Chroma and Qdrant sections).

Component visibility ranking: In this catalogue’s data, the Attu admin UI is more exposed than the Milvus core itself (2,071 Attu banner hits vs ~1,500 Milvus HTML hits). Attu is a React SPA with full read/write access to any Milvus it connects to, typically via a connection dropdown with no auth checks in the UI layer. A reachable Attu is a reachable database.

Side-service exposure: When Milvus is deployed with its full stack, the etcd metadata store (port 2379) and MinIO object backend (port 9000) often ride alongside, etcd exposes the entire schema and collection topology; MinIO holds the raw vector files. Check "milvus" "etcd" and "milvus" "MinIO" results for colocated trifectas.

Object Storage & Artifact Stores

Vector databases are the search layer. Object storage is where the models, embeddings, raw documents, training datasets, and DB snapshots actually live on disk. Milvus, Weaviate, and most RAG pipelines back onto MinIO or S3 by default. A compromised bucket is typically a higher-impact finding than the index in front of it.

MinIO

Shodan QueryNotes
"MinIO Console" port:900149,984 hits, admin console on default port; largest exposure in this catalogue
"MinIO" port:900043,775 hits, S3-compatible data port; default backing for Milvus/Weaviate/RAG
"Server: MinIO" port:900042,799 hits, Server header leak, near-total overlap with above
http.title:"MinIO Browser" port:90002,711 hits, legacy browser UI (pre-console split)
"MinIO" ("bucket" OR "objects") port:90001,444 hits, bucket listing keywords in banner

MinIO reality check: The previously-listed "MinIO" port:9000 -"auth" is not a useful auth filter, -"auth" matches ~identical counts (43,778 vs 43,775) because “auth” is not a banner token for MinIO either way. Query dropped. Distinguishing authenticated vs open buckets requires live probing /probe-bucket-sign/ or attempting anonymous ListBuckets.

Docker Registry

Shodan QueryNotes
product:"Docker Registry"15,656 hits, Shodan product facet, canonical fingerprint
"Docker Registry"14,843 hits, banner-level match
"Docker Registry" -"unauthorized"9,161 hits, registries NOT returning 401 at root; anonymous-readable subset
"Docker Registry" "unauthorized"5,683 hits, auth-enabled subset (returns 401 at root)
"Docker-Distribution-Api-Version"1,679 hits, HTTP response header leak (highly specific)
"registry/2.0"1,675 hits, www-authenticate realm token leak
"registry/2.0" "unauthorized"1,030 hits, registry realm with 401 enforced
"Docker Registry" port:443873 hits, TLS-fronted registries
"registry/2.0" -"unauthorized"646 hits, realm token + no 401 = anonymous read candidate
"Docker Registry" port:51000397 hits, non-default port, same exposure class
"Docker Registry" port:55000291 hits, non-default port, same exposure class
"Docker Registry" port:5001273 hits, alternate registry port
"Docker Registry" "/v2/" port:5000131 hits, default port + v2 API token
"Docker Registry" port:80100 hits, plaintext registries
http.html:"_catalog"32 hits, catalog path leaked into HTML body
"registry/2.0" port:500029 hits, realm token + default port
"_catalog"20 hits, catalog token in banner
http.html:"/v2/_catalog"14 hits, exact catalog path in HTML
"/v2/_catalog"2 hits, literal path in banner (rare)

Harbor

Shodan QueryNotes
http.title:"Harbor"22,555 hits, enterprise registry title
"harbor" (port:80 OR port:443)14,607 hits, banner match with grouped OR (unparenthesized version silently breaks)

Triage note: Anonymous /v2/_catalog read ≠ anonymous push. But anonymous pull of internal images leaks entire codebases, build-time secrets in layer history, and the full supply chain of whatever ships from that registry. Rate severity on read vs. push separately before escalating. See §12 for Docker daemon and container runtime exposure (distinct from the artifact store).

Scale callout: MinIO (~50k console hits + ~44k data hits) and Docker Registry (~15k) together form the single largest AI-adjacent attack surface in this catalogue, bigger than any model-serving, vector-DB, or orchestration platform measured. When a Milvus or Weaviate colocates with an exposed MinIO, the storage layer is almost always the faster path to training data, DB snapshots, and raw vectors. Prioritize accordingly.

Elasticsearch / OpenSearch

Elasticsearch

Shodan QueryNotes
product:"Elastic"92,587 hits, Shodan product facet; note Shodan uses “Elastic”, not “Elasticsearch”
"elasticsearch"77,771 hits, banner-level match
"elasticsearch" "lucene_version"60,294 hits, JSON root response leak (highly specific)
"elasticsearch" port:92007,075 hits, banner + default HTTP port
"elasticsearch" "8."3,728 hits, v8.x subset, has native dense_vector support
"elasticsearch" port:9200 "cluster_name"109 hits, cluster_name JSON field leaked
"elasticsearch" port:9200 "You Know, for Search"94 hits, official tagline leak
http.html:"/_cluster/health"3 hits, cluster health path in HTML (rare)

Kibana

Shodan QueryNotes
"kibana"17,368 hits, banner match
"kibana" port:56015,253 hits, Kibana default port
http.title:"Kibana"2,230 hits, UI title
"kibana" "server is not ready"3 hits, bootstrap state, often unauth window

OpenSearch

Shodan QueryNotes
"opensearch"63,832 hits, banner match; browser-plugin pollution tested as minimal
http.html:"opensearch"22,846 hits, HTML body match
http.title:"OpenSearch Dashboards"7,843 hits, dashboards UI title (clean)
"opensearch" port:9200934 hits, data node default port
"opensearch-dashboards"269 hits, dashboards banner
"opensearch-dashboards" port:560178 hits, dashboards default port

AI-extension blind spot (important): The vector-search features that make Elasticsearch/OpenSearch relevant to AI (dense_vector field, knn plugin, embeddings, number_of_shards, index mappings) are invisible to Shodan. Every query containing "dense_vector", "knn", "embeddings", or "number_of_shards" returns 0, even alone, even in html fields. These tokens live in per-index API responses (/_mapping, /_cat/indices), which Shodan does not crawl. You cannot distinguish “ES running a RAG vector store” from “ES storing web logs” via Shodan alone, requires live probing.

Auth-state blind spot: "elasticsearch" -"security_exception" matches the same 77k as unfiltered (literally 77,772 vs 77,771). X-Pack auth errors don’t appear in Shodan banners either. Distinguishing open clusters from auth-gated ones requires live / or /_cluster/health probes with no auth header.

Browser plugin pollution tested, not a concern: http.html:"opensearch" -"opensearchdescription" = 22,845 hits (removes 1 from 22,846). The W3C OpenSearch browser plugin spec is not meaningfully contaminating OpenSearch DB counts.

Scale note: With product:"Elastic" at 92,587, Elasticsearch is the single most-exposed platform in this entire catalogue, larger than MinIO Console (49,984) and n8n (77,102). Most of these run for observability/logs, not vectors, but a non-trivial fraction of the 3,728 v8.x instances serve RAG workloads whose vector indexes cannot be separated from log indexes without live probing.

PostgreSQL + pgvector

PostgreSQL base platform

Shodan QueryNotes
"PostgreSQL"603,878 hits, banner match; largest base platform in catalogue. Most auth-gated on the wire protocol, not exploitable from Shodan banner alone
product:"PostgreSQL"602,991 hits, Shodan product facet (near-total overlap with banner)

pgvector

Shodan QueryNotes
"pgvector"125 hits, banner match; vanishingly small subset of PG instances fingerprint as vector DBs
http.html:"pgvector"103 hits, pgvector mentioned in HTML body (dashboards, docs, admin UIs)

Supabase

Shodan QueryNotes
http.html:"supabase"45,784 hits, Supabase ships with pgvector by default; each of these is a candidate RAG backend
"supabase"3,276 hits, banner-level match
"Supabase" port:800056 hits, Supabase Studio/API on default port, direct-exposure subset
http.title:"Supabase Studio"1 hit, Studio admin UI title

pgAdmin

Shodan QueryNotes
http.html:"pgAdmin"6,899 hits, HTML body match (largest admin-surface count)
http.title:"pgAdmin"6,704 hits, admin UI title; default creds historically common
"pgAdmin" port:80100 hits, plaintext-exposed admin UI
"pgAdmin"892 hits, banner-level match
"pgAdmin" port:44359 hits, TLS-fronted admin UI

Timescale

Shodan QueryNotes
"Timescale"63 hits, banner-level match
http.html:"timescaledb"56 hits, TimescaleDB mentioned in HTML; vector support via pgvector compatibility

Neon

Shodan QueryNotes
"neon.tech"139 hits, Neon hostname in banner/TLS cert
http.html:"neon.tech"7 hits, Neon domain in HTML body
"Neon" "postgres"3 hits, Neon word collision is high; filter tightly

Scale reality: PostgreSQL at 603k+ is the largest base-platform count in this catalogue, but the vast majority are not AI-adjacent. The pgvector subset is 125 banner-match hits, three orders of magnitude smaller. Supabase is the signal-to-noise winner here: 45,784 Supabase HTML hits vs 125 raw pgvector hits, and Supabase ships pgvector by default. A reachable Supabase backend with open anon keys is a pgvector-backed RAG store with higher probability than probing a random "PostgreSQL" host.

Auth-state blind spot: PostgreSQL uses a binary protocol on 5432; Shodan captures the startup error/version banner but not auth posture. psql -h <ip> -U postgres or explicit SELECT over a captured connection string is the only way to confirm anonymous vs authenticated access. The 603,878 banner hits are a visibility count, not a vulnerability count.

Original query failures: All 5 originals combined narrow port (5432) + path-style strings ("pgvector", "vector") that don’t coexist in the PG startup banner. "PostgreSQL" port:5432 "pgvector" = 0 because the startup error doesn’t contain extension names. Use the replacements above.

Shodan QueryNotes
product:"Redis"245,566 hits, Shodan product facet, canonical Redis fingerprint
"Redis" port:6379 -"AUTH" -"NOAUTH"67,934 hits, no password set (filter narrows from 245k → 68k, meaningful)
"Redis Stack"754 hits, Redis Stack bundle (includes RediSearch for vectors)
"Redis Stack" port:6379726 hits, Redis Stack on default port
http.title:"RedisInsight"295 hits, Redis GUI title
"RedisInsight"57 hits, GUI banner match
"Redis" "FT.SEARCH"1 hit, vector-search command in banner (extremely rare)

MongoDB

Shodan QueryNotes
port:27017 -"unauthorized"568,835 hits, broad MongoDB-port candidates not returning 401; not all are MongoDB but the default Mongo port dominates
product:"MongoDB"197,308 hits, Shodan product facet
"MongoDB"107,071 hits, banner-level match
"MongoDB" port:27017 -"auth"89,942 hits, unauth-candidate subset (filter narrows ~17%)
"MongoDB" port:27017 "vector"⚠️ 59,996 hits, “vector” likely pollutes; Atlas Vector Search is the AI feature but this token appears in more MongoDB banners than expected, sample before trusting
"mongo-express" port:8081187 hits, web admin, default creds common

ClickHouse

Shodan QueryNotes
"clickhouse"35,023 hits, banner
product:"ClickHouse"32,152 hits, Shodan product facet
"clickhouse" port:8123 ("vector" OR "similarity")1 hit, OLAP + vector hybrid in banner (rare, live-probe required)

Other Vector DBs

Shodan QueryNotes
"arangodb"641 hits, banner match
http.html:"arangodb"555 hits, HTML body
http.title:"ArangoDB"551 hits, broader than “ArangoDB Web Interface”
http.title:"ArangoDB Web Interface"550 hits, specific admin UI title
"surrealdb"480 hits, banner
"typesense"341 hits, banner
http.html:"lancedb"334 hits, HTML body (bare "lancedb" banner = 0)
"cassandra"267 hits, banner; vector extensions in 5.x+ invisible to Shodan
http.html:"vespa"238 hits, Vespa HTML
"Vespa"232 hits, banner
http.html:"typesense"202 hits, HTML body
"Vespa" "document"67 hits, Vespa + document term (narrower)
http.html:"surrealdb"12 hits, HTML body
"Zilliz"8 hits, Milvus’s hosted variant, sparse banner presence
"marqo"7 hits, banner
"txtai"3 hits, banner

No product facet on Shodan (verified 0) for: SurrealDB, ArangoDB, Marqo, LanceDB, Typesense, Vespa, Zilliz, Cassandra, txtai. Use banner/HTML variants.

Graph Databases / Memory

Shodan QueryNotes
product:"Neo4j"9,743 hits, Shodan product facet
"Neo4j"6,101 hits, banner match
"Neo4j" port:7474 "browser"5,225 hits, Neo4j Browser UI, default creds neo4j/neo4j historically common
"Dgraph"185 hits, banner (case-insensitive)
"Mem0"140 hits, AI memory store
http.title:"Ratel"73 hits, Dgraph Ratel UI (cleaner than bare "ratel" which returns 156k of unrelated noise)
"memgraph"51 hits, in-memory graph DB, agent memory workloads
"Memgraph Lab"10 hits, Memgraph web UI
"Mem0" port:80009 hits, Mem0 on default port

Ratel noise warning: "ratel" bare returns 156,554 hits due to unrelated projects/products sharing the name. http.title:"Ratel" (73) is the clean version for Dgraph’s UI specifically.

Auth-state takeaways from this section:

  • Redis -"AUTH" -"NOAUTH" filter is meaningful (245k → 68k)
  • MongoDB -"auth" filter is meaningful (107k → 90k)
  • Contrast with ES -"security_exception" and MinIO -"auth" which are both no-ops
  • Rule: always verify negative-filter behavior against the unfiltered count before relying on it as T1

MongoDB "vector" anomaly ⚠️: The "MongoDB" port:27017 "vector" query returns 59,996 hits, suspiciously large given MongoDB Atlas Vector Search is a 2023+ feature that runs on Atlas cloud (not typically self-hosted on open 27017). Likely contaminated by the word “vector” appearing in unrelated MongoDB banner/log content. Sample before trusting this as a RAG-backend fingerprint.