Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All research

Tier-2 Survey May 4, 2026

ChromaDB on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

NuClide Research · 2026-05-04 Companion to: chromadb-cloud-survey-2026-05.md (DO/Hetzner/Vultr baseline, 48 instances) Sibling tier-2 expansions: ollama-tier2-cloud-survey-2026-05.md, qdrant-tier2-cloud-survey-2026-05.md, milvus-tier2-cloud-survey-2026-05.md


Summary

Mass-scan of port 8000 (ChromaDB default) across the same 76 tier-2 /16 ranges (3.55M IPs), Scaleway + OVH + Linode used in the parallel Qdrant/Milvus/Ollama tier-2 expansions. 34,524 port-open candidates → 44 confirmed ChromaDB instances → 23 populated.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, S7068, S7075, T5858, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, T5854, T5868, T5904
  • overlap (Common AI KSATs (all 5 roles)): K1157, K1158, K22, K6935, K7003, K942

Combined with the original DO/Hetzner/Vultr baseline (48 instances), total ChromaDB count across NuClide surveys is 92 instances, 100% unauthenticated, the auth-off-default thesis reproduces cleanly.

The notable findings are not the auth state itself (ChromaDB ships auth-off; that’s now well-documented) but the branded enterprise tenant content appearing in unauth ChromaDB collections at population scale:

  • STIHL (German power-tools brand) AI deployment via RaptorCX integrator, host serves ai-stihl.raptorcx.com cert
  • AXA Insurance, rag_axa collection on unauth ChromaDB
  • Mitsubishi + Daikin, branded Japanese corporate tenants
  • Indonesian financial regulator (OJK) + Indonesian Personal Data Protection Law (UU PDP), RAG corpus over Indonesian banking regulations and the privacy law itself
  • Healthcare/medical content, oncology, patient_info_embeddings, larvol_kol (Larvol pharma KOL data)

Methodology

masscan -iL <76 tier-2 /16 CIDRs> -p 8000 --rate 10000
  → 34,524 port-8000 hits (port 8000 is heavily shared)

chroma-probe.py (200-thread, strict signature)
  GET /api/v1/heartbeat → JSON containing "nanosecond heartbeat"
  GET /api/v1/version
  GET /api/v1/collections OR /api/v2/tenants/default_tenant/databases/default_database/collections
  → 44 confirmed ChromaDB instances

Strict heartbeat-signature probe filters out the 99.9% non-ChromaDB hits on port 8000 (port also used by uvicorn FastAPI apps, Open WebUI, generic HTTP servers).


Findings Summary

MetricValue
Tier-2 /16 ranges scanned76
Total IPs scanned3,550,208
Masscan hits on :800034,524 (heavy false-positive port)
ChromaDB confirmed44
Unauthenticated44 (100%), auth-off-default
Populated (≥1 collection)23
API version splitv2: 40, v1: 4
Top ChromaDB version1.0.0 (30 hosts), 0.6.3 (6)

Combined ChromaDB across all NuClide surveys

SurveyConfirmed100% unauth
DO/Hetzner/Vultr baseline48yes
Tier-2 (Scaleway/OVH/Linode)44yes
Total92yes

Operator identification (cert pivots on populated tier-2 hosts)

13 of 23 populated hosts have TLS certs revealing operator identity (identities redacted pending coordinated-disclosure windows):

IPCert SAN class
139.162.58.251Religious-content SaaS, dev4 environment
141.95.173.68Generic SaaS API
151.80.36.161Dubai luxury-collection platform
158.69.200.136German power-tools brand AI deployment via integrator
172.105.180.119Religious-content SaaS, dev1 environment (same operator class as 139.162.58.251)
172.237.100.134UAE real-estate platform
217.182.175.144Brazilian SaaS log/analytics endpoint
51.15.248.130Deck/aviation AI
51.158.182.87EU food-supplement / health brand
51.68.233.220AI sketching / image-gen platform
51.75.95.98QA staging environment for code/build platform
54.37.154.254Generic API
54.39.50.216Marketing-intelligence platform

Remaining 10 populated hosts have no TLS cert on port 443 (OVH default rDNS) and are not directly operator-attributable.


Notable collection content

Sample of distinctive collection names observed across the 23 populated hosts:

CollectionContent class
rag_axaAXA Insurance, branded enterprise tenant on shared ChromaDB
rag_ojkOJK = Otoritas Jasa Keuangan = Indonesian Financial Services Authority, government regulatory body
rag_uupdpUU PDP = Indonesian Personal Data Protection Law (Undang-Undang Pelindungan Data Pribadi), RAG over the data-privacy law itself
oncologyMedical oncology content
patient_info_embeddingsPatient information embeddings, healthcare-PII concern
larvol_kolLarvol Key Opinion Leaders, pharma KOL data
mitsubishiMitsubishi-branded RAG
daikinDaikin-branded RAG (HVAC manufacturer)
pmi-faq, pmilegales, pmiprecios, pmiproductosMulti-collection “pmi” tenant, possibly Philip Morris International or PMI Project Management Institute (the legal/precios/productos suffix suggests pharma or product-catalog)
hilversum_documentsDutch municipality (Hilversum) document RAG
os_bali_documentsBali / Indonesian regional document RAG
collectdxbDubai collections platform
documents_<UUID>Per-tenant per-document UUID-keyed collections (multi-tenant SaaS pattern)

The mix of major-brand tenants (AXA, Mitsubishi, Daikin), regulatory/government content (OJK, UU PDP, Hilversum municipality), and healthcare/pharma content (oncology, patient_info_embeddings, larvol_kol) on shared unauth ChromaDB instances is the same pattern observed in the parallel Qdrant tier-2 survey: operators don’t recognize that ChromaDB on port 8000 is a separate security perimeter from their authenticated front-end, and so the data tier carrying enterprise/regulated content is exposed.


Cross-platform correlation

  • Two-tier auth-skew pattern (also documented in Qdrant tier-2): Multiple ChromaDB tier-2 hosts run a Sermon Live-style chat app on port 80/443/3000 (auth-protected) while exposing the backing ChromaDB on port 8000 unauth. Same operator pattern: front-end auth perimeter, data-tier accidental exposure.
  • Brand-tenant overlap with Milvus tier-2: Just as the Milvus tier-2 survey found OCBC Bank Singapore + STIHL-class branded tenants on unauth Milvus, the ChromaDB tier-2 surface includes AXA + STIHL + Mitsubishi + Daikin. Major brands are commissioning AI-RAG products from integrators who are using auth-off-default vector DBs as backing stores. The integrators bear remediation responsibility but the brands face reputational exposure.
  • Government / regulatory content: OIF Francophonie (Qdrant tier-2), Quebec municipal-government RAG (Milvus tier-2), Indonesian OJK + UU PDP (ChromaDB tier-2), Hilversum Dutch municipality (ChromaDB tier-2). At the population scale, government/regulatory content is meaningfully represented on auth-off-default vector stores.

Disclosure posture

Per the consolidated ledger at disclosure/qdrant-snapshot-disclosure-ledger-2026-05.md, no per-operator disclosures have been drafted for the ChromaDB tier-2 cohort yet, the Qdrant disclosures are in the active 30-day window and the cumulative outreach is being managed at sustainable cadence. Per-operator drafts can be added on demand for ChromaDB hosts with the highest content-sensitivity (oncology / patient-info / branded enterprise tenants), but the aggregate finding is the same auth-off-default thesis.

The recommended fix is the same as for the broader survey series:

  1. Set CHROMA_SERVER_AUTHN_PROVIDER and CHROMA_SERVER_AUTHN_CREDENTIALS (token or basic auth)
  2. Bind ChromaDB to 127.0.0.1 and access only over a private network
  3. Apply identical posture to the front-end’s data backend, not just the front-end itself

Raw Data

~/recon/chromadb-tier2-2026-05-04/tier2-chroma-confirmed.jsonl  (44 hosts)

See Also