Corpus index built 2026-06-07 23:58 UTC

§ THE STACK / DATA LAYER

Vector Databases

ChromaDB, Milvus, Qdrant, Weaviate, pgvector

Vector stores, registries, memory, datasets: what the model knows and remembers.

What it is

A vector database stores high-dimensional embeddings (numerical fingerprints of text, images, and audio) and answers nearest-neighbour queries against them. It is the retrieval memory behind every RAG system. The popular engines each carve out a slightly different niche: ChromaDB (the developer-friendly default), Qdrant (written in Rust, fast, popular in production), Milvus (the heavyweight enterprise option), Weaviate (schema-rich), Pinecone (managed-only), pgvector (a Postgres extension), and Elasticsearch with its dense_vector field type (the one that already lives in your stack).
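At its core, a nearest-neighbour query is a similarity ranking. The sketch below is a minimal brute-force version in Python. It assumes cosine similarity and a small in-memory index of toy 3-dimensional vectors. The names `cosine` and `nearest` and the sample data are illustrative, not any engine's API. Production engines replace the linear scan with approximate indexes such as HNSW, but they answer the same question.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], index: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Brute-force k-nearest-neighbour search: score every stored vector, keep the top k."""
    best = heapq.nlargest(k, index.items(), key=lambda item: cosine(query, item[1]))
    return [doc_id for doc_id, _ in best]


# Toy index: hypothetical document IDs mapped to 3-dimensional embeddings.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
print(nearest([0.9, 0.1, 0.0], index, k=2))  # ['doc-a', 'doc-c']
```

The embedding itself carries no readable text. In a RAG deployment, the readable part is whatever the operator stores next to each vector.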

What goes wrong

Most of the popular self-hosted vector DBs (ChromaDB, Qdrant, Milvus, Weaviate) ship with authentication off by default, and their standard container deployments listen on all interfaces. The exposure isn't theoretical; it's the contents. Every collection is named, often after the project (customer-support-knowledge, legal-discovery-q4, patient-notes-v2). Collections typically store the embedded source text alongside each vector, in the metadata or a document field, and many also carry the original source URL or document ID. Reading a single collection lets an attacker reconstruct most of the operator's internal corpus, plus the prompts the operator has been embedding, which are often customer queries.
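On the operator side, the listen-socket half of that problem is easy to audit: which address is the service bound to? The following is a minimal sketch using Python's standard `ipaddress` module. `classify_bind_address` is a hypothetical helper and is not part of any vector DB's tooling. An unspecified address (`0.0.0.0` or `::`) means the service accepts connections on every interface. On a cloud VPS, that includes the public one.

```python
import ipaddress


def classify_bind_address(host: str) -> str:
    """Classify the address a server listens on (hypothetical audit helper)."""
    if host == "localhost":
        return "loopback"
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        # 0.0.0.0 or :: -> listens on every interface, public ones included
        return "all-interfaces"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


for host in ["0.0.0.0", "::", "127.0.0.1", "10.0.0.5", "8.8.8.8"]:
    print(host, classify_bind_address(host))
```

Only `loopback` keeps the port off the network entirely. Any other result depends on a firewall or on authentication being switched on.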

How we test

We hit the heartbeat endpoint to confirm the engine, list collections via the unauthenticated metadata API, and read only the first record's metadata (never the raw vectors, never the bulk content). Collection names plus metadata-schema field names are sufficient evidence of exposure. For operators we can already identify (universities, medical centres, financial institutions), we draft the disclosure on collection names alone, which avoids touching the contents in any way that would itself be reportable.
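The data-minimisation rule in that protocol can be made mechanical: whatever the probe fetches, only collection names and metadata field names make it into the evidence record. Below is a minimal sketch in Python. It assumes the first record's metadata has already been parsed into a dict. `ExposureEvidence` and `reduce_to_evidence` are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ExposureEvidence:
    engine: str
    collection_names: Tuple[str, ...]
    metadata_fields: Tuple[str, ...]  # field names only, never values


def reduce_to_evidence(
    engine: str,
    collection_names: Iterable[str],
    first_record_metadata: Optional[Mapping[str, Any]],
) -> ExposureEvidence:
    """Keep names and schema; drop every value (values may hold document text)."""
    fields = tuple(sorted(first_record_metadata or {}))
    return ExposureEvidence(engine, tuple(sorted(collection_names)), fields)


evidence = reduce_to_evidence(
    "chromadb",
    ["patient-notes-v2", "customer-support-knowledge"],
    {"source": "https://intranet.example/doc/17", "text": "confidential paragraph"},
)
print(evidence)
```

The frozen dataclass has no slot for values, so the record structurally cannot retain document content even when the metadata contains it.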

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Cross-cloud surveys

13
Survey May 17, 2026

Vector database population survey, 2026-05-17

We surveyed the public vector-database population: Qdrant, Weaviate, Milvus, ChromaDB. Vector DBs hold the embeddings for an operator's RAG pipeline. Every document, customer transcript, support ticke…

Survey May 11, 2026

VisorBishop loop-iteration #3: AI-stack ML pipeline ports, Rogers NetOps disclosure

NuClide Research · 2026-05-11

Survey May 9, 2026

Milvus/Attu on Public Cloud: Auth Posture and Multi-Tenant SaaS Exposure Survey

Shodan pull of http.title:"Attu" "Milvus" → 1,389 unique IPs → asyncio probe of Attu port 3000 + Milvus REST port 19530 → 763 confirmed reachable instances. Of these, 303 have the Attu admin UI open (…

Survey May 9, 2026

Weaviate on Public Cloud: Auth Posture and Enterprise Tenant Exposure Survey

Shodan pull of http.html:"weaviate" port:8080 → 852 unique IPs → asyncio probe of /v1/meta, /v1/schema, /v1/nodes → 694 confirmed reachable Weaviate instances. Of these, 435 are fully open (no authent…

Survey May 4, 2026

Commercial AI Infrastructure Exposures

Commercial / SaaS Ollama and AI infrastructure exposures discovered during OSINT sweeps. These differ from university and research-network exposures in that the operators are commercial entities with…

Survey May 4, 2026

ChromaDB on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Mass-scan of port 8000 (ChromaDB default) across the same 76 tier-2 /16 ranges (Scaleway, OVH, Linode; 3.55M IPs) used in the parallel Qdrant/Milvus/Ollama tier-2 expansions. 34,524 port-open candid…

Survey May 4, 2026

Milvus on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Mass-scan of port 19530 (Milvus REST/gRPC default) across the same 76 tier-2 /16 ranges (Scaleway, OVH, Linode; 3.55M IPs) used in the Ollama and Qdrant tier-2 expansions. 5,480 port-open candidates…

Survey May 4, 2026

Qdrant on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Mass-scan of port 6333 (Qdrant HTTP API) across the same 76 tier-2 /16 ranges (Scaleway, OVH, Linode; 3.55M IPs) used in the tier-2 Ollama expansion. 9,192 port-open candidates → 781 confirmed Qdran…

Survey May 4, 2026

Operator Remediation Guide

If you operate one of the platforms surveyed in 2026-05, most exposures resolve to a single configuration change to enable authentication. The most-effective hardening goes one step further and binds…

Survey May 3, 2026

ChromaDB on Public Cloud: Auth Posture Survey

Sweep of 1.83M IPs across 28 cloud-provider /16 ranges (DigitalOcean, Hetzner, Vultr) on port 8000 → 22,765 masscan hits → 48 confirmed ChromaDB instances via /api/v{1,2}/heartbeat → {"nanosecond hear…

Survey May 3, 2026

Milvus on Public Cloud: Auth Posture Survey

Sweep of 1.83M IPs across 28 cloud-provider /16 ranges (DigitalOcean, Hetzner, Vultr) on port 19530 → 275 masscan hits → 33 confirmed Milvus instances via the /v2/vectordb/collections/list REST API →…

Survey May 3, 2026

Qdrant on Public Cloud: Auth Posture Survey

Sweep of 1.83M IPs across 28 cloud-provider /16 ranges (DigitalOcean, Hetzner, Vultr) on port 6333 → 9,462 live hosts (partial scan, killed at 40% coverage) → 151 masscan hits → 61 confirmed Qdrant in…

Survey May 3, 2026

The Modern AI Stack Ships Open: Cross-Survey Synthesis

Across thirteen distinct platform classes (vector databases, model-serving inference servers, MLOps tracking, image generation, agent platforms, chat UIs, data apps, and orchestration tools), surveyed…

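The surveys above all follow the same funnel: first an open port, then a confirming response from the engine's own API. The confirmation step matters because default ports are shared. On port 8000, 22,765 masscan hits reduced to 48 confirmed ChromaDB instances. Below is a minimal sketch of that step in Python, assuming the response body has already been fetched and JSON-decoded. The ChromaDB heartbeat body and the Weaviate `/v1/meta` and Milvus `/v2/vectordb/collections/list` endpoints come from the surveys. The Qdrant check assumes that its root endpoint (`GET /`) reports a `title` beginning with "qdrant". All field checks are our assumptions about response shape, not a published fingerprinting spec, and `fingerprint` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional

# Default ports named in the surveys above (Attu's port 3000 omitted for brevity).
DEFAULT_PORTS = {8000: "chromadb", 6333: "qdrant", 8080: "weaviate", 19530: "milvus"}


def _is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def fingerprint(port: int, body: Mapping[str, Any]) -> Optional[str]:
    """Confirm the engine a port suggests, or return None if the body doesn't match."""
    guess = DEFAULT_PORTS.get(port)
    if guess == "chromadb":
        # /api/v{1,2}/heartbeat -> {"nanosecond heartbeat": <int>}
        ok = _is_int(body.get("nanosecond heartbeat"))
    elif guess == "qdrant":
        # Assumed: GET / returns a "title" field naming qdrant
        ok = str(body.get("title", "")).lower().startswith("qdrant")
    elif guess == "weaviate":
        # Assumed shape of /v1/meta: includes "version" and "modules"
        ok = "version" in body and "modules" in body
    elif guess == "milvus":
        # Assumed shape of /v2/vectordb/collections/list: code 0 plus a list of names
        ok = body.get("code") == 0 and isinstance(body.get("data"), list)
    else:
        ok = False
    return guess if ok else None


print(fingerprint(8000, {"nanosecond heartbeat": 1717000000000000000}))
print(fingerprint(8000, {"status": "ok"}))
```

When the port and the body must agree, an unrelated web app that happens to listen on port 8000 is not counted as a ChromaDB instance.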

Field cases

10
Case May 25, 2026

Stock.ai (EMOR AI) — Partial-Auth Failure, Open Weaviate, and 62 Proprietary Analyst Reports

EMOR AI's unreleased Stock.ai product exposes a Weaviate vector database, individual API resource endpoints, and 62+ proprietary Arihant Capital equity analyst reports. The developer implemented JWT and Google OAuth but left individual resource endpoints unprotected. A reused HR/resume Azure OpenAI subscription confirms operator identity.

Case May 4, 2026

Commercial AI Infrastructure Exposures

Commercial / SaaS Ollama and AI infrastructure exposures discovered during OSINT sweeps. These differ from university and research-network exposures in that the operators are commercial entities with…

Case May 3, 2026

Auto F&I Sales Training RAG: Customer Dialogues + Methodology IP Exposed via Unauthenticated ChromaDB

A ChromaDB instance on a DigitalOcean VPS exposes three RAG collections used to train an auto-dealership F&I (Finance & Insurance) sales agent. The collections contain real customer dialogue transcrip…

Case May 3, 2026

Crypto Investment Agent: Per-User Financial Memory Exposed via Unauthenticated ChromaDB

A ChromaDB instance on a DigitalOcean VPS exposes a Spanish-language crypto investment AI agent's full vector memory: 12 collections holding the CoinGecko API documentation corpus, a 15,560-token cryp…

Case May 3, 2026

HolaModa + Delta701: Multi-Tenant Fashion Retail RAG with Dev/Prod Co-Located on Unauth ChromaDB

A ChromaDB instance on a DigitalOcean VPS holds 1.53M embedded documents across seven collections, spanning two tenants (HolaModa and Delta701) and mixing development with production environments on t…

Case May 3, 2026

Brazilian Banking-Compliance AI Consultant: Unauthenticated Qdrant with BCB / LGPD Methodology Corpus

A Qdrant instance on a DigitalOcean VPS exposes an unauthenticated endpoint with a collection schema consistent with a RAG-backed legal casework or compliance investigation platform. Collections inclu…

Case May 3, 2026

Multi-Tenant Personal Document SaaS: Diary, Theater Scripts, Philosophy via Unauth ChromaDB

A ChromaDB instance on a DigitalOcean VPS exposes three CUID-named collections (corpuscln) representing the personal document corpora of three users on what appears to be a multi-tenant document-RAG S…

Case May 3, 2026

Unknown Operator: Pingu Crypto Trading AI + Nova Molecular Optimization: Live Strategy IP Exposed via Unauthenticated Qdrant

A single Qdrant instance on a Vultr host exposes two parallel autonomous AI agent systems without authentication. The first, "Pingu", is a live crypto trading AI with active positions, real PnL histor…

Case May 3, 2026

tweet-optimize.com: 1.21M Facial Embeddings (OnlyFans + Second Dataset) Exposed Unauth on Milvus


Case May 3, 2026

Watzis / Calmio: Vietnamese AI Assistant: PII Memory Store Exposed via Unauthenticated Qdrant

A production multi-user Vietnamese AI assistant, likely operating under the "Watzis" or "Calmio" brand, runs a Mem0-backed long-term memory stack on a Vultr VPS with no authentication on port 6333. Th…
