Vector Databases, Data Layer, NuClide Stack

What it is

A vector database stores high-dimensional embeddings (numerical fingerprints of text, images, and audio) and answers nearest-neighbour queries against them. It is the memory of most RAG systems. The popular ones each carve out a slightly different niche: ChromaDB (the developer-friendly default), Qdrant (Rust-fast, popular in production), Milvus (the heavyweight enterprise option), Weaviate (schema-rich), Pinecone (managed-only), pgvector (a Postgres extension), and Elasticsearch with its dense_vector type (the one that already lives in your stack).
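To make "nearest-neighbour query" concrete, here is a minimal sketch in pure Python. It is not how any of the engines above is implemented: it scans every record and ranks by cosine similarity, whereas production engines typically use approximate indexes. The record layout (id, vector, metadata) and the sample values are illustrative assumptions, not taken from any real collection.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, records, k=2):
    """Brute-force search: score every record against the query, return the top k."""
    scored = [(cosine_similarity(query, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


# Hypothetical records: tiny 3-dimensional vectors stand in for real embeddings.
RECORDS = [
    {"id": "doc-1", "vector": [0.9, 0.1, 0.0], "metadata": {"text": "refund policy"}},
    {"id": "doc-2", "vector": [0.0, 1.0, 0.0], "metadata": {"text": "shipping times"}},
    {"id": "doc-3", "vector": [0.8, 0.2, 0.1], "metadata": {"text": "returns and refunds"}},
]

top_ids = [r["id"] for r in nearest_neighbours([1.0, 0.0, 0.0], RECORDS, k=2)]
print(top_ids)
```

Note that each hit comes back with its metadata attached, which in this sketch includes the embedded text itself.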

What goes wrong

Most popular self-hosted vector DBs ship with authentication off by default, and many deployments leave them on a public listen socket. The exposure isn’t theoretical: it’s the contents. Every collection is named (often after the project: customer-support-knowledge, legal-discovery-q4, patient-notes-v2); collections typically carry the embedded text in their metadata; many also contain the original source URL or document ID. Reading a single collection can let an attacker reconstruct much of the operator’s internal corpus, plus the prompts the operator has been embedding (which are often customer queries).

How we test

We hit the heartbeat endpoint to confirm the engine, list collections via the unauthenticated metadata API, and read only the first record’s metadata (never the raw vectors, never the bulk content). The collection names plus the metadata-schema fields are sufficient evidence of exposure. For operators we already know (universities, medical centres, financial institutions), we draft the disclosure on collection names alone, which avoids touching the contents at all.
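The evidence-minimisation step described above can be sketched as follows. This is an illustration under stated assumptions, not the tooling actually used: the function names, the shape of the evidence record, and the sample metadata are all hypothetical. The point it shows is that only field names and value types are kept, so the metadata values never enter the evidence.

```python
def metadata_schema(metadata):
    """Map each metadata field name to the type name of its value; values are discarded."""
    return {key: type(value).__name__ for key, value in sorted(metadata.items())}


def exposure_evidence(engine, collection_names, first_record_metadata):
    """Build a minimal evidence record: engine, collection names, metadata schema only."""
    return {
        "engine": engine,
        "collections": sorted(collection_names),
        "metadata_schema": metadata_schema(first_record_metadata),
    }


# Hypothetical input: what a first record's metadata might look like.
sample_metadata = {
    "text": "secret internal paragraph",
    "source_url": "https://internal.example/doc/1",
    "page": 3,
}

evidence = exposure_evidence("chromadb", ["patient-notes-v2"], sample_metadata)
print(evidence)
```

Keeping the schema separate from the values means the evidence record can be shared in a disclosure without reproducing any of the operator’s content.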

Vector database population survey, 2026-05-17

VisorBishop loop-iteration #3: AI-stack ML pipeline ports, Rogers NetOps disclosure

Milvus/Attu on Public Cloud: Auth Posture and Multi-Tenant SaaS Exposure Survey

Weaviate on Public Cloud: Auth Posture and Enterprise Tenant Exposure Survey

Commercial AI Infrastructure Exposures

ChromaDB on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Milvus on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Qdrant on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

Operator Remediation Guide

ChromaDB on Public Cloud: Auth Posture Survey

Milvus on Public Cloud: Auth Posture Survey

Qdrant on Public Cloud: Auth Posture Survey

The Modern AI Stack Ships Open: Cross-Survey Synthesis

Stock.ai (EMOR AI) — Partial-Auth Failure, Open Weaviate, and 62 Proprietary Analyst Reports

Commercial AI Infrastructure Exposures

Auto F&I Sales Training RAG: Customer Dialogues + Methodology IP Exposed via Unauthenticated ChromaDB

Crypto Investment Agent: Per-User Financial Memory Exposed via Unauthenticated ChromaDB

HolaModa + Delta701: Multi-Tenant Fashion Retail RAG with Dev/Prod Co-Located on Unauth ChromaDB

Brazilian Banking-Compliance AI Consultant: Unauthenticated Qdrant with BCB / LGPD Methodology Corpus

Multi-Tenant Personal Document SaaS: Diary, Theater Scripts, Philosophy via Unauth ChromaDB

Unknown Operator: Pingu Crypto Trading AI + Nova Molecular Optimization: Live Strategy IP Exposed via Unauthenticated Qdrant

tweet-optimize.com: 1.21M Facial Embeddings (OnlyFans + Second Dataset) Exposed Unauth on Milvus

Watzis / Calmio: Vietnamese AI Assistant: PII Memory Store Exposed via Unauthenticated Qdrant

Blutspende Sergogram Flowise Weaviate Credentials Exposed 2026 05 25

Sergogram Flowise Weaviate Operator 2026 05 25

MyAi Corporation: Unauthenticated Multi-Tenant Weaviate Knowledge Base
