Milvus on Public Cloud: Auth Posture Survey
NuClide Research · 2026-05-03
Summary
A sweep of 1.83M IPs across 28 cloud-provider /16 ranges (DigitalOcean, Hetzner, Vultr) on port 19530 produced 275 masscan hits. Of those, 33 were confirmed as Milvus instances via the /v2/vectordb/collections/list REST API (with a v1 fallback), and every one returned a success code (code: 0 on v2, code: 200 on v1) with no authentication. 27 of the 33 contain non-empty collections.
DCWF KSAT coverage
Auto-derived from DCWF AI work-role rule files (ksat-tag).
- 672 (AI Test & Evaluation Specialist): K7003, K7004, S7068, S7070, S7075, T5904
- 733 (AI Risk & Ethics Specialist): K7040, K7051, S7067, T5854, T5868, T5882, T5904
- overlap (Common AI KSATs (all 5 roles)): K108, K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K942
Milvus 2.4+ unifies REST and gRPC on port 19530 via the proxy component. When RBAC is not configured, the REST API exposes collection list, schema (describe), and entity query (/v2/vectordb/entities/query) without authentication, and RBAC is opt-in rather than default. This matches the Qdrant / ChromaDB pattern: the vector-DB layer of the modern RAG stack ships open and operators rarely close it.
Methodology
masscan -iL <28 cloud /16 CIDRs> -p 19530 --rate 10000
→ 275 masscan hits on :19530
milvus-probe.py (100-thread REST API probe)
POST /v2/vectordb/collections/list body {"dbName":"default"}
match {"code":0, "data":[<collections>]}
fallback v1: GET /v1/vector/collections - match {"code":200, "data":[...]}
→ 33 confirmed Milvus instances (32 v2, 1 v1)
milvus-deep.py (per-collection schema describe)
POST /v2/vectordb/collections/describe body {dbName, collectionName}
→ field schemas for 28 of 33 instances (5 returned empty schemas - likely permission-gated describe)
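The match rules above can be sketched as a pure function over a response body. This is a minimal illustration for auditing hosts you operate, not the actual milvus-probe.py; the function name `classify_milvus_response` and the `"no-match"` label are hypothetical.

```python
import json


def classify_milvus_response(body: str) -> str:
    """Classify a collections-list response body as 'v2', 'v1', or 'no-match'.

    Mirrors the match rules in the methodology: v2 answers with code 0,
    v1 answers with code 200, and both carry a list under "data".
    """
    try:
        doc = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return "no-match"
    if not isinstance(doc, dict) or not isinstance(doc.get("data"), list):
        return "no-match"
    code = doc.get("code")
    # bool is a subclass of int in Python, so reject JSON true/false explicitly.
    if isinstance(code, bool):
        return "no-match"
    if code == 0:
        return "v2"
    if code == 200:
        return "v1"
    return "no-match"


print(classify_milvus_response('{"code": 0, "data": ["docs"]}'))
print(classify_milvus_response('{"code": 200, "data": []}'))
print(classify_milvus_response("<html>not milvus</html>"))
```

An empty `data` list still counts as a match, which is how fresh installs with no collections end up in the confirmed set.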
Findings Summary
| Metric | Value |
|---|---|
| Cloud /16 ranges scanned | 28 (DO/Hetzner/Vultr) |
| Masscan hits on :19530 | 275 |
| Milvus REST confirmed | 33 |
| Unauthenticated | 33 (100%) |
| With non-empty collections | 27 |
| Empty / fresh installs | 6 |
API version split
| Version family | Count |
|---|---|
| v2 (Milvus 2.4+) | 32 |
| v1 (Milvus 2.3.x and earlier) | 1 |
Hosting provider split
| Provider | Confirmed |
|---|---|
| Hetzner | 16 |
| DigitalOcean | 11 |
| Vultr | 6 |
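As a quick consistency check on the figures above, the funnel and the three splits can be recomputed from the reported counts. The variable names are illustrative; the numbers are taken directly from the tables.

```python
ranges = 28
addresses = ranges * 2**16  # each /16 holds 65,536 addresses
hits = 275
confirmed = 33

print(f"addresses swept:   {addresses:,}")            # 1,835,008, i.e. ~1.83M
print(f"port-open rate:    {hits / addresses:.4%}")
print(f"confirmation rate: {confirmed / hits:.1%}")    # 33 / 275 = 12.0%

# Every split reported in the tables should add up to the confirmed total.
splits = {
    "content (non-empty, empty)": (27, 6),
    "API version (v2, v1)": (32, 1),
    "provider (Hetzner, DigitalOcean, Vultr)": (16, 11, 6),
}
for name, parts in splits.items():
    assert sum(parts) == confirmed, name
```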
High-Value Exposures
1. Everos AI Agent Platform: Multi-Tenant Episodic Memory + User Profiles
Host: 167.172.135.156:19530 (DigitalOcean) · v2 · multi-tenant
Collections (6):
| Collection (suffixed with creation timestamp) | Fields |
|---|---|
| t_everos_v1_episodic_memory_* | id, vector, user_id, group_id, session_id, participants, sender_ids, type, timestamp, episode, search_content, parent_type, parent_id, tenant_id |
| t_everos_v1_foresight_record_* | id, vector, user_id, group_id, session_id, participants, sender_ids, type, start_time, end_time, duration_days, parent_type, parent_id, tenant_id |
| t_everos_v1_atomic_fact_record_* | id, vector, user_id, group_id, session_id, participants, timestamp, sender_ids, type, parent_type, parent_id, tenant_id |
| t_everos_v1_user_profile_* | id, vector, user_id, group_id, scenario, memcell_count, item_type, embed_text, tenant_id |
| t_everos_v1_agent_case_* | id, vector, user_id, group_id, session_id, timestamp, task_intent, parent_type, parent_id, tenant_id |
| t_everos_v1_agent_skill_* | id, vector, user_id, group_id, cluster_id, content, maturity_score, confidence |
What’s exposed:
The schema is sophisticated. This is a production AI agent platform whose cognitive architecture is worth describing:
- Episodic memory: full conversation episodes with participant lists, sender IDs, parent linkage
- Foresight records: agent-generated predictions/plans (start/end time, duration)
- Atomic facts: extracted assertions from conversations
- User profiles: scenario-keyed, with memcell counts indicating cognitive primitives stored per user
- Agent cases: task instances with task_intent
- Agent skills: learned behaviors with maturity_score + confidence
The tenant_id field on five of the six collections (all but agent_skill) indicates multi-tenant SaaS. Collection names are timestamped (20260421060534632549 = April 21, 2026 06:05:34), which suggests that provisioning creates fresh collections per deployment generation.
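The timestamp reading above can be reproduced with a short sketch. It assumes the 20-digit suffix is laid out as YYYYMMDDHHMMSS followed by six microsecond digits; that layout is inferred from the observed value rather than documented by the operator, the time zone is unknown, and the function name is hypothetical.

```python
from datetime import datetime


def parse_collection_suffix(suffix: str) -> datetime:
    """Decode a 20-digit collection suffix into a naive datetime.

    Assumed layout: 14 digits of YYYYMMDDHHMMSS plus 6 digits of microseconds.
    """
    if len(suffix) != 20 or not (suffix.isascii() and suffix.isdigit()):
        raise ValueError(f"expected 20 ASCII digits, got {suffix!r}")
    base = datetime.strptime(suffix[:14], "%Y%m%d%H%M%S")
    return base.replace(microsecond=int(suffix[14:]))


print(parse_collection_suffix("20260421060534632549").isoformat())
# 2026-04-21T06:05:34.632549
```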
Risk: A complete AI-agent-platform dump is exposed. An attacker can:
- Enumerate all users (via the user_id field on every collection)
- Enumerate all tenants (via tenant_id)
- Read full conversation history per user (episodic_memory.episode + search_content)
- Read the agent’s predictive plans (foresight_record)
- Read extracted personal facts (atomic_fact_record)
- Read learned agent behaviors and confidence levels (agent_skill)
This is the most architecturally complete exposure in the survey. Everos appears to be a real AI-agent SaaS startup, and the collection timestamps suggest a recent production launch.
2. Multi-Tenant “Intelbase” Platform: 25 Tenant Collections
Host: 167.71.232.155:19530 (DigitalOcean) · v2
Collections (25):
intelbase_42, intelbase_59, intelbase_74, intelbase_77, intelbase_68, … (25 numbered tenant collections)
Schema: Only id and vector are exposed via describe. Either the full schema requires auth, or the operator’s collection definition is intentionally minimal (vector + ID only, with all metadata in a sibling MongoDB/Postgres).
What’s exposed: A multi-tenant SaaS with sequential tenant IDs starting from low numbers (42, 59, 68, 74, 77; the gaps suggest churn). The “intelbase” naming plus sequential customer IDs suggests a B2B intel/OSINT product where each customer gets their own collection. Even without payload visibility, the list of customer numbers is itself competitive intelligence: a rival can estimate the customer count and growth rate by polling /collections/list periodically.
3. Saudi/Gulf Legal RAG: mahkamaty_prod, hakam_laws
Host: 65.109.51.219:19530 (Hetzner) · v1
Collections (10): law_collection, hakam_laws, mahkamaty_prod_new, mahkamaty_prod, hakam, …
What’s exposed: The Arabic naming is distinctive: mahkamaty (محكمتي) translates to “my court” and is a known Saudi e-litigation portal brand. hakam means “arbitrator”. This appears to be a Gulf-region legal-tech operator running Milvus as the RAG backend for legal document retrieval. The _new suffix on mahkamaty_prod_new suggests active development (a second generation of the production index).
Risk: Saudi Arabia’s PDPL (Personal Data Protection Law, enforceable since September 2024) requires controllers to implement security measures appropriate to data sensitivity. Legal/judicial data is high-sensitivity by any framework. If mahkamaty_prod contains case-related embeddings, the exposure would likely breach PDPL Article 19 (security obligations) and could trigger Article 20 (breach notification; the Implementing Regulation sets a 72-hour window).
4. Midea (Chinese Appliance MFG) Corporate Knowledge Base
Host: 65.108.127.99:19530 (Hetzner) · v2
Collections (4): kb_midea, kb_midea_2, kb_midea__, kb_midea3 (four iterations of the same KB)
Schema fields per collection:
id, text, dense, sparse, pdf_id, chunk_index, page_number, folders, metadata
What’s exposed: Hybrid (dense + sparse) embedding indices over PDF documents, with chunk-level and page-level addressing and the folder hierarchy preserved in the folders field. Midea is a $50B+ Chinese household appliance manufacturer. An exposed corporate KB of this kind could include internal procedures, product specs, supplier documentation, and M&A diligence material. The four collection iterations suggest ongoing KB experimentation by an internal team.
5. Facial Recognition Doxing Primitive: psos + onlyfans: 1.21M face embeddings
Host: 65.108.107.240:19530 (Hetzner FI) · v2
Operator: tweet-optimize.com (per port-80 HTTP 301)
Collections (2):
| Collection | Count |
|---|---|
| onlyfans | 897,111 |
| psos | 313,066 |
Schema: id, mongo_id, image_id, embedding, bbox1, bbox2, bbox3, bbox4
This is the most impactful finding in the survey: 1.21M facial embeddings across two collections (897,111 in onlyfans and 313,066 in psos), exposed unauthenticated. The Milvus /v2/vectordb/entities/search endpoint accepts a face vector and returns nearest-neighbor matches, which makes the index a functional doxing primitive. An attacker with a target’s photo can compute a comparable face embedding locally and query the operator’s index to find which OnlyFans accounts the person appears on.
Full writeup with operator-attributed disclosure path, embedding-space attack details, and sibling-MongoDB analysis: multi-tweet-optimize-facial-recognition.md
6. Multi-Tenant UUID-Named Collections (38 tenants)
Host: 168.119.242.46:19530 (Hetzner) · v1
Collections (38): _3f2db519_1d10_447a_9c51_ed52a3fa1790, _06dc1637_2490_4a69_847d_e7f1ae4bc89f, … (UUID-prefixed)
What’s exposed: UUID-named collections imply auto-generated tenant IDs in a multi-tenant SaaS. The leading underscore and underscore-separated groups follow from Milvus naming rules: collection names may contain only letters, digits, and underscores, and must begin with a letter or underscore, so a UUID that starts with a digit needs a prefix. 38 tenants are visible. Schema describe returned empty for all of them, suggesting RBAC may partially apply (list visible, describe gated), but the tenant count is itself competitive intelligence.
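The naming rule can be illustrated with a short sketch. The regular expression encodes only the character rule stated above (length limits are not checked), and `collection_name_for` is a hypothetical helper that reproduces the observed pattern of always prefixing an underscore; it is not the operator's actual code.

```python
import re
import uuid

# First character: letter or underscore. Remaining: letters, digits, underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_collection_name(name: str) -> bool:
    """Check a name against the character rule only."""
    return NAME_RE.fullmatch(name) is not None


def collection_name_for(tenant: uuid.UUID) -> str:
    """Map a tenant UUID to a name shaped like the ones observed on this host."""
    return "_" + str(tenant).replace("-", "_")


tenant = uuid.UUID("3f2db519-1d10-447a-9c51-ed52a3fa1790")
print(is_valid_collection_name(str(tenant)))  # False: leading digit, hyphens
print(collection_name_for(tenant))            # _3f2db519_1d10_447a_9c51_ed52a3fa1790
print(is_valid_collection_name(collection_name_for(tenant)))  # True
```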
7. Other Notable Exposures (one-line)
| Host | Highlight |
|---|---|
| 144.202.73.63 | 11 collections including meeting_memory, voice_registry, faq, cities_ingest, information_faq_collect; voice + meeting AI agent stack |
| 45.63.7.3 | investigator, organization, study, documents, fq; research/investigation platform |
| 188.166.229.136 | records, functions, knowledge_base; generic LLM tool-use stack |
| 135.181.221.152 | product_name, product_description, product_taxonomy; ecommerce product RAG |
| 135.181.252.66 | experience_memory, mem0migrations, all, all_v3; Mem0-on-Milvus deployment |
| 165.227.8.44 | psk_saree_finder_v3-v6; Indian saree (garment) finder, 4 versions |
| 46.101.105.165 | policy_chunks_v1; policy/legal document RAG |
| 45.76.248.64 | image_database, image_database_backup, image_database_backup_2; image vector DB with backup co-located |
| 159.69.184.136 | wdw_prod_hybrid, prod_hybrid; production hybrid indices |
| 167.172.46.101 | termex_ip_cosine; IP/term-extraction cosine index |
| 65.108.226.74 | teamboost_tasks_title, teamboost_tasks_description; task management AI |
| 65.108.76.202 | image_embeddings, image_embeddings_2; duplicated image embeddings |
| 159.69.87.49 | rostros (“faces” in Spanish); possibly facial recognition |
| 159.203.45.150 | dev_cost_guard; internal cost-guarding tool |
| 116.202.108.128 | skynet_test; interesting name choice for production |
| 168.119.102.222 | legal_acts_e5_large; legal corpus with E5-large embeddings |
| 165.227.182.149 | documents_chunks; generic chunked document RAG |
| 45.76.114.69 | Vector_index_<uuid>_Node; LlamaIndex-style indexed nodes |
| 168.119.141.25 | screenshots; likely OCR’d UI screenshots |
| 168.119.229.126 | llamacollection; LlamaIndex |
| 95.179.181.104 | image_database_gpu/pq/cpu/_; image embeddings with index variants (GPU vs CPU vs PQ) |
Root Cause: Default-Off RBAC
Milvus ships with RBAC disabled by default. Enabling it requires:
# milvus.yaml
common:
  security:
    authorizationEnabled: true
# The built-in root user already exists (default password: Milvus); rotate it first.
# Then create scoped users and roles via:
# milvus_cli> create user -u <user> -p <password>
# milvus_cli> create role -r ro_role
# milvus_cli> grant role -r ro_role -u <user>
The authorizationEnabled: true flag must be explicitly set. The Milvus quickstart Docker Compose and the Helm chart defaults both leave it false. None of the 33 confirmed instances required a token for the list call: /v2/vectordb/collections/list returned code: 0 (success) with no token requirement.
The Milvus 2.4 REST API documentation does describe the auth header (Authorization: Bearer <token>), but because security is disabled by default, the header is ignored unless RBAC is turned on. This is the same pattern as Qdrant and ChromaDB.
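For reference, this is roughly what an authenticated client request looks like once RBAC is on. The sketch only builds the request object and sends nothing; the helper name, the localhost URL, and the `app_reader` credentials are placeholders, and the `username:password` token form is assumed from the Milvus REST documentation.

```python
import json
import urllib.request


def build_list_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated collections-list request."""
    payload = json.dumps({"dbName": "default"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/vectordb/collections/list",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Token assumed to take the form "<username>:<password>".
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_list_request("http://localhost:19530", "app_reader:example-password")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

With RBAC enabled, the same request without the Authorization header should be rejected, which is a simple way for an operator to confirm that the setting took effect on their own deployment.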
Cross-Survey Pattern: Vector DB Auth Posture
| Platform | Sample | Unauthenticated | Default | Survey |
|---|---|---|---|---|
| Qdrant | 61 | 100% | auth-off | qdrant-cloud-survey-2026-05.md |
| ChromaDB | 48 | 100% | auth-off | chromadb-cloud-survey-2026-05.md |
| Milvus | 33 | 100% | RBAC-off | this file |
| Elasticsearch | 42 | mixed | auth-off in 7.x | elasticsearch-cloud-survey-2026-05.md |
| Flowise | 43 | 0% | auth-on (since CVE-2024-36420) | flowise-cloud-survey-2026-05.md |
| n8n | 1,006 | 0% | auth-on (since v0.166.0) | n8n-cloud-survey-2026-05.md |
| Jupyter | 18 (univ) | 0% | PAM/LDAP standard | jupyter-survey-2026-05.md |
Observation: Three independent vector DB vendors (Qdrant, ChromaDB, Milvus), three independent codebases, three independent leadership teams, and all three have the same default. The pattern is not a per-vendor oversight; it is the cultural inheritance of “vector DB is local development infrastructure” that has not yet adapted to “vector DB is production multi-tenant data layer.”
Remediation
# milvus.yaml
common:
  security:
    authorizationEnabled: true
After enabling, change the root password, then create non-root users with read-only or read-write roles per application. Restrict port 19530 to the application backend. Firewall the adjacent service ports as well if they are reachable: 9091 (Milvus metrics/health), 9000 (MinIO object storage), and 2379 (etcd metadata). Milvus is a multi-component system, so closing the proxy port alone leaves storage and metadata accessible.
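A deployment check for the setting above might look like the following sketch. It uses a deliberately minimal, standard-library-only reader that handles simple block-style mappings like the snippet shown here; it is not a general YAML parser, it ignores environment-variable or override files, and both function names are hypothetical. A real pipeline would more likely use a proper YAML library.

```python
from typing import Optional, Sequence


def lookup(yaml_text: str, path: Sequence[str]) -> Optional[str]:
    """Return the scalar at `path` in a simple indented mapping, or None."""
    stack = []  # (indent, key) pairs for the current nesting chain
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments (naive: no quoted '#')
        if not line.strip() or ":" not in line:
            continue
        indent = len(line) - len(line.lstrip())
        key, _, value = line.strip().partition(":")
        # Pop siblings and deeper keys before pushing the current one.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key.strip()))
        if [k for _, k in stack] == list(path):
            return value.strip() or None
    return None


def rbac_enabled(yaml_text: str) -> bool:
    value = lookup(yaml_text, ["common", "security", "authorizationEnabled"])
    return (value or "").lower() == "true"


sample = """\
# milvus.yaml
common:
  security:
    authorizationEnabled: true
"""
print(rbac_enabled(sample))  # True
```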
NuClide Pipeline Artifacts
| Stage | Tool | Output |
|---|---|---|
| Discovery | masscan + custom REST probe | /tmp/milvus-confirmed.jsonl (33 instances) |
| Schema enumeration | custom REST describe probe | /tmp/milvus-deep.jsonl (per-collection schemas) |
| Findings ledger | VisorLog | To be ingested into data/nuclide.db |
| Compliance scoring | VisorScuba | Will fail AI.C1 (unauth-baseline) for all 33 |
| Adversarial corpus | VisorCorpus | Existing chromadb adversarial corpus applies; categories: kb_exfiltration, tenant_cross_leak, prompt_injection transfer |
References
- Milvus authentication: https://milvus.io/docs/authenticate.md
- v2 REST API: https://milvus.io/api-reference/restful/v2.4.x/About.md
- RBAC enable: https://milvus.io/docs/rbac.md
- Cross-survey index: index.md