Survey May 3, 2026

Milvus on Public Cloud: Auth Posture Survey

NuClide Research · 2026-05-03


Summary

A sweep of 1.83M IPs across 28 cloud-provider /16 ranges (DigitalOcean, Hetzner, Vultr) on port 19530 produced 275 masscan hits. Of those, 33 were confirmed as Milvus instances via the /v2/vectordb/collections/list REST API (with a v1 fallback), and every one of the 33 answered successfully with no authentication. 27 of the 33 contain non-empty collections.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, S7068, S7070, S7075, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, S7067, T5854, T5868, T5882, T5904
  • overlap (Common AI KSATs (all 5 roles)): K108, K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K942

Milvus 2.4+ unifies REST and gRPC on port 19530 via the proxy component. The REST API exposes collection list, schema (describe), and entity query (/v2/vectordb/entities/query) without authentication when RBAC is not configured. RBAC is opt-in, not default. This matches the Qdrant / ChromaDB pattern: the vector-DB layer of the modern RAG stack ships open, and operators rarely close it.


Methodology

masscan -iL <28 cloud /16 CIDRs> -p 19530 --rate 10000
  → 275 masscan hits on :19530

milvus-probe.py (100-thread REST API probe)
  POST /v2/vectordb/collections/list  body {"dbName":"default"}
  match {"code":0, "data":[<collections>]}
  fallback v1: GET /v1/vector/collections - match {"code":200, "data":[...]}
  → 33 confirmed Milvus instances (32 v2, 1 v1)

milvus-deep.py (per-collection schema describe)
  POST /v2/vectordb/collections/describe  body {dbName, collectionName}
  → field schemas for 28 of 33 instances (5 returned empty schemas - likely permission-gated describe)

Findings Summary

| Metric | Value |
|---|---|
| Cloud /16 ranges scanned | 28 (DO/Hetzner/Vultr) |
| Masscan hits on :19530 | 275 |
| Milvus REST confirmed | 33 |
| Unauthenticated | 33 (100%) |
| With non-empty collections | 27 |
| Empty / fresh installs | 6 |

API version split

| Version family | Count |
|---|---|
| v2 (Milvus 2.4+) | 32 |
| v1 (Milvus 2.3.x and earlier) | 1 |

Hosting provider split

| Provider | Confirmed |
|---|---|
| Hetzner | 16 |
| DigitalOcean | 11 |
| Vultr | 6 |
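
The funnel and the three splits above can be cross-checked with a few lines of arithmetic. The sketch below only re-uses figures already stated in this survey; the variable and function names are illustrative and not part of the NuClide pipeline.

```python
# Figures copied from the Summary and the tables above.
RANGES = 28
ADDRESSES_PER_SLASH16 = 2 ** 16  # a /16 holds 65,536 addresses

swept = RANGES * ADDRESSES_PER_SLASH16
masscan_hits = 275
confirmed = 33

# Each split should account for every confirmed instance.
splits = {
    "content": {"non_empty": 27, "empty": 6},
    "api_version": {"v2": 32, "v1": 1},
    "provider": {"Hetzner": 16, "DigitalOcean": 11, "Vultr": 6},
}


def split_totals(tables):
    """Sum the rows of each split table."""
    return {name: sum(rows.values()) for name, rows in tables.items()}


totals = split_totals(splits)
print(f"addresses swept: {swept:,}")
print(f"confirmed / masscan hits: {confirmed / masscan_hits:.1%}")
for name, total in totals.items():
    print(f"{name}: {total} (matches confirmed: {total == confirmed})")
```

The swept total works out to 1,835,008 addresses, which is where the 1.83M figure comes from, and 33 of 275 hits is a 12.0% confirmation rate.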

High-Value Exposures

1. Everos AI Agent Platform: Multi-Tenant Episodic Memory + User Profiles

Host: 167.172.135.156:19530 (DigitalOcean) · v2 · multi-tenant

Collections (6):

| Collection (suffixed with creation timestamp) | Fields |
|---|---|
| t_everos_v1_episodic_memory_* | id, vector, user_id, group_id, session_id, participants, sender_ids, type, timestamp, episode, search_content, parent_type, parent_id, tenant_id |
| t_everos_v1_foresight_record_* | id, vector, user_id, group_id, session_id, participants, sender_ids, type, start_time, end_time, duration_days, parent_type, parent_id, tenant_id |
| t_everos_v1_atomic_fact_record_* | id, vector, user_id, group_id, session_id, participants, timestamp, sender_ids, type, parent_type, parent_id, tenant_id |
| t_everos_v1_user_profile_* | id, vector, user_id, group_id, scenario, memcell_count, item_type, embed_text, tenant_id |
| t_everos_v1_agent_case_* | id, vector, user_id, group_id, session_id, timestamp, task_intent, parent_type, parent_id, tenant_id |
| t_everos_v1_agent_skill_* | id, vector, user_id, group_id, cluster_id, content, maturity_score, confidence |

What’s exposed:

The schema is sophisticated. This looks like a production AI agent platform, and its cognitive architecture is worth describing:

  • Episodic memory: full conversation episodes with participant lists, sender IDs, and parent linkage
  • Foresight records: agent-generated predictions/plans (start/end time, duration)
  • Atomic facts: assertions extracted from conversations
  • User profiles: scenario-keyed, with memcell counts indicating the cognitive primitives stored per user
  • Agent cases: task instances with task_intent
  • Agent skills: learned behaviors with maturity_score + confidence

The tenant_id field, present on five of the six collections (all but agent_skill), points to a multi-tenant SaaS. Collection names are timestamped (20260421060534632549 = April 21, 2026 06:05:34), which suggests that provisioning creates fresh collections per deployment generation.
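
The timestamp reading of the collection suffix can be reproduced in a few lines. This is a minimal sketch, assuming the 20-digit suffix is YYYYMMDDHHMMSS followed by six microsecond digits; the survey itself only decodes the first 14 digits, the time zone is not stated, and `parse_collection_suffix` is a hypothetical helper name.

```python
from datetime import datetime


def parse_collection_suffix(suffix: str) -> datetime:
    """Parse a 20-digit collection suffix into a naive datetime.

    Assumption: 14 digits of date and time plus 6 digits of microseconds.
    """
    return datetime.strptime(suffix, "%Y%m%d%H%M%S%f")


created = parse_collection_suffix("20260421060534632549")
print(created.isoformat())  # 2026-04-21T06:05:34.632549
```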

Risk: A complete AI-agent-platform dump is exposed. An attacker can:

  • Enumerate all users (via user_id field on every collection)
  • Enumerate all tenants (via tenant_id)
  • Read full conversation history per user (episodic_memory.episode + search_content)
  • Read the agent’s predictive plans (foresight_record)
  • Read extracted personal facts (atomic_fact_record)
  • Read learned agent behaviors and confidence levels (agent_skill)

This is the most architecturally complete exposure in the survey. Everos appears to be a real AI-agent SaaS startup, and the collection timestamps suggest a recent production launch.


2. Multi-Tenant “Intelbase” Platform: 25 Tenant Collections

Host: 167.71.232.155:19530 (DigitalOcean) · v2

Collections (25): intelbase_42, intelbase_59, intelbase_74, intelbase_77, intelbase_68, … (25 numbered tenant collections)

Schema: Only id and vector are exposed via describe. Either the full schema requires auth, or the operator’s collection definition is intentionally minimal (vector + ID only, with all metadata in a sibling MongoDB/Postgres).

What’s exposed: A multi-tenant SaaS with sequential tenant IDs starting from low numbers (42, 59, 68, 74, 77; the gaps suggest churn). The “intelbase” naming plus sequential customer IDs suggest a B2B intel/OSINT product in which each customer gets its own collection. Even without payload visibility, the list of customer numbers is itself competitive intelligence: a rival can estimate customer count and growth rate by polling /collections/list periodically.


3. Gulf-Region Legal-Tech RAG Backend: mahkamaty / hakam Collections

Host: 65.109.51.219:19530 (Hetzner) · v1

Collections (10): law_collection, hakam_laws, mahkamaty_prod_new, mahkamaty_prod, hakam, …

What’s exposed: The Arabic naming is distinctive: mahkamaty (محكمتي) translates to “my court” and is a known Saudi e-litigation portal brand, while hakam means “arbitrator”. This appears to be a Gulf-region legal-tech operator running Milvus as the RAG backend for legal document retrieval. The _new suffix on mahkamaty_prod_new suggests active development (a second iteration of the production index).

Risk: Saudi Arabia’s PDPL (Personal Data Protection Law, in force 2024) requires controllers to implement security measures appropriate to data sensitivity. Legal/judicial data is high-sensitivity by any framework. If mahkamaty_prod contains case-related embeddings, the exposure would likely breach PDPL Article 19 (security obligations) and could trigger the breach-notification obligation (Article 20, with a 72-hour window set by the implementing regulations).


4. Midea (Chinese Appliance MFG) Corporate Knowledge Base

Host: 65.108.127.99:19530 (Hetzner) · v2

Collections (4): kb_midea, kb_midea_2, kb_midea__, kb_midea3 (four iterations of the same KB)

Schema fields per collection: id, text, dense, sparse, pdf_id, chunk_index, page_number, folders, metadata

What’s exposed: Hybrid (dense + sparse) embedding indices over PDF documents, with chunk-level and page-level addressing and the folder hierarchy preserved in the folders field. Midea is a $50B+ Chinese household appliance manufacturer. An exposed corporate KB could include internal procedures, product specs, supplier documentation, and M&A diligence material. The four collection iterations suggest ongoing KB experimentation by an internal team.


5. Facial Recognition Doxing Primitive: psos + onlyfans (1.21M Face Embeddings)

Host: 65.108.107.240:19530 (Hetzner FI) · v2

Operator: tweet-optimize.com (per port-80 HTTP 301)

Collections (2):

| Collection | Count |
|---|---|
| onlyfans | 897,111 |
| psos | 313,066 |

Schema: id, mongo_id, image_id, embedding, bbox1, bbox2, bbox3, bbox4

This is the most impactful finding in the survey: 1.21M facial embeddings across two collections (897,111 in onlyfans and 313,066 in psos), exposed unauthenticated. The Milvus /v2/vectordb/entities/search endpoint accepts a face vector and returns nearest-neighbor matches, which makes the index a functional doxing primitive. An attacker with a target’s photo can compute a comparable face embedding locally and query the operator’s index to find which OnlyFans accounts the person appears on.

Full writeup with operator-attributed disclosure path, embedding-space attack details, and sibling-MongoDB analysis: multi-tweet-optimize-facial-recognition.md


6. Multi-Tenant UUID-Named Collections (38 tenants)

Host: 168.119.242.46:19530 (Hetzner) · v1

Collections (38): _3f2db519_1d10_447a_9c51_ed52a3fa1790, _06dc1637_2490_4a69_847d_e7f1ae4bc89f, … (UUID-prefixed)

What’s exposed: UUID-named collections imply auto-generated tenant IDs in a multi-tenant SaaS. The leading underscore works around a Milvus naming rule: collection names must begin with a letter or an underscore, so a UUID that starts with a digit needs a prefix. 38 tenants are visible. Schema describe returned empty for all of them, suggesting RBAC may partially apply (list visible, describe gated), but the tenant count is itself competitive intelligence.


7. Other Notable Exposures (one-line)

| Host | Highlight |
|---|---|
| 144.202.73.63 | 11 collections including meeting_memory, voice_registry, faq, cities_ingest, information_faq_collect (voice + meeting AI agent stack) |
| 45.63.7.3 | investigator, organization, study, documents, fq (research/investigation platform) |
| 188.166.229.136 | records, functions, knowledge_base (generic LLM tool-use stack) |
| 135.181.221.152 | product_name, product_description, product_taxonomy (ecommerce product RAG) |
| 135.181.252.66 | experience_memory, mem0migrations, all, all_v3 (Mem0-on-Milvus deployment) |
| 165.227.8.44 | psk_saree_finder_v3-v6 (Indian saree garment finder, 4 versions) |
| 46.101.105.165 | policy_chunks_v1 (policy/legal document RAG) |
| 45.76.248.64 | image_database, image_database_backup, image_database_backup_2 (image vector DB with backups co-located) |
| 159.69.184.136 | wdw_prod_hybrid, prod_hybrid (production hybrid indices) |
| 167.172.46.101 | termex_ip_cosine (IP/term-extraction cosine index) |
| 65.108.226.74 | teamboost_tasks_title, teamboost_tasks_description (task management AI) |
| 65.108.76.202 | image_embeddings, image_embeddings_2 (duplicated image embeddings) |
| 159.69.87.49 | rostros (Spanish for “faces”; possibly facial recognition) |
| 159.203.45.150 | dev_cost_guard (internal cost-guarding tool) |
| 116.202.108.128 | skynet_test (interesting name choice for production) |
| 168.119.102.222 | legal_acts_e5_large (legal corpus with E5-large embeddings) |
| 165.227.182.149 | documents_chunks (generic chunked document RAG) |
| 45.76.114.69 | Vector_index_<uuid>_Node (LlamaIndex-style indexed nodes) |
| 168.119.141.25 | screenshots (likely OCR’d UI screenshots) |
| 168.119.229.126 | llamacollection (LlamaIndex) |
| 95.179.181.104 | image_database_gpu/pq/cpu/_ (image embeddings with index variants: GPU vs CPU vs PQ) |

Root Cause: Default-Off RBAC

Milvus ships with RBAC disabled by default. Enabling it requires:

# milvus.yaml
common:
  security:
    authorizationEnabled: true

# Then rotate the built-in root password and create scoped users and roles, e.g.:
# milvus_cli> create user -u <user> -p <password>
# milvus_cli> create role -r ro_role
# milvus_cli> grant role -r ro_role -u <user>

The authorizationEnabled: true flag must be set explicitly. The Milvus quickstart Docker Compose and the Helm chart defaults both leave it false. None of the 33 confirmed instances had it enabled: /v2/vectordb/collections/list returned code: 0 (success) with no token requirement.
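
Operators can lint their own configuration for this flag before deployment. The sketch below is a deliberately small, indentation-based reader that only understands nested mappings like the snippet above; it is not a full YAML parser, and `authorization_enabled` is a hypothetical helper name. A real pipeline would more likely use a proper YAML library.

```python
def authorization_enabled(yaml_text: str) -> bool:
    """Return True only if common.security.authorizationEnabled is true.

    Handles simple nested "key: value" mappings; comments after '#' are dropped.
    """
    path = []  # (indent, key) pairs for the mappings enclosing the current line
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip() or ":" not in line:
            continue
        indent = len(line) - len(line.lstrip())
        key, _, value = line.strip().partition(":")
        # Leave any mapping that is at the same depth or deeper than this line.
        while path and path[-1][0] >= indent:
            path.pop()
        path.append((indent, key.strip()))
        if [k for _, k in path] == ["common", "security", "authorizationEnabled"]:
            return value.strip().lower() == "true"
    return False  # a missing key means the default applies: RBAC off


hardened = """
# milvus.yaml
common:
  security:
    authorizationEnabled: true
"""
print(authorization_enabled(hardened))            # True
print(authorization_enabled("common:\n  foo: 1"))  # False
```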

The Milvus 2.4 REST API documentation does describe the auth header (Authorization: Bearer <token>), but because security is disabled by default, the header is ignored when RBAC is off. This is the same pattern as Qdrant and ChromaDB.


Cross-Survey Pattern: Vector DB Auth Posture

| Platform | Sample | Unauthenticated | Default | Survey |
|---|---|---|---|---|
| Qdrant | 61 | 100% | auth-off | qdrant-cloud-survey-2026-05.md |
| ChromaDB | 48 | 100% | auth-off | chromadb-cloud-survey-2026-05.md |
| Milvus | 33 | 100% | RBAC-off | this file |
| Elasticsearch | 42 | mixed | auth-off in 7.x | elasticsearch-cloud-survey-2026-05.md |
| Flowise | 43 | 0% | auth-on (since CVE-2024-36420) | flowise-cloud-survey-2026-05.md |
| n8n | 1,006 | 0% | auth-on (since v0.166.0) | n8n-cloud-survey-2026-05.md |
| Jupyter | 18 (univ) | 0% | PAM/LDAP standard | jupyter-survey-2026-05.md |

Observation: Three independent vector DB vendors (Qdrant, ChromaDB, Milvus), with three independent codebases and three independent leadership teams, all ship the same default. The pattern is not a per-vendor oversight; it is the cultural inheritance of “vector DB is local development infrastructure” that has not yet adapted to “vector DB is production multi-tenant data layer.”


Remediation

# milvus.yaml
common:
  security:
    authorizationEnabled: true

After enabling, change the default root password, then create non-root users with read-only or read-write roles per application. Firewall port 19530 so that only the application backend can reach it. If the Milvus metrics port (9091), MinIO object storage (9000), or etcd (2379) are reachable on the same host, firewall those too. Milvus is a multi-component system, and closing the proxy port alone leaves the metadata and storage layers accessible.
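
After applying the change, an operator can confirm that their own instance no longer answers anonymously. The sketch below re-uses the collections-list request and the code: 0 success match from the Methodology section. The function names and the localhost default are illustrative, and the exact error body Milvus returns once RBAC is on is not assumed: anything other than an anonymous success is treated as rejected.

```python
import json
import urllib.error
import urllib.request


def answers_without_auth(body: dict) -> bool:
    """True if a v2 collections/list body signals success (code 0 plus a data list)."""
    return body.get("code") == 0 and isinstance(body.get("data"), list)


def list_collections(base_url: str = "http://127.0.0.1:19530",
                     timeout: float = 5.0) -> dict:
    """POST an anonymous collections/list request to an instance you operate."""
    req = urllib.request.Request(
        base_url + "/v2/vectordb/collections/list",
        data=json.dumps({"dbName": "default"}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:   # rejected at the HTTP layer
        return {"code": err.code, "message": str(err.reason)}
    except urllib.error.URLError as err:    # unreachable, e.g. firewalled
        return {"code": None, "message": str(err.reason)}


# Example, run against your own deployment only:
# body = list_collections()
# print("STILL OPEN" if answers_without_auth(body) else "anonymous request rejected")
```

Running the same check from outside the application network also confirms the firewall rule: a connection failure there is the desired outcome.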


NuClide Pipeline Artifacts

| Stage | Tool | Output |
|---|---|---|
| Discovery | masscan + custom REST probe | /tmp/milvus-confirmed.jsonl (33 instances) |
| Schema enumeration | custom REST describe probe | /tmp/milvus-deep.jsonl (per-collection schemas) |
| Findings ledger | VisorLog | To be ingested into data/nuclide.db |
| Compliance scoring | VisorScuba | Will fail AI.C1 (unauth-baseline) for all 33 |
| Adversarial corpus | VisorCorpus | Existing chromadb adversarial corpus applies; categories: kb_exfiltration, tenant_cross_leak, prompt_injection transfer |

References