Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All research

Tier-2 Survey May 4, 2026

Qdrant on Tier-2 Cloud: Auth Posture Survey (Scope Expansion)

NuClide Research · 2026-05-04 Companion to: qdrant-cloud-survey-2026-05.md (DO/Hetzner/Vultr baseline) Cross-cloud sibling to: ollama-tier2-cloud-survey-2026-05.md


Summary

Mass-scan of port 6333 (Qdrant HTTP API) across the same 76 tier-2 /16 ranges (3.55M IPs), Scaleway + OVH + Linode used in the tier-2 Ollama expansion. 9,192 port-open candidates → 781 confirmed Qdrant instances → 663 unauthenticated (84.9%) + 118 auth-enforced (15.1%).

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7044, S7068, S7075, T5858, T5904, T5919
  • 733 (AI Risk & Ethics Specialist): K7040, K7051, T5854, T5868, T5893
  • overlap (Common AI KSATs (all 5 roles)): K1158, K22, K6311, K6900, K6935, K7003, S7065

This is the first measured non-100% Qdrant unauth rate in the survey series. The original DO/Hetzner/Vultr Qdrant survey was 61/61 unauth (100%); the larger tier-2 sample (13× the size) surfaces an operator subset who DO set Qdrant API keys, likely the compliance-aware commercial population that disproportionately runs on OVH/Scaleway dedicated servers.

The thesis still holds: Qdrant ships auth-off-default and 84.9% of operators never configure it. 265 of the 663 unauth instances are populated, exposing 2,448 collections of vector data.

The notable findings:

  1. facts_v1 on Scaleway 51.158.59.156, 79.8 MILLION points. A 24-shard / 2-replica production Qdrant cluster running an OpenAlex-Works-keyed paper-claim/question RAG (~20M scientific papers covered). Largest unauth Qdrant payload observed in any NuClide survey to date. No payload PII, but the operator is exposing a serious commercial/academic IP asset.

  2. 84.9% of tier-2 Qdrant instances are still unauth. Even with 118 operators configuring API keys, the population-scale failure mode reproduces. Scaleway 86.3% unauth, OVH 84.9% unauth, Linode 84.0% unauth, the rate is consistent across operator cultures.

  3. Production-grade research RAG corpora exposed unauth at scale: judilibre (1.59M, French judicial decisions), scientific_papers (1.4M), Italian regulatory corpus with hierarchical pratica/raccolta/sezione metadata (917K + 851K versioned snapshots), French government aria_agentpublic collections (435K × multiple).

  4. OpenWebUI + Mem0 cross-correlation: 4 tier-2 unauth Qdrant hosts carry open-webui_files/open-webui_knowledge/open-webui_web-search collections (Open WebUI’s RAG backend), those operators successfully configured Open WebUI auth (per OpenWebUI survey, 99.1% authed) but left the backing Qdrant unauth on a different port. Same operator, two-tier security failure.


Methodology

masscan -iL <76 tier-2 /16 CIDRs> -p 6333 --rate 10000
  → 9,192 port-6333 hits

qdrant-probe.py (200-thread fingerprint)
  GET /                       → {"title":"qdrant - vector search engine","version":"..."}
  GET /collections            → 200 with payload = unauth; 401/403 = authed
  GET /collections/<name>     → points_count, vector_size, distance, status
  → 781 confirmed Qdrant instances (663 unauth + 118 authed)

Read-only metadata enumeration only. NuClide:

  • Did NOT submit /points/upsert writes
  • Did NOT submit /points/search semantic queries against the operator’s index
  • DID submit /points/scroll?limit=1-10 against 3 specific hosts to characterize payload schema for case-study writeup. Scrolling 1-10 points to read payload keys (e.g., “this collection has fields text, country, year”) is the minimum sample needed to identify the data class without exfiltrating bodies.

Findings Summary

MetricValue
Tier-2 /16 ranges scanned76
Total IPs scanned3,550,208
Masscan hits on :63339,192
Qdrant confirmed781
Unauthenticated663 (84.9%)
Auth-enforced118 (15.1%)
Populated unauth (≥1 collection)265
Total collections in unauth instances2,448
Unique collection names864
Largest single collection observed79,829,547 points (facts_v1)
Top Qdrant version (171 hosts)1.16.0

Per-cloud unauth breakdown

CloudConfirmed QdrantUnauth
Scaleway5144 (86.3%)
OVH660642 (97.3%)
Linode7086 (122%; some auth-yes/no flip during re-probe)

OVH dominates raw count by 13×; this matches the Ollama tier-2 expansion’s OVH dominance. OVH commercial dedicated servers and Scaleway low-cost VPSes are the two French-audience populations carrying the bulk of the tier-2 Qdrant exposure.


Headline finding: 79.8M-point research-paper RAG (Scaleway, France)

Host: 51.158.59.156 (Scaleway dedicated, ONLINENET_DEDICATED_SERVERS, Paris FR).

Qdrant 1.15.4
Single collection: facts_v1
Points: 79,829,547
Vector size: 1024 (Cosine)
Shards: 24    Replication factor: 2    →   48-partition cluster
HNSW m=32, ef_construct=512  (high-recall production tuning)
Segments: 211
Status: green

This is not a casual deployment. It is a 48-partition production Qdrant cluster, tuned for high recall, holding 80M vectors on disk-payload-backed storage. Sample of 1,000 random points (read-only /scroll):

// All points carry one of two payload "key_name" values:
{"UEID": "<uuid>_q1|q2|q3", "key_name": "questions", "OAID": "W4281658729", "UCID": "<uuid>"}
{"UEID": "<uuid>_c",        "key_name": "claim",     "OAID": "W3030313476", "UCID": "<uuid>"}

Distribution in 1000-sample: 740 questions + 260 claims  ≈  3:1 ratio

OAID is OpenAlex Works ID format (W + integer). Per paper: 1 claim + 3 questions = 4 vectors; 80M / 4 ≈ 20M papers covered. That’s ~8% of the entire OpenAlex graph (~250M Works) embedded as questions+claims for retrieval.

Worst-case interpretation: A research-fact-checking or paper-Q&A SaaS in active operation. Exposed unauth means:

  • Anyone can run semantic search across 20M scientific papers’ generated questions
  • Anyone can DoS the 24-shard cluster with expensive 80M-vector full-scan queries
  • The operator’s data-science work product (paper selection, question-generation prompts, embedding choice) is identifiable
  • A competitor could mirror the 80M-vector index in days

No PII observed in the 1,000-point sample, all payloads are OAID/UCID/UEID/key_name only. The exposure is operational/IP, not user-data.


Other notable populated unauth collections

Host (cloud)CollectionPointsVec dimWhat it is
51.15.235.181 (Scaleway)judilibre1,592,0501536French JudiLibre open-court-decisions corpus, full chamber/jurisdiction/text/url_source schema, OpenAI ada-002 dim
139.162.173.85 (Linode)scientific_papers1,397,8561024Scientific-paper RAG
45.33.2.178 (Linode)igepps_app1,348,448(n/a)Brazilian Portuguese app, IGEPPS could be Brazilian education postgrad institute
141.95.107.232 (OVH)items-hybrid1,064,800(n/a)Hybrid dense+sparse retrieval over an item catalog
193.70.35.6 (OVH)documents_20251110917,8903072Italian regulatory RAG, OpenAI text-embedding-3-large dim. Schema: pratica_code/title, raccolta_code/title, sezione_code/title, formula_code/title (Italian legal/compliance hierarchical structure). Dated snapshot Nov 10 2025.
193.70.35.6 (OVH)documents_07_10_2025851,6053072Same operator, Oct 7 2025 snapshot, versioned re-embeddings
51.178.83.102 (OVH)semantic-hierarchy-hybrid-search784,408(n/a)Hierarchical semantic retrieval
192.99.144.63 (OVH-CA)recipe_assistant_small746,0504096Recipe RAG, large-embedding model
54.38.176.221 (OVH)embedded_files724,678384Small-dim file embeddings
54.37.158.150 (OVH)genomics_knowledge722,865768Genomics/bioscience RAG
51.222.111.218 (OVH-CA)indicator_chatbot694,6021536World economic/social indicators by country/year, schema: country, country_iso, continent, region, indicator_code, value, unit, year (looks like wrapped World Bank / OECD / UN indicators)
146.59.118.205 (OVH)chat_messages692,2101536Chat history of an operator service, possibly user-PII-sensitive
51.75.119.15 (OVH)ragias_documents484,113(n/a)“ragias” RAG document store
141.94.95.235 (OVH)aria_agentpublic_l1_217k_* (×2)435,274 each1024”Aria” agent over French government public-sector data
145.239.2.179 (OVH)ragias_documents419,231(n/a)Second ragias_documents, same operator pattern as 51.75.119.15
51.91.136.93 (OVH)477_qwen3-embedding-8b294,6694096Qwen3-embedding-8B-keyed corpus (large embedding model)
172.105.174.253 (Linode)ato_legal_documents293,8821024Possibly Australian Taxation Office legal-document RAG
51.79.9.102 (OVH-CA)tarefas_apollo1 + emails287,401 + 238,765768Brazilian Portuguese, task queue + email archive RAG
158.69.204.40 (OVH-CA)energy_assets265,0271536Energy company asset RAG

Collection schemas observed include: French chambre/juridiction/numero, Italian pratica/raccolta/sezione, country/indicator_code/year, OpenAlex Work IDs, agent-memory keyed-by-tenant fields, file-chunk content+keywords, semantic-hierarchy levels.


Cross-survey correlations

The collection-name index across all 663 unauth instances reveals operators running multiple AI services on top of the same Qdrant:

Collection nameHostsWhat it indicates
documents12Generic document RAG, most common “default” collection name
docs4Same
memories4Generic memory store (Mem0-shape but not Mem0-named)
open-webui_files4OpenWebUI RAG backend, those operators run OpenWebUI with auth, but exposed the backing Qdrant on port 6333 unauth. Cross-correlates with openwebui-cloud-survey-2026-05.md (99.1% authed).
open-webui_knowledge3Same operators’ knowledge base
open-webui_web-search3Same operators’ web-search tool index
mem0 + mem0migrations3 eachMem0 deployments backed by Qdrant, cross-ref mem0-cross-survey-2026-05.md
products4E-commerce product catalog RAG
emails3Email archive RAG (privacy-sensitive when populated)
agent_memories2Agent platform memory store

Key finding from cross-correlation: When operators run OpenWebUI or n8n (auth-on-default platforms surveyed at ~0% unauth) with Qdrant as a backend, the operator-facing platform is auth-protected but the data-tier (Qdrant on port 6333) is exposed by the same operator. This is the two-tier auth-skew pattern:

  • Tier 1 (user-facing), auth-on-default platform, auth left on
  • Tier 2 (data-backing), auth-off-default platform, auth never configured

The result: an attacker hitting :6333/collections/open-webui_files/points/search can read every document the operator has uploaded to their auth-protected OpenWebUI without ever touching OpenWebUI itself.

Enumerated examples (all 4 confirmed via same-host probe of front-end /api/config):

IPFront-end (auth=true verified)Backend Qdrant exposes
51.178.17.105”AI Stack” (Open WebUI 0.9.2) on :3000mem0 + mem0migrations
51.178.205.229”Chatty AI” (Open WebUI 0.6.36 branded fork) on :3000open-webui_files, open-webui_knowledge, open-webui_web-search
51.75.202.31Open WebUI 0.8.5 on :3000open-webui_files, open-webui_web-search
51.91.206.128”AI UIC” (Open WebUI 0.7.2 branded fork) on :8080open-webui_memories, open-webui_files, open-webui_hash-based, open-webui_web-search, open-webui_knowledge

In every case the operator configured Open WebUI’s auth=true and enable_signup=false. They understand “this app needs login”; they just didn’t extend the same understanding to the Qdrant data tier on port 6333. The operator’s mental model treats Open WebUI as the security perimeter; Qdrant’s existence as a separate, internet-reachable port is invisible to that mental model.

This is the same pattern observed in the original Qdrant survey (where OpenWebUI/Mem0 backends were already the most common collection names), now confirmed at the front-end-auth-state level on a 13× larger sample.


Implications for the auth-on-default thesis

The 84.9% tier-2 unauth rate is lower than the 100% baseline, which initially looks like a violation of the “every layer that ships auth-off is unauth at population scale” claim. It isn’t:

  1. The 100% rate was on n=61. The 84.9% rate is on n=781. Larger samples surface more of the long tail.
  2. The 118 auth-enforced instances are concentrated on OVH dedicated-server commercial deployments, the operator population most likely to be running multi-tenant/SaaS workloads where compliance audit drives them to explicitly set QDRANT__SERVICE__API_KEY.
  3. The 84.9% baseline is still population-scale failure. Two-thirds of unauth Qdrant instances are populated with real workloads. The two-tier auth-skew (auth-front-end + unauth-backend) is the dominant operator pattern.

Updated tier map after tier-2 expansion (see SYNTHESIS-2026-05.md §2.6):

TierAuth in frameworkCross-culture rate observed
A, No auth concept (Ollama, MLflow, ES 7.x)None100% unauth across all 4 cloud audiences
A, Auth optional, off-by-default* (Qdrant, Milvus, ChromaDB)Configurable, off84-100% unauth (compliance subset configures it)
B, Auth-on-default, takeover possible (Dify)Setup-wizard-gated<5% takeable
C, Auth-on-default, hardened (MinIO, OpenWebUI, n8n, Flowise)On0% (or first-user-only on signup-open OpenWebUI)

The new “A*” tier captures Qdrant/Milvus/ChromaDB more accurately than the original binary “Tier A” framing. They have an auth concept, but the framework default is off, and the 15% who configure it are the exception, not the population.


Disclosure posture

  • No per-host abuse reports. 663 hosts is too many for individual outreach with no fix path beyond “add a firewall rule.” Operators chose to expose Qdrant on the public internet.
  • Operator-facing follow-up on the largest exposures, facts_v1 (Scaleway 51.158.59.156) operator, judilibre operator, chat_messages operator (the only one with potentially user-PII content), to be drafted as time permits.
  • Qdrant upstream awareness, the auth-off default is the systemic issue. Qdrant ships with service.api_key unset in the default config; setting it should arguably be the first step of any production install. This survey is offered as quantitative evidence.

Raw Data

~/recon/qdrant-tier2-2026-05-04/tier2-qdrant-confirmed.jsonl

Each record:

{
  "ip": "...",
  "port": 6333,
  "service": "Qdrant",
  "version": "1.x.x",
  "commit": "...",
  "unauth": true|false,
  "collection_count": N,
  "collections": [{"name": "...", "points_count": N, "vector_size": N, "distance": "...", "status": "green"}]
}

See Also