Commercial AI Infrastructure Exposures, NuClide Research

NuClide Research, ongoing · Updated 2026-05-04

Commercial / SaaS Ollama and AI infrastructure exposures discovered during OSINT sweeps. These differ from university and research-network exposures in that the operators are commercial entities with paying customers and PII pipelines.

2026-05 cross-survey synthesis: SYNTHESIS-2026-05.md pulls together all 18+ platform surveys (~5,200 confirmed deployments) into one analysis covering a tier-by-tier auth-posture comparison, a root-cause taxonomy, a threat-class taxonomy, and cross-survey operator correlations.

For operators who find their IP in a survey paper: REMEDIATION-GUIDE.md gives the one-line config fix for each affected platform (Qdrant, ChromaDB, Milvus, Ollama, MLflow, vLLM, Streamlit, Open WebUI, MinIO).

Future-work roadmap: FUTURE-SURVEYS.md catalogues 30+ AI/ML platform classes not yet surveyed (Ray Dashboard, ComfyUI, Weaviate, pgvector, Langfuse, W&B self-hosted, ClearML, AutoGen Studio, ClickHouse, ROS, NVIDIA Clara, etc.) with port + fingerprint + risk-class for each. Anyone can pick a category and run the survey using the documented methodology template.
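As a rough illustration of what a "port + fingerprint + risk-class" entry could look like in code, here is a minimal Python sketch. The `SurveyTarget` class, its field names, and the risk-class label are hypothetical and are not taken from FUTURE-SURVEYS.md; the port and path in the sample entry are borrowed from the ComfyUI survey row in the Cross-Provider Surveys table, and the host is a documentation-range address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyTarget:
    """One hypothetical catalogue entry: platform class, port, fingerprint, risk class."""
    platform: str
    port: int
    fingerprint_path: str  # HTTP path whose response identifies the platform
    risk_class: str        # free-form label; the real catalogue's vocabulary is not shown here

    def probe_url(self, host: str) -> str:
        """Build the plain-HTTP URL a survey would request for this target."""
        return f"http://{host}:{self.port}{self.fingerprint_path}"


# Illustrative entry only: port and path come from the ComfyUI survey row,
# the risk-class label is a placeholder.
COMFYUI = SurveyTarget("ComfyUI", 8188, "/system_stats", "compute-theft")

print(COMFYUI.probe_url("203.0.113.10"))
```

Keeping each entry immutable and deriving the probe URL from it means the same record can drive both the scan and the write-up, which is one plausible reading of a "methodology template".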

Confirmed Findings

File	Operator	Country	Severity	Key Finding
FR-emails-pro-rdv-bot.md	emails-pro.fr (hosted on Romanian ICI IP space)	France / Romania	CRITICAL	Production French commercial appointment-booking SaaS, full system prompt + PII collection schema + function-call format exposed
TR-sanctionscanner-aml-kyc.md	sanctionscanner.com (168.119.90.62, Hetzner DE)	Turkey / Germany	CRITICAL	AML/KYC compliance SaaS, 79M KYB records + 6.2M individual sanctions list entries unauth; active ransom compromise; disclosed 2026-05-03
VN-watzis-ai-pii-memory.md	Watzis / Calmio AI assistant (149.28.77.155, Vultr)	Vietnam	HIGH	Vietnamese AI assistant, Mem0 long-term memory store unauth; citizen ID card + VND wallet + student PII in plaintext; multiple users confirmed
multi-pingu-trading-ai.md	Unknown operator (45.76.20.46, Vultr)	Unknown	HIGH	Pingu crypto trading AI + Nova molecular optimization, 25 Qdrant collections unauth; live trade PnL, full LLM reasoning traces, competition leaderboard
multi-legal-compliance-investigation.md	Unknown operator (167.172.120.218, DigitalOcean)	Unknown	CRITICAL (if populated)	Legal/compliance investigation platform schema exposed unauth (`investigation_data`, `case_drafts`, `attachments` collections); empty at probe time; flagged for re-probe
multi-auto-fi-sales-training.md	Unknown operator (104.131.60.234, DigitalOcean)	Unknown (Sean McNally methodology)	HIGH	Auto F&I sales training RAG, real customer dialogues with names + vehicles + dollar figures, Sean McNally methodology IP, 1,608 docs unauth ChromaDB
multi-crypto-agent-user-memory.md	Unknown operator (159.203.117.193, DigitalOcean)	Spanish-language LatAm/Spain	HIGH	Crypto investment agent, per-user financial profiles ($50K targets, exchange affinity, asset allocation) in user_memory_ collections; 12 collections, 15.9K docs unauth
multi-holamoda-multitenant.md	HolaModa + Delta701 (46.101.118.246, DigitalOcean)	Unknown (Mexican/Spanish?)	CRITICAL	Multi-tenant fashion retail RAG, 2 tenants + dev/prod co-located on one ChromaDB; 1.53M docs across 7 collections unauth; Vertex AI text-embedding-gecko
multi-personal-diary-corpus.md	Unknown Prisma SaaS (188.166.71.44, DigitalOcean)	Belgium/France inferred	HIGH	Multi-tenant document SaaS, Prisma CUID per-user collections expose personal alcohol-cessation diary (GDPR Art. 9), theater scripts with author emails + Belgian phones, public-domain texts
multi-tweet-optimize-facial-recognition.md	tweet-optimize.com (65.108.107.240, Hetzner FI)	Finland (Hetzner DC)	CRITICAL	1.21M face embeddings unauth on Milvus, onlyfans (897K) + psos (313K) collections with bbox + mongo_id refs. Worst-case interpretation: a doxing-as-a-service backend exposed on the public internet via unauth `/entities/search`
langfuse-cross-survey-2026-05-06.md	Unistart Hubs / Pharos AI Assistant (135.181.252.66, Hetzner DE)	Greece	CRITICAL	Four-platform AI-stack catastrophe on one host, Langfuse v3.73.1 with `signUpDisabled:false` on port 3001 (anyone registers, reads all LLM traces) + Mem0/Milvus unauth on 19530 (existing finding) + Attu admin GUI on 3000 + `CLIENT_SECRET` literally hardcoded in `/env.js` of the Pharos webapp on 8080. Surfaced via cross-survey-correlation probe of 723 ledger IPs (Methodology Insight #9) when Shodan API was unavailable, full chain ran in <5 minutes from a single anonymous probe
elasticsearch-cloud-survey-2026-05.md	sanctionscanner.com (168.119.90.62, Hetzner DE)	Turkey / Germany	CRITICAL	AML/KYC compliance SaaS, 79M KYB records + 6.2M individual sanctions list entries unauth; active ransom compromise; disclosed 2026-05-03
qdrant-cloud-survey-2026-05.md	Multiple operators	Various	HIGH	61/61 Qdrant instances unauth across DO/Hetzner/Vultr, crypto trading AI, Vietnamese PII in agent memory, internal SOPs, legal compliance platform

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, T5858, T5904
733 (AI Risk & Ethics Specialist): K7040, K7051, S7067, S7069, T5854, T5868, T5893
overlap (Common AI KSATs (all 5 roles)): K108, K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K942, S7065

Cross-Provider Surveys

Aggregate auth-posture studies across cloud-hosting providers (DigitalOcean, Hetzner, Vultr, etc.) for specific platform classes.

File	Platform	Sample	Result
flowise-cloud-survey-2026-05.md	Flowise	43 instances across DO/Hetzner/Vultr	0 unauthenticated, operator hygiene post-CVE-2024-36420 has improved on cloud platforms
n8n-cloud-survey-2026-05.md	n8n	1,006 instances across DO/Hetzner/Vultr	0 unauthenticated, mandatory auth since v0.166.0 fully adopted on cloud platforms
jupyter-survey-2026-05.md	Jupyter / JupyterHub	18 confirmed university instances (Berkeley, ETH, Cambridge, NTU, INHA, NCCU)	0 unauthenticated, JupyterHub PAM/LDAP auth standard across all surveyed institutions
qdrant-cloud-survey-2026-05.md	Qdrant	61 instances across DO/Hetzner/Vultr	100% unauthenticated, ships auth-off by default; 48/61 contain live data
chromadb-cloud-survey-2026-05.md	ChromaDB	48 instances across DO/Hetzner/Vultr	100% unauthenticated, ships auth-off by default; 22/48 populated; 2.67M documents total exposed
chromadb-tier2-cloud-survey-2026-05.md	ChromaDB (tier-2 expansion)	44 instances across Scaleway/OVH/Linode (3.55M IPs)	100% unauth (combined cross-survey total: 92 ChromaDB instances, 100% unauth); 23 populated. Branded enterprise tenants visible: STIHL (German power-tools via RaptorCX integrator), AXA Insurance (`rag_axa`), Mitsubishi, Daikin. Government/regulatory: Indonesian OJK financial regulator + UU PDP data-privacy law, Hilversum Dutch municipality. Healthcare: `oncology`, `patient_info_embeddings`, `larvol_kol` pharma KOL data
speech-audio-cloud-survey-2026-05.md	Speech & Audio AI (whisper-asr-webservice + faster-whisper-server, port 9000)	6 confirmed instances across tier-2 cloud (3.55M IPs, AS63949 honeypot pollution filtered)	100% unauth, Tier-A “no auth concept” reproduces on a new platform class. 2 of 6 are dual-stack with unauth Ollama on the same host, operators building “local AI swiss army knives” (one host runs faster-whisper-large-v3 + Ollama with Qwen3-235B + minimax-m2.7:cloud billing-target). 3 hosts run whisper-asr-webservice 1.9.1, 3 run faster-whisper-server with OpenAI-compat audio API. Compute-theft + adversarial-transcription + model-disk-write threat classes
comfyui-cloud-survey-2026-05.md	ComfyUI image-gen workflow tool (port 8188)	6 confirmed across tier-2 + Hetzner (5.25M IPs)	100% unauth, Tier-A no-auth-concept. Exposes `/system_stats` (GPU topology), `/queue` (jobs), `/history` (full workflow JSON + prompts + output filenames), `/object_info` (custom-node loadout), `POST /prompt` (compute theft), `POST /upload/image` (disk-fill). 385 GB total VRAM exposed including a NVIDIA RTX PRO 6000 Blackwell Max-Q (~$10K workstation card) on a single host. One operator identified: bonivivre.fr (French SaaS). Threat classes: GPU-hour theft, workflow + prompt + output exfil, adversarial workflow injection, disk-fill
observability-cloud-survey-2026-05.md	LLM observability + ML training telemetry (port 6006: Phoenix Arize + TensorBoard)	9 confirmed across tier-2 (3.55M IPs, 38 non-AI port-6006 services filtered)	100% unauth, Tier-A. 6 Phoenix (LLM trace platform) + 3 TensorBoard. Headline: active Stable Diffusion 1.5 + SDXL distillation + LoRA fine-tuning research workflow exposed on `51.159.189.219` (Scaleway), full PyTorch Lightning logs. 2 Phoenix hosts run `made-doc-analysis-llm-app` project (operator’s prod + staging). Threat classes: LLM trace exfil, project-name disclosure, training-loss-curve exfil, hyperparameter-sweep history, sometimes training-data samples in TensorBoard summaries
mcp-cloud-survey-2026-05.md	Model Context Protocol servers (Anthropic protocol, JSON-RPC over HTTP+SSE; ports 3000/8000/8080/8888)	95 confirmed cross-cloud (Scaleway 9 + Linode 4 + OVH 82 across 1,017 prefixes / ~6.33M IPs)	70.5% empty `tools/list` (auth-gated or stub), 29.5% with real exposed tool surfaces. Headline findings: fully-exposed Gmail mailbox MCP (19-tool send/read/delete CRUD on operator’s own Gmail); Alcy CRM Simple (22-tool French facility-management CRUD with create/patch on tickets/work-orders/interventions); rmcp Elasticsearch MCP proxy; hindsight-mcp v3.1.1 personal-AI-memory CRUD (29 tools incl. `clear_memories`, `delete_bank`); 3× Casdoor IAM-CRUD across providers (recurring template-auth-off pattern); Brazilian legal RAG with TCE-ES state-audit data; 6× Netdata sandboxed-but-unauth telemetry. Protocol-shape gate (strict JSON-RPC `initialize`) filtered honeypot pollution to 1.1% on Linode (vs 91.6% on prior Milvus survey). Pattern synthesis: single-operator catastrophic exposures, fleet-deployed open-source templates with auth-off-default, IAM-platform MCP wrappers as recurring high-risk class
llm-gateways-cloud-survey-2026-05.md	LLM Gateways / OpenAI-compat proxies (LiteLLM / LM Studio / Jan AI / oobabooga / OneAPI / generic; ports 1234/1337/3000/4000/5000/8080)	1,899 confirmed unauth cross-cloud (1,448 generic OpenAI-compat + 318 LM Studio + 126 Jan AI / Cortex + 7 LiteLLM Proxy)	97.8% (1,857) returned functional inference unauth when probed with single-token disclosure-PoC, operator quota actively billed. Provider-key inventory: 1,835 OpenAI-burnable / 2 Anthropic-burnable / Google / OpenRouter / Mistral / DeepSeek / MiniMax / xAI / Moonshot / Zhipu / Alibaba / Windsurf. 1,829 hosts (98.5% of burnable) ran the same canned-response template, single open-source proxy mass-deployed auth-off across operators. Aggregate ~$0.011 of operator quota consumed total (37,497 tokens, ~$0.000006 per host) by the methodology probe; no key strings extracted. Highlight finding: `172.235.117.122:4000` returned 56 Anthropic tokens unauth on `claude-4.5-haiku`. Extends vLLM survey’s 10-reseller-proxy finding by ~180× at the gateway-product tier
milvus-cloud-survey-2026-05.md	Milvus	33 instances across DO/Hetzner/Vultr	100% unauthenticated, RBAC opt-in; 27/33 populated; multi-tenant Everos AI agent platform, Saudi legal RAG, Midea KB, image+facial pipelines
triton-cloud-survey-2026-05.md	NVIDIA Triton Inference Server	2 instances on DO	100% unauthenticated, chat-safety pipeline w/ 127M-inference minor-detection classifier (159.203.42.211), workplace-surveillance YOLOv8 pipeline (178.62.225.198)
vllm-cloud-survey-2026-05.md	vLLM / OpenAI-compatible LLM servers	44 instances across DO/Hetzner/Vultr	100% unauthenticated, 19 vLLM + 25 generic; 10 commercial-API reseller proxies (Grok2API, Kiro-Go, AgentBar) burning operator credits on every external prompt; sipgate + Infomaniak proprietary fine-tunes attributable; Llama-3.3-70B-AMD, gpt-oss-120b, Qwen3-235B + Kimi-K2.6 clusters, Pixtral-12B all exposed
openwebui-cloud-survey-2026-05.md	Open WebUI (Ollama/OpenAI-compat chat UI)	112 instances across DO/Hetzner/Vultr	99.1% auth-enforced (different finding shape), but 14 instances with `enable_signup: true` (anyone can register), 5 branded deploys identifiable (Aera IA, TopicalBase, Tuuci AI, CloudU3, Lexa fork)
gradio-port-7860-survey-2026-05.md	Gradio / A1111 / Langflow on port 7860	16 instances (9 Langflow + 1 A1111 + 6 Gradio)	A1111 (167.172.175.48) fully open w/ dreamshaper + 3 models; 1 unauth Langflow is a CVE-research lab (excluded from disclosure); 6 branded Gradio LLMs incl. ByteDance Ark commercial-API tester
mlflow-cloud-survey-2026-05.md	MLflow Tracking Server	11 instances across DO/Hetzner/Vultr	100% unauth, 2 already actively exploited via CVE-2023-1177 by external attackers (visible attacker-injected experiments targeting /etc/ + /root/.ssh/, same actor across hosts); production workloads exposed: SPX hedging trading models, pediatric medical XGBoost classifiers, horse-racing/livestock breeders, manufacturing homogeneity, dental AI, AI safety probes
streamlit-cloud-survey-2026-05.md	Streamlit data apps	551 instances across DO/Hetzner/Vultr	100% unauth (no built-in auth); 100-app Playwright sample → 84 unique custom titles. Dominant cluster: trading bots / crypto dashboards (Binance, Hyperliquid, Polymarket, Kalshi). Also: Dark-Web OSINT tool (“Robin”), Russian OZON sellers admin, MITEC Live, GC Breeders Evaluation (cross-correlates with MLflow finding, same operator)
ollama-cloud-survey-2026-05.md	Ollama	342 instances across DO/Hetzner/Vultr	100% unauth (Ollama has no auth concept); 172 instances loading `:cloud` models = direct Ollama Cloud quota theft (minimax-m2.7, deepseek-v4-pro, kimi-k2.6, deepseek-v3.1:671b, devstral-2:123b); 22+ abliterated/uncensored safety-rail-removed models (huihui_ai family, Llama-3.1-8B-Lexi-Uncensored, Qwen3.5-9B-Claude-Opus-Uncensored-Distilled)
ollama-tier2-cloud-survey-2026-05.md	Ollama (tier-2 expansion)	850 real instances across Scaleway/OVH/Linode (3.55M IPs; 1,019 raw → 169 honeypots filtered)	100% unauth on real hits; 471 hosts (55.4%) load `:cloud` models (358 minimax-m2.7, 289 deepseek-v4-pro, 22 gemini-3-flash-preview = direct Ollama Cloud quota theft); 20+ abliterated/uncensored finetunes; discovered 393-host AS63949 (Akamai/Linode) honeypot fleet spoofing as Ollama 0.1.33 + Milvus + generic AI APIs, initially mis-attributed as a Linode marketplace cluster, corrected after cross-validation with Milvus probe
qdrant-tier2-cloud-survey-2026-05.md	Qdrant (tier-2 expansion)	781 instances across Scaleway/OVH/Linode (3.55M IPs)	84.9% unauth (663 hosts), first non-100% Qdrant auth posture measured at scale; 265 populated, 2,448 collections; `facts_v1` (51.158.59.156, Scaleway): 79.8M-point OpenAlex-keyed paper-claim/question RAG (~20M papers, 24-shard production cluster); two-tier auth-skew with OpenWebUI/Mem0 front-ends auth-protected but backing Qdrant exposed
milvus-tier2-cloud-survey-2026-05.md	Milvus (tier-2 expansion)	36 real instances (after filtering 393-host AS63949 honeypot fleet from 429 raw hits)	100% unauth (36 real); honeypot pollution rate 91.6% on Linode; populated: Quebec municipal-RAG operator (`rag_ville_de_saint_hyacinthe`, `rag_delson`, `rag_telefilm`), Islamic-text RAG (`SiddiqQuran`/`SunnahHadiths`/`SiratVectorstore`), kisspng/cleanpng image-search RAG, 17-collection multi-version document RAG with backup snapshots
backup-snapshot-services-survey-2026-05.md	Backup & Snapshot Services (Qdrant `/snapshots` cross-cut)	16 of 663 unauth Qdrant hosts expose pre-built snapshot files	2,512 snapshot files = 269 GB bulk-downloadable. 10 of 16 operators identified via TLS cert pivots (identities redacted pending coordinated-disclosure windows). Top exposures by data sensitivity: Brazilian Portuguese citizenship-application SaaS (passport/certidão OCR archive), EU CRM SaaS (WhatsApp/email/leads), EU multi-tenant RAG SaaS backup server (18-month cross-tenant retention, 226 GB single host). The snapshot endpoint inherits API auth state, operators with mature daily-backup workflows are at higher risk than those without
minio-dify-cloud-survey-2026-05.md	MinIO + Dify	852 MinIO + 5 Dify	MinIO: 0% anonymous-list (operators DID enable auth), but 27 instances disclose older releases vulnerable to CVE-2023-28432, 747 expose the Console to credential brute-force, and one 9-instance cluster runs an identical 6-year-old release. Dify: all 5 confirmed instances report `setup_step:finished`, so no setup-wizard takeover. Negative finding for both: auth-on-default upstream + clear docs = ~zero unauth at population scale
mem0-cross-survey-2026-05.md	Mem0 (cross-DB framework)	8 instances (6 Qdrant + 2 ChromaDB)	Content fingerprint cross-ref; 4 new identifiable-individual exposures: “Friday” assistant (8,984 pts), Italian marketing agency claude_memory (424), Chinese personal diary (1,199), openclaw_memories (empty)
elasticsearch-cloud-survey-2026-05.md	Elasticsearch / OpenSearch	42 instances across DO/Hetzner/Vultr	Mixed, ~18 ransomed/wiped, ~16 live production data; ES 7.x default-no-auth still common
compute-orchestration-cloud-survey-2026-05.md	Compute Orchestration / Training tier (Apache Spark + Apache Airflow + Ray Dashboard)	203 Shodan-seeded candidates, 126 confirmed across 3 platforms	118 unauthenticated exposures: 12 critical (4 Ray Dashboard CVE-2023-48022 ShadowRay surface + 8 Airflow unauth-via-`/home` with anonymous public role enabled) · 79 high (Spark Master + Worker + Application UI; ~71% exposure rate of confirmed Spark hosts) · 25 medium (Airflow login-gated, version-disclosure surface) · 2 low (Airflow API/health only). Methodology Insight #8, Airflow `/home` bypass: entry-point fingerprints miss auth-bypass-via-misconfiguration; probes must follow `/` → `/home` redirect and check authenticated-state-only tokens. BARE rank-1: `exploits_linux_http_spark_unauth_rce`, `exploits_linux_http_apache_airflow_dag_rce`, `exploits_linux_http_ray_agent_job_rce` (commodity-CVE chain across all 91 critical/high)
medical-edge-ai-survey-2026-05-15.md	Medical / Edge AI (Orthanc DICOM + MONAI Label + dcm4che + DICOMweb + NVIDIA NIM + Clara; ports 4242/8042/8043/11112/8000)	12,135 masscan candidates → 88 protocol-strict DICOM PDU responders → 39 confirmed real after 2-pass honeypot filter	39 unauth DICOM SCPs on tier-2 cloud, all default Orthanc AE “ORTHANC”. 11 cert-attributable named operators (Spanish clinic SaaS, Brazilian hospital, French AI orthopedic vendor, UK MRI phantom company, Colombian healthcare-tech, etc.). HTTP REST 401 across all 39 = Orthanc 1.10+ auth-on-default working; DICOM TCP 4242/11112 wide-open = protocol-default failing same operators. Insight #13 at the protocol layer. Produced Insight #22: protocol-strict A-ASSOCIATE + adjacent-port shape-hash discrimination caught a 7-host Linode multi-protocol honeypot fleet (fake Citrix login on 443 + real DICOM SCP on 4242). Shodan-down survey — port-first per Insight #21. Restraint perimeter held: no C-FIND/C-MOVE issued; no PHI retrieved
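
The MCP row above mentions a protocol-shape gate (strict JSON-RPC `initialize`) used to filter honeypot pollution. A minimal Python sketch of such a gate follows. It assumes the JSON payload has already been extracted from the HTTP or SSE framing, and that a genuine reply carries the `protocolVersion`, `capabilities`, and `serverInfo` fields of an MCP initialize result. The function names `is_mcp_initialize_result` and `pollution_rate` and the sample bodies are hypothetical, not the survey's actual tooling. The rate computed here is simply the fraction of raw hits the gate rejects, used as a stand-in for the pollution figure.

```python
import json
from typing import Any


def is_mcp_initialize_result(raw_body: str, request_id: int) -> bool:
    """Return True only if raw_body is a well-formed JSON-RPC 2.0 reply to `initialize`."""
    try:
        msg: Any = json.loads(raw_body)
    except json.JSONDecodeError:
        return False  # HTML banners, empty bodies, non-JSON text
    if not isinstance(msg, dict):
        return False
    if msg.get("jsonrpc") != "2.0" or msg.get("id") != request_id:
        return False  # the reply must echo our request id
    result = msg.get("result")
    if not isinstance(result, dict):
        return False  # JSON-RPC error object or missing result
    server_info = result.get("serverInfo")
    return (
        isinstance(result.get("protocolVersion"), str)
        and isinstance(result.get("capabilities"), dict)
        and isinstance(server_info, dict)
        and isinstance(server_info.get("name"), str)
    )


def pollution_rate(bodies: list[str], request_id: int) -> float:
    """Fraction of raw hits that fail the gate (0.0 when there are no hits)."""
    if not bodies:
        return 0.0
    rejected = sum(not is_mcp_initialize_result(b, request_id) for b in bodies)
    return rejected / len(bodies)


# Synthetic sample bodies: one plausible MCP reply and three non-conforming hits.
real_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "serverInfo": {"name": "example-server", "version": "0.0.0"},
    },
})
samples = [
    real_reply,
    "<html>login</html>",
    '{"status": "ok"}',
    json.dumps({"jsonrpc": "2.0", "id": 1,
                "error": {"code": -32601, "message": "Method not found"}}),
]

print(pollution_rate(samples, request_id=1))  # 3 of 4 rejected -> 0.75
```

Checking the reply's shape rather than a banner string is what makes the gate strict: a host that answers every port with generic JSON or HTML fails it, while a real server that merely has an empty `tools/list` still passes.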

Why Separate from Universities

Commercial exposures carry distinct risk profiles:

Paying customers: direct financial / contractual liability when PII is exposed
Live PII pipelines: system prompts often reveal the exact data-collection schema
Competitive intel: proprietary business logic is exposed in plain text
Cross-border attribution: the host country (e.g., Romania) often differs from the operator country (e.g., France), complicating regulatory disclosure