Commercial AI Infrastructure Exposures
NuClide Research, ongoing · Updated 2026-05-04
Commercial / SaaS Ollama and AI infrastructure exposures discovered during OSINT sweeps. These differ from university and research-network exposures in that the operators are commercial entities with paying customers and PII pipelines.
- 2026-05 cross-survey synthesis: SYNTHESIS-2026-05.md pulls together all 18+ platform surveys (~5,200 confirmed deployments) into one analysis covering tier-by-tier auth-posture comparison, root-cause taxonomy, threat-class taxonomy, and cross-survey operator correlations.
- For operators who find their IP in a survey paper: REMEDIATION-GUIDE.md gives the one-line config fix for each affected platform (Qdrant, ChromaDB, Milvus, Ollama, MLflow, vLLM, Streamlit, Open WebUI, MinIO).
- Future-work roadmap: FUTURE-SURVEYS.md catalogues 30+ AI/ML platform classes not yet surveyed (Ray Dashboard, ComfyUI, Weaviate, pgvector, Langfuse, W&B self-hosted, ClearML, AutoGen Studio, ClickHouse, ROS, NVIDIA Clara, etc.) with port, fingerprint, and risk class for each. Anyone can pick a category and run the survey using the documented methodology template.
Confirmed Findings
| File | Operator | Country | Severity | Key Finding |
|---|---|---|---|---|
| FR-emails-pro-rdv-bot.md | emails-pro.fr (hosted on Romanian ICI IP space) | France / Romania | CRITICAL | Production French commercial appointment-booking SaaS, full system prompt + PII collection schema + function-call format exposed |
| TR-sanctionscanner-aml-kyc.md | sanctionscanner.com (168.119.90.62, Hetzner DE) | Turkey / Germany | CRITICAL | AML/KYC compliance SaaS, 79M KYB records + 6.2M individual sanctions list entries unauth; active ransom compromise; disclosed 2026-05-03 |
| VN-watzis-ai-pii-memory.md | Watzis / Calmio AI assistant (149.28.77.155, Vultr) | Vietnam | HIGH | Vietnamese AI assistant, Mem0 long-term memory store unauth; citizen ID card + VND wallet + student PII in plaintext; multiple users confirmed |
| multi-pingu-trading-ai.md | Unknown operator (45.76.20.46, Vultr) | Unknown | HIGH | Pingu crypto trading AI + Nova molecular optimization, 25 Qdrant collections unauth; live trade PnL, full LLM reasoning traces, competition leaderboard |
| multi-legal-compliance-investigation.md | Unknown operator (167.172.120.218, DigitalOcean) | Unknown | CRITICAL (if populated) | Legal/compliance investigation platform schema exposed unauth, investigation_data, case_drafts, attachments collections; empty at probe time; flagged for re-probe |
| multi-auto-fi-sales-training.md | Unknown operator (104.131.60.234, DigitalOcean) | Unknown (Sean McNally methodology) | HIGH | Auto F&I sales training RAG, real customer dialogues with names + vehicles + dollar figures, Sean McNally methodology IP, 1,608 docs unauth ChromaDB |
| multi-crypto-agent-user-memory.md | Unknown operator (159.203.117.193, DigitalOcean) | Spanish-language LatAm/Spain | HIGH | Crypto investment agent, per-user financial profiles ($50K targets, exchange affinity, asset allocation) in user_memory_ |
| multi-holamoda-multitenant.md | HolaModa + Delta701 (46.101.118.246, DigitalOcean) | Unknown (Mexican/Spanish?) | CRITICAL | Multi-tenant fashion retail RAG, 2 tenants + dev/prod co-located on one ChromaDB; 1.53M docs across 7 collections unauth; Vertex AI text-embedding-gecko |
| multi-personal-diary-corpus.md | Unknown Prisma SaaS (188.166.71.44, DigitalOcean) | Belgium/France inferred | HIGH | Multi-tenant document SaaS, Prisma CUID per-user collections expose personal alcohol-cessation diary (GDPR Art. 9), theater scripts with author emails + Belgian phones, public-domain texts |
| multi-tweet-optimize-facial-recognition.md | tweet-optimize.com (65.108.107.240, Hetzner FI) | Finland (Hetzner DC) | CRITICAL | 1.21M face embeddings unauth on Milvus, onlyfans (897K) + psos (313K) collections with bbox + mongo_id refs. Worst-case interpretation: a doxing-as-a-service backend exposed on the public internet via unauth /entities/search |
| langfuse-cross-survey-2026-05-06.md | Unistart Hubs / Pharos AI Assistant (135.181.252.66, Hetzner DE) | Greece | CRITICAL | Four-platform AI-stack catastrophe on one host, Langfuse v3.73.1 with signUpDisabled:false on port 3001 (anyone registers, reads all LLM traces) + Mem0/Milvus unauth on 19530 (existing finding) + Attu admin GUI on 3000 + CLIENT_SECRET literally hardcoded in /env.js of the Pharos webapp on 8080. Surfaced via cross-survey-correlation probe of 723 ledger IPs (Methodology Insight #9) when Shodan API was unavailable, full chain ran in <5 minutes from a single anonymous probe |
| elasticsearch-cloud-survey-2026-05.md | sanctionscanner.com (168.119.90.62, Hetzner DE) | Turkey / Germany | CRITICAL | AML/KYC compliance SaaS, 79M KYB records + 6.2M individual sanctions list entries unauth; active ransom compromise; disclosed 2026-05-03 |
| qdrant-cloud-survey-2026-05.md | Multiple operators | Various | HIGH | 61/61 Qdrant instances unauth across DO/Hetzner/Vultr, crypto trading AI, Vietnamese PII in agent memory, internal SOPs, legal compliance platform |
DCWF KSAT coverage
Auto-derived from DCWF AI work-role rule files (ksat-tag).
- 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, T5858, T5904
- 733 (AI Risk & Ethics Specialist): K7040, K7051, S7067, S7069, T5854, T5868, T5893
- overlap (common AI KSATs, all 5 roles): K108, K1157, K1158, K1159, K22, K6311, K6900, K6935, K7003, K942, S7065
Cross-Provider Surveys
Aggregate auth-posture studies across cloud-hosting providers (DigitalOcean, Hetzner, Vultr, etc.) for specific platform classes.
| File | Platform | Sample | Result |
|---|---|---|---|
| flowise-cloud-survey-2026-05.md | Flowise | 43 instances across DO/Hetzner/Vultr | 0 unauthenticated, operator hygiene post-CVE-2024-36420 has improved on cloud platforms |
| n8n-cloud-survey-2026-05.md | n8n | 1,006 instances across DO/Hetzner/Vultr | 0 unauthenticated, mandatory auth since v0.166.0 fully adopted on cloud platforms |
| jupyter-survey-2026-05.md | Jupyter / JupyterHub | 18 confirmed university instances (Berkeley, ETH, Cambridge, NTU, INHA, NCCU) | 0 unauthenticated, JupyterHub PAM/LDAP auth standard across all surveyed institutions |
| qdrant-cloud-survey-2026-05.md | Qdrant | 61 instances across DO/Hetzner/Vultr | 100% unauthenticated, ships auth-off by default; 48/61 contain live data |
| chromadb-cloud-survey-2026-05.md | ChromaDB | 48 instances across DO/Hetzner/Vultr | 100% unauthenticated, ships auth-off by default; 22/48 populated; 2.67M documents total exposed |
| chromadb-tier2-cloud-survey-2026-05.md | ChromaDB (tier-2 expansion) | 44 instances across Scaleway/OVH/Linode (3.55M IPs) | 100% unauth (combined cross-survey total: 92 ChromaDB instances, 100% unauth); 23 populated. Branded enterprise tenants visible: STIHL (German power-tools via RaptorCX integrator), AXA Insurance (rag_axa), Mitsubishi, Daikin. Government/regulatory: Indonesian OJK financial regulator + UU PDP data-privacy law, Hilversum Dutch municipality. Healthcare: oncology, patient_info_embeddings, larvol_kol pharma KOL data |
| speech-audio-cloud-survey-2026-05.md | Speech & Audio AI (whisper-asr-webservice + faster-whisper-server, port 9000) | 6 confirmed instances across tier-2 cloud (3.55M IPs, AS63949 honeypot pollution filtered) | 100% unauth, Tier-A “no auth concept” reproduces on a new platform class. 2 of 6 are dual-stack with unauth Ollama on the same host, operators building “local AI swiss army knives” (one host runs faster-whisper-large-v3 + Ollama with Qwen3-235B + minimax-m2.7:cloud billing-target). 3 hosts run whisper-asr-webservice 1.9.1, 3 run faster-whisper-server with OpenAI-compat audio API. Compute-theft + adversarial-transcription + model-disk-write threat classes |
| comfyui-cloud-survey-2026-05.md | ComfyUI image-gen workflow tool (port 8188) | 6 confirmed across tier-2 + Hetzner (5.25M IPs) | 100% unauth, Tier-A no-auth-concept. Exposes /system_stats (GPU topology), /queue (jobs), /history (full workflow JSON + prompts + output filenames), /object_info (custom-node loadout), POST /prompt (compute theft), POST /upload/image (disk-fill). 385 GB total VRAM exposed including a NVIDIA RTX PRO 6000 Blackwell Max-Q (~$10K workstation card) on a single host. One operator identified: bonivivre.fr (French SaaS). Threat classes: GPU-hour theft, workflow + prompt + output exfil, adversarial workflow injection, disk-fill |
| observability-cloud-survey-2026-05.md | LLM observability + ML training telemetry (port 6006: Phoenix Arize + TensorBoard) | 9 confirmed across tier-2 (3.55M IPs, 38 non-AI port-6006 services filtered) | 100% unauth, Tier-A. 6 Phoenix (LLM trace platform) + 3 TensorBoard. Headline: active Stable Diffusion 1.5 + SDXL distillation + LoRA fine-tuning research workflow exposed on 51.159.189.219 (Scaleway), full PyTorch Lightning logs. 2 Phoenix hosts run made-doc-analysis-llm-app project (operator’s prod + staging). Threat classes: LLM trace exfil, project-name disclosure, training-loss-curve exfil, hyperparameter-sweep history, sometimes training-data samples in TensorBoard summaries |
| mcp-cloud-survey-2026-05.md | Model Context Protocol servers (Anthropic protocol, JSON-RPC over HTTP+SSE; ports 3000/8000/8080/8888) | 95 confirmed cross-cloud (Scaleway 9 + Linode 4 + OVH 82 across 1,017 prefixes / ~6.33M IPs) | 70.5% empty tools/list (auth-gated or stub), 29.5% with real exposed tool surfaces. Headline findings: fully-exposed Gmail mailbox MCP (19-tool send/read/delete CRUD on operator’s own Gmail); Alcy CRM Simple (22-tool French facility-management CRUD with create/patch on tickets/work-orders/interventions); rmcp Elasticsearch MCP proxy; hindsight-mcp v3.1.1 personal-AI-memory CRUD (29 tools incl. clear_memories, delete_bank); 3× Casdoor IAM-CRUD across providers (recurring template-auth-off pattern); Brazilian legal RAG with TCE-ES state-audit data; 6× Netdata sandboxed-but-unauth telemetry. Protocol-shape gate (strict JSON-RPC initialize) filtered honeypot pollution to 1.1% on Linode (vs 91.6% on prior Milvus survey). Pattern synthesis: single-operator catastrophic exposures, fleet-deployed open-source templates with auth-off-default, IAM-platform MCP wrappers as recurring high-risk class |
| llm-gateways-cloud-survey-2026-05.md | LLM Gateways / OpenAI-compat proxies (LiteLLM / LM Studio / Jan AI / oobabooga / OneAPI / generic; ports 1234/1337/3000/4000/5000/8080) | 1,899 confirmed unauth cross-cloud (1,448 generic OpenAI-compat + 318 LM Studio + 126 Jan AI / Cortex + 7 LiteLLM Proxy) | 97.8% (1,857) returned functional inference unauth when probed with single-token disclosure-PoC, operator quota actively billed. Provider-key inventory: 1,835 OpenAI-burnable / 2 Anthropic-burnable / Google / OpenRouter / Mistral / DeepSeek / MiniMax / xAI / Moonshot / Zhipu / Alibaba / Windsurf. 1,829 hosts (98.5% of burnable) ran the same canned-response template, single open-source proxy mass-deployed auth-off across operators. Aggregate ~$0.011 of operator quota consumed total (37,497 tokens, ~$0.000006 per host) by the methodology probe; no key strings extracted. Highlight finding: 172.235.117.122:4000 returned 56 Anthropic tokens unauth on claude-4.5-haiku. Extends vLLM survey’s 10-reseller-proxy finding by ~180× at the gateway-product tier |
| milvus-cloud-survey-2026-05.md | Milvus | 33 instances across DO/Hetzner/Vultr | 100% unauthenticated, RBAC opt-in; 27/33 populated; multi-tenant Everos AI agent platform, Saudi legal RAG, Midea KB, image+facial pipelines |
| triton-cloud-survey-2026-05.md | NVIDIA Triton Inference Server | 2 instances on DO | 100% unauthenticated, chat-safety pipeline w/ 127M-inference minor-detection classifier (159.203.42.211), workplace-surveillance YOLOv8 pipeline (178.62.225.198) |
| vllm-cloud-survey-2026-05.md | vLLM / OpenAI-compatible LLM servers | 44 instances across DO/Hetzner/Vultr | 100% unauthenticated, 19 vLLM + 25 generic; 10 commercial-API reseller proxies (Grok2API, Kiro-Go, AgentBar) burning operator credits on every external prompt; sipgate + Infomaniak proprietary fine-tunes attributable; Llama-3.3-70B-AMD, gpt-oss-120b, Qwen3-235B + Kimi-K2.6 clusters, Pixtral-12B all exposed |
| openwebui-cloud-survey-2026-05.md | Open WebUI (Ollama/OpenAI-compat chat UI) | 112 instances across DO/Hetzner/Vultr | 99.1% auth-enforced (different finding shape), but 14 instances with enable_signup: true (anyone can register), 5 branded deploys identifiable (Aera IA, TopicalBase, Tuuci AI, CloudU3, Lexa fork) |
| gradio-port-7860-survey-2026-05.md | Gradio / A1111 / Langflow on port 7860 | 16 instances (9 Langflow + 1 A1111 + 6 Gradio) | A1111 (167.172.175.48) fully open w/ dreamshaper + 3 models; 1 unauth Langflow is a CVE-research lab (excluded from disclosure); 6 branded Gradio LLMs incl. ByteDance Ark commercial-API tester |
| mlflow-cloud-survey-2026-05.md | MLflow Tracking Server | 11 instances across DO/Hetzner/Vultr | 100% unauth, 2 already actively exploited via CVE-2023-1177 by external attackers (visible attacker-injected experiments targeting /etc/ + /root/.ssh/, same actor across hosts); production workloads exposed: SPX hedging trading models, pediatric medical XGBoost classifiers, horse-racing/livestock breeders, manufacturing homogeneity, dental AI, AI safety probes |
| streamlit-cloud-survey-2026-05.md | Streamlit data apps | 551 instances across DO/Hetzner/Vultr | 100% unauth (no built-in auth); 100-app Playwright sample → 84 unique custom titles. Dominant cluster: trading bots / crypto dashboards (Binance, Hyperliquid, Polymarket, Kalshi). Also: Dark-Web OSINT tool (“Robin”), Russian OZON sellers admin, MITEC Live, GC Breeders Evaluation (cross-correlates with MLflow finding, same operator) |
| ollama-cloud-survey-2026-05.md | Ollama | 342 instances across DO/Hetzner/Vultr | 100% unauth (Ollama has no auth concept); 172 instances loading :cloud models = direct Ollama Cloud quota theft (minimax-m2.7, deepseek-v4-pro, kimi-k2.6, deepseek-v3.1:671b, devstral-2:123b); 22+ abliterated/uncensored safety-rail-removed models (huihui_ai family, Llama-3.1-8B-Lexi-Uncensored, Qwen3.5-9B-Claude-Opus-Uncensored-Distilled) |
| ollama-tier2-cloud-survey-2026-05.md | Ollama (tier-2 expansion) | 850 real instances across Scaleway/OVH/Linode (3.55M IPs; 1,019 raw → 169 honeypots filtered) | 100% unauth on real hits; 471 hosts (55.4%) load :cloud models (358 minimax-m2.7, 289 deepseek-v4-pro, 22 gemini-3-flash-preview = direct Ollama Cloud quota theft); 20+ abliterated/uncensored finetunes; discovered 393-host AS63949 (Akamai/Linode) honeypot fleet spoofing as Ollama 0.1.33 + Milvus + generic AI APIs, initially mis-attributed as a Linode marketplace cluster, corrected after cross-validation with Milvus probe |
| qdrant-tier2-cloud-survey-2026-05.md | Qdrant (tier-2 expansion) | 781 instances across Scaleway/OVH/Linode (3.55M IPs) | 84.9% unauth (663 hosts), first non-100% Qdrant auth posture measured at scale; 265 populated, 2,448 collections; facts_v1 (51.158.59.156, Scaleway): 79.8M-point OpenAlex-keyed paper-claim/question RAG (~20M papers, 24-shard production cluster); two-tier auth-skew with OpenWebUI/Mem0 front-ends auth-protected but backing Qdrant exposed |
| milvus-tier2-cloud-survey-2026-05.md | Milvus (tier-2 expansion) | 36 real instances (after filtering 393-host AS63949 honeypot fleet from 429 raw hits) | 100% unauth (36 real); honeypot pollution rate 91.6% on Linode; populated: Quebec municipal-RAG operator (rag_ville_de_saint_hyacinthe, rag_delson, rag_telefilm), Islamic-text RAG (SiddiqQuran/SunnahHadiths/SiratVectorstore), kisspng/cleanpng image-search RAG, 17-collection multi-version document RAG with backup snapshots |
| backup-snapshot-services-survey-2026-05.md | Backup & Snapshot Services (Qdrant /snapshots cross-cut) | 16 of 663 unauth Qdrant hosts expose pre-built snapshot files | 2,512 snapshot files = 269 GB bulk-downloadable. 10 of 16 operators identified via TLS cert pivots (identities redacted pending coordinated-disclosure windows). Top exposures by data sensitivity: Brazilian Portuguese citizenship-application SaaS (passport/certidão OCR archive), EU CRM SaaS (WhatsApp/email/leads), EU multi-tenant RAG SaaS backup server (18-month cross-tenant retention, 226 GB single host). The snapshot endpoint inherits API auth state, operators with mature daily-backup workflows are at higher risk than those without |
| minio-dify-cloud-survey-2026-05.md | MinIO + Dify | 852 MinIO + 5 Dify | MinIO: 0% anonymous-list (operators DID enable auth) but 27 version-disclosed older releases CVE-2023-28432 vulnerable, 747 Console-exposed for credential brute-force, 9-instance cluster on identical 6-year-old release. Dify: 5 confirmed all setup_step:finished, no setup-wizard takeover. Negative finding for both: auth-on-default upstream + clear docs = ~zero unauth at population scale |
| mem0-cross-survey-2026-05.md | Mem0 (cross-DB framework) | 8 instances (6 Qdrant + 2 ChromaDB) | Content fingerprint cross-ref; 4 new identifiable-individual exposures: “Friday” assistant (8,984 pts), Italian marketing agency claude_memory (424), Chinese personal diary (1,199), openclaw_memories (empty) |
| elasticsearch-cloud-survey-2026-05.md | Elasticsearch / OpenSearch | 42 instances across DO/Hetzner/Vultr | Mixed, ~18 ransomed/wiped, ~16 live production data; ES 7.x default-no-auth still common |
| compute-orchestration-cloud-survey-2026-05.md | Compute Orchestration / Training tier (Apache Spark + Apache Airflow + Ray Dashboard) | 203 Shodan-seeded candidates, 126 confirmed across 3 platforms | 118 unauthenticated exposures: 12 critical (4 Ray Dashboard CVE-2023-48022 ShadowRay surface + 8 Airflow unauth-via-/home with anonymous public role enabled) · 79 high (Spark Master + Worker + Application UI; ~71% exposure rate of confirmed Spark hosts) · 25 medium (Airflow login-gated, version-disclosure surface) · 2 low (Airflow API/health only). Methodology Insight #8, Airflow /home bypass: entry-point fingerprints miss auth-bypass-via-misconfiguration; probes must follow / → /home redirect and check authenticated-state-only tokens. BARE rank-1: exploits_linux_http_spark_unauth_rce, exploits_linux_http_apache_airflow_dag_rce, exploits_linux_http_ray_agent_job_rce (commodity-CVE chain across all 91 critical/high) |
| medical-edge-ai-survey-2026-05-15.md | Medical / Edge AI (Orthanc DICOM + MONAI Label + dcm4che + DICOMweb + NVIDIA NIM + Clara; ports 4242/8042/8043/11112/8000) | 12,135 masscan candidates → 88 protocol-strict DICOM PDU responders → 39 confirmed real after 2-pass honeypot filter | 39 unauth DICOM SCPs on tier-2 cloud, all default Orthanc AE “ORTHANC”. 11 cert-attributable named operators (Spanish clinic SaaS, Brazilian hospital, French AI orthopedic vendor, UK MRI phantom company, Colombian healthcare-tech, etc.). HTTP REST 401 across all 39 = Orthanc 1.10+ auth-on-default working; DICOM TCP 4242/11112 wide-open = protocol-default failing same operators. Insight #13 at the protocol layer. Produced Insight #22: protocol-strict A-ASSOCIATE + adjacent-port shape-hash discrimination caught a 7-host Linode multi-protocol honeypot fleet (fake Citrix login on 443 + real DICOM SCP on 4242). Shodan-down survey — port-first per Insight #21. Restraint perimeter held: no C-FIND/C-MOVE issued; no PHI retrieved |
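
Several totals and percentages in the survey table above can be re-derived from counts given in the same rows. The following is a minimal sketch of that consistency check, not part of the published methodology: the helper names (`pct`, `CHECKS`, `mismatches`) are hypothetical, the counts are copied from the rows, and it assumes the reported percentages are rounded to one decimal place.

```python
def pct(part: int, whole: int) -> float:
    """Share of `whole` as a percentage, rounded to one decimal (assumed table convention)."""
    return round(100 * part / whole, 1)


# (label, value recomputed from row counts, value reported in the same row)
CHECKS = [
    ("Ollama tier-2 real instances (raw minus honeypots)", 1019 - 169, 850),
    ("Ollama tier-2 hosts loading :cloud models, %", pct(471, 850), 55.4),
    ("Qdrant tier-2 unauth, %", pct(663, 781), 84.9),
    ("Milvus tier-2 real instances (raw minus honeypots)", 429 - 393, 36),
    # Assumes the 91.6% pollution rate is honeypots over raw hits.
    ("Milvus tier-2 honeypot pollution, %", pct(393, 429), 91.6),
    ("LLM gateways confirmed unauth (sum of product classes)", 1448 + 318 + 126 + 7, 1899),
    ("LLM gateways returning functional inference, %", pct(1857, 1899), 97.8),
    ("MCP confirmed servers (Scaleway + Linode + OVH)", 9 + 4 + 82, 95),
    ("ChromaDB combined instances (tier-1 + tier-2)", 48 + 44, 92),
    ("Compute orchestration unauth exposures (all severities)", 12 + 79 + 25 + 2, 118),
    ("Compute orchestration critical + high", 12 + 79, 91),
]


def mismatches(checks):
    """Return the entries whose recomputed value differs from the reported one."""
    return [(label, got, want) for label, got, want in checks if got != want]


if __name__ == "__main__":
    for label, got, want in CHECKS:
        status = "ok" if got == want else "MISMATCH"
        print(f"{status:8} {label}: recomputed {got}, reported {want}")
```

A non-empty result from `mismatches(CHECKS)` would point to a transcription error in a row rather than to a problem with the underlying survey data.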
Why Separate from Universities
Commercial exposures carry distinct risk profiles:
- Paying customers: direct financial and contractual liability when PII is exposed
- Live PII pipelines: system prompts often reveal the exact data-collection schema
- Competitive intel: proprietary business logic is readable in plain text
- Cross-border attribution: the host country (e.g., Romania) often differs from the operator country (e.g., France), which complicates regulatory disclosure