Survey May 9, 2026

Weaviate on Public Cloud: Auth Posture and Enterprise Tenant Exposure Survey

NuClide Research · 2026-05-09


Summary

Shodan pull of http.html:"weaviate" port:8080 → 852 unique IPs → asyncio probe of /v1/meta, /v1/schema, /v1/nodes → 694 confirmed reachable Weaviate instances. Of these, 435 are fully open (no authentication), 344 contain at least one populated class (vector collection), and 201 have the OpenAI module active (which typically means an OpenAI API key is configured server-side and callable by any unauthenticated client). The auth-off-by-default thesis reproduces cleanly: Weaviate’s anonymous access is opt-out, not opt-in.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, S7068, S7075, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, T5868, T5904
  • overlap (Common AI KSATs (all 5 roles)): K1157, K1158, K22, K6311, K6900, K6935, K7003, K942

The notable finding is not the auth posture itself, which is well documented, but the enterprise multi-tenant SaaS pattern appearing at scale: AI integrators have built chatbot platforms on top of Weaviate and exposed the knowledge bases of their entire client portfolio without access controls. Named enterprise tenants were confirmed across luxury retail, public transport infrastructure, national-level financial regulation, government, and cybersecurity.


Methodology

shodan download --limit 1000 weaviate-8080.json.gz 'http.html:"weaviate" port:8080'
  → 852 unique IPs

weaviate-probe.py (asyncio, 80 concurrent, 2s connect / 4s read / 8s host deadline)
  GET /v1/meta     → version, module list (OpenAI/Cohere/Anthropic/etc)
  GET /v1/schema   → class names (collection names)
  GET /v1/nodes    → object count, shard count, node health
  GET /v1/objects?limit=1 → confirm data access (no content read)
  → 694 confirmed (12 seconds wall time)
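
The probe loop outlined above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual weaviate-probe.py, which is not reproduced in this survey: the names fetch_blocking, probe_host, and probe_all are hypothetical, urllib's single timeout stands in for the separate 2s connect / 4s read limits, and asyncio.to_thread uses the default thread pool, so real concurrency may be lower than the semaphore's 80. It is intended for hosts you operate or are authorized to test.

```python
import asyncio
import http.client
import json
import urllib.error
import urllib.request

ENDPOINTS = ("/v1/meta", "/v1/schema", "/v1/nodes")
CONCURRENCY = 80       # concurrent hosts, as in the survey
REQUEST_TIMEOUT = 4.0  # assumption: one urllib timeout instead of 2s connect / 4s read
HOST_DEADLINE = 8.0    # total time budget per host


def fetch_blocking(url):
    """GET one URL; return (status, parsed_json). Status is None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=REQUEST_TIMEOUT) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as exc:  # e.g. 401 on auth-gated hosts
        return exc.code, None
    except (OSError, http.client.HTTPException):  # refused, timed out, not HTTP
        return None, None
    try:
        return status, json.loads(raw)
    except ValueError:  # reachable, but the body is not JSON
        return status, None


async def probe_host(host, sem, fetch=fetch_blocking):
    """Probe each endpoint on one host, bounded by the shared semaphore."""
    async def run():
        results = {}
        for path in ENDPOINTS:
            url = f"http://{host}:8080{path}"
            results[path] = await asyncio.to_thread(fetch, url)
        return results

    async with sem:
        try:
            return host, await asyncio.wait_for(run(), HOST_DEADLINE)
        except asyncio.TimeoutError:  # host exceeded its overall deadline
            return host, {}


async def probe_all(hosts, fetch=fetch_blocking):
    """Return {host: {endpoint: (status, body)}} for every host."""
    sem = asyncio.Semaphore(CONCURRENCY)
    pairs = await asyncio.gather(*(probe_host(h, sem, fetch) for h in hosts))
    return dict(pairs)
```

A self-test against a local instance would be asyncio.run(probe_all(["127.0.0.1"])). The fetch parameter is injectable so the scheduling logic can be exercised without network access.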

Findings Summary

Metric                                          Value
Shodan hits (http.html:"weaviate" port:8080)    858 (852 downloadable)
Confirmed reachable                             694
Auth-gated (HTTP 401)                           259
Fully open (no auth)                            435
Populated (≥1 class)                            344
Populated + OpenAI module                       169
OpenAI module active (any)                      201
Cohere module active                            135
Both OpenAI + Cohere                            134
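
The buckets in this table could be derived from probe responses along the following lines. This is a sketch under assumptions rather than the survey's actual rules: it treats HTTP 401 on /v1/schema as auth-gated, counts a host as populated when its schema lists at least one class, and treats any module whose name ends in "openai" in the /v1/meta module list as the OpenAI module. The function names are hypothetical.

```python
def classify(schema_status, schema_body):
    """Bucket one host from its unauthenticated GET /v1/schema response."""
    if schema_status == 401:
        return "auth-gated"
    if schema_status == 200 and isinstance(schema_body, dict):
        # /v1/schema returns {"classes": [{"class": "Name", ...}, ...]}
        classes = schema_body.get("classes") or []
        return "open-populated" if classes else "open-empty"
    return "unconfirmed"


def openai_active(meta_body):
    """True if /v1/meta lists an OpenAI module (e.g. text2vec-openai)."""
    modules = (meta_body or {}).get("modules") or {}
    return any(name.endswith("openai") for name in modules)
```

For example, classify(200, {"classes": [{"class": "Doc"}]}) returns "open-populated", and a host is counted under "Populated + OpenAI module" only when both checks pass.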

Version distribution (top 10)

Version series        Count
v1.24.x               55
v1.27.x               54
v1.28.x               39
v1.30.x               31
v1.25.x               29
v1.32.x               27
v1.23.x               26
v1.31.x               25
v1.35.x               22
(no meta response)    262

The 262 hosts with no meta response (“unknown version”) responded on /v1/schema or /v1/nodes but not on /v1/meta. They are likely older Weaviate versions or non-standard deployments that nonetheless serve the schema API.
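
Grouping reported versions into series, as in the table above, might look like the sketch below. It assumes the version field of /v1/meta is a dotted string such as "1.28.4" and that hosts without a meta response are represented as None; both function names are hypothetical.

```python
import re
from collections import Counter


def version_series(version):
    """Map '1.28.4' to 'v1.28.x'; anything unparsable counts as no meta response."""
    match = re.match(r"v?(\d+)\.(\d+)", version or "")
    if not match:
        return "(no meta response)"
    return f"v{match.group(1)}.{match.group(2)}.x"


def series_counts(versions):
    """Count hosts per version series."""
    return Counter(version_series(v) for v in versions)
```

Calling series_counts(...).most_common(10) on the collected version strings would yield a ranking in the same shape as the table.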


Notable Findings

F1: MyAi Corporation multi-tenant platform (HIGH)

Hosts: 188.245.173.135:8080 (Hetzner DE, www.myaicorp.com), 91.98.226.57:8080 (Hetzner DE)
Operator: MyAi Corporation (myaicorp.com). Spanish AI integrator, TLS cert *.myaicorp.com (Sectigo DV)
Severity: HIGH. Enterprise multi-tenant SaaS platform with no access controls; named clients’ vectorized knowledge bases publicly readable

Both instances share the same schema with 200–203 classes, running Weaviate v1.28.4 with the text2vec-openai and backup-filesystem modules. The instances appear to be a production and staging pair (or a load-balanced pair) for MyAi Corporation’s chatbot/RAG platform. The schema enumerates the complete client portfolio.

Named clients confirmed in the class namespace (selection):

Sector                  Clients
Luxury / beauty         Dior, Chanel, YSL, Armani, Charlotte Tilbury, Tom Ford, Louboutin, Lancôme, Hermès, Byredo, Paco Rabanne, Guerlain, Nars, Revlon
Industrial equipment    Wittmann (injection molding robots, 10 model-specific classes), IKA (lab equipment), Salicru (UPS/power), Yaskawa (industrial robots), VaccuBrand, Plasmac
Public transport        Renfe (Spanish national rail), TMB (Barcelona metro), Metro Madrid, Moventis, FGC (Ferrocarrils de la Generalitat de Catalunya), Turkish Cargo
Government              Gencat / Generalitat de Catalunya (Catalan government), Qatar University, Riyadh Municipality, Roshn (Saudi NEOM-era mega-city developer)
Finance / payments      Astropay, Monri (Balkan payment gateway), Signifyd
Cybersecurity           CrowdStrike, Kaspersky, Orange (Cyberdefense)
Pharma / health         Probiotical, URIAGE (French dermo-cosmetic)

The Dataseekers-prefixed classes (DataseekersFragranceArtisan, DataseekersFragranceLouboutin, DataseekersMakeUpChanel, etc.) indicate Dataseekers is either a sub-brand or an integrator project name within the MyAi Corporation platform.

Reproduction (operator-self-test):

curl http://188.245.173.135:8080/v1/schema | python3 -m json.tool | grep '"class"'
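
For operators checking their own instance, a rough Python equivalent of the pipeline above is sketched here. It assumes the standard /v1/schema response shape ({"classes": [{"class": ...}]}); the function names and the http://localhost:8080 default are illustrative, and grouping by the leading capitalized token is only a heuristic for spotting tenant prefixes such as Dataseekers.

```python
import json
import re
import urllib.request
from collections import Counter


def fetch_schema(base_url="http://localhost:8080"):
    """Fetch /v1/schema from an instance you operate (assumed default URL)."""
    with urllib.request.urlopen(f"{base_url}/v1/schema", timeout=4) as resp:
        return json.load(resp)


def class_names(schema):
    """Return the sorted class (collection) names from a schema document."""
    return sorted(c["class"] for c in schema.get("classes", []))


def prefix_counts(names):
    """Count classes by leading capitalized token, e.g. 'Dataseekers'."""
    counts = Counter()
    for name in names:
        match = re.match(r"[A-Z][a-z0-9]+", name)
        counts[match.group(0) if match else name] += 1
    return counts
```

Running prefix_counts(class_names(fetch_schema())) shows at a glance which names an anonymous caller could enumerate and how they cluster by tenant prefix.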

Impact: Any unauthenticated caller can enumerate all client names, read the schema structure of each client’s knowledge base, and issue semantic search queries over any client’s vectorized documents. The text2vec-openai module indicates that OpenAI embeddings are computed with an API key held in the server config; extracting that credential would depend on the /v1/modules/text2vec-openai config endpoint, which was not tested.

Disclosure status: Not yet disclosed. Routing via myaicorp.com contact surface (currently nginx default page, no public contact form or security email found at time of survey).


F2: Indian regulatory/compliance RAG platform (HIGH)

Hosts: 34.56.31.138:8080 (GCP US-central1, Iowa), 104.154.128.27:8080 (GCP US-central1, Iowa)
Severity: HIGH. 3,059-class and 197-class instances respectively; Indian government and financial regulatory corpus

Both GCP instances run Weaviate v1.31.3 with the full Weaviate Cloud module set (all generative providers: OpenAI, Anthropic, Cohere, Google, Mistral, etc.) and report hostname: http://[::]:8080. This matches the Weaviate Cloud Service embedded-module profile, so these may be WCS-managed instances or self-hosted deployments using the WCS module bundle.

34.56.31.138 (3,059 classes): Schema contains Indian regulatory documents across:

  • NPCI (National Payments Corporation of India): UPI operational circulars
  • UIDAI (Unique Identification Authority of India): Aadhaar-related notifications
  • MCA (Ministry of Corporate Affairs): Companies Act GSR gazette amendments (largest class group: ~2,000+ GSR series)
  • SEBI: securities regulation (smaller subset)
  • IBC (Insolvency and Bankruptcy Code): enforcement notifications
  • CCI (Competition Commission of India)

104.154.128.27 (197 classes): SEBI-focused subset covering securities circulars, borrowing regulations, and timeline extension notifications.

Context: This is an Indian corporate law / compliance AI assistant built over vectorized gazette notifications. The breadth of coverage (NPCI/UPI + UIDAI/Aadhaar + MCA + SEBI + IBC) suggests either a legal-tech product or an in-house compliance tool for a regulated entity. The data itself is public (gazette notifications are public domain), but the unauthenticated, searchable RAG interface over this corpus exposes the operator’s proprietary knowledge curation work, and generative RAG queries (issued via /v1/graphql) would consume the operator’s API keys without authorization.

Disclosure status: Not yet disclosed. Operator identity not established from IP/module profile alone. WCS module bundle is consistent with Weaviate Cloud Service managed instances (disclosure would route via Weaviate Cloud abuse contact if confirmed).


F3: Multi-tenant chatbot SaaS (MEDIUM)

Host: 85.190.246.164:8080 (Contabo DE, vmi1891772.contaboserver.net)
Severity: MEDIUM. 50 classes, 136,243 vector objects (highest object count in survey), Weaviate v1.23.5 with OpenAI + Cohere generative modules

Named tenants include UK brands: Harrogate Spring (mineral water), Heck (food), Imperial (tobacco/brands), Redline Specialist Cars, Odyssey, and multiple 001-suffixed patterns indicating a tenant-per-class SaaS model. The Dead01 and Deadnorthern_001 classes suggest decommissioned tenants that weren’t deleted.

136,243 objects is the largest corpus volume confirmed in this survey. Weaviate v1.23.5 predates the introduction of authentication as an easily configurable default, making this a legacy deployment that is unlikely to have been hardened.

Disclosure status: Not yet disclosed.


Auth Posture Analysis

Weaviate ships with AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true by default. The operator must explicitly set AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false and configure an auth method (API key, OIDC, or both) to restrict access. This is the same opt-out posture as ChromaDB (pre-0.6) and early Qdrant deployments.
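
An operator could check a deployment's environment against this posture with something like the sketch below. It is an assumption-laden illustration: audit_auth_env is a hypothetical helper, an unset AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED is treated as true following the default described above, the set of accepted truthy spellings is assumed, and only the API key and OIDC switches are inspected.

```python
TRUTHY = {"true", "1", "on", "enabled"}  # assumption: accepted truthy spellings


def audit_auth_env(env):
    """Return a list of findings for a mapping of Weaviate environment variables."""
    def enabled(name, default):
        return env.get(name, default).strip().lower() in TRUTHY

    findings = []
    # Treated as enabled when unset, per the default described in this section.
    if enabled("AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED", "true"):
        findings.append("anonymous access is enabled")

    apikey = enabled("AUTHENTICATION_APIKEY_ENABLED", "false")
    oidc = enabled("AUTHENTICATION_OIDC_ENABLED", "false")
    if not (apikey or oidc):
        findings.append("no auth method (API key or OIDC) is enabled")
    if apikey and not env.get("AUTHENTICATION_APIKEY_ALLOWED_KEYS", "").strip():
        findings.append("API key auth is enabled but no keys are listed")
    return findings
```

Passing dict(os.environ) from inside the container, or the environment section of a compose file, gives an empty list only when anonymous access is off and at least one auth method is configured.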

The 259 auth-gated instances (37% of the 694 confirmed) represent operators who actively configured auth. That is a better ratio than ChromaDB (0% auth-gated in both NuClide surveys), but the majority (435, or 63%) remain unauthenticated.
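
The ratios follow directly from the counts in the Findings Summary; a short check of the arithmetic, using only figures reported in this survey (the share helper is hypothetical):

```python
def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


confirmed, auth_gated, fully_open = 694, 259, 435
populated, populated_openai = 344, 169

# The two auth buckets partition the confirmed set.
assert auth_gated + fully_open == confirmed

gated_pct = share(auth_gated, confirmed)               # 37.3
open_pct = share(fully_open, confirmed)                # 62.7
populated_of_open = share(populated, fully_open)       # 79.1
openai_of_populated = share(populated_openai, populated)  # 49.1
```

Roughly four in five open instances hold data, and about half of those populated instances also have the OpenAI module active.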

OpenAI key exposure path: Weaviate’s /v1/modules/text2vec-openai endpoint returns module configuration including whether a key is configured (the key value itself is not returned in the config endpoint per Weaviate’s design). However, any unauthenticated caller on a Weaviate instance with text2vec-openai active can issue embedding and generative queries that consume the operator’s OpenAI API quota at the operator’s expense. This is effectively LLM compute theft via the semantic search layer, not direct key extraction.


Discovery Context

Survey conducted 2026-05-09 as part of the NuClide Research vector database exposure series. Method: Shodan pull on http.html:"weaviate" port:8080, followed by an asyncio probe with per-endpoint timeout enforcement (2s connect / 4s read / 8s host deadline, 80 concurrent). Total probe time: 12 seconds for 852 hosts.

Port 8080 is Weaviate’s default through v1.x. Weaviate Cloud Service and newer managed deployments front on 443; the direct-8080 exposure set represents self-hosted instances, the majority of which are not behind a reverse proxy with auth.

Companion surveys: chromadb-tier2-cloud-survey-2026-05.md, milvus-cloud-survey-2026-05.md, milvus-tier2-cloud-survey-2026-05.md.