
Specialty Data Layers — Shodan Query Catalog

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/queries/specialty-data-layers-queries

Generated: 2026-05-27 from pre-survey OSINT pass (15 platforms)

See: data/platform-intel/specialty-data-layers-osint-2026-05-27.md for full intel

Platforms covered: ClickHouse, Apache Cassandra, Redis Stack, MinIO, Feast, Hopsworks, Tecton, Feathr, ArangoDB, Neo4j, Apache Kafka REST Proxy, Apache Flink, Spark History Server, Trino/Presto, Delta Sharing Server.


ClickHouse

Auth default: off (default user ships with empty password; pre-DeepSeek-incident Docker images leave the default user network-accessible with no password)

Exposure class: Full SQL access to all databases; system tables expose env vars, query logs, schema, and LLM request/response logs stored in AI stacks. SELECT * FROM system.environment dumps environment variables including any secrets.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8123 "x-clickhouse-server-display-name" | Vendor-unique HTTP response header; present on all ClickHouse HTTP responses | Low |
| secondary | http.title:"ClickHouse" port:8123 | Title fingerprint on the /play web SQL console | Low |
| tertiary | port:8123 "X-ClickHouse-Format" | Another vendor-unique response header set on query responses | Low |
| metrics | port:9363 "clickhouse_" | Prometheus metrics endpoint; no auth, leaks table counts and query rates | Low |
| identity-probe | GET /?query=SELECT+1 → 1\n + x-clickhouse-server-display-name header | Confirms unauthenticated SQL execution + leaks internal hostname | |
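The identity probe above can be evaluated offline once a response has been captured from a host you are authorized to assess. The following is a minimal sketch, not part of the original catalog; the function name `looks_like_open_clickhouse` is hypothetical, and it assumes the headers arrive as a plain dict and the body as decoded text.

```python
from typing import Dict, Optional, Tuple


def looks_like_open_clickhouse(
    headers: Dict[str, str], body: str
) -> Tuple[bool, Optional[str]]:
    """Check a captured GET /?query=SELECT+1 response.

    Returns (confirmed, display_name). The display name is the value of the
    x-clickhouse-server-display-name header, if present.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    display_name = lowered.get("x-clickhouse-server-display-name")
    # An executed SELECT 1 comes back as the single value plus a newline.
    executed = body == "1\n"
    return (display_name is not None and executed, display_name)
```

Requiring both the header and the literal `1\n` body keeps an authentication error page that happens to carry the header from being counted as open.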

Apache Cassandra

Auth default: off (authenticator: AllowAllAuthenticator is the yaml default; no credentials required)

Exposure class: Full CQL read/write to all keyspaces; JMX on 7199 gives full cluster management. AI stacks expose feature vectors, session data, time-series ML features.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:9042 "Apache Cassandra" | CQL handshake banner contains product string | Low |
| secondary | port:9042 "CQL_VERSION" | CQL OPTIONS response field; vendor-unique protocol signal | Low |
| jmx | port:7199 "cassandra" | JMX port, unauthenticated by default; full cluster management access | Med |
| identity-probe | CQL OPTIONS on port 9042 → CQL_VERSION + COMPRESSION keys in response | Confirms Cassandra native protocol; no HTTP endpoint available | |
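Because the CQL probe is a binary exchange rather than HTTP, a short sketch may help show what the OPTIONS request and its SUPPORTED reply look like on the wire. This assumes native protocol v4 framing (9-byte header, string-multimap body); the helper names are hypothetical, and no socket handling is included.

```python
import struct
from typing import Dict, List, Tuple

OPCODE_OPTIONS = 0x05    # client request
OPCODE_SUPPORTED = 0x06  # server reply to OPTIONS
HEADER = ">BBhBi"        # version, flags, stream id, opcode, body length


def build_options_frame(stream_id: int = 0) -> bytes:
    """Build a v4 OPTIONS request: header only, empty body."""
    return struct.pack(HEADER, 0x04, 0x00, stream_id, OPCODE_OPTIONS, 0)


def _read_short(buf: bytes, pos: int) -> Tuple[int, int]:
    (value,) = struct.unpack_from(">H", buf, pos)
    return value, pos + 2


def _read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    length, pos = _read_short(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def parse_supported(frame: bytes) -> Dict[str, List[str]]:
    """Decode a SUPPORTED frame into {option: [values]}."""
    _version, _flags, _stream, opcode, length = struct.unpack(HEADER, frame[:9])
    if opcode != OPCODE_SUPPORTED:
        raise ValueError("not a SUPPORTED frame (opcode 0x%02x)" % opcode)
    body = frame[9:9 + length]
    count, pos = _read_short(body, 0)
    options: Dict[str, List[str]] = {}
    for _ in range(count):
        key, pos = _read_string(body, pos)
        n_values, pos = _read_short(body, pos)
        values = []
        for _ in range(n_values):
            value, pos = _read_string(body, pos)
            values.append(value)
        options[key] = values
    return options
```

A reply whose decoded map contains both `CQL_VERSION` and `COMPRESSION` matches the identity probe described in the table.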

Redis Stack / Redis with Vector Modules

Auth default: off (no requirepass by default; ~60,000 of 300,000+ internet-exposed instances have zero auth per Wiz 2025)

Exposure class: Full key-value read/write including all embeddings, cached API keys, LLM conversation history, feature values. CVE-2025-49844 (RediShell, CVSS 9.9) enables RCE via Lua on no-auth instances.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:6379 "redis_version" | INFO server response contains this field in plaintext; Shodan reads it | Low |
| secondary | port:6379 "+OK" | Redis protocol simple-string success reply (note that PING itself returns +PONG); broad but catches no-auth instances | Med |
| redisinsight | http.title:"RedisInsight" port:8001 | Redis Stack web UI; if exposed, direct GUI access to all keys and data | Low |
| vector-module | port:6379 "ReJSON" "search" | MODULE LIST response fields for Redis Stack with RediSearch + ReJSON loaded | Low |
| identity-probe | INFO server on port 6379 → redis_version, redis_mode, os:Linux fields; MODULE LIST → confirms vector modules | | |
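The INFO reply is line-oriented `key:value` text, so the fields named in the identity probe can be pulled out with a few lines of parsing. This is a minimal sketch over an already-captured reply; `parse_redis_info` and `requires_auth` are hypothetical helper names.

```python
from typing import Dict


def parse_redis_info(raw: str) -> Dict[str, str]:
    """Turn a raw INFO reply into a dict of field name to value."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blanks and section markers such as "# Server".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        # Lines without a colon (e.g. the "$<len>" bulk header) are ignored.
        if sep:
            fields[key] = value
    return fields


def requires_auth(raw: str) -> bool:
    """True when the server answered with a NOAUTH error instead of data."""
    return raw.lstrip().startswith("-NOAUTH")
```

An instance counts as unauthenticated for this catalog's purposes only when `requires_auth` is false and `redis_version` is present in the parsed fields.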

MinIO

Auth default: default-creds (minioadmin:minioadmin Docker default; CVE-2023-28432 leaked root credentials via unauthenticated POST on pre-2023-03-20 releases)

Exposure class: Full S3 API access to all buckets: model weights, training datasets, MLflow artifacts, DVC cache, Kubeflow pipeline artifacts. CVE-2023-28432 (CISA KEV) leaks MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD in plaintext.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.title:"MinIO Console" port:9001 | Console web UI; vendor-unique title fingerprint | Low |
| secondary | "x-minio-deployment-id" port:9000 | Per-deployment UUID header on S3 API port; vendor-unique | Low |
| tertiary | "MinIO" port:9000 http.status:403 | S3 API returns 403 to unauthenticated requests but MinIO header still present | Low |
| health | port:9000 "/minio/health/live" | Health endpoint is unauthenticated; confirms MinIO without login | Low |
| identity-probe | GET /minio/health/live → 200 (no auth); HEAD / → x-minio-deployment-id UUID header; CVE-2023-28432 probe: POST /minio/bootstrap/v1/verify → env vars on unpatched releases | | |

Feast

Auth default: off (auth: type: no_auth is the documented default in feature_store.yaml; TLS also off by default)

Exposure class: Full online feature store read/write: real-time feature values for any entity, RAG document embeddings via /retrieve-online-documents, feature metadata revealing data schema. /push endpoint enables data poisoning.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:6566 "feature_names" | JSON response field from /get-online-features endpoint; vendor-specific schema | Low |
| secondary | port:6566 "feast" | Server or response body contains "feast" identifier | Low |
| identity-probe | POST /get-online-features with {"features":[],"entities":{}} → JSON with metadata.feature_names and results fields | | |
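The Feast probe is confirmed by response shape rather than by a banner, which is easy to check mechanically. A minimal sketch, assuming the response body has already been captured as text; the function name `is_feast_online_response` is hypothetical.

```python
import json


def is_feast_online_response(body: str) -> bool:
    """True when the body has the metadata.feature_names + results shape."""
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON at all (HTML error page, empty body, etc.).
        return False
    if not isinstance(doc, dict):
        return False
    metadata = doc.get("metadata")
    return (
        isinstance(metadata, dict)
        and "feature_names" in metadata
        and "results" in doc
    )
```

Checking for both keys guards against generic JSON error documents that other services on port 6566 might return.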

Hopsworks

Auth default: default-creds (admin@kth.se / admin documented default from official installer and community forums)

Exposure class: Full ML platform access: feature groups, feature views, training datasets, model registry (model binaries + metadata), experiment runs. Complete ML artifact lifecycle exposed.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.title:"Hopsworks" port:8080 | Web UI title fingerprint | Low |
| secondary | "hopsworks" port:8080 http.status:200 | Body/product string on open port | Low |
| identity-probe | GET /hopsworks/auth/login.xhtml → 200 + Hopsworks login form; attempt admin@kth.se/admin → redirects to dashboard on default installs | | |

Tecton

Auth default: on (managed SaaS, API-key required; no open self-hosted deployment model)

Exposure class: N/A; no open-auth exposure documented.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| sweep | "tecton.ai" http.status:200 | Broad sweep for any Tecton-branded surface | High |
| identity-probe | No unauthenticated probe available; all endpoints require API key | | |

Note: Expected near-zero actionable hits. Any findings are likely customer portals or documentation sites, not feature store infrastructure.


Feathr

Auth default: off for Docker sandbox (local development image; Azure AD required for production)

Exposure class: Feature registry read/write: feature definitions, transformation logic (Python UDFs), entity definitions. Business logic exposure even without raw data access.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.title:"Feathr" port:80 | Sandbox UI title | Med |
| secondary | "feathr" "feature_store" http.status:200 | Body discriminator for API responses | Low |
| identity-probe | GET /features → JSON array of feature definitions on sandbox deployments | | |

ArangoDB

Auth default: off (--server.authentication defaults to false; Docker image warns but does not enforce auth)

Exposure class: Full multi-model database access: graph data (knowledge graphs, RAG entity maps), document collections, key-value store. Foxx microservice framework enables RCE if app services are deployed. /_api/database/user lists all databases without auth.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8529 "ArangoDB" | HTTP banner on default port; product name in Server header or body | Low |
| secondary | port:8529 "arango" http.status:200 | API or UI response with arango identifier string | Low |
| version | port:8529 "\"server\":\"arango\"" | Exact JSON field from /_api/version response | Low |
| identity-probe | GET /_api/version → {"server":"arango","license":"community","version":"X.Y.Z"} with no credentials required | | |

Neo4j

Auth default: default-creds (neo4j/neo4j, forced password change on first login); many Docker deployments set NEO4J_AUTH=none for development and expose it externally

Exposure class: Full Cypher query access: knowledge graph entities and relationships, RAG document chunk metadata, ontology definitions. In AI stacks: LLM-extracted entity graphs, concept maps, user behavior graphs for recommendation.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:7474 "neo4j_version" | REST API /db/data/ response field; vendor-unique JSON key | Low |
| secondary | http.title:"Neo4j Browser" port:7474 | Web browser UI title; confirms HTTP-accessible Neo4j | Low |
| bolt | port:7687 "bolt" | Bolt binary protocol fingerprint; not HTTP but Shodan captures banner | Med |
| identity-probe | GET /db/data/ → JSON with "neo4j_version":"X.Y.Z" and "neo4j_edition":"community"; no auth required on NEO4J_AUTH=none instances | | |

Apache Kafka REST Proxy

Auth default: off (REST Proxy default config binds to port 8082 with no auth; "the REST Proxy bypasses broker ACLs when authorization is disabled")

Exposure class: Full topic enumeration: message stream names reveal business process topology. Consumer group access enables message replay. In AI pipelines: real-time feature events, inference requests, training data streams. Native Kafka on 9092 (no auth by default) allows producing/consuming all messages.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8082 "kafka" http.status:200 | REST Proxy default port with Kafka product identifier | Med |
| secondary | port:8082 "KafkaTopicList" | v3 API "kind" field value in /v3/clusters/{id}/topics response | Low |
| tertiary | port:8082 "/topics" | v1/v2 API topic list endpoint appears in banner or response | Med |
| zookeeper | port:2181 "zookeeper" | ZooKeeper unauthenticated; lists full Kafka cluster topology | Med |
| identity-probe | GET /topics → JSON array of topic names (v1/v2); GET /v3/clusters → "kind":"KafkaClusterList" (v3). Unauthenticated on default installs. | | |
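The two REST Proxy probes return differently shaped JSON, so a captured body can be sorted into the v1/v2 or v3 case before any further triage. A minimal sketch; the function name and the returned labels are hypothetical, and an empty array is treated as inconclusive because many unrelated services also return `[]`.

```python
import json
from typing import Optional


def classify_kafka_rest_response(body: str) -> Optional[str]:
    """Label a captured body as a v1/v2 topic list, a v3 cluster list, or None."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    # GET /topics (v1/v2): a JSON array of topic-name strings.
    if isinstance(doc, list) and doc and all(isinstance(t, str) for t in doc):
        return "v2-topic-list"
    # GET /v3/clusters: an object whose "kind" is KafkaClusterList.
    if isinstance(doc, dict) and doc.get("kind") == "KafkaClusterList":
        return "v3-cluster-list"
    return None
```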

Apache Flink

Auth default: off ("the REST endpoint does not authenticate the client by default" per official Flink docs)

Exposure class: Running job visibility, full cluster config (may include Kafka brokers, DB connection strings, AWS credentials via /jobmanager/config), JAR upload enabling RCE. CVE-2020-17518/17519 (arbitrary file write/read) affect unpatched instances <= 1.11.2.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8081 "flink-version" | JSON field from /config endpoint; vendor-unique | Low |
| secondary | http.title:"Apache Flink Web Dashboard" port:8081 | Dashboard HTML title | Low |
| tertiary | port:8081 "/jobs/overview" | API endpoint path in banner or response body | Low |
| identity-probe | GET /config → {"flink-version":"X.Y.Z","flink-revision":"...","features":{...}} with no auth; GET /jobs/overview → running job names | | |
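Since the /config probe returns a version string, it can be compared against the "<= 1.11.2" bound quoted in the exposure notes. The following sketch assumes a captured /config body; both function names are hypothetical, and the check reflects only the bound stated in this catalog, not a full advisory lookup.

```python
import json
import re
from typing import Optional, Tuple


def flink_version(config_body: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a captured /config response."""
    doc = json.loads(config_body)
    if not isinstance(doc, dict):
        return None
    raw = str(doc.get("flink-version", ""))
    # Accept "1.11.2" as well as suffixed forms such as "1.17-SNAPSHOT".
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", raw)
    if match is None:
        return None
    major, minor, patch = (int(p) if p else 0 for p in match.groups())
    return (major, minor, patch)


def at_or_below_1_11_2(version: Tuple[int, int, int]) -> bool:
    """True when the version falls inside the bound quoted in this catalog."""
    return version <= (1, 11, 2)
```

Comparing integer tuples avoids the string-ordering trap where "1.9.0" would sort after "1.11.2".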

Spark History Server

Auth default: off ("security features like authentication are not enabled by default in Apache Spark")

Exposure class: Spark job history including environment variables from all historical jobs: AWS access keys, S3 bucket names, Hive metastore passwords, Databricks PATs stored as Spark config properties. Job names reveal pipeline structure.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:18080 "Spark History Server" | Page title and body text on default UI | Low |
| secondary | http.title:"History Server" port:18080 | Alternate title format used in some Spark versions | Low |
| api | port:18080 "/api/v1/applications" | REST API endpoint path in banner | Low |
| identity-probe | GET /api/v1/applications → JSON array with id, name, attempts; GET /api/v1/applications/{appId}/environment → Spark config including secrets | | |
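When auditing your own History Server, the environment response can be screened for property names that suggest credentials. This is a rough sketch that assumes the response carries a `sparkProperties` list of `[key, value]` pairs; the regular expression and the function name are illustrative assumptions, and only key names are returned so that secret values are not copied into reports.

```python
import json
import re
from typing import List

# Illustrative pattern only; tune it to the naming used in your own stack.
SECRET_KEY_PATTERN = re.compile(
    r"(secret|password|passwd|token|access\.?key|credential)", re.IGNORECASE
)


def sensitive_spark_properties(environment_body: str) -> List[str]:
    """Return the sorted names of Spark properties that look secret-bearing."""
    doc = json.loads(environment_body)
    flagged = []
    for entry in doc.get("sparkProperties", []):
        # Each entry is expected to be a two-element [key, value] pair.
        if len(entry) == 2 and SECRET_KEY_PATTERN.search(entry[0]):
            flagged.append(entry[0])
    return sorted(flagged)
```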

Trino / Presto

Auth default: off ("Trino runs with no security by default"; port 8080, no auth, no TLS in default config)

Exposure class: Full SQL query submission to any connected data source: S3/Delta Lake/Iceberg/Hive. Running query text and results exposed via /v1/query without auth. Cluster topology via /v1/cluster.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8080 "nodeVersion" "Trino" | /v1/info JSON field + product name; discriminates from other port 8080 services | Low |
| secondary | port:8080 http.title:"Trino" | Web UI title (if enabled) | Low |
| presto | port:8080 "presto" "nodeVersion" | Presto (the original Facebook project, from which Trino was forked) uses the same API; catches both products | Low |
| tertiary | port:8080 "/v1/info" "starting" | Specific field from the unauthenticated health endpoint | Low |
| identity-probe | GET /v1/info → {"nodeVersion":{"version":"X.Y.Z"},"environment":"production","starting":false}; unauthenticated health endpoint by design | | |

Delta Sharing Server

Auth default: bearer-token (required by protocol; exposure risk is static/demo tokens with zero expiry, or tokens embedded in documentation used in production)

Exposure class: With valid bearer token: all shared Delta Lake tables, including training datasets, feature tables, model evaluation sets. Token misconfiguration (zero expiry, demo tokens in production) is the primary risk.

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8080 "delta-sharing" | Protocol identifier in response body or headers | Low |
| secondary | port:8080 "/shares" "application/json" | REST API endpoint for listing shares | Med |
| identity-probe | GET /shares with Authorization: Bearer {token} → {"items":[{"name":"..."}]} listing all shares; test doc example token faaie590-f132-4954-8571-d5b5b8 on self-hosted reference server installs | | |