Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All reference

Reference

Terminology Primer

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/reference/terminology

Shorthand for the AI/ML stack as it appears throughout this repository. Skip this page if you already live in it.

TermMeaning
Artifact StoreWhere trained models, weights, embeddings, and datasets actually live on disk. In self-hosted AI, this is almost always MinIO or S3. Compromise here means exfiltrating the model itself, not just the inference endpoint.
Feature StoreCentralized repository of ML features used for both training and serving. Feast and Tecton are the common names. Rare on Shodan, but a positive hit leaks the feature definitions that power production models.
GraphRAG / Knowledge GraphHybrid retrieval architecture combining a vector database with a graph database for relationship-aware retrieval. Neo4j + a vector store is the common pairing.
MCP (Model Context Protocol)2025–2026 standard for connecting LLMs to tools, filesystems, and databases. The protocol was designed for stdio transport but deploys increasingly over HTTP/SSE, which is where exposure happens. See shodan/queries/10-mcp-servers.md.
OLAP + VectorHybrid analytical databases (ClickHouse, Cassandra 5.x+) that handle both classical OLAP queries and semantic vector search. Shows up in analytics-heavy AI stacks.
LLMjacking2024-coined term for attacks that hijack exposed LLM endpoints or leaked API keys to run inference on someone else’s provider bill. MCP servers and exposed LiteLLM proxies are primary targets.
RAG (Retrieval-Augmented Generation)Architecture pattern that retrieves relevant documents from a vector store before passing them to an LLM. RAG stacks accumulate enterprise document sets, see shodan/queries/07-rag-stacks.md.
Specialty Data LayersGeneral-purpose analytic / wide-column / embedded-analytical platforms (ClickHouse, DuckDB, Cassandra/ScyllaDB, Pinot, Druid, StarRocks) that the AI/ML ecosystem co-opts for AI-adjacent work, feature stores, training-data lakes, real-time analytics over inference logs, ML classifier state. Predate the modern AI stack but get wired into it. See reference/category-taxonomy.md.
Tier (T1 / T2 / T3)Exposure tier used throughout this repo. T1 = unauth by default. T2 = auth common but misconfigured or has known bypass CVEs. T3 = recon / fingerprint only, not an immediate finding.