What it is
Search engines power both the classic full-text retrieval tier (Elasticsearch, Apache Solr, Vespa) and the modern vector-similarity tier that LLM apps lean on for retrieval-augmented generation (RAG). The line between the two has blurred since 2022: every mainstream engine (Elastic, OpenSearch, Solr 9, Vespa, Meilisearch, Typesense) now ships dense-vector indices alongside its inverted-index core. Many production RAG pipelines store their LangChain or LlamaIndex document chunks in one of these engines rather than in a dedicated vector DB.
What goes wrong
Many official Docker images run without authentication unless the operator opts in. Elasticsearch before 8.0 needs xpack.security.enabled=true (8.0 and later enable security by default unless it is explicitly disabled); Solr needs a security.json that enables the basic-auth plugin; Meilisearch needs a master key set via environment variable. Across population-scale surveys, ~54% of reachable Elasticsearch instances skip the step entirely. Solr’s older Docker tags (solr:7.x and early solr:8.x) compound the problem with remote-code-execution CVEs that need no credentials when auth is off: CVE-2019-17558 (Velocity template injection), CVE-2019-0193 (DataImportHandler), and CVE-2019-12409 (JMX-RMI, which affects Solr 8.1.1 and 8.2.0). Even before any document is read, the data layer discloses the operator’s application schema through index and core names.
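The opt-in step can be audited from a container's environment before anything is exposed. The sketch below is a minimal illustration, not part of any engine's tooling: the helper name auth_setting and its return labels are hypothetical, and it covers only the two engines whose switch is an environment variable (xpack.security.enabled for Elasticsearch, MEILI_MASTER_KEY for Meilisearch). Solr is left out because its switch lives in security.json rather than the environment.

```python
from typing import Dict


def auth_setting(engine: str, env: Dict[str, str]) -> str:
    """Classify a container environment as 'enabled', 'disabled', or 'unset'.

    Hypothetical helper for illustration. 'unset' means the operator made no
    explicit choice, so the effective behaviour depends on the image's own
    default for that version.
    """
    if engine == "elasticsearch":
        value = env.get("xpack.security.enabled")
        if value is None:
            return "unset"
        return "enabled" if value.strip().lower() == "true" else "disabled"

    if engine == "meilisearch":
        # An empty or missing master key leaves the instance unprotected.
        key = env.get("MEILI_MASTER_KEY", "")
        return "enabled" if key.strip() else "unset"

    raise ValueError(f"no environment-based auth switch known for {engine!r}")
```

A result of "unset" or "disabled" on an internet-reachable host is the condition described above.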
How we test
We probe each engine’s identity endpoint (/ for Elasticsearch’s version JSON, /solr/admin/info/system for Solr, /health for Meilisearch and Typesense, /state/v1 for Vespa), confirm the version, and then call the documented listing endpoint (/_cat/indices, /solr/admin/cores, /indexes, /collections). The index and core names are themselves the finding: operators choose names like rag-document-chunks, spring-ai-document-index, entity_vectors, and kb_documents_v1, so the operator’s application architecture is disclosed before any document is fetched. We never run free-text queries against the index; the names alone justify the severity claim.
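The probe order described above can be sketched as a small lookup table plus a URL builder. This is an illustrative outline rather than the actual scanner: the names PROBES, probe_urls, and fetch are hypothetical, the paths are the ones listed in the text, and Vespa has no listing path here because the text names none. The sketch issues only GET requests for metadata and never sends a search query.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Optional, Tuple

# engine -> (identity path, listing path); paths taken from the text above.
PROBES: Dict[str, Tuple[str, Optional[str]]] = {
    "elasticsearch": ("/", "/_cat/indices"),
    "solr": ("/solr/admin/info/system", "/solr/admin/cores"),
    "meilisearch": ("/health", "/indexes"),
    "typesense": ("/health", "/collections"),
    "vespa": ("/state/v1", None),  # no listing endpoint named in the text
}


def probe_urls(base_url: str, engine: str) -> List[str]:
    """Return the identity URL followed by the listing URL, if one exists."""
    identity, listing = PROBES[engine]
    base = base_url.rstrip("/")
    urls = [base + identity]
    if listing is not None:
        urls.append(base + listing)
    return urls


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """GET a URL without credentials and return (status code, body text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 401/403 here usually means the operator did enable auth.
        return err.code, ""
```

For example, probe_urls("http://host:9200/", "elasticsearch") yields the root URL and then the /_cat/indices URL, in that order; a caller would pass each to fetch and stop at the listing response.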