RAG Frameworks — Shodan Query Catalog
Generated: 2026-05-27 from pre-survey OSINT pass (15 platforms)
See: data/platform-intel/rag-frameworks-osint-2026-05-27.md for full intel
Already covered (not duplicated here):
- Open WebUI — cat-01 LLM orchestration queries
- Chroma, Qdrant, Weaviate, Milvus — cat-02 vector DB queries
AnythingLLM
Auth default: off (single-user mode ships with password protection disabled)
Exposure class: Workspace documents, chat history, connected LLM API keys, embedded document chunks
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | http.title:"AnythingLLM" port:3001 | App sets <title>AnythingLLM</title>; port 3001 is the default Docker bind | Low |
| secondary | port:3001 http.html:"anythingllm" http.html:"workspace" | Catches instances with non-standard title; workspace string is app-specific | Med |
| identity-probe | GET / → React bundle containing the AnythingLLM string; GET /api/v1/auth → 200 with {"authenticated":true/false} | A 200 from /api/v1/auth carrying the authenticated field indicates single-user mode | n/a |
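
The queries in these tables are plain filter:value terms joined by spaces. A minimal sketch of composing them programmatically, so catalog entries can be generated consistently, is shown below. The helper name `build_query` is hypothetical, and the quoting rule (quote strings, leave integers bare) is an assumption taken from how the rows above are written.

```python
from typing import Iterable, Tuple, Union


def build_query(filters: Iterable[Tuple[str, Union[str, int]]]) -> str:
    """Join (filter, value) pairs into one space-separated query string.

    A list of pairs is used instead of a dict because a query may repeat
    the same filter (for example two http.html terms).
    """
    terms = []
    for name, value in filters:
        if isinstance(value, int):
            # Numeric filters such as port are written bare.
            terms.append(f"{name}:{value}")
        else:
            # Assumption: values never need an embedded double quote.
            if '"' in value:
                raise ValueError(f"unsupported quote in value for {name}")
            terms.append(f'{name}:"{value}"')
    return " ".join(terms)


primary = build_query([("http.title", "AnythingLLM"), ("port", 3001)])
secondary = build_query(
    [("port", 3001), ("http.html", "anythingllm"), ("http.html", "workspace")]
)
print(primary)
print(secondary)
```

The two printed strings correspond to the primary and secondary AnythingLLM rows.
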
RAGFlow
Auth default: default-creds (admin@ragflow.io / admin); login is required, but the credentials are publicly known
Exposure class: Indexed document chunks, knowledge bases, chat history, cross-tenant API tokens (CVE-2024-12880); pre-auth RCE via RPC pickle chain (CVE-2024-12433)
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | http.html:"ragflow" port:80 | RAGFlow serves React SPA on port 80 via nginx; ragflow string in bundle | Med |
| secondary | http.html:"RAGFlow" http.favicon.hash:-1467534538 | Favicon hash avoids port-80 FP flood; RAGFlow sets distinctive favicon | Low |
| tertiary | port:9380 http.html:"ragflow" | Internal Python API port exposed directly in misconfigured deployments | Low |
| identity-probe | GET /api/v1/datasets → 401 with {"code":401,...} when unauthenticated | doc_aggs / parser_config fields in 200 responses confirm RAGFlow | n/a |
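
The identity probe above reduces to a small decision over a status code and a JSON body. The sketch below classifies a response that has already been captured, with no network access. The function names, the result labels, and the sample bodies are all hypothetical. Only the `code`, `doc_aggs`, and `parser_config` field names come from the table.

```python
import json

RAGFLOW_FIELDS = {"doc_aggs", "parser_config"}


def find_keys(node, wanted):
    """Return the subset of `wanted` keys found anywhere in a JSON tree."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                found.add(key)
            found |= find_keys(value, wanted)
    elif isinstance(node, list):
        for item in node:
            found |= find_keys(item, wanted)
    return found


def classify_datasets_response(status: int, body_text: str) -> str:
    """Map a captured /api/v1/datasets response to a coarse label."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return "not-json"
    if status == 200 and find_keys(body, RAGFLOW_FIELDS):
        return "ragflow-confirmed"
    if status == 401 and isinstance(body, dict) and body.get("code") == 401:
        # A JSON 401 envelope alone is a weak signal, so it is only a candidate.
        return "candidate-auth-required"
    return "unknown"


# Hypothetical sample bodies for illustration only.
print(classify_datasets_response(401, '{"code": 401, "message": "x"}'))
print(classify_datasets_response(200, '{"data": [{"parser_config": {}}]}'))
```
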
LightRAG
Auth default: off (open by default; JWT/API key auth is opt-in via env vars)
Exposure class: Knowledge graph nodes/edges, indexed document chunks, document upload queue, Ollama-compatible endpoints open even when API key auth is configured
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:9621 http.html:"LightRAG" | Port 9621 is LightRAG-specific; LightRAG string in Swagger/React UI | Low |
| secondary | port:9621 http.html:"/query" | /query endpoint path appears in Swagger UI loaded on root | Med |
| identity-probe | GET /health → 200 + operational JSON; GET /docs → Swagger listing /query, /documents/upload, /documents/scan | The Swagger path set, plus a references field in query responses, identifies LightRAG | n/a |
PrivateGPT
Auth default: off (auth.enabled: false in default settings.yaml)
Exposure class: Ingested documents, full document text via /v1/ingest/list, chat history, model context
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8001 http.html:"PrivateGPT" | Port 8001 is PrivateGPT default; PrivateGPT in Swagger title | Low |
| secondary | port:8001 http.html:"/v1/ingest" | /v1/ingest/file is PrivateGPT-specific — not in OpenAI API schema | Low |
| identity-probe | GET /docs → FastAPI Swagger with /v1/ingest/file, /v1/ingest/list, /v1/chunks paths | /v1/ingest/* endpoints do not exist in stock OpenAI-compatible servers | n/a |
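
The path check in the identity probe can be run against a saved OpenAPI schema rather than the rendered Swagger page. FastAPI serves that schema at /openapi.json by default, and the sketch assumes the deployment has left that route in place. The function name `ingest_paths` and the trimmed sample schema are hypothetical.

```python
def ingest_paths(openapi: dict) -> list:
    """Return the schema paths under /v1/ingest/, sorted for stable output."""
    paths = openapi.get("paths", {})
    return sorted(p for p in paths if p.startswith("/v1/ingest/"))


# Hypothetical, heavily trimmed schema: only the "paths" keys matter here.
sample_schema = {
    "paths": {
        "/v1/chat/completions": {},
        "/v1/ingest/file": {},
        "/v1/ingest/list": {},
        "/v1/chunks": {},
    }
}

matches = ingest_paths(sample_schema)
print(matches)
print("PrivateGPT-like" if matches else "no ingest paths")
```

An empty result does not rule PrivateGPT out, because a reverse proxy may hide the schema route.
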
txtai
Auth default: off (“default implementation runs via HTTP and is fully open”)
Exposure class: All indexed document embeddings and full text, search query results, workflow pipeline configs
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8080 http.html:"txtai" | txtai string in Swagger/app UI; port 8080 is common, but the txtai string discriminates | Med |
| secondary | port:8080 http.html:"/workflow" http.html:"txtai" | /workflow endpoint is txtai-specific within the FastAPI surface | Low |
| identity-probe | GET /docs → FastAPI Swagger listing /search, /batchsearch, /add, /index, /upsert, /workflow | The /workflow path is the strongest txtai identity signal | n/a |
Cognita
Auth default: off (default compose.env has LOCAL=true, no JWT, placeholder test creds)
Exposure class: RAG document collections, data source credentials (DB connection strings, S3 keys), query results with retrieved chunks
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8000 http.html:"cognita" http.html:"truefoundry" | Both strings appear in the React SPA bundle for Cognita | Low |
| secondary | port:8000 http.html:"/collections" http.html:"cognita" | /collections API path is Cognita-specific within FastAPI surface | Low |
| identity-probe | GET /docs → FastAPI Swagger with /collections, /data_sources, /apps endpoints | truefoundry string in page body confirms operator | n/a |
R2R (SciPhi)
Auth default: off (require_authentication: false in r2r.toml); default admin admin@example.com / change_me_immediately when auth enabled
Exposure class: Document collections, knowledge graph, user list, full system config including connected LLM API keys (when auth disabled)
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:7272 http.html:"r2r" | Port 7272 is R2R-specific; r2r string in API responses | Low |
| secondary | port:7272 http.html:"/v3/" | The /v3/ path prefix is R2R-specific API versioning | Low |
| identity-probe | GET /v3/health → 200 + {"results":{"response":"ok"}}; GET /v3/system/settings → full config dump when auth disabled | Confirms R2R and shows whether auth is disabled | n/a |
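
The health check above can be expressed as a shape match on a captured response. This is a minimal offline sketch: the function name `is_r2r_health` is hypothetical, and tolerating extra fields alongside `results.response` is an assumption, since the table only documents the minimal body.

```python
import json


def is_r2r_health(status: int, body_text: str) -> bool:
    """True when a captured /v3/health response matches the documented shape."""
    if status != 200:
        return False
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(body, dict):
        return False
    results = body.get("results")
    # Extra keys are tolerated; only results.response == "ok" is required.
    return isinstance(results, dict) and results.get("response") == "ok"


print(is_r2r_health(200, '{"results": {"response": "ok"}}'))
print(is_r2r_health(200, '{"status": "ok"}'))
```
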
Kotaemon
Auth default: default-creds (admin / admin on first run)
Exposure class: All uploaded documents and indexed chunks, chat sessions across all users (admin has global view), connected LLM API key configs
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:7860 http.html:"kotaemon" | Port 7860 is Gradio default; kotaemon string in app HTML | Low |
| secondary | port:7860 http.html:"Kotaemon" http.html:"gradio" | Gradio framework string + Kotaemon name; filters Gradio FPs | Low |
| identity-probe | GET / → Gradio login page with kotaemon in HTML title/body; test admin/admin credentials | kotaemon string in the title/body confirms the platform | n/a |
Quivr
Auth default: on (Supabase JWT required by default; AUTHENTICATE=true)
Exposure class: When auth misconfigured: all brains and documents, chat history, Supabase anon key in frontend JS (RLS bypass risk if misconfigured)
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:5050 http.html:"quivr" | Port 5050 is Quivr backend default; quivr string in error/Swagger responses | Low |
| secondary | port:5050 http.html:"second brain" http.html:"quivr" | "Second brain" branding appears in older v1 Quivr deployments | Low |
| identity-probe | GET /docs → FastAPI Swagger; GET /healthz → 200; protected endpoints → 401 on auth-on instances | Look for a NEXT_PUBLIC_SUPABASE_URL leak in frontend JS on port 3000 | n/a |
Danswer / Onyx
Auth default: configurable; AUTH_TYPE=disabled documented option; default first-run requires user registration (no hardcoded creds)
Exposure class: When AUTH_TYPE=disabled: all indexed documents from connected sources (Slack, Google Drive, Confluence, GitHub), connector OAuth tokens, user directory, full chat/search history
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | http.title:"Onyx" port:3000 | Onyx sets page title; port 3000 is nginx frontend | Low |
| secondary | http.title:"Danswer" port:3000 | Legacy branding (pre-2024 rename); still deployed in many instances | Low |
| tertiary | port:3000 http.html:"danswer" http.html:"connector" | "connector" appears in the UI for doc source management; Danswer-specific | Med |
| identity-probe | GET /api/me → 401 (auth enabled) or 200 with user object (auth disabled); GET /api/connector → connector list when auth disabled | The /api/me response distinguishes auth-enabled from auth-disabled instances | n/a |
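
The /api/me probe is a two-way mapping from response to auth state. The sketch below applies it to a response that has already been recorded. The function name and the three labels are hypothetical, and the sample user object is invented for illustration.

```python
def auth_state_from_me(status: int, body: object) -> str:
    """Interpret a captured /api/me response as an auth state label."""
    if status == 401:
        return "auth-enabled"
    if status == 200 and isinstance(body, dict) and body:
        # A non-empty JSON object is treated as the user object.
        return "auth-disabled"
    # Redirects, HTML error pages, and empty bodies are left undecided.
    return "inconclusive"


print(auth_state_from_me(401, None))
print(auth_state_from_me(200, {"id": "hypothetical-user"}))
print(auth_state_from_me(302, None))
```

Keeping an explicit "inconclusive" label avoids counting proxied or redirected hosts as exposed.
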
Verba
Auth default: off (no auth layer; designed for local use)
Exposure class: All documents loaded into Weaviate, chat conversation history, Weaviate/OpenAI API keys in backend environment
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8000 http.html:"Verba" | Verba string in Next.js/FastAPI app HTML; port 8000 | Med |
| secondary | port:8000 http.html:"goldenverba" | PyPI package name goldenverba appears in page source/JS bundles | Low |
| identity-probe | GET / → Next.js app with Verba or goldenverba string | /api/get_components or a similar FastAPI path confirms the Verba backend | n/a |
DocsGPT
Auth default: off (no auth on API endpoints in default deployment)
Exposure class: Indexed documentation, chat history, API keys via /api/get_api_keys (unauthed on vulnerable versions), document upload queue; pre-auth RCE (CVE-2025-0868)
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:5001 http.html:"DocsGPT" | DocsGPT string in React SPA; port 5001 is the Flask backend | Low |
| secondary | port:5001 http.html:"docsgpt" http.html:"conversation" | "conversation" appears in DocsGPT chat UI HTML | Low |
| identity-probe | GET /api/task_status?task_id=test → 200 + JSON with name_job, filename, formats, directory fields; GET /api/get_api_keys → API key list if unpatched | The task response shape is DocsGPT-specific | n/a |
Ragapp
Auth default: off (no auth by design; documented explicit non-feature)
Exposure class: Full admin UI access, all indexed document contents, connected LLM API keys visible in /api/management/config, agentic tool configs, chat history
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8000 http.html:"ragapp" http.html:"/admin" | /admin path + ragapp string in page source; both ship in default build | Low |
| secondary | port:8000 http.html:"RAGapp" http.html:"llamaindex" | LlamaIndex attribution string appears in RAGapp UI/docs | Low |
| identity-probe | GET /admin → 200 unauthenticated admin UI; GET /api/management/config → LLM API key exposure | An unauthenticated /admin confirms the no-auth default | n/a |
Perplexica
Auth default: off (ships with no auth; explicit developer advisory against public exposure)
Exposure class: All search queries and results, connected LLM API keys in config.toml, SearXNG instance configuration, search history
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | http.title:"Perplexica" port:3000 | Perplexica sets page title; port 3000 is Next.js frontend | Low |
| secondary | port:3000 http.html:"Perplexica" http.html:"focusMode" | focusMode is a Perplexica-specific API field that appears in frontend source | Low |
| identity-probe | GET / → HTML with Perplexica title; backend POST :3001/api/search with focusMode field in body is the functional identity probe | focusMode values webSearch/academicSearch/writingAssistant are Perplexica-specific | n/a |
FreedomGPT
Auth default: off (llama.cpp server has no auth; Electron app localhost-only by default)
Exposure class: Chat conversations, loaded model identity, uncensored model outputs (no safety filtering on data submitted)
| Label | Query | Rationale | FP Risk |
|---|---|---|---|
| primary | port:8080 http.html:"FreedomGPT" | FreedomGPT string in Electron-served React app; port 8080 is llama.cpp default | Med |
| secondary | port:8080 http.html:"freedomgpt" http.html:"llama" | Combined signals reduce FP on general llama.cpp deployments | Low |
| identity-probe | GET / → Electron React app with FreedomGPT string; underlying llama.cpp at GET /health or POST /completion | Low expected Shodan population | n/a |