What it is
A reranker is the quality filter that sits between vector retrieval and the LLM. The vector DB returns a candidate set (for example, the top 50 documents) quickly but with only loose relevance; the reranker then re-scores those candidates with a small cross-encoder model that reads each document together with the query and reorders them by actual relevance. Cohere Rerank (managed) and Jina Reranker (open source) are the two most common; BGE-Reranker (BAAI) is the strong open default; Infinity serves rerankers alongside its embeddings. Most production RAG stacks have one in the middle, yet most teaching examples skip it entirely.
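The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the names rerank and token_overlap are hypothetical, and token_overlap is a toy stand-in for a real cross-encoder, which would score each (query, document) pair with a model instead of counting shared words.

```python
from typing import Callable, List, Tuple


def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder (assumption for illustration only).

    Returns the fraction of query tokens that also appear in the document.
    """
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float] = token_overlap,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Re-score every candidate against the query and keep the best top_n."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    # Sort by score, highest first; ties keep their retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Pretend these came back from the vector DB, fast but loosely ranked.
    retrieved = [
        "our office is closed on sunday",
        "reset your password in settings",
        "password reset emails expire",
    ]
    for doc, s in rerank("how to reset password", retrieved, top_n=2):
        print(f"{s:.2f}  {doc}")
```

The score parameter is the seam where a real model would plug in: the surrounding logic (score each pair, sort, truncate) stays the same whichever reranker sits behind it.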
What goes wrong
Reranker servers ship the same way embedding servers do: OpenAI-compatible HTTP with no authentication, on the assumption that only the upstream RAG pipeline will ever call them. When one is exposed beyond that pipeline, it leaks two things: (1) the model identifier, which indicates how much the operator has invested in RAG, and (2) the queries the operator is processing, since some servers log recent inputs to a status endpoint for debugging. The query log is the more damaging of the two because queries often contain the original user prompt verbatim.
How we test
We probe /v1/rerank for the version banner and /v1/models for the model
inventory. We do not submit reranking workloads. Where a debug or status
endpoint exposes recent traffic, we record only the count and timing of
requests, never the query content. Together, the model identifier and the
traffic profile characterise how mature the operator’s RAG deployment is
without our ever reading a query.
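The read-only approach above might look roughly like the following sketch. It is illustrative only and makes several assumptions: that /v1/models returns the OpenAI-style shape {"data": [{"id": ...}]}, that a status endpoint (if present) yields a list of entries carrying a numeric "timestamp" field, and that the helper names (http_get_json, model_ids, traffic_profile, probe_models) are hypothetical. Real servers vary, so the field names would need adjusting per target.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List


def http_get_json(url: str, timeout: float = 5.0) -> Any:
    """Plain GET that parses the body as JSON. Nothing is ever submitted."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(models_payload: Dict[str, Any]) -> List[str]:
    """Extract model identifiers from an OpenAI-style /v1/models payload."""
    return [m["id"] for m in models_payload.get("data", []) if "id" in m]


def traffic_profile(
    entries: List[Dict[str, Any]], ts_field: str = "timestamp"
) -> Dict[str, Any]:
    """Reduce status-endpoint entries to count and timing only.

    Only ts_field is read; any query text in an entry is never copied
    into the result. The entry shape is an assumption.
    """
    stamps = sorted(e[ts_field] for e in entries if ts_field in e)
    return {
        "count": len(entries),
        "first_seen": stamps[0] if stamps else None,
        "last_seen": stamps[-1] if stamps else None,
    }


def probe_models(
    base_url: str, fetch: Callable[[str], Any] = http_get_json
) -> List[str]:
    """Fetch the model inventory; fetch is injectable for offline testing."""
    return model_ids(fetch(base_url.rstrip("/") + "/v1/models"))
```

Keeping the reduction in traffic_profile separate from the fetch makes the privacy property easy to check: whatever the status endpoint returns, only a count and two timestamps leave the function.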