Corpus index built 2026-06-07 23:58 UTC

§ THE STACK / GATEWAY LAYER

Rerankers

Cohere, Jina, BGE, Infinity reranker

Routes the request, attaches retrieved context, mediates between user and model.

What it is

A reranker is the quality filter that sits between vector retrieval and the LLM. The vector DB returns its top candidates (say, the top 50 documents) fast but loosely; the reranker re-scores them with a cross-encoder model, much smaller than the LLM, that reads each document together with the query and reorders the set by relevance. Cohere Rerank (managed) and Jina Reranker (open source) are the two most common; BGE-Reranker (BAAI) is the strong open default; Infinity serves rerankers alongside its embeddings. Most production RAG stacks have a reranker in the middle, and most teaching examples skip it entirely.
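The retrieve-then-rerank step described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `overlap_score` is a hypothetical toy scorer (plain term overlap) standing in for a real cross-encoder, and `rerank`, `score_fn`, and `top_n` are names chosen here for the example.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy stand-in for a cross-encoder (assumption, not a real model):
    the fraction of distinct query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: int = 3,
) -> List[Tuple[float, str]]:
    """Score every (query, document) pair, then keep the top_n by score.

    The sort is stable, so ties keep the order the retriever returned.
    """
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Candidates as a vector DB might return them: fast, loosely ordered.
candidates = [
    "Vector databases return approximate nearest neighbours quickly.",
    "A reranker scores each document against the query.",
    "Bananas are rich in potassium.",
]
results = rerank("how does a reranker score a document", candidates, top_n=2)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

In a real stack, `score_fn` would wrap a call to a cross-encoder model or a hosted rerank API; the surrounding score-sort-truncate logic stays the same.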

What goes wrong

Reranker servers ship the same way embedding servers do: an OpenAI-style HTTP API with no authentication, on the assumption that only the upstream RAG pipeline will call them. When exposed, they leak two things: (1) the model identifier, which signals how much the operator has invested in RAG, and (2) the queries the operator is processing, since some servers log recent inputs to a status endpoint for debugging. The query log is the more damaging leak because queries often contain the original user prompt verbatim.

How we test

We probe /v1/rerank for the version banner and /v1/models for the model inventory. We do not submit reranking workloads. Where a debug or status endpoint exposes recent traffic, we capture only the count and timing, not the query content. Together, the model identifier and traffic profile characterise how mature the operator’s RAG deployment is without our ever reading a query.
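The data-minimisation rule above (keep the model identifier, count, and timing; discard query content) might look like the following sketch. It works on already-fetched JSON and makes no network calls. The `/v1/models` shape assumed here is the common OpenAI-style list (`{"data": [{"id": ...}]}`); the status payload shape, including the `recent_requests`, `timestamp`, and `query` field names, is entirely hypothetical, since debug endpoints differ from server to server.

```python
from typing import Any, Dict, List, Optional


def model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Extract model identifiers from an OpenAI-style model list.

    Assumes the shape {"data": [{"id": "..."}, ...]}; entries without
    an "id" are skipped rather than raising.
    """
    return [entry["id"] for entry in models_response.get("data", []) if "id" in entry]


def traffic_profile(status_payload: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Reduce a debug/status payload to count and timing only.

    "recent_requests" and "timestamp" are hypothetical field names.
    Query text is never read or copied into the result.
    """
    entries = status_payload.get("recent_requests", [])
    timestamps = sorted(e["timestamp"] for e in entries if "timestamp" in e)
    return {
        "count": len(entries),
        "first_seen": timestamps[0] if timestamps else None,
        "last_seen": timestamps[-1] if timestamps else None,
    }


# Illustrative inputs only; not captured from any real server.
models = {"object": "list", "data": [{"id": "BAAI/bge-reranker-base", "object": "model"}]}
status = {
    "recent_requests": [
        {"timestamp": 1700000060.0, "query": "user prompt text"},
        {"timestamp": 1700000000.0, "query": "another user prompt"},
    ]
}

inventory = model_ids(models)
profile = traffic_profile(status)
print(inventory, profile)
```

Building the profile from an explicit allow-list of fields, rather than deleting the query field from a copy of the payload, means an unexpected extra field carrying user text cannot slip through.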

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Queued

We haven't surveyed this category yet. The technology is on our map; the receipts will follow when the cross-cloud survey lands. Browse the research feed for what's already published, or watch this page.