What it is
Retrieval-Augmented Generation (RAG) is how an LLM gets access to documents it wasn’t trained on: your company wiki, last week’s invoices, a PDF of your medical history. A RAG pipeline chains a document loader, an embedder, a vector store, a retriever, and the LLM call. Frameworks that package this into one runtime include Dify (the most polished, Chinese-origin), Flowise (a visual builder on top of LangChain), Haystack (Deepset’s enterprise stack), Quivr, and Verba. The pipeline is what turns a model into a product.
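The five stages named above can be sketched in a few lines. This is a toy illustration, not how any of the listed frameworks is implemented: the corpus, the bag-of-words "embedder", and every function name are assumptions made for the example, and the final LLM call is left out so that only the prompt it would receive is built.

```python
import math
import re
from collections import Counter


def load_documents():
    # Loader: a hypothetical in-memory corpus standing in for a wiki or invoices.
    return {
        "wiki/vacation-policy": "Employees accrue vacation days every month.",
        "invoices/2024-03": "Invoice for cloud hosting services, due in thirty days.",
    }


def embed(text):
    # Embedder: toy bag-of-words counts. A real pipeline would call an
    # embedding model here; this stand-in is an assumption for illustration.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(docs):
    # Vector store: a plain list of (source, text, vector) rows.
    return [(source, text, embed(text)) for source, text in docs.items()]


def retrieve(store, question, k=1):
    # Retriever: rank stored rows by similarity to the question, keep the top k.
    query_vector = embed(question)
    ranked = sorted(store, key=lambda row: cosine(query_vector, row[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    # LLM call (omitted): this is the text that would be sent to the model.
    context = "\n".join(f"[{source}] {text}" for source, text, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "How many vacation days do employees get?"
store = build_store(load_documents())
hits = retrieve(store, question)
prompt = build_prompt(question, hits)
```

The point of the sketch is that whatever the retriever selects is pasted verbatim into the prompt, so anyone who can reach the retriever can reach the indexed text.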
What goes wrong
Most RAG deployments are research artefacts that grew into prototypes that
grew into production. Dify ships with admin@admin.com / password as the
seed account; a fresh Flowise install exposes the canvas and every workflow’s
embedded API keys; Haystack’s REST API is unauthenticated by default and
its /query endpoint will dutifully retrieve and return any document the
embedder has indexed. The corpus exposed this way ranges from public PDFs
to attorney-client communications, internal sales decks, and patient
records.
How we test
We probe each framework’s signature endpoints: Dify’s /console/api/setup
for the seed-account state, Flowise’s /api/v1/chatflows for the workflow
catalogue, Haystack’s /query for the reach of the indexed corpus. When the
retriever is reachable, we issue a single low-volume query (e.g. “summary”)
to confirm the corpus contains real content, capture the document titles and
sources from the response, and stop. Title metadata is enough to attribute
the operator and characterise the data class without reading the documents
themselves.
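The "capture titles and sources, then stop" step might look like the sketch below. The response shape (a "documents" list whose items carry "content" and "meta") and the metadata key names are assumptions for illustration; real deployments differ by framework and version. The function name is hypothetical.

```python
def summarise_hits(response):
    """Reduce a retriever response to title and source per document.

    The document body is never copied into the result, so nothing beyond
    metadata is retained. The response layout is an assumed shape.
    """
    summary = []
    for doc in response.get("documents", []):
        meta = doc.get("meta") or {}
        summary.append({
            "title": meta.get("title") or meta.get("name") or "(untitled)",
            "source": meta.get("source") or meta.get("url") or "(unknown)",
        })
    return summary


# Hypothetical response to a single "summary" query.
sample_response = {
    "documents": [
        {
            "content": "full document text that is deliberately not kept",
            "meta": {"title": "Q3 sales deck", "source": "shared-drive/q3.pptx"},
        },
        {"content": "another body", "meta": {"name": "intake-form.pdf"}},
    ]
}

captured = summarise_hits(sample_response)
```

Dropping the body at the parsing step, rather than filtering it out later, keeps the recorded evidence limited to what is needed for attribution and data-class labelling.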