What it is
Once an operator runs an LLM in production, they need to see what it is doing. LLM observability platforms record every prompt, completion, tool call, and retrieved document, with token-cost and latency overlays. Langfuse is the leading open-source, self-hostable option; Helicone is the proxy-based one; LangSmith (from LangChain) is the SaaS option; Arize AI’s Phoenix is an open-source agent development and evaluation platform; Lunary sits in the same space. Together they are the AI equivalent of Datadog: the system of record for everything the model has done.
What goes wrong
The trace store is the operator’s most sensitive AI artefact. It contains every customer prompt verbatim (often including customer PII), every retrieved document (often drawn from the operator’s private corpus), and every tool call with full arguments (often including credentials in plain text). Langfuse ships with a project-key model, but operators sometimes bypass it by enabling the public-projects feature for “share a trace with my colleague” workflows and then forgetting to disable it. From that point on, the traces are indexable and crawlable by default.
How we test
We probe /api/public/projects and /api/public/traces for the trace
inventory; the response shape confirms Langfuse and reveals project names
along with first-seen and last-seen timestamps. We never read trace bodies.
Project names help attribute the operator (most look like
“customer-support-prod”, “sales-enrichment”, etc.), and the date range
characterises the corpus volume. Trace counts in the millions on a single
project warrant priority disclosure.
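
The triage step described above can be sketched as follows. This is a minimal illustration, not the actual tooling: the field names (name, firstSeen, lastSeen, traceCount), the "data" envelope, and the one-million threshold are all assumptions made for the example, and a real response may be shaped differently. The sketch copies only whitelisted metadata fields, so trace bodies are never touched even if they are present in the payload.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: "in the millions" is read as one million or more traces.
PRIORITY_TRACE_COUNT = 1_000_000

# Hypothetical field names, used for illustration only.
METADATA_FIELDS = ("name", "firstSeen", "lastSeen", "traceCount")


@dataclass(frozen=True)
class ProjectSummary:
    name: str
    first_seen: datetime
    last_seen: datetime
    trace_count: int

    @property
    def active_days(self) -> int:
        # Whole days between the first-seen and last-seen timestamps.
        return (self.last_seen - self.first_seen).days

    @property
    def priority(self) -> bool:
        return self.trace_count >= PRIORITY_TRACE_COUNT


def _parse_ts(value: str) -> datetime:
    # Accept a trailing "Z", which older Python versions do not parse.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarise(payload: dict) -> list[ProjectSummary]:
    """Build per-project summaries from an already-parsed JSON payload."""
    summaries = []
    for entry in payload.get("data", []):
        # Copy only the whitelisted metadata; anything else is ignored.
        meta = {k: entry[k] for k in METADATA_FIELDS if k in entry}
        if len(meta) != len(METADATA_FIELDS):
            continue  # skip entries missing any expected field
        summaries.append(
            ProjectSummary(
                name=str(meta["name"]),
                first_seen=_parse_ts(meta["firstSeen"]),
                last_seen=_parse_ts(meta["lastSeen"]),
                trace_count=int(meta["traceCount"]),
            )
        )
    # Largest projects first, so priority cases surface at the top.
    return sorted(summaries, key=lambda s: s.trace_count, reverse=True)
```

Keeping the function free of network calls means the same triage logic can be exercised against recorded responses, and the whitelist makes the "never read trace bodies" rule a property of the code rather than a convention.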