Corpus index built 2026-06-07 23:58 UTC

§ THE STACK / OBSERVABILITY & SAFETY

LLM Observability

Langfuse, Helicone, LangSmith, Phoenix

Tracing, evaluation, and policy enforcement around the entire stack.

What it is

Once an operator runs an LLM in production, they need to see what it is doing. LLM observability platforms record every prompt, completion, tool call, and retrieved document, with token-cost and latency overlays. Langfuse is the leading open-source, self-hostable option; Helicone is the proxy-based one; LangSmith (from LangChain) is the SaaS option; Arize AI’s Phoenix is the open-source agent development and evaluation platform; Lunary sits in the same space. Together they are the AI equivalent of Datadog: the system of record for everything the model has done.
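To make the shape of that record concrete, here is a minimal sketch of what a single trace might hold. The class and field names (TraceRecord, ToolCall, cost_usd) are illustrative assumptions and do not correspond to any specific platform's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict  # stored verbatim, including anything sensitive passed in


@dataclass
class TraceRecord:
    """Hypothetical trace record: one model interaction and its overlays."""
    prompt: str
    completion: str
    tool_calls: list[ToolCall] = field(default_factory=list)
    retrieved_documents: list[str] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0

    def cost_usd(self, usd_per_1k_input: float, usd_per_1k_output: float) -> float:
        """Token-cost overlay: price each token class per 1,000 tokens."""
        input_cost = (self.input_tokens / 1000) * usd_per_1k_input
        output_cost = (self.output_tokens / 1000) * usd_per_1k_output
        return input_cost + output_cost
```

Every field here is stored as received, which is what makes the record useful for debugging and, equally, what makes the store sensitive.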

What goes wrong

The trace store is the operator’s most sensitive AI artefact. It contains every customer prompt verbatim (often including customer PII), every retrieved document (often drawn from the operator’s private corpus), and every tool call with full arguments (often including credentials in plain text). Langfuse ships with a project-key model, which operators sometimes bypass by enabling the public-projects feature for “share a trace with my colleague” workflows and then forgetting to disable it. From then on, the traces are indexable and crawlable by default.
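As an illustration of how credentials end up in tool-call arguments, and how an operator might mask them before a trace is written, here is a minimal sketch. The key-name pattern and the redact_arguments helper are hypothetical; this is a heuristic, not a feature of any platform named above.

```python
import re

# Hypothetical key-name heuristic; a real deployment would tune this list.
SENSITIVE_KEY = re.compile(
    r"(api[_-]?key|token|secret|password|authorization)", re.IGNORECASE
)


def redact_arguments(arguments: dict) -> dict:
    """Return a copy of tool-call arguments with credential-like values masked."""
    redacted = {}
    for key, value in arguments.items():
        if SENSITIVE_KEY.search(str(key)):
            # Mask the whole value, even if it is a nested structure.
            redacted[key] = "[REDACTED]"
        elif isinstance(value, dict):
            # Recurse into nested dictionaries such as HTTP headers.
            redacted[key] = redact_arguments(value)
        else:
            redacted[key] = value
    return redacted
```

Key-name matching only catches secrets that are labelled as such; a credential embedded in a URL or a free-text argument would pass through unchanged.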

How we test

We probe /api/public/projects and /api/public/traces for the trace inventory; the response shape confirms Langfuse and reveals project names along with first-seen and last-seen timestamps. We never read trace bodies. Project names help attribute the operator (typical names are “customer-support-prod”, “sales-enrichment”, and so on), and the date range characterises the corpus volume. Trace counts in the millions on a single project warrant priority disclosure.
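The metadata-only handling described above can be sketched as follows. The payload shape (a "data" list with name, firstSeen, lastSeen, and traceCount keys) and the one-million cut-off are assumptions made for illustration, not Langfuse's documented schema or our exact threshold.

```python
from dataclasses import dataclass

# Assumed cut-off for "in the millions"; the real threshold is a judgment call.
PRIORITY_TRACE_COUNT = 1_000_000


@dataclass(frozen=True)
class ProjectSummary:
    name: str
    first_seen: str
    last_seen: str
    trace_count: int
    priority: bool


def summarise_projects(payload: dict) -> list[ProjectSummary]:
    """Keep only inventory metadata; every other field is dropped unread."""
    summaries = []
    for item in payload.get("data", []):
        count = int(item.get("traceCount", 0))
        summaries.append(
            ProjectSummary(
                name=str(item.get("name", "")),
                first_seen=str(item.get("firstSeen", "")),
                last_seen=str(item.get("lastSeen", "")),
                trace_count=count,
                priority=count >= PRIORITY_TRACE_COUNT,
            )
        )
    return summaries
```

Copying an explicit allow-list of fields, rather than filtering out known-sensitive ones, means an unexpected body field in the response is never retained.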

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Cross-cloud surveys

19
Survey Jun 7, 2026

The Auth-on-Default Landscape of OSS AI/LLM Infrastructure

Two-day population survey across 13 OSS AI/LLM infrastructure platforms reveals a maintainer-culture-axis split between demo-first defaults (auth-permissive, 70-91% open) and enterprise-customer-first defaults (auth-required, 0-1%). The cohort is not jurisdiction-defined. Insight #76 scope-bounded to platform class; LLM02 Sensitive Information Disclosure is the dominant finding class; the Capitol.ai escalation demonstrates the maintainer-default failing at enterprise-SaaS scale; in-flight attacker /proc/self/environ activity directly observable on OpenHands instances.

Read →
Survey Jun 6, 2026

Langfuse Population Survey — 816/918 Open Registration (88.9%)

Langfuse is an open-source LLM observability platform (trace ingestion, prompt analytics, evaluation tooling for production AI applications). 1,141 Shodan-indexed instances on "Langfuse" port:3000. 91…

Read →
Survey Jun 6, 2026

Arize Phoenix Population Survey — 41/55 Unauthenticated Project Disclosure

Arize Phoenix (github.com/Arize-ai/phoenix) is an open-source LLM observability and tracing platform — span ingestion, project organization, dataset versioning, prompt management for production AI app…

Read →
Survey May 28, 2026

AI Evaluation and Red-Team Platform Survey — Promptfoo Population Pass

Promptfoo is the only AI eval/red-team platform in the 13-platform scope that produced confirmed unauthenticated exposure at scale. Four instances returned {"email":null} on GET /api/user/email with e…

Read →
Survey May 19, 2026

AI Cost / Billing / Usage Analytics population survey: Langfuse secret-key exposures + Dokploy frontend-secret leak class

The AI cost / billing / usage analytics tier sits at the intersection of LLM operations and finance: it tracks per-tenant token usage, attaches dollar amounts to model calls, and surfaces usage to ope…

Read →
Survey May 12, 2026

AI observability tier, Phase 2 synthesis (cross-cuts + version-deltas)

NuClide Research · 2026-05-12

Read →
Survey May 11, 2026

VisorBishop loop-iteration #1: Re-sweep all Phase 1 corpora, surface gaps

NuClide Research · 2026-05-11

Read →
Survey May 11, 2026

VisorBishop: Phase 3 meta-fingerprinter for the AI observability tier

NuClide Research · 2026-05-11

Read →
Survey May 10, 2026

Helicone deep-dive: Phase 2 (default ClickHouse exposure on benchmarkit.solutions)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

Helicone LLM-observability population survey (21-host self-hosted population)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

Langfuse deep-dive: Phase 2 (source audit + latent primitives + extended IP-shadow)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

Langfuse LLM-observability population survey (1,333-host population, 0% unauth)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

LangSmith deep-dive: Phase 2 (customer identity disclosure on 19 enterprise operators)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

LangSmith LLM-observability population survey (27-host self-hosted population)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

AI observability tier: Small platforms population sweep (Lunary, OpenLIT, Pezzo)

NuClide Research · 2026-05-10

Read →
Survey May 10, 2026

AI observability tier: Cross-platform synthesis (Phase 1)

NuClide Research · 2026-05-10

Read →
Survey May 7, 2026

Agent frameworks cross-survey, planning + dork catalog (2026-05-07)

NuClide Research · 2026-05-07

Read →
Survey May 6, 2026

Langfuse cross-survey-correlation single-host case study (2026-05-06)

NuClide Research · 2026-05-06

Read →
Survey May 4, 2026

LLM Observability + Training Telemetry: Auth Posture Survey

Mass-scan of port 6006 (Arize Phoenix default + TensorBoard default) across 76 tier-2 cloud /16 ranges (3.55M IPs). 4,314 port-open candidates → 9 confirmed AI/ML observability instances (after filter…

Read →