AI Safety & Evals, Observability & Safety, NuClide Stack

What it is

Eval harnesses measure model behaviour against benchmarks: capabilities, biases, refusal patterns, jailbreak resistance. lm-eval-harness (EleutherAI) is the universal capability-eval; EleutherAI’s safety eval forks track refusal/harm rates; NVIDIA Nemo Guardrails and Guardrails AI sit in front of production models and constrain output in real time; Inspect (UK AI Safety Institute) and Anthropic’s evals are the research-grade options. Together they are how a serious AI deployment knows whether the model is doing what it’s supposed to.

What goes wrong

Eval and guardrail systems hold the operator’s threat model: the prompts they consider harmful, the responses they consider unacceptable, the policy they want enforced. When an eval-harness server is exposed unauthenticated, an attacker reads the full set of red-team prompts the operator uses, learns which of those prompts the model currently fails, and gets a precise roadmap to bypass the operator’s guardrails. The exposure of a guardrail configuration is also a disclosure of the policy boundary itself.

How we test

We probe for harness control endpoints (lm-eval’s WebSocket UI, Nemo Guardrails’ REST API on port 8000) and read the policy/eval inventory via the unauthenticated metadata endpoints. We never run new evals. The eval names alone are the disclosure evidence. Names like “jailbreak-bench”, “medical-refusals”, “copyright-output” characterise the operator’s concerns and identify their team without our needing to read prompt bodies.

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Cross-cloud surveys

3

Survey May 29, 2026

LLM Safety / Guardrail survey, 2026-05-29

Five dorks. One confirmed unauthenticated guardrail server, and the guardrail was the least exposed thing on the box. The same host left MongoDB, Redis, MySQL, PostgreSQL, and a Docker registry open w…

Read →

Survey May 28, 2026

LLM Guard survey: guardrail platforms Shodan-dark except /metrics side-channel

Two LLM Guard v0.0.10 instances confirmed from an 11-platform Shodan sweep. Both have auth configured on scan endpoints (/analyze/prompt, /analyze/output, /scan/output). Both expose /metrics without a…

Read →

Survey May 1, 2026

AI Safety Evaluation / Red-Team Self-Hosted: Cross-Cloud Survey (2026-05)

The original probe, data/aisafety-probe.py, used naked single-word substring matching on response bodies (b"garak" in body.lower(), b"deepeval" in text or b"confident" in text). At population scale ac…

Read →

Observability & Safety

Other categories in this layer

LLM Observability

Langfuse, Helicone, LangSmith, Phoenix

Prompt Management

PromptLayer, Promptly, Pezzo, Agenta