Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← Research library

Insight #6 May 5, 2026

Single-word substring matching is unsound at population scale

A platform fingerprint must require, at minimum: (a) a specific endpoint that the platform alone serves, (b) structured response (JSON parse + named field, or specific HTML title format), (c) anchored keyword match conjoined with (a) and (b).

Evidence

The AI safety eval survey’s bespoke probe (data/aisafety-probe.py) used b"garak" in body.lower() and b"confident" in body.lower() as platform-identification matches.

At 1,017 prefixes, this produced 6 false positives and 0 true positives.

Concrete trace: a personal video clip browser had a file named [F] Garakuta 【Flashアニメ】ガラクタノカミサマ.mp4. Japanese anime title contains “garak”; the broken probe declared the host an exposed Garak deployment.

How to apply

The conjunctive matcher pattern, written verbosely:

def is_platform(probe_response):
    return (
        probe_response.url.endswith("/specific-endpoint-only-this-platform-serves") and
        probe_response.status_code == 200 and
        is_json(probe_response.body) and
        json.loads(probe_response.body).get("specific_field_name") == "expected_value" and
        "anchored_keyword" in probe_response.body
    )

Three conjuncts minimum: endpoint specificity, structural shape, anchored keyword. Any single one alone produces population-scale false positives.

How to apply (broader)

The structural lesson is broader than this one bug:

  • Future surveys should add fingerprints to aimap and run aimap on the cohort, not write per-survey bespoke probes. aimap’s existing fingerprint database has the right matcher schema; the bespoke probes don’t.
  • All AI safety eval fingerprints are now in aimap with this discipline applied.

Source

Captured in case-studies/commercial/SYNTHESIS-2026-05.md and the methodology-correction section of ai-safety-eval-cloud-survey-2026-05.

SOURCE · case-studies/commercial/SYNTHESIS-2026-05.md