Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All research

Survey May 11, 2026

VisorBishop Phase 5: Three primitives that turn 492 critical hosts into an impact narrative

NuClide Research · 2026-05-11

Summary

Phase 3 (iter-1..6) built the prober coverage. Phase 4 built the dashboard. Phase 5 turns the inventory into impact, three primitives that derive second-order findings from data the cumulative corpus already contains, without launching new sweeps.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, S7068, S7070, S7075, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, S7067, T5868, T5882
  • overlap (Common AI KSATs (all 5 roles)): K108, K1158, K1159, K22, K6311, K6900, K6935, K7003

The three:

  1. Cross-platform operator correlation, which operators run multiple unauthenticated platforms? Same IP, /24, or org.
  2. MLflow artifact-URI extraction, what cloud buckets do the 120 critical MLflow hosts point at? Names, providers, cross-host reuse.
  3. LiteLLM model-catalog spend-tier classifier, what’s the dollarized cost-at-risk across the 283 LLMjacking primitives, assuming attacker abuse at provider rates?

All three run against the existing corpus. No new network traffic, no Shodan credits burned. The data was there; it just hadn’t been mined yet.

#1: Cross-platform operator correlation

Result: 1 IP and 23 hosting orgs run multiple unauth platforms.

Same-IP cross-platform finding (the actual chain)

IPCountryOrgPlatformsTargets
78.46.88.7DEHetzner Online AGLiteLLM + Phoenixhttp://78.46.88.7:80, https://78.46.88.7:443, http://78.46.88.7:4000

One Hetzner customer runs both LiteLLM (LLMjacking primitive) and Phoenix (trace disclosure) on the same machine. Attack chain: attacker discovers Phoenix unauth at port 80/443, reads the trace history (which contains prompts + responses + sometimes API keys), then turns to port 4000 and uses the LiteLLM proxy to burn the operator’s LLM budget while sending prompts that match the operator’s normal usage pattern (extracted from the Phoenix traces). The trace-extracted prompts make the abuse traffic indistinguishable from legitimate operator usage until the bill arrives.

Same-/24 cross-platform: zero

No /24 network shows multiple distinct IPs running different unauth platforms. Operators consolidate on single hosts; they don’t shard AI infrastructure across an internal subnet.

Same-org cross-platform: the hosting-tier pattern

OrgIP countPlatformsBreakdown
Google LLC58LiteLLM + MLflow + Phoenixmlflow=34, phoenix=17, litellm=7
Hetzner Online GmbH57LiteLLM + MLflow + Phoenixlitellm=46, mlflow=9, phoenix=4
Microsoft Corporation28LiteLLM + MLflow + Phoenixmlflow=12, litellm=11, phoenix=6
DigitalOcean, LLC26LiteLLM + MLflow + Phoenixlitellm=12, mlflow=8, phoenix=6
Contabo GmbH26LiteLLM + MLflow + Phoenixlitellm=23, phoenix=3, mlflow=1
Microsoft Limited (UK)12LiteLLM + MLflow + Phoenixphoenix=5, litellm=4, mlflow=4
OVH SAS11LiteLLM + MLflow + Phoenixlitellm=8, mlflow=2, phoenix=1
Aliyun Computing10LiteLLM + Phoenixlitellm=6, phoenix=4

This isn’t “same operator”. These are independent customers of the same hosting provider. The signal is structural: each provider’s customer base reproduces the same shipping-default pattern at population scale. Google Cloud’s MLflow tilt vs Hetzner’s LiteLLM tilt reflects the deployment shape that each provider’s tooling makes easiest.

The chains aren’t at the operator level here. They’re at the hosting-org level, where a single subnet sweep across one provider’s IP range will surface a known mix of unauth platforms.

#2: MLflow artifact-URI extraction

Result: 901 artifact URIs across 120 critical hosts → 58 unique cloud buckets surfaced.

The MLflow REST API at /api/2.0/mlflow/experiments/search returns each experiment’s artifact_location field. For 120 unauth hosts, that’s the operator’s pointer at where the model artifacts and datasets actually live.

Provider distribution

ProviderURI count
local-fs (file:// or /path)486
AWS S3157
unknown (custom or empty)110
HTTP (proxied storage)66
Google Cloud Storage51
Azure Blob23
Databricks DBFS4
SFTP4

40% of MLflow critical hosts store artifacts on local disk. Which means the artifacts ride on the same host as the unauth tracking server. If the artifact-path is queryable (the older CVE-2023-1177 path-traversal class), the artifacts are reachable from the same internet entry point.

The other 60% point at cloud buckets. A separately disclosed surface that may be public independently of the MLflow server.

Top buckets by occurrence

ProviderBucketURIsHostsNotes
aws-s3mlflow3111Generic name, likely many ops
aws-s3bia-mlflow2022 different IPs, same operator
aws-s3flow-bucket101DigitalOcean NL operator
aws-s3maiaddy-mlflow-artifacts101Mediloca / maiaddy healthcare
aws-s3authenta-mlflow101AWS US East
aws-s3mlflow-artifacts-631835340943101AWS account 631835340943 (Oishii vertical farming)
aws-s3svdeepflow-mlflow-artifacts-dev933 IPs in AWS Seoul, same operator redundant infra
aws-s3miles-mlflow-models91Hetzner DE operator
aws-s3hia-dev-s3-ew1-hes-ai-00091AWS Sweden — likely HES / Hes-AI
aws-s3broadcast-ai91GCP US central
gcsjrg-mlflow-artifacts81GCS bucket
azure-blobmlflowartifacts@riskmodelstorage.blob.core.windows.net51Risk modeling, Azure
aws-s3gmb-ddal-test-oslo-shared41University of Oslo
aws-s3mlperf-mlflow-artifacts41IBM Cloud, MLPerf benchmark
gcsaircheck-mlflow-tracking41GCP
gcskurious-ml-dev41GCP dev bucket
gcsavaloka-mlflow-artifacts41GCP
azure-blobmlflowartifacts@mvpusstorage.blob.core.windows.net41Azure

Cross-operator bucket reuse

Only one bucket name appears on multiple distinct hosts in a way that suggests same-operator-redundant-infrastructure (vs different operators happening to share a name):

  • svdeepflow-mlflow-artifacts-dev on 3 IPs in AWS Seoul region. Same operator running redundant tracking servers pointed at the same bucket. Disclosure target: one entity, three servers.

The 78-host “local-fs” set isn’t real cross-operator overlap. Every operator with file:// artifacts has their own local-fs.

Disclosure implications

The 58 unique buckets are second-order disclosure surface. Each bucket name is searchable: if the operator’s bucket policy allows public list/get, the artifacts are exposed independently of the MLflow server’s auth state. Worth a follow-up pass to check public accessibility of each named bucket, but out of scope for this Phase 5 primitive.

#3: LiteLLM model-catalog spend-tier classifier

Result: 283 unauth LiteLLM hosts represent ~$60,494/month ($725,927/year) in cost-at-risk at provider posted rates.

Methodology

Each LiteLLM critical host’s /v1/models response was already captured in the VisorBishop indicators payload (full model_ids when ≤20 models, sample of 20 otherwise). For each model:

  1. Match the model ID against a pricing table covering Anthropic Claude, OpenAI GPT/o-series/embeddings, Google Gemini, AWS Bedrock, Meta Llama, DeepSeek, Moonshot Kimi, Mistral, Cohere, xAI Grok, and Ollama-passthrough (free, signaled by deployment).
  2. Classify into a tier (frontier ≥$15/Mtok, high $3-15, mid $0.5-3, low <$0.5, free self-hosted).
  3. Compute blended $/Mtok across all distinct models on the host.
  4. Apply a conservative 10M tokens/model/month abuse rate (attacker burns one model at 10M tokens for a month). Real attacker abuse is faster. Bots can hit a frontier model at 100M+ tokens/month before detection, so this is a lower-bound estimate.

Tier distribution across all 1,127 exposed models

TierCount%Definition
frontier13311.8%≥$15/Mtok (GPT-4/5, Claude Opus, o1/o3)
high26823.8%$3-15/Mtok (Claude Sonnet, GPT-4o, Gemini Pro)
mid31027.5%$0.5-3/Mtok (Claude Haiku, Gemini Flash, Mistral)
low37333.1%<$0.5/Mtok (GPT-3.5, Llama 8B, embeddings)
free433.8%Self-hosted (Ollama, TGI passthrough)

Provider distribution (top 10)

ProviderModels exposed
(unknown / custom model names)290
OpenAI207
Anthropic201
Google Gemini120
Alibaba (Qwen)99
DeepSeek45
Ollama (self-hosted)43
Moonshot Kimi30
Meta Llama27
Mistral26

Top 10 operators by cost-at-risk

$/mo (conservative)TargetTierModelsFrontier sample
$4,68020.2.91.83:80 (Azure HK)frontier57gpt-5.5, gpt-5.4-pro, gpt-5.4
$2,64098.149.54.126:4100 (Charter US residential)frontier22claude-opus-4-1, claude-opus-4-1-20250805
$1,89895.216.95.1:4000 (Hetzner FI / netiva.com.tr)frontier11gpt-5.4, gpt-5.5, gpt-5.4-mini
$1,854156.227.236.247:4000 (Yisu Cloud JP)frontier18gpt-5-image, gpt-5-image-mini
$1,575118.196.116.53:4000 (China)frontier7gpt-5.2, gpt-5.3-codex
$1,500103.29.190.142:4000 (Indonesia)frontier6chatgpt/gpt-5.4, chatgpt/gpt-5.4-pro
$1,46043.153.66.205:4000 (Tencent US)frontier11gpt-4.1, gpt-5.3-codex
$1,447203.149.11.67:443 (Samart Thailand, api.modelharbor.com)frontier20gpt-5.4, claude-sonnet-4.5
$1,353107.174.66.118:4000 (RackNerd US)frontier30gpt-5, o1, claude-opus-4
$1,24364.227.146.129:4000 (DigitalOcean IN)frontier16gpt-4.1, gpt-4.1-mini

What “cost-at-risk” really means

The $60,494/mo aggregate is the minimum bill the operator’s provider stack would face if every one of these endpoints were abused at 10M tokens/model/month for a single month. Real-world attacker patterns are different:

  1. Concentrated abuse on frontier models, attackers target the highest-cost models because the burn rate is fastest. The $4,680/mo operator at 20.2.91.83 exposes 57 models including 3 frontier. An attacker hitting just those 3 at 100M tokens/mo would burn ~$6,750/mo on that single host.
  2. Sustained abuse beyond one month, most operators don’t notice the bill anomaly for 1-3 cycles. Cumulative damage over a 90-day undetected window is 3× the monthly number.
  3. Indirect costs, burning the operator’s rate limit, exhausting their Anthropic / OpenAI capacity allocation, triggering provider abuse-investigation that may suspend the operator’s account.

A reasonable real-world projection: multiply the conservative number by 3-10× to estimate actual undetected-abuse-window blast radius. That puts the population-scale annualized risk in the $2-7M/year range.

Outliers worth noting

  • 98.149.54.126 is on Charter Spectrum residential. This is a developer running LiteLLM at home with their personal Anthropic API key, exposing 22 models including frontier Opus. An individual’s monthly Anthropic bill could be five-figures before they notice.
  • api.modelharbor.com (Thailand) reads as a commercial LLM routing product. Every paying customer is sharing an unauth proxy.
  • 9 of the top 10 operators expose frontier-tier models: the abuse-attractive tail is concentrated, not random.

Cumulative impact picture

The three primitives together turn the inventory into an impact narrative:

PrimitiveNumeric findingWhat it means
Cross-platform correlation1 same-IP chainAttacker can chain Phoenix trace-disclosure → LiteLLM LLMjacking on the same host (Hetzner DE)
Artifact-URI extraction58 unique buckets across 120 hostsSecond-order disclosure surface; named buckets like bia-mlflow, maiaddy-mlflow-artifacts, svdeepflow (3× same operator), riskmodelstorage
Spend-tier classifier$60K-$725K conservative / $2M-7M realisticDollarized LLMjacking blast radius across the 283 critical LiteLLM operators

These are the headlines the 492-critical inventory enables.

Reproducibility

All three primitives are deterministic functions of the cumulative corpus. The scripts live at:

~/recon/2026-05-11-phase5/

  • 01-cross-platform-correlation.py: reads the dashboard JSON
  • 02-mlflow-artifact-uris.py: reads the raw iter-7 MLflow JSON
  • 03-litellm-spend-tier.py: reads the raw iter-6 LiteLLM JSON

Outputs:

  • correlations.json: by-IP / by-/24 / by-org operator overlaps
  • mlflow-artifacts.json / mlflow-artifact-buckets.tsv. Bucket inventory
  • litellm-spend-tier.json / litellm-spend-tier.tsv. Per-host spend tier + cost-at-risk

The pricing table in #3 is the only piece that needs maintenance (provider list-prices change quarterly).

What’s next

Phase 5 closes the research arc. The natural next steps:

  1. Public-facing announcement, the cost-at-risk numbers ($725K-$7M/year annualized) are the publishable signal.
  2. Bucket-accessibility pass, query each of the 58 unique buckets surfaced in #2 for public-list / public-get.
  3. Iter-8 platform expansion, postHog LLM analytics, Kubeflow, Airflow ML DAGs.
  4. Disclosure-routing pipeline for the 492 cumulative critical operators.

Cross-references: