VisorBishop Phase 5: Three primitives that turn 492 critical hosts into an impact narrative
NuClide Research · 2026-05-11
Summary
Phase 3 (iter-1..6) built the prober coverage. Phase 4 built the dashboard. Phase 5 turns the inventory into impact: three primitives that derive second-order findings from data the cumulative corpus already contains, without launching new sweeps.
DCWF KSAT coverage
Auto-derived from DCWF AI work-role rule files (ksat-tag).
- 672 (AI Test & Evaluation Specialist): K7003, K7004, S7068, S7070, S7075, T5904
- 733 (AI Risk & Ethics Specialist): K7040, S7067, T5868, T5882
- overlap (Common AI KSATs (all 5 roles)): K108, K1158, K1159, K22, K6311, K6900, K6935, K7003
The three:
- Cross-platform operator correlation: which operators run multiple unauthenticated platforms? Same IP, /24, or org.
- MLflow artifact-URI extraction: what cloud buckets do the 120 critical MLflow hosts point at? Names, providers, cross-host reuse.
- LiteLLM model-catalog spend-tier classifier: what’s the dollarized cost-at-risk across the 283 LLMjacking primitives, assuming attacker abuse at provider rates?
All three run against the existing corpus. No new network traffic, no Shodan credits burned. The data was there; it just hadn’t been mined yet.
#1: Cross-platform operator correlation
Result: 1 IP and 23 hosting orgs run multiple unauth platforms.
Same-IP cross-platform finding (the actual chain)
| IP | Country | Org | Platforms | Targets |
|---|---|---|---|---|
| 78.46.88.7 | DE | Hetzner Online AG | LiteLLM + Phoenix | http://78.46.88.7:80, https://78.46.88.7:443, http://78.46.88.7:4000 |
One Hetzner customer runs both LiteLLM (LLMjacking primitive) and Phoenix (trace disclosure) on the same machine. Attack chain: attacker discovers Phoenix unauth at port 80/443, reads the trace history (which contains prompts + responses + sometimes API keys), then turns to port 4000 and uses the LiteLLM proxy to burn the operator’s LLM budget while sending prompts that match the operator’s normal usage pattern (extracted from the Phoenix traces). The trace-extracted prompts make the abuse traffic indistinguishable from legitimate operator usage until the bill arrives.
Same-/24 cross-platform: zero
No /24 network shows multiple distinct IPs running different unauth platforms. Operators consolidate on single hosts; they don’t shard AI infrastructure across an internal subnet.
Same-org cross-platform: the hosting-tier pattern
| Org | IP count | Platforms | Breakdown |
|---|---|---|---|
| Google LLC | 58 | LiteLLM + MLflow + Phoenix | mlflow=34, phoenix=17, litellm=7 |
| Hetzner Online GmbH | 57 | LiteLLM + MLflow + Phoenix | litellm=46, mlflow=9, phoenix=4 |
| Microsoft Corporation | 28 | LiteLLM + MLflow + Phoenix | mlflow=12, litellm=11, phoenix=6 |
| DigitalOcean, LLC | 26 | LiteLLM + MLflow + Phoenix | litellm=12, mlflow=8, phoenix=6 |
| Contabo GmbH | 26 | LiteLLM + MLflow + Phoenix | litellm=23, phoenix=3, mlflow=1 |
| Microsoft Limited (UK) | 12 | LiteLLM + MLflow + Phoenix | phoenix=5, litellm=4, mlflow=4 |
| OVH SAS | 11 | LiteLLM + MLflow + Phoenix | litellm=8, mlflow=2, phoenix=1 |
| Aliyun Computing | 10 | LiteLLM + Phoenix | litellm=6, phoenix=4 |
This isn’t “same operator”. These are independent customers of the same hosting provider. The signal is structural: each provider’s customer base reproduces the same shipping-default pattern at population scale. Google Cloud’s MLflow tilt vs Hetzner’s LiteLLM tilt reflects the deployment shape that each provider’s tooling makes easiest.
The chains aren’t at the operator level here. They’re at the hosting-org level, where a single subnet sweep across one provider’s IP range will surface a known mix of unauth platforms.
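The three groupings (same IP, same /24, same org) can be sketched in a few lines. This is a minimal illustration, not the actual 01-cross-platform-correlation.py: the record fields (`ip`, `org`, `platform`) and the sample hosts are assumptions, since the dashboard JSON schema is not shown here. The /24 rule follows the definition above: at least two distinct IPs in the network that together run more than one platform.

```python
from collections import defaultdict
import ipaddress

# Hypothetical records using documentation-reserved IP ranges.
# The real dashboard JSON schema is an assumption here.
hosts = [
    {"ip": "192.0.2.10", "org": "Example Hosting A", "platform": "litellm"},
    {"ip": "192.0.2.10", "org": "Example Hosting A", "platform": "phoenix"},
    {"ip": "198.51.100.5", "org": "Example Hosting A", "platform": "mlflow"},
    {"ip": "203.0.113.7", "org": "Example Hosting B", "platform": "litellm"},
]


def slash24(ip):
    """Return the /24 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def correlate(records):
    by_ip = defaultdict(set)    # ip  -> platforms seen on that IP
    by_net = defaultdict(dict)  # /24 -> {ip: platforms}
    by_org = defaultdict(set)   # org -> platforms seen across its customers
    for r in records:
        by_ip[r["ip"]].add(r["platform"])
        by_net[slash24(r["ip"])].setdefault(r["ip"], set()).add(r["platform"])
        by_org[r["org"]].add(r["platform"])

    same_ip = {ip: sorted(p) for ip, p in by_ip.items() if len(p) > 1}
    # A /24 hit needs two or more distinct IPs and more than one platform overall.
    same_net = {
        net: sorted(set().union(*ips.values()))
        for net, ips in by_net.items()
        if len(ips) > 1 and len(set().union(*ips.values())) > 1
    }
    same_org = {org: sorted(p) for org, p in by_org.items() if len(p) > 1}
    return same_ip, same_net, same_org


same_ip, same_net, same_org = correlate(hosts)
print(same_ip)   # one IP running both litellm and phoenix
print(same_net)  # empty for this sample
print(same_org)  # Example Hosting A spans three platforms
```

Grouping by org conflates independent customers of one provider, which is why the org-level hits above are read as a hosting-tier pattern and not as operator chains.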
#2: MLflow artifact-URI extraction
Result: 901 artifact URIs across 120 critical hosts → 58 unique cloud buckets surfaced.
The MLflow REST API at /api/2.0/mlflow/experiments/search returns each experiment’s artifact_location field. On the 120 unauth hosts, that field is the operator’s pointer to where the model artifacts and datasets actually live.
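As a rough sketch of the extraction step, the snippet below pulls artifact_location values out of an already-captured experiments/search response body, so it makes no network calls. The sample payload is invented for illustration; only the `experiments` list and the `artifact_location` field are taken from the MLflow REST API response shape.

```python
import json

# Hypothetical captured response body (not real corpus data).
captured = """{"experiments": [
  {"experiment_id": "0", "name": "Default",
   "artifact_location": "s3://example-bucket/0", "lifecycle_stage": "active"},
  {"experiment_id": "1", "name": "demo",
   "artifact_location": "file:///srv/mlruns/1", "lifecycle_stage": "active"},
  {"experiment_id": "2", "name": "no-location", "lifecycle_stage": "deleted"}
]}"""


def artifact_locations(body):
    """Return every non-empty artifact_location in a search response body."""
    payload = json.loads(body)
    return [
        exp["artifact_location"]
        for exp in payload.get("experiments", [])
        if exp.get("artifact_location")
    ]


print(artifact_locations(captured))
```

Experiments without the field are skipped here; the provider table below counts empty values under "unknown", so a fuller version would keep them.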
Provider distribution
| Provider | URI count |
|---|---|
| local-fs (file:// or /path) | 486 |
| AWS S3 | 157 |
| unknown (custom or empty) | 110 |
| HTTP (proxied storage) | 66 |
| Google Cloud Storage | 51 |
| Azure Blob | 23 |
| Databricks DBFS | 4 |
| SFTP | 4 |
40% of MLflow critical hosts store artifacts on local disk, which means the artifacts ride on the same host as the unauth tracking server. If the artifact path is queryable (the older CVE-2023-1177 path-traversal class), the artifacts are reachable from the same internet entry point.
The other 60% point at cloud buckets: a separately exposed surface that may be public independently of the MLflow server.
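One plausible way to bucket URIs into the provider classes above is by scheme. The mapping below is a sketch under assumptions: the scheme-to-provider table (for example treating `wasbs://` and `abfss://` as Azure Blob) is my reading of common artifact URI forms, not the rule set used by 02-mlflow-artifact-uris.py. Provider labels match the ones in the bucket table.

```python
from collections import Counter
from urllib.parse import urlparse


def classify(uri):
    """Return (provider, bucket) for an artifact URI; bucket is None if not applicable."""
    if not uri:
        return ("unknown", None)
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme == "s3":
        return ("aws-s3", parsed.netloc)
    if scheme == "gs":
        return ("gcs", parsed.netloc)
    if scheme in ("wasb", "wasbs", "abfs", "abfss"):
        # netloc keeps the container@account.blob.core.windows.net form
        return ("azure-blob", parsed.netloc)
    if scheme == "dbfs":
        return ("databricks-dbfs", None)
    if scheme == "sftp":
        return ("sftp", None)
    if scheme in ("http", "https"):
        return ("http", None)
    if scheme == "file" or uri.startswith("/"):
        return ("local-fs", None)
    # Relative paths and custom schemes fall through to unknown in this sketch.
    return ("unknown", None)


# Hypothetical sample URIs for illustration.
sample = [
    "s3://example-bucket/1/artifacts",
    "gs://example-gcs-bucket/2",
    "wasbs://container@account.blob.core.windows.net/3",
    "/mlruns/0",
    "",
]
print(Counter(classify(u)[0] for u in sample))
```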
Top buckets by occurrence
| Provider | Bucket | URIs | Hosts | Notes |
|---|---|---|---|---|
| aws-s3 | mlflow | 31 | 11 | Generic name, likely many ops |
| aws-s3 | bia-mlflow | 20 | 2 | 2 different IPs, same operator |
| aws-s3 | flow-bucket | 10 | 1 | DigitalOcean NL operator |
| aws-s3 | maiaddy-mlflow-artifacts | 10 | 1 | Mediloca / maiaddy healthcare |
| aws-s3 | authenta-mlflow | 10 | 1 | AWS US East |
| aws-s3 | mlflow-artifacts-631835340943 | 10 | 1 | AWS account 631835340943 (Oishii vertical farming) |
| aws-s3 | svdeepflow-mlflow-artifacts-dev | 9 | 3 | 3 IPs in AWS Seoul, same operator redundant infra |
| aws-s3 | miles-mlflow-models | 9 | 1 | Hetzner DE operator |
| aws-s3 | hia-dev-s3-ew1-hes-ai-000 | 9 | 1 | AWS Sweden — likely HES / Hes-AI |
| aws-s3 | broadcast-ai | 9 | 1 | GCP US central |
| gcs | jrg-mlflow-artifacts | 8 | 1 | GCS bucket |
| azure-blob | mlflowartifacts@riskmodelstorage.blob.core.windows.net | 5 | 1 | Risk modeling, Azure |
| aws-s3 | gmb-ddal-test-oslo-shared | 4 | 1 | University of Oslo |
| aws-s3 | mlperf-mlflow-artifacts | 4 | 1 | IBM Cloud, MLPerf benchmark |
| gcs | aircheck-mlflow-tracking | 4 | 1 | GCP |
| gcs | kurious-ml-dev | 4 | 1 | GCP dev bucket |
| gcs | avaloka-mlflow-artifacts | 4 | 1 | GCP |
| azure-blob | mlflowartifacts@mvpusstorage.blob.core.windows.net | 4 | 1 | Azure |
Cross-operator bucket reuse
Only one bucket name appears on multiple distinct hosts in a way that suggests same-operator-redundant-infrastructure (vs different operators happening to share a name):
- svdeepflow-mlflow-artifacts-dev on 3 IPs in AWS Seoul region. Same operator running redundant tracking servers pointed at the same bucket. Disclosure target: one entity, three servers.
The 78-host “local-fs” set isn’t real cross-operator overlap. Every operator with file:// artifacts has their own local-fs.
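The reuse check itself is a small grouping step: map each bucket name to the set of hosts that reference it and keep the names seen on more than one host. The sketch below uses made-up (host, bucket) pairs. It only surfaces candidates; deciding whether a shared name means one operator or a generic name such as mlflow used by many still takes the manual read described above.

```python
from collections import defaultdict

# Hypothetical (host, bucket) observations, one per extracted artifact URI.
observations = [
    ("192.0.2.1", "example-shared-artifacts"),
    ("192.0.2.2", "example-shared-artifacts"),
    ("192.0.2.2", "example-shared-artifacts"),  # same host, second experiment
    ("198.51.100.9", "example-single"),
]


def buckets_on_multiple_hosts(pairs):
    """Return {bucket: sorted hosts} for buckets referenced by 2+ distinct hosts."""
    hosts_by_bucket = defaultdict(set)
    for host, bucket in pairs:
        hosts_by_bucket[bucket].add(host)
    return {b: sorted(h) for b, h in hosts_by_bucket.items() if len(h) > 1}


print(buckets_on_multiple_hosts(observations))
```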
Disclosure implications
The 58 unique buckets are second-order disclosure surface. Each bucket name is searchable: if the operator’s bucket policy allows public list/get, the artifacts are exposed independently of the MLflow server’s auth state. Worth a follow-up pass to check public accessibility of each named bucket, but out of scope for this Phase 5 primitive.
#3: LiteLLM model-catalog spend-tier classifier
Result: 283 unauth LiteLLM hosts represent ~$60,494/month ($725,927/year) in cost-at-risk at provider posted rates.
Methodology
Each LiteLLM critical host’s /v1/models response was already
captured in the VisorBishop indicators payload (full model_ids
when ≤20 models, sample of 20 otherwise). For each model:
- Match the model ID against a pricing table covering Anthropic Claude, OpenAI GPT/o-series/embeddings, Google Gemini, AWS Bedrock, Meta Llama, DeepSeek, Moonshot Kimi, Mistral, Cohere, xAI Grok, and Ollama-passthrough (free, signaled by deployment).
- Classify into a tier (frontier ≥$15/Mtok, high $3-15, mid $0.5-3, low <$0.5, free self-hosted).
- Compute blended $/Mtok across all distinct models on the host.
- Apply a conservative 10M tokens/model/month abuse rate (attacker burns one model at 10M tokens for a month). Real attacker abuse is faster. Bots can hit a frontier model at 100M+ tokens/month before detection, so this is a lower-bound estimate.
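The steps above can be sketched as follows. Several details are assumptions on my part and not confirmed by the methodology: the prices are placeholders (not the study's pricing table), boundary values go to the higher tier, "blended" is taken as the unweighted mean over priced models, and the host total is read as blended rate × 10 Mtok × number of distinct models.

```python
TOKENS_PER_MODEL_MTOK = 10  # 10M tokens/model/month, from the methodology

# Placeholder $/Mtok prices for illustration only (hypothetical model names).
PRICES = {
    "example-frontier-model": 20.0,
    "example-high-model": 5.0,
    "example-mid-model": 1.0,
    "example-low-model": 0.2,
}


def tier(price_per_mtok, self_hosted=False):
    """Map a $/Mtok price to a spend tier using the thresholds in the text."""
    if self_hosted:
        return "free"
    if price_per_mtok >= 15:
        return "frontier"
    if price_per_mtok >= 3:  # assumption: exactly $3 counts as high
        return "high"
    if price_per_mtok >= 0.5:
        return "mid"
    return "low"


def cost_at_risk(model_ids, prices=PRICES):
    """Monthly $ estimate: blended rate x 10 Mtok x distinct model count (assumed formula)."""
    distinct = set(model_ids)
    priced = [prices[m] for m in distinct if m in prices]
    if not priced:
        return 0.0
    blended = sum(priced) / len(priced)  # assumption: unweighted mean
    return blended * TOKENS_PER_MODEL_MTOK * len(distinct)


host_models = ["example-frontier-model", "example-high-model"]
print([tier(PRICES[m]) for m in host_models])
print(cost_at_risk(host_models))  # (20 + 5) / 2 * 10 * 2 = 250.0
```

How unpriced model IDs are handled is left open by the text; here they count toward the model total but not toward the blended rate.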
Tier distribution across all 1,127 exposed models
| Tier | Count | % | Definition |
|---|---|---|---|
| frontier | 133 | 11.8% | ≥$15/Mtok (GPT-4/5, Claude Opus, o1/o3) |
| high | 268 | 23.8% | $3-15/Mtok (Claude Sonnet, GPT-4o, Gemini Pro) |
| mid | 310 | 27.5% | $0.5-3/Mtok (Claude Haiku, Gemini Flash, Mistral) |
| low | 373 | 33.1% | <$0.5/Mtok (GPT-3.5, Llama 8B, embeddings) |
| free | 43 | 3.8% | Self-hosted (Ollama, TGI passthrough) |
Provider distribution (top 10)
| Provider | Models exposed |
|---|---|
| (unknown / custom model names) | 290 |
| OpenAI | 207 |
| Anthropic | 201 |
| Google Gemini | 120 |
| Alibaba (Qwen) | 99 |
| DeepSeek | 45 |
| Ollama (self-hosted) | 43 |
| Moonshot Kimi | 30 |
| Meta Llama | 27 |
| Mistral | 26 |
Top 10 operators by cost-at-risk
| $/mo (conservative) | Target | Tier | Models | Frontier sample |
|---|---|---|---|---|
| $4,680 | 20.2.91.83:80 (Azure HK) | frontier | 57 | gpt-5.5, gpt-5.4-pro, gpt-5.4 |
| $2,640 | 98.149.54.126:4100 (Charter US residential) | frontier | 22 | claude-opus-4-1, claude-opus-4-1-20250805 |
| $1,898 | 95.216.95.1:4000 (Hetzner FI / netiva.com.tr) | frontier | 11 | gpt-5.4, gpt-5.5, gpt-5.4-mini |
| $1,854 | 156.227.236.247:4000 (Yisu Cloud JP) | frontier | 18 | gpt-5-image, gpt-5-image-mini |
| $1,575 | 118.196.116.53:4000 (China) | frontier | 7 | gpt-5.2, gpt-5.3-codex |
| $1,500 | 103.29.190.142:4000 (Indonesia) | frontier | 6 | chatgpt/gpt-5.4, chatgpt/gpt-5.4-pro |
| $1,460 | 43.153.66.205:4000 (Tencent US) | frontier | 11 | gpt-4.1, gpt-5.3-codex |
| $1,447 | 203.149.11.67:443 (Samart Thailand, api.modelharbor.com) | frontier | 20 | gpt-5.4, claude-sonnet-4.5 |
| $1,353 | 107.174.66.118:4000 (RackNerd US) | frontier | 30 | gpt-5, o1, claude-opus-4 |
| $1,243 | 64.227.146.129:4000 (DigitalOcean IN) | frontier | 16 | gpt-4.1, gpt-4.1-mini |
What “cost-at-risk” really means
The $60,494/mo aggregate is the minimum combined bill the operators’ provider accounts would face if every one of these endpoints were abused at 10M tokens/model/month for a single month. Real-world attacker patterns are different:
- Concentrated abuse on frontier models: attackers target the highest-cost models because the burn rate is fastest. The $4,680/mo operator at 20.2.91.83 exposes 57 models including 3 frontier. An attacker hitting just those 3 at 100M tokens/mo would burn ~$6,750/mo on that single host.
- Sustained abuse beyond one month: most operators don’t notice the bill anomaly for 1-3 cycles. Cumulative damage over a 90-day undetected window is 3× the monthly number.
- Indirect costs: burning the operator’s rate limit, exhausting their Anthropic / OpenAI capacity allocation, triggering provider abuse-investigation that may suspend the operator’s account.
A reasonable real-world projection: multiply the conservative number by 3-10× to estimate actual undetected-abuse-window blast radius. That puts the population-scale annualized risk in the $2-7M/year range.
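The arithmetic behind these projections is short enough to check directly. The inputs are the figures reported above; the implied per-Mtok rate for the single-host example is back-calculated from the ~$6,750 figure and is not stated in the source.

```python
MONTHLY_CONSERVATIVE = 60_494  # $/month, aggregate cost-at-risk
ANNUAL_CONSERVATIVE = 725_927  # $/year, as reported

# 90-day undetected window: three billing cycles at the conservative rate.
window_90d = 3 * MONTHLY_CONSERVATIVE

# 3-10x multiplier applied to the annual conservative figure.
annual_low = 3 * ANNUAL_CONSERVATIVE
annual_high = 10 * ANNUAL_CONSERVATIVE

# Single-host example: 3 frontier models at 100 Mtok each for ~$6,750/mo.
implied_rate = 6_750 / (3 * 100)  # back-calculated average $/Mtok

print(f"90-day window: ${window_90d:,}")                          # $181,482
print(f"annualized range: ${annual_low:,} to ${annual_high:,}")   # $2,177,781 to $7,259,270
print(f"implied frontier rate: ${implied_rate:.2f}/Mtok")          # $22.50/Mtok
```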
Outliers worth noting
- 98.149.54.126 is on Charter Spectrum residential. This is a developer running LiteLLM at home with their personal Anthropic API key, exposing 22 models including frontier Opus. An individual’s monthly Anthropic bill could be five figures before they notice.
- api.modelharbor.com (Thailand) reads as a commercial LLM routing product. Every paying customer is sharing an unauth proxy.
- 9 of the top 10 operators expose frontier-tier models: the abuse-attractive tail is concentrated, not random.
Cumulative impact picture
The three primitives together turn the inventory into an impact narrative:
| Primitive | Numeric finding | What it means |
|---|---|---|
| Cross-platform correlation | 1 same-IP chain | Attacker can chain Phoenix trace-disclosure → LiteLLM LLMjacking on the same host (Hetzner DE) |
| Artifact-URI extraction | 58 unique buckets across 120 hosts | Second-order disclosure surface; named buckets like bia-mlflow, maiaddy-mlflow-artifacts, svdeepflow (3× same operator), riskmodelstorage |
| Spend-tier classifier | $60K/month ($725K/year) conservative / $2M-7M/year realistic | Dollarized LLMjacking blast radius across the 283 critical LiteLLM operators |
These are the headlines the 492-critical inventory enables.
Reproducibility
All three primitives are deterministic functions of the cumulative corpus. The scripts live at:
~/recon/2026-05-11-phase5/
- 01-cross-platform-correlation.py: reads the dashboard JSON
- 02-mlflow-artifact-uris.py: reads the raw iter-7 MLflow JSON
- 03-litellm-spend-tier.py: reads the raw iter-6 LiteLLM JSON
Outputs:
- correlations.json: by-IP / by-/24 / by-org operator overlaps
- mlflow-artifacts.json / mlflow-artifact-buckets.tsv: bucket inventory
- litellm-spend-tier.json / litellm-spend-tier.tsv: per-host spend tier + cost-at-risk
The pricing table in #3 is the only piece that needs maintenance (provider list-prices change quarterly).
What’s next
Phase 5 closes the research arc. The natural next steps:
- Public-facing announcement: the cost-at-risk numbers ($725K-$7M/year annualized) are the publishable signal.
- Bucket-accessibility pass: query each of the 58 unique buckets surfaced in #2 for public-list / public-get.
- Iter-8 platform expansion: PostHog LLM analytics, Kubeflow, Airflow ML DAGs.
- Disclosure-routing pipeline for the 492 cumulative critical operators.
Cross-references:
- iter-6 case study (LiteLLM)
- iter-7 case study (MLflow + W&B)
- Methodology Insight #16
- VisorBishop pipeline scripts: VisorBishop/tree/main/scripts