AIPOD orthodontic AI MLflow + Label Studio + S3 stack, CVE-2023-1177 actively-exploited (138.197.152.103)
NuClide Research · 2026-05-06
Summary
DigitalOcean droplet 138.197.152.103 runs an end-to-end orthodontic-AI R&D stack that has been operational since March 2023, with its MLflow tracking server unauthenticated for that entire period. Three AI services make up the stack, two on the host itself and one backing S3 bucket:
DCWF KSAT coverage
Auto-derived from DCWF AI work-role rule files (ksat-tag).
- 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, T5904, T5919
- 733 (AI Risk & Ethics Specialist): K7040, S7067, T5854, T5868, T5893
- overlap (Common AI KSATs (all 5 roles)): K1158, K1159, K22, K6311, K6900, K6935, K7003, K7024
| Port | Service | Auth | Vulnerability |
|---|---|---|---|
| 5000 | MLflow 2.2.1 | NONE | CVE-2023-1177 path-traversal RCE, actively exploited since 2026-03-26 with 18 attacker-injected experiments |
| 8080 | Label Studio 1.5.0.post0 (Jul 2022 release, 3 years stale) | Token required | CVE-2024-23633 LFR + CVE-2024-24566 SSRF apply to this version |
| s3://aipod-crop/ | S3 artifact bucket (us-east-2) | Private (403 on public probes) | Bucket exists, no public objects |
The full chain ran via `bash data/visor-chain-runner.sh mlflow-cve`, plus follow-up enumeration through the MLflow REST API directly (Methodology Insight #6, conjunctive matchers).
What AIPOD does (operator-IP exfil from MLflow metadata)
The operator is a dental-AI / orthodontic-AI startup developing an end-to-end pipeline of medical-imaging models:
| Year | Experiment | Runs | Model task |
|---|---|---|---|
| 2023 | /demo-experiment (id 155786267…) | 15 | Initial validation experiment |
| 2023 | initial-model (id 701546096…) | 3 | First production attempt; train_data_version train_version_1.csv |
| 2023–2024 | real-exp (id 956907690…) | 53 | Multi-class classifier with MSE loss + cross-entropy; cosine warmup; ReduceLROnPlateau scheduler; 256x256 image input |
| 2024 | pan-segmentation (id 148418839…) | 13 | Panoramic dental X-ray segmentation; training datasets pan_set_1, pan_set_2, pan_set_3; cosine warmup; IoU 0.66 / val loss 0.61 best |
| 2025 | ceph-keypoint (id 804275185…) | 14 | Cephalometric (lateral skull X-ray) keypoint detection; 256x256 input; 4,428 train / 3,542 val examples; final RMSE 0.0109 |
| 2026 | orthodontic-upper-multitask (id 583324192…) | 19 | Multi-task upper-jaw classifier (arch + alignment); developer gaurav; fold-cross-validated; best avg_combined_f1: 0.4899 (R&D-stage, not production) |
The pipeline progression is consistent with a methodical orthodontic-AI roadmap: foundational classifier → panoramic segmentation → cephalometric keypoint detection → multi-task arch+alignment fusion. Four years of R&D investment are exposed through the MLflow metadata.
Developer roster (extracted from mlflow.user and mlflow.source.name)
- `gaurav`, Mac developer (path `/Users/gaurav/Documents/usa_work/ULClassification/`); only active on the 2026 multi-task work; offshore developer signature (`usa_work` directory naming pattern)
- `ubuntu`, production droplet's default user; ran the 2023–2024 training waves; src paths `src/models/train.py` and `src/model/train.py`
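An operator auditing their own tracking server can reproduce this roster from run tags. The sketch below is a minimal illustration, assuming each run is a dict shaped like the MLflow REST `runs/search` response (a `data.tags` list of `key`/`value` pairs); the function name `roster_from_runs` and the sample runs are hypothetical, not taken from the host.

```python
from collections import defaultdict


def roster_from_runs(runs):
    """Group source paths by the mlflow.user tag found on each run."""
    roster = defaultdict(set)
    for run in runs:
        # Flatten the list of {"key": ..., "value": ...} pairs into a dict.
        tags = {t["key"]: t["value"] for t in run.get("data", {}).get("tags", [])}
        user = tags.get("mlflow.user")
        if user:
            roster[user].add(tags.get("mlflow.source.name", "<unknown>"))
    return {user: sorted(paths) for user, paths in roster.items()}


def _run(user, source):
    """Build an illustrative run record (hypothetical sample data)."""
    return {"data": {"tags": [
        {"key": "mlflow.user", "value": user},
        {"key": "mlflow.source.name", "value": source},
    ]}}


sample_runs = [
    _run("ubuntu", "src/models/train.py"),
    _run("ubuntu", "src/model/train.py"),
    _run("gaurav", "/Users/gaurav/Documents/usa_work/ULClassification/"),
]

roster = roster_from_runs(sample_runs)
```

Anything that shows up in `roster` is readable by any unauthenticated client of the tracking server, which is why local usernames and absolute paths are worth scrubbing before logging.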
Git commits leaked through MLflow tags
| Commit | Period | Source path |
|---|---|---|
| 34fb854192012a8da1c409abbeb13939112df9fc | 2023-03 to 2023-04 | src/models/train.py |
| f32e5d52f16c83f01bac8b654da1e8bd8f4754b4 | 2023-06 to 2024-05 (~12 months main branch) | src/models/train.py |
| daa9915c… | 2024-04 (refactor) | src/model/train.py (singular model) |
| dfe5665b8a0af217dc632d313245d0640e08b18d | ceph-keypoint context | src/model/train.py |
| 0024a538f1c70c660ac9391048fc5d1e603fe89a | pan-segmentation context | src/model/train.py |
GitHub commit-search for these SHAs returns 0 hits, so the operator's repos are presumably private or hosted somewhere other than public GitHub. The commit hashes are nonetheless useful as forensic fingerprints if the operator's GitHub Enterprise / GitLab self-host ever surfaces.
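If a candidate repository does surface, the fingerprint check is a prefix match against its commit list. The following is a minimal sketch, assuming the candidate hashes come from something like `git rev-list --all` in a local clone; `match_fingerprints` is a hypothetical helper, and the truncated `daa9915c` entry is matched only as a prefix because the full hash is not available.

```python
LEAKED_SHAS = [
    "34fb854192012a8da1c409abbeb13939112df9fc",
    "f32e5d52f16c83f01bac8b654da1e8bd8f4754b4",
    "daa9915c",  # truncated in the MLflow tags; treated as a prefix
    "dfe5665b8a0af217dc632d313245d0640e08b18d",
    "0024a538f1c70c660ac9391048fc5d1e603fe89a",
]


def match_fingerprints(leaked, repo_commits):
    """Return {leaked_sha: [matching repo commits]} using prefix matching."""
    repo = [c.strip().lower() for c in repo_commits if c.strip()]
    hits = {}
    for sha in leaked:
        found = [c for c in repo if c.startswith(sha.lower())]
        if found:
            hits[sha] = found
    return hits
```

A single full-length match is strong evidence of a shared history; a match on the 8-character prefix alone is weaker and should be confirmed against commit dates and file paths.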
Activity timeline
2023-03-10 ────● /demo-experiment
2023-03-17 ────● initial-model ← year-1: foundational
2023-03-20 ────● real-exp (53 runs over 2023-2024)
2024-03-26 ────● pan-segmentation ← year-2: dental X-ray segmentation
2024-05-04 ────● last `ubuntu` legit run
(~11 months of silence on MLflow surface - possibly migrated production
elsewhere; this droplet kept as stale dev / artifact repository)
2025-04-13 ────● ceph-keypoint (14 runs) ← year-3: keypoint detection
2026-03-23 ────● orthodontic-upper-multitask (19 runs in one day, gaurav)
← year-4: multi-task fusion
2026-03-26 ────● CVE-2023-1177 spray actor finds the host (3 days after gaurav's burst)
2026-04-10 to 2026-04-23 ────● wave of /etc/ traversals
2026-04-20 ────● 4x /root/.ssh/ + 1x /root/ traversals (SSH key hunt)
2026-05-01 ────● new attacker campaign IDs
2026-05-05 ────● `exp_103` injection
2026-05-06 06:54 UTC ── ● `poc_exp` injection (16h before NuClide re-probe)
The CVE-2023-1177 spray actor landed 3 days after the operator's most recent visible legitimate activity. Possibilities: (a) a coincidental population-scale spray, (b) a Shodan harvest picked up the renewed activity, (c) someone signaled the host. The campaign UUID 3BT8ncOzBWAH4GyIGz0EXsSwj7f appears on multiple tier-2 MLflow hosts, which points to a population-scale actor: the synthesis paper documents it on both 138.197.152.103 and 159.203.110.202.
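The intervals quoted around the timeline can be checked with plain date arithmetic. This is a small sketch using only the dates listed above; the dictionary keys are labels chosen for illustration.

```python
from datetime import date

# Dates copied from the activity timeline; key names are illustrative.
events = {
    "last_ubuntu_run": date(2024, 5, 4),
    "ceph_keypoint": date(2025, 4, 13),
    "gaurav_burst": date(2026, 3, 23),
    "first_spray": date(2026, 3, 26),
    "latest_injection": date(2026, 5, 6),
}


def gap_days(start, end):
    """Whole days elapsed between two named timeline events."""
    return (events[end] - events[start]).days


silence = gap_days("last_ubuntu_run", "ceph_keypoint")       # 344 days
burst_to_spray = gap_days("gaurav_burst", "first_spray")     # 3 days
spray_duration = gap_days("first_spray", "latest_injection")  # 41 days
```

The 344-day quiet period is roughly 11 months, the spray began 3 days after the last legitimate burst, and attacker activity spans 41 days as of the latest injection.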
Active attacker presence (CVE-2023-1177)
24 total experiments on host: 6 legit + 18 attacker-injected. Attacker-injected experiments share a recognizable pattern:
{
"name": "<random-16-char>",
"artifact_location": "http:///?/../../../../../../../../../../../../../../etc/"
}
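For triage, the pattern above reduces to one check: does the experiment's `artifact_location` contain a parent-directory sequence once URL-decoded? The sketch below assumes experiment records shaped like the MLflow REST `experiments/search` response (`name` and `artifact_location` fields); the helper names and the sample `s3://example-bucket/1` location are hypothetical.

```python
from urllib.parse import unquote


def is_traversal_location(artifact_location: str) -> bool:
    """True if the location contains a '../' or '..\\' sequence after decoding."""
    decoded = unquote(artifact_location)
    return "../" in decoded or "..\\" in decoded


def split_experiments(experiments):
    """Partition experiment names into (legit, injected) by artifact_location."""
    legit, injected = [], []
    for exp in experiments:
        location = exp.get("artifact_location", "")
        (injected if is_traversal_location(location) else legit).append(exp["name"])
    return legit, injected


sample = [
    {"name": "real-exp", "artifact_location": "s3://example-bucket/1"},
    {"name": "PJYMtlmXsSfyO0hk", "artifact_location": "http:///?/../../../etc/"},
]
legit, injected = split_experiments(sample)
```

This is a heuristic, not a complete detector: it flags the traversal shape seen on this host and would miss an injection that used a different encoding or an absolute path.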
Each path traversal targets /etc/, /root/.ssh/, or /root/. The attacker uses multiple campaign UUIDs:
| Campaign UUID prefix | Pattern | First seen | Most recent |
|---|---|---|---|
3BT8ncOzBWAH4GyIGz0EXsSwj7f | population-scale spray | 2026-03-26 00:11:10 | 2026-03-26 00:11:12 |
3CCGENufMtsxUjr3ij4gjsPM44m | /etc/ only | 2026-04-10 23:33:48 | 2026-04-10 |
3D9V4JvPnDuvfxpSHZBQo1TTM3x | /etc/ only | 2026-05-01 22:54:45 | 2026-05-01 |
PJYMtlmXsSfyO0hk (16-char) | /etc/ | 2026-04-23 12:49:00 | 2026-04-23 |
MXhmOLyZ7i2zgR5d (16-char) | /etc/ | 2026-04-20 11:11:25 | 2026-04-20 |
6tUWyqxY1Z3cuSvj (16-char) | /etc/ | 2026-04-20 11:11:13 | 2026-04-20 |
aZGVwezuF60CHthW (16-char) | /root/.ssh/ | 2026-04-20 11:11:36 | 2026-04-20 |
9D6H17u0tiNmXdOp (16-char) | /root/.ssh/ | 2026-04-20 11:11:39 | 2026-04-20 |
apwsM4eyDoVjWJxq (16-char) | /root/.ssh/ | 2026-04-20 11:11:28 | 2026-04-20 |
RaYNG7f9MAsKW8ci (16-char) | /root/.ssh/ | 2026-04-20 11:11:33 | 2026-04-20 |
4lHeW9CUYxhVujFz (16-char) | /etc/ | 2026-04-20 11:11:19 | 2026-04-20 |
A0lNs4QbTgIChecm (16-char) | /root/ | 2026-04-20 11:11:41 | 2026-04-20 |
exp_103 (named) | /etc/ | 2026-05-05 08:37:42 | 2026-05-05 |
poc_exp (named) | /etc/ | 2026-05-06 06:54:25 | 2026-05-06 |
The 2026-04-20 batch is striking: 8 experiments injected within 30 seconds, targeting /etc/, /root/.ssh/, and /root/. This is automated mass-spray behavior, not interactive testing.
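The timing claim can be verified directly from the creation timestamps in the table. The sketch below copies the eight 2026-04-20 rows and computes the total span and the largest gap between consecutive injections; the variable names are illustrative.

```python
from datetime import datetime

# Creation timestamps of the eight 2026-04-20 experiments, from the table above.
batch = {
    "MXhmOLyZ7i2zgR5d": "2026-04-20 11:11:25",
    "6tUWyqxY1Z3cuSvj": "2026-04-20 11:11:13",
    "aZGVwezuF60CHthW": "2026-04-20 11:11:36",
    "9D6H17u0tiNmXdOp": "2026-04-20 11:11:39",
    "apwsM4eyDoVjWJxq": "2026-04-20 11:11:28",
    "RaYNG7f9MAsKW8ci": "2026-04-20 11:11:33",
    "4lHeW9CUYxhVujFz": "2026-04-20 11:11:19",
    "A0lNs4QbTgIChecm": "2026-04-20 11:11:41",
}

times = sorted(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in batch.values())

# Total duration of the batch and spacing between consecutive injections.
span_seconds = (times[-1] - times[0]).total_seconds()
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
max_gap = max(gaps)
```

The batch spans 28 seconds with no more than 6 seconds between consecutive injections, a cadence that fits a scripted loop far better than manual requests.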
Did they exfil anything?
The attacker-injected runs are stuck in RUNNING status with empty user_id. The CVE-2023-1177 exfil flow is:
1. POST /api/2.0/mlflow/experiments/create
{"artifact_location": "http:///#/../../../../../etc/passwd"}
2. POST /api/2.0/mlflow/runs/create - get run_id
3. GET /get-artifact?path=passwd&run_uuid=<id> ← read the file content
The injection (steps 1-2) is what is visible to us; step 3 is what would actually exfiltrate files. NuClide cannot determine from passive observation whether the attacker has executed step 3 successfully, because MLflow does not log the artifact response payload for a run. However, the persistence and scale of the spray (40+ days, 18 experiments, 6 distinct campaign IDs on this single host) suggest the actor is at least attempting exfil, not just surveying.
Recommended verification: the operator should grep their MLflow access logs (mlflow_default.log or the systemd journal of the gunicorn service) for GET /get-artifact?path= requests carrying attacker run_ids; any such requests would confirm that exfil was executed.
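The log check can be scripted as below. This is a minimal sketch for the operator's own logs, assuming access-log lines in the common gunicorn style where the request appears as `"GET <target> HTTP/1.1" <status>`; the actual log format on the host is unknown, and `find_artifact_reads` plus the sample run ID are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed log shape: ... "GET /get-artifact?<query> HTTP/1.1" <status> ...
REQUEST_RE = re.compile(
    r'"GET (?P<target>/get-artifact\?\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def find_artifact_reads(log_lines, suspect_run_ids):
    """Return artifact-read requests whose run ID is in suspect_run_ids."""
    suspects = set(suspect_run_ids)
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        query = parse_qs(urlsplit(match.group("target")).query)
        # Accept either parameter name for the run identifier.
        run_ids = set(query.get("run_uuid", []) + query.get("run_id", []))
        matched = sorted(run_ids & suspects)
        if matched:
            hits.append({
                "status": int(match.group("status")),
                "path": query.get("path", [""])[0],
                "run_ids": matched,
                "line": line.rstrip("\n"),
            })
    return hits
```

A typical call would pass an open log file and the run IDs of the attacker-injected runs, for example `find_artifact_reads(open("mlflow_default.log"), suspect_ids)`. Hits with status 200 are the ones that matter; 4xx or 5xx responses indicate an attempt that did not return file content.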
Disclosure routing
Provider: abuse@digitalocean.com (rank-1 from nuclide-contact WHOIS resolution).
Operator-direct: AIPOD has no public-facing domain reachable from the data we collected. No CT-log subdomains, no rDNS, no website at aipod.com / .io / .ai / .app (those are unrelated). The S3 bucket aipod-crop is the only operator-attributable artifact, and AWS doesn’t surface bucket-owner contact publicly. Provider-channel-only disclosure recommended; DigitalOcean’s customer-notification process will reach the operator through their billing identity.
Disclosure draft: disclosures/DIGITALOCEAN-138-197-152-103-aipod-mlflow.md
9-step chain provenance
Step 0 jaxen import --no-lookup --source ledger-revisit-2026-05-06 → empire.db
Step 1a visorplus assess (138.197.152.103) → DigitalOcean WHOIS, nmap top-1000 (3 ports), SSH host keys (RSA+ECDSA+Ed25519), GreyNoise: benign/RIOT
Step 1b aimap -list → MLflow 2.2.1 confirmed; **Label Studio mis-fingerprinted as Langfuse** (FP bug - see followup work)
Step 1c jaxen pivot http://138.197.152.103:8080/ → favicon hash `-1649949475` for cross-fleet pivot
Step 2 visorgraph -ip → no TLS, no cert pivots (bare-IP hosting)
Step 3 aimap-profile --target --mode full → no CT subdomains, no security.txt, no public DNS
Step 4 JS-bundle extraction (Label Studio) → /api/version disclosed v1.5.0.post0 build hash
Step 5 nuclide-contact → abuse@digitalocean.com (operator opaque)
Step 6 visorlog ingest → ledger entry (existing event ID #220 from milvus survey was on different IP; this is new)
Step 7 visorscuba assess → 743 nodes; AI.C1 critical violation
Step 8 bare → CVE-2023-1177 commodity-CVE chain confirmed (top score ~0.5+)
Step 9 visorcorpus build (-profile strict -type baseline -include kb_exfiltration,system_prompt,config_secrets) → 46-case corpus
Severity rationale
HIGH, not CRITICAL. Reasoning:
- AIPOD is at R&D stage (best avg_combined_f1: 0.4899, model is not production-deployed; metrics suggest active iteration)
- No customer-facing surface identified (no CT logs, no DNS, opaque operator)
- Patient-PHI scale unconfirmed (`pan_set_1/2/3` contain X-ray training images but counts/PII shape not enumerated; MLflow doesn't log full filenames in run params)
- Active CVE exploitation IS confirmed but exfil success is unproven from external observation
- 3+ year persistent exposure increases blast radius
If the operator’s S3 access keys leak via /etc/aws/credentials traversal, severity escalates to CRITICAL: the bucket holds 4 years of model artifacts, including the patient X-ray training data.
References
- Original Triton + MLflow survey context, `mlflow-cloud-survey-2026-05.md`
- Population-scale CVE-2023-1177 attacker UUID `3BT8ncOzBWAH4GyIGz0EXsSwj7f`, first documented in `SYNTHESIS-2026-05.md` “Class E, Active CVE exploitation”
- Sister actively-exploited host (159.203.110.202), same attacker UUID, financial workload (`helios_stock_direction`); deferred to a separate disclosure
- aimap Langfuse fingerprint FP, `~/ai-recon/aimap/fingerprints.go:294` matches Label Studio’s `{"status":"UP"}` response (Methodology Insight #10 territory)
- JAXEN favicon-hash pivot for the Label Studio v1.5 fleet, `http.favicon.hash:-1649949475`