sub2api — Population survey: 7,720 indexed hosts, auth-on-default at scale, zero pool-leak
Date: 2026-05-19 Survey: sub2api-population-2026-05-19 Target class: github.com/Wei-Shaw/sub2api (Go-rewrite successor to claude-relay-service) Population: 7,720 unique host:port indexed on Shodan Toolchain: JAXEN, direct Shodan API loop, custom verification probe, VisorGraph, attribution slicer, aimap deep-enum, JS-bundle extractor
Headline findings
| State | Hosts | Share | Significance |
|---|---|---|---|
| CONFIRMED_AUTH_GATED | 5,848 | 75.75% | Auth-on-default holds at population scale (Insight #25 confirmed) |
| DEAD | 1,605 | 20.79% | Indexed but not responding |
| SETUP_OPEN | 101 | 1.31% | Install wizard accessible — takeover-on-init vector |
| PROXIED_OR_DOWN | 71 | 0.92% | Front-door 5xx or nginx/openresty without sub2api signature |
| UNKNOWN | 64 | 0.83% | Mixed: cross-contamination from other LLM proxies, scheme mismatches |
| LIVE_FRONTEND_ONLY | 15 | 0.19% | Vue3 UI served, backend not routing |
| CONFIRMED | 12 | 0.16% | /health only, auth status not directly anchored |
| DEV_MODE | 4 | 0.05% | vite dev server in production (build-pipeline leak) |
| POOL_LEAK | 0 | 0.00% | The v1 finding pattern does NOT generalize |
DCWF KSAT coverage
Auto-derived from DCWF AI work-role rule files (ksat-tag).
- 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, T5858, T5904, T5919
- 733 (AI Risk & Ethics Specialist): K7040, K7051, S7056, S7067, T5854, T5868, T5882, T5893
- overlap (Common AI KSATs (all 5 roles)): K1158, K1159, K22, K6311, K6935, K7003, T5896
Top-line: 7,720 / 7,720 = 100% probed. 6,083 / 7,720 = 78.8% returned sub2api-signature responses. Of those, 5,848 / 6,083 = 96.1% enforced auth-on-default on the API surface.
Why this matters
Two months ago a different Wei-Shaw project, claude-relay-service (v1, Node.js), was disclosed to Anthropic (2026-05-19) for pooling 32 paid Anthropic accounts across 6 visible hosts and serving 13.92B Claude tokens. The finding was anchored on publicly-readable pool stats at /. Insight #39 was codified from that work: pooled-account upstream proxy as attribution-laundering layer.
The successor project, sub2api (v2, Go), claims ~8,105 deployments. If the v1 finding pattern held at v2 scale, we’d be looking at a 1,300× larger surface of the same exposure class. The directional question this survey set out to answer: does the v1 finding generalize to v2?
Answer: No. The Go rewrite hardened the metrics surface. Zero of 7,720 hosts expose pool stats publicly. The finding pattern (publicly-readable pool counts) does not survive the v1 → v2 OSS generation. The architecture pattern (Insight #39: Tier-2 pooled-account substrate) remains; the finding shape changes.
Methodology
Discovery
- Dork:
http.title:"Sub2API - AI API Gateway"(title-anchored — sub2api ships this verbatim fromindex.html) - Total Shodan reports: 7,963 (matches Wei-Shaw’s claimed 8,105 within 1.8%, consistent with normal indexing lag)
- Pulled 7,720 unique host:port across 79 pages (Freelance tier; no page-70 wall per Insight #35)
- Pre-emptively widened against schema-anchored fallbacks (
http.html:"sub2api-admin", favicon hash) — not needed at this scale
Verification (load-bearing stage)
Eight schema-anchored paths per host, 80-worker concurrent probe, 8.2 minutes wall time for full population:
| Path | Anchor signal | Status code expected |
|---|---|---|
/health | {"status":"ok"} (sub2api signature) | 200 |
/setup/status | {"code":0,"data":{"needs_setup":Bool}} | 200 |
/api/v1/admin/stats | {"code":"UNAUTHORIZED"} envelope | 401 (auth gated) |
/api/v1/admin/users | {"code":"UNAUTHORIZED"} envelope | 401 (auth gated) |
/api/v1/auth/login | sub2api error envelope | 405 / 422 |
/v1/messages | (method-not-allowed for GET) | 404 / 405 |
/v1/models | {"code":"API_KEY_REQUIRED","message":"API key is required in Authorization header (Bearer scheme), x-api-key header, or x-goog-api-key header"} | 401 |
/ | Vue 3 frontend, /logo.png favicon, lang="zh-CN"/"zh" | 200 |
The verbatim /v1/models 401 error string is the highest-confidence single-signature anchor — it appears in sub2api source at backend/internal/gateway/* and is unique to this project.
Classifier (v2, post-probe refinement at reclassify.py) anchors on response-shape rather than status codes alone per Insight #6.
Attribution
- Per-host: ASN, org, ISP, country, hostnames, cert subject CN, domain parts (extracted from
attribution.csv) - Per-operator: cert-CN clustering identifies multi-host operators (Tier 3 storefronts per Insight #39)
- VisorGraph cert-pivot on 5 representative storefront CNs: zero cross-seed overlap — independent operators
Ethics
- All probes are GET-only against published health/status endpoints and the sub2api API surface
- Anchored on response evidence; no POST to auth, install, or inference endpoints
- SETUP_OPEN hosts confirmed wizard-accessible via
/setup/statusGET; no/setup/initPOST attempted (this would seize admin) - No upstream Tier-1 vendor credentials extracted or attempted
- No customer prompt/response data retrieved
Detailed findings
F1 — SETUP_OPEN class (101 hosts): install-wizard takeover-on-init vector
What: Hosts at production-stage deployment with the sub2api install wizard accessible. /setup/status returns {"code":0,"data":{"needs_setup":true,"step":"welcome"}}. The wizard endpoint accepts admin-account creation without authentication; first POST claims the admin role.
Verified to: Wizard accessibility confirmed via GET. Wizard exploitability was not confirmed by attempted POST (ethical-stop boundary).
Distribution:
- 48 US, 17 SG, 13 CN, 11 HK, 8 JP
- Top providers: RackNerd (7), Asia Pacific NIC (7), Tencent BJ (6), Oracle (5), IT7 Networks (4)
- Top domain clusters: ColoCrossing (6), 16clouds (6), DMIT (4) — same operator ecosystem as Butterfly2Sea / Li Peipei from prior survey (project_butterfly2sea_operator)
Operator brand certs visible: *.helper6.com, sub2api.shouyouradar.com
Why it matters: Per the auth-on-default thesis, this is the residual 1.3% gap where install incompleteness creates an admin-takeover window. Anyone reaching the host before the operator completes setup can seize the admin account and bind their own pooled Tier-1 credentials.
Verification artifact sample (101.42.109.163:8080):
GET /setup/status HTTP/1.1
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
{"code":0,"message":"success","data":{"needs_setup":true,"step":"welcome"}}
F2 — DEV_MODE class (4 hosts): vite dev server exposed in production
What: Hosts serving the root HTML with import { inject } from "/@vite-plugin-checker-runtime" — the vite development server’s HMR client. These are production deployments accidentally running the dev build, which exposes source-map paths, unminified bundles, and the vite dev WebSocket on the same port.
Distribution:
- 2 HK, 2 US
- 2 on HostPapa (AS36352, ColoCrossing), 1 on FREEZING NETWORK, 1 on Alibaba LLC
- One named operator:
api.nevinxu.site
Why it matters: Source-map URLs leak operator file-system paths. Dev-mode bundles are unminified and grep-friendly for downstream finding-extraction.
F3 — multi-host operator clusters (Insight #39 Tier-3 evidence)
VisorGraph cert-pivot on 5 cert-CN seeds; population-level cert-CN aggregation:
| Operator brand | Hosts | Cluster shape |
|---|---|---|
| aiproxy.astrum-lab.com | 9 | Multi-host storefront cluster; mostly PROXIED_OR_DOWN at probe time (cluster front-door enforces access control above sub2api layer) |
| 79102.com | 9 | Multi-host commercial brand |
| *.wowkaka.cn | 8 | Multi-host with expired DigiCert wildcard (2024-07-22) |
| sub2api.t2n.cc | 4 | Cert-blind (no CT history; plain HTTP only) |
| fuxingshop666.cn | 3 | Multi-host commercial brand |
| gpt.melemoe.com | 3 | Multi-host operator |
| sub2api.team | 1 | Branded single-host |
| api.nevinxu.site | 1 | Branded single-host (DEV_MODE) |
Cross-cluster overlap (VisorGraph result): Zero. Each operator brand is structurally independent — no shared IPs, no shared hosting providers, no shared cert authorities. This is direct evidence for Insight #39’s Tier-3 fragmentation claim: storefronts are NOT centralized, they’re independent commercial operators each pooling upstream relays.
F4 — Non-finding: zero POOL_LEAK class hosts
What did not appear: 0 of 7,720 hosts exposed pool-account counts, total-token counters, or third-party-account-max-concurrent fields publicly. These were the load-bearing fields of the v1 claude-relay-service disclosure (the v1 finding was anchored on / returning {"accounts":N,"availableAccounts":M,"stats":{"totalTokens":...}}).
What changed: The Go rewrite gates /api/v1/admin/* behind an x-api-key header or JWT bearer. /health returns only {"status":"ok"}. /v1/models returns 401 with the verbatim API_KEY_REQUIRED envelope. Pool stats are not reachable at the unauthed layer.
Why this matters: OSS authors learn from disclosures. The v1 finding (and the resulting Anthropic disclosure) plausibly drove the v2 architecture’s auth-on-default posture on metrics. This is a positive signal for the disclosure-feedback loop in this ecosystem.
F5 — JS bundle hygiene (clean): zero baked secrets across 60-host sample
Sampled 60 hosts, fetched root HTML + ~20 JS assets per host, deduplicated by sha256, regex-scanned the cached unique bundles for:
- Anthropic, OpenAI, Google, AWS, GitHub, Slack, Cohere, DigitalOcean, Render API key patterns
- Vite/Next.js/React build-time-baked env-var secrets
- Hardcoded vendor inference URLs
- Source-map URLs and brand-string leaks
Result: zero matches across all unique bundles. The default sub2api Vue 3 build is sterilized; api_base_url is bound as a runtime form-field in the admin UI, not a build constant. Insight #36 (PaaS build-arg secret baking) does NOT generalize to sub2api OSS deployments.
The strongest bundle (sha256 4b398481…) appeared on all 60 sampled hosts = stock vendor build with no operator customization. Smaller cohorts (17-23 hosts per bundle) represent different release versions, all clean.
Attribution
Geographic distribution (CONFIRMED_AUTH_GATED subset, n=5,848)
- 51% US (2,966)
- 13% CN (749)
- 11% SG (635)
- 9% JP (502)
- 8% HK (441)
The US share is misleading; majority of US-hosted instances run on Chinese-operator hosting (ColoCrossing 417, 16clouds 175, DMIT 160, MULTACOM, PEG TECH) — same .us-shell pattern as the v1 claude-relay disclosure cohort.
Hosting providers (top 5)
- Asia Pacific NIC: 447
- ACEVILLE PTE.LTD. (SG-registered Tencent shell): 285 (v1 disclosure cohort: 5/6 hosts)
- RackNerd LLC: 268
- Tencent Beijing: 256
- HostPapa: 234
The ACEVILLE concentration (285) suggests the same operator preference pattern as the v1 cohort, scaled ~47×.
Operator-brand cert clusters
74 hosts behind CloudFlare Origin Certs (operator hides behind CF; the actual operator is one layer below).
For the visible operators (cert CN ≠ CloudFlare): the long tail is dominated by single-host and small clusters (1-9 hosts each), confirming Insight #39’s Tier-3 fragmentation. No operator brand fronts more than 9 visible hosts in this dataset.
Insight extraction
Candidate Insight #40 — “Auth-on-default thesis shifts rightward in successor OSS generations”
Pattern:
- v1 (claude-relay-service, Node.js, 2024-2025): pool stats publicly readable on
/(no auth) - v2 (sub2api, Go, 2026): all admin/metrics routes require
x-api-keyor JWT;/healthreturns only{"status":"ok"};/v1/modelsreturns the verbatim API_KEY_REQUIRED envelope
Mechanism (hypothesis): OSS authors observe disclosure outcomes from prior versions and harden the surface that drove enforcement (banned accounts, vendor pushback). The v1 disclosure (sent 2026-05-19) generalized rapidly because the v2 was already published. The v2 architecture’s auth-on-default posture is the empirical answer to the v1 exposure class.
Falsifiability: Pick the next-published OSS pooled-account proxy after sub2api. If it ALSO auth-gates metrics by default (and references prior pushback in commit history), the pattern holds. If it reverts to publicly-readable pool stats, the hypothesis is wrong and the lesson did not transfer.
Cross-survey applicability: Test against OpenAI-compat pooled relays (e.g., sub2api’s OpenAI mode), Gemini-compat relays, Vertex relays. Expect similar auth-on-default behavior if the pattern is general.
Methodology observations (not yet rising to numbered insight)
-
Author-claimed deployment count is externally verifiable to within ~2%. Wei-Shaw claims 8,105 sub2api deployments. Shodan-indexed: 7,963. Delta 1.8%, consistent with normal indexing lag. When OSS-author counts match independent measurement, the upstream-attribution claim is honest and can be relied on without re-verification.
-
Title-anchored dorks can be precise IF the title string is shipped verbatim in stock templates. The memory rule says prefer schema-anchor over brand-anchor.
<title>Sub2API - AI API Gateway</title>is both — it’s a brand match AND a template constant. Precision was preserved. -
Install-state conditionally exposes routes. Pre-setup sub2api hosts respond on
/setup/status+/only;/health404s. Post-setup hosts respond on the full API. A/health-only dork would miss the pre-setup population. Population fingerprinting must probe multiple paths to be complete. -
Cert-blind operators exist.
sub2api.t2n.cchas no CT history and serves plain HTTP only. Cert-pivot misses this class entirely. ASN/banner pivot or response-fingerprint pivot is the fallback.
Negative space (what we did NOT find)
- No publicly-readable pool stats anywhere in 7,720 hosts
- No baked API keys in the JS bundle sample (60 hosts)
- No upstream Tier-2 relay URLs hardcoded in default builds (operator entered at runtime)
- No shared infrastructure across the 5 storefront seeds (zero cross-cluster operator overlap)
- No defense-contractor / .gov / .mil hostnames in the cert-CN distribution
- No medical / clinical / .edu institutional hosts beyond what’s expected at random in a 7,720-host sample
Pivot avenues
-
AROSSCLOUD 5-host cluster (CONFIRMED bucket). 5 hosts in AS400619 are non-auth-gated CONFIRMED — possibly an older sub2api version or single-operator misconfig. Worth a focused look at the 5 cert CNs (uily.de, happycodernow.xyz, z-daha.cc, etc.) to check for a common upstream signature.
-
*.helper6.comSETUP_OPEN cluster. Sub2api install-wizard accessible on a branded multi-host cluster suggests deployment-automation script left unfinished. Test whether the wizard endpoint has been hit by anyone else. -
astrum-lab.com PROXIED-front cluster (9 hosts). Front-door enforces access control above the sub2api layer. Worth a closer look at the front-door — is it an operator-built reverse proxy?
-
CLI Proxy API Server cross-contamination. The UNKNOWN bucket caught at least one different-OSS LLM proxy class. Worth a quick survey to identify other OSS proxy projects that happen to ship the sub2api title or template.
-
Cross-vendor Insight #39 applicability check. OpenAI-compat + Gemini-compat resale architectures. If the auth-on-default v2 hardening pattern transfers, expect similar zero-pool-leak finding shapes.
-
sub2api.t2n.cccert-blind cluster. Need ASN/banner pivot to enumerate sibling hosts that deliberately stay off CT.
Artifacts
~/recon/sub2api-population-2026-05-19/
├── candidates.txt # 7,720 host:port from Shodan
├── harvest-raw.jsonl # raw Shodan records
├── harvest.py # paginated harvest harness
├── attribution.csv # per-host ASN/org/country/cert from Shodan
├── attribution-summary.txt # population distribution by all dimensions
├── verify_probe.py # 8-path schema-anchored verification
├── verify-raw.jsonl # full probe responses
├── verify-state.csv # v1 classifier output
├── verify-state-v2.csv # v2 reclassifier output (final)
├── verify-state-v2-samples.txt # 3 sample records per state class
├── reclassify.py # v2 classifier with refined anchors
├── select_aimap_sample.py # stratified 182-IP sampler
├── aimap-sample.txt # 182 IPs for deep-enum
├── aimap-results.json # aimap output (in progress)
├── js-sample.txt # 60 hosts for bundle extraction
├── js_extract.py # bundle harvester + secret scanner
├── js-bundles/ # deduplicated cached bundles
├── js-bundle-attribution.csv # sha → hosts mapping
├── js-findings.csv # secret matches (zero)
├── visorgraph-*.json # 5 cert-pivot graphs
├── cert-pivot-summary.md # VisorGraph subagent report
└── asn_slice.py # attribution slicer
Toolchain provenance
| Stage | Tool | Used | Note |
|---|---|---|---|
| 0 | JAXEN harvest | wrapped (50-cap → direct API loop) | Tool gap #2 logged |
| 1 | aimap fingerprint | running on 182-IP stratified sample | Population sweep impractical at 7,720 |
| 2 | VisorGraph cert-pivot | 5 seeds | Zero cross-overlap (Insight #39 evidence) |
| 3 | aimap-profile | pending (top 5 cluster reps) | |
| 4 | JS bundle extractor | inline replacement for missing vampire.py | Tool gap logged; tool-ideas_js-extractor.txt proposes new build |
| 5 | VisorLog ledger ingest | pending | |
| 6 | VisorScuba compliance | pending | |
| 7 | BARE module ranking | pending | |
| 8 | VisorCorpus | N/A — no unauthed /v1/messages reachable | Justified: ethical-stop + zero corpus-target surface |
| 9-17 | VisorBishop, SD, Goose, menlohunt, recongraph, nu-recon, Plus, RAG, cortex | pending / partial | Run on cluster representatives, not population |
Non-run: VisorHollow (Windows-only binary), VisorAgent (controlled-target-only).
Cross-references
- Insight #25 (auth-on-default thesis): confirmed at 5,848 / 6,083 = 96.1% of verified hosts
- Insight #35 (Shodan pagination depth): Freelance tier survived past page 70 cleanly on a 7,963-host dork; no country-split fallback needed
- Insight #36 (PaaS build-arg secret baking): did NOT generalize to this OSS install class (zero baked secrets in default build)
- Insight #37 (asymmetric auth gating): N/A — sub2api gates both dashboard AND API symmetrically
- Insight #39 (pooled-account attribution laundering): Tier-3 fragmentation confirmed at population scale (zero cross-cluster overlap on cert pivot); Tier-2 substrate hardening represents architectural evolution of the Insight (see candidate Insight #40)
- Reference: project_butterfly2sea_operator: 16clouds.com cluster overlaps prior survey’s Li Peipei ecosystem
- Reference: project_claude_relay_chinese_reseller (v1 disclosure): sub2api is the direct successor; this survey extends the v1 disclosure’s findings to the v2 population
Survey conducted 2026-05-19 by Nuclide research (Nicholas Kloster + Claude). Methodology per ~/.claude/nuclide-internal/METHODOLOGY.md. Tool gaps and improvement notes logged at ~/Desktop/nuclide-logs/.