Most recent
navigate open esc close Corpus index built 2026-06-07 23:58 UTC

← All research

Survey May 15, 2026

Whisper ASR Population Survey (2026-05-15)

NuClide Research · 2026-05-15 (late evening, sixth survey of the day) Closes: Survey 17 batch 1 (Whisper ASR leg). Survey 17 now fully closed across all 3 batches


Summary

Population-scale survey of Whisper ASR (speech-to-text) deployments. The canonical OpenAI Whisper plus the popular forks (whisper.cpp, faster-whisper, WhisperX). aimap fingerprints shipped 2026-05-08; this survey closes the remaining open piece of Survey 17.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

  • 672 (AI Test & Evaluation Specialist): K7003, K7004, K7044, S7068, S7070, S7075, T5858, T5904
  • 733 (AI Risk & Ethics Specialist): K7040, T5854, T5868, T5893
  • overlap (Common AI KSATs (all 5 roles)): K108, K1158, K22, K6311, K6935, K7003, S7065
  • Shodan harvest across 6 dorks → 537 unique candidate IPs
  • Probed via fast_enum_whisper.py (read-only: GET /, /docs, /openapi.json + HEAD on /asr and /v1/audio/transcriptions) in 32.6 seconds at threads=80
  • 230 confirmed unauthenticated Whisper deployments (43% of harvest; 124 dead)
  • 145 hosts with /asr or /v1/audio/transcriptions reachable: unauth POST-audio surface = anyone can submit audio and get transcription on the operator’s compute (63% of confirmed)

Platform breakdown:

PlatformCount%Notes
openai-whisper-asr-webservice13960%The canonical OpenAI-Whisper FastAPI wrapper. Default deploys ship /docs Swagger UI + unauth /asr POST endpoint.
whisper.cpp Server4017%The llama.cpp sibling for speech — server mode exposing /v1/audio/transcriptions OpenAI-compat
faster-whisper2511%CTranslate2-backed Whisper, 4-5× faster than OpenAI’s reference impl
WhisperX42%Word-level timestamps + speaker diarization wrapper
Unidentified Whisper-class2210%Title-tagged but no body-marker; likely customized deploys

The single big finding: openai-whisper-asr-webservice default-deploy pattern

137 hosts ship with the default Swagger UI title "Whisper Asr Webservice - Swagger UI", that’s an out-of-the-box ahmetoner/whisper-asr-webservice deployment (the most popular Whisper wrapper on Docker Hub). The image’s default config exposes:

  • /: landing page
  • /docs: Swagger UI (interactive API browser)
  • /openapi.json: full schema
  • POST /asr: the transcription endpoint, unauth by container default

The deployment template doesn’t ship with authentication. Operators who run docker run -p 9000:9000 onerahmet/openai-whisper-asr-webservice and expose the port are running the unauth surface by design. Same Tier-A “no auth concept in framework default” pattern as Ollama, vLLM, Triton.

Compute-theft surface: anyone can POST a 5-minute audio file and burn the operator’s GPU/CPU on Whisper-large-v3 transcription. At the operator-paid-Compute tier, this is the audio analog of the Ollama /api/generate surface. Same class, different modality.


Cross-survey colocation

PairOverlap
Whisper ∩ Ollama (16,473)13 (multi-modal text+audio AI stack — same VPS)
Whisper ∩ llama.cpp (965)2
Whisper ∩ Docker daemon (286)0
Whisper ∩ voice-agent (184)0

The 13-host Whisper+Ollama overlap is the multi-modal stacked operator catastrophe class:

  • Whisper exposes audio→text transcription (operator’s compute)
  • Ollama exposes text→text inference (operator’s compute, often paid :cloud quota)
  • Together: an attacker can pipe audio → Whisper → Ollama → response end-to-end through the operator’s resources

The 2 Whisper+llama.cpp overlap is a smaller version of the same. Same VPS running both unauth ASR and unauth LLM.


Methodology placement

This survey adds Whisper to the Tier-A “no auth concept in framework default” platform list, joining:

  • Ollama (no auth concept, 100% unauth)
  • vLLM (no auth concept, 100% unauth at population)
  • Triton (no auth concept)
  • Qdrant / ChromaDB / Milvus (no auth concept in defaults)
  • MLflow Tracking (no auth in defaults)
  • Whisper ASR webservice family (no auth in defaults, confirmed at 230 hosts)

Reinforces the auth-on-default thesis on the audio-AI tier (which was previously only tested via the voice-cloning / voice-agent surveys earlier today; Whisper-ASR fills the speech-recognition side of the audio quadrant).


Toolchain Provenance

0. shodan_paginate.py × 6 dorks                          →  537 ip:port unique
1. fast_enum_whisper.py (threads=80, timeout=6s, ~32s)   →  230 confirmed (139 webservice + 40 cpp + 25 faster + 4 X + 22 unid)
2. /docs + /openapi.json + /asr + /v1/audio HEAD probes  →  145 hosts with unauth-POST audio endpoint reachable
3. cross-survey diff vs Ollama/llama.cpp/Docker/voice    →  13 Whisper+Ollama; 2 Whisper+llama.cpp
4. visorlog ingest                                       →  28 net-new events (heavy dedup with prior surveys' IPs)

Honest negative space

  • Model-hint extraction returned mostly empty: openai-whisper-asr-webservice doesn’t expose its loaded model (ASR_MODEL env var) in the /openapi.json description. The operator-workload-corpus axis from Insight #24 doesn’t have a direct analogue here; the loaded model is an internal-only config.
  • No PHI / hospital / clinical hostnames surfaced in the immediate analysis pass (because the hostname-enrichment query had a path error during analysis; full hostname-vs-sensitive-token grep is flagged for follow-up). The healthcare-tier risk class is real for Whisper (hospital transcription deployments) but didn’t surface a specific gov/edu/clinical host this run.
  • 537 candidate corpus may undercount the true Whisper population. http.html:"whisper" returned 2,300 candidates that this survey didn’t probe (Shodan-broader pool). A second-pass on the 2,300 with the full prober is flagged for follow-up.

See also