Voice/Audio AI — Shodan Query Catalog

Source: https://github.com/nuclide-research/AI-LLM-Infrastructure-OSINT/blob/main/shodan/queries/voice-audio-ai-queries

Generated: 2026-05-27 from pre-survey OSINT pass (12 platforms)

See: data/platform-intel/voice-audio-ai-osint-2026-05-27.md for full intel


Whisper / faster-whisper / whisper.cpp

Auth default: none (no auth concept across all variants)

Exposure class: Transcribed audio content, free GPU compute abuse, audio upload cache

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | "openai-whisper-asr-webservice" port:9000 | Exact Docker image banner from onerahmet/openai-whisper-asr-webservice; appears in HTTP server header | Low |
| secondary | "whisper.cpp" port:8080 | whisper.cpp literal string in HTTP response + canonical port | Low |
| title-filtered | http.title:"Whisper" "uvicorn" -product:"Microsoft IIS" | Gradio/FastAPI Whisper UIs; uvicorn confirms Python stack | Med |
| faster-whisper | http.html:"faster-whisper" -http.html:"wakehealth" | faster-whisper string in HTML source | Med |
| wyoming-tcp | port:10300 | Wyoming protocol port (TCP, not HTTP; limited Shodan coverage) | High |
| cpp-inference | "whisper.cpp" "/inference" | C++ server endpoint literal in banner | Low |
| html-anchor | http.html:"WhisperX" | WhisperX word-level alignment variant | Low |

identity-probe: GET / on port 9000 → "ASR" in title; POST /inference on 8080 → JSON {"text":"..."}

FP note: http.title:"Whisper" alone returns Wake Forest WHISPER clinical portal (ColdFusion/IIS), government-authorized-use banners, and unrelated chat apps. Required filters: -http.html:"wakehealth" -http.html:"actLogin.cfm" -product:"Microsoft IIS". Never run the bare title dork without anchors.
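
The anchoring rule in the FP note can be enforced mechanically before a query is run. The sketch below is a minimal illustration, not part of the catalog: the helper name `anchor_title_dork` is hypothetical, and it only edits the query string. It does not contact Shodan.

```python
# Exclusion filters taken verbatim from the FP note above.
REQUIRED_EXCLUSIONS = (
    '-http.html:"wakehealth"',
    '-http.html:"actLogin.cfm"',
    '-product:"Microsoft IIS"',
)


def anchor_title_dork(query: str) -> str:
    """Append every required exclusion filter that the query is missing.

    Hypothetical helper: it is a plain string operation and assumes the
    filters are written exactly as they appear in the FP note.
    """
    parts = [query.strip()]
    for flt in REQUIRED_EXCLUSIONS:
        if flt not in query:
            parts.append(flt)
    return " ".join(parts)


if __name__ == "__main__":
    print(anchor_title_dork('http.title:"Whisper"'))
```

Filters that are already present are left where they are, so the title-filtered row from the table gains only the two filters it lacks.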


Coqui TTS

Auth default: none (binds 0.0.0.0:5002, no auth shipped)

Exposure class: Free TTS compute, deepfake audio generation, speaker-embedding upload path traversal

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:5002 http.html:"api/tts" | Port + endpoint path combination; /api/tts is Coqui-specific at this port | Low |
| secondary | port:5002 "coqui" | Port + brand string in banner | Low |
| openai-compat | port:5002 http.html:"v1/audio/speech" | OpenAI-compatible endpoint path in HTML docs | Low |
| marytts | port:5002 http.html:"/locales" | MaryTTS-compatible endpoint on Coqui server | Med |
| port-only | port:5002 http.html:"tts" | Broad; catches Coqui + AllTalk + Mozilla TTS legacy | Med |

identity-probe: GET /api/tts?text=test → 200 + audio/wav; GET / → HTML containing "Coqui"

FP note: Port 5002 is also used by VMware vCenter; http.html:"api/tts" kills VMware FPs cleanly. Port 5002 + "tts" alone still catches Mozilla TTS legacy, which is acceptable because it falls in the same exposure class.
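
The two anchors in the FP note give a simple triage order for page bodies already collected from port 5002. The sketch below is illustrative only: the function name and the three labels are hypothetical, and the input is assumed to be the raw HTML string of a result.

```python
def classify_port_5002(html: str) -> str:
    """Label a port-5002 page body using the anchors from the FP note.

    Hypothetical labels:
      coqui-candidate       body contains the api/tts endpoint path
      tts-legacy-candidate  body mentions tts but not api/tts
      unrelated             neither anchor is present
    """
    body = html.lower()
    if "api/tts" in body:
        return "coqui-candidate"
    if "tts" in body:
        # Broad match, for example Mozilla TTS legacy (same exposure class).
        return "tts-legacy-candidate"
    # Other services that happen to share the port.
    return "unrelated"


if __name__ == "__main__":
    print(classify_port_5002('<form action="/api/tts">'))
```

Checking the narrow anchor first keeps the broad "tts" match from hiding the stronger signal.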


AllTalk TTS

Auth default: none (no auth concept; API port 7851 + Gradio port 7852 both open)

Exposure class: Full TTS engine control, voice inventory, GPU/DeepSpeed runtime control, RVC voice conversion

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:7851 http.json:"engines_available" | AllTalk-specific JSON field in /api/currentsettings response | Low |
| secondary | port:7851 http.json:"current_engine_loaded" | Second AllTalk-unique JSON field | Low |
| tertiary | port:7851 http.json:"manufacturer_name" | Third distinctive field; value is always "Coqui" | Low |
| gradio-ui | port:7852 http.html:"AllTalk" | Gradio UI port with AllTalk title | Low |
| combined | port:7851 "alltalk" | Brand string in banner | Low |

identity-probe: GET /api/currentsettings → 200 + JSON with engines_available, current_engine_loaded, manufacturer_name:"Coqui"

FP note: Port 7851 is distinctive — low ambient use. engines_available as a JSON field is AllTalk-unique. No significant FP classes identified.
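
The three JSON fields named in the table can be checked together on a response body that has already been captured. This is a minimal sketch under that assumption; the function name `looks_like_alltalk` is hypothetical, and the sample values in the usage lines are invented for illustration.

```python
import json

# Field names taken from the table above.
ALLTALK_FIELDS = ("engines_available", "current_engine_loaded", "manufacturer_name")


def looks_like_alltalk(body: str) -> bool:
    """Return True if the body is a JSON object carrying all three fields.

    Hypothetical helper: it inspects a string only and makes no request.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(field in data for field in ALLTALK_FIELDS)


if __name__ == "__main__":
    # Illustrative payload; real field values will differ.
    sample = json.dumps({
        "engines_available": ["example"],
        "current_engine_loaded": "example",
        "manufacturer_name": "Coqui",
    })
    print(looks_like_alltalk(sample))
```

Requiring all three fields is stricter than any single-field dork, which suits a confirmation step after the search.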


RVC (Retrieval-based Voice Conversion)

Auth default: none (Gradio, no auth; --host 0.0.0.0 exposes all interfaces)

Exposure class: Voice model files, arbitrary voice conversion, training pipeline, uploaded celebrity voice models

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:7865 http.html:"Retrieval-based-Voice-Conversion" | Full project name in Gradio HTML; specific to RVC | Low |
| secondary | port:7865 http.html:"rvc-webui" | Docker image name appears in source | Low |
| rvc-boss | http.html:"RVC-Boss" port:7865 | GPT-SoVITS/RVC-Boss variant | Low |
| title | port:7865 http.title:"RVC" | UI title; shorter but FP-prone without port anchor | Med |
| gradio-api | port:7865 http.html:"/run/predict" | Gradio API endpoint always present; anchored to RVC port | Med |
| port-7897 | port:7897 http.html:"voice" | Alternative RVC port in some forks | High |

identity-probe: GET / → Gradio HTML; GET /info → JSON {"label":"RVC..."}

FP note: Gradio probes upward from 7860 when that port is occupied, so other Gradio apps can land on 7865. http.html:"Retrieval-based-Voice-Conversion" is the reliable discriminator; never drop it.


GPT-SoVITS

Auth default: none (API localhost by default but Docker exposes 0.0.0.0 on all five ports; four WebUI ports have unauthenticated RCE CVEs)

Exposure class: Voice cloning (1-min audio), model file path traversal, command injection RCE (CVE-2025-49833/34/35/36)

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:9880 http.html:"GPT-SoVITS" | API port + brand name; tight combination | Low |
| secondary | port:9872 http.html:"GPT-SoVITS" | Inference WebUI port | Low |
| api-endpoint | port:9880 http.html:"/set_gpt_weights" | API endpoint path distinctive to GPT-SoVITS | Low |
| docker-range | port:9874 http.html:"GPT-SoVITS" | Main training WebUI port | Low |
| cve-rce | port:9871 http.html:"GPT-SoVITS" | Proofreading tool port; CVE-affected | Low |
| broad | http.html:"GPT-SoVITS" | Any port; catches all five exposed ports | Med |

identity-probe: GET / on 9872 → Gradio HTML with "GPT-SoVITS" title; GET /control on 9880 → JSON {"message":"..."}

FP note: GPT-SoVITS is a distinctive brand string with no collision class. Ports 9871-9874 and 9880 are specific to this project. Low FP risk across all queries.

CVE note: Hosts matching port 9871-9874 dorks are candidates for CVE-2025-49833/34/35/36 (unauthenticated RCE via command injection). Verify before asserting exploitability.
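
The port-to-candidate mapping in the CVE note can be written down as a small lookup, which keeps "candidate" and "verified" as separate states. This is a sketch only: the function name, the role labels, and the dictionary layout are hypothetical, and the four CVE identifiers are the expanded form of the shorthand used above.

```python
# Ports as described in this section: four WebUI ports plus the API port.
WEBUI_PORTS = range(9871, 9875)  # 9871-9874 inclusive
API_PORT = 9880

# Expanded from the shorthand CVE-2025-49833/34/35/36 used in this catalog.
CANDIDATE_CVES = (
    "CVE-2025-49833",
    "CVE-2025-49834",
    "CVE-2025-49835",
    "CVE-2025-49836",
)


def triage_gpt_sovits_port(port: int) -> dict:
    """Return a triage record for a GPT-SoVITS match on the given port.

    Hypothetical record layout. "verified" always starts False because a
    dork match is only a candidate until it has been checked separately.
    """
    if port in WEBUI_PORTS:
        return {"role": "webui", "candidate_cves": list(CANDIDATE_CVES), "verified": False}
    if port == API_PORT:
        return {"role": "api", "candidate_cves": [], "verified": False}
    return {"role": "unknown", "candidate_cves": [], "verified": False}


if __name__ == "__main__":
    print(triage_gpt_sovits_port(9872))
```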


Pipecat

Auth default: none (WebSocket transport; “best suited for prototyping and controlled network environments” per docs)

Exposure class: Live voice agent interaction, LLM API key exposure via file-read CVEs, audio stream interception

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.html:"pipecat-ai" | GitHub org name in page source; Pipecat-specific | Low |
| secondary | http.html:"pipecat" "daily.co" | Framework + parent company API co-occurrence in demos | Low |
| websocket | port:8765 "pipecat" | Default WebSocket transport port + brand | Med |
| gradio | port:7860 http.html:"pipecat" | Gradio-wrapped Pipecat demos | Med |

identity-probe: WebSocket connect port 8765 → server accepts upgrade; HTTP GET / → FastAPI docs page if dev runner active

FP note: Port 8765 is widely used for WebSocket development servers. Without the "pipecat" anchor this is useless. http.html:"pipecat-ai" is the cleanest signal.


LiveKit

Auth default: JWT required for room operations, but the health endpoint on port 7880 is accessible without it, and many self-hosters expose the API directly without a reverse proxy

Exposure class: Room metadata, participant telemetry via Prometheus (port 6789), TURN credential leaks, SIP trunk exposure

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.headers:"X-LiveKit-Server" port:7880 | Response header confirmed in LiveKit SDK docs; version-bearing | Low |
| secondary | port:7880 "livekit" | Port + brand string in banner | Low |
| html | port:7880 http.html:"livekit" | HTML content on API port | Low |
| prometheus | port:6789 "livekit" | Prometheus metrics endpoint, unauthenticated by default | Low |
| sip | port:5060 "livekit" | SIP integration, optional | Med |

identity-probe: GET / on port 7880 → HTTP 200 + X-LiveKit-Server: livekit/x.x.x response header

FP note: X-LiveKit-Server header is the near-zero-FP fingerprint. Port 7880 has low ambient traffic. Prometheus on 6789 is unauthenticated — separate survey target worth running independently.
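
Because the header is described as version-bearing, the version can be pulled out of headers that were already recorded. The sketch below assumes the value follows the `livekit/x.x.x` shape shown in the identity probe; the function name is hypothetical and the version in the usage line is a placeholder.

```python
import re
from typing import Mapping, Optional

# Assumed value shape, based on the identity-probe line: livekit/<version>
_LIVEKIT_VALUE = re.compile(r"^livekit/(\S+)$", re.IGNORECASE)


def livekit_version(headers: Mapping[str, str]) -> Optional[str]:
    """Return the version from an X-LiveKit-Server header, or None.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    for name, value in headers.items():
        if name.lower() == "x-livekit-server":
            match = _LIVEKIT_VALUE.match(value.strip())
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    # Placeholder version string, for illustration only.
    print(livekit_version({"X-LiveKit-Server": "livekit/1.2.3"}))
```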


Deepgram Self-Hosted

Auth default: API key required (enterprise licensing), but the identifying response headers are returned on every request, including requests that fail authentication

Exposure class: Enterprise transcription at no cost if misconfigured; model/version disclosure; medical/legal audio data

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8080 http.headers:"dg-request-id" | Deepgram-specific response header present on all requests | Low |
| secondary | port:8080 http.headers:"dg-model-name" | Second Deepgram-specific header | Low |
| tertiary | port:8080 "deepgram" | Brand in banner | Low |
| license-proxy | port:8443 "deepgram" | License proxy port | Low |

identity-probe: GET /v1/listen → response headers contain dg-request-id, dg-model-name, dg-model-uuid regardless of auth status

FP note: dg-request-id and dg-model-name are Deepgram-specific header names confirmed in official API documentation. Near-zero FP. Enterprise deployment profile means small population.
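
Since the headers are said to appear regardless of auth status, a fingerprint check should look at header names and ignore the status code. The sketch below illustrates that choice on recorded responses; both function names are hypothetical.

```python
from typing import Iterable, List

# Header names listed in the identity probe above.
DEEPGRAM_HEADERS = ("dg-request-id", "dg-model-name", "dg-model-uuid")


def deepgram_markers(header_names: Iterable[str]) -> List[str]:
    """Return the Deepgram-specific header names present, in catalog order."""
    seen = {name.strip().lower() for name in header_names}
    return [header for header in DEEPGRAM_HEADERS if header in seen]


def is_deepgram(status: int, header_names: Iterable[str]) -> bool:
    """Fingerprint on headers alone.

    The status argument is accepted but deliberately ignored, because the
    catalog states the headers are present even on auth-failed requests.
    """
    return bool(deepgram_markers(header_names))


if __name__ == "__main__":
    print(is_deepgram(401, ["Content-Type", "dg-request-id"]))
```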


XTTS-v2 / xtts-api-server

Auth default: none (FastAPI, no auth; auto-docs at /docs always open)

Exposure class: Voice cloning synthesis, speaker reference audio uploads, arbitrary text synthesis in cloned voice

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:8020 http.html:"tts_to_audio" | Endpoint name from FastAPI auto-docs; XTTS-specific | Low |
| secondary | port:8020 "/docs" http.html:"xtts" | FastAPI docs page with XTTS brand | Low |
| tertiary | port:8020 http.json:"language_iso_codes" | JSON field from /languages endpoint | Low |
| broad | port:8020 http.html:"tts" | Port + TTS term; catches all xtts-api-server variants | Med |
| docs-leak | port:8020 "swagger" | FastAPI Swagger docs page open | Med |

identity-probe: GET /docs → Swagger UI listing tts_to_audio, tts_to_file endpoints; GET /languages → JSON {"language_iso_codes":[...]}

FP note: Port 8020 has low ambient usage. tts_to_audio as an HTML string in FastAPI docs is XTTS-api-server-specific. Low FP risk.
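
The endpoint names behind the Swagger UI also appear in the machine-readable schema that FastAPI serves at /openapi.json by default. Whether a given xtts-api-server deployment leaves that route enabled is an assumption here. The sketch below works on an already-parsed schema and /languages response; the function names and the sample paths in the usage lines are hypothetical.

```python
from typing import Any, List, Mapping

# Endpoint names from the identity probe above.
XTTS_ENDPOINT_NAMES = ("tts_to_audio", "tts_to_file")


def xtts_paths(openapi: Mapping[str, Any]) -> List[str]:
    """Return schema paths that mention an XTTS endpoint name, sorted."""
    paths = openapi.get("paths", {})
    return sorted(
        path for path in paths
        if any(name in path for name in XTTS_ENDPOINT_NAMES)
    )


def has_language_codes(languages: Any) -> bool:
    """True if a parsed /languages response carries a language_iso_codes list."""
    return isinstance(languages, dict) and isinstance(
        languages.get("language_iso_codes"), list
    )


if __name__ == "__main__":
    # Illustrative schema fragment; exact path spellings are assumptions.
    schema = {"paths": {"/tts_to_audio/": {}, "/languages": {}}}
    print(xtts_paths(schema))
```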


SpeechBrain

Auth default: varies (toolkit; HuggingFace-hosted requires token; self-hosted wrappers typically none)

Exposure class: Speaker biometric verification, emotion recognition, speech enhancement, ASR (biometric-class data)

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.html:"speechbrain.pretrained" port:7860 | Python module import path in Gradio demo source | Low |
| secondary | http.html:"speechbrain" "gradio" | Toolkit name + Gradio framework co-occurrence | Med |
| title | http.title:"SpeechBrain" | UI title if demo page uses it | Med |
| huggingface | http.html:"speechbrain" "huggingface" | HF model hub path in source | Med |

identity-probe: GET / → Gradio HTML containing "speechbrain" in source; no standard REST API probe

FP note: speechbrain.pretrained is a Python import path specific to SpeechBrain’s interface pattern. Low FP. Generic http.html:"speechbrain" at port 7860 is reasonable but verify against HF-hosted demos (those should be auth-gated).


Tortoise TTS

Auth default: none (Gradio; share=True is commonly enabled in tutorials, which creates a public Gradio share tunnel)

Exposure class: High-quality voice cloning synthesis, voice_samples/ directory with stored reference audio clips

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | http.html:"tortoise-tts" port:7860 | GitHub repo slug in Gradio source + default port | Low |
| secondary | http.html:"voice_samples" "tortoise" port:7860 | voice_samples directory reference + brand | Low |
| flask-api | port:5000 http.html:"/synthesize" "tortoise" | Flask API endpoint + brand | Low |
| title | http.title:"Tortoise" -http.html:"tortoise.com" -http.html:"investment" | UI title minus financial services FPs | Med |
| tts-webui | http.html:"tortoise" http.html:"tts-webui" | TTS-WebUI (rsxdalv) wrapper often used for Tortoise | Low |

identity-probe: GET / → Gradio HTML with "tortoise" in source; GET /info → JSON Gradio metadata

FP note: http.title:"Tortoise" hits tortoise.com (investment firm) and other tortoise-themed pages. The voice_samples anchor is strong — it’s a Tortoise-specific directory name used in UI file pickers.


Bark (Suno TTS)

Auth default: none (FastAPI/Uvicorn; Docker runs 0.0.0.0:5000; no auth in canonical implementation)

Exposure class: Free audio synthesis (voice + music + sound effects), deepfake audio generation including emotional/paraverbal audio

| Label | Query | Rationale | FP Risk |
| --- | --- | --- | --- |
| primary | port:5000 http.json:"bark-inference" | Endpoint name appears in FastAPI docs JSON | Low |
| secondary | port:5000 "suno-ai" http.html:"bark" | Docker image slug + brand | Low |
| github-slug | http.html:"suno-ai/bark" | GitHub path appears in requirements/source | Low |
| gradio | port:7860 http.html:"suno-ai" | Gradio-wrapped Bark demos | Med |

identity-probe: GET /docs → Swagger UI with /bark-inference endpoint; POST /bark-inference JSON {"text":"hi"} → 200 + audio/wav

FP note: Port 5000 is Flask default and heavily used. http.json:"bark-inference" anchors specifically to the Bark FastAPI docs response. Without this anchor, port 5000 + "bark" produces massive FPs (bark as in dog bark, tree bark, barking mad, etc.).


Compound / Multi-Platform Dorks

| Label | Query | Covers | FP Risk |
| --- | --- | --- | --- |
| voice-cloning-stack | http.html:"GPT-SoVITS" OR http.html:"rvc-webui" OR http.html:"so-vits-svc" | All three major voice cloning projects | Low |
| tts-ports | port:5002 OR port:7851 OR port:8020 http.html:"tts" | Coqui/AllTalk/XTTS ports with TTS anchor | Med |
| gradio-voice | port:7860 http.html:"voice" http.html:"cloning" | Any Gradio voice cloning app | Med |
| whisper-family | "openai-whisper-asr-webservice" OR "whisper.cpp" OR "faster-whisper" | All three Whisper variants | Low |
| livekit-full | "livekit" port:7880 OR port:6789 | LiveKit API + Prometheus | Low |
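
The compound rows share one shape (label, query, coverage, FP risk), so they can be held in a small structure and filtered by risk. The sketch below is one possible representation, not the catalog's own tooling: the `Dork` class and both helpers are hypothetical. It only assembles strings and does not check how Shodan evaluates OR or operator precedence.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Dork:
    """One catalog row (hypothetical structure mirroring the table columns)."""
    label: str
    query: str
    covers: str
    fp_risk: str  # "Low", "Med", or "High"


def or_join(terms: Iterable[str]) -> str:
    """Join search terms with OR, matching the compound rows above."""
    return " OR ".join(terms)


CATALOG = [
    Dork(
        "voice-cloning-stack",
        or_join(['http.html:"GPT-SoVITS"', 'http.html:"rvc-webui"', 'http.html:"so-vits-svc"']),
        "All three major voice cloning projects",
        "Low",
    ),
    Dork(
        "whisper-family",
        or_join(['"openai-whisper-asr-webservice"', '"whisper.cpp"', '"faster-whisper"']),
        "All three Whisper variants",
        "Low",
    ),
    Dork(
        "gradio-voice",
        'port:7860 http.html:"voice" http.html:"cloning"',
        "Any Gradio voice cloning app",
        "Med",
    ),
]


def labels_by_risk(catalog: Iterable[Dork], fp_risk: str) -> List[str]:
    """Return the labels of all rows with the given FP risk, in catalog order."""
    return [dork.label for dork in catalog if dork.fp_risk == fp_risk]


if __name__ == "__main__":
    print(labels_by_risk(CATALOG, "Low"))
```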