Arize Phoenix Population Survey — 41/55 Unauthenticated Project Disclosure, NuClide Research

NuClide Research · 2026-06-06

Executive Summary

Arize Phoenix (github.com/Arize-ai/phoenix) is an open-source LLM observability and tracing platform for production AI applications, covering span ingestion, project organization, dataset versioning, and prompt management. The Shodan query "Phoenix" port:6006 indexed 94 instances; 89 unique endpoints were downloaded, and 55 of those responded.

DCWF KSAT coverage

Auto-derived from DCWF AI work-role rule files (ksat-tag).

672 (AI Test & Evaluation Specialist): K7003, K7004, S7068, S7070, S7075, T5858, T5904, T5919
733 (AI Risk & Ethics Specialist): K7040, S7067, T5854, T5868, T5893
overlap (Common AI KSATs, all 5 roles): K1157, K1158, K22, K6311, K6900, K6935, K7003, K942

Of 55 reachable instances, 41 (74.5%) expose /v1/projects without authentication, and 34 (61.8%) expose /v1/users without authentication. The user list endpoint returns account records including creation timestamps and account IDs — a PII disclosure at population scale.

This is a smaller population than Langfuse or RAGFlow, but the auth surface is more severe: where Langfuse and RAGFlow expose only a signup flag, Phoenix exposes the data layer (projects, users) directly. The finding class is LLM02:2025 Sensitive Information Disclosure (current OWASP Top 10 for LLM Applications #2).

Notable institutional findings: Northeastern University (Boston, USA), SENAI (Brazil — Serviço Nacional de Aprendizagem Industrial, the Brazilian national vocational education service).

Methodology

Stage	Action	Tool
Stage 0	Shodan harvest `"Phoenix" port:6006`	shodan CLI (89 records)
Stage 0c	TCP/HTTP liveness via `/healthz`	herald
Stage 1b	Auth-posture probe `/v1/projects` and `/v1/users` (array_nonempty match)	herald phoenix platform config
Stage 3v	Endpoint semantics validated against Phoenix v6.x source (`src/phoenix/server/api/routers/v1/`)	manual review
Stage 12b	Dataset enrichment with country/org from Shodan record	Python + Shodan join

The probes use array_nonempty matching: if the response contains a non-empty data array, the finding fires. Phoenix returns {"data": [], "next_cursor": null} when authenticated routes are queried without credentials in instances where auth is configured — so a non-empty data array is the unauth signal.
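The matching rule described above can be sketched in a few lines of Python. This is an illustration only, not herald's implementation; the function name `array_nonempty` is borrowed from the probe label, and the `key` parameter is an assumption added for clarity.

```python
import json


def array_nonempty(body: str, key: str = "data") -> bool:
    """Return True when body is a JSON object whose `key` holds a non-empty list."""
    try:
        doc = json.loads(body)
    except (ValueError, TypeError):
        # HTML error pages, empty bodies, and other non-JSON never fire.
        return False
    if not isinstance(doc, dict):
        return False
    value = doc.get(key)
    return isinstance(value, list) and len(value) > 0
```

Under this rule the empty envelope `{"data": [], "next_cursor": null}` does not fire, while any response carrying at least one record does.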

NuClide restraint: the account count is the only /v1/users field consumed by herald. Schema and PII details were not extracted, per the restraint ethic; the exposed names themselves ARE the finding.

Population Results

Metric	Count	Rate
Shodan-indexed	94	—
Unique endpoints downloaded	89	—
Reachable (HEALTH_OPEN)	55	61.8% of downloaded (58.5% of indexed)
`/v1/projects` unauth (PROJECTS_UNAUTH)	41	74.5% of reachable
`/v1/users` unauth (USERS_UNAUTH)	34	61.8% of reachable

The PROJECTS_UNAUTH > USERS_UNAUTH gap (41 vs 34) is consistent: 34 instances return user records, so they have configured user accounts, while the gap of 7 matches instances that expose projects but have no users at all. The latter looks like a fresh install with no auth requirement, which is even more permissive.
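The rates in this section can be re-derived from the raw counts. The snippet below is a small arithmetic check, not part of the survey toolchain; the helper name `pct` and the dictionary keys are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Raw counts as reported in the Population Results table.
indexed, downloaded, reachable = 94, 89, 55
projects_unauth, users_unauth = 41, 34

rates = {
    "reachable / downloaded": pct(reachable, downloaded),
    "reachable / indexed": pct(reachable, indexed),
    "projects_unauth / reachable": pct(projects_unauth, reachable),
    "users_unauth / reachable": pct(users_unauth, reachable),
    "users_unauth / projects_unauth": pct(users_unauth, projects_unauth),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

Keeping the denominator in each label makes it explicit which population a rate refers to: 55 of 89 downloaded endpoints is 61.8%, whereas 55 of 94 indexed instances is 58.5%.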

Notable Findings

Northeastern University — `129.10.224.226:6006` (HIGH)

Northeastern University (Boston, USA, AS161). Phoenix instance with two projects: Essaybot and default. The Essaybot project name suggests a student essay-grading or writing-assistant LLM application — potentially handling student work products, which is FERPA-relevant data under US education privacy law.

Project names accessible unauthenticated. User records (count = 2) accessible. Span/trace data not exercised (restraint).

Disclosure recipient: oirc@northeastern.edu (Northeastern Office of Information Security)

SENAI Brazil — `200.9.65.187:6006` (HIGH)

Serviço Nacional de Aprendizagem Industrial (SENAI) is Brazil’s national industrial apprenticeship service — vocational education for ~3 million students annually, operated by the Brazilian National Confederation of Industry (CNI). Phoenix instance on SENAI infrastructure with 2 projects exposed and 2 users disclosed.

Vocational education context: Brazilian LGPD applies. Disclosure recipient: CERT.br for coordination, with SENAI national IT direct contact.

`37.27.248.144:6006` Hetzner Helsinki — 21 Projects Disclosed (HIGH)

Single Phoenix instance on Hetzner Helsinki (AS24940) exposing 21 distinct project names unauthenticated. The largest project-count disclosure in the population. Operator not identified from the Shodan record (no TLS cert, no PTR).

Operator profiling: 21 projects suggests a sophisticated production deployment — either a multi-tenant SaaS offering Phoenix-as-a-service to downstream customers, or a single large engineering org with many parallel LLM applications. Either case elevates concern.

Scaleway Paris Cluster — 7 Instances (MEDIUM-PATTERN)

7 Phoenix instances on Scaleway France (163.172.x, 51.15.x, 51.158.x, 51.159.x), all PROJECTS_UNAUTH, most also USERS_UNAUTH. The IP clustering suggests either a single Scaleway customer running a fleet or a Scaleway tenant pattern. Worth follow-up profiling to determine if this is a single operator.

Google Cloud US Cluster — 7 Instances (MEDIUM-PATTERN)

7 Phoenix instances on Google LLC (34.x, 35.x), all PROJECTS_UNAUTH. Same pattern observation. One instance (34.133.205.22) discloses 8 projects.

Geographic Distribution (Findings)

Country	PROJECTS_UNAUTH hosts
United States	12
France (Scaleway)	8
Germany (Hetzner / Contabo / IONOS)	7
Finland (Hetzner)	1
Brazil (SENAI)	1
Sweden	1
Poland	1
China (Aliyun, UCloud)	2
Vietnam	1
India	1
Indonesia	1
Canada	1
Other	5
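A per-country tally like the one above can be produced by joining the flagged endpoints against the Shodan records. The sketch below is an assumption about how such a join might look, not the actual Stage 12b script: `tally_by_country` is a hypothetical helper, and it relies on the `ip_str`, `port`, and `location.country_name` fields that Shodan banners normally carry.

```python
from collections import Counter
from typing import Iterable


def tally_by_country(banners: Iterable[dict], flagged: set[str]) -> Counter:
    """Count flagged ip:port endpoints per country, once per endpoint."""
    counts: Counter = Counter()
    seen: set[str] = set()
    for banner in banners:
        endpoint = f"{banner.get('ip_str')}:{banner.get('port')}"
        if endpoint not in flagged or endpoint in seen:
            continue  # skip unflagged hosts and duplicate banners
        seen.add(endpoint)
        location = banner.get("location") or {}
        counts[location.get("country_name") or "Unknown"] += 1
    return counts
```

Deduplicating on ip:port matters because a Shodan export can contain several banners for the same endpoint, which would otherwise inflate a country's count.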

Unlike Langfuse and RAGFlow (CN-dominant and CN-second respectively), Phoenix’s population is US-Western-Europe dominated. This reflects Arize’s commercial customer base — primarily US enterprise AI teams using the open-source Phoenix for self-hosted observability alongside Arize’s paid product.

Comparison: Auth Surface Severity

Platform	Auth signal	Data exposed
Langfuse	`signUpDisabled: false` flag	Registration possible; trace data behind workspace auth
RAGFlow	`registerEnabled: 1` flag	Registration possible; knowledge base behind tenant auth
Open WebUI	`features.auth: false`	Full chat interface, model inference
Dify	`is_allow_register: true`	Registration possible; apps behind workspace auth
Phoenix	`/v1/projects`, `/v1/users` return data	Direct project + user enumeration; spans potentially accessible

Phoenix’s exposure is more severe than that of the registration-flag platforms: while Langfuse, RAGFlow, and Dify gate the data layer even when registration is open, Phoenix exposes project names and user account metadata unauthenticated, requiring no account creation step at all.

Disclosure Pipeline

Finding	Tier	Recommended action
Northeastern University	HIGH (FERPA-class)	oirc@northeastern.edu
SENAI Brazil	HIGH (LGPD-class)	CERT.br + SENAI IT
`37.27.248.144` (21 projects)	HIGH (operator unknown)	Vendor mediated via Arize
Scaleway Paris cluster (7)	MEDIUM	Profiling first, then per-tenant
Google Cloud cluster (7)	MEDIUM	Profiling first
41 total PROJECTS_UNAUTH	UPSTREAM	Arize: change default to auth-required for `/v1/projects` and `/v1/users`

The upstream remediation is the highest-leverage. Arize Phoenix currently ships with PHOENIX_ENABLE_AUTH defaulting to false. Flipping that default is a one-line change that would protect new deployments; instances that are already deployed still need the per-operator fix.

Remediation (per-operator)

# Phoenix environment:
PHOENIX_ENABLE_AUTH=true
PHOENIX_SECRET=<strong-random-secret>
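One way to produce a suitable PHOENIX_SECRET value is with Python's standard `secrets` module. This is a minimal sketch; the helper name `generate_phoenix_secret` is hypothetical, and the acceptance rule it enforces (at least 32 characters, with at least one digit and one lowercase letter) is an assumption that should be checked against the documentation for the deployed Phoenix version.

```python
import secrets


def generate_phoenix_secret(nbytes: int = 32) -> str:
    """Return a random hex string with at least one digit and one lowercase letter."""
    while True:
        candidate = secrets.token_hex(nbytes)  # 2 * nbytes lowercase hex characters
        has_digit = any(c.isdigit() for c in candidate)
        has_lower = any(c.islower() for c in candidate)
        if has_digit and has_lower:
            return candidate


if __name__ == "__main__":
    print(f"PHOENIX_SECRET={generate_phoenix_secret()}")
```

The retry loop almost never runs more than once; it only guards against the rare draw that contains no digits or no letters.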

Verify:

# -i prints the response status line and headers
curl -i http://IP:6006/v1/projects
# Expected: 401 or 403, NOT a populated data array
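For operators checking more than one host, the same verification could be scripted. The following is a sketch under stated assumptions: `fetch` and `classify_response` are hypothetical helpers, and the three verdict labels are illustrative rather than herald finding tags.

```python
import json
import urllib.error
import urllib.request


def fetch(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status, body); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def classify_response(status: int, body: str) -> str:
    """Map a /v1/projects response to 'protected', 'exposed', or 'inconclusive'."""
    if status in (401, 403):
        return "protected"
    if status == 200:
        try:
            doc = json.loads(body)
        except ValueError:
            return "inconclusive"
        data = doc.get("data") if isinstance(doc, dict) else None
        if isinstance(data, list) and data:
            return "exposed"  # populated data array without credentials
    return "inconclusive"
```

A call such as `classify_response(*fetch("http://IP:6006/v1/projects"))` should return "protected" after remediation. Connection failures raise `urllib.error.URLError` and are left to the caller to handle.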

Toolchain Provenance

Step 0:    shodan download '"Phoenix" port:6006' (89 records)
Step 0c:   IP extraction → ip-port.txt (89 unique)
Step 1b:   herald -platform phoenix < ip-port.txt
           - probe id projects_unauth: /v1/projects array_nonempty
           - probe id users_unauth: /v1/users array_nonempty
           - probe id health_open: /healthz body_contains "OK"
Step 3v:   Endpoint semantics verified against Arize/phoenix v6.x source
Step 12b:  This document

Tool: herald v0.1.1 (github.com/nuclide-research/herald). Phoenix config added with three probes covering the LLM02-class disclosure surface.

Research Contribution

Phoenix is the third same-day platform survey (after Langfuse and RAGFlow). Unlike the registration-flag findings, Phoenix surfaces a direct data-layer disclosure: the platform’s default deployment exposes the actual data model (projects, users) without any auth step. This maps cleanly to OWASP LLM Top 10 (2025) entry LLM02 Sensitive Information Disclosure, which moved from #6 in the previous list to #2 in the 2025 revision. Incidents of this kind, where enterprise AI deployments leak observability data publicly, are consistent with that higher ranking.

The Phoenix population is also the smallest of the three (89 vs 1,140 vs 1,905), suggesting either a more sophisticated user base who hardens by default, or a less mature deployment cycle where many operators still run development instances unauthenticated. The 74.5% PROJECTS_UNAUTH rate suggests the latter.

The three-platform same-day corpus (Langfuse 88.9%, RAGFlow 87.2%, Phoenix 74.5%) is a strong empirical baseline for the auth-permissive-default cohort hypothesis (Candidate Insight #76).