Triton Inference Server, Model Layer, NuClide Stack

What it is

Triton is NVIDIA’s enterprise inference server: the heavyweight runtime designed for production model serving across NVIDIA’s GPU lineup, with CPU-only deployment also supported. It supports TensorRT, ONNX Runtime, PyTorch, TensorFlow, vLLM, and Python backends; it chains models into ensemble pipelines; and it exposes two protocols, a binary one (gRPC) and HTTP/REST. When you see a tritonserver container in a Kubernetes deployment, you’re looking at someone serious about ML throughput.

What goes wrong

Triton’s HTTP endpoints (/v2/health/ready, /v2/repository/index, /v2/models/{name}) are unauthenticated by design (NVIDIA’s position: enforce auth at the ingress). The model repository index is a verbatim list of model names, their versions, and their load state; the per-model metadata adds the backend. For commercial operators these names give away intellectual property: fraud-detection-v3, recommender-cold-start-v7, biometric-match-v2. We’ve found Triton instances exposing classifier models that are clearly pulled from the operator’s product, alongside the safety classifiers the operator hopes nobody bypasses.
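
As a rough illustration of what "unauthenticated" means in practice, the sketch below requests /v2/health/ready with no credentials and sorts the HTTP status into a posture label. The function names and the labels ("open", "auth-enforced", "inconclusive", "unreachable") are hypothetical, chosen for this example; the mapping from status codes to labels is an assumption, not a Triton or NVIDIA convention.

```python
import urllib.error
import urllib.request


def classify_auth_posture(status):
    """Map an HTTP status from an uncredentialed request to a posture label."""
    if status == 200:
        return "open"            # answered without any credentials
    if status in (401, 403):
        return "auth-enforced"   # something in front of Triton demanded auth
    return "inconclusive"        # e.g. server not ready, or an unrelated service


def probe_ready(base_url, timeout=5.0):
    """Send one uncredentialed GET to /v2/health/ready and classify the result."""
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_auth_posture(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the status code is still useful.
        return classify_auth_posture(err.code)
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return "unreachable"
```

A 200 here only shows that the health endpoint answered; a gateway that redirects to a login page can also end in a 200, so the result is a hint to be confirmed against the /v2 banner rather than a verdict.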

How we test

We hit /v2 for the version banner, /v2/repository/index for the catalogue, and /v2/models/{name} for the model metadata (which exposes the backend and the input/output tensor names, datatypes, and shapes, sufficient to reverse-engineer the model’s purpose without ever invoking it). When the model is a published architecture (a known LLM, a known vision backbone), we do not issue inference requests. When it’s a custom fine-tune, we capture only the metadata.
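
The three requests above can be sketched as a small metadata-only probe. This is a minimal sketch, not our production tooling: the function names are hypothetical, the repository index is assumed to expect a POST (as in Triton’s model repository extension), and the metadata fields read here (platform, inputs, outputs with name, datatype, shape) are assumed to follow the v2 inference protocol. No inference endpoint is ever called.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(base_url, path, method="GET", timeout=5.0):
    """Fetch one HTTP endpoint and decode its JSON body."""
    # Assumption: POST endpoints accept an empty JSON object as the body.
    body = b"{}" if method == "POST" else None
    headers = {"Content-Type": "application/json"} if body else {}
    req = urllib.request.Request(
        base_url.rstrip("/") + path, data=body, method=method, headers=headers
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_model(meta):
    """Reduce a model metadata document to its backend and tensor signatures."""
    def signature(tensor):
        return f'{tensor["name"]}:{tensor["datatype"]}{tensor["shape"]}'

    return {
        "platform": meta.get("platform"),
        "inputs": [signature(t) for t in meta.get("inputs", [])],
        "outputs": [signature(t) for t in meta.get("outputs", [])],
    }


def metadata_only_probe(base_url, fetch=fetch_json):
    """Collect banner, catalogue, and per-model metadata; never run inference."""
    server = fetch(base_url, "/v2")
    index = fetch(base_url, "/v2/repository/index", method="POST")
    models = {}
    # The index may list one entry per version, so query each name once.
    for name in sorted({entry["name"] for entry in index}):
        path = "/v2/models/" + urllib.parse.quote(name, safe="")
        models[name] = summarize_model(fetch(base_url, path))
    return {
        "server": server.get("name"),
        "version": server.get("version"),
        "models": models,
    }
```

The fetch parameter is injectable so the logic can be exercised against canned responses. Against a live server, a model that is listed but not loaded may answer the metadata request with an error, which this sketch does not handle.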

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Cross-cloud surveys (1)

Survey May 3, 2026

NVIDIA Triton Inference Server on Public Cloud: Auth Posture Survey

Reused the 22,765 port-8000 hits from the prior ChromaDB sweep and fingerprinted them for NVIDIA Triton Inference Server (GET /v2 body match "name":"triton"). 2 confirmed Triton instances, both unauth…
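
The banner fingerprint mentioned in this excerpt can be sketched as follows. Note the swap: the survey describes a body match on "name":"triton", while this hypothetical helper parses the body as JSON and compares the name field, which tolerates whitespace differences. The function name is an assumption made for this example.

```python
import json


def is_triton_banner(body):
    """Return True if a GET /v2 response body names itself as Triton."""
    try:
        doc = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (HTML error page, empty body, binary noise).
        return False
    return isinstance(doc, dict) and doc.get("name") == "triton"
```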
