Compute Orchestration, Data Layer, NuClide Stack

What it is

You can’t fine-tune a 70B model on a laptop. ML compute orchestrators are how teams rent and schedule expensive GPUs. RunPod (managed) lets a researcher spin up an 8×A100 pod from a Jupyter button; Ray (Anyscale) is a Python-native distributed-compute framework; Volcano is a Kubernetes batch scheduler widely used for GPU workloads; Kubeflow can wrap both in an MLOps workflow; SkyPilot abstracts cloud GPU provisioning across providers. Each is the layer between “I need 80 GB of VRAM” and “the GPU is now running my code.”

What goes wrong

These systems hold very expensive credentials. RunPod API keys map to billable GPU pods; Ray clusters commonly mount the operator’s full SSH agent and kubeconfig; Kubeflow Pipelines runs as a service account with cluster-wide read on most installs. An exposed Ray dashboard is effectively a one-click ray job submit endpoint: anyone who can reach it can run arbitrary Python on the operator’s GPU fleet. An exposed RunPod control plane lets an attacker spin up new pods for arbitrary workloads on the operator’s bill. The cost vector here is real: we have seen disclosures involving five-figure unauthorised GPU rentals.
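To make the cost vector concrete, here is a minimal arithmetic sketch. The hourly rate, GPU count, and duration are hypothetical values chosen for illustration only; they are not figures from any disclosure or any provider's price list.

```python
def gpu_rental_cost(gpus: int, usd_per_gpu_hour: float, hours: float) -> float:
    """Total rental cost: number of GPUs x hourly rate per GPU x hours billed."""
    if gpus < 0 or usd_per_gpu_hour < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return gpus * usd_per_gpu_hour * hours


# Hypothetical scenario: one 8-GPU pod left running for 30 days
# at an assumed 2.00 USD per GPU-hour.
cost = gpu_rental_cost(gpus=8, usd_per_gpu_hour=2.0, hours=24 * 30)
print(f"${cost:,.0f}")  # 8 * 2.0 * 720 -> $11,520
```

Under these assumed numbers, a single unnoticed pod already reaches five figures within a month; several pods reach it proportionally sooner.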

How we test

We probe Ray’s dashboard /api/version, Kubeflow’s /pipeline endpoint, and SkyPilot’s API server for fingerprints. Where an endpoint is reachable, we list jobs (no submit, no cancel) to characterise what the operator runs and how much GPU capacity they have available. Job names typically include the model architecture and training step, which is enough to attribute the operator and characterise the cost vector for the disclosure.
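The read-only approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual tooling: the jobs path /api/jobs/, the ray_version field in the version response, and the job-name pattern are all assumptions, and the helper names are hypothetical. The sketch deliberately contains no submit or cancel path and performs no network I/O itself.

```python
import json
import re
from urllib.parse import urljoin

# Allowlist of GET-only paths. "/api/version" is named in the text above;
# "/api/jobs/" is an assumed path for listing Ray jobs.
READ_ONLY_PATHS = {
    "ray_version": "/api/version",
    "ray_jobs": "/api/jobs/",
}


def probe_url(base_url: str, probe: str) -> str:
    """Build the URL for a named read-only probe; unknown names raise KeyError."""
    path = READ_ONLY_PATHS[probe]
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def parse_ray_version(body: str):
    """Return the Ray version if the body looks like a Ray dashboard reply, else None.

    Assumes the reply is a JSON object with a string "ray_version" field.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if isinstance(data, dict) and isinstance(data.get("ray_version"), str):
        return data["ray_version"]
    return None


# Hypothetical naming convention: "<arch>-<size>b-...-step-<n>".
JOB_NAME = re.compile(
    r"(?P<arch>[a-z]+\d*)-(?P<size>\d+(?:\.\d+)?b)\b.*?step[-_]?(?P<step>\d+)",
    re.IGNORECASE,
)


def describe_job(name: str):
    """Extract architecture, parameter size, and training step from a job name."""
    match = JOB_NAME.search(name)
    if match is None:
        return None
    return {
        "arch": match["arch"].lower(),
        "size": match["size"].lower(),
        "step": int(match["step"]),
    }
```

For example, describe_job("llama3-70b-sft-step-12000") yields the architecture "llama3", size "70b", and step 12000, while a name that does not follow the assumed convention returns None. Keeping the paths in an allowlist makes the "no submit, no cancel" rule a property of the code rather than a convention.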

Receipts

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack. Counts on the cells above tally these directly.

Cross-cloud surveys (1)

Survey, May 6, 2026

Compute Orchestration / Training tier, cloud survey 2026-05

NuClide Research


Other categories in this layer

Vector Databases

Search Engines

OLAP / Analytics Backends

MLOps Tracking

Agent Memory

Data Labeling

Object Storage

GPU Compute & Telemetry

Container Orchestration

Medical / Edge AI

Backup & Snapshots

Fine-tuning Runtimes

Document Parsers

Model Hubs & Registries