What it is
Backups are easy to forget about, and that is why they are dangerous. The backup stack commonly seen in ML and Kubernetes deployments looks like this: Velero backs up Kubernetes cluster state plus the persistent volumes underneath; Restic is the encrypted-by-default file backup tool, and its companion REST server accepts incoming snapshots over HTTP, listening on all interfaces on port 8000 by default; Barman handles Postgres-specific backup and restore; Longhorn (Rancher) is the Kubernetes block-storage layer that snapshots volumes on a schedule; BorgBackup sits in the same niche as Restic. In an ML deployment these tools are how the operator’s model weights, training datasets, and vector-DB volumes are persisted between restarts.
What goes wrong
A backup is a verbatim copy of the system at rest, and in that copy every
secret the system held sits next to every intact model file. Restic’s REST server, when
exposed without HTTP auth, lets an attacker download every snapshot the
operator has ever taken (which is usually the entire model registry plus the
training data). Velero has no endpoint of its own: its backups are custom
resources served by the Kubernetes API server, so misconfigured cluster
RBAC turns into a one-step model-exfiltration
primitive. Longhorn’s UI ships without auth on port 80 and lists every
volume by name (model-weights-pvc, training-data-pvc), pointing
attackers exactly where to chain next.
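The RBAC failure mode can be checked offline against an exported Role or ClusterRole. The sketch below is a minimal illustration, not Velero tooling: the helper name velero_grants is hypothetical, the rule dictionaries are assumed to follow the Kubernetes PolicyRule shape (apiGroups, resources, verbs), and velero.io is the API group of Velero’s custom resources. It ignores resourceNames and nonResourceURLs for brevity, so it can over-report.

```python
VELERO_GROUP = "velero.io"


def _matches(values, wanted):
    """True if a PolicyRule field lists the wanted value or the '*' wildcard."""
    return "*" in values or wanted in values


def velero_grants(rules, resource, verb):
    """Return True if any rule allows `verb` on a velero.io `resource`.

    `rules` is a list of dicts shaped like Kubernetes PolicyRule objects.
    resourceNames restrictions are not evaluated (assumption: treat the
    rule as applying to every object of that resource).
    """
    for rule in rules:
        if (
            _matches(rule.get("apiGroups", []), VELERO_GROUP)
            and _matches(rule.get("resources", []), resource)
            and _matches(rule.get("verbs", []), verb)
        ):
            return True
    return False


# A broad read-only role that was probably meant for dashboards.
read_everything = [
    {"apiGroups": ["*"], "resources": ["*"], "verbs": ["get", "list", "watch"]}
]

# A role scoped to core resources only; it never mentions velero.io.
core_only = [
    {"apiGroups": [""], "resources": ["pods", "services"], "verbs": ["get", "list"]}
]
```

With these sample rules, velero_grants(read_everything, "backups", "list") is true while velero_grants(core_only, "backups", "list") is false: a wildcard read role is enough to enumerate every backup object in the cluster.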
How we test
We probe Restic REST /snapshots/ for the snapshot inventory (this works
without credentials when the server runs with authentication disabled),
Longhorn /v1/volumes for the volume list, and Velero’s
BackupStorageLocation objects via the Kubernetes API. We do
not download snapshots. The metadata (snapshot IDs, volume names,
timestamps, sizes) is sufficient evidence and means we never touch the
model files themselves. A snapshot called mlflow-pvc measuring 240 GB on
a research host tells the disclosure story without any further reach.
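A metadata-only probe of the two HTTP targets might look like the sketch below. It is an illustration under stated assumptions, not the scanner itself: the function names are hypothetical, the Restic listing is assumed to use the REST backend’s v2 format (objects with name and size), and the Longhorn response is assumed to be a collection with a data array whose items carry name, size, and created. Only listing endpoints are requested; no snapshot or volume content is fetched.

```python
import json
import urllib.error
import urllib.request

# Media type that asks restic's REST server for the v2 listing format,
# where each entry is an object with "name" and "size".
RESTIC_V2 = "application/vnd.x.restic.rest.v2"


def fetch_listing(url, accept="application/json", timeout=10):
    """GET one listing endpoint and decode it as JSON.

    Returns None when the server demands credentials (401/403), which
    means the endpoint is not openly readable.
    """
    req = urllib.request.Request(url, headers={"Accept": accept})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return None
        raise


def summarize_restic(listing):
    """Keep snapshot IDs and the size of each snapshot metadata file.

    The size here is that of the small snapshot record, not of the
    backed-up data.
    """
    return [{"id": e["name"], "bytes": e.get("size")} for e in listing]


def summarize_longhorn(payload):
    """Keep volume name, size, and creation time (assumed field names)."""
    rows = []
    for vol in payload.get("data", []):
        size = vol.get("size")
        rows.append({
            "name": vol.get("name"),
            "bytes": int(size) if size is not None else None,
            "created": vol.get("created"),
        })
    return rows


def probe(base_url, kind):
    """Collect metadata evidence from one host; never requests file bodies."""
    base = base_url.rstrip("/")
    if kind == "restic":
        listing = fetch_listing(base + "/snapshots/", accept=RESTIC_V2)
        return None if listing is None else summarize_restic(listing)
    if kind == "longhorn":
        payload = fetch_listing(base + "/v1/volumes")
        return None if payload is None else summarize_longhorn(payload)
    raise ValueError("unknown kind: " + kind)
```

Keeping the summarizers separate from the fetch makes the evidence format easy to review: whatever the server returns, only names, sizes, and timestamps survive into the report. A None result from probe records that the host asked for credentials.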