What it is
Models and datasets are large (gigabytes to terabytes per artefact), and the de facto storage substrate for them is S3-compatible object storage. MinIO is the usual self-hosted, on-prem option (it is also bundled with most RAG distributions, such as Dify); AWS S3, Google Cloud Storage, and Cloudflare R2 are the cloud variants; Garage and SeaweedFS are smaller open-source alternatives. Model registries, fine-tuning jobs, and RAG document loaders almost always read from or write to one of these.
What goes wrong
MinIO ships with the default credentials minioadmin / minioadmin and a web
console, conventionally exposed on port 9001. Most operators change the
password but leave the console reachable; many also leave the S3 API on
port 9000 with a public bucket policy that reveals the bucket inventory.
The buckets are typically named after the project (model-weights,
training-data-2026, customer-uploads), and the object keys inside them
describe the artefact lifecycle.
S3 buckets exhibit the same pattern at a different scale: misconfigured
bucket policies, public ACLs left over from old
aws s3 sync --acl public-read mistakes, and the now-classic “bucket name
is the company name plus production” enumeration weakness.
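To make the "public bucket policy" failure concrete, here is a minimal Python sketch that flags policy statements granting list or read access to anonymous callers. It assumes the policy is available as S3-style policy JSON. The function names, the RISKY_ACTIONS set, and the example policy (including the placeholder account ID) are illustrative and not part of any tool described here. The sketch ignores Condition blocks, so a flagged statement still needs manual review.

```python
import json

# Actions that let a caller enumerate or read a bucket.
RISKY_ACTIONS = {"s3:ListBucket", "s3:GetObject", "s3:*", "*"}


def _as_list(value):
    """Policy fields may be a single string or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _is_anonymous(principal):
    """True when the principal is the wildcard, i.e. any caller at all."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS"))
    return False


def public_statements(policy_json):
    """Return (action, resource) pairs granted to anonymous callers.

    Condition blocks are deliberately ignored in this sketch.
    """
    policy = json.loads(policy_json)
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if not _is_anonymous(stmt.get("Principal")):
            continue
        for action in _as_list(stmt.get("Action")):
            if action in RISKY_ACTIONS:
                for resource in _as_list(stmt.get("Resource")):
                    findings.append((action, resource))
    return findings


# Hypothetical policy: anonymous listing plus one named CI writer.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"AWS": ["*"]},
         "Action": ["s3:ListBucket"],
         "Resource": ["arn:aws:s3:::model-weights"]},
        {"Effect": "Allow",
         "Principal": {"AWS": ["arn:aws:iam::123456789012:user/ci"]},
         "Action": ["s3:PutObject"],
         "Resource": ["arn:aws:s3:::model-weights/*"]},
    ],
})

if __name__ == "__main__":
    for action, resource in public_statements(EXAMPLE_POLICY):
        print(f"anonymous {action} on {resource}")
```

Only the first statement in the example is reported: it is the one that exposes the bucket inventory to anyone who can reach the API.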
How we test
We list buckets through the MinIO admin API where it is reachable without
authentication, and check S3 buckets via pattern-based name enumeration
(no brute forcing, just the candidates that follow from the operator’s
known naming conventions). We confirm exposure with a single HEAD request
against a bucket-listing URL; we do not download objects. The bucket names
and their key-prefix structure are the disclosure evidence.
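The procedure above can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the actual tooling: candidate_names, classify, and head_status are hypothetical helpers, the suffix list is an example of a naming convention, and the status-code reading (200 accessible, 403 exists but denied, 404 absent) assumes the endpoint follows S3 HeadBucket semantics. Redirects are not followed, so exactly one HEAD request is sent per check and no object data is fetched.

```python
import urllib.error
import urllib.request


def candidate_names(org, projects,
                    suffixes=("", "-prod", "-production", "-backup")):
    """Build a short, ordered, de-duplicated candidate list from known conventions."""
    stems = [org] + [f"{org}-{p}" for p in projects] + list(projects)
    seen, names = set(), []
    for stem in stems:
        for suffix in suffixes:
            name = f"{stem}{suffix}".lower()
            if name not in seen:
                seen.add(name)
                names.append(name)
    return names


def classify(status):
    """Map an HTTP status from a HEAD on the bucket URL to a finding label."""
    if status == 200:
        return "exists, anonymously accessible"
    if status == 403:
        return "exists, access denied"
    if status == 404:
        return "not found"
    return "inconclusive"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as errors instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head_status(url, timeout=5.0):
    """Send one HEAD request and return the status code; no body is read."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # "acme" and the project name are placeholders; no request is sent here.
    for name in candidate_names("acme", ["model-weights"]):
        print(name)
```

A check against an in-scope target would then be classify(head_status(url)) for each candidate, where the URL shape (path-style on a MinIO host, virtual-hosted on AWS) depends on the deployment.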