llama.cpp, Model Layer, NuClide Stack

What it is

llama.cpp is the C/C++ LLaMA inference engine behind the GGUF model format and its quantization schemes, and it runs LLMs on commodity CPUs and small GPUs. Its built-in HTTP server (llama-server) exposes an OpenAI-compatible API at /v1/models and /v1/chat/completions, plus the native /props and /completion endpoints. Operators frequently run llama-server on Ollama's default port (:11434) so existing Ollama clients can swap backends transparently.

What goes wrong

llama-server ships with authentication disabled by default. An --api-key option exists, but it does nothing unless the operator sets it. As with Ollama, vLLM, and Triton, the deployment assumption is that auth comes from a reverse proxy. Population-scale surveys find ~70% of :11434 ports running llama.cpp instead of (or alongside) Ollama, all unauthenticated. The /props endpoint discloses the loaded chat template (sometimes a custom one), the model's n_ctx, the total slot count, and the operator's quantization config. /completion accepts arbitrary prompts and burns operator compute. When the operator has loaded a custom fine-tuned model (Xiyan_FT_14B, Baichuan_32B_medical, etc.), the model itself is operator IP, and any client that can reach the port can query it.
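To show what a single unauthenticated GET to /props hands over, here is a minimal sketch that pulls the disclosed fields out of a parsed response. It assumes the layout described above: n_ctx nested under default_generation_settings, with total_slots and chat_template at the top level. The function name summarize_props is hypothetical. Field presence varies between builds, so a missing field comes back as None. Quantization details are left out because this sketch does not assume a field name for them.

```python
import json
from typing import Any, Dict, Optional


def summarize_props(props: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Extract the config fields a /props response discloses (hypothetical helper).

    Assumed layout: n_ctx under default_generation_settings, total_slots and
    chat_template at top level. Missing fields are returned as None.
    """
    settings = props.get("default_generation_settings") or {}
    template = props.get("chat_template")
    return {
        "n_ctx": settings.get("n_ctx"),
        "total_slots": props.get("total_slots"),
        # Keep the template only if it is a string; some builds may omit it.
        "chat_template": template if isinstance(template, str) else None,
    }


if __name__ == "__main__":
    # Illustrative body only, not captured from a real host.
    sample = json.loads(
        '{"default_generation_settings": {"n_ctx": 4096},'
        ' "total_slots": 4, "chat_template": "{{ messages }}"}'
    )
    print(summarize_props(sample))
```

The helper works on an already-parsed dict, so it can run over stored survey responses without contacting any host.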

How we test

We check three read-only signals to distinguish llama.cpp from a co-deployed Ollama. First, /v1/models should return JSON with "owned_by":"llamacpp". Second, /props returns the server-info JSON with default_generation_settings and chat_template. Third, the HTTP Server: header reads llama.cpp on most builds. We never POST to /completion or /v1/chat/completions, because the model identity and config disclosure is the finding. The llama.cpp fingerprint was added to aimap in v1.9.4 (2026-05-15) after a field instance was caught running a custom BitNet-b1.58-2B-4T on a Contabo SG host.
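The three-signal check can be sketched as follows. This is not aimap's code. The names get_json, classify, and probe are hypothetical, and treating any single signal as a llama.cpp hit is an assumption made for illustration. The Server header check is a case-insensitive substring match. The fetch helper only issues GET requests, in keeping with the no-generation rule above.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Mapping, Optional, Tuple


def get_json(base_url: str, path: str, timeout: float = 5.0) -> Tuple[Optional[Any], Dict[str, str]]:
    """GET only (never POST): return (parsed JSON or None, response headers)."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            headers = dict(resp.headers.items())
            raw = resp.read()
    except urllib.error.HTTPError as err:
        # Error responses can still carry a Server header worth recording.
        return None, dict(err.headers.items())
    try:
        return json.loads(raw.decode("utf-8")), headers
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None, headers


def classify(models: Optional[Any], props: Optional[Any],
             headers: Mapping[str, str]) -> Dict[str, bool]:
    """Score the three signals; ANY single hit counts as llama.cpp (assumption)."""
    models = models if isinstance(models, dict) else {}
    props = props if isinstance(props, dict) else {}
    entries = models.get("data") or []
    owned_by = any(isinstance(e, dict) and e.get("owned_by") == "llamacpp" for e in entries)
    props_shape = "default_generation_settings" in props and "chat_template" in props
    # Header names are case-insensitive, so normalise before lookup.
    server = next((v for k, v in headers.items() if k.lower() == "server"), "")
    server_header = "llama.cpp" in server.lower()
    signals = {"owned_by": owned_by, "props": props_shape, "server_header": server_header}
    return {**signals, "llamacpp": any(signals.values())}


def probe(base_url: str) -> Dict[str, bool]:
    """Fetch /v1/models and /props read-only, then classify the combined evidence."""
    models, h_models = get_json(base_url, "/v1/models")
    props, h_props = get_json(base_url, "/props")
    return classify(models, props, {**h_props, **h_models})
```

Against a host you are authorized to test, probe("http://127.0.0.1:11434") would return the per-signal booleans plus the combined verdict. Keeping classify separate from the network calls means the decision logic can be rerun over saved responses.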

Research

Every survey, case study, and disclosure we've published that touches this layer of the stack.

Cross-cloud surveys

Survey May 15, 2026

llama.cpp HTTP Server Population Survey (2026-05-15)

Direct follow-on survey to the day's Ollama work and the aimap v1.9.4 release. aimap v1.9.4 added a llama.cpp server fingerprint after the 194.233.71.223 single-host case revealed that PHASE-2 fingerp…

Other categories in this layer

Ollama

vLLM

Triton Inference Server

Speech & Audio

Embedding Servers