Case study May 1, 2026

INHA University: Ollama Stack + vLLM Node

Sector
Universities
Country
South Korea

NuClide Research · 2026-05-01 (updated 2026-05-03)


Summary

INHA University (인하대학교) in Incheon has two independent, publicly reachable and unprotected AI inference nodes: an Ollama instance (165.246.39.51) hosting 7 models totalling ~132GB, including gpt-oss:20b and two NVIDIA Nemotron 30B models (nemotron-cascade-2:30b and nemotron-3-nano:30b), and a separate vLLM 0.8.4 node (165.246.170.53) serving a containerized Qwen model with a 90.4% prefix cache hit rate.


Node Summary

| Node | IP | Service | Model | Port | Notes |
|---|---|---|---|---|---|
| Ollama | 165.246.39.51 | Ollama | gpt-oss:20b + 6 models | 11434 | CVE-2025-63389 injectable |
| vLLM | 165.246.170.53 | vLLM 0.8.4 | local-qwen (Qwen, containerized) | 8000 | 311 requests, 90.4% prefix cache hit rate |

Infrastructure

| Field | Value |
|---|---|
| IP | 165.246.39.51 (Ollama) / 165.246.170.53 (vLLM) |
| Organization | INHA UNIVERSITY |
| Country | South Korea |
| Open ports | 11434 (Ollama, public) / 8000 (vLLM, public) |

Model Inventory

| Model | Size | Notes |
|---|---|---|
| gpt-oss:20b | 12.1GB | Local inference, 20.9B params, gpt-oss family |
| hf.co/unsloth/gpt-oss-20b-GGUF:Q8_0 | 12.1GB | Same weights, direct HF GGUF pull |
| nemotron-cascade-2:30b | 24.3GB | NVIDIA Nemotron Cascade 2 30B |
| gemma4:26b-a4b-it-q8_0 | 28.1GB | Gemma 4 Q8 |
| nemotron-3-nano:30b | 24.3GB | NVIDIA Nemotron-3 Nano 30B |
| qwen3.5:27b | 22.5GB | |
| deepseek-r1:14b | 9.0GB | |

Total local storage: ~132GB
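The storage total can be re-derived from the inventory. The sketch below only re-adds the sizes listed in the table; it counts the two gpt-oss entries separately, as the table does, so actual disk usage could be lower if the two entries share weight files.

```python
# Sizes in GB, copied from the Model Inventory table.
MODEL_SIZES_GB = {
    "gpt-oss:20b": 12.1,
    "hf.co/unsloth/gpt-oss-20b-GGUF:Q8_0": 12.1,
    "nemotron-cascade-2:30b": 24.3,
    "gemma4:26b-a4b-it-q8_0": 28.1,
    "nemotron-3-nano:30b": 24.3,
    "qwen3.5:27b": 22.5,
    "deepseek-r1:14b": 9.0,
}


def total_storage_gb(sizes):
    """Sum the per-model sizes, rounded to one decimal place."""
    return round(sum(sizes.values()), 1)


print(len(MODEL_SIZES_GB), "models,", total_storage_gb(MODEL_SIZES_GB), "GB")
```

The seven entries add up to 132.4GB, which matches the ~132GB figure.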


Findings

F1: Local gpt-oss:20b and Dual Nemotron Stack (HIGH)

gpt-oss:20b is running locally (12.1GB, 20.9B params). gpt-oss is OpenAI's open-weight model family. Both the standard Ollama-tagged version and the direct Hugging Face GGUF pull are present, which suggests the operator first downloaded the weights via hf.co/unsloth/gpt-oss-20b-GGUF:Q8_0 and then aliased them.

The dual nemotron-cascade-2:30b and nemotron-3-nano:30b stack (both 24.3GB) suggests NVIDIA model evaluation or research use.

F2: CVE-2025-63389 Injectable (HIGH)

All models are injectable via the unauthenticated /api/create endpoint. The Nemotron and gpt-oss models have no system prompts, so inference after injection is unobstructed.


Remediation

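Bind Ollama to loopback so that port 11434 is no longer publicly reachable. On a systemd install the variable has to be set in the service environment (for example via systemctl edit ollama.service), because assigning it in an interactive shell does not affect the running service:

[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"

systemctl daemon-reload
systemctl restart ollama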
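After the restart, the operator can confirm from a machine outside the host that the port no longer accepts connections. This is a minimal sketch using only a TCP connect; the function name `port_is_reachable` and the timeout value are illustrative choices, not part of any Ollama tooling.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Called as `port_is_reachable("<server address>", 11434)` from an external network, the expected result after remediation is `False`, while the same call made on the server itself against 127.0.0.1 should still return `True`.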

Node: 165.246.170.53: vLLM Containerized Qwen Node

| Field | Value |
|---|---|
| IP | 165.246.170.53 |
| rDNS | No rDNS (SERVFAIL) |
| vLLM version | 0.8.4 |
| Model ID | local-qwen |
| Model root | /model (container mount) |
| max_model_len | 4,096 tokens |
| Port | 8000/tcp public |

local-qwen is an alias: the model is mounted at /model inside a container, which hides the actual model family and version. Based on the naming and university context, it is likely a Qwen 2.5 or Qwen 3 variant. The containerized deployment pattern (Docker volume mount at /model) and the vLLM 0.8.4 version suggest an automated or scripted deployment.
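The alias reading follows from the model ID and the model root disagreeing. The sketch below assumes the table fields correspond to the `id`, `root` and `max_model_len` keys of a model card as returned by vLLM's OpenAI-compatible model listing; the payload is rebuilt from the table values for illustration and is not captured output.

```python
import json

# Illustrative payload rebuilt from the table values; not captured output.
SAMPLE_MODELS_RESPONSE = """
{"object": "list",
 "data": [{"id": "local-qwen", "object": "model",
           "root": "/model", "max_model_len": 4096}]}
"""


def summarize_models(payload):
    """Return (id, root, max_model_len, is_alias) for each model card."""
    rows = []
    for card in json.loads(payload).get("data", []):
        model_id = card.get("id")
        root = card.get("root")
        # An id that differs from root means a served name sits over the real path.
        rows.append((model_id, root, card.get("max_model_len"), model_id != root))
    return rows


for row in summarize_models(SAMPLE_MODELS_RESPONSE):
    print(row)
```

Here the served name (local-qwen) and the root (/model) differ, and neither reveals the underlying checkpoint, which is why the model family can only be inferred.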

Metrics

| Metric | Value |
|---|---|
| request_success_total[stop] | 277 |
| request_success_total[length] | 34 |
| Total requests | 311 |
| prompt_tokens_total | 10,833 |
| generation_tokens_total | 12,900 |
| gpu_prefix_cache_queries_total | 532 |
| gpu_prefix_cache_hits_total | 481 |
| Prefix cache hit rate | 90.4% |

The high cache hit rate (90.4%) indicates a consistent input pattern, likely a chatbot or assistant with a fixed system prompt contributing repeated prefix tokens.
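The derived rows (total requests and hit rate) can be recomputed from the raw counters. The sketch below parses Prometheus-style text; the sample uses the counter values from the Metrics table, while the `vllm:` prefix and the label names are assumptions about how the endpoint formats them.

```python
import re

# Illustrative metrics text built from the table values; the "vllm:" prefix
# and label names are assumed, not copied from the node.
SAMPLE_METRICS = """\
vllm:request_success_total{finished_reason="stop",model_name="local-qwen"} 277.0
vllm:request_success_total{finished_reason="length",model_name="local-qwen"} 34.0
vllm:prompt_tokens_total{model_name="local-qwen"} 10833.0
vllm:generation_tokens_total{model_name="local-qwen"} 12900.0
vllm:gpu_prefix_cache_queries_total{model_name="local-qwen"} 532.0
vllm:gpu_prefix_cache_hits_total{model_name="local-qwen"} 481.0
"""

# metric name, optional {labels}, then a numeric value
LINE = re.compile(r'^([A-Za-z_:][\w:]*)(?:\{[^}]*\})?\s+([0-9.eE+-]+)$')


def sum_by_metric(text):
    """Sum sample values per metric name, collapsing label sets."""
    totals = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # comment lines (# HELP, # TYPE) and blanks do not match
            name, value = match.group(1), float(match.group(2))
            totals[name] = totals.get(name, 0.0) + value
    return totals


def derive(totals):
    """Compute the derived figures reported alongside the raw counters."""
    requests = totals["vllm:request_success_total"]
    hits = totals["vllm:gpu_prefix_cache_hits_total"]
    queries = totals["vllm:gpu_prefix_cache_queries_total"]
    return {
        "total_requests": int(requests),
        "cache_hit_rate_pct": round(100 * hits / queries, 1),
        "avg_prompt_tokens": round(totals["vllm:prompt_tokens_total"] / requests, 1),
        "avg_generation_tokens": round(totals["vllm:generation_tokens_total"] / requests, 1),
    }


print(derive(sum_by_metric(SAMPLE_METRICS)))
```

With the table values this gives 277 + 34 = 311 requests and 481 / 532 = 90.4% prefix cache hits, matching the figures reported above.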


Disclosure

  • Discovered: 2026-05-01 (Ollama) / 2026-05-03 (vLLM node)
  • Status: Pending outreach to INHA IT (inha.ac.kr)