Arbitrary Code Execution via GGML_PAD Integer Overflow in llama.cpp during GGUF (CLIP) Model Load
Summary
A crafted GGUF file triggers an integer overflow in the GGML_PAD macro (ggml/src/ggml.c), causing a heap buffer overflow during model loading. This allows an attacker to overwrite function pointers and achieve arbitrary code execution.
The vulnerability is reachable through clip.cpp, which is used by llama-server --mmproj, llama-mtmd-cli, llama-llava-cli, and 4 other shipped tools. While llama-model-loader.h implements bounds checking, clip.cpp performs no validation on tensor sizes prior to allocation.
CVSS 3.1: 7.8 High (CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H)
In server deployments where users can specify multimodal projector files, the attack vector becomes network-facing with no user interaction required (CVSS 9.8 Critical: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).
Tested against: llama.cpp commit e8e261699, tag b8132. The vulnerability is unfixed in ggml/src/ggml.c.
Vulnerability Details
The GGML_PAD macro computes aligned sizes:
#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((size_t)(n) - 1))
When a tensor has ne[0] = SIZE_MAX/4 with type F32:
ggml_nbytes = 4 * ne[0] = 0xFFFFFFFFFFFFFFFC
GGML_PAD(0xFFFFFFFFFFFFFFFC, 32) = 0 (integer overflow wraps to zero)
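The wraparound can be checked in isolation. The sketch below copies the macro as quoted in this report and evaluates it for the overflow tensor. The helpers f32_nbytes and padded_size_of_probe are illustrative names, not ggml functions, and the example assumes a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Macro as quoted in this report. */
#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((size_t)(n) - 1))

/* Assumption of this example: F32 elements are 4 bytes wide. */
static_assert(sizeof(float) == 4, "example assumes 4-byte float");

/* Illustrative helper (not a ggml function): byte size of a 1-D F32
 * tensor. The multiplication is deliberately left unchecked. */
static size_t f32_nbytes(size_t ne0) {
    return ne0 * sizeof(float);
}

/* Padded size of the overflow tensor with ne[0] = SIZE_MAX/4. */
static size_t padded_size_of_probe(void) {
    size_t nbytes = f32_nbytes(SIZE_MAX / 4); /* SIZE_MAX - 3 */
    return GGML_PAD(nbytes, 32);              /* nbytes + 31 wraps past SIZE_MAX */
}
```

Calling padded_size_of_probe() returns 0, while an ordinary size such as 33 still pads to 64, so the macro only misbehaves within 31 bytes of SIZE_MAX.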
This passes all existing validation checks in gguf.cpp (lines 550, 589, 643). The result: a tensor claims to need 0 bytes, but the loader writes attacker-controlled data past the end of a too-small heap allocation.
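For contrast, a minimal sketch of padding arithmetic that reports failure instead of wrapping is shown here. The names checked_pad and checked_nbytes are hypothetical and do not exist in ggml. This is one possible shape for such a check, not the project's actual or planned fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ggml): round x up to a multiple of n,
 * where n is a power of two. Returns false and leaves *out untouched
 * if the result would not fit in size_t. */
static bool checked_pad(size_t x, size_t n, size_t *out) {
    if (n == 0 || (n & (n - 1)) != 0) return false; /* n must be a power of two */
    if (x > SIZE_MAX - (n - 1)) return false;       /* x + n - 1 would wrap */
    *out = (x + n - 1) & ~(n - 1);
    return true;
}

/* Hypothetical helper: ne0 * elem_size, rejecting products that wrap. */
static bool checked_nbytes(size_t ne0, size_t elem_size, size_t *out) {
    if (elem_size != 0 && ne0 > SIZE_MAX / elem_size) return false;
    *out = ne0 * elem_size;
    return true;
}
```

With these helpers, the tensor described above yields a valid byte count of 0xFFFFFFFFFFFFFFFC from checked_nbytes, but checked_pad rejects it rather than returning 0.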
Exploit Chain
- gguf_init_from_file() parses the GGUF: the overflow tensor gets padded size 0
- ggml_backend_alloc_ctx_tensors_from_buft() allocates a buffer sized only for the bait tensors (clip.cpp:1905, no bounds check)
- fread() attempts to read the massive unvalidated tensor size from the file, writing attacker-controlled data far past the bounds of the undersized heap allocation
- The overflow overwrites a ggml_backend_buffer struct's free_buffer function pointer with system()
- ggml_backend_buffer_free() calls the overwritten pointer, executing attacker-controlled code
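The chain above depends on the read size never being compared with the size of the allocation. A minimal sketch of that comparison follows. The function read_fits is a hypothetical name, not a ggml or clip.cpp function, and where such a check would belong in the loader is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check (not a ggml or clip.cpp function): true only if the
 * byte range [offset, offset + nbytes) lies inside a buffer of buf_size
 * bytes. Uses subtraction so that no intermediate sum can wrap. */
static bool read_fits(size_t buf_size, size_t offset, size_t nbytes) {
    if (offset > buf_size) return false;
    return nbytes <= buf_size - offset;
}
```

A loader that tested read_fits before each fread() would refuse the overflow tensor, because its unpadded byte count is far larger than any buffer sized for the bait tensors.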
PoC Files
| File | Description |
|---|---|
| template_clip.gguf | Malicious CLIP GGUF that triggers heap overflow on load |
| craft_clip_exploit.c | Generates the malicious GGUF (standalone, no dependencies) |
| clip_loader.c | Minimal CLIP loader mirroring clip.cpp's code path |
| run.sh | Automated 7-step exploit chain achieving RCE |
The GGUF uses projector type idefics3 with block_count=0, 12 CLIP metadata KV pairs (the minimum for clip.cpp to reach the vulnerable allocation), bait tensors, and 1 overflow probe tensor (mm.model.fc.weight with ne[0] = SIZE_MAX/4).
The number of bait tensors affects the heap layout. run.sh automatically sweeps bait counts (1-64) to find a configuration where the buffer struct lands after the probe data, enabling the forward OOB write to reach its function pointers.
Reproduction
Quick verification with llama-server (Release b8132)
Load the PoC mmproj alongside any standard GGUF model (tested here with ggml-org/tinygemma3-GGUF, but any valid GGUF model should reproduce the crash):
llama-server --mmproj template_clip.gguf --model tinygemma3-Q8_0.gguf
The server crashes with GGML_ASSERT(buffer) failed in clip_model_loader::load_tensors(). Under AddressSanitizer, this reports heap-buffer-overflow on ggml_backend_alloc_ctx_tensors_from_buft.
Full RCE demonstration
huggingface-cli download t-cun/clip-vit-base-patch16-idefics3 --local-dir clip-vit-base-patch16-idefics3
cd clip-vit-base-patch16-idefics3
chmod +x run.sh
./run.sh
This downloads the repo (including shipped llama.cpp b8132 libraries), builds the tools, probes the heap layout, crafts the exploit GGUF, and executes it.
Expected output: PWNED - arbitrary code execution via GGML_PAD overflow
Impact
An attacker can craft a .gguf file that appears to be a standard CLIP vision model. When a victim loads it via llama-server --mmproj or any of the other affected tools, the attacker gains arbitrary code execution on the host machine. The malicious code executes during model loading with no further interaction required from the victim.