Arbitrary Code Execution via GGML_PAD Integer Overflow in llama.cpp during GGUF (CLIP) Model Load

Summary

A crafted GGUF file triggers an integer overflow in the GGML_PAD macro (ggml/src/ggml.c), causing a heap buffer overflow during model loading. This allows an attacker to overwrite function pointers and achieve arbitrary code execution.

The vulnerability is reachable through clip.cpp, which is used by llama-server --mmproj, llama-mtmd-cli, llama-llava-cli, and four other shipped tools. llama-model-loader.h implements bounds checking, but clip.cpp performs no validation of tensor sizes before allocation.

CVSS 3.1: 7.8 High (CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H)

In server deployments where users can specify multimodal projector files, the attack vector becomes network-facing with no user interaction required (CVSS 9.8 Critical: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).

Tested against: llama.cpp commit e8e261699, tag b8132. The vulnerability is unfixed in ggml/src/ggml.c.

Vulnerability Details

The GGML_PAD macro computes aligned sizes:

#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((size_t)(n) - 1))

When a tensor has ne[0] = SIZE_MAX/4 with type F32:

ggml_nbytes = 4 * ne[0] = 0xFFFFFFFFFFFFFFFC
GGML_PAD(0xFFFFFFFFFFFFFFFC, 32) = 0 (integer overflow wraps to zero)

This passes all existing validation checks in gguf.cpp (lines 550, 589, 643). The result: a tensor claims to need 0 bytes, but the loader writes attacker-controlled data past the end of a too-small heap allocation.
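The wraparound can be checked in isolation with a few lines of C. This is a minimal sketch, not code from llama.cpp: pad_unchecked restates the macro's arithmetic as a function, and pad_checked is a hypothetical variant that reports failure instead of wrapping. Both names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the GGML_PAD macro quoted above, written as a function.
 * n must be a nonzero power of two. */
size_t pad_unchecked(size_t x, size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return (x + n - 1) & ~(n - 1);   /* x + n - 1 may wrap past SIZE_MAX */
}

/* Hypothetical checked variant (not part of ggml): returns false and leaves
 * *out untouched when x + n - 1 would not fit in a size_t. */
bool pad_checked(size_t x, size_t n, size_t *out) {
    assert(n != 0 && (n & (n - 1)) == 0);
    if (x > SIZE_MAX - (n - 1)) {
        return false;                /* the addition would wrap */
    }
    *out = (x + n - 1) & ~(n - 1);
    return true;
}
```

With x = (SIZE_MAX / 4) * 4 and n = 32, pad_unchecked returns 0, matching the arithmetic above, while pad_checked returns false. For an ordinary input such as x = 33, both produce 64.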

Exploit Chain

1. gguf_init_from_file() parses the GGUF; the overflow tensor gets a padded size of 0.
2. ggml_backend_alloc_ctx_tensors_from_buft() allocates a buffer sized only for the bait tensors (clip.cpp:1905, no bounds check).
3. fread() attempts to read the tensor's full, unvalidated size from the file, writing attacker-controlled data far past the end of the undersized heap allocation.
4. The overflow overwrites the free_buffer function pointer of a ggml_backend_buffer struct with the address of system().
5. ggml_backend_buffer_free() calls the overwritten pointer, executing attacker-controlled code.
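The chain depends on the loader trusting a tensor's declared size. As a rough illustration of the kind of check that would reject the probe tensor before any allocation, the sketch below compares the declared byte count against the bytes actually present in the file. The function name and parameters are hypothetical and do not come from llama.cpp; it also assumes a plain, non-quantized type where the size is simply element count times element size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-allocation check (illustrative, not llama.cpp API).
 * n_elems:     element count declared in the tensor header
 * elem_size:   bytes per element (4 for F32)
 * data_offset: byte offset of the tensor's data within the file
 * file_size:   actual length of the file in bytes */
bool tensor_fits_in_file(uint64_t n_elems, uint64_t elem_size,
                         uint64_t data_offset, uint64_t file_size) {
    if (elem_size == 0) {
        return false;
    }
    if (n_elems > UINT64_MAX / elem_size) {
        return false;                      /* n_elems * elem_size would overflow */
    }
    uint64_t nbytes = n_elems * elem_size;
    if (data_offset > file_size) {
        return false;                      /* data starts past end of file */
    }
    return nbytes <= file_size - data_offset;  /* subtraction cannot underflow here */
}
```

A tensor declaring UINT64_MAX / 4 F32 elements fails this check for any realistic file size, because its byte count exceeds the bytes remaining after the data offset. The comparison is written as a subtraction on the file side so that no intermediate sum can wrap.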

PoC Files

File	Description
template_clip.gguf	Malicious CLIP GGUF that triggers heap overflow on load
craft_clip_exploit.c	Generates the malicious GGUF (standalone, no dependencies)
clip_loader.c	Minimal CLIP loader mirroring clip.cpp's code path
run.sh	Automated 7-step exploit chain achieving RCE

The GGUF uses projector type idefics3 with block_count=0, 12 CLIP metadata KV pairs (the minimum for clip.cpp to reach the vulnerable allocation), bait tensors, and 1 overflow probe tensor (mm.model.fc.weight with ne[0] = SIZE_MAX/4).

The number of bait tensors affects the heap layout. run.sh automatically sweeps bait counts (1-64) to find a configuration where the buffer struct lands after the probe data, enabling the forward OOB write to reach its function pointers.

Reproduction

Quick verification with llama-server (Release b8132)

Load the PoC mmproj alongside any standard GGUF model (tested here with ggml-org/tinygemma3-GGUF, but any valid GGUF model should reproduce the crash):

llama-server --mmproj template_clip.gguf --model tinygemma3-Q8_0.gguf

The server crashes with GGML_ASSERT(buffer) failed in clip_model_loader::load_tensors(). Under AddressSanitizer, this reports heap-buffer-overflow on ggml_backend_alloc_ctx_tensors_from_buft.

Full RCE demonstration

huggingface-cli download t-cun/clip-vit-base-patch16-idefics3 --local-dir clip-vit-base-patch16-idefics3
cd clip-vit-base-patch16-idefics3
chmod +x run.sh
./run.sh

The first command downloads the repo (including the shipped llama.cpp b8132 libraries). run.sh then builds the tools, probes the heap layout, crafts the exploit GGUF, and executes it.

Expected output: PWNED - arbitrary code execution via GGML_PAD overflow

Impact

An attacker can craft a .gguf file that appears to be a standard CLIP vision model. When a victim loads it via llama-server --mmproj or any of the other affected tools, the attacker gains arbitrary code execution on the host machine. The malicious code executes during model loading with no further interaction required from the victim.
