# umt5-xxl-encoder-gguf

## How to use with llama.cpp
### Install with Homebrew

```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI
# (append a quantization tag after the colon, e.g. :Q5_K_M):
llama-server -hf city96/umt5-xxl-encoder-gguf:

# Run inference directly in the terminal:
llama-cli -hf city96/umt5-xxl-encoder-gguf:
```
### Install with WinGet (Windows)

```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf city96/umt5-xxl-encoder-gguf:

# Run inference directly in the terminal:
llama-cli -hf city96/umt5-xxl-encoder-gguf:
```
### Use a pre-built binary

```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf city96/umt5-xxl-encoder-gguf:

# Run inference directly in the terminal:
./llama-cli -hf city96/umt5-xxl-encoder-gguf:
```
### Build from source

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf city96/umt5-xxl-encoder-gguf:

# Run inference directly in the terminal:
./build/bin/llama-cli -hf city96/umt5-xxl-encoder-gguf:
```
### Use Docker

```sh
docker model run hf.co/city96/umt5-xxl-encoder-gguf:
```

## About

This is a GGUF conversion of Google's UMT5-XXL model, specifically the encoder part.

The weights can be used with ./llama-embedding or with the ComfyUI-GGUF custom node together with image/video generation models.
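As a minimal sketch, the encoder can be exercised with `llama-embedding` to produce text embeddings; the model filename below is a placeholder, and assumes you have already downloaded one of the quantized files from this repository:

```sh
# Placeholder filename; substitute the quant you actually downloaded.
./llama-embedding \
  -m umt5-xxl-encoder-Q5_K_M.gguf \
  -p "a watercolor painting of a lighthouse at dusk"
```

For image/video generation workflows, the same GGUF file is instead loaded as the text encoder via the ComfyUI-GGUF custom node.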

This is a non-imatrix quant, as llama.cpp does not support imatrix creation for T5 models at the time of writing. Q5_K_M or larger is therefore recommended for best results, although smaller quants may still give decent results in resource-constrained scenarios.
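To gauge which quant fits your hardware, a back-of-the-envelope download size is params × bits-per-weight ÷ 8. The ~5.5 bits/weight figure below is an approximation for Q5_K_M, not an exact number:

```sh
# ~6B params at roughly 5.5 bits/weight (approximate for Q5_K_M):
bytes=$(( 6000000000 * 55 / 10 / 8 ))
echo "~$(( bytes / 1000000000 )) GB"   # prints "~4 GB"
```

Actual file sizes vary slightly because different tensors within a K-quant use different precisions.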

Downloads last month: 79,256
Model size: 6B params
Architecture: t5encoder
Quantized versions are available at 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit, and 32-bit precision.

Base model: google/umt5-xxl (this repository is a quantized conversion of its encoder)