mPLUG UI-S1-7B - Hybrid W4 Quantized (Quanto)

Model Description

This is a hybrid-quantized version of mPLUG/UI-S1-7B, optimized for efficient GUI automation on consumer hardware.

Quantization Strategy

  • Method: Quanto INT4 hybrid quantization
  • Text Layers: 196 linear layers quantized to INT4 (~75% smaller per weight than BF16)
  • Vision Tower: 162 linear layers preserved in BF16 (full precision)
  • Size: 4.6 GB on disk (68.7% smaller than the 14.5 GB original)
  • VRAM: ~4.5-5.5 GB (fits on 16 GB GPUs with a 16k context)

Key Features

✅ Zero Vision Quality Loss - Vision tower completely preserved in BF16
✅ Massive Memory Savings - 68.7% size reduction
✅ Consumer Hardware Ready - Runs on 16GB VRAM GPUs
✅ 16k Context Support - Full context window with room to spare

Performance

| Metric | Original | Quantized |
|---|---|---|
| Model Size | 14.5 GB | 4.6 GB |
| VRAM Usage | ~14-15 GB | ~4.5-5.5 GB |
| Vision Quality | 100% | 100% (preserved) |
| Text Layer Precision | BF16 | INT4 |

Usage

Loading the Model

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from quanto import safe_load, quantize, freeze, qint4

# Load base architecture
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load quantized weights
state_dict = safe_load("quanto_model.safetensors")
model.load_state_dict(state_dict, strict=False)

# Re-create the quanto INT4 layers; modules matching the keywords below stay in BF16
vision_keywords = ['visual', 'vision', 'image', 'patch', 'merger', 'projector', 'embed_tokens', 'lm_head']
exclude_modules = []
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        if any(k in name.lower() for k in vision_keywords):
            exclude_modules.append(name)

quantize(model, weights=qint4, exclude=exclude_modules)
freeze(model)
model.eval()

processor = AutoProcessor.from_pretrained("Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto", trust_remote_code=True)
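
As a quick sanity check that the hybrid split was restored, you can count which linear layers were replaced by quanto's quantized modules and which remain plain BF16 nn.Linear. A minimal sketch, assuming quanto names its quantized linear module class QLinear:

# Sanity check: count quanto-quantized vs. preserved linear layers
n_int4 = n_bf16 = 0
for name, module in model.named_modules():
    if type(module).__name__ == "QLinear":        # quanto INT4 layer
        n_int4 += 1
    elif isinstance(module, torch.nn.Linear):     # untouched BF16 layer
        n_bf16 += 1

print(f"INT4 (text) linear layers: {n_int4}, BF16 (vision) linear layers: {n_bf16}")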

Inference with Images

from PIL import Image
from qwen_vl_utils import process_vision_info

# Load image
image = Image.open("screenshot.png")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the UI elements."}
        ]
    }
]

# Process
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    
response = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True
)[0]
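
Optionally, print the decoded response and report peak GPU memory afterwards to sanity-check the VRAM figures quoted above; a minimal sketch assuming a single CUDA device:

print(response)

# Rough check of the ~4.5-5.5 GB VRAM figure quoted above
if torch.cuda.is_available():
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory during generation: {peak_gb:.1f} GB")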

Hardware Requirements

Minimum

  • GPU: 16GB VRAM (RTX A4000, L4, RTX 4060 Ti)
  • RAM: 16GB+
  • Storage: 5GB

Recommended

  • GPU: 24GB VRAM (RTX 4090, A5000)
  • RAM: 32GB+

Technical Details

Layer Distribution

  • Total Linear Layers: 358
  • Quantized (Text): 196 layers → INT4
  • Preserved (Vision): 162 layers → BF16

Quantization Process

  1. Load model in BF16
  2. Identify vision-critical layers
  3. Apply Quanto INT4 to text layers only
  4. Preserve vision tower in full precision
  5. Save with safetensors
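
A minimal sketch of these steps, assuming the same quanto API used in the loading snippet above plus its safe_save counterpart for safetensors serialization (the exact script used to produce this checkpoint may differ):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from quanto import quantize, freeze, qint4, safe_save

# 1. Load the base model in BF16 (CPU is fine for offline quantization)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "mPLUG/UI-S1-7B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# 2. Identify vision-critical linear layers to exclude from quantization
vision_keywords = ['visual', 'vision', 'image', 'patch', 'merger', 'projector', 'embed_tokens', 'lm_head']
exclude_modules = [
    name for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear) and any(k in name.lower() for k in vision_keywords)
]
print(f"Excluding {len(exclude_modules)} vision/embedding layers from quantization")

# 3-4. Apply Quanto INT4 to the remaining (text) layers; excluded layers stay BF16
quantize(model, weights=qint4, exclude=exclude_modules)
freeze(model)

# 5. Serialize the quantized state dict with safetensors
safe_save(model.state_dict(), "quanto_model.safetensors")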

Limitations

  • Requires the quanto library for loading (the safe_load / quantize / freeze API shown above)
  • Best performance with vLLM deployment
  • Vision layers must remain unquantized for quality

Citation

Original Model:

@article{lu2025ui,
  title={UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning},
  author={Lu, Zhengxi and others},
  journal={arXiv preprint arXiv:2509.11543},
  year={2025}
}

License

Apache 2.0 (same as base model)

Acknowledgements

  • Base Model: mPLUG/UI-S1-7B
  • Quantization: Quanto by Hugging Face
  • Strategy: Custom hybrid quantization for VLM quality preservation