mPLUG UI-S1-7B - Hybrid W4 Quantized (Quanto)

Model Description

This is a hybrid-quantized version of mPLUG/UI-S1-7B, optimized for efficient GUI automation on consumer hardware.

Quantization Strategy

  • Method: Quanto INT4 hybrid quantization
  • Text Layers: 196 linear layers quantized to INT4 (~75% smaller per weight than BF16)
  • Vision Tower: 162 linear layers preserved in BF16 (full precision)
  • Size: 4.6 GB on disk (68.7% smaller than the 14.5 GB original)
  • VRAM: ~4.5-5.5 GB (fits on 16 GB GPUs with a 16k context)

Key Features

✅ Zero Vision Quality Loss - Vision tower completely preserved in BF16
✅ Massive Memory Savings - 68.7% size reduction
✅ Consumer Hardware Ready - Runs on 16GB VRAM GPUs
✅ 16k Context Support - Full context window with room to spare

Performance

| Metric | Original | Quantized |
|---|---|---|
| Model Size | 14.5 GB | 4.6 GB |
| VRAM Usage | ~14-15 GB | ~4.5-5.5 GB |
| Vision Quality | 100% | 100% (preserved) |
| Text Layer Precision | BF16 | INT4 |

Usage

Loading the Model

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from quanto import safe_load, quantize, freeze, qint4

# Load base architecture
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load quantized weights
state_dict = safe_load("quanto_model.safetensors")
model.load_state_dict(state_dict, strict=False)

# Re-create the quanto INT4 layers; modules matching the keywords below stay in BF16
vision_keywords = ['visual', 'vision', 'image', 'patch', 'merger', 'projector', 'embed_tokens', 'lm_head']
exclude_modules = []
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        if any(k in name.lower() for k in vision_keywords):
            exclude_modules.append(name)

quantize(model, weights=qint4, exclude=exclude_modules)
freeze(model)
model.eval()

processor = AutoProcessor.from_pretrained("Hadidiz9/UI-S1-7B-Hybrid-W4-Quanto", trust_remote_code=True)
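
As a quick sanity check that the hybrid split was restored, you can count which linear layers were replaced by quanto's quantized modules and which remain plain BF16 nn.Linear. A minimal sketch, assuming quanto names its quantized linear module class QLinear:

# Sanity check: count quanto-quantized vs. preserved linear layers
n_int4 = n_bf16 = 0
for name, module in model.named_modules():
    if type(module).__name__ == "QLinear":        # quanto INT4 layer
        n_int4 += 1
    elif isinstance(module, torch.nn.Linear):     # untouched BF16 layer
        n_bf16 += 1

print(f"INT4 (text) linear layers: {n_int4}, BF16 (vision) linear layers: {n_bf16}")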

Inference with Images

from PIL import Image
from qwen_vl_utils import process_vision_info

# Load image
image = Image.open("screenshot.png")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the UI elements."}
        ]
    }
]

# Process
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    
response = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True
)[0]
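
Optionally, print the decoded response and report peak GPU memory afterwards to sanity-check the VRAM figures quoted above; a minimal sketch assuming a single CUDA device:

print(response)

# Rough check of the ~4.5-5.5 GB VRAM figure quoted above
if torch.cuda.is_available():
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory during generation: {peak_gb:.1f} GB")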

Hardware Requirements

Minimum

  • GPU: 16GB VRAM (RTX A4000, L4, RTX 4060 Ti)
  • RAM: 16GB+
  • Storage: 5GB

Recommended

  • GPU: 24GB VRAM (RTX 4090, A5000)
  • RAM: 32GB+

Technical Details

Layer Distribution

  • Total Linear Layers: 358
  • Quantized (Text): 196 layers → INT4
  • Preserved (Vision): 162 layers → BF16

Quantization Process

  1. Load model in BF16
  2. Identify vision-critical layers
  3. Apply Quanto INT4 to text layers only
  4. Preserve vision tower in full precision
  5. Save with safetensors
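
A minimal sketch of these steps, assuming the same quanto API used in the loading snippet above plus its safe_save counterpart for safetensors serialization (the exact script used to produce this checkpoint may differ):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from quanto import quantize, freeze, qint4, safe_save

# 1. Load the base model in BF16 (CPU is fine for offline quantization)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "mPLUG/UI-S1-7B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# 2. Identify vision-critical linear layers to exclude from quantization
vision_keywords = ['visual', 'vision', 'image', 'patch', 'merger', 'projector', 'embed_tokens', 'lm_head']
exclude_modules = [
    name for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear) and any(k in name.lower() for k in vision_keywords)
]
print(f"Excluding {len(exclude_modules)} vision/embedding layers from quantization")

# 3-4. Apply Quanto INT4 to the remaining (text) layers; excluded layers stay BF16
quantize(model, weights=qint4, exclude=exclude_modules)
freeze(model)

# 5. Serialize the quantized state dict with safetensors
safe_save(model.state_dict(), "quanto_model.safetensors")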

Limitations

  • Requires the quanto library for loading (the safe_load / quantize / freeze API shown above)
  • Best performance with vLLM deployment
  • Vision layers must remain unquantized for quality

Citation

Original Model:

@article{lu2025ui,
  title={UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning},
  author={Lu, Zhengxi and others},
  journal={arXiv preprint arXiv:2509.11543},
  year={2025}
}

License

Apache 2.0 (same as base model)

Acknowledgements

  • Base Model: mPLUG/UI-S1-7B
  • Quantization: Quanto by Hugging Face
  • Strategy: Custom hybrid quantization for VLM quality preservation