
This model area collects models and tools around Mägic. The research milestones are called Skipper (T3) and Mate (M8).

The Mägic project is a Proto Open Source project (OpenSoars): it does NOT publish its code, but applies its benefits ONLY to OSI-compliant models and some select Open Weights models. The goal is to strengthen the True Open Source model family. There are currently T3-based demos on Hugging Face Spaces to try out model behavior under such extreme compression. Further variants for faster inference and local inference will follow.

SmartQuant

SmartQuant is not Mägic. It exists to provide a baseline with improved default compression. It adds support for regular and ik_llama.cpp quants for unmodified, fine-tuned, and REAP/REAM models.

Mägic

Mägic splits the work into two parts: closed (secret) code for lossless model conversion and public code for efficient model inference.

Converted GGUF models will be provided for:

  • all OSI-compliant language models (Olmo, Apertus, Smol, ...)
  • select Open Weights language models (TBD: Mistral, Granite, ...)

Inference software based on llama.cpp and ik_llama.cpp will be provided as open source under the Apache 2.0 license.

  • Initial versions target 2 bpw at fp16 quality with a 4x speedup: a single RTX 4090 will be able to serve a 70B model as fast as an H200 does today.
  • T3 showed that 1.4 bpw and a 10x speedup are possible. That is Mägic.
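As a rough sanity check on the figures above, the weight-storage arithmetic can be sketched as follows (the function name is ours, not part of any project API; the estimate ignores KV cache, activations, and per-tensor overhead):

```python
def model_size_gb(params_b: float, bpw: float) -> float:
    """Approximate weight storage for `params_b` billion parameters
    at `bpw` bits per weight, in GB (decimal)."""
    return params_b * 1e9 * bpw / 8 / 1e9

# A 70B model at fp16 (16 bpw) vs. the targeted compression levels:
print(model_size_gb(70, 16))   # 140.0 GB -- far beyond a single consumer GPU
print(model_size_gb(70, 2))    # 17.5 GB -- fits in a single RTX 4090 (24 GB)
print(model_size_gb(70, 1.4))  # ~12.25 GB -- the T3 / Mägic level
```

At 2 bpw the 70B weights shrink to 17.5 GB, which is why a single 24 GB RTX 4090 becomes plausible as a serving target.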

Demo Spaces

  • Regular compression
  • T3 OSI compression
    • TOM@zero: demo of next-generation 2 bpw compression (Skipper, aka T3) with high-quality open source (OSI) models
      • Olmo3
      • Smol3: currently failing with an "invalid continuation byte" error
      • Apertus: currently failing with an "invalid continuation byte" error
  • T3 Open Weights compression
    • Granite4extreme: Granite 4 Small hybrid (32B) compressed to below 9 GB at fp16 quality
  • Mägic compression
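The Granite4extreme numbers can be cross-checked with the same kind of back-of-the-envelope arithmetic (the helper name is ours, for illustration only):

```python
def implied_bpw(size_gb: float, params_b: float) -> float:
    """Bits per weight implied by storing `params_b` billion
    parameters in `size_gb` GB (decimal)."""
    return size_gb * 1e9 * 8 / (params_b * 1e9)

# 32B parameters in under 9 GB implies:
print(implied_bpw(9, 32))  # 2.25 bpw
```

9 GB for a 32B model works out to 2.25 bpw, which is consistent with the ~2 bpw compression level described above.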