danielhanchen (Daniel (Unsloth))

posted an update 1 day ago

Post

2316

Kimi K3 can now be run locally! ✨

The 1-bit model retains ~78.9% accuracy after we shrunk it from 1.56TB to 594GB (-62% size).

Run on a Mac Studio connected with 128GB RAM device. Kimi K3 is the strongest open model to date.

GGUF: unsloth/Kimi-K3-GGUF
Guide: https://unsloth.ai/docs/models/kimi-k3

2 replies

·

replied to their post 10 days ago

As long as it's above RDNA2 it should work

posted an update 10 days ago

Post

4649

Introducing Unsloth for AMD 🚀
You can now train & run LLMs on your AMD hardware

• We collaborated with AMD to enable you to train & run 500+ models on AMD GPUs
• Works on Windows, WSL, Linux
• Train Qwen, Gemma on just 3GB VRAM

GitHub: https://github.com/unslothai/unsloth
Blog + Guide: https://unsloth.ai/docs/basics/amd

3 replies

·

replied to their post 11 days ago

If you read our graphic, it says you can update the template as well. Most people don't know how to replace the chat template.

replied to their post 12 days ago

Our MLX quants were update: https://huggingface.co/collections/unsloth/gemma-4

replied to their post 12 days ago

It was posted officially by Google: https://x.com/googlegemma/status/2077449152062247219

posted an update 12 days ago

Post

5705

Gemma 4 is now faster and much more accurate! 🚀

Google made huge improvements to tool-calling and chat accuracy, reliability + speed.
To get fixes, re-download our updated GGUF, MLX, NVFP4 quants!

Unsloth quants: https://huggingface.co/collections/unsloth/gemma-4
Gemma 4 Guide: https://unsloth.ai/docs/models/gemma-4

8 replies

·

posted an update 16 days ago

Post

4640

We’re releasing Gemma 4 NVFP4 quants that run 1.5× faster on your GPU.

Gemma-4-12B NVFP4 works on 11GB VRAM.
26B-A4B hits 13K tok/s (B200).

Unsloth NVFP4 enables faster, more accurate 4-bit Blackwell inference.

Blog: https://unsloth.ai/docs/basics/nvfp4
Gemma NVFP4: https://huggingface.co/collections/unsloth/nvfp4

3 replies

·

posted an update 20 days ago

Post

4227

We’re releasing new Qwen3.6 quants that run 2.5× faster on your GPU. ⚡

Qwen3.6-27B NVFP4 runs on 24GB VRAM.
35B-A3B can hit 17,561 tok/s (B200).

We also improved accuracy, tool calling, agent use, and looping.

Qwen3.6 NVFP4: https://huggingface.co/collections/unsloth/nvfp4
Guide: https://unsloth.ai/docs/models/qwen3.6#nvfp4

1 reply

·

posted an update 23 days ago

Post

6124

DeepSeek-V4 can now run locally with Unsloth GGUFs! 🐳

Run lossless DeepSeek-V4-Flash on 168GB RAM or
3-bit works on 110GB Mac, RAM, VRAM setups.

Run via Unsloth Studio or llama.cpp.

GGUF: unsloth/DeepSeek-V4-Flash-GGUF
Guide: https://unsloth.ai/docs/models/deepseek-v4

posted an update about 1 month ago

Post

3355

1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5

We gave 3 models the same prompt and compared one-shot outputs.

The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.

Which output do you like best?
GGUF: unsloth/GLM-5.2-GGUF

3 replies

·

posted an update about 1 month ago

Post

4603

Google's new DiffusionGemma can now run at 2000+ tokens/sec! ⚡

We made local DiffusionGemma inference 1.8× faster.
Run it on 18GB RAM via Unsloth Studio.

GitHub: https://github.com/unslothai/unsloth
Guide: https://unsloth.ai/docs/models/diffusiongemma

4 replies

·

posted an update about 2 months ago

Post

1191

Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.

Run with 4x faster text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio.

GGUF: unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma

1 reply

·

posted an update about 2 months ago

Post

4299

Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 at 3x less memory with near original performance.

QAT makes it possible to run Gemma 4 26B-A4B on 16GB RAM.

GGUFs: https://huggingface.co/collections/unsloth/gemma-4-qat
QAT Guide: https://unsloth.ai/docs/models/gemma-4/qat

1 reply

·

posted an update about 2 months ago

Post

9348

Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.

Google's new model, Gemma 4 12B Unified supports image, audio and 256K context.
You can run and train the model via Unsloth Studio.

GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4

5 replies

·

posted an update 2 months ago

Post

2838

Qwen3.6 MTP is here! Run locally on 20GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B: unsloth/Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B: unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

2 replies

·

posted an update 3 months ago

Post

6000

We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth

2 replies

·

posted an update 3 months ago

Post

7791

We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀

Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing

Guide: https://unsloth.ai/blog/nvidia-collab
GitHub: https://github.com/unslothai/unsloth

posted an update 3 months ago

Post

8942

We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM

Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp

Guide: https://unsloth.ai/docs/basics/api