DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers Paper • 2602.02016 • Published 15 days ago • 11
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published 18 days ago • 55
tencent/HunyuanImage-3.0-Instruct-Distil Image-to-Image • 83B • Updated 14 days ago • 696 • 44
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Nov 30, 2025 • 23
TiDAR: Think in Diffusion, Talk in Autoregression Paper • 2511.08923 • Published Nov 12, 2025 • 128
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 28
The Smol Training Playbook 📚 The secrets to building world-class LLMs • 2.99k
ISTA-DASLab/Qwen3-30B-A3B-Instruct-2507-W4A4-mxfp4-gptq-hadamard-transform 17B • Updated Nov 5, 2025 • 10
ISTA-DASLab/Qwen3-30B-A3B-Instruct-2507-W4A4-mxfp4-gptq-identity-transform 17B • Updated Nov 5, 2025 • 1