Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 34
mklasby/self-distill-qwen-qwen3-30b-a3b-theblackcat102-evol-codealpaca-v1 Viewer • Updated Mar 9 • 111k • 15
mklasby/self-distill__qwen-qwen3-30b-a3b__theblackcat102-evol-codealpaca-v1__greedy__seed-42 Viewer • Updated Mar 8 • 1.02k • 11
Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated Feb 25 • 138