---
language:
- zh
- en
pipeline_tag: text-generation
---
# JoyAI-LLM Flash-Base

## 1. Model Introduction

JoyAI-LLM Flash-Base is a state-of-the-art mixture-of-experts (MoE) language model with 3 billion activated parameters and 48 billion total parameters. Trained with the Muon optimizer, JoyAI-LLM Flash-Base achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. The JoyAI-LLM Flash series aims to accelerate high-throughput, latency-sensitive applications where the cost per query must remain minimal.

### Key Features

- **Training-Inference Collaboration**: Applies the Muon optimizer with dense MTP (multi-token prediction) and develops novel optimization techniques to resolve instabilities while scaling up, delivering 1.3× to 1.7× the throughput of the non-MTP version.
- **Agentic Intelligence**: Specifically designed for tool use, reasoning, and autonomous problem-solving.

## 2. Model Summary

|                                             |                          |
| :-----------------------------------------: | :----------------------: |
| **Architecture**                            | Mixture-of-Experts (MoE) |
| **Total Parameters**                        | 48B                      |
| **Activated Parameters**                    | 3B                       |
| **Number of Layers** (Dense layer included) | 40                       |
| **Number of Dense Layers**                  | 1                        |
| **Attention Hidden Dimension**              | 2048                     |
| **MoE Hidden Dimension** (per Expert)       | 768                      |
| **Number of Attention Heads**               | 32                       |
| **Number of Experts**                       | 256                      |
| **Selected Experts per Token**              | 8                        |
| **Number of Shared Experts**                | 1                        |
| **Vocabulary Size**                         | 129K                     |
| **Context Length**                          | 128K                     |
| **Attention Mechanism**                     | MLA                      |
| **Activation Function**                     | SwiGLU                   |
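
To make the routing figures in the table concrete, here is a minimal PyTorch sketch of the MoE feed-forward block they imply: 256 routed SwiGLU experts with a per-expert hidden size of 768, top-8 selection per token, and one always-active shared expert operating on a 2048-dimensional hidden state. The softmax router, the renormalization of the kept routing weights, and all class and parameter names are illustrative assumptions rather than the released implementation; load balancing, MLA attention, and the dense first layer are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUExpert(nn.Module):
    """One feed-forward expert with a SwiGLU gate (dimensions from the table above)."""

    def __init__(self, d_model: int = 2048, d_ff: int = 768):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class TopKMoE(nn.Module):
    """Illustrative top-8-of-256 routing with one always-active shared expert."""

    def __init__(self, d_model: int = 2048, d_ff: int = 768,
                 n_experts: int = 256, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # router form is an assumption
        self.experts = nn.ModuleList([SwiGLUExpert(d_model, d_ff) for _ in range(n_experts)])
        self.shared_expert = SwiGLUExpert(d_model, d_ff)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)               # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)           # keep 8 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize kept weights (assumption)
        out = self.shared_expert(x)                              # shared expert sees every token
        for e, expert in enumerate(self.experts):
            hit = (idx == e)                                     # (n_tokens, top_k) hits for expert e
            token_mask = hit.any(dim=-1)
            if token_mask.any():
                w = (weights * hit)[token_mask].sum(dim=-1, keepdim=True)
                out[token_mask] = out[token_mask] + w * expert(x[token_mask])
        return out
```

Per the table, the released model stacks 40 such layers (the first being dense rather than MoE) around MLA attention; note that the default sizes above instantiate a full-width layer, so smaller toy values are more convenient for experimentation.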
## 3. Evaluation Results

| Benchmark     | JoyAI-LLM Flash-Base | Qwen3-30B-A3B-Base |
| :------------ | :------------------: | :----------------: |
| MMLU          |        84.70         |       82.12        |
| MMLU-Pro      |        73.14         |       61.76        |
| CMMLU         |        83.09         |       83.60        |
| HumanEval     |        85.37         |       87.80        |
| LiveCodeBench |        39.91         |       37.34        |
| GSM8K         |        88.78         |       90.37        |
| MATH          |        78.16         |       59.60        |
| MATH 500      |        77.00         |       58.00        |
## 4. License

Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).