---
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---
<div align="center">
  <picture>
      <img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash">
  </picture>
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
  <a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
</div>

## 1. Model Introduction

JoyAI-LLM-Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens with the Muon optimizer, then refined through large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM-Flash achieves strong performance on frontier knowledge, reasoning, coding, and agentic tasks.

### Key Features

- **Fiber Bundle RL**: introduces fiber bundle theory into reinforcement learning through a novel optimization framework, FiberPO. The method is designed for large-scale, heterogeneous agent training and improves stability and robustness under complex data distributions.
- **Training-Inference Collaboration**: applies the Muon optimizer with dense multi-token prediction (MTP) and develops novel optimization techniques that resolve instabilities while scaling up, delivering 1.3× to 1.7× the throughput of the non-MTP version.
- **Agentic Intelligence**: designed for tool use, reasoning, and autonomous problem-solving.

## 2. Model Summary

| | | | |
| | :-----------------------------------------: | :----------------------: | |
| | **Architecture** | Mixture-of-Experts (MoE) | |
| | **Total Parameters** | 48B | |
| | **Activated Parameters** | 3B | |
| | **Number of Layers** (Dense layer included) | 40 | |
| | **Number of Dense Layers** | 1 | |
| | **Attention Hidden Dimension** | 2048 | |
| | **MoE Hidden Dimension** (per Expert) | 768 | |
| | **Number of Attention Heads** | 32 | |
| | **Number of Experts** | 256 | |
| | **Selected Experts per Token** | 8 | |
| | **Number of Shared Experts** | 1 | |
| | **Vocabulary Size** | 129K | |
| | **Context Length** | 128K | |
| | **Attention Mechanism** | MLA | |
| | **Activation Function** | SwiGLU | |
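As a rough sanity check on the numbers above, the routed experts alone account for most of the 48B total, while only 9 experts per MoE layer (8 routed + 1 shared) are active per token. The back-of-envelope sketch below assumes every non-dense layer is an MoE layer and that each SwiGLU expert has gate/up/down projections of size `hidden_dim × moe_hidden_dim`; this is an illustration, not the official parameter accounting.

```python
# Back-of-envelope parameter count from the summary table above.
# Assumptions (not official): 39 MoE layers (40 layers minus 1 dense),
# and 3 projection matrices (gate, up, down) per SwiGLU expert.
hidden_dim = 2048            # attention hidden dimension
expert_dim = 768             # MoE hidden dimension per expert
moe_layers = 40 - 1          # total layers minus the dense layer
num_experts = 256
active_experts = 8 + 1       # 8 routed + 1 shared per token

params_per_expert = 3 * hidden_dim * expert_dim
total_expert_params = moe_layers * num_experts * params_per_expert
active_expert_params = moe_layers * active_experts * params_per_expert

print(f"per expert:       {params_per_expert / 1e6:.1f}M")
print(f"all experts:      {total_expert_params / 1e9:.1f}B")    # ~47B of the 48B total
print(f"active per token: {active_expert_params / 1e9:.2f}B")   # ~1.7B; attention + embeddings bring it toward 3B
```

The expert parameters dominate the total, which is why an MoE model of this size can run with roughly the cost of a 3B dense model per token.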

## 3. Evaluation Results

<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center"><sup>JoyAI-LLM Flash</sup></th>
<th align="center"><sup>Qwen3-30B-A3B-Instruct-2507</sup></th>
<th align="center"><sup>GLM-4.7-Flash<br>(Non-thinking)</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan=4><strong>Knowledge & Alignment</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MMLU</td>
<td align="center" style="vertical-align: middle"><strong>89.50</strong></td>
<td align="center" style="vertical-align: middle">86.87</td>
<td align="center" style="vertical-align: middle">80.53</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MMLU-Pro</td>
<td align="center" style="vertical-align: middle"><strong>81.02</strong></td>
<td align="center" style="vertical-align: middle">73.88</td>
<td align="center" style="vertical-align: middle">63.62</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">CMMLU</td>
<td align="center" style="vertical-align: middle"><strong>87.03</strong></td>
<td align="center" style="vertical-align: middle">85.88</td>
<td align="center" style="vertical-align: middle">75.85</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">GPQA-Diamond</td>
<td align="center" style="vertical-align: middle"><strong>74.43</strong></td>
<td align="center" style="vertical-align: middle">68.69</td>
<td align="center" style="vertical-align: middle">39.90</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">SuperGPQA</td>
<td align="center" style="vertical-align: middle"><strong>55.00</strong></td>
<td align="center" style="vertical-align: middle">52.00</td>
<td align="center" style="vertical-align: middle">32.00</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">LiveBench</td>
<td align="center" style="vertical-align: middle"><strong>72.90</strong></td>
<td align="center" style="vertical-align: middle">59.70</td>
<td align="center" style="vertical-align: middle">43.10</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">IFEval</td>
<td align="center" style="vertical-align: middle"><strong>86.69</strong></td>
<td align="center" style="vertical-align: middle">83.18</td>
<td align="center" style="vertical-align: middle">82.44</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">AlignBench</td>
<td align="center" style="vertical-align: middle"><strong>8.24</strong></td>
<td align="center" style="vertical-align: middle">8.07</td>
<td align="center" style="vertical-align: middle">6.85</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">HellaSwag</td>
<td align="center" style="vertical-align: middle"><strong>91.79</strong></td>
<td align="center" style="vertical-align: middle">89.90</td>
<td align="center" style="vertical-align: middle">60.84</td>
</tr>
<tr>
<td align="center" colspan=4><strong>Coding</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">HumanEval</td>
<td align="center" style="vertical-align: middle"><strong>96.34</strong></td>
<td align="center" style="vertical-align: middle">95.12</td>
<td align="center" style="vertical-align: middle">74.39</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">LiveCodeBench</td>
<td align="center" style="vertical-align: middle"><strong>65.60</strong></td>
<td align="center" style="vertical-align: middle">39.71</td>
<td align="center" style="vertical-align: middle">27.43</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">SciCode</td>
<td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
<td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
<td align="center" style="vertical-align: middle">3.08/15.11</td>
</tr>
<tr>
<td align="center" colspan=4><strong>Mathematics</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">GSM8K</td>
<td align="center" style="vertical-align: middle"><strong>95.83</strong></td>
<td align="center" style="vertical-align: middle">79.83</td>
<td align="center" style="vertical-align: middle">81.88</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">AIME2025</td>
<td align="center" style="vertical-align: middle"><strong>65.83</strong></td>
<td align="center" style="vertical-align: middle">62.08</td>
<td align="center" style="vertical-align: middle">24.17</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MATH 500</td>
<td align="center" style="vertical-align: middle"><strong>97.10</strong></td>
<td align="center" style="vertical-align: middle">89.80</td>
<td align="center" style="vertical-align: middle">90.90</td>
</tr>
<tr>
<td align="center" colspan=4><strong>Agentic</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">SWE-bench Verified</td>
<td align="center" style="vertical-align: middle"><strong>60.60</strong></td>
<td align="center" style="vertical-align: middle">24.44</td>
<td align="center" style="vertical-align: middle">51.60</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">Tau2-Retail</td>
<td align="center" style="vertical-align: middle"><strong>67.55</strong></td>
<td align="center" style="vertical-align: middle">53.51</td>
<td align="center" style="vertical-align: middle">62.28</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">Tau2-Airline</td>
<td align="center" style="vertical-align: middle"><strong>54.00</strong></td>
<td align="center" style="vertical-align: middle">32.00</td>
<td align="center" style="vertical-align: middle">52.00</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">Tau2-Telecom</td>
<td align="center" style="vertical-align: middle">79.83</td>
<td align="center" style="vertical-align: middle">4.39</td>
<td align="center" style="vertical-align: middle"><strong>88.60</strong></td>
</tr>
<tr>
<td align="center" colspan=4><strong>Long Context</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">RULER</td>
<td align="center" style="vertical-align: middle"><strong>95.60</strong></td>
<td align="center" style="vertical-align: middle">89.66</td>
<td align="center" style="vertical-align: middle">56.12</td>
</tr>
</tbody>
</table>

## 4. Deployment

> [!NOTE]
> You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which provides OpenAI/Anthropic-compatible endpoints.
> Currently, we recommend running JoyAI-LLM-Flash-INT4 on the following inference engines:

* vLLM
* SGLang

The minimum required `transformers` version is `4.57.1`.

Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
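For self-hosting, a minimal launch sketch might look like the following. The exact flags, parallelism settings, and whether the released checkpoint requires `--trust-remote-code` are assumptions here; the [Model Deployment Guide](docs/deploy_guidance.md) is authoritative.

```shell
# Hypothetical vLLM launch; adjust the model ID and parallelism to your hardware.
vllm serve jdopensource/JoyAI-LLM-Flash --trust-remote-code --tensor-parallel-size 2

# Hypothetical SGLang launch of the same checkpoint.
python -m sglang.launch_server --model-path jdopensource/JoyAI-LLM-Flash --trust-remote-code --tp 2
```

Both engines then expose an OpenAI-compatible endpoint that the usage examples below can target via `base_url`.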

## 5. Model Usage

The demos below show how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note:

> [!NOTE]
> Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`

### Chat Completion

This is a simple chat completion script showing how to call the JoyAI-Flash API.

```python
from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        temperature=0.6,  # recommended sampling parameters
        top_p=1.0,
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
```

### Tool Call Completion

This is a simple tool call completion script showing how to call the JoyAI-Flash API.

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def my_calculator(expression: str) -> str:
    # NOTE: eval() is acceptable for a demo, but do not run it on
    # untrusted model output in production.
    return str(eval(expression))


def rewrite(text: str) -> str:
    return str(text)


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
    tool_calls = response.choices[0].message.tool_calls
    if not tool_calls:
        # The model answered directly without calling a tool.
        print(response.choices[0].message.content)
        return

    # Execute each requested tool, producing one result per call.
    results = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        if function_name == "my_calculator":
            results.append(my_calculator(**function_args))
        elif function_name == "rewrite":
            results.append(rewrite(**function_args))
        else:
            results.append(f"unknown tool: {function_name}")
    messages.append({"role": "assistant", "tool_calls": tool_calls})
    for tool_call, result in zip(tool_calls, results):
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": result,
            }
        )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
```
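The `my_calculator` demo above relies on `eval`, which is unsafe on model-produced input. One possible safer variant restricts evaluation to plain arithmetic using Python's `ast` module; this is a sketch, and the helper name `safe_calculator` is ours, not part of the API.

```python
import ast
import operator

# Map AST operator nodes to their arithmetic implementations.
# Anything outside this whitelist (names, calls, attributes, ...) is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def safe_calculator(expression: str) -> str:
    """Evaluate a purely arithmetic expression; raise ValueError otherwise."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return str(_eval(ast.parse(expression, mode="eval")))
```

Dropping this in as the body of `my_calculator` keeps the tool-call flow identical while refusing expressions such as `__import__('os')`.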

---

## 6. License

Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).