arXiv:2602.13367

Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Published on Feb 13 · Submitted by Ben Kelly on Feb 17
Abstract

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning, optimizing for both correctness and efficiency. For deep search, we perform complex data synthesis and incorporate turn-level supervision during training, which enables stable long-horizon tool interactions and allows Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem solving. Extensive experiments show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.

AI-generated summary

Nanbeige4.1-3B is a 3B-parameter unified language model that achieves strong agentic behavior, code generation, and reasoning, outperforming larger models through advanced reward modeling and training techniques.
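The abstract only names these reward-design ideas, so here is a minimal, hypothetical sketch of what they could look like in practice: a reward-model loss that blends a point-wise objective with a pair-wise preference objective, and a complexity-aware code reward that only credits efficiency once a solution is correct. All function names, mixing weights, and the runtime-budget formulation below are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only; the paper's actual objectives are not public
# in this abstract. Requires PyTorch.
import torch
import torch.nn.functional as F


def combined_reward_loss(scores_pos, scores_neg, labels_pos, labels_neg, alpha=0.5):
    """Blend a pair-wise and a point-wise reward-modeling objective.

    scores_pos / scores_neg: scalar reward-head logits for the preferred and
    rejected responses of each pair, shape (batch,).
    labels_pos / labels_neg: point-wise quality labels in [0, 1].
    alpha: hypothetical mixing weight between the two objectives.
    """
    # Pair-wise (Bradley-Terry style): the preferred response should score higher.
    pairwise = -F.logsigmoid(scores_pos - scores_neg).mean()
    # Point-wise: each score should also match its absolute quality label.
    pointwise = F.binary_cross_entropy_with_logits(
        torch.cat([scores_pos, scores_neg]),
        torch.cat([labels_pos, labels_neg]),
    )
    return alpha * pairwise + (1.0 - alpha) * pointwise


def complexity_aware_reward(passed, runtime, runtime_budget):
    """Hypothetical complexity-aware code reward: efficiency is only
    credited once the solution is correct (all tests passed)."""
    if not passed:
        return 0.0
    # Bonus shrinks as measured runtime approaches the per-problem budget.
    efficiency = max(0.0, 1.0 - runtime / runtime_budget)
    return 1.0 + efficiency
```

Gating the efficiency bonus on test success is one way such a reward can avoid favoring fast but incorrect programs; the paper's precise formulation may well differ.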

Community

Via Victor Mustar's recommendation here. **I am not affiliated with this paper!** I am only submitting it to the Daily so that others can enjoy it!

https://x.com/victormustar/status/2023423300278583727

