Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play
Abstract
STRATAGEM addresses limitations in reasoning transfer for language models by using a reasoning transferability coefficient and evolution reward to promote abstract, domain-agnostic patterns over game-specific heuristics.
Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabilistic inference, and adaptive decision-making. However, existing self-play approaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics. We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer: domain specificity, where learned patterns remain anchored in game semantics, and contextual stasis, where static game contexts fail to cultivate progressive reasoning. STRATAGEM selectively reinforces trajectories exhibiting abstract, domain-agnostic reasoning through a Reasoning Transferability Coefficient, while incentivizing adaptive reasoning development via a Reasoning Evolution Reward. Experiments across mathematical reasoning, general reasoning, and code generation benchmarks demonstrate substantial improvements, with particularly strong gains on competition-level mathematics where multi-step reasoning is critical. Ablation studies and human evaluation confirm that both components contribute to transferable reasoning.
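The abstract describes terminal rewards being modulated by a Reasoning Transferability Coefficient and augmented by a Reasoning Evolution Reward. As a rough illustration only (the function `modulated_reward`, the scores `transferability`/`evolution`, and the weights `alpha`/`beta` are my hypothetical naming, not the paper's exact formulation), the idea of scaling a trajectory's outcome by how domain-agnostic its reasoning is can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    outcome: float          # terminal game result in [0, 1] (win = 1)
    transferability: float  # in [0, 1]; how domain-agnostic the reasoning is (assumed external scorer)
    evolution: float        # in [0, 1]; how much the reasoning improved across iterations

def modulated_reward(traj: Trajectory, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Hypothetical shaping: outcome scaled by transferability, plus an evolution bonus.

    An illustrative reconstruction of the idea, not the paper's actual objective.
    """
    return alpha * traj.outcome * traj.transferability + beta * traj.evolution

# Two winning trajectories: the abstract-reasoning one receives the larger update signal,
# so training preferentially reinforces transferable patterns over game-specific heuristics.
abstract_traj = Trajectory(outcome=1.0, transferability=0.9, evolution=0.8)
heuristic_traj = Trajectory(outcome=1.0, transferability=0.2, evolution=0.1)

assert modulated_reward(abstract_traj) > modulated_reward(heuristic_traj)
print(modulated_reward(abstract_traj), modulated_reward(heuristic_traj))
```

Both trajectories win their games, yet they receive very different learning signals, which is the point the abstract makes about terminal outcomes alone being insufficient.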
Community
Happy to share our latest paper: STRATAGEM. Our key idea is that not all successful game trajectories are equally useful for building general reasoning ability, so we explicitly reinforce those with higher transferability and stronger reasoning evolution. The resulting model shows strong gains across mathematical reasoning, general reasoning, and code generation benchmarks.
This is an automated message from Librarian Bot. I found the following similar papers, recommended by the Semantic Scholar API:
- ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning (2026)
- Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning (2026)
- SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions (2026)
- SAGE: Multi-Agent Self-Evolution for LLM Reasoning (2026)
- Learning Structured Reasoning via Tractable Trajectory Control (2026)
- Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use (2026)
- Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments (2026)
Get this paper in your agent: `hf papers read 2604.17696`
Don't have the latest CLI? `curl -LsSf https://hf.co/cli/install.sh | bash`
Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0