arxiv:2605.06642

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Published on May 7 · Submitted by yifanzhou on May 8
Abstract

The Strategic Trajectory Abstraction framework enhances long-horizon decision making in large language models by introducing trajectory-level strategies that improve sample efficiency and performance across interactive environments.

AI-generated summary

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajectory Abstraction (StraTA), a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL). StraTA samples a compact strategy from the initial task state, conditions subsequent actions on that strategy, and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment. Experiments on ALFWorld, WebShop, and SciWorld show that StraTA consistently improves both sample efficiency and final performance over strong baselines. StraTA reaches success rates of 93.1% on ALFWorld and 84.2% on WebShop. On SciWorld, StraTA attains a 63.5% overall score, outperforming frontier closed-source models.

Community

Hi everyone! 👋 Sharing our latest work: StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction.

Should long-horizon LLM agents stay purely reactive? We argue they shouldn't. When the agent has to decide both the immediate action and the overall course of action from the current state alone, planning and execution get tangled, and exploration and credit assignment suffer.

Our fix: StraTA samples a compact natural-language strategy from the initial state and conditions all subsequent actions on it. We jointly train both levels with a hierarchical GRPO-style rollout, enhanced by diverse strategy sampling and critical self-judgment.
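To make the two-level idea concrete, here is a minimal sketch of a hierarchical, GRPO-style rollout: sample a group of strategies from the initial state, run one strategy-conditioned trajectory per strategy, and score each with a group-relative (mean-subtracted, std-normalized) advantage. All names (`sample_strategy`, `env_step`, `policy`) are hypothetical stand-ins for the LLM and environment, and the paper's actual training objective (including diverse strategy sampling and critical self-judgment) is richer than this toy version.

```python
def rollout(strategy, env_step, policy, max_steps=8):
    """Roll out one trajectory whose actions are all conditioned on a
    fixed strategy string; return the trajectory's total reward."""
    state, total = "initial", 0.0
    for _ in range(max_steps):
        action = policy(strategy, state)            # action conditioned on the strategy
        state, reward, done = env_step(state, action)
        total += reward
        if done:
            break
    return total

def grpo_advantages(rewards):
    """Group-relative advantages: each reward minus the group mean,
    divided by the group std (the GRPO-style normalization)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0                         # avoid division by zero
    return [(r - mean) / std for r in rewards]

def hierarchical_rollout(sample_strategy, env_step, policy, group_size=4):
    """Sample a group of strategies from the initial task state, roll out
    one trajectory per strategy, and pair each strategy with its
    group-relative advantage (used to update both levels jointly)."""
    strategies = [sample_strategy() for _ in range(group_size)]
    rewards = [rollout(s, env_step, policy) for s in strategies]
    return list(zip(strategies, grpo_advantages(rewards)))
```

In this sketch the same advantage signal would reinforce both the strategy sampler and the strategy-conditioned action policy, which is the sense in which the two levels are trained jointly.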

Does it work? StraTA improves both sample efficiency and final performance, outperforming both frontier closed-source models and prior RL baselines.

We believe StraTA highlights the value of explicit trajectory-level abstraction for more structured and effective long-horizon agentic RL.

