arxiv:2602.12108

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Published on Feb 12 · Submitted by Yan Wang on Feb 12

Abstract

StateLM enables language models to actively manage their own memory and context through internal reasoning loops and memory tools, significantly improving performance on long-document tasks and chat memory challenges.

AI-generated summary

In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop to manage their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to actively manage these tools. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed window. Experiments across various model sizes demonstrate StateLM's effectiveness across diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs across all model scales; on the chat memory task, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep research task BrowseComp-Plus, the performance gap becomes even more pronounced: StateLM achieves up to 52% accuracy, whereas standard LLM counterparts struggle around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents where reasoning becomes a stateful and manageable process.
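The abstract names three memory tools (context pruning, document indexing, and note-taking) but this page does not describe their interfaces or how the model is trained to call them. The toy sketch below only illustrates how such tools could interact around a bounded context. Everything in it is an assumption made for illustration: the names `ContextState`, `index_document`, `take_note`, `prune`, `read`, and `lookup` are hypothetical, a word count stands in for a token window, and the control logic is hard-coded where StateLM would learn its own policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def _words(text: str) -> List[str]:
    """Lowercase a chunk and strip simple punctuation from each word."""
    return [w.strip(".,;:!?\"'").lower() for w in text.split()]


@dataclass
class ContextState:
    """Hypothetical state: a bounded working context plus external memory."""
    budget: int                                            # max words kept in context
    context: List[str] = field(default_factory=list)       # working context (bounded)
    notes: List[str] = field(default_factory=list)         # note-taking store (unbounded)
    index: Dict[str, List[int]] = field(default_factory=dict)  # word -> chunk ids

    def used(self) -> int:
        """Number of words currently held in the working context."""
        return sum(len(chunk.split()) for chunk in self.context)

    def index_document(self, chunks: List[str]) -> None:
        """Document indexing: map each distinct word to the chunks containing it."""
        for i, chunk in enumerate(chunks):
            for word in set(_words(chunk)):
                self.index.setdefault(word, []).append(i)

    def take_note(self, text: str) -> None:
        """Note-taking: persist a short fact outside the working context."""
        self.notes.append(text)

    def prune(self) -> None:
        """Context pruning: drop the oldest chunks until the budget is met.

        A chunk larger than the whole budget is dropped as well.
        """
        while self.context and self.used() > self.budget:
            self.context.pop(0)

    def read(self, chunk: str) -> None:
        """Load a chunk into the working context, then enforce the budget."""
        self.context.append(chunk)
        self.prune()


def lookup(state: ContextState, chunks: List[str], keyword: str) -> List[int]:
    """Hard-coded stand-in for a learned policy: search, read, note, prune."""
    hits = state.index.get(keyword.lower(), [])
    for i in hits:
        state.read(chunks[i])
        state.take_note(f"chunk {i} mentions {keyword!r}")
    return hits


chunks = [
    "Dumbledore stores memories in a Pensieve.",
    "The wand chooses the wizard.",
    "A Pensieve lets memories be revisited later.",
]
state = ContextState(budget=8)
state.index_document(chunks)
hits = lookup(state, chunks, "Pensieve")
print(hits, state.context, state.notes)
```

In this sketch the notes outlive the pruned chunks, which mirrors the idea in the abstract that memory can be moved out of the fixed window and revisited later; how StateLM actually decides what to keep or discard is not specified here.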

Community

Paper submitter

It’s time to evolve from context engineering to Model-as-Context-Engineer.

By equipping LLMs with the intrinsic free() operation, effectively handing them the wand to master their own memory, we take a decisive step closer to AGI. This work presents the first agent architecture that generalizes remarkably well across Long-Document QA, Multi-Turn Dialogue, and Deep Search, proving that sustainable intelligence isn't just about remembering everything, but also about forgetting accurately.

Great work!

