Abstract
NE-Dreamer uses a temporal transformer to predict next-step encoder embeddings for model-based reinforcement learning without requiring decoders or auxiliary supervision.
Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
Community
Most world models learn representations by reconstructing pixels. But reconstruction isn’t necessarily aligned with control.
In this paper we explore a different idea:
➡️ predict the next encoder embedding instead of reconstructing the observation.
Using a next-embedding prediction objective and a temporal transformer over latents, NE-Dreamer learns temporally predictive latent states and significantly improves performance on hard navigation tasks.
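As a rough illustration only (not the paper's implementation), the next-embedding objective can be sketched with a single-head causal attention layer standing in for the temporal transformer: the model at step t predicts the encoder embedding at step t+1, and the targets are treated as fixed (stop-gradient). All names, shapes, and the cosine loss choice here are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 6, 8  # sequence length, embedding dimension (hypothetical sizes)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(z, Wq, Wk, Wv):
    # Single-head causal self-attention over the latent sequence:
    # position t may attend only to positions <= t.
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    scores = q @ k.T / np.sqrt(D)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -1e9  # causal mask
    return softmax(scores) @ v

# Hypothetical encoder embeddings z_1..z_T for a trajectory of observations.
z = rng.normal(size=(T, D))
Wq, Wk, Wv = (0.1 * rng.normal(size=(D, D)) for _ in range(3))
pred = causal_attention(z, Wq, Wk, Wv)  # pred[t] ~ predicted embedding z[t+1]

def cosine_loss(a, b):
    # 1 - cosine similarity, averaged over time steps.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

# Next-embedding prediction loss: align pred[t] with the next encoder
# embedding z[t+1]; in training, gradients would not flow into z[1:]
# (conceptual stop-gradient), so no decoder or reconstruction is needed.
loss = cosine_loss(pred[:-1], z[1:])
print(round(loss, 4))
```

The key difference from reconstruction-based world models is that the loss lives entirely in representation space: nothing is decoded back to pixels.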