π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
Abstract
Step-wise negative-aware fine-tuning enables efficient reinforcement learning for vision-language-action models by eliminating likelihood computation and auxiliary networks while improving generalization in complex environments.
Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose π-StepNFT (Step-wise Negative-aware Fine-Tuning), a critic-and-likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, π-StepNFT unlocks the latent capabilities of flow-based VLAs on LIBERO, with competitive few-shot robustness. Moreover, it achieves superior generalization on ManiSkill, outperforming value-based baselines in OOD scenarios by preventing overfitting to multimodal features. These properties make π-StepNFT a scalable and promising solution for complex real-world applications.
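The abstract does not spell out the training objective, but the key ingredients it names (no likelihood computation, no critic, a single forward pass per optimization step, and sign-aware use of negative rollouts) admit a rough sketch. The following is a minimal illustration under stated assumptions, not the paper's confirmed method: it assumes a rectified-flow policy whose velocity network is fine-tuned with a flow-matching regression loss that is pulled toward actions from successful rollouts and pushed away from failed ones. The names `velocity_net`, `neg_weight`, and the linear interpolation path are hypothetical choices for illustration.

```python
import torch

def stepwise_nft_loss(velocity_net, actions, obs_emb, success, neg_weight=0.1):
    """Hypothetical step-wise negative-aware fine-tuning loss (a sketch,
    not the paper's exact objective).

    velocity_net(a_t, t, obs_emb) -> predicted velocity, flow-matching style.
    actions:  (B, D) action chunks executed during online rollouts.
    obs_emb:  (B, E) observation embeddings conditioning the policy.
    success:  (B,)  bool tensor, True if the rollout solved the task.
    """
    B = actions.shape[0]
    device = actions.device

    # Sample one interpolation time per rollout, so each optimization step
    # costs a single forward pass; t indexes the denoising step being guided.
    t = torch.rand(B, 1, device=device)
    noise = torch.randn_like(actions)

    # Linear (rectified-flow) interpolation path: a_t = (1 - t) * noise + t * a_1,
    # whose ground-truth velocity is a_1 - noise.
    a_t = (1.0 - t) * noise + t * actions
    target_v = actions - noise
    pred_v = velocity_net(a_t, t, obs_emb)   # the single forward pass

    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)

    # Step-wise negative awareness: regress toward successful actions,
    # repel (with a smaller weight) from failed ones. No critic or
    # likelihood is ever evaluated.
    sign = torch.where(
        success,
        torch.ones(B, device=device),
        -neg_weight * torch.ones(B, device=device),
    )
    return (sign * per_sample).mean()
```

Because supervision is attached to each sampled denoising time `t` rather than to the full multi-step sampling chain, this kind of objective sidesteps the intractable likelihood of flow-based policies while still delivering the finer, step-wise guidance the abstract argues a wider exploration space requires.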