arxiv:2603.02083

π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Published on Mar 2 · Submitted by JeffWang on Mar 9

Abstract

Step-wise negative-aware fine-tuning enables efficient reinforcement learning for vision-language-action models by eliminating likelihood computation and auxiliary networks while improving generalization in complex environments.

AI-generated summary

Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose π-StepNFT (Step-wise Negative-aware Fine-Tuning), a critic-free and likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, π-StepNFT unlocks the latent potential of the base policy on LIBERO while remaining competitive in few-shot robustness. It also achieves superior generalization on ManiSkill, outperforming value-based baselines in out-of-distribution (OOD) scenarios by preventing overfitting to multimodal features. These properties make it a scalable and promising approach for complex real-world applications.
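
To make the "likelihood-free, single forward pass" claim concrete, here is a minimal sketch of what a step-wise, negative-aware update could look like for a flow-matching action head: regress the velocity field at a randomly sampled flow step and sign the regression loss by episode outcome, so no action likelihood or value network is ever computed. The interface, loss form, and hyperparameters below are assumptions for illustration, not the paper's exact algorithm.

```python
# Hypothetical sketch of a likelihood-free, step-wise, negative-aware
# update for a flow-matching action head. The loss form, per-step
# sampling, and all names are illustrative assumptions, not the
# paper's actual algorithm.
import torch
import torch.nn.functional as F

def stepwise_nft_loss(policy, obs, actions, success,
                      num_steps=10, neg_weight=0.1):
    """One optimization step with a single forward pass.

    policy(obs, x_t, t) -> predicted velocity field  (assumed interface)
    obs:     (B, ...) observations from on-policy rollouts
    actions: (B, A)   executed actions
    success: (B,)     1.0 for successful episodes, 0.0 for failures
    """
    # Sample one flow step per example, so every denoising step
    # receives its own (step-wise) supervision signal over training.
    k = torch.randint(0, num_steps, (actions.shape[0],), device=actions.device)
    t = k.float() / num_steps                        # flow time in [0, 1)
    noise = torch.randn_like(actions)
    # Rectified-flow interpolation x_t = (1 - t) * noise + t * action,
    # whose target velocity is (action - noise).
    x_t = (1 - t[:, None]) * noise + t[:, None] * actions
    v_target = actions - noise
    v_pred = policy(obs, x_t, t)                     # the single forward pass
    per_sample = F.mse_loss(v_pred, v_target, reduction="none").mean(dim=-1)
    # Negative-aware weighting: pull toward actions from successful
    # rollouts, push away (with a damped weight) from failed ones.
    sign = success + (success - 1.0) * neg_weight    # 1.0 or -neg_weight
    return (sign * per_sample).mean()
```

Damping the negative branch (neg_weight < 1) keeps the repulsive term from failed rollouts from dominating the update, which is one plausible way to remain stable without a critic.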

Community

Paper author · Paper submitter

π-StepNFT is a critic-free, likelihood-free online RL fine-tuning method for flow-based VLA policies.
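
For context, the outer loop of such an online scheme could be as simple as the following; collect_rollouts and its return format are hypothetical stand-ins for an actual rollout collector, and the hyperparameters are placeholders.

```python
# Hypothetical online fine-tuning loop around the loss sketched above;
# `collect_rollouts` and all hyperparameters are illustrative assumptions.
import torch

def finetune(policy, env, iters=1000, lr=1e-5, batch_size=256):
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    for _ in range(iters):
        # On-policy rollouts labeled only by binary task success;
        # no return estimation or critic training is needed.
        obs, actions, success = collect_rollouts(policy, env, batch_size)
        loss = stepwise_nft_loss(policy, obs, actions, success)
        opt.zero_grad()
        loss.backward()
        opt.step()
```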
