πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
Paper: arXiv:2510.25889
Overview
πRL provides online reinforcement learning (RL) fine-tuning for the flow-based vision-language-action (VLA) models π₀ and π₀.₅ within the RLinf framework. By combining PPO or GRPO with flow-matching policies, it lets models initialized by few-shot supervised fine-tuning (SFT) reach strong manipulation performance by learning from environment feedback. It supports the LIBERO, ManiSkill3, MetaWorld, and CALVIN benchmarks, as well as a Real2Sim2Real setup (GSEnv).
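The PPO/GRPO side of this recipe can be illustrated with a minimal, framework-free Python sketch: GRPO-style advantages normalize each rollout's reward against the other rollouts of the same task, and those advantages then enter a PPO-style clipped surrogate. This is a generic textbook formulation, not the RLinf or πRL implementation. The function names, the eps and clip_eps defaults, and the binary success rewards are illustrative assumptions, and the code takes the old and new log-probabilities of each sampled action as given (computing them for a flow-matching policy is not shown here).

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward by its group's mean and std.

    `rewards` holds the returns of several rollouts of the same task.
    eps avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate (to be maximized) over a batch of actions.

    Assumes per-action log-probabilities under the current and the
    rollout-time policy are already available (illustrative inputs).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical group of 4 rollouts on one task: 1.0 = success, 0.0 = failure.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

If every rollout in a group receives the same reward (all successes or all failures), every advantage in that group is zero, so the group contributes no policy-gradient signal. Sampling several rollouts per task is what gives GRPO its baseline without a learned critic, whereas standard PPO relies on a value function for that role.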
Results
π₀ Model
π₀.₅ Model
Quick Start
Full guide: RL on π0 and π0.5 Models
Run: bash examples/embodiment/run_embodiment.sh <CONFIG_NAME> (configs in examples/embodiment/config/)
Model Selection:
π₀: Configs without _pi05 in the name
π₀.₅: Configs with _pi05 in the name (e.g. *_openpi_pi05.yaml)
Benchmarks:
LIBERO: RL with LIBERO Benchmarks
ManiSkill3: RL with ManiSkill Benchmark
MetaWorld: RL with MetaWorld Benchmark
CALVIN: RL with CALVIN Benchmark
Real2Sim2Real (GSEnv): RL with Real2Sim2Real GSEnv
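To make the config naming rule concrete, here is a small Python sketch that sorts config names into π₀ and π₀.₅ groups and wraps the documented launch command. It assumes it is run from the repository root, that <CONFIG_NAME> is the YAML file name without the .yaml extension, and that all configs live directly in examples/embodiment/config/. The helpers model_family, find_configs, group_by_family, and launch are hypothetical, not part of RLinf.

```python
import subprocess
from pathlib import Path

# Paths as given in the Quick Start; assumes the working directory is the repo root.
CONFIG_DIR = Path("examples/embodiment/config")
RUN_SCRIPT = "examples/embodiment/run_embodiment.sh"


def model_family(config_name: str) -> str:
    """Naming rule from the guide: '_pi05' in the name means pi0.5, otherwise pi0."""
    return "pi0.5" if "_pi05" in config_name else "pi0"


def find_configs(config_dir: Path = CONFIG_DIR) -> list[str]:
    """Return config names, assumed (hypothetically) to be YAML file stems."""
    return sorted(p.stem for p in config_dir.glob("*.yaml"))


def group_by_family(config_names) -> dict[str, list[str]]:
    """Split config names into pi0 and pi0.5 groups using model_family."""
    groups: dict[str, list[str]] = {"pi0": [], "pi0.5": []}
    for name in config_names:
        groups[model_family(name)].append(name)
    return groups


def launch(config_name: str) -> None:
    """Invoke the documented launcher: bash run_embodiment.sh <CONFIG_NAME>."""
    subprocess.run(["bash", RUN_SCRIPT, config_name], check=True)
```

For example, group_by_family(find_configs()) lists which configs target each model, and launch(name) starts training with one of them. Keeping the grouping logic in a pure function makes the naming rule easy to check without touching the file system.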
Citation
@article{chen2025pi_rl,
title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Li, Xiang and Zhang, Quanlu and Yu, Zhaofei and others},
journal={arXiv preprint arXiv:2510.25889},
year={2025}
}