πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models#
Paper: arXiv:2510.25889
Overview#
πRL provides online reinforcement learning (RL) fine-tuning for the flow-based vision-language-action (VLA) models π₀ and π₀.₅ within the RLinf framework. It combines PPO or GRPO with flow-matching policies so that models initialized by few-shot supervised fine-tuning (SFT) can reach strong manipulation performance through environment feedback. The LIBERO, ManiSkill3, MetaWorld, and CALVIN benchmarks are supported, and RL jointly optimizes visual understanding, language comprehension, and continuous action generation.
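For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage it is generally built on: each rollout's reward is normalized against the other rollouts in its group. This is a generic illustration, not the RLinf or πRL implementation; the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group (generic GRPO-style sketch).

    Assumption: population standard deviation is used; real implementations
    may use the sample standard deviation or a different epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Rollouts above the group mean get a positive advantage, below it negative.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts with binary success rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed relative to the group, no learned value function is required for this step, which is the usual motivation for choosing GRPO over PPO.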
Results#
π₀ Model#
π₀.₅ Model#
Quickstart#
Full guide: RL on π0 and π0.5 Models
Run: bash examples/embodiment/run_embodiment.sh <CONFIG_NAME> (configs in examples/embodiment/config/)
Model Selection:
π₀: Configs without _pi05 in the name
π₀.₅: Configs with _pi05 in the name (e.g. *_openpi_pi05.yaml)
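The naming rule above can be sketched as a small helper that infers the model from a config name and builds the launch command from the Run line. This is a minimal sketch: `model_family` and `launch_command` are hypothetical helpers, not part of RLinf, and the config names used are placeholders rather than real files. Whether the script expects the name with or without the `.yaml` extension is not stated here, so the name is passed through unchanged.

```python
def model_family(config_name: str) -> str:
    """Return which model a config targets, based on the `_pi05` naming rule."""
    return "pi0.5" if "_pi05" in config_name else "pi0"


def launch_command(config_name: str) -> list[str]:
    """Build the argument list for the documented launch script."""
    # The config name is passed as-is; no extension is added or removed.
    return ["bash", "examples/embodiment/run_embodiment.sh", config_name]


# Placeholder config names for illustration only.
for name in ("example_task_openpi", "example_task_openpi_pi05"):
    print(model_family(name), " ".join(launch_command(name)))
```

The argument list returned by `launch_command` can be handed to `subprocess.run` from the repository root to start a run.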
Benchmarks:
LIBERO: RL with LIBERO Benchmark
ManiSkill3: RL with ManiSkill Benchmark
MetaWorld: RL with MetaWorld Benchmark
CALVIN: RL with CALVIN Benchmark
Real2Sim2Real (GSEnv): RL with Real2Sim2Real GSEnv
Citation#
@article{chen2025pi_rl,
title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Li, Xiang and Zhang, Quanlu and Yu, Zhaofei and others},
journal={arXiv preprint arXiv:2510.25889},
year={2025}
}