πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models#

Paper: arXiv:2510.25889

Overview#

Figure: πRL teaser

πRL provides online reinforcement learning (RL) fine-tuning for the flow-based vision-language-action (VLA) models π₀ and π₀.₅ within the RLinf framework. It combines PPO/GRPO with flow-matching policies, so that models initialized by few-shot SFT can reach strong manipulation performance from environment feedback. It supports the LIBERO, ManiSkill3, MetaWorld, and CALVIN benchmarks, and RL jointly optimizes visual understanding, language comprehension, and continuous action generation.
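As a rough illustration of the two RL ingredients named above, the sketch below shows the textbook forms of a GRPO-style group-relative advantage and the PPO clipped surrogate for a single sample. It is a minimal sketch of the general techniques, not the RLinf implementation: the function names, the `eps` and `clip` defaults, and the binary rewards are assumptions made for illustration, and it does not show how log-probabilities are obtained from a flow-matching policy.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts.

    `rewards` holds one scalar per rollout of the same task (hypothetical data).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO clipped objective for one sample.

    `ratio` is pi_new(a|s) / pi_old(a|s); the clip range is an assumed default.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


# Four hypothetical rollouts of one task: 1.0 = success, 0.0 = failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objectives = [clipped_surrogate(1.5, a) for a in advantages]
```

With a positive advantage, a ratio above `1 + clip` stops increasing the objective, which limits how far one update can move the policy away from the one that collected the data.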

Results#

π₀ Model#

Evaluation results of π₀ model#

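| Environment | Task                  | SFT   | Flow-SDE | Flow-Noise |
|-------------|-----------------------|-------|----------|------------|
| LIBERO      | Spatial, Object, Goal | SFT   | -        | -          |
| LIBERO      | Long                  | SFT   | -        | -          |
| ManiSkill3  | Multi-task            | 38.4% | 78.8%    | 77.8%      |
| MetaWorld   | MT50                  | 50.8% | 78.1%    | 85.8%      |
| CALVIN      | ABC-D                 | 57.5% | 61.7%    | 59.9%      |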

π₀.₅ Model#

Evaluation results of π₀.₅ model#

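| Environment | Task                        | SFT   | Flow-SDE | Flow-Noise |
|-------------|-----------------------------|-------|----------|------------|
| LIBERO      | Spatial, Object, Goal, Long | SFT   | -        | -          |
| ManiSkill3  | Multi-task                  | 40.1% | 90.9%    | 89.7%      |
| MetaWorld   | MT50                        | 43.8% | 70.7%    | 66.1%      |
| CALVIN      | ABC-D                       | 61.3% | 87.0%    | 84.5%      |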
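To make the size of the improvements easier to read, the sketch below computes the gain over SFT, in percentage points, for each row with reported numbers in the π₀ and π₀.₅ results. The values are copied from the results above; the dictionary layout, the `pi0`/`pi0.5` keys, and the `gains` helper are illustrative assumptions, not part of RLinf.

```python
# Reported percentages copied from the results above: (SFT, Flow-SDE, Flow-Noise).
# LIBERO rows are omitted because no numeric values are listed for them.
RESULTS = {
    ("pi0", "ManiSkill3 Multi-task"): (38.4, 78.8, 77.8),
    ("pi0", "MetaWorld MT50"): (50.8, 78.1, 85.8),
    ("pi0", "CALVIN ABC-D"): (57.5, 61.7, 59.9),
    ("pi0.5", "ManiSkill3 Multi-task"): (40.1, 90.9, 89.7),
    ("pi0.5", "MetaWorld MT50"): (43.8, 70.7, 66.1),
    ("pi0.5", "CALVIN ABC-D"): (61.3, 87.0, 84.5),
}


def gains(sft, flow_sde, flow_noise):
    """Return (Flow-SDE gain, Flow-Noise gain) over SFT in percentage points."""
    return round(flow_sde - sft, 1), round(flow_noise - sft, 1)


for (model, benchmark), row in RESULTS.items():
    sde_gain, noise_gain = gains(*row)
    print(f"{model:6s} {benchmark:22s} "
          f"Flow-SDE {sde_gain:+.1f}  Flow-Noise {noise_gain:+.1f}")
```

Rounding to one decimal place keeps the differences at the same precision as the reported values and avoids floating-point noise in the printed output.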

Quickstart#

Full guide: RL on π0 and π0.5 Models

Run: bash examples/embodiment/run_embodiment.sh <CONFIG_NAME> (configs in examples/embodiment/config/)

Model Selection:

  • π₀: Configs without _pi05 in the name

  • π₀.₅: Configs with _pi05 in the name (e.g. *_openpi_pi05.yaml)
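The naming rule above can be sketched as a small helper that infers the model from a config name and assembles the launch command. The helper names and the example config name are hypothetical. Whether `run_embodiment.sh` expects the name with or without the `.yaml` extension is not stated here, so the name is passed through unchanged.

```python
from typing import List

# Script path taken from the Quickstart run command.
RUN_SCRIPT = "examples/embodiment/run_embodiment.sh"


def model_for_config(config_name: str) -> str:
    """Infer the model from the config name: '_pi05' marks a pi0.5 config."""
    return "pi0.5" if "_pi05" in config_name else "pi0"


def launch_command(config_name: str) -> List[str]:
    """Build the argv list for the run command; the name is passed as given."""
    return ["bash", RUN_SCRIPT, config_name]


# Hypothetical config name, used only to illustrate the naming rule.
name = "example_openpi_pi05"
print(model_for_config(name), " ".join(launch_command(name)))
```

The resulting list can be passed to `subprocess.run` from the repository root, which avoids shell quoting issues.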

Benchmarks:

Citation#

@article{chen2025pi_rl,
  title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Li, Xiang and Zhang, Quanlu and Yu, Zhaofei and others},
  journal={arXiv preprint arXiv:2510.25889},
  year={2025}
}