RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training#
Paper: arXiv:2510.06710 | Models: RLinf-OpenVLA | RLinf-OpenVLAOFT
Overview#
RLinf-VLA is a unified and efficient framework for scalable RL training of VLA models. It provides a unified interface that standardizes the integration of diverse VLA architectures, multiple RL algorithms, and heterogeneous simulators. The system uses a flexible resource allocation architecture for rendering, inference, and training; for GPU-parallelized simulators it introduces a hybrid fine-grained pipeline allocation strategy, yielding a 1.61×–1.88× training speedup. Models trained with RLinf-VLA show consistent improvements of approximately 20–85% across LIBERO, ManiSkill, and RoboTwin.
Results#
Training curves (ManiSkill)#
Figure (two panels: OpenVLA, OpenVLA-OFT): Training curves on ManiSkill “PutOnPlateInScene25Mani-v3” with OpenVLA and OpenVLA-OFT, using PPO and GRPO. PPO consistently outperforms GRPO and is more stable.
ManiSkill evaluation#
| Model | In-Dist. | Vision | Semantic | Execution | Avg. |
|---|---|---|---|---|---|
| OpenVLA (Base) | 53.91 | 38.75 | 35.94 | 42.11 | 39.10 |
| | 93.75 | 80.47 | 75.00 | 81.77 | 79.15 |
| | 84.38 | 74.69 | 72.99 | 77.86 | 75.15 |
| | 96.09 | 82.03 | 78.35 | 85.42 | 81.93 |
| OpenVLA-OFT (Base) | 28.13 | 27.73 | 12.95 | 11.72 | 18.29 |
| | 94.14 | 84.69 | 45.54 | 44.66 | 60.64 |
| | 97.66 | 92.11 | 64.84 | 73.57 | 77.05 |
LIBERO (unified model, five task groups)#
| Model | Spatial | Object | Goal | Long | 90 | Avg. |
|---|---|---|---|---|---|---|
| | 72.18 | 71.48 | 64.06 | 48.44 | 70.97 | 65.43 |
| | 99.40 | 99.80 | 98.79 | 93.95 | 98.59 | 98.11 |
| Δ Improvement | +27.22 | +28.32 | +34.73 | +45.51 | +27.62 | +32.68 |
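The Δ Improvement row appears to be the element-wise difference between the two data rows above it. The following is a minimal Python check of that reading, using only the numbers in the table. The names `before` and `after` are illustrative, since the row labels are not shown here.

```python
# Success rates (%) copied from the LIBERO table above.
groups = ["Spatial", "Object", "Goal", "Long", "90", "Avg."]
before = [72.18, 71.48, 64.06, 48.44, 70.97, 65.43]  # first data row
after = [99.40, 99.80, 98.79, 93.95, 98.59, 98.11]   # second data row
reported_delta = [27.22, 28.32, 34.73, 45.51, 27.62, 32.68]

# Element-wise difference in percentage points, rounded to two decimals.
computed_delta = [round(a - b, 2) for a, b in zip(after, before)]

for name, delta in zip(groups, computed_delta):
    print(f"{name}: +{delta:.2f}")

# Assumption being checked: delta = second row minus first row.
matches = all(abs(c - r) < 0.005 for c, r in zip(computed_delta, reported_delta))
print("matches reported row:", matches)
```

Under this reading, the largest gain is on the Long task group, where the first row's success rate is lowest.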
RoboTwin (seven tasks)#
| Task | OpenVLA-OFT (SFT) | OpenVLA-OFT (RLinf-GRPO) |
|---|---|---|
| beat_block_hammer | | |
| pick_dual_bottles | | |
| place_empty_cup | | |
| place_container_plate | | |
| move_can_pot | | |
| lift_pot | | |
| handover_block | | |
| Average | 28.79% | 86.16% |
| Δ Avg. | — | +57.37% |
“Base” and “SFT” refer to supervised fine-tuned models before RL training.
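The Δ Avg. figure for RoboTwin is consistent with an absolute difference between the two averages (percentage points), not a relative increase. The short Python sketch below, using only the two reported averages, shows both quantities side by side. The variable names are illustrative.

```python
# Average success rates (%) from the RoboTwin table.
sft_avg = 28.79    # OpenVLA-OFT (SFT)
grpo_avg = 86.16   # OpenVLA-OFT (RLinf-GRPO)

# Absolute gain in percentage points (what the Δ Avg. row matches).
absolute_gain = round(grpo_avg - sft_avg, 2)

# Relative ratio: RL-trained average as a multiple of the SFT average.
relative_ratio = grpo_avg / sft_avg

print(f"absolute gain: +{absolute_gain:.2f} percentage points")
print(f"relative ratio: {relative_ratio:.2f}x")
```

Read this way, the RL-trained average is close to three times the SFT average, while the table reports the gain as a difference of success rates.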
Quickstart#
ManiSkill: RL with ManiSkill Benchmark
LIBERO: RL with LIBERO Benchmark
RoboTwin: RL with RoboTwin Benchmark
More examples: Embodied Scenarios
Citation#
@article{zang2025rlinf,
title={RLinf-VLA: A unified and efficient framework for VLA+RL training},
author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
journal={arXiv preprint arXiv:2510.06710},
year={2025}
}