RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
Paper: arXiv:2510.06710 | Models: RLinf-OpenVLA | RLinf-OpenVLAOFT
Overview
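RLinf-VLA is a unified and efficient framework for scalable RL training of VLA models. It brings multiple VLA architectures, multiple RL algorithms, and heterogeneous simulators together behind a single interface. It uses a flexible resource-allocation architecture and, for GPU-parallelized simulators, adds a hybrid fine-grained pipelining strategy that speeds up training by roughly 1.61×–1.88×. On benchmarks including LIBERO, ManiSkill, and RoboTwin, models trained with RLinf-VLA improve by roughly 20–85%.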
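To make the idea of pipelining rollouts more concrete, here is a toy timing model. It is not RLinf's scheduler: it assumes simulation and policy inference run on two separate devices that can work concurrently, and it splits the environments into `num_chunks` equal chunks (a hypothetical parameter) so that one chunk's inference can overlap with another chunk's simulation. `rollout_makespan` is likewise an illustrative name.

```python
def rollout_makespan(sim_time: float, gen_time: float, num_steps: int, num_chunks: int) -> float:
    """Toy model (assumption, not RLinf code): wall-clock time of one rollout.

    sim_time / gen_time: time to simulate / run policy inference for ALL
    environments for one step. Environments are split into num_chunks equal
    chunks; each chunk alternates simulate -> infer -> simulate -> ...,
    and each of the two devices processes one chunk at a time.
    """
    sim_chunk = sim_time / num_chunks
    gen_chunk = gen_time / num_chunks
    sim_free = 0.0  # time at which the simulator device becomes idle
    gen_free = 0.0  # time at which the inference device becomes idle
    chunk_ready = [0.0] * num_chunks  # when each chunk's next action is available

    for _ in range(num_steps):
        for c in range(num_chunks):
            # Step chunk c once its previous action exists and the simulator is idle.
            sim_end = max(sim_free, chunk_ready[c]) + sim_chunk
            sim_free = sim_end
            # Infer actions for chunk c once its observation exists and the policy device is idle.
            gen_end = max(gen_free, sim_end) + gen_chunk
            gen_free = gen_end
            chunk_ready[c] = gen_end
    return gen_free  # the last inference finishes last


sequential = rollout_makespan(1.0, 1.0, num_steps=10, num_chunks=1)  # 20.0
pipelined = rollout_makespan(1.0, 1.0, num_steps=10, num_chunks=2)   # 10.5
print(f"speedup: {sequential / pipelined:.2f}x")                     # speedup: 1.90x
```

With `num_chunks=1` the two stages strictly alternate; with two chunks each device works on one chunk while the other device handles the other. In this toy model the speedup can never exceed 2×, because each device still has to do its full per-step work.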
Results
Training Curves (ManiSkill)
Panels: OpenVLA | OpenVLA-OFT
Training curves on the ManiSkill task "PutOnPlateInScene25Mani-v3" for OpenVLA and OpenVLA-OFT trained with PPO and with GRPO. PPO consistently outperforms GRPO and trains more stably.
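PPO and GRPO differ mainly in how they estimate advantages. The sketch below shows the standard textbook forms, not RLinf's implementation. GRPO normalizes each trajectory's return against the other trajectories sampled for the same task and needs no critic. PPO typically uses a learned value function with generalized advantage estimation (GAE). The function names and the `gamma`, `lam`, and `eps` defaults are illustrative assumptions.

```python
from statistics import mean, pstdev


def grpo_advantages(group_returns, eps=1e-6):
    """Group-relative advantage: normalize each return by its group's mean and std."""
    mu = mean(group_returns)
    sigma = pstdev(group_returns)
    return [(r - mu) / (sigma + eps) for r in group_returns]


def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory, using critic values."""
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        # One-step TD error, bootstrapping from the critic unless the episode ended.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


# Two successes and two failures for the same task: roughly [1, -1, -1, 1].
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# Sparse terminal reward with a zero critic, gamma = lam = 1: [1.0, 1.0, 1.0].
print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], 0.0, [False, False, True], gamma=1.0, lam=1.0))
```

The formula makes one property easy to see: if every trajectory in a group receives the same reward, all GRPO advantages are zero, so that group contributes no learning signal. PPO's critic-based estimate does not have this property.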
ManiSkill Evaluation
| Model | In-distribution | Vision | Semantic | Execution | Average |
|---|---|---|---|---|---|
| OpenVLA (Base) | 53.91 | 38.75 | 35.94 | 42.11 | 39.10 |
| | 93.75 | 80.47 | 75.00 | 81.77 | 79.15 |
| | 84.38 | 74.69 | 72.99 | 77.86 | 75.15 |
| | 96.09 | 82.03 | 78.35 | 85.42 | 81.93 |
| OpenVLA-OFT (Base) | 28.13 | 27.73 | 12.95 | 11.72 | 18.29 |
| | 94.14 | 84.69 | 45.54 | 44.66 | 60.64 |
| | 97.66 | 92.11 | 64.84 | 73.57 | 77.05 |
LIBERO (one unified model, five task suites)
| Model | Spatial | Object | Goal | Long | 90 | Average |
|---|---|---|---|---|---|---|
| | 72.18 | 71.48 | 64.06 | 48.44 | 70.97 | 65.43 |
| | 99.40 | 99.80 | 98.79 | 93.95 | 98.59 | 98.11 |
| Δ (improvement) | +27.22 | +28.32 | +34.73 | +45.51 | +27.62 | +32.68 |
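The LIBERO "Average" column is the plain (unweighted) mean of the five suites, and the Δ row is the column-wise difference between the two rows. The snippet below re-derives both from the table values. The variable and function names are only for illustration.

```python
SUITES = ["Spatial", "Object", "Goal", "Long", "90"]
before = [72.18, 71.48, 64.06, 48.44, 70.97]  # first row of the LIBERO table
after = [99.40, 99.80, 98.79, 93.95, 98.59]   # second row of the LIBERO table


def suite_average(scores):
    """Unweighted mean over the five suites, rounded like the table."""
    return round(sum(scores) / len(scores), 2)


# Per-suite improvement, rounded to two decimals like the table.
deltas = [round(a - b, 2) for a, b in zip(after, before)]
avg_before = suite_average(before)  # 65.43
avg_after = suite_average(after)    # 98.11

for name, d in zip(SUITES, deltas):
    print(f"{name:8s} +{d:.2f}")
print(f"Average  {avg_before:.2f} -> {avg_after:.2f} (+{avg_after - avg_before:.2f})")
```

The printed values match the Δ row of the table (+27.22, +28.32, +34.73, +45.51, +27.62, and +32.68 for the average).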
RoboTwin (seven tasks)
| Task | OpenVLA-OFT (SFT) | OpenVLA-OFT (RLinf-GRPO) |
|---|---|---|
| beat_block_hammer | | |
| pick_dual_bottles | | |
| place_empty_cup | | |
| place_container_plate | | |
| move_can_pot | | |
| lift_pot | | |
| handover_block | | |
| Average | 28.79% | 86.16% |
| Δ Avg. | --- | +57.37% |
"Base" and "SFT" denote the supervised fine-tuned models before RL training.
Quick Start
ManiSkill: RL training on the ManiSkill benchmark
LIBERO: RL training on the LIBERO benchmark
RoboTwin: RL training on the RoboTwin benchmark
More examples: embodied-AI scenarios
Citation
@article{zang2025rlinf,
title={RLinf-VLA: A unified and efficient framework for VLA+RL training},
author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
journal={arXiv preprint arXiv:2510.06710},
year={2025}
}