RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training#

Paper: arXiv:2510.06710 | Models: RLinf-OpenVLA | RLinf-OpenVLAOFT

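Overview#

[Figure: RLinf-VLA overview]

RLinf-VLA is a unified and efficient framework for scalable RL training of VLA models. Through a single interface it integrates multiple VLA architectures, multiple RL algorithms, and heterogeneous simulators. It uses a flexible resource-allocation architecture and, for GPU-parallelized simulators, introduces a hybrid fine-grained pipelining strategy that yields a training speedup of roughly 1.61×–1.88×. On benchmarks such as LIBERO, ManiSkill, and RoboTwin, models trained with RLinf-VLA improve by roughly 20–85%.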

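Results#

Training curves (ManiSkill)#

[Figure: ManiSkill training curves; panels: OpenVLA, OpenVLA-OFT]

Training curves on ManiSkill “PutOnPlateInScene25Mani-v3” for OpenVLA and OpenVLA-OFT, trained with PPO and GRPO. PPO consistently outperforms GRPO and is more stable.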

ManiSkill evaluation#

ManiSkill evaluation results (success rate, %)#

| Model | In-distribution | Vision | Semantic | Execution | Average |
|---|---|---|---|---|---|
| OpenVLA (Base) | 53.91 | 38.75 | 35.94 | 42.11 | 39.10 |
| RL4VLA (PPO) | 93.75 | 80.47 | 75.00 | 81.77 | 79.15 |
| OpenVLA (RLinf-GRPO) | 84.38 | 74.69 | 72.99 | 77.86 | 75.15 |
| OpenVLA (RLinf-PPO) | 96.09 | 82.03 | 78.35 | 85.42 | 81.93 |
| OpenVLA-OFT (Base) | 28.13 | 27.73 | 12.95 | 11.72 | 18.29 |
| OpenVLA-OFT (RLinf-GRPO) | 94.14 | 84.69 | 45.54 | 44.66 | 60.64 |
| OpenVLA-OFT (RLinf-PPO) | 97.66 | 92.11 | 64.84 | 73.57 | 77.05 |

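LIBERO (unified model, five task groups)#

Evaluation results of the unified model on the five LIBERO task groups (%)#

| Model | Spatial | Object | Goal | Long | 90 | Average |
|---|---|---|---|---|---|---|
| OpenVLA-OFT (Base) | 72.18 | 71.48 | 64.06 | 48.44 | 70.97 | 65.43 |
| OpenVLA-OFT (RLinf-GRPO) | 99.40 | 99.80 | 98.79 | 93.95 | 98.59 | 98.11 |
| Δ improvement | +27.22 | +28.32 | +34.73 | +45.51 | +27.62 | +32.68 |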
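As a quick sanity check on the table above, the short Python sketch below recomputes the "Average" column and the Δ row from the per-suite numbers. It assumes the average is an unweighted mean over the five task groups, which is consistent with the reported values; the variable names are illustrative only and are not part of RLinf-VLA.

```python
from statistics import mean

# Success rates (%) copied from the LIBERO table.
suites = ["Spatial", "Object", "Goal", "Long", "90"]
base = [72.18, 71.48, 64.06, 48.44, 70.97]   # OpenVLA-OFT (Base)
grpo = [99.40, 99.80, 98.79, 93.95, 98.59]   # OpenVLA-OFT (RLinf-GRPO)

# Assumption: "Average" is the unweighted mean over the five task groups.
base_avg = round(mean(base), 2)
grpo_avg = round(mean(grpo), 2)

# Per-suite improvement in percentage points, plus the improvement of the averages.
deltas = {s: round(g - b, 2) for s, g, b in zip(suites, grpo, base)}
avg_delta = round(grpo_avg - base_avg, 2)

print(f"Base average: {base_avg}")
print(f"RLinf-GRPO average: {grpo_avg}")
print(f"Per-suite delta: {deltas}")
print(f"Average delta: {avg_delta}")
```

The Δ values are differences in percentage points (absolute success rate), not relative improvements.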

RoboTwin (seven tasks)#

Evaluation results of OpenVLA-OFT on seven RoboTwin tasks (%)#

| Task | OpenVLA-OFT (SFT) | OpenVLA-OFT (RLinf-GRPO) |
|---|---|---|
| beat_block_hammer | 10.15% | 96.09% |
| pick_dual_bottles | 20.31% | 92.96% |
| place_empty_cup | 75.78% | 94.53% |
| place_container_plate | 54.69% | 95.31% |
| move_can_pot | 9.37% | 83.59% |
| lift_pot | 3.13% | 70.31% |
| handover_block | 28.13% | 70.31% |
| Average | 28.79% | 86.16% |
| Δ Avg. | --- | +57.37% |

“Base” and “SFT” both refer to the supervised fine-tuned model before RL training.
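The RoboTwin table reports only the average gain. The per-task gains can be derived from the same numbers, as in the minimal sketch below. It assumes a plain unweighted mean over the seven tasks, and the dictionary layout and variable names are hypothetical, chosen only for illustration.

```python
# Success rates (%) copied from the RoboTwin table: task -> (SFT, RLinf-GRPO).
results = {
    "beat_block_hammer": (10.15, 96.09),
    "pick_dual_bottles": (20.31, 92.96),
    "place_empty_cup": (75.78, 94.53),
    "place_container_plate": (54.69, 95.31),
    "move_can_pot": (9.37, 83.59),
    "lift_pot": (3.13, 70.31),
    "handover_block": (28.13, 70.31),
}

# Per-task gain in percentage points.
gains = {task: round(rl - sft, 2) for task, (sft, rl) in results.items()}

# Assumption: the "Average" row is the unweighted mean over the seven tasks.
sft_avg = round(sum(sft for sft, _ in results.values()) / len(results), 2)
rl_avg = round(sum(rl for _, rl in results.values()) / len(results), 2)
avg_gain = round(rl_avg - sft_avg, 2)

# Tasks with the smallest and largest gain.
smallest = min(gains, key=gains.get)
largest = max(gains, key=gains.get)

for task, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"{task:<24} +{gain:.2f}")
print(f"Average: {sft_avg} -> {rl_avg} (+{avg_gain})")
print(f"Smallest gain: {smallest}, largest gain: {largest}")
```

Sorting by gain shows which tasks had the most headroom after supervised fine-tuning: tasks where the SFT model was already strong, such as place_empty_cup, gain the least in absolute terms.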

Quick start#

Citation#

@article{zang2025rlinf,
  title={RLinf-VLA: A unified and efficient framework for VLA+RL training},
  author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
  journal={arXiv preprint arXiv:2510.06710},
  year={2025}
}