RL with RoboTwin Benchmark#

Figure: RoboTwin 2.0 dual-arm manipulation tasks (image: RoboTwin, https://robotwin-platform.github.io/assets/images/teaser.png).

RoboTwin 2.0 is a dual-arm manipulation benchmark with a large task suite. You’ll use RLinf to fine-tune vision-language-action (VLA) policies with reinforcement learning on RoboTwin tasks such as place_empty_cup and adjust_bottle.

Overview#

Fine-tune a VLA on RoboTwin 2.0 with RL. In the results below, OpenVLA-OFT + GRPO raises average success across seven tasks from 28.79% to 86.16% (+57.37 percentage points).

Models: OpenVLA-OFT · π₀ / π₀.₅ · Lingbot-VLA

Algorithms: PPO · GRPO · DAgger

Tasks: 46 supported · 10 configured

Hardware: 1–2 nodes · 8–16 GPUs

You’ll do: install → clone RoboTwin + assets → download an SFT model → launch run_embodiment.sh → watch env/success_once.

Prerequisites: Installation · RoboTwin repo and assets · an SFT checkpoint.

Tasks#

RLinf supports 46 RoboTwin manipulation tasks and ships ready-to-run env configs for the 10 tasks listed here:

Task	Description
`adjust_bottle`	Pick up the bottle on the table head-up with the correct arm.
`place_empty_cup`	Place the empty cup on the coaster.
`place_container_plate`	Place the container onto the plate.
`pick_dual_bottles`	Pick up one bottle with each arm.
`move_can_pot`	Move a can to beside the pot.
`lift_pot`	Lift the pot with both arms.
`handover_block`	Hand over a red block from the left arm to the right arm, then place it on the blue pad.
`beat_block_hammer`	Grab the hammer and hit the block.
`click_bell`	Click the bell’s top center.
`place_shoe`	Pick up a shoe and place it on the mat.

Note

Four RoboTwin tasks are not yet supported in RLinf: place_fan, open_laptop, place_object_scale, and put_object_cabinet. Dense reward functions are still being expanded across tasks.

Observation and Action#

Field	Specification
`images`	Head-camera RGB, `[B, 224, 224, 3]` uint8, center-cropped when enabled.
`wrist_images`	Optional left/right wrist-camera RGB, `[B, n, 224, 224, 3]` uint8, or `None`.
`states`	Proprioception, `[B, 14]` float32.
`task_descriptions`	Natural-language task descriptions.
`actions`	VLA-dependent continuous action chunks for ALOHA-style dual-arm control.
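
As a quick sanity check on the layout above, the sketch below validates an observation dictionary against the table. It is a minimal illustration, not RLinf code: `FieldSpec`, `OBS_SPEC`, and `check_observation` are hypothetical names. It only assumes that each array-like value exposes `.shape` and `.dtype` (as NumPy arrays and PyTorch tensors do) and that `task_descriptions` holds one string per batch element.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class FieldSpec:
    """Expected layout of one observation field (hypothetical helper)."""
    trailing_shape: tuple  # dims after the batch dim; None means "any size"
    dtype: str
    optional: bool = False


# Shapes and dtypes copied from the table above.
OBS_SPEC = {
    "images": FieldSpec((224, 224, 3), "uint8"),
    "wrist_images": FieldSpec((None, 224, 224, 3), "uint8", optional=True),
    "states": FieldSpec((14,), "float32"),
}


def check_observation(obs: dict, batch_size: int) -> list:
    """Return a list of readable mismatches (empty if obs matches the spec)."""
    errors = []
    for name, spec in OBS_SPEC.items():
        value = obs.get(name)
        if value is None:
            if not spec.optional:
                errors.append(f"{name}: missing")
            continue
        expected = (batch_size, *spec.trailing_shape)
        shape = tuple(value.shape)
        same_rank = len(shape) == len(expected)
        if not same_rank or any(
            e is not None and e != s for e, s in zip(expected, shape)
        ):
            errors.append(f"{name}: shape {shape}, expected {expected}")
        # str(torch.uint8) is "torch.uint8", so compare the last component only.
        dtype = str(value.dtype).split(".")[-1]
        if dtype != spec.dtype:
            errors.append(f"{name}: dtype {dtype}, expected {spec.dtype}")
    # Assumption: one description string per batch element.
    descriptions = obs.get("task_descriptions") or []
    if len(descriptions) != batch_size:
        errors.append(
            f"task_descriptions: {len(descriptions)} entries, expected {batch_size}"
        )
    return errors


# Stand-in arrays: only .shape and .dtype matter for this check.
obs = {
    "images": SimpleNamespace(shape=(2, 224, 224, 3), dtype="uint8"),
    "wrist_images": None,  # wrist cameras are optional
    "states": SimpleNamespace(shape=(2, 14), dtype="float32"),
    "task_descriptions": ["place the empty cup on the coaster"] * 2,
}
print(check_observation(obs, batch_size=2))
```

An empty list means the observation matches the table. Returning every mismatch at once, rather than raising on the first one, tends to make a misconfigured camera setup easier to diagnose.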

Installation#

First, clone the RLinf repository:

# Mainland China users can use a mirror for faster cloning:
# git clone https://ghfast.top/github.com/RLinf/RLinf.git
git clone https://github.com/RLinf/RLinf.git
cd RLinf

Then set up the dependencies with one of the two methods below — a prebuilt Docker image (recommended) or a custom environment. The general setup (prerequisites, GPU drivers, the in-image switch_env helper, mirrors, and troubleshooting) is documented once in Installation; the commands in this recipe only differ in the Docker image tag and the --env value.

Docker image

docker run -it --rm --gpus all \
   --shm-size 32g \
   --network host \
   --name rlinf \
   -v "$(pwd)":/workspace/RLinf \
   rlinf/rlinf:agentic-rlinf0.3-robotwin

# For mainland China users:
# docker.1ms.run/rlinf/rlinf:agentic-rlinf0.3-robotwin

Switch to the matching virtual environment inside the image:

# OpenVLA-OFT
source switch_env openvla-oft

# OpenPI π₀ / π₀.₅
# source switch_env openpi

# Lingbot-VLA, if available in your image
# source switch_env lingbotvla

Custom environment

Install dependencies for the model you want to run:

# Mainland China users can add --use-mirror.

# OpenVLA-OFT
bash requirements/install.sh embodied --model openvla-oft --env robotwin

# OpenPI π₀ / π₀.₅
# bash requirements/install.sh embodied --model openpi --env robotwin

# Lingbot-VLA
# bash requirements/install.sh embodied --model lingbotvla --env robotwin

source .venv/bin/activate

Clone RoboTwin and download its assets:

git clone https://github.com/RoboTwin-Platform/RoboTwin.git -b RLinf_support
cd RoboTwin
bash script/_download_assets.sh

export PYTHONPATH=/path/to/RoboTwin:$PYTHONPATH
export ROBOT_PLATFORM=ALOHA

By default, this script downloads assets under /path/to/RoboTwin/assets/. After the download completes, set env.train.assets_path and env.eval.assets_path to /path/to/RoboTwin (the parent folder of assets/).
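
A common slip here is pointing `assets_path` at the `assets/` folder itself rather than at its parent. The sketch below guards against that before a run. It is a hedged illustration: `assets_parent` is a hypothetical helper, not part of RLinf or RoboTwin, and the demo uses a temporary directory in place of `/path/to/RoboTwin`.

```python
import tempfile
from pathlib import Path


def assets_parent(robotwin_root: str) -> str:
    """Return the value for env.*.assets_path: the folder that CONTAINS assets/."""
    root = Path(robotwin_root).expanduser().resolve()
    if root.name == "assets":
        raise ValueError(f"pass the parent of assets/, not {root}")
    if not (root / "assets").is_dir():
        raise FileNotFoundError(f"no assets/ folder under {root}; was the download run?")
    return str(root)


# Demo on a throwaway directory standing in for /path/to/RoboTwin.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "assets").mkdir()
    path = assets_parent(tmp)
    for split in ("train", "eval"):
        print(f"env.{split}.assets_path: {path}")
```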

Download the Model#

Download the SFT checkpoint that matches your config. Examples:

OpenVLA-OFT

cd /path/to/save/model

git lfs install
git clone https://huggingface.co/RLinf/RLinf-OpenVLAOFT-RoboTwin-SFT-place_empty_cup

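# Or download with the Hugging Face CLI instead of git (skip if you already cloned above).
# Mainland China users can set a mirror first:
# export HF_ENDPOINT=https://hf-mirror.com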
pip install huggingface-hub
hf download RLinf/RLinf-OpenVLAOFT-RoboTwin-SFT-place_empty_cup --local-dir RLinf-OpenVLAOFT-RoboTwin-SFT-place_empty_cup

OpenPI π₀ / π₀.₅

cd /path/to/save/model

git lfs install
git clone https://huggingface.co/RLinf/RLinf-Pi0-RoboTwin-SFT-adjust_bottle
git clone https://huggingface.co/RLinf/RLinf-Pi05-RoboTwin-SFT-adjust_bottle

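# Or download with the Hugging Face CLI instead of git (skip if you already cloned above).
# Mainland China users can set a mirror first:
# export HF_ENDPOINT=https://hf-mirror.com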
pip install huggingface-hub
hf download RLinf/RLinf-Pi0-RoboTwin-SFT-adjust_bottle --local-dir RLinf-Pi0-RoboTwin-SFT-adjust_bottle
hf download RLinf/RLinf-Pi05-RoboTwin-SFT-adjust_bottle --local-dir RLinf-Pi05-RoboTwin-SFT-adjust_bottle

After downloading, point your config YAML at the checkpoint — set the same path for both the rollout and the actor model:

rollout:
   model:
      model_path: /path/to/downloaded-checkpoint
actor:
   model:
      model_path: /path/to/downloaded-checkpoint

For Lingbot-VLA recipes, point actor.model.model_path and rollout.model.model_path at your Lingbot-VLA SFT checkpoint.

Note

The action-normalization key unnorm_key in the config (e.g. unnorm_key: "place_empty_cup") must match the unnorm_key used when the SFT checkpoint was trained; otherwise, actions will be denormalized incorrectly.
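
Both requirements above (one checkpoint path for rollout and actor, and a matching `unnorm_key`) can be checked before launching. The sketch below is a minimal illustration under stated assumptions: `same_checkpoint` and `has_unnorm_key` are hypothetical helpers, the config is a plain dict mirroring the YAML keys shown earlier, and the statistics file is assumed to be a JSON object keyed by `unnorm_key`. The file name `dataset_statistics.json` follows the OpenVLA convention and may not match your checkpoint.

```python
import json
import tempfile
from pathlib import Path


def same_checkpoint(cfg: dict) -> str:
    """Check that rollout and actor use the same checkpoint; return its path."""
    rollout = cfg["rollout"]["model"]["model_path"]
    actor = cfg["actor"]["model"]["model_path"]
    if rollout != actor:
        raise ValueError(f"rollout uses {rollout!r} but actor uses {actor!r}")
    return actor


def has_unnorm_key(stats_file: Path, unnorm_key: str) -> bool:
    """Assumption: the statistics file is a JSON object keyed by unnorm_key."""
    stats = json.loads(stats_file.read_text())
    return unnorm_key in stats


cfg = {
    "rollout": {"model": {"model_path": "/path/to/downloaded-checkpoint"}},
    "actor": {"model": {"model_path": "/path/to/downloaded-checkpoint"}},
}
print(same_checkpoint(cfg))

# Demo with a throwaway statistics file; a real one would ship with the checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    stats_file = Path(tmp) / "dataset_statistics.json"  # assumed file name
    stats_file.write_text(json.dumps({"place_empty_cup": {"action": {}}}))
    print(has_unnorm_key(stats_file, "place_empty_cup"))
    print(has_unnorm_key(stats_file, "adjust_bottle"))
```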

Run It#

Pick one recipe and launch training:

Recipe	Config	Command suffix
OpenVLA-OFT + GRPO	`examples/embodiment/config/robotwin_place_empty_cup_grpo_openvlaoft.yaml`	`robotwin_place_empty_cup_grpo_openvlaoft`
OpenVLA-OFT + PPO	`examples/embodiment/config/robotwin_place_empty_cup_ppo_openvlaoft.yaml`	`robotwin_place_empty_cup_ppo_openvlaoft`
π₀ + PPO	`examples/embodiment/config/robotwin_adjust_bottle_ppo_openpi.yaml`	`robotwin_adjust_bottle_ppo_openpi`
π₀.₅ + PPO	`examples/embodiment/config/robotwin_adjust_bottle_ppo_openpi_pi05.yaml`	`robotwin_adjust_bottle_ppo_openpi_pi05`
OpenPI + DAgger	`examples/embodiment/config/robotwin_adjust_bottle_dagger_openpi.yaml`	`robotwin_adjust_bottle_dagger_openpi`
Lingbot-VLA + GRPO	`examples/embodiment/config/robotwin_click_bell_grpo_lingbotvla.yaml`	`robotwin_click_bell_grpo_lingbotvla`

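# OpenVLA-OFT + GRPO on place_empty_cup
bash examples/embodiment/run_embodiment.sh robotwin_place_empty_cup_grpo_openvlaoft

# Alternative: π₀.₅ + PPO on adjust_bottle (run one recipe at a time)
bash examples/embodiment/run_embodiment.sh robotwin_adjust_bottle_ppo_openpi_pi05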

What this does:

Starts the embodied training entrypoint with the selected RoboTwin Hydra config.
Creates Ray workers for actor, rollout, and RoboTwin env components.
Runs rollouts, computes task rewards, and updates the selected VLA policy.
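
To make the reward-to-update step concrete for the GRPO recipes, the sketch below computes group-relative advantages in the usual GRPO form: each rollout's reward is normalized by the mean and standard deviation of its group. It is an illustration under assumptions, not RLinf's implementation. The binary success reward, the group of four rollouts, and the `eps` value are hypothetical choices.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by its group's mean and (population) std."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of the same task instance: 1.0 = success, 0.0 = failure.
mixed_group = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(mixed_group))  # roughly [+1, -1, -1, +1]

# If all rollouts succeed (or all fail), every advantage is zero,
# so that group contributes no policy-gradient signal.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

Because the baseline is the group mean, no learned value function is needed. PPO recipes, by contrast, typically estimate advantages with a critic.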

For standalone evaluation, follow the RoboTwin evaluation guide. It covers ROBOTWIN_PATH / assets_path setup, the available eval configs (such as robotwin_place_empty_cup_openvlaoft_eval and robotwin_adjust_bottle_openpi_pi05_eval), and how to interpret the results.

Note

The provided configs use train/eval seed files under rlinf/envs/robotwin/seeds/.

Visualization and Results#

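Launch TensorBoard from the RLinf repo root. Point --logdir at the directory your runs log to (see runner.logger.log_path in the config); the command below uses ../results: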

tensorboard --logdir ../results --port 6006

The key signal is env/success_once. For every logged metric, see Training metrics.
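
The metric name suggests that an episode counts as successful if the task's success condition held at least once during the episode. The sketch below computes a rate under that reading. This is an assumption about the metric's semantics rather than RLinf's code, and the `success_once` function and the per-step flags are made up for illustration.

```python
def success_once(episodes):
    """Fraction of episodes with at least one successful step."""
    if not episodes:
        return 0.0
    return sum(any(steps) for steps in episodes) / len(episodes)


# Per-step success flags for four hypothetical episodes.
episodes = [
    [False, False, True, False],   # succeeded mid-episode, then lost it
    [False, False, False, False],  # never succeeded
    [False, True, True, True],
    [False, False, False, True],
]
print(f"success_once = {success_once(episodes):.2%}")  # prints: success_once = 75.00%
```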

Videos are saved through the env video config:

video_cfg:
  save_video: True
  video_base_dir: ${runner.logger.log_path}/video/eval

OpenVLA-OFT evaluation results on seven RoboTwin tasks#
Task	SFT	RLinf-GRPO	RLinf-PPO
`beat_block_hammer`	10.15%	96.09%	—
`pick_dual_bottles`	20.31%	92.96%	—
`place_empty_cup`	75.78%	94.53%	92.97%
`place_container_plate`	54.69%	95.31%	—
`move_can_pot`	9.37%	83.59%	—
`lift_pot`	3.13%	70.31%	—
`handover_block`	28.13%	70.31%	—
Average	28.79%	86.16%	—
Δ Avg.	—	+57.37%	—
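
The Average and Δ Avg. rows can be reproduced from the per-task numbers. The short check below copies the SFT and RLinf-GRPO columns from the table above; the variable and function names are illustrative only.

```python
# Per-task success rates (%) copied from the table above.
sft = {
    "beat_block_hammer": 10.15, "pick_dual_bottles": 20.31,
    "place_empty_cup": 75.78, "place_container_plate": 54.69,
    "move_can_pot": 9.37, "lift_pot": 3.13, "handover_block": 28.13,
}
grpo = {
    "beat_block_hammer": 96.09, "pick_dual_bottles": 92.96,
    "place_empty_cup": 94.53, "place_container_plate": 95.31,
    "move_can_pot": 83.59, "lift_pot": 70.31, "handover_block": 70.31,
}


def mean_pct(scores):
    """Unweighted mean over tasks, rounded to two decimals like the table."""
    return round(sum(scores.values()) / len(scores), 2)


avg_sft = mean_pct(sft)
avg_grpo = mean_pct(grpo)
# The table's delta is the difference of the two rounded averages,
# expressed in percentage points.
delta = round(avg_grpo - avg_sft, 2)
print(avg_sft, avg_grpo, delta)
```

The averages come out to 28.79 and 86.16, and their difference to 57.37 percentage points, matching the table. Differencing the unrounded means instead gives 57.36, so the last digit depends on when rounding is applied.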

OpenPI evaluation results on RoboTwin `adjust_bottle`#
Method	SFT	RLinf-PPO
π₀	76.56%	98.44%
π₀.₅	85.94%	96.09%

Note

OpenVLA-OFT results use the demo_randomized setting. OpenPI results use demo_clean. For task-level simulator options, see the RoboTwin configuration documentation.