Quick Start

This quick start walks you through training Vision-Language-Action (VLA) models, including OpenVLA and OpenVLA-OFT, in the ManiSkill3 environment with RLinf.

Environment Introduction

ManiSkill3 is a GPU-accelerated simulation platform for robotics research, focusing on complex contact manipulation and embodied intelligence tasks. The benchmark covers multiple domains, including robotic arms, mobile manipulators, humanoid robots, and dexterous hands, supporting various tasks such as grasping, assembling, drawing, and locomotion.

We have also implemented system-level optimizations for the GPU simulator (see Flexible Execution Modes).

Launch Training

Step 1: Download pre-trained models

If using the OpenVLA model, run the following command:

# Download OpenVLA pre-trained model
hf download gen-robot/openvla-7b-rlvla-warmup \
--local-dir /path/to/model/openvla-7b-rlvla-warmup/

This model is cited in the paper: paper

If using the OpenVLA-OFT model, run the following command:

# Download OpenVLA-OFT pre-trained model
hf download RLinf/Openvla-oft-SFT-libero10-trajall \
--local-dir /path/to/model/Openvla-oft-SFT-libero10-trajall/

# Download the LoRA checkpoint fine-tuned on ManiSkill
hf download RLinf/RLinf-OpenVLAOFT-ManiSkill-Base-Lora \
--local-dir /path/to/model/oft-sft/lora_004000

Step 2: Download ManiSkill assets

This step is required for both OpenVLA and OpenVLA-OFT on ManiSkill3.

cd <path_to_RLinf>/rlinf/envs/maniskill
# For mainland China users, you can use the following for better download speed:
# export HF_ENDPOINT=https://hf-mirror.com
hf download --repo-type dataset RLinf/maniskill_assets --local-dir ./assets
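
The downloads in Steps 1 and 2 can also be scripted from Python. The sketch below assumes the huggingface_hub package (which provides the hf command) is installed and calls its snapshot_download function. The DOWNLOADS list and the download_all helper are illustrative names, not part of RLinf, and the local_dir values are the same placeholders used in the commands above. Note that ./assets is resolved relative to the current working directory, so run the script from <path_to_RLinf>/rlinf/envs/maniskill.

```python
# Each entry mirrors one `hf download` command from Steps 1 and 2.
# The local_dir values are placeholders; replace them with real paths.
DOWNLOADS = [
    {
        "repo_id": "gen-robot/openvla-7b-rlvla-warmup",
        "repo_type": "model",
        "local_dir": "/path/to/model/openvla-7b-rlvla-warmup/",
    },
    {
        "repo_id": "RLinf/Openvla-oft-SFT-libero10-trajall",
        "repo_type": "model",
        "local_dir": "/path/to/model/Openvla-oft-SFT-libero10-trajall/",
    },
    {
        "repo_id": "RLinf/RLinf-OpenVLAOFT-ManiSkill-Base-Lora",
        "repo_type": "model",
        "local_dir": "/path/to/model/oft-sft/lora_004000",
    },
    {
        "repo_id": "RLinf/maniskill_assets",
        "repo_type": "dataset",
        "local_dir": "./assets",
    },
]


def download_all(downloads):
    """Download every repository in `downloads` and return the local paths."""
    # Imported lazily so the list above can be inspected without the package.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**spec) for spec in downloads]
```

For example, download_all(DOWNLOADS[:1]) fetches only the OpenVLA model, while download_all(DOWNLOADS[1:]) fetches everything needed for OpenVLA-OFT.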

Step 3: Modify the configuration file

Before running the script, edit the ./examples/embodiment/config/maniskill_ppo_openvla_quickstart.yaml file to match the download paths of the model and dataset. Specifically, set the following options to the path where the gen-robot/openvla-7b-rlvla-warmup checkpoint is located:

  • rollout.model.model_path

  • actor.model.model_path

For OpenVLA-OFT, modify the maniskill_ppo_openvlaoft_quickstart.yaml file instead. Set the two model_path items below to the path where the RLinf/Openvla-oft-SFT-libero10-trajall checkpoint is located, set actor.model.lora_path to the path where the RLinf/RLinf-OpenVLAOFT-ManiSkill-Base-Lora checkpoint is located, and set actor.model.is_lora to True:

  • rollout.model.model_path

  • actor.model.model_path

  • actor.model.lora_path

  • actor.model.is_lora: True
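
The dotted names above refer to nested keys in the YAML file. As a minimal illustration, assuming the configuration has already been loaded into nested Python dictionaries (for example with a YAML parser), a hypothetical helper named set_by_dotted_key can apply the OpenVLA-OFT overrides. The helper and the stand-in dictionary are for illustration only and are not part of RLinf; the paths are the placeholders from Step 1.

```python
def set_by_dotted_key(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for the key "a.b.c", creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Stand-in for the loaded OpenVLA-OFT quick-start config (only relevant keys shown).
config = {"rollout": {"model": {}}, "actor": {"model": {}}}

base_model = "/path/to/model/Openvla-oft-SFT-libero10-trajall/"
lora = "/path/to/model/oft-sft/lora_004000"

overrides = {
    "rollout.model.model_path": base_model,
    "actor.model.model_path": base_model,
    "actor.model.lora_path": lora,
    "actor.model.is_lora": True,
}
for key, value in overrides.items():
    set_by_dotted_key(config, key, value)
```

Both model_path entries point at the same base checkpoint, while the LoRA path is set only on the actor, matching the list above.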

Step 4: Launch training

After updating the configuration file, run the following command to launch training with OpenVLA:

bash examples/embodiment/run_embodiment.sh maniskill_ppo_openvla_quickstart

Note

If you installed RLinf via the Docker image (see Installation), make sure you have switched to the Python environment that corresponds to the target model. The default environment is openvla. For OpenVLA-OFT or openpi, use the built-in switch_env script to switch environments: source switch_env openvla-oft or source switch_env openpi.

If you installed RLinf in a custom environment, make sure you have installed the dependencies for the corresponding model; see Installation for details.

For OpenVLA-OFT:

source switch_env openvla-oft
bash examples/embodiment/run_embodiment.sh maniskill_ppo_openvlaoft_quickstart

Training Pipeline Mode

For embodied FSDP training, runner.use_training_pipeline enables a pipeline execution path between environment rollout and actor training. When it is set to True, rollout trajectories are processed on the environment worker, converted into packed actor micro-batches, and streamed to the actor through the channel. The actor can then train on ready-to-use micro-batches while rollout generation is still progressing.
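
The overlap between rollout and training can be pictured as a producer-consumer pipeline. The sketch below is a toy model that uses Python threads and a bounded queue in place of RLinf's channel. The names env_worker and actor_worker and the dictionary-based micro-batches are hypothetical stand-ins, not RLinf APIs.

```python
import queue
import threading

SENTINEL = None  # marks the end of the rollout stream


def env_worker(channel, num_steps, micro_batch_size):
    """Produce packed micro-batches while the (simulated) rollout is running."""
    buffer = []
    for step in range(num_steps):
        buffer.append({"step": step, "reward": float(step)})  # fake transition
        if len(buffer) == micro_batch_size:
            channel.put(buffer)  # stream a ready-to-use micro-batch
            buffer = []
    if buffer:
        channel.put(buffer)  # flush the final partial micro-batch
    channel.put(SENTINEL)


def actor_worker(channel, trained):
    """Consume micro-batches as they arrive instead of waiting for the full rollout."""
    while True:
        micro_batch = channel.get()
        if micro_batch is SENTINEL:
            break
        trained.append(len(micro_batch))  # placeholder for one training step


channel = queue.Queue(maxsize=2)  # bounded, so the producer cannot run far ahead
trained = []
producer = threading.Thread(target=env_worker, args=(channel, 10, 4))
consumer = threading.Thread(target=actor_worker, args=(channel, trained))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(trained)  # sizes of the micro-batches the actor consumed, in arrival order
```

The bounded queue is a design choice of this sketch: it lets the consumer start as soon as the first micro-batch is ready while keeping the amount of in-flight data small.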

This mode is useful when rollout payloads contain nested observations or large tensors. Sending packed micro-batches makes the channel payload more friendly to the tensor fast path and reduces extra reconstruction work on the actor side. It is especially helpful when environment workers and actor workers are placed on different nodes and the inter-node connection crosses a wide-area network, where smaller packed tensor payloads reduce transfer overhead.

Example:

runner:
  use_training_pipeline: True

algorithm:
  adv_type: gae
  normalize_advantages: True

Notes and limitations:

  • algorithm.normalize_advantages is supported. The pipeline path computes raw advantages on the environment worker, aggregates masked advantage statistics across the environment workers that feed each actor rank, and normalizes before streaming actor micro-batches.

  • algorithm.adv_type currently supports only gae in this mode.

  • This mode is intended for embodied FSDP actor training with PPO/GRPO-style actor losses. It is not supported for embodied_sac, embodied_dagger, or embodied_nft.
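
The advantage normalization described in the first note can be sketched as follows. This is an illustrative reconstruction, not RLinf's implementation: each environment worker reports the sum, sum of squares, and count of its unmasked advantages; the statistics are combined (a plain Python sum stands in for the cross-worker aggregation); and every worker then normalizes with the shared mean and standard deviation. The function names, the use of population variance, and the eps value are assumptions.

```python
import math


def masked_stats(advantages, mask):
    """Return (sum, sum of squares, count) over entries where mask is True."""
    valid = [a for a, m in zip(advantages, mask) if m]
    return sum(valid), sum(a * a for a in valid), len(valid)


def aggregate(stats_per_worker):
    """Combine per-worker statistics into a global mean and standard deviation."""
    total = sum(s[0] for s in stats_per_worker)
    total_sq = sum(s[1] for s in stats_per_worker)
    count = sum(s[2] for s in stats_per_worker)
    mean = total / count
    var = max(total_sq / count - mean * mean, 0.0)  # population variance
    return mean, math.sqrt(var)


def normalize(advantages, mask, mean, std, eps=1e-8):
    """Normalize valid entries with the shared statistics; masked entries become 0.0."""
    return [(a - mean) / (std + eps) if m else 0.0 for a, m in zip(advantages, mask)]


# Two simulated environment workers; the masked value 99.0 must not affect the statistics.
workers = [
    ([1.0, 2.0, 99.0], [True, True, False]),
    ([3.0, 4.0], [True, True]),
]
stats = [masked_stats(adv, mask) for adv, mask in workers]
mean, std = aggregate(stats)
normalized = [normalize(adv, mask, mean, std) for adv, mask in workers]
```

Because only three numbers per worker are exchanged, the workers can agree on a common mean and standard deviation without sending the raw advantages to one place.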

View Training Results

  • The final model and metrics are saved under ./logs

  • Launch TensorBoard as follows:

    tensorboard --host 0.0.0.0 --logdir path/to/tensorboard/
    

After opening TensorBoard, you will see an interface similar to the figure below. It is recommended to focus on the following metrics:

  • rollout/env_info/return

  • rollout/env_info/success_once

Note

To control which GPUs are used, modify the cluster.component_placement parameter in the configuration file.

Set this item to 0-3 to use 4 GPUs or to 0-7 to use 8 GPUs, depending on your available resources. See Basic Configuration for more detailed instructions on placement configuration.

cluster:
  num_nodes: 1
  component_placement:
    actor,env,rollout: 0-3
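
To make the range notation concrete, the following sketch expands a value such as 0-3 into explicit GPU indices. parse_gpu_range is a hypothetical helper written for illustration; it is not RLinf's placement parser and handles only a single index or a single start-end range as used in this guide.

```python
def parse_gpu_range(spec):
    """Expand "0-3" into [0, 1, 2, 3]; a single index such as "2" becomes [2]."""
    text = str(spec).strip()
    if "-" in text:
        start, end = (int(part) for part in text.split("-", 1))
    else:
        start = end = int(text)
    if start > end:
        raise ValueError(f"invalid GPU range: {spec!r}")
    return list(range(start, end + 1))


print(len(parse_gpu_range("0-3")))  # 4
print(len(parse_gpu_range("0-7")))  # 8
```

The range is inclusive at both ends, which is why 0-3 corresponds to 4 GPUs and 0-7 to 8 GPUs.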