Supervised Fine-Tuning#

This page explains how to run full-parameter supervised fine-tuning (SFT) and LoRA fine-tuning with the RLinf framework. SFT is typically the first stage before reinforcement learning: the model imitates high-quality examples so RL can continue optimization with a strong prior.

Contents#

  • How to configure full-parameter SFT and LoRA SFT in RLinf

  • How to launch training on a single machine or multi-node cluster

  • How to monitor and evaluate results

Supported datasets#

RLinf currently supports datasets in the LeRobot format, selected via config_type.

Supported formats include:

  • pi0_maniskill

  • pi0_libero

  • pi0_aloha_robotwin

  • pi05_libero

  • pi05_maniskill

  • pi05_metaworld

  • pi05_calvin
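
As a quick sanity check before launching, a config name can be validated against the list above. This is a minimal sketch and not part of RLinf: SUPPORTED_CONFIGS and check_config_name are hypothetical names, and the set simply mirrors the formats listed on this page.

```python
# Hypothetical helper, not an RLinf API: mirrors the formats listed above.
SUPPORTED_CONFIGS = frozenset({
    "pi0_maniskill",
    "pi0_libero",
    "pi0_aloha_robotwin",
    "pi05_libero",
    "pi05_maniskill",
    "pi05_metaworld",
    "pi05_calvin",
})


def check_config_name(name: str) -> str:
    """Return the name unchanged if it is listed, otherwise raise ValueError."""
    if name not in SUPPORTED_CONFIGS:
        options = ", ".join(sorted(SUPPORTED_CONFIGS))
        raise ValueError(f"Unknown dataset config {name!r}; expected one of: {options}")
    return name
```

A custom name such as pi0_custom would need to be added to the set once it has been registered as described in the steps that follow.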

You can also train with a custom dataset format by editing the following three files:

  1. In examples/sft/config/custom_sft_openpi.yaml, set config_name to the name of your custom data config.

model:
  openpi:
    config_name: "pi0_custom"

  2. In rlinf/models/embodiment/openpi/__init__.py, register a TrainConfig whose name matches config_name (here pi0_custom).

TrainConfig(
    name="pi0_custom",
    model=pi0_config.Pi0Config(),
    data=CustomDataConfig(
        repo_id="physical-intelligence/custom_dataset",
        base_config=DataConfig(
            prompt_from_task=True
        ),  # the model needs a language instruction as its prompt
        assets=AssetsConfig(assets_dir="checkpoints/torch/pi0_base/assets"),
        extra_delta_transform=True,  # True for delta actions, False for absolute actions
        action_train_with_rotation_6d=False,  # custom datasets can define extra options like this one
    ),
    pytorch_weight_path="checkpoints/torch/pi0_base",
),

  3. In rlinf/models/embodiment/openpi/dataconfig/custom_dataconfig.py, define the custom dataset config.

class CustomDataConfig(DataConfig):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.repo_id = "physical-intelligence/custom_dataset"
        self.base_config = DataConfig(
            prompt_from_task=True
        )
        self.assets = AssetsConfig(assets_dir="checkpoints/torch/pi0_base/assets")
        self.extra_delta_transform = True
        self.action_train_with_rotation_6d = False
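
The extra_delta_transform flag in the config above switches between delta and absolute actions. The sketch below illustrates the general idea in plain Python, assuming a delta action is the target value minus the current state on every dimension. The function names are hypothetical, and this is not the actual OpenPI transform, which may for example keep some dimensions (such as a gripper) absolute.

```python
from typing import Sequence


def to_delta_actions(
    state: Sequence[float], actions: Sequence[Sequence[float]]
) -> list[list[float]]:
    """Express each absolute action as an offset from the current state."""
    return [[a - s for a, s in zip(action, state)] for action in actions]


def to_absolute_actions(
    state: Sequence[float], deltas: Sequence[Sequence[float]]
) -> list[list[float]]:
    """Inverse of to_delta_actions: add the current state back to each delta."""
    return [[d + s for d, s in zip(delta, state)] for delta in deltas]


state = [1.0, 2.0]                    # current robot state (illustrative values)
absolute = [[1.5, 2.0], [2.0, 3.0]]   # two absolute target actions
deltas = to_delta_actions(state, absolute)
print(deltas)  # [[0.5, 0.0], [1.0, 1.0]]
```

Whichever convention is chosen, the dataset and the deployed policy need to agree on it, because a delta action interpreted as an absolute target (or the reverse) sends the robot to the wrong pose.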

Training configuration#

A full example lives in examples/sft/config/libero_sft_openpi.yaml. Key fields:

cluster:
    num_nodes: 1                 # number of nodes
    component_placement:         # component → GPU mapping
        actor: 0-3
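
The placement value 0-3 reads as a range of GPU indices. As an illustration only, the sketch below expands such a string into explicit indices, assuming the range is inclusive and that comma-separated entries are allowed. expand_gpu_range is a hypothetical helper, not RLinf's own parser.

```python
def expand_gpu_range(spec: str) -> list[int]:
    """Expand a string such as "0-3" or "0,2-3" into a sorted list of GPU indices."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"Invalid GPU range {part!r}: start exceeds end")
            # Assumption: both endpoints are included in the range.
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


print(expand_gpu_range("0-3"))  # [0, 1, 2, 3]
```

Under this reading, the example configuration places the actor on four GPUs of a single node.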

To enable LoRA fine-tuning, set actor.model.is_lora to True and configure actor.model.lora_rank.

actor:
    model:
        is_lora: True
        lora_rank: 32
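
To give a feel for what lora_rank controls: in standard LoRA, a frozen weight matrix of shape d_out × d_in is adapted by two small trainable matrices of shapes d_out × r and r × d_in, where r is the rank. The sketch below counts trainable parameters for one such layer. The 2048 × 2048 layer size is an arbitrary illustration rather than a value taken from the OpenPI model, and which layers RLinf actually adapts is not covered here.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: (d_out x rank) + (rank x d_in)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix that full fine-tuning would update."""
    return d_in * d_out


d_in = d_out = 2048   # illustrative layer size, not from the model
rank = 32             # matches lora_rank in the config above

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(lora, full)  # 131072 4194304
```

For this illustrative layer, rank 32 trains 1/32 of the parameters that full fine-tuning would. A higher rank gives the adapter more capacity at the cost of more trainable parameters and optimizer state.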

Dependency Installation#

This section describes the dependencies required for SFT of the OpenPI model. For other models, refer to the Dependency Installation section of the corresponding example.

1. Clone RLinf Repository#

# For mainland China users, you can use the following for better download speed:
# git clone https://ghfast.top/github.com/RLinf/RLinf.git
git clone https://github.com/RLinf/RLinf.git
cd RLinf

2. Install Dependencies#

Option 1: Docker Image

Run the experiment inside the provided Docker image:

docker run -it --rm --gpus all \
    --shm-size 20g \
    --network host \
    --name rlinf \
    -v .:/workspace/RLinf \
    rlinf/rlinf:agentic-rlinf0.2-maniskill_libero
    # For mainland China users, you can use the following for better download speed:
    # docker.1ms.run/rlinf/rlinf:agentic-rlinf0.2-maniskill_libero

Inside the container, switch to the matching virtual environment with the image's built-in switch_env utility:

source switch_env openpi

Option 2: Custom Environment

Install dependencies directly in your environment by running the following command:

# For mainland China users, you can add the `--use-mirror` flag to the install.sh command for better download speed.

bash requirements/install.sh embodied --model openpi --env maniskill_libero
source .venv/bin/activate

Launch scripts#

First start the Ray cluster, then run the helper script:

# run from the repository root
bash examples/sft/run_vla_sft.sh --config libero_sft_openpi

The same script works for generic text SFT; just swap the config file.
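
If you drive launches from Python, the command above can be wrapped as follows. This is a sketch under two assumptions: that --config NAME resolves to examples/sft/config/NAME.yaml (as the libero_sft_openpi example suggests), and that the script is run from the repository root. build_sft_command is a hypothetical helper, not part of RLinf.

```python
from pathlib import Path


def build_sft_command(config: str, repo_root: str = ".") -> list[str]:
    """Return the launch command after checking that the config file exists."""
    # Assumption: the config name maps to examples/sft/config/<name>.yaml.
    config_file = Path(repo_root) / "examples" / "sft" / "config" / f"{config}.yaml"
    if not config_file.is_file():
        raise FileNotFoundError(f"Config file not found: {config_file}")
    return ["bash", "examples/sft/run_vla_sft.sh", "--config", config]
```

The returned list can be passed to subprocess.run with cwd set to the repository root and check=True, so a missing config is reported before any training process starts.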