RL with PolaRiS Simulation Platform

Figure: PolaRiS (https://raw.githubusercontent.com/RLinf/misc/main/pic/polaris.png)

PolaRiS is a robotics benchmark built on Isaac Sim that uses Gaussian Splatting rendering for desktop manipulation. This recipe uses RLinf to fine-tune OpenPI π₀ or π₀.₅ policies with PPO on DROID-style PolaRiS tasks.

Overview

Fine-tune an OpenPI policy on PolaRiS with two RGB views, proprioception, and chunked 8-dim actions.

Models: π₀ · π₀.₅
Algorithms: PPO
Tasks: 6 DROID desktop tasks
Hardware: 1 node · 1 GPU

You’ll do: install → download Isaac Sim + datasets + model → launch run_embodiment.sh → watch env/success_once.

Prerequisites: Installation · Isaac Sim · PolaRiS-Hub · an OpenPI checkpoint.

Tasks

Task	Description	Env Config
`DROID-TapeIntoContainer`	Put the tape into the container.	`polaris_droid_tapeintocontainer.yaml`
`DROID-PanClean`	Use the yellow sponge to scrub the blue-handle frying pan.	`polaris_droid_panclean.yaml`
`DROID-BlockStackKitchen`	Place and stack blocks on the green tray.	`polaris_droid_blockstackkitchen.yaml`
`DROID-FoodBussing`	Put all foods in the bowl.	`polaris_droid_foodbussing.yaml`
`DROID-MoveLatteCup`	Put the latte art cup on top of the cutting board.	`polaris_droid_movelattecup.yaml`
`DROID-OrganizeTools`	Put the scissors into the large container.	`polaris_droid_organizetools.yaml`

Observation and Action

Field	Specification
Observation	External RGB camera and wrist RGB camera at 224×224 plus 8-dim robot state.
Action	8-dim continuous action: 7 joint velocities plus gripper position.
Reward	Task-completion reward from the PolaRiS environment.
Prompt	The task description in `init_params.task_description`.

Installation

First, clone the RLinf repository:

# Mainland China users can use a mirror for faster cloning:
# git clone https://ghfast.top/github.com/RLinf/RLinf.git
git clone https://github.com/RLinf/RLinf.git
cd RLinf

Then set up the dependencies with one of the two methods below: a prebuilt Docker image (recommended) or a custom environment. The general setup (prerequisites, GPU drivers, the in-image switch_env helper, mirrors, and troubleshooting) is documented once in Installation; the commands in this recipe differ only in the Docker image tag and the --env value.

Docker image

docker run -it --rm --gpus all \
   --shm-size 32g \
   --network host \
   --name rlinf \
   -v "$(pwd)":/workspace/RLinf \
   rlinf/rlinf:agentic-rlinf0.2-polaris

# For mainland China users:
# docker.1ms.run/rlinf/rlinf:agentic-rlinf0.2-polaris

Switch to the OpenPI virtual environment inside the image:

source switch_env openpi

Custom environment

Install PolaRiS with OpenPI dependencies:

# Mainland China users can add --use-mirror.
bash requirements/install.sh embodied --model openpi --env polaris
source .venv/bin/activate

Download Isaac Sim

Download Isaac Sim 5.1.0 and initialize its shell environment:

mkdir -p isaac_sim
cd isaac_sim
wget https://download.isaacsim.omniverse.nvidia.com/isaac-sim-standalone-5.1.0-linux-x86_64.zip
unzip isaac-sim-standalone-5.1.0-linux-x86_64.zip
rm isaac-sim-standalone-5.1.0-linux-x86_64.zip
source ./setup_conda_env.sh

Warning

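Run source /path/to/isaac_sim/setup_conda_env.sh in every new terminal before launching PolaRiS. Sourcing only affects the current shell, so the environment does not carry over to other terminals.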
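
Because the setup has to be repeated per terminal, a quick check can catch a forgotten source step before a long launch. The sketch below is illustrative only: `isaac_env_ready` is a hypothetical helper, and it assumes that `setup_conda_env.sh` exports `ISAAC_PATH`, which you should confirm against your copy of the script.

```shell
# Hypothetical helper: report whether this shell looks Isaac Sim-initialized.
# Assumption: setup_conda_env.sh exports ISAAC_PATH; verify in your install.
isaac_env_ready() {
    if [ -n "${ISAAC_PATH:-}" ] && [ -d "${ISAAC_PATH}" ]; then
        echo "Isaac Sim environment found at ${ISAAC_PATH}"
        return 0
    fi
    echo "Isaac Sim environment not initialized; source setup_conda_env.sh first" >&2
    return 1
}
```

One possible use is to gate the launch on it, for example `isaac_env_ready && bash examples/embodiment/run_embodiment.sh polaris_tapeintocontainer_ppo_openpi_pi05`.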

Download the Datasets

Download the evaluation scenes and initial conditions:

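# Optional: use a Hugging Face mirror.
# export HF_ENDPOINT=https://hf-mirror.com
hf download owhan/PolaRiS-Hub --repo-type=dataset --local-dir ./PolaRiS-Hub
# Point this at the absolute path of the directory downloaded above.
export POLARIS_DATA_PATH=/path/to/PolaRiS-Hub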
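
The download above uses a relative directory, while the export expects a full path. As a minimal sketch, the helper below resolves the directory to an absolute path before exporting it. `set_polaris_data_path` is a hypothetical name introduced here for illustration, not part of RLinf or PolaRiS.

```shell
# Hypothetical helper: export POLARIS_DATA_PATH as an absolute path.
# The default argument matches the --local-dir used in the download command.
set_polaris_data_path() {
    local dir="${1:-./PolaRiS-Hub}"
    if [ ! -d "$dir" ]; then
        echo "dataset directory not found: $dir" >&2
        return 1
    fi
    # cd + pwd resolves a relative path without depending on realpath.
    POLARIS_DATA_PATH="$(cd "$dir" && pwd)"
    export POLARIS_DATA_PATH
}
```

Run from the directory where the dataset was downloaded, `set_polaris_data_path ./PolaRiS-Hub` followed by `echo "$POLARIS_DATA_PATH"` should print a path beginning with `/`.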

Optionally download co-training demonstrations:

hf download owhan/PolaRiS-datasets --repo-type=dataset --local-dir ./PolaRiS-datasets

Download the Model

Download the checkpoint for the OpenPI model you plan to fine-tune.

OpenPI π₀.₅

cd /path/to/save/model

# Option 1: clone with git-lfs.
git lfs install
git clone https://huggingface.co/RLinf/RLinf-Pi05-Polaris-droid_jointpos

# Option 2: use huggingface-hub instead (run only one of the two options).
# Optional mirror: export HF_ENDPOINT=https://hf-mirror.com
pip install huggingface-hub
hf download RLinf/RLinf-Pi05-Polaris-droid_jointpos --local-dir RLinf-Pi05-Polaris-droid_jointpos

OpenPI π₀

cd /path/to/save/model

# Option 1: clone with git-lfs.
git lfs install
git clone https://huggingface.co/RLinf/RLinf-Pi0-Polaris-droid_jointpos

# Option 2: use huggingface-hub instead (run only one of the two options).
# Optional mirror: export HF_ENDPOINT=https://hf-mirror.com
pip install huggingface-hub
hf download RLinf/RLinf-Pi0-Polaris-droid_jointpos --local-dir RLinf-Pi0-Polaris-droid_jointpos
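
When cloning with git-lfs, large files can be left as small text pointer stubs if the LFS fetch did not run. The sketch below lists any such stubs in a checkpoint directory by looking for the Git LFS pointer header. `find_lfs_pointers` is a hypothetical helper written for this guide, and the 1024-byte size cutoff is an assumption chosen because pointer files are only a few lines long.

```shell
# Hypothetical helper: list files that are still Git LFS pointer stubs
# instead of real weights. Empty output means no stubs were found.
find_lfs_pointers() {
    local ckpt_dir="$1"
    # Pointer files are tiny text files whose first line names the LFS spec.
    find "$ckpt_dir" -type f -size -1024c ! -path '*/.git/*' \
        -exec grep -l '^version https://git-lfs.github.com/spec/v1' {} + || true
}
```

For example, `find_lfs_pointers RLinf-Pi05-Polaris-droid_jointpos` printing any file names suggests rerunning `git lfs pull` inside the clone before training.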

After downloading, point your config YAML at the checkpoint. Set the same path for both the rollout and the actor model:

rollout:
   model:
      model_path: /path/to/downloaded-checkpoint
actor:
   model:
      model_path: /path/to/downloaded-checkpoint
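
Since the two entries must agree, a text-level check can flag a config where only one of them was updated. This is a rough sketch under stated assumptions: `model_paths_match` is a hypothetical helper, it compares every `model_path:` line in the file as plain text, and it does not resolve Hydra interpolations or trailing comments.

```shell
# Hypothetical helper: succeed only if all model_path entries in a config
# file hold the same value. Plain-text comparison, no YAML parsing.
model_paths_match() {
    local cfg="$1"
    local distinct
    distinct="$(grep -E '^[[:space:]]*model_path:' "$cfg" \
        | sed -E 's/^[[:space:]]*model_path:[[:space:]]*//; s/[[:space:]]+$//' \
        | sort -u | wc -l | tr -d '[:space:]')"
    # Exactly one distinct value means rollout and actor agree.
    [ "$distinct" -eq 1 ]
}
```

A possible use is `model_paths_match examples/embodiment/config/polaris_tapeintocontainer_ppo_openpi_pi05.yaml || echo "model_path entries differ"`.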

Run It

Pick one training config and launch from a terminal where Isaac Sim is initialized:

Recipe	Config	Command suffix
π₀.₅ + PPO	`examples/embodiment/config/polaris_tapeintocontainer_ppo_openpi_pi05.yaml`	`polaris_tapeintocontainer_ppo_openpi_pi05`
π₀ + PPO	`examples/embodiment/config/polaris_tapeintocontainer_ppo_openpi.yaml`	`polaris_tapeintocontainer_ppo_openpi`

source /path/to/isaac_sim/setup_conda_env.sh
export POLARIS_DATA_PATH=/path/to/PolaRiS-Hub

# Run one of the following, not both.
# π₀.₅ + PPO:
bash examples/embodiment/run_embodiment.sh polaris_tapeintocontainer_ppo_openpi_pi05
# π₀ + PPO:
bash examples/embodiment/run_embodiment.sh polaris_tapeintocontainer_ppo_openpi

What this does:

Starts the embodied training entrypoint with the selected Hydra config.
Creates Ray workers for the actor, rollout, and PolaRiS env components.
Runs PPO with chunked OpenPI actions and Gaussian Splatting-rendered observations.

Run standalone evaluation by following the PolaRiS evaluation guide. That guide covers POLARIS_DATA_PATH setup, the available eval configs (polaris_tapeintocontainer_openpi_pi05_eval and polaris_movelattecup_openpi_eval), and how to interpret results.

Note

Training configs default to polaris_droid_tapeintocontainer. To switch tasks, change the Hydra env defaults to another polaris_droid_* env config and keep POLARIS_DATA_PATH pointed at PolaRiS-Hub.
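
One way to try another task without editing the shipped file is to work on a copy. The sketch below is only an illustration: `switch_polaris_task` is a hypothetical helper, and it assumes the training config refers to the env by the literal name `polaris_droid_tapeintocontainer`. It replaces every occurrence of that name, so review the printed lines before using the copy.

```shell
# Hypothetical helper: copy a training config and swap the env config name.
# Assumption: the env is referenced by the literal default name below.
switch_polaris_task() {
    local src="$1" dst="$2" new_env="$3"
    sed "s/polaris_droid_tapeintocontainer/${new_env}/g" "$src" > "$dst"
    # Print the lines that now reference a PolaRiS env so they can be reviewed.
    grep -n 'polaris_droid_' "$dst"
}
```

For instance, `switch_polaris_task examples/embodiment/config/polaris_tapeintocontainer_ppo_openpi_pi05.yaml examples/embodiment/config/polaris_panclean_ppo_openpi_pi05.yaml polaris_droid_panclean` writes a copy (the new file name is made up for this example) that could then be launched by its base name, assuming run_embodiment.sh resolves configs from that directory as the table above suggests.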

A few PolaRiS-specific fields are worth knowing when tuning the action/rendering pipeline:

Key	Meaning
`open_loop_horizon`	Interval, in steps, between high-quality Gaussian Splatting renders. Within an action chunk, a high-quality render runs every `open_loop_horizon` steps; the steps in between use low-quality rendering to speed up the simulation.
`num_action_chunks`	Number of action steps the model generates at a time (e.g. `15`).
`num_images_in_input`	Number of camera images fed to the policy (e.g. `2`: external + wrist camera).
`config_name`	OpenPI config / data format selector (e.g. `pi05_droid_polaris` for the DROID data format).
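
To get a feel for how `open_loop_horizon` and `num_action_chunks` interact, the arithmetic can be worked through by hand. The sketch below counts render types within one chunk. The value `open_loop_horizon=5` is made up for illustration, and which step within each interval gets the high-quality render (the first one here) is an assumption, not documented behavior.

```shell
# Illustrative values only; read the real ones from your config.
num_action_chunks=15
open_loop_horizon=5   # assumed value, not a documented default

high_quality=0
step=0
while [ "$step" -lt "$num_action_chunks" ]; do
    # Assumption: high-quality renders fall on steps 0, H, 2H, ...
    if [ $((step % open_loop_horizon)) -eq 0 ]; then
        high_quality=$((high_quality + 1))
    fi
    step=$((step + 1))
done
low_quality=$((num_action_chunks - high_quality))
echo "high-quality renders per chunk: ${high_quality}, low-quality: ${low_quality}"
```

Under these assumptions, 3 of the 15 steps in a chunk use high-quality rendering and 12 use low-quality rendering, so a larger `open_loop_horizon` trades rendering fidelity on intermediate steps for simulation speed.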

Visualization and Results

Launch TensorBoard from the RLinf repo root:

tensorboard --logdir ../results --port 6006

The key signal is env/success_once. For every logged metric, see Training metrics.

Enable evaluation videos in the env config when needed:

env:
  eval:
    video_cfg:
      save_video: True
      video_base_dir: ${runner.logger.log_path}/video/eval