RL with Real2Sim2Real GSEnv
This example describes the full workflow for reinforcement learning fine-tuning in the GSEnv (ManiSkill-GS) environment using the RLinf framework. GSEnv combines ManiSkill robot simulation with 3D Gaussian Splatting (3DGS) rendering and supports Real-to-Sim-to-Real transfer; see the pi_RL paper.
The main goals are to equip the model with:
Visual understanding: Process RGB images from 3DGS rendering (aligned with real-world appearance).
Language understanding: Understand natural-language task descriptions.
Action generation: Produce precise robot actions (end-effector pose, gripper control).
Reinforcement learning: Use PPO with environment feedback to optimize the policy.
Environment
GSEnv (ManiSkill-GS) Environment
Environment: ManiSkill-based physics simulation + 3D Gaussian Splatting rendering, with the same interface as ManiSkill.
Task: Currently supports PutCubeOnPlate-v0 (pick up a cube and place it on a designated plate).
Observation: Supports state (proprioception) or rgb (e.g. a third-person camera) observations; the task instruction is given in natural language, e.g. “pick up the cube and put it on the plate”.
Action Space: Continuous actions driven by PD end-effector control (e.g. pd_ee_target_delta_pose) for the Franka arm and gripper.
Robot: my_franka (Franka FR3).
Reward: Sparse; evaluate() returns success (cube stably on the plate).
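The sparse reward can be sketched as follows. This sketch assumes a single, unbatched environment whose evaluate() returns a dict with a boolean "success" entry, which is the usual ManiSkill convention. The helper name sparse_reward and the only_at_episode_end switch are hypothetical. They illustrate the "configurable, e.g. only at episode end" option and are not GSEnv or RLinf APIs.

```python
from typing import Mapping


def sparse_reward(
    info: Mapping[str, object],
    done: bool,
    only_at_episode_end: bool = False,
) -> float:
    """Return a 0/1 reward from an evaluate()-style info dict (hypothetical helper).

    Assumes info["success"] is truthy when the cube rests stably on the plate.
    With only_at_episode_end=True, success is rewarded only on the final step.
    """
    success = bool(info.get("success", False))
    if only_at_episode_end and not done:
        return 0.0
    return 1.0 if success else 0.0


print(sparse_reward({"success": True}, done=False))  # 1.0
print(sparse_reward({"success": True}, done=False, only_at_episode_end=True))  # 0.0
print(sparse_reward({"success": True}, done=True, only_at_episode_end=True))  # 1.0
```

Real ManiSkill environments typically return batched tensors from evaluate(). A vectorized version would apply the same logic element-wise.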
Data Structures
Images: RGB tensors from 3DGS or sim camera rendering.
Task Descriptions: Natural-language instructions.
Actions: Normalized continuous values output by the policy, denormalized before being executed in the environment.
Rewards: 0/1 reward based on task success (configurable, e.g. only at episode end).
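The denormalization step for actions can be sketched as an affine map from [-1, 1] back to per-dimension bounds. Several details are hypothetical: the 7-dimensional layout (3 position deltas, 3 rotation deltas, 1 gripper command), the bound values, and the function name. OpenPI checkpoints also ship their own normalization statistics, which may use a different scheme (for example mean/std), so treat this as an illustration only.

```python
from typing import Sequence


def denormalize(
    action: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> list[float]:
    """Map each normalized component from [-1, 1] back to [low, high].

    Values outside [-1, 1] are clipped first so the command stays in range.
    """
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    out = []
    for a, lo, hi in zip(action, low, high):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range
        out.append(lo + (a + 1.0) * 0.5 * (hi - lo))
    return out


# Hypothetical bounds: 3 position deltas (m), 3 rotation deltas (rad), 1 gripper command.
LOW = [-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -1.0]
HIGH = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.0]

print(denormalize([0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 2.0], LOW, HIGH))
```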
Algorithm
Core Components
PPO (Proximal Policy Optimization)
GAE (Generalized Advantage Estimation) for advantage estimation
Ratio-based policy clipping
Value function clipping
Entropy regularization
Vision-Language-Action models (e.g. OpenPI π0/π0.5)
Vision + language input, continuous action output (flow-based action generation)
Compatible with GSEnv state/rgb observations and language instructions
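The PPO components listed above (GAE, ratio clipping, value clipping, entropy regularization) can be sketched in plain Python. This is a minimal illustration of the standard formulas, not RLinf's implementation. The hyperparameter names and defaults (gamma, lam, clip_eps, value_clip_eps, vf_coef, ent_coef) are illustrative, not RLinf config keys. The sketch also assumes per-sample log-probabilities are already available; obtaining them for a flow-based VLA such as π0/π0.5 is nontrivial, and the pi_RL paper addresses that problem.

```python
import math
from typing import Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> tuple[list[float], list[float]]:
    """Generalized Advantage Estimation over one rollout segment.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap if the segment
    was cut off mid-episode. dones[t] is True if the episode ended at step t.
    Returns (advantages, returns), where returns[t] = advantages[t] + values[t].
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    new_values: Sequence[float],
    old_values: Sequence[float],
    returns: Sequence[float],
    entropy: Sequence[float],
    clip_eps: float = 0.2,
    value_clip_eps: float = 0.2,
    vf_coef: float = 0.5,
    ent_coef: float = 0.0,
) -> float:
    """Clipped PPO loss (to be minimized), averaged over samples."""
    n = len(new_logp)
    pg_loss = 0.0
    v_loss = 0.0
    for i in range(n):
        # Ratio-based policy clipping
        ratio = math.exp(new_logp[i] - old_logp[i])
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        pg_loss -= min(ratio * advantages[i], clipped_ratio * advantages[i])
        # Value clipping: limit how far the value prediction moves from the old one
        delta_v = min(max(new_values[i] - old_values[i], -value_clip_eps), value_clip_eps)
        v_clipped = old_values[i] + delta_v
        v_loss += 0.5 * max((new_values[i] - returns[i]) ** 2, (v_clipped - returns[i]) ** 2)
    # Entropy regularization: subtracting entropy favors more exploratory policies
    return pg_loss / n + vf_coef * v_loss / n - ent_coef * sum(entropy) / n
```

For example, take gamma = lam = 1 and zero value estimates. A single success reward of 1 at the final step then propagates back as an advantage of 1 to every earlier step of the episode.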
Dependencies and Setup
1. Clone RLinf
# For faster clone in some regions you can use:
# git clone https://ghfast.top/github.com/RLinf/RLinf.git
git clone https://github.com/RLinf/RLinf.git
cd RLinf
2. Install RLinf
Option 1: Docker image
Run experiments with the Docker image.
docker run -it --rm --gpus all \
--shm-size 20g \
--network host \
--name rlinf \
-v .:/workspace/RLinf \
rlinf/rlinf:agentic-rlinf0.2-maniskill_libero
# For mirror in some regions:
# docker.1ms.run/rlinf/rlinf:agentic-rlinf0.2-maniskill_libero
Switch to the correct virtual environment with the image’s switch_env tool:
source switch_env openpi
Option 2: Custom environment
# Add `--use-mirror` to install.sh for faster install in some regions
bash requirements/install.sh embodied --model openpi --env maniskill_libero
source .venv/bin/activate
3. Install GSEnv
GSEnv comes from the separate repo ManiSkill-GS; install it before using it with RLinf:
# Clone ManiSkill-GS
git clone -b v01 https://github.com/chenkang455/ManiSkill-GS.git
cd ManiSkill-GS
uv pip install -e .
4. Download GSEnv assets
GSEnv needs asset files (robot URDFs, 3DGS PLY files, object models, etc.). Download RLinf/gsenv-assets-v0 from Hugging Face into the assets/ directory of the ManiSkill-GS project:
# Run from ManiSkill-GS project root
# Optional: use a Hugging Face mirror for faster downloads in some regions
# export HF_ENDPOINT=https://hf-mirror.com
hf download RLinf/gsenv-assets-v0 --repo-type dataset --local-dir ./assets
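Before running the interface test, a quick sanity check can confirm that the download actually landed in assets/. This hypothetical helper only counts files by extension. Checking for .urdf (robot descriptions) and .ply (3DGS scenes) is an assumption based on the asset types listed above, not a description of the dataset's layout.

```python
from pathlib import Path


def check_assets(asset_dir: str = "./assets") -> dict[str, int]:
    """Count asset files by extension and fail loudly if a type is missing.

    The extensions checked (.urdf, .ply) are an assumption, not a dataset spec.
    """
    root = Path(asset_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root.resolve()} does not exist; run the download first")
    # rglob searches all subdirectories recursively
    counts = {ext: sum(1 for _ in root.rglob(f"*{ext}")) for ext in (".urdf", ".ply")}
    missing = [ext for ext, n in counts.items() if n == 0]
    if missing:
        raise RuntimeError(f"no {', '.join(missing)} files found under {root}")
    return counts
```

Calling check_assets() from the ManiSkill-GS project root returns the per-extension counts. It raises if the directory or either file type is missing.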
✨ After installation, run python scripts/test_rlinf_interface.py from the ManiSkill-GS project root to verify the RLinf interface. Note: the first run can take a while because gsplat compiles on first use; please be patient.
Model download
Before training, download the desired pretrained model (e.g. the OpenPI π0.5 SFT checkpoint for GSEnv PutCubeOnPlate):
# Download model (choose one method)
# Method 1: git clone
git lfs install
git clone https://huggingface.co/RLinf/RLinf-Pi05-GSEnv-PutCubeOnPlate-V0-SFT
# Method 2: huggingface-hub
# Set HF_ENDPOINT for mirror if needed:
# export HF_ENDPOINT=https://hf-mirror.com
pip install -U huggingface-hub  # the hf CLI requires a recent release
hf download RLinf/RLinf-Pi05-GSEnv-PutCubeOnPlate-V0-SFT --local-dir RLinf-Pi05-GSEnv-PutCubeOnPlate-V0-SFT
After downloading, set the model path in your YAML config to the downloaded directory.
Running the scripts
1. Cluster configuration
cluster:
  num_nodes: 1
  component_placement:
    env: 0-3
    rollout: 4-7
    actor: 0-7
rollout:
  pipeline_stage_num: 2
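You can configure GPU usage for env, rollout, and actor independently. In this layout, env (GPUs 0-3) and rollout (GPUs 4-7) use disjoint GPUs, while actor spans all eight. Setting pipeline_stage_num = 2 enables pipeline overlap between rollout and env for higher throughput.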
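An idealized timing model shows why pipeline_stage_num = 2 helps. This sketch rests on assumptions and does not describe RLinf's scheduler. It assumes env and rollout run on disjoint GPUs as in the layout above, and that the batch is split into stages that alternate between the two workers. Each stage costs an even share of the full-batch env time and rollout time, and communication is ignored. The function name is hypothetical.

```python
def simulate_rollout_time(steps: int, env_time: float, rollout_time: float, stages: int) -> float:
    """Return total wall-clock time for `steps` env/rollout iterations.

    Each stage (a slice of the parallel envs) needs an action from the rollout
    worker before the env worker can step it, and a fresh observation from the
    env worker before the rollout worker can act on it again.
    """
    env_free = 0.0  # time at which the env worker becomes idle
    rollout_free = 0.0  # time at which the rollout worker becomes idle
    obs_ready = [0.0] * stages  # time each stage's latest observation is available
    for _ in range(steps):
        for s in range(stages):
            act_done = max(rollout_free, obs_ready[s]) + rollout_time / stages
            rollout_free = act_done
            env_done = max(env_free, act_done) + env_time / stages
            env_free = env_done
            obs_ready[s] = env_done
    return max(env_free, rollout_free)


print(simulate_rollout_time(10, env_time=1.0, rollout_time=1.0, stages=1))  # 20.0
print(simulate_rollout_time(10, env_time=1.0, rollout_time=1.0, stages=2))  # 10.5
```

With equal costs, two stages nearly halve the time per step, because one worker processes one half of the batch while the other worker handles the other half. If one side dominates, the per-step time approaches the cost of the slower component, so the gain shrinks.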
cluster:
  num_nodes: 1
  component_placement:
    env,rollout,actor: all
You can also use a fully shared layout where env, rollout, and actor share all GPUs.
cluster:
  num_nodes: 1
  component_placement:
    env: 0-1
    rollout: 2-5
    actor: 6-7
Or a fully separated layout where each component uses its own GPUs without offload.
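A small helper can expand the placement strings to show which components share GPUs under each layout above. Components that share GPUs have to take turns, for example via offloading. This is a hypothetical sketch that covers only the notation used in these examples: "a-b" ranges, "all", comma-joined component keys, and (as an added assumption) a single GPU id. RLinf's own parser may accept more forms.

```python
def parse_gpu_spec(spec: str, num_gpus: int) -> list[int]:
    """Turn "all", "a-b" (inclusive range) or a single id like "3" into GPU ids."""
    spec = spec.strip()
    if spec == "all":
        ids = list(range(num_gpus))
    elif "-" in spec:
        start, end = (int(x) for x in spec.split("-"))
        ids = list(range(start, end + 1))
    else:
        ids = [int(spec)]
    if not ids or min(ids) < 0 or max(ids) >= num_gpus:
        raise ValueError(f"GPU spec {spec!r} is out of range for {num_gpus} GPUs")
    return ids


def expand_placement(placement: dict[str, str], num_gpus: int) -> dict[str, list[int]]:
    """Expand keys like "env,rollout,actor" so every component gets its own entry."""
    result: dict[str, list[int]] = {}
    for components, spec in placement.items():
        gpus = parse_gpu_spec(str(spec), num_gpus)
        for name in components.split(","):
            result[name.strip()] = gpus
    return result


def shared_gpus(layout: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each GPU used by more than one component to those components."""
    users: dict[int, list[str]] = {}
    for name, gpus in layout.items():
        for g in gpus:
            users.setdefault(g, []).append(name)
    return {g: names for g, names in users.items() if len(names) > 1}


layouts = {
    "split_env_rollout": {"env": "0-3", "rollout": "4-7", "actor": "0-7"},
    "shared": {"env,rollout,actor": "all"},
    "separated": {"env": "0-1", "rollout": "2-5", "actor": "6-7"},
}
for name, placement in layouts.items():
    layout = expand_placement(placement, num_gpus=8)
    print(name, "shared GPUs:", sorted(shared_gpus(layout)))
```

Only the separated layout reports no shared GPUs. This matches the description of it as running without offload.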
2. Config files
GSEnv PutCubeOnPlate training config:
π0.5 + PPO:
examples/embodiment/config/gsenv_ppo_openpi_pi05.yaml
3. Launch command
To start training with your chosen config:
bash examples/embodiment/run_embodiment.sh CHOSEN_CONFIG
Example: to train the π0.5 model with PPO on GSEnv PutCubeOnPlate:
bash examples/embodiment/run_embodiment.sh gsenv_ppo_openpi_pi05
Visualization and results
1. TensorBoard
# Start TensorBoard
tensorboard --logdir ./logs --port 6006
2. Key metrics
Training
actor/loss: Policy loss
actor/value_loss: Value loss (PPO)
actor/grad_norm: Gradient norm
actor/approx_kl: Approx KL between old and new policy
actor/pg_clipfrac: Policy clip fraction
actor/value_clip_ratio: Value clip ratio (PPO)
Rollout
rollout/returns_mean: Mean episode return
rollout/advantages_mean: Mean advantage
Environment
env/episode_len: Mean episode length
env/success_once: Task success rate
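Several of these diagnostics can be sketched from raw per-sample data. The definitions below are common ones, assumed rather than taken from RLinf's code:
- success_once counts an episode as successful if success held at any step.
- approx_kl uses the (r - 1) - log r estimator, with r the new/old probability ratio. Other estimators, such as mean(old_logp - new_logp), are also common.
- pg_clipfrac is the share of ratios that fall outside [1 - eps, 1 + eps].

```python
import math
from typing import Sequence


def success_once(success_per_step: Sequence[Sequence[bool]]) -> float:
    """Fraction of episodes in which success was reached at least once."""
    return sum(any(ep) for ep in success_per_step) / len(success_per_step)


def approx_kl(old_logp: Sequence[float], new_logp: Sequence[float]) -> float:
    """Estimate KL(old || new) from samples of the old policy.

    Uses mean((r - 1) - log r) with r = p_new / p_old, which is always >= 0.
    """
    total = 0.0
    for o, n in zip(old_logp, new_logp):
        log_r = n - o
        total += math.exp(log_r) - 1.0 - log_r
    return total / len(old_logp)


def pg_clipfrac(old_logp: Sequence[float], new_logp: Sequence[float], clip_eps: float = 0.2) -> float:
    """Fraction of samples whose probability ratio fell outside [1 - eps, 1 + eps]."""
    ratios = [math.exp(n - o) for o, n in zip(old_logp, new_logp)]
    return sum(abs(r - 1.0) > clip_eps for r in ratios) / len(ratios)
```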
3. Video
Enable video recording in the env config to save 3DGS renders (this requires GS rendering settings such as gs_kwargs.render_interface: "gs_rlinf"):
video_cfg:
  save_video: True
  info_on_video: True
  video_base_dir: ${runner.logger.log_path}/video/train
4. WandB
runner:
  task_type: embodied
  logger:
    log_path: "../results"
    project_name: rlinf
    experiment_name: "gsenv_ppo_openpi_pi05"
    logger_backends: ["tensorboard", "wandb"] # tensorboard, wandb, swanlab
GSEnv results
When training OpenPI π0.5 with PPO in RLinf on PutCubeOnPlate-v0, monitor env/success_once and related metrics to track convergence. Actual numbers depend on the seed, number of training steps, hyperparameters, and SFT checkpoint.
References
ManiSkill-GS repo: GSEnv implementation and 3DGS rendering (ManiSkill-GS).
pi_RL paper: pi_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models.
RLinf ManiSkill docs: Understanding the ManiSkill interface and config helps when using GSEnv.