RL with RoboCasa Benchmark
This document is a guide to reinforcement learning training on the RoboCasa environment in the RLinf framework. RoboCasa Kitchen is a manipulation benchmark set in realistic kitchen environments, with diverse layouts, objects, and tasks, which makes it well suited to developing generalizable robotic policies.
The main goal is to train vision-language-action models with the following capabilities:
Visual Understanding: Process RGB images from multiple camera viewpoints.
Language Understanding: Interpret natural language task instructions.
Manipulation Skills: Execute complex kitchen tasks such as pick-and-place, opening/closing doors, and appliance control.
Environment
RoboCasa Simulation Platform
Environment: RoboCasa Kitchen simulation environment (built on robosuite)
Robot: Panda manipulator with mobile base (PandaOmron), equipped with gripper
Observation: Multi-view RGB images (robot view + wrist camera) + proprioceptive state
Action Space: 12-dimensional continuous actions
3D arm position delta
3D arm rotation delta
1D gripper control (open/close)
4D base control
1D mode selection (control base or arm)
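The 12-dimensional action can be illustrated by concatenating its five components. This is a minimal sketch: the helper name `build_action` is hypothetical, and the ordering of components inside the vector is assumed to follow the order of the list above, which the environment itself may not use.

```python
from typing import List, Sequence

ACTION_DIM = 12


def build_action(
    arm_pos_delta: Sequence[float],
    arm_rot_delta: Sequence[float],
    gripper: float,
    base: Sequence[float],
    mode: float,
) -> List[float]:
    """Concatenate the action components into one flat 12-D vector.

    ASSUMPTION: components are laid out in the order they are listed
    in the text (arm position, arm rotation, gripper, base, mode).
    """
    parts = [
        ("arm_pos_delta", list(arm_pos_delta), 3),
        ("arm_rot_delta", list(arm_rot_delta), 3),
        ("gripper", [gripper], 1),
        ("base", list(base), 4),
        ("mode", [mode], 1),
    ]
    action: List[float] = []
    for name, values, expected in parts:
        if len(values) != expected:
            raise ValueError(f"{name} needs {expected} values, got {len(values)}")
        action.extend(float(v) for v in values)
    assert len(action) == ACTION_DIM
    return action


# Example: small arm translation along x, base untouched.
example = build_action([0.01, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0, [0.0] * 4, 0.0)
```

Validating each component's length before concatenation catches shape mistakes early, instead of letting a mis-sized vector reach the simulator.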
Task Categories
RoboCasa Kitchen provides 24 atomic tasks across four categories (the NavigateKitchen atomic task, which requires base movement, is excluded):
Door Manipulation Tasks:
OpenSingleDoor: Open cabinet or microwave door
CloseSingleDoor: Close cabinet or microwave door
OpenDoubleDoor: Open double cabinet doors
CloseDoubleDoor: Close double cabinet doors
OpenDrawer: Open drawer
CloseDrawer: Close drawer
Pick and Place Tasks:
PnPCounterToCab: Pick from counter and place into cabinet
PnPCabToCounter: Pick from cabinet and place on counter
PnPCounterToSink: Pick from counter and place in sink
PnPSinkToCounter: Pick from sink and place on counter
PnPCounterToStove: Pick from counter and place on stove
PnPStoveToCounter: Pick from stove and place on counter
PnPCounterToMicrowave: Pick from counter and place in microwave
PnPMicrowaveToCounter: Pick from microwave and place on counter
Appliance Control Tasks:
TurnOnMicrowave: Turn on microwave
TurnOffMicrowave: Turn off microwave
TurnOnSinkFaucet: Turn on sink faucet
TurnOffSinkFaucet: Turn off sink faucet
TurnSinkSpout: Turn sink spout
TurnOnStove: Turn on stove
TurnOffStove: Turn off stove
Coffee Making Tasks:
CoffeeSetupMug: Set up coffee mug
CoffeeServeMug: Serve coffee into mug
CoffeePressButton: Press coffee machine button
Observation Structure
Base Camera Image (base_image): Robot left view (224×224 RGB)
Wrist Camera Image (wrist_image): End-effector view camera (224×224 RGB)
Extra View Image (extra_view_image): Robot right view (224×224 RGB, not included by default)
Proprioceptive State (state): 25-dimensional vector containing:
[0:3] End-effector position (x, y, z)
[3:7] End-effector quaternion (w, x, y, z)
[7:9] Gripper joint positions
[9:11] Gripper joint velocities
[11:14] End-effector position relative to base (x, y, z)
[14:18] End-effector quaternion relative to base (w, x, y, z)
[18:21] Base position (x, y, z)
[21:25] Base quaternion (w, x, y, z)
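The index ranges of the 25-dimensional state can be written down as a lookup table, which makes it easy to pull out named fields. The sketch below uses plain Python lists. The field names in `STATE_LAYOUT` and the helper `split_state` are hypothetical labels chosen for illustration, while the index ranges are the ones given above.

```python
from typing import Dict, List, Sequence, Tuple

STATE_DIM = 25

# Index ranges follow the documented layout; the keys are illustrative names.
STATE_LAYOUT: Dict[str, Tuple[int, int]] = {
    "eef_pos": (0, 3),
    "eef_quat": (3, 7),            # (w, x, y, z)
    "gripper_qpos": (7, 9),
    "gripper_qvel": (9, 11),
    "eef_pos_rel_base": (11, 14),
    "eef_quat_rel_base": (14, 18),  # (w, x, y, z)
    "base_pos": (18, 21),
    "base_quat": (21, 25),          # (w, x, y, z)
}


def split_state(state: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat 25-D proprioceptive state into named segments."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    return {name: list(state[lo:hi]) for name, (lo, hi) in STATE_LAYOUT.items()}


# Example with a dummy state whose entries equal their own index.
fields = split_state(list(range(STATE_DIM)))
```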
Data Structure
Images: Left camera RGB tensor [batch_size, 3, 224, 224] and wrist camera RGB tensor [batch_size, 3, 224, 224]. A right camera RGB tensor [batch_size, 3, 224, 224] can also be included.
State: Proprioceptive state tensor [batch_size, 25]
Task Description: Natural language instructions
Actions: 12-dimensional continuous actions
Reward: Sparse reward based on task completion
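As a quick sanity check, the tensor shapes described above can be collected in one place and compared against what a data loader actually produces. This is only a sketch: the helper names and the `actions` key are assumptions made for illustration, and the image and state keys reuse the observation names from the previous section.

```python
from typing import Dict, List, Tuple

IMAGE_SHAPE = (3, 224, 224)
STATE_DIM = 25
ACTION_DIM = 12

Shape = Tuple[int, ...]


def expected_batch_shapes(batch_size: int, include_extra_view: bool = False) -> Dict[str, Shape]:
    """Return the shapes described in the text for a given batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shapes: Dict[str, Shape] = {
        "base_image": (batch_size, *IMAGE_SHAPE),
        "wrist_image": (batch_size, *IMAGE_SHAPE),
        "state": (batch_size, STATE_DIM),
        "actions": (batch_size, ACTION_DIM),  # hypothetical key name
    }
    if include_extra_view:
        shapes["extra_view_image"] = (batch_size, *IMAGE_SHAPE)
    return shapes


def find_shape_mismatches(found: Dict[str, Shape], batch_size: int,
                          include_extra_view: bool = False) -> List[str]:
    """List keys whose shape is missing or differs from the expected one."""
    expected = expected_batch_shapes(batch_size, include_extra_view)
    return [key for key, shape in expected.items() if found.get(key) != shape]
```

With real tensors, `found` could be built from each tensor's shape converted to a tuple.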
Algorithm
Core Algorithm Components
PPO (Proximal Policy Optimization)
Advantage estimation using GAE (Generalized Advantage Estimation)
Policy clipping with ratio limits
Value function clipping
Entropy regularization
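Two of the PPO components above, GAE and the clipped policy objective, can be sketched in plain Python. This is an illustration rather than RLinf's implementation: the function names and the default values `gamma=0.99`, `lam=0.95`, and `clip_eps=0.2` are assumptions, and value clipping and the entropy bonus are left out for brevity.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed default
    lam: float = 0.95,    # assumed default
) -> List[float]:
    """Generalized Advantage Estimation over one trajectory.

    values[t] is V(s_t); last_value is V(s_T), used for bootstrapping.
    dones[t] is True if the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, with bootstrapping cut at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_policy_loss(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate loss for a single sample (to be minimized)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return -min(ratio * advantage, clipped_ratio * advantage)
```

With a sparse reward that arrives only at task completion, GAE is what spreads credit from the final step back to earlier actions.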
GRPO (Group Relative Policy Optimization)
For every state / prompt, the policy generates G independent actions.
The advantage of each action is its reward minus the group’s mean reward.
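The group-relative advantage described above amounts to centering each reward on its group's mean. The sketch below follows that description directly. The function name is hypothetical, and the optional `normalize_std` flag reflects a common GRPO variant that the text does not mention, so it is off by default.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float],
    normalize_std: bool = False,  # ASSUMPTION: optional variant, not stated in the text
    eps: float = 1e-6,
) -> List[float]:
    """Advantage of each of the G samples relative to its own group."""
    if not rewards:
        raise ValueError("rewards must contain at least one sample")
    mean = fmean(rewards)
    advantages = [r - mean for r in rewards]
    if normalize_std:
        std = pstdev(rewards)
        advantages = [a / (std + eps) for a in advantages]
    return advantages


# Example: G = 4 rollouts of the same task, two of which succeeded.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group mean, no learned value function is needed. A group in which every rollout gets the same reward yields all-zero advantages and therefore no learning signal.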
Dependency Installation
1. Clone RLinf Repository
# For mainland China users, you can use the following for better download speed:
# git clone https://ghfast.top/github.com/RLinf/RLinf.git
git clone https://github.com/RLinf/RLinf.git
cd RLinf
2. Install Dependencies
Option 1: Docker Image
Use the Docker image to run the experiment.
docker run -it --rm --gpus all \
--shm-size 20g \
--network host \
--name rlinf \
-v .:/workspace/RLinf \
rlinf/rlinf:agentic-rlinf0.2-robocasa
# For mainland China users, you can use the following for better download speed:
# docker.1ms.run/rlinf/rlinf:agentic-rlinf0.2-robocasa
Option 2: Custom Environment
Install dependencies directly in your environment by running the following command:
# For mainland China users, you can add the `--use-mirror` flag to the install.sh command for better download speed.
bash requirements/install.sh embodied --model openpi --env robocasa
source .venv/bin/activate
Dataset Download
python -m robocasa.scripts.download_kitchen_assets # Caution: Assets to be downloaded are around 5GB
Model Download
# Download the model (choose either method)
# Method 1: Using git clone
git lfs install
git clone https://huggingface.co/RLinf/RLinf-Pi0-RoboCasa
# Method 2: Using huggingface-hub
# For mainland China users, you can use the following for better download speed:
# export HF_ENDPOINT=https://hf-mirror.com
pip install huggingface-hub
hf download RLinf/RLinf-Pi0-RoboCasa --local-dir RLinf-Pi0-RoboCasa