RL on Embodied Models#

This category groups examples in which the model or policy class is the headline. They show how to onboard a specific model family in RLinf — checkpoint loading, processor / config wiring, action head, lightweight MLP policies, and a reference RL fine-tuning recipe — independent of any single benchmark.

If you are starting from “I want to train or RL-fine-tune model X”, this is the right entry point. For benchmark-driven examples see RL with Embodied Simulators.

RL on MLP Policy
Train a lightweight MLP policy with PPO, SAC, or GRPO across simulation environments

RL on π₀ and π₀.₅ Models
Significant improvement in RL training on π₀ and π₀.₅

RL on GR00T Models
Support GR00T-N1.5, N1.6 and N1.7 RL fine-tuning.

RL with Lingbot-VLA Model
Support Lingbot-VLA + RoboTwin + GRPO training

RL on Dexbotic Model
Dexbotic (π₀.₅-based) + LIBERO + PPO training

RL on StarVLA Models
StarVLA + LIBERO + GRPO embodied RL training

RL on ABot-M0 Model
ABot-M0 native integration with LIBERO-plus PPO training

RL with OpenSora World Model
Support OpenSora World Model + OpenVLA-OFT + GRPO training

RL with Wan World Model
Support Wan World Model + OpenVLA-OFT + GRPO training