RLinf

Example Gallery

This section collects the examples RLinf currently supports, showing how the framework applies across different scenarios and how efficiently it runs in practice. The gallery is continuously expanding to cover new scenarios and tasks.

Embodied intelligence is RLinf’s primary focus. The embodied gallery is split into five entry points — pick the one that matches your starting question:

Simulators

Start from a benchmark — LIBERO, ManiSkill, RoboTwin, IsaacLab, and more.

RL with Embodied Simulators

Robots

Run on physical robot hardware — the Franka family plus GimArm, XSquare Turtle2, and DOS-W1.

RL with Real-World Robots

Models

RL-fine-tune a specific model family — π₀, GR00T, Lingbot-VLA, OpenSora, Wan, and more.

RL on Embodied Models

SFT

Supervised fine-tuning recipes that produce strong RL cold-start checkpoints.

SFT for VLA / WAM Models

Algorithms

Algorithm-centric examples — DAgger, RECAP, DSRL, IQL offline RL, sim-real co-training, MLP / SAC-Flow.

Algorithms for Embodiment

Beyond embodiment:

Agents

Math reasoning and agentic AI workflows, in both single-agent and multi-agent settings.

Agentic Scenarios

Systems

Flexible, dynamic scheduling of compute resources across the most suitable hardware devices.

System-level Optimizations


© Copyright 2025 RLinf Project.
