RLinf Documentation

Welcome to RLinf!

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The ‘inf’ in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

RLinf is unique with:

Macro-to-Micro Flow (M2Flow): a new paradigm that executes macro-level logical flows through micro-level execution flows, decoupling the construction of logical workflows (programmability) from physical communication and scheduling (efficiency).
Flexible Execution Modes
- Collocated mode: shares all GPUs across all workers.
- Disaggregated mode: enables fine-grained pipelining.
- Hybrid mode: a customizable combination of different placement modes, integrating both collocated and disaggregated modes.
Auto Scheduling
- Dynamic Scheduling: dynamically schedules resources to maximize utilization.
- Static Scheduling: automatically selects the most suitable execution mode for the training workload, with no manual resource allocation required.
Embodied Agent Support
- Fast adaptation support for mainstream VLA models: OpenVLA, OpenVLA-OFT, π₀, GR00T-N1.5
- Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: ManiSkill3, LIBERO, IsaacLab
- Enables the first RL fine-tuning of the π₀ model family with a flow-matching action expert.
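
To make the three execution modes listed above more concrete, here is a minimal sketch of how workers might be mapped to GPUs in each mode. It is plain Python and does not use RLinf's actual API. The function names, the worker names (`actor`, `rollout`, `inference`), and the assumption that disaggregated workers receive disjoint GPU ranges are all hypothetical, chosen for illustration only.

```python
from typing import Dict, List


def collocated(workers: List[str], num_gpus: int) -> Dict[str, List[int]]:
    """Collocated mode: every worker shares the full set of GPUs."""
    return {worker: list(range(num_gpus)) for worker in workers}


def disaggregated(gpus_per_worker: Dict[str, int],
                  first_gpu: int = 0) -> Dict[str, List[int]]:
    """Disaggregated mode (assumption): each worker owns a disjoint GPU range."""
    placement: Dict[str, List[int]] = {}
    next_gpu = first_gpu
    for worker, count in gpus_per_worker.items():
        placement[worker] = list(range(next_gpu, next_gpu + count))
        next_gpu += count
    return placement


def hybrid(shared_workers: List[str], num_shared_gpus: int,
           dedicated: Dict[str, int]) -> Dict[str, List[int]]:
    """Hybrid mode: some workers share a GPU group, the rest get dedicated GPUs."""
    placement = collocated(shared_workers, num_shared_gpus)
    # Dedicated workers start right after the shared GPU group.
    placement.update(disaggregated(dedicated, first_gpu=num_shared_gpus))
    return placement


# Hypothetical 8-GPU layout: two workers share GPUs 0-3, one worker owns GPUs 4-7.
layout = hybrid(["rollout", "inference"], 4, {"actor": 4})
print(layout)
```

In this sketch, workers with disjoint GPU sets can run concurrently, which is what makes pipelining between them possible, while collocated workers must take turns on the same devices.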

RLinf is fast with:

Hybrid mode with fine-grained pipelining: achieves a 120%+ throughput improvement compared to other frameworks.
Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.

RLinf is flexible and easy to use with:

Multiple Backend Integrations
- FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
- Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
Adaptive communication via the asynchronous communication channel
Built-in support for popular RL methods, including PPO, GRPO, DAPO, Reinforce++, and more.
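
As a rough illustration of one of the methods named above, the sketch below computes GRPO-style group-relative advantages: each sampled response's reward is normalized against the mean and standard deviation of its own group. This is a standalone example using only the Python standard library, not RLinf's implementation. The function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions, and real implementations differ on these details.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one response group to zero mean and unit scale.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above their group's average receive positive advantages and the rest receive negative ones, so no separate value model is needed to form a baseline.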

Tutorials

Example Gallery

Blog

Publications

APIs

FAQs