RLinf Documentation


Welcome to RLinf!

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The ‘inf’ in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.


[Figure: RLinf overview]

RLinf is unique with:

  • Macro-to-Micro Flow: a new paradigm, M2Flow, that executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (for programmability) from physical communication and scheduling (for efficiency).

  • Flexible Execution Modes

    • Collocated mode: shares all GPUs across all workers.

    • Disaggregated mode: enables fine-grained pipelining.

    • Hybrid mode: a customizable combination of different placement modes, integrating both collocated and disaggregated modes.

  • Auto Scheduling

    • Dynamic Scheduling: dynamically schedules resources to maximize utilization.

    • Static Scheduling: automatically selects the most suitable execution mode based on the training workload, with no manual resource allocation needed.

  • Embodied Agent Support

    • Fast adaptation support for mainstream VLA models: OpenVLA, OpenVLA-OFT, π₀, GR00T-N1.5

    • Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: ManiSkill3, LIBERO, IsaacLab

    • Enables the first RL fine-tuning of the π₀ model family with a flow-matching action expert.
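To make the three execution modes listed above more concrete, here is a minimal sketch of how workers might be mapped onto GPUs in each mode. It is illustrative only: the function names, the worker names (`rollout`, `inference`, `actor`), and the interpretation of disaggregated mode as disjoint GPU groups are assumptions made for this example and are not RLinf's actual API or configuration format.

```python
# Illustrative only: every name here is hypothetical, not part of RLinf's API.
from typing import Dict, List


def collocated(workers: List[str], num_gpus: int) -> Dict[str, List[int]]:
    """Collocated: every worker shares the full set of GPUs."""
    all_gpus = list(range(num_gpus))
    return {worker: list(all_gpus) for worker in workers}


def disaggregated(split: Dict[str, int]) -> Dict[str, List[int]]:
    """Disaggregated (assumed here): each worker gets its own disjoint GPU range,
    so different stages can run concurrently as a pipeline."""
    placement: Dict[str, List[int]] = {}
    start = 0
    for worker, count in split.items():
        placement[worker] = list(range(start, start + count))
        start += count
    return placement


def hybrid(shared: List[str], shared_gpus: int,
           dedicated: Dict[str, int]) -> Dict[str, List[int]]:
    """Hybrid: some workers share one GPU group, the rest get dedicated GPUs
    numbered after the shared group."""
    placement = collocated(shared, shared_gpus)
    for worker, gpus in disaggregated(dedicated).items():
        placement[worker] = [gpu + shared_gpus for gpu in gpus]
    return placement


if __name__ == "__main__":
    print(collocated(["rollout", "actor"], 4))
    print(disaggregated({"rollout": 2, "actor": 2}))
    print(hybrid(["rollout", "inference"], 2, {"actor": 2}))
```

In this sketch the hybrid placement simply composes the other two, which mirrors the description of hybrid mode as a combination of collocated and disaggregated placement.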

RLinf is fast with:

  • Hybrid mode with fine-grained pipelining: achieves a 120%+ throughput improvement compared to other frameworks.

  • Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.

RLinf is flexible and easy to use with:

  • Multiple Backend Integrations

    • FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.

    • Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.

  • Adaptive communication via the asynchronous communication channel

  • Built-in support for popular RL methods, including PPO, GRPO, DAPO, Reinforce++, and more.
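As a small illustration of one of the methods named above, the sketch below computes GRPO-style group-relative advantages: the rewards of several responses sampled for the same prompt are normalized by the group's mean and standard deviation. This is a standalone example of the general technique, not RLinf's implementation; the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions, and real implementations differ in these details.

```python
# Standalone sketch of group-relative advantage normalization (as used in GRPO).
# Not taken from RLinf; the function name and eps value are illustrative choices.
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when all rewards in the group are identical
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.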