Why RLinf

RLinf is built to post-train foundation models with reinforcement learning at scale. This page collects the design ideas, performance numbers, and SOTA recipes behind the framework — read it when you want the why rather than the how.

What Makes RLinf Unique

  • Macro-to-Micro Flow (M2Flow): a new paradigm that executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmable) from physical communication and scheduling (efficient).

  • Flexible execution modes:

    • Collocated — share all GPUs across all workers.

    • Disaggregated — enable fine-grained pipelining.

    • Hybrid — a customizable combination of collocated and disaggregated modes.

  • Auto scheduling:

    • Dynamic — schedule resource allocation on the fly to maximize utilization.

    • Static — automatically pick the best execution mode for the workload, with no manual resource allocation.

  • Embodied agent support:
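The collocated, disaggregated, and hybrid execution modes listed above differ mainly in how workers are mapped onto GPUs. The following is a minimal sketch of that idea, not RLinf's API: the `place` helper, the component names (`actor`, `rollout`, `inference`), and the 8-GPU node are all hypothetical, and it assumes that disaggregated components receive disjoint GPU ranges.

```python
def place(groups, num_gpus):
    """Map component names to GPU ids.

    `groups` is a list of (component_names, gpu_count) pairs. Components in
    the same group share GPUs; different groups get disjoint, contiguous
    ranges. This helper is illustrative only and is not part of RLinf.
    """
    placement = {}
    next_gpu = 0
    for names, count in groups:
        gpus = list(range(next_gpu, next_gpu + count))
        next_gpu += count
        for name in names:
            placement[name] = gpus
    if next_gpu > num_gpus:
        raise ValueError(f"requested {next_gpu} GPUs, only {num_gpus} available")
    return placement


# Collocated: every component shares all GPUs.
collocated = place([(("actor", "rollout", "inference"), 8)], num_gpus=8)

# Disaggregated: each component owns its own GPUs, so stages can be pipelined.
disaggregated = place(
    [(("rollout",), 4), (("inference",), 2), (("actor",), 2)], num_gpus=8
)

# Hybrid: some components share GPUs while others are kept separate.
hybrid = place([(("rollout", "inference"), 6), (("actor",), 2)], num_gpus=8)
```

In this sketch a single grouping description covers all three modes, which mirrors the framing of hybrid as a combination of the other two.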

Performance

  • Hybrid mode with fine-grained pipelining achieves a 120%+ throughput improvement over comparable frameworks.

  • Automatic online scaling adjusts training resources dynamically, completing GPU switching within seconds and improving efficiency by a further 20–40% while preserving the on-policy nature of RL algorithms.

Flexible and Easy to Use

  • Multiple backends behind one interface — switch without code changes:

    • FSDP + Hugging Face — rapid adaptation to new models and algorithms; ideal for beginners and fast prototyping.

    • Megatron + SGLang — optimized for large-scale training and maximum efficiency for demanding workloads.

  • Adaptive communication via the asynchronous communication channel.

  • Built-in RL methods, including PPO, GRPO, DAPO, Reinforce++, and more.
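As one concrete illustration of the built-in methods named above, GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step. It is not RLinf's implementation: the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation and eps are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive positive advantages and the rest receive negative ones, so the advantages in a group sum to approximately zero.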

SOTA RL Training Reproduction

RLinf provides end-to-end recipes that reproduce or match state-of-the-art (SOTA) RL results out of the box — run our configs and scripts directly to obtain published numbers without custom engineering.

  • Embodied tasks: RLinf reaches or matches SOTA success rates on benchmarks such as LIBERO, ManiSkill, and RoboTwin with OpenVLA, OpenVLA-OFT, π₀/π₀.₅, GR00T, and other VLAs. See the Example Gallery and the Reference algorithm specs.

  • Agentic tasks (including math reasoning): RLinf achieves SOTA performance on AIME24 / AIME25 / GPQA-diamond with DeepSeek-R1-Distill-Qwen models, and supports single- and multi-agent tasks such as Search-R1 and Coding-Online-RL. See Agentic Scenarios.