Why RLinf
RLinf is built to post-train foundation models with reinforcement learning at scale. This page collects the design ideas, performance numbers, and SOTA recipes behind the framework — read it when you want the why rather than the how.
What Makes RLinf Unique
Macro-to-Micro Flow (M2Flow): a new paradigm that executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmable) from physical communication and scheduling (efficient).
Flexible execution modes:
Collocated — share all GPUs across all workers.
Disaggregated — enable fine-grained pipelining.
Hybrid — a customizable combination of collocated and disaggregated modes.
Auto scheduling:
Dynamic — schedule resource allocation on the fly to maximize utilization.
Static — automatically pick the best execution mode for the workload, with no manual resource allocation.
Embodied agent support:
Fast adaptation for mainstream VLA models: OpenVLA, OpenVLA-OFT, π₀, GR00T-N1.5.
Mainstream CPU & GPU simulators via standardized RL interfaces: ManiSkill3, LIBERO, IsaacLab.
The first RL fine-tuning of the π₀ model family with a flow-matching action expert.
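The three execution modes listed above differ mainly in how workers map onto GPUs. The sketch below is a toy illustration of that mapping only, not RLinf's scheduler or API: the function `place_workers`, the helper `split_evenly`, the `shared_group` parameter, and the worker names (`env`, `rollout`, `actor`) are all hypothetical and chosen for the example.

```python
from typing import Dict, List, Sequence


def split_evenly(gpus: List[int], parts: int) -> List[List[int]]:
    """Split a GPU list into `parts` contiguous chunks, as evenly as possible."""
    base, extra = divmod(len(gpus), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(gpus[start:start + size])
        start += size
    return chunks


def place_workers(
    mode: str,
    num_gpus: int,
    workers: Sequence[str],
    shared_group: Sequence[str] = (),
) -> Dict[str, List[int]]:
    """Return a hypothetical worker -> GPU ids placement for one execution mode."""
    gpus = list(range(num_gpus))
    if mode == "collocated":
        # Every worker sees every GPU; workers take turns on the same devices.
        return {w: list(gpus) for w in workers}
    if mode == "disaggregated":
        # Each worker owns a disjoint slice, so stages can run as a pipeline.
        return dict(zip(workers, split_evenly(gpus, len(workers))))
    if mode == "hybrid":
        # Workers in `shared_group` share one slice; the rest get their own.
        if not shared_group:
            raise ValueError("hybrid mode needs a non-empty shared_group")
        solo = [w for w in workers if w not in shared_group]
        chunks = split_evenly(gpus, len(solo) + 1)
        placement = {w: list(chunks[0]) for w in shared_group}
        placement.update(zip(solo, chunks[1:]))
        return placement
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    names = ["env", "rollout", "actor"]
    print(place_workers("collocated", 4, names))
    print(place_workers("disaggregated", 6, names))
    print(place_workers("hybrid", 8, names, shared_group=("env", "rollout")))
```

In this toy model, collocated placement trades pipelining for full-device access per worker, disaggregated placement does the reverse, and hybrid placement lets tightly coupled workers share devices while the rest run in parallel on their own slices.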
Performance
Hybrid mode with fine-grained pipelining achieves a 120%+ throughput improvement over comparable frameworks.
Automatic online scaling adjusts training resources dynamically, completing GPU switching within seconds and improving efficiency by a further 20–40% while preserving the on-policy nature of RL algorithms.
Flexible and Easy to Use
Multiple backends behind one interface — switch without code changes:
FSDP + Hugging Face — rapid adaptation to new models and algorithms; ideal for beginners and fast prototyping.
Megatron + SGLang — optimized for large-scale training and maximum efficiency for demanding workloads.
Adaptive communication via the asynchronous communication channel.
Built-in RL methods, including PPO, GRPO, DAPO, Reinforce++, and more.
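Switching backends "without code changes" usually means the backend is chosen by configuration and hidden behind a common interface. The following is a minimal sketch of that general pattern, not RLinf's actual classes or config schema: `TrainBackend`, `FsdpBackend`, `MegatronBackend`, `BACKENDS`, `build_backend`, and the `"backend"` config key are hypothetical names invented for the illustration, and the `train_step` bodies are placeholders.

```python
from typing import Callable, Dict, List, Protocol


class TrainBackend(Protocol):
    """Hypothetical interface every training backend implements."""

    def train_step(self, batch: List[float]) -> str: ...


class FsdpBackend:
    def train_step(self, batch: List[float]) -> str:
        # Placeholder: a real backend would run a forward/backward pass here.
        return f"fsdp step on {len(batch)} samples"


class MegatronBackend:
    def train_step(self, batch: List[float]) -> str:
        # Placeholder for the large-scale backend.
        return f"megatron step on {len(batch)} samples"


# Registry mapping a config value to a backend factory.
BACKENDS: Dict[str, Callable[[], TrainBackend]] = {
    "fsdp": FsdpBackend,
    "megatron": MegatronBackend,
}


def build_backend(config: dict) -> TrainBackend:
    """Pick the backend named in the config; caller code never changes."""
    name = config["backend"]
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return BACKENDS[name]()


def run(config: dict, batch: List[float]) -> str:
    # The training loop depends only on the TrainBackend interface.
    return build_backend(config).train_step(batch)


if __name__ == "__main__":
    batch = [0.1, 0.2, 0.3]
    print(run({"backend": "fsdp"}, batch))
    print(run({"backend": "megatron"}, batch))
```

Only the config value differs between the two calls; `run` itself is untouched, which is the property the list above describes.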
SOTA RL Training Reproduction
RLinf provides end-to-end recipes that reproduce or match state-of-the-art (SOTA) RL results out of the box — run our configs and scripts directly to obtain published numbers without custom engineering.
Embodied tasks: RLinf reaches or matches SOTA success rates on benchmarks such as LIBERO, ManiSkill, and RoboTwin with OpenVLA, OpenVLA-OFT, π₀/π₀.₅, GR00T, and other VLAs. See the Example Gallery and the Reference algorithm specs.
Agentic tasks (including math reasoning): RLinf achieves SOTA performance on AIME24 / AIME25 / GPQA-diamond with DeepSeek-R1-Distill-Qwen models, and supports single- and multi-agent tasks such as Search-R1 and Coding-Online-RL. See Agentic Scenarios.