Agentic Scenarios

RLinf’s worker abstraction, flexible communication modules, and support for various accelerators make it well suited to building agent workflows and training agents. The following examples cover math reasoning RL and agentic AI workflows, including agent workflow construction, online RL training, environment integration, and reasoning-centric agent training.

WideSeek-R1
Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Open-Source Online RL for Code Completion
End-to-end online RL with RLinf + Continue, improving model performance by 52%

Search-R1 RL Training
Train LLMs to answer questions by invoking search tools, with RLinf accelerating the training process by 55%

rStar2-agent RL Training
Use reinforcement learning to teach models to reason and reflect autonomously with Python tools, achieving frontier-level mathematical reasoning at extremely low computational cost

[Ongoing] SWE-agent
Unified deployment, inference, and training with high flexibility and performance

GRPO training for Math Reasoning
State-of-the-art RL training for math reasoning (AIME24/AIME25/GPQA-diamond) with Qwen-based models

PPO training for Math Reasoning
Math reasoning RL training using the PPO algorithm