Tutorials
This section offers an in-depth exploration of RLinf. It provides a collection of hands-on tutorials covering all the core components and features of the library. Below, we first give an overview of the RLinf execution flow to help users understand how RLinf executes an RL training job.
RLinf Execution Overview
The following figure gives an overview of the RLinf execution flow: the main code flow (left), the main process corresponding to that code flow (middle), and the concepts of Worker, WorkerGroup, and Channel (right).
Code Flow Overview. Let's first look at the main code flow shown on the left of the figure. The run.sh script runs main_grpo.py, which serves as the entry point. In main_grpo.py, the main function first determines the placement of Workers (e.g., actor, rollout) based on the YAML configuration (i.e., cluster/component_placement). Each Worker can be flexibly assigned to any number of GPUs (or other types of accelerators) through this configuration. After determining the placement, the script launches WorkerGroups, each consisting of one or more Worker processes of the same type. These WorkerGroups are then passed to the Runner, whose run() function encapsulates the main RL training workflow.

Main Process. Here, the Worker placement defined in the YAML placement configuration is translated into our Worker Placement Strategy, which dictates on which node and which GPU each worker process should run. Based on this strategy, worker processes are launched in the cluster via the Worker's launch() API. The launch API returns a handle, termed a WorkerGroup, that collectively manages all remote processes of the same Worker class (e.g., RolloutWorker). Through this WorkerGroup handle, you can command all of the group's processes to simultaneously execute any public function of the Worker class. The Runner then obtains these handles and orchestrates the execution of Worker processes remotely. However, communication between Workers does not go through the main Runner process. Instead, the Runner establishes communication Channels (Channel.Create()) for inter-worker data exchange. For example, in a typical Runner iteration, it first calls RolloutGroup's rollout to execute the rollout function on all RolloutWorker processes, and then similarly executes ActorGroup's train function. For each call, it passes in the created Channels so that the Workers can communicate with each other.

Key Concepts and Features. The right side of the figure highlights three core features of RLinf: (i) flexible Worker placement: a WorkerGroup can be elastically placed on any node or GPU; (ii) an easy-to-use communication interface: users can send or receive data by referencing only the WorkerGroup's name; and (iii) distributed data Channels: Workers can easily exchange data using channel.put and channel.get.
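As a rough illustration of how a placement setting can be turned into one slot per worker process, here is a minimal sketch. It assumes a hypothetical schema in which each component name maps to a GPU range string such as "0-3", and a cluster with a fixed number of GPUs per node. COMPONENT_PLACEMENT, GPUS_PER_NODE, Slot, parse_gpu_ids, and plan_placement are all illustrative names; they do not reflect RLinf's actual cluster/component_placement format or its Worker Placement Strategy implementation.

```python
from dataclasses import dataclass

# Hypothetical placement config; NOT RLinf's actual YAML schema.
COMPONENT_PLACEMENT = {
    "actor": "0-3",
    "rollout": "4-7",
}
GPUS_PER_NODE = 4  # assumed cluster shape


@dataclass(frozen=True)
class Slot:
    """Where a single worker process would run."""
    rank: int       # rank of the process within its WorkerGroup
    node: int       # node index in the cluster
    local_gpu: int  # GPU index on that node


def parse_gpu_ids(spec: str) -> list[int]:
    """Expand a spec such as '0-3' or '0,2,5-6' into global GPU ids."""
    ids: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(part))
    return ids


def plan_placement(placement: dict[str, str], gpus_per_node: int) -> dict[str, list[Slot]]:
    """Map each component to one worker process per assigned GPU.

    Overlapping GPU sets are not rejected, so components may share GPUs.
    """
    plan: dict[str, list[Slot]] = {}
    for component, spec in placement.items():
        plan[component] = [
            Slot(rank=r, node=g // gpus_per_node, local_gpu=g % gpus_per_node)
            for r, g in enumerate(parse_gpu_ids(spec))
        ]
    return plan


if __name__ == "__main__":
    for component, slots in plan_placement(COMPONENT_PLACEMENT, GPUS_PER_NODE).items():
        print(component, [(s.node, s.local_gpu) for s in slots])
```

With four GPUs per node, the actor processes land on node 0 and the rollout processes on node 1; changing only the config strings moves the components, which is the kind of flexibility the placement configuration is meant to provide.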
RLinf adopts a modular design that abstracts away distributed complexity through the Worker, WorkerGroup, and Channel abstractions. This design enables users to build large-scale RL training pipelines with minimal distributed programming effort, especially for embodied intelligence and agent-based systems.
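To make the interplay of these abstractions concrete, the following single-process sketch imitates the flow described above: a Runner calls rollout on one group and then train on another, while the data itself travels through a Channel rather than through the Runner. Threads stand in for remote worker processes and queue.Queue stands in for a distributed Channel. Every class and method signature here (Channel, Worker.launch, WorkerGroup, RolloutWorker.rollout, ActorWorker.train, Runner.run) is a hypothetical illustration, not RLinf's actual API.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class Channel:
    """Toy data channel: a thread-safe FIFO exposing put/get."""

    def __init__(self, name: str):
        self.name = name
        self._q: queue.Queue = queue.Queue()

    def put(self, item) -> None:
        self._q.put(item)

    def get(self, timeout: float = 5.0):
        # Blocks until an item is available (raises queue.Empty on timeout).
        return self._q.get(timeout=timeout)


class Worker:
    """Base class; each instance stands in for one worker process."""

    def __init__(self, rank: int):
        self.rank = rank

    @classmethod
    def launch(cls, num_workers: int) -> "WorkerGroup":
        # Hypothetical analogue of launching processes and returning a handle.
        return WorkerGroup(cls, num_workers)


class WorkerGroup:
    """Handle that runs a public Worker method on every member concurrently."""

    def __init__(self, worker_cls, num_workers: int):
        self.workers = [worker_cls(rank) for rank in range(num_workers)]
        self._pool = ThreadPoolExecutor(max_workers=num_workers)

    def __getattr__(self, name: str):
        if name.startswith("_"):
            raise AttributeError(name)  # only public methods are forwarded

        def call_all(*args, **kwargs):
            futures = [self._pool.submit(getattr(w, name), *args, **kwargs)
                       for w in self.workers]
            return [f.result() for f in futures]  # one result per worker

        return call_all


class RolloutWorker(Worker):
    def rollout(self, num_samples: int, output: Channel) -> int:
        for i in range(num_samples):
            output.put(f"traj-rank{self.rank}-{i}")  # placeholder trajectory
        return num_samples


class ActorWorker(Worker):
    def train(self, num_items: int, data: Channel) -> int:
        batch = [data.get() for _ in range(num_items)]
        return len(batch)  # a real actor would compute a loss and update weights


class Runner:
    """Orchestrates groups; data flows through the Channel, not the Runner."""

    def __init__(self, rollout_group: WorkerGroup, actor_group: WorkerGroup):
        self.rollout_group = rollout_group
        self.actor_group = actor_group
        self.channel = Channel("rollout_to_actor")

    def run(self, num_steps: int, samples_per_worker: int) -> list[list[int]]:
        history = []
        for _ in range(num_steps):
            self.rollout_group.rollout(samples_per_worker, self.channel)
            total = samples_per_worker * len(self.rollout_group.workers)
            # Assumes the total divides evenly across actor workers.
            per_actor = total // len(self.actor_group.workers)
            history.append(self.actor_group.train(per_actor, self.channel))
        return history


if __name__ == "__main__":
    runner = Runner(RolloutWorker.launch(2), ActorWorker.launch(2))
    print(runner.run(num_steps=2, samples_per_worker=3))  # [[3, 3], [3, 3]]
```

The design point the sketch tries to mirror is that the Runner only issues group-wide calls and hands out the Channel; it never touches the trajectories themselves, so in a real distributed setting the data path between workers would not bottleneck on the driver process.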