Flexible Execution Modes
Conventional RL post-training systems are typically classified—based on their GPU placement strategy—into two primary modes: Collocated Mode and Disaggregated Mode.
In the collocated mode, all major components (e.g., generator, actor inference, and actor training) share the same set of GPUs or nodes. In contrast, the disaggregated mode assigns these components to separate GPUs or nodes.
However, neither mode is well-suited for complex RL workloads such as embodied intelligence, which involve more components (e.g., simulators) and more intricate communication patterns—for instance, the fine-grained interactions between the simulator and the generator.
To better accommodate diverse and dynamic RL workloads, RLinf supports flexible component placement and execution modes, enabling users to orchestrate components in a highly adaptable way. In particular, each component can be placed on any set of GPUs, with a configurable execution strategy:
Collocated on the same GPUs: Users may configure whether both components remain resident in GPU memory, or whether they switch usage of the GPUs via an offloading/reloading mechanism.
Distributed on separate GPUs: Components may either run sequentially, which can leave GPUs idle while one component waits for another, or execute in a pipelined fashion, so that work on different components overlaps and GPU idle time is reduced.
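The difference between sequential and pipelined execution on separate GPUs can be illustrated with a small timing model. This is a minimal sketch and not RLinf code: the function names and the per-stage times are hypothetical, and it assumes each component (stage) runs on its own GPU group, processes batches in order, and takes a fixed time per batch.

```python
def sequential_makespan(stage_times, num_batches):
    """Total time when each batch finishes every stage before the next batch starts."""
    return num_batches * sum(stage_times)


def pipelined_makespan(stage_times, num_batches):
    """Total time when stages on separate GPUs overlap across batches.

    A stage starts a batch as soon as (a) the previous stage has produced it
    and (b) the stage itself has finished the previous batch.
    """
    finish = [0.0] * len(stage_times)  # time at which each stage last became free
    for _ in range(num_batches):
        ready = 0.0  # time at which the current batch leaves the previous stage
        for stage, duration in enumerate(stage_times):
            start = max(ready, finish[stage])
            finish[stage] = start + duration
            ready = finish[stage]
    return finish[-1]


def utilization(stage_times, num_batches, makespan):
    """Fraction of total GPU-group time spent doing work (one group per stage)."""
    busy = num_batches * sum(stage_times)
    return busy / (len(stage_times) * makespan)


# Hypothetical example: generation takes 2 time units per batch, training takes 3.
stage_times = [2.0, 3.0]
seq = sequential_makespan(stage_times, num_batches=4)
pipe = pipelined_makespan(stage_times, num_batches=4)
print(seq, utilization(stage_times, 4, seq))
print(pipe, utilization(stage_times, 4, pipe))
```

Under these assumptions, pipelining shortens the run and raises utilization, but utilization stays below 1 when the stages are unbalanced, because the faster stage still waits on the slower one.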
Hybrid Mode further extends this flexibility by supporting customized combinations of placement and execution strategies.
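To make the three placement modes concrete, the sketch below classifies a placement from the GPU sets assigned to each component. It is illustrative only and does not reflect RLinf's configuration format or API: `classify_placement`, the component names, and the GPU ids are hypothetical. It also assumes that "hybrid" means any placement where some components share GPUs while others do not.

```python
from itertools import combinations


def classify_placement(placement):
    """Classify a mapping of component name -> iterable of GPU ids.

    Returns "collocated" if every component uses the same GPUs,
    "disaggregated" if no two components share a GPU,
    and "hybrid" for any mix of the two (an assumption of this sketch).
    """
    if not placement:
        raise ValueError("placement must contain at least one component")
    gpu_sets = [frozenset(gpus) for gpus in placement.values()]
    if all(s == gpu_sets[0] for s in gpu_sets):
        return "collocated"
    if all(a.isdisjoint(b) for a, b in combinations(gpu_sets, 2)):
        return "disaggregated"
    return "hybrid"


# Hypothetical embodied-RL layout: simulator and generator interact closely,
# so they share GPUs 0-3, while training runs on its own GPUs 4-7.
example = {
    "simulator": range(0, 4),
    "generator": range(0, 4),
    "actor_training": range(4, 8),
}
print(classify_placement(example))
```

In a layout like this, the tightly coupled simulator and generator avoid cross-GPU communication, while training can proceed on separate GPUs, possibly pipelined with generation.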