Disaggregated Mode
Different RL tasks are mapped to different GPU groups according to their computation needs. There are also two execution modes: the workers either run sequentially, one after another, or run concurrently with fine-grained pipelining.
Pros
Flexible worker assignment.
No requirement for offloading implementation.
Cons
Data-flow dependencies between workers leave GPUs idle.
Pipelining should be implemented to reduce GPU idle time.
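To make the idle-time trade-off concrete, the following minimal sketch compares the two execution modes on a toy workload. It is not taken from the framework: the stage durations, the number of micro-batches, and the function names are all hypothetical, and each stage is assumed to own its GPU group and to process one micro-batch at a time.

```python
from typing import List


def sequential_makespan(stage_times: List[float], num_batches: int) -> float:
    """Each stage processes all micro-batches before the next stage starts."""
    return num_batches * sum(stage_times)


def pipelined_makespan(stage_times: List[float], num_batches: int) -> float:
    """Stages overlap: a stage starts micro-batch b as soon as the previous
    stage has finished b and the stage itself has finished b - 1."""
    # finish[b] holds the finish time of micro-batch b on the previous stage.
    finish = [0.0] * num_batches
    for t in stage_times:
        prev_on_this_stage = 0.0
        for b in range(num_batches):
            start = max(finish[b], prev_on_this_stage)
            prev_on_this_stage = start + t
            finish[b] = prev_on_this_stage
    return finish[-1]


# Hypothetical per-micro-batch durations for rollout, inference, and actor.
stage_times = [4.0, 1.0, 2.0]
num_batches = 4

seq = sequential_makespan(stage_times, num_batches)
pipe = pipelined_makespan(stage_times, num_batches)
print(f"sequential: {seq}, pipelined: {pipe}")
```

In this toy setting the pipelined schedule is bounded by the slowest stage, so sizing GPU groups so that stage times are balanced matters as much as enabling pipelining.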
Example configuration
The workers are assigned to separate GPUs. The set of GPUs is specified using global GPU indices.
cluster:
  num_nodes: 2
  component_placement:
    rollout: 0-9
    inference: 10-11
    actor: 12-15
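As a rough illustration of how global GPU indices in this configuration could be interpreted, the sketch below expands each range and checks that the groups do not overlap, which is what distinguishes disaggregated placement from collocated placement. It is not the ComponentPlacement implementation: the helper names are hypothetical, only single `start-end` ranges are handled, and 8 GPUs per node is an assumption chosen so that 2 nodes cover indices 0 to 15.

```python
from typing import Dict, List


def parse_gpu_range(spec: str) -> List[int]:
    """Expand a spec such as '0-9' or '3' into a list of global GPU indices."""
    spec = str(spec).strip()
    if "-" in spec:
        start, end = (int(part) for part in spec.split("-", 1))
    else:
        start = end = int(spec)
    if start > end:
        raise ValueError(f"invalid GPU range: {spec!r}")
    return list(range(start, end + 1))


def resolve_disaggregated(placement: Dict[str, str], total_gpus: int) -> Dict[str, List[int]]:
    """Resolve every component's GPUs and reject out-of-range or shared GPUs."""
    resolved = {name: parse_gpu_range(spec) for name, spec in placement.items()}
    owner: Dict[int, str] = {}
    for name, gpus in resolved.items():
        for gpu in gpus:
            if not 0 <= gpu < total_gpus:
                raise ValueError(f"{name}: GPU {gpu} is outside 0..{total_gpus - 1}")
            if gpu in owner:
                raise ValueError(f"GPU {gpu} is assigned to both {owner[gpu]} and {name}")
            owner[gpu] = name
    return resolved


GPUS_PER_NODE = 8  # assumption, not stated in the configuration
NUM_NODES = 2      # matches cluster.num_nodes above

placement = {"rollout": "0-9", "inference": "10-11", "actor": "12-15"}
resolved = resolve_disaggregated(placement, total_gpus=NUM_NODES * GPUS_PER_NODE)

for name, gpus in resolved.items():
    nodes = sorted({gpu // GPUS_PER_NODE for gpu in gpus})
    print(f"{name}: {len(gpus)} GPUs on node(s) {nodes}")
```

Under the 8-GPUs-per-node assumption, the rollout group spans both nodes, while inference and actor sit entirely on the second node.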
Currently, whether execution is pipelined is determined by the underlying code implementation; a configuration option for it is not yet exposed. If the underlying implementation supports pipelining, the disaggregated mode uses it by default.
ComponentPlacement programming
As described in Collocated Mode, the placement configuration in the YAML file can be parsed by ComponentPlacement and enforced on the workers. Refer to Math RL training with pipelining for the complete code.