Placement Interface
This section introduces the GPU and node placement strategies in RLinf.
Whether in collocated mode, disaggregated mode, or hybrid mode, ComponentPlacement is the user-facing interface for generating the placements of different component workers (e.g., actor, env, rollout, inference), while placement strategies are the underlying mechanisms for obtaining precise allocation of each node and each GPU resource.
The generated placement metadata is later used for remote launching with Ray.
Component Placement
The ComponentPlacement interface is responsible for parsing the cluster.component_placement field in the configuration file and generating precise placements for different component workers.
Notably, ComponentPlacement also supports configuration of heterogeneous clusters through the node_group field in cluster.node_groups.
The detailed explanation of the syntax can be found in the docs below.
- class rlinf.utils.placement.ComponentPlacement
Bases: object
Base component placement for parsing the cluster.component_placement config.
The component placement config is defined as either:
group_name1,group_name2,...: resource_ranks1:process_ranks1, resource_ranks2:process_ranks2,...
or:
group_name1,group_name2,...:
  node_group: <node_group_label>
  placement: "resource_ranks1":"process_ranks1", "resource_ranks2":"process_ranks2",...
A simple example is:
cluster:
  num_nodes: 1
  component_placement:
    actor,inference: 0-7
which means that processes 0-7 of both the actor and inference groups evenly occupy accelerators 0 to 7.
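As a minimal sketch of how a comma-separated key such as actor,inference can be read, the snippet below expands each key into one entry per component. The helper name expand_component_groups is hypothetical and not part of RLinf; it only illustrates that every group listed in a key shares the same placement value.

```python
def expand_component_groups(component_placement: dict[str, object]) -> dict[str, object]:
    """Map each individual component name to the placement value of its key.

    Hypothetical helper for illustration only; not an RLinf API.
    """
    expanded: dict[str, object] = {}
    for names, placement in component_placement.items():
        for name in names.split(","):
            name = name.strip()
            if name in expanded:
                raise ValueError(f"component {name!r} is placed more than once")
            expanded[name] = placement
    return expanded


# Both groups end up with the same placement string.
print(expand_component_groups({"actor,inference": "0-7"}))
# {'actor': '0-7', 'inference': '0-7'}
```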
A more complex example is:
cluster:
  num_nodes: 2
  component_placement:
    actor:
      node_group: a800
      placement: 0-8
    rollout:
      node_group: 4090
      placement: 0-8
    env:
      node_group: robot  # Assuming robot hardware type is defined in the node group config
      placement: 0-3:0-7
    agent:
      node_group: node
      placement: 0-1:0-200,2-3:201-511
which means:
The actor group occupies accelerators 0-8 on node group "a800".
The rollout group occupies accelerators 0-8 on node group "4090".
The env group occupies robot hardware 0-3 on node group "robot", with each robot hardware shared by 2 processes.
The agent group occupies nodes 0-1 for processes 0-200, and nodes 2-3 for processes 201-511.
The concrete specifications of the config format are as follows:
resource_ranks is the ranks of the resources (e.g., GPUs, robots, or nodes) to use for the component(s). If no hardware is specified in the config, resource ranks default to the accelerator ranks (counted from 0 within the node group if node_group is given). If the nodes do not have accelerators, resource ranks are the node ranks. If a hardware type is specified in the node group config, the resource ranks are the hardware ranks within the labeled node group, e.g., for nodes with robotic systems.
The format of resource_ranks is an integer range a-b, which means all ranks from a to b, inclusive. For example, 0-3 means ranks 0, 1, 2, and 3. Alternatively, "all" can be used to select all resources.
process_ranks is the ranks of the processes of the component(s) and follows the same format as resource_ranks. The processes are evenly assigned to the specified resource ranks. For example, 0-3:0-7 means processes 0-7 are evenly assigned to resource ranks 0-3, with 2 processes sharing each resource. If the number of processes is smaller than the number of resources, each process occupies multiple resources. If process_ranks is not specified, each process is assigned to one resource rank in order. For example, 0-4 means processes 0-4 are assigned to resource ranks 0-4, respectively.
Syntax mixing the two formats is also supported, e.g., 0-1:0-3,3-5,7-10:7-14, which means processes 0-3 are evenly assigned to resource ranks 0-1, processes 4-6 (implicitly inferred by the scheduler) are assigned to resource ranks 3-5, respectively, and processes 7-14 are evenly assigned to resource ranks 7-10. Note that, whether or not they are specified explicitly, the process ranks must be contiguous from 0 to N-1, where N is the total number of processes. Violating this rule raises an assertion error.
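The assignment rules above can be sketched in plain Python. This is an illustrative reading of the documented syntax, not RLinf's actual parser: the names parse_range and parse_placement are hypothetical, the "all" keyword is not handled, and accepting a single integer as a one-element range is an assumption.

```python
def parse_range(text: str) -> list[int]:
    """Expand an inclusive range such as "0-3" into [0, 1, 2, 3]."""
    start, _, end = text.strip().partition("-")
    first = int(start)
    last = int(end) if end else first  # assumption: "3" means the range 3-3
    if last < first:
        raise ValueError(f"invalid range: {text!r}")
    return list(range(first, last + 1))


def parse_placement(spec: str) -> dict[int, list[int]]:
    """Return {process_rank: [resource_ranks]} for a placement string."""
    mapping: dict[int, list[int]] = {}
    next_process = 0
    for segment in spec.split(","):
        resources_text, _, processes_text = segment.partition(":")
        resources = parse_range(resources_text)
        if processes_text:
            processes = parse_range(processes_text)
        else:
            # No process ranks given: one process per resource, in order.
            processes = list(range(next_process, next_process + len(resources)))
        n_res, n_proc = len(resources), len(processes)
        if max(n_res, n_proc) % min(n_res, n_proc):
            raise ValueError(f"cannot split evenly: {segment!r}")
        for i, process in enumerate(processes):
            if process in mapping:
                raise ValueError(f"process {process} is placed twice")
            if n_proc >= n_res:
                # Several processes share one resource.
                mapping[process] = [resources[i // (n_proc // n_res)]]
            else:
                # One process occupies several resources.
                width = n_res // n_proc
                mapping[process] = resources[i * width:(i + 1) * width]
        next_process = processes[-1] + 1
    if sorted(mapping) != list(range(len(mapping))):
        raise ValueError("process ranks must be contiguous from 0 to N-1")
    return mapping


print(parse_placement("0-3:0-7")[5])  # [2]
```

With the mixed example from the text, parse_placement("0-1:0-3,3-5,7-10:7-14") maps process 4 to resource 3 and process 14 to resource 10.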
For the second format, the node_group label is a label defined in cluster.node_groups.label and is optional. It can be either a single string (a single node group) or a comma-separated string or list (multiple node groups, e.g., node_group: "a800,4090"). If it is not specified, all nodes in the cluster are used. The label node is reserved by the scheduler for allocating on node ranks only (no accelerators or other hardware).
When multiple node groups are specified, hardware ranks span all groups in order, starting from 0. For example, if group "a800" has 8 GPUs (ranks 0-7) and group "4090" has 8 GPUs (ranks 0-7 within that group), then in the composite placement, hardware ranks 0-7 belong to "a800" and ranks 8-15 belong to "4090". Note that the hardware ranks assigned to a single process must all be of the same hardware type and on the same node.
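The composite numbering across several node groups can be illustrated with a short sketch. The function locate_hardware_rank and its (label, size) input are hypothetical; the snippet only reproduces the arithmetic of the "a800"/"4090" example above and assumes the group sizes are already known.

```python
def locate_hardware_rank(rank: int, groups: list[tuple[str, int]]) -> tuple[str, int]:
    """Translate a composite hardware rank into (group label, rank within group).

    `groups` lists (label, hardware count) in the order given in node_group.
    """
    if rank < 0:
        raise IndexError(f"hardware rank {rank} is negative")
    offset = 0
    for label, size in groups:
        if rank < offset + size:
            return label, rank - offset
        offset += size
    raise IndexError(f"hardware rank {rank} exceeds the {offset} available ranks")


groups = [("a800", 8), ("4090", 8)]
print(locate_hardware_rank(7, groups))   # ('a800', 7)
print(locate_hardware_rank(10, groups))  # ('4090', 2)
```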
- __init__(config, cluster)
Parse the component placement configuration.
- Parameters:
config (DictConfig): The configuration dictionary for the component placement.
cluster (Cluster): The cluster to use for placement.
- property placement_mode
Get the placement mode for the component.
- Returns:
The placement mode for the component.
- Return type:
PlacementMode
- property components: list[str]
Get the list of components defined in the placement.
- Returns:
The list of component names.
- Return type:
list[str]
- get_hardware_ranks(component_name)
Get the hardware ranks for a specific component.
- Parameters:
component_name (str): The name of the component.
- Returns:
The hardware ranks for the specified component.
- Return type:
list[int]
- get_world_size(component_name)
Get the world size for a specific component.
- Parameters:
component_name (str): The name of the component.
- Returns:
The world size for the specified component.
- Return type:
int
- get_strategy(component_name)
Get the placement strategy for a component based on the configuration.
- Parameters:
component_name (str): The name of the component to retrieve the placement strategy for.
- Returns:
The placement strategy for the specified component.
- Return type:
PlacementStrategy
In the embodied intelligence and MATH reasoning settings, HybridComponentPlacement and ModelParallelComponentPlacement, respectively, are used to generate the worker placements.
HybridComponentPlacement is a direct subclass of ComponentPlacement, while ModelParallelComponentPlacement extends the placement logic to support model parallelism of inference engines across multiple GPUs.
HybridComponentPlacement
- class rlinf.utils.placement.HybridComponentPlacement
Bases: ComponentPlacement
Hybrid component placement that allows components to run on any set of GPUs.
ModelParallelComponentPlacement
- class rlinf.utils.placement.ModelParallelComponentPlacement
Bases: ComponentPlacement
Component placement for model-parallel components.
The components must be actor, rollout, and optionally inference, and the GPUs of each component must be contiguous.
This placement supports both collocated and disaggregated modes.
In the collocated mode, all components share the same set of GPUs. In particular, the rollout group is specially placed in a strided manner to enable fast cudaIPC-based weight sync. In the disaggregated mode, each component has its own dedicated set of GPUs.
In the collocated mode, only actor and rollout exist, whereas in the disaggregated mode, actor, rollout, and inference should all exist.
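The distinction between the modes can be sketched as a check on the GPU sets of the components. This is only an illustration of the definitions given above, not how RLinf computes placement_mode: the function classify_placement_mode and its string return values are hypothetical, and treating every partially overlapping layout as "hybrid" is an assumption.

```python
from itertools import combinations


def classify_placement_mode(component_gpus: dict[str, set[int]]) -> str:
    """Classify a layout from the GPU ranks used by each component."""
    gpu_sets = list(component_gpus.values())
    if not gpu_sets:
        raise ValueError("at least one component is required")
    if all(gpus == gpu_sets[0] for gpus in gpu_sets):
        return "collocated"      # every component shares the same GPUs
    if all(a.isdisjoint(b) for a, b in combinations(gpu_sets, 2)):
        return "disaggregated"   # every component has dedicated GPUs
    return "hybrid"              # assumption: any other overlap pattern


print(classify_placement_mode({"actor": {0, 1, 2, 3}, "rollout": {0, 1, 2, 3}}))
# collocated
print(classify_placement_mode({"actor": {0, 1}, "rollout": {2, 3}, "inference": {4, 5}}))
# disaggregated
```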
Placement Strategies
Placement strategies are the underlying mechanisms for obtaining precise allocation of each node and each GPU resource used by component placement.
If you wish to customize placements, you can refer to the following built-in strategies, namely FlexiblePlacementStrategy, PackedPlacementStrategy and NodePlacementStrategy.
Specifically, FlexiblePlacementStrategy and PackedPlacementStrategy place worker processes on accelerators/GPUs, while NodePlacementStrategy places worker processes on specific nodes without considering the underlying accelerator resources, which makes it useful for CPU-only workers.
FlexiblePlacementStrategy
- class rlinf.scheduler.placement.flexible.FlexiblePlacementStrategy
Bases: PlacementStrategy
This placement strategy allows processes to be placed on any hardware (accelerators, robots, etc.) by specifying a list of global hardware ranks for each process.
Note
The global hardware rank means the hardware rank across the entire cluster or a node group if node_group_label is given. For example, if a cluster has 2 nodes, each with 8 GPUs, then the global GPU ranks are 0~7 for node 0 and 8~15 for node 1.
The following example shows how to use the placement strategy.
Example:
>>> from rlinf.scheduler import (
...     Cluster,
...     Worker,
...     FlexiblePlacementStrategy,
... )
>>>
>>> class MyWorker(Worker):
...     def __init__(self, msg: str = "Hello, World!"):
...         super().__init__()
...         self._msg = msg
...
...     def hello(self):
...         return self._rank
...
...     def available_gpus(self):
...         import torch
...         available_gpus = torch.cuda.device_count()
...         gpu_ids = [
...             torch.cuda.get_device_properties(i) for i in range(available_gpus)
...         ]
...         return available_gpus
>>>
>>> cluster = Cluster(num_nodes=1)
>>>
>>> # `FlexiblePlacementStrategy` allows you to specify the *global* accelerator/GPU ranks for each process.
>>> placement = FlexiblePlacementStrategy([[0, 1], [2], [3]])
>>> my_worker = MyWorker.create_group().launch(
...     cluster=cluster, name="flexible_placement", placement_strategy=placement
... )
>>> # This will run 3 processes on the first node's GPU 0, 1, 2, 3, where the first process uses GPUs 0 and 1, the second process uses GPU 2, and the third process uses GPU 3.
>>> my_worker.available_gpus().wait()
[2, 1, 1]
- __init__(hardware_ranks_list, node_group_label=None)
Initialize the FlexiblePlacementStrategy.
Note
The hardware ranks in each inner list must be on the same node and must be unique.
Note
The hardware ranks will be sorted in ascending order both within each process and across processes (based on the first rank).
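The two notes above can be sketched as a small validation and sorting step. The function normalize_hardware_ranks is hypothetical and not part of RLinf; it assumes a homogeneous cluster in which every node has hardware_per_node devices (as in the 2-node, 8-GPU example above), and it checks uniqueness only within each inner list.

```python
def normalize_hardware_ranks(
    hardware_ranks_list: list[list[int]], hardware_per_node: int
) -> list[list[int]]:
    """Validate per-process rank lists and sort them as the notes describe."""
    normalized = []
    for ranks in hardware_ranks_list:
        if not ranks:
            raise ValueError("each process needs at least one hardware rank")
        if len(set(ranks)) != len(ranks):
            raise ValueError(f"duplicate hardware ranks in {ranks}")
        # Global rank // devices-per-node gives the node that owns the device.
        nodes = {rank // hardware_per_node for rank in ranks}
        if len(nodes) != 1:
            raise ValueError(f"hardware ranks {ranks} span multiple nodes")
        normalized.append(sorted(ranks))
    # Order processes by their first (lowest) hardware rank.
    normalized.sort(key=lambda ranks: ranks[0])
    return normalized


print(normalize_hardware_ranks([[3], [1, 0], [2]], hardware_per_node=8))
# [[0, 1], [2], [3]]
```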
- Parameters:
hardware_ranks_list (List[List[int]]): A list of lists, where each inner list contains the hardware (e.g., GPU) ranks to allocate for a specific process.
node_group_label (Optional[str | Sequence[str]]): The label or list of labels of the node groups to which the accelerator ranks belong. If specified, the accelerator ranks are local ranks within the selected node groups. Otherwise, the accelerator ranks are global ranks.
- get_placement(cluster, isolate_accelerator=True)
Generate a list of placements based on the flexible strategy.
- Parameters:
cluster (Cluster): The cluster object containing information about the nodes and accelerators.
isolate_accelerator (bool): Whether accelerators not allocated to a worker are hidden from that worker (by setting environment variables such as CUDA_VISIBLE_DEVICES). Defaults to True.
- Returns:
A list of Placement objects representing the placements of processes on accelerators.
- Return type:
List[Placement]
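The effect of isolate_accelerator can be pictured with a sketch that builds the environment variable for one worker. The helper visible_devices_env is hypothetical, and leaving every accelerator on the node visible when isolation is disabled is an assumption; the text above only states that unallocated accelerators are hidden when isolation is enabled.

```python
def visible_devices_env(
    assigned_local_ranks: list[int],
    node_accelerator_count: int,
    isolate_accelerator: bool = True,
) -> dict[str, str]:
    """Build a CUDA_VISIBLE_DEVICES setting for one worker process."""
    if isolate_accelerator:
        # Only the accelerators allocated to this worker stay visible.
        visible = sorted(assigned_local_ranks)
    else:
        # Assumption: without isolation, the worker sees the whole node.
        visible = list(range(node_accelerator_count))
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(rank) for rank in visible)}


print(visible_devices_env([0, 1], node_accelerator_count=4))
# {'CUDA_VISIBLE_DEVICES': '0,1'}
print(visible_devices_env([0, 1], node_accelerator_count=4, isolate_accelerator=False))
# {'CUDA_VISIBLE_DEVICES': '0,1,2,3'}
```

Under isolation, a process that was given two GPUs sees exactly two devices, which matches the [2, 1, 1] device counts reported in the example above.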
PackedPlacementStrategy
- class rlinf.scheduler.placement.packed.PackedPlacementStrategy
Bases: PlacementStrategy
Placement strategy that allows processes to be placed on hardware (e.g., GPUs) in a close-packed manner. Each process can hold one or more hardware devices.
The following example shows how to use the placement strategy.
Example:
>>> from rlinf.scheduler import (
...     Cluster,
...     Worker,
...     PackedPlacementStrategy,
... )
>>>
>>> class MyWorker(Worker):
...     def __init__(self, msg: str = "Hello, World!"):
...         super().__init__()
...         self._msg = msg
...
...     def hello(self):
...         return self._rank
...
...     def available_gpus(self):
...         import torch
...         available_gpus = torch.cuda.device_count()
...         gpu_ids = [
...             torch.cuda.get_device_properties(i) for i in range(available_gpus)
...         ]
...         return available_gpus
>>>
>>> cluster = Cluster(num_nodes=1)
>>>
>>> # `PackedPlacementStrategy` will fill up nodes with workers before moving to the next node.
>>> placement = PackedPlacementStrategy(start_hardware_rank=0, end_hardware_rank=3)
>>> my_worker = MyWorker.create_group().launch(
...     cluster=cluster, name="packed_placement", placement_strategy=placement
... )
>>> my_worker.available_gpus().wait()  # This will run 4 processes on the first node's GPU 0, 1, 2, 3, each using 1 GPU.
[1, 1, 1, 1]
>>>
>>>
>>> # `num_hardware_per_process` allows for one process to hold multiple accelerators/GPUs.
>>> # For example, if you want a process to hold 2 GPUs, you can set `num_hardware_per_process` to 2.
>>> placement_chunked = PackedPlacementStrategy(
...     start_hardware_rank=0, end_hardware_rank=3, num_hardware_per_process=2
... )
>>> my_worker_chunked = MyWorker.create_group().launch(
...     cluster=cluster,
...     name="chunked_placement",
...     placement_strategy=placement_chunked,
... )
>>> my_worker_chunked.available_gpus().wait()  # This will run 2 processes, each using 2 GPUs (0-1 and 2-3) of the first node.
[2, 2]
>>>
>>>
>>> # `stride` allows for strided placement of workers across GPUs.
>>> # For example, if you want to place workers on every second GPU, you can set the stride to 2.
>>> placement_strided = PackedPlacementStrategy(
...     start_hardware_rank=0, end_hardware_rank=3, stride=2, num_hardware_per_process=2
... )
>>> my_worker_strided = MyWorker.create_group().launch(
...     cluster=cluster,
...     name="strided_placement",
...     placement_strategy=placement_strided,
... )
>>> # This will run 2 processes, each using 2 GPUs (0,2 and 1,3) of the first node.
>>> my_worker_strided.available_gpus().wait()
[2, 2]
- __init__(start_hardware_rank, end_hardware_rank, num_hardware_per_process=1, stride=1, node_group=None)
Initialize the PackedPlacementStrategy.
- Parameters:
start_hardware_rank (int): The global rank of the starting hardware in the cluster or node group for the placement.
end_hardware_rank (int): The global rank of the end hardware in the cluster or node group for the placement.
num_hardware_per_process (int): The number of hardware resources to allocate for each process.
stride (int): The stride to use when allocating hardware. This allows one process to hold multiple hardware devices in a strided manner, e.g., GPU 0, 2, 4 (stride 2) or GPU 0, 3, 6 (stride 3).
node_group (Optional[str | Sequence[str]]): The label(s) of the node group(s) to use for placement. Provide a list to span multiple node groups. If None, the entire cluster is considered.
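How these parameters interact can be sketched with a pure-Python grouping function. This is one reading that reproduces the three examples above (packed, chunked, and strided); it is not RLinf's implementation. The function packed_hardware_groups is hypothetical, and requiring the rank count to be a multiple of num_hardware_per_process * stride is an assumption.

```python
def packed_hardware_groups(
    start: int, end: int, num_hardware_per_process: int = 1, stride: int = 1
) -> list[list[int]]:
    """Return the hardware ranks held by each process, in process order."""
    total = end - start + 1
    block = num_hardware_per_process * stride
    if total <= 0 or total % block:
        raise ValueError("rank range must be a positive multiple of num * stride")
    groups = []
    # Each block of `block` consecutive ranks is shared by `stride` processes.
    for block_start in range(start, end + 1, block):
        for offset in range(stride):
            first = block_start + offset
            groups.append([first + k * stride for k in range(num_hardware_per_process)])
    return groups


print(packed_hardware_groups(0, 3))                                        # [[0], [1], [2], [3]]
print(packed_hardware_groups(0, 3, num_hardware_per_process=2))            # [[0, 1], [2, 3]]
print(packed_hardware_groups(0, 3, num_hardware_per_process=2, stride=2))  # [[0, 2], [1, 3]]
```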
- get_placement(cluster, isolate_accelerator=True)
Generate a list of placements based on the packed strategy.
- Parameters:
cluster (Cluster): The cluster object containing information about the nodes and accelerators.
isolate_accelerator (bool): Whether accelerators not allocated to a worker are hidden from that worker (by setting environment variables such as CUDA_VISIBLE_DEVICES). Defaults to True.
- Returns:
A list of Placement objects representing the placements of processes on accelerators.
- Return type:
list[Placement]
NodePlacementStrategy
- class rlinf.scheduler.placement.node.NodePlacementStrategy
Bases: PlacementStrategy
This placement strategy places processes on specific nodes (using the global node rank) without limiting accelerators. This is useful for CPU-only workers that do not rely on accelerators.
Note
The global node rank means the node rank across the entire cluster. For example, if a cluster has 16 nodes, the node ranks are 0~15.
Example:
>>> from rlinf.scheduler import (
...     Cluster,
...     Worker,
...     NodePlacementStrategy,
... )
>>>
>>> class MyWorker(Worker):
...     def __init__(self, msg: str = "Hello, World!"):
...         super().__init__()
...         self._msg = msg
...
...     def hello(self):
...         return self._rank
...
>>>
>>> cluster = Cluster(num_nodes=1)
>>>
>>> # `NodePlacementStrategy` allows you to specify the *global* node ranks for each process.
>>> placement = NodePlacementStrategy([0] * 4)
>>> my_worker = MyWorker.create_group().launch(
...     cluster=cluster, name="node_placement", placement_strategy=placement
... )
>>> my_worker.hello().wait()  # This will run 4 processes on the first node
[0, 1, 2, 3]
- __init__(node_ranks, node_group_label=None)
Initialize the NodePlacementStrategy.
Note
The node ranks will be sorted.
- Parameters:
node_ranks (List[int]): A list of node ranks to allocate for the processes.
node_group_label (Optional[str | Sequence[str]]): The label or list of labels of the node groups to which the node ranks belong. If specified, the node_ranks are local ranks within the selected node groups. Otherwise, node_ranks are global ranks.
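As a rough sketch of what a node-based placement implies for each process, the snippet below sorts the node ranks and derives per-process bookkeeping. The function node_placements is hypothetical and returns plain dictionaries rather than Placement objects; the key names are borrowed from the Placement fields documented on this page, and how RLinf actually fills them is an assumption.

```python
from collections import Counter


def node_placements(node_ranks: list[int]) -> list[dict[str, int]]:
    """Derive rank bookkeeping for processes placed on the given nodes."""
    ordered = sorted(node_ranks)             # the note above: node ranks are sorted
    world_per_node = Counter(ordered)        # number of processes on each node
    node_index = {node: i for i, node in enumerate(sorted(world_per_node))}
    seen: Counter = Counter()
    placements = []
    for rank, node in enumerate(ordered):
        placements.append(
            {
                "rank": rank,
                "cluster_node_rank": node,
                "placement_node_rank": node_index[node],
                "local_rank": seen[node],
                "local_world_size": world_per_node[node],
            }
        )
        seen[node] += 1
    return placements


print([p["rank"] for p in node_placements([0] * 4)])  # [0, 1, 2, 3]
```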
- get_placement(cluster, isolate_accelerator=True)
Generate a list of placements based on the node placement strategy.
- Parameters:
cluster (Cluster): The cluster object containing information about the nodes and hardware.
isolate_accelerator (bool): Whether accelerators not allocated to a worker are hidden from that worker (by setting environment variables such as CUDA_VISIBLE_DEVICES). Defaults to True.
- Returns:
A list of Placement objects representing the placements of processes.
- Return type:
List[Placement]
Placement Metadata
- class rlinf.scheduler.placement.placement.Placement
Class representing the placement of a worker on a specific GPU.
- rank: int
Global rank of the worker in the cluster.
- cluster_node_rank: int
Global node rank in the cluster where the worker is placed.
- placement_node_rank: int
Local rank of the node in the placement.
- local_accelerator_rank: int
Local GPU ID on the node.
- accelerator_type: AcceleratorType
Type of accelerators on the node.
- local_rank: int
Local rank of the worker on the node.
- local_world_size: int
Local world size (number of workers) on the node.
- visible_accelerators: list[str]
List of CUDA visible devices for the worker.
- isolate_accelerator: bool
Flag to indicate if the local rank should be set to zero. This is useful for workers that require multiple GPUs.
- local_hardware_ranks: list[int]
The assigned local hardware ranks of the worker.
- node_group_label: str
The label of the node group where the worker is placed.