Heterogeneous Software and Hardware Setup
RLinf supports running on nodes with heterogeneous hardware and software setups, e.g., running simulators on ray tracing-capable GPUs (like RTX 4090), training on compute GPUs with larger GPU memory (like A100), running search agents on CPU-only nodes, and/or running robot controllers on nodes with special hardware like robotic arms.
To set up such a heterogeneous environment, all you need to do is configure the cluster section of the YAML config file as follows.
Cluster configuration
The cluster section describes what machines you have and how RLinf
should place each component (actor, rollout, env, agent, etc.) on them.
At a high level, you specify:
the total number of nodes in the cluster,
a set of node groups that share the same hardware / environment, and
a component placement rule that maps logical components to hardware resources (GPUs, robots, or just nodes).
An example
The following example shows a cluster with heterogeneous hardware and per-node software environments
configured via env_configs:
cluster:
  num_nodes: 18
  component_placement:
    actor:
      node_group: a800
      placement: 0-63  # hardware ranks within ``a800``
    rollout:
      node_group: 4090
      placement: 0-63  # hardware ranks within ``4090``
    env:
      node_group: franka
      placement: 0-1  # robot hardware ranks within ``franka``
    agent:
      node_group: node
      placement: 0-1:0-199,2-3:200-399  # node ranks : process ranks
  node_groups:
    - label: a800
      node_ranks: 0-7
      env_configs:
        - node_ranks: 0-7
          python_interpreter_path: /opt/venv/openpi/bin/python3
          env_vars:
            - GLOO_SOCKET_IFNAME: "eth0"
    - label: 4090
      node_ranks: 8-15
      env_configs:
        - node_ranks: 8-15
          env_vars:
            - GLOO_SOCKET_IFNAME: "eth1"
    - label: franka
      node_ranks: 16-17
      hardware:
        type: Franka
        configs:
          - robot_ip: "10.10.10.1"
            node_rank: 16
            camera_serials:
              - "322142001230"
              - "322142001231"
          - robot_ip: "10.10.10.2"
            node_rank: 17
            camera_serials:
              - "322142001232"
              - "322142001233"
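Before launching, it can help to sanity-check that every node group only references node ranks that exist in the cluster. The following is a minimal sketch, not part of RLinf: the helpers parse_ranks and check_node_groups are hypothetical, the config is written as a plain Python dict rather than loaded from YAML, and the comma-separated rank syntax is an assumption beyond the ranges shown in the example.

```python
def parse_ranks(spec):
    """Expand a rank spec such as "0-7", "3", or "0-1,4" into a sorted list of ints.

    Hypothetical helper; the comma-separated form is an assumption.
    """
    ranks = set()
    for part in str(spec).split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            ranks.update(range(start, end + 1))
        else:
            ranks.add(int(part))
    return sorted(ranks)


def check_node_groups(cluster):
    """Return {label: node count}; raise if a group uses a rank outside the cluster."""
    num_nodes = cluster["num_nodes"]
    sizes = {}
    for group in cluster["node_groups"]:
        ranks = parse_ranks(group["node_ranks"])
        # Node ranks are zero-indexed, so valid ranks are 0 .. num_nodes - 1.
        out_of_range = [r for r in ranks if not 0 <= r < num_nodes]
        if out_of_range:
            raise ValueError(
                f"group {group['label']!r} uses unknown node ranks {out_of_range}"
            )
        sizes[str(group["label"])] = len(ranks)
    return sizes


# The node groups from the example above, reduced to label and node_ranks.
cluster = {
    "num_nodes": 18,
    "node_groups": [
        {"label": "a800", "node_ranks": "0-7"},
        {"label": "4090", "node_ranks": "8-15"},
        {"label": "franka", "node_ranks": "16-17"},
    ],
}
print(check_node_groups(cluster))  # {'a800': 8, '4090': 8, 'franka': 2}
```

With 8 nodes in each of the a800 and 4090 groups, the placement 0-63 in the example corresponds to 8 accelerators per node, assuming hardware ranks are numbered contiguously across the group.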
Interpretation
The above configuration encodes the following ideas:
- num_nodes: 18 is the total number of nodes in the cluster. Node ranks are zero-indexed and are specified via the RLINF_NODE_RANK environment variable when starting Ray on each node.
- node_groups: each entry defines a node group, i.e., a set of nodes with the same hardware and environment. A node group has:
  - label: a unique string identifier used later in component_placement (e.g., a800, 4090, franka). Labels are case sensitive. The labels cluster and node are reserved by the scheduler. node is a special group that covers all nodes and is used for hardware-agnostic placement (CPU-only processes, agents, etc.).
  - node_ranks: a list or range of global node ranks that belong to this group. In the example, a800 covers 0-7, 4090 covers 8-15, and franka covers 16-17.
  - env_configs (optional): a list of software environment configurations for subsets of nodes in the group. Each entry is a NodeGroupEnvConfig with its own node_ranks, env_vars, and python_interpreter_path:
    - node_ranks must be a subset of the parent group's node_ranks, and different env_configs in the same group must not overlap.
    - env_vars is a list of one-key dicts; within a node group, each environment variable key may be set at most once for any given node.
    - python_interpreter_path is the interpreter to use on the specified nodes.
  - hardware (optional): a structured description of non-accelerator hardware (such as robots). The structure depends on the hardware type (for example, Franka). When hardware is present, the node group is treated as owning exactly one hardware type, and that type defines the hardware ranks (0, 1, …) within the group.

If hardware is not specified for a node group, RLinf behaves as follows:
- If accelerator hardware (GPUs, NPUs, etc.) is detected on the nodes, those accelerators become the default resources, and their local indices are used as hardware ranks.
- If no accelerators are present, each node itself is treated as a single hardware resource with rank 0 within that node.
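The env_configs rules above (subset of the parent group's node ranks, no overlap between entries, one-key env_vars dicts with unique keys) can be checked mechanically. The sketch below is illustrative only and does not reproduce RLinf's own validation; parse_ranks and check_env_configs are hypothetical names, and the group is given as a plain dict.

```python
def parse_ranks(spec):
    """Expand "0-7" or "3" into a set of ints (hypothetical helper)."""
    start, _, end = str(spec).partition("-")
    return set(range(int(start), int(end or start) + 1))


def check_env_configs(group):
    """Raise ValueError if a node group's env_configs break the documented rules."""
    group_ranks = parse_ranks(group["node_ranks"])
    seen = set()
    for cfg in group.get("env_configs", []):
        ranks = parse_ranks(cfg["node_ranks"])
        # Rule 1: node_ranks must be a subset of the parent group's node_ranks.
        if not ranks <= group_ranks:
            raise ValueError(f"ranks {sorted(ranks - group_ranks)} are outside the group")
        # Rule 2: different env_configs in the same group must not overlap.
        if ranks & seen:
            raise ValueError(f"ranks {sorted(ranks & seen)} appear in two env_configs")
        seen |= ranks
        # Rule 3: env_vars is a list of one-key dicts with unique keys. Because
        # entries do not overlap, checking each entry covers every node.
        keys = []
        for entry in cfg.get("env_vars", []):
            if len(entry) != 1:
                raise ValueError(f"env_vars entry {entry} must have exactly one key")
            keys.extend(entry)
        if len(keys) != len(set(keys)):
            raise ValueError(f"duplicate environment variable in {keys}")
    return True


a800 = {
    "label": "a800",
    "node_ranks": "0-7",
    "env_configs": [
        {
            "node_ranks": "0-7",
            "python_interpreter_path": "/opt/venv/openpi/bin/python3",
            "env_vars": [{"GLOO_SOCKET_IFNAME": "eth0"}],
        }
    ],
}
print(check_env_configs(a800))  # True
```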
When you reference a node_group in component_placement, the placement string is always written in terms of hardware ranks within that group:
- If hardware is present, these are explicit hardware ranks of that type (e.g., robots 0-3).
- Otherwise, they are the automatically detected accelerators, if any.
- If there is no accelerator, the node itself is considered a hardware resource.
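The three cases above form a simple fallback order. As a rough illustration (the function name and its arguments are hypothetical, and this is not how RLinf implements detection), the decision could be written as:

```python
def resource_kind(has_hardware_section, accelerators_per_node):
    """Return which resource a node group's placement ranks refer to.

    Hypothetical sketch of the documented fallback order:
    explicit hardware, then detected accelerators, then the node itself.
    """
    if has_hardware_section:
        return "hardware"      # e.g., robots declared under `hardware`
    if accelerators_per_node > 0:
        return "accelerator"   # detected GPUs, NPUs, etc.
    return "node"              # each node counts as one resource


print(resource_kind(True, 0))    # franka group in the example
print(resource_kind(False, 8))   # a GPU group without a hardware section
print(resource_kind(False, 0))   # a CPU-only group
```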
Setting node_group to the reserved group node in component_placement disables
hardware placement entirely and interprets ranks as node ranks only. This
is useful for placing hardware-agnostic processes (such as agents or
CPU-only workers) on particular nodes regardless of available GPUs.
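For the agent entry in the example, the placement string 0-1:0-199,2-3:200-399 pairs node ranks with process ranks. The sketch below only splits such a string into its (node ranks, process ranks) pairs. The function names are hypothetical, and how RLinf distributes the processes of one pair across its nodes is not described here, so the sketch does not attempt it.

```python
def parse_range(text):
    """Expand "0-199" or "5" into a list of ints (hypothetical helper)."""
    start, _, end = text.strip().partition("-")
    return list(range(int(start), int(end or start) + 1))


def parse_node_placement(placement):
    """Split "nodes:procs,nodes:procs" into a list of (node_ranks, process_ranks)."""
    pairs = []
    for segment in placement.split(","):
        node_part, sep, proc_part = segment.partition(":")
        if not sep:
            raise ValueError(f"expected 'node ranks:process ranks', got {segment!r}")
        pairs.append((parse_range(node_part), parse_range(proc_part)))
    return pairs


for nodes, procs in parse_node_placement("0-1:0-199,2-3:200-399"):
    print(f"nodes {nodes} host process ranks {procs[0]}..{procs[-1]} ({len(procs)} processes)")
```

For the example string, this reports 200 processes for nodes 0 and 1 and another 200 for nodes 2 and 3.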