Multi-node Training
This guide shows how to launch a 4-node Ray cluster (each node has 8 GPUs) and run distributed RL training on the math task with the DeepSeek-R1-Distill-Qwen-1.5B model. The same procedure scales to any number of nodes/GPUs, as long as you customize the YAML configuration according to your needs.
Prerequisites
Before running, make sure to check the following:
Clone RLinf to a shared filesystem accessible by all nodes.
Ensure that each node has started the corresponding container image.
Step 1: Start a Ray Cluster
Clean up old cached state first:
rm -f ray_utils/ray_head_ip.txt
Open a shell on each node and run:
| node index | command |
|---|---|
| 0 (head) | |
| 1 | |
| 2 | |
| 3 | |
Once the scripts run successfully, the terminal on the head node should display output similar to the following (for simplicity, we only show the example of 2 nodes with 16 GPUs):

On each worker node, the terminal should display:

After all four startup scripts print Ray started, remain in the head node terminal and verify the total cluster size (in this example, 4 × 8 = 32 GPUs):
bash ray_utils/check_ray.sh 32
Note
The argument to check_ray.sh must equal the number of accelerators/GPUs in the cluster.
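As a cross-check on the number passed to check_ray.sh, the expected total can be computed and compared against what Ray itself reports. The following is a minimal sketch, not part of RLinf: the helper names (count_gpus, expected_gpus, check_cluster) are hypothetical, and it assumes a homogeneous cluster in which every node has the same number of GPUs. It relies only on the public Ray calls ray.init(address="auto") and ray.cluster_resources().

```python
def count_gpus(resources: dict) -> int:
    """Return the GPU count from a Ray resource dict (0 if none is reported)."""
    return int(resources.get("GPU", 0))


def expected_gpus(num_nodes: int, gpus_per_node: int) -> int:
    """Total accelerators, assuming every node has the same GPU count."""
    return num_nodes * gpus_per_node


def check_cluster(num_nodes: int, gpus_per_node: int) -> bool:
    """Compare the running cluster's GPU count with the expected total.

    Hypothetical helper: requires the `ray` package and a started cluster.
    """
    import ray  # imported lazily so the helpers above work without Ray

    ray.init(address="auto")  # attach to the already running cluster
    try:
        reported = count_gpus(ray.cluster_resources())
        return reported == expected_gpus(num_nodes, gpus_per_node)
    finally:
        ray.shutdown()


# For the 4-node, 8-GPU-per-node example in this guide:
print(expected_gpus(4, 8))  # 32
```

Calling check_cluster(4, 8) from the head node should return True once all four nodes have joined, under the assumptions above.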
If successful, your terminal should show:

Note: For simplicity, the images in this example only show a 2-node setup with 16 GPUs.
Step 2: Launch Training Tasks
Here we provide startup examples in two modes: collocated mode and disaggregated mode.
Collocated
Every training stage (rollout, inference, actor) shares all GPUs. Edit the sample YAML:
# examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron.yaml
cluster:
  num_nodes: 4 # adapt to your cluster
  component_placement:
    actor,rollout: all # "all" means the whole visible GPU set
Launch from the head node:
bash examples/reasoning/run_main_grpo_math.sh \
qwen2.5-1.5b-grpo-megatron
Disaggregated
Different stages receive disjoint GPU ranges, allowing fine-grained pipelining. Edit the pipeline YAML:
# examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron-pipeline.yaml
cluster:
  num_nodes: 4
  component_placement:
    rollout: 0-19 # 20 GPUs
    inference: 20-23 # 4 GPUs
    actor: 24-31 # 8 GPUs
rollout + inference + actor must equal the total GPU count (here 32). Ranges are inclusive.
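The placement rule can be checked before launching a job. The sketch below is illustrative only and is not RLinf code: parse_range and validate_placement are hypothetical helpers. It treats each "start-end" string as an inclusive range, rejects overlapping ranges, and confirms that the per-component GPU counts add up to the cluster total.

```python
def parse_range(spec: str) -> range:
    """Turn an inclusive 'start-end' string (or a single index) into a range."""
    start, _, end = spec.strip().partition("-")
    first = int(start)
    last = int(end) if end else first
    if last < first:
        raise ValueError(f"invalid range: {spec!r}")
    return range(first, last + 1)  # +1 because the end index is inclusive


def validate_placement(placement: dict, total_gpus: int) -> dict:
    """Check that ranges are disjoint and sum to total_gpus.

    Returns the number of GPUs assigned to each component.
    """
    seen = set()
    counts = {}
    for name, spec in placement.items():
        gpus = set(parse_range(spec))
        overlap = seen & gpus
        if overlap:
            raise ValueError(f"{name} overlaps on GPUs {sorted(overlap)}")
        seen |= gpus
        counts[name] = len(gpus)
    if len(seen) != total_gpus:
        raise ValueError(f"placement uses {len(seen)} GPUs, expected {total_gpus}")
    return counts


# Values taken from the pipeline YAML above.
placement = {"rollout": "0-19", "inference": "20-23", "actor": "24-31"}
counts = validate_placement(placement, total_gpus=32)
print(counts)  # {'rollout': 20, 'inference': 4, 'actor': 8}
```

Because the ranges are inclusive, 0-19 is 20 GPUs rather than 19, which is the easiest mistake to make when resizing the split.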
Start the job:
bash examples/reasoning/run_main_grpo_math.sh \
qwen2.5-1.5b-grpo-megatron-pipeline