Dynamic Scheduling

Dynamic scheduling adjusts and migrates resources among components (actor / rollout / inference) in real time during training to improve overall throughput and resource utilization. It relies on Megatron-LM’s online scaling (elasticity on the order of seconds) and the migration capability of SGLang/vLLM to reallocate GPU resources without stopping training.

Online Scaling Mechanism

Online scaling (also known as elastic training) is a powerful feature that enables dynamic scaling of training resources, with GPU switching performed within 1 second. This capability allows you to adjust the number of GPUs and nodes used for training in real time, based on cluster availability, workload demands, or resource optimization goals.

What is Online Scaling?

Online scaling refers to the ability to scale up (add more resources) or scale down (remove resources) during training while maintaining training continuity and model state consistency.

In the context of RL training with Megatron-LM, this involves:

Scaling up: Adding nodes/GPUs to increase training throughput
Scaling down: Releasing nodes/GPUs to free up resources for other tasks
Parallel strategy adjustment: Dynamically changing Megatron’s parallel strategies (TP/PP/DP/CP)

The system automatically handles:

Model parameter redistribution across the new parallel configuration
Optimizer state migration
Communication group reconstruction
Training state synchronization
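To see why the parallel configuration has to change when GPUs are added or removed, recall that in Megatron-style parallelism (ignoring expert parallelism) the GPU count is the product of the tensor, pipeline, context, and data parallel sizes. The sketch below uses a hypothetical helper, `data_parallel_size`, to compute the data-parallel size implied by a given GPU count. It is only an arithmetic illustration, not the resharding code used by RLinf or Megatron-LM.

```python
def data_parallel_size(num_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the DP size implied by num_gpus = tp * pp * cp * dp.

    Hypothetical helper for illustration. Raises if the GPU count cannot
    be divided evenly by the model-parallel sizes.
    """
    model_parallel = tp * pp * cp
    if num_gpus <= 0 or num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs is not divisible by tp*pp*cp = {model_parallel}"
        )
    return num_gpus // model_parallel


# An actor that starts on 2 GPUs with TP=2 and later scales up to 8 GPUs
# keeps TP=2 and grows its data-parallel size instead.
before = data_parallel_size(2, tp=2)
after = data_parallel_size(8, tp=2)
print(before, after)
```

If the new GPU count is not divisible by the model-parallel sizes, keeping the same TP/PP/CP is impossible, which is one reason a scaling step may also need to adjust the parallel strategy itself.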

Why is Online Scaling Important?

When using RLinf’s disaggregated mode with fine-grained pipelining, the rollout and inference stages complete before the actor stage finishes. At that point, the GPUs used for rollout and inference would otherwise sit idle; online scaling lets them be reallocated to the actor stage within seconds, accelerating actor training and improving overall system performance.

Benefits and Effects

Performance Benefits:

Increased Throughput: Adding more GPUs can significantly speed up training
Better Resource Utilization: Dynamic resource allocation ensures optimal usage
Reduced Training Time: Efficient scaling can reduce overall training time by 20–50%, depending on the task and cluster

Operational Benefits:

Zero Training Interruption: Scaling occurs seamlessly without halting training
Consistent Training Progress: Maintains convergence and model continuity throughout scaling

Dynamic Scheduling in Practice

Dynamic scheduling adjusts GPU resources across components during training based on stage-specific bottlenecks and workload changes:

Scaling up: temporarily add GPUs when a component becomes the bottleneck
Scaling down: reclaim GPUs when a component is temporarily idle

Benefits

Performance benefits:

Higher throughput: temporarily accelerate bottleneck components to speed up training
Better utilization: promptly reassign idle resources to effective computation
Shorter end-to-end time: often yields 20–50% total time reduction (task/cluster dependent)

Operational characteristics:

No training interruption: scaling/migration occurs without stopping training
Consistency preserved: training/model state remains consistent during scaling

How to Use Dynamic Scheduling

Prerequisites

Prepare the prebuilt Megatron-LM online scaling dependency:

WORKSPACE=YourWorkspace
cd $WORKSPACE
git clone git@github.com:i-Taozi/params_resharding_release.git
export PYTHONPATH=$PYTHONPATH:$WORKSPACE/params_resharding_release

This repository provides the compiled artifacts for Megatron-LM online scaling. The source code will be released in the future.

Megatron-LM must be version 0.11. If your environment has a different version, fetch 0.11 separately:

WORKSPACE=YourWorkspace
cd $WORKSPACE
git clone -b core_r0.11.0 git@github.com:NVIDIA/Megatron-LM.git
export PYTHONPATH=$PYTHONPATH:$WORKSPACE/Megatron-LM

Important

If you use torch >= 2.6.0, Megatron-LM 0.11 may raise errors due to the default torch.load behavior. In that case, clone the modified Megatron-LM 0.11 branch instead:

git clone -b core_v0.11.0_rlinf git@github.com:RLinf/Megatron-LM.git
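The behavior change in question is, to our understanding, that PyTorch 2.6 switched the default of `torch.load`'s `weights_only` argument from `False` to `True`. As a minimal sketch, the hypothetical helper below decides from a version string whether an environment falls on the new side of that change. In practice you would pass it `torch.__version__`.

```python
import re


def torch_load_defaults_to_weights_only(torch_version: str) -> bool:
    """Return True if this torch version defaults torch.load to weights_only=True.

    Hypothetical helper for illustration. Only the major.minor prefix is
    inspected, so local suffixes such as "+cu124" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 6)


# Example version strings; replace with torch.__version__ in a real environment.
for version in ("2.5.1+cu121", "2.6.0", "2.7.0+cu126"):
    print(version, torch_load_defaults_to_weights_only(version))
```

If the helper returns `True` for your environment, the modified branch above is the safer choice.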

Configuration Example

Disaggregated Pipeline configuration:

cluster:
  num_nodes: 1
  component_placement:
    rollout: 0-3
    inference: 4-5
    actor: 6-7

Starting from the disaggregated pipeline configuration, change the component order and enable the auto scheduler. Make sure the component order is actor -> rollout -> inference; otherwise the actor cannot scale up.

cluster:
  num_nodes: 1
  auto_scheduler: True
  use_pre_process_policy: True
  use_wait_before_last_iter_policy: False
  component_placement:
    actor: 0-1
    rollout: 2-5
    inference: 6-7
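The ordering requirement is easy to get wrong when editing an existing placement, so a small pre-flight check can help. The sketch below is not part of RLinf: `parse_gpu_range` and `check_placement` are hypothetical helpers, the plain dict stands in for the parsed `component_placement` section, and "order" is assumed to mean ascending, non-overlapping GPU IDs.

```python
from __future__ import annotations


def parse_gpu_range(spec: str) -> list[int]:
    """Expand a range such as "0-3" into [0, 1, 2, 3]; a bare "5" becomes [5]."""
    start, _, end = str(spec).partition("-")
    first = int(start)
    last = int(end) if end else first
    if last < first:
        raise ValueError(f"invalid GPU range: {spec!r}")
    return list(range(first, last + 1))


def check_placement(placement: dict[str, str]) -> dict[str, list[int]]:
    """Check that GPU IDs are laid out as actor -> rollout -> inference."""
    order = ["actor", "rollout", "inference"]
    gpus = {name: parse_gpu_range(placement[name]) for name in order}
    for prev, nxt in zip(order, order[1:]):
        # Each component must end before the next one begins.
        if gpus[prev][-1] >= gpus[nxt][0]:
            raise ValueError(f"{prev} must be placed before {nxt}")
    return gpus


# The auto-scheduler placement from the example above passes the check.
print(check_placement({"actor": "0-1", "rollout": "2-5", "inference": "6-7"}))
```

Passing the original disaggregated placement (rollout on 0-3, inference on 4-5, actor on 6-7) to `check_placement` raises a `ValueError`, because the actor comes last.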

Scheduling Logic

When dynamic scheduling is enabled, the runtime scheduler monitors each component's progress and queues, then decides whether to adjust resources. Typical actions include:

When the rollout backlog is small: trigger rollout migration, release part of rollout resources, and expand the actor
When rollout or inference finishes: release resources to expand the actor
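The two rules above can be sketched as a small decision function. Everything here is an assumption made for illustration: the `ClusterState` fields, the `backlog_threshold` and `release_fraction` parameters, and the returned action format are hypothetical and do not describe RLinf's actual scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterState:
    rollout_backlog: int   # rollout requests still pending
    rollout_done: bool
    inference_done: bool
    rollout_gpus: int
    inference_gpus: int


def plan_actions(
    state: ClusterState,
    backlog_threshold: int = 4,
    release_fraction: float = 0.5,
) -> list[tuple[str, int]]:
    """Return (source_component, gpus_to_hand_to_actor) pairs."""
    actions: list[tuple[str, int]] = []
    if state.rollout_done and state.rollout_gpus > 0:
        # Rollout finished: all of its GPUs can go to the actor.
        actions.append(("rollout", state.rollout_gpus))
    elif state.rollout_backlog <= backlog_threshold and state.rollout_gpus > 1:
        # Small backlog: migrate the remaining requests onto fewer GPUs
        # and release the rest to the actor.
        release = int(state.rollout_gpus * release_fraction)
        if release > 0:
            actions.append(("rollout", release))
    if state.inference_done and state.inference_gpus > 0:
        actions.append(("inference", state.inference_gpus))
    return actions


state = ClusterState(rollout_backlog=2, rollout_done=False,
                     inference_done=False, rollout_gpus=4, inference_gpus=2)
print(plan_actions(state))
```

A real scheduler would also have to account for the cost of migration and resharding before acting; this sketch only captures when each rule fires.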

Optional Policies

use_pre_process_policy
1. In the early phase of each iteration, temporarily transfer actor resources to rollout
2. When the scheduler detects an appropriate time, reassign part of rollout resources back to the actor
3. Effective for long sequence length (expensive rollout) scenarios to maximize pipeline efficiency
use_wait_before_last_iter_policy
1. Before the last actor step of an iteration, the actor waits for rollout and inference to finish
2. Then the actor takes all resources for an expanded final step
3. Thanks to pipelining, rollout/inference typically finish earlier; with proper scheduling, the actor can use the entire cluster for that last step
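To make the effect of the two policies concrete, the sketch below lists how GPU counts might be distributed across the phases of one iteration. It is a simplified model under stated assumptions: the phase names, the `allocation_timeline` helper, and the choice to lend all actor GPUs to rollout in the early phase are hypothetical, and the real scheduler decides the amounts and timing at runtime.

```python
from __future__ import annotations


def allocation_timeline(
    base: dict[str, int],
    use_pre_process_policy: bool,
    use_wait_before_last_iter_policy: bool,
) -> list[tuple[str, dict[str, int]]]:
    """Return [(phase, {component: gpu_count})] for one iteration."""
    total = sum(base.values())
    timeline: list[tuple[str, dict[str, int]]] = []
    if use_pre_process_policy:
        # Early phase: the actor lends its GPUs to rollout.
        timeline.append(("early", {
            "actor": 0,
            "rollout": base["rollout"] + base["actor"],
            "inference": base["inference"],
        }))
    # Steady phase: the configured placement (the actor has its GPUs back).
    timeline.append(("steady", dict(base)))
    if use_wait_before_last_iter_policy:
        # Final actor step: rollout and inference are done, actor takes everything.
        timeline.append(("last_actor_step",
                         {"actor": total, "rollout": 0, "inference": 0}))
    return timeline


# GPU counts from the configuration example: actor 0-1, rollout 2-5, inference 6-7.
base = {"actor": 2, "rollout": 4, "inference": 2}
for phase, alloc in allocation_timeline(base, True, True):
    print(phase, alloc)
```

In every phase the counts sum to the same total, which reflects that the policies only move GPUs between components and never add or remove them from the cluster.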