Megatron-Bridge#
RLinf supports Megatron-Bridge through the Megatron-LM training backend. This integration lets users start Megatron-LM training directly from HuggingFace-format checkpoints, use model architectures supported by Megatron-Bridge, and keep RLinf's training loop, data pipeline, logging, and checkpoint workflow unchanged.
Use Megatron-Bridge when:
the actor-side model is large and FSDP or FSDP2 becomes a performance bottleneck;
the model architecture is not yet supported by RLinf's native Megatron-LM integration.
Megatron-Bridge resources:
Environment Setup#
Megatron-Bridge (MBridge) currently uses RLinf's agentic environment. Install it with:
bash requirements/install.sh agentic
source .venv/bin/activate
Then install (or upgrade to) the following additional packages:
uv pip install transformers==4.57.1 bitsandbytes
Overview#
After MBridge is enabled, RLinf imports and builds the Megatron-LM model through
Megatron-Bridge instead of relying on the traditional Megatron checkpoint
conversion workflow.
The key configuration differs between reasoning/RL tasks and SFT tasks.
For reasoning tasks:
actor:
  training_backend: megatron
  megatron:
    mbridge: True
    use_hf_ckpt: True
    ckpt_convertor:
      hf_model_path: /path/to/huggingface_model
When actor.megatron.mbridge and actor.megatron.use_hf_ckpt are both True,
RLinf reads the model path from actor.megatron.ckpt_convertor.hf_model_path
and lets MBridge build the Megatron model provider.
For SFT tasks:
actor:
  training_backend: megatron
  model:
    model_path: /path/to/huggingface_model
    megatron_checkpoint: null
  megatron:
    use_hf_ckpt: True
    mbridge: True
When actor.megatron.mbridge is True, RLinf reads the model path from
actor.model.model_path and lets MBridge build the Megatron model provider.
Quick Start#
Add Megatron-Bridge and the matching Megatron-LM version to PYTHONPATH,
and set CUDA_DEVICE_MAX_CONNECTIONS:
export PYTHONPATH=/path/to/Megatron-Bridge/src:$PYTHONPATH
export PYTHONPATH=/path/to/Megatron-LM:$PYTHONPATH
export CUDA_DEVICE_MAX_CONNECTIONS=1
Prepare a HuggingFace model directory, for example:
/path/to/Qwen2.5-VL-3B-Instruct
Update the model and tokenizer paths in the config.
Path Differences#
MBridge reads HuggingFace checkpoint paths from different config entries for different training tasks:
Reasoning / RL tasks usually read the HuggingFace model path from actor.megatron.ckpt_convertor.hf_model_path;
SFT tasks usually read the HuggingFace model path from actor.model.model_path;
the tokenizer path is still specified by actor.tokenizer.tokenizer_model. We recommend keeping it consistent with the model directory.
When migrating configs, copying mbridge: True alone is therefore not enough. Also check
that the model path is set in the entry used by the current task type.
Reasoning task example:
actor:
  tokenizer:
    tokenizer_model: "/path/to/model/DeepSeek-R1-Distill-Qwen-1.5B"
  training_backend: megatron
  megatron:
    mbridge: True
    use_hf_ckpt: True
    ckpt_convertor:
      hf_model_path: /path/to/huggingface_model
SFT example:
actor:
  model:
    model_type: "qwen2.5_vl"
    model_path: "/path/to/Qwen2.5-VL-3B-Instruct"
    megatron_checkpoint: null
  tokenizer:
    tokenizer_model: "/path/to/Qwen2.5-VL-3B-Instruct"
  megatron:
    use_hf_ckpt: True
    mbridge: True
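The path rules above can be sketched as a small lookup. This is only an illustration over a config that has already been loaded into nested dictionaries: the function name `resolve_hf_model_path` and the `task` labels "reasoning" and "sft" are hypothetical and are not part of RLinf, which selects the entry internally.

```python
from typing import Any, Mapping


def resolve_hf_model_path(cfg: Mapping[str, Any], task: str) -> str:
    """Return the HuggingFace model path from the entry used by `task`.

    `task` is a hypothetical label ("reasoning" or "sft") used only in
    this sketch.
    """
    actor = cfg["actor"]
    if task == "reasoning":
        # Reasoning / RL tasks: actor.megatron.ckpt_convertor.hf_model_path
        entry = "actor.megatron.ckpt_convertor.hf_model_path"
        path = actor.get("megatron", {}).get("ckpt_convertor", {}).get("hf_model_path")
    elif task == "sft":
        # SFT tasks: actor.model.model_path
        entry = "actor.model.model_path"
        path = actor.get("model", {}).get("model_path")
    else:
        raise ValueError(f"unknown task type: {task!r}")
    if not path:
        raise KeyError(f"{entry} is not set for task type {task!r}")
    return path
```

Running such a check against a migrated config makes the most likely mistake visible early: an SFT config reused for a reasoning task has no ckpt_convertor entry, so the lookup fails with the name of the missing entry.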
Launch the corresponding training script.
Start reasoning training from the repository root:
bash examples/reasoning/run_main_grpo_math.sh qwen2.5-1.5b-grpo-megatron
Start VLM SFT training from the repository root:
bash examples/sft/run_vlm_sft.sh qwen2_5_vl_megatron_sft_vlm
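Before launching either script, a short preflight check can catch a missing PYTHONPATH entry or an unset CUDA_DEVICE_MAX_CONNECTIONS. The sketch below is not part of RLinf. It assumes that Megatron-Bridge is importable as `megatron.bridge` and Megatron-LM as `megatron.core`; adjust the module names if your checkouts differ.

```python
import importlib.util
import os
from typing import List, Mapping


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False


def preflight(env: Mapping[str, str] = os.environ) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if env.get("CUDA_DEVICE_MAX_CONNECTIONS") != "1":
        problems.append("CUDA_DEVICE_MAX_CONNECTIONS is not set to 1")
    # Module names are assumptions of this sketch, see the note above.
    for name in ("megatron.bridge", "megatron.core"):
        if not is_importable(name):
            problems.append(f"{name} is not importable; check PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```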
Checkpoint Loading#
When Megatron-Bridge is used in RLinf, RLinf saves both checkpoint formats:
HuggingFace checkpoint;
Megatron checkpoint.
The checkpoint directory is organized as follows:
/path/to/logs/qwen2.5-1.5b-grpo-megatron/checkpoints/
├── global_step_10/
│   └── actor/
│       ├── hf_model/
│       │   ├── model.safetensors
│       │   └── tokenizer.json
│       ├── iter_0000010/
│       │   ├── mp_rank_00/
│       │   │   ├── distrib_optim.pt
│       │   │   └── model_optim_rng.pt
│       │   └── mp_rank_01/
│       │       ├── distrib_optim.pt
│       │       └── model_optim_rng.pt
│       └── latest_checkpointed_iteration.txt
├── global_step_20/
└── ...
The hf_model directory stores HuggingFace-format model weights and tokenizer
files. The iter_XXXXXXX directory stores Megatron model weights and optimizer
states. latest_checkpointed_iteration.txt records the latest checkpointed
iteration. In this example, global_step_10/ and global_step_20/ are two
different checkpoints for step 10 and step 20.
To resume training, only the Megatron checkpoint is needed; the HuggingFace-format checkpoint is not required. Point runner.resume_dir at the global_step_N directory to resume from:
runner:
  resume_dir: /path/to/logs/qwen2.5-1.5b-grpo-megatron/checkpoints/global_step_10
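If you would rather not hard-code the step number, the newest resumable directory can be located from the layout described above. This is a minimal sketch, not an RLinf utility: the helper name `latest_resume_dir` is hypothetical, and it assumes that each global_step_N directory keeps latest_checkpointed_iteration.txt under actor/.

```python
import re
from pathlib import Path
from typing import Optional


def latest_resume_dir(checkpoints_dir: str) -> Optional[Path]:
    """Return the global_step_N directory with the largest N that holds a
    Megatron checkpoint, or None if there is none."""
    pattern = re.compile(r"global_step_(\d+)$")
    candidates = []
    for child in Path(checkpoints_dir).iterdir():
        match = pattern.match(child.name)
        # Resuming needs only the Megatron checkpoint, so require its
        # tracker file and ignore hf_model/ entirely.
        tracker = child / "actor" / "latest_checkpointed_iteration.txt"
        if match and tracker.is_file():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path is the value to place in runner.resume_dir. Steps are compared numerically, so global_step_100 sorts after global_step_20.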
Practical Notes#
Keep actor.model.megatron_checkpoint: null when use_hf_ckpt: True.
Set actor.megatron.use_hf_ckpt: False only when loading a prepared Megatron checkpoint.
For Qwen3-VL models, keep actor.model.apply_rope_fusion: False.
For Qwen2.5 models, qkv_bias is forced on for model compatibility.
For Qwen3 models, qk_layernorm is forced on for model compatibility.
Make sure the tokenizer path matches the HuggingFace model directory.
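The first two notes amount to a mutual-exclusion rule between HuggingFace loading and Megatron checkpoint loading. The sketch below restates that rule over a config loaded into nested dictionaries. It is not RLinf's own validation code: the function name `check_checkpoint_source` is hypothetical, and treating a missing use_hf_ckpt as False is an assumption of this sketch.

```python
from typing import Any, Mapping


def check_checkpoint_source(cfg: Mapping[str, Any]) -> None:
    """Raise ValueError if the checkpoint source settings contradict each other."""
    actor = cfg["actor"]
    # Assumption: an absent use_hf_ckpt is treated as False here.
    use_hf_ckpt = bool(actor.get("megatron", {}).get("use_hf_ckpt", False))
    megatron_checkpoint = actor.get("model", {}).get("megatron_checkpoint")
    resume_dir = cfg.get("runner", {}).get("resume_dir")

    if use_hf_ckpt and megatron_checkpoint is not None:
        # Both loading paths are enabled at once.
        raise ValueError(
            "set actor.model.megatron_checkpoint: null when use_hf_ckpt is True"
        )
    if not use_hf_ckpt and megatron_checkpoint is None and resume_dir is None:
        # HuggingFace loading is off and nothing else provides weights.
        raise ValueError(
            "use_hf_ckpt is False but no Megatron checkpoint or "
            "runner.resume_dir was provided"
        )
```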
Troubleshooting#
model.megatron_checkpoint is required if use_hf_ckpt is False
use_hf_ckpt is disabled, but no Megatron checkpoint path was provided. Set actor.megatron.use_hf_ckpt: True or provide runner.resume_dir.

model.megatron_checkpoint should be None if use_hf_ckpt is True
HuggingFace loading and Megatron checkpoint loading are both enabled. Set actor.model.megatron_checkpoint: null.

Qwen3-VL fails with a deepstack_visual_indexes assertion
The model's visual deepstack configuration does not match the current pipeline split. First try pipeline_model_parallel_size: 1. If pipeline parallelism is required, make sure the first language pipeline stage has enough layers to contain all deepstack_visual_indexes. If you are using a reduced-layer checkpoint, also verify that the visual deepstack configuration matches the number of language model layers.