SFT for VLA / WAM Models
Supervised fine-tuning (SFT) is the standard cold-start step before embodied RL: a strong SFT checkpoint dramatically reduces RL exploration time and improves final policy quality. This category lists RLinf’s recipes for full-parameter and LoRA SFT on VLA / WAM models, plus VLM SFT for multimodal post-training.
After running SFT here, continue to RL on Embodied Models (model-centric RL) or RL with Embodied Simulators (benchmark-centric RL) to further fine-tune the resulting checkpoint.
OpenPI Supervised Fine-Tuning
Run full-parameter and LoRA SFT for OpenPI before RL fine-tuning
DreamZero Supervised Fine-Tuning
Run full-parameter and mixture SFT for DreamZero (WAN2.1 / WAN2.2 backbones)
VLM Supervised Fine-Tuning
Run full-parameter SFT and evaluation for VLMs such as Qwen