Training Configuration
RLinf recipes are Hydra YAML configs. Use this page as the shared reference for configuration ownership; example pages should link here instead of repeating long key tables.
Where Configs Live
| Workload | Config location | Launcher |
|---|---|---|
| Embodied RL | | |
| Reasoning RL | | |
| Agent workflows | | The launcher under the matching |
| SFT | | The recipe’s |
| Evaluation | | |
Common Sections
| Section | Purpose |
|---|---|
| | Node count, node groups, and component placement for actor, rollout, env, reward, or agent workers. |
| | Training backend, model path, optimizer, batch sizes, offload, checkpointing, and loss settings. |
| | Inference engine, sampling parameters, model path, and rollout batch sizing. |
| | Training/evaluation environment type, task selection, assets, video settings, and episode controls. |
| | Task type, logging, checkpoint cadence, validation cadence, and resume behavior. |
| | Advantage and loss selections such as PPO, GRPO, SAC, IQL, or DAgger-specific settings. |
| | Dataset paths, prompt/answer fields, preprocessing, train/validation splits, and SFT data options. |
Edit a Recipe
1. Start from the recipe’s named config in `examples/` or `evaluations/`.
2. Set local paths such as `rollout.model.model_path`, `actor.model.model_path`, dataset paths, and environment asset paths.
3. Keep hardware-specific placement in `cluster`. For multi-node runs, set `cluster.num_nodes` and start Ray on every node before launching the recipe.
4. Put logs and checkpoints under `runner.logger.log_path` so TensorBoard, videos, and checkpoints stay together.
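
The edits described above might look like the following fragment. This is a minimal sketch, not a complete recipe: the nesting is inferred from the dotted key names on this page, every value is a placeholder, and a real recipe will carry many more keys (dataset and environment asset paths are omitted because their key names are not given here).

```yaml
# Illustrative fragment only. All values are placeholders; the nesting is
# inferred from the dotted keys named on this page.
cluster:
  num_nodes: 2                        # multi-node run: Ray must already be started on every node

rollout:
  model:
    model_path: /path/to/local/model  # placeholder local path for the rollout model

actor:
  model:
    model_path: /path/to/local/model  # placeholder local path for the actor model

runner:
  logger:
    log_path: /path/to/run/output     # placeholder; keeps TensorBoard, videos, and checkpoints together
```

Because the recipes are Hydra configs, the same keys can generally also be set as dotted command-line overrides (for example `cluster.num_nodes=2`), assuming the recipe's launcher forwards its arguments to Hydra.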