Training Configuration#

RLinf recipes are Hydra YAML configs. Use this page as the shared reference for configuration ownership; example pages should link here instead of repeating long key tables.

Where Configs Live#

Workload	Config location	Launcher
Embodied RL	`examples/embodiment/config/*.yaml`	`bash examples/embodiment/run_embodiment.sh <config_name>`
Reasoning RL	`examples/reasoning/config/**.yaml`	`bash examples/reasoning/run_main_grpo_math.sh <config_name>`
Agent workflows	`examples/agent/*/config/.yaml`	The launcher under the matching `examples/agent/<recipe>/` directory
SFT	`examples/sft/**` and recipe-specific config directories	The recipe’s `run_*_sft.sh` launcher
Evaluation	`evaluations/<benchmark>/*.yaml`	`bash evaluations/run_eval.sh <benchmark> <config_name>`

Common Sections#

Section	Purpose
`cluster`	Node count, node groups, and component placement for actor, rollout, env, reward, or agent workers.
`actor`	Training backend, model path, optimizer, batch sizes, offload, checkpointing, and loss settings.
`rollout`	Inference engine, sampling parameters, model path, and rollout batch sizing.
`env`	Training/evaluation environment type, task selection, assets, video settings, and episode controls.
`runner`	Task type, logging, checkpoint cadence, validation cadence, and resume behavior.
`algorithm`	Advantage and loss selections such as PPO, GRPO, SAC, IQL, or DAgger-specific settings.
`data`	Dataset paths, prompt/answer fields, preprocessing, train/validation splits, and SFT data options.

Edit a Recipe#

Start from the recipe’s named config in examples/ or evaluations/.
Set local paths such as rollout.model.model_path, actor.model.model_path, dataset paths, and environment asset paths.
Keep hardware-specific placement in cluster. For multi-node runs, set cluster.num_nodes and start Ray on every node before launching the recipe.
Put logs and checkpoints under runner.logger.log_path so TensorBoard, videos, and checkpoints stay together.

Further Reading#