Configuration Reference
Eval configs are Hydra YAML files under evaluations/<benchmark>/. The core structure (using libero_spatial_openpi_pi05_eval.yaml as an example):
    defaults:
      - env/libero_spatial@env.eval   # Environment preset
      - model/pi0_5@rollout.model     # Model preset
      - override hydra/job_logging: stdout

    hydra:
      searchpath:
        - file://${oc.env:EMBODIED_PATH}/config/

    runner:
      task_type: embodied_eval        # Must be embodied_eval
      only_eval: True                 # Evaluation only, no training
      ckpt_path: null                 # Optional: load a .pt checkpoint
      logger:
        log_path: "../results"

    cluster:
      component_placement:
        env,rollout: all              # GPU placement for env and rollout

    env:
      eval:
        total_num_envs: 500           # Number of parallel environments
        rollout_epoch: 1              # Number of eval epochs
        max_episode_steps: 240
        auto_reset: True
        is_eval: True
        video_cfg:
          save_video: True

    rollout:
      generation_backend: "huggingface"
      model:
        model_path: "/path/to/model"  # Required: model weights path
        model_type: "openpi"
env.eval Field Reference
The fields below live under env.eval and control parallelism, trajectory length, and test-set coverage for embodied evaluation.
| Field | Role and recommended settings |
|---|---|
| total_num_envs | Total number of parallel environments, evenly distributed across env workers. Higher values improve throughput but use more GPU/RAM. Set to the total init-state count when resources allow; use a smaller value with … |
| rollout_epoch | Number of evaluation rollout epochs. Each epoch traverses the test set under the same seed; multiple epochs are averaged for lower variance. Use … |
| max_episode_steps | Maximum interaction steps per trajectory before forced truncation. Should meet the benchmark’s minimum step requirement and match the model’s training config. |
| … | Per-env step budget within one … |
| auto_reset | Whether to reset and load the next init state when an episode ends (success or truncation). |
| … | Whether to ignore early termination on task success. When … |
| … | Use pre-assigned reset state IDs instead of random sampling. Set … |
| … | Traverse init states in a fixed order. When … |
| is_eval | Evaluation mode flag; must be … |
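The even split of total_num_envs across env workers can be checked with simple arithmetic before launching a run. The sketch below is illustrative only: the function name and the num_env_workers argument are hypothetical, and the choice to reject a split that leaves a remainder is an assumption, since this page does not say how an uneven count is handled.

```python
def envs_per_worker(total_num_envs: int, num_env_workers: int) -> int:
    """Return how many environments each worker gets under an even split.

    num_env_workers is a hypothetical input; it is not a documented config field.
    """
    if num_env_workers <= 0:
        raise ValueError("num_env_workers must be positive")
    per_worker, remainder = divmod(total_num_envs, num_env_workers)
    if remainder:
        # Assumption: treat an uneven split as a configuration error.
        raise ValueError(
            f"{total_num_envs} envs cannot be split evenly across "
            f"{num_env_workers} workers"
        )
    return per_worker


print(envs_per_worker(500, 4))  # 125
```

With the example value total_num_envs: 500, four workers would each hold 125 environments, while three workers would trigger the error in this sketch.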
Fields You Must Customize
- rollout.model.model_path: Local model directory or HuggingFace cache path.
- Resource-related fields under env.eval: total_num_envs, max_episode_steps, assets_path (RoboTwin), etc.
- cluster.component_placement: Adjust env and rollout placement for your GPUs.
- Real-robot eval: Configure Franka IP and node topology in cluster.node_groups (see realworld/realworld_eval.yaml).
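A small pre-flight check can catch a config that still carries the example placeholders. The following is a minimal sketch that works on a config already loaded into a plain nested dict. The helper names lookup and preflight_problems are hypothetical and not part of the project; the checks cover only the values this page states explicitly.

```python
PLACEHOLDER_MODEL_PATH = "/path/to/model"  # placeholder from the example config


def lookup(cfg: dict, dotted_key: str):
    """Return the value at a dotted key such as 'runner.task_type', or None."""
    node = cfg
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def preflight_problems(cfg: dict) -> list:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if lookup(cfg, "runner.task_type") != "embodied_eval":
        problems.append("runner.task_type must be embodied_eval")
    if lookup(cfg, "runner.only_eval") is not True:
        problems.append("runner.only_eval must be True")
    model_path = lookup(cfg, "rollout.model.model_path")
    if not model_path or model_path == PLACEHOLDER_MODEL_PATH:
        problems.append("rollout.model.model_path is unset or still the placeholder")
    if not lookup(cfg, "cluster.component_placement"):
        problems.append("cluster.component_placement is not set")
    return problems


# The example config from this page, reduced to the fields being checked.
example_cfg = {
    "runner": {"task_type": "embodied_eval", "only_eval": True},
    "cluster": {"component_placement": {"env,rollout": "all"}},
    "rollout": {"model": {"model_path": "/path/to/model"}},
}
for problem in preflight_problems(example_cfg):
    print(problem)
```

Run against the unmodified example, the only reported problem is the placeholder model path.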
Deriving from a Training Config
Copy the matching YAML from examples/embodiment/config/ or tests/e2e_tests/embodied/, remove training sections (algorithm, actor, etc.), keep env.eval and rollout, and set:
- runner.task_type: embodied_eval
- runner.only_eval: True
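The same derivation can be sketched on a config held as a plain nested dict. This is an illustration under stated assumptions, not a project utility: derive_eval_config and TRAINING_SECTIONS are hypothetical names, the toy training config uses made-up values, and only the two training sections named above (algorithm, actor) are removed, so extend the tuple for any others in your own config.

```python
import copy

# Training-only top-level sections. Only "algorithm" and "actor" are named in
# the text; add others from your own training config as needed (assumption).
TRAINING_SECTIONS = ("algorithm", "actor")


def derive_eval_config(train_cfg: dict) -> dict:
    """Return an eval-only copy of a training config; the input is not modified."""
    cfg = copy.deepcopy(train_cfg)
    for section in TRAINING_SECTIONS:
        cfg.pop(section, None)  # drop the section if it is present
    runner = cfg.setdefault("runner", {})
    runner["task_type"] = "embodied_eval"
    runner["only_eval"] = True
    return cfg


# Toy training config with hypothetical values, for illustration only.
train_cfg = {
    "runner": {"task_type": "some_training_task", "only_eval": False},
    "algorithm": {"placeholder": 1},
    "actor": {"placeholder": 1},
    "env": {"eval": {"total_num_envs": 500}},
    "rollout": {"model": {"model_path": "/path/to/model"}},
}
eval_cfg = derive_eval_config(train_cfg)
print(sorted(eval_cfg))  # ['env', 'rollout', 'runner']
```

Working on a deep copy keeps the original training config intact, so one source file can back both training and evaluation variants.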
Config Fallback
If evaluations/<benchmark>/<config>.yaml does not exist, run_eval.sh falls back to examples/embodiment/config/ with the same config name. See CLI Reference.
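The lookup order described above can be sketched as follows. This is not the code in run_eval.sh; resolve_eval_config is a hypothetical helper, the benchmark and config names are made up, and raising FileNotFoundError when neither file exists is an assumption about the failure case.

```python
import tempfile
from pathlib import Path


def resolve_eval_config(repo_root: Path, benchmark: str, config_name: str) -> Path:
    """Prefer evaluations/<benchmark>/, then fall back to examples/embodiment/config/."""
    primary = repo_root / "evaluations" / benchmark / f"{config_name}.yaml"
    fallback = repo_root / "examples" / "embodiment" / "config" / f"{config_name}.yaml"
    if primary.exists():
        return primary
    if fallback.exists():
        return fallback
    # Assumption: the behavior when both are missing is not documented here.
    raise FileNotFoundError(f"no config named {config_name}.yaml in either location")


# Demonstration in a throwaway directory where only the fallback file exists.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fallback_dir = root / "examples" / "embodiment" / "config"
    fallback_dir.mkdir(parents=True)
    (fallback_dir / "my_eval.yaml").write_text("runner: {}\n")
    resolved = resolve_eval_config(root, "my_benchmark", "my_eval").relative_to(root)

print(resolved.as_posix())  # examples/embodiment/config/my_eval.yaml
```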