Reference

This section is the reference for running evaluations: configuration structure, CLI usage, supported models, and how to read the outputs.

| Page | What you get |
| --- | --- |
| Configuration | Hydra YAML layout under `evaluations/<benchmark>/`, including the `runner`, `env`, and `rollout` fields required for `embodied_eval`. |
| CLI | How to launch evaluations with `run_eval.sh`, pass Hydra overrides, and auto-infer the benchmark from config names. |
| Models | VLA models with example configs in `evaluations/` today (OpenPI, OpenVLA-OFT, StarVLA, DreamZero, LingBotVLA) and how to set `model_path`. |
| Results | Where logs and rollout videos are written, terminal metrics such as `eval/success_once`, and TensorBoard usage. |