Reference

This section is the reference for evaluation: configuration structure, CLI usage, supported models, and how to read the outputs.

| Page | What you get |
| --- | --- |
| Configuration | Hydra YAML layout under `evaluations/<benchmark>/`, including the `runner`, `env`, and `rollout` fields required for `embodied_eval`. |
| CLI | How to launch evaluations with `run_eval.sh`, pass Hydra overrides, and auto-infer the benchmark from config names. |
| Models | VLA models with example configs in `evaluations/` today (OpenPI, OpenVLA-OFT, StarVLA, DreamZero, LingBotVLA) and how to set `model_path`. |
| Results | Where logs and rollout videos are written, terminal metrics such as `eval/success_once`, and TensorBoard usage. |
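
To make the idea of Hydra overrides concrete, the sketch below shows how dotted `key=value` arguments map onto a nested configuration with `runner`, `env`, and `rollout` groups. It is a plain-Python illustration, not the project's code or Hydra's implementation: the helper `apply_overrides`, the field names `task_type` and `num_envs`, the placement of `model_path` under `rollout`, and all values are assumptions made for the example. The Configuration and CLI pages give the real layout.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Hypothetical helper for illustration only. Real Hydra also parses value
    types and requires a `+` prefix to add new keys; this sketch keeps every
    value as a string and creates missing groups silently.
    """
    result = deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Walk down one config group per dotted segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Assumed config shape: only the group names (runner, env, rollout) and
# the name model_path come from the text above; the rest is invented.
base = {
    "runner": {"task_type": "embodied_eval"},
    "env": {"num_envs": "8"},
    "rollout": {"model_path": "/path/to/checkpoint"},
}

updated = apply_overrides(
    base,
    ["rollout.model_path=/data/ckpt", "env.num_envs=4"],
)
print(updated["rollout"]["model_path"])
print(updated["env"]["num_envs"])
```

The helper copies the base configuration before editing it, so the defaults stay unchanged and each run's overrides remain easy to compare against them.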