Evaluation
RLinf provides a unified embodied evaluation entry point. It runs parallel rollouts in simulation or on real robots and reports task-level metrics such as success rate. This module covers environment setup, a quick first evaluation, and end-to-end workflows per benchmark.
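As a rough illustration of what a task-level success rate means, the sketch below aggregates per-episode outcomes into one rate per task. The function name `success_rate_by_task` and the `(task_name, succeeded)` record shape are assumptions made for this example. They are not RLinf's actual API or log format.

```python
from collections import defaultdict


def success_rate_by_task(episodes):
    """Aggregate (task_name, succeeded) pairs into a per-task success rate.

    Hypothetical helper for illustration only; RLinf's own metric code
    and record format may differ.
    """
    # counts[task] holds [number of successes, number of episodes]
    counts = defaultdict(lambda: [0, 0])
    for task, succeeded in episodes:
        counts[task][1] += 1
        if succeeded:
            counts[task][0] += 1
    return {task: wins / total for task, (wins, total) in counts.items()}


if __name__ == "__main__":
    # Made-up rollout outcomes, as might be gathered from parallel environments.
    rollouts = [
        ("pick_cube", True),
        ("pick_cube", False),
        ("open_drawer", True),
    ]
    for task, rate in success_rate_by_task(rollouts).items():
        print(f"{task}: {rate:.2f}")
```

Grouping by task rather than pooling all episodes keeps an easy task with many rollouts from masking a hard task with few.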
Supported Benchmarks
The table below lists benchmarks that have example configs under evaluations/ and can be launched directly with run_eval.sh.
| Benchmark | Task / env preset | Example config |
|---|---|---|
| RealWorld | | |
| BEHAVIOR-1K | | |
| LIBERO | | |
| ManiSkill OOD | | |
| PolaRiS | | |
| RoboTwin | | |
LIBERO variants: Standard LIBERO, LIBERO-PRO, and LIBERO-PLUS are supported via environment variables (see LIBERO Evaluation).
Config fallback: If evaluations/<benchmark>/<config>.yaml does not exist, run_eval.sh falls back to examples/embodiment/config/ with the same config name, so training configs can be reused for evaluation.
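The fallback rule can be sketched as a small path lookup. This is a minimal illustration, not the logic inside run_eval.sh: the function name `resolve_eval_config`, the `repo_root` parameter, and the assumption that the fallback file also carries a `.yaml` extension are all hypothetical.

```python
from pathlib import Path


def resolve_eval_config(repo_root, benchmark, config_name):
    """Return the config path, preferring evaluations/ over the training configs.

    Hypothetical helper mirroring the documented fallback order; the real
    run_eval.sh may resolve paths differently.
    """
    root = Path(repo_root)
    candidates = [
        # 1. Benchmark-specific evaluation config.
        root / "evaluations" / benchmark / f"{config_name}.yaml",
        # 2. Training config with the same name (assumed to end in .yaml).
        root / "examples" / "embodiment" / "config" / f"{config_name}.yaml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"No config named {config_name!r}; searched: {searched}")
```

Checking the evaluation directory first means a benchmark-specific override always wins, while a config that exists only for training is still picked up for evaluation.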
Get Started
| Page | What you get |
|---|---|
| | Evaluation architecture and the |
| | Environment setup and benchmark-specific variables. |
| | Run your first LIBERO Spatial evaluation in ~5 minutes. |
Guides
End-to-end evaluation workflows per benchmark (setup → config → launch → results):
| Benchmark | Workflow |
|---|---|
| | Franka real-robot evaluation and deployment. |
| | BEHAVIOR-1K evaluation. |
| | LIBERO / LIBERO-PRO / LIBERO-PLUS. |
| | ManiSkill out-of-distribution evaluation. |
| | PolaRiS tabletop manipulation. |
| | RoboTwin bimanual manipulation. |
Reference
| Page | What you get |
|---|---|
| | Hydra YAML structure and required fields. |
| | |
| | Supported models and example configs. |
| | Logs, metrics, and video output. |