Training Metrics
RLinf reports metrics through the MetricLogger under a few namespaces —
train/, rollout/, env/, and time/. This page defines them once; example
pages link here instead of repeating the definitions.
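
As a rough illustration of the namespace layout, the sketch below groups a flat dictionary of logged scalars by prefix. Only ``env/success_once`` and the four prefixes come from this page; the other keys and the helper ``group_by_namespace`` are hypothetical and are not part of the RLinf API.

```python
from collections import defaultdict


def group_by_namespace(metrics):
    """Group flat 'namespace/name' keys into {namespace: {name: value}}."""
    grouped = defaultdict(dict)
    for key, value in metrics.items():
        # Split on the first slash only, so nested names stay intact.
        namespace, _, name = key.partition("/")
        grouped[namespace][name] = value
    return dict(grouped)


logged = {
    "env/success_once": 0.42,     # named on this page
    "train/example_loss": 0.013,  # hypothetical key
    "rollout/example_stat": 1.7,  # hypothetical key
    "time/example_step": 3.2,     # hypothetical key
}
by_namespace = group_by_namespace(logged)
```
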
Tip
For embodied tasks the single most useful signal is ``env/success_once`` — the
unnormalized episodic success rate. Most other env/* values are hard to read
directly under sparse rewards (see below).
Training metrics: train/
Policy- and value-optimization statistics, logged every actor update.
| Metric | Meaning |
|---|---|
| | Approximate KL divergence between the old and new policies. |
| | Fraction of updates where the probability ratio was clipped. |
| | Mean of the clipped probability ratios. |
| | Gradient norm of the actor. |
| | Current learning rate. |
| | PPO / GRPO policy loss. |
| | Value-function loss. |
| | Fraction of value targets whose update was clipped. |
| | Explained variance of the value predictions (closer to 1 is better). |
| | Policy entropy. |
| | Total training loss (actor + critic + entropy regularization). |
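
To make the ratio-based diagnostics and the explained variance concrete, here is a minimal sketch in plain Python. It assumes the common PPO conventions: a clip range of 0.2, the estimator mean((r - 1) - log r) for the approximate KL, and 1 - Var(returns - predictions) / Var(returns) for the explained variance. The estimators RLinf actually uses may differ, and the function names are hypothetical.

```python
import math


def ppo_ratio_stats(old_logp, new_logp, clip_eps=0.2):
    """Return (approx_kl, clip_fraction, clipped_ratio_mean) for one batch."""
    log_ratios = [new - old for old, new in zip(old_logp, new_logp)]
    ratios = [math.exp(lr) for lr in log_ratios]
    n = len(ratios)
    # Assumed estimator: mean((r - 1) - log r), which is always >= 0.
    approx_kl = sum((r - 1.0) - lr for r, lr in zip(ratios, log_ratios)) / n
    low, high = 1.0 - clip_eps, 1.0 + clip_eps
    # Share of samples whose ratio falls outside the clip range.
    clip_fraction = sum(1 for r in ratios if r < low or r > high) / n
    # Mean of the ratios after clamping them into [low, high].
    clipped_ratio_mean = sum(min(max(r, low), high) for r in ratios) / n
    return approx_kl, clip_fraction, clipped_ratio_mean


def explained_variance(value_preds, returns):
    """1 - Var(returns - preds) / Var(returns); 1.0 means a perfect fit."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    var_returns = variance(returns)
    if var_returns == 0.0:
        return float("nan")  # undefined when the targets are constant
    residuals = [r - v for r, v in zip(returns, value_preds)]
    return 1.0 - variance(residuals) / var_returns
```

A value function that always predicts the mean return scores an explained variance of 0, and one that is worse than that baseline goes negative, which is why values closer to 1 are better.
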
Rollout metrics: rollout/
Statistics of the advantages and rewards collected during rollout.
| Metric | Meaning |
|---|---|
| | Maximum advantage in the batch. |
| | Mean advantage in the batch. |
| | Minimum advantage in the batch. |
| | Reward of a rollout chunk. |
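
The advantage statistics are simple reductions over the batch. The sketch below computes them in plain Python; the ``normalize`` helper is an assumption added for illustration (whether advantages are normalized depends on the algorithm configuration), and both function names are hypothetical.

```python
def advantage_stats(advantages):
    """Return the max, mean, and min of a batch of advantages."""
    return {
        "max": max(advantages),
        "mean": sum(advantages) / len(advantages),
        "min": min(advantages),
    }


def normalize(advantages, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = sum(advantages) / len(advantages)
    std = (sum((a - mean) ** 2 for a in advantages) / len(advantages)) ** 0.5
    return [(a - mean) / (std + eps) for a in advantages]


raw = [0.0, 0.0, 1.0, 3.0]  # made-up values for illustration
stats = advantage_stats(normalize(raw))
```

If advantages are normalized in this way, the mean sits near zero by construction, so the maximum and minimum carry most of the information about how spread out the batch is.
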
Environment metrics: env/
Task-level signals from the simulator.
| Metric | Meaning |
|---|---|
| ``env/success_once`` | Recommended. Unnormalized episodic success rate, the most direct measure of task performance. |
| | Number of environment steps elapsed in the episode. |
| | Episode return. Under sparse rewards this stays near zero until the terminal success step, so it is not very informative during training. |
| | Step-level reward ( |
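
The sketch below shows why a success-based signal is easier to read than the return under sparse rewards. It assumes, from the metric name alone, that ``env/success_once`` counts an episode as successful if success was reached at any step; the episode layout and function names are hypothetical, and the exact meaning of "unnormalized" in RLinf is not reproduced here.

```python
def success_once(step_successes):
    """True if the episode reached success at any step."""
    return any(step_successes)


def summarize_episodes(episodes):
    """episodes: list of dicts with per-step 'rewards' and 'successes'."""
    n = len(episodes)
    return {
        # Fraction of episodes that succeeded at least once.
        "success_once": sum(success_once(ep["successes"]) for ep in episodes) / n,
        # Average undiscounted sum of rewards per episode.
        "mean_return": sum(sum(ep["rewards"]) for ep in episodes) / n,
        # Average number of environment steps per episode.
        "mean_length": sum(len(ep["rewards"]) for ep in episodes) / n,
    }


episodes = [
    # Sparse reward: zero everywhere except the terminal success step.
    {"rewards": [0.0, 0.0, 1.0], "successes": [False, False, True]},
    {"rewards": [0.0, 0.0, 0.0, 0.0], "successes": [False] * 4},
]
summary = summarize_episodes(episodes)
```

In this toy batch every step-level reward before the final step is zero, so per-step values say little about progress, while the success fraction summarizes the outcome directly.
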
See also the Logger tutorial for choosing backends (TensorBoard,
Weights & Biases, SwanLab) and configuring runner.logger.