Embodied Data Interface#
This section describes the core data structures used during rollout and training
in embodied settings: EnvOutput, ChunkStepResult, EmbodiedRolloutResult,
and Trajectory. Together, they connect environment outputs, chunk-step
accumulation, trajectory construction, and training batches.
Relationships#
- EnvOutput: raw environment outputs per chunk step (obs, reward, done, etc.).
- ChunkStepResult: model inference outputs and reward signals per chunk step.
- EmbodiedRolloutResult: accumulates chunk-step results and transitions.
- Trajectory: aggregated trajectory tensors (typically [T, B, ...]).
EmbodiedRolloutResult.to_splited_trajectories() can split trajectories along the
batch dimension so that they can be sent through a Channel to multiple
Actor/Trainer workers.
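To make the batch-dimension split concrete, here is a minimal sketch. It is not the library's implementation: NumPy arrays stand in for torch tensors, a plain dict stands in for a Trajectory, and the helper name split_along_batch is hypothetical. It only assumes that each field has shape [T, B, ...] or is None.

```python
import numpy as np


def split_along_batch(trajectory, num_splits):
    """Split every [T, B, ...] array in a dict into shards along axis 1.

    Hypothetical helper: a plain dict stands in for a Trajectory, and
    NumPy arrays stand in for torch tensors.
    """
    shards = [{} for _ in range(num_splits)]
    for name, value in trajectory.items():
        if value is None:
            # Fields that were never populated stay None in every shard.
            for shard in shards:
                shard[name] = None
            continue
        # array_split also tolerates a B that is not divisible by num_splits.
        pieces = np.array_split(value, num_splits, axis=1)
        for shard, piece in zip(shards, pieces):
            shard[name] = piece
    return shards


T, B = 4, 6
trajectory = {
    "actions": np.zeros((T, B, 7)),
    "rewards": np.ones((T, B)),
    "prev_values": None,
}
shards = split_along_batch(trajectory, num_splits=3)
```

Each shard keeps the full step dimension T and holds a slice of the B parallel environments, so every worker receives complete per-environment sequences.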
EnvOutput#
EnvOutput describes environment-side outputs, including observations and
episode-termination signals. During initialization, tensors are moved to CPU
and made contiguous.
- class rlinf.data.embodied_io_struct.EnvOutput#
Environment output for a single chunk step.
- static merge_env_outputs(env_outputs)#
Merge multiple env output dicts into one batch-aligned env output.
Merge strategy:
Tensor fields: concatenate on batch dimension.
List fields: flatten in source order.
None fields: keep None.
final_obs supports partial None across shards. For shards without
final_obs, use the corresponding obs as a fallback to keep batch alignment.
- Parameters:
env_outputs (list[dict]) – Per-source env output dicts that share the same schema.
- Returns:
A merged env output dict produced via EnvOutput(...).to_dict().
- Return type:
dict[str, Any]
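The merge strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the code of merge_env_outputs: NumPy arrays stand in for tensors, only a few fields are handled, the function names merge_obs and merge_env_outputs_sketch are hypothetical, and every shard is assumed to share the same schema.

```python
import numpy as np


def merge_obs(obs_list):
    """Merge observation dicts field by field (hypothetical helper)."""
    merged = {}
    for key in obs_list[0]:
        values = [obs[key] for obs in obs_list]
        if all(v is None for v in values):
            merged[key] = None                                   # None stays None
        elif isinstance(values[0], list):
            merged[key] = [item for v in values for item in v]   # flatten in source order
        else:
            merged[key] = np.concatenate(values, axis=0)         # concat on batch dim
    return merged


def merge_env_outputs_sketch(env_outputs):
    """Illustrative merge of per-shard env output dicts."""
    obs_list = [out["obs"] for out in env_outputs]
    merged = {"obs": merge_obs(obs_list)}

    final_list = [out["final_obs"] for out in env_outputs]
    if all(f is None for f in final_list):
        merged["final_obs"] = None
    else:
        # Shards without final_obs fall back to their obs so that the
        # merged batch dimension still lines up with the merged obs.
        filled = [f if f is not None else o for f, o in zip(final_list, obs_list)]
        merged["final_obs"] = merge_obs(filled)

    for key in ("dones", "rewards"):
        values = [out[key] for out in env_outputs]
        if all(v is None for v in values):
            merged[key] = None
        else:
            merged[key] = np.concatenate(values, axis=0)
    return merged


shard_a = {
    "obs": {"state": np.zeros((2, 3)), "task": ["pick", "place"]},
    "final_obs": None,
    "dones": np.array([False, False]),
    "rewards": None,
}
shard_b = {
    "obs": {"state": np.ones((1, 3)), "task": ["push"]},
    "final_obs": {"state": np.full((1, 3), 9.0), "task": ["push"]},
    "dones": np.array([True]),
    "rewards": None,
}
merged = merge_env_outputs_sketch([shard_a, shard_b])
```

In this example shard_a has no final_obs, so its two rows in the merged final_obs come from its obs, while the third row comes from shard_b's final_obs.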
- __init__(*, obs, final_obs=None, dones=None, terminations=None, truncations=None, rewards=None, intervene_actions=None, intervene_flags=None)#
- Parameters:
obs (dict[str, Any])
final_obs (dict[str, Any] | None)
dones (Tensor | None)
terminations (Tensor | None)
truncations (Tensor | None)
rewards (Tensor | None)
intervene_actions (Tensor | None)
intervene_flags (Tensor | None)
- Return type:
None
ChunkStepResult#
ChunkStepResult represents per-step inference results and training signals,
including actions, log-probabilities, value estimates, and extra forward inputs.
Tensors are moved to CPU on initialization.
- class rlinf.data.embodied_io_struct.ChunkStepResult#
Model outputs, env outputs (without observations), and training forward inputs for a chunk step.
- __init__(*, actions=None, prev_logprobs=None, prev_values=None, dones=None, truncations=None, terminations=None, rewards=None, forward_inputs=<factory>, versions=None)#
- Parameters:
actions (Tensor)
prev_logprobs (Tensor)
prev_values (Tensor)
dones (Tensor)
truncations (Tensor)
terminations (Tensor)
rewards (Tensor)
forward_inputs (dict[str, Tensor])
versions (Tensor)
- Return type:
None
EmbodiedRolloutResult#
EmbodiedRolloutResult accumulates chunk-step results and transitions during
rollout, and provides conversion utilities:
- append_step_result(): append chunk-step results
- append_transitions(): append current/next transition observations
- to_trajectory(): concatenate into trajectory tensors
- to_splited_trajectories(): split trajectories along the batch dimension
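The accumulate-then-convert pattern behind these utilities can be illustrated with a small stand-in class. The class RolloutAccumulator and its methods are hypothetical and track only two fields. The sketch assumes each appended item has shape [B, ...] and stacks the items on a new leading step axis; the real to_trajectory() may instead concatenate tensors that already carry a step dimension.

```python
import numpy as np


class RolloutAccumulator:
    """Hypothetical stand-in for the accumulate-then-stack pattern."""

    def __init__(self):
        self.actions = []   # one [B, action_dim] array per chunk step
        self.rewards = []   # one [B] array per chunk step

    def append_step(self, actions, rewards):
        # Store per-step outputs in arrival order.
        self.actions.append(actions)
        self.rewards.append(rewards)

    def to_trajectory(self):
        # Stack the T per-step arrays into [T, B, ...] arrays.
        return {
            "actions": np.stack(self.actions, axis=0),
            "rewards": np.stack(self.rewards, axis=0),
        }


B, action_dim = 2, 4
acc = RolloutAccumulator()
for t in range(3):
    acc.append_step(
        actions=np.zeros((B, action_dim)),
        rewards=np.full((B,), float(t)),
    )
traj = acc.to_trajectory()
```

After three appended steps, traj["actions"] has shape (3, 2, 4) and traj["rewards"] has shape (3, 2), matching the [T, B, ...] layout used by Trajectory.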
- class rlinf.data.embodied_io_struct.EmbodiedRolloutResult#
Collect chunk-step results and transitions during rollout, and convert them into trajectory tensors.
- __init__(*, max_episode_length=0, model_weights_id='', actions=<factory>, intervene_flags=<factory>, rewards=<factory>, terminations=<factory>, truncations=<factory>, dones=<factory>, prev_logprobs=<factory>, prev_values=<factory>, versions=<factory>, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)#
- Parameters:
max_episode_length (int)
model_weights_id (str)
actions (list[Tensor])
intervene_flags (list[Tensor])
rewards (list[Tensor])
terminations (list[Tensor])
truncations (list[Tensor])
dones (list[Tensor])
prev_logprobs (list[Tensor])
prev_values (list[Tensor])
versions (list[Tensor])
forward_inputs (list[dict[str, Any]])
curr_obs (list[dict[str, Any]])
next_obs (list[dict[str, Any]])
- Return type:
None
Trajectory#
Trajectory is the final trajectory representation for training. It includes
actions, rewards, termination flags, observations, and model forward inputs.
The typical tensor shape is [T, B, ...], where T is the chunk-step count
and B is the number of parallel environments (batch dimension).
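Because the step axis of one environment can span several episodes, the dones field is what separates them. The sketch below shows how the [T, B] layout can be read. It assumes that dones[t, b] being True marks the chunk step at which an episode ends in environment b and that dones has exactly shape [T, B]; the actual tensors may carry extra trailing dimensions. NumPy stands in for torch.

```python
import numpy as np

T, B = 5, 2
dones = np.zeros((T, B), dtype=bool)
dones[1, 0] = True   # environment 0 finishes an episode at step 1
dones[4, 0] = True   # ... and another one at step 4
dones[3, 1] = True   # environment 1 finishes an episode at step 3

# Number of episodes that ended in each environment, shape [B].
episodes_finished = dones.sum(axis=0)

# Step indices at which each environment's episodes ended.
boundary_steps = [np.flatnonzero(dones[:, b]).tolist() for b in range(B)]
```

Here environment 0 contributes two finished episodes and environment 1 contributes one, all stored in the same [T, B] arrays.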
- class rlinf.data.embodied_io_struct.Trajectory#
A trajectory that can contain multiple episodes.
- __init__(max_episode_length=0, model_weights_id='', actions=None, intervene_flags=None, rewards=None, terminations=None, truncations=None, dones=None, prev_logprobs=None, prev_values=None, versions=None, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)#
- Parameters:
max_episode_length (int)
model_weights_id (str)
actions (Tensor)
intervene_flags (Tensor)
rewards (Tensor)
terminations (Tensor)
truncations (Tensor)
dones (Tensor)
prev_logprobs (Tensor)
prev_values (Tensor)
versions (Tensor)
forward_inputs (dict[str, Any])
curr_obs (dict[str, Any])
next_obs (dict[str, Any])
- Return type:
None