Embodied Data Interface

This section describes the core data structures used during rollout and training in embodied settings: EnvOutput, ChunkStepResult, EmbodiedRolloutResult, and Trajectory. Together, they connect environment outputs, chunk-step accumulation, trajectory construction, and training batches.

Relationships

EnvOutput: raw environment outputs per chunk step (obs, reward, done, etc.).
ChunkStepResult: model inference outputs and reward signals per chunk step.
EmbodiedRolloutResult: accumulates chunk-step results and transitions.
Trajectory: aggregated trajectory tensors (typically [T, B, ...]).

EmbodiedRolloutResult.to_splited_trajectories() can split trajectories along the batch dimension so that they can be sent through a Channel to multiple Actor/Trainer workers.
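As a rough illustration of what splitting along the batch dimension means for a [T, B] field, the sketch below slices every chunk step into equal shards. It is not the RLinf implementation: nested Python lists stand in for tensors, and split_along_batch and num_splits are hypothetical names chosen for this example.

```python
from typing import Any


def split_along_batch(traj: list[list[Any]], num_splits: int) -> list[list[list[Any]]]:
    """Split a [T][B] nested list into num_splits pieces of shape [T][B // num_splits].

    Hypothetical helper: the real to_splited_trajectories() signature is not
    shown in this section, so this only illustrates the batch-wise slicing idea.
    """
    batch_size = len(traj[0])
    if batch_size % num_splits != 0:
        raise ValueError("batch size must be divisible by num_splits")
    shard = batch_size // num_splits
    return [
        # Keep every chunk step (T), but only this shard's slice of environments (B).
        [step[i * shard:(i + 1) * shard] for step in traj]
        for i in range(num_splits)
    ]


# T = 3 chunk steps, B = 4 parallel environments.
rewards = [
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
]
shards = split_along_batch(rewards, num_splits=2)
```

Each shard keeps all T chunk steps, so a worker that receives one shard sees complete trajectories for its subset of environments.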

EnvOutput

EnvOutput describes environment-side outputs, including observations and episode-termination signals. During initialization, tensors are moved to CPU and made contiguous.

class rlinf.data.embodied_io_struct.EnvOutput

Environment output for a single chunk step.

static merge_env_outputs(env_outputs)

Merge multiple env output dicts into one batch-aligned env output.

Merge strategy:

Tensor fields: concatenate on batch dimension.
List fields: flatten in source order.
None fields: keep None.
final_obs: may be None in some shards. For shards without final_obs, the corresponding obs is used as a fallback to keep the batch aligned.

Parameters: env_outputs (list[dict]) – Per-source env output dicts that share the same schema.
Returns: A merged env output dict produced via EnvOutput(...).to_dict().
Return type: dict[str, Any]
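The merge strategy above can be sketched in plain Python. This is a minimal illustration and not the RLinf code: nested lists stand in for tensors (one row per environment), merge_env_outputs_sketch and concat_batch are hypothetical names, only a few fields are handled, and fields other than final_obs are assumed to be either None in every shard or present in every shard.

```python
from typing import Any


def concat_batch(parts: list[list[Any]]) -> list[Any]:
    """Stand-in for concatenating tensors along the batch dimension."""
    return [row for part in parts for row in part]


def merge_env_outputs_sketch(env_outputs: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-source env output dicts that share the same schema (illustrative only)."""
    merged: dict[str, Any] = {}
    obs_keys = env_outputs[0]["obs"].keys()

    # Tensor-like fields inside obs: concatenate on the batch dimension.
    merged["obs"] = {
        k: concat_batch([o["obs"][k] for o in env_outputs]) for k in obs_keys
    }

    # final_obs may be None for some shards; fall back to that shard's obs so
    # the merged batch stays aligned. Assumption: if no shard has final_obs,
    # the merged value stays None.
    if all(o["final_obs"] is None for o in env_outputs):
        merged["final_obs"] = None
    else:
        filled = [
            o["final_obs"] if o["final_obs"] is not None else o["obs"]
            for o in env_outputs
        ]
        merged["final_obs"] = {
            k: concat_batch([f[k] for f in filled]) for k in obs_keys
        }

    # Remaining fields: keep None, otherwise concatenate in source order.
    for key in ("dones", "rewards"):
        values = [o[key] for o in env_outputs]
        merged[key] = None if values[0] is None else concat_batch(values)
    return merged


shard_a = {"obs": {"state": [[1.0], [2.0]]}, "final_obs": None,
           "dones": [False, False], "rewards": None}
shard_b = {"obs": {"state": [[3.0]]}, "final_obs": {"state": [[9.0]]},
           "dones": [True], "rewards": None}
merged = merge_env_outputs_sketch([shard_a, shard_b])
```

In this example the first shard has no final_obs, so its two obs rows fill the first two slots of the merged final_obs, and the merged batch size is 3 for every field that is not None.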

__init__(*, obs, final_obs=None, dones=None, terminations=None, truncations=None, rewards=None, intervene_actions=None, intervene_flags=None)

Parameters:

obs (dict[str, Any])
final_obs (dict[str, Any] | None)
dones (Tensor | None)
terminations (Tensor | None)
truncations (Tensor | None)
rewards (Tensor | None)
intervene_actions (Tensor | None)
intervene_flags (Tensor | None)

Return type:

None

ChunkStepResult

ChunkStepResult holds the inference results and training signals for one chunk step, including actions, log-probabilities, value estimates, and extra forward inputs. Tensors are moved to CPU on initialization.

class rlinf.data.embodied_io_struct.ChunkStepResult

Model outputs, env outputs (without observations), and training forward inputs for a chunk step.

__init__(*, actions=None, prev_logprobs=None, prev_values=None, dones=None, truncations=None, terminations=None, rewards=None, forward_inputs=<factory>, versions=None)

Parameters:

actions (Tensor)
prev_logprobs (Tensor)
prev_values (Tensor)
dones (Tensor)
truncations (Tensor)
terminations (Tensor)
rewards (Tensor)
forward_inputs (dict[str, Tensor])
versions (Tensor)

Return type:

None

EmbodiedRolloutResult

EmbodiedRolloutResult accumulates chunk-step results and transitions during rollout, and provides conversion utilities:

append_step_result(): append chunk-step results
append_transitions(): append current/next transition observations
to_trajectory(): concatenate into trajectory tensors
to_splited_trajectories(): split trajectories along the batch dimension
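The accumulate-then-convert pattern behind these methods can be sketched as follows. RolloutBufferSketch is a hypothetical stand-in that tracks only two fields; the method names mirror the list above, but their arguments here are assumptions, since this section does not show the real signatures. Nested lists again stand in for tensors.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RolloutBufferSketch:
    """Hypothetical, reduced stand-in for EmbodiedRolloutResult."""

    # One list entry per chunk step; each entry holds B per-environment values.
    actions: list[list[Any]] = field(default_factory=list)
    rewards: list[list[float]] = field(default_factory=list)

    def append_step_result(self, actions: list[Any], rewards: list[float]) -> None:
        """Record one chunk step (assumed arguments, for illustration only)."""
        self.actions.append(actions)
        self.rewards.append(rewards)

    def to_trajectory(self) -> dict[str, list]:
        """Return [T][B] nested lists; with real tensors this step would stack them."""
        return {"actions": list(self.actions), "rewards": list(self.rewards)}


buffer = RolloutBufferSketch()
for t in range(3):  # T = 3 chunk steps
    # B = 2 parallel environments per chunk step.
    buffer.append_step_result(actions=[t, t + 10], rewards=[float(t), t + 0.5])

traj = buffer.to_trajectory()
num_steps, batch_size = len(traj["rewards"]), len(traj["rewards"][0])
```

The first index of each resulting field is the chunk step and the second is the environment, which matches the [T, B, ...] layout described for Trajectory.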

class rlinf.data.embodied_io_struct.EmbodiedRolloutResult

Collect chunk-step results and transitions during rollout, and convert them into trajectory tensors.

__init__(*, max_episode_length=0, model_weights_id='', actions=<factory>, intervene_flags=<factory>, rewards=<factory>, terminations=<factory>, truncations=<factory>, dones=<factory>, prev_logprobs=<factory>, prev_values=<factory>, versions=<factory>, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)

Parameters:

max_episode_length (int)
model_weights_id (str)
actions (list[Tensor])
intervene_flags (list[Tensor])
rewards (list[Tensor])
terminations (list[Tensor])
truncations (list[Tensor])
dones (list[Tensor])
prev_logprobs (list[Tensor])
prev_values (list[Tensor])
versions (list[Tensor])
forward_inputs (list[dict[str, Any]])
curr_obs (list[dict[str, Any]])
next_obs (list[dict[str, Any]])

Return type:

None

Trajectory

Trajectory is the final trajectory representation for training. It includes actions, rewards, termination flags, observations, and model forward inputs. The typical tensor shape is [T, B, ...], where T is the chunk-step count and B is the number of parallel environments (batch dimension).

class rlinf.data.embodied_io_struct.Trajectory

A trajectory that may contain multiple episodes.

__init__(max_episode_length=0, model_weights_id='', actions=None, intervene_flags=None, rewards=None, terminations=None, truncations=None, dones=None, prev_logprobs=None, prev_values=None, versions=None, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)

Parameters:

max_episode_length (int)
model_weights_id (str)
actions (Tensor)
intervene_flags (Tensor)
rewards (Tensor)
terminations (Tensor)
truncations (Tensor)
dones (Tensor)
prev_logprobs (Tensor)
prev_values (Tensor)
versions (Tensor)
forward_inputs (dict[str, Any])
curr_obs (dict[str, Any])
next_obs (dict[str, Any])

Return type:

None