Embodied Data Interface#

This section describes the core data structures used during rollout and training in embodied settings: EnvOutput, ChunkStepResult, EmbodiedRolloutResult, and Trajectory. Together, they connect environment outputs, chunk-step accumulation, trajectory construction, and training batches.

Relationships#

  • EnvOutput: raw environment outputs per chunk step (obs, reward, done, etc.).

  • ChunkStepResult: model inference outputs and reward signals per chunk step.

  • EmbodiedRolloutResult: accumulates chunk-step results and transitions.

  • Trajectory: aggregated trajectory tensors (typically [T, B, ...]).

EmbodiedRolloutResult.to_splited_trajectories() splits trajectories along the batch dimension so that they can be distributed through a Channel to multiple Actor/Trainer workers.
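As a rough illustration of splitting along the batch dimension, the sketch below uses nested Python lists as a stand-in for a [T, B] tensor. The helper name split_along_batch, the num_splits parameter, and the requirement that B divide evenly are assumptions made for this example; they are not taken from the rlinf implementation.

```python
from typing import Any


def split_along_batch(field: list[list[Any]], num_splits: int) -> list[list[list[Any]]]:
    """Split a [T][B] nested list into num_splits pieces along B.

    Hypothetical helper: the outer index is the chunk step (T), the inner
    index is the parallel environment (B).
    """
    if not field:
        raise ValueError("field must contain at least one chunk step")
    batch_size = len(field[0])
    if batch_size % num_splits != 0:
        # Assumption for this sketch: only even splits are supported.
        raise ValueError("batch size must be divisible by num_splits")
    per_split = batch_size // num_splits
    return [
        [step[i * per_split:(i + 1) * per_split] for step in field]
        for i in range(num_splits)
    ]


# Rewards for T=2 chunk steps and B=4 parallel environments.
rewards = [
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
]
shards = split_along_batch(rewards, num_splits=2)
```

Each shard keeps all T chunk steps but only its own slice of environments, so every worker receives complete per-environment time series.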

EnvOutput#

EnvOutput describes environment-side outputs, including observations and episode-termination signals. During initialization, tensors are moved to CPU and made contiguous.

class rlinf.data.embodied_io_struct.EnvOutput#

Environment output for a single chunk step.

static merge_env_outputs(env_outputs)#

Merge multiple env output dicts into one batch-aligned env output.

Merge strategy:

  • Tensor fields: concatenate on batch dimension.

  • List fields: flatten in source order.

  • None fields: keep None.

  • final_obs may be None on some shards. For shards without final_obs, the corresponding obs is used as a fallback to keep the batch aligned.

Parameters:

env_outputs (list[dict]) – Per-source env output dicts that share the same schema.

Returns:

A merged env output dict produced via EnvOutput(...).to_dict().

Return type:

dict[str, Any]
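The merge strategy listed above can be sketched in plain Python. This is a simplified stand-in, not the rlinf code: lists play the role of tensors (the outer list index is the batch dimension), only obs, final_obs, and rewards are handled, and the names merge_batched and merge_env_outputs_sketch are hypothetical. Raising an error when ordinary fields disagree on being None is an assumption of this sketch.

```python
from typing import Any, Optional


def merge_batched(values: list[Optional[list]]) -> Optional[list]:
    """Concatenate per-shard batches in source order; keep None if all are None."""
    if all(v is None for v in values):
        return None
    if any(v is None for v in values):
        # Assumption: shards share one schema, so mixed None is an error here.
        raise ValueError("shards disagree on whether this field is present")
    merged: list = []
    for v in values:
        merged.extend(v)
    return merged


def merge_env_outputs_sketch(env_outputs: list[dict[str, Any]]) -> dict[str, Any]:
    obs_keys = list(env_outputs[0]["obs"].keys())
    merged_obs = {
        k: merge_batched([eo["obs"][k] for eo in env_outputs]) for k in obs_keys
    }

    if all(eo.get("final_obs") is None for eo in env_outputs):
        merged_final_obs = None
    else:
        # Shards without final_obs fall back to their own obs so that the
        # merged final_obs has the same batch size as the merged obs.
        per_shard = [
            eo["final_obs"] if eo.get("final_obs") is not None else eo["obs"]
            for eo in env_outputs
        ]
        merged_final_obs = {
            k: merge_batched([fo[k] for fo in per_shard]) for k in obs_keys
        }

    return {
        "obs": merged_obs,
        "final_obs": merged_final_obs,
        "rewards": merge_batched([eo.get("rewards") for eo in env_outputs]),
    }


shard_a = {"obs": {"state": [[0.0], [1.0]]}, "final_obs": None, "rewards": [0.5, 0.6]}
shard_b = {"obs": {"state": [[2.0]]}, "final_obs": {"state": [[9.0]]}, "rewards": [0.7]}
merged = merge_env_outputs_sketch([shard_a, shard_b])
```

In this example shard_a has no final_obs, so its two obs rows fill the first two slots of the merged final_obs, followed by shard_b's real final observation.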

__init__(*, obs, final_obs=None, dones=None, terminations=None, truncations=None, rewards=None, intervene_actions=None, intervene_flags=None)#

Parameters:

  • obs (dict[str, Any])

  • final_obs (dict[str, Any] | None)

  • dones (Tensor | None)

  • terminations (Tensor | None)

  • truncations (Tensor | None)

  • rewards (Tensor | None)

  • intervene_actions (Tensor | None)

  • intervene_flags (Tensor | None)

Return type:

None

ChunkStepResult#

ChunkStepResult represents the inference results and training signals for one chunk step, including actions, log-probabilities, value estimates, and extra forward inputs. Tensors are moved to CPU on initialization.

class rlinf.data.embodied_io_struct.ChunkStepResult#

Model outputs, env outputs (without observations), and training forward inputs for a chunk step.

__init__(*, actions=None, prev_logprobs=None, prev_values=None, dones=None, truncations=None, terminations=None, rewards=None, forward_inputs=<factory>, versions=None)#

Parameters:

  • actions (Tensor)

  • prev_logprobs (Tensor)

  • prev_values (Tensor)

  • dones (Tensor)

  • truncations (Tensor)

  • terminations (Tensor)

  • rewards (Tensor)

  • forward_inputs (dict[str, Tensor])

  • versions (Tensor)

Return type:

None

EmbodiedRolloutResult#

EmbodiedRolloutResult accumulates chunk-step results and transitions during rollout, and provides methods for appending data and converting it into trajectories:

  • append_step_result(): append chunk-step results

  • append_transitions(): append current/next transition observations

  • to_trajectory(): concatenate into trajectory tensors

  • to_splited_trajectories(): split trajectories along the batch dimension
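The accumulate-then-convert pattern behind these methods can be sketched with a small stand-in class. RolloutAccumulatorSketch is hypothetical and tracks only actions and rewards; it assumes each chunk step contributes one entry of batch size B along a leading T axis, which may differ in detail from how the real to_trajectory() builds its tensors.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RolloutAccumulatorSketch:
    """Hypothetical stand-in for the accumulator; not the rlinf class."""

    actions: list[list[Any]] = field(default_factory=list)
    rewards: list[list[float]] = field(default_factory=list)

    def append_step_result(self, actions: list[Any], rewards: list[float]) -> None:
        """Append one chunk step; every step must share the same batch size."""
        if len(actions) != len(rewards):
            raise ValueError("actions and rewards must have the same batch size")
        if self.rewards and len(rewards) != len(self.rewards[0]):
            raise ValueError("batch size changed between chunk steps")
        self.actions.append(actions)
        self.rewards.append(rewards)

    def to_trajectory(self) -> dict[str, list]:
        """Return copies laid out as [T][B]: outer index is the chunk step."""
        return {
            "actions": [list(step) for step in self.actions],
            "rewards": [list(step) for step in self.rewards],
        }


rollout = RolloutAccumulatorSketch()
for t in range(3):  # T = 3 chunk steps, B = 2 environments
    rollout.append_step_result(
        actions=[f"a{t}_env0", f"a{t}_env1"],
        rewards=[0.1 * t, 0.2 * t],
    )
trajectory = rollout.to_trajectory()
```

Keeping per-field lists during rollout and converting once at the end avoids rebuilding the full trajectory after every chunk step.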

class rlinf.data.embodied_io_struct.EmbodiedRolloutResult#

Collect chunk-step results and transitions during rollout, and convert them into trajectory tensors.

__init__(*, max_episode_length=0, model_weights_id='', actions=<factory>, intervene_flags=<factory>, rewards=<factory>, terminations=<factory>, truncations=<factory>, dones=<factory>, prev_logprobs=<factory>, prev_values=<factory>, versions=<factory>, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)#

Parameters:

  • max_episode_length (int)

  • model_weights_id (str)

  • actions (list[Tensor])

  • intervene_flags (list[Tensor])

  • rewards (list[Tensor])

  • terminations (list[Tensor])

  • truncations (list[Tensor])

  • dones (list[Tensor])

  • prev_logprobs (list[Tensor])

  • prev_values (list[Tensor])

  • versions (list[Tensor])

  • forward_inputs (list[dict[str, Any]])

  • curr_obs (list[dict[str, Any]])

  • next_obs (list[dict[str, Any]])

Return type:

None

Trajectory#

Trajectory is the final trajectory representation for training. It includes actions, rewards, termination flags, observations, and model forward inputs. The typical tensor shape is [T, B, ...], where T is the chunk-step count and B is the number of parallel environments (batch dimension).
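To make the [T, B, ...] layout concrete, the snippet below indexes a small rewards field stored as nested lists. The data values are invented for illustration, and nested lists are only a stand-in for the tensors held by Trajectory.

```python
# rewards laid out as [T][B]: 3 chunk steps, 2 parallel environments.
rewards = [
    [0.0, 1.0],  # chunk step 0
    [0.5, 1.5],  # chunk step 1
    [1.0, 2.0],  # chunk step 2
]

num_chunk_steps = len(rewards)  # T
num_envs = len(rewards[0])      # B

# Time series for environment 1: fix the batch index, walk over T.
env1_series = [step[1] for step in rewards]

# Transpose to [B][T] when per-environment processing is more convenient.
per_env = [list(series) for series in zip(*rewards)]
```

The first index always selects the chunk step and the second selects the environment, so a slice along the second index isolates one environment's history.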

class rlinf.data.embodied_io_struct.Trajectory#

A trajectory containing multiple episodes.

__init__(max_episode_length=0, model_weights_id='', actions=None, intervene_flags=None, rewards=None, terminations=None, truncations=None, dones=None, prev_logprobs=None, prev_values=None, versions=None, forward_inputs=<factory>, curr_obs=<factory>, next_obs=<factory>)#

Parameters:

  • max_episode_length (int)

  • model_weights_id (str)

  • actions (Tensor)

  • intervene_flags (Tensor)

  • rewards (Tensor)

  • terminations (Tensor)

  • truncations (Tensor)

  • dones (Tensor)

  • prev_logprobs (Tensor)

  • prev_values (Tensor)

  • versions (Tensor)

  • forward_inputs (dict[str, Any])

  • curr_obs (dict[str, Any])

  • next_obs (dict[str, Any])

Return type:

None