Data Collection#

RLinf provides two data collection approaches targeting different downstream use cases:

Approach

Entry Point

Typical Use

Episode Collection

CollectEpisode wrapper

Reward model / value model training data

Real-robot Replay Buffer Collection

collect_data.sh

Real-robot RLPD prior data / policy initialization


Episode Data Collection#

CollectEpisode is a gymnasium.Wrapper that transparently wraps any environment and automatically records step-level data during RL training or evaluation, saving each completed episode to disk asynchronously.

Two output formats are supported:

  • pickle β€” saves the complete raw buffer; suited for custom offline processing.

  • lerobot β€” saves structured Parquet files with metadata; directly compatible with the LeRobot training pipeline.

Key Features#

  • Supports both single environments and vectorized parallel environments (num_envs > 1).

  • Compatible with auto-reset environments: the final pre-reset observation is correctly attributed to the current episode, and the post-reset observation is carried over to the next episode.

  • All write operations run asynchronously in a background thread so they never block the RL training loop.

  • The LeRobot writer is lazily initialized on the first episode write, with image shape, state dimension, and action dimension inferred automatically.

  • Set only_success=True to filter out failed episodes and save disk space.

Constructor Arguments#

Argument

Type

Default

Description

env

gym.Env

β€”

The gymnasium environment to wrap

save_dir

str

β€”

Directory for saving episode data (created automatically)

rank

int

0

Worker rank for unique file naming in distributed settings

num_envs

int

1

Number of parallel environments

show_goal_site

bool

True

Show goal-site visualization in renders (for environments that support it)

export_format

str

"pickle"

Output format: "pickle" or "lerobot"

robot_type

str

"panda"

Robot type written to LeRobot metadata (lerobot format only)

fps

int

10

Dataset frame rate written to LeRobot metadata (lerobot format only)

only_success

bool

False

Save only successful episodes

stats_sample_ratio

float

0.1

Image sampling ratio for incremental statistics (lerobot format only)

finalize_interval

int

100

Call writer.finalize() every N completed episodes as a checkpoint (0 disables; lerobot format only)

Usage Examples#

Direct Python API:

from rlinf.envs.wrappers.collect_episode import CollectEpisode

env = CollectEpisode(
    env=base_env,
    save_dir="./collected_data",
    num_envs=8,
    export_format="lerobot",   # or "pickle"
    robot_type="panda",
    fps=10,
    only_success=True,
)

obs, info = env.reset()
while not done:
    action = policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()   # triggers final flush and finalize

Via YAML configuration (simulation training):

Add a data_collection block under env in your YAML config:

env:
  group_name: "EnvGroup"
  enable_offload: False

  data_collection:
    enabled: True
    save_dir: ${runner.logger.log_path}/collected_data
    export_format: "lerobot"      # or "pickle"
    only_success: True
    robot_type: "panda"
    fps: 10

Then run the training script as usual; data is collected automatically:

bash examples/embodiment/run_embodiment.sh maniskill_ppo_mlp_collect

Data Format Details#

pickle format

Each episode is saved as a separate .pkl file with the naming convention:

rank_{rank}_env_{env_idx}_episode_{episode_id}_{success|fail}.pkl

Example: rank_0_env_3_episode_42_success.pkl

The file contains a single dictionary:

{
    "rank":        int,   # worker rank
    "env_idx":     int,   # environment index
    "episode_id":  int,   # episode counter (per-env, monotonically increasing)
    "success":     bool,  # whether the episode succeeded
    "observations": list, # length = num_steps + 1 (includes the initial reset obs)
    "actions":     list,  # length = num_steps
    "rewards":     list,  # length = num_steps
    "terminated":  list,  # length = num_steps
    "truncated":   list,  # length = num_steps
    "infos":       list,  # length = num_steps
}

Note

The pickle format preserves the raw buffer exactly as recorded, making it suitable for custom offline RL or behaviour analysis pipelines. Index 0 in observations comes from reset(); indices 1 through N come from successive step() calls.

LeRobot format

Data is stored as Parquet files alongside JSON metadata files:

save_dir/
β”œβ”€β”€ meta/
β”‚   β”œβ”€β”€ info.json           # dataset metadata (fps, robot_type, dimensions, …)
β”‚   β”œβ”€β”€ episodes.jsonl      # per-episode length and task description
β”‚   β”œβ”€β”€ tasks.jsonl         # deduplicated task list
β”‚   └── stats.json          # mean / std statistics for observations and actions
└── data/
    └── chunk-000/
        β”œβ”€β”€ episode_000000.parquet
        β”œβ”€β”€ episode_000001.parquet
        └── ...

Parquet column schema:

Column

Description

image

Main camera image (bytes + path), uint8

wrist_image

Wrist camera image (bytes + path), uint8; empty when no wrist camera

state

Robot state vector, float32[state_dim]

actions

Action vector, float32[action_dim]

timestamp

Frame timestamp in seconds, float

frame_index

Frame index within the episode, int64

episode_index

Global episode index, int64

index

Global frame index, int64

task_index

Task index (references tasks.jsonl), int64

done

Per-step done flag, bool (True on the last step of each episode)

is_success

Whether the episode succeeded, bool

Observation key lookup order (first match wins):

Field

Keys checked (in priority order)

Main image

main_images β†’ image β†’ full_image

Wrist image

wrist_images β†’ wrist_image

State

states β†’ state

Images are automatically converted to uint8 (float [0, 1] arrays are multiplied by 255; out-of-range arrays are cast directly).

Success Detection Logic#

The wrapper scans info dicts in reverse step order (most recent first). For each step’s info dict, it checks three sources in order β€” final_info β†’ episode β†’ root info β€” and within each source looks for keys in order success_once β†’ success_at_end β†’ success:

  1. info["final_info"]["success_once"] / success_at_end / success

  2. info["episode"]["success_once"] / success_at_end / success

  3. info["success_once"] / info["success_at_end"] / info["success"]

If none of the above keys is found across all recorded steps, the wrapper falls back to the incrementally maintained _episode_success flag updated at each step.


Real-robot Replay Buffer Collection#

Real-robot collection is used for RLPD (Reinforcement Learning from Prior Data) or policy initialization. An operator uses a SpaceMouse or similar device to demonstrate successful task completions; data is saved in TrajectoryReplayBuffer format for direct use in subsequent real-robot training.

Unlike large-scale parallel simulation collection, real-robot collection runs on a single control node and stops automatically once the target number of successful demonstrations is reached.

Core Components#

  • Entry script: examples/embodiment/collect_data.sh

  • Collection logic: examples/embodiment/collect_real_data.py (DataCollector class)

  • Config file: examples/embodiment/config/realworld_collect_data.yaml

DataCollector workflow:

  1. Initialise RealWorldEnv and TrajectoryReplayBuffer.

  2. Loop over steps, reading the SpaceMouse intervention action from info["intervene_action"].

  3. Construct a ChunkStepResult and append it to EmbodiedRolloutResult.

  4. When an episode ends (done=True) with a positive reward, count it as a success and write the trajectory to the buffer.

  5. Stop automatically once num_data_episodes successes have been collected and finalise the buffer.

Configuration Parameters#

Parameter

Default

Description

runner.num_data_episodes

20

Target number of successful demonstrations; stops when reached

cluster.node_groups.hardware.configs.robot_ip

β€”

IP address of the Franka robot

env.eval.use_spacemouse

True

Enable SpaceMouse intervention

env.eval.override_cfg.target_ee_pose

β€”

Target end-effector pose [x, y, z, rx, ry, rz]

env.eval.override_cfg.success_hold_steps

1

Number of consecutive steps at goal pose required to declare success

runner.record_task_description

True

Whether to include the task description string in observations

Data Format#

After collection, data is saved to:

logs/{timestamp}/demos/

TrajectoryReplayBuffer stores each trajectory as a .pt file. Each trajectory contains:

{
    "transitions": {
        "obs": {
            "states":      # robot state, shape=[T, 19] (pose, torques, …)
            "main_images"  # main camera images, shape=[T, 128, 128, 3], uint8
        },
        "next_obs": {
            "states":      # next-step robot state
            "main_images"  # next-step camera images
        },
        "action":          # action, shape=[T, 6]
        "rewards":         # reward, shape=[T, 1]
        "dones":           # done flag, shape=[T, 1], bool
        "terminations":    # termination flag, shape=[T, 1], bool
        "truncations":     # truncation flag, shape=[T, 1], bool
    },
    "intervene_flags":     # all ones, marking this trajectory as expert data
}

Note

intervene_flags is set to all ones to mark the trajectory as an expert demonstration. During RLPD training this flag distinguishes prior data from online policy rollouts.

Usage Steps#

  1. Activate the environment on the control node:

    source <path_to_your_venv>/bin/activate
    
  2. Edit examples/embodiment/config/realworld_collect_data.yaml to set robot_ip and target_ee_pose:

    cluster:
      node_groups:
        hardware:
          configs:
            robot_ip: "192.168.1.100"   # replace with actual IP
    
    env:
      eval:
        use_spacemouse: True
        override_cfg:
          target_ee_pose: [0.5, 0.0, 0.3, 0.0, 3.14, 0.0]
          success_hold_steps: 3
    
    runner:
      num_data_episodes: 50
    
  3. Launch collection (an optional first argument overrides the config name):

    bash examples/embodiment/collect_data.sh
    # or with a custom config name:
    bash examples/embodiment/collect_data.sh my_custom_config
    
  4. Use the SpaceMouse to operate the robot. Once num_data_episodes successes are recorded the script saves the buffer and exits. Logs and data are written under logs/{timestamp}/.


Best Practices#

Episode collection (CollectEpisode)

  • Image data is large. If disk space is limited, use only_success=True to discard failed episodes.

  • When using the LeRobot format, stats_sample_ratio controls the fraction of frames used to compute per-channel statistics. Lowering it reduces memory usage at the cost of slightly less accurate statistics.

  • In distributed training, assign each worker a unique rank to prevent filename collisions.

Real-robot replay buffer collection

  • Prioritise trajectory quality. If the success rate is low, relax success_hold_steps or set a more tolerant target_ee_pose.

  • After collection, load the buffer with TrajectoryReplayBuffer.load() to verify the trajectory count before launching training.

  • To append additional demonstrations, re-run the script pointing to the same demos directory. With auto_save=True, the buffer writes incrementally without overwriting existing trajectories.