Data Collection
RLinf provides two data collection approaches targeting different downstream use cases:
| Approach | Entry Point | Typical Use |
|---|---|---|
| Episode Collection | CollectEpisode | Reward model / value model training data |
| Real-robot Replay Buffer Collection | examples/embodiment/collect_data.sh | Real-robot RLPD prior data / policy initialization |
Episode Data Collection
CollectEpisode is a gymnasium.Wrapper that transparently wraps any
environment and automatically records step-level data during RL training or
evaluation, saving each completed episode to disk asynchronously.
Two output formats are supported:
pickle: saves the complete raw buffer; suited for custom offline processing.
lerobot: saves structured Parquet files with metadata; directly compatible with the LeRobot training pipeline.
Key Features
Supports both single environments and vectorized parallel environments (num_envs > 1).
Compatible with auto-reset environments: the final pre-reset observation is correctly attributed to the current episode, and the post-reset observation is carried over to the next episode.
All write operations run asynchronously in a background thread so they never block the RL training loop.
The LeRobot writer is lazily initialized on the first episode write, with image shape, state dimension, and action dimension inferred automatically.
LeRobot export can store image and extra_view_image. When extra_view_images is a stacked [N, H, W, C] tensor, the columns are fanned out by index (extra_view_image-0, extra_view_image-1, …).
Set only_success=True to filter out failed episodes and save disk space.
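The auto-reset attribution rule above can be illustrated with a toy single-environment recorder. This is a minimal sketch and not the wrapper's implementation: the `EpisodeRecorder` class and its `final_obs` argument are hypothetical names used only to show which observation ends up in which episode.

```python
class EpisodeRecorder:
    """Toy recorder illustrating auto-reset attribution (hypothetical helper)."""

    def __init__(self, first_obs):
        self.current = [first_obs]  # observations of the episode in progress
        self.finished = []          # completed episodes, oldest first

    def record(self, returned_obs, done, final_obs=None):
        if not done:
            self.current.append(returned_obs)
            return
        # After an auto-reset, `returned_obs` already belongs to the next
        # episode, while `final_obs` is the last observation of the old one.
        self.current.append(final_obs)
        self.finished.append(self.current)
        self.current = [returned_obs]


rec = EpisodeRecorder(first_obs="a0")
rec.record("a1", done=False)
rec.record("b0", done=True, final_obs="a2")  # the env reset itself inside step()
rec.record("b1", done=False)
```

The finished episode ends with its own final observation, and the post-reset observation becomes index 0 of the next episode.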
Constructor Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| env | | - | The gymnasium environment to wrap |
| save_dir | | - | Directory for saving episode data (created automatically) |
| rank | | | Worker rank for unique file naming in distributed settings |
| num_envs | | | Number of parallel environments |
| | | | Show goal-site visualization in renders (for environments that support it) |
| export_format | | | Output format: "pickle" or "lerobot" |
| robot_type | | | Robot type written to LeRobot metadata (lerobot format only) |
| fps | | | Dataset frame rate written to LeRobot metadata (lerobot format only) |
| only_success | | | Save only successful episodes |
| | | | Call |
Usage Examples
Direct Python API:
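from rlinf.envs.wrappers.collect_episode import CollectEpisode

# base_env and policy are placeholders for your own environment and policy.
env = CollectEpisode(
    env=base_env,
    save_dir="./collected_data",
    num_envs=8,
    export_format="lerobot",  # or "pickle"
    robot_type="panda",
    fps=10,
    only_success=True,
)

obs, info = env.reset()
done = False
while not done:
    action = policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    # With num_envs > 1 the flags are batched; stop once any env finishes.
    done = bool(terminated.any() or truncated.any())
env.close()  # triggers final flush and finalize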
Via YAML configuration (simulation training):
Add a data_collection block under env in your YAML config:
env:
  group_name: "EnvGroup"
  enable_offload: False
  eval:
    data_collection:
      enabled: True
      save_dir: ${runner.logger.log_path}/collected_data
      export_format: "lerobot"  # or "pickle"
      only_success: True
      robot_type: "panda"
      fps: 10
Then run the training script as usual; data is collected automatically:
bash examples/embodiment/run_embodiment.sh maniskill_ppo_mlp_collect
Data Format Details
pickle format
Each episode is saved as a separate .pkl file with the naming convention:
rank_{rank}_env_{env_idx}_episode_{episode_id}_{success|fail}.pkl
Example: rank_0_env_3_episode_42_success.pkl
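When post-processing a directory of episodes, the fields encoded in the file name can be recovered with a regular expression. The sketch below is an illustration built only from the naming convention above; `parse_episode_filename` is a hypothetical helper, not part of RLinf.

```python
import re

# Mirrors rank_{rank}_env_{env_idx}_episode_{episode_id}_{success|fail}.pkl
EPISODE_FILE_RE = re.compile(
    r"^rank_(?P<rank>\d+)_env_(?P<env_idx>\d+)"
    r"_episode_(?P<episode_id>\d+)_(?P<outcome>success|fail)\.pkl$"
)


def parse_episode_filename(name):
    """Split an episode file name into its rank, env index, id and outcome."""
    match = EPISODE_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"not an episode file name: {name!r}")
    return {
        "rank": int(match["rank"]),
        "env_idx": int(match["env_idx"]),
        "episode_id": int(match["episode_id"]),
        "success": match["outcome"] == "success",
    }


parsed = parse_episode_filename("rank_0_env_3_episode_42_success.pkl")
```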
The file contains a single dictionary:
{
    "rank": int,            # worker rank
    "env_idx": int,         # environment index
    "episode_id": int,      # episode counter (per-env, monotonically increasing)
    "success": bool,        # whether the episode succeeded
    "observations": list,   # length = num_steps + 1 (includes the initial reset obs)
    "actions": list,        # length = num_steps
    "rewards": list,        # length = num_steps
    "terminated": list,     # length = num_steps
    "truncated": list,      # length = num_steps
    "infos": list,          # length = num_steps
}
Note
The pickle format preserves the raw buffer exactly as recorded, making it
suitable for custom offline RL or behaviour analysis pipelines.
Index 0 in observations comes from reset(); indices 1 through N come
from successive step() calls.
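The length relationship between observations and the per-step lists can be checked when loading an episode. The following sketch writes a synthetic two-step episode to a temporary file and reads it back; `load_episode` is a hypothetical helper, and the string placeholders stand in for real observations and actions.

```python
import pickle
import tempfile
from pathlib import Path


def load_episode(path):
    """Load one episode pickle and verify the documented list lengths."""
    with open(path, "rb") as f:
        episode = pickle.load(f)
    num_steps = len(episode["actions"])
    if len(episode["observations"]) != num_steps + 1:
        raise ValueError("observations must have num_steps + 1 entries")
    for key in ("rewards", "terminated", "truncated", "infos"):
        if len(episode[key]) != num_steps:
            raise ValueError(f"{key} must have num_steps entries")
    return episode


# Synthetic episode with the documented keys (placeholder values).
demo = {
    "rank": 0, "env_idx": 0, "episode_id": 0, "success": True,
    "observations": ["o0", "o1", "o2"],
    "actions": ["a0", "a1"],
    "rewards": [0.0, 1.0],
    "terminated": [False, True],
    "truncated": [False, False],
    "infos": [{}, {}],
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rank_0_env_0_episode_0_success.pkl"
    path.write_bytes(pickle.dumps(demo))
    episode = load_episode(path)

# Pair each action with the observation before and after it.
transitions = list(
    zip(episode["observations"][:-1], episode["actions"], episode["observations"][1:])
)
```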
LeRobot format
Data is stored as Parquet files alongside JSON metadata files:
save_dir/
├── meta/
│   ├── info.json          # dataset metadata (fps, robot_type, dimensions, …)
│   ├── episodes.jsonl     # per-episode length and task description
│   ├── tasks.jsonl        # deduplicated task list
│   └── stats.json         # mean / std statistics for observations and actions
└── data/
    └── chunk-000/
        ├── episode_000000.parquet
        ├── episode_000001.parquet
        └── ...
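A quick way to confirm how many episodes were written is to enumerate the Parquet files in this layout. The sketch below uses only the standard library and builds an empty stand-in directory so it can run anywhere; `list_episode_files` is a hypothetical helper, and reading the Parquet contents themselves would additionally need a Parquet reader.

```python
import tempfile
from pathlib import Path


def list_episode_files(save_dir):
    """Return episode Parquet files sorted by chunk and episode number."""
    return sorted(Path(save_dir).glob("data/chunk-*/episode_*.parquet"))


# Build an empty stand-in for the directory layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    chunk = Path(tmp) / "data" / "chunk-000"
    chunk.mkdir(parents=True)
    for i in range(3):
        (chunk / f"episode_{i:06d}.parquet").touch()
    names = [p.name for p in list_episode_files(tmp)]
```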
Parquet column schema:
| Column | Description |
|---|---|
| image | Main camera image (bytes + path), uint8 |
| extra_view_image | Auxiliary camera image (bytes + path), uint8. Multi-view stacks are fanned out into extra_view_image-0, extra_view_image-1, … |
| | Robot state vector |
| | Action vector |
| | Frame timestamp in seconds |
| | Frame index within the episode |
| | Global episode index |
| | Global frame index |
| | Task index (references tasks.jsonl) |
| | Per-step done flag |
| | Whether the episode succeeded |
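The fan-out of a multi-view stack into indexed columns can be pictured as splitting along the first axis. This is a sketch of the naming scheme only, with `fan_out_views` as a hypothetical helper; nested lists stand in for a real [N, H, W, C] tensor, and the same code works for any object that iterates over its first axis.

```python
def fan_out_views(extra_view_images, prefix="extra_view_image"):
    """Map a stack of N views to columns named '<prefix>-<index>'."""
    return {f"{prefix}-{i}": view for i, view in enumerate(extra_view_images)}


# Nested lists standing in for a stack with N=2, H=1, W=2, C=3.
stack = [
    [[[0, 0, 0], [1, 1, 1]]],
    [[[2, 2, 2], [3, 3, 3]]],
]
columns = fan_out_views(stack)
```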
Observation key lookup order (first match wins):
| Field | Keys checked (in priority order) |
|---|---|
| Main image | |
| Extra-view image | |
| State | |
Images are automatically converted to uint8 (float [0, 1] arrays are multiplied by 255; out-of-range arrays are cast directly).
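The conversion rule can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the wrapper's code: `to_uint8` is a hypothetical helper, and truncation after scaling is an assumption, since the text does not say whether values are rounded.

```python
import numpy as np


def to_uint8(image):
    """Convert an image array to uint8 following the rule described above."""
    arr = np.asarray(image)
    if arr.dtype == np.uint8:
        return arr
    is_unit_float = (
        np.issubdtype(arr.dtype, np.floating)
        and arr.size > 0
        and arr.min() >= 0.0
        and arr.max() <= 1.0
    )
    if is_unit_float:
        # Assumption: plain truncation after scaling; rounding is also plausible.
        return (arr * 255).astype(np.uint8)
    # Anything outside [0, 1] is cast directly.
    return arr.astype(np.uint8)


unit = to_uint8(np.array([0.0, 0.5, 1.0], dtype=np.float32))
raw = to_uint8(np.array([0.0, 200.0], dtype=np.float32))
```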
Success Detection Logic
The wrapper scans info dicts in reverse step order (most recent first). For
each step's info dict, it checks three sources in order (final_info → episode → root info)
and within each source looks for keys
in the order success_once → success_at_end → success:
info["final_info"]["success_once"] / success_at_end / success
info["episode"]["success_once"] / success_at_end / success
info["success_once"] / info["success_at_end"] / info["success"]
If none of the above keys is found across all recorded steps, the wrapper falls
back to the incrementally maintained _episode_success flag updated at each
step.
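The lookup order can be written out as a small function. The sketch below follows the description for a single environment with scalar flags; `detect_success` and its `fallback` argument (standing in for the `_episode_success` flag) are hypothetical, and batched flags from vectorized environments would need per-environment indexing first.

```python
SUCCESS_KEYS = ("success_once", "success_at_end", "success")


def detect_success(infos, fallback=False):
    """Return the first success flag found, scanning the latest step first."""
    for info in reversed(infos):
        # Sources in priority order: final_info, episode, then the root dict.
        for source in (info.get("final_info"), info.get("episode"), info):
            if not isinstance(source, dict):
                continue
            for key in SUCCESS_KEYS:
                if key in source:
                    return bool(source[key])
    # No key found in any step: use the incrementally maintained flag.
    return fallback


infos = [
    {"success": True},
    {"episode": {"success_at_end": False}, "success": True},
]
latest_wins = detect_success(infos)
from_fallback = detect_success([{}, {"reward": 1.0}], fallback=True)
```

In the first call the most recent step is inspected first, and its `episode` entry takes priority over the root-level `success` key.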
Real-robot Replay Buffer Collection
Real-robot collection is used for RLPD (Reinforcement Learning from Prior Data)
or policy initialization. An operator uses a SpaceMouse or GELLO device to
demonstrate successful task completions; data is saved in
TrajectoryReplayBuffer format for direct use in subsequent real-robot training.
Unlike large-scale parallel simulation collection, real-robot collection runs on a single control node and stops automatically once the target number of successful demonstrations is reached.
Core Components
Entry script: examples/embodiment/collect_data.sh
Collection logic: examples/embodiment/collect_real_data.py (DataCollector class)
Config file: examples/embodiment/config/realworld_collect_data.yaml
DataCollector workflow:
Initialise RealWorldEnv and TrajectoryReplayBuffer.
Loop over steps, reading the SpaceMouse intervention action from info["intervene_action"].
Construct a ChunkStepResult and append it to EmbodiedRolloutResult.
When an episode ends (done=True) with reward >= 0.5, count it as a success and write the trajectory to the buffer.
Stop automatically once num_data_episodes successes have been collected and finalise the buffer.
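The control flow of this workflow can be sketched without any robot hardware. The example below is a simplified stand-in, not the DataCollector code: `collect_demos`, `scripted_episode` and `max_attempts` are hypothetical, and a plain list replaces TrajectoryReplayBuffer. Only the 0.5 reward threshold and the stop condition come from the steps above.

```python
SUCCESS_REWARD_THRESHOLD = 0.5  # an episode counts as a success at or above this


def collect_demos(run_episode, num_data_episodes, max_attempts=1000):
    """Run episodes until enough successful trajectories are stored."""
    buffer = []  # stand-in for the replay buffer
    attempts = 0
    while len(buffer) < num_data_episodes and attempts < max_attempts:
        attempts += 1
        trajectory, final_reward = run_episode()
        if final_reward >= SUCCESS_REWARD_THRESHOLD:
            buffer.append(trajectory)  # only successes are written
    return buffer, attempts


# Scripted final rewards replace a real teleoperated episode.
scripted_rewards = iter([0.0, 1.0, 0.2, 1.0, 1.0])


def scripted_episode():
    reward = next(scripted_rewards)
    trajectory = [{"action": [0.0] * 6, "reward": reward, "done": True}]
    return trajectory, reward


demos, attempts = collect_demos(scripted_episode, num_data_episodes=2)
```

Collection stops after the fourth attempt, so the fifth scripted episode is never run.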
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| num_data_episodes | | Target number of successful demonstrations; stops when reached |
| robot_ip | - | IP address of the Franka robot |
| use_spacemouse | | Enable SpaceMouse intervention |
| | | Whether the real-world env uses a 6-DoF action without a gripper dimension |
| | | Enable GELLO teleoperation (mutually exclusive with SpaceMouse) |
| | - | Serial port of the GELLO device (required when GELLO teleoperation is enabled) |
| target_ee_pose | - | Target end-effector pose |
| success_hold_steps | | Number of consecutive steps at goal pose required to declare success |
| | | Whether to include the task description string in observations |
Data Format
After collection, data is saved to:
logs/{timestamp}/demos/
TrajectoryReplayBuffer stores each trajectory as a .pt file.
Each trajectory contains:
{
    "transitions": {
        "obs": {
            "states":       # robot state, shape=[T, 19] (pose, torques, …)
            "main_images":  # main camera images, shape=[T, 128, 128, 3], uint8
        },
        "next_obs": {
            "states":       # next-step robot state
            "main_images":  # next-step camera images
        },
        "action":           # action, shape=[T, 6]
        "rewards":          # reward, shape=[T, 1]
        "dones":            # done flag, shape=[T, 1], bool
        "terminations":     # termination flag, shape=[T, 1], bool
        "truncations":      # truncation flag, shape=[T, 1], bool
    },
    "intervene_flags":      # all ones, marking this trajectory as expert data
}
Note
intervene_flags is set to all ones to mark the trajectory as an expert
demonstration. During RLPD training this flag distinguishes prior data from
online policy rollouts.
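Before training, it can be useful to confirm that every field of a trajectory shares the same leading dimension T. The sketch below checks a dummy trajectory built from NumPy arrays with the shapes listed above; `trajectory_length` is a hypothetical helper, the shape of `intervene_flags` is an assumption, and the same check applies to any array type that exposes `.shape`.

```python
import numpy as np


def trajectory_length(trajectory):
    """Return T after checking that every transition field has T rows."""
    tr = trajectory["transitions"]
    fields = {
        "obs.states": tr["obs"]["states"],
        "obs.main_images": tr["obs"]["main_images"],
        "next_obs.states": tr["next_obs"]["states"],
        "next_obs.main_images": tr["next_obs"]["main_images"],
        "action": tr["action"],
        "rewards": tr["rewards"],
        "dones": tr["dones"],
        "terminations": tr["terminations"],
        "truncations": tr["truncations"],
    }
    lengths = {name: value.shape[0] for name, value in fields.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"inconsistent trajectory lengths: {lengths}")
    return lengths["action"]


T = 4
obs = {
    "states": np.zeros((T, 19)),
    "main_images": np.zeros((T, 128, 128, 3), dtype=np.uint8),
}
dummy = {
    "transitions": {
        "obs": obs,
        "next_obs": obs,
        "action": np.zeros((T, 6)),
        "rewards": np.zeros((T, 1)),
        "dones": np.zeros((T, 1), dtype=bool),
        "terminations": np.zeros((T, 1), dtype=bool),
        "truncations": np.zeros((T, 1), dtype=bool),
    },
    "intervene_flags": np.ones(T),  # assumed shape; the source only says "all ones"
}
length = trajectory_length(dummy)
```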
Collect Replay Buffer And LeRobot Data Together
examples/embodiment/collect_real_data.py now supports writing the real-robot
replay buffer and the CollectEpisode export in the same run. With
env.eval.data_collection.enabled=True, successful demonstrations are saved twice:
logs/{timestamp}/demos/ as TrajectoryReplayBuffer trajectories for RLPD
logs/{timestamp}/collected_data/ as episode files in pickle or LeRobot format
To collect LeRobot-format data while still building the replay buffer, keep the real-world collection config like this:
env:
  eval:
    data_collection:
      enabled: True
      save_dir: ${runner.logger.log_path}/collected_data
      export_format: "lerobot"
      only_success: True
      robot_type: "panda"
      fps: 10
Usage Steps
Activate the environment on the control node:
source <path_to_your_venv>/bin/activate
Edit examples/embodiment/config/realworld_collect_data.yaml to replace ROBOT_IP and TARGET_EE_POSE with your actual robot IP and target pose:
cluster:
  node_groups:
    hardware:
      configs:
        robot_ip: "192.168.1.100"  # replace with actual IP
env:
  eval:
    use_spacemouse: True
    override_cfg:
      target_ee_pose: [0.5, 0.0, 0.3, 0.0, 3.14, 0.0]
      success_hold_steps: 3
runner:
  num_data_episodes: 50
Launch collection (an optional first argument overrides the config name):
bash examples/embodiment/collect_data.sh
# or with a custom config name:
bash examples/embodiment/collect_data.sh my_custom_config
Use the SpaceMouse (or GELLO) to operate the robot. Once num_data_episodes successes are recorded, the script saves the buffer and exits. Logs and data are written under logs/{timestamp}/.
To use GELLO instead of SpaceMouse, use the dedicated config:
bash examples/embodiment/collect_data.sh realworld_collect_data_gello
See Real-World RL with Franka for GELLO setup details.
Best Practices
Episode collection (CollectEpisode)
Image data is large. If disk space is limited, use only_success=True to discard failed episodes.
In distributed training, assign each worker a unique rank to prevent filename collisions.
Real-robot replay buffer collection
Prioritise trajectory quality. If the success rate is low, relax success_hold_steps or set a more tolerant target_ee_pose.
After collection, load the buffer with TrajectoryReplayBuffer.load() to verify the trajectory count before launching training.
To append additional demonstrations, re-run the script pointing to the same demos directory. With auto_save=True, the buffer writes incrementally without overwriting existing trajectories.
Visualization Tools
After collection, you can inspect both output formats directly from the saved
artifacts under logs/{timestamp}/.
Replay buffer trajectories
Use the existing replay-buffer visualizer to inspect trajectories in
logs/{timestamp}/demos/:
python toolkits/replay_buffer/visualize.py \
--replay_dir logs/{timestamp}/demos
LeRobot datasets
Use toolkits/lerobot/visualize_lerobot_dataset.py to expand a LeRobot
dataset into per-episode folders containing .jpg images and .txt step
metadata:
python toolkits/lerobot/visualize_lerobot_dataset.py \
--dataset-path logs/{timestamp}/collected_data \
--output-dir logs/{timestamp}/collected_data_visualized
The tool reads meta/info.json plus each episode_*.parquet file, then
creates output like episode_000000/step_000003_image.jpg and
episode_000000/step_000003.txt for quick inspection.
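Once the tool has run, the expanded folders can be summarised with a few lines of standard-library Python. This sketch relies only on the file naming shown above; `count_step_images` is a hypothetical helper, and the temporary directory with empty files stands in for real visualizer output.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

STEP_IMAGE_RE = re.compile(r"^step_\d+_image\.jpg$")


def count_step_images(output_dir):
    """Count main-camera step images per episode folder."""
    counts = Counter()
    for image in Path(output_dir).glob("episode_*/step_*_image.jpg"):
        if STEP_IMAGE_RE.match(image.name):
            counts[image.parent.name] += 1
    return dict(sorted(counts.items()))


# Empty placeholder files mimic the documented output layout.
with tempfile.TemporaryDirectory() as tmp:
    for episode, steps in (("episode_000000", 2), ("episode_000001", 1)):
        folder = Path(tmp) / episode
        folder.mkdir()
        for step in range(steps):
            (folder / f"step_{step:06d}_image.jpg").touch()
            (folder / f"step_{step:06d}.txt").touch()
    summary = count_step_images(tmp)
```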