Data Collection#
RLinf provides two data collection approaches targeting different downstream use cases:
| Approach | Entry Point | Typical Use |
|---|---|---|
| Episode Collection | | Reward model / value model training data |
| Real-robot Replay Buffer Collection | | Real-robot RLPD prior data / policy initialization |
Episode Data Collection#
CollectEpisode is a gymnasium.Wrapper that transparently wraps any
environment and automatically records step-level data during RL training or
evaluation, saving each completed episode to disk asynchronously.
Two output formats are supported:
pickle: saves the complete raw buffer; suited for custom offline processing.
lerobot: saves structured Parquet files with metadata; directly compatible with the LeRobot training pipeline.
Key Features#
Supports both single environments and vectorized parallel environments (num_envs > 1).
Compatible with auto-reset environments: the final pre-reset observation is correctly attributed to the current episode, and the post-reset observation is carried over to the next episode.
All write operations run asynchronously in a background thread so they never block the RL training loop.
The LeRobot writer is lazily initialized on the first episode write, with image shape, state dimension, and action dimension inferred automatically.
Set only_success=True to filter out failed episodes and save disk space.
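The asynchronous write path can be pictured roughly as follows. This is a minimal sketch using only the standard library, not RLinf's implementation: the class name AsyncEpisodeWriter, its submit and close methods, and the file name are hypothetical, and only the pickle case is shown.

```python
import pickle
import queue
import tempfile
import threading
from pathlib import Path


class AsyncEpisodeWriter:
    """Hypothetical helper: pickles finished episodes on a background thread."""

    def __init__(self, save_dir):
        self.save_dir = Path(save_dir)
        self.save_dir.mkdir(parents=True, exist_ok=True)
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, filename, episode):
        # Returns immediately, so the caller never waits on disk I/O.
        self._queue.put((filename, episode))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel pushed by close()
                break
            filename, episode = item
            with open(self.save_dir / filename, "wb") as f:
                pickle.dump(episode, f)

    def close(self):
        # The queue is FIFO, so pending episodes are written before the sentinel.
        self._queue.put(None)
        self._thread.join()


with tempfile.TemporaryDirectory() as tmp:
    writer = AsyncEpisodeWriter(tmp)
    writer.submit("episode_0.pkl", {"actions": [0, 1], "success": True})
    writer.close()  # final flush
    saved = sorted(p.name for p in Path(tmp).iterdir())
    with open(Path(tmp) / "episode_0.pkl", "rb") as f:
        restored = pickle.load(f)
```

Pushing a sentinel through the same queue is one simple way to guarantee that every pending episode is flushed before the thread exits, which mirrors the role of env.close() described in this section.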
Constructor Arguments#
| Argument | Type | Default | Description |
|---|---|---|---|
| env | | - | The gymnasium environment to wrap |
| save_dir | | - | Directory for saving episode data (created automatically) |
| rank | | | Worker rank for unique file naming in distributed settings |
| num_envs | | | Number of parallel environments |
| | | | Show goal-site visualization in renders (for environments that support it) |
| export_format | | | Output format: pickle or lerobot |
| robot_type | | | Robot type written to LeRobot metadata (lerobot format only) |
| fps | | | Dataset frame rate written to LeRobot metadata (lerobot format only) |
| only_success | | | Save only successful episodes |
| stats_sample_ratio | | | Image sampling ratio for incremental statistics (lerobot format only) |
| | | | Call |
Usage Examples#
Direct Python API:
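from rlinf.envs.wrappers.collect_episode import CollectEpisode

# base_env, policy, and num_steps are assumed to be defined by the caller.
env = CollectEpisode(
    env=base_env,
    save_dir="./collected_data",
    num_envs=8,
    export_format="lerobot",  # or "pickle"
    robot_type="panda",
    fps=10,
    only_success=True,
)

obs, info = env.reset()
# With num_envs > 1, terminated and truncated are per-environment values,
# so this loop runs for a fixed step budget instead of testing a single done flag.
for _ in range(num_steps):
    action = policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()  # triggers final flush and finalize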
Via YAML configuration (simulation training):
Add a data_collection block under env in your YAML config:
env:
  group_name: "EnvGroup"
  enable_offload: False
  data_collection:
    enabled: True
    save_dir: ${runner.logger.log_path}/collected_data
    export_format: "lerobot"  # or "pickle"
    only_success: True
    robot_type: "panda"
    fps: 10
Then run the training script as usual; data is collected automatically:
bash examples/embodiment/run_embodiment.sh maniskill_ppo_mlp_collect
Data Format Details#
pickle format
Each episode is saved as a separate .pkl file with the naming convention:
rank_{rank}_env_{env_idx}_episode_{episode_id}_{success|fail}.pkl
Example: rank_0_env_3_episode_42_success.pkl
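The naming convention can be built and parsed with a few lines of Python. The helper names below (episode_filename, parse_episode_filename) are hypothetical and not part of RLinf; they only illustrate the pattern stated above.

```python
import re

# Mirrors rank_{rank}_env_{env_idx}_episode_{episode_id}_{success|fail}.pkl
_EPISODE_FILE = re.compile(
    r"^rank_(\d+)_env_(\d+)_episode_(\d+)_(success|fail)\.pkl$"
)


def episode_filename(rank, env_idx, episode_id, success):
    """Build a file name following the documented convention."""
    status = "success" if success else "fail"
    return f"rank_{rank}_env_{env_idx}_episode_{episode_id}_{status}.pkl"


def parse_episode_filename(name):
    """Recover rank, env index, episode id, and outcome from a file name."""
    match = _EPISODE_FILE.match(name)
    if match is None:
        raise ValueError(f"not an episode file: {name!r}")
    rank, env_idx, episode_id, status = match.groups()
    return {
        "rank": int(rank),
        "env_idx": int(env_idx),
        "episode_id": int(episode_id),
        "success": status == "success",
    }


name = episode_filename(0, 3, 42, True)
parsed = parse_episode_filename(name)
```

Parsing the name is a cheap way to filter a directory by outcome or worker without unpickling every file.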
The file contains a single dictionary:
{
    "rank": int,           # worker rank
    "env_idx": int,        # environment index
    "episode_id": int,     # episode counter (per-env, monotonically increasing)
    "success": bool,       # whether the episode succeeded
    "observations": list,  # length = num_steps + 1 (includes the initial reset obs)
    "actions": list,       # length = num_steps
    "rewards": list,       # length = num_steps
    "terminated": list,    # length = num_steps
    "truncated": list,     # length = num_steps
    "infos": list,         # length = num_steps
}
Note
The pickle format preserves the raw buffer exactly as recorded, making it
suitable for custom offline RL or behaviour analysis pipelines.
Index 0 in observations comes from reset(); indices 1 through N come
from successive step() calls.
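A quick consistency check after loading an episode might look like the sketch below. The helpers load_episode and check_episode are hypothetical names, and the episode built at the end is synthetic data used only to exercise the check.

```python
import pickle


def load_episode(path):
    """Load one episode dictionary from a .pkl file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_episode(episode):
    """Verify the documented length relationships and return num_steps."""
    num_steps = len(episode["actions"])
    if len(episode["observations"]) != num_steps + 1:
        raise ValueError("observations must have num_steps + 1 entries")
    for key in ("rewards", "terminated", "truncated", "infos"):
        if len(episode[key]) != num_steps:
            raise ValueError(f"{key} must have num_steps entries")
    return num_steps


# Synthetic two-step episode in the documented layout.
episode = {
    "rank": 0,
    "env_idx": 0,
    "episode_id": 0,
    "success": True,
    "observations": ["obs_reset", "obs_1", "obs_2"],
    "actions": ["a_0", "a_1"],
    "rewards": [0.0, 1.0],
    "terminated": [False, True],
    "truncated": [False, False],
    "infos": [{}, {"success": True}],
}
num_steps = check_episode(episode)

# Transition t pairs observations[t] with actions[t] and observations[t + 1].
transitions = [
    (episode["observations"][t], episode["actions"][t], episode["observations"][t + 1])
    for t in range(num_steps)
]
```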
LeRobot format
Data is stored as Parquet files alongside JSON metadata files:
save_dir/
├── meta/
│   ├── info.json        # dataset metadata (fps, robot_type, dimensions, ...)
│   ├── episodes.jsonl   # per-episode length and task description
│   ├── tasks.jsonl      # deduplicated task list
│   └── stats.json       # mean / std statistics for observations and actions
└── data/
    └── chunk-000/
        ├── episode_000000.parquet
        ├── episode_000001.parquet
        └── ...
Parquet column schema:
| Column | Description |
|---|---|
| | Main camera image (bytes + path), uint8 |
| | Wrist camera image (bytes + path), uint8; empty when no wrist camera |
| | Robot state vector |
| | Action vector |
| | Frame timestamp in seconds |
| | Frame index within the episode |
| | Global episode index |
| | Global frame index |
| | Task index (references tasks.jsonl) |
| | Per-step done flag |
| | Whether the episode succeeded |
Observation key lookup order (first match wins):
| Field | Keys checked (in priority order) |
|---|---|
| Main image | |
| Wrist image | |
| State | |
Images are automatically converted to uint8 (float [0, 1] arrays are multiplied by 255; out-of-range arrays are cast directly).
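The conversion rule can be sketched with NumPy as follows. The function name to_uint8 is hypothetical, and the text does not say whether the real wrapper rounds or truncates after scaling; this sketch truncates, which is what a plain astype cast does.

```python
import numpy as np


def to_uint8(image):
    """Convert an image array to uint8 following the rule described above."""
    image = np.asarray(image)
    if image.dtype == np.uint8:
        return image  # already in the target dtype
    is_unit_float = (
        np.issubdtype(image.dtype, np.floating)
        and image.size > 0
        and image.min() >= 0.0
        and image.max() <= 1.0
    )
    if is_unit_float:
        # Float data in [0, 1] is scaled to the 0..255 range first.
        return (image * 255).astype(np.uint8)
    # Anything else is cast directly without scaling.
    return image.astype(np.uint8)


unit = to_uint8(np.array([0.0, 0.5, 1.0], dtype=np.float32))
raw = to_uint8(np.array([0.0, 128.0, 255.0], dtype=np.float32))
```

One caveat of a range-based rule like this: a float image that happens to be entirely dark, with values at or below 1 on a 0 to 255 scale, is indistinguishable from a normalized image and will be scaled.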
Success Detection Logic#
The wrapper scans info dicts in reverse step order (most recent first). For
each step's info dict, it checks three sources in order (final_info, then
episode, then the root info dict) and, within each source, looks for the keys
success_once, success_at_end, and success, in that order:
1. info["final_info"]["success_once"] / success_at_end / success
2. info["episode"]["success_once"] / success_at_end / success
3. info["success_once"] / info["success_at_end"] / info["success"]
If none of the above keys is found across all recorded steps, the wrapper falls
back to the incrementally maintained _episode_success flag updated at each
step.
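The lookup order described above can be sketched as a small function. This is an illustration under assumptions, not the wrapper's code: detect_success is a hypothetical name, the fallback argument stands in for the _episode_success flag, and values are assumed to be plain scalars rather than per-environment arrays.

```python
SUCCESS_KEYS = ("success_once", "success_at_end", "success")


def detect_success(infos, fallback=False):
    """Return the first success value found, scanning the latest step first."""
    for info in reversed(infos):
        # Source priority: final_info, then episode, then the root info dict.
        for source in (info.get("final_info"), info.get("episode"), info):
            if not isinstance(source, dict):
                continue
            for key in SUCCESS_KEYS:
                if key in source:
                    return bool(source[key])
    # No success key anywhere: use the incrementally maintained flag.
    return bool(fallback)


nested_wins = detect_success(
    [{"success": False}, {"final_info": {"success_once": True}, "success": False}]
)
earlier_step = detect_success([{"episode": {"success_at_end": False}}, {}])
uses_fallback = detect_success([{}, {}], fallback=True)
```

In the first call the last step carries both a nested and a root-level value, and the nested final_info entry takes priority.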
Real-robot Replay Buffer Collection#
Real-robot collection is used for RLPD (Reinforcement Learning with Prior Data)
or policy initialization. An operator uses a SpaceMouse or similar device to
demonstrate successful task completions; data is saved in
TrajectoryReplayBuffer format for direct use in subsequent real-robot training.
Unlike large-scale parallel simulation collection, real-robot collection runs on a single control node and stops automatically once the target number of successful demonstrations is reached.
Core Components#
Entry script: examples/embodiment/collect_data.sh
Collection logic: examples/embodiment/collect_real_data.py (DataCollector class)
Config file: examples/embodiment/config/realworld_collect_data.yaml
DataCollector workflow:
1. Initialise RealWorldEnv and TrajectoryReplayBuffer.
2. Loop over steps, reading the SpaceMouse intervention action from info["intervene_action"].
3. Construct a ChunkStepResult and append it to EmbodiedRolloutResult.
4. When an episode ends (done=True) with a positive reward, count it as a success and write the trajectory to the buffer.
5. Stop automatically once num_data_episodes successes have been collected and finalise the buffer.
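The control flow of the workflow above can be sketched without any robot hardware. Everything in this block is a stand-in: FakeTeleopEnv and DemoBuffer are hypothetical replacements for RealWorldEnv and TrajectoryReplayBuffer, the step signature is simplified, and ChunkStepResult / EmbodiedRolloutResult are reduced to plain tuples and lists.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DemoBuffer:
    """Hypothetical stand-in for TrajectoryReplayBuffer."""
    trajectories: list = field(default_factory=list)

    def add_trajectory(self, steps):
        self.trajectories.append(list(steps))


class FakeTeleopEnv:
    """Hypothetical stand-in for RealWorldEnv with a simulated operator."""

    def __init__(self, episode_len=5, seed=0):
        self._rng = random.Random(seed)
        self._len = episode_len
        self._t = 0

    def reset(self):
        self._t = 0
        return {"states": [0.0]}, {}

    def step(self, action):
        self._t += 1
        done = self._t >= self._len
        # Roughly half of the simulated episodes end with a positive reward.
        reward = 1.0 if done and self._rng.random() < 0.5 else 0.0
        info = {"intervene_action": [self._rng.uniform(-1, 1) for _ in range(6)]}
        return {"states": [float(self._t)]}, reward, done, info


def collect_demos(env, buffer, num_data_episodes):
    """Keep only successful episodes until the target count is reached."""
    successes = 0
    while successes < num_data_episodes:
        obs, _ = env.reset()
        episode, done = [], False
        while not done:
            # The commanded action is a placeholder; the operator's action
            # is read back from info["intervene_action"].
            next_obs, reward, done, info = env.step([0.0] * 6)
            episode.append((obs, info["intervene_action"], reward, next_obs, done))
            obs = next_obs
        if reward > 0:  # episode ended with a positive reward
            buffer.add_trajectory(episode)
            successes += 1
    return successes


buffer = DemoBuffer()
collected = collect_demos(FakeTeleopEnv(seed=0), buffer, num_data_episodes=3)
```

Failed episodes are simply dropped in this sketch, so the buffer ends up holding exactly the requested number of successful trajectories.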
Configuration Parameters#
| Parameter | Default | Description |
|---|---|---|
| num_data_episodes | | Target number of successful demonstrations; stops when reached |
| robot_ip | - | IP address of the Franka robot |
| use_spacemouse | | Enable SpaceMouse intervention |
| target_ee_pose | - | Target end-effector pose |
| success_hold_steps | | Number of consecutive steps at goal pose required to declare success |
| | | Whether to include the task description string in observations |
Data Format#
After collection, data is saved to:
logs/{timestamp}/demos/
TrajectoryReplayBuffer stores each trajectory as a .pt file.
Each trajectory contains:
{
    "transitions": {
        "obs": {
            "states":       # robot state, shape=[T, 19] (pose, torques, ...)
            "main_images":  # main camera images, shape=[T, 128, 128, 3], uint8
        },
        "next_obs": {
            "states":       # next-step robot state
            "main_images":  # next-step camera images
        },
        "action":        # action, shape=[T, 6]
        "rewards":       # reward, shape=[T, 1]
        "dones":         # done flag, shape=[T, 1], bool
        "terminations":  # termination flag, shape=[T, 1], bool
        "truncations":   # truncation flag, shape=[T, 1], bool
    },
    "intervene_flags":   # all ones, marking this trajectory as expert data
}
Note
intervene_flags is set to all ones to mark the trajectory as an expert
demonstration. During RLPD training this flag distinguishes prior data from
online policy rollouts.
Usage Steps#
1. Activate the environment on the control node:
   source <path_to_your_venv>/bin/activate
2. Edit examples/embodiment/config/realworld_collect_data.yaml to set robot_ip and target_ee_pose:
   cluster:
     node_groups:
       hardware:
         configs:
           robot_ip: "192.168.1.100"  # replace with actual IP
   env:
     eval:
       use_spacemouse: True
       override_cfg:
         target_ee_pose: [0.5, 0.0, 0.3, 0.0, 3.14, 0.0]
         success_hold_steps: 3
   runner:
     num_data_episodes: 50
3. Launch collection (an optional first argument overrides the config name):
   bash examples/embodiment/collect_data.sh
   # or with a custom config name:
   bash examples/embodiment/collect_data.sh my_custom_config
4. Use the SpaceMouse to operate the robot. Once num_data_episodes successes are recorded, the script saves the buffer and exits. Logs and data are written under logs/{timestamp}/.
Best Practices#
Episode collection (CollectEpisode)
Image data is large. If disk space is limited, use only_success=True to discard failed episodes.
When using the LeRobot format, stats_sample_ratio controls the fraction of frames used to compute per-channel statistics. Lowering it reduces memory usage at the cost of slightly less accurate statistics.
In distributed training, assign each worker a unique rank to prevent filename collisions.
Real-robot replay buffer collection
Prioritise trajectory quality. If the success rate is low, relax success_hold_steps or set a more tolerant target_ee_pose.
After collection, load the buffer with TrajectoryReplayBuffer.load() to verify the trajectory count before launching training.
To append additional demonstrations, re-run the script pointing to the same demos directory. With auto_save=True, the buffer writes incrementally without overwriting existing trajectories.