Environment Interface

This section documents the key environment APIs of the RLinf framework, which target embodied-intelligence scenarios. LIBERO is used as the running example; the setup for other simulators is analogous.

EnvWorker

class rlinf.workers.env.env_worker.EnvWorker

__init__(cfg)

Initialize the env worker from the given configuration.

The parent_address, world_size and rank parameters listed below come from the base Worker initializer's docstring and do not appear in the EnvWorker.__init__(cfg) signature. Only non-Ray workers should provide them; for example, when a Worker is created via multiprocessing by another Worker, the parent address, world size and rank must be supplied.

Parameters:

parent_address (Optional[WorkerAddress]) – The address of the parent worker, used to set up the WorkerAddress for this worker.
world_size (Optional[int]) – The total number of workers in the group. If not provided, it is read from the WORLD_SIZE environment variable.
rank (Optional[int]) – The rank of this worker in the group. If not provided, it is read from the RANK environment variable.
cfg (DictConfig) – The worker configuration.
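The WORLD_SIZE / RANK fallback described above can be sketched as follows. resolve_world_size_and_rank is a hypothetical helper written for illustration, not part of RLinf, and the rank range check is an added assumption.

```python
import os
from typing import Optional


def resolve_world_size_and_rank(
    world_size: Optional[int] = None, rank: Optional[int] = None
) -> tuple[int, int]:
    """Hypothetical helper: fall back to WORLD_SIZE / RANK when values are not given."""
    if world_size is None:
        # Non-Ray workers that omit world_size pick it up from the environment.
        world_size = int(os.environ["WORLD_SIZE"])
    if rank is None:
        rank = int(os.environ["RANK"])
    # Assumption: a valid rank lies in [0, world_size).
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world size {world_size}")
    return world_size, rank
```

Explicit arguments always win over the environment variables, so a parent Worker spawning children via multiprocessing can pass them directly.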

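env_interact_step(chunk_actions, stage_id)

Run one interaction step on the environments of the given pipeline stage using a chunk of actions.

Parameters:

chunk_actions (Tensor) – The action chunk to execute.
stage_id (int) – Index of the pipeline stage whose environments are stepped.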

Return type:

tuple[EnvOutput, dict[str, Any]]

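env_evaluate_step(raw_actions, stage_id)

Run one evaluation step on the environments of the given pipeline stage using raw actions.

Parameters:

raw_actions (Tensor) – The raw actions to execute.
stage_id (int) – Index of the pipeline stage whose environments are evaluated.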

Return type:

tuple[EnvOutput, dict[str, Any]]

recv_chunk_actions(input_channel, mode='train')

Receive and merge chunked actions for the current env worker.

The method fetches one action shard from each mapped rollout source rank under a deterministic channel key pattern and concatenates them on the batch dimension.

Parameters:

input_channel (Channel) – Channel carrying rollout->env action chunks.
mode – Rollout mode, either "train" or "eval".

Returns:

Concatenated action chunk array with shape [num_envs_per_stage, ...].

Return type:

ndarray
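A minimal sketch of the receive-and-merge pattern, assuming a simple key-addressed channel. DictChannel, channel_key and the key format are hypothetical stand-ins, not the RLinf Channel API, and plain Python lists of per-env rows stand in for ndarrays, so list concatenation plays the role of concatenation on the batch dimension.

```python
from typing import Any


class DictChannel:
    """Hypothetical in-memory stand-in for a key-addressed channel."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def put(self, item: Any, key: str) -> None:
        self._store[key] = item

    def get(self, key: str) -> Any:
        return self._store.pop(key)


def channel_key(src_rank: int, dst_rank: int, mode: str) -> str:
    # Hypothetical deterministic key pattern; the real format is not documented here.
    return f"{src_rank}->{dst_rank}:{mode}"


def recv_chunk_actions(channel: DictChannel, env_rank: int,
                       src_ranks: list[int], mode: str = "train") -> list:
    """Fetch one action shard per mapped source rank and concatenate along dim 0."""
    merged: list = []
    for src_rank in src_ranks:  # fixed iteration order keeps row order deterministic
        shard = channel.get(key=channel_key(src_rank, env_rank, mode))
        merged.extend(shard)  # list concatenation == concat on the batch dimension
    return merged
```

Because each receiver walks its source ranks in a fixed order and derives every key deterministically, the merged batch has the same row order on every run.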

split_env_batch(env_batch, sizes, mode)

Split an env batch dict into sub-batches of the given sizes along dim 0.

Tensor values are chunked along dim 0, list values are sliced proportionally, and nested dict values are split recursively.

Parameters:

env_batch (dict[str, Any]) – Env output dictionary produced by EnvOutput.to_dict.
sizes (list[int]) – Batch sizes for each destination rank.
mode (Literal['train', 'eval']) – Rollout mode used for list-length validation.

Returns:

A list of split env batches, one item per destination rank.

Return type:

list[dict[str, Any]]
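The splitting rules above can be sketched recursively. This is a simplified illustration, not the RLinf implementation: plain Python lists stand in for both tensors and lists, a list whose length is a multiple of the total batch size is sliced proportionally, the mode argument is dropped in favor of a divisibility check, and copying non-batched values (e.g. strings) to every sub-batch is an assumption.

```python
from typing import Any


def split_env_batch(env_batch: dict[str, Any], sizes: list[int]) -> list[dict[str, Any]]:
    """Split a batch dict into len(sizes) sub-batches along dim 0."""
    total = sum(sizes)
    out: list[dict[str, Any]] = [{} for _ in sizes]
    for name, value in env_batch.items():
        if isinstance(value, dict):
            # Nested dicts are split recursively with the same sizes.
            for i, sub in enumerate(split_env_batch(value, sizes)):
                out[i][name] = sub
        elif isinstance(value, list):
            if len(value) % total != 0:
                raise ValueError(f"{name}: length {len(value)} not divisible by {total}")
            per_env = len(value) // total  # entries per env; 1 for tensor-like data
            start = 0
            for i, size in enumerate(sizes):
                end = start + size * per_env  # proportional slice for this rank
                out[i][name] = value[start:end]
                start = end
        else:
            # Assumption: non-batched values are copied to every sub-batch.
            for part in out:
                part[name] = value
    return out
```

For example, a list holding two entries per env is split into two entries per env on each destination, so per-env alignment is preserved.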

send_env_batch(output_channel, env_batch, mode='train')

Split an env batch and send each piece to its mapped rollout rank.

Each destination rank receives one split batch under a stable key built from src_rank, dst_rank and mode.

Parameters:

output_channel (Channel) – Channel carrying env->rollout outputs.
env_batch (dict[str, Any]) – Env output dictionary for one pipeline stage.
mode (Literal['train', 'eval']) – Rollout mode, either "train" or "eval".

Return type:

None
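A minimal sketch of the send side, assuming a plain dict stands in for the output channel. Only the key inputs (src_rank, dst_rank, mode) come from the description above; channel_key, its format, and the explicit dst_ranks and sizes arguments are hypothetical, and every field is sliced as a simple list along dim 0.

```python
from typing import Any


def channel_key(src_rank: int, dst_rank: int, mode: str) -> str:
    # Hypothetical key format built from the documented inputs.
    return f"{src_rank}->{dst_rank}:{mode}"


def send_env_batch(channel: dict[str, Any], src_rank: int, dst_ranks: list[int],
                   env_batch: dict[str, list], sizes: list[int],
                   mode: str = "train") -> None:
    """Split env_batch by sizes and publish one piece per destination rank."""
    if len(dst_ranks) != len(sizes):
        raise ValueError("need exactly one size per destination rank")
    start = 0
    for dst_rank, size in zip(dst_ranks, sizes):
        # Slice every batched field along dim 0 for this destination.
        piece = {name: value[start:start + size] for name, value in env_batch.items()}
        channel[channel_key(src_rank, dst_rank, mode)] = piece
        start += size
```

Because the key depends only on (src_rank, dst_rank, mode), a receiver can rebuild it independently, and train and eval traffic never collide.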

Environment

class rlinf.envs.libero.libero_env.LiberoEnv

__init__(cfg, num_envs, seed_offset, total_num_processes, worker_info)

reset(env_idx=None, reset_state_ids=None)

Resets the environment to an initial state and returns the initial observation.

The rest of this description and the seed and options parameters below follow the base Gymnasium Env.reset() interface; seed and options do not appear in the LiberoEnv.reset signature. Under that interface, reset() can reset the environment’s random number generator(s) if seed is an integer or if the environment has not yet initialized a random number generator. If the environment already has a random number generator and reset() is called with seed=None, the RNG should not be reset. In the typical use case, reset() is called with an integer seed right after initialization and never again.

Parameters:

seed (optional int) – The seed used to initialize the environment’s PRNG. If the environment does not already have a PRNG and seed=None (the default) is passed, a seed is chosen from a source of entropy (e.g. timestamp or /dev/urandom). If the environment already has a PRNG and seed=None is passed, the PRNG is not reset. If you pass an integer, the PRNG is reset even if it already exists.
options (optional dict) – Additional information specifying how the environment is reset (optional, depending on the specific environment).
env_idx (int | list[int] | ndarray | None)

Returns:

observation (object) – Observation of the initial state. This is an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
info (dictionary) – Auxiliary information complementing observation, analogous to the info returned by step().
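A toy sketch of partial resets, assuming env_idx selects which sub-environments to reset and None means all of them; this interpretation is not stated above. ToyVectorEnv is hypothetical, reset_state_ids is omitted, and each sub-environment's state is just an integer step counter.

```python
import numbers
from typing import Any, Optional, Sequence, Union


class ToyVectorEnv:
    """Hypothetical vectorized env; each sub-env's state is an int step counter."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        self.steps = [0] * num_envs

    def reset(
        self, env_idx: Optional[Union[int, Sequence[int]]] = None
    ) -> tuple[list[int], dict[str, Any]]:
        # Normalize env_idx: None -> every env, a single integer -> one env,
        # any other iterable (list, ndarray) -> exactly those envs.
        if env_idx is None:
            idx = list(range(self.num_envs))
        elif isinstance(env_idx, numbers.Integral):
            idx = [int(env_idx)]
        else:
            idx = [int(i) for i in env_idx]
        for i in idx:
            self.steps[i] = 0  # only the selected envs restart
        obs = list(self.steps)  # observations for all envs, reset or not
        return obs, {"reset_env_idx": idx}
```

Following the Gymnasium convention, the sketch returns an (observation, info) pair.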

step(actions=None, auto_reset=True)

Step the environment with the given actions.
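A toy sketch of what auto_reset commonly means in vectorized environments, assuming finished sub-environments are reset inside step() when auto_reset=True. The LiberoEnv return values and the meaning of actions=None are not documented here, so ToyAutoResetEnv, its fixed episode horizon, and its (obs, rewards, dones) return tuple are hypothetical.

```python
class ToyAutoResetEnv:
    """Hypothetical vectorized env whose episodes end after `horizon` steps."""

    def __init__(self, num_envs: int, horizon: int) -> None:
        self.horizon = horizon
        self.t = [0] * num_envs  # per-env step counter doubles as the observation

    def step(self, actions: list[int], auto_reset: bool = True):
        rewards: list[float] = []
        dones: list[bool] = []
        for i, action in enumerate(actions):
            self.t[i] += 1
            rewards.append(float(action))  # toy reward: the action value itself
            dones.append(self.t[i] >= self.horizon)
        if auto_reset:
            # Reset finished envs in place so the next call starts a new episode.
            for i, done in enumerate(dones):
                if done:
                    self.t[i] = 0
        obs = list(self.t)
        return obs, rewards, dones
```

With auto_reset=True, the observation returned for a finished env is already the start of its next episode; with auto_reset=False, the caller must reset finished envs explicitly.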