Environment Interface#

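This section describes the key environment APIs in the RLinf framework, which are designed specifically for embodied-intelligence scenarios. As an example, we show an environment setup based on LIBERO; the setup process for other simulators is similar.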

EnvWorker#

class rlinf.workers.env.env_worker.EnvWorker#

__init__(cfg)#

Initialize the Worker with the given parent address and world size.

Only non-Ray workers should provide parent_address, world_size and rank. For example, when a Worker is created via multiprocessing by another Worker, the parent address, world size and rank should be provided.

Parameters:

parent_address (Optional[WorkerAddress]) -- The address of the parent worker. This is used to set up the WorkerAddress for this worker.
world_size (Optional[int]) -- The total number of workers in the group. If not provided, it will be set to the environment variable WORLD_SIZE.
rank (Optional[int]) -- The rank of this worker in the group. If not provided, it will be set to the environment variable RANK.
cfg (DictConfig)

env_interact_step(chunk_actions, stage_id)#

This function is used to interact with the environment.

Parameters:

chunk_actions (Tensor)
stage_id (int)

Return type:

tuple[EnvOutput, dict[str, Any]]

env_evaluate_step(raw_actions, stage_id)#

This function is used to evaluate the environment.

Parameters:

raw_actions (Tensor)
stage_id (int)

Return type:

tuple[EnvOutput, dict[str, Any]]

recv_chunk_actions(input_channel, mode='train')#

Receive and merge chunked actions for the current env worker.

The method fetches one action shard from each mapped rollout source rank under a deterministic channel key pattern and concatenates them on the batch dimension.

Parameters:

input_channel (Channel) -- Channel carrying rollout->env action chunks.
mode -- Rollout mode, either "train" or "eval".

Returns:

Concatenated action chunk array with shape [num_envs_per_stage, ...].

Return type:

ndarray
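
The fetch-and-merge behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not RLinf's implementation: `FakeChannel`, `make_key`, the key format, and the `env_rank`/`src_ranks` arguments are hypothetical stand-ins, and nested lists are used in place of ndarray shards.

```python
from collections import defaultdict, deque


class FakeChannel:
    """Hypothetical in-memory stand-in for a Channel: one FIFO queue per key."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def put(self, item, key):
        self._queues[key].append(item)

    def get(self, key):
        return self._queues[key].popleft()


def make_key(src_rank, dst_rank, mode):
    # Assumed key format; the real pattern used by RLinf is not shown in this section.
    return f"{src_rank}_{dst_rank}_{mode}"


def recv_chunk_actions(channel, env_rank, src_ranks, mode="train"):
    """Fetch one shard per mapped rollout rank and merge them along the batch axis."""
    merged = []
    for src_rank in src_ranks:  # fixed iteration order keeps the merge deterministic
        shard = channel.get(make_key(src_rank, env_rank, mode))
        merged.extend(shard)  # each row is one env, so extending concatenates on the batch axis
    return merged


channel = FakeChannel()
channel.put([[0.1, 0.2], [0.3, 0.4]], make_key(0, 0, "train"))  # shard from rollout rank 0
channel.put([[0.5, 0.6]], make_key(1, 0, "train"))  # shard from rollout rank 1
actions = recv_chunk_actions(channel, env_rank=0, src_ranks=[0, 1])
```

Because every key encodes the source rank, destination rank, and mode, shards for training and evaluation can share one channel without being mixed up.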

send_env_batch(rollout_channel, env_batch, mode='train')#

Send split env batches to mapped rollout ranks.

Each destination rank receives one split batch via a stable key built from src_rank, dst_rank and mode.

Parameters:

rollout_channel (Channel) -- Channel carrying env->rollout outputs.
env_batch (dict[str, Any]) -- Env output dictionary for one pipeline stage.
mode (Literal['train', 'eval']) -- Rollout mode, either "train" or "eval".

Return type:

None
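
The split-and-send pattern can be sketched in the same spirit. Everything below is an assumption made for illustration: the helper names, the key format, the even contiguous split, and the plain dict used in place of a Channel are not taken from RLinf.

```python
def make_key(src_rank, dst_rank, mode):
    # Assumed key format; the real pattern used by RLinf is not shown in this section.
    return f"{src_rank}_{dst_rank}_{mode}"


def split_env_batch(env_batch, num_splits):
    """Split every per-env list in the batch into num_splits contiguous, equal parts."""
    batch_size = len(next(iter(env_batch.values())))
    if batch_size % num_splits != 0:
        raise ValueError("this sketch assumes an evenly divisible batch")
    step = batch_size // num_splits
    return [
        {name: values[i * step:(i + 1) * step] for name, values in env_batch.items()}
        for i in range(num_splits)
    ]


def send_env_batch(outbox, env_batch, src_rank, dst_ranks, mode="train"):
    """Store one split per destination rank in outbox (a dict standing in for a Channel)."""
    splits = split_env_batch(env_batch, len(dst_ranks))
    for dst_rank, split in zip(dst_ranks, splits):
        outbox[make_key(src_rank, dst_rank, mode)] = split


outbox = {}
env_batch = {"obs": [10, 11, 12, 13], "rewards": [0.0, 1.0, 0.0, 1.0]}
send_env_batch(outbox, env_batch, src_rank=0, dst_ranks=[2, 3], mode="eval")
```

Each destination rank can then look up its own split by rebuilding the same key from `src_rank`, its own rank, and the mode.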

prefetch_train_bootstrap(rollout_channel)#

Prepare and send the first env batch for the next training rollout.

Parameters:

rollout_channel (Channel)

Return type:

None

Environment#

class rlinf.envs.libero.libero_env.LiberoEnv#

__init__(cfg, num_envs, seed_offset, total_num_processes, worker_info)#

reset(env_idx=None, reset_state_ids=None)#

Resets the environment to an initial state and returns the initial observation.

This method can reset the environment's random number generator(s) if seed is an integer or if the environment has not yet initialized a random number generator. If the environment already has a random number generator and reset() is called with seed=None, the RNG should not be reset. Moreover, reset() should (in the typical use case) be called with an integer seed right after initialization and then never again.

Parameters:

seed (optional int) -- The seed that is used to initialize the environment's PRNG. If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
options (optional dict) -- Additional information to specify how the environment is reset (optional, depending on the specific environment)
env_idx (int | list[int] | ndarray | None)

Returns:

observation (object) -- Observation of the initial state. This will be an element of observation_space (typically a NumPy array) and is analogous to the observation returned by step().
info (dictionary) -- This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
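
The seeding rules in the description above can be illustrated with a toy class. This is a sketch of the described semantics only: `ToyEnv` is hypothetical and is not `LiberoEnv`, whose documented signature takes `env_idx` and `reset_state_ids` rather than `seed`.

```python
import random


class ToyEnv:
    """Toy environment illustrating the seeding rules only; not LiberoEnv."""

    def __init__(self):
        self._rng = None  # no PRNG exists until the first reset()

    def reset(self, seed=None):
        if seed is not None:
            # An integer seed always (re)creates the PRNG, even if one already exists.
            self._rng = random.Random(seed)
        elif self._rng is None:
            # First reset without a seed: seed the PRNG from system entropy.
            self._rng = random.Random()
        # seed=None with an existing PRNG: the current PRNG state is kept.
        observation = self._rng.random()
        info = {}
        return observation, info


env = ToyEnv()
first, _ = env.reset(seed=42)  # seed once, right after construction
second, _ = env.reset()  # later resets continue the same random stream

replay_env = ToyEnv()
replay_first, _ = replay_env.reset(seed=42)
replay_second, _ = replay_env.reset()
```

Seeding once and then calling `reset()` without a seed makes a whole sequence of episodes reproducible, which is why the description recommends passing an integer only right after initialization.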

step(actions=None, auto_reset=True)#

Step the environment with the given actions.