Algorithms

Use these references when you need the objective, scope, and implementation notes for a supported RL algorithm.

Algorithm      Summary
PPO            Proximal Policy Optimization.
GRPO           Group Relative Policy Optimization.
DAPO           Decoupled-clip and dynamic-sampling policy optimization.
Reinforce++    An enhanced REINFORCE baseline.
SAC            Soft Actor-Critic.
CrossQ         Sample-efficient off-policy RL without target networks.
RLPD           RL with prior data.
IQL            Implicit Q-Learning for offline RL.
Async PPO      Asynchronous, pipelined PPO.