# Algorithms
Use these references when you need the objective, scope, and implementation notes for a supported RL algorithm.
| Algorithm | Summary |
|---|---|
| | Proximal Policy Optimization. |
| | Group Relative Policy Optimization. |
| | Decoupled-clip and dynamic-sampling policy optimization. |
| | An enhanced REINFORCE baseline. |
| | Soft Actor-Critic. |
| | Sample-efficient off-policy RL without target networks. |
| | RL with prior data. |
| | Implicit Q-Learning for offline RL. |
| | Asynchronous, pipelined PPO. |
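The first three on-policy entries in the table (Proximal Policy Optimization, Group Relative Policy Optimization, and decoupled-clip and dynamic-sampling policy optimization) share a clipped policy-gradient core. The following is a minimal, illustrative sketch under stated assumptions. It is not this library's implementation.

- **Hypothetical names.** The function names `clipped_surrogate`, `group_relative_advantages`, and `keep_group` are made up for this example.
- **Clip-range defaults.** The default clip range of 0.2 is a common choice, not a value taken from these docs.
- **Group normalization.** The advantages use the population standard deviation. Implementations differ on this detail.

```python
from statistics import mean, pstdev


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Per-token clipped surrogate objective (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s). With eps_low == eps_high this is the
    symmetric PPO clip; letting eps_high differ from eps_low illustrates the
    "decoupled clip" idea. Defaults are assumptions, not library settings.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each sampled completion's reward
    against the mean and (population) std of its own group, so no learned
    value function is needed. eps guards against a zero std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_group(rewards):
    """Dynamic-sampling-style filter (hypothetical helper): a group whose
    rewards are all identical yields all-zero advantages and therefore no
    gradient signal, so it is dropped."""
    return len(set(rewards)) > 1


if __name__ == "__main__":
    group = [1.0, 0.0, 1.0, 0.0]
    if keep_group(group):
        advantages = group_relative_advantages(group)
        # Surrogate for the first completion, assuming a probability ratio of 1.5.
        print(clipped_surrogate(1.5, advantages[0]))
```

The symmetric clip caps how much a single update can raise the probability of an action with positive advantage. Widening only `eps_high` leaves more room to increase the probability of low-probability, high-advantage tokens while keeping the lower bound unchanged.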