
BetaDistribution policy for bounded continuous action spaces to avoid Gaussian clipping bias and improve training stability #2142

Open
@lukaskiss222

🚀 Feature

Add an optional BetaDistribution policy for bounded continuous action spaces to avoid Gaussian clipping bias and improve training stability.

Motivation

Petrazzini & Antonelo (2021) demonstrated that replacing the Gaussian with a Beta distribution, which has compact support, in PPO yields significantly faster convergence, higher final rewards, and a 63% increase in success on CarRacing-v0.

"Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution" showed that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor critic with experience replay (ACER).
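
To make the clipping bias concrete, here is a small standard-library sketch (not SB3 code). It assumes an action space of [-1, 1] and an arbitrary Gaussian policy with mean 0.9 and standard deviation 0.5; the helper names `clipped_gaussian_stats` and `beta_mean_action` are hypothetical. After clipping, the average executed action no longer matches the policy mean and a large share of actions sit exactly on the bound, whereas a Beta policy rescaled to the same interval has a closed-form mean that always lies inside the bounds.

```python
import random


def clipped_gaussian_stats(mu, sigma, low, high, n, seed=0):
    """Sample a Gaussian policy, clip to [low, high], and return
    (mean of executed actions, fraction of actions exactly on a bound)."""
    rng = random.Random(seed)
    total = 0.0
    on_bound = 0
    for _ in range(n):
        action = min(max(rng.gauss(mu, sigma), low), high)  # environment-side clipping
        total += action
        on_bound += int(action == low or action == high)
    return total / n, on_bound / n


def beta_mean_action(alpha, beta, low, high):
    """Mean of a Beta(alpha, beta) policy after rescaling [0, 1] -> [low, high]."""
    return low + (high - low) * alpha / (alpha + beta)


if __name__ == "__main__":
    mean, frac = clipped_gaussian_stats(mu=0.9, sigma=0.5, low=-1.0, high=1.0, n=20000)
    print(f"Gaussian: intended mean 0.9, executed mean {mean:.3f}, on bound {frac:.1%}")
    print(f"Beta(19, 1) mean on [-1, 1]: {beta_mean_action(19.0, 1.0, -1.0, 1.0):.3f}")
```

The numbers depend on the chosen mean and standard deviation; the gap grows as the Gaussian mean approaches a bound or its variance increases.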

Pitch

Add a new BetaDistribution class in stable_baselines3.common.distributions that mirrors the methods of DiagGaussianDistribution, with an option to use it in any on-policy algorithm.
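
As a starting point for discussion, the core math such a class would need can be sketched without any framework. This is not the SB3 API and not a proposed final implementation: the function names are hypothetical, and the `1 + softplus(...)` parameterisation (which keeps both concentration parameters above 1 so the density stays unimodal) is one common choice assumed here. A real class would do the same operations on batched tensors.

```python
import math
import random
from typing import Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha: float, raw_beta: float) -> Tuple[float, float]:
    """Map unconstrained network outputs to concentrations.
    Assumption: shift by 1 so that alpha, beta > 1 (unimodal density)."""
    return 1.0 + softplus(raw_alpha), 1.0 + softplus(raw_beta)


def beta_log_pdf(x: float, alpha: float, beta: float) -> float:
    """Log density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x) - log_norm


def sample_action(alpha, beta, low, high, rng):
    """Draw x ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high]."""
    return low + (high - low) * rng.betavariate(alpha, beta)


def mode_action(alpha, beta, low, high):
    """Deterministic action: mode of the Beta (valid for alpha, beta > 1)."""
    return low + (high - low) * (alpha - 1.0) / (alpha + beta - 2.0)


def action_log_prob(action, alpha, beta, low, high, eps=1e-6):
    """Log density of a rescaled action, including the change-of-variable term."""
    x = (action - low) / (high - low)
    x = min(max(x, eps), 1.0 - eps)  # avoid log(0) exactly at the bounds
    return beta_log_pdf(x, alpha, beta) - math.log(high - low)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, beta = beta_params(0.3, -1.2)
    action = sample_action(alpha, beta, -2.0, 2.0, rng)
    print(action, action_log_prob(action, alpha, beta, -2.0, 2.0))
```

Because every sample already lies inside `[low, high]`, no clipping is needed before the action is passed to the environment, and the log-probability used in the policy gradient corresponds to the action that was actually executed.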

Alternatively, create a combined distribution that uses a Beta distribution for bounded action dimensions and a Gaussian distribution for unbounded ones.
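
One way the combined variant could decide which distribution to use is to inspect the action-space bounds per dimension. The sketch below is only an illustration under that assumption: `select_distribution_per_dim` is a hypothetical helper, and it takes plain sequences of lower and upper bounds rather than an actual action-space object.

```python
import math
from typing import List, Sequence


def select_distribution_per_dim(low: Sequence[float], high: Sequence[float]) -> List[str]:
    """Return "beta" for dimensions with finite bounds on both sides,
    and "gaussian" for dimensions that are unbounded on at least one side."""
    kinds = []
    for lo, hi in zip(low, high):
        bounded = math.isfinite(lo) and math.isfinite(hi)
        kinds.append("beta" if bounded else "gaussian")
    return kinds


if __name__ == "__main__":
    low = [-1.0, -math.inf, 0.0]
    high = [1.0, math.inf, math.inf]
    print(select_distribution_per_dim(low, high))
```

How half-bounded dimensions should be treated is an open design question; here they fall back to the Gaussian.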

Alternatives

  • Tanh-squashed Gaussian (SquashedDiagGaussianDistribution): still biases density near ±1 and can have vanishing gradients at the tails.
  • Truncated normal: more complex to implement and less stable under backpropagation.
  • State-Dependent Noise (gSDE): adapts variance but still uses Gaussian support, so does not fully eliminate bias at hard bounds.

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives

@araffin What would be the best implementation?
