Pitfalls in Policy Gradient methods

Abstract

In this talk, I will present the behavior of variants of the REINFORCE algorithm on simple Gym classic control benchmarks (CartPole, Pendulum, MountainCar...) with various stochastic policy representations (Bernoulli, Gaussian, squashed Gaussian). I will highlight the difficulties these algorithms face on those simple environments and draw lessons about the need to better understand how they work, or why they fail, before moving to more advanced methods and more complex benchmarks.
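
For orientation, here is a minimal sketch of the quantities a REINFORCE variant typically combines: the log-probability of an action under each of the three policy representations named above, the discounted return, and the resulting surrogate loss. It assumes NumPy only, and all function names are hypothetical; none of it is taken from the talk's slides or repository.

```python
import numpy as np


def bernoulli_log_prob(p, action):
    """Log-probability of a binary action (0 or 1) when P(action = 1) = p."""
    return np.log(p) if action == 1 else np.log(1.0 - p)


def gaussian_log_prob(mean, std, action):
    """Log-density of a real-valued action under N(mean, std^2)."""
    return (-0.5 * ((action - mean) / std) ** 2
            - np.log(std)
            - 0.5 * np.log(2.0 * np.pi))


def squashed_gaussian_log_prob(mean, std, pre_tanh):
    """Log-density of action = tanh(pre_tanh), with pre_tanh ~ N(mean, std^2).

    The extra term is the change-of-variables correction for the tanh squashing.
    """
    correction = np.log(1.0 - np.tanh(pre_tanh) ** 2)
    return gaussian_log_prob(mean, std, pre_tanh) - correction


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss whose gradient is the basic REINFORCE estimate (no baseline)."""
    return -np.sum(np.asarray(log_probs) * np.asarray(returns))
```

In practice the log-probabilities would come from a differentiable policy network so that an automatic differentiation library can compute the gradient of this loss; the plain NumPy version only illustrates the arithmetic.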

Speaker

Olivier Sigaud

Class material

Slides (1/4) (2/4) (3/4) (4/4)
GitHub repo