From Policy Gradients to Actor Critic methods

Abstract

Starting from the general policy search problem and direct policy search methods, I will give a didactic presentation of the Policy Gradient Theorem and explain some variants of the REINFORCE algorithm. From there, I will move step by step to more advanced methods such as TRPO and PPO, and to Actor-Critic methods such as DDPG and SAC.

Speaker

Olivier Sigaud

Outline

  1. Introduction: the 4 routes to deep RL
  2. The Policy Search problem
  3. The policy gradient derivation (part 1/3)
  4. The policy gradient derivation (part 2/3)
  5. The policy gradient derivation (part 3/3)
  6. PG with baseline versus Actor-Critic
  7. Bias-variance trade-off
  8. On-policy versus Off-policy
  9. TRPO and ACKTR
  10. Proximal Policy Optimization (PPO)
  11. Deep Deterministic Policy Gradient (and TD3)
  12. Soft Actor Critic
  13. Policy Gradient and Reward Weighted Regression
  14. Wrap-up, Take Home Messages

Class material

Olivier Sigaud provides updated videos for each topic on his YouTube channel. We provide separate links to them in the list below.

  1. Introduction: the 4 routes to deep RL [slides] [most recent video]
  2. The Policy Search problem [slides] [most recent video]
  3. The policy gradient derivation (part 1/3) [slides] [most recent video]
  4. The policy gradient derivation (part 2/3) [slides] [most recent video]
  5. The policy gradient derivation (part 3/3) [slides] [most recent video]
  6. PG with baseline versus Actor-Critic [slides] [most recent video]
  7. Bias-variance trade-off [slides] [most recent video]
  8. On-policy versus Off-policy [slides] [most recent video]
  9. TRPO and ACKTR [slides] [most recent video]
  10. Proximal Policy Optimization (PPO) [slides] [most recent video]
  11. Deep Deterministic Policy Gradient (and TD3) [slides] [most recent video]
  12. Soft Actor Critic [slides] [most recent video]
  13. Policy Gradient and Reward Weighted Regression [slides] [most recent video]
  14. Wrap-up, Take Home Messages [slides] [most recent video]

Olivier Sigaud's YouTube channel