From Policy Gradients to Actor-Critic Methods
Abstract
Starting from the general policy search problem and direct policy search methods, I will give a didactic presentation of the Policy Gradient Theorem and explain several variants of the REINFORCE algorithm. From there, I will move step by step to more advanced methods such as TRPO and PPO, and to actor-critic methods such as DDPG and SAC.
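To give a flavor of the course's starting point, here is a minimal, hedged sketch of the vanilla REINFORCE update on a hypothetical two-armed bandit with a softmax policy. This is an illustration, not the course's own material; the toy rewards, parameter names, and hyperparameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: a 2-armed bandit (one-step episodes).
# Action 0 yields reward 1.0, action 1 yields 0.2 (assumed values).
def step(action):
    return 1.0 if action == 0 else 0.2

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(2)  # policy parameters: one logit per action
alpha = 0.1          # learning rate

for episode in range(500):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    G = step(a)                       # return of the (one-step) episode
    grad_log_pi = -pi.copy()          # grad of log pi(a|theta) for softmax
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi  # REINFORCE update

print(softmax(theta))  # probability mass concentrates on action 0
```

The update follows the policy gradient theorem in its simplest form: the score function of the sampled action, weighted by the episode return. The course then shows how baselines and critics reduce the variance of this estimator.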
Speaker
Olivier Sigaud
Outline
- Introduction: the 4 routes to deep RL
- The Policy Search problem
- The policy gradient derivation (part 1/3)
- The policy gradient derivation (part 2/3)
- The policy gradient derivation (part 3/3)
- PG with baseline versus Actor-Critic
- Bias-variance trade-off
- On-policy versus Off-policy
- TRPO and ACKTR
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (and TD3)
- Soft Actor Critic
- Policy Gradient and Reward Weighted Regression
- Wrap-up, Take Home Messages
Class material
Olivier Sigaud provides updated videos for each topic on his YouTube channel; separate links to them are given in the list below.
- Introduction: the 4 routes to deep RL [slides] [most recent video]
- The Policy Search problem [slides] [most recent video]
- The policy gradient derivation (part 1/3) [slides] [most recent video]
- The policy gradient derivation (part 2/3) [slides] [most recent video]
- The policy gradient derivation (part 3/3) [slides] [most recent video]
- PG with baseline versus Actor-Critic [slides] [most recent video]
- Bias-variance trade-off [slides] [most recent video]
- On-policy versus Off-policy [slides] [most recent video]
- TRPO and ACKTR [slides] [most recent video]
- Proximal Policy Optimization (PPO) [slides] [most recent video]
- Deep Deterministic Policy Gradient (and TD3) [slides] [most recent video]
- Soft Actor Critic [slides] [most recent video]
- Policy Gradient and Reward Weighted Regression [slides] [most recent video]
- Wrap-up, Take Home Messages [slides] [most recent video]