From Policy Gradients to Actor-Critic Methods
Abstract
Starting from the general policy search problem and direct policy search methods, I will give a didactic presentation of the Policy Gradient Theorem and explain several variants of the REINFORCE algorithm. From there, I will move step by step to more advanced methods such as TRPO and PPO, and to actor-critic methods such as DDPG and SAC.
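To give a flavor of the course's starting point, here is a minimal, hedged sketch of the vanilla REINFORCE update on a hypothetical two-armed bandit with a softmax policy. This is an illustration, not the course's own material; the toy rewards, parameter names, and hyperparameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: a 2-armed bandit (one-step episodes).
# Action 0 yields reward 1.0, action 1 yields 0.2 (assumed values).
def step(action):
    return 1.0 if action == 0 else 0.2

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(2)  # policy parameters: one logit per action
alpha = 0.1          # learning rate

for episode in range(500):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    G = step(a)                       # return of the (one-step) episode
    grad_log_pi = -pi.copy()          # grad of log pi(a|theta) for softmax
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi  # REINFORCE update

print(softmax(theta))  # probability mass concentrates on action 0
```

The update follows the policy gradient theorem in its simplest form: the score function of the sampled action, weighted by the episode return. The course then shows how baselines and critics reduce the variance of this estimator.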
Speaker
Olivier Sigaud
Outline
- Introduction: the 4 routes to deep RL
- The Policy Search problem
- The policy gradient derivation (part 1/3)
- The policy gradient derivation (part 2/3)
- The policy gradient derivation (part 3/3)
- PG with baseline versus Actor-Critic
- Bias-variance trade-off
- On-policy versus Off-policy
- TRPO and ACKTR
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (and TD3)
- Soft Actor Critic
- Policy Gradient and Reward Weighted Regression
- Wrap-up, Take Home Messages
Class material
Olivier Sigaud provides updated videos for each topic on his YouTube channel; separate links to them are given in the list below.
- Introduction: the 4 routes to deep RL [slides] [most recent video]
- The Policy Search problem [slides] [most recent video]
- The policy gradient derivation (part 1/3) [slides] [most recent video]
- The policy gradient derivation (part 2/3) [slides] [most recent video]
- The policy gradient derivation (part 3/3) [slides] [most recent video]
- PG with baseline versus Actor-Critic [slides] [most recent video]
- Bias-variance trade-off [slides] [most recent video]
- On-policy versus Off-policy [slides] [most recent video]
- TRPO and ACKTR [slides] [most recent video]
- Proximal Policy Optimization (PPO) [slides] [most recent video]
- Deep Deterministic Policy Gradient (and TD3) [slides] [most recent video]
- Soft Actor Critic [slides] [most recent video]
- Policy Gradient and Reward Weighted Regression [slides] [most recent video]
- Wrap-up, Take Home Messages [slides] [most recent video]