Stochastic bandits

Abstract

The bandit framework specializes the reinforcement learning setup by removing the (controlled) state. Bandits retain all the essential ingredients for studying the exploration/exploitation dilemma, and many principles derived for bandits generalize to the full reinforcement learning setting. The simplification permits both a more complete theoretical understanding and practical algorithms. Furthermore, bandits are a good model for many applications.

I will introduce bandit problems and present the best-known algorithms based on the principle of optimism in the face of uncertainty. There will be a live coding demo and a discussion of the many extensions needed in practical applications.
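To make the principle of optimism concrete, here is a minimal sketch of UCB1, a classic optimism-based algorithm for the stochastic multi-armed bandit. The Bernoulli arm means, horizon, and function name are illustrative choices, not taken from the talk materials.

```python
# Minimal UCB1 sketch for Bernoulli bandits.
# Arm means and horizon are illustrative, not from the talk.
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return total reward and pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    totals = [0.0] * k    # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # Optimism in the face of uncertainty:
            # pick the arm with the largest mean-plus-confidence-bonus.
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        total_reward += reward
    return total_reward, counts


reward, counts = ucb1([0.3, 0.5, 0.7], horizon=5000)
```

The confidence bonus shrinks as an arm is pulled more often, so the algorithm gradually concentrates its pulls on the empirically best arm while still occasionally revisiting the others.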

Speaker

Tor Lattimore

Class material

Slides
Code
The Bandit Algorithms book
Notebook on Colab implementing the same code as above.