Micro-data policy search

Abstract

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible when experiments are time-consuming or expensive (for instance, with a physical robot or with an aerodynamics simulator). This class focuses on the extreme other end of the spectrum: how can an algorithm find an effective policy with only a handful of trials (a dozen) and a few minutes of interaction time? By analogy with the expression "big data", we refer to this challenge as "micro-data reinforcement learning". We will describe two main strategies: (1) leveraging prior knowledge about the policy structure (e.g., dynamic movement primitives), the policy parameters (e.g., demonstrations), or the dynamics (e.g., simulators), and (2) building data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamics (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Most of the examples will involve robotic systems, but the principles apply to any other expensive setup.
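As a quick illustration of strategy (2), here is a minimal sketch of Bayesian optimization over policy parameters: a Gaussian-process surrogate of the expected return is queried densely, and the real system is only run on the single most promising candidate at each trial. The `episode_return` function, the parameter bounds, the 12-episode budget, and the UCB acquisition are hypothetical stand-ins chosen for the example, not a reference implementation from the class.

```python
# Minimal Bayesian-optimization sketch for micro-data policy search.
# Assumption: episode_return() stands in for one expensive experiment
# (e.g., one rollout on a physical robot).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def episode_return(theta):
    """Hypothetical expensive experiment: run one episode with policy
    parameters `theta` and return the total reward."""
    return -np.sum((theta - 0.3) ** 2)  # toy stand-in for a real rollout

dim, n_init, n_trials = 2, 3, 12             # a "micro-data" budget: 12 episodes
X = rng.uniform(-1, 1, size=(n_init, dim))   # a few random initial episodes
y = np.array([episode_return(x) for x in X])

# Surrogate model of the expected reward as a function of the parameters.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(n_trials - n_init):
    gp.fit(X, y)
    # Upper-confidence-bound acquisition: the surrogate, not the real
    # system, is evaluated on many candidates (here, random samples).
    cand = rng.uniform(-1, 1, size=(1000, dim))
    mu, sigma = gp.predict(cand, return_std=True)
    theta = cand[np.argmax(mu + 2.0 * sigma)]
    # Only the selected candidate costs a real episode.
    X = np.vstack([X, theta])
    y = np.append(y, episode_return(theta))

print("best parameters:", X[np.argmax(y)], "return:", y.max())
```

The same loop structure carries over to model-based policy search: there, the surrogate is a learned dynamics model, and the policy optimizer runs full imagined rollouts against it between real episodes.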

Speakers

Jean-Baptiste Mouret
Konstantinos Chatzilygeroudis

Class material

Slides:

  1. Introduction
  2. Priors on policy structures
  3. Bayesian Optimization
  4. Model-based policy search

Demos:

Readings:
K. Chatzilygeroudis, V. Vassiliades, F. Stulp, S. Calinon, and J.-B. Mouret. A survey on policy search algorithms for learning robot controllers in a handful of trials. IEEE Transactions on Robotics, 36(2):328-347, 2020.