Mathematics Colloquium - Thaleia Zariphopoulou
Detroit, MI 48202
Speaker: Professor Thaleia Zariphopoulou, Presidential Chair of Mathematics and the V. H. Neuhaus Centennial Professor of Finance at the University of Texas at Austin
Title: Exploration versus exploitation in reinforcement learning: a stochastic control approach
Abstract: In this talk, I will consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration and exploitation. I will propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory formulation for the feature dynamics that captures learning under exploration. The resulting optimization problem is a revitalization of classical relaxed stochastic control. I will provide a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This, in turn, explains the widely adopted use of Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively, by the mean and variance of the Gaussian distribution. Furthermore, a more random environment contains more learning opportunities, in the sense that less exploration is needed. I will also characterize the cost of exploration, which, for the LQ case, is shown to be proportional to the entropy regularization weight and inversely proportional to the discount factor. Finally, I will discuss the convergence of the solution of the entropy-regularized LQ problem to that of the classical LQ problem as the weight of exploration decays to zero.
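For readers unfamiliar with entropy regularization, the objective described in the abstract can be sketched roughly as follows. This is an illustrative form only, not necessarily the formulation presented in the talk: the symbols $x_t$ (state), $u$ (action), $\pi_t$ (distribution over actions), $r$ (running reward), $\lambda$ (entropy regularization weight), and $\rho$ (discounting parameter) are assumed notation, and a scalar action is assumed for simplicity.

```latex
% Illustrative notation only; all symbols are assumptions, not taken from the talk.
% x_t: state, u: scalar action, \pi_t: density over actions at time t,
% r: running reward, \lambda > 0: entropy weight, \rho > 0: discounting parameter.
\[
J(\pi) = \mathbb{E}\left[ \int_0^\infty e^{-\rho t}
  \left( \int_{\mathbb{R}} r(x_t, u)\, \pi_t(u)\, du
  + \lambda\, \mathcal{H}(\pi_t) \right) dt \right],
\qquad
\mathcal{H}(\pi) = -\int_{\mathbb{R}} \pi(u) \ln \pi(u)\, du .
\]
```

In this sketch, the first term rewards exploitation (expected reward under the action distribution), while the differential entropy term $\mathcal{H}$ rewards exploration, weighted by $\lambda$. Setting $\lambda = 0$ removes the exploration incentive, which matches the abstract's remark about recovering the classical problem as the weight of exploration decays to zero.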