DESCRIPTION:
Speaker: Professor Thaleia Zariphopoulou, Presidential Chair of Mathematics and the V. H. Neuhaus Centennial Professor of Finance
Title: Exploration versus exploitation in reinforcement learning: a stochastic control approach
Abstract: In this talk, I will consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration and exploitation. I will propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory formulation for the feature dynamics that captures learning under exploration. The resulting optimization problem is a revitalization of classical relaxed stochastic control. I will provide a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This, in turn, explains the widely adopted Gaussian exploration in RL beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively, by the mean and variance of the Gaussian distribution. Furthermore, a more random environment contains more learning opportunities, in the sense that less exploration is needed. I will also characterize the cost of exploration, which, for the LQ case, is shown to be proportional to the entropy regularization weight and inversely proportional to the discount factor. Finally, I will discuss the convergence of the solution of the entropy-regularized LQ problem to that of the classical LQ problem as the weight of exploration decays to zero.
DTSTART:20200304T110000
DTEND:20200304T120000
Location: State Hall
Mathematics Colloquium - Thaleia Zariphopoulou
https://events.wayne.edu/main/2020/03/04/mathematics-colloquium-thaleia-zariphopoulou-85413/
