Abstract:
Modern Reinforcement Learning (RL) is
commonly applied to practical problems with an enormous number of states, where
function approximators such as deep neural networks must be deployed to
approximate either the value function or the policy. The introduction of
function approximation raises a fundamental set of challenges involving
computational and statistical efficiency, especially in the online setting
with active data acquisition. As a result, a core RL question remains open: how
can we design provably efficient RL algorithms that incorporate possibly
nonlinear function approximation?
In this talk, I will introduce the first generation of efficient value-based
and policy-based RL algorithms under the setting where both the value function
and policy are represented by powerful function approximators such as the
kernel functions and neural networks. The proposed algorithms highlight a
systematic integration of the ``optimism in the face of uncertainty'' principle
principle into algorithm design and are shown to enjoy both polynomial runtime
and polynomial sample complexity. Finally, as an initial attempt to study
multi-agent reinforcement learning, I will show how the value-based algorithm
can be modified to solve zero-sum stochastic games efficiently.
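
To make the optimism principle concrete, the following is a minimal illustrative sketch of an optimistic value-iteration update in the linear special case; the feature map $\phi$, bonus weight $\beta$, regularizer $\lambda$, horizon $H$, and episode index $k$ are assumed notation, not taken from the abstract, and the kernel and neural settings discussed in the talk generalize this form:
\[
\widehat{Q}_h(s, a) = \min\Big\{ \phi(s, a)^\top \widehat{w}_h + \beta \sqrt{\phi(s, a)^\top \Lambda_h^{-1} \phi(s, a)},\ H \Big\},
\qquad
\Lambda_h = \lambda I + \sum_{\tau = 1}^{k-1} \phi(s_h^\tau, a_h^\tau)\, \phi(s_h^\tau, a_h^\tau)^\top,
\]
where $\widehat{w}_h$ is a regularized least-squares fit of the Bellman backup from the first $k-1$ episodes. The bonus term inflates the estimated value along feature directions that have been rarely visited, so acting greedily with respect to $\widehat{Q}_h$ drives exploration while keeping the update computable in polynomial time.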