Abstract:
Modern Reinforcement Learning (RL) is
commonly applied to practical problems with an enormous number of states, where
function approximators such as deep neural networks must be deployed to
approximate either the value function or the policy. The introduction of
function approximation raises a fundamental set of challenges involving
computational and statistical efficiency, especially in the online setting
with active data acquisition. As a result, a core RL question remains open: how
can we design provably efficient RL algorithms that incorporate possibly
nonlinear function approximation?
In this talk, I will introduce the first generation of efficient value-based
and policy-based RL algorithms under the setting where both the value function
and policy are represented by powerful function approximators such as the
kernel functions and neural networks. The proposed algorithms highlight a
systematic integration of the ``optimism in the face of uncertainty'' principle
principle into algorithm design and are shown to enjoy both polynomial runtime
and polynomial sample complexity. Finally, as an initial attempt to study
multi-agent reinforcement learning, I will show how the value-based algorithm
can be modified to solve zero-sum stochastic games efficiently.
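
To make the optimism principle concrete, the following is a minimal illustrative sketch of an optimistic value-iteration update in the linear special case; the feature map $\phi$, bonus weight $\beta$, regularizer $\lambda$, horizon $H$, and episode index $k$ are assumed notation, not taken from the abstract, and the kernel and neural settings discussed in the talk generalize this form:
\[
\widehat{Q}_h(s, a) = \min\Big\{ \phi(s, a)^\top \widehat{w}_h + \beta \sqrt{\phi(s, a)^\top \Lambda_h^{-1} \phi(s, a)},\ H \Big\},
\qquad
\Lambda_h = \lambda I + \sum_{\tau = 1}^{k-1} \phi(s_h^\tau, a_h^\tau)\, \phi(s_h^\tau, a_h^\tau)^\top,
\]
where $\widehat{w}_h$ is a regularized least-squares fit of the Bellman backup from the first $k-1$ episodes. The bonus term inflates the estimated value along feature directions that have been rarely visited, so acting greedily with respect to $\widehat{Q}_h$ drives exploration while keeping the update computable in polynomial time.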