Recent years have witnessed tremendous successes of AI and machine learning, especially reinforcement learning (RL), in solving many decision-making and control tasks. However, many RL algorithms are still far from being applicable to practical autonomous systems, which by nature involve more complicated scenarios with model uncertainty and multiple decision-makers. In this talk, I will introduce our study of RL for control and sequential decision-making with provable guarantees, with a particular focus on robustness and multi-agent interaction. I will first show that policy optimization, one of the main drivers of many empirical successes of RL, can solve a fundamental class of robust control tasks with global optimality guarantees, despite nonconvexity. More importantly, I will show that certain policy optimization approaches also automatically preserve some "robustness" during learning, a property we term "implicit regularization", a phenomenon that has also been observed in other machine learning contexts. Notably, these results address, in a unified manner, two basic yet foundational settings in control and game theory: risk-sensitive linear control design (dating back to Jacobson's seminal work in the 1970s) and linear quadratic zero-sum dynamic games. The latter is a benchmark multi-agent RL (MARL) setting that mirrors the role played by linear quadratic regulators in single-agent RL. I will also briefly highlight our work on MARL with decentralized networked agents, and on MARL in zero-sum Markov games, arguably the earliest MARL model, dating back to Shapley '53 and Littman '94, with provable convergence and sample-efficiency guarantees. Time permitting, I will also discuss recent and ongoing efforts that follow up on this line of work. Finally, I will share some thoughts on how to further exploit the intersection of control theory, machine/reinforcement learning, and game theory, towards safe and large-scale autonomy.