COLLOQUIUM: Arindam Banerjee, "SGD for Deep Learning: Empirical Geometry, Stability, and Smoothed Analysis"
From Emma Ilukhin on 03/3/2021
Due to technical difficulties, captions are currently unavailable; they will be added soon.
While the past decade has seen unprecedented empirical success of deep learning models, the generalization behavior of such models remains shrouded in mystery. We will start by briefly reviewing recent work indicating that generalization may be explained by properties of the optimization algorithm used for learning rather than by the expressive power of the function class. In this talk, we will focus on Stochastic Gradient Descent (SGD)-type algorithms and discuss their optimization and generalization properties. We will first present a set of empirical results illustrating the high-dimensional geometry of gradients and Hessians for deep models trained with SGD. Then, we discuss generalization of SGD-type algorithms based on stability, where mild changes in the data do not lead to large changes in the learned model. In particular, we present stability bounds based on a smoothed analysis of SGD, i.e., by adding Gaussian noise to the stochastic gradients, and discuss the tradeoffs between optimization and generalization illustrated by such bounds. Further, we illustrate that such noisy SGD methods have essentially the same empirical performance as SGD while being much easier to analyze in theory.
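As a rough illustration of the "noisy SGD" idea mentioned in the abstract, the sketch below adds Gaussian noise to each mini-batch gradient. It is not taken from the talk: it uses a small least-squares problem in place of a deep model, and the function name `noisy_sgd`, the noise scale `sigma`, the step size `lr`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def noisy_sgd(X, y, lr=0.05, sigma=0.01, epochs=50, batch_size=8, seed=0):
    """Mini-batch SGD on least squares, with Gaussian noise added to each gradient.

    Hypothetical helper for illustration; sigma=0.0 recovers plain mini-batch SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = X[idx].T @ residual / len(idx)     # stochastic gradient
            grad += sigma * rng.standard_normal(d)    # smoothing: isotropic Gaussian noise
            w -= lr * grad
    return w

# Synthetic, noiseless regression data (assumed for the example).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true

w_sgd = noisy_sgd(X, y, sigma=0.0)     # plain SGD
w_noisy = noisy_sgd(X, y, sigma=0.01)  # SGD with Gaussian gradient noise
print(np.linalg.norm(w_sgd - w_true), np.linalg.norm(w_noisy - w_true))
```

With a small `sigma`, both runs should land close to `w_true` on this toy problem, which loosely mirrors the abstract's claim that the noisy variant performs similarly in practice. The stability bounds themselves are not reproduced here.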
Arindam Banerjee is a Founder Professor at the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests are in machine learning and data mining, especially problems involving geometry and randomness. His current research focuses on computational and statistical aspects of deep learning, spatial and temporal data analysis, and sequential decision-making problems. His work also addresses applications to complex real-world problems in areas including climate science, ecology, recommendation systems, and finance. He has won several awards, including the NSF CAREER award (2010), the IBM Faculty Award (2013), and six best paper awards in top-tier venues.
Part of the Illinois Computer Science Speakers Series. Faculty Host: Hanghang Tong.