A Phase Transition in Gradient Descent for Wide, Deep Neural Networks, Yasaman Bahri; iDS2 Seminar Series
From Oluwasanmi Koyejo
Abstract: Recent investigations into infinitely wide deep neural networks have given rise to intriguing connections between deep networks, kernel methods, and Gaussian processes. Backing off from the infinite-width limit, one may wonder to what extent finite-width neural networks can be described by adding perturbative corrections to these results. We identify a regime that appears to be sharply different from such a description. The choice of learning rate in gradient descent is a crucial factor, naturally dividing the dynamics of deep neural networks into two classes that are separated by a (sharp) phase transition as networks become wider. I will describe the distinct signatures of the two phases, how they are elucidated in a class of simple, solvable models, and the implications for neural network performance.
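As an informal illustration of the learning-rate dependence described in the abstract, the sketch below trains a wide two-layer ReLU network by full-batch gradient descent on a tiny regression problem and sweeps the learning rate. All details here (the toy dataset, the 1/sqrt(width) output scaling, the particular learning-rate values) are illustrative assumptions, not the speaker's setup; which regime a given learning rate lands in depends on a data-dependent threshold, so the printed trajectories should be read qualitatively (monotone, kernel-like decrease versus an initial rise or outright divergence).

```python
import numpy as np

# Hypothetical toy experiment (not the speaker's exact models): a wide
# two-layer ReLU network with 1/sqrt(width) output scaling, trained by
# full-batch gradient descent on a tiny 1-D regression task.

rng = np.random.default_rng(0)

width, n = 4096, 8
x = rng.normal(size=(n, 1))        # tiny training set
y = np.sin(3.0 * x)                # arbitrary smooth target

W0 = rng.normal(size=(width, 1))   # hidden-layer weights at initialization
v0 = rng.normal(size=(1, width))   # output-layer weights at initialization

def train(lr, steps=200):
    """Full-batch gradient descent on (half) mean squared error; returns loss history."""
    W, v = W0.copy(), v0.copy()
    history = []
    with np.errstate(over="ignore", invalid="ignore"):   # divergent runs are stopped below
        for _ in range(steps):
            pre = W @ x.T                                # (width, n) pre-activations
            h = np.maximum(pre, 0.0)                     # ReLU features
            f = (v @ h).T / np.sqrt(width)               # (n, 1) network outputs
            current = 0.5 * np.mean((f - y) ** 2)
            history.append(current)
            if not np.isfinite(current):                 # stop runs that blow up
                break
            err = (f - y) / n                            # dL/df
            dv = (err.T @ h.T) / np.sqrt(width)          # gradient w.r.t. output weights
            dW = (v.T * ((pre > 0).astype(float) @ (x * err))) / np.sqrt(width)
            v -= lr * dv
            W -= lr * dW
    return history

# Sweep a few learning rates; the boundary between regimes is data-dependent
# (roughly set by the top eigenvalue of the empirical tangent kernel at init),
# so these particular values are only illustrative.
for lr in (0.5, 2.0, 4.0, 8.0):
    hist = train(lr)
    monotone = all(b <= a for a, b in zip(hist, hist[1:]))
    print(f"lr={lr:5.1f}  initial={hist[0]:.4f}  peak={max(hist):.4f}  "
          f"final={hist[-1]:.4f}  monotone_decrease={monotone}")
```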