ML Seminar: ReLU nets are powerful memorizers: a tight analysis of finite sample expressive power

Seminar
Friday, November 08, 2019
11:00am - 12:00pm
EER 1.516 (North Tower)
I will talk about the finite sample expressivity, a.k.a. memorization power, of ReLU networks. Recent results showed (unsurprisingly) that arbitrary input data can be perfectly memorized by a shallow ReLU network with a single hidden layer of N hidden nodes. I will describe a more careful construction that trades off width with depth to show that a ReLU network with 2 hidden layers, each with 2*sqrt(N) hidden nodes, can perfectly memorize arbitrary datasets. Moreover, we prove that a width of Θ(sqrt(N)) is both necessary and sufficient for perfect memorization. A notable corollary of this result is that mild overparametrization suffices for a neural network to achieve zero training loss!
 
We also extend our results to deeper networks and, combining them with recent results on the VC-dimension of deep nets, show that our memorization-power results are nearly tight. Time permitting, I will discuss expressive power results for ResNets, as well as how SGD behaves when optimizing such networks.
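To make the headline claim concrete, here is a minimal empirical sketch (not the construction from the talk): it fits N arbitrary points with a ReLU network that has two hidden layers of roughly 2*sqrt(N) units each, and checks that training drives the loss toward zero. The PyTorch model, data dimensions, and optimizer settings below are illustrative assumptions; the talk's result is an existence guarantee for exact memorization, whereas this snippet simply trains with Adam and typically reaches near-zero loss.

```python
# Illustrative sketch (assumed setup, not the speaker's construction):
# memorize N random points with a 2-hidden-layer ReLU net of width ~2*sqrt(N).
import math
import torch

torch.manual_seed(0)

N, d = 400, 10                       # N samples in d input dimensions (assumed)
width = 2 * int(math.sqrt(N))        # ~2*sqrt(N) hidden units per layer

X = torch.randn(N, d)                # arbitrary inputs
y = torch.randn(N, 1)                # arbitrary real-valued targets

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(),
    torch.nn.Linear(width, width), torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(5000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.2e}")  # typically close to zero
```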

Speaker

Suvrit Sra
Assistant Professor
Massachusetts Institute of Technology

Suvrit Sra joined MIT’s Department of Electrical Engineering and Computer Science as an Assistant Professor, and IDSS as a core faculty member, in January 2018. Prior to this, he was a Principal Research Scientist in the MIT Laboratory for Information and Decision Systems (LIDS). Before coming to LIDS, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany. During this time, he was also a visiting faculty member at the University of California at Berkeley (EECS Department) and Carnegie Mellon University (Machine Learning Department). He received his Ph.D. in Computer Science from the University of Texas at Austin.

Suvrit’s research bridges a variety of mathematical topics including optimization, matrix theory, differential geometry, and probability with machine learning. His recent work focuses on the foundations of geometric optimization, an emerging subarea of nonconvex optimization where geometry (often non-Euclidean) enables efficient computation of global optimality. More broadly, his work encompasses a wide range of topics in optimization, especially in machine learning, statistics, signal processing, and related areas. He is pursuing novel applications of machine learning and optimization to materials science, quantum chemistry, synthetic biology, healthcare, and other data-driven domains. His work has won several awards at machine learning conferences, the 2011 “SIAM Outstanding Paper” award, faculty research awards from Criteo and Amazon, and an NSF CAREER award. In addition, Suvrit founded (and regularly co-chairs) the popular OPT “Optimization for Machine Learning” series of Workshops at the Conference on Neural Information Processing Systems (NIPS). He has also edited a well-received book with the same title (MIT Press, 2011).