Microsoft Research

Training and Tuning Deep Neural Networks: Faster, Stronger, Better

[PhD Student: Chen Liu; Joint project with Ryota Tomioka]

The most prominent difficulties in training neural networks are that the loss functions are highly non-convex and that they exhibit inhomogeneous curvature along different directions. After extensive study of the behavior of these loss functions, practitioners and theoreticians have reached an important consensus: an efficient training algorithm must take curvature into account and must also be robust against saddle points.

SGD falls short of the first goal and, although it can escape saddle points in theory, it remains vulnerable to them in practice with commonly used step sizes. Instead, we propose to study Stochastic Spectral Descent (SSD), a prototypical first-order algorithm that can be viewed as performing SGD in a suitable Banach space. Along with SSD, we plan to build a package for the automatic training of deep neural networks, whose training algorithms include SSD variants, together with an automatic search for the best and most robust hyperparameter configurations through our unifying algorithm.
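To make the "SGD in a Banach space" view concrete, the following is a minimal sketch of a spectral-descent-style update for a single weight matrix, assuming the #-operator used in the spectral descent literature (steepest descent under the spectral norm): the gradient is replaced by its nuclear norm times the product of its singular vectors. The layer shape, learning rate, and function names here are illustrative, not the project's actual implementation.

```python
import numpy as np

def spectral_step(W, grad, lr):
    """One spectral-descent-style update for a weight matrix W.

    Under the spectral norm, the steepest-descent direction is the
    #-operator of the gradient: grad^# = ||grad||_* * U @ Vt, where
    U, S, Vt come from the thin SVD of grad and ||grad||_* = S.sum()
    is the nuclear norm.
    """
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    sharp = S.sum() * (U @ Vt)          # grad^#
    return W - lr * sharp

# Toy usage: a random layer and a stand-in for a minibatch gradient.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
grad = rng.standard_normal((64, 32))
W = spectral_step(W, grad, lr=1e-2)
```

Compared with a plain SGD step, the update direction is rescaled through the singular values of the gradient, which is one way such a method can adapt to inhomogeneous curvature across directions.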