Globally optimal gradient descent for a ConvNet with Gaussian inputs

Alon Brutzkus*, Amir Globerson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

37 Scopus citations

Abstract

Deep learning models are often successfully trained using gradient descent, despite the worst case hardness of the underlying non-convex optimization problem. The key question is then under what conditions one can prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer, a convolutional structure with no overlap, and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.
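To make the setting concrete, the following is a minimal sketch of the kind of architecture described in the abstract, under our reading of it: a single shared filter applied to disjoint (non-overlapping) input patches, ReLU activations, and average pooling, trained by plain gradient descent on Gaussian inputs with labels generated by a ground-truth filter. The dimensions, learning rate, and variable names (w_star, convnet, etc.) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed architecture: one hidden layer, a single shared filter w applied to
# k non-overlapping patches of the input, ReLU activations, average pooling.
def convnet(X, w, k):
    # X: (n, k * d) Gaussian inputs; w: (d,) shared filter
    patches = X.reshape(X.shape[0], k, -1)             # split into k disjoint patches
    return np.maximum(patches @ w, 0.0).mean(axis=1)   # ReLU, then average over patches

# Realizable (teacher/student) setup: labels come from a ground-truth filter w_star.
d, k, n = 5, 4, 20000
w_star = rng.normal(size=d)
X = rng.normal(size=(n, k * d))                        # Gaussian input distribution
y = convnet(X, w_star, k)

# Plain gradient descent on the squared loss over the student filter w.
# (The paper's guarantee concerns the population loss and a particular random
# initialization scheme; this small-scale empirical run is only illustrative.)
w = rng.normal(size=d) * 0.1
lr = 0.5
for step in range(2000):
    patches = X.reshape(n, k, d)
    pre = patches @ w                                  # (n, k) pre-activations
    pred = np.maximum(pre, 0.0).mean(axis=1)
    resid = pred - y
    # Gradient of 0.5 * mean((pred - y)^2) w.r.t. w, using the ReLU subgradient at 0.
    grad = np.einsum('n,nk,nkd->d', resid, (pre > 0) / k, patches) / n
    w -= lr * grad

print("final loss:", 0.5 * np.mean((convnet(X, w, k) - y) ** 2))
print("filter recovery error:", np.linalg.norm(w - w_star))
```

On a typical run the loss drops to (near) zero and the student filter recovers w_star, which is the behavior the global-optimality result predicts for this Gaussian-input setting; a poor initialization can still slow or stall convergence in a finite-sample run like this.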

Original language: English
Title of host publication: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 980-1006
Number of pages: 27
ISBN (Electronic): 9781510855144
State: Published - 2017
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: 6 Aug 2017 - 11 Aug 2017

Publication series

Name: 34th International Conference on Machine Learning, ICML 2017
Volume: 2

Conference

Conference: 34th International Conference on Machine Learning, ICML 2017
Country/Territory: Australia
City: Sydney
Period: 6/08/17 - 11/08/17
