"Convex until proven guilty": Dimension-free acceleration of gradient descent on non-convex functions

Yair Carmon*, John C. Duchi, Oliver Hinder, Aaron Sidford

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is "guilty" of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function f with Lipschitz continuous gradient and Hessian, we compute a point x with ∥∇f(x)∥ ≤ ϵ in O(ϵ^{-7/4} log(1/ϵ)) gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only O(ϵ^{-5/3} log(1/ϵ)) evaluations.

Original language: English
Title of host publication: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 1069-1091
Number of pages: 23
ISBN (Electronic): 9781510855144
State: Published - 2017
Externally published: Yes
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: 6 Aug 2017 to 11 Aug 2017

Publication series

Name: 34th International Conference on Machine Learning, ICML 2017
Volume: 2

Conference

Conference: 34th International Conference on Machine Learning, ICML 2017
Country/Territory: Australia
City: Sydney
Period: 6/08/17 to 11/08/17

Funding

Funders | Funder number
NSF-CAREER | 1553086
Numerical Technologies
Paccar
SAIL-Toyota Center for AI Research | AI Research
Norsk Sykepleierforbund
