Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We design an algorithm which finds an ϵ-approximate stationary point (with ∥∇F(x)∥ ≤ ϵ) using O(ϵ⁻³) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic pth order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ϵ,γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
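
As a rough illustration of the idea referenced in the abstract, the sketch below shows how a stochastic Hessian-vector product can stand in for the same-seed gradient difference used by recursive variance-reduction schemes, so that each random sample is queried at only one point. This is a minimal sketch, not the paper's algorithm; the oracles `stoch_grad` and `stoch_hvp`, the step size `eta`, and the momentum weight `beta` are hypothetical placeholders.

```python
import numpy as np

def vr_sgd_hvp(x0, stoch_grad, stoch_hvp, eta=0.01, beta=0.1, T=1000, seed=0):
    """Recursive variance-reduced SGD sketch in which the same-seed gradient
    difference is replaced by a stochastic Hessian-vector product.

    stoch_grad(x, rng)   -> unbiased estimate of grad F(x)      (assumed oracle)
    stoch_hvp(x, v, rng) -> unbiased estimate of Hess F(x) @ v  (assumed oracle)
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    g = stoch_grad(x_prev, rng)            # initial gradient estimate
    x = x_prev - eta * g
    for _ in range(T):
        # The HVP along the last displacement approximates
        # stoch_grad(x, z) - stoch_grad(x_prev, z) without re-querying
        # the previous iterate under the same randomness z.
        corr = stoch_hvp(x, x - x_prev, rng)
        g = (1.0 - beta) * (g + corr) + beta * stoch_grad(x, rng)
        x_prev, x = x, x - eta * g
    return x

# Toy usage on F(x) = 0.5 * ||x||^2 with Gaussian noise added to both oracles.
d = 10
x_out = vr_sgd_hvp(
    np.ones(d),
    stoch_grad=lambda x, rng: x + 0.1 * rng.standard_normal(x.shape),
    stoch_hvp=lambda x, v, rng: v + 0.1 * rng.standard_normal(v.shape),
)
```

The point of the substitution is oracle accounting: the momentum-style estimator needs a correction for how the gradient changes between iterates, and an HVP along the displacement supplies it with a single query per sample, which is the access model analyzed in the paper.
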
Original language: English
Title of host publication: Proceedings of Thirty Third Conference on Learning Theory
Editors: Jacob Abernethy, Shivani Agarwal
Publisher: PMLR
Pages: 242-299
Number of pages: 58
Volume: 125
State: Published - 1 Sep 2020
Event: 33rd Annual Conference on Learning Theory, COLT 2020 - virtual
Duration: 9 Jul 2020 – 12 Jul 2020
Conference number: 33

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
ISSN (Electronic): 2640-3498

Conference

Conference: 33rd Annual Conference on Learning Theory, COLT 2020
Abbreviated title: COLT 2020
Period: 9/07/20 – 12/07/20
