Abstract
We design an algorithm which finds an ϵ-approximate stationary point (with ∥∇F(x)∥ ≤ ϵ) using O(ϵ^{-3}) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, that it cannot be improved using stochastic p-th order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ϵ, γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
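For concreteness, the stationarity notions used in the abstract can be written as follows. The first-order condition is stated explicitly above; the second-order condition below follows the standard convention for (ϵ, γ)-approximate second-order stationary points and is an assumption here, as the abstract does not spell out the paper's exact parameterization.

```latex
% First-order: x is an \epsilon-approximate stationary point of F (as stated in the abstract)
\|\nabla F(x)\| \le \epsilon
% Second-order: x is an (\epsilon,\gamma)-approximate second-order stationary point
% (standard convention, assumed here rather than quoted from the paper)
\|\nabla F(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 F(x)\right) \ge -\gamma
```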
Original language | English |
---|---|
Title of host publication | Proceedings of Thirty Third Conference on Learning Theory |
Editors | Jacob Abernethy, Shivani Agarwal |
Publisher | PMLR |
Pages | 242-299 |
Number of pages | 58 |
Volume | 125 |
State | Published - 1 Sep 2020 |
Event | 33rd Annual Conference on Learning Theory, COLT 2020 (virtual); Duration: 9 Jul 2020 → 12 Jul 2020; Conference number: 33
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Publisher | PMLR |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | 33rd Annual Conference on Learning Theory, COLT 2020 |
---|---|
Abbreviated title | COLT 2020 |
Period | 9/07/20 → 12/07/20 |