TY - JOUR
T1 - Second-Order Information in Non-Convex Stochastic Optimization
T2 - 33rd Conference on Learning Theory, COLT 2020
AU - Arjevani, Yossi
AU - Carmon, Yair
AU - Duchi, John C.
AU - Foster, Dylan J.
AU - Sekhari, Ayush
AU - Sridharan, Karthik
N1 - Publisher Copyright:
© 2020 Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, A. Sekhari & K. Sridharan.
PY - 2020
Y1 - 2020
AB - We design an algorithm which finds an ε-approximate stationary point (with ‖∇F(x)‖ ≤ ε) using O(ε⁻³) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic pth order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ε, γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
KW - Hessian-vector products
KW - Stochastic optimization
KW - non-convex optimization
KW - second-order methods
KW - variance reduction
UR - http://www.scopus.com/inward/record.url?scp=85161288165&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85161288165
SN - 2640-3498
VL - 125
SP - 242
EP - 299
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 9 July 2020 through 12 July 2020
ER -