TY - GEN
T1 - Stochastic Bias-Reduced Gradient Methods
AU - Asi, Hilal
AU - Carmon, Yair
AU - Jambulapati, Arun
AU - Jin, Yujia
AU - Sidford, Aaron
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
AB - We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer x⋆ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn [8] to turn any optimal stochastic gradient method into an estimator of x⋆ with bias δ, variance O(log(1/δ)), and an expected sampling cost of O(log(1/δ)) stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of N functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
UR - http://www.scopus.com/inward/record.url?scp=85131801254&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131801254
T3 - Advances in Neural Information Processing Systems
SP - 10810
EP - 10822
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -