SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

Amit Attia*, Tomer Koren

*Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

2 Scopus citations

Abstract

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.

Original languageEnglish
Pages (from-to)1147-1171
Number of pages25
JournalProceedings of Machine Learning Research
Volume202
StatePublished - 2023
Event40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 202329 Jul 2023

Funding

FundersFunder number
Adelis Research Fund for Artificial Intelligence
Yandex Initiative for Machine Learning
Blavatnik Family Foundation
Israel Science Foundation993/17
Tel Aviv University

    Fingerprint

    Dive into the research topics of 'SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance'. Together they form a unique fingerprint.

    Cite this