Making SGD Parameter-Free

Yair Carmon, Oliver Hinder

Research output: Contribution to journal › Conference article › peer-review

Abstract

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.

Original language: English
Pages (from-to): 2360-2389
Number of pages: 30
Journal: Proceedings of Machine Learning Research
Volume: 178
State: Published - 2022
Event: 35th Conference on Learning Theory, COLT 2022 - London, United Kingdom
Duration: 2 Jul 2022 to 5 Jul 2022
