TY - GEN
T1 - Tight Risk Bounds for Gradient Descent on Separable Data
AU - Schliserman, Matan
AU - Koren, Tomer
N1 - Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - We study the generalization properties of unregularized gradient methods applied to separable linear classification, a setting that has received considerable attention since the pioneering work of Soudry et al. [14]. We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form Θ(r_{ℓ,T}^2/(γ^2 T) + r_{ℓ,T}^2/(γ^2 n)), where T is the number of gradient steps, n is the size of the training set, γ is the data margin, and r_{ℓ,T} is a complexity term that depends on the tail decay rate of the loss function (and on T). Our upper bound greatly improves upon the existing risk bounds due to Shamir [13] and Schliserman and Koren [12], which either applied only to specific loss functions or imposed extraneous technical assumptions, and it applies to virtually any convex and smooth loss function. Our risk lower bound is the first in this context and establishes the tightness of our general upper bound for any given tail decay rate and in all parameter regimes. The proof technique used to show these results is also markedly simpler than in previous work, and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent.
AB - We study the generalization properties of unregularized gradient methods applied to separable linear classification, a setting that has received considerable attention since the pioneering work of Soudry et al. [14]. We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form Θ(r_{ℓ,T}^2/(γ^2 T) + r_{ℓ,T}^2/(γ^2 n)), where T is the number of gradient steps, n is the size of the training set, γ is the data margin, and r_{ℓ,T} is a complexity term that depends on the tail decay rate of the loss function (and on T). Our upper bound greatly improves upon the existing risk bounds due to Shamir [13] and Schliserman and Koren [12], which either applied only to specific loss functions or imposed extraneous technical assumptions, and it applies to virtually any convex and smooth loss function. Our risk lower bound is the first in this context and establishes the tightness of our general upper bound for any given tail decay rate and in all parameter regimes. The proof technique used to show these results is also markedly simpler than in previous work, and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent.
UR - http://www.scopus.com/inward/record.url?scp=85187765707&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85187765707
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
A2 - Oh, A.
A2 - Neumann, T.
A2 - Globerson, A.
A2 - Saenko, K.
A2 - Hardt, M.
A2 - Levine, S.
PB - Neural Information Processing Systems Foundation
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -