TY - JOUR
T1 - The sample complexity of ERMs in stochastic convex optimization
AU - Carmon, Daniel
AU - Livni, Roi
AU - Yehudayoff, Amir
N1 - Publisher Copyright: Copyright 2024 by the author(s).
PY - 2024
Y1 - 2024
N2 - Stochastic convex optimization is one of the most well-studied models for learning in modern machine learning. Nevertheless, a central fundamental question in this setup remained unresolved: how many data points must be observed so that any empirical risk minimizer (ERM) shows good performance on the true population? This question was posed by Feldman, who proved that Ω(d/ε + 1/ε²) data points are necessary (where d is the dimension and ε > 0 is the accuracy parameter). Proving an ω(d/ε + 1/ε²) lower bound was left as an open problem. In this work we show that in fact Õ(d/ε + 1/ε²) data points are also sufficient. This settles the question and yields a new separation between ERMs and uniform convergence. This sample complexity holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies. The general bound is composed of two terms: (i) a term of the form Õ(d/ε) with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of linear functions (captured by the Rademacher complexity). The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.
AB - Stochastic convex optimization is one of the most well-studied models for learning in modern machine learning. Nevertheless, a central fundamental question in this setup remained unresolved: how many data points must be observed so that any empirical risk minimizer (ERM) shows good performance on the true population? This question was posed by Feldman, who proved that Ω(d/ε + 1/ε²) data points are necessary (where d is the dimension and ε > 0 is the accuracy parameter). Proving an ω(d/ε + 1/ε²) lower bound was left as an open problem. In this work we show that in fact Õ(d/ε + 1/ε²) data points are also sufficient. This settles the question and yields a new separation between ERMs and uniform convergence. This sample complexity holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies. The general bound is composed of two terms: (i) a term of the form Õ(d/ε) with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of linear functions (captured by the Rademacher complexity). The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.
UR - http://www.scopus.com/inward/record.url?scp=85194163236&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85194163236
SN - 2640-3498
VL - 238
SP - 3799
EP - 3807
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024
Y2 - 2 May 2024 through 4 May 2024
ER -