TY - JOUR
T1 - Batches Stabilize the Minimum Norm Risk in High-Dimensional Overparametrized Linear Regression
AU - Ioushua, Shahar Stein
AU - Hasidim, Inbar
AU - Shayevitz, Ofer
AU - Feder, Meir
N1 - Publisher Copyright: IEEE
PY - 2024
Y1 - 2024
N2 - Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparametrized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator and derive bounds on its quadratic risk. We then characterize the optimal batch size and show it is inversely proportional to the noise level, as well as to the overparametrization ratio. In contrast to the minimum-norm estimator, our estimator admits a stable risk behavior that is monotonically increasing in the overparametrization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. We further show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings. Interestingly, we observe that the implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
KW - Linear matrix inequalities
KW - Linear regression
KW - Partitioning algorithms
KW - Servers
KW - Signal to noise ratio
KW - Task analysis
KW - Vectors
UR - http://www.scopus.com/inward/record.url?scp=85197524171&partnerID=8YFLogxK
U2 - 10.1109/TIT.2024.3422837
DO - 10.1109/TIT.2024.3422837
M3 - Article
AN - SCOPUS:85197524171
SN - 0018-9448
SP - 1
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
ER -