GROKKING IN LINEAR ESTIMATORS - A SOLVABLE MODEL THAT GROKS WITHOUT UNDERSTANDING

Noam Levi, Alon Beck, Yohai Bar-Sinai

Research output: Contribution to conference › Paper › peer-review

1 Scopus citations

Abstract

Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a simple teacher-student setup with Gaussian inputs. In this setting, the full training dynamics are derived in terms of the training and generalization data covariance matrices. We present exact predictions on how the grokking time depends on input and output dimensionality, training sample size, regularization, and network initialization. We demonstrate that the sharp increase in generalization accuracy may not imply a transition from "memorization" to "understanding", but can simply be an artifact of the accuracy measure. We provide empirical verification for our calculations, along with preliminary results indicating that some predictions also hold for deeper networks with non-linear activations.
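
The teacher-student setup described in the abstract can be illustrated with a short numerical sketch. The snippet below is not the authors' code; the dimensions, learning rate, weight decay, initialization scale, and the threshold-based accuracy measure are illustrative assumptions chosen to show the training accuracy saturating well before the test accuracy rises.

    # Minimal sketch (assumed parameters, not the authors' code): a linear student
    # fit to a linear teacher on Gaussian inputs with full-batch gradient descent
    # and weight decay, tracking a threshold-based accuracy on train and test data.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_train, n_test = 100, 50, 1000   # input dimension and sample sizes (assumed)
    lr, weight_decay, steps = 0.05, 1e-3, 20000
    eps = 0.1                            # tolerance defining a "correct" prediction (assumed)

    w_teacher = rng.normal(size=d) / np.sqrt(d)
    X_train, X_test = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
    y_train, y_test = X_train @ w_teacher, X_test @ w_teacher

    w = rng.normal(size=d)               # student initialized far from the teacher

    for t in range(steps):
        residual = X_train @ w - y_train
        grad = X_train.T @ residual / n_train + weight_decay * w
        w -= lr * grad
        if t % 2000 == 0:
            train_acc = np.mean(np.abs(residual) < eps)
            test_acc = np.mean(np.abs(X_test @ w - y_test) < eps)
            print(f"step {t:6d}  train acc {train_acc:.2f}  test acc {test_acc:.2f}")

Under these assumed settings, gradient descent fits the training set quickly, while weight decay only slowly shrinks the large initial component of the student that lies outside the span of the training inputs; the thresholded test accuracy can therefore jump long after the training accuracy saturates, mirroring the delayed-generalization behaviour the abstract describes.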

Original language: English
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: 7 May 2024 - 11 May 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 7/05/24 - 11/05/24

Funding

Funders (funder number):
Google Gift
Milner Foundation
National Science Foundation (PHY-2210452)
