On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Shahar Azulay*, Edward Moroshko*, Mor Shpigel Nacson*, Blake Woodworth*, Nathan Srebro*, Amir Globerson*, Daniel Soudry*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas small initialization leads to so-called "rich regimes". However, the initialization structure is richer than the overall scale alone and involves the relative magnitudes of different weights and layers in the network. Here we show that these relative scales, which we refer to as initialization shape, play an important role in determining the learned model. We develop a novel technique for deriving the inductive bias of gradient flow and use it to obtain closed-form implicit regularizers for multiple cases of interest.
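The effect described in the abstract can be illustrated numerically. The sketch below is not the paper's setting but an assumed toy example: an underdetermined least-squares problem with a "diagonal network" parameterization w = u * v, where gradient descent is run from two initializations with the same overall scale of the product u * v but a different split between the two factors, i.e. a different initialization shape. All dimensions, step sizes, and initialization values are illustrative choices.

```python
import numpy as np

# Toy underdetermined least squares: 3 observations, 10 unknowns,
# parameterized as w = u * v (elementwise product of two "layers").
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
w_sparse = np.zeros(10)
w_sparse[0] = 1.0
y = X @ w_sparse                      # data generated by a sparse vector

def train(u0, v0, lr=5e-3, steps=100_000):
    """Plain gradient descent on the squared loss of w = u * v."""
    u, v = u0.copy(), v0.copy()
    for _ in range(steps):
        g = X.T @ (X @ (u * v) - y)   # gradient w.r.t. the product w
        u, v = u - lr * g * v, v - lr * g * u   # chain rule per factor
    return u * v

# Balanced shape: u = v = 0.1, so the product u * v = 0.01 per coordinate.
w_balanced = train(0.1 * np.ones(10), 0.1 * np.ones(10))
# Unbalanced shape: u = 1.0, v = 0.01 -- the SAME product u * v = 0.01.
w_unbalanced = train(1.0 * np.ones(10), 0.01 * np.ones(10))

# Both runs fit the data, yet they converge to different interpolating
# solutions: the shape of the initialization, not only its overall
# scale, determines which solution gradient descent selects.
print(np.linalg.norm(X @ w_balanced - y))
print(np.linalg.norm(X @ w_unbalanced - y))
print(np.linalg.norm(w_balanced - w_unbalanced))
```

In this toy problem the balanced small initialization biases the iterates toward a sparser interpolator, while the unbalanced initialization (one large factor, one small) behaves closer to the kernel regime, even though both start from the same predictor.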

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 468-477
Number of pages: 10
ISBN (Electronic): 9781713845065
State: Published - 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 - 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 - 24/07/21

Funding

Funders (funder number where available):
Avatar Consortium
Google Research
Taub Foundation
Yandex Initiative in Machine Learning
European Commission
Israel Science Foundation (31/1031)
Tel Aviv University
Horizon 2020 (ERC HOLI 819080)
Israel Innovation Authority
