On the Predictability of Pruning Across Scales

Jonathan Rosenfeld*, Jonathan Frankle, Michael Carbin, Nir Shavit

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large-scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
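The pruning method the abstract refers to is iterative magnitude pruning: repeatedly remove a fraction of the smallest-magnitude surviving weights, then retrain the sparse network, producing progressively lower densities. The sketch below is an illustrative reconstruction of that loop, not the authors' code; the `magnitude_prune` and `retrain` helpers, the 20% per-round pruning fraction, and the toy 256x256 weight matrix are all assumptions made for the example.

```python
# Minimal sketch of iterative magnitude pruning (IMP), assuming a toy weight
# matrix and a stubbed retraining step (not the authors' experimental setup).
import numpy as np

def magnitude_prune(weights: np.ndarray, mask: np.ndarray, fraction: float) -> np.ndarray:
    """Remove `fraction` of the surviving weights with the smallest magnitudes."""
    surviving = np.abs(weights[mask])
    k = int(fraction * surviving.size)
    if k == 0:
        return mask
    threshold = np.sort(surviving)[k - 1]          # k-th smallest surviving magnitude
    return mask & (np.abs(weights) > threshold)    # keep only weights above it

def retrain(weights: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Placeholder for retraining the sparse network; a real run would fine-tune here."""
    return weights * mask

# Toy example: prune 20% of the remaining weights per round for 5 rounds.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256))
mask = np.ones_like(weights, dtype=bool)
for round_idx in range(5):
    mask = magnitude_prune(weights, mask, fraction=0.2)
    weights = retrain(weights, mask)
    print(f"round {round_idx}: density = {mask.mean():.3f}")
```

The density printed after each round (the fraction of weights remaining) is the quantity that, together with width and depth, enters the scaling law for the error of the pruned network described in the abstract.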

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 9075-9083
Number of pages: 9
ISBN (Electronic): 9781713845065
State: Published - 2021
Externally published: Yes
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 - 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 - 24/07/21
