TY - GEN
T1 - Inducing and exploiting activation sparsity for fast neural network inference
AU - Kurtz, Mark
AU - Kopinsky, Justin
AU - Gelashvili, Rati
AU - Matveev, Alexander
AU - Carr, John
AU - Goin, Michael
AU - Leiserson, William
AU - Moore, Sage
AU - Nell, Bill
AU - Shavit, Nir
AU - Alistarh, Dan
N1 - Publisher Copyright:
© International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Optimizing deep neural networks for inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with relatively low retraining cost.
AB - Optimizing deep neural networks for inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with relatively low retraining cost.
UR - http://www.scopus.com/inward/record.url?scp=85105550736&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105550736
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 5489
EP - 5499
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé III, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
Y2 - 13 July 2020 through 18 July 2020
ER -