TY - JOUR
T1 - UNIQ
T2 - Uniform Noise Injection for Non-Uniform Quantization of Neural Networks
AU - Baskin, Chaim
AU - Liss, Natan
AU - Schwartz, Eli
AU - Zheltonozhskii, Evgenii
AU - Giryes, Raja
AU - Bronstein, Alex M.
AU - Mendelson, Avi
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/6
Y1 - 2021/6
AB - We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as low as 8-bit with only a minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and show that this metric has a linear relation with logic utilization and power. We suggest evaluating the trade-off of accuracy vs. complexity (BOPs). The proposed method, when evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art in both the low-complexity and high-accuracy regimes. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
KW - Deep learning
KW - efficient deep learning
KW - neural networks
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=85109098163&partnerID=8YFLogxK
U2 - 10.1145/3444943
DO - 10.1145/3444943
M3 - Article
AN - SCOPUS:85109098163
SN - 0734-2071
VL - 37
JO - ACM Transactions on Computer Systems
JF - ACM Transactions on Computer Systems
IS - 1-4
M1 - 3444943
ER -