TY - GEN
T1 - Deep tensor convolution on multicores
AU - Budden, David
AU - Matveev, Alexander
AU - Santurkar, Shibani
AU - Chaudhuri, Shraman Ray
AU - Shavit, Nir
N1 - Publisher Copyright:
© 2017 by the author(s).
PY - 2017
Y1 - 2017
N2 - Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the iV-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multi-core scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2D ConvNets as a special case, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art.
AB - Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the iV-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multi-core scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2D ConvNets as a special case, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art.
UR - http://www.scopus.com/inward/record.url?scp=85048409780&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85048409780
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 1007
EP - 1017
BT - 34th International Conference on Machine Learning, ICML 2017
PB - International Machine Learning Society (IMLS)
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -