Deep tensor convolution on multicores

David Budden*, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the iV-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multi-core scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2D ConvNets as a special case, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art.

Original languageEnglish
Title of host publication34th International Conference on Machine Learning, ICML 2017
PublisherInternational Machine Learning Society (IMLS)
Pages1007-1017
Number of pages11
ISBN (Electronic)9781510855144
StatePublished - 2017
Externally publishedYes
Event34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: 6 Aug 201711 Aug 2017

Publication series

Name34th International Conference on Machine Learning, ICML 2017
Volume2

Conference

Conference34th International Conference on Machine Learning, ICML 2017
Country/TerritoryAustralia
CitySydney
Period6/08/1711/08/17

Funding

FundersFunder number
Intelligence Advanced Research Projects Activity138076-5093555
National Science FoundationIIS-1447786, CCF-1563880

    Fingerprint

    Dive into the research topics of 'Deep tensor convolution on multicores'. Together they form a unique fingerprint.

    Cite this