Tensors for deep learning theory: Analyzing deep learning architectures via tensorization

Yoav Levine, Noam Wies, Or Sharir, Nadav Cohen, Amnon Shashua

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Deep learning architectures have enabled unprecedented advances in a wide range of artificial intelligence-related applications. The empirical success of these architectures has placed fundamental riddles regarding their operation at the front lines of modern theoretical machine learning research. Related theoretical efforts can be broadly divided into (i) explaining the observed success of deep learning architectures and (ii) harnessing these insights to improve their operation. In this chapter, we outline a tensor analysis-based contribution to understanding and improving the expressivity of prominent deep learning architecture classes. We detail a successful proof methodology, centered on analyzing grid tensors of the functions realized by deep learning architecture classes, which has been applied to convolutional, recurrent, and self-attention networks. The rank of an architecture's grid tensor is used to bound the input dependencies the architecture can model and to establish the superiority of one architectural configuration over another. We demonstrate how this methodology has advanced the understanding of these architectures' operation and consequently led to their practical improvement.
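The core idea behind the grid-tensor argument can be illustrated in miniature: evaluate a function on a grid of "template" inputs and inspect the rank of the resulting tensor, which reflects how strongly the function entangles its inputs. The sketch below is illustrative only and is not the chapter's construction; the template points and example functions are hypothetical choices made for demonstration.

```python
import numpy as np

# Hypothetical 1D template points at which we sample each input.
templates = np.linspace(-1.0, 1.0, 8)

# A separable function f(x, y) = sin(x) * cos(y): its grid tensor is an
# outer product of two vectors, so its rank is 1.
separable = np.array([[np.sin(x) * np.cos(y) for y in templates]
                      for x in templates])

# An entangled function f(x, y) = sin(x + y) mixes its inputs; since
# sin(x + y) = sin(x)cos(y) + cos(x)sin(y), its grid tensor has rank 2.
entangled = np.array([[np.sin(x + y) for y in templates]
                      for x in templates])

print(np.linalg.matrix_rank(separable))  # 1
print(np.linalg.matrix_rank(entangled))  # 2
```

In the two-input case the grid tensor is a matrix and ordinary matrix rank applies; the chapter's analysis extends this to many inputs, where the relevant quantity is the rank of a matricization of the grid tensor with respect to a partition of the inputs (the separation rank).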

Original language: English
Title of host publication: Tensors for Data Processing
Subtitle of host publication: Theory, Methods, and Applications
Publisher: Elsevier
Pages: 215-248
Number of pages: 34
ISBN (Electronic): 9780128244470
ISBN (Print): 9780323859653
DOIs
State: Published - 1 Jan 2021
Externally published: Yes

Keywords

  • Convolutional networks
  • Deep learning
  • Depth efficiency
  • Expressivity
  • Grid tensors
  • Recurrent networks
  • Self-attention networks
  • Separation rank
