TY - GEN

T1 - Shampoo

AU - Gupta, Vineet

AU - Koren, Tomer

AU - Singer, Yoram

N1 - Publisher Copyright:
© 2018 35th International Conference on Machine Learning, ICML 2018. All rights reserved.

PY - 2018

Y1 - 2018

N2 - Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state- of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo's runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.

AB - Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state- of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo's runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.

UR - http://www.scopus.com/inward/record.url?scp=85057324679&partnerID=8YFLogxK

M3 - פרסום בספר כנס

AN - SCOPUS:85057324679

T3 - 35th International Conference on Machine Learning, ICML 2018

SP - 2956

EP - 2964

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Dy, Jennifer

A2 - Krause, Andreas

PB - International Machine Learning Society (IMLS)

Y2 - 10 July 2018 through 15 July 2018

ER -