TY - GEN
T1 - Using the output embedding to improve language models
AU - Press, Ofir
AU - Wolf, Lior
N1 - Publisher Copyright:
© 2017 Association for Computational Linguistics.
PY - 2017
Y1 - 2017
N2 - We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
AB - We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
UR - http://www.scopus.com/inward/record.url?scp=85021677033&partnerID=8YFLogxK
U2 - 10.18653/v1/e17-2025
DO - 10.18653/v1/e17-2025
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85021677033
T3 - 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference
SP - 157
EP - 163
BT - Short Papers
PB - Association for Computational Linguistics (ACL)
T2 - 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
Y2 - 3 April 2017 through 7 April 2017
ER -