Using the Output Embedding to Improve Language Models

Ofir Press, Lior Wolf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

335 Scopus citations

Abstract

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
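The following is a minimal sketch of the weight-tying idea described in the abstract: the output projection of the language model reuses the input embedding matrix rather than learning a separate topmost weight matrix. This is an illustrative PyTorch example, not the authors' exact configuration; the single-layer LSTM, the extra projection from hidden size to embedding size, and all hyperparameters are assumptions made for the sake of a self-contained snippet.

import torch
import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # input embedding
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, emb_dim)            # map hidden state into embedding space
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        # Weight tying: the output (decoder) matrix is the very same tensor as the
        # input embedding, so gradients from both roles update a single matrix.
        self.decoder.weight = self.embedding.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embedding(tokens)            # (batch, seq, emb_dim)
        h, _ = self.rnn(x)                    # (batch, seq, hidden_dim)
        return self.decoder(self.proj(h))     # logits over the vocabulary

# Usage example with random token ids (hypothetical sizes).
model = TiedLM(vocab_size=10_000, emb_dim=256, hidden_dim=512)
logits = model(torch.randint(0, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 20, 10000])

Because the decoder shares its weight with the embedding, the model stores one vocabulary-sized matrix instead of two, which is the source of the size reduction the abstract reports for translation models.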

Original language: English
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 157-163
Number of pages: 7
ISBN (Electronic): 9781510838604
DOIs
State: Published - 2017
Event: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Valencia, Spain
Duration: 3 Apr 2017 - 7 Apr 2017

Publication series

Name: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference
Volume: 2

Conference

Conference: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
Country/Territory: Spain
City: Valencia
Period: 3/04/17 - 7/04/17
