Aligned cross entropy for non-autoregressive machine translation

Marjan Ghazvininejad*, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
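To make the alignment idea concrete, the sketch below shows one plausible form of the monotonic-alignment dynamic program the abstract describes: a table A[i, j] holds the best cost of aligning the first i target tokens with the first j model predictions, built from three moves (align a target with a prediction, skip a prediction by charging the cost of a special blank token, or skip a target by reusing the current prediction). This is an illustrative NumPy reconstruction, not the authors' implementation; the function name, the blank_id convention, and the exact skip penalties are assumptions made here for clarity.

```python
import numpy as np

def aligned_cross_entropy(log_probs, target, blank_id):
    """Illustrative monotonic-alignment loss (hypothetical reconstruction, not the paper's code).

    log_probs: (num_predictions, vocab) array of per-position log-probabilities.
    target:    list of gold token ids.
    blank_id:  id of a special "blank" token charged when a prediction is skipped.
    Returns the minimum total negative log-likelihood over monotonic alignments.
    """
    num_pred = log_probs.shape[0]
    num_tgt = len(target)

    # A[i, j] = best cost of aligning the first i targets with the first j predictions.
    A = np.full((num_tgt + 1, num_pred + 1), np.inf)
    A[0, 0] = 0.0
    for j in range(1, num_pred + 1):
        # Only skipped predictions so far: each one pays the cost of predicting blank.
        A[0, j] = A[0, j - 1] - log_probs[j - 1, blank_id]

    for i in range(1, num_tgt + 1):
        for j in range(1, num_pred + 1):
            match = -log_probs[j - 1, target[i - 1]]
            A[i, j] = min(
                A[i - 1, j - 1] + match,                   # align target i with prediction j
                A[i, j - 1] - log_probs[j - 1, blank_id],  # skip prediction j (charge blank)
                A[i - 1, j] + match,                       # skip target i (reuse prediction j)
            )
    return A[num_tgt, num_pred]

# Toy usage: 4 predicted positions over a 5-token vocabulary (id 4 acts as blank).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
print(aligned_cross_entropy(log_probs, target=[1, 2, 3], blank_id=4))
```

Framed this way, the loss reduces to ordinary per-position cross entropy when predictions and targets already line up, and charges shifted outputs through the skip moves rather than penalizing every displaced token; in a framework with automatic differentiation, the same recursion over tensor-valued costs can propagate gradients for training, which is the property the paper relies on.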

Original language: English
Title of host publication: Proceedings of the 37th International Conference on Machine Learning
Editors: Hal Daume, Aarti Singh
Publisher: PMLR
Pages: 3515-3523
Number of pages: 9
ISBN (Electronic): 9781713821120
State: Published - 2020
Externally published: Yes
Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 2020 – 18 Jul 2020

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
Volume: 119
ISSN (Electronic): 2640-3498

Conference

Conference: 37th International Conference on Machine Learning, ICML 2020
City: Virtual, Online
Period: 13/07/20 – 18/07/20
