Exploring Unsupervised Pretraining Objectives for Machine Translation

doi:https://doi.org/10.18653/v1/2021.findings-acl.261

Authors	C. Baziotis, I. Titov, A. Birch, B. Haddow
Publication date	2021
Host editors	C. Zong, F. Xia, W. Li, R. Navigli
Book title	Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Book subtitle	Findings of ACL: ACL-IJCNLP 2021 : August 1-6, 2021
ISBN (electronic)	9781954085541
Event	The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
Pages (from-to)	2956-2971
Number of pages	16
Publisher	Stroudsburg, PA: The Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English↔German, English↔Nepali and English↔Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT that also requires models with strong cross-lingual abilities.
Document type	Conference contribution
Note	With supplementary video
Language	English
Published at	https://doi.org/10.18653/v1/2021.findings-acl.261
Other links	https://paperswithcode.com/paper/exploring-unsupervised-pretraining-objectives, https://www.scopus.com/pages/publications/85115309740
Downloads	2021.findings-acl.261 (Final published version)
Supplementary materials	2021.findings-acl.261
