Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings

Open Access
Authors
  • D. Xezonaki
  • T. Khalil
  • D. Stap
  • B.J. Denis
Publication date 2023
Host editors
  • M. Utiyama
  • R. Wang
Book title MTS: Machine Translation Summit 2023
Book subtitle September 4-8, 2023, Macau SAR, China: Proceedings of Machine Translation Summit XIX. - Vol. 1: Research Track
Event Machine Translation Summit XIX
Pages (from-to) 209–221
Publisher Asia-Pacific Association for Machine Translation
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Domain robustness is a key challenge for Neural Machine Translation (NMT). Translating text drawn from a distribution different from the training set requires NMT models to generalize well to unseen domains. In this work we propose a novel way to address domain robustness by fusing external topic knowledge into the NMT architecture. We employ a pretrained denoising autoencoder and fuse topic information into the system during both continued pretraining and finetuning of the model on the downstream NMT task. Our results show that incorporating external topic knowledge, as well as additional pretraining, can improve the out-of-domain performance of NMT models. The proposed methodology matches the state of the art in out-of-domain performance. Our analysis shows that a low overlap between the pretraining and finetuning corpora, as well as the quality of the topic representations, helps NMT systems become more robust under domain shift.
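To make the fusion idea concrete, the sketch below shows one simple way a sentence-level topic distribution (e.g. from a pretrained topic model) could be projected into the model's embedding space and added to the token embeddings of an NMT encoder. All dimensions, names, and the additive fusion choice are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): d_model for token
# embeddings, n_topics for the external topic model.
d_model, n_topics, seq_len, vocab = 16, 8, 5, 100

token_emb = rng.normal(size=(vocab, d_model))      # NMT token embedding table
topic_proj = rng.normal(size=(n_topics, d_model))  # maps topic space -> model space

def fuse(token_ids, topic_dist):
    """Add a projected sentence-level topic vector to every token embedding.

    topic_dist is a probability distribution over topics for the source
    sentence; additive fusion is one simple illustrative choice.
    """
    tok = token_emb[token_ids]           # (seq_len, d_model)
    topic_vec = topic_dist @ topic_proj  # (d_model,)
    return tok + topic_vec               # broadcast over all positions

ids = rng.integers(0, vocab, size=seq_len)
dist = rng.dirichlet(np.ones(n_topics))
fused = fuse(ids, dist)
print(fused.shape)  # (5, 16)
```

The fused embeddings would then feed the encoder as usual, so the same mechanism can be applied unchanged during continued pretraining and during finetuning on the translation task.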
Document type Conference contribution
Language English
Published at https://aclanthology.org/2023.mtsummit-research.18