Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings

Open Access
Authors
  • D. Xezonaki
  • T. Khalil
  • D. Stap
  • B.J. Denis
Publication date 2023
Host editors
  • M. Utiyama
  • R. Wang
Book title MTS: Machine Translation Summit 2023
Book subtitle September 4-8, 2023, Macau SAR, China: Proceedings of Machine Translation Summit XIX. - Vol. 1: Research Track
Event Machine Translation Summit XIX
Pages (from-to) 209–221
Publisher Asia-Pacific Association for Machine Translation
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Domain robustness is a key challenge for Neural Machine Translation (NMT). Translating text drawn from a distribution different from the training set requires NMT models to generalize well to unseen domains. In this work we propose a novel way to address domain robustness by fusing external topic knowledge into the NMT architecture. We employ a pretrained denoising autoencoder and fuse topic information into the system during both continued pretraining and finetuning of the model on the downstream NMT task. Our results show that incorporating external topic knowledge, as well as additional pretraining, can improve the out-of-domain performance of NMT models. The proposed methodology matches the state of the art in out-of-domain performance. Our analysis shows that a low overlap between the pretraining and finetuning corpora, as well as the quality of the topic representations, helps NMT systems become more robust under domain shift.
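To make the fusion idea concrete, the sketch below shows one simple way a sentence-level topic distribution (e.g. from a pretrained topic model) could be projected into the model's embedding space and added to the token embeddings of an NMT encoder. All dimensions, names, and the additive fusion choice are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): d_model for token
# embeddings, n_topics for the external topic model.
d_model, n_topics, seq_len, vocab = 16, 8, 5, 100

token_emb = rng.normal(size=(vocab, d_model))      # NMT token embedding table
topic_proj = rng.normal(size=(n_topics, d_model))  # maps topic space -> model space

def fuse(token_ids, topic_dist):
    """Add a projected sentence-level topic vector to every token embedding.

    topic_dist is a probability distribution over topics for the source
    sentence; additive fusion is one simple illustrative choice.
    """
    tok = token_emb[token_ids]           # (seq_len, d_model)
    topic_vec = topic_dist @ topic_proj  # (d_model,)
    return tok + topic_vec               # broadcast over all positions

ids = rng.integers(0, vocab, size=seq_len)
dist = rng.dirichlet(np.ones(n_topics))
fused = fuse(ids, dist)
print(fused.shape)  # (5, 16)
```

The fused embeddings would then feed the encoder as usual, so the same mechanism can be applied unchanged during continued pretraining and during finetuning on the translation task.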
Document type Conference contribution
Language English
Published at https://aclanthology.org/2023.mtsummit-research.18