Data Augmentation for Low-Resource Neural Machine Translation

Open Access
Authors
Publication date 2017
Host editors
  • R. Barzilay
  • M.-Y. Kan
Book title The 55th Annual Meeting of the Association for Computational Linguistics
Book subtitle Proceedings of the Conference: July 30-August 4, 2017, Vancouver, Canada
ISBN
  • 9781945626760
Event Annual Meeting of the Association for Computational Linguistics
Volume | Issue number 2
Pages (from-to) 567-573
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs, such corpora are not available, which results in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.
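The abstract describes the augmentation only at a high level. The Python sketch below illustrates the general idea of placing a rare source word into an existing sentence context and replacing the aligned target word with its translation. It is not the authors' implementation. The word alignments, the bilingual lexicon, and the `fits` check (here a toy bigram lookup standing in for a language-model judgment of whether the rare word suits the context) are all hypothetical inputs, and `rare_words`, `augment`, and `bigram_fits` are illustrative names.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Sentence = List[str]
Pair = Tuple[Sentence, Sentence]


def rare_words(src_corpus: List[Sentence], max_count: int) -> Set[str]:
    """Return source words that occur at most `max_count` times in the corpus."""
    counts = Counter(word for sent in src_corpus for word in sent)
    return {word for word, c in counts.items() if c <= max_count}


def augment(
    pairs: List[Pair],
    alignments: List[Dict[int, int]],  # hypothetical one-to-one source->target alignment
    lexicon: Dict[str, str],  # hypothetical bilingual lexicon for rare words
    rare: Set[str],
    fits: Callable[[Sentence, int, str], bool],  # stand-in for a language-model check
) -> List[Pair]:
    """Create new pairs by substituting rare words into existing sentence contexts."""
    new_pairs: List[Pair] = []
    for (src, tgt), align in zip(pairs, alignments):
        for i, word in enumerate(src):
            if i not in align:
                continue  # no aligned target word to replace
            for cand in sorted(rare):  # sorted for deterministic output
                if cand == word or cand not in lexicon:
                    continue
                if not fits(src, i, cand):
                    continue  # context judged implausible for this rare word
                j = align[i]
                new_src = src[:i] + [cand] + src[i + 1:]
                new_tgt = tgt[:j] + [lexicon[cand]] + tgt[j + 1:]
                new_pairs.append((new_src, new_tgt))
    return new_pairs


# Toy stand-in for a language model: a rare word "fits" position i
# only if the bigram (previous word, candidate) was observed.
BIGRAMS = {("the", "lynx")}


def bigram_fits(src: Sentence, i: int, cand: str) -> bool:
    return i > 0 and (src[i - 1], cand) in BIGRAMS


# Toy usage: "lynx" occurs only once in the source corpus.
corpus = [["the", "cat", "sleeps"], ["the", "cat", "eats"], ["a", "lynx", "hunts"]]
rare = rare_words(corpus, max_count=1)
result = augment(
    pairs=[(["the", "cat", "sleeps"], ["el", "gato", "duerme"])],
    alignments=[{0: 0, 1: 1, 2: 2}],
    lexicon={"lynx": "lince"},
    rare=rare,
    fits=bigram_fits,
)
print(result)  # [(['the', 'lynx', 'sleeps'], ['el', 'lince', 'duerme'])]
```

Replacing a single target token assumes the aligned word is a direct translation of the replaced source word; multi-word alignments and agreement effects (for example, the gender of a determiner) would need extra handling in a real pipeline.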
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/P17-2090
Other links http://aclweb.org/anthology/attachments/P/P17/P17-2090.Presentation.pdf
Downloads
P17-2090 (Final published version)