Data Augmentation for Low-Resource Neural Machine Translation
| Authors | |
|---|---|
| Publication date | 2017 |
| Book title | The 55th Annual Meeting of the Association for Computational Linguistics |
| Book subtitle | proceedings of the Conference : July 30-August 4, 2017, Vancouver, Canada |
| Event | Annual Meeting of the Association for Computational Linguistics |
| Volume / Issue number | 2 |
| Pages (from-to) | 567-573 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Abstract | The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/P17-2090 |
| Other links | http://aclweb.org/anthology/attachments/P/P17/P17-2090.Presentation.pdf |
| Downloads | P17-2090 (Final published version) |