Statistical machine translation with local language models

Authors
Publication date 2011
Book title EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Event EMNLP '11, Conference on Empirical Methods in Natural Language Processing
Pages (from-to) 869-879
Publisher Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions, and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and across different genres when compared against a competitive baseline and a system using a part-of-speech model.
Document type Conference contribution
Language English
Published at http://delivery.acm.org/10.1145/2150000/2145528/p869-monz.pdf?ip=145.18.109.227&acc=OPEN&CFID=196570483&CFTOKEN=48575281&__acm__=1352451699_dacf34ffc5e528639882c3c57cd14e61