Learning probabilistic synchronous CFGs for phrase-based translation

Open Access
Authors
Publication date 2010
Book title CoNLL-2010 : Fourteenth Conference on Computational Natural Language Learning
Book subtitle proceedings of the conference : 15-16 July 2010, Uppsala University, Uppsala, Sweden
ISBN
  • 9781932432831
Event Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Uppsala, Sweden
Pages (from-to) 117-125
Publisher Stroudsburg, PA: Association for Computational Linguistics (ACL)
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Probabilistic phrase-based synchronous grammars are now considered promising devices for statistical machine translation because they can express reordering phenomena between pairs of languages. Learning these hierarchical, probabilistic devices from parallel corpora is a major challenge, because of the multiple latent model variables and the risk of overfitting the data. This paper presents an effective method for learning a grammar family of particular interest to MT: binary Synchronous Context-Free Grammars with inverted/monotone orientation (a.k.a. Binary ITG). A second contribution is a lexicalized phrase reordering mechanism whose strengths are complementary to those of Chiang's model. The latter conditions reordering decisions on the surrounding lexical context of phrases, whereas our mechanism works with the lexical content of phrase pairs (akin to standard phrase-based systems). Surprisingly, our experiments on French-English data show that our learning method, applied to far simpler models, achieves performance indistinguishable from that of the Hiero system.
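The two ingredients named in the abstract, binary ITG composition with monotone/inverted orientation and a reordering choice conditioned on the lexical content of a phrase pair, can be illustrated with a small Python sketch. It is not the authors' model or learning method: the names `PhrasePair`, `combine`, `estimate_orientation_probs` and `pick_orientation` are hypothetical, and the orientation probabilities are a plain relative-frequency count over made-up observations rather than anything learned from a parallel corpus.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    """A bilingual span: a source phrase and its target translation."""
    source: tuple[str, ...]
    target: tuple[str, ...]


def combine(left: PhrasePair, right: PhrasePair, orientation: str) -> PhrasePair:
    """Apply one binary ITG rule to two adjacent source spans.

    The source sides always concatenate left-to-right; the target sides
    keep that order for a monotone rule [A B] and swap it for an
    inverted rule <A B>.
    """
    if orientation == "monotone":
        return PhrasePair(left.source + right.source, left.target + right.target)
    if orientation == "inverted":
        return PhrasePair(left.source + right.source, right.target + left.target)
    raise ValueError(f"unknown orientation: {orientation!r}")


def estimate_orientation_probs(observations):
    """Relative-frequency estimate of P(orientation | phrase pair).

    Hypothetical illustration: the statistic is keyed by the lexical
    content of a phrase pair, not by its surrounding context.
    This simple counting is NOT the paper's learning method.
    """
    counts = defaultdict(Counter)
    for pair, orientation in observations:
        counts[pair][orientation] += 1
    return {
        pair: {o: n / sum(c.values()) for o, n in c.items()}
        for pair, c in counts.items()
    }


def pick_orientation(probs, pair: PhrasePair) -> str:
    """Return the most probable orientation recorded for this phrase pair."""
    dist = probs[pair]
    return max(dist, key=dist.get)


# Toy French-English example: "maison bleue" -> "blue house".
house = PhrasePair(("maison",), ("house",))
blue = PhrasePair(("bleue",), ("blue",))

# Made-up observations of the orientation used when "blue" was combined
# with a preceding phrase pair (hypothetical data).
obs = [(blue, "inverted"), (blue, "inverted"), (blue, "monotone")]
probs = estimate_orientation_probs(obs)

best = pick_orientation(probs, blue)
print(best)                               # inverted
print(combine(house, blue, best).target)  # ('blue', 'house')
```

Keying the orientation statistic on the phrase pair itself, rather than on neighbouring words, mirrors the contrast the abstract draws with Chiang's model; the toy counts only stand in for whatever estimation procedure a full system would use.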
Document type Conference contribution
Language English
Published at http://portal.acm.org/citation.cfm?id=1870583 https://aclweb.org/anthology/W10-2915/