Learning probabilistic synchronous CFGs for phrase-based translation
| Authors | |
|---|---|
| Publication date | 2010 |
| Book title | CoNLL-2010 : Fourteenth Conference on Computational Natural Language Learning |
| Book subtitle | proceedings of the conference : 15-16 July 2010, Uppsala University, Uppsala, Sweden |
| Event | Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Uppsala, Sweden |
| Pages (from-to) | 117-125 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics (ACL) |
| Abstract | Probabilistic phrase-based synchronous grammars are now considered promising devices for statistical machine translation because they can express reordering phenomena between pairs of languages. Learning these hierarchical, probabilistic devices from parallel corpora constitutes a major challenge, because of multiple latent model variables as well as the risk of data overfitting. This paper presents an effective method for learning a family of particular interest to MT, binary Synchronous Context-Free Grammars with inverted/monotone orientation (a.k.a. Binary ITG). A second contribution concerns devising a lexicalized phrase reordering mechanism that has complementary strengths to Chiang's model. The latter conditions reordering decisions on the surrounding lexical context of phrases, whereas our mechanism works with the lexical content of phrase pairs (akin to standard phrase-based systems). Surprisingly, our experiments on French-English data show that our learning method applied to far simpler models exhibits performance indistinguishable from the Hiero system. |
| Document type | Conference contribution |
| Language | English |
| Published at | http://portal.acm.org/citation.cfm?id=1870583 https://aclweb.org/anthology/W10-2915/ |
| Downloads | 329842.pdf (Final published version) |