Permutation forests for modeling word order in machine translation

Open Access
Award date 13-12-2017
Number of pages 148
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
In natural language, the word order of linguistic productions can vary only within limited bounds. From a linguistic perspective, word order results from the repeated application of recursive syntactic functions. These syntactic operations produce a hierarchical syntactic structure together with a string of words that appear in a particular order.
However, different languages are governed by different syntactic rules. One of the main problems in machine translation is therefore to find the mapping between the word order of the source language and that of the target language. This is often done through syntactic transfer, in which the syntactic tree of the source sentence is recovered and then transduced so that its form is consistent with the syntactic rules of the target language.
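As a toy illustration of syntactic transfer, the Python sketch below takes a hand-written parse tree (standing in for the output of a source-side parser) and applies a single hypothetical transfer rule that moves the verb after its object, as in a verb-final target language. The tree encoding, the `RULES` table and the helper names `transfer` and `leaves` are assumptions made for illustration, not part of the thesis.

```python
# A parse tree is (label, children); a leaf is a plain word string.
# Hypothetical transfer rules: for a (parent label, child labels) pattern,
# the order in which the children should appear on the target side.
RULES = {
    ("VP", ("V", "NP")): (1, 0),  # verb-final: V NP -> NP V
}


def transfer(tree):
    """Recursively reorder the children of every node that matches a rule."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    kids = [transfer(c) for c in children]
    key = (label, tuple(k[0] if isinstance(k, tuple) else k for k in kids))
    order = RULES.get(key, tuple(range(len(kids))))  # default: keep source order
    return (label, [kids[i] for i in order])


def leaves(tree):
    """Read off the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


# Hand-written source parse (assumed; no parser is run here).
sentence = ("S", [("NP", [("N", ["John"])]),
                  ("VP", [("V", ["saw"]), ("NP", [("N", ["Mary"])])])])
print(" ".join(leaves(transfer(sentence))))  # John Mary saw
```

The rules operate on node labels such as `VP` and `NP`, so this style of reordering presupposes a syntactic parser for the source language.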
In this dissertation, I propose an alternative to syntactic transfer that preserves its good properties, namely its compositional and hierarchical structure, but that, unlike syntactic transfer, is derived directly from data without requiring any linguistic annotation. This approach brings two main advantages. First, it allows hierarchical reordering to be applied even to languages for which no syntactic parser is available. Second, the trees used in syntactic transfer cannot always cover the reordering patterns present in the data, whereas the trees used in this work are built directly over those patterns and therefore cover them by definition.
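To make the idea of trees built directly over reordering patterns concrete, here is a minimal Python sketch under stated assumptions. It assumes a one-to-one word alignment, encoded as a permutation `perm` where `perm[i]` is the target position of source word `i`, and factorizes it into a permutation tree: each node records the target-side order of its children, e.g. `(2, 1)` for an inverted pair, and a span that cannot be split into two contiguous blocks becomes a single flat node such as `(2, 4, 1, 3)`. The names `Node`, `is_block`, `build_pet` and `reorder` are hypothetical, and this simple recursive factorization is not necessarily the construction used in the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    op: tuple       # target-side order of the children, e.g. (2, 1) = inverted
    children: list  # each child is a Node or an int (a source word position)


def is_block(perm: Sequence[int], i: int, j: int) -> bool:
    """True if source span [i, j) maps to a contiguous range of target positions."""
    vals = perm[i:j]
    return max(vals) - min(vals) == j - i - 1


def relative_order(perm: Sequence[int], bounds) -> tuple:
    """Rank the child spans (1-based) by their smallest target position."""
    mins = [min(perm[a:b]) for a, b in bounds]
    ranked = sorted(mins)
    return tuple(ranked.index(m) + 1 for m in mins)


def build_pet(perm: Sequence[int], i: int = 0, j: Optional[int] = None):
    """Factorize the block perm[i:j] into one permutation tree."""
    if j is None:
        j = len(perm)
    if j - i == 1:
        return i  # leaf: a single source word
    # 1) Try the leftmost binary split into two blocks (monotone or inverted).
    for k in range(i + 1, j):
        if is_block(perm, i, k) and is_block(perm, k, j):
            bounds = [(i, k), (k, j)]
            break
    else:
        # 2) No binary split: a flat node whose children are the longest
        #    proper sub-blocks, collected greedily from left to right.
        bounds, s = [], i
        while s < j:
            e = max(e for e in range(s + 1, j + 1)
                    if e - s < j - i and is_block(perm, s, e))
            bounds.append((s, e))
            s = e
    children = [build_pet(perm, a, b) for a, b in bounds]
    return Node(relative_order(perm, bounds), children)


def reorder(tree, words):
    """Read off the target-side word order by permuting children at each node."""
    if isinstance(tree, int):
        return [words[tree]]
    ordered = [None] * len(tree.children)
    for child, rank in zip(tree.children, tree.op):
        ordered[rank - 1] = child
    return [w for child in ordered for w in reorder(child, words)]


tree = build_pet([1, 3, 0, 2])
print(tree)                                  # Node(op=(2, 4, 1, 3), children=[0, 1, 2, 3])
print(reorder(tree, ["a", "b", "c", "d"]))   # ['c', 'a', 'd', 'b']
```

Because the children of every node are contiguous blocks on both sides, reading the tree back with `reorder` reproduces the target order for any permutation, which is the sense in which such trees cover the data by definition. The permutation `[1, 3, 0, 2]` cannot be split into binary blocks at all, so it ends up as one flat four-way node. Where several binary splits are possible, as in a fully monotone permutation, the sketch simply takes the leftmost split point and therefore returns only one of the many trees consistent with the same permutation.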
Document type PhD thesis
Note ILLC Dissertation Series DS-2017-09
Language English