|Title||Syntax-Based Word Reordering in Phrase-Based Statistical Machine Translation: Why Does it Work?|
|Publication Type||Conference Paper|
|Year of Publication||2008|
|Authors||Zwarts S, Dras M|
|Conference Name||MT Summit|
Most natural language applications have some degree of preprocessing of data: tokenisation, stemming and so on. In the domain of Statistical Machine Translation (SMT) it has been shown that word reordering as a preprocessing step can help the translation process, but it is unclear why. We propose two possible reasons for the observed improvement: (1) that the reordering explicitly matches the syntax of the source language more closely to that of the target language; or (2) that it fits the data better to the mechanisms of phrasal SMT (PSMT). In previous work from German to English, for example, hand-written language-specific reordering rules both match the German more closely to English syntax and compress heads and dependants into the PSMT phrasal window. Whether the source of the improvement is (1) or (2) has not been determined, although most other work assumes the former. To identify the effects of each possible cause, we carry out two sets of experiments. For (1) we reverse the language-dependent syntactic reordering such that heads and dependants are moved apart. For (2), we propose a generic approach to minimising dependency distances in reordering that does not explicitly match target language word order and that does not require language-specific rules; its aim is to investigate the cause of the improvement rather than to beat state-of-the-art systems. The results show that (1) and (2) individually do still lead to improvements in translation quality, but each is weaker than the original, suggesting that both features are necessary for a strong improvement. A consequence of this is that it is possible to gain half the improvement of language-specific rules through a single generic rule.