Statistical Machine Translation on Paraphrased Corpora

This paper presents a statistical machine translation system trained on normalized corpora. Paraphrasing is carried out automatically by inducing paraphrase expressions from a bilingual corpus. Normalization is then treated as a specific case of paraphrasing, in which the paraphrase chosen for a given input is determined by its frequency in a corpus. Experimental results on Japanese-to-English translation with a normalized English corpus show an 8% reduction in word error rate and an improvement in subjective evaluation from 70% to 72.5%.
Published in 2002
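
The frequency-based normalization described in the abstract can be illustrated with a minimal sketch. It assumes the paraphrase sets have already been induced (the abstract does not detail that step), and it simply maps every sentence to the most frequent member of its paraphrase set. The names build_normalizer, normalize, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_normalizer(corpus, paraphrase_sets):
    """Map each expression to the most frequent member of its paraphrase set.

    Assumption: paraphrase_sets is an iterable of sets of mutually
    paraphrasable expressions, induced beforehand by some other step.
    """
    counts = Counter(corpus)
    normalizer = {}
    for group in paraphrase_sets:
        # Sort first so that ties are broken deterministically.
        canonical = max(sorted(group), key=lambda s: counts[s])
        for expression in group:
            normalizer[expression] = canonical
    return normalizer

def normalize(corpus, normalizer):
    """Replace each expression by its canonical paraphrase, if it has one."""
    return [normalizer.get(s, s) for s in corpus]

# Toy data, invented for illustration only.
corpus = ["could you help me", "please help me", "please help me", "thank you"]
paraphrase_sets = [{"could you help me", "please help me", "would you help me"}]

normalizer = build_normalizer(corpus, paraphrase_sets)
normalized = normalize(corpus, normalizer)
```

In this sketch the normalized corpus has less surface variation than the original, which is the property the abstract relies on when training the translation model on normalized text.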