, Volume 13, Issue 1, pp 59-80

Bilingual Sentence Alignment: Balancing Robustness and Accuracy

Rent the article at a discount

Rent now

* Final gross prices may vary according to local VAT.

Get Access

Abstract

Sentence alignment is the problem of making explicit the relations that exist between the sentences of two texts that are known to be mutual translations. Automatic sentence-alignment methods typically face two kinds of difficulties. First, there is the question of robustness. In real life, discrepancies between a source text and its translation are quite common: differences in layout, omissions, inversions, etc. Sentence-alignment programs must be ready to deal with such phenomena. Then, there is the question of accuracy. Even when translations are “clean”, alignment is still not a trivial matter: some decisions are hard to make, even for humans. We report here on the current state of our ongoing efforts to produce a sentence-alignment program that is both robust and accurate. The method that we propose relies on two new alignment engines: one that produces highly reliable and robust character-level alignments, and one that relies on statistical lexical knowledge to produce accurate mappings. Experimental results are presented which demonstrate the method's effectiveness, and highlight where problems remain to be solved.