Bilingual Sentence Alignment: Balancing Robustness and Accuracy
Sentence alignment is the problem of making explicit the relations that exist between the sentences of two texts that are known to be mutual translations. Automatic sentence-alignment methods typically face two kinds of difficulties. First, there is the question of robustness. In real life, discrepancies between a source text and its translation are quite common: differences in layout, omissions, inversions, etc. Sentence-alignment programs must be ready to deal with such phenomena. Second, there is the question of accuracy. Even when translations are “clean”, alignment is still not a trivial matter: some decisions are hard to make, even for humans. We report here on the current state of our ongoing efforts to produce a sentence-alignment program that is both robust and accurate. The method that we propose relies on two new alignment engines: one that produces highly reliable and robust character-level alignments, and one that relies on statistical lexical knowledge to produce accurate mappings. We present experimental results that demonstrate the method's effectiveness and highlight where problems remain to be solved.
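To make the notion of an alignment concrete, here is a minimal sketch of what a sentence aligner outputs and how one can be computed by dynamic programming. It is not the method proposed in the article: it uses only character lengths, in the spirit of well-known length-based aligners, and the names `align`, `length_cost`, and `BEAD_PENALTY`, as well as the penalty values, are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

# An alignment is a list of "beads": (source sentence indices, target sentence indices).
Bead = Tuple[List[int], List[int]]

# Allowed bead shapes (source count, target count) with hypothetical penalties.
# 1-0 and 0-1 model omissions and insertions; 2-1 and 1-2 model merged sentences.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Toy cost: relative difference in character length (0.0 when equal)."""
    longest = max(src_len, tgt_len)
    return abs(src_len - tgt_len) / longest if longest else 0.0


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Return the lowest-cost monotone alignment between two sentence lists."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # best[i][j]: cheapest way to align the first i source and j target sentences.
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + penalty + length_cost(s_len, t_len)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end of both texts to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


src = ["The committee met on Monday.", "It approved the budget.", "Then it adjourned."]
tgt = ["Le comité s'est réuni lundi.", "Il a approuvé le budget, puis a levé la séance."]
print(align(src, tgt))  # [([0], [0]), ([1, 2], [1])]
```

In this example the translator merged the last two source sentences into one, and the sketch recovers that as a 2-1 bead. A length-only cost of this kind is exactly where the robustness and accuracy problems described above appear: it has no evidence beyond length for deciding between an omission and a merge, which is the kind of decision that lexical knowledge is meant to inform.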
Volume 13, Issue 1, pp. 59-80
- Kluwer Academic Publishers
- bilingual sentence alignment
- translation analysis
- statistical translation model