Machine Translation, Volume 13, Issue 1, pp 59–80

Bilingual Sentence Alignment: Balancing Robustness and Accuracy

  • Michel Simard
  • Pierre Plamondon


Sentence alignment is the problem of making explicit the relations that exist between the sentences of two texts that are known to be mutual translations. Automatic sentence-alignment methods typically face two kinds of difficulties. First, there is the question of robustness. In real life, discrepancies between a source text and its translation are quite common: differences in layout, omissions, inversions, etc. Sentence-alignment programs must be ready to deal with such phenomena. Then, there is the question of accuracy. Even when translations are “clean”, alignment is still not a trivial matter: some decisions are hard to make, even for humans. We report here on the current state of our ongoing efforts to produce a sentence-alignment program that is both robust and accurate. The method that we propose relies on two new alignment engines: one that produces highly reliable and robust character-level alignments, and one that relies on statistical lexical knowledge to produce accurate mappings. Experimental results are presented which demonstrate the method's effectiveness, and highlight where problems remain to be solved.
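
To make the alignment problem concrete, the sketch below aligns two short sentence lists with a simple length-based dynamic program. It is not the two-engine method summarized above, and it uses neither character-level alignment nor statistical lexical knowledge; it is only a minimal baseline. The function names, the set of allowed groupings, and the penalty values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Allowed groupings: (number of source sentences, number of target sentences).
# 1-0 and 0-1 model omissions; 2-1 and 1-2 model merged or split sentences.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical fixed penalties per grouping type (illustrative, not tuned).
PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Relative character-length mismatch plus the penalty for this grouping."""
    longest = max(src_len, tgt_len, 1)
    return abs(src_len - tgt_len) / longest + PENALTY[bead]


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return the cheapest sequence of groupings as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: cost[i][j] is the best cost of aligning src[:i] with tgt[:j].
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Backward pass: follow the stored groupings from the end to the start.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    english = ["He arrived late.", "Nobody noticed."]
    french = ["Il est arrivé en retard et personne ne l'a remarqué."]
    print(align(english, french))
```

A length-only cost of this kind illustrates the robustness and accuracy tension the abstract mentions: the high fixed penalty on omissions keeps the path stable on clean text, but it also makes genuine omissions expensive to detect, and sentence length alone says nothing about whether two sentences actually share content.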

Keywords: bilingual sentence alignment, bitext, translation analysis, statistical translation model





Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Michel Simard (1)
  • Pierre Plamondon (1)
  1. Laboratoire de recherche appliquée en linguistique informatique (RALI), Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, Canada
