Abstract
We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based on either sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
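To make the sentence-length idea concrete, the following is a minimal Python sketch of length-based alignment by dynamic programming. It is not the method of this paper: the cost function, the bead penalties in `BEADS`, and the names `length_cost` and `align_by_length` are all assumptions chosen for illustration, and a real aligner would use a probabilistic length model instead of this toy cost.

```python
from typing import List, Tuple

# Allowed bead shapes (source sentences, target sentences), each with an
# illustrative fixed penalty. The values are assumptions, not tuned.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Toy mismatch cost: relative difference between the two word counts."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / max(src_len, tgt_len)


def align_by_length(
    src: List[str], tgt: List[str]
) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices), lowest total cost."""
    s_len = [len(s.split()) for s in src]
    t_len = [len(t.split()) for t in tgt]
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by one bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + penalty + length_cost(
                    sum(s_len[i:ni]), sum(t_len[j:nj])
                )
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)

    # Trace back from the end of both texts to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Only word counts are consulted, which is why methods of this kind are fast. It is also why they can go wrong when neighbouring sentences happen to have similar lengths, the gap that word-correspondence information is meant to close.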
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moore, R.C. (2002). Fast and Accurate Sentence Alignment of Bilingual Corpora. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_14
DOI: https://doi.org/10.1007/3-540-45820-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44282-0
Online ISBN: 978-3-540-45820-3
eBook Packages: Springer Book Archive