Abstract
Extracting term equivalents is one of the most important tasks in building bilingual dictionaries. Several measures have been proposed for extracting translation equivalents from aligned parallel texts. In this paper, we compare 28 measures of similarity based on the co-occurrence of words in aligned parallel text segments. The parallel texts are aligned using a simple method that extends previous work by Pascale Fung & Kathleen McKeown and by Melamed but which, unlike that work, does not use statistically unsupported heuristics to filter reliable points.
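The abstract does not list the 28 measures, so the following is only an illustrative sketch of how co-occurrence-based similarity scores can be computed from aligned segments, not the authors' set of measures or their exact formulas. It assumes the texts are already aligned and tokenized, builds a 2×2 contingency table for a source/target word pair (a: both present in an aligned segment pair, b: only the source word, c: only the target word, d: neither), and computes four classic association scores associated with the cited literature: the Dice coefficient (Smadja et al.), specific mutual information (Church & Hanks), φ² (Gale & Church) and the log-likelihood ratio G² (Dunning). All function names and the toy Portuguese-English data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: each aligned pair holds the tokens of a
# source-language segment and of its aligned target-language segment.
AlignedPair = Tuple[Sequence[str], Sequence[str]]


def contingency(pairs: List[AlignedPair], s: str, t: str) -> Tuple[int, int, int, int]:
    """Count aligned segment pairs by presence of s (source side) and t (target side).

    a: s and t both present, b: only s, c: only t, d: neither.
    """
    a = b = c = d = 0
    for src, tgt in pairs:
        in_src, in_tgt = s in src, t in tgt
        if in_src and in_tgt:
            a += 1
        elif in_src:
            b += 1
        elif in_tgt:
            c += 1
        else:
            d += 1
    return a, b, c, d


def dice(a: int, b: int, c: int, d: int) -> float:
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def mutual_information(a: int, b: int, c: int, d: int) -> float:
    # Specific (pointwise) mutual information in bits; -inf if the words never co-occur.
    if a == 0:
        return float("-inf")
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    # G^2 = 2 * sum(O * ln(O / E)) over the four cells, with E = row * col / n.
    # Empty cells contribute 0 (the limit of O * ln O as O -> 0).
    n = a + b + c + d
    cells = [
        (a, a + b, a + c),  # s present, t present
        (b, a + b, b + d),  # s present, t absent
        (c, c + d, a + c),  # s absent, t present
        (d, c + d, b + d),  # s absent, t absent
    ]
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:
            g2 += observed * math.log(observed * n / (row * col))
    return 2 * g2


Measure = Callable[[int, int, int, int], float]


def rank_candidates(pairs: List[AlignedPair], s: str, measure: Measure) -> List[Tuple[float, str]]:
    """Score every target word that co-occurs with s at least once, best first (ties alphabetical)."""
    candidates = {t for src, tgt in pairs if s in src for t in tgt}
    scored = [(measure(*contingency(pairs, s, t)), t) for t in candidates]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Toy, made-up Portuguese-English aligned segments (not data from the paper).
PAIRS: List[AlignedPair] = [
    ("o gato dorme".split(), "the cat sleeps".split()),
    ("o cão dorme".split(), "the dog sleeps".split()),
    ("o gato come".split(), "the cat eats".split()),
    ("o cão come".split(), "the dog eats".split()),
]

for name, measure in [("Dice", dice), ("MI", mutual_information),
                      ("phi^2", phi_squared), ("G^2", log_likelihood)]:
    print(name, rank_candidates(PAIRS, "gato", measure)[:2])
```

On this toy data every measure ranks "cat" first for "gato", while the frequent word "the" gets a relatively high Dice score (2/3) but zero mutual information, φ² and G². Note that φ² squares a·d − b·c, so the negatively associated pair gato/dog (a = 0) would also score φ² = 1.0; rank_candidates only avoids this because it restricts candidates to target words that co-occur with the source word at least once.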
References
Brown, P., Lai, J., Mercer, R.: Aligning Sentences in Parallel Corpora. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California, U.S.A. (1991) 169–176
Church, K.: Char_align: A Program for Aligning Parallel Texts at the Character Level. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio, U.S.A. (1993) 1–8
Church, K., Hanks, P.: Word Association Norms, Mutual Information and Lexicography. In: Computational Linguistics, Vol. 16, number 1 (1990) 22–29
Dagan, I., Church, K., Gale, W.: Robust Bilingual Word Alignment for Machine Aided Translation. In: Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, Columbus, Ohio, U.S.A. (1993) 1–8
Daille, B.: Combined Approach for Terminology Extraction: Lexical Statistics and Linguistic Filtering. In: UCREL Technical Papers, Vol. 5., University of Lancaster, Department of Linguistics (1995)
Dunning, T.: Accurate Methods for the Statistics of Surprise and Coincidence. In: Computational Linguistics, Vol. 19, number 1 (1993) 61–74
Fung, P., McKeown, K.: Aligning Noisy Parallel Corpora across Language Groups: Word Pair Feature Matching by Dynamic Time Warping. In: Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, Maryland, U.S.A. (1994) 81–88
Fung, P., McKeown, K.: A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. In: Machine Translation, Vol. 12, numbers 1–2 (Special issue) (1997) 53–87
Gale, W., Church, K.: Identifying Word Correspondences in Parallel Texts. In: Proceedings of the 4th DARPA Speech and Natural Language Workshop, Pacific Grove, California, U.S.A., Morgan Kaufmann (1991) 152–157
Kay, M., Röscheisen, M.: Text-Translation Alignment. In: Computational Linguistics, Vol. 19, number 1 (1993) 121–142
Kotz, S., Johnson, N., Read, C.: Encyclopedia of Statistical Sciences, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1982)
Langlais, P., El-Bèze, M.: Alignement de Corpus Bilingues: Algorithmes et Évaluation. In: Ressources et Évaluations en Ingénierie de la Langue, Collection Actualité Scientifique. AUPELF-UREF, Paris, France (1999)
Melamed, I.: Bitext Maps and Alignment via Pattern Recognition. In: Computational Linguistics, Vol. 25, number 1 (1999) 107–130
Oakes, M.: Statistics for Corpus Linguistics. Edinburgh University Press, Edinburgh, U.K. (1998)
Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Alignment with Hapaxes. In: Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI'2000), Las Vegas, U.S.A. CSREA Press, U.S.A. (2000) 1089–1095
Ribeiro, A., Lopes, G., Mexia, J.: Linear Regression Based Alignment of Parallel Texts Using Homograph Words. In: Horn, W. (ed.): ECAI 2000. Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Germany. IOS Press, Amsterdam, Netherlands (2000)
Ribeiro, A., Lopes, G., Mexia, J.: Aligning Portuguese and Chinese Parallel Texts Using Confidence Bands. In: Mizoguchi, R., Slaney, J. (eds.): Proceedings of the Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI 2000). Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin Heidelberg New York (2000)
Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Parallel Texts Alignment. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, China (2000)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval, McGraw-Hill, New York (1983)
da Silva, J., Dias, G., Guilloré, S., Lopes, J.: Using LocalMaxs Algorithms for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J. (eds.): Progress in Artificial Intelligence. Lecture Notes in Artificial Intelligence, Vol. 1695. Springer-Verlag, Berlin Heidelberg New York (1999) 113–132
Simard, M., Plamondon, P.: Bilingual Sentence Alignment: Balancing Robustness and Accuracy. In: Machine Translation, Vol. 13, number 1 (1998) 59–80
Smadja, F., McKeown, K., Hatzivassiloglou, V.: Translating Collocations for Bilingual Lexicons: A Statistical Approach. In: Computational Linguistics, Vol. 22, number 1 (1996) 1–38
Wonnacott, T., Wonnacott, R.: Introductory Statistics, 5th edition, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1990)
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ribeiro, A., Pereira Lopes, G., Mexia, J. (2000). Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity. In: Monard, M.C., Sichman, J.S. (eds) Advances in Artificial Intelligence. IBERAMIA SBIA 2000. Lecture Notes in Computer Science, vol 1952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44399-1_35
DOI: https://doi.org/10.1007/3-540-44399-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41276-2
Online ISBN: 978-3-540-44399-5
eBook Packages: Springer Book Archive