Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity

  • António Ribeiro
  • Gabriel Pereira Lopes
  • João Mexia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1952)

Abstract

Extraction of term equivalents is one of the most important tasks for building bilingual dictionaries. Several measures have been proposed to extract translation equivalents from aligned parallel texts. In this paper, we compare 28 measures of similarity based on the co-occurrence of words in aligned parallel text segments. Parallel texts are aligned using a simple method that extends previous work by Fung & McKeown and by Melamed but which, in contrast, does not use statistically unsupported heuristics to filter reliable points.
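To make the idea of a co-occurrence-based similarity measure concrete, the following is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not list which 28 measures are compared; the Dice coefficient and pointwise mutual information are used here only as two well-known examples. The function names, the whitespace tokenisation, and the toy English-Portuguese segments are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: each pair is one aligned (source, target) segment.
SEGMENTS = [
    ("the commission adopted the proposal", "a comissão adoptou a proposta"),
    ("the commission met today", "a comissão reuniu-se hoje"),
    ("the proposal was rejected", "a proposta foi rejeitada"),
    ("parliament met today", "o parlamento reuniu-se hoje"),
]


def cooccurrence_counts(segments, src_word, tgt_word):
    """Count aligned segments by presence of the two words.

    Returns (both, src_only, tgt_only, neither), i.e. the four cells of a
    2x2 contingency table. Tokenisation is a plain whitespace split, which
    is an assumption of this sketch.
    """
    both = src_only = tgt_only = neither = 0
    for src_seg, tgt_seg in segments:
        in_src = src_word in src_seg.split()
        in_tgt = tgt_word in tgt_seg.split()
        if in_src and in_tgt:
            both += 1
        elif in_src:
            src_only += 1
        elif in_tgt:
            tgt_only += 1
        else:
            neither += 1
    return both, src_only, tgt_only, neither


def dice(both, src_only, tgt_only, neither):
    """Dice coefficient: 2 * joint count / (sum of the two marginal counts)."""
    denom = 2 * both + src_only + tgt_only
    return 2 * both / denom if denom else 0.0


def pmi(both, src_only, tgt_only, neither):
    """Pointwise mutual information (base 2) of the two words over segments."""
    n = both + src_only + tgt_only + neither
    if both == 0:
        return float("-inf")  # the words never co-occur
    p_xy = both / n
    p_x = (both + src_only) / n
    p_y = (both + tgt_only) / n
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    for tgt in ("comissão", "proposta"):
        counts = cooccurrence_counts(SEGMENTS, "commission", tgt)
        print(tgt, counts, dice(*counts), pmi(*counts))
```

On this toy data, "commission" scores higher with "comissão" than with "proposta" under both measures, which is the kind of ranking a candidate-equivalent extractor would rely on. Different measures weight the four contingency cells differently, which is why comparing them on real aligned text is worthwhile.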

References

  1. Brown, P., Lai, J., Mercer, R.: Aligning Sentences in Parallel Corpora. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California, U.S.A. (1991) 169–176
  2. Church, K.: Char_align: A Program for Aligning Parallel Texts at the Character Level. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio, U.S.A. (1993) 1–8
  3. Church, K., Hanks, P.: Word Association Norms, Mutual Information and Lexicography. In: Computational Linguistics, Vol. 16, number 1 (1990) 22–29
  4. Dagan, I., Church, K., Gale, W.: Robust Bilingual Word Alignment for Machine Aided Translation. In: Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, Columbus, Ohio, U.S.A. (1993) 1–8
  5. Daille, B.: Combined Approach for Terminology Extraction: Lexical Statistics and Linguistic Filtering. In: UCREL Technical Papers, Vol. 5, University of Lancaster, Department of Linguistics (1995)
  6. Dunning, T.: Accurate Methods for the Statistics of Surprise and Coincidence. In: Computational Linguistics, Vol. 19, number 1 (1993) 61–74
  7. Fung, P., McKeown, K.: Aligning Noisy Parallel Corpora across Language Groups: Word Pair Feature Matching by Dynamic Time Warping. In: Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, Maryland, U.S.A. (1994) 81–88
  8. Fung, P., McKeown, K.: A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. In: Machine Translation, Vol. 12, numbers 1–2 (Special issue) (1997) 53–87
  9. Gale, W., Church, K.: Identifying Word Correspondences in Parallel Texts. In: Proceedings of the 4th DARPA Speech and Natural Language Workshop, Pacific Grove, California, U.S.A., Morgan Kaufmann (1991) 152–157
  10. Kay, M., Röscheisen, M.: Text-Translation Alignment. In: Computational Linguistics, Vol. 19, number 1 (1993) 121–142
  11. Kotz, S., Johnson, N., Read, C.: Encyclopedia of Statistical Sciences, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1982)
  12. Langlais, P., El-Bèze, M.: Alignement de Corpus Bilingues: Algorithmes et Évaluation. In: Ressources et Évaluations en Ingénierie de la Langue, Collection Actualité Scientifique. Aupelf-Uref, Paris, France (1999)
  13. Melamed, I.: Bitext Maps and Alignment via Pattern Recognition. In: Computational Linguistics, Vol. 25, number 1 (1999) 107–130
  14. Oakes, M.: Statistics for Corpus Linguistics. Edinburgh University Press, Edinburgh, U.K. (1998)
  15. Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Alignment with Hapaxes. In: Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI'2000), Las Vegas, U.S.A. CSREA Press, U.S.A. (2000) 1089–1095
  16. Ribeiro, A., Lopes, G., Mexia, J.: Linear Regression Based Alignment of Parallel Texts Using Homograph Words. In: Horn, W. (ed.): ECAI 2000. Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Germany. IOS Press, Amsterdam, Netherlands (2000)
  17. Ribeiro, A., Lopes, G., Mexia, J.: Aligning Portuguese and Chinese Parallel Texts Using Confidence Bands. In: Mizoguchi, R., Slaney, J. (eds.): Proceedings of the Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI 2000), Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin Heidelberg New York (2000)
  18. Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Parallel Texts Alignment. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, China (2000)
  19. Salton, G., McGill, M.: Introduction to Modern Information Retrieval, McGraw-Hill, New York (1983)
  20. da Silva, J., Dias, G., Guilloré, S., Lopes, J.: Using Localmaxs algorithms for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J. (eds.): Progress in Artificial Intelligence, Lecture Notes in Artificial Intelligence, Vol. 1695. Springer-Verlag, Berlin Heidelberg New York (1999) 113–132
  21. Simard, M., Plamondon, P.: Bilingual Sentence Alignment: Balancing Robustness and Accuracy. In: Machine Translation, Vol. 13, number 1 (1998) 59–80
  22. Smadja, F., McKeown, K., Hatzivassiloglou, V.: Translation Collocations for Bilingual Lexicons: A Statistical Approach. In: Computational Linguistics, Vol. 22, number 1 (1996) 1–38
  23. Wonnacott, T., Wonnacott, R.: Introductory Statistics, 5th edition, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1990)

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • António Ribeiro (1)
  • Gabriel Pereira Lopes (1)
  • João Mexia (2)

  1. Departamento de Informática, Quinta da Torre, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Monte da Caparica, Portugal
  2. Departamento de Matemática, Quinta da Torre, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Monte da Caparica, Portugal
