Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1952)

Abstract

Extraction of term equivalents is one of the most important tasks for building bilingual dictionaries. Several measures have been proposed to extract translation equivalents from aligned parallel texts. In this paper, we compare 28 measures of similarity based on the co-occurrence of words in aligned parallel text segments. Parallel texts are aligned using a simple method that extends previous work by Pascale Fung and Kathleen McKeown and by Melamed but, in contrast to that work, does not use statistically unsupported heuristics to filter reliable points.
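To make the idea of a co-occurrence-based similarity measure concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not list the 28 measures, so the Dice coefficient and pointwise mutual information are used here purely as illustrative assumptions; the function names, the toy English-Portuguese segments, and the representation of each segment as a set of tokens are likewise hypothetical.

```python
import math

def counts(segments, src_word, tgt_word):
    """Return (both, n_src, n_tgt, total) over aligned segment pairs.

    both  : pairs where src_word and tgt_word both occur
    n_src : pairs whose source segment contains src_word
    n_tgt : pairs whose target segment contains tgt_word
    total : number of aligned segment pairs
    """
    both = n_src = n_tgt = 0
    for src_seg, tgt_seg in segments:
        in_src = src_word in src_seg
        in_tgt = tgt_word in tgt_seg
        n_src += in_src
        n_tgt += in_tgt
        both += in_src and in_tgt
    return both, n_src, n_tgt, len(segments)

def dice(both, n_src, n_tgt):
    """Dice coefficient: 2 * joint count over the sum of marginal counts."""
    denom = n_src + n_tgt
    return 2 * both / denom if denom else 0.0

def pmi(both, n_src, n_tgt, total):
    """Pointwise mutual information (base 2) estimated from segment counts."""
    if both == 0:
        return float("-inf")
    return math.log2(both * total / (n_src * n_tgt))

# Hypothetical toy data: each pair is (source tokens, target tokens).
segments = [
    ({"the", "house"}, {"a", "casa"}),
    ({"the", "car"}, {"o", "carro"}),
    ({"house", "red"}, {"casa", "vermelha"}),
    ({"the", "red", "car"}, {"o", "carro", "vermelho"}),
]

both, n_src, n_tgt, total = counts(segments, "house", "casa")
print(dice(both, n_src, n_tgt))        # 1.0
print(pmi(both, n_src, n_tgt, total))  # 1.0
```

In this toy corpus "house" and "casa" always appear in the same aligned segments, so both measures rank the pair highly, whereas "house" and "carro" never co-occur and score 0.0 (Dice) and negative infinity (PMI). Different measures weight the joint and marginal counts differently, which is why comparing them on the same aligned data is informative.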

References

  • Brown, P., Lai, J., Mercer, R.: Aligning Sentences in Parallel Corpora. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California, U.S.A. (1991) 169–176

  • Church, K.: Char_align: A Program for Aligning Parallel Texts at the Character Level. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio, U.S.A. (1993) 1–8

  • Church, K., Hanks, P.: Word Association Norms, Mutual Information and Lexicography. In: Computational Linguistics, Vol. 16, number 1 (1990) 22–29

  • Dagan, I., Church, K., Gale, W.: Robust Bilingual Word Alignment for Machine Aided Translation. In: Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, Columbus, Ohio, U.S.A. (1993) 1–8

  • Daille, B.: Combined Approach for Terminology Extraction: Lexical Statistics and Linguistic Filtering. In: UCREL Technical Papers, Vol. 5, University of Lancaster, Department of Linguistics (1995)

  • Dunning, T.: Accurate Methods for the Statistics of Surprise and Coincidence. In: Computational Linguistics, Vol. 19, number 1 (1993) 61–74

  • Fung, P., McKeown, K.: Aligning Noisy Parallel Corpora across Language Groups: Word Pair Feature Matching by Dynamic Time Warping. In: Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, Maryland, U.S.A. (1994) 81–88

  • Fung, P., McKeown, K.: A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. In: Machine Translation, Vol. 12, numbers 1–2 (Special issue) (1997) 53–87

  • Gale, W., Church, K.: Identifying Word Correspondences in Parallel Texts. In: Proceedings of the 4th DARPA Speech and Natural Language Workshop, Pacific Grove, California, U.S.A., Morgan Kaufmann (1991) 152–157

  • Kay, M., Röscheisen, M.: Text-Translation Alignment. In: Computational Linguistics, Vol. 19, number 1 (1993) 121–142

  • Kotz, S., Johnson, N., Read, C.: Encyclopedia of Statistical Sciences, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1982)

  • Langlais, P., El-Bèze, M.: Alignement de Corpus Bilingues: Algorithmes et Évaluation. In: Ressources et Évaluations en Ingénierie de la Langue, Collection Actualité Scientifique. Aupelf-Uref, Paris, France (1999)

  • Melamed, I.: Bitext Maps and Alignment via Pattern Recognition. In: Computational Linguistics, Vol. 25, number 1 (1999) 107–130

  • Oakes, M.: Statistics for Corpus Linguistics. Edinburgh University Press, Edinburgh, U.K. (1998)

  • Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Alignment with Hapaxes. In: Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI'2000), Las Vegas, U.S.A. CSREA Press, U.S.A. (2000) 1089–1095

  • Ribeiro, A., Lopes, G., Mexia, J.: Linear Regression Based Alignment of Parallel Texts Using Homograph Words. In: Horn, W. (ed.): ECAI 2000. Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Germany. IOS Press, Amsterdam, Netherlands (2000)

  • Ribeiro, A., Lopes, G., Mexia, J.: Aligning Portuguese and Chinese Parallel Texts Using Confidence Bands. In: Mizoguchi, R. & Slaney, J. (eds.), Proceedings of the Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI 2000) — Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin Heidelberg New York (2000)

  • Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Parallel Texts Alignment. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, China (2000)

  • Salton, G., McGill, M.: Introduction to Modern Information Retrieval, McGraw-Hill, New York (1983)

  • da Silva, J., Dias, G., Guilloré, S., Lopes, J.: Using Localmaxs algorithms for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J. (eds.): Progress in Artificial Intelligence — Lecture Notes in Artificial Intelligence, Vol. 1695. Springer-Verlag, Berlin Heidelberg New York (1999) 113–132

  • Simard, M., Plamondon, P.: Bilingual Sentence Alignment: Balancing Robustness and Accuracy. In: Machine Translation, Vol. 13, number 1 (1998) 59–80

  • Smadja, F., McKeown, K., Hatzivassiloglou, V.: Translating Collocations for Bilingual Lexicons: A Statistical Approach. In: Computational Linguistics, Vol. 22, number 1 (1996) 1–38

  • Wonnacott, T., Wonnacott, R.: Introductory Statistics, 5th edition, John Wiley & Sons, New York Chichester Brisbane Toronto Singapore (1990)

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ribeiro, A., Pereira Lopes, G., Mexia, J. (2000). Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity. In: Monard, M.C., Sichman, J.S. (eds) Advances in Artificial Intelligence. IBERAMIA SBIA 2000. Lecture Notes in Computer Science (LNAI), vol 1952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44399-1_35

  • DOI: https://doi.org/10.1007/3-540-44399-1_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41276-2

  • Online ISBN: 978-3-540-44399-5

  • eBook Packages: Springer Book Archive
