International Conference on Natural Language Processing

NLP 2014: Advances in Natural Language Processing pp 105-115

Cross-Lingual Semantic Similarity Measure for Comparable Articles

  • Motaz Saad
  • David Langlois
  • Kamel Smaïli
Conference paper

DOI: 10.1007/978-3-319-10888-9_11

Volume 8686 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Saad M., Langlois D., Smaïli K. (2014) Cross-Lingual Semantic Similarity Measure for Comparable Articles. In: Przepiórkowski A., Ogrodniczuk M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham

Abstract

A measure of similarity is required to find and compare cross-lingual articles concerning a specific topic. This measure can be based on bilingual dictionaries or on numerical methods such as Latent Semantic Indexing (LSI). In this paper, we use LSI in two ways to retrieve Arabic-English comparable articles. The first way is monolingual: the English article is translated into Arabic and then mapped into the Arabic LSI space. The second way is cross-lingual: Arabic and English documents are both mapped into a shared Arabic-English LSI space. We then compare the two LSI approaches with the dictionary-based approach on several English-Arabic parallel and comparable corpora. Results indicate that the performance of our cross-lingual LSI approach is competitive with the monolingual approach, and even better for some corpora. Moreover, both LSI approaches outperform the dictionary-based approach.
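
The cross-lingual variant described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the `ar_*` placeholder tokens standing in for Arabic words, the choice of `k = 2`, and the helper names `bow`, `fold_in`, `cosine` and `similarity` are all assumptions made for illustration. It also assumes the common construction in which each training document is the concatenation of an aligned English and Arabic article, which the abstract does not confirm. The sketch uses NumPy for the singular value decomposition.

```python
import numpy as np

# Hypothetical toy training data: aligned English/Arabic document pairs.
# Arabic words are represented by placeholder "ar_*" tokens.
pairs = [
    ("bank money market", "ar_bank ar_money ar_market"),
    ("money market trade", "ar_money ar_market ar_trade"),
    ("football match goal", "ar_football ar_match ar_goal"),
    ("match goal team", "ar_match ar_goal ar_team"),
]

# Assumption: each training document concatenates both sides of a pair,
# so English and Arabic terms end up in one shared term space.
train_docs = [(en + " " + ar).split() for en, ar in pairs]
vocab = sorted({w for doc in train_docs for w in doc})
index = {w: i for i, w in enumerate(vocab)}

def bow(tokens):
    """Raw term-count vector over the shared bilingual vocabulary."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in index:
            v[index[t]] += 1.0
    return v

# Term-document matrix (terms x documents) and its truncated SVD.
A = np.column_stack([bow(d) for d in train_docs])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions kept (arbitrary for this toy example)
U_k, s_k = U[:, :k], s[:k]

def fold_in(text):
    """Map a monolingual document into the k-dimensional LSI space."""
    return (U_k.T @ bow(text.split())) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def similarity(en_text, ar_text):
    """Cosine similarity of an English and an Arabic document in LSI space."""
    return cosine(fold_in(en_text), fold_in(ar_text))

related = similarity("money market", "ar_money ar_market")
unrelated = similarity("money market", "ar_match ar_goal")
print(related, unrelated)
```

On this toy data the related pair should score much higher than the unrelated one, even though the two documents share no surface terms. Under the same assumptions, the monolingual variant would translate the English article into Arabic first and fold both documents into an LSI space built from Arabic documents only.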

Keywords

Cross-lingual latent semantic indexing, corpus comparability, cross-lingual information retrieval

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Motaz Saad (1, 2, 3)
  • David Langlois (1, 2, 3)
  • Kamel Smaïli (1, 2, 3)

  1. SMarT Group, LORIA INRIA, Villers-lès-Nancy, France
  2. Université de Lorraine, LORIA, UMR 7503, Villers-lès-Nancy, France
  3. CNRS, LORIA, UMR 7503, Villers-lès-Nancy, France