A Multilingual Approach to Discover Cross-Language Links in Wikipedia

  • Nacéra Bennacer
  • Mia Johnson Vioulès
  • Maximiliano Ariel López
  • Gianluca Quercini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9418)

Abstract

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially available only in English, the popular website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and the understanding of concepts in multiple languages, but have also been used in natural language processing applications, in developments in linked open data, and in the expansion of minor Wikipedia language versions. These applications motivate an automatic, robust, and accurate technique for identifying cross-language links. In this paper, we present a multilingual approach called EurekaCL that automatically identifies missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that the candidate selection and pruning procedures yield an effective set of candidates, which significantly helps in determining the correct article in the target language version.
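
The abstract describes a three-stage pipeline: select candidates, prune them, then rank them. The following is a minimal sketch of that general shape on toy data, not EurekaCL itself. Everything in it is an assumption made for illustration: the function names, the dictionaries standing in for a multilingual concept resource and for the Wikipedia link graph, the pruning rule (discard candidates that already have a cross-language link), and the ranking score (the number of the source's linked neighbours whose known cross-language counterparts are also linked from the candidate). The paper's actual procedures are not given in the abstract.

```python
from typing import Dict, List, Set, Tuple


def select_candidates(source: str,
                      concepts_of: Dict[str, Set[str]],
                      titles_of_concept: Dict[str, Set[str]]) -> Set[str]:
    """Collect target-language titles that share a concept with the source.

    The two dictionaries are hypothetical stand-ins for a multilingual
    lexical resource; they are not a real BabelNet API.
    """
    candidates: Set[str] = set()
    for concept in concepts_of.get(source, set()):
        candidates |= titles_of_concept.get(concept, set())
    return candidates


def prune_and_rank(source: str,
                   candidates: Set[str],
                   src_links: Dict[str, Set[str]],
                   tgt_links: Dict[str, Set[str]],
                   known_cross: Dict[str, str]) -> List[Tuple[str, int]]:
    """Prune and rank candidates using the link graph (assumed heuristics)."""
    # Project the source's neighbours into the target language through
    # the cross-language links that are already known.
    projected = {known_cross[n] for n in src_links.get(source, set())
                 if n in known_cross}
    already_linked = set(known_cross.values())

    scored = []
    for cand in candidates:
        # Assumed pruning rule: a target article that already has a
        # counterpart in the source language is not a missing link.
        if cand in already_linked:
            continue
        overlap = len(projected & tgt_links.get(cand, set()))
        scored.append((cand, overlap))

    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy data, entirely made up for the example.
concepts_of = {"Jaguar": {"jaguar"}}
titles_of_concept = {
    "jaguar": {"Jaguar (félin)", "Jaguar (char)", "Jaguar (entreprise)"},
}
src_links = {"Jaguar": {"Felidae", "Amazon rainforest", "Predation"}}
tgt_links = {
    "Jaguar (félin)": {"Felidae (fr)", "Forêt amazonienne"},
    "Jaguar (char)": {"Armée"},
    "Jaguar (entreprise)": {"Automobile"},
}
known_cross = {
    "Felidae": "Felidae (fr)",
    "Amazon rainforest": "Forêt amazonienne",
    "Predation": "Prédation",
    "Jaguar Cars": "Jaguar (entreprise)",
}

candidates = select_candidates("Jaguar", concepts_of, titles_of_concept)
ranked = prune_and_rank("Jaguar", candidates, src_links, tgt_links, known_cross)
print(ranked)
```

On this toy input, the candidate that already has a source-language counterpart is pruned, and the remaining candidates are ordered by how many projected neighbours they share with the source. A real system would need a far richer candidate source and scoring function than this neighbour-overlap count.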

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nacéra Bennacer (1)
  • Mia Johnson Vioulès (1)
  • Maximiliano Ariel López (1)
  • Gianluca Quercini (1)

  1. Laboratoire de Recherche en Informatique (LRI), CentraleSupélec, University of Paris-Saclay, Orsay Cedex, France