Skip to main content

A Multilingual Approach to Discover Cross-Language Links in Wikipedia

  • Conference paper
  • First Online:
Web Information Systems Engineering – WISE 2015 (WISE 2015)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9418))

Included in the following conference series:

Abstract

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially in English, the popular website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and understanding of concepts in multiple languages, but have been used in natural language processing applications, developments in linked open data, and expansion of minor Wikipedia language versions. These applications are the motivation for an automatic, robust, and accurate technique to identify cross-language links. In this paper, we present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source) EurekaCL uses the multilingual and semantic features of BabelNet 2.0 in order to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation carried out on 42,000 pairs of articles in eight language versions of Wikipedia shows that our candidate selection and pruning procedures allow an effective selection of candidates which significantly helps the determination of the correct article in the target language version.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://dumps.wikimedia.org/backup-index.html.

  2. 2.

    https://github.com/gquercini/graphipedia.

References

  1. Adafre, S.F., de Rijke, M.: Finding similar sentences across multiple languages in wikipedia. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pp. 62–69 (2006)

    Google Scholar 

  2. Palmero Aprosio, A., Giuliano, C., Lavelli, A.: Automatic expansion of DBpedia exploiting wikipedia cross-language information. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 397–411. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  3. de Melo, G., Weikum, G.: Menta: inducing multilingual taxonomies from wikipedia. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM 2010, pp. 1099–1108. ACM (2010)

    Google Scholar 

  4. de Melo, G., Weikum, G.: Untangling the cross-lingual link structure of wikipedia. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, Uppsala, Sweden, 11–16 July 2010, pp. 844–853. Association for Computational Linguistics (2010)

    Google Scholar 

  5. Moreira, C.E.M., Moreira, V.P.: Finding missing cross-language links in wikipedia. JIDM J. Inform. Data Manage. 4(3), 251–265 (2013)

    Google Scholar 

  6. Navigli, R.: Babelnet and friends: a manifesto for multilingual semantic processing. Intelligenza Artificiale 7(2), 165–181 (2013)

    Google Scholar 

  7. Penta, A., Quercini, G., Reynaud, C., Shadbolt, N.: Discovering cross-language links in wikipedia through semantic relatedness. In: ECAI 2012–20th European Conference on Artificial Intelligence, pp. 642–647 (2012)

    Google Scholar 

  8. Sorg, P., Cimiano, P.: Enriching the crosslingual link structure of wikipedia -a classification-based approach. In: Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence (WikiAI 2008) (2008, to appear)

    Google Scholar 

  9. Sorg, P., Cimiano, P.: Exploiting wikipedia for cross-lingual and multilingual information retrieval. Data Knowl. Eng. 74, 26–45 (2012)

    Article  Google Scholar 

  10. Tsunakawa, T., Araya, M., Kaji, H.: Enriching wikipedia’s intra-language links by their cross-language transfer. In: Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014, pp. 1260–1268 (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gianluca Quercini .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bennacer, N., Johnson Vioulès, M., López, M.A., Quercini, G. (2015). A Multilingual Approach to Discover Cross-Language Links in Wikipedia. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science(), vol 9418. Springer, Cham. https://doi.org/10.1007/978-3-319-26190-4_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26190-4_36

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26189-8

  • Online ISBN: 978-3-319-26190-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics