A Multilingual Approach to Discover Cross-Language Links in Wikipedia

Bennacer, Nacéra; Johnson Vioulès, Mia; López, Maximiliano Ariel; Quercini, Gianluca

doi:10.1007/978-3-319-26190-4_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9418)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially available only in English, it has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and the understanding of concepts across languages, but have also been used in natural language processing applications, in the development of linked open data, and in the expansion of smaller Wikipedia language versions. These applications motivate the need for an automatic, robust, and accurate technique to identify cross-language links. In this paper, we present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that our candidate selection and pruning procedures allow an effective selection of candidates, which significantly helps in determining the correct article in the target language version.
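The abstract describes the pipeline only at a high level: candidate selection through BabelNet 2.0, followed by pruning and ranking on the Wikipedia link graph. The Python sketch below shows one way a graph-based prune-and-rank step could work. It is not EurekaCL's actual method. It assumes that each article is represented by the set of titles it links to, that already-known cross-language links are available as a dictionary (`cross_links`), and that candidate selection has already produced a dictionary of target-language candidates. The scoring rule (Jaccard overlap of translated links), the pruning rule (drop candidates with no shared neighbour), and all function and variable names are hypothetical.

```python
# Hypothetical sketch: Jaccard-based pruning and ranking of candidate articles.
# This is an illustrative stand-in, NOT the EurekaCL algorithm from the paper.
from typing import Dict, List, Set, Tuple


def translate_links(links: Set[str], cross_links: Dict[str, str]) -> Set[str]:
    """Map source-language link targets into the target language.

    Only links whose target already has a known cross-language link
    (hypothetical `cross_links` mapping) can be translated; the rest are dropped.
    """
    return {cross_links[title] for title in links if title in cross_links}


def prune_and_rank(
    source_links: Set[str],
    candidates: Dict[str, Set[str]],
    cross_links: Dict[str, str],
) -> List[Tuple[str, float]]:
    """Prune candidates with no shared neighbour, then rank the rest by Jaccard overlap."""
    translated = translate_links(source_links, cross_links)
    scored: List[Tuple[str, float]] = []
    for title, cand_links in candidates.items():
        shared = translated & cand_links
        if not shared:
            # Prune: the candidate shares no (translated) neighbour with the source.
            continue
        score = len(shared) / len(translated | cand_links)
        scored.append((title, score))
    # Highest score first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy data (invented for illustration): links of the English article "France"
# and three Italian candidates, standing in for the output of a BabelNet lookup.
source_links = {"Paris", "Seine", "Louvre"}
cross_links = {"Paris": "Parigi", "Seine": "Senna", "Louvre": "Museo del Louvre"}
candidates = {
    "Francia": {"Parigi", "Senna", "Museo del Louvre", "Europa"},
    "Parigi": {"Senna", "Francia"},
    "Texas": {"Stati Uniti d'America"},
}

print(prune_and_rank(source_links, candidates, cross_links))
# → [('Francia', 0.75), ('Parigi', 0.25)]
```

In this toy run, "Texas" is pruned because none of its links match a translated link of the source. "Francia" outranks "Parigi" because it shares three of the four distinct linked titles in the union, compared with one of four for "Parigi".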

Author information

Authors and Affiliations

Laboratoire de Recherche en Informatique (LRI), CentraleSupélec, University of Paris-Saclay, 91405, Orsay Cedex, France
Nacéra Bennacer, Mia Johnson Vioulès, Maximiliano Ariel López & Gianluca Quercini

Corresponding author

Correspondence to Gianluca Quercini.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jianyong Wang
Poznan University of Economics, Poznan, Poland
Wojciech Cellary
Florida Atlantic University, Boca Raton, Florida, USA
Dingding Wang
Victoria University, Melbourne, Australia
Hua Wang
School of Computing & Information, Florida International University, Miami, Florida, USA
Shu-Ching Chen
Florida International University, Miami, Florida, USA
Tao Li
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Bennacer, N., Johnson Vioulès, M., López, M.A., Quercini, G. (2015). A Multilingual Approach to Discover Cross-Language Links in Wikipedia. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9418. Springer, Cham. https://doi.org/10.1007/978-3-319-26190-4_36

DOI: https://doi.org/10.1007/978-3-319-26190-4_36
Published: 25 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26189-8
Online ISBN: 978-3-319-26190-4
eBook Packages: Computer Science, Computer Science (R0)