An Analysis of Links in Wikidata

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13261)

Abstract

Wikidata has become one of the most prominent open knowledge graphs (KGs) on the Web. Relying on a community of users with different expertise, this cross-domain KG is directly related to other data sources. This paper investigates how Wikidata is linked to other data sources in the Linked Data ecosystem. To this end, we adapt previous definitions of ontology links and instance links to the terminological part of the Wikidata vocabulary and perform an analysis of the links in Wikidata to external datasets and ontologies from the Linked Data ecosystem. As a side effect, this reveals insights on the ontological expressiveness of meta-properties used in Wikidata. The results of this analysis show that while Wikidata defines a large number of individuals, classes and properties within its own namespace, they are not (yet) extensively linked. We discuss reasons for this and conclude with some suggestions to increase the interconnectedness of Wikidata with other KGs.

Notes

  1. cf. https://www.wikidata.org/wiki/Special:Statistics.

  2. https://www.wikidata.org/wiki/Wikidata:Bots.

  3. https://www.rdfhdt.org/datasets/.

  4. For instance, the pattern {[] wdt:P279 ?X; wdt:P31 ?X.} indicates ambiguous use of "subclass of" vs. "instance of" on 2131 entities (query run on 9 Dec 2021 at https://w.wiki/4XQw).

  5. See https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Top-level_ontology_list for the top two layers of the ontology.

  6. That is, there are 37 uses of P2445 in total in Wikidata as of August 2021.

  7. There are higher orders beyond the second-order class, i.e., third-, fourth- and fifth-order classes, each of which is an instance of the next higher-order class, and all of which are subclasses of the fixed-order class (Q23959932).

  8. cf. https://www.w3.org/TR/owl2-new-features/#Simple_metamodeling_capabilities.

  9. cf. https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_property_for_properties.

  10. We note here again that subtle semantic differences, such as the constraining (i.e., CWA) vs. implicit (i.e., OWA) semantics of certain properties, are not relevant for the purpose of our link analysis.

  11. Prefixes are used as follows: wd: <http://www.wikidata.org/entity/>, wdt: <http://www.wikidata.org/prop/direct/>, pq: <http://www.wikidata.org/prop/qualifier/>, p: <http://www.wikidata.org/prop/>, ps: <http://www.wikidata.org/prop/statement/>.

  12. https://www.wikidata.org/wiki/Wikidata:Property_proposal/disjoint_with.

  13. All code implemented in Python is available at: https://github.com/arminhaller/LinksInLOD.

  14. https://www.wikidata.org/wiki/Wikidata:List_of_properties.

  15. No longitudinal data is published on the Wikidata site, but the growth in the number of properties between July and November 2021 was 3.4%.

  16. Some individuals might use more than one exact match relation.
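
The prefix table in note 11 and the triple pattern in note 4 can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the repository named in note 13); it only assumes the prefixes exactly as listed and evaluates the pattern over a small in-memory list of triples rather than the live query service. The function names `expand` and `ambiguous_pairs` and the `EXAMPLE_*` identifiers are hypothetical and do not refer to real Wikidata entities.

```python
from collections import defaultdict

# Prefix table copied from note 11.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
}


def expand(curie):
    """Expand a prefixed name such as 'wdt:P31' into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"unknown or malformed prefixed name: {curie!r}")
    return PREFIXES[prefix] + local


SUBCLASS_OF = expand("wdt:P279")  # "subclass of"
INSTANCE_OF = expand("wdt:P31")   # "instance of"


def ambiguous_pairs(triples):
    """Return sorted (subject, X) pairs matching the pattern of note 4:
    the subject is stated to be both a subclass of X and an instance of X."""
    subclass = defaultdict(set)
    instance = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            subclass[s].add(o)
        elif p == INSTANCE_OF:
            instance[s].add(o)
    return sorted(
        (s, x)
        for s in subclass
        for x in subclass[s] & instance.get(s, set())
    )


# Toy data with placeholder identifiers (hypothetical, not real entities).
A, B, C, D = (expand(f"wd:EXAMPLE_{n}") for n in "ABCD")
toy_triples = [
    (A, SUBCLASS_OF, B),
    (A, INSTANCE_OF, B),  # A matches: same object for both properties
    (C, INSTANCE_OF, B),
    (C, SUBCLASS_OF, D),  # C does not match: the objects differ
]
print(ambiguous_pairs(toy_triples))
```

On the toy data only the pair for `EXAMPLE_A` and `EXAMPLE_B` is reported, because the two statements must share the same object, which is what the repeated variable ?X in the pattern requires.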

Acknowledgment

This research has received funding from the Teaming.AI project, which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No 957402.

Corresponding author

Correspondence to Armin Haller.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Haller, A., Polleres, A., Dobriy, D., Ferranti, N., Rodríguez Méndez, S.J. (2022). An Analysis of Links in Wikidata. In: , et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_2

  • DOI: https://doi.org/10.1007/978-3-031-06981-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06980-2

  • Online ISBN: 978-3-031-06981-9

  • eBook Packages: Computer Science, Computer Science (R0)