Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

RDF Dataset Profiling

  • Stefan Dietze
  • Elena Demidova
  • Konstantin Todorov
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_288-1


In the context of this chapter, an RDF dataset is defined in accordance with the dataset definition in the Vocabulary of Interlinked Datasets (VoID, http://vocab.deri.ie/void), namely, “A Dataset is a set of RDF triples that are published, maintained or aggregated by a single provider.” According to VoID, a dataset represents a meaningful collection of triples as envisioned by its provider. An RDF dataset profile is a formal representation of a set of dataset characteristics (features). It describes the dataset and supports dataset discovery, recommendation, and comparison with regard to the represented features. A dataset profile feature is a characteristic describing a certain attribute of the dataset. For instance, “dataset conciseness” is a dataset profile feature that indicates the degree of redundancy of the information contained in the dataset. A dataset profile is extensible with respect to the features it contains. Usually, the relevant feature set is...
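The definitions above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the `DatasetProfile` class, the `add_feature` method, the example URIs, and in particular the formula used for conciseness (the share of distinct triples among all triples) are assumptions chosen for illustration. The chapter only states that conciseness reflects the degree of redundancy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


@dataclass
class DatasetProfile:
    """Hypothetical container: a named, extensible set of profile features."""
    dataset_uri: str
    features: Dict[str, float] = field(default_factory=dict)

    def add_feature(self, name: str, value: float) -> None:
        # The profile is extensible: any new feature can be attached by name.
        self.features[name] = value


def conciseness(triples: List[Triple]) -> float:
    """Assumed measure: distinct triples / all triples (1.0 means no duplicates)."""
    if not triples:
        return 1.0
    return len(set(triples)) / len(triples)


# Toy data with one duplicated statement (all identifiers are made up).
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]

profile = DatasetProfile("http://example.org/dataset")
profile.add_feature("conciseness", conciseness(triples))
profile.add_feature("triple_count", float(len(triples)))
print(profile.features)
```

Keeping the features in a name-to-value mapping mirrors the statement that a profile is extensible: a further feature can be added without changing the structure of the profile itself.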




Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Stefan Dietze (1)
  • Elena Demidova (1)
  • Konstantin Todorov (2)
  1. L3S Research Center, Leibniz Universität Hannover, Hanover, Germany
  2. LIRMM, University of Montpellier, Montpellier, France

Section editors and affiliations

  • Philippe Cudré-Mauroux (1)
  • Olaf Hartig (2)
  1. eXascale Infolab, University of Fribourg, Fribourg, Switzerland
  2. Linköping University