Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Data Quality and Data Cleansing of Semantic Data

  • Amrapali Zaveri
  • Anisa Rula
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_289-1

Synonyms

Definition

Data quality is commonly conceived as a multidimensional construct defined as “fitness for use.” It depends on various factors (also called dimensions or characteristics) such as completeness, consistency, and availability. Data quality assessment involves measuring the quality dimensions or criteria that are relevant to the use case. A metric, or measure, is a procedure for measuring a data quality dimension with the help of a tool. These metrics are heuristics designed to fit a specific assessment situation.
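To make the notion of a metric concrete, the following is a minimal sketch of one possible heuristic for the completeness dimension. It is not one of the metrics defined in this entry: the function name `property_completeness`, the tuple representation of triples, and the sample data are all assumptions chosen for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def property_completeness(triples: Iterable[Triple], required: str) -> float:
    """Return the share of distinct subjects with at least one value for `required`.

    Hypothetical heuristic for the completeness dimension; an empty
    dataset is treated as trivially complete (score 1.0).
    """
    subjects: Set[str] = set()
    covered: Set[str] = set()
    for subject, predicate, _ in triples:
        subjects.add(subject)
        if predicate == required:
            covered.add(subject)
    return len(covered) / len(subjects) if subjects else 1.0


# Hypothetical toy dataset, not taken from any real source.
SAMPLE = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
    ("ex:dave", "foaf:name", "Dave"),
]

if __name__ == "__main__":
    # Three of the four subjects carry a foaf:name value.
    print(property_completeness(SAMPLE, "foaf:name"))
```

Whether such a score indicates adequate quality depends on the use case: a dataset that lacks names for a quarter of its entities may be fit for one application and unusable for another, which is why the metric is a heuristic rather than an absolute measure.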

Overview

In this chapter, we first introduce the concepts of Linked Data quality and its dimensions and metrics. Then we provide definitions for 18 quality dimensions along with a total of 69 metrics to measure the dimensions.

Thereafter, we provide an overview of tools currently available for Linked Data quality assessment followed by an introduction to...

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Institute of Data Science, Maastricht University, Maastricht, The Netherlands
  2. Department of Computer Science, Systems and Communication (DISCo), University of Milano-Bicocca, Milan, Italy

Section editors and affiliations

  • Philippe Cudré-Mauroux (1)
  • Olaf Hartig (2)
  1. eXascale Infolab, University of Fribourg, Fribourg, Switzerland
  2. Linköping University