Data Quality and Data Cleansing of Semantic Data

  • Living reference work entry
Encyclopedia of Big Data Technologies

Synonyms

Data cleaning; Data curation; Data quality; Data validation; Linked data; Quality assessment

Definition

Data quality is commonly conceived as a multidimensional construct defined as “fitness for use.” Data quality may depend on various factors (dimensions or characteristics) such as completeness, consistency, availability, etc. Data quality assessment involves the measurement of quality dimensions or criteria that are relevant to the use case. A metric or measure is a procedure for measuring a data quality dimension with the help of a tool. These metrics are heuristics that are designed to fit a specific assessment situation.
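As an illustration of what such a metric can look like, the following is a minimal sketch of one possible completeness heuristic: the fraction of distinct subjects that carry at least one value for a given property. The function name `property_completeness`, the tuple representation of triples, and the toy data are assumptions made for this example; they are not taken from the entry or from any particular assessment tool.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def property_completeness(triples: Iterable[Triple], required_predicate: str) -> float:
    """Return the fraction of distinct subjects that have at least one
    value for required_predicate (a hypothetical completeness heuristic)."""
    subjects: Set[str] = set()
    covered: Set[str] = set()
    for subject, predicate, _obj in triples:
        subjects.add(subject)
        if predicate == required_predicate:
            covered.add(subject)
    if not subjects:
        # Assumption: an empty dataset is treated as vacuously complete.
        return 1.0
    return len(covered) / len(subjects)


# Hypothetical toy dataset; the identifiers are illustrative only.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
]

if __name__ == "__main__":
    score = property_completeness(TRIPLES, "foaf:name")
    print(f"foaf:name completeness: {score:.2f}")
```

Whether a given score is acceptable depends on the use case, which is the sense in which such metrics are heuristics fitted to a specific assessment situation.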

Overview

In this chapter, we first introduce the concepts of Linked Data quality and its dimensions and metrics. Then we provide definitions for 18 quality dimensions along with a total of 69 metrics to measure the dimensions.

Thereafter, we provide an overview of tools currently available for Linked Data quality assessment followed by an introduction to...



Author information


Correspondence to Amrapali Zaveri or Anisa Rula.


Copyright information

© 2018 Springer International Publishing AG

About this entry


Cite this entry

Zaveri, A., Rula, A. (2018). Data Quality and Data Cleansing of Semantic Data. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_289-1


  • DOI: https://doi.org/10.1007/978-3-319-63962-8_289-1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
