Using Semantic Web Resources for Data Quality Management

  • Conference paper
Knowledge Engineering and Management by the Masses (EKAW 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6317)

Abstract

The quality of data is a critical factor for all kinds of decision-making and transaction processing. While there has been a lot of research on data quality in the past two decades, the topic has not yet received sufficient attention from the Semantic Web community. In this paper, we discuss (1) the data quality issues related to the growing amount of data available on the Semantic Web, (2) how data quality problems can be handled within the Semantic Web technology framework, namely using SPARQL on RDF representations, and (3) how Semantic Web reference data, e.g. from DBpedia, can be used to spot incorrect literal values and functional dependency violations. We show how this approach can be used for data quality management of public Semantic Web data and data stored in relational databases in closed settings alike. As part of our work, we developed generic SPARQL queries to identify (1) missing datatype properties or literal values, (2) illegal values, and (3) functional dependency violations. We argue that using Semantic Web datasets reduces the effort for data quality management substantially. As a use case, we employ Geonames, a publicly available Semantic Web resource for geographical data, as a trusted reference for managing the quality of other data sources.
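
The three kinds of checks named in the abstract can be illustrated outside SPARQL. The following is a minimal sketch in plain Python, not the authors' queries: it treats the data under test as (subject, property, value) triples and uses a small hand-made lookup table as a stand-in for a trusted reference source such as Geonames. All names (`TRIPLES`, `REFERENCE_CITY_COUNTRY`, `missing_values`, `illegal_values`, `fd_violations`) and all sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical data under test, as (subject, property, value) triples.
TRIPLES = [
    ("addr1", "city", "Munich"),
    ("addr1", "country", "Germany"),
    ("addr2", "city", "Munich"),
    ("addr2", "country", "Austria"),   # contradicts the reference data
    ("addr3", "city", "Atlantis"),     # city unknown to the reference data
    ("addr3", "country", "Germany"),
    ("addr4", "city", ""),             # empty literal
    ("addr5", "country", "France"),    # no "city" property at all
]

# Hypothetical trusted reference data: city -> country.
REFERENCE_CITY_COUNTRY = {"Munich": "Germany", "Paris": "France"}


def to_records(triples):
    """Group triples into one {property: value} dict per subject."""
    records = defaultdict(dict)
    for subject, prop, value in triples:
        records[subject][prop] = value
    return dict(records)


def missing_values(records, prop):
    """Subjects where `prop` is absent or holds an empty literal."""
    return sorted(s for s, r in records.items() if not r.get(prop, "").strip())


def illegal_values(records, prop, legal):
    """Subjects whose non-empty `prop` value is not among the legal values."""
    return sorted(
        s for s, r in records.items()
        if r.get(prop, "").strip() and r[prop] not in legal
    )


def fd_violations(records, determinant, dependent, reference):
    """Subjects whose (determinant, dependent) pair contradicts the reference."""
    return sorted(
        s for s, r in records.items()
        if r.get(determinant) in reference
        and dependent in r
        and r[dependent] != reference[r[determinant]]
    )


records = to_records(TRIPLES)
print(missing_values(records, "city"))
print(illegal_values(records, "city", set(REFERENCE_CITY_COUNTRY)))
print(fd_violations(records, "city", "country", REFERENCE_CITY_COUNTRY))
```

In a Semantic Web setting, the same three checks would be written as SPARQL queries over RDF, with the lookup table replaced by a reference dataset; the sketch only shows the logic each check has to implement.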

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fürber, C., Hepp, M. (2010). Using Semantic Web Resources for Data Quality Management. In: Cimiano, P., Pinto, H.S. (eds) Knowledge Engineering and Management by the Masses. EKAW 2010. Lecture Notes in Computer Science, vol 6317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16438-5_15

  • DOI: https://doi.org/10.1007/978-3-642-16438-5_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16437-8

  • Online ISBN: 978-3-642-16438-5

  • eBook Packages: Computer Science, Computer Science (R0)