Using Semantic Web Resources for Data Quality Management

  • Christian Fürber
  • Martin Hepp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6317)

Abstract

The quality of data is a critical factor for all kinds of decision-making and transaction processing. While there has been a lot of research on data quality in the past two decades, the topic has not yet received sufficient attention from the Semantic Web community. In this paper, we discuss (1) the data quality issues related to the growing amount of data available on the Semantic Web, (2) how data quality problems can be handled within the Semantic Web technology framework, namely using SPARQL on RDF representations, and (3) how Semantic Web reference data, e.g. from DBpedia, can be used to spot incorrect literal values and functional dependency violations. We show how this approach can be used for data quality management of public Semantic Web data and data stored in relational databases in closed settings alike. As part of our work, we developed generic SPARQL queries to identify (1) missing datatype properties or literal values, (2) illegal values, and (3) functional dependency violations. We argue that using Semantic Web datasets reduces the effort for data quality management substantially. As a use case, we employ Geonames, a publicly available Semantic Web resource for geographical data, as a trusted reference for managing the quality of other data sources.
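The three checks named in the abstract can be illustrated with a minimal sketch. The paper's own queries are written in SPARQL and are not reproduced on this page, so the following is plain Python over a toy list of triples rather than the authors' implementation. The triples, the property names `city` and `country`, the lookup table `REFERENCE_CITY_COUNTRY`, and all function names are hypothetical and chosen only for illustration; the lookup table stands in for a trusted reference source such as Geonames.

```python
# Hypothetical toy data: (subject, property, value) triples.
# None of these names or values come from the paper.
TRIPLES = [
    ("loc1", "city", "Munich"), ("loc1", "country", "Germany"),
    ("loc2", "city", "Munich"), ("loc2", "country", "France"),
    ("loc3", "city", "Munchen"), ("loc3", "country", "Germany"),
    ("loc4", "country", "Austria"),
    ("loc5", "city", ""), ("loc5", "country", "Germany"),
]

# Hypothetical trusted reference data: each city determines one country.
REFERENCE_CITY_COUNTRY = {"Munich": "Germany", "Vienna": "Austria"}


def values_by_subject(triples, prop):
    """Map each subject to its value for `prop` (the last value wins)."""
    return {s: v for s, p, v in triples if p == prop}


def missing_values(triples, prop):
    """Subjects that lack `prop` or carry an empty literal for it."""
    subjects = {s for s, _, _ in triples}
    present = values_by_subject(triples, prop)
    return sorted(s for s in subjects if not present.get(s))


def illegal_values(triples, prop, legal):
    """Subjects whose non-empty `prop` value is absent from the legal set."""
    present = values_by_subject(triples, prop)
    return sorted(s for s, v in present.items() if v and v not in legal)


def fd_violations(triples, determinant, dependent, reference):
    """Subjects whose (determinant, dependent) pair contradicts the reference."""
    det = values_by_subject(triples, determinant)
    dep = values_by_subject(triples, dependent)
    return sorted(
        s for s, d in det.items()
        if d in reference and s in dep and dep[s] != reference[d]
    )


if __name__ == "__main__":
    print(missing_values(TRIPLES, "city"))
    print(illegal_values(TRIPLES, "city", set(REFERENCE_CITY_COUNTRY)))
    print(fd_violations(TRIPLES, "city", "country", REFERENCE_CITY_COUNTRY))
```

In this sketch the reference table does double duty: its keys serve as the set of legal city names, and its key-to-value mapping encodes the dependency "city determines country". A record with an unknown city is reported only as an illegal value, not as a dependency violation, since the reference has nothing to compare it against.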

Keywords

Semantic Web, Ontologies, Data Quality Management, Ontology-Based Data Quality Management, Metadata Management, SPARQL, Linked Data, Geonames, Trust

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Christian Fürber (1)
  • Martin Hepp (1)
  1. E-Business & Web Science Research Group, Neubiberg, Germany
