Using SPARQL and SPIN for Data Quality Management on the Semantic Web

  • Christian Fürber
  • Martin Hepp
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 47)

Abstract

The quality of data is a key factor in the performance of information systems, in particular with regard to (1) the number of exceptions in the execution of business processes and (2) the quality of decisions based on the output of the respective information system. Recently, the Semantic Web and Linked Data activities have started to provide substantial data resources that may be used for real business operations. Hence, it will soon be critical to manage the quality of such data. Unfortunately, a wide range of data quality problems can be observed in Semantic Web data. In this paper, we (1) evaluate how the state of the art in data quality research fits the characteristics of the Web of Data, (2) describe how the SPARQL query language and the SPARQL Inferencing Notation (SPIN) can be used to identify data quality problems in Semantic Web data automatically, entirely within the Semantic Web technology stack, and (3) evaluate our approach.
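As a rough illustration of what a query-based data quality check can look like, the sketch below flags instances of a class that lack a value for a required property (a "missing value" problem). It is not taken from the paper: the `ex:` names, the sample triples, and the helper `find_missing_property` are hypothetical, and the check is emulated in plain Python over an in-memory list of triples. The SPARQL string shows how the same check might be phrased as a query; it is included for comparison only and is not executed.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"

TRIPLES = [
    ("ex:book1", RDF_TYPE, "ex:Book"),
    ("ex:book1", "ex:isbn", "ISBN-PLACEHOLDER"),
    ("ex:book2", RDF_TYPE, "ex:Book"),      # no ex:isbn: a missing value
    ("ex:author1", RDF_TYPE, "ex:Person"),
]

# One way the same check might be written in SPARQL (assumed vocabulary,
# shown for comparison only; this string is never executed here).
MISSING_ISBN_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?book WHERE {
  ?book a ex:Book .
  OPTIONAL { ?book ex:isbn ?isbn }
  FILTER (!bound(?isbn))
}
"""


def find_missing_property(triples, cls, prop):
    """Return the sorted subjects typed as `cls` that have no value for `prop`."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    with_value = {s for s, p, o in triples if p == prop}
    return sorted(instances - with_value)


if __name__ == "__main__":
    print(find_missing_property(TRIPLES, "ex:Book", "ex:isbn"))
```

In a real setting the query would be run by a SPARQL engine against the RDF data itself, which keeps the quality check inside the Semantic Web technology stack.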

Keywords

Semantic Web, Linked Data, Data Quality Management, SPARQL, SPIN, RDF, Ontologies, Ontology-Based Data Quality Management

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Christian Fürber (1)
  • Martin Hepp (1)
  1. E-Business & Web Science Research Group, Universität der Bundeswehr München, Neubiberg, Germany
