Abstract
The quality of data is a critical factor for all kinds of decision-making and transaction processing. While data quality has been researched extensively over the past two decades, the topic has not yet received sufficient attention from the Semantic Web community. In this paper, we discuss (1) the data quality issues related to the growing amount of data available on the Semantic Web, (2) how data quality problems can be handled within the Semantic Web technology framework, namely using SPARQL on RDF representations, and (3) how Semantic Web reference data, e.g. from DBpedia, can be used to spot incorrect literal values and functional dependency violations. We show how this approach can be used for data quality management both of public Semantic Web data and of data stored in relational databases in closed settings. As part of our work, we developed generic SPARQL queries to identify (1) missing datatype properties or literal values, (2) illegal values, and (3) functional dependency violations. We argue that using Semantic Web datasets substantially reduces the effort for data quality management. As a use case, we employ GeoNames, a publicly available Semantic Web resource for geographical data, as a trusted reference for managing the quality of other data sources.
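The three problem classes named in the abstract can be illustrated with a minimal sketch in plain Python over (subject, property, value) triples. This is not the authors' SPARQL; the sample data, the `ex:` property names, the reference set, and all function names are hypothetical, and each property is assumed to be single-valued per subject.

```python
from collections import defaultdict

# Hypothetical sample data as (subject, property, value) triples.
TRIPLES = [
    ("ex:loc1", "ex:city", "Munich"),
    ("ex:loc1", "ex:country", "DE"),
    ("ex:loc2", "ex:city", "Munich"),
    ("ex:loc2", "ex:country", "AT"),      # same city as loc1, different country
    ("ex:loc3", "ex:city", "Atlantis"),   # not present in the reference data
    ("ex:loc3", "ex:country", "GR"),
    ("ex:loc4", "ex:country", "FR"),      # city property is missing
    ("ex:loc5", "ex:city", ""),           # city literal is empty
    ("ex:loc5", "ex:country", "FR"),
]

# Hypothetical stand-in for values taken from a trusted reference source.
REFERENCE_CITIES = {"Munich", "Vienna", "Paris"}


def index_by_subject(triples):
    """Group triples into one property dict per subject (single-valued assumption)."""
    records = defaultdict(dict)
    for subject, prop, value in triples:
        records[subject][prop] = value
    return records


def missing_values(records, prop):
    """Subjects that lack the property or carry an empty literal for it."""
    return sorted(s for s, props in records.items()
                  if not props.get(prop, "").strip())


def illegal_values(records, prop, legal):
    """(subject, value) pairs whose non-empty value is absent from the reference set."""
    return sorted((s, props[prop]) for s, props in records.items()
                  if props.get(prop, "").strip() and props[prop] not in legal)


def fd_violations(records, determinant, dependent):
    """Determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for props in records.values():
        if props.get(determinant) and props.get(dependent):
            seen[props[determinant]].add(props[dependent])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}
```

In a SPARQL setting, the first check is commonly written with an `OPTIONAL` pattern followed by `FILTER(!bound(?value))`; the sketch only mirrors the logic of such queries and does not reproduce them.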
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fürber, C., Hepp, M. (2010). Using Semantic Web Resources for Data Quality Management. In: Cimiano, P., Pinto, H.S. (eds) Knowledge Engineering and Management by the Masses. EKAW 2010. Lecture Notes in Computer Science, vol 6317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16438-5_15
DOI: https://doi.org/10.1007/978-3-642-16438-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16437-8
Online ISBN: 978-3-642-16438-5
eBook Packages: Computer Science (R0)