Abstract
The increasing popularity of data warehouse systems reflects the growing need to make strategic use of data integrated from heterogeneous sources. While schema integration has been discussed extensively for many years, data integration was neglected until recently. Data integration often reveals data quality deficiencies, e.g. inconsistency, redundancy, and incompleteness. So far, there are hardly any mature methods for data quality control. In this paper, we propose an adaptation of statistical process control (SPC), a technique well established in manufacturing for several decades, to the data quality field. After reviewing basic concepts of SPC, we introduce an SPC-oriented algorithm for data quality control. We demonstrate the applicability of our approach by means of several scenarios. Finally, we integrate our concepts into a system architecture for data quality management.
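To illustrate the general idea of applying SPC to data quality, the following minimal sketch computes a classical Shewhart p-chart over successive warehouse loads. It is not the algorithm proposed in the chapter, which the abstract does not specify. The function names, the notion of a "defective record", the conventional 3-sigma limits, and the sample figures are all assumptions made for illustration.

```python
import math


def p_chart_limits(defect_counts, sample_sizes):
    """Return the center line and per-load (LCL, UCL) pairs of a p-chart.

    defect_counts[i] is the (hypothetical) number of records in load i that
    failed a quality check; sample_sizes[i] is the number of records checked.
    """
    p_bar = sum(defect_counts) / sum(sample_sizes)  # overall defect fraction
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
        # Conventional 3-sigma limits, clipped to the valid range [0, 1].
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits


def out_of_control(defect_counts, sample_sizes):
    """Return the indices of loads whose defect fraction leaves the limits."""
    _, limits = p_chart_limits(defect_counts, sample_sizes)
    flagged = []
    for i, (d, n) in enumerate(zip(defect_counts, sample_sizes)):
        lcl, ucl = limits[i]
        p = d / n
        if p < lcl or p > ucl:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Illustrative figures only: eight loads of 1000 records each.
    defects = [20, 22, 18, 25, 19, 21, 60, 20]
    sizes = [1000] * 8
    print(out_of_control(defects, sizes))
```

In this sketch the limits are estimated from the same loads that are being judged. In practice one would usually derive them from a baseline period known to be stable and then monitor later loads against those fixed limits.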
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Hinrichs, H. (2001). Statistical Quality Control of Warehouse Data. In: Barzdins, J., Caplinskas, A. (eds) Databases and Information Systems. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9636-7_6
DOI: https://doi.org/10.1007/978-94-015-9636-7_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5657-3
Online ISBN: 978-94-015-9636-7
eBook Packages: Springer Book Archive