
Statistical Quality Control of Warehouse Data

  • Chapter
Databases and Information Systems

Abstract

The increasing popularity of data warehouse systems reflects the growing need to make strategic use of data integrated from heterogeneous sources. While schema integration has been discussed extensively for many years, data integration was neglected until recently. Data integration often reveals data quality deficiencies, e.g. inconsistency, redundancy, and incompleteness. So far, there are hardly any mature methods for data quality control. In this paper, we propose an adaptation of statistical process control (SPC), a technique well established in manufacturing for several decades, to the data quality field. After reviewing basic concepts of SPC, we introduce an SPC-oriented algorithm for data quality control. By means of several scenarios, we demonstrate the applicability of our approach. Finally, we integrate our concepts into a system architecture for data quality management.
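
The chapter's own algorithm is not shown in this preview. As a rough illustration of the general idea of applying SPC to data quality, the following is a minimal sketch of a standard Shewhart p-chart over warehouse load batches. It assumes each load has the same number of records and that a record counts as "defective" when it fails some quality rule; the function names, the batch size, and all counts are hypothetical and not taken from the chapter.

```python
import math


def p_chart_limits(defect_counts, batch_size):
    """Return (center, lcl, ucl) of a p-chart for batches of constant size.

    defect_counts: defective-record counts from in-control baseline loads.
    batch_size: number of records per load (assumed constant here).
    """
    if batch_size <= 0 or not defect_counts:
        raise ValueError("need a positive batch size and at least one batch")
    # Center line: overall fraction of defective records in the baseline.
    p_bar = sum(defect_counts) / (len(defect_counts) * batch_size)
    # Standard deviation of a sample proportion under a binomial model.
    sigma = math.sqrt(p_bar * (1 - p_bar) / batch_size)
    # Conventional 3-sigma limits, clipped to the valid range [0, 1].
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl


def out_of_control(defect_counts, batch_size, limits):
    """Return indices of loads whose defect fraction lies outside the limits."""
    _, lcl, ucl = limits
    return [
        i
        for i, count in enumerate(defect_counts)
        if not lcl <= count / batch_size <= ucl
    ]


# Hypothetical baseline: defective records per 1000-record load.
baseline = [12, 9, 11, 10, 8, 13, 10, 11]
limits = p_chart_limits(baseline, 1000)

# Hypothetical new loads checked against the baseline limits.
new_loads = [10, 12, 31]
flagged = out_of_control(new_loads, 1000, limits)
print(limits, flagged)
```

In this sketch, a flagged load would only signal that the defect rate has shifted beyond what the baseline variation explains; deciding what counts as a defect, and how to react, is outside its scope.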




Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Hinrichs, H. (2001). Statistical Quality Control of Warehouse Data. In: Barzdins, J., Caplinskas, A. (eds) Databases and Information Systems. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9636-7_6


  • DOI: https://doi.org/10.1007/978-94-015-9636-7_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5657-3

  • Online ISBN: 978-94-015-9636-7

  • eBook Packages: Springer Book Archive
