Data Mining and Statistical Control - A Review and Some Links

Frontiers in Statistical Quality Control 8

Summary

Owing to the potential of modern data processing, industrial companies collect large amounts of business and engineering data. Interest in analysing these data has led to a strong demand for data mining techniques and for corresponding software packages. Although data analysis is the common interest of statistics and data mining, the relationship between the two fields has remained unclear, in practice as well as in methodology. The present paper reviews links between data mining and statistical control in two instances: database management, particularly data warehousing, and temporal pattern analysis.


Copyright information

© 2006 Physica-Verlag Heidelberg

About this chapter

Cite this chapter

Göb, R. (2006). Data Mining and Statistical Control - A Review and Some Links. In: Lenz, HJ., Wilrich, PT. (eds) Frontiers in Statistical Quality Control 8. Physica-Verlag HD. https://doi.org/10.1007/3-7908-1687-6_17
