Summary
Owing to the potential of modern data processing, industrial companies collect large amounts of business and engineering data. Interest in analysing these data has led to a strong demand for data mining techniques and for corresponding software packages. Although data analysis is the common interest of statistics and data mining, the relationship between the two fields has remained unclear, in practice as well as in methodology. The present paper reviews links between data mining and statistical control in two instances: database management, particularly data warehousing, and temporal pattern analysis.
Copyright information
© 2006 Physica-Verlag Heidelberg
About this chapter
Cite this chapter
Göb, R. (2006). Data Mining and Statistical Control - A Review and Some Links. In: Lenz, HJ., Wilrich, PT. (eds) Frontiers in Statistical Quality Control 8. Physica-Verlag HD. https://doi.org/10.1007/3-7908-1687-6_17
DOI: https://doi.org/10.1007/3-7908-1687-6_17
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-1686-0
Online ISBN: 978-3-7908-1687-7
eBook Packages: Mathematics and Statistics (R0)