Partitioning Algorithms and Combined Model Integration for Data Mining
In this paper a data-driven procedure is introduced that extracts information from large, complex data sets for statistical purposes. The proposed strategy consists of three stages: tree-partitioning, modelling and model fusion. The result is a final complex decision rule for supervised classification and prediction. The main tools are tree production rules and nonlinear regression models from the class of Generalized Additive Multi-Mixture Models. The proposed strategy is benchmarked on a well-known real data set.
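The three-stage strategy can be sketched schematically. The toy code below is only an illustration of the pipeline shape, not the paper's method: a single-split "stump" stands in for the tree-partitioning stage, a per-partition mean stands in for the Generalized Additive Multi-Mixture model fitted in each node, and bootstrap averaging stands in for the model-fusion stage. All function names are hypothetical.

```python
import random
import statistics

def best_split(xs, ys):
    # Stage 1 (partitioning, toy version): find the threshold on a single
    # predictor that minimises the within-partition sum of squares.
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (sum((y - statistics.mean(left)) ** 2 for y in left)
                 + sum((y - statistics.mean(right)) ** 2 for y in right))
        if score < best_score:
            best_t, best_score = t, score
    return best_t

def fit_partition_model(xs, ys):
    # Stage 2 (modelling, toy version): fit a simple model inside each
    # partition; here, just the partition mean (the paper would fit a
    # nonlinear additive model instead).
    t = best_split(xs, ys)
    if t is None:  # degenerate sample: no split possible
        m = statistics.mean(ys)
        return lambda x: m
    left_mean = statistics.mean([y for x, y in zip(xs, ys) if x <= t])
    right_mean = statistics.mean([y for x, y in zip(xs, ys) if x > t])
    return lambda x: left_mean if x <= t else right_mean

def fused_predictor(xs, ys, n_boot=25, seed=0):
    # Stage 3 (fusion, toy version): refit on bootstrap resamples and
    # average the resulting predictors.
    rng = random.Random(seed)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(fit_partition_model([xs[i] for i in idx],
                                          [ys[i] for i in idx]))
    return lambda x: statistics.mean(m(x) for m in models)
```

On step-shaped data the fused predictor recovers the two regimes: partitioning isolates homogeneous subsets, per-partition models fit them locally, and averaging over resamples stabilises the final decision rule.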
Keywords: Exploratory Trees · Generalized Additive Models · Mixing Parameters · Bootstrap Averaging · Backfitting Algorithm · Generalized Additive Multi-Mixture Models · Resampling · Model Fusion
The authors wish to thank the referees for helpful comments and Prof. Jaromir Antoch of Charles University in Prague for having carefully read a previous draft of the paper. The work in this paper was supported by MURST funds (prot. 9913182289) and by research funds of the Department of Mathematics and Statistics of the University of Naples.