Partitioning Algorithms and Combined Model Integration for Data Mining
In this paper a data-driven procedure is introduced that extracts information from large, complex data sets for statistical purposes. The proposed strategy consists of three stages: tree-partitioning, modelling and model fusion. The result is a final complex decision rule for supervised classification and prediction. The main tools are tree production rules and nonlinear regression models from the class of Generalized Additive Multi-Mixture Models. The proposed strategy is benchmarked on a well-known real data set.
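The three-stage strategy can be sketched schematically. The toy code below is only an illustration of the pipeline shape, not the paper's method: a single-split "stump" stands in for the tree-partitioning stage, a per-partition mean stands in for the Generalized Additive Multi-Mixture model fitted in each node, and bootstrap averaging stands in for the model-fusion stage. All function names are hypothetical.

```python
import random
import statistics

def best_split(xs, ys):
    # Stage 1 (partitioning, toy version): find the threshold on a single
    # predictor that minimises the within-partition sum of squares.
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (sum((y - statistics.mean(left)) ** 2 for y in left)
                 + sum((y - statistics.mean(right)) ** 2 for y in right))
        if score < best_score:
            best_t, best_score = t, score
    return best_t

def fit_partition_model(xs, ys):
    # Stage 2 (modelling, toy version): fit a simple model inside each
    # partition; here, just the partition mean (the paper would fit a
    # nonlinear additive model instead).
    t = best_split(xs, ys)
    if t is None:  # degenerate sample: no split possible
        m = statistics.mean(ys)
        return lambda x: m
    left_mean = statistics.mean([y for x, y in zip(xs, ys) if x <= t])
    right_mean = statistics.mean([y for x, y in zip(xs, ys) if x > t])
    return lambda x: left_mean if x <= t else right_mean

def fused_predictor(xs, ys, n_boot=25, seed=0):
    # Stage 3 (fusion, toy version): refit on bootstrap resamples and
    # average the resulting predictors.
    rng = random.Random(seed)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(fit_partition_model([xs[i] for i in idx],
                                          [ys[i] for i in idx]))
    return lambda x: statistics.mean(m(x) for m in models)
```

On step-shaped data the fused predictor recovers the two regimes: partitioning isolates homogeneous subsets, per-partition models fit them locally, and averaging over resamples stabilises the final decision rule.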
Keywords: Exploratory Trees · Generalized Additive Models · Mixing Parameters · Bootstrap Averaging · Backfitting Algorithm · Generalized Additive Multi-Mixture Models · Resampling · Model Fusion
The authors wish to thank the referees for helpful comments and Prof. Jaromir Antoch of Charles University in Prague for having carefully read a previous draft of the paper. The work in this paper was supported by MURST funds (prot. 9913182289) and by research funds of the Department of Mathematics and Statistics of the University of Naples.