
Computational Statistics, Volume 16, Issue 3, pp 323–339

Partitioning Algorithms and Combined Model Integration for Data Mining

  • Claudio Conversano
  • Francesco Mola
  • Roberta Siciliano

Summary

In this paper a data-driven procedure is introduced that extracts information from large and complex data sets for statistical purposes. The proposed strategy consists of three stages: tree-partitioning, modelling and model fusion. The result is a final composite decision rule for supervised classification and prediction. The main tools are tree production rules and nonlinear regression models from the class of Generalized Additive Multi-Mixture Models. The proposed strategy is benchmarked on a well-known real data set.
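To make the three-stage idea concrete, the following is a minimal illustrative sketch, not the authors' actual algorithm: stage one partitions the data with a single tree-style split at an assumed threshold, stage two fits a trivial stand-in model (the mean response) within each partition, and stage three fuses the per-partition models into one decision rule that routes a new observation to its partition's model. All function names and the one-split partitioner are assumptions made for illustration only.

```python
# Hedged sketch of the three-stage strategy (partition -> model -> fuse).
# The single-threshold "tree" and the mean-response "model" are
# deliberately simplistic stand-ins for the paper's exploratory trees
# and GAM-type models.

def fit_mean_model(ys):
    # Stage 2 stand-in: the per-partition "model" is the mean response.
    return sum(ys) / len(ys)

def fit_partitioned_model(xs, ys, threshold):
    # Stage 1: a one-split tree partitions the observations.
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    # Stage 2: fit one model per partition.
    return {"threshold": threshold,
            "left": fit_mean_model(left),
            "right": fit_mean_model(right)}

def predict(model, x):
    # Stage 3: the fused decision rule routes x to its partition's model.
    return model["left"] if x <= model["threshold"] else model["right"]

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
m = fit_partitioned_model(xs, ys, threshold=5)
print(predict(m, 2), predict(m, 11))
```

In the real procedure each terminal node of an exploratory tree would carry a Generalized Additive Multi-Mixture Model, and fusion would combine their predictions via mixing parameters rather than simple routing.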

Keywords

Exploratory Trees · Generalized Additive Models · Mixing Parameters · Bootstrap Averaging · Backfitting Algorithm · Generalized Additive Multi-Mixture Models · Resampling · Model Fusion


Acknowledgements

The authors wish to thank the referees for their helpful comments and Prof. Jaromir Antoch of Charles University in Prague for carefully reading a previous draft of this paper. This work was supported by MURST funds (prot. 9913182289) and by research funds of the Department of Mathematics and Statistics of the University of Naples.


Copyright information

© Physica-Verlag 2001

Authors and Affiliations

  • Claudio Conversano (1)
  • Francesco Mola (2)
  • Roberta Siciliano (1)
  1. Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Napoli, Italy
  2. Dipartimento di Economia, Università di Cagliari, Cagliari, Italy
