A dynamic integration algorithm for an ensemble of classifiers

  • Seppo Puuronen
  • Vagan Terziyan
  • Alexey Tsymbal
Communications 8A Learning and Knowledge Discovery
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1609)


Numerous data mining methods have recently been developed, and there is often a need to select the most appropriate method or methods for a given task. The selection can be done statically or dynamically. Dynamic selection takes the characteristics of each new instance into account and usually results in higher classification accuracy. We discuss a dynamic integration algorithm for an ensemble of classifiers. The algorithm is a new variation of the stacked generalization method and rests on the assumption that each basic classifier performs best inside certain subareas of the application domain. It has two main phases: a learning phase, which collects information about the quality of the classifications made by the basic classifiers into a performance matrix, and an application phase, which uses the performance matrix to predict how good each basic classifier's classification of a new instance will be. We also present experiments on three machine learning data sets, which show promising results.
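
The two phases can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details, so the following are all assumptions made for illustration: the performance matrix stores 0/1 cross-validation errors per training instance, the application phase predicts each classifier's error as a distance-weighted average over the k nearest training instances, and the classifier with the lowest predicted error is selected. The names `mean_learner`, `build_performance_matrix`, and `select_classifier` are hypothetical.

```python
import math


def mean_learner(feature):
    """Hypothetical basic learner: nearest class mean on a single feature."""
    def learn(X_tr, y_tr):
        means = {}
        for label in set(y_tr):
            vals = [x[feature] for x, t in zip(X_tr, y_tr) if t == label]
            means[label] = sum(vals) / len(vals)
        # The trained classifier returns the label whose mean is closest.
        return lambda x: min(means, key=lambda c: abs(x[feature] - means[c]))
    return learn


def build_performance_matrix(X, y, learners, n_folds=2):
    """Learning phase (assumed form): perf[i][j] is 1 if basic classifier j
    misclassified training instance i when i was held out, else 0."""
    n = len(X)
    perf = [[0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, learn in enumerate(learners):
            predict = learn(X_tr, y_tr)
            for i in test_idx:
                perf[i][j] = int(predict(X[i]) != y[i])
    return perf


def select_classifier(x, X, perf, k=3):
    """Application phase (assumed form): predict each classifier's error at x
    from the recorded errors of the k nearest training instances, then
    return the index of the classifier with the lowest predicted error."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    predicted = []
    for j in range(len(perf[0])):
        num = den = 0.0
        for i in nearest:
            w = 1.0 / (1.0 + math.dist(x, X[i]))  # closer neighbours count more
            num += w * perf[i][j]
            den += w
        predicted.append(num / den)
    best = min(range(len(predicted)), key=predicted.__getitem__)
    return best, predicted
```

With a performance matrix in which one classifier is error-free near one cluster of training instances and another is error-free near a second cluster, `select_classifier` returns a different classifier index depending on which cluster the new instance falls in. This is the "best inside certain subareas" assumption in miniature.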



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Seppo Puuronen (1)
  • Vagan Terziyan (2)
  • Alexey Tsymbal (1)
  1. University of Jyväskylä, Jyväskylä, Finland
  2. Kharkov State Technical University of Radioelectronics, Kharkov, Ukraine
