Abstract
Numerous data mining methods have recently been developed, and there is often a need to select the most appropriate method or methods. This selection can be done statically or dynamically. Dynamic selection takes into account the characteristics of a new instance and usually results in higher classification accuracy. We discuss a dynamic integration algorithm for an ensemble of classifiers. Our algorithm is a new variation of the stacked generalization method and is based on the assumption that each base classifier performs best inside certain subareas of the application domain. The algorithm includes two main phases: a learning phase, which collects information about the quality of the classifications made by the base classifiers into a performance matrix, and an application phase, which uses the performance matrix to predict how well each base classifier will classify a new instance. We also present experiments on three machine learning data sets, which show promising results.
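The two phases can be illustrated with a minimal Python sketch. The abstract states only that a performance matrix is collected and then used to predict classification quality for a new instance. Everything else here is an assumption made for illustration: the 0/1 error entries, the distance-weighted nearest-neighbour estimate, the choice of a single best classifier, the toy data, and all function names (`learn_performance_matrix`, `predict_errors`, `classify_dynamic`). The classifiers are assumed to be already trained.

```python
from math import dist


def learn_performance_matrix(classifiers, X, y):
    """Learning phase (assumed form): row i holds the 0/1 error
    of every classifier on the labeled instance X[i]."""
    return [[0 if clf(x) == label else 1 for clf in classifiers]
            for x, label in zip(X, y)]


def predict_errors(x_new, X, perf, k=3):
    """Application phase, step 1 (assumed form): estimate each classifier's
    local error as the distance-weighted mean of its recorded errors on the
    k instances nearest to x_new."""
    nearest = sorted(range(len(X)), key=lambda i: dist(x_new, X[i]))[:k]
    weights = [1.0 / (1.0 + dist(x_new, X[i])) for i in nearest]
    total = sum(weights)
    return [sum(w * perf[i][j] for w, i in zip(weights, nearest)) / total
            for j in range(len(perf[0]))]


def classify_dynamic(x_new, classifiers, X, perf, k=3):
    """Application phase, step 2: apply the classifier with the lowest
    predicted local error (ties go to the first classifier)."""
    errors = predict_errors(x_new, X, perf, k)
    best = min(range(len(classifiers)), key=lambda j: errors[j])
    return classifiers[best](x_new)


# Hypothetical 1-D domain: the true label is 1 on the interval (2, 6), else 0.
X = [(0.0,), (1.0,), (3.0,), (4.0,), (5.0,), (7.0,), (8.0,), (9.0,)]
y = [0, 0, 1, 1, 1, 0, 0, 0]

# Two hypothetical classifiers, each reliable only in part of the domain.
clf_low = lambda x: 1 if x[0] > 2 else 0    # wrong for x >= 6
clf_high = lambda x: 1 if x[0] < 6 else 0   # wrong for x <= 2
classifiers = [clf_low, clf_high]

perf = learn_performance_matrix(classifiers, X, y)
label_left = classify_dynamic((0.5,), classifiers, X, perf)
label_right = classify_dynamic((8.5,), classifiers, X, perf)
```

In this toy setting neither classifier is correct everywhere, but the locally estimated errors route each new instance to the classifier that was reliable on its neighbours, which is the idea behind the "subareas of the application domain" assumption.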
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Puuronen, S., Terziyan, V., Tsymbal, A. (1999). A dynamic integration algorithm for an ensemble of classifiers. In: Raś, Z.W., Skowron, A. (eds) Foundations of Intelligent Systems. ISMIS 1999. Lecture Notes in Computer Science, vol 1609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095148
DOI: https://doi.org/10.1007/BFb0095148
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65965-5
Online ISBN: 978-3-540-48828-6
eBook Packages: Springer Book Archive