Abstract
We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, the recall combiner, and the naive Bayes combiner. The framework rests on two assumptions: class-conditional independence of the classifier outputs, and an assumption about the individual accuracies. The four combiners are derived successively from one another by progressively relaxing and then eliminating the second assumption; in parallel, the number of trainable parameters increases from one combiner to the next. Simulation studies reveal that if the parameter estimates are accurate and the first assumption is satisfied, the order of preference of the combiners is: naive Bayes, recall, weighted majority, and majority. By inducing label noise, we expose a caveat stemming from the stability-plasticity dilemma. Experimental results with 73 benchmark data sets reveal that there is no definitive best combiner among the four candidates, although naive Bayes earns a slight preference. This combiner was better for problems with a large number of fairly balanced classes, while weighted majority vote was better for problems with a small number of unbalanced classes.
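To make the four rules concrete, the following minimal sketch in Python implements one plausible reading of each combiner, assuming each of the $L$ base classifiers outputs a crisp label in $\{0,\ldots,c-1\}$. The estimator choices and function names are our illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of the four label-fusion rules, in increasing number of
# trainable parameters.  All inputs are assumed/estimated quantities:
#   labels     : length-L sequence of crisp label outputs in {0, ..., c-1}
#   accuracies : length-L array of estimated accuracies p_i
#   recalls    : L x c array, recalls[i, k] ~ P(classifier i correct | class k)
#   confusions : list of L (c x c) row-stochastic matrices,
#                confusions[i][k, s] ~ P(s_i = s | class k)
#   priors     : length-c array of class priors P(omega_k)
import numpy as np

def majority_vote(labels, c):
    # Plurality vote: the most-voted label wins; no trainable parameters.
    return int(np.argmax(np.bincount(labels, minlength=c)))

def weighted_majority_vote(labels, c, accuracies):
    # Each vote carries weight log(p_i / (1 - p_i)), which is optimal under
    # conditional independence when classifier i is described by a single
    # accuracy p_i.
    weights = np.log(np.asarray(accuracies) / (1.0 - np.asarray(accuracies)))
    scores = np.zeros(c)
    for lab, w in zip(labels, weights):
        scores[lab] += w
    return int(np.argmax(scores))

def recall_combiner(labels, recalls, priors):
    # Class-wise accuracies (recalls); errors are assumed to be spread
    # evenly over the remaining c - 1 classes.
    c = len(priors)
    scores = np.log(priors)
    for i, lab in enumerate(labels):
        for k in range(c):
            p = recalls[i, k]
            scores[k] += np.log(p if lab == k else (1.0 - p) / (c - 1))
    return int(np.argmax(scores))

def naive_bayes_combiner(labels, confusions, priors):
    # Full confusion matrices; smoothed (non-zero) estimates are assumed
    # so that all logarithms are finite.
    scores = np.log(priors)
    for i, lab in enumerate(labels):
        scores = scores + np.log(confusions[i][:, lab])
    return int(np.argmax(scores))
```

The parameter count grows accordingly: none for majority vote, $L$ accuracies for weighted majority vote, $L\times c$ class-wise recalls for the recall combiner, and $L$ confusion matrices of size $c\times c$ for naive Bayes.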
Notes
Strictly speaking, this should be called the plurality vote, because the assigned label is the one that receives the most votes, even though a majority of more than 50 % may not be reached.
Conditional independence means that
$$\begin{aligned} P(s_1,s_2,\ldots ,s_L|\omega _k)=P(s_1|\omega _k)P(s_2|\omega _k)\ldots P(s_L|\omega _k). \end{aligned}$$
However, this assumption precludes unconditional independence, that is,
$$\begin{aligned} P(s_1,s_2,\ldots ,s_L)\ne P(s_1)P(s_2)\ldots P(s_L). \end{aligned}$$
Each set of $c$ random numbers summing up to 1 had the same chance of being generated.
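To see the second note in action, here is a small numerical check with made-up confusion matrices for two classifiers and two equiprobable classes; all numbers are illustrative assumptions. The joint distribution of the outputs, built from the conditional-independence assumption, does not factorize into the product of its marginals:

```python
import numpy as np

priors = np.array([0.5, 0.5])              # P(omega_k), assumed values
# P(s_i = s | omega_k): row = true class, column = output label (assumed)
cm1 = np.array([[0.8, 0.2],
                [0.3, 0.7]])
cm2 = np.array([[0.9, 0.1],
                [0.4, 0.6]])

# Conditional independence gives
#   P(s1, s2) = sum_k P(omega_k) P(s1|omega_k) P(s2|omega_k)
joint = np.einsum('k,ki,kj->ij', priors, cm1, cm2)

marg1 = joint.sum(axis=1)                  # P(s1) = [0.55, 0.45]
marg2 = joint.sum(axis=0)                  # P(s2) = [0.65, 0.35]

# joint[0, 0] = 0.42, but P(s1=0) P(s2=0) = 0.55 * 0.65 = 0.3575
print(np.allclose(joint, np.outer(marg1, marg2)))   # prints False
```

As for the last note, drawing the $c$ numbers as `np.random.dirichlet(np.ones(c))` is one standard way to generate points uniformly over the probability simplex, i.e., to give every such set the same chance.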
Cite this article
Kuncheva, L.I., Rodríguez, J.J. A weighted voting framework for classifiers ensembles. Knowl Inf Syst 38, 259–275 (2014). https://doi.org/10.1007/s10115-012-0586-6