Knowledge and Information Systems, Volume 38, Issue 2, pp 259–275

A weighted voting framework for classifiers ensembles

Regular Paper

Abstract

We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, the recall combiner and the naive Bayes combiner. The framework rests on two assumptions: class-conditional independence of the classifier outputs and an assumption about the individual accuracies. The four combiners are derived successively from one another by progressively relaxing and then eliminating the second assumption, while the number of trainable parameters increases from one combiner to the next. Simulation studies reveal that, if the parameter estimates are accurate and the first assumption is satisfied, the order of preference of the combiners is naive Bayes, recall, weighted majority and majority. By inducing label noise, we expose a caveat arising from the stability-plasticity dilemma. Experimental results with 73 benchmark data sets reveal no definitive best combiner among the four candidates, with a slight preference for naive Bayes. This combiner was better for problems with a large number of fairly balanced classes, while weighted majority vote was better for problems with a small number of unbalanced classes.
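The four rules differ only in how much is estimated about each base classifier: nothing beyond its vote (majority), a single accuracy (weighted majority), a recall per class (recall combiner) or a full confusion matrix (naive Bayes). The sketch below is a minimal illustration of this progression, not the paper's derivation: the log(p/(1-p)) weights and the confusion-matrix naive Bayes rule are textbook forms, while the recall combiner shown here assumes that a classifier's errors are spread evenly over the remaining classes, which is only our reading of the abstract; all function and parameter names are illustrative.

    import numpy as np

    def majority_vote(votes, n_classes):
        # Plurality over the label votes of the L classifiers for one object.
        return int(np.bincount(votes, minlength=n_classes).argmax())

    def weighted_majority_vote(votes, accuracies, n_classes):
        # Each classifier votes with weight log(p_i / (1 - p_i)), the classical
        # optimal weight under class-conditional independence.
        accuracies = np.asarray(accuracies, dtype=float)
        weights = np.log(accuracies / (1.0 - accuracies))
        scores = np.zeros(n_classes)
        for v, w in zip(votes, weights):
            scores[v] += w
        return int(scores.argmax())

    def recall_combiner(votes, recalls, priors):
        # Assumed form: recalls[i, k] = P(classifier i outputs k | true class k),
        # with the remaining probability mass split evenly over the other classes.
        n_classes = len(priors)
        scores = np.log(np.asarray(priors, dtype=float))
        for i, v in enumerate(votes):
            lik = (1.0 - recalls[i]) / (n_classes - 1)
            lik[v] = recalls[i][v]
            scores += np.log(lik + 1e-12)   # small smoothing to avoid log(0)
        return int(scores.argmax())

    def naive_bayes_combiner(votes, confusion, priors):
        # confusion[i][k, s] estimates P(classifier i outputs s | true class k);
        # the score for class k is log P(k) + sum_i log confusion[i][k, votes[i]].
        scores = np.log(np.asarray(priors, dtype=float))
        for i, v in enumerate(votes):
            scores += np.log(confusion[i][:, v] + 1e-12)
        return int(scores.argmax())

For instance, with three classifiers of accuracies 0.6, 0.6 and 0.9 voting [0, 0, 1] on a two-class problem, weighted_majority_vote returns class 1, because the single more accurate vote carries weight log(0.9/0.1) ≈ 2.2 against 2 × log(0.6/0.4) ≈ 0.8 for class 0.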

Keywords

Classifier ensembles · Combination rules · Weighted majority vote · Recall · Naive Bayes

Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  1. School of Computer Science, Bangor University, Bangor, Gwynedd, UK
  2. Departamento de Ingeniería Civil, Universidad de Burgos, Burgos, Spain