A semi-hard voting combiner scheme to ensemble multi-class probabilistic classifiers

Abstract

Ensembling of probabilistic classifiers is a technique that has been widely applied in classification, allowing a new classifier to be built by combining a set of base classifiers. Among the different schemes that can be used to construct the ensemble, we focus on the simple majority vote (MV), one of the most popular combiner schemes and the foundation of the bagging meta-algorithm. We propose a non-trainable weighted version of the simple majority vote rule that, instead of assigning weights to each base classifier based on its estimated accuracy, uses the confidence level CL, the standard measure of the degree of support that each base classifier gives to its prediction. In the binary case, we prove that if the number of base classifiers is odd, the accuracy of this scheme is greater than that of the majority vote. Moreover, through a sensitivity analysis in the multi-class setting, we show that it is more resilient than the average scheme to errors in the estimation of the probabilities that the classifiers assign to each class. We also consider another simple measure of the degree of support that incorporates additional knowledge of the probability distribution over the classes, namely the modified confidence level MCL. The usefulness for bagging of the proposed weighted majority vote based on CL or MCL is checked through a series of experiments with publicly available datasets: it outperforms the simple majority vote, with statistically significant improvements in two performance measures, Accuracy and the Matthews Correlation Coefficient (MCC), and it holds up against the average combiner, which the majority vote does not, while being less computationally demanding.
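
To illustrate the idea of the combiner (a minimal sketch only, not the paper's exact implementation; the function and object names below are hypothetical), each base classifier can cast a vote for its predicted class weighted by its confidence level CL, taken as the maximum of its estimated class probabilities:

    # Minimal sketch in R (hypothetical names): each element of prob_list is a
    # named numeric vector of class probabilities from one base classifier.
    # Each classifier votes for its predicted class with weight CL = max(p).
    cl_weighted_vote <- function(prob_list) {
      classes <- names(prob_list[[1]])
      votes <- setNames(numeric(length(classes)), classes)
      for (p in prob_list) {
        cl <- max(p)                        # confidence level of this classifier
        predicted <- classes[which.max(p)]  # its hard prediction
        votes[predicted] <- votes[predicted] + cl
      }
      names(which.max(votes))               # class with the largest weighted support
    }

    # Three base classifiers, r = 3 classes: the simple majority vote would pick "B"
    # (two votes against one), while the CL-weighted vote picks "A", because the
    # first classifier is far more confident than the other two.
    p1 <- c(A = 0.95, B = 0.03, C = 0.02)
    p2 <- c(A = 0.30, B = 0.40, C = 0.30)
    p3 <- c(A = 0.35, B = 0.40, C = 0.25)
    cl_weighted_vote(list(p1, p2, p3))      # returns "A"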

Notes

  1. https://CRAN.R-project.org/package=RWeka

  2. http://www.kaggle.com

  3. https://CRAN.R-project.org/package=tictoc

Acknowledgments

The author is supported by Ministerio de Ciencia, Innovación y Universidades, Gobierno de España, project ref. PGC2018-097848-B-I0.

Author information

Corresponding author

Correspondence to Rosario Delgado.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Properties of MCL

Proposition 5

For r ≥ 2,

  a)
    $$\frac{1}{r}\le \textup{CL} \le \textup{MCL} \le \textup{CL}+(1-\textup{CL}) \frac{r \textup{CL}-1}{r-1}\le 1 .$$
  b)

    If r > 2, then for fixed \(\widetilde {\textup {CL}}\), MCL is an increasing function of CL, while for fixed CL it is, as a function of \(\widetilde {\textup {CL}}\), a straight line with non-positive slope, hence non-increasing. Moreover, if \(\widetilde {\textup {CL}}\le 1/r\),

    $$ \textup{CL}\le \textup{CL} + (1-\textup{CL}) \frac{r \textup{CL}-1}{r}\le {\textup{MCL}} $$

Proof

  a)

    The first inequality is evident from the definition of CL. The second inequality is obvious, and it is strict except when CL = 1 or \(\widetilde {\text {CL}}=\text{CL}\). The third inequality follows from the fact that

$$\widetilde{\text{CL}}\ge \frac{1-\text{CL}}{r-1} ,$$

which is obvious from the definition of \(\widetilde {\text {CL}}\): the total probability 1 − CL is divided among the r − 1 non-maximum values, and \(\widetilde {\text {CL}}\) is defined, in turn, as the maximum of those values.

Finally, we prove that \(\text {CL}+(1-\text {CL}) \frac {r \text {CL}-1}{r-1}\le 1\). Indeed, simple algebraic manipulations show that this inequality is equivalent to \((1-\text{CL})^{2}\ge 0\), which obviously holds. The inequality is strict except when CL = 1.

  b)

    For fixed \(\widetilde {\text {CL}}\), MCL as a function of x = CL is \(g(x)=-x^{2}+(2+\widetilde {\text {CL}}) x-\widetilde {\text {CL}}\), which is strictly increasing since its first derivative, \(g^{\prime }(x)=-2 x+(2+\widetilde {\text {CL}})\), is > 0 for 0 < x ≤ 1. On the other hand, for fixed CL, MCL as a function of \(z=\widetilde {\text {CL}}\) is \(h(z) = -(1 -\text{CL})\,z + \text{CL} + (1 -\text{CL})\,\text{CL}\), a straight line with non-positive slope −(1 − CL), hence non-increasing. Finally, if \(\widetilde {\text {CL}}\le 1/r\), then \(\text{MCL}=h(\widetilde {\text {CL}})\ge h(1/r)=\text{CL}+(1-\text{CL})\frac{r\text{CL}-1}{r}\ge \text{CL}\), where the last inequality follows from CL ≥ 1/r (part a)).
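
As a quick numerical check of Proposition 5 a), the following R lines compute CL, \(\widetilde {\text {CL}}\) and MCL from an arbitrary probability vector and verify the chain of inequalities; the sketch assumes, consistently with the function g(x) in the proof above, that \(\text{MCL}=\text{CL}+(1-\text{CL})(\text{CL}-\widetilde {\text {CL}})\):

    # Numerical check of Proposition 5 a). Assumes (consistently with g(x) above)
    # MCL = CL + (1 - CL) * (CL - CLtilde), with CL the largest probability and
    # CLtilde the second largest.
    p <- c(0.55, 0.25, 0.15, 0.05)                 # arbitrary probability vector, r = 4
    r <- length(p)
    CL <- max(p)
    CLtilde <- sort(p, decreasing = TRUE)[2]
    MCL <- CL + (1 - CL) * (CL - CLtilde)
    upper <- CL + (1 - CL) * (r * CL - 1) / (r - 1)
    c(lower = 1 / r, CL = CL, MCL = MCL, upper = upper)
    # 0.250  0.550  0.685  0.730
    all(1 / r <= CL, CL <= MCL, MCL <= upper, upper <= 1)   # TRUE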

Corollary 1

For r ≥ 2,

$$\textup{CL}=1 \Longleftrightarrow \textup{MCL}=1$$

Proof

By the definition of MCL, if CL = 1 then MCL = CL = 1.

The reverse implication is also true. Indeed, if MCL = 1, then Proposition 5 a) implies that

$$\text{CL}+(1-\text{CL}) \frac{r \text{CL}-1}{r-1}=1 ,$$

which is equivalent to \((1-\text{CL})^{2}=0\), implying CL = 1.
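
As a worked example of the boundary cases, using the same expression \(\text{MCL}=\text{CL}+(1-\text{CL})(\text{CL}-\widetilde {\text {CL}})\) as above: in the uniform case every class receives probability 1/r, so

$$\textup{CL}=\widetilde{\textup{CL}}=\frac{1}{r} \quad\Longrightarrow\quad \textup{MCL}=\frac{1}{r}+\Big(1-\frac{1}{r}\Big)\Big(\frac{1}{r}-\frac{1}{r}\Big)=\frac{1}{r} ,$$

and the lower bound 1/r of Proposition 5 a) is attained, while in the degenerate case CL = 1 (hence \(\widetilde {\textup {CL}}=0\)) we get MCL = 1, so the upper bound is attained, in agreement with Corollary 1.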

Appendix B: Complementary tables

Table 13 Comparison between the MCL-MV, CL-MV, majority vote and average combiner schemes, for the different choices of the number of bags in bagging, in terms of Accuracy and MCC, for the different datasets
Table 14 Continuation of Table 13
Table 15 Comparison between the MCL-MV, CL-MV, majority vote and average combiner schemes, for the different choices of the number of bags in bagging, in terms of mean running times
Table 16 Average over the runs of the averages over the folds, for the metrics Accuracy and MCC, with the different combiner schemes used for bagging, for all the datasets

About this article

Cite this article

Delgado, R. A semi-hard voting combiner scheme to ensemble multi-class probabilistic classifiers. Appl Intell 52, 3653–3677 (2022). https://doi.org/10.1007/s10489-021-02447-7
