A semi-hard voting combiner scheme to ensemble multi-class probabilistic classifiers

Abstract

Ensembling of probabilistic classifiers is a technique that has been widely applied in classification, allowing a new classifier to be built by combining a set of base classifiers. Among the different schemes that can be used to construct the ensemble, we focus on the simple majority vote (MV), one of the most popular combiner schemes and the foundation of the bagging meta-algorithm. We propose a non-trainable weighted version of the simple majority vote rule that, instead of assigning weights to each base classifier based on its estimated accuracy, uses the confidence level CL, the standard measure of the degree of support that each base classifier gives to its prediction. In the binary case, we prove that if the number of base classifiers is odd, the accuracy of this scheme is greater than that of the majority vote. Moreover, through a sensitivity analysis in the multi-class setting, we show that it is more resilient than the average scheme to errors in the estimation of the probabilities that the classifiers assign to each class. We also consider another simple measure of the degree of support that incorporates additional knowledge of the probability distribution over the classes, namely the modified confidence level MCL. The usefulness for bagging of the proposed weighted majority vote based on CL or MCL is checked through a series of experiments with publicly available datasets: it outperforms the simple majority vote, with statistically significant improvements in two performance measures, Accuracy and the Matthews Correlation Coefficient (MCC), and it holds up against the average combiner, which the majority vote does not, while being less computationally demanding.
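
To illustrate the idea of the combiner (a minimal sketch only, not the paper's exact implementation; the function and object names below are hypothetical), each base classifier can cast a vote for its predicted class weighted by its confidence level CL, taken as the maximum of its estimated class probabilities:

    # Minimal sketch in R (hypothetical names): each element of prob_list is a
    # named numeric vector of class probabilities from one base classifier.
    # Each classifier votes for its predicted class with weight CL = max(p).
    cl_weighted_vote <- function(prob_list) {
      classes <- names(prob_list[[1]])
      votes <- setNames(numeric(length(classes)), classes)
      for (p in prob_list) {
        cl <- max(p)                        # confidence level of this classifier
        predicted <- classes[which.max(p)]  # its hard prediction
        votes[predicted] <- votes[predicted] + cl
      }
      names(which.max(votes))               # class with the largest weighted support
    }

    # Three base classifiers, r = 3 classes: the simple majority vote would pick "B"
    # (two votes against one), while the CL-weighted vote picks "A", because the
    # first classifier is far more confident than the other two.
    p1 <- c(A = 0.95, B = 0.03, C = 0.02)
    p2 <- c(A = 0.30, B = 0.40, C = 0.30)
    p3 <- c(A = 0.35, B = 0.40, C = 0.25)
    cl_weighted_vote(list(p1, p2, p3))      # returns "A"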

Notes

  1. https://CRAN.R-project.org/package=RWeka

  2. http://www.kaggle.com

  3. https://CRAN.R-project.org/package=tictoc

Acknowledgments

The author is supported by Ministerio de Ciencia, Innovación y Universidades, Gobierno de España, project ref. PGC2018-097848-B-I0.

Author information

Corresponding author

Correspondence to Rosario Delgado.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Properties of MCL

Proposition 5

For r ≥ 2,

  a)
    $$\frac{1}{r}\le \textup{CL} \le \textup{MCL} \le \textup{CL}+(1-\textup{CL}) \frac{r \textup{CL}-1}{r-1}\le 1 .$$
  b)

    If r > 2, then for fixed \(\widetilde {\textup {CL}}\), MCL is an increasing function of CL, while for fixed CL it is, as a function of \(\widetilde {\textup {CL}}\), a straight line with non-positive slope, hence non-increasing. Moreover, if \(\widetilde {\textup {CL}}\le 1/r\),

    $$ \textup{CL}\le \textup{CL} + (1-\textup{CL}) \frac{r \textup{CL}-1}{r}\le {\textup{MCL}} $$

Proof

  a)

    The first inequality is evident from the definition of CL. The second inequality is obvious, and it is strict except when CL = 1 or \(\widetilde {\text {CL}}=\text{CL}\). The third inequality follows from the fact that

$$\widetilde{\text{CL}}\ge \frac{1-\text{CL}}{r-1} ,$$

which is obvious from the definition of \(\widetilde {\text {CL}}\): the total probability 1 − CL is divided among the r − 1 non-maximum values, and \(\widetilde {\text {CL}}\) is defined, in turn, as the maximum of those values.

Finally, we prove that \(\text {CL}+(1-\text {CL}) \frac {r \text {CL}-1}{r-1}\le 1\). Indeed, simple algebraic manipulations show that this inequality is equivalent to \((1-\text{CL})^{2}\ge 0\), which obviously holds. The inequality is strict except when CL = 1.

  b)

    For fixed \(\widetilde {\text {CL}}\), MCL as a function of x = CL is \(g(x)=-x^{2}+(2+\widetilde {\text {CL}}) x-\widetilde {\text {CL}}\), which is strictly increasing since its first derivative, \(g^{\prime }(x)=-2 x+(2+\widetilde {\text {CL}})\), is > 0 for 0 < x ≤ 1. On the other hand, for fixed CL, MCL as a function of \(z=\widetilde {\text {CL}}\) is \(h(z) = -(1 -\text{CL})\,z + \text{CL} + (1 -\text{CL})\,\text{CL}\), a straight line with non-positive slope −(1 − CL), hence non-increasing. Finally, if \(\widetilde {\text {CL}}\le 1/r\), then \(\text{MCL}=h(\widetilde {\text {CL}})\ge h(1/r)=\text{CL}+(1-\text{CL})\frac{r\text{CL}-1}{r}\ge \text{CL}\), where the last inequality follows from CL ≥ 1/r (part a)).
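
As a quick numerical check of Proposition 5 a), the following R lines compute CL, \(\widetilde {\text {CL}}\) and MCL from an arbitrary probability vector and verify the chain of inequalities; the sketch assumes, consistently with the function g(x) in the proof above, that \(\text{MCL}=\text{CL}+(1-\text{CL})(\text{CL}-\widetilde {\text {CL}})\):

    # Numerical check of Proposition 5 a). Assumes (consistently with g(x) above)
    # MCL = CL + (1 - CL) * (CL - CLtilde), with CL the largest probability and
    # CLtilde the second largest.
    p <- c(0.55, 0.25, 0.15, 0.05)                 # arbitrary probability vector, r = 4
    r <- length(p)
    CL <- max(p)
    CLtilde <- sort(p, decreasing = TRUE)[2]
    MCL <- CL + (1 - CL) * (CL - CLtilde)
    upper <- CL + (1 - CL) * (r * CL - 1) / (r - 1)
    c(lower = 1 / r, CL = CL, MCL = MCL, upper = upper)
    # 0.250  0.550  0.685  0.730
    all(1 / r <= CL, CL <= MCL, MCL <= upper, upper <= 1)   # TRUE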

Corollary 1

For r ≥ 2,

$$\textup{CL}=1 \Longleftrightarrow \textup{MCL}=1$$

Proof

By the definition of MCL, if CL = 1 then MCL = CL = 1.

The reverse implication is also true. Indeed, if MCL = 1, then Proposition 5 a) implies that

$$\text{CL}+(1-\text{CL}) \frac{r \text{CL}-1}{r-1}=1 ,$$

which is equivalent to \((1-\text{CL})^{2}=0\), implying CL = 1.
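
As a worked example of the boundary cases, using the same expression \(\text{MCL}=\text{CL}+(1-\text{CL})(\text{CL}-\widetilde {\text {CL}})\) as above: in the uniform case every class receives probability 1/r, so

$$\textup{CL}=\widetilde{\textup{CL}}=\frac{1}{r} \quad\Longrightarrow\quad \textup{MCL}=\frac{1}{r}+\Big(1-\frac{1}{r}\Big)\Big(\frac{1}{r}-\frac{1}{r}\Big)=\frac{1}{r} ,$$

and the lower bound 1/r of Proposition 5 a) is attained, while in the degenerate case CL = 1 (hence \(\widetilde {\textup {CL}}=0\)) we get MCL = 1, so the upper bound is attained, in agreement with Corollary 1.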

Appendix B: Complementary tables

Table 13 Comparison between the MCL-MV, CL-MV, majority vote and average combiner schemes, for the different choices of the number of bags in bagging, in terms of Accuracy and MCC, for the different datasets
Table 14 Continuation of Table 13
Table 15 Comparison between the MCL-MV, CL-MV, majority vote and average combiner schemes, for the different choices of the number of bags in bagging, in terms of mean running times
Table 16 Average over the runs of the averages over the folds, for the metrics Accuracy and MCC, with the different combiner schemes used for bagging, for all the datasets

About this article

Cite this article

Delgado, R. A semi-hard voting combiner scheme to ensemble multi-class probabilistic classifiers. Appl Intell 52, 3653–3677 (2022). https://doi.org/10.1007/s10489-021-02447-7
