Abstract
Classification is one of the most important topics in machine learning. However, most existing work focuses on binary classification (i.e., classification into 'positive' and 'negative'), while multi-class classification has received far less attention. In this study, we develop a novel multiple classifier system (MCS) methodology with a one-vs-one (OVO) scheme for the multi-class classification task. First, the multi-class problem is decomposed into all pairwise, easier-to-solve binary sub-problems. Next, an optimal MCS is generated for each sub-problem using a roulette-based feature subspace selection and validation procedure. Finally, to identify the class of a query sample, an OVO aggregation strategy derives the final label from the confidence score matrix produced by the MCS. A thorough experimental study verifies the effectiveness and robustness of the proposed approach; the findings, supported by appropriate statistical analysis, demonstrate its strength relative to state-of-the-art methods for multi-class classification problems.
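The pipeline outlined in the abstract (pairwise decomposition, per-pair ensembles on roulette-selected feature subspaces, aggregation of a confidence score matrix) can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: the nearest-centroid base learner, the per-feature centroid-gap roulette weights, and the validation-free ensemble construction are all simplifying assumptions introduced here.

```python
import random
from itertools import combinations

def centroid(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

class NearestCentroid:
    """Toy binary base learner (an assumption, for illustration only)."""
    def fit(self, X, y):
        self.cents = {c: centroid([x for x, t in zip(X, y) if t == c])
                      for c in set(y)}
        return self
    def predict_one(self, x):
        # class whose centroid is closest in squared Euclidean distance
        return min(self.cents, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(x, self.cents[c])))

def roulette_pick(weights, k, rng):
    """Roulette-wheel selection of k distinct feature indices:
    higher-weighted features are proportionally more likely to be drawn."""
    pool, chosen = list(range(len(weights))), []
    for _ in range(k):
        r = rng.random() * sum(weights[f] for f in pool)
        acc = 0.0
        for f in pool:
            acc += weights[f]
            if acc >= r:
                chosen.append(f)
                pool.remove(f)
                break
    return chosen

class OVOEnsemble:
    def __init__(self, n_members=7, k=2, seed=0):
        self.n_members, self.k, self.rng = n_members, k, random.Random(seed)

    def fit(self, X, y):
        self.classes, self.pairs = sorted(set(y)), {}
        # one ensemble per pair of classes (OVO decomposition)
        for ci, cj in combinations(self.classes, 2):
            Xp = [x for x, t in zip(X, y) if t in (ci, cj)]
            yp = [t for t in y if t in (ci, cj)]
            # roulette weights: per-feature gap between the pair's centroids
            mi = centroid([x for x, t in zip(Xp, yp) if t == ci])
            mj = centroid([x for x, t in zip(Xp, yp) if t == cj])
            w = [abs(a - b) + 1e-9 for a, b in zip(mi, mj)]
            members = []
            for _ in range(self.n_members):
                feats = roulette_pick(w, self.k, self.rng)
                sub = [[x[f] for f in feats] for x in Xp]
                members.append((feats, NearestCentroid().fit(sub, yp)))
            self.pairs[(ci, cj)] = members
        return self

    def predict_one(self, x):
        # aggregate the pairwise confidence scores (fraction of member
        # votes) and return the class with the largest total score
        score = {c: 0.0 for c in self.classes}
        for (ci, cj), members in self.pairs.items():
            votes_i = sum(m.predict_one([x[f] for f in feats]) == ci
                          for feats, m in members)
            score[ci] += votes_i / len(members)
            score[cj] += 1 - votes_i / len(members)
        return max(score, key=score.get)

# toy 3-class data: features 0 and 1 are informative, feature 2 is noise
X = [[0, 0, 1], [0.5, 0.2, 9], [0.1, 0.4, 5],
     [5, 0, 2], [5.2, 0.3, 8], [4.8, 0.1, 4],
     [0, 5, 3], [0.2, 5.1, 7], [0.3, 4.9, 6]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
model = OVOEnsemble(n_members=7, k=2, seed=1).fit(X, y)
```

Because the roulette weights favour the two informative features, most subspaces exclude the noise dimension, and the aggregated pairwise scores recover the correct class even when individual members are trained on a poor subspace.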
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments. This work is supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LZ20G010001 and the National Science Foundation of China under Grant Nos. 71801065, 71831006, and 71932005.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, ZL., Zhang, CY., Luo, XG. et al. A multiple classifiers system with roulette-based feature subspace selection for one-vs-one scheme. Pattern Anal Applic 26, 73–90 (2023). https://doi.org/10.1007/s10044-022-01089-w