Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach

Vluymans, Sarah; Fernández, Alberto; Saeys, Yvan; Cornelis, Chris; Herrera, Francisco

doi:10.1007/s10115-017-1126-1


Regular Paper
Published: 22 October 2017

Volume 56, pages 55–84 (2018)

Knowledge and Information Systems

Sarah Vluymans (ORCID: orcid.org/0000-0003-1782-8114)^1,2,3
Alberto Fernández^3
Yvan Saeys^1,2
Chris Cornelis^1,3
Francisco Herrera^3,4


Abstract

Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. Research has focused mainly on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance on two-class imbalanced problems. In this paper, we extend it to multi-class data by combining it with one-versus-one decomposition. This decomposition transforms a multi-class problem into two-class sub-problems, one for each pair of classes. A binary classifier is applied to each sub-problem, after which their outcomes are aggregated into a single prediction. We improve the integration of IFROWANN in the decomposition scheme in two steps. First, we propose an adaptive weight setting for the binary classifier that addresses the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-\(\mathcal{W}_{\mathrm{IR}}\). Second, we develop a new dynamic aggregation method, called WV–FROST, that combines the predictions of the binary classifiers with the global class affinity before making a final decision. In a thorough experimental study, we show that our complete proposal outperforms state-of-the-art methods on a wide range of multi-class imbalanced datasets.
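To make the fuzzy rough component of a one-versus-one sub-problem more concrete, the following is a minimal sketch of OWA-based fuzzy rough lower approximation memberships for a single pair of classes i and j. It is a generic illustration in the spirit of IFROWANN, not the paper's IFROWANN-\(\mathcal{W}_{\mathrm{IR}}\) classifier. The similarity relation (one minus the mean absolute difference of features scaled to [0, 1]), the linearly increasing OWA weights, and the function names `similarity`, `owa_soft_min`, `lower_approx_membership` and `binary_scores` are all assumptions made for illustration.

```python
import numpy as np


def similarity(X, y):
    """Assumed fuzzy similarity between each row of X and the point y:
    1 minus the mean absolute feature difference (features scaled to [0, 1])."""
    return 1.0 - np.mean(np.abs(X - y), axis=1)


def owa_soft_min(values):
    """OWA soft minimum with assumed linear weights: after sorting in
    descending order, the smallest value receives the largest weight."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    n = len(v)
    w = 2.0 * np.arange(1, n + 1) / (n * (n + 1))        # weights sum to 1
    return float(np.dot(w, v))


def lower_approx_membership(X_other, y):
    """Lower approximation membership of y in its candidate class: high when
    y is dissimilar to every instance of the competing class X_other."""
    return owa_soft_min(1.0 - similarity(X_other, y))


def binary_scores(X_i, X_j, y):
    """Raw confidence degrees (r_ij, r_ji) for the pair of classes i and j."""
    r_ij = lower_approx_membership(X_j, y)  # evidence that y belongs to class i
    r_ji = lower_approx_membership(X_i, y)  # evidence that y belongs to class j
    return r_ij, r_ji
```

For a test point lying among the class-i instances, `binary_scores` returns a large r_ij and a small r_ji, because the point is dissimilar to every class-j instance but close to some class-i instance. The fixed linear weights are only a placeholder; the paper's contribution is precisely an adaptive weight setting for this binary classifier.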


Similar content being viewed by others

Multiclass Classification Based on Multi-criteria Decision-making (article, 02 April 2019)

An adjustable fuzzy classification algorithm using an improved multi-objective genetic strategy based on decomposition for imbalance dataset (article, 26 February 2019)

On the Combination of Pairwise and Granularity Learning for Improving Fuzzy Rule-Based Classification Systems: GL-FARCHD-OVO

Notes

If the binary classifier provides both confidence degrees \(r_{ij}\) and \(r_{ji}\), they must be normalized such that \(r_{ij} + r_{ji} = 1\).
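The normalization required by this note can be sketched as follows, together with the standard weighted voting (WV) rule for one-versus-one aggregation, which predicts the class whose row of the score matrix has the largest sum. This is the classic WV baseline, not the paper's WV–FROST aggregation, which additionally takes the global class affinity into account. The function names `normalize_pairwise` and `weighted_vote`, and the convention of treating a pair with two zero degrees as a tie (0.5 each), are assumptions.

```python
import numpy as np


def normalize_pairwise(raw):
    """Rescale raw confidence degrees so that r_ij + r_ji = 1 for every i != j.
    Pairs where both degrees are zero are treated as a tie (assumed convention)."""
    raw = np.asarray(raw, dtype=float)
    total = raw + raw.T                   # r_ij + r_ji for every pair
    R = np.full_like(raw, 0.5)            # default for pairs with total == 0
    np.divide(raw, total, out=R, where=total > 0)
    np.fill_diagonal(R, 0.0)              # a class is never compared with itself
    return R


def weighted_vote(R):
    """Standard OVO weighted voting: class with the largest sum of confidences."""
    return int(np.argmax(R.sum(axis=1)))
```

For the raw matrix `[[0, 0.6, 0.4], [0.2, 0, 0.9], [0.4, 0.3, 0]]`, normalization yields row sums 1.25, 1.0 and 0.75, so weighted voting predicts class 0.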


Acknowledgements

The research of Sarah Vluymans is funded by the Special Research Fund (BOF) of Ghent University. This work was partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P and TIN2015-68454-R, and by the Andalusian Research Plans P11-TIC-7765 and P12-TIC-2958. Yvan Saeys is an ISAC Marylou Ingram Scholar.

Author information

Authors and Affiliations

^1 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
Sarah Vluymans, Yvan Saeys & Chris Cornelis
^2 Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium
Sarah Vluymans & Yvan Saeys
^3 Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Sarah Vluymans, Alberto Fernández, Chris Cornelis & Francisco Herrera
^4 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Francisco Herrera

Corresponding author

Correspondence to Sarah Vluymans.

About this article

Cite this article

Vluymans, S., Fernández, A., Saeys, Y. et al. Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach. Knowl Inf Syst 56, 55–84 (2018). https://doi.org/10.1007/s10115-017-1126-1

Received: 11 October 2016
Revised: 08 September 2017
Accepted: 10 October 2017
Published: 22 October 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10115-017-1126-1
