Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics

Lopez-Garcia, Pedro; Masegosa, Antonio D.; Osaba, Eneko; Onieva, Enrique; Perallos, Asier

doi:10.1007/s10489-019-01423-6

Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics

Published: 07 February 2019

Volume 49, pages 2807–2822, (2019)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Pedro Lopez-Garcia ORCID: orcid.org/0000-0002-0515-4928^1,2,
Antonio D. Masegosa^1,2,3,
Eneko Osaba^1,2,4,
Enrique Onieva^1,2 &
…
Asier Perallos^1,2

1203 Accesses
35 Citations
Explore all metrics

Abstract

One of the most challenging issues when facing a classification problem is to deal with imbalanced datasets. Recently, ensemble classification techniques have proven to be very successful in addressing this problem. We present an ensemble classification approach based on feature space partitioning for imbalanced classification. A hybrid metaheuristic called GACE is used to optimize the different parameters related to the feature space partitioning. To assess the performance of the proposal, an extensive experimentation over imbalanced and real-world datasets compares different configurations and base classifiers. Its performance is competitive with that of reference techniques in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An exploratory study of mono and multi-objective metaheuristics to ensemble of classifiers

Article 14 July 2017

A study of model and hyper-parameter selection strategies for classifier ensembles: a robust analysis on different optimization algorithms and extended results

Article 30 October 2020

A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification

Notes

References

Alcala-Fdez J, Alcala R, Herrera F (2011) A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Trans Fuzzy Syst 19(5):857–872
Article Google Scholar
Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17(2–3):255–287
Google Scholar
Amami R, Ben Ayed D, Ellouze N (2013) Adaboost with SVM using GMM supervector for imbalanced phoneme data. In: 2013 The 6th international conference on human system interaction (HSI), pp 328–333
Bäck T, Schwefel H (1993) An overview of evolutionary algorithms for parameter optimization. Evol Comput 1(1):1– 23
Article Google Scholar
Bi Y, Guan J, Bell D (2008) The combination of multiple classifiers using an evidential reasoning approach. Artif Intell 172(15):1731–1751
Article MATH Google Scholar
Bian J, Peng XG, Wang Y, Zhang H (2016) An efficient cost-sensitive feature selection using chaos genetic algorithm for class imbalance problem. Math Probl Eng, 2016
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2 (2):121–167
Article Google Scholar
Cervantes J, Huang DS, García-Lamont F, Chau A (2014) A hybrid algorithm to improve the accuracy of support vector machines on skewed data-sets. In: International conference on intelligent computing, pp 782–788
Chawla NV, Lazarevic A, Hall LO, Bowyer KW (2003) SMOTEBoost: improving prediction of the minority class in boosting. In: European conference on principles of data mining and knowledge discovery. Springer, pp 107–119
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Article MATH Google Scholar
Danesh A, Moshiri B, Fatemi O (2007) Improve text classification accuracy based on classifier fusion methods. In: 10th International conference on information fusion, pp 1–6
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
Article Google Scholar
Díez-Pastor JF, Rodríguez GOCJ, Kuncheva LIJ (2015) Random balance: ensembles of variable priors classifiers for imbalanced data. Knowl-Based Syst 85:96–111
Article Google Scholar
Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1(1):53–66
Article Google Scholar
Duin RP (2002) The combining classifier: to train or not to train? In: Proceedings 16th international conference patter recognition, vol 2. IEEE, pp 765–770
Eshelman LJ, Schaffer JD (1992) Real-coded genetic algorithms and interval-schemata. Found Gen Algor 2:187–202
Google Scholar
Fattahi S, Othman Z, Othman Z (2015) New approach with ensemble method to address class imbalance problem. J Theor Appl Inf Technol 72:1
Google Scholar
Finner H (1993) On a monotonicity problem in step-down multiple test procedures. J Am Stat Assoc 88 (423):920–923
Article MathSciNet MATH Google Scholar
Giacinto G, Roli F (2001) Dynamic classifier selection based on multiple classifier behaviour. Pattern Recogn 34(9):1879– 1881
Article MATH Google Scholar
Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. Found Gen Algor 1:69–93
MathSciNet Google Scholar
Haixiang G, Xiuwu L, Kejun Z, Chang D, Yanhui G (2011) Optimizing reservoir features in oil exploration management based on fusion of soft computing. Appl Soft Comput 11(1):1144–1155
Article Google Scholar
Hashem S (1997) Optimal linear combinations of neural networks. Neural Netw 10(4):599–614
Article MathSciNet Google Scholar
Herrera F, Lozano M, Verdegay JL (1998) Tackling real-coded genetic algorithms: operators and tools for behavioural analysis. Artif Intell Rev 12(4):265–319
Article MATH Google Scholar
Ho D, Drake T, Bentley R, Valea F, Wax A (2015) Evaluation of hybrid algorithm for analysis of scattered light using ex vivo nuclear morphology measurements of cervical epithelium. Biom Opt Express 6 (8):2755–2765
Article Google Scholar
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
MathSciNet MATH Google Scholar
Hopfield J (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79(8):2554–2558
Article MathSciNet MATH Google Scholar
Jackowski K, Wozniak M (2009) Algorithm of designing compound recognition system on the basis of combining classifiers with simultaneous splitting feature space into competence areas. Pattern Anal Applic 12(4):415–425
Article MathSciNet Google Scholar
Jackowski K, Krawczyk B, Woźniak M (2014) Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. Int J Neural Syst 24(03):1430007
Article Google Scholar
Jackowski K (2015) Adaptive splitting and selection algorithm for regression. N Gener Comput 33(4):425–448
Article Google Scholar
del Jesus M, Hoffmann F, Junco L, Sánchez L (2004) Induction of fuzzy-rule-based classifiers with evolutionary boosting algorithms. IEEE Trans Fuzzy Syst 12(3):296–308
Article Google Scholar
Jurek A, Bi Y, Wu S, Nugent C (2011) Classification by cluster analysis: a new meta-learning based approach. Multiple Classif Syst, 259–268
Jurek A, Bi Y, Wu S, Nugent C (2014) A survey of commonly used ensemble-based classification techniques. Knowl Eng Rev 29(5):551–581
Article Google Scholar
Kennedy J (2011) Particle swarm optimization. Encyclopedia of machine learning. Springer, pp 760–766
Krawczyk B, Cyganek B (2017) Selecting locally specialised classifiers for one-class classification ensembles. Pattern Anal Appl 20(2):427–439
Article MathSciNet Google Scholar
Krawczyk B, McInnes BT (2018) Local ensemble learning from imbalanced and noisy data for word sense disambiguation. Pattern Recogn 78:103–119
Article Google Scholar
Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley
Kuncheva LI, Jain LC (2000) Designing classifier fusion systems by genetic algorithms. IEEE Trans Evol Comput 4(4):327–336
Article Google Scholar
Kuncheva LI, Whitaker CJ, Shipp CA, Duin RP (2003) Limits on the majority vote accuracy in classifier fusion. Pattern Anal Appl 6(1):22–31
Article MathSciNet MATH Google Scholar
Lavanya S, Palaniswami S, Divyabharathi M (2015) Resampling ensemble algorithm for class imbalance problem using optimization algorithm. Int J Appl Eng Res 10(13):11520–11526
Google Scholar
Liu X, Lin J, Deng K (2011) Scheduling optimization in re-entrant lines based on a GA and PSO hybrid algorithm. Tongji Daxue Xuebao/J Tongji Univ 39:726–729
MATH Google Scholar
Lopez-Garcia P, Onieva E, Osaba E, Masegosa A, Perallos A (2016) Gace: a meta-heuristic based in the hybridization of genetic algorithms and cross entropy methods for continuous optimization. Expert Syst Appl 55:508–519
Article Google Scholar
Lopez-Garcia P, Onieva E, Osaba E, Masegosa AD, Perallos A (2016) A hybrid method for short-term traffic congestion forecasting using genetic algorithms and cross entropy. IEEE Trans Intell Transp Syst 17(2):557–569
Article Google Scholar
Lopez-Garcia P, Woźniak M, Onieva E, Perallos A (2016c) Hybrid optimization method applied to adaptive splitting and selection algorithm. Lecture notes in computer science, vol 9648. Springer, pp 742–750
Mauša G, Galinac Grbac T (2017) Co-evolutionary multi-population genetic programming for classification in software defect prediction: an empirical case study. Appl Soft Comput J 55:331–351
Article Google Scholar
Mokeddem D, Belbachir H (2009) A survey of distributed classification based ensemble data mining methods. J Appl Sci 9(20):3739–3745
Article Google Scholar
Opitz DW, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11:169–198
Article MATH Google Scholar
Paredes R, Vidal E (2006) Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Trans Pattern Anal Mach Intell 28(7):1100–1110
Article Google Scholar
Qian Y, Liang Y, Li M, Feng G, Shi X (2014) A resampling ensemble algorithm for classification of imbalance problems. Neurocomputing 143:57–67
Article Google Scholar
Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33(1):1–39
Article MathSciNet Google Scholar
Ruta D, Gabrys B (2005) Classifier selection for majority voting. Inform Fus 6(1):63–81
Article MATH Google Scholar
Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern-Part A: Syst Humans 40(1):185–197
Article Google Scholar
Sentinella M, Casalino L (2009) Cooperative evolutionary algorithm for space trajectory optimization. Celest Mech Dyn Astron 105(1-3):211
Article MathSciNet MATH Google Scholar
Stanciu S, Tranca D, Stanciu G, Hristu R, Bueno J (2016) Perspectives on combining nonlinear laser scanning microscopy and bag-of-features data classification strategies for automated disease diagnostics. Opt Quant Electron 48(6):320
Article Google Scholar
Vorraboot P, Rasmequan S, Chinnasarn K, Lursinsap C (2015) Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms. Neurocomputing 152:429–443
Article Google Scholar
Wang S, Yao X (2009) Diversity analysis on imbalanced data sets by using ensemble models. In: Proceedings of IEEE symposium in computational intelligence and data mining, 2009, CIDM’09, pp 324–331
Wang S, Yao X (2012) Multiclass imbalance problems: analysis and potential solutions. IEEE Trans Syst Man Cybern Part B (Cybern) 42(4):1119–1130
Article Google Scholar
Wang S, Minku L, Yao X (2015) Resampling-based ensemble methods for online class imbalance learning. IEEE Trans Knowl Data Eng 27(5):1356–1368
Article Google Scholar
Xu L, Krzyzak A, Suen CY (1992) Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern 22(3):418–435
Article Google Scholar
Yang J, Ji Z, Xie W, Zhu Z (2016) Model selection based on particle swarm optimization for omics data classification. Shenzhen Daxue Xuebao (Ligong Ban)/J Shenzhen Univ Sci Eng 33(3):264–271
Article Google Scholar
Yang P, Xu L, Zhou B, Zhang Z, Zomaya A (2009) A particle swarm based hybrid system for imbalanced medical data sampling. BMC Genomics 10:Suppl. 3. https://doi.org/10.1186/1471-2164-10-S3-S34
Article Google Scholar
Yang XS (2010) A new metaheuristic bat-inspired algorithm. Stud Comput Intell 284:65–74
MATH Google Scholar
Yu H, Ni J, Zhao J (2013) ACOSampling: an ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data. Neurocomputing 101:309–318
Article Google Scholar
Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77
Article MathSciNet Google Scholar
Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Progress Artif Intell 5(4):221–232
Article Google Scholar
Cano A, Zafra A, Ventura S (2013) Weighted data gravitation classification for standard and imbalanced data. IEEE Trans Cybern 43(6):1672–1687
Article Google Scholar
Mahdizadehaghdam S, Dai L, Krim H, Skau E, Wang H (2017) Image classification: a hierarchical dictionary learning approach. In: IEEE International conference in acoustics, speech and signal processing (ICASSP), 2017, pp 2597–2601
Khari M, Kumar P, Burgos D, Crespo RG (2017) Optimized test suites for automated testing using different optimization techniques. Soft Comput, 1–12
Fernández A, García S, Herrera F (2011) Addressing the classification with imbalanced data: open problems and new challenges on class distribution. Hybrid Artif Intell Syst, 1–10
Sun Y, Wong AK, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(04):687–719
Article Google Scholar
Krawczyk B, Cano A, Woźniak M (2018) Selecting local ensembles for multi-class imbalanced data classification, In: 2018 International joint conference on neural networks (IJCNN) 1–8
Fernandez A, Garcia S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Learning from imbalanced data sets. Springer

Download references

Acknowledgments

This work has been supported by the research projects TEC2013-45585-C2-2-R and TIN2014-56042-JIN from the Spanish Ministry of Economy and Competitiveness, the TIMON project, which received funding from the European Union Horizon 2020 research and innovation programme under grant agreement No. 636220, and the LOGISTAR project, which received funding from European Union’s Horizon 2020 research and innovation programme under grant agreement No. 769142.

Author information

Authors and Affiliations

DeustoTech-Fundacion Deusto, Deusto Foundation, 48007, Bilbao, Spain
Pedro Lopez-Garcia, Antonio D. Masegosa, Eneko Osaba, Enrique Onieva & Asier Perallos
Faculty of Engineering, University of Deusto, 48007, Bilbao, Spain
Pedro Lopez-Garcia, Antonio D. Masegosa, Eneko Osaba, Enrique Onieva & Asier Perallos
IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain
Antonio D. Masegosa
TECNALIA Research and Innovation, 48160, Derio, Spain
Eneko Osaba

Authors

Pedro Lopez-Garcia
View author publications
You can also search for this author in PubMed Google Scholar
Antonio D. Masegosa
View author publications
You can also search for this author in PubMed Google Scholar
Eneko Osaba
View author publications
You can also search for this author in PubMed Google Scholar
Enrique Onieva
View author publications
You can also search for this author in PubMed Google Scholar
Asier Perallos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pedro Lopez-Garcia.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lopez-Garcia, P., Masegosa, A.D., Osaba, E. et al. Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics. Appl Intell 49, 2807–2822 (2019). https://doi.org/10.1007/s10489-019-01423-6

Download citation

Published: 07 February 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10489-019-01423-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics

Abstract

Access this article

Similar content being viewed by others

An exploratory study of mono and multi-objective metaheuristics to ensemble of classifiers

A study of model and hyper-parameter selection strategies for classifier ensembles: a robust analysis on different optimization algorithms and extended results

A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics

Abstract

Access this article

Similar content being viewed by others

An exploratory study of mono and multi-objective metaheuristics to ensemble of classifiers

A study of model and hyper-parameter selection strategies for classifier ensembles: a robust analysis on different optimization algorithms and extended results

A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation