Abstract
Classifying imbalanced data presents a significant challenge, and many studies have proposed methodologies to address it. Among them, Convolutional Neural Networks have demonstrated superior performance for imbalanced image classification. This paper first applies data pre-processing methods such as over-sampling, under-sampling, and SMOTE to rebalance the original dataset. An ensemble CNN model is then trained on the processed data and used for prediction. To evaluate models trained on imbalanced data comprehensively, we report Accuracy, Recall, Precision, F1-score, and G-mean. On the CIFAR-10 and Fashion-MNIST datasets, different numbers of samples were extracted from each category to construct imbalanced datasets for the experiments. Compared with the AdaBoost-DenseNet model, the proposed method increases test accuracy on CIFAR-10 by 9%, while the F1-score and G-mean improve by 0.096 and 0.069, respectively. Compared with traditional methods, the proposed approach significantly improves accuracy, recall, precision, and other performance indicators.
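The metrics named above can all be derived from per-class prediction counts. The following is a minimal illustrative sketch (not the authors' implementation; macro averaging and the toy label set are assumptions) showing how Accuracy, macro Precision/Recall, F1-score, and G-mean, i.e. the geometric mean of per-class recalls, are computed:

```python
# Illustrative sketch only: computing the evaluation metrics named in the
# abstract (Accuracy, Precision, Recall, F1-score, G-mean) from label lists.
from math import prod


def imbalance_metrics(y_true, y_pred):
    """Return macro-averaged metrics plus G-mean for a multi-class problem."""
    classes = sorted(set(y_true))
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    precision = sum(precisions) / len(classes)  # macro precision
    recall = sum(recalls) / len(classes)        # macro recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # G-mean: geometric mean of per-class recalls; sensitive to any class
    # the model ignores, which is why it suits imbalanced evaluation.
    g_mean = prod(recalls) ** (1 / len(classes))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "g_mean": g_mean}


# Toy imbalanced example: 8 majority-class and 2 minority-class samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1] * 2  # one majority sample misclassified
m = imbalance_metrics(y_true, y_pred)
```

Because G-mean multiplies per-class recalls, a classifier that ignores the minority class scores zero even when its overall accuracy is high, which is why it appears alongside F1-score in the evaluation above.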
Data availability
The data used to support the findings of this study are included in the article.
Funding
The authors did not receive specific funding.
Author information
Contributions
X.Z. and N.W. wrote the main manuscript text and prepared all the figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Zhong, X., Wang, N. Ensemble learning method based on CNN for class imbalanced data. J Supercomput 80, 10090–10121 (2024). https://doi.org/10.1007/s11227-023-05820-0