
Ensemble learning method based on CNN for class imbalanced data

Published in: The Journal of Supercomputing

Abstract

Classifying imbalanced data is a significant challenge, and many studies have proposed methods to address it. Among them, convolutional neural networks (CNNs) have demonstrated superior performance for imbalanced image classification. This paper first applies data pre-processing methods such as over-sampling, under-sampling, and SMOTE to balance the original dataset, and then trains and predicts with an ensemble CNN learning model. To comprehensively evaluate models trained on imbalanced data, we use accuracy, recall, precision, F1-score, and G-mean as metrics. On the CIFAR-10 and Fashion-MNIST datasets, different numbers of samples were extracted from each category to construct imbalanced data for the experiments. Compared with the AdaBoost-DenseNet model, the proposed method increases test accuracy on the CIFAR-10 dataset by 9%, and improves the F1-score and G-mean by 0.096 and 0.069, respectively. Compared with traditional methods, the proposed method significantly improves accuracy, recall, precision, and other performance indicators.
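The abstract's pre-processing step includes SMOTE, which rebalances a dataset by synthesizing new minority-class samples rather than duplicating existing ones. As a minimal illustration of the SMOTE idea (not the paper's implementation; the function name and parameters here are hypothetical), each synthetic sample is an interpolation between a minority sample and one of its k nearest minority-class neighbours:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating a randomly picked sample toward one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# toy example: 4 minority samples in 2-D, generate 6 synthetic ones
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_oversample(X_min, n_new=6, rng=0)
print(X_new.shape)  # (6, 2)
```

Because each synthetic point lies on a segment between two real minority samples, the new samples stay inside the minority-class region instead of being exact copies, which is what distinguishes SMOTE from plain random over-sampling.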
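Among the metrics listed, G-mean is the one most specific to imbalanced classification: it is the geometric mean of the per-class recalls, so it collapses toward zero whenever any single class (typically the minority) is poorly recalled, even if overall accuracy stays high. A small self-contained sketch of that computation (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (true-positive rates).
    Penalises a classifier that sacrifices the minority class, even
    when overall accuracy looks good."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# toy imbalanced example: 8 majority samples, 2 minority samples
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.array([0] * 8 + [0, 1])  # misses one of the two minority samples
# accuracy is 0.9, but per-class recalls are 1.0 and 0.5,
# so the G-mean drops to sqrt(0.5) ~= 0.707
print(g_mean(y_true, y_pred))
```

This is why the paper reports G-mean alongside accuracy: on a 9:1 imbalanced set, a classifier that always predicts the majority class scores 0.9 accuracy but a G-mean of 0.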



Data availability

The data used to support the findings of this study are included in the article.


Funding

The authors did not receive specific funding.

Author information


Contributions

X.Z. and N.W. wrote the main manuscript text and prepared all the figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Nan Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhong, X., Wang, N. Ensemble learning method based on CNN for class imbalanced data. J Supercomput 80, 10090–10121 (2024). https://doi.org/10.1007/s11227-023-05820-0

