Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function


This paper presents a novel approach to the imbalanced data set problem in neural networks that incorporates prior probabilities into a cost-sensitive cross-entropy error function. Performance was evaluated on several classical benchmark data sets using G-Mean, area under the ROC curve (AUC), adjusted G-Mean, accuracy, true positive rate, true negative rate and F1-score. The results, compared with those of well-known algorithms, show the effectiveness and robustness of the proposed approach, which yields well-balanced classifiers across different imbalance scenarios.
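The abstract does not give the exact form of the weighting, so the Python sketch below assumes one common cost-sensitive choice. The cross-entropy term of each class c is scaled by w_c = 1/(2π_c), where the prior π_c is estimated from training-set frequencies, giving E = -(1/N) Σ_i [w_+ y_i ln ŷ_i + w_- (1 - y_i) ln(1 - ŷ_i)]. It is an illustration, not the authors' implementation, and the function names are hypothetical. The same sketch computes the listed evaluation metrics from a binary confusion matrix, assuming label 1 marks the minority (positive) class. The adjusted G-Mean uses the definition usually attributed to Batuwita and Palade, and AUC is computed as the pairwise Mann-Whitney statistic.

```python
import math
from typing import Dict, Sequence, Tuple


def class_weights(labels: Sequence[int]) -> Tuple[float, float]:
    """Return (w_neg, w_pos) with w_c = 1 / (2 * prior_c).

    Assumption: priors are estimated from training-set class frequencies;
    the factor 1/2 makes both weights equal to 1 on a balanced set.
    """
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                           eps: float = 1e-12) -> float:
    """Mean prior-weighted binary cross-entropy.

    probs[i] is the network output, read as an estimate of P(y = 1 | x_i).
    """
    w_neg, w_pos = class_weights(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        if y == 1:
            total -= w_pos * math.log(p)
        else:
            total -= w_neg * math.log(1.0 - p)
    return total / len(labels)


def auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def imbalance_metrics(labels: Sequence[int], probs: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Metrics named in the abstract; label 1 is assumed to be the minority class."""
    preds = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(labels, preds))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tpr = tp / (tp + fn)  # sensitivity on the minority class
    tnr = tn / (tn + fp)  # specificity on the majority class
    g_mean = math.sqrt(tpr * tnr)
    neg_share = (tn + fp) / len(labels)  # proportion of majority samples
    # Adjusted G-Mean (assumed Batuwita & Palade form): 0 if no positive is found.
    agm = (g_mean + tnr * neg_share) / (1.0 + neg_share) if tpr > 0 else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": g_mean,
        "adjusted_g_mean": agm,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": auc(labels, probs),
    }


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    # A degenerate model that always predicts the majority class.
    always_neg = imbalance_metrics(labels, [0.1] * 8)
    print(always_neg["accuracy"], always_neg["g_mean"])  # prints 0.75 0.0
```

With w_c = 1/(2π_c), the weighted error reduces to the average of the two per-class mean losses, so each class contributes equally regardless of its size. The `__main__` example shows why balanced metrics matter here: a model that always predicts the majority class reaches 0.75 accuracy on the toy labels but a G-Mean of 0.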





The authors would like to thank the funding agencies CNPq, FAPEMIG and CAPES for their financial support.

Author information



Corresponding author

Correspondence to Yuri Sousa Aurelio.



About this article


Cite this article

Aurelio, Y.S., de Almeida, G.M., de Castro, C.L. et al. Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function. Neural Process Lett 50, 1937–1949 (2019). https://doi.org/10.1007/s11063-018-09977-1



Keywords

  • Multilayer perceptron
  • Imbalanced data
  • Classification problem
  • Back-propagation
  • Cost-sensitive function