Convolutional Neural Network-Based Classification of Histopathological Images Affected by Data Imbalance

  • Michał Koziarski
  • Bogdan Kwolek
  • Bogusław Cyganek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11264)


In this paper we experimentally evaluated the impact of data imbalance on the performance of convolutional neural networks in the histopathological image recognition task. We conducted our analysis on the Breast Cancer Histopathological Database. We considered four questions associated with data imbalance: how it affects classification performance, which strategies for dealing with imbalance are suitable for histopathological data, how the presence of imbalance affects the value of new observations, and whether sampling training data from a balanced distribution during data acquisition is beneficial if the test data remain imbalanced. The most important findings of our experimental analysis are the following: while high imbalance significantly affects the performance, for some of the metrics small imbalance. Sampling training data from a balanced distribution had a detrimental effect, and we achieved better performance by applying a dedicated strategy for dealing with imbalance. Finally, not all of the traditional strategies for dealing with imbalance translate well to the histopathological image recognition setting.
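As a rough illustration of why the choice of metric matters under imbalance, the following sketch uses toy labels rather than histopathological images. It assumes a 9:1 class ratio, a trivial classifier that always predicts the majority class, and plain random oversampling as an example of a balancing strategy; none of these are taken from the paper, and all function names are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one.

    Assumption: plain random oversampling, used here only as an illustration.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_samples.extend(xs + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class contributes equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (assumed): 90 majority-class and 10 minority-class labels.
labels = [0] * 90 + [1] * 10
samples = list(range(100))

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * len(labels)

ratio = imbalance_ratio(labels)                      # 9.0
acc = accuracy(labels, always_majority)              # 0.9
bal_acc = balanced_accuracy(labels, always_majority) # 0.5

# After oversampling, both classes have 90 samples each.
bal_samples, bal_labels = random_oversample(samples, labels)
balanced_counts = Counter(bal_labels)
```

Here plain accuracy looks high even though the minority class is never detected, while the mean of per-class recalls exposes the failure. Whether duplicating samples in this way helps on image data is an empirical question of the kind the paper investigates.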


Keywords: Convolutional neural network, Data imbalance, Histopathological image classification



This research was supported by the National Science Centre, Poland, under the grant no. 2017/27/N/ST6/01705 and the PLGrid infrastructure.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Michał Koziarski (1), corresponding author
  • Bogdan Kwolek (1)
  • Bogusław Cyganek (1)
  1. Department of Electronics, AGH University of Science and Technology, Kraków, Poland
