Convolutional Neural Network-Based Classification of Histopathological Images Affected by Data Imbalance
In this paper, we experimentally evaluated the impact of data imbalance on the performance of convolutional neural networks in the histopathological image recognition task. We conducted our analysis on the Breast Cancer Histopathological Database. We considered four questions associated with data imbalance: how it affects classification performance, which strategies for dealing with imbalance are suitable for histopathological data, how the presence of imbalance affects the value of new observations, and whether sampling training data from a balanced distribution during data acquisition is beneficial if the test data will remain imbalanced. The most important findings of our experimental analysis are the following: while high imbalance significantly affects the performance, for some of the metrics small imbalance. Sampling training data from a balanced distribution had a detrimental effect, and we achieved better performance by applying a dedicated strategy for dealing with imbalance. Finally, not all of the traditional strategies for dealing with imbalance translate well to the histopathological image recognition setting.
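As a purely illustrative sketch of what a traditional strategy for dealing with imbalance can look like, the snippet below computes an imbalance ratio and applies random oversampling to a toy label set. The function names `imbalance_ratio` and `random_oversample`, the class counts, and the choice of random oversampling are assumptions made for this example; the abstract does not say which strategies were evaluated, so this should not be read as the authors' method.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority count.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Toy stand-in for image identifiers: 8 "malignant" vs 2 "benign" (made-up counts).
samples = [f"img_{i}" for i in range(10)]
labels = ["malignant"] * 8 + ["benign"] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

Only the training split would normally be resampled this way; the test split keeps its original, imbalanced distribution so that reported metrics reflect the deployment setting.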
Keywords: Convolutional neural network · Data imbalance · Histopathological image classification
This research was supported by the National Science Centre, Poland, under the grant no. 2017/27/N/ST6/01705 and the PLGrid infrastructure.