The Unreasonable Effectiveness of the Final Batch Normalization Layer

Kocaman, Veysel; Shir, Ofer M.; Bäck, Thomas

doi:10.1007/978-3-030-90436-4_7

Veysel Kocaman ORCID: orcid.org/0000-0002-0065-6478¹⁷,
Ofer M. Shir¹⁸ &
Thomas Bäck¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 13018))

Included in the following conference series:

International Symposium on Visual Computing

938 Accesses

Abstract

Early-stage disease indications are rarely recorded in real-world domains, such as Agriculture and Healthcare, and yet, their accurate identification is critical in that point of time. In this type of highly imbalanced classification problems, which encompass complex features, deep learning (DL) is much needed because of its strong detection capabilities. At the same time, DL is observed in practice to favor majority over minority classes and consequently suffer from inaccurate detection of the targeted early-stage indications. In this work, we extend the study done by [11], showing that the final BN layer, when placed before the softmax output layer, has a considerable impact in highly imbalanced image classification problems as well as undermines the role of the softmax outputs as an uncertainty measure. This current study addresses additional hypotheses and reports on the following findings: (i) the performance gain after adding the final BN layer in highly imbalanced settings could still be achieved after removing this additional BN layer in inference; (ii) there is a certain threshold for the imbalance ratio upon which the progress gained by the final BN layer reaches its peak; (iii) the batch size also plays a role and affects the outcome of the final BN application; (iv) the impact of the BN application is also reproducible on other datasets and when utilizing much simpler neural architectures; (v) the reported BN effect occurs only per a single majority class and multiple minority classes – i.e., no improvements are evident when there are two majority classes; and finally, (vi) utilizing this BN layer with sigmoid activation has almost no impact when dealing with a strongly imbalanced image classification tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 69.99; Price excludes VAT (USA)

Softcover Book: USD 89.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Barz, B., Denzler, J.: Deep learning on small datasets without pre-training using cosine loss. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 1371–1380 (2020)
Google Scholar
Beggel, L., Pfeiffer, M., Bischl, B.: Robust anomaly detection in images using adversarial autoencoders. arXiv preprint arXiv:1901.06355 (2019)
Bjorck, N., Gomes, C.P., Selman, B., Weinberger, K.Q.: Understanding batch normalization. In: Advances in Neural Information Processing Systems, pp. 7694–7705 (2018)
Google Scholar
Chelombiev, I., Houghton, C., O’Donnell, C.: Adaptive estimators show information compression in deep neural networks. arXiv preprint arXiv:1902.09037 (2019)
Cui, Y., Jia, M., Lin, T.Y., Song, Y., Belongie, S.: Class-balanced loss based on effective number of samples. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9268–9277 (2019)
Google Scholar
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Google Scholar
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1321–1330 (2017). JMLR. org
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hussain, M., Bird, J.J., Faria, D.R.: A study on CNN transfer learning for image classification. In: Lotfi, A., Bouchachia, H., Gegov, A., Langensiepen, C., McGinnity, M. (eds.) UKCI 2018. AISC, vol. 840, pp. 191–202. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-97982-3_16
Chapter Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Kocaman, V., Shir, O.M., Bäck, T.: Improving model accuracy for imbalanced image classification tasks by adding a final batch normalization layer: An empirical study. arXiv preprint arXiv:2011.06319, Accepted to International Conference on Pattern Recognition, ICPR 2020. (2020)
Krogh, A., Hertz, J.A.: A simple weight decay can improve generalization. In: Advances in Neural Information Processing Systems, pp. 950–957 (1992)
Google Scholar
Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015)
Article MathSciNet Google Scholar
Mishkin, D., Matas, J.: All you need is a good init. arXiv preprint arXiv:1511.06422 (2015)
Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
Article Google Scholar
Müller, R., Kornblith, S., Hinton, G.E.: When does label smoothing help? In: Advances in Neural Information Processing Systems, pp. 4696–4705 (2019)
Google Scholar
Santurkar, S., Tsipras, D., Ilyas, A., Madry, A.: How does batch normalization help optimization? In: Advances in Neural Information Processing Systems, pp. 2483–2493 (2018)
Google Scholar
Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 60 (2019)
Article Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE (2017)
Google Scholar
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)

Download references

Author information

Authors and Affiliations

LIACS, Leiden University, Leiden, The Netherlands
Veysel Kocaman & Thomas Bäck
Computer Science Department, Tel-Hai College and Migal Institute, Upper Galilee, Israel
Ofer M. Shir

Authors

Veysel Kocaman
View author publications
You can also search for this author in PubMed Google Scholar
Ofer M. Shir
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Bäck
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Veysel Kocaman .

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Texas at Arlington, Arlington, TX, USA
Vassilis Athitsos
University of South Carolina, Columbia, SC, USA
Tong Yan
City University of Hong Kong, Kowloon, Hong Kong
Manfred Lau
School of Engineering and Computing, University of Durham, Durham, Durham, UK
Frederick Li
Airbnb, New York, NY, USA
Conglei Shi
Peking University, Beijing, China
Xiaoru Yuan
Purdue University, West Lafayette, IN, USA
Christos Mousas
IST, School of Modeling, Simulation, and Training, Orlando, FL, USA
Gerd Bruder

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kocaman, V., Shir, O.M., Bäck, T. (2021). The Unreasonable Effectiveness of the Final Batch Normalization Layer. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2021. Lecture Notes in Computer Science(), vol 13018. Springer, Cham. https://doi.org/10.1007/978-3-030-90436-4_7

Download citation

DOI: https://doi.org/10.1007/978-3-030-90436-4_7
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90435-7
Online ISBN: 978-3-030-90436-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics