Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

Chapter in: Bio-inspired Neurocomputing

Part of the book series: Studies in Computational Intelligence (SCI, volume 903)

Abstract

Activation functions are the primary decision-making units of neural networks. They determine the output of a network’s neural nodes and are therefore essential to the performance of the whole network. Hence, it is critical to choose the most appropriate activation function for a neural network’s computation. Acharya et al. (2018) suggest that numerous formulations have been proposed over the years, though some of them are considered deprecated these days since they are unable to operate properly under some conditions. These functions have a variety of characteristics deemed essential to successful learning, among them monotonicity, their derivatives, and whether their range is finite. This paper evaluates the commonly used activation functions, such as swish, ReLU, and sigmoid, then discusses their properties, their respective pros and cons, and recommendations for when to apply each one.
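
The closed forms of the functions named in the abstract are standard; the short NumPy sketch below (illustrative only, not code from the chapter) shows the three side by side: sigmoid(x) = 1 / (1 + e^(-x)), ReLU(x) = max(0, x), and swish(x) = x * sigmoid(beta * x), with beta = 1 as the common default.

    import numpy as np

    def sigmoid(x):
        # Logistic sigmoid: maps any real input into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # Rectified Linear Unit: zero for negative inputs, identity for positive ones.
        return np.maximum(0.0, x)

    def swish(x, beta=1.0):
        # Swish: the input scaled by sigmoid(beta * x); beta = 1.0 is the usual default.
        return x * sigmoid(beta * x)

    # Evaluate each function on a few sample points.
    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x))
    print(relu(x))
    print(swish(x))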

References

  1. Deng, L.: A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans. Signal Inf. Process. 3, e2 (2014)

  2. Hertz, J.A.: Introduction to the theory of neural computation. CRC Press (2018)

  3. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)

  4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. Comput. Vis. Pattern Recognit. (CVPR) 7 (2015)

  5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105. NIPS’12, Curran Associates Inc., USA (2012)

  6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556

  7. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR), pp. 1–17

  8. Piczak, K.J.: Recognizing bird species in audio recordings using deep convolutional neural networks. In: CLEF (Working Notes), pp. 534–543

  9. Yakopcic, C., Westberg, S., Van Esesn, B., Alom, M.Z., Taha, T.M., Asari, V.K.: The history began from AlexNet: a comprehensive survey on deep learning approaches (2018)

  10. Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV (4), volume 9908 of Lecture Notes in Computer Science, pp. 646–661. Springer (2016)

  11. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Statist. 22, 400–407 (1951)

  12. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)

  13. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Opt. 26, 1008–1031 (2016)

  14. Banerjee, A., Dubey, A., Menon, A., Nanda, S., Nandi, G.C.: Speaker recognition using deep belief networks (2018). arXiv:1805.08865

  15. Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Wechsler, H. (ed.) Neural Networks for Perception, pp. 65–93. Academic Press (1992)

  16. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient BackProp, pp. 9–50. Springer, Berlin, Heidelberg

  17. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034

  18. Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3D residual networks for action recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3154–3160

  19. Godfrey, L.B., Gashler, M.S.: A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. In: 7th International Conference on Knowledge Discovery and Information Retrieval, pp. 481–486

  20. Neal, R.M.: Connectionist learning of belief networks. Artif. Intell. 56, 71–113 (1992)

  21. Karpathy, A.: Yes you should understand backprop. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b (2016). Accessed 30 Nov 2018

  22. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural network architectures and their applications. Neurocomputing 234, 11–26 (2017)

  23. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10. IEEE

  24. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pp. 249–256. PMLR, Chia Laguna Resort, Sardinia, Italy (2010)

  25. Elliott, D.L.: A better activation function for artificial neural networks. Maryland Publishing Unit (1998)

  26. Turian, J., Bergstra, J., Bengio, Y.: Quadratic features and deep architectures for chunking. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, NAACL-Short ’09, pp. 245–248. Association for Computational Linguistics, Stroudsburg, PA, USA (2009)

  27. Gibiansky, A., Arik, S.O., Kannan, A., Narang, S., Ping, W., Peng, K., Miller, J.: Deep Voice 3: scaling text-to-speech with convolutional sequence learning. In: International Conference on Learning Representations, ICLR, pp. 1094–1099

  28. Farzad, A., Mashayekhi, H., Hassanpour, H.: A comparative performance analysis of different activation functions in LSTM networks for classification. Neural Comput. Appl. (2017)

  29. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)

  30. Hahnloser, R., Sarpeshkar, R., Mahowald, M.A., Douglas, R., Sebastian Seung, H.: Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947–951 (2000)

  31. Hahnloser, R.H.R., Seung, H.S., Slotine, J.-J.: Permitted and forbidden sets in symmetric threshold-linear networks. Neural Comput. 15, 621–638 (2003)

  32. Ping, W., Peng, K., Gibiansky, A., Arik, S.O., Kannan, A., Narang, S., Raiman, J., Miller, J.: Deep Voice 3: scaling text-to-speech with convolutional sequence learning (2017). arXiv:1710.07654

  33. Nwankpa, C.E., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions: comparison of trends in practice and research for deep learning (2018)

  34. Maas, A.L.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1–6

  35. Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18, 629–681 (2017)

  36. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision (ICCV 2015), vol. 1502 (2015)

  37. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)

  38. Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Laurent, C., Bengio, Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. In: Interspeech 2016, pp. 410–414

  39. Tóth, L.: Phone recognition with hierarchical convolutional deep maxout networks. EURASIP J. Audio Speech Music Process. 25 (2015)

  40. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Gordon, G., Dunson, D., Dudík, M. (eds.) Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp. 315–323. PMLR, Fort Lauderdale, FL, USA (2011)

  41. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions (2017). arXiv:1710.05941

  42. Zoph, B.: Swish: a self-gated activation function (2017)

  43. Sharma, J.: Experiments with Swish activation function on MNIST dataset. Medium Corporation (2017)

  44. Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10. Canadian Institute for Advanced Research (2015)

  45. Szandała, T.: Benchmarking comparison of Swish versus other activation functions on CIFAR-10 image set. In: International Conference on Dependability and Complex Systems, pp. 498–505. Springer

Author information

Correspondence to Tomasz Szandała.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Szandała, T. (2021). Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks. In: Bhoi, A., Mallick, P., Liu, CM., Balas, V. (eds) Bio-inspired Neurocomputing. Studies in Computational Intelligence, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-15-5495-7_11
