Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

Chapter in: Bio-inspired Neurocomputing

Part of the book series: Studies in Computational Intelligence (SCI, volume 903)

Abstract

Activation functions are the primary decision-making units of neural networks. They determine the output of a network’s neural nodes and are therefore essential to the performance of the whole network. Hence, it is critical to choose the most appropriate activation function for a neural network’s computation. Acharya et al. (2018) suggest that numerous formulations have been proposed over the years, though some of them are considered deprecated these days since they are unable to operate properly under some conditions. These functions have a variety of characteristics deemed essential to successful learning, among them monotonicity, their derivatives, and whether their range is finite. This paper evaluates the commonly used activation functions, such as swish, ReLU, and sigmoid, then discusses their properties, their respective pros and cons, and recommendations for when to apply each one.
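
The closed forms of the functions named in the abstract are standard; the short NumPy sketch below (illustrative only, not code from the chapter) shows the three side by side: sigmoid(x) = 1 / (1 + e^(-x)), ReLU(x) = max(0, x), and swish(x) = x * sigmoid(beta * x), with beta = 1 as the common default.

    import numpy as np

    def sigmoid(x):
        # Logistic sigmoid: maps any real input into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # Rectified Linear Unit: zero for negative inputs, identity for positive ones.
        return np.maximum(0.0, x)

    def swish(x, beta=1.0):
        # Swish: the input scaled by sigmoid(beta * x); beta = 1.0 is the usual default.
        return x * sigmoid(beta * x)

    # Evaluate each function on a few sample points.
    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x))
    print(relu(x))
    print(swish(x))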

References

  1. Deng, L.: A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans. Signal Inf. Process. 3, e2 (2014)

  2. Hertz, J.A.: Introduction to the theory of neural computation. CRC Press (2018)

  3. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)

  4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. Comput. Vis. Pattern Recognit. (CVPR) 7 (2015)

  5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105. NIPS’12, Curran Associates Inc., USA (2012)

  6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556

  7. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR), pp. 1–17

  8. Piczak, K.J.: Recognizing bird species in audio recordings using deep convolutional neural networks. In: CLEF (Working Notes), pp. 534–543

  9. Yakopcic, C., Westberg, S., Van Esesn, B., Alom, M.Z., Taha, T.M., Asari, V.K.: The history began from AlexNet: a comprehensive survey on deep learning approaches (2018)

  10. Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV (4), volume 9908 of Lecture Notes in Computer Science, pp. 646–661. Springer (2016)

  11. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Statist. 22, 400–407 (1951)

  12. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)

  13. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Opt. 26, 1008–1031 (2016)

  14. Banerjee, A., Dubey, A., Menon, A., Nanda, S., Nandi, G.C.: Speaker recognition using deep belief networks (2018). arXiv:1805.08865

  15. Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Wechsler, H. (ed.) Neural Networks for Perception, pp. 65–93. Academic Press (1992)

  16. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient BackProp, pp. 9–50. Springer, Berlin, Heidelberg

  17. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034

  18. Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3D residual networks for action recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3154–3160

  19. Godfrey, L.B., Gashler, M.S.: A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. In: 7th International Conference on Knowledge Discovery and Information Retrieval, pp. 481–486

  20. Neal, R.M.: Connectionist learning of belief networks. Artif. Intell. 56, 71–113 (1992)

  21. Karpathy, A.: Yes you should understand backprop. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b (2016). Accessed 30 Nov 2018

  22. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural network architectures and their applications. Neurocomputing 234, 11–26 (2017)

  23. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10. IEEE

  24. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pp. 249–256. PMLR, Chia Laguna Resort, Sardinia, Italy (2010)

  25. Elliott, D.L.: A better activation function for artificial neural networks. Maryland Publishing Unit (1998)

  26. Turian, J., Bergstra, J., Bengio, Y.: Quadratic features and deep architectures for chunking. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, NAACL-Short ’09, pp. 245–248. Association for Computational Linguistics, Stroudsburg, PA, USA (2009)

  27. Gibiansky, A., Arik, S.O., Kannan, A., Narang, S., Ping, W., Peng, K., Miller, J.: Deep Voice 3: scaling text-to-speech with convolutional sequence learning. In: International Conference on Learning Representations, ICLR, pp. 1094–1099

  28. Farzad, A., Mashayekhi, H., Hassanpour, H.: A comparative performance analysis of different activation functions in LSTM networks for classification. Neural Comput. Appl. (2017)

  29. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)

  30. Hahnloser, R., Sarpeshkar, R., Mahowald, M.A., Douglas, R., Sebastian Seung, H.: Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947–951 (2000)

  31. Hahnloser, R.H.R., Seung, H.S., Slotine, J.-J.: Permitted and forbidden sets in symmetric threshold-linear networks. Neural Comput. 15, 621–638 (2003)

  32. Ping, W., Peng, K., Gibiansky, A., Arik, S.O., Kannan, A., Narang, S., Raiman, J., Miller, J.: Deep Voice 3: scaling text-to-speech with convolutional sequence learning (2017). arXiv:1710.07654

  33. Nwankpa, C.E., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions: comparison of trends in practice and research for deep learning (2018)

  34. Maas, A.L.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1–6

  35. Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18, 629–681 (2017)

  36. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision (ICCV 2015), vol. 1502 (2015)

  37. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)

  38. Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Laurent, C., Bengio, Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. In: Interspeech 2016, pp. 410–414

  39. Tóth, L.: Phone recognition with hierarchical convolutional deep maxout networks. EURASIP J. Audio Speech Music Process. 25 (2015)

  40. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Gordon, G., Dunson, D., Dudík, M. (eds.) Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp. 315–323. PMLR, Fort Lauderdale, FL, USA (2011)

  41. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions (2017). arXiv:1710.05941

  42. Zoph, B.: Swish: a self-gated activation function (2017)

  43. Sharma, J.: Experiments with Swish activation function on MNIST dataset. Medium Corporation (2017)

  44. Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10. Canadian Institute for Advanced Research (2015)

  45. Szandała, T.: Benchmarking comparison of Swish versus other activation functions on CIFAR-10 image set. In: International Conference on Dependability and Complex Systems, pp. 498–505. Springer

Author information

Correspondence to Tomasz Szandała.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Szandała, T. (2021). Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks. In: Bhoi, A., Mallick, P., Liu, CM., Balas, V. (eds) Bio-inspired Neurocomputing. Studies in Computational Intelligence, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-15-5495-7_11
