Activation Functions

  • Mohit Goyal
  • Rajan Goyal
  • P. Venkatappa Reddy
  • Brejesh Lall
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 865)

Abstract

Activation functions lie at the core of deep neural networks, allowing them to learn arbitrarily complex mappings. Without any activation, a neural network will only be able to learn a linear relation between the input and the desired output. This chapter introduces the reader to why activation functions are useful and to their immense importance in making deep learning successful. It provides a detailed survey of several existing activation functions, covering their functional forms, original motivations, merits, and demerits. The chapter also discusses the domain of learnable activation functions and proposes a novel activation, ‘SLAF’, whose shape is learned during the training of a neural network. A working model for SLAF is provided, and its performance is demonstrated experimentally on the XOR and MNIST classification tasks.
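The abstract's claim that a network without activations can only represent a linear map can be checked directly. The sketch below (an illustrative NumPy example, not code from the chapter) composes two bias-carrying linear layers, collapses them into a single equivalent linear layer, and shows that inserting a nonlinearity such as ReLU between the layers breaks that collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)

x = rng.standard_normal(4)
two_layer = W2 @ (W1 @ x + b1) + b2

# The composition collapses to a single linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

assert np.allclose(two_layer, one_layer)  # depth added no expressive power

# Inserting a nonlinearity (here ReLU) between the layers breaks the collapse
relu = lambda z: np.maximum(z, 0.0)
nonlinear = W2 @ relu(W1 @ x + b1) + b2
assert not np.allclose(nonlinear, one_layer)
```

The same collapse argument extends to any depth, which is why every hidden layer of a practical network applies some nonlinearity — fixed (sigmoid, tanh, ReLU) or learnable, as with the SLAF proposed in this chapter.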

Keywords

Activation functions · Neural networks · Learning deep neural networks · Adaptive activation functions · ReLU · SLAF


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mohit Goyal 1
  • Rajan Goyal 1
  • P. Venkatappa Reddy 1, 2
  • Brejesh Lall 1
  1. Department of Electrical Engineering, Indian Institute of Technology Delhi, Delhi, India
  2. Electronics and Communication Engineering, Vignan’s Foundation for Science, Technology & Research, Guntur, India
