Abstract
Activation functions lie at the core of deep neural networks, allowing them to learn arbitrarily complex mappings. Without an activation function, a neural network can only learn a linear relation between its input and the desired output. This chapter introduces the reader to why activation functions are needed and to their importance in making deep learning successful. It then surveys several existing activation functions, covering their functional forms, original motivations, merits, and demerits. The chapter also discusses learnable activation functions and proposes a novel activation, 'SLAF', whose shape is learned during the training of a neural network. A working model of SLAF is provided, and its performance is demonstrated experimentally on the XOR and MNIST classification tasks.
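To make the idea of a shape-learnable activation concrete, the sketch below parameterizes the activation as a weighted sum of fixed basis functions whose coefficients are trained by backpropagation alongside the network weights. The polynomial basis, the coefficient count, and the module name are assumptions made purely for illustration; they are not the chapter's exact SLAF formulation.

    # Minimal sketch (illustrative assumption, not the chapter's exact SLAF definition):
    # an activation y = sum_k a_k * phi_k(x) whose coefficients a_k are learned
    # during training, here with a simple polynomial basis phi_k(x) = x**k.
    import torch
    import torch.nn as nn

    class LearnableActivation(nn.Module):
        def __init__(self, degree: int = 3):
            super().__init__()
            # One trainable coefficient per basis element; updated by backpropagation.
            self.coeffs = nn.Parameter(0.1 * torch.randn(degree + 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Evaluate the basis functions element-wise and take their weighted sum.
            basis = torch.stack([x ** k for k in range(self.coeffs.numel())], dim=-1)
            return basis @ self.coeffs

    # The activation drops into a standard feedforward network; its shape adapts
    # as the coefficients are optimized together with the layer weights.
    net = nn.Sequential(nn.Linear(2, 8), LearnableActivation(), nn.Linear(8, 1))
    output = net(torch.randn(4, 2))

Note that if all coefficients except the linear one shrink to zero, the unit reduces to a linear map, which is the degenerate case the abstract describes for a network without any activation.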
Mohit Goyal and Rajan Goyal contributed equally.
About this chapter
Cite this chapter
Goyal, M., Goyal, R., Venkatappa Reddy, P., Lall, B. (2020). Activation Functions. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Algorithms and Applications. Studies in Computational Intelligence, vol 865. Springer, Cham. https://doi.org/10.1007/978-3-030-31760-7_1
DOI: https://doi.org/10.1007/978-3-030-31760-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31759-1
Online ISBN: 978-3-030-31760-7
eBook Packages: Intelligent Technologies and Robotics (R0)