Abstract
Recent improvements in the performance of deep neural networks are partly due to the introduction of rectified linear units (ReLUs). A ReLU outputs zero for all negative inputs, which kills some neurons and shifts the mean of the outputs away from zero; this bias shift causes oscillations and impedes learning. Following the principle that "zero-mean activations improve learning ability", a softplus linear unit (SLU) is proposed as an adaptive activation function that speeds up learning and improves performance in deep convolutional neural networks. First, to reduce the bias shift, negative inputs are processed with the softplus function, and a general form of the SLU is proposed. Second, the parameters of the positive component are fixed to control vanishing gradients. Third, update rules for the parameters of the negative component are established to meet the requirements of back-propagation. Finally, deep auto-encoder networks were designed and evaluated on the MNIST dataset for unsupervised learning, and deep convolutional neural networks were designed and evaluated on the CIFAR-10 dataset for supervised learning. The experiments show that SLU-based networks converge faster and classify images more accurately than networks with rectified activation functions.
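To make the construction concrete, here is a minimal NumPy sketch of a piecewise activation of the kind the abstract describes: a fixed linear positive part and a softplus-based negative part shifted toward zero mean. The function name slu, the parameters alpha, beta, and gamma, and the couplings beta = 2*alpha and gamma = 2*alpha*log(2) are illustrative assumptions (one choice that makes the two pieces continuous and differentiable at zero), not the paper's exact formulation or its update rules.

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def slu(x, alpha=1.0):
    # Illustrative coupling (an assumption, not the paper's exact rule):
    # beta = 2*alpha and gamma = 2*alpha*log(2) make the two branches
    # meet at x = 0 with matching value and slope.
    beta = 2.0 * alpha
    gamma = 2.0 * alpha * np.log(2.0)
    # Positive inputs pass through a fixed linear term (fixed to control
    # vanishing gradients); negative inputs go through a shifted softplus,
    # pushing the mean activation toward zero and reducing the bias shift.
    return np.where(x >= 0.0, alpha * x, beta * softplus(x) - gamma)
```

In a full network, alpha would stay fixed while the negative-branch parameters are learned during back-propagation, as the abstract outlines.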
Acknowledgments
This work was supported by grants from Air Force Engineering University. The authors would like to thank all of the team members of D605 Laboratory.
Cite this article
Zhao, H., Liu, F., Li, L. et al. A novel softplus linear unit for deep convolutional neural networks. Appl Intell 48, 1707–1720 (2018). https://doi.org/10.1007/s10489-017-1028-7