Deep networks with non-static activation function
Abstract
Deep neural networks, which typically use a fixed activation function at each neuron, have achieved breakthrough performance. A fixed activation function, however, is not the optimal choice across different data distributions. To address this, this work improves deep neural networks by proposing a novel and efficient activation scheme called "Mutual Activation" (MAC). A non-static activation function is adaptively learned during the training phase of the deep network. Furthermore, the proposed activation neuron, when combined with maxout, is a potent higher-order function approximator that can break through the convex-curve limitation. Experimental results on object recognition benchmarks demonstrate the effectiveness of the proposed activation scheme.
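The abstract does not give the exact MAC formulation, so the following is only a minimal PyTorch sketch of the general idea it describes: an activation whose shape is learned jointly with the network weights (here a per-channel PReLU-style slope, assumed purely for illustration) composed with maxout units, which together yield a non-convex, higher-order response.

```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Illustrative non-static activation: a per-channel slope on the
    negative part, learned by backpropagation along with the other
    network parameters (PReLU-style; not the paper's exact MAC form)."""
    def __init__(self, num_channels):
        super().__init__()
        # one trainable slope per channel, initialized to 0.25
        self.slope = nn.Parameter(torch.full((num_channels,), 0.25))

    def forward(self, x):
        # x: (batch, channels, H, W); broadcast the slope over spatial dims
        a = self.slope.view(1, -1, 1, 1)
        return torch.where(x >= 0, x, a * x)

class MaxoutConv(nn.Module):
    """Maxout over k parallel convolution outputs, followed by the learned
    activation; the composition is a piecewise (non-convex) approximator."""
    def __init__(self, in_ch, out_ch, k=2):
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(in_ch, out_ch * k, kernel_size=3, padding=1)
        self.act = LearnableActivation(out_ch)

    def forward(self, x):
        y = self.conv(x)
        b, _, h, w = y.shape
        # maxout: element-wise max over the k linear pieces
        y = y.view(b, self.k, -1, h, w).max(dim=1).values
        return self.act(y)
```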
Keywords
Object recognition · Activation neuron · Convolution network · Feature learning
Acknowledgments
This work was partially supported by the 973 Program (Project No. 2014CB347600), the National Natural Science Foundation of China (Grant No. 61772275, 61720106004, 61672285 and 61672304) and the Natural Science Foundation of Jiangsu Province (BK20170033).