
Layer-Wise Weight Decay for Deep Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)

Abstract

In this paper, we propose layer-wise weight decay for efficient training of deep neural networks. Our method sets a different weight-decay coefficient for each layer so that the ratio between the scale of the back-propagated gradients and that of the weight decay is constant throughout the network. With this setting, we can avoid under- or over-fitting and train all layers properly without having to tune the coefficients layer by layer. Experimental results show that our method can improve the performance of existing deep neural networks without any change to the network models.
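The abstract does not give the exact per-layer formula, but the idea of equalizing the ratio of gradient scale to weight-decay scale across layers can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's implementation; the model, the base coefficient `base_wd`, and the use of a gradient-norm/weight-norm ratio as the scaling rule are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical toy model (not from the paper).
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

base_wd = 5e-4  # assumed global weight-decay coefficient
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

# One forward/backward pass to estimate per-layer gradient scales.
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Build one optimizer parameter group per parameter tensor, scaling the
# weight-decay coefficient so that (gradient scale) / (weight-decay scale)
# is roughly constant across layers. The grad-norm / weight-norm ratio used
# here is a stand-in for the paper's actual rule.
param_groups = []
for name, p in model.named_parameters():
    if p.grad is None:
        continue
    grad_scale = p.grad.norm() / (p.norm() + 1e-12)
    param_groups.append({
        "params": [p],
        "weight_decay": (base_wd * grad_scale).item(),
    })

optimizer = torch.optim.SGD(param_groups, lr=0.1, momentum=0.9)
```

Since the decay term on a weight tensor w with coefficient λ scales like λ‖w‖ while the gradient scales like ‖∇w‖, choosing λ proportional to ‖∇w‖/‖w‖ keeps the ratio of the two roughly constant layer by layer, which is the behavior the abstract describes.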

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

NEC Data Science Research Laboratories, Kawasaki, Japan
