
Effects of Skip-Connection in ResNet and Batch-Normalization on Fisher Information Matrix

  • Yasutaka Furusho
  • Kazushi Ikeda
Conference paper
Part of the Proceedings of the International Neural Networks Society book series (INNS, volume 1)

Abstract

Deep neural networks such as the multi-layer perceptron (MLP) have been studied intensively, and new techniques have been introduced for better generalization ability and faster convergence. One such technique is the skip-connection between layers in the ResNet, and another is batch normalization (BN). To clarify the effects of these techniques, we carried out a landscape analysis of the loss function for these networks. The landscape affects the convergence properties, in which the eigenvalues of the Fisher Information Matrix (FIM) play an important role. We therefore calculated the eigenvalues of the FIMs of the MLP, the ResNet, and the ResNet with BN by applying functional analysis to networks with random weights; the MLP case had previously been analyzed only in the asymptotic regime using the central limit theorem. Our results show that the eigenvalues of the MLP are independent of its depth, that those of the ResNet grow exponentially with its depth, and that those of the ResNet with BN grow sub-linearly with its depth. This implies that BN allows the ResNet to use a larger learning rate and hence to converge faster than the vanilla ResNet.
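The following is a minimal numerical sketch of the quantity discussed above, not the paper's functional-analysis derivation: it estimates the largest eigenvalue of the empirical FIM under a Gaussian-output (squared-error) model for a randomly initialized MLP, a plain ResNet, and a ResNet with BN at several depths. The width, depths, He-normal initialization, and Linear-BN-ReLU block structure are illustrative assumptions, not the paper's exact setting.

```python
# Sketch: empirical FIM spectra of randomly initialized networks.
# Assumptions (not from the paper): width 32, scalar Gaussian output,
# He-normal initialization, Linear -> (BN) -> ReLU blocks with identity skips.
import torch
import torch.nn as nn

WIDTH, N_SAMPLES = 32, 256


class Block(nn.Module):
    """One layer: Linear -> (optional BN) -> ReLU, with an optional identity skip."""

    def __init__(self, skip=False, bn=False):
        super().__init__()
        self.fc = nn.Linear(WIDTH, WIDTH)
        # momentum=None: a single training-mode pass sets the running statistics
        # exactly to the batch statistics of the input data.
        self.bn = nn.BatchNorm1d(WIDTH, momentum=None) if bn else nn.Identity()
        self.skip = skip

    def forward(self, x):
        h = torch.relu(self.bn(self.fc(x)))
        return x + h if self.skip else h


def make_net(depth, skip=False, bn=False):
    net = nn.Sequential(*[Block(skip, bn) for _ in range(depth)],
                        nn.Linear(WIDTH, 1))         # scalar output
    for m in net.modules():                          # He-normal initialization
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
            nn.init.zeros_(m.bias)
    return net


def top_fim_eigenvalue(net, x):
    """Largest eigenvalue of the empirical FIM F = (1/n) sum_i g_i g_i^T,
    where g_i is the gradient of the scalar output on sample i w.r.t. all parameters."""
    net.train()
    with torch.no_grad():
        net(x)                                       # warm up BN running statistics
    net.eval()                                       # then freeze them
    params = [p for p in net.parameters() if p.requires_grad]
    grads = []
    for xi in x:
        out = net(xi.unsqueeze(0)).squeeze()
        g = torch.autograd.grad(out, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = torch.stack(grads)                           # shape (n, n_params)
    # The nonzero eigenvalues of F = G^T G / n equal those of the Gram matrix G G^T / n.
    return torch.linalg.eigvalsh(G @ G.t() / G.shape[0]).max().item()


torch.manual_seed(0)
x = torch.randn(N_SAMPLES, WIDTH)
for depth in (2, 8, 16):
    for name, kw in [("MLP", {}), ("ResNet", {"skip": True}),
                     ("ResNet+BN", {"skip": True, "bn": True})]:
        lam = top_fim_eigenvalue(make_net(depth, **kw), x)
        print(f"depth={depth:2d}  {name:10s}  lambda_max ~ {lam:.3e}")
```

Under these assumptions, the printed values should qualitatively reflect the depth dependence stated in the abstract: roughly depth-independent eigenvalues for the MLP, eigenvalues that blow up with depth for the plain ResNet, and much slower growth once BN is added, which is why BN admits a larger learning rate.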

Keywords

ResNet · Batch normalization · Fisher Information Matrix

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers JP18J15055 and JP18K19821, and by the NAIST Big Data Project.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Nara Institute of Science and Technology, Nara, Japan
