
Mathematical analysis of finite parameter deep neural network models with skip connections from the viewpoint of representation sets

  • Original Paper
  • Published in Japan Journal of Industrial and Applied Mathematics

Abstract

In this paper, we discuss representation capability toward a systematic understanding of connection structures in deep neural networks. Deep learning is a machine learning method based on deep neural networks, and a wide variety of network structures have been proposed. Skip connections, one of the structures introduced in the ResNet model, have since become a standard architectural component. Although skip connections are a straightforward structure whose effectiveness has been demonstrated empirically, the connection structures of deep neural network models are not well understood mathematically, and proposals for model structures are not made systematically. In our approach, we study the problem from the perspective of the sets of functions represented by neural network models with finitely many parameters, and we clarify the correspondence between models with different connection structures. We show that a variety of branching connection structures, such as those represented by skip connections, can be realized by a multilayer ReLU perceptron model, i.e., a model containing only series connections.
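
As a rough illustration of this correspondence (a minimal sketch, not the construction given in the paper), the following NumPy snippet shows how a one-hidden-layer residual block y = x + V ReLU(Wx + b) + c can be rewritten as a purely series-connected ReLU layer of larger width, using the identity x = ReLU(x) - ReLU(-x); the function and variable names here are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W, b, V, c):
    """Skip-connection block: y = x + V relu(W x + b) + c."""
    return x + V @ relu(W @ x + b) + c

def series_relu_block(x, W, b, V, c):
    """The same map as a single series-connected ReLU layer of larger width.

    The identity (skip) path is carried through the hidden ReLU layer by
    splitting x into its positive and negative parts, since x = relu(x) - relu(-x).
    """
    d = x.shape[0]
    # Hidden layer: stack the original hidden features with +x and -x.
    W_tilde = np.vstack([W, np.eye(d), -np.eye(d)])
    b_tilde = np.concatenate([b, np.zeros(2 * d)])
    h = relu(W_tilde @ x + b_tilde)
    # Output layer: V recombines the features; [I, -I] reconstructs x exactly.
    V_tilde = np.hstack([V, np.eye(d), -np.eye(d)])
    return V_tilde @ h + c

# Check that both parameterizations realize the same function on random data.
rng = np.random.default_rng(0)
d, m = 3, 5
W, b = rng.standard_normal((m, d)), rng.standard_normal(m)
V, c = rng.standard_normal((d, m)), rng.standard_normal(d)
x = rng.standard_normal(d)
assert np.allclose(residual_block(x, W, b, V, c), series_relu_block(x, W, b, V, c))
```

The check at the end confirms that both parameterizations realize the same function; the series-connected model only needs its hidden width enlarged by twice the input dimension to absorb the skip path.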

Acknowledgements

This work was supported by JSPS KAKENHI JP21J12812. I would like to thank Professor Tetsuya Ishiwata, my doctoral supervisor, and everyone who generously gave their time to discuss the various issues with me.

Author information

Corresponding author

Correspondence to Jumpei Nagase.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nagase, J. Mathematical analysis of finite parameter deep neural network models with skip connections from the viewpoint of representation sets. Japan J. Indust. Appl. Math. 39, 1075–1093 (2022). https://doi.org/10.1007/s13160-022-00541-y

