Abstract
In this paper, we study the representation capability of deep neural networks in order to understand their connection structures systematically. Deep learning is a machine learning method based on deep neural networks, and a variety of network structures have been proposed. Skip connections, introduced in the ResNet model, are a simple structure that has become a standard architectural component. Although their effectiveness has been demonstrated empirically, the connection structures of deep neural network models are not well understood mathematically, and proposals for model structures remain ad hoc rather than systematic. We approach this problem from the perspective of the sets of functions represented by neural network models with finitely many parameters, and clarify the correspondence between models with different connection structures. We show that a variety of branching connection structures, including skip connections, can be realized by a multilayer ReLU perceptron, i.e., a model with only series connections.
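As a small illustration of the kind of correspondence the abstract describes (a hedged sketch, not the paper's actual construction): a residual block y = x + W2·ReLU(W1·x) can be rewritten as a plain two-layer ReLU network by widening the hidden layer so it also carries the identity, using the standard decomposition x = ReLU(x) − ReLU(−x). All matrices and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 8                            # input width, hidden width (illustrative)
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((d, h))
relu = lambda z: np.maximum(z, 0.0)

x = rng.standard_normal(d)

# Residual (skip-connection) block: y = x + W2 @ relu(W1 @ x).
y_skip = x + W2 @ relu(W1 @ x)

# Equivalent series-only MLP: widen the hidden layer with +I and -I rows so
# that relu(A @ x) = [relu(W1 x); relu(x); relu(-x)], then recombine with
# [W2, I, -I], recovering the identity via x = relu(x) - relu(-x).
I = np.eye(d)
A = np.vstack([W1, I, -I])             # first layer, shape (h + 2d, d)
B = np.hstack([W2, I, -I])             # second layer, shape (d, h + 2d)
y_mlp = B @ relu(A @ x)

print(np.allclose(y_skip, y_mlp))      # True: the two models agree exactly
```

The widened series model computes exactly the same function as the skip-connection block, at the cost of 2d extra hidden units, which is the flavor of equivalence between connection structures studied in the paper.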
Acknowledgements
This work was supported by JSPS KAKENHI JP21J12812. I would like to thank Professor Tetsuya Ishiwata, my supervisor in the doctoral course, and everyone who generously gave of their time to discuss the various issues with me.
Nagase, J. Mathematical analysis of finite parameter deep neural network models with skip connections from the viewpoint of representation sets. Japan J. Indust. Appl. Math. 39, 1075–1093 (2022). https://doi.org/10.1007/s13160-022-00541-y