Abstract
The efficiency of deep convolutional neural networks (DCNNs) has been demonstrated empirically in many practical applications. In this paper, we establish a theory for approximating functions from Korobov spaces by DCNNs. It rigorously verifies the efficiency of DCNNs in approximating functions of many variables possessing certain variable structures, and their ability to overcome the curse of dimensionality.
Acknowledgements
The work described in this paper is supported partially by the Laboratory for AI-Powered Financial Technologies, the Research Grants Council of Hong Kong (Projects # C1013-21GF and #11308121), the Germany/Hong Kong Joint Research Scheme (Project No. G-CityU101/20), NSFC/RGC Joint Research Scheme (RGC Project No. N_CityU102/20 and NSFC Project No. 12061160462), and Hong Kong Institute for Data Science.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Ethics declarations
The authors have no relevant financial or non-financial interests to disclose other than the sponsorships stated in the Acknowledgements section.
Additional information
Communicated by: Rachel Ward
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of Lemma 2
In this appendix, we prove Lemma 2.
Proof of Lemma 2
We show how the iterations of tooth functions can be realized by DCNNs.
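Before carrying out the construction, it may help to recall the object being iterated. The following is a minimal Python sketch (not the construction of the proof itself) of the standard piecewise-linear tooth function written with ReLU units, together with its self-compositions; the names relu, tooth, and iterated_tooth are illustrative, and the exact parametrization used in the paper may differ.

import numpy as np

def relu(x):
    """ReLU activation, the activation function used throughout."""
    return np.maximum(x, 0.0)

def tooth(x):
    """Standard 'tooth' (hat) function on [0, 1]: rises linearly from 0 to 1
    on [0, 1/2] and falls back to 0 on [1/2, 1], written as a combination
    of three ReLU units."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def iterated_tooth(x, V):
    """V-fold composition of the tooth function; the V-th iterate has
    2^(V-1) teeth on [0, 1]."""
    y = x
    for _ in range(V):
        y = tooth(y)
    return y

# Example: the 2-fold iterate oscillates twice on [0, 1].
xs = np.linspace(0.0, 1.0, 9)
print(iterated_tooth(xs, 2))   # [0, 0.5, 1, 0.5, 0, 0.5, 1, 0.5, 0]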
For the first step, we take in Lemma 1 \(L_{1}=L_{1}\), \(m=L\), \(L_{2}=L_{2}\), \(M=1\), \(W\) represented by
B by \([B]_{1}^{8L}=\boldsymbol {0}_{8L}\), and \(\check z=y\). Then we conclude there exist filters \(\{w^{(j)}\}_{j=1}^{K_{0}}\) and biases \(\{b^{(j)}\}_{j=1}^{K_{0}}\) satisfying (4.2) such that
where \(K_{0}\leq \left \lceil \frac {7L}{s-1}\right \rceil \) and \(n_{0}=L_{2}+K_{0}s-7L\).
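For readers less familiar with this convolutional framework, the following Python sketch shows a single layer of the kind used above, assuming the Toeplitz-type convolution structure of the DCNN framework that the paper builds on: a filter \(w=(w_{0},\dots ,w_{s})\) acts through a \((d+s)\times d\) matrix, so each layer increases the width by \(s\); this is consistent with widths of the form \(L_{2}+(\text {number of layers})\,s-\text {const}\) appearing in the proof. The exact indexing conventions and the names conv_matrix and conv_layer are illustrative only.

import numpy as np

def conv_matrix(w, d):
    """Toeplitz-type matrix T^w of a filter w = (w_0, ..., w_s):
    a (d + s) x d matrix with entries (T^w)_{i,k} = w_{i-k}."""
    s = len(w) - 1
    T = np.zeros((d + s, d))
    for i in range(d + s):
        for k in range(d):
            if 0 <= i - k <= s:
                T[i, k] = w[i - k]
    return T

def conv_layer(v, w, b):
    """One convolutional layer v -> ReLU(T^w v - b) with bias vector b;
    the output width exceeds the input width by s."""
    T = conv_matrix(w, len(v))
    return np.maximum(T @ v - b, 0.0)

# Example: a width-L input passed through one layer with filter length s = 2.
L, s = 4, 2
v = np.arange(1.0, L + 1.0)        # input of width L
w = np.array([1.0, -1.0, 0.5])     # filter (w_0, ..., w_s) with s + 1 entries
b = np.zeros(L + s)                # the output has width L + s
print(conv_layer(v, w, b))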
For the \((u+1)\)-th step, we assume the \(K_{u}\)-th layer has the form
Notice \(h^{(K_{0})}\) already has this form (see Definition 3).
Now, applying Lemma 1 with \(L_{1}=L_{1,u}\), \(m=8L\), \(L_{2}=L_{2,u}\), \(M=1\), \(W\) represented by
B by
and
we find there exist filters \(\{w^{(j)}\}_{j=K_{u}+1}^{K_{u+1,1}}\) and biases \(\{b^{(j)}\}_{j=K_{u}+1}^{K_{u+1,1}}\) satisfying the restriction (4.2) such that
where \(n_{u+1,1}=L_{2,u}+\left (K_{u+1,1}-K_{u}\right )s-2L\) and the number of layers \(K_{u+1,1}-K_{u}\) is bounded by \(\left \lceil \frac {2L}{s-1}\right \rceil \).
Again, appealing to Lemma 1 with \(L_{1}=L_{1,u}+2L\), \(m=8L\), \(L_{2}=n_{u+1,1}\), \(M=2\), \(W\) represented by
B by
and
we find there exist filters \(\{w^{(j)}\}_{j=K_{u+1,1}+1}^{K_{u+1,2}}\), biases \(\{b^{(j)}\}_{j=K_{u+1,1}+1}^{K_{u+1,2}}\) satisfying the restriction (4.2) such that
where \(n_{u+1,2}=n_{u+1,1}+\left (K_{u+1,2}-K_{u+1,1}\right )s-2L\) and the number of layers \(K_{u+1,2}-K_{u+1,1}\) is bounded by \(\left \lceil \frac {2L}{s-1}\right \rceil \).
Again we can deduce from putting \(L_{1}=L_{1,u}+2L\), \(m=8L\), \(L_{2}=n_{u+1,2}+2L\), \(M=2\), \(W\) represented by
B by
and
in Lemma 1 that there exist filters \(\{w^{(j)}\}_{j=K_{u+1,2}+1}^{K_{u+1,3}}\) and biases \(\{b^{(j)}\}_{j=K_{u+1,2}+1}^{K_{u+1,3}}\) satisfying the restriction (4.2) such that
where \(n_{u+1,3}=n_{u+1,2}+\left (K_{u+1,3}-K_{u+1,2}\right )s-3L\) and the number of layers \(K_{u+1,3}-K_{u+1,2}\) is bounded by \(\left \lceil \frac {3L}{s-1}\right \rceil \).
Let \(K_{u+1}=K_{u+1,3}\), \(L_{1,u+1}=L_{1,u}+2L\), and \(L_{2,u+1}=n_{u+1,3}+3L\). This is exactly the form (6.1). By repeating this process \(V\) times, from the input \(y\) we obtain
To realize (4.5), we only need to construct \(y-R_{V}(y)\) by a deep CNN. Applying Lemma 1 with \(L_{1}=L_{1,V}\), \(m=8L\), \(L_{2}=L_{2,V}\), \(M=1\), \(W\) represented by
B by
and
we see that there exist filters \(\{w^{(j)}\}_{j=K_{V}+1}^{K}\), biases \(\{b^{(j)}\}_{j=K_{V}+1}^{K}\) satisfying the restriction (4.2) such that
where \(n_{V+1}=L_{2,V}+\left (K-K_{V}\right )s-8L\) and the number of layers \(K-K_{V}\) is bounded by \(\left \lceil \frac {8L}{s-1}\right \rceil \). This is exactly (4.5) with \(L_{V}=L_{1,V}\) and \(L_{V}^{\prime }=n_{V+1}\).
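The depth count carried out in the remainder of the proof simply adds the per-step bounds collected above. As a sanity check, here is a small Python sketch of this tally; the parameter values in the example are purely illustrative, and the "+1" for the first step follows the count given below.

import math

def depth_bound(L, s, V):
    """Total depth bound obtained by summing the per-step bounds of the proof:
    the first step, V repetitions of the three sub-steps, and the final step."""
    first = math.ceil(7 * L / (s - 1)) + 1
    per_iteration = (math.ceil(2 * L / (s - 1))
                     + math.ceil(2 * L / (s - 1))
                     + math.ceil(3 * L / (s - 1)))
    last = math.ceil(8 * L / (s - 1))
    return first + V * per_iteration + last

# Example with illustrative values L = 4, s = 2, V = 3.
print(depth_bound(4, 2, 3))   # 29 + 3 * 28 + 32 = 145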
We finally count the depth \(K\) and the number of free parameters \(\mathcal N\). At the first step, \(K_{0}\leq \left \lceil \frac {7L}{s-1}\right \rceil +1\). At the \((u+1)\)-th step, \(K_{u+1,1}-K_{u}\leq \left \lceil \frac {2L}{s-1}\right \rceil \), \(K_{u+1,2}-K_{u+1,1}\leq \left \lceil \frac {2L}{s-1}\right \rceil \), and \(K_{u+1,3}-K_{u+1,2}\leq \left \lceil \frac {3L}{s-1}\right \rceil \). At the last step, \(K-K_{V}\leq \left \lceil \frac {8L}{s-1}\right \rceil \). Therefore,
The dimension of each bias \(b^{(j)}\) is bounded by \(\dim (b^{(j)})\leq d_{K}\leq L+Ks\). Then the number of free parameters in these biases \(b^{(K_{0})}\), \(b^{(K)}\), \(b^{(K_{u+1,1})}\), \(b^{(K_{u+1,2})}\), \(b^{(K_{u+1,3})}\), \(u=1,\dots ,V\), satisfies:
Together with the number of free parameters in the other layers, we have
This completes the proof of Lemma 2. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mao, T., Zhou, DX. Approximation of functions from Korobov spaces by deep convolutional neural networks. Adv Comput Math 48, 84 (2022). https://doi.org/10.1007/s10444-022-09991-x