Abstract
In this paper, we investigate solving elliptical multiple eigenvalue (EME) problems using feedforward neural networks. First, we propose a general formulation for computing EMEs based on penalized variational forms of elliptical eigenvalue problems. Next, we solve the penalized variational form using the Deep Ritz Method. We establish an upper bound on the error between the estimated eigenvalues and the true ones in terms of the depth \(\mathcal {D}\) and width \(\mathcal {W}\) of the neural network and the training sample size n. By exploiting the regularity of the EME and selecting an appropriate depth \(\mathcal {D}\) and width \(\mathcal {W}\), we show that the desired bound enjoys a convergence rate of \(O(1/n^{16})\), which circumvents the curse of dimensionality. We also present several high-dimensional simulation results to illustrate the effectiveness of the proposed method and to support our theoretical findings.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author, Professor Lu, upon reasonable request and in accordance with institutional and ethical guidelines for data sharing.
References
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (2009)
Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. J. Mach. Learn. Res. 20(1), 2285–2301 (2019)
Berezin, F.A., Shubin, M.: The Schrödinger Equation, vol. 66. Springer Science & Business Media, Berlin (2012)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press (2013)
Chen, F., Huang, J., Wang, C., Yang, H.: Friedrichs learning: Weak solutions of partial differential equations via deep learning. SIAM J. Sci. Comput. 45(3), A1271–A1299 (2023)
Courtade, T.A.: Bounds on the Poincaré constant for convolution measures. Ann. Inst. Henri Poincaré Probab. Stat. 56 (2020)
Duan, C., Jiao, Y., Lai, Y., Li, D., Lu, X., Yang, J.Z.: Convergence rate analysis for deep Ritz method. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
Duan, C., Jiao, Y., Lai, Y., Lu, X., Quan, Q., Yang, J.Z.: Deep Ritz methods for Laplace equations with Dirichlet boundary condition. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
Dudley, R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1(3), 290–330 (1967)
Evans, L.C.: Partial Differential Equations, vol. 19, 2nd edn., pp. 355–360. American Mathematical Society (2010)
Fefferman, C.L.: A sharp form of Whitney’s extension theorem. Ann. Math. 161(1), 509–577 (2005)
Benci, V., Fortunato, D.: An eigenvalue problem for the Schrödinger–Maxwell equations. Topol. Methods Nonlinear Anal. 11, 283–293 (1998)
Gu, Y., Yang, H., Zhou, C.: SelectNet: self-paced learning for high-dimensional partial differential equations. J. Comput. Phys. 441, 110444 (2021)
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929 (2019)
Han, J., Lu, J., Zhou, M.: Solving high-dimensional eigenvalue problems using deep neural networks: a diffusion Monte Carlo like approach. J. Comput. Phys. 423, 109792 (2020)
Hermann, J., Schätzle, Z., Noé, F.: Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12(10), 891–897 (2020)
Hon, S., Yang, H.: Simultaneous neural network approximations in Sobolev spaces (2021)
Jagtap, A.D., Karniadakis, G.E.: Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. In: AAAI Spring Symposium: MLPS, pp. 2002–2041 (2021)
Johnson, O.: Convergence of the Poincaré constant. Theory Probab. Appl. 48(3), 535–541 (2004)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, Berlin (2013)
Li, P., Yau, S.-T.: On the Schrödinger equation and the eigenvalue problem. Commun. Math. Phys. 88(3), 309–318 (1983)
Li, H., Ying, L.: A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks. J. Comput. Phys. 453, 110939 (2022)
Lu, Y., Lu, J., Wang, M.: A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equations. In: Conference on Learning Theory, pp. 3196–3241. PMLR (2021)
Lu, J., Lu, Y.: A priori generalization error analysis of two-layer neural networks for solving high dimensional Schrödinger eigenvalue problems. Commun. Am. Math. Soc. 2(01), 1–21 (2022)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Maury, B.: Numerical analysis of a finite element/volume penalty method. SIAM J. Numer. Anal. 47(2), 1126–1148 (2009)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 42(2), 981–1022 (2022)
Müller, J., Zeinhofer, M.: Error estimates for the variational training of neural networks with boundary penalty. arXiv:2103.01007 (2021)
Pfau, D., Spencer, J.S., Matthews, A.G.D.G., Foulkes, W.M.C.: Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2(3), 033429 (2020)
Raissi, M., Perdikaris, P., Karniadakis, G. E.: Physics informed deep learning (part i): data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561 (2017)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Zang, Y., Bao, G., Ye, X., Zhou, H.: Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys. 411, 109409 (2020)
Acknowledgements
We would like to thank the anonymous referees and the associate editor for their useful comments and suggestions, which have led to considerable improvements in the paper.
Funding
This study was supported by the National Key Research and Development Program of China (Grant No. 2020YFA0714200), the National Natural Science Foundation of China (Grant Nos. 12371424, 12371441, and 12371389), the Beijing Natural Science Foundation (Grant No. Z200003), and the Fundamental Research Funds for the Central Universities. The numerical calculations in this article were performed on the supercomputing system at the Supercomputing Center of Wuhan University.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 4.1
To prove this lemma, we split it into two parts, each of which follows from one of two facts. The first fact is that Rademacher complexity is inherited, up to the Lipschitz constant, under composition with a Lipschitz continuous function; this fact yields the last four inequalities:
Lemma 6.1
Suppose that \(\psi :{\mathbb {R}}^{d}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\), \((x,y)\mapsto \psi (x,y)\) is \(\ell \)-Lipschitz continuous in y for all x. Let \({\mathcal {N}}\) be a class of functions on \(\Omega \) and \(\psi \circ {\mathcal {N}}=\{\psi \circ u:x\mapsto \psi (x,u(x)),u\in {\mathcal {N}}\}\). Then
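(in the form commonly used in this setting, writing \(\mathfrak {R}_{n}(\cdot )\) for the Rademacher complexity associated with Definition 4.1, and up to the precise constant, which is not essential for what follows) \(\mathfrak {R}_{n}(\psi \circ {\mathcal {N}})\le 2\,\ell \,\mathfrak {R}_{n}({\mathcal {N}}).\)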
For the proof of this statement, we refer to Corollary 3.17 in [21]. Therefore, for the second term, we have:
since the Lipschitz constant is \(2{\mathcal {B}}^{2}\) in this case. The bound for the third term is obtained in the same way, with the Lipschitz constant being \(2{\mathcal {B}}\).
As for the last two terms, we use the following inequality:
They can then be obtained in the same manner.
The first term needs to be treated separately, since the \(\nabla \) operator is not Lipschitz continuous. The corresponding bound is a direct consequence of the following claim:
Claim: Let u be a function implemented by a \(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}\) and width \({\mathcal {W}}\). Then \(\Vert \nabla u\Vert _2^2\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}+3\) and width \(d\left( {\mathcal {D}}+2\right) {\mathcal {W}}\).
Denote \(\textrm{ReLU}\) and \(\textrm{ReLU}^{2}\) by \(\sigma _1\) and \(\sigma _2\), respectively. As long as we show that each partial derivative \(D_iu\ (i=1,2,\ldots ,d)\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network, we can easily obtain the desired network, since \(\Vert \nabla u\Vert _2^2=\sum _{i=1}^{d}\left| D_i u\right| ^2\) and the square function can be implemented as \(x^2=\sigma _2(x)+\sigma _2(-x)\).
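These elementary identities are easy to check numerically. The following NumPy snippet is a minimal sketch (not part of the formal argument; relu and relu2 are stand-ins for \(\sigma _1\) and \(\sigma _2\)) verifying the square identity, the derivative relation \(\sigma _2^{\prime }(x)=2\sigma _1(x)\) used in the construction below, and the polarization identity that allows products to be realized with \(\sigma _2\) units.

  import numpy as np

  def relu(x):    # sigma_1
      return np.maximum(x, 0.0)

  def relu2(x):   # sigma_2 = ReLU^2
      return np.maximum(x, 0.0) ** 2

  x = np.linspace(-3.0, 3.0, 1001)
  y = np.linspace(-2.0, 4.0, 1001)

  # Square identity: x^2 = sigma_2(x) + sigma_2(-x)
  assert np.allclose(x ** 2, relu2(x) + relu2(-x))

  # Derivative relation sigma_2'(x) = 2 sigma_1(x), checked by central differences
  h = 1e-6
  fd = (relu2(x + h) - relu2(x - h)) / (2.0 * h)
  assert np.allclose(fd, 2.0 * relu(x), atol=1e-4)

  # Polarization identity: xy = ((x+y)^2 - (x-y)^2)/4, with squares realized by sigma_2
  prod = 0.25 * (relu2(x + y) + relu2(-x - y) - relu2(x - y) - relu2(y - x))
  assert np.allclose(prod, x * y)

  print("ReLU / ReLU^2 identities verified")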
Now we show that for any \(i=1,2,\ldots ,d\), \(D_iu\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network. We treat the first two layers in detail, since they differ slightly from the remaining ones, and apply induction for layers \(k\ge 3\). For the first layer, since \(\sigma _2^{\prime }(x)=2\sigma _1(x)\), we have for any \(q=1,2,\ldots ,n_1\)
Hence \(D_iu_q^{(1)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with depth 2 and width 1. For the second layer,
Since \(\sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \) and \(\sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}\) can be implemented by \(\textrm{ReLU}-\textrm{ReLU}^{2}\) subnetworks, respectively, and the multiplication can also be implemented by
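such a subnetwork (one standard realization, consistent with the width bound \(\max \{2{\mathcal {W}},4\}\) recorded below, is the polarization identity \(xy=\frac{1}{4}\left[ \sigma _2(x+y)+\sigma _2(-x-y)-\sigma _2(x-y)-\sigma _2(y-x)\right] \), which uses four \(\sigma _2\) units).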
We conclude that \(D_iu_q^{(2)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network. We have
and
Thus \({\mathcal {D}}\left( D_iu_q^{(2)}\right) =4,\) \({\mathcal {W}}\left( D_iu_q^{(2)}\right) \le \max \{2{\mathcal {W}},4\}\).
Now we apply induction for layers \(k\ge 3\). For the third layer,
Since
and
we conclude that \(D_iu_q^{(3)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and
We assume that \(D_iu_q^{(k)}\ (q=1,2,\ldots ,n_k)\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with \({\mathcal {D}}\left( D_iu_q^{(k)}\right) =k+2\) and \({\mathcal {W}}\left( D_iu_q^{(k)}\right) \le (k+2){\mathcal {W}}\). For the \((k+1)\)-th layer,
Since
and
we conclude that \(D_iu_q^{(k+1)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and \({\mathcal {D}}\left( D_iu_q^{(k+1)}\right) =k+3\), \({\mathcal {W}}\left( D_iu_q^{(k+1)}\right) \le \max \{(k+3){\mathcal {W}},4\}=(k+3){\mathcal {W}}\).
Hence we derive that \(D_iu=D_iu_1^{({\mathcal {D}})}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and \({\mathcal {D}}\left( D_iu\right) ={\mathcal {D}}+2\), \({\mathcal {W}}\left( D_iu\right) \le \left( {\mathcal {D}}+2\right) {\mathcal {W}}\). Finally, we obtain:
\(\square \)
Proof of Lemma 4.2
First, we introduce Massart’s finite class lemma whose proof can be found in [4].
Lemma 6.2
(Massart’s finite class lemma [4]) For any finite set \(V\subset {\mathbb {R}}^{n}\) with diameter \(D=\sup _{v\in V}\Vert v\Vert _{2}\), we have
where \(\sigma _{i}\) and \(\Sigma _{n}\) are the Rademacher variables defined the same as in Definition 4.1.
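In the empirical-average normalization assumed here (with the factor \(1/n\) as in Definition 4.1), this bound takes the standard form \(\mathbb {E}_{\Sigma _{n}}\left[ \sup _{v \in V} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} v_{i}\right] \le \frac{D\sqrt{2\log |V|}}{n}.\)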
Then we apply the chaining method. Set \(\varepsilon _{k}=2^{-k+1} B\). Let \({\mathcal {F}}_{k}\) be an \(\varepsilon _{k}\)-cover of \({\mathcal {F}}\) with \(\left| {\mathcal {F}}_{k}\right| ={\mathcal {C}}\left( \varepsilon _{k}, {\mathcal {F}},\Vert \cdot \Vert _{\infty }\right) \). Hence for any \(u \in {\mathcal {F}}\), there exists \(u_{k} \in {\mathcal {F}}_{k}\) such that \(\left\| u-u_{k}\right\| _{\infty } \le \varepsilon _{k}\). Let K be a positive integer to be determined later. We have
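the standard chaining bound (a sketch, stated in the empirical normalization of Definition 4.1 and in the term ordering referred to below, using \(u=(u-u_{K+1})+\sum _{j=1}^{K}(u_{j+1}-u_{j})+u_{1}\)): \(\mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} u\left( Z_{i}\right) \right] \le \mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i}\left( u-u_{K+1}\right) \left( Z_{i}\right) \right] +\sum _{j=1}^{K} \mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i}\left( u_{j+1}-u_{j}\right) \left( Z_{i}\right) \right] +\mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} u_{1}\left( Z_{i}\right) \right] .\)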
Since \(0 \in {\mathcal {F}}\), we can choose \({\mathcal {F}}_{1}=\{0\}\) to eliminate the third term. For the first term,
For the second term, defining \(v_{i}^{j}=u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \), and applying Lemma 6.2, we have
By the definition of \(V_{j}\), we know that \(\left| V_{j}\right| \le \left| {\mathcal {F}}_{j}\right| \left| {\mathcal {F}}_{j+1}\right| \le \left| {\mathcal {F}}_{j+1}\right| ^{2}\) and
Hence
Now we obtain
We conclude the lemma by choosing K such that \(\varepsilon _{K+2}\!<\!\delta \!\le \!\varepsilon _{K+1}\) for any \(0\!<\!\delta \!<\!\frac{B}{2}.\) \(\square \)
Proof of Lemma 4.4
The sketch of the proof is as follows: first, pseudo-shattering and the VC-dimension are introduced, the latter serving as a lower bound for the Pdim; then, the VC-dimension for classes defined by polynomials is estimated through a lemma in [1]; based on these results, the proof is completed by a deduction similar to that of Theorem 6 in [2].
We first recall the definitions of shattering and the VC-dimension.
Definition 6.1
Let \({\mathcal {N}}\) be a set of functions from \(X=\Omega \) (or \(\partial \Omega \)) to \(\{0,1\}\). Suppose that \(S=\left\{ x_{1}, x_{2}, \ldots , x_{n}\right\} \subset X\). We say that S is shattered by \({\mathcal {N}}\) if for any \(b \in \{0,1\}^{n}\), there exists a \(u \in {\mathcal {N}}\) satisfying
Definition 6.2
The VC-dimension of \({\mathcal {N}}\) denoted as \({\text {VCdim}}({\mathcal {N}})\), is defined to be the maximum cardinality among all sets shattered by \({\mathcal {N}}\).
Lemma 6.3 is introduced to estimate the number of sign patterns generated by polynomials; the proof can be found in Theorem 8.3 in [1].
Lemma 6.3
Let \(p_1,\ldots ,p_m\) be polynomials with n variables of degree at most d. If \(n\le m\), then
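(in the standard form of this result, stated here for completeness) the number of sign patterns generated by \(p_1,\ldots ,p_m\) is bounded as \(\left| \left\{ \left( {\text {sign}}\left( p_{1}(a)\right) ,\ldots ,{\text {sign}}\left( p_{m}(a)\right) \right) : a\in {\mathbb {R}}^{n}\right\} \right| \le 2\left( \frac{2emd}{n}\right) ^{n}.\)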
The argument follows from the proof of Theorem 6 in [2]. The result stated here is somewhat stronger than Theorem 6 in [2] since \(\textrm{VCdim}({\text {sign}}({\mathcal {N}}))\le \textrm{Pdim}({\mathcal {N}})\).
We consider a new set of functions:
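(following the construction in [2]; the definition below is one standard choice) \(\mathcal {{\widetilde{N}}}:=\left\{ (x,y)\mapsto {\text {sign}}\left( u(x)-y\right) : u\in {\mathcal {N}}\right\} ,\) viewed as a class of binary-valued functions on \(X\times {\mathbb {R}}\).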
It is clear that \(\textrm{Pdim}({\mathcal {N}})\le \textrm{VCdim}(\mathcal {{\widetilde{N}}})\). We now bound the VC-dimension of \(\mathcal {{\widetilde{N}}}\). Denoting by \({\mathcal {M}}\) the total number of parameters (weights and biases) in the neural networks implementing functions in \({\mathcal {N}}\), in our case we want to derive a uniform bound for
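the quantity \(K_{\{x_i\},\{y_i\}}(m)\), which we take to be the sign-pattern count \(K_{\{x_i\},\{y_i\}}(m)=\left| \left\{ \left( {\text {sign}}\left( u(x_1,a)-y_1\right) ,\ldots ,{\text {sign}}\left( u(x_m,a)-y_m\right) \right) : a\in {\mathbb {R}}^{{\mathcal {M}}}\right\} \right| \) (here \(u(\cdot ,a)\) denotes the network realized by the parameter vector \(a\)),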
over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\). In fact, the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\).
To apply Lemma 6.3, we partition the parameter space \({\mathbb {R}}^{{\mathcal {M}}}\) into several subsets to ensure that in each subset \(u(x_i,a)-y_i\) is a polynomial with respect to a without any breakpoints. In fact, our partition is the same as the partition in [2]. Denote the partition as \(\{P_1,P_2,\ldots ,P_N\}\) with some integer N satisfying
where \(k_i\) and \({\mathcal {M}}_i\) denote, respectively, the number of units in the i-th layer and the total number of parameters of the units’ inputs in all layers up to layer i of the neural networks implementing functions in \({\mathcal {N}}\). For more details on the construction of this partition, we refer to [2]. Obviously, we have
Note that \(u(x_i,a)-y_i\) is a polynomial with respect to a whose degree equals that of \(u(x_i,a)\), namely \(1 + ({\mathcal {D}}-1)2^{{\mathcal {D}}-1}\), as shown in [2]. Hence by Lemma 6.3, we have
Combining (23), (24), (25) yields
We then have
since the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\). After some algebra, as in the proof of Theorem 6 in [2], we obtain
where \({\mathcal {U}}\) refers to the number of units of the neural networks implementing functions in \({\mathcal {N}}\).
\(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ji, X., Jiao, Y., Lu, X. et al. Deep Ritz Method for Elliptical Multiple Eigenvalue Problems. J Sci Comput 98, 48 (2024). https://doi.org/10.1007/s10915-023-02443-8
DOI: https://doi.org/10.1007/s10915-023-02443-8