
Deep Ritz Method for Elliptical Multiple Eigenvalue Problems


Abstract

In this paper, we investigate solving the elliptical multiple eigenvalue (EME) problems using a feedforward neural network. First, we propose a general formulation for computing EME based on penalized variational forms of elliptical eigenvalue problems. Next, we solve the penalized variational form using the Deep Ritz Method. We establish an upper bound on the error between the estimated eigenvalues and the true ones in terms of the depth \(\mathcal {D}\) and width \(\mathcal {W}\) of the neural network and the training sample size n. By exploiting the regularity of the EME and selecting an appropriate depth \(\mathcal {D}\) and width \(\mathcal {W}\), we show that the desired bound enjoys a convergence rate of \(O(1/n^{1/6})\), which circumvents the curse of dimensionality. We also present several high-dimensional simulation results to illustrate the effectiveness of the proposed method and support our theoretical findings.
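To give a concrete picture of the approach summarized above, the following is a minimal, illustrative PyTorch-style sketch of a penalized Deep Ritz training loop for a single eigenpair on the unit cube. The energy functional, penalty weights, sampling scheme, and network architecture below are assumptions made only for illustration; they are not the formulation or hyperparameters analyzed in the paper.

```python
# Illustrative sketch only: a penalized Deep Ritz-style training loop for the
# lowest Dirichlet eigenpair of -Δu + w u = λ u on Ω = (0,1)^d.
# Functional, penalty weights, sampler, and network sizes are assumptions.
import torch

d = 5                                   # spatial dimension
net = torch.nn.Sequential(              # small fully connected network u_θ
    torch.nn.Linear(d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
beta_norm, beta_bdry = 100.0, 500.0     # penalty weights (hypothetical)

def w(x):                               # potential term, here w ≡ 1
    return torch.ones(x.shape[0], 1)

for step in range(2000):
    x = torch.rand(1024, d, requires_grad=True)         # interior samples in (0,1)^d
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    energy = (grad_u.pow(2).sum(dim=1, keepdim=True) + w(x) * u.pow(2)).mean()
    norm_pen = (u.pow(2).mean() - 1.0).pow(2)            # enforce ∫ u^2 ≈ 1

    xb = torch.rand(1024, d)                             # boundary samples: project one
    j = torch.randint(0, d, (1024,))                     # coordinate to a random face
    xb[torch.arange(1024), j] = torch.randint(0, 2, (1024,)).float()
    bdry_pen = net(xb).pow(2).mean()                     # enforce u ≈ 0 on ∂Ω

    loss = energy + beta_norm * norm_pen + beta_bdry * bdry_pen
    opt.zero_grad(); loss.backward(); opt.step()

# After training, the Rayleigh quotient of u_θ approximates the smallest eigenvalue.
```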


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

  1. Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (2009)
  2. Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. J. Mach. Learn. Res. 20(1), 2285–2301 (2019)
  3. Berezin, F.A., Shubin, M.: The Schrödinger Equation, vol. 66. Springer Science & Business Media, Berlin (2012)
  4. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
  5. Chen, F., Huang, J., Wang, C., Yang, H.: Friedrichs learning: weak solutions of partial differential equations via deep learning. SIAM J. Sci. Comput. 45(3), A1271–A1299 (2023)
  6. Courtade, T.A.: Bounds on the Poincaré constant for convolution measures. Ann. Inst. Henri Poincaré Probab. Stat. 56 (2020)
  7. Duan, C., Jiao, Y., Lai, Y., Li, D., Lu, X., Yang, J.Z.: Convergence rate analysis for deep Ritz method. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
  8. Duan, C., Jiao, Y., Lai, Y., Lu, X., Quan, Q., Yang, J.Z.: Deep Ritz methods for Laplace equations with Dirichlet boundary condition. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
  9. Dudley, R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1(3), 290–330 (1967)
  10. Evans, L.C.: Partial Differential Equations, 2nd edn., vol. 19, pp. 355–360. American Mathematical Society, Providence (2010)
  11. Fefferman, C.L.: A sharp form of Whitney's extension theorem. Ann. Math. 161(1), 509–577 (2005)
  12. Benci, V., Fortunato, D.: An eigenvalue problem for the Schrödinger-Maxwell equations. Topol. Methods Nonlinear Anal. 11, 283–293 (1998)
  13. Gu, Y., Yang, H., Zhou, C.: SelectNet: self-paced learning for high-dimensional partial differential equations. J. Comput. Phys. 441, 110444 (2021)
  14. Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
  15. Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929 (2019)
  16. Han, J., Lu, J., Zhou, M.: Solving high-dimensional eigenvalue problems using deep neural networks: a diffusion Monte Carlo like approach. J. Comput. Phys. 423, 109792 (2020)
  17. Hermann, J., Schätzle, Z., Noé, F.: Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12(10), 891–897 (2020)
  18. Hon, S., Yang, H.: Simultaneous neural network approximations in Sobolev spaces (2021)
  19. Jagtap, A.D., Karniadakis, G.E.: Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. In: AAAI Spring Symposium: MLPS, pp. 2002–2041 (2021)
  20. Johnson, O.: Convergence of the Poincaré constant. Theory Probab. Appl. 48(3), 535–541 (2004)
  21. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, Berlin (2013)
  22. Li, P., Yau, S.-T.: On the Schrödinger equation and the eigenvalue problem. Commun. Math. Phys. 88(3), 309–318 (1983)
  23. Li, H., Ying, L.: A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks. J. Comput. Phys. 453, 110939 (2022)
  24. Lu, Y., Lu, J., Wang, M.: A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equations. In: Conference on Learning Theory, pp. 3196–3241. PMLR (2021)
  25. Lu, J., Lu, Y.: A priori generalization error analysis of two-layer neural networks for solving high dimensional Schrödinger eigenvalue problems. Commun. Am. Math. Soc. 2(1), 1–21 (2022)
  26. Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
  27. Maury, B.: Numerical analysis of a finite element/volume penalty method. SIAM J. Numer. Anal. 47(2), 1126–1148 (2009)
  28. Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 42(2), 981–1022 (2022)
  29. Müller, J., Zeinhofer, M.: Error estimates for the variational training of neural networks with boundary penalty. arXiv:2103.01007 (2021)
  30. Pfau, D., Spencer, J.S., Matthews, A.G.D.G., Foulkes, W.M.C.: Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2(3), 033429 (2020)
  31. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (Part I): data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561 (2017)
  32. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
  33. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
  34. E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
  35. Zang, Y., Bao, G., Ye, X., Zhou, H.: Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys. 411, 109409 (2020)


Acknowledgements

We would like to thank the anonymous referees and the associate editor for their useful comments and suggestions, which have led to considerable improvements in the paper.

Funding

This study was supported by the National Key Research and Development Program of China (Grant No. 2020YFA0714200), the National Natural Science Foundation of China (Nos. 12371424, 12371441, and 12371389), the Beijing Natural Science Foundation (Grant No. Z200003), and the Fundamental Research Funds for the Central Universities. The numerical calculations in this article were performed on the high-performance computing facilities at the Supercomputing Center of Wuhan University.

Author information


Corresponding author

Correspondence to Xiliang Lu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 4.1

To prove this lemma, we split it into two parts, each of which follows from one of two facts. The first fact is that Rademacher complexity is preserved, up to the Lipschitz constant, under composition with a Lipschitz continuous function; this yields the last four inequalities:

Lemma 6.1

Suppose that \(\psi :{\mathbb {R}}^{d}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\), \((x,y)\mapsto \psi (x,y)\) is \(\ell \)-Lipschitz continuous in y for all x. Let \({\mathcal {N}}\) be a class of functions on \(\Omega \) and \(\psi \circ {\mathcal {N}}=\{\psi \circ u:x\mapsto \psi (x,u(x)),u\in {\mathcal {N}}\}\). Then

$$\begin{aligned} {\mathfrak {R}}(\psi \circ {\mathcal {N}})\le \ell \ {\mathfrak {R}}({\mathcal {N}}) \end{aligned}$$

This statement follows from Corollary 3.17 in [21]. Therefore, for the second term, we have:

$$\begin{aligned} \sup _{u \in {\mathcal {N}}^{2}}\left| {\mathcal {L}}_{k, 2}(u)-\widehat{{\mathcal {L}}}_{k, 2}(u)\right| \le |\Omega |{\mathfrak {R}}\left( \left\{ w u^{2}:u\in {\mathcal {N}}^{2}\right\} \right) \le 2|\Omega |{\mathcal {B}}^{2}{\mathfrak {R}}\left( {\mathcal {N}}^{2}\right) \end{aligned}$$

since the Lipschitz constant in this case is \(2{\mathcal {B}}^{2}\). The bound for the third term is obtained in the same way, with Lipschitz constant \(2{\mathcal {B}}\).

As for the last two terms, we use the following inequality:

$$\begin{aligned} |a^{2}-b^{2}|\le 2|a||a-b|+|a-b|^{2}. \end{aligned}$$

The bounds for these two terms then follow in the same manner.
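For completeness, this elementary inequality follows from the factorization \(a^{2}-b^{2}=(a-b)(a+b)\) together with \(a+b=2a-(a-b)\):

$$\begin{aligned} |a^{2}-b^{2}|=|a-b|\,|2a-(a-b)|\le 2|a||a-b|+|a-b|^{2}. \end{aligned}$$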

The first term needs to be treated separately, since the \(\nabla \) operator is not Lipschitz continuous. The bound is a direct consequence of the following claim:

Claim: Let u be a function implemented by a \(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}\) and width \({\mathcal {W}}\). Then \(\Vert \nabla u\Vert _2^2\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}+3\) and width \(d\left( {\mathcal {D}}+2\right) {\mathcal {W}}\).

Denote \(\textrm{ReLU}\) and \(\textrm{ReLU}^{2}\) by \(\sigma _1\) and \(\sigma _2\), respectively. As long as we show that each partial derivative \(D_iu\ (i=1,2,\ldots ,d)\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network, we easily obtain the desired network, since \(\Vert \nabla u\Vert _2^2=\sum _{i=1}^{d}\left| D_i u\right| ^2\) and the square function can be implemented by \(x^2=\sigma _2(x)+\sigma _2(-x)\).
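The two \(\textrm{ReLU}^{2}\) identities used in this construction (the square identity above and the product identity appearing below) are easy to verify directly; the following snippet is a purely illustrative numerical check, not code from the paper.

```python
# Numerical check of the ReLU^2 identities used in the network construction:
#   x^2   = s2(x) + s2(-x)
#   x * y = (1/4) * [ s2(x+y) + s2(-x-y) - s2(x-y) - s2(-x+y) ]
# where s2(t) = max(t, 0)^2. Illustration only.
import numpy as np

def s2(t):
    return np.maximum(t, 0.0) ** 2

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)

assert np.allclose(x ** 2, s2(x) + s2(-x))
assert np.allclose(x * y, 0.25 * (s2(x + y) + s2(-x - y) - s2(x - y) - s2(-x + y)))
print("both ReLU^2 identities verified")
```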

Now we show that, for any \(i=1,2,\ldots ,d\), \(D_iu\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network. We treat the first two layers in detail, since they differ slightly from the later ones, and then apply induction for layers \(k\ge 3\). For the first layer, since \(\sigma _2^{'}(x)=2\sigma _1(x)\), we have for any \(q=1,2,\ldots ,n_1\)

$$\begin{aligned} D_iu_q^{(1)}=D_i\sigma _2\left( \sum _{j=1}^{d}a_{qj}^{(1)}x_j+b_q^{(1)}\right) =2\sigma _1\left( \sum _{j=1}^{d}a_{qj}^{(1)}x_j+b_q^{(1)}\right) \cdot a_{qi}^{(1)} \end{aligned}$$

Hence \(D_iu_q^{(1)}\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network with depth 2 and width 1. For the second layer,

$$\begin{aligned} D_iu_q^{(2)}=D_i\sigma _2\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) =2\sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \cdot \sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}. \end{aligned}$$

Note that \(\sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \) and \(\sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}\) can each be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) subnetwork, and that the multiplication can be implemented by

$$\begin{aligned} x\cdot y= & {} \frac{1}{4}\left[ (x+y)^2-(x-y)^2\right] \\= & {} \frac{1}{4}\left[ \sigma _2(x+y)+\sigma _2(-x-y)-\sigma _2(x-y)-\sigma _2(-x+y)\right] . \end{aligned}$$

Hence \(D_iu_q^{(2)}\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network. We have

$$\begin{aligned} {\mathcal {D}}\left( \sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \right) =3, {\mathcal {W}}\left( \sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \right) \le {\mathcal {W}} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {D}}\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}\right) =2, {\mathcal {W}}\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}\right) \le {\mathcal {W}}. \end{aligned}$$

Thus \({\mathcal {D}}\left( D_iu_q^{(2)}\right) =4,\) \({\mathcal {W}}\left( D_iu_q^{(2)}\right) \le \max \{2{\mathcal {W}},4\}\).

Now we apply induction for layers \(k\ge 3\). For the third layer,

$$\begin{aligned} D_iu_q^{(3)}=D_i\sigma _2\left( \sum _{j=1}^{n_2}a_{qj}^{(3)}u_j^{(2)}+b_q^{(3)}\right) =2\sigma _1\left( \sum _{j=1}^{n_2}a_{qj}^{(3)}u_j^{(2)}+b_q^{(3)}\right) \cdot \sum _{j=1}^{n_2}a_{qj}^{(3)}D_iu_j^{(2)}. \end{aligned}$$

Since

$$\begin{aligned} {\mathcal {D}}\left( \sigma _1\left( \sum _{j=1}^{n_2}a_{qj}^{(3)}u_j^{(2)}+b_q^{(3)}\right) \right) =4, {\mathcal {W}}\left( \sigma _1\left( \sum _{j=1}^{n_2}a_{qj}^{(3)}u_j^{(2)}+b_q^{(3)}\right) \right) \le {\mathcal {W}} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {D}}\left( \sum _{j=1}^{n_2}a_{qj}^{(3)}D_iu_j^{(2)}\right) =4, {\mathcal {W}}\left( \sum _{j=1}^{n_1}a_{qj}^{(3)}D_iu_j^{(2)}\right) \le \max \{2{\mathcal {W}},4{\mathcal {W}}\}=4{\mathcal {W}}, \end{aligned}$$

we conclude that \(D_iu_q^{(3)}\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network and

$$\begin{aligned} {\mathcal {D}}\left( D_iu_q^{(3)}\right) =5, {\mathcal {W}}\left( D_iu_q^{(3)}\right) \le \max \{5{\mathcal {W}},4\}=5{\mathcal {W}}. \end{aligned}$$

We assume that \(D_iu_q^{(k)}\ (q=1,2,\ldots ,n_k)\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network with \({\mathcal {D}}\left( D_iu_q^{(k)}\right) =k+2\) and \({\mathcal {W}}\left( D_iu_q^{(k)}\right) \le (k+2){\mathcal {W}}\). For the \((k+1)\)-th layer,

$$\begin{aligned} D_iu_q^{(k+1)}&=D_i\sigma _2\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}u_j^{(k)}+b_q^{(k+1)}\right) \end{aligned}$$
(20)
$$\begin{aligned}&=2\sigma _1\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}u_j^{(k)}+b_q^{(k+1)}\right) \cdot \sum _{j=1}^{n_k}a_{qj}^{(k+1)}D_iu_j^{(k)}. \end{aligned}$$
(21)

Since

$$\begin{aligned} {\mathcal {D}}\left( \sigma _1\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}u_j^{(k)}+b_q^{(k+1)}\right) \right)= & {} k+2, \\ {\mathcal {W}}\left( \sigma _1\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}u_j^{(k)}+b_q^{(k+1)}\right) \right)\le & {} {\mathcal {W}}, \end{aligned}$$

and

$$\begin{aligned} {\mathcal {D}}\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}D_iu_j^{(k)}\right)= & {} k+2,\\ {\mathcal {W}}\left( \sum _{j=1}^{n_k}a_{qj}^{(k+1)}D_iu_j^{(k)}\right)\le & {} \max \{(k+2){\mathcal {W}},4{\mathcal {W}}\}=(k+2){\mathcal {W}}, \end{aligned}$$

we conclude that \(D_iu_q^{(k+1)}\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network and \({\mathcal {D}}\left( D_iu_q^{(k+1)}\right) =k+3\), \({\mathcal {W}}\left( D_iu_q^{(k+1)}\right) \le \max \{(k+3){\mathcal {W}},4\}=(k+3){\mathcal {W}}\).

Hence \(D_iu=D_iu_1^{({\mathcal {D}})}\) can be implemented by a \(\textrm{ReLU}\)-\(\textrm{ReLU}^{2}\) network with \({\mathcal {D}}\left( D_iu\right) ={\mathcal {D}}+2\) and \({\mathcal {W}}\left( D_iu\right) \le \left( {\mathcal {D}}+2\right) {\mathcal {W}}\). Finally, we obtain:

$$\begin{aligned} {\mathcal {D}}\left( \Vert \nabla u\Vert ^2\right) ={\mathcal {D}}+3, {\mathcal {W}}\left( \Vert \nabla u\Vert ^2\right) \le d\left( {\mathcal {D}}+2\right) {\mathcal {W}}. \end{aligned}$$
(22)

\(\square \)
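As an aside, the chain-rule recursion used above can be checked numerically against automatic differentiation on a small randomly initialized network. The snippet below is an illustrative sketch (arbitrary sizes, with \(\textrm{ReLU}^{2}\) applied at every layer as in the recursion), not code from the paper.

```python
# Sanity check: the chain-rule recursion for D_i u of a ReLU^2 network,
#   D_i u^{(k+1)}_q = 2 * relu(pre-activation)_q * sum_j a^{(k+1)}_{qj} D_i u^{(k)}_j,
# agrees with automatic differentiation. Small random network; illustration only.
import torch

torch.manual_seed(0)
d, widths = 4, [4, 8, 8, 1]                        # input dim and layer widths
Ws = [torch.randn(widths[k + 1], widths[k]) for k in range(len(widths) - 1)]
bs = [torch.randn(widths[k + 1]) for k in range(len(widths) - 1)]

def relu2(t):
    return torch.clamp(t, min=0.0) ** 2

x = torch.randn(d, requires_grad=True)

# Forward pass carrying the recursion for the partial derivatives D_i u^{(k)}.
u, Du = x, torch.eye(d)                            # Du[q, i] = D_i u_q, initially identity
for W, b in zip(Ws, bs):
    z = W @ u + b                                  # pre-activation
    Du = 2.0 * torch.clamp(z, min=0.0).unsqueeze(1) * (W @ Du)   # the recursion
    u = relu2(z)

# Reference gradient via autograd (output is scalar since widths[-1] == 1).
y = x
for W, b in zip(Ws, bs):
    y = relu2(W @ y + b)
grad_auto = torch.autograd.grad(y.squeeze(), x)[0]

assert torch.allclose(Du.squeeze(0), grad_auto, rtol=1e-4, atol=1e-4)
print("recursion matches autograd")
```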

Proof of Lemma 4.2

First, we introduce Massart’s finite class lemma whose proof can be found in [4].

Lemma 6.2

(Massart's finite class lemma [4]) For any finite set \(V\subset {\mathbb {R}}^{n}\), let \(D=\sup _{v\in V}\Vert v\Vert _{2}\). Then

$$\begin{aligned} {\mathbb {E}}_{\Sigma _{n}}\left[ \sup _{v\in V}\frac{1}{n}\left| \sum _{i}\sigma _{i}v_{i} \right| \right] \le \frac{D}{n}\sqrt{2\log (2|V|)}, \end{aligned}$$

where \(\sigma _{i}\) and \(\Sigma _{n}\) are the Rademacher variables defined as in Definition 4.1.

Then we apply the chaining method. Set \(\varepsilon _{k}=2^{-k+1} B\). Let \({\mathcal {F}}_{k}\) be an \(\varepsilon _{k}\)-cover of \({\mathcal {F}}\) with \(\left| {\mathcal {F}}_{k}\right| ={\mathcal {C}}\left( \varepsilon _{k}, {\mathcal {F}},\Vert \cdot \Vert _{\infty }\right) \). Hence for any \(u \in {\mathcal {F}}\), there exists \(u_{k} \in {\mathcal {F}}_{k}\) such that \(\left\| u-u_{k}\right\| _{\infty } \le \varepsilon _{k}\). Let K be a positive integer to be determined later. We have

$$\begin{aligned}{} & {} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i} u\left( Z_{i}\right) \right| \right] \\{} & {} \quad ={\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i}\left( u\left( Z_{i}\right) -u_{K}\left( Z_{i}\right) \right) \right. \right. \\{} & {} \qquad \left. \left. +\sum \limits _{j=1}^{K-1} \sum \limits _{i=1}^{n} \sigma _{i}\left( u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \right) +\sum \limits _{i=1}^{n} \sigma _{i} u_{1}\left( Z_{i}\right) \right| \right] \\{} & {} \quad \le {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i}\left( u\left( Z_{i}\right) -u_{K}\left( Z_{i}\right) \right) \right| \right] \\{} & {} \qquad +\sum \limits _{j=1}^{K-1} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i}\left( u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \right) \right| \right] \\{} & {} \qquad +{\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i} u_{1}\left( Z_{i}\right) \right| \right] \end{aligned}$$

Since \(0 \in {\mathcal {F}}\), we can choose \({\mathcal {F}}_{1}=\{0\}\) to eliminate the third term. For the first term,

$$\begin{aligned}{} & {} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum \limits _{i=1}^{n} \sigma _{i}\left( u\left( Z_{i}\right) -u_{K}\left( Z_{i}\right) \right) \right| \right] \\{} & {} \quad \le {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup \limits _{u \in {\mathcal {F}}} \frac{1}{n} \sum \limits _{i=1}^{n}\left| \sigma _{i}\right| \left\| u-u_{K}\right\| _{\infty }\right] \le \varepsilon _{K} \end{aligned}$$

For the second term, define \(v_{i}^{j}=u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \) and \(V_{j}=\left\{ \left( v_{1}^{j},\ldots ,v_{n}^{j}\right) : u_{j}\in {\mathcal {F}}_{j},\ u_{j+1}\in {\mathcal {F}}_{j+1}\right\} \). Applying Lemma 6.2, we have

$$\begin{aligned}{} & {} \sum _{j=1}^{K-1} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum _{i=1}^{n} \sigma _{i}\left( u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \right) \right| \right] \\{} & {} \quad =\sum _{j=1}^{K-1} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup _{v \in V_{j}} \frac{1}{n} \left| \sum _{i=1}^{n} \sigma _{i} v_{i}^{j}\right| \right] \le \sum _{j=1}^{K-1} \frac{D_{j}}{n} \sqrt{2 \log \left( 2\left| V_{j}\right| \right) }. \end{aligned}$$

By the definition of \(V_{j}\), we know that \(\left| V_{j}\right| \le \left| {\mathcal {F}}_{j}\right| \left| {\mathcal {F}}_{j+1}\right| \le \left| {\mathcal {F}}_{j+1}\right| ^{2}\) and, for every \(v^{j}\in V_{j}\),

$$\begin{aligned} \Vert v^{j}\Vert _{2}= & {} \left( \sum _{i=1}^{n}\left| u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \right| ^{2}\right) ^{1 / 2} \le \sqrt{n}\left\| u_{j+1}-u_{j}\right\| _{\infty }\\\le & {} \sqrt{n}\left\| u_{j+1}-u\right\| _{\infty }+\sqrt{n}\left\| u_{j}-u\right\| _{\infty }\le \sqrt{n} \varepsilon _{j+1}+\sqrt{n} \varepsilon _{j}=3 \sqrt{n} \varepsilon _{j+1}. \end{aligned}$$

Hence

$$\begin{aligned}{} & {} \sum _{j=1}^{K-1} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum _{i=1}^{n} \sigma _{i}\left( u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \right) \right| \right] \\{} & {} \quad \le \sum _{j=1}^{K-1} \frac{D_{j}}{n} \sqrt{2 \log \left( 2\left| V_{j}\right| \right) } \le \sum _{j=1}^{K-1} \frac{3 \varepsilon _{j+1}}{\sqrt{n}} \sqrt{2 \log \left( 2\left| {\mathcal {F}}_{j+1}\right| ^{2}\right) } \end{aligned}$$

Now we obtain

$$\begin{aligned}{} & {} {\mathbb {E}}_{\left\{ \sigma _{i}, Z_{i}\right\} _{i=1}^{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n}\left| \sum _{i=1}^{n} \sigma _{i} u\left( Z_{i}\right) \right| \right] \le \varepsilon _{K}+\sum _{j=1}^{K-1} \frac{6 \varepsilon _{j+2}}{\sqrt{n}} \sqrt{2 \log \left( 2\left| {\mathcal {F}}_{j+1}\right| ^{2}\right) } \\{} & {} \quad =\varepsilon _{K}+\frac{6}{\sqrt{n}} \sum _{j=1}^{K-1}\left( \varepsilon _{j+1}-\varepsilon _{j+2}\right) \sqrt{2 \log \left( 2 {\mathcal {C}}\left( \varepsilon _{j+1}, {\mathcal {F}},\Vert \cdot \Vert _{\infty }\right) ^{2}\right) } \\{} & {} \quad \le \varepsilon _{K}+\frac{6}{\sqrt{n}} \int _{\varepsilon _{K+1}}^{B / 2} \sqrt{2 \log \left( 2 {\mathcal {C}}\left( \varepsilon , {\mathcal {F}},\Vert \cdot \Vert _{\infty }\right) ^{2}\right) } d \varepsilon . \end{aligned}$$

We conclude the lemma by choosing K such that \(\varepsilon _{K+2}\!<\!\delta \!\le \!\varepsilon _{K+1}\) for any \(0\!<\!\delta \!<\!\frac{B}{2}.\) \(\square \)

Proof of Lemma 4.4

The proof proceeds as follows: first, we recall the notions of shattering and the VC-dimension, and relate the pseudo-dimension to the VC-dimension of an auxiliary class; then the number of sign patterns realized by polynomials is estimated through a lemma from [1]; finally, the proof is completed by an argument similar to the proof of Theorem 6 in [2].

We begin with the definitions of shattering and the VC-dimension.

Definition 6.1

Let \({\mathcal {N}}\) be a set of functions from \(X\) (\(X=\Omega \) or \(\partial \Omega \)) to \(\{0,1\}\). Suppose that \(S=\left\{ x_{1}, x_{2}, \ldots , x_{n}\right\} \subset X\). We say that S is shattered by \({\mathcal {N}}\) if for any \(b \in \{0,1\}^{n}\) there exists \(u \in {\mathcal {N}}\) satisfying

$$\begin{aligned} u\left( x_{i}\right) =b_{i}, \quad i=1,2, \ldots , n. \end{aligned}$$

Definition 6.2

The VC-dimension of \({\mathcal {N}}\), denoted \({\text {VCdim}}({\mathcal {N}})\), is defined to be the maximum cardinality among all sets shattered by \({\mathcal {N}}\).

Lemma 6.3 below bounds the number of sign patterns realized by a family of polynomials; its proof can be found in Theorem 8.3 of [1].

Lemma 6.3

Let \(p_1,\ldots ,p_m\) be polynomials with n variables of degree at most d. If \(n\le m\), then

$$\begin{aligned} |\{({\text {sign}}(p_1(x)),\ldots ,{\text {sign}}(p_m(x))):x\in {\mathbb {R}}^n\}| \le 2\left( \frac{2emd}{n}\right) ^n \end{aligned}$$
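As a quick illustration of the bound (an example for intuition only, not needed for the proof), take \(n=1\) and \(d=1\): the at most m roots of m affine functions of one real variable split \({\mathbb {R}}\) into at most \(2m+1\) intervals and points on which the sign vector is constant, so the number of sign patterns is at most

$$\begin{aligned} 2m+1\le 2\left( \frac{2em\cdot 1}{1}\right) ^{1}=4em. \end{aligned}$$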

The argument follows from the proof of Theorem 6 in [2]. The result stated here is somewhat stronger than Theorem 6 in [2] since \(\textrm{VCdim}({\text {sign}}({\mathcal {N}}))\le \textrm{Pdim}({\mathcal {N}})\).

We consider a new set of functions:

$$\begin{aligned} \mathcal {{\widetilde{N}}}=\{{\widetilde{u}}(x,y)={\text {sign}}(u(x)-y): u\in {\mathcal {N}}\} \end{aligned}$$

It is clear that \(\textrm{Pdim}({\mathcal {N}})\le \textrm{VCdim}(\mathcal {{\widetilde{N}}})\). We now bound the VC-dimension of \(\mathcal {{\widetilde{N}}}\). Denoting by \({\mathcal {M}}\) the total number of parameters (weights and biases) in the neural networks implementing functions in \({\mathcal {N}}\), we want to derive a uniform bound for

$$\begin{aligned} K_{\{x_i\},\{y_i\}}(m):=|\{({\text {sign}}(u(x_{1}, a)-y_1), \ldots , {\text {sign}}(u(x_{m}, a)-y_m)): a \in {\mathbb {R}}^{{\mathcal {M}}}\}| \end{aligned}$$

over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\). In fact, the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is precisely the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\).

To apply Lemma 6.3, we partition the parameter space \({\mathbb {R}}^{{\mathcal {M}}}\) into several subsets so that, on each subset, \(u(x_i,a)-y_i\) is a polynomial in a without any breakpoints. In fact, our partition is the same as the one in [2]. Denote the partition by \(\{P_1,P_2,\ldots ,P_N\}\), where the integer N satisfies

$$\begin{aligned} N\le \prod _{i=1}^{{\mathcal {D}}-1}2\left( \frac{2emk_i(1+(i-1)2^{i-1})}{{\mathcal {M}}_i}\right) ^{{\mathcal {M}}_i} \end{aligned}$$
(23)

where \(k_i\) denotes the number of units in the i-th layer and \({\mathcal {M}}_i\) the total number of parameters of the units' inputs in all layers up to layer i of the neural networks implementing functions in \({\mathcal {N}}\). For more details on the construction of this partition, we refer to [2]. Obviously, we have

$$\begin{aligned} K_{\{x_i\},\{y_i\}}(m)\le \sum _{i=1}^{N}|\{({\text {sign}}(u(x_{1}, a)-y_1), \ldots , {\text {sign}}(u(x_{m}, a)-y_m)): a\in P_i\}| \end{aligned}$$
(24)

Note that \(u(x_i,a)-y_i\) is a polynomial in a whose degree equals that of \(u(x_i,a)\), which is \(1 + ({\mathcal {D}}-1)2^{{\mathcal {D}}-1}\) as shown in [2]. Hence, by Lemma 6.3, we have

$$\begin{aligned}&|\{({\text {sign}}(u(x_{1}, a)-y_1), \ldots , {\text {sign}}(u(x_{m}, a)-y_m)): a\in P_i\}|\nonumber \\&\quad \le 2\left( \frac{2em(1+({\mathcal {D}}-1)2^{{\mathcal {D}}-1})}{{\mathcal {M}}_{{\mathcal {D}}}}\right) ^{{\mathcal {M}}_{{\mathcal {D}}}}. \end{aligned}$$
(25)

Combining (23), (24), (25) yields

$$\begin{aligned} K_{\{x_i\},\{y_i\}}(m)\le \prod _{i=1}^{{\mathcal {D}}}2 \left( \frac{2emk_i(1+(i-1)2^{i-1})}{{\mathcal {M}}_i}\right) ^{{\mathcal {M}}_i}. \end{aligned}$$

We then have

$$\begin{aligned} {\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\le \prod _{i=1}^{{\mathcal {D}}}2 \left( \frac{2emk_i(1+(i-1)2^{i-1})}{{\mathcal {M}}_i}\right) ^{{\mathcal {M}}_i}, \end{aligned}$$

since the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\). Following the same algebra as in the proof of Theorem 6 in [2], we obtain

$$\begin{aligned} \textrm{Pdim}({\mathcal {N}})\lesssim C\left( {\mathcal {D}}^2{\mathcal {W}}^2 \log {\mathcal {U}}+{\mathcal {D}}^3{\mathcal {W}}^2\right) \lesssim C\left( {\mathcal {D}}^2{\mathcal {W}}^2\left( {\mathcal {D}}+\log {\mathcal {W}}\right) \right) \end{aligned}$$

where \({\mathcal {U}}\) refers to the number of units of the neural networks implementing functions in \({\mathcal {N}}\).

\(\square \)
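To get a rough sense of how this bound scales with depth and width, one can tabulate \({\mathcal {D}}^{2}{\mathcal {W}}^{2}({\mathcal {D}}+\log {\mathcal {W}})\) for a few illustrative architectures; the absolute constant is omitted, so only relative sizes are meaningful.

```python
# Illustrative scaling of the pseudo-dimension bound D^2 * W^2 * (D + log W);
# the absolute constant C is omitted, so only relative sizes are meaningful.
import math

for D, W in [(3, 50), (3, 200), (6, 50), (6, 200)]:
    bound = D ** 2 * W ** 2 * (D + math.log(W))
    print(f"depth={D:2d}  width={W:4d}  D^2 W^2 (D + log W) ~ {bound:,.0f}")
```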

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Ji, X., Jiao, Y., Lu, X. et al. Deep Ritz Method for Elliptical Multiple Eigenvalue Problems. J Sci Comput 98, 48 (2024). https://doi.org/10.1007/s10915-023-02443-8

