Abstract
In this paper, we investigate solving elliptical multiple eigenvalue (EME) problems using feedforward neural networks. First, we propose a general formulation for computing EMEs based on penalized variational forms of elliptical eigenvalue problems. Next, we solve the penalized variational form using the Deep Ritz Method. We establish an upper bound on the error between the estimated eigenvalues and the true ones in terms of the depth \(\mathcal {D}\) and width \(\mathcal {W}\) of the neural network and the training sample size n. By exploiting the regularity of the EME and selecting an appropriate depth \(\mathcal {D}\) and width \(\mathcal {W}\), we show that the desired bound enjoys a convergence rate of \(O(1/n^{16})\), which circumvents the curse of dimensionality. We also present several high-dimensional simulation results to illustrate the effectiveness of the proposed method and to support our theoretical findings.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author, Professor Lu, upon reasonable request and in accordance with institutional and ethical guidelines for data sharing.
References
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (2009)
Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. J. Mach. Learn. Res. 20(1), 2285–2301 (2019)
Berezin, F.A., Shubin, M.: The Schrödinger Equation, vol. 66. Springer Science & Business Media, Berlin (2012)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press (2013)
Chen, F., Huang, J., Wang, C., Yang, H.: Friedrichs learning: Weak solutions of partial differential equations via deep learning. SIAM J. Sci. Comput. 45(3), A1271–A1299 (2023)
Courtade, T.A.: Bounds on the Poincaré constant for convolution measures. Ann. Inst. Henri Poincaré Probab. Stat. 56 (2020)
Duan, C., Jiao, Y., Lai, Y., Li, D., Lu, X., Yang, J.Z.: Convergence rate analysis for deep Ritz method. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
Duan, C., Jiao, Y., Lai, Y., Lu, X., Quan, Q., Yang, J.Z.: Deep Ritz methods for Laplace equations with Dirichlet boundary condition. Commun. Comput. Phys. 31(4), 1020–1048 (2022)
Dudley, R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1(3), 290–330 (1967)
Evans, L.C.: Partial Differential Equations, vol. 19, 2nd edn., pp. 355–360. American Mathematical Society (2010)
Fefferman, C.L.: A sharp form of Whitney’s extension theorem. Ann. Math. 161(1), 509–577 (2005)
Benci, V., Fortunato, D.: An eigenvalue problem for the Schrödinger–Maxwell equations. Topol. Methods Nonlinear Anal. 11, 283–293 (1998)
Gu, Y., Yang, H., Zhou, C.: SelectNet: self-paced learning for high-dimensional partial differential equations. J. Comput. Phys. 441, 110444 (2021)
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929 (2019)
Han, J., Lu, J., Zhou, M.: Solving high-dimensional eigenvalue problems using deep neural networks: a diffusion Monte Carlo like approach. J. Comput. Phys. 423, 109792 (2020)
Hermann, J., Schätzle, Z., Noé, F.: Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12(10), 891–897 (2020)
Hon, S., Yang, H.: Simultaneous neural network approximations in Sobolev spaces (2021)
Jagtap, A.D., Karniadakis, G.E.: Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. In: AAAI Spring Symposium: MLPS, pp. 2002–2041 (2021)
Johnson, O.: Convergence of the Poincaré constant. Theory Probab. Appl. 48(3), 535–541 (2004)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, Berlin (2013)
Li, P., Yau, S.-T.: On the Schrödinger equation and the eigenvalue problem. Commun. Math. Phys. 88(3), 309–318 (1983)
Li, H., Ying, L.: A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks. J. Comput. Phys. 453, 110939 (2022)
Lu, Y., Lu, J., Wang, M.: A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equations. In: Conference on Learning Theory, pp. 3196–3241. PMLR (2021)
Lu, J., Lu, Y.: A priori generalization error analysis of two-layer neural networks for solving high dimensional Schrödinger eigenvalue problems. Commun. Am. Math. Soc. 2(01), 1–21 (2022)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Maury, B.: Numerical analysis of a finite element/volume penalty method. SIAM J. Numer. Anal. 47(2), 1126–1148 (2009)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 42(2), 981–1022 (2022)
Müller, J., Zeinhofer, M.: Error estimates for the variational training of neural networks with boundary penalty. arXiv:2103.01007 (2021)
Pfau, D., Spencer, J.S., Matthews, A.G.D.G., Foulkes, W.M.C.: Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2(3), 033429 (2020)
Raissi, M., Perdikaris, P., Karniadakis, G. E.: Physics informed deep learning (part i): data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561 (2017)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Zang, Y., Bao, G., Ye, X., Zhou, H.: Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys. 411, 109409 (2020)
Acknowledgements
We would like to thank the anonymous referees and the associate editor for their useful comments and suggestions, which have led to considerable improvements in the paper.
Funding
This study was supported by the National Key Research and Development Program of China (Grant No. 2020YFA0714200), the National Natural Science Foundation of China (Grant Nos. 12371424, 12371441, and 12371389), the Beijing Natural Science Foundation (Grant No. Z200003), and the Fundamental Research Funds for the Central Universities. The numerical calculations in this article were performed on the supercomputing system at the Supercomputing Center of Wuhan University.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 4.1
To prove this lemma, we split it into two parts, each of which follows from one of two facts. The first fact is that Rademacher complexity is inherited, up to the Lipschitz constant, under composition with a Lipschitz continuous function; this fact yields the last four inequalities:
Lemma 6.1
Suppose that \(\psi :{\mathbb {R}}^{d}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\), \((x,y)\mapsto \psi (x,y)\) is \(\ell \)-Lipschitz continuous in y for all x. Let \({\mathcal {N}}\) be a class of functions on \(\Omega \) and \(\psi \circ {\mathcal {N}}=\{\psi \circ u:x\mapsto \psi (x,u(x)),u\in {\mathcal {N}}\}\). Then
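(in the form commonly used in this setting, writing \(\mathfrak {R}_{n}(\cdot )\) for the Rademacher complexity associated with Definition 4.1, and up to the precise constant, which is not essential for what follows) \(\mathfrak {R}_{n}(\psi \circ {\mathcal {N}})\le 2\,\ell \,\mathfrak {R}_{n}({\mathcal {N}}).\)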
For the proof of this statement, we refer to Corollary 3.17 in [21]. Therefore, for the second term, we have:
since the Lipschitz constant is \(2{\mathcal {B}}^{2}\) in this case. The bound for the third term is obtained in the same way, with the Lipschitz constant being \(2{\mathcal {B}}\).
As for the last two terms, we use the following inequality:
They can then be obtained in the same manner.
The first term needs to be treated separately, since the \(\nabla \) operator is not Lipschitz continuous. The corresponding bound is a direct consequence of the following claim:
Claim: Let u be a function implemented by a \(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}\) and width \({\mathcal {W}}\). Then \(\Vert \nabla u\Vert _2^2\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with depth \({\mathcal {D}}+3\) and width \(d\left( {\mathcal {D}}+2\right) {\mathcal {W}}\).
Denote \(\textrm{ReLU}\) and \(\textrm{ReLU}^{2}\) by \(\sigma _1\) and \(\sigma _2\), respectively. As long as we show that each partial derivative \(D_iu\ (i=1,2,\ldots ,d)\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network, we can easily obtain the desired network, since \(\Vert \nabla u\Vert _2^2=\sum _{i=1}^{d}\left| D_i u\right| ^2\) and the square function can be implemented as \(x^2=\sigma _2(x)+\sigma _2(-x)\).
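These elementary identities are easy to check numerically. The following NumPy snippet is a minimal sketch (not part of the formal argument; relu and relu2 are stand-ins for \(\sigma _1\) and \(\sigma _2\)) verifying the square identity, the derivative relation \(\sigma _2^{\prime }(x)=2\sigma _1(x)\) used in the construction below, and the polarization identity that allows products to be realized with \(\sigma _2\) units.

  import numpy as np

  def relu(x):    # sigma_1
      return np.maximum(x, 0.0)

  def relu2(x):   # sigma_2 = ReLU^2
      return np.maximum(x, 0.0) ** 2

  x = np.linspace(-3.0, 3.0, 1001)
  y = np.linspace(-2.0, 4.0, 1001)

  # Square identity: x^2 = sigma_2(x) + sigma_2(-x)
  assert np.allclose(x ** 2, relu2(x) + relu2(-x))

  # Derivative relation sigma_2'(x) = 2 sigma_1(x), checked by central differences
  h = 1e-6
  fd = (relu2(x + h) - relu2(x - h)) / (2.0 * h)
  assert np.allclose(fd, 2.0 * relu(x), atol=1e-4)

  # Polarization identity: xy = ((x+y)^2 - (x-y)^2)/4, with squares realized by sigma_2
  prod = 0.25 * (relu2(x + y) + relu2(-x - y) - relu2(x - y) - relu2(y - x))
  assert np.allclose(prod, x * y)

  print("ReLU / ReLU^2 identities verified")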
Now we show that for any \(i=1,2,\ldots ,d\), \(D_iu\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network. We treat the first two layers in detail, since they differ slightly from the remaining ones, and apply induction for layers \(k\ge 3\). For the first layer, since \(\sigma _2^{\prime }(x)=2\sigma _1(x)\), we have for any \(q=1,2,\ldots ,n_1\)
Hence \(D_iu_q^{(1)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with depth 2 and width 1. For the second layer,
Since \(\sigma _1\left( \sum _{j=1}^{n_1}a_{qj}^{(2)}u_j^{(1)}+b_q^{(2)}\right) \) and \(\sum _{j=1}^{n_1}a_{qj}^{(2)}D_iu_j^{(1)}\) can be implemented by \(\textrm{ReLU}-\textrm{ReLU}^{2}\) subnetworks, respectively, and the multiplication can also be implemented by
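such a subnetwork (one standard realization, consistent with the width bound \(\max \{2{\mathcal {W}},4\}\) recorded below, is the polarization identity \(xy=\frac{1}{4}\left[ \sigma _2(x+y)+\sigma _2(-x-y)-\sigma _2(x-y)-\sigma _2(y-x)\right] \), which uses four \(\sigma _2\) units).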
We conclude that \(D_iu_q^{(2)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network. We have
and
Thus \({\mathcal {D}}\left( D_iu_q^{(2)}\right) =4,\) \({\mathcal {W}}\left( D_iu_q^{(2)}\right) \le \max \{2{\mathcal {W}},4\}\).
Now we apply induction for layers \(k\ge 3\). For the third layer,
Since
and
we conclude that \(D_iu_q^{(3)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and
We assume that \(D_iu_q^{(k)}\ (q=1,2,\ldots ,n_k)\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network with \({\mathcal {D}}\left( D_iu_q^{(k)}\right) =k+2\) and \({\mathcal {W}}\left( D_iu_q^{(k)}\right) \le (k+2){\mathcal {W}}\). For the \((k+1)\)-th layer,
Since
and
we conclude that \(D_iu_q^{(k+1)}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and \({\mathcal {D}}\left( D_iu_q^{(k+1)}\right) =k+3\), \({\mathcal {W}}\left( D_iu_q^{(k+1)}\right) \le \max \{(k+3){\mathcal {W}},4\}=(k+3){\mathcal {W}}\).
Hence we derive that \(D_iu=D_iu_1^{({\mathcal {D}})}\) can be implemented by a \(\textrm{ReLU}\)–\(\textrm{ReLU}^{2}\) network and \({\mathcal {D}}\left( D_iu\right) ={\mathcal {D}}+2\), \({\mathcal {W}}\left( D_iu\right) \le \left( {\mathcal {D}}+2\right) {\mathcal {W}}\). Finally, we obtain:
\(\square \)
Proof of Lemma 4.2
First, we introduce Massart’s finite class lemma whose proof can be found in [4].
Lemma 6.2
(Massart’s finite class lemma [4]) For any finite set \(V\subset {\mathbb {R}}^{n}\) with diameter \(D=\sup _{v\in V}\Vert v\Vert _{2}\), we have
where \(\sigma _{i}\) and \(\Sigma _{n}\) are the Rademacher variables defined the same as in Definition 4.1.
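In the empirical-average normalization assumed here (with the factor \(1/n\) as in Definition 4.1), this bound takes the standard form \(\mathbb {E}_{\Sigma _{n}}\left[ \sup _{v \in V} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} v_{i}\right] \le \frac{D\sqrt{2\log |V|}}{n}.\)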
Then we apply the chaining method. Set \(\varepsilon _{k}=2^{-k+1} B\). Let \({\mathcal {F}}_{k}\) be an \(\varepsilon _{k}\)-cover of \({\mathcal {F}}\) with \(\left| {\mathcal {F}}_{k}\right| ={\mathcal {C}}\left( \varepsilon _{k}, {\mathcal {F}},\Vert \cdot \Vert _{\infty }\right) \). Hence for any \(u \in {\mathcal {F}}\), there exists \(u_{k} \in {\mathcal {F}}_{k}\) such that \(\left\| u-u_{k}\right\| _{\infty } \le \varepsilon _{k}\). Let K be a positive integer to be determined later. We have
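the standard chaining bound (a sketch, stated in the empirical normalization of Definition 4.1 and in the term ordering referred to below, using \(u=(u-u_{K+1})+\sum _{j=1}^{K}(u_{j+1}-u_{j})+u_{1}\)): \(\mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} u\left( Z_{i}\right) \right] \le \mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i}\left( u-u_{K+1}\right) \left( Z_{i}\right) \right] +\sum _{j=1}^{K} \mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i}\left( u_{j+1}-u_{j}\right) \left( Z_{i}\right) \right] +\mathbb {E}_{\Sigma _{n}}\left[ \sup _{u \in {\mathcal {F}}} \frac{1}{n} \sum _{i=1}^{n} \sigma _{i} u_{1}\left( Z_{i}\right) \right] .\)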
Since \(0 \in {\mathcal {F}}\), we can choose \({\mathcal {F}}_{1}=\{0\}\) to eliminate the third term. For the first term,
For the second term, defining \(v_{i}^{j}=u_{j+1}\left( Z_{i}\right) -u_{j}\left( Z_{i}\right) \), and applying Lemma 6.2, we have
By the definition of \(V_{j}\), we know that \(\left| V_{j}\right| \le \left| {\mathcal {F}}_{j}\right| \left| {\mathcal {F}}_{j+1}\right| \le \left| {\mathcal {F}}_{j+1}\right| ^{2}\) and
Hence
Now we obtain
We conclude the lemma by choosing K such that \(\varepsilon _{K+2}\!<\!\delta \!\le \!\varepsilon _{K+1}\) for any \(0\!<\!\delta \!<\!\frac{B}{2}.\) \(\square \)
Proof of Lemma 4.4
The sketch of the proof is as follows: first, pseudo-shattering and the VC-dimension are introduced, the latter serving as a lower bound for the Pdim; then, the VC-dimension for classes defined by polynomials is estimated through a lemma in [1]; based on these results, the proof is completed by a deduction similar to that of Theorem 6 in [2].
We first recall the definitions of shattering and the VC-dimension.
Definition 6.1
Let \({\mathcal {N}}\) be a set of functions from \(X=\Omega \) (or \(\partial \Omega \)) to \(\{0,1\}\). Suppose that \(S=\left\{ x_{1}, x_{2}, \ldots , x_{n}\right\} \subset X\). We say that S is shattered by \({\mathcal {N}}\) if for any \(b \in \{0,1\}^{n}\), there exists a \(u \in {\mathcal {N}}\) satisfying
Definition 6.2
The VC-dimension of \({\mathcal {N}}\) denoted as \({\text {VCdim}}({\mathcal {N}})\), is defined to be the maximum cardinality among all sets shattered by \({\mathcal {N}}\).
Lemma 6.3 is introduced to estimate the number of sign patterns generated by polynomials; the proof can be found in Theorem 8.3 in [1].
Lemma 6.3
Let \(p_1,\ldots ,p_m\) be polynomials with n variables of degree at most d. If \(n\le m\), then
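(in the standard form of this result, stated here for completeness) the number of sign patterns generated by \(p_1,\ldots ,p_m\) is bounded as \(\left| \left\{ \left( {\text {sign}}\left( p_{1}(a)\right) ,\ldots ,{\text {sign}}\left( p_{m}(a)\right) \right) : a\in {\mathbb {R}}^{n}\right\} \right| \le 2\left( \frac{2emd}{n}\right) ^{n}.\)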
The argument follows from the proof of Theorem 6 in [2]. The result stated here is somewhat stronger than Theorem 6 in [2] since \(\textrm{VCdim}({\text {sign}}({\mathcal {N}}))\le \textrm{Pdim}({\mathcal {N}})\).
We consider a new set of functions:
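(following the construction in [2]; the definition below is one standard choice) \(\mathcal {{\widetilde{N}}}:=\left\{ (x,y)\mapsto {\text {sign}}\left( u(x)-y\right) : u\in {\mathcal {N}}\right\} ,\) viewed as a class of binary-valued functions on \(X\times {\mathbb {R}}\).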
It is clear that \(\textrm{Pdim}({\mathcal {N}})\le \textrm{VCdim}(\mathcal {{\widetilde{N}}})\). We now bound the VC-dimension of \(\mathcal {{\widetilde{N}}}\). Denoting by \({\mathcal {M}}\) the total number of parameters (weights and biases) in the neural networks implementing functions in \({\mathcal {N}}\), in our case we want to derive a uniform bound for
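the quantity \(K_{\{x_i\},\{y_i\}}(m)\), which we take to be the sign-pattern count \(K_{\{x_i\},\{y_i\}}(m)=\left| \left\{ \left( {\text {sign}}\left( u(x_1,a)-y_1\right) ,\ldots ,{\text {sign}}\left( u(x_m,a)-y_m\right) \right) : a\in {\mathbb {R}}^{{\mathcal {M}}}\right\} \right| \) (here \(u(\cdot ,a)\) denotes the network realized by the parameter vector \(a\)),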
over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\). In fact, the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\).
To apply Lemma 6.3, we partition the parameter space \({\mathbb {R}}^{{\mathcal {M}}}\) into several subsets to ensure that in each subset \(u(x_i,a)-y_i\) is a polynomial with respect to a without any breakpoints. In fact, our partition is the same as the partition in [2]. Denote the partition as \(\{P_1,P_2,\ldots ,P_N\}\) with some integer N satisfying
where \(k_i\) and \({\mathcal {M}}_i\) denote, respectively, the number of units in the i-th layer and the total number of parameters of the units’ inputs in all layers up to layer i of the neural networks implementing functions in \({\mathcal {N}}\). For more details on the construction of this partition, we refer to [2]. Obviously, we have
Note that \(u(x_i,a)-y_i\) is a polynomial with respect to a whose degree equals that of \(u(x_i,a)\), namely \(1 + ({\mathcal {D}}-1)2^{{\mathcal {D}}-1}\), as shown in [2]. Hence by Lemma 6.3, we have
Combining (23), (24), (25) yields
We then have
since the maximum of \(K_{\{x_i\},\{y_i\}}(m)\) over all \(\{x_i\}_{i=1}^{m}\subset X\) and \(\{y_i\}_{i=1}^{m}\subset {\mathbb {R}}\) is the growth function \({\mathcal {G}}_{\mathcal {{\widetilde{N}}}}(m)\). After some algebra, as in the proof of Theorem 6 in [2], we obtain
where \({\mathcal {U}}\) refers to the number of units of the neural networks implementing functions in \({\mathcal {N}}\).
\(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ji, X., Jiao, Y., Lu, X. et al. Deep Ritz Method for Elliptical Multiple Eigenvalue Problems. J Sci Comput 98, 48 (2024). https://doi.org/10.1007/s10915-023-02443-8
DOI: https://doi.org/10.1007/s10915-023-02443-8