Abstract
Physics-informed neural networks approximate solutions of PDEs by minimizing pointwise residuals. We derive rigorous bounds on the error incurred by PINNs in approximating the solutions of a large class of linear parabolic PDEs, namely Kolmogorov equations, which include the heat equation and the Black–Scholes equation of option pricing as examples. We construct neural networks whose PINN residual (generalization error) can be made as small as desired. We also prove that the total L2-error can be bounded by the generalization error, which in turn is bounded in terms of the training error, provided that a sufficient number of randomly chosen training (collocation) points is used. Moreover, we prove that the size of the PINNs and the number of training samples only grow polynomially with the underlying dimension, enabling PINNs to overcome the curse of dimensionality in this context. These results enable us to provide a comprehensive error analysis for PINNs in approximating Kolmogorov PDEs.
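As an added illustration of the residual-minimization principle described above (not the construction analysed in the paper), the following minimal sketch trains a PINN for the d-dimensional heat equation, the simplest Kolmogorov PDE considered here; the architecture, optimizer and collocation sampling are placeholder choices, and the spatial boundary residual is omitted for brevity.

```python
# Minimal PINN sketch for the heat equation u_t = 0.5 * Laplacian(u) on [0,T] x [0,1]^d,
# a prototypical Kolmogorov PDE. Illustrative only: all hyperparameters are placeholders.
import torch

d, T = 2, 1.0
phi = lambda x: torch.sin(x).prod(dim=1, keepdim=True)      # hypothetical initial condition

net = torch.nn.Sequential(                                    # tanh network u_theta(t, x)
    torch.nn.Linear(d + 1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def interior_residual(t, x):
    """Pointwise PDE residual u_t - 0.5 * Laplacian_x u at the collocation points."""
    t = t.requires_grad_(True); x = x.requires_grad_(True)
    u = net(torch.cat([t, x], dim=1))
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    grad_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = torch.zeros_like(u)
    for i in range(d):                                         # trace of the spatial Hessian
        lap = lap + torch.autograd.grad(grad_x[:, i].sum(), x, create_graph=True)[0][:, i:i+1]
    return u_t - 0.5 * lap

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    # randomly chosen collocation points ("training points" in the error analysis)
    t_int, x_int = T * torch.rand(256, 1), torch.rand(256, d)
    t_ic, x_ic = torch.zeros(256, 1), torch.rand(256, d)
    loss = interior_residual(t_int, x_int).square().mean() \
         + (net(torch.cat([t_ic, x_ic], dim=1)) - phi(x_ic)).square().mean()
    opt.zero_grad(); loss.backward(); opt.step()               # minimize the training error
```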
References
Bai, G., Koley, U., Mishra, S., Molinaro, R.: Physics informed neural networks (PINNs) for approximating nonlinear dispersive PDEs. arXiv:2104.05584 (2021)
Barth, A., Jentzen, A., Lang, A., Schwab, C.: Numerical analysis of stochastic ordinary differential equations. ETH Zürich (2018)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020)
Chen, T., Chen, H.: Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans. Neural Netw. 6(4), 911–917 (1995)
De Ryck, T., Lanthaler, S., Mishra, S.: On the approximation of functions by tanh neural networks. Neural Netw. 143, 732–750 (2021)
Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. (1994)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
Grohs, P., Hornung, F., Jentzen, A., Von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. arXiv:1809.02362 (2018)
Gühring, I., Kutyniok, G., Petersen, P.: Error bounds for approximations with deep ReLU neural networks in Ws,p norms. Anal. Appl. 18(05), 803–859 (2020)
Gühring, I., Raslan, M.: Approximation rates for neural networks with encodable weights in smoothness spaces. Neural Netw. 134, 107–130 (2021)
Hiptmair, R., Schwab, C.: Numerical methods for elliptic and parabolic boundary value problems. ETH Zürich (2008)
Hornung, F., Jentzen, A., Salimova, D.: Space-time deep neural network approximations for high-dimensional partial differential equations. arXiv:2006.02199 (2020)
Jagtap, A. D., Karniadakis, G. E.: Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 28(5), 2002–2041 (2020)
Jagtap, A. D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020)
Klebaner, F. C.: Introduction to stochastic calculus with applications. World Scientific Publishing Company (2012)
Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. pp. 1–53 (2021)
Lagaris, I.E., Likas, A.C., Papageorgiou, D.G.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11(5), 1041–1049 (2000)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Lanthaler, S., Mishra, S., Karniadakis, G.E.: Error estimates for DeepONets: a deep learning framework in infinite dimensions (2022)
Lévy, P.: Théorie de l’addition des variables aléatoires. Gauthier-Villars (1954)
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier neural operator for parametric partial differential equations (2020)
Lu, L., Jin, P., Karniadakis, G. E.: DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv:1910.03193 (2019)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Lye, K. O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. p. 109339 (2020)
Lye, K. O., Mishra, S., Ray, D., Chandrashekar, P.: Iterative surrogate model optimization (ISMO): An active learning algorithm for PDE constrained optimization with deep neural networks. Comput. Methods Appl. Mech. Eng. 374, 113575 (2021)
Mao, Z., Jagtap, A.D., Karniadakis, G.E.: Physics-informed neural networks for high-speed flows. Comput. Methods Appl. Mech. Eng. 360, 112789 (2020)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. (2021)
Mishra, S., Molinaro, R.: Physics informed neural networks for simulating radiative transfer. J. Quant. Spectrosc. Radiat. Transf. 270, 107705 (2021)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics informed neural networks (PINNs) for approximating PDEs. IMA J. Numer. Anal. (2022)
Mishra, S., Molinaro, R., Tanios, R.: Physics informed neural networks for option pricing. In preparation (2021)
Øksendal, B.: Stochastic differential equations. Springer, New York (2003)
Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 41, A2603–A2626 (2019)
Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Raissi, M., Yazdani, A., Karniadakis, G.E.: Hidden fluid mechanics: a Navier-Stokes informed deep learning framework for assimilating flow visualization data. arXiv:1808.04327 (2018)
Schwab, C., Zech, J.: Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. Appl. 17(01), 19–55 (2019)
Shin, Y., Darbon, J., Karniadakis, G.E.: On the convergence and generalization of physics informed neural networks. arXiv:2004.01806 (2020)
Shin, Y., Zhang, Z., Karniadakis, G.E.: Error estimates of residual minimization using neural networks for linear equations. arXiv:2010.08019 (2020)
Tanios, R.: Physics informed neural networks in computational finance: high-dimensional forward and inverse option pricing. Master’s thesis, ETH Zürich. https://www.research-collection.ethz.ch/handle/20.500.11850/491556 (2021)
Yang, L., Meng, X., Karniadakis, G.E.: B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 425, 109913 (2021)
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich. This study was supported by ETH Zürich.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by: Carola-Bibiane Schoenlieb
Availability of data and material
NA
Code availability
NA
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Additional material for Section 3
1.1 A.1 Sobolev spaces
Let \(d\in \mathbb {N}\), \(k\in \mathbb {N}_{0}\), \(1\leq p\leq \infty \) and let \({{\varOmega }} \subseteq \mathbb {R}^{d}\) be open. For a function \(f:{{\varOmega }}\to \mathbb {R}\) and a (multi-)index \(\alpha \in \mathbb {N}^{d}_{0}\) we denote by
$$ D^{\alpha} f = \frac{\partial^{|{\alpha}|} f}{\partial x_{1}^{\alpha_{1}}\cdots \partial x_{d}^{\alpha_{d}}}, \qquad |{\alpha}| = \alpha_{1}+\cdots+\alpha_{d}, $$
the classical or distributional (i.e. weak) derivative of f. We denote by \(L^{p}({{\varOmega }})\) the usual Lebesgue space and we define the Sobolev space \(W^{k,p}({{\varOmega }})\) as
$$ W^{k,p}({{\varOmega }}) = \left\{ f\in L^{p}({{\varOmega }}) : D^{\alpha} f \in L^{p}({{\varOmega }}) \text{ for all } \alpha\in\mathbb{N}^{d}_{0} \text{ with } |{\alpha}|\leq k \right\}. $$
For \(p<\infty \), we define the following seminorms on \(W^{k,p}({{\varOmega }})\),
$$ |{f}|_{W^{m,p}({{\varOmega }})} = \left( \sum\limits_{|{\alpha}|=m} \left\|{D^{\alpha} f}\right\|^{p}_{L^{p}({{\varOmega }})} \right)^{1/p} \quad \text{for } m=0,\ldots,k, $$
and for \(p=\infty \) we define
$$ |{f}|_{W^{m,\infty}({{\varOmega }})} = \max\limits_{|{\alpha}|=m} \left\|{D^{\alpha} f}\right\|_{L^{\infty}({{\varOmega }})} \quad \text{for } m=0,\ldots,k. $$
Based on these seminorms, we can define the following norm for \(p<\infty \),
$$ \|{f}\|_{W^{k,p}({{\varOmega }})} = \left( \sum\limits_{m=0}^{k} |{f}|^{p}_{W^{m,p}({{\varOmega }})} \right)^{1/p}, $$
and for \(p=\infty \) we define the norm
$$ \|{f}\|_{W^{k,\infty}({{\varOmega }})} = \max\limits_{0\leq m\leq k} |{f}|_{W^{m,\infty}({{\varOmega }})}. $$
The space Wk,p(Ω) equipped with the norm \(\|{\cdot }\|_{W^{k,p}({{\varOmega }})}\) is a Banach space.
We denote by Ck(Ω) the space of functions that are k times continuously differentiable and equip this space with the norm \(\|{f}\|_{C^{k}({{\varOmega }})} = \|{f}\|_{W^{k,\infty }({{\varOmega }})}\).
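As a short added illustration of these definitions: for \(f(x) = |{x}|\) on \({{\varOmega }} = (-1,1)\), the weak derivative is \(D^{1}f(x) = \operatorname{sign}(x) \in L^{\infty}(-1,1)\), so that \(f\in W^{1,\infty}({{\varOmega }})\) with
$$ \|{f}\|_{W^{1,\infty}({{\varOmega }})} = \max\left\{ \|{f}\|_{L^{\infty}(-1,1)},\ \|{D^{1}f}\|_{L^{\infty}(-1,1)} \right\} = 1, $$
even though \(f\notin C^{1}({{\varOmega }})\).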
1.2 A.2 Auxiliary results
We introduce some results related to the analysis of stochastic differential equations and random variables in general. We start by introducing some notation, cf. [2, Definition 2.1.3].
Definition 1
Let \(({{\varOmega }}, \mathcal {F}, \mu )\) be a measure space and let q > 0. For every \(\mathcal {F}/{\mathscr{B}}(\mathbb {R}^{d})\)-measurable function \(f:{{\varOmega }}\to \mathbb {R}^{d}\), we define
$$ \left\|{f}\right\|_{\mathcal{L}^{q}(\mu, \left\|{\cdot}\right\|)} = \left( {\int}_{{{\varOmega }}} \left\|{f(\omega)}\right\|^{q}\, \mu(d\omega) \right)^{1/q}. $$
Lemma 1
Let \(p\in [2,\infty )\), \(d,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) be a probability space, and let \(X_{i}:{{\varOmega }}\to \mathbb {R}^{d}, i\in \{1,\ldots , m\}\), be i.i.d. random variables with \(\mathbb {E}\left [{\left \|{X_{1}}\right \|}\right ]<\infty \). Then, it holds that
Proof 1
This result is [8, Corollary 2.5]. □
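Reading Lemma 1 (via [8, Corollary 2.5]) as a bound on the \(\mathcal{L}^{p}\) distance between the empirical mean of m i.i.d. samples and their expectation, the following added sketch checks the expected \(m^{-1/2}\) decay numerically; the distribution and the value of p are arbitrary illustrative choices.

```python
# Numerical illustration of the Monte Carlo rate behind Lemma 1 (added sketch, not from the paper):
# the L^p distance between the sample mean of m i.i.d. variables and their expectation
# should decay like m^{-1/2}, so sqrt(m) * error stays roughly constant.
import numpy as np

rng = np.random.default_rng(0)
p, trials = 4, 2000
for m in [10, 100, 1000, 10000]:
    X = rng.exponential(scale=1.0, size=(trials, m))   # arbitrary distribution with mean 1
    err = np.abs(X.mean(axis=1) - 1.0)                  # |empirical mean - E[X_1]|, one value per trial
    lp = (err ** p).mean() ** (1.0 / p)                 # Monte Carlo estimate of the L^p norm
    print(f"m = {m:6d}   L^{p} error ≈ {lp:.4f}   sqrt(m) * error ≈ {np.sqrt(m) * lp:.3f}")
```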
Lemma 2
Let \(p\in [2,\infty )\), \(q,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) and \((\mathcal {D}, \mathcal {A}, \mu )\) be probability spaces, and let for every \(q\in \mathcal {D}\) the maps \({X_{i}^{q}}:{{\varOmega }}\to \mathbb {R}, i\in \{1,\ldots , m\}\), be i.i.d. random variables with \(\mathbb {E}\left [{|{{X^{q}_{1}}}|}\right ]<\infty \). Then, it holds that
Proof 2
The proof involves Hölder’s inequality, Fubini’s theorem and Lemma 1. The calculation is as in [8, Eq. (226)]. □
Lemma 3
Let 𝜖 > 0, let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) be a probability space, and let \(X:{{\varOmega }}\to \mathbb {R}\) be a random variable that satisfies \(\mathbb {E}\left [{|{X}|}\right ]\leq \epsilon \). Then, it holds that \(\mathbb {P}(|{X}|\leq \epsilon )>0\).
Proof 3
This result is [8, Proposition 3.3]. □
Lemma 4 (Lévy’s modulus of continuity)
For (Bt)t∈[0,1] a Brownian motion, it holds almost surely that
$$ \limsup\limits_{h\to 0^{+}}\ \sup\limits_{0\leq t\leq 1-h} \frac{|{B_{t+h}-B_{t}}|}{\sqrt{2h\log(1/h)}} = 1. $$
Proof 4
This result is due to [20] and can be found in most probability theory textbooks. □
Lemma 5
Let T > 0, p ≥ 2, \(d,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, P, (\mathbb {F}_{t})_{t\in [0,T]})\) be a stochastic basis and let \(W:[0,T]\times {{\varOmega }}\to \mathbb {R}^{m}\) be a standard m-dimensional Brownian motion on \(({{\varOmega }}, \mathcal {F}, P, (\mathbb {F}_{t})_{t\in [0,T]})\). Let \(\lambda \in {\mathscr{L}}^{p}(P|_{\mathbb {F}_{0}}, \|{\cdot }\|_{\mathbb {R}^{d}})\) and let \(\mu :\mathbb {R}^{d}\to \mathbb {R}^{d}\) and \(\sigma :\mathbb {R}^{d}\to \mathbb {R}^{d\times m}\) be affine functions. Then, there exists an up to indistinguishability unique \((\mathbb {F}_{t})_{t\in [0,T]}\)-adapted stochastic process \(X^{\lambda }:[0,T]\times {{\varOmega }}\to \mathbb {R}^{d}\), which satisfies
1. that for all t ∈ [0,T] it holds P-a.s. that
$$ X^{\lambda}_{t} = \lambda +{{\int}_{0}^{t}}\mu\left( X^{\lambda}_{s}\right)ds + {{\int}_{0}^{t}}\sigma\left( X^{\lambda}_{s}\right)dW_{s}, \tag{A.11} $$
2. that \(\sup _{t\in [0,T]}\left \|{X^{\lambda }_{t}}\right \|_{{\mathscr{L}}^{p}(P, \|{\cdot }\|_{\mathbb {R}^{d}})}<\infty \),
3. that for all \(\alpha \in (0,\frac {1}{2}]\),
$$ \underset{s<t}{\underset{s,t \in [0,T],}{\sup}} \frac{\left\|{X^{\lambda}_{s}-X^{\lambda}_{t}}\right\|_{\mathcal{L}^{p}(P, \|{\cdot}\|_{\mathbb{R}^{d}})}}{|{s-t}|^{\alpha}} < \infty, \tag{A.12} $$
4. that for all \(x\in \mathbb {R}^{d}\), t ∈ [0,T] and ω ∈ Ω it holds that
$$ {X^{x}_{t}}(\omega) = \sum\limits_{i=1}^{d} \left( X^{e_{i}}_{t}(\omega)-{X^{0}_{t}}(\omega)\right)x_{i} + {X^{0}_{t}}(\omega). \tag{A.13} $$
Proof 5
Properties (1)-(3) are proven in [2, Theorem 4.5.1]. Property (4) follows from Proposition 2.20 in [8] and Lemma 3.4 in [3]. □
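Property (4) is also easy to check numerically: for affine μ and σ and a shared Brownian path, the Euler–Maruyama scheme is itself affine in the initial value, so the identity (A.13) holds exactly for the discretized process. The following added sketch (with arbitrary illustrative coefficients) demonstrates this.

```python
# Numerical check of Property (4) of Lemma 5 (added sketch): for affine mu, sigma and a
# *shared* Brownian path, X^x_T = sum_i (X^{e_i}_T - X^0_T) x_i + X^0_T.
import numpy as np

rng = np.random.default_rng(1)
d, m, T, n_steps = 2, 2, 1.0, 1000
dt = T / n_steps
A, a = rng.standard_normal((d, d)), rng.standard_normal(d)    # mu(x) = A x + a
S = rng.standard_normal((d + 1, d, m))                        # sigma(x) = S[0] + sum_i x_i S[i+1]
mu = lambda x: A @ x + a
sigma = lambda x: S[0] + np.tensordot(x, S[1:], axes=(0, 0))

def euler_maruyama(x0, dW):
    """Euler-Maruyama approximation of X^{x0}_T driven by the Brownian increments dW."""
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + mu(x) * dt + sigma(x) @ dW[k]
    return x

dW = np.sqrt(dt) * rng.standard_normal((n_steps, m))          # one shared Brownian path
x = np.array([0.7, -1.3])
X0 = euler_maruyama(np.zeros(d), dW)
Xe = [euler_maruyama(np.eye(d)[i], dW) for i in range(d)]
affine = sum((Xe[i] - X0) * x[i] for i in range(d)) + X0
print(np.max(np.abs(euler_maruyama(x, dW) - affine)))         # ~ 1e-15: the scheme is affine in x0
```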
Lemma 6
Let \(h: \mathbb {R}\to \mathbb {R} :x\mapsto \min \limits \{1,\max \limits \{0,x\}\}\). For every N ≥ 2 and 𝜖,γ > 0 there exists a tanh neural network \(\hat {h}\) with two hidden layers, \(\mathcal {O}\left (N^{\frac {1}{2(1-\gamma )}}\epsilon ^{\frac {-3}{1-\gamma }}\right )\) neurons and weights growing as \(\mathcal {O}\left (N^{\frac {1}{(1-\gamma )}}\epsilon ^{\frac {-6}{1-\gamma }}\right )\) such that
Proof 6
We first approximate h with a function \(\tilde {h}\) that is twice continuously differentiable,
It is easy to prove that \(\left \|{h-\tilde {h}}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}(\epsilon ^{2})\). Next, we calculate the derivative of \(\tilde {h}\),
A straightforward calculation leads to the bound \(\left \|{h^{\prime }-\tilde {h}^{\prime }}\right \|_{L^{2}(\mathbb {R})} = \mathcal {O}(\epsilon )\). Finally, one can easily check that \(\tilde {h}^{\prime \prime }\) is continuous and that \(\left \|{\tilde {h}^{\prime \prime }}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}(\epsilon ^{-2})\). An application of [5, Theorem 5.1] to \(\tilde {h}\) gives us, for every γ > 0 and \(\mathcal {N}\) large enough, the existence of a tanh neural network \(\hat {h}^{\mathcal {N}}\) with two hidden layers and \(\mathcal {O}(\mathcal {N})\) neurons for which it holds that \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{W^{1,\infty }([-1,2])} = \mathcal {O}(\mathcal {N}^{-1+\gamma }\epsilon ^{-2})\). Because of the nature of the construction of \(\hat {h}^{\mathcal {N}}\), the monotone behaviour of the hyperbolic tangent towards infinity and the fact that \(\tilde {h}\) is constant outside [− 1, 2], the stronger result \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{W^{1,\infty }(\mathbb {R})} = \mathcal {O}\left (\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\) holds automatically as well. As a result, we find that \(\left \|{\left (\hat {h}^{\mathcal {N}}\right )^{\prime } }\right \|_{L^{\infty }(\mathbb {R})}\leq 2\), \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}\left (\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\) and \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{L^{2}([-N,N])} = \mathcal {O}\left (\sqrt {N}\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\). If we choose \(\mathcal {N}\sim N^{\frac {1}{2(1-\gamma )}}\epsilon ^{\frac {-3}{1-\gamma }}\) then we find that
Moreover, [5, Theorem 5.1] tells us that the weights of \(\hat {h}^{\mathcal {N}}\) grow as \(\mathcal {O}(\mathcal {N}^{2}) = \mathcal {O}\left (N^{\frac {1}{(1-\gamma )}}\epsilon ^{\frac {-6}{1-\gamma }}\right )\). The statement then follows from applying the triangle inequality. □
Appendix B: Lipschitz continuity in the parameter vector of a neural network and its derivatives
In this section, we will prove that for any x ∈ D, a neural network and its corresponding Jacobian and Hessian matrix are Lipschitz continuous in the parameter vector. This property is of crucial importance for finding bounds on the generalization error of physics-informed neural networks, cf. Section 4. We first introduce some notation and then state our results. The main results of this section are Lemmas 11 and 13.
Let \(\sigma :\mathbb {R}\to \mathbb {R}\) be an (at least) twice continuously differentiable activation function, such as tanh or the sigmoid. For any \(n\in \mathbb {N}\) and \(x\in \mathbb {R}^{n}\), we write σ(x) := (σ(x1),…,σ(xn)). We use the definition of a neural network as in Definition 1.
Recall that for a differentiable function \(f:\mathbb {R}^{n}\to \mathbb {R}^{m}\) the Jacobian matrix J[f] is defined by
For our purposes, we make the following convention. For any 1 ≤ k ≤ L, we define
Similarly, for a twice differentiable function \(g:\mathbb {R}^{n}\to \mathbb {R}\) the Hessian matrix is defined by
Slightly abusing notation, we generalize this to vector-valued functions \(g:\mathbb {R}^{n}\to \mathbb {R}^{m}\). We write
where we identify \(\mathbb {R}^{1\times n \times n}\) with \(\mathbb {R}^{n \times n}\) to make the definitions consistent. Similarly, if \(v\in \mathbb {R}^{1 \times m}\), then v ⋅ H[g] should be interpreted as
For any 1 ≤ k < L, we write
Finally, we will use the notation J𝜃 := J[Ψ𝜃] and H𝜃 := H[Ψ𝜃]. The following lemma presents a generalized version of the chain rule.
Lemma 7
Let \(f:\mathbb {R}^{n}\to \mathbb {R}^{m}\) and \(g:\mathbb {R}^{m}\to \mathbb {R}\). Then, it holds that
We now apply this formula to find an expression for H𝜃 in terms of \( J^{\theta }_{k}\) and \( H^{\theta }_{k}\).
Lemma 8
It holds that
Proof 7
The first statement is just the chain rule for calculating the derivative of a composite function. We prove the second statement using induction. For the base step, let L = 1. Then, \({{\varPsi }}^{\theta } = f_{L}^{\theta }\) and we have \(H[{{\varPsi }}^{\theta }] = H^{\theta }_{L}\). For the induction step, take \(K\in \mathbb {N}, K\geq 2\) and assume that the statement holds for L = K − 1. Now, let \({{\varPhi }}^{\theta } = f_{K}^{\theta }\circ {\cdots } \circ f_{2}^{\theta }\) and \({{\varPsi }}^{\theta } = {{\varPhi }}^{\theta } \circ f_{1}^{\theta }\). Applying the generalized chain rule to calculate \(H[{{\varPhi }}^{\theta } \circ f_{1}^{\theta }]\) and using the induction hypothesis on H[Φ𝜃] gives the desired result. □
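As an added numerical sanity check of the generalized chain rule in Lemma 7 (with hypothetical test functions f and g, chosen only for illustration), one can verify that \(H[g\circ f](x) = J[f](x)^{T} H[g](f(x)) J[f](x) + {\sum }_{i=1}^{m} \partial_{i} g(f(x)) H[f_{i}](x)\) holds up to floating-point error:

```python
# Numerical check of the second-order (generalized) chain rule (added sketch with
# hypothetical test functions): for f: R^n -> R^m and g: R^m -> R,
#   H[g o f](x) = J[f](x)^T H[g](f(x)) J[f](x) + sum_i (d_i g)(f(x)) H[f_i](x).
import torch
from torch.autograd.functional import jacobian, hessian

n, m = 3, 2
f = lambda x: torch.stack([torch.tanh(x).sum(), (x ** 2).prod()])   # f: R^3 -> R^2
g = lambda y: torch.sin(y[0]) * y[1] + y[1] ** 3                     # g: R^2 -> R

x = torch.tensor([0.3, -0.7, 1.1], dtype=torch.float64)
y = f(x).detach()
Jf = jacobian(f, x)                                                  # (m, n) Jacobian of f at x
Hg = hessian(g, y)                                                   # (m, m) Hessian of g at f(x)
dg = jacobian(g, y)                                                  # (m,)  gradient of g at f(x)
Hf = torch.stack([hessian(lambda z, i=i: f(z)[i], x) for i in range(m)])  # (m, n, n)

lhs = hessian(lambda z: g(f(z)), x)
rhs = Jf.T @ Hg @ Jf + torch.einsum('i,ijk->jk', dg, Hf)
print(torch.allclose(lhs, rhs))                                      # True
```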
Next, we formally introduce the element-wise supremum norm \({\left \vert \cdot \right \vert _{\infty }}\). Let \(N\in \mathbb {N}\), \(n_{0}, {\ldots } n_{N}\in \mathbb {N}\) and \(A\in \mathbb {R}^{n_{1}\times {\cdots } \times n_{N}}\). Then, we define
Let R > 0 and suppose that \(A_{i}\in \mathbb {R}^{n_{i-1} \times n_{i}}\). Then, it holds that
Moreover, for \(v\in \mathbb {R}^{1 \times a}\) and \(A\in \mathbb {R}^{a\times b \times c}\) it holds that \({\left \vert v\cdot A \right \vert _{\infty }} \leq a {\left \vert v \right \vert _{\infty }}{\left \vert A \right \vert _{\infty }}\).
The following lemma states that the output of each layer of a neural network is Lipschitz continuous in the parameter vector for any input x ∈ [a,b]d. The lemma is stated for neural networks with a differentiable activation function, but can easily be adapted to, e.g., ReLU neural networks.
Lemma 9
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty }\}\). Then, it holds for 1 ≤ K ≤ L that
Proof 8
Let l0,…,lL denote the widths of the neural network, where l0 = d. Let x ∈ [a,b]d be arbitrary. First of all, it holds that
Now, let 2 ≤ k ≤ L and define \(y = \left (f_{k-1}^{\theta }\circ {\cdots } \circ f_{1}^{\theta }\right )(x)\) and \(\tilde {y} = (f_{k-1}^{\vartheta }\circ {\cdots } \circ f_{1}^{\vartheta })(x)\). We find that
A recursive application of this inequality then gives us for 1 ≤ K ≤ L that
where we used that β(Wα + 1)/(WRβ − 1) ≤ β(2α + 1)/(2Rβ − 1) ≤ 3α when W ≥ 2,R ≥ 1,α ≥ 1. □
Lemma 10
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty }\}\). Then, it holds for all 1 ≤ k ≤ L and x ∈ [a,b]d that
Proof 9
Let \({w_{i}^{T}}\) be the i th row of W𝜃,k, let \(\tilde {w}_{i}^{T}\) be the i th row of W𝜗,k and set b := b𝜃,k and \(\tilde {b}:=b^{\vartheta , k}\). Let \(F = f_{k-1}^{\theta }\circ {\cdots } \circ f_{1}^{\theta }\) and \(\tilde {F} = f_{k-1}^{\vartheta }\circ {\cdots } \circ f_{1}^{\vartheta }\). For 1 ≤ i ≤ lk, we have that
and analogously for \(J^{\vartheta }_{k}(x)_{i}\) and \(H^{\vartheta }_{k}(x)_{i}\). The triangle inequality and the Lipschitz continuity of \(\sigma ^{\prime }\) gives us that
Using that \({\left \vert F(x)-\tilde {F}(x) \right \vert _{\infty }} \leq \alpha (d+4) W^{k-2} R^{k-2}\beta ^{k-1}{\left \vert \theta -\vartheta \right \vert _{\infty }}\) (Lemma 9) for k ≥ 2 and \(l_{k-1}\leq W\), we get
for k ≥ 2. One can check that the inequality also holds for k = 1.
For the Hessian matrix, the triangle inequality and the Lipschitz continuity of \(\sigma ^{\prime \prime }\) gives us that
Using Lemma 9 again, we get
for k ≥ 2. One can check that the inequality also holds for k = 1. □
The following lemma states that the Jacobian and Hessian matrix of a neural network are Lipschitz continuous in the parameter vector for any input x ∈ [a,b]d.
Lemma 11
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty }\}\). Then, it holds that for all x ∈ [a,b]d that
Proof 10
We will prove the formulas by repeatedly using the triangle inequality and the representations proven in Lemma 10. To do so, we need to introduce some notation. Define for 0 ≤ l ≤ L + k − 1 the object \(\phi^{k,l} \in \{\theta,\vartheta\}^{2L}\) such that
In particular, \(\phi ^{k,0}_{j} = \theta \) and \(\phi ^{k,{L+k-1}}_{j} = \vartheta \) for all j. To simplify notation, we write
The triangle inequality and Lemma 10 then give that
Observe that \(A^{k,l-1}_{j}-A^{k,l}_{j} = 0\) for j≠l. Therefore,
From Lemma 12, it follows that
Writing γ := 1 + R(αW + 1) we get
In an entirely similar fashion we obtain
□
Appendix C: Additional material for Section 4
Lemma 12 (Hoeffding’s inequality)
Let 𝜖,c > 0, \(N\in \mathbb {N}\), let \(({{\varOmega }},\mathcal {A},\mathbb {P})\) be a probability space and let \(X_{n} : {{\varOmega }} \to [0,c]\), \(n\in \{1,\ldots,N\}\), be independent random variables. Then, it holds that
$$ \mathbb{P}\left( \left|{ \frac{1}{N}\sum\limits_{n=1}^{N} \left( X_{n} - \mathbb{E}\left[{X_{n}}\right] \right) }\right| \geq \epsilon \right) \leq 2\exp\left( -\frac{2N\epsilon^{2}}{c^{2}} \right). $$
Lemma 13
Let \(x\in \mathbb {R}\) and \( \sigma (x) = \tanh (x) = \frac {e^{x}-e^{-x}}{e^{x}+e^{-x}}\). It holds that \(\sigma ^{\prime }(x) = 1-(\sigma (x))^{2}\) and \(\sigma ^{\prime \prime }(x) = -2\sigma (x)\left(1-(\sigma (x))^{2}\right)\). In addition, it holds that \(\|{\sigma }^{\prime }\|_{\infty } = 1\), \(\left \|{\sigma ^{\prime \prime }}\right \|_{\infty } = \frac{4}{3\sqrt {3}} \leq 1\) and \(\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty } = 2\).
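The formulas and bounds of Lemma 13 are easy to confirm numerically; the following added sketch evaluates the closed-form derivatives of tanh on a fine grid.

```python
# Quick numerical confirmation of Lemma 13 (added sketch): derivatives and sup norms of tanh.
import numpy as np

x = np.linspace(-10, 10, 200001)
s = np.tanh(x)
d1 = 1 - s ** 2                            # sigma'
d2 = -2 * s * (1 - s ** 2)                 # sigma''
d3 = -2 * (1 - s ** 2) * (1 - 3 * s ** 2)  # sigma'''
print(np.abs(d1).max(), np.abs(d2).max(), np.abs(d3).max())
# ~ 1.0,  4/(3*sqrt(3)) ≈ 0.7698,  2.0  -- matching the stated sup norms
```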
The following lemma provides estimates on the various PINN residuals. It is based on the fact that neural networks and their derivatives are Lipschitz continuous in the parameter vector; the proof of this fact can be found in Appendix B.
Lemma 14
Let \(d, L, W\in \mathbb {N}\), R ≥ 1 and let \(u_{\theta }:[0,1]^{d}\to \mathbb {R}\), 𝜃 ∈Θ, be tanh neural networks with at most L − 1 hidden layers, width at most W, and weights and biases bounded by R. Let the PINN generalization \(\mathcal {E}_{G}^{q}\) and training \(\mathcal {E}_{T}^{q}\) errors be defined as in Section 2.3 for linear Kolmogorov PDEs (cf. Section 2.1). Assume that \(\max \limits \{\left \|{\varphi }\right \|_{\infty }, \left \|{\psi }\right \|_{\infty }\}\leq \max \limits _{\theta \in {{\varTheta }}}\left \|{u_{\theta }}\right \|_{\infty }\). Let \(\mathfrak {L}^{q}_{Q}\) denote the Lipschitz constant of \(\mathcal {E}^{q}_{Q}\), for q = i,t,s and Q = G,T. Then, it holds that
Proof 11
Without loss of generality, we only focus on \(\mathcal {E}_G^{q}\), for q = i,s,t. We see for q = i,t,s
For q = t,s and (x,t) ∈ D × [0,T], it follows from Lemma 11 that
and similarly using Lemma 13 that
where we let |⋅|p denote the vector p-norm of the vectorized version of a general tensor (cf. (B.9)). Next, we calculate using again Lemma 13 (by setting 𝜗 = 0) and \(\max \limits \{\left \|{\varphi }\right \|_{\infty }, \left \|{\psi }\right \|_{\infty }\}\leq \max \limits _{\theta \in {{\varTheta }}}\left \|{u_{\theta }}\right \|_{\infty }\) for q = t,s that
where \(C=\max \limits _{x\in D}(1+|{\mu (x)}|_{1}+|{\sigma (x)\sigma (x)^{*}}|_{1})\). Combining all the previous results proves the stated bound. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
De Ryck, T., Mishra, S. Error analysis for physics-informed neural networks (PINNs) approximating Kolmogorov PDEs. Adv Comput Math 48, 79 (2022). https://doi.org/10.1007/s10444-022-09985-9