Abstract
Physics-informed neural networks approximate solutions of PDEs by minimizing pointwise residuals. We derive rigorous bounds on the error incurred by PINNs in approximating the solutions of a large class of linear parabolic PDEs, namely Kolmogorov equations, which include the heat equation and the Black–Scholes equation of option pricing as examples. We construct neural networks whose PINN residual (generalization error) can be made as small as desired. We also prove that the total L2-error can be bounded by the generalization error, which in turn is bounded in terms of the training error, provided that a sufficient number of randomly chosen training (collocation) points is used. Moreover, we prove that the size of the PINNs and the number of training samples only grow polynomially with the underlying dimension, enabling PINNs to overcome the curse of dimensionality in this context. These results enable us to provide a comprehensive error analysis for PINNs in approximating Kolmogorov PDEs.
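As an added illustration of the residual-minimization principle described above (not the construction analysed in the paper), the following minimal sketch trains a PINN for the d-dimensional heat equation, the simplest Kolmogorov PDE considered here; the architecture, optimizer and collocation sampling are placeholder choices, and the spatial boundary residual is omitted for brevity.

```python
# Minimal PINN sketch for the heat equation u_t = 0.5 * Laplacian(u) on [0,T] x [0,1]^d,
# a prototypical Kolmogorov PDE. Illustrative only: all hyperparameters are placeholders.
import torch

d, T = 2, 1.0
phi = lambda x: torch.sin(x).prod(dim=1, keepdim=True)      # hypothetical initial condition

net = torch.nn.Sequential(                                    # tanh network u_theta(t, x)
    torch.nn.Linear(d + 1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def interior_residual(t, x):
    """Pointwise PDE residual u_t - 0.5 * Laplacian_x u at the collocation points."""
    t = t.requires_grad_(True); x = x.requires_grad_(True)
    u = net(torch.cat([t, x], dim=1))
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    grad_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = torch.zeros_like(u)
    for i in range(d):                                         # trace of the spatial Hessian
        lap = lap + torch.autograd.grad(grad_x[:, i].sum(), x, create_graph=True)[0][:, i:i+1]
    return u_t - 0.5 * lap

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    # randomly chosen collocation points ("training points" in the error analysis)
    t_int, x_int = T * torch.rand(256, 1), torch.rand(256, d)
    t_ic, x_ic = torch.zeros(256, 1), torch.rand(256, d)
    loss = interior_residual(t_int, x_int).square().mean() \
         + (net(torch.cat([t_ic, x_ic], dim=1)) - phi(x_ic)).square().mean()
    opt.zero_grad(); loss.backward(); opt.step()               # minimize the training error
```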
References
Bai, G., Koley, U., Mishra, S., Molinaro, R.: Physics informed neural networks (PINNs) for approximating nonlinear dispersive PDEs. arXiv:2104.05584 (2021)
Barth, A., Jentzen, A., Lang, A., Schwab, C.: Numerical analysis of stochastic ordinary differential equations. ETH Zürich (2018)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020)
Chen, T., Chen, H.: Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans. Neural Netw. 6(4), 911–917 (1995)
De Ryck, T., Lanthaler, S., Mishra, S.: On the approximation of functions by tanh neural networks. Neural Netw. 143, 732–750 (2021)
Dissanayake, M., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. (1994)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017)
Grohs, P., Hornung, F., Jentzen, A., Von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. arXiv:1809.02362 (2018)
Gühring, I., Kutyniok, G., Petersen, P.: Error bounds for approximations with deep ReLU neural networks in Ws,p norms. Anal. Appl. 18(05), 803–859 (2020)
Gühring, I., Raslan, M.: Approximation rates for neural networks with encodable weights in smoothness spaces. Neural Netw. 134, 107–130 (2021)
Hiptmair, R., Schwab, C.: Numerical methods for elliptic and parabolic boundary value problems. ETH Zürich (2008)
Hornung, F., Jentzen, A., Salimova, D.: Space-time deep neural network approximations for high-dimensional partial differential equations. arXiv:2006.02199 (2020)
Jagtap, A. D., Karniadakis, G. E.: Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 28(5), 2002–2041 (2020)
Jagtap, A. D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020)
Klebaner, F. C.: Introduction to stochastic calculus with applications. World Scientific Publishing Company (2012)
Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. pp. 1–53 (2021)
Lagaris, I.E., Likas, A.C., Papageorgiou, D.G.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11(5), 1041–1049 (2000)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
Lanthaler, S., Mishra, S., Karniadakis, G.E.: Error estimates for DeepONets: a deep learning framework in infinite dimensions (2022)
Lévy, P.: Théorie de l’addition des variables aléatoires. Gauthier-Villars (1954)
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier neural operator for parametric partial differential equations (2020)
Lu, L., Jin, P., Karniadakis, G. E.: DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv:1910.03193 (2019)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Lye, K. O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. p. 109339 (2020)
Lye, K. O., Mishra, S., Ray, D., Chandrashekar, P.: Iterative surrogate model optimization (ISMO): An active learning algorithm for PDE constrained optimization with deep neural networks. Comput. Methods Appl. Mech. Eng. 374, 113575 (2021)
Mao, Z., Jagtap, A.D., Karniadakis, G.E.: Physics-informed neural networks for high-speed flows. Comput. Methods Appl. Mech. Eng. 360, 112789 (2020)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. (2021)
Mishra, S., Molinaro, R.: Physics informed neural networks for simulating radiative transfer. J. Quant. Spectrosc. Radiat. Transf. 270, 107705 (2021)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics informed neural networks (PINNs) for approximating PDEs. IMA J. Numer. Anal. (2022)
Mishra, S., Molinaro, R., Tanios, R.: Physics informed neural networks for option pricing. In preparation (2021)
Øksendal, B.: Stochastic differential equations. Springer, New York (2003)
Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 41, A2603–A2626 (2019)
Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Raissi, M., Yazdani, A., Karniadakis, G.E.: Hidden fluid mechanics: a Navier-Stokes informed deep learning framework for assimilating flow visualization data. arXiv:1808.04327 (2018)
Schwab, C., Zech, J.: Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. Appl. 17(01), 19–55 (2019)
Shin, Y., Darbon, J., Karniadakis, G.E.: On the convergence and generalization of physics informed neural networks. arXiv:2004.01806 (2020)
Shin, Y., Zhang, Z., Karniadakis, G.E.: Error estimates of residual minimization using neural networks for linear equations. arXiv:2010.08019 (2020)
Tanios, R.: Physics informed neural networks in computational finance: high-dimensional forward and inverse option pricing. Master’s thesis, ETH Zürich. https://www.research-collection.ethz.ch/handle/20.500.11850/491556 (2021)
Yang, L., Meng, X., Karniadakis, G.E.: B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 425, 109913 (2021)
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich. This study was supported by ETH Zürich.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by: Carola-Bibiane Schoenlieb
Availability of data and material
NA
Code availability
NA
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Additional material for Section 3
1.1 A.1 Sobolev spaces
Let \(d\in \mathbb {N}\), \(k\in \mathbb {N}_{0}\), \(1\leq p\leq \infty \) and let \({{\varOmega }} \subseteq \mathbb {R}^{d}\) be open. For a function \(f:{{\varOmega }}\to \mathbb {R}\) and a (multi-)index \(\alpha \in \mathbb {N}^{d}_{0}\) we denote by
$$ D^{\alpha} f = \frac{\partial^{|{\alpha}|} f}{\partial x_{1}^{\alpha_{1}}\cdots \partial x_{d}^{\alpha_{d}}}, \qquad |{\alpha}| = \alpha_{1}+\cdots+\alpha_{d}, $$
the classical or distributional (i.e. weak) derivative of f. We denote by \(L^{p}({{\varOmega }})\) the usual Lebesgue space and we define the Sobolev space \(W^{k,p}({{\varOmega }})\) as
$$ W^{k,p}({{\varOmega }}) = \left\{ f\in L^{p}({{\varOmega }}) : D^{\alpha} f \in L^{p}({{\varOmega }}) \text{ for all } \alpha\in\mathbb{N}^{d}_{0} \text{ with } |{\alpha}|\leq k \right\}. $$
For \(p<\infty \), we define the following seminorms on \(W^{k,p}({{\varOmega }})\),
$$ |{f}|_{W^{m,p}({{\varOmega }})} = \left( \sum\limits_{|{\alpha}|=m} \left\|{D^{\alpha} f}\right\|^{p}_{L^{p}({{\varOmega }})} \right)^{1/p} \quad \text{for } m=0,\ldots,k, $$
and for \(p=\infty \) we define
$$ |{f}|_{W^{m,\infty}({{\varOmega }})} = \max\limits_{|{\alpha}|=m} \left\|{D^{\alpha} f}\right\|_{L^{\infty}({{\varOmega }})} \quad \text{for } m=0,\ldots,k. $$
Based on these seminorms, we can define the following norm for \(p<\infty \),
$$ \|{f}\|_{W^{k,p}({{\varOmega }})} = \left( \sum\limits_{m=0}^{k} |{f}|^{p}_{W^{m,p}({{\varOmega }})} \right)^{1/p}, $$
and for \(p=\infty \) we define the norm
$$ \|{f}\|_{W^{k,\infty}({{\varOmega }})} = \max\limits_{0\leq m\leq k} |{f}|_{W^{m,\infty}({{\varOmega }})}. $$
The space Wk,p(Ω) equipped with the norm \(\|{\cdot }\|_{W^{k,p}({{\varOmega }})}\) is a Banach space.
We denote by Ck(Ω) the space of functions that are k times continuously differentiable and equip this space with the norm \(\|{f}\|_{C^{k}({{\varOmega }})} = \|{f}\|_{W^{k,\infty }({{\varOmega }})}\).
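As a short added illustration of these definitions: for \(f(x) = |{x}|\) on \({{\varOmega }} = (-1,1)\), the weak derivative is \(D^{1}f(x) = \operatorname{sign}(x) \in L^{\infty}(-1,1)\), so that \(f\in W^{1,\infty}({{\varOmega }})\) with
$$ \|{f}\|_{W^{1,\infty}({{\varOmega }})} = \max\left\{ \|{f}\|_{L^{\infty}(-1,1)},\ \|{D^{1}f}\|_{L^{\infty}(-1,1)} \right\} = 1, $$
even though \(f\notin C^{1}({{\varOmega }})\).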
1.2 A.2 Auxiliary results
We introduce some results related to the analysis of stochastic differential equations and random variables in general. We start by introducing some notation, cf. [2, Definition 2.1.3].
Definition 1
Let \(({{\varOmega }}, \mathcal {F}, \mu )\) be a measure space and let q > 0. For every \(\mathcal {F}/{\mathscr{B}}(\mathbb {R}^{d})\)-measurable function \(f:{{\varOmega }}\to \mathbb {R}^{d}\), we define
$$ \left\|{f}\right\|_{\mathcal{L}^{q}(\mu, \left\|{\cdot}\right\|)} = \left( {\int}_{{{\varOmega }}} \left\|{f(\omega)}\right\|^{q}\, \mu(d\omega) \right)^{1/q}. $$
Lemma 1
Let \(p\in [2,\infty )\), \(d,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) be a probability space, and let \(X_{i}:{{\varOmega }}\to \mathbb {R}^{d}, i\in \{1,\ldots , m\}\), be i.i.d. random variables with \(\mathbb {E}\left [{\left \|{X_{1}}\right \|}\right ]<\infty \). Then, it holds that
Proof 1
This result is [8, Corollary 2.5]. □
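Reading Lemma 1 (via [8, Corollary 2.5]) as a bound on the \(\mathcal{L}^{p}\) distance between the empirical mean of m i.i.d. samples and their expectation, the following added sketch checks the expected \(m^{-1/2}\) decay numerically; the distribution and the value of p are arbitrary illustrative choices.

```python
# Numerical illustration of the Monte Carlo rate behind Lemma 1 (added sketch, not from the paper):
# the L^p distance between the sample mean of m i.i.d. variables and their expectation
# should decay like m^{-1/2}, so sqrt(m) * error stays roughly constant.
import numpy as np

rng = np.random.default_rng(0)
p, trials = 4, 2000
for m in [10, 100, 1000, 10000]:
    X = rng.exponential(scale=1.0, size=(trials, m))   # arbitrary distribution with mean 1
    err = np.abs(X.mean(axis=1) - 1.0)                  # |empirical mean - E[X_1]|, one value per trial
    lp = (err ** p).mean() ** (1.0 / p)                 # Monte Carlo estimate of the L^p norm
    print(f"m = {m:6d}   L^{p} error ≈ {lp:.4f}   sqrt(m) * error ≈ {np.sqrt(m) * lp:.3f}")
```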
Lemma 2
Let \(p\in [2,\infty )\), \(q,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) and \((\mathcal {D}, \mathcal {A}, \mu )\) be probability spaces, and let for every \(q\in \mathcal {D}\) the maps \({X_{i}^{q}}:{{\varOmega }}\to \mathbb {R}, i\in \{1,\ldots , m\}\), be i.i.d. random variables with \(\mathbb {E}\left [{|{{X^{q}_{1}}}|}\right ]<\infty \). Then, it holds that
Proof 2
The proof involves Hölder’s inequality, Fubini’s theorem and Lemma 1. The calculation is as in [8, Eq. (226)]. □
Lemma 3
Let 𝜖 > 0, let \(({{\varOmega }}, \mathcal {F}, \mathcal {P})\) be a probability space, and let \(X:{{\varOmega }}\to \mathbb {R}\) be a random variable that satisfies \(\mathbb {E}\left [{|{X}|}\right ]\leq \epsilon \). Then, it holds that \(\mathbb {P}(|{X}|\leq \epsilon )>0\).
Proof 3
This result is [8, Proposition 3.3]. □
Lemma 4 (Lévy’s modulus of continuity)
For (Bt)t∈[0,1] a Brownian motion, it holds almost surely that
$$ \limsup\limits_{h\to 0^{+}}\ \sup\limits_{0\leq t\leq 1-h} \frac{|{B_{t+h}-B_{t}}|}{\sqrt{2h\log(1/h)}} = 1. $$
Proof 4
This result is due to [20] and can be found in most probability theory textbooks. □
Lemma 5
Let T > 0, p ≥ 2, \(d,m\in \mathbb {N}\), let \(({{\varOmega }}, \mathcal {F}, P, (\mathbb {F}_{t})_{t\in [0,T]})\) be a stochastic basis and let \(W:[0,T]\times {{\varOmega }}\to \mathbb {R}^{m}\) be a standard m-dimensional Brownian motion on \(({{\varOmega }}, \mathcal {F}, P, (\mathbb {F}_{t})_{t\in [0,T]})\). Let \(\lambda \in {\mathscr{L}}^{p}(P|_{\mathbb {F}_{0}}, \|{\cdot }\|_{\mathbb {R}^{d}})\) and let \(\mu :\mathbb {R}^{d}\to \mathbb {R}^{d}\) and \(\sigma :\mathbb {R}^{d}\to \mathbb {R}^{d\times m}\) be affine functions. Then, there exists an up to indistinguishability unique \((\mathbb {F}_{t})_{t\in [0,T]}\)-adapted stochastic process \(X^{\lambda }:[0,T]\times {{\varOmega }}\to \mathbb {R}^{d}\), which satisfies
1. that for all t ∈ [0,T] it holds P-a.s. that
$$ X^{\lambda}_{t} = \lambda +{{\int}_{0}^{t}}\mu\left( X^{\lambda}_{s}\right)ds + {{\int}_{0}^{t}}\sigma\left( X^{\lambda}_{s}\right)dW_{s}, \tag{A.11} $$
2. that \(\sup _{t\in [0,T]}\left \|{X^{\lambda }_{t}}\right \|_{{\mathscr{L}}^{p}(P, \|{\cdot }\|_{\mathbb {R}^{d}})}<\infty \),
3. that for all \(\alpha \in (0,\frac {1}{2}]\),
$$ \underset{s<t}{\underset{s,t \in [0,T],}{\sup}} \frac{\left\|{X^{\lambda}_{s}-X^{\lambda}_{t}}\right\|_{\mathcal{L}^{p}(P, \|{\cdot}\|_{\mathbb{R}^{d}})}}{|{s-t}|^{\alpha}} < \infty, \tag{A.12} $$
4. that for all \(x\in \mathbb {R}^{d}\), t ∈ [0,T] and ω ∈ Ω it holds that
$$ {X^{x}_{t}}(\omega) = \sum\limits_{i=1}^{d} \left( X^{e_{i}}_{t}(\omega)-{X^{0}_{t}}(\omega)\right)x_{i} + {X^{0}_{t}}(\omega). \tag{A.13} $$
Proof 5
Properties (1)-(3) are proven in [2, Theorem 4.5.1]. Property (4) follows from Proposition 2.20 in [8] and Lemma 3.4 in [3]. □
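Property (4) is also easy to check numerically: for affine μ and σ and a shared Brownian path, the Euler–Maruyama scheme is itself affine in the initial value, so the identity (A.13) holds exactly for the discretized process. The following added sketch (with arbitrary illustrative coefficients) demonstrates this.

```python
# Numerical check of Property (4) of Lemma 5 (added sketch): for affine mu, sigma and a
# *shared* Brownian path, X^x_T = sum_i (X^{e_i}_T - X^0_T) x_i + X^0_T.
import numpy as np

rng = np.random.default_rng(1)
d, m, T, n_steps = 2, 2, 1.0, 1000
dt = T / n_steps
A, a = rng.standard_normal((d, d)), rng.standard_normal(d)    # mu(x) = A x + a
S = rng.standard_normal((d + 1, d, m))                        # sigma(x) = S[0] + sum_i x_i S[i+1]
mu = lambda x: A @ x + a
sigma = lambda x: S[0] + np.tensordot(x, S[1:], axes=(0, 0))

def euler_maruyama(x0, dW):
    """Euler-Maruyama approximation of X^{x0}_T driven by the Brownian increments dW."""
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + mu(x) * dt + sigma(x) @ dW[k]
    return x

dW = np.sqrt(dt) * rng.standard_normal((n_steps, m))          # one shared Brownian path
x = np.array([0.7, -1.3])
X0 = euler_maruyama(np.zeros(d), dW)
Xe = [euler_maruyama(np.eye(d)[i], dW) for i in range(d)]
affine = sum((Xe[i] - X0) * x[i] for i in range(d)) + X0
print(np.max(np.abs(euler_maruyama(x, dW) - affine)))         # ~ 1e-15: the scheme is affine in x0
```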
Lemma 6
Let \(h: \mathbb {R}\to \mathbb {R} :x\mapsto \min \limits \{1,\max \limits \{0,x\}\}\). For every N ≥ 2 and 𝜖,γ > 0 there exists a tanh neural network \(\hat {h}\) with two hidden layers, \(\mathcal {O}\left (N^{\frac {1}{2(1-\gamma )}}\epsilon ^{\frac {-3}{1-\gamma }}\right )\) neurons and weights growing as \(\mathcal {O}\left (N^{\frac {1}{(1-\gamma )}}\epsilon ^{\frac {-6}{1-\gamma }}\right )\) such that
Proof 6
We first approximate h with a function \(\tilde {h}\) that is twice continuously differentiable,
It is easy to prove that \(\left \|{h-\tilde {h}}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}(\epsilon ^{2})\). Next, we calculate the derivative of \(\tilde {h}\),
A straightforward calculation leads to the bound \(\left \|{h^{\prime }-\tilde {h}^{\prime }}\right \|_{L^{2}(\mathbb {R})} = \mathcal {O}(\epsilon )\). Finally, one can easily check that \(\tilde {h}^{\prime \prime }\) is continuous and that \(\left \|{\tilde {h}^{\prime \prime }}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}(\epsilon ^{-2})\). An application of [5, Theorem 5.1] to \(\tilde {h}\) gives us, for every γ > 0 and \(\mathcal {N}\) large enough, the existence of a tanh neural network \(\hat {h}^{\mathcal {N}}\) with two hidden layers and \(\mathcal {O}(\mathcal {N})\) neurons for which it holds that \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{W^{1,\infty }([-1,2])} = \mathcal {O}(\mathcal {N}^{-1+\gamma }\epsilon ^{-2})\). Because of the nature of the construction of \(\hat {h}^{\mathcal {N}}\), the monotone behaviour of the hyperbolic tangent towards infinity and the fact that \(\tilde {h}\) is constant outside [− 1, 2], the stronger result \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{W^{1,\infty }(\mathbb {R})} = \mathcal {O}\left (\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\) holds automatically as well. As a result, we find that \(\left \|{\left (\hat {h}^{\mathcal {N}}\right )^{\prime } }\right \|_{L^{\infty }(\mathbb {R})}\leq 2\), \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{L^{\infty }(\mathbb {R})} = \mathcal {O}\left (\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\) and \(\left \|{\tilde {h}-\hat {h}^{\mathcal {N}}}\right \|_{L^{2}([-N,N])} = \mathcal {O}\left (\sqrt {N}\mathcal {N}^{-1+\gamma }\epsilon ^{-2}\right )\). If we choose \(\mathcal {N}\sim N^{\frac {1}{2(1-\gamma )}}\epsilon ^{\frac {-3}{1-\gamma }}\) then we find that
Moreover, [5, Theorem 5.1] tells us that the weights of \(\hat {h}^{\mathcal {N}}\) grow as \(\mathcal {O}(\mathcal {N}^{2}) = \mathcal {O}\left (N^{\frac {1}{(1-\gamma )}}\epsilon ^{\frac {-6}{1-\gamma }}\right )\). The statement then follows from applying the triangle inequality. □
Appendix B: Lipschitz continuity in the parameter vector of a neural network and its derivatives
In this section, we will prove that for any x ∈ D, a neural network and its corresponding Jacobian and Hessian matrix are Lipschitz continuous in the parameter vector. This property is of crucial importance for finding bounds on the generalization error of physics-informed neural networks, cf. Section 4. We first introduce some notation and then state our results. The main results of this section are Lemmas 11 and 13.
Let \(\sigma :\mathbb {R}\to \mathbb {R}\) be an (at least) twice continuously differentiable activation function, such as tanh or the sigmoid. For any \(n\in \mathbb {N}\) and \(x\in \mathbb {R}^{n}\), we write σ(x) := (σ(x1),…,σ(xn)). We use the definition of a neural network as in Definition 1.
Recall that for a differentiable function \(f:\mathbb {R}^{n}\to \mathbb {R}^{m}\) the Jacobian matrix J[f] is defined by
For our purposes, we make the following convention. For any 1 ≤ k ≤ L, we define
Similarly, for a twice differentiable function \(g:\mathbb {R}^{n}\to \mathbb {R}\) the Hessian matrix is defined by
Slightly abusing notation, we generalize this to vector-valued functions \(g:\mathbb {R}^{n}\to \mathbb {R}^{m}\). We write
where we identify \(\mathbb {R}^{1\times n \times n}\) with \(\mathbb {R}^{n \times n}\) to make the definitions consistent. Similarly, if \(v\in \mathbb {R}^{1 \times m}\), then v ⋅ H[g] should be interpreted as
For any 1 ≤ k < L, we write
Finally, we will use the notation J𝜃 := J[Ψ𝜃] and H𝜃 := H[Ψ𝜃]. The following lemma presents a generalized version of the chain rule.
Lemma 7
Let \(f:\mathbb {R}^{n}\to \mathbb {R}^{m}\) and \(g:\mathbb {R}^{m}\to \mathbb {R}\). Then, it holds that
We now apply this formula to find an expression for H𝜃 in terms of \( J^{\theta }_{k}\) and \( H^{\theta }_{k}\).
Lemma 8
It holds that
Proof 7
The first statement is just the chain rule for calculating the derivative of a composite function. We prove the second statement using induction. For the base step, let L = 1. Then, \({{\varPsi }}^{\theta } = f_{L}^{\theta }\) and we have \(H[{{\varPsi }}^{\theta }] = H^{\theta }_{L}\). For the induction step, take \(K\in \mathbb {N}, K\geq 2\) and assume that the statement holds for L = K − 1. Now, let \({{\varPhi }}^{\theta } = f_{K}^{\theta }\circ {\cdots } \circ f_{2}^{\theta }\) and \({{\varPsi }}^{\theta } = {{\varPhi }}^{\theta } \circ f_{1}^{\theta }\). Applying the generalized chain rule to calculate \(H[{{\varPhi }}^{\theta } \circ f_{1}^{\theta }]\) and using the induction hypothesis on H[Φ𝜃] gives the desired result. □
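As an added numerical sanity check of the generalized chain rule in Lemma 7 (with hypothetical test functions f and g, chosen only for illustration), one can verify that \(H[g\circ f](x) = J[f](x)^{T} H[g](f(x)) J[f](x) + {\sum }_{i=1}^{m} \partial_{i} g(f(x)) H[f_{i}](x)\) holds up to floating-point error:

```python
# Numerical check of the second-order (generalized) chain rule (added sketch with
# hypothetical test functions): for f: R^n -> R^m and g: R^m -> R,
#   H[g o f](x) = J[f](x)^T H[g](f(x)) J[f](x) + sum_i (d_i g)(f(x)) H[f_i](x).
import torch
from torch.autograd.functional import jacobian, hessian

n, m = 3, 2
f = lambda x: torch.stack([torch.tanh(x).sum(), (x ** 2).prod()])   # f: R^3 -> R^2
g = lambda y: torch.sin(y[0]) * y[1] + y[1] ** 3                     # g: R^2 -> R

x = torch.tensor([0.3, -0.7, 1.1], dtype=torch.float64)
y = f(x).detach()
Jf = jacobian(f, x)                                                  # (m, n) Jacobian of f at x
Hg = hessian(g, y)                                                   # (m, m) Hessian of g at f(x)
dg = jacobian(g, y)                                                  # (m,)  gradient of g at f(x)
Hf = torch.stack([hessian(lambda z, i=i: f(z)[i], x) for i in range(m)])  # (m, n, n)

lhs = hessian(lambda z: g(f(z)), x)
rhs = Jf.T @ Hg @ Jf + torch.einsum('i,ijk->jk', dg, Hf)
print(torch.allclose(lhs, rhs))                                      # True
```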
Next, we formally introduce the element-wise supremum norm \({\left \vert \cdot \right \vert _{\infty }}\). Let \(N\in \mathbb {N}\), \(n_{0}, {\ldots } n_{N}\in \mathbb {N}\) and \(A\in \mathbb {R}^{n_{1}\times {\cdots } \times n_{N}}\). Then, we define
Let R > 0 and suppose that \(A_{i}\in \mathbb {R}^{n_{i-1} \times n_{i}}\). Then, it holds that
Moreover, for \(v\in \mathbb {R}^{1 \times a}\) and \(A\in \mathbb {R}^{a\times b \times c}\) it holds that \({\left \vert v\cdot A \right \vert _{\infty }} \leq a {\left \vert v \right \vert _{\infty }}{\left \vert A \right \vert _{\infty }}\).
The following lemma states that the output of each layer of a neural network is Lipschitz continuous in the parameter vector for any input x ∈ [a,b]d. The lemma is stated for neural networks with a differentiable activation function, but can easily be adapted to, e.g., ReLU neural networks.
Lemma 9
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty }\}\). Then, it holds for 1 ≤ K ≤ L that
Proof 8
Let l0,…,lL denote the widths of the neural network, where l0 = d. Let x ∈ [a,b]d be arbitrary. First of all, it holds that
Now, let 2 ≤ k ≤ L and define \(y = \left (f_{k-1}^{\theta }\circ {\cdots } \circ f_{1}^{\theta }\right )(x)\) and \(\tilde {y} = (f_{k-1}^{\vartheta }\circ {\cdots } \circ f_{1}^{\vartheta })(x)\). We find that
A recursive application of this inequality then gives us for 1 ≤ K ≤ L that
where we used that β(Wα + 1)/(WRβ − 1) ≤ β(2α + 1)/(2Rβ − 1) ≤ 3α when W ≥ 2,R ≥ 1,α ≥ 1. □
Lemma 10
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty }\}\). Then, it holds for all 1 ≤ k ≤ L and x ∈ [a,b]d that
Proof 9
Let \({w_{i}^{T}}\) be the i th row of W𝜃,k, let \(\tilde {w}_{i}^{T}\) be the i th row of W𝜗,k and set b := b𝜃,k and \(\tilde {b}:=b^{\vartheta , k}\). Let \(F = f_{k-1}^{\theta }\circ {\cdots } \circ f_{1}^{\theta }\) and \(\tilde {F} = f_{k-1}^{\vartheta }\circ {\cdots } \circ f_{1}^{\vartheta }\). For 1 ≤ i ≤ lk, we have that
and analogously for \(J^{\vartheta }_{k}(x)_{i}\) and \(H^{\vartheta }_{k}(x)_{i}\). The triangle inequality and the Lipschitz continuity of \(\sigma ^{\prime }\) gives us that
Using that \({\left \vert F(x)-\tilde {F}(x) \right \vert _{\infty }} \leq \alpha (d+4) W^{k-2} R^{k-2}\beta ^{k-1}{\left \vert \theta -\vartheta \right \vert _{\infty }}\) (Lemma 9) for k ≥ 2 and \(l_{k-1}\leq W\), we get
for k ≥ 2. One can check that the inequality also holds for k = 1.
For the Hessian matrix, the triangle inequality and the Lipschitz continuity of \(\sigma ^{\prime \prime }\) gives us that
Using Lemma 9 again, we get
for k ≥ 2. One can check that the inequality also holds for k = 1. □
The following lemma states that the Jacobian and Hessian matrix of a neural network are Lipschitz continuous in the parameter vector for any input x ∈ [a,b]d.
Lemma 11
Let \(d,L,W\in \mathbb {N}\) with L,W ≥ 2, \(a,b\in \mathbb {R}\) with a < b and R ≥ 1. Moreover, let 𝜃,𝜗 ∈ΘL,W,R, \(\alpha = \max \limits \{1,|{a}|, |{b}|, \|{\sigma }\|_{\infty }\}\) and \(\beta = \max \limits \{1, \left \|{\sigma ^{\prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime }}\right \|_{\infty },\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty }\}\). Then, it holds that for all x ∈ [a,b]d that
Proof 10
We will prove the formulas by repeatedly using the triangle inequality and the representations proven in Lemma 10. To do so, we need to introduce some notation. Define for 0 ≤ l ≤ L + k − 1 the object \(\phi^{k,l} \in \{\theta,\vartheta\}^{2L}\) such that
In particular, \(\phi ^{k,0}_{j} = \theta \) and \(\phi ^{k,{L+k-1}}_{j} = \vartheta \) for all j. To simplify notation, we write
The triangle inequality and Lemma 10 then give that
Observe that \(A^{k,l-1}_{j}-A^{k,l}_{j} = 0\) for j≠l. Therefore,
From Lemma 12, it follows that
Writing γ := 1 + R(αW + 1) we get
In an entirely similar fashion we obtain
□
Appendix C: Additional material for Section 4
Lemma 12 (Hoeffding’s inequality)
Let 𝜖,c > 0, \(N\in \mathbb {N}\), let \(({{\varOmega }},\mathcal {A},\mathbb {P})\) be a probability space and let \(X_{n} : {{\varOmega }} \to [0,c]\), \(n\in \{1,\ldots,N\}\), be independent random variables. Then, it holds that
$$ \mathbb{P}\left( \left|{ \frac{1}{N}\sum\limits_{n=1}^{N} \left( X_{n} - \mathbb{E}\left[{X_{n}}\right] \right) }\right| \geq \epsilon \right) \leq 2\exp\left( -\frac{2N\epsilon^{2}}{c^{2}} \right). $$
Lemma 13
Let \(x\in \mathbb {R}\) and \( \sigma (x) = \tanh (x) = \frac {e^{x}-e^{-x}}{e^{x}+e^{-x}}\). It holds that \(\sigma ^{\prime }(x) = 1-(\sigma (x))^{2}\) and \(\sigma ^{\prime \prime }(x) = -2\sigma (x)\left(1-(\sigma (x))^{2}\right)\). In addition, it holds that \(\|{\sigma }^{\prime }\|_{\infty } = 1\), \(\left \|{\sigma ^{\prime \prime }}\right \|_{\infty } = \frac{4}{3\sqrt {3}} \leq 1\) and \(\left \|{\sigma ^{\prime \prime \prime }}\right \|_{\infty } = 2\).
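The formulas and bounds of Lemma 13 are easy to confirm numerically; the following added sketch evaluates the closed-form derivatives of tanh on a fine grid.

```python
# Quick numerical confirmation of Lemma 13 (added sketch): derivatives and sup norms of tanh.
import numpy as np

x = np.linspace(-10, 10, 200001)
s = np.tanh(x)
d1 = 1 - s ** 2                            # sigma'
d2 = -2 * s * (1 - s ** 2)                 # sigma''
d3 = -2 * (1 - s ** 2) * (1 - 3 * s ** 2)  # sigma'''
print(np.abs(d1).max(), np.abs(d2).max(), np.abs(d3).max())
# ~ 1.0,  4/(3*sqrt(3)) ≈ 0.7698,  2.0  -- matching the stated sup norms
```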
The following lemma provides estimates on the various PINN residuals. It is based on the fact that neural networks and their derivatives are Lipschitz continuous in the parameter vector; the proof of this fact can be found in Appendix B.
Lemma 14
Let \(d, L, W\in \mathbb {N}\), R ≥ 1 and let \(u_{\theta }:[0,1]^{d}\to \mathbb {R}\), 𝜃 ∈Θ, be tanh neural networks with at most L − 1 hidden layers, width at most W, and weights and biases bounded by R. Let the PINN generalization \(\mathcal {E}_{G}^{q}\) and training \(\mathcal {E}_{T}^{q}\) errors be defined as in Section 2.3 for linear Kolmogorov PDEs (cf. Section 2.1). Assume that \(\max \limits \{\left \|{\varphi }\right \|_{\infty }, \left \|{\psi }\right \|_{\infty }\}\leq \max \limits _{\theta \in {{\varTheta }}}\left \|{u_{\theta }}\right \|_{\infty }\). Let \(\mathfrak {L}^{q}_{Q}\) denote the Lipschitz constant of \(\mathcal {E}^{q}_{Q}\), for q = i,t,s and Q = G,T. Then, it holds that
Proof 11
Without loss of generality, we only focus on \(\mathcal {E}_G^{q}\), for q = i,s,t. We see for q = i,t,s
For q = t,s and (x,t) ∈ D × [0,T], it follows from Lemma 11 that
and similarly using Lemma 13 that
where we let |⋅|p denote the vector p-norm of the vectorized version of a general tensor (cf. (B.9)). Next, we calculate using again Lemma 13 (by setting 𝜗 = 0) and \(\max \limits \{\left \|{\varphi }\right \|_{\infty }, \left \|{\psi }\right \|_{\infty }\}\leq \max \limits _{\theta \in {{\varTheta }}}\left \|{u_{\theta }}\right \|_{\infty }\) for q = t,s that
where \(C=\max \limits _{x\in D}(1+|{\mu (x)}|_{1}+|{\sigma (x)\sigma (x)^{*}}|_{1})\). Combining all the previous results proves the stated bound. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
De Ryck, T., Mishra, S. Error analysis for physics-informed neural networks (PINNs) approximating Kolmogorov PDEs. Adv Comput Math 48, 79 (2022). https://doi.org/10.1007/s10444-022-09985-9