Abstract
This paper addresses the numerical resolution of controllability problems for partial differential equations (PDEs) by using physics-informed neural networks. Error estimates for the generalization error for both state and control are derived from classical observability inequalities and energy estimates for the considered PDE. These error bounds, that apply to any exact controllable linear system of PDEs and in any dimension, provide a rigorous justification for the use of neural networks in this field. Preliminary numerical simulation results for three different types of PDEs are carried out to illustrate the performance of the proposed methodology.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Since the pioneering theoretical works by Russell [37] and Lions [22], the numerical resolution of controllability problems for PDEs has faced a range of challenging difficulties which have been solved by using a number of sophisticated techniques (see, e.g., [10, 11, 15, 29, 32, 46], among many others). All these methods require the numerical approximation of some suitable PDEs, a task which is done by using classical methods in numerical analysis, mainly finite differences and finite elements. As a consequence, the available methods for solving numerically controllability problems for PDEs suffer from the well-known curse of dimensionality phenomenon. In plain words, curse of dimensionality reflects the fact that doubling the number of degrees of freedom in each spatial direction increases the solution complexity by a factor of \(2^d\), with d being the spatial dimension. This makes classical numerical methods for solving PDEs impractical when the spatial dimension is large.
On the other hand, during the last few years there has been a deep research effort in developing numerical methods for solving PDEs by using techniques from machine learning (ML) and artificial intelligence (AI). The main motivation for exploring the use of these new techniques in approximating PDEs is not to try to find methods that outperform classical methods (finite differences, finite elements, finite volumes, etc.) in low spatial dimensions (\(d=1,2,3\)), but to solve numerically high-dimensional PDEs (\(d>3\)), where the above-mentioned classical methods get stuck by the curse of dimensionality.
Examples of fields where high-dimensional PDEs arise are, among others: radioactive transport equation (\(d\ge 5\)), kinetic models, e.g., the Boltzmann kinetic equations (\(d=6\)), computational finance, e.g., the nonlinear Black–Scholes equation for pricing derivatives (\(d\gg 1\)), computational quantum chemistry, e.g., the nonlinear Schrödinger equation in the quantum many-body problem (\(d\gg 1\)), and game theory, e.g., the Hamilton–Jacobi–Bellman equation in dynamic programming [17]. Control problems for parametric PDEs is another field where high dimension plays a crucial role [20, 25, 26, 47].
Among the different deep-learning-based methods that have been recently proposed to approximate numerically the solution of PDEs, it is worth to mention the following: physics-informed neural networks (PINNs) [34], deep Ritz method [41], methods based on the Feynman–Kac formula [5], and methods based on the solution of backward stochastic differential equations [17]. See also [4, 6] for recent reviews. Although these methods have shown an excellent performance at the level of numerical simulation, the error analysis theory for these methods is essentially missing.
The above deep learning-based numerical schemes can be adapted to solve numerically not only forward problems for PDEs, but also a number of related problems involving PDEs such as inverse problems [27, 34] or random PDEs [43], among others.
Up to the best knowledge of the authors, the numerical resolution of controllability problems for PDEs by using ML has not been addressed so far.
The main goal of this paper is to explore the use of PINNs to approximate numerically the solution of controllability problems for PDEs. More precisely, a PINNs-based algorithm that applies to both linear and nonlinear PDEs is proposed in Sect. 2. Then, fostered by [27], error estimates for the so-called generalization error are provided (Theorem 3.1). The proof of this result is based on energy estimates for the solution of the considered PDE and on observability inequalities for its associated adjoint system. From these error bounds, a convergence result of the control and state obtained by using the proposed method to the control and state of the continuous problem is established (Corollary 3.1). For the sake of clarity, in this preliminary work we focus on the case of boundary Dirichlet control, but in a straightforward manner the ideas and methods here proposed extend to other types of control actions. Also, for pedagogical reasons, instead of presenting an abstract general framework, the methods and proofs are first described for the two emblematic systems of the wave and heat equations and then extended to more general PDE systems. Preliminary numerical experiments for three different PDEs illustrate the performance of the proposed method. More precisely, the accuracy of the method is tested on a simple model for the wave equation for which an analytical solution is available. In a second experiment, a high-dimensional controllability problem for the heat equation is considered. The third experiment concerns a semilinear PDE.
As the title indicates, this work is just a first step toward the numerical resolution of controllability problems for high-dimensional PDEs and so further research is needed to achieve a deeper understanding of the type of problems considered here.
Finally, for the sake of completeness, it is important to point out that the connection between different architectures of deep neural networks and controlled ordinary differential equations has been recently studied in [9]. This is a novel research line that also includes the analysis of the so-called neural differential equations by using techniques coming from continuous control theory [1, 12, 13, 35, 36].
2 Problem Setup and Description of the PINNs Algorithm
From now on in this paper, \(\varOmega \subset {\mathbb {R}}^d\), \(d\in {\mathbb {N}}\), denotes a bounded domain with a smooth boundary which is decomposed into two disjoint parts \(\varGamma _D \) and \(\varGamma _C\). For any positive time T, we denote by \(Q_T:=\varOmega \times \left( 0,T\right) \). As is usual \(\varDelta :=\sum _{j=1}^d\frac{\partial ^2}{\partial x_j^2}\) stands for the Laplacian operator.
2.1 Wave Equation
Given initial data \((y^0,y^1)\) in suitable function spaces, the null controllability problem for the wave equation amounts to finding a positive time \(T>0\) and a control function u(x, t) such that the solution y(x, t) of the system
satisfies
It is well known [22] that if the domain \(\varOmega \) satisfies the so-called geometrical controllability condition (GCC) introduced by Bardos, Lebeau, and Rauch [3] and if \(\left( y^0, y^1\right) \in L^2(\varOmega )\times H^{-1}(\varOmega )\), then, for T large enough, problem (1)–(2) has a solution \(u\in L^2\left( \varGamma _C\times (0, T)\right) \).
The PINNs approach for solving direct and inverse problems for PDEs [34] is next adapted to approximate numerically the control u(x, t) of problem (1)–(2). Roughly speaking, in the PINNs approach the solution is approximated by a neural network and the equations are imposed, in the least square sense, at a collection of nodal points. In the machine learning language, PINNs approach is composed of the following four main steps: (1) design an artificial neural network \({\hat{y}}\left( x,t;\varvec{\theta }\right) \) as a surrogate of the true solution y(x, t), (2) consider a training set that is used to train the neural network, (3) define an appropriate loss function which accounts for residuals of the PDE, initial, boundary, and final conditions, and (4) train the network by minimizing the cost function defined in the previous step. From the training process, optimal parameters defining the neural network \({\hat{y}}\left( x,t;\varvec{\theta }\right) \) are computed and eventually are used to get predictions about the state y(x, t) and the control u(x, t), which is approximated as the trace of \({\hat{y}}\left( x,t;\varvec{\theta }\right) \) on the boundary \(\varGamma _C\). Next, we give details on these steps:
Step 1: Neural Network Among different possibilities, we consider a deep feedforward neural network (also known in the literature as a multilayer perceptron (MLP)) with \(d+1\) input channels \(\varvec{x}=(x,t)\in {\mathbb {R}}^{d+1}\) and a scalar output \({\hat{y}}\) (see Fig. 1). More specifically, \({\hat{y}}\left( x,t;\varvec{\theta }\right) \) is constructed as
where
-
\({\mathcal {N}}^{\ell }:{\mathbb {R}}^{d_{in}}\rightarrow {\mathbb {R}}^{d_{out}}\) is the \(\ell \) layer with \(N_{\ell }\) neurons,
-
\(\varvec{W}^{\ell }\in {\mathbb {R}}^{N_{\ell }\times N_{\ell -1}}\) and \(\varvec{b}^{\ell }\in {\mathbb {R}}^{N_{\ell }}\) are, respectively, the weights and biases so that \(\varvec{\theta }=\left\{ \varvec{W}^{\ell }, \varvec{b}^{\ell }\right\} _{1\le \ell \le L}\) are the parameters of the neural network, and
-
\(\sigma \) is an activation function, which acts component-wise. Throughout this paper, we consider smooth activation functions such as hyperbolic tangent \(\sigma (s)=\tanh (s)\), with \(s\in {\mathbb {R}}\).
Illustration of a fully connected deep feedforward neural network with two input channels \(\varvec{x}=(x,t)\), two hidden layers (each one with 3 neurons) and a scalar output \({\hat{y}}(x,t;\varvec{\theta })\). Input data pass through the net by following (3). The set of free parameters of the network is denoted by \(\varvec{\theta }\)
Step 2: Training Dataset A dataset \({\mathcal {T}}\) of scattered data is selected in the interior domain \({\mathcal {T}}_{\text {int}}\subset Q_T\) and on the boundaries \({\mathcal {T}}_{\varGamma _D} \subset \varGamma _D\times (0, T)\), \({\mathcal {T}}_{t=0} \subset \varOmega \times \left\{ 0\right\} \), \({\mathcal {T}}_{t=T}\subset \varOmega \times \left\{ T\right\} \). Thus, \({\mathcal {T}}={\mathcal {T}}_{\text {int}}\cup {\mathcal {T}}_{\varGamma _D}\cup {\mathcal {T}}_{t=0}\cup {\mathcal {T}}_{t=T}\) (see Fig. 2 for an illustration). The number of selected points in \({\mathcal {T}}_{\text {int}}\) is denoted by \(N_{int}\). Analogously, \(N_{b}\) is the number of points on the boundary \(\varGamma _D\), and \(N_{0}\) and \(N_{T}\) stand for the number of points in \({\mathcal {T}}_{t=0}\) and \({\mathcal {T}}_{t=T}\), respectively. The total number of collocation nodes is denoted by N, and we write \({\mathcal {T}}_N\) instead of \({\mathcal {T}}\) to indicate clearly the number of points N used hereafter.
Step 3: Loss Function A weighted summation of the \(L^2\) norm of residuals for the equation, boundary, initial, and final conditions is considered as the loss function to be minimized during the training process. It is composed of the following six terms: given a neural network approximation \({\hat{y}}\) (as constructed in (3)), define
where \( w_{j,\text {int}}\), \(w_{j, b}\), \(w_{j, 0}\), and \(w_{j, T}\) are the weights of the quadrature rules.
The loss function used for training is
Notice that no boundary condition is imposed on \(\varGamma _C\). As is usual in the field of machine learning, all the derivatives included in the loss function are computed by using automatic differentiation [2].
Step 4: Training Process The final step in the PINN algorithm amounts to minimizing (4), i.e.,
The approximation \({\hat{u}}(t;\varvec{\theta }^*)\) of the control u(x, t) is then obtained as the restriction of \({\hat{y}}(x,t;\varvec{\theta }^*)\) to the boundary \(\varGamma _C\), i.e.,
See Fig. 3 for an schematic diagram of the proposed algorithm.
PINN algorithm for approximating the exact state and control for the wave equation. The neural network \({\hat{y}}\left( x,t,\varvec{\theta }\right) \) is required to satisfy, in the least squares sense, the PDE, initial conditions, boundary condition and exact controllability conditions. Then, the residual on training points \({\mathcal {L}}\left( \varvec{\theta };{\mathcal {T}}\right) \) is minimized to get the optimal set of parameters \(\varvec{\theta }^*\) of the neural network. This leads to the PINN exact state \({\hat{y}}\left( x,t;\varvec{\theta }^*\right) \). Finally, the PINN exact control \({\hat{u}}\left( x,t;\varvec{\theta }^*\right) \) is obtained as the trace of the PINN state on the boundary control region \(\varGamma _C\)
Remark 2.1
Notice that the PINN algorithm proposed above is, in spirit, in the same line as the one considered in [30], where an error function is minimized in the sense of least squares. As a consequence, if this error function reaches the zero value, then the controllability condition is satisfied. A major difference with respect to classical numerical methods for control of PDEs is that the PINN-based approach is mesh-free as it does not require a (finite element) mesh for numerical approximation. Moreover, the function that is used for numerical approximation is a neural network as opposed to (piece-wise) polynomials, that are the usual models of choice.
Remark 2.2
As is well known, the different terms that appear in the loss function (4) do not have the same strength, in general. At the practical level, this difficulty may be overcome by introducing additional weighting parameters in front of those terms. These would be new hyperparameters that the machine-learning-based algorithm ought to learn. It is clear that the introduction of these parameters does not affect the convergence results in the next section.
2.2 Heat Equation
Similar to the case of the wave equation, given an initial datum \(y^0\) in a suitable function space, the null controllability problem for the heat equation amounts to finding a positive time \(T>0\) and a control function u(x, t) such that the solution y(x, t) of the system
satisfies
It is well known [21] that if \(y^0\in L^2(\varOmega )\), then, for any \(T>0\), problem (7)–(8) has a solution \(u\in L^2\left( \varGamma _C\times (0, T)\right) \).
The numerical approximation of problem (7)–(8) follows the same steps 1–4 as in the case of the wave equation. The only element to be modified is the loss function, which in this case is defined as the sum of
2.3 Extension to General Evolution PDE Systems
Consider now a general evolution system of the form
where A is a generic (linear or nonlinear) operator, and the state \(y=y(x,t)\) is, in general, a vector function.
As in the preceding two cases, the goal is to find a positive time T and a control function u(x, t) such that the solution to (9) satisfies
The PINNs algorithm described above for the wave and heat equations also applies in this general framework with few changes. Actually, the only step that must be updated is the one corresponding to the loss function that now takes the form:
where \(\Vert \cdot \Vert \) stands for the Euclidean norm.
3 Estimates on Generalization Error
This section aims at obtaining error estimates for the so-called generalization error for both control and state. The generalization error for the control variable u is defined by
where \(u=u(x,t)\) is the exact control of minimal \(L^2\)-norm of the continuous problem, \({\hat{u}}={\hat{u}}\left( x,t;\varvec{\theta }^*\right) \) is its numerical approximation obtained from the algorithm proposed above, and \(\Vert \cdot \Vert \) is an appropriate norm. The generalization error for the state variable is similarly defined.
The generalization error (11) is typically decomposed into approximation error, which is due to the choice of the hypothesis space (two-layer, multilayer, residual, convolutional neural networks, etc.), and estimation error, due to the fact that the surrogate control \({\hat{u}}\) is computed from a finite dataset. Of course, the generalization error also depends on a crucial way on the specific algorithm proposed for training. In particular, PINN solutions obtained from the proposed method are obtained by solving highly nonconvex optimization problems that typically get stuck in local minima. Estimating this optimization error is a very challenging open problem.
Error estimates for the approximation error of some hypothesis spaces are by now well known. For instance, for the case of Barron space of two-layer neural networks, the approximation error in the \(L^2\)-norm scales as \({\mathcal {O}}\left( m^{-1/2}\right) \), with m being the number of neurons in the network, and independently of the dimension d. As for the estimation error, it is also known that the Rademacher complexity of Barron space, which controls the estimation error, is controlled by a Monte Carlo rate \({\mathcal {O}}\left( N^{-1/2}\right) \), where N is the number of sampling points used for training. We refer the reader to [42] and the references therein for more details on this issue. In particular, these results support the choice of multilayer neural networks of Sect. 2.
Regarding the PINNs algorithm for solving PDEs, convergence results w.r.t. the number of sampling points used for training have been recently obtained in [38] for the case of second-order linear elliptic and parabolic equations with smooth solutions. It is also worth to mentioning article [27] where error estimates, in terms of training error and the number of sampling points, are derived for the generalization error of a class of data assimilation problems.
Following [27], we next prove error estimates for control and state and for the class of controllability problems considered here. The two key ingredients to get such error bounds are observability inequalities and error estimates for quadrature rules. The precise observability inequalities that are needed in our cases will be detailed in the next subsections. Concerning quadrature error estimates, these are very well known in the literature but for the sake of completeness, we now recall some basic concepts and results on this issue.
3.1 Error Estimates for Quadrature Rules
For a given function \(f:{\mathcal {D}}\subset {\mathbb {R}}^d\rightarrow {\mathbb {R}}\), a quadrature rule approximating the integral
is defined by
where \((x_j, w_j)\), \(1\le j\le N\), are the nodes and weights of the quadrature rule. Quadrature errors depend on the specific rule used, on the smoothness of the function f and on the dimension d. For regular functions and low dimensions, one typically may use Gauss or Clenshaw–Curtis rules. Rules based on low discrepancy sequences such as Sobol sequences are the rules of choice for intermediate dimensions [39]. In both cases, error estimates for these quadrature rules take the general form
where \(\alpha \) depends on the regularity of f and the constant \(C_q=C_q(d)\), which also depends on f and its derivatives, explodes as \(d\rightarrow \infty \). Monte Carlo integration is immune to the curse of dimensionality and applies to non-smooth integrands. As is well known, the error estimate in that case is as in (12) where \(C_q\) is independent of the dimension d and \(\alpha =1/2\).
3.2 Wave Equation
The generalization error in the control variable u due to the PINN algorithm proposed in Sect. 2.1 is defined as
where \(u=u(t)\) is the exact control of the continuous problem (1)–(2) and \({\hat{u}}={\hat{u}}\left( t;\varvec{\theta }^*\right) \) is its numerical approximation given by (6).
Similarly, the generalization error for the state variable is defined by
As is usual in machine learning’s terminology, the so-called training error for PINNs algorithm is given by
where
and \(\varvec{\theta }^*\) is as in (5).
Next, we recall classical observability and energy inequalities for the wave equation:
Lemma 3.1
Let us assume that the domain \(\varOmega \) satisfies the geometrical controllability condition [3], and let \(T>0\) be large enough. Given initial and final conditions \((z^0_0, z^1_0), (z^0_T, z^1_T)\in L^2\left( \varOmega \right) \times H^{-1}\left( \varOmega \right) \), there exists a control function \(v\in L^2\left( \varGamma _C;(0,T)\right) \) such that the solution z(x, t) of the system
satisfies
Moreover,
for a positive constant \(C_o=C_o(\varOmega ,T)\) which depends on \(\varOmega \) and T, but is independent of the initial and final data.
Lemma 3.2
Let \(\left( z_0^0,z_0^1\right) \in L^2(\varOmega )\times H^{-1}(\varOmega )\) and \(g\in L^2\left( \partial \varOmega \times (0,T)\right) \). Consider the non-homogeneous system
Then, there exists a positive constant \(C_e=C_e(\varOmega ,T)\) such that
We are now in a position to estimate the generalization error for our PINNs-based algorithm.
Theorem 3.1
Let \(y=y(x,t)\in C^2\left( \overline{Q_T}\right) \) be a classical solution of (1)–(2), and let \({\hat{y}}={\hat{y}}(x,t;\varvec{\theta }^*)\) its PINN approximation obtained by the method proposed in Sect. 2.1. It is assumed that \({\hat{y}}\in C^2\left( \overline{Q_T}\right) \). Let \(u=u(x,t)\) and \({\hat{u}}={\hat{u}}\left( x,t;\varvec{\theta }^*\right) \) be the exact control of the continuous system (1)–(2) and its PINN approximation, respectively. Then, the following estimate for the generalization error in the control variable holds:
where \(C=C(\varOmega , T)\), and consequently \(C=C(d)\) also depends on the spatial dimension d. The constants \(C_{q-}\) and exponents \(\alpha _{-}\) are the ones associated with quadrature rules as in (12).
A similar estimate, with different constants, also holds for the generalization error in the state variable, as given by (14).
Proof
Let \({\overline{y}} = y-{\hat{y}}\) and \({\overline{u}}=u-{\hat{u}}\) be the error in the state and control variables, respectively. By linearity, \({\overline{y}}\) solves
Again by linearity, \({\overline{y}} (x,t;\varvec{\theta })\) is decomposed as \({\overline{y}}={\overline{y}}^1+{\overline{y}}^2\), where \({\overline{y}}^1\) and \({\overline{y}}^2\) are, respectively, solutions to
and
By applying the observability inequality (19) to system (24), and the energy estimate (21) to (25),
Estimate (22) then follows by applying (12). The corresponding estimate for the generalization error (14) is an immediate consequence of (21) and (22). \(\square \)
Although it has not been written explicitly hereinabove, it is clear that generalization errors depend on the specific type and size of neural network as well as on the type and number of quadrature nodes which are selected from the very beginning. Thus, denoting by \({\mathcal {H}}_m\) the hypothesis space considered for numerical approximation, where m denotes the number of neurons (or free parameters) in the neural network, and by N the number of collocation points used for quadrature, to make explicit this dependence we write
Next, the behavior of the generalization errors is analyzed when the size m of single-layer neural networks goes to infinity and so does the sampling size (\(N\rightarrow \infty \)).
Let us consider the hypothesis space of single-layer neural nets
The training process (5) may be rewritten in the equivalent form
From now on it is assumed that the optimization problem (27) has a solution. Otherwise, one can always add a regularization term of the form \(\Vert \varvec{\theta }\Vert ^2\).
We now recall the following universal approximation theorem due to Pinkus [33, Th. 4.1].
Theorem 3.2
Let \(f\in C^k({\mathbb {R}}^{d+1})\). Assume that the activation function \(\sigma \in C^k({\mathbb {R}})\) is not a polynomial. Then, for any compact set \(K\subset {\mathbb {R}}^{d+1}\) and any \(\varepsilon >0\) there exists \(m\in {\mathbb {N}}\) and \(y_m\in {\mathcal {H}}_m\) such that
for all multiindex \(l\le k\).
Corollary 3.1
Assume that the activation function \(\sigma \in C^k({\mathbb {R}})\) is not a polynomial. With the same assumptions as in Theorem 3.1 and considering subsequences, still labeled by m and N, one has
Proof
Let us fix \(\varepsilon >0\). We apply Theorem 3.2 for \(K=\overline{Q_T}\) and \(f=y\), solution of the controllability problem (1)–(2). Then, there exist \(m=m(\varepsilon )\in {\mathbb {N}}\) and corresponding \(y_m\in {\mathcal {H}}_m\) such that
Each of the terms in the left-hand side of (29) is now expressed by using a quadrature rule with collocation nodes \({\mathcal {T}}_N\). Then, taking into account the optimality of \({\hat{y}}_{m,N}\), as given by (27), one deduces that the sum of training errors that appear in the right-hand side of (22) is less than or equal to \(\varepsilon /2\).
Moreover, for fixed \(m=m(\varepsilon )\), there exists N such that the sum of quadrature errors in (22) is also less than or equal to \(\varepsilon /2\). Thus, \({\mathcal {E}}^{m,N}_{\text {gener}}\left( u\right) \le \varepsilon \). The arbitrariness of \(\varepsilon \) gives the result for the generalization error in the control variable. The case of the state variable is completely analogous. \(\square \)
3.3 Extension to Other PDE Systems and Neural Network Architectures
It is clear that the arguments and conclusions of Theorem 3.1 and Corollary 3.1 extend to any linear system of PDEs for which observability as well as energy inequalities similar to those in (19) and (21) hold. Linearity of the PDE is used in an essential way in the proof of Theorem 3.1. Thus, a different argument is needed to extend this result to the case of nonlinear PDEs.
The proof of Corollary 3.1 relies on the universal approximation theorem by Pinkus for the case of single-layer neural networks. Thus, the conclusion of Corollary 3.1 also holds for other neural network architectures for which such a density result is true.
4 Numerical Experiments
In this section, we test the performance of the proposed method in three exact controllability problems. The first experiment aims at checking the accuracy of the method on a very simple controllability problem for the wave equation for which an exact solution is explicitly known. In the second experiment, the high-dimensional situation is tested on a controllability problem for the heat equation. The last experiment considers a semilinear wave equation.
As indicated at the beginning of Sect. 3, the optimization error due to the gradient-based algorithms used for training is a key ingredient in the total error associated with the proposed PINN algorithm. This error has not been accounted for in Theorem 3.1 and Corollary 3.1. However, the numerical simulation results presented in this section do incorporate this error. As a consequence, simulation results are unable to illustrate with accuracy the theoretical findings of Sect. 3. The gap between the theoretical error estimates and the simulation results is accounted for by the optimization error in the training process.
In all experiments that follow, a multilayer neural network, as described in Sect. 2, with the \(\tanh \) as activation function, is used. Sobol quadrature nodes [39] are employed for training the neural network. The training process, i.e., minimization of \({\mathcal {L}}\left( \varvec{\theta };{\mathcal {T}}_N\right) \), is carried out with the ADAM optimizer [19] with learning rate \(10^{-3}\) for the first 20000 epochs. Then, a L-BFGS optimizer [7] is employed to accelerate convergence. The required gradients are computed by using automatic differentiation [2]. The descent algorithm is initialized with Glorot uniform [14]. As is well known, results obtained from gradient-based optimizers depend on initialization. A common practice to deal with this issue is to perform an emsemble training [24]. However, the use of this and other more sophisticated techniques (residual-based adaptive refinement (RAR) [23], dropout [40], batch normalization [18], etc.) is not the purpose of this paper which aims at illustrating the possible use of PINNs in the topic of controllability of PDEs.
4.1 Experiment 1: Linear Wave Equation
We consider the control system (1)–(2) in the domain \(\varOmega = \left( 0,1\right) \) for the data
and for the control time \(T=2\). An explicit solution of the problem is easily obtained by using D’Alembert formula. Indeed, by considering the function
the explicit exact state is given by
and the exact control is
Remark 4.1
We notice that the control given by (31) is the one of minimal \(L^2\)-norm. This is no longer true if the initial velocity \(y^1\) is different from zero (see [16], Section 4.1 for details).
The efficiency of the proposed PINN-based algorithm in approximating the solution to this problem is analyzed next. The generalization error in the control variable \({\mathcal {E}}_{\text {gener}}(u):=\Vert u-{\hat{u}}\Vert _{L^2(0,T)}\), \(L^2\)-relative error \(\Vert u-{\hat{u}}\Vert _{L^2(0,T)}/\Vert u\Vert _{L^2(0,T)}\), and total training error \({\mathcal {E}}_{\text {train}}:={\mathcal {L}}\left( \varvec{\theta }^*;{\mathcal {T}}_N\right) \) are computed for several values of the total number N of training points and several architectures of the neural network. The effect of regularization, where the term \(\lambda _{\text {reg}}\Vert \varvec{\theta }\Vert _{2}^2\) is added to the loss function (4), with \(\lambda _{\text {reg}} >0\), is also studied.
Once the training process is finished and the optimal set of parameters \(\varvec{\theta }^*\) is obtained, the PINN control \({\hat{u}}(t;\varvec{\theta }^*) = {\hat{y}}\left( 1,t;\varvec{\theta }^*\right) \) is computed on a uniform mesh of size \(h=0.02\) in the segment \(\left( 1,t\right) \), \(0\le t\le 2\). Both the generalization error and the \(L^2\)-relative error are then approximated by using the same mesh. The training points are split into interior and boundary points as follows: for a given positive integer \(N_0\), \(3N_0\) points are located on the boundary and \(N_0^2\) in the interior domain. Thus, \(N=N_0^2+3N_0\).
Tables 1 and 2 collect simulation results for a multilayer neural network composed of 4 hidden layers and 50 neurons in each layer. It is observed that both the generalization error and the \(L^2\)-relative error slowly decrease as the number of training points increases. The comparison between Tables 1 and 2 shows that regularization does not increase the level of accuracy. Table 3 displays simulation results for the case of a single-layer architecture having the same number of neurons as in the multilayer neural network considered in Tables 1 and 2. It is observed that the single-layer architecture provides slightly less accurate results.
Figure 4 shows the exact control (31) and the PINN control \({\hat{u}}\left( t;\varvec{\theta }^*\right) \), and the error between exact and PINN states.
The effect of increasing the depth (number of hidden layers) and width (number of neurons for layer) of the neural network has been also tested. We have observed that the level of accuracy in the solutions is not improved significantly. This is in agreement with previous studies (see, e.g., [23]) that show that a relative small neural network is able to approximate with accuracy of smooth solutions of PDEs.
Experiment 1 (linear wave equation). Comparison between exact control u(t) and PINN (or predicted) control \({\hat{u}}(t;\varvec{\theta }^{*})\) (left), and error between exact state and PINN state, i.e., \(y(x,t)-{\hat{y}}(x,t;\varvec{\theta }^{*})\) (right) Neural network composed of 4 hidden layers and 50 neurons in each layer. No regularization. Number of training points \(N=10300\)
4.2 Experiment 2: Linear Heat Equation
In this experiment, we consider the heat system (7)–(8) for \(\varOmega = \left( 0,1\right) ^d\) and \(d=1, 5, 10\), and 20.
The One-Dimensional case For comparison purposes, the case \(d=1\) is addressed next. The first mode of the Laplacian \(y^0(x)=\sin \left( \pi x\right) \), \(0<x<1\), is taken as the initial condition. On \(x=0\), a zero Dirichlet boundary condition is imposed. The control function acts on the extreme \(x=1\). In order to have a better control of the diffusion, the Laplacian \(\varDelta \) is replaced by \(\kappa \varDelta \), with \(\kappa = 0.25\). The control time is \(T = 0.5\). This experiment has been previously considered in [30, Subsection 5.1]. Figure 5 shows the predicted state (left) and control (right) obtained from the PINN algorithm described in Sect. 2.2, and for a feedforward neural network composed of 5 hidden layers and 100 neurons in each layer. The number of training points is \(N=10300\). Once the training process is completed, the training error for the controllability condition \(y(x,T)=0\), \(0<x<1\), which provides an approximation of \(\Vert y\left( \cdot ,T\right) \Vert _{L^2\left( \varOmega \right) }\) is \(1.17\times 10^{-5}\). It is observed in Fig. 6 that both the PINN control and state have a similar profile as in [30, Figures 2 and 4 (left)]. However, no oscillations near the final time appear in the PINN control. This is not contradictory with the results in [30] since it is well known that no uniqueness of null controls holds.
The Multi-dimensional Case In order to check the accuracy of the proposed method in high dimensions, we consider the following control to the trajectory problem for \(\varOmega = (0,1)^d\) and \(T=1\):
This problem has an explicit solution [28], which is given by \(y(x,t)=\frac{\Vert x\Vert ^2}{d} + 2t\), \(x\in \varOmega \). The control function is obtained as the trace of y on \(\partial \varOmega \). Table 4 displays simulation results for \(L^2\)- relative error in the state variable and training error. It is observed that even for high dimensions the relative error in the state variable is very low. Accuracy is similar to the one obtained for forward PDEs via PINNs [28].
4.3 Experiment 3: A Semilinear Wave Equation
Next, we consider a nonlinear situation, precisely the case of a semilinear wave equation. Positive results on the exact controllability for semilinear wave equations have been obtained, among others, in [31, 44, 45].
In this experiment, the following null controllability problem for a semilinear wave equation is considered:
This problem has been previously studied in [8, Subsection 4.2.1].
The proposed PINN-based algorithm has been tested for different neural network architectures and number of training points. Table 5 collects the simulation results for all contributions in training error as in (16). Recall that \( {\mathcal {E}}_{\text {train, int}} \) is the training error associated with the residual of the PDE, \({\mathcal {E}}_{\text {train, boundary}}\) corresponds to the boundary condition at \(x=0\), \( {\mathcal {E}}_{\text {train, initialpos}}\) and \( {\mathcal {E}}_{\text {train, initialvel}} \) are, respectively, the training errors for initial position and velocity, and finally, \( {\mathcal {E}}_{\text {train, finalpos}} \) and \( {\mathcal {E}}_{\text {train, initialvel}}\) are the training errors for the controllability condition at the control time \(T=2\). It is observed in Table 5 that increasing the number of training points does not reduce significantly training errors. This is in accordance with previous studies (see, e.g., numerical experiments in [27]). Recall that the training error includes the optimization error due to the gradient-based descent algorithms used for minimization of the highly nonconvex loss function (4) for which we have no information.
Figure 6 displays numerical simulation results obtained for a multilayer neural network composed of 5 hidden layers and 100 neurons in each layer. For this particular example, no explicit solution is known so that it is not possible to check the accuracy of the method. In addition, as it was mentioned in the preceding experiment, the control is not unique. Nonetheless, comparison between Figs. 6 and 2 in [8] shows that the results are very similar.
5 Conclusions
Even though highly accurate methods are available for approximating numerically a wide range of controllability problems for PDEs, the applicability of these methods to high-dimensional problems is questionable due to the well-known curse of dimensionality phenomenon.
The present paper provides a first attempt to overcome this difficulty. It relies on the use of modern deep-learning-based methods, in particular on the so-called physics-informed neural networks. More precisely, a PINNs-based method has been proposed for the numerical approximation of controllability problems for PDEs both linear and nonlinear. The problem is formulated as the minimization, in the sense of least squares, of a loss function that accounts for the residual of the PDE and its initial, boundary, and final conditions. The main novelties here with respect to more classical numerical methods in control of PDEs are as follows: (i) a feedforward neural network is used for approximating both the state and the control variables, and (ii) the method is mesh-free. In addition, it is important to emphasize that although deep-learning-based methods have found great success in many applications, no theoretical results have appeared in the literature in this field so far. In this respect, estimates for the generalization error (in both control and state variables) in terms of training and quadrature errors have been obtained in this paper. It is also proved that the training error vanishes as the size of the neural network and the number of training points go to infinity. An important feature of these theoretical results is that they apply to any controllability problem for a linear PDE and in any dimension and so PINNs qualify as a promising tool to deal with high-dimensional problems.
The accuracy in our numerical experiments is similar to the one obtained by using the PINN algorithm [34] for solving forward problems for PDEs. This is not surprising since the proposed method is a PINN-based algorithm for solving PDEs where final conditions are added to the picture and the control is obtained as the trace of the solution of the PDE.
There are many interesting questions that remain open. Some of them are:
-
Since the constants that appear in our estimates on generalization error are based on energy and observability inequalities, they depend on the spatial dimension d. To what extent these estimates break the curse of dimensionality is a very interesting open problem.
-
Although it was proved that training error converges to zero as the size of network and the number of the training points increase, up to the best knowledge of the authors, estimating training error is also a very challenging open problem. It is clear that training errors can be estimated a posteriori. Nonetheless, a posteriori estimates of training errors are, in general, not sharp as the training error incorporates errors due to the numerical approximation of highly nonconvex optimization problems whose solutions get stuck in local minima. This issue has been observed in our numerical experiments where increasing the size of the neural network and the number of training points produces a very slow decreasing of the training error.
-
Proving error estimates for generalization error in the case of controllability problems for nonlinear PDEs and other types of control actions (e.g., distributed control) are also interesting open problems.
6 Reproducibility
The implementation of the numerical experiments presented in Sect. 4 has been performed with the user-friendly Python library DeepXDE [23], which is available at https://github.com/lululxvi/deepxde. Python scripts for the three experiments can be downloaded from https://github.com/fperiago/deepcontrol.
References
Bárcenas-Petisco, J.A.: Optimal control for neural ode in a long time horizon and applications to the classification and simultaneous controllability problems. https://hal.archives-ouvertes.fr/hal-03299270/ (2022)
Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in Machine Learning: a survey. J. Mach. Learn. Res. 18, 1–43 (2018)
Bardos, C., Lebeau, G., Rauch, J.: Sharp sufficient conditions for the observation, control, and stabilization of waves from the boundary. SIAM J. Control Optim. 30(5), 1024–1065 (1992)
Beck, C., Martin, H., Jentzen, A., Benno, K.: An overview on deep learning-based approximation methods for partial differential equations. arXiv:2012.12348 (2021)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 88(3), 1–28 (2021)
Blechschmidt, J., Ernst, O.G.: Three ways to solve partial differential equations with neural networks—a review. GAMM-Mitt 44(2), 1–29 (2021)
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16, 1190–1208 (1995)
Cavalcanti, M., Cavalcanti, V.D., Rosier, C., Rosier, L.: Numerical control of a semilinear wave equation on an interval. In: Auriol, J., Deutscher, J., Mazanti, G., Valmorbida, G. (eds.) Advances in Distributed Parameter Systems, pp. 69–89. Springer, Cham (2022)
Cuchiero, C., Larsson, M., Teichmann, J.: Deep neural networks, generic universal interpolation, and controlled ODEs. SIAM J. Math. Data Sci. 2(3), 901–919 (2020)
Castro, C., Micu, S.: Boundary controllability of a linear semi-discrete 1-D wave equation derived from a mixed finite element method. Numer. Math. 102(3), 413–462 (2006)
Ervedoza, S., Zuazua, E.: Numerical Approximation of Exact Controls for Waves. Springer Briefs in Mathematics, vol. 38. Springer, Berlin (2013)
Esteve, C., Geshkovski, B., Pighin, D., Zuazua, E.: Large-time asymptotics in deep learning. arXiv:2008.02491 (2021)
Esteve-Yagüe, C., Geshkovski, B.: Sparse approximation in learning via neural odes. arXiv:2102.13566 (2021)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. AISTATS (2010)
Glowinski, R., Li, C., Lions, J.L.: A numerical approach to the exact boundary controllability of the wave equation (I). Dirichlet controls: description of the numerical methods. Jpn. J. Appl. Math. 7, 1–76 (1990)
Gugat, M.: Optimal boundary control and boundary stabilization of hyperbolic systems. Springer Briefs in Control, Automation and Robotics. Springer (2015)
Han, J., Jentzen, A., Weinan, E.: Solving high-dimensional partial differential equations using deep learning. PANS 115(34), 8505–8510 (2018)
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. PMLR 37, 448–456 (2015)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations (2015)
Lazar, M., Zuazua, E.: Greedy controllability of finite dimensional linear systems. Automatica 74, 327–340 (2016)
Lebeau, G., Robbiano, L.: Contrôle exact de L’equation de la chaleur. Commun. Partial Differ. Equ. 20(1–2), 335–356 (1995)
Lions, J.L.: Controllabilité exacte, perturbations et stabilization de systémes distribués, vol. I. Masson, Paris (1988)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. J. Comput. Phys. 410, 109339 (2020)
Marín, F.J., Martínez-Frutos, J., Periago, F.: Robust averaged control of vibrations for the Bernoulli–Euler beam equation. J. Optim. Theory Appl. 174(2), 428–454 (2017)
Martínez-Frutos, J., Periago, F.: Optimal control of PDEs under uncertainty. An introduction with application to optimal shape design of structures. Springer Briefs in Mathematics. BCAM Springer Briefs. Springer (2018)
Mishra, S., Molinaro, R.: Estimates on generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 00, 1–42 (2021)
Mishra, S., Molinaro, R.: Estimates on generalization error of physics-informed neural networks for approximating PDEs. IMA J. Numer. Anal. (2022). https://doi.org/10.1093/imanum/drab093
Münch, A.: A uniformly controllable and implicit scheme for the 1-D wave equation. Math. Model. Numer. Anal. 39(2), 377–418 (2006)
Münch, A., Pedregal, P.: Numerical null controllability of the heat equation through a least squares and variational approach. Eur.J. Appl. Math. 25(3), 277–306 (2014)
Münch, A., Trélat, E.: Constructive exact control of semilinear 1D wave equations by a least-squares approach. SIAM J. Control Optim. 60(2), 652–673 (2022)
Pedregal, P., Periago, F., Villena, J.: A numerical method of local energy decay for the boundary controllability of time-reversible distributed parameter systems. Stud. Appl. Math. 121(1), 27–47 (2008)
Pinkus, A.: Approximation theory of the MLP model in neural networks. Stud. Acta Numer. 8, 143–195 (1999)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Ruiz-Balet, D., Zuazua, E.: Neural ode control for classification, approximation and transport. arXiv:2104.05278 (2021)
Ruiz-Balet, D., Affili, E., Zuazua, E.: Interpolation and approximation via momentum resnets and neural odes. Syst. Control Lett. 162, 105182 (2022)
Russell, D.L.: A unified boundary controllability theory for hyperbolic and parabolic partial differential equations. Stud. Appl. Math. LI I(3), 189–211 (1973)
Shin, Y., Darbon, J., Karniadakis, G.E.: On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys. 28(5), 2042–2074 (2020)
Sobol’, I.M.: On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 7(4), 784–802 (1967)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Weinan, E., Yu, B.: The deep Riesz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1–12 (2018)
Weinan, E., Chao, M., Wojtowytsch, S., Lei, W.: Towards a mathematical understanding of neural network-based machine learning: what we know and what we don’t. CSIAM Trans. Appl. Math. 1(4), 561–615 (2020)
Zhang, D., Lu, L., Guo, L., Karniadakis, G.E.: Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. J. Comput. Phys. 397, 108850 (2019)
Zuazua, E.: Exact controllability for the semilinear wave equation. J. Math. Pures Appl. 69(9), 1–31 (1990)
Zuazua, E.: Exact boundary controllability for the semilinear wave equation. In: Nonlinear partial Differential Equations and Their Applications. Vol. 220 of Pitman Res. Notes Math. Ser., Longman Sci. Tech., Harlow, 357–391 (1991)
Zuazua, E.: Propagation, observation, control and numerical approximation of waves approximated by finite difference methods. SIAM Rev. 47(2), 197–243 (2005)
Zuazua, E.: Averaged control. Automatica 50, 3077–3087 (2014)
Acknowledgements
This research was supported by Fundación Séneca (Agencia de Ciencia y Tecnología de la Región de Murcia (Spain)) under contract 20911/PI/18 and grant number 21503/EE/21 (mobility program Jiménez de la Espada). F. Periago acknowledges the hospitality of the Mathematics Department at University of California, Santa Barbara, where part of this work was carried out. The authors also thank professor Lu Lu for very fruitful comments on the use of DeepXDE.
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Lorenz Biegler.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
García-Cervera, C.J., Kessler, M. & Periago, F. Control of Partial Differential Equations via Physics-Informed Neural Networks. J Optim Theory Appl 196, 391–414 (2023). https://doi.org/10.1007/s10957-022-02100-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10957-022-02100-4