Control of Partial Differential Equations via Physics-Informed Neural Networks

This paper addresses the numerical resolution of controllability problems for partial differential equations (PDEs) by means of physics-informed neural networks (PINNs). Error estimates for the generalization error, for both the state and the control, are derived from classical observability inequalities and energy estimates for the PDE under consideration. These error bounds, which apply to any exactly controllable linear system of PDEs in any dimension, provide a rigorous justification for the use of neural networks in this field. Preliminary numerical simulations for three different types of PDEs illustrate the performance of the proposed methodology.

Convergence of the proposed method to the control and state of the continuous problem is established (Corollary 3.1). For the sake of clarity, in this preliminary work we focus on the case of Dirichlet boundary control, but the ideas and methods proposed here extend in a straightforward manner to other types of control actions. Also, for pedagogical reasons, instead of presenting an abstract general framework, the methods and proofs are first described for the two emblematic systems of the wave and heat equations and then extended to more general PDE systems. Preliminary numerical experiments for three different PDEs illustrate the performance of the proposed method. More precisely, the accuracy of the method is first tested on a simple model for the wave equation for which an analytical solution is available. In a second experiment, a high-dimensional controllability problem for the heat equation is considered. The third experiment concerns a semilinear PDE.
As the title indicates, this work is just a first step toward the numerical resolution of controllability problems for high-dimensional PDEs and so further research is needed to achieve a deeper understanding of the type of problems considered here.
Finally, for the sake of completeness, it is important to point out that the connection between different architectures of deep neural networks and controlled ordinary differential equations has been recently studied in [9]. This is a novel research line that also includes the analysis of the so-called neural differential equations by using techniques coming from continuous control theory [1,12,13,35,36].

Problem Setup and Description of the PINNs Algorithm
From now on in this paper, Ω ⊂ ℝ^d, d ∈ ℕ, denotes a bounded domain with a smooth boundary, which is decomposed into two disjoint parts Γ_D and Γ_C. For any positive time T, we set Q_T := Ω × (0, T). As usual, Δ := Σ_{j=1}^{d} ∂²/∂x_j² stands for the Laplacian operator.

Wave Equation
Given initial data (y_0, y_1) in suitable function spaces, the null controllability problem for the wave equation amounts to finding a positive time T > 0 and a control function u(x, t) such that the solution y(x, t) of the system satisfies

y(x, T) = y_t(x, T) = 0, x ∈ Ω. (2)

Fig. 1 Illustration of a fully connected deep feedforward neural network with two input channels x = (x, t), two hidden layers (each one with 3 neurons) and a scalar output ŷ(x, t; θ). Input data pass through the net by following (3). The set of free parameters of the network is denoted by θ.

It is well known [22] that if the domain Ω satisfies the so-called geometric control condition (GCC) introduced by Bardos, Lebeau, and Rauch [3] and if (y_0, y_1) ∈ L²(Ω) × H⁻¹(Ω), then, for T large enough, problem (1)-(2) has a solution u ∈ L²(Γ_C × (0, T)).
The PINNs approach for solving direct and inverse problems for PDEs [34] is next adapted to approximate numerically the control u(x, t) of problem (1)-(2). Roughly speaking, in the PINNs approach the solution is approximated by a neural network and the equations are imposed, in the least-squares sense, at a collection of nodal points. In machine learning language, the PINNs approach is composed of the following four main steps: (1) design an artificial neural network ŷ(x, t; θ) as a surrogate of the true solution y(x, t), (2) select a training set that is used to train the neural network, (3) define an appropriate loss function that accounts for the residuals of the PDE and of the initial, boundary, and final conditions, and (4) train the network by minimizing the loss function defined in the previous step. From the training process, the optimal parameters defining the neural network ŷ(x, t; θ) are computed and are eventually used to get predictions for the state y(x, t) and the control u(x, t), the latter being approximated as the trace of ŷ(x, t; θ) on the boundary Γ_C. Next, we give details on these steps.

Step 1: Neural Network. Among different possibilities, we consider a deep feedforward neural network (also known in the literature as a multilayer perceptron (MLP)) with d + 1 input channels x = (x, t) ∈ ℝ^{d+1} and a scalar output ŷ (see Fig. 1). More specifically, ŷ(x, t; θ) is constructed as the composition (3) of layers and componentwise nonlinearities, where the ℓ-th layer 𝒩^ℓ : ℝ^{N_{ℓ-1}} → ℝ^{N_ℓ} has N_ℓ neurons, W^ℓ ∈ ℝ^{N_ℓ × N_{ℓ-1}} and b^ℓ ∈ ℝ^{N_ℓ} are, respectively, its weights and biases, so that θ = {W^ℓ, b^ℓ}_ℓ collects the parameters of the neural network, and σ is an activation function, which acts component-wise. Throughout this paper, we consider smooth activation functions such as the hyperbolic tangent σ(s) = tanh(s), s ∈ ℝ.

Step 2: Training Dataset. A dataset T of scattered data is selected in the interior domain, T_int ⊂ Q_T, and on the boundary and initial/final time sets (see Fig. 2 for an illustration). The number of selected points in T_int is denoted by N_int.
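As a concrete illustration of Step 1, a minimal numpy sketch of the feedforward network (3) might look as follows. The helper names `init_params` and `forward`, the random initialization, and the layer sizes are illustrative choices, not taken from the paper:

```python
import numpy as np

def init_params(sizes, rng):
    """Random weights/biases for a network with layer widths sizes[0] -> ... -> sizes[-1]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, z):
    """Hidden layers apply tanh component-wise; the output layer is affine (scalar output)."""
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)
    W, b = params[-1]
    return W @ z + b

rng = np.random.default_rng(0)
# d + 1 = 2 input channels (x, t), two hidden layers of 3 neurons, scalar output,
# matching the architecture sketched in Fig. 1
params = init_params([2, 3, 3, 1], rng)
y_hat = forward(params, np.array([0.5, 0.1]))  # network prediction at (x, t) = (0.5, 0.1)
```

In an actual PINN implementation the parameters would of course be trained rather than random, and the forward pass would be written in a framework with automatic differentiation so that PDE residuals can be formed.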
Analogously, N_b is the number of points on the boundary Γ_D, and N_0 and N_T stand for the number of points in T_{t=0} and T_{t=T}, respectively. The total number of collocation nodes is denoted by N, and we write T_N instead of T to indicate clearly the number of points N used hereafter.
Step 3: Loss Function. A weighted sum of the squared L² norms of the residuals of the equation, boundary, initial, and final conditions is taken as the loss function to be minimized during the training process. It is composed of six terms: given a neural network approximation ŷ (as constructed in (3)), the network ŷ(x, t; θ) is required to satisfy, in the least-squares sense, the PDE, the initial conditions, the boundary condition, and the exact controllability conditions. Each squared norm is approximated by a quadrature rule on the training points, where w_{j,int}, w_{j,b}, w_{j,0}, and w_{j,T} are the weights of the quadrature rules, and the resulting residual L(θ; T) is minimized to get the optimal set of parameters θ* of the neural network. This leads to the PINN state ŷ(x, t; θ*). Finally, the PINN control û(x, t; θ*) is obtained as the trace of the PINN state on the boundary control region Γ_C. Notice that no boundary condition is imposed on Γ_C. As is usual in the field of machine learning, all the derivatives included in the loss function are computed by automatic differentiation [2].
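The structure of the loss in Step 3 can be sketched as a weighted sum of squared residuals evaluated at the collocation points. In the sketch below the residual arrays stand in for the six terms of the wave-equation loss; the function names `quad_sq` and `pinn_loss` and the Monte Carlo weights are illustrative, not from the paper:

```python
import numpy as np

def quad_sq(residuals, weights):
    """Quadrature approximation of the squared L2 norm of a residual array."""
    return float(np.sum(weights * residuals**2))

def pinn_loss(res_int, res_b, res_0, res_0t, res_T, res_Tt, w):
    """Sum of the six residual terms of the loss:
    res_int:        PDE residual at interior collocation nodes,
    res_b:          boundary residual on Gamma_D,
    res_0, res_0t:  initial position/velocity residuals at t = 0,
    res_T, res_Tt:  final (exact controllability) residuals at t = T."""
    return (quad_sq(res_int, w["int"]) + quad_sq(res_b, w["b"])
            + quad_sq(res_0, w["0"]) + quad_sq(res_0t, w["0"])
            + quad_sq(res_T, w["T"]) + quad_sq(res_Tt, w["T"]))

# toy example with uniform (Monte Carlo) quadrature weights 1/N on each node set
w = {"int": 1 / 100, "b": 1 / 30, "0": 1 / 10, "T": 1 / 10}
rng = np.random.default_rng(0)
L = pinn_loss(*(rng.normal(0.0, 1e-3, n) for n in (100, 30, 10, 10, 10, 10)), w)
```

In a real PINN the residual arrays would be produced by automatic differentiation of the network ŷ, and the weights would come from the chosen quadrature rule (Remark 2.2 notes that extra per-term weighting hyperparameters may also be introduced).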
Step 4: Training Process. The final step in the PINN algorithm amounts to minimizing (4), i.e., solving (5). The approximation û(t; θ*) of the control u(x, t) is then obtained as the restriction of ŷ(x, t; θ*) to the boundary Γ_C, i.e., (6). See Fig. 3 for a schematic diagram of the proposed algorithm.

Remark 2.1
Notice that the PINN algorithm proposed above is, in spirit, along the same lines as the one considered in [30], where an error function is minimized in the least-squares sense. As a consequence, if this error function reaches the value zero, then the controllability condition is satisfied. A major difference with respect to classical numerical methods for control of PDEs is that the PINN-based approach is mesh-free, as it does not require a (finite element) mesh for the numerical approximation. Moreover, the function used for the numerical approximation is a neural network, as opposed to the (piecewise) polynomials that are the usual models of choice.

Remark 2.2
As is well known, the different terms that appear in the loss function (4) do not have the same strength, in general. At the practical level, this difficulty may be overcome by introducing additional weighting parameters in front of those terms. These would be new hyperparameters that the machine-learning-based algorithm ought to learn. It is clear that the introduction of these parameters does not affect the convergence results in the next section.

Heat Equation
Similarly to the case of the wave equation, given an initial datum y_0 in a suitable function space, the null controllability problem for the heat equation amounts to finding a positive time T > 0 and a control function u(x, t) such that the solution y(x, t) of the system satisfies y(x, T) = 0 for x ∈ Ω. It is well known [21] that if y_0 ∈ L²(Ω), then, for any T > 0, problem (7)-(8) has a solution u ∈ L²(Γ_C × (0, T)). The numerical approximation of problem (7)-(8) follows the same Steps 1-4 as in the case of the wave equation. The only element to be modified is the loss function, which in this case is defined as the sum of the corresponding residual terms for the PDE, boundary, initial, and final conditions.

Extension to General Evolution PDE Systems
Consider now a general evolution system (9), where A is a generic (linear or nonlinear) operator and the state y = y(x, t) is, in general, a vector-valued function.
As in the two preceding cases, the goal is to find a positive time T and a control function u(x, t) such that the solution to (9) satisfies the corresponding final conditions. The PINNs algorithm described above for the wave and heat equations also applies in this general framework with few changes. Actually, the only step that must be updated is the one corresponding to the loss function, which now takes an analogous form in which ‖·‖ stands for the Euclidean norm.

Estimates on Generalization Error
This section aims at obtaining error estimates for the so-called generalization error for both control and state. The generalization error (11) for the control variable u is defined as the distance ‖u − û‖, where u = u(x, t) is the exact control of minimal L²-norm of the continuous problem, û = û(x, t; θ*) is its numerical approximation obtained from the algorithm proposed above, and ‖·‖ is an appropriate norm. The generalization error for the state variable is defined similarly. The generalization error (11) is typically decomposed into the approximation error, which is due to the choice of the hypothesis space (two-layer, multilayer, residual, convolutional neural networks, etc.), and the estimation error, due to the fact that the surrogate control û is computed from a finite dataset. Of course, the generalization error also depends in a crucial way on the specific algorithm used for training. In particular, PINN solutions obtained from the proposed method are computed by solving highly nonconvex optimization problems that typically get stuck in local minima. Estimating this optimization error is a very challenging open problem.
Error estimates for the approximation error of some hypothesis spaces are by now well known. For instance, in the case of the Barron space of two-layer neural networks, the approximation error in the L²-norm scales as O(m^{-1/2}), with m the number of neurons in the network, independently of the dimension d. As for the estimation error, it is also known that the Rademacher complexity of the Barron space, which controls the estimation error, is bounded by a Monte Carlo rate O(N^{-1/2}), where N is the number of sampling points used for training. We refer the reader to [42] and the references therein for more details on this issue. In particular, these results support the choice of the multilayer neural networks of Sect. 2.
Regarding the PINNs algorithm for solving PDEs, convergence results with respect to the number of sampling points used for training have recently been obtained in [38] for second-order linear elliptic and parabolic equations with smooth solutions. It is also worth mentioning article [27], where error estimates, in terms of the training error and the number of sampling points, are derived for the generalization error of a class of data assimilation problems.
Following [27], we next prove error estimates for the control and the state for the class of controllability problems considered here. The two key ingredients to obtain such error bounds are observability inequalities and error estimates for quadrature rules. The precise observability inequalities needed in our cases will be detailed in the next subsections. Concerning quadrature error estimates, these are well known in the literature, but for the sake of completeness we now recall some basic concepts and results on this issue.

Error Estimates for Quadrature Rules
For a given function f : D ⊂ ℝ^d → ℝ, a quadrature rule approximating the integral ∫_D f(x) dx is defined by Q_N[f] := Σ_{j=1}^{N} w_j f(x_j), where (x_j, w_j), 1 ≤ j ≤ N, are the nodes and weights of the quadrature rule. Quadrature errors depend on the specific rule used, on the smoothness of the function f, and on the dimension d. For regular functions and low dimensions, one typically uses Gauss or Clenshaw-Curtis rules. Rules based on low-discrepancy sequences, such as Sobol sequences, are the rules of choice for intermediate dimensions [39]. In both cases, the error estimates for these quadrature rules take the general form (12), |∫_D f(x) dx − Q_N[f]| ≤ C_q N^{−α}, where α depends on the regularity of f and the constant C_q = C_q(d), which also depends on f and its derivatives, blows up as d → ∞. Monte Carlo integration is immune to the curse of dimensionality and applies to non-smooth integrands. As is well known, the error estimate in that case is as in (12) with C_q independent of the dimension d and α = 1/2.
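The Monte Carlo rate α = 1/2 in (12) can be observed numerically. The sketch below integrates a smooth function over the unit cube in d = 10, where the exact value is known; the particular integrand and sample sizes are illustrative choices:

```python
import numpy as np

def mc_integrate(f, d, N, rng):
    """Monte Carlo quadrature on [0,1]^d with uniform weights w_j = 1/N."""
    x = rng.random((N, d))  # N nodes sampled uniformly in the unit cube
    return f(x).mean()

# exact integral of f(x) = sum_j x_j over [0,1]^d is d/2; here d = 10
d = 10
f = lambda x: x.sum(axis=1)
rng = np.random.default_rng(0)
errors = [abs(mc_integrate(f, d, N, rng) - d / 2) for N in (100, 10000)]
# increasing N by a factor 100 typically shrinks the error by about a factor 10,
# consistent with the N^{-1/2} rate, independently of d
```

The same experiment with a Gauss rule in d = 10 would require a tensor grid whose size grows exponentially in d, which is precisely the curse of dimensionality the Monte Carlo rate avoids.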

Wave Equation
The generalization error in the control variable u due to the PINN algorithm proposed in Sect. 2.1 is defined as E_gener(u) := ‖u − û‖, where u = u(t) is the exact control of the continuous problem (1)-(2) and û = û(t; θ*) is its numerical approximation given by (6). Similarly, the generalization error for the state variable is defined by (14). As is usual in machine learning terminology, the so-called training error of the PINNs algorithm is given by the value of the loss function (4) at the trained parameters, E_train := L(θ*; T_N), with θ* as in (5).
Next, we recall classical observability and energy inequalities for the wave equation.

Lemma 3.1 Assume that the domain Ω satisfies the geometric control condition [3], and let T > 0 be large enough. Given initial and final data, the observability inequality (19) holds; moreover, the positive constant C_o = C_o(Ω, T) in (19) depends on Ω and T, but is independent of the initial and final data.

Lemma 3.2 Consider the non-homogeneous system with boundary data in L²(Γ_C × (0, T)). Then, there exists a positive constant C_e = C_e(Ω, T) such that the energy estimate (21) holds.

We are now in a position to estimate the generalization error for our PINNs-based algorithm.
It is assumed that ŷ ∈ C²(Q̄_T). Let u = u(x, t) and û = û(x, t; θ*) be the exact control of the continuous system (1)-(2) and its PINN approximation, respectively. Then, the estimate (22) for the generalization error in the control variable holds, where C = C(Ω, T), so that C = C(d) also depends on the spatial dimension d. The constants C_q and exponents α appearing in (22) are the ones associated with the quadrature rules as in (12). A similar estimate, with different constants, also holds for the generalization error in the state variable, as given by (14).
Proof Let ỹ = y − ŷ and ũ = u − û be the errors in the state and control variables, respectively. By linearity, ỹ solves the wave system (23) with control ũ. Again by linearity, ỹ is decomposed as ỹ = ỹ₁ + ỹ₂, where ỹ₁ and ỹ₂ are, respectively, the solutions to (24) and (25). By applying the observability inequality (19) to system (24) and the energy estimate (21) to (25), estimate (22) then follows by applying (12). The corresponding estimate for the generalization error (14) is an immediate consequence of (21) and (22).
Although it has not been written explicitly hereinabove, it is clear that the generalization errors depend on the specific type and size of the neural network, as well as on the type and number of quadrature nodes selected from the very beginning. Thus, denoting by H_m the hypothesis space considered for the numerical approximation, where m denotes the number of neurons (or free parameters) in the neural network, and by N the number of collocation points used for quadrature, we make this dependence explicit by writing E_gener(u) = E^{m,N}_gener(u) and E_gener(y) = E^{m,N}_gener(y).
Next, the behavior of the generalization errors is analyzed as the size m of single-layer neural networks goes to infinity and the sampling size N → ∞ as well. Consider the hypothesis space H_m of single-layer neural networks. The training process (5) is then carried out over H_m, leading to the minimization problem (27). From now on it is assumed that the optimization problem (27) has a solution; otherwise, one can always add a regularization term of the form ‖θ‖². We now recall the following universal approximation theorem due to Pinkus [33, Th. 4.1].
Assume that the activation function σ ∈ C^k(ℝ) is not a polynomial. Then, for any compact set K ⊂ ℝ^{d+1} and any ε > 0 there exist m ∈ ℕ and y_m ∈ H_m such that the approximation property (28) holds.

Each of the terms on the left-hand side of (29) is now expressed by using a quadrature rule with collocation nodes T_N. Then, taking into account the optimality of ŷ^{m,N}, as given by (27), one deduces that the sum of training errors appearing on the right-hand side of (22) is less than or equal to ε/2. Moreover, for fixed m = m(ε), there exists N such that the sum of quadrature errors in (22) is also less than or equal to ε/2. Thus, E^{m,N}_gener(u) ≤ ε. The arbitrariness of ε gives the result for the generalization error in the control variable. The case of the state variable is completely analogous.

Extension to Other PDE Systems and Neural Network Architectures
It is clear that the arguments and conclusions of Theorem 3.1 and Corollary 3.1 extend to any linear system of PDEs for which observability as well as energy inequalities similar to those in (19) and (21) hold. Linearity of the PDE is used in an essential way in the proof of Theorem 3.1. Thus, a different argument is needed to extend this result to the case of nonlinear PDEs.
The proof of Corollary 3.1 relies on the universal approximation theorem by Pinkus for the case of single-layer neural networks. Thus, the conclusion of Corollary 3.1 also holds for other neural network architectures for which such a density result is true.

Numerical Experiments
In this section, we test the performance of the proposed method in three exact controllability problems. The first experiment aims at checking the accuracy of the method on a very simple controllability problem for the wave equation for which an exact solution is explicitly known. In the second experiment, the high-dimensional situation is tested on a controllability problem for the heat equation. The last experiment considers a semilinear wave equation.
As indicated at the beginning of Sect. 3, the optimization error due to the gradient-based algorithms used for training is a key ingredient in the total error associated with the proposed PINN algorithm. This error has not been accounted for in Theorem 3.1 and Corollary 3.1, whereas the numerical simulation results presented in this section do incorporate it. As a consequence, the simulation results cannot be expected to illustrate the theoretical findings of Sect. 3 with full accuracy: the gap between the theoretical error estimates and the simulation results is accounted for by the optimization error in the training process.
In all the experiments that follow, a multilayer neural network, as described in Sect. 2, with tanh as the activation function, is used. Sobol quadrature nodes [39] are employed for training the neural network. The training process, i.e., the minimization of L(θ; T_N), is carried out with the ADAM optimizer [19] with learning rate 10⁻³ for the first 20000 epochs. Then, an L-BFGS optimizer [7] is employed to accelerate convergence. The required gradients are computed by automatic differentiation [2]. The descent algorithm is initialized with the Glorot uniform initialization [14]. As is well known, the results obtained from gradient-based optimizers depend on the initialization. A common practice to deal with this issue is to perform ensemble training [24]. However, the use of this and other more sophisticated techniques (residual-based adaptive refinement (RAR) [23], dropout [40], batch normalization [18], etc.) is not the purpose of this paper, which aims at illustrating the possible use of PINNs in the topic of controllability of PDEs.
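The two-stage optimization described above (a first-order method for many epochs, then L-BFGS to refine) can be mimicked on a toy least-squares loss with scipy. This is only a schematic of the training pattern, not of the actual PINN loss; plain gradient descent stands in for ADAM, and the loss function is an arbitrary smooth example:

```python
import numpy as np
from scipy.optimize import minimize

# toy smooth least-squares loss standing in for L(theta; T_N)
def loss(theta):
    return float(np.sum((np.tanh(theta) - 0.5) ** 2))

def grad(theta):
    t = np.tanh(theta)
    return 2.0 * (t - 0.5) * (1.0 - t**2)  # chain rule for the loss above

rng = np.random.default_rng(0)
theta = rng.normal(size=5)

# stage 1: plain gradient descent (standing in for ADAM), fixed learning rate
for _ in range(200):
    theta -= 1e-1 * grad(theta)

# stage 2: L-BFGS to accelerate convergence near the minimizer
res = minimize(loss, theta, jac=grad, method="L-BFGS-B")
theta_star = res.x  # each coordinate approaches arctanh(0.5)
```

The design rationale is the one stated in the text: a robust first-order phase moves the iterate into a good basin, after which a quasi-Newton method converges much faster than continued first-order steps would.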

Experiment 1: Linear Wave Equation
We consider the control system (1)-(2) in the domain Ω = (0, 1), for the given data and for the control time T = 2. An explicit solution of the problem is easily obtained by using d'Alembert's formula. Indeed, by considering a suitable auxiliary function, the explicit exact state and the exact control (31) are obtained.

Remark 4.1 We notice that the control given by (31) is the one of minimal L²-norm. This is no longer true if the initial velocity y_1 is different from zero (see [16], Section 4.1 for details).
The efficiency of the proposed PINN-based algorithm in approximating the solution to this problem is analyzed next. The generalization error in the control variable, E_gener(u) := ‖u − û‖_{L²(0,T)}, the L²-relative error ‖u − û‖_{L²(0,T)} / ‖u‖_{L²(0,T)}, and the total training error E_train := L(θ*; T_N) are computed for several values of the total number N of training points and several architectures of the neural network. The effect of regularization, where the term λ_reg ‖θ‖²₂, with λ_reg > 0, is added to the loss function (4), is also studied.
Once the training process is finished and the optimal set of parameters θ* is obtained, the PINN control û(t; θ*) = ŷ(1, t; θ*) is computed on a uniform mesh of size h = 0.02 on the segment {(1, t) : 0 ≤ t ≤ 2}. Both the generalization error and the L²-relative error are then approximated on the same mesh. The training points are split into interior and boundary points as follows: for a given positive integer N₀, 3N₀ points are located on the boundary and N₀² in the interior domain. Thus, N = N₀² + 3N₀. Tables 1 and 2 collect the simulation results for a multilayer neural network composed of 4 hidden layers with 50 neurons in each layer. It is observed that both the generalization error and the L²-relative error slowly decrease as the number of training points increases. The comparison between Tables 1 and 2 shows that regularization does not increase the level of accuracy. Table 3 displays the simulation results for a single-layer architecture having the same number of neurons as the multilayer neural network considered in Tables 1 and 2. It is observed that the single-layer architecture provides slightly less accurate results. Figure 4 shows the exact control (31) and the PINN control û(t; θ*), as well as the error between the exact and PINN states.
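The point split used in this experiment, N = N₀² + 3N₀, and the discrete relative L² error on the uniform evaluation mesh can be sketched as follows. The helper names `split_counts` and `rel_l2_error` and the stand-in control functions are illustrative, not from the paper's code:

```python
import numpy as np

def split_counts(n0):
    """Interior/boundary split of Experiment 1: N0^2 interior and 3*N0 boundary points."""
    return n0**2, 3 * n0

def rel_l2_error(u_exact, u_pred, h):
    """Discrete relative L2(0,T) error on a uniform mesh of size h."""
    num = np.sqrt(h * np.sum((u_exact - u_pred) ** 2))
    den = np.sqrt(h * np.sum(u_exact**2))
    return num / den

t = np.arange(0.0, 2.0 + 1e-12, 0.02)   # uniform mesh of size h = 0.02 on [0, 2]
u = np.sin(np.pi * t)                    # stand-in for the exact control u(t)
u_hat = u + 1e-3 * np.cos(3.0 * t)       # stand-in for the PINN control u_hat(t)
err = rel_l2_error(u, u_hat, h=0.02)     # small relative error for this small perturbation
```

For instance, with N₀ = 10 this split gives 100 interior and 30 boundary points, i.e., N = 130 collocation nodes in total.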
The effect of increasing the depth (number of hidden layers) and width (number of neurons per layer) of the neural network has also been tested. We have observed that the level of accuracy of the solutions does not improve significantly. This is in agreement with previous studies (see, e.g., [23]) showing that a relatively small neural network is able to accurately approximate smooth solutions of PDEs.

Experiment 2: Linear Heat Equation
In this experiment, we consider the heat system (7)-(8) for Ω = (0, 1)^d and d = 1, 5, 10, and 20.

The one-dimensional case. For comparison purposes, the case d = 1 is addressed first. The first mode of the Laplacian, y₀(x) = sin(πx), 0 < x < 1, is taken as the initial condition. At x = 0, a zero Dirichlet boundary condition is imposed. The control function acts on the extreme x = 1. In order to have better control of the diffusion, the Laplacian Δ is replaced by κΔ, with κ = 0.25. The control time is T = 0.5. This experiment has been previously considered in [30, Subsection 5.1]. Figure 5 shows the predicted state (left) and control (right) obtained from the PINN algorithm described in Sect. 2.2, for a feedforward neural network composed of 5 hidden layers with 100 neurons in each layer. The number of training points is N = 10300. Once the training process is completed, the training error for the controllability condition y(x, T) = 0, 0 < x < 1, which provides an approximation of ‖y(·, T)‖_{L²(Ω)}, is 1.17 × 10⁻⁵. It is observed in Fig. 6 that both the PINN control and state have a profile similar to the one in [30, Figures 2 and 4 (left)]. However, no oscillations near the final time appear in the PINN control. This does not contradict the results in [30], since it is well known that null controls are not unique.

The multi-dimensional case. In order to check the accuracy of the proposed method in high dimensions, we consider a control-to-trajectory problem for Ω = (0, 1)^d and T = 1. This problem has an explicit solution [28]; the control function is obtained as the trace of y on ∂Ω. Table 4 displays the simulation results for the L²-relative error in the state variable and the training error. It is observed that, even in high dimensions, the relative error in the state variable is very low. The accuracy is similar to the one obtained for forward PDE problems via PINNs [28].

Experiment 3: A Semilinear Wave Equation
Next, we consider a nonlinear situation, precisely the case of a semilinear wave equation. Positive results on the exact controllability for semilinear wave equations have been obtained, among others, in [31,44,45].
In this experiment, the following null controllability problem for a semilinear wave equation is considered. This problem has been previously studied in [8, Subsection 4.2.1]. The proposed PINN-based algorithm has been tested for different neural network architectures and numbers of training points. Table 5 collects the simulation results for all the contributions to the training error as in (16). Recall that E_train,int is the training error associated with the residual of the PDE, E_train,boundary corresponds to the boundary condition at x = 0, E_train,initialpos and E_train,initialvel are, respectively, the training errors for the initial position and velocity, and, finally, E_train,finalpos and E_train,finalvel are the training errors for the controllability condition at the control time T = 2. It is observed in Table 5 that increasing the number of training points does not significantly reduce the training errors. This is in accordance with previous studies (see, e.g., the numerical experiments in [27]). Recall that the training error includes the optimization error due to the gradient-based descent algorithms used for the minimization of the highly nonconvex loss function (4), for which no information is available. Figure 6 displays the numerical simulation results obtained for a multilayer neural network composed of 5 hidden layers with 100 neurons in each layer. For this particular example, no explicit solution is known, so it is not possible to check the accuracy of the method. In addition, as mentioned in the preceding experiment, the control is not unique. Nonetheless, a comparison between Fig. 6 here and Fig. 2 in [8] shows that the results are very similar.

Conclusions
Even though highly accurate methods are available for approximating numerically a wide range of controllability problems for PDEs, the applicability of these methods to high-dimensional problems is questionable due to the well-known curse of dimensionality phenomenon.
The present paper provides a first attempt to overcome this difficulty. It relies on the use of modern deep-learning-based methods, in particular on the so-called physics-informed neural networks. More precisely, a PINNs-based method has been proposed for the numerical approximation of controllability problems for both linear and nonlinear PDEs. The problem is formulated as the minimization, in the least-squares sense, of a loss function that accounts for the residual of the PDE and its initial, boundary, and final conditions. The main novelties with respect to more classical numerical methods in control of PDEs are: (i) a feedforward neural network is used for approximating both the state and the control variables, and (ii) the method is mesh-free. In addition, it is important to emphasize that, although deep-learning-based methods have found great success in many applications, no theoretical results had appeared in the literature in this field so far. In this respect, estimates for the generalization error (in both the control and state variables) in terms of training and quadrature errors have been obtained in this paper. It is also proved that the training error vanishes as the size of the neural network and the number of training points go to infinity. An important feature of these theoretical results is that they apply to any controllability problem for a linear PDE in any dimension, so that PINNs qualify as a promising tool to deal with high-dimensional problems.
The accuracy in our numerical experiments is similar to the one obtained by using the PINN algorithm [34] for solving forward problems for PDEs. This is not surprising since the proposed method is a PINN-based algorithm for solving PDEs where final conditions are added to the picture and the control is obtained as the trace of the solution of the PDE.
There are many interesting questions that remain open. Some of them are:

- Since the constants that appear in our estimates of the generalization error come from energy and observability inequalities, they depend on the spatial dimension d. To what extent these estimates break the curse of dimensionality is a very interesting open problem.
- Although it was proved that the training error converges to zero as the size of the network and the number of training points increase, to the best of the authors' knowledge, estimating the training error remains a very challenging open problem. It is clear that training errors can be estimated a posteriori. Nonetheless, a posteriori estimates of training errors are, in general, not sharp, as the training error incorporates errors due to the numerical approximation of highly nonconvex optimization problems whose solutions get stuck in local minima. This issue has been observed in our numerical experiments, where increasing the size of the neural network and the number of training points produces only a very slow decrease of the training error.
- Proving error estimates for the generalization error in the case of controllability problems for nonlinear PDEs and for other types of control actions (e.g., distributed control) are also interesting open problems.

Reproducibility
The implementation of the numerical experiments presented in Sect. 4 has been performed with the user-friendly Python library DeepXDE [23], which is available at https://github.com/lululxvi/deepxde. Python scripts for the three experiments can be downloaded from https://github.com/fperiago/deepcontrol.