Abstract
Rapidly developing machine learning methods have stimulated research interest in computationally reconstructing differential equations (DEs) from observational data, providing insight into the underlying mechanistic models. In this paper, we propose a new neural-ODE-based method that spectrally expands the spatial dependence of solutions to learn the spatiotemporal DEs they obey. Our spectral spatiotemporal DE learning method has the advantage of not explicitly relying on spatial discretization (e.g., meshes or grids), thus allowing reconstruction of DEs that may be defined on unbounded spatial domains and that may contain long-ranged, nonlocal spatial interactions. By combining spectral methods with the neural ODE framework, our proposed spectral DE method addresses the inverse-type problem of reconstructing spatiotemporal equations in unbounded domains. Even for bounded domain problems, our spectral approach is as accurate as some of the latest machine learning approaches for learning or numerically solving partial differential equations (PDEs). By developing a spectral framework for reconstructing both PDEs and partial integro-differential equations (PIDEs), we extend dynamical reconstruction approaches to a wider range of problems, including those in unbounded domains.
1 Introduction
There has been much recent interest in developing machine-learning-based methods for learning the underlying physics-based equations of motion from data. In this paper, we are interested in learning the general dynamics F[u; x, t] of spatiotemporal partial differential equations (PDEs) or spatiotemporal partial integro-differential equations (PIDEs) such as
Here, \(\Omega \) is the spatial domain of interest and F[u; x, t] represents a general spatiotemporal operator acting on the function u(x, t), including linear combinations of all differential operators acting on u, such as \(u_x, u_{xx}, u_{xxx},...\), and spatial convolutional operators.
Although machine learning approaches have been proposed for many types of inverse problems that reconstruct partial differential equations (PDEs) from data [1, 2], most of them make prior assumptions about the specific form of the PDE and use a spatial discretization, i.e., grids or meshes, of a bounded spatial variable x to approximate the solutions of the PDE. There are three main types of machine-learning-based methods for learning PDEs: (i) methods that use neural networks to reconstruct the RHS of Eq. (1), F[u; x, t], by assuming that it can be well approximated by a (non)linear combination of a class of differential operators, (ii) methods that try to find an explicit mathematical expression for F[u; x, t] by imposing specific forms on F, and (iii) methods that circumvent learning F[u; x, t] by reconstructing a map from the initial condition to the solution at a later time. Long et al. [3] and Churchill et al. [4] used convolutional layers to construct the spatial derivatives of u and then applied a neural network [3, 4] or a symbolic network [5] to approximate F[u; x, t] by \(F(x, u_x, u_{xx},\dots )\). Sparse identification of nonlinear dynamics (SINDy) [6] and its variants [7, 8] learn the dynamics of PDEs by using sparse regression to infer the coefficients \(\textbf{a}\) in \(\partial _{t}u(x,t) = \textbf{a}\cdot (1, u, u^2, u_x, u_{xx}, \ldots )\), where \(\textbf{a}\) is the to-be-learned row vector of coefficients associated with each term in the PDE. These methods impose an additive form on F[u; x, t]. Additionally, the Fourier neural operator (FNO) [9] and related approaches [10, 11], which learn the mapping between the function space of the initial condition \(u_0(\cdot , 0)\) and the function space of the solution \(u(\cdot , t)\) within a time range \(t\in [t_1, t_2]\), have also been developed recently.
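To make the sparse-regression idea behind SINDy concrete, the toy sketch below regresses \(\partial_t u\) onto a small library of candidate terms for a manufactured solution of the heat equation \(u_t = u_{xx}\). The grid, library, and threshold are illustrative choices, and exact derivatives stand in for the numerical differentiation used in practice; this is not the cited implementations.

```python
import numpy as np

# Manufactured solution u = e^{-t} sin(x) + e^{-4t} sin(2x) of u_t = u_xx.
x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
t = np.linspace(0.0, 1.0, 21)
X, T = np.meshgrid(x, t)

U = np.exp(-T) * np.sin(X) + np.exp(-4 * T) * np.sin(2 * X)
Ut = -np.exp(-T) * np.sin(X) - 4 * np.exp(-4 * T) * np.sin(2 * X)
Ux = np.exp(-T) * np.cos(X) + 2 * np.exp(-4 * T) * np.cos(2 * X)
Uxx = Ut.copy()                       # for the heat equation, u_xx = u_t exactly

# Candidate library (1, u, u^2, u_x, u_xx) and a least-squares fit of u_t
library = np.stack([np.ones_like(U), U, U**2, Ux, Uxx], axis=-1).reshape(-1, 5)
a, *_ = np.linalg.lstsq(library, Ut.ravel(), rcond=None)
a[np.abs(a) < 0.1] = 0.0              # one pass of coefficient thresholding
print(a)                              # only the u_xx coefficient survives
```

Because the library columns are linearly independent on this grid, the least-squares fit recovers the single active term \(u_{xx}\) with coefficient 1.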
In summary, previous methods either assume that F[u; x, t] can be approximated by some (non)linear combination of differential operators, impose a specific form on F[u; x, t], or circumvent learning F[u; x, t] by reconstructing a map from the initial condition to the solution at a later time. To our knowledge, no existing method can extract the dynamics F[u; x, t] from data without making prior assumptions on its form. Moreover, since most prevailing numerical methods for time-dependent DEs rely on spatial discretization via meshes or grids, they cannot be directly applied to problems defined on an unbounded domain [12]. Nonetheless, many physical systems involve the evolution of quantities that experience long-range spatial interactions, requiring the solution of spatiotemporal integro-differential equations defined on unbounded domains; i.e., the dynamics F on the RHS of Eq. (1) might contain a spatial convolution operator. Examples of such spatiotemporal integro-differential equations include fractional Laplacian equations for anomalous diffusion [13] and the Keller–Segel equation [14] describing swarming behavior. Reconstructing F[u; x, t] on the RHS of Eq. (1) from observations of the physical quantity u(x, t) can help uncover the physical laws that govern its time evolution.
One major difficulty that prevents direct application of previous methods to unbounded domain problems is that one needs to truncate the unbounded domain and define appropriate boundary conditions [15, 16] on the new artificial boundaries. Although generalizations of the FNO method that include basis functions other than the Fourier series [17] can potentially be applied to unbounded domains, they do not reconstruct the dynamics F[u; x, t]. Moreover, they treat x and t in the same way using nonadaptive basis functions, which can be inefficient for addressing unbounded-domain spatiotemporal problems in which basis functions often need to be dynamically adjusted over time. Recently, an adaptive spectral PINN (s-PINN) method was proposed to solve specified unbounded-domain PDEs [18]. The method expresses the underlying unknown function in terms of adaptive spectral expansions in space with time-dependent coefficients of the basis functions, does not rely on explicit spatial discretization, and can be applied to unbounded domains. However, like many other approaches, the s-PINN approach assumes that the PDE takes the specific form \(u_t = F(u, u_x, u_{xx},...) + f(x, t)\) in which \(F(u,u_x,u_{xx},...)\) is known and only the u-independent source term f(x, t) is unknown and to be learned. Therefore, the s-PINN method is limited to parameter inference and source reconstruction.
a A 1D example of the spectral expansion in an unbounded domain with scaling factor \(\beta \) and displacement \(x_0\) (Eq. (7)). b The evolution of the coefficient \(c_0(t)\) and the two tuning parameters \(\beta (t), x_0(t)\). c A schematic of how to reconstruct Eq. (1) satisfied by the spectral expansion approximation \(u_{N, x_0}^{\beta }\). The time t, expansion coefficients \(c_i\), and tuning variables \(\beta (t)\) and \(x_0(t)\) are inputs of the neural network, which then outputs \(\varvec{F}(\tilde{\varvec{c}}_N;t,\Theta )=(F_0,...,F_N, F_{\beta }, F_{x_0})\). The basis functions \(\phi _i\big (\beta (t)(x-x_0(t))\big )\) are shaped by the time-dependent scaling factor \(\beta (t)\) and shift parameter \(x_0(t)\), which are determined by \(\frac{\text {d} \beta }{\text {d}t} \approx F_{\beta }\) and \(\frac{\text {d}x_0}{\text {d} t}\approx F_{x_0}\), respectively
In this paper, we propose a spectral-based DE learning method that extracts the unknown dynamics in the spatiotemporal DE Eq. (1) by using a parameterized neural network to express \(F[u; x, t]\approx F[u;x, t,\Theta ]\). Specifically, our spectral-based DE learning approach aims to reconstruct both spatiotemporal PDEs and spatiotemporal PIDEs where the spatial variable x is defined on an unbounded domain. Moreover, our approach does not require prior assumptions on the form of F[u; x, t]. Throughout this paper, the term “spatiotemporal DE” will refer to both PDEs and PIDEs in the form of Eq. (1). The formal solution u is then represented by a spectral expansion in space,
where \(\{\phi _i\}_{i=0}^N\) is a set of appropriate basis functions that can be defined on bounded or unbounded domains and \(\{c_i\}_{i=0}^N\) are the associated coefficients. We assume that the spectral expansion coefficients \(c_i(t_{j}), i=0,...,N\) in Eq. (2) are given as inputs at various time points \(\{t_{j}\}=t_0,...,t_M\).
By using the spectral expansion in Eq. (2) to approximate u, we do not need an explicit spatial discretization like spatial grids or meshes, and the spatial variable x can be defined in either bounded or unbounded spatial domains. The best choice of basis functions will depend on the spatial domain. In bounded domains, any set of basis functions in the Jacobi polynomial family, including Chebyshev and Legendre polynomials, provides similar performance and convergence rates; for semibounded domains \(\mathbb {R}^+\), generalized Laguerre functions are often used; for unbounded domains \(\mathbb {R}\), generalized Hermite functions are used if the solution is exponentially decaying at infinity, while mapped Jacobi functions are used if the solution is algebraically decaying [19, 20].
Additionally, after using the spectral expansion Eq. (2), a numerical scheme for Eq. (1), regardless of whether it is a PDE or a PIDE involving convolution terms in the spatial variable x, can be expressed as ordinary differential equations (ODEs) in the expansion coefficients \(\varvec{c}_N(t):=(c_0(t),\ldots , c_N(t))\)
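As a concrete (if especially simple) instance of this reduction, consider the heat equation \(u_t = u_{xx}\) on a periodic domain: in a Fourier basis the PDE collapses to decoupled ODEs \(\dot{c}_k = -k^2 c_k\) for the expansion coefficients. The sketch below (grid size and initial condition are illustrative) verifies this numerically.

```python
import numpy as np

# Heat equation u_t = u_xx on [0, 2*pi) with periodic BCs.
# In the Fourier basis phi_k(x) = exp(i*k*x), the PDE reduces to
# decoupled coefficient ODEs: dc_k/dt = -k^2 c_k.
N = 32
x = 2 * np.pi * np.arange(N) / N
u0 = np.sin(x) + 0.5 * np.cos(3 * x)      # initial condition
c0 = np.fft.fft(u0)                       # expansion coefficients c_k(0)
k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers

t = 0.1
c_t = c0 * np.exp(-k**2 * t)              # exact solution of the coefficient ODEs
u_t = np.real(np.fft.ifft(c_t))           # reconstruct u(x, t)

# Analytic solution for this initial condition
u_exact = np.exp(-t) * np.sin(x) + 0.5 * np.exp(-9 * t) * np.cos(3 * x)
print(np.max(np.abs(u_t - u_exact)))      # agrees to machine precision
```

For a general F in Eq. (1), the coefficient ODEs are of course coupled and unknown; it is exactly this coupled system, Eq. (3), that the neural network is trained to represent.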
The spatiotemporal DE learning method proposed here differs substantially from the s-PINN framework because it makes no assumptions on the form of the spatiotemporal DE in Eq. (1) other than that the RHS F does not contain time-derivatives or time-integrals of u(x, t). Instead, the spectral neural DE method models F directly with a neural network, inputting both the solution u(x, t) (in terms of its spectral expansion) and t, and employs the neural ODE method [21] to fit the trajectories of the ground truth spectral expansion coefficients \(\varvec{c}_N(t)\) (see Fig. 1c). Thus, general DEs such as Eq. (1) can be learned with little knowledge of the RHS. To summarize, the proposed method presented in this paper has the advantage that it
- (i) does not require assumptions on the explicit form of F other than that it should not contain any time-derivatives or time-integrals of u; both spatiotemporal PDEs and PIDEs can be learned in a unified way;
- (ii) directly learns the dynamics of a spatiotemporal DE (the RHS of Eq. (1)) by using a parameterized neural network that can time-extrapolate the solutions; and
- (iii) does not rely on explicit spatial discretization and can thus learn unbounded-domain DEs. By further using adaptive spectral techniques, our neural DE learning method also learns the dynamics of the shaping parameters that adjust the basis functions. Additionally, our neural DE learning method can take advantage of sparse spectral methods [22] for effectively reconstructing multidimensional spatiotemporal DEs using a reduced number of inputs.
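The core fitting loop can be sketched as follows, with toy two-dimensional linear coefficient dynamics standing in for Eq. (3), a small MLP playing the role of \(F(\varvec{c}_N; t, \Theta)\) in Eq. (4), and a fixed-step Runge–Kutta integrator replacing torchdiffeq's odeint_adjoint for brevity. Network size, learning rate, and epoch count are illustrative.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for Eq. (3): ground-truth coefficient dynamics dc/dt = A c.
A = torch.tensor([[-0.5, 1.0], [-1.0, -0.5]])

def rk4_step(f, c, dt):
    # One classical Runge-Kutta step (odeint_adjoint would replace this)
    k1 = f(c)
    k2 = f(c + 0.5 * dt * k1)
    k3 = f(c + 0.5 * dt * k2)
    k4 = f(c + dt * k3)
    return c + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Ground-truth coefficient trajectory sampled at discrete times t_j
dt, steps = 0.1, 10
c = torch.tensor([1.0, 0.0])
truth = [c]
for _ in range(steps):
    c = rk4_step(lambda c: c @ A.T, c, dt)
    truth.append(c)
truth = torch.stack(truth)

# Neural network approximating the unknown dynamics F
F = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ELU(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(F.parameters(), lr=0.01)

for epoch in range(300):
    opt.zero_grad()
    c = truth[0]
    pred = [c]
    for _ in range(steps):
        c = rk4_step(F, c, dt)      # integrate the learned dynamics
        pred.append(c)
    loss = torch.mean((torch.stack(pred) - truth) ** 2)
    loss.backward()
    opt.step()
```

In the actual method, the network additionally takes t (and, in unbounded domains, \(\beta\) and \(x_0\)) as inputs, the loss is the relative squared \(L^2\) mismatch of the reconstructed solutions, and integration is performed with odeint_adjoint.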
In the next section, we formulate our spatiotemporal DE learning method. In Sect. 3, we use our spatiotemporal DE method to learn the underlying dynamics of DEs. Although our main focus is to address learning unbounded-domain spatiotemporal DEs, we perform benchmarking comparisons on bounded-domain problems that are solved using other recently developed machine-learning-based PDE learning methods that apply only in bounded domains. Concluding remarks are given in Sect. 4. Additional numerical experiments and results are given in the Appendix.
2 Spectral spatiotemporal DE learning method
We now formalize our spectral spatiotemporal DE learning method for spatiotemporal DEs of the general structure of Eq. (1), assuming F[u; x, t] does not include time-differentiation or time-integration of u(x, t). However, unlike in [10], the “dynamics” F[u; x, t] on the RHS of Eq. (1) can take any other form including differentiation in space, spatial convolution, and nonlinear terms. Below is a table of the notation used throughout the development of our method and the test examples in the rest of the paper.
First, consider a bounded spatial domain \(\Omega \). Suppose we have observational data \(u_m(x, t),\, m=1,...,M\) for all x at given time points \(t_j, j=1,...,T\), associated with different initial conditions \(u_m(x, t_0),\, m=1,...,M\). Furthermore, we assume that the \(u_m(x, t)\) all obey the same underlying, well-posed spatiotemporal DE Eq. (1). Upon choosing proper orthogonal basis functions \(\{\phi _i(x)\}_{i=0}^N\), we can approximate u(x, t) by the spectral expansion in Eq. (2) and obtain the spectral expansion coefficients \(\varvec{c}_N(t):=(c_0(t),...,c_N(t))\) as in Eq. (3). We aim to reconstruct the dynamics \(F(\varvec{c}_N; t)\) in Eq. (3) by using a neural network
where \(\Theta \) is the set of parameters in the neural network. We can then construct the RHS of Eq. (1) using
where \(F_i\) is the \(i^{\text {th}}\) component of the vector \(\varvec{F}(\varvec{c}_N; t,\Theta )\). We shall use the neural ODE to learn the dynamics \(\varvec{F}(\varvec{c}_N;t, \Theta )\) by minimizing the mean loss function \(L(u_N(x, t;\Theta ), u(x, t))\) between the numerical solution \(u_N(x, t;\Theta )\) and the observations u(x, t). When data are provided at discrete time points \(t_j\), we need to minimize
with respect to \(\Theta \). Here, \(u_m(x, t_j)\) is the solution at \(t_j\) of the \(m^{\text {th}}\) trajectory in the dataset and \(u_{N, m}(x, t_j;{\Theta })\) denotes the spectral expansion solution reconstructed from the coefficients \(\varvec{c}_{N, m}\) obtained by the neural ODE for the \(m^{\text {th}}\) solution at \(t_j\).
To solve unbounded domain DEs (in any dimension \(\Omega \subseteq \mathbb {R}^D\)), two additional sets of parameters are needed to scale and translate the spatial argument \(\varvec{x}\): a scaling factor \(\varvec{\beta }:=(\beta ^1,\ldots ,\beta ^D)\in \mathbb {R}^D\) and a shift factor \(\varvec{x}_0:=(x_0^1,\ldots ,x_0^D)\in \mathbb {R}^D\). These factors need to be dynamically adjusted to obtain accurate spectral approximations of the original function [12, 23, 24]. When generalizing the spectral approximation \(u_{N, x_0}^{\beta }(x, t)\) in Table 1 to higher spatial dimensions, we can write
where here, \(\varvec{\beta }*(\varvec{x} - \varvec{x}_0) :=(\beta ^1(x^1-x_0^1),\ldots , \beta ^D(x^D-x_0^D))\) denotes the Hadamard product and \(\phi _{i}(\cdot )\) are D-dimensional basis functions.
Given observed \(u(\varvec{x}, t)\), the ground truth coefficients \(c_i(t)\) as well as the spectral adjustment parameters \(\varvec{\beta }(t)\) and \(\varvec{x}_0(t)\) at discrete time points can be obtained by minimizing the frequency indicator (introduced in [12])
that measures the error of the numerical representation of the solution u [25]. \(\mathcal {F}(u; \varvec{\beta }, \varvec{x}_0)\) depends on \(\varvec{\beta }, \varvec{x}_0\), and the expansion order N through the arguments of the basis functions and thus implicitly through their expansion coefficients \(c_{i}\). Thus, minimizing \(\mathcal {F}(u; \varvec{\beta }, \varvec{x}_0)\) will also minimize the approximation error \(\Vert u - \sum _{i=0}^N c_i\phi _{i}(\varvec{\beta }(t)*(\varvec{x}-\varvec{x}_0(t)))\Vert ^2_2\). Numerically evaluating \(c_i(t_{j})\) usually requires setting up appropriate collocation points determined by the basis functions and adaptive parameters \(\varvec{\beta }\) and \(\varvec{x}_0\). In such unbounded domain problems, the ground truth coefficients and adaptive parameters \(\tilde{\varvec{c}}_N :=\big (c_0(t),...,c_N(t),\varvec{\beta }(t), \varvec{x}_0(t)\big )\) at times \(t_{j}\) are given as inputs to the neural network.
In addition to \(\varvec{c}_{N}(t)\), the evolution of the adaptive parameters \(\varvec{\beta }(t), \varvec{x}_0(t)\) can also be learned by the neural ODE. More specifically,
for the ODEs satisfied by \(\tilde{\varvec{c}}_N :=\big (\varvec{c}_{N}(t),\varvec{\beta }(t), \varvec{x}_0(t)\big )\). The underlying dynamics \(\varvec{F}(\tilde{\varvec{c}}_N;t)\) is approximated as
by minimizing with respect to \(\Theta \) a loss function that also penalizes the error in \(\varvec{\beta }\) and \(\varvec{x}_0\)
Similarly, the DE satisfied by \(u_{N, \varvec{x}_0}^{\varvec{\beta }}(\varvec{x}, t)\) is
where
and \(F_i\) is the \(i^{\text {th}}\) component of \(\varvec{F}(\tilde{\varvec{c}}_N; t,\Theta )\). Here, \(\varvec{\beta }_{m}(t_j)\) and \(\varvec{x}_{0, m}(t_j)\) are the scaling factor and the displacement of the \(m^{\text {th}}\) sample at time \(t_j\), respectively, and \(\lambda \) is the penalty coefficient for squared mismatches in the scaling and shift parameters \(\varvec{\beta }\) and \(\varvec{x}_0\). In this way, the dynamics of the variables \(\varvec{x}_0,\varvec{\beta }\) are also learned by the neural ODE, so they do not need to be manually adjusted as they were in [12, 18, 24, 25].
If the space \(\Omega \) is high-dimensional, sufficiently smooth and well-behaved solutions can be approximated by restricting the basis functions \(\{\phi _{i, x_0}^{\beta }\}\) to those in the hyperbolic cross space. If this projection is performed optimally, such sparse spectral methods with spectral expansions defined in the hyperbolic cross space can reduce the effective dimensionality of the problem by leaving out redundant basis functions [22, 26] without significant loss of accuracy. We will show that our method can also easily incorporate the hyperbolic cross spaces to enhance training efficiency in modestly higher-dimensional problems.
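One common definition of the hyperbolic cross index set keeps only the multi-indices whose clipped product \(\prod_d \max(i_d, 1)\) is at most N; the exact set used in practice may differ. The sketch below shows the resulting sparsification relative to the full tensor-product basis.

```python
from itertools import product

import numpy as np

# One common hyperbolic cross index set: multi-indices (i_1,...,i_D) with
# prod_d max(i_d, 1) <= N (variants exist; this definition is an assumption).
def hyperbolic_cross(N, D):
    return [i for i in product(range(N + 1), repeat=D)
            if np.prod([max(k, 1) for k in i]) <= N]

N, D = 16, 2
full = (N + 1) ** D                      # full tensor-product basis size
sparse = len(hyperbolic_cross(N, D))     # retained basis functions
print(full, sparse)                      # the cross keeps far fewer indices
```

Only the coefficients attached to the retained indices need to be carried as neural-ODE state, which is what reduces the effective input dimension in higher-dimensional problems.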
3 Numerical experiments
In this work, we set \(L(\cdot , \cdot )\) to be the relative squared \(L^2\) error
in the loss function Eq. (6) used for training. We carry out numerical experiments to test our spectral spatiotemporal DE reconstruction method by learning the underlying DE given data in both bounded and unbounded domains. In this section, we use the odeint_adjoint function along with the dopri5 numerical integration method developed in the torchdiffeq package [21] to numerically integrate Eqs. (3) and (9). Stochastic gradient descent (SGD) and the Adam optimizer are used separately to optimize parameters of the neural network. Computations for all numerical experiments were implemented on a 4-core Intel® i7-8550U CPU, 1.80 GHz laptop using Python 3.8.10, Torch 1.12.1, and Torchdiffeq 0.2.3.
Since algorithms already exist for learning bounded-domain PDEs, we first examine a bounded-domain problem in order to benchmark our spatiotemporal DE method against two other recent representative methods, a convolutional neural PDE learner [10] and a Fourier neural operator PDE learning method [11].
Example 1
For our first example, we consider learning a bounded-domain Burgers’ equation that describes the behavior of viscous fluid flow [27]. This example illustrates the performance of our spatiotemporal DE method in learning bounded-domain PDEs and benchmarks our approach against some recently developed methods.
where
This model admits an analytic solution expressible as \(u(x, t) = -\frac{\psi _x(x, t)}{5\psi (x, t)}\). We then sample two independent random variables \(\xi _1, \xi _2 \sim \mathcal {U}(0, 1)\) to generate a class of solutions \(\{u\}_{\xi _1, \xi _2}\) to Eq. (16) for both training and testing. To approximate F in Eq. (4), we use a neural network that has one intermediate layer with 300 neurons and the ELU activation function. The basis functions in Eq. (2) are taken to be Chebyshev polynomials. For training, we use 200 solutions (each corresponding to an independently sampled pair \((\xi _{1}, \xi _{2})\) in Eq. (15)) and record the expansion coefficients \(\{c_i\}_{i=0}^9\) at times \(t_j=j\Delta {t}, \Delta {t}=\frac{1}{4}, j=0,\ldots ,4\). The test set consists of 100 additional solutions, also evaluated at times \(t_j=j\Delta {t}, \Delta {t}=\frac{1}{4}, j=0,\ldots ,4\).
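The preprocessing step, projecting each observed snapshot onto Chebyshev polynomials to obtain the coefficients used as neural-ODE state, can be sketched as follows. The node set and snapshot function are illustrative stand-ins for the Burgers' data.

```python
import numpy as np

# Project an observed snapshot u(x, t_j) onto Chebyshev polynomials to get
# the coefficients c_0,...,c_9 fed to the neural ODE.
N = 9
x = np.cos(np.pi * np.arange(2 * N + 2) / (2 * N + 1))   # Chebyshev-type nodes
u_snapshot = np.exp(-x) * np.sin(np.pi * x)              # stand-in for u(x, t_j)

c = np.polynomial.chebyshev.chebfit(x, u_snapshot, deg=N)   # coefficients c_0..c_9
u_rec = np.polynomial.chebyshev.chebval(x, c)               # spectral reconstruction
```

For smooth snapshots like this one, ten Chebyshev coefficients already reconstruct the nodal values to well below the training-error scale.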
In this bounded-domain problem, we can compare our results (the generated solutions u(x, t)) with those generated from the Fourier neural operator (FNO) and the convolutional neural PDE learner methods. In the FNO method, four intermediate Fourier convolution layers with 128 neurons in each layer were used to input the initial condition \(u(i\Delta x, 0)\). Then, the FNO method outputs the function values \(u(i\Delta x, t=j\Delta t)\) (with \(\Delta x = \frac{1}{128}, \Delta t = \frac{1}{4}\)) for \(j>0\) [11].
When implementing the convolutional neural PDE solver [10], we input \(u(i\Delta x,(j-1)\Delta t)\) and \(u(i\Delta x,j\Delta t)\) (with \(\Delta {x}=\frac{1}{100}, \Delta {t}=\frac{1}{250}\)) [10] into seven convolutional layers with 40 neurons in each layer, which output \(u(i\Delta x,(j+1)\Delta t)\) as the numerical solution at the next time step. Small \(\Delta {x}\) and \(\Delta {t}\) are used in the convolutional neural PDE solver method because this method depends on both spatial and temporal discretization, requiring fine discretization meshes in both dimensions. For all three methods, we used the Adam method to perform gradient descent with a learning rate \(\eta =0.001\) for 10000 epochs, which was sufficient for the errors in all three methods to converge. We list in Table 2 the mean relative \(L^2\) error
For the FNO and spectral PDE learning methods, we minimize the relative squared \(L^{2}\) loss (Eq. (14)). For the convolutional neural PDE solver, we must instead minimize the MSE loss: only partial, local spatial information on the solution is input during each training epoch, so the relative squared loss Eq. (14), which needs global spatial information to compute \(\Vert u\Vert _2\), cannot be properly defined. As shown in Table 2, the training-set error of the FNO method is smaller than that of the other two methods, while the convolutional neural PDE solver performs the worst. Nonetheless, our proposed neural spectral DE learning approach performs comparably to the FNO method, giving comparable mean relative \(L^2\) errors for learning the dynamics associated with the bounded-domain Burgers' equation, and can also generate new solutions given different initial conditions.
Additionally, the run times (using a 4-core i7-8550U laptop) in this example were \(\sim 2\) hours for the convolutional PDE solver method, \(\sim 6\) hours for the FNO method, and \(\sim 5\) hours for our proposed spatiotemporal DE learning approach. Overall, even in bounded domains, our proposed neural DE learning approach compares well with the recently developed convolutional neural PDE solver and FNO methods, providing comparable errors and efficiency in generating solutions to Eq. (15) given different initial conditions.
The Fourier neural operator method works well for solving Burgers' equation in Example 1, and there may be other, even more efficient methods for reconstructing bounded-domain spatiotemporal DEs. However, reconstructing unbounded-domain spatiotemporal DEs is substantially different from reconstructing their bounded-domain counterparts. First, spatial discretization cannot be directly applied to unbounded domains; second, if we truncate an unbounded domain to a bounded one, appropriate artificial boundary conditions need to be imposed [15]. Constructing such boundary conditions is usually complex, and improper boundary conditions can lead to large errors. A simple example in which the FNO fails when an unbounded domain is truncated to a bounded one is provided in Appendix A.
Since our spectral method uses basis functions, it obviates the need for explicit spatial discretization and can be used to reconstruct unbounded-domain DEs. Dynamics in unbounded domains are intrinsically different from their bounded-domain counterparts because functions can display diffusive and convective behavior leading to, e.g., time-dependent growth at large x. This growth poses intrinsic numerical challenges when using prevailing finite element/finite difference methods that truncate the domain.
Although it is difficult for most existing methods to learn the dynamics in unbounded spatial domains, our spectral approach can reconstruct unbounded-domain DEs by simultaneously learning the expansion coefficients and the evolution of the basis functions. To illustrate this, we next consider a one-dimensional unbounded domain inverse problem.
Example 2
Here, we examine a parabolic PDE in an unbounded domain and with initial conditions that depend on parameters \(\xi _1, \xi _2, \xi _3\). The PDE and its initial condition are given by
This example illustrates the application of our method to learning the dynamics of a parabolic PDE given different ground truth solutions in an unbounded domain corresponding to different initial conditions. The solution of the PDE, within the domain \(x\in \mathbb {R}\) and time interval \(t\in [0, 1]\), is expressed as
Since this problem is defined on an unbounded domain, neither the FNO nor the convolutional neural PDE methods can be used as they rely explicitly on spatial meshes or grids and apply only on bounded domains. However, given observational data \(u(\cdot , t)\) for different t, we can calculate the spectral expansion of u via the generalized Hermite functions [20]
and then use the spatiotemporal DE learning approach to reconstruct the dynamics F in Eq. (1) satisfied by u. Recall that the scaling factor \(\beta (t)\) and the displacement of the basis functions \(x_0(t)\) are also to be learned. To penalize misalignment of the spectral expansion coefficients and the scaling and displacement factors \(\beta \) and \(x_0\), we use the loss function Eq. (11). Note that taking the derivative of the first term in Eq. (11) would involve evaluating the derivative of Eq. (14) which would require evaluation of integrals such as \(\int \tfrac{u_{N,x_0}^{\beta }(x,t_j; \Theta ) - u(x, t_j)}{\Vert u\Vert _2^2} \partial _x u_{N, x_0}^{\beta }(x, t_j;\Theta ) \partial _{\Theta }\big [\beta (t_j;\Theta )(x-x_0(t_j;\Theta ))\big ] \text {d}{x}\). Expressing \(\partial _x u_{N, x_0}^{\beta }(x, t_j; \Theta ) \) in terms of the basis functions \(\hat{\mathcal {H}}_{i}\big (\beta (t)(x-x_{0}(t))\big )\) would involve a dense matrix–vector multiplication of the coefficients of the expansion \(\partial _x u_{N, x_0}^{\beta }(x, t_j; \Theta ) \partial _{\Theta }\big [\beta (t_j;\Theta )(x-x_0(t_j;\Theta ))\big ]\), which might be computationally expensive during backward propagation in the stochastic gradient descent (SGD) procedure.
Alternatively, let the neural network parameter after the \((j-1)^\mathrm{{th}}\) training epoch be \(\Theta _{j-1}\). During the calculation of the gradient of the loss function Eq. (11) w.r.t. \(\Theta \) at the \(j^\mathrm{{th}}\) epoch, we define \(\tilde{\beta }(t_j):=\beta (t_j;\Theta _{j-1}), \tilde{x}_0(t_j) :=x_0(t_j;\Theta _{j-1})\) to be constants independent of \(\Theta \) and then modify Eq. (11) to
so that backpropagation within each epoch will not involve calculating gradients of \(\tilde{\beta }_m(t_j), \tilde{x}_{0, m}(t_j)\) in the first term of Eq. (21). This simplified calculation reduces the computational cost of the training process but can provide gradients close to the true gradients when the reconstructed \(\tilde{\beta }(t_j), \tilde{x}_0(t_j)\) are close to the ground truth values \(\beta _m(t), x_{m, 0}(t)\). For example, when \(\beta _m(t;\Theta _{j-1})=\beta _m(t), x_{m, 0}(t;\Theta _{j-1})=x_{m, 0}(t)\), i.e., the reconstructed \(\beta _m(t;\Theta _{j-1}), x_{m, 0}(t;\Theta _{j-1})\) agree exactly with the ground truth, Eq. (11) and Eq. (21) will both become
No derivative of \(\beta , x_0\) w.r.t. \(\Theta \) will be used and only the gradient of F in Eq. (4) w.r.t. \(\Theta \) appears. In this case, the simplified gradient exactly reflects the true gradient. Therefore, we can fit the coefficients \(c_i(t)\) and \(\beta (t), x_0(t)\) separately, and then use the simplified loss gradient to update the neural network parameters.
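In PyTorch, this simplification amounts to detaching the current predictions of \(\beta\) and \(x_0\) inside the solution-mismatch term while keeping them differentiable in the penalty term. The toy sketch below (the two-term "expansion" and all numerical values are illustrative) confirms that \(\beta, x_0\) then receive gradients only through the penalty.

```python
import torch

# Predicted quantities (stand-ins for beta(t_j; Theta), x0(t_j; Theta), c(t_j))
beta_pred = torch.tensor(1.2, requires_grad=True)
x0_pred = torch.tensor(0.3, requires_grad=True)
c_pred = torch.tensor([0.9, 0.1], requires_grad=True)

beta_true, x0_true = 1.0, 0.0
c_true = torch.tensor([1.0, 0.0])

x = torch.linspace(-4.0, 4.0, 101)

def expansion(c, beta, x0):
    # Toy two-term "spectral" reconstruction with scaled/shifted argument
    y = beta * (x - x0)
    return c[0] * torch.exp(-y**2 / 2) + c[1] * y * torch.exp(-y**2 / 2)

u_true = expansion(c_true, torch.tensor(beta_true), torch.tensor(x0_true))

lam = 0.1
# Detached beta/x0 in the mismatch term (the Eq. (21) simplification);
# the penalty term still carries their gradients.
u_pred = expansion(c_pred, beta_pred.detach(), x0_pred.detach())
loss = torch.sum((u_pred - u_true) ** 2) / torch.sum(u_true**2) \
       + lam * ((beta_pred - beta_true) ** 2 + (x0_pred - x0_true) ** 2)
loss.backward()
# beta_pred.grad == 2*lam*(1.2 - 1.0); no contribution from the mismatch term
```

This avoids the dense matrix–vector products that would otherwise appear when differentiating the basis-function arguments during backpropagation.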
We use 100 solutions for training and another 50 solutions for testing with \(N=9, \Delta {t}=0.1, T=9, \lambda =0.1\). Each solution is generated from Eq. (19) with independently sampled parameters \(\xi _1\sim \mathcal {N}(3,\frac{1}{4})\), \(\xi _2\sim \mathcal {U}(0, \frac{1}{2})\), and \(\xi _3\sim \mathcal {N}(0, \frac{1}{2})\). A neural network with two hidden layers, 200 neurons in each layer, and the ELU activation function is used for training.
Setting \(\lambda =0.1\), we first compare the two loss functions Eqs. (11) and (21). After running 10 independent training processes using SGD, each containing 2000 epochs with a learning rate \(\eta =0.0002\), the average relative \(L^2\) errors obtained using the loss function Eq. (11) are larger than those obtained using Eq. (21). This difference arises on both the training and testing sets, as shown in Fig. 2a.
In Fig. 2b, we plot the average learned F (the RHS of Eq. (1)) for a randomly selected sample at \(t=0\) in the testing set. The dynamics learned using Eq. (21) are slightly more accurate than those learned using Eq. (11). Moreover, using the loss function Eq. (21) required only \(\sim 1\) hour of computational time, compared to 5 days when learning with Eq. (11) (on the 4-core i7-8550U laptop). Therefore, for efficiency and accuracy, we adopt the revised loss function Eq. (21) and separately fit the dynamics of the adaptive spectral parameters (\(\beta , x_0\)) and the dynamics of the spectral coefficients \(c_{i}\).
We also explore how network architecture and regularization affect the reconstructed dynamics. The results, shown in Appendix B, indicate that a wider and shallower neural network with 2 or 4 intermediate layers and 200 neurons in each layer yields the smallest errors on both the training and testing sets, along with short run times. We also apply a ResNet [28] as well as the dropout technique [29, 30] to regularize the neural network. Dropout does not reduce either the training or the testing error, likely because even with a feedforward neural network, the training-set errors of our spatiotemporal DE learner are close to the testing-set errors, so there is no overfitting issue. On the other hand, applying the ResNet technique leads to roughly a 20% decrease in errors. Results from using ResNets and dropout are shown in Appendix B.
Next, we investigate how noise in the observed data and changes in the adaptive parameter penalty coefficient \(\lambda \) in Eq. (21) impact the results. Noise is incorporated into simulated observational data as
where u(x, t) is the solution to the parabolic equation Eq. (18) given by Eq. (19) and \(\xi (x, t)\sim \mathcal {N}(0, \sigma ^2)\) is Gaussian noise that is both spatially and temporally uncorrelated (i.e., \(\langle \xi (x, t)\xi (y, s)\rangle =\sigma ^2\delta _{x, y}\delta _{s, t}\)). The noise term is assumed to be independent across samples. We use a neural network with 2 hidden layers, 200 neurons in each layer, to run 10 independent training processes using SGD with a learning rate \(\eta =0.0002\), each containing 5000 epochs. Results are shown in Fig. 2c and further tabulated in Appendix C. For \(\sigma =0\), choosing an intermediate \(\lambda \in (10^{-1.5}, 10^{-1}]\) leads to the smallest errors and an optimal balance between learning the coefficients \(c_{i}\) and learning the dynamics of \(\beta , x_0\). When \(\sigma \) is increased to nonzero values (\(\sim 10^{-4} - 10^{-3}\)), a larger \(\lambda \sim 10^{-0.75}-10^{-0.5}\) is needed to keep errors small (see Fig. 2c and Appendix C). If the noise is further increased to, say, \(\sigma =10^{-2}\) (not shown in Fig. 2), an even larger \(\lambda \sim 10^{-0.5}\) is needed for training to converge. This behavior arises because the independent noise \(\xi (x, t)\sim \mathcal {N}(0, \sigma ^2)\) contributes mostly to high-frequency components of the spectral expansion. For training to converge, fitting the shape of the basis functions by learning \(\beta , x_0\) is more important than fitting noisy high-frequency components via learning \(c_{i}\); a larger \(\lambda \) puts more weight on learning the dynamics of \(\beta , x_0\) and thus the basis function shapes.
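The effect of uncorrelated noise on the spectral representation can be checked directly: adding i.i.d. Gaussian noise to a smooth snapshot lifts the high-frequency tail of its transform far above that of the clean signal. A Fourier transform and the specific grid are used here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1e-3

x = np.linspace(-5.0, 5.0, 201)
u = np.exp(-x**2)                                   # stand-in clean snapshot
u_noisy = u + rng.normal(0.0, sigma, size=u.shape)  # spatially uncorrelated noise

# Spectral view: the noise floor dominates the high-frequency tail
c_clean = np.abs(np.fft.rfft(u))
c_noisy = np.abs(np.fft.rfft(u_noisy))
print(c_clean[-10:].max(), c_noisy[-10:].max())
```

The smooth snapshot's tail modes are essentially zero, while the noisy snapshot's tail sits at the noise floor \(\sim \sigma\sqrt{N}\), which is why a larger \(\lambda\) (favoring the basis-shape parameters over the high-order coefficients) stabilizes training on noisy data.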
We also investigate how intrinsic noise in the parameters \(\xi _1, \xi _2, \xi _3\) affects the solution (Eq. (19)) and the accuracy of the learned DE. As shown in Appendix D, we find that if the intrinsic noise in \(\xi _1, \xi _2, \xi _3\) is increased, the training errors of the learned DE models also increase. However, compared to models trained on data with less noise in \(\xi _1, \xi _2, \xi _3\), training on noisier data leads to lower errors when the testing data are also noisy. Additionally, we explore how the number of solutions in the training set affects how the learned DE model makes new predictions given the initial conditions of solutions in the testing set. The results, also listed in Appendix D, show that larger numbers of training samples (solutions associated with different \((\xi _1, \xi _2)\) in Eq. (19)) lead to smaller relative \(L^2\) errors of the predicted solutions on both the training and testing sets.
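Throughout, errors are measured as relative \(L^2\) errors, which on sampled values (or, by Parseval's identity, on the coefficients of an orthonormal spectral expansion) reduce to a one-line computation; a minimal sketch with illustrative values:

```python
import numpy as np

def relative_l2(u_pred, u_true):
    """Relative L2 error ||u_pred - u_true||_2 / ||u_true||_2, computed on
    sampled values or, equivalently for an orthonormal basis, on the
    spectral expansion coefficients."""
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)

u_true = np.array([1.0, 2.0, 2.0])  # illustrative values only
u_pred = np.array([1.0, 2.0, 2.3])
print(relative_l2(u_pred, u_true))  # 0.3 / 3 = 0.1
```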
Finally, we test whether the parameterized F (Eq. (10)) learned from the training set can extrapolate well beyond the training set sampling interval \(t\in [0, 0.9]\). To do this, we generate another 50 trajectories and sample each of them at random times \(t_i\sim \mathcal {U}(0, 1.5), i=1,...,9\). We then test models trained with \(\sigma =0\) and different values of \(\lambda \). As shown in Fig. 2d, our spatiotemporal DE learner accurately extrapolates the solution to times beyond the training set sampling intervals. We also observe that a stronger penalty on \(\beta \) and \(x_0\) (\(\lambda =10^{-0.5}\)) leads to better extrapolation results.
In the last example, we carry out a numerical experiment on learning the evolution of a Gaussian wave packet (which may depend on nonlocal interactions) across a two-dimensional unbounded domain \((x, k)\in \mathbb {R}^2\). We use this case to explore improving training efficiency by using a hyperbolic cross space to reduce the number of coefficients in multidimensional problems.
Example 3
We solve a 2D unbounded-domain problem of fitting a Gaussian wave packet’s evolution
where \(\xi _1\) is the center of the wave packet and a is the minimum positional spread. If \(\xi _2=0\), the Gaussian wave packet defined in Eq. (24) solves the stationary zero-potential Wigner equation, an equation often used in quantum mechanics to describe the evolution of the Wigner quasi-distribution function [31, 32]. We set \(a=1\) and \(b=\frac{1}{2}\) in Eq. (24) and independently sample \(\xi _1, \xi _2\sim \mathcal {U}(-\frac{1}{2}, \frac{1}{2})\) to generate data. Here, the DE satisfied by the highly nonlinear Eq. (24) is treated as unknown and potentially involves nonlocal convolution terms. In fact, there could be infinitely many DEs, including complicated nonlocal ones, that describe the dynamics of Eq. (24). An example of such a nonlocal DE is
We wish to learn the underlying dynamics using a parameterized F in Eq. (10). Since the Gaussian wave packet Eq. (24) is defined in the unbounded domain \(\mathbb {R}^2\), learning its evolution requires information over the entire domain. Thus, methods that depend on discretization of space are not applicable.
Our numerical experiment uses Eq. (24) as both training and testing data. We take \(\Delta {t}=0.1, t_j=j\Delta {t}, j=0,...,10\) and generate 100 solutions for training. For testing, we generate another 50 solutions, each with starting time \(t_0=0\) but with \(t_j\) drawn from \(\mathcal {U}(0, 1), j=1,\dots , 10\). The parameters \(\xi _1, \xi _2\) in the solutions Eq. (24) are independently sampled for both the training set and the testing set. For this example, training with a ResNet results in diverging gradients, whereas a feedforward neural network yields convergent results. We therefore use a feedforward neural network with two hidden layers, 200 neurons in each hidden layer, and the ELU activation function. We train across 1000 epochs using SGD with momentum (SGDM), a learning rate \(\eta =0.001\), \(\text {momentum}=0.9\), and \(\text {weight decay}=0.005\). We use a spectral expansion in the form of a two-dimensional tensorial product of Hermite basis functions \(\hat{\mathcal {H}}_i\hat{\mathcal {H}}_{\ell }\)
to approximate Eq. (24). We record the coefficients \(c_{i, \ell }\) as well as the scaling factors and displacements \(\beta ^{1}, \beta ^{2}, x_0^1, k_0^2\) at different \(t_j\) as the training data.
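For reference, evaluating a tensor-product Hermite expansion with scaling factors and displacements can be sketched as follows (a minimal numpy illustration, not the paper's code; the uniform evaluation grids and function names are our own, and the arguments play the roles of \(\beta ^1, \beta ^2, x_0^1, k_0^2\)):

```python
import numpy as np

def hermite_functions(n_max, y):
    """Normalized Hermite functions h_0..h_{n_max} evaluated on a 1D grid y."""
    H = np.zeros((n_max + 1, y.size))
    H[0] = np.pi ** (-0.25) * np.exp(-y ** 2 / 2)
    if n_max >= 1:
        H[1] = np.sqrt(2.0) * y * H[0]
    for n in range(1, n_max):
        H[n + 1] = np.sqrt(2.0 / (n + 1)) * y * H[n] - np.sqrt(n / (n + 1)) * H[n - 1]
    return H

def eval_expansion(c, x, k, beta1, beta2, x0, k0):
    """u_N(x, k) = sum_{i,l} c[i, l] h_i(beta1*(x - x0)) h_l(beta2*(k - k0)),
    evaluated on the tensor grid x × k."""
    Hx = hermite_functions(c.shape[0] - 1, beta1 * (x - x0))
    Hk = hermite_functions(c.shape[1] - 1, beta2 * (k - k0))
    return np.einsum('il,ia,lb->ab', c, Hx, Hk)

x = np.linspace(-5, 5, 101)
k = np.linspace(-5, 5, 101)
c = np.zeros((6, 6)); c[0, 0] = 1.0   # keep only the (0, 0) mode
u = eval_expansion(c, x, k, 1.0, 1.0, 0.0, 0.0)
# With c = e_{00}, u(x, k) = h_0(x) h_0(k) = pi^{-1/2} exp(-(x^2 + k^2)/2).
```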
Because \((x, k)\in \mathbb {R}^2\) are defined in a 2-dimensional space, instead of a tensor product, we can use a hyperbolic cross space for the spectral expansion to effectively reduce the total number of basis functions while preserving accuracy [22]. Similar to the use of sparse grids in the finite element method [33, 34], choosing basis functions in the space
can reduce the effective dimensionality of the problem. We explore hyperbolic spaces \(V_{N, \gamma }^{\varvec{\beta }, \varvec{x}_0}\) with different N and \(\gamma \) and use the loss function Eq. (21) with \(\lambda =\frac{1}{50}\) for training. The results are listed in Appendix E. To show how the loss function Eq. (21) depends on the coefficients \(c_{i, \ell }\) in Eq. (26), we plot saliency maps [35] of the quantity \(\frac{1}{10}\sum _{j=1}^{10} \Big |\frac{\partial \textrm{Loss}_j}{\partial c_{i, \ell }(0)}\Big |\)Footnote 1, the absolute value of the partial derivative of the loss function Eq. (21) w.r.t. \(c_{i, \ell }\) averaged over 10 training processes.
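The index-selection rule behind a hyperbolic cross space can be sketched as follows (a hypothetical implementation using a common criterion from sparse spectral approximations, cf. [22]; the exact form of the paper's Eq. (27) may differ in normalization, but the limiting cases \(\gamma =-\infty \) (full tensor product) and \(\gamma =0\) (classic hyperbolic cross) behave the same way):

```python
import numpy as np

def hyperbolic_cross(N, gamma):
    """Return the 2D index pairs (i, l) retained by a hyperbolic cross with
    truncation N and parameter gamma. One common criterion is
        max(1, i) * max(1, l) * max(i, l, 1)^(-gamma) <= N^(1 - gamma),
    with gamma = -inf recovering the full tensor product."""
    idx = []
    for i in range(N + 1):
        for l in range(N + 1):
            if gamma == -np.inf:
                idx.append((i, l))
                continue
            mix = max(1, i) * max(1, l)   # mixed-order penalty
            imax = max(i, l, 1)           # largest single-direction order
            if mix * imax ** (-gamma) <= N ** (1 - gamma):
                idx.append((i, l))
    return idx

N = 9
full = hyperbolic_cross(N, -np.inf)    # all (N+1)^2 tensor-product indices
hc0 = hyperbolic_cross(N, 0.0)         # classic hyperbolic cross: mix <= N
hc_half = hyperbolic_cross(N, 0.5)     # more aggressive pruning
print(len(full), len(hc0), len(hc_half))
```

Larger \(\gamma \) prunes more of the mixed high-order modes, which is the trade-off between efficiency and accuracy explored in this example.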
As shown in Fig. 3a, b, using \(\gamma =-1, 0\) leads to errors similar to those of the full tensor product (\(\gamma =-\infty \)) but can greatly reduce the number of coefficients and improve training efficiency. Taking too large a value, \(\gamma =1/2\), leads to larger errors because useful coefficients are left out. From Fig. 3c–f, the dependence of the loss function on the coefficients \(c_{i, \ell }\) exhibits a resolution invariance across hyperbolic spaces with different \(\gamma \). We find that an intermediate \(\gamma \in (-\infty , 1)\) (e.g., \(\gamma =-1, 0\)) can be used to maintain accuracy while reducing the number of inputs/outputs when reconstructing the dynamics of Eq. (24). Overall, the “curse of dimensionality” can be mitigated by adopting a hyperbolic cross space for the spectral representation.
Finally, in Appendix F, we consider source reconstruction in a heat equation. Our proposed spatiotemporal DE learning method achieves an average relative \(L^2\) error of \(\approx 0.1\) in the reconstructed source term. On the other hand, if all terms on the RHS of Eq. (1) except an unknown source (which does not depend on the solution) are known, the recently developed s-PINN method [18] achieves higher accuracy. However, if terms on the RHS of Eq. (1) other than the source term are also unknown, s-PINNs cannot be used, while our proposed spatiotemporal DE learning method remains applicable.
4 Conclusions
In this paper, we propose a spatiotemporal DE learning method well suited to learning spatiotemporal DEs from spectral-expansion data of the underlying solution. Its main advantage is its applicability to both spatiotemporal PDEs and integro-differential equations in unbounded domains, while matching the performance of the most recent high-accuracy PDE learning methods, which apply only to bounded domains. Moreover, our proposed method can potentially address higher-dimensional problems if a proper hyperbolic cross space can be justified to effectively reduce the dimensionality.
In future investigations, we plan to apply our spatiotemporal DE learning method to many other inverse-type problems in physics with other appropriate basis functions in unbounded domains, such as the mapped Jacobi functions that characterize algebraic decay at infinity [36,37,38], radial basis functions [39,40,41], or Laguerre functions on the semi-unbounded half line \(\mathbb {R}^+\) [42,43,44]. A potentially interesting application is to learn the evolution of probability densities associated with anomalous diffusion [45] in an unbounded domain, which is often described by fractional derivatives or convolutional terms in the corresponding F[u; x, t] term. Finally, higher-dimensional problems remain challenging since the number of inputs (expansion coefficients) grows exponentially with spatial dimension, and the computational cost may not be sufficiently mitigated by the optimal hyperbolic cross space indices \(N, \gamma \) (see Eq. (27)). Two possible ways to address this issue are promising. First, prior knowledge of the observed data can be used to reduce the dimension of the unknown dynamics to be learned; e.g., if we can determine an optimal hyperbolic cross space for the spectral expansion from data, we can effectively reduce the number of basis functions needed. Second, deep neural networks [46, 47], which can effectively handle a large number of inputs, could be adopted when the number of spectral expansion coefficients becomes large. Exploring these directions can further extend the applicability of our proposed spatiotemporal DE learning method to higher-dimensional problems.
Data availability
No data were used in this manuscript. All reported results in the manuscript were generated numerically from the methods presented. Codes that were used to generate numerical results can be made available on request.
Notes
We take derivatives w.r.t. only the coefficients \(\{c_{i, \ell }(0)\}\) of the predicted \(u_{N, \varvec{x}_0(0), m}^{\varvec{\beta }_m(0)}(x, 0;\Theta )\) in Eq. (21) and not w.r.t. the expansion coefficients of the observational data u(x, 0).
References
Bar, L., Sochen, N.: Unsupervised deep learning algorithm for PDE-based forward and inverse problems. arXiv preprint arXiv:1904.05417 (2019)
Stephany, R., Earls, C.: PDE-LEARN: using deep learning to discover partial differential equations from noisy, limited data. Neural Netw. 106242 (2024)
Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-net: learning PDEs from data. In: International Conference on Machine Learning, pp. 3208–3216 (2018). PMLR
Churchill, V., Chen, Y., Xu, Z., Xiu, D.: DNN modeling of partial differential equations with incomplete data. J. Comput. Phys. 493, 112502 (2023)
Long, Z., Lu, Y., Dong, B.: PDE-net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019)
Brunton, S.L., Proctor, J.L., Kutz, J.N.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113(15), 3932–3937 (2016)
Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Data-driven discovery of partial differential equations. Sci. Adv. 3(4), 1602614 (2017)
Xu, H., Chang, H., Zhang, D.: DL-PDE: deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. Commun. Comput. Phys. 29(3), 698–728 (2021)
Anandkumar, A., Azizzadenesheli, K., Bhattacharya, K., Kovachki, N., Li, Z., Liu, B., Stuart, A.: Neural operator: graph kernel network for partial differential equations. In: ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (2020)
Brandstetter, J., Worrall, D.E., Welling, M.: Message passing neural PDE solvers. In: International Conference on Learning Representations (2021)
Li, Z., Kovachki, N.B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier neural operator for parametric partial differential equations. In: International Conference on Learning Representations (2020)
Xia, M., Shao, S., Chou, T.: Efficient scaling and moving techniques for spectral methods in unbounded domains. SIAM J. Sci. Comput. 43(5), 3244–3268 (2021)
De Pablo, A., Quirós, F., Rodríguez, A., Vázquez, J.L.: A general fractional porous medium equation. Commun. Pure Appl. Math. 65(9), 1242–1284 (2012)
Grindrod, P.M.: Patterns and Waves: The Theory and Applications of Reaction-Diffusion Equations. Oxford (1991)
Antoine, X., Arnold, A., Besse, C., Ehrhardt, M., Schädle, A.: A review of transparent and artificial boundary conditions techniques for linear and nonlinear Schrödinger equations. Commun. Comput. Phys. 4(4), 729–796 (2008)
Zhang, W., Yang, J., Zhang, J., Du, Q.: Artificial boundary conditions for nonlocal heat equations on unbounded domain. Commun. Comput. Phys. 21(1), 16–39 (2017)
Fanaskov, V., Oseledets, I.: Spectral neural operators. arXiv preprint arXiv:2205.10573 (2022)
Xia, M., Böttcher, L., Chou, T.: Spectrally adapted physics-informed neural networks for solving unbounded domain problems. Mach. Learn. Sci. Technol. 4(2), 025024 (2023)
Burns, K.J., Vasil, G.M., Oishi, J.S., Lecoanet, D., Brown, B.P.: Dedalus: a flexible framework for numerical simulations with spectral methods. Phys. Rev. Res. 2(2), 023068 (2020)
Shen, J., Tang, T., Wang, L.-L.: Spectral Methods: Algorithms, Analysis and Applications. Springer, New York (2011)
Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.: Neural ordinary differential equations. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 6572–6583 (2018)
Shen, J., Wang, L.-L.: Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal. 48(3), 1087–1109 (2010)
Tang, T.: The Hermite spectral method for Gaussian-type functions. SIAM J. Sci. Comput. 14(3), 594–606 (1993)
Xia, M., Shao, S., Chou, T.: A frequency-dependent p-adaptive technique for spectral methods. J. Comput. Phys. 446, 110627 (2021)
Chou, T., Shao, S., Xia, M.: Adaptive Hermite spectral methods in unbounded domains. Appl. Numer. Math. 183, 201–220 (2023)
Shen, J., Yu, H.: Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32(6), 3228–3250 (2010)
Bateman, H.: Some recent researches on the motion of fluids. Mon. Weather Rev. 43(4), 163–170 (1915)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Chen, Z., Xiong, Y., Shao, S.: Numerical methods for the Wigner equation with unbounded potential. J. Sci. Comput. 79(1), 345–368 (2019)
Shao, S., Lu, T., Cai, W.: Adaptive conservative cell average spectral element methods for transient Wigner equation in quantum transport. Commun. Comput. Phys. 9(3), 711–739 (2011)
Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 147–269 (2004)
Zenger, C., Hackbusch, W.: Sparse grids. In: Proceedings of the Research Workshop of the Israel Science Foundation on Multiscale Phenomenon, Modelling and Computation, p. 86 (1991)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: Proceedings of the International Conference on Learning Representation (ICLR), pp. 1–8 (2014)
Wang, L.-L., Shen, J.: Error analysis for mapped jacobi spectral methods. J. Sci. Comput. 24, 183–218 (2005)
Shen, J., Wang, L.-L.: Some recent advances on spectral methods for unbounded domains. Commun. Comput. Phys. 5(2–4), 195–241 (2009)
Zhao, T., Zhao, Z., Li, C., Li, D.: Spectral approximation of \(\psi \)-fractional differential equation based on mapped Jacobi functions. arXiv preprint arXiv:2312.16426 (2023)
Wang, Z., Chen, M., Chen, J.: Solving multiscale elliptic problems by sparse radial basis function neural networks. J. Comput. Phys. 492, 112452 (2023)
Buhmann, M.D.: Radial basis functions. Acta Numer. 9, 1–38 (2000)
Chen, C.-S., Noorizadegan, A., Young, D.L., Chen, C.S.: On the selection of a better radial basis function and its shape parameter in interpolation problems. Appl. Math. Comput. 442, 127713 (2023)
Clement, P.R.: Laguerre functions in signal analysis and parameter identification. J. Franklin Inst. 313(2), 85–95 (1982)
Vismara, F., Benacchio, T., Bonaventura, L.: A seamless, extended DG approach for advection-diffusion problems on unbounded domains. J. Sci. Comput. 90, 1–27 (2022)
Xiong, Y., Guo, X.: A short-memory operator splitting scheme for constant-Q viscoelastic wave equation. J. Comput. Phys. 449, 110796 (2022)
Jin, B., Rundell, W.: A tutorial on inverse problems for anomalous diffusion processes. Inverse Prob. 31(3), 035003 (2015)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: Deepxde: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Chen, Z., Churchill, V., Wu, K., Xiu, D.: Deep neural network modeling of unknown partial differential equations in nodal space. J. Comput. Phys. 449, 110782 (2022)
Funding
This work was supported in part by the Army Research Office under Grant W911NF-18-1-0345.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Using Fourier neural operator to solve unbounded domain DEs
We shall show through a simple example that it is usually difficult to apply bounded-domain DE-solving methods to reconstruct unbounded-domain DEs, even if we truncate the unbounded domain to a bounded one, because appropriate boundary conditions must be provided. Here, we show how the FNO method fails to generalize on the testing set when solutions to a PDE with an incorrect boundary condition are used as training data. We wish to use the Fourier neural operator method to solve the unbounded-domain DE
If one imposes the initial condition \(u(x, 0) = 10\xi e^{-100x^2}\), then
is the analytic solution to Eq. (28). For this problem, we will assume \(\xi \sim \mathcal {U}(1, \frac{3}{2})\).
Since the Fourier neural operator (FNO) method relies on spatial discretization and grids, it cannot be directly applied to unbounded-domain problems, so we truncate the unbounded domain. Suppose one is interested in the solution’s behavior for \(x\in [-1, 1]\). One approach is to truncate the unbounded domain \(x\in \mathbb {R}\) to \([-1, 1]\) and use the FNO method to reconstruct the solution \(u(x, t),\, x\in [-1, 1], t\in [0, 1]\) given u(x, 0). However, we show how improper boundary conditions on the truncated domain can lead to large errors.
For example, we assume the training set satisfies the boundary condition \(u(x=\pm 1, t) = 0\), which is not the correct boundary condition since it is inconsistent with the ground truth solution. Therefore, we would not be solving the model in Eq. (28). We generate the testing dataset using the correct initial condition \(u(x, 0) = 10\xi \exp (-100x^2)\), without boundary conditions. The results are given in Table 3.
From Table 3, the testing error is significantly larger than the training error because a different DE (not Eq. (28)), rather than the heat equation we expect, is reconstructed from the training data. Therefore, even if methods such as FNO are efficient for bounded-domain DE reconstruction problems, directly using them for unbounded-domain problems is not feasible if appropriate boundary conditions cannot be constructed.
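The inconsistency of the zero boundary condition can be checked directly. As a hedged illustration, assume Eq. (28) is the free-space heat equation \(u_t = u_{xx}\) (the diffusivity in the paper's Eq. (28) may be scaled differently, but the qualitative conclusion is unchanged); the Gaussian initial condition then spreads as \(u(x, t) = 10\xi (1+400t)^{-1/2} e^{-100x^2/(1+400t)}\):

```python
import numpy as np

# Free-space solution of u_t = u_xx with u(x, 0) = 10*xi*exp(-100 x^2),
# a hypothetical stand-in for the (unstated) analytic solution of Eq. (28).
def u(x, t, xi):
    return 10.0 * xi / np.sqrt(1.0 + 400.0 * t) * np.exp(-100.0 * x ** 2 / (1.0 + 400.0 * t))

xi = 1.25  # midpoint of U(1, 3/2)
print(u(1.0, 0.0, xi))  # essentially zero: u(±1, 0) = 0 is harmless at t = 0
print(u(1.0, 1.0, xi))  # ≈ 0.49: the true solution is far from zero at x = ±1
```

By \(t=1\), the free-space solution at the truncation boundary \(x=\pm 1\) is of the same order as its peak, so training on data satisfying \(u(\pm 1, t)=0\) amounts to solving a different PDE problem.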
Dependence on neural network architecture
The neural network structure of the parameterized \(F[u;x,t,\Theta ]\) may impact the learned dynamics. To investigate how the neural network structure influences results, we use neural networks with various configurations to learn the dynamics of Eq. (19) in Example 2 in the noise-free limit. We set the learning rate \(\eta =0.0002\) and apply networks with 2, 3, 5, or 8 intermediate layers and 50, 80, 120, or 200 neurons in each layer.
From Tables 4 and 5, we see that a shallower and wider neural network yields the smallest error. Runtimes increase with the number of layers and the number of neurons in each layer; however, when the number of layers is small, the increase in runtime with the number of neurons in each layer is not significant. Thus, for the best accuracy and computational efficiency, we recommend a neural network with 2 hidden layers and 200 neurons in each layer.
Regularization of the neural network can also affect the spectral neural PDE learner’s ability to learn the dynamics or to reduce overfitting on the training set. We set \(\lambda =0.1\) in the loss function Eq. (21) and the learning rate \(\eta =0.0002\), and train over 5000 epochs using SGD. We apply the ResNet and dropout techniques, with different dropout probabilities \(P_\textrm{d}\), to a neural network containing 2 intermediate layers, each with 200 neurons. For the ResNet technique, we add the output of the first hidden layer to the output of the second hidden layer to form the new output of the second hidden layer. For the dropout technique, each neuron in the second hidden layer is set to 0 with probability \(P_\textrm{d}\). Table 6 shows the relative \(L^2\) errors on the training and testing sets for Example 2 when there is no noise in either the training or testing data (\(\sigma =0\)), averaged over 10 independent training processes. Applying the ResNet technique leads to approximately a 20% decrease in the errors, whereas applying the dropout technique reduces neither the training error nor the testing error.
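The skip connection described above (adding the first hidden layer's output to the second's) can be sketched in a few lines of numpy — an illustrative forward pass only, with hypothetical layer widths; the paper's actual implementation may differ:

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU activation; np.minimum guards the exponential against overflow."""
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1.0))

def forward(x, W1, b1, W2, b2, W3, b3):
    """Two-hidden-layer MLP in which the first hidden layer's output is added
    to the second hidden layer's output (the ResNet-style skip described
    above). The two hidden widths must match for the addition."""
    h1 = elu(W1 @ x + b1)
    h2 = elu(W2 @ h1 + b2) + h1   # skip connection
    return W3 @ h2 + b3

rng = np.random.default_rng(1)
W1 = rng.normal(size=(200, 10)); b1 = np.zeros(200)
W2 = np.zeros((200, 200));       b2 = np.zeros(200)   # zeroed block for the check
W3 = rng.normal(size=(3, 200));  b3 = np.zeros(3)
x = rng.normal(size=10)
# Sanity check: with W2 = 0, elu(0) = 0, so the residual block is an identity map.
assert np.allclose(forward(x, W1, b1, W2, b2, W3, b3), W3 @ elu(W1 @ x))
```

The zeroed second layer illustrates why such blocks ease optimization: the block can fall back to an identity map, so adding it cannot make the representable function class worse.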
How data noise and penalty parameter \(\lambda \) affect learning
We now investigate how different noise strengths \(\sigma \) and penalties \(\lambda \) affect learning, including the learned dynamics of \(\beta \) and \(x_0\). For each noise strength \(\sigma =0, 0.0001, 0.001\), 100 trajectories are generated for training and another 50 for testing according to Eq. (23). The penalty parameters tested are \(\lambda = 10^{-2}, 10^{-1.5}, 10^{-1}, 10^{-0.5}\). The mean relative errors on the training and testing sets over 10 independent training processes are shown in Table 7 below and plotted in Fig. 2c.
How parameter noise and number of training solutions affect reconstruction
Here, we take different distributions of the three parameters \(\xi _1, \xi _2, \xi _3\) in Eq. (19). We shall use \(\xi _1\sim \mathcal {N}(3, \frac{\sigma _\textrm{p}^2}{4}),\,\, \xi _2\sim \frac{1}{4}\mathcal {U}(-\frac{\sigma _\textrm{p}}{4}, \frac{\sigma _\textrm{p}}{4}),\,\, \xi _3\sim \mathcal {N}(0, \frac{\sigma _\textrm{p}^2}{2})\) and vary \(\sigma _\textrm{p}=0,\frac{1}{4}, \frac{1}{2}, 1\). For different \(\sigma _\textrm{p}\), we train 10 independent models and the results are given in Table 8.
The training and testing errors are the same for models with \(\sigma _\textrm{p}=0\) in Table 8 because, if there is no uncertainty in the initial condition, all trajectories are identical. Although they yield larger training errors, models trained on sets with larger variances in the initial-condition parameters generalize better on testing sets in which those variances are also larger.
Next, we change the number of solutions in the training set for training the spatiotemporal DE model and test the performance of the learned spatiotemporal DE models on the testing set. We take the first 12, 25, 50, and 100 solutions of the 100-solution training set generated by Eq. (19) with independently sampled \(\xi _1\sim \mathcal {N}(3,\frac{1}{4}),\,\, \xi _2\sim \mathcal {U}(0, \frac{1}{2}),\,\, \xi _3\sim \mathcal {N}(0, \frac{1}{2})\). Solutions in the testing set (a total of 50 solutions) are also generated from Eq. (19) with independently sampled \(\xi _1\sim \mathcal {N}(3,\frac{1}{4}),\,\, \xi _2\sim \mathcal {U}(0, \frac{1}{2}),\,\, \xi _3\sim \mathcal {N}(0, \frac{1}{2})\). The time points for both the training set and the testing set are taken to be \(t_j=0.1j, j=0,...,9\). A neural network with two hidden layers, 200 neurons in each layer, and the ELU activation function is used for training. The results shown in Table 9 are averaged over 10 independent training processes using SGD, each containing 2000 epochs and using a learning rate \(\eta =0.0002\). Equation (21) is minimized for training with penalty parameter \(\lambda =0.1\).
Varying hyperbolic cross-space parameters N and \(\gamma \)
If the spectral expansion order N is sufficiently large, using a hyperbolic cross space Eq. (27) can effectively reduce the required number of basis functions while maintaining accuracy. In our experiments, we set \(N=5, 9, 14\) and \(\gamma =-\infty \) (full tensor product), \(-1, 0, \frac{1}{2}\). We train the network for 1000 epochs using SGDM with a learning rate \(\eta =0.001\), \(\text {momentum}=0.9\), and \(\text {weight decay}=0.005\). The penalty coefficient in Eq. (21) is \(\lambda =0.02\).
Fig. 4 Saliency maps of the absolute values of the derivatives of the relative \(L^2\) loss w.r.t. \(\{c_{i, \ell }\}\). Here, \(N=5, 9, 14\) and \(\gamma =-\infty , -1, 0, \frac{1}{2}\) in Eq. (27). The loss function is always most sensitive to the coefficients \(c_{i, \ell }\) in the lower left of the saliency maps. Such a “resolution invariance” indicates that taking a larger N but a moderate \(\gamma > -\infty \), so as to include the frequencies in the lower-left part of the saliency map, balances efficiency and accuracy
From Table 10, we see that a hyperbolic space with \(N=14, \gamma =-1\) leads to minimal errors on the testing set. Furthermore, the number of basis functions for the hyperbolic space with \(N=14, \gamma =-1\) is smaller than that of the full tensor product space (\(\gamma =-\infty \)) with \(N=9\) or 14, so the hyperbolic space with \(N=14, \gamma =-1\) could be close to the most appropriate choice. We also use saliency maps to investigate the roles of different frequencies, plotting \(|\frac{\partial \text {Loss}}{\partial c_{i,\ell }(0)}|\) for different N and \(\gamma \) in Fig. 4, where the loss function is Eq. (21). Even for different choices of \(N, \gamma \), changes in the frequencies in the lower-left part of the saliency map have the largest impact on the loss. This “resolution invariance” justifies choosing a hyperbolic space with a larger N but a moderate \(\gamma >-\infty \) so that the total number of inputs and outputs is reduced, boosting efficiency in higher-dimensional problems while maintaining accuracy.
The errors in Table 10 can be larger on the training set than on the testing set, especially when the training-set errors are large (e.g., for \(N=5\)). This arises because the largest sampling time among the training samples is \(t=1\), while all testing-sample times are less than 1. If the trained dynamics \(F(\tilde{U};t,\Theta )\) does not approximate the true dynamics \(F(\tilde{U};t)\) in Eq. (10) well, errors accumulate over time, so training samples at \(t=1\) incur larger errors than the testing samples.
Comparison with the s-PINN method
Fig. 5 Comparison of the relative \(L^{2}\) errors in the potential in Eq. (30) learned by our proposed spatiotemporal DE learner and by the s-PINN method. Our spectral neural DE learner achieves an average relative \(L^{2}\) error of 0.1, while the s-PINN method, which takes as input the exact form of the RHS of Eq. (30) with only the potential unknown, achieves better accuracy, with an average relative \(L^{2}\) error of about 0.01
To make a comparison with the s-PINN method proposed in [18], we consider the following inverse-type problem of reconstructing the unknown potential f(x, t) in
by approximating \(f(x, t)\approx \hat{f}:=F[u;x, t, \Theta ] - u_{xx}\). The function u is taken to be
where \(\xi \sim \mathcal {U}(\frac{1}{2}, 1)\) is i.i.d. sampled for different trajectories. Therefore, the true potential in Eq. (30) is
which is independent of u(x, t). We generate 100 solutions \(u_{m}(x, t_i), \, m=1,...,100\) as the training set to learn the unknown potential with \(t_i=i\Delta {t}, \Delta {t}=0.1, i=0,...,10\). In the s-PINN method, since only t is inputted, a single reconstructed \(\hat{f}\) (the same for all trajectories) is outputted in the form of a spectral expansion. In our spatiotemporal DE learning method, however, \(f(x, t)\approx \hat{f} = F[u;x, t, \Theta ] - u_{xx}\) differs for different inputted u, giving rise to an error that changes along the time horizon. The mean and variance of the relative \(L^2\) error \(\frac{\Vert \hat{f}-f\Vert _2}{\Vert f\Vert _2}\) are plotted in Fig. 5.
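A minimal sketch of the reconstruction idea: assuming Eq. (30) has the form \(u_t = u_{xx} + f(x, t)\), the potential satisfies \(f = u_t - u_{xx}\), and \(\hat{f}\) can be estimated from sampled solution values by finite differences (here the learned \(F[u;x,t,\Theta ]\) is replaced by the exact time derivative for illustration, and we use a hypothetical test function with a known f, not the paper's u):

```python
import numpy as np

# Illustrative u with a known potential:
#   u(x, t) = exp(-t) exp(-x^2/2)  =>  f = u_t - u_xx = -x^2 exp(-t) exp(-x^2/2).
x = np.linspace(-5, 5, 1001)
dx = x[1] - x[0]
t, dt = 0.4, 1e-4

u = lambda tt: np.exp(-tt) * np.exp(-x ** 2 / 2)
u_t = (u(t + dt) - u(t - dt)) / (2 * dt)                         # central difference in t
u_xx = np.empty_like(x)
u_xx[1:-1] = (u(t)[2:] - 2 * u(t)[1:-1] + u(t)[:-2]) / dx ** 2   # central difference in x
u_xx[[0, -1]] = u_xx[[1, -2]]                                    # crude edge handling

f_hat = u_t - u_xx
f_true = -x ** 2 * np.exp(-t) * np.exp(-x ** 2 / 2)
rel_err = np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true)
print(rel_err)  # small; limited only by finite-difference accuracy
```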
When all terms on the RHS of Eq. (1) except the potential are known, s-PINN is preferable because this additional information enters the loss function in [18]. Nevertheless, our spatiotemporal DE learner still achieves a relative \(L^2\) error \(\sim 0.1\), indicating that, without any prior information, it can reconstruct the unknown source term with acceptable accuracy. However, the accuracy of our spatiotemporal DE learner in reconstructing the potential f deteriorates as the time horizon \(t=j\Delta t\) increases: since errors accumulate, minimizing Eq. (21) requires a more accurate reconstruction of the dynamics (the RHS of Eq. (1)) at earlier times than at later times.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Xia, M., Li, X., Shen, Q. et al. Learning unbounded-domain spatiotemporal differential equations using adaptive spectral methods. J. Appl. Math. Comput. (2024). https://doi.org/10.1007/s12190-024-02131-2
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12190-024-02131-2