Physics Informed Neural Networks for an Inverse Problem in Peridynamic Models

Deep learning is a powerful tool for solving data-driven differential problems and has proven successful in solving direct and inverse problems described by PDEs, even in the presence of integral terms. In this paper, we propose to apply radial basis functions (RBFs) as activation functions in suitably designed Physics Informed Neural Networks (PINNs) to solve the inverse problem of computing the peridynamic kernel in the nonlocal formulation of the classical wave equation, resulting in what we call an RBF-iPINN. We show that the selection of an RBF is necessary to achieve meaningful solutions that agree with the physical expectations carried by the data. We support our results with numerical examples and experiments, comparing the solutions obtained with the proposed RBF-iPINN to the exact ones.


Introduction to the peridynamic inverse problem
We consider the following PDE in peridynamic formulation:

(1.1) ∂ tt θ(x, t) = ∫_R C(x′ − x) [θ(x′, t) − θ(x, t)] dx′,

where C : R → R, representing the so-called kernel function, is a nonnegative even function.
In the one-dimensional case, the model describes the dynamic response of an infinite bar composed of a linear microelastic material.
The most important aspect of such a constitutive model is that it takes into account long-range interactions and their effects. The equation of motion (1.1) was proposed by Silling in 2000 in [37], in the framework of continuum mechanics, under the name of linear peridynamics. This is an integral-type nonlocal model involving only the displacement field and not its gradient, which leads to a theory able to incorporate cracks, fractures and other kinds of singularities.
The general initial-value problem associated with (1.1) is well-posed (see [16, 17]) and, due to the presence of long-range forces, the solution shows a dispersive behavior. The length-scale of the long-range interactions is parameterized by a positive scalar δ > 0, called horizon, which represents the maximum interaction distance between two material particles.
If the kernel function C, also known as micromodulus function, is a suitable generalized function, then in the limit of short-range forces, or equivalently as δ → 0, the linear peridynamic model (1.1) reduces to the wave equation ∂ tt θ(x, t) − ∂ xx θ(x, t) = 0 (see [43, 31]). As a consequence, the length-scale parameter δ can be viewed as a measure of the degree of nonlocality of the model.
In order to maintain consistency with Newton's third law, the micromodulus function must be even:

(1.2) C(−x) = C(x) for every x ∈ R.

Moreover, due to the dispersive effects, C must be such that

(1.3) ∫_R C(x) (1 − cos(kx)) dx > 0

for every wave number k ≠ 0. Additionally, since the interaction between two material particles should become negligible as the distance between them becomes very large, we can assume that

(1.4) lim_{|x|→∞} C(x) = 0.

If a material is characterized by a finite horizon, so that no interactions occur between particles whose relative distance is greater than δ, then we can assume that the support of the kernel function is [−δ, δ], and the model (1.1) writes as

(1.5) ∂ tt θ(x, t) = ∫_{x−δ}^{x+δ} C(x′ − x) [θ(x′, t) − θ(x, t)] dx′.

Of course, such a compact-support assumption is more restrictive than the decay condition (1.4).
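As a quick illustration, the conditions above can be checked numerically for a candidate micromodulus. The Gaussian profile below is only an example (the bell-shaped kernel considered later in Section 4 has this shape), and the dispersion integral ∫ C(x)(1 − cos(kx)) dx is evaluated by simple quadrature:

```python
import numpy as np

# Example micromodulus: a bell-shaped (Gaussian) profile, nonnegative and even.
def C(x):
    return np.exp(-x**2)

x = np.linspace(-10.0, 10.0, 4001)
h = x[1] - x[0]

# Evenness (1.2): C(-x) = C(x).
assert np.allclose(C(x), C(-x))

# Decay (1.4): the interaction becomes negligible at large distances.
assert C(np.array([10.0]))[0] < 1e-12

# Dispersion: int C(x) (1 - cos(k x)) dx > 0 for every wave number k != 0.
for k in (0.5, 1.0, 5.0):
    M_k = np.sum(C(x) * (1.0 - np.cos(k * x))) * h
    assert M_k > 0.0
```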
It is clear that different microelastic materials correspond to different kernel functions; as a consequence, the choice of the kernel function determines the constitutive model.
In this paper, we aim to solve the inverse problem associated with (1.1), namely learning the shape of the kernel function C, by implementing a Physics Informed Neural Network (PINN). More specifically, we show that this inverse problem requires a careful selection of the activation functions in all the layers and a correct interaction with the kernel initializers. A naive choice of these functions would in fact result in unreliable predictions and possibly unfeasible solutions. More precisely, if the neural network structure is not chosen according to appropriate geometric knowledge of the data, then the PINN returns different, still acceptable, results, showing a lack of uniqueness. Therefore, we will show that, as soon as the peridynamic operator is bounded on a compact support [−δ, δ] and the PINN architecture is built accordingly, as a consequence of the well-posedness conditions of the peridynamic formulation, the learned kernel fulfills all the expected requirements, provided that the PINN structure is accurate enough.

Introduction to PINNs
Physics-Informed Neural Networks (PINNs) have emerged as a transformative approach to tackle both direct and inverse problems associated with PDEs. These innovative neural network architectures seamlessly integrate the principles of physics into the machine learning framework. By doing so, PINNs offer a promising solution to efficiently and accurately model, simulate, and optimize complex systems governed by PDEs. More specifically, they can be employed to solve both direct and inverse problems; in the latter case, such PINNs are commonly referred to as inverse PINNs.
Direct problems involve finding solutions to PDEs that describe the evolution of physical systems under specified initial and boundary conditions. Traditional numerical methods, such as finite element analysis (see [46, 23]), finite difference methods with composite quadrature formulas (see [11, 32]), also applied to spectral fractional models (see [14]), model order reduction methods (see [34]), meshfree methods (see [36, 35]), adaptive refinement techniques (see [9, 2]), and collocation and Galerkin methods (see [1]), have been widely used for solving direct problems. Moreover, spectral methods with volume penalization techniques (see for instance [24, 27, 15, 22]) and Chebyshev spectral methods (see [28, 26, 25, 42]) have recently been developed in order to increase the order of convergence, to improve the accuracy of the results, and to maintain the consistency of the method even in the presence of singularities.
However, these approaches often require substantial computational resources and may struggle with high-dimensional or nonlinear problems. Additionally, such methods need to know the constitutive parameters of the model in order to predict fractures in the material under consideration and, in certain configurations, they fail to impose boundary conditions. A data-driven approach can provide some insight in this direction. In [38], the authors propose a geometry-aware physics-informed neural network to exactly impose boundary conditions over complex domains. In [47], the authors investigate both forward and inverse problems for high-dimensional nonlinear wave equations via deep neural networks, while in [41] a combination of an orthogonal decomposition with a neural network is applied to build a reduced-order model. In [29], the authors present an unsupervised convolutional neural network architecture with nonlocal interactions for solving PDEs, using the Peridynamic Differential Operator as a convolutional filter. Indeed, this approach proves to be very efficient when the model is governed by an integral operator (see for instance [45]).
Inverse problems, on the other hand, are concerned with determining unknown parameters, boundary conditions, or the PDE itself, given limited or noisy observations of the system behavior. These problems frequently arise in real-world applications, including medical imaging [10], geophysics [4, 21, 8], material characterization [44], and industrial process optimization [30]. Inverse problems are inherently ill-posed, as multiple solutions or no solutions may exist, making their resolution challenging. In fact, several issues can arise in solving inverse problems, especially related to irregular geometries [19], or to small data regimes, incomplete data, or incomplete models [33].
In the context of nonlocal elasticity theory, in [40] the authors propose a methodology based on a constrained least squares optimization to solve inverse problems in heterogeneous media using state-based peridynamics, in order to derive parameter values characterizing several material properties and to establish conditions for fracture patterns in geological settings.
Thus, Physics-Informed Neural Networks represent a paradigm shift in the way direct and inverse problems associated with PDEs are approached. Their ability to combine data-driven learning with physical principles opens up new frontiers in scientific discovery, engineering design, and problem-solving across a wide spectrum of domains.
2.1. PINN paradigm. In this paper, we consider a Feed-Forward fully connected Neural Network (FF-DNN), also called Multi-Layer Perceptron (MLP) (see [5, 13] and references therein). In a PINN, the solution space is approximated through a combination of activation functions, acting on all the hidden layers, with the independent variables used as the network inputs. Letting x ∈ R^n, in a Feed-Forward network each layer feeds the next one through nested transformations so that, letting L be the number of layers, the network can be written as

(2.1) z_0 = x,   z_l = σ_l(W_l z_{l−1} + b_l),   l = 1, . . ., L,

where, for each layer l = 1, . . ., L, σ_l is the activation function, which operates componentwise, W_l is the weight matrix and b_l is the bias vector. Thus, the output z_L ∈ R^m of a FF-NN can be expressed as a single function of the input vector x, defined as the composition of all the layers above:

z_L = σ_L(W_L σ_{L−1}(· · · σ_1(W_1 x + b_1) · · ·) + b_L).

The aim of a PINN is to minimize, through a Stochastic Gradient Descent method, a suitable objective function called the loss function, which takes into account the physics of the problem, with respect to all the components, called trainable parameters, of W_l and b_l, for l = 1, . . ., L.
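A minimal NumPy sketch of the nested transformation (2.1); the layer sizes and activations below are arbitrary placeholders, not the architecture of Section 3:

```python
import numpy as np

def forward(x, weights, biases, activations):
    # z_0 = x; z_l = sigma_l(W_l z_{l-1} + b_l), l = 1, ..., L
    z = x
    for W, b, sigma in zip(weights, biases, activations):
        z = sigma(W @ z + b)
    return z

rng = np.random.default_rng(0)
# A tiny 1 -> 20 -> 20 -> 1 network with tanh hidden activations.
Ws = [rng.standard_normal((20, 1)), rng.standard_normal((20, 20)),
      rng.standard_normal((1, 20))]
bs = [np.zeros(20), np.zeros(20), np.zeros(1)]
acts = [np.tanh, np.tanh, lambda z: z]

out = forward(np.array([0.5]), Ws, bs, acts)
assert out.shape == (1,)
```

In a trained PINN the entries of the W_l and b_l are exactly the trainable parameters the optimizer updates.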
More specifically, given a general PDE of the form P(f) = 0, where P represents the differential operator acting on f, the loss function used by a PINN is usually given by

(2.2) L(f) = ∥f − f∗∥ + ∥P(f) − 0∗∥,

where f∗ is the training dataset (of points inside the domain or on the boundary), and 0∗ is the expected (true) value of the differential operation P(f) at any given training or sampling point; the chosen norm ∥ · ∥ (it may be different for each term in the loss function) depends on the functional space and the specific problem. Selecting a correct norm for the loss function evaluation (so as to avoid overfitting) is an important problem in PINNs, and recently, in [39], the authors have proposed spectral techniques based on a Fourier residual method to overcome computational and accuracy issues. The first term on the right-hand side of (2.2) is referred to as the data-fitting loss, while the second term is referred to as the residual loss, which is what makes the neural network informed by the physics. The operator P is usually evaluated using autodiff (automatic differentiation). In the context of peridynamic theory, in [20] the authors propose, for the first time, a nonlocal alternative to autodiff, replacing the evaluation of f and its partial derivatives by the action of a Peridynamic Differential Operator (PDDO) on f.
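To fix ideas, the two-term loss (2.2) can be sketched for the local wave equation ∂ tt θ − ∂ xx θ = 0. The derivatives are written analytically here purely for the sake of a self-contained example; a real PINN obtains them via autodiff (e.g. tf.GradientTape in TensorFlow):

```python
import numpy as np

# Trial function theta(x, t) = sin(x - t) solves the wave equation exactly,
# so the residual loss below should vanish (up to round-off).
theta    = lambda x, t: np.sin(x - t)
theta_tt = lambda x, t: -np.sin(x - t)   # analytic stand-ins for autodiff
theta_xx = lambda x, t: -np.sin(x - t)

X, T = np.meshgrid(np.linspace(0.0, np.pi, 50), np.linspace(0.0, 1.0, 50))

# Residual loss ||P(theta) - 0*||: the physics-informed term.
loss_res = np.mean((theta_tt(X, T) - theta_xx(X, T))**2)

# Data-fitting loss ||theta - theta*|| against (here synthetic) observations.
theta_star = np.sin(X - T)
loss_data = np.mean((theta(X, T) - theta_star)**2)

total = loss_data + loss_res
```

Since the trial function is an exact solution and matches the data, both terms are zero; for a network's candidate solution they are generally positive and drive the gradient descent.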
A recent review on PINNs and related theory can be found in [12].

RBF-iPINN for the kernel function
If one wants to solve an inverse problem, then there will be more trainable parameters than those coming from the weight matrices and bias vectors. Hence, such further parameters have to be considered in the minimization iterations and their respective gradients must be computed as well. However, in our case, the inverse problem does not involve the mere computation of scalar quantities, but rather a whole function, specifically the kernel function C in (1.1), which also has analytical and geometrical properties to be accounted for, such as nonnegativity and symmetry. In the proposed PINN architecture, these features are reflected in the implementation of a network model with two separate sets of layers, one for C and the other for θ, whose outputs are both the solution to (1.1) and the unknown function C; moreover, the loss function (2.2) has accordingly been endowed with further terms that enforce the requirements on C.
From the point of view of the architecture, while nonnegativity of C has been simply enforced by requiring all trainable parameters in (2.1) to be nonnegative, symmetry has required a more specific treatment, both in terms of activation functions and in the way we have defined the loss function. Our idea has been to carefully select different activation functions for the two different sets of layers, inspired by the properties of C and by the data on θ. In the following sections, we introduce and discuss the technical approaches used to deal with the activation and loss functions.
3.1. Radial Basis Function Layer. As activation function for the first layer, whose input is x, a Radial Basis Function (RBF) is selected. By definition, an RBF is a real-valued function whose output depends only on the distance from a fixed center, or prototype point. An RBF can thus be defined as

(3.1) ϕ_c(x) = ϕ(∥x − c∥),

where ϕ is the radial profile, x is the input to the RBF, c is the center or prototype point, and ∥x − c∥ represents the distance between x and c.
RBFs are commonly used in various fields and, when used as activation functions in neural networks, they give rise to Radial Basis Function Neural Networks (RBFNNs) (see [18]).
We have experimented with two families of RBFs of the form (3.1), given by

(3.2a) ϕ(x) = ρ / (1 + (γ (x − µ))²),
(3.2b) ϕ(x) = ρ √(1 + (γ (x − µ))²),

called inverse quadratic and multiquadric RBFs respectively, where all the parameters above can be taken to be trainable. This approach has recently been proposed in [3] in the context of direct problems for nonlinear PDEs, but with the RBF used in the middle layer. However, in our experiments on the inverse problem of learning the kernel function of a PDE in peridynamic formulation, we have extensively observed that such a choice is not efficient, resulting in nonphysical results and excessively large computational time and cost. This has led us to introduce an RBF inverse PINN, which we call RBF-iPINN, where the first layer is implemented with an RBF activation function; this has significantly sped up training while providing the expected result when compared to the exact solution. Since the kernel C depends only on x, the inputs x and t need to be handled separately. To this purpose, the proposed RBF-iPINN is implemented in a serialized fashion. That is, its first layer is activated by an RBF as in (3.2) with the only input x, followed by 8 layers with 20 neurons each, activated by the ReLU function. The output of this sequence of layers is then concatenated with t to produce the input for the second portion of the neural network; more specifically, we again put 8 layers with 20 neurons each, having the sigmoid as their activation function, before providing the overall output of the RBF-iPINN, which is thus represented by a tensor for C and a tensor for θ. The structure is sketched in Figure 2. Moreover, the two sets of hidden layers are endowed with kernel initializers of type glorot normal and random uniform, respectively; also, a kernel regularizer of type l1 l2, with weights l1 = l2 = 0.01, is applied to the first portion of the network, which yields the computed kernel function.
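The two RBF profiles can be sketched as plain functions; the exact placement of the trainable parameters γ, ρ, µ (width, amplitude, center) is an assumption made for illustration, consistent with Section 4, where γ is fixed while ρ and µ are left trainable:

```python
import numpy as np

def inverse_quadratic(x, gamma, rho, mu):
    # Bell-shaped profile around the center mu (type (3.2a)).
    return rho / (1.0 + (gamma * (x - mu))**2)

def multiquadric(x, gamma, rho, mu):
    # Profile growing away from the center mu (type (3.2b)).
    return rho * np.sqrt(1.0 + (gamma * (x - mu))**2)

x = np.linspace(-1.0, 1.0, 201)
iq = inverse_quadratic(x, gamma=1.0, rho=1.0, mu=0.0)
mq = multiquadric(x, gamma=0.09, rho=1.0, mu=0.0)

# Both profiles are even about their center, matching the symmetry of C.
assert np.allclose(iq, iq[::-1])
assert np.allclose(mq, mq[::-1])
```

In a Keras implementation, such a profile becomes a custom first layer whose γ, ρ, µ are declared through add_weight, so that they are updated alongside the ordinary weight matrices and biases.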

Loss Function.
Given the constraints on C, we have to construct a loss function as in (2.2) with as many components as there are constraints to be enforced in the RBF-iPINN. More specifically, we consider the following components to be part of L:

(3.3a) L_pde = ∥P(θ, C)∥_2,
(3.3b) L_data = ∥θ − θ∗∥_2,
(3.3c) L_sym = ∥C(x) − C(−x)∥_1.

In the selection of the norms above, we have been guided by the features we want our PINN to take into account. More specifically, since we are interested in fitting the data and in satisfying our model as closely as possible, we selected the 2-norm for these contributions; however, since symmetry is expected from the PINN architecture and, in particular, from the first RBF layer, we measure its loss via the 1-norm, which is more sensitive to small errors. Finally, we consider a weighted sum of the contributions given above as

(3.4) L = w_pde L_pde + w_data L_data + w_sym L_sym.

Figure 2. RBF-iPINN structure. In our simulations, we set the number of hidden layers, after each RBF layer, to 8, and the number of neurons per layer to 20.
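A sketch of the weighted loss described above; the function names and the sampling of the kernel at reflected points ±x are illustrative assumptions:

```python
import numpy as np

def total_loss(residual, theta_pred, theta_data, C_plus, C_minus,
               w_pde=1.0, w_data=1.0, w_sym=1.0):
    # 2-norm terms for the PDE residual and the data misfit ...
    L_pde  = np.sqrt(np.mean(residual**2))
    L_data = np.sqrt(np.mean((theta_pred - theta_data)**2))
    # ... and a 1-norm term for the symmetry defect ||C(x) - C(-x)||_1,
    # more sensitive to small pointwise violations.
    L_sym  = np.mean(np.abs(C_plus - C_minus))
    return w_pde * L_pde + w_data * L_data + w_sym * L_sym

x = np.linspace(-1.0, 1.0, 101)
C_hat = lambda s: 1.0 + 0.01 * s          # slightly asymmetric kernel guess
L = total_loss(np.zeros_like(x), np.ones_like(x), np.ones_like(x),
               C_hat(x), C_hat(-x))
assert L > 0.0                            # the symmetry defect is penalized
```

An exactly even kernel makes the symmetry term vanish, so only the PDE and data terms then drive the training.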

3.4. Learning rate. The learning rate α has been selected to decrease quadratically with the epoch. More precisely, we implemented the scheduler (3.5), in which the rate decreases quadratically with the epoch n, starting from an initial value α_0, where N is the number of epochs chosen for the training.
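As a minimal sketch, assuming a quadratic decay of the form α_n = α_0 (1 − n/N)² (the exact constants are our assumption; only the quadratic-in-epoch behavior is stated above), the scheduler can be implemented as:

```python
def lr_schedule(epoch, N=1000, alpha0=1e-2):
    # Quadratically decreasing learning rate: alpha0 at epoch 0, 0 at epoch N.
    return alpha0 * (1.0 - epoch / N) ** 2

rates = [lr_schedule(n) for n in (0, 250, 500, 1000)]
assert rates[0] == 1e-2
assert all(a > b for a, b in zip(rates, rates[1:]))   # strictly decreasing
assert rates[-1] == 0.0
```

In Keras, such a rule can be attached to training via tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr_schedule(epoch)); Section 4 uses α_0 = 10⁻³ for the faster-descent experiment.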

Numerical Simulations
In this section, we show results obtained with our RBF-iPINN. All the experiments have been run using 1000 epochs, with the learning rate defined in (3.5), and employed the ADAM optimizer. The machine used for the experiments has an Intel Core i7-8850H CPU at 2.60 GHz and 64 GB of RAM. The PINNs providing the results in the examples below have been developed in the Python library TensorFlow through the Keras interface. Moreover, the real data, which are used in our simulations to compute the loss term (3.3b), are synthetically built by solving (1.1) with appropriate spectral methods [24].
A main feature of the numerical computation is the evaluation of the integral on the right-hand side of (1.1). In fact, in order to exploit the power of Keras, we notice that

∫_R C(x′ − x) [θ(x′, t) − θ(x, t)] dx′ = −θ(x, t) ∫_R C(y) dy + (C ∗ θ)(x, t),

where the second term on the right-hand side above is the convolution product between the kernel C and the unknown function θ. It has to be noticed here that the kernel function C is compactly supported, with support [−δ, δ]. Now, in order to numerically compute such a convolution product, let [0, X] be the space interval and let 0 < x_1 < x_2 < . . . < x_{N−1} < x_N = X be the uniform spatial discretization of the interval [0, X] with stepsize h > 0. The convolution product above can be numerically treated by determining the exact number of components in the vector [C(x_i)], so that only pairs of points x_i, x_j with |x_i − x_j| < δ come into play when computing (C ∗ θ)(x, t). Since x_i = i · h, we deduce that the only indices involved in the convolution product are i, j = 1, . . ., N such that |i − j| < δ/h.
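A NumPy sketch of this truncated convolution; the V-shaped kernel samples are an illustrative choice, and in the actual RBF-iPINN the values of C come from the network itself, with the operation carried out through Keras tensor ops:

```python
import numpy as np

X_len, N, delta = 1.0, 101, 0.2
x = np.linspace(0.0, X_len, N)
h = x[1] - x[0]
m = int(delta / h)                 # only indices with |i - j| < delta/h interact

# Kernel samples C(y) on the stencil y = -m*h, ..., m*h (support [-delta, delta]);
# a V-shaped profile is used here purely as an example.
y = np.arange(-m, m + 1) * h
C_samples = np.abs(y)

theta = np.sin(2.0 * np.pi * x)

# Truncated discrete convolution:
# (C * theta)(x_i) ~ h * sum_{|i-j| <= m} C(x_i - x_j) theta(x_j, t)
conv = h * np.convolve(theta, C_samples, mode="same")

# Full right-hand side of (1.1): convolution minus theta times the mass of C.
rhs = conv - theta * (h * C_samples.sum())
assert rhs.shape == theta.shape
```

Restricting the stencil to |i − j| < δ/h keeps the cost per point O(δ/h) instead of O(N), which is what makes the compact support of C computationally valuable.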
Since the peridynamic integral operator in (1.5) is linear, we can derive a solution of the initial-value problem in terms of the Green's function, using the continuous Fourier transform. Such a solution can be used to provide a dataset for the subsequent simulations.
In the next two examples, we work on datasets derived from V-shaped kernel functions. This choice of kernel is justified by the fact that it is employed in some nonlocal formulations of Richards' equation, as it easily incorporates Dirichlet boundary conditions into the model (see [6]).
Thus, we select as activation function for the first layer of our PINN model an RBF of type (3.2b). Further, we tried both keeping all three parameters γ, ρ, µ trainable and fixing γ while letting ρ, µ be trainable. According to our experience, and for the following two cases, fixing γ improves both convergence performance and result quality. For Example 4.1, setting γ = 0.09 yields the results shown in Figure 3a, where we compare the true kernel function in (4.1) to the output of the proposed RBF-iPINN; setting γ = 0.05 in (3.2b) provides qualitatively comparable results, as can be observed in Figure 3b. For Example 4.2, setting γ = 0.09 yields the results shown in Figure 4a, where we compare the true kernel function in (4.2) to the output of the proposed RBF-iPINN, while γ = 0.05 provides the results in Figure 4b.
For the bell-shaped kernel of Example 4.3, we instead set the kernel regularizer to type l1 l2, with weights 0.01 and 0.1 respectively, in order to capture, as well as possible, the correct qualitative behavior of the kernel in the interior of its compact support; moreover, on account of the knowledge of the kernel shape, we activated the first layer of the RBF-iPINN through (3.2a), where we set the hyperparameter γ = 1; finally, for better data fitting, we also selected the sup-norm in (3.3b).
In this case, the neural network shows a fair ability to capture the shape and support of the bell-shaped kernel, but fails to accurately approximate its characteristic parameters, as shown in Figure 5; in fact, in this case the true kernel is a Gaussian. Therefore, we have performed a further analysis by implementing a standard inverse PINN to learn the parameters γ∗ and σ∗ in the parametric family (4.4). Starting with initial guesses γ∗ = 3 and σ∗ = 0.5, we run a PINN whose structure is the same as the second portion, relative to θ, of the RBF-iPINN described above (see the architecture in Figure 2). The training phase has been performed over 1000 epochs and with the same learning rate scheduler described in Section 3.4, but with a faster descent obtained by setting α_0 = 10⁻³. Results are depicted in Figure 6. It can be deduced that the inverse PINN has been able to correctly detect the learned values which, at convergence, are given by γ∗ = 2.3302033 and σ∗ = 1.0218402, to be compared with the true value 4/√π ≈ 2.2567583.

Conclusions
In this work, we have analyzed a peridynamic formulation of the classical wave equation, aiming to compute the kernel function responsible for the nonlocal behavior of the model. We have proposed to use a Radial Basis Function (RBF) as the activation function for the first layer of a suitably designed Physics Informed Neural Network (PINN) to solve the inverse problem. Specifically, our inverse PINN architecture has two neural networks working in series: the first set of layers, whose first layer is activated by a Radial Basis Function, takes the spatial data as input and yields the first output, recovering the kernel function; this first output is then concatenated with the temporal data, providing a new tensor that serves as input to the second set of layers, which produces the output describing θ, the solution to the peridynamic wave equation. We called the proposed model RBF-iPINN. We have shown that, with a wise scheduler for the learning rate, the necessary initializations of the first set of layers responsible for the kernel function computation, and standard selections of the activation RBF, the RBF-iPINN can provide reliable predictions of the kernel function when it has a V-shaped behavior. We have also tackled the case of a Gaussian kernel function: here, the RBF-iPINN is able to adequately detect the shape, but a further analysis is necessary to learn the expected parameters of the bell-shaped function. We did so by implementing a standard inverse PINN, practically tuning the very same second set of layers of the RBF-iPINN. Such models turn out to be promising tools for investigating optimal control problems in dimension 2 or higher (see, e.g., [7]), where an inverse PINN approach seems to provide robust and scalable results. These considerations pave the way to further investigations on how to deal with more complicated peridynamic models via PINNs and Radial Basis Functions, which seem to be a powerful approach to this kind of problem due to their inherently symmetric nature.

Figure 3. Learned kernel functions relative to Example 4.1 for different values of γ in (3.2b).

Example 4.3. For this example, we consider a bell-shaped kernel function to test the proposed RBF-iPINN.

Figure 5. Learned kernel function compared to the true one from Example 4.3.

Figure 6. Gaussian kernel function learned from (4.4), compared to the true one from Example 4.3.