A deep artificial neural network architecture for mesh free solutions of nonlinear boundary value problems

Seeking efficient solutions to nonlinear boundary value problems is a crucial challenge in the mathematical modelling of many physical phenomena. A well-known example of this is solving the Biharmonic equation relating to numerous problems in fluid and solid mechanics. One must note that, in general, it is challenging to solve such boundary value problems due to the higher-order partial derivatives in the differential operators. An artificial neural network is thought to be an intelligent system that learns by example. Therefore, a well-posed mathematical problem can be solved using such a system. This paper describes a mesh free method based on a suitably crafted deep neural network architecture to solve a class of well-posed nonlinear boundary value problems. We show how a suitable deep neural network architecture can be constructed and trained to satisfy the associated differential operators and the boundary conditions of the nonlinear problem. To show the accuracy of our method, we have tested the solutions arising from our method against known solutions of selected boundary value problems, e.g., comparison of the solution of Biharmonic equation arising from our convolutional neural network subject to the chosen boundary conditions with the corresponding analytical/numerical solutions. Furthermore, we demonstrate the accuracy, efficiency, and applicability of our method by solving the well known thin plate problem and the Navier-Stokes equation.


Introduction
Artificial neural networks (ANNs) can be utilised to create suitable tools to solve large-scale computational problems [2,37]. ANNs involve the computational inference of simulated neuronal units, where every neuron has a real-valued input. Multiplying these units results in the corresponding neuronal coefficients. Then, using the results, a bias term [29], the authors discuss a study related to the sclera recognition system using CNNs. This CNN framework is formed using four major convolutional units linked with the corresponding convolutional layers. This is a self-learning model where the output of the first layer is fed to the next so that the model is better trained to provide efficient results. Similarly, a multi-layer feedforward neural network for internet traffic classification was proposed in [30]. They have used a model with four hidden layers for handling large amounts of imbalanced data.
Thus, recently, ANNs have shown great promise in solving many challenging problems in applied mathematics and computing. In this sense, several ways to solve partial differential equations (PDEs) using feedforward neural networks by substituting approximate solutions into the corresponding differential operator [22,38] have been proposed. For example, Abdeljaber et al. [1] studied an interesting active vibration problem of cantilevered beam induced by a pulse concentrated load using an ANN architecture. Consequently, they proposed a novel methodology for the active control of flexible cantilever plates using piezoelectric sensor/actuator pairs. Similarly, Jafari et al. [18] solved a fuzzy differential equation using a feedforward neural network with three hidden layers. Various other differential equations have been solved using the ANN architectures, e.g., [10,11,34,35]. However, much of these recent problems solved in the area of PDEs and boundary value problems are of first order belonging to the linear domain. Thus, the resulting optimisation problem solved within the ANN architecture is constrained, making the application domain rather limited.
The challenge we address in this paper is that we propose a general ANN architecture for solving a broad range of nonlinear boundary value problems. For this purpose, we utilise a deep neural network that can learn patterns from the data [27] using trial solutions as input during the learning phase. Here, we solve an unconstrained minimisation problem instead of solving the traditional forms of constrained minimisation problems [14]. Moreover, the method proposed here is free from meshes, which is a considerable advantage since meshes can be infeasible and costly when solving PDES in higher dimensions [26]. Here, the convolutional neural network replaces the mesh formulation where trial solutions help to train the network. The first part of the sample data is the necessary boundary conditions of the PDES, while the second part consists of a feedforward neural system containing flexible parameters or weights, giving rise to the solution [9,23]. Thus, the approximate trial solution provides precision-controlled coefficients without the need for domain discretisation.
The work we describe here is novel in that we demonstrate that an ANN-based nonlinear machine learning model is used to solve general nonlinear boundary value problems. To our knowledge, this is the first time a general ANN-based solution is proposed for such boundary value problems with diverse applications in mathematics, physics and engineering. Specifically, in this work, we show that neural networks can learn accurately about the solution of PDEs, such as the Poisson's equation and the Biharmonic equation, even when such equations are solved in complicated domains. Notably, one can utilise this approach to solve PDEs arising from boundary value problems without worrying about creating and manipulating complex numerical meshes. We demonstrate the efficiency and superiority of the proposed ANNbased method over commonly utilised numerical methods such as the finite difference and the finite element methods.
The remainder of the paper is structured as follows. In Section 2, we introduce the proposed ANN architecture for solving nonlinear boundary value problems. Subsequently, in that section, we discuss the methodology and provide experimental validation of the proposed method. In Section 3, we provide examples showing the application of the proposed method. More specifically, we solve the Von-Kármán and the Navier-Stokes equation subject to given boundary conditions. Finally, in Section 4, we summarise and conclude the paper.

The proposed network architecture
The neural network architecture we have propose for obtaining mesh free solutions of nonlinear boundary value problems is inspired by deep and fully connected feedforward neural networks [6]. The ANN is a map between the input and output of the system, which can be linear or nonlinear [17]. The resulting output is a linear activation. Thus, the ANN defines a map such that R n → R n . The system forming the initial/boundary value problem describes this map. In some cases, the learning process of the multi-layer network can be time-consuming and illconditioned. An increasing number of input parameters and proper error minimisation techniques can be useful to cope with such difficulties. The ANN we consider here consists of n + 1 layers with 0 ≤ l i ≤ l where layer 0 is the input layer and layer l is the output. The activation functions in the hidden layers can be of general form [36], e.g., sigmoids, rectified linear units, or hyperbolic tangents. For this work, unless otherwise stated, we use sigmoids in the hidden layers.

A general formulation for solving boundary value problems
A general form of the nonlinear boundary value problem considered here can be formulated such that: subject to certain boundary conditions, where x = (x 1 , x 2 , .., x n ) ∈ R n is an independent variable vector over the domain, D ⊂ R n , m = {1, 2, ..., n} is the order of boundary value problem,g(x) is the source term and u(x) is an unknown solution. The function F is considered as non-singular, and all the partial derivatives and higher order derivatives in terms of {x, u, u} are assumed to be well defined. The finite domain in the n-dimensional space for the independent variables is discretised with a unit h such that: where a(l) = a, l = {1, 2, .., n} is the discrete index vector with each of the components representing the discrete coordinate for the individual variables x l and x l0 representing the initial values. Hence, the discretised boundary value problem becomes: To obtain a numerical solution of the boundary value problem, the following method is adopted, which assumes the grid discretisation of the domain D and the boundary value problem is defined as described in (2) and (3), respectively. The unknown function u is a two variable entity, u = u(x), such that it is defined over a twodimensional domain D with the following trial solution: where the weights need to be learned by the neural network N(x, p) to approximate the solution with A(x) along with the boundary conditions and B(x) satisfying B(x)| δD ≡ 0. Note that the choice of A and B are not restricted. There are several ways to construct a solution and in [4] one could find further details on how to approach it. The main problem here is how to transform the differential equation into a minimisation problem. Let us explain this further by way of an example as follows.

Solving poisson's equation
Let us consider the general form of Poisson's equation to be solved as: Then, the minimisation problem will look like: At this stage, the problem is changed from a constrained to an unconstrained optimisation problem since the boundary conditions are satisfied by a trial solution already, which makes it easier to handle. Now, the minimised error x la are points in D such that: Let us consider a multilayer perceptron with m p input units, the hidden layer with h a activation function units (which is a sigmoid in our case) and an output unit. For a given input vector x, the output of the network is: where w ij is the weight of the input unit, (j ), v i is the weight of the hidden unit, σ (z) is the sigmoid transfer function and h i is the hidden unit i. Then it is straightforward to see that: where σ (z i ) (k) denotes the k th order derivative of the sigmoid. It is worth mentioning that when we compute our loss function, various high order derivatives can be picked into a collection, forming a single unknown function. This collection of unknown functions can include the PDE analytic solution and its derivatives up to the third order, or it can also include only the PDE solution and its secondorder derivatives. However, in this case, the loss function is written in the least square sense, bearing a resemblance to the well known least square finite element method [21]. In constructing the architecture, it is important to bear in mind that the PDE solution and its derivatives share nearly the same deep neural network. The structure of the model can be seen in Fig. 1, where we need to update our boundary/trial solution network before updating the main part of the neural network.

Enabling learning in the neural network
The major challenge with multi-layer perceptron neural networks is the difficulty in measuring the efficiency of the weights within the hidden layers, whereby the aim is to obtain the lowest performance error. Unfortunately, there is no straightforward mechanism to measure the degree of error within the hidden layers during the training. Hence, we showing the proposed feedforward neural network. It consists of an input layer, two or more hidden layers, and an output layer containing one or more artificial neurons propose the following backpropagation learning procedure to be applied during the training phase. It includes: -the set of sample/domain data points {x la }, -the value for the learning rate α and the criteria to terminate the algorithm, -and the methodology for updating weights and initial values of the weights.
Note, here, to start with, we take random initial weights lying between −0.5 to 0.5. There are n input nodes that will become n + 1 total input nodes after including an extra bias node. A single hidden layer consists of h n nodes with sigmoid as the activation function. The single scalar output from the previous information is then given by: where v ∈ R h n andW ∈ R h n ×{n+1} are the specific neural network parameters that replaces the general parameter p.
The input variable isx = [x T , 1] T . The function σ : R h n → R h n represents a component-wise activation function that operates on the hidden layer. The overall task here is to choose the discretised setting given in (2) and various hidden nodes h n in a layer, and then optimise the (5) to get an approximation of u t (x,p). As shown in Fig. 1, a feed-forward neural network comprises of an input layer, one or more hidden layers and an output layer. Parameters are propagated from the input layer to the output layer through the hidden layers. The number of nodes in the hidden layer is determined by the degree of polynomial considered. If an nth degree polynomial is used, the number of nodes in the hidden layer would be n + 1. The coefficients of the polynomial can be used as initial weights from the input to the hidden layers and from the hidden layers to the output layer. The number of hidden layers, the number of neurons per layer and the choice of activation functions are essential considerations we have put in place while designing our neural network.
As the number of hidden layers increases, the network can be thought of as an adaptive nonlinear dynamical system composed of numerous neurons connected by various attributes and capable of approximating a wide range of complex functions. This is a key feature of the network we have proposed here. Thus, since an ANN can be used to accurately approximate very complicated functions, it can be regarded as a learnable entity which in our case is regarded as the solution of the PDE in question.
It is important to bear in mind that the number of network parameters, including the weights of internode links and biases associated with hidden and output nodes, should be less than the size of the training set. Otherwise, the network will be susceptible to over-fitting, which means that the predictions on the training set will improve while the power of generalisation will deteriorate. Regularisation methods are used to control over-fitting. This is also a feature which is inherent in the ANN proposed here.
Another feature we have taken onboard is the consideration given in selecting the bias terms. Each neuron in the ANN is supplied with a bias, including the output neurons but excluding the input neurons. The connections between neurons in subsequent layers are represented by matrices of weights. We let b l i denote the bias of neuron i in layer l. The weight between neuron k in the layer l 1 and neuron i in the layer l is denoted by w l ik . The activation function in the layer l is denoted by σ l , regardless of the type of activation. We assume for simplicity that a single activation function is used for each layer. The output from neuron i in the layer l is denoted by y l i . Drawing parallels with a given numerical method for the accuracy of the solutionp obtained, we assume the grid and the hidden layer as the dominant choices in the solution parameter space. Thus, conceptually, the accuracy of the solution is expected to improve with a finer grid and a larger hidden layer. However, this comes with a high computational cost and also with the undesirable effect of potential over-fitting. Note, the aim here to achieve an estimate of the solution with sufficient accuracy. Whilst we do that, we must bear in mind to minimise the computational cost and complexity of the problem itself. We explain our ANN formulation further by way of another example, as discussed below.

Solving the biharmonic equation
The Biharmonic operator or bi-Laplacian operator is used in various formulations such as solving the airy stress problem, incompressible fluid mechanics, and elasticity problems [5]. This operator also has applications in geometric design, e.g., [4,19]. Here, we solve the two dimensional Biharmonic equation in a rectangular domain together with mixed boundary conditions using the method proposed in the previous section. Consider the equation: , is a four times continuously differentiable function and f, ψ i , {i = 1, 2} are known functions. In our case, δ δn refers to the normal derivative where n is a unit vector orthogonal to the surface. In other words, if n is the unit vector providing the direction, then the derivative is given as, δ δn ≡ · n.
If n = (1, 0), i.e., in the direction of x-axis, then we just get x , which is the standard partial derivative in the direction of the x-axis. Similarly, if n = (0, 1) T , then we get y .
To find the appropriate trial solution, first, we write all the boundary conditions as follows: (a, y) = ψ 1 (a, y) , δ δn (a, y) = −ψ 2 (a, y), The trial solution is written in a similar way as proposed in the previous section such that: N(x, p)), where, and N(x, p), (12) where φ 2 vanishes on the boundary. Now, after substituting (11) and (12) in (10) and then in (9), we can find the unknown constants A i and B i , for i = {1, 2, 3, 4}. In practice, we can generate a feasible trial solution satisfying the boundary conditions since the determinant of the coefficient matrix for the defined system is not identically zero. Hence, the minimum error is written as: where x la are points in .
The formulation is further described in Algorithm 1. To explain this further, we take a specific case of the Biharmonic equation. We solve it and compare the solution of our proposed neural network based solution with other existing numerical methods such as finite element method (FEM) and finite difference method (FDM) for (14) as shown in Table 1. We consider the following nonlinear Biharmonic equation with non-zero boundary conditions such that: Ψ xxxx + 2 xxyy + yyyy + xx + 2 xy + yy + Ψ x + y + 2 =g(x, y), where,

g(x)
= e x (2π 4 − 2π 2 + 3) sin(πy) + e 2x −e 2x cos(πy) 2 + 3e x π cos(πy), (14) with boundary conditions as follows, Using the boundary conditions, the trial function is constructed as: The ANN solution and the analytical solution for the chosen Biharmonic problem are shown in Figs. 2 and 3, respectively. Figure 4 shows a detailed comparison of our proposed approach with the FEM and FDM through the use of L 2 error. This error is calculated for different grid sizes starting from 5 × 5 to 30 × 30.
For further comparison of the errors between various methods compared here, we have also utilised the rootmean-square (RMS) error: where N is the total number of nodes, T i is the exact solution, and R i is the numerical solution at the particular   Table 2.

Effects of the hidden layers
An optimum number of hidden layers is an important and critical requirement when deciding to create a chosen ANN framework. The key idea is to efficiently calculate the number of hidden units needed in a multi-layer network to obtain a particular approximation. For this purpose, we work out the number of independent variables that should be modified to create an arbitrary smooth function for a given order of approximation. At the same time, we also work out the number of values that will be changed based on the overall number of parameters in the network. Figure 5 shows the effect of the hidden layers as we change the discretisation in the x and y directions. Our experiments, for this example and for a number of other examples, show that the use of 10 hidden layers results in good agreement of the ANN-based solution when compared with the exact and the numerical solutions. The important point to be noted here is that as we increase the number of hidden layers and discretisation across both axes, the training time increases. Therefore, we suggest using the number of hidden layers between 5 and 10, depending on the training data and the complexity of the problem.

Examples
So far, we have discussed how the ANN for solving the chosen boundary value can be setup. We have also shown how the method can be utilised to solve simpler yet general problems such as the Poisson's and the Biharmonic equation. In this section, we show how the ANN framework discussed above can be used to solve two further boundary value problems namely, the the Von-Kármán equation and the Naiver-Stokes Equation.
Both these problems are of complex nature and of practical importance.

Solving the Von-Kármán equation
The analysis of large square and rectangular plate deflections is one of the most studied architectural problems. The applications of this problem include the design of aviation structures, ship hulls, bridges and spacecraft. For example, the lateral loads on an aircraft are subjected to the thin plate effects in the pressure cabin or lift and bending on the wings. The generic Kirchhoff linear plate bending theory is appropriate only for minor deflections that neglect midsurface strains and subsequent in-plane stresses. With a relatively strong lateral deflection, the external force increases. Both the plate's central surface and the membrane forces play a major role in moving the side loads. Initially, the Von-Kármán's extension to large deformations was assisted through the pioneering work of Leitao [24]. In film relationships, we must account for the nonlinear terms for the sake of large flow. Consequently, we obtain two nonlinear fourth-order cross-displacement equations for which there is no analytic solution.
Consequently, to solve this problem, one has to resort to numerical procedures. For example, the FDM technique was used in [7] to examine large deflections of rectangular plates subject to lateral pressure and edge loading combination. In [12], the authors discuss some of the earlier final component implementations for large deformations. In [33], the stress-based finite element approach was introduced. In [32], a boundary element method helps to analyse the static, dynamic and buckling behaviour of thin plates. These numerical methods can solve the Von-Kármán plate equations and are more versatile than the rival semianalytical methods. However, they all come with high computational costs.

Problem formulation
Let us define a side-by-side general plate system composed of non-homogeneous elastoplastic plates with varying physical properties in the domain ∈ R 2 such that: where k > 0 is the number of plates that form the system. The problem of bending of the system with the discontinuous coefficient is written as: where is the bending of the beam corresponding to the vertically applied force F , are the cylindrical stiffness coefficients of the plates of the system, E, μ and h represents Young's modulus, Poisson's constant, and the thickness of the beam (assumed to be uniform throughout). Due to the plate boundaries, the boundary conditions for the equilibrium condition are defined to be the physical conditions such that it is assumed that the clamped boundary and the boundary condition are simply assisted. Though the plates in the system have different cylindrical stiffness coefficients of D i , the coefficients of (15) lead to discontinuity at the common bounds of the system. Let us consider two plates with different properties, i.e., i = 2 in a rectangular domain, Fig. 6. The case considered is with the boundary at x = 0. With that, the following functions can be easily seen to satisfy the simply supported boundary condition, the clamped boundary condition of and the common boundary, respectively, such that: t2 (x) = −(1− cos 2πx)(1− cos 2πy)+x 2 y 2 (x−l 2 ) 2 (y−w 1 ) 2 N(x, p). (16) The function (x) = { t1 ∪ t2 } is geometrically equivalent such that the plates are connected with an ideal hinge on the common boundary points, and there is absolute rigid support under the hinge. The function (x) also corresponds to the bending along the common boundary. This means that the plates are composed of welding, and there is absolute rigid support under the hinge.
To examine the effect of the Young's modulus on the bending of the system, we solve the problem using conditions whereby E 1 = 5000 kn/cm 2 , E 2 = 3000 kn/cm 2 and E 1 = 3000 kn/cm 2 andE 2 = 5000 kn/cm 2 and shown in Fig. 7. The ANN-based solution is derived using 6 hidden layers. A comparison is carried out with the FEM, which shows an L 2 error of 1.3413e − 08. The error between the ANN solution and the analytical solution is 1.066e − 11, with 7 hidden layers and a grid size of 20 × 20.

Solving the naiver-stokes equation
Here we present an efficient deep learning technique relying on the ANN for the model reduction of the Navier-Stokes Equation [3] for incompressible flow. Consider a two dimensional Navier-Stokes Equation with the continuity and  2 , v is the velocity vector, p is pressure, ν is the viscosity, and f represents body forces. We also introduce a stream function (x, y) such that it satisfies the continuity equation: Using the stream function from the (18), we can rewrite the equation after eliminating the pressure term from the momentum equations, leading to the vorticity transport equation with two boundary conditions such that: 1 For this problem, we define the trial solution as: t (x) = (x(x −1)y(y −1)) 2 +x 2 y 2 (x −1) 2 (y −1) 2 N(x, p).
In this example, we have divided the domain space [0, 1] 2 to a 16 × 16 grid. Figure 8 shows the ANN solution of the defined Naiver-stokes problem. The L 2 error between

Conclusions
In this paper, we discuss a method of solving nonlinear boundary value problems. Our proposed approach relies on the function approximation capabilities of the feedforward neural networks. Thus, we have designed and trained an ANN framework capable of solving general nonlinear boundary value problems whereby the resulting solutions are mesh free. We can use the proposed method to solve general nonlinear boundary value problems whose analytic solutions are not readily available. That is a significant advantage of this method. Other benefits of this method include its generalisability for solving higher-order and nonlinear problems. For example, in this paper, we have described how the ANN framework proposed can be utilised for solving the Navier-Stokes equation [13,15,28] and the Von-Kármán equation [8].
Generating numerical solutions of nonlinear boundary value problems, especially for higher-order PDEs, is a challenging task. However, the method proposed here appears to be feasible to address such critical problems with good agreement with the corresponding analytic solutions and state of the art techniques for obtaining numerical solutions. We show how it is feasible to set up the ANN framework to incorporate the necessary initial/boundary conditions for the chosen problem and optimise the chosen ANN parameters as well as weights using trial functions. The use of trial functions in the training process helps to provide precision-controlled coefficients without the need for complex domain discretisation. We have also examined the performance of the network subject to different numbers of hidden layers on the chosen ANN. According to our studies, the number of hidden layers between 5 and 10 provides the optimum results.
From the studies we have described here, one can conclude that there is great potential for the ANN framework presented in this paper to be further studied and extended. It will be useful to experiment with various optimisation techniques to reduce the number of parameters in the network. That will help in further optimising the training process and computation of the solutions. It will also be helpful to combine the proposed ANN framework with other existing and pre-trained deep learning models so that solving more complex boundary value problems can be carried out accurately and efficiently. Finally, it will be useful to extend the proposed ANN framework to solve PDEs in very high dimensions, e.g., dimensions > 100. Solving very high dimensional boundary value problems is a highly challenging problem using the present state of the art FEM and FDM based techniques.

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.