1 Introduction

Artificial neural networks (ANNs) can be utilised to create suitable tools to solve large-scale computational problems [2, 37]. ANNs are built from simulated neuronal units, where each neuron receives real-valued inputs. These inputs are multiplied by the corresponding neuronal coefficients (weights) and summed, a bias term is added to the result, and finally an activation function is applied to determine a real-valued output.
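For illustration, the following minimal Python sketch (not part of the original formulation; the values are arbitrary) shows this computation for a single neuron with a sigmoid activation:

```python
import numpy as np

# A single artificial neuron as described above: weighted inputs are summed,
# a bias is added, and a sigmoid activation produces the real-valued output.
def neuron(x, w, b):
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # real-valued inputs
w = np.array([0.1, 0.4, -0.2])   # neuronal coefficients (weights)
print(neuron(x, w, b=0.05))      # real-valued output in (0, 1)
```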

Applications of ANNs are found in almost every field of computer science and engineering [31]. Networks can learn and store complex mapping relations between the inputs and outputs of models, which is highly useful. For example, applications of ANNs are numerous in the domain of healthcare. These include automated diagnosis of congestive heart failure using ECG signals [2], generating quantitative computed tomography reports for the diagnosis of pulmonary tuberculosis [25] and finding minute mammographic differences for the diagnosis of certain types of cancers [16].

Similarly, ANN architectures trained with backpropagation have recently been prevalent in solving problems that were considered very challenging in the past. For example, Jha et al. [20] have proposed a deep learning model for designing a visual question answering system. This qualitative study shows how one might train a single but fast convolutional neural network (CNN) model by modifying weights in the parameter prediction layers. In [29], the authors discuss a sclera recognition system using CNNs. This CNN framework is formed from four major convolutional units linked with the corresponding convolutional layers. It is a self-learning model where the output of each layer is fed to the next, so that the model is progressively better trained and provides efficient results. Similarly, a multi-layer feedforward neural network for internet traffic classification was proposed in [30]; the authors used a model with four hidden layers to handle large amounts of imbalanced data.

Thus, recently, ANNs have shown great promise in solving many challenging problems in applied mathematics and computing. In this vein, several ways to solve partial differential equations (PDEs) using feedforward neural networks by substituting approximate solutions into the corresponding differential operator [22, 38] have been proposed. For example, Abdeljaber et al. [1] studied an interesting active vibration problem of a cantilevered beam induced by a concentrated pulse load using an ANN architecture. Consequently, they proposed a novel methodology for the active control of flexible cantilever plates using piezoelectric sensor/actuator pairs. Similarly, Jafari et al. [18] solved a fuzzy differential equation using a feedforward neural network with three hidden layers. Various other differential equations have been solved using ANN architectures, e.g., [10, 11, 34, 35]. However, many of the problems recently solved in the area of PDEs and boundary value problems are of first order and belong to the linear domain. Moreover, the resulting optimisation problem solved within the ANN architecture is constrained, making the application domain rather limited.

In this paper, we address this challenge by proposing a general ANN architecture for solving a broad range of nonlinear boundary value problems. For this purpose, we utilise a deep neural network that can learn patterns from the data [27] using trial solutions as input during the learning phase. Here, we solve an unconstrained minimisation problem instead of the traditional forms of constrained minimisation problems [14]. Moreover, the method proposed here is free from meshes, which is a considerable advantage since meshes can be infeasible and costly when solving PDEs in higher dimensions [26]. Here, the neural network replaces the mesh formulation, with trial solutions helping to train the network. The first part of the sample data consists of the necessary boundary conditions of the PDEs, while the second part consists of a feedforward neural system containing flexible parameters or weights, giving rise to the solution [9, 23]. Thus, the approximate trial solution provides precision-controlled coefficients without the need for domain discretisation.

The work we describe here is novel in that we demonstrate that an ANN-based nonlinear machine learning model can be used to solve general nonlinear boundary value problems. To our knowledge, this is the first time a general ANN-based solution has been proposed for such boundary value problems with diverse applications in mathematics, physics and engineering. Specifically, in this work, we show that neural networks can accurately learn the solutions of PDEs, such as Poisson’s equation and the Biharmonic equation, even when such equations are solved in complicated domains. Notably, one can utilise this approach to solve PDEs arising from boundary value problems without worrying about creating and manipulating complex numerical meshes. We demonstrate the efficiency and superiority of the proposed ANN-based method over commonly utilised numerical methods such as the finite difference and the finite element methods.

The remainder of the paper is structured as follows. In Section 2, we introduce the proposed ANN architecture for solving nonlinear boundary value problems. Subsequently, in that section, we discuss the methodology and provide experimental validation of the proposed method. In Section 3, we provide examples showing the application of the proposed method. More specifically, we solve the Von-Kármán and the Navier-Stokes equations subject to given boundary conditions. Finally, in Section 4, we summarise and conclude the paper.

2 The proposed network architecture

The neural network architecture we propose for obtaining mesh-free solutions of nonlinear boundary value problems is inspired by deep and fully connected feedforward neural networks [6]. The ANN is a map between the input and output of the system, which can be linear or nonlinear [17]; the output layer uses a linear activation. Thus, the ANN defines a map \(\mathbf {R}^{n} \rightarrow \mathbf {R}\), and the system forming the initial/boundary value problem describes this map. In some cases, the learning process of the multi-layer network can be time-consuming and ill-conditioned. Increasing the number of input parameters and using proper error minimisation techniques can help to cope with such difficulties. The ANN we consider here consists of l + 1 layers, indexed 0 ≤ i ≤ l, where layer 0 is the input layer and layer l is the output layer. The activation functions in the hidden layers can be of general form [36], e.g., sigmoids, rectified linear units, or hyperbolic tangents. For this work, unless otherwise stated, we use sigmoids in the hidden layers. A minimal sketch of such a network is given below.
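The following is a minimal sketch of this architecture (the layer sizes here are illustrative; the weight range [−0.5, 0.5] anticipates the initialisation discussed in Section 2.3):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Fully connected feedforward pass: sigmoid activations in the hidden
    layers and a linear activation at the output, as described above."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W @ a + b)            # hidden layers 1..l-1
    return weights[-1] @ a + biases[-1]   # output layer l (linear)

# Example: a map from R^2 to R with one hidden layer of 10 sigmoid units.
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.5, 0.5, (10, 2)), rng.uniform(-0.5, 0.5, (1, 10))]
biases = [rng.uniform(-0.5, 0.5, 10), rng.uniform(-0.5, 0.5, 1)]
print(mlp_forward(np.array([0.3, 0.7]), weights, biases))
```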

2.1 A general formulation for solving boundary value problems

A general form of the nonlinear boundary value problem considered here can be formulated such that:

$$ F(\mathbf{x},u(\mathbf{x}),\nabla u(\mathbf{x}),\nabla^{2} u(\mathbf{x}),\ldots,\nabla^{m} u(\mathbf{x}))= \tilde{g}(\mathbf{x}), \quad \mathbf{x} \in D, $$
(1)

subject to certain boundary conditions, where \(\mathbf {x} = (x_{1},x_{2},\ldots,x_{n}) \in \mathbf {R}^{n}\) is the vector of independent variables over the domain D ⊆ Rn, m ∈ {1,2,…,n} is the order of the boundary value problem, \(\tilde {g}(\mathbf {x})\) is the source term and u(x) is the unknown solution. The function F is considered non-singular, and all the partial derivatives and higher-order derivatives in terms of \(\{\mathbf {x},u, \nabla u\}\) are assumed to be well defined. The finite domain in the n-dimensional space of the independent variables is discretised with a uniform step size h such that:

$$ \mathbf{x}_{l\mathbf{a}} = \mathbf{x}_{l0} + \mathbf{a}(l)h, \quad \mathbf{x}_{l\mathbf{a}} \in D, $$
(2)

where a(l), l = 1,2,…,n, is the discrete index vector whose components represent the discrete coordinates of the individual variables xl, and xl0 represents the initial values. Hence, the discretised boundary value problem becomes:

$$ F_{l\mathbf{a}}(\mathbf{x}_{l\mathbf{a}},u(\mathbf{x}_{l\mathbf{a}}),\nabla u(\mathbf{x}_{l\mathbf{a}}),\nabla^{2} u(\mathbf{x}_{l\mathbf{a}}),\ldots,\nabla^{m} u(\mathbf{x}_{l\mathbf{a}}))= \tilde{g}(\mathbf{x}_{l\mathbf{a}}). $$
(3)
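As a concrete illustration, the following sketch builds the uniform grid of (2) for a two-dimensional domain; the step size and bounds are assumptions chosen for the example, not values from the formulation above.

```python
import numpy as np

# A sketch of the uniform grid discretisation (2) for a 2D domain
# D = [0, 1] x [0, 1]; the step size h and the bounds are illustrative.
h = 0.05
x0 = np.array([0.0, 0.0])                      # initial values x_{l0}
a1, a2 = np.meshgrid(np.arange(21), np.arange(21), indexing="ij")
grid = np.stack([x0[0] + a1 * h, x0[1] + a2 * h], axis=-1)  # x_{la} = x_{l0} + a(l) h
print(grid.shape)                              # (21, 21, 2) grid points of D
```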

To obtain a numerical solution of the boundary value problem, the following method is adopted, which assumes the grid discretisation of the domain D and the discretised boundary value problem as described in (2) and (3), respectively. The unknown function u = u(x) is here a function of two variables, defined over a two-dimensional domain D, with the following trial solution:

$$ u_{T}(\mathbf{x})=A(\mathbf{x})+B(\mathbf{x})N(\mathbf{x},\mathbf{p}), $$
(4)

where N(x,p) is a feedforward neural network whose weights p need to be learned to approximate the solution, A(x) satisfies the boundary conditions, and B(x) vanishes on the boundary, i.e., B(x)|∂D ≡ 0. Note that the choice of A and B is not restricted; there are several ways to construct such a solution, and further details on how to approach this can be found in [4]. An illustrative construction is sketched below. The main problem now is how to transform the differential equation into a minimisation problem. Let us explain this further by way of an example as follows.
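A minimal sketch of the trial solution (4), assuming homogeneous Dirichlet data on the unit square (a simple but common special case; the network size is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative single-hidden-layer network N(x, p); here p packs (W, h, v).
def N(x, y, W, h, v):
    return v @ sigmoid(W @ np.array([x, y]) + h)

# Trial solution (4) on the unit square with homogeneous Dirichlet data:
# A(x) = 0 satisfies the boundary conditions and B(x) vanishes on the boundary.
def u_trial(x, y, W, h, v):
    A = 0.0
    B = x * (1.0 - x) * y * (1.0 - y)
    return A + B * N(x, y, W, h, v)

rng = np.random.default_rng(1)
W = rng.uniform(-0.5, 0.5, (8, 2))
h = rng.uniform(-0.5, 0.5, 8)
v = rng.uniform(-0.5, 0.5, 8)
print(u_trial(0.0, 0.3, W, h, v))  # exactly 0 on the boundary
print(u_trial(0.5, 0.5, W, h, v))  # network-dependent in the interior
```

By construction, the boundary conditions hold exactly for any network weights, so the optimisation over p can be carried out without constraints.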

2.2 Solving Poisson’s equation

Let us consider the general form of Poisson’s equation to be solved as:

$$ F(\mathbf{x},u(\mathbf{x}),\nabla u(\mathbf{x}),\nabla^{2} u(\mathbf{x}))= \tilde{g}(\mathbf{x}), \quad \mathbf{x} \in D. $$
(5)

Then, the minimisation problem will look like:

$$ \min_{\mathbf{p}} \sum\limits_{\mathbf{x}\in D} \Big[ F(\mathbf{x},u(\mathbf{x},\mathbf{p}),\nabla u({\mathbf{x}},\mathbf{p}),\nabla^{2} u({\mathbf{x}},\mathbf{p})) - \tilde{g}(\mathbf{x}) \Big]^{2}. $$
(6)

At this stage, the problem is changed from a constrained to an unconstrained optimisation problem, since the boundary conditions are already satisfied by the trial solution, which makes it easier to handle. The error to be minimised over the points xla in D is then:

$$ E(\mathbf{p})=\sum\limits_{i}\Big\{ \Delta u(\mathbf{x}_{i\mathbf{a}}) -\tilde{g}(\mathbf{x}_{i\mathbf{a}}) \Big\}^{2}. $$

Let us consider a multilayer perceptron with mp input units, a hidden layer with ha units using an activation function (a sigmoid in our case) and one output unit. For a given input vector x, the output of the network is:

$$ N = \sum\limits_{i=1}^{h_{a}} v_{i} \sigma(z_{i}), \qquad z_{i} = \sum\limits_{j=1}^{m_{p}} w_{ij}x_{j}+h_{i}, $$

where wij is the weight from input unit j to hidden unit i, vi is the weight from hidden unit i to the output, σ(z) is the sigmoid transfer function and hi is the bias of hidden unit i. Then it is straightforward to see that:

$$ \frac{\partial^{k}{N}}{\partial {x^{k}_{j}}}=\sum\limits_{i=1}^{h_{a}} v_{i} w_{ij}^{k}\, \sigma^{(k)}(z_{i}), $$

where \(\sigma^{(k)}(z_{i})\) denotes the kth-order derivative of the sigmoid.
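To make these formulas concrete, the following sketch (illustrative, single hidden layer) evaluates N, its second derivatives via the expression above, and the Poisson-type error E(p). For brevity the residual is applied directly to N rather than to the full trial solution; the sample points and network size are assumptions.

```python
import numpy as np

def sig(z):                        # sigmoid
    return 1.0 / (1.0 + np.exp(-z))
def d2sig(z):                      # second derivative of the sigmoid
    s = sig(z)
    return s * (1.0 - s) * (1.0 - 2.0 * s)

# N = sum_i v_i sigma(z_i), with z_i = sum_j w_ij x_j + h_i.
def N(x, W, h, v):
    return v @ sig(W @ x + h)

# d^2 N / d x_j^2 = sum_i v_i w_ij^2 sigma''(z_i), i.e., the k = 2 case above.
def d2N(x, W, h, v, j):
    z = W @ x + h
    return np.sum(v * W[:, j] ** 2 * d2sig(z))

# E(p): squared residual of Laplacian(N) - g over the sample points.
def E(points, W, h, v, g):
    return sum((d2N(x, W, h, v, 0) + d2N(x, W, h, v, 1) - g(x)) ** 2
               for x in points)

rng = np.random.default_rng(2)
W, hb, v = (rng.uniform(-0.5, 0.5, (8, 2)),
            rng.uniform(-0.5, 0.5, 8), rng.uniform(-0.5, 0.5, 8))
pts = [np.array([i / 4, j / 4]) for i in range(1, 4) for j in range(1, 4)]
print(E(pts, W, hb, v, g=lambda x: 0.0))
```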

It is worth mentioning that when we compute our loss function, the various high-order derivatives can be grouped into a collection, forming a single unknown function. This collection of unknown functions can include the PDE analytic solution and its derivatives up to the third order, or it can include only the PDE solution and its second-order derivatives. In this case, the loss function is written in the least-squares sense, bearing a resemblance to the well-known least-squares finite element method [21]. In constructing the architecture, it is important to bear in mind that the PDE solution and its derivatives share nearly the same deep neural network. The structure of the model can be seen in Fig. 1, where we need to update our boundary/trial solution network before updating the main part of the neural network.

Fig. 1 A structural diagram showing the proposed feedforward neural network. It consists of an input layer, two or more hidden layers, and an output layer containing one or more artificial neurons

2.3 Enabling learning in the neural network

The major challenge with multi-layer perceptron neural networks is the difficulty of measuring the contribution of the weights within the hidden layers to the overall performance error, where the aim is to obtain the lowest such error. Unfortunately, there is no straightforward mechanism to measure the degree of error within the hidden layers during the training. Hence, we propose the following backpropagation learning procedure to be applied during the training phase. It requires:

  • the set of sample/domain data points {xla},

  • the value for the learning rate α and the criteria to terminate the algorithm,

  • and the methodology for updating weights and initial values of the weights.

Note that, to start with, we take random initial weights lying between −0.5 and 0.5. There are n input nodes, which become n + 1 input nodes in total after including an extra bias node. A single hidden layer consists of hn nodes with the sigmoid as the activation function. The single scalar output, given the above information, is then:

$$ N(\mathbf{x},v,\tilde{W}) = v^{T} \sigma(\tilde{W} \tilde{\mathbf{x}}), $$
(7)

where \(v \in \mathbf {R}^{h_{n}}\) and \(\tilde {W}\in \mathbf {R}^{h_{n} \times (n+1)}\) are the specific neural network parameters that replace the general parameter vector p. The input variable is \(\tilde {x} = [x^{T},1]^{T}\). The function \(\sigma : \mathbf {R}^{h_{n}} \rightarrow \mathbf {R}^{h_{n}}\) represents a component-wise activation function that operates on the hidden layer. The overall task here is to choose the discretised setting given in (2) and the number of hidden nodes hn in a layer, and then solve the minimisation problem to obtain an approximation \(u_{T}(\mathbf{x}, \tilde {\mathbf{p}})\). A sketch of such a training loop is given below.
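The sketch below assembles these ingredients. The learning rate, network size and termination tolerance are illustrative, a simple squared-residual loss stands in for the full PDE error E(p), and a central-difference gradient replaces full backpropagation to keep the code short:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def N(x_tilde, v, W):
    return v @ sigmoid(W @ x_tilde)       # Eq. (7)

# Illustrative loss: squared residual of N against a target on points {x_la};
# in practice this would be the PDE error E(p) built from the trial solution.
def loss(p, pts, target, hn, n):
    v, W = p[:hn], p[hn:].reshape(hn, n + 1)
    return sum((N(np.append(x, 1.0), v, W) - target(x)) ** 2 for x in pts)

def train(pts, target, hn=8, n=2, alpha=0.05, tol=1e-6, max_iter=200):
    rng = np.random.default_rng(0)
    p = rng.uniform(-0.5, 0.5, hn + hn * (n + 1))  # random weights in [-0.5, 0.5]
    eps = 1e-6
    for _ in range(max_iter):
        # central-difference gradient (a stand-in for backpropagation)
        g = np.array([(loss(p + eps * e, pts, target, hn, n)
                       - loss(p - eps * e, pts, target, hn, n)) / (2 * eps)
                      for e in np.eye(p.size)])
        p -= alpha * g                              # weight update rule
        if np.linalg.norm(g) < tol:                 # termination criterion
            break
    return p

pts = [np.array([i / 4.0, j / 4.0]) for i in range(5) for j in range(5)]
p = train(pts, target=lambda x: x[0] * x[1])
```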

As shown in Fig. 1, a feedforward neural network comprises an input layer, one or more hidden layers and an output layer. Parameters are propagated from the input layer to the output layer through the hidden layers. The number of nodes in the hidden layer is determined by the degree of the polynomial considered: if an nth-degree polynomial is used, the number of nodes in the hidden layer would be n + 1. The coefficients of the polynomial can be used as initial weights from the input to the hidden layers and from the hidden layers to the output layer. The number of hidden layers, the number of neurons per layer and the choice of activation functions are essential considerations we have put in place while designing our neural network. As the number of hidden layers increases, the network can be thought of as an adaptive nonlinear dynamical system composed of numerous neurons connected by various attributes and capable of approximating a wide range of complex functions. This is a key feature of the network we have proposed here. Thus, since an ANN can be used to accurately approximate very complicated functions, it can be regarded as a learnable entity, which in our case represents the solution of the PDE in question.

It is important to bear in mind that the number of network parameters, including the weights of inter-node links and the biases associated with hidden and output nodes, should be less than the size of the training set. Otherwise, the network will be susceptible to over-fitting, which means that the predictions on the training set will improve while the power of generalisation deteriorates. Regularisation methods are used to control over-fitting; this feature is also inherent in the ANN proposed here.

Another feature we have taken on board is the consideration given to selecting the bias terms. Each neuron in the ANN is supplied with a bias, including the output neurons but excluding the input neurons. The connections between neurons in subsequent layers are represented by matrices of weights. We let \({b_{i}^{l}}\) denote the bias of neuron i in layer l. The weight between neuron k in layer l − 1 and neuron i in layer l is denoted by \(w_{ik}^{l}\). The activation function in layer l is denoted by σl, regardless of the type of activation; we assume for simplicity that a single activation function is used within each layer. The output from neuron i in layer l is denoted by \({y_{i}^{l}}\).

Drawing parallels with a given numerical method, for the accuracy of the obtained solution \(\tilde {\mathbf{p}}\), we regard the grid and the size of the hidden layer as the dominant choices in the solution parameter space. Thus, conceptually, the accuracy of the solution is expected to improve with a finer grid and a larger hidden layer. However, this comes with a high computational cost and also with the undesirable effect of potential over-fitting. Note that the aim here is to achieve an estimate of the solution with sufficient accuracy while keeping the computational cost and the complexity of the problem to a minimum. We explain our ANN formulation further by way of another example, as discussed below.

2.4 Solving the Biharmonic equation

The Biharmonic operator, or bi-Laplacian operator, is used in various formulations such as the Airy stress problem, incompressible fluid mechanics, and elasticity problems [5]. This operator also has applications in geometric design, e.g., [4, 19]. Here, we solve the two-dimensional Biharmonic equation in a rectangular domain together with mixed boundary conditions using the method proposed in the previous section. Consider the equation:

$$ \begin{array}{@{}rcl@{}} \nabla^{4} {\Phi} (\mathbf{x}) &=& \Bigg(\frac{\partial^{4}}{\partial x^{4}}+2\frac{\partial^{4}}{\partial x^{2} \partial y^{2}}+\frac{\partial^{4}}{\partial y^{4}}\Bigg) {\Phi}(\mathbf{x}) = \tilde{g}(\mathbf{x}), \\ {\Phi}(\mathbf{x}) &=& \psi_{1}(\mathbf{x}), \quad \mathbf{x}\in \partial{\Omega}, \\ \frac{\partial {\Phi}}{\partial n}(\mathbf{x}) &=& \psi_{2}(\mathbf{x}), \quad \mathbf{x}\in \partial{\Omega}, \end{array} $$
(8)

where Ω = [a,b] × [c,d], Φ is a four times continuously differentiable function, and \(\tilde{g}\), ψi, i = 1,2, are known functions. In our case, \(\frac {\partial }{\partial n}\) refers to the normal derivative, where n is a unit vector orthogonal to the boundary. In other words, if n is the unit vector providing the direction, then the derivative is given as:

$$ \frac{\partial {\Phi}}{\partial n} \equiv \nabla{\Phi} \cdot \mathbf{n}. $$

If n = (1,0)T, i.e., in the direction of the x-axis, then we simply get Φx, the standard partial derivative in the direction of the x-axis. Similarly, if n = (0,1)T, then we get Φy.

To find the appropriate trial solution, first, we write all the boundary conditions as follows:

$$ \begin{array}{@{}rcl@{}} {\Phi}(a,y) & =& \psi_{1}(a,y), \quad \frac{\partial {\Phi}}{\partial n}(a,y) = -\psi_{2}(a,y),\\ {\Phi}(b,y) & =& \psi_{1}(b,y), \quad \frac{\partial {\Phi}}{\partial n}(b,y) = \psi_{2}(b,y),\\ {\Phi}(x,c) & =& \psi_{1}(x,c), \quad \frac{\partial {\Phi}}{\partial n}(x,c) = -\psi_{2}(x,c),\\ {\Phi}(x,d) & =& \psi_{1}(x,d), \quad \frac{\partial {\Phi}}{\partial n}(x,d) = \psi_{2}(x,d). \end{array} $$
(9)

The trial solution is written in a similar way as proposed in the previous section such that:

$$ {\Phi}_{t}(\mathbf{x,p}) = \phi_{1}(\mathbf{x})+\phi_{2}(\mathbf{x},N(\mathbf{x,p})), $$
(10)

where,

$$ \phi_{1}(\mathbf{x}) = \sum\limits_{i=1}^{4} A_{i} x^{i}+\sum\limits_{i=1}^{4} B_{i} y^{i}, $$
(11)

and

$$ \phi_{2}(\mathbf{x},N) = (x-a)^{2}(x-b)^{2}(y-c)^{2}(y-d)^{2} N(\mathbf{x,p}), $$
(12)

where ϕ2 vanishes on the boundary. Now, after substituting (11) and (12) into (10) and then into (9), we can find the unknown constants Ai and Bi, for i = 1,2,3,4. In practice, we can always generate a feasible trial solution satisfying the boundary conditions, since the determinant of the coefficient matrix of the resulting system is not identically zero. Hence, the error to be minimised is written as:

$$ E(\mathbf{p})=\bigg\{\sum\limits_{i}\sum\limits_{j}\Big|\nabla^{4} {\Phi}(\mathbf{x}_{ij}) -\tilde{g}(\mathbf{x}_{ij}) \Big|^{P} \bigg\}^{1/P}, \quad i \in\{1,2,\ldots,n\},~ j \in\{1,2,\ldots,m\}, $$

where xij are points in Ω.

Algorithm 1

The formulation is further described in Algorithm 1. To explain this further, we take a specific case of the Biharmonic equation. We solve it and compare our proposed neural-network-based solution with existing numerical methods, such as the finite element method (FEM) and the finite difference method (FDM), for the problem defined by (13) and (14), as shown in Table 1. We consider the following nonlinear Biharmonic equation with non-zero boundary conditions:

$$ \begin{array}{@{}rcl@{}} &&{\Psi}_{xxxx}+2{\Psi}_{xxyy}+{\Psi}_{yyyy}+{\Psi}_{xx}+ 2 {\Psi}_{xy} +{\Psi}_{yy} + \\ &&{\Psi}_{x} +{\Psi}_{y}+{\Psi}^{2}= \tilde{g}(x,y), \end{array} $$
(13)

where,

$$ \begin{array}{@{}rcl@{}} \tilde{g}(\mathbf{x})&=& e^{x}(2 \pi^{4}-2 \pi^{2}+3)\sin(\pi y) + e^{2x} \\ &&-e^{2x}\cos(\pi y)^{2}+3e^{x} \pi\cos(\pi y), \end{array} $$
(14)

with boundary conditions as follows,

$$ \begin{array}{@{}rcl@{}} {\Psi}(\mathbf{x}) &=& e^{x} \sin(\pi y), \quad \mathbf{x}\in \partial{\Omega},~ {\Omega}=[0,1]\times[0,1], \\ \frac{\partial {\Psi}}{\partial x}(\mathbf{x})&=& e^{x} \sin(\pi y), \quad x=0,1,\\ \frac{\partial {\Psi}}{\partial y}(\mathbf{x})&=& \pi e^{x} \cos(\pi y), \quad y=0,1. \end{array} $$
Table 1 Comparison between standard numerical methods and our proposed method for the Biharmonic example

Using the boundary conditions, the trial function is constructed as:

$$ {\Psi}_{t}(\mathbf{x}) = e^{x} \sin(\pi y) + x^{2}y^{2}(x-1)^{2}(y-1)^{2} N(\mathbf{x,p}). $$
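As a quick sanity check (a symbolic sketch, not part of the original formulation), one can verify that the boundary factor and its first derivatives vanish on the boundary, so the network term drops out of every condition above:

```python
import sympy as sp

# The factor x^2 y^2 (x-1)^2 (y-1)^2 and its first derivatives vanish on the
# boundary of [0,1]^2, so the trial function inherits the Dirichlet and
# Neumann data from e^x sin(pi y) regardless of the network output N.
x, y, N = sp.symbols("x y N")
Psi_t = sp.exp(x) * sp.sin(sp.pi * y) + x**2 * y**2 * (x - 1)**2 * (y - 1)**2 * N
checks = [Psi_t.subs(x, 0), sp.diff(Psi_t, x).subs(x, 0),
          Psi_t.subs(y, 1), sp.diff(Psi_t, y).subs(y, 1)]
for c in checks:
    print(sp.simplify(c))   # none of these expressions contain N
```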

The ANN solution and the analytical solution for the chosen Biharmonic problem are shown in Figs. 2 and 3, respectively. Figure 4 shows a detailed comparison of our proposed approach with the FEM and the FDM in terms of the L2 error. This error is calculated for different grid sizes, ranging from 5 × 5 to 30 × 30.

Fig. 2 The ANN solution for the example Biharmonic problem

Fig. 3 The exact solution for the example Biharmonic problem

Fig. 4 L2 errors in the solutions of the example Biharmonic problem

For further comparison of the errors between the various methods considered here, we have also utilised the root-mean-square (RMS) error:

$$ \text{Error} = \sqrt{\frac{\sum_{i=1}^{N}(T_{i}-R_{i})^{2}}{N}}, $$

where N is the total number of nodes, Ti is the exact solution, and Ri is the numerical solution at a particular node i. The error values for the Biharmonic example are shown in Table 2.
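For completeness, a one-line sketch of this error measure (the arrays below are illustrative values, not results from the paper):

```python
import numpy as np

# RMS error between exact values T_i and numerical values R_i at N nodes.
def rms_error(T, R):
    return np.sqrt(np.mean((T - R) ** 2))

T = np.array([1.00, 2.00, 3.00])
R = np.array([1.01, 1.98, 3.02])
print(rms_error(T, R))
```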

Table 2 Comparison of the RMS error values in the solution of the Biharmonic example

2.5 Effects of the hidden layers

Choosing an optimal number of hidden layers is an important and critical requirement when creating the chosen ANN framework. The key idea is to efficiently calculate the number of hidden units needed in a multi-layer network to obtain a particular approximation. For this purpose, we work out the number of independent variables that should be modified to create an arbitrary smooth function for a given order of approximation. At the same time, we also work out the number of values that will change based on the overall number of parameters in the network. Figure 5 shows the effect of the hidden layers as we change the discretisation in the x and y directions.

Fig. 5 The effect of the hidden layers compared to levels of domain discretisation

Our experiments, for this example and for a number of other examples, show that the use of 10 hidden layers results in good agreement of the ANN-based solution with the exact and the numerical solutions. The important point to note here is that as we increase the number of hidden layers and the discretisation along both axes, the training time increases. Therefore, we suggest using between 5 and 10 hidden layers, depending on the training data and the complexity of the problem.

3 Examples

So far, we have discussed how the ANN for solving a chosen boundary value problem can be set up. We have also shown how the method can be utilised to solve simpler yet general problems such as Poisson’s equation and the Biharmonic equation. In this section, we show how the ANN framework discussed above can be used to solve two further boundary value problems, namely the Von-Kármán equation and the Navier-Stokes equation. Both problems are of a complex nature and of practical importance.

3.1 Solving the Von-Kármán equation

The analysis of large deflections of square and rectangular plates is one of the most studied structural problems. The applications of this problem include the design of aviation structures, ship hulls, bridges and spacecraft. For example, in an aircraft, the pressure cabin is subjected to lateral loads, and the wings to lift and bending, all of which involve thin-plate effects.

The generic Kirchhoff linear plate bending theory is appropriate only for small deflections, as it neglects mid-surface strains and the resulting in-plane stresses. With a relatively large lateral deflection, as the external force increases, both stretching of the plate’s middle surface and the resulting membrane forces play a major role in carrying the lateral loads. The Von-Kármán extension to large deformations was initially assisted by the pioneering work of Leitao [24]. In these relationships, the nonlinear terms must be retained to account for large deflections. Consequently, we obtain two nonlinear fourth-order cross-displacement equations for which there is no analytic solution.

Consequently, to solve this problem, one has to resort to numerical procedures. For example, the FDM technique was used in [7] to examine large deflections of rectangular plates subject to a combination of lateral pressure and edge loading. In [12], the authors discuss some of the earlier finite element implementations for large deformations. In [33], a stress-based finite element approach was introduced. In [32], a boundary element method was used to analyse the static, dynamic and buckling behaviour of thin plates. These numerical methods can solve the Von-Kármán plate equations and are more versatile than the rival semi-analytical methods. However, they all come with high computational costs.

3.1.1 Problem formulation

Let us define a general side-by-side plate system composed of non-homogeneous elastoplastic plates with varying physical properties in the domain Ω ⊂ R2 such that:

$$ {\Omega} = \cup_{i=1}^{k} {\Omega}_{i}, \quad {\Omega}_{i} = \{\mathbf{x}~|~ a^{i-1} \leq x \leq a^{i},~ 0\leq y \leq b,~a^{0} =0 \}, $$

where k > 0 is the number of plates that form the system. The problem of bending of the system with the discontinuous coefficient is written as:

$$ D_{i}\Bigg[\frac{\partial^{4} {\Phi}}{\partial x^{4}} + \nu_{i} \frac{\partial^{4} {\Phi}}{\partial y^{4}} + \frac{\partial^{4} {\Phi}}{\partial y^{4}} + \nu_{i} \frac{\partial^{4} {\Phi}}{\partial x^{4}} + 2 (1-\nu_{i}) \frac{\partial^{4} {\Phi}}{\partial x^{2} \partial y^{2}} \Bigg] = F(\mathbf{x}), $$
(15)

where Φ is the bending of the plate corresponding to the vertically applied force F, \(D_{i} = \frac {E_{i} h^{3}}{12(1-{\nu _{i}^{2}})}\) are the cylindrical stiffness coefficients of the plates of the system, and Ei, νi and h represent Young’s modulus, Poisson’s ratio, and the thickness of the plate (assumed to be uniform throughout), respectively. The boundary conditions for the equilibrium state are defined by the physical configuration: the outer boundary of the system is assumed to be clamped or simply supported. Since the plates in the system have different cylindrical stiffness coefficients Di, the coefficients of (15) are discontinuous at the common boundaries of the system. Let us consider two plates with different properties, i.e., k = 2, in a rectangular domain, Ω = {x = (x,y) | − l1 ≤ x ≤ l2, 0 ≤ y ≤ w1}, as shown in Fig. 6, with the common boundary at x = 0. With that, the following functions can easily be seen to satisfy the simply supported boundary condition, the clamped boundary condition of Ω and the common boundary condition, respectively:

$$ \begin{array}{@{}rcl@{}} {\Phi}_{t1}(\mathbf{x}) &=& -\sin{\pi x}\, \sin{\pi y} + x^{2}y^{2}(x+l_{1})^{2}(y-w_{1})^{2}\, N(\mathbf{x,p}),\\ {\Phi}_{t2}(\mathbf{x}) &=& -(1-\cos{2\pi x})(1-\cos{2\pi y})+x^{2}y^{2}(x-l_{2})^{2}(y-w_{1})^{2}\, N(\mathbf{x,p}). \end{array} $$
(16)
Fig. 6 Plate system of two beams with load F

The function Φ(x) = {Φt1 ∪ Φt2} corresponds geometrically to plates connected by an ideal hinge at the common boundary points, with absolutely rigid support under the hinge. The function Φ(x) also accounts for the bending along the common boundary; in this interpretation, the plates are joined by welding, again with absolutely rigid support under the hinge.

To examine the effect of Young’s modulus on the bending of the system, we solve the problem using the conditions E1 = 5000 kN/cm², E2 = 3000 kN/cm², and E1 = 3000 kN/cm², E2 = 5000 kN/cm², as shown in Fig. 7. The ANN-based solution is derived using 6 hidden layers. A comparison is carried out with the FEM, which shows an L2 error of 1.3413e−08. The error between the ANN solution and the analytical solution is 1.066e−11, with 7 hidden layers and a grid size of 20 × 20.

Fig. 7 Numerical solution with E1 = 5000 and E2 = 3000 in the top figure, E1 = 3000 and E2 = 5000 in the bottom figure, and νi = 0.3, i = 1,2

3.2 Solving the Navier-Stokes equation

Here we present an efficient deep learning technique relying on the ANN for the model reduction of the Navier-Stokes equation [3] for incompressible flow. Consider the two-dimensional Navier-Stokes equations, with the continuity and momentum components describing an incompressible laminar flow, in the non-dimensional form:

$$ \begin{aligned} - \nu {\Delta} \mathbf{v} + (\mathbf{v} \cdot \nabla) \mathbf{v} + \nabla p &= \mathbf{f} ~~~\text{(momentum equation),}\\ \nabla \cdot \mathbf{v} &= 0 ~~~\text{(continuity equation),} \end{aligned} $$
(17)

where Ω = [a,b] × [c,d] ⊂ R2, v is the velocity vector, p is the pressure, ν is the viscosity, and f represents the body forces. We also introduce a stream function Ψ(x,y) such that the continuity equation is satisfied automatically:

$$ u = \frac{\partial {\Psi}(x,y)}{\partial y} \quad\text{and}\quad v = -\frac{\partial {\Psi}(x,y)}{\partial x}. $$
(18)
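A quick symbolic check (an illustrative sketch with an arbitrary smooth example stream function) confirms that this form satisfies the continuity equation identically:

```python
import sympy as sp

# For any smooth Psi, u = dPsi/dy and v = -dPsi/dx give u_x + v_y = 0.
x, y = sp.symbols("x y")
Psi = sp.sin(sp.pi * x) * sp.sin(sp.pi * y)   # example stream function
u = sp.diff(Psi, y)                           # u =  dPsi/dy
v = -sp.diff(Psi, x)                          # v = -dPsi/dx
print(sp.simplify(sp.diff(u, x) + sp.diff(v, y)))  # prints 0
```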

Using the stream function from (18), we can rewrite the equations after eliminating the pressure term from the momentum equations, leading to the vorticity transport equation, with two boundary conditions, such that:

$$ \frac{1}{Re} {\Delta}^{2} {\Psi} - \frac{\partial {\Psi}}{\partial y} {\Delta} \frac{\partial {\Psi}}{\partial x} + \frac{\partial {\Psi}}{\partial x} {\Delta} \frac{\partial {\Psi}}{\partial y} = \frac{\partial f_{1}}{\partial y} - \frac{\partial f_{2}}{\partial x}. $$
(19)

For this problem, we define the trial solution as:

$$ {\Psi}_{t}(\mathbf{x}) = (x(x-1)y(y-1))^{2} + x^{2}y^{2}(x-1)^{2}(y-1)^{2}N(\mathbf{x,p}). $$

In this example, we have divided the domain [0,1]2 into a 16 × 16 grid. Figure 8 shows the ANN solution of the defined Navier-Stokes problem. The L2 error between the FEM solution and the analytical solution is 1.213e−07, and that between the ANN solution (using 7 hidden layers) and the corresponding analytic solution is 1.560e−11.

Fig. 8 Numerical velocity u and v, respectively

4 Conclusions

In this paper, we have discussed a method for solving nonlinear boundary value problems. Our proposed approach relies on the function approximation capabilities of feedforward neural networks. Thus, we have designed and trained an ANN framework capable of solving general nonlinear boundary value problems, whereby the resulting solutions are mesh-free.

We can use the proposed method to solve general nonlinear boundary value problems whose analytic solutions are not readily available, which is a significant advantage of this method. Other benefits include its generalisability for solving higher-order and nonlinear problems. For example, in this paper, we have described how the proposed ANN framework can be utilised for solving the Navier-Stokes equation [13, 15, 28] and the Von-Kármán equation [8].

Generating numerical solutions of nonlinear boundary value problems, especially for higher-order PDEs, is a challenging task. However, the method proposed here proves feasible for such problems, showing good agreement with the corresponding analytic solutions and with state-of-the-art techniques for obtaining numerical solutions. We have shown how to set up the ANN framework to incorporate the necessary initial/boundary conditions for the chosen problem and to optimise the chosen ANN parameters and weights using trial functions. The use of trial functions in the training process helps to provide precision-controlled coefficients without the need for complex domain discretisation. We have also examined the performance of the network subject to different numbers of hidden layers. According to our studies, using between 5 and 10 hidden layers provides the optimum results.

From the studies we have described here, one can conclude that there is great potential for the ANN framework presented in this paper to be further studied and extended. It will be useful to experiment with various optimisation techniques to reduce the number of parameters in the network; that will help to further optimise the training process and the computation of the solutions. It will also be helpful to combine the proposed ANN framework with other existing and pre-trained deep learning models, so that more complex boundary value problems can be solved accurately and efficiently. Finally, it will be useful to extend the proposed ANN framework to solve PDEs in very high dimensions, e.g., dimensions > 100. Solving very high dimensional boundary value problems remains highly challenging with the present state-of-the-art FEM and FDM based techniques.