Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines

We present a new numerical method based on a class of machine learning methods, the so-called Extreme Learning Machines (ELMs) with both sigmoidal and radial basis functions, for the computation of steady-state solutions and the construction of (one-dimensional) bifurcation diagrams of nonlinear partial differential equations (PDEs). For our illustrations, we consider two benchmark problems, namely (a) the one-dimensional viscous Burgers equation with both homogeneous (Dirichlet) and non-homogeneous boundary conditions, and (b) the one- and two-dimensional Liouville–Bratu–Gelfand PDEs with homogeneous Dirichlet boundary conditions. For the one-dimensional Burgers and Bratu PDEs, exact analytical solutions are available and are used for comparison against the numerically derived solutions. Furthermore, the numerical efficiency (in terms of accuracy, grid size and execution time) of the proposed machine-learning method is compared against central finite differences (FD) and Galerkin weighted-residuals finite-element (FEM) methods. We show that the proposed method outperforms both the FD and FEM methods in terms of numerical accuracy for medium to large grid sizes, while it provides results equivalent to those of the FEM for low to medium grid sizes; both methods (ELM and FEM) outperform the FD scheme. Furthermore, the computational times required by the proposed machine-learning scheme were comparable to, and in particular slightly smaller than, those required by the FEM.


Introduction
The solution of partial differential equations (PDEs) with the aid of machine learning as an alternative to conventional numerical analysis methods can be traced back to the early '90s. For example, Lagaris et al. [37] presented a method based on feedforward neural networks (FNNs) for the numerical solution of linear and nonlinear PDEs. The method is based on the construction of appropriate trial functions, the analytical derivation of the gradient of the error with respect to the network parameters, and collocation. The training of the FNN was performed iteratively with the quasi-Newton BFGS method. Gonzalez-Garcia et al. [22] proposed a multilayer neural network scheme that resembles a Runge-Kutta integrator for the identification of dynamical systems described by nonlinear PDEs.
Nowadays, the computational power, which has increased exponentially over the last decades, together with recent theoretical advances, has allowed further developments at the intersection between machine learning and numerical analysis. In particular, on the side of the numerical solution of PDEs, the development of systematic and robust machine-learning methodologies targeting the solution of large-scale systems of nonlinear problems with steep gradients constitutes an open and challenging problem in the area. Very recently, the authors of [42,43] addressed the use of numerical Gaussian processes and deep neural networks (DNNs) with collocation to solve time-dependent nonlinear PDEs, circumventing the need for spatial discretization of the differential operators. The proposed approach was demonstrated through the one-dimensional nonlinear Burgers, Schrödinger and Allen-Cahn equations. In [26], DNNs were used to solve high-dimensional nonlinear parabolic PDEs, including the Black-Scholes, Hamilton-Jacobi-Bellman and Allen-Cahn equations. In [45], DNNs were used to approximate the solution of PDEs arising in engineering problems by exploiting the variational structure that may arise in some of these problems. In [10,20,24], DNNs were used to solve high-dimensional semi-linear PDEs; the efficiency of the method was compared against other deep learning schemes. In [51], the authors used FNNs to solve modified high-dimensional diffusion equations; the training of the FNN is achieved iteratively using an unsupervised universal machine-learning solver. Most recently, in [19], the authors used DNNs to construct nonlinear reduced-order models of time-dependent parametrized PDEs.
Over the last few years, extreme learning machines (ELMs) have been used as an alternative to other machine learning schemes, providing good generalization at a low computational cost [32]. The idea behind ELMs is to randomly set the values of the weights between the input and hidden layer, the biases and the parameters of the activation/transfer functions, and to determine the weights between the last hidden and output layer by solving a least-squares problem. The solution of this least-squares problem is the whole "training" procedure; hence, no iterative training is needed for ELMs, in contrast with the other aforementioned machine learning methods. Extensions to this basic scheme include multilayer ELMs [14,28,48] and deep ELMs [49]. As with conventional neural networks, convolutional networks and deep learning, ELMs have been mainly used for classification purposes [4,11,12,30,48,50].
On the other hand, the use of ELMs for "traditional" numerical analysis tasks, and in particular for the numerical solution of PDEs, is still widely unexplored. To the best of our knowledge, the only study on the subject is that of [18], where the authors however report a failure of ELMs to deal, for example, with PDEs whose solutions exhibit steep gradients. Recently, we have proposed an ELM scheme to deal with such steep gradients appearing in linear PDEs [8], demonstrating through several benchmark problems that the proposed approach is efficient.
Here, we propose a new problem-independent numerical scheme based on ELMs for the solution of nonlinear PDEs that may exhibit sharp gradients. As nonlinear PDEs may also exhibit non-uniqueness and/or non-existence of solutions, we also show how ELMs can be used for the construction of (one-dimensional) bifurcation diagrams of PDEs. The efficiency of the proposed numerical scheme is demonstrated and discussed through two well-studied benchmark problems: the one-dimensional viscous Burgers equation, a representative of the class of advection-diffusion problems, and the one- and two-dimensional Liouville-Bratu-Gelfand PDE, a representative of the class of reaction-diffusion problems. The numerical accuracy of the proposed scheme is compared against the analytical solutions and the exact locations of the limit points that are known for the one-dimensional PDEs, as well as against the corresponding numerical approximations obtained with central finite differences (FD) and Galerkin finite element methods (FEM).

Extreme Learning Machines
ELMs are a class of machine-learning techniques for defining functions derived from artificial neural networks (ANNs) with fixed internal weights and biases. Thus, ELMs have the same structure as a single-hidden-layer FNN with N neurons. Next, we report the definition of ELM functions, which we denote by v : R^d → R.

Definition 2.1 (ELM network with single hidden layer). Assume:

• An infinitely differentiable non-polynomial function ψ, the activation (transfer) function for the neurons in the hidden layer.
• A randomly-generated matrix A ∈ R^{N×d} containing the internal weights connecting the input layer and the hidden layer.
• A randomly-generated vector β ∈ R^N containing the biases of the hidden-layer neurons.
Then, we say that v is an ELM function with a single hidden layer if there exists a choice of w ∈ R^N (the vector of external weights between the hidden layer and the output layer) such that:

v(x) = Σ_{j=1}^{N} w_j ψ(α_j · x + β_j),    (1)

where x = (x_1, x_2, ..., x_d) ∈ R^d is the input vector and α_j denotes the j-th row of A.
We remark that the regularity assumption in the above definition is not mandatory for the approximation properties, but in our case some regularity is needed to write the collocation method; for this case, we also briefly present the necessary theory. It is well known that for ANNs, where A and β are not a priori fixed, the universal approximation theorem holds if ψ is a non-polynomial function: the functional space is spanned by the basis functions {ψ(α_j · x + β_j)}_{j=1}^{N}. Moreover, with some regularity assumptions on the activation function(s), the approximation holds true also for the derivatives (see e.g. Theorem 3.1 and Theorem 4.1 in [40]). Besides, fixing A and β a priori is not a limitation, because universal approximation is still valid in the setting of ELMs (see Theorem 2 in [27]):

Theorem 2.1 (Universal approximation). Let the coefficients α_j, β_j in the function sequence {ψ(α_j · x + β_j)}_{j=1}^{N} be randomly generated according to any continuous sampling distribution and call ṽ_N ∈ span{ψ(α_j · x + β_j), j = 1, ..., N} the ELM function determined by the ordinary least-squares solution of min ‖f(x) − ṽ_N(x)‖, where f is a continuous function.
Then, one has with probability one that lim_{N→∞} ‖f − ṽ_N‖ = 0.
We remark that in the ANN framework, the classical approach is to optimize the parameters of the network (internal and external weights and biases) iteratively, e.g. by stochastic gradient descent algorithms, which have a high computational cost and do not ensure global but only local convergence. On the other hand, ELM networks are advantageous because the solution of an interpolation problem leads to a system of linear equations, where the only unknowns are the external weights w. For example, consider M points x_i such that y_i = v(x_i) for i = 1, ..., M. In the ELM framework (1), the interpolation problem becomes:

Σ_{j=1}^{N} w_j ψ_j(x_i) = y_i,  i = 1, ..., M,

where N is the number of neurons and ψ_j(x) is used to denote ψ(α_j · x + β_j). Thus, this is a system of M equations and N unknowns that in matrix form can be written as:

S w = y,    (2)

where y = (y_1, ..., y_M) ∈ R^M and S ∈ R^{M×N} is the matrix with elements (S)_{ij} = ψ_j(x_i). If the problem is square (N = M) and the parameters α_j and β_j are chosen randomly, it can be proved that the matrix S is invertible with probability 1 (see e.g. Theorem 1 in [27]), and so there is a unique solution that can be numerically found; if one has to deal with an ill-conditioned matrix, one can still attempt to find a numerically robust solution by applying established numerical analysis methods suitable for such a case (e.g. by constructing the Moore-Penrose pseudoinverse using QR factorization or SVD). If the problem is under-determined (N > M), the linear system has infinitely many solutions and can be solved by applying regularization in order to pick the solution with, e.g., the minimal L² norm. Such an approach provides the best solution to the optimization problem related to the magnitude of the calculated weights (see [31]). Thus, in ELM networks, one has to choose the type of the activation/transfer function and the values of the internal weights and biases. Since the only limitation is that ψ is a non-polynomial function, there are infinitely many choices. The most common choices are the sigmoidal functions (SF) (also referred to as ridge functions or plane waves) and the radial basis functions (RBF) [2,40].
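The one-shot least-squares "training" described above can be sketched in a few lines. The following is a minimal illustration, not the paper's code: the internal parameters are frozen at random values, and only the external weights w are computed by a single solve; the slope range and target function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 40                                  # hidden neurons
M = 20                                  # interpolation points
a, b = 0.0, 1.0

# frozen internal parameters: random slopes, equispaced centers
alpha = rng.uniform(-10.0, 10.0, N)
centers = np.linspace(a, b, N)
beta = -alpha * centers                 # beta_j = -alpha_j * c_j

def hidden(x):
    """Collocation matrix S with (S)_ij = sigma(alpha_j * x_i + beta_j)."""
    return 1.0 / (1.0 + np.exp(-(np.outer(x, alpha) + beta)))

x = np.linspace(a, b, M)
y = np.sin(2.0 * np.pi * x)             # target values to interpolate

S = hidden(x)                           # M x N collocation matrix
w, *_ = np.linalg.lstsq(S, y, rcond=None)   # the whole "training" step

err = np.max(np.abs(S @ w - y))         # interpolation error at the M points
```

Since N > M, the system is under-determined and `lstsq` returns the minimum-norm least-squares solution, matching the regularized choice discussed above.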
Below, we describe the construction procedure and the main features of the proposed ELM scheme based on these two transfer functions. In the case of the logistic sigmoid transfer function, this investigation was carried out in our previous work for one-dimensional linear PDEs [8]. Here, we report the fundamental arguments and extend them to include RBFs and two-dimensional nonlinear problems.

ELM with sigmoidal functions
For the SF case, we select the logistic sigmoid, which is defined by:

σ(x) = 1 / (1 + e^{−x}).    (3)

For this function it is straightforward to compute the derivatives. In particular, denoting by σ_j(x) = σ(α_j · x + β_j) the j-th basis function, the derivatives with respect to the x_k component are given by:

∂σ_j/∂x_k = α_{j,k} σ_j (1 − σ_j),  ∂²σ_j/∂x_k² = α_{j,k}² σ_j (1 − σ_j)(1 − 2σ_j).    (4)

A crucial point in the ELM framework is how to fix the values of the internal weights and biases in a proper way. Indeed, despite the fact that theoretically any random choice should be good enough, in practice it is convenient to define an appropriate range of values for the parameters α_{j,k} and β_j, which are strictly related to the selected activation function. For the one-dimensional case, σ_j is a monotonic function such that σ_j(x) → 0 as α_j x → −∞ and σ_j(x) → 1 as α_j x → +∞. This function has an inflection point, which we call the center c_j, defined by the following property:

σ_j(c_j) = 1/2.    (5)

Now, since σ(0) = 1/2, the following relation between the parameters holds:

β_j = −α_j c_j.    (6)

Finally, σ_j has a steep transition that is governed by the amplitude of α_j: if |α_j| → +∞, then σ_j approximates the Heaviside function, while if |α_j| → 0, then σ_j becomes a constant function. Now, since in the ELM framework these parameters are fixed a priori, what one needs to avoid is to have functions that are "useless" in the domain, i.e. approximately constant over it. Therefore, for the one-dimensional case, our suggestion is to choose the α_j uniformly distributed over a range that scales as N/|I|, where N is the number of neurons in the hidden layer and |I| = b − a is the domain length. Moreover, we also suggest avoiding coefficients α_j that are too small in modulus by bounding them away from zero.
Then, for the centers c_j we select equispaced points in the domain I, which are obtained by imposing the β_j to be:

β_j = −α_j c_j.

In the two-dimensional case, we denote by x = (x_1, x_2) ∈ R² the input and by A ∈ R^{N×2} the matrix with rows α_j = (α_{j,1}, α_{j,2}). Then, condition (5) becomes:

σ_j(x) = 1/2, i.e. α_j · x + β_j = 0,

which defines a straight line s of inflection points that we call the central direction. Along the direction parallel to the central direction, σ_j is constant, while along the direction orthogonal to s, σ_j is exactly the one-dimensional logistic sigmoid. So, considering one point c_j = (c_{j,1}, c_{j,2}) of the straight line s, we get the following relation between the parameters:

β_j = −α_j · c_j.

Now, the difference with respect to the one-dimensional case is that in a domain I² = [a, b]² discretized by a grid of n × n points, the number of neurons N = n² grows quadratically, while the distance between two adjacent points decreases linearly, i.e. it is given by |I|/(n − 1). Thus, for the two-dimensional case, we take the α_{j,k} uniformly distributed over a range that scales accordingly with the grid spacing, where N is the number of neurons in the network and |I| = b − a.
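As a small sanity check of the initialization just described (the numerical slope range below is an illustrative assumption, not the paper's exact one), one can verify that with equispaced centers and β_j = −α_j c_j each sigmoid attains the value 1/2 exactly at its own center:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 30
a, b = 0.0, 1.0
L = b - a                                    # domain length |I|

c = np.linspace(a, b, N)                     # equispaced centers in I
alpha = rng.uniform(N / (5 * L), N / L, N)   # slopes scaled with N/|I| (assumed range)
alpha *= rng.choice([-1.0, 1.0], N)          # random sign, bounded away from zero
beta = -alpha * c                            # enforces sigma_j(c_j) = 1/2

def sigma(x):
    return 1.0 / (1.0 + np.exp(-(np.outer(x, alpha) + beta)))

# diagonal entry (j, j) evaluates sigma_j at its own center c_j
vals_at_centers = np.diag(sigma(c))
```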

ELM with radial basis functions
Here, for the RBF case, we select the Gaussian kernel, which is defined as follows:

φ_j(x) = exp(−ε_j² ‖x − c_j‖²),

where c_j ∈ R^d is the center point and ε_j ∈ R is the inverse of the standard deviation. For such functions, the derivatives with respect to the x_k component are:

∂φ_j/∂x_k = −2ε_j² (x_k − c_{j,k}) φ_j(x),  ∂²φ_j/∂x_k² = (4ε_j⁴ (x_k − c_{j,k})² − 2ε_j²) φ_j(x).    (7)

In all directions, the Gaussian kernel is a classical bell-shaped function, equal to 1 at the center c_j and decaying to 0 away from it. Moreover, the parameter ε_j² controls the steepness of the bell function: if ε_j → +∞, then φ_j approximates the Dirac delta function, while if ε_j → 0, then φ_j approximates a constant function. Thus, in the case of RBFs, one can relate the role of ε_j to the role of α_{j,k} in the SF case. For RBFs, it is well known that the center has to be chosen as a point internal to the domain, preferably exactly a grid point, while the steepness parameter ε is usually chosen to be the same for each function. Here, since we are embedding RBFs in the ELM framework, we randomize the steepness parameters ε_j in order to have more variability in the functional space. Thus, as for the SF case, we take the parameters ε_j² uniformly distributed over a range that depends on the number N of neurons in the hidden layer and the domain length |I| = b − a; for the centers c_j, we select equispaced points in the domain. Besides, note that for the RBF case, it is trivial to extend the above to the multidimensional case, since φ_j is already expressed with respect to the center. For the two-dimensional case, we follow the same reasoning as for the SF case, with the range of the ε_j² rescaled accordingly.
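A minimal sketch of the Gaussian RBF hidden layer in one dimension, with the analytic derivatives of Eq. (7) checked against a central finite-difference approximation (the parameter ranges here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

N = 25
a, b = 0.0, 1.0

c = np.linspace(a, b, N)                        # equispaced centers
eps2 = rng.uniform(1.0, (N / (b - a))**2, N)    # assumed range for eps_j^2

def phi(x):
    """Gaussian kernels phi_j(x) = exp(-eps_j^2 (x - c_j)^2)."""
    d = np.subtract.outer(x, c)
    return np.exp(-eps2 * d**2)

def dphi(x):
    """Analytic first derivative, cf. Eq. (7)."""
    d = np.subtract.outer(x, c)
    return -2.0 * eps2 * d * np.exp(-eps2 * d**2)

def d2phi(x):
    """Analytic second derivative, cf. Eq. (7)."""
    d = np.subtract.outer(x, c)
    return (4.0 * eps2**2 * d**2 - 2.0 * eps2) * np.exp(-eps2 * d**2)

# verify the analytic first derivative with a central finite difference
x0 = np.array([0.37])
h = 1e-6
fd = (phi(x0 + h) - phi(x0 - h)) / (2.0 * h)
max_dev = np.max(np.abs(fd - dphi(x0)))
```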

Numerical Bifurcation Analysis of Nonlinear Partial Differential Equations with Extreme Learning Machines
In this section, we introduce the general setting for the numerical solution and bifurcation analysis of nonlinear PDEs with ELMs, based on basic numerical analysis concepts and tools (see e.g. [7,9,13,21,41]). Let us start with a nonlinear PDE in the general form:

L u(x) = f(u(x), λ),  x ∈ Ω,    (8)

with boundary conditions:

B_l u(x) = 0,  x ∈ ∂Ω_l,  l = 1, ..., m,    (9)

where L is the partial differential operator acting on u, f(u, λ) is a nonlinear function of u, λ ∈ R^p is the vector of model parameters, and {∂Ω_l}_l denotes a partition of the boundary.
A numerical solution ũ = ũ(λ) of the above problem at particular values of the parameters λ is typically found iteratively, by applying e.g. Newton-Raphson or matrix-free Krylov-subspace methods (Newton-GMRES) (see e.g. [34]) to a finite system of M nonlinear algebraic equations. In general, these equations reflect some zero-residual condition, or exactness equation, and thus the numerical solution that is sought is the optimal solution with respect to this condition in the finite-dimensional space. Assuming that ũ is fixed via the degrees of freedom w ∈ R^N (we use the notation ũ = ũ(w)), these degrees of freedom are sought by solving:

F_k(w) = 0,  k = 1, ..., M.    (10)

Many methods for the numerical solution of Eqs. (8)-(9) can be written in the above form after the application of an approximation and discretization technique such as Finite Differences (FD), Finite Elements (FE) and Spectral Expansion (SE), as we detail next.
The system of M algebraic equations (10) is solved iteratively (e.g. by Newton's method), that is, by solving, until a convergence criterion is satisfied, the following linearized system:

∇_w F(w^{(n)}) dw^{(n)} = −F(w^{(n)}),  w^{(n+1)} = w^{(n)} + dw^{(n)},    (11)

where ∇_w F is the Jacobian matrix with elements:

(∇_w F)_{kj} = ∂F_k/∂w_j.    (12)

If the system is not square (i.e. when M ≠ N), then at each iteration one would perform e.g. a QR factorization of the transposed Jacobian matrix, ∇_w F^T = QR, where Q ∈ R^{N×N} is an orthogonal matrix and R ∈ R^{N×M} is upper triangular. Then, the solution of Eq. (11) is given by:

dw^{(n)} = −Q R^{−T} F(w^{(n)}),    (13)

where R^{−T} is understood through the invertible leading M × M block of R^T. Branches of solutions in the parameter space past critical points, on which the Jacobian matrix ∇F with elements ∂F_k/∂w_j becomes singular, can be traced with the aid of numerical bifurcation analysis theory (see e.g. [15,16,17,23,35,36,46]). For example, solution branches past saddle-node bifurcations (limit/turning points) can be traced by applying the so-called "pseudo" arc-length continuation method [9]. This involves the parametrization of both ũ(w) and λ by the arc-length s on the solution branch. The solution is sought in terms of both ũ(w; s) and λ(s) in an iterative manner, by solving until convergence the following augmented system:

[∇_w F  ∂F/∂λ; (∇_w N)^T  ∂N/∂λ] [dw; dλ] = −[F(w, λ); N(w, λ)],    (14)

where

N(w, λ) = (ũ(w) − ũ(w)_{−1})^T (ũ(w)_{−1} − ũ(w)_{−2}) + (λ − λ_{−1})(λ_{−1} − λ_{−2}) − ds

is one of the choices for the so-called "pseudo arc-length condition" (for more details see e.g. [9,16,21,23,36]); ũ(w)_{−2} and ũ(w)_{−1} are two already found consecutive solutions for λ_{−2} and λ_{−1}, respectively, and ds is the arc-length step for which a new solution around the previous solution along the arc-length of the solution branch is being sought.
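To make the pseudo-arc-length idea concrete, here is a self-contained toy sketch (not the paper's implementation) that traces the solution set of a scalar equation with two turning points, F(u, λ) = u² + λ² − 1 = 0, using a secant predictor and an augmented Newton system of the form (14); naive continuation in λ alone would fail at the folds λ = ±1.

```python
import numpy as np

def F(u, lam):
    """Toy residual whose zero set is the unit circle (folds at lam = +/-1)."""
    return u**2 + lam**2 - 1.0

def newton_arclength(u0, lam0, du, dlam, ds, iters=20):
    """Newton on the augmented system: residual + secant arc-length condition."""
    u, lam = u0 + ds * du, lam0 + ds * dlam          # secant predictor
    for _ in range(iters):
        G = np.array([F(u, lam),
                      du * (u - u0) + dlam * (lam - lam0) - ds])
        J = np.array([[2.0 * u, 2.0 * lam],          # [dF/du, dF/dlam]
                      [du, dlam]])                   # arc-length row
        step = np.linalg.solve(J, G)
        u, lam = u - step[0], lam - step[1]
    return u, lam

# start on the branch at (u, lam) = (1, 0) and march through the fold at lam = 1
u, lam = 1.0, 0.0
du, dlam = 0.0, 1.0                                  # initial tangent direction
pts = [(u, lam)]
ds = 0.1
for _ in range(40):
    un, lamn = newton_arclength(u, lam, du, dlam, ds)
    t = np.array([un - u, lamn - lam])
    du, dlam = t / np.linalg.norm(t)                 # secant tangent update
    u, lam = un, lamn
    pts.append((u, lam))

residual = max(abs(F(p[0], p[1])) for p in pts)
```

Note that the augmented 2x2 Jacobian stays nonsingular at the fold, which is precisely why the continuation can turn the corner there.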

Finite Differences and Finite Elements cases: the application of Newton's method
In FD methods, one aims to find the values of the solution per se (i.e. u_j = w_j) at a finite number of nodes within the domain. The operator in the differential problem (8) and the boundary conditions (9) are approximated by means of finite difference operators, L_h ≈ L and B_{h,l} ≈ B_l: the differential part results in a linear combination of function evaluations, while the nonlinear terms are evaluated pointwise at the nodes. Then, the approximated equations are collocated at the internal and boundary points x_k, giving equations that can be written as the residual equations (10).
With FE and SE methods, the aim is to find the coefficients of a properly chosen basis function expansion of the solution within the domain, such that the boundary conditions are satisfied exactly. In the Galerkin-FEM with a Lagrangian basis (see e.g. [39,41]), the discrete counterpart seeks a solution of Eqs. (8)-(9) at N points x_j of the domain Ω according to:

ũ(x) = Σ_{j=1}^{N} w_j φ_j(x),    (15)

where the basis functions φ_j are defined so that they satisfy the completeness requirement and are such that φ_j(x_k) = δ_{jk}. This, again with the choice of the nodal variables as the function approximations at the points, gives that u(x_j) = w_j are exactly the degrees of freedom of the method. The scheme can be written as the satisfaction of zero for the weighted residuals R_k, k = 1, 2, ..., N, defined as:

R_k(w) = ∫_Ω (L ũ(x) − f(ũ(x), λ)) φ_k(x) dx,    (16)

where the weighting functions φ_k are the same basis functions used in Eq. (15) for the approximation of u. The above constitutes a nonlinear system of N algebraic equations that, for a given set of values of λ, is solved by Newton-Raphson, thus solving until convergence the linearized system (11), where R_k plays the role of F_k.
Notice that the border rows and columns of the Jacobian matrix (12) are appropriately changed so that Eq. (11) satisfies the boundary conditions. Due to the construction of the basis functions, the Jacobian matrix is sparse, thus allowing a significant reduction of the computational cost for the solution of (11) at each Newton iteration.

Extreme Learning Machine Collocation: the application of Newton's method
In an analogous manner to FE methods, Extreme Learning Machines aim at solving problem (8)-(9) using as ansatz an approximation ũ_N of u with N neurons. The difference is that, similarly to FD methods, the equations are constructed by collocating the solution at M_Ω points x_i ∈ Ω and M_l points x_k ∈ ∂Ω_l, where the ∂Ω_l are the parts of the boundary where boundary conditions are posed (see e.g. [3,41]). Then, if we denote M = M_Ω + Σ_{l=1}^{m} M_l, we have a system of M nonlinear equations with N unknowns that can be rewritten in a compact way as:

F(w) = 0,

where for the internal points, for k = 1, ..., M_Ω, we have:

F_k(w) = L ũ_N(x_k) − f(ũ_N(x_k), λ),

while for the l-th boundary condition, for k = 1, ..., M_l, we have:

F_k(w) = B_l ũ_N(x_k).

To this system of nonlinear algebraic equations we apply Newton's method (11). Notice that the application of the method requires the explicit knowledge of the derivatives of the functions ψ; in the ELM case, as described above, we have explicit formulae for these (see Eqs. (4), (7)).
Remark 3.1. In our case, Newton's method is applied to non-square systems. When the rank of the Jacobian is small, we have chosen here to solve the problem with the Moore-Penrose pseudoinverse of ∇_w F computed by the SVD decomposition; as discussed above, another choice would be the QR decomposition (13). This means that we cut off all the singular vectors associated with small singular values, so:

∇_w F = U Σ V^T,

where U ∈ R^{M×M} and V ∈ R^{N×N} are the orthogonal matrices of left and right singular vectors, respectively, and Σ ∈ R^{M×N} is the diagonal matrix of singular values. Finally, we can select q ≤ rank(∇F) to get the truncated decomposition

∇_w F ≈ U_q Σ_q V_q^T,

where U_q ∈ R^{M×q}, V_q ∈ R^{N×q} and Σ_q ∈ R^{q×q}. Thus, the solution of Eq. (11) is given by:

dw = −V_q Σ_q^{−1} U_q^T F(w).

Branches of solutions past turning points can be traced by solving the augmented problem (14) with the pseudo-arc-length condition. In particular, in (14), for the ELM framework (1), the term ∇_w N becomes:

∇_w N = S^T (ũ(w)_{−1} − ũ(w)_{−2}),

where S is the collocation matrix defined in equation (2).

Remark 3.2. The three numerical methods (FD, FEM and ELM) are compared with respect to the dimension of the Jacobian matrix J, which in the case of FD and FEM is square and related to the number N of nodes, i.e. J ∈ R^{N×N}, and in the ELM case is rectangular and related to both the number M of collocation nodes and the number N of neurons, i.e. J ∈ R^{M×N}. Actually, N is the parameter related to the computational cost: the inversion of J is O(N³), and the same holds in the ELM case for the inversion of the matrix J^T J ∈ R^{N×N}. Finally, we make explicit that in all the rest of this work, for the ELM case, we use a number M of collocation points that is half the number N of neurons. Such a choice is justified by our previous work [8], where it worked well for linear PDEs with steep gradients.
In general, we point out that by increasing the number M towards 2N/3, 3N/4, etc., one obtains even better results (see e.g. our previous work [8] on the solution of linear PDEs).
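The truncated-SVD pseudo-inverse update of Remark 3.1 can be sketched as follows; a generic rectangular system stands in for the ELM Jacobian, and the cutoff value is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

# a rectangular "Jacobian" with one nearly dependent row (rank deficiency)
M, N = 30, 60
J = rng.standard_normal((M, N))
J[-1] = J[0] + 1e-13 * rng.standard_normal(N)
Fval = rng.standard_normal(M)            # stand-in for the residual vector F

# SVD: J = U diag(s) Vt, with s sorted in decreasing order
U, s, Vt = np.linalg.svd(J, full_matrices=False)

# keep the q singular values above a relative cutoff (assumed: 1e-8)
q = int(np.sum(s > 1e-8 * s[0]))

# truncated pseudo-inverse update: dw = -V_q Sigma_q^{-1} U_q^T F
dw = -Vt[:q].T @ ((U[:, :q].T @ Fval) / s[:q])
```

The discarded directions are exactly those along which the Jacobian gives almost no information, so the update stays bounded even when J is numerically rank-deficient.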

Numerical Analysis Results: the Case Studies
The efficiency of the proposed numerical scheme is demonstrated through two benchmark nonlinear PDEs, namely (a) the one-dimensional nonlinear Burgers equation with Dirichlet as well as mixed boundary conditions, and (b) the one- and two-dimensional Liouville-Bratu-Gelfand problem. These problems have been widely studied, as they have been used to model and analyse the behaviour of many physical and chemical systems (see e.g. [1,6,9,21,25,33,44]). In this section, we present some known properties of these problems and provide details on their numerical solution with FD, FEM and ELM with both logistic sigmoid and Gaussian RBF transfer functions.

The Nonlinear Viscous Burgers Equation
Here, we consider the one-dimensional steady-state viscous Burgers problem:

ν u″(x) − u(x) u′(x) = 0,  x ∈ (0, 1),    (18)

in the unit interval [0, 1], where ν > 0 denotes the viscosity. For our analysis, we considered two different sets of boundary conditions:

• Dirichlet boundary conditions:

u(0) = γ,  u(1) = 0.    (19)

• Mixed boundary conditions: a Neumann condition on the left boundary and a zero Dirichlet condition on the right boundary:

u′(0) = −ϑ,  u(1) = 0.    (20)

The two sets of boundary conditions result in different behaviours (see [1,5]). We summarize in the next two lemmas some of the main results.
Lemma 4.1 (Dirichlet case). Consider Eq. (18) with boundary conditions given by (19). Moreover, take

γ = tanh(1/(2ν))

(notice that γ → 1 as ν → 0). Then, the problem (18)-(19) has a unique solution given by:

u(x) = tanh((1 − x)/(2ν)).    (21)

We will use this test problem because the solution has a boundary layer and, for this simple case, we can also implement and discuss the efficiency of a fixed-point iteration by linearization, while in the mixed-boundary case we implement only Newton's iterative procedure.
Lemma 4.2 (Mixed case). Consider Eq. (18) with boundary conditions given by (20). The solution of the problem can be written as [1]:

u(x) = c tanh(c(1 − x)/(2ν)),    (22)

where c is a constant value that can be determined by the imposed Neumann condition.
To better understand the behaviour of the unstable solution with respect to the left boundary condition, we can prove the following.
Corollary 4.2.1. Consider Eq. (18) with boundary conditions given by (20). For the non-zero solution, when ϑ = ε → 0, the solution at x = 0 goes to infinity.

Proof. By setting the value of ϑ in the Neumann boundary condition to be a very small number, i.e. ϑ = ε ≪ 1, we obtain from the Neumann condition the corresponding (large) value of the constant c for which the slope of the analytical solution (22) at x = 0 is equal to −ε. Plugging this value into the analytical solution given by Eq. (22), we get that u(0) grows without bound as ε → 0.
The above findings also imply the existence of a limit point bifurcation with respect to ϑ that depends also on the viscosity. For example, as shown in [1], for ϑ > 0 and ν = 1/10, there are two equilibria arising due to a turning point at ϑ* = 0.087845767978.

Numerical Solution of the Burgers equation with Finite Differences and Finite Elements
The discretization of the one-dimensional viscous Burgers problem at N points x_j = (j − 1)h, j = 1, ..., N, h = 1/(N − 1), with second-order central finite differences in the unit interval 0 ≤ x ≤ 1, leads to the following system of N − 2 algebraic equations for j = 2, ..., N − 1:

ν (u_{j+1} − 2u_j + u_{j−1})/h² − u_j (u_{j+1} − u_{j−1})/(2h) = 0.    (26)

At the boundaries x_1 = 0 and x_N = 1, we have u_1 = γ and u_N = 0, respectively, for the Dirichlet boundary conditions (19), and u_1 = (2hϑ + 4u_2 − u_3)/3 and u_N = 0, respectively, for the mixed boundary conditions (20).
The above N − 2 nonlinear algebraic equations are the residual equations (10), which are solved iteratively using Newton's method (11). The Jacobian (12) is now tridiagonal: at each iteration, the non-null elements are given by:

∂F_j/∂u_{j−1} = ν/h² + u_j/(2h),  ∂F_j/∂u_j = −2ν/h² − (u_{j+1} − u_{j−1})/(2h),  ∂F_j/∂u_{j+1} = ν/h² − u_j/(2h).

The Galerkin residuals (16) in the case of the one-dimensional Burgers equation are:

R_k(w) = ∫_0^1 (ν ũ″(x) − ũ(x) ũ′(x)) φ_k(x) dx.    (27)

By inserting the numerical solution (15) into Eq. (27) and applying Green's formula for integration by parts, we get:

R_k(w) = −ν ∫_0^1 ũ′(x) φ_k′(x) dx − ∫_0^1 ũ(x) ũ′(x) φ_k(x) dx + ν [ũ′(x) φ_k(x)]_0^1.    (28)

On the above residuals, we have to impose the boundary conditions. If the Dirichlet boundary conditions (19) are imposed, the boundary term in Eq. (28) vanishes for the interior weighting functions and the boundary degrees of freedom are fixed to the prescribed values (Eq. (29)). In the case of the mixed boundary conditions (20), the boundary term at x = 0 is replaced using the prescribed Neumann datum (Eq. (30)). In this paper, we use a P2 finite element space, thus quadratic basis functions, with an affine element mapping in the interval [0, 1]^d. For the computation of the integrals, we used Gauss quadrature: for the one-dimensional case, the three-point Gaussian rule, with nodes ξ = −√(3/5), 0, √(3/5) and weights 5/9, 8/9, 5/9 on the reference interval [−1, 1]. When writing Newton's method (11), the elements of the Jacobian matrix for both (29) and (30) are given, up to the boundary modifications, by:

∂R_k/∂w_j = −ν ∫_0^1 φ_j′(x) φ_k′(x) dx − ∫_0^1 (φ_j(x) ũ′(x) + ũ(x) φ_j′(x)) φ_k(x) dx.

Finally, with all the above, Newton's method (11) involves the iterative solution of a linear system: for the Dirichlet problem, the boundary rows impose the values of ũ at the endpoints, while for the problem with the mixed boundary conditions, the row corresponding to the left boundary imposes the discrete Neumann condition.
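A minimal sketch of the FD/Newton procedure just described for the Dirichlet case (not the paper's code; ν, the grid size and the linear initial guess are illustrative choices, and the left boundary value γ = tanh(1/(2ν)) is taken from the exact solution):

```python
import numpy as np

nu = 0.1
gamma = np.tanh(1.0 / (2.0 * nu))   # assumed left boundary value (close to 1)
N = 81
x = np.linspace(0.0, 1.0, N)
h = x[1] - x[0]

u = gamma * (1.0 - x)               # linear initial guess satisfying the BCs
for _ in range(50):
    F = np.zeros(N)
    J = np.zeros((N, N))
    F[0], F[-1] = u[0] - gamma, u[-1]      # Dirichlet rows
    J[0, 0] = J[-1, -1] = 1.0
    for j in range(1, N - 1):
        # central-difference residual, cf. Eq. (26)
        F[j] = (nu * (u[j+1] - 2*u[j] + u[j-1]) / h**2
                - u[j] * (u[j+1] - u[j-1]) / (2*h))
        J[j, j-1] = nu / h**2 + u[j] / (2*h)
        J[j, j]   = -2*nu / h**2 - (u[j+1] - u[j-1]) / (2*h)
        J[j, j+1] = nu / h**2 - u[j] / (2*h)
    du = np.linalg.solve(J, F)
    u -= du
    if np.linalg.norm(du) < 1e-11:
        break

# residual of the nonlinear FD equations and error vs. the exact tanh profile
res = max(abs(nu*(u[j+1] - 2*u[j] + u[j-1])/h**2 - u[j]*(u[j+1] - u[j-1])/(2*h))
          for j in range(1, N - 1))
err_exact = np.max(np.abs(u - np.tanh((1.0 - x) / (2.0 * nu))))
```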

Numerical Solution of the Burgers equation with Extreme Learning Machine Collocation
Collocating the ELM network function (1) for the one-dimensional Burgers equation leads to the following nonlinear algebraic system for the interior points, i = 2, ..., M − 1:

F_i(w) = ν Σ_{j=1}^{N} w_j ψ_j″(x_i) − (Σ_{j=1}^{N} w_j ψ_j(x_i)) (Σ_{j=1}^{N} w_j ψ_j′(x_i)) = 0.    (34)

Then, the imposition of the Dirichlet boundary conditions (19) gives:

F_1(w) = Σ_{j=1}^{N} w_j ψ_j(x_1) − γ = 0,  F_M(w) = Σ_{j=1}^{N} w_j ψ_j(x_M) = 0,    (35)

while the mixed boundary conditions (20) lead to:

F_1(w) = Σ_{j=1}^{N} w_j ψ_j′(x_1) + ϑ = 0,  F_M(w) = Σ_{j=1}^{N} w_j ψ_j(x_M) = 0.    (36)

These equations are the residual equations (10), which we solve by Newton's method (11). The elements of the Jacobian matrix ∇_w F are:

∂F_i/∂w_j = ν ψ_j″(x_i) − ψ_j(x_i) ũ′(x_i) − ũ(x_i) ψ_j′(x_i),

for i = 2, ..., M − 1, while, due to the Dirichlet boundary conditions (35), we have:

∂F_1/∂w_j = ψ_j(x_1),  ∂F_M/∂w_j = ψ_j(x_M).

On the other hand, due to the mixed boundary conditions (36), we get:

∂F_1/∂w_j = ψ_j′(x_1),  ∂F_M/∂w_j = ψ_j(x_M).

At this point, the application of Newton's method (11) using the exact computation of the derivatives of the basis functions is straightforward (see Eqs. (4) and (7)).
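The ELM collocation and Newton iteration for the Dirichlet case can be sketched as follows; this is a minimal illustration with M = N/2 collocation points, assumed parameter ranges, a least-squares (pseudo-inverse) Newton update, and the boundary value γ = tanh(1/(2ν)) taken from the exact solution, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(4)

nu = 0.1
gamma = np.tanh(1.0 / (2.0 * nu))
N = 80                                   # neurons
M = 40                                   # collocation points (M = N/2)
x = np.linspace(0.0, 1.0, M)

# sigmoid hidden layer: equispaced centers, random slopes (assumed range)
c = np.linspace(0.0, 1.0, N)
alpha = rng.uniform(1.0, 20.0, N) * rng.choice([-1.0, 1.0], N)
beta = -alpha * c

Z = np.outer(x, alpha) + beta
S = 1.0 / (1.0 + np.exp(-Z))             # psi_j(x_i)
S1 = alpha * S * (1.0 - S)               # psi_j'(x_i)
S2 = alpha**2 * S * (1.0 - S) * (1.0 - 2.0 * S)   # psi_j''(x_i)

def residual(w):
    u, u1, u2 = S @ w, S1 @ w, S2 @ w
    F = nu * u2 - u * u1                 # interior residuals, cf. Eq. (34)
    F[0] = u[0] - gamma                  # Dirichlet at x = 0
    F[-1] = u[-1]                        # Dirichlet at x = 1
    return F, u, u1

# initialize w by fitting the linear guess gamma*(1 - x)
w = np.linalg.lstsq(S, gamma * (1.0 - x), rcond=None)[0]
for _ in range(100):
    F, u, u1 = residual(w)
    J = nu * S2 - u[:, None] * S1 - u1[:, None] * S
    J[0], J[-1] = S[0], S[-1]            # Jacobian rows of the Dirichlet equations
    dw = np.linalg.lstsq(J, F, rcond=1e-12)[0]   # pseudo-inverse Newton update
    w -= dw
    if np.linalg.norm(dw) < 1e-10 * max(1.0, np.linalg.norm(w)):
        break

F_fin, u_fin, _ = residual(w)
res = np.max(np.abs(F_fin))
err = np.max(np.abs(u_fin - np.tanh((1.0 - x) / (2.0 * nu))))
```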

Numerical Results
In all the computations with FD, FEM and ELMs, the convergence criterion for the Newton iterations was the L² norm of the relative error between the solutions resulting from successive iterations; the convergence tolerance was set to 10^-6. In fact, for all methods, Newton's method converged quadratically even down to the order of 10^-10, except when the bifurcation parameter was close to zero, where the solution of both the Burgers problem with mixed boundary conditions and the Bratu problem goes asymptotically to infinity. The exact solutions that are available for the one-dimensional Burgers and Bratu problems were computed using Newton's method with a convergence tolerance of 10^-12.
First, we present the numerical results for the Burgers equation (18) with Dirichlet boundary conditions (19). Recall that for this case the exact solution is available (see Eq. (21)). For our illustrations, we have selected two different values of the viscosity, namely ν = 0.1 and ν = 0.007. Results were obtained with Newton iterations starting from an initial guess that is a linear segment satisfying the boundary conditions. Figure 1 shows the corresponding computed solutions for a fixed size N = 40, as well as the relative errors with respect to the exact solution. As shown, the proposed ELM scheme outperforms both the FD and FEM schemes for medium to large grid sizes; for low to medium grid sizes, all methods perform equivalently. However, as shown in Figure 1(c), for ν = 0.007 and the particular choice of the size (N = 40), the FD scheme fails to sufficiently approximate the steep gradient appearing at the right boundary.
Then, we considered the case of the non-homogeneous Neumann condition on the left boundary, (18)-(20); here, we have set ν = 1/10. In this case, the solution is not unique, and the resulting bifurcation diagram obtained with FD, FEM and ELM is depicted in Fig. 2. In Table 1, we report the error between the value of the bifurcation point as computed with FD, FEM and ELM for various problem sizes N, with respect to the exact value of the bifurcation point (occurring, for the particular choice of viscosity, at ϑ* = 0.087845767978). The location of the bifurcation point for all numerical methods was estimated by fitting a parabola around the four points (two on the lower and two on the upper branch) of the largest values of λ as obtained by the pseudo-arc-length continuation. As shown, the proposed ELM scheme performs equivalently to FEM for low to medium grid sizes, while outperforming FEM for medium to large grid sizes; both methods (FEM and ELM) outperform FD for all grid sizes.

N | FD | FEM | ELM (SF) | ELM (RBF)
— | -5.3487e-05 | -7.6969e-09 | -2.0571e-09 | -2.1431e-09
100 | -1.3370e-05 | -2.1575e-09 | -9.8439e-09 | -9.8483e-09
200 | -3.3420e-06 | -5.9262e-09 | -9.6156e-09 | -9.6095e-09
400 | -8.3473e-07 | 4.1474e-09 | 9.3882e-10 | 9.3338e-10

Table 1: One-dimensional Burgers equation (18) with mixed boundary conditions (20). Comparative results with respect to the error between the estimated value of the turning point as obtained with the FD, FEM and ELM schemes and the exact value of the turning point at ϑ* = 0.087845767978 for ν = 1/10. The value of the turning point was estimated by fitting a parabola around the four points with the largest λ values as obtained by the arc-length continuation.
Remark 4.1 (Linearization of the Burgers equation for its numerical solution). For the numerical solution of the Burgers equation (18) with boundary conditions given by (19), one can also consider the following simple iterative procedure that linearizes the equation: given u^(0), do until convergence: find u^(k) such that ν d²u^(k)/dx² − u^(k−1) du^(k)/dx = 0. In this way, the nonlinear term becomes a linear advection term with a non-constant coefficient given by the evaluation of u at the previous iteration. This results in a fixed point scheme. Such linearized equations, being linear elliptic equations, can be easily solved, and thus in this case one can perform the analysis for linear systems presented in [8].
The results of this procedure are depicted in Figure 3. We point out that such iterations generally converge very slowly and, most importantly from our point of view, converge only for a very "good" initial guess of the solution.
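To make the procedure concrete, here is a minimal sketch of such a Picard iteration with central finite differences; the Dirichlet boundary values, the grid size and the iteration limits are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def burgers_fixed_point(nu=0.1, N=41, ua=1.0, ub=0.0, tol=1e-10, maxit=3000):
    """Picard (fixed point) iterations for the steady Burgers equation
    nu*u'' - u*u' = 0 on [0, 1]: the advection coefficient is frozen at
    the previous iterate, so each pass solves a linear tridiagonal
    system.  The Dirichlet values ua, ub are illustrative assumptions."""
    h = 1.0 / (N - 1)
    x = np.linspace(0.0, 1.0, N)
    u = ua + (ub - ua) * x                 # linear segment satisfying the BCs
    for _ in range(maxit):
        up = u.copy()
        n = N - 2
        A = np.zeros((n, n))
        rhs = np.zeros(n)
        for i in range(n):
            j = i + 1                      # global index of the i-th unknown
            lo = nu / h**2 + up[j] / (2 * h)   # multiplies u_{j-1}
            hi = nu / h**2 - up[j] / (2 * h)   # multiplies u_{j+1}
            A[i, i] = -2.0 * nu / h**2
            if i > 0:
                A[i, i - 1] = lo
            else:
                rhs[i] -= lo * ua          # fold known boundary value in
            if i < n - 1:
                A[i, i + 1] = hi
            else:
                rhs[i] -= hi * ub
        u[1:-1] = np.linalg.solve(A, rhs)
        if np.abs(u - up).max() < tol:
            break
    return x, u
```

Each pass solves a linear tridiagonal system; for large N a dedicated tridiagonal (Thomas) solver would be the idiomatic choice, but a dense solve keeps the sketch short.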

The one-and two-dimensional Liouville-Bratu-Gelfand Problem
The Liouville-Bratu-Gelfand model arises in many physical and chemical systems. It is an elliptic partial differential equation which in its general form is given by [6]: Δu + λ e^u = 0 in Ω, with u = 0 on ∂Ω. The domain that we consider here is the unit interval (the unit square in two dimensions). The one-dimensional problem admits an analytical solution given by [38]: u(x) = −2 ln( cosh((x − 1/2) θ/2) / cosh(θ/4) ), where θ is such that θ = √(2λ) cosh(θ/4).

Figure 3: Numerical accuracy of the FD and ELM schemes with respect to the exact solution, for the case of the one-dimensional Burgers equation (18) with Dirichlet boundary conditions given by (19), as obtained by the fixed point scheme described in Remark 4.1 for (a) ν = 0.1 and (b) ν = 0.007. We depict the L2-norm of the difference between the solutions obtained with FD and ELMs and the exact solution (21).
It can be shown that when 0 < λ < λc, the problem admits two branches of solutions with different stability that meet at λc ∼ 3.513830719, a limit point (saddle-node bifurcation), while beyond that point no solutions exist.
For the two-dimensional problem, to the best of our knowledge, no exact analytical solution (as in the one-dimensional case) exists; thus, we rely on the numerical results that have been reported in the literature (e.g. [9,25]), in which the authors report the value of the turning point at λc ∼ 6.808124.

Numerical Solution with Finite Differences and Finite Elements
The discretization of the one-dimensional problem in N points with central finite differences on the unit interval 0 ≤ x ≤ 1 leads to the following system of N − 2 algebraic equations, ∀ x_j = (j − 1)h, j = 2, …, N − 1, h = 1/(N − 1):

F_j(u) = (u_{j−1} − 2u_j + u_{j+1})/h² + λ e^{u_j} = 0,

where, at the boundaries, u_1 = u_N = 0. The solution of the above N − 2 nonlinear algebraic equations is obtained iteratively using the Newton-Raphson method. The Jacobian is tridiagonal; at each n-th iteration, the elements on the main diagonal are given by ∂F_j/∂u_j = −2/h² + λ e^{u_j}, and the elements on the first diagonals above and below the main one are given by ∂F_j/∂u_{j+1} = ∂F_j/∂u_{j−1} = 1/h². The discretization of the two-dimensional Bratu problem in N × N points with central finite differences on the square grid 0 ≤ x, y ≤ 1 with zero boundary conditions leads to the analogous system of (N − 2)² algebraic equations. The Jacobian is now a (N − 2)² × (N − 2)² block tridiagonal matrix, where the off-diagonal blocks are (N − 2) × (N − 2) identity matrices I scaled by 1/h², and the diagonal blocks T_i are (N − 2) × (N − 2) tridiagonal matrices. Regarding the FEM solution, for the one-dimensional Bratu problem, Eq. (16) gives the weighted residuals; by inserting Eq. (15) into Eq. (40) and applying Green's formula (integration by parts), we get Eq. (41), which, because of the zero Dirichlet boundary conditions, reduces to Eq. (42). The elements of the Jacobian matrix then follow by differentiating Eq. (42) with respect to the nodal unknowns.
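The finite-difference/Newton procedure for the one-dimensional Bratu problem can be sketched as follows; the grid size, parameter value and parabolic initial guess are illustrative choices.

```python
import numpy as np

def bratu_fd_newton(lam, N=41, l0=0.1, tol=1e-12, maxit=50):
    """Solve the 1D Bratu problem u'' + lam*exp(u) = 0, u(0) = u(1) = 0,
    with central finite differences and Newton-Raphson iterations,
    starting from a parabolic initial guess with max-norm l0."""
    h = 1.0 / (N - 1)
    x = np.linspace(0.0, 1.0, N)
    u = 4.0 * l0 * x * (1.0 - x)          # parabola satisfying the BCs
    for _ in range(maxit):
        ui = u[1:-1]
        # residuals F_j = (u_{j-1} - 2 u_j + u_{j+1})/h^2 + lam*exp(u_j)
        F = (u[:-2] - 2.0 * ui + u[2:]) / h**2 + lam * np.exp(ui)
        if np.abs(F).max() < tol:
            break
        # tridiagonal Jacobian: -2/h^2 + lam*exp(u_j) on the main diagonal,
        # 1/h^2 on the first diagonals above and below
        J = (np.diag(-2.0 / h**2 + lam * np.exp(ui))
             + np.diag(np.full(N - 3, 1.0 / h**2), 1)
             + np.diag(np.full(N - 3, 1.0 / h**2), -1))
        u[1:-1] -= np.linalg.solve(J, F)
    return x, u
```

With λ = 3 and a larger initial amplitude (e.g. l0 = 2.2, as in the text), the same routine converges to the unstable upper-branch solution.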
For the two-dimensional Bratu problem, the residuals are obtained analogously; by applying Green's formula, we get the corresponding weak form. As before, for our computations we have used quadratic basis functions with an affine element mapping in the domain [0, 1]².
Thus, the elements of the Jacobian matrix ∇_w F are given by: ∂F_i/∂w_j = ψ_j''(x_i) + λ ψ_j(x_i) exp( Σ_l w_l ψ_l(x_i) ). The application of Newton's method (11) is straightforward using the exact computation of the derivatives of the basis functions (see (4) and (7)).
For the two-dimensional Bratu problem (37), we have the collocation residuals: F_i(w) = Σ_j w_j Δψ_j(x_i, y_i) + λ exp( Σ_j w_j ψ_j(x_i, y_i) ) = 0, i = 1, …, M_Ω, with boundary conditions: Σ_j w_j ψ_j(x_i, y_i) = 0 at the collocation points (x_i, y_i) on ∂Ω. Thus, the elements of the Jacobian matrix ∇_w F read: ∂F_i/∂w_j = Δψ_j(x_i, y_i) + λ ψ_j(x_i, y_i) exp( Σ_l w_l ψ_l(x_i, y_i) ). Also in this case, with the above computations, the application of Newton's method (11) is straightforward.
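A minimal sketch of the ELM collocation approach for the one-dimensional Bratu problem, assuming a logistic-sigmoid hidden layer with randomly drawn (and then fixed) internal parameters; the sampling ranges and the numbers of neurons and collocation points are illustrative assumptions.

```python
import numpy as np

def elm_bratu_1d(lam=1.0, M=41, Nb=60, tol=1e-10, maxit=50, seed=0):
    """Sketch of an ELM collocation solver for the 1D Bratu problem
    u'' + lam*exp(u) = 0, u(0) = u(1) = 0.  The internal weights a and
    biases b are random and kept fixed (ranges are illustrative);
    only the external weights w are unknowns, computed by Newton
    iterations on the collocation residuals."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-10.0, 10.0, Nb)          # fixed hidden weights
    b = rng.uniform(-10.0, 10.0, Nb)          # fixed hidden biases
    x = np.linspace(0.0, 1.0, M)[:, None]     # collocation points
    s = 1.0 / (1.0 + np.exp(-(a * x + b)))    # psi_j(x_i), logistic sigmoid
    s2 = a**2 * s * (1 - s) * (1 - 2 * s)     # psi_j''(x_i)
    w = np.zeros(Nb)
    for _ in range(maxit):
        u = s @ w
        F = np.concatenate([s2[1:-1] @ w + lam * np.exp(u[1:-1]),  # PDE residuals
                            [u[0], u[-1]]])                        # boundary residuals
        if np.abs(F).max() < tol:
            break
        J = np.vstack([s2[1:-1] + lam * np.exp(u[1:-1])[:, None] * s[1:-1],
                       s[[0, -1]]])
        w -= np.linalg.lstsq(J, F, rcond=None)[0]   # minimum-norm Newton step
    return x.ravel(), s @ w
```

Since the number of basis functions exceeds the number of collocation equations here, each Newton step is taken as the minimum-norm least-squares solution of the linearized system.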

Numerical results for the one-dimensional problem
First, we show the numerical results for the one-dimensional Liouville-Bratu-Gelfand equation (37) with homogeneous Dirichlet boundary conditions (38). Recall that an exact solution, although in implicit form, is available in this case (see equation (39)); thus, as discussed, the exact solutions are derived using Newton's method with a convergence tolerance of 10^−12. Figure 4 depicts the comparative results between the exact, FD, FEM and ELM solutions on the upper branch as obtained by applying Newton's iterations, for two values of the parameter λ and a fixed N = 40, namely for λ = 3 close to the turning point (occurring at λc ∼ 3.513830719) and for λ = 0.2. For our illustrations, we have set as initial guess u_0(x) a parabola that satisfies the homogeneous boundary conditions, namely u_0(x) = 4 l_0 x (1 − x), with a fixed L∞-norm ||u||∞ = l_0 close to the one obtained from the exact solution.
In particular, for λ = 3, we used as initial guess a parabola with l_0 = 2.2; in all cases, Newton's iterations converged to the correct unstable upper-branch solution. For λ = 0.2, we used as initial guess a parabola with l_0 = 6.4 (the exact solution has l_0 ∼ 6.5); again, in all cases, Newton's iterations converged to the correct unstable upper-branch solution.
To further clarify the convergence behaviour, in Figure 5 we report the regimes of convergence over a grid of L∞-norms of the initial guesses (parabolas) and values of λ.
Remark 4.2 (Linearization of the equation for the numerical solution of the Liouville-Bratu-Gelfand problem). For the solution of equation (37) with boundary conditions given by (38), one can consider the following iterative procedure that linearizes the equation: given u^(0), do until convergence: find u^(k) such that Δu^(k) + λ e^{u^(k−1)} u^(k) = λ (u^(k−1) − 1) e^{u^(k−1)}.
In this way, the nonlinear term becomes a linear reaction term with a non-constant coefficient given by the evaluation of the nonlinearity at the previous step. We then implemented fixed point iterations until convergence. Such a linearization procedure is used, for example, in [33]. In Figure 6, we report some results on the application of this method. We note that this scheme converges more slowly and is less robust than Newton's method.
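A sketch of this linearized iteration for the one-dimensional problem with central finite differences; the grid size and parameter value are illustrative.

```python
import numpy as np

def bratu_linearized(lam=1.0, N=41, tol=1e-10, maxit=400):
    """Linearized iteration of the remark for the 1D Bratu problem:
    at each step solve the linear problem
      u_k'' + lam*exp(u_{k-1})*u_k = lam*(u_{k-1} - 1)*exp(u_{k-1})
    with central finite differences and homogeneous Dirichlet BCs."""
    h = 1.0 / (N - 1)
    u = np.zeros(N)                        # zero initial guess
    for _ in range(maxit):
        up = u.copy()
        e = lam * np.exp(up[1:-1])
        # tridiagonal matrix of the linearized operator
        A = (np.diag(-2.0 / h**2 + e)
             + np.diag(np.full(N - 3, 1.0 / h**2), 1)
             + np.diag(np.full(N - 3, 1.0 / h**2), -1))
        rhs = (up[1:-1] - 1.0) * e
        u[1:-1] = np.linalg.solve(A, rhs)
        if np.abs(u - up).max() < tol:
            break
    return u
```

At a fixed point, u'' + λe^u u = λ(u − 1)e^u reduces to u'' = −λe^u, i.e. the original equation, so any converged iterate solves the discretized Bratu problem.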

Bifurcation diagram and numerical accuracy
In this section, we report the numerical results obtained by the bifurcation analysis of the one-dimensional Bratu problem (37). Figure 7 shows the constructed bifurcation diagram with respect to the parameter λ, and in Table 3 we report the accuracy of the computed value of the turning point as obtained with FD, FEM and ELMs, versus its exact value. As shown, the ELMs provide higher numerical accuracy for the value of the turning point for medium to large sizes of the grid, and results equivalent (for the ELM with SF) to those of FEM otherwise, both outperforming the FD scheme.
In Figures 8 and 9, we report the contour plots of the L ∞ -norms of the differences between the computed solutions by FD, FEM and ELMs and the exact solutions for the lower-( 8) and upper-branch (9), respectively with respect to N and λ.
As shown, the ELM schemes outperform both the FD and FEM methods for medium to large problem sizes N, and provide equivalent results with FEM for low to medium problem sizes; thus, both (FEM and ELMs) outperform the FD scheme.

N     FD            FEM           ELM SF        ELM RBF
      -7.3137e-04    8.4422e-07    2.9818e-07    6.6092e-05
100   -1.8282e-04    5.0597e-08   -3.7086e-08    6.1302e-08
200   -4.5683e-05    2.3606e-08   -4.5484e-09   -2.6770e-09
400   -1.1412e-05    1.3557e-08    2.0169e-09    2.0275e-09

Table 3: One-dimensional Bratu problem (37). Accuracy of FD, FEM and ELMs in the approximation of the value of the turning point with respect to the exact value λ = 3.513830719125162. Values express the difference between the computed turning point and the exact one. The value of the turning point was estimated by fitting a parabola around the four points with the largest λ values as obtained with arc-length continuation.
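The parabola-fit estimate of the turning point used in the tables can be sketched as follows, here verified on a synthetic fold (the sampled points are illustrative):

```python
import numpy as np

def turning_point_estimate(lams, norms):
    """Estimate the fold by fitting a parabola lam = p(norm) through the
    four continuation points with the largest lam values (two per
    branch); the vertex of the parabola approximates the turning point."""
    c = np.polyfit(norms, lams, 2)        # lam ~ c[0]*s^2 + c[1]*s + c[2]
    s_star = -c[1] / (2.0 * c[0])         # abscissa of the vertex
    return np.polyval(c, s_star)          # lam at the vertex

# synthetic fold lam = lam_c - (s - 1)^2, sampled at two points per branch
norms_demo = np.array([0.8, 0.9, 1.1, 1.2])
lams_demo = 3.513830719 - (norms_demo - 1.0) ** 2
est = turning_point_estimate(lams_demo, norms_demo)   # recovers lam_c up to roundoff
```

On a true fold the branch is only locally parabolic, so the quality of the estimate depends on how close the four continuation points are to the turning point.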

Numerical results for the two-dimensional problem
For the two-dimensional problem (37)-(38), no exact analytical solution is available. Thus, for comparing the numerical accuracy of the FD, FEM and ELM schemes, we considered the value of the bifurcation point that has been reported in key works, as discussed in Section 4.2. Figure 10 depicts the bifurcation diagram as computed via pseudo-arc-length continuation (see Section 3).
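A compact sketch of pseudo-arc-length continuation, shown here on the finite-difference discretization of the one-dimensional Bratu problem for brevity; the step size, number of steps and starting point are illustrative choices, not the paper's settings.

```python
import numpy as np

def continue_bratu(N=40, ds=0.1, steps=120):
    """Pseudo-arc-length continuation sketch for the 1D Bratu FD system,
    tracing the branch past the fold near lam_c ~ 3.5138.  The corrector
    solves the bordered system [F(u, lam); t.(y - y_pred)] = 0 by Newton."""
    h = 1.0 / (N - 1)
    n = N - 2
    def F(u, lam):
        uf = np.concatenate([[0.0], u, [0.0]])
        return (uf[:-2] - 2.0 * u + uf[2:]) / h**2 + lam * np.exp(u)
    def Ju(u, lam):
        return (np.diag(-2.0 / h**2 + lam * np.exp(u))
                + np.diag(np.full(n - 1, 1.0 / h**2), 1)
                + np.diag(np.full(n - 1, 1.0 / h**2), -1))
    # converge onto the lower branch at a small lam
    u, lam = np.zeros(n), 0.05
    for _ in range(20):
        u -= np.linalg.solve(Ju(u, lam), F(u, lam))
    y = np.concatenate([u, [lam]])
    t = np.zeros(n + 1)
    t[-1] = 1.0                            # initial tangent: increase lam
    branch = []
    for _ in range(steps):
        y_pred = y + ds * t                # predictor along the tangent
        yk = y_pred.copy()
        for _ in range(20):                # Newton corrector, bordered system
            u, lam = yk[:-1], yk[-1]
            G = np.concatenate([F(u, lam), [t @ (yk - y_pred)]])
            if np.abs(G).max() < 1e-10:
                break
            Jb = np.zeros((n + 1, n + 1))
            Jb[:n, :n] = Ju(u, lam)
            Jb[:n, -1] = np.exp(u)         # dF/dlam
            Jb[-1, :] = t
            yk = yk - np.linalg.solve(Jb, G)
        t = (yk - y) / np.linalg.norm(yk - y)   # secant tangent update
        y = yk
        branch.append((y[-1], np.abs(y[:-1]).max()))
    return branch
```

The bordered Jacobian remains nonsingular at the fold, which is what allows the continuation to round the turning point where natural (parameter) continuation would fail.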
In the case d = 2 (the Gelfand-Bratu model on the unit ball; see Remark 4.3), this equation admits multiple solutions if λ < λc = 2. For example, in [44], the authors used Mathematica to obtain analytical solutions at various values of λ; for our tests we consider λ = 1/2 and λ = 1. Figure 11 depicts the numerical accuracy of the ELM collocation schemes with respect to the exact solutions for these two values of λ. Because no meshing procedure is involved, and because the collocation scheme requires nothing beyond the set of collocation points, the implementation of Newton's method is straightforward when changing the geometry of the domain.

N      Grid     FD         FEM        ELM SF     ELM RBF
64     8x8      6.783434   7.083742   6.845015   7.207203
100    10x10    6.792626   6.984260   6.723902   6.930798
196    14x14    6.800361   6.900313   6.855055   6.882435
400    20x20    6.804392   6.856401   6.799440   6.829754
784    28x28    6.806235   6.835771   6.801689   6.806149
1600   40x40    6.807220   6.824770   6.806899   6.804600

Table 4: Turning point estimation for the two-dimensional Bratu problem. The value reported in key works in the literature (see e.g. [9]) is λ* = 6.808124. The value of the turning point was estimated by fitting a parabola around the four points with the largest λ values as obtained by the arc-length continuation.

Conclusions
We proposed a numerical approach based on Extreme Learning Machines (ELMs) and collocation for the approximation of steady-state solutions of non-linear PDEs. The proposed scheme takes advantage of the property of ELMs as universal function approximators, bypassing the computationally very expensive training phase, which most of the time comes without any guarantee of convergence, required by other types of machine learning schemes such as single- or multilayer ANNs and deep-learning networks. The unknowns of the approximation subspace in which a solution of the PDE is sought are the weights of the hidden-to-output layer. For linear PDEs, these can be computed by solving a linear regularization problem in one step. In our previous work [8], we demonstrated that ELMs can provide robust and accurate approximations of the solution of benchmark linear PDEs with steep gradients, for which analytical solutions were available. Here, building on this work, we make a step change by showing how ELMs can be used to solve non-linear PDEs and, by bridging them with continuation methods, how one can exploit the arsenal of numerical bifurcation theory to trace branches of solutions past critical points. For our demonstrations, we considered two celebrated classes of nonlinear PDEs whose solutions bifurcate as parameter values change: the one-dimensional viscous Burgers equation (a fundamental representative of advection-diffusion PDEs) and the one- and two-dimensional Liouville-Bratu-Gelfand equation (a fundamental representative of reaction-diffusion PDEs). By coupling the proposed numerical scheme with Newton-Raphson iterations and the "pseudo" arc-length continuation method, we constructed the corresponding bifurcation diagrams past turning points. The efficiency of the proposed numerical ELM collocation scheme was compared against two of the most established numerical solution methods, namely central Finite Differences and Galerkin Finite Elements. By doing so, we showed that
(for the same problem size) the proposed machine-learning approach outperforms the FD and FEM schemes for relatively medium to large sizes of the grid, both with respect to the accuracy of the computed solutions for a wide range of the bifurcation parameter values and with respect to the approximation accuracy of the turning points. Hence, the proposed approach arises as an alternative and powerful new numerical technique for the approximation of steady-state solutions of non-linear PDEs. Furthermore, its implementation is far simpler than that of FEM, while providing equivalent or even better numerical accuracy, and in all cases it is shown to outperform the simple FD scheme, which fails to approximate steep gradients such as the ones arising here near the boundaries. Of course, there are many open problems linked to the implementation of the proposed scheme that ask for further and deeper investigation, such as the theoretical investigation of the impact of the type of transfer functions, and of the probability distribution of their parameter values, on the approximation of the solutions. Further directions could be towards the extension of the scheme for the solution of time-dependent non-linear PDEs, as well as the solution of inverse problems in PDEs.

Figure 1 :
Figure 1: Numerical solution and accuracy of the FD, FEM and ELM schemes for the one-dimensional viscous Burgers problem with Dirichlet boundary conditions (18), (19). (a,b) With viscosity ν = 0.1: (a) solutions for a fixed problem size N = 40; (b) L2-norm of the differences with respect to the exact solution (21) for various problem sizes. (c,d) With viscosity ν = 0.007: (c) solutions for a fixed problem size N = 40; (d) L2-norm errors with respect to the exact solution for various problem sizes.

Figure 2 :
Figure 2: (a) One-dimensional Burgers equation (18) with mixed boundary conditions (20): bifurcation diagram with respect to the Neumann boundary value ϑ as obtained for ν = 1/10, with the FD, FEM and ELM schemes and a fixed problem size N = 400; (b) zoom near the turning point.

Figure 4 :
Figure 4: Numerical solutions and accuracy of the FD, FEM and ELM schemes for the one-dimensional Bratu problem (37). (a) Computed upper-branch unstable solutions at λ = 3 for a fixed problem size N = 40. (b) L2-norm of the differences with respect to the exact unstable solution (39) at λ = 3 for various values of N. (c) Computed upper-branch unstable solutions at λ = 0.2 for a fixed problem size N = 40. (d) L2-norm of the differences with respect to the exact unstable solution (39) at λ = 0.2 for various values of N. The initial guess of the solutions was a parabola satisfying the homogeneous boundary conditions with a fixed L∞-norm ||u||∞ = l_0 close to the one resulting from the exact solution.

Figure 5 :
Figure 5: Convergence regimes (basins of attraction) of Newton's method with the (a) FD, (b) FEM, and (c,d) ELM numerical schemes for the one-dimensional Bratu problem (37), for a grid of initial guesses (L∞-norms of parabolas that satisfy the boundary conditions (38)) and values of λ. Green points indicate convergence to the lower-branch solutions; red points indicate convergence to the upper-branch solutions; blue points indicate divergence. (c) ELM with logistic SF (3); (d) ELM with Gaussian RBF (6).

Figure 6 :
Figure 6: Fixed point iterations: L2-norm of the differences from the lower- and upper-branch Liouville-Bratu-Gelfand solution (39) for λ = 2: (a) L2 errors with respect to N for the lower-branch solution; (b) L2 errors with respect to N for the upper branch.

Figure 7 :
Figure 7: (a) Bifurcation diagram for the one-dimensional Bratu problem (37), with a fixed problem size N = 400.(b) Zoom near the turning point.

Figure 10 :
Figure 10: (a) Computed bifurcation diagram for the two-dimensional Bratu problem (37), with a grid of 40 × 40 points. (b) Zoom near the turning point.

Table 5
summarizes the computed values of the turning point as estimated with the FD, FEM and ELM schemes for various sizes N of the grid. Remark 4.3 (The Gelfand-Bratu model). The Liouville-Bratu-Gelfand equation (37) in a unitary ball B ⊂ R^d with homogeneous Dirichlet boundary conditions is usually referred to as the Gelfand-Bratu model. This equation possesses radial solutions u(r) satisfying the one-dimensional non-linear boundary-value problem [47]: u''(r) + ((d − 1)/r) u'(r) + λ e^{u(r)} = 0, with u'(0) = 0, u(1) = 0.