Abstract
Mathematics lies at the heart of engineering science and is essential for capturing and modeling diverse processes, whether naturally occurring or man-made. One important engineering problem in this regard is the modeling of advanced mathematical problems and their analysis. Partial differential equations (PDEs) are important and useful tools to this end. However, solving complex PDEs for advanced problems requires extensive computational resources and sophisticated techniques. Neural networks provide a way to solve complex PDEs reliably. In this regard, large-data models are a new generation of techniques with strong dependency-capturing capabilities; hence, they can richly model and accurately solve such complex PDEs. Common large-data models include convolutional neural networks (CNNs) and their derivatives, transformers, and others. In this literature survey, the mathematical background is introduced and a gentle introduction to the area of solving PDEs using large-data models is given. Various state-of-the-art large-data models for solving PDEs are discussed, and the major issues and future scope of the area are identified. Through this literature survey, it is hoped that readers will gain insight into the area of solving PDEs using large-data models and pursue future research in this interesting area.
1 Introduction
For scientific computing, differential equations (DEs) are efficient for describing various engineering problems (Boussange et al. 2023; Mallikarjunaiah 2023). Differential equations relate unknown functions to their first-order or higher derivatives, and are classified accordingly as ordinary differential equations (ODEs) or partial differential equations (PDEs) (Taylor et al. 2023; Farlow 2006). Unlike algebraic techniques, this approach establishes an equational relationship between unknown expressions and their derivatives; hence, the solutions must conform to that relationship. Practically, DEs are very important for depicting complex problems that occur daily in nature (Namaki et al. 2023; Soldatenko and Yusupov 2017; Rodkina and Kelly 2011). Since PDEs are the type of DEs relevant to the issues addressed in this paper, we focus on PDEs. Some simplified PDEs have solutions in terms of common operators (Melchers et al. 2023; Farlow 2006). Some popular techniques for PDE solving are the Finite element method (FEM) (Zienkiewicz and Taylor 2000), the Finite volume method (FVM) (Versteeg and Malalasekera 2011), Particle-based methods (Oñate and Owen 2014), and the Finite cell method (FCM) (Kollmannsberger 2019).
For higher-order PDEs, the finite element method (FEM) may be conveniently used and can give an accurate solution, albeit at the cost of extensive computational resources (Innerberger and Praetorius 2023). Also, its multi-iteration solution process limits practicality. As of now, PDE solving is chiefly used for advanced applications like aircraft design using fluid dynamics, weather forecasting, etc. For improving PDE-solving capability, spline functions have been used (Kolman et al. 2017; Qin et al. 2019). Along similar lines, FEM basically discretizes the domain and approximates the PDE solution numerically. Hence, introducing Fourier or Laplace transforms, which express PDEs more efficiently, has better feasibility (Mugler and Scott 1988; El-Ajou 2021). Large-data models have robust abilities for fitting multi-variate, high-dimensional functions, and have been developed for light-weight and rapid solving of PDEs. Before proceeding further, we give a brief introduction to FEM and discuss the relevance and importance of neural network techniques in this regard. Next, we enumerate the large-data models used for PDE solving. These models are discussed in detail later in the paper.
1.1 Finite element method (FEM)
A PDE for a continuous field u can be stated as in Eq. 1 on a certain domain and approximated numerically by different techniques, including the Finite element method (FEM) (Zienkiewicz and Taylor 2000). FEM is discussed here with emphasis on Galerkin-based FEM.
Let the PDE be given by Eq. 1, where \(L(\cdot )\) is an arbitrary function of the continuous field u, and let it be defined on the domain \(\Omega \subset \mathbb {R}^n\), which is the set of all possible inputs for the PDE, along with boundary conditions (BCs) given by Eqs. 2 and 3. Let \(y_d\) and g be the Dirichlet and Neumann BCs, respectively. The Dirichlet BC gives the numerical value that the variable u assumes at the domain boundary when solving the PDE. The Neumann BC prescribes the derivative value of the variable u at the domain boundary \(\omega\), as opposed to the variable u itself as in the Dirichlet BC. The finite element formulation of Eq. 1 on a discrete domain having m elements and n nodes, including the BCs, yields the system given by Eq. 4.
In Eq. 4, \(K(u^h)\) is the non-linear left-hand side matrix, known as the stiffness matrix; it defines the system of equations to be solved for ascertaining the approximate solution to the PDE. \(u^h\) is the discretized solution field, and \(F \in \mathbb {R}^n\) is the right-hand side vector of applied forces, where \(F_i\) is the force at the \(i^{th}\) node. The equation system can be reduced to the residual form:
For obtaining the solution \(u^h\), the Newton-Raphson method can be used by linearizing \(r(u^h)\) using its tangent. This technique requires solving a linear equation system at each iteration. The iterations proceed until the residual norm \(||r||_n\) falls below the tolerance. For a linear operator, convergence is achieved in only one iteration. When there are many elements and nodes, the most computationally expensive FEM step is solving this linear equation system. For applications where computational efficiency is critical, like real-time models, digital twins, etc., this step needs to be avoided at all costs. Techniques like model-order reduction build a surrogate to significantly reduce the computational cost. Large-data based techniques like deep networks can do away with this cost completely. Large-data models like convolutional neural networks have some notable merits for solving PDEs (Willard et al. 2022).
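The Newton-Raphson loop described above can be sketched in a few lines. The toy two-degree-of-freedom system, its cubic nonlinearity, and the finite-difference tangent below are illustrative assumptions, not a real FEM assembly; the structure of the iteration (linearize, solve, update, check the residual norm) is what matters.

```python
import numpy as np

def residual(u):
    # Hypothetical nonlinear system r(u) = K(u) u - F = 0 with a cubic term per DOF
    K = np.array([[2.0 + u[0]**2, -1.0],
                  [-1.0, 2.0 + u[1]**2]])
    F = np.array([1.0, 1.0])
    return K @ u - F

def tangent(u, eps=1e-7):
    # Finite-difference approximation of the tangent (Jacobian) of the residual
    n = u.size
    J = np.zeros((n, n))
    r0 = residual(u)
    for j in range(n):
        du = np.zeros(n)
        du[j] = eps
        J[:, j] = (residual(u + du) - r0) / eps
    return J

def newton_raphson(u0, tol=1e-10, max_iter=50):
    u = u0.copy()
    for it in range(max_iter):
        r = residual(u)
        if np.linalg.norm(r) < tol:           # stop once the residual norm meets the tolerance
            return u, it
        du = np.linalg.solve(tangent(u), -r)  # the costly linear solve done at every iteration
        u = u + du
    return u, max_iter

u_sol, iters = newton_raphson(np.zeros(2))
```

For large meshes the `np.linalg.solve` call is exactly the step whose cost surrogate models and deep networks aim to remove.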
1.2 Large-data models for solving PDEs
With time, popular large-data models like CNNs (Lecun et al. 1998; Krizhevsky et al. 2012; Hafiz et al. 2021) used in deep learning (Hassaballah and Awad 2020; Minaee et al. 2023; Xu et al. 2023; Xiang et al. 2023; Hafiz et al. 2022; Hafiz and Hassaballah 2021), Recurrent neural networks (RNNs) (Ren et al. 2022), Long short term memory (LSTM) neural networks (José et al. 2021), Generative adversarial networks (GANs) (Gao and Ng 2022; Yang 2019), and attention-based Transformers (Cao 2021) have also been applied to solving PDEs. Deep learning is an area wherein neural networks with a large number of layers are used for classification, regression, etc. Introduced by Lecun et al. (1998), Convolutional neural networks (CNNs) rose to popularity with AlexNet (Krizhevsky et al. 2012), a CNN that gave outstanding performance on the ImageNet classification challenge (Vakalopoulou et al. 2023; Deng et al. 2009). At that time, obtaining a high classification accuracy on the ImageNet dataset was considered a tough computer vision task. Since then, CNNs and deep learning have shattered many records in applications like computer vision (Hassaballah and Awad 2020; Hafiz and Bhat 2020; Hafiz et al. 2020, 2023), speech recognition (Jean et al. 2022; Bhangale and Kothandaraman 2022), financial market forecasting (Zhao and Yang 2023; Ashtiani and Raahemi 2023), and intelligent chatbots like the popular ChatGPT (Gordijn and ten Have 2023). Given the prowess of CNNs, it was only a matter of time before they were applied to tasks like solving PDEs, where they demonstrated promising results. This success of CNN-based PDE solving was due to their unique strengths, like simplicity of implementation for supervised learning, and consistency (Smets et al. 2023; Alt et al. 2023; Jiang et al. 2023).
CNNs have both strengths and weaknesses for solving PDEs (Michoski et al. 2020; Peng et al. 2023; Choi et al. 2023). The strengths of CNNs in this regard are:
1. Significant ease of implementing PDEs.
2. Convenience of using large data.
3. Consistent solutions over the full space of parameters.
As for (1), it can be said that highly complicated PDE systems with a very large number of parameters and high dimensionality can be implemented in Python using TensorFlow or PyTorch in a few hundred lines of code within a couple of days (Yiqi and Ng 2023; Quan and Huynh 2023). TensorFlow and PyTorch are Python deep learning libraries offering a rich set of functions encapsulating state-of-the-art CNNs and the data-processing sub-routines they require. This ease of implementation makes CNNs a convenient and accurate option for solving PDEs, and is much easier than using many legacy PDE solvers (Kiyani et al. 2022). For (2), using data in PDEs in supervised learning for CNNs is simple, and so is the empirical integration (Jiagang et al. 2022; Fang et al. 2023; Fuhg et al. 2023). As for (3), one more important advantage is that exploring the parameter space requires only basic solution-domain augmentation (Ren et al. 2022), i.e., augmenting the space-time parameter space (denoting the possible values of the active PDE variable and the time parameter) with more parameters like \((x,t, p_1, \cdots , p_n)\), followed by optimization of the CNN for solving the PDE as a function of the input parameters. Augmenting the input space in this way adds much less computational complexity to the algorithm than sequentially exploring the parameter space, i.e., solving the PDE in the space-time values (x, t) at one parameter point \((p_1, \cdots , p_n)\) at a time (Boussif et al. 2022; Tanyu et al. 2023). It must be noted that space-time refers to the (x, t) values, where x may be a PDE input variable like displacement, and t refers to the input time variable.
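The input-space augmentation just described can be sketched as follows: instead of feeding the surrogate only (x, t), the input is extended to (x, t, p1, ..., pn). The two extra parameters, their ranges, and the tiny forward-pass MLP below are hypothetical stand-ins for the networks discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Collocation points over space-time, crossed with two assumed extra PDE parameters
x  = rng.uniform(0.0, 1.0, size=(256, 1))   # spatial coordinate
t  = rng.uniform(0.0, 1.0, size=(256, 1))   # time coordinate
p1 = rng.uniform(0.1, 2.0, size=(256, 1))   # e.g. a diffusion coefficient (assumption)
p2 = rng.uniform(-1.0, 1.0, size=(256, 1))  # e.g. a forcing amplitude (assumption)

# Augmented input (x, t, p1, p2): one network covers the whole parameter space
inputs = np.hstack([x, t, p1, p2])

# One forward pass of a 2-layer MLP surrogate u_theta(x, t, p1, p2)
W1, b1 = rng.standard_normal((4, 32)), np.zeros(32)
W2, b2 = rng.standard_normal((32, 1)), np.zeros(1)
u_pred = np.tanh(inputs @ W1 + b1) @ W2 + b2
```

Training this single network over the augmented inputs replaces one separate PDE solve per parameter point \((p_1, \cdots , p_n)\).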
On the other hand, some contemporary weaknesses of CNN-based PDE solvers are:
1. Absence of a guarantee of theoretical convergence of residuals for non-convex PDE minimization.
2. Overall slower run-time for each forward solve.
3. Weak grounding of theoretical methods in the analysis of PDEs.
As for (1), there is the challenge of optimization convergence in non-convex domains, wherein the solution may be trapped inside local minima (Shaban et al. 2023; Mowlavi and Nabi 2023). It may be noted that for a convex function, the global minimum is unique and any local minimum is also the global minimum. However, a non-convex function can have multiple local minima, i.e., points where the function reaches a low value that may not be the globally lowest one; hence, other minima or potentially better solutions may exist. As for (2), it is much more subtle than it appears, and strongly depends on different aspects of the CNN architecture used, such as hyperparameter optimization, the ultimate simulation goal, etc. (Tang et al. 2023b; Grohs et al. 2023). With respect to (3), it merely indicates that CNNs have only recently been seriously used for solving PDEs, and hence remain largely unexplored theoretically (Chen et al. 2023). In spite of the above weaknesses, and inspired by the strengths of deep learning, large-data models have also been used for solving PDEs (Hou et al. 2023). Examples of CNNs (Lagaris et al. 1998) used for solving PDEs are Physics-informed neural networks (PINNs) (Raissi et al. 2019; Baydin et al. 2018), DeepONet (Lu et al. 2021b), etc. Examples of other variants of large-data neural networks used are RNNs like PhyCRNet (Ren et al. 2022), LSTMs (José et al. 2021), GANs (Gao and Ng 2022), etc. Also, recently developed large-data models like Transformers (Cao 2021) and Deep reinforcement learning neural networks (DRLNNs) (Han et al. 2018) have been used for solving PDEs. Through this literature survey, it is hoped that readers will get some insight into the area of using state-of-the-art large-data models and be encouraged to engage in research in this interesting field.
It is also hoped that by this discussion, future inroads into the merger of high-level mathematical modeling and large-data model based simulation and prediction will be laid.
The main contributions of this paper are summarized as follows.
- A comprehensive survey of the domain of solving PDEs is presented, to help researchers review, summarize, address challenges, and plan for the future.
- An overview of current trends and related techniques for solving PDEs using large-data models is given.
- The major issues and future scope of using large-data models are also discussed.
The rest of the paper is organized as follows. Section 2 discusses the works related to using large-data models. Section 3 presents the current trends in the area. Section 4 provides the issues and future directions. Finally, the conclusion is given in Sect. 5.
2 Related work
Solving differential equations with neural networks has been pursued for some time (Huang et al. 2022). Traditionally, shallow neural networks were used. Shallow neural networks are the older generation of neural networks, which have a few layers and approximate a small number of parameters. One of the first works can be traced to the year 1990 (Hyuk Lee and In Seok 1990). In that work, first- and higher-order DEs were discretized by finite-difference approaches, followed by formulating an energy function for the transformed algebraic equations. This energy function was then minimized using Hopfield networks. Since then, there has been a lot of research on solving DEs using various models like neural networks, CNNs, and, recently, other large-data models (Boussange et al. 2023). As reported by the Dimensions online database (Hook et al. 2018), the total number of publications to date is 337 for the phrase search: ‘PDE solving using neural networks OR CNNs OR deep learning OR RNN OR LSTM OR GANs OR Transformers OR DRL’. The year-wise breakup of the number of publications obtained from the Dimensions online database for the same phrase is shown in Fig. 1. Out of the search results, the important and ground-breaking works with notable impact and citations were considered for the current work. Also included were works with novelty and a significant contribution to the field of PDE solving.
Total number of publications year-wise to date for the phrase search: ‘PDE solving using neural networks OR CNNs OR deep learning OR RNN OR LSTM OR GANs OR Transformers OR DRL’ on the Dimensions online database (Hook et al. 2018)
2.1 Shallow neural networks for PDE solving
Neural networks can efficiently incorporate the mathematical modeling of physical problems. The works of Meade Jr and Fernandez (1994a, b) proved to be important for solving PDEs. In the work of Gobovic and Zaghloul (1993), a technique using local connections of neurons was proposed for solving PDEs of heat flow processes. Energy functions were formulated for the PDEs having constant parameters, and were minimized using Very large scale integrated (VLSI) Complementary metal oxide semiconductor (CMOS) circuits. The design was implemented as a neural network with each neuron representing a CMOS cell. Later works (Gobovic and Zaghloul 1994; Yentis and Zaghloul 1994, 1996) used local neural networks for obtaining solutions of PDEs by parallelization and by Neural integrated circuits (NICs). The shallow neural networks initially paved the way for more work in the area of PDE solving. However, their weak approximation capabilities, due to the smaller number of hidden layers, led to the use of CNNs, which were better at capturing the numerous dependencies in PDEs. Hence, CNNs performed better than their shallow counterparts in PDE solving.
2.2 Deep neural networks for PDE solving
Deep learning using CNNs is a popular technique for computer vision tasks (Hassaballah and Awad 2020; Girshick et al. 2014; Girshick 2015; Ren et al. 2017; Sajid et al. 2021; Shelhamer et al. 2017; Chen et al. 2018; Zhao et al. 2017; Hafiz and Bhat 2020; Vinyals et al. 2015). Deep learning has also been used for tasks like Natural Language Processing (NLP) (Amanat et al. 2022). CNNs pre-trained on large datasets like ImageNet (Deng et al. 2009) are used after fine-tuning for two notable reasons (Jing and Tian 2021). First, the feature maps learned by CNNs from the large datasets help them to generalize better and faster. Second, pre-trained CNNs are adept at avoiding over-fitting during fine-tuning for smaller down-stream applications.
The accuracy of CNNs depends on their architecture (Hafiz et al. 2022; Hafiz and Hassaballah 2021) and the training technique (Hafiz et al. 2021). Many CNNs have been developed with huge numbers of parameters. For training these parameters, huge datasets are required. Some popular CNNs include AlexNet (Krizhevsky et al. 2012), VGG (Simonyan and Zisserman 2014), GoogLeNet (Szegedy et al. 2015), ResNet (He et al. 2016), and DenseNet (Huang et al. 2017). Popular CNN training datasets for computer vision include ImageNet (Deng et al. 2009) and OpenImage (Kuznetsova 2020). CNNs have achieved state-of-the-art classification performance for many computer vision tasks (Girshick et al. 2014; Shelhamer et al. 2017; Vinyals et al. 2015; Hassaballah and Hosny 2019; Ledig et al. 2017; Tran et al. 2015; Hafiz et al. 2020, 2023).
2.2.1 CNNs for solving non-linear equations
As per the work of Lagaris et al. (1998), differential equations can be broken down into sub-components using the Dirichlet and Neumann expressions. Using neural networks, Lagaris et al. (1998) solved non-linear equations accurately up to the seventh decimal digit. Since Lagaris et al. (1998) is the first work of this genre for solving PDEs, it is worth explaining briefly. Consider a general differential equation, given by Eq. 6, which needs a solution:
Here \({x} = (x_1, x_2, ..., x_n) \in \mathbb {R}^n\), subject to certain boundary conditions on an arbitrary boundary S, where \(B \subset \mathbb {R}^n\) is the defining domain and y(x, t) is the solution needed. It should be noted that the boundary conditions are not defined here because the equation above is stated in general form.
For obtaining a solution to Eq. 6, first the domain B has to be discretized into a set of points \(\hat{B}\). Also, the arbitrary boundary S has to be discretized into a set of points \(\hat{S}\). Then, the DE may be expressed as a system constrained by the generally defined boundary conditions, as per Eq. 7:
Here y(x, t) is the solution. It can be obtained from two components given in Eq. 8:
Here A(x) is a fixed term with no adjustable parameters, p is the parameter set, and N(x, p) is the neural network to be trained by minimization.
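The two-component trial solution of Eq. 8 can be sketched for a 1D problem. The specific Dirichlet values, the multiplier x(1 - x) that vanishes on the boundary, and the one-hidden-layer network below are illustrative assumptions in the spirit of Lagaris et al. (1998), not the exact construction from that paper.

```python
import numpy as np

y0, y1 = 0.0, 1.0  # assumed Dirichlet values at x = 0 and x = 1

def A(x):
    # Fixed part carrying the boundary conditions: A(0) = y0, A(1) = y1
    return y0 * (1.0 - x) + y1 * x

def N(x, p):
    # One-hidden-layer network N(x, p); p packs weights, biases, output weights
    w, b, v = p
    return np.tanh(np.outer(x, w) + b) @ v

def y_trial(x, p):
    # x(1 - x) vanishes at both boundaries, so the BCs hold for ANY p;
    # only the residual of the PDE needs to be minimized over p
    return A(x) + x * (1.0 - x) * N(x, p)

rng = np.random.default_rng(1)
p = (rng.standard_normal(8), rng.standard_normal(8), rng.standard_normal(8))
x = np.linspace(0.0, 1.0, 11)
y = y_trial(x, p)
```

Because the boundary conditions are satisfied by construction, training reduces to an unconstrained minimization over p.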
Although the initial CNNs used for solving PDEs gave promising results, they had certain issues. These included a lack of interpretability, weak adaptation of their structure to problems, and average performance (Ruthotto and Haber 2020; Uriarte et al. 2023).
2.2.2 Physics-informed neural networks (PINNs) for PDE solving
Taking inspiration from the works of Lagaris et al. (1998) and Hornik (1991), wherein neural networks were used for universal approximation, a new genre of CNNs was introduced wherein physical constraints in the form of PDEs were added to the loss function, hence the name Physics-informed neural networks (PINNs) (Raissi et al. 2019; Baydin et al. 2018). More specifically, the technique involves expressing the laws of physics, given by PDEs, as CNN loss functions; in turn, these loss functions can be optimized to find solutions (Maziar and George 2018). PINNs do not need discretization of the domains. They are also quite practical, as heavy computation is avoided. PINNs use minimization techniques for non-linear parametric PDEs of the form given by Eq. 9:
Here y(x, t) is the hidden solution, \(N[\cdot ; \lambda ]\) is a non-linear operator parameterized by \(\lambda\), and \(\Omega \subset {\mathbb {R}^D}\), where D is the number of dimensions.
Using PINNs, Raissi et al. extensively studied complex dynamic processes like flow past a cylinder and aneurysm flows (Raissi et al. 2019, 2020). Figure 2 shows the schematic of the PINN. Although the PINN was a PDE-dedicated CNN model with better performance, it suffered from the issues of other CNNs, like lack of interpretability, need for large data, and long training times (Mowlavi and Nabi 2023; Meng et al. 2023). In spite of these, it was partly successful as an expert system for solving PDEs, owing to its PDE-dedicated framework (Tang et al. 2023a; Jia et al. 2022).
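The PINN physics loss can be sketched numerically: it is the mean squared PDE residual evaluated at collocation points. Real PINNs obtain the derivatives of the network by automatic differentiation; in this sketch, central finite differences stand in for autograd, the PDE is assumed to be a simple heat equation \(u_t - \nu u_{xx} = 0\), and the candidate field is a fixed closed-form function rather than a trained network.

```python
import numpy as np

nu = 0.1  # assumed viscosity/diffusivity

def u(x, t):
    # Candidate solution field (illustrative): u = exp(-t) sin(pi x)
    return np.exp(-t) * np.sin(np.pi * x)

def pde_residual(x, t, h=1e-4):
    # Residual of u_t - nu * u_xx at (x, t), via central finite differences
    u_t  = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_t - nu * u_xx

rng = np.random.default_rng(2)
xc = rng.uniform(0.0, 1.0, 200)  # collocation points in space
tc = rng.uniform(0.0, 1.0, 200)  # collocation points in time
physics_loss = float(np.mean(pde_residual(xc, tc) ** 2))
```

In a full PINN this physics loss is added to data/boundary losses and minimized over the network parameters.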
Schematic of the Physics-informed neural network (PINN) used for solving PDEs. Huang et al. (2022)
2.2.3 DeepONet CNN for PDE solving
Due to improved computing capability and the availability of high-performance computing (HPC) systems, CNN implementation and training became convenient. Subsequently, different techniques which were previously difficult to implement were realized. Lu et al. considered Chen and Chen's non-linear operators (Chen and Chen 1995a, b) as a theoretical basis for justifying the use of neural networks for operator learning. They came up with a robust model called DeepONet (Lu et al. 2021b). As different networks have superior expressibility and particular advantages, e.g., in computer vision and in sequential analysis respectively, the authors used multiple networks in the DeepONet model to diversify and target various net assemblies. Also, Lu et al. exploited a bifurcated parallel structure in their proposed model. Figure 3 shows the schematic of the DeepONet model. FEA-Net (Yao et al. 2019), the Hierarchical deep-learning neural network (Lei et al. 2021), etc., are other examples of CNNs used for PDE solving.
Schematic of the DeepONet CNN model for approximation of the nonlinear operator G(u)(v) where u(x) and v are the variables of the PDE to be solved (Huang et al. 2022)
2.2.4 Recurrent neural networks (RNNs) for PDE solving
Ren et al. (2022) proposed a PINN called the physics-informed convolutional recurrent network (PhyCRNet) for solving PDEs. They used a convolutional encoder-decoder long short-term memory (LSTM) network for low-dimensional feature extraction and evolutional learning. PDE residuals are then used for estimating the PDE solution. The residual is the value obtained when the current estimate is substituted into the discretized PDE. If the PDE solution depends on time, then the residuals have to converge at every time step. Ren et al. (2022) used the PDE residual R(x, y, t, \(\theta\)) given by Eq. 10:
Here \({y \in \mathbb {R}^n}\) is the solution of the PDE in the temporal domain \(t \in [0,T]\) and the physical domain \(\Omega\). \(y_t\) is the first-order time derivative, \(\nabla _x\) is the gradient with respect to x, and F is the non-linear function with parameter \(\lambda\). It must be noted that F plays an important role in Eq. 10 because the residual is obtained by substituting the iterative input parameters into the PDE function.
The loss function L is the sum of squares of the residuals of the PDEs. For a 2D PDE system, L is given by Eq. 11:
Here n and m are the height and the width in the spatial domain, respectively, T is the total number of time-steps, and \(||\cdot ||_2\) is the \(l_2\) norm.
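The aggregation in Eq. 11 can be sketched directly: the loss is the sum of squared residuals over an n x m spatial grid and T time steps. The residual fields below are synthetic (random values decaying over time), standing in for R(x, y, t, \(\theta\)) obtained by substituting the network prediction into the discretized PDE.

```python
import numpy as np

n, m, T = 16, 16, 10  # spatial grid height and width, number of time steps
rng = np.random.default_rng(3)

# Hypothetical residual fields, one per time step, shrinking as training converges
R = rng.standard_normal((T, n, m)) * np.exp(-np.arange(T))[:, None, None]

# L = sum over all time steps and grid points of the squared residual
loss = float(np.sum(R ** 2))
```

Driving this scalar toward zero over all grid points and time steps is what enforces the PDE in PhyCRNet-style training.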
The loss function was based on PDE residuals. The initial and boundary conditions were fixed inside the network. The model was enhanced by autoregressive as well as residual links, which simulated the flow of time. Using their model, they solved three types of PDEs, viz. the 2D Burgers' equation, the \(\lambda -\omega\) equation, and the FitzHugh-Nagumo equation, and promising results were obtained. Figure 4 gives an overview of the PhyCRNet architecture.
Schematic of the PhyCRNet model for solving PDEs. Each RNN cell consists of an encoder unit, a Long short term memory (LSTM) unit and a decoder unit. C and h are the cell and hidden states of the Convolutional long short term memory (ConvLSTM) units respectively. BC Encoding refers to the Boundary condition (BC) encoding wherein the BCs are enforced on the output variables. \({u_0}\) is the Initial condition (IC) and \({u_i}\) refers to the state variable for time-step \({t \in [1,T]}\). Ren et al. (2022)
RNNs are usually used for prediction of time-series problems. Their application to PDE solving in forms like PhyCRNet, marked a shift of technique and gave promising results. However, the performance of RNNs for solving PDEs is affected by issues like high complexity, narrow scope, long-training times, etc.
2.3 Long short term memory (LSTM) neural networks for PDE solving
In their work (José et al. 2021), Ferrandis et al. proposed prediction of naval vessel motion governed by PDEs using Long short term memory (LSTM) neural networks. The input to their LSTM model was the stochastic wave elevation for a particular sea state, and the output comprised the vessel motions, viz. pitch, heave, and roll. Promising prediction results were obtained for the vessel motion for arbitrary wave elevations. They trained their LSTM neural networks using offline simulation and extended the prediction to online mode. Their objective function for minimization during training was the Mean squared error (MSE) given by Eq. 12:
Here X is the observed value vector with components \(X_i\), and \(\hat{X}\) is the predicted value vector with components \(\hat{X}_i\).
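Written out, the MSE of Eq. 12 is \(\frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{X}_i)^2\). The toy observed and predicted motion values below are made up purely for illustration.

```python
import numpy as np

X     = np.array([0.10, -0.25, 0.40, 0.05])  # observed motions (hypothetical)
X_hat = np.array([0.12, -0.20, 0.35, 0.00])  # LSTM predictions (hypothetical)

# MSE = (1/n) * sum_i (X_i - X_hat_i)^2
mse = float(np.mean((X - X_hat) ** 2))
```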
Their work was grounded in the universal approximation theorem for functional problems. They claimed that their work was the first to implement such a model for real engineering processes. A schematic of the general LSTM neural network model is shown in Fig. 5.
Schematic of the general Long short term memory (LSTM) neural network illustrating the unfolding of the feedback loop, which makes it suitable for sequential data processing. In the unfolded model, \(x_i\) is the input to the \(i^{th}\) cell, \(c_i\) is the cell-state of the \(i^{th}\) cell, and \(h_i\) is the hidden-state of the \(i^{th}\) cell. The subscripts \(t-1\), t and \(t+1\) represent three successive time-steps. Song et al. (2020)
As is evident from the above, LSTM NNs are complex in nature. In spite of this, their recurrent nature makes them suitable for regression problems like time-series prediction tasks. However, LSTM NNs are used less in PDE solving, and as of now few such works are found, owing to the complexity of modeling PDEs with LSTM NNs. In spite of this, some unique applications of LSTM NN based PDE solving have been proposed, and promising results are being obtained.
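The recurrence unfolded in Fig. 5 can be made concrete with one forward step of a standard LSTM cell: input, forget, and output gates plus a candidate cell state. The gate formulation is the usual one; the weight shapes and random values below are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases
    H = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])   # input gate
    f = sigmoid(z[1*H:2*H])   # forget gate
    o = sigmoid(z[2*H:3*H])   # output gate
    g = np.tanh(z[3*H:4*H])   # candidate cell state
    c = f * c_prev + i * g    # new cell state: forget old, admit new
    h = o * np.tanh(c)        # new hidden state
    return h, c

rng = np.random.default_rng(4)
D, H = 3, 5  # input size, hidden size (assumptions)
W = rng.standard_normal((4*H, D))
U = rng.standard_normal((4*H, H))
b = np.zeros(4*H)

h, c = np.zeros(H), np.zeros(H)
for step in range(4):  # unroll over a short input sequence
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
```

The cell state c carried across steps is what lets the network retain the long-range dependencies exploited in sequence-based PDE modeling.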
2.4 Generative adversarial networks (GANs) for PDE solving
Generative adversarial networks (GANs) (Goodfellow et al. 2014; Gui et al. 2021; Yang 2019; Gao and Ng 2022) are machine learning algorithms that use deep learning for the generation of new data. A GAN is made up of two NNs, i.e., a generator and a discriminator, which are trained together. The generator generates artificial data which imitates the real data, while the discriminator separates the generated data from the real data. GANs have become popular with tasks like image-morphing (Gui et al. 2021). In their work (Gao and Ng 2022), Gao and Ng proposed a physics-informed GAN called the Wasserstein Generative Adversarial Network (WGAN), which was used for solving PDEs. These GANs use a unique function, known as the Wasserstein function, for convergence. The Wasserstein function is suitable for PDE solving and leads to convenient convergence; its usage has paved the way for using GANs for PDE solving. They stated that GANs could be formulated in the general form given by Eq. 13:
In Eq. 13, G and F are the classes of the generator and the discriminator, respectively. \(\pi\) is the source distribution and v is the distribution used for approximation. \(g_\theta\) and \(f_\alpha\) are the generator and discriminator functions, respectively. x, z, \(\alpha\), and \(\theta\) are the inputs to the PDE function to be solved; they form the input parameter-space to be explored for solving the PDE. \(\mathbb {E}\) denotes the expectation in the objective to be minimized for solving the PDE.
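The empirical Wasserstein objective behind this formulation can be sketched: the critic \(f_\alpha\) is trained to maximize mean(f(x)) - mean(f(g(z))), and the generator \(g_\theta\) to minimize it. The linear critic, affine generator, and 1D distributions below are hypothetical placeholders, not the networks of Gao and Ng (2022).

```python
import numpy as np

rng = np.random.default_rng(5)

def f_critic(x, a):
    # Hypothetical critic; |a| <= 1 keeps it 1-Lipschitz in this 1D sketch
    return a * x

def g_gen(z, theta):
    # Hypothetical generator: shift and scale latent noise
    return theta[0] + theta[1] * z

x_real = rng.normal(2.0, 1.0, 1000)   # samples from the target distribution pi
z      = rng.standard_normal(1000)    # latent noise samples

theta = np.array([0.0, 1.0])          # untrained generator: outputs N(0, 1)
# Empirical Wasserstein-style objective: E[f(x)] - E[f(g(z))]
w_loss = float(np.mean(f_critic(x_real, 1.0)) - np.mean(f_critic(g_gen(z, theta), 1.0)))
```

Here the untrained generator leaves a gap of roughly 2 (the shift between the two means); training the generator drives this objective toward zero as the generated distribution matches the target.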
The authors of Gao and Ng (2022) showed that the generalization error to be minimized for obtaining the PDE solution, converged to the approximation error of the GAN model for large data, and they obtained promising results. Schematic for the WGAN used in their work is shown in Fig. 6.
Schematic of the GAN. The GAN is trained till the discriminator which compares the real data x and the generated data G(z) cannot distinguish between the two (Wang et al. 2017)
There are many types of GANs in the literature, and they have been applied to diverse engineering problems. The most popular application of GANs is image-morphing, though they are not limited to it. Using GANs for solving PDEs is thus a unique application, because of the complexity of adapting GANs to specific framework problems like PDE solving. However, as seen with WGANs, progress in this area is being made, and promising results have been obtained.
2.5 Transformers for PDE solving
Inspired by the initial work on Transformers (Vaswani et al. 2017), Shuhao Cao applied self-attention based Transformers to data-driven learning for PDEs (Cao 2021). He used Hilbert space approximation for operators and showed that softmax normalization was not needed for the scaled dot-product in attention mechanisms. He introduced a novel normalization layer which mimicked the Petrov-Galerkin projection for scalable propagation through the attention-based layers in Transformers. The Galerkin attention operator uses the best approximation of f in the norm \(||\cdot ||_H\), as given by Eq. 14:
Here H is the Hilbert space, and \(f \in H\). \((\mathbb {Q},\mathbb {V})\) refer to the Query and Value subspaces used in the attention maps respectively. \(g_\theta (\cdot )\) is a learnable map of the Galerkin attention operator. \(f_h \in \mathbb {Q}_h\) is the best approximation of f in \(|| \cdot ||_H\), \(b(\cdot , \cdot ):V \times Q \rightarrow \mathbb {R}\) is the continuous bilinear form and c is the boundary condition limit. \(y \in \mathbb {R}^{n \times d}\) is the current latent representation.
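The softmax-free attention described above can be sketched as follows: the keys and values are layer-normalized and contracted first, so no n x n score matrix or softmax is ever formed and the cost is linear in sequence length n. The shapes and the placement of the layer normalization follow the description in the text; the random inputs are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def galerkin_attention(Q, K, V):
    # z = Q (K~^T V~) / n : the d x d contraction is computed first,
    # avoiding the n x n attention matrix and the softmax entirely
    n = Q.shape[0]
    return Q @ (layer_norm(K).T @ layer_norm(V)) / n

rng = np.random.default_rng(6)
n, d = 64, 8  # sequence length and head dimension (assumptions)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
z = galerkin_attention(Q, K, V)
```

Since \((K^{\top}V)\) is only d x d, memory and compute scale as O(n d^2) rather than O(n^2 d), which is what makes this attention attractive for fine PDE discretizations.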
This novel technique helped the Transformer model obtain good accuracy in learning operators for un-normalized data tasks. The three PDE problems used for experimentation were the Burgers' equation, the Darcy flow interface problem, and a coefficient identification problem. He called his improved Transformer the Galerkin Transformer; it demonstrated a lower training cost and better performance than its conventional counterparts. A schematic of the Galerkin attention mechanism is shown in Fig. 7.
Schematic of the Galerkin attention module used in the novel Transformer which is used to solve PDEs. Here (Q, K, V) are the (Query, Key and Value) matrices of the input to the attention module. \(K^T\) denotes the transpose of the Key matrix K. z is the attention-value matrix which is the output of the attention module. Cao (2021)
Transformers were initially developed for Natural language processing (NLP) and were later adapted to computer vision in the form of Vision transformers (ViTs). Their recent application to PDE solving is an encouraging step, because the huge number of training parameters in Transformers can effectively absorb the large number of dependencies in PDEs. However, Transformers have their own issues, e.g., the need for very large amounts of training data, long training times, and performance that in some tasks still does not rival that of CNNs. Now, as more training data for semi-supervised, unsupervised, and formula-driven supervised learning become available, these issues are being addressed.
2.6 Deep reinforcement learning neural networks (DRLNNs) for solving PDEs
Deep reinforcement learning neural networks (DRLNNs) (Hafiz 2023; Hafiz et al. 2023) are deep networks using Reinforcement learning (RL) (Hafiz et al. 2021). DRLNNs have also been used for solving PDEs (Han et al. 2018). Han et al. (2018) proposed a deep-learning-based technique capable of solving high-dimensional PDEs. They reformulated the PDEs using stochastic DEs, and the solution gradient was approximated by deep neural networks using RL. Their backward stochastic differential equation (BSDE) played the role of model-based RL, and the solution gradient played the role of the policy function (Han et al. 2018). Their model showed promising results, as shown in Table 1. The PDE used for experimentation was a high-dimensional (d = 100) equation of Gobet and Turkedjiev (Gobet and Turkedjiev 2017), given by Eq. 15 as:
where (t, x) are the temporal and spatial variables of the oscillating solution \(y^{*}(t,x)\) given by Eq. 16 as:
Here D is the dimensionality of the system and T is the terminal time. In the above equations, k = 1.6, \(\lambda = 0.1\), and T = 1.
It is observed from Table 1 that the SD is quite low and that both the mean error (%) and the SD decrease as the number of layers in the deep network increases. This testifies that large-data models like DRLNNs are quite capable of solving complex high-dimensional PDEs. RL was initially limited to basic algorithms but has since developed into a substantial field of research with numerous techniques for many applications. With the development of DRLNNs, the fields of deep learning and RL were merged. However, the merger is not seamless, owing to the distinct natures of the two fields. With RL there is also the potential issue of RL systems going ‘rogue’ through greedy reward maximization and harming their frameworks and environments. Nevertheless, DRLNNs offer solutions to many important modern-day problems and are generally well controlled (Raissi 2024; Siegel et al. 2023).
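The BSDE reformulation above rests on simulating forward SDE paths, along which the network approximating the solution gradient is evaluated like a policy. A minimal Euler–Maruyama path simulator can be sketched as follows; the scalar drift and diffusion used here are illustrative toy choices, not the equation from Han et al. (2018).

```python
import numpy as np

def euler_maruyama(x0, mu, sigma, T=1.0, n_steps=100, n_paths=1000, seed=0):
    """Simulate forward SDE paths dX_t = mu(X_t) dt + sigma(X_t) dW_t.

    In deep-BSDE methods, such forward paths play the role of the RL
    'environment'; a neural network approximating the solution gradient
    is evaluated along each path like a policy function.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        x = x + mu(x) * dt + sigma(x) * dw               # explicit Euler step
    return x

# Toy usage: geometric Brownian motion, mu(x) = 0.05 x, sigma(x) = 0.2 x.
xT = euler_maruyama(1.0, lambda x: 0.05 * x, lambda x: 0.2 * x)
print(xT.mean())  # sample mean of X_T, close to exp(0.05) for GBM
```

In the full method, the terminal values of the simulated backward process are matched against the PDE's terminal condition, and that mismatch is the training loss.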
A graphical abstract of the above models for PDE solving is given in Fig. 8 in the form of a timeline. It can be observed from the timeline that the majority of large-data models for solving PDEs are neural networks. The main large-data models discussed in this work, the PDEs they solve, and the pros and cons of these models are summarized in Table 2.
3 Current trends
There has been significant research on analyzing the generalization errors of techniques like PINNs (Mishra and Molinaro 2022; Penwarden et al. 2023) and DeepONet (Kovachki et al. 2021a; Lanthaler et al. 2022), and these techniques perform very well. Lu et al. (2022a) compared the efficiency of the two. Much current research builds incrementally on existing methods like PINNs. The model architecture for neural-network-based PDE solving consists of the input, the neural-net approximation, and the network loss function (Huang et al. 2022). A three-pronged approach has been used to improve performance, as discussed below.
-
Loss function: Improving the loss function is very useful for obtaining superior performance. Jagtap et al. (2020) improved PINNs with conservative PINNs (cPINNs) and extended PINNs (XPINNs) (Jagtap and Karniadakis 2021), using multiple domains and additional constraints. Kutz et al. and Patel et al. did the same using parsimony (Nathan Kutz and Brunton 2022) and spatio-temporal schemes (Patel et al. 2022), respectively. Other studies have treated the initial and boundary conditions differently; for instance, Li et al. (2022) fused partial integration and level-set techniques.
-
Model used: Recurrent neural networks (RNNs), Generative adversarial networks (GANs), Long short-term memory (LSTM) neural networks, and even attention-based Transformers have recently been used and have given better performance in solving PDEs (Ren et al. 2022; José et al. 2021; Cao 2021; Gao and Ng 2022; Meng et al. 2022). The U-FNO with additional U-Fourier layers (Wen et al. 2022) and attention-enhanced models (Peng et al. 2022) have been developed. Research has also been done on optimizing activation functions (Wang et al. 2021; Yifan and Zaki 2021; Venturi and Casey 2023).
-
Input: Kovacs et al. (2022) used the parameters determining the eigenvalue-equation coefficients as additional inputs. Hadorn (Patrik 2022) improved the DeepONet model by allowing the base function to shift and scale the input values. Lye et al. (2021) improved the accuracy of ML techniques by using a multi-level approach.
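The three components above (input, neural-net approximation, loss function) combine into a single training objective. A minimal NumPy sketch for the 1-D Poisson problem \(-u'' = f\) with zero boundary conditions is given below; it substitutes finite differences for the automatic differentiation a real PINN would use, and the toy network and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy MLP u_theta(x): one hidden tanh layer standing in for the
# "neural-net approximation" component (weights left untrained).
W1, b1 = rng.standard_normal((16, 1)), np.zeros((16, 1))
W2, b2 = rng.standard_normal((1, 16)), np.zeros((1, 1))

def u(x):
    # x: (1, n) row of collocation points -> (1, n) network output.
    return W2 @ np.tanh(W1 @ x + b1) + b2

def pinn_loss(x, f, h=1e-3):
    """Composite PINN-style loss for -u''(x) = f(x), u(0) = u(1) = 0.

    u'' is approximated by a central finite difference here; a real
    PINN would differentiate the network exactly via autodiff.
    """
    u_xx = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
    residual = np.mean((-u_xx - f) ** 2)                    # PDE residual term
    boundary = u(np.array([[0.0]]))**2 + u(np.array([[1.0]]))**2
    return residual + float(boundary.sum())                 # unit loss weights

# Interior collocation points and a manufactured source term.
x = np.linspace(0.05, 0.95, 50)[None, :]
f = np.pi**2 * np.sin(np.pi * x)
loss = pinn_loss(x, f)
print(loss)  # a finite, non-negative scalar to be minimized over the weights
```

Training would minimize this scalar over (W1, b1, W2, b2); the three research directions above correspond to modifying the loss terms, the network u, and the inputs x, respectively.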
The state-of-the-art technologies have led to much better capability. Lu et al. (2021a) crafted the DeepXDE library using PINNs and DeepONet. In industry, NVIDIA, the well-known Graphics Processing Unit (GPU) manufacturer, has combined PINNs, DeepONet, and enhanced neural-operator techniques in a toolbox for building digital twins. Advances in quantum-based technology also favor numerical optimization (Swan et al. 2021). In addition, research (Wang et al. 2022) is being done on PDE solving with Gaussian processes (Chen et al. 2021) and hybrid finite element method–neural network models (Mitusch et al. 2021). Li et al. have extended their earlier Fourier neural operator work into a general framework for operator learning (Kovachki et al. 2021b). A notable aspect of the newly developed model is its discretization invariance, which makes it applicable to a wider range of problems. As mentioned earlier, as in the case of DeepONet (Lu et al. 2021b), there is an effort to apply state-of-the-art deep learning models in this area. Although the new techniques are numerous, they share a common theme: the boundary between theoretical process mechanisms and experimental data is being dissolved, and the fusion of these two aspects has yielded profound improvements (Nelsen and Stuart 2021; Kadeethum et al. 2021; Gupta and Jaiman 2022; Gin et al. 2021; Bao et al. 2020; Jin et al. 2022; Lu et al. 2022b). Using alternative architectures such as Transformers (Cao 2021) for building large-data models is also an interesting pointer to the potential still to be unlocked in PDE solving. The diversity of large-data models available today likewise offers rich choices for solving PDEs, with many potential applications (Antony et al. 2023; Li et al. 2023; Shen et al. 2023).
4 Issues and future scope
The merging of scientific computing with deep learning is very likely to continue, given the advances in technology and research (Huang et al. 2022). However, this trend is reaching a flash point: abundant computational resources now warrant new research directions. The gaps between theoretical models and experimental data also pose a problem, and eliminating them is difficult by conventional means. Further, although CNNs have robust pattern-recognition capabilities, their interpretability and inner workings are not extensively researched. New techniques like partial integration and numerical optimization have tried to relate prior knowledge of PDEs to the new big-data information from models like CNNs. Large-data models are often described as ‘data-hungry’ due to their need for extensive training, so ways to augment training data without huge amounts of ‘natural’ data need to be researched; this is an open research area. Interpretability and complexity likewise raise issues, and addressing them remains an open problem.
Techniques can be coarsely classified into iterative numerical techniques and machine-learning-based techniques (Psaros et al. 2023). Numerical analysis directly depicts the DE mechanism, whereas ML techniques use probabilistic expressions to characterize data. However, for specific PDE-solving tasks, approximation can also be used; an example is the parameter approximation used in CNNs for PDEs. Specialized networks like DeepONet may also be used for functional description of large-data physical models. These techniques are promising for solving PDEs, so using different large-data models can draw on their respective strengths. Beyond neural networks and bifurcated structures, different models may be used (Jagtap et al. 2022; Sirignano and Spiliopoulos 2018; Gupta et al. 2021). Conversely, patterns from dynamical systems can be fed back to enhance the models, which can substantially benefit the interpretability of model learning. As mentioned above, one interesting area is the generation of training data for ‘data-hungry’ models without manual collection, e.g., mathematical formula-based generation of training data through Formula-driven supervised learning (FDSL) (Hafiz et al. 2023).
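Formula-driven generation of training data can be illustrated with manufactured solutions: pick a closed-form solution family, derive the matching source term analytically, and emit (source, solution) pairs with no manual data collection. The family, grid size, and interface below are illustrative assumptions, not a method from the surveyed works.

```python
import numpy as np

def manufactured_poisson_pair(k, n=64):
    """One (source, solution) training pair for -u'' = f on [0, 1].

    Uses the closed-form family u_k(x) = sin(k*pi*x), for which the
    source term is analytically f_k(x) = (k*pi)^2 * u_k(x), so both
    arrays are exact by construction.
    """
    x = np.linspace(0.0, 1.0, n)
    u = np.sin(k * np.pi * x)
    f = (k * np.pi) ** 2 * u
    return f, u

# A small formula-driven dataset: frequencies k = 1..8.
dataset = [manufactured_poisson_pair(k) for k in range(1, 9)]
print(len(dataset))  # 8
```

An operator-learning model (e.g., a DeepONet-style network) could then be trained on such pairs to map f to u, with the family made arbitrarily large and diverse at zero labeling cost.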
5 Conclusion
In this review paper, an overview of solving partial differential equations (PDEs) using large-data models was given. An introduction to the area was presented along with its publication trends. This was followed by a discussion of various techniques for solving PDEs using large-data models. The large-data models discussed included Convolutional neural networks (CNNs), Recurrent neural networks (RNNs), Long short-term memory (LSTM) neural networks, Generative adversarial networks (GANs), attention-based Transformers, and Deep reinforcement learning neural networks (DRLNNs). The pros and cons of these techniques were discussed, and a trend timeline was also given. Then, the major issues and future scope of the area were discussed. Finally, we hope this literature survey becomes a quick guide for researchers and motivates them to consider using large-data models to solve significant problems in PDE-based mathematical modeling.
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Alt T, Schrader K, Augustin M, Peter P, Weickert J (2023) Connections between numerical algorithms for PDEs and neural networks. J Math Imaging Vis 65(1):185–208
Antony ANM, Narisetti N, Gladilin E (2023) FDM data driven U-Net as a 2D Laplace PINN solver. Sci Rep 13(1):9116
Ashtiani MN, Raahemi B (2023) News-based intelligent prediction of financial markets using text mining and machine learning: a systematic literature review. Expert Syst Appl 217:119509
Bao G, Ye X, Zang Y, Zhou H (2020) Numerical solution of inverse problems by weak adversarial networks. Inverse Probl 36(11):115003
Baydin AG, Pearlmutter BA, Radul AA, Siskind JM (2018) Automatic differentiation in machine learning: a survey. J Mach Learn Res 18:1–43
Bhangale KB, Kothandaraman M (2022) Survey of deep learning paradigms for speech processing. Wirel Pers Commun 125(2):1913–1949
Boussange V, Becker S, Jentzen A, Kuckuck B, Pellissier L (2023) Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions. Partial Differ Equ Appl 4(6):51
Boussif O, Bengio Y, Benabbou L, Assouline D (2022) MAgnet: mesh agnostic neural PDE solver. Adv Neural Inf Process Syst 35:31972–31985
Cao S (2021) Choose a transformer: Fourier or Galerkin. Adv Neural Inf Process Syst 34:24924–24940
Chen T, Chen H (1995a) Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Trans Neural Netw 6(4):904–910
Chen T, Chen H (1995b) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans Neural Netw 6(4):911–917
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen Y, Hosseini B, Owhadi H, Stuart AM (2021) Solving and learning nonlinear PDEs with gaussian processes. J Comput Phys 447:110668
Chen M, Niu R, Zheng W (2023) Adaptive multi-scale neural network with resnet blocks for solving partial differential equations. Nonlinear Dyn 111(7):6499–6518
Choi J, Kim N, Hong Y (2023) Unsupervised Legendre-Galerkin neural network for solving partial differential equations. IEEE Access 11:23433–23446
El-Ajou A (2021) Adapting the Laplace transform to create solitary solutions for the nonlinear time-fractional dispersive PDEs via a new approach. Eur Phys J Plus 136(229):1–22
Fang X, Qiao L, Zhang F, Sun F (2023) Explore deep network for a class of fractional partial differential equations. Chaos Solitons Fract 172:113528
Farlow SJ (2006) An introduction to differential equations and their applications. Dover Publications, Mineola
Fuhg JN, Karmarkar A, Kadeethum T, Yoon H, Bouklas N (2023) Deep convolutional Ritz method: parametric PDE surrogates without labeled data. Appl Math Mech 44(7):1151–1174
Gao Y, Ng MK (2022) Wasserstein generative adversarial uncertainty quantification in physics-informed neural networks. J Comput Phys 463:111270
Gin CR, Shea DE, Brunton SL, Nathan Kutz J (2021) DeepGreen: deep learning of Green’s functions for nonlinear boundary value problems. Sci Rep 11(1):21614
Gobet E, Turkedjiev P (2017) Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations. Stoch Process Appl 127(4):1171–1203
Grohs P, Hornung F, Jentzen A, Zimmermann P (2023) Space-time error estimates for deep neural network approximations for differential equations. Adv Comput Math 49(1):4
Gupta R, Jaiman R (2022) A hybrid partitioned deep learning methodology for moving interface and fluid-structure interaction. Comput Fluids 233:105239
Gupta G, Xiao X, Bogdan P (2021) Multiwavelet-based operator learning for differential equations. Adv Neural Inf Process Syst 34:24048–24062
Hafiz AM (2023) A survey of deep Q-networks used for reinforcement learning: state of the art. In: Rajakumar G, Ke-Lin D, Vuppalapati C, Beligiannis GN (eds) Intelligent communication technologies and virtual mobile networks. Springer, Singapore, pp 393–402
Hafiz AM, Bhat GM (2020a) A survey on instance segmentation: state of the art. Int J Multimed Inf Retr 9(3):171–189
Hafiz AM, Bhat GM (2020b) A survey of deep learning techniques for medical diagnosis. In: Tuba M, Akashe S, Joshi A (eds) Information and communication technology for sustainable development, Singapore. Springer, Singapore, pp 161–170
Hafiz AM, Hassaballah M (2021) Digit image recognition using an ensemble of one-versus-all deep network classifiers. In: Shamim Kaiser M, Xie J, Rathore VS (eds) Information and communication technology for competitive strategies. Springer, Singapore, pp 445–455
Hafiz AM, Parah SA, Bhat RA (2021) Reinforcement learning applied to machine vision: state of the art. Int J Multimed Inf Retr 10(2):71–82
Hafiz AM, Hassaballah M, Alqahtani A, Alsubai S, Hameed MA (2023) Reinforcement learning with an ensemble of binary action deep Q-networks. Comput Syst Sci Eng 46(3):2651–2666
Hafiz AM, Bhat RUA, Parah SA, Hassaballah M (2023) SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Appl 26:1291–1302
Han J, Jentzen A, Weinan E (2018) Solving high-dimensional partial differential equations using deep learning. Proc Natl Acad Sci 115(34):8505–8510
Hassaballah M, Awad AI (2020) Deep learning in computer vision: principles and applications. CRC Press, Boca Raton
Hassaballah M, Hosny KM (2019) Recent advances in computer vision: theories and applications. Springer, Berlin
Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4(2):251–257
Hou J, Li Y, Ying S (2023) Enhancing PINNs for solving PDEs via adaptive collocation point movement and adaptive loss weighting. Nonlinear Dyn 111(16):15233–15261
Hyuk L, In SK (1990) Neural algorithm for solving differential equations. J Comput Phys 91(1):110–131
Innerberger M, Praetorius D (2023) MooAFEM: an object oriented Matlab code for higher-order adaptive FEM for (nonlinear) elliptic PDEs. Appl Math Comput 442:127731
Jagtap AD, Kharazmi E, Karniadakis GE (2020) Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput Methods Appl Mech Eng 365:113028
Jagtap AD, Shin Y, Kawaguchi K, Karniadakis GE (2022) Deep Kronecker neural networks: a general framework for neural networks with adaptive activation functions. Neurocomputing 468:165–180
Jean LK, Fendji E, Tala DCM, Yenke BO, Atemkeng M (2022) Automatic speech recognition using limited vocabulary: a survey. Appl Artif Intell 36(1):2095039
Jia X, Meng D, Zhang X, Feng X (2022) PDNet: progressive denoising network via stochastic supervision on reaction-diffusion-advection equation. Inf Sci 610:345–358
Jiagang Q, Cai W, Zhao Y (2022) Learning time-dependent PDEs with a linear and nonlinear separate convolutional neural network. J Comput Phys 453:110928
Jiang Z, Jiang J, Yao Q, Yang G (2023) A neural network-based PDE solving algorithm with high precision. Sci Rep 13(1):4479
Jin P, Meng S, Lu L (2022) MIONet: learning multiple-input operators via tensor product. SIAM J Sci Comput 44(6):A3490–A3514
Jing L, Tian Y (2021) Self-supervised visual feature learning with deep neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 43(11):4037–4058
José del Águila F, Triantafyllou MS, Chryssostomidis C, Karniadakis GE (2021) Learning functionals via LSTM neural networks for predicting vessel dynamics in extreme sea states. Proc R Soc A 477(2245):20190897
Kadeethum T, O’Malley D, Fuhg JN, Choi Y, Lee J, Viswanathan HS, Bouklas N (2021) A framework for data-driven solution and parameter estimation of PDEs using conditional generative adversarial networks. Nat Comput Sci 1(12):819–829
Kiyani E, Silber S, Kooshkbaghi M, Karttunen M (2022) Machine-learning-based data-driven discovery of nonlinear phase-field dynamics. Phys Rev E 106(6):065303
Kolman R, Okrouhlík M, Berezovski A, Gabriel D, Kopačka J, Plešek J (2017) B-spline based finite element method in one-dimensional discontinuous elastic wave propagation. Appl Math Model 46:382–395
Kovachki N, Lanthaler S, Mishra S (2021a) On universal approximation and error bounds for Fourier neural operators. J Mach Learn Res 22(1):13237–13312
Kovacs A, Exl L, Kornell A, Fischbacher J, Hovorka M, Gusenbauer M, Breth L, Oezelt H, Yano M, Sakuma N et al (2022) Conditional physics informed neural networks. Commun Nonlinear Sci Numer Simul 104:106041
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates Inc, Montreal
Kuznetsova A et al (2020) The open images dataset v4. Int J Comput Vis 128(7):1956–1981
Lagaris IE, Likas A, Fotiadis DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans Neural Netw 9(5):987–1000
Lanthaler S, Mishra S, Karniadakis GE (2022) Error estimates for deeponets: a deep learning framework in infinite dimensions. Trans Math Appl 6(1):tnac001
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Li C, Yang Y, Liang H, Boying W (2022) Learning high-order geometric flow based on the level set method. Nonlinear Dyn 107(3):2429–2445
Li S, Zhang C, Zhang Z, Zhao H (2023) A data-driven and model-based accelerated Hamiltonian Monte Carlo method for Bayesian elliptic inverse problems. Stat Comput 33(4):90
Lu L, Meng X, Mao Z, Karniadakis GE (2021a) DeepXDE: a deep learning library for solving differential equations. SIAM Rev 63(1):208–228
Lu L, Jin P, Pang G, Zhang Z, Karniadakis GE (2021b) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat Mach Intell 3(3):218–229
Lu L, Meng X, Cai S, Mao Z, Goswami S, Zhang Z, Karniadakis GE (2022a) A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data. Comput Methods Appl Mech Eng 393:114778
Lu L, Pestourie R, Johnson SG, Romano G (2022b) Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. Phys Rev Res 4(2):023210
Lye KO, Mishra S, Molinaro R (2021) A multi-level procedure for enhancing accuracy of machine learning algorithms. Eur J Appl Math 32(3):436–469
Mallikarjunaiah SM (2023) A deep learning feed-forward neural network framework for the solutions to singularly perturbed delay differential equations. Appl Soft Comput 148:110863
Maziar R, George EK (2018) Hidden physics models: machine learning of nonlinear partial differential equations. J Comput Phys 357:125–141
Meade Jr AJ, Fernandez AA (1994) Solution of nonlinear ordinary differential equations by feedforward neural networks. Math Comput Model 20(9):19–44
Meade Jr AJ, Fernandez AA (1994) The numerical solution of linear ordinary differential equations by feedforward neural networks. Math Comput Model 19(12):1–25
Melchers H, Crommelin D, Koren B, Menkovski V, Sanderse B (2023) Comparison of neural closure models for discretised PDEs. Comput Math Appl 143:94–107
Meng X, Yang L, Mao Z, del Águila FJ, George EK (2022) Learning functional priors and posteriors from data and physics. J Comput Phys 457:111073
Meng Z, Qian Q, Mengqiang X, Bo Yu, Yıldız AR, Mirjalili S (2023) PINN-FORM: a new physics-informed neural network for reliability analysis with partial differential equation. Comput Methods Appl Mech Eng 414:116172
Michoski C, Milosavljević M, Oliver T, Hatch DR (2020) Solving differential equations using deep neural networks. Neurocomputing 399:193–212
Mishra S, Molinaro R (2022) Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J Numer Anal 42(2):981–1022
Mitusch SK, Funke SW, Kuchta M (2021) Hybrid FEM-NN models: combining artificial neural networks with the finite element method. J Comput Phys 446:110651
Mowlavi S, Nabi S (2023) Optimal control of PDEs using physics-informed neural networks. J Comput Phys 473:111731
Mugler DH, Scott RA (1988) Fast Fourier transform method for partial differential equations, case study: the 2-D diffusion equation. Comput Math Appl 16(3):221–228
Namaki N, Eslahchi MR, Salehi R (2023) The use of physics-informed neural network approach to image restoration via nonlinear PDE tools. Comput Math Appl 152:355–363
Nathan Kutz J, Brunton SL (2022) Parsimony as the ultimate regularizer for physics-informed machine learning. Nonlinear Dyn 107(3):1801–1817
Nelsen NH, Stuart AM (2021) The random feature model for input-output maps between Banach spaces. SIAM J Sci Comput 43(5):A3212–A3243
Oñate E, Owen R (2014) Particle-based methods: fundamentals and applications. Computational Methods in Applied Sciences. Springer, Dordrecht
Patel RG, Manickam I, Trask NA, Wood MA, Lee M, Tomas I, Cyr EC (2022) Thermodynamically consistent physics-informed neural networks for hyperbolic systems. J Comput Phys 449:110754
Peng W, Yuan Z, Wang J (2022) Attention-enhanced neural network models for turbulence simulation. Phys Fluids 34(2):025111
Peng Y, Dan H, Zin-Qin John X (2023) A non-gradient method for solving elliptic partial differential equations with deep neural networks. J Comput Phys 472:111690
Penwarden M, Zhe S, Narayan A, Kirby RM (2023) A metalearning approach for physics-informed neural networks (PINNs): application to parameterized PDEs. J Comput Phys 477:111912
Psaros AF, Meng X, Zou Z, Guo L, Karniadakis GE (2023) Uncertainty quantification in scientific machine learning: methods, metrics, and comparisons. J Comput Phys 477:111902
Qin D, Yanwei D, Liu B, Huang W (2019) A B-spline finite element method for nonlinear differential equations describing crystal surface growth with variable coefficient. Adv Differ Equ 2019(1):1–16
Quan HD, Huynh HT (2023) Solving partial differential equation based on extreme learning machine. Math Comput Simul 205:697–708
Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707
Raissi M, Yazdani A, Karniadakis GE (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367(6481):1026–1030
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Ren P, Rao C, Liu Y, Wang J-X, Sun H (2022) PhyCRNet: physics-informed convolutional-recurrent network for solving spatiotemporal PDEs. Comput Methods Appl Mech Eng 389:114399
Ruthotto L, Haber E (2020) Deep neural networks motivated by partial differential equations. J Math Imaging Vis 62:352–364
Sajid F, Javed AR, Basharat A, Kryvinska N, Afzal A, Rizwan M (2021) An efficient deep learning framework for distracted driver detection. IEEE Access 9:169270–169280
Shaban WM, Elbaz K, Zhou A, Shen S-L (2023) Physics-informed deep neural network for modeling the chloride diffusion in concrete. Eng Appl Artif Intell 125:106691
Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651
Shen C, Appling AP, Gentine P, Bandai T, Gupta H, Tartakovsky A, Baity-Jesi M, Fenicia F, Kifer D, Li L et al (2023) Differentiable modelling to unify machine learning and physical models for geosciences. Nat Rev Earth Environ 4(8):552–567
Siegel JW, Hong Q, Jin X, Hao W, Jinchao X (2023) Greedy training algorithms for neural networks and applications to PDEs. J Comput Phys 484:112084
Sirignano J, Spiliopoulos K (2018) DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys 375:1339–1364
Smets BMN, Portegies J, Bekkers EJ, Duits R (2023) PDE-based group equivariant convolutional neural networks. J Math Imaging Vis 65(1):209–239
Song X, Liu Y, Xue L, Wang J, Zhang J, Wang J, Jiang L, Cheng Z (2020) Time-series well performance prediction based on long short-term memory (LSTM) neural network model. J Pet Sci Eng 186:106682
Swan M, Witte F, dos Santos RP (2021) Quantum information science. IEEE Internet Comput 26(1):7–14
Tang S, Feng X, Wei W, Hui X (2023a) Physics-informed neural networks combined with polynomial interpolation to solve nonlinear partial differential equations. Comput Math Appl 132:48–62
Tang K, Wan X, Yang C (2023b) DAS-PINNs: a deep adaptive sampling method for solving high-dimensional partial differential equations. J Comput Phys 476:111868
Tanyu DN, Ning J, Freudenberg T, Heilenkötter N, Rademacher A, Iben U, Maass P (2023) Deep learning methods for partial differential equations and related parameter identification problems. Inverse Probl 39(10):103001
Taylor JM, Pardo D, Muga I (2023) A deep Fourier residual method for solving PDEs using neural networks. Comput Methods Appl Mech Eng 405:115850
Uriarte C, Pardo D, Muga I, Muñoz-Matute J (2023) A deep double Ritz method (D2RM) for solving partial differential equations using neural networks. Comput Methods Appl Mech Eng 405:115892
Venturi S, Casey T (2023) SVD perspectives for augmenting DeepONet flexibility and interpretability. Comput Methods Appl Mech Eng 403:115718
Versteeg HK, Malalasekera W (2011) An introduction to computational fluid dynamics: the finite volume method. Pearson Education, London
Wang K, Gou C, Duan Y, Lin Y, Zheng X, Wang F-Y (2017) Generative adversarial networks: introduction and outlook. IEEE/CAA J Autom Sin 4(4):588–598
Wang S, Wang H, Perdikaris P (2021) On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks. Comput Methods Appl Mech Eng 384:113938
Wang H, Planas R, Chandramowlishwaran A, Bostanabad R (2022) Mosaic flows: a transferable deep learning framework for solving PDEs on unseen domains. Comput Methods Appl Mech Eng 389:114424
Wen G, Li Z, Azizzadenesheli K, Anandkumar A, Benson SM (2022) U-FNO: an enhanced Fourier neural operator-based deep-learning model for multiphase flow. Adv Water Resour 163:104180
Willard J, Jia X, Shaoming X, Steinbach M, Kumar V (2022) Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Comput Surv 55(4):1–37
Xiang H, Zou Q, Nawaz MA, Huang X, Zhang F, Yu H (2023) Deep learning for image inpainting: a survey. Pattern Recogn 134:109046
Xu M, Yoon S, Fuentes A, Park DS (2023) A comprehensive survey of image augmentation techniques for deep learning. Pattern Recogn 137:109347
Yentis R, Zaghloul ME (1996) VLSI implementation of locally connected neural network for solving partial differential equations. IEEE Trans Circ Syst I: Fundam Theory Appl 43(8):687–690
Yifan D, Zaki TA (2021) Evolutional deep neural network. Phys Rev E 104(4):045303
Yiqi G, Ng MK (2023) Deep neural networks for solving large linear systems arising from high-dimensional problems. SIAM J Sci Comput 45(5):A2356–A2381
Zhang L, Cheng L, Li H, Gao J, Cheng Y, Domel R, Yang Y, Tang S, Liu WK (2021) Hierarchical deep-learning neural networks: finite elements and beyond. Comput Mech 67:207–230
Zhao Y, Yang G (2023) Deep learning-based integrated framework for stock price movement prediction. Appl Soft Comput 133:109921
Zienkiewicz OC, Taylor RL (2000) The finite element method, the basis. The finite element method. Wiley, New York
Amanat A, Rizwan M, Javed AR, Maha A, Alsaqour R, Pandya S, Uddin M (2022) Deep learning for depression detection from textual data. Electronics 11(5)
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255
Girshick R (2015) Fast R-CNN. In: IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 580–587
Gobovic D, Zaghloul ME (1993) Design of locally connected CMOS neural cells to solve the steady-state heat flow problem. In: 36th midwest symposium on circuits and systems. IEEE, pp 755–757
Gobovic D, Zaghloul ME (1994) Analog cellular neural network with application to partial differential equations with variable mesh-size. In: IEEE international symposium on circuits and systems. IEEE, vol 6, pp 359–362
Goodfellow IJ, Pouget-Abadie J, Mirza M, Bing X, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. arxiv preprint arxiv:1406:2661
Gordijn B, ten Have H (2023) ChatGPT: evolution or revolution? Medicine, health care and philosophy, pp 1–2
Gui J, Sun Z, Wen Y, Tao D, Ye J (2021) A review on generative adversarial networks: algorithms, theory, and applications. IEEE Trans Knowl Data Eng
Hafiz AM, Bhat GM (2021) Fast training of deep networks with one-class CNNs. In: Modern approaches in machine learning and cognitive science: a walkthrough: latest trends in AI. Springer, Berlin, pp 409–421
Hafiz AM, Bhat RA, Hassaballah M (2022) Image classification using convolutional neural network tree ensembles. Multimed Tools Appl, pp 1–18
Hafiz AM, Hassaballah M, Binbusayyis A (2023) Formula-driven supervised learning in computer vision: a literature survey. Appl Sci 13(2)
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778
Hook DW, Porter SJ, Herzog C (2018) Dimensions: building context for search and evaluation. Front Res Metr Anal 3:23. https://www.frontiersin.org/articles/10.3389/frma.2018.00023/pdf
Huang S, Feng W, Tang C, Lv J (2022) Partial differential equations meet deep neural networks: a survey. arXiv preprint arXiv:2211.05567
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition, Los Alamitos, CA, USA, pp 2261–2269
Jagtap AD, Karniadakis GE (2021) Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. In: AAAI spring symposium: MLPS, pp 2002–2041
Kollmannsberger S (2019) The finite cell method: towards engineering applications. Technische Universität München
Kovachki N, Li Z, Liu B, Azizzadenesheli K, Bhattacharya K, Stuart A, Anandkumar A (2021b) Neural operator: learning maps between function spaces. arXiv preprint arXiv:2108.08481
Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE conference on computer vision and pattern recognition
Minaee S, Abdolrashidi A, Su H, Bennamoun M, Zhang D (2023) Biometrics recognition using deep learning: a survey. Artif Intell Rev, pp 1–49
Hadorn PS (2022) Shift-DeepONet: extending deep operator networks for discontinuous output functions. ETH Zurich, Seminar for applied mathematics
Raissi M (2024) Forward-backward stochastic neural networks: deep learning of high-dimensional partial differential equations. In: Peter Carr Gedenkschrift: research advances in mathematical finance. World Scientific, pp 637–655
Rodkina A, Kelly C (2011) Stochastic difference equations and applications
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Soldatenko S, Yusupov R (2017) Predictability in deterministic dynamical systems with application to weather forecasting and climate modelling. In: Dynamical systems-analytical and computational techniques. IntechOpen, p 101
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: IEEE conference on computer vision and pattern recognition, Los Alamitos, CA, USA. IEEE Computer Society, pp 1–9
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: IEEE international conference on computer vision
Vakalopoulou M, Christodoulidis S, Burgos N, Colliot O, Lepetit V (2023) Deep learning: basics and convolutional neural networks (CNNs). Mach Learn Brain Disord, pp 77–115
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: IEEE conference on computer vision and pattern recognition, Los Alamitos, CA, USA. IEEE Computer Society, pp 3156–3164
Yang L et al (2019) Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs. In: IEEE/ACM 3rd workshop on deep learning on supercomputers. IEEE, pp 1–11
Yao H, Ren Y, Liu Y (2019) FEA-Net: a deep convolutional neural network with physics prior for efficient data-driven PDE learning. In: AIAA Scitech 2019 forum, p 0680
Yentis R, Zaghloul ME (1994) CMOS implementation of locally connected neural cells to solve the steady-state heat flow problem. In: 37th Midwest symposium on circuits and systems. IEEE, vol 1, pp 503–506
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: IEEE conference on computer vision and pattern recognition, pp 6230–6239
Funding
The work is not funded.
Contributions
All authors wrote, prepared figures and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hafiz, A.M., Faiq, I. & Hassaballah, M. Solving partial differential equations using large-data models: a literature review. Artif Intell Rev 57, 152 (2024). https://doi.org/10.1007/s10462-024-10784-5