Abstract
The rapid growth of deep learning research, including within the field of computational mechanics, has resulted in an extensive and diverse body of literature. To help researchers identify key concepts and promising methodologies within this field, we provide an overview of deep learning in deterministic computational mechanics. Five main categories are identified and explored: simulation substitution, simulation enhancement, discretizations as neural networks, generative approaches, and deep reinforcement learning. This review focuses on deep learning methods rather than applications within computational mechanics, thereby enabling researchers to explore this field more effectively. As such, the review is not necessarily aimed at researchers with extensive knowledge of deep learning—instead, the primary audience is researchers on the verge of entering this field or those attempting to gain an overview of deep learning in computational mechanics. The discussed concepts are, therefore, explained as simply as possible.
1 Introduction
1.1 Motivation
In recent years, access to enormous quantities of data combined with rapid advances in machine learning has yielded outstanding results in computer vision, recommendation systems, medical diagnosis, and financial forecasting [1]. Nonetheless, the impact of learning algorithms reaches far beyond and has already found its way into many scientific disciplines [2].
The rapidly growing interest in machine learning in general, and within computational mechanics in particular, is well documented in the scientific literature. It can be quantified by considering the number of publications treating “Artificial Intelligence”, “Machine Learning”, “Deep Learning”, and “Neural Networks”. Figure 1a shows the trend in all journals of Elsevier and Springer since 1999, while Fig. 1b depicts the trend within the computational mechanics community by considering representative journalsFootnote 1 at Elsevier and Springer. The trends before 2017 differ slightly, with steady growth in general but only limited interest within computational mechanicsFootnote 2. Around 2017, however, both curves show a shift in trend, namely a vast increase in publications, highlighting the interest in and potential prospects of artificial intelligence and its subtopics for a variety of applications.
Due to the rapid growth [21] of research in the field of deep learning (see Fig. 1a), we provide an overview of the various deep learning methodologies in deterministic computational mechanics. To limit the scope of this work, we focus on deterministic approaches and problems within computational mechanics. Numerous review articles on deep learning for specific applications have already emerged (see [22, 23] for topology optimization, [24] for full waveform inversion, [25,26,27,28,29] for fluid mechanics, [30] for continuum mechanics, [31] for material mechanics, [32] for constitutive modeling, [33] for generative design, [34] for material design, and [35] for aeronautics)Footnote 3. The aim of this work is, however, to focus on the general methods rather than on applications, since similar methods are often applied to different problems. This has the potential to bridge gaps between scientific communities by highlighting similarities between methods and thereby establishing clarity on the state of the art.
1.2 Taxonomy of deep learning techniques in computational mechanics
In order to discuss the deep learning methods in a structured manner, we introduce the following taxonomy:
-
simulation substitution (Sect. 2)
-
simulation enhancement (Sect. 3)
-
discretizations as neural networks (Sect. 4)
-
generative approaches (Sect. 5)
-
deep reinforcement learning (Sect. 6)
Simulation substitution replaces the entire simulation with a surrogate model, which in the context of deep learning is a deep neural network (NN). The model can be trained with supervised learning, which relies purely on labeled data and is therefore referred to as data-driven modeling. The generalization error of these models can be reduced by physics-informed learning, where physical constraints are imposed on the learnable space such that only physically admissible solutions are learned.
Simulation enhancement instead only replaces components of the simulation chain, while the remaining parts are still handled by classical methods. Approaches within this category are strongly linked to their respective applications and will, therefore, be presented in the context of their specific use cases. Both data-driven and physics-informed approaches will be discussed.
Treating discretizations as neural networks is achieved by constructing a discretization from the basic building blocks of NNs, i.e., linear transformations and non-linear activation functions. Thereby, techniques within deep learning frameworks—such as automatic differentiation, gradient-based optimization, and efficient GPU-based parallelization—can be leveraged to improve classical simulation techniques.
Generative approaches deal with creating new content based on a data set. The goal is not, however, to recreate the data, but to generate statistically similar data. This is useful in diversifying the design space or enhancing a data set to train surrogate models.
Finally, in deep reinforcement learning, an agent learns how to interact with an environment in order to maximize rewards provided by the environment. In the case of deep reinforcement learning, the agent is modeled with NNs. In the context of computational mechanics, the environment is modeled by the governing physical equations. Reinforcement learning provides an alternative to gradient-based optimization, which is useful when gradient information is not available.
The proposed taxonomy arises from a methodological viewpoint, instead of an application-oriented [22,23,24,25,26,27,28,29,30,31,32,33,34,35] or problem-oriented [42] perspective. However, parallels can be drawn to the challenges and proposed areas of investigation in machine learning identified in [42]. Similarly, [42] distinguishes between enhancement by machine learningFootnote 4 and substitution by machine learning models. Additionally, challenges such as robustness, explainability, and the handling of complex and high-dimensional data are highlighted there. The separation between physics-informed learning and data-driven modeling is likewise made by [42], as well as by [43]. Interestingly, older reviews [3, 4] arrived at similar categories, additionally including NNs as a means of more efficient implementation, i.e., discretizations as NNs. Only the last two proposed categories, generative approaches and deep reinforcement learning, have not been spotlighted as methodologies within reviews of computational mechanics. These are, however, well established within the machine learning community [44,45,46,47] and sufficiently distinct to be treated separately.
1.3 Deep learning
Before continuing with the topics specific to computational mechanics, NNsFootnote 5 and the notation used throughout this work are briefly introduced. In essence, NNs are function approximators that are capable of approximating any continuous function [50]. The NN parametrized by the learnable parameters \(\varvec{\theta }\) (typically consisting of weights \({\varvec{w}}\) and biases \({\varvec{b}}\)) learns a function \({\hat{y}}=f_{NN}(x;\varvec{\theta })\), which approximates the relation \(y=f(x)\). The NN is constructed with nested linear transformations in combination with non-linear activation functions \(\sigma \). The most basic NNs, fully connected NNs, achieve this with layers of fully connected neurons (see Fig. 2), where the activation \(a_k^i\) of each neuron (the ith neuron of layer k) is obtained through a linear combination of the activations of the previous layer followed by the non-linear activation function \(\sigma \):
\[ a_k^i = \sigma \left( \sum _j w_{kj}^i \, a_{k-1}^j + b_k^i \right) . \]
If more than one layer (excluding the input layer x and output layer \({\hat{y}}\)) is employed, the NN is considered a deep NN, and its training process is thereby deep learning. The evaluation of the NN, i.e., the prediction, is referred to as forward propagation. The quality of the prediction is determined by a cost function \(C({\hat{y}})\), which is to be minimized. Its gradients \(\nabla _{\varvec{\theta }} C=\{\nabla _{{\varvec{w}}}C, \nabla _{{\varvec{b}}}C\}\) with respect to the parameters \(\varvec{\theta }\) are obtained with automatic differentiation [51], specifically referred to as backward propagation in the context of NNs. The gradients are used within a gradient-based optimization [44, 52, 53] to update the parameters \(\varvec{\theta }\) and thereby improve the prediction \({\hat{y}}\). Supervised learning relies on labeled data \(x^{{\mathcal {M}}}, y^{{\mathcal {M}}}\) to establish a cost function C, while unsupervised learning does not rely on labeled data. The parameters defining the user-defined training algorithm and NN architecture are referred to as hyperparameters. The concept is summarized by Fig. 2, showing a fully connected multi-layer, i.e., deep, NN. More advanced NN architectures discussed throughout this work are described in Appendix A.
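To make these ingredients concrete, the following minimal sketch assembles a fully connected NN and trains it with supervised learning. The architecture, synthetic data, and optimizer settings are illustrative placeholders rather than recommendations, and PyTorch is chosen merely as a representative deep learning framework.

```python
import torch

# Fully connected NN f_NN(x; theta): nested linear transformations
# followed by non-linear activations sigma (here tanh).
f_nn = torch.nn.Sequential(
    torch.nn.Linear(1, 20), torch.nn.Tanh(),   # hidden layer 1
    torch.nn.Linear(20, 20), torch.nn.Tanh(),  # hidden layer 2 -> a "deep" NN
    torch.nn.Linear(20, 1),                    # output layer y_hat
)

# Labeled data {x^M, y^M}: here synthetic samples of y = sin(x).
x_data = torch.linspace(-3.0, 3.0, 100).unsqueeze(-1)
y_data = torch.sin(x_data)

optimizer = torch.optim.Adam(f_nn.parameters(), lr=1e-3)  # gradient-based optimizer
for epoch in range(2000):
    optimizer.zero_grad()
    y_hat = f_nn(x_data)                       # forward propagation
    cost = torch.mean((y_hat - y_data) ** 2)   # cost C: mean squared error
    cost.backward()                            # backward propagation: grad_theta C
    optimizer.step()                           # update the parameters theta
```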
Notational Remark 1
Data sets are denoted by a superscript \({\mathcal {M}}\), i.e., \(\{ x^{\mathcal {M}}, y^{\mathcal {M}} \}_{i=1}^{N_{\mathcal {M}}}\), where \(N_{\mathcal {M}}\) is the data set size.
Notational Remark 2
Although x and y may denote vector-valued quantities, we do not use bold-faced notation for them. Instead, this is reserved for all N degrees of freedom within a problem, i.e., \({\varvec{x}} = \{x_i\}_{i=1}^{N}\), \({\varvec{y}} = \{y_i\}_{i=1}^{N}\). This can, for instance, be in the form of a domain \(\Omega \) sampled with N grid points or of systems composed of N degrees of freedom. Note, however, that matrices will still be denoted by capital letters in bold face.
Notational Remark 3
A multitude of NN architectures will be discussed throughout this work, for which we introduce abbreviations and subscripts. Most prominent are fully connected NNs \(f_{FNN}\) (FC-NNs) [44, 54], convolutional NNs \(f_{CNN}\) (CNNs) [55,56,57], recurrent NNs \(f_{RNN}\) (RNNs) [58,59,60], and graph NNs \(f_{GNN}\) (GNNs) [61,62,63]Footnote 6. If the network architecture is independent of the method, the network is denoted as \(f_{NN}\).
2 Simulation substitution
In the field of computational mechanics, numerical procedures are developed to solve or discover partial differential equations (PDEs). A generic PDE can be written as
\[ {\mathcal {N}}[u(x,t); \lambda (x,t)] = 0 \quad \text {in } \Omega \times {\mathcal {T}}, \]
where a non-linear operator \({\mathcal {N}}\) acts on the solution u(x, t) of the PDE as well as on its coefficients \(\lambda (x,t)\)Footnote 7 in the spatio-temporal domain \(\Omega \times {\mathcal {T}}\). In the forward problem, the solution u(x, t) is to be computed, while the inverse problem considers either the non-linear operator \({\mathcal {N}}\) or the coefficients \(\lambda (x,t)\) as unknowns.
A further distinction is made between methods treating the temporal dimension t as a continuum, as in space-time approaches [67] (Sects. 2.1.1 and 2.2.1)Footnote 8, or in discrete sequential time steps, as in time-stepping procedures (Sects. 2.1.2 and 2.2.2). For simplicity, but without loss of generality, time-stepping procedures will be presented on PDEs with a first-order derivative with respect to time:
\[ \frac{\partial u(x,t)}{\partial t} = {\mathcal {N}}^{{\mathcal {T}}}[u(x,t); \lambda (x,t)], \]
with the non-linear operator \({\mathcal {N}}^{{\mathcal {T}}}\). Another task in computational mechanics is the forward modeling and identification of systems of ordinary differential equations (ODEs). For this, we will consider systems of the following form:
\[ \dot{{\varvec{x}}}(t) = {\varvec{f}}({\varvec{x}}(t)). \]
Here, \({\varvec{x}}(t)\) are the time-dependent degrees of freedom and \({\varvec{f}}\) is the right-hand side defining the system of equations.Footnote 9 Both the forward problem of computing \({\varvec{x}}(t)\) and the inverse problem of identifying \({\varvec{f}}\) will be discussed in the following.
2.1 Data-driven modeling
Data-driven modeling relies entirely on labeled data \(x^{{\mathcal {M}}}, y^{{\mathcal {M}}}\). The NN learns the mapping between \(x^{{\mathcal {M}}}\) and \(y^{{\mathcal {M}}}\) with \({\hat{y}}_i=f_{NN}(x_i;\varvec{\theta })\), thereby establishing an interpolation to yet unseen data points. A data-driven loss \({\mathcal {L}}_{{\mathcal {D}}}\), such as the mean squared error, can be used as the cost function C.
2.1.1 Space-time approaches
To declutter the notation, but without loss of generality, the temporal dimension t is dropped in this section, as it is possible to treat it like any other spatial dimension x in the scope of these methods. The goal of the upcoming methods is to either learn a forward operator \({\hat{u}}=F[\lambda ; x]\), an inverse operator for the coefficients \({\hat{\lambda }} = I[u; x]\), or an inverse operator for the non-linear operator \(\hat{{\mathcal {N}}} = O[u; \lambda ; x]\).Footnote 10 The methods will be explained using the forward operator, but they apply analogously to the inverse operators. Only the inputs and outputs differ.
The solution prediction \({\hat{u}}_i\) at coordinate \(x_i\) or \(\varvec{{\hat{u}}}_i\) on the entire domain \(\Omega \) is made based on a set of inverse coefficients \(\varvec{\lambda }_i\). The cost function C is formulated analogously to Eq. (5):
2.1.1.1. Fully connected neural networks
The simplest procedure is to approximate the operator F with an FC-NN \(F_{FNN}\).
Example applications are flow classification [68, 69], fluid flow in turbomachinery [70], dynamic beam displacements from previous measurements [71], wall velocity predictions in turbulence [72], heat transfer [73], prediction of source terms in turbulence models [74], full waveform inversion [75,76,77], and topology optimization based on moving morphable bars [78]. The approach is, however, limited to simple problems, as an abundance of data is required. Therefore, several improvements have been proposed.
2.1.1.2. Image-to-image mapping
One downside of the application of FC-NNs to problems in computational mechanics is that they often need to learn spatial relationships with respect to x from scratch. CNNs inherently account for these spatial relationships due to their kernel-based structure. Therefore, image-to-image mappings using CNNs have been proposed, where an image, i.e., a uniform grid (see Fig. 3) of the coefficients \(\varvec{\lambda }\), is used as input.
This results in a prediction of the solution \(\varvec{{\hat{u}}}\) throughout the entire image, i.e., the domain.
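A minimal sketch of such an image-to-image mapping is given below. The fully convolutional architecture, channel counts, and grid size are arbitrary placeholder choices for illustration, not a specific architecture from the cited works.

```python
import torch

# Illustrative image-to-image surrogate: a coefficient field lambda, sampled
# on a uniform 64x64 grid (1 input channel), is mapped to the solution field
# u_hat on the same grid (1 output channel).
cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, kernel_size=3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, kernel_size=3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, kernel_size=3, padding=1),  # u_hat image
)

lam = torch.rand(8, 1, 64, 64)   # batch of 8 coefficient images (placeholder data)
u_hat = cnn(lam)                 # solution prediction, shape (8, 1, 64, 64)
```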
Applications include pressure and velocity predictions around airfoils [80,81,82,83], stress predictions from geometries and boundary conditions [84, 85], steady flow predictions [86], detection of manufacturing features [87, 88], full waveform inversion [89,90,91,92,93,94,95,96,97,98,99,100], and topology optimization [101,102,103,104,105,106,107,108,109,110]. An important choice in the design of the learning algorithm is the encoding of the input data. In the case of geometries and boundary conditions, binary representations are the most straightforward approach. These are, however, challenging for CNNs, as discussed in [86]. Signed distance functions [86] or simulations on coarse grids provide superior alternatives. For inverse problems, a forward simulation of an initial guess of the inverse field can be used to encode the desired boundary conditions [105, 108,109,110]. Another possibility for CNNs is a decomposition of the domain. The mapping can be performed on the full domain [111], smaller subdomains [112], or even individual pixels [113]. In the latter two cases, interfaces require special treatment.
The disadvantage of CNN mappings is that they are constrained to uniform grids on rectangular domains. This can be circumvented by using GNNs acting on graph data, e.g., meshes, such as in [114,115,116], or point cloud-based NNs [117, 118] acting on point cloud data, such as in [119]. Just like CNNs, GNNs operate on the invariant structural elements of the data, which for GNNs are edges connecting vertices (see Appendix A.2) rather than pixels aligned on a structured grid, as for CNNs. In fact, GNNs can be regarded as a generalization of CNNs, since they can handle a broader class of data structures, i.e., graphs (including images)Footnote 11. This comes at the cost of less efficient implementations when compared to pure CNNs.
2.1.1.3. Model order reduction encoding
Independent of the NN architecture, learning can be aided by applying the NN to a lower-dimensional space that is able to capture the data. For complex problems, mappings e to low-dimensional spaces (also referred to as latent spaces or latent vectors) \({\varvec{h}}\) can be identified with model order reduction techniques. Thus, in the case of simulation substitution, a low-dimensional encoding \({\varvec{h}}^\lambda =e(\varvec{\lambda })\) of \(\varvec{\lambda }\) (sampled on all sample points \({\varvec{x}}\)) is identified. This is provided as input to a NN, which predicts the solution field \({\varvec{h}}^u\) in the reduced latent space. The full solution field \({\varvec{u}}\) (on all sample points \({\varvec{x}}\)) is obtained in a decoding \(d=e^{-1}\) step. The prediction is given as
\[ \varvec{{\hat{u}}} = d \big ( f_{NN}( e(\varvec{\lambda }); \varvec{\theta }) \big ). \]
The dimensional reduction can, e.g., be performed with principal component analysis [120, 121], as proposed in [122], proper orthogonal decomposition [123], or reduced manifold learning [124]. These techniques have been applied to learning aortic wall stresses [125], arterial wall stresses [126], flow velocities in viscoplastic flow [127], and the inverse problem of identifying unpressurized geometries from pressurized geometries [128]. Currently, the most impressive results in data-driven surrogate modeling are achieved by combining model order reduction encodings with NNs [129, 130], an approach that can be combined with most other methodologies presented in this work.
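The following sketch illustrates the general encode-predict-decode pattern with a truncated SVD (i.e., a POD/PCA basis) and a small NN acting on the latent coefficients. The snapshot matrix is a random placeholder and the mode count is arbitrary.

```python
import torch

# Snapshot matrix of N-dimensional solution fields (one snapshot per row).
U = torch.randn(200, 1000)                # 200 snapshots, N = 1000 grid points (placeholder)
U_mean = U.mean(dim=0)
# POD/PCA basis from an SVD of the centered snapshots; keep k modes.
_, _, Vt = torch.linalg.svd(U - U_mean, full_matrices=False)
V = Vt[:8].T                              # basis of k = 8 modes, shape (1000, 8)

def encode(u):                            # e: full field -> latent space h
    return (u - U_mean) @ V

def decode(h):                            # d = e^{-1} (approximately): h -> full field
    return h @ V.T + U_mean

# A small NN maps the encoded coefficients h^lambda to the encoded solution h^u;
# the full prediction is then u_hat = decode(f_nn(encode(lam))).
f_nn = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 8))
```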
Another dimensionality reduction technique is the autoencoder [131], where e and d are modeled by NNsFootnote 12. Autoencoders are treated in detail in Appendix B.1 and enable non-linear encodings. An early investigation is presented in [132], where proper orthogonal decomposition is related to NNs. Application areas are the prediction of designs of acoustic scatterers from the reduced latent space [133], or mappings from dynamic responses of bridges to damage [134]. Furthermore, many of the image-to-image mapping techniques rely on NN architectures inspired by autoencoders, such as U-nets [135, 136].
2.1.1.4. Neural operators
The most recent trend in surrogate modeling with NNs is neural operators [137], which map between function spaces instead of between functions. Neural operators rely on the extension of the universal approximation theorem [50] to non-linear operators [138]. The two most prominent neural operators are DeepONetsFootnote 13 [139] and Fourier neural operators [140].
DeepONet
In DeepONets [139], illustrated in Fig. 4, the task of predicting the operator \({\hat{u}}(\varvec{\lambda }; x)\) is split up into two sub-tasks:
-
the prediction of \(N_P\) basis functions \(\varvec{{\hat{t}}}(x)\) (TrunkNet),
-
the prediction of the corresponding \(N_P\) problem-specific coefficients \(\varvec{{\hat{b}}}(\varvec{\lambda })\) (BranchNet).
The basis is predicted by the TrunkNet with parameters \(\varvec{\theta }^T\) via an evaluation at the coordinates x. The coefficients are estimated from the PDE coefficients \(\varvec{\lambda }\) by the BranchNet parametrized by \(\varvec{\theta }^B\) and are, thus, specific to the problem being solved. Taking the dot product over the evaluated basis and the coefficients yields the solution prediction
\[ {\hat{u}}(\varvec{\lambda }; x) = \sum _{i=1}^{N_P} {\hat{b}}_i(\varvec{\lambda }) \, {\hat{t}}_i(x). \]
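The following is a minimal DeepONet sketch following this split. The layer sizes, the number of basis functions \(N_P\), and the number of sensor points at which \(\varvec{\lambda }\) is sampled are illustrative assumptions.

```python
import torch

class DeepONet(torch.nn.Module):
    """Minimal DeepONet: dot product of branch coefficients and trunk basis."""
    def __init__(self, n_sensors=50, n_p=20):
        super().__init__()
        # BranchNet: coefficients b_hat(lambda) from lambda sampled at sensor points
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(n_sensors, 64), torch.nn.Tanh(), torch.nn.Linear(64, n_p))
        # TrunkNet: basis functions t_hat(x) evaluated at a coordinate x
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, n_p))

    def forward(self, lam, x):
        b = self.branch(lam)                       # (batch, N_P) coefficients
        t = self.trunk(x)                          # (batch, N_P) basis evaluations
        return (b * t).sum(dim=-1, keepdim=True)   # dot product: u_hat(lambda; x)

model = DeepONet()
lam = torch.rand(16, 50)                  # coefficients sampled at 50 sensors (placeholder)
x = torch.rand(16, 1)                     # evaluation coordinates
u_hat = model(lam, x)
```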
Applications can be found in [141,142,143,144,145,146,147,148,149,150,151,152,153]. DeepONets have also been extended with physics-informed loss functions [154,155,156].
Fourier neural operators
Fourier neural operators [140] predict the solution \(\varvec{{\hat{u}}}\) on a uniform grid \({\varvec{x}}\) from the spatially varying coefficients \(\varvec{\lambda }=\lambda ({\varvec{x}})\). As the aim is to learn a mapping between functions sampled on the entire domain, non-local mappings can be performed at each layer [157]. For example, integral kernels [158, 159], Laplace transformations [160], and Fourier transforms [140] can be employed. These transformations enhance the non-local expressivity of the NN [157], where Fourier transforms are particularly favorable due to the computational efficiency achievable through fast Fourier transforms.
The Fourier neural operator, as illustrated in Fig. 5, consists of Fourier layers, in which linear transformations \({\varvec{K}}\) are performed after Fourier transforms \({\mathcal {F}}\) along the spatial dimensions x. Subsequently, an inverse Fourier transform \({\mathcal {F}}^{-1}\) is applied, and the result is added to the output of a linear transformation \({\varvec{W}}\) performed outside the Fourier space. Thus, the Fourier transform can be skipped by the NN. The final step is an activation function \(\sigma \). The manipulations within a Fourier layer to predict the next activation on the uniform grid \({\varvec{a}}^{(j+1)}({\varvec{x}})\) can be written as
\[ {\varvec{a}}^{(j+1)}({\varvec{x}}) = \sigma \Big ( {\varvec{W}} {\varvec{a}}^{(j)}({\varvec{x}}) + {\mathcal {F}}^{-1}\big [ {\varvec{K}} \, {\mathcal {F}}[{\varvec{a}}^{(j)}({\varvec{x}})] \big ] + {\varvec{b}} \Big ), \]
where \({\varvec{b}}\) is the bias. Both the linear transformations \({\varvec{K}}, {\varvec{W}}\) and the bias \({\varvec{b}}\) are learnable and thereby part of the parameters \(\varvec{\theta }\). Multiple Fourier layers can be employed, typically in combination with an encoding network \(P_{NN}\) and a decoding network \(Q_{NN}\).
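A sketch of a single one-dimensional Fourier layer implementing this update is shown below. The channel count and the number of retained Fourier modes are arbitrary, and ReLU stands in for the activation \(\sigma \).

```python
import torch

class FourierLayer1d(torch.nn.Module):
    """One Fourier layer: sigma(W a + F^{-1}(K F(a)) + b) on a uniform 1D grid."""
    def __init__(self, channels=16, modes=8):
        super().__init__()
        # Learnable spectral weights K for the lowest `modes` Fourier modes.
        self.K = torch.nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) / channels)
        # Pointwise linear transformation W (its bias plays the role of b).
        self.W = torch.nn.Conv1d(channels, channels, kernel_size=1)
        self.modes = modes

    def forward(self, a):                          # a: (batch, channels, n_grid)
        a_ft = torch.fft.rfft(a, dim=-1)           # Fourier transform F
        out_ft = torch.zeros_like(a_ft)
        out_ft[..., :self.modes] = torch.einsum(   # linear transform K in Fourier space
            "iok,bik->bok", self.K, a_ft[..., :self.modes])
        spectral = torch.fft.irfft(out_ft, n=a.shape[-1], dim=-1)  # inverse transform
        return torch.relu(self.W(a) + spectral)    # skip path plus activation

layer = FourierLayer1d()
a = torch.randn(4, 16, 64)                         # activations on a 64-point grid
a_next = layer(a)
```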
Applications can be found in [161,162,163,164,165,166,167,168,169,170,171]. An extension relying on the attention mechanisms of transformers [172] is presented in [173]. Analogously to DeepONets, Fourier neural operators have been combined with physics-informed loss functions [174].
2.1.1.5. Neural network approximation power
Despite the advancements in NN architecturesFootnote 14, NN surrogates struggle to learn solutions of general PDEs. Typically, successes have only been achieved for parametrized PDEs with relatively small parameter spaces—or in cases where accuracy, reliability, or generalization were disregarded. It has, however, been shown—both for simple architectures such as FC-NNs [175, 176] and for advanced architectures such as DeepONets [177]—that NNs possess excellent theoretical approximation power, which can capture solutions of various PDEs. Currently, two obstacles impede the identification of sufficiently good optima within these desirable NN parameter spaces [175]:
-
training data: generalization error,
-
training algorithm: optimization error.
A lack of sufficient training data leads to poor generalization. This might be alleviated through faster data generation using, e.g., faster and specialized classical methods [178], or through improved sampling strategies, i.e., finding the minimum number of data points, distributed in a specific manner, required to train the surrogate. Additionally, current training algorithms only converge to local optima. Research into improved optimization algorithms, such as current efforts in computing better initial weights [179] and thereby reaching better local optima, attempts to reduce the optimization error. At the same time, training times are reduced drastically, increasing the competitiveness of these methods.
2.1.2 Time-stepping procedures
For the time-stepping procedures, we will consider Eqs. (3) and (4) in the following.
2.1.2.1. Recurrent neural networks
The simplest approach to modeling time series data is to use FC-NNs to predict the next time step \(t_{i+1}\) from the current time step \(t_i\):
\[ \varvec{{\hat{x}}}(t_{i+1}) = f_{FNN}({\varvec{x}}(t_i); \varvec{\theta }). \]
However, this approach cannot capture the temporal dependencies between different time steps, as each input is treated independently, considering no more than the previous time step. The sequential nature of the data can be incorporated directly with RNNs. RNNs maintain a hidden state, which captures information from previous time steps and is used for the next time step prediction. By unrolling the RNN, the entire time history can be predicted.
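The following sketch shows such an autoregressive rollout with a basic recurrent cell. The network sizes and the readout layer mapping the hidden state to the next state are illustrative choices.

```python
import torch

# Illustrative RNN time-stepper: the hidden state h carries information from
# previous time steps; unrolling the cell predicts the entire time history.
n_dof, n_hidden = 3, 32
cell = torch.nn.RNNCell(n_dof, n_hidden)       # recurrent unit updating the hidden state
readout = torch.nn.Linear(n_hidden, n_dof)     # maps hidden state to x_hat(t_{i+1})

x0 = torch.zeros(1, n_dof)                     # initial state x(t_0) (placeholder)
h = torch.zeros(1, n_hidden)                   # initial hidden state
trajectory = [x0]
for step in range(100):                        # autoregressive rollout
    h = cell(trajectory[-1], h)                # update hidden state from current state
    trajectory.append(readout(h))              # next-state prediction
```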
Shortcomings of RNNs, such as their tendency to struggle with learning long-term dependencies due to vanishing or exploding gradients, have been addressed by more sophisticated architectures such as long short-term memory networks (LSTMs) [59], gated recurrent unit networks (GRUs) [180], and transformers [172] (see [181] for a recent contribution on transformers for thermal analysis in additive manufacturing). The concept of recurrent units has also been combined with other architectures, as demonstrated for CNNs [182] and GNNs [114, 115, 183,184,185,186,187].
Further applications of RNNs are full waveform inversion [188,189,190], high-dimensional chaotic systems [191], fluid flow [40, 192], fracture propagation [116], sensor signals in non-linear dynamic systems [193, 194], and settlement field predictions induced by tunneling [195], which was extended to damage prediction in affected structures [196, 197]. RNNs are often combined with reduced order model encodings [198], where the dynamics are predicted on the reduced latent space, as demonstrated in [199,200,201,202,203,204,205]. Further variations employ classical time-stepping schemes on the reduced latent space obtained by autoencoders [206, 207].
2.1.2.2. Dynamic mode decomposition
Another approach, formulated for system dynamics, i.e., Eq. (4), is dynamic mode decomposition (DMD) [208, 209]. The aim of DMD is to identify a linear operator \({\varvec{A}}\) that relates two successive snapshot matrices with n time steps, \({\varvec{X}}=[{\varvec{x}}(t_1),{\varvec{x}}(t_2),\dots ,{\varvec{x}}(t_n)]^T\) and \({\varvec{X}}'=[{\varvec{x}}(t_2),{\varvec{x}}(t_3),\dots ,{\varvec{x}}(t_{n+1})]^T\):
\[ {\varvec{X}}' = {\varvec{A}} {\varvec{X}}. \]
To solve for \({\varvec{A}}\), the problem is reframed as a regression task: \({\varvec{A}}\) is approximated by minimizing the Frobenius norm of the difference between \({\varvec{X}}'\) and \({\varvec{A}}{\varvec{X}}\). This minimization can be performed using the Moore-Penrose pseudoinverse \({\varvec{X}}^\dagger \) (see, e.g., [38]):
\[ {\varvec{A}} = {\varvec{X}}' {\varvec{X}}^\dagger . \]
Once the operator is identified, it can be used to propagate the dynamics forward in time, approximating the next state \({\varvec{x}}(t_{i+1})\) from the current state \({\varvec{x}}(t_i)\):
\[ {\varvec{x}}(t_{i+1}) \approx {\varvec{A}} {\varvec{x}}(t_i). \]
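A compact numerical sketch of these three steps is given below. The snapshot data is a random placeholder, and arranging the snapshots column-wise is an implementation choice.

```python
import numpy as np

# Snapshots of the state x(t_i), arranged column-wise (one column per time step).
rng = np.random.default_rng(0)
X_full = rng.standard_normal((3, 51))          # placeholder trajectory: 3 DOFs, 51 steps
X, X_prime = X_full[:, :-1], X_full[:, 1:]     # successive snapshot matrices

# DMD operator via the Moore-Penrose pseudoinverse: A = X' X^+.
A = X_prime @ np.linalg.pinv(X)

# Propagate the dynamics forward in time: x(t_{i+1}) ~= A x(t_i).
x_next = A @ X_full[:, -1]
```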
This framework is, however, only valid for linear dynamics. DMD can be extended to non-linear systems through Koopman operator theory [210]. According to Koopman operator theory, it is possible to represent a non-linear system as a linear one by using an infinite-dimensional Koopman operator \({\mathcal {K}}\) that acts on a transformed state \(e({\varvec{x}}(t_i))\):
\[ {\mathcal {K}} \, e({\varvec{x}}(t_i)) = e({\varvec{x}}(t_{i+1})). \]
In theory, the Koopman operator \({\mathcal {K}}\) is an infinite-dimensional linear transformation. In practice, however, finite-dimensional approximations are employed. This approach is, for example, utilized in the extended DMD [211], where the regression from Eq. (17) is performed on a higher-dimensional state \({\varvec{h}}(t_{i}) = e({\varvec{x}}(t_{i}))\) relying on a dictionary of orthonormal basis functions \({\varvec{h}}(t_i)=\varvec{\psi }({\varvec{x}}(t_i))\). Alternatively, the dictionary can be learned using NNs, i.e., \(\varvec{{\hat{\psi }}}({\varvec{x}})=\psi _{NN}({\varvec{x}};\varvec{\theta })\), as demonstrated in [212, 213]. The NN is trained by minimizing the mismatch between the predicted state \(\varvec{\psi }(\varvec{{\hat{x}}}(t_{i+1}))={\varvec{A}} \varvec{{\hat{\psi }}}({\varvec{x}}(t_i))\) (Eq. 18) and the true state in the dictionary space. Orthogonality is not required and therefore not enforced.
When the dictionary is learned, the state predictions can be reconstructed using the Koopman mode decomposition, as explained in detail in [212].
Alternatively, the mapping to the augmented state can be performed with autoencoders, which at the same time allows for a direct map back to the original space [214,215,216,217]. Thus, an encoder learns a reduced latent space \(\varvec{{\hat{h}}}({\varvec{x}})=e_{NN}({\varvec{x}};\varvec{\theta }^e)\) and a decoder learns the inverse mapping \(\varvec{{\hat{x}}}({\varvec{h}})=d_{NN}({\varvec{h}};\varvec{\theta }^d)\). The networks are trained using three losses: the autoencoder reconstruction loss \({\mathcal {L}}_{{\mathcal {A}}}\), the linear dynamics loss \({\mathcal {L}}_{{\mathcal {R}}}\), and the future state prediction loss \({\mathcal {L}}_{{\mathcal {F}}}\).
The cost function C is composed of a weighted sum of the loss terms \({\mathcal {L}}_{{\mathcal {A}}},{\mathcal {L}}_{{\mathcal {R}}},{\mathcal {L}}_{{\mathcal {F}}}\) with weighting terms \(\kappa _{{\mathcal {A}}},\kappa _{{\mathcal {R}}},\kappa _{{\mathcal {F}}}\):
\[ C = \kappa _{{\mathcal {A}}} {\mathcal {L}}_{{\mathcal {A}}} + \kappa _{{\mathcal {R}}} {\mathcal {L}}_{{\mathcal {R}}} + \kappa _{{\mathcal {F}}} {\mathcal {L}}_{{\mathcal {F}}}. \]
Furthermore, [216] allows \({\varvec{A}}\) to vary depending on the state. This is achieved by predicting the eigenvalues of \({\varvec{A}}\) with an auxiliary network and constructing the matrix from these.
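The three loss terms can be sketched as follows. The module sizes are placeholders, and plain linear layers stand in for the encoder, decoder, and the learnable operator \({\varvec{A}}\).

```python
import torch

# Hypothetical modules: encoder e_NN, decoder d_NN, and a learnable linear
# operator A acting on the latent state h.
enc = torch.nn.Linear(3, 8)                       # e_NN: x -> h (placeholder architecture)
dec = torch.nn.Linear(8, 3)                       # d_NN: h -> x
A = torch.nn.Linear(8, 8, bias=False)             # finite-dimensional Koopman approximation

x_i, x_ip1 = torch.randn(32, 3), torch.randn(32, 3)   # snapshot pairs (placeholder data)
mse = torch.nn.functional.mse_loss

loss_rec = mse(dec(enc(x_i)), x_i)                # L_A: autoencoder reconstruction
loss_lin = mse(A(enc(x_i)), enc(x_ip1))           # L_R: linear dynamics in latent space
loss_fut = mse(dec(A(enc(x_i))), x_ip1)           # L_F: future state prediction
cost = 1.0 * loss_rec + 1.0 * loss_lin + 1.0 * loss_fut  # weighted sum with kappas
```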
2.1.3 Active learning and transfer learning
Finally, an important machine learning technique that is independent of the NN architecture and applicable to both space-time and time-stepping approaches is active learning [218]. Instead of precomputing a labeled data set, data is only provided when the prediction quality of the NN is insufficient. Furthermore, the data is not chosen arbitrarily, but only in the vicinity of the failed prediction. In computational mechanics, the prediction of the NN can be assessed with an error indicator. If the result is insufficient, the results of a classical simulation are used to retrain the NN. Over time, the NN estimates improve in the respective domain of application. Due to the error indicator and the classical simulations, the predictions are reliable. Examples of active learning in computational mechanics can be found in [219,220,221].
Another technique, transfer learning [222, 223], aims at accelerating the NN training. Here, the NN is first trained on a similar task. Subsequently, it is applied to the task of interest—where it converges faster than an untrained NN. Applications in computational mechanics can be found in [98, 224].
2.2 Physics-informed learning
In supervised learning, as discussed in Sect. 2.1, the quality of prediction strongly depends on the amount of training data. Acquiring data in computational mechanics may be expensive. To reduce the amount of required data, constraints enforcing the physics have been proposed. Two main approaches exist [43, 225]. The physics can be enforced by modifying the cost function through a penalty term punishing unphysical predictions, thus acting as a regularizer. Possible modifications are discussed in the upcoming section. Alternatively, the physics can be enforced by construction, i.e., by reducing the learnable space to a physically meaningful space. This approach is highly specific to its application and will therefore mainly be explored in Sect. 3. A brief coverage is provided in Sect. 2.2.3.
Both approaches can be found in overview publications, where [43] defines four overarching methodologies: (i) augmentation of training data using prior knowledge, (ii) modification of the model, i.e., enforcement by construction, (iii) enhancement of the learning algorithm with regularization terms, i.e., enforcing constraints through the cost function, and (iv) checking the final estimate and thereby discarding physical violations (using, e.g., error indicators). The two most prominent methodologies, i.e., modifying the cost function and enforcement by construction, are similarly mentioned in [225], which correspondingly refers to them as physics-informed and physics-augmented. Further variations in terminology can be found in [182, 226], which refer to physics-informed NNs for multiple solutions as physics-constrained deep learning, and in [227], which uses the term physics-enhanced NNs for NNs enforcing the physics by construction. Due to the many names within this relatively new and interconnected field, we cover the variations under the overarching term physics-informed learning.
2.2.1 Space-time approaches
Once again, and without loss of generality, the temporal dimension t is dropped to declutter the notation. However, in contrast to Sect. 2.1.1, the following methods are not equally applicable to forward and inverse problems. Thus, the prediction of the solution \({\hat{u}}\), of the PDE coefficients \({\hat{\lambda }}\), and of the non-linear operator \({\mathcal {N}}\) are treated separately.
2.2.1.1. Differential equation solving with neural networks
The concept of solving PDEs with NNsFootnote 15 was first proposed in the 1990s [8,9,10], but was recently popularized by so-called physics-informed neural networks (PINNs) [228] (see [229,230,231] for recent review articles and SciANN [232], SimNet [233], and DeepXDE [234] for libraries).
To illustrate the idea and variations of PINNs, we will consider the differential equation of a static elastic bar:
\[ \frac{d}{dx}\left( EA(x) \frac{du(x)}{dx} \right) + p(x) = 0 \quad \text {on } \Omega . \]
Here, the operator \({\mathcal {N}}\) is given by the left-hand side of the equation, the solution u(x) is the axial displacement, and the spatially varying coefficients \(\lambda (x)\) are given by the cross-sectional properties EA(x) and the distributed load p(x). Additionally, boundary conditions are specified, which can be Dirichlet (on \(\Gamma _D\)) or Neumann boundary conditions (on \(\Gamma _N\)):
\[ u = {\bar{u}} \quad \text {on } \Gamma _D, \qquad EA \frac{du}{dx} = {\bar{F}} \quad \text {on } \Gamma _N. \]
Physics-informed neural networks
PINNs [228] approximate either the solution u(x), the coefficients \(\lambda (x)\), or both with FC-NNs.
Instead of training the network with labeled data as in Eq. (6), the residual of the PDE is considered. The residual is evaluated at a set of \(N_{{\mathcal {N}}}\) points, called collocation points. Taking the mean squared error over the residual evaluations yields the PDE loss
\[ {\mathcal {L}}_{{\mathcal {N}}} = \frac{1}{N_{{\mathcal {N}}}} \sum _{i=1}^{N_{{\mathcal {N}}}} \left( \frac{d}{dx}\left( EA(x_i) \frac{d{\hat{u}}(x_i)}{dx} \right) + p(x_i) \right) ^2 . \]
The gradients of the possible predictions, i.e., u, EA, and p, with respect to x are obtained with automatic differentiation [51] through the NN approximation. Similarly, the boundary conditions are enforced at the \(N_\mathcal {B_D}+N_\mathcal {B_N}\) boundary points.
The cost function is composed of the PDE loss \({\mathcal {L}}_{\mathcal {N}}\), the boundary loss \({\mathcal {L}}_{\mathcal {B}}\), and possibly a data-driven loss \({\mathcal {L}}_{\mathcal {D}}\):
\[ C = {\mathcal {L}}_{{\mathcal {N}}} + {\mathcal {L}}_{{\mathcal {B}}} + {\mathcal {L}}_{{\mathcal {D}}}. \]
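As an illustration, the following PINN sketch solves the bar equation with constant \(EA=1\), a load \(p(x)=\sin (\pi x)\), and homogeneous Dirichlet boundary conditions. All of these problem settings and hyperparameters are assumptions made for the example, not choices from the cited works.

```python
import math
import torch

# NN ansatz for the displacement u(x).
u_net = torch.nn.Sequential(
    torch.nn.Linear(1, 30), torch.nn.Tanh(),
    torch.nn.Linear(30, 30), torch.nn.Tanh(),
    torch.nn.Linear(30, 1),
)
optimizer = torch.optim.Adam(u_net.parameters(), lr=1e-3)

x = torch.linspace(0.0, 1.0, 100).unsqueeze(-1).requires_grad_(True)  # collocation points
x_bc = torch.tensor([[0.0], [1.0]])                                    # boundary points

for epoch in range(5000):
    optimizer.zero_grad()
    u = u_net(x)
    # Derivatives of the NN ansatz via automatic differentiation.
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    residual = u_xx + torch.sin(math.pi * x)       # (EA u')' + p with EA = 1
    loss_pde = torch.mean(residual ** 2)           # L_N: PDE loss at collocation points
    loss_bc = torch.mean(u_net(x_bc) ** 2)         # L_B: Dirichlet conditions u = 0
    cost = loss_pde + loss_bc                      # C = L_N + L_B
    cost.backward()
    optimizer.step()
```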
Both the deep least-squares method [235] and the deep Galerkin method [236] are closely related. Instead of focusing on the residuals at individual collocation points as in PINNs, these methods consider the \(L^2\)-norm of the residuals integrated over the domain \(\Omega \).
Variational physics-informed neural networks
Computing high-order derivatives for the non-linear operator \({\mathcal {N}}\) is expensive. Therefore, variational PINNs [237, 238] consider the weak form of the PDE, which lowers the order of differentiation. In the case of the bar equation, the weak PDE loss is given by
In [237], \(N_{{\mathcal {V}}}\) trigonometric and polynomial test functions \(w_i(x)\) are used. The cost function is obtained by replacing the PDE loss \({\mathcal {L}}_{\mathcal {N}}\) with the weak PDE loss \({\mathcal {L}}_{\mathcal {V}}\) in Eq. (32). Note that the Neumann boundary conditions are now not included in the boundary loss \({\mathcal {L}}_{\mathcal {B}}\), as they are already incorporated in the weak form in Eq. (33). The integrals are evaluated through numerical integration methods, such as Gaussian quadrature, Monte Carlo integration methods [239, 240], or sparse grid quadratures [241]. Severe inaccuracies can be introduced through the numerical integration of the NN output—for which remedies have been proposed in [242].
Weak adversarial networks
Instead of specifying the test functions w(x), weak adversarial networks [243] employ a second NN as test function
\[ {\hat{w}}(x) = w_{FNN}(x; \varvec{\theta }^w). \]
The test function is learned through a minimax optimization
\[ \min _{\varvec{\theta }^u} \max _{\varvec{\theta }^w} {\mathcal {L}}_{{\mathcal {V}}}, \]
where the test function w(x) continually challenges the solution u(x).
Deep energy method and deep Ritz method
The deep energy method [244] and the deep Ritz method [245] instead minimize the potential energy \(\Pi =\Pi _i+\Pi _e\), which overcomes the need for test functions. This results in the following loss term:
Note that the inverse problem generally cannot be solved using the minimization of the potential energy. Consider, for instance, the potential energy of the bar equation in Eq. (37), which is not well-posed in the inverse setting. Here, EA(x) going towards \(-\infty \) in the domain \(\Omega \) and going towards \(\infty \) at \(\Gamma _N\) minimizes the potential energy \({\mathcal {L}}_{{\mathcal {E}}}\).
Extensions
A multitude of extensions to the PINN methodology exist. For in-depth reviews, see [229,230,231].
Learning multiple solutions
Currently, PINNs are mainly employed to learn a single solution. As the training effort exceeds the solving effort of classical solvers, the viability of PINNs is questionable [246]. However, PINNs can also be employed to learn multiple solutions. This is achieved by providing the parametrization of the PDE, i.e., \(\lambda \), as an additional input to the network, as discussed in Sect. 2.1. This enables a cheap prediction stage without retraining for new solutionsFootnote 16. One example of this is [247], where different geometries are captured in terms of point clouds and processed with point cloud-based NNs [117].
Boundary conditions
The enforcement of the boundary conditions through a penalty term \({\mathcal {L}}_{{\mathcal {B}}}\) in Eq. (31) leads to an unbalanced optimization, due to the competing loss terms \({\mathcal {L}}_{{\mathcal {N}}}, {\mathcal {L}}_{{\mathcal {B}}}, {\mathcal {L}}_{{\mathcal {D}}}\) in Eq. (32)Footnote 17. One remedy is to modify the NN output \(F_{FNN}\) by multiplication with a function, such that the Dirichlet boundary conditions are satisfied a priori, i.e., \({\mathcal {L}}_{{\mathcal {B}}}=0\), as demonstrated in [37, 248]:
\[ {\hat{u}}(x) = G(x) + D(x) F_{FNN}(x; \varvec{\theta }). \]
Here, G(x) is a smooth interpolation of the boundary conditions, and D(x) is a signed distance function that is zero at the boundary. For Neumann boundary conditions, [249] propose to predict u and its derivatives \(\partial u/\partial x\) with separate networks, such that the Neumann boundary conditions can be enforced strongly by modifying the derivative network. This requires an additional constraint ensuring that the derivative predictions match the derivative of u. For complex domains, G(x) and D(x) cannot be found analytically. Therefore, [248] use NNs to learn G(x) and D(x) in a supervised manner by prescribing either the boundary values or zero at the boundary and restricting the values within the domain to be non-zero. Similarly, [250] proposed using radial basis function networks for G(x), where \(D(x)=1\) is assumed. The radial basis function networks are determined by solving a linear system of equations constructed from the boundary conditions. On uniform grids, strong enforcement can be achieved through specialized CNN kernels [204] with constant padding terms for Dirichlet boundary conditions and ghost cells for Neumann boundary conditions. Constrained backward propagation [251] has also been proposed to guarantee the enforcement of boundary conditions [252, 253].
Another possibility is to introduce weighting terms \(\kappa _{{\mathcal {N}}}, \kappa _{{\mathcal {B}}}, \kappa _{{\mathcal {D}}}\) for each loss term. These are either hyperparameters, or they are learned during the optimization with attention mechanisms [254,255,256]. This is achieved by performing a minimax optimization with respect to all weighting terms \(\varvec{\kappa }=\{\kappa _{{\mathcal {N}}}, \kappa _{{\mathcal {B}}}, \kappa _{{\mathcal {D}}}\}\):
\[ \min _{\varvec{\theta }} \max _{\varvec{\kappa }} \, C = \kappa _{{\mathcal {N}}} {\mathcal {L}}_{{\mathcal {N}}} + \kappa _{{\mathcal {B}}} {\mathcal {L}}_{{\mathcal {B}}} + \kappa _{{\mathcal {D}}} {\mathcal {L}}_{{\mathcal {D}}}. \]
Expanding on this idea, each collocation point used for the loss terms can be considered an individual equality constraint [257, 258]. Therefore, a weighting term \(\kappa _{{\mathcal {N}}_i}\) is allocated to each collocation point \(x_i\), as illustrated for the PDE loss \({\mathcal {L}}_{{\mathcal {N}}}\) from Eq. (30):
\[ {\mathcal {L}}_{{\mathcal {N}}} = \frac{1}{N_{{\mathcal {N}}}} \sum _{i=1}^{N_{{\mathcal {N}}}} \kappa _{{\mathcal {N}}_i} \left( \frac{d}{dx}\left( EA(x_i) \frac{d{\hat{u}}(x_i)}{dx} \right) + p(x_i) \right) ^2 . \]
This has the added advantage that greater emphasis is assigned to more important collocation points, i.e., points that lead to larger residuals. This approach is strongly related to approaches relying on the augmented Lagrangian method [259] and to competitive PINNs [260], where an additional NN models the penalty weights \(\kappa (x)=K_{FNN}(x; \varvec{\theta }^\kappa )\). This is similar to weak adversarial networks, but formulated using the strong form.
Ansatz
Another prominent topic is the question of which ansatz to choose. The type of ansatz is, for example, determined by the NN architecture (see [261] for a comparison) or by combinations with classical ansatz formulations. Instead of using FC-NNs, some authors [182, 226] employ CNNs to exploit the spatial structure of the data. Irregular geometries can be handled by embedding the structure in a rectangular domain using binary encodings [262] or signed distance functions [86, 263]. Another option is a coordinate transformation onto rectangular grids [264]. The CNN requires a full-grid discretization, meaning that the coordinates x are analytically independent of the prediction \({\hat{u}} = F_{CNN}\). Thus, the gradients of u are not obtained with automatic differentiation, but with numerical differentiation, i.e., finite differences. Alternatively, the output of the CNN can represent the coefficients of an interpolation, as proposed under the name spline-PINNs [265] using Hermite splines. This again allows for automatic differentiation. This is similarly applied to irregular geometries in [266], where GNNs are used in combination with a piecewise polynomial basis. Using a classical basis has the added advantage that Dirichlet boundary conditions can be satisfied exactly. A further variation is the approximation of the coefficients of classical bases with FC-NNs. This is shown with B-splines in [267] in the sense of isogeometric analysis [268] and, similarly, for piecewise polynomials in [269]. However, instead of simply minimizing the PDE residual from Eq. (30) directly, the finite element discretization [270, 271] is exploited. The loss \({\mathcal {L}}_{{\mathcal {F}}}\) can thus be formulated in terms of the non-linear stiffness matrix \({\varvec{K}}\), the force vector \({\varvec{F}}\), and the degrees of freedom \({\varvec{u}}^h\).
In the forward problem, \({\varvec{u}}^h\) is approximated by an FC-NN, whereas for the inverse problem an FC-NN predicts \({\varvec{K}}\). Similarly, [272, 273] map a NN onto a finite element space by using the NN evaluations at nodal coordinates as the corresponding basis function coefficients. This also allows a straightforward strong enforcement of Dirichlet boundary conditions, as demonstrated in [79] with CNNs, where the nodes are represented as pixels (see Fig. 3).
Prior information on the solution can be incorporated through a feature layer [274]. If, for example, it is known that the solution is composed of trigonometric functions, a feature layer with trigonometric functions can be applied after the input layer. Thus, known features are given to the NN directly to aid the learning. Without known features, the task can also be modified to improve learning. Inspired by adaptivity from finite elements, refinements are progressively learned by additional layers of the NN [275] (see Fig. 6). Thus, a coarse solution \({\varvec{u}}_1\) is learned to begin with, then refined to \({\varvec{u}}_2\) by an additional layer, which again is refined to \({\varvec{u}}_3\) until the deepest refinement level is reached.
Domain decomposition
To improve the scalability of PINNs to more complex problems, several domain decomposition methods have been proposed. One approach is hp-variational PINNs [238], where the domain is decomposed into patches. Piecewise polynomial test functions are defined on each patch separately, while the solution is approximated by a globally acting NN. This enables a separate numerical integration of each patch, improving its accuracy.
In an alternative formulation, one NN can be used per subdomain. This was proposed as conservative PINNs [276], where conservation laws are enforced at the interfaces to ensure continuity. Here, the discrepancies between both the solution and the flux are penalized at the interface in a least-squares manner. The advantages of this approach are twofold: Firstly, parallelization is possible [277] and, secondly, adaptivity can be introduced. Shallower networks can be employed for smooth solutions and deeper networks for more complex solutions. The approach was generalized to any PDE in the context of extended PINNs [278]. Here, the interface condition is formulated in terms of the difference in both the residual and the solution.
Acceleration methods
Analogously to supervised learning, as discussed in Sect. 2.1, transfer learning can be applied to PINNs [279], as demonstrated, e.g., in phase-field fracture [280] and topology optimization [281]. These are very suitable problems: in phase-field fracture, the crack and displacement fields evolve with mostly local changes, and in topology optimization, only minor updates are expected between optimization iterations [281].
The poor performance of PINNs in their original form can also be improved with better sampling strategies. In importance sampling [282, 283], the collocation point density is proportional to the value of the cost function. Alternatively, residual-based adaptive refinement [234] adds collocation points in the vicinity of areas with a higher cost function.
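A sketch of such a residual-based refinement step, in the spirit of [234], is given below. The function `pde_residual`, the candidate count, and the number of added points are hypothetical placeholders.

```python
import torch

# Residual-based adaptive refinement sketch: after some training, add the
# candidate points with the largest PDE residual to the collocation set.
# `pde_residual` stands in for the problem-specific residual function
# (e.g., the bar residual above).
def refine(collocation, pde_residual, n_add=20, n_candidates=1000):
    candidates = torch.rand(n_candidates, 1, requires_grad=True)  # random candidates in (0, 1)
    res = pde_residual(candidates).abs().squeeze()
    worst = torch.topk(res, n_add).indices                        # largest residuals
    new_points = candidates[worst].detach().requires_grad_(True)
    return torch.cat([collocation, new_points], dim=0)            # enlarged collocation set
```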
Another essential topic for NNs is normalization of the inputs, outputs, and loss terms [284, 285]. For time-dependent problems, it is possible to use time-dependent normalization [286] to ensure that the solution is always in the same range regardless of the time step.
Furthermore, the cost function can be enhanced by including the derivative of the residual [287], as both the residual and its derivative should be zero at the correct solution. However, a general problem in the cost function formulation persists: the cost function should correspond to the norm of the error, which is not necessarily the case. This means that a reduction in the cost does not necessarily yield an improvement in the quality of the solution. The error norm can be expressed in terms of the \(H^{-1}\)-norm, which, according to [288], can be computed efficiently on rectangular domains with Fourier transforms. Thus, the \(H^{-1}\)-norm can be used directly as the cost function and minimized.
Another aspect is numerical differentiation, which is advantageous for the residual of the PDE [289], as automatic differentiation may be erroneous due to spurious oscillations between collocation points. Thus, numerical differentiation enforces regularity, which was exploited in [289] by coupling automatic differentiation and numerical differentiation to retain the advantages of automatic differentiation.
Further specialized modifications to NN architectures have been proposed. Adaptive activation functions [290] have been shown to accelerate convergence. Extreme learning machines [291, 292] remove the need for iterative optimization altogether: all layers are randomly initialized, and only the last layer is learnable. Without a non-linear activation function in the last layer, its parameters can be found with a least-squares regression. This was demonstrated for PINNs in [293]. Instead of only learning the last layer, the problem can also be split into a non-linear and a linear regression problem, which are solved separately [294], such that the full expressivity of NNs is retained.
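The least-squares idea behind extreme learning machines can be sketched as follows. The data, feature count, and activation are placeholders, and the sketch shows plain regression rather than the PINN variant of [293].

```python
import torch

# Extreme learning machine sketch: a randomly initialized (frozen) hidden
# layer, with only the last linear layer fitted by least squares.
x = torch.linspace(-3.0, 3.0, 200).unsqueeze(-1)
y = torch.sin(x)                                  # placeholder labeled data

W = torch.randn(1, 100)                           # random, untrained hidden weights
b = torch.randn(100)
H = torch.tanh(x @ W + b)                         # random non-linear features

# Last layer via linear least squares instead of iterative training.
beta = torch.linalg.lstsq(H, y).solution
y_hat = H @ beta
```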
Applications to forward problems
PINNs have been applied to various PDEs (see [229,230,231] for an overview). Forward problems can, for example, be found in solid mechanics [284, 295, 296], fluid mechanics [297,298,299,300,301,302,303,304], and thermomechanics [305, 306]. Currently, PINNs do not outperform classical solvers such as the finite element method [246, 307] in terms of speed for a given accuracy of engineering relevance. In the authors’ experience and judgement, this is especially the case for forward problems, even if the extensions mentioned above are employed. Often, the reported gains compared to classical forward solvers disregard the training effort and only report evaluation times.
Incorporating large parts of the solution in the form of measurements through the data-driven loss \({\mathcal {L}}_{{\mathcal {D}}}\) improves the performance of PINNs, which can thereby become a viable method in some cases. Yet, [308] states that data-driven methods outperform PINNs. Thus, PINNs should not be regarded as a replacement for data-driven methods, but rather as a regularization technique for data-driven methods that reduces the generalization error.
Applications to inverse problems
PINNs are, however, particularly useful for inverse problems with full domain knowledge, i.e., where the solution is available throughout the entire domain. This has, for example, been shown for the identification of material properties [285, 309,310,311,312]. By contrast, for inverse problems with only partial knowledge, the applicability of PINNs is limited [313], as both the forward and the inverse solution have to be learned simultaneously. Most applications therefore limit themselves to simpler inversions such as size and shape optimization. Examples are published, e.g., in [295, 314,315,316,317,318,319]. Exceptions that deal with the identification of entire fields can be found in full waveform inversion [320], topology optimization [321], elasticity, and the heat equation [322].
2.2.1.2. Inverse problems
PINNs are capable of discovering governing equations by either learning the operator \({\mathcal {N}}\) or the coefficients \(\lambda \). The resulting operator is, however, not always interpretable, and in the case of identification of the coefficients, the underlying PDE is assumed. To discover interpretable operators, one can apply sparse regression approaches [323]. Here, potential differential operators are assumed as an input to the non-linear operator
Subsequently, a NN learns the corresponding coefficients using observed solutions inserted into Eq. (42). The evaluation of the differential operators is achieved through automatic differentiation by first interpolating the solution with a NN. Sparsity is ensured with an \(L^1\)-regularization.
A more sophisticated and complete framework is AI-Feynman [324]. Sequentially, dimensional analysis, polynomial regression, and brute force search algorithms are applied to identify fundamental laws in the data. If unsuccessful, a NN interpolates the data, which can thereby be queried for symmetry and separability. The identification of symmetries leads to a reduction in variables, i.e., a reduction of the input space. In the case of separability, the problem is decomposed into two subproblems. The reduced problems or subproblems are iteratively fed through the framework until an equation is identified. AI-Feynman has been successfully applied to 100 equations from the Feynman lectures [325].
2.2.2 Time-stepping procedures
Again Eqs. (3) and (4) will be considered for the time-stepping procedures.
2.2.2.1. Physics-informed neural networks
In the spirit of domain decomposition, parareal PINNs [326] split up the temporal domain into subdomains \([t_i, t_{i+1}]\). A rough estimate of the solution u is provided by a conjugate gradient solver on a simplified form of the PDE, starting from \(t_0\). PINNs are then independently applied in each subdomain to correct the estimate. Subsequently, the conjugate gradient solver is applied again, starting from \(t_1\). This process is repeated until all time steps have been traversed. A closely related approach can be found in [327], where a PINN is retrained on successive time segments. A data-driven loss term on the already learned time segments ensures that previous time steps remain fulfilled.
Another approach is discrete-time PINNs [228], which treat the temporal dimension in a discrete manner. The differential equation from Eq. (3) is discretized with the Runge-Kutta method with q stages [328]:
where
A NN \(F_{NN}\) predicts all stages \(i=1,\dots ,q\) from an input x:
\[ [{\hat{u}}_1^n(x), \dots , {\hat{u}}_q^n(x), {\hat{u}}_{q+1}^n(x)] = F_{NN}(x; \varvec{\theta }). \]
The cost is then constructed by rearranging Eqs. (43) and (44).
The \(q+1\) predictions \({\hat{u}}_i^n, {\hat{u}}^n_{q+1}\) of \({\hat{u}}^n\) have to match the initial conditions \(u^{{\mathcal {M}}^n}\), where the mean squared error is used as a loss function to learn all stages \(\varvec{{\hat{u}}}\). The approach has been applied to fluid mechanics [329, 330].
2.2.2.2. Inverse problems
As for inverse problems in the space-time approaches (Paragraph 2.2.1.2), the non-linear operator \({\mathcal {N}}\) can be learned. For temporal problems, this corresponds to the right-hand side of Eq. (3) for PDEs and to Eq. (4) for systems of ODEs. The predicted right-hand side can then be used to predict time series using a classical time-stepping scheme, as proposed in [331]. More sophisticated methods leaning on similar principles are presented in the following. Specifically, we will discuss PDE-Net for discovering PDEs, SINDy for discovering systems of ODEs in an interpretable sense, and an approach relying on multistep methods for systems of ODEs. The multistep approach leads to a non-interpretable, but more expressive approximation of the right-hand side.
PDE-Net
PDE-Net [332, 333] is designed to learn both the system dynamics u(x, t) and the underlying differential equation it follows. Given a problem of the form of Eq. (3), the right-hand side can be approximated as a function of coordinates and gradients of the solution.
The operator \(\hat{{\mathcal {N}}}^{{\mathcal {T}}}\) is approximated by NNs. The first step involves estimating spatial derivatives using learnable convolutional filters. The filters are designed to adjust their order of approximation based on the fit to the underlying measurements \(u^{{\mathcal {M}}}\), while the type of gradient is predefinedFootnote 18. Thus, the NN learns how to best approximate spatial derivatives specific to the underlying data. Subsequently, the inputs of \(\hat{{\mathcal {N}}}^{{\mathcal {T}}}\) are combined with point-wise CNNs [334] in [332] or with a symbolic network in [333]. Both yield an interpretable operator from which the analytical expression can be extracted. In order to construct a loss function, Eqs. (3) and (49) are discretized using the forward Euler method:
\[ {\hat{u}}(x, t_{n+1}) = u(x, t_n) + \Delta t \, \hat{{\mathcal {N}}}^{{\mathcal {T}}}[u(x, t_n); \lambda ]. \]
This temporal discretization is applied iteratively, and the discrepancy between the derived function and the measured data \(u^{{\mathcal {M}}}(x, t_{n})\) serves as the loss function.
SINDy
Sparse identification of non-linear dynamic systems (SINDy) [335] deals with the discovery of dynamic systems of the form of Eq. (4). The task is posed as a sparse regression problem. Snapshot matrices of the state \({\varvec{X}}=[{\varvec{x}}(t_1),{\varvec{x}}(t_2),\dots ,{\varvec{x}}(t_n)]\) and its time derivative \(\dot{{\varvec{X}}}=[\dot{{\varvec{x}}}(t_1),\dot{{\varvec{x}}}(t_2),\dots ,\dot{{\varvec{x}}}(t_n)]\) are related to one another via candidate functions \(\varvec{\Theta }({\varvec{X}})\) evaluated at \({\varvec{X}}\) using unknown coefficients \(\varvec{\Xi }\):
\[ \dot{{\varvec{X}}} = \varvec{\Theta }({\varvec{X}}) \varvec{\Xi }. \]
The coefficients \(\varvec{\Xi }\) are determined through sparse regression, such as sequential thresholded least squares or LASSO regression. By including partial derivatives, SINDy has been extended to the discovery of PDEs [336, 337].
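A minimal SINDy sketch with a polynomial candidate library and sequential thresholded least squares is given below. The snapshot matrices are random placeholders and the threshold value is arbitrary.

```python
import numpy as np

# Placeholder snapshot matrices: one row per time step, one column per state variable.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
X_dot = rng.standard_normal((500, 2))

def library(X):                                   # candidate functions Theta(X)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

Theta = library(X)
Xi = np.linalg.lstsq(Theta, X_dot, rcond=None)[0]  # initial dense coefficients

for _ in range(10):                                # sequential thresholded least squares
    Xi[np.abs(Xi) < 0.1] = 0.0                     # enforce sparsity via hard threshold
    for k in range(Xi.shape[1]):                   # refit the active terms per equation
        active = np.abs(Xi[:, k]) >= 0.1
        if active.any():
            Xi[active, k] = np.linalg.lstsq(Theta[:, active], X_dot[:, k], rcond=None)[0]
```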
The expressivity of SINDy can be increased further by a coordinate transformation into a representation that allows for a simpler description of the system dynamics. This can be achieved with an autoencoder (consisting of an encoder \(e_{NN}(x;\varvec{\theta }^e)\) and a decoder \(d_{NN}(h;\varvec{\theta }^d)\)), as proposed in [338], where the dynamics are learned on the reduced latent space h using SINDy. A simultaneous optimization of the NN parameters \(\varvec{\theta }^e, \varvec{\theta }^d\) and the SINDy parameters \(\varvec{\Xi }\) is conducted with gradient descent. The cost is defined in terms of the autoencoder reconstruction loss \({\mathcal {L}}_{{\mathcal {A}}}\) and the residual of Eq. (51) at both the reduced latent space \({\mathcal {L}}_{{\mathcal {R}}}\) and the original space \({\mathcal {L}}_{{\mathcal {F}}}\)Footnote 19. An \(L^1\)-regularization of \(\varvec{\Xi }\) promotes sparsity.
As in Eq. (24), a weighted cost function with weights \(\kappa _{{\mathcal {A}}},\kappa _{{\mathcal {R}}},\kappa _{{\mathcal {F}}}\) is employed. The reduced latent space can be exploited for forward simulations of the identified system. By solving the system with classical time-stepping schemes in the reduced latent space, the solution in the full space is obtained through the decoder, as outlined in [339]. Thus, a reduced order model of a previously unknown system is identified. The downside is that the model is no longer interpretable in the full space.
Multistep methods
Another approach [340] to learning the system dynamics from Eq. (4) is to approximate the right-hand side directly with a NN \(\varvec{{\hat{f}}}({\varvec{x}}_i)=O_{NN}({\varvec{x}}_i;\varvec{\theta })\), \({\varvec{x}}_i={\varvec{x}}(t_i)\). A residual can be formulated by considering linear multistep methods [328]. In general, these methods take the form:
where \(M, \alpha _0, \alpha _1, \beta _0, \beta _1\) are parameters specific to a multistep scheme. The scheme can be reformulated with a cost function, given as:
The idea of the method is strongly linked to the discrete-time PINN presented in Paragraph 2.2.2.1, where a reformulation of the Runge-Kutta method yields the cost function needed to learn the forward solution.
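A minimal sketch of this idea, using the trapezoidal rule as one admissible multistep scheme, is given below; the scheme, the network size, and the random placeholder trajectory are our own choices.

```python
import torch
import torch.nn as nn

# Learn f in x' = f(x) from a measured trajectory via the residual of a
# linear multistep scheme, here the trapezoidal rule:
# r_n = x_{n+1} - x_n - dt/2 * (f(x_{n+1}) + f(x_n)).
f_nn = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def multistep_loss(x, dt):
    # x: (steps, state_dim) trajectory of measured states x(t_i)
    f = f_nn(x)
    residual = x[1:] - x[:-1] - 0.5 * dt * (f[1:] + f[:-1])
    return (residual ** 2).mean()

x_measured = torch.rand(100, 2)   # placeholder for measured states
optimizer = torch.optim.Adam(f_nn.parameters(), lr=1e-3)
loss = multistep_loss(x_measured, dt=0.01)
loss.backward()
optimizer.step()
```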
2.2.3 Enforcement of physics by construction
Up to this point, this review only considered the case where physics are enforced indirectly through penalty terms of the PDE residual. The only exception, and the first example of enforcing physics by construction, was the strong enforcement of boundary conditions [37, 204, 248] by modifying the outputs of the NN—which led to a fulfillment of the boundary conditions independent of the NN parameters. For PDEs, this can be achieved by manipulating the output, such that the solution automatically obeys fundamental physical laws. Examples include [341], where stream functions are predicted and subsequently differentiated to ensure conservation of mass, and the incorporation of symmetries [342] or invariances [343] by using integrity bases [344]. Dynamical systems have been treated by learning the Lagrangian or Hamiltonian with, correspondingly, Lagrangian NNs [345,346,347] and Hamiltonian NNs [348]. The quantities of interest are obtained through the differentiable NN and compared to labeled data. Indirectly learning the quantities of interest through the Lagrangian or Hamiltonian guarantees the conservation of energy. Enforcing the physics by construction is also referred to as physics-constrained learning, as the learnable space is constrained. Note, however, that constraining the learnable space also challenges the learning algorithm, thus potentially making convergence more difficult. Therefore, [225] relaxes the requirement of fulfilling the physical laws by introducing a secondary unconstrained network—acting additively on the solution—whose influence is scaled by a hyperparameter. More examples of physics enforcement by construction are provided in the context of simulation enhancement in Sect. 3.2.
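As a concrete example of strong boundary condition enforcement, consider the following one-dimensional sketch; the ansatz and the interval \([0,1]\) are illustrative choices.

```python
import torch
import torch.nn as nn

# Dirichlet conditions u(0) = u0, u(1) = u1 enforced by construction:
# u_hat(x) = u0*(1-x) + u1*x + x*(1-x)*NN(x) satisfies them for any parameters.
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
u0, u1 = 0.0, 1.0

def u_hat(x):
    return u0 * (1 - x) + u1 * x + x * (1 - x) * net(x)

x = torch.tensor([[0.0], [0.5], [1.0]])
print(u_hat(x))  # first and last entries equal u0 and u1 exactly
```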
3 Simulation enhancement
The category of simulation enhancement deals with any deep learning technique that interacts directly with and, thus, improves a component of a classical simulation. This is the most diverse category and will therefore be subdivided into the individual steps of a classical simulation pipeline:
- pre-processing
- physical modeling
- numerical methods
- post-processing
Both data-driven and physics-informed approaches will be discussed in the following.
3.1 Pre-processing
The discussed pre-processing methods are trained in a supervised manner relying on the techniques presented in Sect. 2.1 and on labeled data.
3.1.1 Data preparation
Data preparation includes tasks such as geometry extraction. For instance, cracks detected from images by means of segmentation [349,350,351] can subsequently be used in simulations to assess their impact. CNNs have also been used to prepare voxel data obtained from computed tomography scans, see [352], where scanning artifacts are removed. Similarly, NNs can be employed to enhance measurement data. This was, for example, demonstrated in [353], where the NN acts as a denoiser for magnetic signals in the scope of non-destructive testing. Likewise, low-frequency extrapolation for full waveform inversion has been performed using NNs [354,355,356].
3.1.2 Initialization
Instead of preparing the data, the simulation can be accelerated by an initialization. This can, for example, be achieved through initial guesses by NNs, providing a better starting point for classical iterative solvers [357]Footnote 20. A tighter integration is achieved by using a pre-trained [279] NN ansatz whose parameters are subsequently tweaked by the classical solver, as demonstrated for full waveform inversion in [224].
3.1.3 Meshing
Finally, many simulation techniques rely on meshes. NNs can support meshing indirectly by predicting mesh density functions [358,359,360,361,362], incorporating either expert knowledge of where small elements are needed or relying on error estimations; a classical mesh generator is subsequently employed. However, NNs (specifically let-it-grow NNs [363]) have also been proposed directly as mesh generators [364, 365].
3.2 Physical modeling
Physical models that capture physical phenomena accurately are a core component of mechanics. Deep learning offers three main approaches for physical models. Firstly, a NN is used as the physical model directly (model substitution). Secondly, an underlying model may be assumed where a NN determines its coefficients (identification of model parameters). Lastly, the entire model can be identified by a NN (model identification). In the first approach, the NN is integrated within the simulation pipeline, while the latter two rely on incorporation of the identified models in a classical sense.
For illustration purposes, the approaches are mostly explained using the example of constitutive models. Here, the task is to relate the strain \(\varepsilon \) to a stress \(\sigma \), i.e., find a function \(\sigma =f(\varepsilon )\). This can, for example, be used within a finite element framework to determine the element stiffness, as elaborated in [366].
3.2.1 Model substitution
In model substitution, a NN \(f_{NN}\) replaces the model, yielding the prediction \({\hat{\sigma }}=f_{NN}(\varepsilon ;\varvec{\theta })\). The quality of the model is assessed with a data-driven cost function (Eq. 5) using labeled data \(\sigma ^{{\mathcal {M}}},\varepsilon ^{{\mathcal {M}}}\). The approach is applied to a variety of problems, where the key difference lies in the definition of input and output quantities. The same deep learning techniques from data-driven simulation substitution (Sect. 2.1) can be employed.
Applications include predictions of stress from strain [366, 367], flow stresses from temperatures, strain rates and strains [368, 369], yield functions [370], crack opening responses from stresses [371], contact stiffness from penetration and contact pressure [372], point of contact from position of neighboring nodes of finite elements [373], or control points of NURBS surfaces [374]. Source terms of simplified equations or coarser discretizations have also been learned for turbulence [74, 375, 376] and the wave equation [377]. Here, the reference—a high-fidelity model—is to be captured in the best possible way by the source term.
Variations also predict the quantity of interest indirectly. For example, strain energy densities \(\psi \) are predicted by NNs from deformation tensors F and subsequently differentiated using automatic differentiation to obtain stresses [378, 379]. The approach can also be extended to incorporate uncertainty quantification [380]. By extending the input space with microstructural information, an in-built homogenization is added to the constitutive model [381,382,383]. Thus, the macroscale simulation considers the microstructure at the integration points in the sense of \(\hbox {FE}^2\) [384, 385], but without an additional finite element computation. Incorporation of microstructures requires a large amount of realistic training data, which can be obtained through generative approaches as discussed in Sect. 5. Active learning can reduce the required number of simulations on these geometries [221].
A specialized NN architecture is employed by [386], where a NN first estimates invariants I of the deformation tensor F and thereupon predicts the strain energy density, thus mimicking the classical constitutive modeling approach. Another network extension is the use of RNNs to learn history-dependent models. This was shown in [381, 382, 387, 388] for the prediction of the stress increment from the stress-strain history, the strain energy from the strain energy history [389], and crack patterns based on prior cracks and crystalline orientations [390, 391].
The learned models do not, however, necessarily obey fundamental physical laws. Attempts to incorporate physics as constraints using penalty terms have been made in [392,393,394]. Still, physical consistency is not guaranteed. Instead, NN architectures can be chosen such that they satisfy physical requirements by construction. In constitutive modeling, objectivity can be enforced by using only deformation invariants as input [395], and polyconvexity can be enforced through the architecture, such as input-convex NNs [396,397,398,399] or neural ordinary differential equations [395, 400]. It was demonstrated that ensuring fundamental physical requirements, such as objectivity through invariants combined with polyconvexity, delivers a much better behavior for unseen data, especially if the model is used in extrapolation.
Input-convex NNs [401] enforce convexity with specialized activation functions, such as log-sum-exponential or softplus functions, in combination with constraints on the NN weights to ensure that they are positive, while neural ordinary differential equations [402] (discussed in Sect. 4) approximate the strain energy density derivatives and ensure non-negative values. Alternatively, a mapping from the NN to a convex function can be defined [403], ensuring a convex function for any NN output. Related are also thermodynamics-based NNs [404, 405], e.g., applied to complex microstructures in [406], which by construction obey fundamental thermodynamic laws. Training of these methods can be performed in a supervised manner, relying on stress-strain data, or unsupervised. In the unsupervised setting, the constitutive model is incorporated in a finite element solver, yielding a displacement field for a specific boundary value problem. The computed field, together with measurement data, yields a residual that is referred to as the modified constitutive relation error (mCRE) [407,408,409], which is minimized to improve the constitutive relation [410, 411]. Instead of formulating the mismatch in terms of displacements, [412, 413] formulate it in terms of boundary forces. For an in-depth overview of constitutive model substitution in deep learning, see [32].
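A minimal input-convex architecture might look as follows; this is a simplification in the spirit of [401], not a reproduction of a published model, and the layer sizes, the uniaxial strain input, and the softplus reparametrization of the weights (one of several ways to keep them positive) are our own choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Input-convex NN sketch: convex, nondecreasing activations (softplus) and
# positive weights on the pass-through of previous layers make the output
# convex in the input. Used as a strain energy density, the stress follows
# by automatic differentiation.
class ICNN(nn.Module):
    def __init__(self, dim_in=1, width=16):
        super().__init__()
        self.Wx0 = nn.Linear(dim_in, width)       # first layer: unconstrained
        self.Wz_raw = nn.Parameter(torch.randn(width, width) * 0.1)
        self.Wx1 = nn.Linear(dim_in, width)       # skip from input: unconstrained
        self.out_raw = nn.Parameter(torch.randn(1, width) * 0.1)

    def forward(self, x):
        z = F.softplus(self.Wx0(x))
        # softplus of the raw weights keeps them positive, preserving convexity
        z = F.softplus(F.linear(z, F.softplus(self.Wz_raw)) + self.Wx1(x))
        return F.linear(z, F.softplus(self.out_raw))

psi_nn = ICNN()
eps = torch.linspace(-0.1, 0.1, 50).reshape(-1, 1).requires_grad_(True)
psi = psi_nn(eps)                                 # convex energy density
sigma = torch.autograd.grad(psi.sum(), eps)[0]    # stress via autodiff
```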
3.2.2 Identification of model parameters
Identification of model parameters is achieved by assuming an underlying model and training a NN to predict its parameters for a given input. In the constitutive model example, one might assume a linear elastic model expressed in terms of a constitutive tensor c, such that \(\sigma =c\varepsilon \). The constitutive tensor can then be predicted from the material distribution, defined in terms of a heterogeneous elasticity modulus \({\varvec{E}}\) throughout the domain, i.e., \({\hat{c}}=f_{NN}({\varvec{E}};\varvec{\theta })\).
Typical applications are homogenization, where effective properties are predicted from the geometry and material distribution. Examples are CNN-based homogenizations on computed tomography scans [414, 415], predictions of in-vivo constitutive parameters of aortic walls from their geometry [416], predictions of elastoplastic properties [417] from instrumented indentation results relying on a multi-fidelity approach [418], prediction of stress intensity factors from the geometry in microfabricated microcantilevers [419], and the estimation of effective bone properties from the boundary conditions and applied stresses within a finite element, incorporating meso-scale information by training a NN on representative volume elements [420].
3.2.3 Model identification
NN models that replace classical approaches entirely are not interpretable, while identifying only the parameters of known models restricts the model capacity. This gap can be bridged by the identification of models in terms of parsimonious mathematical expressions.
The typical procedure is to pose the problem in terms of candidate functions and to identify the most relevant terms. The methodology was inspired by SINDy [335] and introduced in the framework for efficient unsupervised constitutive law identification and discovery (EUCLID) [421]. The approach is unsupervised, as the stress-strain data is only indirectly available through the displacement field and corresponding reaction forces. The \(N_I\) invariants \(I_i\) of the deformation tensor F are inserted into a candidate library \(Q(\{I_i\}_{i=1}^{N_I})\) containing the candidate functions. Together with the corresponding weights \(\varvec{\theta }\), the strain energy density \(\psi \) is determined:
Through differentiation of the strain energy density \(\psi \) using automatic differentiation, the stresses \(\varvec{\sigma }\) are determined. The problem is then cast into the weak form, with which the linear momentum balance is enforced. The weak form is then minimized with respect to \(\varvec{\theta }\) using a fixed-point iteration scheme (inspired by [422]), where an \(L_p\)-regularization is used to promote sparsity in \(\varvec{\theta }\). Despite its young age, the approach has already been applied to plasticity [423], viscoelasticity [424], combinations thereof [425], and has been extended to incorporate uncertainties through a Bayesian model [426]. Furthermore, the approach has been extended with an ensemble of input-convex NNs [413], yielding a more accurate, but less interpretable model.
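The following fragment illustrates the core idea in a strongly simplified, uniaxial setting; the candidate functions, names, and values are placeholders and not those of [421].

```python
import torch

# Parsimonious ansatz: the strain energy density is a sparse linear
# combination of candidate functions of the invariants; the stress follows
# by automatic differentiation.
def candidates(I1):
    # polynomial candidates in the first invariant, vanishing at I1 = 3
    return torch.stack([I1 - 3, (I1 - 3) ** 2, (I1 - 3) ** 3], dim=-1)

theta = torch.zeros(3, requires_grad=True)     # sparse coefficients to identify

F = torch.tensor([1.1], requires_grad=True)    # uniaxial stretch (placeholder)
I1 = F ** 2 + 2.0 / F                          # first invariant (incompressible, uniaxial)
psi = (candidates(I1) * theta).sum()           # strain energy density
P = torch.autograd.grad(psi, F, create_graph=True)[0]  # Piola-Kirchhoff stress

# In EUCLID, P enters the weak form of the momentum balance assembled from
# full-field displacement data; an Lp penalty on theta promotes sparsity:
sparsity = ((theta.abs() + 1e-8) ** 0.5).sum() # p = 1/2 regularization term
```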
A similar effort was recently carried out by [427, 428], where NNs are designed to retain interpretability. This is achieved through sparse connections in combination with specialized activation functions representing candidate functions, such that they are able to capture classical forms of constitutive terms. Through the sparse connections in the network and the specialized activation functions, the NN’s weights become physical parameters, yielding an interpretable model. This is best understood by consulting Fig. 7, where the strain energy density is expressed as
Differentiating the predicted strain energy density \({\hat{\psi }}\) with respect to the invariants \(I_i\) yields the constitutive model, relating stress and strain.
3.3 Numerical methods
This subsection describes efforts in which NNs are used to replace or enhance classical numerical schemes to solve PDEs.
3.3.1 Algorithm enhancement
Classical algorithms can be enhanced by NNs by learning corrections to commonly arising numerical errors or by estimating tunable parameters within the algorithm. Corrections have, for example, been used for numerical quadrature [429] in the context of finite elements. Therein, NNs predict adjustments to quadrature weights and positions from the nodal positions to improve the accuracy for distorted elements. Similarly, NNs have been applied as corrections to strain-displacement matrices for distorted elements [430]. NNs have also been employed to provide improved gradient estimates. Specifically, [431] modify the gradient computation to match a fine scale simulation on a coarse grid:
The coefficients \(\alpha _i\) are predicted by NNs from the current coarse solution. Special constraints are imposed on \(\alpha _i\) to guarantee accurate derivatives. Another application is specialized strain mappings for damage mechanics embedded within individual finite elements, learned by PINNs [432]. It has even been suggested to partially replace solvers. For example, [433] replace either the fluid or the structural solver by a surrogate model for fluid-structure interaction problems.
Learning tunable parameters was demonstrated for the estimation of the largest possible time step using a RNN acting on the latent vector of an autoencoder [434]. Also, optimal test functions for finite elements were learned to improve stability [435]. Another approach to learning numerical parameters for simulations is presented in [436], where hyperparameters connected to a similarity-based topology optimization are learned—specifically, an energy scaling factor is predicted from a dissimilarity metric based on a previous topology optimization. These approaches have in common that they spare the user from performing multiple simulations to tune the numerical parameters.
3.3.2 Multiscale methods
Multiscale methods have been proposed to efficiently integrate and resolve systems acting on multiple scales. One approach is the learned constitutive models from Sect. 3.2 that incorporate the microstructure. This is essentially achieved through a homogenization at the mesoscale used within a macroscale simulation.
A related approach is element substructuring [437, 438], where superelements mimic the behavior of an assembly of classical finite elements. In [439], the superelements are enhanced by NNs, which draw on the boundary displacements to predict the displacements and stresses within the element as well as the reaction forces at the boundary. Through assembly of the reaction forces in the global finite element system, an equilibrium is reached with a Newton-Raphson solver. Similarly, the approach in [440] learns the internal forces from the coarse degrees of freedom of the superelements. These approaches are particularly valuable, as they can seamlessly incorporate history-dependent behavior using RNNs.
Finally, multiscale analysis can also be performed by first solving a coarse global model with a subsequent local analysis. This is referred to as zooming methods. In [441], a NN learns the global model and thereby predicts the boundary conditions for the local model. In a similar sense, DeepONets have been applied for the local analysis [442], whereas the global analysis is performed with a finite element solver. Both are conducted in an alternating fashion until convergence is reached.
3.3.3 Optimization
Optimization is a fundamental task within computational mechanics and is therefore addressed separately. It is not only used to find optimal structures, but also to solve inverse problems. Generally, the task can be formulated as minimizing a cost function C with respect to parameters \(\lambda \). In computational mechanics, \(\lambda \) is typically fed to a forward simulation \(u=F(\lambda )\), yielding a solution u inserted into the cost function C. If the gradients \(\nabla _\lambda C\) are available, gradient-based optimization is the state-of-the-art [443], where the gradients are used to update \(\lambda \). In order to access the gradients, the forward simulation F has to be differentiable. This requirement is, for example, utilized within the branch of deep learning called differentiable physics [36]. Incorporating gradient information from the numerical solver into the NN improves learning, feedback, and generalization. An overview and introduction to differentiable physics is provided in [36], with applications in [215, 402, 431, 444,445,446]Footnote 21.
The iterative gradient-based optimization procedure is illustrated in Fig. 8. For an in-depth treatment of NNs in optimization, see the recent review [22].
Inserting a learned forward operator F, such as those discussed in Sect. 2.1, into an optimization problem provides two advantages [447,448,449,450,451]. Firstly, a faster forward operator results in faster optimization iterations. Secondly, the gradient computation is simplified, as automatic differentiation through the forward operator F is straightforward in contrast to the adjoint state method [452, 453]. Note, however, that for time-stepping procedures, the computational cost might be greater for automatic differentiation, as shown in [313]. Applications include full waveform inversion [313], topology optimization [454,455,456], and control problems [70, 72, 444].
Similarly, an operator replacing the sensitivity computation can be learned [456,457,458,459]. This can be achieved in a supervised manner with precomputed sensitivities [456, 458], or by maximizing the improvement of the cost C after the gradient update [457, 459]. In [457, 459], an evolutionary algorithm was employed for the general case that the sensitivities are not readily available. Training can adaptively be reintroduced during the optimization phase if the cost C does not decrease [456], improving the NN for the specific problem it is handling. Taking this idea to the extreme, the NN is trained on the initial gradient updates of a specific optimization. Later, solely the NN delivers the sensitivities [460], with supervised updates every n updates to improve accuracy, where n is a hyperparameter. The ideas of learning a forward operator and a sensitivity operator are combined in [455], where it is pointed out that the sensitivity from automatic differentiation through the learned forward operator can be inaccurate, despite an accurate forward operatorFootnote 22. Therefore, an additional loss term is added to the cost function, enforcing the correctness of the sensitivity through labels obtained with the adjoint state method. Alternatively, the sensitivity computation can be enhanced by correcting a sensitivity computation performed on a coarse grid, as proposed in [461] and related to the multiscale techniques discussed in Sect. 3.3.2. Here, the adjoint field used for the sensitivity computation is reduced by both a proper orthogonal decomposition and a coarser discretization. Subsequently, a super-resolution NN [462] corrects the coarse estimate. Similarly, [456, 463] map the forward solution on a coarse grid to the design variable sensitivities on a fine grid. A similar application is a correction term within a fixed-point iterator, as outlined in [464].
Related to the sensitivity predictions are approaches that directly predict an updated state. The goal is to decrease the total number of iterations. In practice, a combination of predictions and classical gradient-based updates is performed [111,112,113, 465]. The main variations between the methods in the literature are the inputs and how far the forecasting is performed. In [111], the update is obtained from the current state and gradient, while [113] predicts the final state from the history of initial updates. The history is also considered in [112], but the prediction is performed on subpatches which are then stitched together.
Another option for introducing NNs into the optimization loop is to use NNs as an ansatz of \(\lambda \), see, e.g., [313, 444, 466,467,468,469,470,471,472,473,474]. In the context of inverse problems [313, 444, 466,467,468,469,470], the NN acts as a regularizer on a spatially varying inverse quantity \(\lambda (x)=I_{NN}(x;\varvec{\theta })\), providing both smoother and sharper solutions. For topology optimization with a NN parametrization of the density function [471,472,473,474], no regularizing effect was observed. It was, however, possible to obtain a greater design diversity through different initializations of the NN. Extensions using specialized NN architectures for implicit representations [475,476,477,478,479,480] have been presented in the context of topology optimization in [481]. Furthermore, [313, 468, 472] showed how to conduct the gradient computation without automatic differentiation through the solver F. The gradient computation is split up via the chain rule:
\(\nabla _{\varvec{\theta }} C = \nabla _{\lambda } C \, \nabla _{\varvec{\theta }} \lambda \).
The first gradient \(\nabla _{\lambda } C\) is computed with the adjoint state method, such that the solver can be treated as a black box. The second gradient \(\nabla _{\varvec{\theta }} \lambda \) is obtained through automatic differentiation. An additional advantage of the NN ansatz is that, if applied to multiple problems with a problem-specific input, the NN is effectively trained. Thus, after sufficient inversions, the NN can be used as a predictor, as presented in [482]. The training can also be performed in combination with labeled data, yielding a semi-supervised approach, as demonstrated in [224, 483].
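In an autodiff framework, this split amounts to seeding backpropagation with the adjoint gradient, as in the following sketch; the adjoint gradient is replaced by random placeholder values, and the network is arbitrary.

```python
import torch
import torch.nn as nn

# The solver is a black box returning dC/dlambda via the adjoint state
# method; autodiff propagates this vector through the NN ansatz to obtain
# dC/dtheta as a vector-Jacobian product.
I_nn = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

x = torch.linspace(0, 1, 100).reshape(-1, 1)
lam = I_nn(x)                            # NN ansatz of the inverse quantity

adjoint_gradient = torch.rand_like(lam)  # placeholder for dC/dlambda from the solver

lam.backward(gradient=adjoint_gradient)  # accumulates dC/dtheta in the parameters
print(I_nn[0].weight.grad.shape)         # gradients ready for the optimizer
```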
3.4 Post-processing
Post-processing concerns the modification and interpretation of the computed solution. One motivation is to reduce the numerical error of the computed solution. This can, for example, be achieved with super-resolution techniques relying on specialized CNN architectures from computer vision [484, 485]. Coarse-to-fine mappings can be obtained in a supervised manner using matching coarse and fine simulations as labeled data, as presented for turbulent flows [462, 486] and topology optimization [487,488,489]. The mapping is typically performed from coarse to fine solution fields, but mappings from a posteriori errors have been proposed as well [490]. Further specialized extensions to the cost function have been suggested in the context of de-homogenization [491].
The methods can analogously be applied to temporal data, where the solution is refined at each time step, as, e.g., presented with RNNs as correctors of reduced order models [492]. However, coarse discretizations in dynamical models lead to an error accumulation that increases with the number of time steps. Thus, a simple coarse-to-fine post-processing at each time step is not sufficient. To this end, [445, 446] apply a correction at each time step before the coarse solver predicts the next time step. As the correction is propagated through the solver, the sensitivities of the solver must be computed to perform the backward propagation. Therefore, a differentiable solver (i.e., differentiable physics) has to be employed. This significantly outperforms the purely supervised approach, where the entire coarse trajectory is computed without corrections in between. The number of unrolled steps is a hyperparameter, which increases the accuracy but comes with a higher computational effort. This concept is referred to as solver-in-the-loop.
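The training loop can be sketched as follows, with a simple explicit diffusion step standing in for an arbitrary differentiable coarse solver; the network, step counts, and random reference data are placeholders.

```python
import torch
import torch.nn as nn

# Solver-in-the-loop sketch: a NN correction is applied after each coarse
# step, the loss is evaluated on the unrolled trajectory, and gradients
# flow back through the differentiable solver.
def coarse_step(u, dt=0.01):
    # explicit diffusion step on a periodic 1D grid (differentiable)
    return u + dt * (torch.roll(u, 1, -1) - 2 * u + torch.roll(u, -1, -1))

corr = nn.Sequential(nn.Conv1d(1, 16, 3, padding=1), nn.Tanh(),
                     nn.Conv1d(16, 1, 3, padding=1))

u = torch.rand(1, 1, 64)
u_ref = [torch.rand(1, 1, 64) for _ in range(4)]  # fine-solution placeholders

loss = 0.0
for n in range(4):                # number of unrolled steps (hyperparameter)
    u = coarse_step(u)
    u = u + corr(u)               # learned correction before the next step
    loss = loss + ((u - u_ref[n]) ** 2).mean()
loss.backward()                   # backpropagation through solver and NN
```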
Further variations perform the coarse-to-fine mapping in a patch-based manner, where the interfaces require a special treatment [493]. Another approach uses a NN to map the coarse solution to the closest fine solution stored in a database [494]. The mapping is performed on patches of the domain.
Other post-processing tasks include feature extraction. After a topology optimization, NNs have been used to extract basic shapes to be used in a subsequent shape optimization [495, 496]. Another aspect that can be ensured through post-processing is manufacturability.
Lastly, adaptive mesh refinement falls under the category of post-processing as well. Closely related to the meshing approaches discussed in Sect. 3.1.3, NNs have been proposed as error indicators [361, 497] that are trained in a supervised manner. The error indicators can subsequently be employed to adapt the mesh based on the error.
4 Discretizations as neural networks
NNs are composed of linear transformations and non-linear functions, which are basic building blocks of most PDE discretizations. Thus, the motivation to construct NNs from discretizations of PDEs is twofold. Firstly, deep learning techniques can hereby be exploited within classical discretization frameworks. Secondly, novel NN architectures arise, which are tailored to many physical problems in computational mechanics but potentially also find use cases outside of that field.
4.1 Finite element method
One method is finite element NNs [14, 498] (see [499,500,501,502,503,504] for applications), for which we consider the system of equations from a finite element discretization with the stiffness matrix \(K_{ij}\), degrees of freedom \(u_j\), and the body load \(b_i\):
\(\sum _{j=1}^{N} K_{ij} u_j = b_i\). (64)
Assuming constant material properties along an element and uniform elements, a pre-integration of the local stiffness matrix \(k_{ij}^e=\alpha ^e w_{ij}^e\) can be performed, as, e.g., shown in [505]. The goal is to pull out the material coefficients of the integration, leading to the following assembly of the global stiffness matrix:
Inserting the assembly into the system of equations from Eq. (64) yields
The nested summation has a structure similar to that of a FC-NN, \(a_i^{(l)}=\sigma (z_i^{(l)})\) with \(z_i^{(l)}=\sum _{j=1}^{N^{(l-1)}}W_{ij}^{(l)}a_j^{(l-1)}+b_i^{(l)}\), but without activation \(\sigma \) and bias b (see Fig. 9):
Thus, the stiffness matrix \(K_{ij}\) is the hidden layer. In a forward problem, \(W_{ij}^e\) are non-learnable weights, while \(u_j\) contains a mixture of learnable weights and non-learnable weights coming from the imposed Dirichlet boundary conditions. A loss can be formulated in terms of body load mismatch, as \(\frac{1}{2}\sum _{i=1}^N ({\hat{b}}_i - b_i)^2\). In the inverse setting, \(\alpha ^e\) becomes learnable—instead of \(u_j\), which is then fixed. For partial domain knowledge in the inverse case, \(u_j\) becomes partially learnable.
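The forward problem can be sketched as follows for a 1D bar on a uniform mesh; the load, material values, and optimizer settings are illustrative, and only the free nodal values are learnable.

```python
import torch

# Finite element NN view: K is fixed, the free nodal values u are learnable
# weights, and the loss is the body-load mismatch at the free nodes.
n_el, EA, L = 10, 1.0, 1.0
h = L / n_el
k = EA / h * torch.tensor([[1.0, -1.0], [-1.0, 1.0]])  # pre-integrated element matrix

K = torch.zeros(n_el + 1, n_el + 1)
for e in range(n_el):                                  # assembly (alpha^e = 1 here)
    K[e:e + 2, e:e + 2] += k

b = h * torch.ones(n_el + 1)                           # nodal body load (placeholder)
u = torch.zeros(n_el + 1, requires_grad=True)          # learnable nodal values

opt = torch.optim.Adam([u], lr=0.1)
for _ in range(1000):
    opt.zero_grad()
    u_bc = torch.cat([torch.zeros(1), u[1:]])          # Dirichlet: u(0) = 0 fixed
    loss = 0.5 * (((K @ u_bc)[1:] - b[1:]) ** 2).sum() # body-load mismatch (free DOFs)
    loss.backward()
    opt.step()
```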
A different approach is the hierarchical deep-learning NNs (HiDeNNs) [506] with extensions in [507,508,509,510,511,512]. Here, shape functions are treated as NNs constructed from basic building blocks. Consider, for example, the one-dimensional linear shape functions \(N_1^e(x)=(x_2^e-x)/(x_2^e-x_1^e)\) and \(N_2^e(x)=(x-x_1^e)/(x_2^e-x_1^e)\),
which can be represented as a NN, as shown in Fig. 10, where the weights depend on the nodal positions \(x_1^e, x_2^e\). The interpolated displacement field \(u^e(x)\), which is valid in the element domain \(\Omega ^e\), is obtained by multiplication with the nodal displacements \(u_1^e, u_2^e\), treated as shared NN weights.
They are shared because the nodal displacements \(u_1^e, u_2^e\) are also used for the neighboring elements \(u^{e-1}, u^{e+1}\). Finally, the displacement over the entire domain u is obtained by superposition of all elemental displacement fields \(u^e\), which are first multiplied by a step function defined as 1 inside the corresponding element domain \(\Omega ^e\) and 0 outside.
A forward problem is solved with a minimization of the variational loss function, as presented in Sect. 3.2 with the nodal values \(u^e_i\) as learnable weights. According to [506], this is equivalent to iterative solution procedures employed for large systems of equations in finite elements. The additional advantage is a seamless integration of r-refinement [513,514,515] (also referred to as adaptive mesh refinement), i.e., the shift of nodal positions to optimal positions by making the nodal positions \(x_i^e\) learnable. Special care has to be taken to avoid element inversion, which is handled by an additional loss term. Inverse problems can similarly be solved by using learnable input parameters, as presented for topology optimization [512].
The method has been combined with reduced order modeling techniques [508]. Furthermore, the shape functions have been extended with convolutions [510, 511]. Specifically, a weighting field W(x), i.e., a kernel (e.g., radial basis functions) with a learnable dilation parameterFootnote 23, is introduced to enhance the finite element space \(u^c(x)\) through convolutions, thereby increasing the space’s expressivity and continuity:
This introduces a smoothing effect over the elements and can be implemented efficiently using NNs, thereby obtaining a data structure more favorable for exploiting the full parallelization capabilities of GPUs [511]. The enhanced space has been incorporated in the HiDeNN framework. While an independent confirmation is still missing, the authors promise a speedup of several orders of magnitude compared to traditional finite element solvers [512]Footnote 24.
Lastly, another approach related to finite elements was presented as FEA-Net [516, 517]. Here, the matrix-vector multiplication of the global stiffness matrix \({\varvec{K}}\) and the solution vector \({\varvec{u}}\), including the assembly of the global stiffness matrix, is replaced by a convolution. In other words, the computation of the force vector \({\varvec{f}}\) is used to compute the residual \({\varvec{r}}\).
Assuming a uniform mesh with homogeneous material properties, the mesh is defined by the segment illustrated in Fig. 11. The degree of freedom \(u_j\) only interacts with the stiffness contributions \(K_i^1, K_i^2, K_{i+1}^1, K_{i+1}^2\) of its neighboring elements i and \(i+1\). Therefore, the force component \(f_j\) acting on node j can be expressed by a convolution:
This can analogously be applied to all degrees of freedoms, with the same convolution filter \({\varvec{W}} = [K^1, K^1 + K^2, K^2]\), assuming the same stiffness contributions for each element.
The convolution can then be exploited in iterative schemes which minimize the residual \({\varvec{r}}\) from Eq. (72). This saves the effort of constructing and storing the global stiffness matrix. By constructing the filter \({\varvec{W}}\) as a function of the material properties of the adjacent elements, heterogeneities can be taken into account [517]. If the same iterative solver is employed, FEA-Net is able to outperform classical finite elements for non-linear problems on uniform grids.
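For a uniform 1D mesh of identical linear bar elements, the matrix-free product \({\varvec{K}}{\varvec{u}}\) reduces to a three-tap stencil, as sketched below; the filter values correspond to the bar element (conventions in [516, 517] may differ), and zero padding implies homogeneous Dirichlet conditions outside the grid.

```python
import torch
import torch.nn.functional as F

# FEA-Net-style residual evaluation: K @ u as a convolution, sparing the
# assembly and storage of the global stiffness matrix.
EA, h = 1.0, 0.1
W = (EA / h) * torch.tensor([[[-1.0, 2.0, -1.0]]])  # bar-element stencil filter

u = torch.rand(1, 1, 101)         # current iterate
f = torch.rand(1, 1, 101)         # nodal force vector
Ku = F.conv1d(u, W, padding=1)    # matrix-free K @ u (zero padding at the ends)
r = f - Ku                        # residual driving the iterative solver

u_new = u + 0.02 * r              # e.g., one damped Richardson iteration
```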
4.2 Finite difference method
Similar ideas have been proposed for finite differences [518] and, for example, employed in [313], where convolutional kernels implement finite difference stencils, exploiting the efficient NN libraries with GPU capabilities. Here, the learnable parameters can be the finite difference stencil for inverse problems or the output for forward problems. This has, for example, been presented in the context of full waveform inversion, which is modeled as a RNN [519, 520]: the stencils are written as convolutional filters and repeatedly applied to the current state and the corresponding inputs, i.e., the wave field, the material distribution, and the source. However, performing automatic differentiation throughout the time steps of full waveform inversion to obtain the sensitivities with respect to \(\gamma \) is computationally expensive—both regarding memory and wall clock time. A remedy is to combine automatic differentiation with the adjoint state method as in [313, 468, 472] and discussed in Sect. 3.3.3.
Taking this idea one step further, the discretized wave equation can be regarded as an analog RNN [521], where the weights are the material distribution. Here, a binary material is learned in a trainable region between the source and the probing locations. The input x(t) is encoded as a signal and emitted as the source; the response is measured at the probing locations \(y_i(t)\) as output. By integrating the outputs, a classification of the input can be performed.
4.3 Material discretizations
Deep material networks [522, 523] construct a NN from a material distribution. The output is constructed from basic building blocks, inspired by analytical homogenization techniques. Given two materials defined in terms of their compliance tensors \(c_1\), \(c_2\) and volume fractions \(f_1, f_2\), an analytical effective compliance tensor \({\bar{c}}\) is computed. The effective tensor is subsequently rotated with a rotation tensor R, defined in terms of the three rotation angles \(\alpha , \beta , \gamma \), yielding a rotated effective tensor \({\bar{c}}_r\). Thus, the building block takes as input two compliance tensors \(c_1,c_2\) and outputs a rotated effective compliance tensor \({\bar{c}}_r\), where \(f_1, f_2, \alpha , \beta , \gamma \) are the learnable parameters (see Fig. 13). By connecting these building blocks, a large network can be created. The network is applied to homogenization tasks of representative volume elements (RVEs) [522, 523], where the material of the phases is varied during evaluation.
4.4 Neural differential equations
In a more general setting, neural ordinary differential equations [402] consider the forward Euler discretization of ordinary differential equations. Specifically, RNNs are viewed as Euler discretizations of continuous transformations [524,525,526]. Consider the iterative update rule of the hidden states \(y_{t+1}=y(t+\Delta t)\) of a RNN, \(y_{t+1} = y_t + \Delta t \, f(y_t)\).
Here, f is the evaluation of one recurrent unit in the RNN. In the limit of the time step size, \(\Delta t \rightarrow 0\), the dynamics of the hidden units \(y_t\) can be parametrized by an ordinary differential equation:
\(\frac{dy(t)}{dt} = f(y(t), t)\). (76)
The input to the network is the initial condition y(0), and the output is the solution y(T) at time T. The output of the NN, y(T), is obtained by solving Eq. (76) with a differential equation solver. The sensitivity computation for the weight update is performed with the adjoint state method [453, 527], as backpropagating through each time step of the solver leads to a high memory cost. This also makes it possible to treat the solver as a black box. Similar extensions to PDEs [525] have been proposed by considering recurrent CNNs with residual connections, where the CNNs act as spatial gradients.
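A minimal sketch with a fixed-step forward Euler integrator (in place of the adaptive solver and adjoint-based gradients of [402]) reads:

```python
import torch
import torch.nn as nn

# Neural ODE sketch: the hidden dynamics are given by a NN f(y), and the
# network output is the ODE solution at time T.
f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def odeint_euler(f, y0, t0, t1, steps=100):
    y, dt = y0, (t1 - t0) / steps
    for _ in range(steps):
        y = y + dt * f(y)           # forward Euler update of the hidden state
    return y

y0 = torch.rand(8, 2)               # batch of initial conditions (inputs)
yT = odeint_euler(f, y0, 0.0, 1.0)  # outputs y(T)
loss = (yT ** 2).mean()             # placeholder loss
loss.backward()                     # backprop through all steps (memory-heavy);
                                    # [402] uses the adjoint method instead
```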
Similarly, [528] establish a connection between deep residual RNNs and iterative solvers. Residual connections in NNs allow information to bypass NN layers. Consider the estimation of the next state of a PDE with a classical solver \(u_{t+1}=u(t+\Delta t)=F[u(t)]\). The residual \(r_{t+1}=r(t+\Delta t)\) is determined in terms of the ground truth \(u_{t+1}^{{\mathcal {M}}}\):
An iterative correction scheme is formulated with a NN. The iterations are indicated with the superindex (k).
Note that the residual connection, i.e., that \(u_{t+1}^{(k)}\) is directly used in the prediction of \(u_{t+1}^{(k+1)}\), allows information to pass past the recurrent unit \(f_{NN}\). A related approach can be found in [529], where an autoencoder iteratively acts on a solution until convergence. In the first iteration, a random initial solution is used as input.
5 Generative approaches
Generative approaches (see [33] for an in-depth review in the field of design and [530] for a hands-on textbook) aim to model the underlying probability distribution of a data set to generate new data that resembles the training data. Three main methodologies exist:
- autoencoders,
- generative adversarial networks (GANs),
- diffusion models,
and are described in detail in Appendix B. Currently, there are two prominent areas of application in computational mechanics. One area of focus is microstructure generation (Sect. 5.1.1), which aims to produce a sufficient quantity of realistic training data for surrogate models, as described in Sect. 2.1. The second key application area is generative design (Sect. 5.1.2), which relies on algorithms to efficiently explore the design space within the constraints established by the designer.
5.1 Applications
5.1.1 Data generation
The most straightforward application of variational autoencoders and GANs in computational mechanics is the generation of new data based on existing examples. This has been demonstrated in [531,532,533,534,535] for microstructures, in [93] for velocity models used in full waveform inversion, and in [536] for optimized structures using GANs. Variational autoencoders have also been used to model the crossover operation in evolutionary algorithms to create new designs from parent designs [537]. Applications of diffusion models for microstructure generation can be found in [538,539,540].
Microstructures pose a unique challenge due to their inherent three-dimensional nature, while often only two-dimensional reference images are available. This has led to the development of specialized architectures that are capable of creating three-dimensional structures from representative two-dimensional slices [541,542,543]. The approach typically involves treating three-dimensional voxel data as a sequence of two-dimensional slices of pixels. Sequences of images are predicted from individual slices, ultimately forming a three-dimensional microstructure. In [544], a RNN is applied to a two-dimensional reference image, yielding an additional dimension and consequently creating a three-dimensional structure. The RNN is applied at the latent vector inside an encoder-decoder architecture, such that the inputs and outputs of the RNN have a relatively small size. Similarly, [545, 546] apply a transformer [172] to the latent vector. An alternative formulation using variational autoencoder GANs is presented in [547] to reconstruct three-dimensional voxel models of porous media from two-dimensional images.
The generated data sets can subsequently be leveraged to train surrogate models, as demonstrated in [536, 548,549,550] where CNNs were used to verify the physical properties of designs, and in the study by [551] on the homogenization of microstructures with CNNs. Similarly, [93, 552] generate realistic material distributions, such as velocity distributions, to train an inverse operator for full waveform inversion.
5.1.2 Generative design and design optimization
Within generative design, the generator can also be considered as a reparametrization of the design space that reduces the number of design variables. With autoencoders, the latent vector serves as the design parameter [553, 554], which is then optimizedFootnote 25. Similarly, [556] find that point cloud autoencoders [117, 557, 558] are advantageous as geometric dimensionality reduction tools (potentially combined with performance features) for efficiently exploring the design space. In the context of GANs, the optimization task is aimed at the random input \(\varvec{\xi }\) provided to the generator. This approach is demonstrated in various studies, such as ship hull design parameterized by NURBS surfaces [559], airfoil shapes expressed with Bézier curves [560, 561], structural optimization [562], and full waveform inversion [563]. For optimization, variational autoencoder GANs are particularly important, as the GAN ensures high quality designs, while the autoencoder ensures well-behaving gradients. This was shown for microstructure optimization in [564].
An important requirement for generative design is design diversity. Achieving this involves ensuring that the entire design space is spanned by the generated data. For this, the cost function can be extended, as presented in [565], using determinantal point processes [566] or in [559] with a space-filling term [567].
Other strategies are specifically focused on promoting design diversity. This involves identifying novel designs via a novelty score [568]. The novelty within these designs is segmented and used to modify the GAN using methods outlined in [569]. An alternative approach proposed by [570] quantifies creativity and maximizes it. This is achieved by performing a classification into pre-determined categories by the discriminator. If the classification is unsuccessful, the design must lie outside the categories and is therefore deemed creative. Thus, the generator seeks to minimize the classification accuracy.
However, some applications necessitate a resemblance to prior designs due to factors such as aesthetics [571] or manufacturability [572]. In [571], a pixel-wise \(L^1\)-distance to previous designs is included in the lossFootnote 26. A complete workflow with generative design enforcing resemblance of previous designs and surrogate model training for the quantification of mechanical properties is described in [573]. Another option is the use of style transfer techniques [555], which in [574] is incorporated into a conventional topology optimization scheme [575] as a constraint in the loss. These are tools with the purpose of incorporating vague constraints based on previous designs for topology optimization.
GANs can also be applied to inverse problems, as presented in [576] for full waveform inversion. The generator predicts the material distribution, which is used in a differentiable simulation providing the forward solution in the form of a seismogram. The discriminator attempts to distinguish between the seismogram indirectly coming from the generator and the measured seismograms. The underlying material distribution is determined through gradient descent.
5.1.3 Conditional generation
As stated earlier, GANs can take specific inputs to dictate the output’s nature. The key difference to data-driven surrogate models from Sect. 2.1 is that GANs provide a tool to generate multiple outputs given the same conditional input. They are thus applicable to problems with multiple solutions, such as design optimization or data generation.
Examples of conditional generation are rendered cars from car sketches [577], hierarchical shape generation [578], where the child shape considers its parent shape, and topology optimization with predictions of optimal structures from initial fields, e.g., the strain energy, of the unoptimized structure [579, 580]. Physical properties can also be used as input. The properties are computed by a differentiable solver after generation and are incorporated in the loss. This was, e.g., presented in [581] for airplane shapes, and in [582] for inverse homogenization. For full waveform inversion, [583] trains a conditional GAN with seismograms as input to predict the corresponding velocity distributions. A similar effort is made by [584] with CycleGANs [585] to circumvent the need for paired data. Here, one generator generates a seismogram \({\hat{y}}=G_y(x)\) and another a corresponding velocity distribution \({\hat{x}}=G_x(y)\). The predictions are judged by two separate discriminators. Additionally, a cycle-consistency loss ensures that a prediction from a prediction, i.e., \(G_y({\hat{x}})\) or \(G_x({\hat{y}})\), matches the initial input y or x, respectively. This guarantees that the learned transformations preserve the essential features and structures of the seismograms and velocity distributions when mapped from one domain to the other and back again.
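The cycle-consistency term can be sketched as follows; the linear "generators" and random tensors are mere placeholders for the architectures and data of [584, 585].

```python
import torch
import torch.nn as nn

# Cycle-consistency loss: G_y maps velocity distributions x to seismograms,
# G_x maps seismograms y to velocity distributions; a prediction of a
# prediction must match the original input.
G_y = nn.Linear(64, 64)   # velocity distribution -> seismogram (placeholder)
G_x = nn.Linear(64, 64)   # seismogram -> velocity distribution (placeholder)

x = torch.rand(8, 64)     # unpaired velocity distributions
y = torch.rand(8, 64)     # unpaired seismograms

cycle_loss = ((G_x(G_y(x)) - x).abs().mean() +
              (G_y(G_x(y)) - y).abs().mean())
```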
Lastly, coarse-to-fine mappings, as previously discussed in Sect. 3.4, can also be learned by GANs. This was, for example, demonstrated in topology optimization, where a conditional GAN refines coarse designs obtained from classical optimizations [579, 586] or CNN predictions [102]. For temporal problems, such as fluid flows, the temporal coherence between time steps poses an additional challenge. Temporal coherence can be ensured by a second discriminator, which receives three consecutive frames of either the generator or the real data and decides if they are real or generated. The method is referred to as tempoGAN [587].
5.1.4 Anomaly detection
Finally, a last application of generative models is anomaly detection, see [588] for a review. This is particularly valuable for non-destructive testing, where flawed specimens can be identified in terms of anomalies. The approach relies on generative models that attempt to reconstruct the geometry. At first, the generative model is trained on structures without flaws. During evaluation, the structures to be tested are fed through the NN. In the case of an autoencoder, as in [589], the structure is fed through the encoder and decoder. For a GAN, as discussed, e.g., in [590,591,592], the input of the generator is optimized to fit the output to the tested structure as well as possible. The mismatch in reconstruction then provides a spatially dependent measure of where an anomaly, i.e., a defect, is located.
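A minimal autoencoder-based sketch of this reconstruction-error criterion (with an untrained placeholder network and random test data) is:

```python
import torch
import torch.nn as nn

# Anomaly detection via reconstruction: the autoencoder is trained on
# flaw-free samples only; at test time, large pixel-wise reconstruction
# errors flag potential defects.
auto = nn.Sequential(nn.Linear(64, 8), nn.Tanh(), nn.Linear(8, 64))

sample = torch.rand(1, 64)            # specimen field/image to be tested
recon = auto(sample)                  # reconstruction from the latent space
anomaly_map = (sample - recon).abs()  # spatial measure of anomaly
print(anomaly_map.max())
```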
Another approach is to use the discriminator directly, as presented in [593]. If a flawed specimen is given to the discriminator, it will be categorized as fake, as it was not part of the undamaged structures during training. The discriminator can also be used to check if the domain of application of a surrogate model is valid. Trained on the same training data as the surrogate model, the discriminator estimates the dissimilarity between the data to be tested and the training data. For large discrepancies, the discriminator detects that the surrogate model becomes invalid.Footnote 27
6 Deep reinforcement learning
In reinforcement learning, an agent interacts with an environment through a sequence of actions \(a_t\), which is illustrated in Fig. 14. Upon executing an action \(a_t\), the agent receives an updated state \(s_{t+1}\) and a reward \(r_{t+1}\) from the environment. The agent’s objective is to maximize the cumulative reward \(R_{\Sigma }\). The environment can be treated as a black box, which presents an advantage in computational mechanics when differentiable physics are not feasible (as, for example, in crash simulations [594]). Reinforcement learning has achieved impressive results, such as human-level performance in games like Atari [20], Go [595], and StarCraft II [596]. Further, reinforcement learning has successfully been demonstrated in robotics [597], an example being the learning of complex maneuvers for autonomous helicopter flight [598,599,600].
A comprehensive review of reinforcement learning exceeds the scope of this work, since it represents a major branch of machine learning. An introduction is, e.g., given in [25, 38], and an in-depth textbook is [45]. However, at the intersection of these domains lies deep reinforcement learning, which employs NNs to model the agent’s actions. In Appendix C, we present the main concepts of deep reinforcement learning and delve into two prominent methodologies: deep policy networks (Appendix C.1) and deep Q-learning (Appendix C.2) in view of applications in computational mechanics.
6.1 Applications
Deep reinforcement learning is mainly used for inverse problems (see [25] for a review within fluid mechanics), where the PDE solver is treated as a black box and assumed not to be differentiable.
The most prominent applications are control problems. One example is discovering swimming strategies for fish, with the goal of efficiently minimizing the distance to a leader fish [601, 602]. The environment is given by the Navier-Stokes equations. Another example is balancing rigid bodies with fluid jets while using as little force as possible [603]. Similarly, [604] control jets in order to reduce the drag around a cylinder. Reducing the drag around a cylinder is also achieved by controlling small rotating cylinders in the wake of the flow [605]. A more complex example is controlling unmanned aerial vehicles [606]. The control schemes are learned by interacting with simulations and subsequently applied in experiments.
Further applications in connection with inverse problems are learning filters to perturb flows in order to match target flows [607]. Constitutive laws can also be identified: the individual arithmetic manipulations within a constitutive law are represented as a graph, and an agent constructs the graph in order to best match simulation and measurement [608], which yields an interpretable law.
Topology optimization has also been tackled by reinforcement learning. Specifically, the ability to predict only binary states (material or no material) is desirable—instead of intermediate states, as in solid isotropic material with penalization [609, 610]. This has been shown for binary truss structures, modeled with graphs, with the objective of minimizing the total structural volume under stress constraints: in [611], an agent removes trusses from existing structures, and trusses are added in [612]. Similarly, [613] removes finite elements in solid structures to modify the topology. Instead, [614] pursues design diversity. Here, a NN surrogate model predicts near-optimal structures from reference designs. The agent then learns to generate reference designs as input, such that the corresponding optimal structures are as diverse as possible.
High-dimensional PDEs have also been solved with reinforcement learning [615, 616]. This is achieved by recasting the PDE as a stochastic control problem, which is then solved with reinforcement learning.
Finally, adaptive mesh refinement algorithms have been learned by reinforcement learning [617]. An agent decides whether an element is to be refined based on the current state, i.e., the mesh and solution. The reward is subsequently defined in terms of the error reduction, which is computed with a ground truth solution. The trained agent can thus perform adaptive mesh refinement in previously unseen simulations.
6.1.1 Extensions
Each interaction with the environment requires solving the differential equation, which, due to the many interactions, makes reinforcement learning expensive. The learning can be accelerated through some basic modifications: it can be perfectly parallelized by using multiple environments simultaneously [618] or multiple agents within the same environment [619]. Another idea is to construct a surrogate model of the environment and thereby exploit model-based approaches [620,621,622,623]. The general procedure consists of three steps:
- model learning: learn surrogate of environment,
- behavior learning: learn policy or value function,
- environment interaction: apply learned policy and collect data.
Most approaches construct the surrogate with data-driven modeling (Sect. 2.1), but physics-informed approaches have been proposed as well [620, 622] (Sect. 3.2).
7 Conclusion and outlook
In order to structure the state-of-the-art, an overview of the most prominent deep learning methods employed in computational mechanics was presented. Five main categories were identified: simulation substitution, simulation enhancement, discretizations as NNs, generative approaches, and deep reinforcement learning.
Despite the variety and abundance of the literature, few approaches are competitive in comparison to classical methods. This manifests itself in the lack of comparisons between NN-based and classical methods in the literature. We have found little evidence that NN-based methods truly outperform classical methods in computational mechanics. However, with only a few exceptions, current research is still in its early stages, with a focus on showcasing possibilities without devoting too much attention to accuracy and efficiency. Future research must, nevertheless, shift its focus to incorporate more in-depth investigations into the performance of the developed methods—including thorough and meaningful comparisons to performant classical methods dedicated to the task under investigation. This is in agreement with the recent review article on deep learning in topology optimization [22], where critical and fair assessments are requested. This includes the determination of generalization capabilities, greater transparency by including, e.g., worst-case performances to illustrate reliability, and computation times without disregarding the training time.
In line with this, and to the best of our knowledge, we provide a final overview outlining the potentials and limitations of the discussed methods.
- Simulation substitution has potential for surrogate modeling of parameterized models that need to be evaluated many times. However, this is currently only realizable for small parameter spaces due to the amount of data required, and it is unlikely to replace established methods, as also stated in [42]. Complex problems can still be tackled by NN surrogates if they are first reduced to a low-dimensional space through model order reduction techniques. Physics-informed learning further reduces the amount of required data and improves the generalization capabilities. However, enforcing physics through penalty terms increases the computational effort, while the solutions still do not necessarily satisfy the corresponding physical laws. By contrast, enforcing physical laws by construction guarantees that they are obeyed, which is preferable to adding constraints through penalty terms.
- Simulation enhancement is currently one of the most promising areas of investigation. It is particularly beneficial for tasks where classical methods show difficulties. An excellent example is the formulation of constitutive laws, which are inherently phenomenological and thereby well-suited to be identified from data using tools such as deep learning. In addition, simulation enhancement makes it possible to draw on insights gained from classical methods developed since the inception of computational mechanics. Furthermore, it is currently more realistic to learn smaller components of the simulation chain with NNs rather than the entire model. These components should ideally be expensive and have limited requirements regarding accuracy and reliability. Lastly, it is also easier to assess whether a method enhanced by deep learning outperforms the classical method, as direct and fair comparisons are readily possible.
- An interesting research direction is to employ discretizations as NNs, as this offers the potential to discover NN architectures tailored to computational mechanics tasks, analogous to CNNs for computer vision or RNNs and transformers for natural language processing. In computational mechanics, their main benefit seems to stem from being able to exploit the computational benefits of tools and hardware created for the wider deep learning community—such as NN libraries programmed for GPUs, which enable an efficient, yet effortless massive parallelization. In our assessment, none of the methods encountered in this review were shown to consistently outperform classical approaches using a comparable amount of computational resources.
- Generative approaches have proven highly versatile in computational mechanics applications where the accuracy of a specific instance under investigation is less of a concern. They have been used to generate statistically equivalent data to train other machine learning models, to incorporate vague, data-based constraints within optimization frameworks, and to detect anomalies.
- Deep reinforcement learning has already shown encouraging results, for example in controlling unmanned vehicles in complex physics environments. It is mainly applicable to problems where efficient differentiable physics solvers are unavailable, which is why it is popular for turbulence control problems. In the presence of differentiable solvers, however, gradient-based methods are still the state of the art [443] and thus preferred.
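As a minimal illustration of the penalty versus by-construction distinction in the first item (the sketch announced there), consider the model problem \(u''(x) + 1 = 0\) on \((0, 1)\) with \(u(0) = 0\); the network size, penalty weight, and collocation points below are arbitrary choices made only for illustration:

```python
import torch

# small fully connected network N(x): R -> R
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def pde_residual(u, x):
    # residual of the illustrative bar equation u'' + 1 = 0 (EA = p = 1)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    ddu = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return ddu + 1.0

x = torch.linspace(0.0, 1.0, 64, requires_grad=True).reshape(-1, 1)

# (a) penalty term: u(0) = 0 enters the cost function with a tunable weight,
#     so it is only fulfilled approximately (one condition shown for brevity)
u_soft = net(x)
loss_soft = (pde_residual(u_soft, x).pow(2).mean()
             + 10.0 * net(torch.zeros(1, 1)).pow(2).mean())

# (b) by construction: the ansatz u(x) = x * N(x) vanishes at x = 0 exactly,
#     so no boundary loss term and no penalty weight tuning are needed
u_hard = x * net(x)
loss_hard = pde_residual(u_hard, x).pow(2).mean()
```

Either loss would then be minimized with a standard optimizer such as Adam; only variant (b) guarantees the boundary condition for arbitrary network parameters, which is precisely the advantage discussed above.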
Notes
The considered journals are Computer Methods in Applied Mechanics and Engineering, Computers & Mathematics with Applications, Computers & Structures, Computational Mechanics, Engineering with Computers, Journal of Computational Physics.
Pioneering works exploring neural networks for computational mechanics prior to the current rise of deep learning are compiled in reviews such as [3, 4]; see [5] for a more recent treatment. Contributions across almost all of the discussed categories had already been made before the year 2000. Aligning with the proposed taxonomy from Sect. 1.2, these include data-driven modeling, such as inverse surrogate models [6, 7]; physics-informed approaches for solving differential equations [8,9,10]; and efforts in simulation enhancement, such as constitutive modeling [11, 12] or estimating numerical parameters [13]. Also, efforts to exploit the parallel computation capabilities of neural networks have been made, where more efficient implementations are obtained by constructing networks from discretizations [14,15,16]. The notable exclusions are generative approaches, which arose with variational autoencoders [17, 18] and generative adversarial networks [19] in 2014, as well as deep reinforcement learning, which was popularized in the early 2010s [20].
A further interesting distinction is made between inner-loop enhancements (within a forward simulation) and outer-loop enhancements (using multiple forward simulations, e.g., within an optimization).
In the case of the bar equation (Eq. 25), the PDE coefficients could be the cross-sectional stiffness EA(x) and/or the distributed load p(x).
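For orientation (Eq. 25 itself is not reproduced here; the standard form of the static bar equation is assumed), the equation reads \(-\left( EA(x)\,u'(x)\right) ' = p(x)\) on \(\Omega \), so that identifying the PDE coefficients amounts to identifying \(EA(x)\) and/or \(p(x)\).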
Static problems, i.e., problems without time dependence, can only be treated by the space-time approaches.
Note that a spatial discretization of the PDE in Eq. (3) can also be written as a system of ODEs.
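For example, a linear transient problem semi-discretized in space yields \(\varvec{M}\dot{\varvec{u}}(t) + \varvec{K}\varvec{u}(t) = \varvec{f}(t)\), a system of ODEs in the nodal degrees of freedom \(\varvec{u}(t)\) (method of lines); this generic form with mass matrix \(\varvec{M}\), stiffness matrix \(\varvec{K}\), and load vector \(\varvec{f}\) is our illustration and is not taken from Eq. (3).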
Note that u might only be partially known on the domain \(\Omega \) for inverse problems.
For an in-depth treatment of the inner workings of GNNs, see [63].
Note that the autoencoder is modified, as it does not perform an identity mapping. Nonetheless, the idea of mapping to a reduced latent state is exploited.
Originally proposed in [138] with shallow NNs.
Including architectures specifically designed to solve PDEs.
Typically, a single solution to a PDE is obtained. If the PDE is parametrized, multiple solutions can be obtained.
Importantly, the training would be without training data and would only require a definition of the parametrized PDE. Currently, this is only possible for simple PDEs with small parameter spaces.
Consider, for instance, a training procedure in which the PDE loss \({\mathcal {L}}_{{\mathcal {N}}}\) is minimized first, such that the PDE is fulfilled. Without fulfilment of the boundary conditions, the solution is not unique. However, the NN struggles to modify the current boundary values without violating the PDE loss and thereby increasing the total cost function C. The NN is thus stuck in a bad local minimum. Similar scenarios can be formulated for a too rapid minimization of the other loss terms.
This is enforced through constraints using moment matrices of the convolutional filters.
The encoder and decoder are differentiated with respect to their inputs to estimate the derivatives \(\dot{{\varvec{x}}}, \dot{{\varvec{h}}}\) using the chain rule.
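In symbols, with encoder \(\varvec{h} = \varvec{e}(\varvec{x})\) and decoder \(\varvec{x} \approx \varvec{d}(\varvec{h})\) (notation ours), the chain rule gives \(\dot{\varvec{h}} = \frac{\partial \varvec{e}}{\partial \varvec{x}}\,\dot{\varvec{x}}\) and \(\dot{\varvec{x}} \approx \frac{\partial \varvec{d}}{\partial \varvec{h}}\,\dot{\varvec{h}}\).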
Here, the initial guess is incorporated through a regularization term.
Applications of differentiable physics vary widely and are addressed throughout this work.
Although automatic differentiation is, in principle, highly accurate, oscillations of the approximation between the sampled points may lead to spurious gradients at the sampled points [242].
The dilation parameter depends on x and thus introduces additional degrees of freedom throughout the domain.
After close examination and exchanges with the authors of [506, 508, 510,511,512], we have concluded that the current speed-up is mainly attributable to the simultaneously employed reduced order models. Minor improvements in accuracy are possible through the employed r-adaptivity and convolutions, however, accompanied by an increase in computational effort. Furthermore, we have been informed that the hiDeNN methodology is implemented most efficiently in the standard finite element way, i.e., without NNs, unlike the descriptions in [506, 508, 510,511,512]. Thus, the shape functions \(N_i^e(x)\) are implemented in a straightforward manner within automatic differentiation frameworks in order to obtain the derivatives of the solution u with respect to x, and of the loss function with respect to the nodal positions \(x_i^e\) and degrees of freedom \(u^e\). If convolutions are not employed, the sensitivities can be precomputed analytically, eliminating the need for automatic differentiation. Hence, the difference to a conventional finite element implementation is that the finite element discretization is solved with gradient descent optimization instead of solving the system of equations directly. This is more expensive but allows for the flexibility of seamlessly introducing r-adaptivity or convolutions on top of the ansatz space.
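As a minimal sketch of this observation (a one-dimensional bar with linear elements and our own choice of symbols; not the hiDeNN implementation itself), the following solves a finite element discretization by gradient descent on the potential energy instead of solving the linear system:

```python
import torch

n_el = 10                                      # linear bar elements on (0, 1)
x = torch.linspace(0.0, 1.0, n_el + 1)         # nodal positions x_i
u = torch.zeros(n_el + 1, requires_grad=True)  # nodal dofs; u_0 = 0 held fixed

def potential_energy(u, x, EA=1.0, p=1.0):
    # Pi(u) = 1/2 int EA (u')^2 dx - int p u dx, evaluated element-wise;
    # exact for linear shape functions with constant EA and p
    h = x[1:] - x[:-1]                         # element lengths
    du = (u[1:] - u[:-1]) / h                  # constant element strains
    internal = 0.5 * (EA * du.pow(2) * h).sum()
    external = (p * 0.5 * (u[:-1] + u[1:]) * h).sum()
    return internal - external

opt = torch.optim.Adam([u], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    potential_energy(u, x).backward()
    u.grad[0] = 0.0                            # keep the Dirichlet dof fixed
    opt.step()
# r-adaptivity would additionally treat interior nodal positions x_i as
# trainable parameters, at the cost of additional optimization effort
```

Assembling and solving the linear system directly would be cheaper here; as stated above, the gradient descent route only becomes interesting once r-adaptivity or convolutions are stacked on top of the ansatz space.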
It is worth noting that, to ensure physically meaningful designs, a style transfer technique can be implemented [555]. Here, the training data is perceived as a style, and the difference between Gram matrices, which characterize the distribution of visual patterns or textures in the generated designs, is minimized.
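To fix ideas (our notation; see [555] for the actual formulation), a feature map \(F \in {\mathbb {R}}^{c \times m}\) of a design yields the Gram matrix \(G = FF^{\top }\), and a style loss of the form \(\Vert G_{\text {gen}} - G_{\text {data}}\Vert _F^2\) penalizes deviations of the generated designs' texture statistics from those of the training data.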
Similarly, this loss can be used to filter out designs that are too similar.
Note, however, that the discriminator does not guarantee an accurate assessment of the validity of the surrogate model.
An undirected graph can also be considered by treating it as a bi-directional graph.
References
Abu-Mostafa YS, Magdon-Ismail M, Lin H-T (2012) Learning from data. AML Book
Adie J, Juntao Y, Zhang X, See S (2018) Deep learning for computational science and engineering. In: GPU technology conference. https://on-demand.gputechconf.com/gtc/2018/presentation/S8242-Yang-Juntao-paper.pdf
Yagawa G, Okuda H (1996) Neural networks in computational mechanics. Arch Comput Methods Eng 3(4):435–512. https://doi.org/10.1007/BF02818935
Waszczyszyn Z, Ziemiański L (2001) Neural networks in mechanics of structures and materials—new results and prospects of applications. Comput Struct 79(22):2261–2276. https://doi.org/10.1016/S0045-7949(01)00083-9
Yagawa G, Oishi A (2021) Computational mechanics with neural networks. Lecture notes on numerical methods in engineering and sciences. Springer, Cham
Song SJ, Schmerr LW (1992) Ultrasonic flaw classification in weldments using probabilistic neural networks. J Nondestr Eval 11(2):69–77. https://doi.org/10.1007/BF00568290
Yagawa G, Yoshimura S, Mochizuki Y, Oishi T (1993) Identification of crack shape hidden in solid by means of neural network and computational mechanics. In: Masataka T, Huy Duong B (eds) Inverse problems in engineering mechanics, international union of theoretical and applied mechanics. Springer, Berlin, pp 213–222. https://doi.org/10.1007/978-3-642-52439-4_21
Psichogios DC, Ungar LH (1992) A hybrid neural network-first principles approach to process modeling. AIChE J 38(10):1499–1511. https://doi.org/10.1002/aic.690381003
Dissanayake MWMG, Phan-Thien N (1994) Neural-network-based approximations for solving partial differential equations. Commun Numer Methods Eng 10(3):195–201. https://doi.org/10.1002/cnm.1640100303
Lagaris IE, Likas A, Fotiadis DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans Neural Netw 9(5):987–1000. https://doi.org/10.1109/72.712178
Theocaris PS, Panagiotopoulos PD (1995) Generalised hardening plasticity approximated via anisotropic elasticity: a neural network approach. Comput Methods Appl Mech Eng 125(1):123–139. https://doi.org/10.1016/0045-7825(94)00769-J
Furukawa T, Yagawa G (1998) Implicit constitutive modelling for viscoplasticity using neural networks. Int J Numer Methods Eng 43(2):195–219
Okuda H, Yoshimura S, Yagawa G, Matsuda A (1998) Neural network-based parameter estimation for non-linear finite element analyses. Eng Comput 15(1):103–138. https://doi.org/10.1108/02644409810200721
Takeuchi J, Kosugi Y (1994) Neural network representation of finite element method. Neural Netw 7(2):389–395. https://doi.org/10.1016/0893-6080(94)90031-0
Yagawa G, Okuda H (1996) Finite element solutions with feedback network mechanism through direct minimization of energy functionals. Int J Numer Methods Eng 39(5):867–883
Topping BHV, Khan AI, Bahreininejad A (1997) Parallel training of neural networks for finite element mesh decomposition. Comput Struct 63(4):693–707. https://doi.org/10.1016/S0045-7949(96)00082-X
Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. arXiv:1401.4082 [cs, stat]
Kingma DP, Welling M (2022) Auto-encoding variational bayes. arXiv:1312.6114 [cs, stat]
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236
Zhang D, Maslej N, Brynjolfsson E, Etchemendy J, Lyons T, Manyika J, Ngo H, Niebles JC, Sellitto M, Sakhaee E, Shoham Y, Clark J, Perrault R (2022) The AI index 2022 annual report. arXiv:2205.03468 [cs]
Woldseth RV, Aage N, Andreas Bærentzen J, Sigmund O (2022) On the use of artificial neural networks in topology optimisation. Struct Multidiscip Optim 65(10):294. https://doi.org/10.1007/s00158-022-03347-1
Shin S, Shin D, Kang N (2023) Topology optimization via machine learning and deep learning: a review. J Comput Des Eng 10(4):1736–1766. https://doi.org/10.1093/jcde/qwad072
Adler A, Araya-Polo M, Poggio T (2021) Deep learning for seismic inverse problems: toward the acceleration of geophysical analysis workflows. IEEE Signal Process Mag 38(2):89–119. https://doi.org/10.1109/MSP.2020.3037429
Garnier P, Viquerat J, Rabault J, Larcher A, Kuhnle A, Hachem E (2019) A review on deep reinforcement learning for fluid mechanics. arXiv:1908.04127 [physics]
Duraisamy K, Iaccarino G, Xiao H (2019) Turbulence modeling in the age of data. Annu Rev Fluid Mech 51(1):357–377. https://doi.org/10.1146/annurev-fluid-010518-040547
Brunton S, Noack B, Koumoutsakos P (2020) Machine learning for fluid mechanics. Annu Rev Fluid Mech 52(1):477–508. https://doi.org/10.1146/annurev-fluid-010719-060214. arXiv: 1905.11075
Cai S, Mao Z, Wang Z, Yin M, Karniadakis GE (2021) Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta Mech Sin 37(12):1727–1738. https://doi.org/10.1007/s10409-021-01148-1
Calzolari G, Liu W (2021) Deep learning to replace, improve, or aid CFD analysis in built environment applications: a review. Build Environ 206:108315. https://doi.org/10.1016/j.buildenv.2021.108315
Bock FE, Aydin RC, Cyron CJ, Huber N, Kalidindi SR, Klusemann B (2019) A review of the application of machine learning and data mining approaches in continuum materials mechanics. Front Mater 6:110. https://doi.org/10.3389/fmats.2019.00110
Bishara D, Xie Y, Liu WK, Li S (2023) A state-of-the-art review on machine learning-based multiscale modeling, simulation, homogenization and design of materials. Arch Comput Methods Eng 30(1):191–222. https://doi.org/10.1007/s11831-022-09795-8
Rosenkranz M, Kalina KA, Brummund J, Kästner M (2023) A comparative study on different neural network architectures to model inelasticity. Int J Numer Methods Eng. https://doi.org/10.1002/nme.7319
Regenwetter L, Nobari AH, Ahmed F (2022) Deep generative models in engineering design: a review. J Mech Des 144(7):071704. https://doi.org/10.1115/1.4053859
Moosavi SM, Jablonka KM, Smit B (2020) The role of machine learning in the understanding and design of materials. J Am Chem Soc 142(48):20273–20287. https://doi.org/10.1021/jacs.0c09105
Faller WE, Schreck SJ (1996) Neural networks: applications and opportunities in aeronautics. Prog Aerosp Sci 32(5):433–456. https://doi.org/10.1016/0376-0421(95)00011-9
Thuerey N, Holl P, Mueller M, Schnell P, Trost F, Um K (2022) Physics-based deep learning. arXiv:2109.05237 [physics]
Kollmannsberger S, D’Angella D, Jokeit M, Herrmann L (2021) Deep learning in computational mechanics: an introductory course, vol 977. Studies in computational intelligence. Springer, Cham
Brunton SL, Kutz JN (2022) Data-driven science and engineering: machine learning, dynamical systems, and control. Cambridge University Press, Cambridge
Karpatne A, Kannan R, Kumar V (2022) Knowledge guided machine learning: accelerating discovery using scientific knowledge and data. Chapman and Hall/CRC, New York. https://doi.org/10.1201/9781003143376
Yagawa G, Oishi A (2023) Computational mechanics with deep learning: an introduction. Springer, Cham
Rabczuk T, Bathe K-J (2023) Machine learning in modeling and simulation: methods and applications. Springer
Baker N, Alexander F, Bremer T, Hagberg A, Kevrekidis Y, Najm H, Parashar M, Patra A, Sethian J, Wild S, Willcox K, Lee S (2019) Workshop report on basic research needs for scientific machine learning: core technologies for artificial intelligence. Technical Report 1478744. http://www.osti.gov/servlets/purl/1478744/
von Rueden L, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, Kirsch B, Pfrommer J, Pick A, Ramamurthy R, Walczak M, Garcke J, Bauckhage C, Schuecker J (2023) Informed machine learning—a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Trans Knowl Data Eng 35(1):614–633. https://doi.org/10.1109/TKDE.2021.3079836
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. Adaptive computation and machine learning series. The MIT Press, Cambridge
Alpaydin E (2020) Introduction to machine learning, 4th edn. Adaptive computation and machine learning series. The MIT Press, Cambridge
Russell SJ, Norvig P (2022) Artificial intelligence: a modern approach, 4th edn. Pearson series in artificial intelligence. Pearson, Harlow
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. arXiv:1912.01703 [cs, stat]
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467 [cs]
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366. https://doi.org/10.1016/0893-6080(89)90020-8
Baydin AG, Pearlmutter BA, Radul AA, Siskind JM (2018) Automatic differentiation in machine learning: a survey, p 43
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. arXiv:1412.6980 [cs]
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer series in operations research. Springer, New York
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386–408. https://doi.org/10.1037/h0042519
LeCun Y, Boser B, Denker J, Henderson D, Howard R, Hubbard W, Jackel L (1989) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, vol 2. Morgan-Kaufmann. https://proceedings.neurips.cc/paper/1989/hash/53c3bce66e43be4f209556518c2fcb54-Abstract.html
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551. https://doi.org/10.1162/neco.1989.1.4.541
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536. https://doi.org/10.1038/323533a0
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, pp 1724–1734. https://doi.org/10.3115/v1/D14-1179
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 [cs, stat]
Monti F, Shchur O, Bojchevski A, Litany O, Günnemann S, Bronstein MM (2018) Dual-primal graph convolutional networks. arXiv:1806.00770 [cs, stat]
Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, Tacchetti A, Raposo D, Santoro A, Faulkner R, Gulcehre C, Song F, Ballard A, Gilmer J, Dahl G, Vaswani A, Allen K, Nash C, Langston V, Dyer C, Heess N, Wierstra D, Kohli P, Botvinick M, Vinyals O, Li Y, Pascanu R (2018) Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261 [cs, stat]
Henkes A, Eshraghian JK, Wessels H (2022) Spiking neural networks for nonlinear regression. arXiv:2210.03515 [cs]
Tandale SB, Stoffel M (2023) Spiking recurrent neural networks for neuromorphic computing in nonlinear structural mechanics. Comput Methods Appl Mech Eng 412:116095. https://doi.org/10.1016/j.cma.2023.116095
Gerstner W, Kistler WM (2002) Spiking neuron models: single neurons, populations, plasticity, 1st edn. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511815706
Hughes TJR, Hulbert GM (1988) Space-time finite element methods for elastodynamics: formulations and error estimates. Comput Methods Appl Mech Eng 66(3):339–363. https://doi.org/10.1016/0045-7825(88)90006-0
Alsalman M, Colvert B, Kanso E (2018) Training bioinspired sensors to classify flows. Bioinspir Biomimet 14(1):016009. https://doi.org/10.1088/1748-3190/aaef1d
Colvert B, Alsalman M, Kanso E (2018) Classifying vortex wakes using neural networks. Bioinspir Biomimet 13(2):025003. https://doi.org/10.1088/1748-3190/aaa787
Pierret S, Van Den Braembussche RA (1999) Turbomachinery blade design using a Navier–Stokes solver and artificial neural network. J Turbomach 121(2):326–332. https://doi.org/10.1115/1.2841318
Vurtur Badarinath P, Chierichetti M, Davoudi Kakhki F (2021) A machine learning approach as a surrogate for a finite element analysis: status of research and application to one dimensional systems. Sensors 21(5):1654. https://doi.org/10.3390/s21051654
Lee C, Kim J, Babcock D, Goodman R (1997) Application of neural networks to turbulence control for drag reduction. Phys Fluids 9(6):1740–1747. https://doi.org/10.1063/1.869290
Jambunathan K, Hartle SL, Ashforth-Frost S, Fontama VN (1996) Evaluating convective heat transfer coefficients using neural networks. Int J Heat Mass Transfer 39(11):2329–2332. https://doi.org/10.1016/0017-9310(95)00332-0
Tracey BD, Duraisamy K, Alonso JJ (2015) A machine learning strategy to assist turbulence model development. In: 53rd AIAA aerospace sciences meeting. American Institute of Aeronautics and Astronautics, Kissimmee. https://doi.org/10.2514/6.2015-1287
Ramuhalli P, Udpa L, Udpa SS (2002) Electromagnetic NDE signal inversion by function-approximation neural networks. IEEE Trans Magn 38(6):3633–3642. https://doi.org/10.1109/TMAG.2002.804817
Araya-Polo M, Jennings J, Adler A, Dahlke T (2018) Deep-learning tomography. Lead Edge 37(1):58–66. https://doi.org/10.1190/tle37010058.1
Kim Y, Nakata N (2018) Geophysical inversion versus machine learning in inverse problems. Lead Edge 37(12):894–901. https://doi.org/10.1190/tle37120894.1
Hoang V-N, Nguyen N-L, Tran DQ, Vu Q-V, Nguyen-Xuan H (2022) Data-driven geometry-based topology optimization. Struct Multidiscip Optim 65(2):69. https://doi.org/10.1007/s00158-022-03170-8
Zhang X, Garikipati K (2023) Label-free learning of elliptic partial differential equation solvers with generalizability across boundary value problems. Comput Methods Appl Mech Eng. https://doi.org/10.1016/j.cma.2023.116214
Thuerey N, Weißenow K, Prantl L, Hu X (2020) Deep learning methods for Reynolds-averaged Navier–Stokes simulations of airfoil flows. AIAA J 58(1):25–36. https://doi.org/10.2514/1.J058291
Chen LW, Cakal BA, Hu X, Thuerey N (2021) Numerical investigation of minimum drag profiles in laminar flow using deep learning surrogates. J Fluid Mech 919:A34. https://doi.org/10.1017/jfm.2021.398
Chen X, Zhao X, Gong Z, Zhang J, Zhou W, Chen X, Yao W (2021) A deep neural network surrogate modeling benchmark for temperature field prediction of heat source layout. Sci China Phys Mech Astron 64(11):1. https://doi.org/10.1007/s11433-021-1755-6
Chen LW, Thuerey N (2023) Towards high-accuracy deep learning inference of compressible flows over aerofoils. Comput Fluids 250:105707. https://doi.org/10.1016/j.compfluid.2022.105707
Khadilkar A, Wang J, Rai R (2019) Deep learning-based stress prediction for bottom-up SLA 3D printing process. Int J Adv Manuf Technol 102(5):2555–2569. https://doi.org/10.1007/s00170-019-03363-4
Nie Z, Jiang H, Kara LB (2020) Stress field prediction in cantilevered structures using convolutional neural networks. J Comput Inform Sci Eng 20(1):011002. https://doi.org/10.1115/1.4044097
Guo X, Li W, Iorio F (2016) Convolutional neural networks for steady flow approximation. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 481–490. https://doi.org/10.1145/2939672.2939738
Zhang Z, Jaiswal P, Rai R (2018) FeatureNet: machining feature recognition based on 3D convolution neural network. Comput Aided Des 101:12–22. https://doi.org/10.1016/j.cad.2018.03.006
Williams G, Meisel NA, Simpson TW, McComb C (2019) Design repository effectiveness for 3d convolutional neural networks: application to additive manufacturing. J Mech Des 141(11):111701. https://doi.org/10.1115/1.4044199
Wu Y, Lin Y, Zhou Z (2018) Inversionet: accurate and efficient seismic-waveform inversion with convolutional neural networks. In: SEG technical program expanded abstracts 2018. Society of Exploration Geophysicists, Anaheim, pp 2096–2100. https://doi.org/10.1190/segam2018-2998603.1
Wang W, Yang F, Ma J (2018) Velocity model building with a modified fully convolutional network. In: SEG technical program expanded abstracts 2018. Society of Exploration Geophysicists, Anaheim, pp 2086–2090. https://doi.org/10.1190/segam2018-2997566.1
Yang F, Ma J (2019) Deep-learning inversion: a next-generation seismic velocity model building method. Geophysics 84(4):R583–R599. https://doi.org/10.1190/geo2018-0249.1
Zheng Y, Zhang Q, Yusifov A, Shi Y (2019) Applications of supervised deep learning for seismic interpretation and inversion. Lead Edge 38(7):526–533. https://doi.org/10.1190/tle38070526.1
Araya-Polo M, Farris S, Florez M (2019) Deep learning-driven velocity model building workflow. Lead Edge 38(11):872–872. https://doi.org/10.1190/tle38110872a1.1
Das V, Pollack A, Wollner U, Mukerji T (2019) Convolutional neural network for seismic impedance inversion. Geophysics 84(6):R869–R880. https://doi.org/10.1190/geo2018-0838.1
Wang W, Ma J (2020) Velocity model building in a crosswell acquisition geometry with image-trained artificial neural networks. Geophysics 85(2):U31–U46. https://doi.org/10.1190/geo2018-0591.1
Li S, Liu B, Ren Y, Chen Y, Yang S, Wang Y, Jiang P (2020) Deep-learning inversion of seismic data. IEEE Trans Geosci Remote Sens 58(3):2135–2149. https://doi.org/10.1109/TGRS.2019.2953473
Bangyu W, Meng D, Wang L, Liu N, Wang Y (2020) Seismic impedance inversion using fully convolutional residual network and transfer learning. IEEE Geosci Remote Sens Lett 17(12):2140–2144. https://doi.org/10.1109/LGRS.2019.2963106
Park MJ, Sacchi MD (2020) Automatic velocity analysis using convolutional neural network and transfer learning. Geophysics 85(1):V33–V43. https://doi.org/10.1190/geo2018-0870.1
Ye J, Toyama N (2022) Automatic defect detection for ultrasonic wave propagation imaging method using spatio-temporal convolution neural networks. Struct Health Monit 21(6):2750–2767. https://doi.org/10.1177/14759217211073503
Rao J, Yang F, Mo H, Kollmannsberger S, Rank E (2023) Quantitative reconstruction of defects in multi-layered bonded composites using fully convolutional network-based ultrasonic inversion. J Sound Vib 542:117418. https://doi.org/10.1016/j.jsv.2022.117418
Lin Q, Hong J, Liu Z, Li B, Wang J (2018) Investigation into the topology optimization for conductive heat transfer based on deep learning approach. Int Commun Heat Mass Transfer 97:103–109. https://doi.org/10.1016/j.icheatmasstransfer.2018.07.001
Yu Y, Hur T, Jung J, Jang IG (2019) Deep learning for determining a near-optimal topological design without any iteration. Struct Multidiscip Optim 59(3):787–799. https://doi.org/10.1007/s00158-018-2101-5
Abueidda DW, Koric S, Sobh NA (2020) Topology optimization of 2D structures with nonlinearities using deep learning. Comput Struct 237:106283. https://doi.org/10.1016/j.compstruc.2020.106283
Nakamura K, Suzuki Y (2020) Deep learning-based topological optimization for representing a user-specified design area. arXiv:2004.05461
Zhang Y, Peng B, Zhou X, Xiang C, Wang D (2020) A deep convolutional neural network for topology optimization with strong generalization ability. arXiv:1901.07761 [cs, stat]
Zheng S, He Z, Liu H (2021) Generating three-dimensional structural topologies via a U-Net convolutional neural network. Thin-Walled Struct 159:107263. https://doi.org/10.1016/j.tws.2020.107263
Shuai Z, Haojie F, Ziyu Z, Zhiqiang T, Kang J (2021) Accurate and real-time structural topology prediction driven by deep learning under moving morphable component-based framework. Appl Math Modell 97:522–535. https://doi.org/10.1016/j.apm.2021.04.009
Wang D, Xiang C, Pan Y, Chen A, Zhou X, Zhang Y (2022) A deep convolutional neural network for topology optimization with perceptible generalization ability. Eng Optim 54(6):973–988. https://doi.org/10.1080/0305215X.2021.1902998
Jun Y, Zhang Qi X, Qi FZ, Haijiang L, Wei S, Guangyuan W (2022) Deep learning driven real time topology optimisation based on initial stress learning. Adv Eng Inform 51:101472. https://doi.org/10.1016/j.aei.2021.101472
Seo J, Kapania RK (2023) Topology optimization with advanced CNN using mapped physics-based data. Struct Multidiscip Optim 66(1):21. https://doi.org/10.1007/s00158-022-03461-0
Sosnovik I, Oseledets I (2019) Neural networks for topology optimization. Russian J Numer Anal Math Modell 34(4):215–223. https://doi.org/10.1515/rnam-2019-0018
Joo Y, Yu Y, Jang IG (2021) Unit module-based convergence acceleration for topology optimization using the spatiotemporal deep neural network. IEEE Access 9:149766–149779. https://doi.org/10.1109/ACCESS.2021.3125014
Kallioras NA, Kazakis G, Lagaros ND (2020) Accelerated topology optimization by means of deep learning. Struct Multidiscip Optim 62(3):1185–1212. https://doi.org/10.1007/s00158-020-02545-z
Sanchez-Gonzalez A, Godwin J, Pfaff T, Ying R, Leskovec J, Battaglia PW (2020) Learning to simulate complex physics with graph networks. arXiv:2002.09405
Pfaff T, Fortunato M, Sanchez-Gonzalez A, Battaglia PW (2021) Learning mesh-based simulation with graph networks. arXiv:2010.03409
Perera R, Guzzetti D, Agrawal V (2022) Graph neural networks for simulating crack coalescence and propagation in brittle materials. Comput Methods Appl Mech Eng 395:115021. https://doi.org/10.1016/j.cma.2022.115021
Qi CR, Su H, Mo K, Guibas LJ (2017) PointNet: deep learning on point sets for 3D classification and segmentation. arXiv:1612.00593
Groueix T, Fisher M, Kim VG, Russell BC, Aubry M (2018) AtlasNet: a Papier-Mâché approach to learning 3D surface generation. arXiv:1802.05384 [cs]
Cunningham JD, Simpson TW, Tucker CS (2019) An investigation of surrogate models for efficient performance-based decoding of 3D point clouds. J Mech Des 141(12):121401. https://doi.org/10.1115/1.4044597
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer series in statistics. Springer, New York
Heimann T, Meinzer HP (2009) Statistical shape models for 3D medical image segmentation: a review. Med Image Anal 13(4):543–563. https://doi.org/10.1016/j.media.2009.05.004
Bhattacharya K, Hosseini B, Kovachki NB, Stuart AM (2021) Model reduction and neural networks for parametric PDEs. SMAI J Comput Math 7:121–157. https://doi.org/10.5802/smai-jcm.74
Berkooz G, Holmes P, Lumley JL (1993) The proper orthogonal decomposition in the analysis of turbulent flows. Annu Rev Fluid Mech 25(1):539–575. https://doi.org/10.1146/annurev.fl.25.010193.002543
Muñoz D, Allix O, Chinesta F, Ródenas JJ, Nadal E (2023) Manifold learning for coherent design interpolation based on geometrical and topological descriptors. Comput Methods Appl Mech Eng 405:115859. https://doi.org/10.1016/j.cma.2022.115859
Liang L, Liu M, Martin C, Sun W (2018) A deep learning approach to estimate stress distribution: a fast and accurate surrogate of finite-element analysis. J R Soc Interface 15(138):20170844. https://doi.org/10.1098/rsif.2017.0844
Madani A, Bakhaty A, Kim J, Mubarak Y, Mofrad MRK (2019) Bridging finite element and machine learning modeling: stress prediction of arterial walls in atherosclerosis. J Biomech Eng 141(8):084502. https://doi.org/10.1115/1.4043290
Muravleva E, Oseledets I, Koroteev D (2018) Application of machine learning to viscoplastic flow modeling. Phys Fluids 30(10):103102. https://doi.org/10.1063/1.5058127
Liang L, Liu M, Martin C, Sun W (2018) A machine learning approach as a surrogate of finite element analysis-based inverse method to estimate the zero-pressure geometry of human thoracic aorta. Int J Numer Methods Biomed Eng 34(8):e3103. https://doi.org/10.1002/cnm.3103
Derouiche K, Garois S, Champaney V, Daoud M, Traidi K, Chinesta F (2021) Data-driven modeling for multiphysics parametrized problems-application to induction hardening process. Metals 11(5):738. https://doi.org/10.3390/met11050738
Hernández Q, Badías A, Chinesta F, Cueto E (2023) Thermodynamics-informed neural networks for physically realistic mixed reality. Comput Methods Appl Mech Eng 407:115912. https://doi.org/10.1016/j.cma.2023.115912
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507. https://doi.org/10.1126/science.1127647
Milano M, Koumoutsakos P (2002) Neural network modeling for near wall turbulent flow. J Comput Phys 182(1):1–26. https://doi.org/10.1006/jcph.2002.7146
Nair S, Walsh TF, Pickrell G, Semperlotti F (2023) GRIDS-Net: inverse shape design and identification of scatterers via geometric regularization and physics-embedded deep learning. Comput Methods Appl Mech Eng 414:116167. https://doi.org/10.1016/j.cma.2023.116167
Fernández-Navamuel A, Zamora-Sánchez D, Omella ÁJ, Pardo D, García-Sánchez D, Magalhães F (2022) Supervised deep learning with finite element simulations for damage identification in bridges. Eng Struct 257:114016. https://doi.org/10.1016/j.engstruct.2022.114016
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In Nassir N, Joachim H, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Lecture notes in computer science. Springer, Cham, pp 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) UNet++: a nested U-Net architecture for medical image segmentation. arXiv:1807.10165 [cs, eess, stat]
Lu L, Meng X, Cai S, Mao Z, Goswami S, Zhang Z, Karniadakis GE (2022) A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data. Comput Methods Appl Mech Eng 393:114778. https://doi.org/10.1016/j.cma.2022.114778
Chen T, Chen H (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans Neural Netw 6(4):911–917. https://doi.org/10.1109/72.392253
Lu L, Jin P, Pang G, Zhang Z, Karniadakis GE (2021) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat Mach Intell 3(3):218–229. https://doi.org/10.1038/s42256-021-00302-5
Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, Anandkumar A (2021) Fourier neural operator for parametric partial differential equations. arXiv:2010.08895
Lin C, Maxey M, Li Z, Karniadakis GE (2021) A seamless multiscale operator neural network for inferring bubble dynamics. J Fluid Mech 929:A18. https://doi.org/10.1017/jfm.2021.866
Mao Z, Lu L, Marxen O, Zaki TA, Karniadakis GE (2021) DeepM&Mnet for hypersonics: predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. J Comput Phys 447:110698. https://doi.org/10.1016/j.jcp.2021.110698
Di Leoni PC, Lu L, Meneveau C, Karniadakis G, Zaki TA (2021) DeepONet prediction of linear instability waves in high-speed boundary layers. arXiv:2105.08697 [physics]
Cai S, Wang Z, Lu L, Zaki TA, Karniadakis GE (2021) DeepM&Mnet: inferring the electroconvection multiphysics fields based on operator approximation by neural networks. J Comput Phys 436:110296. https://doi.org/10.1016/j.jcp.2021.110296
Lin C, Li Z, Lu L, Cai S, Maxey M, Karniadakis GE (2021) Operator learning for predicting multiscale bubble growth dynamics. J Chem Phys 154(10):104118. https://doi.org/10.1063/5.0041203
Yin M, Ban E, Rego BV, Zhang E, Cavinato C, Humphrey JD, Karniadakis GE (2022) Simulating progressive intramural damage leading to aortic dissection using DeepONet: an operator-regression neural network. J R Soc Interface 19(187):20210670. https://doi.org/10.1098/rsif.2021.0670
Osorio JD, Wang Z, Karniadakis G, Cai S, Chryssostomidis C, Panwar M, Hovsapian R (2022) Forecasting solar-thermal systems performance under transient operation using a data-driven machine learning approach based on the deep operator network architecture. Energy Convers Manag 252:115063. https://doi.org/10.1016/j.enconman.2021.115063
Goswami S, Li DS, Rego BV, Latorre M, Humphrey JD, Karniadakis GE (2022) Neural operator learning of heterogeneous mechanobiological insults contributing to aortic aneurysms. J R Soc Interface 19(193):20220410. https://doi.org/10.1098/rsif.2022.0410
Koric S, Viswanath A, Abueidda DW, Sobh NA, Khan K (2023) Deep learning operator network for plastic deformation with variable loads and material properties. Eng Comput. https://doi.org/10.1007/s00366-023-01822-x
Di Leoni PC, Lu L, Meneveau C, Karniadakis GE, Zaki TA (2023) Neural operator prediction of linear instability waves in high-speed boundary layers. J Comput Phys 474:111793. https://doi.org/10.1016/j.jcp.2022.111793
Koric S, Abueidda DW (2023) Data-driven and physics-informed deep learning operators for solution of heat conduction equation with parametric heat source. Int J Heat Mass Transfer 203:123809. https://doi.org/10.1016/j.ijheatmasstransfer.2022.123809
Liu C, He Q, Zhao A, Tao W, Song Z, Liu B, Feng C (2023) Operator learning for predicting mechanical response of hierarchical composites with applications of inverse design. Int J Appl Mech 15(04):2350028. https://doi.org/10.1142/S175882512350028X
Ahmed SE, Stinis P (2023) A multifidelity deep operator network approach to closure for multiscale systems. Comput Methods Appl Mech Eng 414:116161. https://doi.org/10.1016/j.cma.2023.116161
Wang S, Wang H, Perdikaris P (2021) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Sci Adv 7(40):8605. https://doi.org/10.1126/sciadv.abi8605
Goswami S, Yin M, Yu Y, Karniadakis GE (2022) A physics-informed variational DeepONet for predicting crack path in quasi-brittle materials. Comput Methods Appl Mech Eng 391:114587. https://doi.org/10.1016/j.cma.2022.114587
Goswami S, Bora A, Yu Y, Karniadakis GE (2022) Physics-informed deep neural operator networks. arXiv:2207.05748 [cs, math]
Kovachki N, Lanthaler S, Mishra S (2021) On universal approximation and error bounds for Fourier neural operators. J Mach Learn Res 22(1):290:13237-290:13312
Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, Anandkumar A (2020) Neural operator: graph kernel network for partial differential equations. arXiv:2003.03485 [cs, math, stat]
Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, Anandkumar A (2020) Multipole graph neural operator for parametric partial differential equations. In: Proceedings of the 34th international conference on neural information processing systems, NIPS’20. Curran Associates Inc., Red Hook, pp 6755–6766
Cao Q, Goswami S, Karniadakis GE (2023) LNO: laplace neural operator for solving differential equations. arXiv:2303.10528 [cs]
Zhu C, Ye H, Zhan B (2021) Fast solver of 2D Maxwell’s equations based on Fourier neural operator. In: 2021 Photonics and electromagnetics research symposium (PIERS). IEEE, Hangzhou, pp 1635–1643. https://doi.org/10.1109/PIERS53385.2021.9695119
Song C, Wang Y (2022) High-frequency wavefield extrapolation using the Fourier neural operator. J Geophys Eng 19(2):269–282. https://doi.org/10.1093/jge/gxac016
Wei W, Fu LY (2022) Small-data-driven fast seismic simulations for complex media using physics-informed Fourier neural operators. Geophysics 87(6):T435–T446. https://doi.org/10.1190/geo2021-0573.1
Rashid MM, Pittie T, Chakraborty S, Krishnan NMA (2022) Learning the stress-strain fields in digital composites using Fourier neural operator. iScience 25(11):105452. https://doi.org/10.1016/j.isci.2022.105452
Kai Z, Yuande Z, Hanjun Z, Ma Xiaopeng G, Jianwei WJ, Yongfei Y, Chuanjin Y, Jun Y (2022) Fourier neural operator for solving subsurface oil/water two-phase flow partial differential equation. SPE J 27(03):1815–1830. https://doi.org/10.2118/209223-PA
Yan B, Chen B, Harp DR, Jia W, Pawar RJ (2022) A robust deep learning workflow to predict multiphase flow behavior during geological CO2 sequestration injection and post-injection periods. J Hydrol 607:127542. https://doi.org/10.1016/j.jhydrol.2022.127542
Wen G, Li Z, Azizzadenesheli K, Anandkumar A, Benson SM (2022) U-FNO—an enhanced Fourier neural operator-based deep-learning model for multiphase flow. Adv Water Resour 163:104180. https://doi.org/10.1016/j.advwatres.2022.104180
Peng W, Yuan Z, Wang J (2022) Attention-enhanced neural network models for turbulence simulation. Phys Fluids 34(2):025111. https://doi.org/10.1063/5.0079302
You H, Zhang Q, Ross Colton J, Lee CH, Yu Y (2022) Learning deep implicit Fourier neural operators (IFNOs) with applications to heterogeneous material modeling. Comput Methods Appl Mech Eng 398:115296. https://doi.org/10.1016/j.cma.2022.115296
Tie K, Jianqiao L, Zhilin Y, Hongbin J, Yubo L, Zhengkai L, Huanquan P (2023) Fast and robust prediction of multiphase flow in complex fractured reservoir using a Fourier neural operator. Energies 16(9):3765. https://doi.org/10.3390/en16093765
Alexandre CRP, Joseph JS, Victor OS, Aliabadi Amir A, Jesse VGT, Bahram G (2023) Deep neural network modeling for CFD simulations: benchmarking the Fourier neural operator on the lid-driven cavity case. Appl Sci 13(5):3165. https://doi.org/10.3390/app13053165
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Cao S (2021) Choose a transformer: Fourier or Galerkin. arXiv:2105.14995 [cs, math]
Li Z, Zheng H, Kovachki N, Jin D, Chen H, Liu B, Azizzadenesheli K, Anandkumar A (2023) Physics-informed neural operator for learning partial differential equations. arXiv:2111.03794 [cs, math]
Marcati C, Opschoor JAA, Petersen PC, Schwab C (2023) Exponential ReLU neural network approximation rates for point and edge singularities. Found Comput Math 23(3):1043–1127. https://doi.org/10.1007/s10208-022-09565-9
Gonon L, Schwab C (2023) Deep ReLU neural networks overcome the curse of dimensionality for partial integrodifferential equations. Anal Appl 21(01):1–47. https://doi.org/10.1142/S0219530522500129
Marcati C, Schwab C (2023) Exponential convergence of deep operator networks for elliptic partial differential equations. SIAM J Numer Anal 61(3):1513–1545. https://doi.org/10.1137/21M1465718
Álvarez-Aramberri J, Vicent D, Caro F, Pardo D (2023) Generation of massive databases for deep learning inversion: a goal-oriented hp-adaptive strategy. In: International conference on adaptive modeling and simulation (ADMOS 2023), applications of goal-oriented error estimation and adaptivity. https://doi.org/10.23967/admos.2023.027
Bolager EL, Burak I, Datar I, Sun Q, Dietrich F (2023) Sampling weights of deep neural networks. arXiv:2306.16830 [cs, math]
Ballakur AA, Arya A (2020) Empirical evaluation of gated recurrent neural network architectures in aviation delay prediction. In: 2020 5th International conference on computing, communication and security (ICCCS), pp 1–7. https://doi.org/10.1109/ICCCS49678.2020.9276855
Chen Q, Kong L, Dugast F, To A (2023) Using the transformer model for physical simulation: an application on transient thermal analysis for 3D printing process simulation. https://openreview.net/forum?id=tuXhnv6pgo
Geneva N, Zabaras N (2020) Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. J Comput Phys 403:109056. https://doi.org/10.1016/j.jcp.2019.109056
Chang MB, Ullman T, Torralba A, Tenenbaum JB (2017) A compositional object-based approach to learning physical dynamics. arXiv:1612.00341
Mrowca D, Zhuang C, Wang E, Haber N, Fei-Fei L, Tenenbaum JB, Yamins DL (2018) Flexible neural representation for physics prediction. arXiv:1806.08047
Sanchez-Gonzalez A, Heess N, Springenberg JT, Merel J, Riedmiller M, Hadsell R, Battaglia P (2018) Graph networks as learnable physics engines for inference and control. arXiv:1806.01242
Li Y, Wu J, Zhu JY, Tenenbaum JB, Torralba A, Tedrake R (2019) Propagation networks for model-based control under partial observation. arXiv:1809.11169
Lino M, Cantwell C, Bharath AA, Fotiadis S (2021) Simulating continuum mechanics with multi-scale graph neural networks. arXiv:2106.04900
Alfarraj M, AlRegib G (2018) Petrophysical-property estimation from seismic data using recurrent neural networks. In: SEG technical program expanded abstracts 2018. Society of Exploration Geophysicists, Anaheim, pp 2141–2146. https://doi.org/10.1190/segam2018-2995752.1
Adler A, Araya-Polo M, Poggio T (2019) Deep recurrent architectures for seismic tomography. In: 81st EAGE conference and exhibition 2019, pp 1–5. https://doi.org/10.3997/2214-4609.201901512
Fabien-Ouellet G, Sarkar R (2020) Seismic velocity estimation: a deep recurrent neural-network approach. Geophysics 85(1):U21–U29. https://doi.org/10.1190/geo2018-0786.1
Vlachas PR, Byeon W, Wan ZY, Sapsis TP, Koumoutsakos P (2018) Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc Roy Soc A Math Phys Eng Sci 474(2213):20170844. https://doi.org/10.1098/rspa.2017.0844
Hou W, Darakananda D, Eldredge J (2019) Machine learning based detection of flow disturbances using surface pressure measurements. In: AIAA Scitech 2019 forum. American Institute of Aeronautics and Astronautics, San Diego. https://doi.org/10.2514/6.2019-1148
Heindel L, Hantschke P, Kästner M (2021) A virtual sensing approach for approximating nonlinear dynamical systems using LSTM networks. PAMM 21(1):e202100119. https://doi.org/10.1002/pamm.202100119
Heindel L, Hantschke P, Kästner M (2022) A data-driven approach for approximating non-linear dynamic systems using LSTM networks. Proc Struct Integr 38:159–167. https://doi.org/10.1016/j.prostr.2022.03.017
Freitag S, Cao BT, Ninić J, Meschke G (2018) Recurrent neural networks and proper orthogonal decomposition with interval data for real-time predictions of mechanised tunnelling processes. Comput Struct 207:258–273. https://doi.org/10.1016/j.compstruc.2017.03.020
Cao BT, Obel M, Freitag S, Mark P, Meschke G (2020) Artificial neural network surrogate modelling for real-time predictions and control of building damage during mechanised tunnelling. Adv Eng Softw 149:102869. https://doi.org/10.1016/j.advengsoft.2020.102869
Cao BT, Obel M, Freitag S, Heußner L, Meschke G, Mark P (2022) Real-time risk assessment of tunneling-induced building damage considering polymorphic uncertainty. ASCE-ASME J Risk Uncertain Eng Syst Part A Civ Eng 8(1):04021069. https://doi.org/10.1061/AJRUA6.0001192
Gruber A, Gunzburger M, Ju L, Wang Z (2022) A comparison of neural network architectures for data-driven reduced-order modeling. Comput Methods Appl Mech Eng 393:114764. https://doi.org/10.1016/j.cma.2022.114764
Gonzalez FJ, Balajewicz M (2018) Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. arXiv:1808.01346 [physics]
Holden D, Duong BC, Datta S, Nowrouzezahrai D (2019) Subspace neural physics: fast data-driven interactive simulation. In: Proceedings of the 18th annual ACM SIGGRAPH/Eurographics symposium on computer animation, SCA ’19. Association for Computing Machinery, New York, pp 1–12. https://doi.org/10.1145/3309486.3340245
Fresca S, Manzoni A, Dedè L, Quarteroni A (2020) Deep learning-based reduced order models in cardiac electrophysiology. PLoS ONE 15(10):e0239416. https://doi.org/10.1371/journal.pone.0239416
Fresca S, Dede’ L, Manzoni A (2021) A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. J Sci Comput 87(2):61. https://doi.org/10.1007/s10915-021-01462-7
Fresca S, Manzoni A (2022) POD-DL-ROM: enhancing deep learning-based reduced order models for nonlinear parametrized PDEs by proper orthogonal decomposition. Comput Methods Appl Mech Eng 388:114181. https://doi.org/10.1016/j.cma.2021.114181
Ren P, Rao C, Liu Y, Wang JX, Sun H (2022) PhyCRNet: physics-informed convolutional-recurrent network for solving spatiotemporal PDEs. Comput Methods Appl Mech Eng 389:114399. https://doi.org/10.1016/j.cma.2021.114399
Hu C, Martin S, Dingreville R (2022) Accelerating phase-field predictions via recurrent neural networks learning the microstructure evolution in latent space. Comput Methods Appl Mech Eng 397:115128. https://doi.org/10.1016/j.cma.2022.115128
Lee K, Carlberg KT (2020) Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. J Comput Phys 404:108973. https://doi.org/10.1016/j.jcp.2019.108973
Shen S, Yin Y, Shao T, Wang H, Jiang C, Lan L, Zhou K (2021) High-order differentiable autoencoder for nonlinear model reduction. arXiv:2102.11026 [cs]
Schmid PJ (2010) Dynamic mode decomposition of numerical and experimental data. J Fluid Mech 656:5–28. https://doi.org/10.1017/S0022112010001217
Tu JH, Rowley CW, Luchtenburg DM, Brunton SL, Kutz JN (2013) On dynamic mode decomposition: theory and applications. arXiv:1312.0041 [physics]
Koopman BO (1931) Hamiltonian systems and transformation in Hilbert space. Proc Natl Acad Sci 17(5):315–318. https://doi.org/10.1073/pnas.17.5.315
Williams MO, Kevrekidis IG, Rowley CW (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. J Nonlinear Sci 25(6):1307–1346. https://doi.org/10.1007/s00332-015-9258-5
Li Q, Dietrich F, Bollt EM, Kevrekidis IG (2017) Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. Chaos Interdiscip J Nonlinear Sci 27(10):103111. https://doi.org/10.1063/1.4993854
Yeung E, Kundu S, Hodas N (2019) Learning deep neural network representations for koopman operators of nonlinear dynamical systems. In: 2019 American Control Conference (ACC), pp 4832–4839. https://doi.org/10.23919/ACC.2019.8815339
Takeishi N, Kawahara Y, Yairi T (2017) Learning Koopman invariant subspaces for dynamic mode decomposition. In: Proceedings of the 31st international conference on neural information processing systems, NIPS’17. Curran Associates Inc, Red Hook, pp 1130–1140
Morton J, Witherden FD, Jameson A, Kochenderfer MJ (2018) Deep dynamical modeling and control of unsteady fluid flows. In: Proceedings of the 32nd international conference on neural information processing systems, NIPS’18. Curran Associates Inc, Red Hook, pp 9278–9288
Lusch B, Kutz JN, Brunton SL (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nat Commun 9(1):4950. https://doi.org/10.1038/s41467-018-07210-0
Otto SE, Rowley CW (2019) Linearly recurrent autoencoder networks for learning dynamics. SIAM J Appl Dyn Syst 18(1):558–593. https://doi.org/10.1137/18M1177846
Cohn D, Ghahramani Z, Jordan M (1994) Active learning with statistical models. In: Advances in neural information processing systems, vol 7. MIT Press, Cambridge
Liu X, Athanasiou CE, Padture NP, Sheldon BW, Gao H (2021) Knowledge extraction and transfer in data-driven fracture mechanics. Proc Natl Acad Sci 118(23):e2104765118. https://doi.org/10.1073/pnas.2104765118
Haasdonk B, Kleikamp H, Ohlberger M, Schindler F, Wenzel T (2023) A new certified hierarchical and adaptive RB-ML-ROM surrogate model for parametrized PDEs. SIAM J Sci Comput 45(3):A1039–A1065. https://doi.org/10.1137/22M1493318
Kalina KA, Linden L, Brummund J, Kästner M (2023) FE ANN: an efficient data-driven multiscale approach based on physics-constrained neural networks and automated data mining. Comput Mech 71(5):827–851. https://doi.org/10.1007/s00466-022-02260-0
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359. https://doi.org/10.1109/TKDE.2009.191
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Proceedings of the 27th international conference on neural information processing systems, vol 2, NIPS’14. MIT Press, Cambridge, pp 3320–3328
Kollmannsberger S, Singh D, Herrmann L (2023) Transfer learning enhanced full waveform inversion. arXiv:2302.11259 [physics]
Liu Z, Chen Y, Du Y, Tegmark M (2021) Physics-augmented learning: a new paradigm beyond physics-informed learning. arXiv:2109.13901 [physics]
Zhu Y, Zabaras N, Koutsourelakis PS, Perdikaris P (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J Comput Phys 394:56–81. https://doi.org/10.1016/j.jcp.2019.05.024. arXiv: 1901.06314
Eichelsdörfer J, Kaltenbach S, Koutsourelakis PS (2021) Physics-enhanced neural networks in the small data regime. arXiv:2111.10329 [physics, stat] version: 1
Raissi M (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. arXiv:1801.06637 [cs, math, stat]
Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021) Physics-informed machine learning. Nat Rev Phys 3(6):422–440. https://doi.org/10.1038/s42254-021-00314-5
Cuomo S, Cola VSD, Giampaolo F, Rozza G, Raissi M, Piccialli F (2022) Scientific machine learning through physics-informed neural networks: where we are and what’s next. J Sci Comput 92(3):88. https://doi.org/10.1007/s10915-022-01939-z
Hao Z, Liu S, Zhang Y, Ying C, Feng Y, Su H, Zhu J (2022) Physics-informed machine learning: a survey on problems, methods and applications. arXiv:2211.08064
Haghighat E, Juanes R (2021) SciANN: a Keras/TensorFlow wrapper for scientific computations and physics-informed deep learning using artificial neural networks. Comput Methods Appl Mech Eng 373:113552. https://doi.org/10.1016/j.cma.2020.113552
Hennigh O, Narasimhan S, Nabian MA, Subramaniam A, Tangsali K, Fang Z, Rietmann M, Byeon W, Choudhry S (2021) NVIDIA SimNet: an AI-accelerated multi-physics simulation framework. In: Paszynski M, Kranzlmüller D, Krzhizhanovskaya VV, Dongarra JJ, Sloot PMA (eds) Computational science—ICCS 2021. Lecture notes in computer science. Springer, Cham, pp 447–461. https://doi.org/10.1007/978-3-030-77977-1_36
Lu L, Meng X, Mao Z, Karniadakis GE (2021) DeepXDE: a deep learning library for solving differential equations. SIAM Rev 63(1):208–228. https://doi.org/10.1137/19M1274067
Cai Z, Chen J, Liu M, Liu X (2020) Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs. J Comput Phys 420:109707. https://doi.org/10.1016/j.jcp.2020.109707
Sirignano J, Spiliopoulos K (2018) DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys 375:1339–1364. https://doi.org/10.1016/j.jcp.2018.08.029
Kharazmi E, Zhang Z, Karniadakis GE (2019) Variational physics-informed neural networks for solving partial differential equations. arXiv:1912.00873 [physics, stat]
Kharazmi E, Zhang Z, Karniadakis GEM (2021) hp-VPINNs: variational physics-informed neural networks with domain decomposition. Comput Methods Appl Mech Eng 374:113547. https://doi.org/10.1016/j.cma.2020.113547
Morokoff WJ, Caflisch RE (1995) Quasi-Monte Carlo integration. J Comput Phys 122(2):218–230. https://doi.org/10.1006/jcph.1995.1209
Pharr M, Humphreys G (2004) Monte Carlo integration I: basic concepts. In: Physically based rendering. Morgan Kaufmann, Burlington, pp 631–660. https://doi.org/10.1016/B978-012553180-1/50016-8
Novak E, Ritter K (1996) High dimensional integration of smooth functions over cubes. Numer Math 75(1):79–97. https://doi.org/10.1007/s002110050231
Rivera JA, Taylor JM, Omella ÁJ, Pardo D (2022) On quadrature rules for solving partial differential equations using neural networks. Comput Methods Appl Mech Eng 393:114710. https://doi.org/10.1016/j.cma.2022.114710
Zang Y, Bao G, Ye X, Zhou H (2020) Weak adversarial networks for high-dimensional partial differential equations. J Comput Phys 411:109409. https://doi.org/10.1016/j.jcp.2020.109409
Nguyen-Thanh VM, Zhuang X, Rabczuk T (2019) A deep energy method for finite deformation hyperelasticity. Eur J Mech A Solids. https://doi.org/10.1016/j.euromechsol.2019.103874
E W, Yu B (2018) The Deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun Math Stat 6(1):1–12. https://doi.org/10.1007/s40304-018-0127-z
Grossmann TG, Komorowska UJ, Latz J, Schönlieb CB (2023) Can physics-informed neural networks beat the finite element method? arXiv:2302.04107
Kashefi A, Mukerji T (2022) Physics-informed PointNet: a deep learning solver for steady-state incompressible flows and thermal fields on multiple sets of irregular geometries. J Comput Phys 468:111510. https://doi.org/10.1016/j.jcp.2022.111510
Berg J, Nyström K (2018) A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317:28–41. https://doi.org/10.1016/j.neucom.2018.06.056
Henkes A, Wessels H, Mahnken R (2022) Physics informed neural networks for continuum micromechanics. Comput Methods Appl Mech Eng 393:114790. https://doi.org/10.1016/j.cma.2022.114790
Lagaris IE, Likas AC, Papageorgiou DG (2000) Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans Neural Netw 11(5):1041–1049. https://doi.org/10.1109/72.870037
Ferrari S, Jensenius M (2008) A constrained optimization approach to preserving prior knowledge during incremental training. IEEE Trans Neural Netw 19(6):996–1009. https://doi.org/10.1109/TNN.2007.915108
Rudd K, Di Muro G, Ferrari S (2014) A constrained backpropagation approach for the adaptive solution of partial differential equations. IEEE Trans Neural Netw Learn Syst 25(3):571–584. https://doi.org/10.1109/TNNLS.2013.2277601
Rudd K, Ferrari S (2015) A constrained integration (CINT) approach to solving partial differential equations using artificial neural networks. Neurocomputing 155:277–285. https://doi.org/10.1016/j.neucom.2014.11.058
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6450–6458. https://doi.org/10.1109/CVPR.2017.683
Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in CNNs. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 6995–7003. https://doi.org/10.1109/CVPR.2018.00731
Magiera J, Ray D, Hesthaven JS, Rohde C (2020) Constraint-aware neural networks for Riemann problems. J Comput Phys 409:109345. https://doi.org/10.1016/j.jcp.2020.109345
Nandwani Y, Pathak AM, Singla P (2019) A primal dual formulation for deep learning with constraints. In: Advances in neural information processing systems, vol 32. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2019/hash/cf708fc1decf0337aded484f8f4519ae-Abstract.html
McClenny L, Braga-Neto U (2022) Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv:2009.04544 [cs, stat]
Lu L, Pestourie R, Yao W, Wang Z, Verdugo F, Johnson SG (2021) Physics-informed neural networks with hard constraints for inverse design. SIAM J Sci Comput 43(6):B1105–B1132. https://doi.org/10.1137/21M1397908
Zeng Q, Kothari Y, Bryngelson SH, Schäfer F (2022) Competitive physics informed networks. arXiv:2204.11144 [cs, math]
Moser P, Fenz W, Thumfart S, Ganitzer I, Giretzlehner M (2023) Modeling of 3D blood flows with physics-informed neural networks: comparison of network architectures. Fluids 8(2):46. https://doi.org/10.3390/fluids8020046
Han J, Tao J, Wang C (2020) FlowNet: a deep learning framework for clustering and selection of streamlines and stream surfaces. IEEE Trans Visual Comput Graph 26(4):1732–1744. https://doi.org/10.1109/TVCG.2018.2880207
Bhatnagar S, Afshar Y, Pan S, Duraisamy K, Kaushik S (2019) Prediction of aerodynamic flow fields using convolutional neural networks. Comput Mech 64(2):525–545. https://doi.org/10.1007/s00466-019-01740-0
Gao H, Sun L, Wang J-X (2021) PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J Comput Phys 428:110079. https://doi.org/10.1016/j.jcp.2020.110079
Wandel N, Weinmann M, Neidlin M, Klein R (2022) Spline-PINN: approaching PDEs without data using fast, physics-informed hermite-spline CNNs. arXiv:2109.07143 [physics]
Gao H, Zahr MJ, Wang J-X (2022) Physics-informed graph neural Galerkin networks: a unified framework for solving PDE-governed forward and inverse problems. Comput Methods Appl Mech Eng 390:114502. https://doi.org/10.1016/j.cma.2021.114502
Möller M, Toshniwal D, Van Ruiten F (2021) Physics-informed machine learning embedded into isogeometric analysis. Mathematics: key enabling technology for scientific machine learning. https://platformwiskunde.nl/wp-content/uploads/2021/11/Math_KET_SciML.pdf
Hughes TJR, Cottrell JA, Bazilevs Y (2005) Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Comput Methods Appl Mech Eng 194(39):4135–4195. https://doi.org/10.1016/j.cma.2004.10.008
Meethal RE, Obst B, Khalil M, Ghantasala A, Kodakkal A, Bletzinger KU, Wüchner R (2022) Finite element method-enhanced neural network for forward and inverse problems. arXiv:2205.08321 [cs, math]
Hughes TJR (2000) The finite element method: linear static and dynamic finite element analysis. Dover Publications, Mineola
Bathe K-J (ed) (2014) Finite element procedures, 2nd edn. K.J. Bathe, Watertown
Berrone S, Canuto C, Pintore M (2022) Variational physics informed neural networks: the role of quadratures and test functions. J Sci Comput 92(3):100. https://doi.org/10.1007/s10915-022-01950-4
Badia S, Li W, Martín AF (2023) Finite element interpolated neural networks for solving forward and inverse problems. arXiv:2306.06304 [cs, math]
Yazdani A, Lu L, Raissi M, Karniadakis GE (2020) Systems biology informed deep learning for inferring parameters and hidden dynamics. PLoS Comput Biol 16(11):e1007575. https://doi.org/10.1371/journal.pcbi.1007575
Uriarte C, Pardo D, Omella AJ (2022) A finite element based deep learning solver for parametric PDEs. Comput Methods Appl Mech Eng 391:114562. https://doi.org/10.1016/j.cma.2021.114562
Jagtap AD, Kharazmi E, Karniadakis GE (2020) Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput Methods Appl Mech Eng 365:113028. https://doi.org/10.1016/j.cma.2020.113028
Shukla K, Jagtap AD, Karniadakis GE (2021) Parallel physics-informed neural networks via domain decomposition. J Comput Phys 447:110683. https://doi.org/10.1016/j.jcp.2021.110683
Jagtap AD, Karniadakis GE (2020) Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun Comput Phys 28(5):2002–2041. https://doi.org/10.4208/cicp.OA-2020-0164
Chen X, Gong C, Wan Q, Deng L, Wan Y, Liu Y, Chen B, Liu J (2021) Transfer learning for deep neural network-based partial differential equations solving. Adv Aerodyn 3(1):36. https://doi.org/10.1186/s42774-021-00094-7
Goswami S, Anitescu C, Chakraborty S, Rabczuk T (2019) Transfer learning enhanced physics informed neural network for phase-field modeling of fracture. arXiv:1907.02531 [cs, stat]
He J, Chadha C, Kushwaha S, Koric S, Abueidda D, Jasiuk I (2023) Deep energy method in topology optimization applications. Acta Mech 234(4):1365–1379. https://doi.org/10.1007/s00707-022-03449-3
Nabian MA, Gladstone RJ, Meidani H (2021) Efficient training of physics-informed neural networks via importance sampling. Comput Aided Civ Infrastruct Eng 36(8):962–977. https://doi.org/10.1111/mice.12685
Hanna JM, Aguado JV, Comas-Cardona S, Askri R, Borzacchiello D (2022) Residual-based adaptivity for two-phase flow simulation in porous media using physics-informed neural networks. Comput Methods Appl Mech Eng 396:115100. https://doi.org/10.1016/j.cma.2022.115100
Kollmannsberger S, D'Angella D, Jokeit M, Herrmann L (2021) Physics-informed neural networks. In: Deep learning in computational mechanics: an introductory course. Studies in computational intelligence. Springer, Cham, pp 55–84. https://doi.org/10.1007/978-3-030-76587-3_5
Anton D, Wessels H (2021) Identification of material parameters from full-field displacement data using physics-informed neural networks. https://doi.org/10.13140/RG.2.2.24558.89924/1
Zong Y, He Q, Tartakovsky AM (2023) Improved training of physics-informed neural networks for parabolic differential equations with sharply perturbed initial conditions. Comput Methods Appl Mech Eng 414:116125. https://doi.org/10.1016/j.cma.2023.116125
Yu J, Lu L, Meng X, Karniadakis GE (2022) Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems. Comput Methods Appl Mech Eng 393:114823. https://doi.org/10.1016/j.cma.2022.114823
Taylor JM, Pardo D, Muga I (2023) A deep Fourier residual method for solving PDEs using neural networks. Comput Methods Appl Mech Eng 405:115850. https://doi.org/10.1016/j.cma.2022.115850
Chiu P-H, Wong JC, Ooi C, Dao MH, Ong Y-S (2022) CAN-PINN: a fast physics-informed neural network based on coupled-automatic-numerical differentiation method. Comput Methods Appl Mech Eng 395:114909. https://doi.org/10.1016/j.cma.2022.114909
Jagtap AD, Kawaguchi K, Karniadakis GE (2020) Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. J Comput Phys 404:109136. https://doi.org/10.1016/j.jcp.2019.109136
Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Huang G-B, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122. https://doi.org/10.1007/s13042-011-0019-y
Dong S, Li Z (2021) Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Comput Methods Appl Mech Eng 387:114129. https://doi.org/10.1016/j.cma.2021.114129
Dong S, Yang J (2022) Numerical approximation of partial differential equations by a variable projection method with artificial neural networks. Comput Methods Appl Mech Eng 398:115284. https://doi.org/10.1016/j.cma.2022.115284
Haghighat E, Raissi M, Moure A, Gomez H, Juanes R (2021) A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Comput Methods Appl Mech Eng 379:113741. https://doi.org/10.1016/j.cma.2021.113741
Bai J, Jeong H, Batuwatta-Gamage CP, Xiao S, Wang Q, Rathnayaka CM, Alzubaidi L, Liu G-R, Gu Y (2023) An introduction to programming physics-informed neural network-based computational solid mechanics. Int J Comput Methods. https://doi.org/10.1142/S0219876223500135
Kissas G, Yang Y, Hwuang E, Witschey WR, Detre JA, Perdikaris P (2020) Machine learning in cardiovascular flows modeling: predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks. Comput Methods Appl Mech Eng 358:112623. https://doi.org/10.1016/j.cma.2019.112623
Raissi M, Yazdani A, Karniadakis GE (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367(6481):1026–1030. https://doi.org/10.1126/science.aaw4741
Sun L, Gao H, Pan S, Wang J-X (2020) Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Comput Methods Appl Mech Eng 361:112732. https://doi.org/10.1016/j.cma.2019.112732
Jin X, Cai S, Li H, Karniadakis GE (2021) NSFnets (Navier–Stokes flow nets): physics-informed neural networks for the incompressible Navier–Stokes equations. J Comput Phys 426:109951. https://doi.org/10.1016/j.jcp.2020.109951
Cai S, Wang Z, Fuest F, Jeon YJ, Gray C, Karniadakis GE (2021) Flow over an espresso cup: inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks. J Fluid Mech 915:A102. https://doi.org/10.1017/jfm.2021.135
Fraces CG, Tchelepi H (2021) Physics informed deep learning for flow and transport in porous media. OnePetro. https://doi.org/10.2118/203934-MS
Zhang W, Li DS, Bui-Thanh T, Sacks MS (2022) Simulation of the 3D hyperelastic behavior of ventricular myocardium using a finite-element based neural-network approach. Comput Methods Appl Mech Eng 394:114871. https://doi.org/10.1016/j.cma.2022.114871
Wang JC-H, Hickey J-P (2023) FluxNet: a physics-informed learning-based Riemann solver for transcritical flows with non-ideal thermodynamics. Comput Methods Appl Mech Eng 411:116070. https://doi.org/10.1016/j.cma.2023.116070
Amini Niaki S, Haghighat E, Campbell T, Poursartip A, Vaziri R (2021) Physics-informed neural network for modelling the thermochemical curing process of composite-tool systems during manufacture. Comput Methods Appl Mech Eng 384:113959. https://doi.org/10.1016/j.cma.2021.113959
Zhu Q, Liu Z, Yan J (2021) Machine learning for metal additive manufacturing: predicting temperature and melt pool fluid dynamics using physics-informed neural networks. Comput Mech 67(2):619–635. https://doi.org/10.1007/s00466-020-01952-9
Markidis S (2021) The old and the new: can physics-informed deep-learning replace traditional linear solvers? Front Big Data. https://doi.org/10.3389/fdata.2021.669097
Li L, Li Y, Du Q, Liu T, Xie Y (2022) ReF-nets: physics-informed neural network for Reynolds equation of gas bearing. Comput Methods Appl Mech Eng 391:114524. https://doi.org/10.1016/j.cma.2021.114524
Chen Y, Lu L, Karniadakis GE, Dal Negro L (2020) Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics Express 28(8):11618. https://doi.org/10.1364/OE.384875
Zhang R, Liu Y, Sun H (2020) Physics-informed multi-LSTM networks for metamodeling of nonlinear structures. Comput Methods Appl Mech Eng 369:113226. https://doi.org/10.1016/j.cma.2020.113226
Shukla K, Di Leoni PC, Blackshire J, Sparkman D, Karniadakis GE (2020) Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks. J Nondestr Eval 39(3):61. https://doi.org/10.1007/s10921-020-00705-1
Anton D, Wessels H (2022) Physics-informed neural networks for material model calibration from full-field displacement data. arXiv:2212.07723
Herrmann L, Bürchner T, Dietrich F, Kollmannsberger S (2023) On the use of neural networks for full waveform inversion. Comput Methods Appl Mech Eng 415:116278. https://doi.org/10.1016/j.cma.2023.116278
Rojas CJG, Bittencourt ML, Boldrini JL (2021) Parameter identification for a damage model using a physics informed neural network. arXiv:2107.08781
Li W, Lee K-M (2021) Physics informed neural network for parameter identification and boundary force estimation of compliant and biomechanical systems. Int J Intell Robot Appl 5(3):313–325. https://doi.org/10.1007/s41315-021-00196-x
Zhang E, Dao M, Karniadakis GE, Suresh S (2022) Analyses of internal structures and defects in materials using physics-informed neural networks. Sci Adv 8(7):0644. https://doi.org/10.1126/sciadv.abk0644
Depina I, Jain S, Mar Valsson S, Gotovac H (2022) Application of physics-informed neural networks to inverse problems in unsaturated groundwater flow. Georisk Assess Manag Risk Eng Syst Geohazards 16(1):21–36. https://doi.org/10.1080/17499518.2021.1971251
Xu C, Cao BT, Yuan Y, Meschke G (2023) Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios. Comput Methods Appl Mech Eng 405:115852. https://doi.org/10.1016/j.cma.2022.115852
Sun Y, Sengupta U, Juniper M (2023) Physics-informed deep learning for simultaneous surrogate modeling and PDE-constrained optimization of an airfoil geometry. Comput Methods Appl Mech Eng 411:116042. https://doi.org/10.1016/j.cma.2023.116042
Rasht-Behesht M, Huber C, Shukla K, Karniadakis GE (2022) Physics-informed neural networks (PINNs) for wave propagation and full waveform inversions. J Geophys Res Solid Earth. https://doi.org/10.1029/2021JB023120
Zehnder J, Li Y, Coros S, Thomaszewski B (2021) NTopo: mesh-free topology optimization using implicit neural representations. arXiv:2102.10782
Di Lorenzo D, Champaney V, Marzin JY, Farhat C, Chinesta F (2023) Physics informed and data-based augmented learning in structural health diagnosis. Comput Methods Appl Mech Eng 414:116186. https://doi.org/10.1016/j.cma.2023.116186
Berg J, Nyström K (2019) Data-driven discovery of PDEs in complex datasets. J Comput Phys 384:239–252. https://doi.org/10.1016/j.jcp.2019.01.036
Udrescu S-M, Tegmark M (2020) AI Feynman: a physics-inspired method for symbolic regression. Sci Adv 6(16):2631. https://doi.org/10.1126/sciadv.aay2631
Feynman RP, Leighton RB, Sands ML (2011) The Feynman lectures on physics. Basic Books, New York
Meng X, Li Z, Zhang D, Karniadakis GE (2020) PPINN: parareal physics-informed neural network for time-dependent PDEs. Comput Methods Appl Mech Eng 370:113250. https://doi.org/10.1016/j.cma.2020.113250
Mattey R, Ghosh S (2022) A novel sequential method to train physics informed neural networks for Allen Cahn and Cahn Hilliard equations. Comput Methods Appl Mech Eng 390:114474. https://doi.org/10.1016/j.cma.2021.114474
Iserles A (2008) A first course in the numerical analysis of differential equations. Cambridge University Press
Wessels H, Weißenfels C, Wriggers P (2020) The neural particle method—an updated Lagrangian physics informed neural network for computational fluid dynamics. Comput Methods Appl Mech Eng 368:113127. https://doi.org/10.1016/j.cma.2020.113127
Bai J, Zhou Y, Ma Y, Jeong H, Zhan H, Rathnayaka C, Sauret E, Gu Y (2022) A general neural particle method for hydrodynamics modeling. Comput Methods Appl Mech Eng 393:114740. https://doi.org/10.1016/j.cma.2022.114740
González-García R, Rico-Martínez R, Kevrekidis IG (1998) Identification of distributed parameter systems: a neural net based approach. Comput Chem Eng 22:S965–S968. https://doi.org/10.1016/S0098-1354(98)00191-4
Long Z, Lu Y, Ma X, Dong B (2018) PDE-Net: learning PDEs from data. In: Proceedings of the 35th international conference on machine learning. PMLR, pp 3208–3216. https://proceedings.mlr.press/v80/long18a.html
Long Z, Lu Y, Dong B (2019) PDE-Net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network. J Comput Phys 399:108925. https://doi.org/10.1016/j.jcp.2019.108925
Hua BS, Tran MK, Yeung SK (2018) Pointwise convolutional neural networks. arXiv:1712.05245 [cs]
Brunton SL, Proctor JL, Nathan Kutz J (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci 113(15):3932–3937. https://doi.org/10.1073/pnas.1517384113
Rudy SH, Brunton SL, Proctor JL, Nathan Kutz J (2017) Data-driven discovery of partial differential equations. Sci Adv 3(4):e1602614. https://doi.org/10.1126/sciadv.1602614
Schaeffer H (2017) Learning partial differential equations via data discovery and sparse optimization. Proc Roy Soc A Math Phys Eng Sci 473(2197):20160446. https://doi.org/10.1098/rspa.2016.0446
Champion K, Lusch B, Nathan Kutz J, Brunton SL (2019) Data-driven discovery of coordinates and governing equations. Proc Natl Acad Sci 116(45):22445–22451. https://doi.org/10.1073/pnas.1906995116
Conti P, Gobat G, Fresca S, Manzoni A, Frangi A (2023) Reduced order modeling of parametrized systems through autoencoders and SINDy approach: continuation of periodic solutions. Comput Methods Appl Mech Eng 411:116072. https://doi.org/10.1016/j.cma.2023.116072
Raissi M, Perdikaris P, Karniadakis GE (2018) Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv:1801.01236 [nlin, physics:physics, stat]
Kim B, Azevedo VC, Thuerey N, Kim T, Gross M, Solenthaler B (2019) Deep fluids: a generative network for parameterized fluid simulations. Comput Graph Forum 38(2):59–70. https://doi.org/10.1111/cgf.13619
Ling J, Jones R, Templeton J (2016) Machine learning strategies for systems with invariance properties. J Comput Phys 318:22–35. https://doi.org/10.1016/j.jcp.2016.05.003
Ling J, Kurzawski A, Templeton J (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J Fluid Mech 807:155–166. https://doi.org/10.1017/jfm.2016.615
Smith GF (1965) On isotropic integrity bases. Arch Ration Mech Anal 18(4):282–292. https://doi.org/10.1007/BF00251667
Lutter M, Listmann K, Peters J (2019) Deep Lagrangian networks for end-to-end learning of energy-based control for under-actuated systems. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 7718–7725. https://doi.org/10.1109/IROS40897.2019.8968268
Lutter M, Ritter C, Peters J (2019) Deep Lagrangian networks: using physics as model prior for deep learning. arXiv:1907.04490 [cs, eess, stat]
Cranmer M, Greydanus S, Hoyer S, Battaglia P, Spergel D, Ho S (2020) Lagrangian neural networks. arXiv:2003.04630 [physics, stat]
Greydanus S, Dzamba M, Yosinski J (2019) Hamiltonian neural networks. arXiv:1906.01563 [cs]
Zhang L, Yang F, Zhang YD, Zhu YJ (2016) Road crack detection using deep convolutional neural network. In: 2016 IEEE international conference on image processing (ICIP), pp 3708–3712. https://doi.org/10.1109/ICIP.2016.7533052
Chen F-C, Jahanshahi MR (2018) NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion. IEEE Trans Ind Electron 65(5):4392–4400. https://doi.org/10.1109/TIE.2017.2764844
Jaeger BE, Schmid S, Grosse CU, Gögelein A, Elischberger F (2022) Infrared thermal imaging-based turbine blade crack classification using deep learning. J Nondestr Eval 41(4):74. https://doi.org/10.1007/s10921-022-00907-9
Korshunova N, Jomo J, Lékó G, Reznik D, Balázs P, Kollmannsberger S (2020) Image-based material characterization of complex microarchitectured additively manufactured structures. Comput Math Appl 80(11):2462–2480. https://doi.org/10.1016/j.camwa.2020.07.018
Hall Barbosa C, Bruno AC, Vellasco M, Pacheco M, Wikswo JP, Ewing AP (1999) Automation of SQUID nondestructive evaluation of steel plates by neural networks. IEEE Trans Appl Supercond 9(2):3475–3478. https://doi.org/10.1109/77.783778
Ovcharenko O, Kazei V, Kalita M, Peter D, Alkhalifah T (2019) Deep learning for low-frequency extrapolation from multioffset seismic data. Geophysics 84(6):R989–R1001. https://doi.org/10.1190/geo2018-0884.1
Sun H, Demanet L (2020) Extrapolated full waveform inversion with deep learning. Geophysics, 85(3):R275–R288. https://doi.org/10.1190/geo2019-0195.1. arXiv:1909.11536
Sun H, Demanet L (2022) Deep learning for low-frequency extrapolation of multicomponent data in elastic FWI. IEEE Trans Geosci Remote Sens 60:1–11. https://doi.org/10.1109/TGRS.2021.3135790
Lewis W, Vigh W (2017) Deep learning prior models from seismic images for full-waveform inversion. In: SEG technical program expanded abstracts 2017. Society of Exploration Geophysicists, Houston, pp 1512–1517. https://doi.org/10.1190/segam2017-17627643.1
Dyck DN, Lowther DA, McFee S (1992) Determining an approximate finite element mesh density using neural network techniques. IEEE Trans Magn 28(2):1767–1770. https://doi.org/10.1109/20.124047
Chedid R, Najjar N (1996) Automatic finite-element mesh generation using artificial neural networks-part I: prediction of mesh density. IEEE Trans Magn 32(5):5173–5178. https://doi.org/10.1109/20.538619
Triantafyllidis DG, Labridis DP (2000) An automatic mesh generator for handling small features in open boundary power transmission line problems using artificial neural networks. Commun Numer Methods Eng 16(3):177–190
Zhang Z, Wang Y, Jimack PK, Wang H (2020) MeshingNet: a new mesh generation method based on deep learning. In: Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J (eds) Computational science—ICCS 2020, vol 12139. Lecture notes in computer science. Springer, Cham, pp 186–198. https://doi.org/10.1007/978-3-030-50420-5_14
Lock C, Hassan O, Sevilla R, Jones J (2023) Meshing using neural networks for improving the efficiency of computer modelling. Eng Comput. https://doi.org/10.1007/s00366-023-01812-z
Fritzke B (1994) Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neural Netw 7(9):1441–1460. https://doi.org/10.1016/0893-6080(94)90091-4
Alfonzetti S, Coco S, Cavalieri S, Malgeri M (1996) Automatic mesh generation by the let-it-grow neural network. IEEE Trans Magn 32(3):1349–1352. https://doi.org/10.1109/20.497496
Triantafyllidis DG, Labridis DP (2002) A finite-element mesh generator based on growing neural networks. IEEE Trans Neural Netw 13(6):1482–1496. https://doi.org/10.1109/TNN.2002.804223
Lefik M, Schrefler BA (2003) Artificial neural network as an incremental non-linear constitutive model for a finite element code. Comput Methods Appl Mech Eng 192(28):3265–3283. https://doi.org/10.1016/S0045-7825(03)00350-5
Jang DP, Fazily P, Yoon JW (2021) Machine learning-based constitutive model for J2-plasticity. Int J Plast 138:102919. https://doi.org/10.1016/j.ijplas.2020.102919
Lin YC, Zhang J, Zhong J (2008) Application of neural networks to predict the elevated temperature flow behavior of a low alloy steel. Comput Mater Sci 43(4):752–758. https://doi.org/10.1016/j.commatsci.2008.01.039
Li H-Y, Hu J-D, Wei D-D, Wang X-F, Li Y-H (2012) Artificial neural network and constitutive equations to predict the hot deformation behavior of modified 2.25Cr-1Mo steel. Mater Des 42:192–197. https://doi.org/10.1016/j.matdes.2012.05.056
Liu D, Yang H, Elkhodary KI, Tang S, Liu WK, Guo X (2022) Mechanistically informed data-driven modeling of cyclic plasticity via artificial neural networks. Comput Methods Appl Mech Eng 393:114766. https://doi.org/10.1016/j.cma.2022.114766
Unger JF, Könke C (2009) Neural networks as material models within a multiscale approach. Comput Struct 87(19):1177–1186. https://doi.org/10.1016/j.compstruc.2008.12.003
Hattori G, Serpa AL (2015) Contact stiffness estimation in ANSYS using simplified models and artificial neural networks. Finite Elem Anal Des 97:43–53. https://doi.org/10.1016/j.finel.2015.01.003
Oishi A, Yoshimura S (2007) A new local contact search method using a multi-layer neural network. Comput Model Eng Sci 21(2):93–104. https://doi.org/10.3970/cmes.2007.021.093
Oishi A, Yagawa G (2020) A surface-to-surface contact search method enhanced by deep learning. Comput Mech 65(4):1125–1147. https://doi.org/10.1007/s00466-019-01811-2
Singh AP, Medida S, Duraisamy K (2017) Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils. AIAA J 55(7):2215–2227. https://doi.org/10.2514/1.J055595
Maulik R, San O, Rasheed A, Vedula P (2019) Subgrid modelling for two-dimensional turbulence using neural networks. J Fluid Mech 858:122–144. https://doi.org/10.1017/jfm.2018.770
Fabra A, Baiges J, Codina R (2022) Finite element approximation of wave problems with correcting terms based on training artificial neural networks with fine solutions. Comput Methods Appl Mech Eng 399:115280. https://doi.org/10.1016/j.cma.2022.115280
Le BA, Yvonnet J, He Q-C (2015) Computational homogenization of nonlinear elastic materials using neural networks. Int J Numer Method Eng 104(12):1061–1084. https://doi.org/10.1002/nme.4953
Lu X, Giovanis DG, Yvonnet J, Papadopoulos V, Detrez F, Bai J (2019) A data-driven computational homogenization method based on neural networks for the nonlinear anisotropic electrical response of graphene/polymer nanocomposites. Comput Mech 64(2):307–321. https://doi.org/10.1007/s00466-018-1643-0
Huang DZ, Xu K, Farhat C, Darve E (2020) Learning constitutive relations from indirect observations using deep neural networks. J Comput Phys 416:109491. https://doi.org/10.1016/j.jcp.2020.109491
Wang K, Sun W (2018) A multiscale multi-permeability poroplasticity model linked by recursive homogenizations and deep learning. Comput Methods Appl Mech Eng 334:337–380. https://doi.org/10.1016/j.cma.2018.01.036
Li B, Zhuang X (2020) Multiscale computation on feedforward neural network and recurrent neural network. Front Struct Civ Eng 14(6):1285–1298. https://doi.org/10.1007/s11709-020-0691-7
Vlassis NN, Ma R, Sun W (2020) Geometric deep learning for computational mechanics part I: anisotropic hyperelasticity. Comput Methods Appl Mech Eng 371:113299. https://doi.org/10.1016/j.cma.2020.113299
Frankenreiter I, Rosato D, Miehe C (2011) Hybrid micro-macro-modeling of evolving anisotropies and length scales in finite plasticity of polycrystals. PAMM 11(1):515–518. https://doi.org/10.1002/pamm.201110249
Fish J (2013) Practical multiscaling. Wiley, Chichester
Linka K, Hillgärtner M, Abdolazizi KP, Aydin RC, Itskov M, Cyron CJ (2021) Constitutive artificial neural networks: a fast and general approach to predictive data-driven constitutive modeling by deep learning. J Comput Phys 429:110010. https://doi.org/10.1016/j.jcp.2020.110010
Mozaffar M, Bostanabad R, Chen W, Ehmann K, Cao J, Bessa MA (2019) Deep learning predicts path-dependent plasticity. Proc Natl Acad Sci 116(52):26414–26420. https://doi.org/10.1073/pnas.1911815116
Wu L, Noels L (2022) Recurrent neural networks (RNNs) with dimensionality reduction and break down in computational mechanics; application to multi-scale localization step. Comput Methods Appl Mech Eng 390:114476. https://doi.org/10.1016/j.cma.2021.114476
Abueidda DW, Koric S, Sobh NA, Sehitoglu H (2021) Deep learning for plasticity and thermo-viscoplasticity. Int J Plast 136:102852. https://doi.org/10.1016/j.ijplas.2020.102852
Hsu Y-C, Yu C-H, Buehler MJ (2020) Using deep learning to predict fracture patterns in crystalline solids. Matter 3(1):197–211. https://doi.org/10.1016/j.matt.2020.04.019
Lew AJ, Yu CH, Hsu YC, Buehler MJ (2021) Deep learning model to predict fracture mechanisms of graphene. Npj 2D Mater Appl 5(1):1–8. https://doi.org/10.1038/s41699-021-00228-x
Liu M, Liang L, Sun W (2020) A generic physics-informed neural network-based constitutive model for soft biological tissues. Comput Methods Appl Mech Eng 372:113402. https://doi.org/10.1016/j.cma.2020.113402
Weber P, Geiger J, Wagner W (2021) Constrained neural network training and its application to hyperelastic material modeling. Comput Mech 68(5):1179–1204. https://doi.org/10.1007/s00466-021-02064-8
Leng Y, Tac V, Calve S, Tepole AB (2021) Predicting the mechanical properties of biopolymer gels using neural networks trained on discrete fiber network data. Comput Methods Appl Mech Eng 387:114160. https://doi.org/10.1016/j.cma.2021.114160. arXiv:2101.11712 [cs, q-bio]
Tac V, Sahli Costabal F, Tepole AB (2022) Data-driven tissue mechanics with polyconvex neural ordinary differential equations. Comput Methods Appl Mech Eng 398:115248. https://doi.org/10.1016/j.cma.2022.115248
Linden L, Klein DK, Kalina KA, Brummund J, Weeger O, Kästner M (2023) Neural networks meet hyperelasticity: a guide to enforcing physics. arXiv:2302.02403 [cs]
Klein DK, Ortigosa R, Martínez-Frutos J, Weeger O (2022) Finite electro-elasticity with physics-augmented neural networks. Comput Methods Appl Mech Eng 400:115501. https://doi.org/10.1016/j.cma.2022.115501
Klein DK, Fernández M, Martin RJ, Neff P, Weeger O (2022) Polyconvex anisotropic hyperelasticity with neural networks. J Mech Phys Solids 159:104703. https://doi.org/10.1016/j.jmps.2021.104703
As’ad F, Farhat C (2023) A mechanics-informed neural network framework for data-driven nonlinear viscoelasticity. In: AIAA SCITECH 2023 forum. American Institute of Aeronautics and Astronautics, National Harbor. https://doi.org/10.2514/6.2023-0949
Tac V, Rausch MK, Sahli Costabal F, Tepole AB (2023) Data-driven anisotropic finite viscoelasticity using neural ordinary differential equations. Comput Methods Appl Mech Eng 411:116046. https://doi.org/10.1016/j.cma.2023.116046
Amos B, Xu L, Kolter JZ (2017) Input convex neural networks. In: Proceedings of the 34th international conference on machine learning. PMLR, pp 146–155. https://proceedings.mlr.press/v70/amos17b.html
Chen Ricky TQ, Rubanova Y, Bettencourt J, Duvenaud D (2019) Neural ordinary differential equations. arXiv:1806.07366
Chen P, Guilleminot J (2022) Polyconvex neural networks for hyperelastic constitutive models: a rectification approach. Mech Res Commun 125:103993. https://doi.org/10.1016/j.mechrescom.2022.103993
Masi F, Stefanou I, Vannucci P, Maffi-Berthier V (2021) Thermodynamics-based artificial neural networks for constitutive modeling. J Mech Phys Solids 147:104277. https://doi.org/10.1016/j.jmps.2020.104277
Masi F, Stefanou I, Vannucci P, Maffi-Berthier V (2021) Material modeling via thermodynamics-based artificial neural networks. In: Barbaresco F, Nielsen F (eds) Geometric structures of statistical physics, information geometry, and learning. Springer proceedings in mathematics and statistics. Springer, Cham, pp 308–329. https://doi.org/10.1007/978-3-030-77957-3_16
Masi F, Stefanou I (2022) Multiscale modeling of inelastic materials with thermodynamics-based artificial neural networks (TANN). Comput Methods Appl Mech Eng 398:115190. https://doi.org/10.1016/j.cma.2022.115190
Ladeveze P, Nedjar D, Reynier M (1994) Updating of finite element models using vibration tests. AIAA J 32(7):1485–1491. https://doi.org/10.2514/3.12219
Marchand B, Chamoin L, Rey C (2019) Parameter identification and model updating in the context of nonlinear mechanical behaviors using a unified formulation of the modified constitutive relation error concept. Comput Methods Appl Mech Eng 345:1094–1113. https://doi.org/10.1016/j.cma.2018.09.008
Nguyen HN, Chamoin L, Ha Minh C (2022) mCRE-based parameter identification from full-field measurements: consistent framework, integrated version, and extension to nonlinear material behaviors. Comput Methods Appl Mech Eng 400:115461. https://doi.org/10.1016/j.cma.2022.115461
Benady A, Baranger E, Chamoin L (2023) NN-mCRE: a modified constitutive relation error framework for unsupervised learning of nonlinear state laws with physics-augmented neural networks. https://doi.org/10.13140/RG.2.2.32171.00804
Benady A, Chamoin L, Baranger E (2023) A modified constitutive relation error (mCRE) framework to learn nonlinear constitutive models from strain measurements with thermodynamics-consistent neural networks. In: International conference on adaptive modeling and simulation (ADMOS 2023), advanced techniques for data assimilation, inverse analysis, and data-based enrichment of simulation models. https://doi.org/10.23967/admos.2023.020
Li X, Roth CC, Mohr D (2019) Machine-learning based temperature- and rate-dependent plasticity model: application to analysis of fracture experiments on DP steel. Int J Plast 118:320–344. https://doi.org/10.1016/j.ijplas.2019.02.012
Thakolkaran P, Joshi A, Zheng Y, Flaschel M, De Lorenzis L, Kumar S (2022) NN-EUCLID: deep-learning hyperelasticity without stress data. J Mech Phys Solids 169:105076. https://doi.org/10.1016/j.jmps.2022.105076
Li X, Liu Z, Cui S, Luo C, Li C, Zhuang Z (2019) Predicting the effective mechanical property of heterogeneous materials by image based modeling and deep learning. Comput Methods Appl Mech Eng 347:735–753. https://doi.org/10.1016/j.cma.2019.01.005
Henkes A, Caylak I, Mahnken R (2021) A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures. Comput Methods Appl Mech Eng 385:114070. https://doi.org/10.1016/j.cma.2021.114070. arXiv:2110.13440
Liu M, Liang L, Sun W (2019) Estimation of in vivo constitutive parameters of the aortic wall using a machine learning approach. Comput Methods Appl Mech Eng 347:201–217. https://doi.org/10.1016/j.cma.2018.12.030
Lu L, Dao M, Kumar P, Ramamurty U, Karniadakis GE, Suresh S (2020) Extraction of mechanical properties of materials through deep learning from instrumented indentation. Proc Natl Acad Sci 117(13):7052–7062. https://doi.org/10.1073/pnas.1922210117
Meng X, Karniadakis GE (2020) A composite neural network that learns from multi-fidelity data: application to function approximation and inverse PDE problems. J Comput Phys 401:109020. https://doi.org/10.1016/j.jcp.2019.109020
Liu X, Athanasiou CE, Padture NP, Sheldon BW, Gao H (2020) A machine learning approach to fracture mechanics problems. Acta Mater 190:105–112. https://doi.org/10.1016/j.actamat.2020.03.016
Hambli R, Katerchi H, Benhamou C-L (2011) Multiscale methodology for bone remodelling simulation using coupled finite element and neural network computation. Biomech Model Mechanobiol 10(1):133–145. https://doi.org/10.1007/s10237-010-0222-x
Flaschel M, Kumar S, De Lorenzis L (2021) Unsupervised discovery of interpretable hyperelastic constitutive laws. Comput Methods Appl Mech Eng 381:113852. https://doi.org/10.1016/j.cma.2021.113852
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B Methodol 58(1):267–288
Flaschel M, Kumar S, De Lorenzis L (2022) Discovering plasticity models without stress data. npj Comput Mater 8(1):91. https://doi.org/10.1038/s41524-022-00752-4. arXiv:2202.04916 [cs]
Marino E, Flaschel M, Kumar S, De Lorenzis L (2023) Automated identification of linear viscoelastic constitutive laws with EUCLID. Mech Mater 181:104643. https://doi.org/10.1016/j.mechmat.2023.104643
Flaschel M, Kumar S, De Lorenzis L (2023) Automated discovery of generalized standard material models with EUCLID. Comput Methods Appl Mech Eng 405:115867. https://doi.org/10.1016/j.cma.2022.115867
Joshi A, Thakolkaran P, Zheng Y, Escande M, Flaschel M, De Lorenzis L, Kumar S (2022) Bayesian-EUCLID: discovering hyperelastic material laws with uncertainties. Comput Methods Appl Mech Eng 398:115225. https://doi.org/10.1016/j.cma.2022.115225
Linka K, St. Pierre SR, Kuhl E (2023) Automated model discovery for human brain using constitutive artificial neural networks. Acta Biomater 160:134–151. https://doi.org/10.1016/j.actbio.2023.01.055
Linka K, Kuhl E (2023) A new family of constitutive artificial neural networks towards automated model discovery. Comput Methods Appl Mech Eng 403:115731. https://doi.org/10.1016/j.cma.2022.115731
Oishi A, Yagawa G (2017) Computational mechanics enhanced by deep learning. Comput Methods Appl Mech Eng 327:327–351. https://doi.org/10.1016/j.cma.2017.08.040
Jung J, Yoon K, Lee P-S (2020) Deep learned finite elements. Comput Methods Appl Mech Eng 372:113401. https://doi.org/10.1016/j.cma.2020.113401
Bar-Sinai Y, Hoyer S, Hickey J, Brenner MP (2019) Learning data-driven discretizations for partial differential equations. Proc Natl Acad Sci USA 116(31):15344–15349. https://doi.org/10.1073/pnas.1814058116
Pantidis P, Mobasher ME (2023) Integrated finite element neural network (I-FENN) for non-local continuum damage mechanics. Comput Methods Appl Mech Eng 404:115766. https://doi.org/10.1016/j.cma.2022.115766
Arcones DA, Meethal RE, Obst B, Wüchner R (2022) Neural network-based surrogate models applied to fluid–structure interaction problems. In: WCCM-APCOM 2022, 1700 data science, machine learning and artificial intelligence. https://doi.org/10.23967/wccm-apcom.2022.080
Han C, Zhang P, Bluestein D, Cong G, Deng Y (2021) Artificial intelligence for accelerating time integrations in multiscale modeling. J Comput Phys 427:110053. https://doi.org/10.1016/j.jcp.2020.110053
Służalec T, Dobija M, Paszyńska A, Muga I, Łoś M, Paszyński M (2023) Automatic stabilization of finite-element simulations using neural networks and hierarchical matrices. Comput Methods Appl Mech Eng 411:116073. https://doi.org/10.1016/j.cma.2023.116073
Bujny M, Yousaf MS, Zurbrugg N, Detwiler D, Menzel S, Ramnath S, Rios T, Duddeck F (2023) Learning hyperparameter predictors for similarity-based multidisciplinary topology optimization. Sci Rep 13(1):14856. https://doi.org/10.1038/s41598-023-42009-0
Casadei F, Rimoli JJ, Ruzzene M (2013) A geometric multiscale finite element method for the dynamic analysis of heterogeneous solids. Comput Methods Appl Mech Eng 263:56–70. https://doi.org/10.1016/j.cma.2013.05.009
Oztoprak O, Paolini A, D’Acunto P, Rank E, Kollmannsberger S (2023) Two-scale analysis of spaceframes with complex additive manufactured nodes. Eng Struct 289:116283. https://doi.org/10.1016/j.engstruct.2023.116283
Koeppe A, Bamer F, Markert B (2020) An intelligent nonlinear meta element for elastoplastic continua: deep learning using a new time-distributed residual U-Net architecture. Comput Methods Appl Mech Eng 366:113088. https://doi.org/10.1016/j.cma.2020.113088
Capuano G, Rimoli JJ (2019) Smart finite elements: a novel machine learning application. Comput Methods Appl Mech Eng 345:363–381. https://doi.org/10.1016/j.cma.2018.10.046
Yamaguchi T, Okuda H (2021) Zooming method for FEA using a neural network. Comput Struct 247:106480. https://doi.org/10.1016/j.compstruc.2021.106480
Yin M, Zhang E, Yu Y, Karniadakis GE (2022) Interfacing finite elements with deep neural operators for fast multiscale modeling of mechanics problems. Comput Methods Appl Mech Eng 402:115027. https://doi.org/10.1016/j.cma.2022.115027
Sigmund O (2011) On the usefulness of non-gradient approaches in topology optimization. Struct Multidiscip Optim 43(5):589–596. https://doi.org/10.1007/s00158-011-0638-7
Holl P, Koltun V, Thuerey N (2020) Learning to control PDEs with differentiable physics. arXiv:2001.07457 [physics, stat]
Um K, Brand R, Fei Y, Holl P, Thuerey N (2020) Solver-in-the-loop: learning from differentiable physics to interact with iterative PDE-solvers. In: Proceedings of the 34th international conference on neural information processing systems, NIPS’20. Curran Associates Inc, Red Hook, pp 6111–6122
Um K, Brand R, Fei Y, Holl P, Thuerey N (2021) Solver-in-the-loop: learning from differentiable physics to interact with iterative PDE-solvers. arXiv:2007.00016 [physics]
Jensen CA, Reed RD, Marks RJ, El-Sharkawi MA, Jung J-B, Miyamoto RT, Anderson GM, Eggen CJ (1999) Inversion of feedforward neural networks: algorithms and applications. Proc IEEE 87(9):1536–1549. https://doi.org/10.1109/5.784232
Yu C-H, Qin Z, Buehler MJ (2019) Artificial intelligence design algorithm for nanocomposites optimized for shear crack resistance. Nano Futures 3(3):035001. https://doi.org/10.1088/2399-1984/ab36f0
Chen C-T, Gu GX (2020) Generative deep neural networks for inverse materials design using backpropagation and active learning. Adv Sci 7(5):1902607. https://doi.org/10.1002/advs.201902607
Tanyu DN, Ning J, Freudenberg T, Heilenkötter N, Rademacher A, Iben U, Maass P (2022) Deep learning methods for partial differential equations and related parameter identification problems. arXiv:2212.03130
Zohdi TI (2023) A machine-learning digital-twin for rapid large-scale solar-thermal energy system design. Comput Methods Appl Mech Eng 412:115991. https://doi.org/10.1016/j.cma.2023.115991
Plessix R-E (2006) A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys J Int 167(2):495–503. https://doi.org/10.1111/j.1365-246X.2006.02978.x
Dan G (2021) A tutorial on the adjoint method for inverse problems. Comput Methods Appl Mech Eng 380:113810. https://doi.org/10.1016/j.cma.2021.113810
Keshavarzzadeh V, Kirby RM, Narayan A (2021) Robust topology optimization with low rank approximation using artificial neural networks. Comput Mech 68(6):1297–1323. https://doi.org/10.1007/s00466-021-02069-3
Qian C, Ye W (2021) Accelerating gradient-based topology optimization design with dual-model artificial neural networks. Struct Multidiscip Optim 63(4):1687–1707. https://doi.org/10.1007/s00158-020-02770-6
Chi H, Zhang Y, Tang TLE, Mirabella L, Dalloro L, Song L, Paulino GH (2021) Universal machine learning for topology optimization. Comput Methods Appl Mech Eng 375:112739. https://doi.org/10.1016/j.cma.2019.112739
Aulig N, Olhofer M (2013) Evolutionary generation of neural network update signals for the topology optimization of structures. In: Proceedings of the 15th annual conference companion on genetic and evolutionary computation, GECCO ’13 Companion. Association for Computing Machinery, New York, pp 213–214. https://doi.org/10.1145/2464576.2464685
Aulig N, Olhofer M (2014) Topology optimization by predicting sensitivities based on local state features. https://congress.cimne.com/iacm-eccomas2014/admin/files/filePaper/p437.pdf
Aulig N, Olhofer M (2015) Neuro-evolutionary topology optimization with adaptive improvement threshold. In: Mora AM, Squillero G (eds) Applications of evolutionary computation. Lecture notes in computer science. Springer, Cham, pp 655–666. https://doi.org/10.1007/978-3-319-16549-3_53
Zhang Y, Chi H, Chen B, Tang TLE, Mirabella L, Song L, Paulino GH (2021) Speeding up computational morphogenesis with online neural synthetic gradients. arXiv:2104.12282
Hunter TH, Hulshoff SH, Sitaram A (2023) SuperAdjoint: super-resolution neural networks in adjoint-based output error estimation. In: International conference on adaptive modeling and simulation (ADMOS 2023), recent developments in methods and applications for mesh adaptation. https://doi.org/10.23967/admos.2023.058
Fukami K, Fukagata K, Taira K (2021) Machine-learning-based spatio-temporal super resolution reconstruction of turbulent flows. J Fluid Mech 909:A9. https://doi.org/10.1017/jfm.2020.948
Senhora FV, Chi H, Zhang Y, Mirabella L, Tang TLE, Paulino GH (2022) Machine learning for topology optimization: physics-based learning through an independent training strategy. Comput Methods Appl Mech Eng 398:115116. https://doi.org/10.1016/j.cma.2022.115116
Hsieh JT, Zhao S, Eismann S, Mirabella L, Ermon S (2019) Learning neural PDE solvers with convergence guarantees. arXiv:1906.01200 [cs, stat]
Ye H-L, Li J-C, Yuan B-S, Wei N, Sui Y-K (2021) Acceleration design for continuum topology optimization by using Pix2pix neural network. Int J Appl Mech 13(04):2150042. https://doi.org/10.1142/S1758825121500423
Hoyer S, Sohl-Dickstein J, Greydanus S (2019) Neural reparameterization improves structural optimization. arXiv:1909.04240
Xu K, Darve E (2019) The neural network approach to inverse problems in differential equations. arXiv:1901.07758
Jens B, Kaj N (2021) Neural networks as smooth priors for inverse problems for PDEs. J Comput Math Data Sci 1:100008. https://doi.org/10.1016/j.jcmds.2021.100008
Chen L, Shen MHH (2021) A new topology optimization approach by physics-informed deep learning process. Adv Sci Technol Eng Syst J 6(4):233–240. https://doi.org/10.25046/aj060427
Halle A, Campanile LF, Hasse A (2021) An artificial intelligence-assisted design method for topology optimization without pre-optimized training data. Appl Sci 11(19):9041. https://doi.org/10.3390/app11199041
Deng H, Albert CT (2020) Topology optimization based on deep representation learning (DRL) for compliance and stress-constrained design. Comput Mech 66(2):449–469. https://doi.org/10.1007/s00466-020-01859-5
Chandrasekhar A, Suresh K (2021) TOuNN: topology optimization using neural networks. Struct Multidiscip Optim 63(3):1135–1149. https://doi.org/10.1007/s00158-020-02748-4
Chandrasekhar A, Suresh K (2021) Length scale control in topology optimization using fourier enhanced neural networks. arXiv:2109.01861
Chandrasekhar A, Suresh K (2021) Multi-material topology optimization using neural networks. Comput Aided Des 136:103017. https://doi.org/10.1016/j.cad.2021.103017
Park JJ, Florence P, Straub J, Newcombe R, Lovegrove S (2019) DeepSDF: learning continuous signed distance functions for shape representation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 165–174. https://doi.org/10.1109/CVPR.2019.00025
Michalkiewicz M, Pontes JK, Jack D, Baktashmotlagh M, Eriksson A (2019) Implicit surface representations as layers in neural networks. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 4742–4751. https://doi.org/10.1109/ICCV.2019.00484
Gropp A, Yariv L, Haim N, Atzmon M, Lipman Y (2020) Implicit geometric regularization for learning shapes. In: Proceedings of the 37th international conference on machine learning, vol 119 of ICML’20, pp 3789–3799. JMLR.org
Sitzmann V, Martel Julien NP, Bergman AW, Lindell DB, Wetzstein G (2020) Implicit neural representations with periodic activation functions. arXiv:2006.09661 [cs, eess]
Huang Z, Bai S, Kolter JZ (2021) (Implicit)^2: implicit layers for implicit representations. In: Advances in neural information processing systems, vol 34. Curran Associates, Inc., pp 9639–9650. https://papers.nips.cc/paper/2021/hash/4ffbd5c8221d7c147f8363ccdc9a2a37-Abstract.html
Deng H, To AC (2021) A parametric level set method for topology optimization based on deep neural network (DNN). arXiv:2101.03286
Zhang Z, Li Y, Zhou W, Chen X, Yao W, Zhao Y (2021) TONR: an exploration for a novel way combining neural network with topology optimization. Comput Methods Appl Mech Eng 386:114083. https://doi.org/10.1016/j.cma.2021.114083
Biswas R, Sen MK, Das V, Mukerji T (2019) Prestack and poststack inversion using a physics-guided convolutional neural network. Interpretation 7(3):161–174. https://doi.org/10.1190/INT-2018-0236.1
Alfarraj M, AlRegib G (2019) Semi-supervised learning for acoustic impedance inversion. In: SEG technical program expanded abstracts 2019. Society of Exploration Geophysicists, San Antonio, pp 2298–2302. https://doi.org/10.1190/segam2019-3215902.1
Dong C, Loy CC, He K, Tang X (2014) Learning a deep convolutional network for image super-resolution. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Lecture notes in computer science. Springer, Cham, pp 184–199. https://doi.org/10.1007/978-3-319-10593-2_13
Dong C, Loy CC, He K, Tang X (2015) Image super-resolution using deep convolutional networks. arXiv:1501.00092 [cs]
Fukami K, Fukagata K, Taira K (2019) Super-resolution reconstruction of turbulent flows with machine learning. J Fluid Mech 870:106–120. https://doi.org/10.1017/jfm.2019.238
Napier N, Sriraman S-A, Tran HT, James KA (2020) An artificial neural network approach for generating high-resolution designs from low-resolution input in topology optimization. J Mech Des 142(1):011402. https://doi.org/10.1115/1.4044332
Wang C, Yao S, Wang Z, Hu J (2021) Deep super-resolution neural network for structural topology optimization. Eng Optim 53(12):2108–2121. https://doi.org/10.1080/0305215X.2020.1846031
Xue L, Liu J, Wen G, Wang H (2021) Efficient, high-resolution topology optimization method based on convolutional neural networks. Front Mech Eng 16(1):80–96. https://doi.org/10.1007/s11465-020-0614-2
Oishi A, Yagawa G (2021) Finite elements using neural networks and a posteriori error. Arch Comput Methods Eng 28(5):3433–3456. https://doi.org/10.1007/s11831-020-09507-0
Elingaard MO, Aage N, Bærentzen JA, Sigmund O (2022) De-homogenization using convolutional neural networks. Comput Methods Appl Mech Eng 388:114197. https://doi.org/10.1016/j.cma.2021.114197
Wan ZY, Vlachas P, Koumoutsakos P, Sapsis T (2018) Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLoS ONE 13(5):e0197704. https://doi.org/10.1371/journal.pone.0197704
Sato S, Dobashi Y, Kim T, Nishita T (2018) Example-based turbulence style transfer. ACM Trans Graph 37(4):84:1-84:9. https://doi.org/10.1145/3197517.3201398
Chu M, Thuerey N (2017) Data-driven synthesis of smoke flows with CNN-based feature descriptors. ACM Trans Graph 36(4):69:1-69:14. https://doi.org/10.1145/3072959.3073643
Yildiz AR, Öztürk N, Kaya N, Öztürk F (2003) Integrated optimal topology design and shape optimization using neural networks. Struct Multidiscip Optim 25(4):251–260. https://doi.org/10.1007/s00158-003-0300-0
Lin C-Y, Lin S-H (2005) Artificial neural network based hole image interpretation techniques for integrated topology and shape optimization. Comput Methods Appl Mech Eng 194(36):3817–3837. https://doi.org/10.1016/j.cma.2004.09.005
Chen G, Fidkowski K (2020) Output-based error estimation and mesh adaptation using convolutional neural networks: application to a scalar advection-diffusion problem. In: AIAA Scitech 2020 forum. American Institute of Aeronautics and Astronautics, Orlando. https://doi.org/10.2514/6.2020-1143
Ramuhalli P, Udpa L, Udpa SS (2005) Finite-element neural networks for solving differential equations. IEEE Trans Neural Netw 16(6):1381–1392. https://doi.org/10.1109/TNN.2005.857945
Sikora R, Sikora J, Cardelli E, Chady T (1999) Artificial neural network application for material evaluation by electromagnetic methods. In: International joint conference on neural networks. Proceedings (Cat. No.99CH36339), IJCNN’99, vol 6, pp 4027–4032. https://doi.org/10.1109/IJCNN.1999.830804
Xu G, Littlefair G, Penson R, Callan R (1999) Application of FE-based neural networks to dynamic problems. In: ICONIP’99. ANZIIS’99 & ANNES’99 & ACNN’99. 6th International conference on neural information processing. Proceedings (Cat. No.99EX378), vol 3, pp 1039–1044. https://doi.org/10.1109/ICONIP.1999.844679
Guo F, Zhang P, Wang F, Ma X, Qiu G (1999) Finite element analysis based Hopfield neural network model for solving nonlinear electromagnetic field problems. In: International joint conference on neural networks. Proceedings (Cat. No.99CH36339), IJCNN’99, vol 6, pp 4399–4403. https://doi.org/10.1109/IJCNN.1999.830877
Lee H, Kang IS (1990) Neural algorithm for solving differential equations. J Comput Phys 91(1):110–131. https://doi.org/10.1016/0021-9991(90)90007-N
Kalkkuhl J, Hunt KJ, Fritz H (1999) FEM-based neural-network approach to nonlinear modeling with application to longitudinal vehicle dynamics control. IEEE Trans Neural Netw 10(4):885–897. https://doi.org/10.1109/72.774241
Chao X, Wang C, Ji F, Yuan X (2012) Finite-element neural network-based solving 3-D differential equations in MFL. IEEE Trans Magn 48(12):4747–4756. https://doi.org/10.1109/TMAG.2012.2207732
Yang Z, Ruess M, Kollmannsberger S, Düster A, Rank E (2012) An efficient integration technique for the voxel-based finite cell method: efficient integration technique for finite cells. Int J Numer Methods Eng 91(5):457–471. https://doi.org/10.1002/nme.4269
Zhang L, Cheng L, Li H, Gao J, Cheng Yu, Domel R, Yang Y, Tang S, Liu WK (2021) Hierarchical deep-learning neural networks: finite elements and beyond. Comput Mech 67(1):207–230. https://doi.org/10.1007/s00466-020-01928-9
Saha S, Gan Z, Cheng L, Gao J, Kafka OL, Xie X, Li H, Tajdari M, Kim HA, Liu WK (2021) Hierarchical deep learning neural network (HiDeNN): an artificial intelligence (AI) framework for computational science and engineering. Comput Methods Appl Mech Eng 373:113452. https://doi.org/10.1016/j.cma.2020.113452
Zhang L, Lu Y, Tang S, Liu WK (2022) HiDeNN-TD: reduced-order hierarchical deep learning neural networks. Comput Methods Appl Mech Eng 389:114414. https://doi.org/10.1016/j.cma.2021.114414
Liu Y, Park C, Lu Y, Mojumder S, Liu WK, Qian D (2023) HiDeNN-FEM: a seamless machine learning approach to nonlinear finite element analysis. Comput Mech 72(1):173–194. https://doi.org/10.1007/s00466-023-02293-z
Lu Y, Li H, Zhang L, Park C, Mojumder S, Knapik S, Sang Z, Tang S, Apley DW, Wagner GJ, Liu WK (2023) Convolution hierarchical deep-learning neural networks (C-HiDeNN): finite elements, isogeometric analysis, tensor decomposition, and beyond. Comput Mech 72(2):333–362. https://doi.org/10.1007/s00466-023-02336-5
Park C, Lu Y, Saha S, Xue T, Guo J, Mojumder S, Apley DW, Wagner GJ, Liu WK (2023) Convolution hierarchical deep-learning neural network (C-HiDeNN) with graphics processing unit (GPU) acceleration. Comput Mech 72(2):383–409. https://doi.org/10.1007/s00466-023-02329-4
Li H, Knapik S, Li Y, Park C, Guo J, Mojumder S, Lu Y, Chen W, Apley DW, Liu WK (2023) Convolution hierarchical deep-learning neural network tensor decomposition (C-HiDeNN-TD) for high-resolution topology optimization. Comput Mech 72(2):363–382. https://doi.org/10.1007/s00466-023-02333-8
Grosse IR, Katragadda P, Benoit J (1992) An adaptive accuracy-based a posteriori error estimator. Finite Elem Anal Des 12(1):75–90. https://doi.org/10.1016/0168-874X(92)90008-Z
Zhu JZ, Zienkiewicz OC (1997) A posteriori error estimation and three-dimensional automatic mesh generation. Finite Elem Anal Des 25(1):167–184. https://doi.org/10.1016/S0168-874X(96)00037-6
Möller M, Kuzmin D (2006) Adaptive mesh refinement for high-resolution finite element schemes. Int J Numer Meth Fluids 52(5):545–569. https://doi.org/10.1002/fld.1183
Yao H, Ren Y, Liu Y (2019) FEA-Net: a deep convolutional neural network with physics prior for efficient data driven PDE learning. In: AIAA Scitech 2019 forum. American Institute of Aeronautics and Astronautics, San Diego. https://doi.org/10.2514/6.2019-0680
Yao H, Gao Y, Liu Y (2020) FEA-Net: a physics-guided data-driven model for efficient mechanical response prediction. Comput Methods Appl Mech Eng 363:112892. https://doi.org/10.1016/j.cma.2020.112892
Mishra RK, Hall PS (2005) NFDTD concept. IEEE Trans Neural Netw 16(2):484–490. https://doi.org/10.1109/TNN.2004.841799
Richardson A (2018) Seismic full-waveform inversion using deep learning tools and techniques. arXiv:1801.07232
Sun J, Niu Z, Innanen KA, Li J, Trad DO (2020) A theory-guided deep-learning formulation and optimization of seismic waveform inversion. Geophysics 85(2):R87–R99. https://doi.org/10.1190/geo2019-0138.1
Hughes TW, Williamson IAD, Minkov M, Fan S (2019) Wave physics as an analog recurrent neural network. Sci Adv 5(12):6946. https://doi.org/10.1126/sciadv.aay6946
Liu Z, Wu CT, Koishi M (2019) A deep material network for multiscale topology learning and accelerated nonlinear modeling of heterogeneous materials. Comput Methods Appl Mech Eng 345:1138–1168. https://doi.org/10.1016/j.cma.2018.09.020
Liu Z, Wu CT (2019) Exploring the 3D architectures of deep material network in data-driven multiscale mechanics. J Mech Phys Solids 127:20–46. https://doi.org/10.1016/j.jmps.2019.03.004
Haber E, Ruthotto L (2018) Stable architectures for deep neural networks. Inverse Problems, 34(1):014004. https://doi.org/10.1088/1361-6420/aa9a90. arXiv:1705.03341 [cs, math]
Ruthotto L, Haber E (2018) Deep neural networks motivated by partial differential equations. arXiv:1804.04272 [cs, math, stat]
Lu Y, Zhong A, Li Q, Dong B (2020) Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. arXiv:1710.10121 [cs, stat]
Pontriagin LS, Neustadt LW (1986) The mathematical theory of optimal processes. Classics of Soviet mathematics. Gordon and Breach Science Publishers, New York
Yu Y, Yao H, Liu Y (2018) Physics-based learning for aircraft dynamics simulation. In: Annual conference of the PHM society. https://doi.org/10.36001/phmconf.2018.v10i1.513
Ranade R, Hill C, Pathak J (2021) DiscretizationNet: a machine-learning based solver for Navier–Stokes equations using finite volume discretization. Comput Methods Appl Mech Eng 378:113722. https://doi.org/10.1016/j.cma.2021.113722
Foster D (2023) Generative deep learning: teaching machines to paint, write, compose, and play, 2nd edn. O’Reilly Media Incorporated, Sebastopol
Mosser L, Dubrule O, Blunt MJ (2017) Reconstruction of three-dimensional porous media using generative adversarial neural networks. Phys Rev E 96(4):043309. https://doi.org/10.1103/PhysRevE.96.043309
Feng J, He X, Teng Q, Ren C, Chen H, Li Y (2019) Reconstruction of porous media from extremely limited information using conditional generative adversarial networks. Phys Rev E 100(3):033308. https://doi.org/10.1103/PhysRevE.100.033308
Shams R, Masihi M, Boozarjomehry RB, Blunt MJ (2020) Coupled generative adversarial and auto-encoder neural networks to reconstruct three-dimensional multi-scale porous media. J Petrol Sci Eng 186:106794. https://doi.org/10.1016/j.petrol.2019.106794
Xia P, Bai H, Zhang T (2022) Multi-scale reconstruction of porous media based on progressively growing generative adversarial networks. Stoch Env Res Risk Assess 36(11):3685–3705. https://doi.org/10.1007/s00477-022-02216-z
Henkes A, Wessels H (2022) Three-dimensional microstructure generation using generative adversarial neural networks in the context of continuum micromechanics. Comput Methods Appl Mech Eng 400:115497. https://doi.org/10.1016/j.cma.2022.115497
Rawat S, Shen MHH (2019) A novel topology design approach using an integrated deep learning network architecture. arXiv:1808.02334
Yaji K, Yamasaki S, Fujita K (2022) Data-driven multifidelity topology design using a deep generative model: application to forced convection heat transfer problems. Comput Methods Appl Mech Eng 388:114284. https://doi.org/10.1016/j.cma.2021.114284
Lee KH, Yun GJ (2023) Microstructure reconstruction using diffusion-based generative models. arXiv:2211.10949 [cond-mat, physics:physics]
Düreth C, Seibert P, Rücker D, Handford-Styring S, Kästner M, Gude M (2023) Conditional diffusion-based microstructure reconstruction. Mater Today Commun 35:105608. https://doi.org/10.1016/j.mtcomm.2023.105608
Vlassis NN, Sun W (2023) Denoising diffusion algorithm for inverse design of microstructures with fine-tuned nonlinear material properties. Comput Methods Appl Mech Eng 413:116126. https://doi.org/10.1016/j.cma.2023.116126
Feng J, Teng Q, Li B, He X, Chen H, Li Y (2020) An end-to-end three-dimensional reconstruction framework of porous media from a single two-dimensional image based on deep learning. Comput Methods Appl Mech Eng 368:113043. https://doi.org/10.1016/j.cma.2020.113043
Kench S, Cooper SJ (2021) Generating three-dimensional structures from a two-dimensional slice with generative adversarial network-based dimensionality expansion. Nat Mach Intell 3(4):299–305. https://doi.org/10.1038/s42256-021-00322-1
Li Y, Jian P, Han G (2022) Cascaded progressive generative adversarial networks for reconstructing three-dimensional grayscale core images from a single two-dimensional image. Front Phys. https://doi.org/10.3389/fphy.2022.716708
Zhang F, He X, Teng Q, Wu X, Dong X (2022) 3D-PMRNN: reconstructing three-dimensional porous media from the two-dimensional image with recurrent neural network. J Petrol Sci Eng 208:109652. https://doi.org/10.1016/j.petrol.2021.109652
Zheng Q, Zhang D (2022) RockGPT: reconstructing three-dimensional digital rocks from single two-dimensional slice with deep learning. Comput Geosci 26(3):677–696. https://doi.org/10.1007/s10596-022-10144-8
Phan J, Ruspini L, Kiss G, Lindseth F (2022) Size-invariant 3D generation from a single 2D rock image. J Petrol Sci Eng 215:110648. https://doi.org/10.1016/j.petrol.2022.110648
Zhang F, Teng Q, Chen H, He X, Dong X (2021) Slice-to-voxel stochastic reconstructions on porous media with hybrid deep generative model. Comput Mater Sci 186:110018. https://doi.org/10.1016/j.commatsci.2020.110018
Rawat S, Shen MHH (2019) Application of adversarial networks for 3D structural topology optimization. SAE technical paper 2019-01-0829. https://doi.org/10.4271/2019-01-0829
Rawat S, Shen MHH (2019) A novel topology optimization approach using conditional deep learning. arXiv:1901.04859
Shen MHH, Chen L (2019) A new CGAN technique for constrained topology design optimization. arXiv:1901.07675
Wessels H, Böhm C, Aldakheel F, Hüpgen M, Haist M, Lohaus L, Wriggers P (2022) Computational homogenization using convolutional neural networks. In: Aldakheel F, Hudobivnik B, Soleimani M, Wessels H, Weißenfels C, Marino M (eds) Current trends and open problems in computational mechanics. Springer, Cham, pp 569–579. https://doi.org/10.1007/978-3-030-87312-7_55
Mosser L, Dubrule O, Blunt MJ (2020) Stochastic seismic waveform inversion using generative adversarial networks as a geological prior. Math Geosci 52(1):53–79. https://doi.org/10.1007/s11004-019-09832-6
Guo T, Lohan DJ, Cang R, Ren MY, Allison JT (2018) An indirect design representation for topology optimization using variational autoencoder and style transfer. In: AIAA/ASCE/AHS/ASC structures, structural dynamics, and materials conference, AIAA SciTech forum. American Institute of Aeronautics and Astronautics. https://doi.org/10.2514/6.2018-0804
Vulimiri PS, Deng H, Dugast F, Zhang X, To AC (2021) Integrating geometric data into topology optimization via neural style transfer. Materials 14(16):4551. https://doi.org/10.3390/ma14164551
Gatys L, Ecker A, Bethge M (2016) A neural algorithm of artistic style. J Vis 16(12):326. https://doi.org/10.1167/16.12.326
Dommaraju N, Bujny M, Menzel S, Olhofer M, Duddeck F (2023) Evaluation of geometric similarity metrics for structural clusters generated using topology optimization. Appl Intell 53(1):904–929. https://doi.org/10.1007/s10489-022-03301-0
Achlioptas P, Diamanti O, Mitliagkas I, Guibas L (2018) Learning representations and generative models for 3D point clouds. In: Proceedings of the 35th international conference on machine learning. PMLR, pp 40–49. https://proceedings.mlr.press/v80/achlioptas18a.html
Yang Y, Feng C, Shen Y, Tian D (2018) FoldingNet: point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 206–215. https://openaccess.thecvf.com/content_cvpr_2018/html/Yang_FoldingNet_Point_Cloud_CVPR_2018_paper.html
Khan S, Goucher-Lambert K, Kostas K, Kaklis P (2023) ShipHullGAN: a generic parametric modeller for ship hull design using deep convolutional generative model. Comput Methods Appl Mech Eng 411:116051. https://doi.org/10.1016/j.cma.2023.116051
Chen Q, Wang J, Pope P, Chen W, Fuge M (2022) Inverse design of two-dimensional airfoils using conditional generative models and surrogate log-likelihoods. J Mech Des 144(2):021712. https://doi.org/10.1115/1.4052846
Chen W, Fuge M (2021) BézierGAN: automatic generation of smooth curves from interpretable low-dimensional parameters. arXiv:1808.08871 [cs, stat]
Chen W, Ahmed F (2021) MO-PaDGAN: reparameterizing engineering designs for augmented multi-objective optimization. Appl Soft Comput 113:107909. https://doi.org/10.1016/j.asoc.2021.107909
Richardson A (2018) Generative adversarial networks for model order reduction in seismic full-waveform inversion. arXiv:1806.00828 [physics]
Zhang Y, Seibert P, Otto A, Raßloff A, Ambati M, Kästner M (2023) DA-VEGAN: differentiably augmenting VAE-GAN for microstructure reconstruction from extremely small data sets. arXiv:2303.03403 [cs]
Chen W, Ahmed F (2021) PaDGAN: learning to generate high-quality novel designs. J Mech Des 143(3):031703. https://doi.org/10.1115/1.4048626
Kulesza A, Taskar B (2012) Determinantal point processes for machine learning. Found Trends Mach Learn 5(2–3):123–286. https://doi.org/10.1561/2200000044. arXiv:1207.6083 [cs, stat]
Bates SJ, Sienz J, Langley DS (2003) Formulation of the Audze–Eglais uniform latin hypercube design of experiments. Adv Eng Softw 34(8):493–506. https://doi.org/10.1016/S0965-9978(03)00042-5
Heyrani Nobari A, Rashad MF, Ahmed F (2021) CreativeGAN: editing generative adversarial networks for creative design synthesis. In: 47th Design automation conference (DAC), vol 3A, p V03AT03A002. American Society of Mechanical Engineers. https://doi.org/10.1115/DETC2021-68103
Bau D, Liu S, Wang T, Zhu JY, Torralba A (2020) Rewriting a deep generative model. arXiv:2007.15646 [cs]
Elgammal A, Liu B, Elhoseiny M, Mazzone M (2017) CAN: creative adversarial networks, generating “art” by learning about styles and deviating from style norms. arXiv:1706.07068 [cs]
Oh S, Jung Y, Kim S, Lee I, Kang N (2019) Deep generative design: integration of topology optimization and generative models. J Mech Des 141(11):111405. https://doi.org/10.1115/1.4044229. arXiv:1903.01548
Greminger M (2020) Generative adversarial networks with synthetic training data for enforcing manufacturing constraints on topology optimization. In: 46th Design automation conference (DAC), vol 11A, p V11AT11A005. American Society of Mechanical Engineers. https://doi.org/10.1115/DETC2020-22399
Yoo S, Lee S, Kim S, Hwang KH, Park JH, Kang N (2021) Integrating deep learning into CAD/CAE system: generative design and evaluation of 3D conceptual wheel. Struct Multidiscip Optim 64(4):2725–2747. https://doi.org/10.1007/s00158-021-02953-9
Zhang W, Wang Y, Du Z, Liu C, Youn SK, Guo X (2023) Machine-learning assisted topology optimization for architectural design with artistic flavor. Comput Methods Appl Mech Eng 413:116041. https://doi.org/10.1016/j.cma.2023.116041
Bendsøe MP, Sigmund O (2003) Topology optimization: theory, methods, and applications. Springer, New York
Yang F, Ma J (2023) FWIGAN: full-waveform inversion via a physics-informed generative adversarial network. J Geophys Res Solid Earth 128(4):e2022JB025493. https://doi.org/10.1029/2022JB025493
Radhakrishnan S, Bharadwaj V, Manjunath V, Srinath R (2018) Creative intelligence—automating car design studio with generative adversarial networks (GAN). In: Holzinger A, Kieseberg P, Tjoa AM, Weippl E (eds) Machine learning and knowledge extraction. Lecture notes in computer science. Springer, Cham, pp 160–175. https://doi.org/10.1007/978-3-319-99740-7_11
Chen W, Fuge M (2019) Synthesizing designs with interpart dependencies using hierarchical generative adversarial networks. J Mech Des 141(11):111403. https://doi.org/10.1115/1.4044076
Nie Z, Lin T, Jiang H, Kara LB (2020) TopologyGAN: topology optimization using generative adversarial networks based on physical fields over the initial domain. https://doi.org/10.48550/arXiv.2003.04685. arXiv:2003.04685v2
Hertlein N, Buskohl PR, Gillman A, Vemaganti K, Anand S (2021) Generative adversarial network for early-stage design flexibility in topology optimization for additive manufacturing. J Manuf Syst 59:675–685. https://doi.org/10.1016/j.jmsy.2021.04.007
Heyrani Nobari A, Chen W, Ahmed F (2021) RANGE-GAN: design synthesis under constraints using conditional generative adversarial networks. J Mech Des. https://doi.org/10.1115/1.4052442
Wang J, Chen W, Da D, Fuge M, Rai R (2022) IH-GAN: a conditional generative model for implicit surface-based inverse design of cellular structures. Comput Methods Appl Mech Eng 396:115060. https://doi.org/10.1016/j.cma.2022.115060
Duque L, Gutiérrez G, Arias C, Rüger A, Jaramillo H (2019) Automated velocity estimation by deep learning based seismic-to-velocity mapping. Eur Assoc Geosci Eng. https://doi.org/10.3997/2214-4609.201901523
Wang YQ, Wang Q, Lu WK, Ge Q, Yan XF (2022) Seismic impedance inversion based on cycle-consistent generative adversarial network. Pet Sci 19(1):147–161. https://doi.org/10.1016/j.petsci.2021.09.038
Zhu JY, Park T, Isola P, Efros AA (2020) Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv:1703.10593 [cs]
Li B, Huang C, Li X, Zheng S, Hong J (2019) Non-iterative structural topology optimization using deep learning. Comput Aided Des 115:172–180. https://doi.org/10.1016/j.cad.2019.05.038
Xie Y, Franz E, Chu M, Thuerey N (2018) tempoGAN: a temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans Graph 37(4):95:1-95:15. https://doi.org/10.1145/3197517.3201304
Pang G, Shen C, Cao L, Van Den Hengel A (2022) Deep learning for anomaly detection: a review. ACM Comput Surv 54(2):1–38. https://doi.org/10.1145/3439950
Hawkins S, He H, Williams G, Baxter R (2002) Outlier detection using replicator neural networks. In: Kambayashi Y, Winiwarter W, Arikawa M (eds) Data warehousing and knowledge discovery. Lecture notes in computer science. Springer, Berlin, pp 170–180. https://doi.org/10.1007/3-540-46145-0_17
Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Niethammer M, Styner M, Aylward S, Zhu H, Oguz I, Yap P-T, Shen D (eds) Information processing in medical imaging. Lecture notes in computer science. Springer, Cham, pp 146–157. https://doi.org/10.1007/978-3-319-59050-9_12
Zenati H, Foo CS, Lecouat B, Manek G, Chandrasekhar VR (2019) Efficient GAN-based anomaly detection. arXiv:1802.06222 [cs, stat]
Schlegl T, Seeböck P, Waldstein SM, Langs G, Schmidt-Erfurth U (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Med Image Anal 54:30–44. https://doi.org/10.1016/j.media.2019.01.010
Henkes A, Herrmann L, Wessels H, Kollmannsberger S (2023) GAN enables outlier detection and property monitoring for additive manufacturing of complex structures. Preprint. https://www.ssrn.com/abstract=4627723
Duddeck F (2008) Multidisciplinary optimization of car bodies. Struct Multidiscip Optim 35(4):375–389. https://doi.org/10.1007/s00158-007-0130-6
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, van den Driessche G, Graepel T, Hassabis D (2017) Mastering the game of Go without human knowledge. Nature 550(7676):354–359. https://doi.org/10.1038/nature24270
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou JP, Jaderberg M, Vezhnevets AS, Leblond R, Pohlen T, Dalibard V, Budden D, Sulsky Y, Molloy J, Paine TL, Gulcehre C, Wang Z, Pfaff T, Wu Y, Ring R, Yogatama D, Wünsch D, McKinney K, Smith O, Schaul T, Lillicrap T, Kavukcuoglu K, Hassabis D, Apps C, Silver D (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354. https://doi.org/10.1038/s41586-019-1724-z
Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274. https://doi.org/10.1177/0278364913495721
Kim H, Jordan M, Sastry S, Ng A (2003) Autonomous helicopter flight via reinforcement learning. In: Advances in neural information processing systems, vol 16. MIT Press. https://papers.nips.cc/paper_files/paper/2003/hash/b427426b8acd2c2e53827970f2c2f526-Abstract.html
Abbeel P, Coates A, Quigley M, Ng A (2006) An application of reinforcement learning to aerobatic helicopter flight. In: Advances in neural information processing systems, vol 19. MIT Press. https://proceedings.neurips.cc/paper/2006/hash/98c39996bf1543e974747a2549b3107c-Abstract.html
Abbeel P, Coates A, Ng AY (2010) Autonomous helicopter aerobatics through apprenticeship learning. Int J Robot Res 29(13):1608–1639. https://doi.org/10.1177/0278364910371999
Novati G, Verma S, Alexeev D, Rossinelli D, van Rees WM, Koumoutsakos P (2017) Synchronised swimming of two fish. Bioinspir Biomimet 12(3):036001. https://doi.org/10.1088/1748-3190/aa6311. arXiv:1610.04248 [physics]
Verma S, Novati G, Koumoutsakos P (2018) Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc Natl Acad Sci 115(23):5849–5854. https://doi.org/10.1073/pnas.1800923115
Ma P, Tian Y, Pan Z, Ren B, Manocha D (2018) Fluid directed rigid body control using deep reinforcement learning. ACM Trans Graph 37(4):96:1-96:11. https://doi.org/10.1145/3197517.3201334
Rabault J, Kuchta M, Jensen A, Réglade U, Cerardi N (2019) Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. J Fluid Mech 865:281–302. https://doi.org/10.1017/jfm.2019.62
Fan D, Yang L, Wang Z, Triantafyllou MS, Karniadakis GE (2020) Reinforcement learning for bluff body active flow control in experiments and simulations. Proc Natl Acad Sci 117(42):26091–26098. https://doi.org/10.1073/pnas.2004939117
Xu J, Du T, Foshey M, Li B, Zhu B, Schulz A, Matusik W (2019) Learning to fly: computational controller design for hybrid UAVs with reinforcement learning. ACM Trans Graph 38(4):42:1-42:12. https://doi.org/10.1145/3306346.3322940
Lee XY, Balu A, Stoecklein D, Ganapathysubramanian B, Sarkar S (2018) Flow shape design for microfluidic devices using deep reinforcement learning. arXiv:1811.12444 [cs, stat]
Wang K, Sun W (2019) Meta-modeling game for deriving theory-consistent, microstructure-based traction-separation laws via deep reinforcement learning. Comput Methods Appl Mech Eng 346:216–241. https://doi.org/10.1016/j.cma.2018.11.026
Bendsøe MP (1989) Optimal shape design as a material distribution problem. Struct Optim 1(4):193–202. https://doi.org/10.1007/BF01650949
Bendsøe MP, Sigmund O (2004) Topology optimization. Springer, Berlin. https://doi.org/10.1007/978-3-662-05086-6
Hayashi K, Ohsaki M (2020) Reinforcement learning and graph embedding for binary truss topology optimization under stress and displacement constraints. Front Built Environ. https://doi.org/10.3389/fbuil.2020.00059
Zhu S, Ohsaki M, Hayashi K, Guo X (2021) Machine-specified ground structures for topology optimization of binary trusses using graph embedding policy network. Adv Eng Softw 159:103032. https://doi.org/10.1016/j.advengsoft.2021.103032
Sun H, Ma L (2020) Generative design by using exploration approaches of reinforcement learning in density-based structural topology optimization. Designs 4(2):10. https://doi.org/10.3390/designs4020010
Jang S, Yoo S, Kang N (2022) Generative design by reinforcement learning: enhancing the diversity of topology optimization designs. Comput Aided Des 146:103225. https://doi.org/10.1016/j.cad.2022.103225
Han J, Jentzen A, E W (2018) Solving high-dimensional partial differential equations using deep learning. Proc Natl Acad Sci 115(34):8505–8510. https://doi.org/10.1073/pnas.1718942115
E W, Han J, Jentzen A (2017) Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun Math Stat 5(4):349–380. https://doi.org/10.1007/s40304-017-0117-6. arXiv:1706.04702
Yang J, Dzanic T, Petersen B, Kudo J, Mittal K, Tomov V, Camier JS, Zhao T, Zha H, Kolev T, Anderson R, Faissol D (2023) Reinforcement learning for adaptive mesh refinement. In: Proceedings of the 26th international conference on artificial intelligence and statistics. PMLR, pp 5997–6014. https://proceedings.mlr.press/v206/yang23e.html
Rabault J, Kuhnle A (2019) Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach. Phys Fluids 31(9):094105. https://doi.org/10.1063/1.5116415. arXiv:1906.10382 [physics]
Novati G, de Laroussilhe HL, Koumoutsakos P (2020) Automating turbulence modeling by multi-agent reinforcement learning. arXiv:2005.09023 [physics]
Liu X-Y, Wang J-X (2021) Physics-informed Dyna-style model-based deep reinforcement learning for dynamic control. Proc Roy Soc A Math Phys Eng Sci 477(2255):20210618. https://doi.org/10.1098/rspa.2021.0618
Shi H, Zhou Y, Wu K, Chen S, Ran B, Nie Q (2023) Physics-informed deep reinforcement learning-based integrated two-dimensional car-following control strategy for connected automated vehicles. Knowl-Based Syst 269:110485. https://doi.org/10.1016/j.knosys.2023.110485
Ramesh A, Ravindran B (2023) Physics-informed model-based reinforcement learning. arXiv:2212.02179 [cs]
Rodwell C, Tallapragada P (2023) Physics-informed reinforcement learning for motion control of a fish-like swimming robot. Sci Rep 13(1):10754. https://doi.org/10.1038/s41598-023-36399-4
Nielsen MA (2015) Neural networks and deep learning. Determination Press. http://neuralnetworksanddeeplearning.com
Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. In: Advances in neural information processing systems, vol 28. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2015/hash/f9be311e65d81a9ad8150a60844bb94c-Abstract.html
Bird S, Klein E, Loper E (2009) Natural language processing with Python, 1st edn. O’Reilly Media, Sebastopol
Lane H, Howard C, Hapke HM (2019) Natural language processing in action: understanding, analyzing, and generating text with Python. Manning Publications Co, Shelter Island
Jurafsky D, Martin JH, Norvig P, Russell SJ (2009) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall series in artificial intelligence, 2nd edn. Prentice Hall, Pearson Education International, Upper Saddle River
Olah C (2015) Understanding LSTM networks. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Le Cun Y, Fogelman-Soulié F (1987) Modèles connexionnistes de l’apprentissage. Intellectica 2(1):114–143. https://doi.org/10.3406/intel.1987.1804
Bourlard H, Kamp Y (1988) Auto-association by multilayer perceptrons and singular value decomposition. Biol Cybern 59(4):291–294. https://doi.org/10.1007/BF00332918
Hinton GE, Zemel R (1993) Autoencoders, minimum description length and Helmholtz free energy. In: Advances in neural information processing systems, vol 6. Morgan-Kaufmann. https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html
Chen S, Guo W (2023) Auto-encoders in deep learning–a review with new perspectives. Mathematics 11(8):1777. https://doi.org/10.3390/math11081777
Nash JF (1950) Equilibrium points in n-person games. Proc Natl Acad Sci 36(1):48–49. https://doi.org/10.1073/pnas.36.1.48
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X, Chen X (2016) Improved techniques for training GANs. In: Advances in neural information processing systems, vol 29. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7-Abstract.html
Srivastava A, Valkov L, Russell C, Gutmann MU, Sutton C (2017) VEEGAN: reducing mode collapse in GANs using implicit variational learning. In: Proceedings of the 31st international conference on neural information processing systems, NIPS’17. Curran Associates Inc, Red Hook, pp 3310–3320
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv:1701.07875 [cs, stat]
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784 [cs, stat]
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of the 30th international conference on neural information processing systems, NIPS’16. Curran Associates Inc, Red Hook, pp 2180–2188
Bridle JS, Heading AJR, MacKay DJC (1991) Unsupervised classifiers, mutual information and ‘phantom targets’. In: Proceedings of the 4th international conference on neural information processing systems, NIPS’91. Morgan Kaufmann Publishers Inc., San Francisco, pp 1096–1101
Larsen ABL, Sønderby SK, Larochelle H, Winther O (2016) Autoencoding beyond pixels using a learned similarity metric. In: Proceedings of the 33rd international conference on international conference on machine learning, vol 48, ICML’16, pp 1558–1566
Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S (2015) Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd international conference on machine learning. PMLR, pp 2256–2265. https://proceedings.mlr.press/v37/sohl-dickstein15.html
Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. In: Advances in neural information processing systems, vol 33. Curran Associates, Inc., pp 6840–6851. https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html
Nichol A, Dhariwal P (2021) Improved denoising diffusion probabilistic models. arXiv:2102.09672 [cs, stat]
Rezende D, Mohamed S (2015) Variational inference with normalizing flows. In: Proceedings of the 32nd international conference on machine learning. PMLR, pp 1530–1538. https://proceedings.mlr.press/v37/rezende15.html
Kobyzev I, Prince SJD, Brubaker MA (2021) Normalizing flows: an introduction and review of current methods. IEEE Trans Pattern Anal Mach Intell 43(11):3964–3979. https://doi.org/10.1109/TPAMI.2020.2992934
Sutton RS (1991) Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bull 2(4):160–163. https://doi.org/10.1145/122344.122377
Janner M, Fu J, Zhang M, Levine S (2019) When to trust your model: model-based policy optimization. In: Proceedings of the 33rd international conference on neural information processing systems, vol 1122. Curran Associates Inc., Red Hook, pp 12519–12530
Kaiser L, Babaeizadeh M, Milos P, Osinski B, Campbell RH, Czechowski K, Erhan D, Finn C, Kozakowski P, Levine S, Mohiuddin A, Sepassi R, Tucker G, Michalewski H (2020) Model-based reinforcement learning for Atari. arXiv:1903.00374 [cs, stat]
Luo Y, Xu H, Li Y, Tian Y, Darrell T, Ma T (2021) Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees. arXiv:1807.03858 [cs, stat]
Deisenroth MP, Rasmussen CE (2011) PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th international conference on international conference on machine learning, ICML’11. Omnipress, Madison, pp 465–472
Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in neural information processing systems, vol 27. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2014/hash/6766aa2750c19aad2fa1b32f36ed4aee-Abstract.html
Heess N, Wayne G, Silver D, Lillicrap T, Erez T, Tassa Y (2015) Learning continuous control policies by stochastic value gradients. In: Advances in neural information processing systems, vol 28. Curran Associates, Inc., https://papers.nips.cc/paper_files/paper/2015/hash/148510031349642de5ca0c544f31b2ef-Abstract.html
Clavera I, Fu V, Abbeel P (2020) Model-augmented actor-critic: backpropagating through paths. arXiv:2005.08068 [cs, stat]
Hafner D, Lillicrap T, Ba J, Norouzi M (2020) Dream to control: learning behaviors by latent imagination. arXiv:1912.01603 [cs]
Hafner D, Lillicrap T, Norouzi M, Ba J (2022) Mastering atari with discrete world models. arXiv:2010.02193 [cs, stat]
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3):229–256. https://doi.org/10.1007/BF00992696
Sutton RS, McAllester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, vol 12. MIT Press. https://papers.nips.cc/paper_files/paper/1999/hash/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html
Kakade S (2001) A natural policy gradient. In: Advances in neural information processing systems, vol 14. MIT Press. https://papers.nips.cc/paper_files/paper/2001/hash/4b86abe48d358ecf194c56c69108433e-Abstract.html
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st international conference on machine learning. PMLR, pp 387–395. https://proceedings.mlr.press/v32/silver14.html
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: Proceedings of the 32nd international conference on machine learning. PMLR, pp 1889–1897. https://proceedings.mlr.press/v37/schulman15.html
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8:279–292. https://doi.org/10.1007/BF00992698
van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. AAAI Press, Phoenix, pp 2094–2100
Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd international conference on international conference on machine learning, vol 48, ICML’16, New York, pp 1995–2003
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347 [cs]
Bellman R (1957) A Markovian decision process. J Math Mech 6(5):679–684
Capuzzo Dolcetta I, Ishii H (1984) Approximate solutions of the Bellman equation of deterministic control theory. Appl Math Optim 11(1):161–181. https://doi.org/10.1007/BF01442176
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44. https://doi.org/10.1007/BF00115009
Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1):33–57. https://doi.org/10.1007/BF00114723
Acknowledgements
The authors gratefully acknowledge the funding through the joint research project Geothermal-Alliance Bavaria (GAB) by the Bavarian State Ministry of Science and the Arts (StMWK) as well as the Georg Nemetschek Institut (GNI) under the project DeepMonitor.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Appendices
A Advanced neural network architectures
A.1 Convolutional neural networks
CNNs [55,56,57] leverage the translation-invariant properties of physical objects. Unlike FC-NNs, CNNs process structured data arranged in grids, such as two-dimensional images or three-dimensional voxel data. On this data, specialized convolutional layers (cf. Fig. 15a) are applied. These layers utilize a set of trainable kernels to extract features such as edges, textures, and shapes. In addition, pooling layers (as illustrated in Fig. 15b) are employed for downsampling, thereby reducing spatial dimensions, which not only captures essential information but also reduces computational complexity. CNNs excel in learning hierarchical feature representations, rendering them highly effective in tasks like object classification and image segmentation [624].
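To make this concrete, the following minimal PyTorch sketch stacks two convolution-pooling blocks and a linear classifier head; all layer sizes, channel counts, and the input resolution are illustrative assumptions rather than values from the cited works.

```python
import torch
import torch.nn as nn

# Minimal CNN sketch: trainable 3x3 kernels extract local features,
# pooling layers downsample the spatial dimensions.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer (cf. Fig. 15a)
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer (cf. Fig. 15b)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # e.g., 10 classes for 28x28 inputs
)

x = torch.randn(8, 1, 28, 28)  # batch of 8 single-channel 28x28 grids
print(model(x).shape)          # torch.Size([8, 10])
```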
A.2 Graph neural networks
GNNs [61,62,63] are tailored to data structured as graphs, where entities and their connections are represented as nodes \({\varvec{v}}\) and edges \({\varvec{e}}\), respectively. Unlike other NNs, GNNs operate on non-Euclidean data with irregular structures, making them suitable for a wide range of applications, including citation networks [61], molecule analysis [625], and even mesh-based simulation [114, 115]. GNNs propagate information through the graph by iteratively aggregating and updating features from neighboring nodes. An example of this is the message passing NN [63], where the following invariant is exploited:
In a directed graphFootnote 28, each edge \(e_i\) has a sender node \(v^s_i\) and a receiver node \(v^r_i\). This enables the formulation of an algorithm that operates first on the edges and subsequently on the nodes, as summarized in Algorithm 1 for a single graph block. These graph blocks can be stacked similarly to layers in other NN architectures.
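The following sketch implements one such graph block in PyTorch for a small random graph; the sum aggregation and the MLP widths are illustrative choices, not prescriptions from [63].

```python
import torch
import torch.nn as nn

# One message-passing graph block: update edges, aggregate per receiver, update nodes.
n_nodes, n_edges, d = 4, 5, 8
v = torch.randn(n_nodes, d)                  # node features
e = torch.randn(n_edges, d)                  # edge features
senders = torch.tensor([0, 1, 2, 3, 0])      # sender node v^s_i of each edge
receivers = torch.tensor([1, 2, 3, 0, 2])    # receiver node v^r_i of each edge

phi_e = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU())  # edge update function
phi_v = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())  # node update function

# 1) operate on the edges, using the features of their adjacent nodes
e_new = phi_e(torch.cat([e, v[senders], v[receivers]], dim=-1))
# 2) aggregate the incoming edge features per receiver node (here: sum)
agg = torch.zeros(n_nodes, d).index_add_(0, receivers, e_new)
# 3) subsequently operate on the nodes
v_new = phi_v(torch.cat([v, agg], dim=-1))
```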
A.3 Recurrent neural networks
RNNs [58,59,60] harness sequential dependencies and temporal information within data. In contrast to FC-NNs, RNNs are designed to handle sequential or time-series data, where each input \(x_t\) is not treated in isolation but rather as part of a sequence (see Fig. 16). RNNs maintain an internal hidden state \(h_t\) that evolves as new inputs are processed, allowing them to capture context and relationships across time steps. This makes RNNs particularly well-suited for tasks like natural language processing [626, 627], speech recognition [628], and other time-series forecasting tasks. The recurrent nature of RNNs enables them to model dynamic patterns and dependencies, making them a valuable tool in various applications that involve sequential data analysis. Modern variations and alternatives include LSTMs [59], GRUs [180], and transformers [172].
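As a minimal sketch, the loop below propagates a hidden state \(h_t\) through a sequence with PyTorch's basic RNN cell; the input and hidden sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=3, hidden_size=5)
x_seq = torch.randn(10, 1, 3)   # sequence of 10 inputs x_t (batch size 1)
h = torch.zeros(1, 5)           # initial hidden state h_0

for x_t in x_seq:
    h = cell(x_t, h)            # h_t = f(x_t, h_{t-1}): the state carries context
# h now summarizes the sequence; nn.LSTMCell and nn.GRUCell are drop-in variants
```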
B Generative approaches
B.1 Autoencoders
Autoencoders [630,631,632,633] facilitate data generation by mapping high-dimensional training data \(\{{\varvec{x}}_i\}_{i=1}^N\) to a lower-dimensional latent space \(\{{\varvec{h}}_i\}_{i=1}^N\) which can be sampled efficiently. Specifically, an encoder \(\varvec{{\hat{h}}} = E_{NN}({\varvec{x}}; \varvec{\theta }^e)\) transforms an input sample \({\varvec{x}}\) to a reduced latent vector \(\varvec{{\hat{h}}}\). A corresponding decoder \(\varvec{{\hat{x}}}=D_{NN}(\varvec{{\hat{h}}};\varvec{\theta }^d)\) reconstructs the original sample \({\varvec{x}}\) from this latent vector \(\varvec{{\hat{h}}}\). As mentioned in Paragraph 2.1.1.3, the encoder can serve as a tool for dimensionality reduction, whereas the decoder, within the scope of generative approaches, operates as a generator. By emulating the probability distribution of the latent space \(\{\varvec{{\hat{h}}}_i\}_{i=1}^N\), variational autoencoders [17, 18] are able to generate new data that resembles the training data.
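A minimal (deterministic) autoencoder sketch in PyTorch reads as follows; the dimensions are illustrative assumptions, and a variational autoencoder would additionally model a distribution over the latent space.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 2))   # E_NN
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 100))   # D_NN

x = torch.randn(64, 100)                 # batch of high-dimensional samples
h = encoder(x)                           # low-dimensional latent vectors
x_hat = decoder(h)                       # reconstructions of the input
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction loss to be minimized
```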
B.2 Generative adversarial networks
GANs [19] emulate data distributions by setting up a two-player adversarial game between two NNs:
- the generator \(G_{NN}\),
- the discriminator \(D_{NN}\).
The generator creates predictions \(\varvec{{\hat{y}}} = G_{NN}(\varvec{\xi };\varvec{\theta }_G)\) from random noise \(\varvec{\xi }\), while the discriminator attempts to distinguish these generated predictions \(\varvec{{\hat{y}}}\) from real data \({\varvec{y}}^{{\mathcal {M}}}\). The discriminator assigns a probability score \({\hat{p}}=D_{NN}({\varvec{y}};\varvec{\theta }_D)\) which evaluates the likelihood of a data point \({\varvec{y}}\) being real or generated. The quality of both the generator and the discriminator can be expressed via the following cost function:
\[ C(\varvec{\theta }_G,\varvec{\theta }_D) = \frac{1}{N_D}\sum _{i=1}^{N_D} \log D_{NN}({\varvec{y}}^{{\mathcal {M}}}_i;\varvec{\theta }_D) + \frac{1}{N_G}\sum _{i=1}^{N_G} \log \left( 1-D_{NN}(G_{NN}(\varvec{\xi }_i;\varvec{\theta }_G);\varvec{\theta }_D)\right) . \tag{81} \]
Here, \(N_D\) real samples and \(N_G\) generated samples are used for training. The goal of the generator is to minimize the cost function, implying that the discriminator fails to distinguish between real and generated samples, whereas the discriminator strives to maximize the cost. This is therefore formulated as a minimax optimization problem
\[ \min _{\varvec{\theta }_G} \max _{\varvec{\theta }_D} C(\varvec{\theta }_G,\varvec{\theta }_D). \]
Convergence is ideally reached at the Nash equilibrium [634], where the discriminator always outputs a probability of 1/2, signifying its inability to distinguish between real and generated samples. However, GANs can be challenging to train. Problems like mode collapse [635] can arise, where the generator learns only a few modes from the training data. In the extreme case, the generator reproduces only a single training sample, which yields a low cost yet is an undesirable outcome. To combat mode collapse, diversity can be promoted either in the learning algorithm or in the cost function [635, 636]. Another challenge lies in balancing the training of the two NNs. If the discriminator learns too quickly and manages to distinguish all generated samples, the gradient of the cost function (Eq. 81) with respect to the weights becomes zero, halting further progress. A possible remedy is to use the Wasserstein distance in the cost function [637].
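For illustration, one alternating training step may be sketched in PyTorch as follows. The networks, data, and learning rates are illustrative stand-ins, and the generator update uses the common non-saturating variant of the objective rather than Eq. (81) verbatim.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

y_real = torch.randn(64, 2) + 3.0  # stand-in for real data y^M
xi = torch.randn(64, 8)            # random noise input

# discriminator step: maximize the cost, i.e., label real as 1 and generated as 0
d_loss = bce(D(y_real), torch.ones(64, 1)) + bce(D(G(xi).detach()), torch.zeros(64, 1))
opt_D.zero_grad()
d_loss.backward()
opt_D.step()

# generator step: fool the discriminator (non-saturating generator loss)
g_loss = bce(D(G(xi)), torch.ones(64, 1))
opt_G.zero_grad()
g_loss.backward()
opt_G.step()
```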
Additionally, GANs can be modified to include inputs that control the generated data. This can be achieved in a supervised manner with conditional GANs [638]. The conditional GAN does not just receive random noise, but also an additional input. This supplementary input is considered by the discriminator, which assesses whether the input-output pairs are real or generated. An unsupervised alternative is the InfoGAN [639], which disentangles the input information defining the generated data. This is achieved by introducing an additional parameter c, a latent code, to the generator \(G_{NN}(\varvec{\xi }, c;\varvec{\theta }_G)\). To ensure that the parameter is used by the NN, the cost (Eq. 81) is extended by a mutual information term [640] \(I(c, G_{NN}(\varvec{\xi }, c;\varvec{\theta }_G))\), ensuring that the generated data varies meaningfully with the latent code c.
In comparison to variational autoencoders, GANs typically generate higher-quality data. However, the advantage of autoencoders lies in their ability to construct a well-structured latent space, where proper sampling leads to smooth interpolations in the generated space. In other words, small changes in the latent space correspond to small changes in the generated space—a characteristic not inherent to GANs. To achieve smooth interpolations, autoencoders can be combined with GANs [641], where the autoencoder acts as the generator in the GAN framework, employing both an autoencoder loss and a GAN loss.
B.3 Diffusion models
Diffusion models enhanced by NNs [642,643,644] convert random noise into samples resembling the training data through a series of transformations. Given a data set \(\{ {\varvec{x}}^0_i\}_{i=1}^N\) drawn from the distribution \(q({\varvec{x}}^0)\), a forward noising process \(q({\varvec{x}}^t|{\varvec{x}}^{t-1})\) is introduced, which adds Gaussian noise to \({\varvec{x}}^{t-1}\) at each time step. The process is applied iteratively:
\[ q({\varvec{x}}^{1:T}|{\varvec{x}}^0) = \prod _{t=1}^{T} q({\varvec{x}}^t|{\varvec{x}}^{t-1}). \]
After a sufficient number of iterations T, the resulting distribution approximates a Gaussian distribution. Consequently, a random sample from a Gaussian distribution \({\varvec{x}}^T\) can be denoised with the reverse denoising process \(q({\varvec{x}}^{t-1}|{\varvec{x}}^t)\), resulting in a sample \({\varvec{x}}^0\) that matches the original distribution \(q({\varvec{x}}^0)\). The reverse denoising process is, however, unknown and therefore modeled as a Gaussian distribution, where the mean and covariance are learned by a NN. With the learned denoising process, data can be generated by denoising samples drawn from a Gaussian distribution. Note the similarity to autoencoders. Instead of learning a mapping to a hidden random state \({\varvec{h}}_i\), the encoding is prescribed as the iterative application of Gaussian noise [530].
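The forward noising process can be sketched in a few lines; the linear variance schedule below is an illustrative choice, and learning the reverse process is the actual, more involved, task.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)   # illustrative noise schedule beta_t

x = torch.randn(16, 2) * 0.1 + 1.0      # stand-in for data samples x^0
for t in range(T):
    # q(x^t | x^{t-1}): scale down and add Gaussian noise
    x = torch.sqrt(1.0 - betas[t]) * x + torch.sqrt(betas[t]) * torch.randn_like(x)
# x is now approximately N(0, I); a trained NN reverses these steps one by one
```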
A related approach is normalizing flows [645] (see [646] for an introduction and extensive review). Here, a basic probability distribution is transformed through a series of invertible transformations, i.e., flows, with the goal of modeling a distribution of interest. The individual transformations can be modeled by NNs. A normalization is required, such that each intermediate probability distribution integrates to one.
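A single affine flow illustrates the underlying change-of-variables bookkeeping; in practice the scale and shift would be outputs of NNs, and many such transformations are stacked.

```python
import torch

# Invertible affine transformation z = a*x + b applied to a standard normal base.
a = torch.tensor(2.0)
b = torch.tensor(0.5)
base = torch.distributions.Normal(0.0, 1.0)

x = base.sample((1000,))
z = a * x + b
# the change of variables keeps the transformed density normalized:
log_p_z = base.log_prob((z - b) / a) - torch.log(torch.abs(a))
```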
C Deep reinforcement learning
In reinforcement learning, the environment is commonly modeled as a Markov Decision Process (MDP). This mathematical model is defined by the set of all possible states S, actions A, and associated rewards R. Furthermore, the probability of transitioning to the next state \(s_{t+1}\) from the current state \(s_t\) under action \(a_t\) is given by \({\mathbb {P}}(s_{t+1}|s_t,a_t)\). Thus, the environment is not necessarily deterministic. One key aspect of a Markov Decision Process is the Markov property, stating that future states depend solely on the current state and action, not on the history of states and actions.
The goal of a reinforcement learning algorithm is to determine a policy \(\pi (s,a)\) which dictates the next action \(a_t\) in order to maximize the cumulative reward
\[ R_{\Sigma } = \sum _{t=0}^{T} \gamma ^t r_t, \]
where each reward \(r_t\) is discounted by a factor \(\gamma ^t\) in order to give more importance to immediate rewards.
The quality of a policy \(\pi (s,a)\) can be assessed by a state-value function \(V_{\pi }(s)\), defined as the expected future reward given the current state s and following the policy \(\pi \). Similarly, an action-value function \(Q_{\pi }(s,a)\) determines the expected future reward given the current state s and action a, while subsequently following the policy \(\pi \). The expected value along a policy \(\pi \) is denoted as \({\mathbb {E}}_\pi \).
The optimal state-value and action-value functions correspondingly follow the optimal policy:
\[ V_*(s) = \max _{\pi } V_{\pi }(s), \qquad Q_*(s,a) = \max _{\pi } Q_{\pi }(s,a). \]
The approaches can be subdivided into model-based and model-free methods. Model-based methods incorporate a model of the environment. In the most general case of a probabilistic environment, this entails the probability distribution of the next state \({\mathbb {P}}(s_{t+1}|s_t,a_t)\) and of the next reward \({\mathbb {R}}(r_{t+1}|s_{t+1},s_t,a_t)\). The model of the environment can be sampled cheaply to improve the policy \(\pi \) with the model-free reinforcement learning techniques [647,648,649,650] discussed in the sequel (Appendices C.1 and C.2). However, if the model is differentiable, the gradient of the reward can be used directly to update the policy [651,652,653,654,655,656]. This is identical to the optimization through differentiable physics solvers discussed in Sect. 3.3.3. Model-free reinforcement learning techniques can be used to enhance the optimization.
A further distinction is made between policy-based [657,658,659,660,661] and value-based [662,663,664] approaches. Policy-based methods, such as deep policy networks [38] (Appendix C.1), directly optimize the policy. By contrast, value-based methods, such as deep Q-learning [664] (Appendix C.2), learn the value function from which the optimal policy is extracted. Actor-critic methods, such as proximal policy optimization [665], combine both ideas with an actor that performs a policy and a critic that judges its quality. Both can be modeled by NNs.
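As a simple illustration of these notions, the optimal value function and policy of a small MDP with a fully known model can be computed directly by repeatedly applying the Bellman optimality update (anticipating Appendix C.2); the two-state toy model below is purely an assumption for demonstration.

```python
import numpy as np

# Toy MDP: P[s, a, s'] transition probabilities, R[s, a] immediate rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    V = np.max(R + gamma * P @ V, axis=1)      # V*(s) = max_a E[r + gamma V*(s')]
policy = np.argmax(R + gamma * P @ V, axis=1)  # greedy optimal policy
```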
C.1 Deep policy networks
In deep policy networks, the policy, i.e., the mapping of states to actions, is modeled by a NN \({\hat{a}}=\pi (s;\varvec{\theta })\). The quality of the NN is assessed by the expected cumulative reward \(R_{\Sigma }\), formulated in terms of the action-value function Q(s, a).
Its gradient (see [38, 658, 660] for a derivation), given as
\[ \nabla _{\varvec{\theta }} R_{\Sigma } = {\mathbb {E}}\left[ \nabla _{\varvec{\theta }} \pi (s;\varvec{\theta }) \, \nabla _{a} Q(s,a) \big |_{a=\pi (s;\varvec{\theta })} \right] , \]
can be applied within a gradient ascent scheme to learn the optimal policy.
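A single gradient ascent step on this objective may be sketched as follows; the action-value function is a fixed stand-in network here, whereas in practice it would be estimated, e.g., by a critic.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))  # a = pi(s; theta)
Q = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))       # stand-in for Q(s, a)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)                   # updates the policy only

s = torch.randn(64, 4)                                   # batch of states
objective = Q(torch.cat([s, policy(s)], dim=-1)).mean()  # expected Q(s, pi(s; theta))
opt.zero_grad()
(-objective).backward()   # minimizing the negative objective = gradient ascent
opt.step()
```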
C.2 Deep Q-learning
Deep Q-learning identifies the optimal action-value function Q(s, a) from which the optimal policy is extracted. Q-learning relies on the Bellman optimality criterion [666, 667]. By separating the reward \(r_0\) at the first step, the recursion formula of the optimal state-value function, i.e., the Bellman optimality criterion, can be established:
\[ V_*(s) = \max _{a} {\mathbb {E}}\left[ r_0 + \gamma V_*(s') \,|\, s, a \right] . \]
Here, \(s'\) represents the next state after s. This can be done analogously for the action-value function.
The recursion enables an update formula, referred to as temporal difference (TD) learning [668, 669]. Specifically, the current estimate \(Q^{(m)}\) at state \(s_t\) is compared to the more accurate estimate at the next state \(s_{t+1}\) using the obtained reward \(r_t\), referred to as the TD target estimate. The difference is the TD error, which in combination with a learning rate \(\alpha \) is used to update the function \(Q^{(m)}\):
\[ Q^{(m+1)}(s_t,a_t) = Q^{(m)}(s_t,a_t) + \alpha \left[ r_t + \gamma \max _{a'} Q^{(m)}(s_{t+1},a') - Q^{(m)}(s_t,a_t) \right] . \]
Here, the TD target estimate only looks one step ahead—and is therefore referred to as TD(0). The generalization is called TD(N). In the limit \(N\rightarrow \infty \), the method is equivalent to Monte Carlo learning, where all steps are performed and a true target is obtained.
Deep Q-learning introduces a NN for the action-value function \(Q(s,a;\varvec{\theta })\). Its quality is assessed with a loss composed of the mean squared TD error:
\[ C(\varvec{\theta }) = \left( r_t + \gamma \max _{a'} Q(s_{t+1},a';\varvec{\theta }) - Q(s_t,a_t;\varvec{\theta }) \right) ^2. \]
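A sketch of one such update step in PyTorch is given below; the transitions (s, a, r, s') are random stand-ins, and details such as replay buffers and target networks are omitted.

```python
import torch
import torch.nn as nn

Q = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))  # Q(s, .; theta), 2 actions
opt = torch.optim.Adam(Q.parameters(), lr=1e-3)
gamma = 0.99

s, s_next = torch.randn(64, 4), torch.randn(64, 4)  # stand-in transitions
a = torch.randint(0, 2, (64, 1))                    # actions taken
r = torch.randn(64, 1)                              # obtained rewards

with torch.no_grad():                               # TD target estimate
    target = r + gamma * Q(s_next).max(dim=1, keepdim=True).values
td_error = target - Q(s).gather(1, a)               # TD error
loss = td_error.pow(2).mean()                       # mean squared TD error
opt.zero_grad()
loss.backward()
opt.step()
```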
Lastly, the optimal policy \(\pi (s)\) maximizing the action-value function \(Q(s,a;\varvec{\theta })\) is extracted:
\[ \pi (s) = \arg \max _{a} Q(s,a;\varvec{\theta }). \]