1 Introduction

Mathematical models involving partial differential equations (PDEs) depending on a set of parameters are ubiquitous in applied sciences and engineering. These input parameters characterize, e.g., material properties, loads, boundary/initial conditions, source terms, or geometrical features. High-fidelity simulations based on full-order models (FOMs), like the finite element (FE) method, entail huge computational costs in terms of CPU time and memory if a large number of degrees of freedom (dofs) is required. Complexity is amplified whenever one is interested in going beyond a single direct simulation, as in the multi-query contexts of optimization, parameter estimation and uncertainty quantification. To face these problems, several strategies to build reduced order models (ROMs) have been developed over the years, aiming at computing reliable solutions to parametrized PDEs at a greatly reduced cost. A large class of ROMs relies on a projection-based approach, which approximates the unknown state quantities as a linear superimposition of basis functions; the latter span a subspace onto which the governing equations are projected [8, 9]. Among these, the reduced basis (RB) method [35, 49] is a powerful and widely used technique, characterized by a splitting of the reduction procedure into an expensive, parameter-independent offline phase (performed once and for all) and an efficient, parameter-dependent online phase. Its efficiency mainly relies on two crucial assumptions:

  1.

    The solution manifold is low-dimensional, so that the FOM solutions can be approximated as a linear combination of a few reduced modes with a small error;

  2.

    The online stage is completely independent of the high-fidelity dimension [21].

Assumption 1 concerns the approximability of the solution set and is related to the decay of the Kolmogorov N-width [46]. For physical phenomena characterized by a slow N-width decay, such as those featuring coherent structures that propagate over time [25], the manifold spanned by all the possible solutions is not of small dimension, so that ROMs relying on linear (global) subspaces might be inefficient. Alternative strategies to overcome this bottleneck are, e.g., local RB methods [2, 43, 55], or nonlinear approximation techniques, mainly based on deep learning architectures; see, e.g., [22,23,24, 37, 39].

Assumption 2 is automatically verified in linear, affinely parametrized problems [49], but cannot be fulfilled when dealing with nonlinear problems, as the online assembly of the reduced operators requires reconstructing the high-fidelity ones. To overcome this issue, a further level of approximation, or hyper-reduction, must be introduced. State-of-the-art methods, such as the empirical interpolation method (EIM) [6], the discrete empirical interpolation method (DEIM) [15], its variant matrix DEIM (MDEIM) [42], the missing point estimation [4] and the Gauss-Newton with approximated tensors (GNAT) [13], aim at recovering an affine expansion of the nonlinear operators by computing only a few entries of the nonlinear terms. EIM, DEIM and GNAT can be seen as approximate-then-project techniques, since operator approximation is performed at the level of FOM quantities, prior to the projection stage. On the other hand, project-then-approximate strategies have also been introduced, aiming at approximating ROM operators directly, such as the reduced nonlinear residual and its Jacobian. An option in this sense is represented by the so-called Energy Conserving Sampling and Weighting (ECSW) technique [20]; see, e.g., [21] for a detailed review.

Although extensively used in many applications, ranging from fluid flow models to cardiac mechanics [2, 11, 19, 27, 50, 54], these strategies are code-intrusive and, more importantly, might hamper the overall efficiency of the ROM approximation in complex applications. Indeed, when dealing with highly nonlinear problems, expensive hyper-reduction strategies are usually required if one aims at preserving the physical constraints at the ROM level, that is, if ROMs are built consistently with the FOM through a projection-based procedure. For instance, a large number of DEIM basis vectors is required to ensure the convergence of the reduced Newton systems arising from the linearization of the nonlinear hyper-ROM when dealing with highly nonlinear elastodynamics problems [17], even if few basis functions are required to approximate the state solution in a low-dimensional subspace through, e.g., proper orthogonal decomposition (POD). An alternative formulation of DEIM in a finite element framework, known as unassembled DEIM (UDEIM) [53], has been proposed to preserve the sparsity of the problem, while in [44] a localized DEIM, selecting smaller local subspaces for the approximation of the nonlinear term, is presented.

Semi-intrusive strategies, avoiding the ROM construction through a Galerkin projection, have recently been proposed, exploiting surrogate models to determine the RB approximation. For instance, neural networks (NNs) or Gaussian process regression can be employed to learn the map between the input parameters and the reduced-basis expansion coefficients in a non-intrusive way [32, 33, 36, 52]. An approximation of the nonlinear term arising in projection-based ROMs is obtained in [26] through deep NNs (DNNs) that exploit the projection of FOM solutions. NNs have also been recently applied in the context of operator inference for (parametrized) differential equations, combining ideas from classical model reduction with data-driven learning. For instance, the design of NNs able to accurately represent linear/nonlinear operators, mapping input functions to output functions, has been proposed recently in [40]; based on the universal approximation theorem of operators [16], a general deep learning framework, called DeepONet, has been introduced to learn continuous operators, such as solution operators of PDEs, using DNNs; see also [56]. In [45] a non-intrusive projection-based ROM for parametrized time-dependent PDEs including low-order polynomial nonlinear terms is considered, inferring an approximation of the reduced operators directly from FOM data; the resulting low-dimensional system is then solved; in this case, the learning task consists in the solution to a least squares problem; see also [7]. Projection-based ROMs and machine learning have been fused in [47], aiming at the approximation of linear and quadratic ROM operators, with a focus on a large class of fluid dynamics applications. Similarly, in [5] a non-intrusive technique, exploiting machine learning regression algorithms, is proposed for the approximation of ROM operators related to projection-based methods for the solution of parametrized PDEs. Finally, principal component analysis-based model reduction has been combined with NNs in [10] for the approximation, in a purely data-driven fashion, of infinite-dimensional solution maps, such as the solution operator of time-dependent PDEs.

In this paper, we develop a novel semi-intrusive, deep learning-enhanced hyper-reduced order modeling strategy, which we hereon refer to as Deep-HyROMnet, leveraging a Galerkin-RB method for solution dimensionality reduction and DNNs to perform hyper-reduction. Since the efficiency of the nonlinear ROM hinges upon the cost-effective approximation of the projections of the (discrete) reduced residual operator and its Jacobian (when an implicit numerical scheme is employed), the key idea is to overcome the computational bottleneck associated with classical, intrusive hyper-reduction techniques, like DEIM, by relying on DNNs to approximate otherwise expensive reduced nonlinear operators at a greatly reduced cost. Unlike purely data-driven methods, for which the predicted output is not guaranteed to satisfy the underlying PDE, our proposed method is physics-based, as it computes the ROM solution by actually solving the reduced nonlinear systems by means of the Newton method, thus exploiting the physics of the problem. A further benefit of the proposed Deep-HyROMnet method lies in the fact that the inputs given to the NNs are low-dimensional arrays, so that the overwhelming training times and costs that may be required by even moderately large FOM dimensions can be avoided. Note that we aim at efficiently approximating the nonlinear operators that result from the composition of the reduced solution operator (mapping the input parameter vector and time to the corresponding ROM solution) and the reduced residual/Jacobian operator (mapping the ROM solution to the reduced residual/Jacobian evaluated at the ROM solution). We apply the proposed methodology to problems in nonlinear solid mechanics, focusing on parametrized nonlinear elastodynamics with complex (e.g., exponential) nonlinear constitutive relations of materials undergoing large deformations, showing that the Deep-HyROMnet approach outperforms the Galerkin-RB method equipped with DEIM in terms of computational speed-up during the online stage, while achieving the same level of accuracy.

The paper is structured as follows. We recall the formulation of the RB method for nonlinear unsteady parametrized PDEs in Sect. 2, relying on POD for the construction of the reduced subspace and on DEIM as hyper-reduction technique (hence obtaining POD-Galerkin-DEIM ROMs). The proposed Deep-HyROMnet and the DNN architecture employed to perform reduced operator approximation are detailed in Sect. 3. The numerical performance of the resulting method is assessed in Sect. 4 on several benchmark problems related to nonlinear elastodynamics; conclusions are finally drawn in Sect. 5.

2 Projection-Based ROMs: The Reduced Basis Method

Our goal is to pursue an efficient solution to nonlinear unsteady PDE problems depending on a set of input parameters, which can be written in abstract form as follows: given an input parameter vector \({\varvec{\mu }}\in \mathcal {P}\) and \(\textbf{u}(0;{\varvec{\mu }})\), find \(\textbf{u}(t;{\varvec{\mu }})\in V\) such that, \(\forall t\in (0,T]\),

$$\begin{aligned} R(\textbf{u}(t;{\varvec{\mu }}),t;{\varvec{\mu }})=0 \quad \text {in } V', \end{aligned} \tag{1}$$

where the parameter space \(\mathcal {P}\subset \mathbb {R}^P\) is a compact set, R is the residual of a second-order differential equation, and \(V'\) is the dual of a suitable functional space \(H^1_0(\varOmega )^m\subseteq V \subseteq H^1(\varOmega )^m\) over the bounded domain \(\varOmega \subset \mathbb {R}^d\), where V depends on the boundary conditions at hand. In particular, we are interested in vector problems (\(m=3\)) set in \(d=3\) dimensions. The role of the parameter vector \({\varvec{\mu }}\) depends on the particular application at hand; in the case of nonlinear elastodynamics, \({\varvec{\mu }}\) is for instance related to the coefficients of the constitutive relation, the material properties and the boundary conditions.

After discretising problem (1) in space and time, we end up with a fully-discrete nonlinear system

$$\begin{aligned} \textbf{R}(\textbf{u}_h^n({\varvec{\mu }}),t^n;{\varvec{\mu }}) = \textbf{0} \quad \text {in } \mathbb {R}^{N_h}, \end{aligned} \tag{2}$$

at each time step \(t^n\), \(n=1,\ldots ,N_t\), which can be solved by means of the Newton method: given \({\varvec{\mu }}\in \mathcal {P}\) and an initial guess \(\textbf{u}_h^{n,(0)}({\varvec{\mu }})\), for \(k\ge 0\), find \(\mathbf {\delta u}_h^{n,(k)}({\varvec{\mu }})\in \mathbb {R}^{N_h}\) such that

$$\begin{aligned} \left\{ \begin{array}{llll} \textbf{J}(\textbf{u}_h^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }})\mathbf {\delta u}_h^{n,(k)}({\varvec{\mu }}) = - \textbf{R}(\textbf{u}_h^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }}) \\ \textbf{u}_h^{n,(k+1)}({\varvec{\mu }}) = \textbf{u}_h^{n,(k)}({\varvec{\mu }}) + \mathbf {\delta u}_h^{n,(k)}({\varvec{\mu }}) \end{array} \right. \end{aligned} \tag{3}$$

until suitable stopping criteria are fulfilled. Here, \(\textbf{u}_h^{n,(k)}({\varvec{\mu }})\) represents the solution vector for a fixed parameter \({\varvec{\mu }}\) computed at time step \(t^{n}\) and Newton iteration k, while \(\textbf{R}\in \mathbb {R}^{N_h}\) and \(\textbf{J}\in \mathbb {R}^{N_h\times N_h}\) denote the residual vector and the corresponding Jacobian matrix, respectively. In particular, \(\textbf{u}_h^{n,(0)}({\varvec{\mu }})\) is selected as the solution vector obtained at time \(t^{n-1}\) once Newton iterations have converged. We refer to (3) as the high-fidelity FOM for problem (1). In particular, we rely on a Galerkin-FE method for space approximation, and consider implicit finite difference schemes for time discretization, which do not require restrictions on \(\Delta t\) [48]. The high-fidelity dimension \(N_h\) is determined by the underlying mesh and the chosen FE polynomial order, and can be extremely large in case of very fine meshes.
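To fix ideas, a minimal sketch of the Newton loop (3) is given below; the routines `assemble_residual` and `assemble_jacobian` are hypothetical placeholders for the FE assembly of \(\textbf{R}\) and \(\textbf{J}\), and a production code would of course use sparse storage and a sparse linear solver.

```python
import numpy as np

def newton_solve(u0, t_n, mu, assemble_residual, assemble_jacobian,
                 tol=1e-8, max_iter=20):
    """Solve R(u, t_n; mu) = 0 starting from the initial guess u0 (R^{N_h})."""
    u = u0.copy()
    res0 = np.linalg.norm(assemble_residual(u, t_n, mu))
    for k in range(max_iter):
        R = assemble_residual(u, t_n, mu)   # residual vector, R^{N_h}
        if np.linalg.norm(R) < tol * res0:  # relative stopping criterion
            break
        J = assemble_jacobian(u, t_n, mu)   # Jacobian matrix, R^{N_h x N_h}
        u += np.linalg.solve(J, -R)         # Newton update du = -J^{-1} R
    return u
```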

To reduce the FOM numerical complexity, we introduce a projection-based ROM, relying on the RB method [49]. The idea of the RB method is to suitably select \(N\ll N_h\) vectors of \(\mathbb {R}^{N_h}\), forming the so-called RB matrix \(\textbf{V}\in \mathbb {R}^{N_h\times N}\), and to generate a reduced problem by performing a Galerkin projection of the FOM onto the subspace \(V_N=\text {Col}(\textbf{V})\subset \mathbb {R}^{N_h}\) generated by these vectors. This method relies on the assumption that the reduced-order approximation can be expressed as a linear combination of a few, problem-dependent basis functions, that is

$$\begin{aligned} \textbf{u}_h^{n}({\varvec{\mu }})\approx \textbf{Vu}_N^{n}({\varvec{\mu }}) \end{aligned}$$

for \(n=1,\ldots ,N_t\), where \(\textbf{u}_N^{n}({\varvec{\mu }})\in \mathbb {R}^N\) denotes the vector of the ROM-dofs at time \(t^n\ge 0\). The latter is obtained by imposing that the projection of the FOM residual computed on the ROM solution is orthogonal to the trial subspace (in the case of a Galerkin projection): given \({\varvec{\mu }}\in \mathcal {P}\), at each time \(t^{n}\), for \(n=1,\ldots ,N_t\), we seek \(\textbf{u}_N^{n}({\varvec{\mu }})\in \mathbb {R}^{N}\) such that

$$\begin{aligned} \textbf{V}^T\textbf{R}(\textbf{Vu}_N^{n}({\varvec{\mu }}),t^{n};{\varvec{\mu }}) = \textbf{0}. \end{aligned}$$

From now on, we will denote the reduced residual \(\textbf{V}^T\textbf{R}\) and the corresponding Jacobian \(\textbf{V}^T\textbf{JV}\) as \(\textbf{R}_N\) and \(\textbf{J}_N\), respectively. Then, the associated reduced Newton problem at time \(t^{n}\) reads: given \(\textbf{u}_N^{n,(0)}({\varvec{\mu }})\), for \(k\ge 0\), find \(\mathbf {\delta u}_N^{n,(k)}({\varvec{\mu }})\in \mathbb {R}^{N}\) such that

$$\begin{aligned} \left\{ \begin{array}{llll} \textbf{J}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }})\mathbf {\delta u}_N^{n,(k)}({\varvec{\mu }}) = - \textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }}), \\ \textbf{u}_N^{n,(k+1)}({\varvec{\mu }}) = \textbf{u}_N^{n,(k)}({\varvec{\mu }}) + \mathbf {\delta u}_N^{n,(k)}({\varvec{\mu }}), \end{array} \right. \end{aligned} \tag{4}$$

until a suitable stopping criterion is fulfilled.
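For illustration, a single reduced Newton step can be sketched as follows, reusing the hypothetical FOM assembly routines introduced above. Note that the FOM arrays are still assembled at the lifted state \(\textbf{Vu}_N\), which is precisely the \(N_h\)-dependent cost addressed by hyper-reduction in Sect. 2.2.

```python
import numpy as np

def reduced_newton_step(uN, V, t_n, mu, assemble_residual, assemble_jacobian):
    """One Galerkin-RB Newton step; FOM arrays are assembled at V @ uN."""
    R_N = V.T @ assemble_residual(V @ uN, t_n, mu)       # reduced residual, R^N
    J_N = V.T @ assemble_jacobian(V @ uN, t_n, mu) @ V   # reduced Jacobian, R^{N x N}
    return uN + np.linalg.solve(J_N, -R_N)               # only an N x N linear solve
```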

2.1 Solution-Space Reduction: Proper Orthogonal Decomposition

In this section we provide an overview of the proper orthogonal decomposition (POD) technique used to compute the reduced basis \(\textbf{V}\) through the so-called method of snapshots [8, 14]. Let

$$\begin{aligned} \mathcal {M}_{u_h} = \{\textbf{u}_h^n({\varvec{\mu }})\in \mathbb {R}^{N_h}~|~{\varvec{\mu }}\in \mathcal {P}, ~n=1,\ldots ,N_t \} \end{aligned}$$

be the (discrete) solution manifold identified by the image of \(\textbf{u}_h\), that is, the set of all the PDE solutions for \({\varvec{\mu }}\) varying in the parameter space and \(t^n\) in the partition of the time interval. Our goal is to approximate \(\mathcal {M}_{u_h}\) with a reduced linear manifold, the trial manifold

$$\begin{aligned} \mathcal {M}_{u_N}^{lin} = \{\textbf{Vu}_N^n({\varvec{\mu }})~|~\textbf{u}_N^n({\varvec{\mu }})\in \mathbb {R}^N, ~{\varvec{\mu }}\in \mathcal {P},~n=1,\ldots ,N_t \}. \end{aligned}$$

To do this, given \(n_s<N_h\) sampled instances of \({\varvec{\mu }}\in \mathcal {P}\), the snapshots matrix \(\textbf{S}\) is constructed by collecting column-wise the FOM solutions \(\textbf{u}_h^n({\varvec{\mu }}_\ell )\) at each time step \(n=1,\ldots ,N_t\), for \(\ell =1,\ldots ,n_s\), that is

$$\begin{aligned} \textbf{S} = \left[ \textbf{u}_h^1({\varvec{\mu }}_1)\left| \ldots \right| \textbf{u}_h^{N_t}({\varvec{\mu }}_1)\left| \ldots \right| \textbf{u}_h^1({\varvec{\mu }}_{n_s})\left| \ldots \right| \textbf{u}_h^{N_t}({\varvec{\mu }}_{n_s})\right] \in \mathbb {R}^{N_h\times N_t n_s}. \end{aligned}$$

Sampling can be performed, e.g., through a Latin hypercube sampling design, as well as through suitable low-discrepancy point sets. The POD basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) spanning the subspace \(V_N\) is usually obtained by performing the singular value decomposition \(\textbf{S}=\mathbf {U\Sigma Z}^T\) of \(\textbf{S}\), and then collecting the first N columns of \(\textbf{U}\), corresponding to the N largest singular values stored in the diagonal matrix \({\varvec{\Sigma }}= \text {diag}(\sigma _1,\ldots ,\sigma _{r})\), with \(\sigma _1\ge \dots \ge \sigma _{r}\ge 0\) and \(r\le \min (N_h, N_t n_s)\) being the rank of \(\textbf{S}\). This yields an orthonormal basis such that

$$\begin{aligned}&\Vert \textbf{S} - \textbf{V}\textbf{V}^T\textbf{S} \Vert _F^2 = \underset{\textbf{W}\in \mathbb {R}^{N_h\times N}:\textbf{W}^T\textbf{W}=\textbf{I}_N}{\min } \Vert \textbf{S} - \textbf{W}\textbf{W}^T\textbf{S} \Vert _F^2 = \sum _{i=N+1}^r \sigma _i^2, \end{aligned}$$

where \(\Vert \cdot \Vert _F\) is the Frobenius norm. Hence, the decay of the singular values directly impacts the size N, which can be computed as the minimum integer satisfying

$$\begin{aligned} RIC(N) = \frac{\sum _{i=1}^{N}\sigma _i^2}{\sum _{i=1}^{r}\sigma _i^2} \ge 1-\varepsilon _{POD}^2 \end{aligned}$$

for a given tolerance \(\varepsilon _{POD}>0\). In this work we exploit the so-called randomized-SVD, which offers a powerful tool to perform low-rank matrix approximations when dealing with massive datasets [34], such as high-dimensional snapshots matrices.
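A compact sketch of this construction, using scikit-learn's randomized SVD, could read as follows; the rank bound `n_components` is an assumption of this sketch (it must exceed the expected N), and the retained-energy criterion is evaluated over the leading singular values only, which approximates the total energy for fast-decaying spectra.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def pod_basis(S, eps_pod=1e-4, n_components=200):
    """POD basis V from the snapshots matrix S (N_h x N_t*n_s)."""
    U, sigma, _ = randomized_svd(S, n_components=n_components)
    # RIC(N) computed over the n_components leading singular values
    ric = np.cumsum(sigma**2) / np.sum(sigma**2)
    # smallest N such that RIC(N) >= 1 - eps_pod^2
    N = int(np.searchsorted(ric, 1.0 - eps_pod**2)) + 1
    return U[:, :N]
```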

2.2 Hyper-Reduction: The Discrete Empirical Interpolation Method

In the case of parametrized PDEs featuring nonaffine dependence on the parameter and/or nonlinear (high-order polynomial or nonpolynomial) dependence on the field variable, a further level of reduction, known as hyper-reduction, must be introduced [30, 42]. Note that if nonlinearities only include quadratic (or, at most, cubic) terms and do not feature any parameter dependence, assembling of nonlinear terms in the ROM can be performed by projection of the corresponding FOM quantities, once and for all [28].

For the case at hand, the residual \(\textbf{R}_N\) and the Jacobian \(\textbf{J}_N\) appearing in the reduced Newton system (4) depend on the solution at the previous iteration and, therefore, must be computed at each step \(k\ge 0\). It follows that, for any new instance of the parameter \({\varvec{\mu }}\), we need to assemble the high-dimensional FOM arrays before projecting them onto the reduced subspace, entailing a computational complexity which is still of order \(N_h\). To set up an efficient offline–online computational splitting, an approximation of the nonlinear operators that is independent of the high-fidelity dimension is required.

Several techniques have been employed to provide this further level of approximation [4, 6, 13, 15, 20]; among these, DEIM has been successfully applied to stationary or quasi-static nonlinear mechanical problems [11, 27]. Its key idea is to replace the nonlinear arrays in (4) with a collateral reduced basis expansion, computed through an inexpensive interpolation procedure. In this framework, the high-dimensional residual \(\textbf{R}({\varvec{\mu }})\) is projected onto a reduced subspace of dimension \(m<N_h\) spanned by a basis \({\varvec{\Phi }}_\mathcal {R}\in \mathbb {R}^{N_h\times m}\)

$$\begin{aligned} \textbf{R}({\varvec{\mu }}) \approx {\varvec{\Phi }}_\mathcal {R}\textbf{r}({\varvec{\mu }}), \end{aligned}$$

where \(\textbf{r}({\varvec{\mu }})\in \mathbb {R}^m\) is the vector of the unknown amplitudes. The matrix \({\varvec{\Phi }}_\mathcal {R}\) can be precomputed offline by performing POD on a set of high-fidelity residuals collected when solving (4) for \(n_s'\) training input parameters

$$\begin{aligned} \textbf{S}_{{\varvec{\rho }}} =\left[ \textbf{R}(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}_\ell ),t^{n};{\varvec{\mu }}_\ell ),~ k\ge 0\right] _{n=1,\ldots ,N_t}^{\ell =1,\ldots ,n_s'}. \end{aligned}$$

The unknown parameter-dependent coefficient \(\textbf{r}({\varvec{\mu }})\) is obtained online by collocating the approximation at the m components selected by a greedy procedure, and requiring that, for these components, \(\textbf{P}^T\textbf{R}({\varvec{\mu }}) = \textbf{P}^T{\varvec{\Phi }}_\mathcal {R}\textbf{r}({\varvec{\mu }})\), so that

$$\begin{aligned} \textbf{r}({\varvec{\mu }}) = (\textbf{P}^T{\varvec{\Phi }}_\mathcal {R})^{-1}\textbf{P}^T\textbf{R}({\varvec{\mu }}), \end{aligned}$$

where \(\textbf{P}\in \mathbb {R}^{N_h\times m}\) is the boolean matrix associated with the interpolation constraints. We thus define the hyper-reduced residual vector as

$$\begin{aligned} \textbf{R}_{N,m}({\varvec{\mu }}) := \textbf{V}^T{\varvec{\Phi }}_\mathcal {R}(\textbf{P}^T{\varvec{\Phi }}_\mathcal {R})^{-1}\textbf{P}^T\textbf{R}({\varvec{\mu }}) \approx \textbf{R}_N \equiv \textbf{V}^T\textbf{R}({\varvec{\mu }}). \end{aligned}$$

Finally, the associated Jacobian approximation \(\textbf{J}_{N,m}({\varvec{\mu }})\) can be computed as the derivative of \(\textbf{R}_{N,m}({\varvec{\mu }})\) with respect to the reduced displacement, obtaining

$$\begin{aligned} \textbf{J}_{N,m}({\varvec{\mu }}) = \textbf{V}^T{{\varvec{\Phi }}_\mathcal {R}}(\textbf{P}^T{{\varvec{\Phi }}_\mathcal {R}})^{-1} \textbf{P}^T\textbf{J}({\varvec{\mu }})\textbf{V}, \end{aligned}$$

or relying on the MDEIM algorithm [42], see [11, 41].
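A compact sketch of the DEIM building blocks is reported below: the greedy selection of the interpolation indices from \({\varvec{\Phi }}_\mathcal {R}\), and the offline precomputation of the parameter-independent matrix \(\textbf{V}^T{\varvec{\Phi }}_\mathcal {R}(\textbf{P}^T{\varvec{\Phi }}_\mathcal {R})^{-1}\). The routine assembling only the selected residual entries is hypothetical.

```python
import numpy as np

def deim_indices(Phi):
    """Greedy selection of the m DEIM interpolation indices from Phi (N_h x m)."""
    idx = [int(np.argmax(np.abs(Phi[:, 0])))]
    for j in range(1, Phi.shape[1]):
        # interpolate the j-th basis vector on the current indices ...
        c = np.linalg.solve(Phi[idx, :j], Phi[idx, j])
        # ... and add the index where the interpolation error is largest
        idx.append(int(np.argmax(np.abs(Phi[:, j] - Phi[:, :j] @ c))))
    return np.array(idx)

def deim_offline(V, Phi_R):
    """Interpolation dofs idx and the matrix W = V^T Phi_R (P^T Phi_R)^{-1}."""
    idx = deim_indices(Phi_R)
    return idx, V.T @ Phi_R @ np.linalg.inv(Phi_R[idx])

# online, for each Newton step: assemble only the m selected residual entries
# (on the reduced mesh) with a hypothetical routine, then combine:
#   R_m  = assemble_residual_entries(idx, V @ uN, t_n, mu)
#   R_Nm = W @ R_m   # hyper-reduced residual R_{N,m}
```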

However, the application of DEIM in the case of nonlinear time-dependent PDEs can be rather inefficient, especially for complex problems which require a large number of residual basis functions (and, thus, of interpolation points) to ensure the convergence of the hyper-reduced Newton system

$$\begin{aligned} \left\{ \begin{array}{llll} \textbf{J}_{N,m}(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }})\mathbf {\delta u}_N^{n,(k)}({\varvec{\mu }}) = - \textbf{R}_{N,m}(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }}) \\ \textbf{u}_N^{n,(k+1)}({\varvec{\mu }}) = \textbf{u}_N^{n,(k)}({\varvec{\mu }}) + \mathbf {\delta u}_N^{n,(k)}({\varvec{\mu }}). \end{array} \right. \end{aligned}$$

In fact, the m points selected by the DEIM algorithm correspond to a subset of nodes of the computational mesh which, together with the neighboring nodes (i.e. those sharing the same cell), form the so-called reduced mesh; see, e.g., the sketch reported in Fig. 1. Since the entries of any FE vector are associated with the dofs of the problem, \(\textbf{P}^T\textbf{R}({\varvec{\mu }})\) is computed by integrating the residual only on the quadrature points belonging to the elements of the reduced mesh, which, nevertheless, can be rather dense.

Fig. 1

Sketch of a reduced mesh for a hexahedral computational grid in two dimensions. The red dots represent the points selected by the DEIM algorithm and, together with the blue ones, identify the vertices of the elements (light blue) of the reduced mesh (Color figure online)

A modification of the DEIM algorithm, UDEIM, has been proposed in [54] to exploit the sparsity of the problem and minimize the number of element function calls. However, a high number of nonlinear function evaluations is still required when the number of magic points is sufficiently large. Indeed, DEIM-based affine approximations are effective, in terms of computational costs, provided that few entries of the nonlinear terms can be cheaply computed; however, this situation occurs neither for dynamical systems arising from the linearization of a nonlinear system around a steady state, nor when dealing with global nonpolynomial nonlinearities.

In this paper, we propose an alternative technique to perform hyper-reduction, which is independent of the underlying mesh and relies on a DNN architecture to approximate reduced residual vectors and Jacobian matrices. The introduction of a surrogate model to perform operator approximation is justified by the fact that, in POD-Galerkin-DEIM ROMs, most of the CPU time needed online for each new parameter instance is often spent assembling arrays such as residual vectors or the corresponding Jacobian matrices on the reduced mesh.

3 Operator Approximation: A Deep Learning-Based Technique (Deep-HyROMnet)

To recover the offline–online efficiency of the RB method, overcoming the need to assemble the nonlinear arrays on the computational mesh as in the case of DEIM, we present a novel projection-based method which relies on DNNs for the approximation of the nonlinear terms. We refer to this strategy as a hyper-reduced order model enhanced by deep neural networks (Deep-HyROMnet). Our goal is the efficient numerical approximation of the sets

$$\begin{aligned} \mathcal {M}_{R_N}&= \{\textbf{R}_N(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }}) ~|~{\varvec{\mu }}\in \mathcal {P},~n=1,\ldots ,N_t,~k\ge 0\}\subset \mathbb {R}^{N}, \\ \mathcal {M}_{J_N}&= \{\textbf{J}_N(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }}) ~|~{\varvec{\mu }}\in \mathcal {P},~n=1,\ldots ,N_t,~k\ge 0\}\subset \mathbb {R}^{N\times N}, \end{aligned}$$

which represent the reduced residual manifold and the reduced Jacobian manifold, respectively, in a way that depends only on the ROM dimension N and on the number of parameters P. To achieve this task, we employ the DNN architecture developed in [23] for the so-called DL-ROM technique. It is worth noting that, except for the approximation error of the reduced nonlinear operators, the proposed Deep-HyROMnet approach is a physics-based method: the computed solution satisfies the nonlinear equation of the problem at hand, up to a further approximation of the ROM residual and Jacobian arrays, similarly to what happens for a POD-Galerkin-DEIM ROM. The main idea of the deep learning-based operator approximation that replaces DEIM in our new Deep-HyROMnet strategy is to learn the following input-to-residual and input-to-Jacobian maps, respectively:

$$\begin{aligned} {\varvec{\rho }}_N:({\varvec{\mu }},t^n,k)&\longmapsto \textbf{R}_N(\textbf{Vu}^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }}), \\ {\varvec{\iota }}_N:({\varvec{\mu }},t^n,k)&\longmapsto \textbf{J}_N(\textbf{Vu}^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }}), \end{aligned}$$

given \(({\varvec{\mu }},t^n,k)\in \mathcal {P}\times \{t^1,\ldots ,t^{N_t}\}\times \mathbb {N}^+\) as inputs, and finally to replace the linear system in (4) with

$$\begin{aligned} {{\varvec{\iota }}}_N({\varvec{\mu }},t^n,k)\mathbf {\delta u}^{n,(k)}({\varvec{\mu }}) = - {{\varvec{\rho }}}_N({\varvec{\mu }},t^n,k). \end{aligned}$$

Hence, we aim at approximating the residual vector and the Jacobian matrix obtained after their projection onto the reduced space of dimension \(N\ll N_h\). Indeed, performing a Galerkin projection onto the POD solution space severely reduces the problem dimension, from \(N_h\) to N, and hence eases the learning task with respect to the reconstruction of the full-order residual \(\textbf{R}\) and Jacobian \(\textbf{J}\).

Remark 1

As an alternative to the Newton iterative scheme, we could rely on Broyden's method [12], which belongs to the class of quasi-Newton methods. This would avoid the computation of the Jacobian matrix at each iteration \(k\ge 0\), relying instead on rank-one updates based on residuals computed at previous iterations. However, since we are able to compute Jacobian matrices very efficiently using automatic differentiation (AD), the main computational burden is the assembly of the residual vectors. For this reason, in this paper we focus on the Newton method only, that is, on the solution of problem (4).

To summarize, in the case of the Newton approach, we end up with the following reduced problem: given \({\varvec{\mu }}\in \mathcal {P}\) and, for \(n=1,\ldots ,N_t\), the initial guess \(\textbf{u}_N^{n,(0)}({\varvec{\mu }}) = \textbf{u}_N^{n-1}({\varvec{\mu }})\), find \(\delta \textbf{u}_N^{n,(k)}\in \mathbb {R}^N\) such that, for \(k\ge 0\),

$$\begin{aligned} \left\{ \begin{array}{llll} {\varvec{\iota }}_N({\varvec{\mu }},t^{n},k) \delta \textbf{u}_N^{n,(k)}({\varvec{\mu }}) = - {\varvec{\rho }}_N({\varvec{\mu }},t^{n},k), \\ \textbf{u}_N^{n,(k+1)}({\varvec{\mu }}) = \textbf{u}_N^{n,(k)}({\varvec{\mu }}) + \delta \textbf{u}_N^{n,(k)}({\varvec{\mu }}), \end{array} \right. \end{aligned}$$

until \(\Vert {\varvec{\rho }}_N({\varvec{\mu }},t^n,k)\Vert _2 / \Vert {\varvec{\rho }}_N({\varvec{\mu }},t^n,0)\Vert _2 < \varepsilon \), where \(\varepsilon >0\) is a given tolerance. In Algorithms 1 and 2, we report a summary of the offline and online stages of Deep-HyROMnet, respectively.

[Algorithm 1: Deep-HyROMnet, offline stage]
[Algorithm 2: Deep-HyROMnet, online stage]
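A minimal sketch of the online stage (Algorithm 2) is reported below; here `rho_N` and `iota_N` are assumed to wrap the trained networks and to return, for a triple \(({\varvec{\mu }},t^n,k)\), the approximate reduced residual and Jacobian.

```python
import numpy as np

def deep_hyromnet_online(mu, uN0, rho_N, iota_N, Nt, eps=1e-4, max_iter=20):
    """Time loop of Deep-HyROMnet: every Newton system is N x N and is
    assembled by DNN inference only, with no FOM array involved."""
    uN, history = uN0.copy(), []
    for n in range(1, Nt + 1):
        r0 = np.linalg.norm(rho_N(mu, n, 0))   # reference residual norm
        for k in range(max_iter):
            r = rho_N(mu, n, k)
            if np.linalg.norm(r) / r0 < eps:   # stopping criterion
                break
            uN = uN + np.linalg.solve(iota_N(mu, n, k), -r)
        history.append(uN.copy())              # u_N^n for n = 1, ..., N_t
    return np.stack(history)
```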

3.1 Deep Neural Network Construction

For the sake of generality, we will focus on the DNN-based approximation of the reduced residual vector only, that is

$$\begin{aligned} {{\varvec{\rho }}}_N({\varvec{\mu }},t^n,k) \approx \textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})\in \mathbb {R}^N. \end{aligned}$$

In fact, by relying on a suitable transformation, we can easily write the Jacobian matrix as a vector of dimension \(N^2\) and apply to it the same procedure described in the following for the residual vector. In particular, we define the transformation

$$\begin{aligned} vec:\mathbb {R}^{N\times N}\rightarrow \mathbb {R}^{N^2}, \quad vec(\textbf{J}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})) = \textbf{j}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }}), \end{aligned}$$

which consists in stacking the columns of \(\textbf{J}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})\) in a vector that is passed to the DNN as training sample, thus obtaining

$$\begin{aligned} \widetilde{{\varvec{\iota }}}_N({\varvec{\mu }},t^n,k)\approx \textbf{j}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})\in \mathbb {R}^{N^2}. \end{aligned}$$

Finally, the vec operation is inverted, so that \({{\varvec{\iota }}}_N({\varvec{\mu }},t^n,k) = vec^{-1}(\widetilde{{\varvec{\iota }}}_N({\varvec{\mu }},t^n,k))\).
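In NumPy terms, the vec transformation and its inverse amount to column-major (Fortran-order) reshapes, e.g.:

```python
import numpy as np

N = 4
J_N = np.random.rand(N, N)             # reduced Jacobian, R^{N x N}

j_N = J_N.flatten(order="F")           # vec: stack the columns, R^{N^2}
J_back = j_N.reshape(N, N, order="F")  # vec^{-1}: recover the matrix
assert np.allclose(J_N, J_back)
```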

We thus aim at efficiently approximating the whole set \(\mathcal {M}_{R_N}\) by means of the reduced residual trial manifold, defined as

$$\begin{aligned} \mathcal {M}_{\rho _N} = \{ {{\varvec{\rho }}}_N({\varvec{\mu }},t^n,k) ~|~{\varvec{\mu }}\in \mathcal {P}, ~n=1,\ldots ,N_t, ~k\ge 0\}\subset \mathbb {R}^{N}. \end{aligned}$$

The approximation of the ROM residual \(\textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})\) takes the form

$$\begin{aligned} {\varvec{\rho }}_N({\varvec{\mu }},t^n,k) = \widetilde{\textbf{R}}_N({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF},{\varvec{\theta }}_{D}) = \textbf{f}^D_N({\varvec{\phi }}_q^{DF}({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF});{\varvec{\theta }}_{D}) \end{aligned}$$

where:
  • \({\varvec{\phi }}_q^{DF}(\cdot ~;{\varvec{\theta }}_{DF}):\mathbb {R}^{P+2}\rightarrow \mathbb {R}^q\) such that

    $$\begin{aligned} {\varvec{\phi }}_q^{DF}({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF}) = \textbf{R}_q({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF}) \end{aligned}$$

    is a deep feedforward neural network (DFNN), consisting of the repeated composition of a nonlinear activation function applied to a linear transformation of the input. Here, \({\varvec{\theta }}_{DF}\) denotes the vector of parameters of the DFNN, collecting all the weights and biases of each layer, and q is as close as possible to the input size \(P+2\);

  • \(\textbf{f}^D_N(\cdot ~;{\varvec{\theta }}_{D}):\mathbb {R}^q\rightarrow \mathbb {R}^N\) such that

    $$\begin{aligned} \textbf{f}^D_N(\textbf{R}_q({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF});{\varvec{\theta }}_{D}) = \widetilde{\textbf{R}}_N({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{DF},{\varvec{\theta }}_{D}) \end{aligned}$$

    is the decoder function of a convolutional autoencoder (CAE), obtained as the composition of several layers (some of which are convolutional), depending upon a vector \({\varvec{\theta }}_{D}\) collecting all the corresponding weights and biases.

The encoder function of the CAE is exploited, during the training stage only, to map the reduced residual \(\textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }})\) associated with \(({\varvec{\mu }},t^n,k)\) onto a low-dimensional representation

$$\begin{aligned} \textbf{f}^E_q(\textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}),t^n;{\varvec{\mu }});{\varvec{\theta }}_{E}) = \widetilde{\textbf{R}}_q({\varvec{\mu }},t^n,k;{\varvec{\theta }}_{E}), \end{aligned}$$

where \(\textbf{f}^E_q(\cdot ~;{\varvec{\theta }}_{E}):\mathbb {R}^N\rightarrow \mathbb {R}^q\) denotes the encoder function, depending upon a vector \({\varvec{\theta }}_{E}\) of parameters. The choice of a CAE is due to the fact that, thanks to the shared parameters and local connectivity properties, which allow one to reduce the number of parameters of the network and the number of associated computations, convolutional layers are better suited than dense layers to handle high-dimensional correlated data.
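A PyTorch sketch of the two online components is reported below. The channel counts and kernel sizes are illustrative assumptions (the actual CAE attributes are listed in Tables 1 and 2), while the 12 dense layers with 50 neurons and the ELU activations follow the setup described later in this section, here for the toy sizes \(P=3\), \(q=P+2\) and \(N=16\).

```python
import torch
import torch.nn as nn

P, q, N = 3, 5, 16   # toy sizes: q = P + 2, N a perfect square (sqrt(N) = 4)

# DFNN phi_q^{DF}: (mu, t^n, k) in R^{P+2} -> latent code in R^q
# (12 linear layers in total: input, 10 hidden with 50 neurons, output)
hidden = [m for _ in range(10) for m in (nn.Linear(50, 50), nn.ELU())]
dfnn = nn.Sequential(nn.Linear(P + 2, 50), nn.ELU(), *hidden, nn.Linear(50, q))

# decoder f^D_N: latent code in R^q -> approximate reduced residual in R^N
class Decoder(nn.Module):
    def __init__(self, q, N):
        super().__init__()
        self.N = N
        self.fc = nn.Linear(q, 64 * 2 * 2)   # dense layer, then reshape to 2x2 maps
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3), nn.ELU(),  # 2x2 -> 4x4
            nn.ConvTranspose2d(32, 1, kernel_size=1),             # linear last layer
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 64, 2, 2)
        x = self.deconv(x)                   # (batch, 1, sqrt(N), sqrt(N))
        return x.flatten(1)[:, : self.N]     # back to R^N (row-major)

decoder = Decoder(q, N)
R_tilde = decoder(dfnn(torch.randn(8, P + 2)))  # batch of 8 approximate residuals
```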

Remark 2

We point out that the input of the encoder function—that is, the reduced residual vector \( \textbf{R}_N\)—is reshaped into a square matrix by rewriting its elements in row-major order, thus obtaining \(\textbf{R}_N^{reshape}\in \mathbb {R}^{\sqrt{N}\times \sqrt{N}}\). If N is not a square, the input \(\textbf{R}_N\) is zero-padded as explained in [29], and the additional elements are subsequently discarded.

Regarding the prediction of the reduced residual for new, unseen instances of the inputs, computing \({\varvec{\rho }}_N({\varvec{\mu }}_{test},t^n,k)\) for any given \({\varvec{\mu }}_{test}\in \mathcal {P}\), and for any possible \(n=1,\ldots ,N_t\), \(k\ge 0\), corresponds to the testing stage of the DFNN and of the decoder function of the CAE. Thus, we discard the encoder function at testing time. The architecture used during the training stage is reported in Fig. 2; only the block highlighted in the inner (red) box is then used during the testing phase.

Fig. 2

DNN architecture for the approximation of the reduced residual vector. A similar architecture is considered to approximate the reduced Jacobian matrix

Let \(\textbf{M}\in \mathbb {R}^{(P+2)\times N_{train}}\), with \(N_{train}=n_s'N_tN_k\), be the parameter matrix of the input triples, i.e.

$$\begin{aligned} \textbf{M} = \left[ \left( {\varvec{\mu }}_\ell ,t^n,k\right) \right] _{\ell =1,\ldots ,n_s',n=1,\ldots ,N_t,k\ge 0}. \end{aligned}$$

The corresponding training dataset for \({\varvec{\rho }}_N\) is given by the reduced residual snapshots matrix \(\textbf{S}_{{\varvec{\rho }}}\in \mathbb {R}^{N\times N_{train}}\) defined as

$$\begin{aligned} \textbf{S}_{{\varvec{\rho }}} = \left[ \right.&\textbf{R}_N(\textbf{Vu}_N^{1,(1)}({\varvec{\mu }}_1),t^1;{\varvec{\mu }}_1)|\dots |\textbf{R}_N(\textbf{Vu}_N^{1,(N_{k_1})}({\varvec{\mu }}_1),t^1;{\varvec{\mu }}_1)|\\&\textbf{R}_N(\textbf{Vu}_N^{2,(1)}({\varvec{\mu }}_1),t^2;{\varvec{\mu }}_1)|\dots |\textbf{R}_N(\textbf{Vu}_N^{2,(N_{k_2})}({\varvec{\mu }}_1),t^2;{\varvec{\mu }}_1)|\\&\vdots \\&\textbf{R}_N(\textbf{Vu}_N^{N_t,(1)}({\varvec{\mu }}_{n_s'}),t^{N_t};{\varvec{\mu }}_{n_s'})|\dots |\textbf{R}_N(\textbf{Vu}_N^{N_t,(N_{k_{n_s'}})}({\varvec{\mu }}_{n_s'}),t^{N_t};{\varvec{\mu }}_{n_s'}) \left. \right] \\ =&\left[ \textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}_\ell ),t^n;{\varvec{\mu }}_\ell )\right] _{\ell =1,\ldots ,n_s',n=1,\ldots ,N_t,k\ge 0}, \end{aligned}$$

that is, by the matrix collecting column-wise ROM residuals computed for \(n_s'\) sampled parameters \({\varvec{\mu }}_\ell \in \mathcal {P}\), at different time instances \(t^1,\ldots ,t^{N_t}\) and for each Newton iteration \(k\ge 0\). The training stage consists in solving the following optimization problem in the weights variable \({\varvec{\theta }}= ({\varvec{\theta }}_{E},{\varvec{\theta }}_{DF},{\varvec{\theta }}_{D})\):

$$\begin{aligned} \mathcal {J}({\varvec{\theta }}) = \dfrac{1}{N_{train}}\sum _{\ell =1}^{n_s'}\sum _{n=1}^{N_t}\sum _{k=0}^{N_k}\mathcal {L}({\varvec{\mu }}_\ell ,t^n,k;{\varvec{\theta }}) \rightarrow \underset{{\varvec{\theta }}}{\min } \end{aligned}$$

where
$$\begin{aligned} \begin{aligned} \mathcal {L}({\varvec{\mu }}_\ell ,t^n,k;{\varvec{\theta }})&= \dfrac{1}{2} \Vert \textbf{R}_N(\textbf{V}\textbf{u}_N^{n,(k)}({\varvec{\mu }}_\ell ),t^n;{\varvec{\mu }}_\ell ) - \widetilde{\textbf{R}}_N({\varvec{\mu }}_\ell ,t^n,k;{\varvec{\theta }}_{DF},{\varvec{\theta }}_{D}) \Vert ^2 \\&\quad + \dfrac{1}{2} \Vert \widetilde{\textbf{R}}_q({\varvec{\mu }}_\ell ,t^n,k;{\varvec{\theta }}_{E}) - \textbf{R}_q({\varvec{\mu }}_\ell ,t^n,k;{\varvec{\theta }}_{DF}) \Vert ^2. \end{aligned} \end{aligned} \tag{7}$$

The loss function (7) combines the reconstruction error, i.e. the error between the ROM residual and the corresponding DNN approximation, and the error between the intrinsic coordinates and the output of the encoder. The training stage of the DNN involved in Deep-HyROMnet is detailed in Algorithm 3; in particular, we denote by \(\alpha \) the training-validation splitting fraction, by \(\eta \) the starting learning rate of the ADAM optimizer, by \(N_b\) the batch size, by \(n_b = (1-\alpha )N_{train}/N_b\) the number of minibatches and by \(N_e\) the maximum number of epochs. We recall that the total number of training samples is given by \(N_{train}= n_s'N_tN_k\). The testing stage of the DNN is detailed in Algorithm 4. As suggested by [23, 24], we set \(\alpha =8:2\), \(\eta =10^{-4}\), \(N_b=20\) and \(N_e=10^4\). To avoid overfitting, we employ the early-stopping regularization technique [29], that is, we stop the training if (and when) the loss evaluated on the validation set does not decrease over 500 epochs. Regarding the NN architecture, a 12-layer DFNN with 50 neurons per hidden layer and q neurons in the output layer is employed, where q is problem-dependent and set equal to the intrinsic dimension, i.e. \(q = P + 2\) (the time instant and the Newton iteration are considered as additional parameters); further details about the CAE architecture are summarized in Tables 1 and 2. In all cases, except for the last convolutional layer of the decoder, we consider the ELU nonlinear activation function [18], selected by assessing the impact of different activation functions on the validation loss.
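A condensed sketch of the training loop (Algorithm 3), with loss (7), the Adam optimizer and early stopping, could read as follows; `encoder`, `dfnn` and `decoder` are the networks introduced above, while `train_loader`, `val_loader` and the validation routine `eval_loss` are assumed to be set up accordingly.

```python
import torch

params = (list(encoder.parameters()) + list(dfnn.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)      # starting learning rate eta

best_val, patience = float("inf"), 0
for epoch in range(10_000):                  # maximum number of epochs N_e
    for M_b, R_b in train_loader:            # minibatches of size N_b = 20
        z_enc = encoder(R_b)                 # intrinsic coordinates (Remark 2
                                             # reshaping assumed inside encoder)
        z_dfnn = dfnn(M_b)                   # latent code from (mu, t^n, k)
        R_hat = decoder(z_dfnn)              # DNN residual approximation
        # loss (7): reconstruction error + encoder/DFNN mismatch
        loss = (0.5 * ((R_b - R_hat) ** 2).sum(1).mean()
                + 0.5 * ((z_enc - z_dfnn) ** 2).sum(1).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
    val = eval_loss(val_loader)              # hypothetical validation routine
    if val < best_val:
        best_val, patience = val, 0
    else:
        patience += 1
        if patience >= 500:                  # early stopping
            break
```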

Table 1 Attributes of convolutional and dense layers of the encoder function of the CAE
Table 2 Attributes of dense and transposed convolutional layers of the decoder function of the CAE
[Algorithm 3: training stage of the DNN]
[Algorithm 4: testing stage of the DNN]

Remark 3

Differently from the min-max scaling used in [23, 24], we standardize the input and output of the DNN as follows. After splitting the data into training and validation sets according to a user-defined training-validation splitting fraction \(\alpha \), \(\textbf{M} = \left[ \textbf{M}^{train},\textbf{M}^{val}\right] \) and \(\textbf{S}_{{\varvec{\rho }}} = \left[ \textbf{S}_{{\varvec{\rho }}}^{train},\textbf{S}_{{\varvec{\rho }}}^{val}\right] \), we define, for each row of the training set, the corresponding mean and standard deviation

$$\begin{aligned} M_{mean}^i = \dfrac{1}{N_{train}} \sum _{j=1}^{N_{train}} M_{ij}^{train}, \quad M_{sd}^i = \sqrt{\dfrac{1}{N_{train}-1} \sum _{j=1}^{N_{train}} (M_{ij}^{train}-M_{mean}^i)^2}, \end{aligned}$$

so that parameters are normalized by applying the following transformation

$$\begin{aligned} M_{ij}^{train} \mapsto \dfrac{M_{ij}^{train}-M_{mean}^i}{M_{sd}^i}, \quad i=1,\ldots ,P+2, \quad j=1,\ldots ,N_{train}, \end{aligned} \tag{8}$$

that is, each feature of the training parameter matrix is standardized. The same procedure is applied to the training snapshots matrix \(\textbf{S}_{{\varvec{\rho }}}^{train}\) by replacing \(M^i_{*}\) with \(S^i_{*}\), for \(*\in \{mean,sd\}\). Transformation (8) is applied to the validation and testing sets as well, but using the mean and the standard deviation computed over the training set. In order to rescale the reconstructed solution to the original values, we apply the inverse transformation.
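A minimal NumPy sketch of this standardization follows; rows are features and columns are samples, as for \(\textbf{M}\) and \(\textbf{S}_{{\varvec{\rho }}}\), and `M_train`, `M_val` denote the assumed data splits.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and (unbiased) standard deviation from training data."""
    mean = X_train.mean(axis=1, keepdims=True)
    sd = X_train.std(axis=1, ddof=1, keepdims=True)
    return mean, sd

# training statistics are reused for validation/testing data, as in Remark 3
M_mean, M_sd = fit_standardizer(M_train)
M_train_s = (M_train - M_mean) / M_sd   # transformation (8)
M_val_s = (M_val - M_mean) / M_sd
# inverse transformation: X = X_s * M_sd + M_mean
```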

4 Numerical Results

In this section we assess the performance of the proposed Deep-HyROMnet strategy on different applications related to parametrized nonlinear time-dependent PDE problems, focusing on structural mechanics. In particular, we consider (1) a series of structural tests on a rectangular beam, with different loading conditions and a simple nonlinear constitutive law, and then (2) a test case on an idealized left ventricle geometry, simulating cardiac contraction.

4.1 Nonlinear Elastodynamics

Let us consider a continuum body \(\mathcal {B}\) embedded in a three-dimensional Euclidean space. For a given parameter vector \({\varvec{\mu }}\in \mathcal {P}\), the displacement vector field

$$\begin{aligned} \textbf{u}(\textbf{X},t;{\varvec{\mu }}) = \chi (\textbf{X},t;{\varvec{\mu }}) - \textbf{X} \end{aligned}$$

relates the material position \(\textbf{X}\) of a particle in the reference configuration \(\varOmega _0\), which we assume to coincide with the initial configuration, to its spatial position \(\textbf{x}\) in the current configuration \(\varOmega _t\) at time \(t>0\), where \(\chi (\textbf{X},t;{\varvec{\mu }}) = \textbf{x}\) denotes the motion of the body. The corresponding deformation gradient

$$\begin{aligned} \textbf{F}(\textbf{X},t;{\varvec{\mu }}) = \frac{\partial \chi (\textbf{X},t;{\varvec{\mu }})}{\partial \textbf{X}} = \textbf{I} + \nabla _0 \textbf{u}(\textbf{X},t;{\varvec{\mu }}), \end{aligned}$$

characterizes the change of material elements during motion, so that the change in volume between the reference and the current configurations at time \(t>0\) is given by \(J(\textbf{X},t;{\varvec{\mu }}) = \det \textbf{F}(\textbf{X},t;{\varvec{\mu }})>0\). The equation of motion for a continuous medium reads as follows:

$$\begin{aligned} \rho _0\partial _t^2\textbf{u}(\textbf{X},t;{\varvec{\mu }}) - \nabla _0\cdot \textbf{P}(\textbf{F}(\textbf{X},t;{\varvec{\mu }})) = \textbf{b}_0(\textbf{X},t;{\varvec{\mu }}), \qquad \textbf{X}\in \varOmega _0,~t>0 \end{aligned}$$

where \(\rho _0\) is the density of the body, \(\mathbf {P(F)}\) is the first Piola-Kirchhoff stress tensor and \(\textbf{b}_0\) is an external body force. Proper boundary and initial conditions must be specified to ensure the well-posedness of the problem. In addition, we need a constitutive equation for \(\textbf{P}\), that is, a stress-strain relationship describing the material behavior. Here, we consider hyperelastic materials, for which the existence of a strain-energy density function \(\mathcal {W}:Lin^+\rightarrow \mathbb {R}\) such that

$$\begin{aligned} \mathbf {P(F)} = \frac{\partial \mathcal {W}(\textbf{F})}{\partial \textbf{F}} \end{aligned}$$

is postulated. Note that, since \(\textbf{F}\) depends on the displacement \(\textbf{u}\), we can equivalently write \(\mathbf {P(F)}\) or \(\textbf{P}(\textbf{u})\).

The strong formulation of a general initial boundary-valued problem in elastodynamics thus reads as follows: given a body force \(\textbf{b}_0 = \textbf{b}_0(\textbf{X},t;{\varvec{\mu }})\), a prescribed displacement \(\bar{\textbf{u}} = \bar{\textbf{u}}(\textbf{X},t;{\varvec{\mu }})\) and surface traction \(\bar{\textbf{T}} = \bar{\textbf{T}}(\textbf{X},t,\textbf{N};{\varvec{\mu }})\), find the unknown displacement field \(\textbf{u}({\varvec{\mu }}):\varOmega _0\times (0,T]\rightarrow \mathbb {R}^3\) so that

$$\begin{aligned} \left\{ \begin{array}{lllr} \rho _0\partial ^2_t {\textbf{u}}(\textbf{X},t;{\varvec{\mu }}) - \nabla _0\cdot \textbf{P}(\textbf{u}(\textbf{X},t;{\varvec{\mu }})) = \textbf{b}_0(\textbf{X},t;{\varvec{\mu }}) &{}&{} \text {in } &{} \varOmega _0\times (0,T]\\ \textbf{u}(\textbf{X},t;{\varvec{\mu }}) = \bar{\textbf{u}}(\textbf{X},t;{\varvec{\mu }}) &{}&{} \text {on } &{} \Gamma _0^{D}\times (0,T]\\ \textbf{P}(\textbf{u}(\textbf{X},t;{\varvec{\mu }}))\textbf{N} = \bar{\textbf{T}}(\textbf{X},t,\textbf{N};{\varvec{\mu }}) &{}&{} \text {on } &{} \Gamma _0^{N}\times (0,T]\\ \textbf{P}(\textbf{u}(\textbf{X},t;{\varvec{\mu }}))\textbf{N} + \alpha \textbf{u}(\textbf{X},t;{\varvec{\mu }}) + \beta \partial _t\textbf{u}(\textbf{X},t;{\varvec{\mu }}) = \textbf{0} &{}&{} \text {on } &{} \Gamma _0^{R}\times (0,T]\\ \textbf{u}(\textbf{X},0;{\varvec{\mu }}) = \textbf{u}_0(\textbf{X};{\varvec{\mu }}),~~ \partial _t{\textbf{u}}(\textbf{X},0;{\varvec{\mu }}) = \dot{\textbf{u}}_0(\textbf{X};{\varvec{\mu }}) &{}&{} \text {in } &{} \varOmega _0\times \{0\} \end{array} \right. \end{aligned}$$

where \(\textbf{N}\) is the outward unit normal vector and \(\alpha ,\beta \in \mathbb {R}\). The boundary of the reference domain is partitioned such that \(\Gamma _0^D\cup \Gamma _0^N\cup \Gamma _0^R = \Gamma \), with \(\Gamma _0^i\cap \Gamma _0^j=\emptyset \) for \(i,j\in \{D,N,R\}\), \(i\ne j\). This equation is inherently nonlinear, and an additional source of nonlinearity is introduced by the material law, i.e. when using a nonlinear strain-energy density function \(\mathcal {W} =\mathcal {W}(\textbf{F})\), which is often the case in engineering applications. For the sake of simplicity, in all test cases we neglect the body forces \(\textbf{b}_0({\varvec{\mu }})\) and consider zero initial conditions \(\textbf{u}_0({\varvec{\mu }}) = \dot{\textbf{u}}_0({\varvec{\mu }}) = \textbf{0}\). Regarding boundary conditions, we consider \(\bar{\textbf{u}}({\varvec{\mu }}) = \textbf{0}\) on the Dirichlet boundary \(\Gamma _0^D\) and always assume \(\alpha =\beta =0\), so that we actually impose homogeneous Neumann conditions on \(\Gamma _0^R\). Finally, the traction vector is given by

$$\begin{aligned} \bar{\textbf{T}}(\textbf{X},t,\textbf{N};{\varvec{\mu }}) = -\textbf{g}(t;{\varvec{\mu }})J\textbf{F}^{-T}\textbf{N}, \end{aligned}$$

where \(\textbf{g}(t;{\varvec{\mu }})\) represents an external load and will be specified according to the application at hand. The residual in (2) is then given by

$$\begin{aligned} \begin{aligned} \textbf{R}(\textbf{u}_h^{n}({\varvec{\mu }}), t^{n};{\varvec{\mu }})&:= \left( \dfrac{\rho _0}{\Delta t^2}\mathcal {M} + \dfrac{1}{\Delta t}\mathcal {F}_{\beta }^{int} + \mathcal {F}_{\alpha }^{int}\right) \textbf{u}_h^{n}({\varvec{\mu }}) + \mathcal {S}(\textbf{u}_h^{n}({\varvec{\mu }})) \\ {}&\quad - \left( \dfrac{2\rho _0}{\Delta t^2}\mathcal {M} + \dfrac{1}{\Delta t}\mathcal {F}_{\beta }^{int}\right) \textbf{u}_h^{n-1}({\varvec{\mu }}) + \dfrac{\rho _0}{\Delta t^2}\mathcal {M}\textbf{u}_h^{n-2}({\varvec{\mu }}) - \mathcal {F}^{ext,n}({\varvec{\mu }}), \end{aligned} \end{aligned}$$

for \(n=1,\ldots ,N_t\), where \(\textbf{u}_h^0({\varvec{\mu }})\) and \(\textbf{u}_h^{-1}({\varvec{\mu }})\) are known from the initial conditions, and

$$\begin{aligned}&[\mathcal {M}]_{ij} = \int _{\varOmega _0} {\varvec{\varphi }}_j\cdot {\varvec{\varphi }}_id\varOmega , \\&[\mathcal {F}_{\beta }^{int}]_{ij} = \int _{\Gamma _0^R} \beta ~{\varvec{\varphi }}_j\cdot {\varvec{\varphi }}_i d\Gamma , \qquad [\mathcal {F}_{\alpha }^{int}]_{ij} = \int _{\Gamma _0^R} \alpha ~{\varvec{\varphi }}_j\cdot {\varvec{\varphi }}_i d\Gamma ,\\&[\mathcal {S}(\textbf{u}_h^n({\varvec{\mu }}))]_{i} = \int _{\varOmega _0} \textbf{P}(\textbf{u}_h^n({\varvec{\mu }})):\nabla {{\varvec{\varphi }}_i}d\varOmega , \\&[\mathcal {F}^{ext,n}({\varvec{\mu }})]_{i} = \int _{\Gamma _0^N} \bar{\textbf{T}}^n(\textbf{N};{\varvec{\mu }})\cdot {{\varvec{\varphi }}_i}d\Gamma + \int _{\varOmega _0} \textbf{b}_0^n({\varvec{\mu }})\cdot {{\varvec{\varphi }}_i}d\varOmega , \end{aligned}$$

for all \(i,j=1,\ldots ,N_h\), where \(\{{\varvec{\varphi }}_i\}_{i=1}^{N_h}\) is a basis for the FE space.

As a measure of accuracy of the reduced approximations with respect to the FOM solution, we consider time-averaged \(L^2\)-errors of the displacement vector, that are defined as follows:

$$\begin{aligned} \begin{aligned} \epsilon _{abs}({\varvec{\mu }})&= \frac{1}{N_t}\sum _{n=1}^{N_t} \Vert \textbf{u}_h(\cdot ,t^n;{\varvec{\mu }}) - \textbf{Vu}_N(\cdot ,t^n;{\varvec{\mu }})\Vert _{2}, \\ \epsilon _{rel}({\varvec{\mu }})&= \frac{1}{N_t}\sum _{n=1}^{N_t} \frac{\Vert \textbf{u}_h(\cdot ,t^n;{\varvec{\mu }}) - \textbf{Vu}_N(\cdot ,t^n;{\varvec{\mu }})\Vert _{2}}{\Vert \textbf{u}_h(\cdot ,t^n;{\varvec{\mu }})\Vert _{2}}. \end{aligned} \end{aligned}$$
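In code, these metrics read as follows (a sketch, assuming `U_h` and `U_rom` collect \(\textbf{u}_h^n\) and \(\textbf{Vu}_N^n\) column-wise for \(n=1,\ldots ,N_t\)):

```python
import numpy as np

def time_averaged_errors(U_h, U_rom):
    """Time-averaged absolute/relative L2 errors; arrays are N_h x N_t."""
    diff = np.linalg.norm(U_h - U_rom, axis=0)   # ||u_h^n - V u_N^n||_2 per step
    eps_abs = diff.mean()
    eps_rel = (diff / np.linalg.norm(U_h, axis=0)).mean()
    return eps_abs, eps_rel
```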

The CPU time ratio, that is the ratio between FOM and ROM computational times, is used to measure efficiency, since it represents the speed-up offered by the ROM with respect to the FOM. The code is implemented in Python in our software package pyfe\(^\text {x}\), a Python binding to the in-house Finite Element library life\(^\texttt {x}\) (https://lifex.gitlab.io/lifex), a high-performance C++ library based on the deal.II (https://www.dealii.org) Finite Element core [3]. Computations have been performed on a desktop computer with a 3.70GHz Intel Core i5-9600K CPU and 16GB RAM, except for the training of the DNNs and the prediction of the reduced nonlinear operators, which have been carried out on a Tesla V100 32GB GPU.

Furthermore, we introduce the following measures to evaluate the performance of the DNNs during the training process:

$$\begin{aligned} \begin{aligned}&\epsilon _{rel,\textbf{R}_N}=\frac{1}{n_s' N_t} \sum _{\ell =1}^{n_s'}\sum _{n=1}^{N_t} \textbf{E}(\textbf{R}_N,\widetilde{\textbf{R}}_N; {\varvec{\mu }}_\ell , t^n),\\&\epsilon _{rel,\textbf{J}_N}=\frac{1}{n_s' N_t} \sum _{\ell =1}^{n_s'}\sum _{n=1}^{N_t} \textbf{E}(vec(\textbf{J}_N),vec(\widetilde{\textbf{J}}_N); {\varvec{\mu }}_\ell , t^n), \end{aligned} \end{aligned} \tag{11}$$

for residual vectors and Jacobian matrices, respectively, where

$$\begin{aligned} \textbf{E}(\textbf{w},\widetilde{\textbf{w}};{\varvec{\mu }}_\ell , t^n) = \left( \frac{\sqrt{ \sum _{k=1}^{N_{k_\ell }} \Vert \textbf{w}(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}_\ell ),t^n;{\varvec{\mu }}_\ell ) - \widetilde{\textbf{w}}({\varvec{\mu }}_\ell ,t^n,k;\varvec{\theta }_{DF},\varvec{\theta }_{D})\Vert ^2}}{\sqrt{ \sum _{k=1}^{N_{k_\ell }} \Vert \textbf{w}(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}_\ell ),t^n;{\varvec{\mu }}_\ell )\Vert ^2}} \right) . \end{aligned}$$

4.2 Deformation of a Clamped Rectangular Beam

The first series of test cases represents a typical structural mechanical problem, with reference geometry \(\bar{\varOmega }_0 = [0,10^{-2}]\times [0,10^{-3}]\times [0,10^{-3}]\) m\(^3\), reported in Fig. 3.

Fig. 3

Rectangular beam geometry (left) and computational grid (right)

We consider a nearly-incompressible neo-Hookean material, characterized by the following strain energy density function

$$\begin{aligned} \mathcal {W}(\textbf{F}) = \frac{G}{2}(\mathcal {I}_1 - 3) + \frac{K}{4}( (J-1)^2 + \ln ^2(J) ), \end{aligned}$$

where \(G>0\) is the shear modulus, \(\mathcal {I}_1 = J^{-\frac{2}{3}}\text {tr}(\textbf{C})\) is the isochoric first invariant of the right Cauchy-Green tensor \(\textbf{C} = \textbf{F}^T\textbf{F}\), and the latter term is needed to enforce incompressibility, with the bulk modulus \(K>0\) acting as a penalization factor. This choice leads to the following first Piola-Kirchhoff stress tensor, characterized by a nonpolynomial nonlinearity,

$$\begin{aligned} \textbf{P}(\textbf{F}) = GJ^{-\frac{2}{3}}\left( \textbf{F} - \frac{1}{3}\,\text {tr}(\textbf{C})\,\textbf{F}^{-T}\right) + \frac{K}{2}J\left( J-1+\frac{1}{J}\ln (J)\right) \textbf{F}^{-T}. \end{aligned}$$
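For a single quadrature point, \(\textbf{P}(\textbf{F})\) can be evaluated directly from the definitions above, as in the following sketch (the isochoric term is written with \(\text {tr}(\textbf{C})\), consistently with the derivative of \(\mathcal {W}\)):

```python
import numpy as np

def first_piola_neo_hookean(F, G, K):
    """First Piola-Kirchhoff stress of the nearly-incompressible
    neo-Hookean law, obtained by differentiating W(F) given above."""
    J = np.linalg.det(F)
    C = F.T @ F                      # right Cauchy-Green tensor
    Finv_T = np.linalg.inv(F).T
    # isochoric part: d/dF [ G/2 (J^{-2/3} tr(C) - 3) ]
    P_iso = G * J ** (-2.0 / 3.0) * (F - np.trace(C) / 3.0 * Finv_T)
    # volumetric (penalization) part: d/dF [ K/4 ((J-1)^2 + ln^2 J) ]
    P_vol = 0.5 * K * J * (J - 1.0 + np.log(J) / J) * Finv_T
    return P_iso + P_vol
```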

The beam is clamped at the left-hand side, that is, Dirichlet boundary conditions are imposed on the left face \(x=0\), whilst a pressure load changing with the deformed surface orientation is applied to the entire bottom face \(z=0\) (i.e. \(\Gamma _0^N\)). Homogeneous Neumann conditions are applied on the remaining boundaries (i.e. \(\Gamma _0^R\) with \(\alpha =\beta =0\)). As possible functions for the external load \(\textbf{g}(t;{\varvec{\mu }})\), we choose the following (see the code sketch after this list):

  1.

    A linear function \(\textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~t/T\);

  2.

    A triangular or hat function \(\textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~\left( \frac{2t}{T}~\chi _{\left( 0,\frac{T}{2}\right] }(t) + \frac{2(T-t)}{T}~\chi _{\left( \frac{T}{2},T\right] }(t)\right) \);

  3.

    A step function \(\textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~\chi _{\left( 0,\frac{T}{3}\right] }(t)\), so that the presence of the inertial term is not negligible.

Here, \(\widetilde{p}>0\) is a parameter controlling the maximum load. The FOM is built on a hexahedral mesh with 640 elements and 1025 vertices, resulting in a high-fidelity dimension \(N_h=3075\) (since \(\mathbb {Q}_1\)-FE are employed). The resulting computational mesh in the reference configuration is reported in Fig. 3.
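The three loading histories can be sketched as follows; here the hat function is normalized so that the load peaks at \(\widetilde{p}\) for \(t=T/2\).

```python
def load_linear(t, p_max, T):   # test case 1: progressive loading
    return p_max * t / T

def load_hat(t, p_max, T):      # test case 2: peaks at p_max for t = T/2
    return p_max * (2 * t / T if t <= T / 2 else 2 * (T - t) / T)

def load_step(t, p_max, T):     # test case 3: inertial term not negligible
    return p_max if 0 < t <= T / 3 else 0.0
```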

The following sections are organized as follows: first, we analyze the accuracy and the efficiency of the POD-Galerkin ROM without hyper-reduction with respect to the POD tolerance \(\varepsilon _{POD}\), thus resulting in reduced subspaces of different dimensions \(N\in \mathbb {N}\). Then, for a fixed basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\), POD-Galerkin-DEIM approximation capabilities are investigated for different sizes of the reduced mesh, associated with different values of the DEIM tolerance \(\varepsilon _{DEIM}\) for the computation of the residual basis \({\varvec{\Phi }}_{\mathcal {R}}\in \mathbb {R}^{N_h\times m}\). Finally, the performances of Deep-HyROMnet are assessed and compared to those of DEIM-based hyper-ROMs.

4.2.1 Test Case 1: Linear Function for the Pressure Load

Let us consider the parametrized linear function

$$\begin{aligned} \textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~t/T, \end{aligned}$$

for the pressure load, describing a situation in which a structure is progressively loaded. We choose a time interval \(t\in [0,0.25]\) s and employ a uniform time step \(\Delta t = 5\times 10^{-3}\) s for the time discretization scheme, resulting in a total number of 50 time iterations. As parameters, we consider:

  • The shear modulus \(G\in [0.5\times 10^4,1.5\times 10^4]\) Pa;

  • The bulk modulus \(K\in [2.5\times 10^4,7.5\times 10^4]\) Pa;

  • The external load parameter \(\widetilde{p}\in [2,6]\) Pa.

Given a training set of \(n_s=50\) points generated from the three-dimensional parameter space \(\mathcal {P}\) through Latin hypercube sampling (LHS), we compute the reduced basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) using the POD method with tolerance

$$\begin{aligned} \varepsilon _{POD}\in \{10^{-3}, 5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}. \end{aligned}$$

The corresponding reduced dimensions are \(N=3\), 4, 5, 6, 8, 9 and 15, respectively. In Fig. 4 we show the singular values of the snapshot matrix related to the FOM displacement \(\textbf{u}_h\), where a rapid decay of the plotted quantity means that a small number of RB functions is needed to correctly approximate the FOM solution.

Fig. 4

Test case 1. Decay of the singular values of the FOM solution snapshots matrix

The average relative error \(\epsilon _{rel}\) between the FOM and the POD-Galerkin ROM solutions computed over a testing set of 50 randomly chosen parameters, different from the ones used to compute the solution snapshots, is reported in Fig. 5, together with the CPU time ratio. The approximation error decreases up to an order of magnitude when reducing the POD tolerance \(\varepsilon _{POD}\) from \(10^{-3}\) to \(10^{-6}\), corresponding to an increase of the RB dimension from \(N=3\) to \(N=15\). Despite the low dimension of the POD space, the computational speed-up achieved by the reduced model is negligible. This is due to the fact that the ROM still depends on the FOM dimension \(N_h\) during the online stage. For this reason, we need to rely on suitable hyper-reduction techniques.

Fig. 5

Test case 1. Average over 50 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of ROM without hyper-reduction

For the construction of the hyper-reduced models, i.e. POD-Galerkin-DEIM ROMs and Deep-HyROMnet, we first need to compute snapshots from the ROM solutions for given parameter values and time instants, in order to build the DEIM residual basis \({\varvec{\Phi }}_\mathcal {R}\) or to train the DNNs, respectively. To this end, we choose a POD-Galerkin ROM of dimension \(N=4\), yielding a good balance between accuracy and computational effort for the test case at hand, and perform ROM simulations for a given set of \(n_s'=200\) parameter samples to collect residual and Jacobian data. To investigate the impact of hyper-reduction on the ROM solution error, we compute the DEIM basis \({\varvec{\Phi }}_{\mathcal {R}}\) for approximating the residual using different DEIM tolerances,

$$\begin{aligned} \varepsilon _{DEIM}\in \{10^{-3}, 5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}, \end{aligned}$$

corresponding to \(m=22\), 25, 30, 33, 39, 43 and 51 DEIM basis functions, respectively. Larger tolerances were not sufficient to ensure the convergence of the Newton method for all considered combinations of parameters, so that higher speed-ups cannot be achieved by decreasing the basis dimension m. The average relative error \(\epsilon _{rel}\) is evaluated over the testing set and plotted in Fig. 6, together with the CPU time ratio. Computing a high-fidelity solution requires 26 s on average, while a POD-Galerkin-DEIM ROM with \(N=4\) and \(m=22\) requires only 2.4 s, thus yielding a speed-up of about 11 with respect to the FOM.
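To make the role of the reduced mesh concrete, the classical DEIM greedy selection of interpolation ("magic") points from the columns of \({\varvec{\Phi }}_{\mathcal {R}}\) can be sketched as follows [15]; this is a generic illustration of the algorithm, not the paper's implementation.

```python
import numpy as np

def deim_indices(Phi):
    """Greedy selection of the m DEIM interpolation indices from the
    m columns of the residual basis Phi (Chaturantabut & Sorensen)."""
    m = Phi.shape[1]
    idx = [int(np.argmax(np.abs(Phi[:, 0])))]
    for j in range(1, m):
        # interpolate the j-th basis vector at the current points ...
        c = np.linalg.solve(Phi[idx, :j], Phi[idx, j])
        # ... and add the point where the interpolation error is largest
        r = Phi[:, j] - Phi[:, :j] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)
```

At the online stage, only the residual entries selected by these indices must be assembled, which is precisely what restricts the computation to the reduced mesh.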

Fig. 6 Test case 1. Average over 50 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of POD-Galerkin-DEIM with \(N=4\)

Data related to the performance of the POD-Galerkin-DEIM ROMs for \(N=4\) and different values of m are shown in Table 3. The elements of the reduced mesh represent a small percentage of those forming the original grid, so that the cost of assembling the residual is remarkably alleviated. Nonetheless, the main computational bottleneck is clearly the construction of the reduced system at each Newton iteration, which requires between \(78\%\) and \(88\%\) of the total (online) CPU time. Almost \(90\%\) of this time is spent assembling the residual \(\textbf{R}(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}),t^{n};{\varvec{\mu }})\) on the reduced mesh, while computing the associated Jacobian matrix using the AD tool takes less than \(1\%\).

Table 3 Test case 1. Computational data related to POD-Galerkin-DEIM with \(N=4\) and different values of m

The performance of Deep-HyROMnet is first analyzed by considering different sizes of the training set, and then compared, in terms of both accuracy and efficiency, with that of different POD-Galerkin-DEIM ROMs.

From the evolution of the \(L^2(\varOmega _0)\)-absolute error displayed in Fig. 7 (left), averaged over 50 testing parameter values, we observe that relying on a training set of \(n_s'=200\) ROM simulations results in an approximation accuracy up to two orders of magnitude higher than the one obtained for \(n_s'=100\). On the other hand, further increasing the size of the dataset, e.g. to \(n_s'=300\), does not improve the results, since the NN architecture is kept fixed and the additional information becomes redundant. For this reason, we focus on the results obtained with \(n_s'=200\), corresponding to the same training parameters used for the construction of the DEIM bases.

Fig. 7 Test case 1. Evolution in time of the average \(L^2(\varOmega _0)\)-absolute error for \(N=4\) computed using Deep-HyROMnet for different sizes of the training set (left) and that obtained using POD-Galerkin-DEIM and Deep-HyROMnet (for \(n_s'=200\)) hyper-ROMs (right)

The averages of the absolute error \(\epsilon _{abs}\), the relative error \(\epsilon _{rel}\) and the CPU time ratio are reported in Table 4. Moreover, in Table 5 we report the relative errors (11) for the approximation of the reduced nonlinear arrays by means of the DNNs, computed on both training and testing sets, showing that the approximation of the ROM residual is the most challenging task due to the higher variability of the data. In terms of efficiency, Deep-HyROMnet substantially outperforms DEIM-based hyper-ROMs, being almost 100 times faster than the POD-Galerkin-DEIM ROM exploiting \(m=22\) DEIM basis functions, whilst achieving the same accuracy for \(n_s'=200\). In particular, our Deep-HyROMnet approach allows us to compute the reduced solutions in less than 0.02 s, thus yielding an overall speed-up of order \(\mathcal {O}(10^3)\) compared to the FOM.
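The source of this speed-up is that the online stage reduces to a Newton loop posed entirely at the reduced level, with the \(N\)-dimensional residual and the \(N\times N\) Jacobian returned by the trained networks. A schematic sketch follows, where `dnn_res` and `dnn_jac` are hypothetical handles to the trained surrogates; for illustration they are assumed to take the current reduced state, the time and the parameters as inputs, although the networks' exact inputs may differ from this.

```python
import numpy as np

def deep_hyromnet_step(u_prev, t, mu, dnn_res, dnn_jac,
                       tol=1e-6, max_iter=20):
    """Advance one time step with Newton iterations at the reduced level:
    residual and Jacobian are N- and (N x N)-dimensional arrays predicted
    by the DNNs, so no FOM-sized quantity is ever assembled online."""
    u = u_prev.copy()
    for _ in range(max_iter):
        R_N = dnn_res(u, t, mu)             # surrogate reduced residual
        if np.linalg.norm(R_N) < tol:
            break
        J_N = dnn_jac(u, t, mu)             # surrogate reduced Jacobian
        u = u + np.linalg.solve(J_N, -R_N)  # solve the N x N system
    return u
```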

Table 4 Test case 1. Computational data related to POD-Galerkin-DEIM ROMs and Deep-HyROMnet, for \(N=4\)
Table 5 Test case 1. Accuracy of the DNNs predictions on training and test sets

The evolution of the \(L^2(\varOmega _0)\)-absolute error, averaged over the testing parameters, is reported in Fig. 7 (right) for both reduction approaches. The final accuracy of the hyper-ROMs equals that of the ROM without hyper-reduction, i.e. \(\epsilon _{rel}\approx 10^{-2}\), meaning that the projection error dominates over the nonlinear operator approximation error. The difference between the FOM and Deep-HyROMnet solutions at time \(T=0.25\) s is shown in Fig. 8 in two scenarios.

Fig. 8 Test case 1. FOM (wireframe) and Deep-HyROMnet (colored) solutions at time \(T=0.25\) s for \({\varvec{\mu }}= [1.3225\times 10^4~\text {Pa}, 3.9875\times 10^4~\text {Pa}, 3.43~\text {Pa}]\) (left) and \({\varvec{\mu }}= [0.6625\times 10^4~\text {Pa}, 5.8625\times 10^4~\text {Pa}, 4.89~\text {Pa}]\) (right) (Color figure online)

Table 6 collects the offline CPU time required for building the POD basis and collecting the ROM nonlinear operators, as well as the offline GPU time needed for training the DNNs. Moreover, we report the total offline computational time—assuming that the training stages are run sequentially—and the break-even point, that is, the number of online FOM simulations that makes the construction of the ROM affordable. We recall that FOM and POD-Galerkin ROM simulations are run serially on a desktop PC with a 3.70 GHz Intel Core i5-9600K CPU and 16 GB RAM, whereas the DNNs are trained on a Tesla V100 32 GB GPU.
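Although not stated explicitly here, a natural way to quantify the break-even point is

$$\begin{aligned} n_{BE} = \left\lceil \frac{T_{\text {offline}}}{t_{\text {FOM}} - t_{\text {ROM}}} \right\rceil , \end{aligned}$$

where \(t_{\text {FOM}}\) and \(t_{\text {ROM}}\) denote the average online times of a single FOM and ROM simulation, respectively. For instance, with \(t_{\text {FOM}}\approx 26\) s and \(t_{\text {ROM}}<0.02\) s as reported above, a purely illustrative offline cost of one hour would be amortized after \(\lceil 3600/25.98\rceil = 139\) FOM simulations.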

Table 6 Test case 1. Offline time required for the construction of the ROMs (for \(N=4\)) and the equivalence in terms of FOM simulations

In order to improve the accuracy of the reduced solution, we should consider higher values of the RB dimension N. Nonetheless, by increasing the number of POD basis functions, and thus the size of the reduced nonlinear data, the task of the DNNs becomes more complex. Indeed, from the results reported in Fig. 9, we observe that more training samples and larger neural networks are required as N increases, in order to achieve results comparable to those obtained for smaller POD dimensions. In particular, we have included an additional hidden layer in both the encoder and the decoder of the CAE, and considered a larger number of neurons in the dense layers. To conclude, we report in Table 7 the computational data associated with POD-Galerkin-DEIM and Deep-HyROMnet hyper-ROMs when \(N=8\), but with the same number of training snapshots and the same DNN architectures as in the previous case (i.e. \(N=4\)). As expected, the POD-Galerkin-DEIM method is able to provide more accurate approximations of the FOM solution by increasing the size m of the residual basis, albeit reducing the online speed-up substantially.

Fig. 9 Test case 1. Evolution in time of the average \(L^2(\varOmega _0)\)-absolute error computed using Deep-HyROMnet for different RB dimensions N (left) and different features of the DNNs, i.e. amount of training data and architecture size (right)

Table 7 Test case 1. Computational data related to POD-Galerkin-DEIM ROMs and Deep-HyROMnet, for \(N=8\)

4.2.2 Test Case 2: Hat Function for the Pressure Load

Let us now consider a piecewise linear pressure load given by

$$\begin{aligned} \textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~\left( 2\frac{t}{T}~\chi _{\left( 0,\frac{T}{2}\right] }(t) + 2\frac{T-t}{T}~\chi _{\left( \frac{T}{2},T\right] }(t)\right) , \end{aligned}$$

describing the case in which a structure is increasingly loaded until a maximum pressure is reached, and then linearly unloaded in order to recover the initial resting state. For the case at hand, we choose \(t\in [0,0.35]\) s and \(\Delta t = 5\times 10^{-3}\) s, resulting in a total number of 70 time steps. As the only parameter, we consider the external load \(\widetilde{p}\in [2,12]\) Pa; the shear modulus G and the bulk modulus K are fixed to \(10^4\) Pa and \(5\times 10^4\) Pa, respectively. We consider a training set of \(n_s=50\) points generated from \(\mathcal {P}=[2,12]\) Pa through LHS and build the RB basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) with \(N=4\), corresponding to \(\varepsilon _{POD}=10^{-4}\). The singular values of the solution snapshots matrix are reported in Fig. 10.

Fig. 10 Test case 2. Decay of the singular values of the FOM solution snapshots matrix

Given the Galerkin-ROM nonlinear data collected for \(n_s'=300\) sampled parameters, DEIM residual bases \(\mathbf {{\varvec{\Phi }}}_{\mathcal {R}}\) are computed for different tolerances

$$\begin{aligned} \varepsilon _{DEIM}\in \{10^{-3}, 5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}, \end{aligned}$$

corresponding to \(m=14\), 16, 22, 24, 29, 31, 37 DEIM basis functions, respectively. Tolerances \(\varepsilon _{DEIM}\) larger than the values reported above were not sufficient to ensure convergence of the Newton method for all the considered parameters.

Fig. 11 Test case 2. Average over 50 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of POD-Galerkin-DEIM with \(N=4\)

The relative error \(\epsilon _{rel}\), evaluated over a testing set of 50 parameters, is about \(10^{-2}\) when using \(m=14\) residual basis functions, and can be further reduced by one order of magnitude by increasing the DEIM dimension to \(m=29\), albeit at the cost of a substantially lower CPU time ratio, as shown in Fig. 11.

Table 8 compares the POD-Galerkin-DEIM (with \(m=14\) and \(m=29\)) and the Deep-HyROMnet hyper-reduced models on a testing set of 50 parameter instances. As observed in the previous test case, Deep-HyROMnet achieves good accuracy, comparable with that of the fastest POD-Galerkin-DEIM ROM (\(m=14\)), at a greatly reduced cost. Also in this case, the speed-up achieved by our Deep-HyROMnet is of order \(\mathcal {O}(10^3)\) with respect to the FOM, since less than 0.04 s is needed to compute the reduced solution for each new parameter instance, against about 40 s required by the FOM and (at least) 3 s required by the POD-Galerkin-DEIM ROMs.

Table 8 Test case 2. Computational data related to POD-Galerkin-DEIM ROMs and Deep-HyROMnet, for \(N=4\)
Fig. 12 Test case 2. Evolution in time of the average \(L^2(\varOmega _0)\)-absolute error for \(N=4\) computed using POD-Galerkin-DEIM and Deep-HyROMnet

The evolution in time of the average \(L^2(\varOmega _0)\)-absolute error for the POD-Galerkin-DEIM and the Deep-HyROMnet models is shown in Fig. 12. The accuracy obtained using Deep-HyROMnet, although slightly lower than that achieved using a DEIM-based approximation, is satisfactory in all the considered scenarios. Figure 13 shows the FOM and the Deep-HyROMnet displacements at different time instants obtained for a given testing parameter.

Fig. 13 Test case 2. FOM (wireframe) and Deep-HyROMnet (colored) solutions computed at different times for \({\varvec{\mu }}= [10.7375~\text {Pa}]\) (Color figure online)

We finally report in Table 9 the offline CPU and GPU times required for the test case at hand.

Table 9 Test case 2. Offline time required for the construction of the ROMs and the equivalence in terms of FOM simulations

4.2.3 Test Case 3: Step Function for the Pressure Load

As the last test case for the beam geometry, we consider a pressure load acting on the bottom surface for only a third of the whole simulation time, that is

$$\begin{aligned} \textbf{g}(t;{\varvec{\mu }}) = \widetilde{p}~\chi _{\left( 0,\frac{T}{3}\right] }(t), \end{aligned}$$

such that the resulting deformation features oscillations. This case is of particular interest in nonlinear elastodynamics, since the inertial term cannot be neglected, having a crucial impact on the deformation of the body. For the case at hand, we choose \(t\in [0,0.27]\) s and a uniform time step \(\Delta t = 3.6\times 10^{-3}\) s, resulting in a total number of 75 time iterations. Concerning the input parameters, we vary the external load \(\widetilde{p}\in [2,12]\) Pa and keep \(G=10^4\) Pa and \(K=5\times 10^4\) Pa fixed. The three load histories considered for the beam geometry are summarized in the sketch below.
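For reference, the three time modulations of the pressure load used in test cases 1–3 can be written compactly as follows; a minimal Python sketch, where T denotes the final time and p the load parameter \(\widetilde{p}\).

```python
import numpy as np

def load_linear(t, T, p):
    # test case 1: structure progressively loaded up to p at t = T
    return p * np.asarray(t, dtype=float) / T

def load_hat(t, T, p):
    # test case 2: linear loading up to T/2, then linear unloading
    t = np.asarray(t, dtype=float)
    return p * np.where(t <= T / 2, 2 * t / T, 2 * (T - t) / T)

def load_step(t, T, p):
    # test case 3: constant load applied only for t in (0, T/3]
    t = np.asarray(t, dtype=float)
    return p * ((t > 0) & (t <= T / 3)).astype(float)
```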

We build the POD basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) from a training set of \(n_s=50\) FOM solutions using \(\varepsilon _{POD}=10^{-3}\), obtaining also in this case a reduced dimension \(N=4\), and perform POD-Galerkin ROM simulations for a given set of \(n_s'=300\) parameter samples to collect the nonlinear term data needed to construct both the POD-Galerkin-DEIM and the Deep-HyROMnet models. The DEIM basis \(\mathbf {{\varvec{\Phi }}}_{\mathcal {R}}\) for the approximation of the residual is computed using the tolerances

$$\begin{aligned} \varepsilon _{DEIM}\in \{10^{-3}, 5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}, \end{aligned}$$

where \(\varepsilon _{DEIM}=10^{-3}\) is the largest tolerance ensuring the convergence of the reduced Newton algorithm for all testing parameters. The corresponding number of basis functions for \(\textbf{R}\) is \(m=18\), 20, 27, 30, 38, 40 and 50, respectively. The results regarding the average relative error \(\epsilon _{rel}\) and the computational speed-up, evaluated over 50 instances of the parameter, are shown in Fig. 14.

Fig. 14 Test case 3. Average over 50 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of POD-Galerkin-DEIM with \(N=4\)

As for the previous test cases, we compare the POD-Galerkin-DEIM and Deep-HyROMnet ROMs with respect to the displacement error and the CPU time ratio.

Table 10 Test case 3. Computational data related to POD-Galerkin-DEIM ROMs and Deep-HyROMnet, for \(N=4\)

As reported in Table 10, also in this case DNNs substantially outperform DEIM in terms of efficiency when handling the nonlinear terms. Indeed, Deep-HyROMnet yields a ROM that is more than 1000 times faster than the FOM (the latter requiring 51 s on average), while still providing satisfactory accuracy.

Fig. 15 Test case 3. FOM (wireframe) and Deep-HyROMnet (colored) solutions computed at different times (Color figure online)

Fig. 15 shows the FOM and Deep-HyROMnet solutions at different time instants for two different values of the parameter, highlighting that our hyper-ROM is able to correctly capture the nonlinear behavior of the continuum body even when the inertial term cannot be neglected. Table 11 reports the offline times required for the construction of the ROMs in the present case.

Table 11 Test case 3. Offline time required for the construction of the ROMs and the equivalence in terms of FOM simulations

4.3 Passive Inflation and Active Contraction of an Idealized Left Ventricle

The second problem we are interested in is the inflation and contraction of a prolate spheroid representing an idealized left ventricle (see Fig. 16), where the boundaries \(\Gamma _0^R\), \(\Gamma _0^N\) and \(\Gamma _0^D\) represent the epicardium, the endocardium and the base of the left ventricle, respectively; the latter is the artificial boundary resulting from the truncation of the heart below the valves in a short-axis plane. We consider transversely isotropic material properties for the myocardial tissue, adopting a nearly-incompressible formulation of the constitutive law proposed in [31], whose strain-energy density function is given by \(\mathcal {W}(\textbf{F}) = \frac{C}{2}(e^{Q(\textbf{F})} - 1)\), where Q takes the following form to describe three-dimensional transverse isotropy with respect to the fiber coordinate system:

$$\begin{aligned} Q = b_{f} E_{ff}^2 + b_{s} E_{ss}^2 + b_{n} E_{nn}^2 + b_{fs}(E_{fs}^2 + E_{sf}^2) + b_{fn}(E_{fn}^2 + E_{nf}^2) + b_{sn}(E_{sn}^2 + E_{ns}^2). \end{aligned}$$
Fig. 16 Passive inflation and active contraction of an idealized left ventricle. Idealized truncated ellipsoid geometry (left) and computational grid (right)

Here \(E_{ij}\), \(i,j\in \{f,s,n\}\), are the components of the Green-Lagrange strain tensor \(\textbf{E} =\frac{1}{2}(\textbf{F}^T\textbf{F}-\textbf{I})\), the material constant \(C>0\) scales the stresses, and the coefficients \(b_f\), \(b_s\), \(b_n\) are related to the material stiffness in the fiber, sheet and transverse directions, respectively. This leads to a (passive) first Piola-Kirchhoff stress tensor characterized by an exponential nonlinearity. In order to enforce the incompressibility constraint, we consider an additional term \(\mathcal {W}_{vol}(J)\) in the definition of the strain-energy density function, which must grow as the deformation deviates from being isochoric. A common choice for \(\mathcal {W}_{vol}\) is a convex function with null slope at \(J=1\), e.g.,

$$\begin{aligned} \mathcal {W}_{vol}(J) = \frac{K}{4}( (J-1)^2 + \ln ^2(J) ), \end{aligned}$$

where the penalization factor is the bulk modulus \(K>0\). Furthermore, to reproduce the typical twisting motion of the ventricular systole, we need to take into account a varying fiber distribution and contractile forces. The fiber direction is computed using the rule-based method proposed in [51], which depends on the parameter angles \({\varvec{\alpha }}^{epi}\) and \({\varvec{\alpha }}^{endo}\). Active contraction is modeled through the active stress approach [1], so that we add to the passive first Piola-Kirchhoff stress tensor a time-dependent active tension, which is assumed to act only in the fiber direction:

$$\begin{aligned} \textbf{P} = \left( \frac{\partial \mathcal {W}(\textbf{F})}{\partial \textbf{F}} + \frac{\partial \mathcal {W}_{vol}(J)}{\partial \textbf{F}}\right) + T_a(t)(\textbf{Ff}_0\otimes \textbf{f}_0), \end{aligned}$$

where \(\textbf{f}_0\in \mathbb {R}^3\) denotes the reference unit vector in the fiber direction and \(T_a\) is a parametrized function that surrogates the active force generation. In our case, since we are modeling only the systolic contraction, we define

$$\begin{aligned} T_a(t) = \widetilde{T}_a~t/T, \quad t\in (0,T), \end{aligned}$$

with \(\widetilde{T}_a>0\). To model blood pressure inside the chamber we assume a linearly increasing external load

$$\begin{aligned} \textbf{g}(t;{\varvec{\mu }}) = \hat{p}~t/T, \quad t\in (0,T). \end{aligned}$$
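To make the model above concrete, the following minimal sketch evaluates the strain-energy density and the full first Piola-Kirchhoff stress, assuming an orthonormal fiber system \((\textbf{f}_0,\textbf{s}_0,\textbf{n}_0)\); the passive part is differentiated here by finite differences purely for illustration, whereas the FOM described in the paper relies on automatic differentiation.

```python
import numpy as np

# Reference material constants from the text (Sect. 4.3)
b_f, b_s, b_n, b_fs, b_fn, b_sn = 8.0, 2.0, 2.0, 4.0, 4.0, 2.0
C, K = 2e3, 50e3  # Pa

def strain_energy(F, f0, s0, n0):
    """Transversely isotropic strain energy plus volumetric penalty."""
    E = 0.5 * (F.T @ F - np.eye(3))   # Green-Lagrange strain
    # strain components in the fiber system; E is symmetric, hence
    # b_fs*(E_fs^2 + E_sf^2) = 2*b_fs*E_fs^2, and similarly for fn, sn
    Eff, Ess, Enn = f0 @ E @ f0, s0 @ E @ s0, n0 @ E @ n0
    Efs, Efn, Esn = f0 @ E @ s0, f0 @ E @ n0, s0 @ E @ n0
    Q = (b_f * Eff**2 + b_s * Ess**2 + b_n * Enn**2
         + 2 * b_fs * Efs**2 + 2 * b_fn * Efn**2 + 2 * b_sn * Esn**2)
    J = np.linalg.det(F)
    return 0.5 * C * (np.exp(Q) - 1.0) \
        + 0.25 * K * ((J - 1)**2 + np.log(J)**2)

def first_piola(F, f0, s0, n0, Ta, h=1e-6):
    """Passive + volumetric stress via central finite differences of the
    energy, plus the active stress Ta * (F f0) x f0 along the fiber."""
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            dF = np.zeros((3, 3)); dF[i, j] = h
            P[i, j] = (strain_energy(F + dF, f0, s0, n0)
                       - strain_energy(F - dF, f0, s0, n0)) / (2 * h)
    return P + Ta * np.outer(F @ f0, f0)

# example: simple shear with a prescribed active tension
f0, s0, n0 = np.eye(3)
F = np.eye(3); F[0, 1] = 0.1
P = first_piola(F, f0, s0, n0, Ta=1e3)
```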

Since we want to assess the performance of Deep-HyROMnet in enhancing the myocardium contraction problem, we consider as unknown parameters those related to the active components of the strain-energy density function:

  • The maximum value of the active tension \(\widetilde{T}_a\in [49.5\times 10^3,70.5\times 10^3]\) Pa, and

  • The fiber angles \({\varvec{\alpha }}^{epi}\in [-105.5,-74.5]^\circ \) and \({\varvec{\alpha }}^{endo}\in [74.5,105.5]^\circ \).

All other parameters are fixed to the reference values taken from [38], namely \(b_{f}=8\), \(b_{s}=b_{n}=b_{sn}=2\), \(b_{fs}=b_{fn}=4\), \(C=2\times 10^3\) Pa, \(K=50\times 10^3\) Pa and \(\hat{p}=15\times 10^3\) Pa. Regarding time discretization, we choose \(t\in [0,0.25]\) s and a uniform time step \(\Delta t = 5\times 10^{-3}\) s, resulting in a total number of 50 time iterations. The FOM is built on a hexahedral mesh with 4804 elements and 6455 vertices, depicted in Fig. 16, corresponding to a high-fidelity dimension \(N_h=19365\), since \(\mathbb {Q}_1\)-FE (that is, trilinear FE on hexahedra) are used. In this case, the FOM requires almost 360 s to compute the solution dynamics for each parameter instance.

Given \(n_s=50\) points obtained by sampling the parameter space \(\mathcal {P}\), we construct the corresponding solution snapshots matrix \(\textbf{S}_u\) and compute the POD basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) using the POD method with tolerance

$$\begin{aligned} \varepsilon _{POD}\in \{10^{-3}, 5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}. \end{aligned}$$

From Fig. 17, we observe a slower decay of the singular values of \(\textbf{S}_u\) with respect to the structural problems of Sect. 4.2. In fact, we obtain larger reduced basis dimensions \(N=16\), 22, 39, 50, 87, 109 and 178, respectively.

Fig. 17 Passive inflation and active contraction of an idealized left ventricle. Decay of the singular values of the FOM solution snapshots matrix

The error and the CPU speed-ups averaged over a testing set of 20 parameters are both shown in Fig. 18, as functions of the POD tolerance \(\varepsilon _{POD}\).

Fig. 18 Passive inflation and active contraction of an idealized left ventricle. Average over 20 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of ROM without hyper-reduction

Also in this case, the speed-up achieved by the ROM is negligible, since, without hyper-reduction, the ROM still depends on the high-fidelity dimension \(N_h\) at each Newton iteration. As for the approximation error, we observe a reduction of almost two orders of magnitude when going from \(N=16\) to \(N=178\). Since larger POD dimensions are obtained than in the previous test cases, we can assess the performance of the Deep-HyROMnet approach for both small and large values of N.

Given the reduced basis \(\textbf{V}\in \mathbb {R}^{N_h\times N}\) with \(N=16\), we construct the POD-Galerkin-DEIM approximation by considering \(n_s'=200\) parameter samples. Fig. 19 shows the decay of the singular values of \(\textbf{S}_R\), that is, the snapshots matrix of the residual vectors \(\textbf{R}(\textbf{Vu}_N^{n,(k)}({\varvec{\mu }}_{\ell '}),t^n;{\varvec{\mu }}_{\ell '})\). The curve decreases very slowly, so that a large number of basis functions is expected to be required to accurately approximate the nonlinear operators.

Fig. 19 Passive inflation and active contraction of an idealized left ventricle. Decay of the singular values of the ROM residual snapshots matrix

In fact, by computing \({\varvec{\Phi }}_{\mathcal {R}}\in \mathbb {R}^{N_h\times m}\) using the following DEIM tolerances,

$$\begin{aligned} \varepsilon _{DEIM}\in \{5\times 10^{-4}, 10^{-4}, 5\times 10^{-5}, 10^{-5}, 5\times 10^{-6}, 10^{-6}\}, \end{aligned}$$

we obtain \(m=303\), 456, 543, 776, 902 and 1233, respectively. Higher values of \(\varepsilon _{DEIM}\) (related to smaller dimensions m) were not sufficient to guarantee the convergence of the reduced Newton problem for all testing parameters. The average relative error over a set of 20 parameters and the computational speed-up are both reported in Fig. 20. In particular, we observe that the relative error lies between \(4\times 10^{-3}\) and \(8\times 10^{-3}\), as expected from the projection error reported in Fig. 18; that is, POD-Galerkin-DEIM ROMs achieve the same accuracy as the ROM without hyper-reduction.

Fig. 20 Passive inflation and active contraction of an idealized left ventricle. Average over 20 testing parameters of relative error \(\epsilon _{rel}\) (left) and average speed-up (right) of POD-Galerkin-DEIM ROMs with \(N=16\)

The data reported in Table 12 lead to the same conclusions regarding the computational bottleneck of the DEIM technique as those drawn from Table 3. In fact, assembling the residual on the reduced mesh requires around \(85\%\) of the online CPU time, thus undermining the hyper-ROM efficiency.

Table 12 Passive inflation and active contraction of an idealized left ventricle
Table 13 Passive inflation and active contraction of an idealized left ventricle
Table 14 Passive inflation and active contraction of an idealized left ventricle. Offline time required for the construction of the ROMs and the equivalence in terms of FOM simulations
Fig. 21 Passive inflation and active contraction of an idealized left ventricle. Evolution in time of the average \(L^2(\varOmega _0)\)-absolute error computed using POD-Galerkin-DEIM ROMs and Deep-HyROMnet, for \(N=16\)

Fig. 22 Passive inflation and active contraction of an idealized left ventricle. FOM (wireframe) and Deep-HyROMnet (colored) displacements (frontal view on top, lateral view in the middle) and corresponding difference (bottom) at time \(T=0.25\) s for \({\varvec{\mu }}= [61942.5~\text {Pa},-77.5225^\circ , 87.9075^\circ ]\) (left), \({\varvec{\mu }}= [59737.5~\text {Pa},-102.3225^\circ , 91.1625^\circ ]\) (center) and \({\varvec{\mu }}= [50497.5~\text {Pa},-100.9275^\circ , 80.0025^\circ ]\) (right) (Color figure online)

Finally, Table 13 reports the computational data of the POD-Galerkin-DEIM ROMs (for a number of magic points equal to \(m=303\) and \(m=543\)) and of Deep-HyROMnet, clearly showing that the latter outperforms the classical reduction strategy in terms of computational speed-up. In fact, Deep-HyROMnet is able to approximate the solution dynamics in 0.1 s, that is, even faster than real time given the final simulation time \(T=0.25\) s, while a POD-Galerkin-DEIM ROM requires about 1 min on average. The offline CPU and GPU times are reported in Table 14. Although the Deep-HyROMnet error is slightly larger than that obtained with a DEIM-based hyper-ROM (see Fig. 21), the results are satisfactory in terms of accuracy. In Fig. 22 the FOM and Deep-HyROMnet displacements at time \(T=0.25\) s are reported for three different values of the parameters, together with the error between the high-fidelity and the reduced solutions.

Table 15 reports the accuracy of the DNN predictions, showing that the error on the reduced Jacobian matrix \(\textbf{J}_N\) is of the same order as that on the corresponding reduced residual vector \(\textbf{R}_N\), and one order of magnitude higher than in the test cases of Sect. 4.2. This might be due to several factors, such as the larger dimension of the input, i.e. \(N^2=256\) instead of \(N^2=16\), the smaller size of the training set, and the overall increased complexity of the problem, in both the constitutive law and the geometry.

To conclude, we assess the performance of Deep-HyROMnet on a problem involving a higher FOM dimension. We still consider the test case described in this section, but using a finer hexahedral mesh with 9964 elements and 13025 vertices, thus obtaining a FOM dimension \(N_h=39075\). In this case, about 13 minutes are required to compute the high-fidelity solution. A reduced basis of dimension \(N=16\) is computed for \(\varepsilon _{POD}=10^{-3}\); the computational data, averaged over a testing set of 20 parameter samples, are reported in Table 16. The online CPU time required by Deep-HyROMnet doubles as we double \(N_h\), even though the same POD dimension \(N=16\) is selected and the same level of accuracy is obtained. This mild dependence of our hyper-ROM on the FOM dimension \(N_h\) is due to the reconstruction of the reduced solutions \(\textbf{Vu}_N^n({\varvec{\mu }})\), for \(n=1,\ldots ,N_t\), whereas the online assembling and solution of the reduced Newton systems require the same computational time. Nonetheless, the overall computational speed-up of Deep-HyROMnet increases as \(N_h\) grows, while the number N of reduced basis functions remains small, so that reduced solutions can be computed extremely fast.

Table 15 Passive inflation and active contraction of an idealized left ventricle
Table 16 Passive inflation and active contraction of an idealized left ventricle

5 Conclusions

In this work we have addressed the solution of parametrized, nonlinear, time-dependent PDEs arising in elastodynamics by means of a new projection-based ROM, developed to accurately capture the solution dynamics at a reduced computational cost with respect to high-fidelity FOMs. We focused on Galerkin-RB methods, characterized by the projection of the differential problem onto a low-dimensional subspace built, e.g., by performing POD on a set of FOM solutions, and by the splitting of the reduction procedure into a costly offline phase and an inexpensive online phase. Numerical experiments showed that, despite their highly nonlinear nature, elastodynamics problems can be reduced effectively by projection-based strategies, with POD-Galerkin ROMs achieving very good accuracy even with only a handful of basis functions. However, when dealing with nonlinear problems, a further level of approximation is required to make the online stage independent of the high-fidelity dimension.

Hyper-reduction techniques, such as DEIM, are necessary to efficiently handle the nonlinear operators. However, in this framework a serious issue is the assembling (albeit on a reduced mesh) of the approximated nonlinear operators. This observation suggested the idea of relying on surrogate models to perform operator approximation, overcoming the need to assemble the nonlinear terms on the computational mesh. Pursuing this strategy, we have proposed a new projection-based, deep learning-based ROM, Deep-HyROMnet, which combines the Galerkin-RB approach with DNNs to assemble the reduced Newton system efficiently, thus avoiding the computational burden entailed by classical hyper-reduction strategies. This approach allows us to rely on physics-based ROMs retaining the underlying structure of the physical model, as DNNs are employed only for the approximation of the reduced nonlinear operators. Regarding the offline cost of this hybrid reduction strategy, we point out that:

  • FOM solutions are only required to build the POD-Galerkin ROM;

  • A small number N of reduced basis functions is sufficient to accurately approximate the high-fidelity solution manifold, so that the arising reduced nonlinear systems can be solved efficiently;

  • Since data on nonlinear operators are collected during Newton iterations at each time step, a smaller number of ROM simulations—compared to purely data-driven approaches—is sufficient for training the DNNs;

  • Since training data are low-dimensional, we can avoid the overwhelming training times and costs that would be required by DNNs if FOM arrays were used.

The Deep-HyROMnet approach has been successfully applied to several test cases in nonlinear solid mechanics, showing remarkable improvements in terms of online CPU time with respect to POD-Galerkin-DEIM ROMs. Our goal in future works is to apply the developed strategy to other classes of nonlinear problems for which traditional hyper-reduction techniques represent a computational bottleneck.