1 Introduction

This article addresses model order reduction for reactive flows. These flows often exhibit sharp fronts, such as flames, which makes their simulation computationally expensive and suggests model reduction to lower the simulation costs. However, classical model order reduction methods fail [1], because the sharp, moving fronts pose challenges both for reducing the state and for predicting new system states. This manuscript addresses these issues by presenting a new decomposition method together with efficient strategies to evaluate the dynamics of the reduced system. For our study we use advection–reaction–diffusion (ARD) systems with a nonlinear Kolmogorov–Petrovsky–Piskunov (KPP) reaction term, as the complex front dynamics with topology changes of such systems exhibit the essential difficulties for model order reduction, while the analysis is simplified because the reacting quantity is scalar and bounded.

Model order reduction (MOR) has been studied for various ARD systems [2,3,4,5,6], including those related to self-ignition of coal [2], wildland fires [3], combustion systems [4, 5, 7, 8] and financial risk analysis [6]. In this study we focus on systems that exhibit locally one-dimensional traveling fronts. The compact support of the moving fronts is challenging for linear reduced basis methods such as the proper orthogonal decomposition (POD). The POD approximates a set of snapshots \(q(\varvec{x},t_i)\), \( i=1,\ldots ,N_{t}\), by separation of variables

$$\begin{aligned} q(\varvec{x},t) \approx \sum _{k=1}^r \hat{a}_k(t) \hat{\psi }_k(\varvec{x}), \end{aligned}$$
(1)

with the help of time amplitudes \(\hat{a}_k(t)\) and spatial modes \(\hat{\psi }_k(\varvec{x})\), computed by a singular value decomposition (SVD). Based on (1), (Petrov–)Galerkin methods (see [9, 10] for a review) are then used to project the original dynamics of the system onto a lower dimensional subspace, yielding a reduced order model (ROM) with only a few variables that can be evaluated efficiently. In parametric MOR (pMOR) this is referred to as a two-stage offline–online procedure [9]. In the offline stage, the reduced basis is built from a large number of trajectories that have been simulated with the full order model (FOM) for a predefined set of parameters. Thereafter, in the online stage, any new parameter can be simulated efficiently using the ROM.
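For readers who want to experiment with the decomposition (1), the following minimal Python sketch (function and variable names are our own, not part of any cited library) computes the POD modes and amplitudes of a snapshot matrix via the SVD:

```python
import numpy as np

def pod_approximation(Q, r):
    """Rank-r POD approximation (1) of a snapshot matrix Q (space x time).

    Returns the truncated reconstruction together with the spatial modes
    (columns of U) and the time amplitudes (rows of diag(s) @ Vt)."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    modes = U[:, :r]                    # spatial modes psi_k(x)
    amplitudes = s[:r, None] * Vt[:r]   # time amplitudes a_k(t)
    return modes @ amplitudes, modes, amplitudes
```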

Unfortunately, moving fronts with sharp gradients significantly slow down the convergence of (1), thereby limiting the use of pMOR. This has been investigated numerically for reactive flows in [1]. The reducibility, i.e. the smallest approximation error that can be obtained with an r-dimensional linear subspace in (1), can be quantified theoretically by the Kolmogorov width [11] or, equivalently, by the Hankel singular values, as shown in [12]. While an exponential decay of the Kolmogorov width is necessary for the success of classical MOR, it was shown in [13, 14] that for typical transport-dominated systems the Kolmogorov width decays only like \(1/\sqrt{r}\), where r is the dimension of the subspace. This slow decay is often referred to as the Kolmogorov barrier [15].

Currently, three different groups of methods can be identified in the literature that attempt to break the Kolmogorov barrier:

  1. Transport compensating methods. These methods improve the convergence of (1) by compensating the transport, for which many authors use coordinate transformations [16,17,18,19,20,21,22,23,24] to align the front onto a reference frame in which the moving front is stationary. This allows the temporal variation of the front shape to be decomposed efficiently into a few basis functions. Similar ideas have been applied to parameter-dependent shock positions [25,26,27] or to fluid–structure interaction with parametrized geometries [28, 29]. Often these methods make assumptions that limit their applicability to the specific problem at hand. For example, the shifted POD assumes multiple wave-like structures whose transport-dependent movement is either known [16, 18] or at least a sufficiently smooth function of time [17] that is easy to parametrize and independent of \(\varvec{x}\). First attempts to automate the generation of the transforms have been introduced in [17, 30, 31]. Similarly, [28, 29] construct mappings that are based on knowledge of the parametrization of the deformation or movement of the structures that interact with the flow. Other authors [23], for example, make use of knowledge of the advection speed/characteristics in the system. Although these methods have already been applied to reactive flows, as in [3] for a wildland fire model or in [7, 18, 32] for detonation combustion, they were studied for 1D systems only. Essential for transport compensating methods is a mapping from a transformed domain onto the simulation domain, which is often assumed to be a smooth bijection. This complicates their applicability to complex transports with topological changes, as they often appear in reactive flows.

  2. Artificial neural networks. A common dimension reduction method uses artificial neural networks. Although the specific implementations differ, most authors [33,34,35,36] rely on a so-called autoencoder (AE) structure. This structure is a composition of affine-linear and nonlinear functions that squeezes the input data through a small informational bottleneck, from which a decoder maps it back to the input dimension. After the network has been trained to approximate its input data, the decoder can be used for model order reduction. However, in contrast to the transport compensating methods, AE-based dimension reduction does not make explicit use of the underlying transport. It is therefore more general in the sense that it can also handle transports for which an alignment of the transported quantities is difficult. Examples are systems that feature topological changes, like splitting and merging flame fronts or multi-phase flows. Unfortunately, the resulting AE descriptions lack structural insights and interpretability. Here, an optimal approximation is not guaranteed and it is not clear which structures or features an AE identifies. Existing feature interpretation methods are usually based on first-order linear approximations around a given input. However, they may fail if the inputs are perturbed [37,38,39].

  3. Adaptive basis methods. Adaptive basis methods such as the dynamical low-rank approximation (DLRA) [40], the adaptive basis and adaptive sampling discrete empirical interpolation method (AADEIM) [41], the reduced basis method with adaptive time partitioning [42] or the online adaptive basis refinement and compression method [43] break the Kolmogorov barrier by exploiting the time-local low-rank structure of transport-dominated fluid systems. This is achieved by adapting the dominant modes to the solution locally in time. Here, [42, 43] build on an offline–online procedure. However, to account for the time-local structure, [42, 43] either assign a basis to each sub-interval in time, which is switched during the online stage [42], or locally refine a hierarchical basis according to an online error estimator [43]. In contrast, DLRA and AADEIM do not require an offline stage, since they compute the reduced basis during the online stage. The dynamical low-rank approximation [40] decomposes the PDE solution with the help of a low-rank matrix factorization and derives evolution equations to update the factors of the solution in time. These evolution equations are determined by minimizing a residual that arises from projecting the solution onto the manifold of low-rank matrices. The AADEIM method [41] updates the basis using an adaptive version (ADEIM [44]) of the discrete empirical interpolation method [45]. Since the underlying description of the adaptive basis methods makes no explicit assumptions on the transport, they can be readily applied to ARD systems, as shown for 1D premixed flames in [41, 46]. Nevertheless, as recently observed in [47] for hyperbolic shallow water moment equations, an offline–online procedure can lead to substantially faster ROMs during the online stage than an adaptive basis method [40], as it does not require expensive online updates.

In this work, we build on the idea of transport compensation together with the classical offline–online model order reduction procedure. However, in contrast to previous works [16,17,18,19,20,21], where the authors aim for a simple coordinate transformation and compensate the transport with a few modes, we use a single mode to describe the front, but a transformation that is only locally defined. To this end, we make use of an auxiliary field \(\phi (\varvec{x},t)\), which parameterizes the transport efficiently, together with a shape function f that retains the front shape:

$$\begin{aligned} q(\varvec{x},t) \approx f(\phi (\varvec{x},t)) \quad \text {s. t.}\quad \phi (\varvec{x},t) = \sum _{k=1}^r a_k(t) \psi _k(\varvec{x}). \end{aligned}$$
(2)

The auxiliary field \(\phi :\mathbb {R}^d\times [0,T]\rightarrow \mathbb {R}\) allows us to embed the locally one-dimensional front movement into a d-dimensional transport. Since the transport is parameterized only locally, changes in the topology of the front surface can be captured. A similar approach was introduced in [4], where \(\phi \) was constructed with the help of a signed distance function and the front function f was determined from a fit to the reacting front. While improving the approximation, this turned out not to be optimal, since the resulting signed distance function does not have to be of low rank. Here, we follow a similar approach, but we formulate an optimization problem to compute \(\phi \). The resulting description (2) is called Front Transport Reduction (FTR) in the remainder of this manuscript. Due to the nonlinearly activated linear space created by the span of \(\{\psi _k(\varvec{x})\}_{k=1,\ldots ,r}\), this approach shows many parallels to artificial neural networks: it can be seen as the decoder part of a shallow autoencoder. While shallow autoencoders have been used in previous studies [36], we explicitly incorporate the underlying physical assumptions and thereby obtain interpretable reduced variables.

The second part of this manuscript addresses dynamical ROM predictions of ARD systems using the FTR ansatz (2). Many different methods have been applied to ARD systems in combination with the POD (1); they can be categorized into intrusive (based on the initial set of equations) [1, 48, 49] and non-intrusive (based on data of the system state) [5, 50, 51] reduced order models. Intrusive models project the original equation system onto the reduced manifold, which is nonlinear in our approach. For nonlinear reduced mappings, so-called manifold Galerkin methods have been used in combination with neural networks in [52] and with dynamically transformed modes in [53]. Unfortunately, manifold Galerkin methods require special hyper-reduction schemes to gain speedups in the resulting ROM. Examples of such schemes are the extended ECSW scheme proposed in [54], the gappy-POD-based GNAT procedure [52], first introduced for nonlinear manifolds in [36], and the shifted DEIM [3]. The idea of all of these methods is to evaluate the nonlinear dynamics of the underlying system at a small number of points to determine the evolution of the parameters in the reduced space. Unfortunately, the extended ECSW scheme [54] and the GNAT procedure [36, 54] cannot be used for ARD systems with sharp, advected fronts, since they preselect a fixed set of points, while the dynamics are localized only near the moving front. As already noted in [55,56,57] for linear projection-based model order reduction of nonlinear equations, the choice of sampling points is crucial for the accuracy and complexity of the resulting ROM. We discuss this problem and present a practical solution using a special hyper-reduction scheme based on the reduced integration domain (RID) method [58]. Furthermore, we examine the ability of the FTR mapping to predict new system states with the help of a non-intrusive method introduced in [59].

1.1 Structure of the Article

The remainder of the article is structured in three parts. The first part, Sect. 2, is dedicated to the FTR decomposition: we motivate the decomposition and introduce two algorithms to solve the corresponding high-dimensional optimization problem. While the first algorithm is based on iterative thresholding of singular values (see Sect. 2.2), the second uses artificial neural networks (see Sect. 2.3). The algorithms are applied and compared for two synthetic examples in Sect. 2.4 and for a two-dimensional (2D) ARD system with topology change in Sect. 2.5. In the second part, Sect. 3, we use the low-dimensional FTR description to predict new system states via non-intrusive MOR (Sect. 3.1) and intrusive MOR (Sect. 3.2). For the latter, we propose a special hyper-reduction method. The resulting ROM is tested for 1D and 2D ARD systems in Sect. 3.4. Finally, we summarize our results in the last part, Sect. 4.

1.2 Nomenclature

Matrices are denoted by capital letters in straight, bold font \(\textbf{A}\in \mathbb {R}^{M\times M}\) and vectors by \(\varvec{x}\in \mathbb {R}^M\). Whenever a scalar function \(f:\mathbb {R}\rightarrow \mathbb {R}\) is applied to a vector-valued quantity \(\varvec{x}\), we assume element-wise operation on the entries of \(\varvec{x}=(x_1,\ldots ,x_M)\), if not stated otherwise, and write \(f(\varvec{x})\) instead of \((f(x_1),\ldots ,f(x_M))\); similarly for matrices, \(f(\textbf{A})\). The Hadamard product \(\odot \) is defined by the element-wise multiplication of matrix entries. Furthermore, if \(q(\varvec{x},t,\mu )\in \mathbb {R}\) is the solution of a scalar PDE, its (discrete-in-space) ODE counterpart is denoted by a vector \(\varvec{q}(t,\mu )\in \mathbb {R}^M\) containing the spatial values of q in its components. Correspondingly, the snapshot matrix \(\textbf{Q}\) contains all time and parameter snapshots in its columns: \(\textbf{Q}=[\varvec{q}(t_1,\mu _1), \varvec{q}(t_2,\mu _1), \ldots , \varvec{q}(t_{N_{t}},\mu _P)]\). Partial derivatives in space and time are denoted by \(\partial _x=\frac{\partial }{\partial x}, \partial _t=\frac{\partial }{\partial t}, \partial _{xx}=\frac{\partial ^2}{\partial x^2}\) and \(\dot{q}=\frac{\partial {q}}{\partial t}\). The Laplacian of a function q is defined by \(\nabla ^{2}q = \sum _{i=1}^d \frac{\partial ^2}{\partial x_i^2} q,\) where we make use of the nabla operator \(\nabla =\left( {\frac{\partial }{\partial x_{1}}},\ldots ,{\frac{\partial }{\partial x_{d}}}\right) .\)

2 Dimension Reduction Methods for Complex Moving Reaction Fronts

In this section, we introduce the front transport reduction (FTR) and motivate why our special nonlinear reduction method is advantageous when decomposing reactive flows. We present two FTR decomposition approaches that are based on iterative thresholding (Sect. 2.2) and on a special autoencoder network (Sect. 2.3).

2.1 A Nonlinear Decomposition Approach for Moving Fronts

To motivate our decomposition approach, we consider advection–reaction–diffusion systems:

$$\begin{aligned} \frac{\partial q}{\partial t} = \varvec{u}\cdot \nabla q + \kappa {\nabla ^{2}}q + {R(q)}, \end{aligned}$$
(3)

with a Kolmogorov–Petrovsky–Piskunov (KPP) [60] reaction term \({R(q)}=\mu q^\alpha (q-1)\) with \(\alpha>0, \mu >0\). This system describes how a quantity or reactant \(q(\varvec{x},t)\) spreads in space \(\varvec{x}\in \varOmega \subset \mathbb {R}^d,\,d>0\) over time \(t\in [0,T]\). This spread can be caused by advection with velocity \(\varvec{u}\in \mathbb {R}^d\) or by an interplay between diffusion \({\nabla ^{2}}q\) and reaction processes R(q). KPP systems exhibit traveling or pulsating fronts [61,62,63,64] that are locally one-dimensional. Locally one-dimensional means that the spatial variable \(\varvec{x}=(x_1,x_2,\ldots ,x_d)\) of the reactant q can be transformed to \(\varvec{\hat{x}}=(\hat{\phi },\hat{x}_2,\ldots ,\hat{x}_d)\), where \(\hat{x}_2,\ldots ,\hat{x}_d\) lie on a hyperplane tangential to the front of the traveling wave. On this hyperplane, all gradients in the equation vanish relative to the \(\hat{\phi }\)-terms that are normal to the traveling wave. The resulting equation therefore depends on only one spatial variable \(\hat{\phi }\) (see [65, p. 87] for details). Hence, we can assume that the solution of (3) can be transformed into

$$\begin{aligned} q(\varvec{x},t)=\hat{f}(\hat{\phi }(\varvec{x},t),t), \end{aligned}$$
(4)

where the front profile of the traveling wave is described by \(\hat{f}:\mathbb {R}^2\rightarrow \mathbb {R}\) and \(\hat{\phi }:\varOmega \times \mathbb {R}\rightarrow \mathbb {R}\) determines the location of the front. Note that this is not a property special to KPP systems, but a general property shared by other reaction and combustion systems [65,66,67].

Example 1

(One-dimensional KPP system) We consider the one-dimensional reaction–diffusion system taken from [68]

$$\begin{aligned} \partial _t q = \partial _{xx} q + \frac{8}{\mu ^2} q^2(q-1) \end{aligned}$$
(5)

with \(\mu >0\). A special solution in the form of a traveling wave is

$$\begin{aligned} q(x,t)=\frac{1}{2}\left( 1+\tanh \left( \frac{x-2t/\mu }{\mu }\right) \right) . \end{aligned}$$
(6)

If we identify \(\hat{\phi }(x,t)=\frac{x-2t/\mu }{\mu }\) as a local distance to the front and \(\hat{f}(x,t)=\frac{1}{2}\left( 1+\tanh (x)\right) \) as the time-constant front profile, we exactly recover the form (4). The relation between the analytical solution q, the front function \(\hat{f}\) and the transport field \(\hat{\phi }\) is shown in Fig. 1a.

Fig. 1

Illustration of the basic idea of the front transport reduction method (Example 1). a The FTR replaces the sharp traveling front structure q (blue curves) by a level-set function \(\hat{\phi }\) (orange lines) and a nonlinear front function \(\hat{f}\) (indicated by the red arrow). Both quantities share locally the same transport (shift). However, the level-set field \(\hat{\phi }(x,t)\) is of low rank and can therefore be parameterized with only a few POD basis functions (here: \(\{x,1\}\)). b Comparison of the singular values of the snapshot matrices \(\hat{\varvec{\Phi }},\textbf{Q}\in \mathbb {R}^{101\times 101}\). The snapshot matrices \({\hat{\varvec{\Phi }}}_{ij}=\hat{\phi }(i\varDelta x,j\varDelta t),\, \textbf{Q}_{ij}=q(i\varDelta x,j\varDelta t)\) are computed from (6) using \(\varDelta x=10\varDelta t=0.1\) and \(\mu =0.2\) (Color figure online)

As shown in Fig. 1, the transport field \(\hat{\phi }\) is usually smooth and slowly varying compared to q; we therefore assume it to be of low rank. For Example 1, this is visualized by comparing the normalized singular values obtained when decomposing the snapshot matrices \(\hat{\varvec{\Phi }}_{ij}=\hat{\phi }(x_i,t_j)\) and \(\textbf{Q}_{ij}=q(x_i,t_j)\) with the help of the SVD.
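This comparison can be reproduced in a few lines of Python. The sketch below (our own illustration, using the sampling of Fig. 1b) builds both snapshot matrices from the analytical solution (6) and prints the leading singular values:

```python
import numpy as np

# Traveling-wave solution (6) of the 1D KPP system (5) and its level-set
# field, sampled as in Fig. 1b (dx = 10*dt = 0.1, mu = 0.2).
mu, dx, dt = 0.2, 0.1, 0.01
x = np.arange(101) * dx
t = np.arange(101) * dt
X, T = np.meshgrid(x, t, indexing="ij")

Phi = (X - 2 * T / mu) / mu        # level-set field phi_hat(x, t), rank 2
Q = 0.5 * (1 + np.tanh(Phi))       # sharp front q = f_hat(phi_hat)

# The singular values of Phi drop to numerical zero after rank 2
# (span of {x, 1}), while those of Q decay slowly.
print(np.linalg.svd(Phi, compute_uv=False)[:5])
print(np.linalg.svd(Q, compute_uv=False)[:5])
```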

Our decomposition approach exploits this observation. For given snapshot data \(\textbf{Q}\in \mathbb {R}^{M\times N_t}\) with \(\textbf{Q}_{ij}=q(\varvec{x}_i,t_j)\) and a time-constant front function f, the approach decomposes the data with the help of the nonlinear mapping

$$\begin{aligned} q(\varvec{x},t) \approx \tilde{q}(\varvec{x},t) = f(\phi (\varvec{x},t)) \quad \text {s. t.}\quad \phi (\varvec{x},t) = \sum _{k=1}^r a_k(t) \psi _k(\varvec{x}),\quad r\ll N_t \end{aligned}$$
(7)

and a low-rank field \({\varvec{\Phi }}_{ij}=\phi (\varvec{x}_i,t_j)\), which allows embedding the locally one-dimensional front movement into a d-dimensional transport. Since the transport is parameterized only locally, changes in the topology of the front surface can be captured. The profile f of the wave can be computed analytically with the help of perturbation theory after transforming (3) into the co-moving frame (see for example [69]) or by fitting the front profile [4]. Unlike (4), the ansatz (7) drops the time dependence of the front function, since it can be attributed to \(\phi \). Note that in addition to the truncation error \(\mathcal {R}=\hat{\phi }-\phi \), the approach (7) introduces another error \(\varDelta f = \hat{f}-f\) (see [4] for further details). Thus, even for vanishing truncation error, the overall error is limited by the so-called approximation error \(\varDelta f\) of the nonlinear function, a notion known from machine learning (see Section 5.2 in [70]). This error is caused by the choice of the function class describing the front.

The following problem formalizes finding a suitable \(\phi \) (encoding the transport, including topology changes, as in Fig. 1) for given snapshots of q and a prescribed front shape f. In the simple transport of Fig. 1, the (non-unique) field \(\phi \) is a linear function whose offset changes in time, which is a rank-two structure. The decomposition goal is formulated as an optimization problem.

Problem 1

Front Transport Reduction For a given snapshot matrix \(\textbf{Q}\in \mathbb {R}^{M\times N_t}\) with \(\textbf{Q}_{ij}=q(\varvec{x}_i,t_j)\in [0,1]\) and a nonlinear, smooth, monotonically increasing function \(f:\mathbb {R}\rightarrow [0,1]\), find a rank-r matrix \({\varvec{\Phi }}\in \mathbb {R}^{M\times N_t}\) such that the error \(\Vert \textbf{Q}-\tilde{\textbf{Q}} \Vert _\textrm{F}^2\) with \(\tilde{\textbf{Q}}_{ij}=f({\varvec{\Phi }}_{ij})\) is minimized.

Remark 1

In Problem 1 we assume that the data \(\textbf{Q}\) has been rescaled, which is a common practice in machine learning to help the convergence of the optimization algorithm (see Section 25.2 in [70]).

Two possible algorithms for solving Problem 1 are provided in the following sections.

2.2 Front Transport Reduction via Iterative Thresholding of Singular Values

A simple iterative algorithm to determine the auxiliary field \({\varvec{\Phi }}\in \mathbb {R}^{M\times N_t}\) of the front transport reduction Problem 1 is stated in Algorithm 1.

Our algorithm combines a gradient descent step (line 4), which minimizes \(\Vert \textbf{Q}-\tilde{\textbf{Q}} \Vert _\textrm{F}^2\), with a rank-r projection step of \({\varvec{\Phi }}\) (line 5). In the gradient descent step, the FTR residual

$$\begin{aligned} \mathcal {L}_\textrm{FTR}({\varvec{\Phi }}) =\frac{1}{2} \Vert \textbf{Q}-\tilde{\textbf{Q}} \Vert _\textrm{F}^2 \quad \text { with }\quad \tilde{\textbf{Q}} = f({\varvec{\Phi }}) \end{aligned}$$
(8)

is minimized in the direction of the gradient \(D_{\varvec{\Phi }} \mathcal {L}_\textrm{FTR}({\varvec{\Phi }})=f'({\varvec{\Phi }})\odot (f({\varvec{\Phi }})-\textbf{Q})\). Here, \(f'({\varvec{\Phi }}),f({\varvec{\Phi }})\) denote the element-wise application of \(f',f\) to \({\varvec{\Phi }}\). Since f is monotonically increasing and bounded, it is sufficient to replace \(D_{\varvec{\Phi }} \mathcal {L}_\textrm{FTR}\) by \(\textbf{R}=f({\varvec{\Phi }})-\textbf{Q}\) in line 4. Neglecting \(f'({\varvec{\Phi }})\) in the gradient prevents a vanishing gradient at points where \(f'({\varvec{\Phi }}_{ij})\rightarrow 0\), i.e. \(\left| {\varvec{\Phi }}_{ij} \right| \gg 0\). Note that replacing the simple gradient descent step by a quasi-Newton method or a line search would not improve the convergence rate, since the step is followed by a projection (line 5), which is likely to destroy the possibly larger step of a more sophisticated method.

Algorithm 1

FTR as iterative thresholding

The computational cost of Algorithm 1 scales with the complexity of the singular value decomposition (SVD). For large systems it can be advantageous to use randomized or wavelet-based techniques [71, 72] to compute the SVD. Different initializations of \({\varvec{\Phi }}^0\) can be used, which may speed up the convergence of the algorithm. For example, \({\varvec{\Phi }}^0\) may be initialized with the help of a signed distance function, since this was found to be a good candidate for a low-rank decomposition [4]. Another possibility is a rank-r approximation of \({\varvec{\Phi }}^0=f^{-1}(\textbf{Q})\).
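A minimal NumPy sketch of Algorithm 1 could look as follows; the step size, the stopping criterion and the \(f^{-1}\)-based initialization (valid for the tanh-type front used in our examples) are illustrative choices, not part of the algorithm specification:

```python
import numpy as np

def ftr_thresholding(Q, f, rank, tau=1.0, n_iter=2000, tol=1e-7):
    """Sketch of Algorithm 1: alternate a gradient step on (8) with a
    rank-r projection of Phi by singular value truncation."""
    # Initialization via f^{-1}(Q) for f(x) = (tanh(x) + 1)/2; entries are
    # clipped so that the inverse stays finite.
    Phi = np.arctanh(np.clip(Q, 1e-3, 1 - 1e-3) * 2 - 1)
    for _ in range(n_iter):
        R = f(Phi) - Q               # simplified gradient (f' dropped), line 4
        Phi = Phi - tau * R          # gradient descent step
        U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
        Phi = U[:, :rank] @ (s[:rank, None] * Vt[:rank])  # rank-r projection, line 5
        if np.linalg.norm(R) <= tol * np.linalg.norm(Q):
            break
    return Phi
```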

2.3 Front Transport Reduction via Neural Autoencoder Networks

Another way to solve Problem 1 is with the help of neural autoencoder networks, which are commonly used for dimensionality reduction [73]. For a general introduction to neural autoencoder networks, we refer to [74]. Here, we briefly explain the concept and the specifications of our network.

An autoencoder tries to reproduce the input data while squeezing it through an informational bottleneck. It consists of two parts, the

Encoder:

\(g_\text {enc}:\mathbb {R}^{M}\rightarrow \mathbb {R}^r, \varvec{q}\mapsto \varvec{a} = g_\text {enc}(\varvec{q}) \), mapping the input data \(\varvec{q}\) onto points \(\varvec{a}\) in a learned lower dimensional latent space and the

Decoder:

\(g_\text {dec}:\mathbb {R}^{r}\rightarrow \mathbb {R}^{M},\varvec{a} \mapsto g_\text {dec}(\varvec{a}) = \tilde{\varvec{q}}\), mapping the latent representation back to the input space.

The composition of the two parts

$$\begin{aligned} \tilde{\varvec{q}} = g_\text {dec}(g_\text {enc}(\varvec{q})) \end{aligned}$$

defines the autoencoder. The task of the optimization procedure is now to determine \(g_\text {dec}, g_\text {enc}\), such that the reconstruction error over the training data \(\textbf{Q}=[ \varvec{q}_1, \ldots , \varvec{q}_{N_{t}}]\):

$$\begin{aligned} \mathcal {L}_{\textrm{FTR}} = \sum _{i=1}^{N_{t}}\Vert \varvec{q}_i - \tilde{\varvec{q}}_i\Vert ^2_\textrm{F} = \sum _{i=1}^{N_{t}}\Vert \varvec{q}_i - g_\text {dec}(g_\text {enc}(\varvec{q}_i))\Vert ^2_\textrm{F} \end{aligned}$$

is minimized. After the network has been trained, the reduction is achieved as the dimension \(r\ll M\) of the latent variables \(\varvec{a}_i=g_\text {enc}(\varvec{q}_i)\in \mathbb {R}^r\) is much smaller than the input dimension M. Therefore, the decoder \(\varvec{q}_i\approx g_\text {dec}(\varvec{a}_i)\) represents a reduced map of the high dimensional data contained in the columns of \(\textbf{Q}\).

In the training procedure, the functions \(g_\text {enc},g_\text {dec}\) are determined by the trainable parameters of the network, called weights and biases. The networks are constructed as a composition of layers, \(g_\text {enc}=L_1\circ L_2 \circ \cdots \circ L_N\). Usually, a layer \(L_n:\mathbb {R}^i \rightarrow \mathbb {R}^o\) is given by an affine-linear mapping \(\varvec{x}\mapsto h_n(\textbf{W}_n \varvec{x}+\varvec{b}_n)\), with weights \(\textbf{W}_n\in \mathbb {R}^{o\times i}\) and biases \(\varvec{b}_n \in \mathbb {R}^o\), together with a predefined nonlinear activation function \(h_n\). The choice of the input and output dimensions i, o of each layer, the activation functions and the number of layers is called the architecture of the network.

As the FTR autoencoder network (FTR-NN) should implement the structure motivated by Problem 1, we choose a special architecture. It consists of a single-layer decoder without bias,

$$\begin{aligned} \tilde{\varvec{q}}=g_\text {dec}(\varvec{a}) = f(\varvec{{\Psi }}\varvec{a}), \quad \varvec{{\Psi }}\in \mathbb {R}^{M\times r}, \end{aligned}$$

which is activated by the physics-dependent front function f. Here, the images of the linear part, \(\varvec{\phi }_i= \varvec{{\Psi }}\varvec{a}_i\) with \(\varvec{a}_i=g_\text {enc}(\varvec{q}_i)\), correspond to the columns of the discrete transport field \({\varvec{\Phi }}=[\varvec{\phi }_1,\ldots , \varvec{\phi }_{N_{t}}]\). Since the image of the linear part is represented by \(\varvec{{\Psi }}\in \mathbb {R}^{M \times r}\) with \(r\ll M\), the resulting matrix is at most of rank r.

The encoder network consists of four convolutional layers, each followed by an exponential linear unit (ELU) and a batch normalization layer [75]. After flattening the output, the convolutional layers are followed by two linear layers, where the first one is again followed by an ELU activation and a batch normalization layer. We apply a stride of two in all convolutional layers after the first, to downsample the spatial resolution of the input data. Further details of the architecture and training procedure can be found in Appendix A.
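To make the decoder structure concrete, a minimal PyTorch sketch is given below. The class name and the front-sharpness parameter lam are our own illustrative choices, and the convolutional encoder described above is omitted:

```python
import torch
import torch.nn as nn

class FTRDecoder(nn.Module):
    """FTR decoder g_dec(a) = f(Psi a): one linear layer without bias,
    activated by the front function f."""
    def __init__(self, M, r, lam=0.1):
        super().__init__()
        self.linear = nn.Linear(r, M, bias=False)  # weight matrix plays the role of Psi
        self.lam = lam
    def forward(self, a):
        phi = self.linear(a)                             # level-set field phi = Psi a
        return 0.5 * (torch.tanh(phi / self.lam) + 1.0)  # front function f
```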

For the training of the FTR-NN, an additional smoothness constraint is added to the optimization goal \(\mathcal {L}_\textrm{FTR}\), which penalizes the non-smoothness of the columns \(\varvec{\psi }_n\) of \(\varvec{{\Psi }}\in \mathbb {R}^{M \times r}\):

$$\begin{aligned} \mathcal {L}_\textrm{smooth} = \lambda _\text {smooth}\sum _{n=1}^{r}\frac{\Vert \textbf{D} \varvec{\psi }_n\Vert ^2_\textrm{F}}{\Vert \varvec{\psi }_n \Vert ^2_\textrm{F}}. \end{aligned}$$
(9)

Here, \(\textbf{D}\in \mathbb {R}^{M\times M}\) denotes the coefficient matrix of a forward finite difference, which is implemented as a convolution over the columns of \(\varvec{{\Psi }}\). For the examples in this manuscript, \(\lambda _\text {smooth}=10^{-7}\) was found to be optimal. The additional smoothness constraint leads to faster convergence of the validation error. The constraint is reasonable, since the columns represent the transport field \({\varvec{\Phi }}_{ij}=\phi (\varvec{x}_i,t_j)\), which is assumed to be smooth.
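A possible implementation of the penalty (9) is sketched below for a one-dimensional grid; in 2D, the finite difference acts as a convolution over the reshaped field. The function name is our own:

```python
import torch

def smoothness_penalty(Psi, lam_smooth=1e-7):
    """Penalty (9): squared forward differences of each column psi_n of Psi,
    normalized by the squared column norm."""
    d_psi = Psi[1:, :] - Psi[:-1, :]     # forward finite difference D psi_n
    num = (d_psi ** 2).sum(dim=0)
    den = (Psi ** 2).sum(dim=0)
    return lam_smooth * (num / den).sum()
```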

2.4 Synthetic Examples

In this subsection, we provide two synthetic examples. The first illustrates the application of the FTR to linear advection and compares the two decomposition methods outlined above with previous results [4]. The second addresses topology-changing fronts. The reduced mappings used for the comparisons are summarized in Table 1. Note that the number of parameters determined during the offline stage differs from the number of time/parameter-dependent variables \(\varvec{a}(t)\in \mathbb {R}^r\). We refer to the latter as degrees of freedom in the following.

Table 1 Summary of the different reduced mappings and their number of degrees of freedom (DOF)

2.4.1 Linear Advection of a Disc

The first synthetic example is taken from [4]. It illustrates the idea of the FTR in the pure advection case, without any topological change. The example parameterizes a disc of radius \(R=0.15L\), which is moving in a circle:

$$\begin{aligned}&q(\varvec{x},t) = f(\phi (\varvec{x},t)) \quad \text {and} \quad \phi (\varvec{x},t)=\frac{1}{2R}(\Vert \varvec{x}-\varvec{x}_0(t) \Vert _2^2-R^2) \end{aligned}$$
(10)
$$\begin{aligned}&\text {with circle center } \varvec{x}_0(t) =L \begin{pmatrix} 0.5 + 1/4 \cos (2 \pi t)\\ 0.5 + 1/4 \sin (2\pi t) \end{pmatrix}. \end{aligned}$$
(11)

The snapshot data \(q(\varvec{x},t)\) is generated with the specific level-set field \(\phi (\varvec{x},t)\) defined in (10), which is zero at the outer radius of the disc, i.e. at the location of the front. The front is generated with the help of the function \(f(x) = (\tanh (x/\lambda )+1)/2\), \(\lambda = 0.1\). One representative snapshot of the data is shown together with its POD approximation in Fig. 2. As the authors of [4] have already pointed out, for this example \(\phi (\varvec{x},t)=\sum _{k=1}^3{a_k(t)\psi _k(\varvec{x})}\) can be parameterized by only three functions and is therefore of low rank, even though the field \(q(\varvec{x},t)\) is not. The basis functions \(\psi _1,\psi _2,\psi _3\) are shown in the top row of Fig. 4a. They can be interpreted as a quadratic basis function \(\psi _1(x,y)=(x-0.5L)^2+(y-0.5L)^2-R^2+L^2/16\), which represents the initial shape of the contour line with a constant time amplitude \(a_1(t)=a_1\in \mathbb {R}\), and the linear transport functions \(\psi _2(x,y)=x-0.5L\), \(\psi _3(x,y)=y-0.5L\) for the shift in the x/y-direction, with \(a_2(t) \sim \cos (2\pi t)\), \(a_3(t) \sim \sin (2\pi t)\). Note that the arrows in Fig. 4a indicate the shift directions \(\nabla \psi _2(x,y)\), \(\nabla \psi _3(x,y)\).
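The snapshot data of this example can be generated directly from (10)-(11); a short sketch (assuming \(L = 1\) and the grid of this section) reads:

```python
import numpy as np

L_dom, R, lam = 1.0, 0.15, 0.1    # domain size L (assumed 1), disc radius, front width
x = np.linspace(0.0, L_dom, 129)
X, Y = np.meshgrid(x, x, indexing="ij")

def snapshot(t):
    # disc center (11) moving on a circle
    x0 = L_dom * (0.5 + 0.25 * np.cos(2 * np.pi * t))
    y0 = L_dom * (0.5 + 0.25 * np.sin(2 * np.pi * t))
    # level-set field (10), zero on the disc boundary
    phi = ((X - x0) ** 2 + (Y - y0) ** 2 - R ** 2) / (2 * R)
    return 0.5 * (np.tanh(phi / lam) + 1.0)   # q = f(phi)

# 200 equally spaced snapshots in 0 < t <= 1, flattened into the columns of Q
Q = np.column_stack([snapshot(t).ravel() for t in np.linspace(0.005, 1.0, 200)])
```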

Fig. 2

Data at time \(t=0.11\) and its approximation with the Proper Orthogonal Decomposition (POD) using \(r=3\) modes

To show that the singular value thresholding Algorithm 1 (FTR) and the neural network approach of Sect. 2.3 (FTR-NN) can find a similar basis set, we generate 200 equally spaced snapshots of q in the time interval \(0<t\le 1\), discretized with \(129\times 129\) grid points in the rectangular domain \([0,L]^2\). The data is split into a training and a test set, where every second sample is a test sample. After training the neural network on the training samples, it is compared to the results of the POD and of the thresholding Algorithm 1 on the test samples. All parameters of Algorithm 1 are stated in Table 2. The results are visualized in Figs. 3 and 4. In Fig. 3 we compare the results of both FTR algorithms (FTR, FTR-NN) and a simple symmetric autoencoder, labeled NN (for details see Appendix A). The NN decoder mirrors the encoder in an inverse manner (see Appendix A), which is common practice in dimension reduction. The relative errors in the Frobenius norm are shown in Fig. 3a. The errors of FTR-NN and FTR drop significantly at \(r=3\) degrees of freedom (FTR basis functions/latent space dimension), in accordance with the proposed level-set field. In contrast, the POD shows a much slower convergence of the relative error. Comparing the two networks NN and FTR-NN shows that the additional depth of the NN decoder, compared to the one-layer decoder of the FTR-NN, does not influence the minimal relative error. This leads us to conclude that additional depth is not needed for a better representation. Note, however, that the NN needs fewer degrees of freedom to converge to its minimal relative error, which is due to the higher expressivity of a deeper network [76].

Table 2 Parameters of Algorithm 1 for the moving disc example
Fig. 3

Comparison of POD, FTR, FTR-NN and the symmetrical autoencoder structure labeled with NN. a compares the relative errors in the Frobenius norm for different degrees of freedom. b visualizes the level-set field \(\phi \) together with the approximation of the data \(\tilde{q}=f(\phi )\) and the deviation from the exact data \(q-\tilde{q}\) for one selected snapshot

Furthermore, it is important to note that for this special example the FTR thresholding algorithm outperforms both networks when the number of degrees of freedom is increased.

Remark 2

We briefly relate our approach to other transport compensating methods such as the shifted POD (sPOD) [16] or the transport reversal method [23]. In these methods, the movement of the disc is described by a two-dimensional shift transformation in x and y, denoted by \(\mathcal {T}^{\varvec{\varDelta }(t)}[q(x,y,t)] = q(x-\varDelta _1(t),y-\varDelta _2(t),t)\), and one mode \(\psi _1^\text {sPOD}(x,y)=q(x,y,0)\) representing the structure of the disc in the co-moving frame. Thus, the state \(q(x,y,t)=\mathcal {T}^{\varvec{\varDelta }(t)}[\psi _1^\text {sPOD}(x,y)a_1^\text {sPOD}(t)]\) is described by \(r=3\) DOF \((\varDelta _1(t),\varDelta _2(t),a^\text {sPOD}_1(t))\), in accordance with the presented FTR approach for this specific example.

For a qualitative comparison, Fig. 3b shows the approximation of one snapshot before and after activation (first and second column), together with the difference in the last column. Comparing the POD in Fig. 2 to the FTR results shows that the typical staircasing behavior of the POD (which becomes a blurring of the sharp structures when many snapshots are used, as here) can be overcome with the FTR ansatz, which recaptures the sharp front. We observe that both the qualitative and the quantitative errors of the FTR-NN and the iterative thresholding approach are similar. In this study, we use \(\lambda _\text {smooth}=10^{-7}\) to regularize the smoothness of \(\phi \) at the output of the FTR-NN and NN decoders. As visualized in Fig. 14 of Appendix A, for a larger smoothness parameter \(\lambda _\text {smooth}>10^{-7}\) the transport field of the FTR-NN is smoothly continued in areas of no information (no transport), but the additional constraint (9) can cause a larger overall approximation error. However, the level-set fields of the iterative thresholding approach and the NN are almost identical inside the areas where fronts have been transported. This is due to the special choice of the encoder.

Figure 4 compares the fields \(\psi _1,\psi _2,\psi _3\), constructed by inspection, with the first three modes of \(\phi \). Similar to the proposed functions, the auxiliary field can be split into a mode (\(\tilde{\psi }_1\)) responsible for the shape of the disc and two modes (\(\tilde{\psi }_2,\tilde{\psi }_3\)) that parameterize the transport. As expected for this special case, \(\tilde{a}_1\) is constant and \(\tilde{a}_2,\tilde{a}_3\sim \cos (2\pi t + \theta )\), where \(\theta \in \mathbb {R}\) depends on the alignment (indicated by arrows in Fig. 4a) of the two shift functions \(\tilde{\psi }_2,\tilde{\psi }_3\). The modes \(\tilde{\psi }_2,\tilde{\psi }_3\) only take meaningful values along the trajectories of the front, because the algorithm cannot in-paint \(\phi \) in areas without transport. This explains why the modes in Fig. 4 are zero outside the circle.

Fig. 4

Visualization of the FTR transport field \(\phi (x,t)=\sum _{i=1}^3 a_i(t) \psi _i(x,y)\) for the disc moving in a circle (see Eqs. (10), (11)). Displayed are the expected spatial modes \(\psi _i\) (Fig. 4a), their temporal amplitudes \(a_i\) (Fig. 4b) and the FTR approximations \(\tilde{\psi }_i,\tilde{a}_i\), \(i=1,2,3\). The modes have been rescaled by their maximal absolute value for better comparison. The arrows in Fig. 4a indicate the direction of the shift; they are computed from the spatial mean of \(\nabla \psi _2(x,y)\), \(\nabla \psi _3(x,y)\). The corresponding amplitudes \(a_2,a_3\) parameterize the circular movement in time

Comparing the CPU times of the different algorithms is difficult due to the various factors that affect their convergence, especially for neural networks, such as initialization, learning rate and architecture. Moreover, the algorithms typically run on different hardware (CPU/GPU), which further complicates comparisons. Despite these caveats, when aiming for a comparable level of accuracy, Algorithm 1 requires less time than the neural networks when compared on a CPU. This is likely because the neural networks optimize a larger number of parameters during the minimization. To illustrate this, Table 3 lists the CPU/GPU time needed by a modern laptop to reach an error below 1% using \(r=3\) degrees of freedom in the resulting reduced mapping.

Table 3 Average CPU/GPU time required to generate the reduced mapping of the moving disc with 3 degrees of freedom using either Algorithm 1 (FTR) or network training (FTR-NN, NN)

2.4.2 Advection with Topology Change

In this example, we show that our approach is capable of handling transport with topological changes. To this end, we introduce synthetic snapshot data \(q(\varvec{x},t)=f(\phi (\varvec{x},t))\) built from the level-set field

$$\begin{aligned} \phi (\varvec{x},t) = \sum _{k=1}^3 -A_k e^{-\sigma _k r_k}-t,\quad r_k = \Vert \varvec{x}-\varvec{x}_k \Vert _2, \end{aligned}$$
(12)

which we aim to approximate. The front function f is chosen as above. The level-set field is sampled equidistantly using \(256\times 256\) grid points in \([0,10]^2\), with \((A_1,A_2,A_3)=(1,1.4,1.2)\), \((\sigma _1,\sigma _2,\sigma _3)=(0.1,0.3,0.5)\) and \(\varvec{x}_1=(7.5,3.5),\varvec{x}_2=(2.5,5.0),\, \varvec{x}_3=(5.0,7.6)\). Furthermore, 101 equally spaced snapshots with \(0\le t\le 0.5\) are constructed from (12). As above, we split the samples into a test and a training set, where every second sample is used for testing the autoencoders. After training the networks, they are compared to the reconstruction errors of the POD and the FTR on the test samples. The level-set fields for \(t=0\) and \(t=0.4\) are visualized as surface plots in Fig. 5, together with the resulting snapshots of q as color plots. The intersection of \(\phi \) with the zero plane parameterizes the surface of the front. With increasing t, the level-set function is shifted vertically and produces an expanding front surface, which merges from three independent contours into one single front contour. The merging of the fronts admits no smooth bijective mapping between the contour lines of the front at times \(t=0\) and \(t=0.4\). This property poses difficulties for most transport-capable dimension reduction methods, because these rely on one-to-one mappings between different time or parameter instances.

Fig. 5

Graph of the auxiliary field \(\phi \) (12) in a, b and its FTR approximation in c, d. The resulting snapshots \(q=f(\phi )\) are shown as a color plot in the xy plane. The intersection with the zero level is visualized (Color figure online)

As presented in Fig. 6, the FTR approximates the dynamics with two dyadic pairs with an error smaller than 0.2%, which is expected from the two-term dyadic structure of (12). The approximation errors of the networks behave as in the case of the moving disc. The FTR-NN gives similar results to the FTR, but with a larger minimal relative error. Due to its additional depth, the NN needs only one degree of freedom to converge towards its minimal error. Topology changes of the zero level-set are nicely recovered, as illustrated in Fig. 5c and d, since the FTR approach can recover the initial auxiliary field \(\phi \) in the regions of transport.

Fig. 6

Comparison of the relative errors for the advection example with topology change using the proper orthogonal decomposition (POD), the FTR iterative thresholding algorithm (FTR), the FTR autoencoder structure (FTR-NN) and a standard autoencoder (NN)

2.5 Application to Advection–Reaction–Diffusion Systems

To motivate the FTR approach for more complex examples, we introduce the advection–reaction–diffusion PDE with a KPP reaction term

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t q = - \varvec{u}\cdot \varvec{\nabla } q + \kappa {\nabla ^{2}}q - \mu q^2(q-1)\\ q(\varvec{x},0) = q_0(\varvec{x}) \end{array}\right. }, \end{aligned}$$
(13)

on a square, two-dimensional domain \(\varOmega =[0,L]^2\) with periodic boundary conditions and time interval [0, T]. The PDE is discretized in space using 6th-order central finite differences and in time with an explicit Runge–Kutta method of 5th(4th) order [77]. In the following, we refer to the discretized system as the full order model (FOM). All simulation parameters are listed in Table 4.

Table 4 Parameters of the 2D ARD simulation of Sect. 2.5
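For illustration, a simplified version of the discretized right-hand side of (13) on a periodic grid can be sketched as follows; we use second-order central differences here for brevity, whereas the actual FOM employs 6th-order stencils:

```python
import numpy as np

def ard_rhs(q, ux, uy, kappa, mu, h):
    """Simplified FOM right-hand side of (13) on a periodic, h-spaced grid."""
    qx = (np.roll(q, -1, axis=0) - np.roll(q, 1, axis=0)) / (2 * h)
    qy = (np.roll(q, -1, axis=1) - np.roll(q, 1, axis=1)) / (2 * h)
    lap = (np.roll(q, -1, axis=0) + np.roll(q, 1, axis=0)
           + np.roll(q, -1, axis=1) + np.roll(q, 1, axis=1) - 4 * q) / h**2
    return -(ux * qx + uy * qy) + kappa * lap - mu * q**2 * (q - 1)
```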

For our test case we choose a velocity field inspired by the vortex pair example in [78]. To this end, \(\varvec{u}=\varvec{\nabla } \times \omega \) is expressed in terms of the vorticity

$$\begin{aligned} \omega (\varvec{x},t) = \omega _0 e^{-t^2 / \tau ^2} (e^{-r^2_1(t)/r^2_0}+e^{-r^2_2(t)/r^2_0}), \quad r_i(t) = \Vert \varvec{x}-\varvec{x}_i(t) \Vert _2, \end{aligned}$$
(14)

which parameterizes a moving vortex pair \(\varvec{x}_1=L(0.6-ct,0.49)\), \(\varvec{x}_2=L(0.6-ct,0.51)\), \(r_0=5\times 10^{-4}\) with an initial amplitude \(\omega _0=10^{3}\) decaying slowly in time (decay constant \(\tau =3T\)). The initial distribution of the reactant q is given by:

$$\begin{aligned} q_0(x,y)= {\left\{ \begin{array}{ll} 1, &{}{\text {for}}\quad \sqrt{(x-0.4L)^2+(y-0.5L)^2}>0.2L,\\ 0, &{}\text {else}. \end{array}\right. } \end{aligned}$$
(15)

The velocity field and the initial distribution are tuned to mimic a flame kernel interacting with a vortex pair, a common phenomenon in turbulence–flame interactions. During the simulation, the synthetic vortex pair (14) moves towards the burning gas and mixes unburned (\(q = 1\)) with burned gas (\(q=0\)), such that a small island of unburned gas detaches into the burned area, creating a topology change in the contour line of the front. We compare the FTR decomposition of the data with the POD in Fig. 7. The color plot in Fig. 7a shows the time evolution of the FOM for \(t=0.0,0.4,0.8\) in the top row. In the second and third rows, the FTR and POD are compared using \(r=6\) degrees of freedom. The POD approximation shows the typical staircasing behavior, with oscillations before and after the contour line of the front. The oscillations violate the initial range of values \(0\le q \le 1\), which is depicted as red and black areas in Fig. 7b. Even for a large number of modes (\(r=30\)) the oscillations are still visible, as shown in the profile plot in Fig. 7b. Therefore, preservation of the physical structure cannot be expected, which may lead to instabilities in the resulting ROM [56, 57, 79]. Here, the FTR approach gives much better results, restricting the approximation to the initial range of values due to the range of the sigmoid function \(f(x)\in [0,1]\).

Fig. 7

Qualitative comparison of the reconstruction errors of the 2D ARD system (13). a Shows a color plot at three different time instances \(t=0.0, 0.4, 0.8\) (left, middle, right column). The plot shows the FOM data (top row) and its reconstructions using the POD (middle row) and the FTR (bottom row). For the FTR and POD, \(r=6\) degrees of freedom are used. The color bar is chosen such that values outside the initial range \(0\le q \le 1\) are highlighted in black or red. b Shows a profile of the solution and the different approximations along the horizontal line at \(y=0.5\) for \(t=0.4\) (Color figure online)

3 Galerkin and Data-Driven Models for Moving Fronts

In the previous sections we have addressed the so-called offline stage of a model order reduction procedure, in which data is collected and its dimension is reduced. The reduced model generated by the FTR algorithm of Sect. 2.2 is nonlinear, which poses additional challenges for the online stage, i.e. for predicting and interpolating new system states. This section is therefore dedicated to online prediction methods. In Sect. 3.1 we use the non-intrusive, i.e. equation-free, approach of [59], and in Sect. 3.2 we introduce an intrusive approach, the hyper-reduced Galerkin method, for 1D and 2D ARD systems.

3.1 Data-Driven Methods

With the rise of data-driven methods in model order reduction, non-intrusive prediction methods for the reduced system, e.g. POD-DL-ROM [34], SINDy [80, 81] or Fourier–Koopman forecasting [59], have become prominent. Although these methods make specific assumptions about the system at hand, they can be useful, since they allow rapid evaluation of the reduced variables with good accuracy. This is especially beneficial if the reduced space is a nonlinear manifold, which makes any Galerkin projection approach more complex and costly, as shown in the next section.

Following the approach of [59], we can derive new system states and extrapolate in time with the help of the Fourier–Koopman framework implemented in [82]. The framework imposes the assumption that the reduced state \(\varvec{a}(t)\in \mathbb {R}^r\) is quasi-periodic in t and can thus be parameterized by:

$$\begin{aligned} \varvec{a}(t) = \textbf{A}\varvec{\varOmega }(t) \quad \text {with} \quad \varvec{\varOmega }(t)= \begin{pmatrix} \cos (\varvec{\omega }t)\\ \sin (\varvec{\omega }t) \end{pmatrix}. \end{aligned}$$
(16)

Here, \(\textbf{A}\in \mathbb {R}^{r\times p}\) and \(\varvec{\omega }\in \mathbb {R}^{p/2}\) are determined by solving the optimization problem:

$$\begin{aligned} \min _{\varvec{\omega },\textbf{A}} \sum _{n=0}^{N-1} \Vert \varvec{a}(t_n) - \textbf{A}\varvec{\varOmega }(t_n)\Vert _2^2, \end{aligned}$$
(17)

in an efficient manner [59]. Since the dynamical system presented in Sect. 2.4.1 is quasi-periodic, we can apply the method to the FTR decomposition

$$\begin{aligned} \varvec{q}(t)\approx \tilde{\varvec{q}}(t) = f(\varvec{{\Psi }}\varvec{a}(t)) \end{aligned}$$
(18)

using the basis functions \(\varvec{{\Psi }}=[\tilde{\varvec{\psi }}_1,\tilde{\varvec{\psi }}_2,\tilde{\varvec{\psi }}_3]\) shown in Fig. 4, together with the amplitudes \(\varvec{a}(t)=(a_1(t),a_2(t),a_3(t))\) at the sampled time points \(\{t_n=n\varDelta t\mid n=0,\ldots ,N-1\}\).

Remark 3

The system dynamics can be further reduced by rewriting \(f(\varvec{{\Psi }}\varvec{a}(t)) = f(\tilde{\varvec{{\Psi }}}\tilde{\varvec{a}}(t)+\varvec{b})\) with \(\varvec{b}\in \mathbb {R}^M\), \(\tilde{\varvec{a}}\in \mathbb {R}^{r-1}\), \(\tilde{\varvec{{\Psi }}}\in \mathbb {R}^{M\times (r-1)}\). The offset vector \(\varvec{b}\) then contains the time-independent part of the decomposition, shown as a constant line in Fig. 8. The same can be done for the POD.

From the sampled data we compute \(\textbf{A},\varvec{\omega }\). To test the resulting model \(\tilde{\varvec{q}}(t)=f(\varvec{{\Psi }}\textbf{A}\varvec{\varOmega }(t))\), we evaluate it at the time instances \(t_{n+1/2}=(n+1/2)\varDelta t\), \(n=0,\ldots ,2N-1\), which have not been sampled during the offline phase. Similarly, we can derive an approximation with the POD. Both results are compared in Fig. 8. Furthermore, the online prediction error is stated for \(r=2,4,6,8,10,12,15\) in Table 6.
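For fixed frequencies \(\varvec{\omega }\), the fit (17) reduces to a linear least-squares problem for \(\textbf{A}\). The following sketch illustrates this step; the joint optimization over \(\varvec{\omega }\) performed in [59, 82] is omitted, and all function names are our own:

```python
import numpy as np

def koopman_features(omega, t):
    """Feature map Omega(t) = (cos(omega t), sin(omega t)) from (16)."""
    wt = np.outer(omega, t)                        # shape (p/2, N)
    return np.vstack([np.cos(wt), np.sin(wt)])     # shape (p, N)

def fit_amplitudes(a_samples, omega, t):
    """Least-squares solution of (17) for A, with the frequencies held fixed."""
    Om = koopman_features(omega, t)                # (p, N)
    A = np.linalg.lstsq(Om.T, a_samples.T, rcond=None)[0].T   # (r, p)
    return A

def forecast(A, omega, t_new):
    return A @ koopman_features(omega, t_new)      # a(t) = A Omega(t), Eq. (16)
```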

Fig. 8

Predictions using Fourier–Koopman forecasting with three POD modes (a) and three FTR modes (b). The black circles in the upper row indicate the predictions of the amplitudes \(\varvec{a}(t)=(a_1(t),a_2(t),a_3(t))\) and the colored crosses mark the training samples. In the lower row, we show the corresponding snapshots at selected time instances \(t=0.2,0.4,0.6,0.8\) (Color figure online)

Note that after solving (17) in the offline stage, the computational effort reduces to the evaluation of \(\tilde{\varvec{q}}(t)=f(\varvec{{\Psi }}\textbf{A}\varvec{\varOmega }(t))\), which takes only milliseconds.

For a realistic test case, we apply the FTR-Fourier-Koopman procedure to the methane mass fraction \(Y_{\textrm{CH}_4}\) of one flame of a multi-slit Bunsen burner simulation analyzed and studied in [83, 84]. The snapshots are generated with a customized, weakly compressible version of rhoReactionFOAM from the OpenFOAM software package (see [84, 85]). In the simulation, the flame is periodically excited by an incoming velocity pulse. The acceleration of the fuel detaches a burning pocket, shown in Fig. 9. The data set (Footnote 1) consists of 200 snapshots with \(M = 128\times 430\) grid points, sampled in a time interval \(t\in [0.01,0.05]\) in which the Bunsen flame is quasi-periodic. Again, we split the data into training samples (\(t_n=2n\varDelta t \)) and test samples (\(t_{n+1/2}=(2 n+ 1)\varDelta t\)). While the training samples are used to generate the reduced model with Algorithm 1 and the parameters summarized in Table 2, the test samples are used to calculate the relative errors stated in Table 5. Flame pinch-off is not a special case in combustion systems, but it poses challenges to model order reduction methods, as described above. Figure 9 shows that the FTR captures the structure of the solution well and preserves the physical bound \(0\le Y_{\textrm{CH}_4}\le 1\).

Fig. 9

Online predictions of the Bunsen flame example. a compares the test data in the top row with the FTR-Koopman and POD-Koopman results using \(r=8\) degrees of freedom for \(t=0.01\) and 0.04. The snapshots show how a burning fuel pocket detaches from the flame at \(t=0.04\), causing a change in the topology of the contour line of the front. b visualizes the Fourier–Koopman predictions of the amplitudes \(\varvec{a}(t)=(a_1(t),a_2(t),a_3(t))\)

Table 5 Relative errors \(\sum _{n=0}^{2N-1}\Vert \varvec{q}(t_{n+1/2})-\tilde{\varvec{q}}(t_{n+1/2})\Vert _2^2/\sum _{n=0}^{2N-1}\Vert \varvec{q}(t_{n+1/2})\Vert _2^2\) using Fourier–Koopman (online) with POD and FTR on the Bunsen flame data
Table 6 Relative error \(\sum _{n=0}^{2N-1}\Vert \varvec{q}(t_{n+1/2})-\tilde{\varvec{q}}(t_{n+1/2})\Vert _2^2/\sum _{n=0}^{2N-1}\Vert \varvec{q}(t_{n+1/2})\Vert _2^2\) using Fourier–Koopman (online) with POD and FTR on the moving disc data

3.2 Manifold Galerkin Methods

After discretizing the ARD system (3) in space, we obtain an ODE system of the form

$$\begin{aligned} \text {(FOM)}\quad \left\{ \begin{aligned}&\dot{\varvec{q}}(t,\mu ) =\varvec{F}(\varvec{q},t,\mu )\\&\varvec{q}(0) = \varvec{q}_0, \end{aligned}\right. \end{aligned}$$
(19)

with the discretized nonlinear right-hand side (RHS) \(\varvec{F}:\mathbb {R}^M\times \mathbb {R}\times \mathbb {R}^+\rightarrow \mathbb {R}^M\). Here, the parameter \(\mu >0\) denotes the reaction constant. Using a reduced mapping

$$\begin{aligned} g:\mathbb {R}^r \rightarrow \mathbb {R}^M: \varvec{a}\mapsto g(\varvec{a}),\quad \text {with Jacobian} \quad \textbf{J}_g(\varvec{a}) = \left( {\frac{\partial g_i}{\partial a_{j}}}(\varvec{a})\right) _{\begin{array}{c} i=1,\ldots ,M\\ j=1,\ldots ,r \end{array}} \end{aligned}$$
(20)

as an approximation \(\varvec{q}\approx \tilde{\varvec{q}}=g(\varvec{a})\) of the state and plugging it into (19) yields the reduced model:

$$\begin{aligned} \varvec{r}(\dot{\varvec{a}};\varvec{a},t,\mu ) = \textbf{J}_g(\varvec{a})\dot{\varvec{a}} -\varvec{F}(g(\varvec{a}),t,\mu ), \end{aligned}$$

(21)

$$\begin{aligned} \text {(ROM)}\quad \left\{ \begin{aligned}&\dot{\varvec{a}}(t,\mu ) = \underset{\varvec{v}\in \mathbb {R}^r}{\arg \min }\,\Vert \textbf{J}_g(\varvec{a})\varvec{v} -\varvec{F}(g(\varvec{a}),t,\mu )\Vert _2^2\\&\varvec{a}(0) = \varvec{a}_0. \end{aligned}\right. \end{aligned}$$

(22)

Minimizing the continuous-time residual (21) yields the optimality condition:

$$\begin{aligned} 0&= \frac{\textrm{d}}{\textrm{d}{\dot{\varvec{a}}}}\Vert \textbf{J}_g(\varvec{a})\dot{\varvec{a}} -\varvec{F}(g(\varvec{a}),t,\mu )\Vert _2^2 \end{aligned}$$
(23)
$$\begin{aligned}&= 2 \textbf{J}_g(\varvec{a})^T\textbf{J}_g(\varvec{a})\dot{\varvec{a}} - 2\textbf{J}_g(\varvec{a})^T \varvec{F}(g(\varvec{a}),t,\mu ), \end{aligned}$$
(24)

which is uniquely solved by

$$\begin{aligned} \varvec{\dot{a}} = \textbf{J}_g^+(\varvec{a}) \varvec{F}(g(\varvec{a}),t,\mu ), \end{aligned}$$
(25)

if the Jacobian has full column rank [35]. Here, \( \textbf{J}_g^+\) is the Moore–Penrose pseudo-inverse of \(\textbf{J}_g\). In practice, the pseudo-inverse is never formed; instead, (24) is solved directly. Unfortunately, the evaluation of (24) still depends on the dimension of the FOM, since the full RHS has to be evaluated. To circumvent this, we apply the hyper-reduction explained in the following.
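A single evaluation of the reduced dynamics (25) can be sketched as follows (names are illustrative); note that the full right-hand side is still evaluated at all M grid points, which motivates the hyper-reduction discussed next:

```python
import numpy as np

def rom_rhs(a, t, mu, g, jac_g, fom_rhs):
    """Manifold-Galerkin dynamics (25): solve the optimality condition (24)
    as a least-squares problem instead of forming the pseudo-inverse."""
    q = g(a)               # lift the reduced state, q = g(a)
    J = jac_g(a)           # Jacobian J_g(a) of shape (M, r), full column rank
    F = fom_rhs(q, t, mu)  # full RHS, still O(M) without hyper-reduction
    return np.linalg.lstsq(J, F, rcond=None)[0]   # a_dot
```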

3.3 Hyper-Reduction for Moving Fronts

Apart from the slowly decaying POD approximation errors, advection–reaction–diffusion systems pose another difficulty for model order reduction. Their dynamics take place in a thin band along the front, as shown in Fig. 10. This band is usually much smaller than the size of the domain or the traveling distance of the front. Hence, the FOM-RHS and its gradient possess only few spatial grid points per time step with non-vanishing support. Therefore, the hyper-reduction methods for nonlinear manifolds [36, 54] cannot be applied. For example, the extended ECSW scheme proposed in [54] and the gappy-POD-based GNAT procedure [52], first introduced for nonlinear manifolds in [36], cannot be used here, since they preselect a set of sample points that is fixed for every time step and all \(\mu \in \mathcal {P}\). This is supported by the observations made in [55, 57], where oversampling of the hyper-reduced system is necessary to maintain the stability of the resulting ROM. In contrast, the FTR hyper-reduction approach outlined in Algorithm 2 identifies the locations of the front to reduce the computational complexity, while sustaining an accurate and stable solution. Here, we propose an idea similar to the reduced integration domain (RID) method [58] for finite elements. By imposing a threshold criterion on each finite element, RID chooses a reduced number of elements to describe a balance condition, i.e. to minimize the residual between internal and external forces. Similar to RID, we select, using a threshold search, \(M_p\) sampling points \(\varvec{s}=(s_1,\ldots ,s_{M_p})\) that are sensitive to the residual (24). This corresponds to replacing the Euclidean norm in (24) by a weighted norm:

$$\begin{aligned} 0&=\frac{\textrm{d}}{\textrm{d}{\dot{\varvec{a}}}}\Vert \textbf{J}_g(\varvec{a})\dot{\varvec{a}} -\varvec{F}(g(\varvec{a}),t,\mu )\Vert _{\textbf{P}^2_{\varvec{a}}}^2 \end{aligned}$$
(26)
$$\begin{aligned}&= 2 \textbf{J}_g(\varvec{a})^T\textbf{P}^T_{\varvec{a}}\textbf{P}_{\varvec{a}}\textbf{J}_g(\varvec{a})\dot{\varvec{a}}- 2\textbf{J}_g(\varvec{a})^T \textbf{P}^T_{\varvec{a}}\textbf{P}_{\varvec{a}} \varvec{F}(g(\varvec{a}),t,\mu ). \end{aligned}$$
(27)

Each of the \(M_p\) selected sample points corresponds to an index \(1\le s_i\le M\), represented by the \(s_i\)th standard basis vector \(\varvec{e}_{s_i}\in \mathbb {R}^M\) in the rows of the selection matrix \(\textbf{P}_{\varvec{a}}\in \mathbb {R}^{M_p\times M}\). Thus, the hyper-reduced Jacobian and right-hand side \(\textbf{P}_{\varvec{a}}\textbf{J}_g,\textbf{P}_{\varvec{a}}\varvec{F}\) are only computed at the \(M_p\) sample points. Note that the stencil size of our finite difference scheme requires computing \(f(\varvec{\phi })\) on additional supporting mesh points, collected in \(\hat{\textbf{P}}_{\varvec{a}}\in \mathbb {R}^{\hat{M}_p\times M}\). In practice, \(\hat{\textbf{P}}_{\varvec{a}}f(\varvec{\phi }),\textbf{P}_{\varvec{a}}\textbf{J}_g,\textbf{P}_{\varvec{a}}\varvec{F}\) are not computed as matrix products, but as pointwise evaluations of \(f(\varvec{\phi }),\textbf{J}_g,\varvec{F}\) at the corresponding sample points. To be precise, in line 5 of Algorithm 2 we only iterate over the indices \(\varvec{s}\) when computing \(\varvec{b}\in \mathbb {R}^{M_p}\); the Jacobian in line 6 of Algorithm 2 is computed similarly.

Algorithm 2

Hyper-reduced RHS

In contrast to RID, the selection matrix \(\textbf{P}_{\varvec{a}}:\mathbb {R}^r\rightarrow \mathbb {R}^{M_p\times M}\) depends on the state \(\varvec{a}(t,\mu )\) and therefore evolves over time (see Fig. 10). However, similar to what RID does for external forces, we have to include all nodes, i.e. sample points, at which the right hand side does not vanish. If all such points are sampled, the two norms are exactly equal:

$$\begin{aligned} \Vert \textbf{J}_g(\varvec{a})\dot{\varvec{a}} -\varvec{F}(g(\varvec{a}),t,\mu )\Vert _{2}^2=\Vert \textbf{J}_g(\varvec{a})\dot{\varvec{a}} -\varvec{F}(g(\varvec{a}),t,\mu )\Vert _{\textbf{P}_{\varvec{a}}}^2. \end{aligned}$$
(28)

Since for the FTR \(\varvec{F}\) is non-vanishing only near the front, i.e. near the roots of the level-set function, we can define \(\textbf{P}_{\varvec{a}}\) by a time-dependent adaptive thresholding. Starting from the sorted entries \(\left| \phi _{s_1} \right| \le \left| \phi _{s_2} \right| \le \cdots \le \left| \phi _{s_{M_p}} \right| \le \left| \phi _{s_{M_p+1}} \right| \le \cdots \le \left| \phi _{s_M} \right| \) of the level-set vector \(\varvec{\phi }=\varvec{{\Psi }}\varvec{a}\in {\mathbb {R}}^M\), the threshold search selects the first \(M_p\) indices \(\varvec{s}=(s_1,\ldots ,s_{M_p})\in \{1,\ldots ,M\}^{M_p}\). These are the indices at which we evaluate \(\hat{\textbf{P}}_{\varvec{a}}f(\varvec{\phi }),\textbf{P}_{\varvec{a}}\textbf{J}_g,\textbf{P}_{\varvec{a}}\varvec{F}\). The resulting sample points are visualized in Fig. 10 for two time instances of the 2D ARD system of Sect. 2.5. Note that the threshold search is a heuristic for cheaply minimizing the residual (24) via relation (28).
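A sketch of the adaptive threshold sampling; the stencil enlargement shown here is a periodic 1D simplification of what \(\hat{\textbf{P}}_{\varvec{a}}\) does on our grids:

```python
import numpy as np

def threshold_sampling(Psi, a, M_p, stencil=3):
    """Select the M_p grid points where |phi| = |Psi @ a| is smallest,
    i.e. the points closest to the zero level set marking the front."""
    phi = Psi @ a
    s = np.argsort(np.abs(phi))[:M_p]      # sample indices s_1, ..., s_{M_p}
    # Enlarge by the finite-difference stencil neighbours (rows of P_hat_a),
    # shown here for a periodic 1D grid with M points.
    M = phi.size
    s_hat = np.unique((s[:, None] + np.arange(-stencil, stencil + 1)) % M)
    return s, s_hat
```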

Fig. 10

Color plot of the right hand side \(\varvec{F}(\varvec{q},t,\mu )\) of the 2D advection–reaction–diffusion system (13) at two time instances, \(t=0.07\) (left) and \(t=2.85\) (right), with the corresponding sample points for a sample fraction of \(M_p/M=0.1\). The inset in the right panel shows a close-up of the front location (Color figure online)

Further, note that in this work we use explicit time integration schemes, as is common in finite difference solvers. The speedups of the aforementioned methods [36, 54] are therefore not directly comparable, since those works benchmark the ROM against a FOM with implicit time integration. Nevertheless, applying implicit integration schemes during the online phase may benefit the stability of the resulting ROM. A promising and efficient method for explicit time integration schemes was proposed in [3] for reaction–diffusion systems in one spatial dimension. Although that framework cannot cope with topological changes, since it relies on a smooth parameterization of the transport, the authors report speedups of up to a factor of 130.

In the following section, we show numerical examples that use the hyper-reduction approach presented here.

3.4 Numerical Examples

In this section, we numerically investigate the applicability of our framework. To this end, we define the offline and online errors:

$$\begin{aligned} \text {offline/online err} = \frac{\Vert \textbf{Q}^\text {train/test}-\tilde{\textbf{Q}}^\text {train/test} \Vert _\textrm{F}}{\Vert \textbf{Q}^\text {train/test} \Vert _\textrm{F}}. \end{aligned}$$
(29)

Here, \(\textbf{Q}\in \mathbb {R}^{M\times (N_tN_{\mathcal {P}})}\) is the snapshot matrix containing in its columns all snapshots for the \(N_t\) time and \(N_\mathcal {P}\) parameter instances \(\mu \in \mathcal {P}\). The superscript "train" ("test") refers to the snapshots for \(\mu \in \mathcal {P}^\text {train}\) (\(\mathcal {P}^\text {test}\)) computed during the offline (online) phase.

The approximation \(\tilde{\textbf{Q}}^\text {train}\) is therefore either the reconstruction of the training data using the FTR ansatz (Algorithm 1) or, in the case of the POD, the projection onto the first r left singular vectors of \(\textbf{Q}^\text {train}\) contained in \(\textbf{U}\in \mathbb {R}^{M\times r}\), i.e. \(\tilde{\textbf{Q}}^\text {train}=\textbf{U}\textbf{U}^T\textbf{Q}^\text {train}\).

\(\tilde{\textbf{Q}}^\text {test}\) refers to the results of evaluating the ROM (21) over the given time interval and parameters \(\mu \in \mathcal {P}^\text {test}\) using the reduced mapping \(g:\mathbb {R}^r \rightarrow \mathbb {R}^M\). Specifically, the dynamical ROM predictions use \(g(\varvec{a})=\textbf{U}\varvec{a}\) as the reduced mapping in the case of the POD, whereas \(g(\varvec{a}) = f(\varvec{\Psi }\varvec{a})\) for the FTR.

Furthermore, we define the projection error:

$$\begin{aligned} \text {proj. err} = \frac{\Vert \textbf{Q}^\text {test}-\tilde{\textbf{Q}}_*^\text {test} \Vert _\textrm{F}}{\Vert \textbf{Q}^\text {test} \Vert _\textrm{F}} \end{aligned}$$
(30)

where \(\tilde{\textbf{Q}}_*^\text {test}\) is the best fit of \(\textbf{Q}^\text {test}\) with the help of the mapping g.
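In code, both error measures reduce to relative Frobenius norms; a minimal sketch, assuming a snapshot matrix `Q_train` and a rank `r` are given:

```python
import numpy as np

def rel_fro_err(Q, Q_tilde):
    """Relative Frobenius-norm error, cf. Eqs. (29)-(30)."""
    return np.linalg.norm(Q - Q_tilde) / np.linalg.norm(Q)

# POD baseline: project the snapshots onto the first r left singular vectors.
U, _, _ = np.linalg.svd(Q_train, full_matrices=False)
U_r = U[:, :r]
offline_err_pod = rel_fro_err(Q_train, U_r @ (U_r.T @ Q_train))
```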

3.4.1 Reaction–Diffusion System in 1D

First, we test our approach on an analytical test case adapted from [68]. It is based on the one-dimensional scalar nonlinear reaction–diffusion equation

$$\begin{aligned} \partial _t q = \partial _{xx} q + \frac{8}{\mu ^2} q^2(1-q), \quad (t,x) \in [0,1]\times [-15,15], \end{aligned}$$
(31)

with corresponding analytical solution

$$\begin{aligned} q(x,t,\mu )=f\left( \frac{|x|-2t/\mu -2}{\mu }\right) , \end{aligned}$$
(32)

given that \(f(x)=\frac{1}{2}(1-\tanh (x))\), so that the front has width \(\sim \mu \) and travels with speed \(2/\mu \). We discretize (31) with central finite differences of 6th order and periodic boundary conditions. Furthermore, we use an explicit Runge–Kutta method of 5th(4th) order for adaptive time stepping of the FOM and ROM ODE systems [77]. The numerical parameters for the FTR decomposition and the discretization are stated in Table 7.
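To make the setup concrete, a minimal sketch of the FOM under this reading of (31)–(32); grid size and tolerances are illustrative, not the values of Table 7:

```python
import numpy as np
from scipy.integrate import solve_ivp

M, mu = 2048, 0.3
x = np.linspace(-15, 15, M, endpoint=False)
h = x[1] - x[0]
# 6th-order central finite-difference weights for d^2/dx^2 on a periodic grid
w = np.array([1/90, -3/20, 3/2, -49/18, 3/2, -3/20, 1/90]) / h**2

def rhs(t, q):
    q_xx = sum(c * np.roll(q, -k) for c, k in zip(w, range(-3, 4)))
    return q_xx + 8 / mu**2 * q**2 * (1 - q)

f = lambda z: 0.5 * (1 - np.tanh(z))
q0 = f((np.abs(x) - 2) / mu)                     # analytical solution at t = 0
sol = solve_ivp(rhs, (0, 1), q0, method='RK45',  # adaptive Runge-Kutta 5(4)
                rtol=1e-8, atol=1e-10)
q_exact = f((np.abs(x) - 2 / mu - 2) / mu)       # analytical solution at t = 1
```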

Table 7 Parameters of the 1D reaction–diffusion simulations and the decomposition procedure (Algorithm 1)

The training data consists of 202 snapshots, 101 for each training parameter \(\mu \in \mathcal {P}^\text {train}=\{0.2,1\}\). The training data is visualized as a color plot in Fig. 11, together with the FTR-ROM prediction for \(r=3\) and \(\mu =\mu _\text {test}=0.3\) in (31). The FTR algorithm (Algorithm 1) is run for 8000 steps using \(\tau = 4\) and truncation ranks \(1<r<10\). After computing the reduced mapping \(\varvec{q}(t,\mu )=f(\varvec{{\Psi }}\varvec{a}(t,\mu ))\) from the training set, we obtain the starting values \(\varvec{a}(0,\mu ),\ \mu \in \mathcal {P}^\text {test}\), for the ROM (21) by minimizing the initial-condition residual (22) with Gauss–Newton iterations [86]. As an initial guess for the minimization, we interpolate the set of training points \(\{(\mu ,\varvec{a}(0,\mu ))\mid \mu \in \mathcal {P}^\text {train}\}\) at the given test parameter \(\mu \in \mathcal {P}^\text {test}\). Thereafter, the ROM solution for all test parameters \(\mu \in \mathcal {P}^\text {test}\) is compared to the analytical solution (32). The resulting online errors are reported in Table 8, together with the offline and projection errors; the online and projection errors are accumulated over the time interval [0, 1] and all parameters \(\mu \in \mathcal {P}^\text {test}=\{0.3,0.4,\ldots ,0.9\}\). Table 8 also compares the results with the POD-Galerkin approach, whose starting values are simply given by the orthogonal projection of \(\varvec{q}(0,\mu )\) onto the POD modes. Remarkably, the FTR outperforms the POD by two orders of magnitude.
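For illustration, the initial-condition fit can be written with a Gauss–Newton-type least-squares solver; a self-contained sketch with synthetic stand-ins for the FTR modes and the test initial condition:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
M, r = 400, 3
Psi = rng.standard_normal((M, r))               # stands in for the FTR modes
f_front = lambda phi: 0.5 * (1 - np.tanh(phi))  # front function f
q0 = f_front(Psi @ np.array([1.0, -0.5, 0.2]))  # synthetic initial condition
a0_interp = np.zeros(r)                         # interpolated initial guess

# Fit a(0, mu_test) such that f(Psi @ a) matches q0; 'lm' (Levenberg-
# Marquardt) behaves like a damped Gauss-Newton iteration here.
fit = least_squares(lambda a: f_front(Psi @ a) - q0, a0_interp, method='lm')
a0 = fit.x
```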

Fig. 11

Training and test data of the reaction–diffusion system (31), together with the FTR-ROM prediction using \(r=3\) degrees of freedom

Next, we investigate whether the gain in precision translates into speedups. To this end, we study the performance of the hyper-FTR (Algorithm 2) and compare it to POD-DEIM (see Appendix B for details). Figure 12 compares CPU time and error for \(M_p=0.1\,M,\ 0.2\,M,\ 0.5\,M\) and \(M\) sample points, where M is the number of grid points, i.e. the dimension of the FOM. The figure indicates that even without hyper-reduction, speedups are achieved compared to the FOM, due to larger step sizes in the reduced coordinates. Reducing the sample fraction from \(M_p/M=1\) to 0.2 yields an additional speedup in CPU time. Below \(0.1\,M\) sample points, however, the solution becomes unstable, which can force additional time steps and make the overall simulation slower. Figure 12 shows that over the accumulated parameter range the POD-DEIM approach is superior to the presented hyper-reduced FTR-ROM, although we expect larger errors when comparing single trajectories for the smallest \(\mu \), i.e. the steepest fronts.

Table 8 Offline, online and projection errors for FTR and POD
Fig. 12

Error versus CPU time for the accumulated parameter range \(\mu \in \mathcal {P}^\text {test}\). Different ranks r are indicated as (r) near the markers. The dashed line indicates the CPU time needed for solving the FOM. The sampled fraction \(M_p/M\) of the hyper-reduced FTR-ROM is given in terms of the size M of the FOM. The POD-DEIM approach of [45] is shown for reference; there, (r) denotes the number of DEIM points p, which equals the number of POD modes, \(r=p\)

3.4.2 Advection–Reaction–Diffusion System in 2D

Finally, we test the online performance of the hyper-FTR on the advection–reaction–diffusion example (13) introduced in Sect. 2.5. We build the training/testing data \(\textbf{Q}^\text {train}\) from 101 equally spaced snapshots per parameter (visualized in Fig. 7) with \(t\in [0,3]\), using \(\mu \in \mathcal {P}^\text {train}=\{10,30,50,70,100\}\) and \(\mu \in \mathcal {P}^\text {test}=\{20,40,60,80,90\}\), respectively. The online, offline, and projection errors of the test case are shown in Fig. 13, together with the speedup generated by the hyper-reduction scheme. The FTR clearly outperforms the POD with respect to offline and online errors. Furthermore, the hyper-reduction strategy yields speedups with moderate online errors. Note that reducing the integration domain to about 10% of its original size (see sample points in Fig. 10) does not affect the online error, as can be seen from Fig. 13a. Unfortunately, for hyper-reduced systems with \(M_p/M<0.1\) the resulting ROM becomes unstable in the tested parameter range. Figure 13b shows that for small r the additional cost (\(\mathcal {O}(rM)\)) of the matrix multiplication \(\varvec{\phi }(t,\mu )=\varvec{{\Psi }}\varvec{a}(t,\mu )\) is negligible compared to the evaluation of \(\varvec{F}\). However, as soon as r becomes large, the speedups of the hyper-reduction scheme are offset by the computation of \(\varvec{\phi }\) inside Algorithm 2. The balance point at which this additional cost cancels the savings in the RHS is problem dependent, but computing the sample points from \(\varvec{\phi }\) remains a bottleneck of the method, since it scales with the FOM dimension. Nevertheless, for more complex examples like combustion systems or 3D ARD systems, the outlined hyper-reduction approach will benefit from a computationally more expensive RHS, which shifts the balance point towards a higher number of modes.

Fig. 13

Hyper-FTR results: a relative errors defined in Eqs. (29) and (30) for \(\mathcal {P}^{\text {test}/\text {train}}\) using POD and FTR decomposition. b Speedup versus degrees of freedom for the accumulated parameter range \(\mu \in \mathcal {P}^\text {test}\). The speedups are compared for different numbers of sampled grid points \(M_p\le M\). The full order model (FOM) using \(M=512^2\) grid points is marked with a dashed line

4 Discussion and Conclusion

In this work, we have introduced the front transport reduction (FTR) method to decompose and simulate complex moving fronts. The decomposition parameterizes moving fronts with the help of a transport-dependent auxiliary field \(\varvec{\phi }\) and a function f that approximates the front profile. Two decomposition algorithms have been proposed, based on singular value thresholding (Algorithm 1) and on artificial neural networks (Sect. 2.3). Both are purely data-driven, since they only require a set of snapshots \(\varvec{q}(t,\mu )\in \mathbb {R}^M\) of the FOM as input. The resulting approximation \(\varvec{q}(t,\mu )\approx f(\varvec{\phi }(t,\mu ))\) is well suited for model order reduction of reacting fronts, since \(\varvec{\phi }(t,\mu )=\varvec{{\Psi }}\varvec{a}(t,\mu )\in \mathbb {R}^M\) can be represented by a few (\(r\ll M\)) spatial modes collected in \(\varvec{{\Psi }}\in \mathbb {R}^{M\times r}\).

We emphasize that the utilized front structure is inherent to advection–reaction–diffusion (ARD) systems (see for example [61,62,63,64]). Making explicit use of this physical structure has advantages over other linear and nonlinear dimension reduction methods, as we discuss in the following. It was shown that for various ARD systems the FTR requires fewer modes than the proper orthogonal decomposition (POD) to decompose the input snapshots, i.e. it has better compression quality. Regarding artificial autoencoder networks, the FTR is similar in the sense that its decoder is a linear layer activated by a problem-dependent nonlinear front function. Other authors [36] use multiple nonlinearly activated layers \(\varvec{q}\approx f(f\ldots f(\varvec{a}))\), resulting in costly evaluations of the network itself, which can limit the overall performance of the ROM. Furthermore, autoencoder networks are often difficult to tune and require training on GPUs. Similar to autoencoder networks, the FTR can approximate topological changes in the evolution of the front's contour line, since it makes no explicit assumptions on the mapping. Such topological changes cause problems for most transport compensating methods [16,17,18,19,20,21,22,23,24], because these rely on transformations that are smooth bijections on the simulation domain.

The ability of the FTR to predict new system states has been demonstrated for non-intrusive (Sect. 3.1) and intrusive (Sect. 3.2) ROMs. Since the FTR gives additional insight into the underlying structure (the transport field \(\varvec{\phi }\)), we can exploit this information when predicting new system states. As an example, we heuristically reduced the integration domain during the online evaluation of the Galerkin-projected ODE system, using the knowledge of \(\varvec{\phi }\). This can be seen as an adaptive version of the reduced integration domain method [58]. Other nonlinear hyper-reduction methods preselect a fixed set of sample points on which the dynamics are evaluated; since for the studied systems only sample points close to the front are relevant for the dynamics, such methods may fail. Although the outlined hyper-reduction procedure yields speedups in CPU time, it needs a substantially larger number of sample points \(M_p\) than the dimension of the ROM, \(r\ll M_p\ll M\). The construction of more efficient hyper-reduction schemes is therefore left for future research.

To apply our findings to more complex advection–reaction–diffusion systems, such as combustion systems in fluid mechanics with multiple reacting species, the decomposition has to be extended to allow arbitrary traveling front shapes. Here, the FTR method would benefit from a generalization or a combination with the shifted POD [18], as this would allow the decomposition of multiple-traveling-wave systems with topological changes. Furthermore, it would be interesting to see whether our approach can be applied to multi-phase flows, as they exhibit a similar front structure separating the fluids. Here, the similarity to level-set-based methods like the characteristic mapping method [87] should be exploited.