Front Transport Reduction for Complex Moving Fronts

This work addresses model order reduction for complex moving fronts, which are transported by advection or through a reaction–diffusion process. Such systems are especially challenging for model order reduction, since the transport cannot be captured by linear reduction methods. Moreover, topological changes, such as the splitting or merging of fronts, pose difficulties for many nonlinear reduction methods, and the small, non-vanishing support of the underlying partial differential equations' dynamics makes most nonlinear hyper-reduction methods infeasible. We propose a new decomposition method together with a hyper-reduction scheme that addresses these shortcomings. The decomposition uses a level-set function to parameterize the transport and a nonlinear activation function that captures the structure of the front. This approach is similar to autoencoder artificial neural networks, but additionally provides insights into the system, which can be used for efficient reduced-order models. In addition to the presented decomposition method, we outline a tailored hyper-reduction method based on the reduced integration domain method. The capability of the approach is illustrated by various numerical examples in one and two spatial dimensions, including an advection–reaction–diffusion system with a Kolmogorov–Petrovsky–Piskunov reaction term and a real-life application to a two-dimensional Bunsen flame.


Introduction
This article addresses model order reduction for reactive flows. These flows often exhibit sharp fronts, like flames, which makes their simulation computationally expensive. This suggests applying model order reduction to reduce simulation costs. However, classical model order reduction methods fail [1] due to the sharp, moving fronts, which pose challenges for reducing and predicting new system states. This manuscript addresses these issues by presenting a new decomposition method together with efficient strategies to evaluate the dynamics of the reduced system. For our study we use advection-reaction-diffusion (ARD) systems with a nonlinear Kolmogorov-Petrovsky-Piskunov (KPP) reaction term, as the complex front dynamics with topology changes of such systems feature essential difficulties for MOR, while the analysis is simplified because the reacting quantity is scalar and bounded.
Model order reduction (MOR) has been studied for various ARD systems [2][3][4][5][6]. In this study we focus on systems that exhibit locally one-dimensional traveling fronts. The compact support of the moving fronts is challenging for linear reduced basis methods, such as the proper orthogonal decomposition (POD). The POD approximates a set of snapshots q(x, t_i), i = 1, ..., N_t by separation of variables

q(x, t) ≈ ∑_{k=1}^{r} â_k(t) ψ̂_k(x),   (1)

with the help of time amplitudes â_k(t) and spatial modes ψ̂_k(x), computed by a singular value decomposition (SVD). Unfortunately, moving fronts with sharp gradients significantly slow down the convergence of Eq. (1). This has been numerically investigated for reactive flows [1] and is theoretically quantified with the help of the Kolmogorov n-width in [7,8].
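For concreteness, the following is a minimal NumPy sketch of the POD approximation Eq. (1) via a truncated SVD; the function and variable names are ours, not part of the original implementation.

```python
import numpy as np

def pod_approximation(Q, r):
    """Approximate the snapshot matrix Q (M x Nt) with r POD modes.

    Returns the rank-r reconstruction together with the spatial modes
    (left singular vectors) and their time amplitudes.
    """
    # Thin SVD of the snapshot matrix: Q = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    modes = U[:, :r]                       # spatial modes psi_k(x)
    amplitudes = np.diag(s[:r]) @ Vt[:r]   # time amplitudes a_k(t)
    return modes @ amplitudes, modes, amplitudes
```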
The convergence can be improved by compensating the transport, for which many authors use one-to-one mappings [9][10][11][12][13] to align the front onto a reference frame in which the moving front is stationary. This allows the temporal variation of the front shape to be decomposed efficiently into few basis functions. On the downside, however, it assumes that the transport-dependent movement is known [9,11] or is at least a sufficiently smooth function in time [10], which is easy to parametrize and itself independent of x. Unfortunately, for complicated transports, where fronts may split or merge, this approach does not work, because no smooth one-to-one mapping exists.
In this work, we therefore follow a more direct approach, in which we make use of an auxiliary field φ(x, t), which parameterizes the transport efficiently, together with a shape function f to retain the front shape:

q(x, t) ≈ f(φ(x, t)).   (2)

The auxiliary field φ : R^d × [0, T] → R allows us to embed the locally one-dimensional front movement into a d-dimensional transport. Since the transport is only parameterized locally, changes in the topology of the front surface can be captured. A similar approach was introduced in [4], where φ was constructed with the help of a signed distance function and the front function f was determined from a fit to the reacting front. While improving the approximation, this was found to be not optimal, since the obtained signed distance function does not have to be of low rank. Here, we follow a similar approach, but we formulate an optimization problem to compute φ. The resulting description Eq. (2) is called Front Transport Reduction (FTR) in the remainder of this manuscript. Due to the nonlinearly activated linear space created by the span of {ψ_k(x)}_{k=1,...,r}, this approach shows many parallels to artificial neural networks. It can be seen as the decoder part of a shallow autoencoder structure. While shallow autoencoders have been used in previous studies [14], we explicitly incorporate the underlying physical assumptions and thereby obtain interpretable reduced variables.
The second part of this manuscript addresses dynamical ROM predictions of ARD systems using the FTR ansatz Eq. (2). Many different methods exist in the literature, which can be categorized into intrusive and non-intrusive reduced order models. Intrusive models base their predictions on the original ARD model, whereas purely data-driven, non-intrusive models rely on additional assumptions such as smoothness in the reduced parameter space. Intrusive models project the original equation system onto the reduced manifold, which is nonlinear in our approach. These so-called manifold Galerkin methods have been used in combination with neural networks in [15] and with dynamically transformed modes in [16]. Unfortunately, manifold Galerkin methods require special hyper-reduction schemes to gain speedups in the resulting ROM. Examples of these methods are the extended-ECSW scheme proposed by [17], the gappy-POD based GNAT procedure [15] first introduced for nonlinear manifolds in [14], or the shifted DEIM algorithm in [3]. The idea of all of these methods is to evaluate the nonlinear dynamics of the underlying system at a small number of points to determine the evolution of the parameters in the reduced space. Unfortunately, the extended-ECSW scheme [17] and the GNAT procedure [14,17] cannot be used for ARD systems with sharp advected fronts, since they preselect a fixed set of points, but the dynamics are localized only near the moving front. We discuss this problem and state a practical solution using a special hyper-reduction scheme, based on the reduced integration domain (RID) method [18]. Furthermore, we examine the ability of the FTR mapping to predict new system states with the help of non-intrusive methods.

Structure of the Article
The remainder of the article is structured into three parts. The first part, Section 2, is dedicated to the FTR decomposition, where we motivate the decomposition and introduce two algorithms to solve the corresponding high dimensional optimization problem. While the first algorithm is based on iterative thresholding of singular values (see Section 2.2), the other algorithm uses artificial neural networks (see Section 2.3). The algorithms are applied and compared for two synthetic examples in Section 2.4 and one example of a two-dimensional (2D) ARD system with topology change in Section 2.5. In the second part, Section 3, we use the low dimensional description of the FTR to predict new system states via non-intrusive MOR (Section 3.1) and intrusive MOR (Section 3.2). For the latter we propose a special hyper-reduction method. The resulting ROM is tested for 1D and 2D ARD systems in Section 3.4. Finally, we summarize our results in the last part, Section 4.

Nomenclature
Matrices are denoted by capital letters in straight, bold font, A ∈ R^{M×M}, and vectors are denoted by x ∈ R^M. Whenever a scalar function f : R → R is applied to a vector-valued quantity x, we assume pointwise operation on the entries of x, and similarly for matrices f(A). Furthermore, if q(x, t, µ) ∈ R is the solution of a scalar PDE, its (discrete-space) ODE counterpart is denoted by a vector q(t, µ) ∈ R^M containing the spatial values of q in its components. Correspondingly, the snapshot matrix Q contains all time and parameter snapshots in its columns: Q = [q(t_1, µ_1), q(t_2, µ_1), ..., q(t_{N_t}, µ_P)]. Partial derivatives in space and time are denoted by ∂_x and ∂_t, respectively.

Dimension Reduction Methods for Complex Moving Reaction Fronts
In this section, we motivate why special nonlinear reduction methods are advantageous when decomposing reactive flows, and we introduce the Front Transport Reduction as an iterative thresholding algorithm in Section 2.2 and as an autoencoder network with one decoder layer in Section 2.3.

2.1 The Need for a Nonlinear Decomposition Approach for Moving Fronts
To motivate our decomposition approach, we consider advection-reaction-diffusion systems of the form

∂_t q + u · ∇q = κ∆q + R(q, γ).   (3)

These systems describe how a quantity or reactant q(x, t) spreads in space. This spread can be caused by advection with velocity u ∈ R^d or by an interplay between diffusion ∆q and reaction processes R(q, γ). For the sake of simplicity, we focus on the reaction-diffusion process described by a nonlinear Kolmogorov-Petrovsky-Piskunov (KPP) reaction term R(q, γ) = γq^α(q − 1) with α > 0, γ > 0.
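As an illustration, here is a minimal 1D sketch of the discretized right-hand side of Eq. (3) with the KPP reaction term. The second-order stencils and all names are our simplifications for brevity; the FOM used later in the paper employs 6th-order central differences.

```python
import numpy as np

def kpp_reaction(q, gamma, alpha=1.0):
    # KPP-type reaction term R(q, gamma) = gamma * q**alpha * (q - 1)
    return gamma * q**alpha * (q - 1.0)

def ard_rhs_1d(q, t, u, kappa, gamma, dx):
    """Right-hand side of the 1D ARD equation
    dq/dt = -u dq/dx + kappa d2q/dx2 + R(q, gamma)
    with periodic boundaries and second-order central differences."""
    dqdx = (np.roll(q, -1) - np.roll(q, 1)) / (2.0 * dx)
    d2qdx2 = (np.roll(q, -1) - 2.0 * q + np.roll(q, 1)) / dx**2
    return -u * dqdx + kappa * d2qdx2 + kpp_reaction(q, gamma)
```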
These systems exhibit traveling or pulsating fronts [19][20][21][22], and without loss of generality one can assume that d = 1 near the front, since the moving structures are locally one-dimensional [23][24][25]. Therefore, the solution of Eq. (3) can be transformed into a co-moving frame, in which the front profile of the traveling wave is described by f and φ(x, t) = (x − ∆(x, t)) · e_v, the location of the front with respect to the direction e_v = v/‖v‖ of the wave speed v. For a one-dimensional traveling wave this is illustrated in Fig. 2a. The profile of the wave f can be computed analytically with the help of perturbation theory after transforming Eq. (3) into the co-moving frame (see for example [26]), or by fitting q(x, t) = f(x − ∆(t)) to the front profile [4]. However, the wave speed v is the most complex part in typical applications, since it is coupled to an outer transport/velocity field u in Eq. (3) and an additional constant propagation speed c* of the reacting wave, which depends on R(q, γ) (i.e. minimal propagation speed c* ≥ 2√(κR′(0, γ)) for KPP nonlinearities [19][20][21]).

Figure 1: Relationship between characteristic length scale l_f and approximation error of the POD. Figure 1a shows a wave front f(x) = sigmoid(x/l_f) with front width l_f traveling over a distance L with constant speed c = L/T and shift ∆(t) = ct. Figure 1b shows the decay rate ‖q − q̃_n‖ ∼ e^{−βn} as a function of the relative front width l_f/L, when approximating q with n POD modes.
Dimensional analysis yields a definition of the thickness of the propagating front in terms of the ratio of diffusion and propagation speed, l_f ∼ κ/c* (5). This characteristic length scale of the system is shown in Fig. 1. In a linear projection-based MOR approach, l_f plays an essential role, because its length is directly related to the success of the approximation. As already pointed out by [7,8], for transport systems with vanishing front width (l_f → 0) every front position is linearly independent of the others and therefore equally important when defining a projection basis. Therefore, the typical exponential decay ‖q − q̃_n‖ ∼ e^{−βn} of the approximation error is reduced to ∼ n^{−1/2} when increasing the dimension of the ROM basis n. Since the authors of [7,8] give no general results for l_f > 0, we quantify the decay of the approximation errors numerically in Fig. 1. It can be seen from Fig. 1b that the error decay rate per POD mode diminishes if the traveling distance L becomes large relative to the front width l_f, making a linear MOR approach impractical. In order to compensate the transport, many studies use one-to-one mappings [9][10][11][12][13], which cannot be used here, since reacting fronts may split or merge.
Here, we thus make explicit use of the underlying physical structure of ARD systems, Eq. (4). For given snapshot data Q ∈ R^{M×N_t} with Q_ij = q(x_i, t_j) ∈ [0, 1] and front function f : R → [0, 1], the approach decomposes the data with the help of the nonlinear mapping Q ≈ f(Φ) and a low-rank field Φ_ij = φ(x_i, t_j), which allows us to embed the locally one-dimensional front movement into a d-dimensional transport. The idea is visualized in Fig. 2. Since the transport is only parameterized locally, changes in the topology of the front surface can be captured. The decomposition goal is formulated as an optimization problem.
Problem 1 (Front Transport Reduction). For a given snapshot matrix Q ∈ R^{M×N_t} with Q_ij = q(x_i, t_j) ∈ [0, 1] and a nonlinear, smooth, monotonically increasing function f : R → [0, 1], find a rank-r matrix Φ ∈ R^{M×N_t} such that the error ‖Q − f(Φ)‖_F is minimized. Two possible algorithms that solve the optimization Problem 1 are provided in the following sections.

Front Transport Reduction via Iterative Thresholding of Singular Values
A simple iterative algorithm to determine the auxiliary field Φ ∈ R^{M×N_t} of the front transport reduction Problem 1 is stated in Algorithm 1.

Figure 2: Illustration of the basic idea of the front transport reduction method. Figure 2a: The FTR replaces the sharp traveling front structure q (blue curves) by a level-set function φ (orange lines) and a nonlinear mapping f (indicated by the red arrow). Both quantities share locally the same transport. However, the level-set field φ(x, t) = x − ∆(t) is of low rank and can therefore be parameterized with only a few POD basis functions (here: {x, 1}). Figure 2b: The generated snapshot data Φ_ij = φ(x_i, t_j) can be approximated efficiently with the POD, compared to Q_ij = q(x_i, t_j).
Our algorithm is constructed by combining a gradient descent step (line 4) to minimize ‖Q − Q̃‖²_F with a rank-r projection step of Φ (line 5). In the gradient descent step, the FTR residual is decreased in the direction of the simplified gradient Q − f(Φ). Neglecting f′(Φ) in the gradient prevents a dying gradient at points where f′(Φ) → 0, i.e. |Φ_ij| ≫ 0. Note that replacing the simple gradient descent step by a quasi-Newton method or a line search would not affect the convergence rate, since it is followed by a projection step (line 5), which is likely to destroy the possibly larger step of a more sophisticated method.
Algorithm 1 (FTR as iterative thresholding): perform a gradient step on ‖Q − f(Φ^k)‖²_F (line 4), then decompose and truncate, Φ^{k+1} = svd(Φ^{k+1/2}, r) (line 5), until convergence. The computational costs of Algorithm 1 scale with the complexity of the singular value decomposition (SVD). For large systems it can be advantageous to use randomized or wavelet techniques [27,28] to compute the SVD.
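A minimal NumPy sketch of Algorithm 1 follows. The step size lam, the zero initialization, and the fixed iteration count are our assumptions; the gradient step uses the simplified gradient (with f′(Φ) dropped) described above.

```python
import numpy as np

def ftr_thresholding(Q, f, rank, lam=1.0, n_iter=500):
    """Sketch of Algorithm 1 (FTR as iterative thresholding).

    Alternates a gradient step on ||Q - f(Phi)||_F^2 with a rank-r
    truncation of Phi via the SVD.
    """
    Phi = np.zeros_like(Q)
    for _ in range(n_iter):
        # simplified gradient step (line 4): move towards the data
        Phi_half = Phi + lam * (Q - f(Phi))
        # decompose and truncate (line 5): rank-r SVD projection
        U, s, Vt = np.linalg.svd(Phi_half, full_matrices=False)
        Phi = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
    return Phi
```

For a sigmoid-type front, one would call this e.g. with f = lambda x: 1.0 / (1.0 + np.exp(-x)).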

Front Transport Reduction via Neural Autoencoder Networks
Another way to solve the optimization Problem 1 is with the help of neural autoencoder networks, which are commonly used for dimensionality reduction [29]. For a general introduction to neural autoencoder networks, we refer to [30]. Here, we briefly explain the concept and the specifications of our network.
An autoencoder tries to reproduce the input data while squeezing it through an informational bottleneck. It consists of two parts: the encoder g_enc : R^M → R^r, q ↦ a = g_enc(q), mapping the input data q onto points a in a learned lower dimensional latent space, and the decoder g_dec : R^r → R^M, a ↦ g_dec(a) = q̃, mapping the latent representation back to the input space.
The composition of the two parts q̃ = g_dec(g_enc(q)) defines the autoencoder. The task of the optimization procedure is to determine g_dec, g_enc such that the reconstruction error over the training data Q = [q_1, ..., q_{N_t}],

L_FTR = ∑_{i=1}^{N_t} ‖q_i − g_dec(g_enc(q_i))‖²_2,

is minimized. After the network has been trained, the reduction is achieved, as the dimension r ≪ M of the latent variables a_i = g_enc(q_i) ∈ R^r is much smaller than the input dimension M. Therefore, the decoder q_i ≈ g_dec(a_i) represents a reduced map of the high dimensional data contained in the columns of Q.
In the training procedure, the functions g_enc, g_dec are determined by trainable parameters of the network, called weights and biases. The networks are constructed as a composition of layers, g = L_n ∘ ... ∘ L_1. Usually, the layers of the network L_n : R^i → R^o are given by an affine linear mapping x ↦ h_n(W_n x + b_n), with weights W_n ∈ R^{o×i} and biases b_n ∈ R^o, together with a predefined nonlinear function h_n. The choice of the input and output dimensions i, o in each layer, the activation function, and the number of layers is called the architecture of the network.
As the FTR autoencoder network (FTR-NN) should implement the structure motivated in Problem 1, we choose a special architecture. It consists of a single-layer decoder without bias, which is activated by the physics-dependent front function f. Here, the images of the linear part, φ_i = Ψa_i with a_i = g_enc(q_i), correspond to the columns of the discrete transport field Φ = [φ_1, ..., φ_{N_t}]. Since the image of the linear part is represented by Ψ ∈ R^{M×r}, r ≪ M, the resulting matrix is at most of rank r.
The encoder network consists of four convolutional layers, each followed by an exponential linear unit (ELU) and a batch normalization layer [31]. After flattening the output, the convolutional layers are followed by two linear layers, where the first one is again followed by an ELU activation and a batch normalization layer. We apply a stride of two in all convolutional layers after the first, to downsample the spatial resolution of the input data. Further details of the architecture and training procedure can be found in Appendix A.
For the training of the FTR-NN, an additional smoothness constraint is added to the optimization goal L_FTR, which penalizes the non-smoothness of the columns of Ψ:

L = L_FTR + λ_smooth ‖DΨ‖²_F.   (8)

Here, D ∈ R^{M×M} denotes the coefficient matrix of a forward finite difference, which is implemented as a convolution operation over the columns of Ψ. For the examples in this manuscript, λ_smooth = 10^{−7} was found to be optimal. The additional smoothness constraint allows for faster convergence of the network in the validation phase. The constraint is reasonable since the columns represent the transport field Φ_ij = φ(x_i, t_j), which is assumed to be smooth.
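A minimal PyTorch sketch of the FTR-NN decoder and the smoothness penalty is given below. The class and function names are ours; the penalty is written as a 1D forward difference over the weight matrix for brevity, whereas the actual implementation uses a convolution (and a 2D layout of the columns of Ψ).

```python
import torch
import torch.nn as nn

class FTRDecoder(nn.Module):
    """Single-layer FTR decoder a -> f(Psi a), without bias.

    f is the problem-dependent front function, e.g. a sigmoid.
    M is the number of grid points, r the latent dimension.
    """
    def __init__(self, M, r, f=torch.sigmoid):
        super().__init__()
        self.linear = nn.Linear(r, M, bias=False)  # weight matrix Psi
        self.f = f

    def forward(self, a):
        phi = self.linear(a)        # level-set field phi = Psi a
        return self.f(phi), phi

def smoothness_penalty(Psi, lam=1e-7):
    # forward finite difference over the columns of Psi (1D sketch)
    return lam * torch.sum((Psi[1:] - Psi[:-1]) ** 2)
```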

Synthetic Examples
In this subsection, we provide two synthetic examples. The first example illustrates the application of the FTR to linear advection and compares the two decomposition methods outlined above with previous results [4]. The second example addresses topology-changing fronts.

Linear Advection of a Disk
The first synthetic example is taken from [4]. It illustrates the idea of the FTR in the pure advection case, without any topological change. The example parameterizes a disk of radius R = 0.15L, which is moving in a circle along the trajectory x_0(t) = L(0.5 + ¼cos(2πt), 0.5 + ¼sin(2πt))^T.
The snapshot data q(x, t) is generated from a level-set field φ(x, t), which is zero at the outer radius of the disk, i.e. at the location of the front. The front is generated with the help of the function f(x) = (tanh(x/λ) + 1)/2, λ = 0.1. One representative snapshot of the data is shown together with its POD approximation in Fig. 3. As the authors of [4] have already pointed out, for this example φ(x, t) can be parameterized by only three functions and is therefore of low rank, even if the field q(x, t) is not. The basis functions ψ_1, ψ_2, ψ_3 are shown in the top row of Fig. 4a. They can be interpreted as a quadratic basis function for the shape of the disk and two shift functions for the transport.

Figure 4: Expected spatial modes ψ_i (Fig. 4a), their temporal amplitudes a_i (Fig. 4b), and the FTR approximations ψ̃_i, ã_i, i = 1, 2, 3 (see Eqs. (9) and (10)). The arrows in Fig. 4a indicate the direction of the shift. They are computed from the spatial mean of ∇ψ_2(x, y), ∇ψ_3(x, y). The corresponding amplitudes a_2, a_3 parameterize the circular movement in time.
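To make the construction concrete, the following sketch generates the moving-disk snapshots, assuming the signed-distance-like level set φ(x, t) = R − ‖x − x_0(t)‖ (which vanishes on the disk boundary, consistent with the description above); the exact sign convention and time sampling are our assumptions.

```python
import numpy as np

def moving_disk_snapshots(L=1.0, R=0.15, lam=0.1, n=129, nt=200):
    """Synthetic moving-disk data q = f(phi) on an n x n grid."""
    x = np.linspace(0.0, L, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = lambda s: 0.5 * (np.tanh(s / lam) + 1.0)   # front function
    snapshots = []
    for t in np.arange(1, nt + 1) / nt:            # 0 < t <= 1
        x0 = L * (0.5 + 0.25 * np.cos(2 * np.pi * t))
        y0 = L * (0.5 + 0.25 * np.sin(2 * np.pi * t))
        # assumed level set: zero on the circle of radius R*L
        phi = R * L - np.sqrt((X - x0) ** 2 + (Y - y0) ** 2)
        snapshots.append(f(phi).ravel())
    return np.stack(snapshots, axis=1)   # snapshot matrix Q, M x Nt
```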
To show that the singular value thresholding Algorithm 1 (FTR) and the neural network approach of Section 2.3 (FTR-NN) can find a similar basis set, we generate 200 equally spaced snapshots from q in the time interval 0 < t ≤ 1, discretized with 129 × 129 grid points in the rectangular domain [0, L]². The data was split into a train and a test set, where every second sample is a test sample. After training the neural network on the training samples, it is compared to the results of the POD and the thresholding Algorithm 1 using the test samples. The results are visualized in Figs. 4 and 5. In Fig. 5 we compare the results of both FTR algorithms (FTR, FTR-NN) and a simple symmetrical autoencoder structure, labeled NN, whose decoder mirrors the encoder in an inverse manner, as is common practice in dimension reduction (for details see Appendix A). The relative errors in the Frobenius norm are shown in Fig. 5a. The quantitative errors of FTR-NN and the FTR show a significant drop at r = 3 degrees of freedom (FTR basis functions/latent space dimension), which is in accordance with the proposed level-set field. In contrast, the POD shows a much slower convergence of the relative error. Comparing the two networks NN and FTR-NN regarding the quantitative errors shows that the additional depth of the NN decoder compared to the one-layer decoder in FTR-NN does not influence the minimal relative error. This leads us to conclude that additional depth is not needed for a better representation. However, note that the NN needs fewer degrees of freedom to converge to its minimal relative error, which is due to the higher expressivity of a deeper network. Furthermore, it is important to note that the FTR thresholding algorithm outperforms both networks when increasing the number of degrees of freedom, for this special example. For a qualitative comparison, Fig. 5b shows the approximation of one snapshot before and after activation (first and second column), together with the difference in the last column. Comparing the POD in Fig. 3 to the FTR results shows that the typical staircasing behavior of the POD (which becomes a blurring of the sharp structures when many snapshots are used, as here) can be overcome with the FTR ansatz, which recaptures the sharp front. We observe that both qualitative and quantitative errors of the FTR-NN and the iterative thresholding approach yield similar results. In this study, we use λ_smooth = 10^{−7} for regularizing the smoothness of φ at the output of the FTR-NN and NN decoders. As visualized in Appendix A, Fig. 17, for larger smoothness parameters λ_smooth > 10^{−7} the transport field of the FTR-NN is smoothly continued in areas of no information (no transport), but the additional constraint Eq. (8) can cause a larger overall approximation error. However, the level-set fields of the iterative thresholding approach and the NN are almost identical inside areas where fronts have been transported. This is due to the special choice of the encoder.
Figure 4 compares the fields ψ_1, ψ_2, ψ_3 obtained by construction to the first three modes of φ. Similar to the proposed functions, the auxiliary field can be split into a mode (ψ̃_1) responsible for the shape of the disk and two modes that parameterize the transport (ψ̃_2, ψ̃_3). As expected for this special case, ã_1 is constant and ã_2, ã_3 ∼ cos(2πt + θ), where θ ∈ R depends on the alignment (indicated by arrows in Fig. 4a) of the two shifting functions ψ̃_2, ψ̃_3. The modes ψ̃_2, ψ̃_3 only have meaningful values along the trajectories of the front, because the algorithm cannot in-paint φ in areas of no transport. This explains why the modes in Fig. 4 are zero outside the circle.

Advection with Topology Change
In this example we show that our approach is capable of handling transport with topological changes. Therefore, we introduce synthetic snapshot data q(x, t) = f(φ(x, t)), built from the level-set field of Eq. (11), which we try to approximate. The front f is chosen as above. The level-set field is sampled equidistantly using 256 × 256 grid points in [0, 10]², with (A_1, A_2, A_3) = (1, 1.4, 1.2), (σ_1, σ_2, σ_3) = (0.1, 0.3, 0.5) and x_1 = (7.5, 3.5), x_2 = (2.5, 5.0), x_3 = (5.0, 7.6). Furthermore, 101 equally spaced snapshots with 0 ≤ t ≤ 0.5 are constructed from Eq. (11). As above, we split the samples into a test and a train set, where every second sample is used for testing the autoencoders. After training the networks, they are compared to the reconstruction errors of the POD and FTR using the test samples. The level-set fields for t = 0 and t = 0.4 are visualized as surface plots in Fig. 6, together with the resulting snapshots of q as color plots. The intersection of φ with the zero plane parameterizes the surface of the front. For increasing t, the topology of the zero level-set changes. As presented in Fig. 7, the FTR approximates the dynamics with two dyadic pairs with an error smaller than 0.2%, which is expected from the two-term dyadic structure in Eq. (11). The networks' approximation errors behave as in the case of the moving disk. The FTR-NN gives similar results as the FTR, but with a larger minimal relative error. Due to the additional depth, the NN only needs one degree of freedom to converge towards its minimal error. Topology changes of the zero level-set are nicely recovered, as illustrated in Figs. 6c and 6d, since the FTR approach can recover the initial auxiliary field φ in the regions of transport.

Application to Advection-reaction-diffusion Systems
To motivate the FTR approach for more complex examples, we introduce the advection-reaction-diffusion PDE with a KPP reaction term on a square, two-dimensional domain Ω = [0, L]² with periodic boundary conditions and time interval [0, T].
The PDE is discretized in space using 6th-order central finite differences, and in time with an explicit Runge-Kutta method of 5th(4th) order [32]. In the following, we refer to the discretized system as the full order model (FOM). All simulation parameters are listed in Table 1.
For our test case we choose a velocity field inspired by the vortex-pair example in [33]. Therefore, u = ∇ × ω is expressed in terms of the vorticity ω, which parameterizes a moving vortex pair with x_1 = L(0.6 − ct, 0.49), x_2 = L(0.6 − ct, 0.51), r_0 = 5 × 10^{−4}, and an initial amplitude ω_0 = 10³ decaying slowly in time (decay constant τ = 3T). The velocity field and the initial distribution of the reactant q are tuned to mimic a flame kernel interacting with a vortex pair, a common phenomenon in turbulence–flame interactions. During the simulation, the synthetic vortex pair Eq. (13) moves towards the burning gas and mixes unburned (q = 1) with burned gas (q = 0), such that a small island of unburned gas detaches into the burned area, creating a topology change in the contour line of the front. The time evolution of the FOM is visualized for t = 0.0, 0.4, 0.8 in the top row of Fig. 8.
In the second and third rows, the FTR and POD are compared using r = 6 degrees of freedom. The POD approximation shows the typical staircase behavior, as oscillations occur before and after the contour line of the front. The oscillations violate the initial range of values 0 ≤ q ≤ 1, which is depicted as red and black areas in Fig. 8. Therefore, preservation of physical structure cannot be expected. Here, the FTR approach gives much better results, restricting the approximation to the initial range of values due to the range of the sigmoid function f(x) ∈ [0, 1].

Galerkin and Data-driven Models for Moving Fronts

In the previous sections we have addressed the so-called offline stage of a model order reduction procedure, in which data is collected and its dimension is reduced. The reduced model generated by the FTR algorithm in Section 2.2 is nonlinear, which poses additional challenges for the online stage, i.e. predicting and interpolating new system states. This section is therefore dedicated to online prediction methods. In Section 3.1 we use a non-intrusive, i.e. equation-free, approach of [34], and in Section 3.2 we introduce an intrusive approach, the hyper-reduced Galerkin method, for 1D and 2D ARD systems.

Data-driven Methods
With the rise of data-driven methods in model order reduction, non-intrusive prediction methods for the reduced system, e.g. POD-DL-ROM [35], SINDy [36,37] or Fourier-Koopman forecasting [34], have become prominent. Although these methods make specific assumptions about the system at hand, they can be useful, since they allow rapid evaluation of the reduced variables with good accuracy. This is especially beneficial if the reduced space is a nonlinear manifold, which makes any Galerkin projection approach more complex and costly, as shown in the next section.
Following the approach of [34], we can derive new system states and extrapolate in time with the help of the Fourier-Koopman framework implemented in [38]. The Fourier-Koopman framework imposes the assumption that the reduced state a(t) ∈ R^r is quasi-periodic in t and can thus be parameterized by

a(t) = AΩ(t),  Ω(t) = [cos(ω_1 t), sin(ω_1 t), ..., cos(ω_{p/2} t), sin(ω_{p/2} t)]^T.   (15)

Here, A ∈ R^{r×p} and ω ∈ R^{p/2} are determined by solving the optimization problem

min_{A,ω} ∑_n ‖a(t_n) − AΩ(t_n)‖²_2   (16)

in a smart way [34]. Since the dynamical system presented in Section 2.4.1 is quasi-periodic, we can apply the method to the FTR decomposition using the basis functions Ψ = [ψ̃_1, ψ̃_2, ψ̃_3] shown in Fig. 4, together with the amplitudes a(t) = (a_1(t), a_2(t), a_3(t)) at the sampled time points {t_n = n∆t | n = 0, ..., N − 1}². From the sampled data we compute A, ω. The resulting model q̃(t) = f(ΨAΩ(t)) is evaluated at t_{n+1/2} = (n + 1/2)∆t for n = 0, ..., 2N − 1. Similarly, we can derive an approximation with the POD. Both results are compared in Fig. 9.
Note that after solving Eq. (16) in the offline stage, the computational effort is reduced to the evaluation of q̃(t) = f(ΨAΩ(t)), which only takes milliseconds.
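For illustration, a minimal sketch of the online evaluation is given below, assuming the cos/sin stacking of Ω(t) from Eq. (15); the fitting of A and ω itself is done offline by the framework of [38].

```python
import numpy as np

def fourier_koopman_forecast(A, omega, t):
    """Evaluate the Fourier-Koopman model a(t) = A @ Omega(t) for an
    array of query times. A has shape (r, p), omega has shape (p/2,)."""
    wt = np.outer(omega, t)                       # (p/2, len(t))
    Omega = np.concatenate([np.cos(wt), np.sin(wt)], axis=0)
    return A @ Omega                              # reduced states a(t)

# ROM evaluation at the test times t_{n+1/2}: q(t) = f(Psi @ a(t)),
# with Psi and f taken from the FTR decomposition.
```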
For a realistic test case, we apply the FTR-Fourier-Koopman procedure to the methane mass fraction Y_CH4 of one flame of a multi-slit Bunsen burner simulation analyzed and studied in [39,40]. The snapshots are generated with a customized, weakly compressible version of rhoReactionFOAM from the OpenFOAM software package (see [40,41]). In the simulation, a flame is periodically excited by an incoming velocity pulse. The acceleration of the fuel detaches a burning pocket, shown in Fig. 10. The data set consists of 200 snapshots with M = 128 × 430 grid points, sampled in a time interval t ∈ [0.01, 0.05] in which the Bunsen flame is quasi-periodic. Again, we split the data into train (t_n = 2n∆t) and test samples (t_{n+1/2} = (2n + 1)∆t). While we use the train samples to generate the reduced model, the test samples are used to calculate the relative errors stated in Table 2. The flame pinch-off is not a special case in combustion systems, but it poses challenges to model order reduction methods, as described above. Figure 10 shows that for the FTR the structure of the solution is well captured and the physical bound 0 ≤ Y_CH4 ≤ 1 is preserved.

Table 2: Relative errors ‖q(t_{n+1/2}) − q̃(t_{n+1/2})‖²₂ / ‖q(t_{n+1/2})‖²₂ for the FTR-Fourier-Koopman predictions using the moving disk and Bunsen flame data.

² The system's dynamics can be further reduced by rewriting f(Ψa(t)) with an offset vector b that contains the time-independent part of the decomposition, shown as a constant line in Fig. 9. This can be done similarly for the POD.

Manifold Galerkin Methods
After discretizing the ARD system Eq. (3) in space, we obtain an ODE system of the form

q̇(t) = F(q, t, µ).   (18)

Here, the parameters µ ∈ P contain the velocity field u and the diffusion and reaction constants κ, γ. Discretizing the system yields the FOM-RHS

F(q, t, µ) = L(t)q + N(q),   (19)

with a linear operator L : [0, T] → R^{M×M} and a nonlinear operator N : R^M → R^M. Using a reduced mapping as approximation q ≈ q̃ = g(a) of the data and plugging it into Eq. (18) yields the reduced model

J_g(a) ȧ = F(g(a), t, µ).   (21)

Minimizing the continuous-time residual of Eq. (21) yields the optimality condition

J_g(a)^T J_g(a) ȧ = J_g(a)^T F(g(a), t, µ),   (22)

which is uniquely solved by

ȧ = J_g(a)^+ F(g(a), t, µ)   (25)

if the Jacobian has full column rank [42]. Here, J_g^+ is the Moore-Penrose pseudo-inverse of J_g. Note that for the common POD-Galerkin approach, orthogonal mappings g(a) = Ua with U^T U = I are used; therefore, Eq. (25) and Eq. (24) are identical. When neglecting N(q) in Eq. (19) for the time being, we obtain a small r-dimensional system

ȧ = L_r(t)a,   (26)

which can be solved efficiently when L_r(t) is precomputed. For example, in the case of pure advection, where the right-hand side only contains the spatial derivative L(t)q = −u(t) · ∇q, the reduced operator

L_r(t) = U^T L(t)U ∈ R^{r×r}   (27)

can be precomputed and is much smaller than the operator L(t) ∈ R^{M×M}, r ≪ M, of the FOM. Although POD-Galerkin enables solving Eq. (26) efficiently, this approach cannot be used for advection-dominated systems, because of its slowly decaying approximation errors. Here, nonlinear methods like artificial neural networks can accelerate the convergence of the overall online and offline error. However, any nonlinear reduction method implies that even linear systems like Eq. (27) become nonlinear, causing additional effort for evaluating nonlinearities. At least in the special case of an advection system, this can be avoided with the FTR approach. Due to its special structure q(x, t) = f(φ(x, t)), we can rewrite the advection equation as f′(φ)(∂_t φ + u · ∇φ) = 0. The prefactor f′(φ) can be dropped when assuming that φ features the same transport as q, and thus the nonlinear manifold Galerkin system (25) can be simplified to a linear Galerkin system for φ(t) = Ψa(t):

ȧ = Ψ^T L(t)Ψ a.   (30)

Since the operator L_r can be precomputed in the same fashion as for the POD-Galerkin approach, the resulting ROM complexity is reduced, and the online/offline error is compensated by the additional nonlinearity f used to retain q̃ = f(Ψa). However, these findings need to be interpreted with caution. One might expect that a pure transport of q implies a pure transport of φ. However, if f′ becomes (approximately) zero, φ might locally change its value without changing q, so that q is transported everywhere, while φ is not. If, however, this cancellation is justified, it can speed up the calculation considerably. The advection of fronts according to a given transport field u(t) within milliseconds is impressively shown for a 1D advection example in Fig. 11.
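Before turning to the example, here is a minimal sketch of one right-hand-side evaluation of the manifold Galerkin ROM Eq. (25) for the FTR mapping g(a) = f(Ψa); all names are ours, and a least-squares solve stands in for the explicit pseudo-inverse.

```python
import numpy as np

def manifold_galerkin_rhs(a, t, Psi, f, fprime, fom_rhs):
    """Evaluate da/dt = J_g(a)^+ F(g(a), t) with g(a) = f(Psi a).

    fom_rhs(q, t) is the discretized FOM right-hand side F;
    the Jacobian of g is J_g = diag(f'(Psi a)) @ Psi."""
    phi = Psi @ a
    q = f(phi)
    J_g = fprime(phi)[:, None] * Psi          # (M, r) Jacobian
    F = fom_rhs(q, t)
    # least-squares solve replaces the explicit pseudo-inverse
    da, *_ = np.linalg.lstsq(J_g, F, rcond=None)
    return da
```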
In this example, only two trajectories of constant advection speed (u(t) = ±2) are used for building the reduced system. Thereafter, almost any parameterization of u(t) can be computed with the ROM. The relative error and speedup for the test trajectory are shown in the lower part of Fig. 12 and plotted together with the POD-Galerkin results. We see that the errors are reduced compared to the results of the POD. Further details about the simulation are reported in Appendix B. The results apply similarly to higher spatial dimensions d > 1, for example in the case of the moving disk (Section 2.4.1). Note that the path simulated in the online phase is limited to areas where the transport field is initialized. These are the areas where a front has traveled during the offline phase, as shown in Fig. 4. This restriction is, however, shared with classical linear methods: dynamics that are not covered in the ansatz space created from the initial set of snapshots are usually not covered by the ROM.
The success of the heuristic approach Eq. (30) is somewhat obvious, since the level-set function φ parameterizes the transport (see Section 2.4.1), which makes it a good basis for the advection operator. The idea of using transport-capturing level-set functions to accelerate simulations of advection laws is not new. For example, it is intensively used by the characteristic mapping method (CMM) [43], which evolves the initial condition q_0(x) of a PDE along characteristic curves X(x, t), such that q(x, t) = q_0(X(x, t)). Nevertheless, in [43] the authors use an invertible mapping X(x, t), which hinders the applicability to systems with topological changes. Intentionally, this is not done here, since we aim for systems where topological changes are possible. However, it would be interesting to see if snapshots of the characteristic map can be utilized for MOR similarly to the snapshots of the auxiliary field Φ inside the FTR.

Figure 12: Relative error vs. speedup for r = 2, ..., 15 using the FTR- and POD-Galerkin approaches. The error and speedup are measured using the testing data shown in Fig. 11. The FTR/POD offline errors mark the reconstruction error of the training data, which is collected in the offline phase (snapshots are shown in the upper row of Fig. 11). All FTR Galerkin results are computed using Eq. (30) (snapshots are shown in the lower right of Fig. 11). Accordingly, the POD Galerkin results are computed using Eq. (26). The FTR manifold Galerkin results are computed using Eq. (25) directly.
Nevertheless, it should be noted that the procedure proposed for the advection equation cannot be generalized to advection-reaction-diffusion equations; therefore, special hyper-reduction methods are needed for an efficient ROM.

Hyper-reduction for Moving Fronts
Apart from the slowly decaying POD approximation errors, advection-reaction-diffusion systems pose another difficulty for model order reduction. Their dynamics take place at the characteristic length scale l_f defined in Eq. (5), which is usually much smaller than the size of the domain or the traveling distance of the front. Hence, the FOM-RHS Eq. (19) and its gradient possess only a few spatial grid points per time step with non-vanishing support. Therefore, the hyper-reduction methods for nonlinear manifolds [14,17] cannot be applied. For example, the extended-ECSW scheme proposed by [17], or the gappy-POD based GNAT procedure [15] first introduced for nonlinear manifolds in [14], cannot be used here, since they preselect a set of sample points, which is fixed for every time step and all µ ∈ P.
In contrast, the FTR-hyper-reduction approach can help to identify the locations of the front to reduce computational complexity while sustaining an accurate solution. Here, we propose an idea that is similar to the Reduced Integration Domain (RID) method [18] for finite elements. By imposing a threshold criterion on each finite element, RID chooses a reduced number of elements to describe a balance condition, i.e. to minimize the residual between internal and external forces. Similar to RID, we choose M_p sampled points to minimize the error of the projected right-hand side (i.e. external/internal force):

min_ȧ ‖P_a (J_g(a) ȧ − F(g(a), t, µ))‖₂.   (31)

Each of the M_p selected sample points corresponds to an index 0 ≤ i ≤ M, which is represented as the i-th standard basis vector e_i ∈ R^M inside the rows of the selection matrix P_a ∈ R^{M_p×M}. Thus, the hyper-reduced Jacobian and right-hand side P_a J_g, P_a F are only computed at the M_p sample points. Note that the stencil size of our finite difference scheme requires computing f(φ) on additional supporting mesh points, contained in P̃_a ∈ R^{M̃_p×M}. In practice, P̃_a f(φ), P_a J_g, P_a F are not computed as matrix products, but as pointwise evaluations of f(φ), J_g, F at the corresponding sample points.
In contrast to RID, the selection matrix P_a depends on the state a(t, µ), which evolves over time (see Fig. 13). However, similar to what RID does for external forces, we have to add nodes, i.e. sample points, at which the right-hand side does not vanish. Since for the FTR, F is non-vanishing at the locations of the front, i.e. at the roots of the level-set function, we can perform a time-dependent adaptive thresholding, which defines P_a. The threshold search selects the M_p smallest magnitudes of the level-set function φ = Ψa ∈ R^M, at which we then evaluate P̃_a f(φ), P_a J_g, P_a F. For two time instances, the sample points are visualized in Fig. 13 for the 2D ARD system of Section 2.5. To reduce the costs of the threshold search, one might recompute the sample points only after a fraction of the characteristic time scale t_f = l_f/c*, where l_f is defined in Eq. (5). Note that the threshold search is a heuristic in order to perform a cheap minimization of the residual Eq. (31).
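A minimal sketch of this adaptive threshold search is given below; the function name is ours, and the selection is expressed via index arrays rather than explicit selection matrices, as described above.

```python
import numpy as np

def select_sample_points(Psi, a, n_points):
    """Adaptive threshold search: pick the n_points grid indices where
    the level-set field phi = Psi a is smallest in magnitude, i.e.
    closest to the front (the roots of phi)."""
    phi = Psi @ a
    return np.argpartition(np.abs(phi), n_points)[:n_points]

# During time integration, the hyper-reduced RHS and Jacobian are then
# evaluated only at these indices (plus stencil-support neighbours).
```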
Figure 13: Color plot of the right-hand side F(q, t, µ) of the 2D advection-reaction-diffusion system Eq. (12) for two different time instances, t = 0.07 (left) and t = 2.85 (right), and the corresponding sample points for a sample fraction of M_p/M = 0.1. The inset in the right color plot shows a close-up of the location of the front.
Further, it should be noted that in this work we use explicit time integration schemes, as they are usually used inside finite difference solvers. Therefore, the aforementioned methods [14,17] are not comparable in speedup, since they compare the ROM with implicit time integration schemes used in the offline stage. Nevertheless, applying implicit integration schemes during the online phase may benefit the stability of the resulting ROM. A promising and efficient method for explicit time integration schemes was proposed by [3] for reaction-diffusion systems in one spatial dimension. Although that framework cannot cope with topological changes, since it relies on a smooth parameterization of the transport, the authors report speedups of up to a factor of 130.
In the following, we show some numerical examples utilizing the presented hyper-reduction approach.

Numerical Examples
In this section, we numerically investigate the applicability of our framework. Therefore, we define the offline and online errors

E_offline = ‖Q^train − Q̃^train‖_F / ‖Q^train‖_F,   (33)
E_online = ‖Q^test − Q̃^test‖_F / ‖Q^test‖_F.   (34)

Here, Q ∈ R^{M×(N_t N_P)} is the snapshot matrix containing all snapshots for the N_t time and N_P parameter instances µ ∈ P in its columns. The superscript "train" ("test") belongs to the snapshots µ ∈ P^train (P^test) computed during the offline (online) phase.
The approximation Q̃^train is therefore either the reconstruction of the training data using the FTR ansatz (Algorithm 1) or, in the case of the POD, the projection onto the first r left singular vectors of Q^train contained in U ∈ R^{M×r}, i.e. Q̃^train = UU^T Q^train.
Q̃^test refers to the results of evaluating the ROM Eq. (21) for the given time interval and parameters µ ∈ P^test using the reduced mapping g : R^r → R^M. Specifically, in the case of the POD, the dynamical ROM predictions use g(a) = Ua as reduced mapping, whereas g(a) = f(Ψa) for the FTR.
Furthermore, we define the projection error

E_proj = ‖Q^test − Q̃^test_*‖_F / ‖Q^test‖_F,

where Q̃^test_* is the best fit of Q^test with the help of the mapping g.
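For reference, a small sketch of how these relative errors can be computed, using the POD projection from above; the helper names are ours.

```python
import numpy as np

def relative_error(Q, Q_tilde):
    """Relative error in the Frobenius norm, as in Eqs. (33)-(34)."""
    return np.linalg.norm(Q - Q_tilde) / np.linalg.norm(Q)

def pod_projection(Q_train, Q_test, r):
    """POD basis from Q_train and projection of Q_test onto it."""
    U, _, _ = np.linalg.svd(Q_train, full_matrices=False)
    U = U[:, :r]
    return U @ (U.T @ Q_test)   # best linear fit U U^T Q_test
```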

Reaction-Diffusion System in 1D
First, we test our approach on an analytic test case taken and modified from [44]. The test case is based on a one-dimensional scalar nonlinear reaction-diffusion equation with corresponding analytical solution, given that f(x) = sigmoid(2x). We follow the strategy outlined above. First, we compute the numerical solution by discretizing Eq. (35) with M = 4000 grid points and solving it for the training parameters δ ∈ P^train, using τ = 4 and different truncation ranks 1 < r < 10. After we have computed the reduced mapping q̃(t, δ) = f(Ψa(t, δ)) from the training set, we compute the starting values a(0, δ), δ ∈ P^test, to test the ROM Eq. (21) by minimizing the initial condition residual of the ROM Eq. (22), using Gauss-Newton iterations [45]. As an initial guess for the minimization, we use the set of initial points {(δ, a(0, δ)) | δ ∈ P^train} and interpolate them for any given test parameter δ ∈ P^test. Thereafter, the ROM solution for all test parameters δ ∈ P^test is compared to the analytical solution Eq. (36). The results are reported as online errors in Table 3 together with the offline and projection errors. The online and projection errors are stated for the cumulated snapshots including the time interval [0, 1] and all parameters δ ∈ P^test = {0.3, 0.4, ..., 0.9}. Table 3 also compares the results with the POD-Galerkin approach. The starting values for the POD-Galerkin ROM are simply given by the orthogonal projection of q(0, δ) onto the POD modes. It is remarkable that the FTR outperforms the POD by two orders of magnitude.
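A sketch of the initial-condition fit is given below; it uses SciPy's Levenberg-Marquardt solver as a stand-in for the Gauss-Newton iterations of [45], and all names are our assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_initial_condition(q0, Psi, f, a_guess):
    """Fit the reduced initial state a(0, delta) by minimizing
    ||f(Psi a) - q0||_2, starting from an interpolated guess taken
    from the training parameters."""
    residual = lambda a: f(Psi @ a) - q0
    return least_squares(residual, a_guess, method="lm").x
```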
Next, we are interested in whether the gain in precision can be translated into speedups. Therefore, we study the performance of the hyper-FTR explained in Section 3.3. Figure 15 compares CPU-time and error for M_p = 0.1M, 0.2M, 0.5M and M grid points, where M is the dimension of the FOM. The figure indicates that even without hyper-reduction, speedups can be achieved compared to the FOM, due to larger step sizes in the reduced coordinates. Comparing the hyper-FTR with a sample fraction of M_p/M = 0.2 to 1, we see another speedup in CPU-time. For a reduction below 0.1M grid points, the solution is unstable, which can lead to additional time steps, making the overall simulation slower.

Table 3: Offline, online and projection errors for FTR and POD. The errors are reported for the cumulated snapshot data of the train and test parameters used in Section 3.4.1.

Advection-Reaction-Diffusion System in 2D
Finally, we test the online performance of the hyper-FTR on the advection-reaction-diffusion example of Eq. (12), introduced in Section 2.5. We build the training/testing data from 101 equally spaced snapshots (visualized in Fig. 8) with t ∈ [0, 3] and γ ∈ P^train = {10, 30, 50, 70, 100}, respectively γ ∈ P^test = {20, 40, 60, 80, 90}. The online, offline, and projection errors of the test case are shown in Fig. 16, together with the speedup generated by the hyper-reduction scheme. It is visible that the FTR outperforms the POD with respect to the offline and online errors. Furthermore, the utilized hyper-reduction strategy results in speedups with moderate online errors. Note that reducing the integration domain to about 10% of its original size (see sample points in Fig. 13) does not affect the online error, as can be seen from Fig. 16 (a). Figure 16 (b) shows that for small r, the additional costs (O(rM)) of the matrix multiplication φ(t, µ) = Ψa(t, µ) are negligible compared to the evaluation of F. However, as soon as r becomes large, the speedups of the hyper-reduction scheme are compensated by the computation of φ inside the threshold search. The balance point at which the additional costs compensate the costs of the RHS is problem dependent, but computing the sample points from φ is a bottleneck of this method. Nevertheless, when aiming for more complex examples like combustion systems or 3D ARD systems, the outlined hyper-reduction approach will benefit from a more computationally complex RHS, which will shift the balance point towards a higher number of modes.

Discussion and Conclusion
In this work, we have introduced the front transport reduction (FTR) method to decompose and simulate transports of complex moving fronts. The decomposition parameterizes moving fronts with the help of a transport-dependent auxiliary field φ and a function f approximating the front profile. Two different decomposition algorithms have been proposed, based on singular value thresholding (Algorithm 1) and artificial neural networks (Section 2.3). These methods are purely data-driven, since they only require a set of snapshots q(t, µ) ∈ R^M of the FOM as input. The resulting approximation q(t, µ) ≈ f(φ(t, µ)) is well suited for model order reduction of reacting fronts, since φ(t, µ) = Ua(t, µ) ∈ R^M can be represented by a few r ≪ M spatial modes collected in U ∈ R^{M×r}.
We emphasize that the utilized front structure is inherent to advection-reaction-diffusion (ARD) systems (see for example [19][20][21][22]). Making explicit use of the physical structure has advantages over other linear and nonlinear dimension reduction methods, for reasons we discuss in the following. It was shown that for various ARD systems the FTR requires fewer modes to decompose the input snapshots than the proper orthogonal decomposition (POD), i.e. it has a better compression quality. Regarding artificial autoencoder networks, the FTR is similar in the sense that it uses a linear layer activated by a problem-dependent nonlinear front function as a decoder. Other authors [14] use multiple nonlinearly activated layers q̃ ≈ f(f(...f(a))), resulting in costly evaluations of the network itself. This can limit the overall performance of the ROM when evaluating the additional nonlinearities. Furthermore, autoencoder networks are often difficult to tune and require training on GPUs. Similar to artificial autoencoder networks, the FTR can approximate topological changes in the evolution of the contour line of the front, since it makes no explicit assumptions on the mapping. Here, methods like the shifted POD, previously applied to similar problems in [3], cannot be used, since they assume one-to-one mappings to align the front.
The ability of the FTR to predict new system states has been demonstrated for non-intrusive (Section 3.1) and intrusive (Section 3.2) ROMs. Since the FTR gives additional insights into the underlying structure (the transport field φ), it allows us to use this information when predicting new system states. As an example, we heuristically reduced the integration domain during the online evaluation of the Galerkin-projected ODE system, using the knowledge of φ. This can be seen as an adaptive version of the reduced integration domain method [18]. Other nonlinear hyper-reduction methods preselect a set of sample points on which the dynamics are evaluated. Since for the studied systems only sample points close to the front are relevant for the dynamics, such hyper-reduction methods may fail. Although the outlined hyper-reduction procedure yields speedups in CPU-time, it needs a substantially larger number of sample points M_p than required by the dimension of the ROM, r ≪ M_p ≪ M. Therefore, the construction of more efficient hyper-reduction schemes is left open for future research.
To apply our findings to more complex advection-reaction-diffusion systems, such as combustion systems in fluid mechanics with multiple reacting species, the decomposition has to be extended to allow arbitrary traveling front shapes. Here, the FTR method would benefit from a generalization or combination with the shifted POD [11], as this would allow decomposing multiple traveling wave systems with topological changes. Furthermore, it would be interesting to see if our approach can be applied to multi-phase flows, as they exhibit a similar front structure separating the fluids. Here, the similarity with level-set-based methods like the characteristic mapping method [43] should be exploited.

A Details on the Autoencoder Network Architecture and Training Hyperparameters
In this section we provide detailed information on the architecture and training hyperparameters of the autoencoder networks. For both autoencoder variants (NN and FTR-NN), the encoder architecture g_enc : R^M → R^r is the same. Its task is to encode the spatial field q ∈ R^M into a latent space a ∈ R^r. It consists of four convolutional layers, each followed by an ELU activation and a batch normalization layer. After flattening the output, two fully connected layers follow, with another ELU activation and batch normalization layer in between. The output of the second fully connected layer represents the latent space with r degrees of freedom and is not activated. A summary of the encoder architecture is listed in Table 4. The decoder g_dec : R^r → R^M maps the latent representation back to the spatial domain.
There are two different decoders used in this paper, labeled NN and FTR-NN autoencoder. The NN decoder mirrors the encoder architecture, using transposed convolutional layers instead of convolutional layers. The FTR-NN decoder consists of only a single fully connected layer without bias, with M (number of grid points) output channels. It applies a simple matrix multiplication Wa, where a is the vector of latent representations and W is the learnable weight matrix of the layer. Afterwards, the resulting output φ = Wa is reshaped into the spatial domain. In analogy to the FTR ansatz q ≈ q̃ = f(φ), both networks are activated with the physics-dependent front function f in the output layer. The layer details for both decoder networks are listed in Table 5.
After splitting the data by taking every other time step into a training set and a test set, each network was trained using the ADAM optimizer with a learning rate of 0.0025 for up to 2·10⁴ iterations, using all training samples as one input batch. Every 500 iterations, the performance is tested on the test set. The network parameters that yield the best test results are saved.
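A minimal PyTorch sketch of this training loop follows, assuming an autoencoder module that maps a batch of snapshots to their reconstructions; the smoothness penalty of Eq. (8) is omitted here for brevity.

```python
import torch

def train(autoencoder, Q_train, Q_test, lr=0.0025, iters=20000):
    """Full-batch ADAM training with periodic evaluation on the test
    set, keeping the best-performing parameters (sketch)."""
    opt = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    best, best_err = None, float("inf")
    for it in range(iters):
        opt.zero_grad()
        loss = torch.mean((autoencoder(Q_train) - Q_train) ** 2)
        loss.backward()
        opt.step()
        if it % 500 == 0:   # evaluate on the test set
            with torch.no_grad():
                err = torch.mean((autoencoder(Q_test) - Q_test) ** 2)
            if err < best_err:
                best_err = err
                best = {k: v.clone()
                        for k, v in autoencoder.state_dict().items()}
    autoencoder.load_state_dict(best)
    return autoencoder
```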

Figure 3: Data at time t = 0.11 and its approximation with the Proper Orthogonal Decomposition (POD) using r = 3 modes.


Figure 5: Comparison of POD, FTR, FTR-NN and the symmetrical autoencoder structure labeled NN. Fig. 5a compares the relative errors in the Frobenius norm for different degrees of freedom. Fig. 5b visualizes the level-set field φ together with the approximation of the data q̃ = f(φ) and the deviation from the exact data q − q̃ for one selected snapshot, using r = 3 degrees of freedom.

Figure 6: Graph of the auxiliary field φ (Eq. (11)) in a)-b) and its FTR approximation in c)-d). The resulting snapshots q = f(φ) are shown as color plots in the xy-plane. The intersection with the zero level is visualized.

Figure 7: Comparison of the relative errors for the advection example with topology change using the proper orthogonal decomposition (POD), the FTR iterative thresholding algorithm (FTR), the FTR autoencoder structure (FTR-NN) and a standard autoencoder (NN).

Figure 8: Qualitative comparison of the reconstruction errors of the 2D ARD system Eq. (12) at three different time instances t = 0.0, 0.4, 0.8 (left, middle, right column, respectively). The plot shows the FOM data (top row) and its reconstructions using the POD (middle row) and FTR (bottom row). For the FTR and POD, r = 6 degrees of freedom are used. The colorbar is chosen such that values outside the initial range of values 0 ≤ q ≤ 1 are highlighted in black or red.

Figure 9: Predictions using Fourier-Koopman forecasting with three POD modes (a) and three FTR modes (b). The black circles in the upper row indicate the predictions of the amplitudes a(t) = (a_1(t), a_2(t), a_3(t)) and the colored crosses mark the training samples. In the lower row, we show the corresponding snapshots at selected time instances t = 0.2, 0.4, 0.6, 0.8.

Figure 11: The 1D advection test case for moving fronts. In the upper row, the two training samples computed with the full order model (FOM) are shown. They are used to build the snapshot matrix Q ∈ R^{1000×202} for the FTR decomposition. In the lower row, the trajectories of the FOM and the FTR-ROM Eq. (30) (r = 4) are compared.

Figure 15: Error vs. CPU-time for the accumulated parameter range δ ∈ P^test. Different ranks r are indicated as (r) above or below the markers. The dashed line indicates the CPU-time needed for solving the FOM. The sampled fraction M_p/M in the hyper-reduced ROM is given in terms of the size M of the FOM.

Figure 16: Hyper-FTR results: (a) relative errors defined in Eqs. (33) and (34) for P^test/train using the POD and FTR decompositions; (b) speedup vs. degrees of freedom for the cumulated parameter range γ ∈ P^test. The speedups are compared for different numbers of sampled grid points M_p ≤ M (M_p/M = 0.1, 0.2, 0.5). The full order model (FOM) using M = 512² grid points is marked with a dashed line.

Figure 17: Color plot of one snapshot of the FTR-NN and NN level-set fields φ using three degrees of freedom and different smoothness strengths λ_smooth. The smoothness parameter λ_smooth controls the strength of the smoothness constraint Eq. (8).


Table 4: Encoder network details. M̃ describes the number of remaining spatial grid points after all convolutional layers are applied. Each convolutional layer reduces the spatial resolution in each spatial direction via N_out = (N_in − kernel size)/stride + 1.

Table 5: NN decoder network details.