Model Order Reduction for the 1D Boltzmann-BGK Equation: Identifying Intrinsic Variables Using Neural Networks

Kinetic equations are crucial for modeling non-equilibrium phenomena, but their computational complexity is a challenge. This paper presents a data-driven approach to the efficient modeling of non-equilibrium flows by comparing two reduced order model (ROM) approaches: Proper Orthogonal Decomposition (POD) and autoencoder neural networks (AE). While the AE initially demonstrates higher accuracy, the precision of the POD improves as more modes are considered. Notably, our work recognizes that the classical POD-MOR approach, although capable of accurately representing the non-linear solution manifold of the kinetic equation, may not provide a parsimonious model of the data due to the inherently non-linear nature of the data manifold. We demonstrate how AEs can be used to find the intrinsic dimension of a system and to correlate the intrinsic quantities with macroscopic quantities that have a physical interpretation.


Introduction
Kinetic equations are widely used in science and engineering [20,25,26,33]. They allow the modeling of deviations from an equilibrium state described by an underlying macroscopic model such as the Euler equations, providing detailed insight into fundamental physical processes [35]. However, kinetic equations are often characterized by a high-dimensional phase space, making them computationally expensive to solve and sometimes even infeasible for realistic applications [35].
Investing in solving kinetic equations is only beneficial if large deviations from equilibrium are present [35]. Striking a balance between a fast but inaccurate equilibrium solver and a slow but accurate non-equilibrium solver remains an open challenge. Our work aims to address this challenge by providing a proof-of-concept for a data-driven solution to the efficient modeling of flows in different non-equilibrium regimes.
In the field of non-equilibrium gas flows, several standard methods exist to discretize the high-dimensional phase space. Particle-based Monte Carlo methods are only tractable in the free-flight regime and not efficient in the transition regime of moderate non-equilibrium unless special techniques are used [5]. The straightforward Discrete Velocity Method (DVM) uses a pointwise discretization of the velocity space, potentially leading to a large number of equations [30]. Specially tailored moment models are based on an expansion of the particle distribution function and lead to a set of extended fluid dynamical equations [35]. However, it is by no means clear a priori how many equations are sufficient and which variables are optimal [19,34].
To tackle the computational complexity of kinetic equations, reduced order models (ROM) have recently been introduced, enabling reductions in computational complexity by orders of magnitude [1,7,8,9]. Two different approaches have been followed in the literature. The classical offline-online decomposition as used by [1] involves a two-stage procedure. In the offline stage, the full order model (FOM) is assessed to create a database, which is then utilized to generate a data-dependent basis through proper orthogonal decomposition (POD). This basis allows an efficient description of the FOM on a low-dimensional linear subspace during the online phase. On the other hand, the online adaptive basis method called dynamic low-rank approximation [13] constructs the low-dimensional linear basis during the online phase itself, eliminating the need to evaluate the expensive FOM. It has been successfully applied to kinetic equations in the works by Einkemmer et al. [7,8,9]. However, the additional complexity of updating the basis during the evolution makes it less online-efficient than the classical offline-online approach, as shown in [15] for a shallow water moment model.
In this work, we adopt the same offline strategies as in [1]. Specifically, we sample data for a classical test case called the Sod shock tube using a discrete velocity model as our FOM and compare the compression of the linear reduced subspace created by POD with the non-linear description provided by neural autoencoder networks. Neural networks, based on the universal approximation theorem [29], allow for the approximation of a wide range of function classes and appear promising for identifying the intrinsic dimension of a system. The non-linear relation between macroscopic model equations and the discrete velocity model, in contrast, hinders the determination of these dimensions using linear reduction methods like POD. This paper aims to utilize these data-driven model reduction techniques to reduce the number of describing variables and equations and to determine how many and which variables are useful in specific test cases. For non-vanishing Knudsen number, we expect to need more non-equilibrium variables with corresponding balance laws, while in the limit of vanishing Knudsen number, we expect to recover the Euler equations, given by conservation laws for mass, momentum, and energy. To the knowledge of the authors, this is the first paper aiming to bridge the gap between equilibrium and non-equilibrium flows using neural networks in this way.
The long-term objective of this line of work is to enable dynamically adapting the model by varying the number of variables during the online phase, paving the way for more efficient and accurate model adaptive simulations of kinetic equations.
The organization of the paper is as follows: In Section 2, we introduce the 1D model problem and the reference data used for model reduction. Section 3 describes the two model reduction techniques used in this study: Proper Orthogonal Decomposition (POD) and Autoencoder Networks. The results are presented in Section 4, and the paper concludes with a summary in Section 5.
For standard continuum flows the widely-used Euler equations can be applied, but more rarefied regimes require different extended fluid dynamical models. Rarefaction levels are distinguished with the help of the Knudsen number Kn, defined as the ratio of the mean free path length of the particles λ over a reference length l,

Kn = λ / l.

The evolution of the distribution function f(t, x, c) is governed by the Boltzmann-BGK equation

∂t f + c ∂x f = (1/τ) (f_M − f), (1)

whose right-hand side, the BGK collision operator, models the relaxation with relaxation time τ ∈ R+ towards the equilibrium Maxwellian distribution f_M(t, x, c), given in 1D by

f_M(t, x, c) = ρ(t, x) / √(2πRT(t, x)) · exp( −(c − u(t, x))² / (2RT(t, x)) ),

where ρ(t, x), u(t, x) and T(t, x) are density, bulk velocity, and temperature of the flow, respectively, and R is the universal gas constant. In this work, we consider the relaxation time τ a parameter and set it equal to the Knudsen number, τ = Kn; however, the relaxation time can also be modeled differently, e.g., to additionally depend on the gas density and temperature.
For practical computations, we consider macroscopic moments of the distribution function, which are obtained by multiplying the distribution function with the so-called collision invariants (1, c, ½c²) and integrating in velocity space,

(ρ, ρu, E)ᵀ(t, x) = ∫ f(t, x, c) (1, c, ½c²)ᵀ dc,

where E denotes the total energy. The temperature T(t, x) and the pressure p(t, x) can be obtained from the 1D relations

p = ρRT,  E = ½ρu² + ½p.

Figure 1 illustrates the relation between the macroscopic moments and the distribution function f(t, x, c) at a certain position in time and space. The density ρ(t, x) is the integral of the distribution function, which is centered around the macroscopic velocity u(t, x), and the mean deviation is related to the temperature T(t, x). The Boltzmann-BGK equation (1) is in equilibrium when f = f_M. Multiplying the equilibrium solution with the collision invariants and integrating in velocity space, one finds the Euler equations of classical gas dynamics, which are conservation laws for mass, momentum, and energy, respectively.
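As an illustration, the moment integrals above can be approximated by simple quadrature on a discrete velocity grid. The following sketch (with illustrative grid bounds and an arbitrary test state, not the paper's setup) builds a 1D Maxwellian and recovers its density, bulk velocity, and total energy; for the 1D Maxwellian the total energy satisfies E = ½ρu² + ½ρRT.

```python
import numpy as np

# Discrete velocity grid; bounds and resolution are illustrative assumptions.
c = np.linspace(-10.0, 10.0, 400)
dc = c[1] - c[0]

R = 1.0                      # gas constant (nondimensional, assumed)
rho, u, T = 1.2, 0.3, 0.8    # arbitrary test state

# 1D Maxwellian equilibrium distribution f_M(c)
f_M = rho / np.sqrt(2 * np.pi * R * T) * np.exp(-(c - u) ** 2 / (2 * R * T))

# Moments: multiply by the collision invariants (1, c, c^2/2) and integrate.
rho_h = np.sum(f_M) * dc
mom_h = np.sum(c * f_M) * dc
E_h = np.sum(0.5 * c ** 2 * f_M) * dc

# Recovers rho, u, and E = rho*u^2/2 + rho*R*T/2 up to quadrature error.
print(rho_h, mom_h / rho_h, E_h)
```

The rectangle-rule sum is highly accurate here because the Gaussian is well resolved and decays far inside the velocity domain.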
For distribution functions further away from equilibrium, for example due to a larger relaxation time τ and a significantly large Knudsen number Kn, the Euler equations do not give accurate results. In this case, additional equations can be used, which are derived by the so-called method of moments [14,35]. This effectively leads to an extended set of equations, called a moment model. It is possible to preserve important properties like hyperbolicity with moment models [11,17]. The additional equations (for example for the heat flux and higher-order moments) add complexity, but allow for more accurate solutions [18,34]. However, it is often unclear a priori how many equations are needed for a sufficiently accurate and computationally feasible solution. In this work, we aim to give a proof-of-concept for a data-based identification of the necessary number of variables, called the intrinsic physical dimension.

Sod shock tube test case and reference data
Sod's shock tube is a well-established test case in the field of rarefied gases [19]. It uses discontinuous initial conditions based on equilibrium values, corresponding to a jump in density and pressure at x = 0.5 due to a diaphragm at that position, which is removed at time t = 0.
The problem setup at t = 0 is shown in fig. 2; the domain is split into two regions left and right of the diaphragm.
For the generation of reference data we employ a discrete velocity method (DVM) [27], which uses a pointwise microscopic discretization of the velocity space: a uniform grid with nodes c_k = kΔc, for k ∈ Z, is used to discretize the distribution function as f_k(t, x) = f(t, x, c_k). This yields the semi-discrete DVM system

∂t f_k + c_k ∂x f_k = (1/τ) (f_M,k − f_k), (12)

which, after a subsequent discretization in space, leads to a coupled ODE system in time that can be solved with standard methods.
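A minimal sketch of one explicit time step of such a DVM, combining first-order upwind transport with BGK relaxation, could look as follows. Grid sizes, time step, relaxation time, and the gas constant are illustrative assumptions, not the values used for the reference data.

```python
import numpy as np

Nx, Nc = 200, 40
x = np.linspace(0.0, 1.0, Nx)
c = np.linspace(-10.0, 10.0, Nc)           # no node at c = 0 for this Nc
dx, dt, tau, R = x[1] - x[0], 1e-4, 1e-2, 1.0

def maxwellian(rho, u, T, c):
    """1D Maxwellian f_M(x, c) for fields rho, u, T of shape (Nx,)."""
    return rho[:, None] / np.sqrt(2 * np.pi * R * T[:, None]) \
        * np.exp(-(c[None, :] - u[:, None]) ** 2 / (2 * R * T[:, None]))

def step(f):
    """Advance f (shape Nx x Nc) by one explicit Euler step of the DVM."""
    dc = c[1] - c[0]
    # moments for the local equilibrium
    rho = f.sum(axis=1) * dc
    u = (f * c).sum(axis=1) * dc / rho
    E = (0.5 * f * c ** 2).sum(axis=1) * dc
    T = (2 * E - rho * u ** 2) / (rho * R)   # from E = rho*u^2/2 + rho*R*T/2
    fM = maxwellian(rho, u, T, c)
    # upwind transport: backward difference for c > 0, forward for c < 0
    dfdx = np.empty_like(f)
    dfdx[1:, c > 0] = (f[1:, c > 0] - f[:-1, c > 0]) / dx
    dfdx[0, c > 0] = 0.0
    dfdx[:-1, c < 0] = (f[1:, c < 0] - f[:-1, c < 0]) / dx
    dfdx[-1, c < 0] = 0.0
    return f + dt * (-c[None, :] * dfdx + (fM - f) / tau)
```

Starting from Maxwellians built on Sod's initial values, repeated calls to `step` evolve the discrete distribution function in time.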
For the numerical reference data, we use discrete velocities c_k ∈ [−10, 10] and N_t = 25 time steps t_n ∈ [0, 0.12], summarized in table 1. It is possible to choose another range for the discrete velocity points, but in typical applications the range of the bulk velocity is not known, such that one has to include a safety margin; we therefore chose the domain [−10, 10]. The goal of the model order reduction is now to reduce the complexity of the computation using lower dimensional models. For that matter, it is not relevant what the actual error of the numerical reference data is or whether it is fully converged. It is fair to say that a full reference solution might easily take into account more spatial points, time steps, and discrete velocities, which makes it even more necessary to reduce the complexity.
For the model order reduction later, we consider two different Knudsen numbers for Sod's shock-tube test case: Kn = 0.00001 for a small Knudsen number in the hydrodynamic regime and Kn = 0.01 for a relatively large Knudsen number in the rarefied regime.
To understand the behavior of the reference solutions for non-vanishing Knudsen numbers, we first describe the solution for vanishing Knudsen number in equilibrium, i.e., Kn = 0, which can be obtained using the method of characteristics and the Rankine-Hugoniot jump conditions connecting the states before and after the shocks [24].
Starting from the initial condition in fig. 3a, the solution evolves for t > 0 and five regions are formed, depicted in fig. 3b [32]. A rarefaction wave moves to the left between x1 and x2. The contact discontinuity is located at x3, where the macroscopic velocity u and the pressure p are continuous, in contrast to the density ρ and the energy E. x4 is the position of the shock wave.
In non-equilibrium, i.e., for solutions evolving with Knudsen numbers Kn > 0, the solution does not have discontinuities due to the finite relaxation time τ. Figure 3c shows the reference solutions f(t, x, c) at t0 = 0, t1 = 0.06 and t3 = 0.12 for the two levels of rarefaction considered in this paper: Kn = 0.00001 and Kn = 0.01. Increasing the Knudsen number leads to a smoother transition from region 1 to region 5 with a less pronounced shock front.

Methods
In this section we present two common methods used for reducing the dimensionality of high dimensional data: (1) the proper orthogonal decomposition (POD) and (2) neural autoencoder networks (AE). The methods will be used to parameterize the high dimensional data stemming from the DVM simulation, using a linear mapping in the case of the POD and a non-linear mapping in the case of the AE. Although the classical POD-MOR approach shows that linear mappings are sufficient to describe the non-linear solution manifold of the BGK equation to good accuracy [1], it is in general not sufficient to determine a parsimonious model of the full model data, since the data manifold can be non-linear. Here, neural autoencoder networks can be used, as they are capable of finding the intrinsic dimension of a system.

Proper orthogonal decomposition
The proper orthogonal decomposition (POD) [31] approximates the data with the help of dyadic pairs,

f(x, t, c_i) ≈ Σ_{k=1}^{r} f̃_k(x, t) ψ_k(c_i). (13)

The pairs {(f̃_k(x, t), ψ_k(c_i))}_{k=1,...,r} are the structures in the data that contain the most energy; they are chosen to minimize the gap between the data and the reconstruction eq. (13). In the following, the ψ_k(c_i) are termed POD modes and the f̃_k(x, t) the corresponding reduced variables.
Technically, one can solve this optimization problem using a singular value decomposition (SVD) of the so-called snapshot matrix [22]. The snapshot matrix F collects the time and space discrete distribution function in its columns: each column holds the time-spatial values for a different discrete velocity.
Performing an SVD factorizes the snapshot matrix as F = ΦΣΨᵀ. The first r columns of the truncated Ψ_r = [ψ_1, ..., ψ_r] ∈ R^(N_c × r) contain the POD modes in eq. (13). Together with Σ_r = diag(σ_1, ..., σ_r) and the r leading left singular vectors Φ_r ∈ R^((N_x N_t) × r), they yield the rank-r approximation of the snapshot matrix,

F_r = Φ_r Σ_r Ψ_rᵀ.

According to the Eckart-Young-Mirsky theorem [6,28], F_r is the best rank-r approximation, and the resulting error in the Frobenius norm can be computed rigorously from the trailing singular values,

‖F − F_r‖_F = ( Σ_{k=r+1}^{N_c} σ_k² )^(1/2).

A common choice for r is to truncate once a certain energy percentage is reached in the reduced system compared to the full system,

E_cum(r) = Σ_{k=1}^{r} σ_k² / Σ_{k=1}^{N_c} σ_k². (19)

In this paper, POD is used as the baseline for comparison with the autoencoder from the next section.
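The SVD-based truncation, the Eckart-Young error identity, and the cumulative-energy criterion can be sketched in a few lines of numpy; the snapshot data below is synthetic and only stands in for the matrix F.

```python
import numpy as np

# Synthetic low-rank snapshot matrix standing in for the DVM data.
rng = np.random.default_rng(0)
Nxt, Nc, r = 500, 40, 4
F = rng.standard_normal((Nxt, 10)) @ rng.standard_normal((10, Nc))  # rank 10

# POD via thin SVD and rank-r truncation.
Phi, sigma, PsiT = np.linalg.svd(F, full_matrices=False)
F_r = Phi[:, :r] * sigma[:r] @ PsiT[:r, :]      # rank-r approximation

# Eckart-Young: Frobenius error equals the trailing singular values.
err = np.linalg.norm(F - F_r, 'fro')
err_theory = np.sqrt(np.sum(sigma[r:] ** 2))
print(err, err_theory)                           # the two agree

# Cumulative-energy criterion: smallest r reaching 99% of the energy.
energy = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)
p_pod = int(np.argmax(energy >= 0.99)) + 1
```

Since the synthetic data has exact rank 10, the trailing singular values vanish and `p_pod` is at most 10.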

Autoencoders
Neural networks, particularly autoencoder networks, have become widely used tools for dimension reduction [36]. A comprehensive introduction to autoencoder networks can be found in [12]. Here we only briefly summarize the common idea of autoencoder networks and give the specific details of our implementation.
An autoencoder aims to reproduce the input data while compressing it through an information bottleneck. It consists of two main components:
• The encoder g_enc maps the input data f ∈ R^M to points f̃ in a lower-dimensional latent space: g_enc : R^M → R^r, f ↦ f̃.
• The decoder g_dec reconstructs the input space from the latent representation: g_dec : R^r → R^M, f̃ ↦ f̂.
Note that the dimension of the latent space is denoted by r, to match the rank of the POD approximation.
The autoencoder is defined as the composition of both parts: f̂ = g_dec(g_enc(f)). For our purpose, we identify the discrete velocity space as the input dimension, M = N_c. Thus, the autoencoder maps each time-spatial value of the distribution function f(x, t) ∈ R^(N_c) onto a smaller latent space f̃(x, t) ∈ R^r, which parameterizes the necessary physical information of the system.
The goal of the optimization procedure is to determine g_dec and g_enc such that the reconstruction error over the set of training/testing data contained in F is minimized. The reconstruction error is defined as

L = Σ_{x,t} ‖f(x, t) − f̂(x, t)‖²₂,

i.e., the sum over all snapshots of the squared two-norm, in the discrete velocities, of the difference between the input data f and the reconstruction f̂ that has been squeezed through the informational bottleneck. The assumption is that if the original data can be represented well although the information passed through a smaller latent space, there exists a physical law in the latent space that describes the system sufficiently. The smallest latent dimension r = p* that is sufficient to describe the data is then called the intrinsic physical dimension, similar to the intrinsic dimension defined in [23]. Such a reduced model is termed parsimonious because it explains the data with a minimum number of variables.
In the training procedure, the functions g_enc and g_dec are determined by trainable parameters of the network, referred to as weights and biases. The networks are constructed as a composition of layers.

Training. Before the training we initialize the weights of the network using the standard initialization implemented in pytorch, i.e., the weights are drawn uniformly at random from the interval [−m^(−1/2), m^(−1/2)], with m the number of input nodes of the layer. Our network is trained by splitting the data, consisting of N_x × N_t samples, into a training and testing set with an 80/20 split, over 3000 epochs with a batch size of 4. In each epoch the network is updated using the Adam optimizer with a learning rate of 10^(−5). More information about hyperparameters and training of the network can be found in the appendix.
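The encoder/decoder structure described above can be sketched in plain numpy as a forward pass. The LeakyReLU and tanh activations and the PyTorch-style uniform initialization follow the description in the text, but the hidden width and the concrete network below are an untrained illustration, not the trained FCNN.

```python
import numpy as np

rng = np.random.default_rng(0)
Nc, hidden, r = 40, 32, 3        # input dim, hidden width (assumed), latent dim

def layer(m, n):
    """Linear layer with PyTorch's default U(-1/sqrt(m), 1/sqrt(m)) init."""
    b = 1.0 / np.sqrt(m)
    return rng.uniform(-b, b, (m, n)), rng.uniform(-b, b, n)

W1, b1 = layer(Nc, hidden)
W2, b2 = layer(hidden, r)        # encoder output: latent code
W3, b3 = layer(r, hidden)
W4, b4 = layer(hidden, Nc)       # decoder output: reconstruction

leaky = lambda z: np.where(z > 0, z, 0.01 * z)   # LeakyReLU

def g_enc(f):
    # tanh on the bottleneck output (the "code activation")
    return np.tanh(leaky(f @ W1 + b1) @ W2 + b2)

def g_dec(code):
    return leaky(code @ W3 + b3) @ W4 + b4

f = rng.standard_normal((100, Nc))     # 100 snapshots f(x_i, t_n)
code = g_enc(f)                        # latent variables, shape (100, r)
f_hat = g_dec(code)                    # reconstruction, shape (100, Nc)
loss = np.sum(np.linalg.norm(f - f_hat, axis=1) ** 2)  # reconstruction error
```

Training would then minimize `loss` over the weights and biases, e.g. with Adam as described above.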

Results
We reconstruct the full order model (FOM) solution with the help of the POD and an autoencoder, the FCNN, for which the selection of hyperparameters and the training are described in the previous section. Note that we apply both model reduction techniques to reconstruct both the rarefied reference data and the hydrodynamic reference data. The goal is to later determine the intrinsic dimension of the data for both cases. We therefore compare the two dimension reduction techniques by means of different measures. The intrinsic variables obtained from the POD and the FCNN will be referred to as h when reducing the hydrodynamic data and as r when reducing the rarefied data.
For the purpose of comparing the results we define the L2-error between the reference data F, given by the snapshot matrix, and the reconstructed data F̂, which is either F_r in the case of the POD or the FCNN prediction with r latent variables, evaluated for every (x, t, c) in the data set.

Singular value decay of reference data
As a first step, we perform a POD of the hydrodynamic data and of the rarefied data.
The obtained singular values σ, as well as the cumulative energy (cusum-e) defined in eq. (19), are shown in fig. 4. As expected, more modes are necessary in the rarefied regime than in the hydrodynamic regime. With a total of p_POD = 4 intrinsic variables, a cumulative energy of over 99% is achieved for the hydrodynamic regime. The cumulative energy of the singular values of the rarefied regime only reaches 99% with p_POD = 6 singular values.
For the POD we define the intrinsic dimension p_POD as the smallest truncation rank r of the reduced system at which the cumulative energy E_cum defined in eq. (19) reaches 99%. Although this choice is somewhat arbitrary, it is common practice in classical MOR to truncate eq. (13) once 99% of the cumulative energy is reached.
In fig. 4, we further see that the singular values decay approximately exponentially in both regimes, as has also been observed in [1]. Consequently, a rapid decay of the Kolmogorov N-width is indicated. Note that the singular value decay is similar for both regimes but not exactly the same, leading to an expected increase in the number of intrinsic variables necessary in the rarefied regime to achieve similar L2-errors. It is important to note that the parameter p_POD is not expected to precisely match the actual intrinsic dimension p* of the solution manifold, which represents the minimum number of variables required to accurately describe the system's exact solution manifold. This discrepancy arises because the solution manifold is fundamentally nonlinear, making it challenging to capture adequately with a parsimonious linear approximation.
For the FCNN, the intrinsic dimension is defined as the smallest number of intrinsic variables that minimizes the error. In well-trained models, the FCNN's intrinsic dimension should ideally align with p*.
From a fluid mechanics perspective, the hydrodynamic regime theoretically requires only p* = 3 intrinsic variables. This is because near-equilibrium flows in this regime can be effectively characterized by three macroscopic quantities: density ρ, macroscopic velocity u, and total energy E, as outlined in eqs. (8)-(10); see also [1,19].
Conversely, the rarefied regime demands a larger intrinsic dimension p*. This is due to the need for more than only the equilibrium Maxwellian distribution function to describe the microscopic velocities adequately. Therefore, we initially set p* = 3 intrinsic variables (h) for the FCNN in the hydrodynamic case and choose p* = 5 intrinsic variables (r) for the rarefied regime. This choice aligns with extended fluid dynamic models as described in [19,35].
We note that each FCNN with a different latent space dimension r needs to be trained separately. This is different from the POD, where the decomposition is performed only once; thereafter the approximation quality is given by the truncation rank r.

Variation of the number of intrinsic variables
The variation of the number of intrinsic variables r in fig. 5 sheds light on the performance of the autoencoder with different bottleneck layer sizes. Here, r is the truncation rank of the decomposition eq. (13) in the case of the POD and the latent space dimension in the case of the FCNN. To this end, r is varied for both the POD and the FCNN over r ∈ {1, 2, 3, 4, 8, 16, 32} for the hydrodynamic case and over r ∈ {1, 2, 4, 5, 8, 16, 32} for the rarefied case. We note that the loss of information when applying POD goes exponentially to zero with increasing r, which is not surprising in view of the Eckart-Young theorem [6]. Note that the FCNN is retrained for each different r. By changing r, i.e., widening the bottleneck layer, a gain or loss of capacity occurs that can be connected to stability during training. Both for the hydrodynamic and the rarefied regime, POD initially yields a larger error than the FCNN for a small number of intrinsic variables r. Not surprisingly, the POD accuracy increases with the number of singular values taken into account until the error reaches machine precision. The FCNN error decreases as well but then reaches a plateau, with a typical remaining error due to the network architecture and training. For the previously identified values p* = 3 in the hydrodynamic case and p* = 5 in the rarefied case, the FCNN results in a more accurate approximation than the POD.
We note that when testing the FCNN against the POD for fixed r, the FCNN is limited by the estimation error of the training and performs below its capabilities. However, the POD uses five to six times more parameters than the FCNN, while its deterministic character enables the POD to achieve any desired accuracy, which was not observed with the neural network.
In the following, we consider the intrinsic variables with constant values p* = 3 in the hydrodynamic case and p* = 5 in the rarefied case.
This leads to the number of trainable parameters of the POD and the FCNN shown in table 3. We can see that the FCNN achieves a relatively small error with a small number of parameters in comparison with the POD for this choice of the number of intrinsic variables.

Reconstruction quality
The reconstructed solutions compared to the full order model (FOM) at the final time step t = 0.12s are given in fig. 6. Because of the small overall errors indicated in table 3, both the POD and the FCNN reproduce the FOM solution without any visible differences at first sight. As shown in fig. 7, the more subtle information loss from the model reduction can unfold in actual differences in the macroscopic quantities ρ, ρu and E. Overall, fig. 7 shows that the errors are larger for the hydrodynamic regime (top row), most notably for the momentum and energy of the POD model close to the contact discontinuity. However, the position of the shock is well approximated. In contrast, the FCNN model yields a very good agreement in the hydrodynamic case. For the rarefied regime, both models approximate the FOM solution very well. The lack of sharp shock structures in the full model, combined with the increased intrinsic dimension p* = 5, seems to notably benefit the accuracy here.

Conservation properties
The physical consistency of the reduced f̂, in terms of conservation of mass, momentum, and energy, is a critical criterion for its validity. Hence, conservation properties are analyzed in the following. We note that conservation of mass, momentum, and energy is not directly built in by means of a specifically tailored loss function, even though we can expect to recover some conservation properties, as they are implicitly built into the numerical reference data. The investigation of different loss functions to improve upon this is left for future work.
We investigate the conservation properties by means of the time derivative of the cell-averaged conserved quantities mass, momentum, and total energy, defined for the mass, for example, as the average of ρ(x, t) over all spatial cells, such that a vanishing derivative ρ̄′ = 0 denotes conservation of mass. We expect conservation of mass and total energy, while the momentum increases at a constant rate due to the boundary conditions of the test case, which feature a larger pressure on the left-hand side of the domain. Mass and energy are indeed conserved to reasonable accuracy by the FCNN, while the error is larger for the POD reconstruction in both regimes. Also, the increase in momentum of the full order model (FOM) is accurately described by the FCNN, with a larger error for the POD method in both regimes. Overall, the errors are slightly smaller for the rarefied case compared to the hydrodynamic case, which might be due to the higher capacity of the neural network and the larger number of POD modes when using five intrinsic variables in the rarefied regime.
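As a small illustration of this diagnostic, the following sketch computes the cell-averaged mass over time and its discrete time derivative; the grids and the trivially mass-conserving data are synthetic placeholders, not reconstructed solutions.

```python
import numpy as np

Nt, Nx, Nc = 25, 200, 40
dx, dc = 1.0 / Nx, 20.0 / Nc
rng = np.random.default_rng(0)

# Synthetic distribution function f(t, x, c); every time slice is identical,
# so the total mass is conserved by construction.
f = np.abs(rng.standard_normal((Nt, Nx, Nc)))
f[:] = f[0]

rho = f.sum(axis=2) * dc          # density rho(t, x) by velocity quadrature
mass = rho.sum(axis=1) * dx       # cell-averaged total mass per time step
drift = np.gradient(mass)         # discrete time derivative; zero if conserved
print(np.max(np.abs(drift)))
```

Applying the same computation to a reconstructed f̂ and comparing the drift against that of the FOM data reproduces the conservation check described above.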

Physical interpretability
An important question in the context of model reduction with neural networks is the interpretability of the results, because of the usual black-box nature of neural networks [10]. Especially when benchmarking neural networks for model order reduction against POD, evaluating the interpretability of the intrinsic variables is important, since POD by construction achieves a so-called physically interpretable decomposition of the input data [3], as outlined previously.
Following the assumption that the hydrodynamic case can be fully described in terms of three macroscopic quantities and that the rarefied case is reasonably describable in a similar way with an extended set of five variables, we test the intrinsic variables h and r for similarities and investigate whether they match any macroscopic quantity. Two related macroscopic quantities, namely the temperature T and the macroscopic velocity u, are added to the three macroscopic variables ρ, ρu, and E. In fig. 9 and fig. 12, these are plotted first over the whole domain of x and t and then for the end time t = 0.12s for both regimes. Similarly, we plot both the FCNN's and the POD's first 3 intrinsic variables h of the hydrodynamic case and 5 intrinsic variables r = [r0(x, t), r1(x, t), r2(x, t), r3(x, t), r4(x, t)] of the rarefied test case, depicted in fig. 10 and fig. 13, respectively.
For the POD in the hydrodynamic case, the first intrinsic variables closely resemble the density ρ and the momentum ρu. Interestingly, this does not translate into very good conservation properties of the POD in comparison with the FCNN, as shown in the previous section and fig. 8. Considering the FCNN in the rarefied case, we compare fig. 12 and fig. 13. Here r3 clearly reflects the shape of the density ρ. Moreover, the peak of the velocity u can be observed in r0. For the other intrinsic variables r1, r2 and r4, a clear correspondence to macroscopic quantities is difficult to observe and might require linear or nonlinear combinations. Additionally, it is possible that those intrinsic variables resemble non-equilibrium variables not present among the macroscopic variables.
Considering the POD results in fig. 14, we again see a very good agreement of the first intrinsic variables with density and momentum.
For more physical insight into the relation between macroscopic variables and intrinsic variables, the Pearson correlation between all variable combinations is computed for the FCNN. Note that we expect similar results for the POD based on the previous results, but do not present them here for conciseness. The Pearson correlation coefficient r_X,Y = r_Y,X is a measure of linear correlation between two sets of data, here represented by a macroscopic variable X ∈ {ρ, ρu, E, T, u} and an intrinsic variable Y ∈ {r0, r1, r2} for the hydrodynamic case and Y ∈ {r0, r1, r2, r3, r4} for the rarefied case. It is commonly defined as

r_X,Y = cov(X, Y) / (σ_X σ_Y),

with cov the covariance and σ the standard deviation. Note that r_X,Y ∈ [−1, 1], with r_X,Y = 0 meaning that there is no linear correlation between the two data sets, r_X,Y = 1 indicating a perfect correlation, and r_X,Y = −1 indicating a perfect anti-correlation.
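A minimal sketch of this correlation test, using synthetic stand-ins for a macroscopic variable and two latent variables (one perfectly correlated, one independent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(5000)       # e.g. samples of a macroscopic variable
Y0 = 2.0 * X + 1.0                  # latent variable: affine function of X
Y1 = rng.standard_normal(5000)      # latent variable: independent noise

def pearson(a, b):
    """Pearson correlation coefficient cov(a, b) / (sigma_a * sigma_b)."""
    a, b = a - a.mean(), b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

print(pearson(X, Y0))       # ~ 1.0: perfect linear correlation
print(pearson(X, Y1))       # near 0: no linear correlation
```

Note that, as in the text, this only detects linear relations: a latent variable that is a nonlinear function of X can still score near zero.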
The Pearson correlation coefficients for the hydrodynamic test case are presented in fig. 15.
As predicted by the previous analysis, there appears to be an almost perfect correlation of the first intrinsic variable r0 with the density ρ. This means that the FCNN succeeds at identifying the density ρ precisely as an internal variable. Note that r0 also correlates almost perfectly with the energy E, as the energy depends linearly on ρ. Additionally, there is a relatively strong linear correlation between r1 and both the momentum ρu and the temperature T. The intrinsic variable r2, on the other hand, seems to be correlated with all variables.
The Pearson correlation coefficients for the rarefied test case are presented in fig. 16. In agreement with the previous results, there is no clear correlation for most of the intrinsic variables. An exception is r0, which is anti-correlated with u, and r3, which correlates almost perfectly with the energy E. Note that only linear correlations are tested here. For further analysis of the disentanglement of the intrinsic variables, it might be more suitable to consider nonlinear correlations beyond the Pearson correlation coefficient. One option that could be explored in future work is to learn the functional relation between the intrinsic variables and macroscopic variables, for example by means of symbolic regression [4].

Conclusion
This paper marks the first comparison of velocity space model reduction techniques for rarefied flows using proper orthogonal decomposition (POD) and a fully connected neural network (FCNN).
As physically expected, the rarefied regime needs more modes than the hydrodynamic regime. Choosing three and five intrinsic variables for the hydrodynamic and rarefied case, respectively, leads to less than one percent error. The FCNN is initially more accurate but retains a residual error even for larger numbers of intrinsic variables, while the POD eventually achieves higher accuracy as the number of modes increases. The resulting errors in the macroscopic variables are small, especially in the smoother rarefied case.
Even though conservation is not strictly enforced, the FCNN approximately exhibits the correct conservation of mass, momentum, and energy, while the POD shows a slightly larger error. The correlation of intrinsic variables and macroscopic variables is investigated by means of the evolution of the reconstructed values and the pairwise correlations for the FCNN. The density is directly included in the latent space, while the relation with the other macroscopic variables seems to be more complex.
In addition to optimizing the neural network's performance, our research endeavors to enhance its predictive capabilities by incorporating fundamental physical properties such as conservation and interpretability. This objective includes the exploration of different loss functions and regularization techniques. Furthermore, future work can apply the model reduction approach in real-world scenarios, including simulations and parameter predictions. Lastly, the presented concepts could be applied in related fields, such as shallow flows [16] or fusion plasmas [21].
Appendix: Hyperparameters for the Fully Connected Autoencoder
In this section, we describe the tests conducted to optimize the hyperparameters. The hyperparameters include: the number of layers, i.e., the depth; the number of nodes per hidden layer, i.e., the width; the batch size; the non-linear activation functions; the number of training epochs; and the learning rate. Experiments are evaluated through the validation error, which estimates the model's ability to generalize; the training error, which measures the fit to the training data; and the L2-error, which estimates how well the model performs on the whole dataset and is therefore used as a comparative metric against POD.
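The comparative L2-error metric can be sketched as follows. The exact normalization is not spelled out here, so a relative Frobenius-norm error over the full snapshot set is assumed; the arrays are placeholders.

```python
import numpy as np

def l2_error(f_true, f_rec):
    """Relative L2 reconstruction error over the whole dataset.

    Assumption: a relative Frobenius-norm error, a common choice for
    comparing ROM reconstructions against the full-order data.
    """
    return np.linalg.norm(f_true - f_rec) / np.linalg.norm(f_true)

# Placeholder data: a constant field perturbed by a uniform offset of 1%.
f = np.ones((100, 40))
f_noisy = f + 0.01
err = l2_error(f, f_noisy)
assert abs(err - 0.01) < 1e-9
assert l2_error(f, f) == 0.0
```

The same function, applied per time step instead of over the whole dataset, yields the error-over-time curves discussed in the qualitative analysis of the reconstructions.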
To start with a working model, an initial estimate of the hyperparameters is made; these are summarized in table 4. They include a mini-batch size of 16, a bottleneck width of 3 and 5 for the hydrodynamic case H and the rarefied case R, respectively, and a learning rate of 0.0001. LeakyReLU is applied as the activation function for the input, output, and all hidden layers except the output of the bottleneck layer, and is referred to as the hidden activation. The hyperbolic tangent is applied as the activation function for the output of the last hidden layer in the encoder, which outputs the code, and is referred to as the code activation. Moreover, 2000 epochs are used initially. This might appear exaggerated but is justified by the small amount of input data and the small size of the network, which yields fast training. Initial experiments showed that overly large updates to the free parameters θ prohibitively slowed down or even prevented the learning process, while small updates to θ let all models quickly reach a minimum. The final hyperparameters for both input data sets are summarized in table 2. From the initial to the final models, the validation error decreased by ≈ 1.5 × 10^−7 for the hydrodynamic case H and ≈ 7.2 × 10^−8 for the rarefied case R, which amounts to 93% of the initial values for both models.

Figure 1: Illustration of the macroscopic moments corresponding to an example distribution function.

Figure 2: Problem setup for the 1D Sod shock tube. A diaphragm at the center initially separates the domain into two regions, where initial conditions for density ρ, macroscopic velocity u, and pressure p are indicated.

Figure 3: Sod shock tube and reference data. Initial conditions (a); equilibrium solution (b); reference solutions in the rarefied and hydrodynamic regimes (c); macroscopic quantities at t = 0.12s (d).
where U and V are orthogonal matrices containing the left and right singular vectors, respectively.
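The POD construction via the thin SVD, together with the 99% cumulative-energy criterion used in fig. 4 to select the truncation rank, can be sketched as follows. The snapshot matrix here is synthetic with an artificially decaying spectrum; in the paper its columns would be velocity-space snapshots of the distribution function.

```python
import numpy as np

# Synthetic snapshot matrix A of shape (N_c, m): N_c velocity-space DOFs per
# column, m space-time snapshots, with a decaying singular spectrum imposed.
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 200)) @ np.diag(np.geomspace(1.0, 1e-6, 200))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD

# Cumulative energy of the singular values; pick the smallest rank k that
# captures at least 99% of the total energy (black cross in fig. 4).
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99)) + 1

# Rank-k POD reconstruction.
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

assert energy[k - 1] >= 0.99
# Eckart-Young: the truncation error equals the energy in the dropped modes.
assert np.isclose(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:]**2)))
```

The singular-value decay of `s` and the curve `energy` are exactly the two quantities plotted in fig. 4 for the hydrodynamic and rarefied regimes.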

Figure 4: Singular value decay σ and cumulative energy increase over the number of singular values k in the hydrodynamic regime (a) and in the rarefied regime (b). A black cross marker corresponds to over 99% cumulative energy.

Figure 5: The L2-error over the variation of the latent space dimension/truncation rank r using FCNN/POD for the hydrodynamic regime (left) and the rarefied regime (right).

Figure 6: Comparison of the FOM solutions f (left column) with reconstructions f obtained from POD (middle column) and FCNN (right column) at end time t = 0.12s for x ∈ [0.375, 0.75] in the hydrodynamic regime (top row) and in the rarefied regime (bottom row).

Figure 7: Comparison of FOM macroscopic quantities ρ, ρu, and E with POD and the FCNN for the hydrodynamic regime (top row) and the rarefied regime (bottom row) at time t = 0.12s.

Figure 8 shows the evolution of the derivatives of mass, momentum, and total energy as a function of time for the hydrodynamic regime (top row) and the rarefied regime (bottom row).

Figure 9: Macroscopic quantities of the hydrodynamic case obtained by the FOM. Density ρ, momentum ρu, total energy E, temperature T, and velocity u over time t and space x in the top row and at t = 0.12 in the bottom row.

Figure 15: Pearson correlation between macroscopic quantities and intrinsic variables for the hydrodynamic case.

Figure 16: Pearson correlation between macroscopic quantities and intrinsic variables for the rarefied case.

Figure 17: Five experiments over different depths for the hydrodynamic case H (left) and the rarefied case R (right). The number of layers used in each experiment is given. Training and validation loss are shown over 2000 epochs.

Figure 18: Five experiments over different widths for the hydrodynamic case H (left) and the rarefied case R (right). The number of nodes used in each experiment is given. Training and validation loss are shown over 4000 epochs.

Figure 19: Five experiments over different batch sizes for the hydrodynamic case H (left) and the rarefied case R (right). The batch size used in each experiment is given. Training and validation loss are shown over 5000 epochs.

Figure 20: Eight experiments with different combinations of activation functions for the hydrodynamic case H (left) and the rarefied case R (right). Shown are training and validation error over 5000 epochs.

Table 1: Problem setup for the Boltzmann-BGK model in Sod's shock tube.

Table 2: Hyperparameters of the fully connected autoencoder network (FCNN) for the hydrodynamic and rarefied regimes.
Typically, each layer L_n : R^i → R^o in the network consists of an affine linear mapping followed by a non-linearity, x ↦ h_n(W_n x + b_n), where W_n ∈ R^{o×i} are the weights, b_n ∈ R^o are the biases, and h_n is a predefined non-linear function. The input and output dimensions i and o of each layer, the choice of activation function, and the number of layers collectively determine the architecture of the network. The choice of these so-called hyperparameters is often difficult and a matter of trial and error.
Architecture. In our studies we have explored fully connected neural autoencoder networks (FCNN) and convolutional autoencoder networks (CNN). However, in this manuscript we restrict ourselves to the results of the fully connected network, since it gave structurally the best results. We have studied a variety of activation functions, hidden layers, batch sizes, and depths of the network. The best results with respect to the validation error and an acceptable training time were obtained by the network defined in table 2; a comprehensive study of the parameter optimization is attached to the manuscript in section 5. The layer widths are [N_c, 30, r, 30, N_c] for the hydrodynamic regime and [N_c, 40, r, 40, N_c] for the rarefied regime.

Table 3: Number of parameters used to reconstruct f, the number of intrinsic variables p, and the corresponding L2-error for POD and FCNN, both for the hydrodynamic (H) and rarefied (R) regimes.
Next, a qualitative analysis of the actual reconstructions is presented. Computations of the L2-error over time t, not shown for conciseness, reveal that the last time step contributes most to the error for POD, while the FCNN distributes the error more evenly over all time steps.

Table 4: The initial selection of batch size, bottleneck size, number of epochs, learning rate, and applied activation functions.
The model's depth is determined first because it controls the model's representational capacity and can therefore cause over- and underfitting early in the hyperparameter search. The results of the experimentation are shown in fig. 17 and table 5 for both rarefaction levels.

Table 5: Results for the variation of the depth. Given are the minimum values of the training and validation errors as well as the L2-error. The minima were reached within roughly the last 50 epochs of training; the L2-error is evaluated with the model at the last epoch.

For the hydrodynamic case H, the lowest validation error of 7.74 × 10^−8 and an L2-error of 0.0031 are reached with 4 layers, which constitutes the best-performing design. Additionally, as seen in fig. 17 (left), designs exceeding 4 layers show slight overfitting from the 500th epoch onward. Designs with fewer than 4 layers do not reach the validation and L2-errors of the other designs, indicating that their capacity is too low. With 4 layers, overfitting occurs only after the 1000th epoch and is of smaller magnitude than in the other three models that overfit.

For the rarefied case R, the lowest validation error of 1.61 × 10^−7 is also reached with 4 layers. On the other hand, the lowest L2-error of 0.0031 and the lowest training error of 1.40 × 10^−7 are reached with 6 layers. Contrary to the hydrodynamic case, the training and L2-errors are lower for 6 layers, although the validation error is not. Looking at fig. 17 (right), the model with 6 layers starts to overfit after 1500 epochs, yielding a decreasing training error and a stagnating validation error. Hence the model improved on the optimization task, which additionally improves the L2-error, but its generalization ability, measured by the validation error, did not improve and remains larger than that reached with 4 layers. We conclude that a model with 4 layers is the best-performing design. Qualitatively, the training is very stable for both rarefaction levels: training and validation errors do not diverge excessively and converge early in training. A separation of training and validation error occurs most prominently for the hydrodynamic solution.

The width of the two remaining hidden layers is examined next. For both the hydrodynamic and the rarefied regime, five experiments are conducted, lowering the number of hidden units from fifty to ten. Note that the decoder is chosen to be a structural reflection of the encoder, so only one parameter is varied. Results for the hydrodynamic case H and the rarefied case R are shown in table 6. Since the contribution of over- and underfitting is negligible, the training error is omitted. A model with 30 hidden units in encoder and decoder performs best for the hydrodynamic case H, reaching a validation error of 1.77 × 10^−8; the corresponding L2-error equals 1.5 × 10^−3 with a shrinkage factor of 0.015. Overall, the losses for the hydrodynamic case H are quite similar across experiments, ranging from 1.77 × 10^−8 to 5.11 × 10^−8; the L2-error behaves similarly and is even equal for 50 and 30 hidden units. A model with 40 hidden units performs best for the rarefied case R, with a validation error of 1.65 × 10^−8 and L2 = 1.4 × 10^−3, which is smaller than for the hydrodynamic case H. The shrinkage factor here is 0.125. In all experiments, a model with 10 hidden units performs worst. Training and validation errors over 4000 epochs for both experiments are shown in fig. 18.

Table 6: Results for the variation of the width. Given are the minimum value of the validation error as well as the L2-error. The minima were reached within roughly the last 50 epochs of training; the L2-error is evaluated with the model at the last epoch.

Next, the mini-batch size is analyzed; results are displayed in table 7. Experiments are conducted with mini-batch sizes of 2, 4, 8, 16, and 32. For the hydrodynamic case H, the smallest batch size of 2 yields the lowest validation error of 1.15 × 10^−8 with a corresponding L2 = 0.0012 at epoch 4956. For the rarefied case R, the lowest validation error of 6.30 × 10^−9 is achieved at epoch 4534 with a batch size of 4; the corresponding L2-error equals 0.001. Small batch sizes have a regularizing effect on the training and are therefore beneficial to generalization. At the same time, the smaller the batch size, the more unstable the training, as seen in fig. 19. The oscillations that appear for batch sizes of 8 and lower, which destabilize training, can be cured by lowering the learning rate as soon as the training starts to oscillate. Additionally, small batch sizes drastically increase training time, so a batch size as low as 2 is not used for the experiments. In conclusion, a batch size of 4 is chosen. Furthermore, the learning rate is reduced from 1 × 10^−4 to 1 × 10^−5 after the 3000th epoch.
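The learning-rate reduction described above amounts to a simple step schedule, sketched below; the function name and signature are illustrative, not from the paper's code.

```python
def learning_rate(epoch, lr_initial=1e-4, lr_reduced=1e-5, switch_epoch=3000):
    """Step schedule assumed from the text: the learning rate is dropped by a
    factor of ten after the 3000th epoch to stabilize training with small
    batch sizes."""
    return lr_initial if epoch <= switch_epoch else lr_reduced

assert learning_rate(1) == 1e-4
assert learning_rate(3000) == 1e-4
assert learning_rate(3001) == 1e-5
```

In a training loop this function would be queried once per epoch; the immediate drop of both training and validation error at the 3001st epoch reported below is consistent with this step change.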

Table 7: Results for the variation of the batch sizes. Given are the minimum value of the validation error as well as the corresponding epoch. Additionally, the L2-error is given, evaluated with the model at the last epoch.

Eight experiments with different activation functions (ReLU, ELU, Tanh, SiLU, and LeakyReLU) are performed. The experiment designs and results are given in table 8 for the hidden and code activations. For the hydrodynamic case H, combining ELU and ELU for the hidden and code activation, respectively, yields the best validation error of 4.44 × 10^−9 with a corresponding L2-error of 0.0008; these values are achieved at the last epoch. For the rarefied case R, a combination of ReLU and ReLU for the hidden and code activation produces a validation error of 7.18 × 10^−9 and a corresponding L2-error of 0.0009, both reached close to the last epoch. Note that all models reach their lowest loss at or very close to the last epoch. The reason is the stable training after the 3000th epoch, where the learning rate is lowered to 1 × 10^−5 as described in section 5. In all experiments this measure shows an immediate effect on learning: both validation and training errors drop at the 3001st epoch and only decrease slightly thereafter. This behavior clearly