Moment closure approximations of the Boltzmann Equation based on {\phi}-divergences

This paper is concerned with approximations of the Boltzmann equation based on the method of moments. We propose a generalization of the setting of the moment-closure problem from relative entropy to {\phi}-divergences and a corresponding closure procedure based on minimization of {\phi}-divergences. The proposed description encapsulates as special cases Grad's classical closure based on expansion in Hermite polynomials and Levermore's entropy-based closure. We establish that the generalization to divergence-based closures enables the construction of extended thermodynamic theories that avoid essential limitations of the standard moment-closure formulations such as inadmissibility of the approximate phase-space distribution, potential loss of hyperbolicity and singularity of flux functions at local equilibrium. The divergence-based closure leads to a hierarchy of tractable symmetric hyperbolic systems that retain the fundamental structural properties of the Boltzmann equation.

In this paper we consider alternative moment-closure relations for the Boltzmann equation, based on approximations of the exponential function derived from truncations of its standard limit definition, exp(·) = lim n→∞ (1 + ·/n) n . It is to be noted that closure relations derived from a seriesexpansion definition of the exponential have received scant attention before, e.g., by Brini and Ruggeri [11]. Our motivation for considering the limit definition instead of the series-expansion definition for constructing the moment closures is based on the direct availability of a corresponding inverse relation for higher order approximations. We propose a generalization of the setting of the moment-closure problem from Kullback-Leibler divergence [31] (i.e relative entropy) to the class of ϕ-divergences [13]. The considered ϕ-divergences constitute an approximation to the Kullback-Leibler divergence in the vicinity of some Maxwellian. It will be shown that the approximateexponential closure relation can be derived via constrained minimization of a corresponding ϕ-divergence. The proposed description encapsulates as special cases Grad's closure relation and Levermore's entropy-based closure relation. For even order approximations of the exponential, the closure relation engenders non-negative phase-space distributions. Moreover, the corresponding moment systems are symmetric hyperbolic and tractable, in the sense that the formulation only requires the evaluation of higher-order moments of Gaussian distributions. The moment systems furthermore dissipate an appropriate ϕ-divergence, analogous to the dissipation of relative entropy of the Boltzmann equation, provided that the collision operator dissipates the corresponding ϕ-divergence. We will show that the class of collision operators that dissipate appropriate ϕ-divergences includes the standard BGK [4] and generalized BGK [33] operators.
The remainder of this paper is organized as follows. Section 2 abstracts, for completeness, well known structural features of the Boltzmann equation to be retained in the developed moment system approximation. Section 3 introduces concepts relevant to moment systems pertaining to subspace approximations and reviews the moment closures of Grad [19] and Levermore [33] in light of the aforementioned issues, namely, admissibility of phase-space distributions, hyperbolicity, realizability and tractability. Section 4 presents a novel tractable moment closure approximation and, moreover, it will be shown that the corresponding closed system of moment equations are well-posed and retain the structural features of the Boltzmann equation. Finally, section 6 gives a concluding discussion.

The Boltzmann Equation
Consider a monatomic gas, i.e. a gas composed of a single species of identical classical particles, contained in a fixed spatial domain Ω ⊂ R D . Kinetic theory describes the state of such a gas by a non-negative (phase-space) density f = f (t, x, v) over the single-particle phase space Ω × R D . The evolution of f is considered to be governed by the Boltzmann equation, where the collision operator f → C(f ) acts only on the v = (v 1 , . . . , v D ) dependence of f locally at each (t, x) and the summation convention applies to repeated indices. The collision operator is assumed to possess certain conservation, symmetry and dissipation properties, viz., conservation of mass, momentum and energy, invariance under Galilean transformations and dissipation of appropriate entropy functionals. These fundamental properties of the collision operator are treated in further detail below. Our treatment of the conservation and symmetry properties is standard (see, for instance, [33]) and is presented merely for coherence and completeness. For the entropy-dissipation property, we consider a generalization of the usual (relative) entropy to ϕ-divergences [13], to enable an exploration of the moment-closure problem in an extended setting; see Section 4.
To elaborate the conservation properties of the collision operator, let · denote integration in the velocity dependence of any scalar, vector or matrix valued measurable function over D-dimensional Lebesgue measure. A function ψ : where D(C) ⊂ L 1 (R D , R ≥0 ) denotes the domain of C, which we consider to be a subset of the almost everywhere nonnegative Lebesgue integrable functions on R D . Equation (1) associates a scalar conservation law with each collision invariant: We insist that {1, v 1 , . . . , v D , |v| 2 } are collision invariants of C and that the span of this set contains all collision invariants, i.e.
The moments f , v i f and |v| 2 f , correspond to mass-density, the (components of) momentumdensity and energy-density, respectively. Accordingly, the conservation law (3) implies that (1) conserves mass, momentum and energy. The assumed symmetry properties of the collision operator pertain to commutation with translational and rotational transformations. In particular, for all vectors u ∈ R D and all orthogonal tensors O : R D → R D , we define the translation transformation T u : D(C) → D(C) and the rotation transformation with O * the Euclidean adjoint of O. Note that the above transformations act on the v-dependence only. It is assumed that C possesses the following symmetries: The symmetries (4) imply that (1) complies with Galilean invariance, i.e. if f (t, x, v) satisfies the Boltzmann equation (1), then for arbitrary u ∈ R D and arbitrary orthogonal O : The entropy dissipation property of C is considered in the extended setting of [33,Sec. 7], from which we derive the following definition: a convex function η : with η ′ (f ) the derivative of η(f ), and if for every f ∈ D(C) the following equivalences hold: Relation (5) implies that C dissipates the local entropy η(·) , which leads to an abstraction of Boltzmann's H-theorem for (1), asserting that solutions of the Boltzmann equation (1) satisfy the local entropy-dissipation law: The functions η(f ) , v i η(f ) and η ′ (f ) C(f ) are referred to as entropy density, entropy flux and entropy-dissipation rate, respectively. The first equivalence in (6) characterizes local equilibria of C by vanishing entropy dissipation, while the second equivalence indicates the form of such local equilibria. For spatially homogeneous initial data, f 0 , Equations (5) and (6) suggest that equilibrium solutions, f eq , of (1) are determined by: Equation (8) identifies equilibria as minimizers 1 of the entropy, subject to the constraint that the invariant moments are identical to the invariant moments of the initial distribution. The standard definition of entropy corresponds to a density f → f log f , possibly augmented with f ψ where ψ ∈ E is any collision invariant. It is to be noted that for Maxwellians M, i.e. distributions of the form for some (̺, u, T ) ∈ R >0 × R D × R >0 and a certain gas constant R ∈ R >0 , it holds that log M ∈ E. Therefore, the relative entropy f log (f /M) of f with respect to M is equivalent to f log f in the sense of dissipation characteristics. The physical interpretation of the entropy f log f , due to Boltzmann [7,8,9], is that of a measure of degeneracy of macroscopic states, i.e. of the number of microscopic states that are consistent with the macroscopic state as described by the one-particle marginal, f . In the context of information theory, Shannon [50] showed that for discrete probability distributions, the density f → f log f is uniquely defined by the postulates of continuity, strong additivity and the property that mη(1/m) < nη(1/n) whenever n < m. These postulates ensure that for discrete probability distributions the entropy yields a meaningful characterization of information content and, accordingly, rationalize an interpretation of entropy as a measure of the uncertainty or, conversely, information gain pertaining to an observation represented by the corresponding probability distribution [24]. Kullback and Leibler [31] generalized Shannon's definition of information to the abstract case and identified the divergence 2 as a distance between mutually absolutely continuous measures µ 1 and µ 2 , both absolutely continuous with respect to the measure ν with Radon-Nikodym derivatives f 1 = dµ 1 /dν and f 2 = dµ 2 /dν. The Kullback-Leibler divergence characterizes the mean information for discrimination between µ 1 and µ 2 per observation from µ 1 . Noting that the Kullback-Leibler divergence (10) coincides with the relative entropy of f 1 with respect to f 2 , the relative entropy f log(f /M) can thus be understood as a particular measure of the divergence of the one-particle marginal relative to the reference (or background ) distribution M. Kullback-Leibler divergence was further generalized by Csiszár [13] and Ali et. al. [1], who introduced a general class of distances between probability measures, referred to as ϕ-divergences, of the form: where ϕ is some convex function subject to ϕ(1) = ϕ ′ (1) = 0 and ϕ ′′ (1) > 0. Note that the Kullback-Leibler divergence corresponds to the specific case ϕ KL (·) = (·) log(·).
In this work, we depart from the standard (relative) entropy for (1) and instead consider entropies based on particular ϕ-divergences. These ϕ-divergences generally preclude the usual physical and information-theoretical interpretations, but still provide a meaningful entropy density in accordance with (5) and (6). The considered ϕ-divergences yield a setting in which entropy-minimization based moment-closure approximations to (1) are not impaired by non-realizability, exhibit bounded fluxes in the vicinity of equilibrium, and are numerically tractable.
Remark 1 Implicit to our adoption of ϕ-divergence-based entropies is the assumption that such entropies comply with (5) and (6) for a meaningful class of collision operators. It can be shown that the class of admissible collision operators includes the BGK operator [4]: where τ ∈ R >0 is a relaxation time and E (·) corresponds to the map f 0 → f eq defined by (8).
The Kuhn-Tucker optimality conditions associated with (8) convey that η ′ (E f ) ∈ E and, therefore, The dissipation inequality (5) then follows from the convexity of η(·): Moreover, because equality in (13) holds if and only if f = E f , the condition η ′ (f ) C BGK (f ) = 0 implies that f = E f , which in turn yields C BGK (f ) = 0 and η ′ (f ) ∈ E. The equivalences in (6) are therefore also verified. A similar result holds for the multi-scale generalization of the BGK operator introduced in [33]; see Appendix A.

Moment Systems
Moment systems are approximations of the Boltzmann equation based on a finite number of velocitymoments of the one-particle marginal. An inherit aspect of moment equations derived from (1) is that low-order moments are generally coupled with higher-order ones, and consequently a closed set of equations for the moments cannot be readily formulated. Therefore, a closure relation is required.
To derive the moment equations from (1) and elaborate on the corresponding moment-closure problem, let M denote a finite-dimensional subspace of D-variate polynomials and let represent a corresponding basis. Denoting the column M -vector of these basis elements by m, it holds that the moments { m i f } M i=1 of the one-particle marginal satisfy: It is to be noted that we implicitly assume in (14) that f resides in almost everywhere in the considered time interval (0, T ) and the spatial domain Ω. This assumption has been confirmed in specific settings of (1) but not for the general case; see [33,Sec. 4] and the references therein for further details. The moment-closure problem pertains to the fact that (14) provides only M relations between (2+D)M independent variables, viz., the densities m i f , the flux components v i m i f and the production terms m i C(f ) . Therefore, (1 + D)M auxiliary relations must be specified to close the system. Generally, moment systems are closed by expressing the fluxes and production terms as a function of the densities. Moment systems are generally closed by constructing an approximation to the distribution function from the densities and then evaluating the fluxes and production terms for the approximate distribution. Denoting by A ⊆ R M a suitable class of moments, a function F : A → F must be specified such that F realizes the moments in A, i.e. mF(ρ) = ρ for all ρ ∈ A, and F( mf ) constitutes a suitable (in a sense to be made more precise below) approximation to the solution f of the Boltzmann equation (1). Approximating the moments in (14) by ρ ≈ mf and replacing f in (14) by the approximation F(ρ), one obtains the following closed system for the approximate moments: The closed moment system (16) is essentially defined by the polynomial subspace, M, and the closure relation, F. A subspace/closure-relation pair (M, F) is suitable if the corresponding moment system (16) is well posed and retains the fundamental structural properties of the Boltzmann equation (1) as described in section 2, viz., conservation of mass, momentum and energy, Galilean invariance and dissipation of an entropy functional. Auxiliary conditions may be taken into consideration, e.g. that the fluxes and production terms can be efficiently evaluated by numerical quadrature. It is noteworthy that moment systems can alternatively be conceived of as Galerkin subspaceapproximations of the Boltzmann equation in renormalized form. This Galerkin-approximation interpretation can for instance prove useful in constructing error estimates for (16) and in deriving structural properties. In addition, the Galerkin-approximation interpretation conveys that smooth functionals of approximate distributions obtained from moment systems, such as velocity moments, generally display superconvergence under hierarchical-rank refinement, in accordance with the Babuška-Miller theorem; see [2] and also Section 5. Consider the subspace M and let β : M → F denote a renormalization map. Denoting by V ((0, T ) × Ω; M) a suitable class of functions from (0, T ) × Ω into M, the moment system (16) can be recast into the Galerkin form: To elucidate the relation between (16) and (17), we associate to β : is implicitly restricted to moments ρ ∈ R M that can be realized by some g ∈ M. The equivalence between the Galerkin formulation (17) and the moment system (16) now follows immediately by noting that constitutes a basis of M and inserting g ρ for g in (17). In the remainder of this section we review the celebrated moment closures of Levermore [33] and Grad [19] to provide a basis for the subsequent divergence-based moment closures in section 4.

Levermore's Entropy-Based Moment Closure
The moment-closure relation of Levermore [33] is essentially characterized by the renormalization map β(·) = exp(·). For this closure relation, a subspace M is considered to be admissible if it satisfies: The first condition insists that M contains the collision invariants, which ensures that the moment system imposes conservation of mass, momentum and energy. These conservation laws must be obeyed if any fluid-dynamical approximation is to be recovered. The second condition dictates that for all m ∈ M, all u ∈ R D and all orthogonal tensors O it holds that m(u − (·)) ∈ M and m(O * (·)) ∈ M. This condition ensures that the moment system exhibits Galilean invariance. As argued by Junk [28], rotation and translation invariant finite dimensional spaces are necessarily composed of multivariate polynomials. The third condition requires that M contains functions m such that β(m(·)) is Lebesgue integrable on R D . For β(·) = exp(·) and M composed of multivariate polynomials, this condition implies that the highest-order terms in any variable in M must be of even order. The subset M c then corresponds to a convex cone, consisting of all polynomials in M for which the highest-order terms in any variable are of even order and have a negative coefficient. One can infer that exp(·) maps M c to distributions with bounded moments and fluxes, i.e. g ∈ M c implies | mβ(g) | < ∞ and |vmβ(g) | < ∞ for all m ∈ M.
In [33] the moment-closure relation associated with β(·) = exp(·) is derived by minimization of the entropy with density η L (f ) := f log f − f , subject to the moment constraint. Specifically, considering any admissible subspace M, Levermore formally defines the closure relation ρ → F L (ρ) according to: To elucidate the fundamental properties of the closure relation (18), we consider an admissible subspace M and we denote by D the collection of all f ∈ F that yield moments ρ = mf for which the minimizer in (18) exists. The operator F L ( m(·) ) : D → I is idempotent and its image I ⊂ D admits a finite-dimensional characterization. In particular, it holds that log I coincides with the convex cone M c . The idempotence of the operator F L ( m(·) ) and its injectivity in D imply that (18) corresponds to a projection. This projection is generally referred to as the entropic projection [22]. A second characterization of (18) follows from the following sequence of identities, which holds for any f ∈ F such that mf = ρ and all Maxwellian distributions, M = exp(ψ) with ψ ∈ E: for some α ∈ R M . Noting that α · ρ is independent of f , one can infer from (10) and (19) that F L according to (18) is the distribution in F that is closest to equilibrium in the Kullback-Leibler divergence, subject to the condition that its moments m(·) coincide with ρ. Similarly, it can be shown that F L according to (18) minimizes f log f subject to mf = ρ. Therefore, the information interpretation of the entropy f log f (see Section 2) enables a third characterization of (18), viz. as the least-biased distribution given the information m(·) = ρ on the moments. The exponential form of the renormalization map associated with (18) can be derived straightforwardly by means of the Lagrange multiplier method. Provided it exists, the minimizer of the constrained minimization problem (18) corresponds to a stationary point of the . The stationarity condition implies that log f − α · m vanishes, which conveys the exponential form f = exp(α · m). It is to be noted that the Lagrange multipliers have to comply with an admissibility condition related to integrability. In particular, α · m must belong to the convex cone M c .
In [33] it is shown that the moment system (16) with closure F L corresponds to a quasi-linear symmetric hyperbolic system for the Lagrange multipliers. Application of the chain rule to (16) with F L (ρ) = exp(α · m) (with, implicitly, ρ = m exp(α · m) ) yields: The symmetry of A i (i = 0, 1, . . . , D) and the positive definiteness of A 0 are evident. By virtue of its quasi-linear symmetric hyperbolicity, the system (20) is at least linearly well posed [33]. Moreover, under auxiliary conditions on the initial data, local-in-time existence of solutions can be established; see, for instance, [37]. Levermore's moment systems retain the fundamental structural properties of the Boltzmann equation. The conservation properties and Galilean invariance are direct consequences of conditions 1. and 2. on the admissible subspaces, respectively. Dissipation of the entropy η L (·) can be inferred from the Galerkin formulation (17), by noting that for β(·) = exp(·) it holds that log β(·) : M → M. Hence, if g complies with (17) and β(·) = exp(·) then the following identity holds on account of Galerkin orthogonality: The left-hand side of this identity coincides with ∂ t η L (β(g)) + ∂ xi v i η L (β(g)) , while the right-hand side equals C(β(g)) η ′ L (β(g)) . For g according to (17), the distribution β(g) = exp(g) thus obeys the entropy dissipation relation (7) with entropy density η L .
Levermore's consideration of entropy-based moment-closure systems in [33], as well as the above exposition, implicitly rely on existence of a solution to the moment-constrained entropy minimization problem (18). It was however shown by Junk in the series of papers [26,27,28] that for superquadratic M the closure relation (18) is impaired by non-realizability, i.e. a minimizer of (18) may be non-existent. Moreover, the class of local equilibrium distributions generally lies on the boundary of the set of degenerate densities. In [26], Junk also establishes that the flux v i mβ(g) can become unbounded in the vicinity of equilibrium, thus compromising well-posedness of (20). The singularity of the fluxes moreover represents a severe complication for numerical approximation methods; see also [39].
The realizability problem of Levermore's entropy-based moment closure has been extensively investigated; see, in particular, [26,27,28,23,49,43]. In [28,23,43] it has been shown that the set of degenerate densities is empty if and only if the set {α ∈ R M : m exp(α · m) ∈ L 1 (R D , R M )} of Lagrange multipliers associated with integrable distributions is open. This result implies that degenerate densities are unavoidable for super-quadratic polynomial spaces, because the Lagrange multipliers associated with equilibrium are then located on the boundary of the above set; see also [23]. To bypass the realizability problem, Schneider [49] and Pavan [43] considered the following relaxation of the constrained entropy-minimization problem: where the binary relation ≤ * connotes that the highest order moments of the left member are bounded by the corresponding moments of the right member. The relaxation of the highest-ordermoment constraints serves to accommodate that minimizing sequences {f n } ⊂ F subject to the constraint mf n = ρ converge (in the topology of absolutely integrable functions) to an exponential density with inferior highest-order moments; see [26,27,49,23,43]. The analyses in [49,43] convey that the relaxed minimization problem indeed admits a unique solution, corresponding to an exponential distribution. The exponential closure can therefore be retained if the closure relation is defined by (22) instead of (18). It is to be noted however that the closure relation (22) does not generally provide a bijection between the Lagrange multipliers and the moments. Moreover, the aforementioned singularity of fluxes near equilibrium is also inherent to (22). Another formidable obstruction to the implementation of numerical approximations of Levermore's moment-closure systems are the exponential integrals that appear in (16). The evaluation of moments of exponentials of super-quadratic polynomials is generally accepted to be intractable, and accurate approximation of such moments is algorithmically complicated and computationally intensive; see, in particular, [32,Sec. 12.2] and [26,Sec. 6].

Grad's Hermite-Based Moment Closure
In his seminal paper [19], Grad proposed a moment-closure relation based on a factorization of the one-particle marginal in a Maxwellian distribution and a term expanded in Hermite polynomials; see also [20,Sec. V]. The expansion considered by Grad writes: where c denotes peculiar velocity, i k = (i 1 , i 2 , . . . , i k ) is a multi-index with sub-indices i (·) ∈ {1, 2, . . . , D}, a i k are the polynomial expansion coefficients and H (k) i k are D-variate Hermite polynomials of degree k: The Maxwellian in (23) can either correspond to a prescribed local or global Maxwellian, or it can form part of the approximation; see [19,20]. In the latter case, the coefficients associated with invariant moments are fixed and it holds that a (0) = 1, a  (24) is invariant under permutations of its indices. In [19], uniqueness of the coefficients in (23) is restored by imposing auxiliary symmetry conditions on the coefficients.
Grad's moment systems can be conveniently conceived of as Galerkin approximations of the Boltzmann equation in renormalized form in accordance with (17). For a prescribed Maxwellian, the renormalization map simply corresponds to β : g → Mg. Incorporation of the Maxwellian in (23) in the approximation can be represented by the renormalization map: where Π E : M → E denotes the orthogonal projection onto the space of collision invariants and Id represents the identity operator. The embedding E ⊆ M implies that Π E M = E and (Id − Π E )M = M\E. Hence, the projection in (25) provides a separation of M into E and its orthogonal complement. It is notable that the renormalization map in Grad's moment system can be conceived of as a linearization of Levermore's exponential closure relation in the vicinity of M. In particular, setting ψ = log M ∈ E, the following identities hold pointwise: as (g − ψ) → 0. To derive the renormalization map β : g → Mg for prescribed Maxwellians, it suffices to note that 1 + g − ψ ∈ M. To infer the renormalization map (25) if M is retained in the approximation, we note that setting ψ = Π E g and omitting the remainder in (26) yields (25). For a prescribed (global or local) Maxwellian M, Grad's moment systems dissipate the entropy η χ 2 (f ) := 1 2 M(f /M − 1) 2 , provided that η χ 2 represents an entropy density for the collision operator under consideration. It can for example be shown that η χ 2 is generally a suitable entropy density for collision operators linearized about M (see [21]) and for BGK collision operators. Dissipation of the entropy η χ 2 can be directly inferred from the Galerkin formulation (17), by noting that for β(g) = Mg it holds that: Hence, η ′ χ 2 resides in the test space M in (17) and dissipation of η χ 2 follows from Galerkin orthogonality. The entropy η χ 2 (f ) can be associated with the ϕ χ 2 -divergence of f relative to M with ϕ χ 2 (s) = 1 2 (s − 1) 2 ; cf. (11). Grad's moment-closure relation can in fact be obtained by minimization of the ϕ χ 2 -divergence subject to the moment constraints: The minimization problem (28) is not impaired by the realizability problem inherent to (18), because the moment functionals m(·) are continuous in the topology corresponding to η χ 2 .
If the Maxwellian is retained in the approximation, then an entropy for the corresponding moment systems can be non-existent or its derivation is intractable. However, for any entropy density η for the collision operator, the following identity holds by virtue of the Galerkin-orthogonality property of β := β(g) in (17): for arbitrary m ∈ M. Equation (29) implies that solutions to Grad's moment systems dissipate any entropy η for the collision operator up to inf m∈M η ′ (β(g)) − m , in some suitable norm · . For example, introducing the condensed notation g 0 = Π E g, g 1 = (Id − Π E )g and the convex functional η : E × M \ E → R according to η(g 0 , g 1 ) = (g 0 − 1)e g0 (1 + g 1 ) + e g0 g 1 (1 + g 1 ) the renormalization map in (25) corresponds to β(g) = e g0 (1 + g 1 ) and it holds that dη(g 0 , g 1 ) = (g 0 + g 1 )e g0 (1 + g 1 ) dg 0 + (g 0 + 2g 1 )e g0 dg 1 = (g 0 + g 1 )(∂ g0 β dg 0 + ∂ g1 β dg 1 ) + g 1 e g0 dg 1 = (g 0 + g 1 ) dβ + g 1 e g0 dg 1 Considering that g 0 + g 1 ∈ M, it follows from (29) that if η in (30) is an entropy density for the collision operator, then Grad's moment systems with β(·) according to (25) dissipates η up to O(g 1 ) as g 1 vanishes (in some appropriate norm). Note that g 1 vanishes at equilibrium. Grad's moment-closure relation exhibits several fundamental deficiencies that may cause breakdown of the physical and mathematical structure of the corresponding moment-closure system for large deviations from equilibrium. First, the expansion (23) admits inadmissible, locally negative distributions. Second, the moment systems are generally non-symmetric and hyperbolicity is not guaranteed. It has been observed in [10,54] that Grad's moment-closure systems can indeed exhibit complex characteristics and loss of hyperbolicity.

Divergence-Based Moment Closures
In this section we present a novel moment-closure relation based on an approximation of the exponential function. The considered approximation is derived from truncations of the standard limit definition of the exponential exp(·) · · = lim n→∞ (1 + (·)/n) n ≈ (1 + (·)/N ) N . It is noteworthy that unlike the exponential function, in the limit as v → −∞ the truncated exponential as well as its derivative do not vanish. The former condition is needed to preserve the decay properties of the exponential function while the latter condition is needed to preserve the same absolute maximum and minimum as the exponential. Moreover, as opposed to the exponential function, the truncated exponential can be negative if N is odd. Several approximations of the exponential function that preserve the aforementioned properties of the exponential have been proposed in the literature; see, for example, [55,42,29] and references therein. These so-called deformed exponentials can generally serve to construct moment-closure renormalization maps, with properties depending on the particular form of the deformed exponential and the construction. In [55], Tsallis proposed the q-exponential: with q = 1 and (·) + = 1 2 (·)+ 1 2 |·| the non-negative part of a function extended by 0. The q-exponential in (32) is related to the non-negative part of the truncated limit definition of the exponential by 1 − q = 1/N . We will consider renormalization maps of the form with M a prescribed local or global Maxwell distribution. The renormalization map β N can be construed as an approximation to the exponential renormalization map about the Maxwellian distribution M. We will establish that the moment-closure distribution (33) can be derived as the minimizer of a modified entropy that approximates the Kullback-Leibler divergence near M and that belongs to the class of ϕ-divergences. In addition, we will show that the resulting moment system overcomes the aforementioned deficiencies of Grad's and Levermore's moment systems, while retaining the fundamental properties of the Boltzmann equation presented in Section 2.
The renormalization map (33) engenders the following moment-closure relation: where the moment densities ρ and the coefficients α are related by ρ = m M exp q (α · m) . Given a polynomial subspace M ⊇ E with a Galilean-group property (admissibility conditions 1 and 2 in section 3.1), the moment system corresponding to (33) conforms to (16) with, in particular, the moment-closure relation F N according to (34).
To elucidate some of the characteristics of the renormalization map (33), we regard it in comparison with the renormalization maps associated with Levermore's exponential moment-closure relation and Grad's moment-closure relation with a prescribed Maxwellian prefactor. The renormalization map associated with Levermore's moment-closure relation is given by g → exp(g); see Section 3.1. By virtue of the vector-space structure of M ⊇ E, for an arbitrary Maxwellian distribution M it holds that log M + M = M. Hence, for g ∈ M, the renormalization map g → exp(g) can be equivalently expressed as g → exp(log M + g). In the limit N → ∞, we obtain for (33): Equation (35) implies that in the limit N → ∞, the renormalization map in (33) coincides with the exponential renormalization map associated with Levermore's moment-closure relation. For finite N , the moments mβ N (g) and fluxes mvβ N (g) with m, g ∈ M correspond to piecewise-polynomial moments of the Gaussian distribution M. The evaluation of such moments is tractable, as opposed to the evaluation of moments and fluxes for the exponential renormalization map. In addition, for superquadratic approximations M ⊃ E, the exponential renormalization map associated with Levermore's closure can lead to singular moments and fluxes in the vicinity of equilibrium, i.e. as g approaches E. The fundamental underlying problem is the realizability problem; see Section 3.1 and [26]. Accordingly, one can form sequences {g n } such that exp(g n ) → E (in the L 1 topology) while there exist m ∈ M such that | m exp(g n ) | → ∞ or | mv exp(g n ) | → ∞. One can infer that due to the exponential decay of the prefactor M and the polynomial form of the renormalization map in (33), moments and fluxes corresponding to (33) are non-singular near equilibrium. To compare (33) to the renormalization map corresponding to Grad's moment-closure relation with a prescribed Maxwellian prefactor, g → Mg (see Section 3.2), we note that by virtue of the vector-space structure of M ⊇ E, it holds that 1 + M = M. Hence, for g ∈ M, the renormalization map g → Mg can be equivalently expressed as β G : g → M(1 + g). Comparison of β N and β G imparts that β 1 = (β G ) + , i.e. for N = 1 the renormalization map β N in (33) coincides with the non-negative part of the renormalization in Grad's closure, extended by zero. Therefore, the renormalization map (33) avoids the potential negativity of the approximate distribution inherent to Grad's closure and the corresponding loss of hyperbolicity of the moment system. The moment system corresponding to (33) retains conservation of mass, momentum and energy as well as Galilean invariance. The conservation properties can be directly deduced from the Galerkin form (17) of the moment system, by noting that E is contained in the test space M, in accordance with admissibility condition 1 in section 3.1). Galilean invariance is an immediate consequence of admissibility condition 2. However, contrary to Levermore's moment system, the moment system with renormalization map (33) does not generally dissipate the relative entropy f log (f /M) , because the inverse of β N (·) does not correspond to log(·) and, therefore, log β N (g) does not generally belong to the test space M for g ∈ M; cf. Section 3.1. The moment system closed by (33) does however dissipate a modified entropy. To determine a suitable entropy function for the moment system with renormalization map (33), we observe that: provides an inverse of β N according to (33) with domain R ≥0 . The function log yields an approximation to the natural logarithm, corresponding to the inverse of the q-exponential in (32). The approximate logarithm is eligible as the derivative of an entropy density associated with the moment system with renormalization (33). In particular, defining the entropy density as it holds that η ′ N = β −1 N and, hence, η ′ N (β N (·)) : M → M. The constant in (37) has been selected such that η N (M) vanishes. The entropy corresponding to (37) can be cast in the form of a relative entropy associated with a ϕ-divergence, in accordance with (11). To this end, we introduce and note that η N (f ) = Mϕ N (f /M). Convexity of the function ϕ N and of the corresponding entropy density η L follows by direct computation: Therefore, ϕ ′′ N is strictly positive on R >0 . Moreover, it holds that ϕ N (1) = 0. In conclusion, if η N (·) = Mϕ((·)/M) is an entropy density for the collision operator C according to (5), then the approximate distribution β N (g) of the moment system (17) with renormalization map (33) complies with the local entropy-dissipation relation: We recall that the premise on the collision operator is for example satisfied by the BGK and extended BGK operators; see also Remark 1.
The moment-closure relation (34) can be derived by minimization of the ϕ N -divergence subject to the moment constraint; cf. the definition of Levermore's closure relation according to (18). Consider the constrained minimization problem: Formally, the solution to (41) can be obtained by the method of Lagrange multipliers. The minimizer in (41) corresponds to a stationary point of the Lagrangian (f, α) → η N (f ) + α · (ρ − mf ). The stationarity condition implies that η ′ N (f )−α·m = 0 and, on account of (36), that β −1 N (f ) = α·m. It follows directly that the minimizer in (41) is of the form F N (ρ) = β N (α · m) in conformity with (34). Contrary to the entropy minimization problem (18) underlying Levermore's closure relation, the minimization problem (41) is well posed. Existence of a solution to the minimization problem (41) can be deduced from results for generalized projections for non-negative functions by Csiszár in [14]. In [14] it is shown that the minimization problem over a constrained set of non-negative functions, for certain countable functions {a j } j∈IJ , possesses a minimizer belonging to X provided that the following (sufficient) conditions hold: 1) X is a convex set of non-negative functions and the infimum in (42) is finite; 2) lim s→∞ ϕ ′ (s) = ∞; is finite for all ξ > 0 and j ∈ IJ.
The function ϕ ⋆ in condition 3 corresponds to the convex conjugate of ϕ. Comparison conveys that (41) conforms to (42)-(43) with f 2 = M, ν(·) Lebesgue measure and {a j } j∈IJ a monomial basis of M. Convexity of the constrained distributions follows from the linearity of the moment constraints. Finiteness of the infimum is ensured by the fact that the infimum over the constrained set is bounded from below by the infimum over the unconstrained set, and the latter attains its minimum of 0 for f = M. The minimization problem (41) thus complies with condition 1. Compliance with condition 2 follows from ϕ ′ N (s) = N s 1/N − N and lim s→∞ s 1/N = ∞. To verify condition 3, we note that the convex conjugate of ϕ N is: Condition 2 therefore translates into the requirement that is bounded. By virtue of the exponential decay of the prefactor M and the fact that |m j | N +1 increases only polynomially, the expressions in (45) are indeed finite for any ξ > 0. The minimization problem (41) therefore also satisfies condition 3. It is notable that the minimization problem (18) pertaining to Levermore's moment closure satisfies conditions 1 and 2 but violates condition 3.
To establish that the closure relation (34) leads to a symmetric-hyperbolic system, we insert (34) into the generic form (16) of moment systems, and note that application of the chain rule and product rule leads to a system of the form (20) with: The symmetry of A 0 , . . . , A D is evident. Positive-definiteness of A 0 follows from: The inequality in (47) reduces to an equality if and only if γ = 0 or α · m = −N . The latter case is pathological, because α · m = −N implies that F N (ρ) = 0. It is noteworthy that the second constituent of the production term s(α), i.e. the term representing the contribution of ∂ t M + v i ∂ xi M to the production, may cause blow up of solutions to the hyperbolic system (16) with (46) in the limit t → ∞. Hence, the hyperbolic character of (16) with (46) ensures stability of solutions only in finite time. If M corresponds to a global Maxwellian, then ∂ t M+v i ∂ xi M vanishes and the stability provided by hyperbolicity also holds in the ad-infinitum limit.

Numerical results for the 1D spatially homogeneous Boltzmann-BGK equation
To illustrate the properties of the moment-system approximation (17) with the divergence-based closure relation encoded by the renormalization map (33), this section presents numerical computations for the spatially homogeneous Boltzmann-BGK equation in 1D: for some given initial distribution f 0 . The corresponding moment system writes: with F N according to (34) and (F N ) 0 defined by the minimization problem (41) subject to the moments corresponding to the initial distribution: The systems in (48) and (49) represent initial-value problems for the ordinary differential equations (48a) and (49a). The solutions of the initial-value problems (48) and (49) are, respectively, For the considered spatially-homogeneous case, the collision-invariance properties of the collision operator imply that E f = E f0 and, similarly, E F N = E (F N )0 . Furthermore, the constraints in the minimization problem in (50) impose E f0 = E (F N )0 . Based on the expressions for the solutions in (51), it then follows that: in any suitable norm. Equation (52) conveys that the accuracy of the approximation of f (t, ·) by F N (t, ·) at any time t > 0 depends exclusively on the accuracy of the approximation of the initial condition f 0 by (F N ) 0 according to (50). In the remainder of this section we therefore restrict our considerations to numerical examples that illustrate the approximation properties of ϕ N -divergence minimizers and to properties of the projection problem (50). We consider approximations of the distributions: by means of moment-constrained ϕ N -divergence minimizers in polynomial spaces of increasing order. The distributions in (53a)-(53c), shown in Figure 1, correspond to distributions of increasing complexity, viz., a symmetric bi-modal distribution, a non-symmetric bi-modal distribution and a non-symmetric tri-modal distribution, respectively. For the pre-factor M in the renormalization map (33) and, accordingly, in the relative entropy η N (f ) = Mϕ N (f /M) associated with the ϕ N -divergence in (38), we select the global equilibrium distribution E f0 . In particular, denoting by M k = span{1, v, . . . , v k−1 } the space of polynomials of degree k −1, the minimization problem in (50) with k moment (k ≥ 3) constraints engenders the nonlinear-projection problem: Expanding  (53) and the corresponding approximations with k = 3 (dotted ), k = 6 (dash-dot) and k = 15 (dashed ) moments obtained from the moment-constrained ϕ-divergence minimization problem (50).
contributions to the moments of M(1 + α · m/N ) N + . The integrals are evaluated by applying a suitable transformation of the integration variable and invoking the following rule: where −∞ ≤ v 0 ≤ ∞ and −∞ ≤ v 1 ≤ ∞ and Γ (·) and Γ (·, ·) are the complete and incomplete gamma functions respectively. The coefficients α are extracted from the system (54) by means of the Newton method. It is to be noted that (1 + g/N ) N + is Fréchet differentiable with respect to g by virtue of the fact that, evidently, changes in the sign of 1 + g/N occur only at roots. A consistent Jacobian for the tangent problems in the Newton method is provided by: The Jacobian matrix J (α) in the right member of (56) can be identified as a symmetric-positive definite matrix and, hence, the tangent problems in the Newton method are well posed. The Jacobian is however of Hankel-type and it becomes increasingly ill-conditioned as the number of moments increases; see for example [17,53]. Consequently, the convergence behavior of the Newton process deteriorates for higher-moment systems. To illustrate the dependence of the convergence behavior of the Newton process on the number of moments, Figure 2 (left) plots the ratio of the 2-norm of the update in the Newton process, δα 2 , over the 2-norm of the solution vector, α (n+1) 2 , versus the number of iterations for polynomial orders k = 7, 9, 11, 13 for the three test distributions in (53). The ratio δα 2 / α (n+1) 2 can be conceived of as the relative magnitude of the update vector. Figure 2 (right) plots an approximation of the corresponding infinity-norm condition numbers, κ ∞ (α) = J (α) ∞ J −1 (α) ∞ , of the Jacobian matrices. The results in Figure 2 convey that the condition number increases significantly as the number of moments increases. For k = 7 the condition number is approximately 10 3 , while for k = 13 the condition number exceeds 10 5 and can even reach 10 10 . For high-order approximations, the convergence behavior of the Newton process is generally slow and non-monotonous. However, in all cases the relative update can be reduced to a tolerance of 10 −4 .
To illustrate the approximation properties of the moment method with closure relation (34), Figure 3 (left) plots the L 1 (R)-norm of the relative error in the approximation F i 2 to the distribution f i (i = 1, 2, 3) according to (53) and Figure 3 (right) the corresponding relative error in the cosine moment, respectively. The cosine moment serves to investigate the super-convergence properties of the approximation in accordance with the Babuška-Miller theorem [2]; see Section 3. A non-polynomial k=9 k=11 k=13 Fig. 2: Convergence of the Newton process for the nonlinear projection problem (54) and conditioning of the corresponding Jacobian matrices: (left) relative magnitude of the Newton update, δα 2 / α (n+1) 2 , versus the number of iterations for k = 7, 9, 11, 13 and for distributions (53a) (top), (53b) (center ) and (53c) (bottom); (right) corresponding ∞-norm condition numbers, κ ∞ (α (n) ), of the Jacobian matrices according to (56). moment has been selected to examine the convergence behavior, because for any polynomial moment σf i with σ ∈ M l the approximation σF i 2 provided by the k-moment approximation F i 2 is exact for all k ≥ l, on account of the constraints in (50). Figure 3 converges exponentially with increasing k, i.e., there exist positive constants C and ζ such that In particular, ζ ≈ 10 −0.13 ≈ 0.74 for the bi-modal distributions f 1 and f 2 and ζ ≈ 10 −0.085 ≈ 0.82 for the tri-modal distribution f 3 . Comparison of the left and right panels in Figure 3 conveys that the approximation of the cosine moment indeed converges at a higher rate than the L 1 (R)-norm of the approximation itself. Figure 3 (right) conveys that the cosine moment converges at a rate of ζ ≈ 10 −0.58 ≈ 0.26 for both the bi-modal distributions f 1 , f 2 and the tri-modal distribution f 3 .

Conclusion
To avoid the realizability problem inherent to the maximum-entropy closure relation for momentsystem approximations of the Boltzmann equation, we proposed a class of new closure relations based on ϕ-divergence minimization. We established that ϕ-divergences provide a natural generalization of the usual relative-entropy setting of the moment-closure problem. It was shown that minimization of certain ϕ-divergences leads to suitable closure relations and that the corresponding moment-constrained ϕ-divergence minimization problems are not impaired by the realizability problem inherent to relative-entropy minimization. Moreover, if the collision operator under consideration dissipates a ϕ-divergence, then the corresponding minimal-divergence moment-closure systems retain the fundamental structural properties of the Boltzmann equation, namely, conservation of mass, momentum and energy, Galilean invariance, and dissipation of an entropy, sc. the ϕ-divergence. For suitable ϕ-divergences, the closure relation yields non-negative approximations of the one-particle marginal. Divergence-based moment systems are generally symmetric hyperbolic, which implies linear well-posedness. We inferred that moment systems can alternatively be conceived of as Galerkin approximations of a renormalized Boltzmann equation. We considered moment systems based on a renormalization map composed of Tsallis' q-exponential. This renormalization map is concomitant with a ϕ-divergence corresponding to the anti-derivative of the inverse q-exponential, which yields a natural approximation to relative entropy. The evaluation of moments of q-exponential, elementary in numerical methods for the corresponding moment system, is tractable, as opposed to the evaluation of moments of exponentials of arbitrary-order polynomials, connected with maximum-entropy closure.
Numerical results have been presented for the one-dimensional spatially homogeneous Boltzmann-BGK equation. The nonlinear projection problem associated with the moment-constrained ϕ-divergence minimization problems was solved by means of Newton's method. We observed that the condition number of the Jacobian matrices in the tangent problems generally deteriorates as the number of moments increases. Nevertheless, in all considered cases approximations up to at least 14 moments could be computed. We observed that the q-exponential approximation converges exponentially in the L 1 (R)-norm with increasing number of moments. Moreover, we demonstrated that functionals of the approximate distribution display super convergence, in accordance with the Babuška-Miller theorem for Galerkin approximations.
with η(g) = g log g − g, under the assumption that (58) admits a solution for each k. Based on the sequence of projections {F k } K k=1 , one can define a multiscale relaxation operator: with {θ k } K k=1 an increasing sequence of positive relaxation rates depending on f . The relaxation rate θ k with k ∈ {1, 2, . . . , K − 1} constitutes the rate at which F k+1 decays to F k , while θ K is the rate at which f decays to F K . In [33] it is shown that the Prandtl number can be controlled via the relaxation rates.
The above construction of the generalized BGK operator can be extended to ϕ-divergences. To this end, consider an arbitrary ϕ-divergence and let f → F k (f ) =: F k denote the corresponding divergence-minimization projection according to (58), i.e. F k is defined by (58) with η(·) = Mϕ((·)/M). Based on the projections F k , an extended BGK operator can be defined analogous to (59). To establish that η ′ (·) = ϕ ′ ((·/M) corresponds to an entropy density for the generalized BGK operator, we first note that the (strong) convexity of η implies: for all s, t in the domain of η and equality in (60) holds if and only if s = t. Rearranging the sum in (59) yields: From the minimization problem (58) we infer that for all k it holds that η ′ (F k ) ∈ M k and m(f − F k ) = 0 for all m ∈ M k . Hence, η ′ (F k )f − η ′ (F k )F k = 0 yields a partition of zero for all k. From (61) and the aforementioned partition of zero, we obtain From θ 1 > 0 and θ k+1 > θ k (k = 1, . . . , K −1), and the convexity of η(·) according to (60) we conclude that η and C satisfy the dissipation relation (5), i.e. η ′ (f )C(f ) ≤ 0 for all admissible f . To verify the second prerequisite relation between η and C, viz., the equivalence of the statements in (6), we first observe that the implication (6) (i) ⇒ (6) (ii) is trivial. To validate the reverse implication in (6), we note that (6) (ii) in combination with the convexity of η according to (60) and the ultimate expression in (62) implies that (η ′ (f ) − η ′ (F k ))(f − F k ) vanishes almost everywhere for all k = 1, . . . , K. This, in turn, implies that f = F 1 = · · · = F K . Condition (6) (i) then follows directly from (61). To verify the implication (6) (ii) ⇒ (6) (iii) , we note that F k according to (58) satisfies η ′ (F k ) ∈ M k for all k. Recalling that (6) (ii) implies f = F 1 , we infer η ′ (f ) ∈ M 1 = E in accordance with (6) (iii) . Finally, the reverse implication (6) (iii) ⇒ (6) (ii) follows immediately from (61) and the moment constraints in (58).