Error control for statistical solutions

Statistical solutions have recently been introduced as an alternative solution framework for hyperbolic systems of conservation laws. In this work, we derive a novel a posteriori error estimate in the Wasserstein distance between dissipative statistical solutions and numerical approximations that rely on so-called regularized empirical measures. The error estimator can be split into deterministic parts, which correspond to spatio-temporal approximation errors, and a stochastic part, which reflects the stochastic error. We provide numerical experiments which examine the scaling properties of the residuals and verify their splitting.


Introduction
Analysis and numerics of hyperbolic conservation laws have seen a significant shift of paradigms in the last decade. The investigation and approximation of entropy weak solutions was state of the art for a long time. This has changed due to two reasons. Firstly, analytical insights [5,9] revealed that weak entropic solutions to the Euler equations in several space dimensions are not unique. Secondly, numerical experiments [13,22] have shown that in simulations of e.g. the Kelvin-Helmholtz instability, numerical solutions do not converge under mesh refinement. In contrast, when families of simulations with slightly varying initial data are considered, averaged quantities like mean, variance and also higher moments are observed to converge under mesh refinement. This has led to several weaker (more statistics inspired) notions of solutions being proposed. We would like to mention dissipative measure valued solutions [12] and statistical solutions [16]. Considering measure valued solutions has a long history in hyperbolic conservation laws and can be traced back to the works of DiPerna [11] considering Young measures. Statistical solutions are time-parametrized probability measures on spaces of integrable functions and have been introduced recently for scalar problems in [14] and for systems in [16]. We would like to mention that in the context of the incompressible Navier-Stokes equations and turbulence, statistical solutions have a long history going back to the seminal work of Foias et al., see [18] and references therein.
The precise definition of statistical solutions is based on an equivalence theorem [16,Theorem 2.8] that relates probability measures on spaces of integrable functions and correlation measures on the state space. Correlation measures are measures that determine joint probability distributions of some unknown quantity at any finite collection of spatial points. In this sense, statistical solutions contain more information than e.g. dissipative measure valued solutions and, indeed, for scalar problems uniqueness of entropy dissipative statistical solutions can be proven. This proof is similar to the classical proof of uniqueness of weak solutions for scalar problems. In contrast, for systems in multiple space dimensions the non-uniqueness of entropy weak solutions immediately implies non-uniqueness of dissipative measure valued solutions and statistical solutions. Still, all these concepts satisfy weak-strong uniqueness principles, i.e., as long as a Lipschitz continuous solution exists in any of these classes it is the unique solution in any of these classes. The (technical) basis for obtaining weak-strong uniqueness results is based on the relative entropy framework of Dafermos and DiPerna [8], which can be extended to dissipative statistical solutions as in [16].
Convergence of numerical schemes for nonlinear systems of hyperbolic conservation laws is largely open (except when the solution is smooth). An exception is the one-dimensional situation, where convergence of the Glimm scheme is well known [21] and is, indeed, used for constructing the standard Riemann semigroup [4]. For multi-dimensional problems, there is recent progress showing convergence of numerical schemes towards dissipative measure valued solutions [12], with more information on the convergence in case the limit is an entropy weak solution. It seems impossible to say anything about convergence rates in this setting due to the multiplicity of entropy weak solutions.
The work at hand tries to complement the a priori analysis from [16] with a reliable a posteriori error estimator, i.e., we propose a computable upper bound for the numerical approximation error of statistical solutions. This extends results for entropy weak solutions of deterministic and random systems of hyperbolic conservation laws [10,20] towards statistical solutions. One appealing feature of our error estimator is that it can be decomposed into different parts corresponding to space-time and stochastic errors. Our analysis relies on the relative entropy framework of Dafermos and DiPerna [8] extended to statistical solutions. We would like to mention that there are also other frameworks for providing a posteriori error analysis for hyperbolic systems, namely [1], [23] and [26] for one-dimensional systems.
The structure of this work is as follows: Section 2 reviews the notion of (dissipative) statistical solutions of hyperbolic conservation laws from [16]. Section 3 is concerned with the numerical approximation of dissipative statistical solutions using empirical measures. Moreover, we recall a reconstruction procedure which allows us to define the so-called regularized empirical measure. In Section 4, we present our main a posteriori error estimate between a dissipative statistical solution and its numerical approximation using the regularized empirical measure. Section 5 discusses why defining statistical solutions to general systems like the Euler equations is not straightforward. The technical difficulties in defining statistical solutions for general systems vanish when attention is restricted to solutions that take values in some compact subset of the state space. We explain in Section 5 that our a posteriori error analysis directly extends to such solutions. Finally, Section 6 provides some numerical experiments examining and verifying the convergence and splitting of the error estimators.

Preliminaries and notations
We consider the following one-dimensional system of m ∈ N nonlinear conservation laws:

∂_t u + ∂_x f(u) = 0   in (0, T) × D.    (1)

Here, u(t, x) ∈ U ⊂ R^m is the vector of conserved quantities, U is an open and convex set that is called the state space, f : U → R^m is the flux function, D ⊂ R is the spatial domain and T ∈ R_+. We restrict ourselves to the case where D = (0, 1) with periodic boundary conditions. The system (1) is called hyperbolic if for any u ∈ U the flux Jacobian D f(u) has m real eigenvalues and admits a basis of eigenvectors. We assume that (1) is equipped with an entropy/entropy flux pair (η, q), where the strictly convex entropy η ∈ C^2(U; R) and the entropy flux q ∈ C^1(U; R) satisfy

D q(u) = D η(u) D f(u)   for all u ∈ U.

Most literature on numerical schemes for hyperbolic conservation laws focuses on computing numerical approximations of entropy weak solutions of (1). In contrast, we are interested in computing statistical solutions. We recall the definition of statistical solutions in subsection 2.1. It is worthwhile to note that statistical solutions were only defined for systems for which U = R^m in [16]. We restrict our analysis to this setting in Sections 2.1 and 4. In Section 5, we discuss why we believe defining statistical solutions to general systems is not straightforward.

Statistical solutions
In this section, we provide a brief overview of the notions needed to define statistical solutions for (1), following the exposition in [16] and referring to [16, Section 2] for more background, details and proofs. Let us introduce some notation: for any topological space X, let B(X) denote the Borel σ-algebra on X and let M(X) denote the set of signed Radon measures on (X, B(X)). In addition, P(X) denotes the set of probability measures on (X, B(X)), i.e., all non-negative µ ∈ M(X) satisfying µ(X) = 1. We consider U = R^m and choose p ∈ [1, ∞) minimal such that

|f(u)|, |η(u)|, |q(u)| ≤ C(1 + |u|^p)   for all u ∈ U

holds for some constant C > 0. The following classical theorem states the duality between L^1(D^k; C_0(U^k)) and L^∞(D^k; M(U^k)).
i.e., the space of bounded, weak*-measurable maps from D^k to M(U^k) under the duality pairing.

Definition (Correlation measures [14, Def 2.5]). A p-summable correlation measure is a family ν = (ν^1, ν^2, . . .) with ν^k ∈ H^{k*}_0(D; U) satisfying for all k ∈ N the following properties: (a) ν^k is a Young measure from D^k to U^k.

For every correlation measure ν ∈ L^p(D; U) there exists a unique probability measure µ ∈ P(L^p(D; U)) whose p-th moment is finite, i.e., ∫_{L^p} ||u||^p_{L^p(D;U)} dµ(u) < ∞, and such that µ is dual to ν. Conversely, for every µ ∈ P(L^p(D; U)) with finite p-th moment there is a ν ∈ L^p(D; U) that is dual to µ.
To take into account the time-dependence in (1), the authors of [16] suggest considering time-parametrized probability measures. For T ∈ (0, ∞] consider time-parametrized measures µ : [0, T) → P(L^p(D; U)). Note that such a µ does not contain any information about correlations between function values at different times. We denote µ evaluated at time t by µ_t and notice that it was shown in [16] that it is meaningful to evaluate an element ν^k ∈ H^{k*}_0([0, T), D; U) at almost every t ∈ [0, T).
We denote the space of all time-dependent p-summable correlation measures by L^p([0, T), D; U). Conversely, for every µ : [0, T) → P(L^p(D; U)) satisfying (a) and (b) there is a unique correlation measure ν ∈ L^p([0, T), D; U) such that (c) holds.

Definition 2.6 (Bounded support). We say that μ̄ ∈ P(L^p(D; U)) has bounded support provided there exists C > 0 such that μ̄ is supported on {u ∈ L^p(D; U) : ||u||_{L^p(D;U)} ≤ C}.

Definition 2.7 (Statistical solution). Let μ̄ ∈ P(L^p(D; U)) have bounded support. A statistical solution of (1) with initial data μ̄ is a time-dependent map µ : [0, T) → P(L^p(D; U)) such that each µ_t has bounded support and such that the corresponding correlation measures ν^k_t satisfy the associated evolution equations in the sense of distributions, for every k ∈ N.
In order to show uniqueness (for scalar problems) and weak-strong uniqueness, the authors of [14] and [16] require a comparison principle that compares statistical solutions to convex combinations of Dirac measures. Therefore, the entropy condition for statistical solutions has two parts. The first imposes stability under convex decompositions and the second is reminiscent of the standard entropy condition for deterministic problems. To state the first condition, we need the following notation: for weights α = (α_1, . . . , α_K) with α_i ≥ 0 and Σ_{i=1}^K α_i = 1 we set

Λ(α, µ) := { (µ_1, . . . , µ_K) ∈ P(L^p(D; U))^K : Σ_{i=1}^K α_i µ_i = µ }.    (4)

The elements of Λ(α, µ) are strongly connected to transport plans, which play a major role in defining the Wasserstein distance; see Remark 3.5 for details. Now, we are in a position to state the selection criterion for statistical solutions.

Definition 2.9 (Dissipative statistical solution). A statistical solution of (1) is called a dissipative statistical solution if

1. for every choice of coefficients α_i ≥ 0 satisfying Σ_{i=1}^K α_i = 1 and for every (μ̄_1, . . . , μ̄_K) ∈ Λ(α, μ̄), there exists a function t → (µ_{1,t}, . . . , µ_{K,t}) ∈ Λ(α, µ_t) such that each µ_i : [0, T) → P(L^p(D; U)) is a statistical solution of (1) with initial measure μ̄_i;

2. it satisfies an entropy inequality reminiscent of the standard entropy condition for deterministic problems.

This selection criterion implies weak-(dissipative)-strong uniqueness, i.e., as long as some initial value problem admits a statistical solution supported on finitely many classical solutions, this is the only dissipative statistical solution (on that time interval), see [16, Lemma 3.3].
That result is a major ingredient in the proof of our main result, Theorem 4.2, and it is indeed the special case of Theorem 4.2 for R^st_k ≡ 0. Both results are extensions of the classical relative entropy stability framework going back to the seminal works of Dafermos and DiPerna. The attentive reader will note that on the technical level there are some differences between [16, Lemma 3.3] and Theorem 4.2. This is due to the following consideration: if L^2 stability results are to be inferred from the relative entropy framework, upper and lower bounds on the Hessian of the entropy and an upper bound on the Hessian of the flux f are needed. To this end, Fjordholm et al. restrict their attention to systems for which such bounds exist globally, while we impose conditions (8), (9) and discuss situations in which they are satisfied (including the setting of [16, Lemma 3.3]); see Remark 4.3.

Numerical approximation of statistical solutions
This section is concerned with the description of the numerical approximation of statistical solutions. Following [15, 16], the stochastic discretization relies on a Monte-Carlo resp. collocation approach. Once samples are picked, the problem at each sample is deterministic and we approximate it using the Runge-Kutta Discontinuous Galerkin method, which we briefly outline. Moreover, we introduce a Lipschitz-continuous reconstruction of the numerical solutions, which is needed for our a posteriori error estimate in Theorem 4.2. Let us start with the deterministic space and time discretization.

Space and time discretization: Runge-Kutta Discontinuous Galerkin method
We briefly describe the space and time discretization of (1), using the Runge-Kutta Discontinuous Galerkin (RKDG) method from [6]. Let T := {I_l}_{l=0}^{N_s−1}, I_l := (x_l, x_{l+1}), be a quasi-uniform triangulation of D. We set h_l := (x_{l+1} − x_l), h_max := max_l h_l, h_min := min_l h_l for the spatial mesh, and the (spatial) piecewise polynomial DG spaces for p ∈ N_0 are defined as

V^p_h := { v ∈ L^2(D; R^m) : v|_{I_l} ∈ P_p(I_l; R^m) for all l = 0, . . . , N_s − 1 }.

Here P_p denotes the space of polynomials of degree at most p and L_{V^p_h} denotes the L^2-projection into the DG space V^p_h. After spatial discretization of (1) we obtain a semi-discrete scheme (DG) for the discrete solution u_h. Here, F : R^m × R^m → R^m denotes a consistent, conservative and locally Lipschitz-continuous numerical flux. The initial-value problem (DG) is advanced in time by an R-th order strong stability preserving Runge-Kutta (SSP-RK) method [25, 28]. To this end, we let 0 = t^0 < t^1 < . . . < t^{N_t} = T be a (non-equidistant) temporal decomposition of [0, T]. We define ∆t^n := (t^{n+1} − t^n), ∆t := max_n ∆t^n. To ensure stability, the explicit time-stepping scheme has to obey a CFL-type condition

λ_max ∆t^n ≤ C h_min,

where λ_max is an upper bound for the absolute values of the eigenvalues of the flux Jacobian D f and C ∈ (0, 1]. Furthermore, we let Π_h : R^m → R^m be the TVBM minmod slope limiter from [7]. The complete S-stage time-marching algorithm for a given n-th time-iterate u^n_h := u_h(t^n, ·) ∈ V^p_h can then be written as follows.
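As an illustration of the time-marching loop, the following sketch implements the third-order SSP-RK method of Shu and Osher combined with a CFL-type step-size restriction. It is a minimal stand-in, not the RKDG scheme itself: the spatial operator is a first-order upwind discretization of linear advection (our own choice), and the names `advect` and `solve` are hypothetical.

```python
import numpy as np

def ssp_rk3_step(u, dt, L):
    """One step of the third-order SSP Runge-Kutta scheme of Shu/Osher:
    convex combinations of forward-Euler stages preserve the stability
    properties of the spatial operator L(u) ~ -f(u)_x."""
    u1 = u + dt * L(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2))

def advect(u, dx, a=1.0):
    # first-order upwind discretization of -a u_x (a > 0), periodic in space;
    # stands in for the DG spatial operator, which we do not reproduce here
    return -a * (u - np.roll(u, 1)) / dx

def solve(u0, dx, T, a=1.0, cfl=0.5):
    u, t = u0.copy(), 0.0
    while t < T:
        dt = min(cfl * dx / abs(a), T - t)  # CFL-type condition: a dt <= C h
        u = ssp_rk3_step(u, dt, lambda v: advect(v, dx, a))
        t += dt
    return u
```

Advecting a smooth periodic profile over one full period should approximately reproduce the initial data, up to the (here first-order) spatial discretization error.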

Reconstruction of numerical solution
Our main a posteriori error estimate, Theorem 4.2, is based on the relative entropy method of Dafermos and DiPerna [8] and its extension to statistical solutions as described in the recent work [16]. The a posteriori error estimate requires that the approximating solutions are at least Lipschitz-continuous in space and time. To ensure this property, we reconstruct the numerical solution {u^n_h}_{n=0}^{N_t} to a Lipschitz-continuous function in space and time. We do not elaborate upon this reconstruction procedure, to keep notation short and simple, but refer to [10, 19, 20]. The reconstruction process provides a computable space-time reconstruction u^st, which allows us to define the following residual.

Definition 3.1 (Space-time residual). We call the function R^st ∈ L^2((0, T) × D; R^m), defined by

R^st := ∂_t u^st + ∂_x f(u^st),    (5)

the space-time residual for u^st.
We would like to stress that the mentioned reconstruction procedure not only renders u^st Lipschitz-continuous, but is also specifically designed to ensure that the residual, defined in (5), has the same decay properties as the error of the RKDG numerical scheme as h tends to zero, cf. [10].
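To make the role of the space-time residual concrete, the following sketch evaluates R^st = ∂_t u^st + ∂_x f(u^st) by finite differences for the Burgers flux f(u) = u²/2 and the smooth exact solution u(t, x) = x/(1 + t), both our own illustrative choices rather than the reconstruction of [10]. For an exact solution the residual vanishes up to discretization error, while any perturbation shows up in R^st.

```python
import numpy as np

def residual(u, f, t, x):
    """Approximate R_st = d/dt u + d/dx f(u) on a space-time grid
    (second-order central differences); u has shape (len(t), len(x))."""
    dudt = np.gradient(u, t, axis=0, edge_order=2)
    dfdx = np.gradient(f(u), x, axis=1, edge_order=2)
    return dudt + dfdx

t = np.linspace(0.0, 0.2, 101)
x = np.linspace(0.1, 0.9, 201)
T, X = np.meshgrid(t, x, indexing="ij")

f = lambda u: 0.5 * u**2       # Burgers flux (illustrative choice)
u_exact = X / (1.0 + T)        # smooth exact solution: u_t + u u_x = 0
R_exact = residual(u_exact, f, t, x)
R_perturbed = residual(u_exact + 0.01 * np.sin(np.pi * X), f, t, x)
```

The residual of the exact solution is at the level of the truncation error of the difference quotients, whereas the perturbed field produces a residual of the size of the perturbation.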

Computing the empirical measure
Following [15, 16], we approximate the statistical solution of (1) using empirical measures. In this work, we allow for arbitrary sample points and weights. The sample points can either be obtained by randomly sampling the initial measure (Monte-Carlo) or by using abscissae of corresponding orthogonal polynomials. In this article we focus on Monte-Carlo sampling. Let us assume that the sampling points are indexed by the set K = {1, . . . , K}, K ∈ N, and let us denote the corresponding weights by {w_k}_{k∈K}; for Monte-Carlo sampling we have w_k = 1/K for all k ∈ K. We use the following Monte-Carlo type algorithm from [16] to compute the empirical measure.
Algorithm 2 (Computation of the empirical measure).
1: Given: initial measure μ̄ ∈ P(L^p(D))
2: For some probability space (Ω, σ, P), let ū : Ω → L^p(D; U) be a random field such that the law of ū with respect to P coincides with μ̄
3: Draw K samples from ū and denote the samples by {ū_k}_{k∈K}
4: For every k ∈ K, compute a numerical approximation {(u_{h,k})^n}_{n=0}^{N_t} of (1) with initial data ū_k using Algorithm 1
5: Set µ^{K,h}_T := Σ_{k∈K} w_k δ_{u_{h,k}(T)}

Definition 3.2 (Empirical measure). Let {u_{h,k}(t)}_{k∈K} be a sequence of approximate solutions of (1) with initial data ū_k, at time t ∈ (0, T). We define the empirical measure at time t ∈ (0, T) via

µ^K_t := Σ_{k∈K} w_k δ_{u_{h,k}(t)}.

For the a posteriori error estimate in Theorem 4.2, we need to consider a regularized measure, which we define using the reconstructed numerical solution u^st obtained from the reconstruction procedure described in Section 3.2.
Definition 3.3 (Regularized empirical measure). Let {u^st_k(t)}_{k∈K} be a sequence of reconstructed numerical approximations at time t ∈ (0, T). We define the regularized empirical measure as

μ̄^K_t := Σ_{k∈K} w_k δ_{u^st_k(t)}.

Common metrics for computing distances between probability measures on Banach spaces are the Wasserstein distances. In Theorem 4.2, we bound the error between the dissipative statistical solution of (1) and the discrete regularized measure in the 2-Wasserstein distance.

Definition 3.4 (r-Wasserstein distance). Let r ∈ [1, ∞), let X be a separable Banach space and let µ, ρ ∈ P(X) have finite r-th moment, i.e., ∫_X ||u||^r_X dµ(u) < ∞ and analogously for ρ. The r-Wasserstein distance between µ and ρ is defined as

W_r(µ, ρ) := ( inf_{π ∈ Π(µ,ρ)} ∫_{X×X} ||u − v||^r_X dπ(u, v) )^{1/r},

where Π(µ, ρ) is the set of all transport plans from µ to ρ, i.e., the set of measures π on X^2 with marginals µ and ρ: π(A × X) = µ(A) and π(X × A) = ρ(A) for all measurable subsets A of X.

Remark 3.5. It is important to recall from [14, Lemma 4.2] that if ρ ∈ P(X) is K-atomic, i.e., ρ = Σ_{i=1}^K α_i δ_{u_i} for some u_1, . . . , u_K ∈ X and some α_1, . . . , α_K ≥ 0 with Σ_{i=1}^K α_i = 1, then there is a one-to-one correspondence between transport plans in Π(µ, ρ) and elements of Λ(α, µ), which was defined in (4).

The a posteriori error estimate
In this section, we present the main a posteriori error estimate between dissipative statistical solutions and regularized numerical approximations. A main ingredient in the proof of Theorem 4.2 is the notion of relative entropy. Since the relative entropy framework is essentially an L^2-framework and due to the quadratic bounds in (8), (9), it is natural to consider the case p = 2. Therefore, for the remaining part of this paper we denote F := L^2(D; U).

Definition 4.1 (Relative entropy and relative flux). Let (η, q) be a strictly convex entropy/entropy flux pair of (1). We define the relative entropy η(·|·) : U × U → R,

η(u|v) := η(u) − η(v) − D η(v)(u − v),

and the relative flux f(·|·) : U × U → R^m,

f(u|v) := f(u) − f(v) − D f(v)(u − v).

We can now state the main result of this work, which is an a posteriori error estimate between dissipative statistical solutions and their numerical approximations using the regularized empirical measure.

Theorem 4.2. Let µ_t be a dissipative statistical solution of (1) as in Definition 2.9 and let μ̄^K_t be the regularized empirical measure from Definition 3.3. Assume that there exist constants A, B > 0 (independent of h, M) such that the bounds (8), (9) hold for all t ∈ [0, T), for a.e. x ∈ D, for µ_t-a.e. u ∈ F and for μ̄^K_t-a.e. v ∈ F; here H_v denotes the Hessian matrix w.r.t. v. Then the distance between µ_t and μ̄^K_t satisfies an a posteriori error bound in terms of the space-time residuals R^st_k and the initial error W_2(μ̄, μ̄^K).

Remark 4.3.
2. Let C ⋐ U be a compact and convex set. Let us assume that µ_t and μ̄^K_t are (for all t) supported on solutions having values in C only. Due to the regularity of f, η and the compactness of C, there exist constants 0 < C_{f,C} < ∞ and 0 < c_{η,C} < C_{η,C} < ∞ bounding the Hessians of f and η on C. In this case, A and B in (8), (9) can be chosen in terms of these constants.

Remark 4.4. Let us briefly outline a fundamental difference between Theorem 4.2 and [16, Thm 4.9]. The latter result is an a priori estimate that treats the empirical measure (created via Monte-Carlo sampling) as a random variable and infers convergence in the mean via the law of large numbers.
Theorem 4.2 provides an a posteriori error estimate that can be evaluated once the numerical scheme has produced an approximate solution. This process involves sampling the initial data measure μ̄, and the "quality" of these samples enters the error estimate via the term W_2(μ̄, μ̄^K).
Proof. We recall that the weights {w_k}_{k∈K} satisfy Σ_{k∈K} w_k = 1. We further denote the vector of weights by w := (w_1, . . . , w_K), and let µ* = (µ*_1, . . . , µ*_K) ∈ Λ(w, μ̄) correspond to an optimal transport plan between μ̄ and μ̄^K. Because µ_t is a dissipative statistical solution, there exists (µ*_{1,t}, . . . , µ*_{K,t}) ∈ Λ(w, µ_t) such that (12) holds. Recalling that each function u^st_k is a Lipschitz-continuous solution of the perturbed conservation law (5), considering its weak formulation yields, for every φ_k ∈ C^∞_c([0, T) × D; R^m), an identity which, since the (µ*_{k,t})_{k∈K} are probability measures on F, can be integrated (after changing the order of integration) to obtain (14).

Subtracting (14) from (12) and using the Lipschitz-continuous test function φ_k = D η(u^st_k)φ(t) leads to (15). We compute the partial derivatives of D η(u^st_k(t, x))φ(t) using the product and chain rule, obtaining (16) and (17). Next, we multiply (5) by D η(u^st_k). Upon using the chain rule for Lipschitz-continuous functions and the relationship D q(u^st_k) = D η(u^st_k) D f(u^st_k), this yields (18). Let us consider the weak form of (18) and integrate w.r.t. x, t and dµ*_{k,t} for k ∈ K. Upon changing the order of integration we obtain (20). Hence, subtracting (19) from (20) and using the definition of the relative entropy from Definition 4.1 yields (21). After subtracting (15) from (21) and using (16), (17) we are led to (22). Rearranging (5) yields (23). Plugging (23) into (22) and rearranging, we arrive at (24).

Up to now, the choice of φ(t) was arbitrary. We fix s > 0 and ε > 0 and define φ(t) as the Lipschitz-continuous approximation of the indicator function of [0, s] that decays linearly to zero on [s, s + ε]. According to Theorem 2.5 (a), the mapping (25) is measurable for all k ∈ K. Moreover, due to the quadratic bound on the relative entropy, cf. (8), Lebesgue's differentiation theorem states that a.e. t ∈ (0, T) is a Lebesgue point of (25). Thus, letting ε → 0 we obtain (26).

The left-hand side of (26) is bounded from below using (8). The first term on the right-hand side is estimated using the L^∞(D)-norm of the spatial derivative. We estimate the second term on the right-hand side by Young's inequality. Finally, we apply (8) and then (9) to both terms.
The last term on the right-hand side is estimated using (8); Grönwall's inequality then yields the desired bound. Using L := max_{k∈K} ||∂_x u^st_k||_{L^∞((0,s)×D)} and recalling that (µ*_k)_{k∈K} corresponds to an optimal transport plan and that (µ*_{k,s})_{k∈K} corresponds to an admissible transport plan, we finally obtain the assertion of the theorem.

To obtain an error estimate with separated bounds, i.e., bounds that quantify spatio-temporal and stochastic errors, respectively, we split the 2-Wasserstein error in the initial data into a spatial and a stochastic part. Using the triangle inequality we obtain

W_2(μ̄, μ̄^K) ≤ W_2(μ̄, Σ_{k∈K} w_k δ_{ū_k}) + W_2(Σ_{k∈K} w_k δ_{ū_k}, μ̄^K).    (27)

The first term in (27) is a stochastic error, which is inferred from approximating the initial data by an empirical measure. On the other hand, the second term is essentially a spatial approximation error. This can be seen from the following lemma.
Lemma 4.5. With the same notation as in Theorem 4.2, the following inequality holds:

W_2( Σ_{k∈K} w_k δ_{ū_k}, μ̄^K )^2 ≤ Σ_{k∈K} w_k ||ū_k − ū^st_k||^2_{L^2(D)}.    (28)
Equality in (28) holds provided the spatial discretization errors are smaller than the distances between samples; a sufficient condition is given in (29). Proof. Recalling the definition of μ̄^K = Σ_{k∈K} w_k δ_{ū^st_k} and defining the transport plan π* := Σ_{k∈K} w_k (δ_{ū_k} ⊗ δ_{ū^st_k}) yields the assertion, because the cost of π* is exactly the right-hand side of (28). If (29) holds, π* is an optimal transport plan.
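A quick numerical check of Lemma 4.5 (with a synthetic sample family of our own choosing standing in for ū_k and ū^st_k): for uniform weights, the optimal plan between two K-atomic measures is a permutation by Birkhoff's theorem, so the linear program reduces to an assignment problem, and the diagonal plan of the lemma gives an upper bound on the optimal cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64, endpoint=False)
dx = x[1] - x[0]

K = 20
xi = rng.uniform(0.0, 1.0, K)
u_bar = np.sin(2 * np.pi * (x[None, :] + xi[:, None]))  # samples of the initial data
u_st = u_bar + 1e-3 * rng.standard_normal(u_bar.shape)  # stand-in for reconstructions

# cost matrix C[k, m] = ||u_bar_k - u_st_m||_{L^2(D)}^2 (Riemann sum on the grid)
C = np.sum((u_bar[:, None, :] - u_st[None, :, :]) ** 2, axis=2) * dx

row, col = linear_sum_assignment(C)  # optimal plan for uniform weights w_k = 1/K
w2_sq_opt = C[row, col].mean()       # squared 2-Wasserstein distance
w2_sq_diag = np.trace(C) / K         # cost of the diagonal plan from Lemma 4.5
```

Since the perturbation here is far smaller than the distances between samples, the diagonal cost is of the size of the perturbation and the optimal cost cannot exceed it.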
Remark 4.6. In contrast to random conservation laws as considered in [20], the stochastic part of the error estimator of Theorem 4.2 is solely given by the discretization error of the initial data, i.e., by the first term in (27), which may be amplified due to nonlinear effects. However, there is no stochastic error source during the evolution of the numerical approximation.

Extension to general systems (with U ≠ R^m)
Up to this point, we have stuck with the setting from [16] to stress that a posteriori error estimates can be obtained as long as the relative entropy framework from [16] is applicable. Having said this, it is fairly clear that this setting does not cover certain systems of practical interest, e.g. the Euler equations of fluid dynamics.
This raises the question of how statistical solutions can be defined for more general systems. At first glance, it might seem sufficient to require µ_t to be a measure on some function space F so that u ∈ F implies that f(u) and η(u) are integrable. There is, however, a more subtle (and rather technical) issue: the integrals in (2) need to be well defined and so does

E(u) := ∫_D f(u(x)) dx.    (30)

The question is not only whether this expression is finite, but whether u ↦ E(u) (and the analogous expression with f replaced by η) is µ_t-measurable. If µ_t is defined on the Borel σ-algebra of F, the integral is well-defined provided E is continuous. Whether this is the case depends on the topology on F.

We would like to present a toy example showing that some rather naive choices of topologies on F fail to ensure continuity of E in case U has a boundary and f (or η) is singular when approaching the boundary of U. Let us consider a scalar problem with U = (0, ∞), f(u) = 1/u and D = (0, 1) with periodic boundary conditions. Then, there exists no r so that the map E from (30) is defined on L^r. The problem of E not being well-defined can be fixed by restricting E to F = L^r_f := {u ∈ L^r : ∫_D |f(u(x))| dx < ∞}, but this does not make E continuous. Indeed, any L^∞ neighborhood of u(x) = x^{1/2} in L^∞_f contains functions having arbitrarily large values of E, e.g. the functions u_ε with ε ∈ (0, 1/4). Thus, E is not continuous on L^∞_f and well-definedness of ∫_{L^r_f} E(u) dµ_t(u) is unclear. Due to the continuous embeddings L^∞_f → L^r_f for all r, the same argument works for any r ∈ [1, ∞). The a posteriori error estimate in Section 4 is not applicable in all situations in which a statistical solution can be defined, since it requires (8), (9).
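The blow-up of E near the boundary of U can be observed numerically. Since the family u_ε is not reproduced here, we use our own truncation-based family v_ε(x) = max(x^{1/2} − ε, ε^3), a hypothetical stand-in with the same qualitative behavior: v_ε stays within L^∞ distance ε of u(x) = x^{1/2}, while E(v_ε) = ∫_D 1/v_ε dx grows without bound as ε → 0.

```python
import numpy as np

# substitute x = s^2 so that the integrable singularity of 1/sqrt(x) at x = 0
# is resolved: E(v) = \int_0^1 f(v(x)) dx = \int_0^1 2 s f(v(s^2)) ds
s = np.linspace(0.0, 1.0, 200_001)

def integrate(g):
    # composite trapezoidal rule on the s-grid
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s)))

def E_of(v):
    # v: values of the function at x = s^2; the singular flux is f(u) = 1/u
    return integrate(2.0 * s / v)

def v_eps(eps):
    # hypothetical truncated perturbation of u(x) = sqrt(x) (= s after substitution)
    return np.maximum(s - eps, eps**3)

E_ref = E_of(np.maximum(s, 1e-12))   # approximates E(sqrt(x)) = 2
eps_list = (1e-1, 1e-2, 1e-3)
E_vals = [E_of(v_eps(e)) for e in eps_list]
sup_dist = [float(np.max(np.abs(v_eps(e) - s))) for e in eps_list]
```

As ε shrinks, the sup-norm distance to u decreases while E(v_ε) grows roughly like 1/ε, which is exactly the claimed discontinuity of E on L^∞_f.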
There is one interesting, albeit somewhat restrictive, setting in which (8), (9) hold and in which the technical difficulties in defining statistical solutions vanish: the case that all measures under consideration are supported on some set of functions taking values in a compact subset of the state space. This is the setting in which we will provide a definition of statistical solutions and an a posteriori error estimate.

Definition 5.1 (C-valued statistical solution). Let C ⋐ U be compact and convex. Let μ̄ ∈ P(L^p(D; U)) be supported on L^p(D; C). A C-valued statistical solution of (1) with initial data μ̄ is a time-dependent map µ : [0, T) → P(L^p(D; U)) such that each µ_t is supported on L^p(D; C) for all t ∈ [0, T) and such that the corresponding correlation measures ν^k_t satisfy the associated evolution equations in the sense of distributions, for every k ∈ N.

Remark 5.2. Note that duality implies that if µ is supported on L^p(D; C) then the corresponding ν^k is supported on C^k (for all k ∈ N).

Definition 5.3 (Dissipative C-valued statistical solution). A C-valued statistical solution of (1) is called a dissipative statistical solution provided the conditions from Definition 2.9 hold with "statistical solution" replaced by "C-valued statistical solution".
We immediately have the following theorem, whose proof follows mutatis mutandis from the proof of Theorem 4.2.

Theorem 5.4. Let µ_t be a C-valued dissipative statistical solution of (1) with initial data μ̄ as in Definition 2.9 and let μ̄^K_t be the regularized empirical measure as in Definition 3.3, supported on functions with values in C. Then the difference between µ_t and μ̄^K_t satisfies the analogue of the estimate in Theorem 4.2 for a.e. s ∈ (0, T), with L := max_{k∈K} ||∂_x u^st_k||_{L^∞((0,s)×D)}.

Numerical experiment
In this section we illustrate how to approximate the 2-Wasserstein distances that occur in Theorem 4.2. Moreover, we examine the scaling properties of the estimators in Theorem 4.2 by means of a smooth solution of the one-dimensional compressible Euler equations.

Numerical approximation of Wasserstein distances
We illustrate how to approximate the 2-Wasserstein distance using the example of W_2(μ̄, Σ_{k∈K} w_k δ_{ū_k}) from (27). For the given initial measure μ̄ ∈ P(L^2(D)) we choose a probability space (Ω, σ, P) and a random field ū ∈ L^2(Ω; L^2(D)) such that the law of ū with respect to P coincides with μ̄. We approximate the initial measure μ̄ by a second empirical measure Σ_{m=1}^M w_m δ_{ū_m} based on M samples. Computing the optimal transport plan π* (and thus the 2-Wasserstein distance) between the two atomic measures can be formulated as the following linear program, cf. [27, (2.11)]:

π* = argmin_{π ∈ R^{K×M}_{≥0}} Σ_{k=1}^K Σ_{m=1}^M π_{km} C_{km}   subject to   Σ_{m=1}^M π_{km} = w_k,  Σ_{k=1}^K π_{km} = w_m,

with the cost matrix C_{km} := ||ū_k − ū_m||^2_{L^2(D)}. The linear program is solved using the network simplex algorithm [3, 27], implemented in the function ot.emd of the optimal transport library POT [17] in Python. The 2-Wasserstein distance is finally approximated by

W_2(μ̄, Σ_{k∈K} w_k δ_{ū_k}) ≈ ( Σ_{k=1}^K Σ_{m=1}^M π*_{km} C_{km} )^{1/2},    (33)

where (33) can be computed with ot.emd2(w_K, w_M, C).
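If the POT library is unavailable, the same linear program can be solved for uniform weights with K = M atoms using SciPy: by Birkhoff's theorem the optimal plan is then a permutation matrix, so ot.emd reduces to a linear assignment problem. A minimal sketch (the sample family and grid below are our own illustrative choices):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_empirical(samples_a, samples_b, dx):
    """2-Wasserstein distance between two uniform empirical measures whose
    atoms are functions sampled on a common spatial grid (rows of samples_*)."""
    diff = samples_a[:, None, :] - samples_b[None, :, :]
    C = np.sum(diff**2, axis=2) * dx        # squared L^2(D) cost matrix
    row, col = linear_sum_assignment(C)     # optimal permutation plan
    return np.sqrt(C[row, col].mean())      # mean encodes the weights w_k = 1/K
```

For K ≠ M or non-uniform weights the full network simplex of ot.emd is required; the assignment shortcut above only covers the uniform, equal-size case.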

A numerical experiment
This section is devoted to the numerical study of the scaling properties of the individual terms in the a posteriori error estimate of Theorem 4.2. In the following experiment we consider as an instance of (1) the one-dimensional compressible Euler equations for the flow of an ideal gas,

∂_t ρ + ∂_x m = 0,
∂_t m + ∂_x (m^2/ρ + p) = 0,    (34)
∂_t E + ∂_x ((E + p) m/ρ) = 0,

where ρ describes the mass density, m the momentum and E the energy of the gas. The constitutive law for the pressure p reads

p = (γ − 1)(E − m^2/(2ρ)),

with the adiabatic constant γ = 1.4. We construct a smooth exact solution of (34) by introducing an additional source term. The exact solution depends on a random parameter ξ ∼ U(0, 1), a uniformly distributed random variable. Moreover, the spatial domain is [0, 1] with periodic boundary conditions and we compute the solution up to T = 0.2. We sample the initial measure μ̄ and the exact measure µ_T with 10000 samples.
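For completeness, the hyperbolicity of (34) can be checked numerically: the eigenvalues of the flux Jacobian D f(U) must be real, and for the Euler equations they are u − c, u, u + c with velocity u = m/ρ and sound speed c = (γ p/ρ)^{1/2}. The sketch below verifies this with a finite-difference Jacobian at one sample state (the state values are arbitrary choices):

```python
import numpy as np

GAMMA = 1.4

def pressure(U):
    rho, m, E = U
    return (GAMMA - 1.0) * (E - 0.5 * m**2 / rho)

def flux(U):
    # Euler flux in conserved variables (rho, m, E)
    rho, m, E = U
    p = pressure(U)
    return np.array([m, m**2 / rho + p, (E + p) * m / rho])

def flux_jacobian(U, h=1e-6):
    # central finite-difference approximation of Df(U)
    J = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        J[:, j] = (flux(U + e) - flux(U - e)) / (2.0 * h)
    return J

U = np.array([1.2, 0.6, 3.0])            # sample admissible state (rho, m, E)
u = U[1] / U[0]                          # velocity
c = np.sqrt(GAMMA * pressure(U) / U[0])  # sound speed
eigs = np.sort(np.linalg.eigvals(flux_jacobian(U)).real)
```

The three computed eigenvalues match u − c, u, u + c up to the finite-difference error.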
For the remaining part of this section we use the notations E^det(T) for the spatio-temporal residual term in Theorem 4.2, and E^det_0 and E^stoch_0 for the deterministic and stochastic parts of the initial error in (27), respectively. Moreover, when we refer to the error, we mean the Wasserstein distance between the exact and the numerically computed density ρ at time t = T.
As numerical solver we use the RKDG code Flexi [24]. The DG polynomial degree is always two and for the time-stepping we use a low-storage SSP-RK method of order three as in [25]. The time-reconstruction is also of order three. As numerical flux we choose the Lax-Wendroff flux. Computing E^det, E^det_0 and E^stoch_0 requires computing integrals; we approximate them via Gauß-Legendre quadrature, using seven points in each time-cell and ten points in every spatial cell.
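The quadrature step can be sketched as follows, as a minimal composite Gauß-Legendre rule (the integrand and cell counts are illustrative, not Flexi's implementation):

```python
import numpy as np

def gauss_legendre_cell(g, a, b, n):
    """Integrate g over the cell [a, b] with the n-point Gauss-Legendre rule,
    which is exact for polynomials of degree <= 2n - 1."""
    xi, w = np.polynomial.legendre.leggauss(n)   # nodes/weights on [-1, 1]
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return half * np.sum(w * g(mid + half * xi))

def composite(g, cells, n):
    # sum the cell-wise rule over a mesh given by its cell boundaries
    return sum(gauss_legendre_cell(g, a, b, n)
               for a, b in zip(cells[:-1], cells[1:]))
```

With seven points per cell, polynomials up to degree 13 are integrated exactly, which is more than enough for products of degree-two DG polynomials.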

Spatial refinement
In this section we examine the scaling properties of E^det(T) and E^det_0 when we gradually increase the number of spatial cells. We start on a coarse mesh with 16 elements, i.e., h = 1/16, and upon each refinement we subdivide each spatial cell uniformly into two cells. To examine a possible influence of the stochastic resolution we consider a coarse and a fine sampling with 100 and 1000 samples, respectively. Table 1 and Figure 1a display the corresponding quantities for 100 samples; Table 2 and Figure 1b display the same quantities for 1000 samples. We observe that the convergence of E^det and E^det_0 does not depend on the stochastic resolution. Indeed, both quantities decay when we increase the number of spatial elements, uniformly with respect to changes in the sample size. Moreover, E^det_0 exhibits the expected order of convergence, which is six for a DG polynomial degree of two (note that we compute squared quantities). It can also be observed that E^det converges with order five, which is due to a suboptimal rate of convergence in the first time-step. This issue has also been discussed in detail in [20, Remark 4.1]. Furthermore, we see that the numerical error remains almost constant upon mesh refinement, since it is dominated by the stochastic resolution error, which is described by E^stoch_0. Since E^stoch_0 reflects the stochastic error, it also remains constant upon spatial mesh refinement.
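The convergence orders quoted above can be extracted from any error table with a small helper. Below, the helper is verified on synthetic data decaying exactly like h^6 and h^5 (not the actual values of Tables 1 and 2):

```python
import numpy as np

def eoc(h, err):
    """Experimental order of convergence between consecutive refinement
    levels: p_i = log(err_i / err_{i+1}) / log(h_i / h_{i+1})."""
    h = np.asarray(h, dtype=float)
    err = np.asarray(err, dtype=float)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])
```

For mesh halving this reduces to the familiar log2(err_i / err_{i+1}); the constant in front of the error cancels, so only the rate is measured.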

Stochastic refinement
In this section, we consider stochastic refinement, i.e., we increase the number of samples and keep the spatial resolution fixed. Similarly to the numerical example in the previous section, we consider a very coarse spatial discretization with 8 elements (Table 3, Figure 2a) and a fine spatial discretization with 256 elements (Table 4, Figure 2b). We observe that E^det and E^det_0 remain constant when we increase the number of samples. This is the correct behavior, since both residuals reflect spatio-temporal errors. For the coarse spatial discretization we observe that the numerical error does not decrease when increasing the number of samples, since the spatial discretization error, reflected by E^det, dominates the numerical error. For the fine spatial discretization the numerical error is dominated by the stochastic resolution error alone and thus the error converges with the same order. We observe an experimental rate of convergence of one (resp. one half after taking the square root). Finally, we see that E^stoch_0 is independent of the spatial discretization: its convergence is not altered by the spatio-temporal resolution.

Conclusions
This work provides a first rigorous a posteriori error estimate for numerical approximations of dissipative statistical solutions in one spatial dimension. Our numerical approximations rely on so-called regularized empirical measures, which enable us to use the relative entropy method of Dafermos and DiPerna [8] within the framework of dissipative statistical solutions introduced by the authors of [16]. We derived a splitting of the error estimator into a stochastic and a spatio-temporal part. In addition, we provided a numerical example verifying this splitting. Moreover, our numerical results confirm that the quantities that occur in our a posteriori error estimate decay with the expected order of convergence.
Further work should focus on a posteriori error control for multi-dimensional systems of hyperbolic conservation laws, especially the correct extension of the space-time reconstruction to two and three spatial dimensions. Moreover, based on the a posteriori error estimate it is possible to design reliable space-time-stochastic numerical schemes for the approximation of dissipative statistical solutions. For random conservation laws and classical L^2(Ω; L^2(D))-estimates, the stochastic part of the error estimator is given by the discretization error in the initial data and an additional stochastic residual which occurs during the evolution, see for example [19, 20]. For dissipative statistical solutions the stochastic part of the error estimator in the 2-Wasserstein distance is directly related to the stochastic discretization error of the initial data, which may be amplified in time due to nonlinear effects. This result shows that stochastic adaptivity for dissipative statistical solutions becomes significantly easier compared to random conservation laws, since only stochastic discretization errors of the initial data (and their proliferation) need to be controlled. The design of space-stochastic adaptive numerical schemes based on this observation will be the subject of further research.