A New Class of High-Order Methods for Fluid Dynamics Simulations Using Gaussian Process Modeling: One-Dimensional Case

We introduce an entirely new class of high-order methods for computational fluid dynamics based on the Gaussian process (GP) family of stochastic functions. Our approach is to use kernel-based GP prediction methods to interpolate/reconstruct high-order approximations for solving hyperbolic PDEs. We present a new high-order formulation to solve (magneto)hydrodynamic equations using the GP approach that furnishes an alternative to conventional polynomial-based approaches.


Introduction
Cutting edge simulations of gas dynamics and magnetohydrodynamics (MHD) have been among the headliner applications of scientific high-performance computing (HPC) [13,14,32,61].They are expected to remain important as new HPC architectures of ever more powerful capabilities come online in the decades to come.A notable trend in recent HPC developments concerns the hardware design of the newer architectures: It is expected that newer HPC architectures will feature a radical change in the balance between computation and memory resources, with memory per compute core declining dramatically from current levels [2,13,61].This trend tells us that new algorithmic strategies will be required to meet the goals of saving memory and accommodating increased computation.This paradigm shift in designing scientific algorithms has become a great import in HPC applications.In the context of numerical methods for computational fluid dynamics (CFD), one desirable approach is to design high-order accurate methods [61] that, in contrast to low-order methods, can achieve an increased target solution accuracy more efficiently and quickly by computing increased higher-order floating-point approximations on a given grid resolution [25,36,37].This approach embodies in a concrete manner the desired tradeoff between memory and computation by exercising more computation per memory -or equivalently, the equal amount of computation with less memory.
Within the broad framework of finite difference method (FDM) and finite volume method (FVM) discretizations, discrete algorithms of data interpolation and reconstruction play a key role in numerical methods for PDE integration [36,37,62].They are frequently the limiting factor in the convergence rate, efficiency, and algorithmic complexity of a numerical scheme.The general procedure in 1D highorder conservative FDM is to pursue high-order approximations of flux function values Fi+ 1 2 at interfaces, by interpolating the set of the interface flux function values {F (q i−p ), . . ., F (q i+r )}, each of which is evaluated as pointwise value at q k , k = i − p, . . ., i + r, for some integers p and r.Mathematically, this is formulated as Fi+ 1 2 = I F (q i−p ), . . ., F (q i+r ) , where I(•) is a highly accurate interpolation scheme providing stable numerical interface flux values that are evaluated as pointwise values of q k over a stencil of interpolation, ∪ i+r k=i−p I k , where I k denotes the k-th cell (e.g., [x i−1/2 , x i+1/2 ] in 1D) [26,43].By contrast, the procedure of 1D high-order FVM begins with a set of the cell volume-averaged values as initial conditions, and seeks a pair of high-order accurate reconstructed pointwise Riemann state values by solving Riemann problems at the interfaces x i+ 1 2 using the Riemann state pair ) as inputs [8,35,36,42,55,62].
More generally, interpolation and reconstruction are not only essential for estimating high-order accurate approximations for fluxes at quadrature points on each cell, but also for interface tracking; for prolonging states from coarse zones to corresponding refined zones in adaptive-mesh refinement (AMR) schemes; and for various other contexts associated with high-order solutions.In CFD simulations, these interpolation and reconstruction algorithms must be carried out as accurately as possible, because, to a large extent, their accuracy is one of the key factors that determines the overall accuracy of the simulation.
Polynomial-based approaches are the most successful and popular among interpolation/reconstruction methods in this field.There are a couple of convincing reasons for this state of affairs.First, they are easily relatable to Taylor expansion, the most familiar of function approximations.Second, the nominal N -th order accuracy of polynomial interpolation/reconstruction is derived from using polynomials of degree (N − 1), bearing a leading term of the error that scales with O(∆ N ) as the local grid spacing ∆ approaches to zero [36,37,62].However, the simplicity of polynomial interpolation/reconstruction comes at a price: The polynomial approach is notoriously prone to oscillations in data fitting, especially with discontinuous data [20]; Furthermore, in many practical situations the highorder polynomial interpolation/reconstruction must be carried out on a fixed size of stencils, whereby there is a one-to-one relationship between the order of the interpolation/reconstruction and the size of the stencils.This becomes a restriction in particular when unstructured meshes are considered in multiple spatial dimensions.Lastly, another related major issue lies in the fact that the algorithmic complexity of such polynomial based schemes typically grows with order of accuracy [18], as well as with spatial dimensionality [8,42,71] in FVM.
To overcome the aforementioned issues in polynomial methods, practitioners have developed in the last decades the so-called "non-polynomial" interpolation/reconstruction based on the mesh-free Radial Basis Function (RBF) approximations.The core idea is to replace the polynomial interpolants with RBFs, which is a part of a very general class of approximants from the field known as Optimal Recovery (OR) [68].Several interpolation techniques in OR have shown practicable in the framework of solving hyperbolic PDEs [31,46,58], parabolic PDEs [44,45], diffusion and reaction-diffusion PDEs [53], boundary value problems of elliptic PDEs [40], interpolations on irregular domains [10,24,41], and also interpolations on a more general set of scattered data [17].Historically, RBFs were introduced to seek exact function interpolations [47].Recently, the RBF approximations have been combined with the key ideas of handling discontinuities in the ENO [23] and WENO [26,39] methods.Such approaches, termed as ENO/WENO-RBF, have been extended to solve nonlinear scalar equations and the Euler equations in the FDM [29] and the FVM [21] frameworks.These studies focused on designing their RBF methods with the use of adaptive shape parameters to control local errors.Also in [4], two types of multiquadrics and polyharmonic spline RBFs were used to model the Euler equations with a strategy of selecting optimal shape parameters for different RBF orders.Stability analysis on the fully discretized hyperbolic PDEs in both space and time using the multiquadrics RBF is reported in [9].While there exist a few conceptual resemblances between these RBF approaches and our new GP method, the fundamental differences that distinguish the two approaches are discussed later in this paper.
The goal in this article is to develop a new class of high-order methods that overcomes the aforementioned difficulties in polynomial approaches by exploiting the alternative perspective afforded by GP modeling, a methodology borrowed from the field of statistical data modeling.In view of the novelty of our approach, which bridges the two distinct research fields of statistical modeling and CFD, it is our intention in this paper to first construct a mathematical formulation in 1D framework.The current study will serve as a theoretical foundation for later multidimensional extensions of the GP modeling for CFD applications, topics of which will be studied in our future work.
The GP method is a class of high-order schemes primarily designed for numerical evolution of hyperbolic PDEs, q t + ∇ • F (q) = 0. To this end, we describe the new high-order Gaussian Process (GP) approximation strategies in two steps: 1. GP interpolation that works on pointwise values of q(x i ) as both inputs and outputs, and 2. GP reconstruction that works on volume-averaged values q i = 1 ∆x ∆x q(x, t n )dx as inputs, reconstructing pointwise values as outputs.GP interpolation will provide a "baseline" formulation of using GP as a new highorder interpolator operating on the same data type, while GP reconstruction will serve as a high-order reconstructor operating on two different types of data.

Gaussian Process Modeling
The theory of GP, and more generally of stochastic functions, dates back to the work of Wiener [69] and Kolmogorov [33].Modern-day applications are numerous: Just in the physical sciences, GP prediction is in common use in meteorology, geology, and time-series analysis [5,48,67], and in cosmology, where GP models furnish the standard description of the Cosmic Microwave Background [6].Applications abound in many other fields, in particular wherever spatial or time-series data requires "nonparametric" modeling [5,48,59].Within the perspective of CFD applications, our goal in this study is to use predictive GP modeling that is processed by training observed data, (e.g., cell-averaged fluid variables at cell centers) to produce a "data-informed" prediction (e.g., pointwise Riemann state values at cell interfaces).In what follows we give a brief overview on GP from statistical perspective (Section 2.1), followed by our strategies of tuning GP for high-order interpolation (Section 2.2) and reconstruction (Section 2.3) in CFD applications.Readers who wish to pursue the subject in greater detail are referred to [5,48,59].

GP -Statistical Perspective
GP is a class of stochastic processes, i.e., processes that sample functions (rather than points) from an infinite dimensional function space.Initially one specifies a socalled prior probability distribution over the function space.Then, given a sample of function values at some set of points, one "trains" the model by regarding the sample as data and using Bayes' theorem to update the probability distribution over the function space.This way, one obtains a data-informed posterior probability distribution over the function space, adjusted with respect to the prior so as to be compatible with the observed data.The posterior distribution may be used to predict (probabilistically) the value of the function at points where the function has not yet been sampled.The mean value of this GP prediction is our target interpolation/reconstruction for FDM and FVM.
Formally, a GP is a collection of random variables, any finite collection of which has a joint Gaussian distribution [5,48].A GP is fully defined by two functions: a mean function f (x) = E[f (x)] over R N , and a covariance function which is a symmetric, positive-definite integral kernel Such functions f , drawn randomly from this distribution, are said to be sampled from a Gaussian Process with mean function f (x) and covariance function K(x, y), and we write f ∼ GP( f , K).As with the case of finite-dimensional Gaussian distributions, the significance of the covariance is where the averaging is over the GP distribution.
In standard statistical modeling practice, both f (x) and K(x, y) are typically parametrized functions, with parameters controlling the character (e.g.length scales, differentiability, oscillation strength) of "likely" functions.Given a GP, and given N "training" points x i , i = 1, . . ., N at which the function values f (x i ) are known, we may calculate the likelihood L (the probability of f given the GP model) of the data vector f ≡ [f (x 1 ), . . ., f (x N )] T (e.g., N many pointwise values of density ρ at x i , i = 1, . . ., N ) by where Given the function samples f = [f (x 1 ), . . ., f (x N )] T obtained at spatial points x i , i = 1, . . ., N , GP predictions aim to make a probabilistic statement about the value f * ≡ f (x * ) of the unknown function f ∼ GP( f , K) at a new spatial point x * .In other words, from a stochastic modeling view point, we are interested in making a new prediction of GP for f at any randomly chosen point x * .This is particularly of interest to us from the perspectives of FDM and FVM, because we can use GP to predict an unknown function value at cell interfaces (e.g., u i± 1 2 in 1D) where both FDM and FVM require estimates of flux functions.
We can accomplish this by utilizing the conditioning property of GP from the theory of Bayesian inference [5,48].We look at the augmented likelihood function L * by considering the joint distribution of the currently available training outputs, f , and the new test output f * , where g and ḡ are the (N + 1)-dimensional vectors whose components, in partitioned form, are and M is the (N + 1) × (N + 1) augmented covariance matrix, given in partitioned form by In Eq. ( 8), we've defined a scalar k * * and an N -dimensional vector k * = [k * ,i ] i=1,...,N given by Using Bayes' Theorem, the conditioning property applied to the joint Gaussian prior distribution on the observation f yields the Gaussian posterior distribution of f * given f .One may then straightforwardly derive [5,48]: where the newly updated posterior mean function is and the newly updated posterior covariance is What has happened here is that the GP on the unknown function f , trained on the data f , has resulted in a Gaussian posterior probability distribution on the unknown function value f (x * ) at a new desired location x * , with a mean f * as given in Eq. ( 11), and with a variance as given in Eq. (12).
It is worthwhile to make an important observation at this point.Eq. ( 5) provides a clear conceptual difference in interpolations and reconstructions between the GP predictions and the RBF approximations.In the GP nonparametric viewpoint we define a prior (or latent) probability distribution directly over function spaces, hence there is no need to seek any fixed number of parameters that may depend on the datasets under consideration.This way, the stochastic properties of functions in the prior are set by a choice of covariance kernel functions, by which the posterior predictions in Eq. ( 6) follow.
On the contrary, RBF models in the context of interpolations and reconstructions follow the parametric modeling paradigm, requiring to solve a linear system Aλ = F (see e.g., [4,31]) whose size is determined and fixed by the number of desired N interpolation points f = [f (x 1 ), . . ., f (x N )] T and the associated coefficients (or parameters) λ j and c k of an RBF interpolating function s(x), The components of the solution vector λ are λ j and c k , A is the matrix whose components are set by the values of RBFs Φ i,j = Φ(||x i − x j ||, j ) at two points x i and x j and the shape parameter j > 0, F is the vector with entries of f , and the last term in Eq. ( 13) is a polynomial (at most) K-th degree with k-th standard basis polynomials p k .Unlike GP, a probability distribution (if needed) is not directly imposed on s(x), rather, it can only be induced by a prior distribution defined on the coefficients.
Even though the two approaches resemble each other in terms of using "kernels" (i.e., RBFs and covariance kernel functions), they are fundamentally different from these statistical viewpoints.

High-order GP Interpolation for CFD
In this paper we are most interested in developing a high-order reconstruction method for FVM in which the function samples f are given as volume-averaged data u i .However, we first consider an interpolation method which uses pointwise data u i as function samples f .An algorithmic design for GP interpolation using pointwise data will provide a good mathematical foundation for FVM which reconstructs pointwise values from volume-averaged data.
The mean f * of the distribution given in Eq. ( 11) is our interpolation of the function f at the point x * ∈ R D , D = 1, 2, 3, where f is any given fluid variable such as density, pressure, velocity fields, magnetic fields, etc.For the purpose of exposition, let us use q to denote one of such fluid variables (e.g., density ρ = ρ(x, t n )).Also, the mathematical descriptions of both interpolation and reconstruction will be considered in 1D hereafter.
We are interested in seeking a high-order interpolation of q at x * = x * = x i± 1 2 on a stencil ∪ i+r k=i−p I k , where I GP (•) is the GP interpolation given in Eq. (11).We define Furthermore, if we simply assume a constant mean f 0 for the data in Eq. ( 15) we get where 1 r−p+1 = [1, . . ., 1] T is an (r − p + 1)-dimensional one-vector.In case of a zero mean f 0 = 0 the GP interpolation scheme simply becomes As shown in Eqs.(11) and (17), the interpolant f * = q i± 1 2 is a simple linear combination of the observed data f and the covariance kernels k * and K, anchored by one of its arguments to one of the data points, x * , x i−p , . . ., x i+r .The second term in Eq. ( 11) can also be cast as an inner product between a vector of weights w T ≡ k T * K −1 and a vector of data residuals (f − f ).The weights w are independent of the data values f ; they depend only on the locations of the data points x i and on the desired interpolation point x * .This is useful, because in hydrodynamic simulations the training point locations (viz.a GP stencil) and the interpolation points (viz.cell interfaces) are often known in advance, as is the case with static Cartesian grid configurations.In such cases, the weight vector w can be computed and stored in advance at an initialization step and remain constant throughout the simulation.When an adaptive mesh refinement (AMR) configuration is considered, w can be computed for all possible grid refinement levels and stored a priori for later use.The GP interpolations then come at the cost of the remaining inexpensive inner product operation between w T and (f − f ) in Eq. (11), whose operation count is linearly proportional to the number of points in the stencil.Typically in 1D, the size of the stencil is one for the first order Godunov (FOG) method [19,62]; three for 2nd order piecewise linear methods (PLM) [62,66]; five for both the 3rd order piecewise parabolic method (PPM) [11]; and 5th order Weighted Essentially Non-oscillatory (WENO) method [26].Given such stencil sizes, the cost of the linear solves required by Eq. ( 11) are minor.For instance, K is a 5 × 5 matrix when using a stencil of five grid points, namely, the 5-point GP stencil centered at x i with a radius of two-cell length Note that the interpolation point x * may be anywhere in the continuous interval S R .
Since the matrix K is symmetric positive-definite, the inversion of K can be obtained efficiently by Cholesky decomposition which is about a factor 2 faster than the usual LU decomposition.The decomposition is only needed once, at initialization time, for the calculation of the vector of weights w.
An important feature of GP interpolation is that it naturally supports multidimensional stencil configurations.The reason for this is that there are many valid covariance functions over R D that are isotropic, and therefore do not bias interpolations and reconstructions along any special direction.The possibility of directionally-unbiased reconstruction over multidimensional stencils is a qualitative advantage of GP interpolation, especially when designing high-order algorithms [8,42,71].Such high-order multidimensional GP methods will be studied in our future work.
There is an additional piece of information beyond the point estimate f * .We also have an uncertainty in the estimate, given by U in Eq. ( 12).This posterior uncertainty is of crucial importance in many GP modeling applications, but it is of limited interest for the purpose of this paper.We will overlook posterior uncertainty in the current study, and focus on the posterior mean formula given in Eq. (11).

High-order GP Reconstruction for CFD
In FVM the fluid variables to be evolved are not pointwise values, but rather are volume-averaged integral quantities, q i = 1 ∆V i ∆V i q(x, t n )dV.In FDM on the other hand, the main task is to find a high-order approximation to the interface flux values Fi+ 1 2 , given the fact that their integral quantities are known via the analytic evaluations of the flux function F at the given pointwise data set u i .The conservative FDM updates in 1D then readily proceed by solving In both cases, we see that there is a change in data types -hence the name "reconstruction" -between input (e.g., q i for FVM; F (q i ) for FDM) and output (e.g., q(x, t n ) for FVM; Fi+ 1 2 for FDM) pairs in such a way that high-order approximations are applied to the integral quantities (inputs) and produce corresponding pointwise values (outputs), with high accuracy.
The GP interpolation method outlined in Section 2.2 should therefore be modified so that reconstruction may account for such data type changes in both FDM and FVM.Note that the integral averages over a grid cell constitute "linear" operations on a function f (x).As with ordinary finite-dimensional multivariate Gaussian distributions, where linear operations on Gaussian random variables result in new Gaussian random variables with linearly transformed means and covariances, a set of N linear functionals operating on a GP-distributed function f has an N -dimensional Gaussian distribution with mean and covariance that are linear functionals of the GP mean function and covariance function.
Suppose for example in a FVM sense, we consider a GP reconstruction on a GP stencil For the sake of a reconstruction scheme on a mesh of control volumes, we choose the measures dg k (x) to be the cell volume-average measures, where ∆ (d) is the grid spacing in the d-direction, and the k-th cell I k is given by 2 ] for each d-direction.For the purpose of the current discussion, we will assume a locallyuniform rectilinear grid of cubical cells Then, the vector G = [G i−p , . . ., G i+r ] T is normally distributed with mean Ḡ = [ Ḡi−p , . . ., Ḡi+r ] T and covariance matrix C = [C kh ] k,h=i−p,...,i+r , where and Thus, we see that the GP distribution on the function f leads to a multivariate Gaussian distribution on any (r − p + 1)-dimensional vector G of linear functionals of f .This Gaussian distribution can be used for likelihood maximization in a manner completely analogous to the case of interpolation, where training is performed using pointwise data.
For reconstruction, it is also not hard to generalize Eq. (11).We first define the (r − p + 1)-dimensional prediction vector T * = [T * ,k ] k=i−p,...,i+r at x * by Then the pointwise function value f (x * ) at the point x * , reconstructed from the volume-averaged data G, is given by Notice that the sample data G now holds volume-averaged quantities on a stencil The (r − p + 1)-dimensional vector T * ,k is the covariance between cell average quantities q k , k = i − p, . . ., i + r, and the pointwise value q * at x * , C is the (r − p + 1) × (r − p + 1) covariance kernel matrix between cell average values q k , and z = C −1 T * ,k is the (r − p + 1)-dimensional vector of weights for the data (G − Ḡ).Eq. ( 25) is a straightforward generalization of Eq. ( 11), producing an integral version of it.
In a simple 1D case with a constant mean f 0 and x * = x i± 1 2 , the mean vector Ḡ is written as and the GP reconstruction in Eq. ( 25) becomes Analogous considerations arise in the RBF reconstruction methods for volumeaveraged data.In [4] the RBF function is integrated and is used for reconstruction, a procedure that corresponds closely to integrating the kernel function in Eqs.(23) and (24).

The GP-SE Model
One of the most widely-used kernels in GP modeling is the "Squared Exponential (SE)" covariance kernel function [5,48], which has the form This GP-SE model features three free parameters, f 0 , Σ 2 , and .Often, in Eq. ( 11) and Eq. ( 25), a constant mean function f (x) = f 0 1 N is adopted for simplicity, where T is an N -dimensional one-vector.The constant f 0 can be analytically determined from the data as part of the interpolation process by maximizing the likelihood function in Eq. ( 5).See Appendix A for more details.
The latter two, Σ 2 and , are called "hyperparameters" which are the parameters that are built into the kernel.The hyperparameter Σ 2 has no effect on the posterior mean function, so one can set Σ 2 = 1 for simplicity.On the other hand, the hyperparameter is the correlation length scale of the model.It determines the length scale of variation preferred by the GP model.Our GP predictions for interpolation/reconstruction, which necessarily agree with the observed values of the function at the training points x k , may "wiggle" on this scale between training points.In this sense, is a "rigidity", controlling the curvature scales of the prediction, and should correspond to the physical length scales of the features GP is to resolve.Since we want function interpolations/reconstructions that are smooth on the scale of the grid, we certainly want > ∆, and would also prefer ≥ R. As just mentioned, the choice of requires a balance between the physical length scales in the problem and the grid scale.This implies that different values of can be employed depending on differing length scales on different regions of the computational domain.We leave the investigation on best practice for determining such an optimal within the context of FVM reconstructions to our forthcoming papers.For the purpose of this paper we select a constant value of for each simulation, which has a direct impact to the solution accuracy, as shown in Fig. 3(b).The relationship often leads to the trade-off principle in that there is a conflicting balance between the obtainable accuracy and the numerical stability due to a large condition number of the kernel matrix when a grid is highly refined.Analogously in RBF theory, the trade-off principle also occurs in terms of the shape parameters of the RBFs and it has been investigated by several authors [17,16,22,49].More recent studies [4,21,29] have focused on finding their optimal values of the shape parameters in the combined ENO/WENO-RBF framework.We discuss in detail our strategy of GP in Section 4.2.
The SE covariance function has two desirable properties [5,12,48].First, it has the property of having a native space of C ∞ functions, which implies that the resulting interpolants converge exponentially with stencil size, for data sampled from smooth functions.Second, the SE kernel facilitates dimensional factorization, which is useful in multidimensional cases.
Like all other GP kernels, SE suffers a notorious singularity issue where the kernel is prone to yield nearly singular matrices when the distance between any two points x and y becomes progressively smaller (or equivalently, the grid becomes progressively refined in CFD).A practical and well-known fix for this problem is to add a "nugget" [12], i.e., a small perturbation c o I (where c 0 is a small positive constant, usually chosen to be 10-100 times the floating-point machine precision) is added to K, where I is the identity matrix.Unfortunately, this trick does not resolve the issue in a desired way because it can result in less accurate data predictions in GP.Our strategy to overcome the singularity issue in SE is discussed in Section 4.
SE also performs poorly when the dataset contains discontinuities, such as shocks and contact discontinuities in a CFD context.An intuitive resolution to this issue, from a statistical modeling perspective, would be to use other types of non-smooth kernel functions instead such as the Matérn class, exponentialtypes, rational quadratic functions, Wendland, etc. [5,48,59].They are known to be better-suited for discontinuous datasets than SE by relaxing the strong smoothness assumptions of SE.By construction, the latent (or prior) sample function spaces generated by these non-smooth kernels possess finite-order of differentiability.As a result, we have seen that the posterior GP predictions of interpolations and reconstructions using such kernels turn out to be very noisy and rough, and their solutions often suffer from reduced solution-accuracy and numerical instability.A covariance kernel function such as the Gibbs covariance or the neural-network covariance [48] would be more appropriate for resolving discontinuities.The possibility of using these GP kernels for high-order interpolations and reconstructions is under investigation and will be studied in our future work.To deal with discontinuous solutions in this paper, we introduce yet another new approach to design an improved GP reconstruction algorithm, termed as "GP-WENO", which is described in Section 2.5.This new approach formulates a new set of non-polynomial, GP-based smoothness indicators for discontinuous flows.
Although we are interested in describing one-dimensional formulations of GP in this paper, let's briefly discuss a 2D case that best illustrates a general strategy of GP to compute a single interpolation and reconstruction procedure.Consider a list of N cells I i , i = 1, . . ., N in 2D.The most natural way to construct the list of cells (i.e., the stencil) over which the GP interpolation/reconstruction will take place is to pick a radius † R, and add to the list those cells I i k whose cell centers x i k are within the distance R from a local cell I i 0 under consideration.The result is a "blocky sphere" that ensures isotropy of the interpolation.See Fig. 1.We can adjust R to regulate N , thus making R a performance/accuracy tradeoff tuning parameter.
An important practical feature of the SE covariance function is that it provides its dimensional factorization analytically.Therefore, the volume averages in Eqs. ( 22) and ( 23) simplify to iterated integrals, and in fact they can be expressed analytically in terms of a pre-computed list of error functions of arguments proportional to one-dimensional cell center differences.Eqs. ( 23) and (24) become and using the SE kernel, where ∆ kh = (x k − x h )/∆.Other choices of covariance kernel functions are also available [59,12], while such choices would require numerical approximations for Eqs. ( 24) and (23), rather than a possible analytical form as in the SE case.

GP-WENO: New GP-based Smoothness Indicators for Non-smooth Flows
For smooth flows the GP linear prediction in Eq. ( 25) with the SE covariance kernel Eq. ( 29) furnishes a high-order GP reconstruction algorithm without any extra controls on numerical stability.However, for non-smooth flows the unmodified GP-SE reconstruction suffers from unphysical oscillations that originate at discontinuities such as shocks.To handle flows with discontinuities a hybrid method can be implemented using a shock detector (see [3]) to switch to a lower order piecewise linear method from the GP reconstruction when there is a shocked cell in the stencil.We have implemented such a hybrid method and found that it works well in general for problems with shocks, such as the Sod shock tube problem.However, we noticed that it fails to capture features in flow regions that contain a transition from smooth flow to a shock, producing unphysical oscillations there.
To ultimately resolve such issues with non-smooth flows we adopt the principal idea of employing the nonlinear weights in the Weighted Essentially Nonoscillatory (WENO) methods [26], by which we adaptively change the size of the reconstruction stencil to avoid interpolating through a discontinuity, while retaining high-order properties in smooth flow regions.A traditional WENO takes the weighted combination of candidate stencils based on the local smoothness of the individual sub-stencils.The weights are chosen so that they are optimal in smooth regions, in the sense that they are equivalent to an approximation using the global stencil that is the union of the candidate sub-stencils.
For a stencil S R of a radius R, with 2R + 1 points centered at the cell I i , we consider R + 1 candidate sub-stencils S m ⊂ S R , each with R + 1 points.Similar to the traditional WENO schemes, we subdivide S R given as Using Eq. ( 25) on each sub-stencil S m , the GP approximation for a pointwise value at a cell location x * from the m-th sub-stencil S m is On the other hand, from the global 2R + 1 stencil S R we also have a pointwise GP approximation at the same location Here, the cell location " * " is where we want to obtain a desired high-order reconstruction from GP, e.g., the cell interface x * = x i±1/2 is of particularly of interest to the Riemann problems in our case.Further, f 0 is a constant mean function that is the same over all the candidate sub-stencils, z T m = T * ,m C −1 m is the (R + 1)dimensional vector of weights, G m is the (R + 1)-dimensional vector of volume averaged data on S m (e.g., the volume averaged densities), and 1 R+1 = [1, . . ., 1] T is the (R + 1)-dimensional one-vector.Likewise, z T = T * C −1 , G, and 1 2R+1 are (2R + 1)-dimensional vectors defined in the same way.
We now take the weighted combination of these GP approximations as our final reconstructed value As in the traditional WENO approach, the weights ω m should reduce to some optimal weights γ m in smooth regions such that the approximation in Eq. (37) gives the GP approximation over the global 2R + 1 point stencil.The γ m 's then should satisfy where γ = [γ 1 , . . ., γ R+1 ] T is given by the solution to the (R + 1) × (2R + 2) overdetermined system The columns of the matrix M are given by The optimal weights γ m then depend on the choice of kernel function, but as with the vectors of GP weights, z and z m , γ m need only be computed once.We take the γ m 's as the least squares solution to the overdetermined system in Eq. ( 39), which can be determined numerically.It remains to describe the non-linear weights ω m in Eq. ( 37).Again, these should reduce to the optimal weights in smooth regions, but more importantly, they need to serve as an indicator of the quality of data on the sub-stencil S m .We adopt the same weights as in the WENO-JS methods [26]: , where ωm = where we set p = 1 and = 10 −36 for our test results.These weights are based on the so-called smoothness indicators β m , for each stencil.In modern WENO schemes the smoothness indicators are calculated as the L 2 -norm of the reconstructed polynomial on each ENO stencil S m .We wish to construct a new class of smoothness indicators that fits within the GP framework.
To begin with, we give a brief description on the eigenfunction analysis of covariance kernels.Gaussian process regression, as described in Section 2, can be equivalently viewed as Bayesian linear regression using a possibly infinite number of basis functions.One choice of basis functions is the eigenfunctions, φ i (•), i = 1, . . ., n, of the covariance kernel function, k.By definition each eigenfunction The eigenpair (φ i , λ i ) consists of the eigenfunction and the eigenvalue of the covariance kernel function k with respect to the measure µ [5,48].
Of interest is when there is a non-negative function p(x) called the density of the measure µ so that the measure may be written as dµ(x) = p(x)dx.We can approximate the integral in Eq. ( 42) by choosing n finite sample points, x l , l = 1, . . ., n, from p(x) on some stencil, which gives Plugging in x = x l into Eq.( 43) for l = 1, . . ., n gives the matrix eigenproblem where K is an n × n Gram matrix with entries K ij = k(x i , x j ), and nλ i is the i-th matrix eigenvalue associated with the normalized n-dimensional eigenvector v i satisfying v T i v j = δ ij .We see that φ i (x j ) ∼ √ ne j • v i where e j is the j-th unit vector, and we get the √ n factor from differing normalizations of the eigenvector and eigenfunction.This eigendecomposition of the covariance matrix, when viewed as a principal component analysis (PCA) [5,48], can be thought of as decomposing the covariance into the various variational modes that can be represented on the n sample points.For more discussion of eigenfunction analysis of covariance kernels we refer readers to Section 4.3 of [48]; Section 12.3 of [5].
A set of n samples of a function f (x), denoted as evaluated at the points x l , l = 1, . . ., n as in Eq. ( 43), can be expanded in the eigenbasis in (44) as We can then define a norm in the Hilbert space of functions using the inner product of the function in the reproducing kernel Hilbert space (RKHS) defined by the kernel function k(x, x ) [48], In this expression for the norm f 2 H , the relative weight of a given mode in the data f is inversely weighted by the eigenvalue λ i .The SE kernel has a native Hilbert space of C ∞ functions, and assigns relatively small eigenvalues to eigenfunctions that vary rapidly on short length scales.The quantity f 2 H thus contains large additive terms when the data embodies a discontinuity, while data corresponding to smooth regions produces smaller additive terms.This suggests that we may use f 2 H to supply the basis for a GP-based smoothness indicator in the non-linear weights in Eq. (41).
We now use the above discussion of the eigendecomposition to focus on the practical implementation of our new GP-based smoothness indicators.It remains to specify how to obtain the α i 's from the volume averaged data G m on the substencil S m .Using the notation 'm' to denote the objects restricted on each S m , the eigenproblem in Eq. ( 44) on S m can be written as where K m is the (R + 1) × (R + 1) sub-matrix.Note that here we are using the pointwise covariance kernel K m from Eq. ( 29) rather than the integrated covariance kernel C (m) from Eq. (23).In this way we obtain the eigenpair (λ m i , u m i ) that is consistent with providing β m which is to measure smoothness of pointwise values ) in Eq. ( 37).Consider a (R + 1)-dimensional cell-centered pointwise values We can express f m in the basis v m i as Each element f m (x l ), l = 1, . . ., R + 1, can be reconstructed from G m , using a zero mean f 0 = 0 in Eq. ( 28), as where (R + 1)-dimensional vector Here T m l,k is the prediction vector (Eq.( 31)) over the sub-stencil S m evaluating the covariance between the cell average values q k , k = 1, . . ., R + 1, and the pointwise value q l evaluated at each x l ∈ S m .
Obtaining α m i is straightforward from Eq. ( 50).Using (v m j ) T v m i = δ ji , Finally for each m we have our new GP-based smoothness indicators for the data on the sub-stencil S m given by The computational efficiency of Eqs. ( 51) and ( 53) can be improved by introducing vectors P m i defined as where the l-th column of the matrix Z m is given by z m (x l ), l = 1, . . ., R+1.In this way we precompute P m i which can directly act on the volume average quantities of G m , by which the α m i 's are given by To see the computational efficiency in comparison, in Appendix B we provide operation counts of the two approaches referred to as "Option 1" with solving Eqs. ( 51) and ( 53), and "Option 2" with solving Eqs. ( 55) and (56).Option 2 was used for the results shown in Section 4.
It is noteworthy that there is an alternative interpretation of Eq. ( 54) available in terms of the log likelihood function.Notice that Eq. ( 54) can be written as a quadratic form which is again equivalent to assuming f 0 = 0 in the relation in Eq. ( 5) and C = ln(det |K| −1 ) + N ln(2π).The analogy between Eq. ( 54) and Eq. ( 57) can be proved easily if we realize that the matrix K m can be expressed as and hence The result follows easily by plugging Eqs. ( 50) and ( 60) into Eq.( 57) to get Eq.( 54).So the statement β m is large in this sense is analogous to the statement that the likelihood of f m is small.The statistical interpretation is that short lengthscale variability in f m make its data unlikely according to the smoothness model represented by K m .It was observed by Jiang and Shu [26] that it is critically important to have appropriate smoothness indicators so that in smooth regions the non-linear weights reproduce the optimal linear weights to the design accuracy in order to maintain high order of accuracy in smooth solutions.Typically this is shown by Taylor expansion of smoothness indicators in Eq. ( 41).Here we propose to calculate the eigendecomposition in Eq. ( 44) numerically, and therefore we don't provide a mathematical proof for the accuracy of our new smoothness indicators.Nonetheless we observe the expected high order convergence expected from the polynomial WENO reconstructions on the equivalent stencils in numerical experiments provided in Section 4.2.
3 Steps in GP-SE Algorithm for 1D FVM Before we present the numerical results of the GP reconstruction algorithms detailed in the above, we give a quick summary on the step-by-step procedure for 1D simulations with FVM.The 1D algorithm of GP outlined above can be summarized as below: Step 1 Pre-Simulation: The following steps are carried out before starting a simulation, and any calculations therein need only be performed once, stored, and used throughout the actual simulation.
Step (a) Configure computational grid: Determine a GP stencil radius R as well as choose the size of the hyperparameter .This determines the SE kernel function in Eq. ( 29) as well as the global and candidate stencils in Eqs. ( 32) & (33).
Step (b) Compute GP weights: Compute the covariance matrices, C and C m (Eq.( 23)) on the stencils S R and each of S m as well as the prediction vectors, T * (Eq.( 24)).The GP weight vectors, z T = T * C −1 , on each of the stencils can then be stored for use in the GP reconstruction.The columns of the matrices Z m , z m (x l )'s, should be computed here for use in step (d).
It is crucial in this step as well as in Step (c) to use the appropriate floating point precision to prevent the covariance matrix C from being numerically singular.This is discussed in more detail in Section 4.2.We find that double precision is suitable up to condition numbers κ ∼ 10 8 , whereas quadruple precision allows up to κ ∼ 10 18 .The standard double precision is used except for Step (b) and Step (c).
Step (c) Compute linear weights: Use the GP weight vectors to calculate and store the optimal linear weights according to Eq. ( 39).
Step (d) Compute kernel eigensystem: The eigensystem for the covariance matrices used in GP-WENO are calculated using Eq. ( 44).The matrices, C m , are the same on each of the candidate stencils in the GP-WENO scheme presented here, so only one eigensystem needs to be determined.This eigensystem is then used to calculate and store the vectors in Eq. ( 55) for use in determining the smoothness indicators in the reconstruction Step 2. At this point, before beginning the simulation, if quadruple precision was used in Step (b) and Step (c) the GP weights and linear weights can be truncated to double precision for use in the actual reconstruction step.
Step 2 Reconstruction: Start a simulation.Choose f 0 according to Eq. ( 65) or simply set to zero.The simplest choice f 0 = 0 gives good results in practice and yields Ḡ = 0 in Eq. ( 25).At each cell x i , calculate the updated posterior mean function f * in Eq. ( 25) as a high-order GP reconstructor to compute high-order pointwise Riemann state values at x * = x i±1/2 using each of the candidate stencils.The smoothness indicators (Eq.( 54)) are calculated using the eigensystem from Step (d) and, in conjunction with the linear weights from Step (c), form the non-linear weights (Eq.( 41)).Then take the convex combination according to Eq. ( 37).
Step 3 Calculate fluxes: Solve Riemann problems at cell interfaces using the highorder GP Riemann states in Step 2 as inputs.
Step 4 Temporal update: Update the volume-averaged solutions q i from t n to t n+1 using the Godunov fluxes from Step 3.

Numerical Results
Here we present numerical results using the GP-WENO reconstructions with stencil radii R = 1, 2, 3 (denoted as GP-R1, GP-R2, GP-R3 respectively), described in Section 2.5, applied to the 1D compressible Euler equations and the 1D equations of ideal magnetohydrodynamics (MHD).The GP-WENO method will be the default reconstruction scheme hereafter.We compare the solutions of GP-WENO with the fifth-order WENO (referred to as WENO-JS in what follows) FVM [26,55], using the same non-linear weights in Eq. (41).The only difference between using WENO-JS and GP-R2 is the use of polynomial based reconstruction and smoothness indicators [55] for WENO-JS, and the use of Gaussian process regression with the new GP-based smoothness indicators for GP.A fourth-order TVD Runge-Kutta method [54] for temporal updating, and the HLLC [38,63] or Roe [51] Riemann solvers are used throughout.

Performance Comparison
We first present a performance comparison for the proposed GP scheme and the WENO-JS scheme summarized in Table 1.We see that the cost of the GP-R2 and WENO-JS methods are very nearly the same, operating on the same stencils.The minor difference can be attributed to the new GP based smoothness indicators.It can be seen in Eq. ( 54) the GP smoothness indicators can be written as the sum of R + 1 perfect squares, which is three terms for each β m for GP-R2, meanwhile the smoothness indicators for WENO-JS contain only two terms (i.e., the first and second derivatives of the second-degree ENO polynomials p m (x)).The additional cost is offset by a significant advantage in smooth regions near discontinuities over the WENO-JS scheme, which is discussed in Section 4.3.
Table 1 Shown is the relative time to solution for the four methods considered, all normalized to the GP-R2 time.

1D Smooth Advection
The test considered here involves the passive advection of a Gaussian density profile.We initialize a computational box on [0,1] with periodic boundary conditions.The initial density profile is defined by ρ(x) = 1 + e −100(x−x 0 ) 2 , with x 0 = 0.5, with constant velocity, u = 1, and pressure, P = 1/γ.The specific heat ratio is chosen to be γ = 5/3.The resulting profile is propagated for one period through the boundaries.At t = 1, the profile returns to its initial position at x = x 0 , any deformation of the initial profile is due to either phase errors or numerical diffusion.We perform this test using a length hyperparameter of = 0.1 for stencil radii R = 1, 2, 3, 4 and 5, with a fixed Courant number, C cfl = 0.8 and vary the resolution of computational box, with N = 32, 64, 128, 256 and 512.
The results of this study are shown in Fig. 3(a).From these numerical experiments, GP reconstruction shows a convergence rate that goes as the size of the stencil, 2R + 1.Note that this convergence rate 2R + 1 of GP is equivalent to the convergence rate 2r − 1 of a classic WENO method for a same size of stencil.In the classic WENO method, the notation r represents the order (not degree) of polynomials on S m which require r cells on each S m , totaling 2r − 1 cells on S R .This implies 2r − 1 = 2R + 1, or R = r − 1.For example GP-R2 converges at fifth-order on the stencil S R = S 2 that has five cells, so does WENO-JS with three third-order ENO polynomials (i.e., r = 3) on the same 5-point stencil.Fig. 3 (a) Convergence for smooth Gaussian density advection using different GP stencil radii, all using the length scale = 0.1 and the HLLC Riemann solver with C cfl = 0.8.Black dotted lines show 2R + 1 convergence rates, ranging from 3rd-order (R = 1) to 11th-order (R = 5).Red dotted line represents a plateau at an L 1 of 10 −12 where a large condition number of the covariance matrix is obtained and no further accuracy is achievable.(b) L 1 errors as a function of the hyperparameter , using R = 2. Dotted lines are the error for fifth-order WENO-JS method on the same stencil.
In Fig. 3(a) the L 1 error plateaus out at an ∼ 10 −12 which is a few orders of magnitude greater than the standard IEEE double-precision, ∼ 10 16 .This happens because at high resolution the length hyperparameter, , becomes very large relative to the grid spacing, ∆.The covariance matrix, C given in Eq. ( 30) becomes nearly singular in the regime /∆ 1, yielding very large condition numbers for C. We find the plateau in the L 1 error occurs for condition numbers, κ ∼ 10 18 , corresponding to the point where the errors in inverting C in Eq. ( 25) begin to dominate.This implies that the choice of floating-point precision has an immense impact on the possible /∆, and a proper floating-point precision needs to be chosen in such a way that the condition number errors do not dominate.As mentioned in Section 2.4, SE suffers from singularity when the size of the dataset grows.The approach that enabled to produce the results in Fig. 3 is to utilize quadrupleprecision only for the calculation of z T = T T * C −1 in Eq. ( 25).Otherwise, the plateau would appear at a much higher L 1 error ∼ 10 −7 with double-precision, producing undesirable outcomes for all forms of grid convergence studies.This corresponds to condition numbers κ ∼ 10 8 , and as a point of reference, the breakdown starts to occur for /∆ > 48 using a GP radius R = 2. Since z T needs to be calculated only once, before starting the simulation, it can then be truncated to double-precision for use in the actual reconstruction procedure.There are only four related small subroutines that need to be compiled with quadruple-precision in our code implementation.The overall performance is not affected due to this extra precision handling.It should be noted that this extra precision handling is necessary for the purpose of a grid convergence study from the perspective of CFD applications.
The correlational length hyperparameter provides an additional avenue to tune solution accuracy that is not present in polynomial-based methods.Fig. 3(b) shows how the GP errors with R = 2 in the smooth-advection problem changes with the choice of , compared with the error from a fifth-order WENO-JS + RK4 solution (denoted in dotted lines).At large the errors become roughly the same as in WENO-JS, and at small the errors become worse.The error finds a minimum at a value of near the half-width of the Gaussian density profile.This is in line with the idea that the optimal choice of should match the physical length scale of the feature being resolved.We will report new strategies on this topic in our future papers.

1D Shu-Osher Shock Tube Problem
The second test is the Shu-Osher problem [56] to test GP's shock-capturing capability as well as to see how well GP can resolve small-scale features in the flow.The test gives a good indication of the method's numerical diffusivity, and it has been a popular benchmark to demonstrate numerical errors of a given method.In this problem, a (nominally) Mach 3 shock wave propagates into a constant density field with sinusoidal perturbations.As the shock advances, two sets of density features appear behind the shock.One set has the same spatial frequency as the unshocked perturbations, while in the second set the frequency is doubled and follows more closely behind the shock.The primary test of the numerical method is to accurately resolve the dynamics and strengths of the oscillations behind the shock.The results are shown in Fig. 4. The solutions are calculated at t = 1.8 using a resolution of N = 200 and are compared to a reference solution resolved on N = 2056.It is evident that the GP solution using R = 3 provides the least diffusive solution of the methods shown, especially in capturing the amplitude of the post-shock oscillations in Fig. 4(b).Of the two fifth-order methods, GP-R2 and WENO-JS, the GP solution has slightly better amplitude in the post-shock oscillations compared to the WENO-JS solution, consistent with what is observed in Section 4.2 for the smooth advection problem.
The results in Section 4.2 suggest that the choice of should correspond with a length scale characteristic of the flow for optimal performance.Fig. 5 compares the post shock features on the same grid for the WENO-JS method and the GP-R2 method with /∆ = 3, 6 and 50.Here /∆ = 3 is roughly a half wavelength of the oscillations, and the GP-R2 method clearly gives a much more accurate solution compared to the WENO-JS solution.Just as can be seen in Fig. 3(b), /∆ much larger than the characteristic length yields a solution much closer to that of WENO-JS.From Fig. 3(b) we expect that the GP reconstructions in Eq. ( 35) should approach the WENO-JS reconstructions in smooth regions for ∆.However, we see that the new GP based smoothness indicators allow for the amplitudes near the shock to be better resolved.This reflects a key advantage of the proposed GP method over polynomial-based high-order methods.The additional flexibility afforded by the kernel hyperparameters in GP, allowing for solution accuracy to be tuned to the features that are being resolved.Only at larger values of does the model become fully constrained by the data in GP, whereas the interpolating polynomials used in a classical WENO are always fully constrained by design.Analogously in the RBF theory, the the shape parameters j plays an important role in the similar context of accuracy and convergence.Several strategies have been studied recently for solving hyperbolic PDEs [4,21,29].

The Sod Shock Tube Test
The shock tube problem of Sod [57] has been one of the most popular 1D tests of a code's ability to handle shocks and contact discontinuities.The initial conditions, on the domain [0, 1], are given by the left and right states (ρ, u, p) = (1, 0, 1), x < 0.5, (0.125, 0, 0.1), x > 0.5, (61) with the ratio of specific heats γ = 1.4.Outflow boundary conditions are imposed at x = 0 and x = 1.The results are shown in Fig. 6.All of the GP-WENO schemes correctly predicts the nonlinear characteristics of the flow including the rarefaction wave, contact discontinuity and the shock.The solution using GP-R2 is very comparable to the WENO-JS solution using the same 5-point stencil.As expected, the GP-R1 solution smears out the most at both the shock and the contact discontinuity, and at the head and tail of the rarefaction.The 7th order GP-R3 also successfully demonstrates that its shock solution is physically correct without triggering any unphysical oscillation.Somewhat counter intuitively from the perspective of 1D polynomial schemes, the smallest stencil GP-R1 shows the most oscillations near the shock.This happens because the eigendecomposition of the kernel in Eq. ( 44) used in the calculation the smoothness indicators for Eq. ( 54) better approximates f 2 H as the size of the stencil increases.Recall that the eigenexpansion of the kernel function in Eq. ( 42) can contain an infinite number of eigenvalues, and so can f 2 H and the sum in Eq. (47).The finite approximation to the eigensystem and subsequently f 2 H then becomes better as the smallest coefficient α i goes to zero.Then for GP, the β m 's best indicate the smoothness on larger sized stencils.

Einfeldt Strong Rarefaction Test
First described by Einfeldt et al. [15], this problem tests how satisfactorily a code can perform in a low density region in computing physical variables, ρ, u, p, , etc.Among those, the internal energy = p/(ρ(γ − 1)) is particularly difficult to get right due to regions where the density and pressure are very close to zero.The ratio of these two small values amplifies any small errors in both ρ and p, making the errors in the largest in general [62].The large errors in are apparent for all schemes shown in Fig. 7, where the error is largest around x = 0.5.It can be observed that the amount of departure in from the exact solution (the cyan solid line) decreases as the GP radius R increases.The error in GP-R2 is slightly larger for at the center than in WENO-JS.However, the peak becomes considerably smaller in amplitude and becomes slightly flatter as R increases.

Brio-Wu MHD Shock Tube
Brio and Wu [7] studied an MHD version of Sod's shock tube problem, which has since become an essential test for any MHD code.The test has since uncovered some interesting findings, such as the compound wave [7], as well as the existence of non-unique solutions [64,65].The results for this test are shown in Fig. 8.All of the GP methods are able to satisfactorily capture the MHD structures of the problem.In all methods, including WENO-JS, there are some observable oscillations in the post shock regions.Lee in [34] showed that these oscillations arise as a result of the numerical nature of the slowly moving shock [70] controlled by the strength of the transverse magnetic field.As studied by various researchers [1,27,28,30,50,60], there seems no ultimate fix for controlling such un- physical oscillations due to the slowly moving shock.Quantitatively, the amount of oscillations differs in different choices of numerical methods such as reconstruction algorithms and Riemann solvers.We see that all of the GP solutions together with the WENO-JS solution feature a comparable level of oscillations.Except for GP-R1, all solutions also suffer from a similar type of distortions in u and B y in the right going fast rarefaction.This distortion as well as the oscillations due to the slowly moving shock seem to be suppressed in the 3rd order GP-R1.

Ryu and Jones MHD Shock Tubes
Ryu and Jones [52] introduced a large set of MHD shock tube problems as a test of their 1D algorithm, that are now informative to run as a code verification.In what follows we will refer to the tests as RJ followed by the corresponding figure from [52] in which the test can be found.

RJ1b Shock Tube
The first of the RJ shock tubes we consider is the RJ1b problem.This test contains a left going fast and slow shock, contact discontinuity as well as a slow and fast rarefaction.In Fig. 9 that all waves are resolved in the schemes considered.

RJ2a Shock Tube
The RJ2a test provides an interesting test due to its initial conditions producing a discontinuity in each of the MHD wave families.The solution contains both fastand slow-left and right-moving magnetoacoustic shocks, left-and right-moving rotational discontinuities and a contact discontinuity.Fig. 10 shows that all three of the GP schemes are able to resolve all of these discontinuities.Again we see that the smallest stencil GP-R1 solution contains some oscillations near the shock that are not present in the other GP solutions.

RJ2b Shock Tube
The RJ2b shock tube creates a set of a fast shock, a rotational discontinuity and a slow shock moving to the left away from a contact discontinuity, as well as a fast rarefaction, rotational discontinuity and slow rarefaction all moving to the right.What is of interest is that since the waves propagate at almost the same speed, at t = 0.035 they have still yet to separate much, and so test a codes ability to resolve all of the discontinuities despite their close separation.The results of this test for the methods considered are shown in Fig. 11.At the shown resolution of N = 128, the contact discontinuity and the slow shock become somewhat smeared together in the GP-R1 solution, while they are better resolved for the other methods, even though there are only couple of grid points distributed over the range of the features.

RJ4a Shock Tube
The RJ4a shock tube yields a fast and slow rarefaction, a contact discontinuity, a slow shock, and of particular interest the switch-on fast shock.The feature of the switch-on shock is that the magnetic field turns on in the region behind the shock.As can be seen in Fig. 12, all of the features including the switch-on fast shock are resolved in all methods.We see that GP-R1 smears out the solution not only in resolving discontinuous flow regions, but also in resolving both fast and slow rarefaction waves.

RJ4b Shock Tube
The RJ4b test is designed to produce only a contact discontinuity and a fast right going switch-off fast rarefaction, where the magnetic field is zero behind the rarefaction.We can see in Fig. 13 that both the contact discontinuity and the switch-off rarefaction features are captured in all of the considered methods.

RJ5b Shock Tube
The RJ5b problem is of interest because it produces a fast compound wave, as opposed to the slow compound wave in the Brio-Wu problem, in addition to a left-and right-going slow shock, contact discontinuity and fast rarefaction.At the resolution of N = 128 used for the results in Fig. 14, the compound wave and one of the slow shocks are smeared together in all the methods tested.

Conclusion
We summarize key novel features of the new high-order GP approach presented in this paper.
The new GP approach utilizes the key idea from statistical theory of GP prediction to produce accurate interpolations of fluid variables in CFD applications.We have developed a new set of numerical strategies of GP for both smooth flows and non-smooth flows to numerically solve hyperbolic systems of conservation laws.The GP methods presented here show an extremely fast rate of solution accuracy in smooth advection problems by controlling a single parameter, R. Further, the additional flexibility offered by the GP model approach over the fully constrained polynomial based model through the kernel hyperparameter allows for added tuning of solution accuracy that is not present in traditional polynomial based high-order methods.These parameters allow the GP method to demonstrate variable orders of method accuracy as functions of the size of the GP stencil and the hyperparameter (see Eq. ( 29)) within a single algorithmic implementation.
The new GP based smoothness indicators introduced here are used to construct non-linear weights that give the essentially non-oscillatory property in discontinuous flows.The new smoothness indicators show a significant advantage over traditional WENO schemes in capturing flow features at the grid resolution near discontinuities.
The GP model, by design, can easily be extended to multidimensional GP stencils.Therefore, GP can seamlessly provide a significant algorithmic advantage in solving the multidimensional PDE of CFD.This "dimensional agnosticism" is unique to GP, and not a feature of polynomial methods.We will report our ongoing developments of GP in multiple spatial dimensions in forthcoming papers.

Acknowledgements
The software used in this work was developed in part by funding from the U.S. DOE NNSA-ASC and OS-OASCR to the Flash Center for Computational Science at the University of Chicago.
A Appendix: Choosing the Optimal Mean f 0 using the Maximum Likelihood Function In most practical applications we will not have any prior information on the mean of the function samples.This makes it reasonable to take a zero mean function f 0 = 0 if we try to achieve the simplest and most general symmetry of any random samples.However, one can easily determine the best optimal choice of the mean value by maximizing the likelihood of the samples.This can be done by taking the logarithm function of Eq. ( 5) to first get the log likelihood function of the function samples f = [f (x 1 ), . . ., f (x N )] T that are pointwise, To maximize the likelihood, we take the partial derivative of Eq. ( 62) with respect to f 0 , where f = f 0 1 N , and set the partial derivative to be zero, where we used the symmetry of K −1 to get f T K −1 1 N = 1 T N K −1 f .The best optimal value of f 0 is then given by which maximizes the likelihood of the samples in Eq. ( 5) for the case of interpolations.
In the similar way, we obtain the best optimal value of f 0 for the case of reconstructions, given as where G = [ q 1 , . . ., q N ] T contains N volume-averaged values and C is the integrated kernel in Eq. ( 23).These optimal choices of f 0 are obviously more expensive in computation as they involve local calculations on each GP stencil.In all the tests we presented in this paper we do not see any significant evidence that these optimal values result in any gains in terms of solution accuracy and computational efficiency.

B Appendix: Operation Counts
We display possible outcomes of computing the GP-based new smoothness indicators in two different approaches described in Section 2.5: -Option 1: Compute α m i using Eqs.( 51) and ( 53), -Option 2: Compute α m i using Eqs.( 55) and ( 56).Consider a 1D domain discretized with Nx grid cells.Let's say we use a GP stencil of a radius R, which is subdivided into R + 1 sub-stencils, Sm, m = 1, . . ., R + 1.Since the calculations described in Section 2.5 to obtain smoothness indicators need to take place on every cell, there are Nx operations involved in total, each of which has M many operation counts at each S R level.Below, we estimate the total number of required operation counts for Option 1 and Option 2.
We first consider Option 1.With a computer code with successive multiplications and additions for a dot product, Eq. ( 51) involves R + 1 multiplications and R + 1 additions for each m = 1, . . ., R + 1.This gives 2(R + 1) operations for Eq. ( 51) for each m and l.Likewise, we also have 2(R + 1) operations for Eq.(53) for each m and i.As a result, we see there are 2m(R + 1)(l + i) operations involved on each S R .Since there are Nx many S R calculations overall, the total number of operations becomes 2m(R + 1)(l + i)Nx per each time step, where m, i, l = 1, . . .R + 1.Assuming there are M time steps required for the run, we require 2m(R + 1)(l + i)NxM operations in total.
Let us now consider Option 2. A major difference in this case is to realize the fact that Eq. ( 55) can be pre-computed as soon as the grid is configured.This is because, unlike f m (x l ) in Eq. ( 51), the vector P m i has no dependency on any local data but only on the grid itself, whereby it can be pre-computed and saved for each m and i before evolving each simulation, and reused during the simulation evolutions.The typical operation count for a (R + 1)-dimensional vector and a (R + 1) × (R + 1)-dimensional matrix multiplication with successive multiplications and additions is found to be 2(R + 1) 2 .Hence we see there are 2im(R + 1) 2 total operations that can be pre-computed and saved, m, i = 1, . . ., R + 1.Here is an extra operation reduction available by realizing that Eq. ( 55) is same for all m, thereby we have 2i(R + 1) 2 total operations for i = 1, . . ., R + 1. Eq. ( 56) however, needs to be computed locally during the run because it depends on the local data Gm that evolves in time.For each m and i, the dot product in Eq. (56) involves 2(R + 1) operations, totaling 2im(R + 1)Nx per each time step, m, i = 1, . . .R + 1.As a result, we see there are 2i(R + 1) 2 operations initially, plus 2im(R + 1)Nx during the run per each time step.The simulation with M time steps then involves 2i(R + 1) 2 + 2im(R + 1)NxM = 2im(R + 1)[R + 1 + NxM ] in total.
In comparison, we see there is a factor of two performance gain in Option 2 because the ratio of the two options becomes Operation No. Option 1 Operation No. Option 2 = 2m(R + 1)(l + i)NxM 2i(R + 1)[R + 1 + mNxM ] (66) where we used i, l, m = R + 1.

Fig. 1
Fig. 1 An example of a 2D GP stencil with a radius R. It consists of a collection grid cells I i k (blue cells) that are included in a multidimensional blocky sphere of a radius R from the local cell I i 0 , in which the GP interpolation/reconstruction takes place.

Fig. 2 A
Fig.2A schematic example of the 5-point GP stencil with R = 2 centered at x i .Two grid locations indicated as x * are the target locations where we seek high-order GP approximations of pointwise value (e.g., ρ * ) using the given cell-centered volume average data nearby (e.g., ρs , s = i − 2, . . ., i + 2).

Fig. 4 Fig. 5
Fig. 4 The Shu-Osher problem.Density profiles at t = 1.8 computed using three different GP stencil radii (R = 1, 2, 3) on a 200 grid resolution with C cfl = 0.8.All GP calculations use /∆ = 6 and the HLLC Riemann solver.The reference solution is obtained using WENO-JS on a 2056 grid resolution.4(b) shows a closeup view of the post shock features.

Fig. 6
Fig. 6 The Sod shock tube problem at t = 0.2.(a) GP with stencil radius R = 1, (b) GP with stencil radius R = 2, (c) GP with stencil radius R = 3, and (d) WENO-JS, all using 128 grid points with C cfl = 0.8.All GP calculations use /∆ = 12 and the HLLC Riemann solver.Solid lines show a reference solution computed using WENO-JS on 1024 grid points.

Fig. 8
Fig. 8 The Brio-Wu shock tube at t = 0.1.(a) GP with stencil radius R = 1, (b) GP with stencil radius R = 2, (c) GP with stencil radius R = 3, and (d) WENO-JS, all using 128 grid points with C cfl = 0.8.All GP calculations use /∆ = 12 and the Roe Riemann solver.Solid lines show a reference solution computed using WENO-JS on 1024 grid points.