A discontinuous Galerkin method for approximating the stationary distribution of stochastic fluid-fluid processes

Introduced by Bean and O'Reilly (2014), a stochastic fluid-fluid process is a Markov processes $\{X_t, Y_t, \varphi_t\}_{t \geq 0}$, where the first fluid $X_t$ is driven by the Markov chain $\varphi_t$, and the second fluid $Y_t$ is driven by $\varphi_t$ as well as by $X_t$. That paper derived a closed-form expression for the joint stationary distribution, given in terms of operators acting on measures, which does not lend itself easily to numerical computations. Here, we construct a discontinuous Galerkin method for approximating this stationary distribution, and illustrate the methodology using an on-off bandwidth sharing system, which is a special case of a stochastic fluid-fluid process.


Introduction
A stochastic fluid process {X t , ϕ t } t≥0 is a two-dimensional Markov process, where the phase ϕ t is a continuous-time Markov chain on a finite state space S, and the fluid X t varies linearly at rate c ϕt . A subset of Markov additive processes, stochastic fluids have been well-analysed in the past two decades. There have been two recent generalisations of stochastic fluid processes to a higher dimension: Miyazawa and Zwart [1] analysed discrete-time multidimensional Markov additive processes, and Bean and O'Reilly [2] studied the so-called stochastic fluid-fluid process, the latter is our focus in this paper.
A stochastic fluid-fluid is a Markov process {X t , Y t , ϕ t } t≥0 , where the phase ϕ t is still a Markov chain on a finite state space S; X t ∈ (−∞, ∞) is the first fluid, which varies linearly at rate c ϕt X t := X 0 + t 0 c ϕs ds; and Y t is the second fluid, which varies linearly at rate r ϕt (X t ): Y t := Y 0 + t 0 r ϕs (X s ) ds.
As the classic fluid process {X t , ϕ t } t≥0 is used extensively in many areas, such as insurance and environmental modelling, it is clear that stochastic fluid-fluid models have even a wider range of applicability.
An example of application for a stochastic fluid fluid is the modelling of growth and bleaching of coral reefs, as described in [2]. In this process, we can model the density of symbiotic zooxanthellae at time t by X t , with the positive rates c i corresponding to the growth of the zooxanthellae, the negative rates to the bleaching. If the density X t is below a certain threshold x, the coral cannot store lipids until the density increases again past this level. During the time X t ∈ (0, x), the coral relies on stored lipids, modelled as Y t , and dies when the latter runs out, that is, While the analyses in [1,2] are markedly different, both papers drew inspiration from Neuts' matrix-analytic approach [3,4] to obtain the limiting behaviour of these processes, working with operators on function spaces instead of matrices. Thus, their closed-form expressions for the limiting distributions ( [1,Theorem 4.1], [2,Theorem 2]) are given in terms of operators acting on measures, which are not immediately amenable to numerical computations for real-life applications. One way to numerically handle operators on function spaces is to construct approximations of the operators. To this end, there exist numerical procedures such as finite difference, finite volume, finite element, and discontinuous Galerkin (DG) methods [5]. The operators arising from fluid-fluid processes are assumed to be acting on a function space of smooth probability densities. The choice of an approximation method should reflect these properties in its solutions. In the DG method, the conservation of probability mass and local smoothness can be captured in the approximations [5].
In this paper, we construct a discontinuous Galerkin method to approximate the joint stationary distribution of a stochastic fluid-fluid process. We numerically illustrate the effectiveness of the methodology using an on-off bandwidth-sharing system of two processors [6]. In this example, inputs into the processors, X t and Y t , are turned on and off by a Markov chain, ϕ t ; the combined output capacity is fixed and allocated according to the workload of the first, high-priority, processor X t . Latouche et al. [6] evaluated the marginal limiting distribution of the first processor X t , and provided bounds for the marginal limiting distribution of the workload of the second processor Y t . We verify our DG approximations by comparing them against Monte Carlo simulations of the system, against analytical results obtained in [6], and against our intuitive understanding of the system dynamics. In all considered cases, we find the approximations to be accurate.
The paper is organised as follows. In Section 2, we give relevant background to present the joint stationary distribution of a stochastic fluid-fluid process. We construct in Section 3 a discontinuous Galerkin method to approximate the stationary distribution, and include numerical experiments in Section 4.

Preliminaries
Consider a stochastic fluid-fluid process {X t , Y t , ϕ t } t≥0 . We assume that X t , Y t ∈ [0, ∞) and that there is a regulated boundary at level 0 for both buffers: for i ∈ S. Let T be the irreducible generator for the finite Markov chain ϕ t . We denote by C := diag(c i ) i∈S the diagonal fluid-rate matrix for X t , and R(x) := diag(r i (x)) i∈S the diagonal fluid-rate matrix for Y t . For the remainder of this section, we summarize the findings of [2] on the joint stationary distribution of For each Markovian state i ∈ S, we partition F according to the rates of change r i (·) for the second fluid Y t : For all i ∈ S, the functions r i (·) are assumed to be sufficiently well-behaved that F m i , m ∈ {+, −, 0}, is a finite union of intervals and isolated points. Moreover, define We assume that the process {X t , Y t , ϕ t } is positive recurrent, in order to guarantee the existence of the joint stationary density operator π(y) = (π i (y)) i∈S and the joint stationary mass operator p = (p i ) i∈S , where for A ⊂ F The determination of π(y) involves two important matrices of operators, B and Ψ. Intuitively, for a set A ∈ F and a measure vector µ = (µ i ) i∈S , µe Bt (A) gives the conditional probability of X t ∈ A, and µΨ(A) the conditional probability of Y t returning to level zero and doing so when X t ∈ A, given that the initial distribution is µ.

Matrix B of Operators
Let M(S × R + ) be the set of integrable complex-valued Borel measures on the Borel σ-algebra B S×R + . For µ ∈ M(S × R + ), we can write where µ i ∈ M(F i ), the set of integrable complex-valued Borel measures on B F i , and We denote by V(t) the matrix of operators V m ij (t) : M(F i ) → M(F m j ), i ∈ S , j ∈ S m and , m ∈ {+, −, 0}, which are defined for A ⊂ F m j as follows: the probability of the process {X t , ϕ t } being in the destination set (A, j) at time t, given that it starts in (F i , i) according to the measure µ i . We can write V(t) in terms of its infinitesimal generator B as For V(t) and B, as well as other operators of the same dimensions to be introduced later in the paper, the operators are partitioned according to {+, −, 0}; for example, where each block B m is the |S | × |S m | matrix of operators B m ij . We note that V(t) forms a strongly continuous semigroup on the set of measures that are absolutely continuous with respect to Lebesgue measure and have analytic densities as well as possibly a point mass at zero in certain phases [2]. Thus, we restrict the domain of V(t) to the set of such measures, and write where ν i is the associated density and p i is the probability mass at the boundary 0 when ϕ t = i, for ∈ {+, −, 0} and i ∈ S.
For simplicity, from here on we assume a set A is an interval, which might or might not include its end points, that is, , , m ∈ {+, −, 0} and i, j ∈ S, are given as follows. We include also brief probabilistic interpretations of the terms (see remarks in [2] for more details). Note that in all of the following cases, we assume A ⊂ F m j ; this is without loss of generality, as for any set G and a measure µ in our chosen domain, Case 1 represents when there is a stochastic jump from state i to state j = i, which happens with rate T ij . The integral represents the probability mass of the intersection of the initiating domain F i and the destination set A. If c j ≤ 0, then the point mass p i is preserved after the change of phase. If c j > 0, then the point mass p i disperses into the density in state j, and is captured only if A has an upper bound strictly greater than 0. Note that in this case 0 does not have to be in A, it only has to be in the closure of A.
Case 2. When i = j and = m, where ∂ L\R [G] denotes the left boundary point that mustn't also be the right boundary point of the closure of the set G, and similarly for ∂ R\L [G]. Case 2 represents a drift from F i to F m i . When neither F i nor A is an isolated point, there is a transfer of density from one set to another through the relevant endpoints. When A = {0}, c i < 0, and 0 is the left endpoint of the closure of F i (which can't be an isolated point itself), then there is also an accumulation of point mass.
Case 3. When i = j and = m, where ∂ R [G] denotes the right boundary point of the closure of G, and similarly for ∂ L [G]. Case 3 represents stochastic jumps out of state i and drift across F i .

Matrix Ψ of Operators
We denote by Ψ(s) the |S + | × |S − | matrix of operators recording the Laplace-Stieltjes transforms of the time for Y t to return, for the first time, to the initial level of zero. Define the stopping time θ(y) := inf{t > 0 : Y t = y} to be the first time Y t hits level y, then each component Ψ ij (s) : M(F + i ) → M(F − j ), i ∈ S + and j ∈ S − , is given by Let b(t) := t 0 |r ϕz (X z )| dz be the total unregulated amount of fluid that has flowed into or out of the second buffer Y t during [0, t], and let ω(y) := inf{t > 0 : b(t) = y} be the first time this accumulated in-out amount hits level y. We denote by U(y, s) the matrix of operators recording the Laplace-Stieltjes transforms of ω(y): where U m is the |S |×|S m | matrix of operators U m ij , for y > 0, s ∈ C, and Re(s) ≥ 0. Each operator U m ij (y, s) : M(F i ) → M(F m j ), for , m ∈ {+, −} and i, j ∈ S, is given by We can write where D(s) is the infinitesimal generator of the strongly continuous semigroup U(·, y). Lemma 4 of [2] gives the following expression for D(s).
where R := diag(R i ) i∈S is a diagonal matrix of operators R i given by By [2, Theorem 1], Ψ(s) has the following characterisation. Furthermore, if s is real then Ψ(s) is the minimal nonnegative solution.

Discontinuous Galerkin Approximations
Discontinuous Galerkin methods are used to approximate the solution to a system of partial differential equations. A brief description of these methods is as follows [5]. On the domain of the approximation, consider a finite sequence of socalled nodal points. We refer to each interval between two consecutive nodal points as a mesh, and the combination of meshes and nodal points as a stencil. Within each mesh, we have a finite element approximation, which constructs a finite-dimensional smooth Sobolev space by choosing appropriate piecewise polynomial basis functions, and then projects the partial differential equations onto this space. This projection leads to a new system of equations, referred to as the weak form of the original PDEs.
There is a flux operator moving probability from one mesh to another, in a manner similar to the underlying principle of a finite volume approximation: integrating the PDEs over each mesh and then constructing a new system of ordinary differential equations, which describe the change in the integral over the mesh. This method conserves probability, and can handle discontinuities, such as jumps and point masses.
Discontinuous Galerkin methods lead to global approximations in the space of piecewise functions. Intuitively, we sacrifice the continuity between meshes to gain the conservation of probability.

Application to a Stochastic Fluid-Fluid Model
Here, we construct a discontinuous Galerkin method to approximate the operator matrix B and subsequently the operator matrix Ψ, the two key ingredients of the joint stationary distribution for {X t , Y t , ϕ t }. We begin with approximating the joint which satisfies the system of partial differential equations subject to suitable boundary conditions [2]. While X t ∈ [0, ∞), any numerical approximation by necessity has to take place on a finite interval. Clearly, the state space truncation results in a point mass at the upper bound, which we have to address properly. It is important to choose an interval sufficiently large in order to control the error induced by the artificial upper bound for X t . We shall further comment on this in Section 4, where we report our numerical experiments.
Let [0, I] be the domain of the approximation, where I < ∞. We denote by {x 1 , x 2 , . . . , x K } a finite sequence of K nodal points on [0, I], with x 1 := 0 and x K := I, and by {D 1 , . . . , D K−1 } the sequence of corresponding meshes, As there are two point masses, one at zero where there is a regulated boundary for X t and another at I where there is an artificial upper bound, the two end meshes, D 1 and D K−1 , each of which contains both a point mass and density.
For k = 1, . . . , K − 1, we choose N k functions φ k n : D k → [0, ∞), n = 1, . . . , N k , to be the basis functions, the span of which forms our approximation space, Then, a function u i (·, ·) ∈ V K has the form: for some coefficient functions α k i,n (t). To construct an approximation for f i (x, t) we need to determine these functions α k i,n (t) or, equivalently, the N -dimensional row vector and N := N k is the total number of basis functions on [0, I].
To that end, there are three important matrices we need to introduce. The first two matrices, M and G, are N × N block-diagonal and detail the dynamics within each mesh: where, for m, n ∈ {1, . . . , N k }, The third matrix, F i , i ∈ S, is related to the dynamics between adjacent meshes: it is the flux operator moving probability from one mesh to another. Let u k i (x, t) be the projection of u i (x, t) onto the mesh D k . A central idea of the discontinuous Galerkin method is that the values of u k i (x, t) on different meshes are linked to each other only through a numerical flux f * and that one can consider the approximation where x R k and x L k denote the right and left endpoints of the kth mesh, respectively, and [g(x)] x b xa := g(x b ) − g(x a ) for any function g. There are many options for f * . Here, we choose a first-order up-winding scheme [7], that is, is an adjustment parameter for when we have a stencil with meshes of different structures; note that the numerical flux f * i (x, t) is defined at nodal points only. Suppose we are considering the kth mesh, D k . Then, we define the function η(sgn(c i ), x) to be Here, the function η ,k , ∈ {k − 1, k + 1}, is the ratio of the integrals of the basis functions transferring the probability from the th mesh to the kth mesh, where for one and then all m ∈ {1, . . . , N k } and n ∈ {1, . . . , N }. In other words, η ,k is not dependent on the particular choice of basis functions, m and n, but on their meshes, k and , only.
This choice of f * requires the flux information only from the left if the fluid rate c ϕt of X t is positive, and only from the right if that rate is negative. Intuitively, when c ϕt > 0, the numerical flux of probability going from D k−1 into D k is the probability density accumulated on the right-hand edge of the approximation on D k−1 . Similarly, when c ϕt < 0, the numerical flux going from D k into D k−1 is the probability density accumulated on the left-hand edge of the approximation on D k .
Thus, on nodal points we have where for any function g and is used to allow us access to the density on the left-hand edge of D k+1 and the right-hand edge of D k−1 , respectively.
We are now ready to introduce the block-tridiagonal matrix F i , ) be the vector containing all basis functions on the th mesh. We define for c i > 0 Proof We begin with the RHS of (22). For c i > 0, As each basis function φ j n , for n = 1, . . . , N j , is trivially zero outside its jth mesh, we have Substituting (24) and (25) into (23) leads to The argument for c i < 0 follows analogously. 2 Theorem 3.2. The weak formulation of the PDEs (14) is the following system of ordinary differential equations Proof Consider Equation (14), for i ∈ S, t ≥ 0, x ∈ [0, I], which we restate below: For details on the steps of discontinuous Galerkin methods, see [5]. Here, we start by replacing the density function f i (x, t) in (27) by its approximation u i (x, t) to obtain Multiplying both sides of (28) by a basis function φ k m , m ∈ {1, . . . , N k } and k ∈ {1, . . . , K = 1}, and then integrating over the approximation domain [0, I] gives Expanding and then integrating the third term by parts leads to where the first part can be approximated by using the numerical flux f * i (x, t), that is, Substituting (30) into (29) and expanding u k i (x, t) gives Finally, we switch the order of integrals and summations to arrive at Equivalently, this can be written in matrix form as which completes the proof.
Intuitively, the second and third conditions are to ensure stability in the solution and conservation of the transfer of probability across meshes, respectively.
Defining the discontinuous Galerkin infinitesimal operator Next, we approximate the operators R i by an N × N block-diagonal matrix R i , with diagonal sub-blocks [R i ] kk given by where ρ i,(k,n) = 0 is an approximation of the fluid rate r i (X t ) of Y t , given ϕ t = i, X t ∈ D k , and the basis function φ k n . These functions ρ i,(k,n) can sensibly be chosen to be: Define The matrix equation (33) can be solved using one of the many algorithms suggested by Bean et al. [8]. We then use this approximation ψ to evaluate the limiting density, according to Theorem 2.3.

An Example
Consider a process {X t , Y t , ϕ t } where there is only one phase i = 1 in S, in other words, there is no Markov modulation. Consequently, the PDE (14) has the drift component only: Suppose we would like to develop a DG approximation for the density function We choose the following basis functions: Assume that c = 1. Then, the non-zero upper-diagonal blocks are given by and the non-zero diagonal blocks are with all other sub-blocks of F being identically zero. Thus, Consequently, the DG generator Q = c(G + F )M −1 is given by where note that all the row sums are zero. Now, to illustrate the approximation B of the operator matrix B, suppose the phase process ϕ t ∈ S = {1, 2} has a generator matrix T . Let the DG approximation for this fluid-fluid process {X t , Y t , ϕ t } have the same stencil as described in Figure 1.
When ϕ t = 1, we assume Y t has a positive fluid rate, r 1 (X t ) > 0, when X t ∈ D 1 ∪ D 2 , and a negative rate, r 1 (X t ) < 0, when X t ∈ D 3 ∪ D 4 . Thus, F + 1 = D 1 ∪ D 2 and F − 1 = D 3 ∪ D 4 . When ϕ t = 2, we assume the opposite: If we are to pre-multiply B +− 11 with a row vector, we can see the matrix picks up the value of the right-most basis function with support in F + 1 , and then distributes this to the first two basis functions with support in F − 1 . This concurs with our physical interpretation (see Section 2.1) that there is a drift from F + 1 to F − 1 . Case 3. When i = j and = m: Suppose i = 1 and = −. Then,

Numerical Experiments
To illustrate the validity of our discontinuous Galerkin approximation, we perform numerical experiments on a stochastic fluid-fluid model, in a three-pronged approach.
First, we run Monte Carlo simulations, in order to compare the simulated joint density of {X t , ϕ t } evaluated at the time Y t first returns to the initial level 0 against that which is obtained via the return-probability matrix ψ. This numerically verifies the accuracy of our proposed approximation for the operator matrix Ψ. Second, using ψ we evaluate the limiting joint density of {X t , ϕ t }, which we compare against the same density analytically derived in [6]. Third, we vary the parameters of the second fluid Y t to confirm that the approximating joint density for {X t , ϕ t } does not change, while the marginal limiting distribution for Y t does, both of which are consistent with our intuitive understanding of the chosen example. In all three procedures, we find the approximations to be accurate.
We also analyse different choices for the level of spatial discretisation and the degree of polynomial basis functions, with respect to the order of convergence in relevant error terms.

An on-off bandwith-sharing model
The example we choose for our experiments is as follows. Consider a stochastic fluid-fluid {X t , Y t , ϕ t } t≥0 , where X t and Y t represent the workloads in Buffer 1 and Buffer 2 at time t ≥ 0, both driven by the phase ϕ t , which is a Markov chain on the state space S = {11, 10, 01, 00}. Here, the state 11 indicates inputs to both buffers being on, the state 00 indicates both being off, the state 10 is when only the first input is on, and the state 01 is when only the second is on. The input of Buffer k is switched from on to off with rate α k , and from off to on with rate β k , for k = 1, 2. Thus, the infinitesimal generator T for ϕ t is given by We denote by λ k the input rate of Buffer k during an on period, and by ζ k the output rate, and assume that λ k > ζ k , for k = 1, 2. This example imitates the model considered in [6], where the two buffers share a fixed total output capacity κ > 0.
In particular, the output rate ζ 1 is constantly θ 1 , except for when the buffer is empty, that is, when X t = 0. On the other hand, ζ 2 varies depending on the values of the buffers. More specifically, as Buffer 1 is considered high-priority, this buffer is allocated the entire output capacity κ whenever exceeding a pre-determined threshold x * , leaving ζ 1 = κ and ζ 2 = 0. However, when Buffer 1 is empty, Buffer 2 is given the whole output capacity κ, meaning ζ 2 = κ. For X t ∈ (0, x * ), ζ 1 = θ 1 and ζ 2 = θ 2 , for θ 1 , θ 2 ≥ 0 such that θ 1 + θ 2 = κ. Clearly, if Buffer k is empty, its output rate ζ k would be zero. Table 1 summarizes the output rates for all different scenarios. Table 1: Output rates ζ1 and ζ2 for the buffers in different scenarios, as specified in the model in [6]. Note that while Buffer 1 is independent of Buffer 2, its output rate ζ1 depends on Xt.

Buffer 1 Buffer 2 and Phase
Given its dynamics as currently defined based on [6], Buffer 1 is an example of what is known in the literature as a level-dependent fluid, because its net rates depend on the value of X t relative to the threshold x * . As the existing theoretical analysis for stochastic fluid-fluid processes (developed in [2] and briefly summarized in Section 2) does not allow for level dependency in the first buffer, here we modify the bandwith-sharing model in [6] slightly. We let the output rate ζ 1 of Buffer 1 remain θ 1 for X t > 0, effectively eliminating the threshold effect on the first buffer but keeping the effect on the second. Table 2 represents the modified rates. Table 2: Output rates ζ1 and ζ2 for the buffers in different scenarios, as modified slightly from the model in [6]. Note that the output rate ζ1 no longer depends on the value of Xt, except for a boundary condition at 0.

Buffer 1 Buffer 2 and Phase
Consequently, the net rates of change for X t , c i , are given by (c 11 , c 10 , c 01 , c 00 ) = ( and the net rates of change for Y t , r i , are as follows (r 11 , r 10 , r 01 , r 00 ) For our numerical experiments, we use the parameter choices given in [6]: As mentioned previously, while the true problem has an unbounded domain [0, ∞), the discontinuous Garlekin method requires the domain of approximation to be a finite interval. Hence, for all approximations we consider a finite interval but large enough so that the boundary-induced dynamics do not significantly affect the results.
To specify the stencil for our numerical approximation, we define a vector ω K,h,∆ h of K nodal points as for h > 0 and for ∆ h > 0, both sufficiently small. In this stencil, there are K − 1 meshes, of which K − 5 are interior meshes of length h, the left and right boundary meshes are of length ∆ h , and the second-to-left and second-to-right ones are of length h − ∆ h . The boundary meshes always have piecewise-constant approximations, because this is sufficient to approximate the point masses accumulated at either boundary.

The return-probability matrix ψ via Monte Carlo simulations
We choose our initial distribution v i (x, y, 0) := P [X 0 ≤ x, Y 0 ≤ y, ϕ 0 = i] to be a point mass of 1 for (X 0 = 5, Y 0 = 0, ϕ 0 = 01), and zero everywhere else. The choice of an initial point being a point mass instead of a non-degenerate distribution is purely for the convenience of numerical simulation, because it eliminates the need of simulating multiple initial starting points. Using this initial condition and the parameters specified in (38, 39), we simulate 10 5 trajectories, terminating each path either when Buffer Y t returns to zero or at time V = 10000, Overall, 2.3% of the trajectories did not reach zero in buffer Y by time V , and so are rejected.
Note that as we assume positive recurrence for our system, the probability of Buffer 2 returning to its initial level zero is 1. However, the time this takes might be longer than the time window of our simulation, V = 10000, which means the trajectories that we have terminated at time V must eventually return to zero with probability 1.
Let τ := inf{t > 0 : Y t = 0} be the first time Y t returns to zero. For each retained simulated trajectory, we record the values of X τ and ϕ τ . For the states 10 (the first input being on, the second being off) and 00 (off-off), we present in Figure 2 the cumulative distributions of X τ , P[X τ ≤ x, Y τ = 0, ϕ τ = i], with i = 00, 10, as determined by the simulations as well as by a piecewise linear DG approximation constructed from the approximating matrix ψ of operator Ψ. In this piecewise linear DG approximation, we consider the approximation interval [0, I] = [0, 16], and the parameters of the stencil ω K,h,∆ h , defined in (40), take the following values: K = 43, h = 0.4 and ∆ h = 0.001. The boundary meshes each have a constant basis function of value 1; each kth interior mesh D k := [x k , x k+1 ] has two piecewise linear basis functions for x ∈ D k and k = 2, . . . , K − 2. As can be seen in Figure 2, these two distributions are closely matched. 4.3. The marginal density of {X t , ϕ t } Since Buffer 1, X t , is independent of Buffer 2, Y t , we can use results from the existing literature on stochastic fluid flows to obtain the marginal limiting density χ(x) = (χ i (x)) i∈S of X t : On the other hand, we can use the operator Ψ to compute, based on Theorem 2.3, the joint limiting density π i (A, y), where Thus, we can approximate the joint limiting density π i (y)(A), via a discontinuous Galerkin approximation ψ of Ψ, and consequently form an approximation of the marginal limiting density χ(x). In particular, we evaluate an approximation of π i (y)([0, x)), and then integrate the approximating function over the domain of y, and finally differentiate with respect to x; note that Let two vectors χ and χ denote respectively the piecewise constant and piecewise linear DG approximations of χ, obtained via the corresponding approximations ψ. We use ω 43,0.4,0.001 as our stencil for the DG approximation and the nodal points at which we evaluate the analytical density function χ. Define We present in Figure 3 the analytical density χ at given nodal points, a piecewise constant DG approximation, and a piecewise linear DG approximation.  The approximations reconstruct the general shape of the density reasonably well: for both piecewise constant and piecewise linear, we see most of the probability being concentrated in the point mass at X t = 0, and then decaying as x increases. We observe from the subplot inside the left plot of Figure 3 that the piecewise constant approximation χ off underestimates the point mass, and redistributes the difference over the rest of the state space of X t . Hence, there is more mass in the tails of the densities. On the other hand, the piecewise linear approximation χ appears to be very close to the analytical solution.

Sensitivity analysis of the dynamics of Y t
To further confirm that the discontinuous Galerkin approximation ψ of the operator Ψ accurately captures the dynamics of Y t , we vary the rates at which the input to this buffer switches on and off (denoted by α 2 and β 2 , respectively). As we modify these rates, we should see a change in the distribution of probability between the limiting marginal density of X t when Y t = 0, and that of X t when Y t > 0. On the other hand, the sum of these two densities should be identical in all the different meaningful scenarios of α 2 and β 2 ; that is, the sum χ 0 (x) + χ + (x) should remain fixed and be equal to the sum χ on (x) + χ off (x), for all x.
To that end, we keep our stencil ω 43,0.4,0.001 and basis functions fixed, and compute the marginal limiting density for different values of α 2 and β 2 . The results coincide with what we expect from the dynamics in Y t . Table 3: The functions χ 0 (x) and χ + are piecewise linear DG approximations of the limiting marginal densities χ 0 (x) and χ + (x) over the stencil ω43,0.4,0.001.
From Table 3, we observe that as α 2 (the rate at which the input for Y t switches off) increases, so does the probability of Y t being empty. Furthermore, even though there are different amounts of probabilities in the two marginal densities, χ 0 (x) and χ + (x), their sum remains the same as the sum of the marginal limiting densities χ on (x) and χ off (x) for all calculated values of x (data not shown here). These numerical results indicate that the dynamics of Y t are captured well by the DG approximations.

Errors of approximation
In a discontinuous Galerkin approximation, we choose the smoothness of the basis functions and the level of spatial discretisation. Once a particular selection is made, we project our operators into a finite-dimensional linear operator space corresponding to these choices (see Section 3). It has been shown that operators such as B (defined in Section 2.1) under a DG approximation have an error which converges at the order of O(h s ), where h is the discretisation and s is the degree of the basis [9].
However, this result cannot be easily translated across to the operator Ψ. The DG approximation of the Ψ operator is constructed by taking the DG approximation of the operators B and then solving the Riccati equation (33) using the approximate operators. Further, we then use this approximation of Ψ to derive an approximation for the limiting density π. With such a construction, it is not trivial to determine how the error propagates through the process of solving the Riccati equation, and then through further calculations to determine π. Determining bounds for the approximation errors of Ψ and π, as functions of the discretisation and basis selection, is a topic for future research.
As a preliminary step in this direction, we empirically investigate how the approximation error of the marginal limiting density of the fluid X t (see Section 4.3) changes with respect to the choices of basis functions and the levels of discretisation. We begin by introducing our normed vector space in which we compare the different levels of discretisation and smoothness. Recall, from (40), that the left boundary mesh of our approximations is of length ∆ h , and all but two interior meshes are of length h, with two compensating meshes of width h − ∆ h . Let [0, I] be the interval on which we approximate our solution, then both the approximations and the analytical solution belong to the space S × C −1 ([0, I]), where C −1 ([0, I]) is the set of functions with countably many discontinuities. We choose the right boundary mesh to be a piecewise-constant function. Then, for any function g : S ×[0, I] → R, where g(i, ·) ∈ C −1 ([0, I]) for i ∈ S, we define the star seminorm as follows: Essentially, the star seminorm is an extension of the L 1 norm which incorporates our interpretation: that the left and right boundary meshes are treated as point masses. That is, we only study the total mass in the intervals [0, ∆ h ] and [I − ∆ h , I] and not the distribution over them.
We conduct two numerical experiments. The first is to understand the effects of choosing piecewise-linear basis functions over piecewise-constant, the second experiment is to understand the effects of treating a point mass as a density on a short interval.
In the first experiment, we choose ∆ h = 10 −6 and I = 16. We then consider the error between the approximation and the reference solution in the star seminorm. The notations χ ω and χ ω denote, respectively, a piecewise-linear approximation with stencil ω and a piecewise-constant approximation with stencil ω.
In the left plot in Figure 4, we observe that the piecewise-constant approximation has an error that scales approximately O(h 0.88 ), with respective to the mesh size h, and the piecewise linear approximation has an error that scales approximately O(h 1.84 ). We observe that the coarsest piecewise-linear basis approximation has an error similar to that of the finest piecewise-constant approximation, which is two orders of magnitude finer.
In the second experiment, we fix h = 1.0, I = 16, and the basis functions to be piecewise-linear, and then we scale ∆ h and observe the trend in the star seminorm. The approximation error will plateau past the length where the error caused by h and I dominate over the error gains of reducing the width of the boundary mesh. Hence, we subtract from this approximation error the approximation error of a finer approximation, χ 19,1.0,0.005 . The right plot in Figure 4 shows that the error scales approximately O(∆ 1.7 h ). The numerical experiments above indicate that an increase in the degree of the basis functions would result in an increase in the order of convergence of the error in the star seminorm. However, this experiment is not taking into consideration that by increasing the degree of basis functions we need to increase the number of basis (elements) in the mesh. For example, the piecewise-constant has one basis element in each mesh, while the piecewise-linear has two. With respect to storage, the overall number of elements in the entire stencil is increasing linearly with the order of the elements.
In Table 4, we give some computational statistics of the approximations used in Figure 4 to aid in the understanding of the trade-offs between memory, computation time, mesh size, and error. We observe for a mesh size h = 0.5 that the piecewiseconstant approximation uses roughly half the number of elements and is twice as fast, compared to the piecewise linear approximation. However, the piecewise-linear approximation is two hundred times more accurate than the piecewise-constant approximation.
In our particular examples, if we were given a restriction to use no more than a prescribed amount of elements, then choosing a larger mesh size with piecewiselinear elements would be a better strategy than choosing a smaller mesh size and using piecewise-constant elements. The generalisation of these observations and the exploration of higher order basis functions is the focus for future research.  Table 4: Computational times and storage comparisons between piecewise-linear and -constant approximations. Overall storage is the total storage of all the operators from (7) to (12). The computations were performed on 2.5Ghz Intel Core i7 with 16GB of RAM running OSX 10.10.5.
The code was implemented in python, using scientific python libraries.

Conclusions
Finite Differences and Finite Volume methods have been used in the past to approximate operators that arise in stochastic processes. In principle, these methods approximate the operators by higher dimensional linear operators. In these methods, intuitive notions of mass conservation and positivity are captured; however, regularity is lost, making highly regular probability distributions computationally intensive.
We proposed the application of the discontinuous Galerkin method to approximate stochastic operators, with the intent that its ability to incorporate local regularity and maintain mass conservation, will lead to more accurate approximations and a reduction in computational effort. To demonstrate this, we applied the discontinuous Galerkin method to approximate all the operators needed to construct the joint stationary distribution of a stochastic fluid-fluid process.
The numerical results showed that the approximation of the stationary distribution arising from DG approximations of the operators is accurate and effective. We also verified that the operators and their dynamics were captured accurately. Furthermore, in our example, we observed that adding more regularity in the basis functions led to a significant decrease in computational effort. The DG method also enabled us to obtain other performance measures of stochastic fluid-fluid processes that are also analytically presented by operators.
Future work includes determining error bounds for the approximations of the operator Ψ as well as of the stationary distribution in general, and a more thorough investigation of the computational effort as higher-order basis functions are used.