Wasserstein and Kolmogorov Error Bounds for Variance-Gamma Approximation via Stein’s Method I

The variance-gamma (VG) distributions form a four-parameter family that includes as special and limiting cases the normal, gamma and Laplace distributions. Some of the numerous applications include financial modelling and approximation on Wiener space. Recently, Stein’s method has been extended to the VG distribution. However, technical difficulties have meant that bounds for distributional approximations have only been given for smooth test functions (typically requiring at least two derivatives for the test function). In this paper, which deals with symmetric variance-gamma (SVG) distributions, and a companion paper (Gaunt 2018), which deals with the whole family of VG distributions, we address this issue. In this paper, we obtain new bounds for the derivatives of the solution of the SVG Stein equation, which allow for approximations to be made in the Kolmogorov and Wasserstein metrics, and also introduce a distributional transformation that is natural in the context of SVG approximation. We apply this theory to obtain Wasserstein or Kolmogorov error bounds for SVG approximation in four settings: comparison of VG and SVG distributions, SVG approximation of functionals of isonormal Gaussian processes, SVG approximation of a statistic for binary sequence comparison, and Laplace approximation of a random sum of independent mean zero random variables.


Overview of Stein's method for variance-gamma approximation
The variance-gamma (VG) distribution with parameters r > 0, θ ∈ R, σ > 0, µ ∈ R has probability density function (1.1), where x ∈ R and the modified Bessel function of the second kind K_ν(x) is defined in Appendix A. If the random variable Z has density (1.1), we write Z ∼ VG(r, θ, σ, µ). The support of the VG distributions is R when σ > 0, but in the limit σ → 0 the support is the region (µ, ∞) if θ > 0, and is (−∞, µ) if θ < 0. Alternative parametrisations are given in [10] and [29] (in which the name generalized Laplace distribution is used). Distributional properties are given in [16] and Chapter 4 of the book [29].
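For reference, in the parametrisation used by [16] the density takes the following form (reconstructed here as a reading aid; normalisation conventions should be checked against (1.1)):

```latex
p(x) = \frac{1}{\sigma\sqrt{\pi}\,\Gamma(\frac{r}{2})}\,
       \mathrm{e}^{\theta(x-\mu)/\sigma^{2}}
       \left(\frac{|x-\mu|}{2\sqrt{\theta^{2}+\sigma^{2}}}\right)^{\frac{r-1}{2}}
       K_{\frac{r-1}{2}}\!\left(\frac{\sqrt{\theta^{2}+\sigma^{2}}}{\sigma^{2}}\,|x-\mu|\right),
       \qquad x \in \mathbb{R}.
```

Setting θ = 0 and r = 2 recovers the Laplace density, consistent with the special cases listed below.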
The VG distribution was introduced to the financial literature by [32]. Due to their semi-heavy tails, VG distributions are useful for modelling financial data [33]; see the book [29] and references therein for an overview of the many applications. The class of VG distributions contains many classical distributions as special or limiting cases, such as the normal, gamma, Laplace, product of zero mean normals and difference of gammas (see Proposition 1.2 of [16] for a list of further cases). Consequently, the VG distribution appears in many other settings beyond financial mathematics [29]; for example, in alignment-free sequence comparison [31,45]. In particular, starting with the works [15,16], Stein's method [50] has been developed for VG approximation. The theory of [15,16] and the Malliavin-Stein method (see [36]) was applied by [12] to obtain "six moment" theorems for the VG approximation of double Wiener-Itô integrals. Further VG approximations are given in [1] and [2], in which the limiting distribution is the difference of two centered gamma random variables.
Introduced in 1972, Stein's method [50] is a powerful tool for deriving distributional approximations with respect to a probability metric. The theory for normal and Poisson approximation is particularly well established, with numerous applications in probability and beyond; see the books [6] and [3]. There is active research into the development of Stein's method for other distributional limits (see [30] for an overview), and Stein's method for exponential and geometric approximation, for example, is now also well developed; see the survey [48]. In particular, [39] developed a framework to obtain error bounds in the Kolmogorov and Wasserstein metrics for exponential approximation, and [40] developed a framework for total variation error bounds for geometric approximation.
This paper and its companion [23] focus on the development of Stein's method for VG approximation. At the heart of the method [16] is the Stein equation (1.2), where h̃(x) = h(x) − Eh(Z) for h : R → R and Z ∼ VG(r, θ, σ, µ). Together with the Stein equations of [41] and [43], this was one of the first second order Stein equations to appear in the literature. We now set µ = 0; the general case follows from the translation property that if Z ∼ VG(r, θ, σ, µ) then Z − µ ∼ VG(r, θ, σ, 0). The solution to (1.2) is then given by (1.3), where ν = (r − 1)/2 and the modified Bessel function of the first kind I_ν(x) is defined in Appendix A. If h is bounded, then f_h(x) and f_h′(x) are bounded for all x ∈ R. Moreover, f_h is the unique bounded solution when r ≥ 1.
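For orientation, with µ = 0 the VG Stein equation can be written as follows (our reconstruction, chosen to be consistent with the SVG operator used later in the paper; it should be checked against (1.2)):

```latex
\sigma^{2} x f''(x) + (\sigma^{2} r + 2\theta x) f'(x) + (r\theta - x) f(x)
   = h(x) - \mathbb{E}h(Z), \qquad Z \sim \mathrm{VG}(r,\theta,\sigma,0).
```

Setting θ = 0 gives the SVG Stein equation σ²xf″(x) + σ²rf′(x) − xf(x) = h(x) − Eh(Z), whose leading coefficient σ²x vanishes at the origin; this singularity is the source of many of the technical difficulties discussed below.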
To approximate a random variable of interest W by a VG random variable Z, one may evaluate both sides of (1.2) at W, take expectations and finally take the supremum of both sides over a class of functions H to obtain (1.4). Many important probability metrics are of the form sup_{h∈H} |Eh(W) − Eh(Z)|. In particular, taking H to be the class of indicator functions h_z(x) = 1(x ≤ z), z ∈ R, the class of Lipschitz functions h with ‖h′‖ ≤ 1, and the class of functions h with ‖h‖ ≤ 1 and ‖h′‖ ≤ 1 gives the Kolmogorov, Wasserstein and bounded Wasserstein distances, which we denote by d_K, d_W and d_BW, respectively. The problem of bounding sup_{h∈H} |Eh(W) − Eh(Z)| is thus reduced to bounding the solution (1.3) and some of its lower order derivatives, and bounding the expectation on the right-hand side of (1.4). To date, the only techniques for bounding this expectation for VG approximation are local couplings [15,16] and the integration by parts technique used to prove Theorem 4.1 of [12]. Other coupling techniques that are commonly found in the Stein's method literature, such as exchangeable pairs [51] and Stein couplings [7], have yet to be used in VG approximation, although one of the contributions of this paper is a new coupling technique for SVG approximation by Stein's method.
The presence of modified Bessel functions in the solution (1.3), together with the singularity at the origin in the Stein equation (1.2), makes bounding the solution and its derivatives technically challenging. Indeed, in spite of the introduction of new inequalities for modified Bessel functions and their integrals [17,18] and extensive calculations ([15], Section 3.3 and Appendix D), the first bounds given in the literature [16] were only given for the case θ = 0 and had a far from optimal dependence on the parameter r. Substantial progress was made by [9], in which their iterative approach reduced the problem of bounding the derivatives of any order to bounding just the solution and its first derivative. However, the bounds obtained in [9] have a dependence on the test function h which means that error bounds for VG approximation can only be given for smooth test functions.

Summary of results and outline of the paper
In this paper and its companion [23], we obtain new bounds for the solution of the VG Stein equation that allow for Wasserstein and Kolmogorov error bounds for VG approximation via Stein's method. This paper focuses on the case θ = 0 (symmetric variance-gamma (SVG) distributions), while [23] deals with the whole family of VG distributions. This organisation is due to the additional complexity of the θ ≠ 0 case. One of the difficulties is that when θ ≠ 0, the inequalities for expressions involving integrals of modified Bessel functions that we use to bound the solution take a more complicated form, meaning our main results need to be presented in parallel for the two cases. It should be noted, though, that, once the inequalities for modified Bessel functions have been established (which has now been done in [17,18,21]), the intrinsic difficulty of bounding the derivatives of the solution of the Stein equation in the two cases is similar. This organisation allows for a clear exposition with manageable calculations.
In Section 3, we obtain new bounds for the solution of the SVG Stein equation (Theorem 3.1 and Corollary 3.3) that have the correct dependence on the test function h (namely ‖h′‖ for Wasserstein bounds and ‖h‖ for Kolmogorov bounds) to allow for Wasserstein and Kolmogorov error bounds for SVG approximation via Stein's method. This task is arguably more technically demanding than for any other distribution for which this ingredient of Stein's method has been established. Indeed, Theorem 3.1 builds on the bounds of [15,16], the iterative technique of [9], and three papers on inequalities for integrals of modified Bessel functions [17,18,21] whose primary motivation was Stein's method for VG approximation. In Propositions 3.5 and 3.6 we note that bounds on higher order derivatives of the solution cannot have a dependence on h of the form ‖h‖ or ‖h′‖.
In Section 4, we introduce (Definition 4.3) a distributional transformation, which we call the centered equilibrium transformation of order r, that is natural in the context of SVG approximation via Stein's method. As our choice of name suggests, it generalises the centered equilibrium transformation [43], which is itself the natural analogue for Laplace approximation of the equilibrium transformation for exponential approximation [39]. In Theorem 4.10, we combine this transformation with the bounds of Section 3 to obtain general Wasserstein and Kolmogorov error bounds for SVG approximation. Our bounds are the SVG analogue of the general bounds of Theorem 3.1 of [39], which have been shown to be a useful tool for obtaining bounds for exponential approximation.
It should be noted that even with the new bounds of Section 3, other coupling techniques, such as local couplings, may require more effort to obtain Wasserstein and Kolmogorov bounds than would be the case for normal approximation, for example. This is due to the presence of the coefficient σ²x multiplying the leading derivative in the SVG Stein equation (1.2). This therefore provides motivation for introducing this distributional transformation.
In Section 5, we apply the results of Sections 3 and 4 in four applications, these being: approximation of a general VG distribution by a SVG distribution; quantitative six moment theorems for SVG approximation of double Wiener-Itô integrals; SVG approximation of a statistic for binary sequence comparison (a special case of the D_2 statistic for alignment-free sequence comparison [4,31]); and Laplace approximation of a random sum of independent mean zero random variables. Our error bounds are given in the Wasserstein and Kolmogorov metrics, and in each case such bounds would not have been attainable by appealing to the present literature.
The rest of this paper is organised as follows. In Section 2, we introduce the class of SVG distributions and state some of their basic properties. Section 3 gives new bounds for the solution of the SVG Stein equation. In Section 4, we introduce a new distributional transformation, which we apply to give general bounds for SVG approximation in the Wasserstein and Kolmogorov metrics. In Section 5, we apply our results to obtain SVG approximations in several applications. Proofs of technical results are postponed until Section 6. Basic properties and inequalities for modified Bessel functions that are needed in this paper are collected in Appendix A.

The class of symmetric variance-gamma distributions
In this section, we introduce the class of symmetric variance-gamma (SVG) distributions and present some of their basic properties.
Setting θ = 0 in (1.1) gives the p.d.f. (2.5) of Z ∼ SVG(r, σ, µ), where K_ν(x) is a modified Bessel function of the second kind. The parameter r is known as the scale parameter. As r increases, the distribution becomes more rounded around its peak value µ (as can be seen from (2.7) below). The parameter σ is called the tail parameter. As σ decreases, the tails decay more quickly (see (2.6)). The parameter µ is the location parameter. Calculations can often be simplified by using the basic relation that if Z ∼ SVG(r, 1, 0), then σZ + µ ∼ SVG(r, σ, µ). The SVG(r, 1, 0) distribution is in a sense the standard symmetric variance-gamma distribution. The presence of the modified Bessel function makes (2.5) difficult to parse at first inspection. The following asymptotic formulas help in this regard. Applying (A.86) to (2.5) gives the tail behaviour (2.6), valid for all r > 0, σ > 0 and µ ∈ R. Similarly, applying (A.84) to (2.5) (see [15]) gives the behaviour (2.7) near x = µ. The density thus has a singularity at x = µ if r ≤ 1. In fact, for any parameters, the SVG(r, σ, µ) distribution is unimodal with mode µ. This can be seen from the fact that the function x^ν K_ν(x) is a decreasing function of x in the interval (0, ∞) for all ν ∈ R (see (A.88)).
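As a quick sanity check on (2.5), the case r = 2 can be verified in closed form (an illustrative sketch, not from the paper: the density is taken here in the form p(x) = (σ√π Γ(r/2))⁻¹ (|x − µ|/(2σ))^{(r−1)/2} K_{(r−1)/2}(|x − µ|/σ), which should be checked against (2.5)). For r = 2 we have ν = 1/2 and K_{1/2}(y) = √(π/(2y)) e^{−y}, so the SVG(2, σ, µ) density reduces to the Laplace(µ, σ) density:

```python
import math

def bessel_k_half(y):
    # Closed form K_{1/2}(y) = sqrt(pi/(2y)) * exp(-y)
    return math.sqrt(math.pi / (2.0 * y)) * math.exp(-y)

def svg_density_r2(x, sigma=1.0, mu=0.0):
    # SVG(2, sigma, mu) density in the assumed parametrisation:
    # nu = (r-1)/2 = 1/2 and Gamma(r/2) = Gamma(1) = 1
    y = abs(x - mu) / sigma
    return (1.0 / (sigma * math.sqrt(math.pi))) * math.sqrt(y / 2.0) * bessel_k_half(y)

def laplace_density(x, sigma=1.0, mu=0.0):
    # Laplace(mu, sigma) density; its variance 2*sigma^2 matches r*sigma^2 for r = 2
    return math.exp(-abs(x - mu) / sigma) / (2.0 * sigma)

for x in (-3.0, -0.5, 0.25, 1.0, 4.0):
    assert abs(svg_density_r2(x, 1.3, 0.7) - laplace_density(x, 1.3, 0.7)) < 1e-12
```

This also illustrates the special cases mentioned in Section 1: SVG(2, σ, µ) is a Laplace distribution.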
The SVG distribution has moment generating function M(t) = e^{µt}(1 − σ²t²)^{−r/2}, |t| < 1/σ, and therefore has moments of arbitrary order. In particular, the mean and variance of Z ∼ SVG(r, σ, µ) are given by (2.8). Perhaps surprisingly, this author could not find a formula for the absolute centered moments of the SVG(r, σ, µ) distribution in the literature. The result and its simple proof are given here.
Proposition 2.2. Let Z ∼ SVG(r, σ, µ). Then, for k > 0,

E|Z − µ|^k = (2σ)^k Γ((r + k)/2) Γ((k + 1)/2) / (√π Γ(r/2)).  (2.9)

Proof. We follow the approach given in Proposition 4.1.6 of [29] to obtain the moments of the SVG(r, σ, 0) distribution. Recall that Z − µ =_d σ√(2X) Y, where X ∼ Γ(r/2, 1) and Y ∼ N(0, 1) are independent. Therefore E|Z − µ|^k = (√2 σ)^k E[X^{k/2}] E|Y|^k, whence the result follows on using the standard formulas EX^s = Γ(r/2 + s)/Γ(r/2) and E|Y|^k = 2^{k/2}Γ((k + 1)/2)/√π.

In interpreting Corollary 5.4 it will be useful to note the formulas for the moments and cumulants of Y ∼ SVG(r, σ, 0) given in Lemma 3.6 of [12], with the odd order moments and cumulants all being equal to zero.
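The absolute-moment formula of Proposition 2.2 is easy to check numerically; the sketch below is illustrative (the closed form E|Z − µ|^k = (2σ)^k Γ((r+k)/2)Γ((k+1)/2)/(√π Γ(r/2)) used here is our reconstruction of (2.9) and should be checked against it). For r = 2, σ = 1 the SVG distribution is standard Laplace, for which E|Z|^k = k!, and for k = 2 the formula must return the variance rσ²:

```python
import math

def svg_abs_moment(k, r, sigma):
    # E|Z - mu|^k for Z ~ SVG(r, sigma, mu), via the Gamma-mixture representation
    return (2.0 * sigma) ** k * math.gamma((r + k) / 2.0) * \
        math.gamma((k + 1) / 2.0) / (math.sqrt(math.pi) * math.gamma(r / 2.0))

# r = 2, sigma = 1 is the standard Laplace distribution, where E|Z|^k = k!
for k in range(1, 8):
    assert abs(svg_abs_moment(k, 2.0, 1.0) - math.factorial(k)) <= 1e-9 * math.factorial(k)

# k = 2 recovers the variance r * sigma^2 for general parameters
for r, sigma in ((0.5, 1.0), (3.0, 2.0), (10.0, 0.3)):
    assert abs(svg_abs_moment(2, r, sigma) - r * sigma ** 2) < 1e-10
```

The Laplace check uses the Legendre duplication formula, which collapses the Gamma ratio to Γ(k + 1) = k! when r = 2.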
Lastly, we note that the class of SVG distributions contains several classical distributions as special or limiting cases ([16], Proposition 1.2).
1. Let X_r have the SVG(r, σ/√r, µ) distribution. Then X_r converges in distribution to a N(µ, σ²) random variable in the limit r → ∞.
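This normal limit can be illustrated at the level of moments (a sketch under the assumption, used above, that SVG(r, s, 0) has m.g.f. (1 − s²t²)^{−r/2}, which gives EZ² = rs² and EZ⁴ = 3r(r + 2)s⁴): for X_r ∼ SVG(r, σ/√r, µ), the variance is exactly σ² while the excess kurtosis 6/r vanishes as r → ∞, matching the N(µ, σ²) limit.

```python
def svg_var(r, s):
    return r * s * s                     # E Z^2 for Z ~ SVG(r, s, 0)

def svg_fourth_moment(r, s):
    return 3.0 * r * (r + 2.0) * s ** 4  # E Z^4 for Z ~ SVG(r, s, 0)

def excess_kurtosis(r, s):
    return svg_fourth_moment(r, s) / svg_var(r, s) ** 2 - 3.0

sigma = 1.7
for r in (1.0, 10.0, 100.0, 10000.0):
    s = sigma / r ** 0.5                 # the scaling in part 1 above
    assert abs(svg_var(r, s) - sigma ** 2) < 1e-9
    assert abs(excess_kurtosis(r, s) - 6.0 / r) < 1e-9
```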
3 Bounds for the solution of the Stein equation
In this section, we obtain bounds for the solution of the SVG Stein equation (that is, (1.2) with θ = 0) which have the correct dependence on the test function h to allow for Wasserstein and Kolmogorov distance bounds for SVG approximation via Stein's method.
For ease of exposition, in our proofs, we shall analyse the solution of the SVG(r, 1, 0) Stein equation. The general case follows from the fact that SVG(r, σ, µ) =_d µ + σ SVG(r, 1, 0) and a simple rescaling and translation. The solution of the SVG(r, 1, 0) Stein equation is then given by (3.10) (equivalently, (3.11)), where ν = (r − 1)/2 and h̃(x) = h(x) − Eh(Z) for Z ∼ SVG(r, 1, 0). The equality between (3.10) and (3.11) follows because |t|^ν K_ν(|t|) is proportional to the SVG(r, 1, 0) density. The equality is very useful, because it means that we will be able to restrict our attention to bounding the solution in the region x ≥ 0, from which a bound for all x ∈ R is immediate.
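For orientation, the solution can be written, for x > 0, in the following form (our reconstruction, to be checked against (3.10) and (3.11); the equivalence of the two forms uses ∫_R |t|^ν K_ν(|t|) h̃(t) dt = 0, as noted above):

```latex
f_h(x) = -\frac{K_\nu(x)}{x^{\nu}}\int_0^x t^{\nu} I_\nu(t)\,\tilde h(t)\,\mathrm{d}t
         - \frac{I_\nu(x)}{x^{\nu}}\int_x^{\infty} t^{\nu} K_\nu(t)\,\tilde h(t)\,\mathrm{d}t,
\qquad \nu = \tfrac{r-1}{2},
```

with the second representation obtained by replacing the integral over (x, ∞) by the negative of the integral over (−∞, x).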
We now note two useful bounds due to [16], valid for bounded and measurable h : R → R, for the solution of the SVG(r, σ, µ) Stein equation; these will be used in the proof of Theorem 3.1 and some of the applications of Section 5. Let us now state the main result of this section.
Theorem 3.1. Suppose that h : R → R is bounded and measurable. Let f be the solution of the SVG(r, σ, µ) Stein equation. Then the bounds (3.12)–(3.19) hold. Proof. As noted above, for ease of notation, we set σ = 1 and µ = 0. The bounds for the general case, as stated in the theorem, follow from a simple change of variables; see the proof of Theorem 3.6 of [16]. We also recall that it suffices to obtain bounds in the region x ≥ 0. Let us first establish the bound for f, which we will need to obtain several of the other bounds. By the mean value theorem, |h̃(x)| ≤ ‖h′‖(|x| + E|Z|), where Z ∼ SVG(r, 1, 0). From (2.9) we have E|Z| = (2/√π)Γ((r + 1)/2)/Γ(r/2). Now, on using inequalities (A.96), (A.98), (A.97) and (A.99) to obtain the second inequality, we have, for x ≥ 0, the required estimate, where we used the standard formula uΓ(u) = Γ(u + 1) to obtain the final equality. Now, Γ(ν + 1)/Γ(ν + 3/2) is a decreasing function of ν for ν > −1/2 (see [26]), and so is bounded above by Γ(1/2) = √π for all ν > −1/2. Hence, |f(x)| ≤ (7/2)‖h′‖ for all x ≥ 0, which is sufficient to prove (3.17).
In the following corollary, we apply some of the estimates of Theorem 3.1 to bound some useful quantities. We shall make use of these bounds in Section 4. It will be convenient to define the operator T_r by T_r f(x) = xf′(x) + rf(x).
Corollary 3.2. Let f be the solution of the SVG(r, σ, 0) Stein equation. Then, for h : R → R bounded and measurable, and Lipschitz, respectively, the bounds (3.20)–(3.22) hold. Proof. These follow from the triangle inequality and the estimates of Theorem 3.1, which proves the result.
Remark 3.4. For the normal [6] and exponential [5] Stein equations, because the solution of the Stein equation with test function h_z(x) = 1(x ≤ z) can be expressed in terms of elementary functions, a detailed analysis of the solution yields bounds with smaller constants than would be obtained by first working with a general bounded test function h and then bounding ‖h̃_z‖ ≤ 1. However, because of the presence of modified Bessel functions in the solution, such improvements would be more difficult to obtain here.
It is natural to ask whether, for all z ∈ R, a bound of the form ‖f_z″‖ ≤ C_{r,σ} could be obtained for the solution f_z. The following proposition, which is proved in Section 6, shows that this is not possible.

Proposition 3.5. The derivative f_µ′(x) has a discontinuity at x = µ.

Similarly, one may ask whether a bound of the form ‖f^{(3)}‖ ≤ C_{r,σ}‖h′‖ could be obtained for all Lipschitz h : R → R. The following proposition, which is proved in Section 6, again shows this is not possible (see [11] for similar results that apply to solutions of Stein equations for a wide class of distributions). Our approach differs from that of Proposition 3.5 in that we do not find a Lipschitz test function h for which f″ has a discontinuity. This would be more tedious to establish for f″ than for f′, and instead we consider a highly oscillating test function and perform an asymptotic analysis.

Proposition 3.6. Let f be the solution of the SVG(r, σ, µ) Stein equation. Then there does not exist a constant M_{r,σ} > 0 such that ‖f^{(3)}‖ ≤ M_{r,σ}‖h′‖ for all Lipschitz h : R → R.

Remark 3.7. (i) Throughout this remark, we set µ = 0. The bounds (3.13) and (3.19) are of order r^{−1} as r → ∞. This is indeed the optimal order, which can be seen by the following argument, which is similar to the one given in Remark 2.2 of [24] to show that the rate in their bound for the solution of the gamma Stein equation was optimal.
(ii) The bound (3.12) for f is of order r^{−1/2} as r → ∞. Indeed, for r > 1, this can be seen from inequalities given in [25] and [13]. The O(r^{−1/2}) rate is optimal, which can be seen as follows. Take h to be a suitable bounded test function.
Here we used that the first limit is equal to zero by the asymptotic formulas (A.83) and (A.84). We computed the second limit using the asymptotic formula (A.83) and the fact that the integrand is proportional to the density of the SVG(2ν + 1, σ, 0) distribution. Therefore, by (3.26), we conclude that the optimal rate is of order r^{−1/2} as r → ∞. (iii) Arguing as we did in part (i), we have that f(0) = σ²(r + 1)f″(0) − h′(0), which for a general Lipschitz test function h is O(1) (see bound (3.19)), and so the bound (3.17) is of optimal order.
(iv) In light of inequalities (3.17)–(3.19), one might expect inequalities (3.15) and (3.16) to be of lower order than (3.14) as r → ∞. However, this is not the case. A calculation involving L'Hôpital's rule (which is given in Section 6) shows that, for any bounded h : R → R, the relevant limit does not vanish, and from the SVG(r, σ, 0) Stein equation and inequality (3.18) we obtain a matching bound. Thus, inequalities (3.14) and (3.16) are of optimal order in r. We expect this to also be the case for inequalities (3.20)–(3.22), although verifying this would involve a more detailed analysis, which we omit for space reasons.
4 The centered equilibrium transformation of order r
In this section, we introduce a new distributional transformation and apply it to obtain general Wasserstein and Kolmogorov error bounds for SVG approximation. We begin with the following proposition, which relates the Kolmogorov and Wasserstein distances between a general distribution and a SVG distribution. This proposition is of interest because Wasserstein distance bounds are often easier to obtain than Kolmogorov distance bounds through Stein's method. The proof is deferred until Section 6.

Proposition 4.1. Let Z ∼ SVG(r, σ, µ). Then, for any random variable W, the bounds stated in parts (i) and (ii) hold. (Note that if, for example, the Wasserstein distance equals 0.676, then the upper bound in part (ii) is equal to 1.075, and is therefore uninformative.)
(ii) Recall that N(µ, σ²) =_d lim_{r→∞} SVG(r, σ/√r, µ). Therefore, from (4.28) and the fact that the relevant limit equals 1 (see (3.26)), we recover the corresponding inequality for the normal distribution (with obvious abuse of notation), which is a special case of part 2 of Proposition 1.2 of [48]. It is known (see [6], p. 48) that this bound gives the optimal rate under some conditions, but in other applications the rate is suboptimal. Proposition 5.1 gives an application in which the inequalities (4.28), (4.29) and (4.30) are not of optimal rate in δ = d_W(W, Z); see Remark 5.2.
As in Section 3, we define the operator T_r by T_r f(x) = xf′(x) + rf(x). We also denote D = d/dx. From now until the end of this section, we set µ = 0.

Definition 4.3. Let W be a random variable with mean zero and variance 0 < rσ² < ∞. We say that W^{V_r} has the W-centered equilibrium distribution of order r if

EWf(W) = σ²E[T_r Df(W^{V_r})]  (4.31)

for all twice-differentiable f : R → R such that the expectations in (4.31) exist.
As we shall see later, it is convenient to write Var(W) = rσ², because the variance of a SVG(r, σ, 0) random variable is rσ². As the name suggests, the centered equilibrium distribution of order r generalises the centered equilibrium distribution of W, denoted by W^L, that was introduced by [43]; its characterising equation is (4.32). We also refer the reader to [8] for a generalisation of (4.32) to all random variables W with finite second moment. The centered equilibrium distribution is itself the Laplace analogue of the equilibrium distribution that has been shown to be a useful tool in Stein's method for exponential approximation by [39]. We can see that W^{V_2} = W^L by setting f(x) = xg(x) in (4.32). For r ≠ 2, a characterising equation of the form (4.32) is not useful. To see this, recall that the Stein operator for the SVG(r, σ, 0) distribution, when brought into the form required for (4.32), has a singularity at x = 0 if r ≠ 2. We also note that W^{V_1} =_d W^{*(2)}, where W^{*(2)} has the W-zero bias distribution of order 2 (see [19]). This distributional transformation is a natural generalisation of the zero bias transformation (defined below) to the setting of Stein's method for products of independent standard normal random variables. We shall make use of this fact in Section 5.3.
We now obtain an inverse of the operator T_r D. This inverse operator will be used later in this section to establish properties of the centered equilibrium distribution of order r. Recall that the Beta(r, 1) distribution has p.d.f. p(x) = rx^{r−1}, 0 < x < 1.
Lemma 4.4. Let B_r ∼ Beta(r, 1) and U ∼ U(0, 1) be independent, and define the operator G_r by G_r f(x) = (x/r)Ef(xUB_r). Then G_r is the right-inverse of the operator T_r D in the sense that T_r DG_r f(x) = f(x) (4.33). Suppose now that f is twice-differentiable. Then, for any r ≥ 1, G_r T_r Df(x) = f(x) − f(0) (4.34). Therefore, G_r is the inverse of T_r D when the domain of T_r D is the space of all twice-differentiable functions f on R with f(0) = 0.
Proof. We begin by obtaining a useful formula (4.35) for G_r f(x) = (x/r)Ef(xUB_r); we then use (4.35) to verify (4.33), and finally we verify relation (4.34).

Before presenting some properties of the centered equilibrium distribution of order r, we recall two distributional transformations that are standard in the Stein's method literature. If W is a mean zero random variable with finite, non-zero variance σ², we say that W* has the W-zero biased distribution [27] if EWf(W) = σ²Ef′(W*) for all differentiable f for which these expectations exist. For any random variable W with finite second moment, we say that W^□ has the W-square bias distribution ([6], pp. 34–35) if EW²f(W) = EW² Ef(W^□) for all f such that EW²f(W) exists. When EW = 0, there is a neat relationship between these distributional transformations: W* =_d UW^□, where U ∼ U(0, 1) is independent of W^□ (this is a slight variant of Proposition 2.3 of [6]; see [19], Proposition 3.2).
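Lemma 4.4 can be sanity-checked exactly on monomials (an illustration, not the paper's proof; it assumes G_r f(x) = (x/r)Ef(xUB_r) and uses E[(UB_r)^n] = E[U^n]E[B_r^n] = r/((n + 1)(r + n))): for f(x) = x^n one gets G_r f(x) = x^{n+1}/((n + 1)(r + n)), and applying T_r D returns x^n.

```python
from fractions import Fraction

def g_r_monomial_coeff(n, r):
    # G_r maps x^n to c * x^{n+1} with c = 1/((n+1)(r+n)), since
    # E[(U B_r)^n] = (1/(n+1)) * (r/(r+n)) and G_r f(x) = (x/r) E f(x U B_r)
    return Fraction(1, n + 1) / (r + n)

def t_r_d_monomial(m, c, r):
    # T_r D applied to c x^m: D gives c m x^{m-1}, and
    # T_r g(x) = x g'(x) + r g(x) gives c m (m - 1 + r) x^{m-1}
    return (c * m * (m - 1 + r), m - 1)

for r in (Fraction(1), Fraction(3), Fraction(5, 2)):
    for n in range(6):
        c = g_r_monomial_coeff(n, r)
        coeff, degree = t_r_d_monomial(n + 1, c, r)
        assert (coeff, degree) == (1, n)   # T_r D G_r f = f for f(x) = x^n
```

Exact rational arithmetic is used so that the identity (4.33) holds with no rounding error.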
The following construction of W^{V_r} generalises Theorem 3.2 of [43]. Similar constructions for distributional transformations that are natural in the context of gamma and generalized gamma approximation can be found in [44] and [42].
Proposition 4.5. Let W be a random variable with zero mean and finite, non-zero variance rσ², and let W* have the W-zero bias distribution. Let B_r ∼ Beta(r, 1) be independent of W*. Then the random variable B_r W* has the centered equilibrium distribution of order r.
Proof. Let f ∈ C_c, the collection of continuous functions with compact support. In Lemma 4.4 we defined the operator G_r g(x) = (x/r)Eg(xUB_r) and showed that T_r DG_r g(x) = g(x) for any g. We therefore have, on combining the characterising equations of the zero bias and centered equilibrium transformations, that the expectations of f(W^{V_r}) and f(B_r W*) are equal for all f ∈ C_c, and so the random variables W^{V_r} and B_r W* must be equal in distribution.
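The construction in Proposition 4.5 can be checked exactly in a simple case (an illustration under our reading of Definition 4.3, namely EWf(W) = σ²E[T_r Df(W^{V_r})]): take W uniform on {−1, +1}, so that W* ∼ U(−1, 1) and Var(W) = 1 = rσ². For f(x) = x³ we have T_r Df(x) = x·6x + r·3x² = (6 + 3r)x², and the two sides agree for every r.

```python
from fractions import Fraction

def characterising_gap(r):
    # Difference of the two sides of E[W f(W)] = sigma^2 E[T_r D f(W^{V_r})]
    # for f(x) = x^3 and W uniform on {-1, +1}: the LHS is E[W^4] = 1.
    # For the RHS use W^{V_r} = B_r W* with B_r ~ Beta(r, 1), W* ~ U(-1, 1):
    # E[B_r^2] = r/(r+2), E[(W*)^2] = 1/3, and sigma^2 = Var(W)/r = 1/r.
    sigma2 = Fraction(1) / r
    rhs = sigma2 * (6 + 3 * r) * (r / (r + 2)) * Fraction(1, 3)
    return rhs - 1          # should vanish for every r > 0

for r in (Fraction(1), Fraction(2), Fraction(7, 2), Fraction(10)):
    assert characterising_gap(r) == 0
```

The cancellation (6 + 3r)·r/(r + 2)·(1/3)·(1/r) = 1 holds identically in r, reflecting that the fixed-point property of Proposition 4.6(i) does not depend on the order r.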
In the following proposition, we collect some useful properties of the centered equilibrium distribution of order r. As might be expected in the light of Proposition 4.5, some of these properties are quite similar to those given for the zero bias distribution in Lemma 2.1 of [27].
Proposition 4.6. Let W be a mean zero random variable with finite, non-zero variance rσ², and let W^{V_r} have the W-centered equilibrium distribution of order r in accordance with Definition 4.3.
(i) The SVG(r, σ, 0) distribution is the unique fixed point of the centered equilibrium transformation of order r.
(ii) The distribution of W^{V_r} is unimodal about zero and absolutely continuous with density given by (4.36). It follows that the support of W^{V_r} is the closed convex hull of the support of W and that W^{V_r} is bounded whenever W is bounded.
(iii) The centered equilibrium transformation of order r preserves symmetry.
(v) For c ∈ R, cW^{V_r} has the cW-centered equilibrium distribution of order r.
Proof. (i) This is immediate from Definition 4.3 and the Stein characterisation of the SVG(r, σ, 0) distribution given in Lemma 3.1 of [16].
(ii) Firstly, we note that, for fixed t ∈ (0, 1), the expectation E[W1(W > w/t)] is increasing for w < 0 and decreasing for w > 0. We therefore deduce that p(w) is increasing for w < 0 and decreasing for w > 0. Now, from Proposition 4.5, we have that W^{V_r} =_d B_r W*. Formula (4.36) then follows from the fact that W* is absolutely continuous with density f_{W*}(w) = E[W1(W > w)]/Var(W) (part (ii) of Lemma 2.1 of [27]) and the standard formula for computing the density of a product.
(iii) We follow the argument of part (iii) of Lemma 2.1 of [27]. Let w be a continuity point of a symmetric random variable W. Then, for fixed t ∈ (0, 1), the density formula takes the same value at w and −w for almost all w. Therefore, there is a version of the density of W^{V_r} which is the same at w and −w for almost all w, and so W^{V_r} is symmetric.
(v) Let g be a function such that EWg(W) exists, and define g̃(x) = cg(cx). Then g̃^{(k)}(x) = c^{k+1}g^{(k)}(cx). As W^{V_r} has the W-centered equilibrium distribution of order r, applying (4.31) with g̃ shows that cW^{V_r} satisfies the characterising equation for cW. Hence cW^{V_r} has the cW-centered equilibrium distribution of order r.
We end this section by proving Theorem 4.10 below, which formalises the notion that if L(W) and L(W^{V_r}) are approximately equal then W is approximately SVG distributed. This theorem is the SVG analogue of Theorem 2.1 of [39], in which Wasserstein and Kolmogorov error bounds are given in terms of the difference in absolute expectation between the random variable of interest W and its W-equilibrium transformation. We follow the approach of [39] and begin by stating three lemmas.

Lemma 4.7. Let Z ∼ SVG(r, σ, 0). Then, for any random variable W, the bound (4.37) holds, with the constant C_{r,σ,ǫ} defined there.

Proof. Since, for all r > 0 and σ > 0, the SVG(r, σ, 0) density p(x) is an increasing function of x for x < 0 and a decreasing function of x for x > 0, we obtain the bound (4.38). To obtain (4.37), we bound the integral on the right-hand side of (4.38), treating the cases r > 1, r = 1 and 0 < r < 1 separately. For r > 1 we bound the density p(x) using (2.7) and then compute the trivial integral; for r = 1 we use inequality (6.79); and for 0 < r < 1 we use inequality (6.80). This yields (4.37), as required.
The next lemma follows immediately from the estimates of Theorem 3.1 and Corollary 3.2, and the subsequent lemma is straightforward, so we omit the proof.

Lemma 4.8. Let f_{a,ǫ} be the solution of the SVG(r, σ, 0) Stein equation with test function h_{a,ǫ}, and define h_{a,0}(x) = 1(x ≤ a) and f_{a,0} accordingly. Then the bounds (4.40)–(4.43) hold.

Lemma 4.9. Let Z ∼ SVG(r, σ, 0) and W be a real-valued random variable. Then, for any ǫ > 0, the smoothing inequality stated in the lemma holds, where C_{r,σ,ǫ} is defined as in Lemma 4.7 and h_{a,ǫ} is defined as in Lemma 4.8.
Theorem 4.10. Let W be a mean zero random variable with variance 0 < rσ² < ∞. Suppose that (W, W^{V_r}) is given on a joint probability space so that W^{V_r} has the W-centered equilibrium distribution of order r. Then the Kolmogorov bound (4.44) holds, where C_{r,σ,4β} is defined as in Lemma 4.7. Also, the Wasserstein bounds (4.45)–(4.48) hold.

Proof. For this proof, we shall write κ = d_K(W, Z). Let ∆ := W − W^{V_r}, and define I_1 := 1(|∆| ≤ β); note that W^{V_r} may not have finite second moment. Let f be the solution of the SVG(r, σ, 0) Stein equation with test function h_{a,ǫ}, as defined in (4.39). Then ET_r f′(W^{V_r}) is well defined, because ‖T_r f′‖ < ∞ (see Lemma 4.8), and we may decompose the resulting expectation into two terms J_1 and J_2. Using (4.43) gives |J_2| ≤ 2(5/2 + 1/(2r))P(|∆| > β). Arguing as we did at the start of the proof of Corollary 3.2 to obtain the second equality, and then using inequalities (4.40) and (4.42) and Lemma 4.7 in the last step, we bound J_1; a similar argument bounds the remaining term, and so we conclude with a bound on |Eh_{a,ǫ}(W) − Eh_{a,ǫ}(Z)|. Using Lemma 4.9 and taking ǫ = 4β now gives an inequality in κ, whence solving for κ yields (4.44).

Now let us prove (4.45). We can write the left-hand side in terms of ∆. Taylor expanding, applying the triangle inequality to xf′(x) + f(x), and using the estimates (4.40), (4.41) and (4.42) then gives (4.45). Suppose now that E|W|³ < ∞, which, by part (iv) of Proposition 4.6, ensures that the relevant expectations exist. By Taylor expansion and the estimate (3.25), we obtain (4.46), as required. Also, applying the estimates (3.20) and (3.17) to (4.49) yields (4.47), whilst applying the estimates (3.15) and (3.12) yields (4.48).

Comparison of variance-gamma distributions
The following proposition quantifies the error in approximating a general VG distribution by a SVG distribution. We refer the reader to [30] for a number of similar bounds for comparison of univariate distributions. The proof provides an example in which the bounds on ‖(x − µ)f^{(k)}(x)‖, k = 0, 1, 2, 3, for the solution of the SVG Stein equation that were given in Theorem 3.1 prove useful. This application also serves as a simple example in which the inequalities of Proposition 4.1 are suboptimal. When µ_1 = µ_2, the lower bound in (5.50) is equal to r_1|θ_1|, and so there exist constants c, C > 0 bounding the distance below and above. That the relevant expectation is equal to 0 follows from the assumptions on h, the estimates of Theorem 3.1, and Lemma 3.1 of [16]. Firstly, we prove (5.50). Suppose h ∈ H_W. Then, from (5.52), we obtain (5.53). Using the estimates of Theorem 3.1 (with ‖h′‖ ≤ 1) to bound (5.53) yields (5.50). Now suppose that µ_1 = µ_2. Take h_z(x) = 1(x ≤ z). On using the estimates of Corollary 3.3 to bound (5.53), we obtain (5.51), as required.

Malliavin-Stein method for symmetric variance-gamma approximation
In recent years, one of the most significant applications of Stein's method has been to Gaussian analysis on Wiener space. This body of research was initiated by [34], in which Stein's method and Malliavin calculus are combined to derive a quantitative "fourth moment" theorem for the normal approximation of a sequence of random variables living in a fixed Wiener chaos.
In a recent work [12], the Malliavin-Stein method was extended to the VG distribution. Here, we obtain explicit constants in some of the main results (in the SVG case) of [12], these being six moment theorems for the SVG approximation of double Wiener-Itô integrals. Our results also fix a technical issue: the Wasserstein distance bounds stated in [12] had only been proven in the weaker bounded Wasserstein distance (at the time of [12], the bounds for the solution of the Stein equation in the literature [15,16] had a dependence on the test function h such that this was the best that could be achieved).
Let us first introduce some notation; see the book [36] for a more detailed discussion. Let D^{p,q} be the Banach space of all functions in L^q(γ), where γ is the standard Gaussian measure, whose Malliavin derivatives up to order p also belong to L^q(γ). Let D^∞ be the class of infinitely Malliavin differentiable random variables. We introduce the so-called Γ-operators Γ_j [35]. For a random variable F ∈ D^∞, we define Γ_1(F) = F and, for every j ≥ 1, Γ_{j+1}(F) = ⟨DF, −DL^{−1}Γ_j(F)⟩_H. Here D is the Malliavin derivative, L^{−1} is the pseudo-inverse of the infinitesimal generator of the Ornstein-Uhlenbeck semigroup, and H is a real separable Hilbert space. Finally, for f ∈ H^{⊙2}, we write I_2(f) for the double Wiener-Itô integral of f.

Theorem 5.3. Let F ∈ D^{2,4} be such that EF = 0 and let Z ∼ SVG(r, σ, 0). Then (5.54) holds. If in addition F ∈ D^{3,8}, then Γ_3(F) is square-integrable and (5.55) holds.

Proof. Let f : R → R be twice differentiable with bounded first and second derivatives.
Then it was shown in the proof of Theorem 4.1 of [12] that … If h ∈ H_W, then the solution f of the SVG(r, σ, 0) Stein equation is twice differentiable with bounded first and second derivatives. Using the estimates (3.19) and (3.18) of Theorem 3.1 to bound f′′ and f′ then yields (5.54). Inequality (5.55) is justified in [12]. … (5.57) (see Theorem 4.3 of [35]), and it was shown in the proof of Theorem 5.8 of [12] that … Inserting these formulas into (5.54) yields (5.57), as required.
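As a sanity check on the moment computations that enter these six moment theorems, one can simulate an SVG(r, σ, 0) random variable directly. For integer r, the SVG(r, σ, 0) law admits the well-known representation Z = σ Σ_{k=1}^{r} X_k Y_k, with X_1, Y_1, …, X_r, Y_r independent standard normals (the case r = 1 being the product-normal distribution). The script below is a minimal Monte Carlo sketch, with sample size, seed, and tolerances chosen by us purely for illustration, checking the moment identities E Z^2 = rσ^2 and E Z^4 = (3r^2 + 6r)σ^4 for r = 2, σ = 1:

```python
import random, math

def svg_sample(r, sigma, n, seed=1):
    """Draw n samples from SVG(r, sigma, 0) for integer r, using the
    representation Z = sigma * sum_{k=1}^r X_k Y_k with X_k, Y_k
    independent standard normals."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = sigma * sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(r))
        out.append(z)
    return out

samples = svg_sample(r=2, sigma=1.0, n=200_000)
mean = sum(samples) / len(samples)
var = sum(z * z for z in samples) / len(samples)
m4 = sum(z ** 4 for z in samples) / len(samples)
# Moment identities for SVG(r, sigma, 0): E Z = 0, E Z^2 = r sigma^2,
# E Z^4 = (3 r^2 + 6 r) sigma^4.  Here r = 2: E Z^2 = 2, E Z^4 = 24.
print(mean, var, m4)
```

Since each product X_k Y_k has mean 0, variance 1 and fourth moment 9, the fourth-moment identity follows from the standard formula for the fourth moment of a sum of i.i.d. centred variables.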
Remark 5.5. One can obtain Kolmogorov distance bounds by applying Proposition 4.1 to the bound (5.57). However, these bounds are unlikely to be of optimal order. Unlike for normal approximation, for which an optimal rate of convergence in Kolmogorov distance has been obtained [37], there is a technical difficulty for SVG approximation, because the first derivative of the solution f_z of the SVG(r, σ, 0) Stein equation with test function h_z(x) = 1(x ≤ z) has a discontinuity at the origin when z = 0 (see Proposition 3.5). We can, however, bound the expression using the inequalities (3.16) for xf′′(x) and (3.13) for f′ to obtain the bound, provided the expectations exist. However, there are no formulas in the literature for the expectations E[Γ_3(F)/F] and E[(Γ_3(F))^2/F^2] (when they exist), and it is unlikely that they could be expressed solely in terms of lower order cumulants of F.

Binary sequence comparison
Here we consider an application of Theorem 4.10 to binary sequence comparison. This is a special case of the more general problem of word sequence comparison, which is of importance in biological sequence comparison. One way of comparing sequences uses k-tuples (sequences of letters of length k). If two sequences are closely related, we would expect their k-tuple content to be similar. A statistic for sequence comparison based on k-tuple content, known as the D_2 statistic, was suggested by [4] (see [45] for further statistics based on k-tuple content). Letting A denote an alphabet of size d, and X_w and Y_w the number of occurrences of the word w ∈ A^k in the first and second sequences, respectively, the D_2 statistic is defined by D_2 = Σ_{w ∈ A^k} X_w Y_w. Due to the complicated dependence structure (for a detailed account see [46]), approximating the asymptotic distribution of D_2 is a difficult problem. However, for certain parameter regimes D_2 has been shown to be asymptotically normal and Poisson [31].
We now consider the case of an alphabet of size 2 with comparison based on the content of 1-tuples. We suppose that the sequences are of length m and n, the alphabet is {0, 1}, and P(0 appears) = P(1 appears) = 1/2. Denoting the number of occurrences of 0 in the two sequences by X and Y, we have D_2 = XY + (m − X)(n − Y). Clearly, X and Y are independent binomial random variables with expectations m/2 and n/2. Straightforward calculations (see [31]) show that ED_2 = mn/2 and Var(D_2) = mn/4, and the standardised D_2 statistic can be written as W = (X − m/2)/√(m/4) · (Y − n/2)/√(n/4). (5.58) By the central limit theorem, (X − m/2)/√(m/4) and (Y − n/2)/√(n/4) are approximately N(0, 1) distributed, and so W has an approximate SVG(1, 1, 0) distribution. In [16], an O(m^{−1} + n^{−1}) bound for the rate of convergence was given in a smooth test function metric (which requires the test function to be three times differentiable). In Theorem 5.8 below, we use Theorem 4.10 to obtain bounds in the more usual Wasserstein and Kolmogorov metrics. Our rate of convergence is slower, but we quantify the approximation in stronger metrics. We will first need to prove the following theorem.
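The factorisation behind (5.58) rests on the algebraic identity D_2 − mn/2 = 2(X − m/2)(Y − n/2), which holds for every realisation of the sequences, not just in expectation. The following sketch (sequence lengths and seed are arbitrary choices of ours, for illustration only) simulates two binary sequences and verifies the identity directly:

```python
import random, math

def d2_binary(seq1, seq2):
    """D2 statistic for 1-tuples over the alphabet {0, 1}:
    D2 = X*Y + (m - X)*(n - Y), where X, Y count the zeros."""
    m, n = len(seq1), len(seq2)
    x, y = seq1.count(0), seq2.count(0)
    return x * y + (m - x) * (n - y)

rng = random.Random(0)
m, n = 500, 800
s1 = [rng.randint(0, 1) for _ in range(m)]
s2 = [rng.randint(0, 1) for _ in range(n)]

d2 = d2_binary(s1, s2)
x, y = s1.count(0), s2.count(0)
# Identity behind (5.58): D2 - mn/2 = 2 (X - m/2)(Y - n/2), so the
# standardised statistic factors as a product of two standardised
# binomial counts, each approximately N(0, 1) for large m, n.
w = (d2 - m * n / 2) / math.sqrt(m * n / 4)
w_factored = (x - m / 2) / math.sqrt(m / 4) * ((y - n / 2) / math.sqrt(n / 4))
print(w, w_factored)
```

Because W is (up to the CLT error) a product of two independent standard normals, its limit is the product-normal, i.e. SVG(1, 1, 0), distribution.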
Theorem 5.6. Suppose X_1, …, X_m are i.i.d. and Y_1, …, Y_n are i.i.d., with … (5.59) (5.60)

Remark 5.7. The rate of convergence in the Kolmogorov distance bound (5.60) is unlikely to be of optimal order, but is better than the O(m^{−1/4} log(m) + n^{−1/4} log(n)) rate that would result from simply applying Proposition 4.1 to (5.59). A reasonable conjecture is that the optimal rate is O(m^{…}).

Proof. Since Z ∼ SVG(1, 1, 0), we will apply Theorem 4.10 with r = 1, for which we require … , the W-zero bias transformation of order 2. We begin by collecting some useful properties of this distributional transformation. In [19], the following construction is given: since W_1 and W_2 are sums of independent random variables, we have by part (v) of Lemma 2.1 of [27] that … /√n, where I and J are chosen uniformly from {1, …, m} and {1, …, n}, respectively. It was shown in the proofs of Corollaries 4.1 and 4.2 of [19] that … (5.62) The assumption … ([27], part (iv) of Lemma 2.1) allowed [19] to obtain the O(m^{−1} + n^{−1}) rate in (5.62).
Proof. Let I_i and J_j be the indicator random variables that letter 0 occurs at position i in the first sequence and at position j in the second sequence, respectively. Then X = Σ_{i=1}^m I_i and Y = Σ_{j=1}^n J_j. We may then write … where … The X_i and Y_j are all independent with zero mean and unit variance. Also, … , and the result now follows from Theorem 5.6.

Random sums
Let X_1, X_2, … be i.i.d., positive, non-degenerate random variables with unit mean. Let N_p be a Geo(p) random variable with P(N_p = k) = p(1 − p)^{k−1}, k ≥ 1, that is independent of the X_i. Then a well-known result of [47] states that p Σ_{i=1}^{N_p} X_i converges in distribution to an exponential distribution with parameter 1 as p → 0. Geometric summation arises in a variety of settings; see [28]. Stein's method was used by [39] to obtain a quantitative generalisation of the result of [47]. If we alter the assumptions so that the X_i have mean zero and finite non-zero variance, then √p Σ_{i=1}^{N_p} X_i converges to a Laplace distribution as p → 0; see [52] and [43]. Recently, [43], through the use of the centered equilibrium transformation, mirrored the approach of [39] to obtain an explicit error bound in the bounded Wasserstein metric.
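The Laplace limit for mean-zero geometric sums is easy to see in simulation. The sketch below (Rademacher summands and the particular p, sample size, and tolerances are our own choices, purely for illustration) uses the fact that, since the X_i have mean zero and unit variance, √p Σ_{i=1}^{N_p} X_i has variance exactly 1 for every p, while its fourth moment should approach the Laplace value E Z^4 = 6 as p → 0:

```python
import random, math

def laplace_geometric_sum(p, reps, seed=2):
    """Simulate sqrt(p) * sum_{i=1}^{N_p} X_i with X_i Rademacher
    (mean zero, unit variance) and N_p ~ Geo(p) on {1, 2, ...}."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        n_p = 1
        while rng.random() >= p:   # geometric waiting time, P(N_p = k) = p(1-p)^(k-1)
            n_p += 1
        s = sum(rng.choice((-1, 1)) for _ in range(n_p))
        out.append(math.sqrt(p) * s)
    return out

z = laplace_geometric_sum(p=0.05, reps=50_000)
var = sum(t * t for t in z) / len(z)
m4 = sum(t ** 4 for t in z) / len(z)
# Limiting Laplace law with variance 1 has E Z^2 = 1 and E Z^4 = 6.
print(var, m4)
```

For Rademacher summands one can compute E[(√p S_{N_p})^4] = 6 − 5p exactly, so the fourth moment converges to 6 at rate O(p), consistent with the error bounds discussed in this section.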
In this section, we use Theorem 4.10 to obtain Wasserstein and Kolmogorov error bounds for the theorems of [43]. Indeed, Theorems 5.9 and 5.10 below give Wasserstein and Kolmogorov distance bounds for the approximations of Theorems 1.3 and 4.4 of [43], respectively. The results of [39] are also given in these metrics, and we follow their approach to obtain our Kolmogorov bounds. For a random variable X, we denote its distribution function by F_X and its generalized inverse by F_X^{−1}.
Theorem 5.9. Let N be a positive, integer-valued random variable with µ = EN < ∞, and let X_1, X_2, … be a sequence of independent random variables, independent of N, with … Also, let M be any positive, integer-valued random variable, independent of the X_i, satisfying … (5.64) Suppose further that |X_i| ≤ C for all i and |N − M| ≤ K. Then … If K = 0, the same bound also holds for unbounded X_i.
Remark 5.11. (i) Theorem 1.3 of [43] gives the bound (5.69), which holds under the same conditions as (5.68). Aside from being given in a stronger metric, the bound (5.68) has the theoretical advantage of having a multiplicative constant, 12, that is independent of σ, whereas (5.69) has a multiplicative constant 2√2 + σ. The bound (5.69) has a smaller constant than (5.68) when σ < 12 − 2√2, whilst the constant is larger when σ > 12 − 2√2.

(ii) The argument used to prove the final assertion of Theorem 5.10 also shows that the O(p^{1/2}) rate in (5.69) is optimal.

(iii) Suppose now that τ = sup_{i≥1} EX_i^4 < ∞. Then arguing as we did in the proof of Theorem 5.6 would result in the alternative bound (5.70), where C > 0 does not depend on p. Thus, the dependence on p is worse than in (5.67), but (5.70) may be preferable if sup_{i≥1} F^{−1}… is difficult to compute or large. The same remark applies to Theorem 5.9.

The quantity sup_{i≥1} F^{−1}… can be easily bounded if the X_i have finite support. To see this, suppose that X_1, X_2, … are supported on a subset of the finite interval [a, b] ⊂ R. Theorem 3.2 of [43] (see also Proposition 4.5) gives that X^L =_d B_2 X*, where B_2 ∼ Beta(2, 1) and X*, which has the X-zero biased distribution, are independent. But part (ii) of Lemma 2.1 of [27] tells us that the support of X* is the closed convex hull of the support of X, and since B_2 is supported on [0, 1], it follows that X^L is supported on [a, b]. We therefore have the bound sup_{i≥1} F^{−1}…

Proof. As noted by [43], the assumptions on N and the X_i imply that L(M) = L(N), so we can take M = N. Inequality (5.67) is now immediate from (5.65). To obtain (5.68), we note the inequality (see [43]) … where we used the Cauchy-Schwarz inequality. Inequality (5.68) now follows from (5.64). Finally, we prove that the O(p^{1/2}) rate in (5.68) is optimal. Suppose, in addition to the assumptions in the statement of the theorem, that X_1, X_2, … are i.i.d. with moments of all orders and EX… (5.71) Now, since EX_1 = 0 and EX_1^2 = σ^2, as p → 0, … (5.72) Plugging (5.72) into (5.71) and performing a simple asymptotic analysis using the formula … shows that …, and so the O(p^{1/2}) rate cannot be improved.

Further proofs
Proof of Proposition 3.5. As usual, we set σ = 1 and µ = 0. The solution of the SVG(r, 1, 0) Stein equation with test function h_z(x) = 1(x ≤ z) is then given by (6.73). Setting z = 0 and differentiating (6.73) using (A.89) and (A.90) gives that … We now note that, for all ν > −1/2, … due to the asymptotic formula (A.83) and the fact that |t|^ν K_ν(|t|) is a constant multiple of the SVG(r, 1, 0) density, meaning that the integral is bounded for all x ∈ R. Then, on using the asymptotic formulas (A.83) and (A.84), we obtain f′_0(0+) = −1/(2(2ν + 1)) and f′_0(0−) = 1/(2(2ν + 1)), which proves the assertion. ✷

Proof of Proposition 3.6. As usual, we set σ = 1 and µ = 0. Consider the test function h(x) = sin(ax)/a, which is in the class H_W. Therefore, if there were a general bound of the form ‖f^{(3)}‖ ≤ M_r ‖h′‖, then we would be able to find a constant N_r > 0, independent of a, such that ‖f^{(3)}‖ ≤ N_r. We shall show that f^{(3)}(x) blows up as x → 0 for a such that ax ≪ 1 ≪ a^2 x, meaning that such a bound cannot be obtained for f^{(3)}, which proves the proposition. Before performing this analysis, we note that the second derivative h′′(x) = −a sin(ax) blows up if ax ≪ 1 ≪ a^2 x (consider the expansion sin(t) = t + O(t^3), t → 0). A bound of the form ‖f^{(3)}‖ ≤ M_{r,0}‖h‖ + M_{r,1}‖h′‖ + M_{r,2}‖h′′‖ is therefore still possible, and we know from Section 3.1.7 of [9] that this is indeed the case.
Let x > 0. We first obtain a formula for f^{(3)}(x). To this end, we note that twice differentiating the representation (3.10) of the solution, and then simplifying using the differentiation formulas (A.89) and (A.90) followed by the Wronskian formula [38], gives … Differentiating this formula then gives … where … Here, to obtain equality (6.74), we used the differentiation formulas (A.89) and (A.90) followed again by the Wronskian formula. For all ν > −1/2 and x > 0, we can use inequalities (A.94) and (A.99) to bound R_1: …
As ‖h‖ ≤ 2/a, the term R_1 does not explode when a → ∞. Applying integration by parts to (6.74), we obtain … where

(6.75)
For all ν > −1/2, we show that there exists a constant C_ν > 0, independent of x, such that A_ν(x) ≤ C_ν for all x > 0. To see this, it suffices to consider the behaviour in the limits x ↓ 0 and x → ∞. We first note that A_ν(x) → 0 as x → ∞, which follows from using the differentiation formula (A.93) followed by (A.86) and the following limiting form (see [22]): … Also, using the differentiation formula (A.93) followed by the limiting forms (A.83) and (A.84) gives that, for ν > −1/2, … as x ↓ 0, and therefore A_ν(x) is bounded as x ↓ 0, as required. We conclude that R_2 does not explode when a → ∞. Now, we use the differentiation formula (A.93) to obtain … where … where we used (A.95) and the fact that ‖h′‖ = 1 to obtain the second inequality. For ν > −1/2, the expression involving modified Bessel functions in (6.76) is uniformly bounded for all x ≥ 0, which can be seen from a straightforward analysis involving the asymptotic formulas (A.83)-(A.86). Therefore, the term R_3 does not explode when a → ∞.
In addition to x ↓ 0 and a → ∞, we let ax ↓ 0. Therefore, on using that cos(t) = 1 − t^2/2 + O(t^4) as t ↓ 0, we have that, in this regime, … If we choose a such that ax ≪ 1 ≪ a^2 x, then f^{(3)}(x) blows up in a neighbourhood of the origin, which proves the assertion. ✷

Proof of (3.27). As usual, we set σ = 1. From the formula (3.10) for the solution of the SVG(r, 1, 0) Stein equation, we have … We shall use L'Hôpital's rule to calculate I_1 and I_2. In anticipation of this, we note that … where we used the quotient rule and (A.90) in the final step. Similarly, on using (A.89), we obtain … Therefore, by L'Hôpital's rule, … where we used the asymptotic formulas (A.85) and (A.86) to compute the limits. Thus, lim_{x→∞} xf(x) = −h(∞). Similarly, by considering (3.11) instead of (3.10), we obtain lim_{x→−∞} xf(x) = h(−∞). ✷

The following lemma will be used in the proof of Proposition 4.1.
Proof of Proposition 4.1. For ease of notation, we shall set µ = 0; the extension to general µ ∈ R is obvious. Throughout this proof, Z will denote an SVG(r, σ, 0) random variable.
(i) Let r > 1. Proposition 1.2 of [48] states that if a random variable Y has Lebesgue density bounded by C, then for any random variable W, (6.77) holds. Since the SVG(r, σ, 0) distribution is unimodal about 0, it follows from (2.7) that the density is bounded above by C = Γ((r − 1)/2)/(2σ√π Γ(r/2)), which on substituting into (6.77) yields the desired bound.
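The inequality (6.77), which converts a Wasserstein bound into a Kolmogorov bound at the cost of a square root, is easy to illustrate numerically. On our reading, (6.77) is a bound of the form d_K(W, Y) ≤ √(2C d_W(W, Y)). The sketch below (the choice of normal pair is ours, purely for illustration) compares N(0, 1) and N(ε, 1), for which d_W equals the shift ε and d_K = Φ(ε/2) − Φ(−ε/2), and checks the inequality with C the supremum of the standard normal density:

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

C = 1.0 / math.sqrt(2.0 * math.pi)   # sup of the N(0,1) density

for eps in (0.5, 0.1, 0.01):
    d_wass = eps   # W_1 distance between N(0,1) and N(eps,1) equals the shift
    d_kol = std_normal_cdf(eps / 2.0) - std_normal_cdf(-eps / 2.0)
    bound = math.sqrt(2.0 * C * d_wass)
    assert d_kol <= bound
    print(eps, d_kol, bound)
```

Here d_K decays linearly in ε while the bound decays like √ε, which illustrates why Kolmogorov bounds obtained this way (as in Proposition 4.1) are typically not of optimal order.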
(ii) Here we consider the case r = 1. We begin by following the approach used in the proof of Proposition 1.2 of [48], but we need to alter the argument because the SVG(1, σ, 0) density p(x) = (1/(πσ)) K_0(|x|/σ) is unbounded as x → 0. Consider the functions h_z(x) = 1(x ≤ z), and the 'smoothed' h_{z,α}(x), defined to be one for x ≤ z, zero for x ≥ z + 2α, and linear in between. Then … We take α = (1/2)√(πσ d_W(W, Z)), which, as we assumed that σ^{−1} d_W(W, Z) < 0.676, ensures that α/σ < 0.729. This leads to the upper bound …
Similarly, we can show that … Combining these bounds proves (4.29).
(iii) Let 0 < r < 1. Then the SVG(r, σ, 0) density is unbounded as x → 0, and is a decreasing function of x for x > 0 and an increasing function for x < 0. We therefore argue as we did in part (ii): we bound P(0 ≤ Z ≤ α) and then substitute into (6.78). Let ν = (r − 1)/2, so that −1/2 < ν < 0. We have … where we used a change of variables and (A.82) in the second step and Lemma 6.1 in the third. We therefore have that, for any z ∈ R and α > 0, … To optimise, we take α = d_W(W, Z)/(2(2ν + 1)C_{ν,σ}) … As in part (ii), we can similarly obtain a lower bound, and on substituting ν = (r − 1)/2 we obtain (4.30), which completes the proof. ✷

A Properties of modified Bessel functions
Here we list standard properties of, and inequalities for, modified Bessel functions that are used throughout this paper. All formulas can be found in [38], except for the differentiation formulas (A.92)-(A.93), which can be found in [15] and [20], and the inequalities.
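Several of the formulas listed below can be sanity-checked numerically. The sketch below (pure Python; the quadrature parameters are our own choices, for illustration only) evaluates K_ν(x) through the standard integral representation K_ν(x) = ∫_0^∞ e^{−x cosh t} cosh(νt) dt, x > 0, and compares the result with the closed form K_{1/2}(x) = √(π/(2x)) e^{−x}:

```python
import math

def bessel_k(nu, x, n=10_000, t_max=30.0):
    """Modified Bessel function of the second kind via the integral
    representation K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt,
    evaluated with a simple trapezoidal rule (valid for x > 0)."""
    h = t_max / n
    total = 0.5 * math.exp(-x)   # t = 0 endpoint: cosh(0) = 1, weight 1/2
    for k in range(1, n + 1):
        t = k * h
        # the integrand decays doubly exponentially, so t_max = 30 suffices
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total

# Closed form K_{1/2}(x) = sqrt(pi / (2x)) * exp(-x) as a sanity check.
x = 1.5
approx = bessel_k(0.5, x)
exact = math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)
print(approx, exact)
```

The same routine can be used to spot-check the monotonicity and asymptotic estimates quoted in this appendix at particular values of ν and x.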
The modified Bessel function of the first kind of order ν ∈ R is defined, for x ∈ R, by … The modified Bessel function of the second kind of order ν ∈ R is defined, for x > 0, by … Applying the inequality I_{µ+1}(x) < I_µ(x), x > 0, µ > −1/2 [49], to the sixth differentiation formula of Corollary 1 of [20] gives the inequality … < I_ν(x)/x^ν, x > 0, ν > −1/2. (A.94) The next inequality follows from two applications of inequality (2.6) of [17]. For x ≥ 0, … The following bounds, which can be found in [18, 21], are used to bound the solution to