Abstract
In a recent paper, Gaunt (Ann I H Poincare Probab Stat 56:1484–1513, 2020) extended Stein’s method to limit distributions that can be represented as a function \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) of a centred multivariate normal random vector \(\Sigma ^{1/2}{\textbf{Z}}\) with \({\textbf{Z}}\) a standard d-dimensional multivariate normal random vector and \(\Sigma \) a non-negative-definite covariance matrix. In this paper, we obtain improved bounds, in the sense of weaker moment conditions, smaller constants and simpler forms, for the case that g has derivatives with polynomial growth. We obtain new non-uniform bounds for the derivatives of the solution of the Stein equation and use these inequalities to obtain general bounds on the distance, measured using smooth test functions, between the distributions of \(g({\textbf{W}}_n)\) and \(g({\textbf{Z}})\), where \({\textbf{W}}_n\) is a standardised sum of random vectors with independent components and \({\textbf{Z}}\) is a standard d-dimensional multivariate normal random vector. We apply these general bounds to obtain bounds for the Chi-square approximation of the family of power divergence statistics (special cases include the Pearson and likelihood ratio statistics), for the case of two cell classifications, that improve on existing results in the literature.
1 Introduction
Let \({\textbf{Z}}\) be a standard d-dimensional multivariate normal random vector and let \(\Sigma \in {\mathbb {R}}^{d\times d}\) be a non-negative-definite covariance matrix, so that \(\Sigma ^{1/2}{\textbf{Z}}\sim \textrm{MVN}_d({\textbf{0}},\Sigma )\). By the continuous mapping theorem, if a sequence of d-dimensional random vectors \(({\textbf{W}}_n)_{n\ge 1}\) converges in distribution to \(\Sigma ^{1/2}{\textbf{Z}}\), then, for any continuous function \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\), \((g({\textbf{W}}_n))_{n\ge 1}\) converges in distribution to \(g(\Sigma ^{1/2}{\textbf{Z}})\). In a recent work, Gaunt [17] developed Stein’s method [39] for the problem of obtaining explicit bounds on the distance between the distributions of \(g({\textbf{W}}_n)\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\), measured using smooth test functions. Henceforth, for ease of notation, we drop the subscript from \({\textbf{W}}_n\).
The basic approach used by [17] (a version of which was first used for Chi-square approximation by [33, 37]) is as follows. Consider the multivariate normal Stein equation [5, 22, 23] with test function \(h(g(\cdot ))\):
$$\begin{aligned} \nabla ^\intercal \Sigma \nabla f({\textbf{w}})-{\textbf{w}}^\intercal \nabla f({\textbf{w}})=h(g({\textbf{w}}))-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))], \end{aligned}$$(1.1)
which has solution
$$\begin{aligned} f({\textbf{w}})=-\int _0^\infty \big \{{\mathbb {E}}[h(g(\textrm{e}^{-t}{\textbf{w}}+\sqrt{1-\textrm{e}^{-2t}}\Sigma ^{1/2}{\textbf{Z}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]\big \}\,\textrm{d}t. \end{aligned}$$(1.2)
In the univariate case, with \(d=1\) and \(\Sigma =1\), the multivariate normal Stein equation (1.1) reduces to the standard normal Stein equation [39]
$$\begin{aligned} f''(w)-wf'(w)=h(g(w))-{\mathbb {E}}[h(g(Z))], \end{aligned}$$(1.3)
where \(Z\sim N(0,1)\), and the solution is given by
$$\begin{aligned} f(w)=-\int _0^\infty \big \{{\mathbb {E}}[h(g(\textrm{e}^{-t}w+\sqrt{1-\textrm{e}^{-2t}}Z))]-{\mathbb {E}}[h(g(Z))]\big \}\,\textrm{d}t. \end{aligned}$$(1.4)
The quantity of interest \(|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]|\) can now be bounded by bounding the expectation
$$\begin{aligned} |{\mathbb {E}}[\nabla ^\intercal \Sigma \nabla f({\textbf{W}})-{\textbf{W}}^\intercal \nabla f({\textbf{W}})]|. \end{aligned}$$(1.5)
Taking the supremum of (1.5) over all h in some measure-determining class of test functions \({\mathcal {H}}\) yields a bound on the distance between \(g({\textbf{W}})\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\) as measured by the integral probability metric \(d_{{\mathcal {H}}}({\textbf{W}},\Sigma ^{1/2}{\textbf{Z}}):=\sup _{h\in {\mathcal {H}}}|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]|\). In Stein's method, the following classes of test functions are often used:
$$\begin{aligned} {\mathcal {H}}_{\textrm{K}}&=\{\textbf{1}(\cdot \le z)\,:\,z\in {\mathbb {R}}\},\\ {\mathcal {H}}_{\textrm{W}}&=\{h:{\mathbb {R}}\rightarrow {\mathbb {R}}\,:\,h\text { is Lipschitz with }\Vert h'\Vert \le 1\},\\ {\mathcal {H}}_{p}&=\{h\in C^p({\mathbb {R}})\,:\,\Vert h^{(j)}\Vert \le 1,\,1\le j\le p\}, \end{aligned}$$
which induce the Kolmogorov, Wasserstein and smooth Wasserstein (\(p\ge 1\)) distances, denoted by \(d_{\textrm{K}}\), \(d_{\textrm{W}}\) and \(d_p\), respectively. As discussed by [6], in theoretical settings the \(d_p\) distance is a natural probability metric to work with, particularly in the context of quantitative limit theorems with faster convergence rates than the \(O(n^{-1/2})\) Berry–Esseen rate. Here, and throughout the paper, \(\Vert \cdot \Vert :=\Vert \cdot \Vert _\infty \) is the usual supremum norm of a real-valued function. Note that \(d_1=d_{\textrm{W}}\).
One of the main contributions of [17] was to obtain suitable bounds for solution (1.2) of the Stein equation (1.1) that hold for a large class of functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) [this is necessary in order to obtain good bounds on the quantity (1.5)]. In particular, Gaunt [17] obtained bounds for the case that the derivatives of g have polynomial growth (this covers, for example, Chi-square approximation: \(g(w)=w^2\)). These bounds were used by [17] to derive explicit bounds on the distance between the distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\) for the case that \({\textbf{W}}\) is a sum of independent random vectors with independent components, that is \({\textbf{W}}=(W_1,\ldots ,W_d)^\intercal \), where, for \(j=1,\ldots ,d\), \(W_j=n_j^{-1/2}\sum _{i=1}^{n_j}X_{ij}\), and the \(X_{ij}\) are independent random variables with zero mean and unit variance. Notably, Gaunt [17] obtained bounds with faster rates of convergence than the \(O(n^{-1/2})\) Berry–Esseen rate under additional matching moments between the \(X_{ij}\) and the standard normal distribution, and when g is an even function \((g({\textbf{w}})=g(-{\textbf{w}})\) for all \({\textbf{w}}\in {\mathbb {R}}^d\)).
The aforementioned results of [17] seem to have broad applicability, in part because many distributional approximations in probability and statistics assess the distance between the distributions of random variables that can be expressed in the form \(g({\textbf{W}})\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\), where \({\textbf{W}}\) is close in distribution to \(\Sigma ^{1/2}{\textbf{Z}}\). Indeed, applications include bounds for the Chi-square approximation of the likelihood ratio statistic [2], the family of power divergence statistics [18] and Friedman’s statistic [21], multivariate normal approximation of the maximum likelihood estimator [1], and the bounds for distributional approximation in the delta method [17].
Motivated by the broad applicability of the results of [17], in this paper we improve the results of [17] in the form of weaker moment conditions, smaller constants and simpler bounds. Future works (including [12]) that require results from Stein’s method for functions of multivariate normal approximation will reap these benefits.
In Sect. 2, we obtain non-uniform bounds on the derivatives of solution (1.2) of the Stein equation (1.1) that have smaller constants than those of [17] and improved polynomial growth rate. We achieve these improved bounds through a more focused proof than that used by [17], which had derived the bounds for the case of polynomial growth rate \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) from a more general framework. We also use the iterative technique of [10] for bounding derivatives of solutions of Stein equations to obtain bounds on the derivatives of the solution (1.2) in the univariate \(d=1\) case (which require weaker differentiability assumptions on h and g) that have an optimal w-dependence; a crude approach had been used by [17] that resulted in bounds with sub-optimal w-dependence.
In Sect. 3, we apply the bounds of Sect. 2 to obtain bounds on the distance between the distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\), where \({\textbf{W}}\) is a sum of independent random vectors with independent components and \({\textbf{Z}}\) is a standard d-dimensional multivariate normal random vector. The bounds of Theorem 3.1 improve on those of Theorems 3.2–3.5 of [17] in terms of smaller constants and weaker moment conditions. In Corollary 3.2, we provide simplified bounds without explicit constants.
In Sect. 4, we provide an application of the general bounds of Sect. 3 to Chi-square approximation. We derive explicit bounds for the Chi-square approximation of the power divergence family of statistics [9] in the case of two cell classifications. The power divergence statistic has a special structure in the case of two cell classifications, which allows us to apply the bounds of Sect. 3. Our bounds improve on existing results in the literature by holding in stronger probability metrics and having smaller constants, and our Kolmogorov distance bounds have a faster rate of convergence than the only other bounds in the literature that have the correct dependence on the cell classification probabilities. Moreover, as all our bounds possess an optimal dependence on the cell classification probabilities, we are able to demonstrate the significance of the weaker moment conditions of the bounds of Theorem 3.1, as using the general bounds of [17] would lead to bounds with a sub-optimal dependence on these probabilities; see Remark 4.3. Finally, some technical lemmas are proved in Appendix A.
Notation. The class \(C^n_b(I)\) consists of all functions \(h:I\subset {\mathbb {R}}\rightarrow {\mathbb {R}}\) such that \(h^{(n-1)}\) exists and is absolutely continuous, and h has bounded derivatives up to the n-th order. For a given P, the class \(C_P^n({\mathbb {R}}^d)\) consists of all functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that all n-th order partial derivatives of g exist and are such that, for \({\textbf{w}}\in {\mathbb {R}}^d\),
We will also consider the weaker class \(C_{P,*}^n({\mathbb {R}}^d)\), which consists of all functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that all n-th order partial derivatives of g exist and are bounded in absolute value by \(P({\textbf{w}})\) for all \({\textbf{w}}\in {\mathbb {R}}^d\). We will write \(h_n:=\sum _{j=1}^n{n \brace j}\Vert h^{(j)}\Vert \), where \({n \brace j}\) is a Stirling number of the second kind (see [31]). A standard multivariate normal random vector of dimension d will be denoted by \({\textbf{Z}}\), and, in the univariate \(d=1\) case, Z will denote a standard normal N(0, 1) random variable. Many of our bounds will be expressed in terms of the r-th absolute moment of the N(0, 1) distribution, which we denote by \(\mu _r=2^{r/2}\Gamma ((r+1)/2)/\sqrt{\pi }\).
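As a quick numerical sanity check on this notation (ours, not part of the original text), the closed form for \(\mu _r\) can be compared with direct integration of \({\mathbb {E}}|Z|^r\) against the standard normal density:

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

def mu(r):
    """Closed form: mu_r = 2^(r/2) * Gamma((r+1)/2) / sqrt(pi)."""
    return 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)

def mu_numeric(r):
    """E|Z|^r computed by integrating |z|^r against the N(0,1) density."""
    f = lambda z: abs(z) ** r * math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return simpson(f, -12.0, 12.0)

# Known special cases: mu_1 = sqrt(2/pi), mu_2 = 1, mu_4 = E[Z^4] = 3.
assert abs(mu(1) - math.sqrt(2 / math.pi)) < 1e-12
assert abs(mu(2) - 1.0) < 1e-12
assert abs(mu(4) - 3.0) < 1e-12
for r in [1, 2, 3, 4]:
    assert abs(mu(r) - mu_numeric(r)) < 1e-6
```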
2 Bounds for the Solution of the Stein Equation
In the following proposition, we provide bounds for solution (1.2) of the \(\textrm{MVN}_d({\textbf{0}},\Sigma )\) Stein equation (1.1) with test function \(h(g(\cdot ))\), which improve on results of [17].
Proposition 2.1
Let \(P({\textbf{w}})=A+B\sum _{i=1}^d|w_i|^{r_i}\), where \(r_i\ge 0\), \(i=1,\ldots ,d\). Let \(\sigma _{i,i}=(\Sigma )_{i,i}\), \(i=1,\ldots ,d\). Let \(f(=f_h)\) denote solution (1.2).
(i)
Assume that \(\Sigma \) is non-negative definite and \(h\in C_b^n({\mathbb {R}})\) and \(g\in C_P^n({\mathbb {R}}^d)\) for \(n\ge 1\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),
$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^n\partial w_{i_j}}\bigg |&\le \frac{h_n}{n}\bigg [A+B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}+\sigma _{ii}^{r_i/2}\mu _{r_i}\big )\bigg ]. \end{aligned}$$(2.6)
(ii)
Now assume that \(\Sigma \) is positive definite and \(h\in C_b^{n-1}({\mathbb {R}})\) and \(g\in C_P^{n-1}({\mathbb {R}}^d)\) for \(n\ge 2\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),
$$\begin{aligned} \bigg |\frac{\partial ^{n}f({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le \frac{\sqrt{\pi }\Gamma (\frac{n}{2})}{2\Gamma (\frac{n+1}{2})}h_{n-1}\min _{1\le l\le d} \bigg [A{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\&\quad +B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\ {}&\quad +{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l((\Sigma ^{1/2}{\textbf{Z}})_i)^{r_i}|\big )\bigg ]. \end{aligned}$$(2.7)
In the case \(\Sigma =I_d\), the \(d\times d\) identity matrix, we obtain the simplified bound:
$$\begin{aligned} \bigg |\frac{\partial ^{n}f({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le \frac{\sqrt{\pi }\Gamma (\frac{n}{2})}{2\Gamma (\frac{n+1}{2})}h_{n-1} \bigg [A+B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}+\mu _{r_i+1}\big )\bigg ]. \end{aligned}$$(2.8)
(iii)
Finally, consider the case \(d=1\) with \(\Sigma =1\). Assume that \(h\in C_b^{n-2}({\mathbb {R}})\) and \(g\in C_P^{n-2}({\mathbb {R}})\), where \(n\ge 3\) and \(P(w)=A+B|w|^r\), \(r\ge 0\). Then, for all \(w\in {\mathbb {R}}\),
$$\begin{aligned} |f^{(n)}(w)|\le h_{n-2}\big [\alpha _rA+2^{r/2}B\big (\beta _r|w|^r+\gamma _r\big )\big ], \end{aligned}$$(2.9)
where \((\alpha _r,\beta _r,\gamma _r)=(4,4,2\mu _r)\) if \(0\le r\le 1\), and \((\alpha _r,\beta _r,\gamma _r)=(r+3,r+5,(r+1)\mu _{r+1})\) if \(r>1\).
(iv)
If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i), (ii) and (iii) hold for g in the classes \(C_{P,*}^n({\mathbb {R}}^d)\), \(C_{P,*}^{n-1}({\mathbb {R}}^d)\) and \(C_{P,*}^{n-2}({\mathbb {R}})\), respectively. Moreover, under these assumptions, the polynomial growth rate in bounds (2.6)–(2.9) is optimal.
In the following proposition, we obtain bounds for \(\psi _m\), the solution of the Stein equation
where \(f(=f_h)\) is the solution (1.2). Again, the bounds improve on results of [17]. Here, and throughout this section, we suppress in the notation the dependence of the solution \(\psi _m\) on the components with respect to which f has been differentiated; we do this for ease of notation and because our bounds for \(\psi _m\) do not depend themselves on which components f has been differentiated with respect to. The Stein equation (2.10) arises in the proof of parts (iii) and (iv) of Theorem 3.1, in which faster convergence rates for the distributional approximation of \(g({\textbf{W}})\) by \(g({\textbf{Z}})\) are achieved when \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is an even function.
Proposition 2.2
Let \(P({\textbf{w}})=A+B\sum _{i=1}^d|w_i|^{r_i}\), where \(r_i\ge 0\), \(i=1,\ldots ,d\). Let \(\sigma _{i,i}=(\Sigma )_{i,i}\), \(i=1,\ldots ,d\).
(i)
Assume that \(\Sigma \) is non-negative definite and \(h\in C_b^{m+n}({\mathbb {R}})\) and \(g\in C_P^{m+n}({\mathbb {R}}^d)\) for \(m,n\ge 1\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),
$$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^n\partial w_{i_j}}\bigg |&\le \frac{h_{m+n}}{n(m+n)}\bigg [A+B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}+2\sigma _{ii}^{r_i/2}\mu _{r_i}\big )\bigg ]. \end{aligned}$$(2.11)
(ii)
Now assume that \(\Sigma \) is positive definite and \(h\in C_b^{m+n-2}({\mathbb {R}})\) and \(g\in C_P^{m+n-2}({\mathbb {R}}^d)\) for \(m,n\ge 1\) and \(m+n\ge 3\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),
$$\begin{aligned}&\bigg |\frac{\partial ^{n}\psi _{m}({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\nonumber \\ {}&\le \frac{\pi \Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{4\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}h_{m+n-2}\min _{1\le k,l\le d}\bigg [A{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\&\quad +B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|\nonumber \\&\quad +2{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l((\Sigma ^{1/2}{\textbf{Z}})_i)^{r_i}|\big )\bigg ]. \end{aligned}$$(2.12)
In the case \(\Sigma =I_d\), we obtain the simplified bound
$$\begin{aligned}{} & {} \bigg |\frac{\partial ^{n}\psi _{m}({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\nonumber \\ {}{} & {} \le \frac{\sqrt{2\pi }\Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{4\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}h_{m+n-2}\bigg [A+B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}+2\mu _{r_i+1}\big )\bigg ].\qquad \quad \end{aligned}$$(2.13) -
(iii)
Finally, consider the case \(d=1\) with \(\Sigma =1\). Assume that \(h\in C_b^{m-1}({\mathbb {R}})\) and \(g\in C_P^{m-1}({\mathbb {R}})\), where \(m\ge 2\) and \(P(w)=A+B|w|^r\), \(r\ge 0\). Then, for all \(w\in {\mathbb {R}}\),
$$\begin{aligned} |\psi _m^{(3)}(w)|\le h_{m-1}\big [{{\tilde{\alpha }}}_r A+ 3^{r/2}B\big ({{\tilde{\beta }}}_r|w|^{r}+{{\tilde{\gamma }}}_r\big )\big ], \end{aligned}$$(2.14)
where \(({{\tilde{\alpha }}}_r,{{\tilde{\beta }}}_r,{{\tilde{\gamma }}}_r)=(10,10,10\mu _{r+1})\) if \(0\le r\le 1\), and \(({{\tilde{\alpha }}}_r,{{\tilde{\beta }}}_r,{{\tilde{\gamma }}}_r)=(r^2+r+8,r^2+2r+18,(2r^2+r+5)\mu _{r+1})\) if \(r>1\).
(iv)
If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i), (ii) and (iii) hold for g in the classes \(C_{P,*}^{m+n}({\mathbb {R}}^d)\), \(C_{P,*}^{m+n-2}({\mathbb {R}}^d)\) and \(C_{P,*}^{m-1}({\mathbb {R}})\), respectively. Moreover, under these assumptions, the polynomial growth rate in the bounds (2.11)–(2.14) is optimal.
Remark 2.3
The following two-sided inequalities are helpful in understanding the behaviour of the bounds (2.7) and (2.8) of Proposition 2.1 and the bounds (2.12) and (2.13) of Proposition 2.2 for large m and n:
$$\begin{aligned} \sqrt{\frac{\pi }{2n}}<\frac{\sqrt{\pi }\Gamma (\frac{n}{2})}{2\Gamma (\frac{n+1}{2})}<\sqrt{\frac{\pi }{2n-1}}, \end{aligned}$$
and
$$\begin{aligned} \frac{\pi }{2\sqrt{n(m+n)}}<\frac{\pi \Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{4\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}<\frac{\pi }{\sqrt{(2n-1)(2(m+n)-1)}}. \end{aligned}$$
Here we used the inequalities \(\frac{\Gamma (x+1/2)}{\Gamma (x+1)}>(x+1/2)^{-1/2}\) for \(x>0\) (see the proof of Corollary 3.4 of [15]) and \(\frac{\Gamma (x+1/2)}{\Gamma (x+1)}<(x+1/4)^{-1/2}\) for \(x>-1/4\) (see [11]).
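The two gamma-ratio inequalities quoted above are easy to confirm numerically; a quick sketch (ours, not part of the paper):

```python
import math

def gamma_ratio(x):
    """Gamma(x + 1/2) / Gamma(x + 1)."""
    return math.gamma(x + 0.5) / math.gamma(x + 1.0)

# Two-sided bound: (x + 1/2)^(-1/2) < Gamma(x+1/2)/Gamma(x+1) < (x + 1/4)^(-1/2)
# for x > 0 (the upper bound in fact holds for x > -1/4).
for x in [0.1, 0.5, 1.0, 2.5, 10.0, 50.0]:
    assert (x + 0.5) ** -0.5 < gamma_ratio(x) < (x + 0.25) ** -0.5
```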
Remark 2.4
1. Inequalities (2.6)–(2.8) of Proposition 2.1 and inequalities (2.11)–(2.13) of Proposition 2.2 are sharper than the corresponding bounds given in Corollaries 2.2 and 2.3 of [17] through improved dependence on the constants \(r_1,\ldots ,r_d\), m and n. We remark that even more accurate bounds can be given; for example, a slight modification of the proof of inequality (2.6) leads to the improved bound
where \(I_{n,r}=n\int _{0}^{1}t^{n-1}(t+\sqrt{1-t^2})^r\,\textrm{d}t\), and it is clear that \(I_{n,r}\le 2^{r/2}\). This integral cannot in general be expressed in terms of elementary functions, but for given n and r can be evaluated by computational algebra packages. In specific applications in which n and \(r_1,\ldots ,r_d\) are known, this bound could be applied to improve constants. Similarly, the constant \(2^{r_i/2}\) in the bounds (2.7) and (2.8) can be improved by replacing \(2^{r_i/2}\) by \(J_{n,r_i}=\frac{2\Gamma {((n+1)/2)}}{\sqrt{\pi }\Gamma {(n/2)}}\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}(t+\sqrt{1-t^2})^{r_i}\,\textrm{d}t\), and the factor \(3^{r_i/2}\) in inequality (2.11) can be improved to: \(I_{m,n,r_i}=n(m+n)\int _{0}^{1}\int _{0}^{1}t^{m+n-1}s^{n-1}(st+t\sqrt{1-s^2}+\sqrt{1-t^2})^{r_i}\,\textrm{d}s\,\textrm{d}t\), whilst the factor \(3^{r_i/2}\) in inequalities (2.12) and (2.13) can be improved to \(J_{m,n,r_i}=\frac{4\Gamma ((n+1)/2)\Gamma ((m+n+1)/2)}{\pi \Gamma (n/2)\Gamma ((m+n)/2)}\int _{0}^{1}\int _{0}^{1}\frac{t^{m+n-1}}{\sqrt{1-t^2}}\frac{s^{n-1}}{\sqrt{1-s^2}}(st+t\sqrt{1-s^2}+\sqrt{1-t^2})^{r_i}\,\textrm{d}s\,\textrm{d}t\). For our purposes, we find it easier to work with the more explicit bounds stated in the propositions, particularly as these inequalities are used to derive inequalities (2.9) and (2.14) leading to a more efficient proof and simpler constants in the final bounds.
2. Inequality (2.9) of Proposition 2.1 and inequality (2.14) of Proposition 2.2 have a theoretically improved dependence on r compared with the corresponding bounds of [17], in that for large r they are of a smaller asymptotic order; however, for small r they may be numerically larger. More significantly, inequality (2.9) of Proposition 2.1 improves on the corresponding bound of Corollary 2.2 of [17] by improving the w-dependence of the bound from \(|w|^{r+1}\) to \(|w|^r\), whilst inequality (2.14) of Proposition 2.2 improves on the corresponding bound of Corollary 2.3 of [17] by improving the w-dependence of the bound from \(|w|^{r+2}\) to \(|w|^r\). This improved w-dependence allows us to impose weaker moment conditions in our general bounds in Theorem 3.1.
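As a sketch (ours, not part of the paper), the bound \(I_{n,r}\le 2^{r/2}\) claimed in part 1 of the remark can be checked by numerical quadrature; for instance, \(I_{2,2}=1+\pi /4\approx 1.785\le 2\):

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

def I(n, r):
    """I_{n,r} = n * int_0^1 t^(n-1) (t + sqrt(1-t^2))^r dt, computed after the
    substitution t = sin(u), which removes the square-root singularity at t = 1."""
    f = lambda u: math.sin(u) ** (n - 1) * (math.sin(u) + math.cos(u)) ** r * math.cos(u)
    return n * simpson(f, 0.0, math.pi / 2)

# Since t + sqrt(1 - t^2) <= sqrt(2) on (0, 1), it follows that I_{n,r} <= 2^(r/2).
assert abs(I(2, 2) - (1 + math.pi / 4)) < 1e-9
for n in [1, 2, 5]:
    for r in [0.5, 1, 2, 4]:
        assert I(n, r) <= 2 ** (r / 2) + 1e-9
```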
In proving Propositions 2.1 and 2.2, we will make use of the following simple lemmas. The proofs are given in Appendix A.
Lemma 2.5
Suppose \(a,b,c,x,y,z\ge 0\) and \(r\ge 0\). Then
Lemma 2.6
Let \(T_r(w)=w\textrm{e}^{w^2/2}\int _{w}^{\infty }t^r\textrm{e}^{-t^2/2}\,\textrm{d}t\).
-
1.
Suppose \(0\le r\le 1\). Then, for \(w>0\),
$$\begin{aligned} T_r(w)\le w^r. \end{aligned}$$(2.18) -
2.
Suppose \(r>1\). Then, for \(w>r-1\),
$$\begin{aligned} T_r(w)\le 2w^r. \end{aligned}$$(2.19)
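Both parts of Lemma 2.6 can be confirmed by numerical quadrature; the following sketch (ours, not part of the paper) folds the factor \(\textrm{e}^{w^2/2}\) into the integrand to avoid overflow, and truncates the tail, which is negligible beyond \(w+15\):

```python
import math

def simpson(f, a, b, n=6000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

def T(r, w):
    """T_r(w) = w e^{w^2/2} int_w^inf t^r e^{-t^2/2} dt, with the exponential
    prefactor folded into the integrand as e^{-(t^2 - w^2)/2}."""
    f = lambda t: t ** r * math.exp(-(t * t - w * w) / 2.0)
    return w * simpson(f, w, w + 15.0)

# T_1(w) = w exactly, so part 1 is sharp at r = 1.
assert abs(T(1.0, 2.0) - 2.0) < 1e-6

# Part 1: T_r(w) <= w^r for 0 <= r <= 1 and w > 0.
for r in [0.0, 0.5, 1.0]:
    for w in [0.1, 1.0, 3.0]:
        assert T(r, w) <= w ** r + 1e-6

# Part 2: T_r(w) <= 2 w^r for r > 1 and w > r - 1.
for r in [1.5, 2.0, 4.0]:
    for w in [r - 1 + 0.1, r, r + 5]:
        assert T(r, w) <= 2 * w ** r + 1e-6
```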
Proof of Proposition 2.1
(i) Suppose that \(\Sigma \) is non-negative definite. We first recall inequality (2.2) of [17], which states (under a change of variable in the integral) that, for positive definite \(\Sigma \), and \(g\in C_P^n({\mathbb {R}}^d)\) and \(h\in C_b^n({\mathbb {R}})\),
where \({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}}=t{\textbf{w}}+{\textbf{x}}\sqrt{1-t^2}\). We will denote the i-th component of \({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}}\) by \(({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}})_i\). Using inequality (2.20) with dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i}\) gives the bound
Now using inequality (2.16) of Lemma 2.5, we have
where we used basic calculus to bound \(t+\sqrt{1-t^2}\le \sqrt{2}\), \(t\in (0,1)\). Therefore,
from which inequality (2.6) follows on evaluating the integral \(\int _0^1t^{n-1}\,\textrm{d}t=1/n\) and using that \((\Sigma ^{1/2}{\textbf{Z}})_i\sim N(0,\sigma _{ii})\), so that \({\mathbb {E}}|(\Sigma ^{1/2}{\textbf{Z}})_i|^{r_i}=\sigma _{ii}^{r_i/2}\mu _{r_i}\).
(ii) Suppose now that \(\Sigma \) is positive definite. Under this assumption, we may recall inequality (2.3) of [17], which states that, for \(g\in C_P^n({\mathbb {R}}^d)\) and \(h\in C_b^n({\mathbb {R}})\),
With inequality (2.21) at hand, a similar argument to the one used in part (i) of the proof yields the bound
from which inequality (2.7) follows on evaluating \(\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}\,\textrm{d}t=\frac{\sqrt{\pi }\Gamma (n/2)}{2\Gamma ((n+1)/2)}\). Inequality (2.8) now follows by setting \(\Sigma =I_d\) in inequality (2.7) and bounding \({\mathbb {E}}|Z|=\sqrt{2/\pi }<1\).
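The evaluation \(\int _0^1t^{n-1}(1-t^2)^{-1/2}\,\textrm{d}t=\frac{\sqrt{\pi }\Gamma (n/2)}{2\Gamma ((n+1)/2)}\) used here can be verified numerically (a sketch, not part of the paper), after the substitution \(t=\sin u\):

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

def lhs(n):
    """int_0^1 t^(n-1)/sqrt(1-t^2) dt = int_0^{pi/2} sin^(n-1)(u) du (t = sin u)."""
    return simpson(lambda u: math.sin(u) ** (n - 1), 0.0, math.pi / 2)

def rhs(n):
    """Closed form: sqrt(pi) * Gamma(n/2) / (2 * Gamma((n+1)/2))."""
    return math.sqrt(math.pi) * math.gamma(n / 2) / (2 * math.gamma((n + 1) / 2))

assert abs(lhs(1) - math.pi / 2) < 1e-9   # n = 1: the integral is pi/2
for n in [1, 2, 3, 5, 8]:
    assert abs(lhs(n) - rhs(n)) < 1e-9

# E|Z| = sqrt(2/pi) < 1, the bound used to pass from (2.7) to (2.8).
assert math.sqrt(2 / math.pi) < 1
```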
(iii) Suppose now that \(d=1\) and \(\Sigma =1\). We will employ the iterative approach to bounding solutions of Stein equations of [10]. Let L denote the standard normal Stein operator given by \(Lf(w)=f''(w)-wf'(w)\). Then, as noted by [10], an induction on n gives that, for \(n\ge 3\),
From (2.22), a rearrangement and an application of the triangle inequality gives that
From inequality (2.6) and Lemma 2.1 of [17], we have the bounds
and
It now remains to bound \(|wf^{(n-1)}(w)|\), which requires more work to get the correct w-dependence. To this end, we obtain from (1.3), (1.4) and (2.22) the following representations for \(f^{(n-1)}(w)\):
We first consider the case \(0\le r\le 1\). From (2.26), we have that, for \(w>0\),
where in the final step we used inequality (2.18). Due to the equality between (2.26) and (2.27), the same argument can be used to verify that inequality (2.29) is also valid for \(w<0\). Applying inequalities (2.24), (2.25) and (2.29) to inequality (2.23) yields the bound (2.9) for \(0\le r\le 1\).
Suppose now that \(r>1\). When \(r>1\), applying inequalities (2.18) and (2.19) to (2.28) (and noting that the case \(w<0\) is dealt with similarly) yields the bound
for \(|w|>r-1\). We now bound \(|wf^{(n-1)}(w)|\) for \(|w|\le r-1\). We begin by noting that, for \(n\ge 3\), \(u(n)=\Gamma ((n-1)/2)/\Gamma (n/2)\le 2/\sqrt{\pi }\), which follows because u is a decreasing function of n on \((3,\infty )\) (see [26]) and \(u(3)=2/\sqrt{\pi }\). Using this inequality, the bound (2.8) yields that, for \(|w|\le r-1\),
From inequalities (2.30) and (2.31), we deduce that, for \(r>1\) and \(w\in {\mathbb {R}}\),
where we used that \(\mu _r\le \mu _{r+1}\) for \(r>1\). Finally, we substitute the bounds (2.24), (2.25) and (2.32) into inequality (2.23) to obtain inequality (2.9) for \(r>1\), where we again used that \(\mu _r\le \mu _{r+1}\) for \(r>1\).
(iv) In obtaining inequality (2.20), Gaunt [17] used the assumption that \(g\in C_P^n({\mathbb {R}}^d)\) to bound the absolute value of the n-th order partial derivatives of \((h\circ g)({\textbf{w}})\) by \(h_nP({\textbf{w}})\), \({\textbf{w}}\in {\mathbb {R}}^d\) (see Lemma 2.1 of [17]). However, if \(h(w)=w\), then under the assumption \(g\in C_{P,*}^n({\mathbb {R}}^d)\) we can bound the n-th order partial derivatives of \((h\circ g)({\textbf{w}})=g({\textbf{w}})\) by \(P({\textbf{w}})\), \({\textbf{w}}\in {\mathbb {R}}^d\). Also, note that if \(h(w)=w\), then \(h_n=1\). We therefore see that inequality (2.6) holds under the weaker assumption \(g\in C_{P,*}^n({\mathbb {R}}^d)\) if \(h(w)=w\). For very similar reasons, we can also weaken the conditions on g in parts (ii) and (iii) of the proposition if \(h(w)=w\).
To prove the optimality of the polynomial growth rate in the bounds (2.6)–(2.9), it suffices to consider the case \(d=1\). Let \(h(w)=w\) and \(g(w)=|w|^q\), where \(q\ge n\ge 1\). Then, \(g\in C_{P_1,*}^n({\mathbb {R}})\), where \(P_1(w)=n!|w|^{q-n}\). Using the representation (1.2) of f and the dominated convergence theorem, we have that
where \(\phi \) is the standard normal probability density function and \(\textrm{sgn}(x)\) is the sign of \(x\in {\mathbb {R}}\). A simple asymptotic analysis now gives that \(|f^{(n)}(w)|\sim (n!/q)|w|^{q-n}, \) as \(|w|\rightarrow \infty \). Thus, the \(|w|^r\) growth rate in inequality (2.6) is optimal. The optimality of the growth rate in inequalities (2.7)–(2.9) is established similarly. We observe that \(g\in C_{P_2,*}^{n-1}({\mathbb {R}})\) and \(g\in C_{P_3,*}^{n-2}({\mathbb {R}})\), where \(P_2(w)=(n-1)!|w|^{q-n+1}\) and \(P_3(w)=(n-2)!|w|^{q-n+2}\), and almost identical calculations now confirm that the growth rate in inequalities (2.7)–(2.9) is optimal. \(\square \)
Proof of Proposition 2.2
(i) Suppose that \(\Sigma \) is non-negative definite. We will make use of the following bound given in Lemma 2.4 of [17], which states (under a change of variable in the integral) that, for non-negative definite \(\Sigma \), and \(g\in C_P^{m+n}({\mathbb {R}}^d)\) and \(h\in C_b^{m+n}({\mathbb {R}})\),
where \({\textbf{z}}_{s,t,{\textbf{w}}}^{{\textbf{x}},{\textbf{y}}}=st{\textbf{w}}+t\sqrt{1-s^2}{\textbf{y}}+\sqrt{1-t^2}{\textbf{x}}\) and, here and in part (ii) of the proof, \({\textbf{Z}}'\) is an independent copy of \({\textbf{Z}}\). We will apply inequality (2.33) with dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i}\). Now, using inequality (2.17) gives that
where we used basic calculus to bound \(st +t\sqrt{1-s^2}+\sqrt{1-t^2}\le \sqrt{3}\) for \(0<s,t<1\). We therefore obtain the bound
and on evaluating the integral we deduce inequality (2.11).
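The elementary bound \(st+t\sqrt{1-s^2}+\sqrt{1-t^2}\le \sqrt{3}\) used in the argument above can be checked numerically; the sketch below (ours, not part of the paper) also confirms that the supremum \(\sqrt{3}\) is attained at \(s=1/\sqrt{2}\), \(t=\sqrt{2/3}\):

```python
import math

def phi(s, t):
    """The factor s*t + t*sqrt(1-s^2) + sqrt(1-t^2) arising in the proof."""
    return s * t + t * math.sqrt(1.0 - s * s) + math.sqrt(1.0 - t * t)

# Grid search over [0,1]^2: no grid value exceeds sqrt(3).
best = max(phi(i / 500, j / 500) for i in range(501) for j in range(501))
assert best <= math.sqrt(3) + 1e-12

# The maximum sqrt(3) is attained at s = 1/sqrt(2), t = sqrt(2/3).
assert abs(phi(1 / math.sqrt(2), math.sqrt(2 / 3)) - math.sqrt(3)) < 1e-12
```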
(ii) Suppose now that \(\Sigma \) is positive definite. Under this assumption, we recall a bound from Lemma 2.4 of [17] that states that, for \(g\in C_P^{m+n-2}({\mathbb {R}}^d)\) and \(h\in C_b^{m+n-2}({\mathbb {R}})\),
Applying inequality (2.34) to the bound (2.35) gives that
from which inequality (2.12) follows on evaluating the integral similarly to how we did in proving inequality (2.7) of Proposition 2.1. The simplified bound (2.13) now follows by setting \(\Sigma =I_d\) in inequality (2.12) and using that, for such \(\Sigma \), we have \({\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|={\mathbb {E}}|Z|=\sqrt{2/\pi }\) for \(k=1,\ldots ,d\).
(iii) We proceed similarly to the proof of inequality (2.9), using the iterative technique of [10]. Recall that \(L\psi _m(w)=f^{(m)}(w)\), where L is the standard normal Stein operator. Differentiating gives that
From (2.36) and the triangle inequality, we get that
From (2.9) and (2.13), we have the bounds
where we used that, for \(m\ge 2\), \(v(m)=\Gamma ((m+1)/2)/\Gamma (m/2+1)\le \sqrt{\pi }/2\), which follows because v is a decreasing function of m on \((2,\infty )\) (see [26]) and \(v(2)=\sqrt{\pi }/2\), in applying inequality (2.13).
It now remains to bound \(|w\psi _m''(w)|\), which requires more effort to establish a bound with the correct w-dependence. From (1.3), (1.4) and (2.36), we obtain the following representation for \(\psi _m''(w)\):
We first consider the case \(0\le r\le 1\).
Applying (2.38) and (2.40) to (2.41) gives that, for \(w\ge 0\),
where we used inequality (2.18) to bound the integral. Due to the equality between (2.41) and (2.42), it is readily seen that inequality (2.44) also holds for \(w<0\). Applying inequalities (2.38), (2.40) and (2.44) to inequality (2.37), as well as using the inequality \(2\mu _r\le 3\mu _{r+1}\), \(0\le r\le 1\), to simplify the bound, now yields (2.14) for \(0\le r\le 1\).
Suppose now that \(r>1\). We now apply inequalities (2.39) and (2.40) to (2.43) to get that, for \(|w|>r-1\),
where we used inequalities (2.18) and (2.19) to bound the integral. Now suppose that \(|w|\le r-1\). Rearranging \(L\psi _m(w)=f^{(m)}(w)\) and applying the triangle inequality gives that, for \(|w|\le r-1\),
Using inequalities (2.8) and (2.13) now gives that
where in applying inequalities (2.8) and (2.13), we used the bounds \(\Gamma ((m+1)/2)/\Gamma (m/2+1)\le \sqrt{\pi }/2\) and \(\Gamma (m/2)/\Gamma ((m+1)/2)\le 2/\sqrt{\pi }\) for \(m\ge 2\), which can be verified with the usual argument for bounding ratios of gamma functions that we have used earlier in this paper. From inequalities (2.45) and (2.46), we deduce that, for \(r>1\) and \(w\in {\mathbb {R}}\),
Finally, we substitute the bounds (2.39), (2.40) and (2.47) into inequality (2.37) to obtain inequality (2.14) for \(r>1\).
(iv) When \(h(w)=w\), the bounds in parts (i), (ii) and (iii) hold for g in the weaker classes \(C_{P,*}^{m+n}({\mathbb {R}}^d)\), \(C_{P,*}^{m+n-2}({\mathbb {R}}^d)\) and \(C_{P,*}^{m-1}({\mathbb {R}})\) for the same reason given in part (iv) of the proof of Proposition 2.1.
The optimality of the \(|w|^r\) growth rate of inequalities (2.11)–(2.14) is established similarly to part (iv) of the proof of Proposition 2.1. We demonstrate the optimality of the growth rate in inequality (2.11); similar considerations show the optimality of the growth rate in inequalities (2.12)–(2.14). Let \(d=1\), \(h(w)=w\) and \(g(w)=|w|^q\), where \(q\ge m+n\). Observe that \(g\in C_{P,*}^{m+n}({\mathbb {R}})\) with \(P(w)=(m+n)!|w|^{q-m-n}\). Also, by the dominated convergence theorem,
where we recall that \(z_{s,t,w}^{x,y}=stw+t\sqrt{1-s^2}y+\sqrt{1-t^2}x\). A simple asymptotic analysis now gives that \(|\psi _m^{(n)}(w)|\sim (m+n)!|w|^{q-m-n}/(q(q-m)), \) as \(|w|\rightarrow \infty \). Thus, the \(|w|^r\) growth rate in inequality (2.11) is optimal. \(\square \)
3 Bounds for the Distance Between the Distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\)
In this section, we obtain general bounds to quantify the quality of the distributional approximation of \(g({\textbf{W}})\) by \(g({\textbf{Z}})\), where \({\textbf{Z}}\sim \textrm{MVN}_d({\textbf{0}},I_d)\), in the setting that \({\textbf{W}}\) is a sum of independent random vectors with independent components. We shall suppose that \(X_{1,1},\ldots ,X_{n_1,1},\ldots ,X_{1,d},\ldots ,X_{n_d,d}\) are independent random variables, and define the random vector \({\textbf{W}}:=(W_1,\ldots ,W_d)^\intercal \), where \(W_j=n_j^{-1/2}\sum _{i=1}^{n_j}X_{ij}\), for \(1\le j\le d\). We shall also assume that \({\mathbb {E}}[X_{ij}^k]={\mathbb {E}}[Z^k]\) for all \(1\le i\le n_j\), \(1\le j\le d\) and all \(k\in {\mathbb {Z}}^+\) such that \(k\le p\), for some \(p\ge 2\); having three or more matching moments allows for faster convergence rates than the \(O(n^{-1/2})\) Berry–Esseen rate. As in Sect. 2, we shall suppose that the partial derivatives of g up to a specified order exist and have polynomial growth rate. To this end, we introduce the dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i},\) where A, B and \(r_1,\ldots , r_d\) are non-negative constants. In the univariate \(d=1\) case, we simplify notation, writing \(W=n^{-1/2}\sum _{i=1}^nX_i\), where \(X_1,\ldots ,X_n\) are independent random variables such that \({\mathbb {E}}[X_i^k]={\mathbb {E}}[Z^k]\) for all \(1\le i\le n\) and \(1\le k\le p\). The dominating function also takes the simpler form, \(P(w)=A+B|w|^r\).
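To make this setting concrete, the following simulation sketch (ours, not part of the paper; the Rademacher choice of the \(X_i\) is purely illustrative) matches \(p=3\) moments with N(0, 1) and compares \({\mathbb {E}}[h(g(W))]\) with \({\mathbb {E}}[h(g(Z))]\) for \(g(w)=w^2\), the Chi-square case:

```python
import math, random

random.seed(1)

def sample_W(n):
    """W = n^(-1/2) * sum_{i=1}^n X_i with X_i Rademacher (+-1), which matches
    the first p = 3 moments of N(0,1): E[X] = 0, E[X^2] = 1, E[X^3] = 0."""
    return sum(random.choice((-1.0, 1.0)) for _ in range(n)) / math.sqrt(n)

g = lambda w: w * w          # g(Z) ~ chi-square with one degree of freedom
h = lambda x: math.tanh(x)   # a smooth test function with bounded derivatives

N, n = 10000, 100
mc_W = sum(h(g(sample_W(n))) for _ in range(N)) / N
mc_Z = sum(h(g(random.gauss(0.0, 1.0))) for _ in range(N)) / N

# The two averages agree up to Monte Carlo error plus the distributional bias,
# which is small here since g is even and three moments match.
assert abs(mc_W - mc_Z) < 0.05
```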
Our general bounds are stated in the following theorem and improve on the bounds of Theorems 3.2–3.5 of [17] through smaller numerical constants and weaker moment conditions. These improvements are a result of our improved bounds of Propositions 2.1 and 2.2 on the solutions of the Stein equations (1.1) and (2.10) (the improved w-dependence of the univariate bounds results in weaker moment conditions) and some more careful simplifications in the proof of Theorem 3.1 to improve the dependence of the bounds in parts (iii) and (iv) (in which g is an even function) on the moments of the \(X_{ij}\). The rate of convergence of all bounds with respect to n is of optimal order; see [17, Proposition 3.1]. We shall let \(\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\) denote the quantity \(|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g({\textbf{Z}}))]|\). The bounds involve the constants \(c_r=\max \{1,2^{r-1}\},\) \(r\ge 0\).
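The constants \(c_r\) are presumably those of the elementary inequality \(|a+b|^r\le c_r(|a|^r+|b|^r)\) (our reading; the role of \(c_r\) is not spelled out in the text above). A quick numerical check of that inequality:

```python
import random

random.seed(0)

def c(r):
    """c_r = max(1, 2^(r-1))."""
    return max(1.0, 2.0 ** (r - 1.0))

# Check |a + b|^r <= c_r * (|a|^r + |b|^r) at randomly sampled points:
# for r <= 1 this is subadditivity of x -> x^r; for r > 1 it follows from
# convexity (Jensen's inequality).
for _ in range(10000):
    r = random.uniform(0.01, 6.0)
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    assert abs(a + b) ** r <= c(r) * (abs(a) ** r + abs(b) ** r) + 1e-9
```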
Theorem 3.1
Suppose that the above notations and assumptions prevail. Then under additional assumptions, as given below, the following bounds hold.
(i)
Suppose \({\mathbb {E}}|X_{ij}|^{r_l+p+1}<\infty \) for all i, j, l, and that \(g\in C_P^p({\mathbb {R}}^d)\) and \(h\in C_b^p({\mathbb {R}})\). Then
$$\begin{aligned}&\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\nonumber \\&\le \frac{(p+1)\sqrt{\pi }\Gamma (\frac{p+1}{2})}{2p!\Gamma (\frac{p}{2}+1)}h_p\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{1}{n_j^{(p+1)/2}}\bigg [A{\mathbb {E}}|X_{ij}|^{p+1}\nonumber \\ {}&\quad +B\sum _{k=1}^{d}2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^{p+1}{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^{p+1}X_{ik}^{r_k}|+\mu _{r_k+1}{\mathbb {E}}|X_{ij}|^{p+1}\bigg )\bigg ]. \end{aligned}$$(3.48)
(ii)
Suppose \({\mathbb {E}}|X_{i}|^{r+p+1}<\infty \) for all i, and that \(g\in C_P^{p-1}({\mathbb {R}})\) and \(h\in C_b^{p-1}({\mathbb {R}})\). Then
$$\begin{aligned} \Delta _h(g(W),g(Z))&\le \frac{(p+1)}{p!n^{(p+1)/2}}h_{p-1}\sum _{i=1}^{n}\bigg [\alpha _r A{\mathbb {E}}|X_i|^{p+1}+2^{r/2}B\nonumber \\ \bigg (c_r\beta _r\bigg ({\mathbb {E}}|X_i|^{p+1}{\mathbb {E}}|W|^r\nonumber \\&\quad +\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{r+p+1}\bigg )+\gamma _r{\mathbb {E}}|X_i|^{p+1}\bigg )\bigg ]. \end{aligned}$$(3.49)
(iii)
Suppose \({\mathbb {E}}|X_{ij}|^{r_l+p+2}<\infty \) for all i, j, l, and that \(g\in C_P^{p+2}({\mathbb {R}}^d)\) and \(h\in C_b^{p+2}({\mathbb {R}})\). In addition, suppose that \(p\ge 2\) is even and that g is an even function. Then
$$\begin{aligned}&\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\nonumber \\&\le \frac{1}{p!}h_{p+2}\bigg \{\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{1}{n_j^{p/2+1}}\frac{2p+3}{(p+1)(p+2)}\nonumber \\&\bigg [A{\mathbb {E}}|X_{ij}|^{p+2}+B\sum _{k=1}^{d}2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^{p+2}{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^{p+2}X_{ik}^{r_k}|+\mu _{r_k}{\mathbb {E}}|X_{ij}|^{p+2}\bigg )\bigg ]+\frac{3\pi \Gamma (\frac{p}{2}+2)}{8\sqrt{2}\Gamma (\frac{p+5}{2})}\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{|{\mathbb {E}}[X_{ij}^{p+1}]|}{n_j^{(p+1)/2}}\nonumber \\&\quad \times \sum _{k=1}^{d}\sum _{l=1}^{n_k}\frac{1}{n_k^{3/2}}\bigg [A{\mathbb {E}}|X_{lk}|^3+B\sum _{t=1}^{d}3^{r_t/2}\bigg (c_{r_t}{\mathbb {E}}|X_{lk}|^3{\mathbb {E}}|W_t|^{r_t}+\frac{c_{r_t}}{n_t^{r_t/2}}{\mathbb {E}}|X_{lk}^3X_{lt}^{r_t}|\nonumber \\&\quad +2\mu _{r_t+1}{\mathbb {E}}|X_{lk}|^3\bigg )\bigg ]\bigg \}. \end{aligned}$$(3.50)
(iv)
Suppose \({\mathbb {E}}|X_{i}|^{r+p+2}<\infty \) for all i, and that \(g\in C_P^{p}({\mathbb {R}})\) and \(h\in C_b^{p}({\mathbb {R}})\). In addition, suppose that \(p\ge 2\) is even and that g is an even function. Then,
$$\begin{aligned}&\Delta _h(g(W),g(Z)) \le \frac{1}{p!n^{p/2+1}}h_p\bigg \{\sum _{i=1}^{n}\frac{2p+3}{p+1}\bigg [\alpha _r A{\mathbb {E}}|X_i|^{p+2}\nonumber \\&\quad +2^{r/2}B\bigg (c_r\beta _r\bigg ({\mathbb {E}}|X_i|^{p+2}{\mathbb {E}}|W|^r\nonumber \\&\quad +\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{r+p+2}\bigg )\!+\!\gamma _r{\mathbb {E}}|X_i|^{p+2}\bigg )\bigg ]\!+\!\frac{3}{2n}\sum _{i=1}^{n}\sum _{l=1}^{n}|{\mathbb {E}}[X_i^{p+1}]|\bigg [{{\tilde{\alpha }}}_r A{\mathbb {E}}|X_l|^3\nonumber \\&\quad +3^{r/2}B\bigg (c_r{{\tilde{\beta }}}_r\bigg ({\mathbb {E}}|X_l|^3{\mathbb {E}}|W|^r +\frac{1}{n^{r/2}}{\mathbb {E}}|X_l|^{r+3}\bigg )\!+{{\tilde{\gamma }}}_r{\mathbb {E}}|X_l|^3\bigg )\bigg ]\bigg \}. \end{aligned}$$(3.51)
(v)
If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i)–(iv) hold for g in the classes \(C_{P,*}^p({\mathbb {R}}^d)\), \(C_{P,*}^{p-1}({\mathbb {R}})\), \(C_{P,*}^{p+2}({\mathbb {R}}^d)\) and \(C_{P,*}^{p}({\mathbb {R}})\), respectively.
If one only requires bounds with an explicit dependence on \(n_1,\ldots ,n_d\) and the dimension d, but not with explicit constants, then the bounds given in the following corollary may be preferable.
Corollary 3.2
Let \(r_*=\max _{1\le j\le d}r_j\) and \(n_*=\min _{1\le j\le d}n_j\), and also let \({\tilde{h}}_p=\sum _{j=1}^p\Vert h^{(j)}\Vert \). Let C be a constant that does not depend on \(n_1,\ldots ,n_d\) or d and which may change from line to line. Then
(i) Under the assumptions of part (i) of Theorem 3.1,
(ii) Under the assumptions of part (ii) of Theorem 3.1,
(iii) Under the assumptions of part (iii) of Theorem 3.1,
(iv) Under the assumptions of part (iv) of Theorem 3.1,
Remark 3.3
The univariate bounds in parts (ii) and (iv) give bounds on \(\Delta _h(g(W),g(Z))\) that hold under weaker assumptions on g and h than the bounds in parts (i) and (iii) with \(d=1\). The univariate bounds of Theorems 3.2–3.5 of [17] also enjoyed this property, but came at the cost of stronger moment conditions in the univariate case. By virtue of the improved w-dependence of our univariate bounds in Propositions 2.1 and 2.2, we were able to derive univariate bounds that do not incur this cost.
Remark 3.4
As noted by [17, Remark 3.6], if \(\Vert h^{(k)}\Vert =1\), \(1\le k\le p\), then \(h_p\le B_p\), where \(B_p=\textrm{e}^{-1}\sum _{j=1}^\infty j^p/j!\) is the p-th Bell number (see [31, Section 26.7(i)]). For example, setting \(h_p=B_p\) in inequality (3.48) gives a bound in the \(d_{p}\) metric, \(p\ge 2\). Also note that setting \(p=2\) in part (ii) gives a bound in the Wasserstein distance. To understand the behaviour of the bound (3.48) for large p, we apply the lower bound in (2.15), Stirling’s inequality \(p!>\sqrt{2\pi }p^{p+1/2}\textrm{e}^{-p}\) and the inequality \(B_p<(0.792p/\log (p+1))^p\) of [7] to obtain the bound
We note that whilst we have shown that this constant tends to zero as \(p\rightarrow \infty \), this does not imply that the bound (3.48) will tend to zero in this limit if n is fixed. This is because, for \(p\ge 2\) and \(i=1,\ldots ,n\), we have that \({\mathbb {E}}|X_i|^{p+1}\ge {\mathbb {E}}|X_i|^p={\mathbb {E}}|Z|^p=2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi }\). Similar comments apply to parts (ii)–(iv) of Theorem 3.1.
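A quick numerical sanity check of the two ingredients quoted in Remark 3.4 (our own sketch, not from the paper): Dobinski's formula for the Bell numbers and the bound \(B_p<(0.792p/\log (p+1))^p\) of Berend and Tassa [7].

```python
import math

def bell(p, terms=100):
    # Dobinski's formula: B_p = e^{-1} * sum_{j>=1} j^p / j!
    return math.exp(-1) * sum(j ** p / math.factorial(j) for j in range(1, terms))

# B_2, ..., B_6 = 2, 5, 15, 52, 203
assert [round(bell(p)) for p in range(2, 7)] == [2, 5, 15, 52, 203]

# the bound of Berend and Tassa [7] holds for each p checked
for p in range(2, 11):
    assert bell(p) < (0.792 * p / math.log(p + 1)) ** p
```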
Remark 3.5
Suppose \(n_1=\cdots =n_d=n\). For a fixed number of matching moments \(p\ge 2\), if we allow the dimension d to grow with n, the bound (3.52) of Corollary 3.2 tends to zero if \(n/d^{4/(p-1)}\rightarrow \infty \) and the bound (3.53) tends to zero if \(n/d^{6/p}\rightarrow \infty \). In particular, for \(p\ge 6\), the bound (3.52) can tend to zero even if \(d\gg n\).
We now set about proving Theorem 3.1 and Corollary 3.2. We begin with the following lemma, which provides bounds on some expectations that will appear in the proof of Theorem 3.1. In the lemma, f denotes the solution (1.2) of the Stein equation (1.1) and \(\psi _{m,j}\), for \(m\ge 1\), \(1\le j\le d\), is the solution of the Stein equation
In the univariate \(d=1\) case, we drop the subscript j and simply write \(\psi _m\) for the solution to (3.54). For \(1\le i\le n\) and \(1\le j\le d\), we let \({\textbf{W}}^{(i,j)}={\textbf{W}}-n_j^{-1/2}{\textbf{X}}_{ij}\), where the random vector \({\textbf{X}}_{ij}\) is defined to be such that it has \(X_{ij}\) as its j-th entry and the other \(d-1\) entries are equal to zero. Note that \({\textbf{W}}^{(i,j)}\) is independent of \({\textbf{X}}_{ij}\). We then define \({\textbf{W}}_\theta ^{(i,j)}={\textbf{W}}^{(i,j)}+\theta n_j^{-1/2}{\textbf{X}}_{ij}\) for some \(\theta \in (0,1)\). In the univariate case, we let \(W_\theta ^{(i)}=W^{(i)}+\theta X_{i}/\sqrt{n}\), where \(W^{(i)}=W-X_i/\sqrt{n}\).
Lemma 3.6
Let \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i},\) where \(A,B\ge 0\), and \(r_1,\ldots , r_d\ge 0\). Assume that \(\Sigma =I_d\), \(\theta \in (0,1)\) and \(q\ge 0\), \(m\ge 2\) and \(t\ge 3\). Then,
where the inequalities are for g in the classes \(C_P^t({\mathbb {R}}^d)\), \(C_P^{t-1}({\mathbb {R}}^d)\) and \(C_P^{m+1}({\mathbb {R}}^d)\), respectively. Now suppose \(d=1\) and \(\Sigma =1\). Then
where the inequalities are for g in the classes \(C_P^{t-2}({\mathbb {R}})\) and \(C_P^{m-1}({\mathbb {R}})\), respectively.
Proof
The proof is very similar to that of Lemma 3.4 of [17]. The only difference is that we improve the bounds of [17] by applying the improved bounds of Propositions 2.1 and 2.2 for the solutions f and \(\psi _m\), and we also use the inequality \(|a+b|^r\le c_r(|a|^r+|b|^r)\), \(r\ge 0\), which is sharper than the inequality \(|a+b|^r\le 2^r(|a|^r+|b|^r)\), \(r\ge 0\), used by [17]. \(\square \)
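The two inequalities compared in this proof can be sanity-checked numerically (an illustrative sketch of ours): \(c_r=\max \{1,2^{r-1}\}\) follows from subadditivity of \(t\mapsto t^r\) for \(0\le r\le 1\) and convexity for \(r\ge 1\), and is never worse than the constant \(2^r\) used by [17].

```python
import random

def c(r):
    # c_r = max(1, 2^{r-1}); sharper than the constant 2^r used in [17]
    return max(1.0, 2.0 ** (r - 1))

random.seed(2)
for _ in range(10_000):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    r = random.uniform(0.0, 4.0)
    lhs = abs(a + b) ** r
    assert lhs <= c(r) * (abs(a) ** r + abs(b) ** r) + 1e-9   # sharper bound
    assert lhs <= 2.0 ** r * (abs(a) ** r + abs(b) ** r) + 1e-9  # weaker bound
```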
Proof of Theorem 3.1
Under the assumptions of part (i) of the theorem, the following bound given in Lemma 3.1 of [17] holds:
for some \(\theta _1,\theta _2\in (0,1)\). We now obtain inequality (3.48) by using inequality (3.56) to bound the expectations in the bound (3.60), and then simplifying the resulting bound by using the basic inequalities \({\mathbb {E}}|X_{ij}|\le {\mathbb {E}}|X_{ij}|^a\le {\mathbb {E}}|X_{ij}|^b\) for \(2\le a\le b\). These inequalities follow easily from an application of Hölder’s inequality and the assumption that \({\mathbb {E}}[X_{ij}^2]=1\). We also use these inequalities to deduce that \({\mathbb {E}}|X_{ij}^{p-1}X_{ik}^{r_k}|\le {\mathbb {E}}|X_{ij}^{p+1}X_{ik}^{r_k}|\) (\(p\ge 2\), \(r_k\ge 0\)). This is verified by considering the cases \(j=k\) and \(j\not =k\) separately, where in the latter case we make use of the independence of \(X_{ij}\) and \(X_{ik}\). The proof of inequality (3.49) is similar, but we instead use inequality (3.58) to bound the expectations in (3.60).
Now, under the assumptions of part (iii) of the theorem, the following bound of Lemma 3.3 of [17] holds:
for some \(\theta _1,\ldots ,\theta _5\in (0,1)\). Using inequalities (3.55) and (3.57) to bound the expectations in the bound (3.61) and simplifying the resulting bound using the same considerations given earlier in this proof yields the bound (3.50). The proof of inequality (3.51) is similar, but we instead use the inequalities (3.58) and (3.59) to bound the expectations in (3.61). Finally, the assertion in part (v) is a consequence of part (iv) of Propositions 2.1 and 2.2. \(\square \)
Proof of Corollary 3.2
The simplified bounds are obtained by applying the following considerations to the bounds of Theorem 3.1. Firstly, for a real-valued random variable Y with \({\mathbb {E}}[Y^2]\ge 1\) and \({\mathbb {E}}|Y|^b<\infty \), we have \({\mathbb {E}}|Y|^a\le {\mathbb {E}}|Y|^b\) for any \(2\le a\le b\) (by Hölder’s inequality, \({\mathbb {E}}|Y|^a\le ({\mathbb {E}}|Y|^b)^{a/b}\le {\mathbb {E}}|Y|^b\), since \({\mathbb {E}}|Y|^b\ge ({\mathbb {E}}[Y^2])^{b/2}\ge 1\)).
Secondly, \({\mathbb {E}}|W_k|^r\le Cn_k^{-1}\sum _{i=1}^{n_k}{\mathbb {E}}|X_{ik}|^{r}\) for \(r\ge 2\), and \({\mathbb {E}}|W_k|^r\le 1\) for \(0\le r<2\). For \(r\ge 2\), this inequality follows from the Marcinkiewicz–Zygmund inequality [29] \({\mathbb {E}}|\sum _{i=1}^{n}Y_i|^r\le C_r\{\sum _{i=1}^{n}{\mathbb {E}}|Y_i|^r+(\sum _{i=1}^{n}{\mathbb {E}}[Y_i^2])^{r/2}\}\) and the inequality \({\mathbb {E}}|Y|^a\le {\mathbb {E}}|Y|^b,\) for any \(2\le a\le b\). For \(0\le r<2\), we just use Hölder’s inequality, \({\mathbb {E}}|W_k|^r\le ({\mathbb {E}}W_k^2)^{r/2}=1\). Thirdly, by Hölder’s inequality, \({\mathbb {E}}|X_{ij}^aX_{ik}^b|\le ({\mathbb {E}}|X_{ij}^{a+b}|)^{a/(a+b)} ({\mathbb {E}}|X_{ik}^{a+b}|)^{b/(a+b)} \le \max \{{\mathbb {E}}|X_{ij}|^{a+b},{\mathbb {E}}|X_{ik}|^{a+b}\}\), for \(a,b\ge 1\). \(\square \)
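The third consideration in the proof above can be checked directly on small discrete distributions (illustrative only; the distributions and names are ours): by independence, \({\mathbb {E}}|X^aY^b|={\mathbb {E}}|X|^a{\mathbb {E}}|Y|^b\), which Hölder's inequality bounds by \(\max \{{\mathbb {E}}|X|^{a+b},{\mathbb {E}}|Y|^{a+b}\}\).

```python
# E|X^a Y^b| <= max(E|X|^{a+b}, E|Y|^{a+b}) for independent X, Y and a, b >= 1,
# via Holder: E|X|^a E|Y|^b <= (E|X|^{a+b})^{a/(a+b)} (E|Y|^{a+b})^{b/(a+b)}
X = [(-1.0, 0.5), (1.0, 0.5)]          # (value, probability): Rademacher
Y = [(-2.0, 0.2), (0.5, 0.8)]

def mom(dist, r):
    # E|V|^r for a finite discrete distribution
    return sum(p * abs(v) ** r for v, p in dist)

for a, b in [(1, 1), (2, 3), (1.5, 2.5)]:
    lhs = mom(X, a) * mom(Y, b)        # = E|X^a Y^b| by independence
    assert lhs <= max(mom(X, a + b), mom(Y, a + b)) + 1e-12
```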
4 Application to Chi-square Approximation
In this section, we provide an application of the general bounds of Sect. 3 to the Chi-square approximation of the power divergence statistics. Consider a multinomial goodness-of-fit test over n independent trials, with each trial resulting in a unique classification over \(r\ge 2\) classes. We denote the observed frequencies arising in the classes by \(U_1,\ldots ,U_r\) and denote the nonzero classification probabilities by \(p_1,\ldots ,p_r\). The power divergence family of statistics, introduced by [9], is then given by
where the index parameter \(\lambda \in {\mathbb {R}}\). When \(\lambda =0,-1\), the notation (4.62) should be understood as a result of passage to the limit. The case \(\lambda =0\) corresponds to the log-likelihood ratio statistic, whilst other important special cases include the Freeman–Tukey statistic [13] (\(\lambda =-1/2\)) and Pearson’s statistic [32] (\(\lambda =1\)):
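As a concrete illustration (our own sketch; the function name is ours), the family \(T_\lambda =\frac{2}{\lambda (\lambda +1)}\sum _{j=1}^r U_j[(U_j/(np_j))^\lambda -1]\) can be evaluated for the special cases listed above, with \(\lambda =0,-1\) defined by continuity:

```python
import math

def power_divergence(u, p, n, lam):
    # T_lam = 2/(lam*(lam+1)) * sum_j u_j * ((u_j/(n p_j))^lam - 1),
    # with lam = 0 (log-likelihood ratio) and lam = -1 taken as limits
    if lam == 0:
        return 2 * sum(uj * math.log(uj / (n * pj)) for uj, pj in zip(u, p))
    if lam == -1:
        return 2 * sum(n * pj * math.log(n * pj / uj) for uj, pj in zip(u, p))
    return 2 / (lam * (lam + 1)) * sum(
        uj * ((uj / (n * pj)) ** lam - 1) for uj, pj in zip(u, p))

u, p, n = [30, 70], [0.4, 0.6], 100    # r = 2 cells, n trials
pearson = sum((uj - n * pj) ** 2 / (n * pj) for uj, pj in zip(u, p))
assert math.isclose(power_divergence(u, p, n, 1), pearson)   # lam = 1 is Pearson
print(power_divergence(u, p, n, 0))      # log-likelihood ratio statistic
print(power_divergence(u, p, n, -0.5))   # Freeman-Tukey statistic
```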
A fundamental result is that, for all \(\lambda \in {\mathbb {R}}\), the statistic \(T_\lambda \) converges in distribution to the \(\chi _{(r-1)}^2\) distribution as the number of trials n tends to infinity (see [9], p. 443). Edgeworth expansions have been used to assess the quality of the Chi-square approximation of the distribution of the statistic \(T_{\lambda }\) by [3, 4, 14, 34, 36, 40]. For \(r\ge 4\) and all \(\lambda \in {\mathbb {R}}\), Ulyanov and Zubov [40] obtained an \(O(n^{-(r-1)/r})\) bound on the rate of convergence in the Kolmogorov distance (a refinement of this result is given in [3]) and, for \(r=3\), Assylebekov et al. [4] obtained an \(O(n^{-3/4+0.065})\) bound on the rate of convergence. It has also been shown recently by [35] that introducing external randomisation can increase the speed of convergence in the Kolmogorov metric. Special cases of the family have also been considered. For the likelihood ratio statistic, Anastasiou and Reinert [2] obtained an explicit \(O(n^{-1/2})\) bound for smooth test functions (in a setting more general than that of categorical data). For Pearson’s statistic, Yarnold [41] established an \(O(n^{-(r-1)/r})\) bound, for \(r\ge 2\), on the rate of convergence in the Kolmogorov distance, which was improved to \(O(n^{-1})\) for \(r\ge 6\) by [24]. An explicit \(O(n^{-1/2})\) Kolmogorov distance bound was established using Stein’s method by [28], whilst [20] used Stein’s method to obtain explicit \(O(n^{-1})\) bounds, measured using smooth test functions. The results of [20] have recently been generalised to the power divergence statistics by [18], covering the family for \(\lambda >-1\), the largest subclass for which finite sample bounds can be obtained.
In this section, we improve the results of [18, 20] for the case of \(r=2\) cell classifications, by obtaining bounds that hold in weaker metrics and possess constants several orders of magnitude smaller. We derive our results by taking advantage of the special structure of Pearson’s statistic for the case \(r=2\), which allows us to apply the general bounds of Theorem 3.1. The special structure is that the statistic can be written as the square of a sum of i.i.d. random variables with zero mean and unit variance; see the proof of Proposition 4.1.
Proposition 4.1
(i) Let \((U_1,U_2)\) represent the vector of \(n\ge 1\) observed counts, with cell classification probabilities \(0<p_1,p_2<1\). Let \(\chi ^2\) denote Pearson’s statistic, as defined in (4.63). Then,
and, for \(h\in C_b^2({\mathbb {R}}^+)\),
where \(\chi _{(1)}^2h\) denotes the expectation \({\mathbb {E}}[h(Y)]\) for \(Y\sim \chi _{(1)}^2\).
(ii) More generally, let \(T_\lambda \), \(\lambda >-1\), denote the power divergence statistic (4.62). Then,
and, for \(h\in C_b^2({\mathbb {R}}^+)\),
Using the special structure of Pearson’s statistic when \(r=2\), and applying inequality (4.67) together with a standard smoothing technique for converting smooth test function bounds into Kolmogorov distance bounds, we deduce the following Kolmogorov distance bounds (see [8, p. 48], and also [19] for a detailed account of the technique).
Corollary 4.2
Suppose \(n\ge 1\) and \(0<p_1,p_2<1.\) Then,
Let \(\lambda >-1\). Then, there exist universal constants \(C_1,C_2,C_3>0\), independent of n, \(p_1\), \(p_2\) and \(\lambda \), such that
Remark 4.3
Order \(n^{-1/2}\) bounds for the quantity \(|{\mathbb {E}}[h(T_\lambda )]-\chi _{(r)}^2\,h|\), \(r\ge 2\), were obtained by [18] for \(h\in C_b^2({\mathbb {R}}^+)\), whilst \(O(n^{-1})\) bounds were obtained for \(h\in C_b^5({\mathbb {R}}^+)\). These bounds generalised bounds of [20] for the quantity \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\), which held for bounded h in the classes \(C_b^2({\mathbb {R}}^+)\) and \(C_b^5({\mathbb {R}}^+)\). By obtaining bounds on \(|{\mathbb {E}}[h(T_\lambda )]-\chi _{(1)}^2h|\) that hold for a wider class of functions, in Corollary 4.2 we are able to deduce a Kolmogorov distance bound for the \(\chi _{(1)}^2\) approximation of \(T_\lambda \) for two cell classifications that improves the \(O(n^{-1/10})\) bound on the rate of convergence obtained by [18] to \(O(n^{-1/5})\). This rate of convergence is slower than the optimal \(O(n^{-1/2})\) rate for the case \(r=2\), but is, to the best knowledge of the authors, the only such bound in the literature (except that of [18]) that tends to zero under the condition that \(np_*\rightarrow \infty \), where \(p_*=\textrm{min}\{p_1,p_2\}\).
Indeed, \(np_*\rightarrow \infty \) (\(p_*=\textrm{min}_{1\le j\le r}p_j\)) is an established condition under which the Chi-square approximation of Pearson’s statistic is valid [25]. The key to obtaining bounds that tend to zero under this condition was the weaker moment conditions of Theorem 3.1. If any of the absolute moments in the bounds in parts (ii) and (iv) were of larger order, we would not have been able to obtain bounds that tend to zero if \(np_*\rightarrow \infty \). As an illustrative example, using Theorem 3.5 of [17] instead of part (iv) of Theorem 3.1 would have resulted in a \(O(n^{-1})\) bound that tends to zero under the stronger condition \(n^{5/2}p_*\rightarrow \infty \).
The following result, which may be of independent interest, is used to prove Proposition 4.1.
Corollary 4.4
Suppose \(X_1,\ldots ,X_n\) are i.i.d. random variables with \({\mathbb {E}}[X_1]=0\), \({\mathbb {E}}[X_1^2]=1\) and \({\mathbb {E}}[X_1^4]<\infty \). Let \(W=n^{-1/2}\sum _{i=1}^nX_i\). Then,
Suppose now that \({\mathbb {E}}[X_1^6]<\infty \). Then, for \(h\in C_b^2({\mathbb {R}}^+)\),
Remark 4.5
The bound (4.71) improves on the bound of Theorem 3.1 of [20] for the quantity \(|{\mathbb {E}}[h(W^2)]-\chi _{(1)}^2h|\) by holding for a larger class of test functions, having weaker moment conditions, and also possessing smaller numerical constants.
Proof
To prove inequality (4.70), we apply part (ii) of Theorem 3.1. We have \(g(w)=w^2\) and, since \(g'(w)=2w\), we take \(P(w)=2|w|\) as our dominating function. Applying inequality (3.49) (taking note of Remark 3.4 to get a Wasserstein distance bound) with \(A=0, B=2, r=1\) and \(p=2\), and using that \({\mathbb {E}}|W|\le ({\mathbb {E}}[W^2])^{1/2}=1\) and \(\mu _1=\sqrt{2/\pi }\), we obtain the bound (4.70), after rounding numerical constants up to the nearest integer.
We now prove inequality (4.71). Since \(g(w)=w^2\) is an even function, we can apply part (iv) of Theorem 3.1 to obtain a \(O(n^{-1})\) bound. We have \((g'(w))^2=4w^2\) and \(g''(w)=2\), and so take \(P(w)=2+4w^2\) as our dominating function. Applying inequality (3.51) with \(A=2, B=4, r=2\) and \(p=2,\) and using that \({\mathbb {E}}[W^2]=1\) and \(\mu _3=2\sqrt{2/\pi }\), now yields inequality (4.71). \(\square \)
We will use the following lemma in our proof of Proposition 4.1. The proof, which involves an application of Theorem 3.1, is deferred until the end of the section.
Lemma 4.6
Let \(X_1,\ldots ,X_n\) be i.i.d. random variables with \(X_1=_d(I-p_1)/\sqrt{p_1p_2}\), where \(I\sim \textrm{Ber}(p_1)\). Let \(W=n^{-1/2}\sum _{i=1}^nX_i\). Then, for \(h\in C_b^2({\mathbb {R}}^+)\),
Proof of Proposition 4.1
(i) We begin by proving inequalities (4.64) and (4.65). Let \(p=p_1\), so that \(p_2=1-p\). As \(U_1+U_2=n\), a short calculation gives that
Since \(U_1\sim \textrm{Bin}(n,p)\), it can be expressed as a sum of i.i.d. indicator random variables: \(U_1=\sum _{i=1}^nI_i\), where \(I_i\sim \textrm{Ber}(p)\). We can therefore write \(\chi ^2=W^2\), where \(W=n^{-1/2}\sum _{i=1}^n X_i\), for \(X_1,\ldots , X_n\) i.i.d. random variables with \(X_1=(I_1-p)/\sqrt{p(1-p)}\). We have that \({\mathbb {E}}[X_1]=0\), \({\mathbb {E}}[X_1^2]=1\) and \(|{\mathbb {E}}[X_1^m]|\le {\mathbb {E}}|X_1|^m\le (p(1-p))^{1-m/2}\) for all \(m\ge 2\). The assumptions for inequalities (4.70) and (4.71) therefore hold. Plugging these moment bounds into (4.70) and rounding constants up to the nearest integer then yield the bound
We obtain the simplified bound (4.64) as follows. Let \(Y\sim \chi _{(1)}^2\). Observe that, for \(h\in {\mathcal {H}}_{\textrm{W}}\),
and so
Now, the upper bound in (4.73) is greater than the upper bound in (4.74) if \(\sqrt{np_1p_2}<17\), and so we may take \(\sqrt{np_1p_2}\ge 17\) in (4.73), and doing so yields the bound (4.64). Similarly, we obtain inequality (4.65) by plugging the moment bounds into (4.71) and simplifying the bound as we did in deriving inequality (4.64).
(ii) To prove inequality (4.66), we use the following bound, which can be found by examining the proof of Theorem 2.3 of [18]:
Using inequality (4.64) to bound \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\) and also using the inequality \(1/\sqrt{p_1}+1/\sqrt{p_2}\le \sqrt{2/(p_1p_2)}\) now yield the desired bound. We note that inequality (4.75) was derived under the assumption that \(np_*\ge 1\) and that if \(\lambda \ge 2\), then also \(np_*\ge 2(\lambda -2)^2\), where \(p_*=\textrm{min}\{p_1,p_2\}\). However, if these assumptions are not satisfied, then the bound in (4.66) exceeds 2 [the upper bound in (4.74)], and so there is no need to impose these conditions in the statement of the theorem; similar comments apply to inequality (4.76) below.
Finally, we prove inequality (4.67). We use the following bound, which can be found by examining the proof of Theorem 2.2 of [18]:
where
and \(S_j=(U_j-np_j)/\sqrt{np_j}\), \(j=1,2\). To bound R, we express \(S_1\) and \(S_2\) in terms of W. Using the representation of \(W^2=\chi ^2\) given at the beginning of the proof and the alternative representation \(W=(U_2-np_2)/{\sqrt{np_1p_2}}\), which is obtained using a similar calculation, we have the representations \(S_1=\sqrt{p_2}W\) and \(S_2=\sqrt{p_1}W. \) Therefore,
where we used Lemma 4.6 to obtain the first inequality. We now obtain inequality (4.67) by substituting inequality (4.77) and our bound (4.65) for \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\) into (4.76), and simplifying the resulting bound using the formula \(1/p_1+1/p_2= 1/(p_1p_2)\) and the inequality \((1/\sqrt{p_1}+1/\sqrt{p_2})^2\le 2/(p_1p_2)\), for \(0<p_1,p_2<1\) such that \(p_1+p_2=1\). \(\square \)
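The representations used in this proof are easy to confirm numerically (an illustrative check of ours, not part of the proof; the sign of \(S_2\) relative to W depends on the convention chosen for W, but only even functions of these quantities enter the bounds):

```python
import math
import random

random.seed(0)
n, p1 = 200, 0.3
p2 = 1 - p1
I = [1 if random.random() < p1 else 0 for _ in range(n)]
U1, U2 = sum(I), n - sum(I)

# Pearson's statistic for r = 2 and W = n^{-1/2} sum_i (I_i - p1)/sqrt(p1 p2)
chi2 = (U1 - n * p1) ** 2 / (n * p1) + (U2 - n * p2) ** 2 / (n * p2)
W = sum((i - p1) / math.sqrt(p1 * p2) for i in I) / math.sqrt(n)
S1 = (U1 - n * p1) / math.sqrt(n * p1)
S2 = (U2 - n * p2) / math.sqrt(n * p2)

assert math.isclose(chi2, W * W)                       # chi^2 = W^2
assert math.isclose(S1, math.sqrt(p2) * W)             # S_1 = sqrt(p2) W
assert math.isclose(abs(S2), math.sqrt(p1) * abs(W))   # |S_2| = sqrt(p1) |W|
```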
Proof of Corollary 4.2
We first prove inequality (4.68). Recall from the proof of Proposition 4.1 that when \(r=2\) we can write Pearson’s statistic as \(\chi ^2=W^2\), where \(W=n^{-1/2}\sum _{i=1}^{n}X_{i}\) for \(X_{1},\ldots ,X_{n}\) i.i.d. random variables with \(X_1=_d(I_1-p)/\sqrt{p(1-p)}\) and \(I_1\sim \textrm{Ber}(p)\). Recall also that if \(Z\sim N(0,1)\) then \(Z^2\sim \chi _{(1)}^2\). Thus, for any \(z>0\),
from which we deduce that \( d_{\textrm{K}}({\mathcal {L}}(\chi ^2),\chi ^2_{(1)})\le 2d_{\textrm{K}}({\mathcal {L}}(W),{\mathcal {L}}(Z))\). Now, using the Berry–Esseen theorem with the best available numerical constant of \(C=0.4748\) [38], we get:
We now prove inequality (4.69). Let \(\alpha >0\), and for fixed \(z>0\) define a function \(h_\alpha :{\mathbb {R}}^+\rightarrow [0,1]\) by \(h_\alpha (x)=1\) if \(x\le z\); \(h_\alpha (x)=1-2(x-z)^2/\alpha ^2\) if \(z<x\le z+\alpha /2\); \(h_\alpha (x)=2(x-(z+\alpha ))^2/\alpha ^2\) if \(z+\alpha /2<x\le z+\alpha \); and \(h_\alpha (x)=0\) if \(x\ge z+\alpha \). Then \(h_\alpha '\) is Lipschitz with \(\Vert h_\alpha '\Vert =2/\alpha \) and \(\Vert h_\alpha ''\Vert =4/\alpha ^2\). Let \(Y\sim \chi _{(1)}^2\). Using inequality (4.67) now yields
We note the inequality \({\mathbb {P}}(z\le Y\le z+\alpha )\le \sqrt{2\alpha /\pi }\) (see [20, p. 754]). An upper bound now follows on applying this inequality to (4.78) and choosing \(\alpha = c(np_1p_2)^{-2/5}\), for some universal constant \(c>0\). To simplify the bound, we used the basic inequality \(|\lambda -1|\le 1+(\lambda -1)^2\). Similarly, we can obtain a lower bound that is the negative of the upper bound, which completes the proof. \(\square \)
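The smoothing function \(h_\alpha \) used in this proof can be written out directly (a sketch for illustration; the numerical derivative check is ours): it interpolates the indicator \({\textbf{1}}\{x\le z\}\) by two quadratic pieces over \([z,z+\alpha ]\), and its steepest slope, attained at the midpoint, is \(2/\alpha \).

```python
def h_alpha(x, z, a):
    # piecewise-quadratic interpolation of the indicator 1{x <= z},
    # decreasing smoothly from 1 to 0 over [z, z + a]
    if x <= z:
        return 1.0
    if x <= z + a / 2:
        return 1.0 - 2 * (x - z) ** 2 / a ** 2
    if x <= z + a:
        return 2 * (x - (z + a)) ** 2 / a ** 2
    return 0.0

z, a = 1.0, 0.5
assert h_alpha(z, z, a) == 1.0 and h_alpha(z + a, z, a) == 0.0
assert abs(h_alpha(z + a / 2, z, a) - 0.5) < 1e-12   # continuous at the join
# the steepest slope occurs at the midpoint, where |h_alpha'| = 2/a
eps = 1e-6
slope = (h_alpha(z + a / 2 + eps, z, a) - h_alpha(z + a / 2 - eps, z, a)) / (2 * eps)
assert abs(slope + 2 / a) < 1e-4
```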
Proof of Lemma 4.6
Let \(g(w)=w^3h'(w^2)\). We begin by noting that \({\mathbb {E}}[g(Z)]={\mathbb {E}}[Z^3h'(Z^2)]=0\). This is because the standard normal distribution is symmetric about the origin (\(Z=_d -Z\), for \(Z\sim N(0,1)\)) and g(w) is an odd function (\(g(w)=-g(-w)\) for all \(w\in {\mathbb {R}}\)), meaning that \({\mathbb {E}}[g(Z)]={\mathbb {E}}[g(-Z)]=-{\mathbb {E}}[g(Z)]\), whence \({\mathbb {E}}[g(Z)]=0\). Therefore, we can write \({\mathbb {E}}[W^3h'(W^2)]={\mathbb {E}}[W^3h'(W^2)]-{\mathbb {E}}[Z^3h'(Z^2)]\).
We shall obtain the bound (4.72) by applying part (ii) of Theorem 3.1 with \(h(w)=w\) and \(g(w)=w^3\,h'(w^2)\). Using the basic inequality \(2a^2\le 1+a^4\), we have, for \(w\in {\mathbb {R}}\),
We therefore can apply part (ii) of Theorem 3.1 with \(A=3\Vert h'\Vert /2\), \(B=2\Vert h''\Vert +3\Vert h'\Vert /2\), \(r=4\) and \(p=2\), and using the bound (3.49) gives that
where we used that \(\mu _5=8\sqrt{2/\pi }\) and that \({\mathbb {E}}|X_1^3|\le (p_1p_2)^{-1/2}\), \({\mathbb {E}}|X_1^7|\le (p_1p_2)^{-5/2}\) and \({\mathbb {E}}[W^4]=3(n-1)({\mathbb {E}}[X_1^2])^2/n+{\mathbb {E}}[X_1^4]/n\le 3+1/(np_1p_2)\). We obtain the final simplified bound (4.72) using a similar argument to the one used in the proof of Proposition 4.1. \(\square \)
References
Anastasiou, A., Gaunt, R.E.: Multivariate normal approximation of the maximum likelihood estimator via the delta method. Braz. J. Probab. Stat. 34, 136–149 (2020)
Anastasiou, A., Reinert, G.: Bounds for the asymptotic distribution of the likelihood ratio. Ann. Appl. Probab. 30, 608–643 (2020)
Assylebekov, Z.A.: Convergence rate of multinomial goodness-of-fit statistics to chi-square distribution. Hiroshima Math. J. 40, 115–131 (2010)
Assylebekov, Z.A., Zubov, V.N., Ulyanov, V.V.: On approximating some statistics of goodness-of-fit tests in the case of three-dimensional discrete data. Sib. Math. J. 52, 571–584 (2011)
Barbour, A.D.: Stein’s method for diffusion approximations. Probab. Theory Relat. 84, 297–322 (1990)
Barbour, A.D., Hall, P.: On bounds to the rate of convergence in the central limit theorem. Bull. Lond. Math. Soc. 17, 151–156 (1985)
Berend, D., Tassa, T.: Improved bounds on Bell numbers and on moments of sums of random variables. Probab. Math. Stat. 30, 185–205 (2010)
Chen, L.H.Y., Goldstein, L., Shao, Q.-M.: Normal Approximation by Stein’s Method. Springer, Berlin (2011)
Cressie, N., Read, T.R.C.: Multinomial goodness-of-fit tests. J. R. Stat. Soc. B Methodol. 46, 440–464 (1984)
Döbler, C., Gaunt, R.E., Vollmer, S.J.: An iterative technique for bounding derivatives of solutions of Stein equations. Electron. J. Probab. 22, 96 (2017)
Elezović, N., Giordano, C., Pečarić, J.: The best bounds in Gautschi’s inequality. Math. Inequal. Appl. 3, 239–252 (2000)
Fischer, A., Gaunt, R.E., Reinert, G., Swan, Y.: Normal approximation for the posterior in exponential families. arXiv:2209.08806 (2022)
Freeman, M.F., Tukey, J.W.: Transformations related to the angular and the square root. Ann. Math. Stat. 21, 607–611 (1950)
Fujikoshi, Y., Ulyanov, V.V.: Non-asymptotic Analysis of Approximations for Multivariate Statistics. Springer Briefs (2020)
Gaunt, R.E.: Inequalities for modified Bessel functions and their integrals. J. Math. Anal. Appl. 420, 373–386 (2014)
Gaunt, R.E.: Rates of convergence in normal approximation under moment conditions via new bounds on solutions of the Stein equation. J. Theoret. Probab. 29, 231–247 (2016)
Gaunt, R.E.: Stein’s method for functions of multivariate normal random variables. Ann. I. H. Poincare Probab. Stat. 56, 1484–1513 (2020)
Gaunt, R.E.: Bounds for the chi-square approximation of the power divergence family of statistics. J. Appl. Probab. 59, 1059–1080 (2022)
Gaunt, R.E., Li, S.: Bounding Kolmogorov distances through Wasserstein and related integral probability metrics. J. Math. Anal. Appl. 514, 126274 (2023)
Gaunt, R.E., Pickett, A., Reinert, G.: Chi-square approximation by Stein’s method with application to Pearson’s statistic. Ann. Appl. Probab. 27, 720–756 (2017)
Gaunt, R.E., Reinert, G.: Bounds for the chi-square approximation of Friedman’s statistic by Stein’s method. Bernoulli (2023)
Goldstein, L., Rinott, Y.: Multivariate normal approximations by Stein’s method and size bias couplings. J. Appl. Probab. 33, 1–17 (1996)
Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739 (1991)
Götze, F., Ulyanov, V.V.: Asymptotic distribution of \(\chi ^{2}\)-type statistics. Preprint 03–033 Research group Spectral analysis, asymptotic distributions and stochastic dynamics, Bielefeld Univ., Bielefeld (2003)
Greenwood, P.E., Nikulin, M.S.: A Guide to Chi-Squared Testing. Wiley, New York (1996)
Ismail, M.E.H., Lorch, L., Muldoon, M.E.: Completely monotonic functions associated with the gamma function and its q-analogues. J. Math. Anal. Appl. 116, 1–9 (1986)
Jameson, G.J.O.: The incomplete gamma functions. Math. Gazette 100, 298–306 (2016)
Mann, B.: Convergence rate for a \(\chi ^{2}\) of a multinomial. Unpublished manuscript (1997)
Marcinkiewicz, J., Zygmund, A.: Sur les fonctions indépendantes. Fundam. Math. 29, 60–90 (1937)
Natalini, P., Palumbo, B.: Inequalities for the incomplete gamma function. Math. Inequal. Appl. 3, 69–77 (2000)
Olver, F.W.J., Lozier, D.W., Boisvert, R.F., Clark, C.W.: NIST Handbook of Mathematical Functions. Cambridge University Press (2010)
Pearson, K.: On the criterion that a given system of deviations is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 50, 157–175 (1900)
Pickett, A.: Rates of Convergence of \(\chi ^2\) Approximations via Stein’s Method. DPhil Thesis, University of Oxford (2004)
Prokhorov, Y.V., Ulyanov, V.V.: Some approximation problems in statistics and probability. In: Eichelsbacher, P. et al (eds.) Limit Theorems in Probability, Statistics and Number Theory, Springer Proceedings in Mathematics & Statistics, vol. 42, pp. 235–249 (2013)
Puchkin, N., Ulyanov, V.: Inference via randomized test statistics. Ann. I. H. Poincare Probab. Stat. (2023)
Read, T.R.C.: Closer asymptotic approximations for the distributions of the power divergence goodness-of-fit statistics. Ann. I. Stat. Math. 36, 59–69 (1984)
Reinert, G.: Three general approaches to Stein’s method. In: Barbour, A.D., Chen, L.H.Y. (eds.), An Introduction to Stein’s Method. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, vol. 4. Singapore Univ. Press, Singapore, pp. 183–221 (2005)
Shevtsova, I.: On the absolute constants in the Berry–Esseen type inequalities for identically distributed summands. arXiv:1111.6554 (2011)
Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2. Univ. California Press, Berkeley, pp. 583–602 (1972)
Ulyanov, V.V., Zubov, V.N.: Refinement on the convergence of one family of goodness-of-fit statistics to chi-squared distribution. Hiroshima Math. J. 39, 133–161 (2009)
Yarnold, J.K.: Asymptotic approximations for the probability that a sum of lattice random vectors lies in a convex set. Ann. Math. Stat. 43, 1566–1580 (1972)
Acknowledgements
HS is supported by an EPSRC PhD Studentship.
Author information
Authors and Affiliations
Contributions
Both authors carried out the research, wrote the manuscript and reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Further Proofs
Proof of Lemma 2.5
The case \(r=0\) is trivial. Let us prove inequality (2.16) for integer \(r\ge 1\). By the binomial theorem,
where the inequality follows from an application of the basic inequality \(x^ky^{r-k}\le x^r+y^r\). The proof for general \(r>0\) is exactly the same except that we must instead use the generalised binomial theorem. The proof of inequality (2.17) is also very similar, with the same basic argument, but instead we apply the generalised trinomial theorem and the basic inequality \(x^iy^jz^k\le x^r+y^r+z^r\) for \(i,j,k\ge 0\) such that \(i+j+k=r\). \(\square \)
Proof of Lemma 2.6
Making the change of variables \(u=t^2/2\) allows us to write \(T_r(w)\) in terms of the upper incomplete gamma function \(\Gamma (a,x)=\int _{x}^{\infty }u^{a-1}\textrm{e}^{-u}\,\textrm{d}u\):
We now note two upper bounds for the incomplete gamma function. If \(0<a\le 1\), then \(\Gamma (a,x)\le x^{a-1}\textrm{e}^{-x}\) for all \(x>0\) (see [27]), whilst if \(a>1\), then \(\Gamma (a,x)\le Bx^{a-1}\textrm{e}^{-x}\) for \(x>\frac{B}{B-1}(a-1)\) and \(B>1\) (see [30]). For \(0\le r\le 1\), we apply the bound of [27] to (A.1) to obtain inequality (2.18). To obtain inequality (2.19), we instead apply the inequality of [30] with \(B=2\) to (A.1). \(\square \)
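The two incomplete gamma bounds quoted above can likewise be checked numerically. The sketch below (an illustration under our own assumptions: a stdlib-only composite Simpson rule with a truncated integration tail, and the helper name `upper_inc_gamma` is ours) evaluates \(\Gamma(a,x)\) and compares it with the bound \(x^{a-1}\textrm{e}^{-x}\) for \(0<a\le 1\), and with \(2x^{a-1}\textrm{e}^{-x}\) in the \(B=2\) case, valid for \(x>2(a-1)\):

```python
import math

def upper_inc_gamma(a, x, tail=60.0, n=20000):
    # Composite Simpson approximation of Gamma(a, x) = int_x^inf u^{a-1} e^{-u} du,
    # truncated at x + tail; the integrand decays like e^{-u}, so the truncation
    # error is negligible for tail around 60.
    b = x + tail
    h = (b - x) / n
    f = lambda u: u ** (a - 1) * math.exp(-u)
    s = f(x) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(x + i * h)
    return s * h / 3

if __name__ == "__main__":
    # 0 < a <= 1: Gamma(a, x) <= x^{a-1} e^{-x} for all x > 0 (bound of [27]).
    a, x = 0.5, 1.0
    assert upper_inc_gamma(a, x) <= x ** (a - 1) * math.exp(-x)
    # a > 1, B = 2: Gamma(a, x) <= 2 x^{a-1} e^{-x} for x > 2(a - 1) (bound of [30]).
    a, x = 2.0, 3.0
    assert upper_inc_gamma(a, x) <= 2 * x ** (a - 1) * math.exp(-x)
    # Cross-check the quadrature against the closed form Gamma(2, x) = (1 + x) e^{-x}.
    assert abs(upper_inc_gamma(2.0, 3.0) - 4 * math.exp(-3)) < 1e-8
    print("incomplete gamma bounds verified")
```

The closed-form cross-check \(\Gamma(2,3)=4\textrm{e}^{-3}\) confirms the quadrature is accurate enough for the comparison to be meaningful.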
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Gaunt, R.E., Sutcliffe, H. Improved Bounds in Stein’s Method for Functions of Multivariate Normal Random Vectors. J Theor Probab 37, 642–670 (2024). https://doi.org/10.1007/s10959-023-01257-6
Keywords
- Stein’s method
- Functions of multivariate normal random vectors
- Multivariate normal approximation
- Chi-square approximation
- Rate of convergence
- Power divergence statistic