1 Introduction

Let \({\textbf{Z}}\) be a standard d-dimensional multivariate normal random vector and let \(\Sigma \in {\mathbb {R}}^{d\times d}\) be a non-negative-definite covariance matrix, so that \(\Sigma ^{1/2}{\textbf{Z}}\sim \textrm{MVN}_d({\textbf{0}},\Sigma )\). By the continuous mapping theorem, if a sequence of d-dimensional random vectors \(({\textbf{W}}_n)_{n\ge 1}\) converges in distribution to \(\Sigma ^{1/2}{\textbf{Z}}\), then, for any continuous function \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\), \((g({\textbf{W}}_n))_{n\ge 1}\) converges in distribution to \(g(\Sigma ^{1/2}{\textbf{Z}})\). In a recent work, Gaunt [17] developed Stein’s method [39] for the problem of obtaining explicit bounds on the distance between the distributions of \(g({\textbf{W}}_n)\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\), measured using smooth test functions. Henceforth, for ease of notation, we drop the subscript from \({\textbf{W}}_n\).

The basic approach used by [17] (a version of which was first used for Chi-square approximation by [33, 37]) is as follows. Consider the multivariate normal Stein equation [5, 22, 23] with test function \(h(g(\cdot ))\):

$$\begin{aligned} \nabla ^\intercal \Sigma \nabla f({\textbf{w}})-{\textbf{w}}^\intercal \nabla f({\textbf{w}})=h(g({\textbf{w}}))-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))], \end{aligned}$$
(1.1)

which has solution

$$\begin{aligned} f_h({\textbf{w}})=-\int _{0}^{1}\frac{1}{t}\big \{{\mathbb {E}}[h(g(t{\textbf{w}}+\sqrt{1-t^2}\Sigma ^{1/2}{\textbf{Z}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]\big \}\,\textrm{d}t. \end{aligned}$$
(1.2)

In the univariate case, with \(d=1\) and \(\Sigma =1\), the multivariate normal Stein equation (1.1) reduces to the standard normal Stein equation [39]

$$\begin{aligned} f''(w)-wf'(w)=h(g(w))-{\mathbb {E}}[h(g(Z))], \end{aligned}$$

where \(Z\sim N(0,1)\), and the solution is given by

$$\begin{aligned} f_h'(w)&=-\textrm{e}^{w^2/2}\int _w^\infty \{h(t)-{\mathbb {E}}[h(g(Z))]\}\textrm{e}^{-t^2/2}\,\textrm{d}t \end{aligned}$$
(1.3)
$$\begin{aligned}&=\textrm{e}^{w^2/2}\int _{-\infty }^w\{h(t)-{\mathbb {E}}[h(g(Z))]\}\textrm{e}^{-t^2/2}\,\textrm{d}t. \end{aligned}$$
(1.4)
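As a quick sanity check (our own illustration, with the choices \(g(w)=w^2\) and \(h=\tanh \) being ours and assuming NumPy is available), the solution (1.2) can be evaluated by numerical quadrature in the univariate case and verified to satisfy the Stein equation above, up to quadrature and finite-difference error.

```python
# Illustrative check (not from the paper): in the case d = 1, Sigma = 1,
# approximate f_h(w) from (1.2) by Gaussian quadrature and verify numerically
# that f''(w) - w f'(w) = h(g(w)) - E[h(g(Z))] for g(w) = w^2 and h = tanh.
import numpy as np

g, h = lambda w: w**2, np.tanh                    # our illustrative choices
zk, zw = np.polynomial.hermite_e.hermegauss(80)   # nodes/weights for E[.], Z ~ N(0,1)
zw = zw / zw.sum()                                # normalise weights to sum to 1
tk, tw = np.polynomial.legendre.leggauss(200)     # nodes/weights for the t-integral
tk, tw = 0.5 * (tk + 1), 0.5 * tw                 # map from (-1, 1) to (0, 1)

Ehgz = np.sum(zw * h(g(zk)))                      # E[h(g(Z))]

def f(w):
    # quadrature approximation to the solution (1.2)
    inner = np.array([np.sum(zw * h(g(t * w + np.sqrt(1 - t**2) * zk))) for t in tk])
    return -np.sum(tw * (inner - Ehgz) / tk)

for w in [-1.5, 0.3, 2.0]:
    e = 1e-4                                      # finite-difference step
    f1 = (f(w + e) - f(w - e)) / (2 * e)
    f2 = (f(w + e) - 2 * f(w) + f(w - e)) / e**2
    print(f2 - w * f1, h(g(w)) - Ehgz)            # the two columns should roughly agree
```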

The quantity of interest \(|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]|\) can now be bounded by bounding the expectation

$$\begin{aligned} {\mathbb {E}}[\nabla ^\intercal \Sigma \nabla f_h({\textbf{W}})-{\textbf{W}}^\intercal \nabla f_h({\textbf{W}})]. \end{aligned}$$
(1.5)

Taking the supremum of (1.5) over all h in some measure-determining class of test functions \({\mathcal {H}}\) yields a bound on the distance between \(g({\textbf{W}})\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\) as measured by the integral probability metric \(d_{{\mathcal {H}}}({\textbf{W}},\Sigma ^{1/2}{\textbf{Z}}):=\sup _{h\in {\mathcal {H}}}|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g(\Sigma ^{1/2}{\textbf{Z}}))]|\). In Stein’s method, the following classes of test functions are often used:

$$\begin{aligned} {\mathcal {H}}_{\textrm{K}}&=\{{\textbf{1}}(\cdot \le z)\,:\,z\in {\mathbb {R}}\}, \\ {\mathcal {H}}_{\textrm{W}}&=\{h:{\mathbb {R}}\rightarrow {\mathbb {R}}\,:\, h\text { is Lipschitz with }\Vert h'\Vert \le 1\}, \\ {\mathcal {H}}_p&=\{h:{\mathbb {R}}\rightarrow {\mathbb {R}}\,:\, h^{(p-1)}\text { is Lipschitz with }\Vert h^{(k)}\Vert \le 1, 1\le k\le p\}, \end{aligned}$$

which induce the Kolmogorov, Wasserstein and smooth Wasserstein (\(p\ge 1\)) distances, denoted by \(d_{\textrm{K}}\), \(d_{\textrm{W}}\) and \(d_p\), respectively. As discussed by [6], in theoretical settings the \(d_p\) distance is a natural probability metric to work with, particularly in the context of quantitative limit theorems with faster convergence rates than the \(O(n^{-1/2})\) Berry–Esseen rate. Here, and throughout the paper, \(\Vert \cdot \Vert :=\Vert \cdot \Vert _\infty \) is the usual supremum norm of a real-valued function. Note that \(d_1=d_{\textrm{W}}\).

One of the main contributions of [17] was to obtain suitable bounds for solution (1.2) of the Stein equation (1.1) that hold for a large class of functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) [this is necessary in order to obtain good bounds on the quantity (1.5)]. In particular, Gaunt [17] obtained bounds for the case that the derivatives of g have polynomial growth (this covers, for example, Chi-square approximation: \(g(w)=w^2\)). These bounds were used by [17] to derive explicit bounds on the distance between the distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\) for the case that \({\textbf{W}}\) is a sum of independent random vectors with independent components, that is \({\textbf{W}}=(W_1,\ldots ,W_d)^\intercal \), where, for \(j=1,\ldots ,d\), \(W_j=n_j^{-1/2}\sum _{i=1}^{n_j}X_{ij}\), and the \(X_{ij}\) are independent random variables with zero mean and unit variance. Notably, Gaunt [17] obtained bounds with faster rates of convergence than the \(O(n^{-1/2})\) Berry–Esseen rate under additional matching moments between the \(X_{ij}\) and the standard normal distribution, and when g is an even function \((g({\textbf{w}})=g(-{\textbf{w}})\) for all \({\textbf{w}}\in {\mathbb {R}}^d\)).

The aforementioned results of [17] seem to have broad applicability, in part because many distributional approximations in probability and statistics assess the distance between the distributions of random variables that can be expressed in the form \(g({\textbf{W}})\) and \(g(\Sigma ^{1/2}{\textbf{Z}})\), where \({\textbf{W}}\) is close in distribution to \(\Sigma ^{1/2}{\textbf{Z}}\). Indeed, applications include bounds for the Chi-square approximation of the likelihood ratio statistic [2], the family of power divergence statistics [18] and Friedman’s statistic [21], multivariate normal approximation of the maximum likelihood estimator [1], and the bounds for distributional approximation in the delta method [17].

Motivated by the broad applicability of the results of [17], in this paper we improve the results of [17] in the form of weaker moment conditions, smaller constants and simpler bounds. Future works (including [12]) that require results from Stein’s method for functions of multivariate normal approximation will reap these benefits.

In Sect. 2, we obtain non-uniform bounds on the derivatives of solution (1.2) of the Stein equation (1.1) that have smaller constants than those of [17] and improved polynomial growth rate. We achieve these improved bounds through a more focused proof than that used by [17], which had derived the bounds for the case of polynomial growth rate \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) from a more general framework. We also use the iterative technique of [10] for bounding derivatives of solutions of Stein equations to obtain bounds on the derivatives of the solution (1.2) in the univariate \(d=1\) case (which require weaker differentiability assumptions on h and g) that have an optimal w-dependence; a crude approach had been used by [17] that resulted in bounds with sub-optimal w-dependence.

In Sect. 3, we apply the bounds of Sect. 2 to obtain bounds on the distance between the distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\), where \({\textbf{W}}\) is a sum of independent random vectors with independent components and \({\textbf{Z}}\) is a standard d-dimensional multivariate normal random vector. The bounds of Theorem 3.1 improve on those of Theorems 3.2–3.5 of [17] in terms of smaller constants and weaker moment conditions. In Corollary 3.2, we provide simplified bounds without explicit constants.

In Sect. 4, we provide an application of the general bounds of Sect. 3 to Chi-square approximation. We derive explicit bounds for the Chi-square approximation of the power divergence family of statistics [9] in the case of two cell classifications. The power divergence statistic has a special structure in the case of two cell classifications, which allows us to apply the bounds of Sect. 3. Our bounds improve on existing results in the literature by holding in stronger probability metrics and having smaller constants, and our Kolmogorov distance bounds have a faster rate of convergence than the only other bounds in the literature that have the correct dependence on the cell classification probabilities. Moreover, as all our bounds possess an optimal dependence on the cell classification probabilities, we are able to demonstrate the significance of the weaker moment conditions of the bounds of Theorem 3.1, as using the general bounds of [17] would lead to bounds with a sub-optimal dependence on these probabilities; see Remark 4.3. Finally, some technical lemmas are proved in Appendix A.

Notation. The class \(C^n_b(I)\) consists of all functions \(h:I\subset {\mathbb {R}}\rightarrow {\mathbb {R}}\) for which \(h^{(n-1)}\) exists and is absolutely continuous, and whose derivatives are bounded up to the n-th order. For a given dominating function P, the class \(C_P^n({\mathbb {R}}^d)\) consists of all functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that all n-th order partial derivatives of g exist and are such that, for \({\textbf{w}}\in {\mathbb {R}}^d\),

$$\begin{aligned} \bigg |\frac{\partial ^k}{\prod _{j=1}^{k}\partial w_{i_j}}g({\textbf{w}})\bigg |^{n/k}\le P({\textbf{w}}), \quad k=1,\ldots ,n. \end{aligned}$$

We will also consider the weaker class \(C_{P,*}^n({\mathbb {R}}^d)\), which consists of all functions \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that all n-th order partial derivatives of g exist and are bounded in absolute value by \(P({\textbf{w}})\) for all \({\textbf{w}}\in {\mathbb {R}}^d\). We will write \(h_n:=\sum _{k=1}^{n}\genfrac\{\}{0pt}{}{n}{k}\Vert h^{(k)}\Vert \), where \(\genfrac\{\}{0pt}{}{n}{k}\) is a Stirling number of the second kind (see [31]). A standard multivariate normal random vector of dimension d will be denoted by \({\textbf{Z}}\), and, in the univariate \(d=1\) case, Z will denote a standard normal N(0, 1) random variable. Many of our bounds will be expressed in terms of the r-th absolute moment of the N(0, 1) distribution, which we denote by \(\mu _r=2^{r/2}\Gamma ((r+1)/2)/\sqrt{\pi }\).
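As a small numerical aside (ours; assuming NumPy, SciPy and SymPy are available), the formula for \(\mu _r\) can be checked against simulation, and the Stirling-number sums underlying \(h_n\) reduce to Bell numbers when all \(\Vert h^{(k)}\Vert =1\), as used later in Remark 3.4.

```python
# Sanity checks (illustrative only): the absolute moment formula
# mu_r = 2^{r/2} Gamma((r+1)/2) / sqrt(pi) for |Z|^r with Z ~ N(0,1), and the
# identity sum_{k=1}^n {n brace k} = B_n linking Stirling and Bell numbers.
import numpy as np
from scipy.special import gamma
from sympy.functions.combinatorial.numbers import stirling, bell

rng = np.random.default_rng(0)
z = rng.standard_normal(10**6)
for r in [0.5, 1.0, 2.0, 3.5]:
    mu_r = 2**(r / 2) * gamma((r + 1) / 2) / np.sqrt(np.pi)
    print(r, mu_r, np.mean(np.abs(z)**r))         # the two values should be close

for n in range(1, 6):
    print(n, sum(stirling(n, k) for k in range(1, n + 1)), bell(n))  # equal
```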

2 Bounds for the Solution of the Stein Equation

In the following proposition, we provide bounds for solution (1.2) of the \(\textrm{MVN}_d({\textbf{0}},\Sigma )\) Stein equation (1.1) with test function \(h(g(\cdot ))\), which improve on results of [17].

Proposition 2.1

Let \(P({\textbf{w}})=A+B\sum _{i=1}^d|w_i|^{r_i}\), where \(r_i\ge 0\), \(i=1,\ldots ,d\). Let \(\sigma _{i,i}=(\Sigma )_{i,i}\), \(i=1,\ldots ,d\). Let \(f(=f_h)\) denote solution (1.2).

  1. (i)

    Assume that \(\Sigma \) is non-negative definite and \(h\in C_b^n({\mathbb {R}})\) and \(g\in C_P^n({\mathbb {R}}^d)\) for \(n\ge 1\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),

    $$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^n\partial w_{i_j}}\bigg |&\le \frac{h_n}{n}\bigg [A+B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}+\sigma _{ii}^{r_i/2}\mu _{r_i}\big )\bigg ]. \end{aligned}$$
    (2.6)
  2. (ii)

    Now assume that \(\Sigma \) is positive definite and \(h\in C_b^{n-1}({\mathbb {R}})\) and \(g\in C_P^{n-1}({\mathbb {R}}^d)\) for \(n\ge 2\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),

    $$\begin{aligned} \bigg |\frac{\partial ^{n}f({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le \frac{\sqrt{\pi }\Gamma (\frac{n}{2})}{2\Gamma (\frac{n+1}{2})}h_{n-1}\min _{1\le l\le d} \bigg [A{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\&\quad +B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\ {}&\quad +{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l((\Sigma ^{1/2}{\textbf{Z}})_i)^{r_i}|\big )\bigg ]. \end{aligned}$$
    (2.7)

    In the case \(\Sigma =I_d\), the \(d\times d\) identity matrix, we obtain the simplified bound:

    $$\begin{aligned} \bigg |\frac{\partial ^{n}f({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le \frac{\sqrt{\pi }\Gamma (\frac{n}{2})}{2\Gamma (\frac{n+1}{2})}h_{n-1} \bigg [A+B\sum _{i=1}^d2^{r_i/2}\big (|w_i|^{r_i}+\mu _{r_i+1}\big )\bigg ]. \end{aligned}$$
    (2.8)
  3. (iii)

    Finally, consider the case \(d=1\) with \(\Sigma =1\). Assume that \(h\in C_b^{n-2}({\mathbb {R}})\) and \(g\in C_P^{n-2}({\mathbb {R}})\), where \(n\ge 3\) and \(P(w)=A+B|w|^r\), \(r\ge 0\). Then, for all \(w\in {\mathbb {R}}\),

    $$\begin{aligned} |f^{(n)}(w)|\le h_{n-2}\big [\alpha _rA+2^{r/2}B\big (\beta _r|w|^r+\gamma _r\big )\big ], \end{aligned}$$
    (2.9)

    where \((\alpha _r,\beta _r,\gamma _r)=(4,4,2\mu _r)\) if \(0\le r\le 1\), and \((\alpha _r,\beta _r,\gamma _r)=(r+3,r+5,(r+1)\mu _{r+1})\) if \(r>1\).

  4. (iv)

    If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i), (ii) and (iii) hold for g in the classes \(C_{P,*}^n({\mathbb {R}}^d)\), \(C_{P,*}^{n-1}({\mathbb {R}}^d)\) and \(C_{P,*}^{n-2}({\mathbb {R}})\), respectively. Moreover, under these assumptions, the polynomial growth rate in bounds (2.6)–(2.9) is optimal.

In the following proposition, we obtain bounds for \(\psi _m\), the solution of the Stein equation

$$\begin{aligned} \nabla ^\intercal \Sigma \nabla \psi _m({\textbf{w}})-{\textbf{w}}^\intercal \nabla \psi _m({\textbf{w}})=\frac{\partial ^m f({\textbf{w}})}{\prod _{j=1}^{m}\partial w_{i_j}}, \end{aligned}$$
(2.10)

where \(f(=f_h)\) is the solution (1.2). Again, the bounds improve on results of [17]. Here, and throughout this section, we suppress in the notation the dependence of the solution \(\psi _m\) on the components with respect to which f has been differentiated; we do this for ease of notation and because our bounds for \(\psi _m\) do not depend themselves on which components f has been differentiated with respect to. The Stein equation (2.10) arises in the proof of parts (iii) and (iv) of Theorem 3.1, in which faster convergence rates for the distributional approximation of \(g({\textbf{W}})\) by \(g({\textbf{Z}})\) are achieved when \(g:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is an even function.

Proposition 2.2

Let \(P({\textbf{w}})=A+B\sum _{i=1}^d|w_i|^{r_i}\), where \(r_i\ge 0\), \(i=1,\ldots ,d\). Let \(\sigma _{i,i}=(\Sigma )_{i,i}\), \(i=1,\ldots ,d\).

  1. (i)

    Assume that \(\Sigma \) is non-negative definite and \(h\in C_b^{m+n}({\mathbb {R}})\) and \(g\in C_P^{m+n}({\mathbb {R}}^d)\) for \(m,n\ge 1\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),

    $$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^n\partial w_{i_j}}\bigg |&\le \frac{h_{m+n}}{n(m+n)}\bigg [A+B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}+2\sigma _{ii}^{r_i/2}\mu _{r_i}\big )\bigg ]. \end{aligned}$$
    (2.11)
  2. (ii)

    Now assume that \(\Sigma \) is positive definite and \(h\in C_b^{m+n-2}({\mathbb {R}})\) and \(g\in C_P^{m+n-2}({\mathbb {R}}^d)\) for \(m,n\ge 1\) and \(m+n\ge 3\). Then, for all \({\textbf{w}}\in {\mathbb {R}}^d\),

    $$\begin{aligned}&\bigg |\frac{\partial ^{n}\psi _{m}({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\nonumber \\ {}&\le \frac{\pi \Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{4\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}h_{m+n-2}\min _{1\le k,l\le d}\bigg [A{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|\nonumber \\&\quad +B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|\nonumber \\&\quad +2{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l((\Sigma ^{1/2}{\textbf{Z}})_i)^{r_i}|\big )\bigg ]. \end{aligned}$$
    (2.12)

    In the case \(\Sigma =I_d\), we obtain the simplified bound

    $$\begin{aligned}{} & {} \bigg |\frac{\partial ^{n}\psi _{m}({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\nonumber \\ {}{} & {} \le \frac{\sqrt{2\pi }\Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{4\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}h_{m+n-2}\bigg [A+B\sum _{i=1}^d3^{r_i/2}\big (|w_i|^{r_i}+2\mu _{r_i+1}\big )\bigg ].\qquad \quad \end{aligned}$$
    (2.13)
  3. (iii)

    Finally, consider the case \(d=1\) with \(\Sigma =1\). Assume that \(h\in C_b^{m-1}({\mathbb {R}})\) and \(g\in C_P^{m-1}({\mathbb {R}})\), where \(m\ge 2\) and \(P(w)=A+B|w|^r\), \(r\ge 0\). Then, for all \(w\in {\mathbb {R}}\),

    $$\begin{aligned} |\psi _m^{(3)}(w)|\le h_{m-1}\big [{{\tilde{\alpha }}}_r A+ 3^{r/2}B\big ({{\tilde{\beta }}}_r|w|^{r}+{{\tilde{\gamma }}}_r\big )\big ], \end{aligned}$$
    (2.14)

    where \(({{\tilde{\alpha }}}_r,{{\tilde{\beta }}}_r,{{\tilde{\gamma }}}_r)=(10,10,10\mu _{r+1})\) if \(0\le r\le 1\), and \(({{\tilde{\alpha }}}_r,{{\tilde{\beta }}}_r,{{\tilde{\gamma }}}_r)=(r^2+r+8,r^2+2r+18,(2r^2+r+5)\mu _{r+1})\) if \(r>1\).

  4. (iv)

    If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i), (ii) and (iii) hold for g in the classes \(C_{P,*}^{m+n}({\mathbb {R}}^d)\), \(C_{P,*}^{m+n-2}({\mathbb {R}}^d)\) and \(C_{P,*}^{m-1}({\mathbb {R}})\), respectively. Moreover, under these assumptions, the polynomial growth rate in the bounds (2.11)–(2.14) is optimal.

Remark 2.3

The following two-sided inequalities are helpful in understanding the behaviour of the bounds (2.7) and (2.8) of Proposition 2.1 and the bounds (2.12) and (2.13) of Proposition 2.2 for large m and n:

$$\begin{aligned} \frac{\sqrt{2}}{\sqrt{n}}<\frac{\Gamma (\frac{n}{2})}{\Gamma (\frac{n+1}{2})}<\frac{\sqrt{2}}{\sqrt{n-1/2}}, \quad n\ge 1, \end{aligned}$$
(2.15)

and

$$\begin{aligned}\frac{2}{\sqrt{n(m+n)}}<\frac{\Gamma (\frac{n}{2})\Gamma (\frac{m+n}{2})}{\Gamma (\frac{n+1}{2})\Gamma (\frac{m+n+1}{2})}<\frac{2}{\sqrt{(n-1/2)(m+n-1/2)}}, \quad m,n\ge 1.\end{aligned}$$

Here we used the inequalities \(\frac{\Gamma (x+1/2)}{\Gamma (x+1)}>(x+1/2)^{-1/2}\) for \(x>0\) (see the proof of Corollary 3.4 of [15]) and \(\frac{\Gamma (x+1/2)}{\Gamma (x+1)}<(x+1/4)^{-1/2}\) for \(x>-1/4\) (see [11]).
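The two-sided bound (2.15) is easy to check numerically; a short sketch (assuming SciPy is available):

```python
# Numerical check (illustrative) of the two-sided inequality (2.15) for the
# ratio Gamma(n/2) / Gamma((n+1)/2).
import numpy as np
from scipy.special import gammaln

for n in range(1, 21):
    ratio = np.exp(gammaln(n / 2) - gammaln((n + 1) / 2))
    lower, upper = np.sqrt(2 / n), np.sqrt(2 / (n - 0.5))
    assert lower < ratio < upper
    print(n, lower, ratio, upper)
```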

Remark 2.4

1. Inequalities (2.6)–(2.8) of Proposition 2.1 and inequalities (2.11)–(2.13) of Proposition 2.2 are sharper than the corresponding bounds given in Corollaries 2.2 and 2.3 of [17] through an improved dependence on the constants \(r_1,\ldots ,r_d\), m and n. We remark that further, more accurate bounds can be given; for example, a slight modification of the proof of inequality (2.6) leads to the improved bound

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^n\partial w_{i_j}}\bigg |&\le \frac{h_n}{n}\bigg [A+B\sum _{i=1}^dI_{n,r_i}\big (|w_i|^{r_i}+\sigma _{ii}^{r_i/2}\mu _{r_i}\big )\bigg ], \quad {\textbf{w}}\in {\mathbb {R}}^d, \end{aligned}$$

where \(I_{n,r}=n\int _{0}^{1}t^{n-1}(t+\sqrt{1-t^2})^r\,\textrm{d}t\), and it is clear that \(I_{n,r}\le 2^{r/2}\). This integral cannot in general be expressed in terms of elementary functions, but for given n and r can be evaluated numerically or by computational algebra packages (see the sketch following this remark). In specific applications in which n and \(r_1,\ldots ,r_d\) are known, this bound could be applied to improve constants. Similarly, the constant \(2^{r_i/2}\) in the bounds (2.7) and (2.8) can be improved by replacing \(2^{r_i/2}\) by \(J_{n,r_i}=\frac{2\Gamma {((n+1)/2)}}{\sqrt{\pi }\Gamma {(n/2)}}\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}(t+\sqrt{1-t^2})^{r_i}\,\textrm{d}t\), and the factor \(3^{r_i/2}\) in inequality (2.11) can be improved to \(I_{m,n,r_i}=n(m+n)\int _{0}^{1}\int _{0}^{1}t^{m+n-1}s^{n-1}(st+t\sqrt{1-s^2}+\sqrt{1-t^2})^{r_i}\,\textrm{d}s\,\textrm{d}t\), whilst the factor \(3^{r_i/2}\) in inequalities (2.12) and (2.13) can be improved to \(J_{m,n,r_i}=\frac{4\Gamma ((n+1)/2)\Gamma ((m+n+1)/2)}{\pi \Gamma (n/2)\Gamma ((m+n)/2)}\int _{0}^{1}\int _{0}^{1}\frac{t^{m+n-1}}{\sqrt{1-t^2}}\frac{s^{n-1}}{\sqrt{1-s^2}}(st+t\sqrt{1-s^2}+\sqrt{1-t^2})^{r_i}\,\textrm{d}s\,\textrm{d}t\). For our purposes, we find it easier to work with the more explicit bounds stated in the propositions, particularly as these inequalities are used to derive inequalities (2.9) and (2.14), leading to a more efficient proof and simpler constants in the final bounds.

2. Inequality (2.9) of Proposition 2.1 and inequality (2.14) of Proposition 2.2 have a theoretically improved dependence on r compared with the corresponding bounds of [17], in that for large r they are of a smaller asymptotic order; however, for small r they may be numerically larger. More significantly, inequality (2.9) of Proposition 2.1 improves on the corresponding bound of Corollary 2.2 of [17] by improving the w-dependence of the bound from \(|w|^{r+1}\) to \(|w|^r\), whilst inequality (2.14) of Proposition 2.2 improves on the corresponding bound of Corollary 2.3 of [17] by improving the w-dependence of the bound from \(|w|^{r+2}\) to \(|w|^r\). This improved w-dependence allows us to impose weaker moment conditions in our general bounds in Theorem 3.1.
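As noted in part 1 of this remark, for given n and r the integrals \(I_{n,r}\) and \(J_{n,r}\) can be evaluated numerically; the following sketch (ours, assuming SciPy is available) computes them and confirms that they improve on the constant \(2^{r/2}\).

```python
# Illustrative numerical evaluation of the constants I_{n,r} and J_{n,r} from
# Remark 2.4; both are bounded above by 2^{r/2}.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def I(n, r):
    return n * quad(lambda t: t**(n - 1) * (t + np.sqrt(1 - t**2))**r, 0, 1)[0]

def J(n, r):
    # the substitution t = sin(u) removes the 1/sqrt(1-t^2) endpoint singularity
    c = 2 * np.exp(gammaln((n + 1) / 2) - gammaln(n / 2)) / np.sqrt(np.pi)
    return c * quad(lambda u: np.sin(u)**(n - 1) * (np.sin(u) + np.cos(u))**r,
                    0, np.pi / 2)[0]

for n in [1, 2, 5]:
    for r in [1, 2, 4]:
        print(n, r, round(I(n, r), 4), round(J(n, r), 4), round(2**(r / 2), 4))
```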

In proving Propositions 2.1 and 2.2, we will make use of the following simple lemmas. The proofs are given in Appendix A.

Lemma 2.5

Suppose \(a,b,c,x,y,z\ge 0\) and \(r\ge 0\). Then

$$\begin{aligned} (ax+by)^r&\le (a+b)^r(x^r+y^r),\end{aligned}$$
(2.16)
$$\begin{aligned} (ax+by+cz)^r&\le (a+b+c)^r(x^r+y^r+z^r). \end{aligned}$$
(2.17)
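A quick random spot-check (illustrative only, assuming NumPy is available) of inequalities (2.16) and (2.17):

```python
# Random spot-check of inequalities (2.16) and (2.17) on non-negative inputs.
import numpy as np

rng = np.random.default_rng(1)
a, b, c, x, y, z, r = rng.uniform(0, 10, size=(7, 10**5))
assert np.all((a*x + b*y)**r <= (a + b)**r * (x**r + y**r) * (1 + 1e-12))
assert np.all((a*x + b*y + c*z)**r <= (a + b + c)**r * (x**r + y**r + z**r) * (1 + 1e-12))
```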

Lemma 2.6

Let \(T_r(w)=w\textrm{e}^{w^2/2}\int _{w}^{\infty }t^r\textrm{e}^{-t^2/2}\,\textrm{d}t\).

  1. 1.

    Suppose \(0\le r\le 1\). Then, for \(w>0\),

    $$\begin{aligned} T_r(w)\le w^r. \end{aligned}$$
    (2.18)
  2. 2.

    Suppose \(r>1\). Then, for \(w>r-1\),

    $$\begin{aligned} T_r(w)\le 2w^r. \end{aligned}$$
    (2.19)
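A short numerical check (assuming SciPy is available) of the bounds of Lemma 2.6 at a few values of r and w:

```python
# Numerical check (illustrative) of inequalities (2.18) and (2.19) for
# T_r(w) = w e^{w^2/2} \int_w^infty t^r e^{-t^2/2} dt.
import numpy as np
from scipy.integrate import quad

def T(r, w):
    tail = quad(lambda t: t**r * np.exp(-t**2 / 2), w, np.inf)[0]
    return w * np.exp(w**2 / 2) * tail

for w in [0.2, 1.0, 3.0]:
    for r in [0.0, 0.5, 1.0]:
        assert T(r, w) <= w**r * (1 + 1e-9)          # inequality (2.18)

r = 2.5
for w in [2.0, 3.0, 5.0]:                            # here w > r - 1 = 1.5
    assert T(r, w) <= 2 * w**r                       # inequality (2.19)
```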

Proof of Proposition 2.1

(i) Suppose that \(\Sigma \) is non-negative definite. We first recall inequality (2.2) of [17], which states (under a change of variable in the integral) that, for non-negative definite \(\Sigma \), and \(g\in C_P^n({\mathbb {R}}^d)\) and \(h\in C_b^n({\mathbb {R}})\),

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\le h_n\int _{0}^{1}t^{n-1}{\mathbb {E}}[P({\textbf{z}}_{t,{\textbf{w}}}^{\Sigma ^{1/2}{\textbf{Z}}})]\,\textrm{d}t, \end{aligned}$$
(2.20)

where \({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}}=t{\textbf{w}}+{\textbf{x}}\sqrt{1-t^2}\). We will denote the i-th component of \({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}}\) by \(({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}})_i\). Using inequality (2.20) with dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i}\) gives the bound

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\le h_n\int _{0}^{1}t^{n-1}{\mathbb {E}}\bigg [A+B\sum _{i=1}^{d}\big |({\textbf{z}}_{t,{\textbf{w}}}^{\Sigma ^{1/2}{\textbf{Z}}})_i\big |^{r_i}\bigg ]\,\textrm{d}t. \end{aligned}$$

Now using inequality (2.16) of Lemma 2.5, we have

$$\begin{aligned} |({\textbf{z}}_{t,{\textbf{w}}}^{{\textbf{x}}})_i|^{r_i}\le (t+\sqrt{1-t^2})^{r_i}(|w_i|^{r_i}+|x_i|^{r_i})\le 2^{r_i/2}(|w_i|^{r_i}+|x_i|^{r_i}), \end{aligned}$$

where we used basic calculus to bound \(t+\sqrt{1-t^2}\le \sqrt{2}\), \(t\in (0,1)\). Therefore,

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\le h_n\int _{0}^{1}t^{n-1}\bigg [A+B\sum _{i=1}^{d}2^{r_i/2}(|w_i|^{r_i}+{\mathbb {E}}[|(\Sigma ^{1/2}{\textbf{Z}})_i|^{r_i}])\bigg ]\,\textrm{d}t, \end{aligned}$$

from which inequality (2.6) follows on evaluating the integral \(\int _0^1t^{n-1}\,\textrm{d}t=1/n\) and using that \((\Sigma ^{1/2}{\textbf{Z}})_i\sim N(0,\sigma _{ii})\), so that \({\mathbb {E}}|(\Sigma ^{1/2}{\textbf{Z}})_i|^{r_i}=\sigma _{ii}^{r_i/2}\mu _{r_i}\).

(ii) Suppose now that \(\Sigma \) is positive definite. Under this assumption, we may recall inequality (2.3) of [17], which states that, for \(g\in C_P^n({\mathbb {R}}^d)\) and \(h\in C_b^n({\mathbb {R}})\),

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\le h_{n-1}\min _{1\le l\le d}\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}{\mathbb {E}}\big |(\Sigma ^{-1/2}{\textbf{Z}})_lP({\textbf{z}}_{t,{\textbf{w}}}^{\Sigma ^{1/2}{\textbf{Z}}})\big |\,\textrm{d}t. \end{aligned}$$
(2.21)

With inequality (2.21) at hand, a similar argument to the one used in part (i) of the proof yields the bound

$$\begin{aligned} \bigg |\frac{\partial ^nf({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le h_{n-1}\min _{1\le l\le d}\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}\bigg [A{\mathbb {E}}\big |(\Sigma ^{-1/2}{\textbf{Z}})_l\big |\\&\quad +B\sum _{i=1}^{d}2^{r_i/2}\big (|w_i|^{r_i}{\mathbb {E}}\big |(\Sigma ^{-1/2}{\textbf{Z}})_l\big |+{\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_l((\Sigma ^{1/2}{\textbf{Z}})_i)^{r_i}\big |\big )\bigg ]\,\textrm{d}t, \end{aligned}$$

from which inequality (2.7) follows on evaluating \(\int _{0}^{1}\frac{t^{n-1}}{\sqrt{1-t^2}}\,\textrm{d}t=\frac{\sqrt{\pi }\Gamma (n/2)}{2\Gamma ((n+1)/2)}\). Inequality (2.8) now follows by setting \(\Sigma =I_d\) in inequality (2.7) and bounding \({\mathbb {E}}|Z|=\sqrt{2/\pi }<1\).

(iii) Suppose now that \(d=1\) and \(\Sigma =1\). We will employ the iterative approach to bounding solutions of Stein equations of [10]. Let L denote the standard normal Stein operator given by \(Lf(w)=f''(w)-wf'(w)\). Then, as noted by [10], an induction on n gives that, for \(n\ge 3\),

$$\begin{aligned} Lf^{(n-2)}(w)=(h\circ g)^{(n-2)}(w)+(n-2)f^{(n-2)}(w). \end{aligned}$$
(2.22)

From (2.22), a rearrangement and an application of the triangle inequality gives that

$$\begin{aligned} |f^{(n)}(w)|\le |wf^{(n-1)}(w)|+(n-2)|f^{(n-2)}(w)|+|(h\circ g)^{(n-2)}(w)|. \end{aligned}$$
(2.23)

From inequality (2.6) and Lemma 2.1 of [17], we have the bounds

$$\begin{aligned} (n-2)|f^{(n-2)}(w)|\le h_{n-2}\big [A+2^{r/2}B(|w|^r+\mu _r)\big ], \end{aligned}$$
(2.24)

and

$$\begin{aligned} |(h\circ g)^{(n-2)}(w)|\le h_{n-2}\big [A+B|w|^r\big ]. \end{aligned}$$
(2.25)

It now remains to bound \(|wf^{(n-1)}(w)|\), which requires more work to get the correct w-dependence. To this end, we obtain from (1.3), (1.4) and (2.22) the following representations for \(f^{(n-1)}(w)\):

$$\begin{aligned} f^{(n-1)}(w)&=-\textrm{e}^{w^2/2}\int _{w}^{\infty }[(h\circ g)^{(n-2)}(t)+(n-2)f^{(n-2)}(t)]\textrm{e}^{-t^2/2}\,\textrm{d}t\end{aligned}$$
(2.26)
$$\begin{aligned}&=\textrm{e}^{w^2/2}\int _{-\infty }^{w}[(h\circ g)^{(n-2)}(t)+(n-2)f^{(n-2)}(t)]\textrm{e}^{-t^2/2}\,\textrm{d}t. \end{aligned}$$
(2.27)

We first consider the case \(0\le r\le 1\). From (2.26), we have that, for \(w>0\),

$$\begin{aligned} |wf^{(n-1)}(w)|&\le w\textrm{e}^{w^2/2}\int _{w}^{\infty }\big [|(h\circ g)^{(n-2)}(t)|+(n-2)|f^{(n-2)}(t)|\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\nonumber \\&\le h_{n-2}w\textrm{e}^{w^2/2}\int _{w}^{\infty }\big [2A+B\big ((1+2^{r/2})t^r+2^{r/2}\mu _r\big )\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\end{aligned}$$
(2.28)
$$\begin{aligned}&\le h_{n-2}\big [2A+2^{r/2}B\big (2|w|^r+\mu _r\big )\big ], \end{aligned}$$
(2.29)

where in the final step we used inequality (2.18). Due to the equality between (2.26) and (2.27), the same argument can be used to verify that inequality (2.29) is also valid for \(w<0\). Applying inequalities (2.24), (2.25) and (2.29) to inequality (2.23) yields the bound (2.9) for \(0\le r\le 1\).

Suppose now that \(r>1\). When \(r>1\), applying inequalities (2.18) and (2.19) to (2.28) (and noting that the case \(w<0\) is dealt with similarly) yields the bound

$$\begin{aligned} |wf^{(n-1)}(w)|\le h_{n-2}\big [2A+2^{r/2}B\big (4|w|^r+\mu _r\big )\big ], \end{aligned}$$
(2.30)

for \(|w|>r-1\). We now bound \(|wf^{(n-1)}(w)|\) for \(|w|\le r-1\). We begin by noting that, for \(n\ge 3\), \(u(n)=\Gamma ((n-1)/2)/\Gamma (n/2)\le 2/\sqrt{\pi }\), which follows because u is a decreasing function of n on \((3,\infty )\) (see [26]) and \(u(3)=2/\sqrt{\pi }\). Using this inequality, the bound (2.8) yields that, for \(|w|\le r-1\),

$$\begin{aligned} |wf^{(n-1)}(w)|&\le (r-1)|f^{(n-1)}(w)|\le (r-1)h_{n-2}\big [A+2^{r/2}B (|w|^r+\mu _{r+1})\big ]. \end{aligned}$$
(2.31)

From inequalities (2.30) and (2.31), we deduce that, for \(r>1\) and \(w\in {\mathbb {R}}\),

$$\begin{aligned} |wf^{(n-1)}(w)|\le h_{n-2}\big [(r+1)A+2^{r/2}B((r+3)|w|^r+r\mu _{r+1})\big ], \end{aligned}$$
(2.32)

where we used that \(\mu _r\le \mu _{r+1}\) for \(r>1\). Finally, we substitute the bounds (2.24), (2.25) and (2.32) into inequality (2.23) to obtain inequality (2.9) for \(r>1\), where we again used that \(\mu _r\le \mu _{r+1}\) for \(r>1\).

(iv) In obtaining inequality (2.20), Gaunt [17] used the assumption that \(g\in C_P^n({\mathbb {R}}^d)\) to bound the absolute value of the n-th order partial derivatives of \((h\circ g)({\textbf{w}})\) by \(h_nP({\textbf{w}})\), \({\textbf{w}}\in {\mathbb {R}}^d\) (see Lemma 2.1 of [17]). However, if \(h(w)=w\), then under the assumption \(g\in C_{P,*}^n({\mathbb {R}}^d)\) we can bound the n-th order partial derivatives of \((h\circ g)({\textbf{w}})=g({\textbf{w}})\) by \(P({\textbf{w}})\), \({\textbf{w}}\in {\mathbb {R}}^d\). Also, note that if \(h(w)=w\), then \(h_n=1\). We therefore see that inequality (2.6) holds under the weaker assumption \(g\in C_{P,*}^n({\mathbb {R}}^d)\) if \(h(w)=w\). For very similar reasons, we can also weaken the conditions on g in parts (ii) and (iii) of the proposition if \(h(w)=w\).

To prove the optimality of the polynomial growth rate in the bounds (2.6)–(2.9), it suffices to consider the case \(d=1\). Let \(h(w)=w\) and \(g(w)=|w|^q\), where \(q\ge n\ge 1\). Then, \(g\in C_{P_1,*}^n({\mathbb {R}})\), where \(P_1(w)=n!|w|^{q-n}\). Using the representation (1.2) of f and the dominated convergence theorem, we have that

$$\begin{aligned} f^{(n)}(w)&=-\int _0^1\int _{-\infty }^\infty t^{n-1}g^{(n)}\big (tw+\sqrt{1-t^2}y\big )\phi (y)\,\textrm{d}y\,\textrm{d}t\\&=-n!\int _0^1\int _{-\infty }^\infty t^{n-1}\big |tw+\sqrt{1-t^2}y\big |^{q-n}\\ {}&\quad \big (\textrm{sgn}\big (tw+\sqrt{1-t^2}y\big )\big )^n\phi (y)\,\textrm{d}y\,\textrm{d}t, \end{aligned}$$

where \(\phi \) is the standard normal probability density function and \(\textrm{sgn}(x)\) is the sign of \(x\in {\mathbb {R}}\). A simple asymptotic analysis now gives that \(|f^{(n)}(w)|\sim (n!/q)|w|^{q-n}, \) as \(|w|\rightarrow \infty \). Thus, the \(|w|^r\) growth rate in inequality (2.6) is optimal. The optimality of the growth rate in inequalities (2.7)–(2.9) is established similarly. We observe that \(g\in C_{P_2,*}^{n-1}({\mathbb {R}})\) and \(g\in C_{P_3,*}^{n-2}({\mathbb {R}})\), where \(P_2(w)=(n-1)!|w|^{q-n+1}\) and \(P_3(w)=(n-2)!|w|^{q-n+2}\), and almost identical calculations now confirm that the growth rate in inequalities (2.7)–(2.9) is optimal. \(\square \)

Proof of Proposition 2.2

(i) Suppose that \(\Sigma \) is non-negative definite. We will make use of the following bound given in Lemma 2.4 of [17], which states (under a change of variable in the integral) that, for non-negative definite \(\Sigma \), and \(g\in C_P^{m+n}({\mathbb {R}}^d)\) and \(h\in C_b^{m+n}({\mathbb {R}})\),

$$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |\le h_{m+n}\int _{0}^{1}\int _{0}^{1}t^{m+n-1}s^{n-1}{\mathbb {E}}[P({\textbf{z}}_{s,t,{\textbf{w}}}^{\Sigma ^{1/2}{\textbf{Z}},\Sigma ^{1/2}{\textbf{Z}}'})]\,\textrm{d}s\,\textrm{d}t, \end{aligned}$$
(2.33)

where \({\textbf{z}}_{s,t,{\textbf{w}}}^{{\textbf{x}},{\textbf{y}}}=st{\textbf{w}}+t\sqrt{1-s^2}{\textbf{y}}+\sqrt{1-t^2}{\textbf{x}}\) and, here and in part (ii) of the proof, \({\textbf{Z}}'\) is an independent copy of \({\textbf{Z}}\). We will apply inequality (2.33) with dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i}\). Now, using inequality (2.17) gives that

$$\begin{aligned} |({\textbf{z}}_{s,t,{\textbf{w}}}^{{\textbf{x}},{\textbf{y}}})_i|^{r_i}&\le (st +t\sqrt{1-s^2}+\sqrt{1-t^2})^{r_i}(|w_i|^{r_i}+|x_i|^{r_i}+|y_i|^{r_i})\nonumber \\&\le 3^{r_i/2}(|w_i|^{r_i}+|x_i|^{r_i}+|y_i|^{r_i}), \end{aligned}$$
(2.34)

where we used basic calculus to bound \(st +t\sqrt{1-s^2}+\sqrt{1-t^2}\le \sqrt{3}\) for \(0<s,t<1\). We therefore obtain the bound

$$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le h_{m+n}\int _{0}^{1}\int _{0}^{1}t^{m+n-1}s^{n-1}\bigg [A+B\sum _{i=1}^{d}3^{r_i/2}\big (|w_i|^{r_i}+{\mathbb {E}}[|(\Sigma ^{1/2}{\textbf{Z}})_i|^{r_i}]\\&\quad +{\mathbb {E}}[|(\Sigma ^{1/2}{\textbf{Z}}')_i|^{r_i}]\big )\bigg ]\,\textrm{d}s\,\textrm{d}t, \end{aligned}$$

and on evaluating the integral we deduce inequality (2.11).

(ii) Suppose now that \(\Sigma \) is positive definite. Under this assumption, we recall a bound from Lemma 2.4 of [17] that states that, for \(g\in C_P^{m+n-2}({\mathbb {R}}^d)\) and \(h\in C_b^{m+n-2}({\mathbb {R}})\),

$$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le h_{m+n-2}\min _{1\le k,l\le d}\int _{0}^{1}\int _{0}^{1}\frac{t^{m+n-1}}{\sqrt{1-t^2}}\frac{s^{n-1}}{\sqrt{1-s^2}}\nonumber \\&\quad \times {\mathbb {E}}\big |(\Sigma ^{-1/2}{\textbf{Z}})_k(\Sigma ^{-1/2}{\textbf{Z}}')_lP({\textbf{z}}_{s,t,{\textbf{w}}}^{\Sigma ^{1/2}{\textbf{Z}},\Sigma ^{1/2}{\textbf{Z}}'})\big |\,\textrm{d}s\,\textrm{d}t. \end{aligned}$$
(2.35)

Applying inequality (2.34) to the bound (2.35) gives that

$$\begin{aligned} \bigg |\frac{\partial ^n\psi _m({\textbf{w}})}{\prod _{j=1}^{n}\partial w_{i_j}}\bigg |&\le h_{m+n-2}\min _{1\le k,l\le d}\int _{0}^{1}\int _{0}^{1}\frac{t^{m+n-1}}{\sqrt{1-t^2}}\frac{s^{n-1}}{\sqrt{1-s^2}}{\mathbb {E}}\bigg [|(\Sigma ^{-1/2}{\textbf{Z}})_k(\Sigma ^{-1/2}{\textbf{Z}}')_l|\\&\quad \times \bigg (A+B\sum _{i=1}^{d}3^{r_i/2}\big (|w_i|^{r_i}+|(\Sigma ^{1/2}{\textbf{Z}})_i|^{r_i}+|(\Sigma ^{1/2}{\textbf{Z}}')_i|^{r_i}\big )\bigg )\bigg ]\,\textrm{d}s\,\textrm{d}t, \end{aligned}$$

from which inequality (2.12) follows on evaluating the integral similarly to how we did in proving inequality (2.7) of Proposition 2.1. The simplified bound (2.13) now follows by setting \(\Sigma =I_d\) in inequality (2.12) and using that, for such \(\Sigma \), we have \({\mathbb {E}}|(\Sigma ^{-1/2}{\textbf{Z}})_k|={\mathbb {E}}|Z|=\sqrt{2/\pi }\) for \(k=1,\ldots ,d\).

(iii) We proceed similarly to the proof of inequality (2.9), using the iterative technique of [10]. Recall that \(L\psi _m(w)=f^{(m)}(w)\), where L is the standard normal Stein operator. Differentiating gives that

$$\begin{aligned} L\psi _m'(w)=f^{(m+1)}(w)+\psi _m'(w). \end{aligned}$$
(2.36)

From (2.36) and the triangle inequality, we get that

$$\begin{aligned} |\psi _m^{(3)}(w)|\le |w\psi _m''(w)|+|f^{(m+1)}(w)|+|\psi _m'(w)|. \end{aligned}$$
(2.37)

From (2.9) and (2.13), we have the bounds

$$\begin{aligned} |f^{(m+1)}(w)|&\le 2h_{m-1}\big [2A+2^{r/2}B(2|w|^r+\mu _{r})\big ], \quad 0\le r\le 1, \end{aligned}$$
(2.38)
$$\begin{aligned} |f^{(m+1)}(w)|&\le h_{m-1}\big [(r+3)A+2^{r/2}B((r+5)|w|^r+(r+1)\mu _{r+1})\big ],\quad r>1, \end{aligned}$$
(2.39)
$$\begin{aligned} |\psi _m'(w)|&\le h_{m-1}\big [A+3^{r/2}B(|w|^r+2\mu _{r+1})\big ], \quad r\ge 0, \end{aligned}$$
(2.40)

where, in applying inequality (2.13), we used that, for \(m\ge 2\), \(v(m)=\Gamma ((m+1)/2)/\Gamma (m/2+1)\le \sqrt{\pi }/2\), which follows because v is a decreasing function of m on \((2,\infty )\) (see [26]) and \(v(2)=\sqrt{\pi }/2\).

It now remains to bound \(|w\psi _m''(w)|\), which requires more effort to establish a bound with the correct w-dependence. From (1.3), (1.4) and (2.36), we obtain the following representation for \(\psi _m''(w)\):

$$\begin{aligned} \psi _m''(w)&=-\textrm{e}^{w^2/2}\int _{w}^{\infty }\big [f^{(m+1)}(t)+\psi _m'(t)\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\end{aligned}$$
(2.41)
$$\begin{aligned}&=\textrm{e}^{w^2/2}\int _{-\infty }^{w}\big [f^{(m+1)}(t)+\psi _m'(t)\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t. \end{aligned}$$
(2.42)

We first consider the case \(0\le r\le 1\). Applying (2.38) and (2.40) to (2.41) gives that, for \(w\ge 0\),

$$\begin{aligned} |w\psi _m''(w)|&\le w\textrm{e}^{w^2/2}\int _{w}^{\infty }\big [|f^{(m+1)}(t)|+|\psi _m'(t)|\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\end{aligned}$$
(2.43)
$$\begin{aligned}&\le h_{m-1}w\textrm{e}^{w^2/2}\int _{w}^{\infty }\big [5A+3^{r/2}B(5t^r+2\mu _{r+1}+2\mu _r)\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\nonumber \\&\le h_{m-1}\big [5A+3^{r/2}B(5|w|^r+2\mu _{r+1}+2\mu _r)\big ], \end{aligned}$$
(2.44)

where we used inequality (2.18) to bound the integral. Due to the equality between (2.41) and (2.42), it is readily seen that inequality (2.44) also holds for \(w<0\). Applying inequalities (2.38), (2.40) and (2.44) to inequality (2.37), as well as using the inequality \(2\mu _r\le 3\mu _{r+1}\), \(0\le r\le 1\), to simplify the bound, now yields (2.14) for \(0\le r\le 1\).

Suppose now that \(r>1\). We now apply inequalities (2.39) and (2.40) to (2.43) to get that, for \(|w|>r-1\),

$$\begin{aligned} |w\psi _m''(w)|&\le h_{m-1}|w|\textrm{e}^{w^2/2}\int _{|w|}^{\infty }\big [(r+4)A+3^{r/2}B((r+6)t^r+(r+3)\mu _{r+1})\big ]\textrm{e}^{-t^2/2}\,\textrm{d}t\nonumber \\&\le h_{m-1}\big [(r+4)A+3^{r/2}B((2r+12)|w|^r+(r+3)\mu _{r+1})\big ], \end{aligned}$$
(2.45)

where we used inequalities (2.18) and (2.19) to bound the integral. Now suppose that \(|w|\le r-1\). Rearranging \(L\psi _m(w)=f^{(m)}(w)\) and applying the triangle inequality gives that, for \(|w|\le r-1\),

$$\begin{aligned} |w\psi _m''(w)|\le |w^2\psi _m'(w)|+|wf^{(m)}(w)|\le (r-1)^2|\psi _m'(w)|+(r-1)|f^{(m)}(w)|. \end{aligned}$$

Using inequalities (2.8) and (2.13) now gives that

$$\begin{aligned} |w\psi _m''(w)|&\le (r-1)^2h_{m-1}\big [A+3^{r/2}B(|w|^r+2\mu _{r+1})\big ]\nonumber \\&\quad +(r-1)h_{m-1}\big [A+2^{r/2}B(|w|^r+\mu _{r+1})\big ] \nonumber \\&\le h_{m-1}\big [r^2A+3^{r/2}B(r^2|w|^r+2r^2\mu _{r+1})\big ], \end{aligned}$$
(2.46)

where, in applying inequalities (2.8) and (2.13), we used the bounds \(\Gamma ((m+1)/2)/\Gamma (m/2+1)\le \sqrt{\pi }/2\) and \(\Gamma (m/2)/\Gamma ((m+1)/2)\le 2/\sqrt{\pi }\) for \(m\ge 2\), which can be verified with the usual argument for bounding ratios of gamma functions that we have used earlier in this paper. From inequalities (2.45) and (2.46), we deduce that, for \(r>1\) and \(w\in {\mathbb {R}}\),

$$\begin{aligned} |w\psi _m''(w)|\le h_{m-1}\big [(r^2+4)A+3^{r/2}B((r^2+r+12)|w|^r+(2r^2+2)\mu _{r+1})\big ]. \end{aligned}$$
(2.47)

Finally, we substitute the bounds (2.39), (2.40) and (2.47) into inequality (2.37) to obtain inequality (2.14) for \(r>1\).

(iv) When \(h(w)=w\), the bounds in parts (i), (ii) and (iii) hold for g in the weaker classes \(C_{P,*}^{m+n}({\mathbb {R}}^d)\), \(C_{P,*}^{m+n-2}({\mathbb {R}}^d)\) and \(C_{P,*}^{m-1}({\mathbb {R}})\) for the same reason given in part (iv) of the proof of Proposition 2.1.

The optimality of the \(|w|^r\) growth rate of inequalities (2.11)–(2.14) is established similarly to how we did in part (iv) of the proof of Proposition 2.1. We demonstrate the optimality of the growth rate in inequality (2.11); similar considerations show the optimality of the growth rate in inequalities (2.12)–(2.14). Let \(d=1\), \(h(w)=w\) and \(g(w)=|w|^q\), where \(q\ge m+n\). Observe that \(g\in C_{P,*}^{m+n}({\mathbb {R}})\) with \(P(w)=(m+n)!|w|^{q-m-n}\). Also, by the dominated convergence theorem,

$$\begin{aligned} \psi _m^{(n)}(w)&=\int _0^1\!\int _0^1\!\int _{-\infty }^\infty \!\int _{-\infty }^\infty t^{m+n-1}s^{n-1}g^{(m+n)}(z_{s,t,w}^{x,y})\phi (x)\phi (y)\,\textrm{d}x\,\textrm{d}y\,\textrm{d}s\,\textrm{d}t\\&=(m+n)!\int _0^1\!\int _0^1\!\int _{-\infty }^\infty \!\int _{-\infty }^\infty t^{m+n-1}s^{n-1}|z_{s,t,w}^{x,y}|^{q-m-n}(\textrm{sgn}(z_{s,t,w}^{x,y}))^{m+n}\\&\quad \times \phi (x)\phi (y)\,\textrm{d}x\,\textrm{d}y\,\textrm{d}s\,\textrm{d}t, \end{aligned}$$

where we recall that \(z_{s,t,w}^{x,y}=stw+t\sqrt{1-s^2}y+\sqrt{1-t^2}x\). A simple asymptotic analysis now gives that \(|\psi _m^{(n)}(w)|\sim (m+n)!|w|^{q-m-n}/(q(q-m)), \) as \(|w|\rightarrow \infty \). Thus, the \(|w|^r\) growth rate in inequality (2.11) is optimal. \(\square \)

3 Bounds for the Distance Between the Distributions of \(g({\textbf{W}})\) and \(g({\textbf{Z}})\)

In this section, we obtain general bounds to quantify the quality of the distributional approximation of \(g({\textbf{W}})\) by \(g({\textbf{Z}})\), where \({\textbf{Z}}\sim \textrm{MVN}_d({\textbf{0}},I_d)\), in the setting that \({\textbf{W}}\) is a sum of independent random vectors with independent components. We shall suppose that \(X_{1,1},\ldots ,X_{n_1,1},\ldots ,X_{1,d},\ldots ,X_{n_d,d}\) are independent random variables, and define the random vector \({\textbf{W}}:=(W_1,\ldots ,W_d)^\intercal \), where \(W_j=n_j^{-1/2}\sum _{i=1}^{n_j}X_{ij}\), for \(1\le j\le d\). We shall also assume that \({\mathbb {E}}[X_{ij}^k]={\mathbb {E}}[Z^k]\) for all \(1\le i\le n_j\), \(1\le j\le d\) and all \(k\in {\mathbb {Z}}^+\) such that \(k\le p\), for some \(p\ge 2\); having three or more matching moments allows for faster convergence rates than the \(O(n^{-1/2})\) Berry–Esseen rate. As in Sect. 2, we shall suppose that the partial derivatives of g up to a specified order exist and have polynomial growth rate. To this end, we introduce the dominating function \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i},\) where \(A,B\) and \(r_1,\ldots , r_d\) are non-negative constants. In the univariate \(d=1\) case, we simplify notation, writing \(W=n^{-1/2}\sum _{i=1}^nX_i\), where \(X_1,\ldots ,X_n\) are independent random variables such that \({\mathbb {E}}[X_i^k]={\mathbb {E}}[Z^k]\) for all \(1\le i\le n\) and \(1\le k\le p\). The dominating function also takes the simpler form \(P(w)=A+B|w|^r\).

Our general bounds are stated in the following theorem and improve on the bounds of Theorems 3.2–3.5 of [17] through smaller numerical constants and weaker moment conditions. These improvements are a result of our improved bounds of Propositions 2.1 and 2.2 on the solutions of the Stein equations (1.1) and (2.10) (the improved w-dependence of the univariate bounds results in weaker moment conditions) and some more careful simplifications in the proof of Theorem 3.1 to improve the dependence of the bounds in parts (iii) and (iv) (in which g is an even function) on the moments of the \(X_{ij}\). The rate of convergence of all bounds with respect to n is of optimal order; see [17, Proposition 3.1]. We shall let \(\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\) denote the quantity \(|{\mathbb {E}}[h(g({\textbf{W}}))]-{\mathbb {E}}[h(g({\textbf{Z}}))]|\). The bounds involve the constants \(c_r=\max \{1,2^{r-1}\},\) \(r\ge 0\).
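To make the setting concrete, the following Monte Carlo sketch (our own illustration; the choices of Rademacher summands, \(g(w)=w^2\) and \(h=\tanh \) are ours) builds W in the univariate case from variables whose moments match those of N(0, 1) up to order \(p=3\), and estimates the quantity \(\Delta _h(g(W),g(Z))\) bounded in Theorem 3.1.

```python
# Illustrative Monte Carlo estimate (not from the paper) of
# Delta_h(g(W), g(Z)) = |E h(g(W)) - E h(g(Z))| with W = n^{-1/2} sum_i X_i,
# X_i i.i.d. Rademacher (moments match N(0,1) up to p = 3), g(w) = w^2, h = tanh.
import numpy as np

rng = np.random.default_rng(2)
g, h = lambda w: w**2, np.tanh

def delta_h(n, reps=10**6):
    s = 2.0 * rng.binomial(n, 0.5, size=reps) - n    # sum of n Rademacher variables
    w = s / np.sqrt(n)
    z = rng.standard_normal(reps)
    return abs(np.mean(h(g(w))) - np.mean(h(g(z))))

for n in [10, 40, 160]:
    print(n, delta_h(n))   # decreases in n; Monte Carlo error ~ reps^{-1/2} eventually dominates
```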

Theorem 3.1

Suppose that the above notations and assumptions prevail. Then under additional assumptions, as given below, the following bounds hold.

  1. (i)

    Suppose \({\mathbb {E}}|X_{ij}|^{r_l+p+1}<\infty \) for all ijl, and that \(g\in C_P^p({\mathbb {R}}^d)\) and \(h\in C_b^p({\mathbb {R}})\). Then

    $$\begin{aligned}&\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\nonumber \\&\le \frac{(p+1)\sqrt{\pi }\Gamma (\frac{p+1}{2})}{2p!\Gamma (\frac{p}{2}+1)}h_p\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{1}{n_j^{(p+1)/2}}\bigg [A{\mathbb {E}}|X_{ij}|^{p+1}\nonumber \\ {}&\quad +B\sum _{k=1}^{d}2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^{p+1}{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^{p+1}X_{ik}^{r_k}|+\mu _{r_k+1}{\mathbb {E}}|X_{ij}|^{p+1}\bigg )\bigg ]. \end{aligned}$$
    (3.48)
  2. (ii)

    Suppose \({\mathbb {E}}|X_{i}|^{r+p+1}<\infty \) for all i, and that \(g\in C_P^{p-1}({\mathbb {R}})\) and \(h\in C_b^{p-1}({\mathbb {R}})\). Then

$$\begin{aligned} \Delta _h(g(W),g(Z))&\le \frac{(p+1)}{p!n^{(p+1)/2}}h_{p-1}\sum _{i=1}^{n}\bigg [\alpha _r A{\mathbb {E}}|X_i|^{p+1}\nonumber \\&\quad +2^{r/2}B\bigg (c_r\beta _r\bigg ({\mathbb {E}}|X_i|^{p+1}{\mathbb {E}}|W|^r+\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{r+p+1}\bigg )+\gamma _r{\mathbb {E}}|X_i|^{p+1}\bigg )\bigg ]. \end{aligned}$$
    (3.49)
  3. (iii)

    Suppose \({\mathbb {E}}|X_{ij}|^{r_l+p+2}<\infty \) for all ijl, and that \(g\in C_P^{p+2}({\mathbb {R}}^d)\) and \(h\in C_b^{p+2}({\mathbb {R}})\). In addition, suppose that \(p\ge 2\) is even and that g is an even function. Then

    $$\begin{aligned}&\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\nonumber \\&\le \frac{1}{p!}h_{p+2}\bigg \{\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{1}{n_j^{p/2+1}}\frac{2p+3}{(p+1)(p+2)}\nonumber \\&\bigg [A{\mathbb {E}}|X_{ij}|^{p+2}+B\sum _{k=1}^{d}2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^{p+2}{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^{p+2}X_{ik}^{r_k}|+\mu _{r_k}{\mathbb {E}}|X_{ij}|^{p+2}\bigg )\bigg ]+\frac{3\pi \Gamma (\frac{p}{2}+2)}{8\sqrt{2}\Gamma (\frac{p+5}{2})}\sum _{j=1}^{d}\sum _{i=1}^{n_j}\frac{|{\mathbb {E}}[X_{ij}^{p+1}]|}{n_j^{(p+1)/2}}\nonumber \\&\quad \times \sum _{k=1}^{d}\sum _{l=1}^{n_k}\frac{1}{n_k^{3/2}}\bigg [A{\mathbb {E}}|X_{lk}|^3+B\sum _{t=1}^{d}3^{r_t/2}\bigg (c_{r_t}{\mathbb {E}}|X_{lk}|^3{\mathbb {E}}|W_t|^{r_t}+\frac{c_{r_t}}{n_t^{r_t/2}}{\mathbb {E}}|X_{lk}^3X_{lt}^{r_t}|\nonumber \\&\quad +2\mu _{r_t+1}{\mathbb {E}}|X_{lk}|^3\bigg )\bigg ]\bigg \}. \end{aligned}$$
    (3.50)
  4. (iv)

    Suppose \({\mathbb {E}}|X_{i}|^{r+p+2}<\infty \) for all i, and that \(g\in C_P^{p}({\mathbb {R}})\) and \(h\in C_b^{p}({\mathbb {R}})\). In addition, suppose that \(p\ge 2\) is even and that g is an even function. Then,

    $$\begin{aligned}&\Delta _h(g(W),g(Z)) \le \frac{1}{p!n^{p/2+1}}h_p\bigg \{\sum _{i=1}^{n}\frac{2p+3}{p+1}\bigg [\alpha _r A{\mathbb {E}}|X_i|^{p+2}\nonumber \\&\quad +2^{r/2}B\bigg (c_r\beta _r\bigg ({\mathbb {E}}|X_i|^{p+2}{\mathbb {E}}|W|^r\nonumber \\&\quad +\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{r+p+2}\bigg )\!+\!\gamma _r{\mathbb {E}}|X_i|^{p+2}\bigg )\bigg ]\!+\!\frac{3}{2n}\sum _{i=1}^{n}\sum _{l=1}^{n}|{\mathbb {E}}[X_i^{p+1}]|\bigg [{{\tilde{\alpha }}}_r A{\mathbb {E}}|X_l|^3\nonumber \\&\quad +3^{r/2}B\bigg (c_r{{\tilde{\beta }}}_r\bigg ({\mathbb {E}}|X_l|^3{\mathbb {E}}|W|^r +\frac{1}{n^{r/2}}{\mathbb {E}}|X_l|^{r+3}\bigg )\!+{{\tilde{\gamma }}}_r{\mathbb {E}}|X_l|^3\bigg )\bigg ]\bigg \}. \end{aligned}$$
    (3.51)
  5. (v)

    If \(h(w)=w\) for all \(w\in {\mathbb {R}}\), then the inequalities in parts (i)–(iv) hold for g in the classes \(C_{P,*}^p({\mathbb {R}}^d)\), \(C_{P,*}^{p-1}({\mathbb {R}})\), \(C_{P,*}^{p+2}({\mathbb {R}}^d)\) and \(C_{P,*}^{p}({\mathbb {R}})\), respectively.
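As a concrete illustration of how these bounds are used (our own worked example, not taken from the paper), take \(d=1\), i.i.d. Rademacher summands (so that the moments match up to \(p=3\)), \(g(w)=w^4\) and \(h(w)=w\), so that part (ii) applies with the weaker class of part (v). Here \(g''(w)=12w^2\), so we may take \(P(w)=A+B|w|^r\) with \(A=0\), \(B=12\) and \(r=2\), and both sides of (3.49) can be evaluated exactly; recall from the proof of Proposition 2.1 that \(h(w)=w\) gives \(h_n=1\).

```python
# Worked example (ours) of the bound (3.49): d = 1, p = 3 (Rademacher X_i),
# g(w) = w^4, h(w) = w, P(w) = 12 w^2 (A = 0, B = 12, r = 2).  In this case
# Delta_h(g(W), g(Z)) = |E[W^4] - E[Z^4]| = 2/n exactly.
import math

p, A, B, r = 3, 0.0, 12.0, 2.0
h_pm1 = 1.0                                    # h(w) = w gives h_{p-1} = 1
mu_r1 = 2**((r + 1) / 2) * math.gamma((r + 2) / 2) / math.sqrt(math.pi)
alpha, beta, gamma_ = r + 3, r + 5, (r + 1) * mu_r1   # constants for r > 1
c_r = max(1.0, 2**(r - 1))
EX4, EX6, EWr = 1.0, 1.0, 1.0                  # Rademacher: |X_i| = 1; E[W^2] = 1

for n in [10, 100, 1000]:
    per_i = alpha * A * EX4 + 2**(r / 2) * B * (
        c_r * beta * (EX4 * EWr + EX6 / n**(r / 2)) + gamma_ * EX4)
    bound = (p + 1) / (math.factorial(p) * n**((p + 1) / 2)) * h_pm1 * n * per_i
    exact = 2.0 / n
    print(n, exact, bound)                     # the bound holds but is not sharp here
```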

If one only requires bounds with an explicit dependence on \(n_1,\ldots ,n_d\) and the dimension d, but not with explicit constants, then the bounds given in the following corollary may be preferable.

Corollary 3.2

Let \(r_*=\max _{1\le j\le d}r_j\) and \(n_*=\min _{1\le j\le d}n_j\), and also let \({\tilde{h}}_p=\sum _{j=1}^p\Vert h^{(j)}\Vert \). Let C be a constant that does not depend on \(n_1,\ldots ,n_d\) and d and which may change from line to line. Then

(i) Under the assumptions of part (i) of Theorem 3.1,

$$\begin{aligned} \Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\le \frac{Cd{\tilde{h}}_p}{n_*^{(p+1)/2}}\sum _{j=1}^d\sum _{i=1}^{n_j}{\mathbb {E}}|X_{ij}|^{r_*+p+1}. \end{aligned}$$
(3.52)

(ii) Under the assumptions of part (ii) of Theorem 3.1,

$$\begin{aligned} \Delta _h(g(W),g(Z))\le \frac{C{\tilde{h}}_{p-1}}{n^{(p+1)/2}}\sum _{i=1}^n{\mathbb {E}}|X_{i}|^{r+p+1}. \end{aligned}$$

(iii) Under the assumptions of part (iii) of Theorem 3.1,

$$\begin{aligned} \Delta _h(g({\textbf{W}}),g({\textbf{Z}}))&\le \frac{Cd{\tilde{h}}_{p+2}}{n_*^{p/2+2}}\sum _{j=1}^d\sum _{k=1}^d\sum _{i=1}^{n_j}\sum _{l=1}^{n_k}\big ({\mathbb {E}}|X_{ij}|^{r_*+p+2}+|{\mathbb {E}}[X_{ij}^{p+1}]|{\mathbb {E}}|X_{lk}|^{r_*+3} \big ). \end{aligned}$$
(3.53)

(iv) Under the assumptions of part (iv) of Theorem 3.1,

$$\begin{aligned} \Delta _h(g(W),g(Z))\le \frac{C{\tilde{h}}_p}{n^{p/2+2}}\sum _{i=1}^n\sum _{l=1}^n\big ({\mathbb {E}}|X_i|^{r+p+2}+|{\mathbb {E}}[X_i^{p+1}]|{\mathbb {E}}|X_l|^{r+3} \big ). \end{aligned}$$

Remark 3.3

The univariate bounds in parts (ii) and (iv) give bounds on \(\Delta _h(g(W),g(Z))\) that hold under weaker assumptions on g and h than the bounds in parts (i) and (iii) with \(d=1\). The univariate bounds of Theorems 3.2–3.5 of [17] also enjoyed this property, but came at the cost of stronger moment conditions in the univariate case. By virtue of the improved w-dependence of our univariate bounds in Propositions 2.1 and 2.2, we were able to derive univariate bounds that do not incur this cost.

Remark 3.4

As noted by [17, Remark 3.6], if \(\Vert h^{(k)}\Vert =1\), \(1\le k\le p\), then \(h_p=B_p\), where \(B_p=\textrm{e}^{-1}\sum _{j=1}^\infty j^p/j!\) is the p-th Bell number (see [31, Section 26.7(i)]). For example, setting \(h_p=B_p\) in inequality (3.48) gives a bound in the \(d_{p}\) metric, \(p\ge 2\). Also note that setting \(p=2\) in part (ii) gives a bound in the Wasserstein distance. To understand the behaviour of the bound (3.48) for large p, we apply the lower bound in (2.15), Stirling’s inequality \(p!>\sqrt{2\pi }p^{p+1/2}\textrm{e}^{-p}\) and the inequality \(B_p<(0.792p/\log (p+1))^p\) of [7] to obtain the bound

$$\begin{aligned} \frac{\sqrt{\pi }\Gamma (\frac{p+3}{2})}{p!\Gamma (\frac{p}{2}+1)}h_p<\frac{1}{2}\sqrt{\frac{p+2}{p}}\bigg (\frac{2.153}{\log (p+1)}\bigg )^p\le \frac{1}{\sqrt{2}}\bigg (\frac{2.153}{\log (p+1)}\bigg )^p, \quad p\ge 2. \end{aligned}$$

We note that whilst we have shown that this constant tends to zero as \(p\rightarrow \infty \), this does not imply that the bound (3.48) will tend to zero in this limit if n is fixed. This is because, for \(p\ge 2\) and \(i=1,\ldots ,n\), we have that \({\mathbb {E}}|X_i|^{p+1}\ge {\mathbb {E}}|X_i|^p={\mathbb {E}}|Z|^p=2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi }\). Similar comments apply to parts (ii)–(iv) of Theorem 3.1.
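A quick numerical evaluation (ours, assuming SciPy is available) of the constant \(\frac{\sqrt{\pi }\Gamma (\frac{p+3}{2})}{p!\Gamma (\frac{p}{2}+1)}B_p\) appearing above, alongside the stated upper bound:

```python
# Numerical illustration of the constant from Remark 3.4 and its upper bound
# (2.153 / log(p+1))^p / sqrt(2); both tend to zero as p grows.
import math
from scipy.special import gammaln

def bell(n):
    # Bell numbers via the recurrence B_{k+1} = sum_j C(k, j) B_j, with B_0 = 1
    b = [1]
    for k in range(n):
        b.append(sum(math.comb(k, j) * b[j] for j in range(k + 1)))
    return b[n]

for p in [2, 4, 8, 16]:
    const = math.exp(0.5 * math.log(math.pi) + gammaln((p + 3) / 2)
                     - gammaln(p + 1) - gammaln(p / 2 + 1)) * bell(p)
    bound = (2.153 / math.log(p + 1))**p / math.sqrt(2)
    print(p, const, bound)                     # const < bound for these p
```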

Remark 3.5

Suppose \(n_1=\cdots =n_d=n\). For a fixed number of matching moments \(p\ge 2\), if we allow the dimension d to grow with n, the bound (3.52) of Corollary 3.2 tends to zero if \(n/d^{4/(p-1)}\rightarrow \infty \) and the bound (3.53) tends to zero if \(n/d^{6/p}\rightarrow \infty \). In particular, for \(p\ge 6\), the bound (3.52) can tend to zero even if \(d\gg n\).

We now set about proving Theorem 3.1 and Corollary 3.2. We begin with the following lemma, which provides bounds on some expectations that will appear in the proof of Theorem 3.1. In the lemma, f denotes the solution (1.2) of the Stein equation (1.1) and \(\psi _{m,j}\), for \(m\ge 1\), \(1\le j\le d\), is the solution of the Stein equation

$$\begin{aligned} \nabla ^\intercal \Sigma \nabla \psi _{m,j}({\textbf{w}})-{\textbf{w}}^\intercal \nabla \psi _{m,j}({\textbf{w}})=\frac{\partial ^m f({\textbf{w}})}{\partial w_j^m}. \end{aligned}$$
(3.54)

In the univariate \(d=1\) case, we drop the subscript j and simply write \(\psi _m\) for the solution to (3.54). For \(1\le i\le n\) and \(1\le j\le d\), we let \({\textbf{W}}^{(i,j)}={\textbf{W}}-n_j^{-1/2}{\textbf{X}}_{ij}\), where the random vector \({\textbf{X}}_{ij}\) is defined to be such that it has \(X_{ij}\) as its j-th entry and the other \(d-1\) entries are equal to zero. Note that \({\textbf{W}}^{(i,j)}\) is independent of \({\textbf{X}}_{ij}\). We then define \({\textbf{W}}_\theta ^{(i,j)}={\textbf{W}}^{(i,j)}+\theta n_j^{-1/2}{\textbf{X}}_{ij}\) for some \(\theta \in (0,1)\). In the univariate case, we let \(W_\theta ^{(i)}=W^{(i)}+\theta X_{i}/\sqrt{n}\), where \(W^{(i)}=W-X_i/\sqrt{n}\).

Lemma 3.6

Let \(P({\textbf{w}})=A+B\sum _{i=1}^{d}|w_i|^{r_i},\) where \(A,B\ge 0\), and \(r_1,\ldots , r_d\ge 0\). Assume that \(\Sigma =I_d\), \(\theta \in (0,1)\), \(q\ge 0\), \(m\ge 2\) and \(t\ge 3\). Then,

$$\begin{aligned} {\mathbb {E}}\bigg |X_{ij}^q\frac{\partial ^tf}{\partial w_j^t}({\textbf{W}}_\theta ^{(i,j)})\bigg |&\le \frac{h_t}{t}\bigg [A{\mathbb {E}}|X_{ij}|^q+B\sum _{k=1}^d2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^q{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^qX_{ik}^{r_k}|+\mu _{r_k}{\mathbb {E}}|X_{ij}|^q\bigg )\bigg ],\end{aligned}$$
(3.55)
$$\begin{aligned} {\mathbb {E}}\bigg |X_{ij}^q\frac{\partial ^tf}{\partial w_j^t}({\textbf{W}}_\theta ^{(i,j)})\bigg |&\le h_{t-1}\frac{\sqrt{\pi }\Gamma (\frac{t}{2})}{2\Gamma (\frac{t+1}{2})}\bigg [A{\mathbb {E}}|X_{ij}|^q+B\sum _{k=1}^d2^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^q{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^qX_{ik}^{r_k}|+\mu _{r_k+1}{\mathbb {E}}|X_{ij}|^q\bigg )\bigg ],\end{aligned}$$
(3.56)
$$\begin{aligned} {\mathbb {E}}\bigg |X_{ij}^q\frac{\partial ^3\psi _{m,j}}{\partial w_j^3}({\textbf{W}}_\theta ^{(i,j)})\bigg |&\le h_{m+1}\frac{\pi \Gamma (\frac{m+3}{2})}{4\sqrt{2}\Gamma (\frac{m}{2}+2)}\bigg [A{\mathbb {E}}|X_{ij}|^q+B\sum _{k=1}^d3^{r_k/2}\bigg (c_{r_k}{\mathbb {E}}|X_{ij}|^q{\mathbb {E}}|W_k|^{r_k}\nonumber \\&\quad +\frac{c_{r_k}}{n_k^{r_k/2}}{\mathbb {E}}|X_{ij}^qX_{ik}^{r_k}|+2\mu _{r_k+1}{\mathbb {E}}|X_{ij}|^q\big )\bigg ], \end{aligned}$$
(3.57)

where the inequalities are for g in the classes \(C_P^t({\mathbb {R}}^d)\), \(C_P^{t-1}({\mathbb {R}}^d)\) and \(C_P^{m+1}({\mathbb {R}}^d)\), respectively. Now suppose \(d=1\) and \(\Sigma =1\). Then

$$\begin{aligned} {\mathbb {E}}|X_i^qf^{(t)}(W_{\theta }^{(i)})|&\le h_{t-2}\bigg \{\alpha _r A{\mathbb {E}}|X_i|^q+2^{r/2}B\bigg [c_r\beta _r\bigg ({\mathbb {E}}|X_i|^q{\mathbb {E}}|W|^r\nonumber \\&\quad +\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{q+r}\bigg ) +\gamma _r{\mathbb {E}}|X_i|^q\bigg ]\bigg \}, \end{aligned}$$
(3.58)
$$\begin{aligned} {\mathbb {E}}|X_i^q\psi _m^{(3)}(W_{\theta }^{(i)})|&\le h_{m-1}\bigg \{{{\tilde{\alpha }}}_r A{\mathbb {E}}|X_i|^q+3^{r/2}B\bigg [c_r{{\tilde{\beta }}}_r\bigg ({\mathbb {E}}|X_i|^q{\mathbb {E}}|W|^r\nonumber \\&\quad +\frac{1}{n^{r/2}}{\mathbb {E}}|X_i|^{q+r}\bigg )+{{\tilde{\gamma }}}_r{\mathbb {E}}|X_i|^q\bigg ]\bigg \}, \end{aligned}$$
(3.59)

where the inequalities are for g in the classes \(C_P^{t-2}({\mathbb {R}})\) and \(C_P^{m-1}({\mathbb {R}})\), respectively.

Proof

The proof is very similar to that of Lemma 3.4 of [17]. The only difference is that we improve the bounds of [17] by applying the improved bounds of Propositions 2.1 and 2.2 for the solutions f and \(\psi _m\) and we also use the inequality \(|a+b|^r\le c_r(|a|^r+|b|^r)\), \(r\ge 0\), which is sharper than the inequality \(|a+b|^r\le 2^r(|a|^r+|b|^r)\), \(r\ge 0\), used by [17]. \(\square \)

Proof of Theorem 3.1

Under the assumptions of part (i) of the theorem, the following bound given in Lemma 3.1 of [17] holds:

$$\begin{aligned} \Delta _h(g({\textbf{W}}),g({\textbf{Z}}))&\le \sum _{j=1}^d\sum _{i=1}^{n_j}\frac{1}{(p-1)!n_j^{(p+1)/2}}\bigg \{{\mathbb {E}}\bigg |X_{ij}^{p-1}\frac{\partial ^{p+1}f}{\partial w_j^{p+1}}({\textbf{W}}_{\theta _1}^{(i,j)})\bigg |\nonumber \\&\quad +\frac{1}{p}{\mathbb {E}}\bigg |X_{ij}^{p+1}\frac{\partial ^{p+1}f}{\partial w_j^{p+1}}({\textbf{W}}_{\theta _2}^{(i,j)})\bigg |\bigg \}, \end{aligned}$$
(3.60)

for some \(\theta _1,\theta _2\in (0,1)\). We now obtain inequality (3.48) by using inequality (3.56) to bound the expectations in the bound (3.60), and then simplifying the resulting bound by using the basic inequalities \({\mathbb {E}}|X_{ij}|\le {\mathbb {E}}|X_{ij}|^a\le {\mathbb {E}}|X_{ij}|^b\) for \(2\le a\le b\). These inequalities follow easily from an application of Hölder’s inequality and the assumption that \({\mathbb {E}}[X_{ij}^2]=1\). We also use these inequalities to deduce that \({\mathbb {E}}|X_{ij}^{p-1}X_{ik}^{r_k}|\le {\mathbb {E}}|X_{ij}^{p+1}X_{ik}^{r_k}|\) (\(p\ge 2\), \(r_k\ge 0\)). This is verified by considering the cases \(j=k\) and \(j\not =k\) separately, where in the latter case we make use of the independence of \(X_{ij}\) and \(X_{ik}\). The proof of inequality (3.49) is similar, but we instead use inequality (3.58) to bound the expectations in (3.60).

Now, under the assumptions of part (iii) of the theorem, the following bound of Lemma 3.3 of [17] holds:

$$\begin{aligned}&\Delta _h(g({\textbf{W}}),g({\textbf{Z}}))\le \sum _{j=1}^d\sum _{i=1}^{n_j}\frac{1}{p!n_j^{p/2+1}}\nonumber \\ {}&\quad \bigg \{{\mathbb {E}}\bigg |X_{ij}^{p}\frac{\partial ^{p+2}f}{\partial w_j^{p+2}}({\textbf{W}}_{\theta _1}^{(i,j)})\bigg |+\!\frac{1}{p+1}\!{\mathbb {E}}\bigg |X_{ij}^{p+2}\frac{\partial ^{p+2}f}{\partial w_j^{p+2}}({\textbf{W}}_{\theta _2}^{(i,j)})\bigg |\nonumber \\&\quad +\!|{\mathbb {E}}X_{ij}^{p+1}|{\mathbb {E}}\bigg |X_{ij}\frac{\partial ^{p+2}f}{\partial w_j^{p+2}}({\textbf{W}}_{\theta _3}^{(i,j)})\bigg |\bigg \}+\sum _{j=1}^d\sum _{i=1}^{n_j}\frac{|{\mathbb {E}}[X_{ij}^{p+1}]|}{p!n_j^{(p+1)/2}}\sum _{k=1}^d\sum _{l=1}^{n_k}\frac{1}{n_k^{3/2}}\times \nonumber \\&\quad \times \bigg \{{\mathbb {E}}\bigg |X_{lk}\frac{\partial ^3\psi _{p+1,j}}{\partial w_k^3}({\textbf{W}}_{\theta _4}^{(l,k)})\bigg |+\frac{1}{2}{\mathbb {E}}\bigg |X_{lk}^3\frac{\partial ^3\psi _{p+1,j}}{\partial w_k^3}({\textbf{W}}_{\theta _5}^{(l,k)})\bigg |\bigg \}, \end{aligned}$$
(3.61)

for some \(\theta _1,\ldots ,\theta _5\in (0,1)\). Using inequalities (3.55) and (3.57) to bound the expectations in the bound (3.61) and simplifying the resulting bound using the same considerations given earlier in this proof yields the bound (3.50). The proof of inequality (3.51) is similar, but we instead use the inequalities (3.58) and (3.59) to bound the expectations in (3.61). Finally, the assertion in part (v) is a consequence of part (iv) of Propositions 2.1 and 2.2. \(\square \)

Proof of Corollary 3.2

The simplified bounds are obtained by applying the following considerations to the bounds of Theorem 3.1. Firstly, for a real-valued random variable Y with \({\mathbb {E}}[Y^2]\ge 1\) and \({\mathbb {E}}|Y|^b<\infty \), we have \({\mathbb {E}}|Y|^a\le {\mathbb {E}}|Y|^b\) for any \(2\le a\le b\); this applies here because all the random variables concerned have unit second moment.

Secondly, \({\mathbb {E}}|W_k|^r\le Cn_k^{-1}\sum _{i=1}^{n_k}{\mathbb {E}}|X_{ik}|^{r}\) for \(r\ge 2\), and \({\mathbb {E}}|W_k|^r\le 1\) for \(0\le r<2\). For \(r\ge 2\), this inequality follows from the Marcinkiewicz–Zygmund inequality [29], \({\mathbb {E}}|\sum _{i=1}^{n}Y_i|^r\le C_r\{\sum _{i=1}^{n}{\mathbb {E}}|Y_i|^r+(\sum _{i=1}^{n}{\mathbb {E}}[Y_i^2])^{r/2}\}\), and the inequality \({\mathbb {E}}|Y|^a\le {\mathbb {E}}|Y|^b\) for \(2\le a\le b\). For \(0\le r<2\), we just use Hölder's inequality: \({\mathbb {E}}|W_k|^r\le ({\mathbb {E}}W_k^2)^{r/2}=1\). Thirdly, by Hölder's inequality, \({\mathbb {E}}|X_{ij}^aX_{ik}^b|\le ({\mathbb {E}}|X_{ij}^{a+b}|)^{a/(a+b)} ({\mathbb {E}}|X_{ik}^{a+b}|)^{b/(a+b)} \le \max \{{\mathbb {E}}|X_{ij}|^{a+b},{\mathbb {E}}|X_{ik}|^{a+b}\}\), for \(a,b\ge 1\). \(\square \)
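For completeness, the intermediate step in the second point of the proof above reads as follows: writing \(W_k=n_k^{-1/2}\sum _{i=1}^{n_k}X_{ik}\) and applying the displayed moment inequality with \(Y_i=X_{ik}\) (recall \({\mathbb {E}}[X_{ik}^2]=1\)),

$$\begin{aligned} {\mathbb {E}}|W_k|^r=\frac{1}{n_k^{r/2}}{\mathbb {E}}\bigg |\sum _{i=1}^{n_k}X_{ik}\bigg |^r\le \frac{C_r}{n_k^{r/2}}\bigg \{\sum _{i=1}^{n_k}{\mathbb {E}}|X_{ik}|^r+n_k^{r/2}\bigg \}\le \frac{2C_r}{n_k}\sum _{i=1}^{n_k}{\mathbb {E}}|X_{ik}|^r, \end{aligned}$$

where the final step uses that \(n_k^{-r/2}\le n_k^{-1}\) for \(r\ge 2\) and that \({\mathbb {E}}|X_{ik}|^r\ge ({\mathbb {E}}[X_{ik}^2])^{r/2}=1\) for each i, so that \(1\le n_k^{-1}\sum _{i=1}^{n_k}{\mathbb {E}}|X_{ik}|^r\).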

4 Application to Chi-square Approximation

In this section, we provide an application of the general bounds of Sect. 3 to the Chi-square approximation of the power divergence statistics. Consider a multinomial goodness-of-fit test over n independent trials, with each trial classified into exactly one of \(r\ge 2\) classes. We denote the observed frequencies in the classes by \(U_1,\ldots ,U_r\) and the nonzero classification probabilities by \(p_1,\ldots ,p_r\). The power divergence family of statistics, introduced by [9], is then given by

$$\begin{aligned} T_\lambda =\frac{2}{\lambda (\lambda +1)}\sum _{j=1}^rU_j\bigg [\bigg (\frac{U_j}{np_j}\bigg )^\lambda -1\bigg ], \end{aligned}$$
(4.62)

where the index parameter \(\lambda \in {\mathbb {R}}\). When \(\lambda =0,-1\), the notation (4.62) should be understood as a result of passage to the limit. The case \(\lambda =0\) corresponds to the log-likelihood ratio statistic, whilst other important special cases include the Freeman–Tukey statistic [13] (\(\lambda =-1/2\)) and Pearson’s statistic [32] (\(\lambda =1\)):

$$\begin{aligned} \chi ^2=\sum _{j=1}^{r}\frac{(U_j-np_j)^2}{np_j}. \end{aligned}$$
(4.63)
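As a concrete illustration of (4.62) and its limiting cases, the following short Python sketch (not part of the analysis in this paper) evaluates \(T_\lambda \) directly from the definition, treating \(\lambda =0\) and \(\lambda =-1\) as limits, and cross-checks the values against SciPy's power_divergence routine; the counts and probabilities used are illustrative only.

```python
import numpy as np
from scipy.stats import power_divergence

def T_lambda(counts, probs, lam):
    """Power divergence statistic T_lambda of (4.62); lam = 0 and lam = -1 are
    understood as limits. Assumes all observed counts are strictly positive."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(probs, dtype=float)
    if lam == 0:    # log-likelihood ratio statistic
        return 2.0 * np.sum(counts * np.log(counts / expected))
    if lam == -1:   # modified log-likelihood ratio statistic
        return 2.0 * np.sum(expected * np.log(expected / counts))
    return 2.0 / (lam * (lam + 1)) * np.sum(counts * ((counts / expected) ** lam - 1.0))

counts, probs = [37, 63], [0.3, 0.7]   # illustrative values only
for lam in (1.0, 0.0, -0.5):           # Pearson, log-likelihood ratio, Freeman-Tukey
    ours = T_lambda(counts, probs, lam)
    ref = power_divergence(counts, f_exp=np.asarray(probs) * sum(counts),
                           lambda_=lam).statistic
    print(f"lambda = {lam:+.1f}:  T_lambda = {ours:.6f}  (scipy: {ref:.6f})")
```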

A fundamental result is that, for all \(\lambda \in {\mathbb {R}}\), the statistic \(T_\lambda \) converges in distribution to the \(\chi _{(r-1)}^2\) distribution as the number of trials n tends to infinity (see [9], p. 443). Edgeworth expansions have been used to assess the quality of the Chi-square approximation of the distribution of the statistic \(T_{\lambda }\) by [3, 4, 14, 34, 36, 40]. For \(r\ge 4\) and all \(\lambda \in {\mathbb {R}}\), Ulyanov and Zubov [40] obtained an \(O(n^{-(r-1)/r})\) bound on the rate of convergence in the Kolmogorov distance (a refinement of this result is given in [3]) and, for \(r=3\), Assylebekov et al. [4] obtained an \(O(n^{-3/4+0.065})\) bound on the rate of convergence. It has also been shown recently by [35] that introducing external randomisation can increase the speed of convergence in the Kolmogorov metric. Special cases of the family have also been considered. For the likelihood ratio statistic, Anastasiou and Reinert [2] obtained an explicit \(O(n^{-1/2})\) bound for smooth test functions (in a setting more general than that of categorical data). For Pearson's statistic, Yarnold [41] established an \(O(n^{-(r-1)/r})\) bound, for \(r\ge 2\), on the rate of convergence in the Kolmogorov distance, which was improved to \(O(n^{-1})\) for \(r\ge 6\) by [24]. An explicit \(O(n^{-1/2})\) Kolmogorov distance bound was established using Stein's method by [28], whilst [20] used Stein's method to obtain explicit \(O(n^{-1})\) bounds, measured using smooth test functions. The results of [20] have recently been generalised to the power divergence statistics by [18], covering the family for \(\lambda >-1\), the largest subclass for which finite sample bounds can be obtained.

In this section, we improve the results of [18, 20] for the case of \(r=2\) cell classifications, by obtaining bounds that hold in weaker metrics and possess constants several orders of magnitude smaller. We derive our results by taking advantage of the special structure of Pearson’s statistic for the case \(r=2\), which allows us to apply the general bounds of Theorem 3.1. The special structure is that the statistic can be written as the square of a sum of i.i.d. random variables with zero mean and unit variance; see the proof of Proposition 4.1.

Proposition 4.1

(i) Let \((U_1,U_2)\) represent the vector of \(n\ge 1\) observed counts, with cell classification probabilities \(0<p_1,p_2<1\). Let \(\chi ^2\) denote Pearson’s statistic, as defined in (4.63). Then,

$$\begin{aligned} d_{\textrm{W}}({\mathcal {L}}(\chi ^2),\chi _{(1)}^2)\le \frac{25}{\sqrt{np_1p_2}}, \end{aligned}$$
(4.64)

and, for \(h\in C_b^2({\mathbb {R}}^+)\),

$$\begin{aligned} |{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\le \frac{892}{np_1p_2}(\Vert h'\Vert +\Vert h''\Vert ), \end{aligned}$$
(4.65)

where \(\chi _{(1)}^2h\) denotes the expectation \({\mathbb {E}}[h(Y)]\) for \(Y\sim \chi _{(1)}^2\).

(ii) More generally, let \(T_\lambda \), \(\lambda >-1\), denote the power divergence statistic (4.62). Then,

$$\begin{aligned} d_{\textrm{W}}({\mathcal {L}}(T_\lambda ),\chi _{(1)}^2)\le \frac{1}{\sqrt{np_1p_2}}\bigg (25+\frac{\sqrt{2}|(\lambda -1)(4\lambda +7)|}{\lambda +1}\bigg ), \end{aligned}$$
(4.66)

and, for \(h\in C_b^2({\mathbb {R}}^+)\),

$$\begin{aligned} |{\mathbb {E}}[h(T_\lambda )]-\chi _{(1)}^2h|&\le \frac{1}{np_1p_2}\bigg \{(892+496|\lambda -1|)(\Vert h'\Vert +\Vert h''\Vert )+\frac{19}{9}(\lambda -1)^2\Vert h''\Vert \nonumber \\&\quad +\frac{|(\lambda -1)(\lambda -2)(12\lambda +13)|}{6(\lambda +1)}\Vert h'\Vert \bigg \}. \end{aligned}$$
(4.67)

Using the special structure of Pearson's statistic when \(r=2\), and applying inequality (4.67) together with a standard smoothing technique for converting smooth test function bounds into Kolmogorov distance bounds, we deduce the following result (see [8, p. 48], and also [19] for a detailed account of the technique).

Corollary 4.2

Suppose \(n\ge 1\) and \(0<p_1,p_2<1.\) Then,

$$\begin{aligned} d_{\textrm{K}}({\mathcal {L}}(\chi ^2),\chi ^2_{(1)})\le \frac{0.9496}{\sqrt{np_1p_2}}. \end{aligned}$$
(4.68)

Let \(\lambda >-1\). Then, there exist universal constants \(C_1,C_2,C_3>0\), independent of n, \(p_1\), \(p_2\) and \(\lambda \), such that

$$\begin{aligned} d_{\textrm{K}}({\mathcal {L}}(T_\lambda ),\chi _{(1)}^2)&\le \frac{1}{(np_1p_2)^{1/5}}\bigg \{C_1+C_2(\lambda -1)^2+\frac{C_3|(\lambda -1)(\lambda -2)(12\lambda +13)|}{(\lambda +1)(np_1p_2)^{2/5}} \bigg \}. \end{aligned}$$
(4.69)
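To give a feel for the bound (4.68), the following Monte Carlo sketch (an illustration only, not part of the proofs in this section) compares it with an empirical estimate of the Kolmogorov distance for one arbitrary choice of \(n\) and \(p_1\).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, p1, reps = 200, 0.3, 100_000   # illustrative values only
p2 = 1.0 - p1

# Pearson's statistic for r = 2 classes: chi^2 = (U_1 - n p_1)^2 / (n p_1 p_2).
U1 = rng.binomial(n, p1, size=reps)
chisq = (U1 - n * p1) ** 2 / (n * p1 * p2)

# Crude estimate of d_K(L(chi^2), chi^2_(1)): maximum gap between the empirical
# CDF of the simulated statistic and the chi^2_(1) CDF over a grid of points.
grid = np.linspace(0.0, 10.0, 2001)
ecdf = np.searchsorted(np.sort(chisq), grid, side="right") / reps
dK_est = np.max(np.abs(ecdf - chi2.cdf(grid, df=1)))

print("Monte Carlo estimate of d_K:", round(dK_est, 4))
print("bound (4.68):               ", round(0.9496 / np.sqrt(n * p1 * p2), 4))
```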

Remark 4.3

Order \(n^{-1/2}\) bounds for the quantity \(|{\mathbb {E}}[h(T_\lambda )]-\chi _{(r-1)}^2\,h|\), \(r\ge 2\), were obtained by [18] for \(h\in C_b^2({\mathbb {R}}^+)\), whilst \(O(n^{-1})\) bounds were obtained for \(h\in C_b^5({\mathbb {R}}^+)\). These bounds generalised bounds of [20] for the quantity \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\), which held for bounded h in the classes \(C_b^2({\mathbb {R}}^+)\) and \(C_b^5({\mathbb {R}}^+)\). By obtaining bounds on \(|{\mathbb {E}}[h(T_\lambda )]-\chi _{(1)}^2h|\) that hold for a wider class of functions, in Corollary 4.2 we are able to deduce a Kolmogorov distance bound for the \(\chi _{(1)}^2\) approximation of \(T_\lambda \) for two cell classifications that improves the \(O(n^{-1/10})\) bound on the rate of convergence obtained by [18] to \(O(n^{-1/5})\). This rate of convergence is slower than the optimal \(O(n^{-1/2})\) rate for the case \(r=2\), but is, to the best of the author's knowledge, the only bound in the literature (other than that of [18]) that tends to zero under the condition that \(np_*\rightarrow \infty \), where \(p_*=\textrm{min}\{p_1,p_2\}\).

Indeed, \(np_*\rightarrow \infty \) (\(p_*=\textrm{min}_{1\le j\le r}p_j\)) is an established condition under which the Chi-square approximation of Pearson's statistic is valid [25]. The key to obtaining bounds that tend to zero under this condition was the weaker moment conditions of Theorem 3.1. If any of the absolute moments in the bounds in parts (ii) and (iv) had been of larger order, we would not have been able to obtain bounds that tend to zero under the condition \(np_*\rightarrow \infty \). As an illustrative example, using Theorem 3.5 of [17] instead of part (iv) of Theorem 3.1 would have resulted in an \(O(n^{-1})\) bound that tends to zero only under the stronger condition \(n^{5/2}p_*\rightarrow \infty \).

The following result, which may be of independent interest, is used to prove Proposition 4.1.

Corollary 4.4

Suppose \(X_1,\ldots ,X_n\) are i.i.d. random variables with \({\mathbb {E}}[X_1]=0\), \({\mathbb {E}}[X_1^2]=1\) and \({\mathbb {E}}[X_1^4]<\infty \). Let \(W=n^{-1/2}\sum _{i=1}^nX_i\). Then,

$$\begin{aligned} d_{\textrm{W}}({\mathcal {L}}(W^2),\chi _{(1)}^2)\le \frac{1}{\sqrt{n}}\bigg [24{\mathbb {E}}|X_1|^3+\frac{17}{\sqrt{n}}{\mathbb {E}}[X_1^4]\bigg ]. \end{aligned}$$
(4.70)

Suppose now that \({\mathbb {E}}[X_1^6]<\infty \). Then, for \(h\in C_b^2({\mathbb {R}}^+)\),

$$\begin{aligned} |{\mathbb {E}}[h(W^2)]-\chi _{(1)}^2h|&\le \frac{(\Vert h'\Vert +\Vert h''\Vert )}{n}\bigg \{187{\mathbb {E}}[X_1^4]+\frac{131}{n}{\mathbb {E}}[X_1^6]\nonumber \\&\quad +|{\mathbb {E}}[X_1^3]|\bigg [704{\mathbb {E}}|X_1|^3+\frac{468}{n}{\mathbb {E}}|X_1|^5\bigg ]\bigg \}. \end{aligned}$$
(4.71)
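The following Monte Carlo sketch (again purely illustrative, with an arbitrary choice of summand distribution, namely standardised exponential variables) compares an empirical estimate of the Wasserstein distance with the bound (4.70); the moments \({\mathbb {E}}|X_1|^3\) and \({\mathbb {E}}[X_1^4]\) are themselves estimated from the simulated sample.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
n, reps = 50, 50_000   # illustrative values only

# X_i = Exp(1) - 1 are i.i.d. with mean 0 and variance 1; W = n^{-1/2} * sum X_i.
X = rng.exponential(size=(reps, n)) - 1.0
W2 = (X.sum(axis=1) / np.sqrt(n)) ** 2

# Empirical 1-Wasserstein distance between samples of W^2 and of chi^2_(1).
Y = rng.chisquare(df=1, size=reps)
dW_est = wasserstein_distance(W2, Y)

# Right-hand side of (4.70), with the moments estimated by Monte Carlo.
EX3 = np.mean(np.abs(X) ** 3)
EX4 = np.mean(X ** 4)
bound = (24.0 * EX3 + 17.0 * EX4 / np.sqrt(n)) / np.sqrt(n)

print("Monte Carlo estimate of d_W:", round(dW_est, 4))
print("bound (4.70):               ", round(bound, 4))
```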

Remark 4.5

The bound (4.71) improves on the bound of Theorem 3.1 of [20] for the quantity \(|{\mathbb {E}}[h(W^2)]-\chi _{(1)}^2h|\) by holding for a larger class of test functions, having weaker moment conditions, and also possessing smaller numerical constants.

Proof

To prove inequality (4.70), we apply part (ii) of Theorem 3.1. We have \(g(w)=w^2\) and, since \(g'(w)=2w\), we take \(P(w)=2|w|\) as our dominating function. Applying inequality (3.49) (taking note of Remark 3.4 to get a Wasserstein distance bound) with \(A=0, B=2, r=1\) and \(p=2\), and using that \({\mathbb {E}}|W|\le ({\mathbb {E}}[W^2])^{1/2}=1\) and \(\mu _1=\sqrt{2/\pi }\), we obtain the bound (4.70), after rounding numerical constants up to the nearest integer.
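(Here, as presumably elsewhere in the paper, \(\mu _r\) denotes the absolute moment \({\mathbb {E}}|Z|^r\) of \(Z\sim N(0,1)\), for which the standard formula

$$\begin{aligned} \mu _r=\frac{2^{r/2}\Gamma ((r+1)/2)}{\sqrt{\pi }} \end{aligned}$$

gives \(\mu _1=\sqrt{2/\pi }\), \(\mu _3=2\sqrt{2/\pi }\) and \(\mu _5=8\sqrt{2/\pi }\), the values used in this proof and in the proof of Lemma 4.6 below.)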

We now prove inequality (4.71). Since \(g(w)=w^2\) is an even function, we can apply part (iv) of Theorem 3.1 to obtain a \(O(n^{-1})\) bound. We have \((g'(w))^2=4w^2\) and \(g''(w)=2\), and so take \(P(w)=2+4w^2\) as our dominating function. Applying inequality (3.51) with \(A=2, B=4, r=2\) and \(p=2,\) and using that \({\mathbb {E}}[W^2]=1\) and \(\mu _3=2\sqrt{2/\pi }\), now yields inequality (4.71). \(\square \)

We will use the following lemma in our proof of Proposition 4.1. The proof, which involves an application of Theorem 3.1, is deferred until the end of the section.

Lemma 4.6

Let \(X_1,\ldots ,X_n\) be i.i.d. random variables with \(X_1=_d(I-p_1)/\sqrt{p_1p_2}\), where \(I\sim \textrm{Ber}(p_1)\). Let \(W=n^{-1/2}\sum _{i=1}^nX_i\). Then, for \(h\in C_b^2({\mathbb {R}}^+)\),

$$\begin{aligned} |{\mathbb {E}}[W^3h'(W^2)]|\le \frac{2976}{\sqrt{np_1p_2}}\{\Vert h'\Vert +\Vert h''\Vert \}. \end{aligned}$$
(4.72)

Proof of Proposition 4.1

(i) We begin by proving inequalities (4.64) and (4.65). Let \(p=p_1\), so that \(p_2=1-p\). As \(U_1+U_2=n\), a short calculation gives that

$$\begin{aligned} \chi ^2=\bigg (\frac{U_1-np}{\sqrt{np(1-p)}}\bigg )^2. \end{aligned}$$
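Indeed, since \(U_2-np_2=-(U_1-np_1)\) and \(\frac{1}{p_1}+\frac{1}{p_2}=\frac{1}{p_1p_2}\) (because \(p_1+p_2=1\)),

$$\begin{aligned} \chi ^2=\frac{(U_1-np_1)^2}{np_1}+\frac{(U_2-np_2)^2}{np_2}=\frac{(U_1-np_1)^2}{n}\bigg (\frac{1}{p_1}+\frac{1}{p_2}\bigg )=\frac{(U_1-np)^2}{np(1-p)}. \end{aligned}$$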

Since \(U_1\sim \textrm{Bin}(n,p)\), it can be expressed as a sum of i.i.d. indicator random variables: \(U_1=\sum _{i=1}^nI_i\), where \(I_i\sim \textrm{Ber}(p)\). We can therefore write \(\chi ^2=W^2\), where \(W=n^{-1/2}\sum _{i=1}^n X_i\), for \(X_1,\ldots , X_n\) i.i.d. random variables with \(X_1=(I_1-p)/\sqrt{p(1-p)}\). We have that \({\mathbb {E}}[X_1]=0\), \({\mathbb {E}}[X_1^2]=1\) and \(|{\mathbb {E}}[X_1^m]|\le {\mathbb {E}}|X_1|^m\le (p(1-p))^{1-m/2}\) for all \(m\ge 2\). The assumptions for inequalities (4.70) and (4.71) therefore hold. Plugging these moment bounds into (4.70) and rounding constants up to the nearest integer then yield the bound

$$\begin{aligned} d_{\textrm{W}}({\mathcal {L}}(\chi ^2),\chi _{(1)}^2)\le \frac{1}{\sqrt{np_1p_2}}\bigg (24+\frac{17}{\sqrt{np_1p_2}}\bigg ). \end{aligned}$$
(4.73)
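For completeness, the moment bound \({\mathbb {E}}|X_1|^m\le (p(1-p))^{1-m/2}\) used above (and again in the proofs of Corollary 4.2 and Lemma 4.6) can be verified directly: for \(m\ge 2\),

$$\begin{aligned} {\mathbb {E}}|X_1|^m=\frac{p(1-p)^m+(1-p)p^m}{(p(1-p))^{m/2}}=\frac{(1-p)^{m-1}+p^{m-1}}{(p(1-p))^{m/2-1}}\le (p(1-p))^{1-m/2}, \end{aligned}$$

since \((1-p)^{m-1}+p^{m-1}\le (1-p)+p=1\) for \(m\ge 2\).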

We obtain the simplified bound (4.64) as follows. Let \(Y\sim \chi _{(1)}^2\). Observe that, for \(h\in {\mathcal {H}}_{\textrm{W}}\),

$$\begin{aligned} |{\mathbb {E}}[h(\chi ^2)]-\chi ^2_{(1)}h|\le \Vert h'\Vert {\mathbb {E}}|\chi ^2-Y|\le \Vert h'\Vert \big ({\mathbb {E}}[\chi ^2]+{\mathbb {E}}[Y]\big )=2\Vert h'\Vert \le 2, \end{aligned}$$

and so

$$\begin{aligned} d_{\textrm{W}}({\mathcal {L}}(\chi ^2),\chi _{(1)}^2)\le 2. \end{aligned}$$
(4.74)

Now, the upper bound in (4.73) is greater than the upper bound in (4.74) if \(\sqrt{np_1p_2}<17\), and so we may take \(\sqrt{np_1p_2}\ge 17\) in (4.73), and doing so yields the bound (4.64). Similarly, we obtain inequality (4.65) by plugging the moment bounds into (4.71) and simplifying the bound as we did in deriving inequality (4.64).

(ii) To prove inequality (4.66), we use the following bound, which can be found by examining the proof of Theorem 2.3 of [18]:

$$\begin{aligned} |{\mathbb {E}}[h(T_\lambda )]-\chi _{(1)}^2h|&\le |{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\nonumber \\&\quad +\frac{|(\lambda -1)(4\lambda +7)|\Vert h'\Vert }{(\lambda +1)\sqrt{n}}\bigg (\frac{1}{\sqrt{p_1}}+\frac{1}{\sqrt{p_2}}\bigg ). \end{aligned}$$
(4.75)

Using inequality (4.64) to bound \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\) and also using the inequality \(1/\sqrt{p_1}+1/\sqrt{p_2}\le \sqrt{2/(p_1p_2)}\) now yields the desired bound. We note that inequality (4.75) was derived under the assumption that \(np_*\ge 1\) and that, if \(\lambda \ge 2\), then also \(np_*\ge 2(\lambda -2)^2\), where \(p_*=\textrm{min}\{p_1,p_2\}\). However, if these assumptions are not satisfied, then the bound in (4.66) exceeds 2 [the upper bound in (4.74)], and so there is no need to impose these conditions in the statement of the proposition; similar comments apply to inequality (4.76) below.
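For the record, the elementary inequality used above follows because \(p_1p_2\le 1/4\) when \(p_1+p_2=1\):

$$\begin{aligned} \bigg (\frac{1}{\sqrt{p_1}}+\frac{1}{\sqrt{p_2}}\bigg )^2=\frac{1}{p_1}+\frac{1}{p_2}+\frac{2}{\sqrt{p_1p_2}}=\frac{1}{p_1p_2}+\frac{2}{\sqrt{p_1p_2}}\le \frac{2}{p_1p_2}, \end{aligned}$$

since \(2\sqrt{p_1p_2}\le 1\); taking square roots gives \(1/\sqrt{p_1}+1/\sqrt{p_2}\le \sqrt{2/(p_1p_2)}\). The same display also verifies the inequality \((1/\sqrt{p_1}+1/\sqrt{p_2})^2\le 2/(p_1p_2)\) used at the end of the proof.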

Finally, we prove inequality (4.67). We use the following bound, which can be found by examining the proof of Theorem 2.2 of [18]:

$$\begin{aligned}&|{\mathbb {E}}[h(T_\lambda )]-\chi _{(1)}^2h|\nonumber \\ {}&\le \frac{|(\lambda -1)(\lambda -2)(12\lambda +13)|}{6(\lambda +1)n}\bigg (\frac{1}{p_1}+\frac{1}{p_2}\bigg )\Vert h'\Vert \nonumber \\&\quad +\frac{19(\lambda -1)^2}{18n}\bigg (\frac{1}{\sqrt{p_1}}+\frac{1}{\sqrt{p_2}}\bigg )^2\Vert h''\Vert +|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|+R, \end{aligned}$$
(4.76)

where

$$\begin{aligned} R=\frac{|\lambda -1|}{3}\bigg |{\mathbb {E}}\bigg [\bigg (\frac{S_1^3}{\sqrt{np_1}}+\frac{S_2^3}{\sqrt{np_2}}\bigg )h'(W^2)\bigg ]\bigg |, \end{aligned}$$

and \(S_j=(U_j-np_j)/\sqrt{np_j}\), \(j=1,2\). To bound R, we express \(S_1\) and \(S_2\) in terms of W. Using the representation \(W=(U_1-np_1)/\sqrt{np_1p_2}\) given at the beginning of the proof, together with the fact that \(U_2-np_2=-(U_1-np_1)\), we have the representations \(S_1=\sqrt{p_2}W\) and \(S_2=-\sqrt{p_1}W\). Therefore,

$$\begin{aligned} R&\le \frac{|\lambda -1|}{3}\bigg (\frac{p_2^{3/2}}{\sqrt{np_1}}+\frac{p_1^{3/2}}{\sqrt{np_2}}\bigg )|{\mathbb {E}}[W^3h'(W^2)]|\nonumber \\&\le \frac{992|\lambda -1|}{np_1p_2}\big (p_2^{3/2}p_1^{1/2}+p_1^{3/2}p_2^{1/2}\big )\big (\Vert h'\Vert +\Vert h''\Vert \big )\le \frac{496|\lambda -1|}{np_1p_2}\big (\Vert h'\Vert +\Vert h''\Vert \big ), \end{aligned}$$
(4.77)

where we used Lemma 4.6 to obtain the first inequality. We now obtain inequality (4.67) by substituting inequality (4.77) and our bound (4.65) for \(|{\mathbb {E}}[h(\chi ^2)]-\chi _{(1)}^2h|\) into (4.76), and simplifying the resulting bound using the formula \(1/p_1+1/p_2= 1/(p_1p_2)\) and the inequality \((1/\sqrt{p_1}+1/\sqrt{p_2})^2\le 2/(p_1p_2)\), for \(0<p_1,p_2<1\) such that \(p_1+p_2=1\). \(\square \)

Proof of Corollary 4.2

We first prove inequality (4.68). Recall from the proof of Proposition 4.1 that when \(r=2\) we can write Pearson’s statistic as \(\chi ^2=W^2\), where \(W=n^{-1/2}\sum _{i=1}^{n}X_{i}\) for \(X_{1},\ldots ,X_{n}\) i.i.d. random variables with \(X_1=_d(I_1-p)/\sqrt{p(1-p)}\) and \(I_1\sim \textrm{Ber}(p)\). Recall also that if \(Z\sim N(0,1)\) then \(Z^2\sim \chi _{(1)}^2\). Thus, for any \(z>0\),

$$\begin{aligned} |{\mathbb {P}}(W^2\le z)-{\mathbb {P}}(Z^2\le z)|&=|{\mathbb {P}}(-\sqrt{z}\le W\le \sqrt{z})-{\mathbb {P}}(-\sqrt{z}\le Z\le \sqrt{z})|\\&\le |{\mathbb {P}}(W\le \sqrt{z})-{\mathbb {P}}(Z\le \sqrt{z})|+|{\mathbb {P}}(W\le -\sqrt{z})-{\mathbb {P}}(Z\le -\sqrt{z})|, \end{aligned}$$

from which we deduce that \( d_{\textrm{K}}({\mathcal {L}}(\chi ^2),\chi ^2_{(1)})\le 2d_{\textrm{K}}({\mathcal {L}}(W),{\mathcal {L}}(Z))\). Now, using the Berry–Esseen theorem with the best available numerical constant \(C=0.4748\) [38], together with the moment bound \({\mathbb {E}}|X_1|^3\le (p_1p_2)^{-1/2}\) noted in the proof of Proposition 4.1, we get:

$$\begin{aligned} d_{\textrm{K}}({\mathcal {L}}(\chi ^2),\chi ^2_{(1)})\le 2\frac{0.4748{\mathbb {E}}|X_1^3|}{({\mathbb {E}}[X_1^2])^{3/2}\sqrt{n}}\le \frac{0.9496}{\sqrt{np_1p_2}}. \end{aligned}$$

We now prove inequality (4.69). Let \(\alpha >0\), and for fixed \(z>0\) define a function \(h_\alpha :{\mathbb {R}}^+\rightarrow [0,1]\) by \(h_\alpha (x)=1\) if \(x\le z\); \(h_\alpha (x)=1-2(x-z)^2/\alpha ^2\) if \(z<x\le z+\alpha /2\); \(h_\alpha (x)=2(x-(z+\alpha ))^2/\alpha ^2\) if \(z+\alpha /2<x\le z+\alpha \); and \(h_\alpha (x)=0\) if \(x\ge z+\alpha \). Then \(h_\alpha '\) is Lipschitz with \(\Vert h_\alpha '\Vert =2/\alpha \) and \(\Vert h_\alpha ''\Vert =4/\alpha ^2\). Let \(Y\sim \chi _{(1)}^2\). Using inequality (4.67) now yields

$$\begin{aligned}&{\mathbb {P}}(T_\lambda \le z)-{\mathbb {P}}(Y\le z)\nonumber \\&\quad \le {\mathbb {E}}[h_\alpha (T_\lambda )]-{\mathbb {E}}[h_\alpha (Y)]+{\mathbb {E}}[h_\alpha (Y)]-{\mathbb {P}}(Y\le z)\nonumber \\&\quad \le \frac{1}{np_1p_2}\bigg \{(892+496|\lambda -1|)\bigg (\frac{2}{\alpha }+\frac{4}{\alpha ^2}\bigg )+\frac{19}{9}(\lambda -1)^2\cdot \frac{4}{\alpha ^2}\nonumber \\&\quad \quad +\frac{|(\lambda -1)(\lambda -2)(12\lambda +13)|}{6(\lambda +1)}\cdot \frac{2}{\alpha } \bigg \}+{\mathbb {P}}(z\le Y\le z+\alpha ). \end{aligned}$$
(4.78)

We note the inequality \({\mathbb {P}}(z\le Y\le z+\alpha )\le \sqrt{2\alpha /\pi }\) (see [20, p. 754]). An upper bound now follows on applying this inequality to (4.78) and choosing \(\alpha = c(np_1p_2)^{-2/5}\), for some universal constant \(c>0\); this choice balances the two dominant error terms, since \((np_1p_2)^{-1}\alpha ^{-2}\) and \(\alpha ^{1/2}\) are of the same order precisely when \(\alpha \asymp (np_1p_2)^{-2/5}\), which gives the overall rate \((np_1p_2)^{-1/5}\) appearing in (4.69). To simplify the bound, we used the basic inequality \(|\lambda -1|\le 1+(\lambda -1)^2\). Similarly, we can obtain a lower bound that is the negative of the upper bound, which completes the proof. \(\square \)

Proof of Lemma 4.6

Let \(g(w)=w^3h'(w^2)\). We begin by noting that \({\mathbb {E}}[g(Z)]={\mathbb {E}}[Z^3h'(Z^2)]=0\). This is because the standard normal distribution is symmetric about the origin (\(Z=_d -Z\), for \(Z\sim N(0,1)\)) and g(w) is an odd function (\(g(w)=-g(-w)\) for all \(w\in {\mathbb {R}}\)), meaning that \({\mathbb {E}}[g(Z)]={\mathbb {E}}[g(-Z)]=-{\mathbb {E}}[g(Z)]\), whence \({\mathbb {E}}[g(Z)]=0\). Therefore, we can write \({\mathbb {E}}[W^3h'(W^2)]={\mathbb {E}}[W^3h'(W^2)]-{\mathbb {E}}[Z^3h'(Z^2)]\).

We shall obtain the bound (4.72) by applying part (ii) of Theorem 3.1 with \(g(w)=w^3\,h'(w^2)\) (here h is the fixed function from the statement of the lemma) and with the identity function \(w\mapsto w\) as the test function. Using the basic inequality \(2a^2\le 1+a^4\), we have, for \(w\in {\mathbb {R}}\),

$$\begin{aligned} |g'(w)|&=|2w^4h''(w^2)+3w^2h'(w^2)|\le 2w^4\Vert h''\Vert +3w^2\Vert h'\Vert \\&\le \frac{3}{2}\Vert h'\Vert +\Big (2\Vert h''\Vert +\frac{3}{2}\Vert h'\Vert \Big )w^4. \end{aligned}$$

We can therefore apply part (ii) of Theorem 3.1 with \(A=3\Vert h'\Vert /2\), \(B=2\Vert h''\Vert +3\Vert h'\Vert /2\), \(r=4\) and \(p=2\); using the bound (3.49) then gives that

$$\begin{aligned}&|{\mathbb {E}}[W^3h'(W^2)]|=|{\mathbb {E}}[W^3h'(W^2)]-{\mathbb {E}}[Z^3h'(Z^2)]| \\&\le \frac{3}{2\sqrt{n}}\bigg [\frac{21}{2}\Vert h'\Vert {\mathbb {E}}|X_1^3|+4\Big (2\Vert h''\Vert +\frac{3}{2}\Vert h'\Vert \Big )\bigg (72{\mathbb {E}}|X_1^3|{\mathbb {E}}[W^4]+\frac{72}{n^{2}}{\mathbb {E}}|X_1^{7}|+40\sqrt{\frac{2}{\pi }}{\mathbb {E}}|X_1^3|\bigg )\bigg ]\\&\le \frac{1}{\sqrt{np_1p_2}}\bigg [\Vert h'\Vert \bigg (2247+\frac{648}{np_1p_2}+\frac{648}{(np_1p_2)^2}\bigg )+\Vert h''\Vert \bigg (2975+\frac{864}{np_1p_2}+\frac{864}{(np_1p_2)^2}\bigg )\bigg ]\\&\le \frac{1}{\sqrt{np_1p_2}}\{\Vert h'\Vert +\Vert h''\Vert \}\bigg (2975+\frac{864}{np_1p_2}+\frac{864}{(np_1p_2)^2}\bigg ), \end{aligned}$$

where we used that \(\mu _5=8\sqrt{2/\pi }\) and that \({\mathbb {E}}|X_1^3|\le (p_1p_2)^{-1/2}\), \({\mathbb {E}}|X_1^7|\le (p_1p_2)^{-5/2}\) and \({\mathbb {E}}[W^4]=3(n-1)({\mathbb {E}}[X_1^2])^2/n+{\mathbb {E}}[X_1^4]/n\le 3+1/(np_1p_2)\). We obtain the final simplified bound (4.72) using a similar argument to the one used in the proof of Proposition 4.1. \(\square \)