1 Introduction

Asymptotic normality is a frequently occurring phenomenon, the classical central limit theorem being the very first example. The first step in its proof is the observation that the moment generating function of the sum of n independent, identically distributed random variables is the n-th power of the moment generating function of the distribution underlying the summands. As moment generating functions of a similar shape occur in many examples in combinatorics, a general theorem to prove asymptotic normality is desirable. Such a theorem, usually called the “quasi-power theorem”, was proved by Hwang [17].

Theorem

(Hwang [17]) Let \(\{\Omega _n\}_{n\ge 1}\) be a sequence of integer-valued random variables. Suppose that the moment generating function satisfies the asymptotic expression

$$\begin{aligned} M_n(s):=\mathbb {E}\left( e^{\Omega _ns}\right) =e^{W_n(s)}(1+O(\kappa _n^{-1})), \end{aligned}$$
(1.1)

the O-term being uniform for \(|s |\le \tau \), \(s\in \mathbb {C}\), \(\tau >0\), where

  1. \(W_n(s)=u(s)\phi _{n}+v(s)\), with u(s) and v(s) analytic for \(|s |\le \tau \) and independent of n; and \(u''(0)\ne 0\);

  2. \(\lim _{n\rightarrow \infty }\phi _{n}=\infty \);

  3. \(\lim _{n\rightarrow \infty }\kappa _n=\infty \).

Then the distribution of \(\Omega _n\) is asymptotically normal, i.e.,

$$\begin{aligned} \sup _{x\in \mathbb {R}}\left| \mathbb {P}\left( \frac{\Omega _n- u'(0)\phi _{n}}{\sqrt{u''(0)\phi _{n}}} < x\right) - \Phi (x)\right| =O\left( \frac{1}{\sqrt{\phi _{n}}}+\frac{1}{\kappa _n}\right) , \end{aligned}$$

where \(\Phi \) denotes the distribution function of the standard normal distribution

$$\begin{aligned} \Phi (x)=\frac{1}{\sqrt{2\pi }}\int _{-\infty }^{x}\exp \left( -\frac{1}{2} y^2\right) \,dy. \end{aligned}$$

See Hwang’s article [17] as well as Flajolet–Sedgewick [6, Sec. IX.5] for many applications of this theorem. A generalisation of the quasi-power theorem to dimension 2 has been provided in [12]; it has been used in [4, 14, 15, 16, 18]. In [3, Thm. 2.22], an m-dimensional version of the quasi-power theorem is stated without speed of convergence. An m-dimensional theorem without speed of convergence is also proved in [1], where several multidimensional applications are given.

In contrast to many results about the speed of convergence in classical probability theory (see, e.g., [11]), the sequence of random variables is not assumed to be independent. The only assumption is that the moment generating function behaves asymptotically like a large power. This mirrors the fact that the moment generating function of the sum of independent, identically distributed random variables is exactly a large power. The advantage is that the asymptotic expression (1.1) arises naturally in combinatorics by using techniques such as singularity analysis or saddle point approximation (see [6]).

The purpose of this article is to generalise the quasi-power theorem including the speed of convergence to arbitrary dimension m. We first state this main result in Theorem 1 in this section. In Sect. 2, a new Berry–Esseen inequality (Theorem 2) is presented, which we use to prove the m-dimensional quasi-power theorem. The combinatorial idea behind the formulation of the Berry–Esseen inequality is discussed in Sect. 3. Our Berry–Esseen bound is proved in Sect. 4. The final Sect. 5 is then devoted to the proof of the quasi-power theorem. Examples are given in the extended abstract [13].

We use the following conventions: vectors are denoted by boldface letters such as \(\mathbf {s}\), their components are then denoted by regular letters with indices such as \(s_j\). For a vector \(\mathbf {s}\), \(\Vert \mathbf {s}\Vert \) denotes the maximum norm \(\max \{|s_j |\}\). The standard inner product on \(\mathbb {C}^m\) (with linearity in the second argument) is denoted by \(\langle \,\cdot \,, \,\cdot \,\rangle \). All implicit constants of O-terms may depend on the dimension m as well as on \(\tau \) which is introduced in Theorem 1.

Our first main result is the following m-dimensional version of Hwang’s theorem.

Theorem 1

Let \(\{\varvec{\Omega }_n\}_{n\ge 1}\) be a sequence of m-dimensional real random vectors. Suppose that the moment generating function satisfies the asymptotic expression

$$\begin{aligned} M_n(\mathbf {s}):=\mathbb {E}(e^{\langle \varvec{\Omega }_n,\mathbf {s}\rangle })=e^{W_n(\mathbf {s})}(1+O(\kappa _n^{-1})), \end{aligned}$$
(1.2)

the O-term being uniform for \(||\mathbf {s} ||\le \tau \), \(\mathbf {s}\in \mathbb {C}^m\), \(\tau >0\), where

  1. \(W_n(\mathbf {s})=u(\mathbf {s})\phi _{n}+v(\mathbf {s})\), with \(u(\mathbf {s})\) and \(v(\mathbf {s})\) analytic for \(||\mathbf {s} ||\le \tau \) and independent of n; and the Hessian \(H_u(\varvec{0})\) of u at the origin is non-singular;

  2. \(\lim _{n\rightarrow \infty }\phi _{n}=\infty \);

  3. \(\lim _{n\rightarrow \infty }\kappa _n=\infty \).

Then, the distribution of \(\varvec{\Omega }_n\) is asymptotically normal with speed of convergence \(O(\phi _n^{-1/2})\), i.e.,

$$\begin{aligned} \sup _{\mathbf {x}\in \mathbb {R}^{m}}\left| \mathbb {P}\left( \frac{\varvec{\Omega }_n-{{\mathrm{grad}}}u (\varvec{0})\phi _{n}}{\sqrt{\phi _{n}}} \le \mathbf {x}\right) - \Phi _{H_u(\varvec{0})}(\mathbf {x})\right| =O\left( \frac{1}{\sqrt{\phi _{n}}}\right) , \end{aligned}$$
(1.3)

where \(\Phi _{\Sigma }\) denotes the distribution function of the non-degenerate m-dimensional normal distribution with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma \), i.e.,

$$\begin{aligned} \Phi _\Sigma (\mathbf {x})=\frac{1}{(2\pi )^{m/2}\sqrt{\det \Sigma }}\int _{\mathbf {y}\le \mathbf {x}}\exp \left( -\frac{1}{2} \mathbf {y}^\top \Sigma ^{-1} \mathbf {y}\right) \,d\mathbf {y}, \end{aligned}$$

where \(\mathbf {y}\le \mathbf {x}\) means \(y_\ell \le x_\ell \) for \(1\le \ell \le m\).

If \(H_{u}(\varvec{0})\) is singular, the random variables

$$\begin{aligned} \frac{\varvec{\Omega }_{n}-{{\mathrm{grad}}}u(\varvec{0})\phi _{n}}{\sqrt{\phi _{n}}} \end{aligned}$$

converge in distribution to a degenerate normal distribution with mean \(\varvec{0}\) and variance-covariance matrix \(H_{u}(\varvec{0})\).

Note that in the case of singular \(H_{u}(\varvec{0})\), a uniform speed of convergence cannot be guaranteed. To see this, consider the (constant) sequence of random variables \(\Omega _{n}\) taking the values \(\pm 1\), each with probability \(1/2\). Then the moment generating function is \((e^{s}+e^{-s})/2\), which is of the form (1.2) with \(\phi _{n}=n\), \(u(s)=0\), \(v(s)=\log ((e^{s}+e^{-s})/2)\) and \(\kappa _{n}\) arbitrary. However, the distribution function of \(\Omega _{n}/\sqrt{n}\) is given by

$$\begin{aligned} \mathbb {P}\left( \frac{\Omega _{n}}{\sqrt{n}}\le x\right) = {\left\{ \begin{array}{ll} 0&{} \quad \text {if }\,x<-1/\sqrt{n},\\ 1/2&{} \quad \text {if }\,-1/\sqrt{n}\le x<1/\sqrt{n},\\ 1&{} \quad \text {if }\,1/\sqrt{n}\le x, \end{array}\right. } \end{aligned}$$

which does not converge uniformly: the limiting distribution is the point mass at 0, whose distribution function jumps from 0 to 1 at \(x=0\), whereas for every n, the value \(1/2\) is attained on an interval just below 0.

In contrast to the original quasi-power theorem, the error term in our result does not contain the summand \(O(1/\kappa _n)\). In fact, this summand could also be omitted in the original proof of the quasi-power theorem by using a better estimate for the error \(E_n(s)=M_n(s)e^{-W_{n}(s)}-1\), cf. the proof of our Lemma 5.1.

The order of the error is optimal (without further assumptions on the random variables), as is the case for the one-dimensional Berry–Esseen inequality. See, for example, the approximation of a binomial distribution by the normal distribution [20, § 1.2].
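As a numerical illustration of this optimality (a sketch only, assuming numpy and scipy are available), one can compare the standardised binomial distribution with parameters n and \(1/2\) to the standard normal distribution: here \(M_n(s)=((1+e^{s})/2)^{n}\), so \(u(s)=\log ((1+e^{s})/2)\), \(u'(0)=1/2\) and \(u''(0)=1/4\), and the sup-distance multiplied by \(\sqrt{n}\) stays roughly constant.

import numpy as np
from scipy.stats import binom, norm

for n in [100, 400, 1600, 6400]:
    k = np.arange(n + 1)
    # standardise with mean n*u'(0) = n/2 and deviation sqrt(n*u''(0)) = sqrt(n)/2
    x = (k - n / 2) / (np.sqrt(n) / 2)
    cdf = binom.cdf(k, n, 0.5)
    # the supremum is attained at or immediately before the jump points
    dist = max(np.max(np.abs(cdf - norm.cdf(x))),
               np.max(np.abs(cdf[:-1] - norm.cdf(x[1:]))))
    print(n, dist, dist * np.sqrt(n))  # the last column is roughly constant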

The proof of Theorem 1 relies on an m-dimensional Berry–Esseen inequality (Theorem 2). It is a generalisation of Sadikova’s result [23, 24] in dimension 2. The main challenge is to provide a version which leads to bounded integrands around the origin, while still allowing the use of excellent bounds for the tails of the characteristic functions. To achieve this, linear combinations involving all partitions of the set \(\{1,\ldots , m\}\) are used.

Note that there are several generalisations of the one-dimensional Berry–Esseen inequality [2, 5] to arbitrary dimension; see, e.g., Gamkrelidze [7, 8] and Prakasa Rao [21]. However, using these results would lead to a less precise error term in (1.3); see the end of Sect. 2 for more details. For that reason we generalise Sadikova’s result, which was already successfully used by the first author in [12] to prove a 2-dimensional quasi-power theorem. Also note that our theorem can deal with discrete random variables, too, in contrast to [22], where density functions are considered.

For the sake of completeness, we also state the following result about the moments of \(\varvec{\Omega }_{n}\).

Proposition 1.1

The cross-moments of \(\varvec{\Omega }_{n}\) satisfy

$$\begin{aligned} \frac{1}{\prod _{\ell =1}^{m}k_{\ell }!}\mathbb E\left( \prod _{\ell =1}^{m}\Omega _{n,\ell }^{k_{\ell }}\right) = p_{\mathbf {k}}(\phi _{n})+O\left( \kappa _{n}^{-1}\phi _{n}^{k_{1}+\cdots +k_{m}}\right) , \end{aligned}$$

for \(k_{\ell }\) nonnegative integers, where \(p_{\mathbf {k}}\) is a polynomial of degree \(\sum _{\ell =1}^{m}k_{\ell }\) defined by

$$\begin{aligned} p_{\mathbf {k}}(X)=\left[ s_{1}^{k_{1}}\cdots s_{m}^{k_{m}}\right] e^{u(\mathbf {s})X+v(\mathbf {s})}. \end{aligned}$$

In particular, the mean and the variance-covariance matrix are

$$\begin{aligned}&\mathbb E(\varvec{\Omega }_{n})={{\mathrm{grad}}}u(\varvec{0})\phi _{n}+{{\mathrm{grad}}}v(\varvec{0})+O\left( \kappa _{n}^{-1}\right) ,\\&{{\mathrm{Cov}}}(\varvec{\Omega }_{n})=H_{u}(\varvec{0})\phi _{n}+H_{v}(\varvec{0})+O\left( \kappa _{n}^{-1}\right) , \end{aligned}$$

respectively.
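As an illustration of Proposition 1.1, consider the following sympy sketch under the assumed toy choice \(u(s)=e^{s}-1\), \(v(s)=0\) in dimension \(m=1\) (chosen for illustration only): then \(M_n(s)=\exp (n(e^{s}-1))\) holds exactly, \(\Omega _n\) is Poisson distributed with mean \(\phi _n=n\), and \(k!\,p_{k}(n)\) must coincide with the k-th moment, which equals \(\sum _{j}S(k,j)n^{j}\) by Touchard's formula.

import sympy as sp
from sympy.functions.combinatorial.numbers import stirling

s, X = sp.symbols('s X')
k = 3
# p_k(X) = [s^k] exp(u(s) X + v(s)) with u(s) = exp(s) - 1, v(s) = 0
p_k = sp.series(sp.exp((sp.exp(s) - 1) * X), s, 0, k + 1).removeO().coeff(s, k)
print(sp.expand(p_k))                                  # polynomial of degree k
# Touchard's formula for the k-th moment of a Poisson(X) random variable
touchard = sum(stirling(k, j) * X**j for j in range(k + 1))
print(sp.simplify(sp.factorial(k) * p_k - touchard))   # 0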

2 A Berry–Esseen inequality

This section is devoted to a generalisation of Sadikova’s Berry–Esseen inequality [23, 24] in dimension 2 to dimension m. Before stating the theorem, we introduce our notation.

We use Iverson’s convention

$$\begin{aligned}{}[ expr ] = {\left\{ \begin{array}{ll} 1&{}\text { if } \textit{ expr } \text { is true},\\ 0&{}\text { if } \textit{ expr } \text { is false}, \end{array}\right. } \end{aligned}$$

popularized by Graham, Knuth, and Patashnik [10]. As in [10], we consider \([ expr ]\) to be “very strongly zero” when \( expr \) is false, so that even a product of \([ expr ]\) with an undefined quantity vanishes in that case.

Let \(L=\{1,\ldots , m\}\). For \(K\subseteq L\), we write \(\mathbf {s}_K=(s_k)_{k\in K}\) for the projection of \(\mathbf {s}\in \mathbb {C}^L\) to \(\mathbb {C}^K\). For \(J\subseteq K\subseteq L\), let \(\chi _{J,K}:\mathbb {C}^{J}\rightarrow \mathbb {C}^{K}\), \((s_{j})_{j\in J}\mapsto (s_{k}[k\in J ])_{k\in K}\) be an injection from \(\mathbb {C}^{J}\) into \(\mathbb {C}^{K}\). Similarly, let \(\psi _{J,K}:\mathbb {C}^{K}\rightarrow \mathbb {C}^{K}\), \((s_{k})_{k\in K}\mapsto (s_{k}[k\in J ])_{k\in K}\) be the projection which sets all coordinates corresponding to \(K\setminus J\) to 0.

We denote the set of all partitions of K by \(\Pi _K\). We consider a partition as a set \(\alpha =\{J_{1},\ldots ,J_{k}\}\). Thus \(|\alpha |\) denotes the number of parts of the partition \(\alpha \). Furthermore, \(J\in \alpha \) means that J is a part of the partition \(\alpha \).

Now, we can define an operator which we later use to state our Berry–Esseen inequality. The motivation behind this definition is explained at the end of this section.

Definition 2.1

Let \(K\subseteq L\) and \(h:\mathbb {C}^K\rightarrow \mathbb {C}\). We define the non-linear operator

$$\begin{aligned} \Lambda _K(h):=\sum _{\alpha \in \Pi _K}\mu _\alpha \prod _{J\in \alpha }h\circ \psi _{J, K} \end{aligned}$$

where

$$\begin{aligned} \mu _\alpha = (-1)^{|\alpha |-1}(|\alpha |-1)!\,. \end{aligned}$$

We denote \(\Lambda _{L}\) briefly by \(\Lambda \).
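For small m, the operator can be implemented directly by enumerating set partitions. The following Python sketch (our illustration; the function names are ad hoc) evaluates \(\Lambda \) numerically and checks it against the two-dimensional formula (2.3) below and against the vanishing property of Lemma 3.1.

from math import factorial

def partitions(elements):
    # enumerate all set partitions of a list, as lists of tuples
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for beta in partitions(rest):
        yield [(first,)] + beta                       # first forms its own part
        for i, part in enumerate(beta):               # or joins an existing part
            yield beta[:i] + [(first,) + part] + beta[i + 1:]

def Lambda(h, s):
    # Definition 2.1 with K = L = {0, ..., m-1}
    m, total = len(s), 0
    for alpha in partitions(list(range(m))):
        mu = (-1) ** (len(alpha) - 1) * factorial(len(alpha) - 1)
        prod = 1
        for J in alpha:
            # h composed with psi_{J, L}: coordinates outside J are set to 0
            prod *= h(tuple(s[j] if j in J else 0 for j in range(m)))
        total += mu * prod
    return total

h = lambda s: 1 + s[0] + s[0] * s[1]          # a test function with h(0) = 1
s = (0.3, -0.7)
print(Lambda(h, s), h(s) - h((s[0], 0)) * h((0, s[1])))  # both give s0*s1 = -0.21
print(Lambda(h, (s[0], 0)))                   # 0, as asserted by Lemma 3.1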

For any random variable \(\mathbf {Z}\), we denote its cumulative distribution function by \(F_\mathbf {Z}\), its density function by \(f_\mathbf {Z}\) (if it exists) and its characteristic function by \(\varphi _\mathbf {Z}\).

With these definitions, we are able to state our second main result, an m-dimensional version of the Berry–Esseen inequality.

Theorem 2

Let \(m\ge 1\) and \(\mathbf {X}\) and \(\mathbf {Y}\) be m-dimensional random variables. Assume that \(F_\mathbf {Y}\) is differentiable.

Let

$$\begin{aligned} A_j&=\sup _{\mathbf {y}\in \mathbb {R}^m}\frac{\partial F_\mathbf {Y}(\mathbf {y})}{\partial y_j},\qquad B_j=\sum _{k=0}^{j}k!\genfrac \{\}{0pt}{}{j}{k},\\ C_1&=\left( \frac{32}{\pi \left( 1-(3/4)^{1/m}\right) }\right) ^{1/3},\qquad C_2=\frac{12}{\pi } \end{aligned}$$

for \(1\le j\le m\), where \(\genfrac \{\}{0pt}{}{j}{k}\) denotes a Stirling partition number (Stirling number of the second kind).

Then for every \(T>0\),

$$\begin{aligned} \sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |\le & {} \frac{2}{(2\pi )^m} \int _{||\mathbf {t} ||\le T}\left|\frac{\Lambda (\varphi _{\mathbf {X}})(\mathbf {t})- \Lambda (\varphi _{\mathbf {Y}})(\mathbf {t})}{\prod _{\ell \in L} t_\ell } \right|\,d\mathbf {t}\nonumber \\&+ \,2\sum _{\emptyset \ne J\subsetneq L}B_{m-|J |}\sup _{\mathbf {z}_J\in \mathbb {R}^J}\left|F_{\mathbf {X}_{J}}(\mathbf {z}_J)-F_{\mathbf {Y}_{J}}(\mathbf {z}_J) \right| \nonumber \\&+\,\frac{2\sum _{j=1}^m A_j}{T}(C_1+C_2) \end{aligned}$$
(2.1)

holds. Existence of \(\mathbb {E}(\mathbf {X})\) and \(\mathbb {E}(\mathbf {Y})\) is sufficient for the finiteness of the integral in (2.1).

Let us give two remarks on the distribution functions occurring in this theorem: The distribution function \(F_\mathbf {Y}\) is non-decreasing in every variable, thus \(A_j>0\) for all j. Furthermore, our general notations imply that \(F_{\mathbf {X}_J}\) is a marginal distribution of \(\mathbf {X}\).

The numbers \(B_j\) are known as “Fubini numbers” or “ordered Bell numbers”. They form the sequence A000670 in [19].
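A minimal sketch (ours, for checking purposes) computing \(B_j=\sum _{k}k!\genfrac \{\}{0pt}{}{j}{k}\) via the standard recurrence for Stirling partition numbers reproduces the beginning of A000670:

from math import factorial

def stirling2(n, k):
    # S(n, k): number of partitions of an n-element set into k parts
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def fubini(j):
    return sum(factorial(k) * stirling2(j, k) for k in range(j + 1))

print([fubini(j) for j in range(6)])   # [1, 1, 3, 13, 75, 541]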

Recursive application of (2.1) leads to the following corollary, where we no longer explicitly state the constants depending on the dimension.

Corollary 2.2

Let \(m\ge 1\) and \(\mathbf {X}\) and \(\mathbf {Y}\) be m-dimensional random variables. Assume that \(F_\mathbf {Y}\) is differentiable and let

$$\begin{aligned} A_j=\sup _{\mathbf {y}\in \mathbb {R}^m}\frac{\partial F_\mathbf {Y}(\mathbf {y})}{\partial y_j}, \qquad 1\le j\le m. \end{aligned}$$

Then

$$\begin{aligned}&\sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |\nonumber \\&\quad = O\left( \sum _{\emptyset \ne K\subseteq L}\int _{||\mathbf {t}_K ||\le T}\left|\frac{\Lambda _K(\varphi _{\mathbf {X}}\circ \chi _{K, L})(\mathbf {t}_K)-\Lambda _K(\varphi _{\mathbf {Y}}\circ \chi _{K, L})(\mathbf {t}_K)}{\prod _{k\in K} t_k} \right|\,d\mathbf {t}_K\right. \nonumber \\&\qquad \qquad \qquad \left. + \frac{\sum _{j=1}^m A_j}{T}\right) \end{aligned}$$
(2.2)

where the O-constants only depend on the dimension m.

Existence of \(\mathbb {E}(\mathbf {X})\) and \(\mathbb {E}(\mathbf {Y})\) is sufficient for the finiteness of the integrals in (2.2).

In order to explain the choice of the operator \(\Lambda \), we first state it in dimension 2:

$$\begin{aligned} \Lambda (h)(s_1, s_2) = h(s_1, s_2) - h(s_1, 0)h(0, s_2). \end{aligned}$$
(2.3)

This coincides with Sadikova’s definition. This also shows that our operator is non-linear as, e.g., \(\Lambda (s_{1}+s_{2})(s_{1},s_{2})\ne \Lambda (s_{1})(s_{1},s_{2})+\Lambda (s_{2})(s_{1},s_{2})\).

In Theorem 2, we apply \(\Lambda \) to characteristic functions; so we may restrict our attention to functions h with \(h(\varvec{0})=1\). From (2.3), we see that \(\Lambda (h)(s_1, 0) = \Lambda (h)(0, s_2)=0\), so that \(\Lambda (h)(s_1, s_2)/(s_1s_2)\) is bounded around the origin. This is essential for the boundedness of the integral in Theorem 2. In general, this property is guaranteed by our particular choice of coefficients: the \(\mu _\alpha \) for \(\alpha \in \Pi _L\) are the values \(\mu (\alpha , \{L\})\) of the Möbius function of the lattice of partitions, reflecting the underlying combinatorial structure. Weisner’s theorem (see Stanley [25, Corollary 3.9.3]) is crucial in the proof that \(\Lambda (h)(\mathbf {s})/(s_1\cdots s_m)\) is bounded around the origin (see the proof of Lemma 3.1).

The second property is that our proof of the quasi-power theorem needs estimates for the tails of the integral in Theorem 2. These estimates have to be exponentially small in every variable, which means that every variable has to occur in every summand. This is trivially fulfilled in our definition as every summand in the definition of \(\Lambda \) is formulated in terms of a partition. In contrast, Gamkrelidze [8] (and also Prakasa Rao [21]) use a linear operator L mapping h to

$$\begin{aligned} (s_1, s_2) \mapsto h(s_1, s_2) - h(s_1, 0) - h(0, s_2). \end{aligned}$$
(2.4)

When taking the difference of two characteristic functions, we may assume that \(h(0, 0)=0\), so that the first crucial property described above still holds. However, the tails are no longer exponentially small in every variable: the last summand \(h(0,s_{2})\) in (2.4) is not exponentially small in \(s_{1}\) because it is independent of \(s_{1}\) and nonzero in general, whereas the first two summands are exponentially small in \(s_{1}\) by our assumption (1.2).

For that reason, using the Berry–Esseen inequality by Gamkrelidze [8] to prove a quasi-power theorem leads to a weaker error term \(O(\phi _{n}^{-1/2}\log ^{m-1}\phi _n)\) in (1.3). It can be shown that this less precise error term necessarily appears when using Gamkrelidze’s result, by considering the example where \(\varvec{\Omega }_n\) is the 2-dimensional vector whose coordinates are a normal random variable with mean \(-1\) and variance n and a normal random variable with mean 0 and variance n. This is a consequence of the linearity of the operator L in Gamkrelidze’s result.
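This difference in tail behaviour is easy to observe numerically. In the following sketch (our illustration; the bivariate normal characteristic function with covariance matrix having 2 on the diagonal and 1 off the diagonal is merely a stand-in for the characteristic functions in the proofs), the combination (2.3) decays like \(\exp (-s_1^2)\), whereas the linear combination (2.4) stalls near \(h(0,s_2)\).

import numpy as np

# characteristic function of a centred bivariate normal distribution
# with covariance matrix [[2, 1], [1, 2]] (chosen for illustration)
h = lambda s1, s2: np.exp(-0.5 * (2 * s1**2 + 2 * s1 * s2 + 2 * s2**2))
sadikova = lambda s1, s2: h(s1, s2) - h(s1, 0) * h(0, s2)   # operator (2.3)
linear = lambda s1, s2: h(s1, s2) - h(s1, 0) - h(0, s2)     # operator (2.4)

for s1 in [1.0, 3.0, 6.0]:
    print(s1, abs(sadikova(s1, 0.5)), abs(linear(s1, 0.5)))
# the middle column decays in s1; the last tends to h(0, 0.5) = exp(-0.25)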

3 Combinatorial background of the operator \(\Lambda \)

Before we start with the proof of Theorem 2, we state and prove the property of the operator \(\Lambda \) which motivates its Definition 2.1.

Lemma 3.1

Let \(K\subsetneq L\) and \(h:\mathbb {C}^L\rightarrow \mathbb {C}\) with \(h(\varvec{0})=1\). Then

$$\begin{aligned} \Lambda (h)\circ \psi _{K, L}=0. \end{aligned}$$

Before actually proving the lemma, we recall some of the theory about the Möbius function of a partially ordered set (poset), see also Stanley [25, Section 3.7].

By the following definition, \(\Pi _L\), the set of all partitions of L, is a poset: As usual, a partition \(\alpha \in \Pi _L\) is said to be a refinement of a partition \(\alpha '\in \Pi _L\) if

$$\begin{aligned} \forall J\in \alpha :\exists J'\in \alpha ':J\subseteq J'. \end{aligned}$$

In this case, we write \(\alpha \le \alpha '\). This defines a partial order on \(\Pi _L\).

The Möbius function on \(\Pi _L\) is denoted by \(\mu \) and is defined recursively: we set \(\mu (\alpha ', \alpha ')=1\) and, for \(\alpha < \alpha '\),

$$\begin{aligned} \mu (\alpha , \alpha ')= -\sum _{\begin{array}{c} \beta \in \Pi _L\\ \alpha <\beta \le \alpha ' \end{array}} \mu (\beta , \alpha '). \end{aligned}$$

For \(\alpha \), \(\alpha '\in \Pi _L\), the infimum \(\alpha \wedge \alpha '\) of \(\alpha \) and \(\alpha '\) is given by

$$\begin{aligned} \{ J\cap J':J\in \alpha , J'\in \alpha ', J\cap J'\ne \emptyset \}. \end{aligned}$$

In fact, \(\Pi _L\) is a lattice (cf. Stanley [25, Example 3.10.4]). The greatest element is \(\{L\}\).

For \(\alpha \in \Pi _L\), we have

$$\begin{aligned} \mu (\alpha , \{L\})=(-1)^{|\alpha |-1}(|\alpha |-1)!=\mu _\alpha , \end{aligned}$$

where \(|\alpha |\) denotes the number of parts of the partition, see Stanley [25, (3.37)]. In particular, we may rewrite the definition of \(\Lambda \) (Definition 2.1) as

$$\begin{aligned} \Lambda (h):=\sum _{\alpha \in \Pi _L}\mu (\alpha , \{L\}) \prod _{J\in \alpha }h\circ \psi _{J, L}. \end{aligned}$$
(3.1)

For any \(\gamma \), \(\beta \in \Pi _L\) with \(\gamma \le \beta < \{L\}\), Weisner’s theorem (see Stanley [25, Corollary 3.9.3]) applied to the interval \([\gamma , \{L\}]\) asserts that

$$\begin{aligned} \sum _{\begin{array}{c} \alpha \in \Pi _L\\ \alpha \wedge \beta =\gamma \end{array}}\mu (\alpha , \{L\})=0. \end{aligned}$$
(3.2)
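Identity (3.2) can be verified by brute force for small m. The following sketch (ours, purely illustrative) enumerates \(\Pi _L\) for \(|L |=4\), takes for \(\beta \) the partition used in the proof of Lemma 3.1 below, and confirms that the sum vanishes for every \(\gamma \le \beta \).

from math import factorial

def partitions(elements):
    # enumerate all set partitions of a list, as lists of frozensets
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for beta in partitions(rest):
        yield [frozenset({first})] + beta
        for i, part in enumerate(beta):
            yield beta[:i] + [part | {first}] + beta[i + 1:]

def meet(alpha, beta):
    # the infimum: all nonempty pairwise intersections of parts
    return frozenset(a & b for a in alpha for b in beta if a & b)

L = [0, 1, 2, 3]
Pi_L = [frozenset(alpha) for alpha in partitions(L)]
beta = frozenset({frozenset({0, 1}), frozenset({2}), frozenset({3})})
for gamma in Pi_L:
    if meet(gamma, beta) != gamma:        # keep only gamma <= beta
        continue
    total = sum((-1) ** (len(alpha) - 1) * factorial(len(alpha) - 1)
                for alpha in Pi_L if meet(alpha, beta) == gamma)
    print(sorted(sorted(p) for p in gamma), total)   # total is always 0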

We now turn to the actual proof of the lemma.

Proof of Lemma 3.1

Consider the partition \(\beta =\{K\} \cup \{\{k\}:k\in L\setminus K\}\) of L, i.e., \(\beta \) consists of K as one part and a collection of singletons. As \(K\ne L\), we have \(\beta < \{L\}\).

By definition of \(\psi \), we have \(\psi _{J,L}\circ \psi _{K,L}=\psi _{J\cap K, L}\) for J, \(K\subseteq L\). If \(\alpha \in \Pi _L\), then

$$\begin{aligned} \prod _{J\in \alpha } h\circ \psi _{J\cap K, L}= \prod _{\begin{array}{c} J\in \alpha \wedge \beta \\ J\subseteq K \end{array}} h\circ \psi _{J, L} \end{aligned}$$

because parts \(J\in \alpha \) with \(J\cap K=\emptyset \) contribute \(h(\varvec{0})=1\). Therefore, collecting the sum (3.1) according to \(\alpha \wedge \beta \) yields

$$\begin{aligned} \Lambda (h)\circ \psi _{K, L}= & {} \sum _{\alpha \in \Pi _L}\mu (\alpha , \{L\}) \prod _{J\in \alpha }h\circ \psi _{J\cap K, L}\\= & {} \sum _{\gamma \in \Pi _L}\prod _{\begin{array}{c} J\in \gamma \\ J\subseteq K \end{array}} h\circ \psi _{J, L} \sum _{\begin{array}{c} \alpha \in \Pi _L\\ \alpha \wedge \beta =\gamma \end{array}}\mu (\alpha , \{L\}). \end{aligned}$$

As \(\gamma \le \beta <\{L\}\), the inner sum vanishes by (3.2). \(\square \)

4 Proof of the Berry–Esseen inequality

This section is devoted to the proof of the Berry–Esseen inequality, Theorem 2. It is a generalisation of Sadikova’s proof.

We start with an auxiliary one-dimensional random variable.

Lemma 4.1

Let P be the one-dimensional random variable with probability density function

$$\begin{aligned} f_P(z)=\frac{3}{8\pi }\left( \frac{\sin (z/4)}{z/4}\right) ^4. \end{aligned}$$

Then its characteristic function is

$$\begin{aligned} \varphi _P(t)= {\left\{ \begin{array}{ll} 1-6t^2+6|t |^3&{}\quad \hbox { if}\,\ 0\le |t |\le 1/2,\\ 2(1-|t |)^3&{}\quad \hbox { if}\,\ 1/2\le |t |\le 1,\\ 0&{}\quad \hbox { if}\,\ 1\le |t | \end{array}\right. } \end{aligned}$$
(4.1)

and

$$\begin{aligned} \mathbb {E}(P^2)&=12,\nonumber \\ \mathbb {E}(|P |)&\le C_2. \end{aligned}$$
(4.2)

Let \(\lambda \) be the unique positive number such that

$$\begin{aligned} \mathbb {P}(P\le \lambda ) = \mathbb {P}(P\ge -\lambda ) = \left( \frac{3}{4}\right) ^{1/m}. \end{aligned}$$

Then

$$\begin{aligned} \lambda \le C_1. \end{aligned}$$
(4.3)

Proof

The characteristic function (4.1) is mentioned in [9, Section 39]; it is computed by standard methods.

Differentiating \(\varphi _P\) twice, we see that the second moment is 12. To prove (4.2), we rewrite \(\mathbb {E}(|P |)\) as

$$\begin{aligned} \mathbb {E}(|P |) =\frac{12}{\pi } \int _0^1 \frac{\sin ^4 z}{z^3}\, dz +\frac{12}{\pi } \int _1^\infty \frac{\sin ^4 z}{z^3}\, dz. \end{aligned}$$

We use the estimates \(\sin z\le z\) and \(|\sin z |\le 1\) on the intervals [0, 1] and \([1, \infty )\), respectively. Thus

$$\begin{aligned} \mathbb {E}(|P |)\le \frac{12}{\pi }\left( \frac{1}{2} + \frac{1}{2}\right) =\frac{12}{\pi }. \end{aligned}$$

To obtain a bound for \(\lambda \), we follow Gamkrelidze [8]: we estimate the tail using \(|\sin ^4(z) |\le 1\) and get

$$\begin{aligned} 1-\left( \frac{3}{4}\right) ^{1/m}= & {} \frac{3}{8\pi }\int _{\lambda }^\infty \left( \frac{\sin (z/4)}{z/4}\right) ^4\, dz\le \frac{3}{2\pi }\int _{\lambda /4}^\infty \left( \frac{1}{z}\right) ^4\,dz \\= & {} \frac{3}{2\pi }\left( -\frac{1}{3}\right) \frac{1}{z^3}\Bigr |_{z=\lambda /4}^\infty =\frac{32}{\pi \lambda ^3}. \end{aligned}$$

This results in (4.3). \(\square \)
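The closed form (4.1) can also be confirmed numerically. The following sketch (assuming numpy and scipy are available; truncating the integral at \(z=500\) causes an error of order \(10^{-7}\)) compares the cosine transform of \(f_P\) with (4.1).

import numpy as np
from scipy.integrate import quad

def f_P(z):
    return 3 / (8 * np.pi) * (np.sin(z / 4) / (z / 4)) ** 4

def phi_P(t):
    # the closed form (4.1)
    t = abs(t)
    if t <= 0.5:
        return 1 - 6 * t**2 + 6 * t**3
    if t <= 1:
        return 2 * (1 - t) ** 3
    return 0.0

for t in [0.0, 0.3, 0.7, 1.2]:
    # f_P is even, so the characteristic function is a cosine transform
    val, _ = quad(lambda z: 2 * f_P(z) * np.cos(t * z), 1e-9, 500.0, limit=2000)
    print(t, val, phi_P(t))   # the two columns agree up to the truncation error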

In the next step, we consider tuples of random variables distributed as P. They will be used to ensure smoothness. We write \(\varvec{1}\) to denote a vector with all coordinates equal to 1.

Lemma 4.2

Let \(\mathbf {Q}=(P_1/T, \ldots , P_m/T)\) be the m-dimensional random variable where the \(P_j\) are independent random variables with the same distribution as P in Lemma 4.1 and \(T>0\) is the constant from Theorem 2.

Then \(\mathbf {Q}\) has density function and characteristic function

$$\begin{aligned} f_\mathbf {Q}(\mathbf {z})&=\prod _{j=1}^m Tf_P(Tz_j),\\ \varphi _\mathbf {Q}(\mathbf {t})&=\prod _{j=1}^m\varphi _P\left( \frac{t_j}{T}\right) , \end{aligned}$$

respectively. The characteristic function vanishes outside \([-T, T]^m\).

Furthermore,

$$\begin{aligned} \int _{\mathbf {z}\in \mathbb {R}^m} |z_j | f_\mathbf {Q}\left( \mathbf {z}+ \frac{\theta \lambda }{T}\varvec{1}\right) \,d\mathbf {z}&\le \frac{C_2+\lambda }{T}, \end{aligned}$$
(4.4)
$$\begin{aligned} \int _{\theta \mathbf {z}\le 0} f_\mathbf {Q}\left( \mathbf {z}+ \frac{\theta \lambda }{T} \varvec{1}\right) \,d\mathbf {z}&=\frac{3}{4} \end{aligned}$$
(4.5)

hold for \(\theta \in \{\pm 1\}\) and \(j\in \{1,\ldots , m\}\).

Proof

Because of independence, the density function and the characteristic function of \(\mathbf {Q}\) are the products of the density functions and the characteristic functions of the \(P_j/T\), respectively. Division by T transforms the density and characteristic functions as claimed. As \(\varphi _P(t)\) vanishes outside \([-1, 1]\) by (4.1), \(\varphi _\mathbf {Q}(\mathbf {t})\) vanishes outside \([-T, T]^m\).

By a simple translation, the integral on the left hand side of (4.4) can be seen to be equal to

$$\begin{aligned} \mathbb {E}\left( \left|Q_j-\frac{\theta \lambda }{T} \right|\right) . \end{aligned}$$

Then (4.4) is a simple consequence of \(Q_j=P_j/T\), (4.2) and the triangle inequality.

By the same translation and the definition of \(\lambda \), the integral on the left hand side of (4.5) is

$$\begin{aligned} \mathbb {P}\left( \theta \mathbf {Q}\le \frac{\lambda }{T}\varvec{1}\right) =\prod _{j=1}^m \mathbb {P}(\theta P_j\le \lambda )=\frac{3}{4}. \end{aligned}$$

\(\square \)

From now on, we let \(\mathbf {Q}\) be as in Lemma 4.2 and let \(\mathbf {Q}\) be independent of \(\mathbf {X}\) and \(\mathbf {Y}\). We first prove an inequality relating the difference between the distribution functions of \(\mathbf {X}\) and \(\mathbf {Y}\) to that of the distribution functions of \(\mathbf {X}+\mathbf {Q}\) and \(\mathbf {Y}+\mathbf {Q}\).

Lemma 4.3

We have

$$\begin{aligned} \sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}+\mathbf {Q}}(\mathbf {z})-F_{\mathbf {Y}+\mathbf {Q}}(\mathbf {z}) |\le & {} \sup _{\mathbf {z}\in \mathbb {R}^m}|F_\mathbf {X}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |\nonumber \\\le & {} 2\sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}+\mathbf {Q}}(\mathbf {z})-F_{\mathbf {Y}+\mathbf {Q}}(\mathbf {z}) |\nonumber \\&+\frac{2\sum _{j=1}^m A_j}{T}(C_1+C_2). \end{aligned}$$
(4.6)

Proof

Let

$$\begin{aligned}&S=\sup _{\mathbf {z}\in \mathbb {R}^m}|F_\mathbf {X}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |\\&S'=\sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}+\mathbf {Q}}(\mathbf {z})-F_{\mathbf {Y}+\mathbf {Q}}(\mathbf {z}) | \end{aligned}$$

and \(\varepsilon >0\). We choose \(\theta \in \{\pm 1\}\) such that \(S=\sup _{\mathbf {z}\in \mathbb {R}^m}\theta (F_{\mathbf {X}}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}))\).

There is a \(\mathbf {z}_{\varepsilon }\in \mathbb {R}^m\) such that

$$\begin{aligned} S-\varepsilon \le \theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_\varepsilon ). \end{aligned}$$

Let \(\mathbf {w}\in \mathbb {R}^m\) with \(\theta \mathbf {w}\le \varvec{0}\). By monotonicity of \(F_{\mathbf {X}}\), we have \(\theta F_{\mathbf {X}}(\mathbf {z}_\varepsilon -\mathbf {w})\ge \theta F_{\mathbf {X}}(\mathbf {z}_\varepsilon )\). Thus

$$\begin{aligned} \theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_\varepsilon -\mathbf {w})&\ge \theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_{\varepsilon })-\theta (F_{\mathbf {Y}}(\mathbf {z}_\varepsilon -\mathbf {w})-F_{\mathbf {Y}}(\mathbf {z}_\varepsilon ))\\&\ge S-\varepsilon - \sum _{j=1}^m A_j |w_j |. \end{aligned}$$

We multiply this inequality by \(f_{\mathbf {Q}}\left( \mathbf {w}+\frac{\theta \lambda }{T}\varvec{1}\right) \) and integrate over all \(\mathbf {w}\in \mathbb {R}^m\) with \(\theta \mathbf {w}\le \varvec{0}\). By (4.5) and (4.4), we get

$$\begin{aligned} I_1:= & {} \int _{\theta \mathbf {w}\le \varvec{0}}\theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_\varepsilon -\mathbf {w})f_{\mathbf {Q}}\left( \mathbf {w}+\frac{\theta \lambda }{T}\varvec{1}\right) \,d\mathbf {w}\ge \frac{3}{4}(S-\varepsilon )\nonumber \\&-\,\frac{C_2+\lambda }{T}\sum _{j=1}^m A_j. \end{aligned}$$
(4.7)

Setting

$$\begin{aligned} I_2:=\int _{\theta \mathbf {w}\nleq \varvec{0}}\theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_\varepsilon -\mathbf {w})f_{\mathbf {Q}}\left( \mathbf {w}+\frac{\theta \lambda }{T}\varvec{1}\right) \,d\mathbf {w}\end{aligned}$$

and using the estimate \(|\theta (F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}_\varepsilon -\mathbf {w}) |\le S\) yields

$$\begin{aligned} |I_2 |\le S \int _{\theta \mathbf {w}\nleq \varvec{0}}f_{\mathbf {Q}}\left( \mathbf {w}+\frac{\theta \lambda }{T}\varvec{1}\right) \,d\mathbf {w}=\frac{S}{4} \end{aligned}$$
(4.8)

by (4.5) and the fact that \(f_{\mathbf {Q}}\) is a probability density function.

Combining (4.7) and (4.8) yields

$$\begin{aligned} |I_1+I_2 |\ge |I_1 |-|I_2 |\ge I_1-|I_2 |\ge \frac{S}{2}-\frac{C_2+\lambda }{T}\sum _{j=1}^m A_j-\frac{3\varepsilon }{4}. \end{aligned}$$
(4.9)

As the sum of random variables corresponds to a convolution, we have

$$\begin{aligned} (F_{\mathbf {X}+\mathbf {Q}}-F_{\mathbf {Y}+\mathbf {Q}})(\mathbf {z}) = \int _{\mathbb {R}^m}(F_{\mathbf {X}}-F_{\mathbf {Y}})(\mathbf {z}-\mathbf {w}) f_{\mathbf {Q}}(\mathbf {w})\,d\mathbf {w}. \end{aligned}$$
(4.10)

Replacing \(\mathbf {z}\) and \(\mathbf {w}\) by \(\mathbf {z}_{\varepsilon }+\frac{\theta \lambda }{T}\varvec{1}\) and \(\mathbf {w}+\frac{\theta \lambda }{T}\varvec{1}\), respectively, and using (4.9) leads to

$$\begin{aligned} S'\ge \left|(F_{\mathbf {X}+\mathbf {Q}}-F_{\mathbf {Y}+\mathbf {Q}})\left( \mathbf {z}_\varepsilon +\frac{\theta \lambda }{T}\varvec{1}\right) \right|=|I_1+I_2 |\ge \frac{S}{2}-\frac{C_2+\lambda }{T}\sum _{j=1}^m A_j-\frac{3\varepsilon }{4} \end{aligned}$$

for all \(\varepsilon >0\). Taking the limit for \(\varepsilon \rightarrow 0\) and rearranging yields the right hand side of (4.6).

The left hand side of (4.6) is an immediate consequence of (4.10). \(\square \)

We are now able to bound the difference of the distribution functions by their characteristic functions.

Lemma 4.4

We have

$$\begin{aligned}&\sup _{\mathbf {z}\in \mathbb {R}^m}\left|\sum _{\alpha \in \Pi _L}\mu _\alpha \left( \prod _{J\in \alpha } F_{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {z}_J)-\prod _{J\in \alpha }F_{\mathbf {Y}_J+\mathbf {Q}_J}(\mathbf {z}_J)\right) \right|\nonumber \\&\quad \le \frac{1}{(2\pi )^m}\int _{||\mathbf {t} ||\le T}\left|\frac{\Lambda (\varphi _{\mathbf {X}})(\mathbf {t})-\Lambda (\varphi _{\mathbf {Y}})(\mathbf {t})}{\prod _{\ell \in L} t_\ell } \right|\,d\mathbf {t}. \end{aligned}$$
(4.11)

Proof

Let \(\mathbf {a}\), \(\mathbf {z}\in \mathbb {R}^m\) with \(\mathbf {a}\le \mathbf {z}\).

The random variable \(\mathbf {X}_J+\mathbf {Q}_J\) admits a density function, because \(\mathbf {Q}_J\) admits a density function. In particular, \(\mathbf {X}_J+\mathbf {Q}_J\) is a continuous random variable. By Lévy’s theorem (see, e.g., [26, Thm. 1.8.4]),

$$\begin{aligned}&\mathbb {P}(\mathbf {a}_J \le \mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J)\\&\quad =\frac{1}{(2\pi )^{|J |}}\lim _{\begin{array}{c} T_j\rightarrow \infty \\ j\in J \end{array}}\int \limits _{\begin{array}{c} -T_j\le t_j\le T_j\\ j\in J \end{array}} \varphi _{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {t}_J)\prod _{j\in J}\frac{e^{-it_jz_j}-e^{-it_ja_j}}{-it_j}\, d\mathbf {t}_J. \end{aligned}$$

As \(\varphi _{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {t}_J)=\varphi _{\mathbf {X}_J}(\mathbf {t}_J)\varphi _{\mathbf {Q}_J}(\mathbf {t}_J)\) and \(\varphi _{\mathbf {Q}_J}(\mathbf {t}_J)\) vanishes outside \([-T, T]^J\) by Lemma 4.2, we can replace the limit \(T_j\rightarrow \infty \) by setting \(T_j=T\), i.e.,

$$\begin{aligned}&\mathbb {P}(\mathbf {a}_J \le \mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J)\\&\quad =\frac{i^{|J |}}{(2\pi )^{|J |}}\int _{||\mathbf {t}_J ||\le T} \varphi _{\mathbf {X}_J}(\mathbf {t}_J)\varphi _{\mathbf {Q}_J}(\mathbf {t}_J)\prod _{j\in J}\frac{e^{-it_jz_j}-e^{-it_ja_j}}{t_j}\, d\mathbf {t}_J. \end{aligned}$$

Taking the product over all \(J\in \alpha \) and summing over \(\alpha \in \Pi _L\) yields

$$\begin{aligned}&\sum _{\alpha \in \Pi _L}\mu _\alpha \prod _{J\in \alpha }\mathbb {P}(\mathbf {a}_J \le \mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J) \nonumber \\&\quad = \frac{i^m}{(2\pi )^m}\int _{||\mathbf {t} ||\le T} \varphi _{\mathbf {Q}}(\mathbf {t}) \prod _{\ell \in L}\frac{e^{-it_\ell z_\ell }-e^{-it_\ell a_\ell }}{t_\ell }\sum _{\alpha \in \Pi _L}\mu _\alpha \prod _{J\in \alpha }\varphi _{\mathbf {X}_J}(\mathbf {t}_J) \, d\mathbf {t}\qquad \qquad \end{aligned}$$
(4.12)

where Fubini’s theorem and the fact that \(\varphi _\mathbf {Q}(\mathbf {t})=\prod _{J\in \alpha }\varphi _{\mathbf {Q}_J}(\mathbf {t}_J)\) have been used. By definition of \(\varphi _{\mathbf {X}}\), we have \(\varphi _{\mathbf {X}_J}(\mathbf {t}_J)=\varphi _{\mathbf {X}}(\psi _{J, L}(\mathbf {t}))\). Therefore, we can use the definition of \(\Lambda (\varphi _\mathbf {X})\) to rewrite (4.12) as

$$\begin{aligned}&\sum _{\alpha \in \Pi _L}\mu _\alpha \prod _{J\in \alpha }\mathbb {P}(\mathbf {a}_J \le \mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J) \\&\quad = \frac{i^m}{(2\pi )^m}\int _{||\mathbf {t} ||\le T} \frac{\Lambda (\varphi _\mathbf {X})(\mathbf {t})}{\prod _{\ell \in L}t_\ell }\varphi _{\mathbf {Q}}(\mathbf {t}) \prod _{\ell \in L}(e^{-it_\ell z_\ell }-e^{-it_\ell a_\ell }) \, d\mathbf {t}. \end{aligned}$$

This equation remains valid when replacing \(\mathbf {X}\) by \(\mathbf {Y}\); taking the difference results in

$$\begin{aligned}&\sum _{\alpha \in \Pi _L}\mu _\alpha \left( \prod _{J\in \alpha }\mathbb {P}(\mathbf {a}_J \le \mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J)- \prod _{J\in \alpha }\mathbb {P}(\mathbf {a}_J \le \mathbf {Y}_J+\mathbf {Q}_J\le \mathbf {z}_J)\right) \nonumber \\&\quad = \frac{i^m}{(2\pi )^m}\int _{||\mathbf {t} ||\le T} \frac{\Lambda (\varphi _\mathbf {X})(\mathbf {t})-\Lambda (\varphi _\mathbf {Y})(\mathbf {t})}{\prod _{\ell \in L}t_\ell }\varphi _{\mathbf {Q}}(\mathbf {t}) \prod _{\ell \in L}(e^{-it_\ell z_\ell }-e^{-it_\ell a_\ell }) \, d\mathbf {t}.\nonumber \\ \end{aligned}$$
(4.13)

If the integral on the right hand side of (4.11) is infinite, there is nothing to show. Thus we may assume that it is finite. This also implies that

$$\begin{aligned} \frac{\Lambda (\varphi _\mathbf {X})(\mathbf {t})-\Lambda (\varphi _\mathbf {Y})(\mathbf {t})}{\prod _{\ell \in L}t_\ell }\varphi _{\mathbf {Q}}(\mathbf {t}) \end{aligned}$$

is an integrable function on \(\mathbb {R}^m\) (as it vanishes outside \([-T, T]^m\)). Then by the Riemann–Lebesgue lemma, we may take the limit \(a_\ell \rightarrow -\infty \) for all \(\ell \in L\) in (4.13) to obtain

$$\begin{aligned}&\sum _{\alpha \in \Pi _L}\mu _\alpha \left( \prod _{J\in \alpha }\mathbb {P}(\mathbf {X}_J+\mathbf {Q}_J\le \mathbf {z}_J)- \prod _{J\in \alpha }\mathbb {P}(\mathbf {Y}_J+\mathbf {Q}_J\le \mathbf {z}_J)\right) \\&\quad = \frac{i^m}{(2\pi )^m}\int _{||\mathbf {t} ||\le T} \frac{\Lambda (\varphi _\mathbf {X})(\mathbf {t})-\Lambda (\varphi _\mathbf {Y})(\mathbf {t})}{\prod _{\ell \in L}t_\ell }\varphi _{\mathbf {Q}}(\mathbf {t}) e^{-i\langle \mathbf {t}, \mathbf {z}\rangle } \, d\mathbf {t}. \end{aligned}$$

Taking absolute values and rewriting the left hand side in terms of marginal distribution functions yields (4.11). \(\square \)

We now bound the contribution of the lower dimensional distributions.

Lemma 4.5

We have

$$\begin{aligned}&\sup _{\mathbf {z}\in \mathbb {R}^m}\Biggl \vert \sum _{\begin{array}{c} \alpha \in \Pi _L\\ \alpha \ne \{L\} \end{array}}\mu _\alpha \Biggl (\prod _{J\in \alpha } F_{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {z}_J)-\prod _{J\in \alpha }F_{\mathbf {Y}_J+\mathbf {Q}_J}(\mathbf {z}_J)\Biggr )\Biggr \vert \\&\quad \le \sum _{\emptyset \ne J\subsetneq L}B_{m-|J |}\sup _{\mathbf {z}\in \mathbb {R}^J}\left|F_{\mathbf {X}_{J}}(\mathbf {z})-F_{\mathbf {Y}_{J}}(\mathbf {z}) \right| . \end{aligned}$$

Proof

Let \(\alpha =\{J_1,\ldots , J_r\}\in \Pi _L\). Then

$$\begin{aligned}&\left|\prod _{J\in \alpha } F_{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {z}_J) - \prod _{J\in \alpha } F_{\mathbf {Y}_J+\mathbf {Q}_J}(\mathbf {z}_J) \right| \\&\quad =\biggl |\sum _{k=1}^r \biggl (\prod _{j=1}^k F_{\mathbf {X}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j})\prod _{j={k+1}}^r F_{\mathbf {Y}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j}) \\&\qquad -\, \prod _{j=1}^{k-1} F_{\mathbf {X}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j})\prod _{j={k}}^r F_{\mathbf {Y}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j})\biggr )\biggr |\\&\quad = \biggl |\sum _{k=1}^r \prod _{j=1}^{k-1} F_{\mathbf {X}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j})\prod _{j={k+1}}^r F_{\mathbf {Y}_{J_j}+\mathbf {Q}_{J_j}}(\mathbf {z}_{J_j}) \\&\qquad \times \bigl (F_{\mathbf {X}_{J_k}+\mathbf {Q}_{J_k}}(\mathbf {z}_{J_k})-F_{\mathbf {Y}_{J_k}+\mathbf {Q}_{J_k}}(\mathbf {z}_{J_k})\bigr )\biggr |\\&\quad \le \sum _{J\in \alpha }\left|F_{\mathbf {X}_{J}+\mathbf {Q}_{J}}(\mathbf {z}_J)-F_{\mathbf {Y}_{J}+\mathbf {Q}_{J}}(\mathbf {z}_J) \right| \end{aligned}$$

because the products over the distribution functions are bounded by 1.

Therefore,

$$\begin{aligned}&\Biggl \vert \sum _{\begin{array}{c} \alpha \in \Pi _L\\ \alpha \ne \{L\} \end{array}}\mu _\alpha \Biggl (\prod _{J\in \alpha } F_{\mathbf {X}_J+\mathbf {Q}_J}(\mathbf {z}_J)-\prod _{J\in \alpha }F_{\mathbf {Y}_J+\mathbf {Q}_J}(\mathbf {z}_J)\Biggr )\Biggr \vert \\&\quad \le \sum _{\emptyset \ne J\subsetneq L}\left|F_{\mathbf {X}_{J}+\mathbf {Q}_{J}}(\mathbf {z}_J)-F_{\mathbf {Y}_{J}+\mathbf {Q}_{J}}(\mathbf {z}_J) \right| \sum _{\begin{array}{c} \alpha \in \Pi _{L}\\ J\in \alpha \end{array}}|\mu _\alpha |. \end{aligned}$$

A partition \(\alpha \in \Pi _L\) with \(J\in \alpha \) can be uniquely written as \(\alpha =\{J\}\cup \beta \) for a \(\beta \in \Pi _{L\setminus J}\), and then \(|\mu _\alpha |=|\beta |!\). Thus

$$\begin{aligned} \sum _{\begin{array}{c} \alpha \in \Pi _{L}\\ J\in \alpha \end{array}}|\mu _\alpha |=\sum _{\beta \in \Pi _{L\setminus J}}|\beta |!=\sum _{k=0}^{m-|J |}k!\genfrac \{\}{0pt}{}{m-|J |}{k}=B_{m-|J |} \end{aligned}$$

because there are \(\genfrac \{\}{0pt}{}{m-|J |}{k}\) partitions of \(L\setminus J\) with k parts. Using the left hand side of (4.6) yields the assertion (more precisely, applying a version of the left hand side of (4.6) for marginal distributions). \(\square \)

Now, we can complete the proof of the theorem.

Proof of Theorem 2

The estimate (2.1) follows from Lemma 4.3 [more precisely, the right hand side of (4.6)], Lemma 4.4 and Lemma 4.5.

If the expectation of \(\mathbf {X}\) exists, \(\varphi _{\mathbf {X}}\) is differentiable. Therefore, \(\Lambda (\varphi _\mathbf {X})\) is differentiable, too. By Lemma 3.1, \(\Lambda (\varphi _\mathbf {X})(\mathbf {t})\) has a zero whenever one of the \(t_\ell \), \(\ell \in L\), vanishes. Thus

$$\begin{aligned} \frac{\Lambda (\varphi _\mathbf {X})(\mathbf {t})}{\prod _{\ell \in L}t_\ell } \end{aligned}$$

is bounded around \(\varvec{0}\) and therefore bounded on \([-T, T]^m\). The same holds for \(\mathbf {Y}\). Thus the integral on the right hand side of (2.1) converges. \(\square \)

5 Proof of the quasi-power theorem

We may now prove the m-dimensional quasi-power theorem, Theorem 1.

Let \(\varvec{\mu }_n=\phi _{n}{{\mathrm{grad}}}u(\varvec{0}) \) and \(\Sigma =H_u(\varvec{0})\). We define the random vector \(\mathbf {X}=\phi _{n}^{-1/2}(\varvec{\Omega }_n-\varvec{\mu }_n)\). For simplicity, we suppress the dependence on n in this and the following notation.

First, we establish bounds for the characteristic function of \(\mathbf {X}\).

Lemma 5.1

For \(\Sigma \) regular or singular, there exists a function \(V(\mathbf {s})\) which is analytic for \(||\mathbf {s} ||< \tau \sqrt{\phi _{n}}/2\) such that

$$\begin{aligned} \varphi _{\mathbf {X}}(\mathbf {s})=\exp \left( -\frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}+V(\mathbf {s})\right) \end{aligned}$$

and

$$\begin{aligned} V(\mathbf {s})=O\left( \frac{||\mathbf {s} ||^3+||\mathbf {s} ||}{\sqrt{\phi _{n}}}\right) \end{aligned}$$
(5.1)

hold for all \(\mathbf {s}\in \mathbb {C}^m\) with \(||\mathbf {s} ||< \tau \sqrt{\phi _{n}}/2\).

For \(n\rightarrow \infty \), \(\mathbf {X}\) converges in distribution to a normal distribution with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma \). In particular, \(\Sigma \) is positive (semi-)definite if it is regular (singular, respectively).

Proof

By replacing \(u(\mathbf {s})\) and \(v(\mathbf {s})\) by \(u(\mathbf {s})-u(\varvec{0})\) and \(v(\mathbf {s})-v(\varvec{0})\), respectively, we may assume that \(u(\varvec{0})=v(\varvec{0})=0\). We define \(E_n(\mathbf {s})\) by the relation \(M_n(\mathbf {s})=e^{W_n(\mathbf {s})}(1+E_n(\mathbf {s}))\) and note that by assumption, \(E_n(\mathbf {s})=O(\kappa _n^{-1})\) uniformly for \(\Vert \mathbf {s}\Vert \le \tau \). As \(M_n(\varvec{0})=1\) and \(W_n(\varvec{0})=0\), this implies \(E_n(\varvec{0})=0\).

By assumption, \(M_n(\mathbf {s})\) exists for \(||\mathbf {s} ||\le \tau \). Therefore, it is continuous for these \(\mathbf {s}\) and, by Morera’s theorem combined with applications of Fubini’s and Cauchy’s theorems, \(M_n(\mathbf {s})\) is analytic for \(||\mathbf {s} ||\le \tau \). This also implies that \(E_n(\mathbf {s})\) is analytic for \(||\mathbf {s} ||\le \tau \). By Cauchy’s formula, we have

$$\begin{aligned} \frac{\partial E_n(\mathbf {s})}{\partial s_j}=\frac{1}{2\pi i}\oint _{|\zeta _j |=\tau }\frac{E_n(s_1,\ldots , s_{j-1}, \zeta _j, s_{j+1}, \ldots , s_m)}{(\zeta _j-s_j)^2}\,d\zeta _j=O\left( \frac{1}{\kappa _n}\right) \end{aligned}$$

for \(||\mathbf {s} ||<\tau /2\). Thus

$$\begin{aligned} E_n(\mathbf {s})=\int _{[\varvec{0}, \mathbf {s}]}\langle {{\mathrm{grad}}}E_n(\mathbf {t}), d\mathbf {t}\rangle =O\left( \frac{||\mathbf {s} ||}{\kappa _n}\right) \end{aligned}$$

for \(||\mathbf {s} ||<\tau /2\) (where \([\varvec{0}, \mathbf {s}]\) denotes the line segment from \(\varvec{0}\) to \(\mathbf {s}\)).

We calculate that

$$\begin{aligned} \varphi _{\mathbf {X}}(\mathbf {s})&=M_n\left( i\phi _{n}^{-1/2}\mathbf {s}\right) \exp \left( -i\phi _{n}^{-1/2}\langle \varvec{\mu }_n, \mathbf {s}\rangle \right) \\&=\exp \left( -\frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}+ V(\mathbf {s})\right) \end{aligned}$$

with

$$\begin{aligned} V(\mathbf {s})= & {} u(i\phi _{n}^{-1/2}\mathbf {s})\phi _{n}+v(i\phi _{n}^{-1/2}\mathbf {s})-i\phi _{n}^{-1/2}\langle \varvec{\mu }_n, \mathbf {s}\rangle \\&+ \frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}+\log (1+E_n(i\phi _{n}^{-1/2}\mathbf {s})). \end{aligned}$$

Since \(u(\varvec{0})=v(\varvec{0})=0\), the first order term of u cancels against \(-i\phi _{n}^{-1/2}\langle \varvec{\mu }_n, \mathbf {s}\rangle \) and its second order term cancels against \(\frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}\); the remaining terms, including \(\log (1+E_n(i\phi _{n}^{-1/2}\mathbf {s}))=O(\Vert \mathbf {s}\Vert \phi _{n}^{-1/2}\kappa _{n}^{-1})\), yield

$$\begin{aligned} V(\mathbf {s})=O\left( \frac{\Vert \mathbf {s}\Vert ^3+\Vert \mathbf {s}\Vert }{\sqrt{\phi _{n}}}\right) \end{aligned}$$

for \(\Vert \mathbf {s}\Vert <\tau \sqrt{\phi _{n}}/2\).

Note that

$$\begin{aligned} \lim _{n\rightarrow \infty } \varphi _\mathbf {X}(\mathbf {s})=\exp \left( -\frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}\right) \end{aligned}$$

for \(\mathbf {s}\in \mathbb {C}^m\), which implies that, in distribution, \(\mathbf {X}\) converges to the normal distribution with mean zero and variance-covariance matrix \(\Sigma \). Although we still have to refine our estimates in order to apply Theorem 2, we immediately conclude that \(\Sigma \) is positive definite or positive semi-definite depending on whether it is regular or singular. \(\square \)

Let now \(\Sigma \) be regular. By \(\mathbf {Y}\) we denote a normally distributed random variable in \(\mathbb {R}^m\) with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma \). Its characteristic function is

$$\begin{aligned} \varphi _\mathbf {Y}(\mathbf {s})=\exp \left( -\frac{1}{2} \mathbf {s}^\top \Sigma \mathbf {s}\right) . \end{aligned}$$

The smallest eigenvalue of \(\Sigma \) is denoted by \(\sigma >0\).

We are now able to bound the functions occurring in the Berry–Esseen inequality.

Lemma 5.2

There exists a \(c<\tau /2\) such that

$$\begin{aligned} |\Lambda (\varphi _\mathbf {X})(\mathbf {s})-\Lambda (\varphi _\mathbf {Y})(\mathbf {s}) |\le \exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2 + O(||\mathbf {s} ||)\right) O\left( \frac{||\mathbf {s} ||^3+||\mathbf {s} ||}{\sqrt{\phi _{n}}}\right) \end{aligned}$$

holds for all \(\mathbf {s}\in \mathbb {C}^L\) with \(||\mathbf {s} ||\le c\sqrt{\phi _{n}}\) and \(||\mathfrak {I}\mathbf {s} ||\le 1\).

Proof

Let \(\alpha \in \Pi _L\). Then by Lemma 5.1, we have

$$\begin{aligned}&\left|\prod _{J\in \alpha } (\varphi _\mathbf {X}\circ \psi _{J,L})(\mathbf {s}) - \prod _{J\in \alpha } (\varphi _\mathbf {Y}\circ \psi _{J,L})(\mathbf {s}) \right|\nonumber \\&\quad =\exp \left( -\frac{1}{2}\mathfrak {R}\sum _{J\in \alpha } \psi _{J,L}(\mathbf {s})^\top \Sigma \psi _{J,L}(\mathbf {s}) \right) \left|\exp \left( \sum _{J\in \alpha } V(\psi _{J,L}(\mathbf {s}))\right) -1 \right|.\nonumber \\ \end{aligned}$$
(5.2)

For \(\mathbf {t}\in \mathbb {R}^L\), we have \(\mathbf {t}^\top \Sigma \mathbf {t}\ge \sigma \mathbf {t}^\top \mathbf {t}\ge \sigma ||\mathbf {t} ||^2\). For complex w, we have \(|\exp (w)-1|\le |w|\exp (|w|)\). Splitting \(\mathbf {s}\) into its real and imaginary parts in the first factor and using these inequalities for the first and second factor of (5.2), respectively, yields

$$\begin{aligned}&\left|\prod _{J\in \alpha } (\varphi _\mathbf {X}\circ \psi _{J,L})(\mathbf {s}) - \prod _{J\in \alpha } (\varphi _\mathbf {Y}\circ \psi _{J,L})(\mathbf {s}) \right|\\&\quad \le \exp \left( -\frac{\sigma }{2}||\mathbf {s} ||^2 + O\left( ||\mathbf {s} ||+\frac{||\mathbf {s} ||^3+ ||\mathbf {s} ||}{\sqrt{\phi _{n}}}\right) \right) O\left( \frac{||\mathbf {s} ||^3+ ||\mathbf {s} ||}{\sqrt{\phi _{n}}}\right) \end{aligned}$$

by (5.1). For sufficiently small c, we obtain

$$\begin{aligned}&\left|\prod _{J\in \alpha } (\varphi _\mathbf {X}\circ \psi _{J,L})(\mathbf {s}) - \prod _{J\in \alpha } (\varphi _\mathbf {Y}\circ \psi _{J,L})(\mathbf {s}) \right| \\&\quad \le \exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2 + O(||\mathbf {s} ||)\right) O\left( \frac{||\mathbf {s} ||^3+ ||\mathbf {s} ||}{\sqrt{\phi _{n}}}\right) . \end{aligned}$$

Multiplication by \(|\mu _\alpha |\) and summation over all \(\alpha \in \Pi _L\) conclude the proof of the lemma. \(\square \)

The last ingredient to prove the quasi-power theorem is a bound of the integrals occurring in the Berry–Esseen inequality.

Lemma 5.3

Let c be as in Lemma 5.2. Then

$$\begin{aligned} \int _{||\mathbf {s} ||\le c\sqrt{\phi _{n}}}\left|\frac{\Lambda (\varphi _{\mathbf {X}})(\mathbf {s})-\Lambda (\varphi _\mathbf {Y})(\mathbf {s})}{\prod _{\ell \in L}s_\ell } \right|\, d\mathbf {s}= O\left( \frac{1}{\sqrt{\phi _{n}}}\right) . \end{aligned}$$

Proof

For simplicity, set \(h=\Lambda (\varphi _{\mathbf {X}})-\Lambda (\varphi _\mathbf {Y})\). For a partition \(\{J,K\}\) of L, set

$$\begin{aligned} \mathcal {S}(J, K)=\left\{ \mathbf {s}\in \mathbb {R}^L:|s_j |\le 1\text { for }j\in J,\ 1\le |s_k |\le c\sqrt{\phi _{n}}\text { for }k\in K\right\} \end{aligned}$$

and partition \(\mathbf {s}\) into \((\mathbf {s}_{J},\mathbf {s}_{K})\). We use the notation

$$\begin{aligned} D^{J} = \frac{\partial ^{|J |}}{\partial z_{j_1}\cdots \partial z_{j_{|J |}}} \end{aligned}$$

when \(J=\{j_1,\ldots , j_{|J |}\}\). The product of the paths from 0 to \(s_j\) for \(j\in J\) is denoted by \([\varvec{0}, \mathbf {s}_{J}]\).

By Lemma 3.1, we have

$$\begin{aligned} h(\mathbf {s})=\int _{[\varvec{0},\mathbf {s}_{J}]} D^{J}(h(\mathbf {z}_{J}, \mathbf {s}_{K}))\,d\mathbf {z}_{J}. \end{aligned}$$
(5.3)

By Cauchy’s integral formula, we have

$$\begin{aligned} D^{J}(h(\mathbf {z}_{J}, \mathbf {s}_{K})) = \frac{1}{(2\pi i)^{|J |}}\oint _{\varvec{\zeta }_{J}}\frac{h(\varvec{\zeta }_{J}, \mathbf {s}_{K})}{\prod _{j\in J}(\zeta _j-z_j)^2}\,d\varvec{\zeta }_{J} \end{aligned}$$
(5.4)

where \(\zeta _j\) is integrated over the circle of radius 1 around \(z_j\) for \(j\in J\), thus \(||\mathfrak {I}\varvec{\zeta }_{J} ||\le 1\).

Using the estimate of Lemma 5.2 yields

$$\begin{aligned} \begin{aligned} |h(\varvec{\zeta }_{J}, \mathbf {s}_{K}) |&=\exp \left( -\frac{\sigma }{4}||(\varvec{\zeta }_{J}, \mathbf {s}_{K}) ||^2+O(||(\varvec{\zeta }_{J}, \mathbf {s}_{K}) ||)\right) \\&\quad \times O\left( \frac{||(\varvec{\zeta }_{J}, \mathbf {s}_{K}) ||^3+||(\varvec{\zeta }_{J}, \mathbf {s}_{K}) ||}{\sqrt{\phi _{n}}}\right) \\&=\exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+O(||\mathbf {s} ||+1)\right) O\left( \frac{||\mathbf {s} ||^3+1}{\sqrt{\phi _{n}}}\right) . \end{aligned} \end{aligned}$$
(5.5)

Combining (5.3), (5.4) and (5.5) leads to

$$\begin{aligned}&\int _{\mathcal {S}(J, K)}\left|\frac{h(\mathbf {s})}{\prod _{\ell \in L}s_\ell } \right|\, d\mathbf {s}\\&\quad =O\left( \frac{1}{\sqrt{\phi _{n}}}\int _{\mathcal {S}(J, K)}\frac{1}{\prod _{\ell \in L}|s_\ell |}\right. \\&\qquad \qquad \quad \left. \left|\int _{[\varvec{0}, \mathbf {s}_{J}]}\exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+ O(||\mathbf {s} ||+1)\right) (||\mathbf {s} ||^3+1) \, d\mathbf {z}_{J} \right|d\mathbf {s}\right) \\&\quad =O\left( \frac{1}{\sqrt{\phi _{n}}}\int _{\mathcal {S}(J, K)}\frac{1}{\prod _{\ell \in L}|s_\ell |}\exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+ O(||\mathbf {s} ||+1)\right) (||\mathbf {s} ||^3+1)\right. \\&\qquad \qquad \quad \left. \times \left|\int _{[\varvec{0}, \mathbf {s}_{J}]} \, d\mathbf {z}_{J} \right|d\mathbf {s}\right) . \end{aligned}$$

The inner integral results in \(|\prod _{j\in J}s_j |\). The factors \(|s_k |\ge 1\) for \(k\in K\) in the denominator can simply be omitted. If \(K\ne \emptyset \), we still have to bound

$$\begin{aligned}&\int _{\mathcal {S}(J, K)}\exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+ O(||\mathbf {s} ||+1)\right) (||\mathbf {s} ||^3+1)\,d\mathbf {s}\\&\quad =\sum _{k\in K} \int _{\begin{array}{c} \mathcal {S}(J, K)\\ ||\mathbf {s} ||=|s_k | \end{array}} \exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+ O(||\mathbf {s} ||+1)\right) (||\mathbf {s} ||^3+1)\,d\mathbf {s}\\&\quad =\sum _{k\in K} \int _{\begin{array}{c} \mathcal {S}(J, K)\\ ||\mathbf {s} ||=|s_k | \end{array}} \exp \left( -\frac{\sigma }{4}|s_k |^2+ O(|s_k |+1)\right) (|s_k |^3+1)\,d\mathbf {s}\\&\quad =\sum _{k\in K} \int _{1\le |s_k |\le c\sqrt{\phi _{n}}} \exp \left( -\frac{\sigma }{4}|s_k |^2+ O(|s_k |)\right) (|s_k |^3+1)\\&\qquad \times \int _{\varvec{1}\le |\mathbf {s}_{K\setminus \{k\}} |\le |s_k |\varvec{1}} \int _{|\mathbf {s}_J |\le \varvec{1}} \,d\mathbf {s}_J d\mathbf {s}_{K\setminus \{k\}}ds_k\\&\quad =2^{|L |-1}\sum _{k\in K} \int _{1\le |s_k |\le c\sqrt{\phi _{n}}} \exp \left( -\frac{\sigma }{4}|s_k |^2+ O(|s_k |)\right) (|s_k |^3+1)|s_k |^{|K |-1}\,ds_k \end{aligned}$$

where the integration bounds are meant coordinate-wise. Then we use the fact that

$$\begin{aligned} \int _{x\in \mathbb {R}} \exp \left( -\frac{\sigma }{4} x^2\right) |x |^t\,dx \end{aligned}$$

is finite for all constants \(t\ge 0\). Thus, after completing the square in the argument of the exponential function, the integral over \(s_k\) is bounded by a constant, i.e.,

$$\begin{aligned} \int _{\mathcal {S}(J, K)}\exp \left( -\frac{\sigma }{4}||\mathbf {s} ||^2+ O(||\mathbf {s} ||+1)\right) (||\mathbf {s} ||^3+1)\,d\mathbf {s}=O(1). \end{aligned}$$

We conclude that

$$\begin{aligned} \int _{\mathcal {S}(J, K)}\left|\frac{h(\mathbf {s})}{\prod _{\ell \in L}s_\ell } \right|\, d\mathbf {s}=O\left( \frac{1}{\sqrt{\phi _{n}}}\right) . \end{aligned}$$

Summation over all partitions \(\{J,K\}\) of L completes the proof of the lemma. \(\square \)

We now collect all results to prove Theorem 1.

Proof of Theorem 1

We set \(T=c\sqrt{\phi _{n}}\) with c from Lemma 5.2. By Theorem 2 and Lemma 5.3, we have

$$\begin{aligned} \sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |=O\left( \frac{1}{\sqrt{\phi _{n}}}\right) +O\Bigg (\sum _{\emptyset \ne J\subsetneq L} \sup _{\mathbf {z}_J\in \mathbb {R}^J}\left|F_{\mathbf {X}_{J}}(\mathbf {z}_J)-F_{\mathbf {Y}_{J}}(\mathbf {z}_J) \right|\Bigg ).\nonumber \\ \end{aligned}$$
(5.6)

For \(\emptyset \ne J\subsetneq L\), we have \(\varphi _{\mathbf {X}_J}=\varphi _{\mathbf {X}}\circ \chi _{J, L}\). Therefore, all prerequisites for applying the quasi-power theorem to \((\varvec{\Omega }_n)_J\) are fulfilled, so we can apply (5.6) recursively and finally obtain

$$\begin{aligned} \sup _{\mathbf {z}\in \mathbb {R}^m}|F_{\mathbf {X}}(\mathbf {z})-F_{\mathbf {Y}}(\mathbf {z}) |=O\left( \frac{1}{\sqrt{\phi _{n}}}\right) . \end{aligned}$$

\(\square \)

Note that it would also have been possible to apply Corollary 2.2; however, this would have required proving Lemmas 5.2 and 5.3 for subsets K of L, entailing some notational overhead using \(\chi _{K, L}\).

Proof of Proposition 1.1

This follows by the same arguments as in [17, Thm. 2]. \(\square \)