Higher Dimensional Quasi-Power Theorem and Berry-Esseen Inequality

Hwang's quasi-power theorem asserts that a sequence of random variables whose moment generating functions are approximately given by powers of some analytic function is asymptotically normally distributed. This theorem is generalised to higher dimensional random variables. To obtain this result, a higher dimensional analogue of the Berry-Esseen inequality is proved, generalising a two-dimensional version by Sadikova.


Introduction
Asymptotic normality is a frequently occurring phenomenon in combinatorics, the classical central limit theorem being the very first example. The first step in the proof is the observation that the moment generating function of the sum of n independent, identically distributed random variables is the n-th power of the moment generating function of the distribution underlying the summands. As similar moment generating functions occur in many examples in combinatorics, a general theorem to prove asymptotic normality is desirable. Such a theorem was proved by Hwang [18], usually called the "quasi-power theorem".
In contrast to many results about the speed of convergence in classical probability theory (see, e.g., [12]), the sequence of random variables is not assumed to be independent. The only assumption is that the moment generating function behaves asymptotically like a large power. This mirrors the fact that the moment generating function of the sum of independent, identically distributed random variables is exactly a large power. The advantage is that the asymptotic expression (1.1) arises naturally in combinatorics by using techniques such as singularity analysis or saddle point approximation (see [8]).
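This exactness for independent summands is easy to check concretely. The following sketch (our own illustration, not part of the paper) compares the moment generating function of a sum of n independent Bernoulli(1/2) variables, computed directly from the binomial distribution of the sum, with the n-th power of the single-summand moment generating function:

```python
import math

def mgf_sum_direct(n, s):
    """MGF of S_n = X_1 + ... + X_n, X_i i.i.d. Bernoulli(1/2),
    computed by summing over the binomial distribution of S_n."""
    return sum(math.comb(n, k) * math.exp(s * k) for k in range(n + 1)) / 2**n

def mgf_power(n, s):
    """The same MGF written as the n-th power of the Bernoulli MGF."""
    return ((1 + math.exp(s)) / 2) ** n

# The two expressions agree (up to floating-point error):
assert all(math.isclose(mgf_sum_direct(8, s), mgf_power(8, s))
           for s in (-1.0, -0.3, 0.0, 0.5, 2.0))
```

The quasi-power theorem replaces this exact power by the asymptotic expression (1.2), so that dependence between the summands is allowed.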
The purpose of this article is to generalise the quasi-power theorem including the speed of convergence to arbitrary dimension m. We first state this main result in Theorem 1 in this section. In Section 2, a new Berry-Esseen inequality (Theorem 2) is presented, which we use to prove the m-dimensional quasi-power theorem. In Section 3, we give some applications of the multidimensional quasi-power theorem. The combinatorial idea behind the formulation of the Berry-Esseen inequality is discussed in Section 4. Our Berry-Esseen bound is proved in Section 5. The final Section 6 is then devoted to the proof of the quasi-power theorem.
We use the following conventions: vectors are denoted by boldface letters such as s; their components are then denoted by regular letters with indices such as s_j. For a vector s, ‖s‖ denotes the maximum norm max_j |s_j|. All implicit constants of O-terms may depend on the dimension m as well as on τ which is introduced in Theorem 1.
Our first main result is the following m-dimensional version of Hwang's theorem.
Theorem 1. Let {Ω_n}_{n≥1} be a sequence of m-dimensional real random vectors. Suppose that the moment generating function satisfies the asymptotic expression

M_n(s) = E(e^{⟨Ω_n, s⟩}) = e^{u(s)φ_n + v(s)} (1 + O(κ_n^{-1})),   (1.2)

uniformly for ‖s‖ ≤ τ, where τ > 0, the sequences φ_n and κ_n tend to infinity, u(s) and v(s) are analytic for ‖s‖ ≤ τ and independent of n, and the Hessian H_u(0) of u at the origin is non-singular. Then the distribution of Ω_n is asymptotically normal with speed of convergence O(φ_n^{-1/2}), i.e.,

sup_{z∈R^m} | P( (Ω_n − φ_n grad u(0)) / √φ_n ≤ z ) − Φ_Σ(z) | = O(φ_n^{-1/2}),   (1.3)

where Φ_Σ denotes the distribution function of the non-degenerate m-dimensional normal distribution with mean 0 and variance-covariance matrix Σ = H_u(0), i.e.,

Φ_Σ(z) = (det(2πΣ))^{-1/2} ∫_{y ≤ z} exp(−½ yᵀ Σ^{-1} y) dy.

If H_u(0) is singular, the standardised random vectors (Ω_n − φ_n grad u(0))/√φ_n still converge in distribution to a degenerate normal distribution with mean 0 and variance-covariance matrix H_u(0).
Note that in the case of singular H_u(0), a uniform speed of convergence cannot be guaranteed. To see this, consider the (constant) sequence of random variables Ω_n which takes the values ±1 each with probability 1/2. Then the moment generating function is (e^s + e^{−s})/2, which is of the form (1.2) with φ_n = n, u(s) = 0, v(s) = log((e^s + e^{−s})/2) and κ_n arbitrary. However, the distribution function of Ω_n/√n is the step function which is 0 for x < −1/√n, 1/2 for −1/√n ≤ x < 1/√n and 1 for x ≥ 1/√n, which does not converge uniformly.
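This counterexample can be checked numerically. The sketch below (our own illustration) evaluates the step distribution function of Ω_n/√n just below its jump at 1/√n, where its distance to the limiting degenerate distribution function is always 1/2, regardless of n:

```python
import math

def F_n(x, n):
    """Distribution function of Omega_n / sqrt(n), Omega_n = ±1 w.p. 1/2 each."""
    a = 1 / math.sqrt(n)
    if x < -a:
        return 0.0
    if x < a:
        return 0.5
    return 1.0

def degenerate(x):
    """Distribution function of the point mass at 0 (the distributional limit)."""
    return 1.0 if x >= 0 else 0.0

def sup_gap(n):
    """Discrepancy just below the jump at 1/sqrt(n): always 1/2, so F_n
    converges in distribution but not uniformly."""
    x = (1 - 1e-12) / math.sqrt(n)
    return abs(F_n(x, n) - degenerate(x))

assert all(sup_gap(n) == 0.5 for n in (1, 10, 10_000))
```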
In contrast to the original quasi-power theorem, the error term in our result does not contain the summand O(1/κ n ). In fact, this summand could also be omitted in the original proof of the quasi-power theorem by using a better estimate for the error E n (s) = M n (s)e −Wn(s) − 1, cf. the proof of our Lemma 6.1.
The order of the error is optimal (without further assumptions on the random variables), as is the case for the one-dimensional Berry-Esseen inequality. See, for example, the approximation of a binomial distribution by the normal distribution [21, § 1.2].
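The binomial example can be made quantitative. The following sketch (our addition) computes the exact sup-distance between the standardised symmetric binomial distribution and the normal distribution; quadrupling n roughly halves the gap, in line with the O(n^{-1/2}) rate:

```python
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def sup_gap_binomial(n):
    """sup_z |P((S_n - n/2)/sqrt(n/4) <= z) - Phi(z)| for S_n ~ Bin(n, 1/2).
    The supremum is attained at or just below a jump of the step function."""
    sd = math.sqrt(n / 4)
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        z = (k - n / 2) / sd
        gap = max(gap, abs(cdf - normal_cdf(z)))   # just below the jump at z
        cdf += math.comb(n, k) / 2**n
        gap = max(gap, abs(cdf - normal_cdf(z)))   # at the jump
    return gap

# The gap decays like a constant times n^(-1/2): quadrupling n halves it.
g100, g400 = sup_gap_binomial(100), sup_gap_binomial(400)
assert 0.01 < g100 < 0.1 and math.isclose(g100 / g400, 2.0, rel_tol=0.1)
```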
The proof of Theorem 1 relies on an m-dimensional Berry-Esseen inequality (Theorem 2). It is a generalisation of Sadikova's result [25,26] in dimension 2. The main challenge is to provide a version which leads to bounded integrands around the origin while still allowing the use of excellent bounds for the tails of the characteristic functions. To achieve this, linear combinations involving all partitions of the set {1, . . . , m} are used.
Note that there are several generalisations of the one-dimensional Berry-Esseen inequality [3,7] to arbitrary dimension, see, e.g., Gamkrelidze [9,10] and Prakasa Rao [23]. However, using these results would lead to a less precise error term in (1.3), see the end of Section 2 for more details. For that reason we generalise Sadikova's result, which was already successfully used by the first author in [13] to prove a 2-dimensional quasi-power theorem. Also note that our theorem can deal with discrete random variables, too, in contrast to [24], where density functions are considered.
For the sake of completeness, we also state the following result about the moments of Ω n .
Proposition 1.1. The cross-moments of Ω_n satisfy

E(Ω_{n,1}^{k_1} ⋯ Ω_{n,m}^{k_m}) = k_1! ⋯ k_m! [s_1^{k_1} ⋯ s_m^{k_m}] e^{u(s)φ_n + v(s)} + O(κ_n^{-1}).

In particular, the mean and the variance-covariance matrix are

E(Ω_n) = φ_n grad u(0) + grad v(0) + O(κ_n^{-1}),
Cov(Ω_n) = φ_n H_u(0) + H_v(0) + O(κ_n^{-1}).

A Berry-Esseen Inequality
This section is devoted to a generalisation of Sadikova's Berry-Esseen inequality [25,26] in dimension 2 to dimension m. Before stating the theorem, we introduce our notation.
Let L = {1, . . . , m}. For K ⊆ L, we write s K = (s k ) k∈K for the projection of s ∈ C L to C K .
For J ⊆ K ⊆ L, let χ_{J,K} : C^K → C^K, s ↦ (s_k [k ∈ J])_{k∈K}, be the projection which sets all coordinates corresponding to K \ J to 0.
We denote the set of all partitions of K by Π K . We consider a partition as a set α = {J 1 , . . . , J k }. Thus |α| denotes the number of parts of the partition α. Furthermore, J ∈ α means that J is a part of the partition α. Now, we can define an operator which we later use to state our Berry-Esseen inequality. The motivation behind this definition is explained at the end of this section.
Definition 2.1. Let K ⊆ L and h : C^K → C. We define the non-linear operator Λ_K by

Λ_K(h)(s) = ∑_{α∈Π_K} μ_α ∏_{J∈α} h(χ_{J,K}(s)),   where μ_α = (−1)^{|α|−1} (|α|−1)!.

We denote Λ_L briefly by Λ.
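The combinatorial shape of this operator is easy to experiment with. The sketch below (our own illustration) assumes the form Λ(h)(s) = ∑_{α∈Π_L} μ_α ∏_{J∈α} h(χ_{J,L}(s)) with μ_α = (−1)^{|α|−1}(|α|−1)!, consistent with the Möbius-function discussion later in this section; it checks that in dimension 2 the operator reduces to Sadikova's combination, and that Λ(h) vanishes whenever some coordinate does, provided h(0) = 1:

```python
import math

def set_partitions(elements):
    """All partitions of a list of elements, as lists of blocks."""
    if not elements:
        return [[]]
    first, rest = elements[0], elements[1:]
    result = []
    for partition in set_partitions(rest):
        # put `first` into an existing block ...
        for i in range(len(partition)):
            result.append(partition[:i] + [[first] + partition[i]] + partition[i+1:])
        # ... or into a block of its own
        result.append([[first]] + partition)
    return result

def Lambda(h, s):
    """Assumed form of the operator: sum over partitions alpha of {0,...,m-1}
    of (-1)^(|alpha|-1) (|alpha|-1)! times the product over blocks J of
    h evaluated at s with all coordinates outside J set to 0."""
    m = len(s)
    total = 0.0
    for alpha in set_partitions(list(range(m))):
        mu = (-1) ** (len(alpha) - 1) * math.factorial(len(alpha) - 1)
        prod = 1.0
        for block in alpha:
            prod *= h(tuple(s[j] if j in block else 0.0 for j in range(m)))
        total += mu * prod
    return total

def h(s):
    """An arbitrary smooth test function with h(0) = 1."""
    return math.exp(sum(s) - 0.5 * sum(x * x for x in s)) * (1 + 0.3 * math.prod(s))

# Bell numbers: 5 partitions of a 3-set, 15 of a 4-set.
assert len(set_partitions([1, 2, 3])) == 5
assert len(set_partitions([1, 2, 3, 4])) == 15

# In dimension 2 the operator reduces to Sadikova's h(s1,s2) - h(s1,0) h(0,s2).
s1, s2 = 0.4, -0.7
assert math.isclose(Lambda(h, (s1, s2)), h((s1, s2)) - h((s1, 0.0)) * h((0.0, s2)))

# Whenever one coordinate vanishes (and h(0) = 1), Lambda(h) vanishes too.
assert abs(Lambda(h, (0.4, -0.7, 0.0))) < 1e-12
```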
For any random variable Z, we denote its cumulative distribution function by F Z , its density function by f Z (if it exists) and its characteristic function by ϕ Z .
With these definitions, we are able to state our second main result, an m-dimensional version of the Berry-Esseen inequality.
Theorem 2. (In the statement, the constants B_j are defined via Stirling partition numbers, i.e., Stirling numbers of the second kind.) Let T > 0 be fixed. Then the bound (2.1) holds. Existence of E(X) and E(Y) is sufficient for the finiteness of the integral in (2.1).
Let us give two remarks on the distribution functions occurring in this theorem: The distribution function F Y is non-decreasing in every variable, thus A j > 0 for all j. Furthermore, our general notations imply that F X J is a marginal distribution of X.
The numbers B j are known as "Fubini numbers" or "ordered Bell numbers". They form the sequence A000670 in [20].
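The Fubini numbers are easy to generate. A short sketch (our addition) using the standard recurrence a(n) = ∑_{k=1}^{n} C(n,k) a(n−k), a(0) = 1, which counts ordered set partitions by the size k of the first block:

```python
import math

def fubini(n):
    """Ordered Bell numbers (OEIS A000670): number of ordered set partitions
    of an n-set, via a(n) = sum_{k=1}^{n} C(n, k) a(n - k), a(0) = 1."""
    a = [1]
    for m in range(1, n + 1):
        a.append(sum(math.comb(m, k) * a[m - k] for k in range(1, m + 1)))
    return a[n]

# First values of A000670.
assert [fubini(n) for n in range(7)] == [1, 1, 3, 13, 75, 541, 4683]
```

Equivalently, they are the sums over k of the Stirling partition numbers multiplied by k!, i.e., partitions of an n-set weighted by the orderings of their blocks.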
Recursive application of (2.1) leads to the following corollary, where we no longer explicitly state the constants depending on the dimension.

Corollary 2.2. Under the assumptions of Theorem 2, the bound (2.2) holds, where the O-constants only depend on the dimension m.
Existence of E(X) and E(Y) is sufficient for the finiteness of the integrals in (2.2).
In order to explain the choice of the operator Λ, we first state it in dimension 2:

Λ(h)(s_1, s_2) = h(s_1, s_2) − h(s_1, 0) h(0, s_2).   (2.3)

This coincides with Sadikova's definition. It also shows that our operator is non-linear as, e.g., Λ(h_1 + h_2) ≠ Λ(h_1) + Λ(h_2) in general. In Theorem 2, we apply Λ to characteristic functions; so we may restrict our attention to functions h with h(0) = 1. From (2.3), we see that Λ(h)(s_1, 0) = Λ(h)(0, s_2) = 0, so that Λ(h)(s_1, s_2)/(s_1 s_2) is bounded around the origin. This is essential for the boundedness of the integral in Theorem 2. In general, this property will be guaranteed by our particular choice of coefficients. It is no coincidence that for α ∈ Π_L, the coefficient μ_α equals the value μ(α, {L}) of the Möbius function in the lattice of partitions: Weisner's theorem (see Stanley [27, Corollary 3.9.3]) is crucial in the proof that Λ(h)(s)/(s_1 ⋯ s_m) is bounded around the origin (see the proof of Lemma 4.1).
The second property is that our proof of the quasi-power theorem needs estimates for the tails of the integral in Theorem 2. These estimates have to be exponentially small in every variable, which means that every variable has to occur in every summand. This is trivially fulfilled as every summand in the definition of Λ is formulated in terms of a partition.
Note that Gamkrelidze [10] (and also Prakasa Rao [23]) use a linear operator L mapping h to

L(h)(s_1, s_2) = h(s_1, s_2) − h(s_1, 0) − h(0, s_2).   (2.4)

When taking the difference of two characteristic functions, we may assume that h(0, 0) = 0, so that the first crucial property as defined above still holds. However, the tails are no longer exponentially small in every variable: the last summand h(0, s_2) in (2.4) is not exponentially small in s_1 because it is independent of s_1 and nonzero in general. The first two summands, in contrast, are exponentially small in s_1 by our assumption (1.2). For that reason, using the Berry-Esseen inequality by Gamkrelidze [10] to prove a quasi-power theorem leads to a less precise error term. It can be shown that the less precise error term necessarily appears when using Gamkrelidze's result by considering the example of Ω_n being the 2-dimensional vector whose components are a normal distribution with mean −1 and variance n and a normal distribution with mean 0 and variance n. This is a consequence of the linearity of the operator L in Gamkrelidze's result.
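The difference between the two operators is easy to see numerically in dimension 2. The sketch below (our own illustration) assumes the forms Λ(h)(s_1, s_2) = h(s_1, s_2) − h(s_1, 0)h(0, s_2) (Sadikova) and L(h)(s_1, s_2) = h(s_1, s_2) − h(s_1, 0) − h(0, s_2), the latter reconstructed from the discussion of its last summand above, and applies both to the characteristic function of two independent standard normals:

```python
import math

def phi(t):
    """Characteristic function of a standard normal (real-valued on the reals)."""
    return math.exp(-t * t / 2)

def h(s1, s2):
    """Joint characteristic function of two independent standard normals."""
    return phi(s1) * phi(s2)

def Lambda2(h, s1, s2):    # Sadikova-style multiplicative combination
    return h(s1, s2) - h(s1, 0.0) * h(0.0, s2)

def Lin(h, s1, s2):        # linear combination as in (2.4)
    return h(s1, s2) - h(s1, 0.0) - h(0.0, s2)

# For independent coordinates, Lambda2 vanishes identically, so its tail in s1
# is as small as possible; Lin keeps the summand -h(0, s2), which does not
# decay as s1 grows.
assert abs(Lambda2(h, 10.0, 0.5)) < 1e-15
assert abs(Lin(h, 10.0, 0.5) + phi(0.5)) < 1e-15
```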

Examples of Multidimensional Central Limit Theorems
In this section, we give two examples from combinatorics where we can apply Theorem 1. Asymptotic normality was already shown in earlier publications [4,2], but we additionally provide an estimate for the speed of convergence.
3.1. Context-Free Languages. Consider the following example of a context-free grammar G with non-terminal symbols S and T , terminal symbols {a, b, c}, starting symbol S and the rules P = {S → aSbS, S → bT, T → bS, T → cT, T → a}.
The corresponding context-free language L(G) consists of all words which can be generated starting with S using the rules in P to replace all non-terminal symbols. For example, abcabababba ∈ L(G) because it can be derived as S → aSbS → abTbaSbS → abcTbabTbbT → abcabababba.
Let P(Ω_n = x) be the probability that a word of length n in L(G) consists of x_1 terminal symbols a and x_2 terminal symbols b. Thus there are n − x_1 − x_2 terminal symbols c. For simplicity, this random variable is only 2-dimensional, but it can easily be extended to higher dimensions.
Following Drmota [4, Sec. 3.2], we obtain that the moment generating function is E(e^{⟨Ω_n, s⟩}) = y_n(e^s)/y_n(1) with y_n(z) defined in [4], where e^s is to be read componentwise. Using [4, Equ. (4.9)], this moment generating function has an asymptotic expansion as in (1.2) with φ_n = n. Thus Ω_n is asymptotically normally distributed after standardisation (as was shown in [4]) and additionally the speed of convergence is O(n^{-1/2}). Other context-free languages can be analysed in the same way, either by directly using the results in [4] (if the underlying system is strongly connected) or by similar methods. This has applications, for example, in genetics (see [22]).
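The joint distribution of the (a, b)-counts can be computed exactly for small n by a dynamic program over the grammar. The sketch below is our own; it counts derivation trees, which equals the number of words whenever the grammar is unambiguous, and is a direct translation of the production rules:

```python
from functools import lru_cache

# Grammar: S -> a S b S | b T,   T -> b S | c T | a.
# count(nt, n, x1, x2) = number of derivation trees from nonterminal nt
# producing a word of length n with x1 letters 'a' and x2 letters 'b'.
@lru_cache(maxsize=None)
def count(nt, n, x1, x2):
    if min(n, x1, x2) < 0 or x1 + x2 > n:
        return 0
    if nt == 'S':
        total = count('T', n - 1, x1, x2 - 1)                  # S -> b T
        for k in range(n - 1):                                 # S -> a S b S
            for y1 in range(x1):
                for y2 in range(x2):
                    total += (count('S', k, y1, y2)
                              * count('S', n - 2 - k, x1 - 1 - y1, x2 - 1 - y2))
        return total
    # nt == 'T'
    return (count('S', n - 1, x1, x2 - 1)                      # T -> b S
            + count('T', n - 1, x1, x2)                        # T -> c T
            + (n == 1 and x1 == 1 and x2 == 0))                # T -> a

# The shortest words: "ba" (S -> bT -> ba) and "bca" (S -> bT -> bcT -> bca).
assert count('S', 2, 1, 1) == 1
assert count('S', 3, 1, 1) == 1
# There are exactly two words of length 4, namely "bbba" and "bcca".
assert sum(count('S', 4, x1, x2) for x1 in range(5) for x2 in range(5)) == 2
```

Normalising count('S', n, x1, x2) by the total number of words of length n gives the distribution of Ω_n from which the asymptotic normality can be observed empirically.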

3.2. Dissections of Labelled Convex Polygons. Let S_1 ∪̇ ⋯ ∪̇ S_{t+1} = {3, 4, . . .} be a partition. We dissect a labelled convex n-gon into smaller convex polygons by choosing some non-intersecting diagonals. Each small polygon should be a k-gon with k ∈ S_1 ∪̇ ⋯ ∪̇ S_{t+1}. Define a_n(r) to be the number of dissections of an n-gon such that it consists of exactly r_i small polygons whose number of vertices is in S_i, for i = 1, . . . , t. For convenience, we use a_2(r) = [r = 0]. Asymptotic normality was proved in [2, Sec. 3], see also [1, Ex. 7.1] for a one-dimensional version. We additionally provide an estimate for the speed of convergence.
Then choosing a k-gon with k ∈ S_1 ∪̇ ⋯ ∪̇ S_t and gluing dissected polygons to k − 1 of its sides translates into an equation for the generating function. Following [1], this equation can be used to obtain an asymptotic expression for the moment generating function as in (1.2) with φ_n = n. The asymptotic normal distribution follows after suitable standardisation with speed of convergence O(n^{-1/2}).
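The gluing construction can be turned into a truncated power-series computation. The sketch below is our own; it assumes the univariate form of the equation, y = z + ∑_{k∈S} y^{k−1} for y(z) = ∑_{n≥2} a_n z^{n−1} (with a_2 = 1 for the 2-gon, matching a_2(r) = [r = 0] above), and recovers two classical sequences:

```python
def dissection_counts(allowed, N):
    """Coefficients a_2, ..., a_N of y(z) = sum_{n>=2} a_n z^(n-1) satisfying
    y = z + sum_{k in allowed} y^(k-1), computed by truncated power-series
    fixed-point iteration. a_n counts dissections of a convex n-gon into
    parts whose sizes lie in `allowed`."""
    def mul(p, q):                       # truncated product of coefficient lists
        r = [0] * N
        for i, pi in enumerate(p):
            if pi:
                for j, qj in enumerate(q):
                    if i + j < N:
                        r[i + j] += pi * qj
        return r

    y = [0] * N
    for _ in range(N):                   # iterate to the fixed point
        new = [0] * N
        new[1] = 1                       # the term z
        for k in allowed:
            if k - 1 < N:
                p = [1] + [0] * (N - 1)  # p = y^(k-1), truncated
                for _ in range(k - 1):
                    p = mul(p, y)
                for i in range(N):
                    new[i] += p[i]
        y = new
    return y[1:]                         # a_2, a_3, ..., a_N

# Triangulations (parts of size 3 only) are counted by the Catalan numbers,
assert dissection_counts([3], 7) == [1, 1, 2, 5, 14, 42]
# and unrestricted dissections by the little Schroeder numbers.
assert dissection_counts(range(3, 9), 7) == [1, 1, 3, 11, 45, 197]
```

Marking the part sizes in each class S_i with additional variables in the same equation yields the multivariate generating function behind a_n(r).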

Combinatorial Background of the Operator Λ
Before we start with the proof of Theorem 2, we state and prove the property of our operator Λ which motivates its Definition 2.1. By the following definition, Π_L, the set of all partitions of L, is a poset: As usual, a partition α ∈ Π_L is said to be a refinement of a partition α′ ∈ Π_L if ∀J ∈ α ∃J′ ∈ α′ : J ⊆ J′. In this case, we write α ≤ α′. This defines a partial order on Π_L.

Proof of the Berry-Esseen Inequality
This section is devoted to the proof of our Berry-Esseen inequality, Theorem 2. It is a generalisation of Sadikova's proof.
We start with an auxiliary one-dimensional random variable.
Lemma 5.1. Let P be the one-dimensional random variable with the probability density function given in (5.1). Then its characteristic function is given by (5.2). To obtain a bound for λ, we follow Gamkrelidze [10]: we estimate the tail using sin^4(z) ≤ 1; this results in (5.3).
In the next step, we consider tuples of random variables distributed as P . They will be used to ensure smoothness. We write 1 to denote a vector with all coordinates equal to 1.
Lemma 5.2. Let Q = (P 1 /T, . . . , P m /T ) be the m-dimensional random variable where the P j are independent random variables with the same distribution as P in Lemma 5.1 and T is the fixed constant defined in Theorem 2.
Proof. Because of independence, the distribution function and the characteristic function of Q are the products of the distribution functions and the characteristic functions of the P_j/T, respectively. Division by T transforms the density and characteristic functions as claimed; as ϕ_P(t) vanishes outside a compact interval, ϕ_Q vanishes outside [−T, T]^m. By a simple translation, the integral on the left hand side of (5.4) can be seen to equal a shifted integral over the density; then (5.4) is a simple consequence of Q_j = P_j/T, (5.2) and the triangle inequality. By the same translation and the definition of λ, the left hand side of (5.5) follows.

From now on, we let Q be as in Lemma 5.2 and let Q be independent of X and independent of Y. We first prove an inequality relating the difference between the distribution functions of X and Y to that of the distribution functions of X + Q and Y + Q.

Lemma 5.3. We have (5.6). Proof. Let ε > 0. We choose θ ∈ {±1} such that S = sup_{z∈R^m} θ(F_X(z) − F_Y(z)).
There is a z_ε ∈ R^m such that θ(F_X(z_ε) − F_Y(z_ε)) ≥ S − ε. Let w ∈ R^m with θw ≤ 0. By monotonicity of F_X, we have θF_X(z_ε − w) ≥ θF_X(z_ε). We multiply the resulting inequality by f_Q(w + θ(λ/T)1) and integrate over all w ∈ R^m with θw ≤ 0. By (5.5) and (5.4), we get (5.7). The complementary region contributes (5.8), by (5.5) and the fact that f_Q is a probability density function. Combining (5.7) and (5.8) yields (5.9). As the sum of random variables corresponds to a convolution, we have (5.10). Replacing z and w by z_ε + θ(λ/T)1 and w + θ(λ/T)1, respectively, and using (5.9) leads to a bound valid for all ε > 0. Taking the limit for ε → 0 and rearranging yields the right hand side of (5.6). The left hand side of (5.6) is an immediate consequence of (5.10).
We are now able to bound the difference of the distribution functions by their characteristic functions.
Lemma 5.4. We have (5.11). Proof. Let a, z ∈ R^m with a ≤ z.
The random variable X_J + Q_J admits a density function, because Q_J admits a density function. In particular, X_J + Q_J is a continuous random variable. By Lévy's theorem (see, e.g., [28, Thm. 1.8.4]), its distribution function can be expressed as an integral over its characteristic function. As ϕ_{X_J+Q_J}(t_J) = ϕ_{X_J}(t_J) ϕ_{Q_J}(t_J) and ϕ_{Q_J}(t_J) vanishes outside [−T, T]^J by Lemma 5.2, we can replace the limit T_j → ∞ by setting T_j = T. Taking the product over all J ∈ α and summing over α ∈ Π_L yields (5.12), where Fubini's theorem and the fact that ϕ_Q(t) = ∏_{J∈α} ϕ_{Q_J}(t_J) have been used. By definition of ϕ_X, we have ϕ_{X_J}(t_J) = ϕ_X(χ_{J,L}(t)). Therefore, we can use the definition of Λ(ϕ_X) to rewrite (5.12) as (5.13). This equation remains valid when replacing X by Y; taking the difference results in the analogous identity for the differences. If the integral on the right hand side of (5.11) is infinite, there is nothing to show. Thus we may assume that it is finite. This also implies that the integrand is an integrable function on R^m (as it vanishes outside [−T, T]^m). Then by the Riemann-Lebesgue lemma, we may take the limit a_ℓ → −∞ for all ℓ ∈ L in (5.13). Taking absolute values and rewriting the left hand side in terms of marginal distribution functions yields (5.11).
We now bound the contribution of the lower dimensional distributions.
This holds because the products over the distribution functions are bounded by 1. A partition α ∈ Π_L with J ∈ α can be uniquely written as α = {J} ∪ β for a β ∈ Π_{L\J}, and the number of partitions of L \ J with k parts is the Stirling partition number with parameters m − |J| and k. Using the left hand side of (5.6) (more precisely, a version of the left hand side of (5.6) for marginal distributions) yields the assertion. Now, we can complete the proof of the theorem.
If the expectation of X exists, ϕ_X is differentiable. Therefore, Λ(ϕ_X) is differentiable, too. By Lemma 4.1, Λ(ϕ_X)(t) has a zero whenever one of the t_ℓ, ℓ ∈ L, vanishes. Thus Λ(ϕ_X)(t)/∏_{ℓ∈L} t_ℓ is bounded around 0 and therefore bounded on [−T, T]^m. The same holds for Y. Thus the integral on the right hand side of (2.1) converges.

Proof of the Quasi-Power Theorem
We may now prove the m-dimensional quasi-power theorem, Theorem 1. Let µ n = φ n grad u(0) and Σ = H u (0). We define the random vector X = φ −1/2 n (Ω n − µ n ). For simplicity, we ignore the dependence on n in this and the following notations.
First, we establish bounds for the characteristic function of X.
Lemma 6.1. The stated bounds hold for all s ∈ C^K with ‖s‖ ≤ τ√φ_n/2. For n → ∞, X converges in distribution to a normal distribution with mean 0 and variance-covariance matrix Σ. In particular, Σ is positive definite if it is regular and positive semi-definite if it is singular.
Proof. By replacing u(s) and v(s) by u(s) − u(0) and v(s) − v(0), respectively, we may assume that u(0) = v(0) = 0. We define E(s) by the relation M_n(s) = e^{W_n(s)}(1 + E(s)) and note that by assumption, E(s) = O(κ_n^{-1}) uniformly for ‖s‖ ≤ τ. We note that this implies E(0) = 0. By assumption, M_n(s) exists for ‖s‖ ≤ τ. Therefore, it is continuous for these s and, by Morera's theorem combined with applications of Fubini's and Cauchy's theorems, M_n(s) is analytic for ‖s‖ ≤ τ. This also implies that E(s) is analytic for ‖s‖ ≤ τ. By Cauchy's formula, we obtain bounds for the derivatives of E. Since u(0) = v(0) = 0 and the constant and first order terms of u cancel out after standardisation, the moment generating function of X converges to exp(½ sᵀΣs) for s ∈ C^m, which implies that, in distribution, X converges to the normal distribution with mean zero and variance-covariance matrix Σ. Although we have to refine our estimates for applying Theorem 2, we immediately conclude that Σ is positive (semi-)definite depending on whether it is regular or not.
Let now Σ be regular. By Y we denote a normally distributed random variable in R^m with mean 0 and variance-covariance matrix Σ. Its characteristic function is ϕ_Y(s) = exp(−½ sᵀΣs).
The smallest eigenvalue of Σ is denoted by σ > 0.
We are now able to bound the functions occurring in the Berry-Esseen inequality; we then collect all results to prove Theorem 1.
Proof of Theorem 1. We set T = c√φ_n with c from Lemma 6.2. By Theorem 2 and Lemma 6.3, we have (6.6). For ∅ ≠ J ⊊ L, we have ϕ_{X_J} = ϕ_X ∘ χ_{J,L}. Therefore, all prerequisites for applying the quasi-power theorem to (Ω_n)_J are fulfilled. Thus we can apply (6.6) recursively and finally obtain sup_{z∈R^m} |F_X(z) − Φ_Σ(z)| = O(φ_n^{-1/2}). Note that it would also have been possible to apply Corollary 2.2; however, this would have required proving Lemmas 6.2 and 6.3 for subsets K of L, which would have required some notational overhead using χ_{K,L}.
Proof of Proposition 1.1. This follows by the same arguments as in [18,Thm. 2].