Abstract
In this paper, we consider approximating expansions for the distribution of integer valued random variables, in circumstances in which convergence in law (without normalization) cannot be expected. The setting is one in which the simplest approximation to the \(n\)th random variable \(X_n\) is by a particular member \(R_n\) of a given family of distributions, whose variance increases with \(n\). The basic assumption is that the ratio of the characteristic function of \(X_n\) to that of \(R_n\) converges to a limit in a prescribed fashion. Our results cover and extend a number of classical examples in probability, combinatorics and number theory.
1 Introduction
The topic of this paper is the explicit approximation, in various metrics, of random variables which, in terms of characteristic functions, behave like a sum
$$\begin{aligned} X_n = Z_n + Y_n \end{aligned}\qquad (1.1)$$
of a “model” variable \(Z_n\) (for instance, a Poisson random variable) and an independent perturbation \(Y_n\), when the model variable has “large” parameter. Our interest is in discrete random variables, and in cases where this simple-minded decomposition does not in fact exist. We have two motivations:
(1) In probabilistic number theory, it has been known since the proof by Rényi and Turán of the Erdős–Kac theorem that the random variable \(\omega (N_n)\) given by the number of prime divisors (without multiplicity, for definiteness) of an integer \(N_n\) uniformly chosen in the interval \(\{1,2,\ldots ,n\}\) has characteristic function given by
as \(n\rightarrow \infty \), where \(Z_n \sim \mathrm{Po\,}(\log \log n)\) is a Poisson variable with mean \(\log \log n\) and \(\Phi (\theta )\) is defined by
the product being absolutely convergent for all \(\theta \) real. This \(\Phi (\theta )\) is not the characteristic function of a probability distribution, and hence formula (1.1) with \(Z_n \sim \mathrm{Po\,}(\log \log n)\) cannot be true. However, we are nonetheless able to obtain explicit approximation statements for the law of \(\omega (N_n)\):
Theorem 1.1
For every integer \(r\ge 0\), there exist explicitly computable signed measures \(\nu _{r,n}\) on the positive integers such that the total variation distance between the law of \(\omega (N_n)\) and \(\nu _{r,n}\) is of order \(O\{(\log \log n)^{-(r+1)/2}\}\) for \(n\ge 3\).
This is proved in Sect. 7.3, where formulas for the measures \(\nu _{1,n}\) and \(\nu _{2,n}\) are also given. Such results are new in analytic number theory, where total variation distance estimates have hardly been considered before [but see [4] for a result concerning the total variation distance to a Poisson approximation for the distribution of a truncated version of \(\omega (N_n)\)].
For more on the significance of the Rényi–Turán formula, comparison with the Keating–Snaith conjectures for the Riemann zeta function, and finite-field analogues, see Kowalski and Nikeghbali [6].
(2) In a beautiful paper, Hwang [5] considered sequences of nonnegative integer valued random variables \(X_n\), whose probability generating functions \(f_{X_n}\) satisfy
for all \(z\in \mathbb{C }\) with \(|z| \le \eta \), for some \(\eta > 1\), where the function \(g\) is analytic, and \(\lim _{n\rightarrow \infty }\lambda _n = \infty \). This assumption is also intuitively related to a model of the form (1.1). Under some extra conditions, Hwang exhibits bounds of order \(O(\lambda _n^{-1})\) on the accuracy of the approximation of the distribution of \(X_n\) by a Poisson distribution with carefully chosen mean, close to \(\lambda _n\). Hwang [5] also notes that his methods can be applied to families of distributions other than the Poisson family, and gives examples using the Bessel family.
In this paper, we systematically consider sequences of integer valued random variables \(X_n\), whose characteristic functions \(\phi _{X_n}\) satisfy a condition which, in the Poisson context, is some strengthening of the convergence
Under suitable conditions, we derive explicit approximations to the distribution of \(X_n\), in various metrics, by measures related to the Poisson model. The approximations can be made close to any given polynomial order in \(\lambda _n^{-1/2}\), if the conditions are sharp enough and the measure is correspondingly chosen. The conditions that we require for these expansions are much weaker than those of Hwang [5]. For instance, his conditions require the \(X_n\) to take only nonnegative values, and to have exponential tails, neither of which conditions we need to impose.
Our basic result, Proposition 2.1, is very simple and explicit. It enables us to dispense with asymptotic settings, and to prove concrete error bounds. It also allows us to consider approximation by quite general families of distributions on the integers, instead of just the Poisson family, requiring only the replacement of the Poisson characteristic function in (1.2) by the characteristic function corresponding to the family chosen. This enables us to deduce expansions based on any such discrete family of distributions, as shown in Sect. 4, without any extra effort. Indeed, the main problem would seem to be to identify the higher order terms in the expansions, but these turn out simply to be linear combinations of the higher order differences of the basic distribution: see (2.6).
This elementary result, and a simple but powerful theorem that follows from it, are given, together with an example, in Sect. 2. The conditions are then substantially relaxed, in order to allow for wider application, and to treat total variation approximation in a satisfactory manner. The general conclusions are proved in the context of approximating finite signed measures in Sect. 3, and they are reformulated for approximating probability distributions in the usual asymptotic framework in Sect. 4.
In the Poisson context, the measures that result are the Poisson–Charlier measures. Our general results enable us to deduce a Poisson–Charlier approximation with error of order \(O(\lambda _n^{-t/2})\), for any prescribed \(t\), assuming that Hwang’s conditions hold. We also show that the Poisson–Charlier expansions are valid under more general conditions, in which the \(X_n\) may have only a few finite moments. These expansions are established in Sect. 5, and the compound Poisson context is briefly discussed in Sect. 6. In Sect. 7, we discuss some applications: to sums of independent integer valued random variables, to Hwang’s setting, and to our first motivation, where we prove Theorem 1.1.
In order to ease the reading of this paper, we give here a diagram indicating the logical dependency of the results we prove. On the left-hand side are the basic approximation theorems, the right-hand side represents applications, and the results of Sect. 4 represent the bridge linking the two:
We frame our approximations in terms of three distances between (signed) measures \(\mu \) and \(\nu \) on the integers: the point metric
the Kolmogorov distance
and the total variation norm
Other metrics could also be treated using our methods.
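For finitely supported signed measures, the three distances can be computed directly. The following is only an illustrative sketch: it assumes the usual conventions (supremum of point differences for \(d_{\mathrm{loc}}\), supremum over half-lines \((-\infty ,j]\) for \(d_K\), and the unnormalised sum of absolute point differences for the total variation norm), and the function names and the dictionary representation of a measure are hypothetical choices made for the example.

```python
# Signed measures on the integers are represented as dicts {j: mass}.
# The three conventions below are assumptions stated in the lead-in.

def d_loc(mu, nu):
    """Point metric: largest absolute difference of point masses."""
    support = set(mu) | set(nu)
    return max(abs(mu.get(j, 0.0) - nu.get(j, 0.0)) for j in support)


def d_kolmogorov(mu, nu):
    """Kolmogorov distance: largest absolute difference over half-lines (-inf, j]."""
    running, worst = 0.0, 0.0
    for j in sorted(set(mu) | set(nu)):
        running += mu.get(j, 0.0) - nu.get(j, 0.0)
        worst = max(worst, abs(running))
    return worst


def tv_norm(mu, nu):
    """Total variation norm of mu - nu, taken here without a factor 1/2."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(j, 0.0) - nu.get(j, 0.0)) for j in support)


# Two small hypothetical probability measures.
mu = {0: 0.5, 1: 0.5}
nu = {0: 0.25, 1: 0.5, 2: 0.25}
distances = (d_loc(mu, nu), d_kolmogorov(mu, nu), tv_norm(mu, nu))
```

With these conventions one always has \(d_{\mathrm{loc}} \le \Vert \mu -\nu \Vert \) and \(d_K \le \Vert \mu -\nu \Vert \), which the toy example respects.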
2 The basic estimate
The essence of our argument is the following elementary result, linking the closeness of finite signed measures \(\mu \) and \(\nu \) to the closeness of their characteristic functions, when these have a common factor involving a ‘large’ parameter \(\rho \); for a finite signed measure \(\zeta \) on \(\mathbb{Z }\), the characteristic function \(\phi _\zeta \) is defined by \(\phi _\zeta (\theta ) := \sum _{j\in \mathbb{Z }} e^{ij\theta }\zeta \{j\}\), for \(|\theta | \le \pi \).
Proposition 2.1
Let \(\mu \) and \(\nu \) be finite signed measures on \(\mathbb{Z }\), with characteristic functions \(\phi _\mu \) and \(\phi _\nu \) respectively. Suppose that \(\phi _\mu = \psi _\mu \chi \) and \(\phi _\nu = \psi _\nu \chi \), and write \(d_{\mu \nu }:= \psi _\mu - \psi _\nu \). Suppose that, for some \( \gamma ,\rho ,t>0\),
Then there are explicit constants \(\alpha _{1t}\) and \(\alpha _{2t}\) such that

1. \(\sup _{j\in \mathbb{Z }}|\mu \{j\} - \nu \{j\}| \ \le \ \alpha _{1t} \gamma (\rho \vee 1)^{-(t+1)/2};\)

2. \(\sup _{a \le b\in \mathbb{Z }}|\mu \{[a,b]\} - \nu \{[a,b]\}| \ \le \ \alpha _{2t} \gamma (\rho \vee 1)^{-t/2}\).
Proof
For any \(j\in \mathbb{Z }\), the Fourier inversion formula gives
from which our assumptions imply directly that
For \(\rho \le 1\), we thus have
For \(\rho \ge 1\), it is immediate that
with \(\beta ^{\prime }_{1t} := 2^{-(t+1)/2}m_{t}/\sqrt{2\pi }\); here, \(m_t\) denotes the \(t\)th absolute moment of the standard normal distribution. Setting
this proves part 1. The second part is similar, adding (2.2) over \(a\le j\le b\), and estimating
This gives part 2, with
\(\square \)
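The case \(\rho \ge 1\) of the proof rests on a Gaussian moment integral. As a hedged numerical check, the sketch below compares a midpoint-rule evaluation of \((2\pi )^{-1}\int _{\mathbb{R }}|\theta |^t e^{-\rho \theta ^2}\,d\theta \) with the closed form \(2^{-(t+1)/2}m_t\rho ^{-(t+1)/2}/\sqrt{2\pi }\), taking \(m_t = 2^{t/2}\Gamma ((t+1)/2)/\sqrt{\pi }\). The function names, the test values of \(t\) and \(\rho \), and the assumption that the envelope of \(\chi \) is \(e^{-\rho \theta ^2}\) are illustrative choices, not taken from the text.

```python
import math


def abs_normal_moment(t):
    """t-th absolute moment of N(0,1): 2^{t/2} * Gamma((t+1)/2) / sqrt(pi)."""
    return 2.0 ** (t / 2.0) * math.gamma((t + 1.0) / 2.0) / math.sqrt(math.pi)


def beta_prime(t):
    """Constant 2^{-(t+1)/2} * m_t / sqrt(2*pi) multiplying rho^{-(t+1)/2}."""
    return 2.0 ** (-(t + 1.0) / 2.0) * abs_normal_moment(t) / math.sqrt(2.0 * math.pi)


def gaussian_integral(t, rho, steps=200_000):
    """Midpoint rule for (1/(2*pi)) * integral over R of |theta|^t * exp(-rho*theta^2)."""
    upper = 12.0 / math.sqrt(rho)  # the integrand is negligible beyond this point
    h = upper / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        total += theta ** t * math.exp(-rho * theta * theta)
    # The integrand is even, so the half-line integral is doubled.
    return 2.0 * total * h / (2.0 * math.pi)


t, rho = 1.5, 7.0  # hypothetical test values
numeric = gaussian_integral(t, rho)
closed = beta_prime(t) * rho ** (-(t + 1.0) / 2.0)
```

Integrating over the whole line rather than over \([-\pi ,\pi ]\) only enlarges the value, which is why the closed form serves as an upper bound in the proof.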
We shall principally be concerned with taking \(\mu \) to be the distribution of a random variable \(X\). We allow \(\nu \) to be a signed measure, because in many cases, such as in the following canonical example and in the Poisson–Charlier expansions of Sect. 5, signed measures appear as the natural approximations.
Let \(X\) be an integer valued random variable with characteristic function \(\phi _X := \psi \chi \), where \(\chi \) is the characteristic function of a (well-known) probability distribution \(R\) on \(\mathbb{Z }\). Suppose that \(\chi \) satisfies
as for Proposition 2.1, and that \(\psi \) can be approximated by a polynomial expansion around \(\theta =0\) of the form
for real coefficients \(\tilde{a}_l\) (and with \(\tilde{a}_0=1\)) and some \(r\in \mathbb{N }_0\), in that
for some \(0<\delta \le 1\). In view of Proposition 2.1, this suggests that the distribution of \(X\) may be well approximated by the signed measure \(\nu _r = \nu _r(R;\tilde{a}_1,\ldots ,\tilde{a}_r)\) having \({\tilde{\psi }}_r\chi \) as characteristic function. Now \(\nu _r\) can immediately be identified as
where the differences \(D^l R\) of the probability measure \(R\) are determined by iterating the relation \(DR\{j\} := R\{j\} - R\{j-1\}\). Hence, under these assumptions, Proposition 2.1 implies the following theorem; note that the assumption (2.5) is much like supposing that \(\psi \) has a Taylor expansion of length \(r\) around zero (in powers of \(i\theta \)), and hence that \(X\) has a corresponding number of finite moments.
Theorem 2.2
Let \(X\) be a random variable on \(\mathbb{Z }\) with distribution \(P_X\). Suppose that its characteristic function \(\phi _X\) is of the form \(\psi \chi \), where \(\chi \) is the characteristic function of a probability distribution \(R\) and satisfies (2.3) above. Suppose also that (2.5) is satisfied, for some \(r\in \mathbb{N }_0, \,\tilde{a}_1,\ldots ,\tilde{a}_r \in \mathbb{R }\) and \(\delta \ge 0\). Then, writing \(t=r+\delta \), we have

1. \(d_{\mathrm{loc}}(P_X,\nu _r) \ \le \ \alpha _{1t} K_{r\delta }(\rho \vee 1)^{-(t+1)/2}\);

2. \(d_K(P_X,\nu _r) \ \le \ \alpha _{2t} K_{r\delta } (\rho \vee 1)^{-t/2}\),
with \(\alpha _{1t}\) and \(\alpha _{2t}\) as in Proposition 2.1, and with \(\nu _r \ =\ \nu _r(R;\tilde{a}_1,\ldots ,\tilde{a}_r)\) as defined in (2.6).
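The signed measure \(\nu _r\) can be assembled mechanically from differences of \(R\). The sketch below assumes that \({\tilde{\psi }}_r(\theta ) = \sum _{l=0}^r \tilde{a}_l(e^{i\theta }-1)^l\) with \(\tilde{a}_0=1\); since \(DR\) has characteristic function \((1-e^{i\theta })\chi (\theta )\), this corresponds to \(\nu _r = \sum _{l=0}^r (-1)^l\tilde{a}_l D^lR\), which is a derivation made for this example rather than a quotation of (2.6). The base measure, the coefficients and the function names are hypothetical.

```python
import cmath


def difference(measure):
    """Backward difference D: (D m){j} = m{j} - m{j-1}."""
    out = {}
    for j, mass in measure.items():
        out[j] = out.get(j, 0.0) + mass          # contribution of m{j} to (D m){j}
        out[j + 1] = out.get(j + 1, 0.0) - mass  # contribution of -m{j} to (D m){j+1}
    return out


def nu_r(base, coeffs):
    """Signed measure sum_{l=0}^{r} (-1)^l a_l D^l R, with a_0 = 1 and coeffs = [a_1, ..., a_r]."""
    result = dict(base)
    diff = dict(base)
    sign = 1.0
    for a in coeffs:
        diff = difference(diff)
        sign = -sign
        for j, mass in diff.items():
            result[j] = result.get(j, 0.0) + sign * a * mass
    return result


def char_fn(measure, theta):
    """Characteristic function sum_j e^{i j theta} m{j}."""
    return sum(mass * cmath.exp(1j * j * theta) for j, mass in measure.items())


base = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical distribution R
coeffs = [0.7, -0.4]             # hypothetical a_1, a_2
nu = nu_r(base, coeffs)

# Check that nu has characteristic function (1 + a_1 w + a_2 w^2) * chi, w = e^{i theta} - 1.
theta = 0.9
w = cmath.exp(1j * theta) - 1.0
lhs = char_fn(nu, theta)
rhs = (1.0 + coeffs[0] * w + coeffs[1] * w * w) * char_fn(base, theta)
```

Every difference \(D^lR\) with \(l\ge 1\) has total mass zero, so \(\nu _r\) keeps total mass one even though individual point masses may be negative.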
Remark
Note that Proposition 2.1 can be applied with \(\psi _\mu = 0\), corresponding to \(\mu \) the zero measure, and \(\psi _\nu (\theta ) = \tilde{a}_l(e^{i\theta }-1)^l\), for any \(1\le l\le r\), showing that the contribution from the \(l\)th term in the expansion to \(\nu _r\{j\}\) is at most \(|\tilde{a}_l|\alpha _{1l}(\rho \vee 1)^{-(l+1)/2}\), and that to \(\nu _r\{[a,b]\}\) at most \(|\tilde{a}_l|\alpha _{2l}(\rho \vee 1)^{-l/2}\). Thus, if \(\rho \) is large and the coefficients \(\tilde{a}_l\) moderate, the contributions decrease in powers of \(\rho ^{-1/2}\) as \(l\) increases. In such circumstances, the signed measure \(\nu _r\) can be seen as a perturbation of the underlying distribution \(R\).
The simplest application of the above results arises when \(\phi _X = \phi _Yp_{\lambda }\), where \(p_{\lambda }(\theta ) = e^{\lambda (e^{i\theta }-1)}\) is the characteristic function of the Poisson distribution \(\mathrm{Po\,}(\lambda )\) with mean \(\lambda \), which satisfies (2.3) with \(\rho = 2\pi ^{-2}\lambda \), and \(\phi _Y\) is the characteristic function associated with a random variable \(Y\) on the integers. In this case, \(X=Z+Y\) is the sum of two independent random variables, as in (1.1), with \(Z\sim \mathrm{Po\,}(\lambda )\), and the situation is probabilistically very clear. For \(w = w_\theta = e^{i\theta }-1\), we have \(\phi _Y(\theta ) = \mathbb{E }\{(1+w)^Y\}\). The latter expression has an expansion in powers of \(w\) up to the term in \(w^r\) if the \(r\)th moment of \(Y\) exists, with coefficients \(\tilde{a}_k := F_k(Y)/k!, \,1\le k\le r\), where \(F_k(Y)\) denotes the \(k\)th factorial moment of \(Y\):
Thus the asymptotic expansion of \(X\) around \(\mathrm{Po\,}(\lambda )\) is simply derived from the factorial moments of the perturbing random variable \(Y\), if they exist.
For example, we could take \(\phi _Y\) to be the characteristic function of a random variable \(Y_s\) with distribution
for some integer \(s\ge 1\); the random variable has only \(s-1\) moments, and takes negative values, so that the theorems in [5] cannot be applied. However, \(Y_s\) has factorial moments
and characteristic function
and (2.5) holds for \({\tilde{\psi }}_r\) as in (2.4), with \(r=s-1\) and any \(\delta < 1\), for \(\tilde{a}_k = F_k(Y)/k! = (-1)^k s/(s-k)\). Hence, if \(X = Z + Y_s\), where \(Z\sim \mathrm{Po\,}(\lambda )\) is independent of \(Y_s\), then Theorem 2.2 can be applied, approximating the distribution of \(X\) by the signed measure \(\nu _{s-1}(\mathrm{Po\,}(\lambda );\tilde{a}_1,\ldots ,\tilde{a}_{s-1})\).
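To see the rate in Theorem 2.2 numerically, one can take a much simpler perturbation than \(Y_s\). The sketch below uses a hypothetical \(Y\) uniform on \(\{0,1,2\}\), for which \(\mathbb{E }\{(1+w)^Y\} = 1 + w + w^2/3\), and compares the law of \(X=Z+Y\) with the first-order measure \(\nu _1 = \mathrm{Po\,}(\lambda ) - \tilde{a}_1 D\,\mathrm{Po\,}(\lambda )\), \(\tilde{a}_1 = F_1(Y) = 1\). Under these assumptions the point-metric error should scale like \(\lambda ^{-3/2}\); all function names and the values of \(\lambda \) are choices made for the illustration.

```python
import math


def poisson_pmf(lam, j):
    """Po(lam){j}, computed through logarithms to stay stable for large lam."""
    if j < 0:
        return 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def law_of_sum(lam, y_law, j):
    """P(Z + Y = j) for independent Z ~ Po(lam) and Y with law y_law."""
    return sum(p * poisson_pmf(lam, j - y) for y, p in y_law.items())


def nu_1(lam, a1, j):
    """nu_1{j} = Po{j} - a1 * (Po{j} - Po{j-1})."""
    return poisson_pmf(lam, j) - a1 * (poisson_pmf(lam, j) - poisson_pmf(lam, j - 1))


def local_error(lam, y_law):
    """Point-metric distance between the law of Z + Y and nu_1, over the bulk of the support."""
    a1 = sum(y * p for y, p in y_law.items())  # first factorial moment F_1(Y)
    top = int(lam + 12 * math.sqrt(lam)) + 10
    return max(abs(law_of_sum(lam, y_law, j) - nu_1(lam, a1, j)) for j in range(top))


y_law = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # hypothetical perturbation Y
# Error multiplied by lam^{3/2}; roughly constant if the rate is lam^{-3/2}.
scaled = {lam: local_error(lam, y_law) * lam ** 1.5 for lam in (50.0, 200.0, 800.0)}
```

In this toy case the remainder is exactly \(\tfrac{1}{3}D^2\mathrm{Po\,}(\lambda )\), so the rescaled errors stay close to \(1/(3\sqrt{2\pi })\approx 0.133\) across the three values of \(\lambda \).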
3 Refinements
3.1 Weaker conditions
Proposition 2.1 yields explicit bounds on \(d_{\mathrm{loc}}(\mu ,\nu )\) and \(d_{\mathrm{K}}(\mu ,\nu )\) in terms of the quantities specified in (2.1). However, for many applications, a slight weakening of its conditions is useful, in which Conditions (2.1) need not hold either exactly or for all \(\theta \), though with corresponding consequences for the bounds obtained. The bound assumed for the difference \(\psi _\mu (\theta )-\psi _\nu (\theta )\) in Proposition 2.1 is also replaced by a sum involving different powers of \(\theta \) in the following theorem. This would at first sight seem superfluous, but is nonetheless useful for asymptotics, when the coefficients of the powers may depend in different ways on the ‘large’ parameter \(\rho \).
We say that a characteristic function \(\chi \) is \((\rho ,\theta _0)\)-locally normal if
and that characteristic functions \(\phi _\mu \) and \(\phi _\nu \) are \((\varepsilon ,\eta ,\theta _0)\)-mod \(\chi \) polynomially close, for some \(\varepsilon ,\eta > 0\) and \(0 < \theta _0 \le \pi \), if \(\phi _\mu = \psi _\mu \chi \) and \(\phi _\nu = \psi _\nu \chi \), and if, for some \(M\ge 0\) and positive pairs \(\gamma _m,t_m, \,1\le m\le M\),
Note that, for practical purposes, the quantities \(\varepsilon \) and \(\eta \) should be as small as possible. Using these definitions, we can state the following theorem, whose proof follows that of Proposition 2.1 very closely, and is omitted.
Theorem 3.1
Let \(\mu \) and \(\nu \) be finite signed measures on \(\mathbb{Z }\), with characteristic functions \(\phi _\mu \) and \(\phi _\nu \) respectively. Suppose that \(\chi \) is \((\rho ,\theta _0)\)-locally normal, and that \(\phi _\mu \) and \(\phi _\nu \) are \((\varepsilon ,\eta ,\theta _0)\)-mod \(\chi \) polynomially close. Then, with \(\alpha _{lt}\) as for Proposition 2.1, and for any \(a_0 < b_0 \in \mathbb{Z }\), we have
where
and \(\gamma _1,\ldots ,\gamma _M\) are as in (3.2).
The first conclusion yields a bound on \(d_{\mathrm{loc}}(\mu ,\nu )\). However, the presence of the factor \((b_0-a_0+1)\) in the second bound means that, in contrast to the situation in Proposition 2.1, a direct bound on \(d_K(\mu ,\nu )\) is not immediately visible. The following result, giving bounds on both \(d_K(\mu ,\nu )\) and \(\Vert \mu -\nu \Vert \), is however easily deduced; for a signed measure \(\mu , \,|\mu |\) as usual denotes its variation.
Theorem 3.2
With the notation and conditions of Theorem 3.1,
where
with \(\alpha _{lt}\) as for Proposition 2.1 and with \(\gamma _m\) as in (3.2). If also \(\mu \) is a probability measure and \(\nu (\mathbb{Z })=1\), then
Proof
The inequality for the total variation norm is immediate. For the Kolmogorov distance, by considering the possible positions of \(x\) in relation to \(a<b\), we have
If \(\mu \) is a probability measure and \(\nu (\mathbb{Z })=1\), we have
\(\square \)
3.2 Sharper total variation approximation
When using Theorem 3.2, it can safely be assumed that the tails of the well-known measure \(\nu \) can be suitably bounded. However, taking \(\chi \) to be the characteristic function of the Poisson distribution \(\mathrm{Po\,}(\lambda )\), for example, as in the example of Sect. 2, the measure of the tail set \([a,b]^c\) cannot be small unless \(b-a\) is large in comparison to \(\sqrt{\lambda }\); in an asymptotic sense, as \(\lambda \rightarrow \infty \) and since \(\lambda \asymp \rho \), one would need at least \(\rho ^{-1/2}(b-a) \rightarrow \infty \). As a result, the quantity \(\varepsilon ^{(1)}_{ab}\) appearing in the bound on the total variation distance would necessarily be of larger asymptotic order than \(\sum _{m=1}^M\gamma _m \alpha _{2t_m} \rho ^{-t_m/2}\), which, in view of the bound on \(d_K\), would nonetheless seem to be the ‘natural’ order of approximation. Under somewhat stronger conditions than those of Theorem 3.1, a total variation bound of this order can be deduced (at least, if the quantities \(\varepsilon \) and \(\eta \) are also suitably small); the argument is reminiscent of that in [8].
We say that a characteristic function \(\chi \) is \((\rho ,\gamma ^{\prime },\theta _0)\)-smoothly locally normal if \(\chi (\theta ) := e^{i\zeta \theta - u(\theta )}\) for some \(\zeta = \zeta _\chi \in \mathbb{R }\), and for some twice differentiable function \(u\) such that \(u(0)=u^{\prime }(0)=0\), and if
Taking \(\chi = p_\lambda \) to be the characteristic function of the Poisson distribution \(\mathrm{Po\,}(\lambda )\), for example, we can set \(\zeta _\chi =\lambda \) and \(u(\theta ) = \lambda (1-e^{i\theta } + i\theta )\), showing that \(p_\lambda \) is \((\rho ,\gamma ^{\prime },\pi )\)-smoothly locally normal with \(\rho = 2\lambda /\pi ^2\) and \(\gamma ^{\prime } = \pi ^2/2\).
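The Poisson case can be checked numerically. The sketch below verifies, on a grid of \(\theta \in [-\pi ,\pi ]\), the factorisation \(p_\lambda (\theta ) = e^{i\lambda \theta - u(\theta )}\) with \(u(\theta ) = \lambda (1-e^{i\theta }+i\theta )\), and the Gaussian envelope \(|p_\lambda (\theta )| \le e^{-\rho \theta ^2}\) with \(\rho = 2\lambda /\pi ^2\). That the envelope is the relevant part of the local normality condition is an assumption of this example, as are the value of \(\lambda \) and the grid; the condition involving \(\gamma ^{\prime }\) is not tested.

```python
import cmath
import math


def poisson_cf(lam, theta):
    """Characteristic function of Po(lam): exp(lam * (e^{i theta} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * theta) - 1.0))


def u(lam, theta):
    """u(theta) = lam * (1 - e^{i theta} + i theta), so that u(0) = u'(0) = 0."""
    return lam * (1.0 - cmath.exp(1j * theta) + 1j * theta)


lam = 3.5  # hypothetical Poisson mean
rho = 2.0 * lam / math.pi ** 2
grid = [-math.pi + k * (2.0 * math.pi / 2000) for k in range(2001)]

# Largest discrepancy between p_lam and exp(i*lam*theta - u(theta)) on the grid.
factorisation_gap = max(
    abs(poisson_cf(lam, th) - cmath.exp(1j * lam * th - u(lam, th))) for th in grid
)

# Smallest value of exp(-rho*theta^2) - |p_lam(theta)|; non-negative if the envelope holds.
envelope_slack = min(
    math.exp(-rho * th * th) - abs(poisson_cf(lam, th)) for th in grid
)
```

The envelope follows from \(|p_\lambda (\theta )| = e^{-2\lambda \sin ^2(\theta /2)}\) and \(\sin (\theta /2)\ge \theta /\pi \) on \([0,\pi ]\), with equality at \(\theta =\pm \pi \), so the slack is zero up to rounding at the endpoints of the grid.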
For any \(\varepsilon ,\eta > 0\) and \(0 < \theta _0 \le \pi \), we then say that characteristic functions \(\phi _\mu \) and \(\phi _\nu \) are \((\varepsilon ,\eta ,\theta _0)\)-smoothly mod \(\chi \) polynomially close if \(\phi _\mu = \psi _\mu \chi \) and \(\phi _\nu = \psi _\nu \chi \), and if, for some \(M\ge 0\) and positive pairs \(\gamma _m,t_m, \,1\le m\le M\), there is a twice differentiable function \({\tilde{d}}_{\mu \nu }\) defined on \(|\theta | \le \theta _0\), for some \(0 < \theta _0 \le \pi /4\), such that \({\tilde{d}}_{\mu \nu }(0) = {\tilde{d}}_{\mu \nu }^{\prime }(0) = 0\) and
Again, the smaller \(\varepsilon \) and \(\eta \), the better the bounds to be obtained.
Theorem 3.3
Let \(\mu \) and \(\nu \) be finite signed measures on \(\mathbb{Z }\), with characteristic functions \(\phi _\mu \) and \(\phi _\nu \) respectively. Suppose that \(\chi \) is \((\rho ,\gamma ^{\prime },\theta _0)\)-smoothly locally normal, and that \(\phi _\mu \) and \(\phi _\nu \) are \((\varepsilon ,\eta ,\theta _0)\)-smoothly mod \(\chi \) polynomially close. Assume also that \(\rho \ge 1\) and that \(\rho \theta _0^2 \ge \log \rho \). Then there is a function \(\alpha ^{\prime }:=\alpha ^{\prime }(t,\gamma )\) such that
where \(\gamma _m\) and \(t_m\) are as in (3.5) and \(\gamma ^{\prime }\) is as in (3.4). If \(\mu \) is a probability measure and \(\nu (\mathbb{Z }) = 1\), then
If (3.5) and (3.6) hold with \(\varepsilon =0\) for all \(0\le \theta \le \pi \), then there is a function \(\alpha ^*:=\alpha ^*(t,\gamma )\) such that
Writing \(H := {\sum _{m=1}^M}\gamma _m \rho ^{-(t_m+2)/2} + \max \{\varepsilon ,\eta \}\), it is clearly enough to show that, for any \(j \in (\lfloor \zeta _\chi \rfloor - \rho ,\lfloor \zeta _\chi \rfloor + \rho )\),
for some constant \(K\), giving a total contribution to the bound from such \(j\) of order \(O(\rho H)\). In view of (3.6) and (3.7), the main effort is to bound \(\int _{-\theta _0}^{\theta _0} e^{-\rho \theta ^2}{\tilde{d}}_{\mu \nu }(\theta )\,d\theta \); however, using (3.5) directly gives a bound of order \(O(\rho ^{1/2}H)\), which is too large. To get round this, for \(|j-\zeta _\chi |\) bigger than \(\rho ^{1/2}\), we write \(e^{-ij\theta }(\phi _\mu (\theta ) - \phi _\nu (\theta )) = e^{i(\zeta _\chi -j)\theta -u(\theta )}(\psi _\mu (\theta ) - \psi _\nu (\theta ))\), and integrate (3.8) twice by parts, to get a factor of \((j-\zeta _\chi )^2\) in the denominator. To make this argument work, we need to continue the function \({\tilde{w}}(\theta ) := e^{-u(\theta )}{\tilde{d}}_{\mu \nu }(\theta )\) into \(\theta _0 < |\theta | \le \pi \) in suitable fashion. For this, we use the following technical lemma, whose proof is given in the Appendix.
Lemma 3.4
Let \(w:(-\infty ,0] \rightarrow \mathbb{R }\) be such that \(w(0)=a\) and \(w^{\prime }(0) = b\). Then \(w\) can be continued differentiably on \([0,\infty )\) by a piecewise quadratic function such that \(|w^{\prime \prime }(x)| \le c\) for all \(x > 0\) for which \(w^{\prime \prime }(x)\) is defined, and such that \(w(x)=0\) for all
furthermore, \(\max _{x\ge 0} |w(x)| \le |a| + b^2/2c\).
We then write
where \(d_{\mu \nu }:= \psi _\mu -\psi _\nu \), and, for each \(j\), bound the two parts of the final expression separately.
Proof of Theorem 3.3
(i). For the first step, we use Lemma 3.4 to continue the real and imaginary parts of \({\tilde{w}}(\theta )\) into \(\theta _0 \le |\theta | \le \pi \), in such a way that \({\tilde{w}}\) is piecewise twice differentiable on \([-\pi ,\pi ]\) and satisfies
with the second derivatives of the real and imaginary parts suitably bounded. Since
it follows from (3.4) and (3.5) that
where
and \(\kappa _1(t,\gamma ) := (t+\gamma )/\{t(t-1)\}\). Hence we can continue \({\tilde{w}}\) in \(\theta _0 \le \theta \le \pi \) by a sum of functions \({\sum _{m=1}^M}{\tilde{w}}_m\), where \(|{\tilde{w}}_m(\theta _0)| \le a_m\) and \(|{\tilde{w}}^{\prime }_m(\theta _0)| \le b_m\) for each \(m\), and these bounds at \(\theta _0\) hold also for the real and imaginary parts \({\tilde{w}}_{mr}\) and \({\tilde{w}}_{mi}\) of \({\tilde{w}}_m\). Define \({\tilde{w}}_{mr}\) and \({\tilde{w}}_{mi}\) in \(\theta _0 \le \theta \le \pi \) using Lemma 3.4, in each case restricting their second derivatives by taking
Then it follows from the lemma that the length of the \(\theta \)-interval beyond \(\theta _0\) on which \({\tilde{w}}_m\) is not identically zero is bounded by
from (3.11) and (3.12), the bound being the same for all \(m\); note that
since \(\theta _0 \le \pi /4\) and \(\rho \theta _0^2 \ge 1\). From this and (3.14), and from the analogous continuation in \(-\pi \le \theta \le -\theta _0\), it follows also that
and, using (3.11), (3.12), (3.13) and Lemma 3.4, that
(ii). The next step is to bound the first part of the integral in (3.9). Here, by (3.6) and (3.7), we have \(|e^{-u(\theta )}[d_{\mu \nu }(\theta ) - {\tilde{d}}_{\mu \nu }(\theta )]| \ \le \ \varepsilon \) in \(|\theta | \le \theta _0\), whereas, in \(\theta _0 < |\theta | \le \pi \), it is bounded by \(\eta + |{\tilde{w}}(\theta )|\). Hence, for any \(j\), we use (3.17) to give
Noting also that, if \(\rho \theta ^2 \ge \log \rho \ge 0, \,\theta > 0\) and \(t\ge 2\), then
for \(k_2(t) = \{(t-1)/e\}^{(t-1)/2}\), it follows that
This bounds the first element of (3.9) as \(O(H)\) for all \(j\).
(iii). For the second part of (3.9), we begin by considering values of \(j\) such that \(|j-\zeta _\chi | < 1 + \lceil \sqrt{\rho }\rceil \). Here, we write
Since, by (3.5),
the first integral is bounded, as in the proof of Proposition 2.1, by
and the second is bounded, as above, by
giving the bound
since also \(\rho \ge 1\). The bound is of order \(O(\rho ^{1/2}H)\), but there are at most \(4\,+\,2\sqrt{\rho }\le 6\sqrt{\rho }\) integers \(j\) satisfying \(|j-\zeta _\chi | < 1 + \lceil \sqrt{\rho }\rceil \), so that their sum is of order \(O(\rho H)\), which is as required.
(iv). For \(|j-\zeta _\chi | \ge 1 + \lceil \sqrt{\rho }\rceil \), integrating twice by parts and using (3.10), it follows that
where
in \(|\theta | \le \theta _0\). Hence, using (3.5), (3.20) and the fact that, from (3.4), \(|u^{\prime }(\theta )| \le \gamma ^{\prime }\rho |\theta |\) in \(|\theta |\le \theta _0\), the part of the integral in (3.23) for this range of \(\theta \) can be bounded by
after some calculation, where, with \(m_t\) as in Proposition 2.1,
The remaining part of the integral in (3.23), for \(\theta _0 < |\theta | \le \pi \), yields an additional element of
from (3.16), with \(k_3(t) := \{(t+1)/2e\}^{(t+1)/2}\). As a result, we find that, for \(|j-\zeta _\chi | \ge 1 + \lceil \sqrt{\rho }\rceil \), the second part of (3.9) can be bounded by
and adding over \(|j-\zeta _\chi | \ge 1 + \lceil \sqrt{\rho }\rceil \) gives a contribution of order \(O(\rho H)\).
(v). The final step is to make the arbitrary choice \(s = \rho \) in the bound
and to note that, if \(\mu \) is a probability measure and \(\nu (\mathbb{Z }) = 1\), then
(vi). If (3.5) and (3.6) hold with \(\varepsilon =0\) for all \(0\le \theta \le \pi \) (implying, in particular, that \(\eta \) is irrelevant), the proof simplifies dramatically. The considerations concerning \(\theta _0 < |\theta | \le \pi \) become unnecessary. This leaves the bound
for \(|j-\zeta _\chi | < 1 + \lceil \sqrt{\rho }\rceil \), where \({\tilde{w}}(\theta ) = e^{-u(\theta )}d_{\mu \nu }(\theta )\). Then, since \(e^{-i(j-\zeta _\chi )\theta } {\tilde{w}}(\theta )\) is a \(2\pi \)-periodic function, the integration by parts in (3.23) remains true, giving the bound
for \(|j-\zeta _\chi | \ge 1 + \lceil \sqrt{\rho }\rceil \). Adding over all \(j\) gives the final bound, with \(\alpha ^*(t,\gamma ^{\prime }) := 2\beta ^{\prime }(t,\gamma ^{\prime }) + {6\alpha _{1t}}/\{t(t-1)\}\). \(\square \)
In certain applications, the difference \(d_{\mu \nu }\) is expressed in the form \(d_{\mu \nu }(\theta ) = {\hat{d}}_{\mu \nu }(e^{i\theta }-1)\). If it is true that \({\hat{d}}_{\mu \nu }(0) = {\hat{d}}_{\mu \nu }^{\prime }(0) = 0\) and \(|{\hat{d}}_{\mu \nu }^{\prime \prime }(w)| \le {\hat{\gamma }}|w|^{t-2}\) for complex \(w\) such that \(|w| \le \theta _0\), then it follows that \(d_{\mu \nu }(0) = d_{\mu \nu }^{\prime }(0) = 0\) and that
4 Approximating probability distributions
4.1 The general case
The most common application of the general bounds is when \(\mu \) is a probability distribution which is close to a member \(R_\lambda \) of a family \(\{R_\lambda ,\,\lambda >0\}\) of probability distributions on the integers, and one is interested in bounds when \(\lambda \) is large. Suppose, in particular, that the characteristic function \(r_\lambda \) of \(R_\lambda \) is \((\rho ,\gamma ^{\prime },\pi )\)-smoothly locally normal, and that \(\phi _\mu = \psi r_\lambda \), where \(\psi \) has a polynomial approximation \({\tilde{\psi }}_r\) as given in (2.4), for some \(r \in \mathbb{N }\) and \(\tilde{a}_1,\ldots ,\tilde{a}_r \in \mathbb{R }\). This indicates that \(\mu \) may be close to \(\nu = \nu _r(R_\lambda ;\tilde{a}_1,\ldots ,\tilde{a}_r)\) given in (2.6). The following corollary, in which we use a more probabilistic notation for \(\mu \), establishes the corresponding results.
Corollary 4.1
Let \(X\) be an integer valued random variable with distribution \(P_X\) and characteristic function \(\phi _X := \psi r_\lambda \), where \(r_\lambda \) is a \((\rho ,\gamma ^{\prime },\theta _0)\)-smoothly locally normal characteristic function and \(\rho \ge 1\). Let \({\tilde{\psi }}_r\) be as in (2.4). Then, if \(\phi _X\) and \({\tilde{\psi }}_r r_\lambda \) are \((\varepsilon ,\eta ,\theta _0)\)-mod \(r_\lambda \) polynomially close, it follows that

1. \(d_{\mathrm{loc}}(P_X,\nu _r) \ \le \ {\sum _{m=1}^M}\gamma _m \alpha _{1t_m} \rho ^{-(t_m+1)/2} + \tilde{\alpha }_1 \varepsilon + \tilde{\alpha }_2 \eta ;\)

2. \(d_K(P_X,\nu _r) \ \le \ 2\inf _{a\le b}\left( \varepsilon ^{\mathrm{(K)}}_{ab} + |\nu _r|\{[a,b]^c\}\right) \!;\)

3. \(\Vert P_X - \nu _r\Vert \ \le \ \inf _{a\le b}\left( \varepsilon ^{(1)}_{ab} + \varepsilon ^{\mathrm{(K)}}_{ab} + 2|\nu _r|\{[a,b]^c\}\right) \!,\)
where the quantities appearing in the bounds are as in Theorem 3.2, and with \(\nu _r \ =\ \nu _r(R_\lambda ;\tilde{a}_1,\ldots ,\tilde{a}_r)\) as defined in (2.6). Furthermore, if \(\phi _X\) and \({\tilde{\psi }}_r r_\lambda \) are \((\varepsilon ,\eta ,\theta _0)\)-smoothly mod \(r_\lambda \) polynomially close, then

4. $$\begin{aligned}&\Vert P_X - \nu _r\Vert \le 2{\sum _{m=1}^M}\gamma _m \alpha ^{\prime }(t_m,\gamma ^{\prime }) \rho ^{-t_m/2} + 6\rho \max \{\varepsilon ,\eta \}\\&\qquad +\, 2|\nu _r|\{(\lfloor \zeta _{r_\lambda }\rfloor -\rho ,\lfloor \zeta _{r_\lambda }\rfloor + \rho )^c\}, \end{aligned}$$
and, if (3.5) and (3.6) hold with \(\varepsilon =0\) for all \(0\le \theta \le \pi \), then

5. \( \Vert P_X - \nu _r\Vert \ \le \ {\sum _{m=1}^M}\gamma _m \alpha ^*(t_m,\gamma ^{\prime }) \rho ^{-t_m/2}\).
Remark
Taking \(\psi _\mu = 0\) and \(\psi _\nu = (e^{i\theta }-1)^l\) in Theorem 3.3 for \(l\ge 2\) gives \(|d^{\prime \prime }_{\mu \nu }(\theta )| \le l(l+1)|\theta |^{l-2}\) for all \(\theta \) and \(d^{\prime }_{\mu \nu }(0)=0\), where \(d_{\mu \nu }(\theta ) = \psi _\mu (\theta ) - \psi _\nu (\theta )\). Hence, by the final part of the theorem, the contribution from the \(l\)th term in the signed measure \(\nu _r\) of (2.6) has total variation norm at most \(\alpha ^*(l,\gamma ^{\prime }) l(l+1) |\tilde{a}_l| \rho ^{-l/2}\), for \(2\le l\le r\).
4.2 Probability distributions as approximations
The use of signed measures to approximate probability distributions is convenient, but not very natural. However, the signed measures \(\nu _1(R_\lambda ;\tilde{a}_1)\) and \(\nu _2(R_\lambda ;\tilde{a}_1,\tilde{a}_2)\) can often be replaced by suitably translated members of the family \(\{R_\lambda ,\,\lambda > 0\}\), with the same asymptotic rate of approximation, by fitting the first two moments, a procedure analogous to that used in the Berry–Esseen theorem. We accomplish this under some further mild assumptions on the distributions \(R_\lambda \).
We call the family \(\{R_\lambda ,\,\lambda >0\}\) amenable if the following three conditions are satisfied. First, the characteristic functions \(r_\lambda \) are to be \((\rho (\lambda ),\gamma ^{\prime },\pi )\)-smoothly locally normal (with the same value of \(\gamma ^{\prime }\) for all), where \(\lim _{\lambda \rightarrow \infty }\rho (\lambda ) = \infty \); secondly, if \(b_1 := b_1(\lambda ,\lambda ^{\prime })\) and \(b_2 := b_2(\lambda ,\lambda ^{\prime })\) are chosen to make the first two derivatives of the function
vanish at zero (\(w_{\lambda ,\lambda ^{\prime }}(0) = 0\) is automatic), then \(\delta _{\lambda ,\lambda ^{\prime }}(\theta ) := w_{\lambda ,\lambda ^{\prime }}(\theta )/r_\lambda (\theta )\) is to satisfy
for some continuous function \(f:\,\mathbb{R }_+ \rightarrow \mathbb{R }_+\); and thirdly, if \(Z_\lambda \sim R_\lambda \), then \(\mu (\lambda ) := \mathbb{E }Z_\lambda \) and \(\sigma ^2(\lambda ) := \mathrm{Var\,}Z_\lambda \) should exist, with \(\sigma ^2(\cdot )\) strictly increasing from zero to infinity, and the functions \(\mu (\cdot ), \,\sigma ^2(\cdot )\) and \((\sigma ^2)^{-1}(\cdot )\) are all to be uniformly continuous.
The quantities \(b_1\) and \(b_2\), as defined in (4.1), can be explicitly expressed:
and it follows from (4.2) that
Note that the Poisson family \(\{\mathrm{Po\,}(\lambda ),\,\lambda >0\}\) is amenable.
Now the signed measures \(\nu _r, \,r\ge 2\), have mean and variance given by
and the corresponding equations for \(\nu _1\) just have \(\tilde{a}_2=0\). However, when choosing a translation of \(R_\lambda \) to match these moments, only integer translations \(m\) of \(R_\lambda \) can be allowed, since the distributions must remain on the integers, and so it is not possible to match both moments exactly within the family. To circumvent this, we extend to approximation by a member of the family of probability distributions \(Q_{mp}(R_{\lambda ^{\prime }})\), for \(\lambda ^{\prime }>0, \,m\in \mathbb{Z }\) and \(0\le p < 1\), where
If \(Z \sim R_{\lambda ^{\prime }}\), then \(Q_{mp}(R_{\lambda ^{\prime }})\) is the distribution of \(Z+m+I\), where \(I \sim \mathrm{Be\,}(p)\) is independent of \(Z\). \(Q_{mp}(R_{\lambda ^{\prime }})\) has characteristic function \(q_{mp}(R_{\lambda ^{\prime }})\) given by
similar to the measure \(\nu _2\{R_{\lambda ^{\prime }};m+p,{m\atopwithdelims ()2}+mp\}\), but with terms of higher order as powers of \((e^{i\theta }-1)\) as well.
Among the distributions \(\{Q_{mp}(R_{\lambda ^{\prime }});\, \lambda ^{\prime } > 0, m\in \mathbb{Z }, 0 \le p < 1\}\), we can always find one having a given mean \(\mu _*\) and variance \(\sigma ^2_*\), provided that \(\{R_\lambda ,\,\lambda >0\}\) is amenable and that \(\sigma ^2_* \ge 1/4\), by solving the equations
To do so, let \(\lambda _p\) solve \(\sigma ^2(\lambda _p) = \sigma ^2_* - p(1-p)\), possible for \(0\le p \le 1\), since \(\sigma ^2_* \ge 1/4\) and the function \(\sigma ^2\) has an inverse; note also that \(\lambda _0 = \lambda _1\). Define \(m_p := \mu _* - \mu (\lambda _p) - p\), continuous under the assumptions on \(\sigma ^2\), and observe that \(m_0 = m_1 + 1\). Hence the value \(m = \lfloor m_0 \rfloor \) is realized in the form \(m_p\) for some \(0 \le p < 1\), and then \(\lambda _p, \,m_p\) and \(p\) satisfy (4.8). In the Poisson case, for instance, this gives
where \(\langle x\rangle \) denotes the fractional part of \(x\).
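As a numerical illustration of the matching procedure just described, the following Python sketch carries it out for the Poisson family, where \(\mu (\lambda ) = \sigma ^2(\lambda ) = \lambda \), so that \(m_p = \mu _* - \sigma ^2_* - p^2\). The function names are hypothetical, and the closed form used is derived here from the recipe in the text under the Poisson assumption; it is a sketch, not a transcription of the displayed formula.

```python
import math

def match_translated_poisson(mu_star, var_star):
    """Return (lam, m, p) such that Z + m + I, with Z ~ Po(lam) and an
    independent I ~ Be(p), has mean mu_star and variance var_star."""
    if var_star < 0.25:
        raise ValueError("the recipe needs var_star >= 1/4")
    m0 = mu_star - var_star            # value of m_p at p = 0; m_p = m0 - p**2
    m = math.floor(m0)                 # integer translation
    p = math.sqrt(m0 - m)              # solves m0 - p**2 = m with 0 <= p < 1
    lam = var_star - p * (1.0 - p)     # Poisson: sigma^2(lam) = lam
    return lam, m, p

def mean_and_variance(lam, m, p):
    """Mean and variance of Z + m + I for the parameters above."""
    return lam + m + p, lam + p * (1.0 - p)

lam, m, p = match_translated_poisson(10.3, 7.8)
mean, var = mean_and_variance(lam, m, p)
print(lam, m, p, mean, var)
```

The Bernoulli summand absorbs the non-integer part of the required translation, which is why both moments can be matched exactly even though only integer shifts of \(R_{\lambda ^{\prime }}\) are allowed.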
Suppose now that we have an approximation of a distribution \(P_X\) by some measure \(\nu _r(R_\lambda ;\tilde{a}_1,\ldots ,\tilde{a}_r)\), for \(r\ge 2\), and with \(\rho (\lambda ) \ge 1\). We wish to show that \(Q_{mp}(R_{\lambda ^{\prime }})\) and \(\nu _r = \nu _r(R_\lambda ;\tilde{a}_1,\ldots ,\tilde{a}_r)\) are close to order \(O\{\rho (\lambda )^{-3/2}\}\), if \(\lambda ^{\prime },m\) and \(p\) are suitably chosen. Matching the first two moments, the choices of \(\lambda ^{\prime },m\) and \(p\) in (4.8) when \(\mu _*\) and \(\sigma ^2_*\) are given by (4.5) are such as to give
implying that, for \(b_1 := b_1(\lambda ,\lambda ^{\prime })\) and \(b_2 := b_2(\lambda ,\lambda ^{\prime })\) given in (4.3),
note also that \(m, \,\lambda ^{\prime }-\lambda , \,\mu (\lambda ^{\prime }) - \mu (\lambda )\) and \(\sigma ^2(\lambda ^{\prime }) - \sigma ^2(\lambda )\) are uniformly bounded for \((\tilde{a}_1,\tilde{a}_2)\) in any compact set. Now, from the definition of \(\delta _{\lambda ,\lambda ^{\prime }}\) and from (4.7), \(q_{mp}(R_{\lambda ^{\prime }})(\theta )\) can be written as \(r_\lambda (\theta )\psi _{\lambda ,\lambda ^{\prime }}(\theta )\), with
However, in view of (4.10),
is a polynomial in \(w\) that begins with the \(w^3\)-term, so that
satisfies
where \({\hat{\gamma }}= {\hat{\gamma }}(\tilde{a}_1,\ldots ,\tilde{a}_r)\) remains bounded if \(\tilde{a}_1,\ldots ,\tilde{a}_r\) do. In view of (4.4) and (4.11)–(4.13), \(q_{mp}\) and \({\tilde{\psi }}_r r_\lambda \) are \((0,0,\pi )\)-smoothly mod \(r_\lambda \) polynomially close, with \(M=1\) and \(t_1=3\), for a constant \(\gamma _1 = \gamma _1(\tilde{a}_1,\ldots ,\tilde{a}_r)\), whose definition depends on the family \(R_\lambda \). In view of Corollary 4.1(5), this proves the following result.
Proposition 4.2
Suppose that the family \(\{R_\lambda ,\,\lambda >0\}\) is amenable, and that \(\lambda ^{\prime }, \,m\) and \(p\) are chosen to satisfy (4.8) for \(\mu _*\) and \(\sigma ^2_*\) given by (4.5). Then
Thus the signed measure \(\nu _r(R_\lambda ;\tilde{a}_1,\ldots ,\tilde{a}_r)\) can be replaced as approximation by the probability distribution \(Q_{mp}(R_{\lambda ^{\prime }})\) with an additional error in total variation of order at most \(O(\rho (\lambda )^{-3/2})\).
Suppose that, instead of having a bound on \(d_{\mu \nu }:= \psi - {\tilde{\psi }}_r\), we are given an approximation to \(\psi \) by a Taylor expansion \(\psi _r(\theta ) := \sum _{l=0}^ra_l (i\theta )^l\) around \(\theta =0\), for real coefficients \(a_l\) (and with \(a_0=1\)) and some \(r\in \mathbb{N }_0\). Then, equating coefficients of \(i\theta \), it follows that
for \(U_r := U_r(a_1,\ldots ,a_r)\), if \(\tilde{a}_1,\ldots ,\tilde{a}_r\) are defined implicitly by
where \(S_m := \{(s_1,\ldots ,s_l):\,\sum _{t=1}^l s_t = m\}\). Hence we can replace any bound on the difference \(\psi - \psi _r\) by a corresponding bound on \(d_{\mu \nu }\) in the assumptions of the theorems, in which the original bound is increased by \(U_{r} |\theta |^{r+1}\). This will typically not change the order of the approximation obtained.
Sometimes it is convenient, for simplicity, to use parameters in the expansions that are not those emerging naturally from the proofs. Under the conditions on the family \(\{R_\lambda ,\,\lambda >0\}\) imposed in this section, this is easy to accommodate. For instance, suppose that, for \(|\theta | \le \pi \),
with \(A(\theta ) := 1 + \sum _{l=1}^ra_l (e^{i\theta }-1)^l, \,A^{\prime }(\theta ) := 1 + \sum _{l=1}^ra^{\prime }_l (e^{i\theta }-1)^l\) and with \(\lambda > \lambda ^{\prime }\). Then \(d_{\mu \nu }^{(1)}:= A - A^{\prime }\) satisfies
enabling \(\phi _\mu \) to be replaced by \(\phi _{\nu ^{(1)}}\) in exchange for an error that can be bounded using Corollary 4.1. Similarly, setting \(d_{\mu \nu }^{(2)}:= A(1 - r_{\lambda ^{\prime }}/r_\lambda )\), we have
in view of (4.2), where \({\tilde{f}}:\,\mathbb{R }_+\rightarrow \mathbb{R }_+\) is continuous.
5 Poisson–Charlier expansions
As observed above, the Poisson family satisfies all the requirements placed on the family \(\{R_\lambda ,\,\lambda >0\}\) in the previous section, so all the results of that section can be carried across. In this case, the signed measures \(\nu _r\) on \(\mathbb{N }_0\) have the explicit representation
where
denotes the \(l\)th Charlier polynomial [2, (1.9), p. 171].
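For readers who wish to experiment numerically, here is a minimal Python sketch of the Charlier polynomials. It assumes the normalization \(C_l(j;\lambda ) = \sum _{k=0}^l (-1)^k {l\atopwithdelims ()k}{j\atopwithdelims ()k} k!\,\lambda ^{-k}\), which is one standard form of these polynomials; the function names are hypothetical. Under this assumed form, \(C_l(\cdot ;\lambda )\) has mean zero under \(\mathrm{Po\,}(\lambda )\) for each \(l\ge 1\), and the sketch checks this numerically.

```python
import math

def charlier(l, j, lam):
    """l-th Charlier polynomial at the integer j >= 0 (assumed normalization:
    sum over k of (-1)^k * C(l,k) * C(j,k) * k! / lam^k)."""
    return sum((-1) ** k * math.comb(l, k) * math.comb(j, k)
               * math.factorial(k) / lam ** k
               for k in range(l + 1))

def poisson_pmf(j, lam):
    """Po(lam){j}, computed on the log scale for numerical stability."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def poisson_mean_of_charlier(l, lam, cutoff=400):
    """Truncated sum of Po(lam){j} * C_l(j; lam) over 0 <= j < cutoff."""
    return sum(poisson_pmf(j, lam) * charlier(l, j, lam)
               for j in range(cutoff))

for l in (1, 2, 3):
    print(l, poisson_mean_of_charlier(l, 20.0))
```

The vanishing mean reflects the fact that the factorial moments of \(\mathrm{Po\,}(\lambda )\) are \(\lambda ^k\), so that the sum reduces to \(\sum _k (-1)^k{l\atopwithdelims ()k} = 0\).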
Note that, if \({j\atopwithdelims ()k}\) is replaced by \(j^k/k!\) in (5.2), one obtains the binomial expansion of \((1-j/\lambda )^l\). As this suggests, the values of \(C_l(j;\lambda )\) are in fact small for \(j\) near \(\lambda \) if \(\lambda \) is large:
[1, Lemma 6.1]. The equation (5.3) thus implies that, in any interval of the form \(|j-\lambda | \le c\sqrt{\lambda }\), which is where the probability mass of \(\mathrm{Po\,}(\lambda )\) is mostly to be found, the correction to the Poisson measure \(\mathrm{Po\,}(\lambda )\) is of uniform relative order \(O(\lambda ^{-l/2})\). Indeed, the Chernoff inequalities for \(Z \sim \mathrm{Po\,}(\lambda )\) can be expressed in the form
[3, Theorem 3.2]. Since also, from (5.2),
and since
if \(0\le k\le l\) and \(j \ge l + \lambda \), it follows that, for any \(l\ge 0\), we have
for \(m\le \lambda ,\) and, for \(l\le r\) and \(m\ge \lambda + r\),
It thus follows that
where \(\bar{A}_r := 1 + \sum _{l=1}^r2^l|\tilde{a}_l|\), demonstrating exponential concentration of measure for \(\nu _r\) on a scale of \(\sqrt{\lambda }\) around \(\lambda \). Moreover, it can be deduced from (5.3) that there exists a positive constant \(d = d(\tilde{a}_1,\ldots ,\tilde{a}_r)\) such that \(\nu _r\{j\} \ge 0\) for \(|j-\lambda | \le d\lambda \), and it follows from (5.5) that \(\nu _r\{j:\,|j-\lambda | > d\lambda \} = O(e^{-\alpha \lambda })\) for some \(\alpha > 0\). Since also \(\nu _r\{\mathbb{N }_0\} = 1\), it thus follows that, even if \(\nu _r\) is formally a signed measure, it differs from a probability only on a set of measure exponentially small with \(\lambda \).
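The following Python sketch illustrates this numerically. It builds the measure whose generating function is \(e^{\lambda (z-1)}\{1 + \sum _{l=1}^r \tilde{a}_l (z-1)^l\}\) by repeated differencing of the Poisson probabilities, since multiplying a generating function by \((z-1)\) maps the mass function \(f(j)\) to \(f(j-1)-f(j)\). The parameter values and function names are hypothetical, chosen only so that some negative mass is visible.

```python
import math

def poisson_pmf(j, lam):
    """Po(lam){j}; zero for negative j."""
    if j < 0:
        return 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def perturbed_poisson(lam, a_tilde, cutoff):
    """Masses nu{0}, ..., nu{cutoff-1} of the signed measure with generating
    function exp(lam (z-1)) * (1 + sum_l a_tilde[l-1] (z-1)^l)."""
    term = [poisson_pmf(j, lam) for j in range(cutoff)]
    nu = term[:]
    for a in a_tilde:
        # one more factor (z - 1): f(j) -> f(j-1) - f(j)
        term = [(term[j - 1] if j > 0 else 0.0) - term[j]
                for j in range(cutoff)]
        nu = [n + a * t for n, t in zip(nu, term)]
    return nu

lam, a_tilde = 50.0, [1.5, 0.3]      # illustrative values only
nu = perturbed_poisson(lam, a_tilde, cutoff=400)
total_mass = sum(nu)
mean = sum(j * x for j, x in enumerate(nu))
negative_mass = -sum(x for x in nu if x < 0)
print(total_mass, mean, negative_mass)
```

In this example the total mass is one and the mean is \(\lambda + \tilde{a}_1\), while the negative part sits in the far lower tail of \(\mathrm{Po\,}(\lambda )\) and carries very little mass.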
Since the measures \(\nu _r\) are so well concentrated, the bounds in Corollary 4.1(2–4) can be made more specific. We give as example a theorem deriving from Part 3, under the simplest conditions.
Theorem 5.1
Suppose that \(X\) is as above, having characteristic function \(\phi _X := \psi p_{\lambda }\), and that (2.5) holds; write \(t = r + \delta \).
If \(\lambda \ge 1\), there is a constant \(\alpha _{4t} = \alpha _{4t}(\tilde{a}_0,\ldots ,\tilde{a}_r)\) such that
if \(\lambda < 1\), then there is a constant \(\alpha _{5t} = \alpha _{5t}(\tilde{a}_0,\ldots ,\tilde{a}_r)\) such that
Remark
Of course, for the bound in (5.7) to be of use, \(K_{r\delta }\) should be small.
Proof
For \(\lambda \ge 1\), we use both parts of (5.5), taking
where \(\lfloor x \rfloor \le x \le \lceil x \rceil \) denote the integers closest to \(x\), and with
If \(r + c_{r\lambda }\sqrt{\lambda \log (\lambda +1)} \le \lambda \), we obtain
since \(c_{r\lambda } \ge 1\), and, if \(r + c_{r\lambda }\sqrt{\lambda \log (\lambda +1)} > \lambda \), we get
since \(\lambda \ge \log (\lambda + 1)\) in \(\lambda \ge 0\). Hence, in either case, from the definition of \(c_{r\lambda }\), we have
Hence, from Corollary 4.1(3), with \(\varepsilon =\eta =0, \,M=1, \,\gamma _1 = K_{r\delta }\) and \(t_1=t\), it follows that
with
so that
with \(\beta _{3t} := \alpha ^{\prime }_{1t}\{4r+11\} + \alpha ^{\prime }_{2t} + 4\bar{A}_r\).
For \(\lambda <1\), we take \(b := \lceil 2 + r + {3\log K_{r\delta }}\rceil \) in (5.5), giving
and then, from Corollary 4.1(3) as above, it follows that
so that
with \(\beta ^{\prime }_{3t} := \alpha ^{\prime }_{1t}\{r+6\} + \alpha ^{\prime }_{2t} + 2\bar{A}_r\). \(\square \)
6 Compound Poisson approximation
The theory of Sect. 3 can also be applied when the distributions \(R_\lambda \) come from a compound Poisson family. For \(\lambda > 0\) and for \(\mu \) a probability distribution on \(\mathbb{Z }\), let \(\mathrm{CP\,}(\lambda ,\mu )\) denote the distribution of the sum \(Y := \sum _{j\in \mathbb{Z }\setminus \{0\}} jZ_j\), where \(Z_j, \,j\ne 0\), are independent, and \(Z_j \sim \mathrm{Po\,}(\lambda \mu _j)\). Then, if \(\mu _1>0\), the characteristic function of \(Y\) is of the form \(R_\lambda := \zeta _{\lambda }p_{\lambda _1}\), where \(\zeta _{\lambda }\) is the characteristic function of \(\sum _{j\in \mathbb{Z }\setminus \{0,1\}} jZ_j\) and \(\lambda _1 = \lambda \mu _1\). Thus, for the purposes of applying Corollary 4.1, \(\rho \) can be taken to be \(2\pi ^{-2}\mu _1\lambda \).
These considerations apply as long as \(\mu _1 > 0\), and could also be invoked if \(\mu _{-1} > 0\). If \(\mu _1=\mu _{-1}=0\), there is then no factor of the form \(p_{\lambda }\) to guarantee that, for some \(\rho >0\), the characteristic function \(\phi _Y\) of \(Y\) is \((\rho ,\pi )\)-locally normal. Indeed, if \(Y = 2Z\) where \(Z\sim \mathrm{Po\,}(\lambda )\), and if \(W\sim \mathrm{Be\,}(1/2)\) is independent of \(Y\), it is not true that the distribution of \(Y + W\) is close to that of \(Y\) in total variation, even though \(|\phi _{Y+W}(\theta ) - \phi _Y(\theta )| \le K_0|\theta |\,|\phi _Y(\theta )|\); this is to be compared to the example in Sect. 2.
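The counterexample can be checked directly. The Python sketch below (function names hypothetical) computes the total variation distance, taken here as half the \(\ell _1\) distance between the mass functions, between the laws of \(Y = 2Z\) and \(Y + W\); since \(Y\) lives on the even integers and \(Y+W\) puts half its mass on the odd ones, the distance does not shrink as \(\lambda \) grows.

```python
import math

def poisson_pmf(j, lam):
    """Po(lam){j} for j >= 0, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def tv_doubled_poisson_vs_shifted(lam):
    """Total variation distance between the laws of Y = 2Z and Y + W,
    with Z ~ Po(lam) and an independent W ~ Be(1/2)."""
    cutoff = int(lam + 20 * math.sqrt(lam) + 50)   # truncation of the Poisson tail
    law_y = {2 * k: poisson_pmf(k, lam) for k in range(cutoff)}
    law_yw = {}
    for y, mass in law_y.items():
        for w in (0, 1):
            law_yw[y + w] = law_yw.get(y + w, 0.0) + 0.5 * mass
    support = set(law_y) | set(law_yw)
    return 0.5 * sum(abs(law_y.get(x, 0.0) - law_yw.get(x, 0.0))
                     for x in support)

for lam in (5.0, 50.0, 500.0):
    print(lam, tv_doubled_poisson_vs_shifted(lam))
```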
7 Applications
7.1 Sums of independent random variables
Let \(X_1,\ldots ,X_n\) be independent integer valued random variables, and let \(S_n\) denote their sum. In contexts in which a central limit approximation to the distribution of \(S_n\) would be appropriate, the classical Edgeworth expansion (see, e.g., [7, Chapter 5]) is unwieldy, because \(S_n\) is confined to the integers. As an alternative, Barbour and Čekanavičius [1, Theorem 5.1] give a Poisson–Charlier expansion, for \(S_n\) ‘centred’ so that its mean and variance are almost equal. If the \(X_i\) have variances that are uniformly bounded below and have bounded \((r+1+\delta )\)th moments, and if the distribution of each \(X_i\) has nontrivial overlap with that of \(X_i+1\), their error bound with respect to the total variation norm is of order \(O(n^{-(r-1+\delta )/2})\). Here, under similar conditions, we use Corollary 4.1 to prove an error bound for their expansion which is of the same order, but is established only with respect to the less stringent Kolmogorov distance. A total variation bound for the error, of the slightly larger order \(O(n^{-(r-1+\delta )/2}\sqrt{\log n})\), could be deduced from Corollary 4.1(3), by taking \(a = \lfloor \lambda - k\sqrt{\lambda \log \lambda } \rfloor \) and \(b = \lceil \lambda + k\sqrt{\lambda \log \lambda } \rceil \), for suitable choice of \(k= k_r\), where \(\lambda = \mathbb{E }S_n\) (and \(\mathbb{E }S_n \approx \mathrm{Var\,}S_n\), because of centring).
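To see why centring matters, the Python sketch below compares, for a sum of independent Bernoulli variables, the total variation distance to the Poisson distribution with the same mean, before and after an integer translation chosen so that mean and variance nearly agree. The example (105 summands with success probability 0.3) and all function names are hypothetical; only the leading Poisson term is compared, not the full Poisson–Charlier expansion.

```python
import math

def poisson_pmf(j, lam):
    """Po(lam){j}; zero for negative j."""
    if j < 0:
        return 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def bernoulli_sum_pmf(probs):
    """Exact mass function of a sum of independent Be(q) variables."""
    pmf = [1.0]
    for q in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1.0 - q) * mass
            new[k + 1] += q * mass
        pmf = new
    return pmf

def tv_to_shifted_poisson(pmf, shift, lam, cutoff=400):
    """Total variation distance between pmf and the law of shift + Po(lam)."""
    return 0.5 * sum(abs((pmf[j] if j < len(pmf) else 0.0)
                         - poisson_pmf(j - shift, lam))
                     for j in range(cutoff))

probs = [0.3] * 105
pmf = bernoulli_sum_pmf(probs)
mean = sum(probs)
var = sum(q * (1.0 - q) for q in probs)
shift = math.floor(mean - var)          # integer centring
tv_plain = tv_to_shifted_poisson(pmf, 0, mean)
tv_centred = tv_to_shifted_poisson(pmf, shift, mean - shift)
print(shift, tv_plain, tv_centred)
```

After the translation, \(S_n - m\) has mean 22.5 and variance 22.05, so the remaining discrepancy with the Poisson law comes mainly from the third and higher cumulants, which is what the Poisson–Charlier terms are designed to correct.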
Assume that each of the \(X_j\) has finite \((r+1+\delta )\)th moment, with \(r\ge 1\), and define
where \(\kappa _l := \kappa _l(S_n)\) and \(\kappa _l(X)\) denotes the \(l\)th factorial cumulant of the random variable \(X\). Then the approximation that we establish is to the Poisson–Charlier signed measure \(\nu _r\) with
where \(L_r := \max \{1,3(r-1)\}\), and where \(\lambda := \mathbb{E }S_n; \,\nu _r\) has characteristic function
where
We need two further quantities involving the \(X_j\):
kept small by judicious centring, and
Theorem 7.1
Suppose that there are constants \(K_{l}, \,1\le l\le r+1\), such that, for each \(j\),
Suppose also that \(p_j \ge p_0 > 0\) for all \(j\), and that \(\lambda \ge n\lambda _0\). Then
for a function \(G\) that is bounded on compact sets.
Remark
For asymptotics in \(n\), with triangular arrays of variables, the error is of order \(O(n^{-(r-1+\delta )/2})\) when \(\lambda _0\) and \(p_0\) are bounded away from zero, and \(K_{1},\ldots ,K_{r+1}\) and \(K^{(n)}\) remain bounded. The requirements on \(\lambda _0\) and \(p_0\) can often be achieved by grouping the random variables appropriately, though attention then has to be paid to the consequent changes in the \(K_{l}\). The condition (7.5) can always be satisfied with \(K^{(n)}\le 1\), by replacing the \(X_j\) by translates, where necessary. For more discussion, we refer to [1]. The above conditions are designed to cover sums of independent random variables, each of which has nontrivial variance, has uniformly bounded \((r+1+\delta )\)th moment, and whose distribution overlaps with its unit translate.
Proof
We check the conditions of Corollary 4.1(2). First, in view of (7.6), we can write
where both \(\phi _{1j}\) and \(\phi _{2j}\) are characteristic functions. Hence we have
Hence \(\phi _\mu (\theta ) := \mathbb{E }(e^{i\theta S_n})\) satisfies
On the other hand, from the additivity of the factorial cumulants, we have
with \(|\kappa _2(S_n)| \le K^{(n)}\) from (7.5). From (7.1), we thus deduce the bound \(|\tilde{a}_l^{(r)}| \le c_l n^{\lfloor l/3\rfloor }\), for \(c_l = c_l(K^{(n)},K_{3},\ldots ,K_{r+1}), \,l\ge 1\). Hence
for \(c^{\prime \prime }=c^{\prime \prime }(K^{(n)},K_{3},\ldots ,K_{r+1})\). Combining (7.7) and (7.8), we can thus take \(\eta := C e^{-n\rho ^{\prime }\theta _0^2}\) in (3.2), for
and a suitable \(C = C(K^{(n)},K_{3},\ldots ,K_{r+1})\). The choice of \(\theta _0\) we postpone for now.
For \(|\theta | \le \theta _0\), we take \(\chi (\theta ) := p_\lambda (\theta )\), and check the approximation of
by \({\widetilde{A}}^{(r)}(\theta )\) as a polynomial in \(w := e^{i\theta }-1\). We begin with the inequality
derived using Taylor’s expansion, true for any \(s\in \mathbb{Z }\) and \(0 < \delta \le 1\), where \(s_{(l)} := s(s-1)\ldots (s-l+1)\). Hence, for each \(j\), we have
for a universal constant \(c_{r,\delta }\). Then, writing
and using the differentiation formula in [7, p. 170], we have
for a suitable function \(c\) and for all \(|\theta | \le \pi \). Combining these estimates, we deduce that, for \(w = e^{i\theta }-1\) and for all \(|\theta | \le \pi \),
where \(k_1 = k_1(K_{1},\ldots ,K_{r+1})\).
Now a standard inequality shows that, for \(u_j := \prod _{l=1}^j x_l \prod _{l=j+1}^n y_l\), for complex \(x_l,y_l\) with \(y_l \ne 0\) and \(|x_l/y_l - 1| \le \varepsilon _l\), then
Taking \(x_j := \mathbb{E }\{ (1 + w)^{X_j} \}+ e^{\mathbb{E }X_j w}\) and \(y_j := Q^{(2)}_{r+1}(w;X_j)\), (7.11) shows that we can take \(\varepsilon _l := \varepsilon := k_1|\theta |^{r+1+\delta }e^E\) for each \(l\), with
provided that \(|\theta | \le \theta _0 \le 1\). For \(r\ge 2\), choosing \(\theta _0 := n^{-1/3}\) then ensures that \((1+\varepsilon )^n\) is suitably bounded, and (7.12) yields
for \(k_2 = k_2(K^{(n)},K_{1},\ldots ,K_{r+1})\), since
is bounded for \(\theta _0 = n^{-1/3}\), in view of (7.5). For \(r=1, \,u_0\) is uniformly bounded if \(\theta _0 \le 1\), and the choice \(\theta _0 = n^{-1/(2+\delta )}\) ensures that \((1+\varepsilon )^n\) remains bounded.
The remaining step is to note that, for \(w:=e^{i\theta }-1, \,{\widetilde{A}}^{(r)}(\theta )\) contains all terms up to the power \(w^{L_r}\) in the power series expansion of \(Q^{(2)}_{r+1}(w;S_n)\), giving
Now \(\kappa _2(S_n)\) is bounded by \(K^{(n)}\), and, for \(l\ge 3\), each \(\kappa _l(S_n)\), for which we have only the weak bound \(nK_{l}\), occurs associated with the power \(w^l\) in the exponent of \(Q^{(2)}_{r+1}(w;S_n)\). Writing
the monomials that make up \(P_s(n,z)\) thus have coefficients of magnitude \(n^l\) associated with powers \(z^m\) with \(m \ge (2l - (s-l))_+ = (3l-s)_+\), so that they are themselves of magnitude at most \(O(n^{l - (3l-s)_+/3}) = O(n^{s/3})\) in \(|\theta ^{\prime }| \le n^{-1/3}\). Taking \(s=L_r+1\) and \(r\ge 2, \,m=0\) requires that \(l \le r-1\), and \(l\ge r\) entails \(m\ge 2\), so that, for \(r\ge 2\) and \(|\theta |\le \theta _0\),
with \(k_3 = k_3(K^{(n)},K_{1},\ldots ,K_{r+1})\). If \(|\theta | \ge n^{-1/2}\), it follows that the bound in (7.14) is at most \(2k_3\{(L_r+1)!\}^{-1} n^r |\theta |^{3r}\); if \(|\theta | \le n^{-1/2}\), the bound is at most \(2k_3\{(L_r+1)!\}^{-1} n |\theta |^{r+2}\). Combining this with (7.13), we have established that for \(|\theta |\le n^{-1/3}\) and \(r\ge 2\), we have
where \(k_4 = k_4(K^{(n)},K_{1},\ldots ,K_{r+1})\). This shows that \(\phi _\mu \) and \(\phi _{\nu _r}\) are \((0,\eta ,\theta _0)\)-mod \(p_\lambda \) polynomially close, with
and with \(\theta _0 = n^{-1/3}\) and \(\eta = Ce^{-n^{1/3}\rho ^{\prime }}\), this last from the bounds (7.7) and (7.8). Applying Corollary 4.1(2), taking \(a = 0\) and \(b=2\lambda \), and using the tail properties of the Poisson–Charlier measures (5.5), the theorem follows for \(r\ge 2\).
For \(r=1\), the bound in (7.14) is easily of order \(|\theta |^2\), giving a bound in (7.15) of \(k^{\prime }_4(n|\theta |^{2+\delta } + |\theta |^2)\). This leads to the choices
together with \(\theta _0 = n^{-1/(2+\delta )}\) and \(\eta = Ce^{-n^{\delta /(2+\delta )}\rho ^{\prime }}\), and the remainder of the proof is as before. \(\square \)
7.2 Analytic combinatorial schemes
An extremely interesting range of applications is to be found in the paper of Hwang [5]. His conditions are motivated by examples from combinatorics, in which generating functions are natural tools. He works in an asymptotic setting, assuming that \(X_n\) is a random variable whose probability generating function \(G_n\) is of the form
where \(h\) is a nonnegative integer, and both \(g\) and \(\varepsilon _n\) are analytic in a closed disc of radius \(\eta > 1\). As \(n\rightarrow \infty \), he assumes that \(\lambda \rightarrow \infty \) and that \(\sup _{z:|z|\le \eta }|\varepsilon _n(z)| \le K\lambda ^{-1}\), uniformly in \(n\). He then proves a number of results describing the accuracy of the approximation of \(P_{X_n-h}\) by \(\mathrm{Po\,}(\lambda + g^{\prime }(1))\).
Under his conditions, it is immediate that we can write
for \(|z-1| < \eta \), with
for all \(j\ge 0\). Hence \(X := X_n-h\) has characteristic function of the form \(\psi ^{(n)}p_{\lambda }\), where
and thus, for any \(r\in \mathbb{N }_0\),
with \({\tilde{\psi }}^{(n)}_r\) defined as in (2.4), taking \(\tilde{a}^{(n)}_j = g_j+\varepsilon _{nj}\); note that the constant \(K_{r1}\) can indeed be taken to be uniform for all \(n\). Since also \(g\) and \(\varepsilon _n\) are both uniformly bounded on the unit circle, and since \({\tilde{\psi }}^{(n)}_r\) is bounded (uniformly in \(n\)) for \(|\theta | \le \pi \), it is clear that (7.18) can be extended to all \(|\theta | \le \pi \), albeit with a different uniform constant \(K^{\prime }_{r1}\), so that (2.5) holds with \(\delta =1\) for any \(r\in \mathbb{N }_0\). Thus Parts 1–3 of Corollary 4.1 (with \(R_\lambda =\mathrm{Po\,}(\lambda )\) and \(\rho (\lambda )=2\lambda /\pi ^2\)) can be applied with any choice of \(r\), giving progressively more accurate approximations to \(P_{X_n-h}\), as far as the \(\lambda \)-order is concerned, in terms of progressively more complicated perturbations of the Poisson distribution. These theorems are thus applicable to all the examples that Hwang considers, including the numbers of components (counted in various ways) in a wide class of logarithmic assemblies, multisets and selections.
For instance, using translated Poisson approximation as in Sect. 4.2 by way of Proposition 4.2 gives an approximation to \(P_{X_n-h}\) by the mixture \(Q_{mp}(\mathrm{Po\,}(\lambda ^{\prime }))\), where, from (4.9),
where \(m_n := g_n^{\prime }(1), \,v_n := g_n^{\prime \prime }(1) + g_n^{\prime }(1) - \{g_n^{\prime }(1)\}^2\) and \(g_n := g + \varepsilon _n\). Hwang’s approximation by \(\mathrm{Po\,}(\lambda + g^{\prime }(1))\) has asymptotically the same mean as ours (and as that of \(X_n-h\)), but a variance asymptotically differing by \(\kappa := g^{\prime \prime }(1) - \{g^{\prime }(1)\}^2\). As a consequence, Hwang’s approximation has an error of larger asymptotic order, in which the quantity \(\kappa \) appears; for instance, for Kolmogorov distance, his Theorem 1 gives an error of order \(O(\lambda ^{-1})\), whereas that obtained using Corollary 4.1(2) together with Proposition 4.2 is of order \(O(\lambda ^{-3/2})\).
Although our Poisson expansion theorems are automatically applicable under Hwang’s conditions, they also apply to examples that do not satisfy his conditions: the simple example at the end of Sect. 2 is one such. Conversely, Hwang’s Theorem 2, which establishes Poisson approximation in the lower tail with good relative accuracy, cannot be proved using only our conditions; the conclusion would not be true, for instance, in the example just mentioned.
Note also that Hwang examines problems from combinatorial settings in which approximation is not by Poisson distributions: he has examples concerning the (amenable) Bessel family of distributions,
for the appropriate choice of normalizing constant \(L(\lambda )\). Thus we could apply Corollary 4.1 to obtain asymptotically more accurate expansions, and, in conjunction with Proposition 4.2, obtain slightly sharper approximations than his within the translated Bessel family.
7.3 Prime divisors
The numbers of prime divisors of a positive integer \(n\), counted either with (\(\Omega (n)\)) or without (\(\omega (n)\)) multiplicity, can also be treated by these methods, since excellent information is available about their generating functions. For our purposes, we use only the shortest expansion, taken from [11, Theorems II.6.1 and 6.2]. One finds that, for \(N_n\) uniformly distributed on \(\{1,2,\ldots ,n\}\), the characteristic functions of \(\Omega (N_n)\) and \(\omega (N_n)\) are given by
where \(|\varepsilon _s(\theta )| \le C_s/\log n, \,s=1,2\), for some constants \(C_1\) and \(C_2\), and
\(q\) running here over prime numbers. These expansions were established and used by Rényi and Turán [9] in their proof of the Erdős–Kac Theorem, but they are also sketched by Selberg [10].
Kowalski and Nikeghbali [6] have emphasized the structural interpretation of these functions, which we now recall. Write
so that \(\Phi _1(e^{i\theta }-1)=\Phi _{1,1}(\theta )\Phi _{1,2}(\theta )\).
Let \(X_n\) be the random variable giving the number of disjoint cycles appearing in the decomposition of a random uniformly distributed permutation of size \(n\). In addition, let \(Y_n\) be a random variable of the form
where the \(B_q\) are independent Bernoulli random variables indexed by primes, with \(\mathbb{P }[B_q=1]=1/q; \,Y_n\) represents a naive model of the number of prime divisors \(\le n\) of a large integer.
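As a quick numerical check on the naive model, the Python sketch below computes the mean and variance of \(Y_n\), under the assumption that \(Y_n\) is the sum of the \(B_q\) over primes \(q\le n\), and compares the mean with \(\log \log n + B_1\), where \(B_1\approx 0.26149721\) is the Mertens constant. The function names are hypothetical.

```python
import math

def primes_up_to(n):
    """Primes q <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for q in range(2, math.isqrt(n) + 1):
        if sieve[q]:
            # strike out the multiples q*q, q*q + q, ... up to n
            sieve[q * q::q] = bytearray(len(range(q * q, n + 1, q)))
    return [q for q in range(2, n + 1) if sieve[q]]

def naive_model_moments(n):
    """Mean and variance of the sum of independent Be(1/q), q prime, q <= n."""
    primes = primes_up_to(n)
    mean = sum(1.0 / q for q in primes)
    var = sum((1.0 / q) * (1.0 - 1.0 / q) for q in primes)
    return mean, var

MERTENS_B1 = 0.26149721
n = 10 ** 5
mean, var = naive_model_moments(n)
print(mean, math.log(math.log(n)) + MERTENS_B1, var)
```

The variance is strictly smaller than the mean, so \(Y_n\) is not itself close to a Poisson variable with the same mean; this is one reason why a correction factor to the Poisson characteristic function is needed.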
Then we have
and
This suggests an interpretation of the Rényi–Turán formula as a probabilistic decomposition of \(\omega (N_n)\) in terms of random permutations of size \(\log n\) and the naive divisibility model for integers, with an intricate dependency structure. We note that in the setting of polynomials over finite fields, this interpretation was shown by Kowalski and Nikeghbali [6] to have a precise meaning and to be very useful.
We come back to the application of our results to \(\omega (N_n)\) and \(\Omega (N_n)\). Let \(\tilde{a}_{ls}, \,s=1,2\), denote the Taylor coefficients of the functions \(\Phi _s(w)\) as power series in \(w\) (around \(w=0\), which corresponds to \(\theta =0\)). By analyticity near \(0\), it follows that, for any \(r\), we have
for suitable constants \(C_{rs}, C^{\prime }_{rs}\) and for \(|w|\le 2\). In order to approximate the distributions \(P_{\omega (N_n)}\) and \(P_{\Omega (N_n)}\), we define the measures \(\nu _r^{(s)}\) by
and invoke Corollary 4.1 with \(M=1, \,\theta _0=\pi \) and \(\varepsilon = C_s/\log n\), together with (3.30); this leads to the following conclusion, which refines the Erdős–Kac theorem.
Theorem 7.2
For the measures \(\nu _r^{(s)}\) defined above, we have
for suitable constants \({\widetilde{C}}_1\) and \({\widetilde{C}}_2\), and with \(\alpha ^{\prime }_{1l}\) as defined in (5.9).
Remark
As far as we know, total variation approximation was first considered in this context by Harper [4], who proved a bound with error of size \(1/(\log \log n)\) (for a truncated version of \(\omega (n)\), counting only prime divisors of size up to \(n^{1/(3(\log \log n)^2)}\)), and deduced explicit bounds in Kolmogorov distance.
To indicate what this means in concrete terms for number theory readers, consider the case of \(\omega (n)\) for \(r=1\). Taylor expansion gives
as \(w\rightarrow 0\), where \(B_1\approx 0.26149721\) is the Mertens constant, i.e., the real number such that
as \(x\rightarrow +\infty \). An application of Theorem 7.2 with \(r=1\) gives
for any set \(A\) of positive integers, where
Higher expansions could be computed in much the same way.
Alternatively, a more accurate approximation is available from Theorem 7.2 with \(r=2\), while staying within the realm of (translated) Poisson distributions, by invoking Proposition 4.2. For this, we compute the expansion of \(\Phi _1\) to order \(2\), obtaining (after some calculations) that
where
(use \(1/\Gamma (1+w)=1+\gamma w+(\gamma ^2/2-\pi ^2/12)w^2+O(w^3)\), as well as the Mertens identity
and expand every term in the Euler product). This corresponds to (2.5), since \(w=e^{i\theta }-1\).
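The quadratic expansion \(1/\Gamma (1+w)=1+\gamma w+(\gamma ^2/2-\pi ^2/12)w^2+O(w^3)\) used in this calculation is easy to confirm numerically. The Python sketch below does so for small real \(w\) (the paper's \(w\) is complex, so this is only a sanity check on the coefficients); the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant

def reciprocal_gamma_quadratic(w):
    """Second-order Taylor polynomial of 1/Gamma(1+w) at w = 0."""
    c2 = EULER_GAMMA ** 2 / 2 - math.pi ** 2 / 12
    return 1.0 + EULER_GAMMA * w + c2 * w * w

def truncation_error(w):
    """Absolute error of the quadratic approximation at a real w > -1."""
    return abs(1.0 / math.gamma(1.0 + w) - reciprocal_gamma_quadratic(w))

for w in (0.1, 0.01):
    # the ratio error / w^3 should stay bounded as w -> 0
    print(w, truncation_error(w), truncation_error(w) / w ** 3)
```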
We can then apply Theorem 7.2 and Proposition 4.2 to yield the translated Poisson approximation \(Q_{mp}(\mathrm{Po\,}(\lambda ^{\prime }))\), with \(\lambda ^{\prime }, m\) and \(p\) found from (4.9). With
this gives
Thus, for any positive integer \(n\) and any set \(A\) of positive integers, we have
Similar results hold for \(\Omega (n)\), where one obtains the following approximate values for the quantities \(p,m,\lambda ^{\prime }\):
References
1. Barbour, A.D., Čekanavičius, V.: Total variation asymptotics for sums of independent integer random variables. Ann. Probab. 30, 509–545 (2002)
2. Chihara, T.S.: An Introduction to Orthogonal Polynomials. Gordon and Breach, New York (1978)
3. Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Int. Math. 3, 79–127 (2006)
4. Harper, A.J.: Two new proofs of the Erdős–Kac Theorem, with bound on the rate of convergence, by Stein’s method for distributional approximations. Math. Proc. Camb. Philos. Soc. 147, 95–114 (2009)
5. Hwang, H.K.: Asymptotics of Poisson approximation to random discrete distributions: an analytic approach. Adv. Appl. Probab. 31, 448–491 (1999)
6. Kowalski, E., Nikeghbali, A.: Mod-Poisson convergence in probability and number theory, Int. Math. Res. Notices. doi:10.1093/imrn/rnq019; see also arXiv:0905.0318 (2010)
7. Petrov, V.V.: Limit Theorems of Probability Theory. Oxford University Press, Oxford (1975)
8. Presman, E.L.: Approximation of binomial distributions by infinitely divisible ones. Theor. Probab. Appl. 28, 393–403 (1983)
9. Rényi, A., Turán, P.: On a theorem of Erdős–Kac. Acta Arith. 4, 71–84 (1958)
10. Selberg, A.: Note on the paper by L.G. Sathe. J. Indian Math. Soc. 18, 83–87 (1954)
11. Tenenbaum, G.: Introduction à la théorie analytique et probabiliste des nombres. Société Mathématique de France (1995)
Additional information
A. D. Barbour supported in part by Schweizerischer Nationalfonds Projekt Nr. 20117625/1 and by Australian Research Council Grants Nos. DP120102728 and DP120102398. A. Nikeghbali supported in part by Schweizerischer Nationalfonds Projekt Nr. 200021_119970/1.
Appendix
To prove Lemma 3.4, assume without loss of generality that \(a\ge 0\). If \(b \ge 0\), take \(w(x) = a + bx - {\textstyle \frac{1}{2}}cx^2\) for \(0 \le x \le x_1 := b/c\), when \(w\) reaches its maximum of \(h := a + b^2/2c\), and continue with the same definition until \(x = x_1 + x_2\), where \(x_2 := \sqrt{h/c}\), at which point \(w(x_1+x_2) = h/2\). Changing the second derivative from \(-c\) to \(c\) gives \(w(x) = {\textstyle \frac{1}{2}}c(x_1+2x_2 - x)^2\), to be used for \(x_1+x_2 \le x \le x_1 + 2x_2\), and then take \(w(x) = 0\) for \(x>x_1+2x_2\). This definition of \(w\) satisfies all the claimed requirements.
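The construction for \(b\ge 0\) can be written out as a short Python sketch, reading the second derivative as \(-c\) on the first two pieces and \(+c\) on the third. The requirements of Lemma 3.4 are not restated here, so the sketch only makes it easy to confirm that the pieces join up, with \(w(x_1) = h\), \(w(x_1+x_2) = h/2\) and \(w(x_1+2x_2) = 0\); the function name is hypothetical.

```python
import math

def make_w(a, b, c):
    """Piecewise-quadratic w for a >= 0, b >= 0, c > 0, following the
    construction in the text; returns (w, x1, x2, h)."""
    x1 = b / c
    h = a + b * b / (2 * c)            # maximum of w, attained at x1
    x2 = math.sqrt(h / c)

    def w(x):
        if x <= x1 + x2:               # second derivative -c
            return a + b * x - 0.5 * c * x * x
        if x <= x1 + 2 * x2:           # second derivative +c
            return 0.5 * c * (x1 + 2 * x2 - x) ** 2
        return 0.0                     # identically zero afterwards

    return w, x1, x2, h

w, x1, x2, h = make_w(1.0, 2.0, 4.0)
print(w(0.0), w(x1), w(x1 + x2), w(x1 + 2 * x2))
```

At the join \(x = x_1+x_2\) both pieces have value \(h/2\) and slope \(-cx_2\), so \(w\) and \(w^{\prime }\) are continuous there.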
For \(b < 0\), take \(w(x) = a +bx + {\textstyle \frac{1}{2}}c x^2\) until \(x_1 := |b|/c\), when \(w^{\prime }(x_1)=0\) and \(w(x_1) = a - b^2/2c\). Thereafter, continue essentially as before, with \(h := |a - b^2/2c|\) and \(x_2 := \sqrt{h/c}\), taking second derivative \(-\mathrm{sgn}(w(x_1))c\) in \((x_1,x_1+x_2)\) and \(\mathrm{sgn}(w(x_1))c\) in \((x_1+x_2,x_1+2x_2)\). \(\square \)
Barbour, A.D., Kowalski, E. & Nikeghbali, A. Mod-discrete expansions. Probab. Theory Relat. Fields 158, 859–893 (2014). https://doi.org/10.1007/s00440-013-0498-8