Richter’s local limit theorem, its refinement, and related results*

We give a detailed exposition of the proof of Richter's local limit theorem in a refined form and establish the stability of the remainder term in this theorem under small perturbations of the underlying distribution (including smoothing). We also discuss related quantitative bounds for characteristic functions and Laplace transforms.


Introduction and Formulation of the Results
Let $(X_n)_{n \ge 1}$ be independent copies of a random variable $X$ with mean $EX = 0$ and variance $\mathrm{Var}(X) = 1$. Throughout, we assume without mentioning that the normalized sum $Z_n = (X_1 + \dots + X_n)/\sqrt{n}$ has a bounded density $p_{n_0}$ for some $n = n_0$, that is, $p_{n_0}(x) \le M$ for all $x \in \mathbb{R}$ with some constant $M$. Then all $Z_n$ with $n \ge 2n_0$ have continuous bounded densities $p_n(x)$. The asymptotic behavior of these densities, describing their closeness to the standard normal density $\varphi$, is governed by several local limit theorems. First of all, there is a uniform local limit theorem due to Gnedenko: $\sup_x |p_n(x) - \varphi(x)| \to 0$ as $n \to \infty$.
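The uniform convergence can be observed numerically in a case where $p_n$ is explicit. The following is a minimal sketch, assuming (as an illustrative choice not made in the text) that $X$ is uniform on $[-\sqrt{3}, \sqrt{3}]$, so that $n_0 = 1$ and $p_n$ is a rescaled Irwin-Hall density. The helper names and the grid are hypothetical.

```python
import math

def irwin_hall_pdf(n, s):
    """Density at s of the sum of n independent Uniform(0, 1) variables (n >= 2)."""
    if s <= 0.0 or s >= n:
        return 0.0
    total = 0.0
    for k in range(int(s) + 1):
        total += (-1) ** k * math.comb(n, k) * (s - k) ** (n - 1)
    return total / math.factorial(n - 1)

def density_zn(n, x):
    """Density p_n(x) of Z_n when X is uniform on [-sqrt(3), sqrt(3)]."""
    scale = math.sqrt(n / 12.0)  # Z_n = (S - n/2) / scale, with S Irwin-Hall
    return scale * irwin_hall_pdf(n, n / 2.0 + x * scale)

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def sup_error(n, half_width=6.0, steps=2400):
    """Maximum of |p_n(x) - phi(x)| over an equally spaced grid on [-6, 6]."""
    xs = (-half_width + 2.0 * half_width * i / steps for i in range(steps + 1))
    return max(abs(density_zn(n, x) - phi(x)) for x in xs)

errors = {n: sup_error(n) for n in (2, 4, 8, 16)}
for n, err in errors.items():
    print(n, round(err, 5))
```

For this symmetric example the third cumulant vanishes, so the grid maximum is expected to decay roughly like $1/n$ rather than $1/\sqrt{n}$.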
Under higher order moment assumptions, say if $E|X|^m < \infty$ for an integer $m \ge 3$, this statement may be considerably sharpened in the form of a non-uniform local limit theorem $\sup_x\, (1 + |x|^m)\, |p_n(x) - \varphi_m(x)| = o(n^{-(m-2)/2})$ (1.1), where $\varphi_m$ denotes the Edgeworth correction of $\varphi$ of order $m$ (cf. [11], [16], [17]). In various applications, this relation is typically effective in the range $|x| \le \sqrt{c \log n}$, since then the ratio $p_n(x)/\varphi(x)$ remains close to 1 (for a suitable $c$). For example, (1.1) is crucial in the study of rates in the entropic central limit theorem, rates for Rényi divergences of finite orders and for the relative Fisher information ([5]-[7]).
As for larger regions, the asymptotic behavior of p n (x) is governed by the following remarkable theorem due to Richter [18], assuming the finiteness of an exponential moment for the random variable X.
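Theorem 1.1. Suppose that, for some $b > 0$, $E e^{b|X|} < \infty$ (1.2). Then, for $x = o(\sqrt{n})$, the densities of $Z_n$ admit the representation $\frac{p_n(x)}{\varphi(x)} = \exp\big\{\frac{x^3}{\sqrt{n}}\, \lambda\big(\frac{x}{\sqrt{n}}\big)\big\} \big(1 + O\big(\frac{1 + |x|}{\sqrt{n}}\big)\big)$ (1.3). Here, $\lambda(\tau)$ represents an analytic function in some neighborhood of zero.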
It was shown by Amosova [1] that the condition (1.2) is necessary for the existence of a representation like (1.3) in the region |x| = o( √ n) with some analytic function λ.
The function $\lambda$ in (1.3) is representable as a power series, called the Cramér series, which is absolutely convergent in some disc $|\tau| < \tau_0$ of the complex plane. It appeared in the work by Cramér [9] in a similar representation for the ratio of the tails of the distribution functions of $Z_n$ and the standard normal law (cf. also [10], [14], [15]).
Let us also mention that (1.3) is stated in Richter's work in a slightly different form, with $O(|x|/\sqrt{n})$ in the last brackets and for $|x| > 1$. A similar result is proved in the book by Ibragimov and Linnik [13] under the assumption that $X$ has a bounded continuous density.
As a consequence of (1.3), one immediately obtains, for example, that $p_n(x)/\varphi(x) \to 1$ as $n \to \infty$ (1.5) uniformly in the region $|x| = o(n^{1/6})$. In the region $c_0 n^{1/6} \le |x| \le c_1 n^{1/2}$, the behavior may be quite different, and in order to describe it, the appearance of the term $O\big(\frac{1+|x|}{\sqrt{n}}\big)$ in (1.3) is undesirable. The purpose of this paper is to give a detailed exposition of the proof of Theorem 1.1, clarifying the meaning of the leading coefficient in (1.4) and replacing this term with a quantity depending on $n$ only. We basically employ the tools of [13] and derive the following refinement.
As we will see in Section 5, $\lambda(\tau) = \frac{\gamma_m}{m!}\, \tau^{m-3} + O(|\tau|^{m-2})$ and $\mu(\tau) = \frac{\gamma_m}{2\,(m-2)!}\, \tau^{m-2} + O(|\tau|^{m-1})$ as $\tau \to 0$, where $\gamma_m$ ($m \ge 3$) is the first non-zero cumulant of the random variable $X$ (assuming that it is not normal). Equivalently, $m$ is the smallest positive integer such that $EX^m \ne EZ^m$, where $Z$ is a standard normal random variable, in which case $\gamma_m = EX^m - EZ^m$. With this refinement, it should be clear that the relation (1.5) holds true uniformly over all $x$ in the potentially larger region $|x| = o(n^{1/2 - 1/m})$. For example, if the distribution of $X$ is symmetric about the origin, then $\gamma_3 = 0$, so that necessarily $m \ge 4$.
Another consequence of (1.6), which cannot be obtained on the basis of (1.3), is needed in the study of the central limit theorem (CLT) with respect to the Rényi divergence of infinite order (including the rate of convergence). Let us recall that the Rényi divergence of a finite order $\kappa > 0$ from the distribution of $Z_n$ to the standard normal law is defined by $D_\kappa(p_n\|\varphi) = \frac{1}{\kappa - 1} \log \int_{-\infty}^{\infty} \big(\frac{p_n(x)}{\varphi(x)}\big)^{\kappa} \varphi(x)\, dx$. As a function of $\kappa$, it is non-decreasing, representing a strong distance-like quantity. In the range $0 < \kappa < 1$, it is metrically equivalent to the total variation, that is, the $L^1$-distance between $p_n$ and $\varphi$. The case $\kappa = 1$ corresponds to the relative entropy (Kullback-Leibler distance), and another important case $\kappa = 2$ leads to a function of the Pearson $\chi^2$-distance. So far, information-theoretic CLTs of the form $D_\kappa(p_n\|\varphi) \to 0$ as $n \to \infty$ have been completely characterized in terms of the distribution of $X$ (that is, in the i.i.d. situation). However, such a statement remains fully open for the limit distance $D_\infty(p_n\|\varphi) = \lim_{\kappa \to \infty} D_\kappa(p_n\|\varphi) = \log \operatorname{ess\,sup}_x \frac{p_n(x)}{\varphi(x)}$. Equivalently, the problem is to find conditions under which the related quantity (the Tsallis distance of infinite order) $T_\infty(p_n\|\varphi) = \operatorname{ess\,sup}_x \frac{p_n(x) - \varphi(x)}{\varphi(x)}$ tends to zero for growing $n$ (note that one may not insert absolute value signs here, since all $p_n$ may be compactly supported).
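To make the monotonicity in $\kappa$ concrete, here is a minimal numerical sketch. It assumes the standard definition $D_\kappa(p\|q) = \frac{1}{\kappa-1}\log\int p^\kappa q^{1-\kappa}\,dx$ and, purely for illustration, replaces $p_n$ by the density of $N(0, \sigma^2)$ with $\sigma = 0.8$, for which a closed form is available. The function names and the integration window are hypothetical choices.

```python
import math

def normal_pdf(x, sigma=1.0):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def renyi_divergence(p, q, kappa, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid-rule value of log( int p^kappa q^(1-kappa) dx ) / (kappa - 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * p(x) ** kappa * q(x) ** (1.0 - kappa)
    return math.log(total * h) / (kappa - 1.0)

def renyi_gaussian_closed_form(sigma, kappa):
    """Closed form for p = N(0, sigma^2), q = N(0, 1); needs 1 + kappa*(1/sigma^2 - 1) > 0."""
    s = 1.0 + kappa * (1.0 / sigma ** 2 - 1.0)
    return (-kappa * math.log(sigma) - 0.5 * math.log(s)) / (kappa - 1.0)

sigma = 0.8
p = lambda x: normal_pdf(x, sigma)
q = normal_pdf
orders = (0.5, 2.0, 3.0, 10.0)
numeric = [renyi_divergence(p, q, k) for k in orders]
exact = [renyi_gaussian_closed_form(sigma, k) for k in orders]
# Order infinity: log sup_x p(x)/q(x); for sigma < 1 the supremum is attained at x = 0.
d_infinity = math.log(1.0 / sigma)
for k, value in zip(orders, numeric):
    print(k, round(value, 6))
print("infinity", round(d_infinity, 6))
```

In this toy case the values increase with $\kappa$ and stay below the order-infinity value $\log(1/\sigma)$.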
As a first natural step towards this variant of the CLT, we consider the problem of convergence for the restricted Tsallis distance, with the above suprema taken over growing intervals $|x| = O(\sqrt{n})$. With this in mind, Theorem 1.2 allows one to prove the following. Corollary 1.3. Under the conditions of Theorem 1.1, suppose that $m$ is even, $m \ge 4$, and $\gamma_m < 0$. There exist constants $\tau_0 > 0$ and $c > 0$ with the following property.
Here, the condition about cumulants is fulfilled, for example, when the random variable $X$ is strongly subgaussian in the sense that $E e^{tX} \le e^{t^2/2}$ for all $t \in \mathbb{R}$ (1.8) (recall that $EX^2 = 1$, while the condition $EX = 0$ is necessary). This interesting class of probability distributions is rather rich, and we refer the reader to [8] for discussions and various examples. Our main motivation stemmed from the fact that strong subgaussianity is necessary for the convergence $D_\infty(p_n\|\varphi) \to 0$. As we have recently learned, (1.8) had previously appeared under the name "sharp subgaussianity" in the work by Guionnet and Husson [12] for a completely different reason, namely as a condition to have LDPs for the largest eigenvalue of Wigner matrices with the same rate function as in the case of Gaussian entries.
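Condition (1.8) is easy to test numerically for a given law. The sketch below assumes, as an illustrative example not taken from the text, that $X$ is uniform on $[-\sqrt{3}, \sqrt{3}]$, where $E e^{tX} = \sinh(\sqrt{3}\,t)/(\sqrt{3}\,t)$. Here the inequality also follows from the product formula $\sinh(u)/u = \prod_{k \ge 1} (1 + u^2/(\pi k)^2) \le e^{u^2/6}$. The grid is a hypothetical choice.

```python
import math

SQRT3 = math.sqrt(3.0)

def log_laplace_uniform(t):
    """K(t) = log E exp(tX) for X uniform on [-sqrt(3), sqrt(3)], t != 0.

    E exp(tX) = sinh(u)/u with u = sqrt(3)*|t|; the logarithm is written as
    u + log(1 - exp(-2u)) - log(2u) to avoid overflow for large u.
    """
    u = SQRT3 * abs(t)
    return u + math.log1p(-math.exp(-2.0 * u)) - math.log(2.0 * u)

# By symmetry of X it is enough to scan t > 0.
grid = [0.05 * k for k in range(1, 401)]
gaps = [0.5 * t * t - log_laplace_uniform(t) for t in grid]
print("strongly subgaussian on the grid:", min(gaps) > 0.0)
```

A positive gap $t^2/2 - K(t)$ on the grid is consistent with (1.8), and the fourth cumulant of this law equals $-6/5 < 0$, in line with the cumulant condition of Corollary 1.3.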
One important issue which is not addressed in the formulation of Theorem 1.2 is how one can control the involved constant in the O-remainder term in (1.6).To better quantify this asymptotic representation, we actually prove the following statement using the same analytic functions λ(τ ) and µ(τ ).
Theorem 1.4. Assume that $E e^{\alpha|X|} \le 2$ ($\alpha > 0$). There exist absolute positive constants $C$ and $c$ such that, whenever $n \ge n_1$ and $\tau = x/\sqrt{n}$, $|\tau| \le \tau_0$, we have where $|B| \le C$ and This statement should be useful in applications to smoothed distributions in order to guarantee that the constant in the remainder term may be chosen to be common for all distributions under consideration.
In order to make the proofs more transparent and self-contained, we include a short review of various related results (partly technical, but often interesting in themselves) about maxima of densities, analytic characteristic functions and log-Laplace transforms. The rest of the paper is organized as follows. In Section 2 we recall basic properties of the maximum of convolved densities and then develop their applications to bounding restricted integrals of powers of characteristic functions (Section 3). In Section 4, we discuss the behavior of analytic characteristic functions near the origin. Section 5 is devoted to the so-called saddle points and associated Taylor expansions for the log-Laplace transforms. Here we also analyze the functions $\lambda(\tau)$ and $\mu(\tau)$. Section 6 deals with the contour integration needed to establish preliminary representations for $p_n(x)$. Final steps in the proof of Theorem 1.4 are made in Section 7. The proof of Corollary 1.3 is postponed to Section 8.

Maximum of Convolved Densities
Convolved densities are known to have improved smoothing properties. First, let us emphasize the following general fact (which explains the condition $n \ge 2n_0$ mentioned before Theorem 1.1).
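Proof. Denote by $q_k$ the densities of $\xi_k$ and assume that $q_k(x) \le M_k$ for all $x \in \mathbb{R}$ with some constants $M_k$ ($k \le m$). By the Plancherel theorem, for the characteristic functions $g_k$ of $\xi_k$, $\int |g_k(t)|^m\, dt \le \int |g_k(t)|^2\, dt = 2\pi \int q_k(x)^2\, dx \le 2\pi M_k$, where we used the property $|g_k(t)| \le 1$, $t \in \mathbb{R}$. Hence, by Hölder's inequality, the characteristic function $g = g_1 \cdots g_m$ of the sum $S_m = \xi_1 + \dots + \xi_m$ is integrable and satisfies $\int |g(t)|\, dt \le \prod_{k=1}^m \big(\int |g_k(t)|^m\, dt\big)^{1/m} \le 2\pi\, (M_1 \cdots M_m)^{1/m}$ (2.1). One may conclude that the random variable $S_m$ has a bounded, uniformly continuous density expressed by the Fourier inversion formula $q(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} g(t)\, dt$ (2.2). Since $g$ is integrable, it also follows that $q(x) \to 0$ as $|x| \to \infty$ (by the Riemann-Lebesgue lemma).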
Consider the functional $M(\xi) = \operatorname{ess\,sup}_x q(x)$, where $\xi$ is a random variable with density $q$ (one may put $M(\xi) = \infty$ in all other cases). Since, by (2.2), $q(x) \le \frac{1}{2\pi} \int_{-\infty}^{\infty} |g(t)|\, dt$ for all $x \in \mathbb{R}$, the inequality (2.1) also implies that $M(S_m) \le \big(M(\xi_1) \cdots M(\xi_m)\big)^{1/m}$ (2.3). This shows in particular that $M(\xi)$ may not increase by adding to $\xi$ an independent random variable. However, the relation (2.3) does not correctly reflect the behavior of $M(S_m)$ with respect to the growing parameter $m$, especially in the i.i.d. situation. A more precise statement is described in the following relation, where the geometric mean of maxima is replaced with the harmonic mean: $\frac{1}{M(S_m)^2} \ge \frac{1}{2} \sum_{k=1}^m \frac{1}{M(\xi_k)^2}$ (2.4). This bound may be viewed as a counterpart of the entropy power inequality in Information Theory. It may be obtained by combining Rogozin's maximum-of-density theorem with Ball's bound on the volume of slices of the cube. Namely, it was shown in [19] that, if the values $M_k = M(\xi_k)$ are fixed, $M(S_m)$ is maximized for $\xi_k$ uniformly distributed in intervals of length $1/M_k$. Of course, in this case $M(S_m)$ has a rather complicated structure as a function of the variables $M_1, \dots, M_m$.
On the other hand, if $S_m = a_1 \eta_1 + \dots + a_m \eta_m$, where $\eta_k$ are independent and uniformly distributed in $(0, 1)$, and the coefficients satisfy $a_1^2 + \dots + a_m^2 = 1$, then $1 \le M(S_m) \le \sqrt{2}$ (2.5), cf. [2]. In geometric language, this is the same as saying that $1 \le |Q \cap H| \le \sqrt{2}$, where $Q = (0, 1)^m$ is the unit cube, $H$ is an arbitrary hyperplane in $\mathbb{R}^m$ passing through the center of the cube, and $|\cdot|$ stands for the $(m-1)$-dimensional volume. To obtain (2.4), put Since, by [19], $M(S_m)$ does not exceed the first term in (2.6), we get With this argument, this relation is mentioned in [3], where its multidimensional analog is derived by applying the Hausdorff-Young inequality with best constants (due to Beckner and Lieb).
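Ball's two-sided bound $1 \le |Q \cap H| \le \sqrt{2}$ can be checked directly in the special case where $H$ is orthogonal to the main diagonal. The following is a minimal sketch under that assumption: the slice volume is then the maximum of the density of $(\eta_1 + \dots + \eta_m)/\sqrt{m}$, which is a rescaled Irwin-Hall density evaluated at its center. The helper names are hypothetical.

```python
import math

def irwin_hall_pdf(m, s):
    """Density at s of the sum of m independent Uniform(0, 1) variables (m >= 2)."""
    if s <= 0.0 or s >= m:
        return 0.0
    total = 0.0
    for k in range(int(s) + 1):
        total += (-1) ** k * math.comb(m, k) * (s - k) ** (m - 1)
    return total / math.factorial(m - 1)

def central_diagonal_slice(m):
    """Maximum of the density of (eta_1 + ... + eta_m)/sqrt(m).

    The density is sqrt(m) * f_m(sqrt(m) * x) and, being symmetric and unimodal,
    it is maximized at the center s = m/2 of the Irwin-Hall density f_m.
    """
    return math.sqrt(m) * irwin_hall_pdf(m, m / 2.0)

volumes = {m: central_diagonal_slice(m) for m in range(2, 13)}
for m, v in volumes.items():
    print(m, round(v, 4))
```

For $m = 2$ the slice is the diagonal of the unit square, so the upper bound $\sqrt{2}$ is attained in this family.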
Remark 2.3. Modulo a universal constant, the left inequality in (2.5) may be extended to a more general setting. Namely, if a random variable $\xi$ has a density $q$ and a finite standard deviation $\sigma$, then $M(\xi)^2 \sigma^2 \ge \frac{1}{12}$ (2.7). Here equality is attained for the uniform distribution on arbitrary bounded intervals of the real line. This relation is well known; as an early reference one can mention Statuljavičus [20], p. 651, where (2.7) is stated without proof. Since it is used below, let us include a short argument. For normalization, one may assume that $M(\xi) = 1$ and $E\xi = 0$. In this case, the tail function $H(x) = P\{|\xi| \ge x\}$ has a Lipschitz semi-norm at most 2, implying that $H(x) \ge 1 - 2x$ for all $x \ge 0$. This gives $\sigma^2 = 2 \int_0^\infty x H(x)\, dx \ge 2 \int_0^{1/2} x (1 - 2x)\, dx = \frac{1}{12}$.
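As a quick sanity check of this remark, one can compare $M(\xi)\,\sigma$ with $1/\sqrt{12}$ for a few standard laws. The sketch below assumes that (2.7) has the form $M(\xi)\,\sigma \ge 1/\sqrt{12}$, which is the form consistent with the equality case for uniform distributions. The list of examples and their closed-form constants are illustrative choices, not taken from the text.

```python
import math

# (M, sigma): maximum of the density and standard deviation, both in closed form.
cases = {
    "uniform on (0, 1)": (1.0, math.sqrt(1.0 / 12.0)),
    "triangular on (-1, 1)": (1.0, math.sqrt(1.0 / 6.0)),
    "standard normal": (1.0 / math.sqrt(2.0 * math.pi), 1.0),
    "Laplace, scale 1": (0.5, math.sqrt(2.0)),
    "exponential, rate 1": (1.0, 1.0),
}

lower_bound = 1.0 / math.sqrt(12.0)
products = {name: M * sigma for name, (M, sigma) in cases.items()}
for name, value in products.items():
    print(f"{name}: M*sigma = {value:.4f} (lower bound {lower_bound:.4f})")
```

Only the uniform law attains the bound; the other products are strictly larger.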

L p -Norms of Characteristic Functions and Orlicz Norms
One useful consequence of (2.4) is the next bound on L 2m -norms of characteristic functions.
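Proof. We apply Proposition 2.2 to the $2m$ summands $\xi_1, \dots, \xi_m, -\xi_1', \dots, -\xi_m'$, where $\xi_k$ and $\xi_k'$ are independent copies of $\xi$. Introduce the symmetrized random variable $\tilde S_m = S_m - S_m'$, where $S_m = \xi_1 + \dots + \xi_m$ and $S_m'$ is an independent copy of $S_m$. By (2.4), we then get $M(\tilde S_m) \le M(\xi)/\sqrt{m}$. In addition, $\tilde S_m$ has characteristic function $|g(t)|^{2m}$. If $M(\xi)$ is finite, one may apply Proposition 2.1 and conclude that $\tilde S_m$ has a bounded continuous density $\tilde q_m(x)$ which is vanishing at infinity. Moreover, $\tilde q_m(x)$ is maximized at $x = 0$, and its value at this point is described by the inversion formula (2.2), which gives $\frac{1}{2\pi} \int_{-\infty}^{\infty} |g(t)|^{2m}\, dt = \tilde q_m(0) = M(\tilde S_m) \le \frac{M(\xi)}{\sqrt{m}}$. Using (2.3), one can obtain a similar relation, but without the factor $1/\sqrt{m}$ in (3.1).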
When $M(\xi)$ is finite and $m$ is large, this bound may be considerably sharpened asymptotically with respect to $m$ when restricting the integration to the regions $|t| \ge \varepsilon > 0$. Before making this precise, first let us note that, since the random variable $\xi$ has a density, we have $\sup_{|t| \ge \varepsilon} |g(t)| < 1$ (3.2) for all $\varepsilon > 0$. This holds by continuity of $g$, and since $|g(t)| < 1$ for all $t \ne 0$ (which is true for any non-lattice distribution), while $g(t)$ tends to zero as $|t| \to \infty$, by the Riemann-Lebesgue lemma. By the way, this property continues to hold in the more general situation where the $m$-fold convolution of the distribution of $\xi$ with itself has a density (while the distribution of $\xi$ might not be absolutely continuous). Indeed, in that case, (3.2) may be applied to $g^m$, and it remains to notice that this relation does not depend on $m$.
The property (3.2) may be quantified using, for example, the following observation due to Statuljavičus [20]. Proposition 3.2. If a random variable $\xi$ has a bounded density with $M = M(\xi)$ and finite variance $\sigma^2 = \mathrm{Var}(\xi)$, $\sigma > 0$, then its characteristic function $g$ satisfies, for all $\varepsilon > 0$, This relation may be extended to unbounded densities $q$, in which case the parameter $M$ should be replaced with quantiles of the random variable $q(\xi)$. The moment condition may also be removed, and instead it is sufficient to deal with quantiles of $|\xi - \xi'|$, where $\xi'$ is an independent copy of $\xi$; cf. [4] for details.
Returning to (3.1) and applying (3.3) with ε ≤ 1, we then have with some absolute constant C. Thus, the resulting bound decays asymptotically fast in m.
Let us derive a similar bound in the scheme of independent copies (X n ) n≥1 of the random variable X with Var(X) = 1, assuming that the normalized sum Z n has a bounded density for n = n 0 with M = M (Z n 0 ).Consider the characteristic function f (t) = E e itX .We apply Propositions 3.
, while m ≥ n 8n 0 , and we arrive at By (3.3) with ε ≤ 1, we also have which may be simplified to Combining the two bounds, one may summarize.
Corollary 3.3. Let $\mathrm{Var}(X) = 1$, and suppose that $Z_n$ has a density for $n = n_0$ bounded by $M$. Then, for all $0 < \varepsilon \le 1$ and $n \ge 4n_0$, the characteristic function $f$ of $X$ satisfies (3.4)

Behavior of Characteristic Functions near Zero
While the boundedness of the density is important to control integrability properties of powers of the characteristic function of a random variable $X$, the condition (1.2) on the finiteness of an exponential moment of $X$ guarantees that the characteristic function $f(z) = E e^{izX}$, $z = t + iy$, is well-defined and analytic in the strip $|y| = |\mathrm{Im}(z)| < b$ of the complex plane. Equivalently, we will assume throughout that, for some $\alpha > 0$, $E e^{\alpha|X|} \le 2$ (4.1). This parameter is more convenient to quantify the behavior of $f(z)$ near zero. For example, using $x e^{-x} \le e^{-1}$ ($x \ge 0$) and assuming that $|y| \le \alpha/2$, we then have $|f'(z)| \le E|X| e^{|y||X|} \le E|X| e^{\alpha|X|/2} \le \frac{2}{\alpha e}\, E e^{\alpha|X|} \le \frac{4}{\alpha e}$. Hence $|f(z) - 1| \le \frac{4}{\alpha e}\, |z|$ (since $f(0) = 1$). Thus, we obtain: This allows one to consider the log-Laplace transform $K(z) = \log E e^{zX}$. One may also bound the derivatives of all orders.
Lemma 4.2. For all complex numbers $z$ in the disc $|z| \le \alpha/4$, As a consequence, Thus, these derivatives have at most a factorial growth in absolute value with respect to the growing parameter $k$. For the particular orders $k = 2$ and $k = 3$, and under our moment assumptions, the bound (4.3) may be refined in a smaller disc according to (4.4)-(4.5).
Proof. To obtain (4.3), one may apply Cauchy's formula with $r = \alpha/4$ together with the second bound in (4.2).
We shall now show that |f (z)| is bounded away from 1 in a certain region near zero.Proof.Using f ′ (0) = 0, f ′′ (0) = −1, one may start with an integral Taylor formula This equality is needed in the disc |z| ≤ r of radius r = 1 8 5 4 α 3 .By the triangle inequality, we then have where we used |t| ≤ 1 4 on the last step.Thus, (4.10) follows.Turning to the maximum in (4.9), one may apply the last bound in (4.For z = t + iy, |y| ≤ 1  2 |t|, we have |z| 3 ≤ ( 5 4 ) 3/2 |t| 3 , and (4.9)-(4.10)therefore give Finally, let us make a few remarks about the relationship between the conditions (1.2) and (4.1).When the random variable X has a finite exponential moment, and α is optimal, then (4.1) becomes an equality.In this case, the quantity 1 α represents the Orlicz norm of X generated by the Young function ψ(x) = e |x| − 1, x ∈ R: If EX 2 = 1, the parameter α may not be large, since the L 2 -norm is dominated by the L ψ -norm.More precisely, using x 2 e −x ≤ 4e −2 (x ≥ 0), we have In fact, this bound may be sharpened.Proof.We may assume that X ≥ 0 and then we need to show that E e X > 2. It is easy to check that x + 1 6 x 3 ≥ ax 2 for all x ≥ 0 with the optimal constant a = 2 Note that if we start with a more general condition B = Ee b|X| < ∞ as in Theorem 1.1, (4.1) is fulfilled for a certain constant α > 0. Indeed, if B ≤ 2, then one may take α = b.
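The restriction on $\alpha$ discussed above can be illustrated by computing the optimal $\alpha$, that is, the root of $E e^{\alpha|X|} = 2$, for a few laws with $EX = 0$ and $EX^2 = 1$. This is a minimal sketch; the three example distributions and the bisection helper are illustrative assumptions, not taken from the text.

```python
import math

def bisect_root(fn, lo, hi, iters=200):
    """Root of an increasing function fn on [lo, hi] with fn(lo) < 0 < fn(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mgf_abs_normal(a):
    """E exp(a|X|) for X ~ N(0, 1): 2 * exp(a^2/2) * Phi(a)."""
    return 2.0 * math.exp(0.5 * a * a) * 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def mgf_abs_uniform(a):
    """E exp(a|X|) for X uniform on [-sqrt(3), sqrt(3)]: (e^u - 1)/u, u = sqrt(3)*a."""
    u = math.sqrt(3.0) * a
    return math.expm1(u) / u

def mgf_abs_laplace(a):
    """E exp(a|X|) for the Laplace law with variance 1; finite for a < sqrt(2)."""
    return math.sqrt(2.0) / (math.sqrt(2.0) - a)

laws = {"normal": mgf_abs_normal, "uniform": mgf_abs_uniform, "laplace": mgf_abs_laplace}
alphas = {name: bisect_root(lambda a, m=mgf: m(a) - 2.0, 1e-9, 1.3)
          for name, mgf in laws.items()}
for name, alpha in alphas.items():
    print(name, round(alpha, 4))
```

In all three cases the optimal value stays below 1, in agreement with the sharpened bound used later in the form $\alpha < 1$.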

Saddle Point and Taylor Expansions
Assume that EX = 0, EX 2 = 1, and E e α|X| ≤ 2 (α > 0).Since the log-Laplace transform K(z) = log E e zX was defined as an analytic function in the disc |z| ≤ α 2 of the complex plane, it may be expanded as an absolutely convergent power series Here, the coefficients γ k = K k) (0) are called cumulants of X.Every γ k represents a certain polynomial in moments of X up to order k.In particular, γ 3 = EX 3 and γ 4 = EX 4 − 3. Similarly, The next object is important for contour integration.Definition 5.1.Given τ ∈ C, a saddle point is a solution z 0 = z 0 (τ ) of the equation Thus, a saddle point is the solution of Proposition 5.2.In the disc |τ | ≤ α 3 32 , the equation (5.1) has a unique solution z 0 (τ ).Moreover, it represents an injective analytic function satisfying z ′ 0 (0) = 1 and Proof.Let us use (5.2) as the definition of the analytic function τ = K ′ (z).If τ is sufficiently small, say |τ | ≤ τ 0 , this equality may be inverted as a power series in τ , 16 , define the path z t = (1 − t)z 1 + tz 2 connecting these points.We have As a consequence, the map z → τ (z) is injective in the disc |z| ≤ α 3 16 .In addition, since τ (0) = 0, we have Therefore, the image of the circle |z| = α 3 16 under this map represents a closed curve on the complex plane outside the circle |τ | = α 3 32 .Since the image of the disc |z| ≤ α 3 16 under τ is a connected set, while τ (0) = 0, this set must contain the disc |τ | ≤ α 3 32 .Thus, the inverse map z 0 (τ ) = τ −1 is well-defined and represents a holomorphic injective function in |τ | ≤ α 3 32 satisfying (5.3), by (5.6), and z ′ 0 (0) = 1, by (5.4).Hence, one may take τ 0 = α 3 32 .In addition, z 0 (τ ) takes real values for real τ .Indeed, since all cumulants are real numbers, τ (z) is real for real z, so is the inverse function z 0 .Also, by (5.5), which shows that z 0 (τ ) > 0 as long as 0 < τ ≤ α 3 32 (since the expression under the integral sign is a real-valued function whose absolute value does not exceed 1/2).
It is natural to determine the leading term in the Taylor expansion for z 0 (τ ) when expanding this function as a power series in τ .Assuming that X is not normal, let γ m (m ≥ 3) be the first non-zero cumulant of X.Then, as |z| → 0, ) as τ → 0, cf.Proposition 5.2, we get from (5.7) Therefore, (5.9) Next, let us write down the Taylor expansion around the point z 0 = z 0 (τ ): Here, we used the property that the function K(z) − τ z has derivative K ′ (z 0 ) − τ = 0 at the saddle point z = z 0 .Thus, the linear term in (5.10) corresponding to k = 1 is vanishing.As for the free term corresponding to k = 0, we have Using (5.9) and (5.11), we actually have (5.12) Thus, applying Proposition 5.2 and recalling that K(z) is analytic in |z| ≤ α 2 (Lemma 4.1), we obtain: is well-defined and analytic in the disc |τ | ≤ α 3 32 .Moreover, as τ → 0, It follows that λ(τ ) is bounded for small τ , but we will need to quantify this property in terms of the parameter α.Recall that EX = 0, EX 2 = 1 and E e α|X| ≤ 2 (α > 0).Proposition 5.5.We have Proof.Proposition 5.2 allows us to apply Cauchy's formula, which yields 64 .Moreover, the latter implies, by (5.3), Next, we note that, by Definition 5.1 of the saddle point, cf.(5.1), the function has the first three derivatives we may apply the Taylor integral formula together with (5.15) to conclude that As ψ(τ ) = τ 3 λ(τ ), the relation (5.14) follows.
Let us now introduce another analytic function which appears in the representation (1.6) of Theorem 1.2.
Let us also mention that the function $K(z)$ is convex and has a positive second derivative on the real line, more precisely, on the interval where it is finite. Hence $\mu(\tau)$ is real-valued for real $\tau$.
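The objects of this section can be evaluated numerically for a concrete law. The sketch below takes $X$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (an illustrative choice, not from the text), for which $K(z) = \log\big(\sinh(\sqrt{3} z)/(\sqrt{3} z)\big)$, $m = 4$ and $\gamma_4 = -6/5$. It assumes, as the surrounding derivation suggests, that $\tau^3 \lambda(\tau) = K(z_0) - \tau z_0 + \tau^2/2$ and $\mu(\tau) = \frac{1}{2} \log K''(z_0)$; the saddle point is found by Newton's method, which is a hypothetical choice of solver.

```python
import math

SQRT3 = math.sqrt(3.0)

def K(z):
    """Log-Laplace transform of X uniform on [-sqrt(3), sqrt(3)], z != 0."""
    u = SQRT3 * z
    return math.log(math.sinh(u) / u)

def K1(z):
    """First derivative K'(z) = sqrt(3) * coth(sqrt(3) z) - 1/z."""
    return SQRT3 / math.tanh(SQRT3 * z) - 1.0 / z

def K2(z):
    """Second derivative K''(z) = 1/z^2 - 3/sinh^2(sqrt(3) z)."""
    return 1.0 / (z * z) - 3.0 / math.sinh(SQRT3 * z) ** 2

def saddle_point(tau, iters=60):
    """Solve K'(z) = tau for real tau != 0 by Newton's method started at z = tau."""
    z = tau
    for _ in range(iters):
        z -= (K1(z) - tau) / K2(z)
    return z

def lam(tau):
    """lambda(tau), assuming tau^3 * lambda(tau) = K(z0) - tau*z0 + tau^2/2."""
    z0 = saddle_point(tau)
    return (K(z0) - tau * z0 + 0.5 * tau * tau) / tau ** 3

def mu(tau):
    """mu(tau), assuming mu(tau) = (1/2) * log K''(z0)."""
    return 0.5 * math.log(K2(saddle_point(tau)))

gamma4 = -6.0 / 5.0            # fourth cumulant of this law (m = 4)
tau = 0.1
z0 = saddle_point(tau)
lam_leading = gamma4 / math.factorial(4) * tau        # gamma_m / m! * tau^(m-3)
mu_leading = gamma4 / (2.0 * math.factorial(2)) * tau ** 2  # gamma_m / (2 (m-2)!) * tau^(m-2)
print(z0, lam(tau), lam_leading, mu(tau), mu_leading)
```

For small $\tau$ the computed values of $\lambda$ and $\mu$ should be close to their leading terms, and $z_0(\tau)$ is positive for $\tau > 0$, as noted in the text.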

Contour Integration
Let $(X_n)_{n \ge 1}$ be independent copies of a random variable $X$ with $EX = 0$, $\mathrm{Var}(X) = 1$, and characteristic function $f(t) = E e^{itX}$. We now consider the normalized sum $Z_n = (X_1 + \dots + X_n)/\sqrt{n}$, assuming that $M = M(Z_{n_0})$ is finite. As already discussed in Section 2, in this case all $Z_n$ with $n \ge 2n_0$ have continuous bounded densities expressed by the inversion formula $p_n(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f_n(t)\, dt$, where $f_n(t) = f(t/\sqrt{n})^n$ denotes the characteristic function of $Z_n$. Equivalently, $p_n(x) = \frac{\sqrt{n}}{2\pi} \int_{-\infty}^{\infty} e^{-itx\sqrt{n}} f(t)^n\, dt$ (6.1). Using contour integration, one can cast this formula in a different form involving the log-Laplace transform $K(z) = \log E e^{zX}$ and the saddle point $z_0 = z_0(\tau)$ for the real value $\tau = x/\sqrt{n}$. This is a preliminary step towards Theorems 1.2 and 1.4.
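The effect of passing to the saddle point can be seen numerically by comparing the exact ratio $p_n(x)/\varphi(x)$ with the saddle-point expression $\exp\{n(K(z_0) - \tau z_0 + \tau^2/2)\}/\sqrt{K''(z_0)}$. The following is a minimal sketch, assuming that this expression is the leading factor $\exp\{n\tau^3\lambda(\tau) - \mu(\tau)\}$ of the refined representation, and taking $X$ uniform on $[-\sqrt{3}, \sqrt{3}]$ so that $p_n$ is an explicit Irwin-Hall density. The example law, the helper names and the Newton solver are illustrative assumptions.

```python
import math

SQRT3 = math.sqrt(3.0)

def K(z):
    """Log-Laplace transform of X uniform on [-sqrt(3), sqrt(3)], z != 0."""
    u = SQRT3 * z
    return math.log(math.sinh(u) / u)

def K1(z):
    return SQRT3 / math.tanh(SQRT3 * z) - 1.0 / z

def K2(z):
    return 1.0 / (z * z) - 3.0 / math.sinh(SQRT3 * z) ** 2

def saddle_point(tau, iters=60):
    """Solve K'(z) = tau for real tau != 0 by Newton's method started at z = tau."""
    z = tau
    for _ in range(iters):
        z -= (K1(z) - tau) / K2(z)
    return z

def irwin_hall_pdf(n, s):
    """Density at s of the sum of n independent Uniform(0, 1) variables (n >= 2)."""
    if s <= 0.0 or s >= n:
        return 0.0
    total = 0.0
    for k in range(int(s) + 1):
        total += (-1) ** k * math.comb(n, k) * (s - k) ** (n - 1)
    return total / math.factorial(n - 1)

def exact_ratio(n, x):
    """Exact p_n(x)/phi(x) for the uniform example."""
    scale = math.sqrt(n / 12.0)
    p = scale * irwin_hall_pdf(n, n / 2.0 + x * scale)
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return p / phi

def saddle_ratio(n, x):
    """exp(n*(K(z0) - tau*z0 + tau^2/2)) / sqrt(K''(z0)) with tau = x/sqrt(n)."""
    tau = x / math.sqrt(n)
    z0 = saddle_point(tau)
    exponent = n * (K(z0) - tau * z0 + 0.5 * tau * tau)
    return math.exp(exponent) / math.sqrt(K2(z0))

n = 12
rel_errors = {x: abs(saddle_ratio(n, x) / exact_ratio(n, x) - 1.0)
              for x in (0.5, 1.0, 2.0)}
for x, err in rel_errors.items():
    print(x, round(err, 4))
```

The relative discrepancy is expected to be of order $1/n$ uniformly over the tested points, which is the kind of $n$-dependent remainder the refinement aims at.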
As before, let E e α|X| ≤ 2 with a parameter α > 0. Proof.Applying Corollary 3.3, we get from (6.1) that, for any ε ∈ (0, 1], Assuming for definiteness that x ≥ 0, we take the rectangle contour with segment parts where h > 0 is chosen to satisfy h ≤ ε 2 .With this choice the complex numbers z = t + iy with |t| ≤ ε, |y| ≤ h lie in the domain of the definition of K(z).Then, by Cauchy's theorem, Note that in the lower half-plane z = t − iy, 0 ≤ y ≤ h, we have |e Moreover, |f (z)| is bounded away from 1 on L 2 and L 4 according to (6.5) which gives ≤ ε e −nε 2 /5 ≤ 1 16 e −nε 2 /5 .
As a next step, let us show that, at the expense of a small error, the integration in (6.2) may be restricted to the interval |t| ≤ t n with This can be achieved under stronger conditions such as Indeed, using (5.10) and (5.12) in the representation (6.2), one may rewrite (6.2) as where τ = x √ n and ρ k = K (k) (z 0 ).Equivalently Here, the new remainder term is still exponentially small with respect to n due to the first condition in (6.6), which strengthens the assumption |τ | ≤ ε 2 in Lemma 6.1 (recall that M ≥ 1  12 ).In this case, the expression in the exponent will be of order − cnε 2 n 0 M 2 up to an absolute constant c > 0. Hence, (6.7) yields where where we used α < 1 and |t| ≤ ε ≤ 1 80 α 3 so as to bound the last sum, according to the second assumption in (6.6).Hence, when restricted to |t| ≥ t n , the absolute value of the integral in (6.8) does not exceed As a result, assuming the conditions (6.6), where t ′ n = min(t n , ε), |θ j | ≤ 1, and where R n is now defined in (6.9).

Proof of Theorem 1.4
As a final step, we need to explore an asymptotic behavior of the integral in (6.11),where we recall that ρ k = K (k) (z 0 ), z 0 = z 0 (τ ) being the saddle point for τ = x/ √ n.In view of the conditions in (6.6), we choose with c 0 = 1/6400.Note that with this choice the definition (6.9) becomes where c 1 > 0 is an absolute constant.Suppose that |τ | ≤ cτ 0 with a constant 0 < c ≤ 1 to be chosen later on.The integrand in (6.11) may be written as First assume that n ≥ n 1 = max(4n 0 , ε −4 ) which insures that t ′ n = t n (since ε < 1 80 ).As |t| ≤ t n , from (6.10) it follows that Here and below B denotes a quantity, perhaps different in different places, bounded by an absolute constant.With this convention, since n ≥ ( 80 α 3 ) 4 , we also have nv(t) = B, and by (6.10) with k = 3, which dominates (7.2) and the previous two expressions, we obtain that and (6.11) is simplified to Next, one may extend the integration in (7.4) to the whole real line at the expense of an error not exceeding where we used ρ 2 ≥ 1 2 .The latter bound is dominated by Bα −6 (log n) 3 n , and since the integral over the whole real line is equal to √ ρ 2 n , we obtain from (7.4) a simpler representation √ ρ 2 e nτ 3 λ(τ ) 1 + Bα −6 (log n) 3 n + Bn −1 e nτ 3 λ(τ ) + BR n .
Here, the first remainder term may be absorbed in the brackets, so that this formula is further simplified to (1400 M 2 n 0 ) 1/3 α 3 .(7.6) Since M ≥ 1 12 , we have (M 2 n 0 ) 1/3 ≤ 12 4/3 M 2 n 0 .Hence (7.6) may be strengthened to |τ | ≤ cτ 0 with a suitable constant c > 0. Under this condition, from (7.5) we thus get where R n is still defined as in (7.1) with a new constant c 1 .
It is now useful to note that the last error term in this representation is dominated by the second last one for sufficiently large n.Indeed, using e −y ≤ 2/y 2 (y > 0), we have where the last inequality holds true for n ≥ CM 4 n 2 0 α −12 with an absolute constant C > 0. This condition is slightly stronger than n ≥ n 1 which was assumed before.As a result, (7.Let us also note that the case 2n 0 ≤ n < n 1 is not interesting, since then |x| ≤ τ 0 n 1 , and (1.6) holds true by choosing a suitable constant in O in (1.6).In the remaining bounded interval |x| ≤ m, this argument does not work, and it is better to employ the Chebyshev-Edgeworth expansion for the correction ϕ m (x) in (1.1) (which depends on n as well).In terms of the first non-zero cumulant, (1.1) may be written more accurately as , where H m (x) denotes the Chebyshev-Hermite polynomial of degree m.As a consequence, for any constant x 0 > 0, which is stronger than (1.7), since m is even (m ≥ 4).
.3) So, one may use the Taylor expansion e x = 1 + x + O(x 2 ) in a bounded interval |x| ≤ B with