Central limit theorems for Diophantine approximants

In this paper we study certain counting functions which represent the numbers of solutions of systems of linear inequalities arising in the theory of Diophantine approximation. We develop a method that allows us to explain the random-like behavior that these functions exhibit and prove a central limit theorem for them. Our approach is based on a quantitative study of higher-order correlations for functions defined on the space of lattices and a novel technique for estimating cumulants of Siegel transforms.


Motivation
Many objects which arise in Diophantine Geometry exhibit random-like behavior. For instance, the classical Khinchin theorem in Diophantine approximation can be interpreted as the Borel-Cantelli Property for quasi-independent events, while Schmidt's quantitative generalization of Khinchin's Theorem is analogous to the Law of Large Numbers. One might ask whether much deeper probabilistic phenomena also take place.
In this paper, we develop a general framework which allows us to capture certain independence properties which govern the asymptotic behavior of arithmetic counting functions. We expect that the new methods will have a wide range of applications in Diophantine Geometry; here we apply the techniques to study the distribution of a class of counting functions which we now describe.
A basic problem in Diophantine approximation is to find "good" rational approximants of vectors $u = (u_1, \ldots, u_m) \in \mathbb{R}^m$. More precisely, given positive numbers $w_1, \ldots, w_m$, which we shall assume sum to one, and positive constants $\vartheta_1, \ldots, \vartheta_m$, we consider the system of inequalities
$$|p_i + u_i q| < \vartheta_i\, q^{-w_i}, \quad i = 1, \ldots, m, \tag{1.1}$$
with $(p, q) \in \mathbb{Z}^m \times \mathbb{N}$. It is well-known that for Lebesgue-almost all $u \in \mathbb{R}^m$, the system (1.1) has infinitely many solutions $(p, q) \in \mathbb{Z}^m \times \mathbb{N}$, so it is natural to try to count solutions in bounded regions, which leads us to the counting function
$$\Delta_T(u) := |\{(p, q) \in \mathbb{Z}^m \times \mathbb{N} : 1 \leq q < T \text{ and (1.1) holds}\}|.$$
Schmidt [15] proved that for Lebesgue-almost all $u \in [0, 1]^m$,
$$\Delta_T(u) = C_m \log T + O_{u,\varepsilon}\big((\log T)^{1/2+\varepsilon}\big) \quad \text{for all } \varepsilon > 0, \tag{1.2}$$
where $C_m := 2^m \vartheta_1 \cdots \vartheta_m$. One may view this as an analogue of the Law of Large Numbers; the heuristic for this analogy runs along the following lines. First, note that $\Delta_T(u)$ can be written as a sum of functions $\Delta^{(s)}(u)$ indexed by $s$. If one could prove that the functions $\Delta^{(s_1)}(\cdot)$ and $\Delta^{(s_2)}(\cdot)$ were "quasi-independent" random variables on $[0, 1]^m$, at least when $s_1$, $s_2$, and $|s_1 - s_2|$ are sufficiently large, then (1.2) would follow by some version of the Law of Large Numbers. Moreover, the same heuristic further suggests that, in addition to the Law of Large Numbers, a central limit theorem and perhaps other probabilistic limit laws also hold for $\Delta_T(\cdot)$.
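To make the counting function and the main term in (1.2) concrete, the following minimal Python sketch counts solutions by brute force. It assumes that the system (1.1) reads $|p_i + u_i q| < \vartheta_i q^{-w_i}$ for $i = 1, \ldots, m$; the function names are illustrative only, and floating-point arithmetic may misjudge points lying exactly on the boundary of the region.

```python
import math


def count_solutions(u, w, theta, T):
    """Count (p, q) in Z^m x N with 1 <= q < T such that
    |p_i + u_i q| < theta_i * q**(-w_i) for every i.
    The form of the inequalities is an assumption about (1.1)."""
    total = 0
    for q in range(1, math.ceil(T)):  # all integers q with 1 <= q < T
        per_q = 1
        for u_i, w_i, th_i in zip(u, w, theta):
            radius = th_i * q ** (-w_i)
            center = -u_i * q
            lo = math.floor(center - radius)
            hi = math.ceil(center + radius)
            # number of integers p with |p + u_i q| < radius
            hits = sum(1 for p in range(lo, hi + 1) if abs(p + u_i * q) < radius)
            per_q *= hits  # coordinates are independent once q is fixed
            if per_q == 0:
                break
        total += per_q
    return total


def main_term(theta, T):
    """Main term C_m log T from (1.2), with C_m = 2^m * theta_1 * ... * theta_m."""
    return 2 ** len(theta) * math.prod(theta) * math.log(T)
```

For a fixed $u$ the count is deterministic; the statement (1.2) concerns its growth for almost every $u$, which a small experiment of this kind can only illustrate, not verify.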
In this paper, we put the above heuristic on firm ground. We do so by representing $\Delta^{(s)}(\cdot)$ as a function on the space of unimodular lattices. It turns out that the "quasi-independence" of the family $(\Delta^{(s)})$ that we are trying to capture can be translated into the dynamical language of higher-order mixing for a subgroup of linear transformations acting on the space of lattices.

Main results
We are not the first to explore central limit theorems for Diophantine approximants. The one-dimensional case ($m = 1$) has been thoroughly investigated by Leveque [11,12], Philipp [13], and Fuchs [6], leading to the following result proved by Fuchs [6]: there exists an explicit $\sigma > 0$ such that the counting function, suitably normalized, converges in distribution to the normal distribution with variance $\sigma$. Central limit theorems in higher dimensions when $w_1 = \cdots = w_m = 1/m$ have recently been studied by Dolgopyat et al. [3]. In this paper, using very different techniques, we establish a CLT for general exponents $w_1, \ldots, w_m$ (Theorem 1.1). Our proof of Theorem 1.1, as well as the proof in [3], proceeds by interpreting $\Delta_T(\cdot)$ as a function on a certain subset $Y$ of the space of all unimodular lattices in $\mathbb{R}^{m+1}$, and then studying how the sequence $a^s Y$, where $a$ is a fixed linear transformation of $\mathbb{R}^{m+1}$, distributes inside this space. However, the arguments in the two papers follow very different routes. The proof in [3] contains a novel refinement of the martingale method (this approach was initiated in this setting by Le Borgne [10]). Here, one crucially uses the fact that when $w_1 = \cdots = w_m = 1/m$, the set $Y$ is an unstable manifold for the action of $a$ on the space of lattices. For general weights, $Y$ has strictly smaller dimension than the unstable leaves, and it seems challenging to apply martingale approximation techniques. Instead, our method involves a quantitative analysis of higher-order correlations for functions on the space of lattices. We establish an asymptotic formula for correlations of arbitrary orders and use this formula to compute limits of all the moments of $\Delta_T(\cdot)$ directly. One of the key innovations of our approach is an efficient way of estimating sums of cumulants (alternating sums of moments) developed in our recent work [2].

An outline of the proof of Theorem 1.2
We begin by observing that T (·) can be interpreted as a function on the space of lattices in R m+n . Given u ∈ M m,n ([0, 1]), we define the unimodular lattice u in R m+n by and we see that where T denotes the domain The space X of unimodular lattices in R m+n is naturally a homogeneous space of the group SL m+n (R) equipped with the invariant probability measure μ X . The set Y := { u : u ∈ M m,n ([0, 1])} is a mn-dimensional torus embedded in X , and we equip Y with the Haar probability measure μ Y , interpreted as a Borel measure on X . We further observe (see Sect. 6 for more details) that each domain T can be tessellated using a fixed diagonal matrix a in SL m+n (R), so that for a suitable function χ : X → R, we have Hence we are left with analyzing the distribution of values for the sums N s=0χ (a s y) with y ∈ Y. This will allow us to apply techniques developed in our previous work [2], as well as in [1] (joint with M. Einsiedler). Intuitively, our arguments will be guided by the hope that the observablesχ • a s are "quasi-independent" with respect to μ Y . Due to the discontinuity and unboundedness of the functionχ on X , it gets quite technical to formulate this quasi-independence directly. Instead, we shall argue in steps.
We begin in Sect. 2 by establishing quasi-independence for observables of the form $\phi \circ a^s$, where $\phi$ is a smooth and compactly supported function on $X$. This amounts to an asymptotic formula (Corollary 2.4) for the higher-order correlations
$$\int_Y \phi_1(a^{s_1} y) \cdots \phi_r(a^{s_r} y)\, d\mu_Y(y). \tag{1.10}$$
It will be crucial for our arguments later that the error term in this formula is explicit in terms of the exponents $s_1, \ldots, s_r$ and in (certain norms of) the functions $\phi_1, \ldots, \phi_r$. In Sect. 3, we use these estimates to prove central limit theorems for sums of the observables $\phi \circ a^s$. To do this, we use an adaptation of the classical Cumulant Method (see Proposition 3.4), which provides bounds on cumulants (alternating sums of moments) given estimates on expressions as in (1.10), at least in certain ranges of the parameters $(s_1, \ldots, s_r)$.
Here we shall exploit the decomposition (3.7) into "separated"/"clustered" tuples. We stress that the cumulant $\mathrm{Cum}^{(r)}(F_N)$ of order $r$ can be expressed as a sum of $O(N^r)$ terms, normalized by $N^{r/2}$, so that in order to prove that it vanishes asymptotically, we require more than just square-root cancellation; however, the error term in the asymptotic formula for (1.10) is rather weak. Nonetheless, by using intricate combinatorial cancellations of cumulants, we can establish the required bounds.
In order to extend the method in Sect. 3 to the kind of unbounded functions which arise in our subsequent approximation arguments, we have to investigate possible escapes of mass for the sequence of tori $a^s Y$ inside the space $X$. In Sect. 4, we prove several results in this direction (see e.g. Proposition 4.5), as well as $L^p$-bounds (see Propositions 4.6 and 4.8). We stress that the general non-divergence estimates for unipotent flows developed by Kleinbock-Margulis [8] are not sufficient for our purposes, and in particular, the exact value of the exponent in Proposition 4.5 will be crucial for our argument. The proof of the $L^2$-norm bound in Proposition 4.8 is especially interesting in this regard since it uncovers that the escape of mass is related to delicate arithmetic questions; our arguments require careful estimates on the number of solutions of certain Diophantine equations.
To make the technical passages in the final steps of the proof of Theorem 1.2 a bit more readable, we shall devote Sect. 5 to central limit theorems for sums of the functions $\hat f \circ a^s$, where $f$ is a smooth and compactly supported function on $\mathbb{R}^{m+n}$, and $\hat f$ denotes the Siegel transform of $f$ (see Sect. 4.3 for definitions). We stress that even though $f$ is assumed to be bounded, $\hat f$ is unbounded on $X$. To prove the central limit theorems in this setting, we approximate $\hat f$ by compactly supported functions on $X$ and then use the estimates from Sect. 3. However, the bounds in these estimates crucially depend on the order of approximation, so this step requires a delicate analysis of the error terms. The non-divergence results established in Sect. 4 play an important role here.
Finally, to prove the central limit theorem for the function $\hat\chi$ (which is the Siegel transform of an indicator function $\chi$ of a nice bounded domain in $\mathbb{R}^{m+n}$), and thus establish Theorem 1.2, we need to approximate $\chi$ with smooth functions, and show that the arguments in Sect. 5 can be adapted to certain sequences of Siegel transforms of smooth and compactly supported functions. This will be done in Sect. 6.

Estimates on higher-order correlations
Let $X$ denote the space of unimodular lattices in $\mathbb{R}^{m+n}$. Setting $G := \mathrm{SL}_{m+n}(\mathbb{R})$ and $\Gamma := \mathrm{SL}_{m+n}(\mathbb{Z})$, we may consider the space $X$ as a homogeneous space under the linear action of the group $G$, so that $X \cong G/\Gamma$. Let $\mu_X$ denote the $G$-invariant probability measure on $X$. We fix $m, n \geq 1$ and denote by $U$ the subgroup
$$U := \left\{ \begin{pmatrix} I_m & u \\ 0 & I_n \end{pmatrix} : u \in M_{m,n}(\mathbb{R}) \right\} < G,$$
and set $Y := U\mathbb{Z}^{m+n} \subset X$. Geometrically, $Y$ can be visualized as an $mn$-dimensional torus embedded in the space of lattices $X$. We denote by $\mu_Y$ the probability measure on $Y$ induced by the Lebesgue probability measure on $M_{m,n}([0, 1])$, and we note that $Y$ corresponds to the collection of unimodular lattices $\Lambda_u$, for $u \in M_{m,n}([0, 1))$, introduced earlier in (1.8).
Let us further fix positive numbers w 1 , . . . , w m+n satisfying and denote by (a t ) the one-parameter semi-group The aim of this section is to analyze the asymptotic behavior of a t Y ⊂ X as t → ∞, and investigate "decoupling" of correlations of the form for "large" t 1 , . . . , t r > 0. It will be essential for our subsequent argument that the error terms in this "decoupling" are explicit in terms of the parameters t 1 , . . . , t r > 0 and suitable norms of the functions φ 1 , . . . , φ r , which we now introduce.
Every Y ∈ Lie(G) defines a first order differential operator If we fix an (ordered) basis {Y 1 , . . . , Y r } of Lie(G), then every monomial Z = Y 1 1 · · · Y r r defines a differential operator by of degree deg(Z ) = 1 + · · · + r . For k 1 and φ ∈ C ∞ c (X ), we define the norms and This identity readily extends to the universal enveloping algebras U(Lie(G)) as well, and thus we also have where Ad(g) denotes the extension of the Ad(g) from Lie(G) to U(Lie(G)). Since Ad(g)Z can be written as a finite sum of monomials of degrees not exceeding the degree of Z , we conclude that for every k 1, there exists a sub-multiplicative function g → C k (g) such that In particular, there a constant ξ = ξ(m, n, k) (which also depends on our fixed choice of weights w 1 , . . . , w m+n ) such that where the suppressed constants are independent of t and φ. The starting point of our discussion is a well-known quantitative estimate on correlations of smooth functions on X : Theorem 2.1 There exist γ > 0 and k 1 such that for all φ 1 , φ 2 ∈ C ∞ c (X ) and g ∈ G, This theorem has a very long history that we will not attempt to survey here, but only mention that a result of this form can be found, for instance, in [7,Corollary 2.4.4].
From now on we fix $k \geq 1$ so that Theorem 2.1 holds.
Our goal is to decouple the higher-order correlations in (2.3), but in order to state our results we first need to introduce a family of finer norms on C ∞ c (X ) than ( · L 2 (X ) k ). Let us denote by · C 0 the uniform norm on C c (X ). If we fix a right-invariant Riemannian metric on G, then it induces a metric d on X G/ , which allows us to define the norms for φ ∈ C ∞ c (X ). We shall prove:

Remark 2.3
The case r = 1 was proved by Kleinbock and Margulis in [9], and our arguments are inspired by theirs. We stress that the constant δ in Theorem 2.2 is independent of r .
We also record the following corollary of Theorem 2.2.

Preliminary results
We recall that d is a distance on X ∼ = G/ induced from a right-invariant Riemannian metric on G. We denote by B G (ρ) the ball of radius ρ centered at the identity in G. For a point x ∈ X , we let ι(x) denote the injectivity radius at x, that is to say, the supremum over ρ > 0 such that the map By Mahler's Compactness Criterion, K ε is a compact subset of X . Furthermore, using reduction theory, one can show: An important role in our argument will be played by the one-parameter semi-group b t := diag e t/m , . . . , e t/m , e −t/n , . . . , e −t/n , t > 0, (2.10) which coincides with the semi-group (a t ) as defined in (2.2) with the special choice of exponents The submanifold Y ⊂ X is an unstable manifold for the flow (b t ) which makes the analysis of the asymptotic behavior of b t Y significantly easier than that of a t Y for general parameters. Using Theorem 2.1, Kleinbock and Margulis proved in [7] a quantitative equidistribution result for the family b t Y as t → ∞, we shall use a version of this result from their later work [9].

Remark 2.7
Although the dependence on φ is not stated in [9, Theorem 2.3], the estimate is explicit in the proof. Indeed, in [9, Section 2, p. 390], the authors show under the assumptions above that wheref is an (explicit) smooth function on X with compact support constructed from f . Theorem 2.6 now follows from the decay of matrix coefficients in Theorem 2.1.
We will prove Theorem 2.2 through successive uses of Theorem 2.6. In order to make things more transparent, it will be convenient to embed the flow (a t ) as defined in ( and, with s t := (w 1 t, . . . , w m+n t), we see that a t = a(s t ).
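The relation between the multi-parameter family $a(s)$, the semi-group $(a_t)$ and the special flow $(b_t)$ from (2.10) can be checked numerically. The sketch below is written under two assumptions: that $a(s) = \mathrm{diag}(e^{s_1}, \ldots, e^{s_m}, e^{-s_{m+1}}, \ldots, e^{-s_{m+n}})$, and that the weights are normalized so that $w_1 + \cdots + w_m = w_{m+1} + \cdots + w_{m+n} = 1$. All function names are hypothetical.

```python
import math


def a_of_s(s, m):
    """Diagonal entries of a(s): the first m coordinates expand, the rest contract.
    This form of a(s) is an assumption, not taken from a displayed formula."""
    return [math.exp(x) for x in s[:m]] + [math.exp(-x) for x in s[m:]]


def a_t(weights, m, t):
    """a_t = a(s_t) with s_t = (w_1 t, ..., w_{m+n} t)."""
    return a_of_s([w * t for w in weights], m)


def b_t(m, n, t):
    """b_t = diag(e^{t/m}, ..., e^{t/m}, e^{-t/n}, ..., e^{-t/n}) as in (2.10)."""
    return [math.exp(t / m)] * m + [math.exp(-t / n)] * n


def log_det(diagonal):
    """Logarithm of the determinant of a positive diagonal matrix."""
    return sum(math.log(x) for x in diagonal)
```

With the weights $w_i = 1/m$ for $i \leq m$ and $w_i = 1/n$ for $i > m$, the two flows agree, and under the assumed normalization every $a_t$ has determinant one.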
In addition to Theorem 2.6, we shall also need the following quantitative nondivergence estimate for unipotent flows established by Kleinbock and Margulis in [9]. Theorem 2.8 [9,Cor. 3.4] There exists θ = θ(m, n) > 0 such that for every compact L ⊂ X and a Euclidean ball B ⊂ U centered at the identity, there exists T 0 > 0 such that for every ε ∈ (0, 1), x ∈ L, and s ∈ S + satisfying s T 0 , one has |{u ∈ B : a(s)ux / ∈ K ε }| ε θ |B|.

Proof of Theorem 2.2
Let us fix $r \geq 1$ and an $r$-tuple $(t_1, \ldots, t_r)$. Upon re-labeling, we may assume that $t_1 \leq \cdots \leq t_r$, so that
$$D := D(t_1, \ldots, t_r) = \min\{t_1, t_2 - t_1, \ldots, t_r - t_{r-1}\}. \tag{2.12}$$
The next lemma provides an additional parameter $s \in S^+$, which depends on the $r$-tuple $(t_1, \ldots, t_r)$. This parameter will be used throughout the proof of Theorem 2.2, and we stress that the accompanying constants $c_1$, $c_2$ and $c_3$ are independent of $r$ and the $r$-tuple $(t_1, \ldots, t_r)$.
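As a small illustration of the quantity $D$ in (2.12), the following Python sketch sorts the tuple and returns the smallest of $t_1$ and the consecutive gaps. The function name `separation` is illustrative only.

```python
def separation(ts):
    """Return D(t_1, ..., t_r) = min{t_1, t_2 - t_1, ..., t_r - t_{r-1}}
    after re-labeling so that t_1 <= ... <= t_r."""
    ordered = sorted(ts)
    # first entry, followed by the gaps between consecutive entries
    gaps = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    return min(gaps)
```

A large value of $D$ means that the times are both large and well separated, which is the regime where the decoupling estimate carries information.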

Lemma 2.9
There exist c 1 , c 2 , c 3 > 0 such that given any t r > t r −1 > 0, there exists s ∈ S + satisfying: Let us now continue with the proof of Theorem 2.2. With the parameter s provided by Lemma 2.9 above, we have where b z is defined as in (2.10) and D as in (2.12) Throughout the rest of the proof we fix a compact set ⊂ U . Our aim now is to estimate integrals of the form where f ranges over C ∞ c (U ) with supp( f ) ⊂ . Our proof will proceed by induction over r , the case r = 0 being trivial.
Before we can start the induction, we need some notation. Let ρ 0 and k be as in Theorem 2.6, and pick for 0 < ρ < ρ 0 , a non-negative for some fixed σ = σ (m, n, k) > 0. The integral I r can now be rewritten as follows: If we set Since s ∈ S + , the linear map v → a(−s)va(s), for v ∈ U ∼ = R mn , is non-expanding, and thus we can conclude that there exists a fixed Euclidean ball B in U (depending only on and ρ 0 ), and centered at the identity, such that if f s,u (v) = 0, then u ∈ B. This implies that the integral I r can be written as and Furthermore, We decompose the integral I r in (2.19) as where I r (ε) is the integral over and I r (ε) is the integral over B\B ε . To estimate I r (ε), we first recall that s c 1 D by (2.15), so if D T 0 /c 1 , where T 0 is as in Theorem 2.8 applied to L = K ε and B, then the same theorem implies that there exists θ > 0 such that for every ε ∈ (0, 1), and thus (2.23) Let us now turn to the problem of estimating I r (ε). Since the Riemannian distance d on G, restricted to U , and the Euclidean distance on U are equivalent on a small open identity neighborhood, we see that for v ∈ B G (ρ 0 ) and any t ∈ S + . Hence, using (2.16), we obtain that for all i = 1, . . . , r − 1, and thus, for all v ∈ B G (ρ 0 ), . This leads to the estimate Hence, using (2.21), we obtain that Since by (2.17), we apply Theorem 2.6 to estimate this integral. We recall that supp( ε m+n by Proposition 2.5. In particular, we may take to arrange that ι(x s,u ) > 2ρ. Hence, by applying Theorem 2.6, we deduce that there exist c, γ > 0 such that for all u ∈ B\B ε . Using (2.20) and (2.21) and z c 3 D, we deduce that Applying (2.24) one more time (in the backward direction), we get where we recognize the first term as I r −1 . Using (2.19) and (2.21), we now conclude that Hence, combining this estimate with (2.25), we deduce that and thus, in view of (2.23), This estimate holds whenever ρ < ρ 0 and ε ρ 1/(m+n) . 
Taking ρ = e −c 4 D for sufficiently small c 4 > 0 and ε ρ 1/(m+n) , we conclude that there exists δ > 0 such that for all sufficiently large D, The exponent δ depends on the constants c 2 and c 3 given by Lemma 2.9 and the parameters θ, c, σ, γ . In particular, δ is independent of r . By possibly enlarging the implicit constants we can ensure that the estimate (2.26) also holds for all r -tuples (t 1 , . . . , t r ), and not just the ones with sufficiently large D(t 1 , . . . , t r ). By iterating the estimate (2.26), using that I 0 is a constant, the proof of Theorem 2.2 is finished.
The map a defines a continuous self-map of the space X , which preserves μ X . We recall that the torus Y = U Z m+n ⊂ X is equipped with the probability measure μ Y . In this section, we shall prove a central limit theorem for the averages We stress that this result is not needed in the proof of Theorem 1.2, but we nevertheless include it here because we feel that its proof might be instructive before entering the proof of the similar, but far more technical, Theorem 6.1.

Remark 3.2 It follows from Theorem 2.1 that the variance $\sigma_\phi$ is finite.
Our main tool in the proof of Theorem 3.1 will be the estimates on higher-order correlations established in Sect. 2. To lighten the notation, we shall use a simplified version of Corollary 2.4 stated in terms of $C^k$-norms (note that $N_k \ll \|\cdot\|_{C^k}$):

The method of cumulants
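Let $(X, \mu)$ be a probability space. Given bounded measurable functions $\phi_1, \ldots, \phi_r$ on $X$, we define their joint cumulant as
$$\mathrm{Cum}^{(r)}_\mu(\phi_1, \ldots, \phi_r) := \sum_{P} (-1)^{|P|-1} (|P|-1)! \prod_{I \in P} \int_X \Big( \prod_{i \in I} \phi_i \Big)\, d\mu,$$
where the sum is taken over all partitions $P$ of the set $\{1, \ldots, r\}$. When it is clear from the context, we skip the subscript $\mu$. For a bounded measurable function $\phi$ on $X$, we also set $\mathrm{Cum}^{(r)}(\phi) := \mathrm{Cum}^{(r)}(\phi, \ldots, \phi)$. We shall use the following classical CLT-criterion (see, for instance, [5]).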

Proposition 3.4 Let (F T ) be a sequence of real-valued bounded measurable functions such that
and lim T →∞ Then for every ξ ∈ R, Since all the moments of a random variable can be expressed in terms of its cumulants, this criterion is equivalent to the more widely known "Method of Moments". However, the cumulants have curious, and very convenient, cancellation properties that will play an important role in our proof of Theorem 3.1.
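The cancellation properties of cumulants can be seen on a toy example. The sketch below evaluates the joint cumulant on a finite probability space using the standard partition formula $\sum_P (-1)^{|P|-1}(|P|-1)! \prod_{I \in P} \int \prod_{i \in I} \phi_i \, d\mu$, which we assume is the definition intended above. The two-coin space and all names in the code are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def expectation(values, weights):
    """Integral of a function on a finite probability space."""
    return sum(v * w for v, w in zip(values, weights))


def joint_cumulant(functions, weights):
    """Joint cumulant of r >= 1 functions, each given by its list of values."""
    total = Fraction(0)
    for partition in set_partitions(list(range(len(functions)))):
        k = len(partition)
        term = Fraction((-1) ** (k - 1) * factorial(k - 1))
        for block in partition:
            block_product = [1] * len(weights)
            for i in block:
                block_product = [x * y for x, y in zip(block_product, functions[i])]
            term *= expectation(block_product, weights)
        total += term
    return total


# Toy space: two independent fair coins; X and Y are the coordinate functions.
points = list(product([0, 1], repeat=2))
weights = [Fraction(1, 4)] * len(points)
X = [p[0] for p in points]
Y = [p[1] for p in points]
```

In this example the second cumulant of `X` is its variance, while any joint cumulant mixing the independent functions `X` and `Y` vanishes. This vanishing for independent groups of variables is the exact counterpart of the approximate "decoupling" exploited in the sequel.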
For a partition Q of {1, . . . , r }, we define the conditional joint cumulant with respect to Q as In what follows, we shall make frequent use of the following proposition.

Estimating cumulants
In this section, we shall estimate cumulants of the form for r 3. Since we shall later need to apply these estimate in cases when the function φ is allowed to vary with N , it will be important to keep track of the dependence on φ in our estimates. We shall decompose (3.5) into sub-sums where the parameters s 1 , . . . , s r are either "separated" or "clustered", and it will also be important to control their sizes. For this purpose, it will be convenient to consider the set {0, . . . , N − 1} r as a subset of R r +1 + via the embedding (s 1 , . . . , s r ) → (0, s 1 , . . . , s r ). Following the ideas developed in the paper [2], we define for non-empty subsets I and J of {0, . . . , r } and s = (s 0 , . . . , s r ) ∈ R r +1 + , For 0 α < β, we define The following decomposition of R r +1 + was established in our paper [2, Prop. 6.2]: given where the union is taken over the partitions In order to estimate the cumulant (3.5), we shall separately estimate the sums over (β r +1 ; N ) and Q (α j , β j+1 ; N ), the exact choices of the sequences (α j ) and (β j ) will be fixed at the very end of our argument.

Case 0: Summing over Ä(ˇr +1 ; N)
In this case, s i β r +1 for all i, and thus Hence, where the implied constants may depend on r , but we shall henceforth omit this subscript to simplify notations.

Case 1: Summing over
In this case, we have so that it follows that Hence, and s i α j for all i ∈ I 0 , and s i > β j+1 for all i / ∈ I 0 .
In particular, Let I be an arbitrary subset of {1, . . . , r }. In what follows, we shall show a precise version of the "decoupling": where we henceforth shall use the convention that the product is equal to one when Let us estimate the left hand side of (3.13). We begin by setting To prove (3.13), we expand We recall that when i / ∈ I 0 , we have s i β j+1 , and thus it follows from Corollary 3.
To estimate the other integrals in (3.15), we also apply Corollary 3.3. Let us first fix a subset J ⊂ I \I 0 and for each 1 h l, we pick i h ∈ I h , and set We note that for i ∈ I h , we have |s i − s i h | α j , and thus, by (2.7), there exists Using (3.14) and (3.17) and invariance of the measure μ X , we deduce that Hence, we conclude that We shall choose the parameters α j and β j+1 so that Next, we estimate the right hand side of (3.13). Let us fix 1 h l and for a subset J ⊂ I ∩ I h , we define As in (3.17), for some ξ > 0, Applying Corollary 3.3 to the function J and using that where we have used a-invariance of μ X . Combining (3.16) and (3.21), we deduce that Furthermore, multiplying out the products over I ∩ I h , we get (3.23) Comparing (3.20) and (3.23), we finally conclude that . This establishes (3.13) with an explicit error term. This estimate implies that for the partition Q = {I 0 , . . . , I }, By Proposition 3.5, so it follows that for all (s 1 , . . . , s r ) ∈ Q (α j , β j+1 ; N ),

Final estimates on the cumulants
Let us now return to (3.5). Upon decomposing this sum into the regions discussed above, and applying the estimates (3.9), (3.10) and (3.24) to respective region, we get the bound Given any γ > 0, we define the parameters β j inductively by the formula It easily follows by induction that β r +1 r γ , and we deduce from (3.25) that Taking γ = (r /δ) log N , we conclude that when r 3, (3.28)

Estimating the variance
In this section, we wish to compute the limit Setting s 1 = s + t and s 2 = t, we rewrite the above sums as (3.30) It follows from Corollary 3.3 that for fixed s and as t → ∞, We conclude that as t → ∞, and for fixed s, If one carelessly interchange limits above, one expects that as N → ∞, (3.31) To prove this limit rigorously, we need to say a bit more to ensure, say, dominated convergence. It follows from Corollary 3.3 that and thus, in combination with (3.30), This integral can also be estimated in a different way. If we set φ t = φ • a t , then we deduce from Corollary 3.3 that By (2.7), there exists ξ = ξ(m, n, k) > 0, such that Let us now combine (3.32) and (3.33): When t δ/(2ξ) s, we use (3.33), and when t δ/(2ξ) s, we use (3.32). If we set δ = min(δ/2, δ 2 /(2ξ)) > 0, then Hence, the Dominated Convergence Theorem applied to (3.29) yields (3.31).

Proof of Theorem 3.1
In this subsection we shall check that the conditions of Proposition 3.4 hold for the sequence (F N ) defined in (3.1). First, by construction, it is easy to check that and by (3.31), Furthermore, by (3.28), for every r 3, which finishes the proof.

Siegel transforms
We recall that the space X of unimodular lattices in R m+n can be identified with the quotient space G/ , where G = SL m+n (R) and = SL m+n (Z), which is endowed with the G-invariant probability measures μ X . We denote by m G a bi-G-invariant Radon measure on G. Given a bounded measurable function f : R m+n → R with compact support, we define its Siegel transformf : X → R by We stress thatf is unbounded on X , its growth is controlled by an explicit function α which we now introduce. Given a lattice in R m+n , we say that a subspace V of It follows from the Mahler Compactness Criterion that α is a proper function on X .
Using reduction theory, it is not hard to derive the following integrability of α, which is well-known in Geometry of Numbers (see e.g. [4, Lemma 3.10]).
In what follows, dz denotes the volume element on R m+n which assigns volume one to the unit cube. In our arguments below, we will make heavy use of the following two integral formulas: where P(Z m+n ) denotes the set of primitive integral vectors in Z m+n , and ζ denotes Riemann's ζ -function.
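The unboundedness of the Siegel transform can be observed directly. The following Python sketch assumes the standard definition $\hat f(\Lambda) = \sum_{z \in \Lambda \setminus \{0\}} f(z)$ and evaluates it by enumerating integer coefficient vectors in a box. The cut-off `coeff_bound` is a hypothetical parameter: the result is only correct when every lattice point in the support of $f$ has coefficients within this bound.

```python
from fractions import Fraction
from itertools import product


def siegel_transform(f, basis, coeff_bound):
    """Sum f over the nonzero lattice vectors whose coefficients with respect
    to `basis` lie in [-coeff_bound, coeff_bound] (a truncated enumeration)."""
    dim = len(basis)
    total = 0
    for coeffs in product(range(-coeff_bound, coeff_bound + 1), repeat=dim):
        if not any(coeffs):
            continue  # skip the zero vector
        vector = [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(dim)]
        total += f(vector)
    return total


def ball_indicator(radius):
    """Indicator function of the closed Euclidean ball of the given radius."""
    return lambda v: 1 if sum(x * x for x in v) <= radius * radius else 0


standard = [[1, 0], [0, 1]]                   # the lattice Z^2
skewed = [[Fraction(1, 4), 0], [0, 4]]        # unimodular, with a short vector
```

Both lattices have covolume one, yet the unit ball contains twice as many nonzero points of `skewed` as of `standard`; making the short vector shorter increases the count without bound.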

Non-divergence estimates
We retain the notation from Sect. 2. Given $0 < w_1, \ldots, w_m < n$ with $w_1 + \cdots + w_m = n$, we denote by $a$ the self-map on $X$ induced by
$$a = \mathrm{diag}(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1}). \tag{4.1}$$
Our goal in this subsection is to analyze the escape of mass for the submanifolds $a^s Y$ and bound the Siegel transforms $\hat f(a^s y)$ for $y \in Y$. The following proposition will play a very important role in our arguments.

Proposition 4.5
There exists κ > 0 such that for every L 1 and s κ log L, Proof Let χ L be the characteristic function of the subset {α < L} of X . By Mahler's Compactness Criterion, χ L has a compact support. We further pick a non-negative Since μ X is G-invariant, It follows from invariance of m G that if D Z is a differential operator as defined as in (2.4), then D Z η L = (D Z ρ) * χ L . Hence, η L ∈ C ∞ c (X ), and η L C k ρ C k . Note that there exists c > 1 such that for every g ∈ supp(ρ) and all x ∈ X , we have α(g −1 x) c −1 α(x), and thus {α • g −1 < L} ⊂ {α < cL} and η L χ cL . This implies the lower bound Combining these bounds, we get and thus By taking s κ log L where κ = p δ , the proof is finished.

Remark 4.7 In [4, Theorem 3.2], the authors (implicitly) show a similar uniform estimate for integrals of smooth Siegel transforms over SO(m + n)-orbits of a lattice. Their proof is quite different from ours.
Proof We note that there exist 0 < υ 1 < υ 2 and ϑ > 0 such that the support of f is contained in the set and without loss of generality we may assume that f is the characteristic function of this set. We recall that Y can be identified with the collection of lattices Hence, We observe that for each i and p i ∈ Z, the volume of the set u ∈ [0, 1] n : |p + u, q | ϑ q −w i is estimated from above by and we note that the set is empty whenever | p| > j |q j | + ϑ q −w i . In particular, it is non-empty for at most O( q ) choices of p ∈ Z. Hence, we deduce that uniformly in s. This completes the proof.
Proof As in the proof of Proposition 4.6, it is sufficient to consider the case when f is the characteristic function of the set (4.2). Thenf (a s y) is given by (4.3), and we get For fixed q, l ∈ Z n , we wish to estimate First, we consider the case when q and are linearly independent. Then there exist indices j, k = 1, . . . , n such that q j k − q k j = 0. Let us consider the function ψ on R 2 defined by ψ(x 1 , as well as the periodized functionψ on then we denote by S the affine map which induces an affine endomorphism of the torus R 2 /Z 2 . We note that where μ denotes the Lebesgue probability measure on the torus R 2 /Z 2 . Since the endomorphism S preserves μ, we see that Therefore, we conclude that in this case, Let us now we consider the second case when q and are linearly dependent. Upon re-arranging indices if needed, we may assume that |q 1 | = max(|q 1 |, . . . , |q n |, | 1 |, . . . , | n |). (4.5) In particular, q 1 = 0, and thus 1 = 0, since q and are linearly dependent, so we can define the new variables We note that the last integral is non-zero only when | p| |q 1 | and |r | | 1 |. We set q 1 = q d and 1 = d where d = gcd(q 1 , 1 ). Then q 1 r − 1 p = jd for some j ∈ Z. We observe that when j is fixed, then the integers p and r satisfy the equation q r − p = j. Since gcd(q , ) = 1, all the solutions of this equation are of the form p = p 0 + kq , r = r 0 + k for k ∈ Z. In particular, it follows that the number of such solutions satisfying | p| |q 1 | and |r | | 1 | is at most O(d). We write where the first sum is taken over those p, r with q 1 r − 1 p = 0, and the second sum is taken over those p, r with q 1 r − 1 p = 0. Upon applying a linear change of variables, we obtain Let us consider the function We note that the integrand equals the indicator function of the intersection of the intervals and thus it follows that ρ i is non-increasing when x 0, and non-decreasing when x 0. 
This implies that Next, we proceed with estimation of J (2) i (q, ). Let c 0 := min{ q : q ∈ Z n \{0}}. We set where we used that q 1 is chosen according to (4.5). Combining the obtained estimates for J (1) i (q, ) and J (2) i (q, ), we conclude that when q and are linearly dependent, where q 1 is chosen according to (4.5). Now we proceed to estimate Using (4.4), the sum in (4.7) over linearly independent q and can be estimated as For a subset I of {1, . . . , m}, we set w(I ) := i∈I w i . Then using (4.6), we deduce that the sum in (4.7) over linearly dependent q and is bounded by * The star indicates that the sum is taken over linearly dependent q and .
When I c = ∅, then w(I ) = n. Since the number of (q, ) satisfying υ 1 e s q , υ 2 e s is estimated as O(e 2ns ), it is clear that the corresponding term in the above sum is uniformly bounded. Now we suppose that I c = ∅. Since q and are linearly dependent, the vector is uniquely determined given 1 and q, and we obtain that for some υ 1 , υ 2 > 0, * υ 1 e s q , We shall use the following lemma:
We order these solutions according to d := gcd(q, ). Let q = q d and = d. Then q and are coprime, and the system (4.9) is equivalent to Because of coprimality, each p i have to be divisible by q , so that the number of such p i 's is at most O(q/q ) = O(d). We note that given d the number of possible choices for is at most q/d, and ( p 1 , . . . , p k , ) uniquely determine (r 1 , . . . , r k ). Hence, the number of solutions of (4.9) is estimated by where σ k−1 (q) = d|q d k−1 . Writing q = q d, we conclude that This proves the lemma.
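The divisor sum $\sigma_{k-1}(q) = \sum_{d \mid q} d^{k-1}$ appearing in the proof is easy to tabulate. The following Python sketch computes it by trial division; the function name and the argument convention (the exponent is passed directly) are illustrative choices.

```python
def divisor_power_sum(q, exponent):
    """Return the sum of d**exponent over all positive divisors d of q."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    total = 0
    d = 1
    while d * d <= q:
        if q % d == 0:
            total += d ** exponent
            other = q // d
            if other != d:  # avoid counting a square root twice
                total += other ** exponent
        d += 1
    return total
```

To obtain $\sigma_{k-1}(q)$ one calls `divisor_power_sum(q, k - 1)`; for $k = 1$ this is the number of divisors of $q$.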

Remark 4.10
A similar estimate in the case k = 1 was proved by Schmidt in [16].
A simple modification of this argument also gives that Hence, it follows that * where The terms in this sum are uniformly bounded unless I = ∅ and |I c | = 1, namely, when m > 1. When m = 1, we obtain the bound O(1 + s). This proves the proposition.

Truncated Siegel transform
The Siegel transform of a compactly supported function is typically unbounded on $X$; to deal with this complication, it is natural to approximate $\hat f$ by compactly supported functions on $X$, the so-called truncated Siegel transforms, which we shall denote by $\hat f^{(L)}$. They will be constructed using a smooth cut-off function $\eta_L$, which will be defined in the following lemma.

Lemma 4.11
For every $c > 1$, there exists a family $(\eta_L)$ in $C_c^\infty(X)$ satisfying:
Proof Let $\chi_L$ denote the indicator function of the subset $\{\alpha \le L\} \subset X$, and pick a non-negative $\phi \in C_c^\infty(G)$ with $\int_G \phi \, dm_G = 1$ and with support in a sufficiently small neighbourhood of the identity in $G$ to ensure that for all $g \in \mathrm{supp}(\phi)$ and $x \in X$, We now define $\eta_L$ as To prove the last property, we observe that it follows from invariance of $m_G$ that for a differential operator $D_Z$ as in (2.4)
For a bounded function $f : \mathbb{R}^{m+n} \to \mathbb{R}$ with compact support, we define the truncated Siegel transform of $f$ as $\hat f^{(L)} := \hat f \cdot \eta_L$. We record some basic properties of this transform that will be used later in the proofs.

Moreover, the implied constants are uniform when supp( f ) is contained in a fixed compact set.
Proof It follows from Proposition 4.1 that Since $0 \le \eta_L \le 1$, (4.11) follows from Proposition 4.2, and the upper bound in (4.12) holds since $\mathrm{supp}(\eta_L) \subset \{\alpha \le cL\}$.
We observe that for a differential operator $D_Z$ as in (2.4), we have $D_Z(\hat f) = \widehat{D_Z f}$. Hence, we deduce from Proposition 4.1 that Since $\mathrm{supp}(\eta_L) \subset \{\alpha \le cL\}$ and $\|\eta_L\|_{C^k} \ll 1$, we deduce that To prove (4.14), we observe that since $0 \le \eta_L \le 1$ and $\eta_L = 1$ on $\{\alpha < c^{-1}L\}$, it follows from Proposition 4.1 that Hence, applying the Hölder inequality with $1 \le p < m + n$ and $q$

Now it follows from Proposition 4.2 that
which proves (4.14). The proof of (4.15) is similar, and we omit the details.
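To make the truncation concrete, the following toy sketch evaluates a Siegel transform $\hat f(\Lambda) = \sum_{v \in \Lambda \setminus \{0\}} f(v)$ and a truncated version $\hat f \cdot \eta_L$ for lattices $\Lambda = g\mathbb{Z}^2$ in the plane. Several ingredients are assumptions made only for illustration: the dimension is 2, $\det g = 1$, the height is replaced by the proxy $\max(1, 1/\lambda_1(\Lambda))$ with $\lambda_1$ the length of a shortest non-zero vector, and the cut-off is piecewise linear rather than smooth. All function names are hypothetical.

```python
import math


def lattice_points(g, radius):
    """Non-zero vectors v = g n (n in Z^2) with |v| <= radius.
    g = ((a, b), (c, d)) is assumed to have determinant 1."""
    (a, b), (c, d) = g
    # |n| <= ||g^{-1}|| |v|; for det g = 1 the Frobenius norms of g and
    # g^{-1} coincide, so this box contains every relevant n.
    bound = math.ceil(math.sqrt(a * a + b * b + c * c + d * d) * radius)
    points = []
    for n1 in range(-bound, bound + 1):
        for n2 in range(-bound, bound + 1):
            if (n1, n2) == (0, 0):
                continue
            v = (a * n1 + b * n2, c * n1 + d * n2)
            if math.hypot(v[0], v[1]) <= radius:
                points.append(v)
    return points


def siegel_transform(f, support_radius, g):
    """Sum of f over the non-zero lattice vectors; f is assumed to vanish
    outside the ball of radius support_radius."""
    return sum(f(v) for v in lattice_points(g, support_radius))


def height(g):
    """Proxy height max(1, 1 / shortest vector length) (an assumption)."""
    (a, _), (c, _) = g
    r = math.hypot(a, c)  # |g e_1| bounds the shortest vector from above
    shortest = min(math.hypot(v[0], v[1]) for v in lattice_points(g, r))
    return max(1.0, 1.0 / shortest)


def cutoff(h, L, c=2.0):
    """Piecewise linear stand-in for eta_L: 1 for h <= L/c, 0 for h >= cL."""
    lo, hi = L / c, L * c
    if h <= lo:
        return 1.0
    if h >= hi:
        return 0.0
    return (hi - h) / (hi - lo)


def truncated_siegel_transform(f, support_radius, g, L):
    return siegel_transform(f, support_radius, g) * cutoff(height(g), L)
```

For the standard lattice the truncation has no effect once $L$ is moderately large, whereas for a lattice with a very short vector the untruncated transform is large and the truncated one vanishes; this is the unboundedness that the cut-off is designed to remove.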

CLT for smooth Siegel transforms
Assume that $f \in C_c^\infty(\mathbb{R}^{m+n})$ satisfies $f \ge 0$ and $\mathrm{supp}(f) \subset \{(x_{m+1}, \ldots, x_{m+n}) \ne 0\}$. In this section we shall analyze the asymptotic behavior of the averages and prove the following result:
Theorem 5.1
If $m \ge 2$ (and thus $m + n \ge 3$) and $f$ is as above, then the variance is finite, and for every $\xi \in \mathbb{R}$,
The proof of Theorem 5.1 follows the same plan as the proof of Theorem 3.1, but we need to develop an additional approximation argument which involves truncations of the Siegel transform $\hat f$. This can potentially change the behaviour of the averages $F_N$, so we will have to take into account possible escapes of mass for the sequences of submanifolds $a^s Y$ in $X$.
Throughout the proof, we shall frequently make use of the basic observation that if we approximate $F_N$ by $\tilde F_N$ in such a way that $\|F_N - \tilde F_N\|_{L^1(Y)} \to 0$, then $F_N$ and $\tilde F_N$ will have the same convergence in distribution. Each time we apply this observation, the new approximation will depend on some sequence which depends on $N$; ultimately, we will end up with three different, but interrelated, sequences $K(N)$, $L(N)$ and $M(N)$, which need to be matched. In Sect. 5.3, we will provide explicit choices for these sequences. Let $\tilde F$ for some $M = M(N) \to \infty$ that will be chosen later. We observe that and thus, by Proposition 4.6 we see that In particular, it follows that if (5.1) holds for $\tilde F_N$, then it also holds for $F_N$; we shall prove the former. In order to simplify notation, let us drop the tilde, and assume from now on that $F_N$ is given by (5.2). Given a sequence $L = L(N)$, which shall be chosen later, we consider the average $F^{(L)}$ defined for the truncated Siegel transforms $\hat f^{(L)}$ introduced in Sect. 4.3. We have We recall that $\hat f^{(L)} = \hat f \cdot \eta_L$, $0 \le \eta_L \le 1$, and η L ( Hence, by the Cauchy-Schwarz inequality, Let us now additionally assume that $M \gg \log L$, (5.5) so that the assumption of Proposition 4.5 is satisfied when $s \ge M$. This implies that We also recall that by Proposition 4.8 when $m \ge 2$, and thus If we now choose $L = L(N) \to \infty$ so that then it follows that Hence, if we can show Theorem 5.1 for the averages $F^{(L)}_N$ with the parameter constraints above, then it would also hold for $F_N$. In order to prove the CLT for $(F^{(L)}_N)$, we follow the route of Proposition 3.4 and estimate cumulants and $L^2$-norms of the sequence.

Estimating cumulants
We set Our aim is to estimate when $r \ge 3$. The argument proceeds as in Sect. 3.2, but we have to refine the previous estimates to take into account the dependence on the parameters $L$ and $M$. Using the notation from Sect In this case, we shall show that a s 1 , . . . , φ (L) • a s r (5.11) where $\phi^{(L)} := \hat f^{(L)} - \mu_X(\hat f^{(L)})$. This reduces to estimating the integrals Y i∈I β j+1 ; N ), and thus it follows from Corollary 3.3 with $r = 1$ that there exists $\delta > 0$ such that For a fixed $J \subset I$, we define and note that for some $\xi = \xi(m, n, k) > 0$, we have If we again apply Corollary 3.3 to the function (L) , we obtain Y i∈Jf where we used that $\mu_X$ is invariant under the transformation $a$. Let us now choose the exponents $\alpha_j$ and $\beta_{j+1}$ so that $\delta\beta_{j+1} - r\xi\alpha_j > 0$. Combining (5.12), (5.13) and (5.14), we deduce that Y i∈I and thus, for any partition P, and consequently, whenever $(s_1, \ldots, s_r)$ ∈ Q $(\alpha_j, \beta_{j+1}; M, N)$ with Q = $\{\{0\}, \{1, \ldots, r\}\}$, from which (5.11) follows. We now claim that where we use the notation $x^+ = \max(x, 0)$. The implied constants in (5.17) and below in the proof depend only on $\mathrm{supp}(f)$. By the definition of the cumulant, to prove (5.17), it suffices to show that for every $z \ge 1$ and indices $i_1, \ldots, i_z$, .

Case 2: Summing over
where we used Lemma 4.12.
Finally, we combine the established bounds to estimate Cum We choose the parameters $\alpha_j$ and $\beta_j$ as in (3.27). Then $\beta_{r+1} \ll_r \gamma$. In particular, we may choose $M \gg_r \gamma$ (5.20) to guarantee that (5.10) is satisfied. With these choices of $\alpha_j$ and $\beta_j$, we obtain the estimate We observe that since $m \ge 2$, Hence, we can choose $q > 1/(m + n)$ such that $q\,(r - (m + n - 1))^+ < r/2 - 1$ for all $r \ge 3$.
Then we select $L = N^q$ so that, in particular, the condition (5.7) is satisfied. Now (5.21) can be rewritten as with sufficiently large $c_r > 0$, we conclude that for all $r \ge 3$.
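The existence of an exponent $q > 1/(m+n)$ with $q\,(r - (m+n-1))^+ < r/2 - 1$ for all $r \ge 3$ can be checked by hand: for $d := m + n \ge 3$ and $r \ge d$ one has $(r - d + 1)/2 \le r/2 - 1$, so any $q$ with $1/d < q < 1/2$ works. The sketch below verifies this over a finite range of $r$ using exact rational arithmetic. The function names and the cut-off `r_max` are hypothetical, and the finite check is an illustration rather than a proof.

```python
from fractions import Fraction


def exponent_ok(q, d, r):
    """Check q * (r - (d - 1))^+ < r/2 - 1, where d stands for m + n."""
    positive_part = max(r - (d - 1), 0)
    return q * positive_part < Fraction(r, 2) - 1


def admissible(q, d, r_max=500):
    """q must exceed 1/d and satisfy the inequality for 3 <= r <= r_max."""
    if q <= Fraction(1, d):
        return False
    return all(exponent_ok(q, d, r) for r in range(3, r_max + 1))


if __name__ == "__main__":
    for d in range(3, 8):
        q = (Fraction(1, d) + Fraction(1, 2)) / 2  # midpoint of (1/d, 1/2)
        print(d, q, admissible(q, d))
```

In this sketch the midpoint of $(1/d, 1/2)$ is admissible for every $d \ge 3$ tested, while for $d = 2$ the interval $(1/d, 1/2)$ is empty, which is consistent with the requirement $m + n \ge 3$.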

Estimating variances
We now turn to the analysis of the variances of the average $F^{(L)}_N$ which are given by We proceed as in Sect. 3.3 taking into account dependence on parameters $M$ and $L$. We observe that this expression is symmetric with respect to $s_1$ and $s_2$; writing $s_1 = s + t$ and $s_2 = t$ with $0 \le s \le N - M - 1$ and $M \le t \le N - s - 1$, we obtain that To estimate (L) N (s), we introduce an additional parameter $K = K(N) \to \infty$ such that $K \le M$ (to be specified later) and consider separately the cases when $s < K$ and when $s \ge K$.
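The reindexing used here is elementary: for a symmetric array $c(s_1, s_2)$ the double sum over $M \le s_1, s_2 \le N - 1$ equals the diagonal contribution ($s = 0$) plus twice the sum over $s_1 = s + t$, $s_2 = t$ with $1 \le s \le N - M - 1$ and $M \le t \le N - s - 1$. The following sketch checks this bookkeeping; the function names and the test array are hypothetical and carry no meaning beyond the illustration.

```python
def double_sum(c, M, N):
    """Sum of c(s1, s2) over M <= s1, s2 <= N - 1."""
    return sum(c(s1, s2) for s1 in range(M, N) for s2 in range(M, N))


def reindexed_sum(c, M, N):
    """Same sum for symmetric c, written with s1 = s + t and s2 = t."""
    diagonal = sum(c(t, t) for t in range(M, N))  # the terms with s = 0
    off_diagonal = sum(
        c(s + t, t)
        for s in range(1, N - M)   # 1 <= s <= N - M - 1
        for t in range(M, N - s)   # M <= t <= N - s - 1
    )
    return diagonal + 2 * off_diagonal


if __name__ == "__main__":
    symmetric = lambda s1, s2: s1 * s2 + (s1 - s2) ** 2
    assert double_sum(symmetric, 3, 10) == reindexed_sum(symmetric, 3, 10)
    print("reindexing agrees on the sample array")
```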
First, we consider the case when $s \ge K$. By Corollary 3.3, we have Also, by Corollary 3.3, Hence, combining (5.24) and (5.25), we deduce that we conclude that where we used Lemma 4.12. The implied constants here and below in the proof depend only on $\mathrm{supp}(f)$.
Let us now consider the case $s < K$. We observe that Corollary 3.3 (for $r = 1$) applied to the function φ (L) s Furthermore, for some $\xi = \xi(m, n, k) > 0$, we have Therefore, we deduce that Combining this estimate with (5.25), we conclude that Hence, setting we obtain that Therefore, using Lemma 4.12, we deduce that Combining (5.26) and (5.27), we conclude that (L) Next, we shall show that with a suitable choice of parameters, We recall that by Lemma 4.12, for all $\tau < m + n - 1$, where the implied constant depends only on $\mathrm{supp}(f)$. It follows from these estimates that Then (5.32) follows. We conclude that Finally, we compute ∞ (s) by using Rogers' formula (Proposition 4.4) applied to the function Since by the Siegel Mean Value Theorem (Proposition 4.3), we conclude that It remains to show that the sum in (5.36) is finite. We represent points $z \in \mathbb{R}^{m+n}$ as $z = (x, y)$ with $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. Since $f$ is bounded, and its compact support is contained in $\{y \ne 0\}$, we may assume without loss of generality that $f$ is the characteristic function of the set Then Setting We note that $I(e^s p^{-1}) \cap I(q^{-1}) = \emptyset$ unless $(\upsilon_1 \upsilon_2^{-1}) e^s q \le p \le (\upsilon_2 \upsilon_1^{-1}) e^s q$, and also that Hence, it follows that $(e^s q)^{-(m+n-1)} < \infty$, because $m + n \ge 3$.

Proof of Theorem 5.1
As we already remarked above, it is sufficient to show that the sequence of averages $F^{(L)}_N$ converges in distribution to the normal law. To verify this, we use the Method of Cumulants (Proposition 3.4). It is easy to see that Moreover, with a suitable choice of parameters, we have shown in Sect. 5.1 that for $r \ge 3$, and in Sect. 5.2 that Hence, Proposition 3.4 applies, and it remains to verify that we can choose our parameters so that they satisfy the stated assumptions. We recall that with sufficiently large $c_1 > 0$ so that (5.29) is satisfied. Then taking we arrange that (5.5), (5.20), and (5.30) hold for all $N \ge N_0(r)$. We note that the constant $c_r$ and the implicit constant in (5.20) depend on $r$, and the $(\log \log N)$-factor here is added to guarantee that the parameter $M$ is independent of $r$. Finally, the conditions (5.4), (5.31), (5.35) are immediate from our choices.
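The mechanism behind the Method of Cumulants can be seen in the simplest possible model, which is far from the setting of this paper: for a sum $S_N$ of $N$ independent random signs, the cumulants satisfy $\mathrm{Cum}^{(r)}(S_N) = N \cdot \mathrm{Cum}^{(r)}(S_1)$, so the normalized sum $N^{-1/2} S_N$ has variance 1 and cumulants of order $r \ge 3$ of size $N^{1 - r/2} \to 0$. The sketch below computes these cumulants exactly from the raw moments via the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$. The model and all function names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def raw_moments_of_sum(N, r_max):
    """Exact raw moments E[S^j], j = 0..r_max, where S is a sum of N
    independent +-1 signs, i.e. S = 2B - N with B binomial(N, 1/2)."""
    return [
        sum(Fraction(comb(N, b), 2 ** N) * (2 * b - N) ** j for b in range(N + 1))
        for j in range(r_max + 1)
    ]


def cumulants(moments):
    """Cumulants kappa_1, ..., kappa_r from raw moments m_0, ..., m_r."""
    kappa = [Fraction(0)]  # index 0 is a placeholder and is never used
    for n in range(1, len(moments)):
        correction = sum(
            comb(n - 1, k - 1) * kappa[k] * moments[n - k] for k in range(1, n)
        )
        kappa.append(moments[n] - correction)
    return kappa


if __name__ == "__main__":
    for N in (10, 100):
        kappa = cumulants(raw_moments_of_sum(N, 4))
        # Fourth cumulant of the normalized sum: kappa_4(S_N) / N^2.
        print(N, kappa[2], kappa[3], kappa[4], float(kappa[4]) / N ** 2)
```

In this toy model the fourth cumulant of $S_N$ equals $-2N$, so after normalization it is $-2/N$. In the paper the analogous decay for $F^{(L)}_N$ is not automatic and is the content of the estimates in Sect. 5.1.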

CLT for counting functions and the proof of Theorem 1.2
We recall from Sect. 1.3 that where u is defined in (1.8) and the domains T are defined in (1.9). We shall decompose this domain into smaller pieces using the linear map $a = \mathrm{diag}(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1})$. We note that for any integer $N \ge 1$, and thus where $\chi$ denotes the characteristic function of the set e . Hence the proof of Theorem 1.2 reduces to analyzing sums of the form $\sum_{s=0}^{N-1} \hat\chi(a^s y)$ with $y \in Y$. For this purpose, we define Our main result in this section now reads as follows.
as $N \to \infty$, where where $\chi_\theta$ denotes the characteristic function of the interval $[-\theta, \theta]$. We observe that and moreover this integral is non-zero only when $|p_i| = O(\|q\|)$. Hence, The number of integral points in the region $\{(1 - \varepsilon)e^s \le \|y\| \le e^s\}$ can be estimated in terms of its volume. Namely, there exists $r > 0$ (depending only on the norm) such that Hence, The integral for $\hat\chi_{2,\varepsilon} \circ a^s$ can be estimated similarly. The integral over $\hat\chi_{3,\varepsilon} \circ a^s$ as in (6.2) can be written as a sum of the products of the integral and the integrals We observe that these integrals are non-zero only when $|p_j| = O(\|q\|)$ and $|p_i| = O(\|q\|)$. Hence, we conclude that which completes the proof of the proposition.
Now we start with the proof of Theorem 6.1. As in Sect. 5, we modify $F_N$ and consider instead $\tilde F$ for a parameter $M = M(N) \to \infty$ that will be chosen later. As in (5.3) we obtain that Hence, if we can prove the CLT for $(\tilde F_N)$, then the CLT for $(F_N)$ would follow. From now on, to simplify notation, we assume that $F_N$ is given by (6.3). Our next step is to exploit the approximation $\chi \approx f_\varepsilon$, so we introduce where the parameter $\varepsilon = \varepsilon(N) \to 0$ will be specified later. We observe that it follows from Proposition 6.2 that We choose $\varepsilon = \varepsilon(N)$ and $M = M(N)$ so that Hence, it remains to prove convergence in distribution for the sequence $F^{(\varepsilon)}_N$. We observe that the sequence $F^{(\varepsilon)}_N$ fits into the framework of Sect. 5. However, we need to take into account the dependence on the new parameter $\varepsilon$ and refine the previous estimates. It will be important for our argument that the supports of the functions $f_\varepsilon$ are uniformly bounded, $\|f_\varepsilon\|_{C^0} \ll 1$, and $\|f_\varepsilon\|_{C^k} \ll \varepsilon^{-k}$. As in Sect. 5, we consider the truncation defined for a parameter $L = L(N) \to \infty$. We assume that 1 for all $s \ge 0$, uniformly in $\varepsilon$.
Hence, the bound (5.6) can be proved exactly as before, and we obtain We choose the parameter $L$ as before so that $N = o(L^p)$ for some $p < m + n$ (6.7) to guarantee that Now it remains to show that the family $F^{(\varepsilon, L)}_N$ satisfies the CLT with a suitable choice of parameters $M$, $L$, $\varepsilon$. As in Sect. 5 we will show that for $r \ge 3$, and with an explicit $\sigma \in (0, \infty)$.
Under the condition $M \gg \gamma$, (6.10) the estimate (5.21) gives the bound We note that the implicit constant in (5.21) depends only on $\mathrm{supp}(f_\varepsilon)$ so that it is uniform in $\varepsilon$. We choose $L = N^q$ as in Sect. 5 and $\gamma = c_r (\log N)$, where $c_r > 0$ will be specified later. In particular, then $N^{1 - r/2} \gamma^{r-1} L^{(r - (m+n-1))^+} \to 0$, and assuming that $N^{r/2} L^r \varepsilon^{-rk} = o(e^{\delta\gamma})$, (6.11) it follows that (6.8) holds.
To prove (6.9), we have to estimate Indeed, arguing as in (5.28), we deduce that Hence, (6.12) holds provided that $e^{-\delta K} L^2 \varepsilon^{-2k} \to 0$, (6.13) $N^{-1} e^{-\delta M} e^{\xi K} L^2 \varepsilon^{-2k} \to 0$, (6.14) Next, we set where $\omega_n := \int_{S^{n-1}} \|z\|^{-n} \, dz$. We also see that

Proof of Theorem 6.1
As we already remarked above, it is sufficient to show that the average F Then the first part of (6.5) holds. Then we select sufficiently large $c_r$ in $\gamma = c_r (\log N)$ so that (6.11) holds. After that we choose with sufficiently large $c_1 > 0$ so that (6.13) holds. Then it is clear that (6.16) also holds. Given these $\varepsilon$, $L$, $\gamma$, and $K$, we choose so that the second part of (6.5), (6.6), (6.10), and (6.14) hold for all $N \ge N_0(r)$. With these choices, it is clear that (6.4) and (6.15) also hold. Hence, Theorem 6.1 follows from Proposition 3.4.

Proof of Theorem 1.2
For $u \in M_{m,n}([0, 1])$, we set where $C_{m,n} = 2^m \vartheta_1 \cdots \vartheta_m \omega_n$ with $\omega_n := \int_{S^{n-1}} \|z\|^{-n} \, dz$. We shall show that $D_T(u)$ can be approximated by the averages $F_N$ defined in (6.1). This will allow us to deduce convergence in distribution for $D_T$. We observe that: where $C_{m,n}$ is defined above.
Proof We observe that where $\chi_N$ denotes the characteristic function of the set $\{(x, y) \in \mathbb{R}^{m+n} : 1 \le \|y\| < e^N, \ |x_i| < \vartheta_i \|y\|^{-w_i}, \ i = 1, \ldots, m\}$.
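The set just described is compatible with the map $a = \mathrm{diag}(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1})$ used in this section: a point $(x, y)$ satisfies $1 \le \|y\| < e^N$ and $|x_i| < \vartheta_i \|y\|^{-w_i}$ exactly when $a^s(x, y)$ satisfies the same conditions with $e^N$ replaced by $e$ for a unique $s \in \{0, \ldots, N-1\}$, since applying $a^s$ rescales both sides of each inequality $|x_i| < \vartheta_i \|y\|^{-w_i}$ by $e^{s w_i}$. The following sketch checks this on sample points. It assumes the Euclidean norm (the paper allows a general norm), and all names and numerical values are hypothetical.

```python
import math


def norm(y):
    """Euclidean norm (an assumption made for this illustration)."""
    return math.sqrt(sum(v * v for v in y))


def in_domain(x, y, T, theta, w):
    """Membership in {1 <= |y| < T, |x_i| < theta_i * |y|^(-w_i)}."""
    r = norm(y)
    if not (1 <= r < T):
        return False
    return all(abs(xi) < th * r ** (-wi) for xi, th, wi in zip(x, theta, w))


def apply_a(x, y, w, s):
    """Apply a^s = diag(e^{s w_1}, ..., e^{s w_m}, e^{-s}, ..., e^{-s})."""
    new_x = [xi * math.exp(s * wi) for xi, wi in zip(x, w)]
    new_y = [yj * math.exp(-s) for yj in y]
    return new_x, new_y


def pieces_containing(x, y, N, theta, w):
    """All s in {0, ..., N-1} such that a^s (x, y) lies in the domain with T = e."""
    found = []
    for s in range(N):
        xs, ys = apply_a(x, y, w, s)
        if in_domain(xs, ys, math.e, theta, w):
            found.append(s)
    return found


if __name__ == "__main__":
    theta, w, N = (1.0, 2.0), (1.5, 0.5), 4  # sample values, m = n = 2
    r = math.exp(2.5)
    x = [0.5 * th * r ** (-wi) for th, wi in zip(theta, w)]
    print(in_domain(x, [r, 0.0], math.e ** N, theta, w),
          pieces_containing(x, [r, 0.0], N, theta, w))
```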
Using notation as in the proof of Proposition 4.6, we obtain We claim that To prove this, let us consider more generally a bounded measurable function $\chi$ on $\mathbb{R}$ with compact support, the function $\psi(x) = \chi(x_1)$ on $\mathbb{R}^m$, and the function $\tilde\psi(x) = \sum_{p \in \mathbb{Z}} \chi(p + x_1)$ on the torus $\mathbb{R}^m/\mathbb{Z}^m$. We suppose without loss of generality that $q_1 \ne 0$ and consider a non-degenerate linear map $S : \mathbb{R}^m \to \mathbb{R}^m : u \mapsto (\langle u, q \rangle, u_2, \ldots, u_m)$ which induces a linear epimorphism of the torus $\mathbb{R}^m/\mathbb{Z}^m$. Using that $S$ preserves the Lebesgue probability measure $\mu$ on $\mathbb{R}^m/\mathbb{Z}^m$, we deduce that which yields (6.19).
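The measure-preservation step has an exact discrete analogue that is easy to test. The sketch below takes $m = 2$, an integer vector $q = (q_1, q_2)$ and a grid modulus $M$ with $\gcd(q_1, M) = 1$ (all three are assumptions made for the illustration), and checks that $\langle u, q \rangle \bmod 1$ is exactly equidistributed as $u$ ranges over the grid $(\frac{1}{M}\mathbb{Z}/\mathbb{Z})^2$. Consequently the grid average of $\tilde\chi(\langle u, q \rangle)$ coincides with the one-dimensional grid average of $\tilde\chi$. The function names are hypothetical.

```python
from collections import Counter


def pushforward_counts(q, M):
    """Distribution of <u, q> mod 1 over the grid u = (i/M, j/M),
    recorded through the numerators (q1*i + q2*j) mod M."""
    q1, q2 = q
    return Counter((q1 * i + q2 * j) % M for i in range(M) for j in range(M))


def torus_average(chi_tilde, q, M):
    """Grid average of chi_tilde(<u, q>) over the M*M grid points."""
    counts = pushforward_counts(q, M)
    return sum(chi_tilde(k / M) * c for k, c in counts.items()) / M ** 2


def circle_average(chi_tilde, M):
    """Grid average of chi_tilde over the points k/M, k = 0..M-1."""
    return sum(chi_tilde(k / M) for k in range(M)) / M


if __name__ == "__main__":
    q, M = (3, 5), 101  # gcd(3, 101) = 1
    counts = pushforward_counts(q, M)
    # Every residue class is hit the same number of times.
    assert set(counts.values()) == {M} and len(counts) == M
    bump = lambda t: t * (1 - t)
    print(torus_average(bump, q, M), circle_average(bump, M))
```

The equidistribution holds here because, for each fixed $j$, the map $i \mapsto q_1 i + q_2 j \pmod M$ is a bijection when $\gcd(q_1, M) = 1$; this mirrors the role of the hypothesis $q_1 \ne 0$ in the continuous argument.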