Central Limit Theorems for Diophantine approximants

In this paper we study counting functions representing the number of solutions of systems of linear inequalities which arise in the theory of Diophantine approximation. We develop a method that allows us to explain the random-like behavior that these functions exhibit and prove a Central Limit Theorem for them. Our approach is based on a quantitative study of higher-order correlations for functions defined on the space of lattices and a novel technique for estimating cumulants of Siegel transforms.


Motivation
Many objects which arise in Diophantine Geometry exhibit random-like behavior. For instance, the classical Khinchin theorem in Diophantine approximation can be interpreted as the Borel-Cantelli Property for quasi-independent events, while Schmidt's quantitative generalization of Khinchin's Theorem is analogous to the Law of Large Numbers. One might ask whether much deeper probabilistic phenomena also take place. In this paper, we develop a general framework which allows us to capture certain quasi-independence properties which govern the asymptotic behavior of arithmetic counting functions. We expect that the new methods will have a wide range of applications in Diophantine Geometry; here we apply the techniques to study the distribution of counting functions involving Diophantine approximants.
A basic problem in Diophantine approximation is to find "good" rational approximants of vectors $u = (u_1, \ldots, u_m) \in \mathbb{R}^m$. More precisely, given positive numbers $w_1, \ldots, w_m$, which we shall assume sum to one, and positive constants $\vartheta_1, \ldots, \vartheta_m$, we consider the system of inequalities
$$\left| u_j - \frac{p_j}{q} \right| \le \frac{\vartheta_j}{q^{1+w_j}}, \quad \text{for } j = 1, \ldots, m, \tag{1.1}$$
with $(p, q) \in \mathbb{Z}^m \times \mathbb{N}$. It is well-known that for Lebesgue-almost all $u \in \mathbb{R}^m$, the system (1.1) has infinitely many solutions $(p, q) \in \mathbb{Z}^m \times \mathbb{N}$, so it is natural to try to count solutions in bounded regions, which leads us to the counting function
$$\Delta_T(u) := |\{(p, q) \in \mathbb{Z}^m \times \mathbb{N} : 1 \le q < T \text{ and (1.1) holds}\}|.$$
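To make the counting function concrete, the following minimal Python sketch evaluates $\Delta_T(u)$ by brute force. It assumes that the inequality in (1.1) is non-strict, works in floating point (so boundary cases may be misjudged), and uses the hypothetical name `count_solutions`; it is an illustration only and plays no role in the proofs.

```python
from math import ceil, floor


def count_solutions(u, weights, thetas, T):
    """Count (p, q) in Z^m x N with 1 <= q < T and
    |u_j - p_j / q| <= theta_j / q**(1 + w_j) for every j."""
    total = 0
    for q in range(1, ceil(T)):
        choices = 1
        for u_j, w_j, theta_j in zip(u, weights, thetas):
            # Multiply the inequality by q: |q * u_j - p_j| <= theta_j / q**w_j.
            radius = theta_j / q ** w_j
            lo = ceil(q * u_j - radius)    # smallest admissible p_j
            hi = floor(q * u_j + radius)   # largest admissible p_j
            choices *= max(0, hi - lo + 1)
        # The coordinates are constrained independently, so the counts multiply.
        total += choices
    return total
```

For instance, `count_solutions((2 ** 0.5,), (1.0,), (1.0,), 1000)` counts the pairs $(p, q)$ with $q < 1000$ and $|\sqrt{2} - p/q| \le q^{-2}$.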
W. Schmidt [16] proved that for Lebesgue-almost all $u \in [0, 1]^m$,
$$\Delta_T(u) = C_m \log T + O_{u,\varepsilon}\big((\log T)^{1/2+\varepsilon}\big), \quad \text{for all } \varepsilon > 0, \tag{1.2}$$
where $C_m := 2^m \vartheta_1 \cdots \vartheta_m$. One may view this as an analogue of the Law of Large Numbers; the heuristic for this analogy runs along the following lines. First, note that for $T = e^N$ with $N \in \mathbb{N}$,
$$\Delta_T(u) = \sum_{s=0}^{N-1} \Delta^{(s)}(u),$$
where $\Delta^{(s)}(u) := |\{(p, q) \in \mathbb{Z}^m \times \mathbb{N} : e^s \le q < e^{s+1} \text{ and (1.1) holds}\}|$. If one could prove that the functions $\Delta^{(s_1)}(\cdot)$ and $\Delta^{(s_2)}(\cdot)$ were "quasi-independent" random variables on $[0, 1]^m$, at least when $s_1$, $s_2$, and $|s_1 - s_2|$ are sufficiently large, then (1.2) would follow by some version of the Law of Large Numbers. Moreover, the same heuristic further suggests that, in addition to the Law of Large Numbers, a Central Limit Theorem and perhaps other probabilistic limit laws also hold for $\Delta_T(\cdot)$.
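The decomposition of the count into the blocks $e^s \le q < e^{s+1}$ can be checked numerically. The sketch below treats the case $m = 1$, assumes a non-strict inequality in (1.1), and uses the hypothetical helper names `count_block` and `block_counts`; it only illustrates the bookkeeping behind the heuristic.

```python
from math import ceil, exp, floor


def count_block(u, w, theta, q_start, q_stop):
    """Count (p, q) with q_start <= q < q_stop and
    |u - p / q| <= theta / q**(1 + w)  (the case m = 1)."""
    total = 0
    for q in range(q_start, q_stop):
        radius = theta / q ** w        # inequality multiplied through by q
        lo = ceil(q * u - radius)      # smallest admissible p
        hi = floor(q * u + radius)     # largest admissible p
        total += max(0, hi - lo + 1)
    return total


def block_counts(u, w, theta, N):
    """Return the list of block counts for s = 0, ..., N - 1,
    where block s covers the integers q with e^s <= q < e^(s+1)."""
    edges = [ceil(exp(s)) for s in range(N + 1)]   # first integer q of each block
    return [count_block(u, w, theta, edges[s], edges[s + 1]) for s in range(N)]
```

Summing `block_counts(u, w, theta, N)` recovers the count over $1 \le q < e^N$, and the individual entries are the quantities whose quasi-independence the heuristic asks for.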
In this paper, we put the above heuristic on firm ground. We do so by representing ∆ (s) (·) as a function on the space of unimodular lattices. It turns out that the "quasi-independence" of the family (∆ (s) ) that we are trying to capture can be translated into the dynamical language of higher-order mixing for a subgroup of linear transformations acting on the space of lattices.
Our proof of Theorem 1.1, as well as the proof in [3], proceeds by interpreting $\Delta_T(\cdot)$ as a function on a certain subset $Y$ of the space of all unimodular lattices in $\mathbb{R}^{m+1}$, and then studying how the sequence $a^s Y$, where $a$ is a fixed linear transformation of $\mathbb{R}^{m+1}$, distributes inside this space. However, the arguments in the two papers follow very different routes. The proof in [3] contains a novel refinement of the martingale method (this approach was initiated in this setting by Le Borgne [11]). Here, one crucially uses the fact that when $w_1 = \cdots = w_m = 1/m$, the set $Y$ is an unstable manifold for the action of $a$ on the space of lattices. For general weights, $Y$ has strictly smaller dimension than the unstable leaves, and it seems challenging to apply martingale approximation techniques. Instead, our method involves a quantitative analysis of higher-order correlations for functions on the space of lattices. We establish an asymptotic formula for correlations of arbitrary orders and use this formula to compute limits of all the moments of $\Delta_T(\cdot)$ directly. One of the key innovations of our approach is an efficient way of estimating sums of cumulants (alternating sums of moments) developed in our recent work [2].
We also investigate the more general problem of Diophantine approximation for systems of linear forms. The space $M_{m,n}(\mathbb{R})$ of $m$ linear forms in $n$ real variables is parametrized by real $m \times n$ matrices. Given $u \in M_{m,n}(\mathbb{R})$, we consider the family $(L_u^{(i)})_{i=1}^m$ of linear forms defined by $L_u^{(i)}(x_1, \ldots, x_n) := \sum_{j=1}^n u_{ij} x_j$. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Fix $\vartheta_1, \ldots, \vartheta_m > 0$ and $w_1, \ldots, w_m > 0$ which satisfy $w_1 + \cdots + w_m = n$, and consider the system of Diophantine inequalities
$$\big| p_i + L_u^{(i)}(q_1, \ldots, q_n) \big| < \vartheta_i \|q\|^{-w_i}, \quad i = 1, \ldots, m, \tag{1.5}$$
with $(p, q) = (p_1, \ldots, p_m, q_1, \ldots, q_n) \in \mathbb{Z}^m \times (\mathbb{Z}^n \setminus \{0\})$. The number of solutions of this system with the norm of the "denominator" $q$ bounded by $T$ is given by
$$\Delta_T(u) := |\{(p, q) \in \mathbb{Z}^m \times \mathbb{Z}^n : 0 < \|q\| < T \text{ and (1.5) holds}\}|. \tag{1.6}$$
Our main result in this paper is the following generalization of Theorem 1.1. as $T \to \infty$, where $C_{m,n} := C_m \omega_n$ with $\omega_n := \int_{S^{n-1}} \|z\|^{-n}\, dz$ and $\sigma_{m,n} := 2C_{m,n}\big(2\zeta(m + n - 1)\zeta(m + n)^{-1} - 1\big)$.

An outline of the proof of Theorem 1.2
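We begin by observing that $\Delta_T(\cdot)$ can be interpreted as a function on the space of lattices in $\mathbb{R}^{m+n}$. Given $u \in M_{m,n}([0, 1])$, we define the unimodular lattice $\Lambda_u$ in $\mathbb{R}^{m+n}$ by
$$\Lambda_u := \big\{ \big(p_1 + L_u^{(1)}(q), \ldots, p_m + L_u^{(m)}(q), q\big) : (p, q) \in \mathbb{Z}^m \times \mathbb{Z}^n \big\}. \tag{1.8}$$
The space $X$ of unimodular lattices in $\mathbb{R}^{m+n}$ is naturally a homogeneous space of the group $\mathrm{SL}_{m+n}(\mathbb{R})$ equipped with the invariant probability measure $\mu_X$. The set $Y := \{\Lambda_u : u \in M_{m,n}([0, 1))\}$ is an $mn$-dimensional torus embedded in $X$, and we equip $Y$ with the Haar probability measure $\mu_Y$, interpreted as a Borel measure on $X$.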
We further observe (see Section 6 for more details) that each domain $\Omega_T$ can be tessellated using a fixed diagonal matrix $a$ in $\mathrm{SL}_{m+n}(\mathbb{R})$, so that for a suitable function $\hat\chi : X \to \mathbb{R}$, we have Hence we are left with analyzing the distribution of values for the sums $\sum_{s=0}^{N} \hat\chi(a^s y)$ with $y \in Y$. This will allow us to apply techniques developed in our previous work [2], as well as in [1] (joint with M. Einsiedler). Intuitively, our arguments will be guided by the hope that the observables $\hat\chi \circ a^s$ are "quasi-independent" with respect to $\mu_Y$. Due to the discontinuity and unboundedness of the function $\hat\chi$ on $X$, it gets quite technical to formulate this quasi-independence directly. Instead, we shall argue in steps.
We begin in Section 2 by establishing quasi-independence for observables of the form $\varphi \circ a^s$, where $\varphi$ is a smooth and compactly supported function on $X$. This amounts to an asymptotic formula (Corollary 2.4) for the higher-order correlations
$$\int_Y \varphi_1(a^{s_1} y) \cdots \varphi_r(a^{s_r} y)\, d\mu_Y(y). \tag{1.10}$$
It will be crucial for our arguments later that the error term in this formula is explicit in terms of the exponents $s_1, \ldots, s_r$ and in (certain norms of) the functions $\varphi_1, \ldots, \varphi_r$. In Section 3, we use these estimates to prove the Central Limit Theorems for sums of the form To do this, we use an adaptation of the classical Cumulant Method (see Proposition 3.4), which provides bounds on cumulants (alternating sums of moments) given estimates on expressions as in (1.10), at least in certain ranges of the parameters $(s_1, \ldots, s_r)$. Here we shall exploit the decomposition (3.7) into "separated"/"clustered" tuples. We stress that the cumulant $\mathrm{Cum}^{(r)}(F_N)$ of order $r$ can be expressed as a sum of $O(N^r)$ terms, normalized by $N^{r/2}$, so that in order to prove that it vanishes asymptotically, we require more than just square-root cancellation; however, the error term in the asymptotic formula for (1.10) is rather weak. Nonetheless, by using intricate combinatorial cancellations of cumulants, we can establish the required bounds. In order to extend the method in Section 3 to the kind of unbounded functions which arise in our subsequent approximation arguments, we have to investigate possible escapes of mass for the sequence of tori $a^s Y$ inside the space $X$. In Section 4, we prove several results in this direction (see e.g. Proposition 4.5), as well as $L^p$-bounds (see Propositions 4.6 and 4.7). We stress that the general non-divergence estimates for unipotent flows developed by Kleinbock-Margulis [8] are not sufficient for our purposes, and in particular, the exact value of the exponent in Proposition 4.5 will be crucial for our argument.
The proof of the L 2 -norm bound in Proposition 4.7 is especially interesting in this regard since it uncovers that the escape of mass is related to delicate arithmetic questions; our arguments require careful estimates on the number of solutions of certain Diophantine equations.
To make the technical passages in the final steps of the proof of Theorem 1.2 a bit more readable, we shall devote Section 5 to Central Limit Theorems for sums of the form where $f$ is a smooth and compactly supported function on $\mathbb{R}^{m+n}$, and $\hat f$ denotes the Siegel transform of $f$ (see Section 4.3 for definitions). We stress that even though $f$ is assumed to be bounded, $\hat f$ is unbounded on $X$. To prove the Central Limit Theorems in this setting, we approximate $\hat f$ by compactly supported functions on $X$ and then use the estimates from Section 3. However, the bounds in these estimates crucially depend on the order of approximation, so this step requires a delicate analysis of the error terms. The non-divergence results established in Section 4 play an important role here.
Finally, to prove the Central Limit Theorem for the function $\hat\chi$ (which is the Siegel transform of an indicator function on a nice bounded domain in $\mathbb{R}^{m+n}$), and thus establish Theorem 1.2, we need to approximate $\chi$ with smooth functions, and show that the arguments in Section 5 can be adapted to certain sequences of Siegel transforms of smooth and compactly supported functions. This will be done in Section 6.

ESTIMATES ON HIGHER-ORDER CORRELATIONS
Let $X$ denote the space of unimodular lattices in $\mathbb{R}^{m+n}$. Setting $G := \mathrm{SL}_{m+n}(\mathbb{R})$ and $\Gamma := \mathrm{SL}_{m+n}(\mathbb{Z})$, we may consider the space $X$ as a homogeneous space under the linear action of the group $G$, so that $X \simeq G/\Gamma$. We fix $m, n \ge 1$ and denote by $U$ the subgroup
$$U := \left\{ \begin{pmatrix} I_m & u \\ 0 & I_n \end{pmatrix} : u \in M_{m,n}(\mathbb{R}) \right\} < G,$$
and set $Y := U\mathbb{Z}^{m+n} \subset X$. Geometrically, $Y$ can be visualized as an $mn$-dimensional torus embedded in the space of lattices $X$. We denote by $\mu_Y$ the probability measure on $Y$ induced by the Lebesgue probability measure on $M_{m,n}([0, 1])$, and we note that $Y$ corresponds to the collection of unimodular lattices $\Lambda_u$, for $u \in M_{m,n}([0, 1))$, introduced earlier in (1.8).
Let us further fix positive numbers $w_1, \ldots, w_{m+n}$ satisfying
$$w_1 + \cdots + w_m = w_{m+1} + \cdots + w_{m+n},$$
and denote by $(a_t)$ the one-parameter semi-group
$$a_t := \mathrm{diag}\big(e^{w_1 t}, \ldots, e^{w_m t}, e^{-w_{m+1} t}, \ldots, e^{-w_{m+n} t}\big), \quad t > 0. \tag{2.2}$$
The aim of this section is to analyze the asymptotic behavior of $a_t Y \subset X$ as $t \to \infty$, and investigate "decoupling" of correlations of the form
$$\int_Y \varphi_1(a_{t_1} y) \cdots \varphi_r(a_{t_r} y)\, d\mu_Y(y) \tag{2.3}$$
for "large" $t_1, \ldots, t_r > 0$. It will be essential for our subsequent argument that the error terms in this "decoupling" are explicit in terms of the parameters $t_1, \ldots, t_r > 0$ and suitable norms of the functions $\varphi_1, \ldots, \varphi_r$, which we now introduce.
Every Y ∈ Lie(G) defines a first order differential operator If we fix an (ordered) basis {Y 1 , . . . , Y r } of Lie(G), then every monomial Z = Y ℓ 1 1 · · · Y ℓ r r defines a differential operator by of degree deg(Z) = ℓ 1 + · · · + ℓ r . For k 1 and φ ∈ C ∞ c (X), we define the norm The starting point of our discussion is a well-known quantitative estimate on correlations of smooth functions on X: Theorem 2.1. There exist γ > 0 and k 1 such that for all φ 1 , φ 2 ∈ C ∞ c (X) and g ∈ G, . This theorem has a very long history that we will not attempt to survey here, but only mention that a result of this form can be found, for instance, in [7,9].
From now on, we fix $k \ge 1$ so that Theorem 2.1 holds.
Our goal is to decouple the higher-order correlations in (2.3), but in order to state our results we first need to introduce a family of finer norms on $C_c^\infty(X)$ than $(\|\cdot\|_{L^2_k(X)})$. Let us denote by $\|\cdot\|_{C^0}$ the uniform norm on $C_c(X)$. If we fix a right-invariant Riemannian metric on $G$, then it induces a metric $d$ on $X \simeq G/\Gamma$, which allows us to define the norms and for $\varphi \in C_c^\infty(X)$. We shall prove: where Remark 2.3. The case $r = 1$ was proved by Kleinbock and Margulis in [10], and our arguments are inspired by theirs. We stress that the constant $\delta$ in Theorem 2.2 is independent of $r$.
We also record the following corollary of Theorem 2.2.

Preliminary results
We recall that $d$ is a distance on $X \cong G/\Gamma$ induced from a right-invariant Riemannian metric on $G$. We denote by $B_G(\rho)$ the ball of radius $\rho$ centered at the identity in $G$. For a point $x \in X$, we let $\iota(x)$ denote the injectivity radius at $x$, that is to say, the supremum over $\rho > 0$ such that the map $B_G(\rho) \to X : g \mapsto gx$ is injective. (2.7) By Mahler's Compactness Criterion, $K_\varepsilon$ is a compact subset of $X$. Furthermore, using Reduction Theory, one can show: Proposition 2.5. $\iota(x) \gg \varepsilon^{m+n}$ for every $x \in K_\varepsilon$. An important role in our argument will be played by the one-parameter semi-group
$$b_t := \mathrm{diag}\big(e^{t/m}, \ldots, e^{t/m}, e^{-t/n}, \ldots, e^{-t/n}\big), \quad t > 0, \tag{2.8}$$
which coincides with the semi-group $(a_t)$ as defined in (2.2) with the special choice of exponents $w_1 = \cdots = w_m = 1/m$ and $w_{m+1} = \cdots = w_{m+n} = 1/n$. The submanifold $Y \subset X$ is an unstable manifold for the flow $(b_t)$, which makes the analysis of the asymptotic behavior of $b_t Y$ significantly easier than that of $a_t Y$ for general parameters. Using Theorem 2.1, Kleinbock and Margulis proved in [7] a quantitative equidistribution result for the family $b_t Y$ as $t \to \infty$; we shall use a version of this result from their later work [10].

Remark 2.7.
Although the dependence on φ is not stated in [10, Theorem 2.3], the estimate is explicit in the proof.
We will prove Theorem 2.2 through successive uses of Theorem 2.6. In order to make things more transparent, it will be convenient to embed the flow (a t ) as defined in (2.2) in a multiparameter flow as follows. For s = (s 1 , . . . , s m+n ) ∈ R m+n , we set a(s) := diag e s 1 , . . . , e s m , e −s m+1 , . . . , e −s m+n . (2.9) We denote by S + the cone in R m+n consisting of those s = (s 1 , . . . , s m+n ) which satisfy and thus, with s t := (w 1 t, . . . , w m+n t), we see that a t = a(s t ).
In addition to Theorem 2.6, we shall also need the following quantitative non-divergence estimate for unipotent flows established by Kleinbock and Margulis in [10]. Theorem 2.8. There exists $\theta = \theta(m, n) > 0$ such that for every compact $L \subset X$ and a Euclidean ball $B \subset U$ centered at the identity, there exists $T_0 > 0$ such that for every $\varepsilon \in (0, 1)$, $x \in L$, and $s \in S^+$ satisfying $\lfloor s \rfloor \ge T_0$, one has $|\{u \in B : a(s)ux \notin K_\varepsilon\}| \ll \varepsilon^\theta |B|$.
(2.10) The next lemma provides an additional parameter $s \in S^+$, which depends on the $r$-tuple $(t_1, \ldots, t_r)$. This parameter will be used throughout the proof of Theorem 2.2, and we stress that the accompanying constants $c_1$, $c_2$ and $c_3$ are independent of $r$ and the $r$-tuple $(t_1, \ldots, t_r)$. Lemma 2.9. There exist $c_1, c_2, c_3 > 0$ such that given any $t_r > t_{r-1} > 0$, there exists $s \in S^+$ satisfying: . . . (iii) $s = \left(\frac{z}{m}, \ldots, \frac{z}{m}, \frac{z}{n}, \ldots, \frac{z}{n}\right)$ for some $z \ge c_3 \min(t_{r-1}, t_r - t_{r-1})$. Proof. We start the proof by defining $s$ by the formula in (iii), where the parameter $z$ will be chosen later, that is to say, we set Then (i) holds provided that where $A_1 := \min(w_i : 1 \le i \le m)$ and $A_2 := \min(w_i : m + 1 \le i \le m + n)$, so if we set $c_1 = \min(A_1, A_2)$, then (i) holds when
$$z \le c_1 \min(m, n)\, t_{r-1}. \tag{2.11}$$
To arrange (ii), we observe that and thus (ii) holds provided that If we let $c_2 = \min(A_1, A_2)/2$, then (ii) holds when
$$z \le c_2 \min(m, n)(t_r - t_{r-1}). \tag{2.12}$$
So far we have arranged that (i) and (ii) hold provided that $z$ satisfies (2.11) and (2.12). Let $c_3 = \min(c_1, c_2) \min(m, n)$, and note that if we pick $z = c_3 \min(t_{r-1}, t_r - t_{r-1})$, then (i), (ii) and (iii) are all satisfied.
Let us now continue with the proof of Theorem 2.2. With the parameter s provided by Lemma 2.9 above, we have where b z is defined as in (2.8) and D as in (2.10) Let us throughout the rest of the proof fix a compact set Ω ⊂ U. Our aim now is to estimate integrals of the form where f ranges over C ∞ c (U) with supp(f) ⊂ Ω. Our proof will proceed by induction over r, the case r = 0 being trivial.
Before we can start the induction, we need some notation. Let ρ 0 and k be as in Theorem 2.6, and pick for 0 < ρ < ρ 0 , a non-negative ω ρ ∈ C ∞ c (X) such that for some fixed σ = σ(m, n, k) > 0. The integral I r can now be rewritten as follows: If we set and x s,u := a(s)ux 0 , then is non-expanding, and thus we can conclude that there exists a fixed Euclidean ball B in U (depending only on Ω and ρ 0 ), and centered at the identity, such that if f s,u (v) = 0, then u ∈ B. This implies that the integral I r can be written as and We decompose the integral I r in (2.17) as To estimate I ′ r (ε), we first recall that s c 1 D by (2.13), so if D T 0 /c 1 , where T 0 is as in Theorem 2.8 applied to L = K ε and B, then the same theorem implies that there exists θ > 0 such that for every ε ∈ (0, 1), and thus Let us now turn to the problem of estimating I ′′ r (ε). Since the Riemannian distance d on G, restricted to U, and the Euclidean distance on U are equivalent on a small open identity neighborhood, we see that for v ∈ B G (ρ 0 ) and any t ∈ S + . Hence, using (2.14), we obtain that for all i = 1, . . . , r − 1, and thus, for all v ∈ B G (ρ 0 ), . This leads to the estimate Hence, using (2.19), we obtain that Since by (2.15), we apply Theorem 2.6 to estimate this integral. We recall that supp we have x s,u ∈ K ε , so that ι(x s,u ) ≫ ε m+n by Proposition 2.5. In particular, we may take to arrange that ι(x s,u ) > 2ρ. Hence, by applying Theorem 2.6, we deduce that there exist c, γ > 0 such that , for all u ∈ B\B ε . Using (2.18)-(2.19) and z c 3 D, we deduce that Applying (2.22) one more time (in the backward direction), we get It follows from (2.20) that where we recognize the first term as I r−1 . Using (2.17) and (2.19), we now conclude that Hence, combining this estimate with (2.23), we deduce that and thus, in view of (2.21), This estimate holds whenever ρ < ρ 0 and ε ≫ ρ 1/(m+n) . 
Taking ρ = e −c 4 D for sufficiently small c 4 > 0 and ε ≪ ρ 1/(m+n) , we conclude that there exists δ > 0 such that for all sufficiently large D, The exponent δ depends on the constants c 2 and c 3 given by Lemma 2.9 and the parameters θ, c, σ, γ. In particular, δ is independent of r. By possibly enlarging the implicit constants we can ensure that the estimate (2.24) also holds for all r-tuples (t 1 , . . . , t r ), and not just the ones with sufficiently large D(t 1 , . . . , t r ). By iterating the estimate (2.24), using that I 0 is a constant, the proof of Theorem 2.2 is finished.

CLT FOR FUNCTIONS WITH COMPACT SUPPORT
Let $a = \mathrm{diag}(a_1, \ldots, a_{m+n})$ be a diagonal linear map of $\mathbb{R}^{m+n}$ with $a_1, \ldots, a_m > 1$, $0 < a_{m+1}, \ldots, a_{m+n} < 1$, and $a_1 \cdots a_{m+n} = 1$. The map $a$ defines a continuous self-map of the space $X$, which preserves $\mu_X$. We recall that the torus $Y = U\mathbb{Z}^{m+n} \subset X$ is equipped with the probability measure $\mu_Y$. In this section, we shall prove a Central Limit Theorem for the averages We stress that this result is not needed in the proof of Theorem 1.2, but we nevertheless include it here because we feel that its proof might be instructive before entering the proof of the similar, but far more technical, Theorem 6.1.
Remark 3.2. It follows from Theorem 2.1 that the variance σ φ is finite.
Our main tool in the proof of Theorem 3.1 will be the estimates on higher-order correlations established in Section 2. To lighten the notation, we shall use a simplified version of Corollary 2.4 stated in terms of $C^k$-norms (note that $N_k \ll \|\cdot\|_{C^k}$):

The method of cumulants
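Let $(X, \mu)$ be a probability space. Given bounded measurable functions $\varphi_1, \ldots, \varphi_r$ on $X$, we define their joint cumulant as
$$\mathrm{Cum}^{(r)}_\mu(\varphi_1, \ldots, \varphi_r) := \sum_{\mathcal{P}} (-1)^{|\mathcal{P}| - 1} (|\mathcal{P}| - 1)! \prod_{I \in \mathcal{P}} \int_X \prod_{i \in I} \varphi_i \, d\mu,$$
where the sum is taken over all partitions $\mathcal{P}$ of the set $\{1, \ldots, r\}$. When it is clear from the context, we skip the subscript $\mu$. For a bounded measurable function $\varphi$ on $X$, we also set $\mathrm{Cum}^{(r)}_\mu(\varphi) := \mathrm{Cum}^{(r)}_\mu(\varphi, \ldots, \varphi)$. We shall use the following classical CLT-criterion (see, for instance, [5]).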
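As an illustration of the definition, the following Python sketch computes joint cumulants on a finite probability space by summing over all set partitions. It assumes the standard partition formula for joint cumulants, with weight $(-1)^{|\mathcal{P}|-1}(|\mathcal{P}|-1)!$ for a partition $\mathcal{P}$; the names `set_partitions` and `joint_cumulant` are hypothetical.

```python
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def joint_cumulant(functions, points, weights):
    """Joint cumulant of `functions` with respect to the probability
    measure that gives mass weights[k] to points[k]."""
    def integral_of_product(block):
        return sum(w * prod(functions[i](x) for i in block)
                   for x, w in zip(points, weights))

    total = 0.0
    for partition in set_partitions(list(range(len(functions)))):
        blocks = len(partition)
        coefficient = (-1) ** (blocks - 1) * factorial(blocks - 1)
        total += coefficient * prod(integral_of_product(b) for b in partition)
    return total
```

For two functions this reduces to the covariance, and the joint cumulant of functions depending on independent coordinates vanishes, which is the kind of cancellation exploited in the estimates of this section.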

Proposition 3.4. Let (F T ) be a sequence of real-valued bounded measurable functions such that
and lim T →∞ Then for every ξ ∈ R, Since all the moments of a random variable can be expressed in terms of its cumulants, this criterion is equivalent to the more widely known "Method of Moments". However, the cumulants have curious, and very convenient, cancellation properties that will play an important role in our proof of Theorem 3.1.
For a partition Q of {1, . . . , r}, we define the conditional joint cumulant with respect to Q as In what follows, we shall make frequent use of the following proposition.

Estimating cumulants
Fix $\varphi \in C_c^\infty(X)$. It will be convenient to write $\psi_s(y) := \varphi(a^s y) - \mu_Y(\varphi \circ a^s)$, so that In this section, we shall estimate cumulants of the form for $r \ge 3$. Since we shall later need to apply these estimates in cases when the function $\varphi$ is allowed to vary with $N$, it will be important to keep track of the dependence on $\varphi$ in our estimates.
We shall decompose (3.5) into sub-sums where the parameters $s_1, \ldots, s_r$ are either "separated" or "clustered", and it will also be important to control their sizes. For this purpose, it will be convenient to consider the set $\{0, \ldots, N - 1\}^r$ as a subset of $\mathbb{R}^{r+1}_+$ via the embedding $(s_1, \ldots, s_r) \mapsto (0, s_1, \ldots, s_r)$. Following the ideas developed in the paper [2], we define for non-empty subsets For $0 \le \alpha < \beta$, we define The following decomposition of $\mathbb{R}^{r+1}_+$ was established in our paper [2, Prop. 6.2]: given we have where the union is taken over the partitions $Q$ of $\{0, \ldots, r\}$ with $|Q| \ge 2$. Upon taking restrictions, we also have where , In order to estimate the cumulant (3.5), we shall separately estimate the sums over $\Omega(\beta_{r+1}; N)$ and $\Omega_Q(\alpha_j, \beta_{j+1}; N)$; the exact choices of the sequences $(\alpha_j)$ and $(\beta_j)$ will be fixed at the very end of our argument.
In this case, $s_i \le \beta_{r+1}$ for all $i$, and thus Hence, where the implied constants may depend on $r$, but we shall henceforth omit this subscript to simplify notations.

Case 1: Summing over
In this case, we have so that it follows that Hence, 1 In this case, the partition Q defines a non-trivial partition and s i α j for all i ∈ I 0 , and s i > β j+1 for all i / ∈ I 0 . In particular, Let I be an arbitrary subset of {1, . . . , r}; we shall show that where we henceforth shall use the convention that the product is equal to one when I ∩ I h = ∅.
Let us estimate the right hand side of (3.13). We begin by setting It is easy to see that there exists ξ = ξ(m, n, k) > 0 such that (3.14) To prove (3.13), we expand We recall that when i / ∈ I 0 , we have s i β j+1 , and thus it follows from Corollary 3.3 with r = 1 that To estimate the other integrals in (3.15), we also apply Corollary 3.3. Let us first fix a subset J ⊂ I\I 0 and for each 1 h l, we pick i h ∈ I h , and set We note that for i ∈ I h , we have |s i − s i h | α j , and thus there exists ξ = ξ(m, n, k) > 0 such that Using (3.14) and (3.16) and invariance of the measure µ X , we deduce that Hence, we conclude that We shall choose the parameters α j and β j+1 so that Substituting (3.15) and (3.17) Next, we estimate the right hand side of (3.13). Let us fix 1 h l and for a subset J ⊂ I ∩ I h , we define As in (3.16), for some ξ > 0, Applying Corollary 3.3 to the function Φ J and using that where we have used a-invariance of µ X . Combining (3.15) and (3.20), we deduce that Furthermore, multiplying out the products over I ∩ I h , we get Comparing (3.19) and (3.22), we finally conclude that Y i∈I when (s 1 , . . . , s r ) ∈ Ω Q (α j , β j+1 ; N). This establishes (3.13) with an explicit error term. This estimate implies that for the partition Q ′ = {I 0 , . . . , I ℓ }, µ Y (ψ s 1 , . . . , ψ s r |Q ′ ) = 0, so it follows that for all (s 1 , . . . , s r ) ∈ Ω Q (α j , β j+1 ; N),

Estimating the variance
In this section, we wish to compute the limit Setting $s_1 = s + t$ and $s_2 = t$, we rewrite the above sums as It follows from Corollary 3.3 that for fixed $s$ and as $t \to \infty$, as $t \to \infty$, and for fixed $s$, If one carelessly interchanges limits above, one expects that as $N \to \infty$, (3.30) To prove this limit rigorously, we need to say a bit more to ensure, say, dominated convergence.

Proof of Theorem 3.1
In this subsection we shall check that the conditions of Proposition 3.4 hold for the sequence $(F_N)$ defined in (3.1). First, by construction, It is easy to check that for every $r \ge 3$, which finishes the proof.

Siegel transforms
We recall that the space $X$ of unimodular lattices in $\mathbb{R}^{m+n}$ can be identified with the quotient space $G/\Gamma$, where $G = \mathrm{SL}_{m+n}(\mathbb{R})$ and $\Gamma = \mathrm{SL}_{m+n}(\mathbb{Z})$, which is endowed with the $G$-invariant probability measure $\mu_X$. We denote by $m_G$ a bi-$G$-invariant Radon measure on $G$. Given a bounded measurable function $f : \mathbb{R}^{m+n} \to \mathbb{R}$ with compact support, we define its Siegel transform $\hat f : X \to \mathbb{R}$ by
$$\hat f(\Lambda) := \sum_{z \in \Lambda \setminus \{0\}} f(z).$$
We stress that $\hat f$ is unbounded on $X$; its growth is controlled by an explicit function $\alpha$ which we now introduce. Given a lattice $\Lambda$ in $\mathbb{R}^{m+n}$, we say that a subspace It follows from the Mahler Compactness Criterion that $\alpha$ is a proper function on $X$.
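To see concretely why the Siegel transform is unbounded, one can evaluate it on explicit lattices. The Python sketch below treats lattices in $\mathbb{R}^2$ only and assumes the usual definition of the Siegel transform as the sum of $f$ over the nonzero lattice vectors; the function name `siegel_transform_2d` and the parameter `support_radius` are hypothetical.

```python
from math import ceil, hypot


def siegel_transform_2d(f, basis, support_radius):
    """Sum f over the nonzero vectors of the lattice spanned by the
    columns of the 2x2 matrix `basis`.

    `f` is assumed to vanish outside the Euclidean ball of radius
    `support_radius` centered at the origin.
    """
    (a, b), (c, d) = basis
    det = a * d - b * c
    # Rows of the inverse matrix: coefficient_i = (row_i) . v for v in the lattice.
    inverse_rows = [(d / det, -b / det), (-c / det, a / det)]
    # Cauchy-Schwarz: |coefficient_i| <= |row_i| * |v| <= |row_i| * support_radius.
    bounds = [ceil(hypot(*row) * support_radius) for row in inverse_rows]
    total = 0.0
    for k1 in range(-bounds[0], bounds[0] + 1):
        for k2 in range(-bounds[1], bounds[1] + 1):
            if k1 == 0 and k2 == 0:
                continue  # the zero vector is excluded from the sum
            x, y = k1 * a + k2 * b, k1 * c + k2 * d
            total += f(x, y)
    return total
```

Taking $f$ to be the indicator function of a disc and the lattices spanned by $(t, 0)$ and $(0, 1/t)$ with $t \to \infty$, the value of the sum grows without bound although $f$ itself is bounded, since the lattice acquires arbitrarily short vectors.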
Using Reduction Theory, it is not hard to derive the following integrability of α: In what follows, dz denotes the volume element on R m+n which assigns volume one to the unit cube. In our arguments below, we will make heavy use of the following two integral formulas:  [18]). If f : R m+n → R is a bounded Riemann integrable function with compact support, then Proposition 4.4 (Rogers Formula; [15], Theorem 5). If F : R m+n × R m+n → R is a non-negative measurable function, then where the sum is taken over primitive integral vectors, and ζ denotes Riemann's ζ-function.

Non-divergence estimates
We retain the notation from Section 2. Given $0 < w_1, \ldots, w_m < n$ and $w_1 + \cdots + w_m = n$, we denote by $a$ the self-map on $X$ induced by
$$a = \mathrm{diag}\big(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1}\big). \tag{4.1}$$
Our goal in this subsection is to analyze the escape of mass for the submanifolds $a^s Y$ and bound the Siegel transforms $\hat f(a^s y)$ for $y \in Y$. The following proposition will play a very important role in our arguments.

Proposition 4.5.
There exists $\kappa > 0$ such that for every $L \ge 1$ and $s \ge \kappa \log L$, $\mu_Y(\{y \in Y : \alpha(a^s y) \ge L\}) \ll_p L^{-p}$ for all $p < m + n$.
Proof. Let $\chi_L$ be the characteristic function of the subset $\{\alpha < L\}$ of $X$. By Mahler's Compactness Criterion, $\chi_L$ has compact support. We further pick a non-negative $\rho \in C_c^\infty(G)$ with $\int_G \rho \, dm_G = 1$. Let $\eta_L := \rho * \chi_L$. It follows from invariance of $m_G$ that if $D_Z$ is a differential operator as defined in (2.4), then $D_Z \eta_L = (D_Z \rho) * \chi_L$. Hence, $\eta_L \in C_c^\infty(X)$, and $\|\eta_L\|_{C^k} \ll \|\rho\|_{C^k}$.

By Corollary 3.3, there exist $\delta > 0$ and $k \ge 1$ such that
Proof. We note that there exist $0 < \upsilon_1 < \upsilon_2$ and $\vartheta > 0$ such that the support of $f$ is contained in the set and without loss of generality we may assume that $f$ is the characteristic function of this set. We recall that $Y$ can be identified with the collection of lattices Hence, We observe that for each $i$ and $p_i \in \mathbb{Z}$, the volume of the set $\{u \in [0, 1]^n : |p + \langle u, q \rangle| \le \vartheta \|q\|^{-w_i}\}$ is estimated from above by $2\vartheta \|q\|^{-w_i} / \max_j |q_j| \ll \|q\|^{-1-w_i}$, and we note that the set is empty whenever $|p| > \sum_j |q_j| + \vartheta \|q\|^{-w_i}$. In particular, it is non-empty for at most $O(\|q\|)$ choices of $p \in \mathbb{Z}$. Hence, we deduce that where $\nu_1 = 1$ and $\nu_m = 0$ when $m \ge 2$.
Proof. As in the proof of Proposition 4.6, it is sufficient to consider the case when $f$ is the characteristic function of the set (4.2). Then $\hat f(a^s y)$ is given by (4.3), and we get For fixed $q, \ell \in \mathbb{Z}^n$, we wish to estimate First, we consider the case when $q$ and $\ell$ are linearly independent. Then there exist indices $j, k = 1, \ldots, n$ such that $q_j \ell_k - q_k \ell_j \ne 0$. Let us consider the function $\psi$ on $\mathbb{R}^2$ defined by ψ(x 1 , ℓ (x 2 ) as well as the periodized function $\tilde\psi$ on $\mathbb{R}^2/\mathbb{Z}^2$ defined by $\tilde\psi(x) = \sum_{z \in \mathbb{Z}^2} \psi(z + x)$. If we set $\omega := \sum_{\zeta \ne j,k} q_\zeta u_\zeta$ and $\rho := \sum_{\zeta \ne j,k} \ell_\zeta u_\zeta$, then we denote by $S$ the affine map which induces an affine endomorphism of the torus $\mathbb{R}^2/\mathbb{Z}^2$. We note that where $\mu$ denotes the Lebesgue probability measure on the torus $\mathbb{R}^2/\mathbb{Z}^2$. Since the endomorphism $S$ preserves $\mu$, we see that Therefore, we conclude that in this case, Let us now consider the second case when $q$ and $\ell$ are linearly dependent. Upon rearranging indices if needed, we may assume that
$$|q_1| = \max(|q_1|, \ldots, |q_n|, |\ell_1|, \ldots, |\ell_n|). \tag{4.5}$$
In particular, $q_1 \ne 0$, and thus $\ell_1 \ne 0$, since $q$ and $\ell$ are linearly dependent, so we can define the new variables $v_1 = \sum_{j=1}^n (\ell_j/\ell_1) u_j$ and $v_2 = u_2, \ldots, v_n = u_n$, and thus We note that the last integral is non-zero only when $|p| \ll |q_1|$ and $|r| \ll |\ell_1|$. We set $q_1 = q' d$ and $\ell_1 = \ell' d$ where $d = \gcd(q_1, \ell_1)$. Then $q_1 r - \ell_1 p = jd$ for some $j \in \mathbb{Z}$. We observe that when $j$ is fixed, then the integers $p$ and $r$ satisfy the equation $q' r - \ell' p = j$. Since $\gcd(q', \ell') = 1$, all the solutions of this equation are of the form $p = p_0 + kq'$, $r = r_0 + k\ell'$ for $k \in \mathbb{Z}$. In particular, it follows that the number of such solutions satisfying $|p| \ll |q_1|$ and $|r| \ll |\ell_1|$ is at most $O(d)$. We write the resulting sum as $J_i^{(1)}(q, \ell) + J_i^{(2)}(q, \ell)$, where the first sum is taken over those $p, r$ with $q_1 r - \ell_1 p = 0$, and the second sum is taken over those $p, r$ with $q_1 r - \ell_1 p \ne 0$.
Upon applying a linear change of variables, we obtain Let us consider the function We note that the integrand equals the indicator function of the intersection of the intervals thus it follows that $\rho_i$ is non-increasing when $x \ge 0$, and non-decreasing when $x \le 0$. This implies that Next, we proceed with estimation of $J_i^{(2)}(q, \ell)$. Let $c_0 := \min\{\|q\| : q \in \mathbb{Z}^n \setminus \{0\}\}$. Denoting by $N(q_1, \ell_1)$ the number of solutions $(p, r)$ of the equation where we used that $q_1$ is chosen according to (4.5). Combining the obtained estimates for $J_i^{(1)}(q, \ell)$ and $J_i^{(2)}(q, \ell)$, we conclude that when $q$ and $\ell$ are linearly dependent, where $q_1$ is chosen according to (4.5).
Using (4.4), the sum in (4.7) over linearly independent q and ℓ can be estimated as For a subset I of {1, . . . , m}, we set w(I) := i∈I w i . Then using (4.6), we deduce that the sum in (4.7) over linearly dependent q and ℓ is bounded by The star indicates that the sum is taken over linearly dependent q and ℓ.
When $I^c = \emptyset$, then $w(I) = n$. Since the number of $(q, \ell)$ satisfying $\upsilon_1 e^s \le \|q\|, \|\ell\| \le \upsilon_2 e^s$ is estimated as $O(e^{2ns})$, it is clear that the corresponding term in the above sum is uniformly bounded. Now we suppose that $I^c \ne \emptyset$. Since $q$ and $\ell$ are linearly dependent, the vector $\ell$ is uniquely determined given $\ell_1$ and $q$, and we obtain that for some $\upsilon_1', \upsilon_2' > 0$, We shall use the following lemma: Lemma 4.8. For every $k \ge 1$, Proof. We observe that the sum of $N(q, \ell)^k$ over $1 \le \ell \le q$ is equal to the number of solutions $(p_1, \ldots, p_k, r_1, \ldots, r_k, \ell)$ of the system of equations satisfying $|p_1|, \ldots, |p_k| \le (c_0^{-n}\vartheta + n)q$ and $1 \le \ell \le q$. We order these solutions according to $d := \gcd(q, \ell)$. Let $q = q' d$ and $\ell = \ell' d$. Then $q'$ and $\ell'$ are coprime, and the system (4.9) is equivalent to (4.10) Because of coprimality, each $p_i$ has to be divisible by $q'$, so that the number of such $p_i$'s is at most $O(q/q') = O(d)$. We note that given $d$ the number of possible choices for $\ell$ is at most $q/d$, and $(p_1, \ldots, p_k, \ell)$ uniquely determine $(r_1, \ldots, r_k)$. Hence, the number of solutions of (4.9) is estimated by where $\sigma_{k-1}(q) = \sum_{d | q} d^{k-1}$. Writing $q = q' d$, we conclude that This proves the lemma.
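The divisor function $\sigma_{k-1}(q) = \sum_{d | q} d^{k-1}$ which appears in this count is easy to tabulate. The following short Python sketch does so by trial division; the name `divisor_power_sum` is hypothetical, and the routine is meant only for experimenting with small values of $q$.

```python
def divisor_power_sum(q, exponent):
    """Return the sum of d**exponent over the positive divisors d of q."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    total = 0
    d = 1
    while d * d <= q:
        if q % d == 0:
            total += d ** exponent
            partner = q // d          # the complementary divisor q / d
            if partner != d:          # avoid double counting when q = d * d
                total += partner ** exponent
        d += 1
    return total
```

With `exponent = k - 1` this gives $\sigma_{k-1}(q)$; for example, `exponent = 0` counts the divisors of $q$.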
A simple modification of this argument also gives that Hence, it follows that * The terms in this sum are uniformly bounded unless I = ∅ and |I c | = 1, namely, when m > 1. When m = 1, we obtain the bound O(1 + s). This proves the proposition.

Truncated Siegel transform
The Siegel transform of a compactly supported function is typically unbounded on $X$; to deal with this complication, it is natural to approximate $\hat f$ by compactly supported functions on $X$, the so-called truncated Siegel transforms, which we shall denote by $\hat f^{(L)}$. They will be constructed using a smooth cut-off function $\eta_L$, which will be defined in the following lemma.
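Proof. Let $\chi_L$ denote the indicator function of the subset $\{\alpha \le L\} \subset X$, and pick a non-negative $\phi \in C_c^\infty(G)$ with $\int_G \phi \, dm_G = 1$ and with support in a sufficiently small neighbourhood of the identity in $G$ to ensure that for all $g \in \operatorname{supp}(\phi)$ and $x \in X$, $c^{-1}\alpha(x) \le \alpha(g^{-1}x) \le c\,\alpha(x)$. We now define $\eta_L$ as the convolution $\eta_L(x) := (\phi * \chi_L)(x) = \int_G \phi(g)\,\chi_L(g^{-1}x)\, dm_G(g)$. Since $\phi \ge 0$ and $\int_G \phi \, dm_G = 1$, it is clear that $0 \le \eta_L \le 1$. If $\alpha(x) \le c^{-1}L$, then for $g \in \operatorname{supp}(\phi)$, we have $\alpha(g^{-1}x) \le L$, so that $\eta_L(x) = \int_G \phi \, dm_G = 1$. If $\alpha(x) > cL$, then for $g \in \operatorname{supp}(\phi)$, we have $\alpha(g^{-1}x) > L$, so that $\eta_L(x) = 0$.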
To prove the last property, we observe that it follows from invariance of $m_G$ that for a differential operator $D_Z$ as in (2.4), we have $D_Z \eta_L = (D_Z \phi) * \chi_L$. Therefore, $\operatorname{supp}(D_Z \eta_L) \subset \{\alpha \le cL\}$ and
For a bounded function $f : \mathbb{R}^{m+n} \to \mathbb{R}$ with compact support, we define the truncated Siegel transform of $f$ as $\hat f^{(L)} := \hat f \cdot \eta_L$.
We record some basic properties of this transform that will be used later in the proofs.
We observe that for a differential operator $D_Z$ as in (2.4), we have $D_Z(\hat f) = \widehat{D_Z f}$. Hence, we deduce from Proposition 4.1 that Since $\operatorname{supp}(\eta_L) \subset \{\alpha \le cL\}$ and $\|\eta_L\|_{C^k} \ll 1$, we deduce that To prove (4.14), we observe that since $0 \le \eta_L \le 1$ and $\eta_L = 1$ on $\{\alpha < c^{-1}L\}$, it follows from Proposition 4.1 that Hence, applying the Hölder inequality with $1 \le p < m + n$ and $q = (1 - 1/p)^{-1}$, we deduce that which proves (4.14). The proof of (4.15) is similar, and we omit the details.

CLT FOR SMOOTH SIEGEL TRANSFORMS
Assume that $f \in C_c^\infty(\mathbb{R}^{m+n})$ satisfies $f \ge 0$ and $\operatorname{supp}(f) \subset \{(x_{m+1}, \ldots, x_{m+n}) \ne 0\}$. In this section we shall analyze the asymptotic behavior of the averages and prove the following result:
Theorem 5.1. If $m \ge 2$ and $f$ is as above, then the variance is finite, and for every $\xi \in \mathbb{R}$,
The proof of Theorem 5.1 follows the same plan as the proof of Theorem 3.1, but we need to develop an additional approximation argument which involves truncations of the Siegel transform $\hat f$. This can potentially change the behavior of the averages $F_N$, so we will have to take into account possible escape of mass for the sequences of submanifolds $a^s Y$ in $X$.
Throughout the proof, we shall frequently make use of the basic observation that if we approximate $F_N$ by $\tilde F_N$ in such a way that $\|F_N - \tilde F_N\|_{L^1(Y)} \to 0$, then $F_N$ converges in distribution if and only if $\tilde F_N$ does, and in that case the limits coincide.
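A brief justification of this observation, as a sketch: write $\mu_Y$ for the probability measure on $Y$ (notation assumed here) and let $h$ be a bounded Lipschitz test function with Lipschitz constant $\operatorname{Lip}(h)$. Then

```latex
\left| \int_Y h(F_N)\, d\mu_Y - \int_Y h(\tilde F_N)\, d\mu_Y \right|
\;\le\; \operatorname{Lip}(h) \int_Y \bigl| F_N - \tilde F_N \bigr|\, d\mu_Y
\;=\; \operatorname{Lip}(h)\, \bigl\| F_N - \tilde F_N \bigr\|_{L^1(Y)}
\;\longrightarrow\; 0 .
```

Since convergence in distribution can be tested against bounded Lipschitz functions, the two sequences have the same distributional limit whenever either one converges.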
for some $M = M(N) \to \infty$ that will be chosen later. We observe that and thus, by Proposition 4.6 we see that In particular, it follows that if (5.1) holds for $\tilde F_N$, then it also holds for $F_N$, so we shall prove the former. In order to simplify notation, let us drop the tilde, and assume from now on that $F_N$ is given by (5.2).
Given a sequence $L = L(N)$, which shall be chosen later, we consider the average $F_N^{(L)}$ defined for the truncated Siegel transforms $\hat f^{(L)}$ introduced in Section 4.3. We have
Hence, by the Cauchy-Schwarz inequality, Let us now additionally assume that so that the assumption of Proposition 4.5 is satisfied when $s \ge M$. This implies that We also recall that by Proposition 4.7 when $m \ge 2$, and thus then it follows that Hence, if we can show Theorem 5.1 for the averages $F_N^{(L)}$ with the parameter constraints above, then it would also hold for $F_N$. In order to prove the CLT for $(F_N^{(L)})$, we follow the route of Proposition 3.4 and estimate cumulants and $L^2$-norms of the sequence.
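To make the role of cumulants concrete: a centred Gaussian is characterised by the vanishing of all cumulants of order $r \ge 3$, which is why controlling the variance and the higher cumulants of the averages suffices. The sketch below is purely illustrative and is not the estimate carried out in the paper. It converts raw moments into cumulants with the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$; the function name is hypothetical.

```python
from math import comb


def cumulants_from_moments(moments):
    """Convert raw moments [m_1, ..., m_n] into cumulants [kappa_1, ..., kappa_n].

    Uses kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}.
    """
    kappa = []
    for n in range(1, len(moments) + 1):
        correction = sum(
            comb(n - 1, k - 1) * kappa[k - 1] * moments[n - k - 1]
            for k in range(1, n)
        )
        kappa.append(moments[n - 1] - correction)
    return kappa


if __name__ == "__main__":
    # Raw moments of the standard normal law: 0, 1, 0, 3, 0, 15.
    print(cumulants_from_moments([0, 1, 0, 3, 0, 15]))
    # Raw moments of the Poisson(1) law: 1, 2, 5, 15 (all cumulants equal 1).
    print(cumulants_from_moments([1, 2, 5, 15]))
```

For the standard normal input only the second cumulant is non-zero, whereas the Poisson example shows a law whose higher cumulants do not vanish.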

Estimating cumulants
We set Our aim is to estimate when $r \ge 3$. The argument proceeds as in Section 3.2, but we have to refine the previous estimates to take into account the dependence on the parameters $L$ and $M$. Using the notation from Section 3.2, we have the decomposition where We decompose the sum into the sums over $\Omega(\beta_{r+1}; M, N)$ and $\Omega_Q(\alpha_j, \beta_{j+1}; M, N)$. Let us choose $M$ so that
$M > \beta_{r+1}$. (5.10)
Then $\Omega(\beta_{r+1}; M, N) = \emptyset$, and this set does not contribute to our estimates. In this case, we shall show that This reduces to estimating the integrals If $(s_1, \ldots, s_r) \in \Omega_Q(\alpha_j, \beta_{j+1}; N)$, and thus it follows from Corollary 3.3 with $r = 1$ that there exists $\delta > 0$ such that For a fixed $J \subset I$, we define and note that for some $\xi = \xi(m, n, k) > 0$, we have If we again apply Corollary 3.3 to the function $\Phi^{(L)}$, we obtain where we used that $\mu_X$ is invariant under the transformation $a$. Let us now choose the exponents $\alpha_j$ and $\beta_{j+1}$ so that $\delta\beta_{j+1} - r\xi\alpha_j > 0$. Combining (5.12), (5.13) and (5.14), we deduce that and thus, for any partition $P$, and consequently, from which (5.11) follows.
We now claim that where we use the notation $x^+ = \max(x, 0)$. The implied constants in (5.17) and below in the proof depend only on $\operatorname{supp}(f)$. By the definition of the cumulant, to prove (5.17), it suffices to show that for every $z \ge 1$ and indices $i_1, \ldots, i_z$, Using the generalized Hölder inequality, we deduce that when $z \le m + n - 1$,
Then we select $L = N^q$, so that, in particular, the condition (5.7) is satisfied. Now (5.21) can be rewritten as Choosing $\gamma$ of the form $\gamma = c_r (\log N)$ with sufficiently large $c_r > 0$, we conclude that

Estimating variances
We now turn to the analysis of the variances of the averages $F_N^{(L)}$, which are given by We proceed as in Section 3.3, taking into account the dependence on the parameters $M$ and $L$. We observe that this expression is symmetric with respect to $s_1$ and $s_2$. Writing $s_1 = s + t$ and $s_2 = t$ with $0 \le s \le N - M - 1$ and $M \le t \le N - s - 1$, we obtain that To estimate $\Theta_N^{(L)}(s)$, we introduce an additional parameter $K = K(N) \to \infty$ such that $K \le M$ (to be specified later) and consider separately the cases when $s < K$ and when $s \ge K$.
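The change of variables $s_1 = s + t$, $s_2 = t$ folds a symmetric double sum over $M \le s_1, s_2 \le N - 1$ onto the range $0 \le s \le N - M - 1$, $M \le t \le N - s - 1$. The sketch below checks this bookkeeping numerically for an arbitrary symmetric kernel; the weight convention (diagonal counted once, off-diagonal terms twice) and the function names are assumptions made for illustration and need not match the normalisation used in the elided display.

```python
def full_sum(c, M, N):
    """Sum of c(s1, s2) over the square M <= s1, s2 <= N - 1."""
    return sum(c(s1, s2) for s1 in range(M, N) for s2 in range(M, N))


def folded_sum(c, M, N):
    """Same sum after writing s1 = s + t, s2 = t and using the symmetry of c."""
    total = 0
    for s in range(0, N - M):            # 0 <= s <= N - M - 1
        weight = 1 if s == 0 else 2      # off-diagonal pairs occur twice
        for t in range(M, N - s):        # M <= t <= N - s - 1
            total += weight * c(s + t, t)
    return total


if __name__ == "__main__":
    kernel = lambda a, b: (a + 1) * (b + 1) + abs(a - b)  # any symmetric kernel
    print(full_sum(kernel, 3, 20), folded_sum(kernel, 3, 20))
```

Both functions return the same value for every symmetric kernel, which is the content of the reindexing.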
First, we consider the case when $s \ge K$. By Corollary 3.3, we have Hence, combining (5.24) and (5.25), we deduce that where we used Lemma 4.10. The implied constants here and below in the proof depend only on $\operatorname{supp}(f)$.
Let us now consider the case $s < K$. We observe that Corollary 3.3 (for $r = 1$) applied to the function $\phi_s^{(L)}$ Combining this estimate with (5.25), we obtain that $\cdots + N^{-1} e^{-\delta M} e^{\xi s} \|\hat f^{(L)}\|_{C^k}^2$. Therefore, using Lemma 4.10, we deduce that $\Theta^{(L)}$ Next, we shall show that with a suitable choice of parameters, We recall that by Lemma 4.10, for all $\tau < m + n - 1$, where the implied constant depends only on $\operatorname{supp}(f)$. It follows from these estimates that We choose the parameters $K = K(N) \to \infty$ and $L = L(N) \to \infty$ so that
$K L^{-(\tau - 1)/2} \to 0$ for some $\tau < m + n - 1$. (5.35)
Then (5.32) follows. We conclude that Next, we compute $\Theta_\infty(s)$ using the Rogers formula (Proposition 4.4) applied to the function Since by the Siegel Mean Value Theorem (Proposition 4.3), we conclude that Finally, we show that the sum in (5.36) is finite. We represent points $z \in \mathbb{R}^{m+n}$ as $z = (x, y)$ with $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. Since $f$ is bounded, and its compact support is contained in $\{y \ne 0\}$, we may assume without loss of generality that $f$ is the characteristic function of the set $\|y\|^{-n}\, dy$.

Proof of Theorem 5.1
As we already remarked above, it is sufficient to show that the sequence of averages $F_N^{(L)}$ converges in distribution to the normal law. To verify this, we use the Method of Cumulants (Proposition 3.4). It is easy to see that Moreover, with a suitable choice of parameters, we have shown in Section 5.1 that for $r \ge 3$, and in Section 5.2 that $\to \sigma_f^2 < \infty$ as $N \to \infty$. Hence, Proposition 3.4 applies, and it remains to verify that we can choose our parameters so that they satisfy the stated assumptions. We recall that $L = N^q$ and $\gamma = c_r (\log N)$.
We recall from Section 1.3 that where $\Lambda_u$ is defined in (1.8) and the domains $\Omega_T$ are defined in (1.9). We shall decompose this domain into smaller pieces using the linear map $a = \operatorname{diag}(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1})$. We note that for any integer $N \ge 1$, and thus where $\chi$ denotes the characteristic function of the set $\Omega_e$. Hence the proof of Theorem 1.2 reduces to analyzing sums of the form $\sum_{s=0}^{N-1} \hat\chi(a^s y)$ with $y \in Y$. For this purpose, we define Our main result in this section now reads as follows.
as $N \to \infty$, where We approximate $\chi$ by a family of non-negative functions $f_\varepsilon \in C_c^\infty(\mathbb{R}^{m+n})$ whose supports are contained in an $\varepsilon$-neighbourhood of the set $\Omega_e$, and This approximation allows us to construct smooth approximations of the Siegel transform $\hat\chi$ in the following sense.
Hence, it remains to show that for $j = 1, 2, 3$, As in (4.3), we compute that where $\chi_\theta$ denotes the characteristic function of the interval $[-\theta, \theta]$. We observe that and moreover this integral is non-zero only when $|p_i| = O(\|q\|)$. Hence, The number of integral points in the region $\{(1 - \varepsilon)e^s \le \|y\| \le e^s\}$ can be estimated in terms of its volume. Namely, there exists $r > 0$ (depending only on the norm) such that $|\{q \in \mathbb{Z}^n : (1 - \varepsilon)e^s \le \|q\| \le e^s\}| \ll |\{y \in \mathbb{R}^n : (1 - \varepsilon)e^s - r \le \|y\| \le e^s + r\}|$. Hence, The integral for $\hat\chi_{2,\varepsilon} \circ a^s$ can be estimated similarly.
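The lattice-point estimate can be seen concretely in a toy case. The sketch below assumes $n = 2$ and the Euclidean norm, for which one may take $r = \sqrt{2}/2$ (half the diagonal of a unit square): the unit squares centred at the lattice points of the shell are disjoint and lie inside the thickened shell, so the count is bounded by its area. The function names and the specific choice of norm are assumptions made for illustration.

```python
from math import ceil, exp, pi, sqrt


def count_annulus_points(inner, outer):
    """Number of q in Z^2 with inner <= |q| <= outer (Euclidean norm)."""
    bound = ceil(outer)
    count = 0
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if inner * inner <= a * a + b * b <= outer * outer:
                count += 1
    return count


def thickened_annulus_area(inner, outer, r):
    """Area of {y in R^2 : inner - r <= |y| <= outer + r}."""
    lower = max(inner - r, 0.0)
    return pi * ((outer + r) ** 2 - lower ** 2)


if __name__ == "__main__":
    s, eps = 3.0, 0.1
    r = sqrt(2) / 2  # half-diagonal of a unit square
    inner, outer = (1 - eps) * exp(s), exp(s)
    print(count_annulus_points(inner, outer), thickened_annulus_area(inner, outer, r))
```

In this toy setting the implied constant is 1; for a general norm on $\mathbb{R}^n$ only the value of $r$ changes.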
The integral over $\hat\chi_{3,\varepsilon} \circ a^s$ as in (6.2) can be written as a sum of the products of the integral and the integrals We observe that these integrals are non-zero only when $|p_j| = O(\|q\|)$ and $|p_i| = O(\|q\|)$. Hence, we conclude that which completes the proof of the proposition.
Now we start with the proof of Theorem 6.1. As in Section 5, we modify $F_N$ and consider instead $\tilde F_N$ for a parameter $M = M(N) \to \infty$ that will be chosen later. As in (5.3) we obtain that Hence, if we can prove the CLT for $(\tilde F_N)$, then the CLT for $(F_N)$ would follow. From now on, to simplify notation, we assume that $F_N$ is given by (6.3).
Our next step is to exploit the approximation $\chi \approx f_\varepsilon$, so we introduce where the parameter $\varepsilon = \varepsilon(N) \to 0$ will be specified later. We observe that it follows from Proposition 6.2 that
We choose $\varepsilon = \varepsilon(N)$ and $M = M(N)$ so that Hence, it remains to prove convergence in distribution for the sequence $F_N^{(\varepsilon)}$. We observe that the sequence $F_N^{(\varepsilon)}$ fits into the framework of Section 5. However, we need to take into account the dependence on the new parameter $\varepsilon$ and refine the previous estimates. It will be important for our argument that the supports of the functions $f_\varepsilon$ are uniformly bounded, $\|f_\varepsilon\|_{C^0} \ll 1$, and $\|f_\varepsilon\|_{C^k} \ll \varepsilon^{-k}$.
As in Section 5, we consider the truncation defined for a parameter $L = L(N) \to \infty$. We assume that
$M \gg \log L$, (6.6)
so that Proposition 4.7 applies when $s \ge M$. Since the family of functions $f_\varepsilon$ is majorized by a fixed bounded function with compact support, Proposition 4.7 implies that when $m \ge 2$, $\|\hat f_\varepsilon \circ a^s\|_{L^2(Y)} \ll 1$ for all $s \ge 0$, uniformly in $\varepsilon$. Hence, the bound (5.6) can be proved exactly as before, and we obtain We choose the parameter $L$ as before so that
$N = o(L^p)$ for some $p < m + n$ (6.7)
to guarantee that Now it remains to show that the family $F_N^{(\varepsilon, L)}$ satisfies the CLT with a suitable choice of parameters $M$, $L$, $\varepsilon$. As in Section 5 we will show that for $r \ge 3$, and with an explicit $\sigma \in (0, \infty)$. $\cdots \ll N^{r/2} e^{-\delta\gamma} L^r \varepsilon^{-rk} + N^{1 - r/2} \gamma^{r-1} L^{(r - (m+n-1))^+}$.
We note that the implicit constant in (5.21) depends only on $\operatorname{supp}(f_\varepsilon)$, so that it is uniform in $\varepsilon$. We choose $L = N^q$ as in Section 5 and $\gamma = c_r (\log N)$, where $c_r > 0$ will be specified later. In particular, then $N^{1 - r/2} \gamma^{r-1} L^{(r - (m+n-1))^+} \to 0$, and assuming that
$N^{r/2} L^r \varepsilon^{-rk} = o(e^{\delta\gamma})$, (6.11)
it follows that (6.8) holds.
The sum of the other integral appearing in (6.18) is estimated similarly. This proves (6.17).
Then the first part of (6.5) holds. Then we select sufficiently large $c_r$ in $\gamma = c_r (\log N)$ so that (6.11) holds. After that we choose with sufficiently large $c_1 > 0$ so that (6.13) holds. Then it is clear that (6.16) also holds. Given these $\varepsilon$, $L$, $\gamma$, and $K$, we choose $M(N) = (\log N)(\log\log N)$ so that the second part of (6.5), (6.6), (6.10), and (6.14) hold for all $N \ge N_0(r)$. With these choices, it is clear that (6.4) and (6.15) also hold. Hence, Theorem 6.1 follows from Proposition 3.4.
where $C_{m,n} = 2^m \vartheta_1 \cdots \vartheta_m \omega_n$ with $\omega_n := \int_{S^{n-1}} \|z\|^{-n}\, dz$. We shall show that $D_T(u)$ can be approximated by the averages $F_N$ defined in (6.1). This will allow us to deduce convergence in distribution for $D_T$. We observe that: where $C_{m,n}$ is defined above.
Proof. We observe that where $\Xi_N$ denotes the characteristic function of the set $\{(x, y) \in \mathbb{R}^{m+n} : 1 \le \|y\| < e^N, \ |x_i| < \vartheta_i \|y\|^{-w_i}, \ i = 1, \ldots, m\}$.
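The set above splits into $N$ layers according to the integer part of $\log \|y\|$, and the diagonal map $a = \operatorname{diag}(e^{w_1}, \ldots, e^{w_m}, e^{-1}, \ldots, e^{-1})$ carries the $s$-th layer onto the first one. The following numerical sketch illustrates this; it assumes the Euclidean norm, takes the first layer to be the set with $N = 1$ (presumably the set $\Omega_e$ of (1.9), which is not reproduced in this section), and uses hypothetical function names and sample parameters.

```python
import math
import random


def in_region(x, y, theta, w, N):
    """Test 1 <= |y| < e^N and |x_i| < theta_i * |y|^(-w_i) for all i."""
    norm = math.sqrt(sum(v * v for v in y))
    if not (1.0 <= norm < math.exp(N)):
        return False
    return all(abs(xi) < th * norm ** (-wi) for xi, th, wi in zip(x, theta, w))


def apply_a(x, y, w, s):
    """Apply a^s: scale x_i by e^(w_i s) and y by e^(-s)."""
    return ([math.exp(wi * s) * xi for xi, wi in zip(x, w)],
            [math.exp(-s) * yj for yj in y])


def layers(x, y, theta, w, N):
    """All s in [0, N) for which a^s (x, y) lies in the first layer (N = 1)."""
    return [s for s in range(N) if in_region(*apply_a(x, y, w, s), theta, w, 1)]


def sample_point(theta, w, layer, rng):
    """Random point of the region whose norm |y| lies in (e^layer, e^(layer+1))."""
    norm = math.exp(layer + rng.uniform(0.1, 0.9))
    angle = rng.uniform(0.0, 2.0 * math.pi)
    y = [norm * math.cos(angle), norm * math.sin(angle)]
    x = [rng.uniform(-0.9, 0.9) * th * norm ** (-wi) for th, wi in zip(theta, w)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    theta, w = (1.0, 2.0), (0.5, 1.5)   # illustrative values with w_1 + w_2 = n = 2
    x, y = sample_point(theta, w, 3, rng)
    print(in_region(x, y, theta, w, 5), layers(x, y, theta, w, 5))
```

Each sampled point of the region lands in exactly one layer, which is why the characteristic function $\Xi_N$ can be written as a sum of $N$ translates of a single characteristic function under powers of $a$.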
Using notation as in the proof of Proposition 4.6, we obtain We claim that To prove this, let us consider more generally a bounded measurable function $\chi$ on $\mathbb{R}$ with compact support, the function $\psi(x) = \chi(x_1)$ on $\mathbb{R}^m$, and the function $\tilde\psi(x) = \sum_{p \in \mathbb{Z}} \chi(p + x_1)$ on the torus $\mathbb{R}^m/\mathbb{Z}^m$. We suppose without loss of generality that $q_1 \ne 0$ and consider a non-degenerate linear map $S : \mathbb{R}^m \to \mathbb{R}^m : u \mapsto (\langle u, q \rangle, u_2, \ldots, u_m)$, which induces a linear epimorphism of the torus $\mathbb{R}^m/\mathbb{Z}^m$. Using that $S$ preserves the Lebesgue probability measure $\mu$ on $\mathbb{R}^m/\mathbb{Z}^m$, we deduce that which yields (6.19).