Central limit theorems for generic lattice point counting

We consider the problem of counting lattice points contained in domains in $\mathbb{R}^d$ defined by products of linear forms. For $d \ge 9$ we show that the normalized discrepancies in these counting problems satisfy non-degenerate Central Limit Theorems with respect to the unique $\mathrm{SL}_d(\mathbb{R})$-invariant probability measure on the space of unimodular lattices in $\mathbb{R}^d$. We also study more refined versions pertaining to “spiraling of approximations”. Our techniques are dynamical in nature and exploit effective exponential mixing of all orders for actions of diagonalizable subgroups on spaces of unimodular lattices.

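In this paper we study the corresponding discrepancy function, defined for a lattice $\Lambda \subset \mathbb{R}^d$ (normalized to have covolume one) and a family of bounded domains $\Omega_T \subset \mathbb{R}^d$ by
$$D_T(\Lambda) := |\Lambda \cap \Omega_T| - \mathrm{Vol}(\Omega_T). \qquad (1.1)$$
When the domain $\Omega_T$ is a $T$-dilation of a region $\Omega \subset \mathbb{R}^d$ with piecewise smooth boundary, one can easily prove that
$$D_T(\Lambda) = O(T^{d-1}) \qquad (1.2)$$
(see the book [21] for many results of this form), and this estimate is the best possible in this generality. However, the estimate has been improved for certain particular classes of domains. A well-studied setting is when the boundary of the domain has non-vanishing curvature. In this case, Hlawka [16] has shown that
$$D_T(\Lambda) = O\big(T^{d-2+2/(d+1)}\big). \qquad (1.3)$$
These bounds have been subsequently improved by a number of people (see, for instance, [17] for a survey).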
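As a concrete illustration of a discrepancy of the form "number of lattice points in a domain minus the volume of the domain", the following minimal sketch counts the points of a planar lattice $g\mathbb{Z}^2$ in a disk of radius $T$ by brute force. The disk, the matrix `g`, and the function names are assumptions chosen for illustration only; they are not the domains studied in this paper.

```python
import math

def count_points_in_disk(g, T):
    """Count points of the lattice g*Z^2 in the closed disk of radius T.

    g is a 2x2 matrix ((a, b), (c, d)) with nonzero determinant.
    """
    (a, b), (c, d) = g
    det = a * d - b * c
    # Frobenius norm of g^{-1}; it bounds the operator norm, so any integer
    # vector (m, n) with |g(m, n)| <= T satisfies |(m, n)| <= inv_norm * T.
    inv_norm = math.sqrt(a * a + b * b + c * c + d * d) / abs(det)
    R = math.ceil(inv_norm * T)
    count = 0
    for m in range(-R, R + 1):
        for n in range(-R, R + 1):
            x = a * m + b * n
            y = c * m + d * n
            if x * x + y * y <= T * T:
                count += 1
    return count

def disk_discrepancy(g, T):
    """Point count minus area, for a lattice of covolume one (|det g| = 1)."""
    return count_points_in_disk(g, T) - math.pi * T * T
```

For the standard lattice (`g` the identity matrix) and `T = 2`, the count is 13 and the area is $4\pi$, so the discrepancy is roughly 0.43. Replacing `g` by other matrices of determinant one gives a crude way to observe how the discrepancy fluctuates with the lattice.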
In this paper we shall be interested in the asymptotic behaviour (as $T \to \infty$) of the discrepancy function $D_T(\Lambda)$ for "generic" lattices $\Lambda$. The following two questions naturally arise in this setting: (i) what is the asymptotic "generic" growth of $D_T(\Lambda)$? (ii) do suitably normalized discrepancy functions converge in distribution?
Concerning Question (i): it turns out that the estimate (1.2) can be improved for generic lattices. The first striking result in this direction was established by W. Schmidt [25]. He proved that for every increasing family of Borel sets $\Omega_T$ as above and almost every lattice $\Lambda$, However, the exact asymptotic behavior of $D_T(\Lambda)$ for generic lattices is still quite mysterious, and it turns out that the answer depends very sensitively on the shape of the domains. For instance, Hardy, Littlewood [16] and Khinchin [20] discovered that when $\Omega_T$ is a $T$-dilation of a generic compact polygon in $\mathbb{R}^2$, then This exhibits a striking difference with the estimate (1.3) for strictly convex domains. Skriganov [28] established a far-reaching generalization of this estimate. He showed that when $\Omega_T$ is a dilation by a factor $T$ of a compact polyhedron in $\mathbb{R}^d$, then for almost every unimodular lattice $\Lambda$, for all $\varepsilon > 0$.
It is not known whether the above bound is optimal. Another well-studied example is the case when the domains $\Omega_T$ are the Euclidean balls in $\mathbb{R}^d$. In this case, it was shown by Kelmer [18] that for any exponentially growing sequence $T_i \to \infty$ and almost all lattices $\Lambda$,
Concerning Question (ii) above: several results have been proved for certain particular families of lattices. For instance, it was discovered by Beck that the distributions of suitably normalized discrepancy functions are asymptotically Gaussian. We refer to a survey [2] and a monograph [3] for a comprehensive exposition of these results. Beck considered the domains $\Omega_T := \{(x, y) \in \mathbb{R}^2 : x^2 - 2y^2 \in (a, b),\ 0 < x < T,\ y > 0\}$ and translated lattices $\Lambda_\omega := \mathbb{Z}^2 + (\omega, 0)$ with $0 < \omega < 1$ and showed that there exists an explicit $\sigma > 0$ such that While this approach seems to work for domains defined by more general indefinite integral binary quadratic forms, it was not clear whether this result could hold in higher dimensions since its proof was based on properties of continued fraction expansions for quadratic irrationals. Furthermore, Beck points out that there are essential difficulties in extending his work to higher dimensions related to the long-standing Littlewood Conjecture.
Levin [22] investigated the discrepancy function of the family of lattices of the form $\Lambda_a := \mathrm{diag}(a_1, \ldots, a_d)^{-1}\mathcal{O}$, $a = (a_1, \ldots, a_d)$, where $\mathcal{O}$ is a fixed lattice in $\mathbb{R}^d$ arising from an order in a totally real number field. He showed that for the boxes $\Omega_N := [-N_1, N_1] \times \cdots \times [-N_d, N_d]$, suitably normalized discrepancy functions $D_N(\Lambda_a)$ are asymptotically Gaussian as $N_1 \cdots N_d \to \infty$, with $a \in (0, 1)^d$ considered random. Since the results [2,3,22] treat only very particular lattices arising from orders in number fields, one may wonder whether this behavior occurs for truly generic lattices. We will address this question in the present paper.
One should also mention the ground-breaking works of Dolgopyat, Fayad [8,10] (see also the survey [9]), generalizing Kersten [19], about the discrepancy of distribution for toral translations. Using our terminology, these results can be interpreted in terms of discrepancy functions for the family of lattices given by $\Lambda_u := \{(x_1 + u_1 y, \ldots, x_{d-1} + u_{d-1} y, y) : (x_1, \ldots, x_{d-1}, y) \in \mathbb{Z}^d\}$ with $0 \le u_1, \ldots, u_{d-1} < 1$, and certain families of domains $\Omega_T(\theta)$ depending on additional parameters $\theta$. It is shown in [8,10] that the corresponding discrepancy for $|\Lambda_u \cap \Omega_T(\theta)|$, after a suitable normalization, converges in distribution as $T \to \infty$, with $(u, \theta)$ considered random. It should be noted that the limit distributions obtained in [8,10] are different from the Normal Law. Further related results about the distribution of Diophantine approximants were proved in [11] and [7].

Main results
Let L 1 , . . . , L d : R d → R be linearly independent linear forms and N (x) := L 1 (x) · · · L d (x). For a bounded interval I ⊂ R + and T > 0, we consider the domains We write X for the space of unimodular lattices in R d equipped with the unique SL d (R)-invariant probability measure μ. The following result provides an analogue of (1.4) for μ-generic unimodular lattices: The corresponding norm on R d j will be denoted by · , the spherical measure on S d j −1 will be denoted by κ j , and we set κ := κ 1 ⊗ · · · ⊗ κ k . (1.5) Let us also fix rotation-invariant smooth metrics on each S d j −1 with d j ≥ 2. If d j = 1, we endow S 0 = {−1, 1} with the discrete distance. If B ⊂ S d is a Borel set and ε > 0, we denote by B ε the ε-thickening of B with respect to the products of the chosen metrics. We say that a Borel set B ⊂ S d has a smooth boundary if where the implicit constants are independent of ε.
Given a bounded interval I ⊂ (0, ∞), a Borel set B ⊂ S d and T > 0, we consider the domains (1.7) Our main result is the following: Theorem 1.3 When k ≥ 2 and d ≥ 9, the discrepancy functions for the sets T (I , B) satisfy, Theorems 1.1 and 1.3 have been announced in [5] for d ≥ 4. However, it turned out that the technical part of our argument works only for d ≥ 9.
In the next section, we summarize the main steps of the proof of Theorem 1.3. Our argument can be roughly divided into two parts that involve:
• a construction of a suitable approximation for the counting function (Sect. 4),
• analysis of such approximations (Sect. 3).

Concerning novelty
We want to stress that the approximation of the counting function in this paper is very different from, and much more involved than, the approximation employed in our previous paper [7]. In the latter paper, the domains in which the lattice points were counted could be perfectly tiled by a fixed subgroup of diagonal matrices, thus essentially reducing the question of whether a Central Limit Theorem holds to (an unbounded version of) the setting in [6].
In this paper, however, the relevant domains in which we wish to count are foliated by lower-dimensional subsets, which all admit nice tilings by (higher rank) diagonal subgroups of matrices, but these subgroups depend in a non-trivial way on the leaf in the foliation. We can approximate the counting function on each of these leaves, and bunch the resulting approximations together into a functional tiling (see Sect. 2 for more details). This functional tiling is an integral of averages of a parameterized family of smooth functions over yet another parameterized family of subgroups of diagonal matrices. Each of these parameterized averages can in principle be analyzed using the techniques from [6,7], but that is not enough.
The issue is that the parameterized family of smooth functions in the averages is not bounded in the relevant parameter (even after the cuspidal cut-offs), which causes serious problems in our cumulant machinery, more specifically in our analysis of "clustered tuples". To circumvent this, we need to make use of some special features of the geometry at hand (see Sect. 3.5.2).

Outline of the proof
Our argument will involve analysis on the space $X$ of unimodular lattices in $\mathbb{R}^d$, which can be considered as a homogeneous space $X \cong \mathrm{SL}_d(\mathbb{R})/\mathrm{SL}_d(\mathbb{Z})$. The space $X$ supports a unique $\mathrm{SL}_d(\mathbb{R})$-invariant probability measure, which we shall denote by $\mu$ throughout the paper.
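Given a bounded Borel measurable function $f : \mathbb{R}^d \to \mathbb{R}$ with bounded support, its Siegel transform $\hat f : X \to \mathbb{R}$ is defined by
$$\hat f(\Lambda) := \sum_{z \in \Lambda \setminus \{0\}} f(z), \quad \Lambda \in X.$$
According to Siegel's Mean Value Theorem [27], if $f$ is Riemann integrable, then
$$\int_X \hat f \, d\mu = \int_{\mathbb{R}^d} f(z)\, dz, \qquad (2.2)$$
where we normalise the Lebesgue measure $dz$ on $\mathbb{R}^d$ so that the unit cube is assigned volume one.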
Suppose that $\Omega_T$ is a bounded Borel set in $\mathbb{R}^d$ which does not contain the origin. Then, with the above notation, $|\Lambda \cap \Omega_T| = \hat\chi_{\Omega_T}(\Lambda)$ for every $\Lambda \in X$. In the setting of Theorem The initial idea of our approach is that the level sets $L_{s,\xi}$ can be tessellated, using the action of a discrete subgroup of $A$ on $\mathbb{R}^d$. Unfortunately, the domains $\Omega_T$ themselves do not possess such simple tilings. However, it turns out that each of the intersections $\Omega_T \cap L_{s,\xi}$ has a tiling where the tiles and the discrete subgroup depend on the parameters $s$ and $T$ (but not on the parameter $\xi$). We will show that the indicator functions $\chi_{\Omega_T}$ can be approximated by suitable integrals of varying functional averages. These "functional tilings" stem from the above tilings for different values of $s$ and $\xi$ and are constructed using the following data:
• A collection of finite measure spaces $(Y_{T,i}, \kappa_{T,i})$ indexed by $T > 0$ and $i$ in a finite set $I$,
The corresponding "functional tiling" is given by We shall show that for a suitable choice of the data, $F_T$ provides an approximation for the characteristic function $\chi_{\Omega_T}$ in the sense that Assuming this, we can then write where the first and third term on the right hand side tend to zero in the $L^1(\mu)$-norm. Thus, the distributional limit of $D_T(\Lambda)$ is the same as the distributional limit of the sequence of functions The significance of this observation is that Siegel transforms of functional tilings like $F_T$ can be investigated using homogeneous dynamics techniques. Since averages of this form also arise in other arithmetic problems, we will analyze their behavior in an abstract axiomatic setting (cf. assumptions (I.a)-(I.c) and (II.a)-(II.c) below). This analysis will be carried out in Sect. 3. Our main result here is Theorem 3.19. Notably, it shows that when certain basic norm estimates for the functions $f_{T,i}$ hold, the distributional convergence of $\Upsilon_T(\Lambda)$ holds provided that the variance $\|\Upsilon_T\|_{L^2(X)}$ converges. Next, in Sect. 4 we construct an approximation for $\chi_{\Omega_T}$ of the form (2.4) satisfying our assumptions (I.a)-(I.c) and (II.a)-(II.c). Once such an approximation is available, our main result will be a corollary of Theorem 3.19.

Analysis of general functional tilings
In this section we consider a family of functions $F_T$ on $\mathbb{R}^d$ defined by a "functional tiling" as in (2.4). Our goal is to analyze the asymptotic behavior of the sums $\hat F_T(\Lambda) = \sum_{z \in \Lambda \setminus \{0\}} F_T(z)$ for lattices $\Lambda$ in $\mathbb{R}^d$. We will formulate several assumptions on the objects defining $F_T$ and then in the next section demonstrate that the developed framework does apply to our setting.
We hope that the axiomatic approach outlined in this paper can be used in other counting problems as well. Our main result here is Theorem 3.19, which establishes the Central Limit Theorem for $(\hat F_T)$ with respect to the measure $\mu$.

Some remarks about the axioms
The goal of this section is to describe a general approach for proving Central Limit Theorems for Siegel transforms of functional tilings $(F_T)$ of the form (2.4). Our approach is based on two sets of assumptions on the data defining the tiling. The first set of assumptions is labelled I.a, I.b, I.c and is described in Sect. 3.2, while the second set of assumptions is labelled II.a, II.b, II.c and is described in Sect. 3.4.
The first set of assumptions simply describes the objects in the data that make up the functional tiling. The key point here is that the functions $f_{T,i}$ are smooth in the first variable and supported in a fixed compact subset of $\mathbb{R}^d$ (in particular, the Siegel transform of $f_{T,i}(\cdot, y_i)$ is well-defined and smooth for every $y_i \in Y_{T,i}$ and for all $i \in I$).
The second set of assumptions is deeper. The first two assumptions (II.a and II.b) are concerned with the finite subsets Q T ,i (y i ). Roughly speaking, II.a requires that Q T ,i are well-separated subsets of the group A described in the previous subsection, while II.b takes this assumption a bit further, namely that there is a sequence ( Q T ,i ) (independent of y i ∈ Y T ,i ) and a family of quasi-isometric embeddings β T ,i (·, These two assumptions will be useful when we estimate the contribution to cumulants of (truncations of) F T ,i (·, y i ) coming from separated tuples (Sect. 3.5.1) The remaining assumption II.c is the most technical one. It is used to control the contribution to cumulants of (truncations of) F T coming from clustered tuples (Sect. 3.5.2). Roughly speaking, the idea behind this assumption can be explained as follows. By II.b, the y i -dependence of the map β T ,i is rather mild, and, up to bounded error, β T ,i is close to a map β T ,i which is independent of y i . The essence of the assumption II.c is that the sums in functional tilings like (2.4) can be estimated from above by sums over subsets which are independent of y i . Although this assumption probably can be weakened, it holds in the setting that we are interested in, and it simplifies a lot of the upper estimates of integrals involving products of the f T ,i 's.

Functional averages and their truncations
Let $I$ be a finite set. For $T > 0$ and $i \in I$, we consider: We use the notations Given the data in (I.a)-(I.c), we consider the family of functions given by and their Siegel transforms Our goal is to show that under suitable assumptions the functions converge in distribution. One of the difficulties here is that Siegel transforms (even for bounded Borel functions with bounded support) are not bounded. Nonetheless, they are typically only large on sets of very small $\mu$-measure and belong to $L^p(X)$ for $p < d$ (see Lemmas 3.2 and 3.3 below). Here and later in the paper we always assume that $d \ge 3$ so that the Siegel transforms are $L^2$-integrable. This makes it possible to efficiently approximate a Siegel transform by bounded functions on $X$ whose $L^p$-distance from the original Siegel transform is small. To make this approximation precise, we shall use a family of compactly supported cutoff functions $\eta_L : X \to [0, 1]$ with $L > 0$, constructed in [7, Lemma 4.11], such that for every compact set $K \subset \mathbb{R}^d$ and $f \in C(K)$, we have Furthermore, for every $\varepsilon > 0$, where the implicit constants are independent of $L$.
We introduce a parameter L T → ∞, which will be specified later, and introduce the functions which provide compactly supported truncations of the functions f T ,i (·, y i ). We then consider The following lemma shows that this function approximates the Siegel transform F T if the parameter L T grows fast enough.
Similarly, if Before we proceed to the proof of this lemma, we discuss its relevance to our arguments so far. We wish to prove convergence in distribution for the functions If L T is chosen as in (3.9), then the first and third term of the right hand side tend to zero in the L 1 -norm, whence ϒ T converges in distribution to a continuous measure if and only if the functions do. In the upcoming subsections, we will analyse this type of sequences.

Proof of Lemma 3.1 By construction, we have
Since the measure μ is A-invariant, the inner terms are independent of a ∈ Q T ,i (y i ), whence By the assumption (I.b), the supports of the functions Furthermore, by the assumption (I.c), we have |Q T ,i (y i )| ≤ V T , so that we conclude that This implies the first part of the lemma, and the proof of the second part is similar.

Sobolev norms and mixing estimates
In order to obtain quantitative estimates on correlations, we need to control the smoothness of the functions. Our main tool for this purpose are Sobolev norms, which we now introduce. First note that every Y in the Lie algebra Y m , and refer to the integer |η| := η 1 + · · · + η m as the order of D η , where η = (η 1 , . . . , η m ). We write C ∞ c (X ) for the space of compactly supported functions ϕ such that all the derivatives D η ϕ exist.
Let $\Lambda \in X$. We say that a linear subspace It can readily be checked that $\alpha$ is a proper function on $X$, and that for every compact Before we introduce Sobolev norms, we mention important properties of the $\alpha$-function in relation to Siegel transforms.
The following estimate is also well-known (see e.g. [13, Lemma 3.10]): The following norms were introduced and studied by Einsiedler, Margulis and Venkatesh [12].

Definition 3.4 (Sobolev norms)
Let $q$ be a positive integer. For $\varphi \in C_c^\infty(X)$, its Sobolev norm $S_q(\varphi)$ of order $q$ is defined as The explicit expression of the norm $S_q$ will not be important in our paper. Instead we shall use, as black boxes, the following properties of the norms, established in [12] and in our previous paper [7].
For our next proposition, we need some notation and preliminary results. First, we recall some further properties of the cut-off functions $\eta_L$ constructed in [7]:
Remark 3.7 The second inequality in Proposition 3.6 is not explicitly stated in [7]. However, Lemma 4.11 in [7] tells us that $\|D_\eta \eta_L\|_{L^\infty(X)} \ll_\eta 1$, so the inequality in the proposition above follows after iterated use of the product rule for derivatives, in combination with Lemma 3.2 and the fact that the supports of the functions $D_\eta \eta_L$ are still contained in $\{\alpha \ll L\}$ for every $\eta$ (where the implicit constants are independent of $L$ and $\eta$).
The following corollary concerning Sobolev norms of truncated Siegel transforms is now immediate.
We also record the following corollary for future references. It is immediate from the inequalities in (3.11) and the first part of Proposition 3.6.

Corollary 3.9 For every compact set
Recall that $A \cong \mathbb{R}^{k-1}$ via the map $u \mapsto a(u)$ defined in (2.3). Let us throughout the rest of the section denote by $\|\cdot\|$ the $\ell^\infty$-norm on $\mathbb{R}^{k-1}$. The following theorem is a special case of [4, Theorem 1.1]. Roughly speaking, this theorem asserts that if $\varphi \in C_c^\infty(X)$, then the family $u \mapsto \varphi(a(u)\,\cdot)$ consists of "almost independent" random variables, at least if the $u$'s are far apart.
Theorem 1.1 in [4] is formulated for general $r$-tuples of elements in $G = \mathrm{SL}_d(\mathbb{R})$, and not just for $r$-tuples in $A$. Furthermore, in the version in [4], the $\min_{i \ne j}$-expression is applied to differences with respect to an invariant Riemannian metric on $G$. The restriction of any such metric to $A$ is quasi-isometric to the $\ell^\infty$-distance on $\mathbb{R}^{k-1}$, and the resulting constants are assumed to have been absorbed in $\delta_r$ and by the $\ll$-sign.

Cumulants
We review the notion of cumulants, and a classical CLT-criterion due to Fréchet and Shohat. In this subsection $(X, \mu)$ can be a general probability measure space. Furthermore, the 2-cumulant of a function is just its $\mu$-variance.
The main property of cumulants that makes them valuable to us in this paper is summarized in the following CLT-criterion by Fréchet and Shohat, which can be deduced from their results in [14]. It is essentially the classical method of moments tailored for (distributional) convergence to the normal distribution. Then the $\mu$-distributions of the functions converge in the sense of distributions to the Normal Law with mean zero and variance $\sigma^2$ (the case $\sigma = 0$ is interpreted as convergence in the sense of distributions to the Dirac measure at 0).
Remark 3.14 There is no explicit mention of cumulants in the paper of Fréchet and Shohat, so in particular Proposition 3.13 is not directly featured there. A more modern (and explicit) exposition of cumulants can be found in [23], although our formulation of Proposition 3.13 is not explicit there either. However, it is noted in [23, Subsect. 3.2] that cumulants of random variables can be expressed in terms of moments (and vice versa). By the classical method of moments, to prove that the $\mu$-distributions of a family of functions converge in the sense of distributions to the centered Normal Law with variance $\sigma^2$, it suffices to check that all their moments (or cumulants) with respect to $\mu$ converge (as real numbers) to the moments (or cumulants) of the centered Normal Law with variance $\sigma^2$. Since cumulants of a random variable can be expressed as logarithmic derivatives of the Fourier transform of the corresponding probability distribution (see e.g. [23, Subsect. 3.1]), it follows after some straightforward computations that Normal Laws are characterized as those probability distributions whose cumulants of order $r \ge 3$ all vanish (at least within the class of distributions that are uniquely determined by their moments).
In order to apply this proposition, we have to analyze the cumulants cum r ( T ). This task will be carried out in the next section.
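The relation between moments and cumulants can be made concrete with the standard recursion $m_n = \sum_{k=1}^{n} \binom{n-1}{k-1} c_k\, m_{n-k}$ (with $m_0 = 1$) linking the raw moments $m_n$ and the cumulants $c_n$ of a single random variable. The sketch below is illustrative only: the function name is ours, and it treats the one-variable case rather than the joint cumulants of functions on $X$ used in the paper.

```python
from fractions import Fraction
from math import comb

def cumulants_from_moments(moments):
    """Convert raw moments [m_1, ..., m_n] into cumulants [c_1, ..., c_n].

    Uses the recursion c_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) c_k m_{n-k}.
    """
    m = [Fraction(1)] + [Fraction(x) for x in moments]  # m[0] = 1
    c = [Fraction(0)] * len(m)                           # c[0] is unused
    for n in range(1, len(m)):
        correction = sum(
            comb(n - 1, k - 1) * c[k] * m[n - k] for k in range(1, n)
        )
        c[n] = m[n] - correction
    return c[1:]

# Raw moments of the centered Normal Law with variance 1: 0, 1, 0, 3, 0, 15.
normal = cumulants_from_moments([0, 1, 0, 3, 0, 15])
# Raw moments of the Poisson law with mean 1 (Bell numbers): 1, 2, 5, 15.
poisson = cumulants_from_moments([1, 2, 5, 15])
```

For the Normal Law the cumulants of order at least 3 all vanish, whereas for the Poisson law every cumulant equals 1, so the latter is detected as non-Gaussian already at order 3.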

Estimating cumulants of order r ≥ 3
Let T be defined by (3.10). Our goal is to show that under suitable additional conditions, this is equivalent to Let us from now on fix r ≥ 3. For each r -tuples i = (i 1 , . . . , i r ) ∈ I r , we set and for y = (y 1 , . . . , y r ) ∈ Y T ,i , we set Q T ,i (y) := Q T ,i 1 (y 1 ) × · · · × Q T ,i r (y r ).
We shall make the following additional assumptions regarding the data defining the function T . Throughout this section, · denotes the ∞ -norm on R k−1 and B(x, γ ) the ball with respect to this norm.
(II.a) There exist finite sets Q T ,i ⊂ R k−1 satisfying: where the implicit constants are independent of u, T , and i. satisfying: • there exist c 1 , c 2 > 0, independent of T , such that for all u, v ∈ Q T ,i , • there exist maps β T ,i : We further assume that the family of the functions is uniformly bounded, and there exists a fixed compact set K ⊂ R d such that for all T and i.

Remark 3.15
We note that the condition (I.c) from Sect. 3.2 follows immediately from condition (II.a) and the first part of condition (II.b).
With this new notation, we set where Q T ,i := Q T ,i 1 × · · · × Q T ,i r . Then For γ > 0, we define the r -diagonal γ -neighborhood r (γ ) by We split the sum defining r ,T ,i into two subsums subdivided with respect to the set r (γ ). Namely, we choose a parameter γ T ,r → ∞, which will be specified later, and write r ,T ,i (y) denotes the sum over clustered r -tuples (1) , y i 1 ) , · · · , ϕ T ,ir (·, y ir ) • a β T ,ir (u (r ) , y ir ) , (3.17) and (2) r ,T ,i (y) denotes the sum over separated r -tuples: (1) , y i 1 ) , · · · , ϕ T ,ir (·, y ir ) • a β T ,ir (u (r ) , y ir ) . (3.18) The aim in the upcoming subsections is to find conditions on the parameters γ T ,r and L T such that for every i = (i 1 , . . . , i r ) ∈ I r , Together with the assumption (I.a) in Sect. 3.2, these estimates imply (3.12).

Analysis of the separated tuples
Now we prove the estimate (3.20) involving separated tuples. The crucial ingredient here is the estimate on higher-order correlations (Theorem 3.10), which allows us to establish an estimate on cumulants following our approach from [6]. We recall the estimate from Proposition 3.5(iii) that for every $q \ge 1$, there exists $\sigma_q > 0$ such that We may without loss of generality assume that the map $q \mapsto \sigma_q$ is increasing. Furthermore, we may also assume that the map $r \mapsto \delta_r$ in Theorem 3.10 is decreasing.
In particular, without loss of generality we can assume that $\delta_r < r\sigma_q$ for all $q, r \ge 1$. (3.21) The following lemma is a corollary of the main technical results from our work [6].

23)
where c 1 is the positive constant in condition (II.b), and c r ,q is given by Lemma 3.16.
Then, for every i = (i 1 , . . . , i r ) ∈ I r , Proof We first note that if (u (1) , . . . , u (r ) ) belongs to Q T ,i ∩ r (γ T ,r ) c , then by condition (II.b), for all i m , i n ∈ I, for all y i m ∈ Y T ,i m and y i n ∈ Y T ,i n . Applying Lemma 3.16 with γ defined by c 1 γ T ,r − c 2 = c r ,q γ, we deduce that where we in the last -sign have absorbed the e −c 2 /c r ,q -factor. We recall that By Corollary 3.8, where K ⊂ R d is a fixed compact set which contains all of the supports of the functions x → f T ,i (x, y i ) as y i ranges over Y T ,i . We conclude that sup y∈Y T ,i (2) r ,T ,i (y) This implies the proposition.

Analysis of the clustered tuples
Next, we deal with the clustered tuples. Our analysis here is one of the main novelties of this paper. We stress that we do not assume that the maps $T \mapsto \|f_{T,i}\|_\infty$ are bounded (otherwise, our analysis could have been carried out as in [6]). This is also where the assumption (II.c) becomes crucial. This condition says roughly that the $\kappa_{T,i}$-integrals of $f_{T,i}$ are bounded functions. The main purpose of this subsection is to explain how this "bounded on average"-condition can be used to derive (3.19).

Proposition 3.18
Suppose that the parameters L T and γ T ,r satisfy for some ε > 0, Then, Proof Expanding the definition of the cumulant in (3.17), we deduce that (3.25) We recall that By condition (II.c), there exist Borel functions h T ,i : for all u ∈ Q T ,i , z ∈ R d , and y i ∈ Y T ,i . Hence, setting We recall that according condition (II.c), the function h is uniformly bounded and its support is contained in a fixed compact set. In particular, it follows from Lemma 3.2 that By Corollary 3.9, there is a constant B = B(C) > 0 such that Combining the above estimates, we conclude that where ψ T is defined by Therefore, we deduce from (3.25) that In particular, it also follows that for p ≥ d, According to the general Hölder inequality, for exponents p k ∈ (1, ∞] satisfying Therefore, when |I | < d, and when |I | ≥ d, We conclude that for every partition P, and from (3.25), it follows from condition (II.a) that for all ε > 0, which implies the assertion of the proposition.

Main result
In this section, we finally prove convergence in distribution of the functions We recall that the data in this formula satisfy the conditions (I.a)-(I.c) and (II.a)-(II.c). We further put an additional condition on the norms of the functions f T ,i , using the notation introduced in (3.2)-(3.3).
The main result of Sect. 3 is the following theorem:

Theorem 3.19 Suppose that
• There exists $\theta_0 > 0$ such that
• For $q \ge 1$, there exists $\theta_q > 0$ such that
• The limit exists and is finite.
If $d > 4(1 + \theta_0)$, then the functions $\Upsilon_T$ on $(X, \mu)$ converge in distribution to the Normal Law with variance $\sigma$.
Proof We shall use Proposition 3.13. We recall that by Lemma 3.1, the functions F T can be approximated by functions This implies that the functions Then, in particular, lim T →∞ T L 2 (X ) = σ . It also follows that if T converges in distribution to the Normal Law, so does ϒ T . Hence, it remains to verify that the conditions of Proposition 3.13 hold for the functions T , namely, that Since the later cumulant can be expressed as (3.16), this will follow from Propositions 3.17 and 3.18. Now it remains to choose the parameters L T and γ T ,r so that the conditions in Lemma 3.1, Propositions 3.17, and 3.18 are satisfied. To do this, we shall take (3.30) where ρ and M r are positive real numbers, which will be chosen later. The condition (3.8) in Lemma 3.1 is satisfied if ρ is chosen so that for some ε > 0 We write q r for the index introduced in Lemma 3.16 and fix an integer q > q r . The condition (3.23) in Proposition 3.17 is satisfied if V ρr (d+1)+r /2− c 1 Mr cr ,q +r θ q T → 0, which can always be arranged by choosing M r large enough, depending on r , ρ, d. Finally, the condition (3.24) in Proposition 3.18 is satisfied if we choose the constants ρ and M r such that for some ε > 0, This holds provided that (3.32) Hence, it is sufficient to choose ρ so that both (3.31) and (3.32) hold for all r ≥ 3. This is possible provided that Since ε > 0 is arbitrary, this argument works provided that d > 4 + 4θ 0 .

Remark 3.20
In order to proceed with the proof above it is sufficient to have that According to Lemma 3.1, condition (3.33) holds under assumption (3.9). This assumption is weaker than (3.8), so that we can replace (3.31) by the assumption Then the argument can be carried out when d > 2(1 + θ 0 ), provided that we can establish (3.34) independently.

Proof of the main theorem
In this section, we prove our main theorem (Theorem 1.3). We recall that our goal is to analyze the lattice counting function for the domains $\Omega_T(I, B)$ defined in (1.7). Ultimately, we will construct an approximation of the characteristic function $\chi_{\Omega_T}$ by functional averages of the form (2.4) and show that these functional averages satisfy the assumptions of Theorem 3.19, so that Theorem 1.3 will be a consequence of Theorem 3.19. This is a tedious and rather technical task, so it might be beneficial for the reader to first look at Sect. 4.7, where the main objects of the section are summarized, and the most important verifications are indexed.

A basic reduction
Let L j : R d → R d j with j = 1, . . . , k, I ⊂ (0, ∞), and B ⊂ S d be the objects defining the sets T . We also consider the basic domains 0 Therefore, for any lattice , Since the measure on the space of lattices is invariant under L 0 , it is sufficient to analyze the distribution of the function → | ∩ 0 T | − Vol( 0 T ). From now on we assume that the sets T = T (I , B) are defined by (4.2), where I is a non-empty bounded interval in (0, ∞), and B is a Borel subset of S d with positive measure.

A coordinate system
The sets T are more conveniently studied in a different coordinate system which we now introduce. We use notations and Let where u(z) := (log z 1 , . . . , log z k−1 ) , It is readily checked that the map π is equivariant with respect to the group A defined in (2.3) in the following sense: and that the inverse map π −1 is given by If one computes the Jacobian of this inverse map, the following lemma emerges:

Lemma 4.1 For every bounded Borel function f
Here $dz$ denotes the volume element on $\mathbb{R}^d$ which assigns volume one to the unit cube, $du$ is the volume element on $\mathbb{R}^{k-1}$ such that the unit cube in $\mathbb{R}^{k-1}$ has volume one, and the measure $\kappa$ is defined in (1.5).
Let us now write out the set T in (u, s, ξ)-coordinates. We define and given a point z in R d * , we set Then z ∈ T if and only if We now set v j = u j − log T for j = 1, . . . , k − 1. Then, the above conditions on u are equivalent to v 1 , . . . , v k−1 < 0 and For s < d log T , we let δ T (s) denote the diagonal (k − 1) × (k − 1)-matrix whose diagonal elements δ T , j (s) are given by We note that since the interval I is bounded, the inequality s < d log T is satisfied for all x ∈ T (I , B) when T > e sup(I )/d . Then (4.7) can be re-written as and v T := (log T , . . . , log T ).
We conclude that when T > e sup(I )/d .

Volume and variance computations
The above parametrization of Ω_T leads, in particular, to an easy computation of its volume and of the mean and the variance of the Siegel transform of χ_{Ω_T}.
Expanding the inner parenthesis and integrating term-wise, we deduce that Vol(Ω_T) = P_{I,B}(log T) for the polynomial The leading term of this polynomial is c_{k−1}(I, B) t^{k−1} with which finishes the proof of the lemma.
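As a sanity check on the shape of this volume formula, the following sketch works out a toy planar case. It assumes d = k = 2, the coordinate forms L_1(x) = x_1 and L_2(x) = x_2, and the hypothetical domain {|x_1| < T, |x_2| < T, a < |x_1 x_2| < b}; this is an illustration chosen here and not the domain defined in (4.2), and the function names are hypothetical. In this toy case the area equals 4[(b − a) − (b log b − a log a) + 2(b − a) log T] whenever b ≤ T^2, which is a polynomial of degree k − 1 = 1 in log T.

```python
import math


def toy_volume_closed_form(T, a, b):
    """Area of {|x1| < T, |x2| < T, a < |x1*x2| < b}, assuming 0 < a < b <= T**2."""
    t = math.log(T)

    def quadrant(c):
        # Area of {0 < x, y < T, x*y < c} is c + c*log(T^2 / c) for c <= T^2.
        return c * (1.0 - math.log(c)) + 2.0 * c * t

    # Four quadrants by symmetry; subtract the part with |x1*x2| <= a.
    return 4.0 * (quadrant(b) - quadrant(a))


def toy_volume_numeric(T, a, b, steps=200_000):
    """Midpoint-rule check: integrate the height of the vertical slice over 0 < x < T."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += min(T, b / x) - min(T, a / x)
    return 4.0 * total * h


if __name__ == "__main__":
    T, a, b = 10.0, 1.0, 2.0
    print(toy_volume_closed_form(T, a, b), toy_volume_numeric(T, a, b))
```

Replacing T by eT increases the toy volume by exactly 8(b − a), which is the leading coefficient of the polynomial in log T in this example.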
From (2.2), we also obtain that To compute the variance of the Siegel transform, we need the following theorem.

Theorem 4.3 (Rogers' mean-square value theorem, [24]) Let d ≥ 3 and let f : R^d → R be a bounded and non-negative Borel measurable function with bounded support. Then the Siegel transform f̂ belongs to L^2(X) and where ζ denotes the Riemann zeta-function.
For future reference, we also note that a straightforward application of the Cauchy-Schwarz inequality to the expression in Theorem 4.3 yields the following corollary:

Corollary 4.4 If d ≥ 3 and f : R^d → R is a bounded and non-negative Borel measurable function with bounded support, then
Now using Theorem 4.3, we compute the variance:
Proof By Theorem 4.3, If we split this sum into sums over {p = q} and {p ≠ q} and use the symmetry of p and q and the formula Vol(q^{−1} Ω_T) = q^{−d} Vol(Ω_T) for every q ≥ 1, we see that this sum can be written as We observe that for c, T > 0, I ⊂ (0, ∞), and B ⊂ S_d, and for T_1, T_2 > 0, I_1, I_2 ⊂ (0, ∞), and B_1, B_2 ⊂ S_d, Hence, we deduce from Lemma 4.2 that for every c ≥ 1, Then, since we are assuming that d ≥ 3, we can apply the Dominated Convergence Theorem to conclude that the limit σ^2 exists and This implies the stated formula.

Tessellations of the sets Ω_T(I, B)
In this subsection, we construct, for all large enough T, a functional tiling of the indicator function χ_{Ω_T} using the coordinate system introduced in the previous section. This tiling will be the basis for our smooth approximation scheme later. Before we can state our main observation (Corollary 4.10) of this subsection, we need some preliminaries. For a positive integer N, we define We note that this definition of S_1 coincides with the one given in (4.8) above. Geometrically, S_1 and S_2 are the lower and upper pieces of the unit cube (−1, 0]^{k−1} in R^{k−1} cut in half by the hyperplane u_1 + ··· + u_{k−1} = −1. Furthermore, we define The next lemma tells us that S(N) can be tessellated by translates of S_1 by vectors in P_{N,1} and by translates of S_2 by vectors in P_{N,2}. We stress that while the sets of integer vectors P_{N,1} and P_{N,2} are not disjoint, the translates of S_1 and S_2 by vectors in the respective sets are disjoint.

Lemma 4.6 For every positive integer N,
In particular,
Proof Fix u ∈ S(N), and note that since −N ≤ u_j ≤ 0 for all j, there are unique integers 0 ≤ n_j ≤ N such that and thus either w ∈ S_1 or w ∈ S_2, whence u ∈ S_i − n for i = 1 or i = 2. Clearly these two cases are mutually exclusive, so in particular, which finishes the proof.
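The first step of this proof is constructive and can be sketched in code. The sketch below assumes that the unit cube in question is (−1, 0]^{k−1}, as in the geometric description of S_1 and S_2, and that points lying on the cutting hyperplane are assigned to the lower piece; that boundary convention and the function names are assumptions made only for illustration.

```python
import math


def split_into_translate(u):
    """Write u (all coordinates <= 0) as u = w - n with n a non-negative
    integer vector and w in the half-open unit cube (-1, 0]^m."""
    if any(x > 0 for x in u):
        raise ValueError("all coordinates of u must be non-positive")
    # floor(-x) is the unique integer n with -1 < x + n <= 0.
    n = [math.floor(-x) for x in u]
    w = [x + k for x, k in zip(u, n)]
    return n, w


def piece_index(w):
    """Return 1 for the lower piece (w_1 + ... + w_m <= -1) and 2 for the
    upper piece; assigning the hyperplane to piece 1 is an assumption."""
    return 1 if sum(w) <= -1 else 2


if __name__ == "__main__":
    n, w = split_into_translate([-2.5, -0.25, 0.0])
    print(n, w, piece_index(w))
```

Since each coordinate of w is forced into (−1, 0], the integer vector n is determined by u, which is the uniqueness used in the proof.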
We observe that, in view of (4.9), the sets Ω_T are related to suitable dilations of the sets S(N). Indeed, for T and s with s < d log T, we let Therefore, applying Lemma 4.6 to S(⌊log T⌋), we get the following "functional tiling" for the characteristic function χ_{Ω_T}.

Lemma 4.7 For all
In particular, for all T > e^{sup(I)/d}, this identity holds everywhere.

Construction of a functional tiling
Now we construct our functional tiling, namely, the objects satisfying conditions (I.a)-(I.c) and (II.a)-(II.c) with V_T := Vol(Ω_T). Let us now rewrite the assertion of Lemma 4.7 so that it fits the decomposition (2.4).

Construction of the sets Q_T
We note that uniformly on s in compact sets, where We define for u ∈ R^{k−1} and s ∈ R. Let Q_{T,i} := P_{⌊log T⌋,i} ⊂ R^{k−1}, for i = 1, 2. (4.14) From Lemmas 4.2 and 4.6, we see that |Q_{T,i}| ≪ Vol(Ω_T). Condition (3.13) in (II.a) can also be checked easily. The following lemma verifies condition (II.b). We recall that ‖·‖ denotes the ℓ^∞-norm on R^{k−1}.

Lemma 4.8 Let J ⊂ R be a bounded interval.
(i) There exist c_1, c_2 > 0 such that for all T ≥ T_0(J), s_1, s_2 ∈ J, and u, v ∈ Q_{T,i},
(ii) There exists c_3 > 0 such that for all T ≥ T_0(J), s ∈ J, and u ∈ Q_{T,i},
Proof Since ‖u‖ ≪ log T for all u ∈ Q_{T,i}, this lemma follows immediately from (4.12) and the definitions of the maps β_T and β̃_T.

Remark 4.9
While in Sect. 2 we allowed β_T and β̃_T to also depend on i, this is not necessary at this point. However, to work properly with these functions in our setting, we also need to define the finite measure spaces (Y_{T,i}, κ_{T,i}), for i = 1, 2. This will be done in the next section.
Let us now rewrite the decomposition in Lemma 4.7 using the standard coordinates. We set and for i = 1, 2, and note that the assertion in the lemma above can be written as for all large enough T. Let us now set h_{T,i} := h̃_{T,i} ∘ π, for i = 1, 2.

Corollary 4.10 For all large enough T ,
We stress that the summation range in the above formula depends on the point z, albeit in a weak way via s(z). In the next subsection, we will get rid of this z-dependence by introducing an additional average. The price we pay for this is that the functions h_{T,i} will be replaced with more complicated functions f_{T,i}, which depend on an extra variable coming from the average.

Construction of the spaces (Y_{T,i}, κ_{T,i}) and functions f_{T,i} (assumptions (I.a)-(I.b))
If T ⊂ R^{k−1} is a subset and r ≥ 0, we denote by T_r the r-thickening of T with respect to the ℓ^∞-norm. Similarly, for a subset B of S_d, we denote by B_r the r-thickening of B with respect to the rotation-invariant metric on S_d.
Since |v| ≪ log T for every v ∈ Q_{T,i}, it follows from (4.12) that for any bounded interval J ⊂ R, there exists c(J) > 0 such that for all s, t ∈ J, T ≥ T_0(J), and v ∈ Q_{T,i}, Hence, we deduce that for all s, t ∈ J satisfying |s − t| ≤ r, T ≥ T_0(J), u ∈ R^{k−1}, and v ∈ Q_{T,i}, (4.17) Let us now introduce a parameter ε ∈ (0, 1) and a non-negative smooth real-valued function ρ_ε on R with For future reference, we also note that ρ_ε can be chosen so that By the standard properties of convolutions, Then, using (4.17), we deduce that for every u ∈ R^{k−1} and v ∈ Q_{T,i}, where c = c(J) > 0 for a fixed bounded interval J which contains (log I)_ε for all 0 < ε < 1. Let ψ_{i,ε} be a smooth function on R^{k−1} such that (4.20) and let ϑ_ε be a smooth function on S_d such that For future reference, we note that these functions can be constructed so that ‖ψ_{i,ε}‖_{C^q} ≪ ε^{−1−q} and ‖ϑ_ε‖_{C^q} ≪ ε^{−θ_q} for some θ_q > 0. (4.22) From the above estimate, we deduce that for every u ∈ R^{k−1} and v ∈ Q_{T,i}, h̃_{T,i}(u + β_T(v, s), s, ξ) ≤ ∫_{(log I)_ε} ψ_{i,ε}(τ_T(s)^{−1}(u + β_T(v, t))) ρ_ε(s − t) ϑ_ε(ξ) dt. (4.23) By the same argument as in (4.17), we also have for all s, t ∈ J satisfying |s − t| ≤ ε, T ≥ T_0(J), u ∈ R^{k−1}, and v ∈ Q_{T,i},
Then it follows from (4.20) and (4.21) that (4.24) and We introduce a parameter ε_T ∈ (0, 1), to be specified later, and define (4.28) Then it follows from (4.25) that We conclude that The estimate indicates that F_T provides an approximation for the characteristic function χ_{Ω_T}. Let us now define f_{T,i} : R^d × R → [0, ∞) by f_{T,i}(z, y) = f̃_{T,i}(π(z), y) for z ∈ R^d_* and y ∈ Y_T, (4.30) and f_{T,i}(z, y) := 0 for all z ∈ R^d \ R^d_*. Then f_{T,i} is smooth in the z-coordinate. We also set (4.31) From (4.5) we see that the function F_T can be written as The following lemma demonstrates that the function F_T provides a good approximation for the characteristic function χ_{Ω_T} = χ̃_{Ω_T} ∘ π.
Hence, we conclude that ‖f_{T,i}(·, y)‖_{C^0} ≪ ε_T^{−1}. This proves the first estimate. Using additionally (4.22), we conclude that also ‖f_{T,i}(·, y)‖_{C^q} ≪ ε_T^{−r_q} for some r_q > 0, which implies the second estimate.
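The role played by the mollifier ρ_ε in this construction can be illustrated in one variable. The sketch below assumes the usual normalization (ρ_ε non-negative, smooth, supported in [−ε, ε], with integral one) and uses the standard bump function as one concrete choice; the helper names are hypothetical. It checks numerically the standard convolution fact that, for an interval J = [a, b], the function χ_{J_ε} * ρ_ε equals one on J and vanishes outside J_{2ε}.

```python
import math


def bump(x):
    """Standard smooth bump, positive on (-1, 1) and zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def _midpoint(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Numerical mass of the bump, used to normalize rho_eps to integral one.
BUMP_MASS = _midpoint(bump, -1.0, 1.0)


def rho(eps, x):
    """Assumed mollifier: non-negative, supported in [-eps, eps], integral one."""
    return bump(x / eps) / (eps * BUMP_MASS)


def smoothed_indicator(a, b, eps, s):
    """Value at s of the convolution of the indicator of [a - eps, b + eps] with rho_eps."""
    lo, hi = max(a - eps, s - eps), min(b + eps, s + eps)
    if lo >= hi:
        return 0.0
    return _midpoint(lambda t: rho(eps, s - t), lo, hi)


if __name__ == "__main__":
    for s in (-0.25, -0.1, 0.0, 0.5, 1.15):
        print(s, smoothed_indicator(0.0, 1.0, 0.1, s))
```

The scaling ρ_ε(x) = ε^{−1} ρ(x/ε) used here also makes each derivative cost one further factor of ε^{−1}, which is the kind of growth recorded in the C^q bounds above.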

Proof of Theorem 1.3
Let us now summarize what we have done in this technical section. The aim has been to produce smooth approximations F_T for the indicator functions χ_{Ω_T} to which the arguments of Sect. 3 apply. These approximations are given explicitly in (4.32). They are integrals of varying averages which are fibered over the finite measure spaces These averages are constructed using finite subsets Q_{T,i} and Q_{T,i}(y) of R^{k−1}, defined in (4.14) and (4.15), and Borel maps β_T : R^{k−1} × Y_T → R^{k−1} and β̃_T : R^{k−1} → R^{k−1}, defined in (4.13). The approximations F_T depend on the choice of a parameter ε_T, which we take to be ε_T = Vol(Ω_T)^{−η} for some η > 0. In order for these approximations to be useful for us, we arrange that as T → ∞, for p = 1, 2.