Translates of rational points along expanding closed horocycles on the modular surface

We study the limiting distribution of the rational points under a horizontal translation along a sequence of expanding closed horocycles on the modular surface. Using spectral methods we confirm equidistribution of these sample points for any translate when the sequence of horocycles expands within a certain polynomial range. We show that the equidistribution fails for generic translates and a slightly faster expanding rate. We also prove both equidistribution and non-equidistribution results by obtaining explicit limiting measures while allowing the sequence of horocycles to expand arbitrarily fast. Similar results are also obtained for translates of primitive rational points.


Introduction
Let $\{S_n\}_{n\in\mathbb{N}}$ be a sequence of "nice" subsets that become equidistributed in their ambient space. Given a sequence of discrete subsets $\{R_n\}_{n\in\mathbb{N}}$ with $R_n \subset S_n$, an interesting question is to what extent the distribution behavior of $\{R_n\}_{n\in\mathbb{N}}$ mimics that of $\{S_n\}_{n\in\mathbb{N}}$. One naturally expects that when the size of $R_n$ is relatively large, it is more likely that $\{R_n\}_{n\in\mathbb{N}}$ inherits some distribution property from $\{S_n\}_{n\in\mathbb{N}}$; on the other hand, if $R_n$ lies sparsely on $S_n$, then it is more likely that the points in $\{R_n\}_{n\in\mathbb{N}}$ become decorrelated and distribute like random points in the ambient space.
In the setting of unipotent dynamics, the most typical example of a sequence $\{S_n\}_{n\in\mathbb{N}}$ is a sequence of expanding closed horocycles on a non-compact finite-area hyperbolic surface $M$. More precisely, we can realize $M$ as a quotient $\Gamma\backslash\mathbb{H}$, where $\Gamma$ is a cofinite Fuchsian group and $\mathbb{H} = \{z = x + iy \in \mathbb{C} : y > 0\}$ is the Poincaré upper half-plane, equipped with the hyperbolic metric $ds = |dz|/y$, where $dz = dx + i\,dy$ is the complex line element. Up to conjugating by an appropriate isometry, we may assume that $M = \Gamma\backslash\mathbb{H}$ has a width one cusp at infinity, that is, that the isotropy group $\Gamma_\infty < \Gamma$ is generated by the translation sending $z \in \mathbb{H}$ to $z + 1$. A closed horocycle of height $y > 0$ is a closed set of the form $H_y := \{\Gamma(x + iy) : x \in \mathbb{R}/\mathbb{Z}\} \subset M$, and its period, i.e., its hyperbolic length, is $y^{-1}$. As $H_y$ gets longer, that is, as $y \to 0^+$, it becomes equidistributed on $M$ with respect to the hyperbolic area $d\mu(z) = y^{-2}\,dx\,dy$. The first effective version of this result is due to Sarnak [28] who, using spectral arguments, proved that for every $\Psi \in C_c^\infty(\Gamma\backslash\mathbb{H})$ and any $y > 0$,
$$\left|\int_0^1 \Psi(x + iy)\,dx - \frac{1}{\mu(M)}\int_M \Psi\,d\mu\right| \ll \mathcal{S}(\Psi)\,y^{\alpha}, \qquad (1.1)$$
where $\mathcal{S}$ is some Sobolev norm, and $0 < \alpha < 1$ is a constant depending on the first non-trivial residual hyperbolic Laplacian eigenvalue of $\Gamma$. In the case of the modular surface $\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$, $\alpha = 1/2$, while Zagier [32] observed that the Riemann hypothesis is equivalent to the equidistribution rate $O(y^{3/4-\epsilon})$.
In this setting, this problem was first investigated by Hejhal in [12] with a heuristic and numerical study of the value distribution of the sample points
$$\left\{x + \tfrac{j}{n} + iy : 0 \le j \le n - 1\right\} \qquad (1.2)$$
for some Hecke triangle groups $\Gamma = G_q$ under the assumption that $ny$ is small. Set
$$S_{n,y,\Psi}(x) := \sum_{j=0}^{n-1} \Psi\left(x + \tfrac{j}{n} + iy\right),$$
where $\Psi$ is some mean-zero step function on a fixed fundamental domain for $\Gamma\backslash\mathbb{H}$ (automorphically extended to $\mathbb{H}$). The numerics show that the value distribution of $n^{-1/2} S_{n,y,\Psi}(x)$ with respect to $x \in [0, 1)$ approaches a Gaussian curve for the non-arithmetic Hecke triangle groups $G_5$ and $G_7$, while this phenomenon breaks down for $G_3 = \mathrm{PSL}_2(\mathbb{Z})$. Hejhal gave an explanation of this difference based on the existence of Hecke operators on $G_3$. The convergence to a Gaussian distribution for general non-arithmetic Fuchsian groups was later confirmed by Strömbergsson [30, Corollary 6.5], under the assumption that the sequence $\{y_n\}_{n\in\mathbb{N}}$ decays sufficiently rapidly. Other such problems have since been investigated. Marklof and Strömbergsson [27] proved the equidistribution of generic Kronecker sequences along a sequence of closed horocycles expanding at a certain rate $y_n$ on $T^1 M$, the unit tangent bundle of $M$. The equidistribution of Hecke points proved by Clozel-Ullmo [4] (see also [3,10]) implies the equidistribution of the primitive rational points $\{\tfrac{j}{n} + \tfrac{i}{n} : 1 \le j \le n - 1,\ \gcd(j, n) = 1\}$ at prime steps on the modular surface, see [10, Remark on p. 171]. More recently, the equidistribution of the above sequence along the full sequence of positive integers was proved by Einsiedler-Luethi-Shah [8] in a slightly more general setting, namely on the product of the unit tangent bundle of the modular surface and a torus. Various sparse equidistribution results have also been obtained for expanding horospheres in the space of lattices $\mathrm{SL}_n(\mathbb{R})/\mathrm{SL}_n(\mathbb{Z})$ for $n \ge 3$ [7,9,22,23,26] and in Hilbert modular surfaces [24].
For each of these equidistribution results, assumptions on the expanding rate of the sequence $\{S_n\}_{n\in\mathbb{N}}$ are crucial; the discrete subsets $\{R_n\}_{n\in\mathbb{N}}$ lying on $\{S_n\}_{n\in\mathbb{N}}$ cannot be too sparse.
This paper emerged from an attempt to prove a result which turned out to be false. We consider the sparse equidistribution problem for the subset of rational points (with denominator $n$) under a horizontal translation $x \in \mathbb{R}/\mathbb{Z}$ on a horocycle $H_y$ on the modular surface; we denote this subset by $R_n(x, y_n)$ (cf. (1.4)). We thought that since the closed horocycles $H_y$ equidistribute as $y \to 0^+$, if we fix a sequence $\{y_n\}_{n\in\mathbb{N}}$ approaching zero, then the normalized counting measures on $R_n(x, y_n)$ (and its primitive counterpart) should equidistribute for Lebesgue almost every $x$ as $n \to \infty$. See the recent paper of Bersudsky [1, Theorem 1.5] for an analogous situation where such a result is true. Note the order of quantifiers: we first fix the sequence $\{y_n\}_{n\in\mathbb{N}}$ and only then choose the horizontal translation $x$. It is not hard to see that if one flips the quantifiers, then for any fixed horizontal translation $x$ there are sequences $\{y_n\}_{n\in\mathbb{N}}$ (approaching zero rapidly) such that equidistribution fails. We were very surprised to learn, though, that in stark contrast to our initial expectation, equidistribution fails even with the original order of quantifiers. The main novel result of this paper (Theorem 1.5) says that there are sequences $\{y_n\}_{n\in\mathbb{N}}$ approaching zero arbitrarily fast such that for almost every horizontal translation $x$ the normalized counting measures on $R_n(x, y_n)$ and its primitive counterpart do not equidistribute. In fact, we show that the collection of limit measures contains the uniform measure $\mu_M$, the zero measure and certain singular measures. Although these results should be considered the main contribution of this paper, we also complement our analysis by answering natural questions concerning sequences $\{y_n\}_{n\in\mathbb{N}}$ approaching zero at a polynomial rate.
The next subsections describe more precisely the setting and results obtained.

Context of the present paper
Let $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ and let $M = \Gamma\backslash\mathbb{H}$ be the modular surface. In this paper, generalizing the setting of [8], we study the equidistribution problem for the sets of rational and primitive rational points under an arbitrary horizontal translation $x \in \mathbb{R}/\mathbb{Z}$ along a given sequence of expanding closed horocycles on $M$. The set of rational points is the obvious choice of a sparse set with identical spacings, while primitive rational points constitute the simplest pseudorandom sequence (via the linear congruential generator). For any $n \in \mathbb{N}$, $x \in \mathbb{R}/\mathbb{Z}$ and $y > 0$ we denote by
$$R_n(x, y) := \left\{\Gamma\left(x + \tfrac{j}{n} + iy\right) \in H_y : 0 \le j \le n - 1\right\} \qquad (1.4)$$
and respectively
$$R^{\mathrm{pr}}_n(x, y) := \left\{\Gamma\left(x + \tfrac{j}{n} + iy\right) \in H_y : j \in (\mathbb{Z}/n\mathbb{Z})^\times\right\} \qquad (1.5)$$
the set of rational and respectively primitive rational points with denominator $n$ on the closed horocycle $H_y$ translated to the right by $x$. As usual, $(\mathbb{Z}/n\mathbb{Z})^\times$ denotes here the multiplicative group of integers modulo $n$. Let $\{y_n\}_{n\in\mathbb{N}}$ be a sequence of positive numbers such that $y_n \to 0$ as $n \to \infty$. We investigate the limiting distribution of the sequences of sample points $\{R_n(x, y_n)\}_{n\in\mathbb{N}}$ and $\{R^{\mathrm{pr}}_n(x, y_n)\}_{n\in\mathbb{N}}$ under various assumptions on the expanding rate of the sequence of horocycles $\{H_{y_n}\}_{n\in\mathbb{N}}$, or equivalently, the decay rate of $\{y_n\}_{n\in\mathbb{N}}$.
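To make the definitions (1.4) and (1.5) concrete, the following minimal Python sketch enumerates representatives in $\mathbb{H}$ of the two point sets. The function names and the representation of a point as a pair (real part modulo 1, height) are illustrative assumptions, not notation from the paper; the projection to $M$ is not performed.

```python
from fractions import Fraction
from math import gcd

def rational_points(n, x, y):
    """Representatives x + j/n + iy, 0 <= j <= n-1, as pairs (real part mod 1, height)."""
    return [((x + Fraction(j, n)) % 1, y) for j in range(n)]

def primitive_rational_points(n, x, y):
    """Same as rational_points, but only for residues j coprime to n."""
    return [((x + Fraction(j, n)) % 1, y) for j in range(n) if gcd(j, n) == 1]

if __name__ == "__main__":
    x = Fraction(1, 3)   # a rational translate, chosen only for illustration
    n, y = 12, 0.01
    print(len(rational_points(n, x, y)))            # n points
    print(len(primitive_rational_points(n, x, y)))  # Euler phi(n) points
```

The primitive set has Euler's $\varphi(n)$ elements, which is why it is the sparser of the two families.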
This problem is naturally easier when the sequence $\{y_n\}_{n\in\mathbb{N}}$ decays slowly, since then at each step we have relatively more sample points on the underlying horocycle. For instance, if $ny_n \to \infty$ as $n \to \infty$, the hyperbolic distance between two adjacent points in $R_n(x, y_n)$ decays to zero as $n \to \infty$. Since the points in $R_n(x, y_n)$ distribute evenly on $H_{y_n}$, the distribution behavior of $R_n(x, y_n)$ then mimics that of $H_{y_n}$. In particular, for any $x \in \mathbb{R}/\mathbb{Z}$ the sequence $\{R_n(x, y_n)\}_{n\in\mathbb{N}}$ becomes equidistributed on $M$ with respect to the hyperbolic area $\mu$ as $n \to \infty$, which follows from the equidistribution of the sequence $\{H_{y_n}\}_{n\in\mathbb{N}}$.
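The claim about adjacent points can be checked numerically. The sketch below uses the standard distance formula on $\mathbb{H}$, $\cosh d(z, w) = 1 + |z - w|^2 / (2\,\mathrm{Im}\,z\,\mathrm{Im}\,w)$, which for two points at height $y$ with horizontal spacing $1/n$ gives $\sinh(d/2) = 1/(2ny)$. The function name is an illustrative assumption; the distance on $M$ is bounded above by this distance on $\mathbb{H}$.

```python
import math

def adjacent_distance(n, y):
    """Hyperbolic distance in H between x + iy and x + 1/n + iy."""
    # sinh(d/2) = |z - w| / (2y) = 1 / (2ny) for two points at the same height y
    return 2.0 * math.asinh(1.0 / (2.0 * n * y))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        y = n ** -0.5   # here n*y = sqrt(n) tends to infinity
        print(n, adjacent_distance(n, y))
```

With $y_n = n^{-1/2}$ the printed distances shrink roughly like $1/(ny_n)$, whereas for $y_n = 1/n$ the distance stays constant.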
Regarding $\{R^{\mathrm{pr}}_n(x, y_n)\}_{n\in\mathbb{N}}$, its distribution behavior is well understood when $x = 0$. Indeed, it was shown by Luethi [24] that if $y_n = c/n^\alpha$ for some $c > 0$ and some $\alpha \in (0, 1)$, then $R^{\mathrm{pr}}_n(0, y_n)$ becomes equidistributed on $M$ with respect to $\mu$ as $n \to \infty$. Moreover, using the simple symmetry relation that for $\gcd(j, n) = 1$ and $y > 0$
$$\Gamma\left(\frac{j}{n} + iy\right) = \Gamma\left(-\frac{\bar{j}}{n} + \frac{i}{n^2 y}\right), \qquad (1.6)$$
one can extend this equidistribution result to the range $\alpha \in (1, 2)$; this improves the previous work of Demirci Akarsu [5, Theorem 2] which confirms equidistribution of $\{R^{\mathrm{pr}}_n(0, c/n^\alpha)\}_{n\in\mathbb{N}}$ for $\alpha \in (\tfrac{3}{2}, 2)$. Here $\bar{j} \in (\mathbb{Z}/n\mathbb{Z})^\times$ denotes the multiplicative inverse of $j \in (\mathbb{Z}/n\mathbb{Z})^\times$. The equidistribution for the case $\alpha = 1$ was later proved by Einsiedler-Luethi-Shah [8]; Jana [16, Theorem 1] recently gave an alternative spectral proof of this equidistribution result. We also mention that both [5, Theorem 2] and [16, Theorem 1] are valid in the same setting as [8], namely, on the product of the unit tangent bundle of the modular surface and a torus. When $\alpha = 2$ the equidistribution fails, as the aforementioned symmetry implies that $R^{\mathrm{pr}}_n(0, c/n^2) = R^{\mathrm{pr}}_n(0, 1/c)$ is always trapped in the closed horocycle $H_{1/c}$. For the same reason, when $\alpha > 2$ (or more generally for any sequence satisfying $n^2 y_n \to 0$), the inclusion $R^{\mathrm{pr}}_n(0, c/n^\alpha) = R^{\mathrm{pr}}_n(0, n^{\alpha-2}/c) \subset H_{n^{\alpha-2}/c}$ yields a full escape to the cusp of $M$ as $n \to \infty$. It is worth noting that while the symmetry (1.6) still holds for rational translates (cf. Lemma 3.6), it breaks down for irrational translates.
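The symmetry (1.6) can be verified numerically with an explicit matrix. A direct computation (our own check, not quoted from the paper) shows that $\gamma = \begin{pmatrix} -\bar{j} & b \\ n & -j \end{pmatrix}$ with $b = (j\bar{j} - 1)/n$ lies in $\mathrm{SL}_2(\mathbb{Z})$ and sends $j/n + iy$ to $-\bar{j}/n + i/(n^2 y)$. The helper names below are illustrative assumptions.

```python
from math import gcd

def mobius(a, b, c, d, z):
    """Moebius action of the matrix (a b; c d) on a point z of the upper half-plane."""
    return (a * z + b) / (c * z + d)

def symmetry_matrix(j, n):
    """Integer matrix (a, b, c, d) of determinant 1 realizing the symmetry (1.6)."""
    jbar = pow(j, -1, n)        # inverse of j modulo n (requires Python 3.8+)
    b = (j * jbar - 1) // n     # exact division, since j * jbar = 1 mod n
    return (-jbar, b, n, -j)

if __name__ == "__main__":
    n, y = 35, 0.003
    for j in range(1, n):
        if gcd(j, n) != 1:
            continue
        a, b, c, d = symmetry_matrix(j, n)
        w = mobius(a, b, c, d, complex(j / n, y))
        target = complex(-pow(j, -1, n) / n, 1 / (n * n * y))
        assert a * d - b * c == 1 and abs(w - target) < 1e-9
    print("symmetry (1.6) verified for n =", n)
```

The lower row $(n, -j)$ sends $j/n + iy$ to the purely imaginary number $iny$, which is what swaps the height $y$ with $1/(n^2 y)$.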

Statements of the results
We will state here the main results of this paper, and postpone the discussion of their proofs to the next subsection. Let $\mu_M := \mu(M)^{-1}\mu$ be the normalized hyperbolic area on $M$. For any $n \in \mathbb{N}$, $x \in \mathbb{R}/\mathbb{Z}$ and $y > 0$ let $\delta_{n,x,y}$ and $\delta^{\mathrm{pr}}_{n,x,y}$ denote the normalized probability counting measures supported on $R_n(x, y)$ and $R^{\mathrm{pr}}_n(x, y)$ respectively.
Using spectral expansion and collecting estimates on the Fourier coefficients of Hecke-Maass forms and Eisenstein series, we obtain the following effective result, which yields equidistribution when the sequence is within a certain polynomial range.
Theorem 1.1 Let $M$ be the modular surface. For any $\Psi \in C_c^\infty(M)$, for any $n \in \mathbb{N}$, $x \in \mathbb{R}/\mathbb{Z}$ and $y > 0$ we have where $\theta = 7/64$ is the current best known bound towards the Ramanujan conjecture (which implies $\theta = 0$) and $\mathcal{S}_{2,2}$ is an "$L^2$, order-2" Sobolev norm on $C_c^\infty(M)$, see Sect. 2.1.
If $\{y_n\}_{n\in\mathbb{N}}$ is a sequence of positive numbers satisfying $\lim_{n\to\infty} y_n = 0$ and $y_n \gg 1/n^\alpha$ for some fixed $\alpha \in \left(0, \tfrac{2}{1+2\theta}\right) = \left(0, \tfrac{64}{39}\right)$, then Theorem 1.1 implies that for any translate $x \in \mathbb{R}/\mathbb{Z}$, both $\{R_n(x, y_n)\}_{n\in\mathbb{N}}$ and $\{R^{\mathrm{pr}}_n(x, y_n)\}_{n\in\mathbb{N}}$ become equidistributed on $M$ with respect to $\mu_M$ as $n \to \infty$. In particular, it gives an alternative (spectral) proof of the aforementioned results of Luethi [24] and Einsiedler-Luethi-Shah [8]. The upper bound $\tfrac{2}{1+2\theta}$ is the natural barrier for our spectral methods. Nevertheless, when $x$ is a rational translate, a generalization of the symmetry (1.6) allows us to go beyond this barrier, and to prove unconditionally the remaining range $\alpha \in \left[\tfrac{2}{1+2\theta}, 2\right)$, as holds in the case of $\{R^{\mathrm{pr}}_n(0, y_n)\}_{n\in\mathbb{N}}$.
Theorem 1.2 Let $x = p/q$ be a primitive rational number, i.e. $\gcd(p, q) = 1$. Let $\{y_n\}_{n\in\mathbb{N}}$ be a sequence of positive numbers satisfying $y_n \asymp 1/n^\alpha$ for some fixed $\alpha \in \left[\tfrac{2}{1+2\theta}, 2\right)$. Then both $\{\delta_{n,x,y_n}\}_{n\in\mathbb{N}_q}$ and $\{\delta^{\mathrm{pr}}_{n,x,y_n}\}_{n\in\mathbb{N}^{\mathrm{pr}}_q}$ weakly converge to $\mu_M$ as $n$ goes to infinity, where $\mathbb{N}_q := \{n \in \mathbb{N} : \gcd(n^2, q) \mid n\}$ and $\mathbb{N}^{\mathrm{pr}}_q := \{n \in \mathbb{N} : \gcd(n, q) = 1\}$.

Remark 1.7
If $q$ is squarefree, then the condition $\gcd(n^2, q) \mid n$ is void. Thus for such $q$, Theorem 1.2 (together with Theorem 1.1) confirms the equidistribution of the sample points $R_n(p/q, y_n)$ (with $y_n \asymp 1/n^\alpha$) along the full set of positive integers for any $0 < \alpha < 2$.
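The remark about squarefree $q$ is easy to test by brute force. The following sketch checks the membership conditions defining the index sets of Theorem 1.2; the function names are illustrative assumptions. For squarefree $q$, $\gcd(n^2, q)$ is the product of the primes dividing both $n$ and $q$, which always divides $n$.

```python
from math import gcd

def in_N_q(n, q):
    """Membership test for the condition gcd(n^2, q) | n."""
    return n % gcd(n * n, q) == 0

def in_N_q_pr(n, q):
    """Membership test for the condition gcd(n, q) = 1."""
    return gcd(n, q) == 1

def is_squarefree(q):
    """True if no square d^2 with d >= 2 divides q."""
    d = 2
    while d * d <= q:
        if q % (d * d) == 0:
            return False
        d += 1
    return True

if __name__ == "__main__":
    # For squarefree q the condition is void: every n passes.
    print(all(in_N_q(n, q) for q in range(1, 60) if is_squarefree(q)
              for n in range(1, 200)))
    # For q = 4 the condition excludes, for instance, n = 2.
    print([n for n in range(1, 13) if not in_N_q(n, 4)])
```

For non-squarefree $q$ such as $q = 4$, the excluded $n$ are exactly those divisible by 2 but not by 4.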
As a byproduct of our analysis, we also have the following non-equidistribution result for rational translates, giving infinitely many explicit limiting measures. Let us first fix some notation. For each $m \in \mathbb{N}$, let
$$P_m := \{n = m\ell \in \mathbb{N} : \ell \text{ is a prime number and } \ell \nmid m\}. \qquad (1.8)$$
For each $Y > 0$, we denote by $\mu_Y$ the uniform probability measure supported on the closed horocycle $H_Y$. For each $m \in \mathbb{N}$ and $Y > 0$, we define the probability measure (1.9)
Theorem 1.3 Keep the notation as above. Let $x = p/q$ be a primitive rational number and let $\{y_n\}_{n\in\mathbb{N}}$ be a sequence of positive numbers.
(1) If y n = c/n 2 for some constant c > 0, then for any m ∈ N q and for any (2) If lim n→∞ n 2 y n = 0, then both sequences {R n (x, y n )} n∈N and {R pr n (x, y n )} n∈N fully escape to the cusp of M.
Our next result shows that, similar to the rational translate case, equidistribution fails for generic translates as soon as {y n } n∈N decays logarithmically faster than 1/n 2 . Theorem 1.4 Let d M (·, ·) be the distance function on M induced from the hyperbolic distance function on H. Fix z 0 ∈ M. Let {y n } n∈N be a sequence of positive numbers satisfying y n 1/(n 2 log β n) for some fixed 0 < β < 2. Then for almost every This implies that for almost every x ∈ R/Z, there exists an unbounded subsequence of N such that along this subsequence inf z∈R n (x,y n ) where α = min{β, 2 − β}. That is, for almost every x ∈ R/Z, all the sample points R n (x, y n ) (and hence also R pr n (x, y n )) are moving towards the cusp of M along this subsequence, and eventually escape to the cusp as n in this subsequence goes to infinity.
Our proof of Theorem 1.4 relies on connections to Diophantine approximation theory. This viewpoint comes with inherent limitations; in the specific setting $y_n \asymp 1/(n^2 \log^\beta n)$, Khintchine's approximation theorem guarantees full escape to the cusp almost surely, but this argument does not extend to any sequence $\{y_n\}_{n\in\mathbb{N}}$ that decays polynomially faster than $1/n^2$, see Sect. 1.3 for a more detailed discussion. It is thus interesting to study the cases when $\{y_n\}_{n\in\mathbb{N}}$ is beyond the ranges in Theorems 1.1 and 1.4.
Indeed, the rest of our results deal with sequences $\{y_n\}_{n\in\mathbb{N}}$ that can decay arbitrarily fast, and give both positive and negative results. This is the main novelty of this paper: the handling of cases in which the sample points can be arbitrarily sparse on the closed horocycles they lie on.
Theorem 1.5 For any sequence of positive numbers $\{c_n\}_{n\in\mathbb{N}}$, there exists a sequence $\{y_n\}_{n\in\mathbb{N}}$ satisfying $0 < y_n < c_n$ for each $n \in \mathbb{N}$ and such that for almost every $x \in \mathbb{R}/\mathbb{Z}$ the sets of limiting measures of $\{\delta_{n,x,y_n}\}_{n\in\mathbb{N}}$ and $\{\delta^{\mathrm{pr}}_{n,x,y_n}\}_{n\in\mathbb{N}}$ both contain the uniform measure $\mu_M$, the zero measure, and singular probability measures.
Theorem 1.5 is a combination of three more precise theorems, each of which handles a specific limiting measure, and which we discuss in the next subsection.

Discussion of the results
Our proofs of Theorems 1.1 and 1.2 rely on spectral estimates collected in the recent paper of Kelmer and Kontorovich [18], with a necessary refinement of [18, (3.6)] in the form of Proposition 3.3, which comes at the cost of a higher degree Sobolev norm. This strategy is standard and is also found in [4,16,27,31], to name just a few recent papers on related problems. The analysis in [18] was carried out in a more general setting, namely for the congruence covers $\Gamma_0(p)\backslash\mathbb{H}$ with $p$ a prime number. Theorem 1.1 can be extended to that more general setting, see Remark 3.11. With these spectral estimates in hand, we further prove an effective non-equidistribution result for rational translates from which part (1) of Theorem 1.3 follows, see Theorem 3.10. Part (2) of Theorem 1.3 is an easy application of the symmetry (1.6).

Remark 1.11
As was pointed out to us by Asaf Katz, we could also have used the estimates from [31, Proposition 3.1] in place of [18, Proposition 3.4], which in our specific setting give the same equidistribution range (with a higher degree Sobolev norm). We also mention that the estimates in [31, Proposition 3.1] are valid in the setting of $\Gamma_0(q)\backslash\mathrm{SL}_2(\mathbb{R})$ with $q \in \mathbb{N}$, and thus imply an effective equidistribution result analogous to Theorem 1.1 in this generality.
As mentioned earlier, a generalization of the symmetry (1.6) is available for rational translates but breaks down for irrational translates. To handle irrational translates, we approximate them by rational ones to apply the symmetry relation, see Lemma 4.2. This is where Diophantine approximation kicks in. Similar ideas were also used in [27, Section 7] to construct counterexamples in their setting. In fact, we prove Theorem 1.4 by proving a more general result that captures the cusp excursion rates of the sample points R n (x, y n ) in terms of the Diophantine properties of the translate x, see Theorem 4.3. Theorem 1.4 will then follow from Theorem 4.3 by imposing a Diophantine condition which ensures cusp excursion, while also holds for almost every translate thanks to Khintchine's approximation theorem. This Diophantine condition accounts for the tight restrictions on {y n } n∈N in Theorem 1.4. On the other hand, assuming an even stronger Diophantine condition (which holds for a null set of translates), we can handle sequences decaying polynomially faster than 1/n 2 with a much faster excursion rate towards the cusp, see Theorem 4.4. We also prove a non-equidistribution result (which, this time, holds for every x) when y n = c/n 2 and the constant c is restricted to some range, see Theorem 4.5. The trade-off of this upgrade from Theorem 1.4 to the everywhere non-equidistribution result is that we can no longer prove the full escape to the cusp along subsequences as in Theorem 1.4.
As mentioned before, Theorem 1.5 follows from three more precise theorems which each handles a specific limiting measure. Our first result confirms equidistribution almost surely along a fixed subsequence of N for any sequence {y n } n∈N decaying at least polynomially. Theorem 1.6 Fix α > 0. Then there exists a fixed unbounded subsequence N ⊂ N such that for any sequence of positive numbers {y n } n∈N satisfying y n n −α and for almost every x ∈ R/Z, both δ n,x,y n and δ pr n,x,y n weakly converge to μ M as n ∈ N goes to infinity. Remark 1.12 It will be clear from our proof that one can take N ⊂ N to be any subsequence satisfying n∈N n −c < ∞ for some positive c < min{ α 2 , 1 − 2θ }, e.g. we may take N = { n κ } n∈N for any κ > 1/ min{ α 2 , 1 − 2θ }. Theorem 1.6 follows from a second moment estimate for the discrepancies |δ n,x,y − μ M | and |δ pr n,x,y − μ M | along the closed horocycle H y (Theorem 5.2) together with a standard Borel-Cantelli type argument. This was also the strategy used in [27] when studying the Kronecker sequences in (1.3). Along these lines, they deduce from spectral estimates the equidistribution for almost every β ∈ R along a fixed subsequence {n k } n∈N when y n n −α with k ∈ N depending on α > 0. Then, using a continuity argument, this result is upgraded to the equidistribution along the full sequence of positive integers, see [27,Section 4]. This continuity argument fails in our situation. Instead of applying directly spectral estimates to the second moment formulas, we express the latter in terms of certain Hecke operators (Proposition 5.1), and rely on available (spectral) bounds for their operator norm, see [10]. Contrarily to spectral estimates, the recourse to Hecke operators allows us to have a uniform subsequence N which is valid for all {y n } n∈N decaying at least polynomially.
Next, we show that there exists a sequence {y n } n∈N decaying arbitrarily rapidly such that for almost every x, R n (x, y n ) (and thus also R pr n (x, y)) escapes to the cusp with a certain rate along subsequences. Theorem 1.7 Fix z 0 ∈ M. For any sequence of positive numbers {c n } n∈N , there exists a sequence {y n } n∈N satisfying 0 < y n < c n for each n ∈ N and such that for Finally, we show that escape to the cusp is not the only obstacle to equidistribution. Theorem 1.8 Let m ∈ N and Y > 0 satisfy m 2 Y > 1. Let P m ⊂ N and ν m,Y be as defined in (1.8) and (1.9) respectively. For any sequence of positive numbers {c n } n∈P m , there exists a sequence {y n } n∈P m satisfying 0 < y n < c n for all n ∈ P m such that for almost every x ∈ R/Z, the set of limiting measures of {δ n,x,y n } n∈P m contains ν m,Y .
Remark 1.14 We note that P 1 is the set of prime numbers and ν 1,Y = μ Y . Since whenever p is a prime number, when m = 1 the conclusion of Theorem 1.8 also holds for the sequence {δ pr n,x,y n } n∈P 1 . We also note that it will be clear from our proof that Theorems 1.7 and 1.8 can be combined. In fact, our argument shows that there always exists a sequence {y n } n∈N decaying faster than any prescribed sequence such that for almost every x ∈ R/Z the set of limiting measures of δ n,x,y n n∈N contains the trivial measure and ν m,Y for any finitely many pairs (m, Y ) ∈ N × R >0 with m 2 Y > 1, see Remark 7.26. Moreover, in view of Theorem 1.6 if y n n −α for some α > 0, then it also contains the hyperbolic area μ M almost surely.
For the rest of this introduction we describe the strategy of our proof of Theorem 1.7 (Theorem 1.8 follows from similar ideas). To detect cusp excursions, we study for each $n \in \mathbb{N}$ the occurrence of the events
$$\Gamma\left(x + \tfrac{j}{n} + iy_n\right) \in \mathcal{C} \quad \text{for all } 0 \le j \le n - 1, \qquad (1.15)$$
where $\mathcal{C} \subset M$ is some fixed cusp neighborhood of $M$. More precisely, we determine when the limsup set $I_\infty = \limsup_{n\to\infty} I_n$ is of full measure, where for each $n \in \mathbb{N}$, $I_n$ consists of the translates $x \in \mathbb{R}/\mathbb{Z}$ for which the events in (1.15) occur. This requires studying the left regular $u_{1/n}$-action on $\mathcal{C} \subset M$ and thus calls for the underlying lattice to be normalized by $u_{1/n}$. Therefore, we construct an explicit tower of coverings $\{\Gamma_n\backslash\mathbb{H}\}_{n\in\mathbb{N}}$ in which each $\Gamma_n$ is a congruence subgroup normalized by $u_{1/n}$. We note that the existence of such $\Gamma_n < \Gamma$ is the starting point of our proof and it relies on the assumption that $\Gamma = \mathrm{SL}_2(\mathbb{Z})$; this construction would fail for $\Gamma$ replaced by a non-arithmetic lattice.
The key ingredient of the proof will be a sufficient condition which states that if a point $\Gamma_n(x + iy_n) \in \Gamma_n\backslash\mathbb{H}$ visits a certain cusp neighborhood $\mathcal{C}_n$ on $\Gamma_n\backslash\mathbb{H}$, then the events in (1.15) will be realized for $x \in \mathbb{R}/\mathbb{Z}$, that is, $x \in I_n$, see Lemma 7.6.
Using this sufficient condition, we can then relate the measure of $I_n$ to the proportion of certain closed horocycles on $\Gamma_n\backslash\mathbb{H}$ visiting the cusp neighborhood $\mathcal{C}_n \subset \Gamma_n\backslash\mathbb{H}$, which in turn, using the equidistribution of expanding closed horocycles on $\Gamma_n\backslash\mathbb{H}$, can be estimated for $y_n$ sufficiently small. Since the sets $I_n$ also need to satisfy certain quasi-independence conditions for $I_\infty$ to have full measure (Lemma 2.5), we need to apply the equidistribution of certain subsegments of the expanding closed horocycles on $\Gamma_n\backslash\mathbb{H}$. More precisely, at the $n$-th step these subsegments will be taken to be the sets $I_m$ for all $m < n$. These subsegments are finite disjoint unions of subintervals whose number and size depend sensitively on the height parameters $\{y_m\}_{m<n}$, see Remark 6.3. If there existed an effective equidistribution result that was insensitive to the geometry of these subsegments, that is, one for which the error term depends only on the measure of these subsegments, then we would have an effective control on the sequence $\{y_n\}_{n\in\mathbb{N}}$ in Theorem 1.7 (and similarly also in Theorem 1.8). However, it is not clear to us whether one should expect such an effective equidistribution result.
Finally, we note that it was communicated to us by Strömbergsson that using a number theoretic interpretation of the aforementioned sufficient condition and some elementary estimates, one can alternatively prove Theorem 1.7 without going into these congruence covers, see Remark 7.17.

Structure of the paper
In Sect. 2, we collect some preliminary results that will be needed in the rest of the paper. In Sect. 3, we prove a key spectral estimate (Proposition 3.3) and proceed to prove Theorems 1.1 and 1.2. In Sect. 4, we prove Theorems 4.3 and 4.5 by examining the connections between Diophantine approximations and cusp excursions on the modular surface. In Sect. 5, we prove Theorem 1.6 by proving a second moment bound using Hecke operators. In Sect. 6, we study the left regular action of a normalizing element on the set of cusp neighborhoods of a congruence cover of the modular surface. Building on the results, we prove Theorems 1.7 and 1.8 in Sect. 7.

Notation
For two positive quantities $A$ and $B$, we will use the notation $A \ll B$ or $A = O(B)$ to mean that there is a constant $c > 0$ such that $A \le cB$, and we will use subscripts to indicate the dependence of the constant on parameters. We will write $A \asymp B$ for $A \ll B \ll A$. For any $z \in \mathbb{H}$ we denote by $e(z) := e^{2\pi i z}$. For any $n \in \mathbb{N}$, we denote by $\prod_{d \mid n}$ the product over all positive divisors of $n$, and by $\prod_{p \mid n,\ p \text{ prime}}$ the product over all prime divisors of $n$. For any $x \ge 0$ and $n \in \mathbb{N}$, $\sigma_x(n) := \sum_{d \mid n} d^x$ is the power-$x$ divisor function, which satisfies the estimate $\sigma_x(n) \ll_\epsilon n^{x+\epsilon}$ for any small $\epsilon > 0$.
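As a small illustration of the divisor function $\sigma_x(n)$ from the notation above, here is a naive Python sketch. The function names are illustrative, and trial division is used only for clarity, not efficiency.

```python
def divisors(n):
    """All positive divisors of n, in increasing order, by trial division."""
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    large = [n // d for d in small if d * d != n]
    return small + large[::-1]

def sigma(x, n):
    """Power-x divisor function: sum of d**x over all positive divisors d of n."""
    return sum(d ** x for d in divisors(n))

if __name__ == "__main__":
    print(divisors(12))   # [1, 2, 3, 4, 6, 12]
    print(sigma(0, 12))   # 6, the number of divisors
    print(sigma(1, 12))   # 28, the sum of divisors
```

In particular $\sigma_0(n)$ counts divisors, which is the quantity bounding the Hecke eigenvalues of Eisenstein series later in the text.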

Preliminaries
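Let $G = \mathrm{SL}_2(\mathbb{R})$. We consider the Iwasawa decomposition $G = NAK$ with
$$N = \{u_x : x \in \mathbb{R}\}, \quad A = \{a_y : y > 0\}, \quad K = \{k_\theta : 0 \le \theta < 2\pi\},$$
where
$$u_x = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}, \quad a_y = \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \quad \text{and} \quad k_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
respectively. Under the coordinates $g = u_x a_y k_\theta$ on $G$, the Haar measure is given (up to scalars) by
$$dg = y^{-2}\,dx\,dy\,d\theta.$$
The group $G$ acts on the upper half plane $\mathbb{H} = \{z = x + iy \in \mathbb{C} : y > 0\}$ via the Möbius transformation: $gz = \frac{az+b}{cz+d}$ for any $g = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G$ and $z \in \mathbb{H}$. This action preserves the hyperbolic area $d\mu(z) = y^{-2}\,dx\,dy$ and induces an identification between $G/K$ and $\mathbb{H}$.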
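The identification between $G/K$ and $\mathbb{H}$ amounts to the fact that $u_x a_y k_\theta$ sends the base point $i$ to $x + iy$, independently of $\theta$. A minimal numerical sketch of this, with matrices stored as nested tuples and all helper names chosen for illustration:

```python
import math

def u(x):
    return ((1.0, x), (0.0, 1.0))

def a(y):
    s = math.sqrt(y)
    return ((s, 0.0), (0.0, 1.0 / s))

def k(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, s), (-s, c))

def matmul(g, h):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(g[r][t] * h[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )

def act(g, z):
    """Moebius action gz = (az + b) / (cz + d)."""
    (p, q), (r, s) = g
    return (p * z + q) / (r * z + s)

if __name__ == "__main__":
    x, y = 0.3, 2.5
    for theta in (0.0, 0.7, 2.0):
        g = matmul(matmul(u(x), a(y)), k(theta))
        print(act(g, 1j))   # approximately 0.3 + 2.5j for every theta
```

Because $k_\theta$ fixes $i$, functions on $\mathbb{H}$ correspond to right $K$-invariant functions on $G$, which is how the text passes between the two pictures.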
Let $\Gamma < G$ be a lattice, that is, $\Gamma$ is a discrete subgroup of $G$ such that the corresponding hyperbolic surface $\Gamma\backslash\mathbb{H}$ has finite area (with respect to $\mu$). We denote by $\mu_\Gamma := \mu(\Gamma\backslash\mathbb{H})^{-1}\mu$ the normalized hyperbolic area on $\Gamma\backslash\mathbb{H}$ such that $\mu_\Gamma(\Gamma\backslash\mathbb{H}) = 1$. We note that when $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ then $\mu_\Gamma = \mu_M$ with $\mu_M$ the normalized hyperbolic area on the modular surface $M$ given as in the introduction. We note that in this case it is well known that $\mu(M) = \pi/3$, and hence
$$d\mu_M(z) = \frac{3}{\pi}\,\frac{dx\,dy}{y^2}. \qquad (2.1)$$
Using the above identification between $\mathbb{H}$ and $G/K$ we can identify the hyperbolic surface $\Gamma\backslash\mathbb{H}$ with the locally symmetric space $\Gamma\backslash G/K$. We can thus view subsets of $\Gamma\backslash\mathbb{H}$ as right $K$-invariant subsets of $\Gamma\backslash G$. Similarly, we can view functions on $\Gamma\backslash\mathbb{H}$ as right $K$-invariant functions on $\Gamma\backslash G$. We note that using the above description of the Haar measure, the probability Haar measure on $\Gamma\backslash G$ (when restricted to the subfamily of right $K$-invariant subsets) coincides with the normalized hyperbolic area $\mu_\Gamma$ on $\Gamma\backslash\mathbb{H}$.
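The value $\mu(M) = \pi/3$ can be checked numerically. The sketch below assumes the standard fundamental domain $\{z \in \mathbb{H} : |\mathrm{Re}\,z| \le 1/2,\ |z| \ge 1\}$ for $\mathrm{SL}_2(\mathbb{Z})$, which is not spelled out in the text above. Integrating $y^{-2}$ in $y$ from $\sqrt{1 - x^2}$ to infinity leaves $\int_{-1/2}^{1/2} (1 - x^2)^{-1/2}\,dx = 2\arcsin(1/2) = \pi/3$. The function name and step count are illustrative.

```python
import math

def modular_surface_area(steps=2000):
    """Midpoint-rule approximation of the hyperbolic area of the standard fundamental domain."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = -0.5 + (i + 0.5) * h
        # inner integral of y^{-2} dy over y >= sqrt(1 - x^2) equals 1 / sqrt(1 - x^2)
        total += h / math.sqrt(1.0 - x * x)
    return total

if __name__ == "__main__":
    print(modular_surface_area(), math.pi / 3)
```

The two printed values agree to several decimal places, consistent with the normalizing constant $3/\pi$ in (2.1).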

Sobolev norms
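In this subsection we record some useful properties about Sobolev norms. Let $\mathfrak{g} = \mathfrak{sl}_2(\mathbb{R})$ be the Lie algebra of $G$. Fix a basis $\mathcal{B} = \{X_1, X_2, X_3\}$ for $\mathfrak{g}$, and given a smooth test function $\Psi \in C^\infty(\Gamma\backslash G)$ we define the "$L^p$, order-$d$" Sobolev norm $\mathcal{S}_{p,d}(\Psi)$ as
$$\mathcal{S}_{p,d}(\Psi) := \sum_{\mathrm{ord}(D) \le d} \|D\Psi\|_{L^p(\Gamma\backslash G)},$$
where $D$ runs over all monomials in $\mathcal{B}$ of order at most $d$, and the $L^p$-norm is with respect to the normalized Haar measure on $\Gamma\backslash G$.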
For any ∈ C ∞ ( \G) (which we think of a smooth left -invariant function on G) and for any h ∈ G we denote by L h (g) := (h −1 g) the left regular h-action on . It is easy to check that L h ∈ C ∞ (h h −1 \G), and since taking Lie derivatives commutes with the left regular action, we have Next we note that using the product rule for Lie derivatives (see e.g. [21, p. 90]), the triangle inequality and the Cauchy-Schwarz inequality, for any monomial D of order k≤ d we have for any smooth functions 1 , 2 ∈ C ∞ ( \G) In particular this implies that can be viewed as a smooth left -invariant function on G. Since the Sobolev norms are defined with respect to the normalized Haar measure on the corresponding homogeneous space, we have for < of finite index and ∈ C ∞ ( \G)

Spectral decomposition
Let $\Gamma < G$ be a non-uniform lattice, that is, $\Gamma$ is a lattice and $\Gamma\backslash\mathbb{H}$ is not compact. Let $\Delta = -y^2\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)$ be the hyperbolic Laplace operator. It is a second order differential operator acting on $C^\infty(\Gamma\backslash\mathbb{H})$ and extends uniquely to a self-adjoint and positive semi-definite operator on $L^2(\Gamma\backslash\mathbb{H})$. Since $\Gamma$ is non-uniform, the spectrum of $\Delta$ is composed of a continuous part (spanned by Eisenstein series) and a discrete part (spanned by Maass forms) which further decomposes as the cuspidal spectrum and the residual spectrum. The residual spectrum always contains the constant functions (coming from the trivial pole of the Eisenstein series). If $\Gamma$ is a congruence subgroup, that is, $\Gamma$ contains a principal congruence subgroup $\Gamma(n)$ for some $n \in \mathbb{N}$, then the residual spectrum consists only of the constant functions, see e.g. [15, Theorem 11.3].
Let $\{\phi_k\}$ be an orthonormal basis of the space of cusp forms that are eigenfunctions of the Laplace operator $\Delta$. Explicitly, for each $\phi_k$ there exists $\lambda_k \ge 0$ such that
$$\Delta\phi_k = \lambda_k\phi_k, \qquad \lambda_k = \tfrac{1}{4} + r_k^2.$$
Selberg's eigenvalue conjecture states that for congruence subgroups, $\lambda_k \ge 1/4$, or equivalently, there is no $r_k \in i(0, 1/2)$. Selberg's conjecture is known to be true for the modular surface $M$, and more generally, the best known bound towards this conjecture is currently $\lambda_k \ge \tfrac{1}{4} - \theta^2$, with $\theta = 7/64$, which follows from the bound of Kim and Sarnak towards the Ramanujan conjecture, see [19, p. 176].
Let now = SL 2 (Z). In the notation introduced at the beginning of this section, the Eisenstein series for the modular group at the cusp ∞ is defined for Re(s) > 1 by with a meromorphic continuation to s ∈ C. Moreover, for any s ∈ C, E(·, s) is an eigenfunction of the Laplace operator with eigenvalue s(1 − s). Let ∈ L 2 (M) and we have the following spectral decomposition (see [15,Theorems 4.7 and 7.3]) where the convergence holds in the L 2 -norm topology, and is pointwise if ∈ C ∞ c (M). As a direct consequence we have for ∈ L 2 (M), (2.7)

Hecke operators
The spectral theory of M has extra structure due to the existence of Hecke operators. The main goal of this subsection is to prove an operator norm bound for Hecke operators and the main reference is [15,Section 8.5]. For any n ∈ N define the set where M 2 (Z) is the space of two by two integral matrices. The n-th Hecke operator T n is defined by that for any ∈ L 2 (M) The Hecke operator T n is a self-adjoint operator on L 2 (M) and since T n commutes with the Laplace operator (since is defined via right multiplication and T n is defined via left multiplication) the orthonormal basis of the space of cusp forms {φ k } can be chosen consisting of joint eigenfunctions of all T n , that is, On the other hand, for any r ∈ R the Eisenstein series E(z, 1/2+ir) is an eigenfunction , see [15,Equation (8.33)]. It is clear that |λ r (n)| ≤ σ 0 (n) with σ 0 (n) the divisor function. For the eigenvalue of cusp forms it is conjectured (Ramanujan-Petersson) that for any above φ k and for any n ∈ N The aforementioned bound of Sarnak and Kim [19] implies that Using these bounds on eigenvalues and the above spectral decomposition (2.6) and (2.7) we have the following bound on the operator norm of the Hecke operator, see also [10, pp. 172-173].

Hecke operators attached to a group element
Let = SL 2 (Z) and let M = \H be the modular surface as above. There is another type of Hecke operators on L 2 (M) defined via a group element in SL 2 (Q). Namely, for each h ∈ SL 2 (Q) the Hecke operator attached to h, denoted by T h , is defined by that for any ∈ L 2 (M) For our purpose, we will need another expression for T h . For any h ∈ SL 2 (Q) we denote by h := ∩h −1 h. We note that the map from to \ h sending γ ∈ to hγ induces an identification between h \ and \ h . This identification induces the following alternative expression for T h : (2.10) It is clear from the definition that T h is defined only up to representatives for the double coset h , that is, Using elementary column and row operations one can see that for h ∈ SL 2 (Q) with degree n where gcd(g) is the greatest common divisor of the entries of g. Thus we can parameterize the Hecke operators by their degrees, that is, we will denote by T n := T h for any h ∈ SL 2 (Q) with degree n. We also note that by direct computation when Now using the description (2.11) we have the double coset decomposition This decomposition together with the definitions (2.8), (2.9) and (2.12) implies the relation Thus by the Möbius inversion formula we have Using this relation and Proposition 2.1 we can prove the following operator norm bounds for T n which we will later use, see also [3, Theorem 1.1] for such bounds in a much greater generality.

Proposition 2.2
Keep the notation as in Proposition 2.1. For any $\Psi \in L^2(M)$ and for any $n \in \mathbb{N}$ we have Proof. By Proposition 2.1 and using the relation (2.13), the trivial estimates $|\mu(d)| \le 1$ and $\nu_n \ge n^2$, and the triangle inequality, we have

Equidistribution of subsegments of expanding closed horocycles
We record a special case of Sarnak's result [28, Theorem 1] on effective equidistribution of expanding closed horocycles, namely:

Proposition 2.3 Let $\Gamma < \mathrm{SL}_2(\mathbb{Z})$ be a congruence subgroup and assume that $\Gamma$ has a cusp at $\infty$ with width one. Then for any
where the implied constant is absolute, independent of $\Gamma$, $\Psi$ and $y$, and the $L^2$-norm is with respect to the normalized hyperbolic area $\mu_\Gamma$.

Remark 2.15
We omit the proof here and refer the reader to [18, (3.5)]. We note that while [18] only deals with the case when $\Gamma = \Gamma_0(p)$ with $p$ a prime number, the proof there works for general congruence subgroups, given that they have trivial residual spectrum; see [15, Theorem 11.3].
We will also need the following (non-effective) equidistribution result replacing the whole closed horocycle by a fixed subsegment: The proof of Proposition 2.4 uses Margulis' thickening trick [25] and the mixing property of the geodesic flow on the unit tangent bundle of $\Gamma\backslash\mathbb{H}$; this approach is also effective, see e.g. [17, Proposition 2.3]. A proof of (2.16) using spectral methods was also sketched in [12,Theorem 1 ]. We also note that both equidistribution results in Propositions 2.3 and 2.4 can be lifted to the unit tangent bundle of $\Gamma\backslash\mathbb{H}$ (with necessary modifications to the error term in (2.14)); since we will only be working at the level of the hyperbolic surface, we state these two results in the current format for convenience. We further refer the reader to [13, 30] for much stronger effective equidistribution results regarding long enough (varying) subsegments of expanding closed horocycles. Proposition 2.4 can be equivalently stated as follows: for any fixed open interval $I \subset (0,1)$, the measures $\mu_{I,y}$ converge weakly to $\mu_\Gamma$ as $y \to 0^+$, where for any $y \in (0,1)$ and

Remark 2.17
Thus by the Portmanteau theorem, (2.16) extends to $\Psi = \chi_B$ with $B \subset \Gamma\backslash\mathbb{H}$ a Borel subset with boundary of measure zero. More generally, let $\rho : [0,1) \to \mathbb{R}$ be a Riemann integrable function. Since $\rho$ can be weakly approximated from both above and below by step functions, we have with $B \subset \Gamma\backslash\mathbb{H}$ a Borel set with boundary of measure zero.

A quantitative Borel-Cantelli lemma
Finally we record here a quantitative Borel-Cantelli lemma which ensures that the limsup set of a sequence of events has full measure, given certain quasi-independence conditions.
Remark 2.19 Keep the notation as in Lemma 2.5. It was shown in [20, Proposition 5.4] that if there exist $C > 0$ and $\eta > 1$ such that for any $n \ne m$, $R_{n,m} \le C$ then the sequence $\{A_i\}_{i \in \mathbb{N}}$ satisfies the condition (2.18).
We will use the following slightly modified version of the quantitative Borel-Cantelli lemma, which has the flexibility to consider a sequence of measurable sets $\{A_n\}_{n \in S}$ indexed by a general unbounded subset $S \subset \mathbb{N}$.

Corollary 2.6
Let (X , B, ν) be as in Lemma 2.5. Let S ⊂ N be an unbounded subset and let {A n } n∈S be a sequence of measurable subsets in B. Suppose that Then by for any i < j we have where for the first inequality we used the assumption (2.20) and for the second inequality we used the estimates

Equidistribution range
Let $M = \mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}$. Since we fix $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ throughout this section, we abbreviate the Sobolev norm $S^{\Gamma}_{p,d}$ by $S_{p,d}$. In this section, we prove Theorems 1.1 and 1.2. The main ingredient of our proof is an explicit bound on Fourier coefficients, which follows from a slight modification of the estimates obtained in [18].

Bounds on Fourier coefficients
Let $\Psi \in C_c^\infty(M)$. Since $\Psi$ is left $\Gamma$-invariant, it is invariant under the transformation determined by $u_1 : z \mapsto z + 1$, and it thus has a Fourier expansion in the variable $x = \mathrm{Re}(z)$: Similarly we denote by $a_{\phi_k}(m, y)$ and $a(s; m, y)$ the $m$-th Fourier coefficients of the Hecke-Maass form $\phi_k$ and the Eisenstein series $E(\cdot, s)$ respectively. Estimates on these Fourier coefficients yield, via the spectral expansion (2.6), estimates on the Fourier coefficients of $\Psi$. Namely, We record the following bounds for $a_{\phi_k}(m, y)$ and $a(s; m, y)$:

3)
where θ = 7/64 is the best known bound towards the Ramanujan conjecture as before.
Moreover, for any $m \ne 0$, any $\epsilon > 0$ and any $\alpha_0 > 5/3$, we have where $S_{\alpha_0}$ is a Sobolev norm of degree $\alpha_0$.

Remark 3.7
The Sobolev norm $S_{\alpha_0}$ is explicit from the proof of [18, Proposition 3.4]: The following refinement of this last estimate allows us to estimate the Fourier coefficients when $|m| > y^{-1}$ is large. This refinement is crucial for our later results, and the price we pay is a Sobolev norm of higher degree.
The following corollary of Proposition 3.3 is the key estimate that we will use to prove Theorem 1.1.

Corollary 3.4 Let q be a positive integer. For any
Proof If $qy \le 1$ we can separate the above sum into two parts to get Applying ( where for the second estimate we used that $4/3 - \theta - \epsilon > 1$. If $qy > 1$ then we have $|qm|y > 1$ for all $m \ne 0$. We can apply Proposition 3.3 to $a_\Psi(qm, y)$ for all integers where for the last estimate we used that $\theta < 1/3 - \epsilon$.

Proof of Theorem 1.1
In this subsection we prove Theorem 1.1. In view of (3.5) it suffices to prove the following proposition.
where for the second inequality we used the fact that gcd(m, n) = n/d implies that (n/d) | m, for the third inequality we applied Corollary 3.4 and for the second last inequality we applied the estimate ϕ(d)

Full range equidistribution for rational translates
In this subsection we prove Theorem 1.2. We fix $x = p/q$ a primitive rational number and let $N_q = \{n \in \mathbb{N} : \gcd(n^2, q) \mid n\}$ be as in Theorem 1.2. As mentioned in the introduction, the key ingredient is a symmetry lemma for rational translates which generalizes the symmetry (1.6). Before stating the lemma, let us briefly explain why we need to restrict to the subsequence $N_q$. Let $n \in \mathbb{N}$ and let $y > 0$. We need to study the distribution of the points $\Gamma(x + \frac{j}{n} + iy) = \Gamma(\frac{p}{q} + \frac{j}{n} + iy)$ for $0 \le j \le n - 1$. Let $\frac{p_j}{q_j}$ be the reduced form of $\frac{p}{q} + \frac{j}{n}$; in view of the symmetry (1.6) we have where $\overline{p_j}$ is the multiplicative inverse of $p_j$ modulo $q_j$. To further analyze the distribution of these points, we thus need to solve the congruence equation $x p_j \equiv 1 \pmod{q_j}$ in $x$. Write $k = \gcd(n, q)$, $q' = q/k$ and $n' = n/k$. Then $\frac{p}{q} + \frac{j}{n} = \frac{p}{kq'} + \frac{j}{kn'} = \frac{pn' + jq'}{kq'n'}$, implying that $q_j = \frac{kq'n'}{\gcd(pn' + jq',\, kq'n')} = q' \cdot \frac{n}{\gcd(pn' + jq',\, n)}$ can be written canonically as a product of two integers. Here for the second equality we used that $\gcd(pn' + jq', q') = \gcd(pn', q') = 1$. In view of the Chinese remainder theorem, the above congruence equation modulo $q_j$ is relatively easy to solve when the two factors $q'$ and $n/\gcd(pn' + jq', n)$ are coprime (see the proof of Lemma 3.6 for more details). This condition can be guaranteed for any $j$ if $\gcd(q', n) = \gcd(q/\gcd(q, n), n) = 1$, which is equivalent to the condition $n \in N_q$. Finally, we also note that by writing $n$ and $q$ in prime decomposition form, it is not hard to check that $n \in N_q$ is equivalent to $q = kl$ with $l = \gcd(n, q) \mid n$ and $\gcd(k, n) = 1$ (here and below $k$ denotes $q/\gcd(n, q)$). We now state the symmetry lemma.
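The three descriptions of $N_q$ mentioned above (the defining condition $\gcd(n^2, q) \mid n$, the coprimality of $q/\gcd(q,n)$ with $n$, and the factorisation $q = kl$) can be compared numerically. The following is only a brute-force sanity check over a finite range, not a proof; the function names are ours and purely illustrative.

```python
from math import gcd

def in_N_q(n: int, q: int) -> bool:
    """Defining condition of N_q: gcd(n^2, q) divides n."""
    return n % gcd(n * n, q) == 0

def cofactor_coprime(n: int, q: int) -> bool:
    """Equivalent form: q / gcd(q, n) is coprime to n."""
    return gcd(q // gcd(q, n), n) == 1

def splits_as_kl(n: int, q: int) -> bool:
    """Equivalent form: q = k*l with l = gcd(n, q) dividing n and gcd(k, n) = 1."""
    l = gcd(n, q)
    k = q // l
    return k * l == q and n % l == 0 and gcd(k, n) == 1

def conditions_agree(max_n: int, max_q: int) -> bool:
    """Check that the three conditions coincide for all 1 <= n <= max_n, 1 <= q <= max_q."""
    return all(
        in_N_q(n, q) == cofactor_coprime(n, q) == splits_as_kl(n, q)
        for n in range(1, max_n + 1)
        for q in range(1, max_q + 1)
    )

if __name__ == "__main__":
    print(conditions_agree(200, 200))
    # Elements of N_12 up to 12: n must be odd or divisible by 4.
    print([n for n in range(1, 13) if in_N_q(n, 12)])
```

For instance, with $q = 12$ the values $n = 2, 6, 10$ are excluded because $4 \mid \gcd(n^2, 12)$ while $4 \nmid n$.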

Lemma 3.6 Let $\frac{m}{kl}$ be a primitive rational number and let $n \in \mathbb{N}$ be such that $l \mid n$ and $\gcd(k, n) = 1$. Then for any $0 \le j \le n - 1$ and for any $y > 0$ we have

14)
where $d = d_j := \gcd(m\frac{n}{l} + jk, n)$ and $a = a_d$, $b = b_d \in \mathbb{Z}$ are some fixed integers such that $a\frac{n}{d} + bk = 1$. Here, for any integer $x$, $\overline{x}$ denotes the multiplicative inverse of $x$ modulo $k$, and $x^*$ denotes the multiplicative inverse of $x$ modulo $n/d$. If we further assume $\gcd(j, n) = l = 1$, then $d_j = \gcd(mn + jk, n) = 1$ and

15)
Proof Since $l \mid n$, by direct computation we have $\frac{m}{kl} + \frac{j}{n} = \frac{mn/l + jk}{kn}$. Note that since $\gcd(k, mn) = 1$ we have $\gcd(m\frac{n}{l} + jk, k) = \gcd(m\frac{n}{l}, k) = 1$. This implies that $\gcd(m\frac{n}{l} + jk, kn) = \gcd(m\frac{n}{l} + jk, n) = d$. Hence, letting $\frac{p}{q}$ be the reduced form of $\frac{m}{kl} + \frac{j}{n}$, we have $(p, q) = ((m\frac{n}{l} + jk)/d,\ kn/d)$. Now since $\gcd(p, q) = 1$, there exist some integers $v, w \in \mathbb{Z}$ such that $\gamma = \begin{pmatrix} w & v \\ -q & p \end{pmatrix} \in \Gamma$. By direct computation we have implying that where for the second equality we used the relation $q = kn/d$. Moreover, since $\gamma \in \Gamma$ we have $wp + vq = 1$, implying that (again using the relation $(p, q) = ((m\frac{n}{l} + jk)/d,\ kn/d)$) We claim that $w \equiv dl\,\overline{m}\,\overline{n}\,\frac{n}{d}\,a + \left(\left(m\frac{n}{l} + jk\right)/d\right)^{*} kb \pmod{k\frac{n}{d}}$. (3.17) In view of the Chinese Remainder Theorem, since $\gcd(k, n/d) = 1$, it suffices to check For the first equation we have where for the first equality we used the fact that $\gcd(dl, k) = 1$ (since $d \mid n$, $l \mid n$ and $\gcd(k, n) = 1$). The second equation follows similarly. Now plugging relation (3.17) into (3.16) we get (3.14).
For the second half we note that d j = gcd(mn + jk, n) = gcd( jk, n) = 1. The first equality is true since l = 1, and the second equality is true since by assumption gcd(k, n) = gcd( j, n) = 1. Thus in view of (3.14), to prove (3.15) it suffices to note that (mn + jk) * ≡ ( jk) * (mod n), or equivalently, mn + jk ≡ jk (mod n).
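The arithmetic in the proof of Lemma 3.6 can be spot-checked by machine. The sketch below assumes the setting of the lemma ($\gcd(m, kl) = 1$, $l \mid n$, $\gcd(k, n) = 1$) and checks two things for every $0 \le j \le n-1$: that the reduced form of $\frac{m}{kl} + \frac{j}{n}$ is $((m\frac{n}{l} + jk)/d,\ kn/d)$, and that the Chinese-remainder expression for $w$ (our reading of (3.17), with $a\frac{n}{d} + bk = 1$) really inverts $p$ modulo $q$. All helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def ext_gcd(x: int, y: int):
    """Return (g, s, t) with s*x + t*y = g = gcd(x, y)."""
    old_r, r, old_s, s, old_t, t = x, y, 1, 0, 0, 1
    while r:
        quot = old_r // r
        old_r, r = r, old_r - quot * r
        old_s, s = s, old_s - quot * s
        old_t, t = t, old_t - quot * t
    return old_r, old_s, old_t

def inv(x: int, mod: int) -> int:
    """Multiplicative inverse of x modulo mod (0 when mod == 1). Needs Python >= 3.8."""
    return 0 if mod == 1 else pow(x, -1, mod)

def check_lemma_arithmetic(m: int, k: int, l: int, n: int) -> bool:
    """Verify the reduced form and the CRT expression for w, for all 0 <= j < n."""
    assert gcd(m, k * l) == 1 and n % l == 0 and gcd(k, n) == 1
    for j in range(n):
        numerator = m * (n // l) + j * k
        d = gcd(numerator, n)
        p, q = numerator // d, k * n // d
        x = Fraction(m, k * l) + Fraction(j, n)
        if (x.numerator, x.denominator) != (p, q):
            return False
        # Integers a, b with a*(n/d) + b*k = 1 (exist since gcd(k, n/d) = 1).
        g, a, b = ext_gcd(n // d, k)
        if g != 1:
            return False
        # CRT expression: inverse of p mod k is d*l*inv(m)*inv(n); mod n/d it is p^*.
        w = d * l * inv(m, k) * inv(n, k) * (n // d) * a + inv(p, n // d) * k * b
        if (w * p) % q != 1 % q:
            return False
    return True

if __name__ == "__main__":
    print(check_lemma_arithmetic(1, 3, 2, 4))
    print(check_lemma_arithmetic(5, 7, 3, 12))
```

Because $\gcd(k, n/d) = 1$, the residue of $w$ modulo $q = kn/d$ is determined by its residues modulo $k$ and modulo $n/d$, which is exactly what the final congruence test exploits.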

Remark 3.18
When $k = 1$ we can take $(a, b) = (0, 1)$; then (3.15) recovers the symmetry (1.6). We also note that for the point $\Gamma(x + j/n + iy)$ with $x$ irrational, the above symmetry clearly breaks.
Proposition 3.7 Let $p/q$ be a primitive rational number and let $n \in N_q$. Then for any $y > 0$ we have

19)
where x d ∈ R/Z is some number depending on d (and also on p, q, n) and k := q/ gcd(n, q). If we further assume gcd(n, q) = 1, then

20)
where $\overline{x}$ denotes the multiplicative inverse of $x$ modulo $q$ and $a \in \mathbb{Z}$ is as in Lemma 3.6.
Proof Relation (3.20) follows immediately from (3.15) by taking (m, k) = ( p, q) and noting that which follows from the fact that gcd(bq, n) = 1 (since gcd(bq, n) = gcd(1−an, n) = 1). Here (q j) * denotes the multiplicative inverse of q j modulo n and b ∈ Z is as in Lemma 3.6. For (3.19), we set m = p, l = gcd(n, q) (so that k = q/l). As mentioned above, the condition gcd(n 2 , q) | n implies that gcd(k, n) = 1. Thus the pair ( m kl , n) satisfies the assumptions in Lemma 3.6 and we can apply (3.14) for the points Moreover, we note that since gcd(k, n) = 1, we have On the other hand, by (3.14) we have where for any integer x, x denotes the multiplicative inverse of x modulo k, x * denotes the multiplicative inverse of x modulo n/d, and a d , b d ∈ Z are some fixed integers such that a d We can thus conclude the proof by noting that the above relation follows immediately from (3.22) together with the fact gcd( Using these two relations and the estimate (3.13) one gets the following effective estimates. where for the second estimate we applied (3.13) and for the third estimate we used the trivial estimate ϕ(n/d) < n/d. Now plugging y d = d 2 /(k 2 n 2 y) into the above equation we get where the dependence on k in the first estimate is absorbed into the dependence on q (since k := q/ gcd(n, q) ≤ q). The second estimate follows from similar (but easier) analysis with the relation (3.20) in place of (3.19).
We are now in the position to prove Theorem 1.2. We will prove the following theorem from which Theorem 1.2 follows, see also Remark 3.23.
Theorem 3.9 Let $x = p/q$ be a primitive rational number and let $n \in N_q$. Let $y_n = c/n^{\alpha}$ for some $1 < \alpha < 2$ and $c > 0$. Then for any $\Psi \in C_c^\infty(M)$ we have $|\delta_{n,x,y_n}(\Psi) - \mu_M(\Psi)| \ll_{\Psi,q,c,\epsilon} n^{\alpha/2 - 1 + \epsilon} + n^{2\theta + 4\epsilon - \alpha(1/2 + \theta + \epsilon)}$.
If we further assume gcd(n, q) = 1, then we have

Remark 3.23
The dependence on in the first estimate can also be made explicit. In fact, we can remove this dependence by adding a factor of S 2,2 ( ) + ∞ to the right hand side of this estimate. We also note that since we may take θ = 7/64, the right hand side of these two estimates decays to zero as n → ∞ for any 1 < α < 2. The second estimate follows immediately from (3.5) and the trivial estimate |q| ≥ 1. For the first estimate we separate the sum into two parts to get 1 n d|n ϕ n d a 0, d 2 Applying finishing the proof, where for the first estimate we used the identity that d|n ϕ(n/d) = n and the estimate that ϕ (n/d) < n/d, and for the second estimate we used the estimates

Quantitative non-equidistribution for rational translates
As a direct consequence of the analysis in the previous subsection we also have the following quantitative non-equidistribution result for rational translates when $\{y_n\}_{n \in \mathbb{N}}$ is beyond the above range, generalizing the situation for $\{R^{\mathrm{pr}}_n(0, y_n)\}_{n \in \mathbb{N}}$. As before, for any $Y > 0$ we denote by $\mu_Y$ the uniform probability measure supported on $H_Y$.
Theorem 3.10 Let $x = p/q$ be a primitive rational number and let $y_n = c/n^2$ for some constant $c > 0$. Let $\Psi \in C_c^\infty(M)$. Then for any $n \in N_q$ we have with $k_n = q/\gcd(n^2, q)$. If we further assume that $\gcd(n, q) = 1$, then δ pr n,x,y n ( ) = μ 1
Proof These two effective estimates follow immediately from Proposition 3.8 by plugging in $y_n = c/n^2$ and noting that $a_\Psi(0, Y) = \int_0^1 \Psi(x + iY)\,dx = \mu_Y(\Psi)$.
We can now give the proof of Theorem 1.3. For part (1), in view of Theorem 3.10 only the second equation needs a proof. Since we are taking $n \in P_m$ going to infinity, it is sufficient to consider $n = m\ell \in P_m$ with the prime number $\ell > q$ (so that $\ell \nmid q$). For such $n$, we have $\gcd(n^2, q) = \gcd(m^2\ell^2, q) = \gcd(m^2, q)$. Since by assumption $\gcd(m^2, q) \mid m$ and $m \mid n$, we can apply the first effective estimate in Theorem 3.10 for such $n = m\ell \in P_m$. Moreover, for any such $n$ we have that $k_n = q/\gcd(n^2, q) = q/\gcd(m^2, q) = q/\gcd(m, q)$ is a fixed number only depending on $m$ and $q$. Here for the last equality we used the assumption that $\gcd(m^2, q) \mid m$. Now let $n = m\ell \in P_m$ with $\ell > q$ sufficiently large such that $\mu_Y(\Psi) = 0$ whenever $Y > \ell^2/(ck_n)^2$ (this can be guaranteed since $k_n$ is a fixed number and $\Psi$ is compactly supported). In particular, for any $d \mid n$, $\mu_{d^2/(ck_n^2)}(\Psi) = 0$ whenever $\ell \mid d$. This, together with the first estimate in Theorem 3.10, implies that for all such sufficiently large $n = m\ell \in P_m$ where for the second estimate we used that $\gcd(m, \ell) = 1$ and $\ell$ is a prime number.
We can now finish the proof by taking $n = m\ell \to \infty$ along the subsequence $P_m$ (equivalently, taking $\ell \to \infty$) and plugging in the relation $k_n = q/\gcd(m, q)$. For part (2), since $R^{\mathrm{pr}}_n(x, y_n) \subset R_n(x, y_n)$, we only need to prove the full escape to the cusp for the sequence $\{R_n(x, y_n)\}_{n \in \mathbb{N}}$. Identify (up to a null set) $M$ with the standard fundamental domain $F := \{z \in \mathbb{H} : |\mathrm{Re}(z)| < \frac{1}{2},\ |z| > 1\}$. For any $n \in \mathbb{N}$ and $0 \le j \le n - 1$ let $\frac{p_j}{q_j}$ be the reduced form of $x + \frac{j}{n} = \frac{p}{q} + \frac{j}{n} = \frac{pn + qj}{qn}$ so that by (1.6) Thus using the trivial inequality $|q_j| \le |q| n$ for all $0 \le j \le n - 1$ and the assumption lim

Negative results: in connection with Diophantine approximations
Let $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ and let $M = \Gamma\backslash\mathbb{H}$ be the modular surface. Let $\mu_M$ be the normalized hyperbolic area on $M$ as before. In this section we prove a general result which captures the cusp excursion rate for the sample points $R_n(x, y_n)$ in terms of the Diophantine properties of the translate $x \in \mathbb{R}/\mathbb{Z} \cong [0, 1)$, see Theorem 4.3. Theorem 1.4 will then be an easy consequence of this result.

Notation and a preliminary result on cusp excursions
In this subsection we prove a preliminary lemma relating cusp excursions on the modular surface to Diophantine approximations. Let us first fix some notation. Finally, we record a distance formula that we will later use. Let $d_M(\cdot, \cdot)$ be the distance function on $M$ induced from the hyperbolic distance function $d_{\mathbb{H}}$ on $\mathbb{H}$, i.e., (4.1) The estimate (4.1) holds for a general non-compact finite-volume hyperbolic manifold using reduction theory after Garland and Raghunathan [11, Theorem 0.6] combined with a distance estimate by Borel [2, Theorem C]. We give here a self-contained elementary proof for the special case of the modular surface.
Proof of Lemma 4.1. In view of the triangle inequality, we may assume z 0 = i.
The following simple lemma is the key observation relating cusp excursions with Diophantine approximation. Then for any $0 \le j \le n - 1$ we have In particular, we have Proof The "in particular" part follows immediately from the inclusion $C_{Y_j, 2Y_j} \subset C_Y$, which in turn follows from the trivial bound $Y_j \ge Y$. Hence it suffices to prove the first half of the lemma. For simplicity of notation, we set $r = 1/(2Yn^2)$. Then by assumption $|x - \frac{m}{n}| < r$. Fix $0 \le j \le n - 1$, and let $\frac{p}{q}$ be the reduced form of $\frac{m + j}{n}$ (so that $q = \frac{n}{\gcd(n, m + j)}$). Then $x + \frac{j}{n} + ir \in H^{\circ}_{p/q, r}$ and $x + \frac{j}{n} + ir' \in H_{p/q, r}$ for some $r < r' < 2r$. Take $\gamma \in \Gamma$ sending $H^{\circ}_{p/q, r}$ to the region $\{z \in \mathbb{H} : \mathrm{Im}(z) > 1/(2rq^2) = Y_j\}$.
Then we have $\mathrm{Im}\,\gamma(x + \frac{j}{n} + ir) > Y_j$ and $\mathrm{Im}\,\gamma(x + \frac{j}{n} + ir') = Y_j$. Since $r < r' < 2r$ we can bound the hyperbolic distance implying that which implies (4.2).
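The trivial bound $Y_j \ge Y$ used at the start of the proof can be made explicit. The following short computation is a sketch, assuming (as the proof indicates) that $Y_j = 1/(2rq^2)$ with $r = 1/(2Yn^2)$ and $q = n/\gcd(n, m + j)$:

```latex
\[
  Y_j \;=\; \frac{1}{2 r q^{2}}
      \;=\; \frac{2 Y n^{2}}{2 q^{2}}
      \;=\; Y \cdot \frac{n^{2}}{q^{2}}
      \;=\; Y \gcd(n, m+j)^{2}
      \;\ge\; Y .
\]
```

In particular, equality holds exactly for those $j$ with $\gcd(n, m + j) = 1$, and the points with a large common divisor $\gcd(n, m + j)$ are the ones pushed highest into the cusp.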

Full escape to the cusp along subsequences for almost every translate
In this subsection we prove Theorem 4.3. Before stating this theorem, we first recall a definition from Diophantine approximation. Let $\psi : \mathbb{N} \to (0, 1/2)$ be a non-increasing function. We say that $x \in \mathbb{R}$ is primitive $\psi$-approximable if there exist infinitely many $n \in \mathbb{N}$ such that the inequality $|x - \frac{m}{n}| < \frac{\psi(n)}{n}$ is satisfied by some $m \in \mathbb{Z}$ coprime to $n$. Since we assume $\psi(\mathbb{N}) \subset (0, 1/2)$, the existence of such an $m$ implies its uniqueness. We prove the following: If $x \in [0, 1)$ is primitive $\psi$-approximable, then $R_n(x, y_n) \subset C_{r_n}$ infinitely often.

Remark 4.6
Since R pr n (x, y) ⊂ R n (x, y) for any n ∈ N, x ∈ R and y > 0, Theorem 4.3 also holds for translates of the primitive rational points.

Proof of Theorem 4.3 Let $x \in [0, 1)$ be primitive $\psi$-approximable. Then for $Y_n = 1/(2n\psi(n))$, we have by (4.3) that for infinitely many $n$'s. For every $n \in \mathbb{N}$, set $d_n := Y_n/r_n = \max\{\psi(n)/(ny_n),\ ny_n/\psi(n)\}$. Then for any $t \in \mathbb{R}$. As in the proof of Lemma 4.2, by (4.7) and (4.8) we have $R_n(x, y_n) \subset C_{Y_n/d_n}$ for any $n$ in (4.7).

We now give a short proof of Theorem 1.4.
Proof of Theorem 1.4 Let $\alpha = \min\{\beta, 2 - \beta\}$. For each $n \ge 2$, let $\psi(n) = 1/(n \log n)$ and let $\{y_n\}_{n \in \mathbb{N}}$ be a sequence of positive numbers satisfying $y_n \asymp 1/(n^2 \log^{\beta} n)$. Then $r_n$ as in (4.5) is given by $r_n = \frac{1}{2}\min\{\psi(n)^{-2} y_n,\ n^{-2} y_n^{-1}\} \asymp \log^{\alpha} n$. By Theorem 4.3, for any $x \in [0, 1)$ primitive $\psi$-approximable, we have that $R_n(x, y_n) \subset C_{r_n}$ infinitely often. Hence by (4.1), for each such $x \in \mathbb{R}/\mathbb{Z}$, we have infinitely often, implying the inequality (1.10). Finally, since $\sum_{n \in \mathbb{N}} \psi(n) = \infty$ and $\psi$ is decreasing, the set of primitive $\psi$-approximable numbers in $[0, 1)$ is of full measure by Khintchine's approximation theorem.
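For completeness, the divergence of $\sum_n \psi(n)$ for $\psi(n) = 1/(n \log n)$, which is the hypothesis needed for Khintchine's theorem above, follows from the standard integral comparison; a minimal version of the computation reads:

```latex
\[
  \sum_{n=2}^{N} \frac{1}{n \log n}
  \;\ge\; \int_{2}^{N+1} \frac{dx}{x \log x}
  \;=\; \log\log(N+1) - \log\log 2
  \;\xrightarrow[N \to \infty]{}\; \infty ,
\]
```

where the inequality uses that $x \mapsto 1/(x \log x)$ is decreasing on $[2, \infty)$. The divergence is only doubly logarithmic, which reflects that this $\psi$ sits right at the borderline of the divergence condition.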
For every irrational x ∈ R, the Diophantine exponent κ x > 0 is the supremum of κ > 0 for which x is primitive n −κ -approximable. Dirichlet's approximation theorem implies that κ x ≥ 1 for any irrational x and by Khintchine's theorem, κ x = 1 for almost every x ∈ R. When κ x > 1, we have the following result that yields much faster cusp excursion rates for our sample points while handling sequences {y n } n∈N decaying polynomially faster than 1/n 2 .

A non-equidistribution result for all translates
In this subsection we prove the following result which, together with part (1) of Theorem 1.3, implies non-equidistribution for all translates: Proof Let $x \in [0, 1)$ be primitive $\psi_c$-approximable, that is, there exist infinitely many $n \in \mathbb{N}$ satisfying $|x - m/n| < c/n^2 = y_n$ with some uniquely determined $m \in \mathbb{Z}$ satisfying $\gcd(m, n) = 1$. For each such $n$, and for any $0 \le j \le n - 1$, let $k = \gcd(n, m + j)^2$. Then by (4.2),
Since 0 < c < 3/2 we have max{2c, 4/c} < 9/(2c) implying that U is nonempty. We will show that E c is disjoint from U. Let for j = 2, 3. Moreover, since the interval (max {2c, 4/c} , 9/2c) intersects I 2 and I 3 trivially, we have E j c ∩ U = ∅ for j = 2, 3. It thus remains to show that E 1 c ∩ U = ∅. For this we note that z ∈ F satisfies the property that Hence to show E 1 c ∩ U = ∅, it suffices to show that max γ ∈ Im(γ z) ≤ max {2c, 4/c} for any z = s + it ∈ H with Im(z) = t ∈ I 1 = [1/(2c), 1/c]. For this, using the same discussion as in the proof of Lemma 4.1 we have for any Finally, using the above description of U and (2.1) we have by direct computation max{2c,4/c} − 2c 9 < 1 (again since 0 < c < 3/2).
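The elementary inequality $\max\{2c, 4/c\} < 9/(2c)$ invoked at the beginning of this argument can be verified term by term; the following is a brief worked check for $0 < c < 3/2$:

```latex
\[
  2c < \frac{9}{2c}
  \iff 4c^{2} < 9
  \iff c < \frac{3}{2},
  \qquad\text{and}\qquad
  \frac{4}{c} = \frac{8}{2c} < \frac{9}{2c}
  \quad\text{for every } c > 0 .
\]
```

Thus the restriction $c < 3/2$ is needed only for the first term of the maximum; the second comparison holds for all positive $c$.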

Remark 4.9
The condition on the sequence $\{y_n\}_{n \in \mathbb{N}}$ in Theorem 4.5 is quite restrictive and the proof of Theorem 4.5 is much more involved than that of Theorem 4.3. We note that this is because we need to take care of the badly approximable numbers, that is, the set of irrational numbers that are not primitive $\psi_c$-approximable for some $c > 0$. If $x \in [0, 1)$ is not badly approximable, then a similar argument as in the proof of Theorem 4.3 using only the crude estimate (4.3) would already be sufficient to prove non-equidistribution of the sample points $R_n(x, y_n)$ for any sequence $\{y_n\}_{n \in \mathbb{N}}$ satisfying $y_n \asymp 1/n^2$.

Second moments of the discrepancy
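Let $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ and let $M = \Gamma\backslash\mathbb{H}$ be the modular surface as before. In this section we prove Theorem 1.6. Our proof relies on a second moment computation of the discrepancies $|\delta_{n,x,y}(\Psi) - \mu_M(\Psi)|$ and $|\delta^{\mathrm{pr}}_{n,x,y}(\Psi) - \mu_M(\Psi)|$ respectively. Since we assume $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ we will also use the notation $\mu_\Gamma$ for $\mu_M$.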

Relation to Hecke operators
In this subsection we prove two preliminary estimates relating these second moments to the Hecke operators defined in Sect. 2.3. where $\Psi_0 = \Psi - \mu_\Gamma(\Psi)$, $T_{u_{j/n}}$ is the Hecke operator associated to $u_{j/n} \in \mathrm{SL}_2(\mathbb{Q})$ defined as in (2.9), the Sobolev norm $S(\Psi)$ is defined by and the implied constants are absolute.
Proof Without loss of generality we may assume that $\Psi$ is real-valued. Expanding the square in the left hand side of (5.1), doing a change of variables, and using the left $u_1$-invariance of $\Psi$, we have that $D_{n,y}(\Psi)$ equals Applying (2.14) to the term $\int_0^1 \Psi(x + iy)\,dx$ and using the trivial estimate For each $0 \le j \le n - 1$, it is easy to check that $u_1 \in \Gamma_j$ and that $\Gamma_j$ contains the principal congruence subgroup $\Gamma(n^2)$; hence $\Gamma_j$ satisfies the assumptions in Proposition 2.3. Then by (2.14), Next we note that by (2.3),

Using the fact that $\Psi$ is left $\Gamma$-invariant and
where for the second equality we used (2.2). Hence we have Thus applying (2.14) to $F_j \in C^\infty(\Gamma_j\backslash\mathbb{H})$ and using (5.6) we get Plugging (5.7) into (5.5) and using the identities $\mu_\Gamma(\Psi) = \mu_{\Gamma_j}(\Psi) = \mu_{\Gamma_j}(L_{u^{-1}_{j/n}}\Psi)$ (the second equality follows from the left $G$-invariance of the hyperbolic area $\mu_{\Gamma_j}$) we get that Let $F \subset \mathbb{H}$ be a fundamental domain for $\Gamma\backslash\mathbb{H}$. The disjoint union $\bigsqcup_{\gamma \in \Gamma_j\backslash\Gamma} \gamma F$ forms a fundamental domain for $\Gamma_j\backslash\mathbb{H}$. Thus we can conclude the proof of (5.1) by noting that where for the second equation we did a change of variable $z \mapsto \gamma z$, used the left $\Gamma$-invariance of $\Psi$ and the relation $[\Gamma : \Gamma_j]\mu_{\Gamma_j} = \mu_\Gamma$, and for the last equality we used the expression (2.10). Similarly, applying the estimates (2.14) and (5.4) and making a change of variables we see that $D^{\mathrm{pr}}_{n,y}(\Psi)$ equals Finally we can finish the proof by noting that for each 0

Second moment estimates
Combining Proposition 5.1 and the operator norm bound in Proposition 2.2 we have the following second moment estimates: where $\theta = 7/64$ is the best known bound towards the Ramanujan conjecture as before and the Sobolev norm $S(\Psi)$ is as defined in (5.3).

Remark 5.9
It is also possible to approach the second moment computation using the spectral bounds on the Fourier coefficients of $\Psi$ from Sect. 3.1 rather than Hecke operators. The spectral approach however yields a weaker estimate when $y > 0$ is small. For comparison, following the spectral approach, one obtains $\int_0^1 |\delta_{n,x,y}(\Psi) - \mu_\Gamma(\Psi)|^2\,dx \ll_{\epsilon} n^{-1} y^{-2(\theta + \epsilon)} + y^{1/2} S_{2,2}(\Psi)$.
Proof of Theorem 5.2 First we prove (5.8). For each $0 \le j \le n - 1$, it is clear that $u_{j/n}$ is of degree $n_j := n/\gcd(n, j)$, and thus $T_{u_{j/n}} = T_{n_j}$. Applying For any $d \mid n$, #{0 ≤ j ≤ n − 1 : where for the first inequality we used the trivial bound $\varphi(d) < d$. Finally, we observe that $\|\Psi_0\|_2 \le \|\Psi\|_2$.
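The bound $\varphi(d) < d$ above enters through a counting step: the degrees $n_j = n/\gcd(n, j)$ run over the divisors $d$ of $n$, and (as a standard fact which we assume is what the proof uses) each divisor $d$ occurs for exactly $\varphi(d)$ values of $0 \le j \le n - 1$. The sketch below checks this count by brute force; the function names are illustrative only.

```python
from collections import Counter
from math import gcd

def euler_phi(d: int) -> int:
    """Euler's totient, computed naively from the definition."""
    return sum(1 for a in range(1, d + 1) if gcd(a, d) == 1)

def degree_counts(n: int) -> Counter:
    """Multiplicity of each degree n_j = n / gcd(n, j) over 0 <= j <= n - 1."""
    return Counter(n // gcd(n, j) for j in range(n))

def counts_match_phi(n: int) -> bool:
    """Check that degree d occurs exactly phi(d) times, for every divisor d of n."""
    counts = degree_counts(n)
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return set(counts) == set(divisors) and all(
        counts[d] == euler_phi(d) for d in divisors
    )

if __name__ == "__main__":
    print(dict(degree_counts(12)))
    print(all(counts_match_phi(n) for n in range(1, 200)))
```

Summing the multiplicities over all $d \mid n$ recovers the identity $\sum_{d \mid n} \varphi(d) = n$, since every $j$ is counted exactly once.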
We now give a quick Proof of Theorem 1. 6 Let α > 0 be the fixed number as in this theorem. Let β := min{ α 2 , 1 − 2θ }. Fix 0 < c < β and let N ⊂ N be an unbounded subsequence such that n∈N n −c < ∞. We want to show that for any {y n } n∈N satisfying y n n −α there exists a full measure subset I ⊂ R/Z such that for any x ∈ I , δ n,x,y n ( ) → μ M ( ) and δ pr n,x,y n ( ) → μ M ( ) for any ∈ C ∞ c (M) as n ∈ N goes to infinity. Since the function space C ∞ c (M) has a dense countable subset, it suffices to prove the above assertion for a fixed . Now we fix ∈ C ∞ c (M) and take > 0 sufficiently small such that β − 2 > c. For any n ∈ N define I n = I 1 n ∪ I 2 n ⊂ R/Z such that Thus by the second moment estimate (5.8), the assumption that y n n −α and Chebyshev's inequality we get |I n | ≤ I 1 n + I 2 n ≤ 2n max D n,y ( ), D pr n,y ( ) , n −β+2 < n −c , implying that n∈N |I n | < ∞. Hence taking I ⊂ R/Z to be the complement of this limsup set lim n∈N n→∞ I n ⊂ R/Z and by the Borel-Cantelli lemma we have I is of full measure. Moreover, for any x ∈ I , x ∈ I c n for all n ∈ N sufficiently large, that is, In particular for such x, δ n,x,y n ( ) → μ M ( ) and δ pr n,x,y n ( ) → μ M ( ) as n ∈ N goes to infinity.

Remark 5.10
The second moment D n,y ( ) is closely related to the sample points (1.2) considered in [12]: Using the extra invariance δ n,x+1/n,y ( ) = δ n,x,y ( ) and applying a change of variable, one can easily check that Thus let N ⊂ N be the fixed sequence as in the above proof, by Theorem 5.2 and the same Borel-Cantelli type argument we have that for almost every x ∈ R/Z the sequence of sample points { ( x+ j n + iy n : 0 ≤ j ≤ n − 1} equidistributes on M with respect to μ M as n ∈ N goes to infinity, as long as {y n } n∈N decays at least polynomially.

Left regular action of normalizing elements
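In this section, $\Gamma$ denotes a congruence subgroup, and we set $\Gamma_1 = \mathrm{SL}_2(\mathbb{Z})$. We moreover assume that there exists some $h \in \mathrm{SL}_2(\mathbb{Q})$ normalizing $\Gamma$, that is, $h^{-1}\Gamma h = \Gamma$. It induces the left regular $h$-action on $\Gamma\backslash\mathbb{H}$ given by $\Gamma z \in \Gamma\backslash\mathbb{H} \mapsto \Gamma hz \in \Gamma\backslash\mathbb{H}$. Since $h$ normalizes $\Gamma$, this map is well defined: suppose $\Gamma z = \Gamma z'$, that is, there exists some $\gamma \in \Gamma$ such that $z' = \gamma z$. Then $\Gamma hz' = \Gamma h\gamma z = \Gamma (h\gamma h^{-1}) hz = \Gamma hz$. The goal of this section is to describe this action on cylindrical cuspidal neighborhoods of $\Gamma\backslash\mathbb{H}$.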

Cusp neighborhoods of congruence surfaces
Since $\Gamma$ is a congruence subgroup, the set of cusps of $\Gamma$ can be parameterized by the coset space $\Gamma\backslash(\mathbb{Q} \cup \{\infty\})$ (see e.g. [21, p. 222]), where the action of $\Gamma$ on $\mathbb{Q} \cup \{\infty\}$ is defined via Möbius transformations. We fix a complete list of coset representatives for $\Gamma\backslash(\mathbb{Q} \cup \{\infty\})$. For each cusp representative $c$ in this list, its stabilizer subgroup is given by $\Gamma_c = \tau_c N \tau_c^{-1} \cap \Gamma$, where $\tau_c \in \Gamma_1$ is such that $\tau_c \infty = c$. (More precisely, $\Gamma_c$ is an index two subgroup of the stabilizer subgroup of $c$ if $-I_2 \in \Gamma$.) The existence of such $\tau_c$ is guaranteed by the transitivity of the action of $\Gamma_1$ on $\mathbb{Q} \cup \{\infty\}$. On the other hand, $\tau_c$ is only unique up to right multiplication by an element of $\pm N$. We note that $\Gamma_c$ is independent of the choice of $\tau_c$, and since $c$ is a cusp, $\Gamma_c$ is nontrivial. Moreover, $\tau_c^{-1}\Gamma_c\tau_c$ is a subgroup of $N \cap \Gamma_1 = \langle u_1 \rangle$. Hence $\tau_c^{-1}\Gamma_c\tau_c$ is a cyclic group generated by a unipotent matrix $u_{\omega_c}$ for some positive integer $\omega_c$, which is called the width of the cusp $c$.
We can now define cusp neighborhoods on the hyperbolic surface $\Gamma\backslash\mathbb{H}$ around a cusp $c$. For any $Y > 0$, let $C^{\Gamma,c}_Y \subset \Gamma\backslash\mathbb{H}$ denote the projection of the horodisc We record the following two lemmas for the later purpose of computing the measure of certain unions of cusp neighborhoods.
Proof The one-to-one correspondence is given by the projection of the above rectangular set onto $\Gamma\backslash\mathbb{H}$. Indeed, since $\Gamma_c \subset \Gamma$, this map projects the rectangular set in (6.1) onto $C^{\Gamma,c}_{Y,Y'}$. To show that it is also injective, suppose $\Gamma\tau_c z = \Gamma\tau_c z'$ for some $z, z'$ from this rectangular set. Then there exists some $\gamma \in \Gamma$ such that $\tau_c^{-1}\gamma\tau_c z = z'$. If $\gamma \in \pm\Gamma_c$ then $\tau_c^{-1}\gamma\tau_c \in \pm\langle u_{\omega_c} \rangle$, and this implies that $z = z'$. Otherwise, let $\tau_c^{-1}\gamma\tau_c = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma_1$. Since $\gamma \notin \pm\Gamma_c$, $c \ne 0$. We easily see this cannot happen since it would imply contradicting that $\mathrm{Im}(z') > Y > 1$. For the area computation, we use the definition (2.1) of $\mu_{\Gamma_1}$ together with $\mu_{\Gamma_1} = [\Gamma_1 : \Gamma]\mu_\Gamma$ (since $-I_2 \in \Gamma$).
Lemma 6.2 Given two distinct cusps c 1 , c 2 ∈ , and any Y 1 , Proof Since Y 1 , Y 2 ≥ 1, the sets {τ c 1 z ∈ H : Im(z) > Y 1 } and {τ c 2 z ∈ H : Im(z) > Y 2 } are subsets of the interior of the Ford circles based at c 1 and c 2 respectively. Two Ford circles are either disjoint or identical. Suppose z ∈ C ,c 1 Y 1 ∩ C ,c 2 Y 2 . Then there exists an isometry γ ∈ that maps the Ford circle at c 1 to the Ford circle at c 2 . Consequently, we must have γ c 1 = c 2 , which is a contradiction. Remark 6. 3 We will later consider sets I y,Y ,c := x ∈ (0, 1) : (x + iy) ∈ C ,c Y for some y > 0, Y > 1 and c ∈ . This set is the intersection of the line segment {x + iy ∈ H : 0 < x < 1} with the preimage of C ,c Y in H (under the natural projection from H to \H). By definition the preimage of C ,c Y is the disjoint (since Y > 1) union of the infinitely many horodiscs {τ c z ∈ H : Im(z) > Y } = H • p/q,1/(2q 2 Y ) for all cusps c = p/q ∈ c. Moreover, note that a necessary condition for such a horodisc intersecting the line segment {x + iy ∈ H : and We show that τ −1 hc hτ c is an upper triangular matrix. Indeed, τ −1 hc hτ c ∞ = τ −1 hc (hc) = ∞. This proves (6.4). We moreover conclude that for some λ = 0, and it remains to show that λ 2 = ω hc /ω c . For this we conjugate the subgroup τ −1 hc hc τ h·c by the matrix τ −1 hc hτ c . We obtain with (6.4) that On the other hand, using (6.6) and τ −1 hc hc τ hc = u ω hc , we have Comparing both equations we conclude that λ 2 = ω hc /ω c . Finally replacing τ hc with −τ hc if necessary, we can ensure λ is positive.
Proof The second statement follows from the first one by taking Y → ∞. Since

Negative results: horocycles expanding arbitrarily fast
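In this section, using the results from the previous section, we prove Theorems 1.7 and 1.8, which provide new limiting measures for the sequences $\{\delta_{n,x,y_n}\}_{n \in \mathbb{N}}$ and $\{\delta^{\mathrm{pr}}_{n,x,y_n}\}_{n \in \mathbb{N}}$, allowing $\{y_n\}_{n \in \mathbb{N}}$ to decay arbitrarily fast. For any $n \in \mathbb{N}$ we consider the congruence subgroup $\Gamma_n < \mathrm{SL}_2(\mathbb{Z})$ given by $\Gamma_n := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}) : n^2 \mid c,\ a \equiv d \equiv \pm 1 \pmod{n} \right\}$. It is clear that $\Gamma_1 = \mathrm{SL}_2(\mathbb{Z})$ and that $\Gamma_n$ contains the congruence subgroup $\Gamma_1(n^2) := \left\{ \gamma \in \mathrm{SL}_2(\mathbb{Z}) : \gamma \equiv \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix} \pmod{n^2} \right\}$.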

Basic properties of the congruence subgroups 0 n
First we show that $\Gamma_n$ is normalized by $u_{j/n}$ for any $j \in \mathbb{Z}$. As mentioned in the introduction this simple fact is the starting point of our proofs of Theorems 1.7 and 1.8. Hence if $\gamma \in \Gamma_n$, that is, $n^2 \mid c$ and $a \equiv d \equiv \pm 1 \pmod{n}$, all the entries are integers with the bottom left entry divisible by $n^2$, and This implies that $u^{-1}_{j/n}\Gamma_n u_{j/n} \subset \Gamma_n$.
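The normalization property can be tested numerically with exact rational arithmetic. The sketch below assumes the description of $\Gamma_n$ used in the argument above ($n^2 \mid c$ and $a \equiv d \equiv \pm 1 \pmod n$, determinant one) and conjugates a handful of sample elements by $u_{j/n} = \begin{pmatrix} 1 & j/n \\ 0 & 1 \end{pmatrix}$; the sample elements and helper names are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2))
        for i in range(2)
    )

def in_gamma_n(g, n: int) -> bool:
    """Membership test: integral, det 1, n^2 | c, and a = d = +-1 (mod n)."""
    entries = [Fraction(e) for row in g for e in row]
    if any(e.denominator != 1 for e in entries):
        return False
    a, b, c, d = (int(e) for e in entries)
    if a * d - b * c != 1:
        return False
    return c % (n * n) == 0 and a % n == d % n and a % n in (1 % n, (-1) % n)

def conjugate_by_u(g, j: int, n: int):
    """Return u_{j/n}^{-1} g u_{j/n}, computed with exact fractions."""
    t = Fraction(j, n)
    return mat_mul(mat_mul(((1, -t), (0, 1)), g), ((1, t), (0, 1)))

def sample_elements(n: int):
    """A few elements of Gamma_n: u_1, the matrices +-[[1+kn, 1], [-k^2 n^2, 1-kn]], and products."""
    gens = [((1, 1), (0, 1))]
    for k in range(n):
        for s in (1, -1):
            gens.append(((s * (1 + k * n), s), (-s * k * k * n * n, s * (1 - k * n))))
    return gens + [mat_mul(A, B) for A, B in product(gens, repeat=2)]

def normalized_on_samples(n: int) -> bool:
    """Check that every sample stays in Gamma_n after conjugation by each u_{j/n}."""
    samples = sample_elements(n)
    return all(in_gamma_n(g, n) for g in samples) and all(
        in_gamma_n(conjugate_by_u(g, j, n), n) for g in samples for j in range(n)
    )

if __name__ == "__main__":
    print(all(normalized_on_samples(n) for n in (2, 3, 4, 5, 6)))
```

By contrast, conjugating a general element of $\mathrm{SL}_2(\mathbb{Z})$, such as $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, by $u_{1/n}$ typically produces non-integral entries, which is why the divisibility $n^2 \mid c$ and the congruence $a \equiv d \pmod n$ are both needed.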
Next we prove the following index formula for n .

Lemma 7.2
For any integer $n \ge 3$, we have Proof Let $J_n < (\mathbb{Z}/n^2\mathbb{Z})^{\times}$ be the subgroup It is easy to check that $\#(J_n) = 2n$. Consider the map $h : \Gamma_n \to J_n$ sending $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma_n$ to $[a] \in (\mathbb{Z}/n^2\mathbb{Z})^{\times}$. Using the definition of $\Gamma_n$, one can check that $h$ is a group homomorphism with kernel $\ker(h) = \Gamma_1(n^2)$. For each $0 \le k \le n - 1$, set $\gamma_k^{\pm} = \pm\begin{pmatrix} 1 + kn & 1 \\ -k^2n^2 & 1 - kn \end{pmatrix} \in \Gamma_n$. Then $h$ maps the set $\{\gamma_k^{\pm} \in \Gamma_n : 0 \le k \le n - 1\}$ onto $J_n$. Finally we use the index formula for $\Gamma_1(n^2)$ (see e.g. [6, Section 1.2]) to get Next, we study the properties of $\Gamma_n$ relative to its cusps. As in Sect. 6 we use the corresponding notation for the set of cusps of $\Gamma_n$. The following lemma computes the width of each cusp of $\Gamma_n$. Proof Let $\tau_c \in \Gamma_1$ be as before such that $\tau_c\infty = c$. Thus the left column of $\tau_c$ is $\begin{pmatrix} m \\ l \end{pmatrix}$. By direct computation we have Thus by (7.1) an element in $(\Gamma_n)_c = \tau_c N \tau_c^{-1} \cap \Gamma_n$ is of the form $\gamma = \begin{pmatrix} 1 - mlt & m^2t \\ -l^2t & 1 + mlt \end{pmatrix} \in \Gamma_1$ satisfying that $n^2 \mid l^2t$ and $1 - mlt \equiv 1 + mlt \equiv \pm 1 \pmod{n}$. Looking at the top right and bottom left entries of $\gamma$, we have that $m^2t, l^2t \in \mathbb{Z}$. Since $\gcd(m, l) = 1$, we have $t \in \mathbb{Z}$. Then the condition $n^2 \mid l^2t$ is equivalent to $\frac{n^2}{\gcd(n, l)^2} \mid t$, and the condition $n \mid mlt$ is equivalent to $\frac{n}{\gcd(n, ml)} \mid t$. Moreover, since $\frac{n}{\gcd(n, ml)} \mid \frac{n^2}{\gcd(n, l)^2}$, the condition $\frac{n}{\gcd(n, ml)} \mid t$ is implied by the condition $\frac{n^2}{\gcd(n, l)^2} \mid t$. We conclude that $n^2 \mid l^2t$ implies $1 - mlt \equiv 1 + mlt \equiv \pm 1 \pmod{n}$. Thus Conjugating $(\Gamma_n)_c$ back via $\tau_c$ and using the equivalence of the two conditions $n^2 \mid l^2t$ and $\frac{n^2}{\gcd(n, l)^2} \mid t$ we get implying that $\omega_c = n^2/\gcd(n, l)^2$.
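Two of the "easy to check" steps in the proof of Lemma 7.2 lend themselves to a quick machine check. The sketch below assumes that $J_n$ is the set of residues $\pm(1 + kn)$ modulo $n^2$ for $0 \le k \le n - 1$ (this is our reading, based on the images of the matrices $\gamma_k^{\pm}$ under $h$), and verifies that it has $2n$ elements for $n \ge 3$, that it is closed under multiplication, and that each $\gamma_k^{\pm}$ has determinant one with lower-left entry divisible by $n^2$.

```python
def J(n: int) -> set:
    """Residues +-(1 + k*n) modulo n^2 for 0 <= k <= n - 1 (assumed form of J_n)."""
    return {(s * (1 + k * n)) % (n * n) for k in range(n) for s in (1, -1)}

def is_closed_under_multiplication(n: int) -> bool:
    """Check that J(n) is closed under multiplication modulo n^2."""
    S = J(n)
    return all((x * y) % (n * n) in S for x in S for y in S)

def gamma_k(k: int, n: int, sign: int):
    """The matrix sign * [[1 + k n, 1], [-k^2 n^2, 1 - k n]] as a nested tuple."""
    return (
        (sign * (1 + k * n), sign),
        (-sign * k * k * n * n, sign * (1 - k * n)),
    )

def det(g) -> int:
    """Determinant of a 2x2 integer matrix."""
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

def gamma_k_properties_hold(n: int) -> bool:
    """det = 1, n^2 divides the lower-left entry, and the upper-left entry lies in J(n)."""
    return all(
        det(g) == 1 and g[1][0] % (n * n) == 0 and g[0][0] % (n * n) in J(n)
        for k in range(n)
        for s in (1, -1)
        for g in [gamma_k(k, n, s)]
    )

if __name__ == "__main__":
    print(all(len(J(n)) == 2 * n for n in range(3, 40)))
    print(len(J(2)))  # the count degenerates for n = 2
```

The degenerate value for $n = 2$ comes from $1 + kn \equiv -(1 + k'n) \pmod{n^2}$ being solvable only when $n \mid 2$, which is consistent with the hypothesis $n \ge 3$ in the lemma.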
Next we compute the number of cusps of $\Gamma_n$.
To prove Proposition 7.4 we first prove a preliminary formula for $\#\Omega_n$.

Lemma 7.5 For any integer n ≥ 3 we have
2n .
Proof Since $-I_2 \in \Gamma_n$ and $\Gamma_1(n^2) < \Gamma_n$, we have $\Omega_n = \Gamma_n \backslash \Omega_{\Gamma_1(n^2)}$. On the other hand, by the analysis in [6, p. 102], the set $\Omega_{\Gamma_1(n^2)}$ is in bijection with the union of cosets $\bigcup_{d \mid n^2} \pm I_2 \backslash Z_d$, where for each $d \mid n^2$, For each $d \mid n^2$, using the definition of $\Gamma_n$, it is easy to check that the linear action of $\Gamma_n$ on $\mathbb{Z}^2$ (by matrix multiplication) induces a well-defined action of $\Gamma_n$ on $Z_d$ and that the corresponding action of the subgroup $\Gamma_1(n^2)$ is trivial. From the proof of Lemma 7.2, we have $\Gamma_n / \Gamma_1(n^2) \cong J_n$, where proving the claim, and hence also this lemma.
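The identification of the quotient with $J_n$ rests on two facts from the proof of Lemma 7.2: $\#(J_n) = 2n$, and the top-left entries of $\pm\gamma_k$, $0 \le k \le n-1$, exhaust $J_n$. A minimal sketch checking both is given below; it also evaluates the resulting index, assuming the standard formula $[\mathrm{SL}_2(\mathbb{Z}) : \Gamma_1(N)] = N^2\prod_{p\mid N}(1-p^{-2})$ and that $\Gamma_1$ denotes the full modular group. All function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def J(n):
    """Residues a mod n^2 that are units and satisfy a = +1 or -1 mod n."""
    N = n * n
    return {a for a in range(N)
            if gcd(a, N) == 1 and a % n in (1 % n, (-1) % n)}


def images_of_gamma_k(n):
    """Top-left entries of +gamma_k and -gamma_k, reduced mod n^2."""
    N = n * n
    return {(s * (1 + k * n)) % N for k in range(n) for s in (1, -1)}


def prime_divisors(n):
    """Distinct prime divisors of n by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def index_gamma_n(n):
    """[Gamma_1 : Gamma_1(n^2)] / #J_n, under the assumptions stated above."""
    idx = Fraction(n ** 4)
    for p in prime_divisors(n):
        idx *= 1 - Fraction(1, p * p)
    return idx / (2 * n)
```

For a prime $n = p$ the last function returns $p(p^2-1)/2$, for example `index_gamma_n(5)` evaluates to 60.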
We can now give the proof of Proposition 7.4 by simplifying the formula in Lemma 7.5.

Proof of Proposition 7.4
Write $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ in the prime decomposition form and apply Lemma 7.5 to get where the summation is over all vectors $\beta = (\beta_1, \ldots, \beta_k) \in \mathbb{Z}^k$ satisfying $0 \le \beta_i \le 2\alpha_i$ for all $1 \le i \le k$, and we used that gcd(n 2 /d, Using the fact that $\varphi$ is multiplicative and interchanging the summation and product signs we get where for the second equality we used that for $1 \le \beta_i \le 2\alpha_i - 1$, ϕ p zero.

Moreover, applying the volume formula (6.2), the index formula in Lemma 7.2 and the cusp number formula in Proposition 7.4 (see also Remark 7.4 for the case when $n = 2$) we obtain, for any $n \in \mathbb{N}$, the volume computation (7.6). For any $n \in \mathbb{N}$ and $0 < y < 1$ we define
$$I_n(y) := \{x \in \mathbb{R}/\mathbb{Z} : \Gamma_n(x + iy) \in \Xi_n\}, \qquad \Xi_n := \bigcup_{c \in \Omega_n} C^{n,c}_{\omega_c Y_n,\, 2\omega_c Y_n}.$$
By definition, $x \in I_n(y)$ if and only if $\Gamma_n(x + iy) \in C^{n,c}_{\omega_c Y_n,\, 2\omega_c Y_n} \subset C^{n,c}_{\omega_c Y_n}$ for some $c \in \Omega_n$. Thus Lemma 7.6 implies that
$$I_n(y) \subset \{x \in \mathbb{R}/\mathbb{Z} : R_n(x,y) \subset C_{Y_n}\}.$$
This, together with our choice that $Y_n = \max\{\log n, 1\}$ and the distance formula (4.1), implies that for any $n \ge 3$ and for any $x \in I_n(y)$

It thus suffices to show that there exists a sequence $\{y_n\}_{n \in \mathbb{N}}$ satisfying $0 < y_n < c_n$ for all $n \in \mathbb{N}$ and such that the limsup set $\limsup_{n \to \infty} I_n(y_n)$ is of full Lebesgue measure in $\mathbb{R}/\mathbb{Z}$. For this, we will construct a sequence $\{y_n\}_{n \in \mathbb{N}}$ decaying sufficiently fast and then apply the quantitative Borel-Cantelli lemma (Corollary 2.6) to the sequence $\{I_n(y_n)\}_{n \in \mathbb{N}}$ of subsets of $\mathbb{R}/\mathbb{Z}$. To ensure the quasi-independence condition (2.20) in Corollary 2.6, we need, for every pair $1 \le m < n$, the two quantities $|I_m(y_m) \cap I_n(y_n)|$ and $|I_m(y_m)|\,|I_n(y_n)|$ to be sufficiently close to each other. The key observations for this are the following two relations:
$$|I_n(y_n)| = \int_0^1 \chi_{\Xi_n}\big(\Gamma_n(x + iy_n)\big)\,dx \tag{7.7}$$
and
$$|I_m(y_m) \cap I_n(y_n)| = \int_{I_m(y_m)} \chi_{\Xi_n}\big(\Gamma_n(x + iy_n)\big)\,dx. \tag{7.8}$$
Assuming the limit equation (2.16) holds for the pairs $((0,1), \Xi_n)$ and $(I_m(y_m), \Xi_n)$ (we will verify this later), relation (7.8) shows that the quantity $|I_m(y_m) \cap I_n(y_n)|$ is close to $|I_m(y_m)|\,\mu_n(\Xi_n)$, which in turn is close to $|I_m(y_m)|\,|I_n(y_n)|$ by relation (7.7), provided that $y_n > 0$ is sufficiently small. We now implement the above ideas rigorously.
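We first claim that there exists a sequence $\{y_n\}_{n \in \mathbb{N}}$ satisfying, for all $n \in \mathbb{N}$, $0 < y_n < c_n$ and
$$\Big|\, |I \cap I_n(y_n)| - |I|\,\mu_n(\Xi_n) \Big| \le \frac{|I|\,\mu_n(\Xi_n)}{2n^2} \tag{7.9}$$
for any subset $I \subset \mathbb{R}/\mathbb{Z}$ taken from the finite set
$$\{(0,1)\} \cup \{I_m(y_m) : 1 \le m < n\}, \tag{7.10}$$
where $\Xi_n := \bigcup_{c \in \Omega_n} C^{n,c}_{\omega_c Y_n,\, 2\omega_c Y_n}$ is the set defining $I_n(y)$.

We now construct such a sequence successively. For the base case $n = 1$, since (7.11) holds for the pair $((0,1), \Xi_1)$ on $M = \Gamma_1 \backslash \mathbb{H}$, there exists $0 < y_1 < c_1$ sufficiently small such that
$$\Big|\, |I_1(y_1)| - \mu_1(\Xi_1) \Big| \le \frac{\mu_1(\Xi_1)}{2}.$$
For a general integer $n \ge 2$, suppose that we have already chosen $0 < y_m < c_m$ satisfying (7.9) for all positive integers $m < n$. By Remark 6.3 the set $I_m(y_m) \subset \mathbb{R}/\mathbb{Z}$ is a disjoint union of finitely many open intervals for any $m < n$. Thus (7.11) is satisfied for all the pairs $((0,1), \Xi_n)$, $(I_m(y_m), \Xi_n)$, $1 \le m < n$, on $\Gamma_n \backslash \mathbb{H}$. Since there are only finitely many such pairs, we can take $0 < y_n < c_n$ sufficiently small such that (7.9) is satisfied for all $I \in \{(0,1)\} \cup \{I_m(y_m) : 1 \le m < n\}$, which is the set in (7.10). This finishes the proof of the claim.

Now let $\{y_n\}_{n \in \mathbb{N}}$ be as in the claim. For any $n \in \mathbb{N}$, applying (7.9) to the pair $((0,1), \Xi_n)$ we get
$$\Big|\, |I_n(y_n)| - \mu_n(\Xi_n) \Big| \le \frac{\mu_n(\Xi_n)}{2n^2}. \tag{7.12}$$
By the triangle inequality, this implies
$$\mu_n(\Xi_n) \le 2\,|I_n(y_n)|. \tag{7.13}$$
More generally, for each $1 \le m < n$, applying (7.9) to the pair $(I_m(y_m), \Xi_n)$ we get
$$\Big|\, |I_m(y_m) \cap I_n(y_n)| - |I_m(y_m)|\,\mu_n(\Xi_n) \Big| \le \frac{|I_m(y_m)|\,\mu_n(\Xi_n)}{2n^2}. \tag{7.14}$$
Using the inequalities (7.12), (7.13), (7.14) together with the triangle inequality we get
$$\Big|\, |I_m(y_m) \cap I_n(y_n)| - |I_m(y_m)|\,|I_n(y_n)| \Big| \le \frac{|I_m(y_m)|\,\mu_n(\Xi_n)}{n^2} \le \frac{2\,|I_m(y_m)|\,|I_n(y_n)|}{n^2}. \tag{7.15}$$
Hence the sequence $\{I_n(y_n)\}_{n \in \mathbb{N}}$ of subsets of $\mathbb{R}/\mathbb{Z}$ satisfies the quasi-independence condition (2.20) (with the subset $S = \mathbb{N}$ and the exponent $\eta = 2$). Moreover, using the inequality (7.12), the volume computation (7.6) and the estimate $Y_n \ll \log n$ we have
$$\sum_{n \in \mathbb{N}} |I_n(y_n)| \ge \sum_{n \in \mathbb{N}} \tfrac{1}{2}\,\mu_n(\Xi_n) \gg \sum_{n \ge 2} \frac{1}{n \log n} = \infty.$$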
Thus by Corollary 2.6, $\limsup_{n \to \infty} I_n(y_n) \subset \mathbb{R}/\mathbb{Z}$ is of full Lebesgue measure, finishing the proof.
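The divergence used in the last step of the proof is very slow: the partial sums of $\sum 1/(n\log n)$ grow like $\log\log N$. The following sketch, which is only a numerical illustration with a hypothetical helper `block_sum`, shows that the block of terms between $N$ and $N^2$ contributes approximately $\log 2$, in line with the integral $\int_N^{N^2} \frac{dx}{x\log x} = \log 2$.

```python
from math import log


def block_sum(lo, hi):
    """Sum of 1/(n log n) over integers n with lo < n <= hi (requires lo >= 1)."""
    return sum(1.0 / (n * log(n)) for n in range(lo + 1, hi + 1))
```

Since each block from $N$ to $N^2$ adds roughly the same amount, the series diverges, but one needs about $\exp(\exp(S))$ terms to reach a partial sum of size $S$.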

Remark 7.16
It is not clear to us whether the rate $\log\log n$ is the fastest excursion rate for generic translates. We note that in principle it can be proved (or disproved) if one can compute the volume of the set
$$E^n_Y := \left\{\Gamma_n z \in \Gamma_n \backslash \mathbb{H} : \Gamma_1 u_{j/n} z \in C_Y \text{ for all } 0 \le j \le n-1\right\}.$$
For instance, if one can show $\mu_n(E^n_Y) \ll 1/(nY)$ for all $n \in \mathbb{N}$ and for all $Y \ge 1$, then Theorem 1.7 together with a standard application of the Borel-Cantelli lemma would imply that the inequality in (1.13) is indeed an equality for almost every $x \in \mathbb{R}/\mathbb{Z}$. We also note that our analysis (Lemmas 6.2, 7.6) shows that for any $n \in \mathbb{N}$ and for any $Y \ge 1$
$$\bigcup_{c \in \Omega_n} C^{n,c}_{\omega_c Y} \subset E^n_Y \subset \bigcup_{c \in \Omega_n} C^{n,c}_{Y},$$
implying that $1/(nY) \ll \mu_n(E^n_Y) \ll 1/Y$. On the other hand, using some elementary arguments (which rely on the width computation in Lemma 7.3) one can show that any $u_{1/n}$-orbit contains at least one cusp of width one. This fact together with the fact that $1 \le \omega_c \le n^2$ implies that $E^n_Y = \bigcup_{c \in \Omega_n} C^{n,c}_Y$ when $Y \ge n^2$. However, both estimates are not sufficient for the purpose of obtaining an upper bound.

Remark 7.17
Here we give a very brief sketch of the argument communicated to us by Strömbergsson. For each $n \in \mathbb{N}$ and $y > 0$, it is not difficult to see that $\Gamma_n(x + iy) \in C^{n,c}_{\omega_c Y_n}$ for some $c = \frac{p}{q} \in \Omega_n$ with $\gcd(p,q) = 1$ if and only if
$$\left(x - \frac{p}{q}\right)^2 < \frac{y}{\omega_c Y_n q^2} - y^2 = \frac{y \gcd(n,q)^2}{n^2 Y_n q^2} - y^2. \tag{7.18}$$
Here $Y_n = \max\{\log n, 1\}$ is as in the above proof. Define One can easily check that elements in $\tilde{I}_n(y)$ satisfy the inequality (7.18). Hence by Lemma 7.6 we have
$$\tilde{I}_n(y) \subset \{x \in \mathbb{R}/\mathbb{Z} : R_n(x,y) \subset C_{Y_n}\}. \tag{7.19}$$
Moreover, using some standard techniques from analytic number theory one can show that for any subinterval $I \subset \mathbb{R}/\mathbb{Z}$ (or more generally, any finite disjoint union of subintervals),
$$\lim_{y \to 0^+} |I|^{-1}\,\big|\tilde{I}_n(y) \cap I\big| = \frac{c_n}{Y_n} \quad \text{with} \quad c_n = \frac{3}{\pi^2}\,\frac{\varphi(n)}{n^2} \prod_{p \nmid n} (1 - p^{-2})^{-1} \asymp \frac{\varphi(n)}{n^2}.$$
This limit equation is the analog of (7.11). Another input is the divergence of the series
$$\sum_{n \in \mathbb{N}} \frac{c_n}{Y_n} \asymp \sum_{n \ge 2} \frac{\varphi(n)}{n^2 \log n},$$
which follows from the estimate $\varphi(n) \gg n/\log\log n$. With these two inputs one can then mimic the arguments in the above proof to construct a sequence $\{y_n\}_{n \in \mathbb{N}}$ decaying sufficiently fast and then apply Corollary 2.6 to get a full measure limsup set $\limsup_{n \to \infty} \tilde{I}_n(y_n) \subset \mathbb{R}/\mathbb{Z}$. Finally, we note that the relation (7.19) can be checked directly using the definition of the set $\tilde{I}_n(y)$. Hence this argument can be carried out without going into the congruence covers $\Gamma_n \backslash \mathbb{H}$.
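The two number-theoretic inputs of this sketch can be probed numerically. The code below is an illustration under stated assumptions, with hypothetical helper names: it checks that the Euler factor $\prod_{p \mid n}(1-p^{-2})^{-1}$ lies between $1$ and $\zeta(2) = \pi^2/6$ (so the same holds for the complementary product over primes not dividing $n$, whichever range the product in $c_n$ is taken over, giving $c_n \asymp \varphi(n)/n^2$), and that $\varphi(n)\log\log n / n$ stays bounded away from zero on a finite range.

```python
from math import log, pi


def prime_divisors(n):
    """Distinct prime divisors of n by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def phi(n):
    """Euler's totient function."""
    result = n
    for p in prime_divisors(n):
        result -= result // p
    return result


def euler_factor(n):
    """Product over primes p dividing n of (1 - p^-2)^-1."""
    out = 1.0
    for p in prime_divisors(n):
        out /= 1.0 - p ** -2
    return out


def totient_ratio(n):
    """phi(n) * log(log(n)) / n, defined for n >= 3."""
    return phi(n) * log(log(n)) / n
```

A finite computation of this kind cannot establish the asymptotic estimate $\varphi(n) \gg n/\log\log n$; it only illustrates the size of the implied constant for small $n$.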

Proof of Theorem 1.8
We prove Theorem 1.8 in this subsection. The strategy is similar to that of Theorem 1.7, with the sequence of cuspidal sets approaching the cusps replaced by a sequence of compact cylinders approaching certain closed horocycles. Let $n \in \mathbb{N}$ be an integer and let $\Gamma_n z \in \Gamma_n \backslash \mathbb{H}$ be a point close to a cusp $c \in \Omega_n$. For any $0 \le j \le n-1$, the analysis in Sect. 6 gives exact information about the height of the companion point $\Gamma_n u_{j/n} z$ with respect to the cusp $u_{j/n} c$. While this is sufficient for Theorem 1.7 (cusp excursions), to realize the limiting measure $\nu_{m,Y}$ in Theorem 1.8 one needs more refined information about the spacing of these companion points along the closed horocycles they lie on.
For this, we further analyze the left regular $u_{1/n}$-action on points near cusps of a certain type, which we now define.
We say $c \in \Omega_n$ is of simple type if $c$ can be represented by a primitive rational number $m/q$ satisfying $\gcd(n^2, q) \mid n$, and we denote by $\Omega^{\mathrm{sim}}_n \subset \Omega_n$ the set of simple type cusps. (This notion of simple type cusps is closely related to the condition $n \in N_q$ in Theorem 1.2. In fact, if $p/q$ is a primitive rational number, then the condition $n \in N_q$ is equivalent to the cusp $c \in \Omega_n$ represented by $p/q$ being of simple type.) If $m'/q'$ is another representative for $c$, that is, $m'/q'$ is primitive and $m'/q' = \gamma(m/q)$ for some $\gamma \in \Gamma_n$, then using the definition of $\Gamma_n$, it is easy to check that $\gcd(n^2, q) = \gcd(n^2, q')$. Hence the simple type cusps are well-defined. As mentioned in Sect. 3.3 the condition $\gcd(n^2, q) \mid n$ implies the further decomposition $q = kl$ with $l = \gcd(n, q) \mid n$ and $k = q/l$ satisfying $\gcd(k, n) = 1$. We can thus reparameterize a simple type cusp $c$ by $m/(kl)$ with $\gcd(m, kl) = \gcd(k, n) = 1$ and $l \mid n$. The main new ingredient of our proof of Theorem 1.8 is the following decomposition of the sample points, which generalizes (3.19).

Let $\tau_c = \begin{pmatrix} m & a \\ kl & b \end{pmatrix} \in \Gamma_1$, and for each $1 \le j \le n-1$ let $\tau_{u_{j/n}c} = \begin{pmatrix} p_j & v_j \\ q_j & w_j \end{pmatrix} \in \Gamma_1$, where $p_j, q_j$ are as in the proof of Lemma 7.8, and $a, b, v_j, w_j$ are some integers such that $\tau_c, \tau_{u_{j/n}c} \in \Gamma_1$, that is,
$$mb - kla = 1 \quad \text{and} \quad \left(m\tfrac{n}{l} + jk\right) w_j - kn\, v_j = d_j \tag{7.20}$$
with $d_j = \gcd\left(m\tfrac{n}{l} + jk,\ n\right)$ as in the proof of Lemma 7.8. By direct computation and using Lemmas 6.3 and 7.8 (and the relation $\omega_c = d_0^2 = n^2/l^2$) we have (Here we used the assumption that $mkl \neq 0$.) Hence we have for any $0 \le j \le n-1$
$$\Gamma_n u_{j/n} z = \Gamma_n u_{j/n} \tau_c (x' + iy') = \Gamma_n \tau_{u_{j/n}c}\, \tau_{u_{j/n}c}^{-1} u_{j/n} \tau_c (x' + iy') \tag{7.21}$$
= $\Gamma_n \tau_{u_{j/n}c}$ Here for the first equality we used the assumption that $\Gamma_n z = \Gamma_n \tau_c z'$ with $z' = x' + iy'$ and the fact that $u_{j/n}$ normalizes $\Gamma_n$. Now as in the proof of Proposition 3.7, for any $d \mid n$ we define Use the second relation in (7.20) to get, for $j \in D_d$,
$$w_j \left(m\tfrac{n}{l} + jk\right)/d \equiv 1 \pmod{k\tfrac{n}{d}}.$$
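The arithmetic behind the simple type parameterization can be verified directly. The sketch below, with hypothetical function names, checks that $\gcd(n^2, q) \mid n$ holds exactly when $q = kl$ with $l = \gcd(n,q)$ and $\gcd(k,n) = 1$, that $d_0 = n/l$ (consistent with $\omega_c = d_0^2 = n^2/l^2$), and that $d_j$ is also the greatest common divisor of the numerator and denominator of $u_{j/n}c = \frac{m n/l + jk}{kn}$.

```python
from math import gcd


def is_simple_type(n, q):
    """Condition gcd(n^2, q) | n on the denominator q of a cusp representative."""
    return n % gcd(n * n, q) == 0


def decompose(n, q):
    """Return (k, l) with l = gcd(n, q) and k = q / l."""
    l = gcd(n, q)
    return q // l, l


def d(j, m, k, l, n):
    """d_j = gcd(m * n / l + j * k, n), assuming l divides n."""
    return gcd(m * (n // l) + j * k, n)
```

For example, with $n = 12$ and $q = 10$ one gets `decompose(12, 10) == (5, 2)`, and $q = 8$ fails the simple type condition since $\gcd(144, 8) = 8$ does not divide $12$.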
where for the last equality we used the identities gcd(n 2 /d, d) = d (since d | n) and where for the second equality we used the fact that n 2 /d and n share the same set of prime divisors. Hence for n = m we have # c ∈ sim n : ω c ≥ m 2 = ϕ(n) 2 We note that the first condition is guaranteed by the facts that t n ≥ (m 2 Y − 1) −1 and that lim n∈P m n→∞ t n = ∞. For the second condition, we note that by the definitions of Y n and Y n , 1 Y n − 1 Y n = 1 Y t n . Moreover, using the fact that there are only finitely many prime numbers dividing m we get where the divergence of the rightmost series follows from the estimate j j log j which is an easy consequence of the prime number theorem. Here j ∈ P 1 denotes the j-th prime number.
We now give the δ pr m /d,x n,c,d ,d 2 Y ( ). We now conclude by taking n → ∞ along the subsequence N x and noting that lim n∈N x n→∞ log Y n /Y n = 0 (since lim n∈P m n→∞ Y n /Y n = 1 which follows from the assumption lim n∈P m n→∞ Y n = lim n∈P m n→∞ Y n = Y ).

Remark 7.26
It is clear that we can take a sequence $\{y_n\}_{n \in \mathbb{N}}$ decaying sufficiently fast such that the conditions (7.9) and (7.25) (for any finitely many pairs $(m, Y)$ with $m^2 Y > 1$) are all satisfied. Hence, noting that the intersection of finitely many full measure sets is still of full measure, for such a sequence the conclusions of Theorems 1.7 and 1.8 (for any finitely many pairs $(m, Y)$ with $m^2 Y > 1$) hold simultaneously.