Bounds for theta sums in higher rank. I

Theta sums are finite exponential sums with a quadratic form in the oscillatory phase. This paper establishes new upper bounds for theta sums in the case of smooth and box truncations. This generalises a classic 1977 result of Fiedler, Jurkat and Körner for one-variable theta sums and, in the multi-variable case, improves previous estimates obtained by Cosentino and Flaminio in 2015. Key steps in our approach are the automorphic representation of theta functions and their growth in the cusps of the underlying homogeneous space.


Introduction
Consider the exponential sum
$$\theta_f(M, X, x, y) = \sum_{m \in \mathbb{Z}^n} f\big(M^{-1}(m + x)\big)\, e\big(\tfrac{1}{2}\, m X\, {}^t m + m\, {}^t y\big), \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a rapidly decaying cut-off function, $M \in \mathbb{R}_{>0}$, $X$ is a real symmetric $n \times n$ matrix, and $x, y \in \mathbb{R}^n$ (represented as row vectors). We also use the shorthand $e(z) = e^{2\pi i z}$. We refer to $\theta_f$ as a theta sum. If, for example, $f(x) = \exp(-\pi\, x P\, {}^t x)$ for some positive definite matrix $P$, we obtain the classical Siegel theta series
$$\theta_f(M, X, 0, y) = \sum_{m \in \mathbb{Z}^n} e\big(\tfrac{1}{2}\, m Z\, {}^t m + m\, {}^t y\big), \tag{1.2}$$
with $Z = X + iY$ and $Y = M^{-2} P$. If, on the other hand, $f = \chi_B$ is the characteristic function of a bounded set $B \subset \mathbb{R}^n$ we have the finite sum
$$\theta_f(M, X, x, y) = \sum_{m \in \mathbb{Z}^n \cap (MB - x)} e\big(\tfrac{1}{2}\, m X\, {}^t m + m\, {}^t y\big). \tag{1.3}$$
In this case we will also use the notation $\theta_B = \theta_{\chi_B}$.
The following theorem, which is our first main result, gives an upper bound on the values of theta sums in the limit of large $M$, when the truncation function is in the class of complex-valued Schwartz functions $\mathcal{S}(\mathbb{R}^n)$.
This theorem follows from a geometric representation of θ_f as an automorphic function and an application of a dynamical Borel-Cantelli lemma for flows on homogeneous spaces, theorem 1.7 in [8]. The special case of theorem 1.1 for general smooth theta sums in one variable was considered in [14].
Upper bounds for smooth multi-variable theta sums, with an additional linear average in X, have played an important role in understanding the value distribution of quadratic forms, see for example the work of Götze [6], Buterus, Götze, Hille and Margulis [1] and the first named author [12,13].
The second main result of this paper deals with the subtler case when $f$ is the characteristic function of a rectangular box $B$. In this setting Cosentino and Flaminio [3] established the bound
$$\theta_B(M, X, 0, y) = O_{X,\epsilon}\Big(M^{\frac{n}{2}} (\log M)^{n + \frac{1}{2n+2} + \epsilon}\Big) \tag{1.6}$$
for the unit cube $B = [0, 1]^n$, any $\epsilon > 0$ and almost every $X$. The following theorem improves on this by a factor of $(\log M)^n$ and produces a uniform bound for rectangular boxes of the form $B = (0, b_1) \times \cdots \times (0, b_n)$, with $b_i \in \mathbb{R}_{>0}$ ranging over compacta.
Theorem 1.2. Fix a compact subset $K \subset \mathbb{R}^n_{>0}$, and choose $\psi$ as in theorem 1.1. Then there exists a subset $\mathcal{X}(\psi) \subset \mathbb{R}^{n \times n}_{\mathrm{sym}}$ of full Lebesgue measure such that
$$\theta_B(M, X, x, y) = O_X\big(M^{\frac{n}{2}}\, \psi(\log M)\big) \tag{1.7}$$
for all $M \ge 1$, $b = (b_1, \ldots, b_n) \in K$, $X \in \mathcal{X}(\psi)$, $x, y \in \mathbb{R}^n$. The implied constants are independent of $M$, $b$, $x$ and $y$.
To compare this with the bound obtained in [3], note that $\psi(t) = t^{\frac{1}{2n+2} + \epsilon}$ satisfies (1.4), and thus the resulting bound (1.7) indeed improves (1.6) by a factor of $(\log M)^n$. The paper [3] also established the stronger bound
$$\theta_B(M, X, x, y) = O_X\big(M^{\frac{n}{2}}\big) \tag{1.8}$$
for "bounded-type" $X$ that are badly approximable by rationals (these form a set of measure zero), and weaker bounds for $X$ that satisfy more relaxed Diophantine conditions. These same bounds can also be obtained from our techniques, but with no further improvements.
In the case n = 1 our estimate (1.7) matches the optimal results found by Fiedler, Jurkat and Körner [5].For n > 1, obtaining the lower bounds in these papers (which follow from the harder part of the Borel-Cantelli lemma) is more subtle, and we hope to develop an approach to this elsewhere.
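For small n the box theta sum (1.3) can be evaluated by brute force, which gives a quick way to compare its size with M^{n/2} experimentally. The following is only a minimal sketch: the function names `e` and `box_theta_sum`, and the symmetric test matrix in the demo, are illustrative choices and do not come from the paper. The box is taken to be the open box (0, b_1) × ... × (0, b_n), and the sum runs over integer vectors m with M^{-1}(m + x) in that box.

```python
import cmath
import itertools
import math


def e(z):
    """Shorthand e(z) = exp(2*pi*i*z) used in the introduction."""
    return cmath.exp(2j * math.pi * z)


def box_theta_sum(M, X, x, y, b):
    """Brute-force value of the box theta sum.

    Sums e(m X m^T / 2 + m y^T) over integer vectors m such that
    (m + x) / M lies in the open box (0, b_1) x ... x (0, b_n).
    X is a symmetric n x n matrix given as nested lists.
    """
    n = len(b)
    ranges = []
    for i in range(n):
        lo = math.floor(-x[i]) + 1               # smallest integer > -x_i
        hi = math.ceil(M * b[i] - x[i]) - 1      # largest integer < M*b_i - x_i
        ranges.append(range(lo, hi + 1))
    total = 0j
    for m in itertools.product(*ranges):
        quad = sum(m[i] * X[i][j] * m[j] for i in range(n) for j in range(n))
        lin = sum(m[i] * y[i] for i in range(n))
        total += e(0.5 * quad + lin)
    return total


if __name__ == "__main__":
    # Arbitrary symmetric matrix chosen for illustration only.
    X = [[math.sqrt(2), 0.3], [0.3, math.sqrt(3)]]
    for M in (10, 20, 40):
        value = box_theta_sum(M, X, [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
        # For n = 2 the normalisation M^{n/2} is simply M.
        print(M, abs(value) / M)
```

With X = 0 and y = 0 every term equals 1, so the function returns the number of lattice points in the dilated box; this is a convenient sanity check on the summation range.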
The bounds for the theta sum in (1.2) and (1.5) are uniform in the shift x and the linear phase y. In forthcoming work [15] we will consider improved bounds valid for almost all (X, x, y) ∈ R^{n×n}_sym × R^n × R^n, generalising the results for n = 1 found in [4].
This paper is organised as follows. We begin in section 2 by recalling some basic facts about the Heisenberg group and symplectic group Sp(n, R) as well as their semi-direct product, the Jacobi group, including the Iwasawa decomposition, Haar measure, and parabolic subgroups. We then review the Schrödinger and Segal-Shale-Weil representations of the Heisenberg and symplectic group, respectively. Following the method of [11], these representations are used to define theta functions in section 4.
The theta functions satisfy an automorphy condition on a certain, morally-speaking discrete subgroup of the Jacobi group.This subgroup is discussed in section 3. Its projection to the symplectic group is just the integral symplectic group Sp(n, Z).The bulk of section 3 concerns a fundamental domain (a slight modification of Siegel's classic fundamental domain [18] based on the work of [7]) and its properties.
In section 4 we define the theta functions and state their automorphy properties before analysing their asymptotic behaviour.While for the proof of theorems 1.1 and 1.2 we only need the upper bound contained in corollary 4.5, the full asymptotics contained in theorem 4.4 may be of independent interest.The proof of theorem 4.4 combines the properties of the fundamental domain constructed in section 3 and basic estimates for sums over integers together with the Langlands decompositions of the maximal parabolic subgroups of the symplectic group.
We prove theorems 1.1 and 1.2 in section 5. Apart from the upper bound in corollary 4.5, our method relies on upper bounds for the measure of rapidly diverging orbits under a particular one-parameter diagonal action in the symplectic group as well as (for theorem 1.2) a resolution of the singular cutoff function in (1.3) using an n-parameter diagonal action. The estimates for the first part are largely based on the easy part of the proof of theorem 1.7 in [8], which is also a main input into the method in [3]. The complications arising from the n-parameter flow, however, prevent a straightforward application of this theorem, so we instead proceed more directly with a self-contained proof.

Heisenberg, symplectic, and Jacobi groups
We define the (2n + 1)-dimensional Heisenberg group H to be the set R^n × R^n × R with multiplication given by The rank n symplectic group G = Sp(n, R) is defined by where with I the n × n identity matrix. We have the alternative characterization The group G acts on H via where Since g preserves the symplectic form J 0 used to define the multiplication (2.1), this action is by automorphisms, i.e. (h We define the semi-direct product group H ⋊ G, called the Jacobi group, to be the set of all (h, g), h ∈ H and g ∈ G, with multiplication given by (2.7)
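Membership in G can be tested block-wise. The sketch below assumes the standard characterization of a symplectic matrix g with n × n blocks A, B, C, D: the products A ^tB and C ^tD are symmetric and A ^tD − B ^tC = I (the last relation is the one used in section 3). The helper names `transpose`, `mat_mul` and `is_symplectic` are hypothetical and introduced only for this illustration.

```python
def transpose(P):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]


def mat_mul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def is_symplectic(A, B, C, D, tol=1e-12):
    """Check the block conditions for g = [[A, B], [C, D]].

    Assumed conditions (standard, not quoted from the paper):
    A B^T and C D^T symmetric, and A D^T - B C^T = I.
    """
    n = len(A)
    A_tB = mat_mul(A, transpose(B))
    C_tD = mat_mul(C, transpose(D))
    A_tD = mat_mul(A, transpose(D))
    B_tC = mat_mul(B, transpose(C))
    for i in range(n):
        for j in range(n):
            if abs(A_tB[i][j] - A_tB[j][i]) > tol:
                return False
            if abs(C_tD[i][j] - C_tD[j][i]) > tol:
                return False
            identity_entry = 1.0 if i == j else 0.0
            if abs(A_tD[i][j] - B_tC[i][j] - identity_entry) > tol:
                return False
    return True


if __name__ == "__main__":
    zero = [[0, 0], [0, 0]]
    # Block-diagonal element with D equal to the inverse transpose of A.
    print(is_symplectic([[1, 1], [0, 1]], zero, zero, [[1, 0], [-1, 1]]))
```

For n = 1 the conditions reduce to ad − bc = 1, that is, to membership in SL(2, R).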

Iwasawa decomposition and Haar measure
The intersection K = G ∩ O(2n) is a maximal compact subgroup of G and defines an isomorphism from the unitary group U(n) to K. The Iwasawa decomposition of G with respect to K implies that any g ∈ G can be written uniquely as where X and Y are symmetric, Y is positive definite, and Q ∈ U(n). Here we have chosen Y^{1/2} to be upper-triangular with positive diagonal entries, and we often further decompose Y = U V ^tU with U upper-triangular unipotent and V positive diagonal. We also note that Y^{−1/2} is always interpreted as (Y^{1/2})^{−1}. We make frequent use of the following expressions for the X, Y, and Q coordinates, where as before (C t C + D t D) 2 is chosen to be upper-triangular with positive diagonal entries.
The Haar measure on G can be easily expressed in terms of the Iwasawa decomposition. For the Haar measure µ on G is given by (2.12) Here dQ denotes the Haar measure on U(n) and dx_ij, du_ij, dv_jj are respectively the Lebesgue measures on the entries of X, U, V.
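In rank one the Iwasawa coordinates can be read off directly, which may help fix ideas. The sketch below treats only n = 1, where G = SL(2, R), and assumes the conventions g = n(x) a(y) k(φ) with n(x) = [[1, x], [0, 1]], a(y) = diag(y^{1/2}, y^{-1/2}) and k(φ) = [[cos φ, −sin φ], [sin φ, cos φ]]. The sign convention for k(φ) and the function names are assumptions made for this illustration and are not taken from the paper.

```python
import math


def compose(x, y, phi):
    """Entries (a, b, c, d) of g = n(x) a(y) k(phi) under the assumed conventions."""
    root = math.sqrt(y)
    a = root * math.cos(phi) + (x / root) * math.sin(phi)
    b = -root * math.sin(phi) + (x / root) * math.cos(phi)
    c = math.sin(phi) / root
    d = math.cos(phi) / root
    return a, b, c, d


def iwasawa_sl2(a, b, c, d):
    """Recover (x, y, phi) from g = [[a, b], [c, d]] in SL(2, R).

    x + i*y is the image of i under the Moebius action of g,
    so y = 1 / (c^2 + d^2); the bottom row of g is y^{-1/2} (sin phi, cos phi).
    """
    z = complex(b, a) / complex(d, c)  # (a*i + b) / (c*i + d)
    phi = math.atan2(c, d)
    return z.real, z.imag, phi


if __name__ == "__main__":
    g = compose(0.3, 2.0, 0.7)
    print(iwasawa_sl2(*g))
```

The identity y = 1/(c² + d²) is the rank-one case of expressing Y through the bottom row blocks C and D of g.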
We note that if $g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with D invertible, then we can write Therefore the set of g ∈ G having the form for X, T symmetric and A ∈ GL(n, R) is open and dense in G. We claim that in these coordinates we have, up to multiplication by a positive constant, where dx_ij, da_ij, dt_ij are the Lebesgue measures on the entries of X, A, T.
To verify (2.15) up to a positive constant it suffices to check that the right side is invariant under left multiplication by generators of G. The invariance under matrices $\begin{pmatrix} I & X_1 \\ 0 & I \end{pmatrix}$ with $X_1$ symmetric is obvious, and the invariance under matrices and that the replacements (2.17) To verify the invariance under $\begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$ we may restrict further to the set of g of the form (2.14) with X invertible, as this is still an open, dense set. We then have The invariance then follows from the fact that the replacement

Parabolic subgroups
We recall that conjugacy classes of parabolic subgroups of G are in bijection with subsets of the n positive simple roots, see for example section 4.5.3 of [19].Here we make the choice of positive simple roots α 1 , . . ., α n where, for 1 ≤ l < n,

Here is positive diagonal.See for example section 5.1 of [19].
The parabolic corresponding to a subset L ⊂ {α_1, ..., α_n} is given by where Z(ker(α)) is the centraliser in G of the kernel of the root α and The maximal parabolic subgroups correspond to subsets L of size n − 1 and we denote them by P_l, 1 ≤ l ≤ n, corresponding to the root α_l not in the set L. For 1 ≤ l < n, we write an arbitrary element of P_l as where R l and For l = n, we write an arbitrary element of P_n as where T_n is n × n symmetric, a_n > 0, and U_n ∈ GL(n, R) with det U_n = ±1. The factorizations (2.25), (2.26) are in fact the Langlands decompositions of P_l, P_n, which write an arbitrary element of the parabolic subgroup as a product of elements of a nilpotent subgroup, a diagonal subgroup, and a semi-simple subgroup. For general considerations regarding the Langlands decomposition, see section 7.7 of [10]. The authors' lecture notes [16] contain explicit calculations for the symplectic group G along these lines.

Schrödinger and Segal-Shale-Weil representations
The Schrödinger representation W of H acts on L^2(R^n) by the unitary transformations We remark that this definition of the Schrödinger representation differs slightly from the conventional one; they are of course unitarily equivalent. Given g ∈ G, we obtain another representation W_g of H by W_g(h) = W(h^g). By the Stone-von Neumann theorem, there exist unitary operators R(g) on L^2(R^n) such that (2.28) The relation (2.28) actually defines R(g) up to a scalar multiple. Regardless of the choice of this scalar (which we make below), we have for a nontrivial, unitary cocycle ρ : G × G → C. Thus R defines a projective representation of G, which is called the Segal-Shale-Weil representation. The projective representation R can be extended to a true representation of the metaplectic group, the connected double cover of G, but we do not make use of this construction.
The following proposition gives expressions for R(g) for certain g and on a dense subset of L 2 (R n ).In particular the proposition makes precise the choice of scalar multiple in our definition of R.
we have and for with square blocks of size l, n − l, l, and n − l along the diagonal, 0 ≤ l ≤ n, we have 1) , y (2) )e(−x (2) t y (2) )dy (2)  (2.33) x (2) .Moreover, for g of the form (2.30) and any g ′ ∈ G, we have ρ(g, g This proposition is a summary of various calculations found in [11].The forthcoming lecture notes by the authors [16] will give self-contained proofs.
We remark that together with the Bruhat decomposition where P_n is the maximal parabolic subgroup (2.26), proposition 2.1 allows one to compute R(g) for any g ∈ G. For example, for with C invertible, we have (2.36)

The subgroups Γ and Γ̃
We denote by Γ the discrete subgroup Γ = Sp(n, Z) ⊂ G. For $\gamma = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in \Gamma$ we set h_γ = (r, s, 0) ∈ H where the entries of r are 0 or 1/2 depending on whether the corresponding diagonal entry of C ^tD is even or odd, and the entries of s are 0 or 1/2 depending on whether the corresponding diagonal entry of A ^tB is even or odd. We now define the group Γ̃ ⊂ H ⋊ G by We note that this is a subgroup of H ⋊ G because, modulo left multiplication by elements (m, n, t) ∈ H with m, n ∈ Z^n and t ∈ R, we have γ 2 for any γ_1, γ_2 ∈ Γ. Indeed, we have that 2r and 2s, where h_{γ_1 γ_2} = (r, s, 0), have the same parity as the diagonal entries of which, in view of A_2 ^tD_2 − B_2 ^tC_2 = I, have the same parity as the diagonal entries of On the other hand, we have where h_{γ_j} = (r_j, s_j, 0). The entries of two times the vectors on the right of (3.7) have the same parity as the diagonal entries of (3.5) and (3.6), as claimed.
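The parity rule defining h_γ is easy to make explicit. The following minimal sketch computes (r, s, 0) from the integer blocks A, B, C, D of γ exactly as described above: an entry of r is 1/2 when the corresponding diagonal entry of C ^tD is odd, and an entry of s is 1/2 when the corresponding diagonal entry of A ^tB is odd. The function names are hypothetical, and the code does not verify that γ is actually symplectic.

```python
from fractions import Fraction


def diagonal_of_product_with_transpose(P, Q):
    """Diagonal entries of P Q^T for integer matrices given as lists of rows."""
    return [sum(p * q for p, q in zip(P[i], Q[i])) for i in range(len(P))]


def h_gamma(A, B, C, D):
    """Return (r, s, 0) for gamma = [[A, B], [C, D]] with integer blocks."""
    half = Fraction(1, 2)
    zero = Fraction(0)
    r = [half if entry % 2 else zero
         for entry in diagonal_of_product_with_transpose(C, D)]
    s = [half if entry % 2 else zero
         for entry in diagonal_of_product_with_transpose(A, B)]
    return r, s, zero


if __name__ == "__main__":
    # n = 1, gamma = [[1, 1], [0, 1]]: A B^T = 1 is odd, C D^T = 0 is even.
    print(h_gamma([[1]], [[1]], [[0]], [[1]]))
```

Exact rational arithmetic via `Fraction` keeps the values 0 and 1/2 free of rounding, which matters if one goes on to compare products of such elements modulo integer translations.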
We say that a closed set D ⊂ G is a fundamental domain for Γ\G if 1. for all g ∈ G there exists γ ∈ Γ such that γg ∈ D and 2. if for g ∈ D there is a non-identity γ ∈ Γ such that γg ∈ D, then g is contained in the boundary of D.
Following Siegel [18], we define D to be the set of all 2. Y ∈ D ′ , a fundamental domain for the action of GL(n, Z) on n × n positive symmetric matrices, and , where x ij are the entries of X.
We note that since (2.10) implies that the first condition implies that for g ∈ D, | det(Y (g))| ≥ | det(Y (γg))| for all γ ∈ Γ.We also note that Siegel chooses D ′ to be the set of positive definite symmetric Y such that Y −1 is in Minkowski's classical fundamental domain.However, here we choose D ′ to be the set of Y such that Y −1 is in Grenier's fundamental domain, see [7] and [19].Following [7] and [19], we define D ′ = D ′ n recursively as follows.We set D ′ 1 = {y > 0} and the standard fundamental domain for GL(2, Z).For n > 2 we define D ′ n to be the set of , where r j are the entries of r 1 .
This is proven to be a fundamental domain in [7] and [19].In general, the motivation for using this fundamental domain is the box-shaped cusp, but here the primary advantage is its recursive definition, which we make frequent use of below.We remark that one can construct a fundamental domain for Γ\G with a box-shaped cusp by maximising v 1 over all of Γ, not just GL(n, Z).This approach is utilised in the second paper in this series [15].However we do not need this feature here, and in fact maximising the determinant in the fundamental domain as we have done is useful in what follows, see the proofs of lemmas 5.2 and 5.2.
The following proposition records some useful properties of D.
Proposition 3.1. Let g ∈ D and write and also Then we have x (2) . (3.14) which may be completed to where x_nn is the (n, n) entry of X. Since the entries of X are at most a half in absolute value, v_n^2 ≥ 1 − x_nn^2 ≥ 3/4 as required. We have and we note that to demonstrate v_j ≥ (3/4) v_{j+1}, it suffices to consider j = 1 by the inductive construction of D′. We apply the minimality of v_1^{−1} for an element γ ∈ GL(n, Z) having first row where r is the first entry of r_1. Since |r| ≤ 1/2, it follows that v_1 ≥ (3/4) v_2.
To demonstrate the second part of the proposition, we let y_1, ..., y_n denote the rows of Setting y = x_2 y_2 + ··· + x_n y_n, where the x_j are the entries of x, our aim is to prove that for some constants 0 < c_1 < 1 < c_2 depending only on n, We let 0 < φ_1 < π denote the angle between y_1 and y and 0 < φ_2 < π/2 denote the angle between y_1 and the hyperplane span(y_2, ..., y_n). We have φ_2 ≤ min(φ_1, π − φ_1), and so |cos φ_1| ≤ |cos φ_2|. We bound cos φ_2 away from 1 by bounding sin φ_2 away from 0.
We have so it suffices to show that v_1^{1/2} ≫ ||y_1||. Here ∧ denotes the usual wedge product on R^n and the norm on ∧^k R^n is given by Using the inductive construction of D′ and the fact that the entries of r_1(Y), r_1(Y_1), ... are at most 1/2 in absolute value, we observe that U has entries bounded by a constant depending only on n. We find that with the implied constant depending on n.

Theta functions and asymptotics
Following [11], for f ∈ S(R n ) we define the theta function Θ f :

Setting h = (x, y, t), and Thus for f(x) = exp(−π x ^tx), Q = I, and h = (0, 0, 0), we recover (det Y)^{1/4} times the classical Siegel theta series that is holomorphic in Z = X + iY. The following theorem establishes the automorphy of Θ_f under Γ̃, which we recall is defined at the beginning of section 3.
Theorem 4.1. For all (u h_γ, γ) ∈ Γ̃ and (h, g) ∈ H ⋊ G, there is a complex number ε(γ) of modulus 1 such that where u = (m, n, t).
This theorem is proved in [11] but with Γ̃ replaced by a finite index subgroup. The automorphy under the full group Γ̃ is proved in [17], however only for the special function f(x) = exp(−π x ^tx). In [11] it is shown that this function is an eigenfunction for all of the operators R(k(Q)), Q ∈ U(n). Moreover, it can be seen from the theory built there that the automorphy for any Schwartz function follows from that for exp(−π x ^tx). A self-contained proof along the lines of [11] is presented in forthcoming notes by the authors [16]. We also remark that ε(γ) can be expressed as a kind of Gauss sum as shown in [11] and the authors' notes, but we do not make use of this here.
We recall that for The following lemma states that if f is a Schwartz function, then the f_Q are "uniformly Schwartz."
Lemma 4.2. Let f ∈ S(R^n). Then for all A > 0 and multi-indices α ≥ 0, there exist constants
Proof. Since f is Schwartz, so are the Fourier transforms of f with respect to any subset of the variables. For a subset S ⊂ {1, ..., n}, multi-index α ≥ 0, and A > 0, we let c_S f(α, A) be constants such that ∂ ∂x where f_S is the Fourier transform of f in the variables having indices in S.
We now consider f_Q for Q ∈ U(n) diagonal with the first n − l entries 1 and the last l entries e^{iφ_j} with 0 < φ_j < π. We let S ⊂ {1, ..., n} be the set of indices j, n − l < j ≤ n, such that φ_j ∈ (0, π/4) ∪ (3π/4, π) and we write Q = Q′Q_S where Q_S is diagonal with (j, j) entry i if j ∈ S and 1 if j ∉ S. We have and we recall that |ρ(k with C, D diagonal, the entries of C being cos φ_j or sin φ_j depending on whether j ∈ S or not, the entries of D being − sin φ_j or cos φ_j depending on whether j ∈ S or not. We note that the entries of C are at least .9) using proposition 2.1, and noting that R(k(Q_S)) = f_S, we compute 2) dy (2) . (4.10) Now as the entries of C are between 1/√2 and 1, and the entries of D are at most 1/√2 in absolute value, integration by parts and (4.6) shows that with implied constant depending on f, S, α, and A.
We observe that for real orthogonal for any orthogonal Q.It now suffices to show that any unitary matrix Q 0 can be written as Q 1 QQ 2 with Q having the special form above and is the identity.It follows that X, Y commute, and thus can be simultaneously diagonalized by an orthogonal matrix Q 1 .We have Finally, we may permute the diagonal entries of Q and change their signs so that the special form above holds.
We now turn to analysing the behaviour of the theta function Θ_f, f a Schwartz function, in the cusp of Γ\H ⋊ G. We repeatedly use the easy bounds recorded in the following lemma.
Lemma 4.3. For real numbers A > 1/2, |x| ≤ 1/2 and v, y > 0, we have and, if in addition v ≤ ay with a > 0,
Proof. We have The first sum here is 0 if y < v, otherwise it is at most v^{−1/2} y^{−A+1/2}. The second sum is at most which is (4.17) The first sum here is ≪_a v^{−1/2} y^{−A+1/2}, while the second sum is at most so (4.14) follows immediately.
The following theorem, while a little complicated, gives an asymptotic formula for Θ f (h, g) as g → ∞ inside the fundamental domain D. We describe the relevant neighbourhoods of ∞ using the Langlands decomposition (2.25) of the parabolic subgroups P l with 1 ≤ l < n, see (4.19).The semi-simple part of the Langlands decomposition of this parabolic is a copy of Sp(n − l, R), and our asymptotic formula for Θ f has a theta function associated to Sp(n − l, R) for a main term, see (4.20).
Theorem 4.4.Let f ∈ S(R n ), g ∈ D, and h = (x, y, t) ∈ H with the entries of x and y all at most 1  2 in absolute value.For 1 ≤ l ≤ n we write We have where and, with x y = x (1) x (2) y (1) y (2) , Proof.Comparing the expressions we find that and Applying (4.13) with v = v l , x the last entry of x (1) , y = x (1) V l t x (1) − v l x 2 , and renaming 2 for g ∈ D by proposition 3.1, we obtain the following corollary.Corollary 4.5.For a Schwartz function f ∈ S(R n ), g ∈ D, and h = (x, y, t) ∈ H with the entries of x and y at most 1  2 in absolute value, we have where with Y = U V t U as usual.

Proof of the main theorems
Having the bounds from corollary 4.5, we now proceed to the proof of theorems 1.1 and 1.2.
In the smooth setting of theorem 1.1, we need to construct a distance-like (DL) function that captures the bounds in corollary 4.5.This will enable us to directly apply theorem 1.7 in [8] modulo a standard argument that allows us to pass from a full measure set in Γg ∈ Γ\G to a full measure set on the unstable foliation parametrized by X ∈ R n×n sym .The proof of theorem 1.2 is more involved, and requires modifications of the method in [8] to enable a resolution of the singular cutoff function in (1.3).To this end we need to uniformly manage many points in Γ\G.
We note that theorem 1.7 in [8] is also a main input in the method of [3].
We now make the linear change of variables s j = u j − u j+1 for j < n and s This transformation has determinant n and its inverse is given by (5.9) We find that the exponent in (5.8) is then (5.10) As j(n−j)
We now control the change in the height function D under a geodesic flow by a fixed distance.This estimate should be compared to the requirement in [8] that distance-like functions be uniformly continuous.with implied constants depending only on n.
Proof.For arbitrary g ∈ G and |s| ≤ 1, we set We first claim that det V (g s ) ≍ det V (g) (5.13) for all g ∈ G.
As usual we have and we note that det Y = det V(g). Writing we have so in view of (2.10) (5.17) The ratio of the right to the left side of (5.13) is then
$$\det\big(e^{2s}\, S\, {}^t S + e^{-2s}\, R\, {}^t R\big). \tag{5.18}$$
Using the diagonalization argument from the proof of lemma 4.2, we can multiply by orthogonal matrices to make R and S diagonal with entries cos φ_j and sin φ_j. The determinant (5.18) is then
$$\prod_{1 \le j \le n} \big(e^{2s} \sin^2 \phi_j + e^{-2s} \cos^2 \phi_j\big). \tag{5.19}$$
Since |s| ≤ 1, this is clearly bounded from above by a constant depending on n, and since sin^2 φ_j and cos^2 φ_j cannot both be less than 1/2, it is also bounded away from 0. This establishes (5.13).
Now we have for some γ_0 ∈ Γ. By (5.13) we have D(Γg_s) ≪ det V(γ_0 g) ≤ D(Γg). The same reasoning with g_s replaced by g leads to the reverse inequality, establishing (5.11).
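The two-sided bound on the product (5.19) can be checked numerically. In the sketch below, which is illustrative only, each factor is at most e^2 for |s| ≤ 1, and at least e^{-2}/2 because one of sin^2 φ_j, cos^2 φ_j is at least 1/2; the explicit constants (e^{-2}/2)^n and e^{2n} are one admissible choice and are not claimed to be the constants used in the paper. The function names are hypothetical.

```python
import math


def determinant_ratio(s, phis):
    """Product over j of e^{2s} sin^2(phi_j) + e^{-2s} cos^2(phi_j), as in (5.19)."""
    return math.prod(
        math.exp(2 * s) * math.sin(p) ** 2 + math.exp(-2 * s) * math.cos(p) ** 2
        for p in phis
    )


def bounds(n):
    """Lower and upper bounds valid for |s| <= 1.

    Each factor lies between e^{-2} / 2 and e^{2}, so the product of n
    factors lies between (e^{-2} / 2)^n and e^{2n}.
    """
    return (math.exp(-2) / 2) ** n, math.exp(2) ** n


if __name__ == "__main__":
    n = 3
    lower, upper = bounds(n)
    grid = [k * math.pi / 8 for k in range(9)]
    for s in (-1.0, -0.5, 0.0, 0.5, 1.0):
        values = [determinant_ratio(s, (p, q, r))
                  for p in grid for q in grid for r in grid]
        print(s, min(values), max(values), lower, upper)
```

At s = 0 every factor equals sin^2 φ_j + cos^2 φ_j = 1, so the product is 1 regardless of the angles.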
The following lemma is similar to lemma 5.2 in that we control the change in D under a particular action.Here the action is more general, however we only need to consider small neighbourhoods in Γ\G.
Lemma 5.3. There exists a constant ε_n > 0 depending only on n such that for all Γg ∈ Γ\G, A ∈ GL(n, R) satisfying ||A − I|| ≤ ε_n, and symmetric T satisfying ||T|| ≤ ε_n, we have (5.21)
Proof. As in the proof of lemma 5.2, it suffices to show that for all g ∈ G, where We write where R + iS ∈ U(n). We compute (5.25) so the ratio of the left and right sides of (5.22) is det (SA + R t A −1 T ) (5.26) Recalling that R ^tR + S ^tS = I, we have It follows that ε_n can be made sufficiently small so that the symmetric matrix has all eigenvalues less than 1/n, say, in absolute value, and (5.22) follows.
where the sum over m (L) has a bounded number of terms, B (L ′ ) is the edge of B associated to L ′ , and we have used the decomposition (5.45) to express θ B (L ′ ) (M, X L ′ ,L ′ , x (L ′ ) , y (L ′ ) + (m (L) + x (L) )X (L,L ′ ) ) as × e 1 2 (m (L ′ ) + x (L ′ ) )X (L ′ ,L ′ ) t (m (L ′ ) + x (L ′ ) ) + m (L ′ ) t (y (L ′ ) + (m (L) + x (L) )X (L,L ′ ) ) . (5.64) When L = {1, . . ., n} or n = 1, the corresponding part of (5.63) is clearly bounded. Proceeding by induction on n > 1, for any other L, there are full measure subsets X (n−#L) such that if X (L ′ ,L ′ ) ∈ X (n−#L) , the corresponding part of (5.63) is ≪ M^{(n−#L)/2+ε} for any ε > 0. It follows that (5.63) is ≪ M^{n/2} assuming that X is such that X (L ′ ,L ′ ) ∈ X (n−#L) for all nonempty L ⊂ {1, . . ., n}.
We now consider the part of (5.59) with j such that 2 j i b −1 j i ≤ M. We set X j (ψ, C) to be the set of X ∈ Z n×n sym \R n×n sym such that there exist A ∈ GL(n, R) and T ∈ R (5.65) Here K is the compact subset in theorem 1.2 identified with the compact subset of diagonal matrices B in GL(n, R) in the obvious way. We then set X j (ψ, C2 We now verify that with ψ satisfying the conditions of theorems 1.1, 1.2, X (ψ) has full measure, noting (again by induction on n) that it is enough to show that C>0 j≥0 X j (ψ, C2 .1 and Y l ∈ D ′ n−l . Applying (4.14) repeatedly as we did in the l = 1 case, we obtain the bound