Diophantine approximation on matrices and Lie groups

We study the general problem of extremality for metric Diophantine approximation on submanifolds of matrices. We formulate a criterion for extremality in terms of a certain family of algebraic obstructions and show that it is sharp. In general, the almost sure diophantine exponent of a submanifold is shown to depend only on its Zariski closure, and when the latter is defined over the rational numbers, we prove that the exponent is rational and give a method to effectively compute it. This method is applied to a number of cases of interest; in particular, we determine the diophantine exponent of random subgroups of certain nilpotent Lie groups in terms of representation-theoretic data.


Introduction
Pick n + m vectors $x_1, \dots, x_{n+m}$ at random in $\mathbb{R}^m$ according to a certain distribution ν. Take integer linear combinations of the vectors and ask how close they can get to the origin in $\mathbb{R}^m$. One way to measure this is via the diophantine exponent, defined as
$$\beta(x) = \inf\Big\{\beta > 0 : \Big\|\sum_{i=1}^{m+n} p_i x_i\Big\| > \|p\|^{-\beta} \text{ for all but finitely many } p \in \mathbb{Z}^{m+n}\Big\}, \qquad (1.1)$$
where $x = (x_1, \dots, x_{m+n})$, $p = (p_1, \dots, p_{m+n})$, and where on both sides $\|\cdot\|$ is the supremum norm on the coordinates.
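To make definition (1.1) concrete, here is a small numerical probe in the simplest case m = n = 1 with x = (1, α). It is a heuristic sketch, not a method from this paper, and the helper name `exponent_probe` is ours. For integer vectors p whose norm lies in a finite window, it records the largest value of $-\log\|p_1 + p_2\alpha\| / \log\|p\|$; a finite window can only suggest the value of β(x), never compute it.

```python
import math


def exponent_probe(alpha: float, r_min: int, r_max: int) -> float:
    """Heuristic probe of (1.1) for x = (1, alpha), i.e. m = n = 1.

    Returns the largest value of -log|p1 + p2*alpha| / log||p|| over integer
    vectors p = (p1, p2) with p2 >= 1 and r_min <= ||p||_inf <= r_max.
    For each p2 = q only p1 = -round(q*alpha) is tried: any other p1 gives
    |p1 + q*alpha| >= 1/2, so its ratio is at most log 2 / log||p||.
    """
    if r_min < 2:
        raise ValueError("r_min must be at least 2 so that log||p|| > 0")
    best = -math.inf
    for q in range(1, r_max + 1):
        p1 = -round(q * alpha)
        norm = max(abs(p1), q)
        if not r_min <= norm <= r_max:
            continue
        err = abs(p1 + q * alpha)
        if err == 0.0:
            return math.inf  # exact integer relation: infinite exponent
        best = max(best, -math.log(err) / math.log(norm))
    return best


print(exponent_probe(math.sqrt(2), 100, 2000))  # about 1.13, near n/m = 1
print(exponent_probe(0.5, 2, 10))               # inf
```

For α = √2 the recorded values stay slightly above the Lebesgue-generic value n/m = 1, whereas a rational α produces an exact integer relation and hence an infinite exponent.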
In the case where $x_1 = e_1, \dots, x_m = e_m$, where $e_1, \dots, e_m$ is the canonical basis of $\mathbb{R}^m$, and $(x_{m+1}, \dots, x_{m+n})$ is chosen on a submanifold of $(\mathbb{R}^m)^n \cong M_{m,n}(\mathbb{R})$, this question is the topic of Diophantine approximation on submanifolds of matrices, which studies the quality of approximation by integer vectors of the image of an integer vector under an m × n matrix chosen at random in some submanifold of $M_{m,n}(\mathbb{R})$.
When m = 1, this subject has been studied extensively, starting with a 1932 conjecture of Mahler [Mah42], which posited that the Veronese curve $M = \{(x, x^2, \dots, x^n) : x \in \mathbb{R}\}$ is extremal, i.e. that a random point on it has the same diophantine exponent as a random point in $\mathbb{R}^n$ chosen with respect to Lebesgue measure. This conjecture was proved by Sprindžuk [Spr64, Spr69]. We refer the reader to [Spr64, Spr69, BD99, BRV16] for background on Diophantine approximation on manifolds.
About twenty years ago, Kleinbock and Margulis [KM98] revolutionized the subject by introducing methods from the dynamics of homogeneous flows into this area. In particular they settled a conjecture of Sprindžuk by establishing that every analytic submanifold of $\mathbb{R}^n$ that is not contained in a proper affine subspace is extremal (see [BB96] for the state of the art before their work). This method is based on certain quantitative non-divergence estimates for diagonal flows on the space of lattices and takes its roots in the early work of Margulis [Mar75] pertaining to the non-divergence of unipotent flows on homogeneous spaces, in connection with Margulis' first proof of arithmeticity for non-uniform higher rank lattices. It was later further developed by Kleinbock in an important series of papers [Kle98b, Kle03, Kle08b, Kle10a]. We refer the reader to [Kle08a, Kle10b] for a nice introduction to these techniques.
Our first goal in this article is to answer a question of Beresnevich, Kleinbock and Margulis [BKM15] asking for the right criterion for a submanifold of matrices to be extremal. We were led to this problem after observing that a solution would enable us to compute diophantine exponents for dense subgroups of nilpotent Lie groups, in the spirit of our previous work [ABRdS15a]. Beyond extremality, our criterion allows us to effectively compute the exponent of any rationally defined submanifold of matrices. Further in the paper, we will derive consequences for nilpotent Lie groups. The results of this paper were announced in [ABRdS15b].

* * *

We view the (m + n)-tuple x as an m × (m + n) matrix. We note in passing that the exponent β(x) depends only on the kernel ker x. Therefore diophantine approximation on submanifolds of matrices is best phrased in terms of diophantine approximation on submanifolds of the grassmannian, here the grassmannian of n-planes in $\mathbb{R}^{m+n}$. We will keep this observation as a guiding principle, but will always phrase everything in terms of matrices. We now describe a family of obvious obstructions to extremality.

Definition 1.1 (Pencils). Given a vector subspace $W \le \mathbb{R}^{m+n}$ and an integer $r \ge 0$, the pencil $P_{W,r}$ is the set of matrices $x \in M_{m,m+n}(\mathbb{R})$ such that $\dim xW \le r$. The pencil is called rational if W is defined over Q.

Suppose that x lies in a rational pencil $P_{W,r}$ with $r \ge 1$. Then $\beta(x) \ge \frac{\dim W}{r} - 1$.
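Indeed, since W is rational, there are roughly $R^{\dim W}$ integer points in the ball of radius R in W, but they get mapped into a ball of radius O(R) in the space xW, of dimension at most r. Comparing volumes, the ε-balls around these image points cannot all be disjoint if ε is at least of order $R^{1-\dim W/r}$; and if two of them meet, the difference $p - p'$ of the corresponding integer points is a nonzero integer vector of norm at most 2R with $\|x(p - p')\| \le 2\varepsilon$. Consequently, if
$$\frac{\dim W}{r} > \frac{m+n}{m} \qquad (1.2)$$
then the point x is not extremal, since n/m is the diophantine exponent of a random point of $M_{m,m+n}(\mathbb{R})$ chosen with respect to Lebesgue measure.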
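The pigeonhole bound above is simple rational arithmetic. The following sketch (function names are ours, purely illustrative) computes the lower bound $\dim W / r - 1$ for a rational pencil $P_{W,r}$ and tests condition (1.2). The pencil data are hypothetical inputs: deciding which pencils contain a given x is the real work and is not attempted here.

```python
from fractions import Fraction


def pencil_lower_bound(dim_w: int, r: int) -> Fraction:
    """Pigeonhole lower bound dim W / r - 1 for beta(x), x in a rational pencil P_{W,r}."""
    if r < 1:
        # For a nonzero rational W, r = 0 means x kills a nonzero integer vector,
        # so beta(x) is infinite; we exclude this degenerate case.
        raise ValueError("r must be at least 1")
    return Fraction(dim_w, r) - 1


def is_constraining(dim_w: int, r: int, m: int, n: int) -> bool:
    """Condition (1.2): dim W / r > (m + n) / m, i.e. the bound exceeds n/m."""
    return Fraction(dim_w, r) > Fraction(m + n, m)


# Hypothetical data: m = 2, n = 2, so the Lebesgue-generic exponent is n/m = 1.
print(pencil_lower_bound(3, 1))     # 2
print(is_constraining(3, 1, 2, 2))  # True: a 3-dim rational W squeezed into a line
print(is_constraining(4, 2, 2, 2))  # False: this is the whole space M_{2,4}(R)
```

Exact fractions are used so that the strict inequality in (1.2) is tested without rounding error.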
Pencils satisfying (1.2) will be called constraining pencils, because they constrain the diophantine exponent away from its extremal value. A pencil is called proper if it is not all of $M_{m,m+n}(\mathbb{R})$. When viewed in the grassmannian (i.e. looking at the set of kernels ker x, with x in a pencil), pencils are well-studied objects: they are a certain kind of Schubert varieties (see e.g. [GH94, ch. 1.5] and Section 4.2 below).
When the Zariski closure of M is defined over Q, it turns out that these obvious obstructions are the only ones, and the following is our first main result.
Theorem 1.2 (Exponent for rational manifolds). Let m, n be positive integers and M be a connected analytic submanifold of $M_{m,m+n}(\mathbb{R})$. Assume that the Zariski closure of M is defined over Q. Then for Lebesgue almost every point x ∈ M the diophantine exponent is rational and equals
$$\beta(x) = \max\Big\{\frac{\dim W}{r} - 1 : M \subset P_{W,r}\Big\}. \qquad (1.3)$$
Moreover, the maximum in the right-hand side is achieved for a rational W.
Note that the maximum in (1.3) is always achieved, because the quantity $\frac{\dim W}{r} - 1$ takes values in a finite set of rational numbers; the important point in the second part of the theorem is that some rational subspace achieves the maximum.
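Once the pencils containing M have been identified (by hand: the sketch below does not find them), evaluating the right-hand side of (1.3) is a finite maximum of rational numbers, which is one way to see why the generic exponent is rational. A minimal sketch, with a hypothetical helper name and a hypothetical list of pairs (dim W, r):

```python
from fractions import Fraction


def generic_exponent(pencils, m, n):
    """Right-hand side of (1.3), given pairs (dim W, r) of pencils P_{W,r} containing M.

    The whole space M_{m,m+n}(R) equals P_{R^{m+n}, m}, so it is always included;
    the result is therefore at least n/m, and it is a rational number.
    """
    candidates = [(m + n, m)] + [(d, r) for d, r in pencils if r >= 1]
    return max(Fraction(d, r) - 1 for d, r in candidates)


# Hypothetical list of pencils containing M, for m = 2, n = 2.
print(generic_exponent([(3, 1), (2, 1)], m=2, n=2))  # 2
print(generic_exponent([], m=2, n=2))                # 1, i.e. M is extremal
```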
The theorem shows in particular that the generic value of β(x) is constant, and can be effectively computed once the pencils containing M are identified. We also note that this value is the smallest possible value for the exponent of an arbitrary point on M, because of the Dirichlet or pigeonhole type argument explained above.
When speaking of Zariski closure, algebraic subsets and algebraic varieties in this paper, we will always consider these notions in real algebraic geometry, and we refer the reader to the textbook [BCR98] for definitions and basic properties. By Lebesgue measure on M we mean the top-dimensional Hausdorff measure of the subset M of $M_{m,m+n}(\mathbb{R})$. A real algebraic variety is said to be defined over Q if it is the set of zeroes of a family of polynomials (in the matrix entries) with rational coefficients.

We immediately conclude that, when the Zariski closure of M is defined over Q, M is extremal if and only if it is not contained in any constraining pencil. Theorem 1.2 and its proof hold verbatim for more general measures than Lebesgue measure on an analytic submanifold, which we call locally good measures (see Definition 5.2.2).
The above theorem answers a question discussed at the end of the original paper of Kleinbock and Margulis [KM98, §6.2] and also raised in the problem list [Gor07, §9.1], as well as in [KMW10, p. 23] and [BKM15, Problem 1]. The papers [KMW10] and [BKM15] proposed other sufficient criteria for extremality of an analytic submanifold of $M_{m,n}(\mathbb{R})$, but these fall short of being optimal. For example, the weak non-planar condition of [BKM15] does not suffice to establish extremality in the applications to nilpotent groups described below. In our language, this condition is equivalent to not being contained in any proper pencil, which is strictly stronger than not being contained in any constraining pencil (see Example 6.2.3).
We now no longer assume the Zariski closure to be defined over Q. In three papers [Kle03], [Kle08b] and [Kle10a] Kleinbock studied this situation. He showed in particular that the diophantine exponent of a random point on a connected analytic submanifold of $M_{m,n}(\mathbb{R})$ takes the same value almost everywhere. In the case where m = 1 he showed that this almost sure value depends only on the affine span of the submanifold in $\mathbb{R}^n$. In other words, a submanifold inherits its exponent from its affine span. Another remarkable observation he made, in [Kle08b] for $\mathbb{R}^n$ and in [Kle10a, Theorem 1.4] for matrices, is that the diophantine exponent of a random point is also the smallest (i.e. worst) diophantine exponent of any point on the submanifold. In the context of matrices, one needs to find the right replacement for the affine span. We show the following: The existence of a generic value for β(x), which is the same for almost every point x of M, and the identity (1.4) are due to Kleinbock [Kle10a, Theorem 1.4.a]. When we say that the generic exponent depends only on the Plücker closure or on the Zariski closure of M, this means in particular that the generic value of the exponent of a random point of H(M) or of the Zariski closure of M (considered with respect to the Lebesgue measure on these algebraic varieties) coincides with that of M.
The Plücker closure of a subset S in $M_{m,m+n}(\mathbb{R})$ is the set of all matrices x whose full list of minors satisfies every linear equation satisfied by the lists of minors of all elements of S. It contains the Zariski closure of S and is contained in the Schubert closure of S, namely the intersection of all pencils containing S. See §5.2.
Theorem 1.4 has been established independently in the recent work of Das, Fishman, Simmons and Urbanski, see [DFSU15,Theorem 1.9]. In this work the authors introduce a very general class of measures, which is invariant under measure automorphisms and encompasses many fractal measures of dynamical origin, for which Theorem 1.4 is shown to hold, see the discussion after [DFSU15, Definition 1.2].
Our Theorems 1.2 and 1.5 will be proved in Section 5 for a smaller class of measures on Hom(V, E) (still more general than Lebesgue measure on analytic submanifolds), which we call locally good measures and are basically the matrix analogue of the friendly measures introduced by Kleinbock, Lindenstrauss and Weiss in [KLW04].
We also stress that all of our results will be proved for submanifolds of Hom(V, E), where V and E are finite-dimensional real vector spaces of arbitrary dimension (we will not assume dim V > dim E as in this introduction).
We further show the following general inequality.

Theorem 1.5 (Bounds for the exponent). Let M be a connected analytic submanifold of $M_{m,m+n}(\mathbb{R})$. Then for almost every x ∈ M with respect to Lebesgue measure,
$$\max\Big\{\frac{\dim W}{r} - 1 : M \subset P_{W,r},\ W \text{ rational}\Big\} \le \beta(x) \le \max\Big\{\frac{\dim W}{r} - 1 : M \subset P_{W,r}\Big\}. \qquad (1.5)$$
The maximum is taken over rational pencils on the left-hand side and arbitrary pencils on the right-hand side. In particular:

Corollary 1.6. If M is not contained in any constraining pencil, then M is extremal.
When the Zariski closure is defined over Q, Theorem 1.2 reduces the determination of the exponent to a purely algebraic question: determining the maximum in (1.3). This may still be a challenging task; however, the analysis is greatly simplified by the following property of the pencils realizing the maximum in (1.3).
Proposition 1.7. There is a unique subspace $W \le \mathbb{R}^{m+n}$ of maximal dimension such that $M \subset P_{W,r}$ and
$$\frac{\dim W}{r} = \max\Big\{\frac{\dim W'}{r'} : M \subset P_{W',r'}\Big\}.$$
This fact is a consequence of a very general lemma, which we call the submodularity lemma and which runs as follows.
Lemma 1.8 (Submodularity lemma). Let V be a finite-dimensional vector space, and suppose that $\varphi : \mathrm{Grass}(V) \to \mathbb{N}$ is a non-decreasing (for set inclusion) and submodular function, i.e. for any two vector subspaces U and W we have
$$\varphi(U + W) + \varphi(U \cap W) \le \varphi(U) + \varphi(W).$$
Then the maximum $\max_W \frac{\dim W}{\varphi(W)}$ is attained at a unique subspace of maximal dimension.
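In particular this applies to the right-hand side in (1.5), because the function $W \mapsto \max_{x \in M} \dim xW$ is submodular. Indeed, for a single x we have $x(U + W) = xU + xW$ and $x(U \cap W) \subset xU \cap xW$, so $\dim x(U+W) + \dim x(U \cap W) \le \dim xU + \dim xW$; and since M is connected and analytic, each of the four maxima is attained on a dense open subset of M, so a single x realizes all four at once.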
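The following sketch checks the submodularity inequality, and the conclusion of Lemma 1.8, on a toy case. As a simplifying assumption we only consider coordinate subspaces $W_A = \mathrm{span}(e_i : i \in A)$ and a single fixed integer matrix x. Since $W_A + W_B = W_{A \cup B}$ and $W_A \cap W_B = W_{A \cap B}$, the inequality for these subspaces is a special case of the one above, and $\varphi(W_A) = \dim xW_A$ is the rank of the columns of x indexed by A. If no column of x vanishes, submodularity implies that the maximizers of $|A|/\varphi(W_A)$ are closed under union, so there is a unique maximizer of largest size, containing all the others. All helper names are ours.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def phi(x, cols):
    """dim x(W_A) for W_A = span(e_i : i in A): the rank of the columns of x in A."""
    return rank([[row[i] for row in x] for i in cols]) if cols else 0


def subsets(n, min_size=0):
    return [frozenset(c) for k in range(min_size, n + 1)
            for c in combinations(range(n), k)]


def is_submodular(x):
    """Check phi(A | B) + phi(A & B) <= phi(A) + phi(B) for all coordinate subspaces."""
    all_sets = subsets(len(x[0]))
    return all(phi(x, a | b) + phi(x, a & b) <= phi(x, a) + phi(x, b)
               for a in all_sets for b in all_sets)


def maximizers(x):
    """Nonempty A maximizing |A| / phi(W_A); assumes x has no zero column."""
    ratios = {a: Fraction(len(a), phi(x, a)) for a in subsets(len(x[0]), min_size=1)}
    best = max(ratios.values())
    return [a for a, v in ratios.items() if v == best]


x = [[1, 2, 0, 1],
     [0, 0, 1, 1]]
print(is_submodular(x))                          # True
print(sorted(sorted(a) for a in maximizers(x)))  # [[0, 1], [0, 1, 2, 3]]
```

Here both {0, 1} and the full set achieve the ratio 2, and the larger one contains the smaller, as the lemma predicts.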
In the presence of symmetry, this lemma greatly simplifies the algebraic problem of determining the right-hand side in (1.5), and hence the exponent. Indeed, if φ is G-invariant for the action of some group G on the grassmannian (preserving dimension), then the submodularity lemma implies that the subspace realizing the maximum is G-invariant. This observation is used to derive Theorem 1.2 from Theorem 1.5 by means of the Galois group action on V. Here is an example illustrating the use of the submodularity lemma for another group action, in combination with Theorem 1.2.

Example 1.9 (The Veronese manifold in matrices). Let p, s ∈ N. What is the infimum β of all β > 0 such that, for almost every $M \in M_s(\mathbb{R})$, the inequality
$$\|v_0 I + v_1 M + \cdots + v_p M^p\| \le \|v\|^{-\beta}$$
has at most finitely many integer solutions $v \in \mathbb{Z}^{p+1}$? The Mahler conjecture proved by Sprindžuk, mentioned earlier, is the case s = 1, and then the answer is β = p. When s > 1, this problem fits into the scheme of diophantine approximation on submanifolds of matrices: set $V = \mathbb{R}^{m+n} = \mathbb{R}_p[X]$, the space of polynomials of degree at most p. Naturally, $\mathbb{Z}^{m+n}$ is identified with the lattice of polynomials with integer coefficients. Set $E = \mathbb{R}^m = M_s(\mathbb{R})$, and let $\mathcal{M} \subset \mathrm{Hom}(V, E) \cong M_{m,m+n}(\mathbb{R})$ be the set of all linear maps $P \mapsto P(M)$. In terms of matrices, $\mathcal{M}$ is just the Veronese manifold $\{(I, M, \dots, M^p) : M \in M_s(\mathbb{R})\}$. Using Theorem 1.2, the exponent β can be determined once the pencils containing $\mathcal{M}$ are determined. But the group G of affine transformations of the real line acts on V by substitution of the variable and preserves $\mathcal{M}$. So in order to compute the maximum in (1.3), we only need to consider G-invariant subspaces. This is easily done and determines β explicitly.

Of course, when the Zariski closure is not defined over Q, it can happen that the almost sure exponent is not given by either side of (1.5).
When the Zariski closure of M is not defined over Q, the question remains to find an appropriate hull S(M) of M whose almost sure exponent is the same as that of M, in the spirit of Kleinbock's work [Kle03], [Kle08b] on submanifolds of $\mathbb{R}^n$ with irrational affine span. The natural candidate is what we call the Schubert closure of M, namely the intersection of the pencils containing M. It is an algebraic variety, which is in general bigger than the Zariski closure of M. In the classical setting of submanifolds of $M_{m,n}(\mathbb{R})$, this is the same space as the space H(M) considered by Beresnevich, Kleinbock and Margulis in [BKM15, §7.2]. The Plücker closure considered here and in [DFSU15] is in general smaller than the Schubert closure.
The results above are a first step towards a more complete study of diophantine approximation on submanifolds of matrices, which we do not undertake here and which would be concerned with badly approximable points (such as in [Bak76, Kle98a]), improvements of Dirichlet's theorem (see [KW08]), Khintchine-type theorems (in the spirit of [BKM01, BBKM02]), Jarník-type theorems, etc. Let us only mention that the methods introduced in the current paper also apply to the study of multiplicative diophantine approximation as in the original work of Kleinbock and Margulis [KM98]; in particular, one can derive a satisfying criterion for a manifold to be strongly extremal. This will be explained in a forthcoming paper of the first and last authors with Das and Simmons [BDSS].
In order to apply the above results to the computation of diophantine exponents of random subgroups of nilpotent Lie groups, it is necessary (in the general case) to extend them to the setting of quasi-norms and consider weighted diophantine approximation, in which different directions are assigned possibly different weights. We now briefly describe what this means, but later in the paper we will prove our results in this generalized setting.
As above, $M = \Phi(U)$ is a connected analytic submanifold of Hom(V, E), where V and E are two finite-dimensional real vector spaces, $U \subset \mathbb{R}^N$ is a connected open set and $\Phi : U \to \mathrm{Hom}(V, E)$ is an analytic map. We endow V and E with quasi-norms, defined in terms of a basis $(u_1, \dots, u_d)$ of V, a basis $(u'_1, \dots, u'_e)$ of E, and positive weights $\alpha_1 \ge \dots \ge \alpha_d > 0$ and $\alpha'_1 \ge \dots \ge \alpha'_e > 0$. Next we let Δ be the lattice $\mathbb{Z}u_1 + \dots + \mathbb{Z}u_d$, and we consider the following diophantine exponent for $x \in \mathrm{Hom}(V, E)$:
$$\beta(x) = \inf\{\beta > 0 : \|x(v)\| > \|v\|^{-\beta} \text{ for all but finitely many } v \in \Delta\}, \qquad (1.6)$$
where $\|x(v)\|$ and $\|v\|$ denote the quasi-norms on E and V respectively. Given two non-negative numbers a, b and a subspace W of V, we define in Section 4.4 a pencil $P_{W,a,b}$ in terms of two functions ψ and φ, defined on subspaces of V and of E respectively. The pencil $P_{W,a,b}$ is a certain closed algebraic subset of Hom(V, E) associated to our choice of quasi-norms on V and E, and coincides with the pencil defined in Definition 1.1 in the unweighted case. Pencils are indeed closely related to Schubert subvarieties of the grassmannian Grass(V) (see Section 4.2).
Theorem 1.10 (Weighted case). Theorems 1.2 and 1.5 hold more generally in the weighted (quasi-norm) setting. In particular for Lebesgue almost every (1.7) with equality when the Zariski closure of M is defined over Q.
We refer the reader to Theorems 5.1 and 5.2 for more complete statements. The Q-structure on Hom(V, E) used implicitly in the above theorem is the one induced by the Q-span of the bases of V and E chosen to define the quasi-norms. The condition that the Zariski closure of M be defined over Q is satisfied for example when the map Φ itself is a polynomial map with coefficients in Q (i.e. each matrix entry is such).
The dynamical method adapts well to the weighted case, as had already been observed by Kleinbock early on, for example in [Kle98b].

* * *
We now pass to the description of our results regarding diophantine approximation on nilpotent Lie groups. Inspired by the work of Gamburd, Jakobson and Sarnak [GJS99], we introduced in our previous paper [ABRdS15a] the notion of diophantine Lie group. We refer the reader to this paper for an introduction and background on the general question of diophantine approximation on Lie groups, and briefly recall here the basic definitions. We are given a connected Lie group G and a k-tuple of elements $g := (g_1, \dots, g_k)$. We consider the subgroup $\Gamma_g$ generated by $g_1, \dots, g_k$. Its elements can be represented by words w(g) in the elements $g_i$. We ask: how close to the identity can an element γ of $\Gamma_g$ be, in terms of the minimal length ℓ(γ) of a word that represents it? The following definition is natural. Let d(x, y) be a left-invariant Riemannian distance on G (or more generally a left-invariant geodesic metric) and let $V_g(n)$ be the number of elements in $\Gamma_g$ that can be represented by a word of length at most n.
Definition 1.11. The k-tuple $g = (g_1, \dots, g_k) \in G^k$ (or the subgroup $\Gamma_g$ it generates) is called diophantine if there exists β > 0 such that
$$d(\gamma, 1) \ge V_g(\ell(\gamma))^{-\beta} \qquad (1.8)$$
for all but at most finitely many group elements $\gamma \in \Gamma_g$.
For example, in G = (R, +) the pair g = (1, α) is diophantine if and only if α is a diophantine, i.e. non-Liouville, number. It is easy to check that this definition depends only on the subgroup $\Gamma_g$ and not on the particular generating set g, nor on the choice of metric. We say that the Lie group G is diophantine on k letters if almost every k-tuple in G (chosen independently at random with respect to Haar measure) is diophantine. We say that it is diophantine if it is diophantine on k letters for every k ≥ 1. When $G = \mathbb{R}^n$, $\Gamma_g$ is diophantine if and only if the n × k matrix whose column vectors are $g_1, \dots, g_k$ is diophantine, in the sense that its diophantine exponent (defined as in (1.1)) is finite. Hence the connection with diophantine approximation on matrices.
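In G = (R, +) with g = (1, α) and α irrational, the element γ = p + qα has word length ℓ(γ) = |p| + |q| with respect to {±1, ±α}, and $V_g(n) = 2n^2 + 2n + 1$. The sketch below (hypothetical helper names, numerical illustration only) probes the ratio $-\log d(\gamma, 0) / \log V_g(\ell(\gamma))$ that governs (1.8), over a finite window of word lengths; the identity element of this additive group is 0.

```python
import math


def word_growth(n):
    """V_g(n) for g = (1, alpha), alpha irrational: #{(p, q) in Z^2 : |p| + |q| <= n}."""
    return 2 * n * n + 2 * n + 1


def group_exponent_probe(alpha, len_min, len_max):
    """Largest -log|p + q*alpha| / log V_g(|p| + |q|) over word lengths in
    [len_min, len_max]. Only the best p = -round(q*alpha) is tried for each q >= 1;
    other choices of p give |p + q*alpha| >= 1/2 and hence tiny ratios."""
    best = -math.inf
    for q in range(1, len_max + 1):
        p = -round(q * alpha)
        length = abs(p) + q          # word length of gamma = p + q*alpha
        if not len_min <= length <= len_max:
            continue
        dist = abs(p + q * alpha)    # d(gamma, 0)
        if dist == 0.0:
            return math.inf
        best = max(best, -math.log(dist) / math.log(word_growth(length)))
    return best


print(group_exponent_probe(math.sqrt(2), 100, 3000))  # about 0.49, just below 1/2
```

For α = √2 one has $|p + q\sqrt{2}| \asymp 1/q$ along the convergents while $V_g(\ell)$ grows like $\ell^2$, so these ratios approach 1/2 from below.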
It is conjectured that semisimple real Lie groups are diophantine. This conjecture is open already for the smallest Lie groups, such as SO(3, R) (see [GHS+09, §6]). Remez-type inequalities [BG73] combined with the Borel-Cantelli lemma only yield a superexponential lower bound $\exp(-C\,\ell(\gamma)^2)$ for Lebesgue almost every g in a semisimple Lie group (see the work of Kaloshin and Rodnianski [KR01], who handled SU(2), although the method is general). It is also already an open problem to show that the affine group Aff(R) of the real line and the group of motions of the Euclidean plane $\mathrm{O}(2) \ltimes \mathbb{R}^2$ are diophantine. See the work of Varjú [Var14] for a very interesting recent result in this direction.
It is fairly easy to see that any nilpotent Lie group with a rational structure (for its Lie algebra) is diophantine, but in [ABRdS15a] we constructed examples (arising only in nilpotency class 6 and higher) of non-diophantine nilpotent Lie groups. They exist because of a particular feature of the representation theory of the general linear group $\mathrm{GL}_k$ acting on the free Lie algebra on k letters, namely the occurrence of multiplicities in the s-th homogeneous part for s ≥ 6.
For nilpotent Lie groups the growth function $V_g(n)$ grows polynomially like $n^\tau$, with an integer exponent τ given by the Bass-Guivarc'h formula (see [Bas72, Gui73]), and therefore changing the generating set of $\Gamma_g$ does not change the exponent β. So it makes sense to ask for the optimal β for which (1.8) holds. We call this the exponent of the subgroup $\Gamma_g$. This is the quantity we study here, with the help of Theorem 1.2 and Lemma 1.8.
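The Bass-Guivarc'h formula is easy to evaluate once the lower central series of Γ is known. In its standard form it gives $\tau = \sum_i i \cdot r_i$, where $\Gamma = \Gamma_1 \supset \Gamma_2 \supset \cdots$ is the lower central series and $r_i$ is the torsion-free rank of $\Gamma_i/\Gamma_{i+1}$. Note that the ranks refer to Γ, not to G: for instance $\mathbb{Z} + \mathbb{Z}\alpha \subset \mathbb{R}$ has rank 2. A one-line sketch (the function name is ours):

```python
def bass_guivarch_degree(ranks):
    """Growth degree tau = sum_i i * r_i, where r_i is the torsion-free rank of
    Gamma_i / Gamma_{i+1} along the lower central series Gamma_1 > Gamma_2 > ..."""
    return sum(i * r for i, r in enumerate(ranks, start=1))


print(bass_guivarch_degree([2]))     # 2: Z + Z*alpha in (R, +), V(n) = 2n^2 + 2n + 1
print(bass_guivarch_degree([2, 1]))  # 4: the integer Heisenberg group
```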
It turns out that the optimal exponent always exists.
Theorem 1.12 (Existence of the exponent). Let G be a connected and simply connected nilpotent real Lie group endowed with a left-invariant geodesic metric d. Then for each k ≥ 1, there is $\beta_k \in [0, +\infty]$ such that for almost every k-tuple $g \in G^k$ with respect to Haar measure, we have
$$\beta(g) = \beta_k,$$
where β(g) is the infimum of all β > 0 such that (1.8) holds.
While the property of being diophantine or not for a k-generated subgroup of G does not depend on the choice of metric near the identity, the exponent does. By a geodesic metric (a.k.a. length metric) we mean a distance that is defined in terms of the length of the shortest path between two points. It is well known that left-invariant geodesic metrics on connected Lie groups are all Carnot-Carathéodory-Finsler metrics induced by a norm on a generating subspace of the Lie algebra (this is Berestovskii's theorem [Ber88]). Riemannian metrics are of course examples of such, but in the context of nilpotent Lie groups, non-Riemannian Carnot-Carathéodory metrics are also very natural.
We will present two proofs of Theorem 1.12. The first is only valid when $k \ge \dim G/[G, G]$, and is based on a relatively simple general argument that relies on the ergodicity of the action of the group of automorphisms of the free Lie algebra on k-tuples of elements in $\mathfrak{g}$. The second proof, on the other hand, works for all k; it requires first translating the diophantine problem in terms of (weighted) diophantine approximation on submanifolds of matrices, and then using the techniques of homogeneous dynamics alluded to above.
This translation will allow us to determine $\beta_k$ for all rational nilpotent Lie groups endowed with a rational left-invariant geodesic metric (e.g. a Riemannian metric, see §7.3 for the definition). Our main result is the following.

Theorem 1.13 (Rationality and stability of the exponent). Let G be a connected and simply connected rational nilpotent real Lie group endowed with a rational left-invariant geodesic metric. Then for every k ≥ 1 the exponent $\beta_k$ is rational. Furthermore, there exist an integer $k_1$ and a rational function $F_{\mathfrak{g}} \in \mathbb{Q}(X)$ such that, for all $k \ge k_1$,
$$\beta_k = F_{\mathfrak{g}}(k).$$
The rational function $F_{\mathfrak{g}}(X)$ is a ratio P/Q, where P and Q are polynomials of the same degree with rational coefficients. This degree is equal to the nilpotency class of G.
The constant k 1 will be shown to depend only on dim g. The value of β k for small k may not fit the general pattern F g (k). The fact that it does for k large is an instance of the so-called representation stability, as per the notion investigated by Church, Ellenberg and Farb in a series of recent papers. Here stability occurs for the action of GL k on the free Lie algebra on k-letters, see [CF13, Corollary 5.7] for the case of interest to us.
In [ABRdS15a] we showed that a nilpotent Lie group G with Lie algebra $\mathfrak{g}$ is diophantine on k letters if and only if its Lie algebra of laws on k letters $L_{k,\mathfrak{g}}$ is a diophantine subspace of the free Lie algebra $F_k$ on k letters, endowed with its natural Q-structure. A law of $\mathfrak{g}$ is an element of the free Lie algebra that vanishes identically when the indeterminates are replaced with arbitrary elements of $\mathfrak{g}$.
The rational function $F_{\mathfrak{g}}(k)$ can be computed explicitly in terms of $\mathfrak{g}$ and the representation theory of the $\mathrm{GL}_k$-action by linear substitutions on the relatively free Lie algebra $F_k/L_{k,\mathfrak{g}}$. In Sections 7 and 8 we will give exact formulas in a number of examples. In particular, we compute $\beta_k$ for metabelian groups, for the group of unipotent upper-triangular matrices and for certain free nilpotent groups.
In [ABRdS15a, §5.2] we asked whether β k has a limit as k tends to infinity. With the above tools it is possible to answer this question, in the case where the nilpotent Lie group G is rational.
Here $G^{(s)}$ is the last step in the descending central series of G. Although we again assume here that the Lie algebra $\mathfrak{g}$ is defined over Q, we believe that the above limit is always rational without such a rationality assumption, provided of course that G is diophantine:

Conjecture 1.15 (Rationality conjecture). Suppose G is a connected nilpotent real Lie group endowed with a left-invariant geodesic metric. Assume that G is diophantine. Then the limit $\lim_{k \to +\infty} \beta_k$ exists and is a rational number.
Recall that G is said to be diophantine if it is diophantine on k-letters for every positive integer k. In [ABRdS15a] we constructed for each integer k a connected nilpotent Lie group that is diophantine on k letters, but not on k + 1 letters.
To use the diophantine results for submanifolds of matrices expounded at the beginning of this introduction, we will apply Theorem 1.10 with $U = \mathfrak{g}^k$, $V = F_k/L_{k,\mathfrak{g}}$, $E = \mathfrak{g}$ and Φ(x) the evaluation map at $x = (X_1, \dots, X_k) \in \mathfrak{g}^k$. The quasi-norm on V will be defined in terms of the descending central series of V, and the parameters $\alpha_i$ will be integers. The choice of a left-invariant geodesic metric on G will yield a quasi-norm on E, and Proposition 7.2.2 will show that the diophantine problem (1.8) translates precisely into (1.6). It turns out that the submodularity lemma is also of great help in the case of nilpotent groups; it will be used this time for the action of $\mathrm{GL}_k$ on $F_k/L_{k,\mathfrak{g}}$. This means that once the representation theory of $F_k/L_{k,\mathfrak{g}}$, viewed as a $\mathrm{GL}_k$-module, is understood well enough, the diophantine exponent can be computed. We will do just that in Section 8 for a number of concrete examples, including free nilpotent groups and metabelian nilpotent groups.
Part of the results proved in this paper were announced in [ABRdS15b]. The paper is organized as follows. In Section 2 we give the first proof of Theorem 1.12 on the existence of the exponent, via the ergodicity of the action on k-tuples of the group of rational points of the automorphism group of the free Lie algebra. Section 3 is devoted to an important example, the Heisenberg group: we prove Theorem 1.13 for this group by an ad hoc argument using a Remez-type inequality for quadratic forms. The limitations of this method for more general nilpotent groups illustrate the power of the method used later on. Section 5 is devoted to the Kleinbock-Margulis method and the Dani correspondence, which allows one to reformulate the diophantine approximation problem in terms of orbits in the space of lattices. The proof of Theorem 5.1 is given in Section 5, and Theorem 5.2 is derived in Section 6, after the submodularity lemma. In Section 7 we give a second proof of Theorem 1.12 and establish the rationality of the exponent (Theorem 1.13) in full generality, using the diophantine approximation results for submanifolds of matrices developed in the previous sections. In Section 8 we compute the exponent explicitly in a number of examples, using the representation theory of the free Lie algebra.
Acknowledgements. It is a pleasure to thank V. Beresnevich and D. Kleinbock for interesting discussions in relation to the topics of this paper. We are also grateful to the referee for his comments.

A zero-one law
We now turn to the problem of diophantine approximation in nilpotent Lie groups. The purpose of this section is to show the existence of a critical exponent for any simply connected nilpotent real Lie group (not necessarily defined over Q). Later, in Sections 3, 7 and 8, we will explain how to compute this exponent when the group is defined over the rationals.
In this section, G denotes an arbitrary connected simply connected nilpotent Lie group endowed with a left-invariant distance d(·, ·) inducing the topology. A finitely generated subgroup Γ of G will be called β-diophantine if there is a symmetric generating set S of Γ and a constant c > 0 such that for every integer n ≥ 1 (recall that $S^n$ denotes the set of products of n elements from S),
$$d(\gamma, 1) \ge c\,|S^n|^{-\beta} \quad \text{for all } \gamma \in S^n \setminus \{1\}. \qquad (2.1)$$
By the Bass-Guivarc'h formula [Bas72, Gui73], we know that, within positive multiplicative constants, $|S^n| \asymp n^\tau$ for some integer τ depending on Γ only. It follows that if (2.1) holds for some generating set S, then it will also hold for any other generating set, possibly with a different constant c, but with the same β. We can therefore define the diophantine exponent of a finitely generated subgroup Γ of G by
$$\beta(\Gamma) = \inf\{\beta \ge 0 : \Gamma \text{ is } \beta\text{-diophantine}\}.$$

Theorem 2.1 (Zero-one law). Let G be a simply connected nilpotent Lie group whose abelianization has dimension d. Let k ≥ d be an integer. Given β ≥ 0, the set of k-tuples $g = (g_1, \dots, g_k)$ generating a group $\Gamma_g$ that is β-diophantine is either null or co-null for the Haar measure on $G^k$.
In particular, there is a number $\beta_k \in [0, +\infty]$ such that $\beta(\Gamma_g) = \beta_k$ for almost every k-tuple $g \in G^k$.
This shows that Theorem 2.1 implies Theorem 1.12 from the introduction in the case k ≥ d. An alternative proof (including the case k < d) will be given in Section 7. We record here the following open problems:
(1) Is the set of k-tuples g such that $\Gamma_g$ is $\beta_k$-diophantine null or co-null? Consistency with the case G = R, where it is known that badly approximable tuples have zero measure, hints that the answer ought to be null, provided
(2) Is there a Jarník-type theorem for k-tuples, i.e. a formula for the Hausdorff dimension of the set of k-tuples g such that $\beta(\Gamma_g) > \beta$, for any given β?
The rest of this section is devoted to the proof of Theorem 2.1, which is based on an ergodicity argument.
2.1. Ergodic action on $\mathfrak{g}^k$ by rational automorphisms. Let $F_{k,s}$ be the free s-step nilpotent Lie algebra on k generators. The group $\mathrm{Aut}(F_{k,s})$ of Lie algebra automorphisms of $F_{k,s}$ is an algebraic group defined over Q. If $\alpha \in \mathrm{Aut}(F_{k,s})(\mathbb{R})$ and $x_1, \dots, x_k$ are free generators of $F_{k,s}$, then for each $i = 1, \dots, k$ we let $\alpha_i = \alpha(x_i)$, and note that for every $r \in F_{k,s}(\mathbb{R})$, viewed as a Lie polynomial $r(x_1, \dots, x_k)$,
$$\alpha(r) = r(\alpha_1, \dots, \alpha_k).$$
Let $\mathfrak{g}$ be an s-step nilpotent real Lie algebra. The group $\mathrm{Aut}(F_{k,s})(\mathbb{R})$ acts on k-tuples in $\mathfrak{g}^k$ as follows:
$$\alpha \cdot (X_1, \dots, X_k) = (\alpha_1(X_1, \dots, X_k), \dots, \alpha_k(X_1, \dots, X_k)).$$
Note that this action is algebraic and preserves the measure class of the Lebesgue measure λ on $\mathfrak{g}^k$: for $g \in \mathrm{Aut}(F_{k,s})$, the pushforward $g_*\lambda$ is absolutely continuous with respect to λ, with density given by the Radon-Nikodym cocycle c(g, X). Moreover, the Radon-Nikodym cocycle c(g, X) is continuous in $(g, X) \in \mathrm{Aut}(F_{k,s})(\mathbb{R}) \times \mathfrak{g}^k$. In fact $\mathrm{Aut}(F_{k,s})(\mathbb{R})$ preserves the commutator ideal $[F_{k,s}, F_{k,s}]$ and thus acts on the quotient $F_{k,s}/[F_{k,s}, F_{k,s}]$, which is isomorphic to $\mathbb{R}^k$. This yields a natural epimorphism from the group $\mathrm{Aut}(F_{k,s})(\mathbb{R})$ onto $\mathrm{GL}_k(\mathbb{R})$ with unipotent kernel; the cocycle is independent of X and given by the determinant of the image of g under this epimorphism.
We are going to show:

Proposition 2.1.1 (Ergodic action by automorphisms). If $k \ge d = \dim(\mathfrak{g}/[\mathfrak{g}, \mathfrak{g}])$, then the action of $\mathrm{Aut}(F_{k,s})(\mathbb{Q})$ on $\mathfrak{g}^k$ is ergodic.
To prove this, we first recall: Lemma 2.1.2. Let G be a real Lie group, H ≤ G a closed subgroup and ν a quasi-invariant measure on G/H with continuous Radon-Nikodym cocycle. Then every dense subgroup of G acts ergodically on G/H.
. Then for every continuous and compactly supported function φ on G/H and every sequence of elements $g_n \in G$ converging to $g \in G$, the sequence of functions $x \mapsto \varphi(g_n x)c(g_n, x)$ converges uniformly to $\varphi(gx)c(g, x)$ on G/H. It follows that for every $f \in L^\infty(G/H)$, g n f, φ : By Fubini's theorem, Ω has measure zero, and ν-almost every horizontal fiber Ω

Proof of Proposition 2.1.1. The group $\mathrm{Aut}(F_{k,s})$ is an algebraic group defined over Q. Hence the group of Q-points $\mathrm{Aut}(F_{k,s})(\mathbb{Q})$ is dense in the group of R-points $\mathrm{Aut}(F_{k,s})(\mathbb{R})$ (in this case this can be checked directly on the reductive part, which is $\mathrm{GL}_k$, and on the unipotent part). Therefore it suffices to show that $\mathrm{Aut}(F_{k,s})(\mathbb{R})$ admits a Zariski open orbit on $\mathfrak{g}^k$ when $k \ge d = \dim(\mathfrak{g}/[\mathfrak{g}, \mathfrak{g}])$. Indeed, the complement of this orbit has Lebesgue measure zero, while the orbit itself is a homogeneous space G/H of $G := \mathrm{Aut}(F_{k,s})(\mathbb{R})$, with the Lebesgue measure coming from $\mathfrak{g}^k$ as quasi-invariant measure. Lemma 2.1.2 then implies that $\mathrm{Aut}(F_{k,s})(\mathbb{Q})$ acts ergodically.
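To see that $\mathrm{Aut}(F_{k,s})(\mathbb R)$ admits a Zariski open orbit when $k \ge d$, observe that any two $k$-tuples $X, X' \in \mathfrak g^k$ whose reductions modulo $[\mathfrak g, \mathfrak g]$ span $\mathfrak g/[\mathfrak g, \mathfrak g]$ as a vector space must lie in the same orbit of $\mathrm{Aut}(F_{k,s})(\mathbb R)$. Indeed, since $k \ge d = \dim(\mathfrak g/[\mathfrak g, \mathfrak g])$, we can find an element $g \in \mathrm{GL}_k(\mathbb R)$ such that $gX$ and $X'$ have the same reduction modulo $[\mathfrak g, \mathfrak g]$. We can thus assume that $X' - X$ belongs to $[\mathfrak g, \mathfrak g]^k$. Now, the fact that the tuple $X = (X_1, \dots, X_k)$ generates $\mathfrak g$ modulo $[\mathfrak g, \mathfrak g]$ implies that every element of $[\mathfrak g, \mathfrak g]$ can be written as $\alpha(X_1, \dots, X_k)$ for some $\alpha \in [F_{k,s}, F_{k,s}]$. Therefore $X' - X = (\alpha_1(X), \dots, \alpha_k(X))$ for some $\alpha_i \in [F_{k,s}, F_{k,s}]$. But every endomorphism of $F_{k,s}$ of the form $x_i \mapsto x_i + \alpha_i$ with $\alpha_i \in [F_{k,s}, F_{k,s}]$ is an automorphism, and the one just defined maps $X$ to $X'$. To finish, simply note that when $k \ge d$, the set of $k$-tuples $X \in \mathfrak g^k$ whose reduction modulo $[\mathfrak g, \mathfrak g]$ spans $\mathfrak g/[\mathfrak g, \mathfrak g]$ is a non-empty Zariski open subset of $\mathfrak g^k$.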
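The spanning condition at the end of this argument is easy to test in coordinates: the reductions span $\mathfrak g/[\mathfrak g, \mathfrak g]$ exactly when the $k \times d$ matrix of reductions has rank $d$, so its complement is cut out by the vanishing of all $d \times d$ minors. The Python sketch below is only an illustration; it assumes that elements of $\mathfrak g$ are written in a basis whose first $d$ vectors project to a basis of $\mathfrak g/[\mathfrak g, \mathfrak g]$ and whose remaining vectors span $[\mathfrak g, \mathfrak g]$, and the helper names `rank` and `spans_abelianization` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank over Q of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a pivot in column c among the rows not yet used.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def spans_abelianization(tuple_in_g, d):
    """True if the reductions mod [g, g] of the given vectors span g/[g, g].

    Hypothetical convention: the first d coordinates of each vector give its
    reduction modulo [g, g], the remaining ones its component in [g, g].
    """
    reductions = [list(X[:d]) for X in tuple_in_g]
    return rank(reductions) == d


# Heisenberg Lie algebra: basis X, Y, Z with [X, Y] = Z, so d = 2.
print(spans_abelianization([(1, 0, 5), (0, 1, 7)], 2))  # True
print(spans_abelianization([(1, 2, 0), (2, 4, 1)], 2))  # False: proportional reductions
```

A generic tuple passes this test, which is the Zariski-open, non-empty condition used above.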
2.2. Critical exponent of a nilpotent Lie group. We now prove Theorem 2.1. It will be a consequence of Proposition 2.1.1 and Proposition 2.2.1 below. In the statement of Proposition 2.2.1, we identify the connected and simply connected nilpotent Lie group $G$ with its Lie algebra $\mathfrak g$ via the exponential map $\exp : \mathfrak g \to G$, which is a diffeomorphism. This allows us to view $\mathrm{Aut}(F_{k,s})(\mathbb R)$ as acting on $G^k$ rather than $\mathfrak g^k$.

Proposition 2.2.1. Given $\beta \ge 0$, the set of $\beta$-diophantine $k$-tuples in $G^k$ is Lebesgue measurable and invariant under the action of $\mathrm{Aut}(F_{k,s})(\mathbb Q)$.
Recall that two groups $\Gamma$ and $\Gamma'$ are commensurable if their intersection has finite index in both $\Gamma$ and $\Gamma'$. We divide the proof of Proposition 2.2.1 into two lemmas.
Lemma 2.2.2. Let G be a simply connected nilpotent Lie group equipped with a left-invariant distance d(·, ·), and Γ a finitely generated subgroup of G. If Γ is β-diophantine, then any subgroup of G commensurable to Γ is also β-diophantine.
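Proof. Let $\Gamma$ be $\beta$-diophantine and $\Gamma'$ commensurable to $\Gamma$. Then $\Gamma \cap \Gamma'$ has finite index in $\Gamma'$, and therefore there exists a normal subgroup $\Gamma_0 \lhd \Gamma'$ contained in $\Gamma \cap \Gamma'$ that has finite index in $\Gamma'$. In particular, for some integer $p$, every element $\gamma \in \Gamma'$ satisfies $\gamma^p \in \Gamma_0$. Moreover, $\Gamma_0$ is included in $\Gamma$, so it is $\beta$-diophantine. Let $S'$ and $S_0$ be symmetric generating sets for $\Gamma'$ and $\Gamma_0$, respectively. Since $\Gamma_0$ has finite index in $\Gamma'$, there exists a constant $C$ such that for every integer $n \ge 1$, $\Gamma_0 \cap S'^n \subset S_0^{Cn}$. Now suppose $\gamma \in \Gamma'$ is an element of $S'^n \setminus \{1\}$. Then $\gamma^p$ is an element of $\Gamma_0 \cap S'^{pn} \subset S_0^{Cpn}$, which is non-trivial because $G$ is torsion-free, and therefore
$$d(\gamma^p, 1) \gg |S_0^{Cpn}|^{-\beta}.$$
Using that $|S_0^n|$ grows polynomially and the fact that $d(\gamma^p, 1) \le p\, d(\gamma, 1)$ (by left-invariance of $d$ and the triangle inequality), we obtain $d(\gamma, 1) \gg |S_0^n|^{-\beta}$, where $\gg$ means $\ge$ up to a positive multiplicative constant, and in turn, since $S_0 \subset S'^c$ for some integer $c$, $d(\gamma, 1) \gg |S'^n|^{-\beta}$.

Our second lemma is as follows.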
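Lemma 2.2.3. Let $X = (X_1, \dots, X_k)$ and $X' = (X'_1, \dots, X'_k)$ be two $k$-tuples in $\mathfrak g^k$ lying in the same $\mathrm{Aut}(F_{k,s})(\mathbb Q)$-orbit. Then the subgroup of $G$ generated by $e^{X_1}, \dots, e^{X_k}$ is commensurable to the subgroup generated by $e^{X'_1}, \dots, e^{X'_k}$.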
Proof. Recall that if $\gamma_1, \dots, \gamma_k$ generate a nilpotent group, then for every integer $n \ge 1$, the subgroup generated by the powers $\gamma_1^n, \dots, \gamma_k^n$ has finite index [Rag72, Lemma 4.4]. By assumption, there is an integer $N$ such that each $NX'_i$ belongs to $F_{k,s}(\mathbb Z)(X_1, \dots, X_k)$, that is, $NX'_i$ is an integer linear combination of commutators in $X_1, \dots, X_k$. Moreover, recall [ABRdS15a, Lemma 3.5] that there is an integer $M$ such that $e^{M r(X_1, \dots, X_k)}$ belongs to the subgroup generated by $e^{X_1}, \dots, e^{X_k}$ for any $r \in F_{k,s}(\mathbb Z)$. It follows that each $(e^{X'_i})^{MN}$ belongs to the subgroup generated by $e^{X_1}, \dots, e^{X_k}$. Interchanging $X$ and $X'$, the lemma follows.
Proof of Proposition 2.2.1. It is clear from the definition that the set of $k$-tuples $g$ such that $\Gamma_g$ is $\beta$-diophantine is a Borel subset. Now suppose that $g = (g_1, \dots, g_k)$ is such that $\Gamma_g$ is $\beta$-diophantine, and let $g'$ be in the $\mathrm{Aut}(F_{k,s})(\mathbb Q)$-orbit of $g$. This means that $\log(g)$ and $\log(g')$ are in the same $\mathrm{Aut}(F_{k,s})(\mathbb Q)$-orbit, so Lemma 2.2.3 implies that $\Gamma_g$ and $\Gamma_{g'}$ are commensurable. Since $\Gamma_g$ is $\beta$-diophantine, it follows from Lemma 2.2.2 that $\Gamma_{g'}$ is also $\beta$-diophantine.
Proof of Theorem 2.1. In view of Proposition 2.2.1, the set $D_\beta$ of $k$-tuples $g$ such that $\Gamma_g$ is $\beta$-diophantine is measurable and invariant under the action of $\mathrm{Aut}(F_{k,s})(\mathbb Q)$. Since this action is ergodic by Proposition 2.1.1, we conclude that $D_\beta$ is either null or conull. If $\beta_k$ denotes the infimum of all $\beta \ge 0$ for which $D_\beta$ is conull, then $D_\beta$ is conull when $\beta > \beta_k$ and null when $\beta < \beta_k$.
Remark 2.2.4. The property of being diophantine for a k-tuple g does not depend only on the projection of g on the abelianization of G. It is easy to construct examples showing that one may have two tuples g and g such that In particular, Aut(F k,s )(Q) in Proposition 2.2.1 cannot be replaced by the subgroup GL k (Q).
Remark 2.2.5. In Theorem 2.1, the distance d(·, ·) is only assumed to be left-invariant, and need not be geodesic. The alternative argument given in Section 7, which is based on quantitative non-divergence, requires the distance to be geodesic (and hence a Carnot-Carathéodory-Finsler metric by [Ber88]).

Critical exponent for the Heisenberg group
As a warm-up, we now present an explicit computation of the critical exponent of the 3-dimensional Heisenberg group. The method is elementary, using the Borel-Cantelli lemma combined with an ad-hoc Remez-type inequality for a certain family of quadratic forms (cf. Lemma 3.6 below). The relationship between this elementary method and the Kleinbock-Margulis type approach developed later in this paper will be discussed at the end of this section.
Here $G$ will always denote the 3-dimensional Heisenberg group, consisting of $3 \times 3$ unipotent upper-triangular matrices. It will be convenient for us to view $G$ as the space $\mathbb R^3$, endowed with the group law
$$(x, y, z) * (x', y', z') = (x + x', y + y', z + z' + xy').$$
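The group law can be checked against matrix multiplication, and it gives at once the commutator formula $[g, h] = (0, 0, xy' - x'y)$ for $g = (x, y, z)$ and $h = (x', y', z')$, which shows directly that commutators are central. The Python sketch below is illustrative; the function names are ours, and the matrix convention ($x$ and $y$ on the superdiagonal, $z$ in the top-right corner) is the one consistent with the group law above.

```python
def mul(g, h):
    """Group law (x, y, z) * (x', y', z') = (x + x', y + y', z + z' + x y')."""
    x, y, z = g
    xp, yp, zp = h
    return (x + xp, y + yp, z + zp + x * yp)


def inv(g):
    """Inverse for this group law: (x, y, z)^-1 = (-x, -y, -z + x y)."""
    x, y, z = g
    return (-x, -y, -z + x * y)


def comm(g, h):
    """Commutator [g, h] = g h g^-1 h^-1."""
    return mul(mul(mul(g, h), inv(g)), inv(h))


def as_matrix(g):
    """Unipotent upper-triangular matrix with x, y on the superdiagonal and z in the corner."""
    x, y, z = g
    return [[1, x, z], [0, 1, y], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


g, h = (1, 2, 3), (4, 5, 6)
print(matmul(as_matrix(g), as_matrix(h)) == as_matrix(mul(g, h)))  # True
print(comm(g, h))  # (0, 0, -3), i.e. z = x y' - x' y
```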
Recall the definition of the diophantine exponent of a finitely generated subgroup of G, made at the beginning of Section 2. We want to prove the following.
Theorem 3.1 (Critical exponent for the Heisenberg group). Let $k \ge 2$. Then for almost every $k$-tuple $g = (g_1, \dots, g_k)$ in $G^k$,

The diophantine exponent here is computed with respect to any left-invariant Riemannian metric on $G$ (equivalently, with respect to $\max\{|x|, |y|, |z|\}$). For the proof of Theorem 3.1, we will need a few definitions. Recall that if $w$ is a word on $k$ letters, i.e. an element of the free group $F_k$ over $k$ generators $x_1, \dots, x_k$, it induces a word map
$$w_G : G^k \to G, \qquad (g_1, \dots, g_k) \mapsto w(g_1, \dots, g_k),$$
where $w(g_1, \dots, g_k)$ is the element of $G$ obtained by substituting each letter $x_i$ by the element $g_i$. The group $F_{k,G}$ of word maps in $k$ letters on $G$ is defined to be the set of all such maps, with product law given by $w_G w'_G = (ww')_G$. The length $\ell(\omega)$ of a word map $\omega$ in $F_{k,G}$ is the minimal length of a word $w$ such that $\omega = w_G$.
In the case of the Heisenberg group, one can describe the group of word maps very explicitly. For any two elements $g$ and $h$ in $G$, $[g, h]$ denotes the commutator of $g$ and $h$, defined by $[g, h] = ghg^{-1}h^{-1}$. The subgroup $[G, G]$ generated by commutators coincides with the center $Z$ of $G$ and is the set of elements $(0, 0, z)$, for $z \in \mathbb R$.
Proposition 3.2 (Word maps on the Heisenberg group). Let $k \ge 2$. For each word map $\omega$ on $k$ letters on $G$, there exist integers $n_i$, $1 \le i \le k$, and $n_{ij}$, $1 \le i < j \le k$, such that for all $g = (g_1, \dots, g_k)$ in $G^k$,
$$\omega(g) = g_1^{n_1} \cdots g_k^{n_k} \prod_{1 \le i < j \le k} [g_i, g_j]^{n_{ij}}.$$
Moreover, the $n_i$ and $n_{ij}$ are uniquely determined by $\omega$, and there exists a constant $C > 0$ depending only on $k$ such that
$$\frac1C\Big(\max_i |n_i| + \max_{i<j} |n_{ij}|^{1/2}\Big) \le \ell(\omega) \le C\Big(\max_i |n_i| + \max_{i<j} |n_{ij}|^{1/2}\Big).$$
Proof. The existence of the $n_i$ and $n_{ij}$ is proved by elementary operations on a word representing $\omega$, using that all commutators lie in the center of $G$, because $G$ is 2-step nilpotent. To verify uniqueness, it suffices to show that no non-trivial family of integers $n_i$, $n_{ij}$ yields the trivial word map; this can be checked directly, by writing the word maps explicitly in the $(x, y, z)$ coordinates for $G$. The statement about the length of $\omega$ can also be proved directly. We leave the details to the reader.
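The normal form in Proposition 3.2 can be computed mechanically. Reading a word from left to right, each new letter $x_j^{\pm1}$ is moved to its place past the letters $x_l$ with $l > j$; by the identity $uv = vu[u, v]$ and the bilinearity of the commutator in a 2-step nilpotent group, each such move produces a central factor $[x_j, x_l]^{-a e}$. The Python sketch below is our own illustration (words are lists of pairs `(letter, ±1)` with letters numbered from 1, and all helper names are hypothetical); it computes the $n_i$ and $n_{ij}$ this way and checks the resulting identity at sample integer points, using the group law $(x, y, z) * (x', y', z') = (x + x', y + y', z + z' + xy')$.

```python
from itertools import combinations


def mul(g, h):
    x, y, z = g
    xp, yp, zp = h
    return (x + xp, y + yp, z + zp + x * yp)


def inv(g):
    x, y, z = g
    return (-x, -y, -z + x * y)


def power(g, n):
    """g^n for any integer n, by repeated multiplication."""
    if n < 0:
        g, n = inv(g), -n
    result = (0, 0, 0)
    for _ in range(n):
        result = mul(result, g)
    return result


def comm(g, h):
    return mul(mul(mul(g, h), inv(g)), inv(h))


def normal_form(word, k):
    """Exponents (n_i, n_ij) for a word given as a list of (letter, +-1), letters 1..k.

    Moving x_j^e to the left past x_l^a (l > j) creates [x_l, x_j]^(a e) = [x_j, x_l]^(-a e).
    """
    a = {i: 0 for i in range(1, k + 1)}
    n = {(i, j): 0 for i, j in combinations(range(1, k + 1), 2)}
    for j, e in word:
        for l in range(j + 1, k + 1):
            n[(j, l)] -= a[l] * e
        a[j] += e
    return a, n


def evaluate_word(word, gs):
    result = (0, 0, 0)
    for j, e in word:
        result = mul(result, gs[j - 1] if e == 1 else inv(gs[j - 1]))
    return result


def evaluate_normal_form(a, n, gs):
    """g_1^{n_1} ... g_k^{n_k} times the product of the [g_i, g_j]^{n_ij} (central factors)."""
    result = (0, 0, 0)
    for i in sorted(a):
        result = mul(result, power(gs[i - 1], a[i]))
    for (i, j), nij in sorted(n.items()):
        result = mul(result, power(comm(gs[i - 1], gs[j - 1]), nij))
    return result


gs = [(1, 2, 3), (-2, 5, 1), (3, -1, 4)]
word = [(2, 1), (1, 1)]  # x2 x1 = x1 x2 [x1, x2]^-1
print(normal_form(word, 3))  # ({1: 1, 2: 1, 3: 0}, {(1, 2): -1, (1, 3): 0, (2, 3): 0})
print(evaluate_word(word, gs) == evaluate_normal_form(*normal_form(word, 3), gs))  # True
```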
We are now ready to prove the following theorem, which will easily imply Theorem 3.1.
Theorem 3.3. Let $k \ge 2$.
• If $\alpha > k^2 - k - 2$, then for almost every $k$-tuple $g \in G^k$, $d(\omega(g), 1) \ge \ell(\omega)^{-\alpha}$ for all but finitely many $\omega \in F_{k,G}$.
• If $\alpha < k^2 - k - 2$, then for all $g \in G^k$, there are infinitely many $\omega \in F_{k,G}$ such that $d(\omega(g), 1) < \ell(\omega)^{-\alpha}$.

We decompose the proof into two lemmas, studying first the word maps $\omega$ on $G$ that are non-trivial modulo the center of $G$, i.e. those for which some $n_i$ is non-zero.
Lemma 3.4 (Diophantine property outside the derived subgroup). Let $k \ge 2$. If $\alpha > \frac k2 - 1$, then for almost every $k$-tuple $g$, we have $d(\omega(g), 1) \ge \ell(\omega)^{-\alpha}$ for all but finitely many word maps $\omega$ that are non-trivial modulo the center of $G$.
Proof. Suppose $g = (g_1, \dots, g_k)$ is chosen at random in $G^k$ according to the Haar measure on a compact subset. Then the projection $\bar g = (\bar g_1, \dots, \bar g_k)$ to $(G/Z)^k \simeq (\mathbb R^2)^k$ is a random $k$-tuple whose law is absolutely continuous with respect to the Lebesgue measure. By a standard application of the Borel-Cantelli lemma, we know that almost surely, if $\alpha > \frac k2 - 1$, then
$$\|n_1\bar g_1 + \dots + n_k\bar g_k\| \ge \big(\max_i |n_i|\big)^{-\alpha}$$
for all but finitely many $(n_1, \dots, n_k)$ in $\mathbb Z^k$. This proves the lemma.
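The threshold $\frac k2 - 1$ can be traced through the Borel-Cantelli sum. As a sketch, assuming only that the law of $\bar g$ has a bounded density with compact support (as is the case here), fix $\mathbf n = (n_1, \dots, n_k)$ with $\max_i |n_i| = |n_j| = N$. Fubini's theorem in the variable $\bar g_j$ and the change of variables $\bar g_j \mapsto n_j \bar g_j$ on $\mathbb R^2$ give the first bound below; counting integer vectors gives the second:
$$\mathbb P\Big(\Big\|\sum_{i=1}^k n_i\bar g_i\Big\| < N^{-\alpha}\Big) \ll \frac{N^{-2\alpha}}{N^{2}}, \qquad \#\{\mathbf n\in\mathbb Z^k : \max_i|n_i| = N\} \ll N^{k-1},$$
$$\sum_{\mathbf n \ne 0}\mathbb P\Big(\Big\|\sum_{i=1}^k n_i\bar g_i\Big\| < \big(\max_i|n_i|\big)^{-\alpha}\Big) \ll \sum_{N \ge 1} N^{k-1}\,N^{-2\alpha-2} = \sum_{N\ge1} N^{k-3-2\alpha},$$
which is finite exactly when $k - 3 - 2\alpha < -1$, that is, when $\alpha > \frac k2 - 1$.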
We now need to study word maps that are trivial modulo the center.
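Lemma 3.5. Let $k \ge 2$. If $\alpha > k^2 - k - 2$, then for almost every $k$-tuple $g$, we have $d(\omega(g), 1) \ge \ell(\omega)^{-\alpha}$ for all but finitely many non-trivial word maps $\omega$ that are trivial modulo the center of $G$.

Proof. By Proposition 3.2, such a word map has the form $\omega(g) = \prod_{i<j}[g_i, g_j]^{n_{ij}}$, and writing $g_i = (x_i, y_i, z_i)$, the group law gives $\omega(g) = (0, 0, q_n(g))$ with $q_n = \sum_{i<j} n_{ij}(x_iy_j - x_jy_i)$, a quadratic form without squares whose largest coefficient in absolute value is $\max|n_{ij}|$. If $\mu$ denotes the Lebesgue measure on the ball of radius 1 centered at 0 in $G^k$, Lemma 3.6 below, applied to $q_n$, bounds the $\mu$-measure of the set of $g$ for which $d(\omega(g), 1) < \ell(\omega)^{-\alpha}$. Moreover, by Proposition 3.2, $\ell(\omega) \gg (\max|n_{ij}|)^{1/2} = \|n\|^{1/2}$, where $n = (n_{ij})_{i<j}$, so this bound can be expressed in terms of $\|n\|$ alone. Since every word map trivial modulo the derived group corresponds to a unique $\frac{k(k-1)}{2}$-tuple of integers $(n_{ij})$, summing these bounds over all non-zero $n \in \mathbb Z^{k(k-1)/2}$ gives a convergent series when $\alpha > k^2 - k - 2$. The lemma then follows from the Borel-Cantelli lemma.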
We now state and prove the elementary lemma used in the above proof.
Lemma 3.6. Let $q$ be a quadratic form on $\mathbb R^d$, and assume that in some orthogonal basis, $q$ can be expressed without squares as
$$q(x) = \sum_{1 \le k < l \le d} a_{kl}\, x_k x_l.$$
If $\mu$ denotes the Lebesgue measure on the ball of radius 1 centered at 0 in $\mathbb R^d$, then for all $\varepsilon > 0$

Proof. Choose $k_0$ and $l_0$ such that $|a_{k_0 l_0}| = \max_{k,l} |a_{kl}|$. As we may permute the indices, we may assume without loss of generality that $(k_0, l_0) = (1, 2)$. Then write
$$q(x) = x_1 \langle a, x' \rangle + q'(x'),$$
where $x' = (x_2, \dots, x_d)$, $a = (a_{12}, \dots, a_{1d})$, $q'$ is a quadratic form in $x'$ alone, and $\langle \cdot, \cdot \rangle$ denotes the usual inner product in $\mathbb R^{d-1}$. Let $\lambda$ denote the Lebesgue measure on the real line. For Then write

We can now conclude the proofs of Theorems 3.3 and 3.1.
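Proof of Theorem 3.3. The first assertion follows from combining Lemmas 3.4 and 3.5, noting also that for $k \ge 2$ one has $\frac k2 - 1 \le k^2 - k - 2$. For the second assertion, note that we have $k(k-1)/2$ commutators $[g_i, g_j]$ lying in $Z \simeq \mathbb R$. Therefore, given a positive integer $n$, Dirichlet's pigeonhole argument shows that there exist integers $n_{ij}$, $1 \le i < j \le k$, not all zero, such that $|n_{ij}| \le n^2$ and
$$\Big|\sum_{i<j} n_{ij}\, z([g_i, g_j])\Big| \le C n^{-2\left(\frac{k(k-1)}{2} - 1\right)} = C n^{-(k^2 - k - 2)},$$
where $z(\cdot)$ denotes the central coordinate and $C$ is a constant depending only on $g$. Thus, for the word map $\omega : g \mapsto \prod_{i<j}[g_i, g_j]^{n_{ij}}$, which has length $\ell(\omega) \ll n$ by Proposition 3.2, we get $d(\omega(g), 1) \ll \ell(\omega)^{-(k^2-k-2)}$, which yields the second assertion.

Proof of Theorem 3.1. By Proposition 3.2, the number of word maps of length at most $n$ is bounded above and below by a positive constant times $n^{k + 2\cdot\frac{k(k-1)}{2}} = n^{k^2}$. Moreover, we know from [ABRdS15a, Lemma 2.5] that, for almost every $k$-tuple $g$, the group $\Gamma_g$ is isomorphic to $F_{k,G}$, so that, if $V_g(n)$ denotes the number of elements in the ball of radius $n$ with respect to the generating set $g$, there exist positive constants $c_1, c_2$ such that
$$c_1 n^{k^2} \le V_g(n) \le c_2 n^{k^2}.$$
Together with Theorem 3.3, this gives the value of the critical exponent stated in Theorem 3.1 for almost every $k$-tuple $g$.

Remark 3.0.6. Let $\phi : G^k \to \mathbb R^{k(k-1)/2}$ be the map $g \mapsto (z([g_i, g_j]))_{i<j}$. Lemma 3.5 is equivalent to saying that the pushforward under $\phi$ of the Haar measure on (a ball of) $G^k$ is extremal. So an alternative proof consists in using the Kleinbock-Margulis theorem [KM98, Theorem A]: all that is left to check is that the image of $\phi$ is not contained in a hyperplane.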
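The pigeonhole step can be checked by brute force in small cases. For $N$ real numbers $c_1, \dots, c_N$ and an integer $T \ge 1$, the $(T+1)^N$ sums $\sum_i n_i c_i$ with $0 \le n_i \le T$ lie in an interval of length $T\sum_i |c_i|$, so two of them fall into one of $(T+1)^N - 1$ equal subintervals; their difference gives a non-zero integer vector with $|n_i| \le T$ and $|\sum_i n_i c_i| \le T\sum_i |c_i|/((T+1)^N - 1) \ll T^{1-N}$. With $N = k(k-1)/2$, the $c_{ij}$ taken to be the central coordinates of the $[g_i, g_j]$ and $T = n^2$, this is of order $n^{-2(N-1)} = n^{-(k^2-k-2)}$. The Python sketch below (hypothetical function names, sample irrational inputs of our choosing) compares the guaranteed bound with the true minimum.

```python
import itertools
import math


def pigeonhole_bound(c, T):
    """Bound guaranteed by Dirichlet's argument: some non-zero integer vector n with
    |n_i| <= T satisfies |sum n_i c_i| <= T * sum|c_i| / ((T + 1)^N - 1)."""
    N = len(c)
    return T * sum(abs(ci) for ci in c) / ((T + 1) ** N - 1)


def best_small_combination(c, T):
    """Brute-force minimum of |sum n_i c_i| over non-zero n in [-T, T]^N."""
    best, best_n = math.inf, None
    for n in itertools.product(range(-T, T + 1), repeat=len(c)):
        if any(n):
            value = abs(sum(ni * ci for ni, ci in zip(n, c)))
            if value < best:
                best, best_n = value, n
    return best, best_n


c = [math.sqrt(2) - 1, math.sqrt(3) - 1, math.pi - 3]  # N = 3, as for k = 3 generators
T = 6
best, n = best_small_combination(c, T)
print(best, n, pigeonhole_bound(c, T))  # the minimum never exceeds the pigeonhole bound
```

The brute force costs $(2T+1)^N$ evaluations, so it is only meant for small $N$ and $T$; the pigeonhole argument itself is what gives the bound for all $n$.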
In the rest of this paper, this will be our approach to compute the critical exponent of an arbitrary connected simply connected rational nilpotent Lie group. Clearly there is little hope to use a direct elementary approach in the spirit of Lemma 3.5 to handle general nilpotent groups. We will reduce the problem to studying the extremality of certain maps from $G^k$ to $G$. If the target space of these maps were one-dimensional, the Kleinbock-Margulis theory would in general be enough. However, this is typically not the case, and one is thus naturally led to develop a suitable matrix analogue of their theory, which is what we do in the next two sections. The translation to a problem about (weighted) diophantine approximation on submanifolds of matrices is expounded in Section 7.
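For the Kleinbock-Margulis route of Remark 3.0.6, the non-degeneracy condition can be verified by an explicit construction. Assuming the map is $g \mapsto (x_iy_j - x_jy_i)_{i<j}$, i.e. the central coordinates of the commutators $[g_i, g_j]$ computed from the group law $(x, y, z) * (x', y', z') = (x + x', y + y', z + z' + xy')$, its image lies in no affine hyperplane as soon as the functions $1$ and $x_iy_j - x_jy_i$ are linearly independent. Evaluating them at the zero tuple and at the tuples with $x_i = 1$, $y_j = 1$ (all other coordinates $0$) gives an invertible matrix, since the pair $(i, j)$ is then the only one with a non-zero value; the Python sketch below (helper names hypothetical) checks this for small $k$.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Exact rank over Q by Gaussian elimination."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def commutator_coordinates(xs, ys):
    """Central coordinates x_i y_j - x_j y_i of the commutators [g_i, g_j], for i < j."""
    k = len(xs)
    return [xs[i] * ys[j] - xs[j] * ys[i] for i, j in combinations(range(k), 2)]


def affine_rank_of_samples(k):
    """Rank of the rows (1, phi(g)) at the zero tuple and at the tuples with x_i = y_j = 1."""
    rows = [[1] + commutator_coordinates([0] * k, [0] * k)]
    for i, j in combinations(range(k), 2):
        xs, ys = [0] * k, [0] * k
        xs[i], ys[j] = 1, 1
        rows.append([1] + commutator_coordinates(xs, ys))
    return rank(rows)


for k in range(2, 6):
    # Full rank k(k-1)/2 + 1 means the image is not contained in an affine hyperplane.
    print(k, affine_rank_of_samples(k), k * (k - 1) // 2 + 1)
```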

Quasi-norms, Schubert cells, and pencils
The main goal of the next three sections is to establish Theorems 1.2 and 1.5 from the introduction, which address a problem raised by Kleinbock and Margulis in [KM98, 6.2] and studied in [KMW10] and [BKM15]. The proof will be split into two parts, which correspond to Sections 5 and 6, respectively. First, the upper bound and the fact that the almost sure exponent depends only on the Zariski closure will be proved as Theorem 5.3.1 below, which is a consequence of the quantitative non-divergence estimates on the space of lattices, and the proof will be given in the slightly more general setup of locally good measures. Second, we shall show that equality holds when the Zariski closure is defined over Q. This will be Theorem 6.2.1, and will be obtained as a consequence of the submodularity lemma proved in Section 6. In the present section, we describe the geometric objects involved in our diophantine problem, and explain what Dirichlet's pigeonhole argument, giving the lower bound on the exponent, becomes in this general setting.
As before, V and E are two finite-dimensional real vector spaces, and ∆ is a lattice in V . Given a measure ν (say a probability measure) on Hom(V, E), we want to study the diophantine properties with respect to ∆ of a random point x in Hom(V, E) chosen according to the distribution ν. For the application to nilpotent groups that we develop in Section 7, we need to compute the diophantine exponent of an element x in Hom(V, E), which is defined in terms of certain quasi-norms on V and E.
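Definition 4.1. A quasi-norm (resp. local quasi-norm) on $V$ is a map $V \to \mathbb R_+$, $v \mapsto |v|$, for which there exist $C > 0$, a basis $u_1, \dots, u_d$ of $V$ and positive real numbers $\alpha_1, \dots, \alpha_d$ such that
$$\frac1C \max_{1 \le i \le d} |u_i^*(v)|^{1/\alpha_i} \le |v| \le C \max_{1 \le i \le d} |u_i^*(v)|^{1/\alpha_i}$$
outside a neighborhood of the origin (resp. in a neighborhood of the origin), where $u_1^*, \dots, u_d^*$ is the dual basis. Given a quasi-norm $|\cdot|$ on $V$ and a local quasi-norm $|\cdot|'$ on $E$, we define the diophantine exponent of $x$ in $\mathrm{Hom}(V, E)$ by
$$\beta(x) = \inf\{\beta > 0 : |xv|' > |v|^{-\beta} \text{ for all but finitely many } v \in \Delta\}.$$
Remark 4.0.7. At a first reading it is fair to assume that $|\cdot|$ and $|\cdot|'$ are chosen to be genuine norms on $V$ and $E$.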
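A concrete model may help. Assuming Definition 4.1 compares $|v|$ with $\max_i |u_i^*(v)|^{1/\alpha_i}$, the simplest example takes $u_1, \dots, u_d$ to be the standard basis of $\mathbb R^d$ and $|v| := \max_i |v_i|^{1/\alpha_i}$ (one admissible choice among many, used here only for illustration). It is homogeneous for the weighted dilations $\mathrm{diag}(t^{\alpha_1}, \dots, t^{\alpha_d})$, and its ball of radius $Q$ is the box $\prod_i [-Q^{\alpha_i}, Q^{\alpha_i}]$, which contains $\asymp Q^{\alpha_1 + \dots + \alpha_d}$ points of $\mathbb Z^d$. The Python sketch below (hypothetical helper names) checks both facts in small cases.

```python
import itertools
import math


def quasi_norm(v, alphas):
    """Model quasi-norm |v| = max_i |v_i|^(1/alpha_i) in the standard basis (illustrative)."""
    return max(abs(vi) ** (1.0 / a) for vi, a in zip(v, alphas))


def dilate(t, v, alphas):
    """Weighted dilation diag(t^alpha_1, ..., t^alpha_d) applied to v."""
    return [t ** a * vi for vi, a in zip(v, alphas)]


def lattice_points_in_ball(Q, alphas):
    """Brute-force count of v in Z^d with |v| <= Q, for integer Q >= 1 and integer weights,
    so that |v_i|^(1/alpha_i) <= Q can be tested exactly as |v_i| <= Q^alpha_i."""
    R = Q ** max(alphas)
    box = itertools.product(range(-R, R + 1), repeat=len(alphas))
    return sum(1 for v in box if all(abs(vi) <= Q ** a for vi, a in zip(v, alphas)))


alphas = [1, 2]
v = [3.0, 16.0]
print(quasi_norm(v, alphas))  # 4.0
print(math.isclose(quasi_norm(dilate(2.5, v, alphas), alphas),
                   2.5 * quasi_norm(v, alphas)))  # True: |D_t v| = t |v|
for Q in (1, 2, 3, 4):
    # Exact count (2Q + 1)(2Q^2 + 1), to be compared with Q^(alpha_1 + alpha_2) = Q^3.
    print(Q, lattice_points_in_ball(Q, alphas), Q ** 3)
```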
4.1. Quasi-norms and local quasi-norms. For later use, we now record some elementary properties of quasi-norms and local quasi-norms. We say that two quasi-norms (resp. local quasi-norms) are comparable (or equivalent) if their ratio is bounded and bounded away from zero outside a neighborhood of the origin (resp. in a neighborhood of the origin). Let $V$ and $E$ be real vector spaces of respective dimensions $d$ and $e$, and fix a quasi-norm $|\cdot|$ on $V$ and a local quasi-norm $|\cdot|'$ on $E$. As in Definition 4.1, we fix bases $(u_i)_{1 \le i \le d}$ and $(u'_i)_{1 \le i \le e}$ for $V$ and $E$, together with positive real numbers $\alpha_1, \dots, \alpha_d$ and $\alpha'_1, \dots, \alpha'_e$ such that $|v|$ is comparable to $\max_i |u_i^*(v)|^{1/\alpha_i}$ outside a neighborhood of 0 in $V$, and $|y|'$ is comparable to $\max_i |u_i'^*(y)|^{1/\alpha'_i}$ in a neighborhood of 0 in $E$.
2. (subspace) If $W \le V$ is a subspace, then the restriction of $|\cdot|$ to $W$ is a quasi-norm, associated to the flag $0 = W \cap V_1 \le W \cap V_2 \le \dots \le W \cap V_{d+1} = W$. This is easily seen, since $d(w, W \cap V_i)$ is comparable to $d(w, V_i)$ when $w$ varies in $W$. Similarly, the restriction of $|\cdot|'$ to a subspace $F$ in $E$ is a local quasi-norm, associated to the flag $0 = V'_e \cap F \le \dots \le V'_1 \cap F \le F$.

3. (quotient) If $W \le V$ is a subspace, then $|\cdot|$ induces a quasi-norm on $V/W$ by setting, for $\bar v := v \bmod W$,
$$|\bar v| := \inf_{w \in W} |v + w|.$$
It is easy to see that this indeed defines a quasi-norm on $V/W$. It is associated to the flag formed by the images of the $V_i$ in $V/W$. To see this, note that $W$ has an adapted complement, namely a subspace $W'$ such that $(W \cap V_i) \oplus (W' \cap V_i) = V_i$ for every $i = 1, \dots, d$, and that $|\bar v|$ is comparable to $|v'|$, where $v'$ is the component of $v$ in $W'$.

4. (triangle inequality) There is $C > 0$ such that $|v + w| \le C(|v| + |w|)$ for every $v, w$ in $V$ outside a neighborhood of the origin in $V$, and $|v + w|' \le C(|v|' + |w|')$ for every $v, w$ inside a neighborhood of the origin in $E$.

5. (volume of large balls) From the quasi-norm $|\cdot|$ on $V$, we define a function $\psi : \mathrm{Grass}(V) \to \mathbb R_+$ by an explicit formula in terms of the weights $\alpha_i$ and the flag $(V_i)$, with the convention that $V_{d+1} = V$ and $\alpha_{d+1} = 0$. Clearly $\psi$ is non-decreasing because $I$ is. Moreover, $-\psi$ is submodular (cf. Lemma 6.1.2); this is easily seen by reorganizing the sum defining $\psi$, using that $\alpha_i \ge \alpha_{i+1}$. Moreover, $\psi$ determines the volume of the ball of radius $Q$ restricted to $W$: for all $Q > 1$,
$$\mathrm{vol}\{v \in W : |v| \le Q\} \asymp Q^{\psi(W)}.$$
Here and hereafter, we write $x \asymp y$ if there exist positive constants $c, C > 0$ such that $cx \le y \le Cx$. The constants $c$ and $C$ are allowed to depend on $W$ and $|\cdot|$, but not on $Q$. Note also that if $W'$ is an adapted complement to $W$ in $V$, then $\psi(W') = \psi(V) - \psi(W)$, and hence $\mathrm{vol}\{\bar v \in V/W : |\bar v| \le Q\} \asymp Q^{\psi(V) - \psi(W)}$.
6. (volume of small balls) We also define a function $\varphi : \mathrm{Grass}(E) \to \mathbb R_+$ by an explicit formula in terms of the weights $\alpha'_i$ and the flag $(V'_i)$, with the convention that $V'_0 = E$. Then $\varphi$ is non-decreasing (because $J$ is) and submodular (cf. Lemma 6.1.2); this is easily seen by reorganizing the sum defining $\varphi$, using that $\alpha'_i \ge \alpha'_{i+1}$. Moreover, $\varphi$ determines the volume of the ball of radius $\varepsilon$ restricted to $F$: within multiplicative constants (depending on $F$ and $|\cdot|'$ only), for all $\varepsilon \in (0, 1)$,
$$\mathrm{vol}\{y \in F : |y|' \le \varepsilon\} \asymp \varepsilon^{\varphi(F)}.$$

4.2. Schubert varieties. Schubert varieties are certain distinguished closed algebraic subsets of the Grassmannian. See [GH94] for the complex case and [BCR98] for the real case. In this subsection we briefly recall their definition, because the pencils defined in the introduction give rise to Schubert varieties via the map $x \mapsto \ker x$, and because they will appear in the definition of locally good measures below. As before, $V$ is a $d$-dimensional real vector space and $0 = V_1 < V_2 < \dots < V_d < V_{d+1} = V$ is a full flag of subspaces. Let $n$ be an integer with $1 \le n \le d$ and let $\sigma = (\sigma_1, \dots, \sigma_n)$ be a sequence of integers such that $1 \le \sigma_1 < \sigma_2 < \dots < \sigma_n \le d$. Let $\mathrm{Grass}_n(V)$ be the Grassmannian of $n$-dimensional subspaces of $V$. Recall that $\mathrm{Grass}_n(V)$ can be realized as a closed (affine) algebraic subset of $M_{d,d}(\mathbb R)$ by assigning to each subspace the orthogonal projection onto it. The Schubert cell of type $\sigma$ associated to the full flag $\{V_i\}_i$ is the subset $e(\sigma)$ of all subspaces $W$ in $\mathrm{Grass}_n(V)$ such that $\dim(V_{\sigma_i + 1} \cap W) = \dim(V_{\sigma_i} \cap W) + 1$ for each $i = 1, \dots, n$, or equivalently, in the notation of the previous subsection, such that $I(W) = \{\sigma_1, \dots, \sigma_n\}$.
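The type $\sigma$ of the Schubert cell containing a given subspace can be read off from the jumps of $j \mapsto \dim(V_j \cap W)$. The Python sketch below assumes the standard flag $V_j = \mathrm{span}(e_1, \dots, e_{j-1})$ in $\mathbb R^d$ (so $V_1 = 0$ and $V_{d+1} = \mathbb R^d$, matching the indexing above) and uses $\dim(V_j \cap W) = n - \operatorname{rank}(B_{\ge j})$, where $B$ is a $d \times n$ matrix whose columns form a basis of $W$ and $B_{\ge j}$ keeps its rows $j, \dots, d$; the function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Exact rank over Q; an empty list of rows has rank 0."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def schubert_type(basis, d):
    """Return I(W) = [sigma_1, ..., sigma_n] for W spanned by the given basis vectors of R^d,
    with respect to the flag V_j = span(e_1, ..., e_{j-1}), j = 1, ..., d + 1."""
    n = len(basis)
    B = [[basis[c][r] for c in range(n)] for r in range(d)]  # d x n, columns = basis of W
    # dims[j - 1] = dim(V_j ∩ W) = n - rank(rows j..d of B), rows numbered from 1.
    dims = [n - rank(B[j - 1:]) for j in range(1, d + 2)]
    return [i for i in range(1, d + 1) if dims[i] == dims[i - 1] + 1]


print(schubert_type([[1, 0, 0]], 3))             # [1]
print(schubert_type([[1, 1, 0], [0, 0, 1]], 3))  # [2, 3]
print(schubert_type([[1, 1, 1]], 3))             # [3]: a generic line lies in the open cell
```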
The cell $e(\sigma)$ is a subset of $\mathrm{Grass}_n(V)$, and it is easy to see that its closure is a union of other cells, namely
$$\overline{e(\sigma)} = \bigcup_{\tau \le \sigma} e(\tau),$$
where we define the partial order $\tau \le \sigma$ by requiring that $\tau_i \le \sigma_i$ for each $i = 1, \dots, n$. This closure $\overline{e(\sigma)}$ is called the Schubert variety of type $\sigma$.
It is worth mentioning that subsets of the form {W ∈ Grass n (V ); dim(W ∩ V j ) ≥ i} for some indices i, j are Schubert subvarieties and that the Schubert subvarieties associated to the full flag {V i } i are precisely the intersections of subsets of this form.
Note that $\dim(W \cap V_j) \ge i$ if and only if the family of vectors obtained by projecting a fixed basis of $V_j$ onto the orthogonal complement of $W$ has rank $< j - i$. In particular, this subset of the Grassmannian is a closed algebraic subset, and hence so are all Schubert subvarieties. Moreover, the Schubert cell $e(\sigma)$ is Zariski open (and dense) in the Schubert subvariety $\overline{e(\sigma)}$.

4.3. Dirichlet's principle. As before, $V$ and $E$ are real vector spaces of respective dimensions $d$ and $e$, and $\Delta$ is a lattice in $V$. We fix a quasi-norm $|\cdot|$ on $V$ associated to a full flag $0 = V_1 < V_2 < \dots < V_d < V$ and a $d$-tuple $\alpha_1 \ge \dots \ge \alpha_d > 0$ via (4.3), and a local quasi-norm $|\cdot|'$ on $E$ defined by (4.4) using a decreasing full flag $E > V'_1 > V'_2 > \dots > V'_e = 0$ and an $e$-tuple $\alpha'_1 \ge \dots \ge \alpha'_e > 0$. To these data are associated the volume exponent functions $\psi : \mathrm{Grass}(V) \to \mathbb R_+$ and $\varphi : \mathrm{Grass}(E) \to \mathbb R_+$ defined in points 5. and 6. of Subsection 4.1.
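Recall that the diophantine exponent of a homomorphism $x \in \mathrm{Hom}(V, E)$ with respect to these quasi-norms is defined by
$$\beta(x) = \inf\{\beta > 0 : |xv|' > |v|^{-\beta} \text{ for all but finitely many } v \in \Delta\}.$$
Proposition 4.3.1 (Dirichlet's principle). Let $x \in \mathrm{Hom}(V, E)$, and let $W \le V$ be a subspace spanned by elements of $\Delta$ such that $xW \ne 0$. Then
$$\beta(x) \ge \frac{\psi(W \cap \ker x)}{\varphi(xW)}.$$
Proof. This is a version of the classical Dirichlet argument using the pigeonhole principle. If $Q$ is a large parameter, the ball of radius $Q$ in $W$ for the quasi-norm $|\cdot|$ contains roughly (up to multiplicative constants) $Q^{\psi(W)}$ points of $\Delta$, which are mapped to a ball in $xW$ (for the quotient quasi-norm on $xV \simeq V/\ker x$) of volume $\asymp Q^{\psi(W) - \psi(W \cap \ker x)}$. On the other hand, the volume of a ball of radius $\varepsilon$ for the restriction of $|\cdot|'$ to $xW \le E$ is comparable to $\varepsilon^{\varphi(xW)}$. Using the triangle inequality 4. from the previous subsection, we see that not all $\varepsilon$-balls in $xW$ around the images of these $Q^{\psi(W)}$ points of $\Delta$ can be disjoint if $\varepsilon^{\varphi(xW)} Q^{\psi(W)} \gg Q^{\psi(W) - \psi(W \cap \ker x)}$. The proposition follows.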
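Solving the last inequality of the argument for $\varepsilon$ makes the resulting exponent explicit (a worked reading of that step, assuming $xW \ne 0$ so that $\varphi(xW) > 0$):
$$\varepsilon^{\varphi(xW)}\, Q^{\psi(W)} \gg Q^{\psi(W) - \psi(W \cap \ker x)} \iff \varepsilon \gg Q^{-\psi(W \cap \ker x)/\varphi(xW)}.$$
So for every large $Q$, the difference of two points of $\Delta \cap W$ whose $\varepsilon$-balls overlap is a non-zero $v$ with $|v| \ll Q$ and $|xv|' \ll Q^{-\psi(W \cap \ker x)/\varphi(xW)}$. In the unweighted setting of (1.1), with $V = \mathbb R^{m+n}$, $E = \mathbb R^m$, $W = V$ and $x$ surjective, all weights equal 1, so $\psi(\ker x) = \dim \ker x = n$ and $\varphi(xV) = m$, and this recovers the classical Dirichlet exponent $n/m$.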
Remark 4.3.2. Anticipating the next sections, we note here that given any $x \in \mathrm{Hom}(V, E)$, the maps $W \mapsto -\psi(W \cap \ker x)$ and $W \mapsto \varphi(xW)$ are both submodular (see §6.1). This follows formally from the submodularity of $-\psi$ and $\varphi$. Moreover, they are respectively non-increasing and non-decreasing for inclusion. So the hypotheses of the submodularity lemma (Lemma 6.1.2) are fulfilled.

4.4. Pencils. Given two non-negative numbers $a, b$ and a subspace $W$ of $V$, we define the pencil $P_{W,a,b}$ associated to our choice of quasi-norms on $V$ and $E$ to be the set
$$P_{W,a,b} = \{x \in \mathrm{Hom}(V, E) : \psi(W \cap \ker x) \ge a \text{ and } \varphi(xW) \le b\}.$$
We easily see that pencils are closed algebraic subvarieties of $\mathrm{Hom}(V, E)$. Moreover, $\{\ker x : x \in P_{W,a,b},\ \dim \ker x = k\}$ is a Schubert subvariety of $\mathrm{Grass}_k(V)$ (see §4.2).
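For an irreducible closed algebraic subset $M$ in $\mathrm{Hom}(V, E)$, we define
$$\tau(M) = \sup\Big\{\frac ab : W \le V,\ a \ge 0,\ b > 0,\ M \subset P_{W,a,b}\Big\}$$
and
$$\tau_{\mathbb Q}(M) = \sup\Big\{\frac ab : W \le V \text{ defined over } \mathbb Q,\ a \ge 0,\ b > 0,\ M \subset P_{W,a,b}\Big\}.$$
Remark 4.4.1. When all weights $\alpha_i$ and $\alpha'_i$ are equal to 1, $\psi$ and $\varphi$ are just the dimension functions. Hence in this case our pencil $P_{W,a,b}$ is just the pencil $P_{W,r}$ defined in the introduction, with $r$ equal to the integer part of the minimum of $b$ and $\dim W - a$.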

Diophantine approximation and flows on the space of lattices
In this section we prove the results about diophantine approximation on submanifolds of matrices stated in the introduction. We will do so in the general weighted setting, using the quasi-norms defined in the previous section. More precisely, we will show the following results (using the notation defined in (4.7), (4.8) and (4.9)). In particular, $\beta$ is a rational number if the $\alpha_i$, $\alpha'_i$ are rational.
The Q-structure on Hom(V, E) used implicitly in the above theorem is the one induced by the Q-span of the bases of V and E chosen to define the quasi-norms.
The above statements account for Theorems 1.2, 1.5 and 1.10 from the introduction. Theorem 1.4 will be proved below as Theorem 5.2.6 (together with its consequences 5.2.10 and 5.2.11).

5.1. The Dani correspondence. We now recall the connection between diophantine approximation and flows on homogeneous spaces, in particular the so-called quantitative non-divergence estimate, which originates in the work of Margulis [Mar75] and Dani [Dan85] and was first applied to diophantine approximation on manifolds in the groundbreaking work of Kleinbock and Margulis [KM98].
A Dani correspondence is a statement which relates a diophantine exponent to the rate of escape of a certain flow on the space of lattices. In this subsection we present a Dani correspondence for matrices that is valid in the quasi-norm setting. A similar correspondence was worked out by Kleinbock and Margulis already in the matrix context in their work on logarithm laws [KM99, Theorem 8.5].
We keep the notation of the previous subsection and let x ∈ Hom(V, E). If I(ker x) = {i 1 < i 2 < · · · < i n }, then is a full flag from ker x to V . Concatenating these two flags, we obtain a full flag in V : , and we define a d-tuple (a 1 , . . . , a d ) of positive numbers by setting Finally given β > 0 we define a one-parameter subgroup {g where n = dim ker x. Observe that a i ≥ a i+1 for every i < n and a i ≤ a i+1 for every i > n.
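Proposition 5.1.1 (Dani correspondence). For $x \in \mathrm{Hom}(V, E)$, the diophantine exponent for quasi-norms defined in (4.7) is given by
$$\beta(x) = \inf\Big\{\beta > 0 : \inf_{t > 0,\ v \in \Delta \setminus \{0\}} \|g^{(x,\beta)}_t v\| > 0\Big\},$$
where $\|\cdot\|$ is a fixed Euclidean norm on $V$.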
Proof. We show that, given $\beta$ and $x$, $\|g^{(x,\beta)}_t v\|$ is uniformly bounded away from zero when $t > 0$ and $v \in \Delta \setminus \{0\}$ if and only if $|xv|' \cdot |v|^\beta$ is uniformly bounded away from zero for all non-zero $v \in \Delta$.
Note that $\ker x = \langle e^{(x)}_i,\ i \le n \rangle$, and that we may restrict attention to vectors $v$ such that $xv$ is small, for otherwise there is nothing to prove. We write $v = \sum_i v_i e^{(x)}_i$ and $v' = \sum_{i \le n} v_i e^{(x)}_i$, and note, using the triangle inequality 4. of §4.1, that $|v|$ is comparable within multiplicative constants to $|v'|$, since $v - v'$ can be assumed bounded. So we are left to check that
$$\max\Big\{\max_{i \le n} |v_i e^{-a_i t}|,\ \max_{i > n} |v_i e^{\beta a_i t}|\Big\}$$
is bounded away from zero uniformly in $t > 0$ and $v \in \Delta \setminus \{0\}$ if and only if $|xv|' \cdot |v|^\beta$ is bounded away from zero for $v$ in $\Delta \setminus \{0\}$. This is straightforward since $|xv|'$ is comparable to $\max_{i > n} |v_i|^{1/a_i}$, while $|v|$ is comparable to $|v'|$ and hence to $\max_{i \le n} |v_i|^{1/a_i}$.
For the next subsection, it will be convenient to use a slightly different flow g has rank $m = \dim xV$. We obtain the elements $i_1 < \dots < i_n$ of the set $I(\ker x)$ in the following way: Now, consider the matrix $x \in \mathrm{GL}_d(\mathbb R)$ given in rows by We denote by a

Remark 5.1.2. Note that, with this notation, $F^{(i)}_x = x^{-1} V_{i+1}$ and our former definition g With this construction, the map $x \mapsto g^{(x,\beta)}_t$ is a polynomial map on any set where $I(\ker x)$ and $J(xV)$ are constant. This last condition is equivalent to requiring that $\ker x$ and $xV$ respectively stay in some fixed Schubert cells of the Grassmannians $\mathrm{Grass}(V)$ and $\mathrm{Grass}(E)$, defined with respect to the flags $\{V_i\}$ and $\{V'_i\}$.

5.2. Quantitative non-divergence. For the remainder of this subsection, $V$ and $E$ are two filtered real vector spaces endowed with quasi-norms as introduced in §4.3. In this context, we adapt the strategy introduced by Kleinbock and Margulis [KM98] to study diophantine approximation on manifolds. The results presented in this subsection are closely related to further work of Kleinbock [Kle08b, Kle10a], who proved in particular the existence of an almost sure exponent and the fact that it is the smallest exponent.
Let $\nu$ be a Borel measure on a metric space $X$. Given an open set $U \subset X$, the measure $\nu$ is $D$-doubling (or $D$-Federer) on $U$ if every ball $B$ centered at a point of $U \cap \mathrm{Supp}(\nu)$ and contained in $U$ satisfies
$$\nu(B) \le D\, \nu\big(\tfrac13 B\big),$$
where $\frac13 B$ is the ball with the same center as $B$ and a third of its radius. Given two positive constants $C$ and $\alpha$, we say that a real-valued function $f$ on $X$ is $(C, \alpha)$-good on $U$ with respect to the measure $\nu$ if, for any ball $B \subset U$ and any $\varepsilon > 0$,
$$\nu\big(\{x \in B : |f(x)| < \varepsilon\}\big) \le C \Big(\frac{\varepsilon}{\|f\|_{\nu,B}}\Big)^{\alpha} \nu(B),$$
where $\|f\|_{\nu,B} = \sup_{x \in B \cap \mathrm{Supp}\,\nu} |f(x)|$. By convention, we also agree that if $f$ is identically zero on the support of $\nu$, then it is $(C, \alpha)$-good with respect to $\nu$ for any values of $C$ and $\alpha$.
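As a sanity check of the definition, the linear function $f(x) = x$ is $(2, 1)$-good on $\mathbb R$ with respect to Lebesgue measure: on an interval $B = [a - r, a + r]$ one has $\|f\|_{\nu,B} = |a| + r$, and the sublevel set $\{x \in B : |f(x)| < \varepsilon\}$ has length at most $2\varepsilon\,|B|/(|a| + r)$. The Python sketch below (hypothetical helper names) tests this inequality on a grid of intervals and thresholds; it is a numerical illustration, not a proof.

```python
def sublevel_length(a, r, eps):
    """Lebesgue measure of {x in [a - r, a + r] : |x| < eps}."""
    lo, hi = max(a - r, -eps), min(a + r, eps)
    return max(0.0, hi - lo)


def good_bound(a, r, eps, C=2.0, alpha=1.0):
    """Right-hand side C * (eps / ||f||_B)^alpha * |B| for f(x) = x on B = [a - r, a + r]."""
    sup_f = abs(a) + r
    return C * (eps / sup_f) ** alpha * (2 * r)


centers = [k / 4 for k in range(-40, 41)]
radii = [0.05, 0.3, 1.0, 2.5, 7.0]
thresholds = [10.0 ** e for e in range(-4, 3)]
# Largest ratio (measure of sublevel set) / (bound) over the grid; (2, 1)-goodness means <= 1.
worst = max(sublevel_length(a, r, eps) / good_bound(a, r, eps)
            for a in centers for r in radii for eps in thresholds)
print(worst <= 1.0)  # True
```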
The heart of the Kleinbock-Margulis approach to study diophantine approximation on manifolds is the following key result, which can be seen as a Remez-type inequality (see [BG73] or [ABRdS15a, Thm 2.7]) for functions taking values in the space of lattices. It originates in the work of Margulis [Mar71] on non-divergence of unipotent flows, which was later greatly generalized in work of Dani [Dan85] and Kleinbock-Margulis [KM98]. The following version is borrowed from [Kle08b] (see also [Kle10b]).
Then for every ε ∈ (0, ρ], we have that where C > 0 depends only on D, C, α and the Besicovitch constant of X. In the above theorem, the letter Ω ε denotes the part of the space of lattices GL d (R)/ GL d (Z) made of lattices in R d admitting a non-zero vector of length at most ε. When one restricts attention to unimodular lattices, the set Ω ε is thought of as "the cusp" of Ω, because as ε → 0 these sets form a nested sequence of neighborhoods of infinity. This fact is known as Mahler's criterion [Rag72,Cor. 10.9].
In fact the theorem is stated only for SL d (R)-valued maps in [Kle08b], but it is valid as well -with the same proof -for GL d (R)-valued maps. In older versions of this non-divergence result (such as in [KLW04]) the second assumption involved a lower bound of the form ρ. Kleinbock's observation [Kle08b] that one can relax this assumption to a lower bound ρ k is crucial when one wants to study diophantine exponents of measures that might not be extremal; it will be essential in the proofs of our formulas for the exponent.
Recall also that a metric space $X$ is called Besicovitch if there exists an integer $C$ with the following property: suppose $A \subset X$ and for each $a \in A$ we are given a non-empty open ball $B_a$ centered at $a$; then there exists a countable subset $A' \subset A$ such that $A \subset \bigcup_{a \in A'} B_a$ and any intersection of $C$ distinct balls $B_a$, $a \in A'$, is empty. In the definition below, the flow $g^{(y,\beta)}_t$ on $\mathrm{GL}(V)$ is the one constructed at the end of §5.1, and we denote by $W^k_\Delta$ the set of non-zero pure integral $k$-vectors, i.e. those $w \in \wedge^k V \setminus \{0\}$ that can be written $w = v_1 \wedge \dots \wedge v_k$ where each $v_i$ is an element of $\Delta$.
Definition 5.2.2. Let $\nu$ be a Radon measure on $\mathrm{Hom}(V, E)$ and $x \in \mathrm{Supp}(\nu)$. We will say that the measure $\nu$ is locally good at $x$ if there exist a neighborhood $U_x$ of $x$ and positive constants $C$, $D$ and $\alpha$ such that
• there exist $n \le d$ and a Schubert cell $e(\sigma) \subset \mathrm{Grass}_n(V)$ such that for all $y$ in $U_x \cap \mathrm{Supp}\,\nu$, $\ker y$ belongs to $e(\sigma)$ (see §4.2);
• there exist $m \le e$ and a Schubert cell $e(\sigma') \subset \mathrm{Grass}_m(E)$ such that for all $y$ in $U_x \cap \mathrm{Supp}\,\nu$, $yV$ belongs to $e(\sigma')$ (see §4.2);
• $\nu$ is $D$-doubling on $U_x$;
• for all $t, \beta > 0$, all $k \ge 0$ and all $w$ in $W^k_\Delta$, the map $y \mapsto g^{(y,\beta)}_t w$ is $(C, \alpha)$-good on $U_x$ with respect to $\nu$.
Remark 5.2.3. The first two conditions ensure that the map y → g With the quantitative non-divergence result (Theorem 5.2.1) we can derive the following statement. The argument in the proof is taken from Kleinbock [Kle10a].
Theorem 5.2.5 (Existence of a local exponent). If a Radon measure $\nu$ on $\mathrm{Hom}(V, E)$ is locally good at $x$, then there is a neighborhood $B_x$ of $x$ such that for $\nu$-almost every point $y \in B_x$,
$$\beta(y) = \inf_{z \in B_x} \beta(z). \qquad (5.4)$$
If B x is a neighborhood such that (5.4) holds, we define the local diophantine exponent of ν at x by β ν (x) = inf z∈Bx β(z).
Proof. If B x is any neighborhood of x, it is trivial that for all y in B x , β(y) ≥ inf z∈Bx β(z). Conversely, assume ν is locally good at x, and let C, D, α > 0 and B x = B(x, r) be a ball around x such that the conditions of Definition 5.2.2 hold on U x := B(x, 3 d r). We claim that for almost every y in B x , β(y) ≤ inf z∈Bx β(z). To see this, fix z ∈ B x and β > β(z), so that, by Proposition 5.1.1, there exists c > 0 such that Of course, this implies that For η > 0, apply Theorem 5.2.1 with U = U x , B = B x , ρ = c, h : y → g (y,β) t , and ε = e −ηt . For any t > 0 such that e −ηt < c, we find that and, by the Borel-Cantelli lemma and Proposition 5.1.1 again, for almost every y in B x , β(y) ≤ β + η. Letting η → 0, β → β(z), and taking the infimum over z in B x , we get, for almost every y in B x , Given a non-negative parameter β, we say that the measure ν satisfies the condition (C β ) at x if the following holds: It follows from Theorem 5.2.5 and Proposition 5.1.1 that if ν is locally good at x, then β ν (x) = inf{β > 0 | ν satisfies (C β ) at x}. Theorem 5.2.6 (Inheritance). Let ν, ν be Radon measures on Hom(V, E), and assume that ν is locally good at x and ν locally good at Proof. Let S be a compact subset of Hom(V, E), and assume that for all x in S, the subspaces ker x and xV belong to some fixed Schubert cells in Grass(V ) and Grass(E). We will say that S satisfies (C β ) if The expression g (y,β) t w is a linear function of the minors of g (y,β) t . By (5.1), it is also a linear function of the minors of y, and therefore, for some constant C > 0 depending only on S, where B is a fixed compact neighborhood of 0 in Hom(V, E) containing S (note that both expressions on the right-hand side are norms on the finitedimensional space of linear maps between θ(H(S)) and ∧ k V ). It follows that the condition (C β (S)) depends only on H(S). 
To conclude the proof of the theorem, it suffices to use equality (5.5), and to note that the measure ν satisfies (C β ) at x if and only if (C β (B(x, r) ∩ Supp ν)) holds for every r > 0. Remark 5.2.8 (local Zariski closure). The local Zariski closure of ν at x is defined to be the intersection of the Zariski closures of B(x, r) ∩ Supp(ν) in Hom(V, E) for all r > 0. By noetherianity the local Zariski closure (resp. the local Plücker closure) coincides with the Zariski closure (resp. the Plücker closure) of B(x, r) ∩ Supp(ν) whenever r is sufficiently small. Observe in particular that H ν (x) depends only on the local Zariski closure at x.
Hence we obtain the following corollary.
Corollary 5.2.9. Let $\nu, \nu'$ be Radon measures on $\mathrm{Hom}(V,E)$, and assume that $\nu$ is locally good at $x$ and $\nu'$ locally good at $x'$. If the local Zariski closures of $\nu$ at $x$ and of $\nu'$ at $x'$ coincide, then $\beta_\nu(x) = \beta_{\nu'}(x')$.
As a corollary of this inheritance theorem, we can define the diophantine exponent of an algebraic subset of $\mathrm{Hom}(V,E)$. Here, and throughout the paper, when speaking of Zariski closure, algebraic subsets and algebraic varieties, we always consider these notions in the sense of real algebraic geometry, and we refer the reader to the textbook [BCR98] for definitions and basic properties. By Lebesgue measure on an algebraic set $M$ we mean the top-dimensional Hausdorff measure on the subset $M$ of $\mathrm{Hom}(V,E)$. Restricting $U$ if necessary, we see using [Kle10a, Proposition 2.1] that there exist constants $(C,\alpha)$ and a neighborhood $U_x$ of $x$ such that all functions $y \mapsto g^{(y,\beta)}_t w$ are $(C,\alpha)$-good on $U_x$ with respect to $\nu$. So $\nu$ is locally good at $x$, and by Theorem 5.2.5, for $\nu$-almost all $y$ in a neighborhood of $x$, $\beta(y) = \beta_\nu(x)$. At a non-singular $x$ as above, the local Zariski closure is $M$ itself ([BCR98, Proposition 3.3.14]). So, by Corollary 5.2.9, $\beta_\nu(x)$ is independent of the choice of $x$. This proves the corollary. The pushforward of the Lebesgue measure under an analytic map $U \to \mathrm{Hom}(V,E)$, where $U$ is an open domain in $\mathbb{R}^N$, is locally good at every point, so we have the following important corollary to Theorem 5.2.5. This result is due to Kleinbock [Kle10a] in the case where $|\cdot|$ and $|\cdot|'$ are norms, and the proof is essentially the same, once the correspondence of §5.1 has been established.

Pencils and extremality.
Given two non-negative numbers $a, b$ and a subspace $W$ of $V$, recall that we have defined the pencil $P_{W,a,b}$ associated to our choice of quasi-norms on $V$ and $E$ to be the set And the numbers $\tau(M)$ and $\tau_{\mathbb{Q}}(M)$ have been defined in (4.8) and (4.9). Proof. The left-hand side follows from Dirichlet's principle proved in Proposition 4.3.1. Let us verify the right-hand side. Let $\beta > \tau(M)$. By Theorem 5.2.5, it suffices to show that $\beta > \beta_\nu(x)$ for every non-singular point $x$ of $M$ which lies in the interior of the smallest Schubert subvariety containing $M$. Equivalently, from (5.5), it is enough to check that $(C_\beta)$ holds at $x$. Note that we may replace g in the definition of $(C_\beta)$, because $x$ remains bounded in a neighborhood of $x$ in $M$. Clearly, then, a sufficient condition for $(C_\beta)$ to hold is that $w \mapsto \sup_{y\in B(x,r)\cap M}\|\pi^{(y,\beta)}_0(w)\|$ does not vanish for any $r > 0$, when $w$ ranges among pure $k$-vectors of norm 1, and $\pi^{(y,\beta)}_0$ is the projection onto the sum of the eigenspaces of g. Now, if $W$ is the subspace of $V$ associated to the $k$-vector $w$, then the largest eigenvalue occurring in the decomposition of $w$ into eigenvectors of $g^{(y,\beta)}_t$ is $\beta\varphi(yW) - \psi(\ker x \cap W)$. Therefore, if $(C_\beta)$ fails to hold, there exist $r > 0$ and a subspace $W \le V$ such that $B(x,r)\cap M$ is entirely contained in the set which is the (finite) union of all pencils $P_{W,a,b}$ such that $\beta b - a < 0$. Recall that neighborhoods of non-singular points are Zariski-dense in $M$ [BCR98, Proposition 3.3.14]. From the irreducibility of $M$ it follows that $M$ is entirely contained in a single pencil $P_{W,a,b}$ for a pair $a, b$ with $\beta b - a < 0$, which contradicts our assumption that $\beta > \tau(M)$.

The submodularity lemma
In view of Theorem 5.3.1, the following question is natural: under what condition on the irreducible algebraic subset $M \subset \mathrm{Hom}(V,E)$ is the maximum defining $\tau(M)$ in (4.8) attained on a $\Delta$-rational subspace $W$? We will show here that a sufficient condition is that $M$ be defined over the rationals. This is the content of Theorem 6.2.1. The method will also show that if $M$ is invariant under some group $G$ of linear automorphisms, then $\tau(M)$ is attained on a $G$-invariant subspace $W$. This observation will be essential for the application to nilpotent Lie groups developed in Section 7.
6.1. Statement and proof of the submodularity lemma. Let $V$ be a $d$-dimensional vector space (over some field). Let $\varphi$ and $\psi$ be two real-valued functions on the Grassmannian of $V$, with the following properties:
(1) $\varphi \ge 0$, $\varphi(0) = 0$.
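(3) $\varphi$ and $-\psi$ are submodular, i.e. for any two vector subspaces $U$ and $W$ we have $\varphi(U+W) + \varphi(U\cap W) \le \varphi(U) + \varphi(W)$, and similarly for $-\psi$. We let We are interested in the supremum $S$ of $q$ on the entire Grassmannian. Note that $S \in \mathbb{R}\cup\{\pm\infty\}$.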
Lemma 6.1.1 (Submodularity lemma). The supremum is attained, and there is a unique subspace $x_0$ of maximal dimension such that $S = q(x_0)$.
So we have proved our initial claim, and we may thus assume without loss of generality that $\psi$ is not identically zero, that $q(0) < S$, and that $\varphi(x) > 0$ if $x \neq 0$. This enables us to assert that $S = S_1$, where we have denoted, for $k = 1, \dots, d$, $S_k = \sup\{q(x) : \dim x \ge k\}$.
Let $k_0$ be the maximal $k \ge 1$ such that $S_k = S$. If $k_0 = d$, then $S = q(V)$ and the conclusion of the lemma holds with $x_0 = V$. If not, we have $S_{k_0+1} < S$.
Pick $\varepsilon > 0$ such that $S - S_{k_0+1} > 2\varepsilon$, and pick $x_0 \in \mathrm{Grass}(V)$ such that $\dim(x_0) \ge k_0$ and $q(x_0) > S - \varepsilon$. Note that $\dim(x_0) = k_0$, for otherwise $\dim(x_0) \ge k_0 + 1$ and thus $q(x_0) \le S_{k_0+1} < S - 2\varepsilon < S - \varepsilon$, contradicting our choice of $x_0$. Now let $y_0$ be another subspace such that $\dim(y_0) \ge k_0$ and $q(y_0) > S - \varepsilon$. For the same reason, $\dim(y_0) = k_0 = \dim(x_0)$. But where we have used submodularity of $-\psi$ in the first line, positivity of $\varphi$ in the second line, submodularity of $\varphi$ in the fourth line and monotonicity of $\varphi$ in the last line. Hence: Therefore $\dim(x_0 + y_0) \le k_0$. This means that $x_0 = y_0$. Hence we have proved that $q(x) > S - \varepsilon$ and $\dim(x) \ge k_0$ imply that $x$ is unique. In particular $S = q(x_0)$, and the lemma follows.
Corollary 6.1.2. Let G be a group acting on the Grassmannian, and assume that the action preserves dimension. If φ and ψ are invariant under G, then the supremum S is attained on a G-invariant subspace.
Proof. Indeed, $x_0$ and $gx_0$ have the same dimension and both achieve the supremum $S$, so by uniqueness $x_0 = gx_0$ for every $g \in G$.
Remark 6.1.3. The proof works verbatim, more generally, for functions defined on a graded lattice of finite length in place of $\mathrm{Grass}(V)$, i.e. a partially ordered set with a smallest element $(0)$ and a largest element $(V)$ such that every pair of elements admits a greatest lower bound and a least upper bound, and which is equipped with an integer-valued rank function $r(x) \ge 0$, taking only finitely many values, such that if $x < y$ and there is no $z$ with $x < z < y$, then $r(y) = r(x) + 1$.
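As a sanity check of Lemma 6.1.1 and Corollary 6.1.2 in the setting of Remark 6.1.3, the following sketch works by brute force on the Boolean lattice of subsets of a four-element set (rank = cardinality, meet = intersection, join = union). It assumes, in line with the ratio form used later for $\psi_M/\varphi_M$, that $q = \psi/\varphi$ on elements with $\varphi > 0$. The coverage function `phi` (monotone, submodular, vanishing on the empty set) and the modular function `psi` (so that $-\psi$ is submodular) are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical data: element i of the ground set "covers" the points COVER[i].
COVER = {0: {"a"}, 1: {"a"}, 2: {"b", "c"}, 3: {"b"}}
GROUND = sorted(COVER)

def phi(x):
    """Coverage function: monotone, submodular, phi(empty set) = 0."""
    return len(set().union(*(COVER[i] for i in x)))

def psi(x):
    """Cardinality: modular, hence -psi is submodular."""
    return len(x)

def lattice():
    """All elements of the Boolean lattice on GROUND, as frozensets."""
    return [frozenset(c) for r in range(len(GROUND) + 1)
            for c in combinations(GROUND, r)]

def is_submodular(f):
    """Check f(x v y) + f(x ^ y) <= f(x) + f(y) for all pairs."""
    return all(f(x | y) + f(x & y) <= f(x) + f(y)
               for x in lattice() for y in lattice())

def maximizers():
    """Supremum S of q = psi/phi over elements with phi > 0, and all maximizers."""
    vals = {x: Fraction(psi(x), phi(x)) for x in lattice() if phi(x) > 0}
    best_value = max(vals.values())
    return best_value, [x for x, v in vals.items() if v == best_value]

assert is_submodular(phi) and is_submodular(lambda x: -psi(x))
S, best = maximizers()
top = max(len(x) for x in best)
top_maximizers = [x for x in best if len(x) == top]  # Lemma 6.1.1: exactly one

# Corollary 6.1.2: the transposition 0 <-> 1 preserves phi and psi,
# so the unique top-rank maximizer must be invariant under it.
swap = {0: 1, 1: 0, 2: 2, 3: 3}
x0 = top_maximizers[0]
assert frozenset(swap[i] for i in x0) == x0
print(S, top_maximizers)
```

Here the supremum $S = 2$ is attained only at $\{0, 1\}$, which is indeed fixed by the symmetry exchanging the two indistinguishable elements.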
6.2. Applications of the submodularity lemma. We go back to the setting of Section 4. Thus, $V$ and $E$ are finite-dimensional real vector spaces, endowed with quasi-norms $|\cdot|$ and $|\cdot|'$ with associated flags $\{V_i\}$ and $\{V'_i\}$. The lattice $\Delta \le V$ induces a $\mathbb{Q}$-structure on $V$. Recall that by a $\mathbb{Q}$-structure we mean a $\mathbb{Q}$-vector subspace which generates the ambient space over $\mathbb{R}$ and whose dimension over $\mathbb{Q}$ is the dimension over $\mathbb{R}$ of the ambient space. Let us assume that the flag $\{V_j\}$ is made of $\Delta$-rational subspaces, and endow $E$ with a $\mathbb{Q}$-structure for which each $V'_j$ is rational. This endows $\mathrm{Hom}(V,E)$ with a natural $\mathbb{Q}$-structure. This completes the proof of Theorem 5.2. A different application of the submodularity lemma will be given in Section 7, in the proof of Theorem 1.13 from the introduction.
For now, we just give another simple example where the submodularity lemma applies and allows us to compute the diophantine exponent of an algebraic subset of matrices. For the example below, and until the end of this section, we shall only consider diophantine approximation for genuine norms on $V$ and $E$.
Example 6.2.2 (The Veronese curve in algebras). Consider the Veronese curve $\mathcal{V}$ in $\mathbb{R}^n$ given by the parametrization $x \mapsto (x, x^2, \dots, x^n)$. The Mahler conjecture, proved by Sprindžuk, says that the curve $\mathcal{V}$ is extremal, or in other words, that for almost every $x \in \mathbb{R}$ and all $\varepsilon > 0$, the inequality
$$|v_0 + v_1 x + \dots + v_n x^n| \le \|v\|^{-\beta-\varepsilon} \qquad (6.3)$$
has only finitely many integer solutions $v \in \mathbb{Z}^{n+1}$, where here $\beta = n$. We can consider the analogous problem where $\mathbb{R}$ is replaced by an arbitrary finite-dimensional $\mathbb{R}$-algebra $E$ with unit (e.g. $E = M_m(\mathbb{R})$). Given $x \in E$, we may consider integer linear combinations of $1, x, \dots, x^n$. We may then ask for the minimal $\beta > 0$ such that for almost every $x \in E$ and every $\varepsilon > 0$, the inequality (6.3) (with a norm in place of the absolute value on the left-hand side) has at most finitely many integer solutions $v \in \mathbb{Z}^{n+1}$.
To fit this problem into our setting, we let $V = \mathbb{R}_n[X]$ be the space of polynomials of degree at most $n$ and $\Delta$ the lattice of polynomials with integer coefficients. Consider the submanifold $M$ of $\mathrm{Hom}(V,E)$ that consists of the evaluation maps $P \mapsto P(x)$ for $x \in E$. It is defined over $\mathbb{Q}$, so, by Theorem 6.2.1, its exponent, which is exactly the $\beta$ we are looking for, is equal to Here $W(x) \le E$ denotes the image of $W$ under the evaluation map $P \mapsto P(x)$. Now, let $G$ be the group of affine transformations of the real line and let it act on $V$ by substitution of the variable. The maps $\varphi$ and $-\psi$ are submodular and $G$-invariant, so by Corollary 6.1.2, $\tau$ achieves its value on one of the $G$-invariant subspaces, which are exactly the subspaces $V_i \le V$, $i = 0, \dots, n$, of polynomials of degree at most $i$. Let $m$ be the maximal dimension of one-generator subalgebras $\mathbb{R}[x] \le E$, for $x \in E$. Evaluating $V_i$ at an $x$ with minimal polynomial of degree $m$, we see that, for $i < m$, $\varphi(V_i) = i + 1$ and $\psi(V_i) = 0$. On the other hand, we always have $\varphi(W) \le m$ and $\psi(W) \ge \dim W - m$, so we find This shows that the desired diophantine exponent $\beta$ is equal to $\max\{0, \frac{n+1-m}{m}\}$.
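The computation above can be checked on a concrete element of $E = M_3(\mathbb{R})$. The sketch below assumes, consistently with the values $\varphi(V_i) = i+1$ and $\psi(V_i) = 0$ for $i < m$ quoted above, that at a generic $x$ one has $\varphi(W) = \dim W(x)$ and $\psi(W) = \dim W - \dim W(x)$; the displayed formula for the exponent is not reproduced here. It evaluates the invariant flag $V_0 \le \dots \le V_n$ at a fixed matrix with three distinct eigenvalues (so $m = 3$), computes these dimensions by exact Gaussian elimination, and maximizes $\psi/\varphi$ over the flag. All helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    m = [[Fraction(a) for a in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def flag_dims(x, n):
    """dim V_i(x) = dim span{1, x, ..., x^i} for i = 0, ..., n."""
    size = len(x)
    power = [[int(i == j) for j in range(size)] for i in range(size)]  # x^0
    rows, dims = [], []
    for _ in range(n + 1):
        rows.append([entry for row in power for entry in row])  # flatten x^i
        dims.append(rank(rows))
        power = matmul(power, x)
    return dims

def exponent_on_flag(x, n):
    """Max over the flag of psi/phi, with phi = dim V_i(x), psi = dim V_i - phi."""
    dims = flag_dims(x, n)
    return max(Fraction(i + 1 - d, d) for i, d in enumerate(dims))

x = [[1, 1, 0], [0, 2, 1], [0, 0, 3]]  # eigenvalues 1, 2, 3: minimal polynomial of degree 3
print(flag_dims(x, 5))                 # [1, 2, 3, 3, 3, 3]
print(exponent_on_flag(x, 5))          # (n + 1 - m) / m with n = 5, m = 3
```

With a $1 \times 1$ matrix (so $E = \mathbb{R}$ and $m = 1$), `exponent_on_flag([[2]], n)` returns $n$, in agreement with Sprindžuk's theorem.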
Example 6.2.3 (Weak non-planarity and extremality). Consider the unweighted case (i.e. $\alpha_i = \alpha'_i = 1$). A submanifold $M \subset \mathrm{Hom}(V,E)$ is called weakly non-planar if it is not contained in any proper pencil $P_{W,r} \subsetneq \mathrm{Hom}(V,E)$. This notion was introduced, using slightly different words, by Beresnevich, Kleinbock and Margulis [BKM15], who showed that every locally good weakly non-planar measure on $\mathrm{Hom}(V,E)$ is extremal. The converse, however, does not hold, and we now provide an example. Let $k \ge 4$ be an integer and $X = (\mathbb{R}^3)^k$. Consider the finite-dimensional space $V$ of polynomial maps $f$ from $X$ to $E = \mathbb{R}^3$ given by $f(u_1, \dots, u_k) = \sum_{1\le i<j\le k} a_{ij}\, u_i \wedge u_j$, where $\wedge$ is the usual wedge product in $\mathbb{R}^3$ and $a_{ij} \in \mathbb{R}$. Let $M$ be the Zariski closure in $\mathrm{Hom}(V,E)$ of the image of $X$ by the map $\Phi$: Let $W$ be the subspace of $V$ generated by the $u_1 \wedge u_j$, $j = 2, \dots, k$, so that $\dim W = k - 1 \ge 3$. For any $x = (u_1, \dots, u_k)$ with $u_1 \neq 0$, the space $W(x)$ is contained in the orthogonal complement of $u_1$ and hence has dimension at most 2. Therefore $M \subset P_{W,2}$, so $M$ is not weakly non-planar.
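The dimension count behind the inclusion $M \subset P_{W,2}$ can be seen on a sample point. The sketch below, with hypothetical integer vectors $u_1, \dots, u_5$ (so $k = 5$), identifies the wedge product in $\mathbb{R}^3$ with the cross product and computes ranks exactly: the $k - 1 = 4$ images $u_1 \wedge u_j$ span only a plane orthogonal to $u_1$, while the images of all of $V$ span $\mathbb{R}^3$.

```python
from fractions import Fraction
from itertools import combinations

def cross(a, b):
    """Wedge (cross) product in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rank(rows):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    m = [[Fraction(a) for a in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Hypothetical sample point x = (u_1, ..., u_5) of X = (R^3)^5.
us = [[1, 2, 3], [0, 1, 4], [5, 6, 0], [1, 0, 1], [2, -1, 3]]
W_x = [cross(us[0], u) for u in us[1:]]              # image of W = span{u_1 ^ u_j}
V_x = [cross(a, b) for a, b in combinations(us, 2)]  # image of all of V

print(len(W_x), rank(W_x))   # dim W = 4, but dim W(x) = 2
print(rank(V_x))             # 3
```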
However, it is easy to see that $GL_k(\mathbb{R})$ acts irreducibly on $V$ by substitution of the variables. By the submodularity lemma (Corollary 6.1.2), we conclude that $V$ is the unique subspace realizing the maximum in $\tau(M)$. Therefore $M$ is not contained in any constraining pencil, and thus $M$ must be extremal by Corollary 1.6.
is extremal.
Proof. By Corollary 1.6, all we have to do is check that for every non-zero subspace $W \le V = \mathbb{R}^4$, there exists a point $x \in M$ such that $\dim x(W) \ge \frac{\dim W}{2}$. We study all possible values of $\dim W$. Suppose first $\dim W \in \{1, 2\}$. Since $M$ is row-nonplanar, there exists $x \in M$ such that $x(W)$ is nonzero, which implies $\dim x(W) \ge 1 \ge \frac{\dim W}{2}$. Now if $\dim W = 3$, denote by $v_1, \dots, v_4$ the coordinates in $V$, and suppose first that $W$ is given by an equation $v_1 = \lambda_2 v_2 + \lambda_3 v_3 + \lambda_4 v_4$. The matrix of the restriction of $(I_2 | M)$ to $W$ is given in some basis by Let $M_x$ be the Zariski closure of the support of $\nu_x$. It contains all points of the form $xg^{-1}$, $g \in SL(V)$. Now suppose $W$ is a subspace of $V$. Since $\operatorname{rk} x = m$, we may choose $g \in SL(V)$ such that $\dim (xg^{-1})(W) = \min\{m, \dim W\}$. This shows that $\tau(M_x) = \frac{\dim V}{m}$. Moreover, equality is attained for $W = V$, which is $\Delta_0$-rational, so Theorem 5.2 shows that for almost every $y$ in $M_x$, $\beta_{\Delta_0}(y) = \frac{\dim V - m}{m}$. In particular, for almost every $g$ in $U$,
Remark 6.2.8. The above proposition implies that for almost every $W$ in the Grassmannian of $k$-planes in $V$, the pencil $P_{W,r}$ is extremal, even if

The critical exponent for rational nilpotent Lie groups
In this section we shall prove Theorems 1.12 and 1.13. Our method is based on the results of the previous sections regarding diophantine approximation on submanifolds of matrices.
Let G denote an arbitrary simply connected nilpotent real Lie group, with Lie algebra g of nilpotency class s. We endow G with a left-invariant geodesic metric d(·, ·). It is well known [Ber88] that these are exactly the left-invariant Carnot-Carathéodory-Finsler metrics on G. These metrics are obtained by the same construction as the left-invariant Riemannian metrics on G, except that instead of a Euclidean norm on g, we start with an arbitrary norm on a generating subspace V 1 of g. Denoting inductively, , every Carnot-Carathéodory-Finsler metric on G associated to V 1 is comparable near the identity (up to multiplicative constants) to where V 0 = 0 by convention, dist is some fixed Euclidean distance on the Lie algebra, s is the nilpotency class of g and X ∈ g lies in a neighborhood of the origin. Hence the function X → d(exp(X), 1) is a local quasi-norm (see Definition 4.1).
7.1. Growth of a generic subgroup and group of word maps. Let $k \ge 1$ and let $g = (g_1, \dots, g_k)$ be a $k$-tuple of elements of $G$. Recall that the subgroup $\Gamma_g$ generated by $g$ is $\beta$-diophantine if there exists $c > 0$ such that, for every integer $n$, min where $S = \{1, g_1^{\pm 1}, \dots, g_k^{\pm 1}\}$ and $S^n$ is the set of elements that can be obtained as a word of length $n$ in the elements of $S$. This definition involves the volume $V_g(n) = |S^n|$ of the ball of radius $n$ in the subgroup generated by $S$. It turns out [ABRdS15a, Lemma 2.5] that for a generic $k$-tuple $g$ (generic with respect to Haar measure on $G^k$) the isomorphism class of the subgroup $\Gamma_g = \langle S\rangle$ is always the same. It is isomorphic to the group $F_{k,G}$ of word maps on $k$ letters of $G$, introduced in [ABRdS15a]. We briefly recall this notion. Any element $w$ in the free group $F_k$ on $k$ letters $x_1, \dots, x_k$ determines a word map $w_G: G^k \to G$ given by replacing each letter $x_i$ by an element of $G$. The group $F_{k,G}$ is defined to be the set of all such word maps, with composition law given by $w_G w'_G = (ww')_G$, where $ww'$ denotes the product of $w$ and $w'$ in $F_k$. It can also be viewed as the relatively free group in the variety of $k$-generated subgroups of $G$.
Since $F_{k,G}$ is a fixed nilpotent group, the Bass–Guivarc'h formula [Bas72, Gui73] tells us that, up to multiplicative constants, for a generic $k$-tuple $g$ in $G$,
$$V_g(n) \asymp n^{\eta_G(k)}, \qquad (7.2)$$
where $\eta_G(k)$ is a positive integer that can be expressed in terms of the ranks of the successive quotients of the central descending series of $F_{k,G}$, see (7.7) below. Now, given a $k$-tuple $g$ of elements of $G$, we can define another diophantine exponent, denoted $\alpha(g)$, as follows. It is the infimum of all $\alpha > 0$ such that for all but finitely many $\omega \in F_{k,G}$ we have: where $\ell(\omega)$ is the length of an element $\omega \in F_{k,G}$, i.e. the minimal length of a word $w$ such that $w_G = \omega$. Recall that the diophantine exponent of the subgroup $\Gamma_g$ was defined in Section 2 by It follows from the above observations about $V_g(n)$ that, for almost every $k$-tuple $g$ in $G$,
$$\alpha(g) = \eta_G(k)\,\beta(\Gamma_g). \qquad (7.4)$$
7.2. From words to Lie brackets. In this subsection, we describe the correspondence between words on $G$ and laws on its Lie algebra $\mathfrak{g}$. We first define the Lie algebra $\mathcal{F}_{k,\mathfrak{g}}$ of bracket maps on $k$ letters on $\mathfrak{g}$. Let $\mathcal{F}_k$ be the free Lie algebra on $k$ generators. Each element $r$ in $\mathcal{F}_k$ yields a map $r: \mathfrak{g}^k \to \mathfrak{g}$, $(X_1, \dots, X_k) \mapsto r(X_1, \dots, X_k)$, where $r(X_1, \dots, X_k)$ is the evaluation of the formal bracket $r$ at the point $(X_1, \dots, X_k)$ in $\mathfrak{g}^k$. By definition, the Lie algebra $\mathcal{F}_{k,\mathfrak{g}}$ consists of all maps from $\mathfrak{g}^k$ to $\mathfrak{g}$ obtained in the above manner. It is naturally isomorphic to the quotient Lie algebra $\mathcal{F}_k/L_{k,\mathfrak{g}}$, where $L_{k,\mathfrak{g}}$ is the ideal of laws on $k$ letters on the Lie algebra $\mathfrak{g}$. Recall that a law on $k$ letters is an element $r \in \mathcal{F}_k$ such that $r(X_1, \dots, X_k) = 0$ for all $X_1, \dots, X_k$ in $\mathfrak{g}$.
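Note that the free Lie algebra $\mathcal{F}_k$ has a natural $\mathbb{Q}$-structure induced by the subring $\mathcal{F}_k(\mathbb{Z})$ of integer linear combinations of bracket monomials. We can thus consider the ideal of rational laws $L_{k,\mathfrak{g},\mathbb{Q}}$, which is the real span of the intersection of $L_{k,\mathfrak{g}}$ with $\mathcal{F}_k(\mathbb{Z})$. It thus inherits a rational structure, which induces a rational structure on the quotient space $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}} = \mathcal{F}_k/L_{k,\mathfrak{g},\mathbb{Q}}$. We will say that $r$ is an element of $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}(\mathbb{Z})$ if it is the projection on $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ of an element of $\mathcal{F}_k(\mathbb{Z})$.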
Remark 7.2.1. Certainly, if $\mathfrak{g}$ itself is defined over the rationals, then so is $L_{k,\mathfrak{g}}$, but the converse does not hold. For example, it follows from the analysis made in [ABRdS15a, Appendix A] that the ideal of laws is always defined over $\mathbb{Q}$ if $\mathfrak{g}$ is a nilpotent Lie algebra of step at most 5, or if $\mathfrak{g}$ is both nilpotent and metabelian.
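The Lie algebra $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ has a graded structure $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}} = \bigoplus_{i=1}^{s}\mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}$, where $\mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}$ is the homogeneous part of $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ consisting of brackets of degree $i$. For $r = \sum_i r_i$ with $r_i \in \mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}$, we set
$$|r| = \max_{1\le i\le s}\|r_i\|^{1/i}, \qquad (7.5)$$
where $\|\cdot\|$ is a fixed norm on $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$.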
Lemma 7.2.2. Let $\mathfrak{g}$ be a real nilpotent Lie algebra and $G$ the simply connected Lie group with Lie algebra $\mathfrak{g}$. There are positive integers $C, D$ such that:
• If $\omega \in F_{k,G}$, then there exists $r \in \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}(\mathbb{Z})$ with $|r| \le D\,\ell(\omega)$ such that for all $X_1, \dots, X_k$ in $\mathfrak{g}$, $\omega(e^{X_1}, \dots, e^{X_k}) = e^{\frac{1}{C} r(X_1, \dots, X_k)}$.
• If $r \in \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}(\mathbb{Z})$, then there exists $\omega \in F_{k,G}$ with $\ell(\omega) \le D|r|$ such that for all $X_1, \dots, X_k$ in $\mathfrak{g}$, $e^{C r(X_1, \dots, X_k)} = \omega(e^{X_1}, \dots, e^{X_k})$.
Proof. This was proved in [ABRdS15a, Lemma 3.5] for the free Lie algebra F k and the free group F k . The relative version stated here follows without difficulty, using that F k,g, Recall that |X| denotes the local quasi-norm on g defined in (7.1). The above lemma has the following immediate consequence.
Proposition 7.2.3. Let $G$ be a simply connected nilpotent Lie group with Lie algebra $\mathfrak{g}$. Let $g = (e^{X_1}, \dots, e^{X_k})$ be a $k$-tuple in $G$. The exponent $\alpha(g)$ defined in (7.3) is also the infimum of all $\alpha > 0$ such that
$$|r(X_1, \dots, X_k)| \ge |r|^{-\alpha} \qquad (7.6)$$
holds for all but finitely many $r \in \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}(\mathbb{Z})$.
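In step 2 the correspondence of Lemma 7.2.2 can be seen by hand: in any Lie group of nilpotency class 2, $e^X e^Y e^{-X} e^{-Y} = e^{[X,Y]}$, so the commutator word $\omega = x_1 x_2 x_1^{-1} x_2^{-1}$ corresponds to the bracket $r = [x_1, x_2]$ (with $C = 1$ for this particular word). The sketch below checks this identity exactly in the Heisenberg group of $3 \times 3$ unipotent upper triangular matrices. The sample entries of `X` and `Y` are arbitrary, and `exp_nilpotent` relies only on $N^3 = 0$ for strictly upper triangular $3 \times 3$ matrices.

```python
from fractions import Fraction

def mat(rows):
    return [[Fraction(a) for a in r] for r in rows]

def add(a, b):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(a, b)]

def scale(c, a):
    return [[c * p for p in r] for r in a]

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """Lie bracket [a, b] = ab - ba."""
    return add(mul(a, b), scale(-1, mul(b, a)))

IDENTITY = mat([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

def exp_nilpotent(n_mat):
    """exp(N) = I + N + N^2/2, exact for strictly upper triangular 3x3 N (N^3 = 0)."""
    return add(add(IDENTITY, n_mat), scale(Fraction(1, 2), mul(n_mat, n_mat)))

# Hypothetical sample elements of the Heisenberg Lie algebra.
X = mat([[0, 2, 5], [0, 0, -3], [0, 0, 0]])
Y = mat([[0, -1, 7], [0, 0, 4], [0, 0, 0]])

# The word map x1 x2 x1^{-1} x2^{-1} evaluated at (e^X, e^Y).
word = mul(mul(exp_nilpotent(X), exp_nilpotent(Y)),
           mul(exp_nilpotent(scale(-1, X)), exp_nilpotent(scale(-1, Y))))
print(word == exp_nilpotent(bracket(X, Y)))   # True
```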
It also follows from Lemma 7.2.2 that the group of word maps $F_{k,G}$ is included as a lattice in the simply connected rational nilpotent Lie group whose Lie algebra is $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ [ABRdS15a, Proposition 3.9]. The Bass–Guivarc'h formula for the growth exponent of $F_{k,G}$ from (7.2) now reads:
$$\eta_G(k) = \sum_{i=1}^{s} i\,\dim\mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}. \qquad (7.7)$$
Each $\mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}$ is a module for the action of $GL_k$ under linear substitution. We will see below that, as a consequence, $\dim\mathcal{F}^{[i]}_{k,\mathfrak{g},\mathbb{Q}}$ is a degree $i$ polynomial in $k$ with rational coefficients when $k$ is large enough. It follows that, for $k$ large, $\eta_G(k)$ is given by a degree $s$ polynomial in $k$. Therefore, in view of (7.4), we now focus on computing $\alpha(g)$ for a random tuple $g$.

7.3. Existence of the exponent. We now use the results of Section 5 to study the diophantine problem described by (7.6), and prove Theorem 1.12 from the introduction, which we recall here:
Theorem 7.3.1 (Existence of the exponent). Let $G$ be a connected and simply connected nilpotent real Lie group endowed with a left-invariant geodesic metric $d$. For each $k \ge 1$, there is $\beta_k \in [0, +\infty]$ such that for almost every $k$-tuple $g \in G^k$ with respect to Haar measure, we have
Proof. We set $E = \mathfrak{g}$, $V = \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$, $\Delta = \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}(\mathbb{Z})$, and we define the local quasi-norm $|X|$ on $E = \mathfrak{g}$ by (7.1) and the quasi-norm $|r|$ on $V$ by (7.5). We let $U = \mathfrak{g}^k$ and
$$\Phi: U \to \mathrm{Hom}(V,E), \qquad g \mapsto \big(\Phi(g): r \mapsto r(g)\big).$$
Now we are in the setting of Corollary 5.2.12 from Section 5 and hence α(g) is almost everywhere constant. In view of (7.4) this shows that β(g) is also well-defined and constant almost everywhere.
We denote by $\alpha_k$ the almost sure value of $\alpha(g)$ for $g \in \mathfrak{g}^k$. Hence (7.4) becomes:
$$\beta_k = \frac{\alpha_k}{\eta_G(k)}. \qquad (7.8)$$
If we assume moreover that $G$ and the geodesic metric are rational, we may even conclude that $\alpha_k$ is rational. We define a rational nilpotent Lie group as a nilpotent Lie group $G$ with Lie algebra $\mathfrak{g}$ endowed with a rational structure induced by a basis $\mathcal{B}$ with rational structure constants. A left-invariant geodesic metric on a rational nilpotent Lie group $G$ is said to be rational if the associated subspaces $V_i$, defined at the beginning of this section, are rational. For this, it is enough to require that $V_1$ is rational.
Theorem 7.3.2 (Rationality of the exponent). Let $G$ be a connected and simply connected rational nilpotent real Lie group, and $d(\cdot,\cdot)$ a rational left-invariant geodesic metric on $G$. Then, for all $k \ge 1$,
Proof. We use the notation of the previous proof. If $\mathfrak{g}$ is rational, then the Zariski closure $M$ of $\Phi(U)$ is defined over $\mathbb{Q}$. Indeed, complexifying $E$, $V$ and $\Phi$, we see that $\sigma(\Phi(g)) = \Phi(\sigma(g))$ for all $\sigma \in \mathrm{Gal}(\mathbb{C}|\mathbb{Q})$. Therefore, Theorem 6.2.1 applies and we get that $\alpha_k = \tau_{\mathbb{Q}}(M)$. Now note that the exponents $\alpha_i$ and $\alpha'_i$ defining our quasi-norms are integers. As a result, the functions $\psi$ and $\varphi$ are integer-valued, and thus $\alpha_k \in \mathbb{Q}$.
7.4. The relatively free Lie algebra as a GL k -module. In order to prove the second part of Theorem 1.13, we need some understanding of the GL k -submodules of F k,g,Q . The action of the linear group GL k on the free Lie algebra F k on k letters is by substitution of the variables. It turns out that the decomposition of each homogeneous component F then F s (W ) decomposes, with the same multiplicities, as where E λ (k) is the irreducible representation of GL k with Young diagram λ.
Remark 7.4.2. The Young diagrams appearing in either decomposition have at most s boxes. Furthermore F s sends L k,g to L s,g and L k,g,Q to L s,g,Q , if g is any Lie algebra of nilpotency class s. In particular, as k grows, F k,g,Q has only boundedly many irreducible GL k -submodules counting multiplicity in its decomposition, all obtained by the above process from the decomposition of F s,g,Q into irreducible GL s -submodules.
Proof. Note that elements of F ≤s k are linear combinations of brackets having at most s letters, and that F ≤s k decomposes into weight spaces for the diagonal action (t 1 , . . . , t k ) · c(x 1 , . . . , x k ) = c(t 1 x 1 , . . . , t k x k ).
k is an irreducible GL k -module, then it is generated by a highest weight vector, whose weight λ is of the form (n 1 , . . . , n k ) with n i = 0 if i > s. The corresponding GL s -submodule F s (W ) is generated by the same highest weight vector. It is therefore an irreducible GL s -module with the same Young diagram. The result follows.
Given a Young diagram $\lambda = (\lambda_1 \ge \lambda_2 \ge \dots)$, by the Weyl dimension formula, the dimension $d_\lambda(k)$ of the irreducible $GL_k$-module associated to $\lambda$ is:
$$d_\lambda(k) = \prod_{1\le a<b\le k}\frac{\lambda_a - \lambda_b + b - a}{b - a}.$$
In particular, if $i$ is the total number of boxes of $\lambda$, then $d_\lambda(k)$ is a degree $i$ polynomial in $k$ with rational coefficients. The number of boxes of $E_\lambda \le \mathcal{F}^{\le s}_k$ is the number of letters appearing in the brackets of the associated submodule of $\mathcal{F}^{\le s}_k$. In particular, we obtain: k,g,Q is a polynomial in $k$ of degree $i$ with rational coefficients, provided $k \ge s$. From (7.7), the same holds for the growth exponent $\eta_G(k)$. More generally, there is a finite family $F$ of rational polynomials of degree at most $s$ such that, for $k \ge s$, if $W$ is any $GL_k$-submodule of $\mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$, then $\dim W = P(k)$ for some $P \in F$.
7.5. Stability of the exponent. We now prove the second part of Theorem 1.13 from the introduction, which we recall here for convenience.
Theorem 7.5.1 (Stability of the exponent). Let $G$ be a connected and simply connected rational nilpotent real Lie group, and $d(\cdot,\cdot)$ a rational left-invariant geodesic metric on $G$. There exists a rational function $F_{\mathfrak{g}} \in \mathbb{Q}(X)$ such that, for $k$ large enough,
Proof. We already know from Corollary 7.4.3 that, for $k \ge s$, $\eta_G(k)$ is given by a degree $s$ polynomial in $k$ with rational coefficients. So, in view of (7.8), we need only prove that $\alpha_k$ is given by a rational function for large $k$. We saw in §7.3 that $\alpha_k = \tau_{\mathbb{Q}}(M)$, where $\tau_{\mathbb{Q}}(M)$ is given by (4.9) and takes the form (6.1), built from the submodular functions $-\psi_M$ and $\varphi_M$ defined on the Grassmannian $\mathrm{Grass}(V)$, where $V = \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$.
The group $GL_k$ acts by linear substitution on $V$ and preserves the flag of subspaces (7.10) defining the quasi-norm $|r|$ in (7.5), so the function $\psi$ is $GL_k$-invariant. Noting that $gW \cap \ker\Phi(u) = g(W \cap \ker\Phi(g^{-1}u))$ for any $g \in GL_k$, $u \in \mathfrak{g}^k$ and $W \le V$, we conclude that $\psi_M$ is $GL_k$-invariant. Similarly, $\varphi_M$ is $GL_k$-invariant. We can therefore apply Corollary 6.1.2 of the submodularity lemma, and conclude that there is a $GL_k$-invariant $\Delta$-rational subspace $W \le V$ such that (7.11) Combining (7.5) with (4.5) and (7.1) with (4.6), we obtain $\psi_M$ and $\varphi_M$ explicitly in the form: for all $x \in \mathfrak{g}^k$ in a Zariski open subset, where $\pi_i: \mathfrak{g} \to \mathfrak{g}/V_i$. Recall that $V_i \le V = \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ is defined in (7.10) above, and $V_i \le \mathfrak{g}$ at the beginning of this section; also note that this notation is not perfectly coherent with the notation in Section 5, where $\{V_i\}$ and $\{V'_i\}$ are full flags. According to Proposition 7.4.1 and Corollary 7.4.3, the dimensions of $GL_k$-invariant subspaces of $V = \mathcal{F}_{k,\mathfrak{g},\mathbb{Q}}$ can take only boundedly many values, each given by a polynomial in $k$ with rational coefficients and degree at most $s$. It then follows from the special form (4.5) taken by $\psi$ that $\psi_M$ can take only boundedly many values, each of which is one of boundedly many polynomials in $k$ of degree at most $s$ with rational coefficients. We conclude that the same holds for $\alpha_k$, since $\varphi$ is integer-valued and bounded in terms of $\dim\mathfrak{g}$ only. When $k$ is large enough (larger than a constant depending only on the size of the coefficients of these polynomials, hence only on $\dim\mathfrak{g}$), this maximum will be achieved by a single polynomial with rational coefficients and of degree at most $s$. In fact the degree will be exactly $s$, because of the Dirichlet lower bound and the fact that already with W = F
Now that we have finished proving Theorem 1.13, we briefly explain how to derive Corollary 1.14.
Proof of Corollary 1.14. The existence of the limit follows immediately from the theorem. To prove the upper bound, note that φ takes integer values, while ψ is bounded above by s dim F k,g,Q , as follows from (4.5). However, according to the Bass-Guivarc'h formula (7.7), η G (k) is asymptotic to s dim F k,g,Q . This shows the upper bound. For the lower bound, note that τ Q (M) is a maximum over the subspaces W ; evaluating at W = F . When d(·, ·) is Riemannian, all α i are 1 and then φ(W ) ≤ dim g [s] . This ends the proof. We will see explicit examples in the next section, where both upper and lower bounds are attained for lim k β k , see also Remark 8.4.3.
Remark 7.5.2 (Irrational g). If L k,g is not rational, then there is a non-trivial subspace W ≤ F k,g,Q for which φ M (W ) = 0, and in particular τ (M) = +∞, so we cannot conclude anything in this case. However if L k,g is rational and even if g is not, we can assert that the conclusion of Theorem 1.13 holds in certain cases. For example it holds whenever F k,g is multiplicity-free as a GL k -module. Indeed in this case every GL k -invariant subspace is rational (because GL k is Q-split). Since we know that the maximum in τ (M) is attained at a GL k -invariant subspace, we obtain again τ (M) = τ Q (M) in this case. We will make use of this observation in some of the examples below.
Remark 7.5.3 (The exponent is attained in the last step). Let When $k$ is large enough, the $GL_k$-invariant rational subspace $W$ realizing the maximum of $\psi_M/\varphi_M$ in (7.11) can be chosen to belong to $L$. This can be seen easily by writing $W = W_0 \oplus W'_0$ with $W_0 = W \cap L$, for a $GL_k$-invariant $W'_0$ (which exists by complete reducibility of the $GL_k$-action). Then from (7.12) it follows that $\psi_M(W) - \psi_M(W_0)$ is a polynomial of degree at most $s - 1$ in $k$, while $\varphi_M(W) > \varphi_M(W_0)$ unless $W = W_0$. From these two facts we see that, for large enough $k$,

Explicit values of the critical exponent in some examples
In this final section, we illustrate Theorem 1.13 and Corollary 1.14 and work out an explicit value for the critical exponent $\beta_k$ in several examples. For definiteness, we always assume in this section that the metric on $G$ is Riemannian.
8.1. Nilpotent Lie groups of step 2. In this paragraph, $G$ is a connected simply connected non-abelian nilpotent Lie group of step 2. We denote by $d_i$, $i = 1, 2$, the dimension of $\mathfrak{g}^{[i]}$. The homogeneous components $\mathcal{F}^{[1]}_{k,\mathfrak{g}}$ and $\mathcal{F}^{[2]}_{k,\mathfrak{g}}$ are both irreducible $GL_k$-modules, of dimension $k$ and $k(k-1)/2$ respectively. In particular, Remark 7.5.2 applies, and this implies that $\alpha_k = \frac{k(k-1)}{d_2} - 2$. Since $\eta_G(k) = k + k(k-1) = k^2$ according to (7.7), we get:
Theorem 8.1.1 (Step 2 nilpotent Lie groups). Let $G$ be a connected simply connected non-abelian nilpotent Lie group of step 2. Set $d_1 = \dim G/[G,G]$ and $d_2 = \dim[G,G]$, and let $k \ge d_1$ be an integer. The critical exponent for $k$-tuples in $G$ is
$$\beta_k = \frac{\alpha_k}{\eta_G(k)} = \frac{k(k-1) - 2d_2}{d_2\,k^2}.$$
In the special case of the 3-dimensional Heisenberg group, $d_2 = 1$, and we thus recover the computation made in Section 3.
8.2. Metabelian nilpotent Lie groups. Suppose now, more generally, that $G$ is a simply connected metabelian nilpotent Lie group, that is, we assume that $[G,G]$ is abelian. This does not constrain the nilpotency class.
No assumption of rationality on G is made. It is shown in [ABRdS15a, k,g is irreducible as a GL k -module for each i and isomorphic to E (i−1,1) (k). Hence Remark 7.5.2 applies. In particular, for k large we obtain: On the other hand, the Bass-Guivarc'h formula reads: i dim E (i−1,1) (k).
Using Weyl's dimension formula, we may compute $\dim E_{(i-1,1)}(k) = (i-1)\binom{i+k-2}{i}$, a polynomial of degree $i$ in $k$.
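Weyl's dimension formula for $GL_k$ is easy to evaluate exactly, which gives a quick check of the closed form $\dim E_{(i-1,1)}(k) = (i-1)\binom{i+k-2}{i}$ used above. The sketch below pads the partition with zeros to length $k$ and multiplies the factors $\frac{\lambda_a - \lambda_b + b - a}{b - a}$ as exact fractions; the function name `weyl_dim` is hypothetical.

```python
from fractions import Fraction
from math import comb

def weyl_dim(lam, k):
    """Dimension of the irreducible GL_k-module with Young diagram lam (Weyl's formula)."""
    if len(lam) > k:
        return 0                          # diagrams with more than k rows do not occur
    l = list(lam) + [0] * (k - len(lam))  # pad the partition with zeros
    d = Fraction(1)
    for a in range(k):
        for b in range(a + 1, k):
            d *= Fraction(l[a] - l[b] + b - a, b - a)
    return int(d)

# Check dim E_{(i-1,1)}(k) = (i-1) * binom(i+k-2, i) on a range of values.
for i in range(2, 7):
    for k in range(2, 9):
        assert weyl_dim((i - 1, 1), k) == (i - 1) * comb(i + k - 2, i)

print(weyl_dim((1, 1), 5))   # exterior square of a 5-dimensional space: 10
```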
Theorem 8.2.1 (Metabelian nilpotent groups). When $k$ is large enough, the critical exponent is given by $\beta_k = \alpha_k/\eta_G(k)$ with the above polynomial expressions for $\alpha_k$ and $\eta_G(k)$. In particular, $\lim_{k\to\infty}\beta_k = \frac{1}{\dim G^{(s)}}$.
8.3. Unipotent upper triangular matrices. We now deal with the case of the group U s of upper triangular unipotent (s + 1) × (s + 1) matrices. Its Lie algebra is the Lie algebra g = u s of upper triangular matrices with zero diagonal, and our first task will be to determine, for any positive integer k, the Lie algebra F k,g of bracket maps on g on k letters. The result is the following.
Proposition 8.3.1. Let k be a positive integer, and let g = u s be the Lie algebra of upper triangular (s+1)×(s+1) matrices with zero diagonal terms. The Lie algebra F k,g of bracket maps on g on k letters is isomorphic to the free s-step nilpotent Lie algebra F k,s .
An alternative formulation of Proposition 8.3.1 is in terms of the nilpotent group of unipotent upper triangular matrices: Corollary 8.3.2. Let U s be the group of upper triangular unipotent (s+1)× (s + 1) matrices, and let k be any positive integer. Then U s contains the free nilpotent group of step s on k generators as a finitely generated subgroup.
Proof of Proposition 8.3.1. Let k ≥ s be a positive integer and let F k denote the free Lie algebra on the k generators x 1 , x 2 , . . . , x k . We have to check that g = u s has no non-trivial relation of degree less than or equal to s in F k . First, by the proof of [Bah87, Theorem 3, page 99] we note that if g satisfies a non-trivial relation, then it also satisfies a non-trivial multilinear relation whose degree is not larger. Now, if r is such a multilinear relation in g, then so is each of its homogeneous components, so we may assume r has degree one in each x i , 1 ≤ i ≤ t. In fact, we may also assume t = s, otherwise, we replace r by [[r, x t+1 ], . . . , x s ]. So we just have to see that g has no multilinear relation in s variables.
Let $H_s$ be the vector space of all elements of $\mathcal{F}_k$ of degree $s$ that are multilinear in $(x_1, x_2, \dots, x_s)$. We want to check that the canonical map $\theta: H_s \to \mathcal{F}_{k,\mathfrak{g}}$ is injective. By Witt's formula, $\dim H_s = (s-1)!$. (8.1) This gives us a family $(m_\sigma)$ of $(s-1)!$ elements in $H_s$. We will show that its image $(\theta(m_\sigma))$ under $\theta$ is linearly independent in $\mathcal{F}_{k,\mathfrak{g}}$; together with (8.1), this will prove the proposition. For $1 \le i, j \le s+1$ we denote by $E_{i,j}$ the matrix whose only non-zero entry is 1, in position $(i,j)$. For $1 \le i \le s$, we also let $e_i = E_{i,i+1}$. Using the relations $[E_{i,j}, E_{k,l}] = \delta_{jk}E_{i,l} - \delta_{li}E_{k,j}$, one can compute the values of the $\theta(m_\sigma)$ on permutations of the $s$-tuple $(e_i)$, and get, for any two permutations $\sigma$ and $\tau$ of $\{1, 2, \dots, s\}$, both fixing 1:
$$\theta(m_\sigma)(e_1, e_{\tau(2)}, \dots, e_{\tau(s)}) = \begin{cases} E_{1,s+1} & \text{if } \sigma = \tau^{-1},\\ 0 & \text{otherwise.}\end{cases}$$
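This implies in particular that the family $(\theta(m_\sigma))$ is linearly independent. Indeed, if $\sum_\sigma c_\sigma \theta(m_\sigma) = 0$, then evaluating at $(e_1, e_{\tau(2)}, \dots, e_{\tau(s)})$ shows that $c_{\tau^{-1}} = 0$ for every $\tau$. So we are done.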
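The evaluation formula at the heart of the proof can be checked by brute force for small $s$. The Python sketch below uses plain lists and no external libraries, and all helper names are hypothetical. It builds $e_i = E_{i,i+1}$ inside $\mathfrak u_s$ and evaluates the left-normed bracket $m_\sigma = [\dots[[x_1, x_{\sigma(2)}], x_{\sigma(3)}], \dots, x_{\sigma(s)}]$ at $(e_1, e_{\tau(2)}, \dots, e_{\tau(s)})$. It then tests that the result is $E_{1,s+1}$ when $\sigma = \tau^{-1}$ and $0$ otherwise. It assumes this left-normed form of $m_\sigma$ and is only a sanity check, not part of the argument.

```python
from itertools import permutations

def elementary(i, j, n):
    """n x n matrix E_{i,j} (1-indexed) with a single entry 1 at position (i, j)."""
    m = [[0] * n for _ in range(n)]
    m[i - 1][j - 1] = 1
    return m

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def left_normed(mats):
    """Hypothetical helper: [...[[m1, m2], m3], ..., mk]."""
    acc = mats[0]
    for m in mats[1:]:
        acc = bracket(acc, m)
    return acc

def inverse(perm):
    """Inverse of a permutation given as a 1-indexed tuple."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm, start=1):
        inv[p - 1] = i
    return tuple(inv)

def check_evaluation(s):
    """True iff m_sigma(e_1, e_tau(2), ..., e_tau(s)) is E_{1,s+1} when sigma = tau^{-1}, else 0."""
    n = s + 1
    e = [elementary(i, i + 1, n) for i in range(1, s + 1)]      # e[i-1] = e_i
    perms = [(1,) + p for p in permutations(range(2, s + 1))]   # permutations fixing 1
    top, zero = elementary(1, n, n), [[0] * n for _ in range(n)]
    for tau in perms:
        x = [e[tau[j] - 1] for j in range(s)]                   # x_j = e_{tau(j)}
        for sigma in perms:
            value = left_normed([x[sigma[i] - 1] for i in range(s)])
            expected = top if inverse(sigma) == tau else zero
            if value != expected:
                return False
    return True

for s in range(2, 6):
    print(s, check_evaluation(s))
```

The cost grows like $((s-1)!)^2$, so this is only practical for small $s$.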
From Proposition 8.3.1, it is not difficult to compute the critical exponent for the group $G = U_s$ of unipotent upper triangular matrices. Indeed, if $r \in F_k$ is a law of $\mathfrak g/\mathfrak g^{(s)}$, then $[x_{k+1}, r]$ is a law of $\mathfrak g$. By Proposition 8.3.1, all its homogeneous components therefore have degree at least $s+1$. Hence all homogeneous components of $r$ have degree at least $s$. […] where $M(x) = \sum_{n \le x} \mu(n)$ is the Mertens function, and we thus obtain:
Theorem 8.3.3 (Critical exponent for $U_s$). Let $U_s$ be the nilpotent group of unipotent upper triangular $(s+1)\times(s+1)$ matrices. Then for $k$ large, the critical exponent for $k$-tuples in $U_s$ is
$$\beta_k = \frac{\sum_{d \mid s} \mu(d)\, k^{s/d} - s}{\sum_{i \le s} M(s/i)\, k^i},$$
with $M(x) = \sum_{n \le x} \mu(n)$ the Mertens function. In particular, $\lim_{k \to \infty} \beta_k = 1$.
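The closed formula of Theorem 8.3.3 is easy to evaluate exactly. The Python sketch below uses only the standard library, and its function names are hypothetical. It computes $\beta_k$ with rational arithmetic, using the Möbius function $\mu$ and reading $M(s/i)$ as $M(\lfloor s/i \rfloor)$. As a cross-check, it verifies the identity $\sum_{i\le s} M(s/i)\,k^i = \sum_{i\le s} i\, w_i(k)$. Here $w_i(k) = \frac1i\sum_{d\mid i}\mu(d)k^{i/d}$ is Witt's dimension of the degree-$i$ part of the free Lie algebra on $k$ generators. Reading the denominator as this homogeneous dimension of $F_{k,s}$ is an interpretation assumed here, not a definition quoted from the text.

```python
from fractions import Fraction

def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # n has a square factor
            result = -result
        p += 1
    return -result if m > 1 else result

def mertens(x):
    """M(x) = sum of mu(n) for 1 <= n <= x, x a non-negative integer."""
    return sum(mobius(n) for n in range(1, x + 1))

def witt(k, i):
    """Dimension of the degree-i component of the free Lie algebra on k generators."""
    return sum(mobius(d) * k ** (i // d) for d in range(1, i + 1) if i % d == 0) // i

def denominator(k, s):
    """sum_{i <= s} M(s/i) k^i, with M(s/i) = M(floor(s/i))."""
    return sum(mertens(s // i) * k ** i for i in range(1, s + 1))

def beta_unipotent(k, s):
    """Formula of Theorem 8.3.3 for k-tuples in U_s (the theorem asserts it for k large)."""
    numerator = sum(mobius(d) * k ** (s // d) for d in range(1, s + 1) if s % d == 0) - s
    return Fraction(numerator, denominator(k, s))

# Cross-check: the denominator equals sum_i i * w_i(k).
for s in range(1, 7):
    for k in range(1, 8):
        assert denominator(k, s) == sum(i * witt(k, i) for i in range(1, s + 1))

print(beta_unipotent(4, 2))              # 5/8
print(float(beta_unipotent(10**6, 3)))   # close to the limit 1
```

Exact fractions avoid rounding when comparing values for moderate $k$. Only the last line uses floating point, to display the approach to the limit.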
Note that the limit also follows directly from Theorem 1.14, since for $G = U_s$ we have $\dim G^{(s)} = 1$.
8.4. Free nilpotent Lie algebras.
In this subsection $\mathfrak g$ will denote the free nilpotent Lie algebra $F_{d,s}$ of step $s$ on $d$ generators. We will assume throughout that $d \ge s$.
Note that $\mathfrak g$ has a natural structure of $GL_d$-module. Given a Young diagram $\lambda$, we denote by $d_\lambda(d)$ the dimension of the irreducible $GL_d$-representation $E_\lambda(d)$ associated to $\lambda$.
Theorem 8.4.1 (Critical exponent for free nilpotent groups). Let $G$ be the connected simply connected Lie group with Lie algebra $\mathfrak g = F_{d,s}$. Assume that $d \ge s$. Then, if $k$ is large enough, the critical exponent for $k$-tuples in $G$ is
$$\beta_k = \frac{[\,\dots\,]}{\sum_{i \le s} M(s/i)\, k^i},$$
where $M(x) = \sum_{n \le x} \mu(n)$ is the Mertens function.
Passing to the limit, we get:
Corollary 8.4.2 (Limiting value). For $d \ge s$, we have the following limit for the critical exponent for $\mathfrak g = F_{d,s}$: […]
Remark 8.4.3. This shows that the strict inequality $\frac{1}{\dim \mathfrak g^{(s)}} < \lim_{k\to\infty} \beta_k < 1$ can happen.
We now pass to the proof of Theorem 8.4.1. Our Lie algebra $\mathfrak g$ has the following special property: for each $k \ge s$, the laws of $\mathfrak g/\mathfrak g^{(s)}$ are either laws of $\mathfrak g$ or are contained in $F^{[s]}_{k,\mathfrak g}$. That is, if $r \in F_{k,\mathfrak g}$ and $r(\mathfrak g^k) \subseteq \mathfrak g^{(s)}$, then $r \in F^{[s]}_{k,\mathfrak g}$. So it follows from Remark 7.5.3 that the maximum value of $\tau_{\mathbb Q}(M)$ is attained at a rational $GL_k$-invariant subspace $W$ contained in $F^{[s]}_{k,\mathfrak g}$. For such a subspace, $\psi_M(W) = s(\dim W - \varphi_M(W))$ and $\varphi_M(W) = \dim W(x)$, where $W(x)$ is the image of $W$ in $\mathfrak g$ under evaluation at a generic $x \in \mathfrak g^k$. Note that if $k \ge s$, then, for $x$ in a dense Zariski open subset of $\mathfrak g^k$, the space $W(x)$ is independent of $x$. It equals the span $W_{\mathfrak g}$ of all $r(x_1, \dots, x_k)$ with $r \in W$ and $x_1, \dots, x_k \in \mathfrak g$. Thus, […] where $W$ ranges over the rational $GL_k$-submodules of $F^{[s]}_{k,\mathfrak g}$.
Recall Proposition 7.4.1, which describes precisely the $GL_k$-submodules of $F^{[s]}_{k,\mathfrak g}$. An immediate consequence of this proposition is the following simpler expression: […] where the maximum is taken over all Young diagrams $\lambda$ that appear in the decomposition of $\mathfrak g^{(s)}$ as a $GL_d$-module. The following theorem of Klyachko describes this set of diagrams:
Theorem 8.4.4 (Klyachko [Klj74, Reu93]). Let $\mathfrak g = F_{d,s}$. Then all Young diagrams with $s$ boxes and at most $d$ rows appear in the decomposition of $\mathfrak g^{[s]}$ into irreducible $GL_d$-modules, with the following exceptions: $(s)$ and $(1, 1, \dots, 1)$, and, in addition, $\lambda = (2,2)$ when $s = 4$ and $\lambda = (2,2,2)$ when $s = 6$.
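We thus need to maximize $d_\lambda(k)/d_\lambda(d)$ over all diagrams $\lambda$ with $s$ boxes and at most $d$ rows, other than the above exceptions. For this we prove the following lemma:
Lemma 8.4.5. Let $\mu$ and $\lambda$ be two Young diagrams with at most $d$ rows and the same number of boxes. If $\mu$ can be obtained from $\lambda$ by moving some boxes downwards, then for any $k \ge d$,
$$\frac{d_\lambda(k)}{d_\lambda(d)} \le \frac{d_\mu(k)}{d_\mu(d)}.$$
Proof. Without loss of generality we may assume that $\mu$ is obtained from $\lambda$ by moving only one box downwards, i.e. $\mu_i = \lambda_i$ except for $\mu_r = \lambda_r - 1$ and $\mu_s = \lambda_s + 1$, for some indices $r < s$. Weyl's dimension formula states that $d_\lambda(n) = \prod_{1 \le i < j \le n} \frac{\lambda_i - \lambda_j + j - i}{j - i}$. Using it, together with $\lambda_j = \mu_j = 0$ for $j > d$, we get
$$\frac{d_\lambda(k)}{d_\lambda(d)} \Big/ \frac{d_\mu(k)}{d_\mu(d)} = \prod_{j=d+1}^{k} \frac{(\lambda_r + j - r)(\lambda_s + j - s)}{(\lambda_r + j - r - 1)(\lambda_s + j - s + 1)}.$$
But $\lambda_s \le \lambda_r$ and $r < s$, so each factor has the form $\frac{ab}{(a-1)(b+1)}$ with $a - b \ge 1$. Hence each factor in the above product is at most $1$.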
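Lemma 8.4.5, and the choice of maximizer made next, can be sanity-checked numerically. The Python sketch below uses only the standard library, and its helper names are hypothetical. It computes $d_\lambda(n)$ from Weyl's dimension formula. It then verifies, for a few small $(s, d, k)$ with $k > d$, that moving a single box downwards never decreases $d_\lambda(k)/d_\lambda(d)$. Finally, it finds the maximizer of this ratio over the diagrams allowed by Theorem 8.4.4. It only checks examples and proves nothing.

```python
from fractions import Fraction
from math import prod

def weyl_dim(lam, n):
    """Dimension of the irreducible GL_n-module with highest weight lam (Weyl's formula)."""
    if len(lam) > n:
        return 0
    l = list(lam) + [0] * (n - len(lam))
    num = prod(l[i] - l[j] + j - i for i in range(n) for j in range(i + 1, n))
    den = prod(j - i for i in range(n) for j in range(i + 1, n))
    return num // den

def partitions(s, max_part=None):
    """All partitions of s, as non-increasing tuples."""
    max_part = s if max_part is None else max_part
    if s == 0:
        yield ()
        return
    for first in range(min(s, max_part), 0, -1):
        for rest in partitions(s - first, first):
            yield (first,) + rest

def klyachko_diagrams(s, d):
    """Diagrams with s boxes and at most d rows, minus the exceptions of Theorem 8.4.4."""
    excluded = {(s,), (1,) * s}
    if s == 4:
        excluded.add((2, 2))
    if s == 6:
        excluded.add((2, 2, 2))
    return [lam for lam in partitions(s) if len(lam) <= d and lam not in excluded]

def ratio(lam, k, d):
    return Fraction(weyl_dim(lam, k), weyl_dim(lam, d))

def one_box_down(lam, d):
    """Diagrams with at most d rows obtained from lam by moving one box to a lower row."""
    l = list(lam) + [0] * (d - len(lam))
    for r in range(d):
        for t in range(r + 1, d):
            m = l[:]
            m[r] -= 1
            m[t] += 1
            if m[r] >= 0 and all(m[i] >= m[i + 1] for i in range(d - 1)):
                yield tuple(x for x in m if x > 0)

for s in range(3, 7):
    for d in (s, s + 1):
        for k in (d + 1, d + 3):
            for lam in partitions(s):
                if len(lam) <= d:
                    for mu in one_box_down(lam, d):
                        assert ratio(lam, k, d) <= ratio(mu, k, d)   # Lemma 8.4.5
            best = max(klyachko_diagrams(s, d), key=lambda lam: ratio(lam, k, d))
            print(s, d, k, best)
```

Exact `Fraction` arithmetic keeps the comparisons free of rounding error. The case $k = d$ is skipped because every ratio then equals $1$.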
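Combined with Klyachko's theorem, this lemma allows us to compute $\alpha_k$. The Young diagram $\lambda$ present in $\mathfrak g^{(s)}$ and achieving the maximal value in (8.3) is the diagram with $s$ boxes of the form $\lambda_0 := (2, 1, \dots, 1)$. Indeed, every diagram with $s$ boxes other than $(1, \dots, 1)$ can be turned into $\lambda_0$ by moving boxes downwards. Moreover, $\lambda_0$ has $s-1$ rows and is therefore allowed, because we have assumed $s \le d$. Using Weyl's dimension formula, it is easy to compute $d_{\lambda_0}(k) = (s-1)\binom{k+1}{s}$.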
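The closed form for $d_{\lambda_0}(k)$ can be cross-checked against Weyl's dimension formula directly. The following minimal Python sketch uses only the standard library. `weyl_dim` and `hook_dim` are hypothetical helpers, repeated here so that the block is self-contained.

```python
from math import comb, prod

def weyl_dim(lam, n):
    """Dimension of the irreducible GL_n-module with highest weight lam (Weyl's formula)."""
    if len(lam) > n:
        return 0
    l = list(lam) + [0] * (n - len(lam))
    num = prod(l[i] - l[j] + j - i for i in range(n) for j in range(i + 1, n))
    den = prod(j - i for i in range(n) for j in range(i + 1, n))
    return num // den

def hook_dim(s, k):
    """Claimed closed form (s - 1) * binom(k + 1, s) for lambda_0 = (2, 1, ..., 1)."""
    return (s - 1) * comb(k + 1, s)

for s in range(2, 8):
    lam0 = (2,) + (1,) * (s - 2)
    for k in range(s - 1, s + 6):       # lambda_0 has s - 1 rows, so start at k = s - 1
        assert weyl_dim(lam0, k) == hook_dim(s, k)
print("closed form agrees with Weyl's formula on all tested (s, k)")
```

The same agreement also follows by hand from the hook-content formula. For the hook $(2, 1^{s-2})$, the contents give the numerator $(k+1)k(k-1)\cdots(k-s+2)$ and the hook lengths give $s\,(s-2)!$.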
When $k \ge d \ge s$, the relatively free Lie algebra $F_{k,\mathfrak g}$ coincides with the free nilpotent Lie algebra $F_{k,s}$ of step $s$. In particular, the growth exponent $\eta_G(k)$ is given by (8.2). Since $\beta_k = \alpha_k/\eta_G(k)$, this concludes the proof of Theorem 8.4.1.