1 Introduction

Given a linear program of the form

$$\begin{aligned} \max \;\varvec{c}^{\top }\varvec{x}\;:\;\varvec{A}\varvec{x}&=\varvec{b}\nonumber \\ \varvec{x}&\ge {\mathbf {0}}, \end{aligned}$$
(1)

where \(\varvec{A}\) is a full-row-rank \(m\times n\) integral matrix, \(\varvec{b}\in {\mathbb {Z}}^{m}\), and \(\varvec{c}\in {\mathbb {Z}}^{n}\), we seek to understand how far an optimal vertex \(\varvec{x}^{*}\) of the feasible region can be from the nearest feasible integral solution \(\varvec{z}^{*}\), assuming the feasible region contains at least one integral point. Typically it is further required that \(\varvec{z}^{*}\) is itself optimal; we do not impose this requirement in this manuscript. We refer to the smallest possible distance between \(\varvec{x}^{*}\) and a feasible integral solution \(\varvec{z}^{*}\) as the proximity of (1). This distance is measured in terms of some given norm, for example the \(\left\| \cdot \right\| _1\) or \(\left\| \cdot \right\| _{\infty }\) norms; in this paper we state our results in terms of the Euclidean norm \(\left\| \cdot \right\| _2\).

Bounds for proximity are typically given in terms of the largest possible absolute value \(\Delta _{m}(\varvec{A})\) of any \(m\times m\) subdeterminant of \(\varvec{A}\). Note that this parameter is within a factor of \(\binom{n}{m}\) of \(\Delta \left( \varvec{A}\right) :=\sqrt{\det (\varvec{A}\varvec{A}^{\top })}\). Finding such bounds is a well-studied problem which goes back to the classic Cook et al. result [7] bounding the proximity of the dual of (1). See, for instance, the recent works of Eisenbrand and Weismantel [8] and of Aliev et al. [2] and the references therein.
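
To make the relation between \(\Delta _{m}(\varvec{A})\) and \(\Delta (\varvec{A})\) concrete, the following minimal numerical sketch (our own illustration, assuming numpy; the example matrix is arbitrary) computes both quantities; by the Cauchy–Binet formula, \(\Delta (\varvec{A})^{2}\) equals the sum of \(\det (\varvec{A}_{\sigma })^{2}\) over all bases \(\sigma \).

```python
import itertools
import numpy as np

def delta_m(A):
    """Largest absolute value of an m x m subdeterminant of A."""
    m, n = A.shape
    return max(abs(np.linalg.det(A[:, list(s)]))
               for s in itertools.combinations(range(n), m))

def delta(A):
    """Delta(A) = sqrt(det(A A^T))."""
    return np.sqrt(np.linalg.det(A @ A.T))

A = np.array([[2, 1, 0, 3],
              [1, 0, 1, 1]], dtype=float)

# Cauchy-Binet gives Delta(A)^2 = sum of det(A_sigma)^2 over all bases sigma,
# so Delta_m(A) <= Delta(A) <= sqrt(binom(n, m)) * Delta_m(A).
print(delta_m(A), delta(A))
```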

In this manuscript, we would like to understand the worst-possible proximity, which we denote by \(\mathrm {dist}(\varvec{A})\), over all choices of \(\varvec{b}\) and \(\varvec{c}\), when the matrix \(\varvec{A}\) is chosen randomly. The model of randomness we consider is the following: we choose the matrix \(\varvec{A}\) up to left-multiplication by unimodular matrices, and we choose \(\varvec{A}\) uniformly at random subject to the condition that the greatest common divisor of the maximal minors of \(\varvec{A}\) is 1, and that \(\Delta (\varvec{A})\) is at most some sufficiently large (with respect to m and n) integer T. This is a natural model to study from a geometric point of view, since \(\Delta (\varvec{A})\) is the determinant of the lattice of integer points in the kernel of \(\varvec{A}\). This is also the model considered by Aliev and Henk [1], in their investigation of diagonal Frobenius numbers.

Our main result concerns not \(\mathrm {dist}(\varvec{A})\) but rather a related random variable we denote by \(\mathrm {dist}^{*}(\varvec{A})\). This is an asymptotic version of \(\mathrm {dist}(\varvec{A})\) that further imposes some mild restrictions on \(\varvec{b}\). Our main result is that it satisfies the following Markov-type inequality:

$$\begin{aligned} {\mathbf {P}}\left( \mathrm {dist}^{*}(\varvec{A})>t\Delta (\varvec{A})^{1/(n-m)}\right) \ll t^{-2/3}. \end{aligned}$$
(2)

Here \(\ll \) means less than, up to constants which depend only on n and m. In particular, this shows that proximity generally depends only on \(\Delta ^{1/(n-m)}\) in our random setting, for “almost all” choices of \(\varvec{b}\) in a certain precise sense. This is significantly better than the linear dependency on \(\Delta _{m}\) in the deterministic case, which is known to be tight [2, Theorem 1].

1.1 Related work

A similar result, with a slightly different random model, was obtained in [2] in the so-called knapsack scenario, where \(m=1\). In that work, a fixed integer T is given, and the matrix \(\varvec{A}\) is a row vector chosen uniformly at random from \(\left\{ 1,2,\ldots ,T\right\} ^{n}\) such that the greatest common divisor of the entries equals 1. A special case of [2, Theorem 2] states

$$\begin{aligned} {\mathbf {P}}\left( \mathrm {dist}(\varvec{A})>t\left\| \varvec{A}\right\| _{\infty }^{2/n}\right) \ll t^{-1}, \end{aligned}$$

where \(\mathrm {dist}\left( \varvec{A}\right) \) measures distance using the \(\left\| \cdot \right\| _{\infty }\) norm.

The recent work of Oertel et al. [14] considers a random model that allows \(\varvec{b}\) to vary but keeps \(\varvec{A}\) fixed. More precisely, for a given positive integer T, the vector \(\varvec{b}\) is chosen uniformly at random from \(\left\{ -T,\ldots ,T\right\} ^{m}\) such that \(\varvec{A}\varvec{x}=\varvec{b},\varvec{x}\ge {\mathbf {0}}\) is integer-feasible. The result in [14, Corollary 1.3] states that

$$\begin{aligned} \mathrm {dist}(\varvec{A})\le \left( m+1\right) \left( \sqrt{m}\left\| \varvec{A}\right\| _{\infty }\right) ^{m} \end{aligned}$$

with probability approaching 1 as \(T\rightarrow \infty \). Here again \(\mathrm {dist}\left( \varvec{A}\right) \) measures distance using the \(\left\| \cdot \right\| _{\infty }\) norm. Note that this bound does not depend on n.

Finally, we mention the very recent work of Borst et al. [5] which investigates the integrality gap of integer programs of the form

$$\begin{aligned} \max \;\varvec{c}^{\top }\varvec{x}\;:\qquad \varvec{A}\varvec{x}&\le \varvec{b}\\ {\mathbf {0}}\le \varvec{x}&\le {\mathbf {1}}\nonumber \\ \varvec{x}&\in {\mathbb {Z}}^{n},\nonumber \end{aligned}$$
(3)

with \(\varvec{A}\) and \(\varvec{c}\) having independent, Gaussian N(0, 1) entries. This quantity measures the difference between the optimal value of (3) and that of its linear relaxation. Their result is that the integrality gap is bounded from above by \(\mathrm {poly}\left( m\right) \left( \log n\right) ^{2}/n\) with probability at least \(1-n^{-7}-2^{-\mathrm {poly}\left( m\right) }\), subject to certain conditions on \(\varvec{b}\). See [5] and the references therein for a history of this problem.

1.2 Outline of proof

The proof of our result combines ideas of [1, 2] with facts from the geometry of numbers, results of Schmidt [15] on random sublattices of \({\mathbb {Z}}^{n}\) of fixed dimension, and computations of the measure of certain distinguished regions of the real Grassmannian \(\mathrm {Gr}(d,n)\) of d-dimensional subspaces of \({\mathbb {R}}^{n}\), where \(d=n-m\). The crucial parameters from the geometry of numbers are, for us, the covering radius \(\mu \) and the successive minima \(\lambda _{1},\ldots ,\lambda _{d}\) of \(\ker \varvec{A}\cap B_{2}^{n}\) with respect to the lattice \(\ker \varvec{A}\cap {\mathbb {Z}}^{n}\), where \(B_{2}^{n}\) denotes the unit-radius Euclidean ball in \({\mathbb {R}}^{n}\). Further details on these parameters can be found in Sect. 3.

The restrictions imposed by the definition of \(\mathrm {dist}^{*}\left( \varvec{A}\right) \) on the right hand side \(\varvec{b}\) ensure that, given a vertex \(\varvec{x}^{*}\) of the feasible region of (1), one can always find a feasible integral solution \(\varvec{z}^{*}\) such that

$$\begin{aligned} \left\| \varvec{x}^{*}-\varvec{z}^{*}\right\| _{2}\le \mu \left( \left\| \varvec{A}_{\sigma }^{-1}\varvec{A}\right\| _{1}+1\right) , \end{aligned}$$

where \(\varvec{x}^{*}\) has support contained in \(\sigma \subseteq \left[ n\right] \) and \(\varvec{A}_{\sigma }\) denotes the square submatrix of \(\varvec{A}\) whose columns are indexed by \(\sigma \). This restriction on \(\varvec{b}\) amounts to picking \(\varvec{b}\) sufficiently deep inside the cone spanned by the columns of \(\varvec{A}_{\sigma }\), or choosing \(\varvec{b}\) from a reduced cone in the sense of Gomory [11, p. 261]. A uniform upper bound on all ratios \(\lambda _{i+1}/\lambda _{i},\;i=1,2,\ldots ,d-1\) implies an upper bound on \(\mu \), see Lemma 2. Meanwhile, Sect. 4 shows that the measure in \(\mathrm {Gr}(d,n)\) of those subspaces \(\ker \varvec{A}\in \mathrm {Gr}(d,n)\) such that any given entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}\) exceeds in absolute value some fixed parameter \(s>0\) is a function of the order \(s^{-1}\). Theorem 2, itself a straightforward corollary of results of [15], combines these two pieces together: a random lattice of the form \(\ker \varvec{A}\cap {\mathbb {Z}}^{n}\) is unlikely to have any ratio \(\lambda _{i+1}/\lambda _{i}\), nor any entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}\), exceedingly large. The details of this are carried out in Sect. 5.

We remark that the exponent of \(-2/3\) is mainly an artifact of the proof, and we expect that it can be further improved. The problem of finding an inequality analogous to (2) for \(\mathrm {dist}(\varvec{A})\) is more challenging and remains open. When we allow \(\varvec{b}\) to lie close to the boundary of the cone spanned by the columns of \(\varvec{A}_{\sigma }\), our arguments no longer apply.

Remark 1

(Changes from proceedings version)  The following changes have been made since the proceedings version of this manuscript [6]. In Sect. 3 we clarified and expanded upon the geometry of numbers theory that is used in this paper. In Sect. 4 we gave a proof of the claim that a particular subset of \(\mathrm {Gr}(d,n)\) is Jordan measurable. Some minor typos have also been corrected, and some minor changes have been made to the introduction.

2 Main result and notation

2.1 Notation

Throughout this manuscript we assume fixed positive integers d, m, n such that \(n=m+d\). For a subset \(\sigma \subseteq [n]\) and \(\varvec{x}\in {\mathbb {R}}^{n}\), we let \(\varvec{x}_{\sigma }\) denote the vector obtained by orthogonally projecting \(\varvec{x}\) onto the coordinates indexed by \(\sigma \). Similarly, if \(\varvec{A}\) is a matrix, then we denote by \(\varvec{A}_{\sigma }\) the submatrix of \(\varvec{A}\) whose columns are those indexed by \(\sigma \). In particular, if \(k\in [n]\) then \(\varvec{A}_{k}\) denotes the corresponding column of \(\varvec{A}\). If \(\varvec{A}_{\sigma }\) is an invertible square matrix we say \(\sigma \) is a basis of \(\varvec{A}\). We denote the complement of \(\sigma \) by \({\bar{\sigma }}:=[n]\backslash \sigma \). Given a d-dimensional subspace \(L\subseteq {\mathbb {R}}^{n}\), the m-dimensional orthogonal complement of L is denoted by \(L^{\perp }\). If \(\Lambda \subset {\mathbb {R}}^{n}\), let \(\Lambda _{{\mathbb {R}}}\) denote the linear subspace of \({\mathbb {R}}^{n}\) spanned by \(\Lambda \). We say \(\sigma \subseteq [n]\) is a coordinate basis of \(\Lambda \) or \(\Lambda _{{\mathbb {R}}}\) if the coordinate projection map

$$\begin{aligned} \Lambda _{{\mathbb {R}}}&\rightarrow {\mathbb {R}}^{\sigma }\\ \varvec{x}&\mapsto \varvec{x}_{\sigma } \end{aligned}$$

is an isomorphism. This is equivalent to saying that \(\sigma \) is a basis of \(\varvec{A}\) for any full-row-rank matrix \(\varvec{A}\) such that \(\ker (\varvec{A})=\Lambda _{{\mathbb {R}}}\). Finally, we denote the group of \(n\times n\) orthogonal real matrices by O(n). This notation presents a conflict with “big-O” asymptotic notation, so we write \({\mathcal {O}}(n)\) for the latter.

2.2 Definition of \(\mathrm {dist}(\varvec{A})\)

Let \(\varvec{A}\in {\mathbb {Z}}^{m\times n}\) be a full-row-rank matrix. For a basis \(\sigma \) of \(\varvec{A}\), we define the semigroup

$$\begin{aligned} {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) :=\left\{ \varvec{x}\ge {\mathbf {0}}:\varvec{x}_{{\bar{\sigma }}}={\mathbf {0}},\,\varvec{x}_{\sigma }=\varvec{A}_{\sigma }^{-1}\varvec{A}{\mathbf {g}}\text { for some }{\mathbf {g}}\in {\mathbb {Z}}^{n}_{\geqslant 0}\right\} . \end{aligned}$$
(4)

For a vector \(\varvec{b}\in {\mathbb {Z}}^{m}\), we define the polyhedron

$$\begin{aligned} {\mathcal {P}}(\varvec{A},\varvec{b}):=\left\{ \varvec{x}\in {\mathbb {R}}^{n}:\varvec{A}\varvec{x}=\varvec{b},\;\varvec{x}\ge {\mathbf {0}}\right\} . \end{aligned}$$

The idea behind these definitions is that if \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \), then \(\varvec{b}:=\varvec{A}\varvec{x}^{*}\) is an integral vector, \({\mathcal {P}}(\varvec{A},\varvec{b})\) is a polyhedron containing at least one integral point, and \(\varvec{x}^{*}\) is the vertex of \({\mathcal {P}}(\varvec{A},\varvec{b})\) associated to the basis \(\sigma \). Now given a basis \(\sigma \) of \(\varvec{A}\) and \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \), we define the distance

$$\begin{aligned} \mathrm {dist}\left( \varvec{A},\sigma ,\varvec{x}^{*}\right) :=\min _{\varvec{z}^{*}\in {\mathbb {Z}}^{n}\cap {\mathcal {P}}(\varvec{A},\varvec{b})}\left\| \varvec{x}^{*}-\varvec{z}^{*}\right\| _{2}, \end{aligned}$$

where \(\varvec{b}:=\varvec{A}\varvec{x}^{*}\). We then define the worst-case distance over all choices of bases \(\sigma \) of \(\varvec{A}\) and elements \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \) as

$$\begin{aligned} \mathrm {dist}\left( \varvec{A}\right) :=\max _{\sigma }\,\max _{\varvec{x}^{*}}\,\mathrm {dist}\left( \varvec{A},\sigma ,\varvec{x}^{*}\right) . \end{aligned}$$
(5)

This definition has the disadvantage that it is stated in terms of the matrix \(\varvec{A}\). Since we may replace \(\varvec{A}\varvec{x}=\varvec{b}\) with \(\varvec{U}\varvec{A}\varvec{x}=\varvec{U}\varvec{b}\) for any \(m\times m\) integral matrix \(\varvec{U}\), it is not so clear from this formulation how to define our random model. This motivates an alternative, more geometric definition of \(\mathrm {dist}(\varvec{A})\) which we now state.
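
To ground definition (5), here is a small brute-force sketch (our own, on a hypothetical knapsack instance, assuming numpy; it is not the method used in this paper): it builds the vertex \(\varvec{x}^{*}\) associated to a basis \(\sigma \) from an integral \({\mathbf {g}}\ge {\mathbf {0}}\), and searches a box for the nearest feasible integral point.

```python
import itertools
import numpy as np

def vertex_and_nearest_integer(A, sigma, g, box=6):
    """Given a basis sigma of A and an integral g >= 0, form the vertex x*
    with support in sigma (so b = A g is integral and P(A, b) contains g),
    then brute-force the closest feasible integral point z* in a small box."""
    m, n = A.shape
    b = A @ g
    x = np.zeros(n)
    x[sigma] = np.linalg.solve(A[:, sigma], b)   # x*_sigma = A_sigma^{-1} b
    best = None
    for z in itertools.product(range(box), repeat=n):   # z >= 0 integral
        z = np.array(z)
        if np.array_equal(A @ z, b):
            d = np.linalg.norm(x - z)
            best = d if best is None or d < best else best
    return x, best

A = np.array([[3, 1, 2]])        # m = 1 knapsack row
x_star, dist = vertex_and_nearest_integer(A, sigma=[0], g=np.array([1, 2, 1]))
print(x_star, dist)              # x* = (7/3, 0, 0); nearest z* is (2, 1, 0)
```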

2.3 Definition of \(\mathrm {dist}(\Lambda )\)

Suppose instead we start with a d-dimensional sublattice \(\Lambda \) of \({\mathbb {Z}}^{n}\). Suppose \(\sigma \) is a coordinate basis of \(\Lambda \). Then we may define the semigroup

$$\begin{aligned} {\mathcal {S}}_{\sigma }\left( \Lambda \right) :=\left\{ \varvec{x}\ge {\mathbf {0}}:\varvec{x}_{{\bar{\sigma }}}={\mathbf {0}},\,\varvec{x}\in \Lambda _{{\mathbb {R}}}+{\mathbf {g}}\text { for some }{\mathbf {g}}\in {\mathbb {Z}}^{n}_{\geqslant 0}\right\} . \end{aligned}$$
(6)

For \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \), define the distance

$$\begin{aligned} \mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) :=\max _{{\mathbf {g}}\in \left( \Lambda _{{\mathbb {R}}}+\varvec{x}^{*}\right) \cap {\mathbb {Z}}^{n}}\,\min _{\varvec{z}^{*}\in \left( \Lambda +{\mathbf {g}}\right) \cap {\mathbb {R}}_{\geqslant 0}^{n}}\left\| \varvec{x}^{*}-\varvec{z}^{*}\right\| _{2}. \end{aligned}$$
(7)

The extra maximum accounts for the fact that, if \(\Lambda \) is not primitive, then there are multiple ways to embed \(\Lambda \) into \(\Lambda _{{\mathbb {R}}}+\varvec{x}^{*}\) as an integral translate of \(\Lambda \). Finally, define the worst case distance

$$\begin{aligned} \mathrm {dist}\left( \Lambda \right) :=\max _{\sigma }\,\max _{\varvec{x}^{*}}\,\mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) , \end{aligned}$$
(8)

where the maximum is taken over all coordinate bases of \(\Lambda \) and elements \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \).

We now explain the relationship between definitions (5) and (8). First note that if \(\varvec{A}\) is any integral matrix such that \(\Lambda _{{\mathbb {R}}}=\ker (\varvec{A})\), then the two definitions (4) and (6) of \({\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \) and \({\mathcal {S}}_{\sigma }\left( \Lambda \right) \) coincide. Moreover, if \(\Lambda \) is a primitive lattice, that is, if \(\Lambda =\Lambda _{{\mathbb {R}}}\cap {\mathbb {Z}}^{n}\), then we have

$$\begin{aligned} \mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) =\mathrm {dist}\left( \varvec{A},\sigma ,\varvec{x}^{*}\right) \end{aligned}$$

and therefore

$$\begin{aligned} \mathrm {dist}\left( \Lambda \right) =\mathrm {dist}\left( \varvec{A}\right) . \end{aligned}$$

Definition (8) also makes sense when \(\Lambda \) is non-primitive, however, and it is immediate from the definitions that in general,

$$\begin{aligned} \mathrm {dist}\left( \Lambda \right) \ge \mathrm {dist}\left( \Lambda _{{\mathbb {R}}}\cap {\mathbb {Z}}^{n}\right) . \end{aligned}$$

The key advantage of definition (8) is that there are only finitely many d-dimensional sublattices of \({\mathbb {Z}}^{n}\) whose determinant is at most some fixed positive integer T. Thus, we may consider the uniform distribution over these bounded-determinant lattices.

2.4 An asymptotic version of \(\mathrm {dist}(\Lambda )\)

We next consider a modification of \(\mathrm {dist}\left( \Lambda \right) \). Choose any full-row-rank matrix \(\varvec{A}\) such that \(\ker (\varvec{A})=\Lambda _{{\mathbb {R}}}\); the particular choice of \(\varvec{A}\) is not important. Let \(B_{2}^{n}\subset {\mathbb {R}}^{n}\) denote the n-dimensional Euclidean ball of radius 1.

Define the vector \(\varvec{w}=\varvec{w}\left( \Lambda _{{\mathbb {R}}}\right) \in {\mathbb {R}}^{n}\) so that, for each \(i\in [n]\),

$$\begin{aligned} \varvec{w}_{i}:=\max _{\varvec{x}\in B_{2}^{n}\cap \Lambda _{{\mathbb {R}}}}\varvec{x}_{i}. \end{aligned}$$

Denote by \(\mu =\mu \left( \Lambda \right) \) the covering radius of \(B_{2}^{n}\) with respect to \(\Lambda \). That is,

$$\begin{aligned} \mu :=\inf \left\{ t>0:\Lambda +tB_{2}^{n}\;\text {contains}\;\Lambda _{{\mathbb {R}}}\right\} . \end{aligned}$$
(9)

For more information on the covering radius we refer to Sect. 3. If \(\sigma \) is a basis of \(\varvec{A}\) then define the following subsemigroup of \({\mathcal {S}}_{\sigma }\left( \Lambda \right) \):

$$\begin{aligned} {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) :=\left\{ \varvec{x}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) :\varvec{x}_{\sigma }\ge \mu \left( \varvec{w}_{\sigma }+\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\varvec{w}_{{\bar{\sigma }}}\right) \right\} . \end{aligned}$$

The next proposition shows that if we further restrict \(\varvec{x}^{*}\) so that it can only lie in \({\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \), then we can guarantee that \({\mathcal {P}}(\varvec{A},\varvec{b})\) contains an integral point reasonably close to \(\varvec{x}^{*}\). We prove it in Sect. 5.

Proposition 1

For a basis \(\sigma \) of \(\varvec{A}\) and \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \), let \(\varvec{b}=\varvec{A}\varvec{x}^{*}\). Then \({\mathcal {P}}(\varvec{A},\varvec{b})\) contains a translate of the scaled ball \(\mu \cdot \left( B_{2}^{n}\cap \Lambda _{{\mathbb {R}}}\right) \), which in turn contains an integral vector.

Now set

$$\begin{aligned} \mathrm {dist}^{*}\left( \Lambda \right) :=\max _{\sigma }\,\max _{\varvec{x}^{*}}\,\mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) , \end{aligned}$$
(10)

where the maximum is taken over all bases \(\sigma \) of \(\varvec{A}\) and elements \(\varvec{x}^{*}\) of the semigroup \({\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \).

2.5 Main result

We are now ready to state the main theorem.

Theorem 1

For \(T\gg 1\), let \(\Lambda \) be a sublattice of \({\mathbb {Z}}^{n}\) of dimension d and determinant at most T, chosen uniformly at random. Then for all \(t>1\),

$$\begin{aligned} {\mathbf {P}}\left( \mathrm {dist}^{*}\left( \Lambda \right) >t\left( \Delta \left( \Lambda \right) \right) ^{1/d}\right) \ll t^{-2/3}. \end{aligned}$$

What we would like to do is translate this statement into a statement about integer programs, and in particular derive inequality (2). For this we use a known result on the ratio between the number of primitive sublattices and the number of all sublattices with a fixed determinant upper bound, a consequence of Theorems 1 and 2 in [15]:

Lemma 1

Suppose there are exactly \(N(d,n,T)\) d-dimensional sublattices of \({\mathbb {Z}}^{n}\) with determinant at most T, of which exactly \(P(d,n,T)\) are primitive. Then

$$\begin{aligned} \lim _{T\rightarrow \infty }\frac{P(d,n,T)}{N(d,n,T)}=\frac{1}{\zeta (n-d+1)\cdots \zeta (n)}, \end{aligned}$$

where \(\zeta (\cdot )\) denotes the Riemann zeta function.
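
As a quick Monte Carlo sanity check of Lemma 1 in the simplest case \(d=1\), \(n=2\) (our own sketch; the parameters are arbitrary): a one-dimensional sublattice is \({\mathbb {Z}}\varvec{v}\) with determinant \(\left\| \varvec{v}\right\| _{2}\), and it is primitive exactly when the entries of \(\varvec{v}\) are coprime.

```python
import math
import random

def primitive_fraction(n=2, T=400, trials=20000):
    """Monte Carlo for d = 1: a one-dimensional sublattice of Z^n is Z*v
    with determinant ||v||_2, primitive iff the gcd of the entries is 1."""
    prim = total = 0
    while total < trials:
        v = [random.randint(-T, T) for _ in range(n)]
        if any(v) and sum(x * x for x in v) <= T * T:   # det(Z*v) <= T
            total += 1
            prim += math.gcd(*v) == 1
    return prim / total

# For d = 1, n = 2 the limit in Lemma 1 is 1/zeta(2) = 6/pi^2, about 0.6079.
print(primitive_fraction())
```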

Recall from the introduction our probability model. We start with a sufficiently large integer T relative to m and n, and consider the set of all \(m\times n\) integral matrices \(\varvec{A}\) such that the greatest common divisor of all maximal minors of \(\varvec{A}\) equals 1, and that \(\Delta \left( \varvec{A}\right) \le T\). The group of \(m\times m\) unimodular matrices acts on this set of matrices by multiplication on the left, and there are finitely many orbits of this action. We consider the uniform distribution on these orbits. We define

$$\begin{aligned} \mathrm {dist}^{*}\left( \varvec{A}\right) :=\mathrm {dist}^{*}\left( \ker \left( \varvec{A}\right) \cap {\mathbb {Z}}^{n}\right) . \end{aligned}$$

Note that this definition depends not on \(\varvec{A}\) but only on the orbit of \(\varvec{A}\). The greatest common divisor condition ensures that \(\Delta \left( \varvec{A}\right) \) equals the determinant of the lattice \(\ker \left( \varvec{A}\right) \cap {\mathbb {Z}}^{n}\). Recall we set \(d:=n-m\). We derive the next corollary by combining Theorem 1, Lemma 1, and the simple conditional probability inequality \({\mathbf {P}}(E\mid F)\le {\mathbf {P}}(E)/{\mathbf {P}}(F)\), where E is the event that \(\mathrm {dist}^{*}\left( \Lambda \right) >t\left( \Delta \left( \Lambda \right) \right) ^{1/d}\) and F is the event that \(\Lambda \) is primitive.

Corollary 1

For \(T\gg 1\), choose \(\varvec{A}\) randomly as above, with determinant at most T. Then for all \(t>1\),

$$\begin{aligned} {\mathbf {P}}\left( \mathrm {dist}^{*}\left( \varvec{A}\right) >t\left( \Delta \left( \varvec{A}\right) \right) ^{1/d}\right) \ll t^{-2/3}. \end{aligned}$$

We remark that the question of deriving the constants in this bound remains unexplored.

3 Geometry of Numbers and a theorem of Schmidt

Next we state some basic functionals and tools from Geometry of Numbers as well as a theorem of Schmidt which are fundamental for the proof of our results. An excellent reference for the Geometry of Numbers tools is Gruber’s book [12, Chapters 21–23]. We start with Minkowski’s successive minima. Given a d-dimensional lattice \(\Lambda \subset {\mathbb {R}}^{d}\), the ith successive minimum \(\lambda _{i}(\Lambda )\), \(i\in \{1,\dots ,d\}\), is defined as

$$\begin{aligned} \lambda _{i}(\Lambda ):=\min \{\lambda >0:\dim (\lambda \,B_{2}^{d}\cap \Lambda )\ge i\}. \end{aligned}$$

In other words, \(\lambda _{i}(\Lambda )\) is the smallest dilation factor \(\lambda \) such that the Euclidean ball of radius \(\lambda \) contains at least i linearly independent lattice points of \(\Lambda \). Observe that

$$\begin{aligned} \lambda _{1}(\Lambda )\le \lambda _{2}(\Lambda )\le \cdots \le \lambda _{d}(\Lambda ). \end{aligned}$$

Minkowski introduced the successive minima not only for a ball but for an arbitrary convex body symmetric about the origin, but here we just need them for the ball. In this particular setting, Minkowski's so-called second theorem on successive minima reads as follows:

$$\begin{aligned} \lambda _{1}(\Lambda )\cdots \lambda _{d}(\Lambda )\,\omega _{d}\le 2^{d}\det \Lambda , \end{aligned}$$
(11)

where \(\omega _{d}\) is the d-dimensional volume of the ball \(B_{2}^{d}\). For \(d>1\), inequality (11) is actually strict, and one can improve on the factor \(2^{d}\) on the right hand side; for our purposes, however, (11) is enough. The other functional we need from Geometry of Numbers is the already introduced covering radius \(\mu (\Lambda )\) (see (9)), which may also be defined as

$$\begin{aligned} \mu (\Lambda )=\min \{\mu >0:(\varvec{y}+\mu \,B_{2}^{d})\cap \Lambda \ne \emptyset \text { for all }\varvec{y}\in {\mathbb {R}}^{d}\}. \end{aligned}$$

The so called Jarnik’s inequalities show that the covering radius is essentially of the size of the last succesive minimum

$$\begin{aligned} \frac{1}{2}\lambda _{d}(\Lambda )\le \mu (\Lambda )\le \frac{1}{2}\left( \lambda _{1}(\Lambda )+\cdots +\lambda _{d}(\Lambda )\right) . \end{aligned}$$
(12)
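
The following small sketch (ours, assuming numpy) checks (11) and (12) on the rectangular lattice \({\mathbb {Z}}\times 5{\mathbb {Z}}\), whose covering radius is half the diagonal of its fundamental cell:

```python
import itertools
import math
import numpy as np

def successive_minima_2d(B, radius=8):
    """Brute-force lambda_1, lambda_2 of a 2-dimensional lattice with basis
    matrix B (columns), enumerating lattice points in a coefficient box."""
    pts = [B @ np.array(c)
           for c in itertools.product(range(-radius, radius + 1), repeat=2)
           if c != (0, 0)]
    pts.sort(key=np.linalg.norm)
    l1 = np.linalg.norm(pts[0])
    # first lattice point linearly independent from the shortest one
    l2 = next(np.linalg.norm(p) for p in pts
              if abs(np.linalg.det(np.column_stack([pts[0], p]))) > 1e-9)
    return l1, l2

B = np.array([[1.0, 0.0], [0.0, 5.0]])      # the lattice Z x 5Z
l1, l2 = successive_minima_2d(B)
det, omega2 = abs(np.linalg.det(B)), math.pi
print(l1 * l2 * omega2 <= 2 ** 2 * det)     # Minkowski (11): 5*pi <= 20
mu = math.hypot(1.0, 5.0) / 2               # deep hole at the cell center
print(l2 / 2 <= mu <= (l1 + l2) / 2)        # Jarnik (12)
```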

Now, in general the successive minima can take arbitrary values, even for d-dimensional sublattices of \({\mathbb {Z}}^{n}\). A fundamental result of Schmidt [15] states, however, that for a “typical” primitive sublattice of \({\mathbb {Z}}^{n}\) the ratios \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) are not “too” large. So one may expect that all the successive minima are more or less of the same size, which then allows us to give a “good” bound on \(\mu (\Lambda )\) via (11) and (12). But first we need a few more definitions in order to state Schmidt's result.

We continue with our assumption that \(d=n-m\). Let \(\mathrm {Gr}\left( d,n\right) \) denote the set of d-dimensional subspaces of \({\mathbb {R}}^{n}\). Let \(\nu \) denote the unique O(n)-invariant probability measure on the real Grassmannian \(\mathrm {Gr}\left( d,n\right) \) (see, e.g., [3, Section 3.3]).

Definition 1

([15, p. 40]) A subset \(\xi \subset \mathrm {Gr}\left( d,n\right) \) is Jordan measurable if for all \(\varepsilon >0\) there exist continuous functions \(f_{1}\le {\mathbf {1}}_{\xi }\le f_{2}\) such that

$$\begin{aligned} \int \left( f_{2}-f_{1}\right) d\nu <\varepsilon . \end{aligned}$$

Here \({\mathbf {1}}_{\xi }\) denotes the indicator function of \(\xi \).

In the next definition we define the set \(G\left( \varvec{a},\xi ,T\right) \) of lattices we are interested in: they are sublattices of \({\mathbb {Z}}^{n}\) of determinant at most T, their span \(\Lambda _{{\mathbb {R}}}\) is contained in a given subset \(\xi \subseteq \mathrm {Gr}\left( d,n\right) \), and the ratios \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) are at least as large as the ith entry of the given vector \(\varvec{a}\). More formally,

Definition 2

Let \(\varvec{a}=\left( a_{1},\ldots ,a_{d}\right) \in {\mathbb {R}}^{d}\), with each \(a_{i}\ge 1\). Let T be a positive integer, and let \(\xi \subset \mathrm {Gr}\left( d,n\right) \). Then we define \(G\left( \varvec{a},\xi ,T\right) \) to be the set of sublattices \(\Lambda \) of \({\mathbb {Z}}^{n}\) of dimension d with determinant at most T, such that

$$\begin{aligned} \frac{\lambda _{i+1}\left( \Lambda \right) }{\lambda _{i}\left( \Lambda \right) }\ge a_{i}\text { for all }\,i=1,2,\ldots ,d-1, \end{aligned}$$

and \(\Lambda _{{\mathbb {R}}}\in \xi \).

The result of Schmidt that we intend to use is a combination of Theorems 3 and 5 in [15]:

Theorem 2

Assuming \(\xi \subset \mathrm {Gr}\left( d,n\right) \) is Jordan measurable, we have

$$\begin{aligned} \left| G\left( \varvec{a},\xi ,T\right) \right| \asymp \left( \prod _{i=1}^{d-1}a_{i}^{-i\left( d-i\right) }\right) \nu \left( \xi \right) T^{n}, \end{aligned}$$

where \(f\asymp g\) means \(f\ll g\) and \(g\ll f\).

Roughly speaking, the number of lattices having large successive minima ratios is small. In order to formalize this, let \(G(d,n,T)\) denote the set of all sublattices of \({\mathbb {Z}}^{n}\) of dimension d with determinant at most T. Let \({\mathbf {P}}={\mathbf {P}}_{d,n,T}\) denote the uniform probability distribution over \(G(d,n,T)\).

Corollary 2

For \(t>1\), we have

$$\begin{aligned} {\mathbf {P}}\left( \max _{i\in \left[ d-1\right] }\left\{ \frac{\lambda _{i+1}\left( \Lambda \right) }{\lambda _{i}\left( \Lambda \right) }\right\} \ge t\right) \ll \left( d-1\right) t^{-\left( d-1\right) }. \end{aligned}$$

Proof

Following Aliev and Henk [1], let

$$\begin{aligned} \varvec{\delta }_{i}(t):=\left( 1,\ldots ,1,\underset{i}{t},1,\ldots ,1\right) ^{\top }\!\in {\mathbb {R}}^{d}. \end{aligned}$$

By the union bound and Theorem 2, this probability is at most

$$\begin{aligned} \sum _{i=1}^{d-1}\frac{\left| G\left( \varvec{\delta }_{i}(t),\mathrm {Gr}\left( d,n\right) ,T\right) \right| }{\left| G\left( \varvec{\delta }_{i}(1),\mathrm {Gr}\left( d,n\right) ,T\right) \right| }\ll \sum _{i=1}^{d-1}t^{-i\left( d-i\right) }\le \left( d-1\right) t^{-\left( d-1\right) }. \end{aligned}$$

\(\square \)

Finally, we present the already mentioned upper bound on \(\mu (\Lambda )\), provided we know that the ratios \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) are bounded. The argument is implicitly contained in the proof of Lemma 5.1 in [1].

Lemma 2

Let \(\Lambda \subset {\mathbb {R}}^{d}\) be a lattice, and let \(u>0\) be such that for all \(1\le i\le d-1\)

$$\begin{aligned} \frac{\lambda _{i+1}(\Lambda )}{\lambda _{i}(\Lambda )}<\left( u\frac{\omega _{d}^{1/d}}{d}\right) ^{\frac{2}{d-1}}. \end{aligned}$$

Then

$$\begin{aligned} \mu (\Lambda )\le u\,(\det \Lambda )^{\frac{1}{d}}. \end{aligned}$$

Proof

For brevity we set \(r:=\left( u\omega _{d}^{1/d}/d\right) ^{\frac{2}{d-1}}\). Due to our assumption we get a lower bound on each successive minimum \(\lambda _{i}(\Lambda )\), \(i=1,\dots ,d-1\), in terms of the last successive minimum:

$$\begin{aligned} \lambda _{d}(\Lambda )\le r^{d-i}\lambda _{i}(\Lambda ). \end{aligned}$$

Combined with Minkowski’s inequality (11) we obtain

$$\begin{aligned} \lambda _{d}(\Lambda )^{d}r^{-d(d-1)/2}\le \lambda _{1}(\Lambda )\cdots \lambda _{d}(\Lambda )\le \frac{2^{d}}{\omega _{d}}\det \Lambda . \end{aligned}$$

Hence,

$$\begin{aligned} \lambda _{d}(\Lambda )\le \left( u\frac{\omega _{d}^{1/d}}{d}\right) \frac{2}{\omega _{d}^{1/d}}(\det \Lambda )^{\frac{1}{d}}=\frac{2}{d}u(\det \Lambda )^{\frac{1}{d}}, \end{aligned}$$

and Jarnik’s inequality (12) yields the assertion. \(\square \)

4 Typical Cramer’s rule ratios

We see in the next section that the proximity can be bounded from above by an expression involving the largest absolute value of the entries of \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\), as \(\sigma \) ranges over all bases of \(\varvec{A}\), and \(\varvec{A}\) is chosen randomly. Hence, we would like to show that the largest absolute value of any entry of the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) is typically not too large, where for our purposes the subspace \(L:=\ker \varvec{A}\) is chosen uniformly at random from \(\mathrm {Gr}\left( d,n\right) \). Note that the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) depends only on L and \(\sigma \). We remark that the entries of the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) are explicitly computed using Cramer's rule: for \(i\in \sigma \) and \(j\notin \sigma \), writing \(\varvec{A}_{\sigma -i+j}\) for the matrix obtained from \(\varvec{A}_{\sigma }\) by replacing the column \(\varvec{A}_{i}\) with \(\varvec{A}_{j}\), we have

$$\begin{aligned} \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}=\frac{\det \left( \varvec{A}_{\sigma -i+j}\right) }{\det \left( \varvec{A}_{\sigma }\right) }. \end{aligned}$$
(13)
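
The identity (13) is easy to verify numerically; the following sketch (ours, assuming numpy) realizes \(\varvec{A}_{\sigma -i+j}\) by replacing the column of \(\varvec{A}_{\sigma }\) corresponding to i with \(\varvec{A}_{j}\) in place:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))             # almost surely full row rank

sigma = [0, 1, 2]
i, j = 1, 4
lhs = np.linalg.solve(A[:, sigma], A[:, j])[sigma.index(i)]

A_swap = A[:, sigma].copy()
A_swap[:, sigma.index(i)] = A[:, j]         # realizes A_{sigma - i + j}
rhs = np.linalg.det(A_swap) / np.linalg.det(A[:, sigma])

print(np.isclose(lhs, rhs))                 # Cramer's rule (13): True
```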

As before, we let \(\nu :{\mathscr {G}}\rightarrow [0,1]\) denote the O(n)-invariant probability measure on \(\mathrm {Gr}\left( d,n\right) \). The precise statement we show is the following: Fix \(\sigma \subseteq \left[ n\right] \) of size m, \(i\in \sigma \), and \(j\in \left[ n\right] \backslash \sigma \). Then, as a function of a parameter \(s>1\), we have

$$\begin{aligned} \nu \left( \ker \left( \varvec{A}\right) :\varvec{A}_{\sigma }\text { is nonsingular},\,\left| \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\right| >s\right) =\frac{2}{\pi s}+{\mathcal {O}}\left( s^{-3}\right) . \end{aligned}$$
(14)

The proof proceeds in the three subsections below. First, we get a handle on \(\nu \) by relating it to another probability distribution, namely the Gaussian distribution \(\gamma \) on the matrix space \({\mathbb {R}}^{m\times n}\), where the entries are i.i.d. normally distributed with mean 0 and variance 1. This is done via the kernel map, which is introduced in Sect. 4.1 and related to \(\gamma \) in Sect. 4.2. Equation (14) is then derived in Sect. 4.3.
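
Before carrying this out, here is a Monte Carlo sketch of (14) in the Gaussian model of Sect. 4.2 (our own illustration, assuming numpy; the choices of m, n, and s are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def cramer_ratio_tail(m=2, n=5, s=4.0, trials=100_000):
    """Empirical nu(xi_{sigma,i,j}(s)) in the Gaussian model of Sect. 4.2:
    sample A with iid N(0,1) entries and test |(A_sigma^{-1} A_j)_i| > s,
    with sigma = {0,...,m-1}, i = 0, j = m."""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        ratio = np.linalg.solve(A[:, :m], A[:, m])[0]   # (A_sigma^{-1} A_j)_i
        hits += abs(ratio) > s
    return hits / trials

# Proposition 6 predicts 2/(pi*s) + O(s^-3); the exact Cauchy tail at s = 4
# is 1 - (2/pi)*arctan(4), roughly 0.156.
print(cramer_ratio_tail())
```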

4.1 The real Grassmannian

For a general introduction to matrix groups and Grassmannians, we refer the reader to [4]. There is a right action of the orthogonal group O(n) on \(\mathrm {Gr}\left( d,n\right) \) defined as follows: if \(\ker \left( \varvec{A}\right) \in \mathrm {Gr}\left( d,n\right) \), where \(\varvec{A}\in {\mathbb {R}}^{m\times n}\), then

$$\begin{aligned} \left( \ker \left( \varvec{A}\right) \right) \cdot \varvec{U}=\ker \left( \varvec{A}\varvec{U}\right) . \end{aligned}$$
(15)

This is well-defined, since if \(\ker \left( \varvec{A}\right) =\ker \left( \varvec{A}'\right) \) for some \(\varvec{A}'\in {\mathbb {R}}^{m\times n}\), then \(\varvec{A}=\varvec{D}\varvec{A}'\) for some invertible \(m\times m\) matrix \(\varvec{D}\), and hence

$$\begin{aligned} \ker \left( \varvec{A}\varvec{U}\right) =\ker \left( \varvec{D}\varvec{A}'\varvec{U}\right) =\ker \left( \varvec{A}'\varvec{U}\right) . \end{aligned}$$

Let \(\mathrm {St}^{m\times n}:=\left\{ \varvec{A}\in {\mathbb {R}}^{m\times n}:\mathrm {rank}(\varvec{A})=m\right\} \). Call this the Stiefel manifold. Again, there is a right action of O(n) on \(\mathrm {St}^{m\times n}\) which in this case is simply right multiplication:

$$\begin{aligned} \varvec{A}\cdot \varvec{U}=\varvec{A}\varvec{U}. \end{aligned}$$

The only thing to check here is that \(\varvec{A}\varvec{U}\) indeed lies in \(\mathrm {St}^{m\times n}\); this is the case since

$$\begin{aligned} \varvec{A}\varvec{U}\left( \varvec{A}\varvec{U}\right) ^{\top }=\varvec{A}\varvec{U}\varvec{U}^{\top }\varvec{A}^{\top }=\varvec{A}\varvec{A}^{\top }, \end{aligned}$$

thus \(\varvec{A}\) and \(\varvec{A}\varvec{U}\) have the same Gram matrix \(\varvec{A}\varvec{A}^{\top }\), and an \(m\times n\) matrix has full row rank if and only if its Gram matrix is nonsingular.
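
This computation is also easy to test numerically (a one-off sketch of ours, assuming numpy; the random orthogonal \(\varvec{U}\) is obtained from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 5
A = rng.standard_normal((m, n))

# a random orthogonal U in O(n), via the QR factorization of a Gaussian matrix
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

print(np.allclose(A @ A.T, (A @ U) @ (A @ U).T))   # same Gram matrix: True
```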

The kernel map gives rise to a surjective map

$$\begin{aligned} \ker :\mathrm {St}^{m\times n}&\rightarrow \mathrm {Gr}\left( d,n\right) \\ \varvec{A}&\mapsto \ker \left( \varvec{A}\right) \end{aligned}$$

Thus, we see from (15) that the following statement holds:

Proposition 2

The map \(\ker :\mathrm {St}^{m\times n}\rightarrow \mathrm {Gr}\left( d,n\right) \) is equivariant with respect to the right actions of O(n) on \(\mathrm {St}^{m\times n}\) and \(\mathrm {Gr}\left( d,n\right) \); that is, \(\left( \ker \left( \varvec{A}\right) \right) \cdot \varvec{U}=\ker \left( \varvec{A}\cdot \varvec{U}\right) \).

4.2 Probability spaces

Consider the probability space \(\left( {\mathbb {R}}^{m\times n},{\mathscr {B}}({\mathbb {R}}^{m\times n}),\gamma \right) \), where \({\mathscr {B}}({\mathbb {R}}^{m\times n})\) is the Borel \(\sigma \)-algebra and \(\gamma \) is the measure under which the entries of a random \(\varvec{A}\in {\mathbb {R}}^{m\times n}\) are iid N(0, 1). In other words, \(\gamma \) is the standard Gaussian probability measure on the mn-dimensional real vector space \({\mathbb {R}}^{m\times n}\) with mean zero and identity covariance matrix. By restricting to \(\mathrm {St}^{m\times n}\), we get the probability space \(\left( \mathrm {St}^{m\times n},{\mathscr {B}}(\mathrm {St}^{m\times n}),\gamma \right) \). We can do this because \({\mathbb {R}}^{m\times n}\backslash \mathrm {St}^{m\times n}\) is an algebraic hypersurface in \({\mathbb {R}}^{m\times n}\), and therefore has measure zero with respect to \(\gamma \). Let \({\mathscr {B}}:={\mathscr {B}}(\mathrm {St}^{m\times n})\).

The Grassmannian \(\mathrm {Gr}\left( d,n\right) \) is endowed with the topology where \(E\subseteq \mathrm {Gr}\left( d,n\right) \) is open if and only if \(\ker ^{-1}(E)\) is open in \(\mathrm {St}^{m\times n}\). Let \({\mathscr {G}}\) denote the associated Borel \(\sigma \)-algebra. The measure \(\nu \) on \(\mathrm {Gr}\left( d,n\right) \) is characterized as follows:

Proposition 3

([13, Corollary 3.1.3]) The measure \(\nu \) is the unique measure on \(\mathrm {Gr}\left( d,n\right) \) satisfying

$$\begin{aligned} \nu \left( E\cdot \varvec{U}\right)&=\nu \left( E\right) \text { for all }\,E\in {\mathscr {G}}\text { and }\varvec{U}\in O(n)\\ \nu \left( \mathrm {Gr}\left( d,n\right) \right)&=1.\nonumber \end{aligned}$$
(16)

The map \(\ker :\mathrm {St}^{m\times n}\rightarrow \mathrm {Gr}\left( d,n\right) \) thus defines a map of probability spaces:

$$\begin{aligned} \ker :\left( \mathrm {St}^{m\times n},{\mathscr {B}},\gamma \right) \rightarrow \left( \mathrm {Gr}\left( d,n\right) ,{\mathscr {G}},\nu \right) . \end{aligned}$$

Proposition 4

The measure \(\nu \) is the pushforward measure of \(\gamma \) under this map. That is, \(\nu (E)=\gamma (\ker ^{-1}(E))\) for each \(E\in {\mathscr {G}}\).

Proof

We establish the conditions of (16). By surjectivity, and the fact that \(\gamma \) is a probability measure, we have

$$\begin{aligned} \gamma (\ker ^{-1}(\mathrm {Gr}\left( d,n\right) ))=\gamma \left( \mathrm {St}^{m\times n}\right) =1. \end{aligned}$$

It therefore remains to show \(\gamma (\ker ^{-1}(E\cdot \varvec{U}))=\gamma (\ker ^{-1}(E))\) for each \(E\in {\mathscr {G}}\) and \(\varvec{U}\in O(n)\). By Proposition 2, we have

$$\begin{aligned} \ker ^{-1}(E\cdot \varvec{U})=\ker ^{-1}(E)\cdot \varvec{U}. \end{aligned}$$
(17)

Now, \({\mathbb {R}}^{m\times n}\) has the inner product \(\left\langle \varvec{A},\varvec{B}\right\rangle =\mathrm {trace}\left( \varvec{A}\varvec{B}^{\top }\right) \). With respect to this inner product we may consider the subgroup \(O\left( m\times n\right) \) of \(\mathrm {GL}\left( {\mathbb {R}}^{m\times n}\right) \) which is given by

$$\begin{aligned} O\left( m\times n\right) :=\left\{ \varphi \in \mathrm {GL}\left( {\mathbb {R}}^{m\times n}\right) :\left\langle \varphi \left( \varvec{A}\right) ,\varphi \left( \varvec{B}\right) \right\rangle =\left\langle \varvec{A},\varvec{B}\right\rangle \text { for all }\varvec{A},\varvec{B}\in {\mathbb {R}}^{m\times n}\right\} . \end{aligned}$$

Observe that, for a fixed \(\varvec{U}\in O\left( n\right) \), the linear map \(\varphi _{\varvec{U}}\in \mathrm {GL}\left( {\mathbb {R}}^{m\times n}\right) \) given by

$$\begin{aligned} \varphi _{\varvec{U}}\left( \varvec{A}\right) =\varvec{A}\varvec{U} \end{aligned}$$
(18)

lies in \(O\left( m\times n\right) \), since

$$\begin{aligned} \left\langle \varphi _{\varvec{U}}\left( \varvec{A}\right) ,\varphi _{\varvec{U}}\left( \varvec{B}\right) \right\rangle =\mathrm {trace}\left( \varvec{A}\varvec{U}\left( \varvec{B}\varvec{U}\right) ^{\top }\right) =\mathrm {trace}\left( \varvec{A}\varvec{B}^{\top }\right) =\left\langle \varvec{A},\varvec{B}\right\rangle . \end{aligned}$$

Recall that under \(\gamma \) the coordinates \(\varvec{A}_{i,j}\) of a randomly chosen \(\varvec{A}\in {\mathbb {R}}^{m\times n}\) are iid N(0, 1). In particular this measure is invariant under linear isometries, in that for all \({\mathcal {K}}\in {\mathscr {B}}\left( {\mathbb {R}}^{m\times n}\right) \) and \(\varphi \in O\left( m\times n\right) \), we have

$$\begin{aligned} \gamma \left( \varphi \left( {\mathcal {K}}\right) \right) =\gamma \left( {\mathcal {K}}\right) . \end{aligned}$$
(19)

The same is therefore true for the restricted probability measure \(\gamma \) on \(\mathrm {St}^{m\times n}\). It follows that if \(\varvec{U}\in O(n)\) and \(E\in {\mathscr {G}}\), then, using (17), (18), and (19), we have

$$\begin{aligned} \gamma \left( \ker ^{-1}(E\cdot \varvec{U})\right) =\gamma \left( \ker ^{-1}(E)\cdot \varvec{U}\right) =\gamma \left( \varphi _{\varvec{U}}\left( \ker ^{-1}(E)\right) \right) =\gamma \left( \ker ^{-1}(E)\right) . \end{aligned}$$

\(\square \)

4.3 Cramer’s rule ratios

Let \(\sigma \subset [n]\) be of size m, and define

$$\begin{aligned} \mathrm {St}_{\sigma }^{m\times n}&:=\left\{ \varvec{A}\in \mathrm {St}^{m\times n}:\varvec{A}_{\sigma }\text { is nonsingular}\right\} ,\\ \mathrm {Gr}\left( d,n\right) _{\sigma }&:=\left\{ \ker \left( \varvec{A}\right) \in \mathrm {Gr}\left( d,n\right) :\varvec{A}_{\sigma }\text { is nonsingular}\right\} . \end{aligned}$$

Note that \(\gamma \left( \mathrm {St}_{\sigma }^{m\times n}\right) =\nu \left( \mathrm {Gr}\left( d,n\right) _{\sigma }\right) =1\). Also define, for \(s>1\), \(i\in \sigma \), and \(j\notin \sigma \),

$$\begin{aligned} \xi _{\sigma ,i,j}\left( s\right) :=\left\{ \ker \left( \varvec{A}\right) \in \mathrm {Gr}\left( d,n\right) _{\sigma }:\left| \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\right| >s\right\} . \end{aligned}$$

Proposition 5

The set \(\xi _{\sigma ,i,j}\left( s\right) \) is Jordan measurable.

Proof

Let \(\xi =\xi _{\sigma ,i,j}\left( s\right) \). We first argue that it suffices to show \(\nu \left( \partial \xi \right) =0\), where \(\partial \xi \) denotes the boundary of \(\xi \). There is a metric on \(\mathrm {Gr}\left( d,n\right) \), which we denote by \(\delta \), whose open balls form a basis for our topology of \(\mathrm {Gr}\left( d,n\right) \). These open balls are defined, for each \(\varepsilon >0\) and d-dimensional subspace V of \({\mathbb {R}}^{n}\), as

$$\begin{aligned} B\left( V,\varepsilon \right) :=\left\{ W\in \mathrm {Gr}\left( d,n\right) :\delta \left( V,W\right) <\varepsilon \right\} . \end{aligned}$$

Let

$$\begin{aligned} \left( \partial \xi \right) _{\varepsilon }:=\bigcup _{V\in \partial \xi }B\left( V,\varepsilon \right) . \end{aligned}$$

Note that \(\partial \xi =\cap _{k\ge 1}\left( \partial \xi \right) _{1/k}\). We have, by the monotone convergence theorem,

$$\begin{aligned} \lim _{N\rightarrow \infty }\nu \left( \bigcap _{k=1}^{N}\left( \partial \xi \right) _{1/k}\right) =\nu \left( \bigcap _{k\ge 1}\left( \partial \xi \right) _{1/k}\right) =\nu \left( \partial \xi \right) =0. \end{aligned}$$

In particular, if we now fix some \(\varepsilon >0\), there is some \(k\ge 1\) such that \(\nu \bigl (\left( \partial \xi \right) _{1/k}\bigr )<\varepsilon \). Observe that \({\overline{\xi }}\cap \left( \partial \xi \right) _{1/k}^{c}\) and \(\overline{\xi ^{c}}\) are two disjoint closed sets, where \({\overline{X}},X^{c}\) denote the closure and complement of X in \(\mathrm {Gr}\left( d,n\right) \), respectively. As \(\mathrm {Gr}\left( d,n\right) \) is a metric space, it is a normal space, and we may apply Urysohn's lemma [10, Lemma 4.15] to get a continuous function \(f_{1}:\mathrm {Gr}\left( d,n\right) \rightarrow \left[ 0,1\right] \) such that

$$\begin{aligned} \left. f_{1}\right| _{{\overline{\xi }}\cap \left( \partial \xi \right) _{1/k}^{c}}=1\quad \text {and}\quad \left. f_{1}\right| _{\overline{\xi ^{c}}}=0. \end{aligned}$$

Again applying Urysohn’s lemma, we also get a function \(f_{2}:\mathrm {Gr}\left( d,n\right) \rightarrow \left[ 0,1\right] \) such that

$$\begin{aligned} \left. f_{2}\right| _{\overline{\xi ^{c}}\cap \left( \partial \xi \right) _{1/k}^{c}}=0\quad \text {and}\quad \left. f_{2}\right| _{{\overline{\xi }}}=1. \end{aligned}$$

Note that by construction, \(f_{1}\le {\mathbf {1}}_{\xi }\le f_{2}\). Furthermore,

$$\begin{aligned} \int \left( f_{2}-f_{1}\right) d\nu \le \nu \bigl (\left( \partial \xi \right) _{1/k}\bigr )<\varepsilon , \end{aligned}$$

which establishes the condition of Definition 1.

To conclude the proof, it remains to show \(\nu \left( \partial \xi \right) =0\). One way to see this is that \(\ker ^{-1}\left( \partial \xi \right) \) is contained in the solution set in \(\mathrm {St}^{m\times n}\) of

$$\begin{aligned} \det \left( \varvec{X}_{\sigma }\right) \left( \left( \det \varvec{X}_{\sigma -i+j}\right) ^{2}-\left( s\cdot \det \varvec{X}_{\sigma }\right) ^{2}\right) =0, \end{aligned}$$

where \(\varvec{X}\) denotes an \(m\times n\) matrix of variables; the factor \(\det \left( \varvec{X}_{\sigma }\right) \) accounts for those boundary points at which \(\varvec{X}_{\sigma }\) is singular. This is an algebraic hypersurface, hence by Proposition 4 we conclude

$$\begin{aligned} \nu \bigl (\partial \xi \bigr )=\gamma \left( \ker ^{-1}\left( \partial \xi \right) \right) =0. \end{aligned}$$

\(\square \)

Proposition 6

For \(s>1\) and \(\sigma ,i,j\) as above, we have

$$\begin{aligned} \nu \left( \xi _{\sigma ,i,j}\left( s\right) \right) =\frac{2}{\pi s}+{\mathcal {O}}\left( s^{-3}\right) . \end{aligned}$$

Proof

Let \(\varvec{A}\) be a random element of \(\mathrm {St}_{\sigma }^{m\times n}\), let H denote the (random) hyperplane of \({\mathbb {R}}^{m}\) spanned by the columns of \(\varvec{A}_{\sigma \backslash \left\{ i\right\} }\), and let \(\ell \) denote the line through the origin perpendicular to H. Let \(\varvec{u}_{\ell }\) denote the unit normal vector to H whose first nonzero coordinate is positive. Thus,

$$\begin{aligned} \ell ={\mathbb {R}}\varvec{u}_{\ell }=\left\{ \lambda \varvec{u}_{\ell }:\lambda \in {\mathbb {R}}\right\} . \end{aligned}$$

Let \(\alpha \in \left\{ -1,+1\right\} \) denote the sign of the first nonzero entry of \({\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\). Then we can write

$$\begin{aligned} \varvec{u}_{\ell }^{\top }=\frac{\alpha {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}}{\left\| {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\right\| _{2}}, \end{aligned}$$

since for all \(k\in \sigma \backslash \left\{ i\right\} \) we have

$$\begin{aligned} \alpha {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\varvec{A}_{k}=\alpha {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\varvec{A}_{\sigma }{\mathbf {e}}_{k}=0, \end{aligned}$$

and \(\alpha {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\) has first nonzero component positive by definition of \(\alpha \).

Now let k be any element of \(\left[ n\right] \) outside of \(\sigma \backslash \left\{ i\right\} \). Since \(\varvec{u}_{\ell }\) depends only on \(\varvec{A}_{\sigma \backslash \left\{ i\right\} }\), and the entries of \(\varvec{A}\) are mutually independent, we have that \(\varvec{u}_{\ell }\) and \(\varvec{A}_{k}\) are independent random vectors. Now, for any fixed unit vector \(\varvec{v}\in {\mathbb {S}}^{m-1}\), since \(\varvec{A}_{k}\) has iid N(0, 1) entries, the dot product \(\varvec{v}^{\top }\varvec{A}_{k}\) also has distribution N(0, 1). Thus, for any fixed \(t\in {\mathbb {R}}\), the random variable

$$\begin{aligned} \gamma \left( \varvec{u}_{\ell }^{\top }\varvec{A}_{k}\le t\mid \ell \right) \end{aligned}$$

(i.e. the conditional probability in terms of the \(\sigma \)-algebra generated by \(\ell \)) is in fact constant. Evaluating at the line \(\ell ={\mathbb {R}}{\mathbf {e}}_{1}\), for example, this constant is given by

$$\begin{aligned} \gamma \left( \varvec{A}_{1,k}\le t\right) . \end{aligned}$$

This shows that the random quantity \(\varvec{u}_{\ell }^{\top }\varvec{A}_{k}\) has distribution N(0, 1). We have

$$\begin{aligned} \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}=\frac{{\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\varvec{A}_{j}}{{\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\varvec{A}_{i}}=\frac{\varvec{u}_{\ell }^{\top }\varvec{A}_{j}}{\varvec{u}_{\ell }^{\top }\varvec{A}_{i}}. \end{aligned}$$

The independence of \(\varvec{u}_{\ell }^{\top }\varvec{A}_{i}\) and \(\varvec{u}_{\ell }^{\top }\varvec{A}_{j}\) implies that \(\left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\) has the Cauchy distribution, that is, the distribution of the ratio of two iid N(0, 1) random variables. In particular, the cdf of \(\left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\) is given by

$$\begin{aligned} \gamma \left( \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\le t\right) =\frac{1}{\pi }\arctan (t)+\frac{1}{2}. \end{aligned}$$

See [9, p. 50] for more on the Cauchy distribution. Using the series expansion, valid for \(t\ge 1\),

$$\begin{aligned} \arctan \left( t\right) =\frac{\pi }{2}-\frac{1}{t}+\frac{1}{3t^{3}}-\frac{1}{5t^{5}}+\cdots , \end{aligned}$$

we get

$$\begin{aligned} \gamma \left( \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\le t\right) =1-\left( \frac{1}{\pi t}-\frac{1}{3\pi t^{3}}+\frac{1}{5\pi t^{5}}-\cdots \right) . \end{aligned}$$

Hence, using Proposition 4 and the fact \(s>1\), we conclude

$$\begin{aligned} \nu \left( \xi _{\sigma ,i,j}\left( s\right) \right)&=\gamma \left( \left| \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\right|>s\right) \\&=2\cdot \gamma \left( \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}>s\right) \\&=2\left( 1-\gamma \left( \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\le s\right) \right) \\&=2\left( \frac{1}{\pi s}-\frac{1}{3\pi s^{3}}+\frac{1}{5\pi s^{5}}-\cdots \right) \\&=\frac{2}{\pi s}+{\mathcal {O}}\left( s^{-3}\right) . \end{aligned}$$

\(\square \)

5 Proof of the main result

In this final section we prove the main result of this paper, Theorem 1.

Definition 3

Define the constant

$$\begin{aligned} {\tilde{\omega }}_{d}:=\frac{\omega _{d}^{1/d}}{d}, \end{aligned}$$

where \(\omega _{d}\) denotes the volume of the d-dimensional Euclidean ball of radius 1. This constant \({\tilde{\omega }}_{d}\) is of the order \(d^{-3/2}\).
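
A quick numerical check of this order of magnitude (our own sketch; the limiting constant \(\sqrt{2\pi e}\approx 4.13\) follows from Stirling's formula applied to \(\omega _{d}\)):

```python
import math

def omega_tilde(d):
    """omega_d^(1/d) / d, where omega_d = pi^(d/2) / Gamma(d/2 + 1)."""
    omega_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return omega_d ** (1 / d) / d

# the ratio of omega_tilde(d) to d^(-3/2) tends to sqrt(2*pi*e) ~ 4.13
for d in (2, 5, 10, 50, 200):
    print(d, omega_tilde(d) * d ** 1.5)
```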

Definition 4

Assume \(\Lambda _{{\mathbb {R}}}=\ker \left( \varvec{A}\right) \). Given positive real numbers s and u, we say \(\Lambda \) is \(\left( \sigma ,s,u\right) \)-controlled if \(\sigma \) is a basis of \(\varvec{A}\) and:

  1.

    Every entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) is at most s in absolute value, and

  2.

    The successive minima ratios of \(\Lambda \) are not too large: we have

    $$\begin{aligned} \frac{\lambda _{i+1}\left( \Lambda \right) }{\lambda _{i}\left( \Lambda \right) }<\left( {\tilde{\omega }}_{d}u\right) ^{2/(d-1)} \end{aligned}$$

    for all \(i=1,2,\ldots ,d-1\).

Lemma 3

If \(\sigma \) is a basis of \(\varvec{A}\) and \(\Lambda \) is \(\left( \sigma ,s,u\right) \)-controlled, then for all \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \) we have

$$\begin{aligned} \mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) \le 2n^{3/2}su\left( \Delta \left( \Lambda \right) \right) ^{1/d}. \end{aligned}$$

Proof

Let \(\varvec{b}=\varvec{A}\varvec{x}^{*}\), let \(B=B_{2}^{n}\cap \Lambda _{{\mathbb {R}}}\), and let \(\mu \) denote the covering radius of B with respect to \(\Lambda \). Define the vector \(\varvec{v}\in {\mathbb {R}}^{n}\) so that:

$$\begin{aligned} \varvec{v}_{j}&=\mu \varvec{w}_{j}\text { for all}\, j\in {\bar{\sigma }}\\ \varvec{A}\varvec{v}&=\varvec{b}. \end{aligned}$$

We show that the scaled, translated ball \(\mu B+\varvec{v}\) is contained in \({\mathcal {P}}(\varvec{A},\varvec{b})\). Since \(B\subseteq \Lambda _{{\mathbb {R}}}\), we have that each \(\varvec{x}\in \mu B+\varvec{v}\) satisfies \(\varvec{A}\varvec{x}=\varvec{b}\). For each \(j\in \left[ n\right] \), let \(\varvec{x}^{(j)}\) be the unique point in \(\mu B+\varvec{v}\) such that \(\varvec{x}_{j}^{(j)}\) is minimized. If \(j\in {\bar{\sigma }}\), then

$$\begin{aligned} \varvec{x}_{j}^{(j)}=\mu (-\varvec{w}_{j})+\varvec{v}_{j}=\mu (-\varvec{w}_{j})+\mu \varvec{w}_{j}=0. \end{aligned}$$

If \(j\in \sigma \), then since \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \) we have

$$\begin{aligned} \varvec{x}_{j}^{(j)}&=\mu (-\varvec{w}_{j})+\varvec{v}_{j}\\&=\mu (-\varvec{w}_{j})+\left( \varvec{A}_{\sigma }^{-1}\varvec{b}-\mu \varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\varvec{w}_{{\bar{\sigma }}}\right) _{j}\\&\ge \mu (-\varvec{w}_{j})+\mu \varvec{w}_{j}\\&=0. \end{aligned}$$

This concludes the proof that \(\mu B+\varvec{v}\subseteq {\mathcal {P}}(\varvec{A},\varvec{b})\).

Let \({\mathbf {g}}\in \left( \Lambda _{{\mathbb {R}}}+\varvec{x}^{*}\right) \cap {\mathbb {Z}}^{n}\). Since \(\mu \) is the covering radius of B with respect to \(\Lambda \), there exists \(\varvec{z}^{*}\in \left( \Lambda +{\mathbf {g}}\right) \cap (\mu B+\varvec{v})\) such that

$$\begin{aligned} \left\| \varvec{x}^{*}-\varvec{z}^{*}\right\| _{2}\le \left\| \varvec{x}^{*}-\varvec{v}\right\| _{2}+\left\| \varvec{v}-\varvec{z}^{*}\right\| _{2}\le \mu \left\| {\tilde{\varvec{w}}}\right\| _{2}+\mu , \end{aligned}$$
(20)

where we define \({\tilde{\varvec{w}}}:=(\varvec{v}-\varvec{x}^{*})/\mu \). That is, \({\tilde{\varvec{w}}}\) satisfies

$$\begin{aligned} \varvec{A}{\tilde{\varvec{w}}}&={\mathbf {0}}\\ {\tilde{\varvec{w}}}_{j}&=\varvec{w}_{j}\text { for all}\, j\in {\bar{\sigma }}. \end{aligned}$$

Observe that

$$\begin{aligned} {\tilde{\varvec{w}}}_{\sigma }=-\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}{\tilde{\varvec{w}}}_{{\bar{\sigma }}}. \end{aligned}$$

Using the fact \(\varvec{w}\in [0,1]^{n}\), we therefore have

$$\begin{aligned} \left\| {\tilde{\varvec{w}}}\right\| _{2}^{2}&=\left\| {\tilde{\varvec{w}}}_{\sigma }\right\| _{2}^{2}+\left\| {\tilde{\varvec{w}}}_{{\bar{\sigma }}}\right\| _{2}^{2}\\&=\left\| \varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}{\tilde{\varvec{w}}}_{{\bar{\sigma }}}\right\| _{2}^{2}+\left\| {\tilde{\varvec{w}}}_{{\bar{\sigma }}}\right\| _{2}^{2}\\&\le m\left\| \varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\right\| _{\infty }^{2}\left\| {\tilde{\varvec{w}}}_{{\bar{\sigma }}}\right\| _{1}^{2}+\left\| {\tilde{\varvec{w}}}_{{\bar{\sigma }}}\right\| _{2}^{2}\\&\le \left( ms^{2}+1\right) d^{2}. \end{aligned}$$

Thus we conclude

$$\begin{aligned} \left\| \varvec{x}^{*}-\varvec{z}^{*}\right\| _{2}&\le \mu \left( \left\| {\tilde{\varvec{w}}}\right\| _{2}+1\right) \\&\le u\Delta ^{1/d}\left( \sqrt{\left( ms^{2}+1\right) d^{2}}+1\right) \\&\le 2n^{3/2}su\Delta ^{1/d}. \end{aligned}$$

\(\square \)

Proof of Theorem 1

Let \(\Lambda \) be a uniformly chosen lattice from \(G\left( d,n,T\right) \). Let \(t>1\), and let \(s:=t^{2/3}/(2n^{3/2})\) and \(u:=t^{1/3}\), so that \(t=2n^{3/2}su\) as in Lemma 3. We have

$$\begin{aligned}&{\mathbf {P}}\left( \mathrm {dist}^{*}\left( \Lambda \right)>t\left( \Delta \left( \Lambda \right) \right) ^{1/d}\right) \\&\le \sum _{\sigma }{\mathbf {P}}\left( \sigma \text { is a basis of }\varvec{A}\text { and }\mathrm {dist}\left( \Lambda ,\sigma ,\varvec{x}^{*}\right) >t\left( \Delta \left( \Lambda \right) \right) ^{1/d}\text { for some }\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \right) \\&\le \sum _{\sigma }{\mathbf {P}}\left( \sigma \text { is a basis of }\varvec{A}\text { and }\Lambda \text { is not }\left( \sigma ,s,u\right) \text {-controlled}\right) , \end{aligned}$$

where the sums are over all subsets \(\sigma \subseteq \left[ n\right] \) of size m, and the second inequality follows from Lemma 3. It therefore suffices to show, for each such \(\sigma \),

$$\begin{aligned} {\mathbf {P}}\left( \sigma \text { is a basis of }\varvec{A}\text { and }\Lambda \text { is not }\left( \sigma ,s,u\right) \text {-controlled}\right) \ll t^{-2/3}. \end{aligned}$$

By definition, this probability is at most

$$\begin{aligned} {\mathbf {P}}\left( \max _{i\in \left[ d-1\right] }\left\{ \frac{\lambda _{i+1}\left( \Lambda \right) }{\lambda _{i}\left( \Lambda \right) }\right\} \ge \left( \tilde{\omega }_{d}u\right) ^{2/(d-1)}\right) +\sum _{\begin{array}{c} i\in \sigma \\ j\notin \sigma \end{array} }{\mathbf {P}}\left( \sigma \text { is a basis of }\varvec{A}\text { and }\left| \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\right| >s\right) . \end{aligned}$$
(21)

By Theorem 2, which applies thanks to Proposition 5, we have

$$\begin{aligned} {\mathbf {P}}\left( \sigma \text { is a basis of }\varvec{A}\text { and }\left| \left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\right| >s\right) =\frac{\left| G\left( {\mathbf {1}},\xi _{\sigma ,i,j}\left( s\right) ,T\right) \right| }{\left| G\left( {\mathbf {1}},\mathrm {Gr}\left( d,n\right) ,T\right) \right| }\asymp \nu \left( \xi _{\sigma ,i,j}\left( s\right) \right) . \end{aligned}$$

Hence, applying Corollary 2 and Proposition 6, for T sufficiently large we may bound the quantity (21), up to constants, by

$$\begin{aligned} u^{-2}+s^{-1}\ll t^{-2/3}. \end{aligned}$$

\(\square \)