Abstract
We study proximity bounds within a natural model of random integer programs of the type \(\max \;\varvec{c}^{\top }\varvec{x}:\varvec{A}\varvec{x}=\varvec{b},\,\varvec{x}\in {\mathbb {Z}}_{\ge 0}\), where \(\varvec{A}\in {\mathbb {Z}}^{m\times n}\) is of rank m, \(\varvec{b}\in {\mathbb {Z}}^{m}\) and \(\varvec{c}\in {\mathbb {Z}}^{n}\). In particular, we seek bounds for proximity in terms of the parameter \(\Delta (\varvec{A})\), which is the square root of the determinant of the Gram matrix \(\varvec{A}\varvec{A}^{\top }\) of \(\varvec{A}\). We prove that, up to constants depending on n and m, the proximity is “generally” bounded by \(\Delta (\varvec{A})^{1/(n-m)}\), which is significantly better than the best deterministic bounds which are, again up to dimension constants, linear in \(\Delta (\varvec{A})\).
1 Introduction
Given a linear program of the form
$$\begin{aligned} \max \;\varvec{c}^{\top }\varvec{x}:\varvec{A}\varvec{x}=\varvec{b},\,\varvec{x}\ge {\mathbf {0}}, \end{aligned}$$
(1)
where \(\varvec{A}\) is a full-row-rank \(m\times n\) integral matrix, \(\varvec{b}\in {\mathbb {Z}}^{m}\), and \(\varvec{c}\in {\mathbb {Z}}^{n}\), we seek to understand how far away an optimal vertex \(\varvec{x}^{*}\) of the feasible region can be to a nearby feasible integer solution \(\varvec{z}^{*}\), assuming the feasible region has at least one integral point. Typically it is further required that \(\varvec{z}^{*}\) is itself optimal; we do not impose this requirement in this manuscript. We refer to the smallest possible distance between \(\varvec{x}^{*}\) and a feasible integral solution \(\varvec{z}^{*}\) as the proximity of (1). This distance is measured in terms of some given norm, for example the \(\left\| \cdot \right\| _1\) or \(\left\| \cdot \right\| _{\infty }\) norms; in this paper we state our results in terms of the Euclidean norm \(\left\| \cdot \right\| _2\).
Bounds for proximity are typically given in terms of the largest possible absolute value \(\Delta _{m}(\varvec{A})\) of any \(m\times m\) subdeterminant of \(\varvec{A}\). Note that this parameter is within a factor of \(\genfrac(){0.0pt}1{n}{m}\) of \(\Delta \left( \varvec{A}\right) :=\sqrt{\det (\varvec{A}\varvec{A}^{\top })}\). Finding such bounds is a well-studied problem which goes back to the classic Cook et al. result [7] bounding the proximity of the dual of (1). See, for instance, the recent works of Eisenbrand and Weismantel [8] and of Aliev et al. [2] and the references therein.
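The relation between \(\Delta _{m}(\varvec{A})\) and \(\Delta \left( \varvec{A}\right) \) comes from the Cauchy–Binet formula, \(\det (\varvec{A}\varvec{A}^{\top })=\sum _{\sigma }\det (\varvec{A}_{\sigma })^{2}\), the sum running over all m-element column subsets \(\sigma \). A small sanity check of this identity on an arbitrary example matrix of ours (not taken from the paper):

```python
from itertools import combinations
from math import sqrt

def det2(M):
    # determinant of a 2x2 matrix given as [[a, b], [c, d]]
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Example integer matrix A with m = 2 rows, n = 4 columns (full row rank).
A = [[1, 2, 0, 3],
     [0, 1, 4, 1]]
m, n = 2, 4

# Gram matrix A A^T and Delta(A) = sqrt(det(A A^T)).
gram = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
        for i in range(m)]
delta = sqrt(det2(gram))

# Cauchy-Binet: det(A A^T) equals the sum of squares of all m x m minors,
# so Delta_m(A) <= Delta(A) <= sqrt(binom(n, m)) * Delta_m(A).
minors = [det2([[A[i][j] for j in sigma] for i in range(m)])
          for sigma in combinations(range(n), m)]
assert det2(gram) == sum(d * d for d in minors)
delta_m = max(abs(d) for d in minors)
print(delta_m, round(delta, 3))
```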
In this manuscript, we would like to understand the worst-possible proximity, which we denote by \(\mathrm {dist}(\varvec{A})\), over all choices of \(\varvec{b}\) and \(\varvec{c}\), when the matrix \(\varvec{A}\) is chosen randomly. The model of randomness we consider is the following: we consider the matrix \(\varvec{A}\) only up to left-multiplication by unimodular matrices, and we choose it uniformly at random subject to the conditions that the greatest common divisor of the maximal minors of \(\varvec{A}\) is 1 and that \(\Delta (\varvec{A})\) is at most some sufficiently large (with respect to m and n) integer T. This is a natural model to study from a geometric point of view, since \(\Delta (\varvec{A})\) is the determinant of the lattice of integer points in the kernel of \(\varvec{A}\). This is also the model considered by Aliev and Henk [1], in their investigation of diagonal Frobenius numbers.
Our main result concerns not \(\mathrm {dist}(\varvec{A})\) but rather a related random variable we denote by \(\mathrm {dist}^{*}(\varvec{A})\). This is an asymptotic version of \(\mathrm {dist}(\varvec{A})\) that further imposes some mild restrictions on \(\varvec{b}\). Our main result is that it satisfies the following Markov-type inequality: for all \(t>1\),
$$\begin{aligned} {\mathbf {P}}\left( \mathrm {dist}^{*}(\varvec{A})>t\,\Delta (\varvec{A})^{1/(n-m)}\right) \ll t^{-2/3}. \end{aligned}$$
(2)
Here \(\ll \) means less than, up to constants which depend only on n and m. In particular, this shows that proximity generally depends only on \(\Delta ^{1/(n-m)}\) in our random setting, for “almost all” choices of \(\varvec{b}\) in a certain precise sense. This is significantly better than the linear dependency on \(\Delta _{m}\) in the deterministic case, which is known to be tight [2, Theorem 1].
1.1 Related work
A similar result, with a slightly different random model, was obtained in [2] in the so-called knapsack scenario, where \(m=1\). In that work, a fixed integer T is given, and the matrix \(\varvec{A}\) is a row vector chosen uniformly at random from \(\left\{ 1,2,\ldots ,T\right\} ^{n}\) such that the greatest common divisor of the entries equals 1. A special case of [2, Theorem 2] states
where \(\mathrm {dist}\left( \varvec{A}\right) \) measures distance using the \(\left\| \cdot \right\| _{\infty }\) norm.
The recent work of Oertel et al. [14] considers a random model that allows \(\varvec{b}\) to vary but keeps \(\varvec{A}\) fixed. More precisely, for a given positive integer t, the vector \(\varvec{b}\) is chosen uniformly at random from \(\left\{ -T,\ldots ,T\right\} ^{m}\) such that \(\varvec{A}\varvec{x}=\varvec{b},\varvec{x}\ge {\mathbf {0}}\) is integer-feasible. The result in [14, Corollary 1.3] states that
with probability approaching 1 as \(T\rightarrow \infty \). Here again \(\mathrm {dist}\left( \varvec{A}\right) \) measures distance using the \(\left\| \cdot \right\| _{\infty }\) norm. Note that this bound does not depend on n.
Finally, we mention the very recent work of Borst et al. [5] which investigates the integrality gap of integer programs of the form
with \(\varvec{A}\) and \(\varvec{c}\) having independent, Gaussian N(0, 1) entries. This quantity measures the difference between the optimal value of (3) and that of its linear relaxation. Their result is that the integrality gap is bounded from above by \(\mathrm {poly}\left( m\right) \left( \log n\right) ^{2}/n\) with probability at least \(1-n^{-7}-2^{-\mathrm {poly}\left( m\right) }\), subject to certain conditions on \(\varvec{b}\). See [5] and the references therein for a history of this problem.
1.2 Outline of proof
The proof of our result combines ideas of [1, 2] using facts from the geometry of numbers, some results of Schmidt from [15] on random sublattices of \({\mathbb {Z}}^{n}\) of fixed dimension, and computations of the measure of certain distinguished regions of the real Grassmannian \(\mathrm {Gr}(d,n)\) of d-dimensional subspaces of \({\mathbb {R}}^{n}\), where \(d=n-m\). For us the crucial parameters from the geometry of numbers that we need are the covering radius \(\mu \), as well as the successive minima \(\lambda _{1},\ldots ,\lambda _{d}\) of \(\ker \varvec{A}\cap B_{2}^{n}\) with respect to the lattice \(\ker \varvec{A}\cap {\mathbb {Z}}^{n}\), where \(B_{2}^{n}\) denotes the unit-radius Euclidean ball in \({\mathbb {R}}^{n}\). Further details on these parameters can be found in Sect. 3.
The restrictions imposed by the definition of \(\mathrm {dist}^{*}\left( \varvec{A}\right) \) on the right hand side \(\varvec{b}\) ensure that, given a vertex \(\varvec{x}^{*}\) of the feasible region of (1), one can always find a feasible integral solution \(\varvec{z}^{*}\) such that
where \(\varvec{x}^{*}\) has support contained in \(\sigma \subseteq \left[ n\right] \) and \(\varvec{A}_{\sigma }\) denotes the square submatrix of \(\varvec{A}\) whose columns are indexed by \(\sigma \). This restriction on \(\varvec{b}\) amounts to picking \(\varvec{b}\) sufficiently deep inside the cone spanned by the columns of \(\varvec{A}_{\sigma }\), or choosing \(\varvec{b}\) from a reduced cone in the sense of Gomory [11, p. 261]. A uniform upper bound on all ratios \(\lambda _{i+1}/\lambda _{i},\;i=1,2,\ldots ,d-1\), implies an upper bound on \(\mu \); see Lemma 2. Meanwhile, Sect. 4 shows that the measure in \(\mathrm {Gr}(d,n)\) of those subspaces \(\ker \varvec{A}\in \mathrm {Gr}(d,n)\) for which a given entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}\) exceeds some fixed parameter \(s>0\) in absolute value is a function of order \(s^{-1}\). Theorem 2, itself a straightforward corollary of results of [15], combines these two pieces: a random lattice of the form \(\ker \varvec{A}\cap {\mathbb {Z}}^{n}\) is unlikely to have any ratio \(\lambda _{i+1}/\lambda _{i}\), or any entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}\), be exceedingly large. The details are carried out in Sect. 5.
We remark that the exponent of \(-2/3\) is mainly an artifact of the proof, and we expect that it can be further improved. The problem of finding an inequality analogous to (2) for \(\mathrm {dist}(\varvec{A})\) is more challenging and remains open. When we allow \(\varvec{b}\) to lie close to the boundary of the cone spanned by the columns of \(\varvec{A}_{\sigma }\), our arguments no longer apply.
Remark 1
(Changes from proceedings version) The following changes have been made since the proceedings version of this manuscript [6]. In Sect. 3 we clarified and expanded upon the geometry of numbers theory that is used in this paper. In Sect. 4 we gave a proof of the claim that a particular subset of \(\mathrm {Gr}(d,n)\) is Jordan measurable. Some minor typos have also been corrected, and some minor changes have been made to the introduction.
2 Main result and notation
2.1 Notation
Throughout this manuscript we assume fixed positive integers d, m, n such that \(n=m+d\). For a subset \(\sigma \subseteq [n]\) and \(\varvec{x}\in {\mathbb {R}}^{n}\), we let \(\varvec{x}_{\sigma }\) denote the vector obtained by orthogonally projecting \(\varvec{x}\) onto the coordinates indexed by \(\sigma \). Similarly, if \(\varvec{A}\) is a matrix, then we denote by \(\varvec{A}_{\sigma }\) the submatrix of \(\varvec{A}\) whose columns are those indexed by \(\sigma \). In particular, if \(k\in [n]\) then \(\varvec{A}_{k}\) denotes the corresponding column of \(\varvec{A}\). If \(\varvec{A}_{\sigma }\) is an invertible square matrix we say \(\sigma \) is a basis of \(\varvec{A}\). We denote the complement of \(\sigma \) by \({\bar{\sigma }}:=[n]\backslash \sigma \). Given a d-dimensional subspace \(L\subseteq {\mathbb {R}}^{n}\), the m-dimensional orthogonal complement of L is denoted by \(L^{\perp }\). If \(\Lambda \subset {\mathbb {R}}^{n}\), let \(\Lambda _{{\mathbb {R}}}\) denote the linear subspace of \({\mathbb {R}}^{n}\) spanned by \(\Lambda \). We say \(\sigma \subseteq [n]\) is a coordinate basis of \(\Lambda \) or \(\Lambda _{{\mathbb {R}}}\) if the coordinate projection map
is an isomorphism. This is equivalent to saying that \(\sigma \) is a basis of \(\varvec{A}\) for any full-row-rank matrix \(\varvec{A}\) such that \(\ker (\varvec{A})=\Lambda _{{\mathbb {R}}}\). Finally, we denote the group of \(n\times n\) orthogonal real matrices by O(n). This notation presents a conflict with “big-O” asymptotic notation, so we write \({\mathcal {O}}(n)\) for the latter.
2.2 Definition of \(\mathrm {dist}(\varvec{A})\)
Let \(\varvec{A}\in {\mathbb {Z}}^{m\times n}\) be a full-row-rank matrix. For a basis \(\sigma \) of \(\varvec{A}\), we define the semigroup
For a vector \(\varvec{b}\in {\mathbb {Z}}^{m}\), we define the polyhedron
The idea behind these definitions is that if \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \), then \(\varvec{b}:=\varvec{A}\varvec{x}^{*}\) is an integral vector, \({\mathcal {P}}(\varvec{A},\varvec{b})\) is a polyhedron containing at least one integral point, and \(\varvec{x}^{*}\) is the vertex of \({\mathcal {P}}(\varvec{A},\varvec{b})\) associated to the basis \(\sigma \). Now given a basis \(\sigma \) of \(\varvec{A}\) and \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \), we define the distance
where \(\varvec{b}:=\varvec{A}\varvec{x}^{*}\). We then define the worst-case distance over all choices of bases \(\sigma \) of \(\varvec{A}\) and elements \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \) as
This definition has the disadvantage that it is stated in terms of the matrix \(\varvec{A}\). Since we may replace \(\varvec{A}\varvec{x}=\varvec{b}\) with \(\varvec{U}\varvec{A}\varvec{x}=\varvec{U}\varvec{b}\) for any unimodular \(m\times m\) integral matrix \(\varvec{U}\), it is not so clear from this formulation how to define our random model. This motivates an alternative, more geometric definition of \(\mathrm {dist}(\varvec{A})\), which we now state.
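This invariance can be checked numerically: for any \(\varvec{U}\) we have \(\det (\varvec{U}\varvec{A}(\varvec{U}\varvec{A})^{\top })=\det (\varvec{U})^{2}\det (\varvec{A}\varvec{A}^{\top })\), so when \(\varvec{U}\) is unimodular, \(\Delta (\varvec{U}\varvec{A})=\Delta (\varvec{A})\). A minimal sketch with arbitrary illustrative matrices (our own example, not taken from the paper):

```python
# Check that left-multiplying A by a unimodular U preserves det(A A^T),
# and hence Delta(A). The matrices A and U below are arbitrary examples.
A = [[1, 2, 0, 3],
     [0, 1, 4, 1]]
U = [[1, 1],
     [0, 1]]  # integer matrix with det(U) = 1, hence unimodular

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def det2(M):
    # determinant of a 2x2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

UA = matmul(U, A)                    # row operation: add row 2 to row 1
gram_A = matmul(A, transpose(A))     # Gram matrix A A^T
gram_UA = matmul(UA, transpose(UA))  # Gram matrix of U A
# det(U A (U A)^T) = det(U)^2 * det(A A^T) = det(A A^T)
assert det2(gram_UA) == det2(gram_A)
```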
2.3 Definition of \(\mathrm {dist}(\Lambda )\)
Suppose instead we start with a d-dimensional sublattice \(\Lambda \) of \({\mathbb {Z}}^{n}\). Suppose \(\sigma \) is a coordinate basis of \(\Lambda \). Then we may define the semigroup
For \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \), define the distance
The extra maximum accounts for the fact that, if \(\Lambda \) is not primitive, then there are multiple ways to embed \(\Lambda \) into \(\Lambda _{{\mathbb {R}}}+\varvec{x}^{*}\) as an integral translate of \(\Lambda \). Finally, define the worst case distance
where the maximum is taken over all coordinate bases of \(\Lambda \) and elements \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \).
We now explain the relationship between definitions (5) and (8). First note that if \(\varvec{A}\) is any integral matrix such that \(\Lambda _{{\mathbb {R}}}=\ker (\varvec{A})\), then the two definitions (4) and (6) of \({\mathcal {S}}_{\sigma }\left( \varvec{A}\right) \) and \({\mathcal {S}}_{\sigma }\left( \Lambda \right) \) coincide. Moreover, if \(\Lambda \) is a primitive lattice, that is, if \(\Lambda =\Lambda _{{\mathbb {R}}}\cap {\mathbb {Z}}^{n}\), then we have
and therefore
Definition (8) also makes sense when \(\Lambda \) is non-primitive, however, and it is immediate from the definitions that in general,
The key advantage of definition (8) is that there are only finitely many d-dimensional sublattices of \({\mathbb {Z}}^{n}\) whose determinant is at most some fixed positive integer T. Thus, we may consider the uniform distribution over these bounded-determinant lattices.
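The finiteness can be made concrete in the smallest case: full-rank sublattices of \({\mathbb {Z}}^{2}\) of index k are in bijection with Hermite-normal-form bases \(\left( {\begin{matrix}a&{}b\\ 0&{}c\end{matrix}}\right) \) with \(ac=k\) and \(0\le b<c\), so there are exactly \(\sum _{c\mid k}c\) of them. The following enumeration is our own illustration of this standard fact (with full-rank sublattices, i.e. \(d=n=2\), rather than the paper's \(d<n\) setting):

```python
# Count full-rank sublattices of Z^2 of index exactly k by enumerating
# Hermite normal form bases [[a, b], [0, c]] with a*c = k and 0 <= b < c.
def sublattice_count(k):
    return sum(1 for c in range(1, k + 1) if k % c == 0
                 for b in range(c))

def divisor_sum(k):
    # sigma(k): sum of the positive divisors of k
    return sum(c for c in range(1, k + 1) if k % c == 0)

for k in range(1, 20):
    assert sublattice_count(k) == divisor_sum(k)

# In particular, the set of sublattices with determinant (= index) at most
# a fixed bound T is finite; here T = 10.
total = sum(sublattice_count(k) for k in range(1, 11))
print(total)
```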
2.4 An asymptotic version of \(\mathrm {dist}(\Lambda )\)
We next consider a modification of \(\mathrm {dist}\left( \Lambda \right) \). Choose any full-row-rank matrix \(\varvec{A}\) such that \(\ker (\varvec{A})=\Lambda _{{\mathbb {R}}}\); the particular choice of \(\varvec{A}\) is not important. Let \(B_{2}^{n}\subset {\mathbb {R}}^{n}\) denote the n-dimensional Euclidean ball of radius 1.
Define the vector \(\varvec{w}=\varvec{w}\left( \Lambda _{{\mathbb {R}}}\right) \in {\mathbb {R}}^{n}\) so that, for each \(i\in [n]\),
Denote by \(\mu =\mu \left( \Lambda \right) \) the covering radius of \(B_{2}^{n}\) with respect to \(\Lambda \). That is,
For more information on the covering radius we refer to Sect. 3. If \(\sigma \) is a basis of \(\varvec{A}\) then define the following subsemigroup of \({\mathcal {S}}_{\sigma }\left( \Lambda \right) \):
The next proposition shows that if we further restrict \(\varvec{x}^{*}\) so that it can only lie in \({\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \), then we can guarantee that \({\mathcal {P}}(\varvec{A},\varvec{b})\) contains an integral point reasonably close to \(\varvec{x}^{*}\). We prove it in Sect. 5.
Proposition 1
For a basis \(\sigma \) of \(\varvec{A}\) and \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \), let \(\varvec{b}=\varvec{A}\varvec{x}^{*}\). Then \({\mathcal {P}}(\varvec{A},\varvec{b})\) contains a translate of the scaled ball \(\mu \cdot \left( B_{2}^{n}\cap \Lambda _{{\mathbb {R}}}\right) \), which in turn contains an integral vector.
Now set
where the maximum is taken over all bases \(\sigma \) of \(\varvec{A}\) and elements \(\varvec{x}^{*}\) of the semigroup \({\mathcal {S}}_{\sigma }^{*}\left( \Lambda \right) \).
2.5 Main result
We are now ready to state the main theorem.
Theorem 1
For \(T\gg 1\), let \(\Lambda \) be a sublattice of \({\mathbb {Z}}^{n}\) of dimension d and determinant at most T, chosen uniformly at random. Then for all \(t>1\),
What we would like to do is translate this statement into a statement about integer programs, and in particular derive inequality (2). For this we use a known result on the ratio between primitive sublattices and all sublattices with a fixed determinant upper bound, a consequence of Theorems 1 and 2 in [15]:
Lemma 1
Suppose there are exactly N(d, n, T) d-dimensional sublattices of \({\mathbb {Z}}^{n}\) with determinant at most T, of which exactly P(d, n, T) are primitive. Then
where \(\zeta (\cdot )\) denotes the Riemann zeta function.
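For intuition about the ratio of primitive to all sublattices, consider the simplest case \(d=1\), \(n=2\): a rank-one sublattice spanned by \((a,b)\) is primitive exactly when \(\gcd (a,b)=1\), and the density of coprime pairs tends to \(1/\zeta (2)=6/\pi ^{2}\). The following count is only an illustration of this flavor, not of Lemma 1's exact ratio:

```python
from math import gcd, pi

# Fraction of vectors (a, b) in {1,...,T}^2 with gcd(a, b) = 1; this tends
# to 1/zeta(2) = 6/pi^2 ~ 0.6079 as T grows.
T = 1000
coprime = sum(1 for a in range(1, T + 1) for b in range(1, T + 1)
              if gcd(a, b) == 1)
fraction = coprime / T**2
assert abs(fraction - 6 / pi**2) < 0.01
print(round(fraction, 4))
```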
Recall from the introduction our probability model. We start with a sufficiently large integer T relative to m and n, and consider the set of all \(m\times n\) integral matrices \(\varvec{A}\) such that the greatest common divisor of all maximal minors of \(\varvec{A}\) equals 1, and that \(\Delta \left( \varvec{A}\right) \le T\). The group of \(m\times m\) unimodular matrices acts on this set of matrices by multiplication on the left, and there are finitely many orbits of this action. We consider the uniform distribution on these orbits. We define
Note that this definition depends not on \(\varvec{A}\) but only on the orbit of \(\varvec{A}\). The greatest common divisor condition ensures that \(\Delta \left( \varvec{A}\right) \) equals the determinant of the lattice \(\ker \left( \varvec{A}\right) \cap {\mathbb {Z}}^{n}\). Recall we set \(d:=n-m\). We derive the next corollary by combining Theorem 1, Lemma 1, and the simple conditional probability inequality \({\mathbf {P}}(E\mid F)\le {\mathbf {P}}(E)/{\mathbf {P}}(F)\), where E is the event that \(\mathrm {dist}^{*}\left( \Lambda \right) >t\left( \Delta \left( \Lambda \right) \right) ^{1/d}\) and F is the event that \(\Lambda \) is primitive.
Corollary 1
For \(T\gg 1\), choose \(\varvec{A}\) randomly as above, with \(\Delta \left( \varvec{A}\right) \le T\). Then for all \(t>1\),
We remark that the question of deriving the constants in this bound remains unexplored.
3 Geometry of Numbers and a theorem of Schmidt
Next we state some basic functionals and tools from the Geometry of Numbers, as well as a theorem of Schmidt, which are fundamental for the proof of our results. An excellent reference for the Geometry of Numbers tools is Gruber’s book [12, Chapters 21–23]. We start with Minkowski’s successive minima. Given a d-dimensional lattice \(\Lambda \subset {\mathbb {R}}^{d}\), the ith successive minimum \(\lambda _{i}(\Lambda )\), \(i\in \{1,\dots ,d\}\), is defined as
$$\begin{aligned} \lambda _{i}(\Lambda )=\min \left\{ \lambda >0:\dim \left( \mathrm {span}\left( \lambda B_{2}^{d}\cap \Lambda \right) \right) \ge i\right\} . \end{aligned}$$
In other words, \(\lambda _{i}(\Lambda )\) is the smallest dilation factor \(\lambda \) such that the Euclidean ball of radius \(\lambda \) contains at least i linearly independent lattice points of \(\Lambda \). Observe that
$$\begin{aligned} \lambda _{1}(\Lambda )\le \lambda _{2}(\Lambda )\le \cdots \le \lambda _{d}(\Lambda ). \end{aligned}$$
Minkowski introduced these successive minima not only for a ball but for any convex body symmetric about the origin, but here we just need them for the ball. In this particular setting, Minkowski’s so-called second theorem on successive minima reads as follows:
$$\begin{aligned} \frac{2^{d}}{d!}\,\det \Lambda \le \lambda _{1}(\Lambda )\cdots \lambda _{d}(\Lambda )\,\omega _{d}\le 2^{d}\det \Lambda , \end{aligned}$$
(11)
where \(\omega _{d}\) is the d-dimensional volume of the ball \(B_{2}^{d}\). Inequality (11) is for \(d>1\) actually a strict inequality, and one can improve on the factor \(2^{d}\) on the right-hand side, but for our purposes it is enough to use (11). The other functional we need from the Geometry of Numbers is the already introduced covering radius \(\mu (\Lambda )\) (see (9)), which may also be defined as
$$\begin{aligned} \mu (\Lambda )=\min \left\{ \mu >0:\Lambda +\mu B_{2}^{d}={\mathbb {R}}^{d}\right\} . \end{aligned}$$
The so-called Jarnik inequalities show that the covering radius is essentially of the size of the last successive minimum:
$$\begin{aligned} \frac{1}{2}\,\lambda _{d}(\Lambda )\le \mu (\Lambda )\le \frac{1}{2}\sum _{i=1}^{d}\lambda _{i}(\Lambda )\le \frac{d}{2}\,\lambda _{d}(\Lambda ). \end{aligned}$$
(12)
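For a concrete feel for these quantities, take the rectangular lattice \({\mathbb {Z}}\times 5{\mathbb {Z}}\subset {\mathbb {R}}^{2}\): its successive minima are \(\lambda _{1}=1\) and \(\lambda _{2}=5\), its determinant is 5, and its covering radius is half the diagonal of the fundamental box, \(\mu =\sqrt{26}/2\approx 2.55\). A brute-force sketch (our own example; the closed form for \(\mu \) uses the rectangular structure, where the deepest hole is the centre of the box):

```python
from math import pi, sqrt

# Lattice Z x 5Z: points (x, 5y). Brute-force the first two successive
# minima by enumerating lattice vectors in a box.
B = 12  # enumeration range, large enough to see both minima
vectors = sorted(
    (sqrt(x * x + (5 * y) ** 2), (x, 5 * y))
    for x in range(-B, B + 1) for y in range(-B, B + 1)
    if (x, y) != (0, 0)
)
lam1 = vectors[0][0]
# lambda_2: length of a shortest vector linearly independent of a shortest one
vx, vy = vectors[0][1]
lam2 = next(r for r, (x, y) in vectors if x * vy != y * vx)
det, omega2 = 5.0, pi  # lattice determinant; area of the unit disc

assert (lam1, lam2) == (1.0, 5.0)
# Minkowski's second theorem, upper bound: lam1 * lam2 * omega2 <= 2^2 * det
assert lam1 * lam2 * omega2 <= 4 * det
# Covering radius of the rectangular lattice: half the box diagonal.
mu = sqrt(0.5 ** 2 + 2.5 ** 2)
# Jarnik-type bounds: lam2 / 2 <= mu <= (lam1 + lam2) / 2
assert lam2 / 2 <= mu <= (lam1 + lam2) / 2
```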
Now, in general the successive minima can take arbitrary values, even for sublattices of \({\mathbb {Z}}^{n}\). A fundamental result of Schmidt [15] states, however, that for a “typical” primitive sublattice of \({\mathbb {Z}}^{n}\) the ratios \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) are not “too” large. So one may expect that all the successive minima are more or less of the same size, which then allows us to give a “good” bound on \(\mu (\Lambda )\) via (11) and (12). But first we need a few more definitions in order to state Schmidt’s result.
We continue with our assumption that \(d=n-m\). Let \(\mathrm {Gr}\left( d,n\right) \) denote the set of d-dimensional subspaces of \({\mathbb {R}}^{n}\). Let \(\nu \) denote the unique O(n)-invariant probability measure on the real Grassmannian \(\mathrm {Gr}\left( d,n\right) \) (see, e.g., [3, Section 3.3]).
Definition 1
([15, p. 40]) A subset \(\xi \subset \mathrm {Gr}\left( d,n\right) \) is Jordan measurable if for all \(\varepsilon >0\) there exist continuous functions \(f_{1}\le {\mathbf {1}}_{\xi }\le f_{2}\) such that
Here \({\mathbf {1}}_{\xi }\) denotes the indicator function of \(\xi \).
In the next definition we define the set \(G\left( \varvec{a},\xi ,T\right) \) of lattices we are interested in: they are sublattices of \({\mathbb {Z}}^{n}\) of dimension d and determinant at most T, their span \(\Lambda _{{\mathbb {R}}}\) lies in a given subset \(\xi \subseteq \mathrm {Gr}\left( d,n\right) \), and the ratios \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) are at least as large as the ith entry of the given vector \(\varvec{a}\). More formally,
Definition 2
Let \(\varvec{a}=\left( a_{1},\ldots ,a_{d}\right) \in {\mathbb {R}}^{d}\), with each \(a_{i}\ge 1\). Let T be a positive integer, and let \(\xi \subset \mathrm {Gr}\left( d,n\right) \). Then we define \(G\left( \varvec{a},\xi ,T\right) \) to be the set of sublattices \(\Lambda \) of \({\mathbb {Z}}^{n}\) of dimension d with determinant at most T, such that
and \(\Lambda _{{\mathbb {R}}}\in \xi \).
The result of Schmidt that we intend to use is a combination of Theorems 3 and 5 in [15]:
Theorem 2
Assuming \(\xi \subset \mathrm {Gr}\left( d,n\right) \) is Jordan measurable, we have
where \(f\asymp g\) means \(f\ll g\) and \(g\ll f\).
Roughly speaking, the number of lattices having large successive-minima ratios is small. In order to formalize this, let G(d, n, T) denote the set of all sublattices of \({\mathbb {Z}}^{n}\) of dimension d with determinant at most T. Let \({\mathbf {P}}={\mathbf {P}}_{d,n,T}\) denote the uniform probability distribution over G(d, n, T).
Corollary 2
For \(t>1\), we have
Proof
Following Aliev and Henk [1], let
Applying the union bound to Theorem 2, this probability is at most
\(\square \)
Finally we present the already mentioned upper bound on \(\mu (\Lambda )\), provided we know that each ratio \(\lambda _{i+1}(\Lambda )/\lambda _{i}(\Lambda )\) is bounded. The argument is implicitly contained in the proof of Lemma 5.1 in [1].
Lemma 2
Let \(\Lambda \subset {\mathbb {R}}^{d}\) be a lattice, and let \(u>0\) such that for \(1\le i\le d-1\)
Then
Proof
For abbreviation we set \(r:=\left( u\omega _{d}^{1/d}/d\right) ^{\frac{2}{d-1}}\). Due to our assumption we get a lower bound on all successive minima \(\lambda _{i}(\Lambda )\), \(i=1,\dots ,d-1\), in terms of the last successive minimum
Combined with Minkowski’s inequality (11) we obtain
Hence,
and Jarnik’s inequality (12) yields the assertion. \(\square \)
4 Typical Cramer’s rule ratios
We see in the next section that the proximity can be bounded from above by an expression involving the largest absolute value of the entries of \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\), as \(\sigma \) ranges over all bases of \(\varvec{A}\), and \(\varvec{A}\) is chosen randomly. Hence, we would like to show that the largest absolute value of any entry of the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) is typically not too large, where for our purposes the subspace \(L:=\ker \varvec{A}\) is chosen uniformly at random from \(\mathrm {Gr}\left( d,n\right) \). Note that the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) depends only on L and \(\sigma \). We remark that the entries of the matrix \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) can be computed explicitly using Cramer’s rule: for \(i\in \sigma \) and \(j\notin \sigma \), we have
As before, we let \(\nu \) denote the O(n)-invariant probability measure on \(\mathrm {Gr}\left( d,n\right) \). The precise statement we show is the following: fix \(\sigma \subseteq \left[ n\right] \), \(i\in \sigma \), \(j\in \left[ n\right] \backslash \sigma \). Then, as a function of a parameter \(s>1\), we have
The proof proceeds in the three subsections below. First, we get a handle on \(\nu \) by relating it to another probability distribution, namely the Gaussian distribution \(\gamma \) on the matrix space \({\mathbb {R}}^{m\times n}\), where the entries are i.i.d. normally distributed with mean 0 and variance 1. This is done via the kernel map, which is introduced in Sect. 4.1 and related to \(\gamma \) in Sect. 4.2. Equation (14) is then derived in Sect. 4.3.
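The Cramer's rule description of the entries of \(\varvec{A}_{\sigma }^{-1}\varvec{A}\) can be verified in exact rational arithmetic on a toy instance (the matrix below is an arbitrary example of ours, not from the paper):

```python
from fractions import Fraction

def det2(M):
    # determinant of a 2x2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Arbitrary full-row-rank example with m = 2, n = 4, basis sigma = {0, 1}.
A = [[1, 2, 0, 3],
     [0, 1, 4, 1]]
sigma = (0, 1)
A_sigma = [[A[i][j] for j in sigma] for i in range(2)]

# Inverse of the 2x2 matrix A_sigma over the rationals.
d = Fraction(det2(A_sigma))
inv = [[A_sigma[1][1] / d, -A_sigma[0][1] / d],
       [-A_sigma[1][0] / d, A_sigma[0][0] / d]]

for j in (2, 3):          # columns outside sigma
    col = [A[0][j], A[1][j]]
    for i in (0, 1):      # positions within sigma
        # (A_sigma^{-1} A_j)_i computed directly ...
        lhs = inv[i][0] * col[0] + inv[i][1] * col[1]
        # ... and via Cramer's rule: replace column i of A_sigma by A_j.
        repl = [[col[r] if c == i else A_sigma[r][c] for c in range(2)]
                for r in range(2)]
        assert lhs == Fraction(det2(repl), det2(A_sigma))
print("Cramer check passed")
```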
4.1 The real Grassmannian
For a general introduction to matrix groups and Grassmannians, we refer the reader to [4]. There is a right action of the orthogonal group O(n) on \(\mathrm {Gr}\left( d,n\right) \) defined as follows: if \(\ker \left( \varvec{A}\right) \in \mathrm {Gr}\left( d,n\right) \), where \(\varvec{A}\in {\mathbb {R}}^{m\times n}\), then
This is well-defined, since if \(\ker \left( \varvec{A}\right) =\ker \left( \varvec{A}'\right) \) for some \(\varvec{A}'\in {\mathbb {R}}^{m\times n}\), then \(\varvec{A}=\varvec{D}\varvec{A}'\) for some invertible \(m\times m\) matrix \(\varvec{D}\), and hence
Let \(\mathrm {St}^{m\times n}:=\left\{ \varvec{A}\in {\mathbb {R}}^{m\times n}:\mathrm {rank}(\varvec{A})=m\right\} \). Call this the Stiefel manifold. Again, there is a right action of O(n) on \(\mathrm {St}^{m\times n}\) which in this case is simply right multiplication:
The only thing to check here is that \(\varvec{A}\varvec{U}\) indeed lies in \(\mathrm {St}^{m\times n}\), which holds since
thus \(\varvec{A}\) and \(\varvec{A}\varvec{U}\) have the same Gram matrix \(\varvec{A}\varvec{A}^{\top }\), and an \(m\times n\) matrix has full-row-rank if and only if its Gram matrix does.
The kernel map gives rise to a surjective map
Thus, we see from (15) that the following statement holds:
Proposition 2
The map \(\ker :\mathrm {St}^{m\times n}\rightarrow \mathrm {Gr}\left( d,n\right) \) is equivariant with respect to the right actions of O(n) on \(\mathrm {St}^{m\times n}\) and \(\mathrm {Gr}\left( d,n\right) \); that is, \(\left( \ker \left( \varvec{A}\right) \right) \cdot \varvec{U}=\ker \left( \varvec{A}\cdot \varvec{U}\right) \).
4.2 Probability spaces
Consider the probability space \(\left( {\mathbb {R}}^{m\times n},{\mathscr {B}}({\mathbb {R}}^{m\times n}),\gamma \right) \), where \({\mathscr {B}}({\mathbb {R}}^{m\times n})\) is the Borel \(\sigma \)-algebra and \(\gamma \) is the measure under which a random \(\varvec{A}\in {\mathbb {R}}^{m\times n}\) has iid N(0, 1) entries. In other words, \(\gamma \) is the standard Gaussian probability measure on the mn-dimensional real vector space \({\mathbb {R}}^{m\times n}\) with mean zero and identity covariance matrix. By restricting to \(\mathrm {St}^{m\times n}\), we get the probability space \(\left( \mathrm {St}^{m\times n},{\mathscr {B}}(\mathrm {St}^{m\times n}),\gamma \right) \). We can do this because \({\mathbb {R}}^{m\times n}\backslash \mathrm {St}^{m\times n}\) is an algebraic hypersurface in \({\mathbb {R}}^{m\times n}\), and therefore has measure zero with respect to \(\gamma \). Let \({\mathscr {B}}:={\mathscr {B}}(\mathrm {St}^{m\times n})\).
The Grassmannian \(\mathrm {Gr}\left( d,n\right) \) is endowed with the topology where \(E\subseteq \mathrm {Gr}\left( d,n\right) \) is open if and only if \(\ker ^{-1}(E)\) is open in \(\mathrm {St}^{m\times n}\). Let \({\mathscr {G}}\) denote the associated Borel \(\sigma \)-algebra. The measure \(\nu \) on \(\mathrm {Gr}\left( d,n\right) \) is characterized as follows:
Proposition 3
([13, Corollary 3.1.3]) The measure \(\nu \) is the unique measure on \(\mathrm {Gr}\left( d,n\right) \) satisfying
The map \(\ker :\mathrm {St}^{m\times n}\rightarrow \mathrm {Gr}\left( d,n\right) \) thus defines a map of probability spaces:
Proposition 4
The measure \(\nu \) is the pushforward measure of \(\gamma \) under this map. That is, \(\nu (E)=\gamma (\ker ^{-1}(E))\) for each \(E\in {\mathscr {G}}\).
Proof
We establish the conditions of (16). By surjectivity, and the fact that \(\gamma \) is a probability measure, we have
It therefore remains to show \(\gamma (\ker ^{-1}(E\cdot \varvec{U}))=\gamma (\ker ^{-1}(E))\) for each \(E\in {\mathscr {G}}\) and \(\varvec{U}\in O(n)\). By Proposition 2, we have
Now, \({\mathbb {R}}^{m\times n}\) has the inner product \(\left\langle \varvec{A},\varvec{B}\right\rangle =\mathrm {trace}\left( \varvec{A}\varvec{B}^{\top }\right) \). With respect to this inner product we may consider the subgroup \(O\left( m\times n\right) \) of \(\mathrm {GL}\left( {\mathbb {R}}^{m\times n}\right) \) which is given by
Observe that, for a fixed \(\varvec{U}\in O\left( n\right) \), the linear map \(\varphi _{\varvec{U}}\in \mathrm {GL}\left( {\mathbb {R}}^{m\times n}\right) \) given by
lies in \(O\left( m\times n\right) \), since
Now the probability measure \(\gamma \) on \({\mathbb {R}}^{m\times n}\) is defined so that the coordinates \(\varvec{A}_{i,j}\) of a randomly chosen \(\varvec{A}\in {\mathbb {R}}^{m\times n}\) are iid N(0, 1) normally distributed. In particular this measure is invariant under isometry, in that for all \({\mathcal {K}}\in {\mathscr {B}}\left( {\mathbb {R}}^{m\times n}\right) \) and \(\varphi \in O\left( m\times n\right) \), we have
The same is therefore true for the restricted probability measure \(\gamma \) on \(\mathrm {St}^{m\times n}\). It follows that if \(\varvec{U}\in O(n)\) and \(E\in {\mathscr {G}}\), then, using (17), (18), and (19), we have
\(\square \)
4.3 Cramer’s rule ratios
Let \(\sigma \subset [n]\) be of size m, and define
Note that \(\gamma \left( \mathrm {St}_{\sigma }^{m\times n}\right) =\nu \left( \mathrm {Gr}\left( d,n\right) _{\sigma }\right) =1\). Also define, for \(s>1\), \(i\in \sigma \), and \(j\notin \sigma \),
Proposition 5
The set \(\xi _{\sigma ,i,j}\left( s\right) \) is Jordan measurable.
Proof
Let \(\xi =\xi _{\sigma ,i,j}\left( s\right) \). We first argue that it suffices to show \(\nu \left( \partial \xi \right) =0\), where \(\partial \xi \) denotes the boundary of \(\xi \). There is a metric on \(\mathrm {Gr}\left( d,n\right) \), which we denote by \(\delta \), whose open balls form a basis for our topology of \(\mathrm {Gr}\left( d,n\right) \). These open balls are defined, for each \(\varepsilon >0\) and d-dimensional subspace V of \({\mathbb {R}}^{n}\), as
Let
Note that \(\partial \xi =\cap _{k\ge 1}\left( \partial \xi \right) _{1/k}\). We have, by the monotone convergence theorem,
In particular, if we now fix some \(\varepsilon >0\), there is some \(k\ge 1\) such that \(\nu \bigl (\left( \partial \xi \right) _{1/k}\bigr )<\varepsilon \). Observe \({\overline{\xi }}\cap \left( \partial \xi \right) _{1/k}^{c}\) and \(\overline{\xi ^{c}}\) are two disjoint closed sets, where \({\overline{X}},X^{c}\) denotes the closure and complement of X in \(\mathrm {Gr}\left( d,n\right) \), respectively. As \(\mathrm {Gr}\left( d,n\right) \) is a metric space it is therefore a normal space, and we may therefore apply Urysohn’s lemma [10, Lemma 4.15] to get a continuous function \(f_{1}:\mathrm {Gr}\left( d,n\right) \rightarrow \left[ 0,1\right] \) such that
Again applying Urysohn’s lemma, we also get a function \(f_{2}:\mathrm {Gr}\left( d,n\right) \rightarrow \left[ 0,1\right] \) such that
Note that by construction, \(f_{1}\le {\mathbf {1}}_{\xi }\le f_{2}\). Furthermore,
which establishes the condition of Definition 1.
To conclude the proof, it remains to show \(\nu \left( \partial \xi \right) =0\). One way to see this is that \(\ker ^{-1}\left( \overline{\partial \xi }\right) \) is the solution set in \(\mathrm {St}^{m\times n}\) to
where \(\varvec{X}\) denotes an \(m\times n\) matrix of variables. This is an algebraic hypersurface, hence by Proposition 4 we conclude
\(\square \)
Proposition 6
For \(s>1\) and \(\sigma ,i,j\) as above, we have
Proof
Let \(\varvec{A}\) be a random element of \(\mathrm {St}_{\sigma }^{m\times n}\), and let H denote the (random) hyperplane spanned by the columns of \(\varvec{A}_{\sigma \backslash \left\{ i\right\} }\), and let \(\ell \) denote the line perpendicular to H. Let \(\varvec{u}_{\ell }\) denote the unit normal vector to H whose first nonzero coordinate is positive. Thus,
Let \(\alpha \in \left\{ -1,+1\right\} \) denote the sign of the first nonzero entry of \({\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\). Then we can write
since for all \(k\in \sigma \backslash \left\{ i\right\} \) we have
and \(\alpha {\mathbf {e}}_{i}^{\top }\varvec{A}_{\sigma }^{-1}\) has first nonzero component positive by definition of \(\alpha \).
Now let k be any element of \(\left[ n\right] \) outside of \(\sigma \backslash \left\{ i\right\} \). Since \(\varvec{u}_{\ell }\) depends only on \(\varvec{A}_{\sigma \backslash \left\{ i\right\} }\), and the entries of \(\varvec{A}\) are mutually independent, we have that \(\varvec{u}_{\ell }\) and \(\varvec{A}_{k}\) are independent random vectors. Moreover, for any fixed unit vector \(\varvec{v}\in {\mathbb {S}}^{m-1}\), since \(\varvec{A}_{k}\) has iid N(0, 1) entries, the dot product \(\varvec{v}^{\top }\varvec{A}_{k}\) also has distribution N(0, 1). Thus, for any fixed \(t\in {\mathbb {R}}\), the random variable
(i.e. the conditional probability in terms of the \(\sigma \)-algebra generated by \(\ell \)) is in fact constant. Evaluating at the line \(\ell ={\mathbb {R}}{\mathbf {e}}_{1}\), for example, this constant is given by
This shows that the random quantity \(\varvec{u}_{\ell }^{\top }\varvec{A}_{k}\) has distribution N(0, 1). We have
The independence of \(\varvec{u}_{\ell }^{\top }\varvec{A}_{i}\) and \(\varvec{u}_{\ell }^{\top }\varvec{A}_{j}\) implies that \(\left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\) has the standard Cauchy distribution, that is, the distribution of the ratio of two iid N(0, 1) random variables. In particular, the cdf of \(\left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\) is given by
See [9, p. 50] for more on the Cauchy distribution. Using the series expansion
we get
Hence, using Proposition 4 and the fact \(s>1\), we conclude
\(\square \)
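The distributional claim in Proposition 6 lends itself to a numerical sanity check. The sketch below (an illustration only, not part of the proof; the dimensions m, n, the basis \(\sigma \), the indices i, j, and the trial count are arbitrary choices) samples Gaussian matrices and compares the empirical cdf of \(\left( \varvec{A}_{\sigma }^{-1}\varvec{A}_{j}\right) _{i}\) against the standard Cauchy cdf \(\tfrac{1}{2}+\tfrac{1}{\pi }\arctan t\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, trials = 3, 5, 20000          # small dimensions, chosen for illustration
sigma = [0, 1, 2]                   # column indices forming the basis A_sigma
i, j = 0, 3                         # i in sigma, j outside sigma

samples = np.empty(trials)
for t in range(trials):
    A = rng.standard_normal((m, n))               # iid N(0,1) entries
    samples[t] = np.linalg.solve(A[:, sigma], A[:, j])[i]

# Compare the empirical cdf with the standard Cauchy cdf F(t) = 1/2 + arctan(t)/pi
for t in (-1.0, 0.0, 1.0):
    empirical = float(np.mean(samples <= t))
    theoretical = 0.5 + float(np.arctan(t)) / np.pi
    print(f"F({t:+.0f}): empirical {empirical:.3f} vs Cauchy {theoretical:.3f}")
```

With 20000 trials the empirical and theoretical values agree to within a few thousandths, consistent with the ratio-of-Gaussians characterization used in the proof.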
5 Proof of the main result
In this final section we prove the main result of this paper, Theorem 1.
Definition 3
Define the constant
where \(\omega _{d}\) denotes the volume of the d-dimensional Euclidean ball of radius 1. This constant \({\tilde{\omega }}_{d}\) is of the order \(d^{-3/2}\).
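For concreteness, the unit-ball volume \(\omega _{d}\) entering this definition can be evaluated via the classical formula \(\omega _{d}=\pi ^{d/2}/\Gamma (d/2+1)\). A minimal computation (an illustration only; the displayed definition of \({\tilde{\omega }}_{d}\) is what is used in the sequel):

```python
from math import pi, gamma

def ball_volume(d: int) -> float:
    """Volume omega_d of the d-dimensional Euclidean unit ball."""
    return pi ** (d / 2) / gamma(d / 2 + 1)

print(ball_volume(2))  # pi: the area of the unit disc
print(ball_volume(3))  # 4*pi/3: the volume of the unit ball in R^3
```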
Definition 4
Assume \(\Lambda _{{\mathbb {R}}}=\ker \left( \varvec{A}\right) \). Given positive real numbers s and u, we say \(\Lambda \) is \(\left( \sigma ,s,u\right) \)-controlled if \(\sigma \) is a basis of \(\varvec{A}\) and:
1. The largest entry of \(\varvec{A}_{\sigma }^{-1}\varvec{A}_{{\bar{\sigma }}}\) is at most s, and
2. The successive minima ratios of \(\Lambda \) are not too large: we have
$$\begin{aligned} \frac{\lambda _{i+1}\left( \Lambda \right) }{\lambda _{i}\left( \Lambda \right) }<\left( {\tilde{\omega }}_{d}u\right) ^{2/(d-1)} \end{aligned}$$
for all \(i=1,2,\ldots ,d-1\).
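Condition 1 of this definition is straightforward to verify computationally. The sketch below is a hypothetical illustration: it interprets "largest entry" as largest absolute value, and it omits condition 2, since computing the successive minima \(\lambda _{i}(\Lambda )\) requires lattice algorithms beyond the scope of a short snippet.

```python
import numpy as np

def condition_one(A: np.ndarray, sigma: list[int], s: float) -> bool:
    """Check condition 1: every entry of A_sigma^{-1} A_sigmabar has
    absolute value at most s (interpreting 'largest entry' as magnitude)."""
    n = A.shape[1]
    sigma_bar = [k for k in range(n) if k not in sigma]
    M = np.linalg.solve(A[:, sigma], A[:, sigma_bar])
    return float(np.max(np.abs(M))) <= s

# Toy example: A_sigma is the identity, so A_sigma^{-1} A_sigmabar = (2, 3)^T
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])
print(condition_one(A, [0, 1], 3.0))  # → True, since the largest entry is 3
```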
Lemma 3
If \(\sigma \) is a basis of \(\varvec{A}\) and \(\Lambda \) is \(\left( \sigma ,s,u\right) \)-controlled, then for all \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \) we have
Proof
Let \(\varvec{b}=\varvec{A}\varvec{x}^{*}\), let \(B=B_{2}^{n}\cap \Lambda _{{\mathbb {R}}}\), and let \(\mu \) denote the covering radius of B with respect to \(\Lambda \). Define the vector \(\varvec{v}\in {\mathbb {R}}^{n}\) so that:
We show that the scaled, translated ball \(\mu B+\varvec{v}\) is contained in \({\mathcal {P}}(\varvec{A},\varvec{b})\). Since \(B\subseteq \Lambda _{{\mathbb {R}}}\), we have that each \(\varvec{x}\in \mu B+\varvec{v}\) satisfies \(\varvec{A}\varvec{x}=\varvec{b}\). For each \(j\in \left[ n\right] \), let \(\varvec{x}^{(j)}\) be the unique point in \(\mu B+\varvec{v}\) such that \(\varvec{x}_{j}^{(j)}\) is minimized. If \(j\in {\bar{\sigma }}\), then
If \(j\in \sigma \), then since \(\varvec{x}^{*}\in {\mathcal {S}}_{\sigma }\left( \Lambda \right) \) we have
This concludes the proof that \(\mu B+\varvec{v}\subseteq {\mathcal {P}}(\varvec{A},\varvec{b})\).
Let \({\mathbf {g}}\in \left( \Lambda _{{\mathbb {R}}}+\varvec{x}^{*}\right) \cap {\mathbb {Z}}^{n}\). Since \(\mu \) is the covering radius of B with respect to \(\Lambda \), there exists \(\varvec{z}^{*}\in \left( \Lambda +{\mathbf {g}}\right) \cap (\mu B+\varvec{v})\) such that
where we define \({\tilde{\varvec{w}}}:=(\varvec{v}-\varvec{x}^{*})/\mu \). That is, \({\tilde{\varvec{w}}}\) satisfies
Observe that
Using the fact \({\tilde{\varvec{w}}}\in [0,1]^{n}\), we therefore have
Thus we conclude
\(\square \)
Proof of Theorem 1
Let \(\Lambda \) be a uniformly chosen lattice from \(G\left( d,n,T\right) \). Let \(t>1\), and let \(s:=t^{2/3}/(2n^{3/2})\) and \(u:=t^{1/3}\), so that \(t=2n^{3/2}su\) as in Lemma 3. We have
where the sums are over all subsets \(\sigma \subseteq \left[ n\right] \) of size m. It therefore suffices to show, for each such \(\sigma \),
By definition, this probability is at most
By Theorem 2, we have
Hence, applying Corollary 2 and Proposition 6, for T sufficiently large, we may estimate up to constants the quantity (21) by
\(\square \)
References
Aliev, I., Henk, M.: Feasibility of integer knapsacks. SIAM J. Optim. 20(6), 2978–2993 (2010)
Aliev, I., Henk, M., Oertel, T.: Distances to lattice points in knapsack polyhedra. Math. Program. 182(1–2, Ser. A), 175–198 (2020)
Artstein-Avidan, S., Giannopoulos, A., Milman, V.D.: Asymptotic Geometric Analysis: Part I. Mathematical Surveys and Monographs, vol. 202. American Mathematical Society, Providence (2015)
Baker, A.: Matrix Groups: An Introduction to Lie Group Theory. Springer Undergraduate Mathematics Series, Springer, London (2003)
Borst, S., Dadush, D., Huiberts, S., Tiwari, S.: On the integrality gap of binary integer programs with gaussian data. In: Integer Programming and Combinatorial Optimization. Lecture Notes in Computer Science, vol. 12707, pp. 427–442. Springer, Cham (2021)
Celaya, M., Henk, M.: Proximity bounds for random integer programs. In: Integer Programming and Combinatorial Optimization. Lecture Notes in Computer Science, vol. 12707, pp. 413–426. Springer, Cham (2021)
Cook, W., Gerards, A.M.H., Schrijver, A., Tardos, É.: Sensitivity theorems in integer linear programming. Math. Program. 34(3), 251–264 (1986)
Eisenbrand, F., Weismantel, R.: Proximity results and faster algorithms for integer programming using the Steinitz lemma. ACM Trans. Algorithms 16(1), 1–14 (2019)
Feller, W.: An Introduction to Probability Theory and Its Applications. A Wiley Publication in Mathematical Statistics, vol. 1. Wiley, Hoboken (1968)
Folland, G.B.: Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, Wiley, Hoboken (1999)
Gomory, R.E.: On the relation between integer and noninteger solutions to linear programs. Proc. Natl. Acad. Sci. U.S.A. 53(2), 260 (1965)
Gruber, P.M.: Convex and Discrete Geometry. Fundamental Principles of Mathematical Sciences, vol. 336. Springer, Berlin (2007)
Krantz, S.G., Parks, H.R.: Geometric Integration Theory. Cornerstones. Birkhäuser, Boston (2008)
Oertel, T., Paat, J., Weismantel, R.: The distributions of functions related to parametric integer optimization. SIAM J. Appl. Algebra Geom. 4(3), 422–440 (2020)
Schmidt, W.M.: The distribution of sublattices of \({\mathbf{Z}}^m\). Monatshefte Math. 125(1), 37–81 (1998)
Acknowledgements
Marcel Celaya was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—The Berlin Mathematics Research Center MATH+(EXC-2046/1, Project ID: 390685689). The authors wish to thank the anonymous referees for their helpful comments and suggestions.
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich
Celaya, M., Henk, M. Proximity bounds for random integer programs. Math. Program. 197, 1201–1219 (2023). https://doi.org/10.1007/s10107-022-01786-8