Proximity bounds for random integer programs

We study proximity bounds within a natural model of random integer programs of the type max c⊤x : Ax = b, x ∈ Z^n_{≥0}, where A ∈ Z^{m×n} is of rank m, b ∈ Z^m and c ∈ Z^n. In particular, we seek bounds for proximity in terms of the parameter Δ(A), which is the square root of the determinant of the Gram matrix AA⊤ of A. We prove that, up to constants depending on n and m, the proximity is "generally" bounded by Δ(A)^{1/(n−m)}, which is significantly better than the best deterministic bounds, which are, again up to dimension constants, linear in Δ(A).


Introduction
Given an integer program of the form

max c⊤x : Ax = b, x ∈ Z^n_{≥0}, (1)

where A is a full-row-rank m × n integral matrix, b ∈ Z^m, and c ∈ Z^n, we seek to understand how far an optimal vertex x* of the feasible region of the linear relaxation can be from a nearby feasible integer solution z*, assuming the feasible region contains at least one integral point. Typically it is further required that z* itself be optimal; we do not impose this requirement in this manuscript. We refer to the smallest possible distance between x* and a feasible integral solution z* as the proximity of (1). This distance is measured in some given norm, for example the ℓ_1 or ℓ_∞ norm; in this paper we state our results in terms of the Euclidean norm ‖·‖_2. Bounds for proximity are typically given in terms of the largest possible absolute value Δ_m(A) of any m × m subdeterminant of A. Note that this parameter is within a factor of $\binom{n}{m}$ of Δ(A) := √det(AA⊤). Finding such bounds is a well-studied problem which goes back to the classic result of Cook et al. [7] bounding the proximity of the dual of (1). See, for instance, the recent works of Eisenbrand and Weismantel [8] and of Aliev et al. [2], and the references therein.
In this manuscript, we would like to understand the worst-possible proximity, which we denote by dist(A), over all choices of b and c, when the matrix A is chosen randomly. The model of randomness we consider is the following: we choose the matrix A up to left-multiplication by unimodular matrices, uniformly at random subject to the conditions that the greatest common divisor of the maximal minors of A is 1 and that Δ(A) is at most some sufficiently large (with respect to m and n) integer T. This is a natural model to study from a geometric point of view, since Δ(A) is the determinant of the lattice of integer points in the kernel of A. This is also the model considered by Aliev and Henk [1] in their investigation of diagonal Frobenius numbers.
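These parameters are easy to experiment with. The following sketch (an illustration we add here, not part of the paper) verifies on a small hypothetical matrix that Δ(A)² = det(AA⊤) equals both the sum of squared maximal minors (by the Cauchy–Binet formula) and the squared determinant of a basis of the kernel lattice ker(A) ∩ Z^n, and that the gcd condition of the random model holds.

```python
# Sanity check (not from the paper): for an integral matrix A with
# coprime maximal minors, Delta(A)^2 = det(A A^T) equals both the sum of
# squared m x m minors (Cauchy-Binet) and the Gram determinant of a
# basis of the lattice ker(A) ∩ Z^n.
from itertools import combinations
from math import gcd

A = [[1, 0, 2, 3],
     [0, 1, 4, 5]]          # hypothetical example with m = 2, n = 4

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def col_submatrix(A, cols):
    return [[row[j] for j in cols] for row in A]

# Delta(A)^2 via the Gram matrix A A^T.
AAT = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]
delta_sq = det2(AAT)

# Cauchy-Binet: sum of squared maximal minors.
minors = [det2(col_submatrix(A, c)) for c in combinations(range(4), 2)]
assert delta_sq == sum(d * d for d in minors)

# The gcd-1 condition from the random model holds for this A.
g = 0
for d in minors:
    g = gcd(g, abs(d))
assert g == 1

# Basis of ker(A) ∩ Z^4: x1 = -2*x3 - 3*x4, x2 = -4*x3 - 5*x4.
v1, v2 = [-2, -4, 1, 0], [-3, -5, 0, 1]
gram = [[sum(a * b for a, b in zip(u, v)) for v in (v1, v2)]
        for u in (v1, v2)]
assert det2(gram) == delta_sq   # determinant of the kernel lattice = Delta(A)

print(delta_sq)  # 59
```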
Our main result concerns not dist(A) but rather a related random variable we denote by dist*(A). This is an asymptotic version of dist(A) that further imposes some mild restrictions on b. Our main result is that it satisfies the following Markov-type inequality:

P( dist*(A) > t · Δ(A)^{1/(n−m)} ) ≲ t^{−2/3}. (2)

Here ≲ means less than, up to constants which only depend on n and m. In particular, this shows that proximity generally depends only on Δ(A)^{1/(n−m)} in our random setting, for "almost all" choices of b in a certain precise sense. This is significantly better than the linear dependency on Δ(A) in the deterministic case, which is known to be tight [2, Theorem 1].

Related work
A similar result, with a slightly different random model, was obtained in [2] in the so-called knapsack scenario, where m = 1. In that work, a fixed integer T is given, and the matrix A is a row vector chosen uniformly at random from {1, 2, …, T}^n such that the greatest common divisor of its entries equals 1. A special case of [2, Theorem 2] gives a corresponding bound for dist∞(A), where dist∞(A) measures distance using the ℓ∞ norm. The recent work of Oertel et al. [14] considers a random model that allows b to vary but keeps A fixed. More precisely, for a given positive integer T, the vector b is chosen uniformly at random from {−T, …, T}^m such that Ax = b, x ≥ 0 is integer-feasible. The result in [14, Corollary 1.3] bounds dist∞(A) with probability approaching 1 as T → ∞, where again dist∞(A) measures distance using the ℓ∞ norm. Note that this bound does not depend on n. Finally, we mention the very recent work of Borst et al. [5], which investigates the integrality gap of integer programs of the form max c⊤x : Ax = b, x ∈ Z^n_{≥0}, (3) with A and c having independent Gaussian N(0, 1) entries. This quantity measures the difference between the optimal value of (3) and that of its linear relaxation. Their result is that the integrality gap is bounded from above by poly(m)(log n)²/n with probability at least 1 − n^{−7} − 2^{−poly(m)}, subject to certain conditions on b. See [5] and the references therein for a history of this problem.

Outline of proof
The proof of our result combines ideas of [1, 2] using facts from the geometry of numbers, some results of Schmidt [15] on random sublattices of Z^n of fixed dimension, and computations of the measure of certain distinguished regions of the real Grassmannian Gr(d, n) of d-dimensional subspaces of R^n, where d = n − m. The crucial parameters from the geometry of numbers that we need are the covering radius μ and the successive minima λ_1, …, λ_d of ker A ∩ B^n_2 with respect to the lattice ker A ∩ Z^n, where B^n_2 denotes the unit-radius Euclidean ball in R^n. Further details on these parameters can be found in Sect. 3.
The restrictions imposed by the definition of dist*(A) on the right-hand side b ensure that, given a vertex x* of the feasible region of (1), one can always find a nearby feasible integral solution z*, where x* has support contained in σ ⊆ [n] and A_σ denotes the square submatrix of A whose columns are indexed by σ. This restriction on b amounts to picking b sufficiently deep inside the cone spanned by the columns of A_σ, or choosing b from a reduced cone in the sense of Gomory [11, p. 261]. A uniform upper bound on all ratios λ_{i+1}/λ_i, i = 1, 2, …, d − 1, implies an upper bound on μ; see Lemma 2. Meanwhile, Sect. 4 shows that the measure in Gr(d, n) of those subspaces ker A ∈ Gr(d, n) for which any given entry of A_σ^{−1}A exceeds some fixed parameter s > 0 in absolute value is of order s^{−1}. Theorem 2, itself a straightforward corollary of results of [15], combines these two pieces: a random lattice of the form ker A ∩ Z^n is unlikely to have any ratio λ_{i+1}/λ_i, or any entry of A_σ^{−1}A, exceedingly large. The details are carried out in Sect. 5.
We remark that the exponent of −2/3 is mainly an artifact of the proof, and we expect that it can be improved. The problem of finding an inequality analogous to (2) for dist(A) is more challenging and remains open: when we allow b to lie close to the boundary of the cone spanned by the columns of A_σ, our arguments no longer apply.

Remark 1 (Changes from proceedings version)
The following changes have been made since the proceedings version of this manuscript [6]. In Sect. 3 we clarified and expanded upon the geometry of numbers theory that is used in this paper. In Sect. 4 we gave a proof of the claim that a particular subset of Gr(d, n) is Jordan measurable. Some minor typos have also been corrected, and some minor changes have been made to the introduction.

Notation
Throughout this manuscript we assume fixed positive integers d, m, n such that n = m + d. For a subset σ ⊆ [n] and x ∈ R^n, we let x_σ denote the vector obtained by orthogonally projecting x onto the coordinates indexed by σ. Similarly, if A is a matrix, then we denote by A_σ the submatrix of A whose columns are those indexed by σ. In particular, if k ∈ [n] then A_k denotes the corresponding column of A. If A_σ is an invertible square matrix we say σ is a basis of A. We denote the complement of σ by σ̄ := [n] \ σ. Given a d-dimensional subspace L ⊆ R^n, the m-dimensional orthogonal complement of L is denoted by L^⊥. If Λ ⊂ R^n is a lattice, let RΛ denote the linear subspace of R^n spanned by Λ. We say σ ⊆ [n] is a coordinate basis of Λ or RΛ if the coordinate projection map x ↦ x_σ is an isomorphism on RΛ. This is equivalent to saying that σ is a basis of A for any full-row-rank matrix A such that ker(A) = RΛ. Finally, we denote the group of n × n orthogonal real matrices by O(n). This notation presents a conflict with "big-O" asymptotic notation; which of the two is meant will always be clear from context.

Definition of dist( A)
Let A ∈ Z^{m×n} be a full-row-rank matrix. For a basis σ of A, we define the semigroup S_σ(A) of vectors supported on σ whose image b := Ax is integral and integer-feasible (4). For a vector b ∈ Z^m, we define the polyhedron

P(A, b) := { x ∈ R^n_{≥0} : Ax = b }.

The idea behind these definitions is that if x* ∈ S_σ(A), then b := Ax* is an integral vector, P(A, b) is a polyhedron containing at least one integral point, and x* is the vertex of P(A, b) associated to the basis σ. Now, given a basis σ of A and x* ∈ S_σ(A), set

dist(A, x*) := min{ ‖x* − z*‖_2 : z* ∈ P(A, b) ∩ Z^n }, where b := Ax*.

We then define the worst-case distance over all choices of bases σ of A and elements x* ∈ S_σ(A) as

dist(A) := max_{σ, x*} dist(A, x*). (5)

This definition has the disadvantage that it is stated in terms of the matrix A. Since we may replace Ax = b with UAx = Ub for any unimodular m × m integral matrix U, it is not so clear from this formulation how to define our random model. This motivates an alternative, more geometric definition of dist(A), which we now state.
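To make these definitions concrete, here is a toy instance (hypothetical, not from the paper): for the single constraint 3x₁ + 5x₂ = 13 with x ≥ 0 (so m = 1, n = 2), we brute-force the distance from the vertex supported on σ = {1} to the nearest feasible integer point.

```python
# Toy illustration (not from the paper): proximity for the knapsack
# constraint 3*x1 + 5*x2 = 13, x >= 0.
# The vertex supported on sigma = {1} is x* = (13/3, 0); we search for
# the nearest feasible integer point z* by brute force.
from math import sqrt

b = 13
x_star = (b / 3, 0.0)                      # vertex with support {1}

feasible = [(z1, z2)
            for z1 in range(b + 1)
            for z2 in range(b + 1)
            if 3 * z1 + 5 * z2 == b]       # P(A, b) ∩ Z^2

assert feasible == [(1, 2)]                # unique feasible integer point

z_star = min(feasible,
             key=lambda z: (z[0] - x_star[0])**2 + (z[1] - x_star[1])**2)
proximity = sqrt((z_star[0] - x_star[0])**2 + (z_star[1] - x_star[1])**2)
print(z_star, round(proximity, 3))         # (1, 2) 3.887
```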

Definition of dist(Λ)
Suppose instead we start with a d-dimensional sublattice Λ of Z^n, and suppose σ is a coordinate basis of Λ. Then we may define the semigroup S_σ(Λ) analogously (6). For x* ∈ S_σ(Λ), the distance dist(Λ, x*) is defined with an extra maximum over integral points g ∈ (RΛ + x*) ∩ Z^n (7); the extra maximum accounts for the fact that, if Λ is not primitive, then there are multiple ways to embed Λ into RΛ + x* as an integral translate of Λ. Finally, define the worst-case distance

dist(Λ) := max_{σ, x*} dist(Λ, x*), (8)

where the maximum is taken over all coordinate bases σ of Λ and elements x* ∈ S_σ(Λ). We now explain the relationship between definitions (5) and (8). First note that if A is any integral matrix such that RΛ = ker(A), then the two definitions (4) and (6) of S_σ(A) and S_σ(Λ) coincide. Moreover, if Λ is a primitive lattice, that is, if Λ = RΛ ∩ Z^n, then the two definitions (5) and (8) coincide as well. Definition (8) also makes sense when Λ is non-primitive, however, and it is immediate from the definitions that, in general, dist(A) ≤ dist(Λ). The key advantage of definition (8) is that there are only finitely many d-dimensional sublattices of Z^n whose determinant is at most some fixed positive integer T. Thus, we may consider the uniform distribution over these bounded-determinant lattices.

An asymptotic version of dist(Λ)
We next consider a modification of dist(Λ). Choose any full-row-rank integral matrix A such that ker(A) = RΛ; the particular choice of A is not important. Let B^n_2 ⊂ R^n denote the n-dimensional Euclidean ball of radius 1.
Define the vector w = w(RΛ) ∈ R^n so that, for each i ∈ [n]:

Denote by μ = μ(Λ) the covering radius of B^n_2 ∩ RΛ with respect to Λ; that is,

μ(Λ) := min{ μ > 0 : Λ + μ(B^n_2 ∩ RΛ) = RΛ }. (9)

For more information on the covering radius we refer to Sect. 3. If σ is a basis of A, then define the following subsemigroup S*_σ(Λ) of S_σ(Λ): (10)

The next proposition, which we prove in Sect. 5, shows that if we further restrict x* to lie in S*_σ(Λ), then P(A, b) is guaranteed to contain an integral point reasonably close to x*.

Proposition 1 Let σ be a basis of A, let x* ∈ S*_σ(Λ), and set b := Ax*. Then P(A, b) contains a translate of the scaled ball μ · (B^n_2 ∩ RΛ), which in turn contains an integral vector.

Accordingly, we define

dist*(Λ) := max_{σ, x*} dist(Λ, x*),

where the maximum is taken over all bases σ of A and elements x* of the semigroup S*_σ(Λ).

Main result
We are now ready to state the main theorem.
Theorem 1 For T ≫ 1, let Λ be a sublattice of Z^n of dimension d and determinant at most T, chosen uniformly at random. Then for all t > 1,

P( dist*(Λ) > t · det(Λ)^{1/d} ) ≲ t^{−2/3}.

What we would like to do is translate this statement into a statement about integer programs, and in particular derive inequality (2). For this we use a known result (Lemma 1) on the ratio between the number of primitive sublattices and the number of all sublattices with a fixed determinant upper bound, a consequence of Theorems 1 and 2 in [15], in which ζ(·) denotes the Riemann zeta function.
Recall from the introduction our probability model. We start with an integer T that is sufficiently large relative to m and n, and consider the set of all m × n integral matrices A such that the greatest common divisor of all maximal minors of A equals 1 and Δ(A) ≤ T. The group of m × m unimodular matrices acts on this set of matrices by multiplication on the left, and there are finitely many orbits of this action. We consider the uniform distribution on these orbits, and we define

dist*(A) := dist*(ker(A) ∩ Z^n).

Note that this definition depends not on A but only on the orbit of A. The greatest common divisor condition ensures that Δ(A) equals the determinant of the lattice ker(A) ∩ Z^n. Recall we set d := n − m. We derive the next corollary by combining Theorem 1, Lemma 1, and the simple conditional probability inequality P(E | F) ≤ P(E)/P(F), where E is the event that dist*(Λ) > t · det(Λ)^{1/d} and F is the event that Λ is primitive.

Corollary 1 For T ≫ 1, choose A randomly as above, with determinant at most T. Then for all t > 1,

P( dist*(A) > t · Δ(A)^{1/d} ) ≲ t^{−2/3}.

Geometry of Numbers and a theorem of Schmidt
Next we state some basic functionals and tools from the geometry of numbers, as well as a theorem of Schmidt, which are fundamental for the proof of our results. An excellent reference for the geometry of numbers is Gruber's book [12]. We start with Minkowski's successive minima. Given a d-dimensional lattice Λ ⊂ R^d, the ith successive minimum λ_i(Λ), i ∈ {1, …, d}, is defined as

λ_i(Λ) := min{ λ > 0 : dim span(λ B^d_2 ∩ Λ) ≥ i }.

In other words, λ_i(Λ) is the smallest dilation factor λ such that the Euclidean ball of radius λ contains at least i linearly independent lattice points of Λ. Observe that λ_1(Λ) ≤ λ_2(Λ) ≤ ⋯ ≤ λ_d(Λ). Minkowski introduced these successive minima not only for a ball but for any convex body symmetric about the origin, but here we only need them for the ball. In this particular setting, Minkowski's so-called second theorem on successive minima reads as follows:

(2^d / d!) det(Λ) ≤ λ_1(Λ) ⋯ λ_d(Λ) ω_d ≤ 2^d det(Λ), (11)

where ω_d is the d-dimensional volume of the ball B^d_2. For d > 1 the right inequality of (11) is actually strict, and one can improve on the factor 2^d on the right-hand side, but for our purposes it is enough to use (11). The other functional we need from the geometry of numbers is the already introduced covering radius μ(Λ) (see (9)), which may also be defined as

μ(Λ) = max_{x ∈ R^d} min_{z ∈ Λ} ‖x − z‖_2.

The so-called Jarník inequalities show that the covering radius is essentially of the size of the last successive minimum:

(1/2) λ_d(Λ) ≤ μ(Λ) ≤ (1/2)(λ_1(Λ) + ⋯ + λ_d(Λ)) ≤ (d/2) λ_d(Λ). (12)

Now, in general the successive minima can take arbitrary values, even for sublattices of Z^d. A fundamental result of Schmidt [15] states, however, that for a "typical" primitive sublattice Λ of Z^n the ratios λ_{i+1}(Λ)/λ_i(Λ) are not "too" large. So one may expect that all the successive minima are more or less of the same size, which then allows us to give a "good" bound on μ(Λ) via (11) and (12). But first we need a few more definitions in order to state Schmidt's result.
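As a quick illustration (ours, not the paper's), one can check Minkowski's second theorem (11) and Jarník's inequalities (12) on a rectangular planar lattice, where all the quantities involved are known in closed form.

```python
# Numerical sanity check (not from the paper) of Minkowski's second
# theorem (11) and Jarnik's inequalities (12) for the rectangular planar
# lattice with basis (1, 0), (0, N): here lambda_1 = 1, lambda_2 = N,
# det = N, and the covering radius mu = sqrt(1 + N^2)/2 (half the
# diagonal of the fundamental rectangle).
from math import pi, sqrt, factorial

d, N = 2, 5
lam = [1.0, float(N)]
det_lattice = float(N)
mu = sqrt(1 + N * N) / 2
omega_d = pi                      # area of the unit disk B_2^2

prod = lam[0] * lam[1] * omega_d
assert (2**d / factorial(d)) * det_lattice <= prod <= 2**d * det_lattice  # (11)
assert lam[1] / 2 <= mu <= (lam[0] + lam[1]) / 2                          # (12)
print(round(mu, 3))               # 2.55
```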
We continue with our assumption that d = n − m. Let Gr (d, n) denote the set of d-dimensional subspaces of R n . Let ν denote the unique O(n)-invariant probability measure on the real Grassmannian Gr (d, n) (see, e.g., [3, Section 3.3]).
In the next definition we describe the set G(a, ξ, T) of lattices we are interested in: they are sublattices of Z^n of determinant at most T, their span RΛ is contained in a given subset ξ ⊆ Gr(d, n), and the ratios λ_{i+1}(Λ)/λ_i(Λ) are at least as large as the ith entry of a given vector a. More formally, let a = (a_1, …, a_d) ∈ R^d with each a_i ≥ 1, let T be a positive integer, and let ξ ⊆ Gr(d, n). Then we define G(a, ξ, T) to be the set of sublattices Λ of Z^n of dimension d with determinant at most T such that RΛ ∈ ξ and the successive-minima ratios of Λ are bounded below entrywise by a.
The result of Schmidt that we intend to use is a combination of Theorems 3 and 5 in [15]; here f ≍ g means f ≲ g and g ≲ f.
Roughly speaking, the proportion of lattices having large successive-minima ratios is small. In order to formalize this, let G(d, n, T) denote the set of all sublattices of Z^n of dimension d with determinant at most T, and let P = P_{d,n,T} denote the uniform probability distribution over G(d, n, T).
Proof Following Aliev and Henk [1], applying the union bound to Theorem 2 shows that this probability is at most the claimed bound. □

Finally, we present the already mentioned upper bound on μ(Λ), provided we know that each ratio λ_{i+1}(Λ)/λ_i(Λ) is bounded. The argument is implicitly contained in the proof of Lemma 5.1 in [1].
Proof Due to our assumption we get a lower bound on all successive minima λ_i(Λ), i = 1, …, d − 1, in terms of the last successive minimum λ_d(Λ). Combined with Minkowski's inequality (11), this yields an upper bound on λ_d(Λ), and Jarník's inequality (12) then gives the assertion. □
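The chain of estimates behind this lemma can be sketched as follows; this is our reconstruction of the argument from [1], under the assumption that every ratio λ_{i+1}(Λ)/λ_i(Λ) is at most u, with no attempt to optimize constants.

```latex
% Reconstruction (not verbatim from [1]): assume
% \lambda_{i+1}(\Lambda)/\lambda_i(\Lambda) \le u for i = 1,\dots,d-1.
% Then \lambda_i(\Lambda) \ge u^{-(d-i)} \lambda_d(\Lambda), so by
% Minkowski's second theorem (11),
\[
  u^{-\binom{d}{2}} \lambda_d(\Lambda)^d \,\omega_d
  \;\le\; \prod_{i=1}^{d} \lambda_i(\Lambda)\,\omega_d
  \;\le\; 2^d \det(\Lambda),
\]
% hence
\[
  \lambda_d(\Lambda) \;\le\; 2\, u^{(d-1)/2}
     \left(\frac{\det(\Lambda)}{\omega_d}\right)^{1/d},
\]
% and Jarnik's inequality (12) gives
\[
  \mu(\Lambda) \;\le\; \tfrac{d}{2}\,\lambda_d(\Lambda)
  \;\le\; d\, u^{(d-1)/2} \left(\frac{\det(\Lambda)}{\omega_d}\right)^{1/d}.
\]
```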

Typical Cramer's rule ratios
We will see in the next section that the proximity can be bounded from above by an expression involving the largest absolute value of the entries of A_σ^{−1}A_σ̄, as σ ranges over all bases of A, where A is chosen randomly. Hence, we would like to show that the largest absolute value of any entry of the matrix A_σ^{−1}A_σ̄ is typically not too large, where for our purposes the subspace L := ker A is chosen uniformly at random from Gr(d, n). Note that the matrix A_σ^{−1}A_σ̄ depends only on L and σ. We remark that the entries of the matrix A_σ^{−1}A_σ̄ can be computed explicitly using Cramer's rule: for i ∈ σ and j ∉ σ, we have

(A_σ^{−1}A_j)_i = det(A_{σ−i+j}) / det(A_σ),

where σ − i + j denotes σ with the column A_i replaced in place by A_j. As before, we let ν : 𝒢 → [0, 1] denote the O(n)-invariant probability measure on Gr(d, n). The precise statement we show is the following: fix σ ⊆ [n], i ∈ σ, and j ∈ [n] \ σ. Then, as a function of a parameter s > 1, we have

ν( ξ_{σ,i,j}(s) ) ≲ s^{−1}, where ξ_{σ,i,j}(s) := { ker(A) ∈ Gr(d, n) : |(A_σ^{−1}A_j)_i| ≥ s }. (14)

The proof proceeds in the three subsections below. First, we get a handle on ν by relating it to another probability distribution, namely the Gaussian distribution γ on the matrix space R^{m×n}, whose entries are i.i.d. normally distributed with mean 0 and variance 1. This is done via the kernel map, which is introduced in Sect. 4.1 and related to γ in Sect. 4.2. Inequality (14) is then derived in Sect. 4.3.
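The Cauchy behavior underlying this estimate is easy to observe empirically. The following Monte Carlo sketch (an illustration we add, not part of the paper's argument) samples a Gaussian 2 × 2 basis A_σ and an extra Gaussian column A_j, and compares the empirical tail of (A_σ^{−1}A_j)_1 with the standard Cauchy tail (2/π)·arctan(1/s).

```python
# Monte Carlo illustration (not from the paper): for Gaussian A, an entry
# of A_sigma^{-1} A_j is standard Cauchy, so its tail P(|X| > s) should
# match (2/pi) * arctan(1/s).
import random
from math import atan, pi

random.seed(0)
N, s = 20000, 2.0
hits = 0
for _ in range(N):
    a11, a21 = random.gauss(0, 1), random.gauss(0, 1)   # column A_i
    a12, a22 = random.gauss(0, 1), random.gauss(0, 1)   # other column of A_sigma
    b1, b2 = random.gauss(0, 1), random.gauss(0, 1)     # extra column A_j
    # Cramer's rule: first entry of A_sigma^{-1} A_j.
    x = (b1 * a22 - b2 * a12) / (a11 * a22 - a21 * a12)
    if abs(x) > s:
        hits += 1

empirical = hits / N
cauchy_tail = (2 / pi) * atan(1 / s)      # ≈ 0.2952 for s = 2
assert abs(empirical - cauchy_tail) < 0.02
print(round(cauchy_tail, 4))
```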

The real Grassmannian
For a general introduction to matrix groups and Grassmannians, we refer the reader to [4]. There is a right action of the orthogonal group O(n) on Gr(d, n) defined as follows: if ker(A) ∈ Gr(d, n), where A ∈ R^{m×n}, then

ker(A) · U := ker(AU), for U ∈ O(n).

Probability spaces
Consider the probability space (R^{m×n}, B(R^{m×n}), γ), where B(R^{m×n}) is the Borel σ-algebra and the measure γ is defined so that each A ∈ R^{m×n} has i.i.d. N(0, 1) entries. In other words, γ is the standard Gaussian probability measure on the mn-dimensional real vector space R^{m×n} with mean zero and identity covariance matrix. By restricting to St^{m×n}, the set of full-row-rank matrices in R^{m×n}, we get the probability space (St^{m×n}, B(St^{m×n}), γ). We can do this because R^{m×n} \ St^{m×n} is contained in an algebraic hypersurface in R^{m×n}, and therefore has measure zero with respect to γ. Finally, let (Gr(d, n), 𝒢, ν) denote the probability space on the Grassmannian.

Proposition 4
The measure ν is the pushforward measure of γ under this map; that is, ν(E) = γ(ker^{−1}(E)) for each E ∈ 𝒢.

Proof We establish the conditions of (16). By surjectivity, and the fact that γ is a probability measure, we have γ(ker^{−1}(Gr(d, n))) = γ(St^{m×n}) = 1. It therefore remains to show that γ(ker^{−1}(E · U)) = γ(ker^{−1}(E)) for each E ∈ 𝒢 and U ∈ O(n). By Proposition 2, we have

ker^{−1}(E · U) = { AU : A ∈ ker^{−1}(E) }. (17)

Now, R^{m×n} carries the inner product ⟨A, B⟩ = trace(AB⊤). With respect to this inner product we may consider the subgroup O(m × n) of GL(R^{m×n}) consisting of the linear maps preserving ⟨·, ·⟩. Observe that, for a fixed U ∈ O(n), the linear map φ_U ∈ GL(R^{m×n}) given by φ_U(A) := AU belongs to O(m × n). (18)

Now the probability measure γ on R^{m×n} is defined so that the coordinates A_{i,j} of a randomly chosen A ∈ R^{m×n} are i.i.d. N(0, 1) normally distributed. In particular this measure is invariant under isometry, in that for all K ∈ B(R^{m×n}) and φ ∈ O(m × n), we have γ(φ(K)) = γ(K). (19)

The same is therefore true for the restricted probability measure γ on St^{m×n}. It follows that if U ∈ O(n) and E ∈ 𝒢, then, using (17), (18), and (19), we have γ(ker^{−1}(E · U)) = γ(ker^{−1}(E)). □

Proposition 5 The set ξ σ,i, j (s) is Jordan measurable.
Proof Let ξ = ξ_{σ,i,j}(s). We first argue that it suffices to show ν(∂ξ) = 0, where ∂ξ denotes the boundary of ξ. There is a metric δ on Gr(d, n) whose open balls, defined for each ε > 0 and d-dimensional subspace V of R^n, form a basis for the topology of Gr(d, n). For a subset X ⊆ Gr(d, n) and ε > 0, write X_ε for the open ε-neighborhood of X with respect to δ. Note that ∂ξ = ∩_{k≥1} (∂ξ)_{1/k}. We have, by the monotone convergence theorem, ν((∂ξ)_{1/k}) → ν(∂ξ) as k → ∞. In particular, if we now fix some ε > 0, there is some k ≥ 1 such that ν((∂ξ)_{1/k}) < ε.

Observe that cl(ξ) ∩ ((∂ξ)_{1/k})^c and cl(ξ^c) are two disjoint closed sets, where cl(X) and X^c denote the closure and complement of X in Gr(d, n), respectively. As Gr(d, n) is a metric space, it is a normal space, and we may therefore apply Urysohn's lemma [10, Lemma 4.15] to get a continuous function f_1 : Gr(d, n) → [0, 1] which equals 1 on cl(ξ) ∩ ((∂ξ)_{1/k})^c and 0 on cl(ξ^c). Again applying Urysohn's lemma, we also get a continuous function f_2 : Gr(d, n) → [0, 1] which equals 1 on cl(ξ) and 0 on cl(ξ^c) ∩ ((∂ξ)_{1/k})^c. Note that by construction, f_1 ≤ 1_ξ ≤ f_2. Furthermore, f_2 − f_1 vanishes outside (∂ξ)_{1/k}, so that ∫(f_2 − f_1) dν ≤ ν((∂ξ)_{1/k}) < ε, which establishes the condition of Definition 1.

To conclude the proof, it remains to show ν(∂ξ) = 0. One way to see this is that ker^{−1}(∂ξ) is the solution set in St^{m×n} of a polynomial equation in the entries of an m × n matrix X of variables. This is an algebraic hypersurface, hence by Proposition 4 we conclude ν(∂ξ) = γ(ker^{−1}(∂ξ)) = 0. □

Proposition 6
For s > 1 and σ, i, j as above, we have ν(ξ_{σ,i,j}(s)) ≲ s^{−1}.

Proof Let A be a random element of St^{m×n}, let H denote the (random) hyperplane in R^m spanned by the columns of A_{σ\{i}}, and let ℓ denote the line through the origin perpendicular to H. Let u denote the unit normal vector to H whose first nonzero coordinate is positive, and let α ∈ {−1, +1} denote the sign of the first nonzero entry of e_i⊤A_σ^{−1}. Then we can write

u = α · e_i⊤A_σ^{−1} / ‖e_i⊤A_σ^{−1}‖,

since e_i⊤A_σ^{−1} is orthogonal to every column of A_{σ\{i}}, and αe_i⊤A_σ^{−1} has first nonzero component positive by definition of α. Now let k be any element of [n] outside of σ \ {i}. Since u depends only on A_{σ\{i}}, and the entries of A are mutually independent, we have that u and A_k are independent random vectors. For any fixed unit vector v ∈ S^{m−1}, since A_k has i.i.d. N(0, 1) entries, the dot product v⊤A_k also has distribution N(0, 1). Thus, for any fixed t ∈ R, the random variable γ(u⊤A_k ≤ t | ℓ) (i.e., the conditional probability in terms of the σ-algebra generated by ℓ) is in fact constant; evaluating at the line ℓ = Re_1, for example, shows that this constant equals the standard Gaussian cdf at t. This shows that the random quantity u⊤A_k has distribution N(0, 1). We have

(A_σ^{−1}A_j)_i = u⊤A_j / u⊤A_i.

The independence of u⊤A_i and u⊤A_j implies that (A_σ^{−1}A_j)_i has the Cauchy distribution, that is, the distribution of the ratio of two i.i.d. N(0, 1) random variables. In particular, the cdf of (A_σ^{−1}A_j)_i is F(t) = 1/2 + arctan(t)/π. See [9, p. 50] for more on the Cauchy distribution. Using the series expansion of arctan, we get P(|(A_σ^{−1}A_j)_i| ≥ s) = (2/π)arctan(1/s) ≤ (2/π)s^{−1} for s > 1. Hence, using Proposition 4 and the fact that s > 1, we conclude ν(ξ_{σ,i,j}(s)) ≲ s^{−1}. □

Proof of the main result
In this final section we prove the main result of this paper, Theorem 1.

Proof
Let b = Ax*, let B = B^n_2 ∩ RΛ, and let μ denote the covering radius of B with respect to Λ. Define the vector v ∈ R^n so that v_j = μw_j for all j ∈ σ̄ and Av = b; this determines v_σ, since A_σ is invertible. We show that the scaled, translated ball μB + v is contained in P(A, b). Since B ⊆ RΛ = ker(A), each x ∈ μB + v satisfies Ax = Av = b. For each j ∈ [n], let x^(j) be the unique point of μB + v at which the coordinate x_j is minimized. If j ∈ σ̄, then x^(j)_j ≥ 0 by the choice of w. If j ∈ σ, then since x* ∈ S*_σ(Λ) we again have x^(j)_j ≥ 0. This concludes the proof that μB + v ⊆ P(A, b). Now let g ∈ (RΛ + x*) ∩ Z^n. Since μ is the covering radius of B with respect to Λ, there exists z* ∈ (Λ + g) ∩ (μB + v); in particular,

‖z* − x*‖_2 ≤ μ‖w̃‖_2 + μ,

where we define w̃ := (v − x*)/μ. That is, w̃ satisfies A w̃ = 0 and w̃_j = w_j for all j ∈ σ̄.
Hence, applying Corollary 2 and Proposition 6, for T sufficiently large we may estimate the quantity (21), up to constants, by u^{−2} + s^{−1}. Choosing u = t^{1/3} and s = t^{2/3} balances the two terms and yields the claimed bound of order t^{−2/3}. □
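The balancing act producing the exponent −2/3 can be checked with a line of arithmetic (illustration only; the precise trade-off in the paper may differ by constants): if the failure probability is u^{−2} + s^{−1} and the proximity threshold scales like t ≈ s·u, then u = t^{1/3}, s = t^{2/3} equalizes the two terms at t^{−2/3}.

```python
# Illustration only: balancing u^{-2} + s^{-1} subject to t = s * u
# (the assumed scaling of the proximity threshold) gives the exponent -2/3.
t = 4096.0
u, s = t ** (1 / 3), t ** (2 / 3)
assert abs(u * s - t) < 1e-3                      # threshold constraint t = s*u
assert abs(u ** -2 - t ** (-2 / 3)) < 1e-12       # first failure term
assert abs(s ** -1 - t ** (-2 / 3)) < 1e-12       # second failure term
print(u ** -2 + s ** -1)                          # ~ 2 * t^(-2/3) ≈ 0.0078
```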