Abstract
N-fold integer programs (IPs) form an important class of block-structured IPs for which increasingly fast algorithms have recently been developed and successfully applied. We study high-multiplicity N-fold IPs, which encode IPs succinctly by presenting a description of each block type and a vector of block multiplicities. Our goal is to design algorithms which solve N-fold IPs in time polynomial in the size of the succinct encoding, which may be significantly smaller than the size of the explicit (non-succinct) instance. We present the first fixed-parameter algorithm for high-multiplicity N-fold IPs, which even works for convex objectives. Our key contribution is a novel proximity theorem which relates fractional and integer optima of the Configuration LP, a fundamental notion introduced by Gilmore and Gomory [Oper. Res., 1961] which we generalize. Our algorithm for N-fold IP is faster than previous algorithms whenever the number of blocks is much larger than the number of block types, such as in N-fold IP models for various scheduling problems.
Introduction
The fundamental Integer Programming (IP) problem is to solve:
where \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), \(A \in \mathbb {Z}^{m \times n}\), \({\mathbf{b}}\in \mathbb {Z}^m\), and \({\mathbf{l}}, {\mathbf{u}}\in (\mathbb {Z}\cup \{\pm \infty \})^n\). Any IP instance with infinite bounds \({\mathbf{l}}, {\mathbf{u}}\) can be reduced to an instance with finite bounds using standard techniques (solving the continuous relaxation and using proximity bounds to restrict the relevant region), so that from now on we will assume finite bounds \({\mathbf{l}}, {\mathbf{u}}\in \mathbb {Z}^n\). We denote \({\displaystyle f_{\max } = \max _{\begin{array}{c} {\mathbf{x}}\in \mathbb {Z}^n:\\ {\mathbf{l}}\le {\mathbf{x}}\le {\mathbf{u}} \end{array}} f({\mathbf{x}})}\).
Integer Programming is a fundamental problem with vast importance both in theory and practice. Because it is NP-hard already with a single row (by reduction from Subset Sum) or with A a 0/1-matrix (by reduction from Vertex Cover), there is high interest in identifying tractable subclasses of IP. One such tractable subclass is N-fold IPs, whose constraint matrix A is defined as
Here, \(r,s,t,N \in \mathbb {N}\), \(E^{(N)}\) is an \((r+Ns)\times Nt\)-matrix, and \(E^i_1 \in \mathbb {Z}^{r \times t}\) and \(E^i_2 \in \mathbb {Z}^{s \times t}\), \(i \in [N]\), are integer matrices. We define \(E := \left( {\begin{matrix} E_1^1 &{} E_1^2 &{} \cdots &{} E_1^N \\ E_2^1 &{} E_2^2 &{} \cdots &{} E_2^N \end{matrix}}\right) \), and call \(E^{(N)}\) the N-fold product of E. The structure of \(E^{(N)}\) allows us to divide any Nt-dimensional object, such as the variables of \({\mathbf{x}}\), bounds \({\mathbf{l}}, {\mathbf{u}}\), or the objective f, into N bricks of size t, e.g. \({\mathbf{x}}=({\mathbf{x}}^1, \dots , {\mathbf{x}}^N)\). We use subscripts to index within a brick and superscripts to denote the index of the brick, i.e., \(x^i_j\) is the jth variable of the ith brick, with \(j \in [t]\) and \(i \in [N]\). Problem (IP) with \(A=E^{(N)}\) is known as N-fold integer programming (N-fold IP).
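For concreteness, the block structure of \(E^{(N)}\) (the matrices \(E_1^i\) side by side on top of a block diagonal of the \(E_2^i\)) can be assembled as in the following sketch; the function name is ours and plain lists of lists are used to keep it dependency-free.

```python
def nfold_product(E1_blocks, E2_blocks):
    """Assemble the (r + N*s) x (N*t) N-fold matrix E^(N) from blocks.

    E1_blocks[i] is the r x t matrix E_1^{i+1} (top row of blocks);
    E2_blocks[i] is the s x t matrix E_2^{i+1} (block-diagonal part).
    """
    N = len(E1_blocks)
    r, t = len(E1_blocks[0]), len(E1_blocks[0][0])
    s = len(E2_blocks[0])
    # Top r rows: E_1^1 E_1^2 ... E_1^N placed side by side.
    top = [sum((E1_blocks[i][row] for i in range(N)), []) for row in range(r)]
    # Below: N diagonal blocks E_2^i, zeros elsewhere.
    diag = []
    for i in range(N):
        for row in range(s):
            line = [0] * (N * t)
            line[i * t:(i + 1) * t] = E2_blocks[i][row]
            diag.append(line)
    return top + diag
```

For example, with \(r=s=1\), \(t=2\), \(N=2\), the result has \(r+Ns=3\) rows and \(Nt=4\) columns.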
Such block-structured matrices have been the subject of extensive research stretching back to the ’70s [3,4,5, 15, 16, 28, 42, 44, 45], as this special structure allows applying methods like the Dantzig-Wolfe decomposition and others, leading to significant speedups in practice. On the theoretical side, the term “N-fold IP” was coined by De Loera et al. [9], and since then increasingly efficient algorithms have been developed and applied to various problems relating to N-fold IPs [2, 6, 25, 26, 29, 32]. This line of research culminated with an algorithm by Eisenbrand et al. [13] which solves N-fold IPs in time \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)} \cdot N \log N \cdot \log \Vert {\mathbf{u}}-{\mathbf{l}}\Vert _\infty \cdot \log f_{\max }\) for all separable convex objectives f (i.e., when \(f({\mathbf{x}}) = \sum _{i=1}^n f_i(x_i)\) and each \(f_i: \mathbb {R}\rightarrow \mathbb {R}\) is convex).
Our contribution
Previous algorithms for N-fold IP have focused on reducing the runtime dependency on N down to almost linear. Instead, our interest here is in N-fold IPs which model applications where many bricks are of the same type, that is, they share the same bounds, right-hand side, and objective function. For those applications, it is natural to encode an N-fold IP instance succinctly by describing each brick type by its constraint matrix, bounds, right-hand side, and objective function, and giving a vector of brick multiplicities. When the number of brick types \(\tau \) is much smaller than the number N of bricks, e.g., if \(N \approx 2^\tau \), this succinct instance is (much) smaller than the previously studied encoding of N-fold IP, and an algorithm running in time polynomial in the size of the succinct instance may be (much) faster than current algorithms. We call the N-fold IP where the instance is given succinctly the huge N-fold IP problem, and we present a fast algorithm for it:
Theorem 1
Huge N-fold IP with any separable convex objective can be solved in time
A natural application of Theorem 1 is to scheduling problems. In many scheduling problems, the number n of jobs that must be assigned to machines, as well as the number m of machines, is very large, whereas the number of types of jobs and the number of kinds of machines are relatively small. An instance of such a scheduling problem can thus be compactly encoded by simply stating, for each job type and machine kind, the number of jobs with that type and machines with that kind, together with their characteristics (like processing time, weight, release time, due date, etc.). This key observation was made by several researchers [7, 37], before Hochbaum and Shamir [20] coined the term high-multiplicity scheduling problem. Clearly, many efficient algorithms for scheduling problems, where all jobs are assumed to be distinct, become exponential-time algorithms for the corresponding high-multiplicity problem.
Let us briefly demonstrate how Theorem 1 allows designing algorithms which are efficient for the succinct high-multiplicity encoding of the input. In modern computational clusters, it is common to have several kinds of machines differing in processing unit type (CPUs with high single-core or multi-core performance, GPUs), storage type (HDD, SSD, etc.), network connectivity, etc. However, the number of machine kinds \(\tau \) is still much smaller (perhaps 10) than the number of machines, which may be on the order of tens of thousands or more. Many scheduling problems have N-fold IP models [31] where \(\tau \) is the number of machine kinds and N is the number of machines. On these models, Theorem 1 would likely outperform the currently fastest N-fold IP algorithms.
Proof ideas. To solve a high-multiplicity problem, one needs a succinct way to argue about solutions. In 1961, Gilmore and Gomory [17] introduced the fundamental and widely influential notion of the Configuration IP (ConfIP), which describes a solution (e.g., a schedule) by a list of pairs “(machine schedule s, multiplicity \(\mu \) of machines with schedule s)”. The linear relaxation of the ConfIP, called the Configuration LP (ConfLP), can often be solved efficiently, and is known to provide solutions of strikingly high quality in practice [41]; for example, if the optimum of the ConfLP for Bin Packing has value x, then an optimal integer packing is conjectured to use at most \(\lceil x \rceil + 1\) bins [38]. However, surprisingly little is known in general about the structure of solutions of the ConfIP and ConfLP, and how they relate to each other.
We define the Configuration IP and LP of an N-fold IP instance, and show how to solve the ConfLP quickly using the property that the ConfLP and ConfIP have polynomial encoding length even for huge N-fold IP. Our main technical contribution is a novel proximity theorem about N-fold IP, showing that a solution of its relaxation corresponding to the ConfLP optimum is very close to the integer optimum. Thus, the algorithm of Theorem 1 proceeds in three steps: (1) it solves the ConfLP, (2) it uses the proximity theorem to create a “residual” \(N'\)-fold instance with \(N'\) upper-bounded by \((\Vert E\Vert _\infty rs)^{\mathcal {O}(rs)}\), and (3) it solves the residual instance by an existing N-fold IP algorithm.
Related work
Besides the references mentioned already, we point out that solving the ConfLP is commonly used as a subprocedure in approximation algorithms, e.g. [1, 14, 22, 27]. Jansen and Solis-Oba use a mixed ConfLP to give a parameterized \(\textit{OPT}+1\) algorithm for bin packing [24]; Onn [36] gave a weaker form of Theorem 1 which only applies to the setting where \(E_1^i = I\) and \(E_2^i\) is totally unimodular, for all i. Jansen et al. [25] extend the ConfIP to multiple “levels” of configurations. An extended version [31] of this paper shows how to model many scheduling problems as high-multiplicity N-fold IPs, so that an application of Theorem 1 yields new parameterized algorithms for these problems. Knop and Koutecký [30] use our new proximity theorem to show efficient preprocessing algorithms (kernels) for scheduling problems.
There are currently several “fastest” algorithms for N-fold IP with the standard (non-succinct) encoding. First, we have already mentioned the algorithm of Eisenbrand et al. [13]. Second, the algorithm of Jansen et al. [26] has a better parameter dependency of \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + s^2)}\) (as compared with \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)}\) of the previous algorithm), but has a slightly worse dependence on N of \(N \log ^5 N\), and only works for linear objectives. Third, a recent algorithm of Cslovjecsek et al. [8] again only works for linear objectives and runs in time \((\Vert E\Vert _\infty s)^{\mathcal {O}(s^2)} {{\,\mathrm{\mathrm{poly}}\,}}(r) N \log ^2(Nt) \log ^2(\Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max }\Vert _\infty ) + (\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + s^2)}Nt\). While the authors claim that this constitutes the currently fastest algorithm, it seems that it is only potentially faster than prior work in a narrow parameter regime.
The third paper, by Cslovjecsek et al. [8], is the closest to ours in its approach: it solves a strong relaxation of N-fold IP which coincides with the ConfLP if each brick is of a distinct type, and which is generalized by the ConfLP (in our work) otherwise. The authors show that this relaxation can be solved in near-linear time, and then develop a proximity theorem similar to ours (but using different techniques) and a dynamic program, which allows them to construct and solve a residual instance in linear time. An earlier version of our paper [31] stated a worse proximity bound than that of Cslovjecsek et al. [8], but our bound applies to separable convex objectives whereas theirs [8] does not. Presently, we adapt one of their lemmas ([8, Lemma 3], our Lemma 5) and a modeling idea (Sect. 3.4) to obtain the same proximity bound as they have [8], but one which also works for separable convex objectives. It is likely that the complexity of our algorithm for solving the ConfLP could be improved along the lines of their work [8]. Despite these similarities, we highlight that only our algorithm solves the high-multiplicity version of N-fold IP.
Preliminaries
For positive integers m, n with \(m \le n\) we set \([m,n] = \{m, m+1, \ldots , n\}\) and \([n] = [1,n]\). We write vectors in boldface (e.g., \({\mathbf{x}}, {\mathbf{y}}\)) and their entries in normal font (e.g., the ith entry of \({\mathbf{x}}\) is \(x_i\) or x(i)). For \(\alpha \in \mathbb {R}\), \({\lfloor }{\alpha }{\rfloor }\) is the floor of \(\alpha \), \({\lceil }{\alpha }{\rceil }\) is the ceiling of \(\alpha \), and we define \(\{\alpha \} = \alpha - {\lfloor }{\alpha }{\rfloor }\); for vectors, these operators are defined componentwise.
We call a brick of \({\mathbf{x}}\) integral if all of its coordinates are integral, and fractional otherwise.
Huge N-fold IP. The huge N-fold IP problem is an extension of N-fold IP to the high-multiplicity scenario, where there are potentially exponentially many bricks. This requires a succinct representation of the input and output. The input to a huge N-fold IP problem with \(\tau \) brick types is defined by matrices \(E^i_1 \in \mathbb {Z}^{r \times t}\) and \(E^i_2 \in \mathbb {Z}^{s \times t}\), \(i \in [\tau ]\), vectors \({\mathbf{l}}^1, \dots , {\mathbf{l}}^\tau \), \({\mathbf{u}}^1, \dots , {\mathbf{u}}^\tau \in \mathbb {Z}^t\), \({\mathbf{b}}^0 \in \mathbb {Z}^r\), \({\mathbf{b}}^1, \dots , {\mathbf{b}}^\tau \in \mathbb {Z}^s\), functions \(f^1, \dots , f^\tau :\mathbb {R}^{t} \rightarrow \mathbb {R}\) satisfying \(\forall i \in [\tau ], \, \forall {\mathbf{x}}\in \mathbb {Z}^t:\, f^i({\mathbf{x}}) \in \mathbb {Z}\) and given by evaluation oracles, and integers \(\mu ^1, \dots , \mu ^\tau \in \mathbb {N}\) such that \(\sum _{i=1}^\tau \mu ^i = N\). We say that a brick is of type i if its lower and upper bounds are \({\mathbf{l}}^i\) and \({\mathbf{u}}^i\), its right-hand side is \({\mathbf{b}}^i\), its objective is \(f^i\), and the matrices appearing at the corresponding coordinates are \(E^i_1\) and \(E^i_2\). The task is to solve (IP) with a matrix \(E^{(N)}\) which has \(\mu ^i\) bricks of type i for each i. Onn [35] shows that for any solution, there exists a solution which is at least as good and has only few (at most \(\tau \cdot 2^t\)) distinct bricks. In Sect. 3 we show new bounds which do not depend exponentially on t.
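As an illustration of the succinct encoding (our own sketch; the class and field names are not from the paper), a huge N-fold instance stores one record per brick type plus the linking right-hand side, and the number N of bricks never needs to be written out in unary:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BrickType:
    """One brick type of a huge N-fold instance (names are ours)."""
    E1: list       # r x t integer matrix E_1^i
    E2: list       # s x t integer matrix E_2^i
    l: list        # lower bounds l^i, length t
    u: list        # upper bounds u^i, length t
    b: list        # right-hand side b^i, length s
    f: Callable    # separable convex objective f^i, given as an oracle
    mu: int        # multiplicity mu^i

@dataclass
class HugeNfoldInstance:
    b0: list                 # linking right-hand side b^0, length r
    types: List[BrickType]

    @property
    def N(self):
        # Total number of bricks; may be exponential in the encoding size.
        return sum(T.mu for T in self.types)
```

Note that storing \(\mu ^i\) as a binary integer is exactly what makes the encoding length polynomial in \(\tau \), t, and \(\log N\) rather than in N.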
Graver bases and the Steinitz lemma
Let \({\mathbf{x}}, {\mathbf{y}}\) be n-dimensional vectors. We call \({\mathbf{x}}, {\mathbf{y}}\) sign-compatible if they lie in the same orthant, that is, for each \(i \in [n]\), \(x_i \cdot y_i \ge 0\). We call \(\sum _i {\mathbf{g}}^i\) a sign-compatible sum if all \({\mathbf{g}}^i\) are pairwise sign-compatible. Moreover, we write \({\mathbf{y}}\sqsubseteq {\mathbf{x}}\) if \({\mathbf{x}}\) and \({\mathbf{y}}\) are sign-compatible and \(|y_i| \le |x_i|\) for each \(i \in [n]\). Clearly, \(\sqsubseteq \) imposes a partial order, called the “conformal order”, on n-dimensional vectors. For an integer matrix \(A \in \mathbb {Z}^{m \times n}\), its Graver basis \(\mathcal {G}(A)\) is the set of \(\sqsubseteq \)-minimal nonzero elements of the lattice of A, \(\ker _{\mathbb {Z}}(A) = \{{\mathbf{z}}\in \mathbb {Z}^n \mid A {\mathbf{z}}= {\mathbf {0}}\}\). A circuit of A is an element \({\mathbf{g}}\in \ker _{\mathbb {Z}}(A)\) whose support \(\text {supp}({\mathbf{g}})\) (i.e., the set of its nonzero entries) is minimal under inclusion and whose entries are coprime. We denote the set of circuits of A by \(\mathcal {C}(A)\). It is known that \(\mathcal {C}(A) \subseteq \mathcal {G}(A)\) [34, Definition 3.1 and remarks]. We make use of the following two propositions:
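For intuition, the conformal order and the Graver basis of a tiny matrix can be checked by brute force (our toy enumeration, for illustration only; practical Graver computations use completion procedures, and the box bound below is an assumption that happens to hold for this example):

```python
from itertools import product

def conformal_leq(y, x):
    """y 'conformally below' x: sign-compatible and |y_i| <= |x_i| for all i."""
    return all(yi * xi >= 0 and abs(yi) <= abs(xi) for yi, xi in zip(y, x))

def graver_basis(A, box=3):
    """All conformally-minimal nonzero integer kernel elements of A with
    entries in [-box, box].  Returns G(A) whenever every Graver element
    fits inside the box (a toy assumption keeping the enumeration finite)."""
    m, n = len(A), len(A[0])
    kernel = [z for z in product(range(-box, box + 1), repeat=n)
              if any(z) and all(sum(A[i][j] * z[j] for j in range(n)) == 0
                                for i in range(m))]
    # Keep only the minimal elements under the conformal order.
    return [z for z in kernel
            if not any(w != z and conformal_leq(w, z) for w in kernel)]
```

For \(A = (1\; {-1})\), the integer kernel is \(\{(k,k)\}\) and the Graver basis consists of \((1,1)\) and \((-1,-1)\).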
Proposition 1
(Positive Sum Property [34, Lemma 3.4]) Let \(A \in \mathbb {Z}^{m \times n}\) be an integer matrix. For any integer vector \({\mathbf{x}}\in \ker _{\mathbb {Z}}(A)\), there exist an \(n' \le 2n-2\) and a decomposition \({\mathbf{x}}= \sum _{j=1}^{n'} \alpha _j {\mathbf{g}}_j\) into a sum of \({\mathbf{g}}_j \in \mathcal {G}(A)\), with \(\alpha _j \in \mathbb {N}\) for each \(j \in [n']\). For any fractional vector \({\mathbf{x}}\in \ker (A)\) (that is, \(A{\mathbf{x}}={\mathbf{0}}\)), there exists a decomposition \({\mathbf{x}}= \sum _{j=1}^{n} \alpha _j {\mathbf{g}}_j\) into \({\mathbf{g}}_j \in \mathcal {C}(A)\), where \(\alpha _j \ge 0\) for each \(j \in [n]\).
Proposition 2
(Separable convex superadditivity [10, Lemma 3.3.1]) Let \(f({\mathbf{x}}) = \sum _{i=1}^n f_i(x_i)\) be separable convex, let \({\mathbf{x}}\in \mathbb {R}^n\), and let \({\mathbf{g}}_1,\dots ,{\mathbf{g}}_k \in \mathbb {R}^n\) be vectors with the same sign-pattern from \(\{\le 0, \ge 0\}^n\), that is, belonging to the same orthant of \(\mathbb {R}^n\). Then
for arbitrary integers \(\alpha _1,\dots ,\alpha _k \in \mathbb {N}\).
Our proximity theorem relies on the Steinitz Lemma, which has recently received renewed attention [11, 12, 23].
Lemma 1
(Steinitz [40], Sevastjanov, Banaszczyk [39]) Let \(\Vert \cdot \Vert \) denote any norm, and let \({\mathbf{x}}_1, \dots , {\mathbf{x}}_n \in \mathbb {R}^d\) be such that \(\Vert {\mathbf{x}}_i\Vert \le 1\) for \(i \in [n]\) and \(\sum _{i=1}^n {\mathbf{x}}_i = {\mathbf{0}}\). Then there exists a permutation \(\pi \in S_n\) such that for all \(k = 1,\dots ,n\), the prefix sum satisfies \(\left\Vert \sum _{i=1}^k {\mathbf{x}}_{\pi (i)}\right\Vert \le d\).
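The lemma can be checked on toy inputs by trying all orderings (our brute-force illustration with the \(\ell _\infty \) norm; the proofs of Sevastjanov and Banaszczyk are constructive and do not enumerate permutations):

```python
from itertools import permutations

def steinitz_holds(vectors, d_bound, norm=lambda v: max(abs(c) for c in v)):
    """Return True if some ordering keeps every prefix sum within d_bound.

    `vectors` must sum to zero and each have norm <= 1; we simply try all
    n! permutations, which is only feasible for toy inputs.
    """
    n = len(vectors)
    dim = len(vectors[0])
    assert all(abs(sum(v[j] for v in vectors)) < 1e-9 for j in range(dim))
    for pi in permutations(range(n)):
        prefix = [0.0] * dim
        ok = True
        for i in pi:
            prefix = [p + c for p, c in zip(prefix, vectors[i])]
            if norm(prefix) > d_bound + 1e-9:
                ok = False
                break
        if ok:
            return True
    return False
```

With \(d = 2\) every zero-sum family of unit vectors in the plane admits such an ordering, while the bound clearly cannot be pushed below the norm of a single vector.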
For an integer matrix A, we define \(g_1(A) = \max _{{\mathbf{g}}\in \mathcal {G}(A)} \Vert {\mathbf{g}}\Vert _1\). When it could make a difference, we will state our bounds both in terms of \(\Vert E\Vert _\infty \) (worstcase, when we have no other information) and in terms of \(g_1(E_2) := \max _i g_1(E_2^i)\), e.g. in Lemma 10 and Theorem 2.
Proof of Theorem 1
We first give a relatively highlevel description of the proof, before we present all its details.
Proof overview and ideas
Configuration LP and IP
Given an input to the huge N-fold IP, we first reformulate it as another IP, which we refer to as the Configuration IP. We then consider its fractional relaxation, the so-called Configuration LP. Our approach is to (efficiently) solve the Configuration LP, and bound the distance of its LP optimum to the integer optimum (of the Configuration IP). We use this bound to reduce the input of the huge N-fold IP from a high-multiplicity input to an input of a standard N-fold IP which is small both in terms of the number of bricks and the size of the bounding box. We then solve this small input using an existing N-fold IP algorithm. Along the way, there are several non-trivial obstacles that we need to overcome.
We will refer to huge N-fold IP as HugeIP, to its corresponding fractional relaxation as HugeCP (this is a convex program if the objective f is convex), to the Configuration LP of the HugeIP as ConfLP, and to its integer version as ConfIP. We define a mapping \(\varphi \) from the solutions of ConfLP to the solutions of HugeCP which, for every variable \(y_{{\mathbf{c}}}\) of the ConfLP, introduces \(\lfloor y_{{\mathbf{c}}} \rfloor \) bricks with configuration \({\mathbf{c}}\), and then introduces \(\sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\}\) bricks with configuration \(\frac{1}{\sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\}} \sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\} \cdot {\mathbf{c}}\) (i.e., an “average” configuration). We call a solution \({\mathbf{x}}^*\) of HugeCP “conf-optimal” if it is the image \(\varphi ({\mathbf{y}}^*)\) of some ConfLP optimum \({\mathbf{y}}^*\). One would hope that the objective values of a conf-optimal solution \({\mathbf{x}}^*\) in HugeCP and of \({\mathbf{y}}^*\) in ConfLP are then identical. While this is true for any linear objective f, it need not be true for a convex objective f. To overcome this impediment, we introduce an auxiliary objective \({\hat{f}}\) which preserves the values of optima of ConfLP and conf-optimal solutions of HugeCP.
Proximity theorem
The bulk of our work is showing that for each conf-optimal solution \({\mathbf{x}}^*\) of the HugeCP, there is an optimum \({\mathbf{z}}^*\) of the HugeIP whose \(\ell _1\)-distance from \({\mathbf{x}}^*\) is bounded by \(P :=(\Vert E\Vert _\infty rs)^{\mathcal {O}(rs)}\). We will show that we can obtain a ConfLP optimum \({\mathbf{y}}\) with support of size at most \(r+\tau \), and by the definition of \(\varphi \) (recall that \({\mathbf{x}}^* = \varphi ({\mathbf{y}})\)), this means that \({\mathbf{x}}^*\) has at most \(r+\tau +1\) distinct bricks (the \(+1\) is due to \(\varphi \) creating an additional “average configuration” brick type). This, in turn, means that our bound on the \(\ell _1\)-distance between \({\mathbf{z}}^*\) and \({\mathbf{x}}^*\) says something about ConfLP and ConfIP: for any ConfLP optimum \({\mathbf{y}}\) there is a ConfIP optimum \({\mathbf{y}}^*\) in \(\ell _1\)-distance at most P where any configuration \({\mathbf{c}}\) in the support of \({\mathbf{y}}^*\) is at most P far from some configuration \({\mathbf{c}}'\) in the support of \({\mathbf{y}}\). As far as we know, this is a unique result about the Configuration LP.
A way of bounding the distance between some types of optima in an integer program has been introduced by Hochbaum and Shanthikumar [21] and adapted to the setting of N-fold IP by Hemmecke et al. [19]. A somewhat different approach was later developed by Eisenbrand and Weismantel [11] in the setting of IPs with few rows, and was adapted to the setting of N-fold IPs soon after [12, 13]. The idea is as follows. Let \({\mathbf{x}}^*\) be a HugeCP optimum, and \({\mathbf{z}}^*\) be a HugeIP optimum. We call a nonzero integral vector \({\mathbf{p}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\), i.e., one which is sign-compatible (i.e., has the same sign-pattern) with \({\mathbf{x}}^* - {\mathbf{z}}^*\) and is at most \({\mathbf{x}}^* - {\mathbf{z}}^*\) in absolute value in each coordinate, a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\). If \({\mathbf{z}}^*\) minimizes \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), it can be shown that no cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) exists. Moreover, if \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1 > B\), then a cycle of \(\ell _1\)-norm at most B exists; together, these two facts imply \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1 \le B\).
Notice that the previous argument assumes \({\mathbf{x}}^*\) to be a HugeCP optimum: this cannot be replaced with a conf-optimal solution for the following reason. The existence of a cycle \({\mathbf{p}}\) leads to a contradiction because either \({\mathbf{z}}^* + {\mathbf{p}}\) is also a HugeIP optimum (but closer to \({\mathbf{x}}^*\)) or \({\mathbf{x}}^* - {\mathbf{p}}\) is also a HugeCP optimum (but closer to \({\mathbf{z}}^*\)). But if \({\mathbf{x}}^*\) is a conf-optimal solution, we have no guarantee that \({\mathbf{x}}^* - {\mathbf{p}}\) is again a configurable solution, and the argument breaks down. This means that we need to restrict our attention to cycles with the property that if \({\mathbf{x}}^*\) is a configurable solution, then \({\mathbf{x}}^* - {\mathbf{p}}\) is also configurable.
We call such a \({\mathbf{p}}\) a configurable cycle. The next task is an analogue of the argument above: if \({\mathbf{x}}^*\) is conf-optimal and \({\mathbf{z}}^*\) is a HugeIP optimum, then the existence of a configurable cycle \({\mathbf{p}}\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\) leads to a contradiction. For that, we need the separability and convexity of the objective f and a careful use of the configurability of \({\mathbf{p}}\). With this argument at hand, we have reduced our task to bounding the norm of any configurable cycle (Lemma 7).
However, the main existing tool for showing proximity works by ruling out cycles, not configurable cycles. To overcome this, we develop new tools to deal with configurable cycles.
The algorithm
It remains to use our proximity bound P. As already hinted, if two solutions differ by at most P in \(\ell _1\)-norm, then they differ in at most P bricks. This means that we may fix all but P bricks for each configuration appearing in the ConfLP optimum. Since the support of the ConfLP optimum is small (of size at most \(r+\tau \)), the total number of bricks still to be determined is also small, and they can be determined using a standard N-fold IP algorithm within the required time complexity (see the proof of Theorem 1).
To recap, the algorithm works in the following steps.
1. We solve the ConfLP and obtain its optimum \({\mathbf{y}}\) by solving its dual LP using a separation oracle. The separation oracle is implemented using a fixed-parameter algorithm for IP with small coefficients.
2. We use the ConfLP optimum \({\mathbf{y}}\) to fix the solution on all but \((r+\tau )P\) bricks.
3. The remaining instance can be encoded as an N-fold IP with at most \((r+\tau )P\) bricks and solved using an existing algorithm.
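The brick-fixing in step 2 above can be sketched as follows (our own simplified helper; the names and the exact per-configuration fixing policy are illustrative assumptions, not the paper's precise construction):

```python
def split_bricks(support, P):
    """Decide how many bricks to fix and how many to leave residual.

    support: integral brick counts (floors of the Conf-LP optimum values),
             one entry per configuration in the support.
    P:       the proximity bound.

    For each configuration we keep min(count, P) bricks free, so the
    residual instance has at most len(support) * P bricks; with support
    size at most r + tau this matches the (r + tau) * P bound in the text.
    """
    fixed, residual = [], []
    for y in support:
        free = min(y, P)
        fixed.append(y - free)
        residual.append(free)
    return fixed, residual
```

The point is that the huge multiplicities end up entirely on the "fixed" side, while the residual N-fold instance stays small enough for a standard algorithm.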
Let us now proceed to a detailed proof of Theorem 1.
Configurations of huge Nfold IP
Fix a huge N-fold IP instance with \(\tau \) types. Recall that \(\mu ^i\) denotes the number of bricks of type i, and \(\varvec{ \mu }= (\mu ^1, \dots , \mu ^\tau )\). We define for each \(i \in [\tau ]\) the set of configurations of type i as
Here we are interested in four instances of convex programming (CP) and convex integer programming (IP) related to huge N-fold IP. First, we have the Huge IP
and the Huge CP, which is a relaxation of (HugeIP),
We shall define the objective function \({\hat{f}}\) later; for now, it suffices to say that \(f({\mathbf{x}}) = {\hat{f}}({\mathbf{x}})\) holds for all integral feasible \({\mathbf{x}}\in \mathbb {Z}^{Nt}\), so that the optimum of (HugeCP) indeed lower bounds the optimum of (HugeIP), and that \({\hat{f}}\) is convex. Then, there is the Configuration LP of (HugeIP), that is, the following linear program:
Letting B be its constraint matrix and \({\mathbf{d}}= \left( {\begin{matrix}{\mathbf{b}}^0 \\ \varvec{ \mu }^\intercal \end{matrix}}\right) \) be the right hand side, we can shorten (3)–(4) as
Finally, observing that \(B{\mathbf{y}}={\mathbf{d}}\) implies \(y(i,{\mathbf{c}}) \le \Vert \varvec{ \mu }\Vert _\infty \) for all \(i \in [\tau ], {\mathbf{c}}\in \mathcal {C}^i\), and defining \(C = \sum _{i \in [\tau ]} |\mathcal {C}^i|\), leads to the Configuration ILP,
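For concreteness, the constraint matrix B above can be assembled column by column (a sketch under our own naming): the column indexed by a pair \((i, {\mathbf{c}})\) consists of \(E_1^i {\mathbf{c}}\) in its first r entries, followed by the indicator of type i in the last \(\tau \) entries, so that \(B{\mathbf{y}}={\mathbf{d}}\) enforces both the linking constraints and the multiplicities.

```python
def conflp_matrix(E1_blocks, config_lists):
    """Build the Conf-LP matrix B; column (i, c) = (E_1^i c ; e_i).

    E1_blocks[i] is the r x t matrix E_1^{i+1}; config_lists[i] lists the
    configurations c in C^{i+1}, each a length-t integer vector.
    """
    tau = len(E1_blocks)
    r = len(E1_blocks[0])
    cols = []
    for i, (E1, configs) in enumerate(zip(E1_blocks, config_lists)):
        for c in configs:
            top = [sum(E1[row][j] * c[j] for j in range(len(c)))
                   for row in range(r)]
            indicator = [1 if k == i else 0 for k in range(tau)]
            cols.append(top + indicator)
    # Transpose the column list into the rows of B.
    return [list(row) for row in zip(*cols)]
```

For a single type with \(E_1^1 = (2)\) and configurations \(\{(0), (1)\}\), the vector \({\mathbf{y}} = (1, 2)\) satisfies \(B{\mathbf{y}} = (4, 3)\), i.e., it is feasible for \({\mathbf{b}}^0 = (4)\) and \(\mu ^1 = 3\).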
A solution \({\mathbf{x}}\) of (HugeCP) is configurable if, for every \(i \in [\tau ]\), each brick \({\mathbf{x}}^j\) of type i is a convex combination of \(\mathcal {C}^i\), i.e., \({\mathbf{x}}^j \in \text {conv}(\mathcal {C}^i)\). We shall define a mapping from solutions of (ConfLP) to configurable solutions of (HugeCP) as follows. For every solution \({\mathbf{y}}\) of (ConfLP) we define a solution \({\mathbf{x}}= \varphi ({\mathbf{y}})\) of (HugeCP) to have \({\lfloor }{y(i, {\mathbf{c}})}{\rfloor }\) bricks of type i with configuration \({\mathbf{c}}\) and, for each \(i \in [\tau ]\), letting \({\mathfrak {f}}^i = \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{y(i, {\mathbf{c}})\}\), to have \({\mathfrak {f}}^i\) bricks with value \({\hat{{\mathbf{c}}}}_i = \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{y(i, {\mathbf{c}})\}{\mathbf{c}}\). (Because \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} y(i,{\mathbf{c}}) = \mu ^i\) and \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} {\lfloor }{y(i,{\mathbf{c}})}{\rfloor }\) is clearly integral, \({\mathfrak {f}}^i = \mu ^i - \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\lfloor }{y(i,{\mathbf{c}})}{\rfloor }\) is also integral.) Note that \(\varphi ({\mathbf{y}})\) has at most as many fractional bricks as \({\mathbf{y}}\) has fractional entries: each \(\{y(i, {\mathbf{c}})\} < 1\), so \({\mathfrak {f}}^i\) is at most the number of fractional entries of \({\mathbf{y}}\) of type i. Call a solution \({\mathbf{x}}\) of (HugeCP) conf-optimal if there is an optimal solution \({\mathbf{y}}\) of (ConfLP) such that \({\mathbf{x}}= \varphi ({\mathbf{y}})\).
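The mapping \(\varphi \), restricted to a single type i, can be sketched as follows (our naming; configurations are represented as tuples):

```python
import math

def phi_bricks(y):
    """Map Conf-LP values for one type i to bricks of the Huge-CP solution.

    y: dict mapping each configuration (a tuple) to its value y(i, c) >= 0.
    Returns (integral_bricks, frac_count, average_config): floor(y(i, c))
    copies of each configuration c, plus frac_count bricks all equal to the
    average configuration built from the fractional parts.
    """
    integral = []
    for c, val in y.items():
        integral += [list(c)] * math.floor(val)
    fracs = {c: val - math.floor(val) for c, val in y.items()}
    # The fractional parts sum to an integer because sum_c y(i, c) = mu^i.
    frac_count = round(sum(fracs.values()))
    if frac_count == 0:
        return integral, 0, None
    t = len(next(iter(y)))
    avg = [sum(fracs[c] * c[j] for c in fracs) / frac_count
           for j in range(t)]
    return integral, frac_count, avg
```

For instance, \(y(i,(0,1)) = 1.5\) and \(y(i,(2,0)) = 0.5\) yield one integral brick \((0,1)\) and one fractional "average" brick \((1, 0.5)\), matching the count of fractional entries of y.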
We are going to introduce an auxiliary objective function \({\hat{f}}\), but we first want to discuss our motivation for doing so. The reader might already see that for any integer solution \({\mathbf{y}}\in \mathbb {Z}^{C}\) of (ConfILP), \({\mathbf{v}}{\mathbf{y}}= f(\varphi ({\mathbf{y}}))\) holds, as we shall prove in Lemma 4. Our natural hope would be that for a fractional optimum \({\mathbf{y}}^*\) of (ConfLP) we would have \({\mathbf{v}}{\mathbf{y}}^* = f(\varphi ({\mathbf{y}}^*))\). However, by convexity of f and the construction of \({\hat{{\mathbf{c}}}}_i\) it only follows that \({\mathbf{v}}{\mathbf{y}}^* \ge f(\varphi ({\mathbf{y}}^*))\). Even worse, there may be two conf-optimal solutions \({\mathbf{x}}\) and \({\mathbf{x}}'\) with \(f({\mathbf{x}}) < f({\mathbf{x}}')\). To overcome this, we define an auxiliary objective function \({\hat{f}}\) with the property that for any conf-optimal solution \({\mathbf{x}}^*\) of (HugeCP) and any optimal solution \({\mathbf{y}}^*\) of (ConfLP), \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\mathbf{x}}^*)\).
Fix a brick \({\mathbf{x}}^j\) of type i. We say that a multiset \(\varGamma ^j \subseteq (\mathcal {C}^i \times \mathbb {R}_{\ge 0})\) is a decomposition of \({\mathbf{x}}^j\) and write \({\mathbf{x}}^j = \sum \varGamma ^j\) if \({\mathbf{x}}^j = \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{{\mathbf{c}}} {\mathbf{c}}\) and \(\sum _{({\mathbf{c}}, \lambda _{{\mathbf{c}}}) \in \varGamma ^j} \lambda _{\mathbf{c}}= 1\). We define the objective \({\hat{f}}({\mathbf{x}})\) for all configurable solutions as \({\hat{f}}({\mathbf{x}}) = \sum _{j=1}^N {\hat{f}}^i({\mathbf{x}}^j)\), where
In a sense, \({\hat{f}}({\mathbf{x}})\) is the value of the minimum (w.r.t. f) interpretation of \({\mathbf{x}}\) as a convex combination of feasible integer solutions. Correspondingly, we call a decomposition \(\varGamma ^j\) of \({\mathbf{x}}^j\) \({\hat{f}}\)-optimal if it is a minimizer of (5). Formally, we let \({\hat{f}}^i({\mathbf{x}}^j) = f^i({\mathbf{x}}^j)\) for a non-configurable \({\mathbf{x}}^j\) in order to make the definition of (HugeCP) valid; however, we are never interested in the value of \({\hat{f}}\) for non-configurable bricks in what follows.
Lemma 2
Let \({\mathbf{x}}\) be a configurable solution of (HugeCP), and \({\mathbf{x}}^j\) be a brick of type i. Then \(f^i({\mathbf{x}}^j) \le {\hat{f}}^i({\mathbf{x}}^j)\). If \({\mathbf{x}}^j\) is integral, then \(f^i({\mathbf{x}}^j) = {\hat{f}}^i({\mathbf{x}}^j)\).
Proof
By convexity of \(f^i\) we have
for any decomposition \(\varGamma ^j\) of \({\mathbf{x}}^j\). If \({\mathbf{x}}^j\) is integral, then \(\varGamma ^j = \{({\mathbf{x}}^j, 1)\}\) is an optimal decomposition (not necessarily the unique one), concluding the proof. \(\square \)
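A one-dimensional toy illustration of Lemma 2 (entirely ours, with two hypothetical configurations): for a fractional brick \(x = 0.5\) with configurations \(\{0, 1\}\) and \(f(x) = x^2\), the unique decomposition gives \({\hat{f}}(0.5) = 0.5 \cdot f(0) + 0.5 \cdot f(1) = 0.5 > f(0.5) = 0.25\), while an integral brick satisfies equality.

```python
def f_hat_1d(x, configs, f):
    """hat f for a single 1-d brick over exactly two configurations.

    Minimizes sum lambda_c f(c) subject to x = sum lambda_c * c and
    sum lambda_c = 1, lambda >= 0.  With two distinct configurations the
    decomposition is unique, so this toy case of LP (5)/(6) has a
    closed-form solution.
    """
    a, b = configs
    if a == b:
        assert x == a
        return f(a)
    lam = (x - b) / (a - b)  # coefficient of a in x = lam*a + (1-lam)*b
    assert 0 <= lam <= 1, "x must lie in conv(configs)"
    return lam * f(a) + (1 - lam) * f(b)
```

By convexity the inequality \(f(x) \le {\hat{f}}(x)\) always holds, and it is strict here precisely because the brick is fractional.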
Moreover, for each \({\mathbf{x}}^j\) there is an \({\hat{f}}\)-optimal decomposition \(\varGamma ^j\) with \(|\varGamma ^j| \le t+1\), since \({\hat{f}}\)-optimal decompositions correspond to optima of a linear program with \(t+1\) equality constraints, namely
Let us describe the relationship of the objective values of the various formulations.
Lemma 3
For any feasible solution \({\tilde{{\mathbf{y}}}}\) of (ConfLP),
Proof
Let \({\tilde{{\mathbf{x}}}} = \varphi ({\tilde{{\mathbf{y}}}})\). We can decompose \({\hat{f}}(\varphi ({\tilde{{\mathbf{y}}}})) = U_1 + U_2\), where \(U_1\) is the cost of the integer bricks of \(\varphi ({\tilde{{\mathbf{y}}}})\) and \(U_2\) is the cost of its fractional bricks. It is easy to see that \(U_1 = {\mathbf{v}}{\lfloor }{{\tilde{{\mathbf{y}}}}}{\rfloor }\) by the equality of \(f^i\) and \({\hat{f}}^i\), for all \(i \in [\tau ]\), over integer vectors. We shall further decompose the value \(U_2\) into the costs of fractional bricks of each type. For each \(i \in [\tau ]\), the cost of each fractional brick of type i is at most \(\frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}f^i({\mathbf{c}})\), because the decomposition \(\left\{ \left( {\mathbf{c}}, \frac{1}{{\mathfrak {f}}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}\right) \Big | {\mathbf{c}}\in \mathcal {C}^i\right\} \) of \({\hat{{\mathbf{c}}}}_i\) (recall that \({\hat{{\mathbf{c}}}}_i = \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}{\mathbf{c}}\)) is merely a feasible (not necessarily optimal) solution of (6). Summing this estimate over all \({\mathfrak {f}}^i\) fractional bricks of type i gives \({\mathfrak {f}}^i \cdot \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}f^i({\mathbf{c}}) = {\mathbf{v}}^i \{{\tilde{{\mathbf{y}}}}^i\}\), concluding the proof. \(\square \)
Lemma 4
Let \({\hat{{\mathbf{y}}}}\) be an optimum of (ConfILP), \({\mathbf{z}}^*\) be an optimum of (HugeIP), \({\mathbf{y}}^*\) be an optimum of (ConfLP), \({\tilde{{\mathbf{x}}}} = \varphi ({\mathbf{y}}^*)\), and \({\mathbf{x}}^*\) be a configurable optimum of (HugeCP). Then
Proof
We have \({\hat{f}}({\mathbf{z}}^*) = f({\mathbf{z}}^*)\) by equality of \({\hat{f}}\) and f on integer solutions (Lemma 2), and \(f({\mathbf{z}}^*) = f(\varphi ({\hat{{\mathbf{y}}}})) = {\mathbf{v}}{\hat{{\mathbf{y}}}}\) by the definition of \(\varphi \) and the fact that \({\hat{{\mathbf{y}}}}\) is an integer optimum. Clearly, \({\mathbf{v}}{\hat{{\mathbf{y}}}} \ge {\mathbf{v}}{\mathbf{y}}^*\), because (ConfLP) is a relaxation of (ConfILP) and thus the former lower bounds the latter.
Let us construct a mapping \(\phi \) for any configurable solution \({\mathbf{x}}\) of (HugeCP). Start with \(\phi ({\mathbf{x}}) = {\mathbf{y}}= {\mathbf{0}}\). For each brick \({\mathbf{x}}^j\) of type i, let \(\varGamma ^j\) be an \({\hat{f}}\)optimal decomposition of \({\mathbf{x}}^j\) and update \(y^i_{\mathbf{c}}:= y^i_{\mathbf{c}}+ \lambda _{\mathbf{c}}\) for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j\). Now it is easy to see that
Our goal is to argue that \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\tilde{{\mathbf{x}}}}) = {\hat{f}}({\mathbf{x}}^*)\). We have \({\hat{f}}({\tilde{{\mathbf{x}}}}) = {\hat{f}}(\varphi ({\mathbf{y}}^*)) \le {\mathbf{v}}{\mathbf{y}}^*\) by (7), but by optimality of \({\mathbf{y}}^*\) and (8) it must be that \({\mathbf{v}}\phi ({\tilde{{\mathbf{x}}}}) = {\hat{f}}({\tilde{{\mathbf{x}}}}) \ge {\mathbf{v}}{\mathbf{y}}^*\) and hence \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\tilde{{\mathbf{x}}}})\). Similarly,
with the “\(=\)” by (8), the first “\(\ge \)” by optimality of \({\mathbf{y}}^*\), and the second “\(\ge \)” by (7). However, since \({\hat{f}}(\varphi ({\mathbf{y}}^*)) \ge {\hat{f}}({\mathbf{x}}^*)\) by optimality of \({\mathbf{x}}^*\), all inequalities are in fact equalities and thus \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\mathbf{x}}^*)\). \(\square \)
Remark 1
We only need the properties of \({\hat{f}}\) that we have proved so far. To gain a little bit more intuition, consider the dual of the LP (6). Notice that the set of right hand sides \({\mathbf{x}}^j\) whose optimum is attained by a particular set of configurations \(\text {supp}(\varvec{ \lambda })\) is a polyhedron. Call such a set a cell. This means that \({\hat{f}}\) is a convex function which is linear in each cell. Another observation is that \({\hat{f}}\) is nonseparable.
We do not have a more intuitive explanation of \({\hat{f}}\). It would be tempting to think that \({\hat{f}}\) is the piecewise linear approximation of f in which, for every \(i \in [Nt]\), we replace each segment of \(f_i\) between two adjacent integers \(k,k+1\) by the affine function going through the points \((k,f_i(k))\) and \((k+1, f_i(k+1))\). However, this turns out to be incorrect: for example, say that \(f_1(x_1) = |x_1 - 1|\) (thus \(f_1(0) = f_1(2) = 1\) and \(f_1(1) = 0\)) and that we set \(x_1 = 2x_2\) for a new integer variable \(x_2\). This constraint ensures that \(x_1\) only takes on even values. Thus, \(x_1\) never attains the value 1 and \({\hat{f}}_1(1) \ge 1\) even though the piecewise linear approximation of \(f_1\) has value 0 at 1.
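This counterexample can be checked mechanically. The following sketch (ours, with a single brick whose configurations are the even values \(\{0,2\}\), as above) solves the two-constraint instance of (6) at \(x_1 = 1\) exactly:

```python
from fractions import Fraction

# Separable convex objective from the text: f1(x1) = |x1 - 1|.
def f(c):
    return abs(c - 1)

# The constraint x1 = 2*x2 restricts configurations to even values;
# within bounds [0, 2] these are exactly {0, 2}.
configs = [0, 2]

# Solve (6) at x = 1 exactly:  minimize l0*f(0) + l2*f(2)
# subject to  l0 + l2 = 1  and  0*l0 + 2*l2 = 1,  l0, l2 >= 0.
# Two equality constraints in two unknowns pin down a unique solution.
x = Fraction(1)
l2 = x / 2            # from 2*l2 = x
l0 = 1 - l2           # from l0 + l2 = 1
assert l0 >= 0 and l2 >= 0
f_hat_at_1 = l0 * f(configs[0]) + l2 * f(configs[1])

# The naive piecewise linear interpolation of f1 gives 0 at x1 = 1,
# but the configuration-based value is 1, confirming f_hat_1(1) >= 1.
print(f_hat_at_1)  # -> 1
```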
Bounding the number of fractional coordinates.
Lemma 5
(Adaptation of [8, Lemma 4.1]) An optimal vertex solution \({\mathbf{y}}^*\) of (ConfLP) has at most 2r fractional coordinates.
Proof
Notice that if a brick \(({\mathbf{y}}^*)^i\) is a vertex of the set \(Q^i := \text {conv}\{{\mathbf{y}}^i \in \mathbb {R}^{\mathcal {C}^i} \mid {\mathbf{1}}{\mathbf{y}}^i = \mu ^i, {\mathbf{y}}^i \ge {\mathbf{0}}\}\), then it is integral. Thus, any brick of \({\mathbf{y}}^*\) which is fractional cannot be a vertex of \(Q^i\) and hence there exists a direction \({\mathbf{e}}^i \in \text {Ker}_{\mathbb {Z}}({\mathbf{1}})\) and a length \(\lambda ^i >0\) such that \(({\mathbf{y}}^*)^i \pm \lambda ^i {\mathbf{e}}^i \in Q^i\). For the sake of contradiction, assume there are \(r+1\) bricks of \({\mathbf{y}}^*\) which contain a fractional coordinate, and let I be the index set of such bricks. Hence we have \({\mathbf{e}}^i, \lambda ^i\) as above for each \(i \in I\). We abuse the notation and treat \(\mathcal {C}^i\) as a matrix whose columns are the configurations. Consider the vectors \(E_1^i \mathcal {C}^i \lambda ^i {\mathbf{e}}^i \in \mathbb {R}^r\): because there are \(r+1\) of them, they are linearly dependent, and, by rescaling, there must be coefficients \({\bar{\varvec{ \lambda }}}\) such that \(|{\bar{\lambda }}^i| \le \lambda ^i\) for each \(i \in I\) and \(\sum _{i \in I} E_1^i \mathcal {C}^i {\bar{\lambda }}^i {\mathbf{e}}^i = {\mathbf{0}}\). Define \({\mathbf{e}}\in \mathbb {R}^C\) (recall that C is the total number of configurations) such that its ith brick is equal to \({\bar{\lambda }}^i {\mathbf{e}}^i\) if \(i \in I\), and is \({\mathbf{0}}\) otherwise. Then \({\mathbf{y}}^* \pm {\mathbf{e}}\) are both feasible solutions of (ConfLP), and thus \({\mathbf{y}}^*\) is not a vertex solution—a contradiction.
So far, we have shown there are at most r fractional bricks of \({\mathbf{y}}^*\). Notice that all we needed for that was \(r+1\) linearly dependent vectors which can be added to some brick in both directions while preserving feasibility. Because \({\mathbf{e}}^i \in \text {Ker}_{\mathbb {Z}}({\mathbf{1}})\) for each \(i \in I\), we can decompose \({\mathbf{e}}^i\) into elements of \(\mathcal {G}({\mathbf{1}})\), which are exactly the vectors with one \(1\) and one \(-1\). Hence, to avoid the contradiction above, there can be at most r vectors \({\mathbf{e}}^i\), and, additionally, all of them must belong to \(\mathcal {G}({\mathbf{1}})\). Thus, the resulting vector \({\mathbf{e}}\) has support of size at most 2r, and \({\mathbf{y}}^*\) has at most 2r fractional coordinates. \(\square \)
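The characterization of \(\mathcal {G}({\mathbf{1}})\) used above can be confirmed by brute force in small dimension. A sketch (ours; the dimension and search box are arbitrary choices made only to keep the enumeration finite, and the full statement is of course unbounded):

```python
from itertools import product

def conformal(g, h):
    """g is conformal to h: entrywise same sign and |g_i| <= |h_i|."""
    return all(gi * hi >= 0 and abs(gi) <= abs(hi) for gi, hi in zip(g, h))

n, box = 3, 2  # dimension and search box, small so the search is finite
# Nonzero integer kernel vectors of the all-ones row (1 1 ... 1).
kernel = [v for v in product(range(-box, box + 1), repeat=n)
          if sum(v) == 0 and any(v)]

# Graver elements are the conformally minimal nonzero kernel vectors.
graver = [g for g in kernel
          if not any(h != g and conformal(h, g) for h in kernel)]

# Every element has exactly one +1 and one -1, as claimed.
assert all(sorted(g) == [-1, 0, 1] for g in graver)
assert len(graver) == 6  # one per ordered pair of coordinates
```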
Finding a confoptimal solution with a small number of fractional bricks.
Our goal is to show that the proximity of any confoptimal solution \({\mathbf{x}}^*\) of (HugeCP) to an integer optimum \({\mathbf{z}}^*\) of (HugeIP) depends on the number of fractional bricks. This number, by the definition of \(\varphi \), depends on the number of fractional coordinates of the corresponding solution \({\mathbf{y}}\) of (ConfLP). The following lemma shows how to produce optima of (ConfLP) with small support. We emphasize that our proximity theorem does not require the fractional solution to be optimal, only confoptimal.
Lemma 6
There is an algorithm that finds an optimal vertex solution \({\mathbf{y}}^*\) of (ConfLP) with \(|\text {supp}({\mathbf{y}}^*)| \le r + \tau \) and at most 2r fractional coordinates, and a confoptimal solution \({\mathbf{x}}^* = \varphi ({\mathbf{y}}^*)\) of (HugeCP) with at most 2r fractional bricks, in time \(g_1(E_2)^{\mathcal {O}(s)} {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }, E\Vert _\infty )\).
Proof
The proof has three parts. First, we describe how to find an optimal basic solution of the dual of (ConfLP). Next, we identify \(r+\tau \) inequalities of this dual which fully determine the optimal dual LP solution. Finally, we show how to use this information to solve (ConfLP) itself.
Recall that \(\tau \) is the number of brick types in the huge Nfold instance. Since (ConfLP) has exponentially many variables, we take the standard approach and solve the dual LP of (ConfLP) by the ellipsoid method and the equivalence of optimization and separation. The Dual LP of (ConfLP) in variables \(\varvec{\alpha } \in \mathbb {R}^r\), \(\varvec{\beta } \in \mathbb {R}^{\tau }\) is:
To verify feasibility of \((\varvec{\alpha }, \varvec{\beta })\), we need, for each \(i \in [\tau ]\), to maximize the lefthand side of (9) over all \({\mathbf{c}}\in \mathcal {C}^i\) and check if it is at most \(\beta ^i\). This corresponds to finding integer variables \({\mathbf{c}}\) which for given \((\varvec{\alpha }, \varvec{\beta })\) solve
This program can be solved in time \(T''' \le g_1(E_2)^{\mathcal {O}(s)} t^3\cdot {{\,\mathrm{\mathrm{poly}}\,}}(\log \Vert {\mathbf{b}}^i, {\mathbf{l}}^i, {\mathbf{u}}^i, E\Vert _\infty )\) [33, Theorem 4].
Grötschel et al. [18, Theorem 6.4.9] show that an optimal solution of an LP (even one which is a vertex [18, Remark 6.5.2]) can be found in a number of calls to a separation oracle which is polynomial in the dimension and the encoding length of the inequalities returned by a separation oracle. Clearly the inequalities (9) have encoding length bounded by \(\log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }\Vert _\infty \) and thus \(T = {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu },E\Vert _\infty )\) calls to a separation oracle are sufficient to find an optimal vertex solution, which amounts to \(T \cdot T'''\) arithmetic operations.
Next, we will identify \(r+\tau \) inequalities determining the previously found optimal vertex solution of the dual of (ConfLP). Observe that the dimension of the dual LP is the number of rows of the primal LP, which is \(r + \tau \). Since each point in \((r+\tau )\)dimensional space is fully determined by \(r+\tau \) linearly independent inequalities, there must exist a subset I of \(r+\tau \) inequalities among the T inequalities considered by the ellipsoid method which fully determines the dual optimum. We can find them as follows.
We initialize I to be the empty set and take the T considered inequalities one by one. We only process inequalities which are satisfied with equality by the given optimal basic solution of the dual LP, and we discard the rest. If, while processing the current inequality, either some inequality of I or the current inequality itself is dominated by an inequality obtainable as a nonnegative linear combination of the others, we discard the dominated one; otherwise, we include the current inequality in I and continue. Testing whether an inequality \({\mathbf{d}}{\mathbf{z}}\le e'\) is dominated by a nonnegative combination of a system of inequalities \(D {\mathbf{z}}\le {\mathbf{e}}\) can be decided by solving
and checking whether the optimal value is at most \(e'\). If it is, then the solution \(\varvec{ \alpha }\) encodes a nonnegative linear combination of the inequalities \(D {\mathbf{z}}\le {\mathbf{e}}\) which yields an inequality dominating \({\mathbf{d}}{\mathbf{z}}\le e'\), and if it is not, then such a combination does not exist. Thus, when a new inequality is considered, we solve (10) for at most \(r+\tau \) inequalities (the new one and the fewer than \(r+\tau \) already selected ones), and there are at most T inequalities considered. The time needed to solve (10) is \({{\,\mathrm{\mathrm{poly}}\,}}(r+\tau , \log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max },E\Vert _\infty )\) because its dimension is at most \(r+\tau \) and its encoding length is at most \(\log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max },E\Vert _\infty \). Altogether, we need time
Finally, let the restricted (ConfLP) be the (ConfLP) restricted to the variables corresponding to the inequalities in I. We claim that an optimal solution to the restricted (ConfLP) is also an optimal solution to (ConfLP). To see this, use LP duality: the optimal objective value of the dual LP restricted to the inequalities in I is the same as that of the full dual LP, and thus an optimal solution of the restricted (ConfLP) must be an optimal solution of (ConfLP). We solve the restricted (ConfLP) using any polynomial-time LP algorithm in time \(T'' \le {{\,\mathrm{\mathrm{poly}}\,}}((r+\tau ), \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, \varvec{ \mu }, {\mathbf{b}}^0,E \Vert _\infty )\). The resulting total time complexity is thus \(T \cdot T''' + T'\) to construct the restricted (ConfLP) instance plus \(T''\) to solve it, i.e., \(T \cdot T''' + T' + T''\) in total, which is upper bounded by \(g_1(E_2)^{\mathcal {O}(s)} {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu },E\Vert _\infty )\), as claimed.
Let \({\mathbf{y}}^*\) be the optimum of (ConfLP) we have thus obtained. Since \(|I| \le r+\tau \), the support of \({\mathbf{y}}^*\) is of size at most \(r+\tau \). By Lemma 5, \({\mathbf{y}}^*\) has at most 2r fractional coordinates. Now setting \({\mathbf{x}}^* = \varphi ({\mathbf{y}}^*)\) is enough, since we have already argued (see the definition of \(\varphi \)) that \({\mathbf{x}}^*\) has at most as many fractional bricks as \({\mathbf{y}}^*\) has fractional coordinates and \({\mathbf{x}}^*\) can be computed from \({\mathbf{y}}^*\) in \(\mathcal {O}(r + \tau )\) time. \(\square \)
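As an illustration of the dominance test via (10) used in the proof above: in the degenerate case where the multiplier system \(D^\top \varvec{ \alpha }= {\mathbf{d}}\) is square and uniquely solvable, the multipliers can be computed directly and no LP solver is needed (the general case of (10) requires one). The concrete numbers below are our own toy example, not from the text:

```python
from fractions import Fraction as F

# Invented toy system D z <= e (two inequalities, two variables):
#   z1 + z2 <= 2
#   z1 - z2 <= 0
D = [[F(1), F(1)],
     [F(1), F(-1)]]
e = [F(2), F(0)]

# Candidate inequality: z1 <= 3/2, i.e. d = (1, 0), e' = 3/2.
d, e_prime = [F(1), F(0)], F(3, 2)

# Square special case: D^T alpha = d reads
#   alpha1 + alpha2 = d1,  alpha1 - alpha2 = d2,
# which has the unique solution below.
alpha1 = (d[0] + d[1]) / 2
alpha2 = (d[0] - d[1]) / 2

# Dominated iff alpha >= 0 and the combined right-hand side e.alpha <= e'.
value = alpha1 * e[0] + alpha2 * e[1]
dominated = alpha1 >= 0 and alpha2 >= 0 and value <= e_prime
print(dominated)  # -> True: (1/2)(z1+z2<=2) + (1/2)(z1-z2<=0) gives z1 <= 1
```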
Proximity theorem
Let us give a plan for the next subsection. We wish to prove that for every confoptimal solution \({\mathbf{x}}^*\) of (HugeCP) there is an integer solution \({\mathbf{z}}^*\) of (HugeIP) nearby. In the following, let \({\mathbf{x}}^*\) be a confoptimal solution of (HugeCP) and \({\mathbf{z}}^*\) be an optimal solution of (HugeIP) minimizing \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\). A technique for proving proximity theorems which was introduced by Eisenbrand and Weismantel [11] works as follows. A vector \({\mathbf{h}}\in \mathbb {Z}^{Nt}\) is called a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) if \({\mathbf{h}}\ne {\mathbf{0}}\), \(E^{(N)} {\mathbf{h}}= {\mathbf{0}}\), and \({\mathbf{h}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\). It is not too difficult to see that if \({\mathbf{x}}'\) is an optimal (not necessarily confoptimal) solution of (HugeCP) with the objective f, then there cannot exist a cycle of \({\mathbf{x}}' - {\mathbf{z}}^*\) (cf. proof of Lemma 9). Based on a certain decomposition of \({\mathbf{x}}' - {\mathbf{z}}^*\) into integer and fractional lower-dimensional vectors and by an application of the Steinitz Lemma, the existence of a cycle is proven unless \(\Vert {\mathbf{x}}' - {\mathbf{z}}^*\Vert _1\) is roughly bounded by the number of fractional bricks of \({\mathbf{x}}'\). However, we cannot apply this technique directly, as an optimal solution \({\mathbf{x}}'\) of (HugeCP) might have many fractional bricks. At the same time, the existence of a cycle \({\mathbf{h}}\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\) does not necessarily contradict the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), because \({\mathbf{x}}^* + {\mathbf{h}}\) might not be a configurable solution, which is an essential part of the argument.
All of this leads us to introduce a stronger notion of a cycle. We say that \({\mathbf{h}}\in \mathbb {Z}^{Nt}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) (with respect to \({\mathbf{x}}^*\)) if (1) \({\mathbf{h}}\) is a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\), (2) for each brick \(j \in [N]\) of type \(i \in [\tau ]\) there exists an \({\hat{f}}\)optimal decomposition \(\varGamma ^j\) of \(({\mathbf{x}}^*)^j\) such that we may write \({\mathbf{h}}^j = \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\), and (3) for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j\) we have \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- ({\mathbf{z}}^*)^j\) and \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\). Soon we will show that if \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) is minimal, then \({\mathbf{x}}^* - {\mathbf{z}}^*\) does not have a configurable cycle. The next task becomes to show how large \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) must be in order for a configurable cycle to exist. Recall that the technique of Eisenbrand and Weismantel [11] can be used to rule out the existence of a (regular) cycle, not a configurable cycle. To overcome this, we “lift” both \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\) to a higher-dimensional space and show that a cycle in this space corresponds to a configurable cycle in the original space. Only then are we ready to prove a proximity bound using the aforementioned technique.
Lemma 7
If \({\mathbf{h}}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\), then \({\mathbf{x}}^* - {\mathbf{h}}\) is configurable.
Proof
Fix \(j \in [N]\). Let \({\mathbf{p}}\) be the brick \(({\mathbf{x}}^* - {\mathbf{h}})^j\) and let \(i \in [\tau ]\) be its type. Now \({\mathbf{p}}\) can be written as \({\mathbf{p}}= \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}})\). Furthermore, since \({\mathbf{h}}_{{\mathbf{c}}} \in \text {Ker}_\mathbb {Z}(E^i_2)\), we have \(E^i_2({\mathbf{c}}- {\mathbf{h}}_{{\mathbf{c}}}) = E^i_2 {\mathbf{c}}= {\mathbf{b}}^j\), and, by \({\mathbf{h}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\), we also have \({\mathbf{l}}\le {\mathbf{x}}^* - {\mathbf{h}}\le {\mathbf{u}}\). \(\square \)
We now need a technical lemma:
Lemma 8
Let \({\mathbf{x}}^*\) be a confoptimal solution of (HugeCP), let \({\mathbf{z}}^*\) be an optimum of (HugeIP), and let \({\mathbf{h}}^*\) be a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\). Then
Proof
We begin by a simple observation: let \(g: \mathbb {R}\rightarrow \mathbb {R}\) be a convex function, \(x \in \mathbb {R}\), \(z \in \mathbb {Z}\), and \(r \in \mathbb {Z}\) be such that \(r \sqsubseteq x-z\) (that is, there is some \(\rho \), \(0 \le \rho \le 1\), such that \(r = \rho \cdot (x-z)\)). By convexity of g we have that
Fix \(j \in [N]\) and \({\mathbf{z}}= ({\mathbf{z}}^*)^j\), \({\mathbf{x}}= ({\mathbf{x}}^*)^j\), \({\mathbf{h}}= ({\mathbf{h}}^*)^j\), and let i be the type of brick j. Since \({\mathbf{h}}^*\) is a configurable cycle, there exists an \({\hat{f}}\)optimal decomposition \(\varGamma \) of \({\mathbf{x}}\) such that, for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \), there exists an \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- {\mathbf{z}}\) with \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\), and \({\mathbf{h}}= \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\). Due to separability of f we may apply (12) independently to each coordinate, obtaining for each \({\mathbf{c}}\)
Since all arguments of \(f^i\) are integral, we immediately get
Aggregating according to \(\varGamma \), we get (recall that we have \(\sum _{({\mathbf{c}}, \lambda _{{\mathbf{c}}}) \in \varGamma } \lambda _{{\mathbf{c}}} = 1\))
where by \({\hat{f}}\)optimality of \(\varGamma \) the righthand side is equal to \({\hat{f}}^i({\mathbf{z}}) + {\hat{f}}^i({\mathbf{x}})\). As for the lefthand side, observe that the decompositions \(\varGamma ' = \{({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}, \lambda _{\mathbf{c}}) \mid ({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \}\) and \(\varGamma '' = \{({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}, \lambda _{\mathbf{c}}) \mid ({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \}\) satisfy \(\sum \varGamma ' = {\mathbf{z}}+ {\mathbf{h}}\) and \(\sum \varGamma '' = {\mathbf{x}}- {\mathbf{h}}\) but are only feasible (not necessarily optimal) solutions of (6). Thus, we have
Combining over \(\varGamma \) then yields
and since we have proven this claim for every brick j, aggregation over bricks concludes the proof of the main claim (11). \(\square \)
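The scalar convexity observation from the beginning of this proof can be sanity-checked numerically; the following sketch (our own choice of test functions and grid) verifies \(g(z+r) + g(x-r) \le g(z) + g(x)\) for all conformal integer \(r \sqsubseteq x - z\):

```python
def check(g):
    # r ⊑ x - z means r = rho*(x - z) for some 0 <= rho <= 1; for an
    # integer r this is exactly: r lies between 0 and x - z (inclusive).
    for z in range(-3, 4):                       # z ranges over integers
        for x in [v / 2 for v in range(-6, 7)]:  # x may be fractional
            lo, hi = sorted((0, x - z))
            for r in range(-3, 4):
                if lo <= r <= hi:
                    # The inequality implied by convexity of g.
                    assert g(z + r) + g(x - r) <= g(z) + g(x) + 1e-9
    return True

# Two convex test functions (our choice): a parabola and a shifted |.|.
assert check(lambda v: v * v)
assert check(lambda v: abs(v - 1))
```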
Let us show that if \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\) are as stated, then there is no configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).
Lemma 9
Let \({\mathbf{x}}^*\) be a confoptimal solution of (HugeCP) and let \({\mathbf{z}}^*\) be an optimal solution of (HugeIP) such that \(\Vert {\mathbf{x}}^*-{\mathbf{z}}^*\Vert _1\) is minimal. Then there is no configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).
Proof
For the sake of contradiction, suppose that there exists a configurable cycle \({\mathbf{h}}^*\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\). By Lemma 8, one of two cases must occur:
Case 1: \({\hat{f}}({\mathbf{z}}^* + {\mathbf{h}}^*) \le {\hat{f}}({\mathbf{z}}^*)\). Then \({\mathbf{z}}^* + {\mathbf{h}}^*\) is an optimal integer solution (by \({\mathbf{h}}^* \sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\) we have \({\mathbf{l}}\le {\mathbf{z}}^* + {\mathbf{h}}^* \le {\mathbf{u}}\) and by \({\mathbf{h}}^* \in \ker _{\mathbb {Z}}\left( E^{(N)}\right) \) we have \(E^{(N)} ({\mathbf{z}}^* + {\mathbf{h}}^*) = {\mathbf{b}}\)) which is closer to \({\mathbf{x}}^*\), a contradiction to the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\).
Case 2: \({\hat{f}}({\mathbf{x}}^* - {\mathbf{h}}^*) < {\hat{f}}({\mathbf{x}}^*)\). Since \({\mathbf{h}}^*\) is a configurable cycle, Lemma 7 states that \({\mathbf{x}}^* - {\mathbf{h}}^*\) is configurable, so we have a contradiction with the confoptimality of \({\mathbf{x}}^*\). \(\square \)
Overview of the remainder of the proof
In order to use existing proximity arguments to bound the norm of a cycle, our plan is to move into an extended (higher-dimensional) space which corresponds to decomposing each brick \({\mathbf{x}}^j\) of \({\mathbf{x}}^*\) into configurations as \({\mathbf{x}}^j = \sum _{{\mathbf{c}}} \lambda _{{\mathbf{c}}} {\mathbf{c}}\); each summand becomes a new brick in the extended space.
We denote this new higher-dimensional representation of \({\mathbf{x}}^*\) with respect to \(\varGamma \) as \(\uparrow {\mathbf{x}}^*\) and call it the rise of \({\mathbf{x}}^*\), and we define similarly the rise of \({\mathbf{z}}^*\) (with respect to a given decomposition of each brick of \({\mathbf{x}}^*\)). The situation gets very delicate at this point.
First, we require that each decomposition of a brick of \({\mathbf{x}}^*\) is optimal with respect to the auxiliary objective \({\hat{f}}\) so that we can use the argument about nonexistence of a cycle. Second, because the proximity bound depends on the number of fractional bricks of \(\uparrow {\mathbf{x}}^*\), we require that the decomposition of each brick is small, i.e., into only few elements. Third, we require that each coefficient \(\lambda _{{\mathbf{c}}}\) is of the form \(1/q_{{\mathbf{c}}}\) for an integer \(q_{{\mathbf{c}}}\), because we need to ensure that, for a corresponding cycle brick \({\mathbf{h}}_{{\mathbf{c}}}\), \(\lambda ^{-1}_{{\mathbf{c}}} {\mathbf{h}}_{{\mathbf{c}}}\) is an integer vector, so \(\lambda ^{-1}_{{\mathbf{c}}}\) has to be an integer. To ensure the second and third condition simultaneously, we first show that there is a decomposition of each brick of size at most \(t+1\) and with each coefficient bounded by P, and then show that each fraction p/q can be written as an Egyptian fraction \(p/q = 1/a_1 + 1/a_2 + \cdots + 1/a_{{\mathfrak {c}}}\) with \({\mathfrak {c}} \le 2\log _2 q\) (Lemmas 10–12). (Bounds on the length of Egyptian fractions have been studied in the past and our bound is not the best possible, but in order to use our proximity theorem, we need exact and not merely asymptotic bounds, so we prove this worse but exact bound of \(2\log _2 q\).) We call a decomposition of a brick satisfying all three criteria given above a small scalable decomposition.
Fix a small scalable decomposition for each brick of \({\mathbf{x}}^*\), and let \(\uparrow \!\!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\) with respect to this decomposition. Since this decomposition is small, \(\uparrow \!\!{\mathbf{x}}^*\) has at most \({{\,\mathrm{\mathrm{poly}}\,}}(\Vert E\Vert _\infty , r, s)\) fractional bricks. Moreover, the other properties above allow us to say the following: if \({\mathbf{r}}\) is a cycle of \(\uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\), then the compression of \({\mathbf{r}}\) back to the original space is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) (Lemma 14). So in order to bound \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), it suffices (by the triangle inequality) to bound \(\Vert \uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\Vert _1\). We do this by adapting the approach of Eisenbrand and Weismantel [11] to bound the length of any cycle \({\mathbf{r}}\) of \(\uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\).
The remainder of the proof
We say that \(|\varGamma |\) is the size of the decomposition. Let us show that for each brick, there exists an \({\hat{f}}\)optimal decomposition whose coefficients have small encoding length and whose size is small. For any matrix A, define \(g_\infty (A) = \max _{{\mathbf{g}}\in \mathcal {G}(A)} \Vert {\mathbf{g}}\Vert _\infty \).
Lemma 10
Each brick of \({\mathbf{x}}^*\) of type i has an \({\hat{f}}\)optimal decomposition \(\varGamma \)

1.
of size at most \(t+1\), and

2.
\(\max _{({\mathbf{c}}, \lambda _{{\mathbf{c}}} = p_{{\mathbf{c}}} / q_{{\mathbf{c}}}) \in \varGamma }\{p_{\mathbf{c}},q_{\mathbf{c}}\} \le (t+1)! \, ((2t-2) g_\infty (E^i_2))^{t+1} \le (t+1)^{(t+1)}(g_1(E_2))^{(t+2)} \le (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}\).
Proof
An \({\hat{f}}\)optimal decomposition corresponds to a solution of the LP (6). We will argue that there is a solution whose support is composed of columns which do not differ by much, which corresponds to a solution of an LP with small coefficients, and the claimed bound can then be obtained by Cramer’s rule.
Specifically, we claim that there exists an \({\hat{f}}\)optimal decomposition \(\varGamma \) which corresponds to an optimal solution \(\varvec{ \lambda }\) of (6) such that there exists a point \(\varvec{ \zeta }\in \mathbb {Z}^t\) with the property that if \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\), then \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le (t-1) g_\infty (E^i_2)\). For a solution \(\varvec{ \lambda }\) of (6), define \(R':= \max _{{\mathbf{c}}, {\mathbf{c}}' \in \text {supp}(\varvec{ \lambda })} \Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty \) to be the longest side of the bounding box of all \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\). For a point \(\varvec{ \zeta }\in \mathbb {Z}^t\) and a configuration \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\), say that a coordinate \(j \in [t]\) is tight if \(c_j = \zeta _j - {\lceil }{\frac{R'}{2}}{\rceil }\) or \(c_j = \zeta _j + {\lceil }{\frac{R'}{2}}{\rceil }\), and define \(S = \sum _{{\mathbf{c}}\in \text {supp}(\varvec{ \lambda })} \sum _{j=1}^t \lambda _{{\mathbf{c}}} [j \text { is tight in } {\mathbf{c}}]\) (where “[X]” is an indicator of the statement X) to be the weighted number of tight coordinates. Now let \(\varvec{ \zeta }\in \mathbb {Z}^t\) be any point which is an integer center of the bounding box (i.e., \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le {\lceil }{\frac{R'}{2}}{\rceil }\) for all \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\)) and which minimizes S. For contradiction, assume that \(\varvec{ \lambda }\) is an optimal solution of (6) which minimizes \(R'\) and S (lexicographically in this order) and \(R' > (2t-2)g_\infty (E^i_2)\).
Assuming \(\varGamma \) is a decomposition of a brick of type i, we have \({\mathbf{c}}, {\mathbf{c}}' \in \mathcal {C}^i = \{{\tilde{{\mathbf{c}}}} \in \mathbb {Z}^t \mid E^i_2 {\tilde{{\mathbf{c}}}} = {\mathbf{b}}^i, \, {\mathbf{l}}^i \le {\tilde{{\mathbf{c}}}} \le {\mathbf{u}}^i\}\) and thus \({\mathbf{c}}- {\mathbf{c}}' \in \text {Ker}_{\mathbb {Z}}(E^i_2)\). By Proposition 1 we may write \({\mathbf{c}}- {\mathbf{c}}' = \sum _{j=1}^{2t-2} \gamma _j {\mathbf{g}}_j\) with \({\mathbf{g}}_j \in \mathcal {G}(E^i_2)\) and \({\mathbf{g}}_j \sqsubseteq {\mathbf{c}}- {\mathbf{c}}'\) for all \(j \in [2t-2]\). Note that because \(\Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty > R := (2t-2)g_\infty (E^i_2)\), there exists \(j \in [2t-2]\) such that \(\gamma _j > 1\). Hence \({\mathbf{g}}:= \sum _{j=1}^{2t-2} \lfloor \frac{\gamma _j}{2}\rfloor {\mathbf{g}}_j\) satisfies \({\mathbf{g}}\ne {\mathbf{0}}\). Let \({\bar{{\mathbf{c}}}} := {\mathbf{c}}- {\mathbf{g}}\), and \({\bar{{\mathbf{c}}}}' := {\mathbf{c}}' + {\mathbf{g}}\).
First, because \({\bar{{\mathbf{c}}}}- {\bar{{\mathbf{c}}}}' = ({\mathbf{c}}- {\mathbf{c}}') - 2{\mathbf{g}}= \sum _{j=1}^{2t-2} (\gamma _j - 2\lfloor \frac{\gamma _j}{2}\rfloor ) {\mathbf{g}}_j\), we may bound \(\Vert {\bar{{\mathbf{c}}}}- {\bar{{\mathbf{c}}}}'\Vert _\infty \le (2t-2) g_\infty (E^i_2) = R\). Second, by the conformality of the decomposition, \({\bar{{\mathbf{c}}}}, {\bar{{\mathbf{c}}}}' \in \mathcal {C}^i\). Third, by separable convex superadditivity (Proposition 2), we have that \(f({\mathbf{c}}) + f({\mathbf{c}}') \ge f({\bar{{\mathbf{c}}}}) + f({\bar{{\mathbf{c}}}}')\). Fourth, there exists a coordinate \(j \in [t]\) such that \(|c_j - c'_j|=R'\) but, since \(\Vert {\bar{{\mathbf{c}}}}- {\bar{{\mathbf{c}}}}'\Vert _\infty \le R\), \(|{\bar{c}}_j - {\bar{c}}'_j| \le R < R'\) and thus j is no longer a tight coordinate for at least one of \({\bar{{\mathbf{c}}}}, {\bar{{\mathbf{c}}}}'\), and no new tight coordinates can be introduced because \(R < R'\). Without loss of generality, let \(\lambda _{{\mathbf{c}}} \le \lambda _{{\mathbf{c}}'}\). Now initialize \(\varvec{ \lambda }' := \varvec{ \lambda }\) and modify it by setting \(\lambda '_{{\bar{{\mathbf{c}}}}}, \lambda '_{{\bar{{\mathbf{c}}}}'} := \lambda _{{\mathbf{c}}}\), \(\lambda '_{{\mathbf{c}}} := 0\), \(\lambda '_{{\mathbf{c}}'} := \lambda _{{\mathbf{c}}'} - \lambda _{{\mathbf{c}}}\). By our arguments above, \(\varvec{ \lambda }'\) is another optimal solution of (6), but the weighted number S of tight coordinates has decreased by the fourth point, a contradiction.
Thus, there exists a point \(\varvec{ \zeta }\in \mathbb {Z}^t\) and an optimal solution \(\varvec{ \lambda }\) of (6) such that for each \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\), it holds that \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le R/2 = (t-1) g_\infty (E^i_2)\). We obtain the following reduced LP from (6) by deleting all columns \({\mathbf{c}}\) with \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty > R/2\), and denote the remaining set of columns by \({\bar{\mathcal {C}}}^i\):
This LP is equivalent to one obtained by subtracting \(\varvec{ \zeta }\) from all columns and the right hand side:
Now, this LP has \(t+1\) rows and its columns have the largest coefficient bounded by R/2 in absolute value. A basic solution \(\varvec{ \lambda }\) has \(|\text {supp}(\varvec{ \lambda })| \le t+1\) and, by Cramer’s rule, the denominator of each \(\lambda _{\mathbf{c}}\) is bounded by \((t+1)!\) times the largest coefficient to the power of \(t+1\), thus bounded by
In the worst case, we can bound this as
where we use
[12, Lemma 2]. \(\square \)
Next, we will need the notion of an Egyptian fraction. For a rational number p/q, \(p,q \in \mathbb {N}\), its Egyptian fraction is a finite sum of distinct unit fractions such that
for \(q_1, \dots , q_k \in \mathbb {N}\) distinct. Call the number of terms k the length of the Egyptian fraction. Vose [43] has proven that any p/q has an Egyptian fraction of length \(\mathcal {O}(\sqrt{\log q})\). Since our algorithm requires an exact bound, we present the following weaker yet exact result:
Lemma 11
(Egyptian Fractions) Let \(p, q \in \mathbb {N}\), \(1 \le p < q\). Then p/q has an Egyptian fraction of length at most \(2(\log _2 q)+1\) and all denominators are at most \(q^2\).
Proof
Let \(a=2^k\) be largest such that \(a < q\), so \(k={\lceil }{(\log _2 q)-1}{\rceil } < \log _2 q\). Write \(ap = bq+r\), \(0 \le r < q\). Note that \(p< q \implies b < a\) and \(q \le 2a \implies r < 2a\). Now let \((b_{k-1}, \dots , b_1, b_0)\) be the binary representation of \(b < a\), so \(b=\sum _{i=0}^{k-1} 2^i b_i\), and let \((r_k, \dots , r_1, r_0)\) be that of \(r < 2a\), so \(r=\sum _{i=0}^{k} r_i 2^i\). Then we have
where \(b_i, r_i \in \{0,1\}\), so a sum of at most \(2k+1 \le 2 (\log _2 q)+1\) terms with all denominators \(d_i \le q 2^k = qa \le q^2\). Moreover, all denominators in the first sum are distinct and at most \(2^k\), and all in the second sum are distinct and at least \(q > 2^k\), hence all distinct, so this is an Egyptian fraction of p/q of length \(2(\log _2 q)+1\) and denominators are at most \(q^2\). \(\square \)
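The proof is constructive and transcribes directly into code. A sketch (ours), using exact rational arithmetic to verify the claimed value, distinctness, length, and denominator bounds:

```python
from fractions import Fraction
from math import log2

def egyptian(p, q):
    """Egyptian fraction of p/q (1 <= p < q) via the binary-expansion
    construction of the proof; returns the list of denominators."""
    assert 1 <= p < q
    k = 0
    while 2 ** (k + 1) < q:       # a = 2^k largest with a < q
        k += 1
    b, r = divmod(p * 2 ** k, q)  # a*p = b*q + r with 0 <= r < q
    denoms = []
    for i in range(k):            # b/a = sum of 1/2^(k-i) over set bits of b
        if (b >> i) & 1:
            denoms.append(2 ** (k - i))
    for i in range(k + 1):        # r/(a*q) = sum of 1/(q*2^(k-i)) over set bits of r
        if (r >> i) & 1:
            denoms.append(q * 2 ** (k - i))
    return denoms

for p, q in [(1, 2), (3, 7), (5, 6), (17, 29)]:
    ds = egyptian(p, q)
    assert sum(Fraction(1, d) for d in ds) == Fraction(p, q)  # correct value
    assert len(set(ds)) == len(ds)                            # distinct denominators
    assert max(ds) <= q * q                                   # denominators <= q^2
    assert len(ds) <= 2 * log2(q) + 1                         # claimed length bound
```

For example, `egyptian(3, 7)` returns the denominators `[4, 28, 7]`, i.e., \(3/7 = 1/4 + 1/28 + 1/7\).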
Recall that our goal is to obtain a configurable cycle. However, for that we also need a special form of a decomposition. Say that \(\varGamma \) is a scalable decomposition of a brick \(({\mathbf{x}}^*)^j\) of type i if it is an \({\hat{f}}\)optimal decomposition and, for each \(({\mathbf{c}}_\gamma , \lambda _\gamma ) \in \varGamma \), \(\lambda _\gamma \) is of the form \(1/q_{\gamma }\) for some \(q_{\gamma } \in \mathbb {N}\). We note that in what follows we do not need an algorithm computing a scalable decomposition, only the following existence statement.
Lemma 12
Each brick of \({\mathbf{x}}^*\) has a scalable decomposition of size at most \(\kappa _1 \cdot t^3 \log (t\Vert E_2\Vert _\infty )\), where \(\kappa _1 = 52\).
Proof
Fix \(j \in [N]\). Let \({\mathbf{x}}= ({\mathbf{x}}^*)^j\) be a brick of \({\mathbf{x}}^*\) of type i. By Lemma 10, there exists an \({\hat{f}}\)-optimal decomposition of \({\mathbf{x}}\) of size \(t+1\) where each coefficient \(\lambda _{\mathbf{c}}=p_{\mathbf{c}}/q_{\mathbf{c}}\) satisfies \(p_{\mathbf{c}},q_{\mathbf{c}}\le (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}\). For each \({\mathbf{c}}\) in the decomposition now express \(\lambda _{\mathbf{c}}\) as an Egyptian fraction \(\lambda _{\mathbf{c}} = \sum _{\ell =1}^{k_{\mathbf{c}}} 1/q_{{\mathbf{c}},\ell }\). By Lemma 11, each such Egyptian fraction has length \(k_{\mathbf{c}} \le 2(\log _2 q_{\mathbf{c}})+1 \le 25st \log (st\Vert E^i_2\Vert _\infty )\).
Thus, the resulting decomposition has size at most \((t+1) \cdot 25st \log (st\Vert E^i_2\Vert _\infty ) \le 2 \cdot 26t^3 \log (t\Vert E^i_2\Vert _\infty )\) (since \(s \le t\), dropping the s inside the \(\log \) costs at most a factor of 2, so the last bound holds) and is scalable, since each coefficient is of the form \(1/q_{\gamma }\) for some \(q_{\gamma } \in \mathbb {N}\). \(\square \)
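For illustration, the coefficients of an \({\hat{f}}\)-optimal decomposition can be split into unit fractions mechanically. The sketch below (names ours) uses the classical greedy Fibonacci–Sylvester expansion rather than the binary method of Lemma 11, so it carries no length or denominator guarantee, but it shows how a decomposition becomes scalable:

```python
from fractions import Fraction

def make_scalable(decomposition):
    """Given a decomposition as (configuration, Fraction) pairs summing to
    the brick, return an equivalent list whose coefficients are all unit
    fractions 1/q, via the greedy Egyptian-fraction expansion."""
    out = []
    for conf, lam in decomposition:
        rest = lam
        while rest > 0:
            d = -(-rest.denominator // rest.numerator)  # ceil(1/rest)
            out.append((conf, Fraction(1, d)))
            rest -= Fraction(1, d)
    return out
```

Each configuration keeps its total weight; only the coefficients are split, e.g. \(2/5\) becomes \(1/3 + 1/15\).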
We will now show that we are guaranteed a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) if there exists an analogue of a regular cycle of a certain “lifting” of \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\).
Fix for each brick of \({\mathbf{x}}^*\) a scalable decomposition \(\varGamma ^j\). Let \(\uparrow \!\!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\), defined as the vector obtained from \({\mathbf{x}}^*\) by keeping every integer brick \(({\mathbf{x}}^*)^j\) and replacing every fractional brick \(({\mathbf{x}}^*)^j\) with \(|\varGamma ^j|\) terms \(\lambda _\gamma {\mathbf{c}}_\gamma \), one for each \(({\mathbf{c}}_\gamma , \lambda _\gamma ) \in \varGamma ^j\). Observe that each brick of \(\uparrow \!\! {\mathbf{x}}^*\) is of the form \(\lambda _{{\mathbf{c}}} {\mathbf{c}}\) for some configuration \({\mathbf{c}}\) and some coefficient \(0 \le \lambda _{{\mathbf{c}}} \le 1\). Thus, for a brick \(\lambda _{{\mathbf{c}}} {\mathbf{c}}\) we say that \({\mathbf{c}}\) is its configuration, \(\lambda _{{\mathbf{c}}}\) is its coefficient, and its type is identical to the type of the brick it originated from; in particular, bricks which originated from an integer brick \({\mathbf{p}}= ({\mathbf{x}}^*)^j\) are of the form \(\lambda _{{\mathbf{p}}} {\mathbf{p}}\) with \(\lambda _{{\mathbf{p}}} = 1\). Let \(N'\) be the number of bricks of \(\uparrow \!\! {\mathbf{x}}^*\) and define a mapping \(\nu : [N'] \rightarrow [N]\) such that if a brick \(j \in [N']\) of \(\uparrow \!\! {\mathbf{x}}^*\) was defined from brick \(\ell \in [N]\) of \({\mathbf{x}}^*\), then \(\nu (j) = \ell \). The natural inverse \(\nu ^{-1}\) is defined such that, for \(\ell \in [N]\), \(\nu ^{-1}(\ell )\) is the set of bricks of \(\uparrow \!\! {\mathbf{x}}^*\) which originated from \(({\mathbf{x}}^*)^\ell \).
Lemma 13
The vector \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks, where \(\kappa _2 = 2\kappa _1\).
Proof
By Lemma 6 there is a conf-optimal \({\mathbf{x}}^*\) with at most 2r fractional bricks. By Lemma 12, for each fractional brick of \({\mathbf{x}}^*\) of type i there is a scalable decomposition of size at most \(\kappa _1 \cdot t^3 \log (t\Vert E^i_2\Vert _\infty ) \le \kappa _1 \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\). Thus, \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _1 \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks for each fractional brick of \({\mathbf{x}}^*\), of which there are at most 2r, totaling \(2\kappa _1 \cdot r\cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks. \(\square \)
Denote by \(\uparrow \!\!{\mathbf{z}}^* \in \mathbb {R}^{N't}\) the rise of \({\mathbf{z}}^*\) (with respect to \({\mathbf{x}}^*\)), defined as follows. Let \(j \in [N']\), \(\ell = \nu (j)\), and let \(\lambda \) be the coefficient of the jth brick of \(\uparrow \!\! {\mathbf{x}}^*\). Then the jth brick of \(\uparrow \!\! {\mathbf{z}}^*\) is \((\uparrow \!\! {\mathbf{z}}^*)^j := \lambda ({\mathbf{z}}^*)^{\ell }\). Observe that \(\Vert \!\!\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\Vert _1 \ge \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) by applying the triangle inequality to each brick and its decomposition individually and aggregating.
For any vector \({\mathbf{x}}\in \mathbb {R}^{N't}\), define the fall of \({\mathbf{x}}\) as the vector \(\downarrow \!\!{\mathbf{x}}\in \mathbb {R}^{Nt}\) such that for \(\ell \in [N]\), \((\downarrow \!\!{\mathbf{x}})^\ell = \sum _{j \in \nu ^{-1}(\ell )} {\mathbf{x}}^j\). We see that \(\downarrow \!\! (\uparrow \!\! {\mathbf{x}}^*) = {\mathbf{x}}^*\) and \(\downarrow \!\! (\uparrow \!\! {\mathbf{z}}^*) = {\mathbf{z}}^*\). Say that \({\mathbf{r}}\) is a cycle of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\) if \({\mathbf{r}}\sqsubseteq \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\) and \({\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N')})\).^{Footnote 3}
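On a toy representation, the rise and fall operations are straightforward list manipulations. The data layout below is our own illustration: a solution is a list of bricks (tuples of exact rationals), and each fractional brick carries its scalable decomposition:

```python
from fractions import Fraction

def rise(bricks, decompositions):
    """bricks: list of brick vectors (tuples of Fractions); decompositions:
    dict brick-index -> list of (coefficient 1/q, configuration) pairs for
    the fractional bricks. Returns the lifted bricks and the map nu."""
    lifted, nu = [], []
    for ell, brick in enumerate(bricks):
        if ell in decompositions:
            for lam, conf in decompositions[ell]:
                lifted.append(tuple(lam * c for c in conf))
                nu.append(ell)
        else:  # integer brick: kept as is, with coefficient 1
            lifted.append(brick)
            nu.append(ell)
    return lifted, nu

def fall(lifted, nu, n_bricks):
    """Inverse aggregation: sum the lifted bricks sharing an origin."""
    out = [None] * n_bricks
    for j, brick in enumerate(lifted):
        ell = nu[j]
        if out[ell] is None:
            out[ell] = list(brick)
        else:
            out[ell] = [a + b for a, b in zip(out[ell], brick)]
    return [tuple(b) for b in out]
```

By construction, `fall` undoes `rise`: `fall(*rise(bricks, decs), len(bricks))` recovers `bricks` whenever each decomposition sums to its brick.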
Lemma 14
If \({\mathbf{r}}\) is a cycle of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\), then \(\downarrow \!\!{\mathbf{r}}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).
Proof
To show that \(\downarrow \!\!{\mathbf{r}}\) is a configurable cycle, we need to show that (1) \(\downarrow \!\!{\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N)})\) and (2) for each brick \({\mathbf{x}}= ({\mathbf{x}}^*)^j\) of \({\mathbf{x}}^*\), there is an \({\hat{f}}\)-optimal decomposition of \({\mathbf{x}}\) such that \({\mathbf{h}}= (\downarrow \!\! {\mathbf{r}})^j\) decomposes accordingly. For the first part, \(\downarrow \!\! {\mathbf{r}}\) is integral because it is obtained by summing bricks of \({\mathbf{r}}\), which is integral. Denote by i(j) the type of a brick j (we abuse this notation: i(j) for \(j \in [N]\) may differ from i(j) for \(j \in [N']\), but context always makes clear what we mean). By the fact that \({\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N')})\) and the definition of \(\downarrow \!\! {\mathbf{r}}\), we have \({\mathbf{0}}= \sum _{j=1}^{N'} E^{i(j)}_1 {\mathbf{r}}^j = \sum _{j=1}^{N} E^{i(j)}_1 (\downarrow \!\! {\mathbf{r}})^j\), and, for each \(\ell \in [N]\), \({\mathbf{0}}= \sum _{j \in \nu ^{-1}(\ell )} E^{i(j)}_2 {\mathbf{r}}^j = E^{i(\ell )}_2(\downarrow \!\!{\mathbf{r}})^\ell \); thus \(\downarrow \!\! {\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N)})\).
To see the second part, fix a brick \(j \in [N]\) of type i and let \({\mathbf{x}}= ({\mathbf{x}}^*)^j\), \({\mathbf{z}}= ({\mathbf{z}}^*)^j\), and \({\mathbf{h}}= (\downarrow \!\! {\mathbf{r}})^j\). We need to show that \({\mathbf{h}}= \sum _{\gamma \in \nu ^{-1}(j)} {\mathbf{h}}_\gamma \) can be written as \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\) with \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- {\mathbf{z}}\) and \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\). By the definition of \(\uparrow \!\! {\mathbf{x}}^*\) and \({\mathbf{r}}\), there is a scalable decomposition \(\varGamma \) of \({\mathbf{x}}\) (namely the one used to define \(\uparrow \!\! {\mathbf{x}}^*\)) such that for each \(\gamma \in \nu ^{-1}(j)\), \({\mathbf{h}}_\gamma \sqsubseteq \lambda _\gamma ({\mathbf{c}}_\gamma - {\mathbf{z}})\) and \({\mathbf{h}}_\gamma \in \text {Ker}_\mathbb {Z}(E^i_2)\). Thus we may write \({\mathbf{h}}= \sum _{\gamma \in \nu ^{-1}(j)} \lambda _\gamma \cdot (\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma )\) with \(\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma \sqsubseteq {\mathbf{c}}_\gamma - {\mathbf{z}}\) and \(\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma \) integral by the fact that \(\lambda _\gamma = 1/q_{\gamma }\) with \(q_{\gamma } \in \mathbb {N}\), concluding the proof. \(\square \)
We are finally ready to use the Steinitz Lemma to derive a bound on \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\).
Theorem 2
Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there exists an optimal solution \({\mathbf{z}}^*\) of (HugeIP) such that
Proof
Denote by \({\bar{E}}_1\) the first r rows of the matrix \(E^{(N)}\). Let \({\mathbf{z}}^*\) be an optimal integer solution such that \(\Vert {\mathbf{z}}^* - {\mathbf{x}}^*\Vert _1\) is minimal, let \(\uparrow \!\!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\) with at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks (see Lemma 13), let \(\uparrow \!\! {\mathbf{z}}^*\) be the rise of \({\mathbf{z}}^*\) with respect to \({\mathbf{x}}^*\), and let \({\mathbf{q}}= \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\).
We want to get into the setting of the Steinitz Lemma, that is, to obtain a sequence of vectors with small \(\ell _1\)-norm summing up to zero. To this end, we shall decompose \({\bar{E}}_1{\mathbf{q}}\) in the following way; we stress that we have \({\bar{E}}_1{\mathbf{q}}= {\mathbf{0}}\). For every integral brick \({\mathbf{q}}^i\) of type \(\ell \in [\tau ]\) we have its decomposition \({\mathbf{q}}^i = \sum _j {\mathbf{g}}^i_j\) into elements of \(\mathcal {G}(E^\ell _2)\) by the Positive Sum Property (Proposition 1); for each \({\mathbf{g}}^i_j\) append \(E^\ell _1{\mathbf{g}}^i_j\) to the sequence. For every fractional brick \({\mathbf{q}}^i\) of type \(\ell \in [\tau ]\) we have its decomposition \({\mathbf{q}}^i = \sum _{j=1}^{t} \alpha _j {\mathbf{g}}^i_j\), \(\alpha _j \ge 0\) for each j, into elements of \(\mathcal {C}(E^\ell _2)\); for each \({\mathbf{g}}^i_j\) append \({\lfloor }{\alpha _j}{\rfloor }\) copies of \(E^\ell _1 {\mathbf{g}}^i_j\) to the sequence, and finally append \(E^\ell _1 \{\alpha _j\} {\mathbf{g}}^i_j\). Observe that since \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks (Lemma 13), so does \({\mathbf{q}}\), and thus we have appended \({\mathfrak {f}} \le t \cdot \kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty ) = \kappa _2 \cdot r \cdot t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional vectors into the sequence. Now we have a sequence
with m integer vectors \({\mathbf{o}}_1, \dots , {\mathbf{o}}_m\) and \({\mathfrak {f}}\) fractional vectors \({\mathbf{p}}_{m+1}, \dots , {\mathbf{p}}_{m+{\mathfrak {f}}}\). Moreover, since, for each \(i \in [\tau ]\), \(\mathcal {C}(E^i_2) \subseteq \mathcal {G}(E^i_2)\),
each vector has \(\ell _\infty \)-norm at most \(\Vert E^1_1,\dots ,E^\tau _1\Vert _\infty \cdot g_1(E_2)\) and they sum up to \({\mathbf{0}}\). Observe that \((m+{\mathfrak {f}})\cdot g_1(E_2) \ge \Vert {\mathbf{q}}\Vert _1 = \Vert \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\Vert _1 \ge \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\). We now focus on bounding \(m+{\mathfrak {f}}\). The Steinitz Lemma (Lemma 1) implies that there exists a permutation \(\pi \) such that the sequence (15) can be rearranged as
where \({\mathbf{v}}_i\) is \({\mathbf{o}}_{\pi ^{-1}(i)}\) if \(i \in [1,m]\) and \({\mathbf{p}}_{\pi ^{-1}(i)}\) if \(i \in [m+1, m+{\mathfrak {f}}]\), respectively, and for each \(1 \le k \le m+{\mathfrak {f}}\) the prefix sum \({\mathbf{t}}_k := \sum _{i=1}^k {\mathbf{v}}_{i}\) satisfies \(\Vert {\mathbf{t}}_k\Vert _\infty \le r \Vert E_1\Vert _\infty g_1(E_2)\).
We will now argue that there cannot be indices \(1 \le k_1< \cdots < k_{{\mathfrak {f}}+2} \le {\mathfrak {f}}+m\) with \({\mathbf{t}}_{k_1} = {\mathbf{t}}_{k_2} = \cdots = {\mathbf{t}}_{k_{{\mathfrak {f}}+2}},\)
which implies that \({\mathfrak {f}}+m\) is bounded by \({\mathfrak {f}}+1\) times the number of integer points of norm at most \(r \Vert E_1\Vert _\infty g_1(E_2)\) and therefore,
Assume for contradiction that there exist \({\mathfrak {f}}+2\) indices \(1 \le k_1< \cdots < k_{{\mathfrak {f}}+2} \le {\mathfrak {f}}+m\) satisfying (17). By the pigeonhole principle, there is an index \(k_\ell \) such that all the vectors \({\mathbf{v}}_{k_{\ell }+1},\dots ,{\mathbf{v}}_{k_{\ell +1}}\) from the rearrangement (16) correspond to integer vectors \({\mathbf{o}}_{\pi ^{-1}(p)}\) for \(p \in [k_{\ell }+1, k_{\ell +1}]\). We will show that this collection of vectors corresponds to a cycle \({\mathbf{h}}\) of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\), which by the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) and Lemmas 9 and 14 is impossible. To obtain the cycle, for each \(p \in [k_{\ell }+1, k_{\ell +1}]\), let i(p), j(p), and \(\ell (p)\) be such that \({\mathbf{o}}_{\pi ^{-1}(p)} = E^{\ell (p)}_1 {\mathbf{g}}_{j(p)}^{i(p)}\). Initialize \({\mathbf{h}}:= {\mathbf{0}}\in \mathbb {Z}^{N't}\) and, for each \(p \in [k_{\ell }+1, k_{\ell +1}]\), let \({\mathbf{h}}^{i(p)} := {\mathbf{h}}^{i(p)} + {\mathbf{g}}_{j(p)}^{i(p)}\). Now we check that \({\mathbf{h}}\) is, in fact, a cycle. First, to see that \(E^{(N')} {\mathbf{h}}= {\mathbf{0}}\), we have \(E^\ell _2 {\mathbf{h}}^i = {\mathbf{0}}\) for every brick \(i \in [N']\) of type \(\ell \) by the fact that \({\mathbf{h}}^i\) is a sum of \({\mathbf{g}}_j^i \in \mathcal {G}(E^\ell _2) \subseteq \text {Ker}_{\mathbb {Z}}(E^\ell _2)\), and we have \({\bar{E}}_1 {\mathbf{h}}= {\mathbf{0}}\) by the fact that \({\mathbf{t}}_{k_\ell } = {\mathbf{t}}_{k_{\ell +1}}\) and thus \(\sum _{p = k_\ell +1}^{k_{\ell +1}} E^{\ell (p)}_1 {\mathbf{g}}_{j(p)}^{i(p)} ={\mathbf{0}}\). Second, \({\mathbf{h}}\sqsubseteq {\mathbf{q}}\) because, for every brick \(i \in [N']\), \({\mathbf{h}}^i\) is a sign-compatible sum of elements \({\mathbf{g}}^i_j \sqsubseteq {\mathbf{q}}^i\). \(\square \)
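The role of the Steinitz Lemma can be checked empirically on tiny instances: for zero-sum vectors in dimension d, some reordering keeps every prefix sum within d times the largest \(\ell _\infty \)-norm. The following exhaustive check is only an illustration of the statement, not the lemma's constructive proof:

```python
from itertools import permutations

def best_prefix_bound(vectors):
    """Smallest achievable max_k ||v_1 + ... + v_k||_inf over all
    orderings of the given zero-sum vectors (exhaustive search,
    feasible only for tiny inputs)."""
    assert all(sum(col) == 0 for col in zip(*vectors))
    best = float("inf")
    for perm in permutations(vectors):
        worst, acc = 0, [0] * len(vectors[0])
        for v in perm:
            acc = [a + x for a, x in zip(acc, v)]
            worst = max(worst, max(abs(a) for a in acc))
        best = min(best, worst)
    return best
```

For the four vectors \((\pm 1, \pm 1)\) in dimension 2 with maximum norm 1, the Steinitz bound guarantees an ordering with all prefix sums of norm at most 2; the exhaustive search even finds one achieving 1.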
Improving the proximity theorem when I has identical columns
In this section we will show how to construct a huge N-fold instance \(I'\) from any input instance I such that the number of columns of \(I'\) per brick is at most \((2\Vert E\Vert _\infty +1)^{r+s}\), and in some sense I and \(I'\) are equivalent. Specifically, we will show a mapping between the solutions of I and \(I'\) which maps integer or configurable optima of I to integer or configurable optima of \(I'\) and vice versa, and such that proximity bounds from \(I'\) can be transferred to I. This will eventually allow us to show that even if I has very large t, we can bound the distance between a configurable optimum and some integer optimum of I by a function independent of t.
Construction of \(I'\).
Note that \((2\Vert E\Vert _\infty +1)^{r+s}\) is the number of distinct \((r+s)\)-dimensional integer vectors with entries bounded by \(\Vert E\Vert _\infty \) in absolute value, hence the number of possible distinct columns per brick. We will show how to “join” variables corresponding to identical columns. Consider any IP with a separable convex objective where the columns corresponding to variables \(x_1\) and \(x_2\) are identical. Let \(f_1\) and \(f_2\) be the objective functions corresponding to \(x_1\) and \(x_2\), and \(l_1, l_2\) and \(u_1, u_2\) be their lower and upper bounds, respectively. Let \(x_{12}\) be a new variable which replaces \(x_1, x_2\) in \(I'\). Set the lower bound of \(x_{12}\) to be \(l_{12}=l_1 + l_2\), its upper bound to \(u_{12} = u_1 + u_2\), and define its objective function as the \((\min ,+)\)-convolution of \(f_1\) and \(f_2\): \(f_{12}(x_{12}) = \min \{f_1(x_1) + f_2(x_2) \mid x_1 + x_2 = x_{12},\, l_1 \le x_1 \le u_1,\, l_2 \le x_2 \le u_2\}.\)
Note that if \(f_1\) and \(f_2\) are convex, then \(f_{12}\) is also convex. Extend \(f_{12}\) to fractional values by linear interpolation, that is, for \(x_{12} = {\lfloor }{x_{12}}{\rfloor } + \{x_{12}\}\) fractional, let \(f_{12}(x_{12})\) be \(f_{12}({\lfloor }{x_{12}}{\rfloor }) + \{x_{12}\} (f_{12}({\lceil }{x_{12}}{\rceil }) - f_{12}({\lfloor }{x_{12}}{\rfloor }))\). The value \(f_{12}(x_{12})\) can be obtained by binary search on \(x_1\) (which determines \(x_2 = x_{12} - x_1\)) in \(\mathcal {O}(\log (u_{12} - l_{12}))\) calls to evaluation oracles for \(f_1\) and \(f_2\). When merging a set S of more than two variables, one would compute \(f_S(x_S)\) as the solution of the corresponding integer program whose objective is \(\sum _{i \in S} f_i(x_i)\) and whose constraints are \(\sum _{i \in S} x_i = x_S\) and the appropriate lower and upper bounds; by [13], this is solvable in time \({{\,\mathrm{\mathrm{poly}}\,}}(|S|) \log (f_{\max }, u_S - l_S)\). However, our goal here is to strengthen our proximity result for I by studying \(I'\), without actually attempting to solve \(I'\).
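The binary-search evaluation of \(f_{12}\) can be sketched as follows. Since \(g(x_1) = f_1(x_1) + f_2(x_{12} - x_1)\) is convex in \(x_1\), its discrete slope is nondecreasing, so the minimizer can be found with logarithmically many oracle calls (the function names are ours):

```python
def eval_minplus(f1, l1, u1, f2, l2, u2, x12):
    """f12(x12) = min{ f1(x1) + f2(x12 - x1) } over integer x1 with
    l1 <= x1 <= u1 and l2 <= x12 - x1 <= u2, found with O(log(u12 - l12))
    evaluations by binary search on the nondecreasing discrete slope."""
    lo, hi = max(l1, x12 - u2), min(u1, x12 - l2)
    assert lo <= hi, "x12 lies outside [l1 + l2, u1 + u2]"
    g = lambda x1: f1(x1) + f2(x12 - x1)  # convex in x1
    while lo < hi:
        mid = (lo + hi) // 2
        if g(mid + 1) - g(mid) >= 0:  # slope >= 0: minimum at mid or left
            hi = mid
        else:                          # slope < 0: minimum strictly right
            lo = mid + 1
    return g(lo)
```

For instance, with \(f_1(x) = x^2\) and \(f_2(x) = (x-2)^2\), `eval_minplus(f1, -10, 10, f2, -10, 10, 5)` returns 5, attained at \(x_1 \in \{1, 2\}\).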
For a solution \({\mathbf{x}}\) of I (not necessarily integral), we define \(\sigma ({\mathbf{x}})\) to be the solution of \(I'\) where \(x_1\) and \(x_2\) are replaced by \(x_{12} = x_1 + x_2\). Clearly, for integer \({\mathbf{x}}\), the value of \(\sigma ({\mathbf{x}})\) under the objective of \(I'\) is at most the value of \({\mathbf{x}}\) under f, and if \({\mathbf{x}}\) is an integer optimum of I, then \(\sigma ({\mathbf{x}})\) is an integer optimum of \(I'\) because we then have \(f_{12}(x_{12}) = f_1(x_1) + f_2(x_2)\). We abuse notation and, for an integer \({\mathbf{x}}'\), define \(\sigma ^{-1}({\mathbf{x}}')\) to be some integral member \({\mathbf{x}}\) of the set \(\sigma ^{-1}({\mathbf{x}}')\) which satisfies \(f_1(x_1) + f_2(x_2) = f_{12}(x_{12}')\). For a configurable solution \({\mathbf{x}}'\) we define \(\sigma ^{-1}({\mathbf{x}}')\) by taking an \({\hat{f}}\)-optimal decomposition \(\varGamma '\) of the brick of \({\mathbf{x}}'\) containing \(x_{12}\) and applying \(\sigma ^{-1}\) to the configurations in \(\varGamma '\); this defines a decomposition \(\varGamma \) and thus a brick \(\sum \varGamma \) of a solution \({\mathbf{x}}\) of I. The next lemma shows that this construction preserves the value of the solution.
Lemma 15
If \({\mathbf{x}}\) is an integer optimum of I, then \(\sigma ({\mathbf{x}})\) is an integer optimum of \(I'\). Similarly, if \({\mathbf{x}}\) is a configurable optimum of I, then \(\sigma ({\mathbf{x}})\) is a configurable optimum of \(I'\). Analogously, if \({\mathbf{x}}'\) is an integer optimum of \(I'\), then \(\sigma ^{-1}({\mathbf{x}}')\) is an integer optimum of I, and if \({\mathbf{x}}'\) is a configurable optimum of \(I'\), then \(\sigma ^{-1}({\mathbf{x}}')\) is a configurable optimum of I.
Proof
It follows from the definition of \(f_{12}\) that for any integer solution of I we get an integer solution of \(I'\) which is at least as good, and for any integer solution of \(I'\) we get an integer solution of I with the same value. For configurable solutions we apply this observation to each configuration in some \({\hat{f}}\)-optimal decomposition and use the fact that \({\hat{f}}\) is defined via \(f_{12}\). \(\square \)
This approach generalizes readily to any number of variables. For the sake of simplicity we continue with the example of “joining” two variables whose columns in \(E^{(N)}\) are identical.
We are left to argue about proximity. While we believe that it holds in general that any proximity bound between integer and configurable optima of \(I'\) transfers to I, we only need this for our specific bound, so we take a less general route.
Lemma 16
Let \({\mathbf{x}}\) be a configurable optimum of I with at most 2r fractional bricks, \({\mathbf{x}}' = \sigma ({\mathbf{x}})\) a configurable optimum of \(I'\), \({\mathbf{z}}'\) an \(\ell _1\)-closest integer optimum of \(I'\), and \({\mathbf{z}}= \sigma ^{-1}({\mathbf{z}}')\) an integer optimum of I. Let P be the bound of Theorem 2 on \(\Vert {\mathbf{x}}' - {\mathbf{z}}'\Vert _1\). Then \(\Vert {\mathbf{x}}- {\mathbf{z}}\Vert _1 \le P\).
Proof
Consider the proof of Theorem 2. In it, we create a sequence of vectors \({\mathbf{v}}_1, \dots , {\mathbf{v}}_{m+{\mathfrak {f}}}\), each of which corresponds to some \(E_1^{\ell } \lambda _j {\mathbf{g}}_j^i\). The crucial observation is that the sequence \(({\mathbf{v}}_i)_i\) obtained from \({\mathbf{x}}, {\mathbf{z}}\) is identical to the sequence obtained from \({\mathbf{x}}', {\mathbf{z}}'\), so if \(\Vert {\mathbf{x}}' - {\mathbf{z}}'\Vert _1 \le P\), then also \(\Vert {\mathbf{x}}- {\mathbf{z}}\Vert _1 \le P\). \(\square \)
The next corollary is now immediate:
Corollary 1
Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there is an optimal solution \({\mathbf{z}}^*\) of (HugeIP) such that
Algorithm
Recall the statement of the theorem we are proving: Theorem 1. Huge N-fold IP with any separable convex objective can be solved in time
Proof
We first give a description of the algorithm which solves huge Nfold IP, then show its correctness, and finally give a time complexity analysis.
Description of the algorithm. First, obtain an optimal solution \({\mathbf{y}}\) of (ConfLP) and from it a conf-optimal solution \({\mathbf{x}}^* = \varphi ({\mathbf{y}})\) with at most 2r fractional bricks by Lemma 6. Applying Corollary 1 to \({\mathbf{x}}^*\) guarantees the existence of an integer optimum \({\mathbf{z}}^*\) satisfying
Together with the fact that there are at most 2r fractional bricks, this implies that \({\mathbf{z}}^*\) differs from \({\mathbf{x}}^*\) in at most \(P' = P + 2r\) bricks. The idea of the algorithm is to “fix” the value of the solution on “almost all” bricks and compute the rest using an auxiliary \({\bar{N}}\)fold IP problem with a polynomial \({\bar{N}}\).
Formally, our goal is to compute an optimal solution \({\mathbf{z}}\) of (HugeIP) represented succinctly by multiplicities of configurations, or in other words, as a solution \(\varvec{ \zeta }\) of (ConfILP). Denote by \({\mathbf{y}}_{P'}\) the vector whose coordinates are defined by setting, for every type \(i \in [\tau ]\) and every configuration \({\mathbf{c}}\in \mathcal {C}^i\), \({\mathbf{y}}_{P'}(i,{{\mathbf{c}}}) = \max \{0, {\lfloor }{y(i, {\mathbf{c}})}{\rfloor } - P'\}\). This leaves us with \(\Vert {\mathbf{y}}\Vert _1 - \Vert {\mathbf{y}}_{P'}\Vert _1 \le |\text {supp}({\mathbf{y}})| P' \le (r+\tau ) P' =: {\bar{P}}\) bricks to determine. Let \({\bar{\varvec{ \zeta }}} = {\mathbf{y}}- {\mathbf{y}}_{P'}\), define \({\bar{\varvec{ \mu }}}\) by setting, for each \(i \in [\tau ]\), \({\bar{\varvec{ \mu }}}_i := \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\bar{\zeta }}(i,{\mathbf{c}})\), let \({\bar{{\mathbf{x}}}} = \varphi ({\bar{\varvec{ \zeta }}})\), and let \({\bar{N}} = \Vert {\bar{\varvec{ \zeta }}}\Vert _1 = \Vert {\bar{\varvec{ \mu }}}\Vert _1 \le {\bar{P}}\). Construct an auxiliary \({\bar{N}}\)-fold IP instance with the same blocks \(E^i_1, E^i_2\), \(i \in [\tau ]\), by, for each brick \({\bar{{\mathbf{x}}}}^j\) of type i, setting
We say that such a brick was derived from type i. Lastly, let \({\bar{{\mathbf{b}}}}^0 = {\mathbf{b}}^0 - \sum _{i=1}^{\tau } \sum _{{\mathbf{c}}\in \mathcal {C}^i} \zeta (i, {\mathbf{c}}) E^i_1 {\mathbf{c}}\).
After obtaining an optimal solution \({\bar{{\mathbf{z}}}}\) of this instance we update \(\varvec{ \zeta }\) as follows. For each brick \({\bar{{\mathbf{z}}}}^j\) derived from type i, increment \(\zeta (i,{\bar{{\mathbf{z}}}}^j)\) by one.
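The fixing step, i.e., the computation of \({\mathbf{y}}_{P'}\) and the residual \({\bar{\varvec{ \zeta }}}\), can be sketched on a ConfLP solution stored as a dictionary; the representation and the names are our own illustration:

```python
from fractions import Fraction
from math import floor

def fix_and_residual(y, P_prime):
    """Split a ConfLP solution y: dict (type, configuration) -> Fraction
    into the fixed integral part y_P (configuration multiplicities we
    commit to) and the residual bar_zeta = y - y_P that is left for the
    auxiliary N-bar-fold instance."""
    y_P = {k: max(0, floor(v) - P_prime) for k, v in y.items()}
    bar_zeta = {k: v - y_P[k] for k, v in y.items()}
    bar_N = sum(bar_zeta.values())  # number of bricks left to determine
    return y_P, bar_zeta, bar_N
```

Since the per-type totals \(\sum _{{\mathbf{c}}} y(i,{\mathbf{c}}) = \mu ^i\) are integral, the residual multiplicity per type, and hence `bar_N`, is integral as well.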
Correctness. By (19), it is correct to assume that there exists a solution \(\varvec{ \zeta }\) of (ConfILP) which has \(\zeta (i,{\mathbf{c}}) \ge \max \{0, {\lfloor }{y(i, {\mathbf{c}})}{\rfloor } - P'\}\) for each \(i \in [\tau ]\) and \({\mathbf{c}}\in \mathcal {C}^i\). Thus we may perform a variable transformation of (ConfILP), \(\varvec{ \zeta }= {\bar{\varvec{ \zeta }}} + {\mathbf{y}}_{P'}\), obtaining an auxiliary (ConfILP) instance
The auxiliary huge \({\bar{N}}\)fold instance is simply the instance corresponding to the above, and the final construction of \(\varvec{ \zeta }\) corresponds to the described variable transformation.
Complexity. Since \(\Vert {\bar{\varvec{ \zeta }}}\Vert _1 \le {\bar{P}}\), we can obtain an optimal solution \({\bar{{\mathbf{z}}}}\) of the auxiliary instance in time \((\Vert E\Vert _\infty r s)^{\mathcal {O}(r^2s + rs^2)} (t {\bar{P}}) \log (t {\bar{P}}) \log \Vert f_{\max }, {\bar{{\mathbf{b}}}}, {\bar{{\mathbf{l}}}}, {\bar{{\mathbf{u}}}}\Vert _\infty ^2\) [13, Corollary 91]. Let us now compute the time needed altogether. To solve (ConfLP), we need time (Lemma 6)
To solve the auxiliary instance above, we need time
Hence we can solve huge Nfold IP in time at most
\(\square \)
Concluding remarks
At this point one may wonder why bother with the ConfLP rather than solving HugeCP and showing that its optima are close to those of HugeIP. The reason is that even though handling optima of HugeCP is much easier than handling confoptimal solutions, and even though solving HugeCP is easier than solving ConfLP,^{Footnote 4} a HugeCP optimum can be very far from a HugeIP optimum [8, Proposition 1]. In other words, ConfLP is a stronger relaxation than HugeCP: consider a brick \({\mathbf{p}}\) of a HugeCP optimum and a brick \({\mathbf{q}}\) of a confoptimal solution; then
In plain language, while \({\mathbf{q}}\) lies in the integer hull of all configurations, \({\mathbf{p}}\) only lies in the fractional relaxation of this hull.
Another obstacle is that even though the Configuration LP is a standard tool, typically the separation problem is merely approximated rather than solved exactly, leading to approximate solutions of ConfLP. But we require an exact solution, and so we use a parameterized exact algorithm for IP to solve the separation problem. It is an interesting question when a k-approximate solution of ConfLP, i.e., a solution whose value is at most \(k \cdot OPT\), may be used to obtain an h(k)-accurate configurable solution of HugeCP, i.e., a configurable solution which is at \(\ell _1\)-distance at most h(k) from a configurable optimum. An approximate solution of ConfLP might be much easier to obtain, and yet it may be almost as good as an exact solution for our purposes here.
Another interesting question is a tight complexity bound for the algorithm of Lemma 6. It seems likely that the recent approach of Cslovjecsek et al. [8] could also apply in our high-multiplicity setting, which would yield a near-linear fixed-parameter algorithm. Notice that the iterative augmentation algorithms for standard N-fold IP have a strong combinatorial flavor and use no “black boxes”. Could the ellipsoid method behind Lemma 6 be replaced by a (more) combinatorial algorithm, at least for some important problems which have huge N-fold IP models, such as the scheduling problems studied by Knop et al. [31]?
Availability of data and material (data transparency)
Not applicable.
Code Availability
Not applicable.
Notes
e.g., potentially \({\mathbf{x}}^j = \frac{1}{2}(({\mathbf{x}}^j + {\mathbf{c}}) + ({\mathbf{x}}^j - {\mathbf{c}}))\) for some \({\mathbf{c}}\), and \(\varGamma ^j\) is optimal for any linear objective.
An inequality \({\mathbf{a}}{\mathbf{x}}\le b\) is dominated by \({\mathbf{c}}{\mathbf{x}}\le d\) if for every \({\mathbf{x}}\) such that \({\mathbf{c}}{\mathbf{x}}\le d\) we also have \({\mathbf{a}}{\mathbf{x}}\le b\).
Recall that \(E^{(N')}\) is the \(N'\)fold matrix formed from blocks E, see Eq. (1).
We only outline the reason. We claim that an optimal solution of HugeCP can be obtained in the following way: construct an auxiliary \(\tau \)-fold CP with bricks \({\bar{{\mathbf{x}}}}^1, \dots , {\bar{{\mathbf{x}}}}^\tau \) where the ith brick \({\bar{{\mathbf{x}}}}^i\), \(i \in [\tau ]\), represents the \(\mu ^i\) bricks of type i in the original instance. This is achieved by setting the bounds to \(\mu ^i {\mathbf{l}}^i\) and \(\mu ^i {\mathbf{u}}^i\), the right-hand side to \(\mu ^i{\mathbf{b}}^i\), and the objective to \(\mu ^i f^i({\bar{{\mathbf{x}}}}^i/\mu ^i)\). Given an optimum \({\bar{{\mathbf{x}}}}\) of this \(\tau \)-fold CP, we set each brick of type i to the value \({\bar{{\mathbf{x}}}}^i/\mu ^i\), and claim that \({\mathbf{x}}\) is an optimum of HugeCP. Thus, while HugeCP can be solved in polynomial time, the separation subproblem needed to solve ConfLP can be NP-hard, in this sense making ConfLP harder.
References
Alon, N., Azar, Y., Woeginger, G.J., Yadid, T.: Approximation schemes for scheduling on parallel machines. J. Sched. 1(1), 55–66 (1998)
Altmanová, K., Knop, D., Koutecký, M.: Evaluating and tuning \(n\)-fold integer programming. ACM J. Exp. Algorithmics 24(1), 1–22 (2019)
Aykanat, C., Pinar, A., Çatalyürek, Ü.V.: Permuting sparse rectangular matrices into blockdiagonal form. SIAM J. Scientific Comput. 25(6), 1860–1879 (2004)
Bergner, M., Caprara, A., Ceselli, A., Furini, F., Lübbecke, M.E., Malaguti, E., Traversi, E.: Automatic DantzigWolfe reformulation of mixed integer programs. Math. Prog. 149(1–2), 391–424 (2015)
Borndörfer, R., Ferreira, C.E., Martin, A.: Decomposing matrices into blocks. SIAM J. Optim. 9(1), 236–269 (1998)
Chen, L., Marx, D.: Covering a tree with rooted subtrees—parameterized and approximation algorithms. In Proc. SODA 2018, 2801–2820 (2018)
Cosmadakis, S.S., Papadimitriou, C.H.: The traveling salesman problem with many visits to few cities. SIAM J. Comput. 13(1), 99–108 (1984)
Cslovjecsek, J., Eisenbrand, F., Hunkenschröder, C., Rohwedder, L., Weismantel, R.: Blockstructured integer and linear programming in strongly polynomial and near linear time. In: Proc. SODA 2021, pp. 1666–1681 (2021)
De Loera, J.A., Hemmecke, R., Onn, S., Weismantel, R.: \(n\)-fold integer programming. Discrete Optim. 5(2), 231–241 (2008)
De Loera, J. A., Hemmecke, R., Köppe, M.: Algebraic and geometric ideas in the theory of discrete optimization, volume 14 of MOSSIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA (2013)
Eisenbrand, F., Weismantel, R.: Proximity results and faster algorithms for integer programming using the Steinitz lemma. ACM Trans. Algorithms 16(1), 5:1–5:14 (2020)
Eisenbrand, F., Hunkenschröder, C., Klein, K.M.: Faster algorithms for integer programs with block structure. In Proc. ICALP 2018, volume 107 of Leibniz Int. Proc. Informatics, pp. 49:1–49:13 (2018)
Eisenbrand, F., Hunkenschröder, C., Klein, K., Koutecký, M., Levin, A., Onn, S.: An algorithmic theory of integer programming. Technical report (2019). http://arXiv.org/abs/1904.01361
Fernandez de la Vega, W., Lueker, G.S.: Bin packing can be solved within \(1+\varepsilon \) in linear time. Combinatorica 1(4), 349–355 (1981)
Ferris, M.C., Horn, J.D.: Partitioning mathematical programs for parallel solution. Math. Prog. 80(1), 35–61 (1998)
Gamrath, G., Lübbecke, M. E.: Experiments with a generic Dantzig–Wolfe decomposition for integer programs. In: Proc. SEA 2010, Lecture Notes in Computer Science, vol. 6049, pp. 239–252. Springer (2010)
Gilmore, P.C., Gomory, R.E.: A linear programming approach to the cuttingstock problem. Oper. Res. 9, 849–859 (1961)
Grötschel, M., Lovász, L., Schrijver, A.: Geometric algorithms and combinatorial optimization, vol. 2 of Algorithms and Combinatorics. SpringerVerlag, Berlin, second edition (1993)
Hemmecke, R., Köppe, M., Weismantel, R.: Graver basis and proximity techniques for blockstructured separable convex integer minimization problems. Math. Prog. 145(1–2, Ser. A), 1–18 (2014)
Hochbaum, D.S., Shamir, R.: Strongly polynomial algorithms for the high multiplicity scheduling problem. Oper. Res. 39(4), 648–653 (1991)
Hochbaum, D.S., Shantikumar, J.G.: Convex separable optimization is not much harder than linear optimization. J. ACM 37(4), 843–862 (1990)
Hochbaum, D.S., Shmoys, D.B.: Using dual approximation algorithms for scheduling problems: theoretical and practical results. J. Assoc. Comput. Mach. 34(1), 144–162 (1987)
Jansen, K., Rohwedder, L.: On integer programming and convolution. In Proc. ITCS 2019, vol. 124 of Leibniz Int. Proc. Informatics, pp. 43:1–43:17 (2019)
Jansen, K., SolisOba, R.: A polynomial time \(OPT+1\) algorithm for the cutting stock problem with a constant number of object lengths. Math. Oper. Res. 36(4), 743–753 (2011)
Jansen, K., Klein, K., Maack, M., Rau, M.: Empowering the Configuration-IP – new PTAS results for scheduling with setup times. In: Proc. ITCS 2019, Leibniz Int. Proc. Informatics, vol. 124, pp. 44:1–44:19 (2019)
Jansen, K., Lassota, A., Rohwedder, L.: Near-linear time algorithm for \(n\)-fold ILPs via color coding. SIAM J. Discret. Math. 34(4), 2282–2299 (2020)
Karmarkar, N., Karp, R.M.: An efficient approximation scheme for the one-dimensional bin-packing problem. In: Proc. FOCS 1982, pp. 312–320 (1982)
Khaniyev, T., Elhedhli, S., Erenay, F.S.: Structure detection in mixed-integer programs. INFORMS J. Comput. 30(3), 570–587 (2018)
Knop, D., Koutecký, M.: Scheduling meets \(n\)-fold integer programming. J. Sched. 21(5), 493–503 (2018)
Knop, D., Koutecký, M.: Scheduling kernels via configuration LP. Technical report (2020). arXiv:2003.02187
Knop, D., Koutecký, M., Levin, A., Mnich, M., Onn, S.: Multitype integer monoid optimization and applications. Technical report (2019). arXiv:1909.07326
Knop, D., Koutecký, M., Mnich, M.: Combinatorial \(n\)-fold integer programming and applications. Math. Program. 184(1), 1–34 (2020)
Koutecký, M., Levin, A., Onn, S.: A parameterized strongly polynomial algorithm for block structured integer programs. In: Proc. ICALP 2018, Leibniz Int. Proc. Informatics, vol. 107, pp. 85:1–85:14 (2018)
Onn, S.: Nonlinear discrete optimization—an algorithmic theory. Zurich Lectures in Advanced Mathematics. European Mathematical Society (EMS), Zürich (2010)
Onn, S.: Huge multiway table problems. Discrete Optim. 14, 72–77 (2014)
Onn, S.: Huge tables and multicommodity flows are fixed-parameter tractable via unimodular integer Carathéodory. J. Comput. Syst. Sci. 83(1), 207–214 (2017)
Psaraftis, H.N.: A dynamic programming approach for sequencing groups of identical jobs. Oper. Res. 28(6), 1347–1359 (1980)
Scheithauer, G., Terno, J.: The modified integer round-up property of the one-dimensional cutting stock problem. European J. Oper. Res. 84(3), 562–571 (1995)
Sevast’janov, S., Banaszczyk, W.: To the Steinitz lemma in coordinate form. Discrete Math. 169(1–3), 145–152 (1997)
Steinitz, E.: Bedingt konvergente Reihen und konvexe Systeme. J. Reine Angew. Math. 146, 1–52 (1916)
van den Akker, J.M., Hoogeveen, J.A., van de Velde, S.L.: Parallel machine scheduling by column generation. Oper. Res. 47(6), 862–872 (1999)
Vanderbeck, F., Wolsey, L.A.: Reformulation and decomposition of integer programs. In: 50 Years of Integer Programming 1958–2008: From the Early Years to the State-of-the-Art, pp. 431–502. Springer (2010)
Vose, M.D.: Egyptian fractions. Bull. London Math. Soc. 17(1), 21–24 (1985)
Wang, J., Ralphs, T.: Computational experience with hypergraphbased methods for automatic decomposition in discrete optimization. In: Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 394–402. Springer (2013)
Weil, R.L., Kettler, P.C.: Rearranging matrices to block-angular form for decomposition (and other) algorithms. Mgmt. Sci. 18(1), 98–108 (1971)
Funding
Open Access funding enabled and organized by Projekt DEAL. D.K. partially supported by OP VVV MEYS project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics”. M.K. partially supported by Charles University project UNCE/SCI/004, and by project 19-27871X of Grantová agentura České republiky (GA ČR). A.L. partially supported by Israel Science Foundation grant 308/18. M.M. supported by Deutsche Forschungsgemeinschaft (DFG) grant MN 59/4-1. S.O. partially supported by the Dresner chair and Israel Science Foundation grant 308/18.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Knop, D., Koutecký, M., Levin, A. et al. High-multiplicity N-fold IP via configuration LP. Math. Program. (2022). https://doi.org/10.1007/s10107-022-01882-9
Keywords
 Integer programming
 Configuration IP
 Fixedparameter algorithms
 Scheduling
Mathematics Subject Classification
 90C10
 90C27
 49M27