1 Introduction

The fundamental Integer Programming (IP) problem is to solve:

$$\begin{aligned} \min f({\mathbf{x}}):\, A{\mathbf{x}}= {\mathbf{b}}, \, {\mathbf{l}}\le {\mathbf{x}}\le {\mathbf{u}}, \, {\mathbf{x}}\in \mathbb {Z}^n, \end{aligned}$$
(IP)

where \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), \(A \in \mathbb {Z}^{m \times n}\), \({\mathbf{b}}\in \mathbb {Z}^m\), and \({\mathbf{l}}, {\mathbf{u}}\in (\mathbb {Z}\cup \{\pm \infty \})^n\). Any IP instance with infinite bounds \({\mathbf{l}}, {\mathbf{u}}\) can be reduced to an instance with finite bounds using standard techniques (solving the continuous relaxation and using proximity bounds to restrict the relevant region), so that from now on we will assume finite bounds \({\mathbf{l}}, {\mathbf{u}}\in \mathbb {Z}^n\). We denote \({\displaystyle f_{\max } = \max _{\begin{array}{c} {\mathbf{x}}\in \mathbb {Z}^n:\\ {\mathbf{l}}\le {\mathbf{x}}\le {\mathbf{u}} \end{array}} |f({\mathbf{x}})|}\).

Integer Programming is a fundamental problem of vast importance both in theory and in practice. Because it is NP-hard already with a single row (by reduction from Subset Sum) or when A is a 0/1-matrix (by reduction from Vertex Cover), there is high interest in identifying tractable subclasses of IP. One such tractable subclass is that of N-fold IPs, whose constraint matrix A is defined as

$$\begin{aligned} A:= E^{(N)} := \begin{pmatrix} E^1_1 & E^2_1 & \cdots & E^N_1 \\ E^1_2 & 0 & \cdots & 0 \\ 0 & E^2_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & E^N_2 \end{pmatrix} . \end{aligned}$$
(1)

Here, \(r,s,t,N \in \mathbb {N}\), \(E^{(N)}\) is an \((r+Ns)\times Nt\) matrix, and \(E^i_1 \in \mathbb {Z}^{r \times t}\) and \(E^i_2 \in \mathbb {Z}^{s \times t}\), \(i \in [N]\), are integer matrices. We define \(E := \begin{pmatrix} E_1^1 & E_1^2 & \cdots & E_1^N \\ E_2^1 & E_2^2 & \cdots & E_2^N \end{pmatrix}\) and call \(E^{(N)}\) the N-fold product of E. The structure of \(E^{(N)}\) allows us to divide any Nt-dimensional object, such as the variables of \({\mathbf{x}}\), bounds \({\mathbf{l}}, {\mathbf{u}}\), or the objective f, into N bricks of size t, e.g. \({\mathbf{x}}=({\mathbf{x}}^1, \dots , {\mathbf{x}}^N)\). We use subscripts to index within a brick and superscripts to denote the index of the brick, i.e., \(x^i_j\) is the j-th variable of the i-th brick with \(j \in [t]\) and \(i \in [N]\). Problem (IP) with \(A=E^{(N)}\) is known as N-fold integer programming (N-fold IP).
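As an illustration of the block structure in (1), the following minimal Python sketch assembles \(E^{(N)}\) as a dense list-of-lists matrix from the per-brick blocks \(E^i_1\) and \(E^i_2\). The function name `n_fold` and the dense representation are our own illustrative choices; a practical implementation would exploit sparsity.

```python
from typing import List

Matrix = List[List[int]]

def n_fold(E1: List[Matrix], E2: List[Matrix]) -> Matrix:
    """Assemble E^(N) from blocks E1[i] = E^{i+1}_1 (r x t) and E2[i] = E^{i+1}_2 (s x t)."""
    N = len(E1)
    assert N > 0 and len(E2) == N
    r, t, s = len(E1[0]), len(E1[0][0]), len(E2[0])
    rows: Matrix = []
    # First r rows: the linking blocks E^1_1, ..., E^N_1 placed side by side.
    for k in range(r):
        rows.append([E1[i][k][j] for i in range(N) for j in range(t)])
    # Then N groups of s rows: E^1_2, ..., E^N_2 on the block diagonal.
    for i in range(N):
        for k in range(s):
            row = [0] * (N * t)
            row[i * t:(i + 1) * t] = E2[i][k]
            rows.append(row)
    return rows
```

For N = 2, r = s = 1 and t = 2, the result has \(r + Ns = 3\) rows and \(Nt = 4\) columns.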

Such block-structured matrices have been the subject of extensive research stretching back to the ’70s [3,4,5, 15, 16, 28, 42, 44, 45], as this special structure allows applying methods like the Dantzig-Wolfe decomposition and others, leading to significant speed-ups in practice. On the theoretical side, the term “N-fold IP” was coined by De Loera et al. [9], and since then increasingly efficient algorithms have been developed and applied to various problems relating to N-fold IPs [2, 6, 25, 26, 29, 32]. This line of research culminated in an algorithm by Eisenbrand et al. [13] which solves N-fold IPs in time \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)} \cdot N \log N \cdot \log \Vert {\mathbf{u}}-{\mathbf{l}}\Vert _\infty \cdot \log f_{\max }\) for all separable convex objectives f (i.e., when \(f({\mathbf{x}}) = \sum _{i=1}^n f_i(x_i)\) and each \(f_i: \mathbb {R}\rightarrow \mathbb {R}\) is convex).

1.1 Our contribution

Previous algorithms for N-fold IP have focused on reducing the run-time dependency on N down to almost linear. Instead, our interest here is in N-fold IPs that model applications where many bricks are of the same type, that is, they share the same bounds, right-hand side, and objective function. For those applications, it is natural to encode an N-fold IP instance succinctly by describing each brick type by its constraint matrix, bounds, right-hand side, and objective function, and giving a vector of brick multiplicities. When the number of brick types \(\tau \) is much smaller than the number N of bricks, e.g., if \(N \approx 2^\tau \), this succinct instance is (much) smaller than the previously studied encoding of N-fold IP, and an algorithm running in time polynomial in the size of the succinct instance may be (much) faster than current algorithms. We call the N-fold IP where the instance is given succinctly the huge N-fold IP problem, and we present a fast algorithm for it:

Theorem 1

Huge N-fold IP with any separable convex objective can be solved in time

$$\begin{aligned} (\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)} {{\,\mathrm{\mathrm{poly}}\,}}(\tau , t, \log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, N, f_{\max }\Vert _\infty ) . \end{aligned}$$

Natural applications of Theorem 1 are scheduling problems. In many scheduling problems, the number n of jobs that must be assigned to machines, as well as the number m of machines, are very large, whereas the number of job types and the number of machine kinds are relatively small. An instance of such a scheduling problem can thus be compactly encoded by simply stating, for each job type and machine kind, the number of jobs of that type and the number of machines of that kind, together with their characteristics (like processing time, weight, release time, due date, etc.). This key observation was made by several researchers [7, 37] before Hochbaum and Shamir [20] coined the term high-multiplicity scheduling problem. Clearly, many scheduling algorithms that are efficient when all jobs are assumed to be distinct become exponential-time algorithms for the corresponding high-multiplicity problem, since their running time is polynomial in the number of jobs, which can be exponential in the size of the succinct encoding.

Let us briefly demonstrate how Theorem 1 allows designing algorithms that are efficient for the succinct high-multiplicity encoding of the input. In modern computational clusters, it is common to have several kinds of machines differing by processing unit type (high single- or multi-core performance CPUs, GPUs), storage type (HDD, SSD, etc.), network connectivity, etc. However, the number of machine kinds \(\tau \) (perhaps 10) is still much smaller than the number of machines, which may be on the order of tens of thousands or more. Many scheduling problems have N-fold IP models [31] where \(\tau \) is the number of machine kinds and N is the number of machines. On these models, Theorem 1 would likely outperform the currently fastest N-fold IP algorithms.

Proof ideas. To solve a high-multiplicity problem, one needs a succinct way to argue about solutions. In 1961, Gilmore and Gomory [17] introduced the fundamental and widely influential notion of Configuration IP (ConfIP) which describes a solution (e.g., a schedule) by a list of pairs “(machine schedule s, multiplicity \(\mu \) of machines with schedule s)”. The linear relaxation of ConfIP, called the Configuration LP (ConfLP), can often be solved efficiently, and is known to provide solutions of strikingly high quality in practice [41]; for example, the optimum of the ConfLP for Bin Packing is conjectured to have value x such that an optimal integer packing uses \(\le \lceil x \rceil + 1\) bins [38]. However, surprisingly little is known in general about the structure of solutions of ConfIP and ConfLP, and how they relate to each other.

We define the Configuration IP and LP of an N-fold IP instance, and show how to solve the ConfLP quickly using the property that the ConfLP and ConfIP have polynomial encoding length even for huge N-fold IP. Our main technical contribution is a novel proximity theorem about N-fold IP, showing that a solution of its relaxation corresponding to the ConfLP optimum is very close to the integer optimum. Thus, the algorithm of Theorem 1 proceeds in three steps: (1) it solves the ConfLP, (2) it uses the proximity theorem to create a “residual” \(N'\)-fold instance with \(N'\) upper-bounded by \((\Vert E\Vert _\infty rs)^{\mathcal {O}(rs)}\), and (3) it solves the residual instance by an existing N-fold IP algorithm.

1.2 Related work

Besides the references mentioned already, we point out that solving the ConfLP is commonly used as a subprocedure in approximation algorithms, e.g. [1, 14, 22, 27]. Jansen and Solis-Oba use a mixed ConfLP to give a parameterized \(\textit{OPT}+1\) algorithm for bin packing [24]. Onn [36] gave a weaker form of Theorem 1 which only applies to the setting where \(E_1^i = I\) and \(E_2^i\) is totally unimodular, for all i. Jansen et al. [25] extend the ConfIP to multiple “levels” of configurations. An extended version [31] of this paper shows how to model many scheduling problems as high-multiplicity N-fold IPs, so that an application of Theorem 1 yields new parameterized algorithms for these problems. Knop and Koutecký [30] use our new proximity theorem to show efficient preprocessing algorithms (kernels) for scheduling problems.

There are currently several “fastest” algorithms for N-fold IP with standard (non-succinct) encoding. First, we have already mentioned the algorithm of Eisenbrand et al. [13]. Second, the algorithm of Jansen et al. [26] has a better parameter dependency of \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + s^2)}\) (as compared with \((\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)}\) of the previous algorithm), but has a slightly worse dependence on N of \(N \log ^5 N\), and only works for linear objectives. Third, a recent algorithm of Cslovjecsek et al. [8] again only works for linear objectives and runs in time \((\Vert E\Vert _\infty s)^{\mathcal {O}(s^2)} {{\,\mathrm{\mathrm{poly}}\,}}(r) N \log ^2(Nt) \log ^2(\Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max }\Vert _\infty ) + (\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + s^2)}Nt\). While the authors claim that this constitutes the currently fastest algorithm, it seems that it is only potentially faster than prior work in a narrow parameter regime.

The third paper, by Cslovjecsek et al. [8], is the closest to ours in its approach: it solves a strong relaxation of N-fold IP which coincides with the ConfLP if each brick is of a distinct type, and which is generalized by the ConfLP (in our work) otherwise. The authors show that this relaxation can be solved in near-linear time, and then develop a proximity theorem similar to ours (but using different techniques) and a dynamic program, which allows them to construct and solve a residual instance in linear time. An earlier version of our paper [31] stated a worse proximity bound than that of Cslovjecsek et al. [8], but our bound applies to separable convex objectives whereas theirs [8] does not. In the present paper, we adapt one of their lemmas ([8, Lemma 3], our Lemma 5) and a modeling idea (Sect. 3.4) to obtain the same proximity bound as theirs [8], which moreover holds for separable convex objectives. It is likely that the complexity of our algorithm to solve the ConfLP could be improved along the lines of their work [8]. Despite these similarities, we highlight that only our algorithm solves the high-multiplicity version of N-fold IP.

2 Preliminaries

For positive integers m, n with \(m \le n\) we set \([m,n] = \{m, m+1, \ldots , n\}\) and \([n] = [1,n]\). We write vectors in boldface (e.g., \({\mathbf{x}}, {\mathbf{y}}\)) and their entries in normal font (e.g., the i-th entry of \({\mathbf{x}}\) is \(x_i\) or x(i)). For \(\alpha \in \mathbb {R}\), \({\lfloor }{\alpha }{\rfloor }\) is the floor of \(\alpha \), \({\lceil }{\alpha }{\rceil }\) is the ceiling of \(\alpha \), and we define \(\{\alpha \} = \alpha - {\lfloor }{\alpha }{\rfloor }\); for vectors, these operators are defined component-wise.

We call a brick of \({\mathbf{x}}\) integral if all of its coordinates are integral, and fractional otherwise.

Huge N-fold IP. The huge N-fold IP problem is an extension of N-fold IP to the high-multiplicity scenario, where there are potentially exponentially many bricks. This requires a succinct representation of the input and output. The input to a huge N-fold IP problem with \(\tau \) brick types is defined by matrices \(E^i_1 \in \mathbb {Z}^{r \times t}\) and \(E^i_2 \in \mathbb {Z}^{s \times t}\), \(i \in [\tau ]\), vectors \({\mathbf{l}}^1, \dots , {\mathbf{l}}^\tau \), \({\mathbf{u}}^1, \dots , {\mathbf{u}}^\tau \in \mathbb {Z}^t\), \({\mathbf{b}}^0 \in \mathbb {Z}^r\), \({\mathbf{b}}^1, \dots , {\mathbf{b}}^\tau \in \mathbb {Z}^s\), functions \(f^1, \dots , f^\tau :\mathbb {R}^{t} \rightarrow \mathbb {R}\) satisfying \(\forall i \in [\tau ], \, \forall {\mathbf{x}}\in \mathbb {Z}^t:\, f^i({\mathbf{x}}) \in \mathbb {Z}\) and given by evaluation oracles, and integers \(\mu ^1, \dots , \mu ^\tau \in \mathbb {N}\) such that \(\sum _{i=1}^\tau \mu ^i = N\). We say that a brick is of type i if its lower and upper bounds are \({\mathbf{l}}^i\) and \({\mathbf{u}}^i\), its right hand side is \({\mathbf{b}}^i\), its objective is \(f^i\), and the matrices appearing at the corresponding coordinates are \(E^i_1\) and \(E^i_2\). The task is to solve (IP) with a matrix \(E^{(N)}\) which has \(\mu ^i\) bricks of type i for each i. Onn [35] shows that for any solution, there exists a solution which is at least as good and has only few (at most \(\tau \cdot 2^t\)) distinct bricks. In Sect. 3 we show new bounds which do not depend exponentially on t.
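For illustration only, the succinct input can be held in one small container per brick type; the class `BrickType` and the helper `expand` below are hypothetical names. The sketch shows what an algorithm for huge N-fold IP must avoid: materializing all \(N = \sum _i \mu ^i\) bricks, which costs time linear in N and hence, in the worst case, exponential in the encoding length of \(\mu ^1, \dots , \mu ^\tau \).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class BrickType:
    """Data of one brick type i of a huge N-fold IP (hypothetical container)."""
    E1: List[List[int]]                 # E^i_1, an r x t matrix
    E2: List[List[int]]                 # E^i_2, an s x t matrix
    l: List[int]                        # lower bounds l^i
    u: List[int]                        # upper bounds u^i
    b: List[int]                        # right-hand side b^i
    f: Callable[[Sequence[int]], int]   # evaluation oracle for f^i

def expand(types: List[BrickType], mu: List[int]) -> List[BrickType]:
    """Explicit list of all N = sum(mu) bricks (mu[i] copies of types[i]).
    Its length is N, i.e. exponential in the bit length of mu in the worst case."""
    assert len(types) == len(mu) and all(m >= 0 for m in mu)
    return [T for T, m in zip(types, mu) for _ in range(m)]
```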

2.1 Graver bases and the Steinitz lemma

Let \({\mathbf{x}}, {\mathbf{y}}\) be n-dimensional vectors. We call \({\mathbf{x}}, {\mathbf{y}}\) sign-compatible if they lie in the same orthant, that is, for each \(i \in [n]\), \(x_i \cdot y_i \ge 0\). We call \(\sum _i {\mathbf{g}}^i\) a sign-compatible sum if all \({\mathbf{g}}^i\) are pairwise sign-compatible. Moreover, we write \({\mathbf{y}}\sqsubseteq {\mathbf{x}}\) if \({\mathbf{x}}\) and \({\mathbf{y}}\) are sign-compatible and \(|y_i| \le |x_i|\) for each \(i \in [n]\). Clearly, \(\sqsubseteq \) imposes a partial order, called the “conformal order”, on n-dimensional vectors. For an integer matrix \(A \in \mathbb {Z}^{m \times n}\), its Graver basis \(\mathcal {G}(A)\) is the set of \(\sqsubseteq \)-minimal non-zero elements of the lattice of A, \(\ker _{\mathbb {Z}}(A) = \{{\mathbf{z}}\in \mathbb {Z}^n \mid A {\mathbf{z}}= {\mathbf {0}}\}\). A circuit of A is a non-zero element \({\mathbf{g}}\in \ker _{\mathbb {Z}}(A)\) whose support \(\text {supp}({\mathbf{g}})\) (i.e., the set of indices of its non-zero entries) is minimal under inclusion among the supports of non-zero elements of \(\ker _{\mathbb {Z}}(A)\), and whose entries are coprime. We denote the set of circuits of A by \(\mathcal {C}(A)\). It is known that \(\mathcal {C}(A) \subseteq \mathcal {G}(A)\) [34, Definition 3.1 and remarks]. We make use of the following two propositions:

Proposition 1

(Positive Sum Property [34, Lemma 3.4]) Let \(A \in \mathbb {Z}^{m \times n}\) be an integer matrix. For any integer vector \({\mathbf{x}}\in \ker _{\mathbb {Z}}(A)\), there exist an \(n' \le 2n-2\) and a sign-compatible decomposition \({\mathbf{x}}= \sum _{j=1}^{n'} \alpha _j {\mathbf{g}}_j\) into elements \({\mathbf{g}}_j \in \mathcal {G}(A)\) with \({\mathbf{g}}_j \sqsubseteq {\mathbf{x}}\) and \(\alpha _j \in \mathbb {N}\) for each \(j \in [n']\). For any real vector \({\mathbf{x}}\in \ker (A)\) (that is, \({\mathbf{x}}\in \mathbb {R}^n\) with \(A{\mathbf{x}}={\mathbf{0}}\)), there exists a sign-compatible decomposition \({\mathbf{x}}= \sum _{j=1}^{n} \alpha _j {\mathbf{g}}_j\) into circuits \({\mathbf{g}}_j \in \mathcal {C}(A)\), each sign-compatible with \({\mathbf{x}}\), where \(\alpha _j \ge 0\) for each \(j \in [n]\).
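The integer part of Proposition 1 can be made concrete with a greedy procedure: as long as the remainder is non-zero, subtract some Graver element that is conformally below it. This is a sketch assuming the full Graver basis is supplied as an explicit list (the names `conformal_leq` and `graver_decompose` are ours); it yields a sign-compatible decomposition but does not attempt to achieve the bound \(n' \le 2n-2\) on the number of distinct elements.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[int, ...]

def conformal_leq(y: Sequence[int], x: Sequence[int]) -> bool:
    """y ⊑ x: sign-compatible with x and no larger in absolute value, coordinatewise."""
    return all(a * b >= 0 and abs(a) <= abs(b) for a, b in zip(y, x))

def graver_decompose(x: Vec, graver: List[Vec]) -> Dict[Vec, int]:
    """Write x in ker_Z(A) as a sign-compatible sum of Graver elements of A.
    Returns a map from each Graver element used to its multiplicity."""
    rest = list(x)
    mult: Dict[Vec, int] = {}
    while any(rest):
        # Some Graver element lies conformally below any non-zero kernel vector;
        # StopIteration here means `graver` was incomplete or x is not in ker(A).
        g = next(h for h in graver if conformal_leq(h, rest))
        # rest - g is again conformally below rest, so every chosen g is ⊑ x.
        rest = [a - b for a, b in zip(rest, g)]
        mult[g] = mult.get(g, 0) + 1
    return mult
```

For \(A = (1\;1\;1)\), whose Graver basis consists of the vectors \({\mathbf{e}}_i - {\mathbf{e}}_j\) with \(i \ne j\), the vector (2, -1, -1) is decomposed as (1, -1, 0) + (1, 0, -1).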

Proposition 2

(Separable convex superadditivity [10, Lemma 3.3.1]) Let \(f({\mathbf{x}}) = \sum _{i=1}^n f_i(x_i)\) be separable convex, let \({\mathbf{x}}\in \mathbb {R}^n\), and let \({\mathbf{g}}_1,\dots ,{\mathbf{g}}_k \in \mathbb {R}^n\) be vectors with the same sign-pattern from \(\{\le 0, \ge 0\}^n\), that is, belonging to the same orthant of \(\mathbb {R}^n\). Then

$$\begin{aligned} f \left( {\mathbf{x}}+ \sum _{j=1}^k \alpha _j {\mathbf{g}}_j \right) - f({\mathbf{x}})&\ge \sum _{j=1}^k \alpha _j \left( f({\mathbf{x}}+ {\mathbf{g}}_j) - f({\mathbf{x}}) \right) \end{aligned}$$
(2)

for arbitrary integers \(\alpha _1,\dots ,\alpha _k \in \mathbb {N}\).
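Inequality (2) can be checked numerically. The sketch below (function names ours) evaluates the left-hand side minus the right-hand side of (2) for a separable objective given as a list of one-dimensional functions.

```python
from typing import Callable, List, Sequence

Fn = Callable[[float], float]

def sep(fs: Sequence[Fn], x: Sequence[float]) -> float:
    """Separable objective f(x) = sum_i f_i(x_i)."""
    return sum(fi(xi) for fi, xi in zip(fs, x))

def shift(x: Sequence[float], g: Sequence[float], a: float = 1) -> List[float]:
    """Return x + a * g."""
    return [xi + a * gi for xi, gi in zip(x, g)]

def superadditivity_gap(fs, x, gs, alphas) -> float:
    """LHS minus RHS of inequality (2); non-negative if all g_j share an orthant."""
    total = list(x)
    for a, g in zip(alphas, gs):
        total = shift(total, g, a)
    lhs = sep(fs, total) - sep(fs, x)
    rhs = sum(a * (sep(fs, shift(x, g)) - sep(fs, x)) for a, g in zip(alphas, gs))
    return lhs - rhs
```

With \(f_1(v) = v^2\), \(f_2(v) = |v-1|\), \({\mathbf{x}}= (0.5, -2)\), \({\mathbf{g}}_1 = (1, 0)\), \({\mathbf{g}}_2 = (2, 3)\) and \((\alpha _1, \alpha _2) = (2, 1)\), the gap evaluates to 10. For \(f(v) = v^2\), \(x = 0\), \(g_1 = 1\), \(g_2 = -1\) and \(\alpha _1 = \alpha _2 = 1\) it is \(-2\), showing that the common-orthant assumption cannot be dropped.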

Our proximity theorem relies on the Steinitz Lemma, which has recently received renewed attention [11, 12, 23].

Lemma 1

(Steinitz [40], Sevastjanov, Banaszczyk [39]) Let \(\Vert \cdot \Vert \) denote any norm, and let \({\mathbf{x}}_1, \dots , {\mathbf{x}}_n \in \mathbb {R}^d\) be such that \(\Vert {\mathbf{x}}_i\Vert \le 1\) for \(i \in [n]\) and \(\sum _{i=1}^n {\mathbf{x}}_i = 0\). Then there exists a permutation \(\pi \in S_n\) such that for all \(k = 1,\dots ,n\), the prefix sum satisfies \(\left\| \sum _{i=1}^k {\mathbf{x}}_{\pi (i)}\right\| \le d\).
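Lemma 1 is existential. For very small instances one can simply search all orderings; the sketch below (names ours, using the \(\ell _\infty \)-norm as the norm in Lemma 1) returns an ordering minimizing the largest prefix-sum norm, which by the lemma is at most d. This brute force takes \(n!\) steps and only illustrates the statement.

```python
from itertools import permutations
from typing import Iterable, Sequence, Tuple

Vec = Tuple[float, ...]

def max_prefix_norm(vs: Sequence[Vec], order: Iterable[int]) -> float:
    """Largest l_inf-norm of a prefix sum when the vectors are taken in `order`."""
    s = [0.0] * len(vs[0])
    worst = 0.0
    for k in order:
        s = [a + b for a, b in zip(s, vs[k])]
        worst = max(worst, max(abs(c) for c in s))
    return worst

def best_steinitz_order(vs: Sequence[Vec]) -> Tuple[int, ...]:
    """Exhaustive search over all n! orderings (tiny n only) for one that
    minimizes the largest prefix-sum norm; Lemma 1 guarantees the value is <= d."""
    return min(permutations(range(len(vs))), key=lambda p: max_prefix_norm(vs, p))
```

For the vectors (1, 0), (1, 0), (-1, 1), (-1, -1), (0, 0) in \(\mathbb {R}^2\), the given order reaches prefix norm 2, while the best order keeps all prefix norms at most 1.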

For an integer matrix A, we define \(g_1(A) = \max _{{\mathbf{g}}\in \mathcal {G}(A)} \Vert {\mathbf{g}}\Vert _1\). When it could make a difference, we will state our bounds both in terms of \(\Vert E\Vert _\infty \) (worst-case, when we have no other information) and in terms of \(g_1(E_2) := \max _i g_1(E_2^i)\), e.g. in Lemma 10 and Theorem 2.
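For tiny matrices, \(\mathcal {G}(A)\) and \(g_1(A)\) can be computed by brute force, which may help build intuition for the definitions above. The sketch below enumerates kernel vectors with entries in \([-\text {box}, \text {box}]\) (the parameter `box` is our own device) and keeps the \(\sqsubseteq \)-minimal ones. Since every vector conformally below a vector in the box also lies in the box, the result is exactly the set of Graver elements inside the box; it equals \(\mathcal {G}(A)\) once the box is large enough. Computing Graver bases is expensive in general, so this is for illustration only.

```python
from itertools import product
from typing import List, Set, Tuple

Vec = Tuple[int, ...]

def conformal_leq(y: Vec, x: Vec) -> bool:
    """y ⊑ x: y and x are sign-compatible and |y_i| <= |x_i| for every i."""
    return all(a * b >= 0 and abs(a) <= abs(b) for a, b in zip(y, x))

def graver_in_box(A: List[List[int]], box: int) -> Set[Vec]:
    """Graver elements of A with all entries in [-box, box], by brute force."""
    n = len(A[0])
    kernel = [z for z in product(range(-box, box + 1), repeat=n)
              if any(z) and all(sum(a * zi for a, zi in zip(row, z)) == 0 for row in A)]
    # Anything conformally below a vector in the box is itself in the box,
    # so minimality can be checked against `kernel` alone.
    return {g for g in kernel
            if not any(h != g and conformal_leq(h, g) for h in kernel)}

def g1(A: List[List[int]], box: int) -> int:
    """g_1(A) = max l1-norm over the Graver basis (exact if box is large enough)."""
    return max(sum(abs(x) for x in g) for g in graver_in_box(A, box))
```

For \(A = (1\;2)\) this returns \(\{\pm (2,-1)\}\) with \(g_1(A) = 3\); for \(A = (1\;1\;1)\) with box 2 it returns the six vectors \({\mathbf{e}}_i - {\mathbf{e}}_j\).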

3 Proof of Theorem 1

We first give a relatively high-level description of the proof, before we present all its details.

3.1 Proof overview and ideas

3.1.1 Configuration LP and IP

Given an input to the huge N-fold IP, we first reformulate it as another IP, which we refer to as the Configuration IP. We then consider its fractional relaxation, the so-called Configuration LP. Our approach is to (efficiently) solve the Configuration LP and to bound the distance of its LP optimum to the integer optimum (of the Configuration IP). We use this bound to reduce the high-multiplicity input of the huge N-fold IP to the input of a standard N-fold IP that is small both in terms of the number of bricks and the size of the bounding box. We then solve this small instance using an existing N-fold IP algorithm. Along the way, there are several non-trivial obstacles that we need to overcome.

We will refer to huge N-fold IP as HugeIP, its corresponding fractional relaxation as HugeCP (this is a convex program if the objective f is convex), the Configuration LP of the HugeIP as ConfLP, and its integer version as ConfIP. We define a mapping \(\varphi \) from the solutions of ConfLP to the solutions of HugeCP which, for every variable \(y_{{\mathbf{c}}}\) of the ConfLP, introduces \(\lfloor y_{{\mathbf{c}}} \rfloor \) bricks with configuration \({\mathbf{c}}\), and then introduces \(\sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\}\) bricks with configuration \(\frac{1}{\sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\}} \sum _{{\mathbf{c}}} \{y_{{\mathbf{c}}}\} \cdot {\mathbf{c}}\) (i.e., an “average” configuration). We call a solution \({\mathbf{x}}^*\) of HugeCP “conf-optimal” if it is the image \(\varphi ({\mathbf{y}}^*)\) of some ConfLP optimum \({\mathbf{y}}^*\). One would hope that the objective value of a conf-optimal solution \({\mathbf{x}}^*\) in HugeCP equals that of \({\mathbf{y}}^*\) in ConfLP. While this is true for any linear objective f, it need not be true for a convex objective f. To overcome this impediment, we introduce an auxiliary objective \({\hat{f}}\) which preserves the values of optima of ConfLP and conf-optimal solutions of HugeCP.

3.1.2 Proximity theorem

The bulk of our work is showing that for each conf-optimal solution \({\mathbf{x}}^*\) of the HugeCP, there is an optimum \({\mathbf{z}}^*\) of the HugeIP whose \(\ell _1\)-distance from \({\mathbf{x}}^*\) is bounded by \(P :=(\Vert E\Vert _\infty rs)^{\mathcal {O}(rs)}\). We will show that we can obtain a ConfLP optimum \({\mathbf{y}}\) with support of size at most \(r+\tau \), and by the definition of \(\varphi \) (recall that \({\mathbf{x}}^* = \varphi ({\mathbf{y}})\)), this means that \({\mathbf{x}}^*\) has at most \(r+\tau +1\) distinct bricks (the \(+1\) is due to \(\varphi \) creating an additional “average configuration” brick type). This, in turn, means that our bound on the \(\ell _1\)-distance between \({\mathbf{z}}^*\) and \({\mathbf{x}}^*\) says something about ConfLP and ConfIP: for any ConfLP optimum \({\mathbf{y}}\) there is a ConfIP optimum \({\mathbf{y}}^*\) in \(\ell _1\)-distance at most P such that any configuration \({\mathbf{c}}\) in the support of \({\mathbf{y}}^*\) is at most P far from some configuration \({\mathbf{c}}'\) in the support of \({\mathbf{y}}\). As far as we know, no result of this kind was previously known for the Configuration LP.

A way of bounding the distance between certain types of optima of an integer program was introduced by Hochbaum and Shanthikumar [21] and adapted to the setting of N-fold IP by Hemmecke et al. [19]. A somewhat different approach was later developed by Eisenbrand and Weismantel [11] in the setting of IPs with few rows, and was adapted to the setting of N-fold IPs soon after [12, 13]. The idea is as follows. Let \({\mathbf{x}}^*\) be a HugeCP optimum and \({\mathbf{z}}^*\) be a HugeIP optimum. We call a non-zero integral vector \({\mathbf{p}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\), i.e., one that is sign-compatible with (has the same sign pattern as) \({\mathbf{x}}^* - {\mathbf{z}}^*\) and is no larger in absolute value than \({\mathbf{x}}^* - {\mathbf{z}}^*\) in each coordinate, a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\). If \({\mathbf{z}}^*\) minimizes \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), it can be shown that no cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) exists. Moreover, one shows that a cycle exists whenever \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) exceeds some bound B; together, these two facts imply \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1 \le B\).

Notice that the previous argument assumes \({\mathbf{x}}^*\) to be a HugeCP optimum: this cannot be replaced with a conf-optimal solution for the following reason. The existence of a cycle \({\mathbf{p}}\) leads to a contradiction because either \({\mathbf{z}}^* + {\mathbf{p}}\) is also a HugeIP optimum (but closer to \({\mathbf{x}}^*\)) or \({\mathbf{x}}^* - {\mathbf{p}}\) is also a HugeCP optimum (but closer to \({\mathbf{z}}^*\)). But if \({\mathbf{x}}^*\) is a conf-optimal solution, we have no guarantee that \({\mathbf{x}}^* - {\mathbf{p}}\) is again a configurable solution, and the argument breaks down. This means that we need to restrict our attention to cycles with the property that if \({\mathbf{x}}^*\) is a configurable solution, then \({\mathbf{x}}^* - {\mathbf{p}}\) is also configurable.

We call such a \({\mathbf{p}}\) a configurable cycle. The next task is to establish an analogue of the argument above: if \({\mathbf{x}}^*\) is conf-optimal and \({\mathbf{z}}^*\) is a HugeIP optimum, then the existence of a configurable cycle \({\mathbf{p}}\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\) leads to a contradiction. For that, we need the separability and convexity of the objective f and a careful use of the configurability of \({\mathbf{p}}\). With this argument at hand, we have reduced our task to bounding the norm of any configurable cycle (Lemma 7).

However, the main existing tool for showing proximity works by ruling out all cycles, whereas we can only rule out configurable cycles. To overcome this, we develop new tools to deal with configurable cycles.

3.1.3 The algorithm

It remains to use our proximity bound P. As already hinted at, if two solutions differ in \(\ell _1\)-norm by at most P, then they may differ in at most P bricks. This means that we may fix all but P bricks for each configuration appearing in the ConfLP optimum. Since the support of the ConfLP optimum is small (of size at most \(r+\tau \)), the total number of bricks that remain to be determined is also small, and these bricks can be determined using a standard N-fold IP algorithm within the required time complexity (Proof of Theorem 1).

To recap, the algorithm works in the following steps.

1. We solve the ConfLP and obtain its optimum \({\mathbf{y}}\) by solving its Dual LP using a separation oracle. The separation oracle is implemented using a fixed-parameter algorithm for IP with small coefficients.
2. We use the ConfLP optimum \({\mathbf{y}}\) to fix the solution on all but \((r+\tau )P\) bricks.
3. The remaining instance can be encoded as an N-fold IP with at most \((r+\tau )P\) bricks and solved using an existing algorithm.
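One possible reading of step 2 can be sketched as follows, under the assumption that “fixing all but P bricks for each configuration” means keeping \(\max (0, \lfloor y(i,{\mathbf{c}}) \rfloor - P)\) bricks of type i at configuration \({\mathbf{c}}\) and leaving every other brick of that type to the residual instance. The function `fix_bricks` is hypothetical, and we do not claim that it matches the exact rule used in the proof of Theorem 1.

```python
import math
from typing import Dict, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (brick type i, configuration c)

def fix_bricks(y: Dict[Key, float], mu: Dict[int, int], P: int):
    """Hypothetical step-2 sketch: returns (fixed, residual), where fixed[(i, c)]
    bricks of type i are set to configuration c and residual[i] bricks of type i
    are left to the residual N-fold IP."""
    fixed = {}
    for key, val in y.items():
        keep = math.floor(val) - P  # leave up to P copies of c undecided
        if keep > 0:
            fixed[key] = keep
    residual = dict(mu)
    for (i, _), count in fixed.items():
        residual[i] -= count
    return fixed, residual
```

For example, with \(P = 10\), \(\mu ^1 = 2000\) and a ConfLP optimum putting weight 1000.5 and 999.5 on two configurations of that type, 990 and 989 bricks are fixed and 21 bricks remain for the residual instance.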

Let us now turn to the detailed proof of Theorem 1.

3.2 Configurations of huge N-fold IP

Fix a huge N-fold IP instance with \(\tau \) types. Recall that \(\mu ^i\) denotes the number of bricks of type i, and \(\varvec{ \mu }= (\mu ^1, \dots , \mu ^\tau )\). We define for each \(i \in [\tau ]\) the set of configurations of type i as

$$\begin{aligned} \mathcal {C}^i = \left\{ {\mathbf{c}}\in \mathbb {Z}^t \mid E^i_2 {\mathbf{c}}= {\mathbf{b}}^i, \, {\mathbf{l}}^i \le {\mathbf{c}}\le {\mathbf{u}}^i \right\} . \end{aligned}$$
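For small boxes, \(\mathcal {C}^i\) can be enumerated directly. The helper below (name ours) is a brute-force sketch whose running time grows with the volume of the box \([{\mathbf{l}}^i, {\mathbf{u}}^i]\). It is not meant to reflect how the algorithm of Theorem 1 handles \(\mathcal {C}^i\), which can be exponentially large; there, the ConfLP is solved through its dual with a separation oracle.

```python
from itertools import product
from typing import List, Sequence, Tuple

def configurations(E2: List[List[int]], b: Sequence[int],
                   l: Sequence[int], u: Sequence[int]) -> List[Tuple[int, ...]]:
    """All c in Z^t with E2 c = b and l <= c <= u, in lexicographic order."""
    box = product(*(range(lo, hi + 1) for lo, hi in zip(l, u)))
    return [c for c in box
            if all(sum(a * x for a, x in zip(row, c)) == bi
                   for row, bi in zip(E2, b))]
```

For \(E^i_2 = (1\;1\;1)\), \({\mathbf{b}}^i = (2)\), \({\mathbf{l}}^i = (0,0,0)\) and \({\mathbf{u}}^i = (1,1,2)\), this yields the four configurations (0,0,2), (0,1,1), (1,0,1) and (1,1,0).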

Here we are interested in four instances of convex programming (CP) and convex integer programming (IP) related to huge N-fold IP. First, we have the Huge IP

$$\begin{aligned} \min f({\mathbf{x}}): \, E^{(N)} {\mathbf{x}}= {\mathbf{b}}, \, {\mathbf{l}}\le {\mathbf{x}}\le {\mathbf{u}}, \, {\mathbf{x}}\in \mathbb {Z}^{Nt}, \end{aligned}$$
(HugeIP)

and the Huge CP, which is a relaxation of (HugeIP),

$$\begin{aligned} \min {\hat{f}}({\mathbf{x}}): \, E^{(N)} {\mathbf{x}}= {\mathbf{b}}, \, {\mathbf{l}}\le {\mathbf{x}}\le {\mathbf{u}}, \, {\mathbf{x}}\in \mathbb {R}^{Nt} . \end{aligned}$$
(HugeCP)

We shall define the objective function \({\hat{f}}\) later; for now it suffices to say that \({\hat{f}}\) is convex and that \(f({\mathbf{x}}) = {\hat{f}}({\mathbf{x}})\) for all integral feasible \({\mathbf{x}}\in \mathbb {Z}^{Nt}\), so that the optimum of (HugeCP) indeed lower bounds the optimum of (HugeIP). Then, there is the Configuration LP of (HugeIP), that is, the following linear program:

$$\begin{aligned}&\min {\mathbf{v}}{\mathbf{y}}= \min \sum _{i=1}^\tau \sum _{{\mathbf{c}}\in \mathcal {C}^i} f^i({\mathbf{c}}) \cdot y(i, {\mathbf{c}}) \end{aligned}$$
(3)
$$\begin{aligned}&\sum _{i=1}^{\tau } E^i_1 \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\mathbf{c}}y(i, {\mathbf{c}}) = {\mathbf{b}}^0, \nonumber \\&\sum _{{\mathbf{c}}\in \mathcal {C}^i} y(i, {\mathbf{c}}) = \mu ^i \quad \quad \quad \forall i \in [\tau ],\nonumber \\&{\mathbf{y}}\ge {\mathbf {0}} . \end{aligned}$$
(4)

Letting B be its constraint matrix and \({\mathbf{d}}= \left( {\begin{matrix}{\mathbf{b}}^0 \\ \varvec{ \mu }^\intercal \end{matrix}}\right) \) be the right hand side, we can shorten (3)–(4) as

$$\begin{aligned} \min {\mathbf{v}}{\mathbf{y}}:\, B {\mathbf{y}}= {\mathbf{d}}, \, {\mathbf{y}}\ge {\mathbf {0}} . \end{aligned}$$
(ConfLP)
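To make the indexing of (ConfLP) explicit, the sketch below builds \(B\), \({\mathbf{v}}\) and \({\mathbf{d}}\) from explicit configuration lists, with one column per pair \((i, {\mathbf{c}})\), \({\mathbf{c}}\in \mathcal {C}^i\). The function name and argument layout are our own; it only makes sense when all \(\mathcal {C}^i\) are small enough to list.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[int, ...]

def conf_lp_data(E1s: List[List[List[int]]], confs: List[List[Vec]],
                 fs: List[Callable[[Vec], int]], b0: Sequence[int],
                 mu: Sequence[int]):
    """Return (cols, B, v, d) for (ConfLP); column k of B belongs to cols[k] = (i, c)."""
    cols = [(i, c) for i, Ci in enumerate(confs) for c in Ci]
    r, tau = len(b0), len(confs)
    B = [[0] * len(cols) for _ in range(r + tau)]
    v = []
    for k, (i, c) in enumerate(cols):
        # Rows 0..r-1: contribution E^i_1 c to the linking constraints.
        for row in range(r):
            B[row][k] = sum(a * x for a, x in zip(E1s[i][row], c))
        # Row r + i: column (i, c) counts towards the multiplicity mu^i.
        B[r + i][k] = 1
        v.append(fs[i](c))
    d = list(b0) + list(mu)
    return cols, B, v, d
```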

Finally, observing that \(B{\mathbf{y}}={\mathbf{d}}\) and \({\mathbf{y}}\ge {\mathbf {0}}\) imply \(y(i,{\mathbf{c}}) \le \Vert \varvec{ \mu }\Vert _\infty \) for all \(i \in [\tau ], {\mathbf{c}}\in \mathcal {C}^i\), and defining \(C = \sum _{i \in [\tau ]} |\mathcal {C}^i|\), we obtain the Configuration ILP,

$$\begin{aligned} \min {\mathbf{v}}{\mathbf{y}}: \,B {\mathbf{y}}= {\mathbf{d}}, \, {\mathbf {0}} \le {\mathbf{y}}\le (\Vert \varvec{ \mu }\Vert _\infty , \dots , \Vert \varvec{ \mu }\Vert _\infty )^{\intercal },\, {\mathbf{y}}\in \mathbb {N}^{C} . \end{aligned}$$
(ConfILP)

A solution \({\mathbf{x}}\) of (HugeCP) is configurable if, for every \(i \in [\tau ]\), each brick \({\mathbf{x}}^j\) of type i is a convex combination of elements of \(\mathcal {C}^i\), i.e., \({\mathbf{x}}^j \in \text {conv}(\mathcal {C}^i)\). We define a mapping from solutions of (ConfLP) to configurable solutions of (HugeCP) as follows. For every solution \({\mathbf{y}}\) of (ConfLP), the solution \({\mathbf{x}}= \varphi ({\mathbf{y}})\) of (HugeCP) has \({\lfloor }{y(i, {\mathbf{c}})}{\rfloor }\) bricks of type i with configuration \({\mathbf{c}}\) and, for each \(i \in [\tau ]\), letting \({\mathfrak {f}}^i = \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{y(i, {\mathbf{c}})\}\), it has \({\mathfrak {f}}^i\) bricks of type i with value \({\hat{{\mathbf{c}}}}_i = \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{y(i, {\mathbf{c}})\}{\mathbf{c}}\). (Because \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} y(i,{\mathbf{c}}) = \mu ^i\) and \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} {\lfloor }{y(i,{\mathbf{c}})}{\rfloor }\) is clearly integral, \({\mathfrak {f}}^i = \mu ^i - \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\lfloor }{y(i,{\mathbf{c}})}{\rfloor }\) is also integral.) Note that \(\varphi ({\mathbf{y}})\) has at most as many fractional bricks as \({\mathbf{y}}\) has fractional entries: since each fractional part \(\{y(i, {\mathbf{c}})\}\) is less than 1, the integer \({\mathfrak {f}}^i\), whenever it is non-zero, is smaller than the number of fractional entries \(y(i, \cdot )\) of type i. Call a solution \({\mathbf{x}}\) of (HugeCP) conf-optimal if there is an optimal solution \({\mathbf{y}}\) of (ConfLP) such that \({\mathbf{x}}= \varphi ({\mathbf{y}})\).
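The mapping \(\varphi \) can be sketched in a few lines. The version below (names ours) uses exact rational arithmetic via Python's `fractions.Fraction`, so that the integrality of \({\mathfrak {f}}^i\) can be checked exactly, and it returns each distinct brick together with its multiplicity instead of listing all N bricks.

```python
import math
from fractions import Fraction
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]

def phi(y: Dict[int, Dict[Vec, Fraction]]) -> List[Tuple[int, tuple, int]]:
    """Map a ConfLP solution, given as y[i][c] = y(i, c), to the bricks of
    phi(y) as (type i, brick value, number of copies) triples."""
    bricks = []
    for i, yi in y.items():
        # floor(y(i, c)) copies of each configuration c
        for c, val in yi.items():
            if math.floor(val) > 0:
                bricks.append((i, tuple(Fraction(x) for x in c), math.floor(val)))
        frac = {c: Fraction(val) - math.floor(val) for c, val in yi.items()}
        f_i = sum(frac.values(), Fraction(0))
        assert f_i.denominator == 1, "fractional parts must sum to an integer"
        if f_i > 0:
            # f^i copies of the average configuration hat{c}_i
            t = len(next(iter(yi)))
            avg = tuple(sum(frac[c] * c[k] for c in yi) / f_i for k in range(t))
            bricks.append((i, avg, int(f_i)))
    return bricks
```

For a single type with \(y(i,(1,0)) = 5/2\) and \(y(i,(0,1)) = 3/2\) (so \(\mu ^i = 4\)), it produces two copies of (1, 0), one copy of (0, 1), and one average brick (1/2, 1/2).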

We are going to introduce an auxiliary objective function \({\hat{f}}\), but we first want to discuss our motivation in doing so. The reader might already see that for any integer solution \({\mathbf{y}}\in \mathbb {Z}^{C}\) of (ConfILP), \({\mathbf{v}}{\mathbf{y}}= f(\varphi ({\mathbf{y}}))\) holds, as we shall prove in Lemma 4. Our natural hope would be that for a fractional optimum \({\mathbf{y}}^*\) of (ConfLP) we would have \({\mathbf{v}}{\mathbf{y}}^* = f(\varphi ({\mathbf{y}}^*))\). However, by convexity of f and the construction of \({\hat{{\mathbf{c}}}}_i\) it only follows that \({\mathbf{v}}{\mathbf{y}}^* \ge f(\varphi ({\mathbf{y}}^*))\). Even worse, there may be two conf-optimal solutions \({\mathbf{x}}\) and \({\mathbf{x}}'\) with \(f({\mathbf{x}}) < f({\mathbf{x}}')\). To overcome this, we define an auxiliary objective function \({\hat{f}}\) with the property that for any conf-optimal solution \({\mathbf{x}}^*\) of (HugeCP) and any optimal solution \({\mathbf{y}}^*\) of (ConfLP), \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\mathbf{x}}^*)\).

Fix a brick \({\mathbf{x}}^j\) of type i. We say that a multiset \(\varGamma ^j \subseteq (\mathcal {C}^i \times \mathbb {R}_{\ge 0})\) is a decomposition of \({\mathbf{x}}^j\) and write \({\mathbf{x}}^j = \sum \varGamma ^j\) if \({\mathbf{x}}^j = \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{{\mathbf{c}}} {\mathbf{c}}\) and \(\sum _{({\mathbf{c}}, \lambda _{{\mathbf{c}}}) \in \varGamma ^j} \lambda _{\mathbf{c}}= 1\). We define the objective \({\hat{f}}({\mathbf{x}})\) for all configurable solutions as \({\hat{f}}({\mathbf{x}}) = \sum _{j=1}^N {\hat{f}}^i({\mathbf{x}}^j)\), where i denotes the type of brick \({\mathbf{x}}^j\) and

$$\begin{aligned} {\hat{f}}^i({\mathbf{x}}^j) = \min _{\varGamma ^j: \sum \varGamma ^j = {\mathbf{x}}^j} \sum _{({\mathbf{c}}, \lambda _{{\mathbf{c}}}) \in \varGamma ^j} \lambda _{{\mathbf{c}}} \cdot f^i({\mathbf{c}}) . \end{aligned}$$
(5)

In a sense, \({\hat{f}}({\mathbf{x}})\) is the value of the minimum (w.r.t. f) interpretation of \({\mathbf{x}}\) as a convex combination of feasible integer solutions. Correspondingly, we call a decomposition \(\varGamma ^j\) of \({\mathbf{x}}^j\) \({\hat{f}}\)-optimal if it is a minimizer of (5). Formally, we let \({\hat{f}}^i({\mathbf{x}}^j) = f^i({\mathbf{x}}^j)\) for a non-configurable \({\mathbf{x}}^j\) in order to make the definition of (HugeCP) valid; however, we are never interested in the value of \({\hat{f}}\) for non-configurable bricks in the following.

Lemma 2

Let \({\mathbf{x}}\) be a configurable solution of (HugeCP), and \({\mathbf{x}}^j\) be a brick of type i. Then \(f^i({\mathbf{x}}^j) \le {\hat{f}}^i({\mathbf{x}}^j)\). If \({\mathbf{x}}^j\) is integral, then \(f^i({\mathbf{x}}^j) = {\hat{f}}^i({\mathbf{x}}^j)\).

Proof

By convexity of \(f^i\) we have

$$\begin{aligned} f^i({\mathbf{x}}^j) = f^i\left( \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{{\mathbf{c}}} {\mathbf{c}}\right)&\le \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}f^i({\mathbf{c}}), \end{aligned}$$

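for any decomposition \(\varGamma ^j\) of \({\mathbf{x}}^j\); taking the minimum over all decompositions yields \(f^i({\mathbf{x}}^j) \le {\hat{f}}^i({\mathbf{x}}^j)\). If \({\mathbf{x}}^j\) is integral, then \({\mathbf{x}}^j \in \mathcal {C}^i\), because every point of \(\text {conv}(\mathcal {C}^i)\) satisfies \(E^i_2 {\mathbf{x}}^j = {\mathbf{b}}^i\) and \({\mathbf{l}}^i \le {\mathbf{x}}^j \le {\mathbf{u}}^i\). Hence \(\varGamma ^j = \{({\mathbf{x}}^j, 1)\}\) is a decomposition of value \(f^i({\mathbf{x}}^j)\), so by the inequality above it is an optimal decomposition (not necessarily unique), concluding the proof. \(\square \)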

Moreover, for each \({\mathbf{x}}^j\) there is an \({\hat{f}}\)-optimal decomposition \(\varGamma ^j\) with \(|\varGamma ^j|\le t+1\) since \({\hat{f}}\)-optimal decompositions correspond to optima of a linear program with \(t+1\) equality constraints, namely

$$\begin{aligned} \min \sum _{{\mathbf{c}}\in \mathcal {C}^i} \lambda _{{\mathbf{c}}} f^i({\mathbf{c}}) \quad \text {s.t.} \quad \sum _{{\mathbf{c}}\in \mathcal {C}^i} \lambda _{{\mathbf{c}}} {\mathbf{c}}= {\mathbf{x}}^j, \, \Vert \varvec{ \lambda }\Vert _1 = 1, \, \varvec{ \lambda }\ge {\mathbf{0}}. \end{aligned}$$
(6)
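To make LP (6) concrete, the following Python sketch computes \({\hat{f}}^i({\mathbf{x}}^j)\) for a tiny, explicitly listed set of configurations by enumerating all basic solutions (subsets of at most \(t+1\) linearly independent columns) in exact rational arithmetic. It is exponential in \(|\mathcal {C}^i|\) and only illustrates the statement above; the helper names `solve_exact` and `f_hat` are hypothetical, and the configurations are passed in directly instead of being generated from \(E^i_2, {\mathbf{b}}^i, {\mathbf{l}}^i, {\mathbf{u}}^i\).

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(cols, rhs):
    """Unique solution lam of sum_k lam[k] * cols[k] == rhs over the rationals,
    or None if the system is inconsistent or the columns are linearly dependent."""
    m, k = len(rhs), len(cols)
    # Augmented matrix: one row per equation, right-hand side in the last column.
    aug = [[Fraction(cols[j][r]) for j in range(k)] + [Fraction(rhs[r])] for r in range(m)]
    for col in range(k):
        piv = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if piv is None:
            return None  # dependent columns: not a basic solution
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # Leftover equations must read 0 == 0.
    if any(aug[r][k] != 0 for r in range(k, m)):
        return None
    return [aug[j][k] for j in range(k)]


def f_hat(configs, f, x):
    """Value and decomposition of LP (6), found by enumerating basic solutions.

    configs: explicit list of configurations (integer tuples of length t),
    f: cost function on configurations, x: the brick (tuple of rationals).
    Returns None if x is not a convex combination of the configurations.
    """
    t = len(x)
    best = None
    # Columns (c, 1): the trailing 1 encodes the constraint ||lambda||_1 = 1.
    for size in range(1, t + 2):  # a basic solution uses at most t + 1 columns
        for subset in combinations(configs, size):
            lam = solve_exact([list(c) + [1] for c in subset], list(x) + [1])
            if lam is None or any(l < 0 for l in lam):
                continue
            value = sum(l * f(c) for l, c in zip(lam, subset))
            if best is None or value < best[0]:
                best = (value, [(c, l) for c, l in zip(subset, lam) if l != 0])
    return best
```

For example, with configurations \(\{0,1,2,3\}\) (so \(t = 1\)) and \(f^i(c) = c^2\), the sketch returns \({\hat{f}}^i(3/2) = 5/2\) via the decomposition \(\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 2\), which is above \(f^i(3/2) = 9/4\), consistent with Lemma 2; for the integral point 2 it returns \(f^i(2) = 4\).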

Let us describe the relationship of the objective values of the various formulations.

Lemma 3

For any feasible solution \({\tilde{{\mathbf{y}}}}\) of (ConfLP),

$$\begin{aligned} {\mathbf{v}}{\tilde{{\mathbf{y}}}} \ge {\hat{f}}(\varphi ({\tilde{{\mathbf{y}}}})) . \end{aligned}$$
(7)

Proof

Let \({\tilde{{\mathbf{x}}}} = \varphi ({\tilde{{\mathbf{y}}}})\). We can decompose \({\hat{f}}({\tilde{{\mathbf{x}}}}) = U_1 + U_2\), where \(U_1\) is the cost of the integer bricks of \({\tilde{{\mathbf{x}}}}\) and \(U_2\) is the cost of its fractional bricks. It is easy to see that \(U_1 = {\mathbf{v}}{\lfloor }{{\tilde{{\mathbf{y}}}}}{\rfloor }\) by the equality of \(f^i\) and \({\hat{f}}^i\), for all \(i \in [\tau ]\), over integer vectors (Lemma 2). We further decompose \(U_2\) into the costs of the fractional bricks of each type. For each \(i \in [\tau ]\), the cost of each fractional brick of type i is at most \(\frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}f^i({\mathbf{c}})\), because the decomposition \(\left\{ \left( {\mathbf{c}}, \frac{1}{{\mathfrak {f}}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}\right) \Big | {\mathbf{c}}\in \mathcal {C}^i\right\} \) of \({\hat{{\mathbf{c}}}}_i\) (recall that \({\hat{{\mathbf{c}}}}_i = \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}{\mathbf{c}}\)) is merely a feasible (not necessarily optimal) solution of (6). Summing this estimate over all \({\mathfrak {f}}^i\) fractional bricks of type i gives \({\mathfrak {f}}^i \cdot \frac{1}{{\mathfrak {f}}^i} \sum _{{\mathbf{c}}\in \mathcal {C}^i} \{{\tilde{y}}(i, {\mathbf{c}})\}f^i({\mathbf{c}}) = {\mathbf{v}}^i \{{\tilde{{\mathbf{y}}}}^i\}\). Hence \(U_2 \le {\mathbf{v}}\{{\tilde{{\mathbf{y}}}}\}\) and \({\hat{f}}(\varphi ({\tilde{{\mathbf{y}}}})) = U_1 + U_2 \le {\mathbf{v}}{\lfloor }{{\tilde{{\mathbf{y}}}}}{\rfloor } + {\mathbf{v}}\{{\tilde{{\mathbf{y}}}}\} = {\mathbf{v}}{\tilde{{\mathbf{y}}}}\), concluding the proof. \(\square \)
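The proof above uses the following reading of \(\varphi \) for a single brick type, which we reconstruct from the proof rather than restate from its definition: each configuration \({\mathbf{c}}\) contributes \(\lfloor {\tilde{y}}(i,{\mathbf{c}})\rfloor \) integral bricks equal to \({\mathbf{c}}\), and the fractional parts are pooled into \({\mathfrak {f}}^i = \sum _{{\mathbf{c}}} \{{\tilde{y}}(i,{\mathbf{c}})\}\) identical bricks \({\hat{{\mathbf{c}}}}_i\) (this value of \({\mathfrak {f}}^i\) is forced by requiring the coefficients \(\frac{1}{{\mathfrak {f}}^i}\{{\tilde{y}}(i,{\mathbf{c}})\}\) to sum to 1). A minimal Python sketch of this assumed construction, with the hypothetical name `phi_one_type`:

```python
import math
from fractions import Fraction


def phi_one_type(y):
    """Bricks produced from the configuration values of one brick type.

    y: dict mapping a configuration (integer tuple) to its value y(i, c).
    Each c yields floor(y(i, c)) integral bricks equal to c; the fractional
    parts are pooled into n_frac identical bricks c_hat (assumed reading of phi).
    """
    bricks = []
    for c, yc in y.items():
        bricks.extend([tuple(Fraction(v) for v in c)] * math.floor(yc))
    frac = {c: yc - math.floor(yc) for c, yc in y.items()}
    n_frac = sum(frac.values())  # this is frak{f}^i
    # sum(y) = mu^i is an integer, so the fractional parts sum to an integer.
    assert Fraction(n_frac).denominator == 1
    if n_frac > 0:
        t = len(next(iter(y)))
        c_hat = tuple(sum(frac[c] * c[k] for c in y) / n_frac for k in range(t))
        bricks.extend([c_hat] * int(n_frac))
    return bricks
```

For instance, \({\tilde{y}}(i,0) = 4/3\) and \({\tilde{y}}(i,2) = 5/3\) give the integral bricks 0 and 2 and one fractional brick \(4/3\); there are \(\mu ^i = 3\) bricks in total, and their sum \(10/3\) equals \(\sum _{\mathbf{c}}{\tilde{y}}(i,{\mathbf{c}})\,{\mathbf{c}}\).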

Lemma 4

Let \({\hat{{\mathbf{y}}}}\) be an optimum of (ConfILP), \({\mathbf{z}}^*\) be an optimum of (HugeIP), \({\mathbf{y}}^*\) be an optimum of (ConfLP), \({\tilde{{\mathbf{x}}}} = \varphi ({\mathbf{y}}^*)\), and \({\mathbf{x}}^*\) be a configurable optimum of (HugeCP). Then

$$\begin{aligned}{\hat{f}}({\mathbf{z}}^*) = f({\mathbf{z}}^*) = f(\varphi ({\hat{{\mathbf{y}}}})) = {\mathbf{v}}{\hat{{\mathbf{y}}}} \ge {\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\tilde{{\mathbf{x}}}}) = {\hat{f}}({\mathbf{x}}^*) .\end{aligned}$$

Proof

We have \({\hat{f}}({\mathbf{z}}^*) = f({\mathbf{z}}^*)\) by equality of \({\hat{f}}\) and f on integer solutions (Lemma 2), and \(f({\mathbf{z}}^*) = f(\varphi ({\hat{{\mathbf{y}}}})) = {\mathbf{v}}{\hat{{\mathbf{y}}}}\) by the definition of \(\varphi \) and the fact that \({\hat{{\mathbf{y}}}}\) is an integer optimum. Clearly, \({\mathbf{v}}{\hat{{\mathbf{y}}}} \ge {\mathbf{v}}{\mathbf{y}}^*\), because (ConfLP) is a relaxation of (ConfILP) and thus the former lower bounds the latter.

Let us construct a mapping \(\phi \) for any configurable solution \({\mathbf{x}}\) of (HugeCP). Start with \({\mathbf{y}}= {\mathbf{0}}\). For each brick \({\mathbf{x}}^j\) of type i, let \(\varGamma ^j\) be an \({\hat{f}}\)-optimal decomposition of \({\mathbf{x}}^j\) and update \(y^i_{\mathbf{c}}:= y^i_{\mathbf{c}}+ \lambda _{\mathbf{c}}\) for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j\). Finally, set \(\phi ({\mathbf{x}}) := {\mathbf{y}}\). Now it is easy to see that

$$\begin{aligned} {\mathbf{v}}\phi ({\mathbf{x}}^*) = {\hat{f}}({\mathbf{x}}^*) . \end{aligned}$$
(8)

Our goal is to argue that \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\tilde{{\mathbf{x}}}}) = {\hat{f}}({\mathbf{x}}^*)\). We have \({\hat{f}}({\tilde{{\mathbf{x}}}}) = {\hat{f}}(\varphi ({\mathbf{y}}^*)) \le {\mathbf{v}}{\mathbf{y}}^*\) by (7), but by optimality of \({\mathbf{y}}^*\) and (8) it must be that \({\mathbf{v}}\phi ({\tilde{{\mathbf{x}}}}) = {\hat{f}}({\tilde{{\mathbf{x}}}}) \ge {\mathbf{v}}{\mathbf{y}}^*\) and hence \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\tilde{{\mathbf{x}}}})\). Similarly,

$$\begin{aligned} {\hat{f}}({\mathbf{x}}^*) = {\mathbf{v}}\phi ({\mathbf{x}}^*) \ge {\mathbf{v}}{\mathbf{y}}^* \ge {\hat{f}}(\varphi ({\mathbf{y}}^*)) \end{aligned}$$

with the “\(=\)” by (8), the first “\(\ge \)” by optimality of \({\mathbf{y}}^*\), and the second “\(\ge \)” by (7). However, since \({\hat{f}}(\varphi ({\mathbf{y}}^*)) \ge {\hat{f}}({\mathbf{x}}^*)\) by optimality of \({\mathbf{x}}^*\), all inequalities are in fact equalities and thus \({\mathbf{v}}{\mathbf{y}}^* = {\hat{f}}({\mathbf{x}}^*)\). \(\square \)
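The construction of \(\phi \) in the proof above is a simple accumulation. A minimal Python sketch, assuming the \({\hat{f}}\)-optimal decompositions of the bricks are already given (for example, computed from LP (6)); the function name `config_vector` and the input format are hypothetical:

```python
from collections import defaultdict
from fractions import Fraction


def config_vector(bricks):
    """Accumulate y = phi(x) from decompositions of the bricks of x.

    bricks: list of (brick_type, decomposition), where a decomposition is a
    list of (configuration, weight) pairs whose weights sum to 1.
    """
    y = defaultdict(Fraction)
    for i, gamma in bricks:
        assert sum(lam for _, lam in gamma) == 1
        for c, lam in gamma:
            y[(i, c)] += lam  # y^i_c := y^i_c + lambda_c
    return dict(y)
```

Since the entry of \({\mathbf{v}}\) for the pair \((i,{\mathbf{c}})\) is \(f^i({\mathbf{c}})\), the value \({\mathbf{v}}\phi ({\mathbf{x}})\) equals \(\sum _j \sum _{({\mathbf{c}},\lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}f^i({\mathbf{c}})\), which is exactly the sum behind (8) when every \(\varGamma ^j\) is \({\hat{f}}\)-optimal. For example, one brick decomposed as \(\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 2\) and one integral brick equal to 2 yield \(y^1_1 = 1/2\) and \(y^1_2 = 3/2\).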

Remark 1

We only need the properties of \({\hat{f}}\) that we have proved so far. To gain a little bit more intuition, consider the dual of the LP (6). Notice that the set of right-hand sides \({\mathbf{x}}^j\) whose optimum is attained by a particular set of configurations \(\text {supp}(\varvec{ \lambda })\) is a polyhedron. Call such a set a cell. This means that \({\hat{f}}\) is a convex function which is affine in each cell. Another observation is that \({\hat{f}}\) is, in general, non-separable.

We do not have a more intuitive explanation of \({\hat{f}}\). It would be tempting to think that \({\hat{f}}\) is the piece-wise linear approximation of f in which, for every \(i \in [Nt]\), we replace each segment of \(f_i\) between two adjacent integers \(k,k+1\) by the affine function going through the points \((k,f_i(k))\) and \((k+1, f_i(k+1))\). However, this turns out to be incorrect: for example, say that \(f_1(x_1) = |x_1-1|\) (thus \(f_1(0) = f_1(2) = 1\) and \(f_1(1) = 0\)) and that we set \(x_1 = 2x_2\) for a new integer variable \(x_2\). This constraint ensures that \(x_1\) is even in every configuration, and \(f_1(x_1) \ge 1\) at every even integer \(x_1\). Thus, every decomposition of a point with \(x_1 = 1\) has cost at least 1, so \({\hat{f}}_1(1) \ge 1\), even though the piece-wise linear approximation of \(f_1\) has value 0 at 1.
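The example can be checked numerically. The sketch below treats \((x_1, x_2)\) as a single brick with cost \(f_1(x_1)\) (and zero cost on \(x_2\)), and assumes the box \(0 \le x_1 \le 2\), \(0 \le x_2 \le 1\), which the remark does not specify. It lists the configurations, evaluates the piece-wise linear interpolation at \(x_1 = 1\), and exhibits a decomposition of \((1, \frac{1}{2})\) of cost 1; because every configuration costs at least 1, this shows \({\hat{f}}(1, \frac{1}{2}) = 1\) under these assumptions.

```python
from fractions import Fraction


def f1(x1):
    return abs(x1 - 1)


# Configurations (x1, x2) with x1 = 2 * x2 inside the assumed box 0 <= x1 <= 2, 0 <= x2 <= 1.
configs = [(x1, x2) for x1 in range(3) for x2 in range(2) if x1 == 2 * x2]

# The piece-wise linear interpolation of f1 agrees with f1 at integers, so at x1 = 1 it is f1(1).
interp_at_1 = f1(1)

# Every configuration costs at least min_cost, hence so does every convex combination of them.
min_cost = min(f1(x1) for x1, _ in configs)

# A decomposition of the brick (1, 1/2) into configurations, and its cost.
decomp = [((0, 0), Fraction(1, 2)), ((2, 1), Fraction(1, 2))]
point = tuple(sum(lam * c[k] for c, lam in decomp) for k in range(2))
cost = sum(lam * f1(c[0]) for c, lam in decomp)
```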

Bounding the number of fractional coordinates.

Lemma 5

(Adaptation of [8, Lemma 4.1]) An optimal vertex solution \({\mathbf{y}}^*\) of (ConfLP) has at most 2r fractional coordinates.

Proof

Notice that if a brick \(({\mathbf{y}}^*)^i\) is a vertex of the set \(Q^i := \text {conv}\{{\mathbf{y}}^i \in \mathbb {R}^{\mathcal {C}^i} \mid {\mathbf{1}}{\mathbf{y}}^i = \mu ^i, {\mathbf{y}}^i \ge {\mathbf{0}}\}\), then it is integral. Thus, any brick of \({\mathbf{y}}^*\) which is fractional cannot be a vertex of \(Q^i\) and hence there exists a direction \({\mathbf{e}}^i \in \text {Ker}_{\mathbb {Z}}({\mathbf{1}})\) and a length \(\lambda ^i >0\) such that \(({\mathbf{y}}^*)^i \pm \lambda ^i {\mathbf{e}}^i \in Q^i\). For the sake of contradiction, assume there are \(r+1\) bricks of \({\mathbf{y}}^*\) which contain a fractional coordinate and I is the index set of such bricks. Hence we have \({\mathbf{e}}^i, \lambda ^i\) as above for each \(i \in I\). We abuse the notation and treat \(\mathcal {C}^i\) as a matrix whose columns are the configurations. Consider the vectors \(E_1^i \mathcal {C}^i \lambda ^i {\mathbf{e}}^i \in \mathbb {R}^r\): because there are \(r+1\) of them, they are linearly dependent, and, by rescaling, there must be coefficients \({\bar{\varvec{ \lambda }}}\) such that \(|{\bar{\lambda }}^i| \le \lambda ^i\) for each \(i \in I\) and \(\sum _{i \in I} E_1^i \mathcal {C}^i {\bar{\lambda }}^i {\mathbf{e}}^i = {\mathbf{0}}\). Define \({\mathbf{e}}\in \mathbb {R}^C\) (recall that C is the total number of configurations) such that its i-th brick is equal to \({\bar{\lambda }}^i {\mathbf{e}}^i\) if \(i \in I\), and is \({\mathbf{0}}\) otherwise. Then \({\mathbf{y}}^* \pm {\mathbf{e}}\) are both feasible solutions of (ConfLP), and thus \({\mathbf{y}}^*\) is not a vertex solution—a contradiction.

So far, we have shown there are at most r fractional bricks of \({\mathbf{y}}^*\). Notice that all we needed for that was \(r+1\) linearly dependent vectors which can be added to some brick in both directions while preserving feasibility. Because \({\mathbf{e}}^i \in \text {Ker}_{\mathbb {Z}}({\mathbf{1}})\) for each \(i \in I\), we can decompose \({\mathbf{e}}^i\) into elements of \(\mathcal {G}({\mathbf{1}})\), which are exactly vectors with one 1 and one \(-1\). Hence, to avoid the contradiction above, there can be at most r vectors \({\mathbf{e}}^i\), and, additionally, all of them must belong to \(\mathcal {G}({\mathbf{1}})\). Thus, the resulting vector \({\mathbf{e}}\) has support of size at most 2r, and \({\mathbf{y}}^*\) has at most 2r fractional coordinates. \(\square \)
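The decomposition of \({\mathbf{e}}^i \in \text {Ker}_{\mathbb {Z}}({\mathbf{1}})\) into elements of \(\mathcal {G}({\mathbf{1}})\) used above can be obtained greedily by pairing a positive with a negative coordinate. A minimal Python sketch (the function name is hypothetical); every returned vector has one entry 1 and one entry \(-1\) and is sign-compatible with the input, so the decomposition is conformal:

```python
def decompose_kernel_of_ones(e):
    """Write an integer vector e with sum(e) == 0 as a sign-compatible sum of
    vectors with a single +1 and a single -1 (the Graver basis of the row 1)."""
    assert sum(e) == 0
    rest = list(e)
    parts = []
    while any(rest):
        # Since sum(rest) == 0 and rest != 0, both coordinates below exist.
        a = next(k for k, v in enumerate(rest) if v > 0)
        b = next(k for k, v in enumerate(rest) if v < 0)
        g = [0] * len(rest)
        g[a], g[b] = 1, -1
        parts.append(g)
        rest[a] -= 1  # move both coordinates one step towards zero
        rest[b] += 1
    return parts
```

For \({\mathbf{e}}= (2,-1,0,-1)\) it returns \((1,-1,0,0)\) and \((1,0,0,-1)\); in general it returns \(\Vert {\mathbf{e}}\Vert _1/2\) vectors.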

Finding a conf-optimal solution with small number of fractional bricks.

Our goal is to show that the distance of any conf-optimal solution \({\mathbf{x}}^*\) of (HugeCP) to an integer optimum \({\mathbf{z}}^*\) of (HugeIP) depends on the number of fractional bricks of \({\mathbf{x}}^*\). This number, by definition of \(\varphi \), depends on the number of fractional coordinates of the corresponding solution \({\mathbf{y}}\) of (ConfLP). The following lemma shows how to produce optima of (ConfLP) with small support. We emphasize that our proximity theorem does not require the fractional solution to be optimal, only conf-optimal.

Lemma 6

There is an algorithm that finds an optimal vertex solution \({\mathbf{y}}^*\) of (ConfLP) with \(|\text {supp}({\mathbf{y}}^*)| \le r + \tau \) and at most 2r fractional coordinates, and a conf-optimal solution \({\mathbf{x}}^* = \varphi ({\mathbf{y}}^*)\) of (HugeCP) with at most 2r fractional bricks, in time \(g_1(E_2)^{\mathcal {O}(s)} {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }, E\Vert _\infty )\).

Proof

The proof has three parts. First, we describe how to find an optimal basic solution of the dual of (ConfLP). Next, we identify \(r+\tau \) inequalities of this dual which fully determine the optimal dual LP solution. Finally, we show how to use this information to solve (ConfLP) itself.

Recall that \(\tau \) is the number of brick types in the huge N-fold instance. Since (ConfLP) has exponentially many variables, we take the standard approach and solve the dual LP of (ConfLP) by the ellipsoid method and the equivalence of optimization and separation. The dual LP of (ConfLP) in variables \(\varvec{\alpha } \in \mathbb {R}^r\), \(\varvec{\beta } \in \mathbb {R}^{\tau }\) is:

$$\begin{aligned} \max&{\mathbf{b}}^0 \varvec{\alpha } + \sum _{i=1}^\tau \mu ^i \beta ^i&\nonumber \\ \text {s.t.}&(\varvec{\alpha } E^i_1) {\mathbf{c}}- f^i({\mathbf{c}})&\le -\beta ^i&\forall i \in [\tau ], \,\forall {\mathbf{c}}\in \mathcal {C}^i. \end{aligned}$$
(9)

To verify feasibility of \((\varvec{\alpha }, \varvec{\beta })\), we need, for each \(i \in [\tau ]\), to maximize the left-hand side of (9) over all \({\mathbf{c}}\in \mathcal {C}^i\) and check if it is at most \(-\beta ^i\). This corresponds to finding integer variables \({\mathbf{c}}\) which for given \((\varvec{\alpha }, \varvec{\beta })\) solve

$$\begin{aligned} \min \left( f^i({\mathbf{c}})- (\varvec{\alpha } E^i_1) {\mathbf{c}}\right) = -\max \,\left( (\varvec{\alpha } E^i_1) {\mathbf{c}}- f^i({\mathbf{c}})\right) \,:\, E^i_2 {\mathbf{c}}= {\mathbf{b}}^i, \, {\mathbf{l}}^i \le {\mathbf{c}}\le {\mathbf{u}}^i, {\mathbf{c}}\in \mathbb {Z}^t. \end{aligned}$$

This program can be solved in time \(T''' \le g_1(E_2)^{\mathcal {O}(s)} t^3\cdot {{\,\mathrm{\mathrm{poly}}\,}}(\log \Vert {\mathbf{b}}^i, {\mathbf{l}}^i, {\mathbf{u}}^i, E\Vert _\infty )\) [33, Theorem 4].
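For intuition, the separation step can be sketched by brute force on tiny instances: enumerate \(\mathcal {C}^i\) inside the box \([{\mathbf{l}}^i, {\mathbf{u}}^i]\), maximize the left-hand side of (9), and report a violated inequality if the maximum exceeds \(-\beta ^i\). This enumeration replaces the algorithm of [33, Theorem 4] and is exponential in t; the function names and input conventions below are hypothetical.

```python
from itertools import product


def configurations(E2, b, lower, upper):
    """Brute-force enumeration of C^i = {c in Z^t : E2 c = b, lower <= c <= upper}."""
    box = [range(lo, up + 1) for lo, up in zip(lower, upper)]
    return [c for c in product(*box)
            if all(sum(a * ck for a, ck in zip(row, c)) == bk for row, bk in zip(E2, b))]


def separate(alpha, beta_i, E1, configs, f):
    """Return a configuration c with (alpha E1) c - f(c) > -beta_i, or None.

    Assumes configs is non-empty.
    """
    t = len(E1[0])
    w = [sum(alpha[r] * E1[r][k] for r in range(len(alpha))) for k in range(t)]  # alpha E1

    def lhs(c):
        return sum(wk * ck for wk, ck in zip(w, c)) - f(c)

    best = max(configs, key=lhs)
    return best if lhs(best) > -beta_i else None
```

With \(E^i_2 = (1\ 1)\), \(b^i = 2\), bounds \({\mathbf{0}}\le {\mathbf{c}}\le (2,2)\), \(E^i_1 = (1\ 0)\), \(f^i({\mathbf{c}}) = c_1^2 + c_2^2\) and \(\varvec{\alpha } = 3\), the maximum of the left-hand side is 2, attained at \({\mathbf{c}}= (2,0)\); hence \(\beta ^i = -1\) violates (9) for this type, while \(\beta ^i = -3\) does not.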

Grötschel et al. [18, Theorem 6.4.9] show that an optimal solution of an LP (even one which is a vertex [18, Remark 6.5.2]) can be found in a number of calls to a separation oracle which is polynomial in the dimension and the encoding length of the inequalities returned by a separation oracle. Clearly the inequalities (9) have encoding length bounded by \(\log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }\Vert _\infty \) and thus \(T = {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu },E\Vert _\infty )\) calls to a separation oracle are sufficient to find an optimal vertex solution, which amounts to \(T \cdot T'''\) arithmetic operations.

Next, we will identify \(r+\tau \) inequalities determining the previously found optimal vertex solution of the dual of (ConfLP). Observe that the dimension of the dual LP is the number of rows of the primal LP, which is \(r + \tau \). Since a vertex in \((r+\tau )\)-dimensional space is fully determined by \(r+\tau \) linearly independent inequalities that are tight at it, there must exist a subset I of \(r+\tau \) inequalities among the T inequalities considered by the ellipsoid method which fully determines the dual optimum. We can find them as follows.

We initialize I to be the empty set. Taking the T considered inequalities one by one, we process an inequality if it is satisfied with equality by the given optimal basic solution of the dual LP, and we discard all other inequalities. When processing the current inequality, if either some inequality of I or the current inequality itself is dominated (see Footnote 2) by an inequality that can be obtained as a non-negative linear combination of the others, we discard it; otherwise, we include it in I and continue. Whether an inequality \({\mathbf{d}}{\mathbf{z}}\le e'\) is dominated by a non-negative combination of a system of inequalities \(D {\mathbf{z}}\le {\mathbf{e}}\) can be decided by solving

$$\begin{aligned} \min \varvec{ \alpha }{\mathbf{e}}\quad \text {s.t.} \quad \varvec{ \alpha }^{\intercal } D = {\mathbf{d}}, \, \varvec{ \alpha }\ge {\mathbf{0}}, \end{aligned}$$
(10)

and checking whether the optimal value is at most \(e'\). If it is, then the solution \(\varvec{ \alpha }\) encodes a non-negative linear combination of the inequalities \(D {\mathbf{z}}\le {\mathbf{e}}\) which yields an inequality dominating \({\mathbf{d}}{\mathbf{z}}\le e'\); if it is not (or if (10) is infeasible), then no such combination exists. Thus, when a new inequality is considered, we solve (10) for at most \(r+\tau \) inequalities (the new one and the fewer than \(r+\tau \) already selected ones), and at most T inequalities are considered. The time needed to solve (10) is \({{\,\mathrm{\mathrm{poly}}\,}}(r+\tau , \log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max },E\Vert _\infty )\) because its dimension is at most \(r+\tau \) and its encoding length is at most \(\log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max },E\Vert _\infty \). Altogether, we need time

$$\begin{aligned}&T \cdot (r+\tau ) \cdot {{\,\mathrm{\mathrm{poly}}\,}}(r+\tau , \log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, f_{\max },E\Vert _\infty )\\&\quad \le {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu },E\Vert _\infty ) =: T'. \end{aligned}$$
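The domination test (10) can be illustrated with a small exact solver. The sketch below computes the optimum of (10) by enumerating basic feasible solutions (supports formed by linearly independent rows of D) in rational arithmetic, which is valid when (10) is bounded; by weak duality this holds whenever \(D{\mathbf{z}}\le {\mathbf{e}}\) is feasible, as in our setting. It is exponential in the number of rows and only illustrates the test, not the polynomial-time method assumed in the running-time bound; the names `solve_exact` and `is_dominated` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(cols, rhs):
    """Unique rational solution of sum_k lam[k] * cols[k] == rhs, else None."""
    m, k = len(rhs), len(cols)
    aug = [[Fraction(cols[j][r]) for j in range(k)] + [Fraction(rhs[r])] for r in range(m)]
    for col in range(k):
        piv = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if piv is None:
            return None  # linearly dependent columns
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    if any(aug[r][k] != 0 for r in range(k, m)):
        return None  # inconsistent system
    return [aug[j][k] for j in range(k)]


def is_dominated(D, e, d, e_prime):
    """True iff min{alpha e : alpha^T D = d, alpha >= 0} exists and is <= e_prime."""
    best = Fraction(0) if all(v == 0 for v in d) else None  # alpha = 0
    for size in range(1, min(len(D), len(d)) + 1):
        for S in combinations(range(len(D)), size):
            alpha = solve_exact([D[k] for k in S], d)  # sum_k alpha_k D[k] = d
            if alpha is None or any(a < 0 for a in alpha):
                continue
            value = sum(a * e[k] for a, k in zip(alpha, S))
            if best is None or value < best:
                best = value
    return best is not None and best <= e_prime
```

For \(D = I_2\) and \({\mathbf{e}}= (1,1)\), the inequality \(z_1 + z_2 \le 2\) is dominated, while \(z_1 + z_2 \le 3/2\) and \(z_1 - z_2 \le 5\) are not.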

Finally, let the restricted (ConfLP) be the (ConfLP) restricted to the variables corresponding to the inequalities in I. We claim that an optimal solution to the restricted (ConfLP) is also an optimal solution to (ConfLP). To see that, use LP duality: the optimal objective value of the dual LP restricted to the inequalities in I equals the optimal value of the full dual LP, and thus an optimal solution of the restricted (ConfLP) must be an optimal solution of (ConfLP). We solve the restricted (ConfLP) using any polynomial-time LP algorithm in time \(T'' \le {{\,\mathrm{\mathrm{poly}}\,}}((r+\tau ), \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, \varvec{ \mu }, {\mathbf{b}}^0,E \Vert _\infty )\). The total running time is thus \(T \cdot T''' + T'\) to construct the restricted (ConfLP) instance plus \(T''\) to solve it, i.e., \(T \cdot T''' + T' + T''\), which is upper bounded by \(g_1(E_2)^{\mathcal {O}(s)} {{\,\mathrm{\mathrm{poly}}\,}}(r t \tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu },E\Vert _\infty )\), as claimed.

Let \({\mathbf{y}}^*\) be the optimum of (ConfLP) thus obtained. Since \(|I| \le r+\tau \), the support of \({\mathbf{y}}^*\) is of size at most \(r+\tau \). By Lemma 5, \({\mathbf{y}}^*\) has at most 2r fractional coordinates. Now setting \({\mathbf{x}}^* = \varphi ({\mathbf{y}}^*)\) is enough, since we have already argued (see the definition of \(\varphi \)) that \({\mathbf{x}}^*\) has at most as many fractional bricks as \({\mathbf{y}}^*\) has fractional coordinates and that \({\mathbf{x}}^*\) can be computed from \({\mathbf{y}}^*\) in \(\mathcal {O}(r + \tau )\) time. \(\square \)

3.3 Proximity theorem

Let us give a plan for the next subsection. We wish to prove that for every conf-optimal solution \({\mathbf{x}}^*\) of (HugeCP) there is an integer solution \({\mathbf{z}}^*\) of (HugeIP) nearby. In the following, let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) and \({\mathbf{z}}^*\) be an optimal solution of (HugeIP) minimizing \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\). A technique for proving proximity theorems which was introduced by Eisenbrand and Weismantel [11] works as follows. A vector \({\mathbf{h}}\in \mathbb {Z}^{Nt}\) is called a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) if \({\mathbf{h}}\ne {\mathbf{0}}\), \(E^{(N)} {\mathbf{h}}= {\mathbf{0}}\), and \({\mathbf{h}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\). It is not too difficult to see that if \({\mathbf{x}}'\) is an optimal (not necessarily conf-optimal) solution of (HugeCP) with the objective f, then there cannot exist a cycle of \({\mathbf{x}}' - {\mathbf{z}}^*\) (cf. proof of Lemma 9). Based on a certain decomposition of \({\mathbf{x}}' - {\mathbf{z}}^*\) into integer and fractional smaller-dimensional vectors and by an application of the Steinitz Lemma, the existence of a cycle is proven unless \(\Vert {\mathbf{x}}'-{\mathbf{z}}^*\Vert _1\) is roughly bounded by the number of fractional bricks of \({\mathbf{x}}'\). However, we cannot apply this technique directly, as an optimal solution \({\mathbf{x}}'\) of (HugeCP) might have many fractional bricks. At the same time, the existence of a cycle \({\mathbf{h}}\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\) does not necessarily contradict the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), because \({\mathbf{x}}^* - {\mathbf{h}}\) might not be a configurable solution, which is an essential part of the argument.

All of this leads us to introduce a stronger notion of a cycle. We say that \({\mathbf{h}}\in \mathbb {Z}^{Nt}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) (with respect to \({\mathbf{x}}^*\)) if (1) \({\mathbf{h}}\) is a cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\), (2) for each brick \(j \in [N]\) of type \(i \in [\tau ]\) there exists an \({\hat{f}}\)-optimal decomposition \(\varGamma ^j\) of \(({\mathbf{x}}^*)^j\) such that we may write \({\mathbf{h}}^j = \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\), and (3) for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j\) we have \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- ({\mathbf{z}}^*)^j\) and \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\). Soon we will show that if \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) is minimal, then \({\mathbf{x}}^* - {\mathbf{z}}^*\) does not have a configurable cycle. The next task is then to show how large \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) must be in order for a configurable cycle to exist. Recall that the technique of Eisenbrand and Weismantel [11] can be used to prove the existence of a (regular) cycle, not of a configurable cycle. To overcome this, we “lift” both \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\) to a higher-dimensional space and show that a cycle in this space corresponds to a configurable cycle in the original space. Only then are we ready to prove a proximity bound using the aforementioned technique.

Lemma 7

If \({\mathbf{h}}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\), then \({\mathbf{x}}^* - {\mathbf{h}}\) is configurable.

Proof

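Fix \(j \in [N]\). Let \({\mathbf{p}}\) be the brick \(({\mathbf{x}}^* - {\mathbf{h}})^j\) and let \(i \in [\tau ]\) be its type. Now \({\mathbf{p}}\) can be written as \({\mathbf{p}}= \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma ^j} \lambda _{\mathbf{c}}({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}})\). Each \({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}\) is integral and, since \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\), satisfies \(E^i_2({\mathbf{c}}- {\mathbf{h}}_{{\mathbf{c}}}) = E^i_2 {\mathbf{c}}= {\mathbf{b}}^i\); moreover, \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- ({\mathbf{z}}^*)^j\) implies that every coordinate of \({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}\) lies between the corresponding coordinates of \({\mathbf{c}}\) and \(({\mathbf{z}}^*)^j\), so \({\mathbf{l}}^i \le {\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}\le {\mathbf{u}}^i\). Hence \({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}\in \mathcal {C}^i\), and \({\mathbf{p}}\) is a convex combination of configurations. Finally, \(E^{(N)} ({\mathbf{x}}^* - {\mathbf{h}}) = E^{(N)} {\mathbf{x}}^*\) because \(E^{(N)} {\mathbf{h}}= {\mathbf{0}}\), and, by \({\mathbf{h}}\sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\), we also have \({\mathbf{l}}\le {\mathbf{x}}^* - {\mathbf{h}}\le {\mathbf{u}}\). \(\square \)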

We now need a technical lemma:

Lemma 8

Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP), let \({\mathbf{z}}^*\) be an optimum of (HugeIP), and let \({\mathbf{h}}^*\) be a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\). Then

$$\begin{aligned} {\hat{f}}({\mathbf{z}}^* + {\mathbf{h}}^*) + {\hat{f}}({\mathbf{x}}^* - {\mathbf{h}}^*)&\le {\hat{f}}({\mathbf{z}}^*) + {\hat{f}}({\mathbf{x}}^*) . \end{aligned}$$
(11)

Proof

We begin by a simple observation: let \(g: \mathbb {R}\rightarrow \mathbb {R}\) be a convex function, \(x \in \mathbb {R}\), \(z \in \mathbb {Z}\), and \(r \in \mathbb {Z}\) be such that \(r \sqsubseteq x-z\) (that is, there is some \(\rho , 0 \le \rho \le 1\), such that \(r = \rho \cdot (x-z)\)). By convexity of g we have that

$$\begin{aligned} g(z+r) + g(x-r) \le g(z) + g(x) . \end{aligned}$$
(12)
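The convexity step behind (12) can be spelled out as follows; this only expands the sentence "By convexity of g" and introduces no new assumption. Writing \(z + r\) and \(x - r\) as convex combinations of z and x, we get

$$\begin{aligned} z + r&= (1-\rho )\, z + \rho \, x, \qquad x - r = \rho \, z + (1-\rho )\, x,\\ g(z+r)&\le (1-\rho )\, g(z) + \rho \, g(x), \qquad g(x-r) \le \rho \, g(z) + (1-\rho )\, g(x), \end{aligned}$$

and adding the two inequalities gives (12). Note that this derivation uses only \(0 \le \rho \le 1\) and the convexity of g.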

Fix \(j \in [N]\) and \({\mathbf{z}}= ({\mathbf{z}}^*)^j\), \({\mathbf{x}}= ({\mathbf{x}}^*)^j\), \({\mathbf{h}}= ({\mathbf{h}}^*)^j\), and let i be the type of brick j. Since \({\mathbf{h}}^*\) is a configurable cycle, there exists an \({\hat{f}}\)-optimal decomposition \(\varGamma \) of \({\mathbf{x}}\) such that, for each \(({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \), there exists an \({\mathbf{h}}_{\mathbf{c}}\) with \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- {\mathbf{z}}\) and \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\), and \({\mathbf{h}}= \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\). Due to separability of f we may apply (12) independently to each coordinate, obtaining for each \({\mathbf{c}}\)

$$\begin{aligned} f^i({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}) + f^i({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}})&\le f^i({\mathbf{z}}) + f^i({\mathbf{c}}) . \end{aligned}$$

Since all arguments of \(f^i\) are integral, Lemma 2 immediately gives

$$\begin{aligned} {\hat{f}}^i({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}) + {\hat{f}}^i({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}})&\le {\hat{f}}^i({\mathbf{z}}) + {\hat{f}}^i({\mathbf{c}}) . \end{aligned}$$

Aggregating according to \(\varGamma \), we get (recall that we have \(\sum _{({\mathbf{c}}, \lambda _{{\mathbf{c}}}) \in \varGamma } \lambda _{{\mathbf{c}}} = 1\))

$$\begin{aligned} \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{\mathbf{c}}\left( {\hat{f}}^i({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}) + {\hat{f}}^i({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}) \right)\le & {} \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{\mathbf{c}}\left( {\hat{f}}^i({\mathbf{z}}) + {\hat{f}}^i({\mathbf{c}}) \right) \\= & {} {\hat{f}}^i({\mathbf{z}}) + \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{{\mathbf{c}}}{\hat{f}}^i({\mathbf{c}}), \end{aligned}$$

where by \({\hat{f}}\)-optimality of \(\varGamma \) the right-hand side is equal to \({\hat{f}}^i({\mathbf{z}}) + {\hat{f}}^i({\mathbf{x}})\). As for the left-hand side, observe that decompositions \(\varGamma ' = \{({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}, \lambda _{\mathbf{c}}) \mid ({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \}\) and \(\varGamma '' = \{({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}, \lambda _{\mathbf{c}}) \mid ({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma \}\) satisfy \(\sum \varGamma ' = {\mathbf{z}}+ {\mathbf{h}}\) and \(\sum \varGamma '' = {\mathbf{x}}- {\mathbf{h}}\) but are only feasible (not necessarily optimal) solutions of (6). Thus, we have

$$\begin{aligned} {\hat{f}}^i({\mathbf{z}}+ {\mathbf{h}}) + {\hat{f}}^i({\mathbf{x}}- {\mathbf{h}}) \le \sum _{({\mathbf{c}}, \lambda _{\mathbf{c}}) \in \varGamma } \lambda _{\mathbf{c}}\left( {\hat{f}}^i({\mathbf{z}}+ {\mathbf{h}}_{\mathbf{c}}) + {\hat{f}}^i({\mathbf{c}}- {\mathbf{h}}_{\mathbf{c}}) \right) . \end{aligned}$$

Combining the last two inequalities then yields

$$\begin{aligned} {\hat{f}}^i({\mathbf{z}}+ {\mathbf{h}}) + {\hat{f}}^i({\mathbf{x}}- {\mathbf{h}}) \le {\hat{f}}^i({\mathbf{z}}) + {\hat{f}}^i({\mathbf{x}}), \end{aligned}$$

and since we have proven this claim for every brick j, aggregation over bricks concludes the proof of the main claim (11). \(\square \)

Let us show that if \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\) are as stated, then there is no configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).

Lemma 9

Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) and let \({\mathbf{z}}^*\) be an optimal solution of (HugeIP) such that \(\Vert {\mathbf{x}}^*-{\mathbf{z}}^*\Vert _1\) is minimal. Then there is no configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).

Proof

For the sake of contradiction, suppose that there exists a configurable cycle \({\mathbf{h}}^*\) of \({\mathbf{x}}^* - {\mathbf{z}}^*\). By Lemma 8, one of two cases must occur:

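Case 1: \({\hat{f}}({\mathbf{z}}^* + {\mathbf{h}}^*) \le {\hat{f}}({\mathbf{z}}^*)\). Then \({\mathbf{z}}^* + {\mathbf{h}}^*\) is an optimal integer solution: by \({\mathbf{h}}^* \sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\) we have \({\mathbf{l}}\le {\mathbf{z}}^* + {\mathbf{h}}^* \le {\mathbf{u}}\), by \({\mathbf{h}}^* \in \ker _{\mathbb {Z}}\left( E^{(N)}\right) \) we have \(E^{(N)} ({\mathbf{z}}^* + {\mathbf{h}}^*) = {\mathbf{b}}\), and \({\hat{f}}\) coincides with f on integer solutions (Lemma 2). Moreover, since \({\mathbf{0}}\ne {\mathbf{h}}^* \sqsubseteq {\mathbf{x}}^* - {\mathbf{z}}^*\), we have \(\Vert {\mathbf{x}}^* - ({\mathbf{z}}^* + {\mathbf{h}}^*)\Vert _1 = \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1 - \Vert {\mathbf{h}}^*\Vert _1 < \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), a contradiction to the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\).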

Case 2: \({\hat{f}}({\mathbf{x}}^* - {\mathbf{h}}^*) < {\hat{f}}({\mathbf{x}}^*)\). Since \({\mathbf{h}}^*\) is a configurable cycle, Lemma 7 states that \({\mathbf{x}}^* - {\mathbf{h}}^*\) is configurable, so we have a contradiction with conf-optimality of \({\mathbf{x}}^*\). \(\square \)

3.4 Overview of the remainder of the proof

In order to use existing proximity arguments to bound the norm of a cycle, our plan is to move into an extended (higher-dimensional) space which corresponds to decomposing each brick \({\mathbf{x}}^i\) of \({\mathbf{x}}^*\) into configurations as \({\mathbf{x}}^i = \sum _{{\mathbf{c}}} \lambda _{{\mathbf{c}}} {\mathbf{c}}\) – each summand becomes a new brick in the extended space.

We denote this new higher-dimensional representation of \({\mathbf{x}}^*\) with respect to \(\varGamma \) as \(\uparrow {\mathbf{x}}^*\) and call it the rise of \({\mathbf{x}}^*\), and define similarly the rise of \({\mathbf{z}}^*\) (with respect to a given decomposition of each brick of \({\mathbf{x}}^*\)). The situation gets very delicate at this point.

First, we require that each decomposition of a brick of \({\mathbf{x}}^*\) is optimal with respect to the auxiliary objective \({\hat{f}}\) so that we can use the argument about non-existence of a cycle. Second, because the proximity bound depends on the number of fractional bricks of \(\uparrow {\mathbf{x}}^*\), we require that the decomposition of each brick is small, i.e., into only a few elements. Third, we require that each coefficient \(\lambda _{{\mathbf{c}}}\) is of the form \(1/q_{{\mathbf{c}}}\) for an integer \(q_{{\mathbf{c}}}\), because we need to ensure that, for a corresponding cycle brick \({\mathbf{h}}_{{\mathbf{c}}}\), \(\lambda ^{-1}_{{\mathbf{c}}} {\mathbf{h}}_{{\mathbf{c}}}\) is an integer vector, so \(\lambda ^{-1}_{{\mathbf{c}}}\) has to be an integer. To ensure the second and third conditions simultaneously, we first show that there is a decomposition of each brick of size at most \(t+1\) and with each coefficient bounded by P, and then show that each fraction p/q can be written as an Egyptian fraction \(p/q = 1/a_1 + 1/a_2 + \cdots + 1/a_{{\mathfrak {c}}}\) with \({\mathfrak {c}} \le 2\log _2 q\) (Lemmas 10–12). (Bounds on the length of Egyptian fractions have been studied in the past and our bound is not the best possible, but in order to use our proximity theorem, we need exact and not merely asymptotic bounds, so we prove this worse but exact bound of \(2\log _2 q\).) We call a decomposition of a brick satisfying all three criteria given above a small scalable decomposition.
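The Egyptian-fraction step can be illustrated by a simple binary-expansion construction; this is our own illustration with a slightly weaker bound, not necessarily the argument of Lemmas 10–12. With \(k = \lceil \log _2 q\rceil \), write \(p/q = a/2^k + w/(q 2^k)\) with \(a = \lfloor p 2^k/q\rfloor \le 2^k\) and \(0 \le w < q \le 2^k\); expanding a and w in binary turns each set bit into one unit fraction, which gives at most \(2\lceil \log _2 q\rceil \) terms for \(q \ge 2\). A minimal Python sketch (the function name is hypothetical):

```python
def binary_egyptian(p, q):
    """Denominators a_1, ..., a_m with p/q = sum 1/a_j, for 0 < p <= q.

    Uses p/q = a/2**k + w/(q * 2**k) with k = ceil(log2 q); for q >= 2 the
    number of terms is at most 2 * k.
    """
    assert 0 < p <= q
    k = (q - 1).bit_length()   # ceil(log2 q), so q <= 2**k
    a = (p << k) // q          # a <= 2**k because p <= q
    w = (p << k) - a * q       # 0 <= w < q <= 2**k
    denominators = []
    for b in range(k + 1):     # bit b of a contributes 2**b / 2**k
        if (a >> b) & 1:
            denominators.append(1 << (k - b))
    for b in range(k):         # bit b of w contributes 2**b / (q * 2**k)
        if (w >> b) & 1:
            denominators.append(q << (k - b))
    return denominators
```

For example, \(1/3 = 1/4 + 1/12\) and \(5/7 = 1/2 + 1/8 + 1/14 + 1/56\).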

Fix a small scalable decomposition for each brick of \({\mathbf{x}}^*\), and let \(\uparrow \!\!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\) with respect to this decomposition. Since this decomposition is small, \(\uparrow \!\!{\mathbf{x}}^*\) has at most \({{\,\mathrm{\mathrm{poly}}\,}}(\Vert E\Vert _\infty , r, s)\) fractional bricks. Moreover, the other properties above allow us to say the following: if \({\mathbf{r}}\) is a cycle of \(\uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\), then the compression of \({\mathbf{r}}\) back to the original space is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) (Lemma 14). So in order to bound \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\), it suffices (by triangle inequality) to bound \(\Vert \!\!\uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\Vert _1\). We do this by adapting the approach of Eisenbrand and Weismantel [11] to bound the length of any cycle \({\mathbf{r}}\) of \(\uparrow \!\!{\mathbf{x}}^* - \uparrow \!\!{\mathbf{z}}^*\).

3.5 The remainder of the proof

We say that \(|\varGamma |\) is the size of the decomposition. Let us show that for each brick, there exists an \({\hat{f}}\)-optimal decomposition whose coefficients have small encoding length, and its size is small. For any matrix A, define \(g_\infty (A) = \max _{{\mathbf{g}}\in \mathcal {G}(A)} \Vert {\mathbf{g}}\Vert _\infty \).
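For very small matrices, \(\mathcal {G}(A)\) and \(g_\infty (A)\) can be computed by brute force: enumerate the non-zero integer kernel vectors in a box \(\{-k, \dots , k\}^n\) and keep those that are minimal with respect to \(\sqsubseteq \). This returns exactly the Graver elements inside the box (any \({\mathbf{u}}\sqsubseteq {\mathbf{v}}\) lies in the box whenever \({\mathbf{v}}\) does), so it equals \(\mathcal {G}(A)\) only under the assumption \(k \ge g_\infty (A)\). The function names are hypothetical.

```python
from itertools import product


def conformal_leq(u, v):
    """u ⊑ v: u lies in the orthant of v and |u_k| <= |v_k| for every k."""
    return all(a * b >= 0 and abs(a) <= abs(b) for a, b in zip(u, v))


def graver_in_box(A, radius):
    """Graver elements of the integer matrix A with all entries in [-radius, radius]."""
    n = len(A[0])
    kernel = [v for v in product(range(-radius, radius + 1), repeat=n)
              if any(v) and all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)]
    # Keep the kernel vectors with no other non-zero kernel vector below them.
    return [v for v in kernel
            if not any(u != v and conformal_leq(u, v) for u in kernel)]


def g_inf(A, radius):
    """max ||g||_inf over the Graver elements found in the box."""
    return max(max(abs(x) for x in g) for g in graver_in_box(A, radius))
```

For \(A = (1\ 1\ 1)\) it recovers the six vectors with one entry 1 and one entry \(-1\), matching the description of \(\mathcal {G}({\mathbf{1}})\) in the proof of Lemma 5, so \(g_\infty (A) = 1\); for \(A = (1\ 2)\) it returns \(\pm (2,-1)\) and \(g_\infty (A) = 2\).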

Lemma 10

Each brick of \({\mathbf{x}}^*\) of type i has an \({\hat{f}}\)-optimal decomposition \(\varGamma \)

  1. of size at most \(t+1\), and
  2. \(\max _{({\mathbf{c}}, \lambda _{{\mathbf{c}}} = p_{{\mathbf{c}}} / q_{{\mathbf{c}}}) \in \varGamma }\{p_{\mathbf{c}},q_{\mathbf{c}}\} \le (t+1)! ((2t-2) g_\infty (E^i_2))^{t+1} \le (t+1)^{(t+1)}(g_1(E_2))^{(t+2)} \le (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}\).

Proof

An \({\hat{f}}\)-optimal decomposition corresponds to a solution of the LP (6). We will argue that there is a solution whose support is composed of columns which do not differ by much, which corresponds to a solution of an LP with small coefficients, and the claimed bound can then be obtained by Cramer’s rule.

Specifically, we claim that there exist an \({\hat{f}}\)-optimal decomposition \(\varGamma \), corresponding to an optimal solution \(\varvec{ \lambda }\) of (6), and a point \(\varvec{ \zeta }\in \mathbb {Z}^t\) such that \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le (t-1) g_\infty (E^i_2)\) for every \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\). For a solution \(\varvec{ \lambda }\) of (6), define \(R':= \max _{{\mathbf{c}}, {\mathbf{c}}' \in \text {supp}(\varvec{ \lambda })} \Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty \) to be the longest side of the bounding box of all \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\). For a point \(\varvec{ \zeta }\in \mathbb {Z}^t\), say, for \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\), that a coordinate \(j \in [t]\) is tight if \(c_j = \zeta _j - {\lceil }{\frac{R'}{2}}{\rceil }\) or \(c_j = \zeta _j + {\lceil }{\frac{R'}{2}}{\rceil }\), and define \(S = \sum _{{\mathbf{c}}\in \text {supp}(\varvec{ \lambda })} \sum _{j=1}^t \lambda _{{\mathbf{c}}} [j \text { is tight in } {\mathbf{c}}]\) (where “[X]” is an indicator of the statement X) to be the weighted number of tight coordinates. Now let \(\varvec{ \zeta }\in \mathbb {Z}^t\) be any point which is an integer center of the bounding box (i.e., \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le {\lceil }{\frac{R'}{2}}{\rceil }\) for all \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\)) and which minimizes S. For contradiction, assume that \(\varvec{ \lambda }\) is an optimal solution of (6) which minimizes \(R'\) and S (lexicographically in this order) and \(R' > (2t-2)g_\infty (E^i_2)\).
Let \({\mathbf{c}}, {\mathbf{c}}' \in \text {supp}(\varvec{ \lambda })\) be such that \(\Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty = R'\). Assuming \(\varGamma \) is a decomposition of a brick of type i, we have \({\mathbf{c}}, {\mathbf{c}}' \in \mathcal {C}^i = \{{\tilde{{\mathbf{c}}}} \in \mathbb {Z}^t \mid E^i_2 {\tilde{{\mathbf{c}}}} = {\mathbf{b}}^i, \, {\mathbf{l}}^i \le {\tilde{{\mathbf{c}}}} \le {\mathbf{u}}^i\}\) and thus \({\mathbf{c}}- {\mathbf{c}}' \in \text {Ker}_{\mathbb {Z}}(E^i_2)\). By Proposition 1 we may write \({\mathbf{c}}- {\mathbf{c}}' = \sum _{j=1}^{2t-2} \gamma _j {\mathbf{g}}_j\) with nonnegative integers \(\gamma _j\), \({\mathbf{g}}_j \in \mathcal {G}(E^i_2)\) and \({\mathbf{g}}_j \sqsubseteq {\mathbf{c}}- {\mathbf{c}}'\) for all \(j \in [2t-2]\). Since \(\Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty = R' > R := (2t-2)g_\infty (E^i_2)\), while \(\gamma _j \le 1\) for all j would give \(\Vert {\mathbf{c}}- {\mathbf{c}}'\Vert _\infty \le R\), there exists \(j \in [2t-2]\) such that \(\gamma _j > 1\). Hence \({\mathbf{g}}:= \sum _{j=1}^{2t-2} \lfloor \frac{\gamma _j}{2}\rfloor {\mathbf{g}}_j\) satisfies \({\mathbf{g}}\ne {\mathbf{0}}\). Let \({\bar{{\mathbf{c}}}} := {\mathbf{c}}- {\mathbf{g}}\) and \({\bar{{\mathbf{c}}}}' := {\mathbf{c}}' + {\mathbf{g}}\).

First, because \({\bar{{\mathbf{c}}}} - {\bar{{\mathbf{c}}}}' = ({\mathbf{c}}- {\mathbf{c}}') - 2{\mathbf{g}}= \sum _{j=1}^{2t-2} (\gamma _j - 2\lfloor \frac{\gamma _j}{2}\rfloor ) {\mathbf{g}}_j\) with \(\gamma _j - 2\lfloor \frac{\gamma _j}{2}\rfloor \in \{0,1\}\), we may bound \(\Vert {\bar{{\mathbf{c}}}} - {\bar{{\mathbf{c}}}}'\Vert _\infty \le (2t-2) g_\infty (E^i_2) = R\). Second, by the conformality of the decomposition, \({\bar{{\mathbf{c}}}}, {\bar{{\mathbf{c}}}}' \in \mathcal {C}^i\). Third, by separable convex superadditivity (Proposition 2), we have that \(f({\mathbf{c}}) + f({\mathbf{c}}') \ge f({\bar{{\mathbf{c}}}}) + f({\bar{{\mathbf{c}}}}')\). Fourth, there exists a coordinate \(j \in [t]\) such that \(|c_j - c'_j|=R'\) but, since \(\Vert {\bar{{\mathbf{c}}}} - {\bar{{\mathbf{c}}}}'\Vert _\infty \le R\), \(|{\bar{c}}_j - {\bar{c}}'_j| \le R < R'\); thus j is no longer a tight coordinate in at least one of \({\bar{{\mathbf{c}}}}\) and \({\bar{{\mathbf{c}}}}'\), and no new tight coordinates can be introduced because \(R < R'\). Without loss of generality, let \(\lambda _{{\mathbf{c}}} \le \lambda _{{\mathbf{c}}'}\). Now initialize \(\varvec{ \lambda }' := \varvec{ \lambda }\) and modify it by increasing each of \(\lambda '_{{\bar{{\mathbf{c}}}}}\) and \(\lambda '_{{\bar{{\mathbf{c}}}}'}\) by \(\lambda _{{\mathbf{c}}}\) and setting \(\lambda '_{{\mathbf{c}}} := 0\), \(\lambda '_{{\mathbf{c}}'} := \lambda _{{\mathbf{c}}'} - \lambda _{{\mathbf{c}}}\). By our arguments above, \(\varvec{ \lambda }'\) is another optimal solution of (6), but by the fourth point the weighted number S of tight coordinates has decreased, a contradiction.
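The bookkeeping behind the claim that \(\varvec{ \lambda }'\) is again optimal for (6) can be made explicit. Only the entries at \({\mathbf{c}}, {\mathbf{c}}', {\bar{{\mathbf{c}}}}, {\bar{{\mathbf{c}}}}'\) change, each by \(\pm \lambda _{{\mathbf{c}}}\), so the net changes of the left-hand side of the convex-combination constraint, of \(\Vert \varvec{ \lambda }\Vert _1\), and of the objective are as follows (a worked check using only the definitions above):

$$\begin{aligned} \Delta \Big (\sum _{{\mathbf{c}}''} \lambda _{{\mathbf{c}}''} {\mathbf{c}}''\Big )&= \lambda _{{\mathbf{c}}}\big ({\bar{{\mathbf{c}}}} + {\bar{{\mathbf{c}}}}' - {\mathbf{c}}- {\mathbf{c}}'\big ) = \lambda _{{\mathbf{c}}}\big (({\mathbf{c}}- {\mathbf{g}}) + ({\mathbf{c}}' + {\mathbf{g}}) - {\mathbf{c}}- {\mathbf{c}}'\big ) = {\mathbf{0}},\\ \Delta \Vert \varvec{ \lambda }\Vert _1&= \lambda _{{\mathbf{c}}}(1 + 1 - 1 - 1) = 0,\\ \Delta \Big (\sum _{{\mathbf{c}}''} \lambda _{{\mathbf{c}}''} f({\mathbf{c}}'')\Big )&= \lambda _{{\mathbf{c}}}\big (f({\bar{{\mathbf{c}}}}) + f({\bar{{\mathbf{c}}}}') - f({\mathbf{c}}) - f({\mathbf{c}}')\big ) \le 0 . \end{aligned}$$

Nonnegativity of \(\varvec{ \lambda }'\) follows from the choice \(\lambda _{{\mathbf{c}}} \le \lambda _{{\mathbf{c}}'}\), and the last inequality is exactly the third point above.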

Thus, there exists a point \(\varvec{ \zeta }\in \mathbb {Z}^t\) and an optimal solution \(\varvec{ \lambda }\) of (6) such that for each \({\mathbf{c}}\in \text {supp}(\varvec{ \lambda })\), it holds that \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty \le R/2 = (t-1) g_\infty (E^i_2)\). We obtain the following reduced LP from (6) by deleting all columns \({\mathbf{c}}\) with \(\Vert {\mathbf{c}}- \varvec{ \zeta }\Vert _\infty > R/2\), and denote the remaining set of columns by \({\bar{\mathcal {C}}}^i\):

$$\begin{aligned} \min \sum _{{\mathbf{c}}\in {\bar{\mathcal {C}}}^i} \lambda _{{\mathbf{c}}} f^i({\mathbf{c}}) \quad \text {s.t.} \quad \sum _{{\mathbf{c}}\in {\bar{\mathcal {C}}}^i} \lambda _{{\mathbf{c}}} {\mathbf{c}}= {\mathbf{x}}^j, \, \Vert \varvec{ \lambda }\Vert _1 = 1, \, \varvec{ \lambda }\ge {\mathbf{0}}. \end{aligned}$$
(13)

Since \(\Vert \varvec{ \lambda }\Vert _1 = 1\), this LP is equivalent to the one obtained by subtracting \(\varvec{ \zeta }\) from all columns and from the right-hand side:

$$\begin{aligned} \min \sum _{{\mathbf{c}}\in {\bar{\mathcal {C}}}^i} \lambda _{{\mathbf{c}}} f^i({\mathbf{c}}) \quad \text {s.t.} \quad \sum _{{\mathbf{c}}\in {\bar{\mathcal {C}}}^i} \lambda _{{\mathbf{c}}} ({\mathbf{c}}-\varvec{ \zeta }) = ({\mathbf{x}}^j-\varvec{ \zeta }), \, \Vert \varvec{ \lambda }\Vert _1 = 1, \, \varvec{ \lambda }\ge {\mathbf{0}}. \end{aligned}$$
(14)

Now, this LP has \(t+1\) rows and the entries of its columns are bounded by R/2 in absolute value. A basic solution \(\varvec{ \lambda }\) has \(|\text {supp}(\varvec{ \lambda })| \le t+1\) and, by Cramer's rule, the denominator of each \(\lambda _{\mathbf{c}}\) divides the determinant of a \((t+1)\times (t+1)\) submatrix of the constraint matrix; by the Leibniz formula, this determinant is at most \((t+1)!\) times the largest coefficient to the power of \(t+1\) in absolute value, and thus it is bounded by

$$\begin{aligned} (t+1)! (R/2)^{(t+1)} \le (t+1)! ((t-1) g_\infty (E^i_2))^{(t+1)} \le (t+1)^{(t+1)}(g_1(E_2))^{(t+2)}. \end{aligned}$$

In the worst case, we can bound this as

$$\begin{aligned} (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}, \end{aligned}$$

where we use

$$\begin{aligned}&g_\infty (E^i_2) \le \Vert E^i_2\Vert _\infty (2s\Vert E^i_2\Vert _\infty +1)^s \end{aligned}$$

[12, Lemma 2]. \(\square \)
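The Cramer's rule step can be checked on a toy instance. The following is a minimal sketch, assuming a hypothetical basis for \(t = 2\) whose shifted columns \({\mathbf{c}}- \varvec{ \zeta }\) are \((-1,2)\), \((2,0)\), \((0,-2)\) (so the largest coefficient, playing the role of R/2, is 2) and \({\mathbf{x}}^j - \varvec{ \zeta }= (0,0)\); the helper names `det` and `cramer_solve` are illustrative only. It verifies that every denominator of \(\varvec{ \lambda }\) divides the determinant and that the determinant respects the \((t+1)!\) bound.

```python
from fractions import Fraction
from math import factorial


def det(M):
    """Exact determinant via Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n, sign, d = len(A), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        d *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
    return sign * d


def cramer_solve(B, rhs):
    """Solve B lam = rhs (B square, nonsingular) by Cramer's rule; return (det B, lam)."""
    D = det(B)
    lam = []
    for j in range(len(B)):
        # replace column j of B by the right-hand side
        Bj = [row[:j] + [rhs[i]] + row[j + 1:] for i, row in enumerate(B)]
        lam.append(det(Bj) / D)
    return D, lam


# Toy basis for t = 2: columns c - zeta, plus the last row encoding ||lam||_1 = 1.
B = [[-1, 2, 0],
     [2, 0, -2],
     [1, 1, 1]]
rhs = [0, 0, 1]
D, lam = cramer_solve(B, rhs)       # lam == [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)]
delta = max(abs(v) for row in B for v in row)    # largest coefficient, here 2
n = len(B)                                       # t + 1 = 3
assert all(abs(D) % l.denominator == 0 for l in lam)   # denominators divide det B
assert abs(D) <= factorial(n) * delta ** n             # (t+1)! * delta^(t+1)
```

Here \(|\det B| = 10 \le 3! \cdot 2^3 = 48\); exact `Fraction` arithmetic is used so that the denominators are not obscured by floating-point rounding.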

Next, we will need the notion of an Egyptian fraction. For a rational number p/q with \(p,q \in \mathbb {N}\), an Egyptian fraction of p/q is a finite sum of distinct unit fractions such that

$$\begin{aligned} \frac{p}{q} = \frac{1}{q_1} + \frac{1}{q_2} + \cdots + \frac{1}{q_k}, \end{aligned}$$

for \(q_1, \dots , q_k \in \mathbb {N}\) distinct. Call the number of terms k the length of the Egyptian fraction. Vose [43] has proven that any p/q has an Egyptian fraction of length \(\mathcal {O}(\sqrt{\log q})\). Since our algorithm requires an exact bound, we present the following weaker yet exact result:

Lemma 11

(Egyptian Fractions) Let \(p, q \in \mathbb {N}\), \(1 \le p < q\). Then p/q has an Egyptian fraction of length at most \(2(\log _2 q)+1\) and all denominators are at most \(q^2\).

Proof

Let \(a=2^k\) be the largest power of two such that \(a < q\), so \(k={\lceil }{(\log _2 q)-1}{\rceil } < \log _2 q\). Write \(ap = bq+r\) with \(0 \le r < q\). Note that \(p< q\) implies \(b < a\), and \(q \le 2a\) implies \(r < 2a\). Now let \((b_{k-1}, \dots , b_1, b_0)\) be the binary representation of \(b < a\), so \(b=\sum _{i=0}^{k-1} 2^i b_i\), and let \((r_k, \dots , r_1, r_0)\) be that of \(r < 2a\), so \(r=\sum _{i=0}^{k} r_i 2^i\). Then we have

$$\begin{aligned} \frac{p}{q} = \frac{ap}{aq} = \frac{bq + r}{aq} = \frac{b}{a} + \frac{1}{q}\frac{r}{a} = \sum _{i=0}^{k-1} \frac{b_i}{2^{k-i}} + \sum _{i=0}^{k} \frac{r_i}{q \cdot 2^{k-i}}, \end{aligned}$$

where \(b_i, r_i \in \{0,1\}\), so this is a sum of at most \(2k+1 \le 2 (\log _2 q)+1\) terms with all denominators \(d_i \le q 2^k = qa \le q^2\). Moreover, all denominators in the first sum are distinct and at most \(2^k\), and all denominators in the second sum are distinct and at least \(q > 2^k\); hence all denominators are distinct, and this is an Egyptian fraction of p/q of length at most \(2(\log _2 q)+1\) with all denominators at most \(q^2\). \(\square \)
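The construction in the proof of Lemma 11 is directly implementable. The following is a minimal sketch (the function name `egyptian_fraction` is illustrative): it picks the largest \(a = 2^k < q\), divides \(ap = bq + r\), and reads off one unit fraction per set bit of b and of r.

```python
from fractions import Fraction


def egyptian_fraction(p, q):
    """Denominators of an Egyptian fraction of p/q, following the proof of Lemma 11."""
    if not 1 <= p < q:
        raise ValueError("need 1 <= p < q")
    k = (q - 1).bit_length() - 1          # largest k with 2**k < q
    a = 1 << k
    b, r = divmod(a * p, q)               # a*p = b*q + r, 0 <= r < q
    # b / a = sum_i b_i / 2^(k-i): denominators 2, 4, ..., 2^k (all <= 2^k < q)
    dens = [1 << (k - i) for i in range(k) if (b >> i) & 1]
    # r / (a q) = sum_i r_i / (q 2^(k-i)): denominators q, 2q, ..., q 2^k (all >= q)
    dens += [q << (k - i) for i in range(k + 1) if (r >> i) & 1]
    return dens


print(egyptian_fraction(3, 7))   # [4, 28, 7], i.e. 3/7 = 1/4 + 1/28 + 1/7
assert sum(Fraction(1, d) for d in egyptian_fraction(3, 7)) == Fraction(3, 7)
```

The two groups of denominators are separated by q, which is exactly why they are pairwise distinct; the sketch makes no attempt to reach the shorter \(\mathcal {O}(\sqrt{\log q})\) length of Vose's result.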

Recall that our goal is to obtain a configurable cycle. For that, however, we also need a special form of decomposition. Say that \(\varGamma \) is a scalable decomposition of a brick \(({\mathbf{x}}^*)^j\) of type i if it is an \({\hat{f}}\)-optimal decomposition and, for each \(({\mathbf{c}}_\gamma , \lambda _\gamma ) \in \varGamma \), \(\lambda _\gamma \) is of the form \(1/q_{\gamma }\) for some \(q_{\gamma } \in \mathbb {N}\). We note that in what follows we do not need an algorithm computing a scalable decomposition, only the following existence statement.

Lemma 12

Each brick of \({\mathbf{x}}^*\) has a scalable decomposition of size at most \(\kappa _1 \cdot t^3 \log (t\Vert E_2\Vert _\infty )\), where \(\kappa _1 = 52\).

Proof

Fix \(j \in [N]\). Let \({\mathbf{x}}= ({\mathbf{x}}^*)^j\) be a brick of \({\mathbf{x}}^*\) of type i. By Lemma 10, there exists an \({\hat{f}}\)-optimal decomposition of \({\mathbf{x}}\) of size at most \(t+1\) where each coefficient \(\lambda _{\mathbf{c}}=p_{\mathbf{c}}/q_{\mathbf{c}}\) satisfies \(p_{\mathbf{c}},q_{\mathbf{c}}\le (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}\). For each \({\mathbf{c}}\) in the decomposition with \(\lambda _{\mathbf{c}}< 1\) (a coefficient \(\lambda _{\mathbf{c}}= 1\) is already a unit fraction), express \(\lambda _{\mathbf{c}}\) as an Egyptian fraction:

$$\begin{aligned} \lambda _{\mathbf{c}}= \frac{p_{\mathbf{c}}}{q_{\mathbf{c}}} = \frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_{{\mathfrak {e}}}} . \end{aligned}$$

By Lemma 11,

$$\begin{aligned} {\mathfrak {e}}\le & {} 2(\log _2 q_{{\mathbf{c}}})+1 \le 2\left( \log _2 \left( (t+1)^{(t+1)}(s\Vert E^i_2\Vert _\infty +1)^{(s+1)(t+2)}\right) \right) +1\\\le & {} 25st \log (st\Vert E^i_2\Vert _\infty ) . \end{aligned}$$

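Thus, the resulting decomposition is of size at most \((t+1) 25st \log (st\Vert E^i_2\Vert _\infty ) \le 2 \cdot 26t^3 \log (t\Vert E^i_2\Vert _\infty )\), where the factor 2 accounts for removing s from the logarithm: since \(s \le t\), we have \(\log (st\Vert E^i_2\Vert _\infty ) \le 2\log (t\Vert E^i_2\Vert _\infty )\). The decomposition remains \({\hat{f}}\)-optimal, because replacing a term \(\lambda _{\mathbf{c}}{\mathbf{c}}\) by the terms \(\frac{1}{a_1}{\mathbf{c}}, \dots , \frac{1}{a_{{\mathfrak {e}}}}{\mathbf{c}}\) changes neither the represented brick nor the objective value, and it is scalable, since each coefficient is of the form \(1/q_{\gamma }\) for some \(q_{\gamma } \in \mathbb {N}\). \(\square \)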

We will now show that we are guaranteed a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\) if there exists an analogue of a regular cycle of a certain “lifting” of \({\mathbf{x}}^*\) and \({\mathbf{z}}^*\).

Fix for each brick of \({\mathbf{x}}^*\) a scalable decomposition \(\varGamma ^j\). Let \(\uparrow \!\!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\) defined as a vector obtained from \({\mathbf{x}}^*\) by keeping every integer brick \(({\mathbf{x}}^*)^j\), and replacing every fractional brick \(({\mathbf{x}}^*)^j\) with \(|\varGamma ^j|\) terms \(\lambda _\gamma {\mathbf{c}}_\gamma \), one for each \(({\mathbf{c}}_\gamma , \lambda _\gamma ) \in \varGamma ^j\). Observe that each brick of \(\uparrow \!\! {\mathbf{x}}^*\) is of the form \(\lambda _{{\mathbf{c}}} {\mathbf{c}}\) for some configuration \({\mathbf{c}}\) and some coefficient \(0 \le \lambda _{{\mathbf{c}}} \le 1\). Thus, for a brick \(\lambda _{{\mathbf{c}}} {\mathbf{c}}\) we say that \({\mathbf{c}}\) is its configuration, \(\lambda _{{\mathbf{c}}}\) is its coefficient, and its type is identical to the type of brick it originated from; in particular, bricks which originated from an integer brick \({\mathbf{p}}= ({\mathbf{x}}^*)^j\) are of the form \(\lambda _{{\mathbf{p}}} {\mathbf{p}}\) with \(\lambda _{{\mathbf{p}}} = 1\). Let \(N'\) be the number of bricks of \(\uparrow \!\! {\mathbf{x}}^*\) and define a mapping \(\nu : [N'] \rightarrow [N]\) such that if a brick \(j \in [N']\) of \(\uparrow \!\! \!\!{\mathbf{x}}^*\) was defined from brick \(\ell \in [N]\) of \({\mathbf{x}}^*\), then \(\nu (j) = \ell \). The natural inverse \(\nu ^{-1}\) is defined such that, for \(\ell \in [N]\), \(\nu ^{-1}(\ell )\) is the set of bricks of \(\uparrow \!\! {\mathbf{x}}^*\) which originated from \(({\mathbf{x}}^*)^\ell \).

Lemma 13

The vector \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks, where \(\kappa _2 = 2\kappa _1\).

Proof

By Lemma 6 there is a conf-optimal \({\mathbf{x}}^*\) with at most 2r fractional bricks. By Lemma 12 for each fractional brick of \({\mathbf{x}}^*\) of type i there is a scalable decomposition of size at most \(\kappa _1 \cdot t^3 \log (t\Vert E^i_2\Vert _\infty ) \le \kappa _1 \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\). Thus, \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _1 \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks for each fractional brick of \({\mathbf{x}}^*\), of which there are at most 2r, totaling \(2\kappa _1 \cdot r\cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks. \(\square \)

Denote by \(\uparrow \!\!{\mathbf{z}}^* \in \mathbb {R}^{N't}\) the rise of \({\mathbf{z}}^*\) (with respect to \({\mathbf{x}}^*\)) defined as follows. Let \(j \in [N']\), \(\ell = \nu (j)\), and \(\lambda \) be the coefficient of the j-th brick of \(\uparrow \!\! {\mathbf{x}}^*\). Then the j-th brick of \(\uparrow \!\! {\mathbf{z}}^*\) is \((\uparrow \!\! {\mathbf{z}}^*)^j := \lambda ({\mathbf{z}}^*)^{\ell }\). Observe that \(\Vert \!\!\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\Vert _1 \ge \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\): for each \(\ell \in [N]\), the coefficients of the bricks in \(\nu ^{-1}(\ell )\) sum to 1, so these bricks of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\) sum to \(({\mathbf{x}}^*)^\ell - ({\mathbf{z}}^*)^\ell \), and the claim follows by applying the triangle inequality to each brick of \({\mathbf{x}}^*\) individually and summing over \(\ell \).

For any vector \({\mathbf{x}}\in \mathbb {R}^{N't}\), define the fall of \({\mathbf{x}}\) as a vector \(\downarrow \!\!{\mathbf{x}}\in \mathbb {R}^{Nt}\) such that for \(\ell \in [N]\), \((\downarrow \!\!{\mathbf{x}})^\ell = \sum _{j \in \nu ^{-1}(\ell )} {\mathbf{x}}^j\). We see that \(\downarrow \!\! (\uparrow \!\! {\mathbf{x}}^*) = {\mathbf{x}}^*\) and \(\downarrow \!\! (\uparrow \!\! {\mathbf{z}}^*) = {\mathbf{z}}^*\). Say that \({\mathbf{r}}\) is a cycle of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\) if \({\mathbf{r}}\sqsubseteq \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\) and \({\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N')})\).
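The rise and fall operations are plain bookkeeping over bricks. The following minimal sketch assumes a hypothetical representation: a solution is a list of bricks (lists of numbers), and `decomps` maps the index of each fractional brick to its scalable decomposition as a list of `(configuration, coefficient)` pairs. It checks \(\downarrow \!\! (\uparrow \!\! {\mathbf{x}}^*) = {\mathbf{x}}^*\), \(\downarrow \!\! (\uparrow \!\! {\mathbf{z}}^*) = {\mathbf{z}}^*\), and the inequality \(\Vert \!\!\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\Vert _1 \ge \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) on toy data.

```python
from fractions import Fraction


def rise(x, z, decomps):
    """Rise of x and of z (w.r.t. x); nu[j] is the index of the brick that brick j came from."""
    up_x, up_z, nu = [], [], []
    for ell, (xb, zb) in enumerate(zip(x, z)):
        # fractional bricks use their scalable decomposition; integer bricks are 1 * x^ell
        terms = decomps.get(ell, [(tuple(xb), Fraction(1))])
        for c, lam in terms:
            up_x.append([lam * ci for ci in c])
            up_z.append([lam * zi for zi in zb])
            nu.append(ell)
    return up_x, up_z, nu


def fall(v, nu, n_bricks):
    """Fall: brick ell is the sum of all bricks j with nu[j] == ell."""
    t = len(v[0])
    out = [[Fraction(0)] * t for _ in range(n_bricks)]
    for j, ell in enumerate(nu):
        out[ell] = [o + vi for o, vi in zip(out[ell], v[j])]
    return out


def l1_dist(u, w):
    """l1-distance of two brick-structured vectors."""
    return sum(abs(a - b) for ub, wb in zip(u, w) for a, b in zip(ub, wb))


# Toy data, t = 2, N = 2: brick 1 of x is fractional, (3/2, 1/2) = 1/2*(0, 1) + 1/2*(3, 0).
x = [[1, 2], [Fraction(3, 2), Fraction(1, 2)]]
z = [[1, 2], [2, 0]]
decomps = {1: [((0, 1), Fraction(1, 2)), ((3, 0), Fraction(1, 2))]}
up_x, up_z, nu = rise(x, z, decomps)
assert fall(up_x, nu, len(x)) == x and fall(up_z, nu, len(z)) == z
print(l1_dist(up_x, up_z), l1_dist(x, z))   # 2 1
```

The strict inequality in the toy example comes from the two configurations lying on opposite sides of \(({\mathbf{z}}^*)^1\) in the first coordinate, which the triangle inequality cannot see after falling.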

Lemma 14

If \({\mathbf{r}}\) is a cycle of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\), then \(\downarrow {\mathbf{r}}\) is a configurable cycle of \({\mathbf{x}}^* - {\mathbf{z}}^*\).

Proof

To show that \(\downarrow \!\!\!{\mathbf{r}}\) is a configurable cycle, we need to show that (1) \(\downarrow \!\!\!{\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N)})\) and, (2) for each brick \({\mathbf{x}}= ({\mathbf{x}}^*)^j\) of \({\mathbf{x}}^*\), there is an \({\hat{f}}\)-optimal decomposition of \({\mathbf{x}}\) such that \({\mathbf{h}}= (\downarrow \!\! {\mathbf{r}})^j\) decomposes accordingly. For the first part, \(\downarrow \!\! {\mathbf{r}}\) is integral because it is obtained by summing bricks of \({\mathbf{r}}\), which is integral. Denote by i(j) the type of a brick j (we abuse this notation; note that i(j) for \(j \in [N]\) may differ from i(j) for \(j \in [N']\), but context always makes clear what we mean). By the fact that \({\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N')})\) and the definition of \(\downarrow \!\! {\mathbf{r}}\), we have \({\mathbf{0}}= \sum _{j=1}^{N'} E^{i(j)}_1 {\mathbf{r}}^j = \sum _{j=1}^{N} E^{i(j)}_1 (\downarrow \!\! {\mathbf{r}})^j\), and, for each \(\ell \in [N]\), \({\mathbf{0}}= \sum _{j \in \nu ^{-1}(\ell )} E^{i(j)}_2 {\mathbf{r}}^j = E^{i(\ell )}_2(\downarrow \!\!{\mathbf{r}})^\ell \), thus \(\downarrow \!\! {\mathbf{r}}\in \text {Ker}_{\mathbb {Z}}(E^{(N)})\).

To see the second part, fix a brick \(j \in [N]\) of type i and let \({\mathbf{x}}= ({\mathbf{x}}^*)^j\), \({\mathbf{z}}= ({\mathbf{z}}^*)^j\) and \({\mathbf{h}}= (\downarrow \!\! {\mathbf{r}})^j\). We need to show that \({\mathbf{h}}= \sum _{\gamma \in \nu ^{-1}(j)} {\mathbf{h}}_\gamma \), where \({\mathbf{h}}_\gamma := {\mathbf{r}}^\gamma \), can be written as \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} \lambda _{\mathbf{c}}{\mathbf{h}}_{\mathbf{c}}\) with \({\mathbf{h}}_{\mathbf{c}}\sqsubseteq {\mathbf{c}}- {\mathbf{z}}\) and \({\mathbf{h}}_{\mathbf{c}}\in \text {Ker}_\mathbb {Z}(E^i_2)\). By definition of \(\uparrow \!\! {\mathbf{x}}^*\) and \({\mathbf{r}}\), there is a scalable decomposition \(\varGamma \) of \({\mathbf{x}}\) (namely the one used to define \(\uparrow \!\! \!{\mathbf{x}}^*\)) such that for each \(\gamma \in \nu ^{-1}(j)\), \({\mathbf{h}}_\gamma \sqsubseteq \lambda _\gamma ({\mathbf{c}}_\gamma - {\mathbf{z}})\) and \({\mathbf{h}}_\gamma \in \text {Ker}_\mathbb {Z}(E^i_2)\). Thus we may write \({\mathbf{h}}= \sum _{\gamma \in \nu ^{-1}(j)} \lambda _\gamma \cdot (\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma )\), where \(\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma \sqsubseteq {\mathbf{c}}_\gamma - {\mathbf{z}}\), and \(\lambda ^{-1}_\gamma {\mathbf{h}}_\gamma = q_\gamma {\mathbf{h}}_\gamma \) is integral and lies in \(\text {Ker}_\mathbb {Z}(E^i_2)\) because \(\lambda _\gamma = 1/q_{\gamma }\) with \(q_{\gamma } \in \mathbb {N}\). This concludes the proof. \(\square \)

We are finally ready to use the Steinitz Lemma to derive a bound on \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\).

Theorem 2

Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there exists an optimal solution \({\mathbf{z}}^*\) of (HugeIP) such that

$$\begin{aligned} \Vert {\mathbf{z}}^*-{\mathbf{x}}^*\Vert _1 \le&\left( \kappa _2 t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\right) (2r\Vert E_1\Vert _\infty g_1(E_2))^{r+2} \\ \le&\left( \kappa _2 t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\right) (2r)^{r+2} (\Vert E\Vert _\infty s)^{3rs} . \end{aligned}$$

Proof

Denote by \({\bar{E}}_1\) the first r rows of the matrix \(E^{(N')}\), where \(N'\) is the number of bricks of \(\uparrow \!\! {\mathbf{x}}^*\). Let \({\mathbf{z}}^*\) be an optimal integer solution such that \(\Vert {\mathbf{z}}^* - {\mathbf{x}}^*\Vert _1\) is minimal, let \(\uparrow \!\! \!{\mathbf{x}}^*\) be the rise of \({\mathbf{x}}^*\) with at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks (see Lemma 13), let \(\uparrow \!\! {\mathbf{z}}^*\) be the rise of \({\mathbf{z}}^*\) with respect to \({\mathbf{x}}^*\), and let \({\mathbf{q}}= \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\).

We want to get into the setting of the Steinitz Lemma, that is, to obtain a sequence of vectors of small norm summing up to zero. To this end, we decompose \({\bar{E}}_1{\mathbf{q}}\) as follows; we stress that \({\bar{E}}_1{\mathbf{q}}= {\mathbf{0}}\). For every integral brick \({\mathbf{q}}^i\) of type \(\ell \in [\tau ]\) we have its decomposition \({\mathbf{q}}^i = \sum _j {\mathbf{g}}^i_j\) into elements of \(\mathcal {G}(E^\ell _2)\) by the Positive Sum Property (Proposition 1); for each \({\mathbf{g}}^i_j\) we append \(E^\ell _1{\mathbf{g}}^i_j\) to the sequence. For every fractional brick \({\mathbf{q}}^i\) of type \(\ell \in [\tau ]\) we have its decomposition \({\mathbf{q}}^i = \sum _{j=1}^{t} \alpha _j {\mathbf{g}}^i_j\), \(\alpha _j \ge 0\) for each j, into elements of \(\mathcal {C}(E^\ell _2)\); for each \({\mathbf{g}}^i_j\) we append \({\lfloor }{\alpha _j}{\rfloor }\) copies of \(E^\ell _1 {\mathbf{g}}^i_j\) to the sequence, followed by the fractional vector \(E^\ell _1 \{\alpha _j\} {\mathbf{g}}^i_j\), where \(\{\alpha _j\} = \alpha _j - {\lfloor }{\alpha _j}{\rfloor }\) denotes the fractional part. Observe that since \(\uparrow \!\! {\mathbf{x}}^*\) has at most \(\kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional bricks (Lemma 13), so does \({\mathbf{q}}\), and thus we have appended \({\mathfrak {f}} \le t \cdot \kappa _2 \cdot r \cdot t^3 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty ) = \kappa _2 \cdot r \cdot t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\) fractional vectors to the sequence. Now we have a sequence

$$\begin{aligned} {\mathbf{o}}_1,\dots ,{\mathbf{o}}_m,{\mathbf{p}}_{m+1},\dots ,{\mathbf{p}}_{m+{\mathfrak {f}}} \end{aligned}$$
(15)

with m integer vectors \({\mathbf{o}}_1, \dots , {\mathbf{o}}_m\) and \({\mathfrak {f}}\) fractional vectors \({\mathbf{p}}_{m+1}, \dots , {\mathbf{p}}_{m+{\mathfrak {f}}}\). Moreover, since \(\mathcal {C}(E^i_2) \subseteq \mathcal {G}(E^i_2)\) for each \(i \in [\tau ]\), each vector has \(\ell _\infty \)-norm at most \(\Vert E^1_1,\dots ,E^\tau _1\Vert _\infty \cdot g_1(E_2)\), and the vectors sum up to \({\mathbf{0}}\). Observe that \((m+{\mathfrak {f}})\cdot g_1(E_2) \ge \Vert {\mathbf{q}}\Vert _1 = \Vert \uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\Vert _1 \ge \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\). We now focus on bounding \(m+{\mathfrak {f}}\). The Steinitz Lemma (Lemma 1) implies that there exists a permutation \(\pi \) such that the sequence (15) can be rearranged as

$$\begin{aligned} {\mathbf{v}}_{1},\dots ,{\mathbf{v}}_{m+{\mathfrak {f}}}, \end{aligned}$$
(16)

where \({\mathbf{v}}_i\) is the \(\pi ^{-1}(i)\)-th vector of (15), that is, \({\mathbf{o}}_{\pi ^{-1}(i)}\) if \(\pi ^{-1}(i) \le m\) and \({\mathbf{p}}_{\pi ^{-1}(i)}\) otherwise, and for each \(1 \le k \le m+{\mathfrak {f}}\) the prefix sum \({\mathbf{t}}_k := \sum _{i=1}^k {\mathbf{v}}_{i}\) satisfies

$$\begin{aligned} \Vert {\mathbf{t}}_k\Vert _\infty \le r \Vert E_1\Vert _\infty g_1(E_2) . \end{aligned}$$

We will now argue that there cannot be indices \(1 \le k_1< \cdots < k_{{\mathfrak {f}}+2} \le {\mathfrak {f}}+m\) with

$$\begin{aligned} {\mathbf{t}}_{k_1} = \cdots = {\mathbf{t}}_{k_{{\mathfrak {f}}+2}}, \end{aligned}$$
(17)

which implies that \({\mathfrak {f}}+m\) is bounded by \({\mathfrak {f}}+1\) times the number of integer points of \(\ell _\infty \)-norm at most \(r \Vert E_1\Vert _\infty g_1(E_2)\), and therefore,

$$\begin{aligned} \Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1&\le \Vert \uparrow {\mathbf{x}}^* - \uparrow {\mathbf{z}}^*\Vert _1 \le ({\mathfrak {f}}+1) \left( 2r \Vert E_1\Vert _\infty g_1(E_2)+1\right) ^r \cdot g_1(E_2) \\&\le \kappa _2 \cdot t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty ) \cdot r \left( 2r \Vert E_1\Vert _\infty g_1(E_2)+1\right) ^r \cdot g_1(E_2) \\&\le \left( \kappa _2 t^4 \log (t\Vert E^1_2,\dots ,E^\tau _2\Vert _\infty )\right) (2r\Vert E_1\Vert _\infty g_1(E_2))^{r+2}. \end{aligned}$$

Assume for contradiction that there exist \({\mathfrak {f}}+2\) indices \(1 \le k_1< \cdots < k_{{\mathfrak {f}}+2} \le {\mathfrak {f}}+m\) satisfying (17). The \({\mathfrak {f}}+1\) intervals \([k_{\ell }+1, k_{\ell +1}]\) are disjoint and only \({\mathfrak {f}}\) vectors of (16) are fractional, so by the pigeonhole principle there is an index \(k_\ell \) such that all the vectors \({\mathbf{v}}_{k_{\ell }+1},\dots ,{\mathbf{v}}_{k_{\ell +1}}\) from the rearrangement (16) are integer vectors \({\mathbf{o}}_{\pi ^{-1}(p)}\) for \(p \in [k_{\ell }+1, k_{\ell +1}]\). We will show that this collection of vectors corresponds to a cycle \({\mathbf{h}}\) of \(\uparrow \!\! {\mathbf{x}}^* - \uparrow \!\! {\mathbf{z}}^*\), which is impossible by the minimality of \(\Vert {\mathbf{x}}^* - {\mathbf{z}}^*\Vert _1\) and Lemmas 9 and 14. To obtain the cycle, for each \(p \in [k_{\ell }+1, k_{\ell +1}]\), let i(p), j(p), and \(\ell (p)\) be such that \({\mathbf{o}}_{\pi ^{-1}(p)} = E^{\ell (p)}_1 {\mathbf{g}}_{j(p)}^{i(p)}\). Initialize \({\mathbf{h}}:= {\mathbf{0}}\in \mathbb {Z}^{N't}\) and, for each \(p \in [k_{\ell }+1, k_{\ell +1}]\), let \({\mathbf{h}}^{i(p)} := {\mathbf{h}}^{i(p)} + {\mathbf{g}}_{j(p)}^{i(p)}\). Now we check that \({\mathbf{h}}\) is, in fact, a cycle. First, to see that \(E^{(N')} {\mathbf{h}}= {\mathbf{0}}\), we have \(E^\ell _2 {\mathbf{h}}^i = {\mathbf{0}}\) for every brick \(i \in [N']\) of type \(\ell \) since \({\mathbf{h}}^i\) is a sum of elements \({\mathbf{g}}_j^i \in \mathcal {G}(E^\ell _2) \subseteq \text {Ker}_{\mathbb {Z}}(E^\ell _2)\), and we have \({\bar{E}}_1 {\mathbf{h}}= {\mathbf{0}}\) since \({\mathbf{t}}_{k_\ell } = {\mathbf{t}}_{k_{\ell +1}}\) and thus \(\sum _{p = k_\ell +1}^{k_{\ell +1}} E^{\ell (p)}_1 {\mathbf{g}}_{j(p)}^{i(p)} ={\mathbf{0}}\). Second, \({\mathbf{h}}\sqsubseteq {\mathbf{q}}\) because, for every brick \(i \in [N']\), \({\mathbf{h}}^i\) is a sign-compatible sum of elements \({\mathbf{g}}^i_j \sqsubseteq {\mathbf{q}}^i\). \(\square \)
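The pigeonhole step is constructive: given the rearranged sequence (16), a single scan over the prefix sums finds two indices with equal prefix sums and only integer vectors in between, whenever such a pair exists. The following minimal sketch assumes a hypothetical representation (vectors as tuples, plus a list of flags marking the fractional ones) and does not compute the Steinitz rearrangement itself.

```python
from fractions import Fraction


def find_integral_zero_block(vs, is_frac):
    """Return (a, b), a < b, with t_a == t_b and v_{a+1}, ..., v_b all integral; None if none exists.

    Indices are 1-based as in (16); t_k denotes the k-th prefix sum.
    """
    last_seen = {}        # prefix sum -> largest index k seen so far with that prefix sum
    last_frac = 0         # largest index p <= k such that v_p is fractional (0 if none)
    prefix = (Fraction(0),) * len(vs[0])
    for k, (v, frac) in enumerate(zip(vs, is_frac), start=1):
        prefix = tuple(a + b for a, b in zip(prefix, v))
        if frac:
            last_frac = k
        a = last_seen.get(prefix)
        if a is not None and a >= last_frac:   # no fractional vector among v_{a+1}, ..., v_k
            return a, k
        last_seen[prefix] = k
    return None


# Toy rearranged sequence (r = 2); the 2nd and 4th vectors are fractional.
vs = [(1, 0), (Fraction(1, 2), 0), (-1, 1), (Fraction(-1, 2), -1), (1, 0), (-1, 0)]
is_frac = [False, True, False, True, False, False]
print(find_integral_zero_block(vs, is_frac))   # (4, 6): v_5 + v_6 = 0, both integral
```

Keeping only the latest index for each prefix sum suffices: if some earlier index works as the left end, the latest one does too, since it lies even further to the right of every fractional vector.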

3.6 Improving the proximity theorem when I has identical columns

In this section we will show how to construct a huge N-fold instance \(I'\) from any input instance I such that the number of columns of \(I'\) per brick is at most \((2\Vert E\Vert _\infty +1)^{r+s}\), and in some sense I and \(I'\) are equivalent. Specifically, we will show a mapping between the solutions of I and \(I'\) which maps integer or configurable optima of I to integer or configurable optima of \(I'\) and vice versa, respectively, and such that proximity bounds from \(I'\) can be transferred to I. This will eventually allow us to show that even if I has very large t, we can bound the distance between a configurable optimum and some integer optimum of I by a function independent of t.

3.6.1 Construction of \(I'\).

Note that \((2\Vert E\Vert _\infty +1)^{r+s}\) is the number of distinct \((r+s)\)-dimensional integer vectors with entries bounded by \(\Vert E\Vert _\infty \) in absolute value, hence the number of possible distinct columns per brick. We will show how to “join” variables corresponding to identical columns. Consider any IP with a separable convex objective where columns corresponding to variables \(x_1\) and \(x_2\) are identical. Let \(f_1\) and \(f_2\) be the objective functions corresponding to \(x_1\) and \(x_2\), and \(l_1, l_2\) and \(u_1, u_2\) be their lower and upper bounds, respectively. Let \(x_{12}\) be a new variable which replaces \(x_1, x_2\) in \(I'\). Set the lower bound of \(x_{12}\) to be \(l_{12}=l_1 + l_2\), upper bound \(u_{12} = u_1 + u_2\), and define its objective function as the \((\min ,+)\)-convolution of \(f_1\) and \(f_2\):

$$\begin{aligned} f_{12}(x_{12})= & {} \min _{\begin{array}{c} x_1, x_2 \in \mathbb {Z},\, x_{12} = x_1 + x_2\\ (l_1, l_2) \le (x_1, x_2) \le (u_1, u_2) \end{array}} f_1(x_1) + f_2(x_2) . \end{aligned}$$
(18)

Note that if \(f_1\) and \(f_2\) are convex, then \(f_{12}\) is also convex. Extend \(f_{12}\) to fractional values by linear interpolation, that is, for fractional \(x_{12} = {\lfloor }{x_{12}}{\rfloor } + \{x_{12}\}\), let \(f_{12}(x_{12})\) be \(f_{12}({\lfloor }{x_{12}}{\rfloor }) + \{x_{12}\} (f_{12}({\lceil }{x_{12}}{\rceil }) - f_{12}({\lfloor }{x_{12}}{\rfloor }))\). Since \(x_1 \mapsto f_1(x_1) + f_2(x_{12} - x_1)\) is convex, the value \(f_{12}(x_{12})\) can be obtained by binary search on \(x_1\) (which determines \(x_2 = x_{12} - x_1\)) using \(\mathcal {O}(\log (u_{12} - l_{12}))\) calls to evaluation oracles for \(f_1\) and \(f_2\). When merging a set S of more than 2 variables, one would compute \(f_S(x_S)\) as the optimal value of the corresponding integer program whose objective is \(\sum _{i \in S} f_i(x_i)\) and whose constraints are \(\sum _{i \in S} x_i = x_S\) together with the appropriate lower and upper bounds; by [13], this is solvable in time \({{\,\mathrm{\mathrm{poly}}\,}}(|S|) \log (f_{\max }, u_S - l_S)\). However, our goal here is to strengthen our proximity result for I by studying \(I'\), without actually attempting to solve \(I'\).
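The binary search for \(f_{12}(x_{12})\) can be sketched as follows, assuming \(f_1, f_2\) are given as Python callables acting as evaluation oracles (the name `minplus_convolution` and the toy objectives are illustrative only). It searches for the first \(x_1\) whose forward difference \(g(x_1+1) - g(x_1)\) is nonnegative, where \(g(x_1) = f_1(x_1) + f_2(x_{12} - x_1)\), and also returns the minimizing pair, which is a valid integral preimage for \(\sigma ^{-1}\) on this merged coordinate.

```python
import math


def minplus_convolution(f1, f2, l1, u1, l2, u2, x12):
    """Return (f12(x12), (x1, x2)) for the (min,+)-convolution (18) of convex f1, f2."""
    lo, hi = max(l1, x12 - u2), min(u1, x12 - l2)   # feasible values of x1
    if lo > hi:
        return math.inf, None                       # x12 outside [l1 + l2, u1 + u2]

    def g(x1):
        # convex in x1 whenever f1 and f2 are convex
        return f1(x1) + f2(x12 - x1)

    # Binary search for the first x1 with g(x1 + 1) - g(x1) >= 0: O(log(hi - lo)) oracle calls.
    while lo < hi:
        mid = (lo + hi) // 2
        if g(mid + 1) >= g(mid):
            hi = mid          # a minimizer lies in [lo, mid]
        else:
            lo = mid + 1      # g still decreasing: a minimizer lies in [mid + 1, hi]
    return g(lo), (lo, x12 - lo)


def f1(x):
    return (x - 3) ** 2


def f2(x):
    return 2 * abs(x + 1)


print(minplus_convolution(f1, f2, -5, 6, -4, 7, 4))   # (3, (4, 0))
```

Convexity is what makes the comparison of two neighbouring values decisive; for non-convex \(f_1, f_2\) this search may return a non-optimal split.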

For a solution \({\mathbf{x}}\) of I (not necessarily integral), we define \(\sigma ({\mathbf{x}})\) to be a solution of \(I'\) where \(x_1\) and \(x_2\) are replaced by \(x_{12} = x_1 + x_2\). Clearly, for integer \({\mathbf{x}}\), the value of \(\sigma ({\mathbf{x}})\) under the objective of \(I'\) is at most the value of \({\mathbf{x}}\) under f, and if \({\mathbf{x}}\) is an integer optimum of I, then \(\sigma ({\mathbf{x}})\) will be an integer optimum of \(I'\) because we then have \(f_{12}(x_{12}) = f_1(x_1) + f_2(x_2)\). We abuse the notation and for an integer \({\mathbf{x}}'\) define \(\sigma ^{-1}({\mathbf{x}}')\) to be some integral member \({\mathbf{x}}\) of the set \(\sigma ^{-1}({\mathbf{x}}')\) which satisfies \(f_1(x_1) + f_2(x_2) = f_{12}(x_{12}')\). For a configurable solution \({\mathbf{x}}'\) we define \(\sigma ^{-1}({\mathbf{x}}')\) by taking an \({\hat{f}}\)-optimal decomposition \(\varGamma '\) of the brick of \({\mathbf{x}}'\) containing \(x_{12}\) and applying \(\sigma ^{-1}\) to the configurations in \(\varGamma '\); this defines a decomposition \(\varGamma \) and thus a brick \(\sum \varGamma \) of a solution \({\mathbf{x}}\) of I. The next lemma shows that this construction preserves the value of the solution.

Lemma 15

If \({\mathbf{x}}\) is an integer optimum of I, then \(\sigma ({\mathbf{x}})\) is an integer optimum of \(I'\). Similarly, if \({\mathbf{x}}\) is a configurable optimum of I, then \(\sigma ({\mathbf{x}})\) is a configurable optimum of \(I'\). Analogously, if \({\mathbf{x}}'\) is an integer optimum of \(I'\), then \(\sigma ^{-1}({\mathbf{x}}')\) is an integer optimum of I, and if \({\mathbf{x}}'\) is a configurable optimum of \(I'\), then \(\sigma ^{-1}({\mathbf{x}}')\) is a configurable optimum of I.

Proof

It follows from the definition of \(f_{12}\) that for any integer solution of I we get an integer solution of \(I'\) which is at least as good, and for any integer solution of \(I'\) we get an integer solution of I with the same value. For configurable solutions we apply the observation above to each configuration in some \({\hat{f}}\)-optimal decomposition and use the fact that \({\hat{f}}\) is defined via \(f_{12}\). \(\square \)

This approach generalizes readily to any number of variables. For the sake of simplicity we continue with the example of “joining” two variables whose columns in \(E^{(N)}\) are identical.

We are left to argue about proximity. While we believe that it holds in general that any proximity bound between integer and configurable optima of \(I'\) transfers to I, we only need this for our specific bound, so we take a less general route.

Lemma 16

Let \({\mathbf{x}}\) be a configurable optimum of I with at most 2r fractional bricks, \({\mathbf{x}}' = \sigma ({\mathbf{x}})\) a configurable optimum of \(I'\), \({\mathbf{z}}'\) an \(\ell _1\)-closest integer optimum of \(I'\), and \({\mathbf{z}}= \sigma ^{-1}({\mathbf{z}}')\) an integer optimum of I. Let P be the bound of Theorem 2 on \(\Vert {\mathbf{x}}' - {\mathbf{z}}'\Vert _1\). Then \(\Vert {\mathbf{x}}- {\mathbf{z}}\Vert _1 \le P\).

Proof

Consider the proof of Theorem 2. In it, we create a sequence of vectors \({\mathbf{v}}_1, \dots , {\mathbf{v}}_{m+{\mathfrak {f}}}\). Each of these vectors corresponds to some \(E_1^{\ell } \lambda _j {\mathbf{g}}_j^i\). The crucial observation is that the sequence \(({\mathbf{v}}_i)_i\) obtained from \({\mathbf{x}}, {\mathbf{z}}\) is identical to the sequence obtained from \({\mathbf{x}}', {\mathbf{z}}'\), so if \(\Vert {\mathbf{x}}' - {\mathbf{z}}'\Vert _1 \le P\), then also \(\Vert {\mathbf{x}}- {\mathbf{z}}\Vert _1 \le P\). \(\square \)

The next corollary is now immediate:

Corollary 1

Let \({\mathbf{x}}^*\) be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there is an optimal solution \({\mathbf{z}}^*\) of (HugeIP) such that

$$\begin{aligned} \Vert {\mathbf{z}}^*-{\mathbf{x}}^*\Vert _1&\le \left( \kappa _2 (r+s)(2\Vert E\Vert _\infty +1)^{4s}\right) (2r\Vert E_1\Vert _\infty g_1(E_2))^{6r} \\&\le (2\Vert E\Vert _\infty +1)^{\mathcal {O}(s)} (2r \Vert E\Vert _\infty g_1(E_2))^{\mathcal {O}(r)} \le (\Vert E\Vert _\infty rs)^{\mathcal {O}(rs)} . \end{aligned}$$

3.7 Algorithm

Recall the statement of the theorem we are proving: Theorem 1. Huge N-fold IP with any separable convex objective can be solved in time

$$\begin{aligned} (\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)} {{\,\mathrm{\mathrm{poly}}\,}}(\tau , t, \log \Vert {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, N, f_{\max }\Vert _\infty ) . \end{aligned}$$

Proof

We first give a description of the algorithm which solves huge N-fold IP, then show its correctness, and finally give a time complexity analysis.

Description of the algorithm. First, obtain an optimal solution \({\mathbf{y}}\) of (ConfLP) and from it a conf-optimal solution \({\mathbf{x}}^* = \varphi ({\mathbf{y}})\) with at most 2r fractional bricks by Lemma 6. Applying Corollary 1 to \({\mathbf{x}}^*\) guarantees the existence of an integer optimum \({\mathbf{z}}^*\) satisfying

$$\begin{aligned} \Vert {\mathbf{x}}^* - {\mathbf{z}}^* \Vert _1 \le P := \left( \kappa _2 (r+s)(2\Vert E\Vert _\infty +1)^{4s}\right) (2r\Vert E_1\Vert _\infty g_1(E_2))^{6r} . \end{aligned}$$
(19)

Together with the fact that there are at most 2r fractional bricks, this implies that \({\mathbf{z}}^*\) differs from \({\mathbf{x}}^*\) in at most \(P' = P + 2r\) bricks. The idea of the algorithm is to “fix” the value of the solution on “almost all” bricks and compute the rest using an auxiliary \({\bar{N}}\)-fold IP problem with a polynomial \({\bar{N}}\).

Formally, our goal is to compute an optimal solution \({\mathbf{z}}\) of (HugeIP) represented succinctly by multiplicities of configurations, or in other words, as a solution \(\varvec{ \zeta }\) of (ConfILP). Denote by \({\mathbf{y}}_{-P'}\) the vector whose coordinates are defined by setting, for every type \(i \in [\tau ]\) and every configuration \({\mathbf{c}}\in \mathcal {C}^i\), \({\mathbf{y}}_{-P'}(i,{{\mathbf{c}}}) = \max \{0, {\lfloor }{y(i, {\mathbf{c}})}{\rfloor } - P'\}\). This leaves us with \(\Vert {\mathbf{y}}\Vert _1 - \Vert {\mathbf{y}}_{-P'}\Vert _1 \le |\text {supp}({\mathbf{y}})| P' \le (r+\tau ) P' =: {\bar{P}}\) bricks to determine. Let \({\bar{\varvec{ \zeta }}} = {\mathbf{y}}- {\mathbf{y}}_{-P'}\), define \({\bar{\varvec{ \mu }}}\) by setting, for each \(i \in [\tau ]\), \({\bar{\varvec{ \mu }}}_i := \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\bar{\zeta }}(i,{\mathbf{c}})\), let \({\bar{{\mathbf{x}}}} = \varphi ({\bar{\varvec{ \zeta }}})\), and let \({\bar{N}} = \Vert {\bar{\varvec{ \zeta }}}\Vert _1 = \Vert {\bar{\varvec{ \mu }}}\Vert _1 \le {\bar{P}}\). Construct an auxiliary \({\bar{N}}\)-fold IP instance with the same blocks \(E^i_1, E^i_2\), \(i \in [\tau ]\), by setting, for each brick \({\bar{{\mathbf{x}}}}^j\) of type i,

$$\begin{aligned} \bar{f}^j = f^i, \qquad \bar{{\mathbf{b}}}^j = {\mathbf{b}}^i, \qquad \bar{{\mathbf{l}}}^j = {\mathbf{l}}^i, \qquad \bar{{\mathbf{u}}}^j = {\mathbf{u}}^i. \end{aligned}$$

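We say that such a brick is derived from type i. Lastly, let \({\bar{{\mathbf{b}}}}^0 = {\mathbf{b}}^0 - \sum _{i=1}^{\tau } \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\mathbf{y}}_{-P'}(i, {\mathbf{c}}) E^i_1 {\mathbf{c}}\) be the right-hand side of the linking constraints that remains after fixing \({\mathbf{y}}_{-P'}\).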
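The bookkeeping of this step can be sketched in Python. This is a minimal illustration under our own assumptions, not the paper's implementation: the (ConfLP) solution \({\mathbf{y}}\) is a dictionary mapping pairs (i, c), with the configuration c stored as a tuple, to exact rationals; `E1[i]` stands for the block \(E^i_1\) as a list of rows; and the names `split_conf_solution`, `mat_vec` and `auxiliary_instance` are hypothetical. The sketch computes \({\mathbf{y}}_{-P'}\), \({\bar{\varvec{ \zeta }}}\), \({\bar{\varvec{ \mu }}}\) and \({\bar{N}}\), lists the type from which each auxiliary brick is derived, and subtracts the contribution of the fixed configurations from \({\mathbf{b}}^0\).

```python
from fractions import Fraction
from math import floor


def split_conf_solution(y, p_prime):
    """Split a ConfLP solution y into the fixed part y_{-P'} and the remainder zeta_bar.

    y maps (type i, configuration c as a tuple) -> Fraction.
    Returns (y_fixed, zeta_bar, mu_bar, n_bar).
    """
    y_fixed, zeta_bar = {}, {}
    for key, value in y.items():
        fixed = max(0, floor(value) - p_prime)  # y_{-P'}(i, c)
        if fixed > 0:
            y_fixed[key] = fixed
        rest = value - fixed                    # zeta_bar(i, c) = y(i, c) - y_{-P'}(i, c)
        if rest != 0:
            zeta_bar[key] = rest
    mu_bar = {}
    for (i, _c), value in zeta_bar.items():
        mu_bar[i] = mu_bar.get(i, 0) + value    # mu_bar_i = sum_c zeta_bar(i, c)
    n_bar = sum(mu_bar.values())                # N_bar = ||zeta_bar||_1
    return y_fixed, zeta_bar, mu_bar, n_bar


def mat_vec(matrix, vector):
    """Integer matrix-vector product on plain lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def auxiliary_instance(E1, b0, mu_bar, y_fixed):
    """Return the type of each auxiliary brick and the reduced linking right-hand side."""
    # mu_bar_i is integral whenever sum_c y(i, c) = mu_i holds, so int() is exact here.
    brick_types = [i for i in sorted(mu_bar) for _ in range(int(mu_bar[i]))]
    b0_bar = list(b0)
    for (i, c), count in y_fixed.items():
        # Subtract count * E^i_1 c for every configuration fixed by y_{-P'}.
        b0_bar = [b - count * a for b, a in zip(b0_bar, mat_vec(E1[i], c))]
    return brick_types, b0_bar


# Toy data (hypothetical, not a real ConfLP optimum): two types, t = 2, r = 1, P' = 3.
y = {
    (1, (1, 0)): Fraction(10),
    (1, (0, 1)): Fraction(5, 2),
    (1, (1, 1)): Fraction(1, 2),
    (2, (2, 0)): Fraction(7),
}
y_fixed, zeta_bar, mu_bar, n_bar = split_conf_solution(y, p_prime=3)
brick_types, b0_bar = auxiliary_instance({1: [[1, 1]], 2: [[1, 0]]}, [20], mu_bar, y_fixed)
print(y_fixed, mu_bar, n_bar, brick_types, b0_bar)
```

In this toy example, \({\mathbf{y}}_{-P'}\) fixes 7 + 4 = 11 bricks, and \({\bar{N}} = 9\) bricks (six of type 1 and three of type 2) are left to the auxiliary instance.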

After obtaining an optimal solution \({\bar{{\mathbf{z}}}}\) of this instance, we set \(\varvec{ \zeta } := {\mathbf{y}}_{-P'}\) and update it as follows: for each brick \({\bar{{\mathbf{z}}}}^j\) derived from type i, increment \(\zeta (i,{\bar{{\mathbf{z}}}}^j)\) by one.
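A minimal sketch of this update, assuming the same hypothetical dictionary representation of configuration multiplicities keyed by (type, configuration tuple); `update_zeta` is our own name. A `Counter` keeps the result in the succinct multiplicity form, so its size depends on the number of distinct configurations rather than on N.

```python
from collections import Counter


def update_zeta(y_fixed, brick_types, z_bar):
    """Combine the fixed multiplicities y_{-P'} with an optimal auxiliary solution z_bar.

    brick_types[j] is the type i that auxiliary brick j was derived from, and
    z_bar[j] is the integral value of that brick (hypothetical representation).
    """
    zeta = Counter(y_fixed)                      # start from zeta := y_{-P'}
    for i, brick in zip(brick_types, z_bar):
        zeta[(i, tuple(brick))] += 1             # one more brick of type i in configuration z_bar^j
    return zeta


zeta = update_zeta({(1, (1, 0)): 7}, [1, 1, 2], [[1, 0], [0, 1], [2, 0]])
print(zeta)
```

The total number of bricks described by \(\varvec{ \zeta }\) is the number fixed by \({\mathbf{y}}_{-P'}\) plus \({\bar{N}}\); here 7 + 3 = 10.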

Correctness. By (19), we may assume that there exists an optimal solution \(\varvec{ \zeta }\) of (ConfILP) with \(\zeta (i,{\mathbf{c}}) \ge \max \{0, {\lfloor }{y(i, {\mathbf{c}})}{\rfloor } - P'\}\) for each \(i \in [\tau ]\) and \({\mathbf{c}}\in \mathcal {C}^i\), since the integer optimum \({\mathbf{z}}^*\) differs from \({\mathbf{x}}^*\) in at most \(P'\) bricks. Thus we may apply the variable transformation \(\varvec{ \zeta }= {\bar{\varvec{ \zeta }}} + {\mathbf{y}}_{-P'}\) to (ConfILP), obtaining an auxiliary (ConfILP) instance

$$\begin{aligned} \min {\mathbf{v}}({\bar{\varvec{ \zeta }}} + {\mathbf{y}}_{-P'}) \,:\, B({\bar{\varvec{ \zeta }}} + {\mathbf{y}}_{-P'}) = {\mathbf{d}},\, {\mathbf{0}}\le {\bar{\varvec{ \zeta }}} . \end{aligned}$$

The auxiliary huge \({\bar{N}}\)-fold instance constructed above is exactly the instance corresponding to this program, and the final construction of \(\varvec{ \zeta }\) undoes the variable transformation.
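Spelling out the substitution makes this correspondence explicit. Here we assume, as the definition of \({\bar{\varvec{ \mu }}}\) suggests, that the system \(B\varvec{ \zeta } = {\mathbf{d}}\) of (ConfILP) consists of the linking constraints \(\sum _{i=1}^{\tau }\sum _{{\mathbf{c}}\in \mathcal {C}^i} \zeta (i,{\mathbf{c}}) E^i_1 {\mathbf{c}} = {\mathbf{b}}^0\) and the multiplicity constraints \(\sum _{{\mathbf{c}}\in \mathcal {C}^i}\zeta (i,{\mathbf{c}}) = \mu _i\), \(i\in [\tau ]\). Since \({\mathbf{v}}{\mathbf{y}}_{-P'}\) is a constant, the auxiliary program is equivalent to

$$\begin{aligned} \min {\mathbf{v}}{\bar{\varvec{ \zeta }}} \,:\, B{\bar{\varvec{ \zeta }}} = {\mathbf{d}}- B{\mathbf{y}}_{-P'} = \begin{pmatrix} {\mathbf{b}}^0 - \sum _{i=1}^{\tau } \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\mathbf{y}}_{-P'}(i,{\mathbf{c}}) E^i_1 {\mathbf{c}}\\ \bigl (\mu _i - \sum _{{\mathbf{c}}\in \mathcal {C}^i} {\mathbf{y}}_{-P'}(i,{\mathbf{c}})\bigr )_{i \in [\tau ]} \end{pmatrix},\, {\mathbf{0}}\le {\bar{\varvec{ \zeta }}} . \end{aligned}$$

Because \(\sum _{{\mathbf{c}}\in \mathcal {C}^i} y(i,{\mathbf{c}}) = \mu _i\) for the (ConfLP) solution \({\mathbf{y}}\), the second block equals \({\bar{\varvec{ \mu }}}\), so the auxiliary instance has exactly \({\bar{\mu }}_i\) bricks of type i, and its linking right-hand side is \({\mathbf{b}}^0\) minus the contribution of the fixed configurations.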

Complexity. Since \(\Vert {\bar{\varvec{ \zeta }}}\Vert _1 \le {\bar{P}}\), the auxiliary instance has \({\bar{N}} \le {\bar{P}}\) bricks, and we can obtain an optimal solution \({\bar{{\mathbf{z}}}}\) of it using [13, Corollary 91]. We now bound the total running time. By Lemma 6, solving (ConfLP) takes time

$$\begin{aligned} \Vert E\Vert _\infty ^{\mathcal {O}(s^2)}{{\,\mathrm{\mathrm{poly}}\,}}(rt\tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }\Vert _\infty ) . \end{aligned}$$

To solve the auxiliary instance above, we need time

$$\begin{aligned} (\Vert E\Vert _\infty r s)^{\mathcal {O}(r^2s + rs^2)} (t {\bar{P}}) \log (t {\bar{P}}) \log \Vert f_{\max }, {\bar{{\mathbf{b}}}}, {\bar{{\mathbf{l}}}}, {\bar{{\mathbf{u}}}}\Vert ^2_\infty , \end{aligned}$$
$$\begin{aligned} \text {where}~{\bar{P}} = (r+\tau )P' \le \tau (2\Vert E\Vert _\infty +1)^{\mathcal {O}(s)} (2r \Vert E\Vert _\infty g_1(E_2))^{\mathcal {O}(r)} . \end{aligned}$$

Hence we can solve huge N-fold IP in time at most

$$\begin{aligned} (\Vert E\Vert _\infty rs)^{\mathcal {O}(r^2s + rs^2)} {{\,\mathrm{\mathrm{poly}}\,}}(t\tau \log \Vert f_{\max }, {\mathbf{l}}, {\mathbf{u}}, {\mathbf{b}}, \varvec{ \mu }\Vert _\infty )\,. \end{aligned}$$

\(\square \)
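To get a feeling for the quantities in the proof above, the following sketch evaluates \(P\), \(P'\) and \({\bar{P}}\) for given parameters. It reads (19) as the product \((r+s)(2\Vert E\Vert _\infty +1)^{4s}\,(2r\Vert E_1\Vert _\infty g_1(E_2))^{6r}\) and treats \(g_1(E_2)\) as a given number, since its definition lies outside this section; the function name `proximity_bounds` is hypothetical.

```python
def proximity_bounds(r, s, tau, norm_E, norm_E1, g1):
    """Evaluate P from (19), P' = P + 2r and P_bar = (r + tau) P' with exact integers."""
    P = (r + s) * (2 * norm_E + 1) ** (4 * s) * (2 * r * norm_E1 * g1) ** (6 * r)
    P_prime = P + 2 * r            # bricks on which z* may differ from x*
    P_bar = (r + tau) * P_prime    # upper bound on the number N_bar of auxiliary bricks
    return P, P_prime, P_bar


# Smallest nontrivial parameters: r = s = 1, unit norms, g1 = 1 (hypothetical), tau = 2.
print(proximity_bounds(1, 1, 2, 1, 1, 1))  # (10368, 10370, 31110)
```

Only the factor \(r+\tau \) depends on the number of types, so for fixed \(r, s, \Vert E\Vert _\infty \) and \(g_1(E_2)\) the auxiliary instance grows linearly in \(\tau \) and is independent of N and of the multiplicities \(\varvec{ \mu }\).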

4 Concluding remarks

One may wonder why we work with ConfLP rather than solving HugeCP and showing that its optima are close to those of HugeIP. The reason is that, even though handling optima of HugeCP is much easier than handling conf-optimal solutions, and even though solving HugeCP is easier than solving ConfLP, a HugeCP optimum can be very far from a HugeIP optimum [8, Proposition 1]. Put differently, ConfLP is a stronger relaxation than HugeCP: consider a brick \({\mathbf{p}}\) of a HugeCP optimum and a brick \({\mathbf{q}}\) of a conf-optimal solution, both of type i; then

$$\begin{aligned} {\mathbf{q}}\in \text {conv}\{{\mathbf{c}}\in \mathbb {Z}^t \mid E^i_2 {\mathbf{c}}= {\mathbf{b}}^i, {\mathbf{l}}^i \le {\mathbf{c}}\le {\mathbf{u}}^i\} \subseteq \{{\mathbf{p}}' \in \mathbb {R}^t \mid E^i_2 {\mathbf{p}}' = {\mathbf{b}}^i, {\mathbf{l}}^i \le {\mathbf{p}}' \le {\mathbf{u}}^i\} \ni {\mathbf{p}} .\end{aligned}$$

In words, \({\mathbf{q}}\) lies in the integer hull of the configurations of type i, whereas \({\mathbf{p}}\) is only guaranteed to lie in the LP relaxation of this set, which can be much larger.
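A toy instance of our own making, not taken from the paper, shows how large this gap can be. For a single type with \(E^i_2 = (2\ 2)\), \({\mathbf{b}}^i = 1\) and bounds \({\mathbf{0}} \le {\mathbf{c}} \le {\mathbf{1}}\), no integer configuration exists, so the integer hull is empty, while \({\mathbf{p}} = (1/2, 0)\) satisfies the relaxed constraints. The brute-force enumeration below, feasible only for tiny boxes, checks both facts with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def satisfies(E2, b, vector):
    """Check E2 * vector == b exactly (works for integers and Fractions)."""
    return all(sum(a * x for a, x in zip(row, vector)) == rhs for row, rhs in zip(E2, b))


def integer_configurations(E2, b, l, u):
    """Enumerate all integer c with E2 c = b and l <= c <= u by brute force."""
    box = product(*(range(lo, hi + 1) for lo, hi in zip(l, u)))
    return [c for c in box if satisfies(E2, b, c)]


def in_lp_relaxation(E2, b, l, u, p):
    """Membership of p in {p' : E2 p' = b, l <= p' <= u}."""
    return satisfies(E2, b, p) and all(lo <= x <= hi for x, lo, hi in zip(p, l, u))


E2, b, l, u = [[2, 2]], [1], [0, 0], [1, 1]
configs = integer_configurations(E2, b, l, u)      # empty: 2*c1 + 2*c2 is always even
p = (Fraction(1, 2), Fraction(0))
print(configs, in_lp_relaxation(E2, b, l, u, p))  # [] True
```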

Another obstacle is that, although the Configuration LP is a standard tool, its separation problem is typically solved only approximately, which leads to approximate solutions of ConfLP. We, however, require an exact solution, and therefore use a parameterized exact algorithm for IP to solve the separation problem. It is an interesting question when a k-approximate solution of ConfLP, i.e., a solution whose value is at most \(k \cdot OPT\), may be used to obtain an h(k)-accurate configurable solution of HugeCP, i.e., a configurable solution at \(\ell _1\)-distance at most h(k) from a configurable optimum. An approximate solution of ConfLP might be much easier to obtain, and yet be almost as good as an exact solution for our purposes.

Another interesting question is to determine a tight complexity bound for the algorithm of Lemma 6. It seems likely that the recent approach of Cslovjecsek et al. [8] also applies in our high-multiplicity setting, which would yield a near-linear fixed-parameter algorithm. Notice that the iterative augmentation algorithms for standard N-fold IP have a strong combinatorial flavor and use no “black boxes”. Could the ellipsoid method behind Lemma 6 be replaced by a (more) combinatorial algorithm, at least for some important problems which have huge N-fold IP models, such as the scheduling problems studied by Knop et al. [31]?