High-multiplicity N-fold IP via configuration LP

N-fold integer programs (IPs) form an important class of block-structured IPs for which increasingly fast algorithms have recently been developed and successfully applied. We study high-multiplicity N-fold IPs, which encode IPs succinctly by presenting a description of each block type and a vector of block multiplicities. Our goal is to design algorithms which solve N-fold IPs in time polynomial in the size of the succinct encoding, which may be significantly smaller than the size of the explicit (non-succinct) instance. We present the first fixed-parameter algorithm for high-multiplicity N-fold IPs, which even works for convex objectives. Our key contribution is a novel proximity theorem which relates fractional and integer optima of the Configuration LP, a fundamental notion by Gilmore and Gomory [Oper. Res., 1961] which we generalize. Our algorithm for N-fold IP is faster than previous algorithms whenever the number of blocks is much larger than the number of block types, such as in N-fold IP models for various scheduling problems.


Introduction
The fundamental Integer Programming (IP) problem is to solve

  min { f(x) : Ax = b, l ≤ x ≤ u, x ∈ Z^n },    (IP)

where f : R^n → R, A ∈ Z^{m×n}, b ∈ Z^m, and l, u ∈ (Z ∪ {±∞})^n. Any IP instance with infinite bounds l, u can be reduced to an instance with finite bounds using standard techniques (solving the continuous relaxation and using proximity bounds to restrict the relevant region), so from now on we assume finite bounds l, u ∈ Z^n. We denote f_max := max_{x : l ≤ x ≤ u} |f(x)|. Integer Programming is a fundamental problem of vast importance both in theory and practice. Because it is NP-hard already with a single row (by reduction from Subset Sum) or with A a 0/1-matrix (by reduction from Vertex Cover), there is high interest in identifying tractable subclasses of IP. One such tractable subclass is N-fold IPs, whose constraint matrix A is defined as

  E^{(N)} := ( E^1_1  E^2_1  ⋯  E^N_1
               E^1_2   0    ⋯    0
                0    E^2_2  ⋯    0
                ⋮      ⋮    ⋱    ⋮
                0      0    ⋯  E^N_2 ).

Here, r, s, t, N ∈ N, E^{(N)} is an (r + Ns) × Nt matrix, and E^i_1 ∈ Z^{r×t} and E^i_2 ∈ Z^{s×t}, i ∈ [N], are integer matrices. We define E^i := (E^i_1; E^i_2) (the (r + s) × t matrix with E^i_1 stacked on top of E^i_2) and call E^{(N)} the N-fold product of E^1, …, E^N. The structure of E^{(N)} allows us to divide any Nt-dimensional object, such as the variables of x, bounds l, u, or the objective f, into N bricks of size t, e.g. x = (x^1, …, x^N). We use subscripts to index within a brick and superscripts to denote the index of the brick, i.e., x^i_j is the j-th variable of the i-th brick, with j ∈ [t] and i ∈ [N]. Problem (IP) with A = E^{(N)} is known as N-fold integer programming (N-fold IP).
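To make the block structure concrete, the following small sketch (pure Python, with a hypothetical helper name) assembles E^(N) from given blocks E^i_1 and E^i_2. It only illustrates the definition above and is not part of any algorithm in the paper.

```python
def nfold_product(E1_blocks, E2_blocks):
    # E1_blocks[i]: r x t matrix (list of lists); E2_blocks[i]: s x t matrix.
    N = len(E1_blocks)
    r, t = len(E1_blocks[0]), len(E1_blocks[0][0])
    s = len(E2_blocks[0])
    A = [[0] * (N * t) for _ in range(r + N * s)]
    for i in range(N):
        for a in range(r):              # top block row: E^i_1 side by side
            for b in range(t):
                A[a][i * t + b] = E1_blocks[i][a][b]
        for a in range(s):              # block diagonal of the E^i_2
            for b in range(t):
                A[r + i * s + a][i * t + b] = E2_blocks[i][a][b]
    return A
```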
Such block-structured matrices have been the subject of extensive research stretching back to the '70s [3-5, 15, 16, 28, 42, 44, 45], as this special structure allows applying methods like the Dantzig-Wolfe decomposition and others, leading to significant speed-ups in practice. On the theoretical side, the term "N-fold IP" was coined by De Loera et al. [9], and since then increasingly efficient algorithms have been developed and applied to various problems relating to N-fold IPs [2,6,25,26,29,32]. This line of research culminated with an algorithm by Eisenbrand et al. [13] which solves N-fold IPs in time (‖E‖_∞ rs)^{O(r²s + rs²)} · N log N · log ‖u − l‖_∞ · log f_max for all separable convex objectives f (i.e., when f(x) = Σ_{i=1}^n f_i(x_i) and each f_i : R → R is convex).

Our contribution
Previous algorithms for N-fold IP have focused on reducing the run-time dependency on N down to almost linear. Instead, our interest here is in N-fold IPs which model applications where many bricks are of the same type, that is, they share the same bounds, right-hand side, and objective function. For those applications, it is natural to encode an N-fold IP instance succinctly by describing each brick type by its constraint matrix, bounds, right-hand side, and objective function, and giving a vector of brick multiplicities. When the number of brick types τ is much smaller than the number N of bricks, e.g., if N ≈ 2^τ, this succinct instance is (much) smaller than the previously studied encoding of N-fold IP, and an algorithm running in time polynomial in the size of the succinct instance may be (much) faster than current algorithms. We call the N-fold IP where the instance is given succinctly the huge N-fold IP problem, and we present a fast algorithm for it:

Theorem 1 Huge N -fold IP with any separable convex objective can be solved in time
(‖E‖_∞ rs)^{O(r²s + rs²)} · poly(τ, t, log ‖l, u, b, N, f_max‖_∞).
A natural application of Theorem 1 is to scheduling problems. In many scheduling problems, the number n of jobs that must be assigned to machines, as well as the number m of machines, are very large, whereas the number of types of jobs and the number of kinds of machines are relatively small. An instance of such a scheduling problem can thus be compactly encoded by simply stating, for each job type and machine kind, the number of jobs of that type and machines of that kind together with their characteristics (like processing time, weight, release time, due date, etc.), respectively. This key observation was made by several researchers [7,37] before Hochbaum and Shamir [20] coined the term high-multiplicity scheduling problem. Clearly, many efficient algorithms for scheduling problems, where all jobs are assumed to be distinct, become exponential-time algorithms for the corresponding high-multiplicity problem.
Let us shortly demonstrate how Theorem 1 allows designing algorithms which are efficient for the succinct high-multiplicity encoding of the input. In modern computational clusters, it is common to have several kinds of machines differing by processing unit type (high single- or multi-core performance CPUs, GPUs), storage type (HDD, SSD, etc.), network connectivity, etc. However, the number of machine kinds τ is still much smaller (perhaps around 10) than the number of machines, which may be in the order of tens of thousands or more. Many scheduling problems have N-fold IP models [31] where τ is the number of machine kinds and N is the number of machines. On these models, Theorem 1 would likely outperform the currently fastest N-fold IP algorithms.

Proof ideas. To solve a high-multiplicity problem, one needs a succinct way to argue about solutions. In 1961, Gilmore and Gomory [17] introduced the fundamental and widely influential notion of the Configuration IP (ConfIP), which describes a solution (e.g., a schedule) by a list of pairs "(machine schedule s, multiplicity μ of machines with schedule s)". The linear relaxation of ConfIP, called the Configuration LP (ConfLP), can often be solved efficiently, and is known to provide solutions of strikingly high quality in practice [41]; for example, the optimum of the ConfLP for Bin Packing is conjectured to have value x such that an optimal integer packing uses at most x + 1 bins [38]. However, surprisingly little is known in general about the structure of solutions of ConfIP and ConfLP, and how they relate to each other.
We define the Configuration IP and LP of an N-fold IP instance, and show how to solve the ConfLP quickly using the property that the ConfLP and ConfIP have polynomial encoding length even for huge N-fold IP. Our main technical contribution is a novel proximity theorem about N-fold IP, showing that a solution of its relaxation corresponding to the ConfLP optimum is very close to the integer optimum. Thus, the algorithm of Theorem 1 proceeds in three steps: (1) it solves the ConfLP, (2) it uses the proximity theorem to create a "residual" N-fold instance with N upper-bounded by (‖E‖_∞ rs)^{O(rs)}, and (3) it solves the residual instance by an existing N-fold IP algorithm.
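To illustrate how a proximity bound is typically used in step (2), the following sketch shows one standard construction (the function name and exact rounding are illustrative assumptions, not the paper's definition): since an ℓ1-distance bound of P implies a per-coordinate bound of P, each remaining variable can be restricted to a small box around the fractional solution x*.

```python
import math

def residual_bounds(x_star_brick, l, u, P):
    # Shrink the box [l, u] of one brick to within P of the fractional
    # solution x*, coordinate-wise; the residual instance is then solved
    # over these much smaller boxes.
    lo = [max(l[j], math.floor(x_star_brick[j]) - P) for j in range(len(l))]
    hi = [min(u[j], math.ceil(x_star_brick[j]) + P) for j in range(len(u))]
    return lo, hi
```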

Related work
Besides the references mentioned already, we point out that solving the ConfLP is commonly used as a subprocedure in approximation algorithms, e.g. [1,14,22,27]. Jansen and Solis-Oba use a mixed ConfLP to give a parameterized OPT + 1 algorithm for Bin Packing [24]; Onn [36] gave a weaker form of Theorem 1 which only applies to the setting where E^i_1 = I and E^i_2 is totally unimodular, for all i. Jansen et al. [25] extend the ConfIP to multiple "levels" of configurations. An extended version [31] of this paper shows how to model many scheduling problems as high-multiplicity N-fold IPs, so that an application of Theorem 1 yields new parameterized algorithms for these problems. Knop and Koutecký [30] use our new proximity theorem to show efficient preprocessing algorithms (kernels) for scheduling problems.
There are currently several "fastest" algorithms for N-fold IP with the standard (non-succinct) encoding. First, we have already mentioned the algorithm of Eisenbrand et al. [13]. Second, the algorithm of Jansen et al. [26] has a better parameter dependency of (‖E‖_∞ rs)^{O(r²s + s²)} (as compared with (‖E‖_∞ rs)^{O(r²s + rs²)} of the previous algorithm), but has a slightly worse dependence on N of N log⁵ N, and only works for linear objectives. Third, a recent algorithm of Cslovjecsek et al. [8] again only works for linear objectives and runs in time near-linear in N. While the authors claim that this constitutes the currently fastest algorithm, it seems that it is only potentially faster than prior work in a narrow parameter regime.
The third paper, by Cslovjecsek et al. [8], is the closest to ours in its approach: it solves a strong relaxation of N-fold IP which coincides with the ConfLP if each brick is of a distinct type, and which is generalized by the ConfLP (in our work) otherwise. The authors show that this relaxation can be solved in near-linear time, and then develop a proximity theorem similar to ours (but using different techniques) and a dynamic program, which allows them to construct and solve a residual instance in linear time. An earlier version of our paper [31] stated a worse proximity bound than that of Cslovjecsek et al. [8], but our bound applies to separable convex objectives whereas theirs [8] does not. Presently, we adapt one of their lemmas ([8, Lemma 3]; our Lemma 5) and a modeling idea (Sect. 3.4) to obtain the same proximity bound as they have [8], but one which also works for separable convex objectives. It is likely that the complexity of our algorithm to solve the ConfLP could be improved along the lines of their work [8]. Despite these similarities, we highlight that only our algorithm solves the high-multiplicity version of N-fold IP.

Preliminaries
For positive integers m, n with m ≤ n we set [m, n] = {m, m + 1, …, n} and [n] = [1, n]. We write vectors in boldface (e.g., x, y) and their entries in normal font (e.g., the i-th entry of x is x_i or x(i)). For α ∈ R, ⌊α⌋ is the floor of α, ⌈α⌉ is the ceiling of α, and we define {α} := α − ⌊α⌋; for vectors, these operators are defined component-wise.
We call a brick of x integral if all of its coordinates are integral, and fractional otherwise.

Huge N-fold IP. The huge N-fold IP problem is an extension of N-fold IP to the high-multiplicity scenario, where there are potentially exponentially many bricks. This requires a succinct representation of the input and output. The input to a huge N-fold IP problem with τ brick types is defined by matrices E^i_1 ∈ Z^{r×t} and E^i_2 ∈ Z^{s×t}, bounds l^i, u^i ∈ Z^t, right-hand sides b^i ∈ Z^s (together with b^0 ∈ Z^r), separable convex functions f^i satisfying ∀x ∈ Z^t : f^i(x) ∈ Z and given by evaluation oracles, for each i ∈ [τ], and integers μ^1, …, μ^τ ∈ N such that Σ_{i=1}^τ μ^i = N. We say that a brick is of type i if its lower and upper bounds are l^i and u^i, its right-hand side is b^i, its objective is f^i, and the matrices appearing at the corresponding coordinates are E^i_1 and E^i_2. The task is to solve (IP) with a matrix E^{(N)} which has μ^i bricks of type i for each i. Onn [35] shows that for any solution, there exists a solution which is at least as good and has only few (at most τ · 2^t) distinct bricks. In Sect. 3 we show new bounds which do not depend exponentially on t.
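As a concrete picture of the succinct encoding, one record per brick type suffices; the following sketch is a hypothetical data layout (all field names are illustrative, and f stands for the evaluation oracle), not a structure defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BrickType:
    E1: List[List[int]]            # r x t block appearing in the top rows
    E2: List[List[int]]            # s x t block on the diagonal
    l: List[int]                   # lower bounds, length t
    u: List[int]                   # upper bounds, length t
    b: List[int]                   # right-hand side of this brick, length s
    f: Callable[[List[int]], int]  # separable convex objective (oracle)
    mu: int                        # multiplicity: number of bricks of this type
```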

Graver bases and the Steinitz lemma
Let x, y be n-dimensional vectors. We call x, y sign-compatible if they lie in the same orthant, that is, for each i ∈ [n], x_i · y_i ≥ 0. We call Σ_i g^i a sign-compatible sum if all g^i are pair-wise sign-compatible. Moreover, we write y ⊑ x if x and y are sign-compatible and |y_i| ≤ |x_i| for each i ∈ [n]. Clearly, ⊑ imposes a partial order, called the "conformal order", on n-dimensional vectors. For an integer matrix A ∈ Z^{m×n}, its Graver basis G(A) is the set of ⊑-minimal non-zero elements of the lattice of A, ker_Z(A) = {z ∈ Z^n | Az = 0}. A circuit of A is an element g ∈ ker_Z(A) whose support supp(g) (i.e., the set of its non-zero entries) is minimal under inclusion and whose entries are coprime. We denote the set of circuits of A by C(A). It is known that C(A) ⊆ G(A) [34, Definition 3.1 and remarks]. We make use of the following two propositions:

Proposition 1 (Positive Sum Property [34, Lemma 3.4]) Let A ∈ Z^{m×n} be an integer matrix. For any integer vector x ∈ ker_Z(A), there exist an n′ ≤ 2n − 2 and a sign-compatible decomposition x = Σ_{j=1}^{n′} α_j g^j with α_j ∈ N and g^j ∈ G(A) for each j ∈ [n′]. For any fractional vector x ∈ ker(A) (that is, Ax = 0), there exists an analogous sign-compatible decomposition x = Σ_{j=1}^{n′} λ_j g^j with real coefficients λ_j ≥ 0.
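The conformal order is easy to state in code; this minimal sketch checks sign-compatibility and y ⊑ x exactly as defined above.

```python
def sign_compatible(x, y):
    # x and y lie in the same orthant: x_i * y_i >= 0 for all i
    return all(a * b >= 0 for a, b in zip(x, y))

def conformal_leq(y, x):
    # y ⊑ x: sign-compatible with x and |y_i| <= |x_i| coordinate-wise
    return sign_compatible(x, y) and all(abs(a) <= abs(b) for a, b in zip(y, x))
```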
Proposition 2 (Superadditivity) Let f : R^n → R be separable convex, let x ∈ R^n, and let g^1, …, g^k ∈ R^n be vectors with the same sign-pattern from {≤ 0, ≥ 0}^n, that is, belonging to the same orthant of R^n. Then

  f(x + Σ_{j=1}^k α_j g^j) − f(x) ≥ Σ_{j=1}^k α_j ( f(x + g^j) − f(x) )

for arbitrary integers α_1, …, α_k ∈ N.
Our proximity theorem relies on the Steinitz Lemma, which has recently received renewed attention [11,12,23].
Lemma 1 (Steinitz [40], Sevastjanov, Banaszczyk [39]) Let ‖·‖ denote any norm, and let x^1, …, x^n ∈ R^d be such that ‖x^i‖ ≤ 1 for i ∈ [n] and Σ_{i=1}^n x^i = 0. Then there exists a permutation π ∈ S_n such that for all k = 1, …, n, the prefix sum satisfies ‖Σ_{i=1}^k x^{π(i)}‖ ≤ d.

For an integer matrix A, we define g_1(A) := max_{g ∈ G(A)} ‖g‖_1. When it could make a difference, we will state our bounds both in terms of ‖E‖_∞ (worst-case, when we have no other information) and in terms of g_1(E_2) := max_i g_1(E^i_2), e.g. in Lemma 10 and Theorem 2.
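The statement of the Steinitz Lemma can be sanity-checked on tiny inputs by brute force; the exponential-time sketch below (illustrative only, using the ∞-norm, and not the constructive reordering used in the proofs) searches for a permutation whose prefix sums stay within the bound d.

```python
from itertools import permutations

def steinitz_reorder(vectors, d):
    # Brute-force search over all permutations for an ordering whose
    # prefix sums all have infinity-norm at most d; returns the reordered
    # list, or None if no such ordering exists.
    n = len(vectors)
    for perm in permutations(range(n)):
        pref, ok = [0] * len(vectors[0]), True
        for i in perm:
            pref = [p + v for p, v in zip(pref, vectors[i])]
            if max(abs(p) for p in pref) > d:
                ok = False
                break
        if ok:
            return [vectors[i] for i in perm]
    return None
```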

Proof of Theorem 1
We first give a relatively high-level description of the proof, before we present all its details.

Configuration LP and IP
Given an input to the huge N-fold IP, we first reformulate it as another IP, which we refer to as the Configuration IP. We then consider its fractional relaxation, the so-called Configuration LP. Our approach is to (efficiently) solve the Configuration LP, and bound the distance of its LP optimum to the integer optimum (of the Configuration IP). We use this bound to reduce the input to the huge N-fold IP from a high-multiplicity input to an input of a standard N-fold IP which is small both in terms of the number of bricks and the size of the bounding box. This small input we then solve using an existing N-fold IP algorithm. Along the way, there are several non-trivial obstacles that we need to overcome.
We will refer to huge N-fold IP as HugeIP, its corresponding fractional relaxation as HugeCP (this is a convex program if the objective f is convex), the Configuration LP of the HugeIP as ConfLP, and to its integer version as ConfIP. We define a mapping ϕ from the solutions of ConfLP to the solutions of HugeCP which, for every variable y_c of the ConfLP, introduces ⌊y_c⌋ bricks with configuration c, and then introduces Σ_c {y_c} bricks with the configuration (Σ_c {y_c} · c) / (Σ_c {y_c}) (i.e., an "average" configuration). We call a solution x* of HugeCP "conf-optimal" if it is the image ϕ(y*) of some ConfLP optimum y*. One would hope that the objective value of a conf-optimal solution x* in HugeCP and of y* in ConfLP were then identical. While this is true for any linear objective f, it need not be true for a convex objective f. To overcome this impediment, we introduce an auxiliary objective f̃ which preserves the values of optima of ConfLP and conf-optimal solutions of HugeCP.
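A minimal sketch of the mapping ϕ for the variables of a single type, under the assumption that the multiplicities sum to an integer (as enforced by the constraint that each type has exactly μ^i bricks): the floors of the y_c become ordinary bricks, and the leftover fractional mass becomes copies of the average configuration.

```python
import math

def phi(y):
    # y: dict mapping a configuration (tuple of ints) -> its (possibly
    # fractional) multiplicity; sum(y.values()) is assumed to be integral.
    bricks = []
    for c, yc in y.items():
        bricks += [list(c)] * math.floor(yc)      # integral part: plain bricks
    frac = sum(yc - math.floor(yc) for yc in y.values())
    if frac > 1e-9:
        t = len(next(iter(y)))
        avg = [0.0] * t                           # the "average" configuration
        for c, yc in y.items():
            w = (yc - math.floor(yc)) / frac
            for j in range(t):
                avg[j] += w * c[j]
        bricks += [avg] * round(frac)             # frac is an integer here
    return bricks
```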

Proximity theorem
The bulk of our work is showing that for each conf-optimal solution x* of the HugeLP, there is an optimum z* of the HugeIP whose ℓ1-distance from x* is bounded by P := (‖E‖_∞ rs)^{O(rs)}. We will show that we can obtain a ConfLP optimum y with support of size at most r + τ, and by the definition of ϕ (recall that x* = ϕ(y)), this means that x* has at most r + τ + 1 distinct bricks (the +1 is due to ϕ creating an additional "average configuration" brick type). This, in turn, means that our bound on the ℓ1-distance between z* and x* says something about ConfLP and ConfIP: for any ConfLP optimum y there is a ConfIP optimum y* in ℓ1-distance at most P where any configuration c′ in the support of y* is at most P far from some configuration c in the support of y. As far as we know, this is a unique result about the Configuration LP.
A way of bounding the distance between some types of optima in an integer program was introduced by Hochbaum and Shanthikumar [21] and adapted to the setting of N-fold IP by Hemmecke et al. [19]. A somewhat different approach was later developed by Eisenbrand and Weismantel [11] in the setting of IPs with few rows, and was adapted to the setting of N-fold IPs soon after [12,13]. The idea is as follows. Let x* be a HugeCP optimum, and z* be a HugeIP optimum. We call a non-zero integral vector p ⊑ x* − z*, i.e., one which is sign-compatible (i.e., has the same sign-pattern) with x* − z* and which is smaller in absolute value than x* − z* in each coordinate, a cycle of x* − z*. If z* minimizes ‖x* − z*‖_1, it can be shown that no cycle of x* − z* exists. Moreover, if a cycle exists, then a cycle of ℓ1-norm at most B exists, which implies ‖x* − z*‖_1 ≤ B. Notice that the previous argument assumes x* to be a HugeCP optimum: it cannot be replaced with a conf-optimal solution, for the following reason. The existence of a cycle p leads to a contradiction because either z* + p is also a HugeIP optimum (but closer to x*) or x* − p is also a HugeCP optimum (but closer to z*). But if x* is a conf-optimal solution, we have no guarantee that x* − p is again a configurable solution, and the argument breaks down. This means that we need to restrict our attention to cycles with the property that if x* is a configurable solution, then x* − p is also configurable.
We call such a p a configurable cycle. The next task is an analogy of the argument above: if x * is conf-optimal and z * is a HugeIP optimum, then the existence of a configurable cycle p of x * − z * leads to a contradiction. For that, we need the separability and convexity of the objective f and a careful use of the configurability of p. With this argument at hand, we have reduced our task to bounding the norm of any configurable cycle (Lemma 7).
However, the main existing tool for showing proximity is by ruling out cycles. To overcome this, we develop new tools to deal with configurable cycles.

The algorithm
It remains to use our proximity bound P. As already hinted at, if two solutions differ in ℓ1-norm by at most P, then they differ in at most P bricks. This means that we may fix all but P bricks for each configuration appearing in the ConfLP optimum.
Since the size of the support of the ConfLP optimum is small (r + τ), the total number of bricks to be determined is also small, and determining them can be done using a standard N-fold IP algorithm within the required time complexity.

Proof (of Theorem 1) To recap, the algorithm works in the following steps.
1. We solve the ConfLP and obtain its optimum y by solving its Dual LP using a separation oracle. The separation oracle is implemented using a fixed-parameter algorithm for IP with small coefficients.
2. We use the ConfLP optimum y to fix the solution on all but (r + τ)P bricks.
3. The remaining instance can be encoded as an N-fold IP with at most (r + τ)P bricks and solved using an existing algorithm.
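On instances tiny enough to enumerate, the whole pipeline collapses to brute force over the ConfILP; the sketch below (illustrative only, with hypothetical argument names and a single shared top block E1) picks μ^i configurations per type, checks the linking constraints, and returns the cheapest choice.

```python
from itertools import combinations_with_replacement, product

def solve_conf_ilp(configs_by_type, mult, E1, b0, cost):
    # configs_by_type[i]: list of configurations (tuples) of type i
    # mult[i]: number mu^i of bricks of type i; E1: r x t top block
    # b0: linking right-hand side; cost[i]: objective for type-i bricks
    best = None
    choices = [combinations_with_replacement(cs, m)
               for cs, m in zip(configs_by_type, mult)]
    for pick in product(*choices):
        bricks = [c for group in pick for c in group]
        link = [sum(sum(E1[a][j] * c[j] for j in range(len(c))) for c in bricks)
                for a in range(len(b0))]
        if link != b0:                      # linking constraints violated
            continue
        val = sum(cost[i](c) for i, group in enumerate(pick) for c in group)
        if best is None or val < best[0]:
            best = (val, bricks)
    return best
```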
Let us now go back to a detailed proof of Theorem 1.

Configurations of huge N-fold IP
Fix a huge N-fold IP instance with τ types. Recall that μ^i denotes the number of bricks of type i, and μ = (μ^1, …, μ^τ). We define for each i ∈ [τ] the set of configurations of type i as

  C^i := { c ∈ Z^t : E^i_2 c = b^i, l^i ≤ c ≤ u^i }.

Here we are interested in four instances of convex programming (CP) and convex integer programming (IP) related to huge N-fold IP. First, we have the Huge IP

  min { f(x) : E^{(N)} x = b, l ≤ x ≤ u, x ∈ Z^{Nt} },    (HugeIP)

and the Huge CP, which is a relaxation of (HugeIP),

  min { f̃(x) : E^{(N)} x = b, l ≤ x ≤ u, x ∈ R^{Nt} }.    (HugeCP)

We shall define the objective function f̃ later; for now it suffices to say that for all integral feasible x ∈ Z^{Nt} we have f(x) = f̃(x), so that indeed the optimum of (HugeCP) lower bounds the optimum of (HugeIP), and that f̃ is convex. Then, there is the Configuration LP of (HugeIP), that is, the following linear program:

  min Σ_{i ∈ [τ]} Σ_{c ∈ C^i} f^i(c) · y(i, c)    (3)
  s.t. Σ_{i ∈ [τ]} Σ_{c ∈ C^i} (E^i_1 c) · y(i, c) = b^0,  Σ_{c ∈ C^i} y(i, c) = μ^i for each i ∈ [τ],  y ≥ 0.    (4)

Letting B be its constraint matrix and d = (b^0, μ) be the right-hand side, we can shorten (3)-(4) as min { vy : By = d, y ≥ 0 }. Finally, by additionally requiring y to be integral, we obtain the Configuration ILP (ConfILP). A solution x of (HugeCP) is configurable if, for every i ∈ [τ], each brick x^j of type i is a convex combination of C^i, i.e., x^j ∈ conv(C^i). We shall define a mapping from solutions of (ConfLP) to configurable solutions of (HugeCP) as follows. For every solution y of (ConfLP) we define a solution x = ϕ(y) of (HugeCP) to have ⌊y(i, c)⌋ bricks of type i with configuration c and, for each i ∈ [τ], f^i := Σ_{c ∈ C^i} {y(i, c)} bricks of type i with the "average" configuration ĉ^i := (1/f^i) · Σ_{c ∈ C^i} {y(i, c)} · c. (Note that f^i is an integer since Σ_{c ∈ C^i} y(i, c) = μ^i, and that if y is integral then ϕ(y) is also integral.) Note that ϕ(y) has at most as many fractional bricks as y has fractional entries, since each {y(i, c)} < 1 and the number of non-zero fractional parts {y(i, c)} is at most the number of fractional entries of y. Call a solution x of (HugeCP) conf-optimal if there is an optimal solution y of (ConfLP) such that x = ϕ(y).
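For intuition, the set of configurations C^i can be enumerated directly when t and the bounds are tiny; this brute-force sketch (exponential in t, for illustration only) checks each box point against the brick constraints.

```python
from itertools import product

def configurations(E2, b, l, u):
    # Enumerate C^i = { c in Z^t : E2 c = b, l <= c <= u } by brute force.
    t = len(l)
    out = []
    for c in product(*[range(l[j], u[j] + 1) for j in range(t)]):
        if all(sum(E2[a][j] * c[j] for j in range(t)) == b[a]
               for a in range(len(E2))):
            out.append(c)
    return out
```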
We are going to introduce an auxiliary objective function f̃, but we first want to discuss our motivation in doing so. The reader might already see that for any integer solution y ∈ Z^C of (ConfILP), vy = f(ϕ(y)) holds, as we shall prove in Lemma 4. Our natural hope would be that for a fractional optimum y* of (ConfLP) we would have vy* = f(ϕ(y*)). However, by convexity of f and the construction of the average configuration ĉ^i, it only follows that vy* ≥ f(ϕ(y*)). Even worse, there may be two conf-optimal solutions x and x′ with f(x) < f(x′). To overcome this, we define an auxiliary objective function f̃ with the property that for any conf-optimal solution x* of (HugeCP) and any optimal solution y* of (ConfLP), vy* = f̃(x*).
Fix a brick x^j of type i. We say that a multiset Γ^j = {(c, λ_c)} with each c ∈ C^i, λ_c > 0, Σ_{(c,λ_c) ∈ Γ^j} λ_c = 1, and Σ_{(c,λ_c) ∈ Γ^j} λ_c · c = x^j is a decomposition of x^j, and we define

  f̃^i(x^j) := min { Σ_{(c,λ_c) ∈ Γ^j} λ_c f^i(c) : Γ^j is a decomposition of x^j },    (5)

with f̃(x) := Σ_j f̃^i(x^j), where i is the type of brick j. In a sense, f̃(x) is the value of the minimum (w.r.t. f) interpretation of x as a convex combination of feasible integer solutions. Correspondingly, we call a decomposition Γ^j of x^j f̃-optimal if it is a minimizer of (5). Formally, we let f̃^i(x^j) := +∞ for a non-configurable x^j in order to make the definition of (HugeCP) valid; however, we are never interested in the value of f̃ for non-configurable bricks in the following.

Lemma 2 Let x be a configurable solution of (HugeCP), and x^j be a brick of type i. Then f̃^i(x^j) ≥ f^i(x^j), with equality if x^j is integral.
Proof By convexity of f^i we have

  f^i(x^j) = f^i( Σ_{(c,λ_c) ∈ Γ^j} λ_c · c ) ≤ Σ_{(c,λ_c) ∈ Γ^j} λ_c f^i(c)

for any decomposition Γ^j of x^j, hence f̃^i(x^j) ≥ f^i(x^j). If x^j is integral, then Γ^j = {(x^j, 1)} is an f̃-optimal decomposition (not necessarily unique), concluding the proof.
Moreover, for each x^j there is an f̃-optimal decomposition Γ^j with |Γ^j| ≤ t + 1, since f̃-optimal decompositions correspond to optima of a linear program with t + 1 equality constraints, namely

  min { Σ_{c ∈ C^i} λ_c f^i(c) : Σ_{c ∈ C^i} λ_c · c = x^j, Σ_{c ∈ C^i} λ_c = 1, λ ≥ 0 },    (6)

and a basic optimal solution of (6) has at most t + 1 non-zero entries. Let us describe the relationship of the objective values of the various formulations.
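For a one-dimensional brick, the LP (6) has only two equality constraints, so an optimal decomposition uses at most two configurations and can be found by brute force. The following sketch computes f̃ in that special case; it is an illustration only, since the paper never needs to evaluate f̃ explicitly.

```python
def f_tilde_1d(x, configs, f):
    # f-tilde for a one-dimensional brick: the cheapest way to write x as
    # a convex combination of configurations; an optimum of the LP (6)
    # here uses at most two configurations, so trying all pairs suffices.
    best = float('inf')
    for c in configs:
        if c == x:
            best = min(best, f(c))
    for a in configs:
        for b in configs:
            if a < x < b:
                lam = (b - x) / (b - a)   # lam*a + (1 - lam)*b == x
                best = min(best, lam * f(a) + (1 - lam) * f(b))
    return best
```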

Lemma 3
For any feasible solution ỹ of (ConfLP),

  vỹ ≥ f̃(ϕ(ỹ)).    (7)

Proof Let x̃ = ϕ(ỹ). We can decompose f̃(ϕ(ỹ)) = U_1 + U_2, where U_1 is the cost of the integer bricks of ϕ(ỹ) and U_2 is the cost of its fractional bricks. It is easy to see that U_1 equals the part of vỹ corresponding to the integral parts ⌊ỹ(i, c)⌋, by equality of f̃ and f on integer vectors. We shall further decompose the value U_2 into the costs of the fractional bricks of each type. For each i ∈ [τ], the cost of each fractional brick of type i is at most Σ_{c ∈ C^i} ({ỹ(i, c)} / f^i) · f^i(c), because the corresponding convex combination is merely a feasible (not necessarily optimal) solution of (6). Summing this estimate up over all f^i fractional bricks of type i gives f^i · Σ_{c ∈ C^i} ({ỹ(i, c)} / f^i) · f^i(c) = Σ_{c ∈ C^i} {ỹ(i, c)} f^i(c), which is exactly the part of vỹ corresponding to the fractional parts {ỹ(i, c)}. Altogether, vỹ ≥ U_1 + U_2 = f̃(ϕ(ỹ)).

Lemma 4
Let ŷ be an optimum of (ConfILP), z* be an optimum of (HugeIP), y* be an optimum of (ConfLP), x̄ = ϕ(y*), and x* be a configurable optimum of (HugeCP). Then

  f̃(x*) = f̃(x̄) = vy* ≤ vŷ = f(z*).

Proof We have f(z*) = f̃(z*) by equality of f̃ and f on integer solutions (Lemma 2), and f(z*) = f(ϕ(ŷ)) = vŷ by the definition of ϕ and the fact that ŷ is an integer optimum. Clearly, vŷ ≥ vy*, because (ConfLP) is a relaxation of (ConfILP) and thus the former lower bounds the latter.
Let us construct a mapping φ for any configurable solution x of (HugeCP). Start with φ(x) = y = 0. For each brick x^j of type i, let Γ^j be an f̃-optimal decomposition of x^j and update y(i, c) := y(i, c) + λ_c for each (c, λ_c) ∈ Γ^j. Then φ(x) is a feasible solution of (ConfLP) with

  vφ(x) = f̃(x).    (8)

Our goal is to argue that vy* = f̃(x̄) = f̃(x*). We have f̃(x̄) = f̃(ϕ(y*)) ≤ vy* by (7), but by optimality of y* and (8),

  f̃(x*) = vφ(x*) ≥ vy* ≥ f̃(ϕ(y*)),

with the "=" by (8), the first "≥" by optimality of y*, and the second "≥" by (7). However, since f̃(ϕ(y*)) ≥ f̃(x*) by optimality of x*, all inequalities are in fact equalities and thus vy* = f̃(x*).

Remark 1
We only need the properties of f̃ that we have proved so far. To gain a little more intuition, consider the dual of the LP (6). Notice that the set of right-hand sides x^j whose optimum is attained by a particular set of configurations supp(λ) is a polyhedron. Call such a set a cell. This means that f̃ is a convex function which is linear in each cell. Another observation is that f̃ is non-separable. We do not have a more intuitive explanation of f̃. It would be tempting to think that f̃ is the piece-wise linear approximation of f in which, for every i ∈ [Nt], we replace each segment of f_i between two adjacent integers k, k + 1 by the affine function going through the points (k, f_i(k)) and (k + 1, f_i(k + 1)). However, this turns out to be incorrect: for example, say that f_1(x_1) = |x_1 − 1| (thus f_1(0) = f_1(2) = 1 and f_1(1) = 0) and that we set x_1 = 2x_2 for a new integer variable x_2. This constraint ensures that x_1 only takes on even values. Thus, x_1 never attains the value 1 and f̃_1(1) ≥ 1 even though the piece-wise linear approximation of f_1 has value 0 at 1.
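The counterexample in Remark 1 can be checked numerically: since only even values of x_1 are feasible, the cheapest convex combination attaining x_1 = 1 uses the configurations 0 and 2.

```python
# Remark 1, checked numerically: f1(x) = |x - 1|, but x1 = 2*x2 forces x1
# into even values, so the configurations bracketing x1 = 1 are 0 and 2.
f1 = lambda x: abs(x - 1)

# cheapest decomposition of 1 over {0, 2}: 1 = 0.5*0 + 0.5*2
f_tilde_at_1 = 0.5 * f1(0) + 0.5 * f1(2)
```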

Bounding the number of fractional coordinates.
Lemma 5 (Adaptation of [8,Lemma 4.1]) An optimal vertex solution y * of (ConfLP) has at most 2r fractional coordinates.
Proof Group the variables of y* into bricks according to type, so that the i-th brick (y*)^i is indexed by C^i and lies in the polytope Q^i := { y ≥ 0 : 1y = μ^i }. If a brick of y* is a vertex of Q^i, then it is integral. Thus, any brick of y* which is fractional cannot be a vertex of Q^i, and hence there exist a direction e^i ∈ ker_Z(1) and a length λ_i > 0 such that (y*)^i ± λ_i e^i ∈ Q^i. For the sake of contradiction, assume there are r + 1 bricks of y* which contain a fractional coordinate, and let I be the index set of such bricks. Hence we have e^i, λ_i as above for each i ∈ I. We abuse notation and treat C^i as a matrix whose columns are the configurations. Consider the vectors E^i_1 C^i λ_i e^i ∈ R^r: because there are r + 1 of them, they are linearly dependent, and, by rescaling, there must be coefficients λ̄ such that |λ̄_i| ≤ λ_i for each i ∈ I and Σ_{i ∈ I} E^i_1 C^i λ̄_i e^i = 0. Define e ∈ R^C (recall that C is the total number of configurations) such that its i-th brick is equal to λ̄_i e^i if i ∈ I, and is 0 otherwise. Then y* ± e are both feasible solutions of (ConfLP), and thus y* is not a vertex solution, a contradiction.
So far, we have shown there are at most r fractional bricks of y * . Notice that all we needed for that was r + 1 linearly dependent vectors which can be added to some brick in both directions while preserving feasibility. Because e i ∈ Ker Z (1) for each i ∈ I , we can decompose e i into elements of G(1), which are exactly vectors with one 1 and one −1. Hence, to avoid the contradiction above, there can be at most r vectors e i , and, additionally, all of them must belong to G(1). Thus, the resulting vector e has support of size at most 2r , and y * has at most 2r fractional coordinates.

Finding a conf-optimal solution with small number of fractional bricks.
Our goal is to show that the proximity of any conf-optimal solution x * of (HugeCP) from an integer optimum z * of (HugeIP) depends on the number of fractional bricks. This number, by definition of ϕ, depends on the number of fractional coordinates of the corresponding solution y of (ConfLP). The following lemma shows how to produce optima of (ConfLP) with small support. We emphasize that our proximity theorem does not require that the fractional solution be optimal but rather conf-optimal.

Lemma 6
There is an algorithm that finds an optimal vertex solution y* of (ConfLP) with |supp(y*)| ≤ r + τ and at most 2r fractional coordinates, and a conf-optimal solution x* = ϕ(y*) of (HugeCP) with at most 2r fractional bricks, in time poly(r, t, τ, log ‖f_max, l, u, b, μ‖_∞, ‖E‖_∞).
Proof The proof has three parts. First, we describe how to find an optimal basic solution of the dual of (ConfLP). Next, we identify r + τ inequalities of this dual which fully determine the optimal dual LP solution. Finally, we show how to use this information to solve (ConfLP) itself.
Recall that τ is the number of brick types in the huge N-fold instance. Since (ConfLP) has exponentially many variables, we take the standard approach and solve the dual LP of (ConfLP) by the ellipsoid method and the equivalence of optimization and separation. The Dual LP of (ConfLP) in variables α ∈ R^r, β ∈ R^τ is:

  max b^0 α + μβ  s.t.  (E^i_1 c) α − f^i(c) ≤ −β_i  for each i ∈ [τ] and c ∈ C^i.    (9)

To verify feasibility of (α, β), we need, for each i ∈ [τ], to maximize the left-hand side of (9) over all c ∈ C^i and check whether it is at most −β_i. This corresponds to finding integer variables c which for given (α, β) solve

  max { (E^i_1 c) α − f^i(c) : E^i_2 c = b^i, l^i ≤ c ≤ u^i, c ∈ Z^t }.

This program can be solved in time T′ by a fixed-parameter algorithm for IP with few rows and small coefficients. Grötschel et al. [18, Theorem 6.4.9] show that an optimal solution of an LP (even one which is a vertex [18, Remark 6.5.2]) can be found in a number of calls to a separation oracle which is polynomial in the dimension and the encoding length of the inequalities returned by the separation oracle. Clearly the inequalities (9) have encoding length bounded by log ‖f_max, l, u, b, μ‖_∞, and thus T = poly(r, t, τ, log ‖f_max, l, u, b, μ‖_∞, ‖E‖_∞) calls to a separation oracle are sufficient to find an optimal vertex solution, which amounts to T · T′ arithmetic operations.
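The separation step can be sketched as follows; here the configurations are enumerated explicitly for illustration, whereas the actual proof replaces this enumeration by a fixed-parameter IP algorithm. All names are illustrative.

```python
def separation_oracle(alpha, beta_i, configs, E1, f):
    # Maximize alpha·(E1 c) - f(c) over all configurations c of one type;
    # the dual constraint (9) for this type holds iff the maximum is at
    # most -beta_i; otherwise the maximizer is a violated column.
    best_val, best_c = -float('inf'), None
    for c in configs:
        val = sum(alpha[a] * sum(E1[a][j] * c[j] for j in range(len(c)))
                  for a in range(len(alpha))) - f(c)
        if val > best_val:
            best_val, best_c = val, c
    return (best_val <= -beta_i), best_c
```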
Next, we will identify r + τ inequalities determining the previously found optimal vertex solution of the dual of (ConfLP). Observe that the dimension of the dual LP is the number of rows of the primal LP, which is r + τ. Since each point in (r + τ)-dimensional space is fully determined by r + τ linearly independent inequalities, there must exist a subset I of r + τ inequalities among the T inequalities considered by the ellipsoid method which fully determines the dual optimum. We can find them as follows.
We initialize I to be the empty set. Taking the T considered inequalities one by one, we process an inequality if it is satisfied with equality by the given optimal basic solution of the dual LP, and we discard other inequalities. When we process the current inequality, if either some inequality of I or the present inequality is dominated by an inequality that can be obtained as a non-negative linear combination of the others, we discard it; otherwise, we include it in I and continue. Testing whether an inequality dz ≤ e′ is dominated by a non-negative combination of a system of inequalities Dz ≤ e can be decided by solving

  min { αe : αD = d, α ≥ 0 }    (10)

and checking whether the optimal value is at most e′. If it is, then the solution α encodes a non-negative linear combination of the inequalities Dz ≤ e which yields an inequality dominating dz ≤ e′, and if it is not, then such a combination does not exist. Thus, when a new inequality is considered, we solve (10) for at most r + τ inequalities (the new one and all of the fewer than r + τ already selected ones), and there are at most T inequalities considered. The time needed to solve (10) is poly(r + τ, log ‖l, u, b, f_max, E‖_∞), because its dimension is at most r + τ and its encoding length is at most log ‖l, u, b, f_max, E‖_∞; altogether, this part takes time T′′ = T · poly(r + τ, log ‖l, u, b, f_max, E‖_∞). Finally, let the restricted (ConfLP) be the (ConfLP) restricted to the variables corresponding to the inequalities in I. We claim that an optimal solution to the restricted (ConfLP) is also an optimal solution to (ConfLP). To see this, use LP duality: the optimal objective value of the dual LP restricted to the inequalities in I is the same as that of the dual optimum, and thus an optimal solution of the restricted (ConfLP) must be an optimal solution of (ConfLP). We solve the restricted (ConfLP) using any polynomial LP algorithm in time T′′′ ≤ poly(r + τ, log ‖f_max, l, u, μ, b^0, E‖_∞).
The resulting total time complexity is thus at most T · (r + τ) solves of (10) to construct the restricted (ConfLP) instance, plus the time to solve the restricted (ConfLP); altogether this is upper bounded by a polynomial in the quantities above. Let y* be the optimum of (ConfLP) we have thus obtained. Since |I| ≤ r + τ, the support of y* is of size at most r + τ. By Lemma 5, y* has at most 2r fractional coordinates. Now setting x* = ϕ(y*) suffices, since we have already argued (see the definition of ϕ) that x* has at most as many fractional bricks as y* has fractional coordinates, and x* can be computed from y* in O(r + τ) time.

Proximity theorem
Let us give a plan for the next subsection. We wish to prove that for every conf-optimal solution x* of (HugeCP) there is an integer solution z* of (HugeIP) nearby. In the following, let x* be a conf-optimal solution of (HugeCP) and z* be an optimal solution of (HugeIP) minimizing ‖x* − z*‖₁. A technique for proving proximity theorems introduced by Eisenbrand and Weismantel [11] works as follows. A vector h ∈ Z^{Nt} is called a cycle of x* − z* if h ≠ 0, E^{(N)}h = 0, and h ⊑ x* − z*. It is not too difficult to see that if x is an optimal (not necessarily conf-optimal) solution of (HugeCP) with the objective f, then there cannot exist a cycle of x − z* (cf. proof of Lemma 9). Based on a certain decomposition of x − z* into integer and fractional smaller-dimensional vectors and by an application of the Steinitz Lemma, the existence of a cycle is proven unless ‖x − z*‖₁ is roughly bounded by the number of fractional bricks of x. However, we cannot apply this technique directly, as an optimal solution x of (HugeCP) might have many fractional bricks. At the same time, the existence of a cycle h of x* − z* does not necessarily contradict that ‖x* − z*‖₁ is minimal, because x* + h might not be a configurable solution, which is an essential part of the argument.
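The cycle conditions can be stated compactly in code. The following is an illustrative sketch with a toy constraint matrix of our own choosing; here ⊑ denotes the conformal order (sign-compatible and coordinate-wise dominated):

```python
def conforms(h, y):
    """h ⊑ y: each h_i has the same sign as y_i and |h_i| <= |y_i|."""
    return all(hi * yi >= 0 and abs(hi) <= abs(yi) for hi, yi in zip(h, y))

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_cycle(h, x, z, A):
    """h is a cycle of x - z: h != 0, A h = 0, and h ⊑ x - z."""
    diff = [xi - zi for xi, zi in zip(x, z)]
    return (any(h) and all(c == 0 for c in matvec(A, h))
            and conforms(h, diff))

# Toy data: one constraint row A = [[1, -1]], and x - z = (2, 2).
A = [[1, -1]]
x, z = [3, 2], [1, 0]
print(is_cycle([1, 1], x, z, A))  # True
print(is_cycle([3, 3], x, z, A))  # False: |3| > |2| breaks conformality
```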
All of this leads us to introduce a stronger notion of a cycle. We say that h ∈ Z^{Nt} is a configurable cycle of x* − z* (with respect to x*) if (1) h is a cycle of x* − z*, (2) for each brick j ∈ [N] of type i ∈ [τ] there exists an f̂-optimal decomposition Γ_j of (x*)^j such that we may write h^j = Σ_{(c,λ_c)∈Γ_j} λ_c h_c, and (3) for each (c, λ_c) ∈ Γ_j we have h_c ⊑ c − (z*)^j and h_c ∈ Ker_Z(E_2^i). Soon we will show that if ‖x* − z*‖₁ is minimal, then x* − z* does not have a configurable cycle. The next task becomes to show how large ‖x* − z*‖₁ must be in order for a configurable cycle to exist. Recall that the technique of Eisenbrand and Weismantel [11] can be used to rule out the existence of a (regular) cycle, not a configurable cycle. To overcome this, we "lift" both x* and z* to a higher-dimensional space and show that a cycle in this space corresponds to a configurable cycle in the original space. Only then are we ready to prove a proximity bound using the aforementioned technique.
We now need a technical lemma.

Lemma 8

Let x* be a conf-optimal solution of (HugeCP), let z* be an optimum of (HugeIP), and let h* be a configurable cycle of x* − z*. Then

f̂(z* + h*) + f̂(x* − h*) ≤ f̂(z*) + f̂(x*).  (11)

Proof We begin with a simple observation: let g : R → R be a convex function, x ∈ R, z ∈ Z, and r ∈ Z be such that r ⊑ x − z (that is, there is some ρ, 0 ≤ ρ ≤ 1, such that r = ρ · (x − z)). By convexity of g we have that

g(z + r) + g(x − r) ≤ g(z) + g(x).  (12)

Fix j ∈ [N] and z = (z*)^j, x = (x*)^j, h = (h*)^j, and let i be the type of brick j.
Since h* is a configurable cycle, there exists an f̂-optimal decomposition Γ of x such that, for each (c, λ_c) ∈ Γ, there exists an h_c ⊑ c − z with h_c ∈ Ker_Z(E_2^i), and h = Σ_{(c,λ_c)∈Γ} λ_c h_c. Due to separability of f we may apply (12) independently to each coordinate, obtaining for each c

f^i(z + h_c) + f^i(c − h_c) ≤ f^i(z) + f^i(c).

Since all arguments of f^i are integral, we immediately get

f̂^i(z + h_c) + f̂^i(c − h_c) ≤ f̂^i(z) + f̂^i(c).

Aggregating according to Γ, we get (recall that we have Σ_{(c,λ_c)∈Γ} λ_c = 1)

Σ_{(c,λ_c)∈Γ} λ_c (f̂^i(z + h_c) + f̂^i(c − h_c)) ≤ Σ_{(c,λ_c)∈Γ} λ_c (f̂^i(z) + f̂^i(c)),

where by f̂-optimality of Γ the right-hand side is equal to f̂^i(z) + f̂^i(x). As for the left-hand side, observe that the decompositions Γ′ = {(z + h_c, λ_c) | (c, λ_c) ∈ Γ} and Γ″ = {(c − h_c, λ_c) | (c, λ_c) ∈ Γ} satisfy ΣΓ′ = z + h and ΣΓ″ = x − h but are only feasible (not necessarily optimal) solutions of (6). Thus, we have

f̂^i(z + h) ≤ Σ_{(c,λ_c)∈Γ} λ_c f̂^i(z + h_c) and f̂^i(x − h) ≤ Σ_{(c,λ_c)∈Γ} λ_c f̂^i(c − h_c).

Combining over Γ then yields

f̂^i(z + h) + f̂^i(x − h) ≤ f̂^i(z) + f̂^i(x),

and since we have proven this claim for every brick j, aggregation over bricks concludes the proof of the main claim (11).
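The scalar convexity inequality (12) used in the proof can be sanity-checked numerically. A minimal sketch, where the convex test functions and sample points are our own choices:

```python
def check_superadditivity(g, x, z, rho):
    """For convex g, integer z, and r = rho*(x - z) with 0 <= rho <= 1,
    inequality (12) states g(z + r) + g(x - r) <= g(z) + g(x)."""
    r = rho * (x - z)
    return g(z + r) + g(x - r) <= g(z) + g(x) + 1e-12  # tolerance for floats

convex_fns = [lambda t: t * t, lambda t: abs(t - 0.5), lambda t: (t - 2) ** 4]
ok = all(check_superadditivity(g, x, z, rho)
         for g in convex_fns
         for x in [0.0, 2.5, -3.25]
         for z in [-2, 0, 4]
         for rho in [0.0, 0.25, 0.5, 1.0])
print(ok)  # True
```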
Let us show that if x* and z* are as stated, then there is no configurable cycle of x* − z*.

Lemma 9
Let x* be a conf-optimal solution of (HugeCP) and let z* be an optimal solution of (HugeIP) such that ‖x* − z*‖₁ is minimal. Then there is no configurable cycle of x* − z*.
Proof For the sake of contradiction, suppose that there exists a configurable cycle h* of x* − z*. By Lemma 8, f̂(z* + h*) + f̂(x* − h*) ≤ f̂(z*) + f̂(x*), so one of two cases must occur. Case 1: f̂(z* + h*) ≤ f̂(z*). Then z* + h* is an optimal integer solution (by h* ⊑ x* − z* we have l ≤ z* + h* ≤ u, and by h* ∈ Ker_Z(E^{(N)}) we have E^{(N)}(z* + h*) = b) which is closer to x*, a contradiction to the minimality of ‖x* − z*‖₁. Case 2: f̂(x* − h*) < f̂(x*). Since h* is a configurable cycle, Lemma 7 states that x* − h* is configurable, so we have a contradiction with the conf-optimality of x*.

Overview of the remainder of the proof
In order to use existing proximity arguments to bound the norm of a cycle, our plan is to move into an extended (higher-dimensional) space which corresponds to decomposing each brick x^i of x* into configurations as x^i = Σ_c λ_c c; each summand becomes a new brick in the extended space.
We denote this new higher-dimensional representation of x * with respect to Γ as ↑x * and call it the rise of x * , and define similarly the rise of z * (with respect to a given decomposition of each brick of x * ). The situation gets very delicate at this point.
First, we require that each decomposition of a brick of x* is optimal with respect to the auxiliary objective f̂, so that we can use the argument about the non-existence of a cycle. Second, because the proximity bound depends on the number of fractional bricks of ↑x*, we require that the decomposition of each brick is small, i.e., into only a few elements. Third, we require that each coefficient λ_c is of the form 1/q_c for an integer q_c, because we need to ensure that, for a corresponding cycle brick h_c, λ_c^{−1}h_c is an integer vector, so λ_c^{−1} has to be an integer. To ensure the second and third conditions simultaneously, we first show that there is a decomposition of each brick of size at most t + 1 and with each coefficient bounded by P, and then show that each fraction p/q can be written as an Egyptian fraction p/q = 1/a₁ + 1/a₂ + ⋯ + 1/a_c with c ≤ 2⌈log₂ q⌉ + 1 (Lemmas 10–12). (Bounds on the length of Egyptian fractions have been studied in the past and our bound is not the best possible, but in order to use our proximity theorem we need exact rather than merely asymptotic bounds, so we prove this worse but exact bound.) We call a decomposition of a brick satisfying all three criteria above a small scalable decomposition.
Fix a small scalable decomposition for each brick of x*, and let ↑x* be the rise of x* with respect to this decomposition. Since this decomposition is small, ↑x* has at most poly(‖E‖∞, r, s) fractional bricks. Moreover, the other properties above allow us to say the following: if r is a cycle of ↑x* − ↑z*, then the compression of r back to the original space is a configurable cycle of x* − z* (Lemma 14). So in order to bound ‖x* − z*‖₁, it suffices (by the triangle inequality) to bound ‖↑x* − ↑z*‖₁. We do this by adapting the approach of Eisenbrand and Weismantel [11] to bound the length of any cycle r of ↑x* − ↑z*.

The remainder of the proof
We say that |Γ| is the size of the decomposition. Let us show that for each brick there exists an f̂-optimal decomposition whose coefficients have small encoding length and whose size is small. For any matrix A, define g_∞(A) := max_{g∈G(A)} ‖g‖∞.

Lemma 10
Each brick of x* of type i has an f̂-optimal decomposition Γ 1. of size at most t + 1, and 2. whose coefficients have small encoding length, bounded in terms of t and g_∞(E_2^i).

Proof An f̂-optimal decomposition corresponds to a solution of the LP (6). We will argue that there is a solution whose support is composed of columns which do not differ by much, which corresponds to a solution of an LP with small coefficients, and the claimed bound can then be obtained by Cramer's rule. Specifically, we claim that there exists an f̂-optimal decomposition Γ which corresponds to an optimal solution λ of (6) such that there exists a point ζ ∈ Z^t with ‖c − ζ‖∞ ≤ R̄/2 for every c ∈ supp(λ), where R̄ := (2t − 2)g_∞(E_2^i). For a solution λ of (6), define R := max_{c,c′∈supp(λ)} ‖c − c′‖∞ to be the longest side of the bounding box of all c ∈ supp(λ). For a point ζ ∈ Z^t and c ∈ supp(λ), say that a coordinate j is tight if |c_j − ζ_j| = ⌈R/2⌉, and define S := Σ_{c∈supp(λ)} λ_c · |{tight coordinates of c}| to be the weighted number of tight coordinates. Now let ζ ∈ Z^t be any point which is an integer center of the bounding box (i.e., ‖c − ζ‖∞ ≤ ⌈R/2⌉ for all c ∈ supp(λ)) and which minimizes S. For contradiction, assume that λ is an optimal solution of (6) which minimizes R and S (lexicographically, in this order) and that R > R̄ = (2t − 2)g_∞(E_2^i). Assuming Γ is a decomposition of a brick of type i, there exist c, c′ ∈ supp(λ) with ‖c − c′‖∞ = R. By Proposition 1 we may write c − c′ = Σ_{j=1}^{2t−2} γ_j g_j with g_j ∈ G(E_2^i) and g_j ⊑ c − c′ for all j ∈ [2t − 2]. Note that because ‖c − c′‖∞ > R̄ = (2t − 2)g_∞(E_2^i), there exists j ∈ [2t − 2] such that γ_j > 1. Hence g := Σ_{j=1}^{2t−2} ⌊γ_j/2⌋g_j satisfies g ≠ 0. Let c̄ := c − g and c̄′ := c′ + g.
Second, by the conformality of the decomposition, c̄, c̄′ ∈ C_i. Third, by separable convex superadditivity (Proposition 2), we have that f^i(c̄) + f^i(c̄′) ≤ f^i(c) + f^i(c′). Fourth, there exists a coordinate j ∈ [t] such that |c_j − c′_j| = R, but since ‖c̄ − c̄′‖∞ ≤ R̄ < R, we have |c̄_j − c̄′_j| ≤ R̄ < R, and thus j is no longer a tight coordinate for either c̄ or c̄′ (or both), and no new tight coordinates can be introduced because R̄ < R. Without loss of generality, let λ_c ≤ λ_{c′}. Now initialize λ′ := λ and modify it by setting λ′_{c̄} := λ_{c̄} + λ_c, λ′_{c̄′} := λ_{c̄′} + λ_c, λ′_{c′} := λ_{c′} − λ_c, and λ′_c := 0. By our arguments above, λ′ is another optimal solution of (6), but the weighted number S of tight coordinates has decreased by the fourth point, a contradiction.
Thus, there exists a point ζ ∈ Z^t and an optimal solution λ of (6) such that for each c ∈ supp(λ) it holds that ‖c − ζ‖∞ ≤ R̄/2. We obtain a reduced LP from (6) by deleting all columns c with ‖c − ζ‖∞ > R̄/2, and denote the remaining set of columns by C̄_i. This LP is equivalent to the one obtained by subtracting ζ from all columns and correspondingly adjusting the right-hand side. Now, this LP has t + 1 rows and the largest coefficient of its columns is bounded by R̄/2 in absolute value. A basic solution λ has |supp(λ)| ≤ t + 1 and, by Cramer's rule, the denominator of each λ_c is bounded by (t + 1)! times the largest coefficient to the power t + 1, and thus by (t + 1)! · (R̄/2)^{t+1} ≤ ((t + 1) · R̄/2)^{t+1}.

Next, we will need the notion of an Egyptian fraction. For a rational number p/q, p, q ∈ N, its Egyptian fraction is a finite sum of distinct unit fractions p/q = 1/q₁ + ⋯ + 1/q_k with q₁, …, q_k ∈ N distinct. Call the number of terms k the length of the Egyptian fraction. Vose [43] has proven that any p/q has an Egyptian fraction of length O(√log q). Since our algorithm requires an exact bound, we present the following weaker yet exact result.

Lemma 11 (Egyptian Fractions) Let p, q ∈ N, 1 ≤ p < q. Then p/q has an Egyptian fraction of length at most 2⌈log₂ q⌉ + 1 in which all denominators are at most q².
Proof Let a = 2^k be the largest power of two such that a < q, so k = ⌈log₂ q⌉ − 1 < log₂ q. Write pa = mq + r with 0 ≤ r < q, so that p/q = m/a + r/(qa). Since p < q we have m < a = 2^k, so expanding m in binary, m = Σ_j 2^{e_j}, yields m/a = Σ_j 1/2^{k−e_j}, a sum of at most k distinct unit fractions with denominators at most 2^k. Similarly, expanding r < q ≤ 2^{k+1} in binary yields r/(qa) as a sum of at most k + 1 unit fractions with denominators of the form q · 2^{k−e}. Together this is a sum of at most 2k + 1 ≤ 2⌈log₂ q⌉ + 1 terms with all denominators d_i ≤ q2^k = qa ≤ q². Moreover, all denominators in the first sum are distinct and at most 2^k, and all in the second sum are distinct and at least q > 2^k, hence all are distinct, so this is an Egyptian fraction of p/q of length at most 2⌈log₂ q⌉ + 1 whose denominators are at most q².
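The construction in the proof of Lemma 11 is directly implementable. A sketch following that proof (the function name is ours):

```python
from fractions import Fraction

def egyptian(p, q):
    """Write p/q (1 <= p < q) as a sum of distinct unit fractions,
    following the proof of Lemma 11: with a = 2^k the largest power
    of two below q, split p/q = m/a + r/(q*a) where pa = mq + r,
    and expand m and r in binary."""
    k = q.bit_length() - 1
    if 2 ** k == q:          # a must satisfy a < q strictly
        k -= 1
    a = 2 ** k
    m, r = divmod(p * a, q)
    denoms = [2 ** (k - e) for e in range(k + 1) if m >> e & 1]            # m/a
    denoms += [q * 2 ** (k - e) for e in range(r.bit_length()) if r >> e & 1]  # r/(qa)
    return denoms

ds = egyptian(5, 7)
assert sum(Fraction(1, d) for d in ds) == Fraction(5, 7)
print(sorted(ds))  # [2, 7, 14]: 5/7 = 1/2 + 1/7 + 1/14
```

The two groups of denominators are automatically disjoint: the first group is at most 2^k, the second is at least q > 2^k.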
Recall that our goal is to obtain a configurable cycle. However, for that we also need a special form of decomposition. Say that Γ is a scalable decomposition of a brick (x*)^j of type i if it is an f̂-optimal decomposition and, for each (c_γ, λ_γ) ∈ Γ, λ_γ is of the form 1/q_γ for some q_γ ∈ N. We note that in what follows we do not need an algorithm computing a scalable decomposition, only the following existence statement.

Lemma 12
Each brick of x* has a scalable decomposition of size at most κ₁ · t³ log(t‖E₂‖∞), where κ₁ = 52.
Proof Fix j ∈ [N]. Let x = (x*)^j be a brick of x* of type i. By Lemma 10, there exists an f̂-optimal decomposition of x of size at most t + 1 in which each coefficient has a bounded denominator. Now express each coefficient λ_c in the decomposition as an Egyptian fraction using Lemma 11. The resulting decomposition is of size at most (t + 1) · 25st log(st‖E_2^i‖∞) ≤ 2 · 26t³ log(t‖E_2^i‖∞) (by s ≤ t this justifies the deletion of s inside the log() at the cost of a factor of 2, so the last bound holds) and is scalable, since each coefficient is of the form 1/q_γ for some q_γ ∈ N.
We will now show that we are guaranteed a configurable cycle of x * − z * if there exists an analogue of a regular cycle of a certain "lifting" of x * and z * .
Fix for each brick of x* a scalable decomposition Γ_j. Let ↑x* be the rise of x*, defined as the vector obtained from x* by keeping every integer brick (x*)^j and replacing every fractional brick (x*)^j with |Γ_j| terms λ_γ c_γ, one for each (c_γ, λ_γ) ∈ Γ_j. Observe that each brick of ↑x* is of the form λ_c c for some configuration c and some coefficient 0 ≤ λ_c ≤ 1. Thus, for a brick λ_c c we say that c is its configuration, λ_c is its coefficient, and its type is identical to the type of the brick it originated from; in particular, bricks which originated from an integer brick p = (x*)^j are of the form λ_p p with λ_p = 1. Let N′ be the number of bricks of ↑x* and define a mapping ν : [N′] → [N] such that if a brick j ∈ [N′] of ↑x* was defined from brick ℓ ∈ [N] of x*, then ν(j) = ℓ. The natural inverse ν^{−1} is defined such that, for ℓ ∈ [N], ν^{−1}(ℓ) is the set of bricks of ↑x* which originated from (x*)^ℓ.
Denote by ↑z* ∈ R^{N′t} the rise of z* (with respect to x*), defined as follows. Let j ∈ [N′], ℓ = ν(j), and let λ be the coefficient of the j-th brick of ↑x*. Then the j-th brick of ↑z* is (↑z*)^j := λ(z*)^ℓ. Observe that ‖↑x* − ↑z*‖₁ ≥ ‖x* − z*‖₁ by applying the triangle inequality to each brick and its decomposition individually and aggregating.
For any vector x ∈ R^{N′t}, define the fall of x as the vector ↓x ∈ R^{Nt} such that for ℓ ∈ [N], (↓x)^ℓ = Σ_{j∈ν^{−1}(ℓ)} x^j. We see that ↓(↑x*) = x* and ↓(↑z*) = z*. Say that r is a cycle of ↑x* − ↑z* if r ⊑ ↑x* − ↑z* and r ∈ Ker_Z(E^{(N′)}).
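The rise and fall bookkeeping can be sketched as follows; the data layout (bricks as lists, decompositions as coefficient–configuration pairs) is our own illustrative choice:

```python
from fractions import Fraction

def rise(brick_decomps):
    """Rise: replace each brick, given as a list of (lambda_c, c) pairs
    from its decomposition, by one new brick lambda_c * c per pair."""
    out, origin = [], []
    for ell, decomp in enumerate(brick_decomps):
        for lam, c in decomp:
            out.append([lam * cj for cj in c])
            origin.append(ell)   # the map nu: new brick -> old brick
    return out, origin

def fall(bricks, origin, n_old):
    """Fall: sum all new bricks that originated from the same old brick."""
    t = len(bricks[0])
    res = [[Fraction(0)] * t for _ in range(n_old)]
    for brick, ell in zip(bricks, origin):
        for j, v in enumerate(brick):
            res[ell][j] += v
    return res

# One fractional brick x^1 = 1/2*(1,0) + 1/2*(0,2) and one integer brick (3,3).
half = Fraction(1, 2)
decomps = [[(half, [1, 0]), (half, [0, 2])], [(Fraction(1), [3, 3])]]
up_x, nu = rise(decomps)
x = fall(up_x, nu, 2)
print(x == [[half, 1], [3, 3]])  # True: fall(rise(x)) == x
```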

Lemma 14
If r is a cycle of ↑x* − ↑z*, then ↓r is a configurable cycle of x* − z*.
Proof To show that ↓r is a configurable cycle, we need to show that (1) ↓r ∈ Ker_Z(E^{(N)}) and (2) for each brick x = (x*)^j of x*, there is an f̂-optimal decomposition of x such that h = (↓r)^j decomposes accordingly. For the first part, ↓r is integral because it is obtained by summing bricks of r, which is integral. Denote by i(j) the type of a brick j (we abuse this notation; note that i(j) for j ∈ [N] may differ from i(j) for j ∈ [N′], but context always makes clear what we mean). By the fact that r ∈ Ker_Z(E^{(N′)}) and the definition of ↓r, we obtain E^{(N)}(↓r) = 0, and hence ↓r ∈ Ker_Z(E^{(N)}). To see the second part, fix a brick j ∈ [N] of type i and let x = (x*)^j, z = (z*)^j, and h = (↓r)^j. We need to show that h = Σ_{γ∈ν^{−1}(j)} h^γ can be written as required by the definition of a configurable cycle. By the definition of ↑x* and r, there is a scalable decomposition Γ of x (namely the one used to define ↑x*) such that h decomposes according to Γ, as required.

We are finally ready to use the Steinitz Lemma to derive a bound on ‖x* − z*‖₁.
Theorem 2 Let x* be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there exists an optimal solution z* of (HugeIP) such that ‖x* − z*‖₁ ≤ P, where P is the proximity bound derived below.

Proof Denote by Ē₁ the first r rows of the matrix E^{(N)}. Let z* be an optimal integer solution such that ‖z* − x*‖₁ is minimal, let ↑x* be the rise of x* with at most κ₂ · r · t³ log(t‖E_2^1, …, E_2^τ‖∞) fractional bricks (see Lemma 13), let ↑z* be the rise of z* with respect to x*, and let q = ↑x* − ↑z*.
We want to get into the setting of the Steinitz Lemma, that is, to obtain a sequence of vectors with small ℓ∞-norm summing up to zero. To this end, we shall decompose Ē₁q in the following way; we stress that we have Ē₁q = 0. For every integral brick q^i of type ℓ ∈ [τ] we have its decomposition q^i = Σ_j g_{ij} into elements of G(E_2^ℓ) by the Positive Sum Property (Proposition 1); for each g_{ij} append Ē₁g_{ij} to the sequence. For every fractional brick q^i of type ℓ ∈ [τ] we have its decomposition q^i = Σ_{j=1}^t α_j g_{ij}, α_j ≥ 0 for each j, into elements of C(E_2^ℓ); for each g_{ij} append ⌊α_j⌋ copies of Ē₁g_{ij} to the sequence, and finally append Ē₁{α_j}g_{ij}, where {α_j} denotes the fractional part of α_j. Observe that since ↑x* has at most κ₂ · r · t³ log(t‖E_2^1, …, E_2^τ‖∞) fractional bricks (Lemma 13), so does q, and thus we have appended f ≤ t · κ₂ · r · t³ log(t‖E_2^1, …, E_2^τ‖∞) vectors coming from fractional bricks; each vector in the sequence has ℓ∞-norm at most ‖E_1^1, …, E_1^τ‖∞ · g₁(E₂), and the vectors sum up to 0. Observe that (m + f) · g₁(E₂) ≥ ‖q‖₁ = ‖↑x* − ↑z*‖₁ ≥ ‖x* − z*‖₁, where m is the number of appended integral vectors; we now focus on bounding m + f. The Steinitz Lemma (Lemma 1) implies that there exists a permutation π such that the sequence (15) can be re-arranged as in (16), with every prefix sum bounded in ℓ∞-norm by r times the largest ℓ∞-norm of a summand. We will now argue that there cannot be indices 1 ≤ k₁ < ⋯ < k_{f+2} ≤ f + m satisfying (17), which implies that f + m is bounded by f + 1 times the number of integer points of norm at most r‖Ē₁‖∞g₁(E₂), and the claimed bound follows. Assume for contradiction that there exist f + 2 indices 1 ≤ k₁ < ⋯ < k_{f+2} ≤ f + m satisfying (17). By the pigeonhole principle, there is an index k_ℓ such that all the vectors v_{k_ℓ+1}, …, v_{k_{ℓ+1}} from the rearrangement (16) correspond to integer vectors o_{π^{−1}(p)} for p ∈ [k_ℓ + 1, k_{ℓ+1}]. We will show that this collection of vectors corresponds to a cycle h of ↑x* − ↑z*, which by the minimality of ‖x* − z*‖₁ and Lemmas 9 and 14 is impossible. To obtain the cycle, for each p ∈ [k_ℓ + 1, k_{ℓ+1}], let i(p), j(p), and ℓ(p) be such that o_{π^{−1}(p)} = Ē₁g_{i(p)j(p)}, where brick i(p) has type ℓ(p).
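The Steinitz Lemma itself can be illustrated on a toy instance. The brute-force search below only demonstrates the statement (a permutation with all prefix sums bounded by d · max‖v‖∞ exists); it is not the algorithmic argument used in the proof:

```python
from itertools import permutations

def steinitz_permutation(vectors, bound):
    """Find a permutation of zero-sum vectors whose every prefix sum
    has infinity-norm at most `bound`; the Steinitz Lemma guarantees
    one exists with bound = d * max ||v||_inf."""
    d = len(vectors[0])
    for perm in permutations(vectors):
        pref, ok = [0] * d, True
        for v in perm:
            pref = [p + vi for p, vi in zip(pref, v)]
            if max(abs(p) for p in pref) > bound:
                ok = False
                break
        if ok:
            return perm
    return None

# Zero-sum vectors in Z^2 with ||v||_inf <= 1; Steinitz promises bound 2*1 = 2.
vs = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]
print(steinitz_permutation(vs, bound=2) is not None)  # True
```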

Improving the proximity theorem when I has identical columns
In this section we will show how to construct a huge N-fold instance I′ from any input instance I such that the number of columns of I′ per brick is at most (2‖E‖∞ + 1)^{r+s}, and in some sense I and I′ are equivalent. Specifically, we will show a mapping between the solutions of I and I′ which maps integer or configurable optima of I to integer or configurable optima of I′ and vice versa, respectively, and such that proximity bounds for I′ can be transferred to I. This will eventually allow us to show that even if I has very large t, we can bound the distance between a configurable optimum and some integer optimum of I by a function independent of t.

Construction of I′.
Note that (2‖E‖∞ + 1)^{r+s} is the number of distinct (r + s)-dimensional integer vectors with entries bounded by ‖E‖∞ in absolute value, hence the number of possible distinct columns per brick. We will show how to "join" variables corresponding to identical columns. Consider any IP with a separable convex objective where the columns corresponding to variables x₁ and x₂ are identical. Let f₁ and f₂ be the objective functions corresponding to x₁ and x₂, and l₁, l₂ and u₁, u₂ be their lower and upper bounds, respectively. Let x₁₂ be a new variable which replaces x₁, x₂ in I′. Set the lower bound of x₁₂ to be l₁₂ = l₁ + l₂, the upper bound to be u₁₂ = u₁ + u₂, and define its objective function as the (min, +)-convolution of f₁ and f₂:

f₁₂(x₁₂) = min { f₁(x₁) + f₂(x₂) : x₁ + x₂ = x₁₂, l₁ ≤ x₁ ≤ u₁, l₂ ≤ x₂ ≤ u₂ }.

Note that if f₁ and f₂ are convex, then f₁₂ is also convex. Extend f₁₂ to fractional values by linear interpolation, that is, for fractional x₁₂ = ⌊x₁₂⌋ + {x₁₂}, let f₁₂(x₁₂) be f₁₂(⌊x₁₂⌋) + {x₁₂}(f₁₂(⌈x₁₂⌉) − f₁₂(⌊x₁₂⌋)). The value f₁₂(x₁₂) can be obtained by binary search on x₁ (which determines x₂ = x₁₂ − x₁) in O(log(u₁₂ − l₁₂)) calls to evaluation oracles for f₁ and f₂. When merging a set S of more than 2 variables, one would compute f_S(x_S) as the solution of the corresponding integer program whose objective is Σ_{i∈S} f_i(x_i) and whose constraints are Σ_{i∈S} x_i = x_S plus appropriate lower and upper bounds; by [13], this is solvable in time poly(|S|) log ‖f_max, u_S − l_S‖∞. However, our goal here is to strengthen our proximity result for I by studying I′, without actually attempting to solve I′.
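The merge of two variables can be sketched as follows. We realize the search over the split by an integer ternary search, which is our own choice in place of the binary search mentioned above; it is valid because the inner function is convex:

```python
def merge_objectives(f1, l1, u1, f2, l2, u2):
    """(min,+)-convolution on integer points:
    f12(x12) = min{ f1(x1) + f2(x12 - x1) : feasible split }.
    For convex f1, f2 the inner function is convex in x1, so an
    integer ternary search needs O(log(u12 - l12)) evaluations."""
    l12, u12 = l1 + l2, u1 + u2

    def f12(x12):
        assert l12 <= x12 <= u12
        lo, hi = max(l1, x12 - u2), min(u1, x12 - l2)  # feasible x1 range
        while hi - lo > 2:  # ternary search on a convex integer function
            m1 = lo + (hi - lo) // 3
            m2 = hi - (hi - lo) // 3
            if f1(m1) + f2(x12 - m1) <= f1(m2) + f2(x12 - m2):
                hi = m2 - 1
            else:
                lo = m1 + 1
        return min(f1(x1) + f2(x12 - x1) for x1 in range(lo, hi + 1))

    return f12, l12, u12

f1 = lambda x: (x - 3) ** 2   # convex objectives of the two merged variables
f2 = lambda x: abs(x)
f12, l12, u12 = merge_objectives(f1, 0, 10, f2, -5, 5)
print(f12(3))  # 0, attained at x1 = 3, x2 = 0
```

Note that merging preserves convexity of the resulting f₁₂, in line with the observation above.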
For a solution x of I (not necessarily integral), we define σ(x) to be the solution of I′ in which x₁ and x₂ are replaced by x₁₂ = x₁ + x₂. Clearly, for integer x, the value of σ(x) under the objective of I′ is at most the value of x under f, and if x is an integer optimum of I, then σ(x) is an integer optimum of I′, because we then have f₁₂(x₁₂) = f₁(x₁) + f₂(x₂). We abuse the notation and, for an integer x′, define σ^{−1}(x′) to be some integral member x of the set σ^{−1}(x′) which satisfies f₁(x₁) + f₂(x₂) = f₁₂(x₁₂). For a configurable solution x′ we define σ^{−1}(x′) by taking an f̂-optimal decomposition Γ′ of the brick of x′ containing x₁₂ and applying σ^{−1} to the configurations in Γ′; this defines a decomposition Γ and thus a brick of a solution x of I. The next lemma shows that this construction preserves the value of the solution.

Lemma 15 If x is an integer (configurable) optimum of I, then σ(x) is an integer (configurable) optimum of I′ of the same value, and analogously for σ^{−1}.

Proof It follows from the definition of f₁₂ that for any integer solution of I we get an integer solution of I′ which is at least as good, and for any integer solution of I′ we get an integer solution of I with the same value. For configurable solutions we apply the observation above to each configuration in some f̂-optimal decomposition and use the fact that f̂ is defined via f₁₂.
This approach generalizes readily to any number of variables. For the sake of simplicity we continue with the example of "joining" two variables whose columns in E (N ) are identical.
We are left to argue about proximity. While we believe that, in general, any proximity bound between integer and configurable optima of I′ transfers to I, we only need this for our specific bound, so we take a less general route.

Lemma 16
Let x be a configurable optimum of I with at most 2r fractional bricks, x′ = σ(x) a configurable optimum of I′, z′ an ℓ₁-closest integer optimum of I′, and z = σ^{−1}(z′) an integer optimum of I. Let P be the bound of Theorem 2 on ‖x′ − z′‖₁. Then ‖x − z‖₁ ≤ P.
Proof Consider the proof of Theorem 2. In it, we create a sequence of vectors v₁, …, v_{m+f}. Each of these vectors corresponds to some Ē₁λ_j g_{ij}. The crucial observation is that the sequence (v_i)_i obtained from x, z is identical to the sequence obtained from x′, z′, so if ‖x′ − z′‖₁ ≤ P, then also ‖x − z‖₁ ≤ P.
The next corollary is now immediate.

Corollary 1 Let x* be a conf-optimal solution of (HugeCP) with at most 2r fractional bricks. Then there is an optimal solution z* of (HugeIP) such that ‖x* − z*‖₁ ≤ P, where P is independent of t.

Algorithm
Recall the statement of the theorem we are proving.

Theorem 1 Huge N-fold IP with any separable convex objective can be solved in time (‖E‖∞rs)^{O(r²s+rs²)} poly(tτ, log ‖f_max, l, u, b, μ‖∞).

Proof We first give a description of the algorithm which solves huge N-fold IP, then show its correctness, and finally give a time complexity analysis.

Description of the algorithm. First, obtain an optimal solution y of (ConfLP) and from it a conf-optimal solution x* = ϕ(y) with at most 2r fractional bricks by Lemma 6.
Applying Corollary 1 to x* guarantees the existence of an integer optimum z* satisfying

‖x* − z*‖₁ ≤ P.  (19)

Together with the fact that there are at most 2r fractional bricks, this implies that z* differs from x* in at most P′ = P + 2r bricks. The idea of the algorithm is to "fix" the value of the solution on "almost all" bricks and compute the rest using an auxiliary N̄-fold IP problem with a polynomial N̄. Formally, our goal is to compute an optimal solution z of (HugeIP) represented succinctly by multiplicities of configurations, or, in other words, as a solution ζ of (ConfILP). Denote by y_{−P′} the vector whose coordinates are defined by setting, for every type i ∈ [τ] and every configuration c ∈ C_i, y_{−P′}(i, c) = max{0, y(i, c) − P′}. This leaves us with ‖y‖₁ − ‖y_{−P′}‖₁ ≤ |supp(y)| · P′ ≤ (r + τ)P′ =: P̄ bricks to determine. Let ζ̄ = y − y_{−P′}, define μ̄ by setting, for each i ∈ [τ], μ̄^i := Σ_{c∈C_i} ζ̄(i, c), let x̄ = ϕ(ζ̄), and let N̄ = ‖ζ̄‖₁ = ‖μ̄‖₁ ≤ P̄. Construct an auxiliary N̄-fold IP instance with the same blocks E_1^i, E_2^i, i ∈ [τ], by setting, for each brick x̄^j of type i, the corresponding bounds and objective; we say that such a brick was derived from type i. Lastly, let b̄⁰ be the corresponding right-hand side. After obtaining an optimal solution z̄ of this instance, we update ζ as follows: for each brick z̄^j derived from type i, increment ζ(i, z̄^j) by one.

Correctness. By (19), it is correct to assume that there exists a solution ζ of (ConfILP) which has ζ(i, c) ≥ max{0, y(i, c) − P′} for each i ∈ [τ] and c ∈ C_i. Thus we may apply to (ConfILP) the variable transformation ζ = ζ̄ + y_{−P′}, obtaining an auxiliary (ConfILP) instance min{v(ζ̄ + y_{−P′}) : B(ζ̄ + y_{−P′}) = d, ζ̄ ≥ 0}.
The auxiliary huge N̄-fold instance is simply the instance corresponding to the program above, and the final construction of ζ corresponds to the described variable transformation.

Complexity. Since ‖ζ̄‖₁ ≤ P̄, we can obtain an optimal solution z̄ of the auxiliary instance quickly: to solve it, we need time (‖E‖∞rs)^{O(r²s+rs²)} (tP̄) log(tP̄) log ‖f_max, b̄, l̄, ū‖∞, where P̄ = (r + τ)P′ ≤ τ(2‖E‖∞ + 1)^{O(s)}(2r‖E‖∞g₁(E₂))^{O(r)}.
Hence we can solve huge N-fold IP in time at most (‖E‖∞rs)^{O(r²s+rs²)} poly(tτ, log ‖f_max, l, u, b, μ‖∞).
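The fixing step of the algorithm above (computing y_{−P′}, ζ̄, μ̄, and N̄ from the ConfLP solution y) is plain arithmetic; a sketch with an illustrative toy solution y, whose dictionary layout is our own choice:

```python
from fractions import Fraction

def fix_step(y, P_prime):
    """Fixing step: keep y_minus[(i, c)] = max(0, y[(i, c)] - P') copies
    of each configuration fixed, and leave the remainder zeta_bar (with
    type multiplicities mu_bar and N_bar bricks in total) to the
    auxiliary N-bar-fold instance."""
    y_minus = {k: max(0, v - P_prime) for k, v in y.items()}
    zeta_bar = {k: v - y_minus[k] for k, v in y.items()}
    mu_bar = {}
    for (i, c), v in zeta_bar.items():
        mu_bar[i] = mu_bar.get(i, 0) + v
    n_bar = sum(zeta_bar.values())
    return y_minus, zeta_bar, mu_bar, n_bar

# Toy ConfLP solution: keys are (type, configuration) pairs.
y = {(1, (0, 2)): Fraction(7, 2), (1, (1, 1)): Fraction(1, 2), (2, (3, 0)): 9}
y_minus, zeta_bar, mu_bar, n_bar = fix_step(y, P_prime=2)
print(y_minus[(1, (0, 2))], n_bar)  # 3/2 9/2
```

Each support element contributes at most P′ to the remainder, which is exactly the bound ‖y‖₁ − ‖y_{−P′}‖₁ ≤ |supp(y)| · P′ used above.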

Concluding remarks
At this point one may wonder why we bother with the ConfLP rather than solving HugeCP and showing that its optima are close to those of HugeIP. The reason is that even though handling optima of HugeCP is much easier than handling conf-optimal solutions, and even though solving HugeCP is easier than solving ConfLP, a HugeCP optimum can be very far from a HugeIP optimum [8, Proposition 1]. In other words, ConfLP is a stronger relaxation than HugeCP: consider a brick p of a HugeCP optimum and a brick q of a conf-optimal solution, both of type i; then

q ∈ conv{x ∈ Z^t : E_2^i x = b^i, l^i ≤ x ≤ u^i}, whereas merely p ∈ {x ∈ R^t : E_2^i x = b^i, l^i ≤ x ≤ u^i}.

In plain language, while q lies in the integer hull of all configurations, p only lies in the fractional relaxation of this hull. Another obstacle is that even though the Configuration LP is a standard tool, typically the separation problem is merely approximated rather than solved exactly, leading to approximate solutions of ConfLP. But we require an exact solution, and so we use a parameterized exact algorithm for IP to solve the separation problem. It is an interesting question when a k-approximate solution of ConfLP, i.e., a solution whose value is at most k · OPT, may be used to obtain an h(k)-accurate configurable solution of HugeCP, i.e., a configurable solution which is at ℓ₁-distance at most h(k) from a configurable optimum. An approximate solution of ConfLP might be much easier to obtain, and yet it may be almost as good as an exact solution for our purposes here.
Another interesting question is a tight complexity bound for the algorithm of Lemma 6. It seems likely that the recent approach of Cslovjecsek et al. [8] could also apply in our high-multiplicity setting, which would yield a near-linear fixed-parameter algorithm. Notice that the iterative augmentation algorithms for standard N-fold IP have a strong combinatorial flavor and use no "black boxes". Could the ellipsoid method behind Lemma 6 be replaced by a (more) combinatorial algorithm, at least for some important problems which have huge N-fold IP models, such as the scheduling problems studied by Knop?

Availability of data and material (data transparency) Not applicable.
Code Availability Not applicable.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.