1 Introduction

The combinatorial diameter of a polyhedron P is the diameter of the vertex-edge graph associated with P. Hirsch’s famous conjecture from 1957 asserted that the combinatorial diameter of a d-dimensional polytope (bounded polyhedron) with f facets is at most \(f-d\). This was disproved by Santos in 2012 [30]. The polynomial Hirsch conjecture, i.e., finding a poly(f) bound on the combinatorial diameter, remains a central question in the theory of linear programming.

The first quasipolynomial bound was given by Kalai and Kleitman [24, 25], see [32] for the best current bound and an overview of the literature. Dyer and Frieze [11] proved the polynomial Hirsch conjecture for totally unimodular (TU) matrices. For a system \(\{x\in \mathbb {R}^d:\, Mx\le b\}\) with integer constraint matrix M, polynomial diameter bounds were given in terms of the maximum subdeterminant \(\Delta _M\) [4, 7, 12, 20]. These arguments can be strengthened by using a parametrization by a ‘discrete curvature measure’ \(\delta _M\ge 1/(d\Delta ^2_M)\). The best such bound was given by Dadush and Hähnle [12] as \(O(d^3\log (d/\delta _M)/\delta _M)\), using a shadow vertex simplex algorithm.

As a natural relaxation of the combinatorial diameter, Borgwardt, Finhold, and Hemmecke [5] initiated the study of circuit diameters. Consider a polyhedron in standard equality form

$$\begin{aligned} P=\{\,x\in \mathbb {R}^n: Ax=b, x\ge \mathbb {0}\,\} \end{aligned}$$
(P)

for \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^m\); we assume \({\text {rk}}(A)=m\). For the linear space \(W=\ker (A)\subseteq \mathbb {R}^n\), \(g\in W\) is an elementary vector if g is a support-minimal nonzero vector in W, that is, no \(h\in W\setminus \{\mathbb {0}\}\) exists such that \(\textrm{supp}(h)\subsetneq \textrm{supp}(g)\). A circuit in W is the support of some elementary vector; these are precisely the circuits of the associated linear matroid \(\mathcal {M}(A)\). We remark that many papers on circuit diameter, e.g., [2, 3, 5, 8, 26], refer to elementary vectors as circuits; we follow the traditional convention of [21, 27, 29]. We let \(\mathcal {E}(W)=\mathcal {E}(A)\subseteq W\) and \(\mathcal {C}(W)=\mathcal {C}(A)\subseteq 2^n\) denote the set of elementary vectors and circuits in the space \(W=\ker (A)\), respectively. All edge directions of P are elementary vectors, and the set of elementary vectors \(\mathcal {E}(A)\) equals the set of all possible edge directions of P in the form (P) for varying \(b\in \mathbb {R}^m\) [31].
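To make these definitions concrete, the following brute-force sketch (our illustration, feasible only for tiny instances; the function name circuits_and_elementary_vectors is ours) enumerates the circuits of \(\ker (A)\): a column set S is a circuit precisely when \(\ker (A_S)\) is one-dimensional and spanned by a vector with full support on S.

```python
import itertools
import numpy as np
from scipy.linalg import null_space

def circuits_and_elementary_vectors(A, tol=1e-9):
    """Brute force: S is a circuit iff ker(A_S) is a full-support line."""
    m, n = A.shape
    out = []
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            K = null_space(A[:, list(S)])
            if K.shape[1] != 1 or np.min(np.abs(K[:, 0])) < tol:
                continue                 # kernel is not a full-support line
            g = np.zeros(n)
            g[list(S)] = K[:, 0]         # an elementary vector, up to scaling
            out.append((set(S), g))
    return out

A = np.array([[1., 0., 1., 2.],
              [0., 1., 1., 0.]])
for C, g in circuits_and_elementary_vectors(A):
    print(sorted(C), np.round(g, 3))     # circuits {0,3}, {0,1,2}, {1,2,3}
```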

A circuit walk is a sequence of points \(x^{(0)},x^{(1)},\ldots ,x^{(k)}\) in P such that for each \(i=0,\ldots ,k-1\), \(x^{(i+1)}=x^{(i)}+\alpha ^{(i)} g^{(i)}\) for some \(g^{(i)}\in \mathcal {E}(A)\) and \(\alpha ^{(i)} > 0\), and further, \(x^{(i)}+\alpha g^{(i)}\notin P\) for any \(\alpha >\alpha ^{(i)}\), i.e., each consecutive circuit step is maximal. The circuit diameter of P is the maximum length (number of steps) of a shortest circuit walk between any two vertices \(x,y\in P\). Note that, in contrast to walks in the vertex-edge graph, circuit walks are non-reversible and the minimum length from x to y may be different from the one from y to x; this is due to the maximal step requirement. The circuit analogue of the Hirsch conjecture, formulated in [5], asserts that the circuit diameter of a d-dimensional polyhedron with f facets is at most \(f-d\); this may be true even for unbounded polyhedra, see [8]. For P in the form (P), \(d=n-m\) and the number of facets is at most n; hence, the conjectured bound is m.

Circuit diameter bounds have been shown for some combinatorial polytopes such as dual transportation polyhedra [5], matching, travelling salesman, and fractional stable set polytopes [26]. The paper [2] introduced several other variants of circuit diameter, and explored the relations between them. We note that [2, 16, 26] consider circuits for LPs given in the general form \(\{x\in \mathbb {R}^n:\, Ax=b,\, Bx\le d\}\). In Sect. 8, we show that this setting can be reduced to the form (P).

Circuit augmentation algorithms. Circuit diameter bounds are inherently related to circuit augmentation algorithms. This is a general algorithmic scheme to solve an LP

$$\begin{aligned} \min \; \left\langle c, x \right\rangle \quad \mathrm {s.t.}\quad Ax=b\, , \, x \ge \mathbb {0}\, . \end{aligned}$$
(LP)

The algorithm proceeds through a sequence of feasible solutions \(x^{(t)}\). An initial feasible \(x^{(0)}\) is required in the input. For \(t=0,1,\ldots ,\) the current \(x^{(t)}\) is updated to \(x^{(t+1)}=x^{(t)}+\alpha g\) for some \(g\in \mathcal {E}(A)\) such that \(\left\langle c, g \right\rangle \le 0\), and \(\alpha >0\) such that \(x^{(t)}+\alpha g\) is feasible. The elementary vector g is an augmenting direction if \(\left\langle c, g \right\rangle <0\) and such an \(\alpha >0\) exists; by LP duality, \(x^{(t)}\) is optimal if and only if no augmenting direction exists. The augmentation is maximal if \(x^{(t)}+\alpha ' g\) is infeasible for any \(\alpha '>\alpha \); \(\alpha \) is called the maximal stepsize for \(x^{(t)}\) and g. Clearly, an upper bound on the number of steps of a circuit augmentation algorithm with maximal augmentations for arbitrary cost c and starting point \(x^{(0)}\) yields an upper bound on the circuit diameter.
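As an illustration (ours, not the paper's pseudocode), the scheme with maximal augmentations can be sketched as follows; augmenting_direction is a hypothetical stand-in for whatever rule selects \(g\in \mathcal {E}(A)\) with \(\left\langle c, g \right\rangle <0\).

```python
import numpy as np

def maximal_stepsize(x, g):
    """Largest alpha with x + alpha*g >= 0 (inf if g >= 0)."""
    neg = g < 0
    return np.min(x[neg] / -g[neg]) if neg.any() else np.inf

def circuit_augmentation(x, augmenting_direction, max_iters=10**6):
    for _ in range(max_iters):
        g = augmenting_direction(x)
        if g is None:                    # no augmenting direction: x is optimal
            return x
        alpha = maximal_stepsize(x, g)   # maximal augmentation
        assert np.isfinite(alpha), "LP is unbounded along g"
        x = x + alpha * g                # stays feasible: Ag = 0, x + alpha*g >= 0
    raise RuntimeError("iteration limit reached")
```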

Simplex is a circuit augmentation algorithm that is restricted to using special elementary vectors corresponding to edges of the polyhedron. Many network optimization algorithms can be seen as special circuit augmentation algorithms. Bland [6] introduced a circuit augmentation algorithm for LP that generalizes the Edmonds–Karp–Dinic maximum flow algorithm and its analysis, see also [27, Proposition 3.1]. Circuit augmentation algorithms were revisited by De Loera, Hemmecke, and Lee in 2015 [15], analyzing different augmentation rules and also extending them to integer programming. De Loera, Kafer, and Sanità [16] studied the convergence of these rules on 0/1-polytopes, as well as the computational complexity of performing them. We refer the reader to [15] and [16] for a more detailed overview of the background and history of circuit augmentations.

The circuit imbalance measure. For a linear space \(W=\ker (A)\subseteq \mathbb {R}^n\), the circuit imbalance \(\kappa _W=\kappa _A\) is defined as the maximum of \(|g_j/g_i|\) over all elementary vectors \(g\in \mathcal {E}(W)\), \(i,j\in \textrm{supp}(g)\). It can be shown that \(\kappa _W=1\) if and only if W is a unimodular space, i.e., the kernel of a totally unimodular matrix. This parameter and related variants have been used implicitly or explicitly in many areas of linear programming and discrete optimization, see [19] for a recent survey. It is closely related to the Dikin–Stewart–Todd condition number \(\bar{\chi }_W\) that plays a key role in layered-least-squares interior point methods introduced by Vavasis and Ye [38]. An LP of the form (LP) for \(A\in \mathbb {R}^{m\times n}\) can be solved in time poly\((n,m,\log \kappa _A)\), which is strongly polynomial if \(\kappa _A \le 2^{\textrm{poly}(n)}\); see [13, 17] for recent developments and references.
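Continuing the brute-force sketch above (again ours, viable for tiny instances only), \(\kappa _A\) can be read off the enumerated elementary vectors; for the example matrix there, the value 2 is attained on the circuit \(\{0,3\}\).

```python
def circuit_imbalance(A, tol=1e-9):
    """kappa_A by brute force; equals 1 iff ker(A) is a unimodular space."""
    ratios = []
    for _, g in circuits_and_elementary_vectors(A, tol):
        entries = np.abs(g[np.abs(g) > tol])
        ratios.append(entries.max() / entries.min())
    return max(ratios, default=1.0)      # kappa := 1 when ker(A) = {0}

print(circuit_imbalance(np.array([[1., 0., 1., 2.],
                                  [0., 1., 1., 0.]])))   # 2.0
```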

Imbalance and diameter. The combinatorial diameter bound \(O(d^3\log (d/\delta _M)/\delta _M)\) from [12] mentioned above translates to a bound \(O((n-m)^{3} m \kappa _{A}\log (\kappa _{A}+n))\) for the system in the form (P), see [19]. For circuit diameters, the Goldberg–Tarjan minimum-mean cycle cancelling algorithm for minimum-cost flows [23] naturally extends to a circuit augmentation algorithm for general LPs using the steepest-descent rule. This yields a circuit diameter bound \(O(n^2m\kappa _A \log (\kappa _A + n))\) [19], see also [22]. However, note that these bounds may be exponential in the bit-complexity of the input.

1.1 Our contributions

Our first main contribution improves the \(\kappa _A\) dependence to a \(\log \kappa _A\) dependence for circuit diameter bounds.

Theorem 1.1

The circuit diameter of a system in the form (P) with constraint matrix \(A\in \mathbb {R}^{m\times n}\) is \(O(m \min \{m, n- m\}\log (m+\kappa _A))\).

The proof in Sect. 3 is via a simple ‘shoot towards the optimum’ scheme. We need the well-known concept of conformal circuit decompositions. We say that \(x,y\in \mathbb {R}^n\) are sign-compatible if \(x_iy_i\ge 0\) for all \(i\in [n]\). We write \(x\sqsubseteq y\) if they are sign-compatible and further \(|x_i|\le |y_i|\) for all \(i\in [n]\). It follows from Carathéodory’s theorem and Minkowski–Weyl theorem that for any linear space \(W\subseteq \mathbb {R}^n\) and \(x\in W\), there exists a decomposition \(x=\sum _{j=1}^k h^{(j)}\) such that \(h^{(j)}\in \mathcal {E}(W)\), \(h^{(j)}\sqsubseteq x\) for all \(j\in [k]\) and \(k\le \dim (W)\). This is called a conformal circuit decomposition of x (see also Definition 2.2 and Lemma 2.3 below).
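For small instances, such a decomposition can be computed greedily (our sketch, reusing circuits_and_elementary_vectors from the earlier sketch): each round subtracts the largest multiple \(tg\sqsubseteq x\) of a sign-compatible elementary vector, zeroing at least one coordinate of the residual, and hence produces at most \(|\textrm{supp}(x)|\) parts (cf. Lemma 2.3).

```python
def conformal_decomposition(A, x, tol=1e-9):
    """Greedy conformal circuit decomposition (brute force, tiny instances)."""
    elem = [g for _, g in circuits_and_elementary_vectors(A)]
    elem += [-g for g in elem]                  # both signings of each circuit
    x = np.array(x, dtype=float)
    parts = []
    while np.max(np.abs(x)) > tol:
        for g in elem:
            S = np.abs(g) > tol
            inside = not S[np.abs(x) <= tol].any()   # supp(g) within supp(x)
            if inside and (x[S] * g[S] > 0).all():   # sign-compatible with x
                t = np.min(x[S] / g[S])              # largest t with t*g below x
                parts.append(t * g)
                x = x - t * g                        # a coordinate becomes zero
                break
        else:       # cannot happen for x in ker(A), by the decomposition lemma
            raise RuntimeError("no sign-compatible elementary vector found")
    return parts    # the parts sum to the input x, and each part h satisfies h ⊑ x
```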

Let \(B\subseteq [n]\) be a feasible basis and \(N=[n]\setminus B\), i.e., \(x^*=(A_B^{-1}b,\mathbb {0}_N) \ge \mathbb {0}_n\) is a basic feasible solution. This is the unique optimal solution to (LP) for the cost function \(c=(\mathbb {0}_B,\mathbbm {1}_N)\). Let \(x^{(0)}\in P\) be an arbitrary vertex. We may assume that \(n\le 2m\), by restricting to the union of the support of \(x^*\) and \(x^{(0)}\), and setting all other variables to 0. For the current iterate \(x^{(t)}\), let us consider a conformal circuit decomposition \(x^*-x^{(t)}= \sum _{j=1}^k h^{(j)}\). Note that the existence of such a decomposition does not yield a circuit diameter bound of n, due to the maximality requirement in the definition of circuit walks. For each \(j\in [k]\), \(x^{(t)}+h^{(j)}\in P\), but there might be a larger augmentation \(x^{(t)}+\alpha h^{(j)}\in P\) for some \(\alpha >1\).

Still, one can use this decomposition to construct a circuit walk. Let us pick the most improving circuit from the decomposition, i.e., the one maximizing \(-\left\langle c, h^{(j)} \right\rangle =\Vert h^{(j)}_N\Vert _1\), and obtain \(x^{(t+1)}=x^{(t)}+\alpha ^{(t)} h^{(j)}\) for the maximum stepsize \(\alpha ^{(t)}\ge 1\). The proof of Theorem 1.1 is based on analyzing this procedure. The first key observation is that \(\left\langle c, x^{(t)} \right\rangle =\Vert x^{(t)}_N\Vert _1\) decreases geometrically. Then, we look at the sets of indices \(L_t=\{i\in [n]:\, x^*_i>n\kappa _A\Vert x^{(t)}_N\Vert _1\}\) and \(R_t=\{i\in [n]:\, x^{(t)}_i\le (n - m)x^*_i\}\), and show that indices never leave these sets once they enter. Moreover, a new index is added to either set every \(O(m\log (m+\kappa _A))\) iterations. In Sect. 4, we extend this bound to the setting with upper bounds on the variables.

Theorem 1.2

The circuit diameter of a system in the form \(Ax=b\), \(\mathbb {0}\le x\le u\) with constraint matrix \(A\in \mathbb {R}^{m\times n}\) is \(O(m \min \{m, n - m\}\log (m+\kappa _A) + (n-m)\log n)\).

There is a straightforward reduction from the capacitated form to (P) by adding n slack variables; however, this would give an \(O(n^2\log (n+\kappa _A))\) bound. For the stronger bound, we use a preprocessing that involves cancelling circuits in the support of the current solution; this eliminates all but O(m) of the capacity bounds in \(O(n\log n)\) iterations, independently of \(\kappa _A\).

For rational input, \(\log (\kappa _A)=O({\text {size}}(A))\) where \({\text {size}}(A)\) denotes the total encoding length of A [13]. Hence, our result yields an \(O(m\min \{m, n - m\} {\text {size}}(A)+n\log n)\) diameter bound on \(Ax=b\), \(\mathbb {0}\le x\le u\). This can be compared with the bounds \(O(n {\text {size}}(A,b))\) using deepest descent augmentation steps in [15, 16], where \({\text {size}}(A,b)\) is the encoding length of (Ab). (Such a bound holds for every augmentation rule that decreases the optimality gap geometrically, including the minimum-ratio circuit rule discussed below.) Note that our bound is independent of b. Furthermore, it is also applicable to systems given by irrational inputs, in which case arguments based on subdeterminants and bit-complexity cannot be used.

In light of these results, the next important step towards the polynomial Hirsch conjecture might be to show a poly\((n,\log \kappa _A)\) bound on the combinatorial diameter of (P). Note that—in contrast with the circuit diameter—not even a poly\((n,{\text {size}}(A,b))\) bound is known. In this context, the best known general bound is \(O((n-m)^{3} m \kappa _{A}\log (\kappa _{A}+n))\) implied by [12].

Circuit augmentation algorithms. The diameter bounds in Theorems 1.1 and 1.2 rely on knowing the optimal solution \(x^*\); thus, they do not provide efficient LP algorithms. We next present circuit augmentation algorithms with poly\((n,m,\log \kappa _A)\) bounds on the number of iterations. Such algorithms require subroutines for finding augmenting circuits. In many cases, such subroutines are LPs themselves. However, they may be of a simpler form, and might be easier to solve in practice. Borgwardt and Viss [9] exhibit an implementation of a steepest-descent circuit augmentation algorithm with encouraging computational results.

We assume that a subroutine Ratio-Circuit(A, c, w) is available; this implements the well-known minimum-ratio circuit rule. It takes as input a matrix \(A\in \mathbb {R}^{m\times n}\), \(c\in \mathbb {R}^n\), \(w\in (\mathbb {R}_{++}\cup \{\infty \})^n\), and returns a basic optimal solution to the system

$$\begin{aligned} \min \; \left\langle c, z \right\rangle \, \quad \mathrm {s.t.}\quad Az =\mathbb {0}\,, \, \left\langle w, z^- \right\rangle \le 1\,, \end{aligned}$$
(1)

where \((z^-)_i:= \max \{0,-z_i\}\) for \(i\in [n]\). Here, we use the convention \(w_iz_i = 0\) if \(w_i = \infty \) and \(z_i = 0\). This system can be equivalently written as an LP using auxiliary variables. If bounded, a basic optimal solution is either \(\mathbb {0}\) or an elementary vector \(z\in \mathcal {E}(A)\) that minimizes \(\left\langle c, z \right\rangle /\left\langle w, z^- \right\rangle \).
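For concreteness, one standard reformulation (our rendering of the auxiliary-variable LP mentioned above) splits \(z=z^+-z^-\):

$$\begin{aligned} \min \; \left\langle c, z^+ \right\rangle - \left\langle c, z^- \right\rangle \quad \mathrm {s.t.}\quad Az^+ - Az^- =\mathbb {0}\,,\; \left\langle w, z^- \right\rangle \le 1\,,\; z^+, z^- \ge \mathbb {0}\,, \end{aligned}$$

where coordinates with \(w_i=\infty \) are handled by fixing \(z^-_i=0\). Every feasible z of (1) maps to \((z^+,z^-)\) with the same objective value, and conversely every feasible \((z^+,z^-)\) yields a feasible \(z=z^+-z^-\) for (1), since \((z)^-\le z^-\) coordinatewise; hence the two programs have equal optimal values.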

Given \(x\in P\), we use weights \(w_i=1/x_i\) (with \(w_i=\infty \) if \(x_i=0\)). For minimum-cost flow problems, this rule was proposed by Wallacher [39]; such a cycle can be found in strongly polynomial time for flows. The main advantage of this rule is that the optimality gap decreases by a factor \(1-1/n\) in every iteration. The rule, along with the same convergence property, naturally extends to linear programming [28]; it has found several combinatorial applications, e.g., [40, 41], and has also been used in the context of integer programming [33].

On the negative side, Wallacher’s algorithm is not strongly polynomial: it does not terminate finitely for minimum-cost flows, as shown in [28]. In contrast, our algorithms achieve a strongly polynomial running time whenever \(\kappa _A\le 2^{\textrm{poly}(n)}\). An important modification is the occasional use of a second type of circuit augmentation step Support-Circuit that removes circuits in the support of the current (non-basic) iterate \(x^{(t)}\) (see Subroutine 2.1); this can be implemented using simple linear algebra. Our first result addresses the feasibility setting:

Theorem 1.3

Consider an LP of the form (LP) with cost function \(c = (\mathbb {0}_{[n]{\setminus } N}, \mathbbm {1}_N)\) for some \(N\subseteq [n]\). There exists a circuit augmentation algorithm that either finds a solution x such that \(x_N=\mathbb {0}\) or a dual certificate that no such solution exists, using \(O(mn\log (n+\kappa _A))\) Ratio-Circuit and \((m+1)n\) Support-Circuit augmentation steps.

Such problems typically arise in Phase I of the Simplex method when we add auxiliary variables in order to find a feasible solution. The algorithm is presented in Sect. 6. The analysis extends that of Theorem 1.1, tracking large coordinates \(x_i^{(t)}\). Our second result considers general optimization:

Theorem 1.4

Consider an LP of the form (LP). There exists a circuit augmentation algorithm that finds an optimal solution or concludes unboundedness using \(O(mn^2\log (n+\kappa _A))\) Ratio-Circuit and \((m+1)n^2\) Support-Circuit augmentation steps.

The proof is given in Sect. 7. The main subroutine identifies a new index \(i\in [n]\) such that \(x^{(t)}_i=0\) in the current iteration and \(x^*_i=0\) in an optimal solution; we henceforth fix this variable to 0. To derive this conclusion, at the end of each phase the current iterate \(x^{(t)}\) will be optimal to (LP) with a slightly modified cost function \({\tilde{c}}\); the conclusion follows using a proximity argument (Theorem 5.4). The overall algorithm repeats this subroutine n times. The subroutine is reminiscent of the feasibility algorithm (Theorem 1.3) with the following main difference: whenever we identify a new ‘large’ coordinate, we slightly perturb the cost function.

Comparison to black-box LP approaches. An important milestone towards strongly polynomial linear programming was Tardos’s 1986 paper [35] on solving (LP) in time poly\((n,m,\log \Delta _A)\), where \(\Delta _A\) is the maximum subdeterminant of A. Her algorithm makes O(nm) calls to a weakly polynomial LP solver for instances with small integer capacities and costs, and uses proximity arguments to gradually learn the support of an optimal solution. This approach was extended to the real model of computation for a poly\((n,m,\log \kappa _A)\) bound [17]. The latter result uses proximity arguments with circuit imbalances \(\kappa _A\), and eliminates all dependence on bit-complexity.

The proximity tool Theorem 5.4 derives from [17], and our circuit augmentation algorithms are inspired by the feasibility and optimization algorithms of [17]. However, using circuit augmentation oracles instead of an approximate LP oracle changes the setup. Our arguments become simpler since we proceed through a sequence of feasible solutions, whereas much effort in [17] is needed to deal with infeasibility of the solutions returned by the approximate solver. On the other hand, we need to be more careful as all steps must be implemented using circuit augmentations in the original system, in contrast to the higher degree of freedom in [17] where we can make approximate solver calls to arbitrary modified versions of the input LP.

Organization of the paper. The rest of the paper is organized as follows. We first provide the necessary preliminaries in Sect. 2. In Sect. 3, we upper bound the circuit diameter of (P). In Sect. 4, this bound is extended to the setting with upper bounds on the variables. Then, we develop circuit-augmentation algorithms for solving (LP). We first present the required proximity results in Sect. 5, Sect. 6 contains the algorithm for finding a feasible solution, whereas Sect. 7 contains the algorithm for solving (LP) given an initial feasible solution. Section 8 shows how circuits in LPs of more general forms can be reduced to the notion used in this paper.

2 Preliminaries

Let \([n]=\{1,2,\ldots ,n\}\). Let \(\mathbb {R}_+\) and \(\mathbb {R}_{++}\) be the set of nonnegative and positive real numbers respectively. For \(\alpha \in \mathbb {R}\), we denote \(\alpha ^+=\max \{0,\alpha \}\) and \(\alpha ^-=\max \{0,-\alpha \}\). For a vector \(z \in \mathbb {R}^n\), we define \(z^+,z^-\in \mathbb {R}^n\) as \((z^+)_i=(z_i)^+\), \((z^-)_i=(z_i)^-\) for \(i\in [n]\). For \(z\in \mathbb {R}^n\), we let \(\textrm{supp}(z)=\{i\in [n]: z_i\ne 0\}\) denote its support, and \(1/z\in (\mathbb {R}\cup \{\infty \})^n\) denote the vector \((1/z_i)_{i\in [n]}\). We use \(\Vert \cdot \Vert _p\) to denote the \(\ell _p\)-norm; we simply write \(\Vert \cdot \Vert \) for \(\Vert \cdot \Vert _2\). For \(A\in \mathbb {R}^{m\times n}\) and \(S\subseteq [n]\), we let \(A_S\in \mathbb {R}^{m\times |S|}\) denote the submatrix corresponding to columns S. We denote \({\text {rk}}(S):= {\text {rk}}(A_S)\), i.e., the rank of the set S in the linear matroid associated with A. A spanning subset of S is a subset \(T\subseteq S\) such that \({\text {rk}}(T) = {\text {rk}}(S)\). The closure of S is defined as \({\text {cl}}(S):= \left\{ i\in [n]:{\text {rk}}(S\cup \{i\}) = {\text {rk}}(S) \right\} \). The dual linear program of (LP) is

$$\begin{aligned} \max \; \left\langle b, y \right\rangle \quad \mathrm {s.t.}\quad A^\top y + s=c\, , \, s \ge \mathbb {0}\, . \end{aligned}$$
(DLP)

Note that y uniquely determines s, and due to the assumption \({\text {rk}}(A) =m\), s also uniquely determines y. For this reason, given a dual feasible solution (ys), we can just focus on y or s.

For \(A\in \mathbb {R}^{m\times n}\), let \(W=\ker (A)\). Recall that \(\mathcal {C}(W) = \mathcal {C}(A)\) and \(\mathcal {E}(W)= \mathcal {E}(A)\) are the set of circuits and elementary vectors in W respectively. Note that every circuit has size at most \(m+1\) because we assumed that \({\text {rk}}(A) = m\). The circuit imbalance measure of W is defined as

$$\begin{aligned} \kappa _W :=\kappa _A :=\max _{g\in \mathcal {E}(W)} \left\{ \frac{|g_i|}{|g_j|}:i,j\in \textrm{supp}(g) \right\} \end{aligned}$$

if \(W\ne \{\mathbb {0}\}\). Otherwise, it is defined as \(\kappa _W:=\kappa _A:=1\). For a linear space \(W\subseteq \mathbb {R}^n\), let \(W^\perp \) denote the orthogonal complement. Thus, for \(W=\ker (A)\), \(W^\perp ={\text {Im}}(A^\top )\). According to the next lemma, circuit imbalances are self-dual.

Lemma 2.1

([13]) For a linear space \(W\subseteq \mathbb {R}^n\), we have \(\kappa _W=\kappa _{W^\perp }\).

For P as in (P), \(x \in P\) and an elementary vector \(g \in \mathcal {E}(A){\setminus } \mathbb {R}^n_+\), we let \({\text {aug}}_P(x, g):= x + \alpha g\) where \(\alpha = \max \{\bar{\alpha }: x + \bar{\alpha }g \in P\}\).

Definition 2.2

[14] We say that \(x,y\in \mathbb {R}^n\) are sign-compatible if \(x_iy_i\ge 0\) for all \(i\in [n]\). We write \(x\sqsubseteq y\) if they are sign-compatible and further \(|x_i|\le |y_i|\) for all \(i\in [n]\). For a linear space \(W\subseteq \mathbb {R}^n\) and \(x\in W\), a conformal circuit decomposition of x is a set of elementary vectors \(h^{(1)},h^{(2)},\dots ,h^{(k)}\) in W such that \(x=\sum _{j=1}^k h^{(j)}\), \(k\le \dim (W)\), and \(h^{(j)}\sqsubseteq x\) for all \(j\in [k]\).

The following lemma shows that every vector in a linear space has a conformal circuit decomposition. It is a simple corollary of the Minkowski–Weyl and Carathéodory theorems.

Lemma 2.3

For a linear space \(W\subseteq \mathbb {R}^n\), every \(x\in W\) has a conformal circuit decomposition \(x = \sum _{j=1}^k h^{(j)}\) such that \(k \le \min \{\dim (W), |\textrm{supp}(x)|\}\).

2.1 Circuit oracles

In Sects. 4, 6, and 7, we use a simple circuit-finding subroutine Support-Circuit(A, c, x, S) to identify circuits in the support of a solution x. It can be implemented easily using Gaussian elimination. Note that the constraint \(\left\langle c, z \right\rangle \le 0\) is not restrictive, as \(-z\) is also an elementary vector whenever z is.

Subroutine 2.1: Support-Circuit(A, c, x, S). Given \(x\in P\) and \(S\subseteq [n]\), it returns an elementary vector \(g\in \mathcal {E}(A)\) with \(\textrm{supp}(g)\subseteq \textrm{supp}(x)\), \(\textrm{supp}(g)\cap S\ne \emptyset \) and \(\left\langle c, g \right\rangle \le 0\), if one exists.
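A possible realization (our sketch; exact arithmetic is emulated by a tolerance, and the name support_circuit is ours) finds a column \(i\in S\cap \textrm{supp}(x)\) spanned by the remaining support columns, shrinks that set to an inclusion-minimal spanning set T, so that \(T\cup \{i\}\) is a circuit, and reads off the elementary vector.

```python
import numpy as np

def support_circuit(A, c, x, S, tol=1e-9):
    """Return g in E(A) with supp(g) in supp(x), supp(g) meeting S, <c,g> <= 0."""
    rk = lambda cols: np.linalg.matrix_rank(A[:, cols]) if cols else 0
    supp = [j for j in range(A.shape[1]) if abs(x[j]) > tol]
    for i in [j for j in S if j in supp]:
        T = [j for j in supp if j != i]
        if rk(T + [i]) > rk(T):
            continue                     # no circuit through i inside supp(x)
        for j in list(T):                # greedy deletion keeps A_i spanned,
            T2 = [k for k in T if k != j]
            if rk(T2 + [i]) == rk(T2):   # so the final T is inclusion-minimal
                T = T2
        lam = np.linalg.lstsq(A[:, T], A[:, i], rcond=None)[0]
        g = np.zeros(A.shape[1])
        g[T] = lam
        g[i] = -1.0                      # A g = A_T lam - A_i = 0
        return g if np.dot(c, g) <= 0 else -g
    return None                          # no circuit in supp(x) intersects S
```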

The circuit augmentation algorithms in Sects. 6 and 7 will use the subroutine Ratio-Circuit(A, c, w).

Subroutine: Ratio-Circuit(A, c, w). It returns a basic optimal solution g to the program (2), which coincides with (1) above, together with an optimal solution \((y,s,\lambda )\) to its dual (3): maximize \(-\lambda \) subject to \(A^\top y + s=c\) and \(\mathbb {0}\le s\le \lambda w\).

Note that (2) can be reformulated as an LP using additional variables, and its dual LP can be equivalently written as (3). Recall that we use the convention \(w_iz_i = 0\) if \(w_i = \infty \) and \(z_i = 0\) in (2). The opposite convention is used in (3), i.e., \(\lambda w_i = \infty \) if \(\lambda = 0\) and \(w_i = \infty \). If (2) is bounded, then a basic optimal solution is either \(\mathbb {0}\) or an elementary vector \(z\in \mathcal {E}(A)\) that minimizes \(\left\langle c, z \right\rangle /\left\langle w, z^- \right\rangle \). Moreover, observe that every feasible solution to (3) is also feasible to (DLP).
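As an illustration, Ratio-Circuit can be prototyped on the split reformulation with scipy.optimize.linprog (our sketch and naming; we do not verify here that the solver's optimum is a basic solution of (2), i.e., that the returned vector is elementary).

```python
import numpy as np
from scipy.optimize import linprog

def ratio_circuit(A, c, w):
    """Minimize <c,z> s.t. Az = 0, <w, z^-> <= 1, via z = z_plus - z_minus."""
    m, n = A.shape
    w = np.asarray(w, dtype=float)
    finite = np.isfinite(w)
    obj = np.concatenate([c, -c])                  # <c, z+> - <c, z->
    A_eq = np.hstack([A, -A])                      # A(z+ - z-) = 0
    A_ub = np.concatenate([np.zeros(n), np.where(finite, w, 0.0)])[None, :]
    bounds = [(0, None)] * n \
        + [(0, None) if finite[i] else (0, 0) for i in range(n)]  # w_i = inf
    res = linprog(obj, A_ub=A_ub, b_ub=[1.0], A_eq=A_eq, b_eq=np.zeros(m),
                  bounds=bounds)
    if not res.success:                            # e.g., (2) is unbounded
        return None
    return res.x[:n] - res.x[n:]                   # candidate elementary vector
```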

We will use the following lemma, a direct consequence of [18, Lemma 4.3].

Lemma 2.4

Given \(A\in \mathbb {R}^{m\times n}\), \(W=\ker (A)\), \(\ell \in (\mathbb {R}\cup \{-\infty \})^n\) and \(u\in (\mathbb {R}\cup \{\infty \})^n\), let \(r\in W\) such that \(\ell \le r\le u\). In \({\text {poly}}(m,n)\) time, we can find a vector \(r'\in W\) such that \(\ell \le r'\le u\) and \(\Vert r'\Vert _\infty \le \kappa _A \Vert \ell ^+ + u^-\Vert _1\).

This lemma, together with Lemma 2.1, allows us to assume that the optimal dual solution s returned by Ratio-Circuit satisfies

$$\begin{aligned} \Vert s\Vert _\infty \le 2\kappa _A \Vert c\Vert _1. \end{aligned}$$
(4)

To see this, let \((y,s,\lambda )\) be an optimal solution to (3). We know that \(-c \le s-c \le \lambda w - c\). Let \(\ell :=-c\), \(r :=s-c\) and \(u :=\lambda w - c\). By Lemma 2.4, we can in \({\text {poly}}(m,n)\) time compute \(r'\in W^\perp \) such that \(\ell \le r'\le u\) and

$$\begin{aligned}\Vert r'\Vert _\infty \le \kappa _{W^\perp } \Vert \ell ^+ + u^-\Vert _1 \le \kappa _{W^\perp } \Vert c^- + c^+\Vert _1 = \kappa _{W^\perp } \Vert c\Vert _1.\end{aligned}$$

Then, \(s':=r'+c\) is an optimal solution to (3) which satisfies

$$\begin{aligned}\Vert s'\Vert _\infty \le \Vert r'\Vert _\infty + \Vert c\Vert _\infty \le (\kappa _{W^\perp }+1)\Vert c\Vert _1 \le 2\kappa _{W^\perp }\Vert c\Vert _1.\end{aligned}$$

Thus, (4) follows using Lemma 2.1, since \(\kappa _{W^\perp }=\kappa _W=\kappa _A\).

The following lemma is well-known, see e.g., [28, Lemma 2.2].

Lemma 2.5

Let \(\textrm{OPT}\) be the optimal value of (LP), and assume that it is finite. Given a feasible solution x to (LP), let g be the optimal solution to (2) returned by Ratio-Circuit(A, c, 1/x).

(i) If \(\left\langle c, g \right\rangle = 0\), then x is optimal to (LP).

(ii) If \(\left\langle c, g \right\rangle <0\), then letting \(x'={\text {aug}}_P(x, g)\), we have \(\alpha \ge 1\) for the augmentation stepsize and

$$\begin{aligned}\left\langle c, x' \right\rangle -\textrm{OPT}\le \left( 1-\frac{1}{|\textrm{supp}(x)|}\right) \left( \left\langle c, x \right\rangle -\textrm{OPT}\right) \,.\end{aligned}$$

Proof

We only prove (ii) because (i) is trivial. The stepsize bound \(\alpha \ge 1\) follows since \(\left\langle 1/x, g^- \right\rangle \le 1\); thus, \(x+g\in P\). Let \(x^*\) be an optimal solution to (LP), and let \(z=(x^*-x)/|\textrm{supp}(x)|\). Note that \(g\ngeq \mathbb {0}\), as otherwise (2) is unbounded. So, \(x\ne \mathbb {0}\): since \(g_i<0\) for some i, feasibility of g in (2) forces \(w_i=1/x_i<\infty \), i.e., \(x_i>0\). Then, z is feasible to (2) for \(w=1/x\). Therefore,

$$\begin{aligned} \alpha \left\langle c, g \right\rangle \le \left\langle c, g \right\rangle \le \left\langle c, z \right\rangle =\frac{\textrm{OPT}-\left\langle c, x \right\rangle }{|\textrm{supp}(x)|}\,, \end{aligned}$$

implying the second claim. \(\square \)

Remark 2.6

It is worth noting that Lemma 2.5 shows that applying Ratio-Circuit to vectors x with small support gives better convergence guarantees. Algorithms 3 and 4 for feasibility and optimization in Sects. 6 and 7 apply Ratio-Circuit to vectors x which, in general, have large support \(|\textrm{supp}(x)| = \Theta (n)\). These algorithms could be reformulated so that one first runs Support-Circuit to reduce the support to size O(m), and only then runs Ratio-Circuit. The guarantees of Lemma 2.5 then imply that, to reduce the optimality gap by a constant factor, O(n) calls to Ratio-Circuit could be replaced by only O(m) calls. On the other hand, this comes at the cost of n additional calls to Support-Circuit for every call to Ratio-Circuit.

2.2 A norm bound

We now formulate a proximity bound asserting that if the columns of A outside N are linearly independent, then we can bound the \(\ell _\infty \)-norm of any vector in \(\ker (A)\) by the norm of its coordinates in N. This can be seen as a special case of Hoffman-proximity results; see Sect. 5 for more such results and references.

Lemma 2.7

For \(A\in \mathbb {R}^{m\times n}\), let \(N\subseteq [n]\) such that \(A_{[n]\setminus N}\) has full column rank. Then, for any \(z\in \ker (A)\), we have \(\Vert z\Vert _\infty \le \kappa _A\Vert z_N\Vert _1\).

Proof

Let \(h^{(1)},\ldots , h^{(k)}\) be a conformal circuit decomposition of z. Then, \(\Vert z\Vert _\infty \le \sum _{t=1}^k \Vert h^{(t)}\Vert _\infty \). For each \(h^{(t)}\), we have \(\textrm{supp}(h^{(t)})\cap N\ne \emptyset \) because \(A_{[n]\setminus N}\) has full column rank. Hence, \(\Vert h^{(t)}\Vert _\infty \le \kappa _A |h^{(t)}_{j(t)}|\) for some \(j(t)\in N\). Conformality implies that

$$\begin{aligned} \sum _{t=1}^k \left| h^{(t)}_{j(t)} \right| = \sum _{s\in N}\sum _{j(t) = s} \left| h^{(t)}_{j(t)} \right| \le \sum _{s\in N}|z_s| = \Vert z_N\Vert _1. \end{aligned}$$

The lemma follows by combining all the previous inequalities. \(\square \)

2.3 Estimating circuit imbalances

The circuit augmentation algorithms in Sects. 6 and 7 explicitly use the circuit imbalance measure \(\kappa _A\). However, this is NP-hard to approximate within a factor \(2^{O(n)}\), see [13, 36]. We circumvent this problem using a standard guessing procedure, see e.g., [13, 38]. Instead of \(\kappa _A\), we use an estimate \(\hat{\kappa }\), initialized as \(\hat{\kappa }=n\). Running the algorithm with this estimate either finds the desired feasible or optimal solution (which one can verify), or fails. In case of failure, we conclude that \(\hat{\kappa }<\kappa _A\), and replace \(\hat{\kappa }\) by \(\hat{\kappa }^2\). Since the running time of the algorithms is linear in \(\log (n+\hat{\kappa })\), the running time of all runs will be dominated by the last run, giving the desired bound. For simplicity, the algorithm descriptions use the explicit value \(\kappa _A\).
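In pseudocode, the guessing wrapper looks as follows (a minimal sketch; run_with is a hypothetical stand-in for one run of the feasibility or optimization algorithm with a given estimate, returning a verified solution or None on failure).

```python
def solve_with_kappa_guessing(run_with, n):
    kappa_hat = n                      # initial estimate
    while True:
        result = run_with(kappa_hat)
        if result is not None:         # verified feasible/optimal solution
            return result
        kappa_hat = kappa_hat ** 2     # failure certifies kappa_hat < kappa_A
```

Since the running time of each run is linear in \(\log (n+\hat{\kappa })\), squaring the estimate doubles \(\log \hat{\kappa }\), so the final run dominates the total cost.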

3 The circuit diameter bound

In this section, we show Theorem 1.1, namely the bound \(O(m\min \{m,n-m\}\log (m+\kappa _A))\) on the circuit diameter of a polyhedron in standard form (P). As outlined in the Introduction, let \(B\subseteq [n]\) be a feasible basis and \(N=[n]\setminus B\) such that \(x^*=(A_B^{-1}b,\mathbb {0}_N)\) is a basic solution to (LP). We can assume \(n\le 2m\): the union of the supports of the starting vertex \(x^{(0)}\) and the target vertex \(x^*\) is at most 2m; we can fix all other variables to 0. Defining \({\tilde{n}}:= |\textrm{supp}(x^*) \cup \textrm{supp}(x^{(0)})| \le 2m\) and restricting A to these columns, we show a circuit diameter bound \(O({\tilde{n}}({\tilde{n}}-m)\log (m+\kappa _A))\). This implies Theorem 1.1 for general n. In the rest of this section, we use n instead of \({\tilde{n}}\), but assume \(n\le 2m\). The simple ‘shoot towards the optimum’ procedure is shown in Algorithm 1.

Algorithm 1: Diameter-Bound
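For tiny instances, the procedure can be rendered as follows (our illustration, reusing conformal_decomposition and maximal_stepsize from the earlier sketches; it is not the paper's pseudocode).

```python
def diameter_bound_walk(A, x0, x_star, N, tol=1e-9):
    """Circuit walk from x0 to the basic solution x_star with x*_N = 0."""
    x, walk = np.array(x0, dtype=float), [np.array(x0, dtype=float)]
    while np.sum(np.abs(x[N])) > tol:          # until x_N = 0, i.e. x = x*
        parts = conformal_decomposition(A, x_star - x)
        g = max(parts, key=lambda h: np.sum(np.abs(h[N])))   # most improving
        x = x + maximal_stepsize(x, g) * g     # maximal step, alpha >= 1
        walk.append(x.copy())
    return walk
```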

A priori, even finite termination is not clear. First, we show that the ‘cost’ \(\Vert x^{(t)}_N\Vert _1\) decreases geometrically. This is a consequence of choosing the most improving circuit \(g^{(t)}\) in each iteration.

Lemma 3.1

For every iteration \(t\ge 0\), we have \(\Vert x^{(t+1)}_N\Vert _1 \le (1-\frac{1}{n-m})\Vert x_N^{(t)}\Vert _1\). Furthermore, \(|x_i^{(t+1)} - x_i^{(t)}| \le (n - m) |x_i^* - x_i^{(t)}|\) for all \(i \in [n]\).

Proof

Let \(h^{(1)}, \ldots , h^{(k)}\) with \(k \le n - m\) be the conformal circuit decomposition of \(x^* - x^{(t)}\) used in iteration t of Algorithm 1. Note that \(h^{(i)}_N \le \mathbb {0}_N\) for all \(i \in [k]\) because \(x_N^* = \mathbb {0}_N\) and \(x^{(t)} \ge \mathbb {0}\). By our choice of \(g^{(t)}\),

$$\begin{aligned}\Vert g_N^{(t)}\Vert _1 = \max _{i \in [k]} \Vert h^{(i)}_N\Vert _1 \ge \frac{1}{k} \sum _{i \in [k]} \Vert h^{(i)}_N\Vert _1 = \frac{1}{k} \Vert x_N^{(t)}\Vert _1\end{aligned}$$

where the last equality uses the conformality of the decomposition. Let \(\alpha ^{(t)}\) be such that \(x^{(t+1)} = x^{(t)} + \alpha ^{(t)}g^{(t)}\). Clearly, \(\alpha ^{(t)}\ge 1\) because \(x^{(t)}+g^{(t)}\in P\). Hence,

$$\begin{aligned} \big \Vert x^{(t+1)}_N \big \Vert _1&= \big \Vert x^{(t)}_N + \alpha ^{(t)}g^{(t)}_N \big \Vert _1 \le \big \Vert x_N^{(t)} + g_N^{(t)} \big \Vert _1 \\&= \big \Vert x_N^{(t)} \big \Vert _1- \big \Vert g_N^{(t)} \big \Vert _1\le \left( 1-\frac{1}{k}\right) \big \Vert x_N^{(t)} \big \Vert _1\,. \end{aligned}$$

Further, using \(\mathbb {0}\le x_N^{(t+1)}\le x_N^{(t)}\), we see that

$$\begin{aligned} \alpha ^{(t)} = \frac{ \big \Vert x_N^{(t+1)} - x_N^{(t)} \big \Vert _1}{ \big \Vert g_N^{(t)} \big \Vert _1} \le \frac{\big \Vert x_N^{(t)} \big \Vert _1}{\big \Vert g_N^{(t)} \big \Vert _1} \le k, \end{aligned}$$

and so for all \(i\in [n]\) we have \(|x_i^{(t+1)} - x_i^{(t)}| = \alpha ^{(t)} |g_i^{(t)}| \le k |g_i^{(t)}| \le k |x_i^* - x_i^{(t)}| \). \(\square \)

Our convergence proof is based on analyzing the following sets

$$\begin{aligned} L_t&:=\{i\in [n]:\, x_i^* > n\kappa _A \Vert x_N^{(t)}\Vert _1\}\,, \qquad T_t:=[n]\setminus L_t\,,\\ R_t&:= \{i\in [n]:\, x^{(t)}_i \le (n-m)x^*_i\}\,. \end{aligned}$$

The set \(L_t\) consists of indices i where \(x_i^*\) is much larger than the current ‘cost’ \(\Vert x_N^{(t)}\Vert _1\). On the other hand, the set \(R_t\) consists of indices i where \(x_i^{(t)}\) is not much above \(x_i^*\). The next lemma shows that the sets \(L_t\) and \(R_t\) are monotonically growing.

Lemma 3.2

For every iteration \(t\ge 0\), we have \(L_t\subseteq L_{t+1}\subseteq B\) and \(R_t\subseteq R_{t+1}\).

Proof

Clearly, \(L_t \subseteq L_{t+1}\) as \(\Vert x_N^{(t)}\Vert _1\) is monotonically decreasing by Lemma 3.1, and \(L_t \subseteq B\) as \(x_N^* = \mathbb {0}_N\). Next, let \(j \in R_t\). If \(x_j^{(t)} \ge x_j^*\), then \(x_j^{(t+1)} \le x_j^{(t)}\) by conformality. If \(x_j^{(t)} < x_j^*\), then \(x^{(t+1)}_j \le x^{(t)}_j + (n -m) (x_j^* - x_j^{(t)}) \le (n - m)x_j^*\) by Lemma 3.1. In both cases, we conclude that \(j \in R_{t+1}\). \(\square \)

Our goal is to show that \(R_t\) or \(L_t\) is extended within \(O((n-m)\log (n+\kappa _A))\) iterations. By the maximality of the augmentation, we know that at least one variable is set to zero in every iteration t. The following lemma shows that these variables do not lie in \(L_t\).

Lemma 3.3

For every iteration \(t\ge 0\), we have \(\emptyset \ne \textrm{supp}(x^{(t)}){\setminus } \textrm{supp}(x^{(t+1)}) \subseteq T_t\).

Proof

Let \(i \in \textrm{supp}(x^{(t)}) \setminus \textrm{supp}(x^{(t+1)})\). Such a variable exists by the maximality of the augmentation. Clearly, \(x^{(t+1)}_i = 0\). Applying Lemma 2.7 to \(x^{(t+1)} - x^*\in \ker (A)\) yields

$$\begin{aligned} x^*_i \le \Vert x^{(t+1)} - x^*\Vert _\infty \le \kappa _A \Vert x_N^{(t+1)} - x_N^*\Vert _1 = \kappa _A\Vert x_N^{(t+1)}\Vert _1 \le \kappa _A\Vert x_N^{(t)}\Vert _1. \end{aligned}$$

The equality is due to \(x^*_N = \mathbb {0}\), while the last inequality follows from Lemma 3.1. So, \(i \in T_t\). \(\square \)

Clearly, any variable i that is set to zero in iteration t belongs to \(R_{t+1}\). So, if \(i\notin R_t\), then we make progress as \(R_t\subsetneq R_{t+1}\). Note that this is always the case if \(i\in N\). We show that if \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty \) is sufficiently large, then \(i\notin R_t\).

Lemma 3.4

If \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty > 2mn^2\kappa _A^2\left\| x^*_{T_t}\right\| _\infty \) for some iteration t, then \(R_t \subsetneq R_{t+1}\).

Proof

Let \(i\in \textrm{supp}(x^{(t)}){\setminus } \textrm{supp}(x^{(t+1)})\). Clearly, \(i\in R_{t+1}\) because \(x^{(t+1)}_i = 0\). So, it suffices to show that \(i\notin R_t\). Since \(x^{(t+1)} - x^{(t)}\) is an elementary vector, we have \(\Vert x^{(t+1)} - x^{(t)}\Vert _\infty \le \kappa _A |x^{(t+1)}_i - x^{(t)}_i| = \kappa _A x^{(t)}_i\). As \(|\textrm{supp}(x^{(t+1)}-x^{(t)})|\le m+1\), we obtain

$$\begin{aligned} \big \Vert x_N^{(t)} - x_N^{(t+1)} \big \Vert _1 \le (m+1) \big \Vert x^{(t)} - x^{(t+1)} \big \Vert _\infty \le (m+1)\kappa _A x_i^{(t)}\le 2m\kappa _A x_i^{(t)}\,. \end{aligned}$$
(5)

Let \(h^{(1)}, \ldots , h^{(k)}\) with \(k \le n -m \) be the conformal circuit decomposition of \(x^* - x^{(t)}\) used in iteration t of Algorithm 1. Let \(j \in T_t\) such that \(|x^{(t)}_j - x_j^*| = \Vert x_{T_t}^{(t)} - x_{T_t}^*\Vert _\infty \). There exists \({\widetilde{h}}=h^{(\ell )}\) for some \(\ell \in [k]\) in this decomposition such that \(|{\widetilde{h}}_j| \ge \frac{1}{k}|x^{(t)}_j - x^*_j|\). Since \(A_B\) has full column rank, we have \(\textrm{supp}({\widetilde{h}}) \cap N \ne \emptyset \) and so

$$\begin{aligned} \Vert {\widetilde{h}}_N\Vert _1 \ge \frac{|{\widetilde{h}}_j|}{\kappa _A} \ge \frac{|x^{(t)}_j - x^*_j|}{k\kappa _A}\,. \end{aligned}$$
(6)

From (5), (6) and noting that \(\Vert {\widetilde{h}}_N\Vert _1 \le \Vert g^{(t)}_N\Vert _1\le \Vert x_N^{(t)} - x_N^{(t+1)}\Vert _1\) by our choice of \(g^{(t)}\), we get

$$\begin{aligned} x_i^{(t)} \ge \frac{\Vert x_N^{(t)} - x_N^{(t+1)}\Vert _1}{2m\kappa _A} \ge \frac{\Vert {\widetilde{h}}_N\Vert _1}{2m\kappa _A} \ge \frac{\Vert x_{T_t}^{(t)} - x_{T_t}^*\Vert _\infty }{2mk\kappa _A^2}\,. \end{aligned}$$

Thus, if \(\Vert x_{T_t}^{(t)} - x_{T_t}^*\Vert _\infty > 2mn^2\kappa _A^2 \Vert x_{T_t}^*\Vert _\infty \) as in the assumption of the lemma, then \(x_i^{(t)} > n\Vert x_{T_t}^*\Vert _\infty \ge n x_i^*\), where the last inequality is due to \(i\in T_t\) by Lemma 3.3. It follows that \(i \notin R_t\) as desired. \(\square \)

We are ready to give the convergence bound. We have just proved that a large \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty \) guarantees the extension of \(R_t\). Using the geometric decay of \(\Vert x_N^{(t)}\Vert \) (Lemma 3.1), we now show that if \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty \) is small, then \(\Vert x^{(t)}_N\Vert _1\) drops sufficiently such that a new variable enters \(L_t\).

Proof of Theorem 1.1

Recall that we assumed \(n\le 2m\) without loss of generality. In light of Lemma 3.2, it suffices to show that either \(L_t\) or \(R_t\) is extended in every \(O((n-m)\log (n+\kappa _A))\) iterations. If \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty > 2mn^2\kappa _A^2 \left\| x^*_{T_t}\right\| _\infty \), then \(R_t \subsetneq R_{t+1}\) by Lemma 3.4.

So, let us assume that \(\Vert x^{(t)}_{T_t} - x^*_{T_t}\Vert _\infty \le 2mn^2\kappa _A^2 \left\| x^*_{T_t}\right\| _\infty \), that is, \(\Vert x^{(t)}_{T_t}\Vert _\infty \le (2mn^2\kappa _A^2+1) \left\| x^*_{T_t}\right\| _\infty \). We may also assume that \(\Vert x^{(t)}_N\Vert _1>0\), as otherwise \(x^{(t)} = x^*\). By Lemma 3.1, there is an iteration \(r = t + O((n - m)\log (n + \kappa _A))\) such that \(n^2\kappa _A (2mn^2\kappa _A^2 + 1) \Vert x_N^{(r)}\Vert _1 < \Vert x_N^{(t)}\Vert _1\). Hence,

$$\begin{aligned} (2mn^2\kappa _A^2 + 1)\Vert x_{T_t}^*\Vert _\infty&\ge \Vert x_{T_t}^{(t)}\Vert _\infty \ge \Vert x_{N}^{(t)}\Vert _\infty \\&\ge \frac{1}{n}\Vert x_{N}^{(t)}\Vert _1 > n\kappa _A(2mn^2\kappa _A^2 + 1)\Vert x_{N}^{(r)}\Vert _1\,, \end{aligned}$$

where the second inequality is due to \(N\subseteq T_t\) by Lemma 3.2. Thus, \(\Vert x_{T_t}^*\Vert _\infty > n\kappa _A \Vert x_{N}^{(r)}\Vert _1\) and so \(L_t \subsetneq L_r\). \(\square \)

4 Circuit diameter bound for the capacitated case

In this section we consider diameter bounds for systems of the form

$$\begin{aligned} P_u =\{x\in \mathbb {R}^n:\, Ax=b, \mathbb {0}\le x \le u\}. \end{aligned}$$
(Cap-P)

The theory in Sect. 3 carries over to \(P_u\) at the cost of turning m into n via the standard reformulation

$$\begin{aligned} {\widetilde{P}}_u = \left\{ (x,y) \in \mathbb {R}^{n + n}:\, \begin{bmatrix}A & 0 \\ I & I\end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}, \, x, y \ge \mathbb {0}\right\} , \quad P_u = \{x: (x, y ) \in {\widetilde{P}}_u\}. \end{aligned}$$
(7)
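A direct rendering of (7) (our sketch): the slack variables \(y=u-x\) turn the two-sided bounds into a standard-form system.

```python
import numpy as np

def capacitated_to_standard(A, b, u):
    m, n = A.shape
    A_tilde = np.block([[A, np.zeros((m, n))],
                        [np.eye(n), np.eye(n)]])   # [[A, 0], [I, I]]
    return A_tilde, np.concatenate([b, u])         # x in P_u iff (x, u - x) feasible
```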

Corollary 4.1

The circuit diameter of a system in the form (Cap-P) with constraint matrix \(A\in \mathbb {R}^{m\times n}\) is \(O(n^2\log (n+\kappa _A))\).

Proof

This follows directly from Theorem 1.1 together with the reformulation (7). Let \(\widetilde{A}\) denote the constraint matrix of (7). It is easy to check that \(\kappa _A = \kappa _{\widetilde{A}}\), and that there is a one-to-one mapping between the circuits and maximal circuit augmentations of the two systems. \(\square \)

Intuitively, the polyhedron should not become more complex; related theory in [37] also shows how two-sided bounds can be incorporated in a linear program without significantly changing the complexity of solving the program.

Theorem 1.2 is proved using a new procedure, which we outline below. A basic feasible point \(x^* \in P_u\) is characterised by a partition \(B \cup L \cup H = [n]\) where \(A_B\) is a basis (has full column rank), \(x^*_L = \mathbb {0}_L\) and \(x^*_H = u_H\). In \(O(n \log n)\) iterations, we fix all but 2m variables to the same bound as in \(x^*\); for the remaining system with 2m variables, we can use the standard reformulation.

Algorithm 2 starts with a preprocessing. We let \(S_t\subseteq L\cup H\) denote the set of indices where \(x_i^{(t)}\ne x^*_i\), i.e., we are not yet at the required lower and upper bound. If \(|S_t|\le m\), then we remove the indices in \((L\cup H)\setminus S_t\), and use the diameter bound resulting from the standard embedding as in Corollary 4.1.

As long as \(|S_t|>m\), we proceed as follows. We define the cost function \(c\in \mathbb {R}^n\) by \(c_i=0\) for \(i\in B\), \(c_i=1/u_i\) for \(i\in L\), and \(c_i=-1/u_i\) for \(i\in H\). For this choice, we see that the optimal solution of the LP \(\min _{x\in P_u}\left\langle c, x \right\rangle \) is \(x^*\) with optimal value \(\left\langle c, x^* \right\rangle = -|H|\).

Depending on the value of \(\left\langle c, x^{(t)} \right\rangle \), we perform one of two updates. As long as \(\left\langle c, x^{(t)} \right\rangle \ge -|H|+1\), we take a conformal decomposition of \(x^*-x^{(t)}\), and pick the most improving augmenting direction from the decomposition. If \(\left\langle c, x^{(t)} \right\rangle <-|H|+1\), then we use a support circuit augmentation obtained from Support-Circuit\((A,c,x^{(t)},S_t)\).

Let us show that whenever Support-Circuit is called, \(g^{(t)}\) is guaranteed to exist. This is because \(|S_t|>m\) and \(x^{(t)}_i>0\) for all \(i\in S_t\). Indeed, if \(x^{(t)}_j = 0\) for some \(j\in S_t\), then \(j\in H\) from the definition of \(S_t\). However, this implies that

$$\begin{aligned} \langle c, x^{(t)} \rangle \ge \sum _{i\in H\setminus \{j\}} c_i x^{(t)}_i \ge -|H|+1, \end{aligned}$$

which is a contradiction.

The cost \(\left\langle c, x^{(t)} \right\rangle \) is monotone decreasing, and it is easy to see that \(\left\langle c, x^{(0)} \right\rangle \le n\) for any initial solution \(x^{(0)}\). Hence, within \(O((n-m)\log n)\) iterations we must reach \(\left\langle c, x^{(t)} \right\rangle <-|H|+1\). Each support circuit augmentation sets \(x_i^{(t+1)}=0\) for \(i\in L\) or \(x_i^{(t+1)}=u_i\) for \(i\in H\); hence, we perform at most \(n-m\) such augmentations. The formal proof is given below.

Algorithm 2: Capacitated-Diameter-Bound
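For tiny instances, the two phases can be rendered as follows (our illustration, reusing support_circuit and conformal_decomposition from the earlier sketches; B, L, H are lists partitioning [n] as in the text, with B entering only through \(c_B=\mathbb {0}\)).

```python
def capacitated_step(x, g, u):
    """Largest alpha with 0 <= x + alpha*g <= u."""
    lo, hi = g < -1e-12, g > 1e-12
    cands = np.concatenate([x[lo] / -g[lo], (u[hi] - x[hi]) / g[hi]])
    return np.min(cands) if cands.size else np.inf

def capacitated_walk(A, u, x0, x_star, B, L, H, m, tol=1e-9):
    n = A.shape[1]
    c = np.zeros(n); c[L] = 1.0 / u[L]; c[H] = -1.0 / u[H]
    x = np.array(x0, dtype=float)
    while True:
        S = [i for i in L + H if abs(x[i] - x_star[i]) > tol]
        if len(S) <= m:                 # hand off: fix [n] \ S and finish
            return x, S                 # with Algorithm 1 on the embedding (7)
        if np.dot(c, x) >= -len(H) + 1:           # most improving circuit
            parts = conformal_decomposition(A, x_star - x)
            g = max(parts, key=lambda h: -np.dot(c, h))
        else:                                     # support circuit step
            g = support_circuit(A, c, x, S)
        x = x + capacitated_step(x, g, u) * g
```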

Proof of Theorem 1.2

We show that Algorithm 2 has the claimed number of iterations. As previously mentioned, \(\left\langle c, x^* \right\rangle = -|H|\) is the optimal value of the LP \(\min _{x\in P_u}\left\langle c, x \right\rangle \). Initially, \(\left\langle c, x^{(0)} \right\rangle = -\sum _{i\in H} \frac{x^{(0)}_i}{u_i} +\sum _{i\in L} \frac{x^{(0)}_i}{u_i} \le n\). Similar to Lemma 3.1, due to our choice of \(g^{(t)}\) from the conformal circuit decomposition, we have \(\left\langle c, x^{(t+1)} \right\rangle + |H| \le (1 - \frac{1}{n-m})(\left\langle c, x^{(t)} \right\rangle + |H|)\). In particular, \(O((n-m)\log n)\) iterations suffice to find an iterate t such that \(\langle c, x^{(t)} \rangle < - |H| + 1\).

Note that the calls to Support-Circuit do not increase \(\left\langle c, x^{(t)} \right\rangle \), so from now we will never make use of the conformal circuit decomposition again. An augmentation resulting from a call to Support-Circuit will set at least one variable \(i \in \textrm{supp}(g^{(t)})\) to either 0 or \(u_i\). We claim that either \(x^{(t+1)}_i=0\) for some \(i\in L\), or \(x^{(t+1)}_i=u_i\) for some \(i\in H\), that is, we set a variable to the ‘correct’ boundary. To see this, note that if \(x^{(t+1)}_i\) hits the wrong boundary, then the gap between \(\left\langle c, x^{(t+1)} \right\rangle \) and \(-|H|\) must be at least 1, a clear contradiction to \(\left\langle c, x^{(t+1)} \right\rangle < -|H|+1\).

Thus, after at most \(n-m\) calls to Support-Circuit, we get \(|S_{t}| \le m\), at which point we call Algorithm 1 with at most 2m variables, so the diameter bound of Theorem 1.1 applies. \(\square \)

5 Proximity results

We now present Hoffman-proximity bounds in terms of the circuit imbalance measure \(\kappa _A\). A simple such bound was Lemma 2.7; here we give additional norm bounds. These can be derived from more general results in [17]; see also [19]. The references also explain the background and similar results in previous literature, in particular proximity bounds via \(\Delta _A\), see e.g., [35] and [10]. For completeness, we include the proofs.

The next technical lemma will be key in our arguments. See Corollary 5.2 below for a simple implication.

Lemma 5.1

Let \(A\in \mathbb {R}^{m\times n}\) and \(x\in \mathbb {R}^n\). Let \(L\subseteq \textrm{supp}(x)\) and \(S\subseteq [n]\setminus L\). If there is no circuit \(C\subseteq \textrm{supp}(x)\) such that \(C\cap S\ne \emptyset \), then

$$\begin{aligned}\Vert x_S\Vert _\infty \le \kappa _A\min _{z\in \ker (A)+x}\Vert z_{[n]\setminus {\text {cl}}(L)}\Vert _1 .\end{aligned}$$

Before the proof, it is worth stating a useful special case \(L = \emptyset \) and \(S=[n]\).

Corollary 5.2

Let x be a basic (but not necessarily feasible) solution to (LP). Then, for any z where \(Az=b\), we have \(\Vert x\Vert _\infty \le \kappa _A\Vert z\Vert _1 \).

Proof of Lemma 5.1

First, we show that \(x_{S\cap {\text {cl}}(L)} = \mathbb {0}\) due to our assumption. Indeed, any \(i\in S\cap {\text {cl}}(L)\) with \(x_i\ne 0\) gives rise to a circuit in \(L\cup \{i\}\subseteq \textrm{supp}(x)\), contradicting the assumption in the lemma. It follows that \(\Vert x_S\Vert _\infty = \Vert x_{S{\setminus } {\text {cl}}(L)}\Vert _\infty \); let \(j\in S{\setminus } {\text {cl}}(L)\) such that \(|x_j| = \Vert x_{S}\Vert _\infty \). Let \(z\in \ker (A)+x\) be a minimizer of the RHS in the statement. We may assume that \(|x_j|> |z_j|\), as otherwise we are done because \(\kappa _A\ge 1\).

Let \(h^{(1)},\ldots , h^{(k)}\) be a conformal circuit decomposition of \(z - x \in \ker (A)\). Among these elementary vectors, consider the set \(R:= \{t\in [k]: h^{(t)}_j\ne 0\}\).

Claim 5.3

For each \(t\in R\), there exists an index \(i(t)\in \textrm{supp}(h^{(t)}){\setminus } {\text {cl}}(L)\) such that \(x_{i(t)}= 0\) and \(z_{i(t)}\ne 0\).

Proof

For the purpose of contradiction, suppose that \(\textrm{supp}(h^{(t)})\setminus {\text {cl}}(L)\subseteq \textrm{supp}(x)\). For every \(i\in {\text {cl}}(L){\setminus } L\), we can write \(A_i = Ay^{(i)}\) where \(\textrm{supp}(y^{(i)})\subseteq L\). Consider the vector

$$\begin{aligned}h:=h^{(t)} + \sum _{i\in {\text {cl}}(L)\setminus L} h^{(t)}_i (y^{(i)} - e_i).\end{aligned}$$

Clearly, \(h_{{\text {cl}}(L){\setminus } L} = \mathbb {0}\) and \(h_{[n]{\setminus } {\text {cl}}(L)} = h^{(t)}_{[n]{\setminus } {\text {cl}}(L)}\). Since \(L\subseteq \textrm{supp}(x)\) and we assumed \(\textrm{supp}(h^{(t)})\setminus {\text {cl}}(L)\subseteq \textrm{supp}(x)\), it follows that \(\textrm{supp}(h)\subseteq \textrm{supp}(x)\). Moreover, \(j\in \textrm{supp}(h)\) because \(j\in S\setminus {\text {cl}}(L)\). Hence, applying Lemma 2.3 to \(h\in \ker (A)\) yields an elementary vector \(g\in \mathcal {E}(A)\) such that \(\textrm{supp}(g)\subseteq \textrm{supp}(x)\) and \(\textrm{supp}(g)\cap S\ne \emptyset \). This contradicts the assumption of the lemma. \(\square \)

By conformality of the decomposition, \(|x_j-z_j|=\sum _{t\in R} |h^{(t)}_j|\). According to Claim 5.3, for every \(t\in R\), we have \(|h^{(t)}_j|\le \kappa _A |h^{(t)}_{i(t)}|\) where \(i(t)\in [n]\setminus ({\text {cl}}(L)\cup \{j\})\); notice that \(i(t)\ne j\) for all \(t\in R\) due to our assumption \(|x_j|>0\). Applying conformality again yields

$$\begin{aligned} \sum _{t\in R}|h^{(t)}_{i(t)}| = \sum _{s\in [n]\setminus ({\text {cl}}(L)\cup \{j\})} \sum _{i(t) = s} |h^{(t)}_{i(t)}| \le \sum _{s\in [n]\setminus ({\text {cl}}(L)\cup \{j\})} |z_s| = \Vert z_{[n]\setminus ({\text {cl}}(L)\cup \{j\})}\Vert _1. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert x_S\Vert _\infty = |x_j| \le |z_j| + |x_j - z_j| \le \kappa _A\Vert z_{[n]\setminus {\text {cl}}(L)}\Vert _1 \end{aligned}$$

where the last inequality is obtained by combining the previous equation and inequalities. \(\square \)

The following proximity theorem will be key to derive \(x^*_i=0\) for certain variables in our optimization algorithm; see [17] and [19, Theorem 6.5]. For \(\tilde{c}\in \mathbb {R}^n\), we use \({\text {LP}}(\tilde{c})\) to denote (LP) with cost vector \(\tilde{c}\), and \(\textrm{OPT}(\tilde{c})\) as the optimal value of \({\text {LP}}(\tilde{c})\).

Theorem 5.4

Let \(c,c'\in \mathbb {R}^n\) be two cost vectors, such that both \({\text {LP}}(c)\) and \({\text {LP}}(c')\) have finite optimal values. Let \(s'\) be a dual optimal solution to \({\text {LP}}(c')\). For all indices \(j\in [n]\) such that

$$\begin{aligned}s'_j> (m+1)\kappa _A \Vert c-c'\Vert _\infty ,\end{aligned}$$

it follows that \(x^*_j=0\) for every optimal solution \(x^*\) to \({\text {LP}}(c)\).

Proof

We may assume that \(c\ne c'\), as otherwise we are done by complementary slackness. Let \(x'\) be an optimal solution to \({\text {LP}}(c')\). By complementary slackness, \(s'_jx'_j=0\), and therefore \(x'_j=0\). For the purpose of contradiction, suppose that there exists an optimal solution \(x^*\) to \({\text {LP}}(c)\) such that \(x^*_j>0\). Let \(h^{(1)},\ldots , h^{(k)}\) be a conformal circuit decomposition of \(x^*-x'\). Then, \(h^{(t)}_j>0\) for some \(t\in [k]\). Since \(h^{(t)}\) is an elementary vector, \(|\textrm{supp}{(h^{(t)})}|\le m+1\) and so \(\Vert h^{(t)}\Vert _1\le (m+1)\Vert h^{(t)}\Vert _\infty \le (m+1)\kappa _A h_j^{(t)}\). Observe that for any \(i\in [n]\) where \(h^{(t)}_i < 0\), we have \(s'_i = 0\) because \(x'_i > x^*_i \ge 0\). Hence,

$$\begin{aligned} \left\langle c, h^{(t)} \right\rangle&= \left\langle c-c', h^{(t)} \right\rangle + \left\langle c', h^{(t)} \right\rangle \ge - \Vert c-c'\Vert _\infty \Vert h^{(t)}\Vert _1 + \left\langle s', h^{(t)} \right\rangle \\&\ge -(m+1)\kappa _A\Vert c-c'\Vert _\infty \,h^{(t)}_j + s'_jh^{(t)}_j > 0\, . \end{aligned}$$

The first inequality here used Hölder’s inequality and that \(\left\langle c', h^{(t)} \right\rangle =\left\langle s', h^{(t)} \right\rangle \) since \(c'-s'\in {\text {Im}}(A^\top )\) and \(h^{(t)}\in \ker (A)\). Since \(x^*-h^{(t)}\) is feasible to \({\text {LP}}(c)\), this contradicts the optimality of \(x^*\). \(\square \)

The following lemma provides an upper bound on the norm of the perturbation \(c-c'\) for which the existence of an index j as in Theorem 5.4 is guaranteed.

Lemma 5.5

Let \(c,c'\in \mathbb {R}^n\) be two cost vectors, and let \(s'\) be an optimal dual solution to \({\text {LP}}(c')\). If \(c\in \ker (A)\), \(\Vert c\Vert _2 = 1\) and \(\Vert c-c'\Vert _\infty < 1/(\sqrt{n}(m+ 2)\kappa _A)\), then there exists an index \(j\in [n]\) such that

$$\begin{aligned}s'_j>\frac{m+1}{\sqrt{n}(m+2)}.\end{aligned}$$

Proof

Let \(r = c-c'\). Note that \(s'+r\in {\text {Im}}(A^{\top })+c\). Then,

$$\begin{aligned}\Vert s'\Vert _\infty + \Vert r\Vert _\infty \ge \Vert s'+r\Vert _\infty \ge \frac{1}{\sqrt{n}}\Vert s'+r\Vert _2\ge \frac{1}{\sqrt{n}}\Vert c\Vert _2 = \frac{1}{\sqrt{n}},\end{aligned}$$

where the last inequality is due to \(s'+r-c\) and c being orthogonal. This gives us

$$\begin{aligned}\Vert s'\Vert _\infty \ge \frac{1}{\sqrt{n}} - \Vert r\Vert _\infty > \frac{(m+2)\kappa _A-1}{\sqrt{n}(m+2)\kappa _A} \ge \frac{m+1}{\sqrt{n}(m+2)}\end{aligned}$$

as desired because \(\kappa _A\ge 1\). \(\square \)

6 A circuit augmentation algorithm for feasibility

In this section we prove Theorem 1.3: given a linear program (LP) with cost \(c = (\mathbb {0}_{[n]{\setminus } N}, \mathbbm {1}_N)\) for some \(N\subseteq [n]\), find a solution x with \(x_N=\mathbb {0}\) (showing that the optimum value is 0), or certify that no such solution exists. A dual certificate in the latter case is a vector \(y\in \mathbb {R}^m\) such that \(A^\top y\le c\) and \(\left\langle b, y \right\rangle >0\).

Theorem 1.3 can be used to solve the feasibility problem for linear programs. Given a polyhedron in standard form (P), we construct an auxiliary linear program whose feasibility problem is trivial, and whose optimal solutions correspond to feasible solutions to (P). This is in the same vein as Phase I of the Simplex method:

$$\begin{aligned} \min \; \left\langle \mathbbm {1}, z \right\rangle \quad \mathrm {s.t.}\quad Ay - Az=b\, , \, y, z \ge \mathbb {0}\, . \end{aligned}$$
(Aux-LP)

For the constraint matrix \({\widetilde{A}} = \begin{bmatrix} A&- A\end{bmatrix}\), it is easy to see that \(\kappa _{{\widetilde{A}}} = \kappa _A\) and that any solution \(Ax = b\) can be converted into a feasible solution to (Aux-LP) via \((y,z) = (x^+, x^-)\). Hence, if the subroutines Support-Circuit and Ratio-Circuit are available for (Aux-LP), then we can invoke Theorem 1.3 with \(N = \{n+1,n+2,\dots ,2n\}\) on (Aux-LP) to solve the feasibility problem of (P) in \(O(mn\log (n+\kappa _A))\) augmentation steps.
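The construction takes a few lines (our sketch; np.linalg.lstsq returns an exact solution of \(Ax=b\) here since \({\text {rk}}(A)=m\), and indices are 0-based).

```python
import numpy as np

def build_aux_lp(A, b):
    n = A.shape[1]
    x = np.linalg.lstsq(A, b, rcond=None)[0]     # some solution of Ax = b
    start = np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])
    return np.hstack([A, -A]), start, list(range(n, 2 * n))   # A~, (x+, x-), N
```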

Our algorithm is presented in Algorithm 3. We maintain a set \(\mathcal { L}_t\subseteq [n]\setminus N\), initialized as \(\emptyset \). Whenever \(x^{(t)}_i \ge 4mn\kappa _A \Vert x_N^{(t)}\Vert _1\) for the current iterate \(x^{(t)}\), we add i to \(\mathcal { L}_t\). Note that once an index i enters \(\mathcal {L}_t\), it is never removed, even though \(x_i\) might drop below this threshold in the future. Still, we will show that \(\mathcal {L}_t\subseteq \textrm{supp}(x^{(t)})\) in every iteration.

Whenever \({\text {rk}}(\mathcal { L}_t)\) increases, we run Support-Circuit\((A,c,x^{(t)},N)\) iterations as long as there exists a circuit in \(\textrm{supp}(x^{(t)})\) intersecting N. Afterwards, we run a sequence of Ratio-Circuit iterations until \({\text {rk}}(\mathcal { L}_t)\) increases again. The key part of the analysis is to show that \({\text {rk}}(\mathcal { L}_t)\) increases in every \(O(n\log (n+\kappa _A))\) iterations.

Algorithm 3: Feasibility-Algorithm
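The control flow can be rendered as follows (our illustration, reusing support_circuit, ratio_circuit and maximal_stepsize from the earlier sketches; unlike the paper's subroutine, our ratio_circuit sketch does not expose the dual certificate y, so infeasibility is detected via the optimal value of (2) being 0).

```python
def feasibility_walk(A, N, x, kappa, tol=1e-9):
    m, n = A.shape
    c = np.zeros(n); c[N] = 1.0                 # cost <c, x> = ||x_N||_1
    big, rank_big = set(), -1                   # the set L_t and rk(L_t)
    while np.sum(x[N]) > tol:
        big |= {i for i in range(n)
                if x[i] >= 4 * m * n * kappa * np.sum(x[N])}
        r = np.linalg.matrix_rank(A[:, sorted(big)]) if big else 0
        if r > rank_big:                        # rank increased (or first pass):
            rank_big = r                        # cancel circuits meeting N
            while (g := support_circuit(A, c, x, N)) is not None:
                x = x + maximal_stepsize(x, g) * g
        else:                                   # one Ratio-Circuit step
            w = np.where(x > tol, 1.0 / np.maximum(x, tol), np.inf)
            g = ratio_circuit(A, c, w)
            if g is None or np.dot(c, g) > -tol:   # optimum of (2) is 0:
                return None                        # no x with x_N = 0 exists
            x = x + maximal_stepsize(x, g) * g
    return x                                    # feasible: x_N = 0
```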

Let us first analyze what happens during Ratio-Circuit iterations.

Lemma 6.1

If Ratio-Circuit is called in iteration t, then either \(\Vert x_N^{(t+1)}\Vert _1\le \left( 1-\frac{1}{n}\right) \Vert x_N^{(t)}\Vert _1\), or the algorithm terminates with a dual certificate.

Proof

The oracle returns \(g^{(t)}\) that is optimal to (2) and \((y^{(t)},s^{(t)})\) that is optimal to (3) with optimum value \(-\lambda \). Thus, \(A^\top y^{(t)}+s^{(t)}=c\) and \(\mathbb {0}\le s^{(t)}\le \lambda w\). Recall that we use weights \(w_i=1/x^{(t)}_i\). If \(\left\langle b, y^{(t)} \right\rangle >0\), the algorithm terminates. Otherwise, note that

$$\begin{aligned} \left\langle c, x^{(t)} \right\rangle =\left\langle b, y^{(t)} \right\rangle +\left\langle s^{(t)}, x^{(t)} \right\rangle \le \lambda \left\langle w_{\textrm{supp}(x^{(t)})}, x^{(t)}_{\textrm{supp}(x^{(t)})} \right\rangle \le n \lambda \,, \end{aligned}$$

implying \(\lambda \ge \left\langle c, x^{(t)} \right\rangle /n\), and therefore \(\left\langle c, g^{(t)} \right\rangle =-\lambda \le -\left\langle c, x^{(t)} \right\rangle /n\). This implies the lemma, noting that

$$\begin{aligned} \Vert x_N^{(t+1)}\Vert _1=\left\langle c, x^{(t+1)} \right\rangle \le \left\langle c, x^{(t)} \right\rangle +\left\langle c, g^{(t)} \right\rangle \le \left( 1-\frac{1}{n}\right) \Vert x_N^{(t)}\Vert _1\,. \end{aligned}$$

\(\square \)

Next, we analyze what happens during Support-Circuit iterations.

Lemma 6.2

If Support-Circuit is called in iteration t, then \(\Vert x^{(t+1)}-x^{(t)}\Vert _\infty \le \kappa _A\Vert x_N^{(t)}\Vert _1\).

Proof

We have \(g^{(t)}_i < 0\) for some \(i\in N\) because \(\textrm{supp}(g^{(t)})\cap N\ne \emptyset \) and \(\left\langle c, g^{(t)} \right\rangle \le 0\). Hence,

$$\begin{aligned} \Vert x^{(t+1)}-x^{(t)}\Vert _\infty \le \kappa _A |x^{(t+1)}_i - x^{(t)}_i|\le \kappa _A x^{(t)}_i \le \kappa _A\Vert x_N^{(t)}\Vert _1. \end{aligned}$$

\(\square \)

The following lemma shows that once a coordinate enters \(\mathcal { L}_t\), its value stays above a certain threshold.

Lemma 6.3

For every iteration \(t\ge 0\), we have \(x_j^{(t)}\ge 2mn\kappa _A\Vert x^{(t)}_N\Vert _1\) for all \(j\in \mathcal { L}_t\).

Proof

Fix an iteration \(t\ge 0\) and a coordinate \(j\in \mathcal { L}_t\). We may assume that \(\Vert x^{(t)}_N\Vert _1 > 0\), as otherwise the lemma trivially holds because \(x^{(t)}\ge \mathbb {0}\). Let \(r\le t\) be the iteration in which j was added to \(\mathcal { L}_r\); the lemma clearly holds at iteration r.

We analyze the ratio \(x^{(t')}_j/\Vert x^{(t')}_N\Vert _1\) for iterations \(t'=r,\ldots ,t\). At an iteration \(r\le t'<t\) that performs Ratio-Circuit, observe that if \(x^{(t')}_j/\Vert x^{(t')}_N\Vert _1\ge 2n\kappa _A\), then

$$\begin{aligned} \begin{aligned} \frac{x^{(t'+1)}_j}{\left\| x_N^{(t'+1)} \right\| _1}&\ge \frac{x^{(t')}_j - \kappa _A \left\| x_N^{(t'+1)} - x_N^{(t')} \right\| _1}{(1- \frac{1}{n}) \left\| x_N^{(t')} \right\| _1} \ge \frac{x^{(t')}_j - 2\kappa _A \left\| x_N^{(t')} \right\| _1}{(1- \frac{1}{n}) \left\| x_N^{(t')}\right\| _1}\\ {}&\ge \frac{(1-\frac{1}{n})x_j^{(t')}}{(1 - \frac{1}{n}) \left\| x_N^{(t')} \right\| _1} = \frac{x^{(t')}_j}{ \left\| x_N^{(t')} \right\| _1} . \end{aligned} \end{aligned}$$

The first inequality is due to Lemma 6.1 and the fact that \(x^{(t'+1)} - x^{(t')}\) is an elementary vector whose support intersects N. This fact follows from \(\langle c, g^{(t')} \rangle < 0\) because \(\Vert x^{(t')}_N\Vert _1 \ge \Vert x^{(t)}_N\Vert _1 > 0\) and \(\langle b, y^{(t')} \rangle \le 0\). The second inequality uses the monotonicity \(\Vert x^{(t'+1)}_N\Vert _1\le \Vert x^{(t')}_N\Vert _1\) and the triangle inequality. The third inequality uses the assumption \(x^{(t')}_j/\Vert x^{(t')}_N\Vert _1\ge 2n\kappa _A\).

Hence, it suffices to show that Support-Circuit maintains the invariant \(x^{(t')}_j/\Vert x^{(t')}_N\Vert _1\ge 2n\kappa _A\). At an iteration \(r\le t' < t\) which performs Support-Circuit, we have

$$\begin{aligned}\frac{x^{(t'+1)}_j}{\left\| x^{(t'+1)}_N \right\| _1} \ge \frac{x^{(t')}_j - \kappa _A \left\| x^{(t')}_N \right\| _1}{\left\| x^{(t')}_N \right\| _1} = \frac{x^{(t')}_j}{\left\| x^{(t')}_N \right\| _1} - \kappa _A\end{aligned}$$

by Lemma 6.2. Since Algorithm 3 performs at most \((m+1)n\) Support-Circuit iterations, the total decrease of this ratio is at most \((m+1)n\kappa _A \le 2mn\kappa _A\). As the starting value is at least \(4mn\kappa _A\), it follows that this ratio does not drop below \(2mn\kappa _A\). \(\square \)

Proof of Theorem 1.3

The correctness of Algorithm 3 is obvious. If the algorithm terminates due to \(x^{(t)}_N=\mathbb {0}\), then \(x^{(t)}\) is the desired solution to (LP). Otherwise, if the algorithm terminates due to \(\langle b, y^{(t)} \rangle >0\), then \(y^{(t)}\) is the desired dual certificate as it is feasible to (DLP).

Next, we show that if \({\text {rk}}(\mathcal { L}_t) = m\), then the algorithm will terminate in iteration \(r \le t+ n\) with \(x^{(r)}_N = \mathbb {0}\). As long as \(x^{(t)}_N \ne \mathbb {0}\), we have \(\mathcal { L}_t\subseteq \textrm{supp}(x^{(t)})\) by Lemma 6.3. Moreover, any \(i\in \textrm{supp}(x^{(t)}_N)\) induces a circuit in \(\mathcal { L}_t\cup \{i\}\), so Support-Circuit will be invoked. Since every call to Support-Circuit reduces \(\textrm{supp}(x^{(t)})\), all the coordinates in N will be zeroed-out in at most n calls.

It is left to bound the number of iterations of Algorithm 3. In the first iteration and whenever \({\text {rk}}(\mathcal { L}_t)\) increases, we perform a sequence of at most n Support-Circuit cancellations. Let us consider an iteration t right after we are done with the Support-Circuit cancellations. Then, there is no circuit in \(\textrm{supp}(x^{(t)})\) intersecting N. We show that \({\text {rk}}(\mathcal { L}_t)\) increases within \(O(n\log (n+\kappa _A))\) consecutive calls to Ratio-Circuit; this completes the proof.

By Lemma 6.1, within \(O(n\log (n\kappa _A)) = O(n \log (n + \kappa _A))\) consecutive Ratio-Circuit augmentations, we reach an iteration \(r = t + O(n \log (n + \kappa _A))\) such that \(\Vert x_N^{(r)}\Vert _1 \le (4 mn^3\kappa _A^2)^{-1} \Vert x_N^{(t)}\Vert _1\). Since \(\mathcal { L}_t\subseteq \textrm{supp}(x^{(t)})\) and \(N\subseteq [n]{\setminus } \mathcal { L}_t\) by Lemma 6.3, and there is no circuit in \(\textrm{supp}(x^{(t)})\) intersecting N, applying Lemma 5.1 with \(x=x^{(t)}\) and \(z=x^{(r)}\) yields

$$\begin{aligned} \left\| x^{(r)}_{[n]\setminus {\text {cl}}(\mathcal { L}_t)} \right\| _\infty \ge \frac{\left\| x^{(r)}_{[n]\setminus {\text {cl}}(\mathcal { L}_t)} \right\| _1}{n}\ge \frac{\left\| x^{(t)}_{N} \right\| _\infty }{n\kappa _A} \ge \frac{ \left\| x^{(t)}_{N} \right\| _1}{n^2\kappa _A}\ge {4mn\kappa _A}{\left\| x_N^{(r)} \right\| _1}\,, \end{aligned}$$

showing that some \(j\in [n]\setminus {\text {cl}}(\mathcal { L}_t)\) must be included in \(\mathcal { L}_r\). \(\square \)

7 A circuit augmentation algorithm for optimization

In this section, we give a circuit-augmentation algorithm for solving (LP), given by \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^m\) and \(c\in \mathbb {R}^n\). We also assume that an initial feasible solution \(x^{(0)}\) is provided. In every iteration t, the algorithm maintains a feasible solution \(x^{(t)}\) to (LP), initialized with \(x^{(0)}\). The goal is to augment \(x^{(t)}\) using the subroutines Support-Circuit and Ratio-Circuit until the emergence of a nonempty set \(N\subseteq [n]\) which satisfies \(x^{(t)}_N=x^*_N=\mathbb {0}\) for every optimal solution \(x^*\) to (LP). When this happens, we have reached a lower dimensional face of the polyhedron that contains the optimal face. Hence, we can fix \(x^{(t')}_N=\mathbb {0}\) in all subsequent iterations \(t'\ge t\). In particular, we repeat the same procedure on a smaller LP with constraint matrix \(A_{[n]\setminus N}\), RHS vector b, and cost \(c_{[n]\setminus N}\), initialized with the feasible solution \(x^{(t)}_{[n]\setminus N}\). Note that a circuit walk of this smaller LP corresponds to a circuit walk of the original LP. This gives the overall circuit-augmentation algorithm.
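The outer loop is simple enough to state as a sketch (Python; the variable-fixing procedure is assumed available as a callable returning the new iterate together with either optimality or a nonempty set N; all names are ours):

```python
import numpy as np

def circuit_augmentation_lp(A, b, c, x0, variable_fixing):
    """Outer loop: repeatedly run the variable-fixing procedure
    (Algorithm 4) and restrict to the surviving coordinates. 'alive'
    tracks which coordinates of the original LP are not yet fixed."""
    alive = np.arange(A.shape[1])
    x = np.array(x0, dtype=float)
    while True:
        status, x_sub, N = variable_fixing(A[:, alive], b, c[alive], x[alive])
        x[alive] = x_sub
        if status == "optimal":
            return x
        # x_N = 0 holds for the iterate and for every optimal solution
        alive = np.delete(alive, sorted(N))
```

Since \(N\ne \emptyset \) in every unsuccessful round, there are at most n rounds.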

In what follows, we focus on the aforementioned variable fixing procedure (Algorithm 4), since the main algorithm just calls it at most n times.

We fix parameters

$$\begin{aligned}\delta :=\frac{1}{2n^{3/2}(m+2)\kappa _A}\,, \qquad T:=\Theta (n\log (n+\kappa _A))\,, \qquad \Gamma :=\frac{6(m+2)\sqrt{n}\kappa _A^2T}{\delta }\,.\end{aligned}$$
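In concrete terms (a sketch; the constant C hidden in the \(\Theta (\cdot )\) is our assumption, not a value fixed by the paper):

```python
import math

def fixing_parameters(n, m, kappa_A, C=10.0):
    """The parameters delta, T, Gamma above; C stands for the
    unspecified Theta-constant in the definition of T."""
    delta = 1.0 / (2.0 * n ** 1.5 * (m + 2) * kappa_A)
    T = math.ceil(C * n * math.log(n + kappa_A))
    Gamma = 6.0 * (m + 2) * math.sqrt(n) * kappa_A ** 2 * T / delta
    return delta, T, Gamma
```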

Throughout the procedure, A and b will be fixed, but we will sometimes modify the cost function c. Recall that for any \(\tilde{c}\in \mathbb {R}^n\), we use \({\text {LP}}(\tilde{c})\) to denote the problem with cost vector \(\tilde{c}\), and the optimal value is \(\textrm{OPT}(\tilde{c})\). We will often use the fact that if \({\tilde{s}}\in {\text {Im}}(A^\top )+{\tilde{c}}\), then the linear programs \({\text {LP}}({\tilde{s}})\) and \({\text {LP}}({\tilde{c}})\) are equivalent.

Let us start with a high level overview before presenting the algorithm. The inference that \(x^{(t)}_N=x^*_N=\mathbb {0}\) for every optimal \(x^*\) will be made using Theorem 5.4. To apply this, our goal is to find a cost function \(c'\) and an optimal dual solution \(s'\) to \({\text {LP}}(c')\) such that the set of indices \(N:=\{\,j: s'_j> (m+1)\kappa _A \Vert c-c'\Vert _\infty \,\}\) is nonempty.

If \(c=\mathbb {0}\), then we can return \(x^{(0)}\) as an optimal solution. Otherwise, we can normalize to \(\Vert c\Vert _2=1\). Let us start from any primal and dual feasible solutions \((x^{(0)},s^{(0)})\) to \({\text {LP}}(c)\); we can obtain \(s^{(0)}\) from a call to Ratio-Circuit. Within \(O(n\log (n+\kappa _A))\) Ratio-Circuit augmentations, we arrive at a pair of primal and dual feasible solutions \((x,s)=(x^{(t)},s^{(t)})\) such that \(\left\langle x, s \right\rangle \le \varepsilon :=\left\langle x^{(0)}, s^{(0)} \right\rangle /\textrm{poly}(n,\kappa _A)\).

We now describe the high level motivation for the algorithm. Suppose that for every \(i\in \textrm{supp}(x)\), \(s_i\) is small, say \(s_i< \delta \). Let \({\tilde{c}}_i:=s_i\) if \(i\notin \textrm{supp}(x)\) and \({\tilde{c}}_i:=0\) if \(i\in \textrm{supp}(x)\). Then, \(\Vert {\tilde{c}}-s\Vert _\infty < \delta \) and x and \({\tilde{c}}\) are primal and dual optimal solutions to \({\text {LP}}({\tilde{c}})\). This follows because they are primal and dual feasible and satisfy complementary slackness. Consider the vector \(c':=c-s+{\tilde{c}}\), which satisfies \(\Vert c-c'\Vert _\infty < \delta \). Since \(c-s\in {\text {Im}}(A^\top )\), \({\text {LP}}(c')\) and \({\text {LP}}({\tilde{c}})\) are equivalent. Thus, x and \({\tilde{c}}\) are primal and dual optimal solutions to \({\text {LP}}(c')\). Then, Theorem 5.4 is applicable for the costs \(c, c'\) and the dual optimal solution \(\tilde{c}\). However, to be able to make progress by fixing variables, we also need to guarantee that \(N\ne \emptyset \). Following Tardos [34, 35], this can be ensured if we pre-process by projecting the cost vector c onto \(\ker (A)\); this guarantees that \(\Vert s\Vert \)—and thus \(\Vert {\tilde{c}}\Vert \)—must be sufficiently large.
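The truncation used in this overview is mechanical; a minimal sketch (names ours):

```python
import numpy as np

def truncate_on_support(c, s, x):
    """c_tilde zeroes the slack on supp(x), so x and c_tilde satisfy
    complementary slackness and form an optimal primal-dual pair for
    LP(c_tilde). c_prime = c - s + c_tilde differs from c_tilde by
    c - s, a vector in Im(A^T), so LP(c_prime) and LP(c_tilde) are
    equivalent."""
    c_tilde = np.where(x > 0, 0.0, s)
    c_prime = c - s + c_tilde   # ||c - c_prime||_inf = max slack on supp(x)
    return c_tilde, c_prime
```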

Let us now turn to the case when the above property does not hold for \((x,s)\): for certain coordinates we could have \(x_i>0\) and \(s_i\ge \delta \). We enter the second phase of the algorithm. Let \(S = \{i\in [n]:s_i\ge \delta \}\) be the coordinates with large dual slack. Since \(x_is_i\le \left\langle x, s \right\rangle \le \varepsilon \), this implies \(x_i\le \varepsilon /\delta \) for all \(i\in S\). Therefore, \(\Vert x_S\Vert _1\) is sufficiently small, and one can show that the set of ‘large’ indices \(\mathcal { L}=\{i\in [n]:\, x_i\ge \Gamma \Vert x_S\Vert _1\}\) is nonempty. We proceed by defining a new cost function \({\tilde{c}}_i:=s_i\) if \(i\in S\) and \({\tilde{c}}_i:=0\) if \(i\notin S\). We perform Support-Circuit iterations as long as there exist circuits in \(\textrm{supp}(x)\) intersecting \(\textrm{supp}({\tilde{c}})\), and then perform a further \(O(n\log (n+\kappa _A))\) Ratio-Circuit iterations for the cost function \({\tilde{c}}\). If we now arrive at an iterate \((x,s)=(x^{(t')},s^{(t')})\) such that \(s_i < \delta \) for every \(i\in \textrm{supp}(x)\), then we truncate s as before to an optimal dual solution to \({\text {LP}}(c'')\) for some vector \(c''\) where \(\Vert c-c''\Vert _\infty < 2\delta \). After that, Theorem 5.4 is applicable for the costs \(c,c''\) and said optimal dual solution. Otherwise, we continue with additional phases.

The algorithm formalizes the above idea, with some technical modifications. The algorithm comprises at most \(m+1\) phases; the main progress measure is the rank of the large index set \(\mathcal { L}\), which increases in every phase. We show that if an index \(i\notin {\text {cl}}(\mathcal { L})\) was added to \(\mathcal { L}\), then it must have \(s_i<\delta \) at the beginning of every later phase; thus, such indices never again carry large dual slack.

Algorithm 4: Variable-Fixing

We now turn to a more formal description of Algorithm 4. We start by orthogonally projecting the input cost vector c to \(\ker (A)\). This does not change the optimal face of (LP). If \(c=\mathbb {0}\), then we terminate and return the current feasible solution \(x^{(0)}\) as it is optimal. Otherwise, we scale the cost to \(\Vert c\Vert _2 = 1\), and use Ratio-Circuit to obtain a feasible solution \(\tilde{s}^{(-1)}\) to the dual of \({\text {LP}}(c)\).

The rest of Algorithm 4 consists of repeated phases, ending when \(\left\langle \tilde{s}^{(t-1)}, x^{(t)} \right\rangle = 0\). In an iteration t, let \(S_t = \{i\in [n]:\tilde{s}^{(t-1)}_i \ge \delta \}\) be the set of coordinates with large dual slack. The algorithm keeps track of the following set

$$\begin{aligned}\mathcal { L}_t :=\mathcal { L}_{t-1} \cup \left\{ i\in [n]:x^{(t)}_i \ge \Gamma \Vert x^{(t)}_{S_t}\Vert _1 \right\} .\end{aligned}$$

These are the variables that were once large with respect to \(\Vert x^{(t')}_{S_{t'}}\Vert _1\) in some iteration \(t'\le t\). Note that \(\mathcal { L}_t\) is monotone nondecreasing as a set.

The first phase starts at \(t=0\), and we enter a new phase k whenever \({\text {rk}}(\mathcal { L}_t) > {\text {rk}}(\mathcal { L}_{t-1})\). Such an iteration t is called the first iteration in phase k. At the start of the phase, we define a new modified cost \(\tilde{c}^{(k)}\) from the dual slack \(\tilde{s}^{(t-1)}\) by truncating entries less than \(\delta \) to 0. This cost vector will be used until the end of the phase. Then, we call Support-Circuit\((A,\tilde{c}^{(k)},x^{(t)},\textrm{supp}(\tilde{c}^{(k)}))\) to eliminate circuits in \(\textrm{supp}(x^{(t)})\) intersecting \(\textrm{supp}(\tilde{c}^{(k)})\). Note that there are at most n such calls because each call sets a primal variable \(x^{(t)}_i\) to zero.

In the remaining part of the phase, we augment \(x^{(t)}\) using Ratio-Circuit\((A,\tilde{c}^{(k)},1/x^{(t)})\) until \({\text {rk}}(\mathcal { L}_t)\) increases, triggering a new phase. In every iteration, Ratio-Circuit\((A,\tilde{c}^{(k)},1/x^{(t)})\) returns a minimum cost-to-weight ratio circuit \(g^{(t)}\), where the choice of weights \(1/x^{(t)}\) follows Wallacher [39]. It also returns a feasible solution \((y^{(t)},s^{(t)})\) to the dual of \({\text {LP}}(\tilde{c}^{(k)})\). After augmenting \(x^{(t)}\) to \(x^{(t+1)}\) using \(g^{(t)}\), we update the dual slack as

$$\begin{aligned} \tilde{s}^{(t)} :=\mathop {\mathrm {arg\,min}}\limits _{s\in \{\tilde{c}^{(k)}, s^{(t)}\}} \left\langle s, x^{(t+1)} \right\rangle . \end{aligned}$$

This finishes the description of a phase.
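Two ingredients of a phase admit direct one-line implementations; in the same sketch style as before (names ours):

```python
import numpy as np

def phase_cost(s_prev, delta):
    """Modified cost at the start of a phase: truncate dual-slack
    entries below delta to zero."""
    return np.where(s_prev >= delta, s_prev, 0.0)

def next_dual_slack(c_tilde, s_t, x_next):
    """The arg min update above: keep whichever of the two dual-feasible
    slacks has the smaller inner product with the new iterate."""
    return c_tilde if c_tilde @ x_next <= s_t @ x_next else s_t
```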

Since \({\text {rk}}(A) = m\), clearly there are at most \(m+1\) phases. Let k and t be the final phase and iteration of Algorithm 4 respectively. As \(\left\langle \tilde{s}^{(t-1)}, x^{(t)} \right\rangle =0\), and \(x^{(t)},\tilde{s}^{(t-1)}\) are primal-dual feasible solutions to \({\text {LP}}({\tilde{c}}^{(k)})\), they are also optimal. Now, it is not hard to see that \(\tilde{c}^{(k)} \in {\text {Im}}(A^\top ) + c-r\) for some \(\mathbb {0}\le r\le (m+1)\delta \mathbbm {1}\) (Claim 7.3). Hence, \(\tilde{s}^{(t-1)}\) is also an optimal solution to the dual of \({\text {LP}}(c-r)\). The last step of the algorithm consists of identifying the set N of coordinates with large dual slack \(\tilde{s}^{(t-1)}_i\). Then, applying Theorem 5.4 for \(c'=c-r\) allows us to conclude that they can be fixed to zero.

In order to prove Theorem 1.4, we need to show that \(N\ne \emptyset \). Moreover, we need to show that there are at most T iterations of Ratio-Circuit per phase. First, we show that the objective value is monotone nonincreasing.

Lemma 7.1

For any two iterations \(r\ge t\) in phases \(\ell \ge k\ge 1\) respectively,

$$\begin{aligned}\left\langle \tilde{c}^{(\ell )}, x^{(r)} \right\rangle \le \left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle .\end{aligned}$$

Proof

We proceed by induction on \(\ell -k \ge 0\). For the base case \(\ell -k=0\), iterations r and t occur in the same phase. So, the objective value is nonincreasing from the definition of Support-Circuit and Ratio-Circuit. Next, suppose that the statement holds for \(\ell -k = d\), and consider the inductive step \(\ell -k = d+1\). Let q be the first iteration in phase \(k+1\); note that \(r\ge q>t\). Then, we have

$$\begin{aligned} \left\langle \tilde{c}^{(\ell )}, x^{(r)} \right\rangle \le \left\langle \tilde{c}^{(k+1)}, x^{(q)} \right\rangle \le \left\langle \tilde{s}^{(q-1)}, x^{(q)} \right\rangle \le \left\langle \tilde{c}^{(k)}, x^{(q)} \right\rangle \le \left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle \,. \end{aligned}$$

The first inequality uses the inductive hypothesis. In the second inequality, we use that \(\tilde{c}^{(k+1)}\) is obtained from \(\tilde{s}^{(q-1)}\) by setting some nonnegative coordinates to 0. The third inequality is by the definition of \(\tilde{s}^{(q-1)}\). The final inequality is by monotonicity within the same phase. \(\square \)

The following claim gives a sufficient condition for Algorithm 4 to terminate.

Claim 7.2

Let t be an iteration in phase \(k\ge 1\). If Ratio-Circuit returns an elementary vector \(g^{(t)}\) such that \(\left\langle \tilde{c}^{(k)}, g^{(t)} \right\rangle = 0\), then Algorithm 4 terminates in iteration \(t+1\).

Proof

Recall that the weights w in Ratio-Circuit are chosen as \(w = 1/x^{(t)}\). Recall also the constraint \(s^{(t)} \le \lambda w\) in the dual program (3). Hence, for every \(i\in \textrm{supp}(x^{(t)})\), \(s_i^{(t)} x_i^{(t)} \le \lambda = -\left\langle \tilde{c}^{(k)}, g^{(t)} \right\rangle \), where the equality is due to strong duality. It follows that \(\left\langle s^{(t)}, x^{(t)} \right\rangle \le -n\left\langle \tilde{c}^{(k)}, g^{(t)} \right\rangle = 0\). Since \(\tilde{s}^{(t)}, x^{(t+1)}\ge \mathbb {0}\), we have

$$\begin{aligned} 0 \le \left\langle \tilde{s}^{(t)}, x^{(t+1)} \right\rangle \le \left\langle s^{(t)}, x^{(t+1)} \right\rangle = \left\langle s^{(t)}, x^{(t)} \right\rangle \le 0.\end{aligned}$$

Thus, the algorithm terminates in the next iteration. \(\square \)

The next two claims provide some basic properties of the modified cost \(\tilde{c}^{(k)}\). For convenience, we define \(\tilde{c}^{(0)} :=c\).

Claim 7.3

For every phase \(k\ge 0\), we have \(\tilde{c}^{(k)}\in {\text {Im}}(A^{\top })+c-r\) for some \(\mathbb {0}\le r\le k\delta \mathbbm {1}\).

Proof

We proceed by induction on k. The base case \(k=0\) is trivial. Next, suppose that the statement holds for k, and consider the inductive step \(k+1\). Let t be the first iteration of phase \(k+1\), i.e., \(\tilde{c}^{(k+1)}_i = \tilde{s}^{(t-1)}_i\) if \(i\in S_{t}\), and \(\tilde{c}^{(k+1)}_i = 0\) otherwise. Note that \(\tilde{s}^{(t-1)}\in \{\tilde{c}^{(k)},s^{(t-1)}\}\). Since both of them are feasible to the dual of \({\text {LP}}(\tilde{c}^{(k)})\), we have \(\tilde{s}^{(t-1)} \in {\text {Im}}(A^{\top }) + \tilde{c}^{(k)}\). By the inductive hypothesis, \(\tilde{c}^{(k)} \in {\text {Im}}(A^{\top }) + c-r\) for some \(\mathbb {0}\le r\le k\delta \mathbbm {1}\). Hence, from the definition of \(\tilde{c}^{(k+1)}\), we have \(\tilde{c}^{(k+1)}\in {\text {Im}}(A^{\top })+c-r-q\) for some \(\mathbb {0}\le q\le \delta \mathbbm {1}\) as required. \(\square \)

Claim 7.4

For every phase \(k\ge 0\), we have \(\Vert \tilde{c}^{(k)}\Vert _\infty \le 3\sqrt{n}\kappa _A\).

Proof

We proceed by induction on k. The base case \(k=0\) is easy because \(\Vert c\Vert _\infty \le \Vert c\Vert _2 = 1\). Next, suppose that the statement holds for k, and consider the inductive step \(k+1\). Let t be the first iteration of phase \(k+1\). If \(\tilde{s}^{(t-1)} = \tilde{c}^{(k)}\), then \(\tilde{c}^{(k+1)}\) is obtained from \(\tilde{c}^{(k)}\) by setting some coordinates to 0, so we are done by the inductive hypothesis. Otherwise, \(\tilde{s}^{(t-1)} = s^{(t-1)}\). We know that \(s^{(t-1)}\) is an optimal solution to (3) for Ratio-Circuit\((A, \tilde{c}^{(k)}, 1/x^{(t-1)})\). Since \(c-r \in {\text {Im}}(A^{\top }) + \tilde{c}^{(k)}\) for some \(\mathbb {0}\le r\le k\delta \mathbbm {1}\) by Claim 7.3, \(s^{(t-1)}\) is also an optimal solution to (3) for Ratio-Circuit\((A, c-r, 1/x^{(t-1)})\). By (4), we obtain

$$\begin{aligned} \Vert s^{(t-1)}\Vert _\infty \le 2\kappa _A \Vert c-r\Vert _1&\le 2\kappa _A(\Vert c\Vert _1 + \Vert r\Vert _1) \\&\le 2\kappa _A\big (\sqrt{n} + nk\delta \big ) \le 2\kappa _A\big (\sqrt{n} + n(m+1)\delta \big ) \le 3\sqrt{n}\kappa _A. \end{aligned}$$

The third inequality is due to \(\Vert c\Vert _2 = 1\), the fourth inequality follows from the fact that there are at most \(m+1\) phases, and the last inequality follows from the definition of \(\delta \). \(\square \)

We next show a primal proximity lemma that holds for iterates throughout the algorithm.

Lemma 7.5

Let t be the first iteration of a phase \(k\ge 1\). For any iteration \(r\ge t\),

$$\begin{aligned} \left\| x^{(r+1)} - x^{(r)} \right\| _\infty \le \frac{3\sqrt{n}\kappa ^2_A}{\delta } \left\| x^{(t)}_{S_t} \right\| _1\,. \end{aligned}$$
(8)

Proof

Fix an iteration \(r\ge t\) and let \(\ell \ge k\) be the phase in which iteration r occurred. Consider the elementary vector \(g^{(r)}\). If it is returned by Support-Circuit, then \(g^{(r)}_i<0\) for some \(i\in \textrm{supp}(\tilde{c}^{(\ell )})\) by definition. If it is returned by Ratio-Circuit, we also have \(g^{(r)}_i<0\) for some \(i\in \textrm{supp}(\tilde{c}^{(\ell )})\) unless \(\langle \tilde{c}^{(\ell )}, g^{(r)} \rangle = 0\). Note that if \(\langle \tilde{c}^{(\ell )}, g^{(r)} \rangle = 0\), then the algorithm sets \(x^{(r+1)} = x^{(r)}\), which makes the lemma trivially true. Hence, we may assume that such an iteration does not occur.

By construction, we have \(x^{(r+1)} - x^{(r)}=\alpha g^{(r)}\) for some \(\alpha >0\), and \(\alpha |g_i^{(r)}|\le x^{(r)}_i\). Applying the definition of \(\kappa _A\) yields

$$\begin{aligned} \left\| x^{(r+1)} - x^{(r)} \right\| _\infty \le \kappa _A x^{(r)}_i \le \frac{\kappa _A}{\delta } \left\langle \tilde{c}^{(\ell )}, x^{(r)} \right\rangle \le \frac{\kappa _A}{\delta }\left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle \le \frac{3\sqrt{n}\kappa ^2_A}{\delta } \left\| x^{(t)}_{S_t} \right\| _1. \end{aligned}$$

The second inequality uses that all nonzero coordinates of \(\tilde{c}^{(\ell )}\) are at least \(\delta \). The third inequality is by Lemma 7.1, whereas the fourth inequality is by Claim 7.4 and \(\textrm{supp}(\tilde{c}^{(k)}) = S_t\). \(\square \)

With the above lemma, we show that any variable which enters \(\mathcal { L}_t\) at the start of a phase is lower bounded by \({\text {poly}}(n,\kappa _A)\Vert x^{(t)}_{S_t}\Vert _1\) in the next \(\Theta (mT)\) iterations.

Lemma 7.6

Let t be the first iteration of a phase \(k\ge 1\) and let \(i\in \mathcal {L}_t{\setminus } \mathcal {L}_{t-1}\). For any iteration \(t\le t'\le t+2(m+1)T\),

$$\begin{aligned}x^{(t')}_i \ge \frac{6\sqrt{n}\kappa ^2_A}{\delta }\Vert x^{(t)}_{S_t}\Vert _1.\end{aligned}$$

Proof

By definition, we have that \(x_i^{(t)} \ge \Gamma \Vert x_{S_t}^{(t)}\Vert _1\). With Lemma 7.5 we get

$$\begin{aligned} \begin{aligned} x^{(t')}_i \ge x^{(t)}_i - \Vert x^{(t')} - x^{(t)}\Vert _\infty&\ge x^{(t)}_i - \sum _{r=t}^{t'-1}\Vert x^{(r+1)} - x^{(r)}\Vert _\infty \\&\ge \left( \Gamma - \frac{6(m+1)\sqrt{n}\kappa ^2_AT}{\delta }\right) \Vert x^{(t)}_{S_t}\Vert _1 \\&\ge \frac{6\sqrt{n}\kappa ^2_AT}{\delta }\Vert x^{(t)}_{S_t}\Vert _1\,. \end{aligned} \end{aligned}$$

The lower bound follows from \(T\ge 1\), as long as the constant in the definition of T is chosen large enough. \(\square \)

For any iteration t in phase \(k\ge 1\), let us define

$$\begin{aligned} D_t:=\bigcup \left\{ \mathcal { L}_{t'}\setminus \mathcal { L}_{t'-1}\,:\, t' \text { is the first iteration of phase }k'=1,2,\ldots ,k\right\} \,. \end{aligned}$$
(9)

These are the variables which entered \(\mathcal { L}_{t'}\) at the start of a phase for all \(t'\le t\). Note that \({\text {rk}}(D_t)={\text {rk}}(\mathcal { L}_t)\) holds. As a consequence of Lemma 7.6, \(D_t\) remains disjoint from the support of the modified cost \(\tilde{c}^{(k)}\).

Lemma 7.7

Let \(0\le t\le 2(m+1)T\) be an iteration and let \(k\ge 1\) be the phase in which iteration t occurred. Let \(D_t\subseteq \mathcal { L}_t\) be defined as in (9). If \(\langle \tilde{c}^{(k)}, x^{(t)} \rangle > 0\), then

$$\begin{aligned}D_t\cap \textrm{supp}(\tilde{c}^{(k)}) = \emptyset \,.\end{aligned}$$

Proof

For the purpose of contradiction, suppose that there exists an index \(i\in D_t\cap \textrm{supp}(\tilde{c}^{(k)})\). Let \(r\le t\) be the iteration in which i was added to \(\mathcal { L}_r\). By our choice of \(D_t\), r is the first iteration of phase j for some \(j\le k\), which implies that \(S_r = \textrm{supp}(\tilde{c}^{(j)})\). Since \(\langle \tilde{c}^{(j)}, x^{(r)} \rangle \ge \langle \tilde{c}^{(k)}, x^{(t)} \rangle >0\) by Lemma 7.1, we have \(\Vert x^{(r)}_{S_r}\Vert _1 > 0\). However, we get the following contradiction

$$\begin{aligned} 6\sqrt{n}\kappa ^2_A\Vert x^{(r)}_{S_r}\Vert _1 \le \delta x^{(t)}_i\le \left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle \le \left\langle \tilde{c}^{(j)}, x^{(r)} \right\rangle \le 3\sqrt{n}\kappa _A \Vert x^{(r)}_{S_r}\Vert _1. \end{aligned}$$

The first inequality is by Lemma 7.6, the third inequality is by Lemma 7.1, while the fourth inequality is by Claim 7.4. \(\square \)

The following lemma shows that Ratio-Circuit geometrically decreases the norm \(\Vert x^{(t)}_{S_t}\Vert _1\).

Lemma 7.8

Let t be the first Ratio-Circuit iteration in phase \(k\ge 1\). After \(p\in \mathbb {N}\) consecutive Ratio-Circuit iterations in phase k,

$$\begin{aligned} \Vert x_{S_{t+p}}^{(t+p)}\Vert _1\le \frac{3n^{1.5}\kappa _A}{\delta }\left( 1-\frac{1}{n}\right) ^{p-1} \Vert x_{\textrm{supp}(\tilde{c}^{(k)})}^{(t)}\Vert _1. \end{aligned}$$

Proof

$$\begin{aligned} \Vert x^{(t+p)}_{S_{t+p}}\Vert _1&\le \frac{1}{\delta } \left\langle \tilde{s}^{(t+p-1)}, x^{(t+p)} \right\rangle \quad \quad \quad (\text {as}\,\, \tilde{s}^{(t+p-1)}_i \ge \delta \,\, \text {for all}\,\, i\in S_{t+p}) \\&\le \frac{1}{\delta }\left\langle s^{(t+p-1)}, x^{(t+p)} \right\rangle \quad \quad \quad (\text {from the definition of}\,\, \tilde{s}^{(t+p-1)})\\&= \frac{1}{\delta }\left\langle s^{(t+p-1)}, x^{(t+p-1)}+ \alpha g^{(t+p-1)} \right\rangle \quad \quad \quad (\text {for some augmentation step size}\,\,\alpha )\\&= \frac{1}{\delta } \left( \left\langle s^{(t+p-1)}, x^{(t+p-1)} \right\rangle + \alpha \left\langle \tilde{c}^{(k)}, g^{(t+p-1)} \right\rangle \right) \quad \quad \quad (\text {as}\,\, s^{(t+p-1)}\in {\text {Im}}(A^\top ) + \tilde{c}^{(k)}) \\&\le \frac{1}{\delta }\left\langle s^{(t+p-1)}, x^{(t+p-1)} \right\rangle \quad \quad \quad (\text {because}\,\, \left\langle \tilde{c}^{(k)}, g^{(t+p-1)} \right\rangle \le 0) \\&\le -\frac{n}{\delta }\left\langle \tilde{c}^{(k)}, g^{(t+p-1)} \right\rangle \quad \quad \quad (\text {as}\,\, s^{(t+p-1)}_i \le -\left\langle \tilde{c}^{(k)}, g^{(t+p-1)} \right\rangle /x^{(t+p-1)}_i \,\, \text {by}\,\, (3))\\&\le \frac{n}{\delta }\left( \left\langle \tilde{c}^{(k)}, x^{(t+p-1)} \right\rangle - \textrm{OPT}(\tilde{c}^{(k)})\right) \quad \quad \quad (\text {by step size}\,\, \alpha \ge 1 \,\, \text {in Lemma}\,\, 2.5)\\&\le \frac{n}{\delta }\left( 1-\frac{1}{n}\right) ^{p-1}\left( \left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle - \textrm{OPT}(\tilde{c}^{(k)})\right) \quad \quad \quad (\text {by geometric decay in Lemma}\,\, 2.5)\\&\le \frac{n}{\delta }\left( 1-\frac{1}{n}\right) ^{p-1} \left\langle \tilde{c}^{(k)}, x^{(t)} \right\rangle \quad \quad \quad (\text {because} \,\, \tilde{c}^{(k)}\ge \mathbb {0})\\&\le \frac{3n^{1.5}\kappa _A}{\delta }\left( 1-\frac{1}{n}\right) ^{p-1} \left\| x^{(t)}_{\textrm{supp}(\tilde{c}^{(k)})} \right\| _1 \quad \quad \quad (\text {by Claim}\,\,7.4). \end{aligned}$$

\(\square \)

Recall Lemma 5.5, which guarantees the existence of a coordinate with large dual slack; this is why we chose to work with a projected and normalized cost vector in Algorithm 4. We are now ready to prove the main result of this section.

Proof of Theorem 1.4

We first prove the correctness of Algorithm 4. Suppose that the algorithm terminates in iteration t. We may assume that there is at least one phase, as otherwise \(x^{(0)}\) is an optimal solution to (LP). Let \(k\ge 1\) be the phase in which iteration t occurred. Since \(\left\langle \tilde{s}^{(t-1)}, x^{(t)} \right\rangle = 0\) and \(x^{(t)},\tilde{s}^{(t-1)}\) are primal-dual feasible solutions to \({\text {LP}}(\tilde{c}^{(k)})\), they are also optimal. By Claim 7.3, we know that \(\tilde{c}^{(k)}\in {\text {Im}}(A^{\top }) + c -r\) for some \(\Vert r\Vert _\infty \le (m+1)\delta \). Hence, \(\tilde{s}^{(t-1)}\) is also an optimal dual solution to \({\text {LP}}(c')\) where \(c':=c-r\). Since \(c\in \ker (A)\), \(\Vert c\Vert _2 = 1\), and

$$\begin{aligned}\Vert c-c'\Vert _\infty \le (m+1)\delta = \frac{m+1}{2n^{3/2}(m+2)\kappa _A} < \frac{1}{\sqrt{n}(m+2)\kappa _A},\end{aligned}$$

where the strict inequality is due to \(n\ge m\) and \(n>1\), Lemma 5.5 guarantees the existence of an index \(j\in [n]\) such that

$$\begin{aligned}\tilde{s}^{(t-1)}_j> \frac{(m+1)}{\sqrt{n}(m+2)} > (m+1)\kappa _A\Vert c-c'\Vert _\infty .\end{aligned}$$

Thus, the algorithm returns \(N\ne \emptyset \). Moreover, for all \(j\in N\), Theorem 5.4 allows us to conclude that \(x_j^{(t)} = x^*_j = 0\) for every optimal solution \(x^*\) to \({\text {LP}}(c)\).

Next, we show that if \({\text {rk}}(\mathcal { L}_t) = m\) in some phase k, then the algorithm will terminate in iteration \(r \le t+ n + 1\). As long as \(\langle \tilde{c}^{(k)}, x^{(t)} \rangle > 0\), we have \(D_t\subseteq [n]\setminus \textrm{supp}(\tilde{c}^{(k)})\) by Lemma 7.7. Moreover, any \(i\in \textrm{supp}(\tilde{c}^{(k)})\cap \textrm{supp}(x^{(t)})\) induces a circuit in \(D_t\cup \{i\}\), so Support-Circuit will be invoked. Since every call to Support-Circuit reduces \(\textrm{supp}(x^{(t)})\), all the coordinates in \(\textrm{supp}(\tilde{c}^{(k)})\) will be zeroed-out in at most n calls. Let \(t\le t'\le t+n\) be the first iteration when \(\langle \tilde{c}^{(k)}, x^{(t')} \rangle = 0\). Since Ratio-Circuit returns \(g^{(t')}\) with \(\langle \tilde{c}^{(k)}, g^{(t')} \rangle = 0\), the algorithm terminates in the next iteration by Claim 7.2.

It is left to bound the number of iterations of Algorithm 4. Clearly, there are at most \(m+1\) phases. In every phase, there are at most n Support-Circuit iterations because each call sets a primal variable to 0. It is left to show that there are at most T Ratio-Circuit iterations in every phase.

Fix a phase \(k\ge 1\) and assume that every phase \(\ell < k\) consists of at most T Ratio-Circuit iterations. Let t be the first iteration in phase k. We may assume that \({\text {rk}}(\mathcal { L}_t) < m\), as otherwise there is only one Ratio-Circuit iteration in this phase by the previous argument. Note that this implies \(\Vert x^{(t')}_{S_{t'}}\Vert _1>0\) for all \(t'\le t\). Otherwise, \(\mathcal { L}_{t'} = [n]\) and \({\text {rk}}(\mathcal { L}_{t'}) = m\), which contradicts \({\text {rk}}(\mathcal { L}_{t'}) \le {\text {rk}}(\mathcal { L}_t)\).

Let \(r\ge t\) be the first Ratio-Circuit iteration in phase k. Let \(D_r\subseteq \mathcal { L}_r\) be as defined in (9). By Lemma 7.6 and our assumption, we have \(x^{(r)}_{D_r} > \mathbb {0}\). We claim that \(D_r\cap \textrm{supp}(\tilde{c}^{(k)}) = \emptyset \). This is clearly the case if \(\langle \tilde{c}^{(k)}, x^{(r)} \rangle = 0\). Otherwise, it is given by Lemma 7.7. We also know that there is no circuit in \(\textrm{supp}(x^{(r)})\) which intersects \(\textrm{supp}(\tilde{c}^{(k)})\). Hence, applying Lemma 5.1 with \(L=D_r\), \(S=\textrm{supp}({\tilde{c}}^{(k)})\), \(x=x^{(r)}\), \(z=x^{(r+T)}\) yields

$$\begin{aligned} \left\| x^{(r+T)}_{[n]\setminus {\text {cl}}(D_r)} \right\| _\infty \ge \frac{ \left\| x^{(r+T)}_{[n]\setminus {\text {cl}}(D_r)} \right\| _1}{n} \ge \frac{\left\| x^{(r)}_{\textrm{supp}(\tilde{c}^{(k)})} \right\| _\infty }{n\kappa _A} \ge \frac{\left\| x^{(r)}_{\textrm{supp}(\tilde{c}^{(k)})} \right\| _1}{n^2\kappa _A}\ge \Gamma {\left\| x_{S_{r+T}}^{(r+T)} \right\| _1}\,, \end{aligned}$$

where the last inequality follows from Lemma 7.8 by choosing a sufficiently large constant in the definition of T. Note that \({\text {cl}}(D_r) = {\text {cl}}(\mathcal { L}_r)\) because \(D_r\) is a spanning subset of \(\mathcal { L}_r\). Thus, there exists an index \(i\in [n]{\setminus } {\text {cl}}(\mathcal { L}_r)\) which is added to \(\mathcal { L}_{r+T}\), showing that \({\text {rk}}(\mathcal { L}_{r+T}) > {\text {rk}}(\mathcal { L}_r)\) as required.

Since the main circuit-augmentation algorithm consists of applying Algorithm 4 at most n times, we obtain the desired bound on the number of iterations. \(\square \)

8 Circuits in general form

There are many instances in the literature where circuits are considered outside standard equality form. For example, [2, 16, 26] defined circuits for polyhedra in the general form

$$\begin{aligned} P=\{x\in \mathbb {R}^n: Ax=b,\, Bx\le d\}\,,\end{aligned}$$
(10)

where \(A\in \mathbb {R}^{m_A\times n}\), \(B\in \mathbb {R}^{m_B\times n}\), \(b\in \mathbb {R}^{m_A}\), \(d\in \mathbb {R}^{m_B}\). This implicitly includes polyhedra in inequality form, which were considered e.g. in [5, 8]. For this setup, they define \(g\in \mathbb {R}^n\) to be an elementary vector if

  (i) \(g\in \ker (A)\), and

  (ii) \(Bg\) is support-minimal in the collection \(\{By: y\in \ker (A), y\ne \mathbb {0}\}\).

In the aforementioned works, the authors use the term ‘circuit’ also for elementary vectors.

Let us assume that

$$\begin{aligned} {\text {rk}}\begin{pmatrix}A\\ B\end{pmatrix}=n\,. \end{aligned}$$
(11)

This assumption is needed to ensure that P is pointed; otherwise, there exists a vector \(z\in \mathbb {R}^n\), \(z\ne \mathbb {0}\) such that \(Az=\mathbb {0}\), \(Bz=\mathbb {0}\). Thus, the lineality space of P is nontrivial. Note that the circuit diameter is defined as the maximum length of a shortest circuit walk between two vertices; this implicitly assumes that vertices exist and therefore the lineality space is trivial.

Under this assumption, we show that circuits in the above definition are a special case of our definition in the Introduction, and explain how our results in the standard form are applicable. Consider the matrix and vector

$$\begin{aligned} M:=\begin{pmatrix} A & 0\\ B & I_{m_B} \end{pmatrix}\,, \qquad q:=\begin{pmatrix} b\\ d \end{pmatrix}\,, \end{aligned}$$

and let \({\bar{W}}:=\ker (M)\subseteq \mathbb {R}^{n+m_B}\). Let J denote the set of the last \(m_B\) indices, and \(W:=\pi _J({\bar{W}})\) denote the coordinate projection to J. The assumption (11) guarantees that for each \(s\in W\), there is a unique \((x,s)\in {\bar{W}}\); further, \(x\ne \mathbb {0}\) if and only if \(s\ne \mathbb {0}\).

Consider the polyhedron

$$\begin{aligned} {\bar{P}}=\left\{ (x,s)\in \mathbb {R}^{n}\times \mathbb {R}^{m_B}:\, M(x,s)=q\,, s\ge \mathbb {0} \right\} . \end{aligned}$$

Note that P is the projection of \({\bar{P}}\) onto the x variables. Let \(Q:=\pi _J({\bar{P}})\subseteq \mathbb {R}^{m_B}\) be the projection of \({\bar{P}}\) onto the s variables. It is easy to verify the following statements.

Lemma 8.1

If (11) holds, then there is an affine bijection \(\psi \) between Q and P, defined by

$$\begin{aligned} M(\psi (s),s)=q\,. \end{aligned}$$

Further, \(g\in \mathbb {R}^n\) is an elementary vector as in (i),(ii) above if and only if there exists \(h\in \mathbb {R}^{m_B}\) such that \((g,h)\in {\bar{W}}\), \(h\ne \mathbb {0}\) and h is support-minimal.

Given such a pair \((g,h)\in {\bar{W}}\) of elementary vectors, let \(s\in Q\) and let \(s':={\text {aug}}_Q(s,h)\) denote the result of the circuit augmentation starting from s. Then, \(\psi (s')={\text {aug}}_P(\psi (s),g)\).

Consequently, the elementary vectors of (10) are in one-to-one correspondence with elementary vectors in the subspace W as used in this paper. This was also independently shown by Borgwardt and Brugger [1, Corollary 3]. By the last part of the statement, analyzing circuit walks on P reduces to analyzing circuit walks on Q, given in the subspace form \(Q=\{s\in \mathbb {R}^{m_B}:\, s\in W+r, s\ge \mathbb {0}\}\) for some \(r\in \mathbb {R}^{m_B}\).
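For concreteness, \(\psi \) can be evaluated numerically; a minimal sketch under assumption (11) (names ours):

```python
import numpy as np

def psi(A, B, b, d, s):
    """The affine map of Lemma 8.1: the unique x with Ax = b and
    Bx = d - s. Uniqueness is exactly the rank assumption (11); for
    s in Q the stacked system is consistent, so lstsq solves it exactly."""
    S = np.vstack([A, B])
    rhs = np.concatenate([b, d - s])
    x, *_ = np.linalg.lstsq(S, rhs, rcond=None)
    return x
```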

Finally, we can represent Q in standard equality form as follows. Using row operations on M, we can create an \(n\times n\) identity matrix in the first n columns. Thus, we can construct a representation \(Q=\{s\in \mathbb {R}^{m_B}:\, Hs=f, s\ge \mathbb {0}\}\), where \(H\in \mathbb {R}^{(m_A+m_B-n)\times m_B}\), \(f\in \mathbb {R}^{m_A+m_B-n}\). By Lemma 8.1,

$$\begin{aligned} \kappa _H = \max \left\{ \frac{|(Bg)_i|}{|(Bg)_j|}: i,j\in \textrm{supp}(Bg), g \text { is an elementary vector of }(10)\right\} . \end{aligned}$$
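One way to realize this elimination numerically (a sketch of an equivalent computation, not the paper's procedure verbatim): any row vector \((u,v)\) with \(u^\top A + v^\top B = \mathbb {0}^\top \) satisfies \(v^\top s = u^\top b + v^\top d\) on \({\bar{P}}\), and a basis of such vectors yields H and f.

```python
import numpy as np

def standard_form_of_Q(A, B, b, d):
    """Build H, f with Q = {s : Hs = f, s >= 0}. A basis (u, v) of the
    left null space of [A; B] gives H = V and f = U b + V d: from
    Ax = b and Bx + s = d we get v^T s = v^T d + u^T A x = v^T d + u^T b."""
    mA, n = A.shape
    U_full = np.linalg.svd(np.vstack([A, B]))[0]
    W = U_full[:, n:]                # columns span {w : w^T [A; B] = 0}
    U, V = W[:mA, :].T, W[mA:, :].T  # split each basis vector into (u, v)
    return V, U @ b + V @ d          # H, f
```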