## 1 Introduction and formulation of the main result

We consider the following optimization problem:

\begin{aligned} \min \{f(x): x\in S\}, \end{aligned}
(1.1)

where f is a function and S is a finite subset of rational vectors in $$\mathbb {R}^n.$$ We assume that f can be accessed via an oracle which uses a black-box data structure to compute f. For instance, in the linear case, this structure can contain the coefficient vector. We will say that f is polynomially computable if there is an algorithm (an evaluation oracle) such that both the number of arithmetic operations and the binary sizes of the numbers in the course of that algorithm are bounded by a polynomial in the binary size of that data structure and in the binary size of the argument. In this case, the binary size of f(x) is also polynomially bounded in the mentioned sense for every $$x\in S.$$

Let OPT denote the optimal value. A feasible solution x is an $$\varepsilon$$-approximate solution to the above problem if

\begin{aligned} f(x) \le OPT + \varepsilon . \end{aligned}

We assume that f belongs to a class $$\mathcal {C}$$ of functions defined on S that is closed under additions of linear functions, i.e., if $$f\in \mathcal {C}$$ and h is linear over S,  then $$f+h\in \mathcal {C}.$$ An algorithm computing g(x) for every $$g\in \mathcal {C}$$ and $$x\in S$$ is further called an evaluation oracle.

Also, we assume that we have an augmentation oracle which, given $$x\in S$$ and $$g\in \mathcal {C},$$ either returns $$y\in S$$ such that $$g(y) < g(x),$$ or correctly decides that x minimizes g over S.
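To make the oracle model concrete, the following is a minimal Python sketch of an augmentation oracle for a small, explicitly listed S. The brute-force scan, the names `augmentation_oracle` and `add_linear`, and the example function are illustrative assumptions; in the model itself, S is accessible only through the oracle.

```python
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, ...]


def augmentation_oracle(
    g: Callable[[Vector], float], x: Vector, S: Sequence[Vector]
) -> Optional[Vector]:
    """Return some y in S with g(y) < g(x), or None if x minimizes g over S.

    Assumption: S is small enough to enumerate; a realistic oracle would
    exploit problem structure instead of scanning S.
    """
    gx = g(x)
    for y in S:
        if g(y) < gx:
            return y  # any strict improvement is acceptable
    return None


def add_linear(g, h):
    """Return x -> g(x) + h^T x, mirroring closure under linear functions."""
    return lambda x: g(x) + sum(hi * xi for hi, xi in zip(h, x))


# Hypothetical example data.
S = [(0, 0), (0, 1), (1, 0), (1, 1)]
f = lambda x: (x[0] - x[1]) ** 2 - x[0]

better = augmentation_oracle(f, (0, 1), S)       # some strictly better point
certificate = augmentation_oracle(f, (1, 1), S)  # None: (1, 1) minimizes f
g = add_linear(f, (3, 0))                        # perturbed objective
```

The oracle returns any improving solution rather than the best one, which is all the definition above requires.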

Further we consider a model of computation including arithmetic operations, additions of linear functions to functions of $$\mathcal {C},$$ and calls to the augmentation oracle. By an oracle running time or oracle complexity of an algorithm we understand the number of calls to the augmentation oracle performed by this algorithm for a given input. The access to solutions in S is only possible via the augmentation oracle, i.e., it is only possible to check whether $$x\in S$$ is optimal for a given function $$g\in \mathcal {C}$$ and, if this is not the case, to get another solution in S with a lower value of g. Also, there is no direct access to the evaluation oracle, which is used exclusively inside the augmentation oracle. That is, in this model of computation we are basically interested in the number of calls to the augmentation oracle and put aside any details related to the arithmetic model of computation.

In the subsequent sections we present a general scaling framework where the main idea is to perform a perturbation of the objective function by adding a linear function multiplied by a scaling parameter. The linear function which is to be added at each iteration depends on the current solution. The scaling parameter decreases by a factor of two whenever the current solution is optimal for the perturbed objective function, which is verified by means of the augmentation oracle. An advantage of this scaling technique is that it is applicable to arbitrary objective functions, in contrast to methods based on bit-scaling like that proposed by Schulz et al. [11], which are restricted to linear objective functions.

The choice of a linear function for perturbation of the original objective function depends on the problem in question. We will consider two cases; in the first case, S is a set of binary vectors and in the second case S is the vertex set of a polytope defined by a system of linear equations and nonnegativity constraints.

If S is a set of binary vectors, our algorithm runs in polynomial oracle time. The previous results on the oracle complexity of 0–1 problems focused on linear problems. Schulz et al. [11] proved that a linear 0–1 problem can be solved in a polynomial number of calls to an augmentation oracle. Chubanov [2] proposed a variant of the simplex method for integer linear problems given by verification oracles, i.e., by the oracles which only verify whether a given solution is optimal and which do not necessarily return an improvement direction if the given solution is not optimal. This variant of the simplex method visits O(n) vertices of the convex hull of the feasible set in the 0–1 case and O(nq) vertices in the more general case where the feasible set is contained in $$[0,q]^n.$$ Orlin et al. [10] considered the case when local optimality does not necessarily imply global optimality and proposed a fully polynomial time approximation scheme for finding an $$\varepsilon$$-local optimal solution, i.e., one whose objective value is within a factor of $$1+\varepsilon$$ of the minimum objective value in the respective neighborhood of this solution. In the present paper we consider only the case when local optimality means global optimality; so the existence of an FPTAS in the sense of the concept of $$\varepsilon$$-local optimality introduced in [10] seems to be an open question in the nonlinear case.

For the well-known polynomially solvable cases like the assignment problem and the minimum-weight spanning tree problem, as well as the problem of finding a maximum-weight independent set of a matroid, for which simple augmentation oracles are known, our scaling algorithm represents an alternative polynomial-time algorithmic approach. The main advantage of our algorithm is that its iterations are easy to parallelize because any improvement of the current objective function (which can differ from the objective function of the problem in question) is sufficient at each iteration, in contrast to greedy algorithms, which require the best local improvement.

On the other hand, our algorithm implies a new complexity bound for the simplex method. For the polytopes of the form $$P = \{x\in \mathbb {R}^n:Ax = b, x\ge \mathbf{0}\},$$ where A is a matrix and b is a vector of the respective dimension ($$\mathbf{0}$$ denotes a zero vector), we show that there is a variant of the simplex method which solves a linear problem over P by visiting a number of vertices which is polynomially bounded in certain parameters related to the problem, in particular in the ratio between the largest and the smallest nonzero elements of a basic feasible solution. It should be noted that our bound depends on the dimension, i.e., on the number of variables, only logarithmically; the respective bound on the diameters of polytopes is presented in Corollary 5.1. Our bound can be viewed as an improvement of the bound obtained recently by Kitahara and Mizuno [7], which depends linearly on the dimension.

Another important consequence of our scaling algorithm is that any greedy algorithm for binary optimization, with the property that it is able to find an optimal solution for any function in the class $$\mathcal {C},$$ must run in polynomial oracle time. Moreover, the greedy algorithm runs in strongly polynomial oracle time for the case of a linear objective function. To prove this, we use our scaling algorithm.

It is not clear if our algorithm can be generalized for the case when S is a finite subset of integer vectors which are not necessarily vertices of a polytope. In particular, the question is whether there is an algorithm for (1.1) with pseudopolynomial oracle running time when S is a finite subset of integer vectors. More precisely, assuming that S is a set of nonnegative integer vectors such that $$x \le u$$ for some integer vector u,  the question is whether there exists an algorithm for finding an $$\varepsilon$$-approximate solution for (1.1) whose oracle complexity is polynomial exclusively in n, $$\Vert u\Vert _\infty,$$ and $$\log \frac{1}{\varepsilon }.$$ A special case of (1.1) where the objective function is linear and $$S = \{x \in \mathbb {Z}^n: Ax = b, \mathbf{0} \le x \le u\}$$ was studied by De Loera, Hemmecke, and Lee [3]; they developed a pseudopolynomial oracle algorithm for this case.

In this paper, we do not consider any concrete class of nonlinear functions which would be closed under additions of linear functions and permit an efficient augmentation oracle. This question is very nontrivial and should be studied separately.

## 2 Scaling algorithm

Let us consider a family of functions

\begin{aligned} g_{y,\delta }:S\longrightarrow \mathbb {R} \end{aligned}
(2.1)

parameterized by $$y\in S$$ and $$\delta \in \mathbb {R}_+,$$ such that, for all $$y\in S,$$

\begin{aligned} g_{y,\delta }(x) = f(x) + \delta \cdot c^T_y(x - y), \end{aligned}

where $$c_y\in \mathbb {R}^n$$ is a vector depending on y such that

\begin{aligned} \forall y\in S, \forall x\in S, x\ne y: c^T_y(x - y) > 0. \end{aligned}
(2.2)

Note that, according to (2.2), $$c_y$$ defines a linear function which is uniquely minimized by y. If all elements of S are vertices of the convex hull of S,  then such $$c_y$$ exists for every $$y\in S.$$ In the next sections we consider different special cases where a suitable $$c_y$$ is easy to find. Denote

\begin{aligned} \mu ^- = \min _{x, y\in S, x\ne y} c^T_{y}(x - y),\;\;\; \mu ^+ = \max _{x, y\in S} c^T_{y}(x - y) . \end{aligned}
(2.3)

These values are positive according to our assumptions about $$c_y.$$
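As an illustration of (2.2) and (2.3), the sketch below computes $$\mu ^-$$ and $$\mu ^+$$ by enumeration for a tiny set S. The three points and the vectors $$c_y$$ are hypothetical values chosen by hand so that (2.2) holds; the helper name `mu_bounds` is likewise an assumption.

```python
def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mu_bounds(S, c):
    """Return (mu_minus, mu_plus) from (2.3) for a dict c mapping y to c_y.

    The maximum in (2.3) also ranges over x = y, which contributes 0; for
    |S| >= 2 and under (2.2) it coincides with the maximum over x != y.
    """
    values = []
    for y in S:
        for x in S:
            if x != y:
                diff = [xi - yi for xi, yi in zip(x, y)]
                values.append(dot(c[y], diff))
    if min(values) <= 0:
        raise ValueError("condition (2.2) is violated for the given c_y")
    return min(values), max(values)


# Hypothetical example: vertices of a triangle and hand-picked c_y.
S = [(0, 0), (2, 0), (0, 3)]
c = {(0, 0): (1, 1), (2, 0): (-1, 1), (0, 3): (1, -1)}

mu_minus, mu_plus = mu_bounds(S, c)
print(mu_minus, mu_plus)  # prints "2 5"
```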

### Lemma 2.1

The following statements are true:

(i) If y is optimal for $$g_{y,\delta },$$ then y is a $$(\mu ^+\delta )$$-approximate solution of (1.1).

(ii) If $$x\ne y$$ and $$g_{y,\delta }(y) \ge g_{y,\delta }(x),$$ then $$f(x) \le f(y) - \mu ^-\delta .$$

### Proof

Proof of (i): Let $$x^*$$ be an optimal solution of (1.1). Then,

\begin{aligned} f(y) = g_{y,\delta }(y) \le g_{y,\delta }(x^*) = f(x^*) + \delta \cdot c^T_y(x^*-y) \le OPT + \mu ^+\delta . \end{aligned}

Proof of (ii):

\begin{aligned} 0 \ge g_{y,\delta }(x) - g_{y,\delta }(y) = f(x) - f(y) + \delta \cdot c^T_y(x-y) \ge f(x) - f(y) + \mu ^-\delta . \end{aligned}

This implies (ii). $$\square$$
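Both statements of Lemma 2.1 can be checked numerically on a small instance. The sketch below is only a sanity check under stated assumptions: S is the full cube $$\{0,1\}^3,$$ the vector $$c_y$$ has ith component 1 if $$y_i = 0$$ and $$-1$$ if $$y_i = 1$$ (so that $$c^T_y(x-y)$$ is the number of differing components, $$\mu ^- = 1$$ and $$\mu ^+ = 3$$), and f is a random integer table. All names are hypothetical.

```python
import random
from itertools import product


def c_binary(y):
    """c_y with component +1 where y_i = 0 and -1 where y_i = 1."""
    return tuple(1 - 2 * yi for yi in y)


def g(f, y, delta, x):
    """Perturbed value g_{y,delta}(x) = f(x) + delta * c_y^T (x - y)."""
    cy = c_binary(y)
    return f[x] + delta * sum(ci * (xi - yi) for ci, xi, yi in zip(cy, x, y))


def lemma_holds(f, S, delta, mu_minus, mu_plus):
    """Check statements (i) and (ii) of Lemma 2.1 for every y in S."""
    opt = min(f.values())
    for y in S:
        gy = g(f, y, delta, y)  # equals f[y]
        # (i): y optimal for g_{y,delta} implies a (mu_plus*delta)-approximation
        if all(g(f, y, delta, x) >= gy for x in S):
            if f[y] > opt + mu_plus * delta:
                return False
        # (ii): g(y) >= g(x) with x != y forces a decrease of mu_minus*delta
        for x in S:
            if x != y and gy >= g(f, y, delta, x):
                if f[x] > f[y] - mu_minus * delta:
                    return False
    return True


random.seed(0)
S = list(product((0, 1), repeat=3))
f = {x: random.randint(-10, 10) for x in S}  # arbitrary objective values
results = [lemma_holds(f, S, d, 1, 3) for d in (0.5, 1, 2, 4)]
```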

Lemma 2.1 naturally suggests the following algorithm, which we further call the scaling algorithm:

Input: A family of functions (2.1) and an initial solution $$x^0\in S.$$

Output: $$x^*\in S$$ optimal for (1.1).

1. Set $$k:=0$$ and $$\delta := 1.$$ While $$x^0$$ is not optimal for $$g_{x^0,\delta },$$ set $$\delta := 2\delta .$$

2. Call the augmentation oracle for f and $$x^k;$$ if $$x^k$$ is optimal for f,  then return $$x^*:= x^k.$$

3. Call the augmentation oracle to find $$x^{k+1}\in S$$ with $$g_{x^k,\delta }(x^{k+1}) < g_{x^k,\delta }(x^k)$$ or prove that $$x^k$$ is optimal for $$g_{x^k,\delta }.$$

4. If $$x^k$$ is optimal for $$g_{x^k,\delta },$$ then set $$\delta := \delta /2$$ and $$x^{k+1} := x^k.$$

5. Set $$k := k + 1$$ and go to Step 2.

Further, we refer to repetitions of steps 2–5 as iterations of the scaling algorithm.
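The five steps can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the augmentation oracle is simulated by a brute-force scan of an explicitly listed S, and the names `scaling_algorithm`, `oracle`, and `perturbed`, as well as the example objective, are assumptions.

```python
from itertools import product


def scaling_algorithm(f, S, c, x0):
    """Run Steps 1-5; return (optimal solution, number of oracle calls).

    f  : objective, callable on tuples
    S  : explicit list of feasible points (assumption: small enough to scan)
    c  : callable y -> c_y satisfying (2.2)
    x0 : initial feasible solution
    """
    calls = 0

    def oracle(g, x):
        # Brute-force stand-in for the augmentation oracle.
        nonlocal calls
        calls += 1
        gx = g(x)
        for y in S:
            if g(y) < gx:
                return y
        return None

    def perturbed(y, delta):
        cy = c(y)
        return lambda x: f(x) + delta * sum(
            ci * (xi - yi) for ci, xi, yi in zip(cy, x, y)
        )

    # Step 1: double delta until x0 is optimal for g_{x0,delta}.
    delta = 1.0
    while oracle(perturbed(x0, delta), x0) is not None:
        delta *= 2

    x = x0
    while True:
        # Step 2: stop as soon as x is optimal for f itself.
        if oracle(f, x) is None:
            return x, calls
        # Step 3: try to improve the perturbed objective.
        y = oracle(perturbed(x, delta), x)
        if y is None:
            delta /= 2  # Step 4: x is optimal for g_{x,delta}
        else:
            x = y       # Step 5: move to the improved solution


# Hypothetical instance: S = {0,1}^3 with c_y = (-1)^y componentwise.
S = list(product((0, 1), repeat=3))
f = lambda x: (sum(x) - 2) ** 2 + x[0]
c = lambda y: tuple(1 - 2 * yi for yi in y)

x_star, calls = scaling_algorithm(f, S, c, (0, 0, 0))
```

Because $$\delta$$ stays a power of two and the example objective is integer-valued, the floating-point comparisons in this sketch are exact.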

At each iteration, the scaling algorithm tries to improve the current value of function $$g_{x^k,\delta }$$ associated with the current solution. This function belongs to $$\mathcal {C}$$ because it is obtained from $$f\in \mathcal {C}$$ by adding a linear function and $$\mathcal {C}$$ is closed under additions of linear functions. So it is important to keep in mind that the augmentation oracle should be applicable to every function in $$\mathcal {C}.$$

Step 1 sets $$\delta$$ to 1 and then multiplies it by 2 in the while-loop until $$x^0$$ becomes optimal for $$g_{x^0,\delta }.$$ The while-loop of Step 1 terminates after

\begin{aligned} O\left( \log \frac{f(x^0) - OPT}{\mu ^-}\right) \end{aligned}
(2.4)

iterations because if $$x^0$$ is not optimal for $$g_{x^0,\delta }$$ then $$f(x^0)$$ can be improved by at least $$\mu ^-\delta$$ (which follows from (ii) of Lemma 2.1) and $$f(x^0) - f(x) \le f(x^0) - OPT$$ for any $$x\in S.$$ Thus, $$\delta$$ is either 1 after Step 1, in which case $$x^0$$ is optimal for $$g_{x^0,1},$$ or $$\delta > 1.$$ In the latter case, after Step 1, for a solution x which is optimal for $$g_{x^0,\delta /2},$$ we have

\begin{aligned} f(x^0) = g_{x^0,\delta /2}(x^0) > g_{x^0,\delta /2}(x) = f(x) + \frac{\delta }{2}\cdot c^T_{x^0}(x - x^0) \ge OPT + \frac{\mu ^-\delta }{2}. \end{aligned}

Therefore, the initial value $$\delta$$ constructed in Step 1 satisfies

\begin{aligned} \delta \le \max \left\{ 1, \frac{2(f(x^0) - OPT)}{\mu ^-}\right\} . \end{aligned}
(2.5)

Let $$\gamma > 0$$ such that $$\gamma \le \mu ^+$$ be a strict lower bound on the gap between two different objective values that can be taken at feasible solutions:

\begin{aligned} \forall x,y\in S, f(x)\ne f(y): \gamma < |f(x) - f(y)|. \end{aligned}

### Theorem 2.1

The scaling algorithm finds an optimal solution in oracle time

\begin{aligned} O\left( \frac{\mu ^+}{\mu ^-}\cdot \log \frac{\mu ^+(f(x^0) - OPT)}{\mu ^-\cdot \gamma }\right) . \end{aligned}
(2.6)

### Proof

As already mentioned, Step 1 requires a number of oracle calls bounded by (2.4).

The first division of $$\delta$$ occurs at exactly the first iteration because $$x^0$$ is optimal for $$g_{x^0,\delta }$$ after Step 1.

The number of iterations between two consecutive divisions of $$\delta$$ is bounded by $$O(\mu ^+/\mu ^-)$$ due to Lemma 2.1. Indeed, let $$x^k$$ be optimal for $$g_{x^k,\delta }.$$ Then (i) of Lemma 2.1 implies that $$x^k$$ is $$(\mu ^+\delta )$$-approximate for the current $$\delta .$$ Denote the current $$\delta$$ by $$\delta ^\prime .$$ The new $$\delta$$ is equal to $$\delta ^\prime /2$$ according to Step 4. Statement (ii) of Lemma 2.1 implies that, in each iteration before the next division of $$\delta ,$$ the objective value will be improved by at least $$\mu ^-\delta ^\prime /2.$$ The number of such improvements is bounded by $$O(\mu ^+/\mu ^-)$$ because $$x^k$$ is $$(\mu ^+\delta ^\prime )$$-approximate.

If $$\delta$$ is divided by 2,  the current solution $$x^k$$ is $$(\mu ^+\delta )$$-approximate for the current $$\delta .$$ If $$\delta$$ were less than $$\gamma /\mu ^+,$$ the current solution would be $$\gamma$$-approximate, i.e., optimal for f. In this case Step 2 would return an optimal solution and the algorithm would stop. It follows that $$\delta$$ is still not less than $$\gamma /\mu ^+$$ whenever a division of $$\delta$$ occurs. Therefore, (2.5) implies that the number of divisions of $$\delta$$ is bounded by the logarithm in (2.6).

Thus, we have proved that the scaling algorithm runs in oracle time bounded by (2.6). $$\square$$

## 3 Scaling algorithm for binary optimization

In this section, we assume that

\begin{aligned} S\subset \{0,1\}^n. \end{aligned}

Let $$g_{y,\delta }$$ be defined as

\begin{aligned} g_{y,\delta }(x) = f(x) + \delta \cdot ((-\mathbf{1})^{y})^T(x - y), \end{aligned}
(3.1)

where $$(-\mathbf{1})^y$$ is a componentwise power. That is, the ith component of $$(-\mathbf{1})^y$$ is 1 if $$y_i = 0$$ and it is equal to $$-1$$ if $$y_i = 1.$$ Note that $$g_{y,\delta }(y) = f(y).$$ When restricted to 0–1 vectors x and y,  this function has the following interpretation: when computing $$g_{y,\delta }(x),$$ the value $$\delta$$ is added to f(x) whenever $$x_i \ne y_i$$ for an index $$i = 1,\ldots ,n.$$ That is, one can say that we additionally pay $$\delta$$ for each inversion of a binary digit when moving from y to x. So if x and y are 0–1 vectors and $$x\ne y,$$ then

\begin{aligned} f(x) \le g_{y,\delta }(x) - \delta . \end{aligned}
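The interpretation of (3.1) as a per-bit penalty can be verified by enumeration. The following sketch checks, on the cube $$\{0,1\}^3,$$ that the perturbation term equals $$\delta$$ times the number of differing components and that the inequality above holds for $$x\ne y.$$ The example function f and all helper names are hypothetical.

```python
from itertools import product


def c_binary(y):
    """Componentwise power (-1)^y: +1 where y_i = 0, -1 where y_i = 1."""
    return tuple(1 - 2 * yi for yi in y)


def perturbed(f, y, delta):
    """Return g_{y,delta} from (3.1) as a callable."""
    cy = c_binary(y)
    return lambda x: f(x) + delta * sum(
        ci * (xi - yi) for ci, xi, yi in zip(cy, x, y)
    )


def hamming(x, y):
    """Number of components in which x and y differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


f = lambda x: 3 * x[0] - 2 * x[1] * x[2]  # arbitrary nonlinear example
delta = 0.5
cube = list(product((0, 1), repeat=3))

# g_{y,delta}(x) = f(x) + delta * (number of inverted binary digits)
identity_ok = all(
    perturbed(f, y, delta)(x) == f(x) + delta * hamming(x, y)
    for x in cube for y in cube
)
# f(x) <= g_{y,delta}(x) - delta whenever x != y
gap_ok = all(
    f(x) <= perturbed(f, y, delta)(x) - delta
    for x in cube for y in cube if x != y
)
```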

Let m be an upper bound on the number of nonzeros in any feasible solution:

\begin{aligned} \forall x\in S:\Vert x\Vert _1 \le m. \end{aligned}

### Theorem 3.1

If S is a set of 0–1 vectors, then the scaling algorithm runs in oracle time bounded by

\begin{aligned} O\left( m\log \frac{m (f(x^0) - OPT)}{\gamma }\right) . \end{aligned}

### Proof

Note that $$\mu ^+ \le 2m$$ and $$\mu ^- \ge 1$$ because $$c_y = (-\mathbf{1})^{y}$$ and then apply Theorem 2.1. $$\square$$

It is not hard to see that the scaling algorithm is a polynomial-time method for such problems as minimum-cost spanning trees, the assignment problem and, more generally, for all combinatorial problems with polynomially computable objective functions for which there is a polynomial-time augmentation oracle for the class $$\mathcal {C}.$$ An advantage of the scaling algorithm, for instance for the spanning tree problem, is that it can be parallelized. It follows from matroid theory that a spanning tree is optimal if and only if no replacement of an edge of this tree by an edge not belonging to the tree leads to a better solution. An augmentation oracle in the course of the scaling algorithm does not need to find the best improvement; it only needs to find a better solution for the current objective function or to prove that the current solution is optimal for the current objective function. That is, we can decompose the neighborhood explored by the augmentation oracle according to the number of processors in a parallel model of computation, so that each processor independently explores the respective subset of the neighborhood.
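The decomposition of the neighborhood among processors might look as follows. This is a hedged sketch: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` only to show the structure (threads in CPython do not necessarily give a real speed-up for pure Python code), and the function names, the chunking scheme, and the toy objective are assumptions. It is a valid augmentation oracle only when optimality within the scanned neighborhood implies global optimality, as in the spanning tree example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_improvement(g, x, candidates):
    """Scan one part of the neighborhood for any strictly better solution."""
    gx = g(x)
    for y in candidates:
        if g(y) < gx:
            return y
    return None


def parallel_augmentation_oracle(g, x, neighborhood, workers=4):
    """Split the neighborhood into `workers` parts and scan them concurrently.

    Returns any improving solution found by some worker, or None if no
    worker finds one (x is then optimal within the neighborhood).
    """
    chunks = [neighborhood[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scan = lambda chunk: find_improvement(g, x, chunk)
        for y in pool.map(scan, chunks):
            if y is not None:
                return y
    return None


# Hypothetical toy data: minimize the number of ones.
g = lambda v: sum(v)
x = (1, 1, 0)
neighborhood = [(1, 1, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]

improved = parallel_augmentation_oracle(g, x, neighborhood)
none_found = parallel_augmentation_oracle(g, (0, 0, 0), [(0, 0, 0), (1, 0, 0)])
```

No coordination between workers is needed beyond collecting their answers, since any improvement suffices.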

## 4 Greedy algorithm

Now we are going to prove that every greedy algorithm of a sufficiently general class must run in polynomial oracle time. Let N be a mapping from S to $$2^S,$$ the set of all subsets of S,  such that $$x\in N(x)$$ for all $$x\in S.$$ In other words, N defines a neighborhood for each $$x\in S.$$ Assume that N is defined so that the following optimality condition holds for any $$g\in \mathcal {C}:$$

\begin{aligned} x^*\in \arg \min _{x\in S} g(x) \Longleftrightarrow x^*\in \arg \min _{x\in N(x^*)} g(x). \end{aligned}
(4.1)

This condition says that $$x^*$$ is optimal for g if and only if $$x^*$$ is locally optimal for g. For instance, if S represents the set of spanning trees of a graph, then N(x) can be defined as the set consisting of x and all spanning trees which can be obtained by means of removing an edge of the current tree x and adding another edge of the graph. (More generally, we can consider S representing the set of bases of a matroid.) A greedy algorithm based on such pairwise interchanges belongs to the class of greedy algorithms that we define below.

Consider the following greedy algorithm:

Input: Problem (1.1) and an initial solution $$x^0\in S.$$

Output: An optimal solution $$x^*$$ to (1.1).

1. $$k:=0.$$

2. Find $$x^{k+1}$$ in $$\arg \min _{x\in N(x^k)} f(x).$$

3. $$k := k + 1.$$

4. Repeat steps 2 and 3 until $$x^k \in \arg \min _{x\in N(x^k)} f(x).$$

5. Return $$x^* := x^k.$$
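A minimal Python sketch of this greedy algorithm is given below. The instance is hypothetical: S consists of the 0–1 vectors of length 4 with exactly two ones (the bases of a uniform matroid), N(x) contains x and all vectors obtained by one pairwise interchange, and the cost vector is chosen arbitrarily. The function names are assumptions.

```python
def swap_neighborhood(x):
    """Return x together with all vectors obtained by swapping a 1 and a 0."""
    result = [x]
    ones = [i for i, v in enumerate(x) if v == 1]
    zeros = [i for i, v in enumerate(x) if v == 0]
    for i in ones:
        for j in zeros:
            y = list(x)
            y[i], y[j] = 0, 1
            result.append(tuple(y))
    return result


def greedy(f, x0, neighborhood):
    """Move to a best neighbor until the current solution is locally optimal.

    Returns the final solution and the number of moves performed.
    """
    x, moves = x0, 0
    while True:
        best = min(neighborhood(x), key=f)  # local optimization oracle
        if f(best) >= f(x):                 # x is in argmin over N(x)
            return x, moves
        x, moves = best, moves + 1


cost = (5, 1, 3, 2)  # hypothetical linear objective
f = lambda x: sum(ci * xi for ci, xi in zip(cost, x))

x_star, moves = greedy(f, (1, 0, 1, 0), swap_neighborhood)
print(x_star, f(x_star), moves)  # prints "(0, 1, 0, 1) 3 2"
```

In contrast to the augmentation oracle of the scaling algorithm, each step here requires a best solution in the neighborhood, not merely an improving one.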

In the context of the greedy algorithm, by an augmentation oracle we mean one which is able to minimize a given function $$g\in \mathcal {C}$$ over the neighborhood N(x) of a given solution $$x\in S.$$ To distinguish such oracles from a larger class of augmentation oracles we have already introduced, we call such oracles local optimization oracles. Whenever we are speaking about oracle running time of the greedy algorithm, we mean a model of computation with a local optimization oracle. Although the greedy algorithm does not use optimization of other functions than f in the neighborhood of the current solution at each iteration, condition (4.1) is important when estimating the running time:

### Theorem 4.1

Let (4.1) hold for all $$g\in \mathcal {C}.$$ Then, if S is a set of 0–1 vectors, the greedy algorithm runs in oracle time

\begin{aligned} O\left( m \log \frac{m(f(x^0) - OPT)}{\gamma }\right) . \end{aligned}

### Proof

First, let us prove the following statement: If

\begin{aligned} f(x^{k+1}) \ge f(x^k) - \delta , \end{aligned}
(4.2)

then $$x^{k}$$ minimizes $$g_{x^k,\delta }.$$

The statement can be proved in the following way. Assume the contrary, i.e., taking into account (4.1), that there is $$x\in N(x^k)$$ with

\begin{aligned} g_{x^k,\delta }(x) < g_{x^k,\delta }(x^k). \end{aligned}

Then, since $$x^{k+1}$$ minimizes f over $$N(x^k),$$

\begin{aligned} f(x^{k+1}) \le f(x) \overset{x\ne x^k}{ \le } g_{x^k,\delta }(x) - \delta < g_{x^k,\delta }(x^k) - \delta = f(x^k) - \delta . \end{aligned}

It follows that

\begin{aligned} f(x^{k+1}) < f(x^k) - \delta , \end{aligned}

which contradicts (4.2).

The above statement implies that if (4.2) takes place, then $$x^k$$ is a $$2m\delta$$-approximate solution. Indeed, using the fact that (4.2) implies that $$x^{k}$$ is optimal for $$g_{x^k,\delta },$$ we can write:

\begin{aligned} f(x^{k}) = g_{x^k,\delta }(x^{k}) \le g_{x^k,\delta }(x^*)\le f(x^*) + \Vert x^* - x^k\Vert _1\cdot \delta \le OPT + 2m\delta , \end{aligned}

where $$x^*$$ is optimal for f. Let $$\delta$$ be initialized exactly as in the scaling algorithm. Let $$\delta$$ be divided by 2 whenever the improvement of the objective function in the greedy algorithm is smaller than $$\delta ,$$ i.e., whenever (4.2) holds. The number of iterations between two consecutive divisions of $$\delta$$ does not exceed 4m because if no division of $$\delta$$ occurs then the current value of f is improved by at least $$\delta$$ and a division of $$\delta$$ implies that the current solution is $$2m\delta$$-approximate for the current $$\delta .$$ Now, to estimate the number of divisions of $$\delta ,$$ we observe that a $$\gamma$$-approximate solution is optimal, which implies the respective logarithmic bound on the number of divisions of $$\delta .$$ $$\square$$

If m is much smaller than n,  say, when n is exponential in m,  the above theorem improves the bound O(n) for the diameter of 0,1-polytopes obtained by Naddef [9]:

### Corollary 4.1

Let P be a 0,1-polytope where each vertex has no more than m nonzero components. Then, the diameter of P is bounded by $$O(m\log n).$$

### Proof

Let w be a vertex of P and f be defined as $$f(x) = (\mathbf{1} - 2w)^Tx.$$ This function is uniquely minimized by w. Let v be another vertex of P. Let S be the vertex set of P and, for each $$x\in S,$$ N(x) consist of x and the vertices adjacent to x. The greedy algorithm starting at $$x^0 = v$$ constructs a path from v to w in the 1-skeleton of P to minimize f over S. Now we apply Theorem 4.1 noting that $$\gamma$$ can be chosen as $$\gamma = 1/2$$ and $$|f(x)| \le n$$ for all $$x\in S$$ in this case. $$\square$$
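The two facts used in this proof, namely that $$f(x) = (\mathbf{1} - 2w)^Tx$$ is uniquely minimized by w among 0–1 vectors and that $$|f(x)| \le n,$$ can be checked by enumeration. The sketch below does so over the whole cube $$\{0,1\}^4,$$ which contains the vertex set of any 0,1-polytope in that dimension; the particular w and the helper name are hypothetical.

```python
from itertools import product


def objective_for(w):
    """Return f(x) = (1 - 2w)^T x for a fixed 0-1 vector w."""
    return lambda x: sum((1 - 2 * wi) * xi for wi, xi in zip(w, x))


w = (1, 0, 1, 0)  # hypothetical target vertex
n = len(w)
f = objective_for(w)
cube = list(product((0, 1), repeat=n))

best = min(f(x) for x in cube)
minimizers = [x for x in cube if f(x) == best]
bounded = all(abs(f(x)) <= n for x in cube)

# f takes integer values, so two distinct objective values differ by at
# least 1, which is why gamma = 1/2 is a strict lower bound on the gap.
integral = all(isinstance(f(x), int) for x in cube)
```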

Actually, the greedy algorithm runs in strongly polynomial oracle time provided that the objective function is linear:

### Corollary 4.2

Let f be a linear function of the form

\begin{aligned} f(x) = c^Tx \end{aligned}

for some integer vector c. Then, if S is a set of 0–1 vectors, the greedy algorithm runs in oracle time

\begin{aligned} O(mn\log m). \end{aligned}

### Proof

We now formulate a new problem such that $$\mu ^+,$$ $$\mu ^-,$$ and the objective values of feasible solutions will be of strongly polynomial size and the greedy algorithm will construct exactly the same sequence of solutions as when solving the original problem.

Let $$D_=$$ be the subset of all solution pairs $$(x,y)\in S\times S$$ such that $$c^T(x-y) = 0$$ and $$D_{<}$$ be the subset of all $$(x,y)\in S\times S$$ such that $$c^T(x-y) < 0.$$

The integer vector c defining the objective function f is contained in the polyhedron

\begin{aligned} Q = \{z: z^T(x-y) = 0,\forall (x,y)\in D_{=}\}\cap \{z: z^T(x-y)\le -1,\forall (x,y)\in D_{<}\}, \end{aligned}

which means that Q is not empty. The coefficient matrix of the system of linear inequalities defining Q is a $$-1,0,1$$-matrix.

The absolute values of the subdeterminants of the coefficient matrix of the system of constraints defining Q are bounded by $${(2m)}^n$$ because each row of this coefficient matrix contains no more than 2m nonzero entries. Consider a matrix obtained from the coefficient matrix by replacing one of the columns by the $$-1,0$$-vector of the right-hand sides of the constraints defining Q. The absolute values of the subdeterminants of this matrix are bounded by $${(2m+1)}^n.$$ Standard linear algebra implies that a minimal face of Q contains a vector a such that $$\Vert a\Vert _\infty \le {(2m+1)}^n$$ and $$\Vert a\Vert _\infty \ge 1/{(2m)^n}.$$ For every $$x\in S$$ and $$y\in S,$$ we have $$a^Tx < a^Ty$$ if and only if $$c^Tx < c^Ty.$$ It follows that

\begin{aligned} \forall x\in S: \arg \min _{y\in N(x)} a^Ty = \arg \min _{y\in N(x)} c^Ty. \end{aligned}
(4.3)

Let $$\bar{f}$$ be defined as $$\bar{f}(x) = a^Tx.$$ If f had a unique local optimum in N(x) for every x,  then we could conclude immediately that the greedy algorithm would construct the same sequence of solutions when applied to $$\bar{f}.$$ In the general case, we need to construct a new local optimization oracle which, under the same circumstances, outputs exactly the same result for $$\bar{f}$$ as that for f. Thus, let $$\varphi :S \longrightarrow S$$ be defined as follows: For each $$x\in S,$$ let $$\varphi (x) := x$$ if the original local optimization oracle decides that x is optimal for f and, otherwise, $$\varphi (x) := y,$$ where y is a better solution returned by the original local optimization oracle with respect to f. Define a new local optimization oracle in the following way:

Input: $$x\in S$$ and $$g\in \mathcal {C};$$

Output: $$y\in S$$ such that $$g(y) < g(x)$$ or a decision that x is optimal for g.

• if $$g\ne \bar{f}$$ then run the original oracle to minimize g over N(x);

• else if $$\varphi (x) = x,$$ then decide that x is optimal, else return $$\varphi (x).$$

Given $$x\in S$$ and a linear function $$g\in \mathcal {C},$$ the new oracle returns a solution in $$\arg \min _{y\in N(x)} g(y).$$ Indeed, this is the case if $$g\ne \bar{f}.$$ Otherwise, this is implied by (4.3).

Let the greedy algorithm use the new oracle and be applied to the new problem, where the objective is to minimize $$\bar{f}$$ over $$x\in S.$$ In this case, the greedy algorithm constructs exactly the same sequence of solutions as when applied to the original problem, with the original oracle. To estimate the oracle running time for the new problem, set $$\gamma = 1/(2{(2m)^n})$$ and apply Theorem 4.1. Note that $$a^Tx^0 \le n(2m+1)^{n}$$ and the optimal value of the new problem is not less than $$-{n(2m+1)^n}$$ because $$|a^Tx| \le n\Vert a\Vert _\infty$$ for all $$x\in S.$$ Then Theorem 4.1 implies that the greedy algorithm runs in oracle time $$O(mn\log m)$$ for the new problem. Since the sequence of solutions considered in this case by the greedy algorithm is exactly the same as that for the original problem, it follows that the greedy algorithm solves the original problem in the same oracle time using the original oracle. $$\square$$

Remark. It should be emphasized that Corollary 4.2 does not assume any modification of the objective function, i.e., the greedy algorithm is applied to the original objective function, in contrast to the method of Frank and Tardos [5] which relies on simultaneous Diophantine approximation. Schulz et al. [11] also apply simultaneous Diophantine approximation at a preprocessing step.

If P is a nondegenerate 0,1-polytope being the set of feasible solutions of a linear program of the standard form, then the above complexity bounds hold true for the number of vertices visited by the simplex method equipped with Dantzig’s best improvement rule. In this case, the bound of Corollary 4.2 follows from the analysis of the simplex method proposed by Kitahara and Mizuno [6]. In the next section, we will consider a variant of the simplex method, for the general case, which can use any pivot rule modified so that the simplex method using it becomes our scaling algorithm.

## 5 Optimization of functions over vertex sets of polytopes

Let the following system define a polytope P :

\begin{aligned} Ax = b, x\ge \mathbf{0}, \end{aligned}
(5.1)

where $$A\in \mathbb {Z}^{m\times n}$$ and $$b\in \mathbb {Z}^m.$$

In this section, S is assumed to be the vertex set of the polytope P.

Let Z(x) denote the index set of zero components of a vector x and $$\mathbf{1}_J$$ denote the characteristic vector of a set J.

Let $$g_{y,\delta }$$ be defined in the following way:

\begin{aligned} g_{y,\delta }(x) = f(x) + \delta \cdot \mathbf{1}^T_{Z(y)}(x - y). \end{aligned}
(5.2)

Such a function has the following interpretation: when moving from y to x,  we additionally pay $$\delta \cdot x_i$$ for each component $$x_i$$ such that $$y_i = 0.$$ Since $$x\ge \mathbf{0}$$ for all x in P,  the value added to f(x) is nonnegative:

\begin{aligned} \forall x,y\in P: \mathbf{1}^T_{Z(y)}(x - y) = \mathbf{1}^T_{Z(y)}x \ge 0. \end{aligned}
(5.3)

At this stage, in order to apply our scaling algorithm, we should still prove that (2.2) takes place for $$c_y = \mathbf{1}_{Z(y)},$$ i.e., that the added value is positive if $$x\ne y.$$

Let $$\alpha > 0$$ be a lower bound and $$\beta > 0$$ be an upper bound on the nonzero components of the vertices of P,  the polytope defined by (5.1).

### Observation 5.1

The following statement is true:

\begin{aligned} \forall x,y\in S, x\ne y: \mathbf{1}^T_{Z(y)}(x - y) \ge \alpha . \end{aligned}
(5.4)

### Proof

If y is a vertex of P,  then the system

\begin{aligned} Az = b, z_i = 0, \forall i\in Z(y), \end{aligned}
(5.5)

uniquely identifies y,  i.e., the only solution of (5.5) is y. Then for each $$x\in S,$$ $$x\ne y,$$ there exists $$i\in Z(y)$$ with $$x_i \ge \alpha .$$ Since x is nonnegative, (5.3) implies (5.4). $$\square$$

Thus, by the above observation we have the property (2.2) and can apply the scaling algorithm.
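Observation 5.1 can be illustrated on a small polytope of the form (5.1). In the sketch below, the matrix A, the vector b, and the hand-computed vertex list are hypothetical example data (the polytope is a product of two segments, so it has exactly four vertices); the helper names are assumptions as well.

```python
# Hypothetical polytope: x1 + x2 = 1, x3 + x4 = 3, x >= 0.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
b = [1, 3]
vertices = [(1, 0, 3, 0), (1, 0, 0, 3), (0, 1, 3, 0), (0, 1, 0, 3)]


def zero_set(y):
    """Z(y): indices of the zero components of y."""
    return [i for i, yi in enumerate(y) if yi == 0]


def penalty(x, y):
    """1_{Z(y)}^T (x - y), the linear term used in (5.2)."""
    return sum(x[i] - y[i] for i in zero_set(y))


# Sanity check that the listed points satisfy Ax = b.
feasible = all(
    sum(aij * xj for aij, xj in zip(row, x)) == bi
    for x in vertices for row, bi in zip(A, b)
)

nonzeros = [v for x in vertices for v in x if v != 0]
alpha, beta = min(nonzeros), max(nonzeros)  # bounds on nonzero components

values = [penalty(x, y) for y in vertices for x in vertices if x != y]
mu_minus, mu_plus = min(values), max(values)
m = len(A)

print(alpha, beta, mu_minus, mu_plus)  # prints "1 3 1 4"
```

In this instance $$\mu ^- = \alpha$$ exactly, so the bound (5.4) is attained, while $$\mu ^+ = 4$$ stays below $$m\beta = 6.$$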

### Theorem 5.1

If S is the vertex set of P,  then the scaling algorithm solves the problem (1.1) in oracle time

\begin{aligned} O\left( m\frac{\beta }{\alpha }\log \frac{m\beta (f(x^0) - OPT)}{\gamma \alpha }\right) , \end{aligned}

where m is the number of rows of A.

### Proof

Indeed, noting that $$\mu ^+ \le m\beta$$ and $$\mu ^- \ge \alpha ,$$ we apply Theorem 2.1. $$\square$$

Let $$\Delta$$ be an upper bound on the absolute values of the subdeterminants of A.

### Theorem 5.2

Let $$c\in \mathbb {Z}^n.$$ There exists a variant of the simplex method which solves the linear program

\begin{aligned} \text {minimize } c^Tx \;\;\;\text {subject to } (5.1) \end{aligned}
(5.6)

by visiting at most

\begin{aligned} O\left( m\frac{\beta }{\alpha }\log \frac{m\beta \Vert c\Vert _1\Delta }{\alpha }\right) \end{aligned}
(5.7)

vertices of P.

### Proof

For every vertex $$x\in S,$$ let N(x) denote the set consisting of x and the vertices of P adjacent to x. For any linear objective function, a vertex x is optimal if and only if it is optimal in N(x). Let the augmentation oracle be an algorithm which either proves that x is optimal in N(x) or delivers a solution in N(x) which is better than x for a given linear objective function. Such an augmentation oracle can e.g. be based on any anticycling pivot rule according to which an entering variable has a negative reduced cost. Then, the augmentation oracle can take the following form: Given $$x\in S$$ and $$g\in \mathcal {C},$$ choose a basis defining x and use the pivot rule until a new basic feasible solution is constructed or it is proved that x is optimal for g. (The polytope P can be degenerate and the pivot rule may need more than one iteration to reach another vertex.) With such an oracle, the scaling algorithm becomes a variant of the simplex method. During each call to the oracle, the pivot rule can start with any basis defining the given vertex x or use the current basis obtained at the previous call to the oracle (in this case the scaling algorithm should store the respective basis after each call to the oracle).

Let x and y,  $$x\ne y,$$ be adjacent vertices of P. By Observation 5.1,

\begin{aligned} \mathbf{1}^T_{Z(y)}(x - y) \ge \alpha . \end{aligned}

It is clear that $$c^Tx^0\le \beta \Vert c\Vert _1$$ and $$OPT \ge - \beta \Vert c\Vert _1.$$ Note that $$\gamma =\frac{1}{2\Delta ^2}$$ is a strict lower bound on $$|c^T(x-y)|$$ where $$x,y\in S,$$ and $$c^Tx \ne c^Ty.$$ Then Theorem 5.1 implies the required complexity bound. $$\square$$

The coefficient of the logarithmic term in the bound (5.7) does not depend on the dimension. That is, our approach can be useful when developing column generation algorithms, when the number of columns of A is much larger than the number of rows of A.

Obviously, the above complexity bound implies the following bound on diameters of polytopes:

### Corollary 5.1

The diameter of P is bounded by

\begin{aligned} O\left( m\frac{\beta }{\alpha }\log \left( \frac{mn\beta \Delta }{\alpha }\right) \right) . \end{aligned}
(5.8)

### Proof

Consider a vertex $$x^1$$ of P. This vertex uniquely minimizes the linear function $$\mathbf{1}^T_{Z(x^1)}x$$ over P. Now consider the problem of minimizing this function over the vertex set of P. The objective value at $$x^1$$ is zero while the objective values of the other vertices are not less than $$\alpha ,$$ which follows from (5.4). The length of the path found by the variant of the simplex method described in the proof of Theorem 5.2, when starting from another vertex $$x^0,$$ is bounded by (5.8) according to (5.7). The other end of the path is $$x^1$$ because $$x^1$$ is the unique optimal solution for the linear function $$\mathbf{1}^T_{Z(x^1)}x.$$ Thus, we obtain the bound (5.8). $$\square$$

The bounds (5.7) and (5.8) depend on the dimension at most logarithmically (to see this for (5.7), observe that $$\Vert c\Vert _1\le n\Vert c\Vert _\infty$$). Recently, Kitahara and Mizuno [6] proposed a bound of the form $$O\left( nm\frac{\beta }{\alpha }\log \frac{\beta }{\alpha }\right) ,$$ which depends linearly on the dimension. Bonifas et al. [1] proposed a bound of $$O(\Delta ^2n^{3.5} \log n\Delta )$$ for polytopes in an n-dimensional space, where $$\Delta$$ is an upper bound on the absolute values of the subdeterminants of the coefficient matrix of a system of linear inequalities defining the respective polytope. This bound does not depend on the number of inequalities, but the dependence of this bound on $$\Delta$$ is at least quadratic. For totally unimodular coefficient matrices, this bound is strongly polynomial and substantially improves the bound obtained by Dyer and Frieze [4]. In our case, if A is totally unimodular, the estimate (5.8) can be competitive only if the ratio $$\beta /\alpha$$ is bounded by a polynomial of n whose degree is not so high.

## 6 Conclusions

We have proved that augmentation is polynomially equivalent to optimization for arbitrary functions over sets of binary vectors. Also, we have given complexity bounds for optimization of arbitrary functions over vertex sets of polytopes. As a consequence, we have obtained new bounds on diameters of polytopes and guaranteed bounds on the running time which can be achieved by the simplex method. For instance, the simple method introduced in the paper makes it possible to estimate the quality of the current solution in the course of column generation methods, with the respective estimates depending only logarithmically on the number of columns of the coefficient matrix.