1 Introduction

Linear programming is the workhorse of modern optimization, but despite its importance, we still face plenty of open mathematical questions about it. Dantzig's simplex method is certainly one of the most popular algorithms for solving linear programs, and its geometry is intuitively simple: starting from a vertex of the one-skeleton of the feasible polyhedron, the method moves to a better neighboring vertex, with the choice of improving neighbor governed by a pivot rule. Geometrically, the simplex method thus traces a path on the graph of the polytope. The diameter of the graph of a polytope is the length of the longest shortest path over all pairs of vertices.

The historic results of Francisco Santos disproving the Hirsch conjecture on the diameter (Santos 2012), discussed in the main article (see Sect. 4 of Santos 2013), have dramatically advanced our understanding of the geometry of the simplex method and motivated new active research. Santos' present article in TOP is a wonderful snapshot of what we know today about the diameter problem and of the power of the abstract point of view in dealing with optimization problems (an approach that has recently seen active work; see De Loera et al. (2013) and the many references therein). Today, we still do not know exact bounds for the growth of the diameter; Santos' lower bound is far from the best upper bounds discussed in Sect. 3 of Santos 2013. Here, I would like to make just three comments about his excellent article.

Comment 1: There is also very rich geometry in the other algorithms for linear programming, with plenty of open questions waiting for us

Indeed, there are plenty of alternatives to the simplex method. There is the Ellipsoid method of Hačijan (1979), the first polynomial-time algorithm for linear programming. There is the simple and beautiful Fourier–Motzkin elimination process, which can be used to prove the duality theory of linear programs. There is the Criss-Cross method, a pivoting method that is allowed to leave the feasible region, walking not on the graph of the polytope but rather on the graph of the hyperplane arrangement defining it (Fukuda and Terlaky 1997). There is the family of Relaxation methods based on linear projections (Agmon 1954; Motzkin and Schoenberg 1954; Betke and Gritzmann 1992; Betke 2004; Goffin 1982; Telgen 1982), and there are many, many other methods of solution and analysis (see, e.g., Bertsimas and Vempala 2004; Bárász and Vempala 2010; Borgwardt 2009; Chubanov 2012; Dunagan and Vempala 2008; Spielman and Teng 2004), all based on different geometric principles. One could really fill an entire book with the geometry of all linear programming algorithms.
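To give the flavor of just one of these alternatives, here is a minimal sketch of a single Fourier–Motzkin elimination step in Python (my own illustration; the function name `fm_eliminate` and the list-of-rows encoding are hypothetical conveniences, not any standard library API):

```python
from itertools import product

def fm_eliminate(rows, j):
    """One Fourier-Motzkin step: eliminate variable j from a system of
    inequalities sum_i row[i]*x_i <= row[-1], each row a list of floats."""
    pos = [r for r in rows if r[j] > 0]
    neg = [r for r in rows if r[j] < 0]
    zero = [r for r in rows if r[j] == 0]
    out = [r[:j] + r[j + 1:] for r in zero]   # rows not involving x_j survive
    for p, n in product(pos, neg):
        # Scale so the coefficients of x_j are +1 and -1, then add:
        # the sum is a valid inequality in the remaining variables.
        comb = [pi / p[j] + ni / -n[j] for pi, ni in zip(p, n)]
        out.append(comb[:j] + comb[j + 1:])
    return out

# Example: x + y <= 4, -x + y <= 2, -y <= 0. Eliminating x (index 0)
# leaves -y <= 0 and 2y <= 6, i.e. 0 <= y <= 3.
system = [[1.0, 1.0, 4.0], [-1.0, 1.0, 2.0], [0.0, -1.0, 0.0]]
print(fm_eliminate(system, 0))
```

Iterating this step eliminates all variables; the duality theory mentioned above can be read off from the multipliers used in the combinations.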

To prove my point, let me take a quick look at one lovely geometric challenge that arises from the main competitors of the simplex method, the interior point methods. To state the results, let us consider the pair of linear programming problems in primal and dual formulation:

$$\mbox{Maximize } \mathbf{c}^T \mathbf{x} \quad \mbox{subject to } A \mathbf{x} = \mathbf{b} \mbox{ and } \mathbf{x} \geq 0; \tag{1}$$

$$\mbox{Minimize } \mathbf{b}^T \mathbf{y} \quad \mbox{subject to } A^T\mathbf{y} - \mathbf{s} = \mathbf{c} \mbox{ and } \mathbf{s} \geq 0. \tag{2}$$

Here, A is an m×n matrix. The primal-dual interior point methods are among the most computationally successful algorithms for linear optimization. While the simplex method follows an edge path on the boundary, interior point methods follow the central path. The famous primal-dual central path is given by the following system of quadratic and linear polynomial equations:

$$A \mathbf{x} = \mathbf{b},\quad A^T \mathbf{y} - \mathbf{s} = \mathbf{c},\quad \mbox{and} \quad x_i s_i = \lambda \quad \mbox{for } i = 1,2,\ldots, n. \tag{3}$$

The system has several properties: for all λ>0, the system of polynomial equations has a unique real solution \((\mathbf{x}(\lambda),\mathbf{y}(\lambda),\mathbf{s}(\lambda))\) with the properties \(\mathbf{x}(\lambda)>0\) and \(\mathbf{s}(\lambda)>0\). The point \(\mathbf{x}(\lambda)\) is the optimizer of the logarithmic barrier function for (1), which is defined as

$$f_{\lambda}(\mathbf{x}) := \mathbf{c}^T \mathbf{x} + \lambda \sum_{i=1}^n \log x_i. $$

Any limit point \((\mathbf{x}(0),\mathbf{y}(0),\mathbf{s}(0))\) of these solutions as λ→0 is the unique solution of the complementary slackness constraints, and thus yields an optimum point.
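The central path is easy to see in action. The following toy sketch (my own illustration, not production interior-point code; the instance data and the name `central_path_point` are made up) computes approximate points \((\mathbf{x}(\lambda),\mathbf{y}(\lambda),\mathbf{s}(\lambda))\) by applying a damped Newton method directly to the system (3) for decreasing values of λ:

```python
import numpy as np

def central_path_point(A, b, c, lam, x, y, s, iters=50):
    """Newton's method on system (3): A x = b, A^T y - s = c, x_i s_i = lam.
    Returns an approximate central path point, starting from (x, y, s)
    with x > 0 and s > 0."""
    m, n = A.shape
    for _ in range(iters):
        F = np.concatenate([A @ x - b, A.T @ y - s - c, x * s - lam])
        if np.linalg.norm(F) < 1e-10:
            break
        J = np.zeros((m + 2 * n, m + 2 * n))
        J[:m, :n] = A                       # derivative of A x - b w.r.t. x
        J[m:m + n, n:n + m] = A.T           # derivative of A^T y - s - c w.r.t. y
        J[m:m + n, n + m:] = -np.eye(n)     # ... w.r.t. s
        J[m + n:, :n] = np.diag(s)          # derivative of x*s - lam w.r.t. x
        J[m + n:, n + m:] = np.diag(x)      # ... w.r.t. s
        d = np.linalg.solve(J, -F)
        dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
        t = 1.0                             # damp the step to keep x, s > 0
        while np.any(x + t * dx <= 0) or np.any(s + t * ds <= 0):
            t *= 0.5
        x, y, s = x + t * dx, y + t * dy, s + t * ds
    return x, y, s

# Toy instance of (1): maximize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x, y, s = np.ones(2) * 0.5, np.zeros(1), np.ones(2)
for lam in [1.0, 0.1, 0.01, 0.001]:
    x, y, s = central_path_point(A, b, c, lam, x, y, s)
    print(lam, x)   # x approaches the optimum (0, 1) as lam -> 0
```

The warm start (reusing the previous point as λ decreases) mimics, very crudely, what real path-following methods do.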

Optimization researchers usually look at the central path only as connecting the optimal solution of the linear programs in question with their analytic center within one single cell, with \(x_i, s_i \geq 0\) (a cell of the arrangement is the polyhedron defined by a choice of signs in the constraints); but the central path is a portion of an algebraic curve that extends beyond a single feasibility region (given by sign constraints on the variables \(x_i\), \(s_i\)). Instead of studying the problem with only the constraints \(x_i, s_i \geq 0\), one can ask for all feasible programs arising from any set of sign conditions. There are at most \(\binom{m-1}{n}\) such feasible sign vectors; (A,b) is said to be in general position if this number is attained. Then the central curve passes through all the vertices of a hyperplane arrangement.

In practical computations, interior point methods follow a piecewise-linear approximation to the central path. One way to estimate the number of Newton steps needed to reach the optimal solution is to bound the total curvature of the central path: the intuition is that curves with small curvature are easier to approximate with few line segments. This idea has been investigated by various authors (see, e.g., Monteiro and Tsuchiya 2008; Sonnevend et al. 1991, 1992; Vavasis and Ye 1996; Zhao and Stoer 1993), and has yielded interesting results. For example, Vavasis and Ye (1996) found that the central path contains no more than \(n^2\) turning points, a finding that led to an interior-point algorithm whose running time depends only on the constraint matrix A. Thus, in a way, curvature can be regarded as the continuous analogue of the diameter in the simplex method.
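To make "total curvature" concrete, one can estimate it numerically by sampling the path and summing the turning angles of the resulting polygonal curve. The following sketch reuses the hypothetical `central_path_point` from the previous sketch (again my own illustration, with made-up instance data):

```python
import numpy as np
# Assumes central_path_point from the previous sketch is in scope.

def discrete_total_curvature(points):
    """Sum of turning angles (radians) of the polygonal curve through
    `points`; this approaches the path's total curvature as the
    sampling is refined."""
    total = 0.0
    for p, q, r in zip(points, points[1:], points[2:]):
        u, v = q - p, r - q
        nu, nv = np.linalg.norm(u), np.linalg.norm(v)
        if nu < 1e-12 or nv < 1e-12:        # skip numerically stalled steps
            continue
        total += np.arccos(np.clip(u @ v / (nu * nv), -1.0, 1.0))
    return total

# A feasible region of dimension 2, so the primal path actually bends:
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0, 3.0])
x, y, s = np.ones(3) / 3, np.zeros(1), np.ones(3)
pts = []
for lam in np.geomspace(10.0, 1e-6, 300):   # warm-start along decreasing lambda
    x, y, s = central_path_point(A, b, c, lam, x, y, s)
    pts.append(x.copy())
print(discrete_total_curvature(pts))        # small, consistent with the bounds below
```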

Dedieu et al. (2005) first investigated the differential geometric properties of the central curve of interior point methods. Their main theorem is as follows: Let (A,b,c) be as above with (A,b) in general position. Then the average total curvature of the primal, the dual, and the primal-dual central paths of the strictly feasible polytopes defined by (A,b) is at most 2π(n−1) (primal), at most 2πn (dual), and at most 2πn (primal-dual), respectively. In particular, it is independent of the number m of constraints. Although traditionally the central path is only followed approximately by interior point methods using some kind of Newton steps, in De Loera et al. (2012) the authors obtained explicit exact algebraic formulas for the primal central curve and gave improved bounds for the total curvature in terms of the degree of the Gauss maps of the curve. It is surprising that the formulas can be read off from a matroid associated with the matrix A and the cost vector c.

Of course, for practical applications the more relevant quantity is not the average total curvature but rather the curvature within a single feasible region. This has been investigated by A. Deza, T. Terlaky, and Y. Zinchenko in a series of papers. Dedieu et al. conjectured that the curvature (in a single cell) could grow only linearly in the dimension, but Deza et al. (2009) constructed central paths that are forced to visit small neighborhoods of all the vertices of a cube, “à la Klee–Minty.” In Deza et al. (2008), they proved that even for d=2 the total curvature can grow linearly in the number of facet constraints. They have conjectured the following curvature analogue of the diameter:

Conjecture 1

The curvature of a polytope, defined as the largest possible total curvature of the associated central path with respect to the various cost vectors, is no more than 2πm, where m is the number of facets of the polytope.

This is a pretty conjecture that combines both the differential geometry of the central curve and the combinatorics of the input polyhedra. Many more of these kinds of challenges await us for each of the different linear programming algorithms.

Comment 2: We still need to work harder to understand the geometry of pivoting

The simplex method is governed by a chosen pivoting rule, i.e., a method of choosing adjacent vertices with better objective function value. Starting with the historic construction of Klee and Minty (1972), showing that Dantzig's original pivoting rule could require exponentially many steps, researchers have discarded many of the popular pivot rules as good candidates for polynomial behavior. By 2010, almost all known natural deterministic pivoting rules were known to require an exponential number of steps to solve some linear programs (see Amenta and Ziegler 1999; Ziegler 2004), but three conspicuous pivot rules resisted the attacks of researchers until then. The most famous "untamed" pivot rules were Zadeh's rule (also known as the least-entered rule; Zadeh 2009) and the randomized pivot rules Random-Edge, originally proposed by G. Dantzig, and Random-Facet, proposed by Kalai (1992) and, in a different form, by Matoušek et al. (1996).

At any nonoptimal vertex, Zadeh's pivot rule chooses the decreasing edge that leaves the facet that has been left least often in the previous moves; in case of ties, a tie-breaking rule (any other pivot rule can serve) determines the decreasing edge to be taken. The rule was proposed by Norman Zadeh in a 1980 technical report of the Department of Operations Research at Stanford University, and has now appeared reprinted in Zadeh (2009). The random edge pivot rule chooses, among all improving pivoting steps (or edges) from the current basic feasible solution (or vertex), one uniformly at random. The description of Random-Facet pivoting is a bit more complicated, as there are several versions: roughly, at any nonoptimal vertex v one chooses a facet F containing v uniformly at random and solves the problem restricted to F by applying the algorithm recursively. The recursion decreases the dimension of the polytope at each iteration, so it eventually reaches a one-dimensional face, which is solved by following that edge. One repeats the process until reaching an optimum.
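For concreteness, here is how the random edge rule and Zadeh's least-entered rule might be written against an abstract polytope-graph oracle (a schematic sketch; `improving_neighbors` and `leaving_facet` are assumed placeholder oracles, not any standard API):

```python
import random

def random_edge_simplex(vertex, improving_neighbors):
    """Random edge rule: repeatedly move to a uniformly random improving
    neighbor until none exists. `improving_neighbors(v)` is an assumed
    oracle returning the neighbors of v on the polytope graph with
    strictly better objective value."""
    path = [vertex]
    while True:
        better = improving_neighbors(vertex)
        if not better:                    # no improving edge: vertex is optimal
            return path
        vertex = random.choice(better)    # the only choice the rule makes
        path.append(vertex)

def zadeh_simplex(vertex, improving_neighbors, leaving_facet):
    """Zadeh's least-entered rule (schematic): among improving edges, take
    one whose leaving facet has been left least often so far.
    `leaving_facet(v, w)` is an assumed oracle naming the facet left when
    moving from v to w; ties are broken here by list order, standing in
    for an arbitrary tie-breaking rule."""
    counts = {}                           # how often each facet has been left
    path = [vertex]
    while True:
        better = improving_neighbors(vertex)
        if not better:
            return path
        nxt = min(better, key=lambda w: counts.get(leaving_facet(vertex, w), 0))
        f = leaving_facet(vertex, nxt)
        counts[f] = counts.get(f, 0) + 1
        vertex = nxt
        path.append(vertex)
```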

Until recently, no superpolynomial lower bounds on actual linear programs were known for these three pivot rules. Earlier evidence of bad behavior was given in Matoušek (1994) and Matoušek and Szabó (2006), who showed that the random edge and random facet pivot rules do not admit a polynomial bound when used on a certain class of oriented graphs, which includes the graphs of polyhedra oriented by an objective function. Morris (2002) showed that bad behavior for random edge also appears in the related setting of linear complementarity problems (see also Avis and Moriyama 2009; Gärtner and Kaibel 2007 for more on abstract graphs). But there was also evidence of good behavior in special cases (e.g., Balogh and Pemantle 2007; Kaibel et al. 2004/05), and the random facet rule can be shown to perform an expected subexponential number of steps in the worst case (Kalai 1992; Matoušek et al. 1996), which outperforms all deterministic pivot rules known so far.

The results showing that these pivot rules are not always polynomial on specific concrete LPs are an exciting breakthrough in the theory of the simplex method. The breakthrough came in two nice papers. The team of Friedmann et al. (2011) provided the first lower bound of the form \(2^{\varOmega(n^{\alpha})}\), for some α>0, for both the Random-Edge and the Random-Facet pivot rule in the one-pass variant. Based on Fearnley (2010), Friedmann et al. (2011), and Friedmann (2009), Friedmann constructed a subexponential lower bound for the number of steps taken by Zadeh's rule (Friedmann 2011). These new constructions use the close relation between simplex-type algorithms for solving linear programs and the policy iteration algorithms for the stochastic 1-player games called Markov decision processes, or MDPs (Bertsekas 2005; Puterman 1994). It should be remarked that the diameter of the resulting polytopes is actually smaller than the Hirsch bound. Curiously, Ye (2011) showed that the simplex method with Dantzig's pivot rule (where one chooses the entering variable with the largest reduced cost coefficient) is strongly polynomial for the linear programs derived from Markov decision processes with fixed discount (which is not the setting of the other papers, but is an important case of MDPs).

Ye's result has inspired others to revisit the classical analysis of pivot rules. In the case of Dantzig's pivot rule, and based on Ye's analysis, Kitahara and Mizuno (2011, 2012) have shown bounds for the number of basic feasible solutions (BFSs) visited by the simplex method with this rule. The bounds use the relative sizes of the nonzero coordinates of the vertices: given a linear program of the form max{c^T x : Ax=b, x≥0}, where A is a real d×n matrix, the number of different BFSs generated by this version of the simplex method is bounded by \(n \lceil d \frac{\gamma}{\delta} \log (d\frac{\gamma}{\delta}) \rceil\), where δ and γ are the minimum and the maximum values of all the positive elements of primal BFSs, and ⌈a⌉ denotes the smallest integer not less than a. Interestingly, they also presented a variant of the Klee–Minty LP for which the number of iterations equals the ratio \(\frac{\gamma}{\delta}\).
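To get a feeling for the size of this bound, here is a quick numeric evaluation (my own worked example with made-up values of n, d, δ, and γ; I assume the natural logarithm):

```python
import math

def kitahara_mizuno_bound(n, d, delta, gamma):
    """The Kitahara-Mizuno bound n * ceil(d*(gamma/delta) * log(d*gamma/delta))
    on the number of distinct BFSs visited by Dantzig's rule."""
    r = d * gamma / delta
    return n * math.ceil(r * math.log(r))

# Hypothetical sizes: n = 50 variables, d = 20 equality constraints, and
# positive vertex coordinates between delta = 1 and gamma = 10:
print(kitahara_mizuno_bound(50, 20, 1.0, 10.0))   # 50 * ceil(200*log 200) = 53000
```

Note how the bound is polynomial in n and d but degrades as the ratio γ/δ grows, which is exactly what their Klee–Minty-type variant exploits.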

Beyond the theoretical analysis of pivot rules, there are a number of things we can learn from experiments. Ziegler (2004) reported on studies analyzing some of the well-known NETLIB benchmark problems using the shadow boundary method, a pivot rule guided by a two-dimensional projection of the polytope. This pivot rule deserves more investigation and plays an interesting role in the smoothed analysis of linear programs (Spielman and Teng 2004). There is still so much to understand.

Comment 3: There are many interesting special polyhedra for which determining the diameter is already a worthy challenge

It is clear that some additional structure can always help, and I think one can exploit this even more than we have done so far. Indeed, as mentioned in Sect. 4.5 of Santos' article, some families of polytopes with bounded coefficients on the vertices have been shown to satisfy (low-degree) polynomial diameter bounds (Brightwell et al. 2006; De Loera et al. 2009), but it is frustrating that we do not know whether the Hirsch conjecture holds for them. A rather interesting family is the special case in which A is a totally unimodular matrix; it contains all network polytopes, in particular the classical transportation polytopes, and recently saw a great improvement. As already stressed in Theorem 4.28 of the article, Bonifas et al. (2011) provided a bound that is polynomial in the dimension and the largest absolute value of a sub-determinant of the defining integer matrix A; they exploited the metric information of the problem. For totally unimodular matrices, this gives the beautiful bounds \(O(d^{3.5}\log d)\) for bounded polytopes and \(O(d^4\log d)\) for general, possibly unbounded, polyhedra. This greatly improves the previous best bound for totally unimodular matrices by Dyer and Frieze (1994).

For the specific case of transportation polytopes, one is truly very close to the Hirsch bound. The current record is the theorem of Hurkens:

Theorem 1

(Hurkens 2013)

The diameter of every p×q transportation polytope is at most 4(p+q−2).

The big question is: can we close the gap further and reach the Hirsch bound in the case of transportation problems? In other words, is the diameter of a p×q transportation polytope always at most p+q−1?
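To get a concrete sense of the remaining gap: for a 10×10 transportation polytope, Hurkens' theorem gives a diameter bound of 4(10+10−2)=72, while the Hirsch bound would be 10+10−1=19.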

The Hirsch conjecture holds in many known cases of special transportation polytopes with restricted margins. For example, the conjecture is true for the Birkhoff polytope and for some special right-hand sides (see, e.g., Borgwardt 2013). Our work continues today in trying to settle this important instance.