Two-row and two-column mixed-integer presolve using hashing-based pairing methods



Introduction
Presolve for mixed-integer programming (MIP) is a set of routines that remove redundant information and strengthen the model formulation with the aim of accelerating a subsequent main solution process, which is usually a branch-and-cut approach. Further, presolve is an excellent complement to branch-and-cut, because it focuses on model reductions and reformulations that are commonly not in the working space of branch-and-cut, e.g., greatest common divisor reductions or redundancy detection. Presolve can be very effective and it frequently makes the difference between a problem being intractable and solvable [2].
Considering that presolve is undisputedly an important building block in solving MIP problems, comparatively few articles on it exist in the MIP literature. Only in the last few years has there been a moderate increase in the number of publications on this subject. One of the earliest contributions was by Brearley et al. [13]. Later Johnson and Suhl [26], Guignard and Spielberg [24], Crowder et al. [15], and Hoffman and Padberg [25] investigated presolve techniques for zero-one inequalities. Savelsbergh [35] published preprocessing and probing techniques for MIP problems. Many of the methods published there are now standard in almost every MIP solver. Presolve also plays an important role in linear programming [5] and especially for interior point algorithms [23]. In recent years, presolve as a research topic has become increasingly important [19,3]. Finally, the very recent publication by Achterberg et al. [4] deserves special attention. Not only does it present many new presolve methods, but it also highlights theoretically interesting details of presolve reductions and relations to number theory.
How the presolve algorithms are implemented frequently makes the difference between a beneficial and a harmful method. Often it is reasonable to weigh the strength of the reductions against the runtime behavior in order to rank the presolve methods appropriately. Details on the implementation of various presolve reductions are discussed in Suhl and Szymanski [40], Martin [32], Atamtürk and Savelsbergh [6], and Achterberg [1]. Investigations concerning the performance impact of different features of a MIP solver were published in Bixby et al. [12], Bixby and Rothberg [11], and Achterberg and Wunderling [2]. These publications confirmed that presolve and cutting plane techniques are by far the most important components of modern MIP solvers.
In the simplest and usually fastest case, individual rows or columns are considered for presolve reductions. Some interactions between reductions may take place, e.g., through tightened variable bounds, but strictly speaking this approach is limited to local information and thus often yields weak reductions. On the other hand, one could try to find stronger reductions by building global data structures, such as the clique table, which collects mutual incompatibilities between binary variables, or by considering larger parts of the problem or even the whole problem, which generally leads to higher runtimes.
It is therefore important to strike a good balance between effectiveness and computational overhead. One way to achieve this is to examine two rows or two columns simultaneously. Bixby and Wagner [10] published an efficient algorithm to find parallel rows or columns of the constraint matrix. Multi-row presolve reductions were also the subject of research in Achterberg et al. [3], where the authors derive improvements to the model formulation by simultaneously examining multiple problem constraints. Finally, considering more than one row or column at a time allows more techniques to be used. For instance, matrix sparsification techniques as described in Chang and McCormick [14], Gondzio [23] and, more recently, Achterberg et al. [4], cannot be applied when considering only one row or one column at a time.
In many of the publications mentioned above, impressive results have been achieved with presolve reductions for solving MIPs. However, these results are often closely tied to the exact details of the underlying implementation. Two-row and two-column techniques in particular immediately raise the question of how to find promising pairs of rows or columns, as simple enumeration of all combinations is usually too expensive in a presolve context. Most publications leave this question unanswered. Therefore, in this article we develop two efficient methods for determining suitable pairs for presolve reductions via hashing methods [27]. In addition, we describe new presolve approaches or extensions of existing techniques that, to the best of our knowledge, have not been published yet.
The paper is organized as follows. Section 2 introduces the notation and some basic concepts used in the remainder of the paper. Sections 3-5 constitute the main part of the paper. Five new presolve approaches or extensions of already published methods are described in Section 3. The first two methods deal with sparsification of the constraint matrix, two more methods focus on bound tightening, and the last method is a dual approach resulting in variable and constraint fixings. Next, in Section 4, we describe the hashing methods we use to address the question of finding promising pairs of rows or columns for the previously presented presolve methods. Subsequently, Section 5 provides computational results to assess the performance impact of these methods. Finally, we summarize our conclusions in Section 6.

Notation and Background
This document builds on the following notation. Let N = {1, ..., n} and M = {1, ..., m} be index sets. Given a matrix A ∈ R^(m×n), vectors c ∈ R^n, b ∈ R^m, ℓ ∈ (R ∪ {−∞})^n, u ∈ (R ∪ {∞})^n, variables x ∈ R^n with x_j ∈ Z for j ∈ I ⊆ N, and relations •_i ∈ {=, ≤, ≥} for all i ∈ M, a mixed-integer program (MIP) can be formulated in the following form:

min c^T x
s.t. A_i· x •_i b_i for all i ∈ M,
ℓ_j ≤ x_j ≤ u_j for all j ∈ N,
x_j ∈ Z for all j ∈ I. (1)

The set of feasible solutions of (1) is defined by

P_MIP := {x ∈ R^n | A_i· x •_i b_i for all i ∈ M, ℓ ≤ x ≤ u, x_j ∈ Z for all j ∈ I}.

In addition, we define the set of feasible solutions of the linear programming relaxation of (1) by

P_LP := {x ∈ R^n | A_i· x •_i b_i for all i ∈ M, ℓ ≤ x ≤ u}.

The single coefficients of A are denoted by a_ij with i ∈ M and j ∈ N. We use the notation A_i· to identify the row vector given by the i-th row of A. Similarly, A_·j is the column vector given by the j-th column of A. Let R ⊆ M and V ⊆ N; then A_RV denotes the matrix consisting of the coefficients of A restricted to the row indices i ∈ R and column indices j ∈ V. Similarly, x_V denotes the vector x limited to the index set V.
In some cases, we consult the support function, which determines the indices of the nonzero entries of a vector. With supp(A_i·) = {j ∈ N | a_ij ≠ 0} we denote the support of row A_i·. Correspondingly, supp(A_·j) = {i ∈ M | a_ij ≠ 0} designates the support of column A_·j.
Depending on the coefficients of the matrix A and the bounds of the variables x, the minimal activity and maximal activity for each row i ∈ M in (1) are given by

inf(A_i· x) := Σ_{j: a_ij > 0} a_ij ℓ_j + Σ_{j: a_ij < 0} a_ij u_j (2)

and

sup(A_i· x) := Σ_{j: a_ij > 0} a_ij u_j + Σ_{j: a_ij < 0} a_ij ℓ_j, (3)

respectively. As infinite lower or upper bounds on the variables x are allowed, infinite minimal or maximal activities may occur. To simplify notation, we write inf(A_iV x_V) and sup(A_iV x_V) for row i and a subset V ⊆ N of variable indices in order to refer to partial minimal and maximal activities. This coincides with (2) and (3) for V = N.
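For illustration, the activity computation in (2) and (3) can be sketched as follows. This is a hypothetical helper, not code from the paper; rows are represented as sparse dictionaries and infinite bounds are assumed to be given as IEEE infinities.

```python
def row_activities(coeffs, lbs, ubs):
    """Minimal and maximal activity of one row.

    coeffs: dict {variable index: a_ij} (nonzeros only);
    lbs/ubs: dicts of lower/upper variable bounds.
    Returns (inf_activity, sup_activity) as in (2) and (3).
    """
    inf_act, sup_act = 0.0, 0.0
    for j, a in coeffs.items():
        # a positive coefficient contributes its lower bound to the
        # minimal activity and its upper bound to the maximal activity
        if a > 0:
            inf_act += a * lbs[j]
            sup_act += a * ubs[j]
        else:
            inf_act += a * ubs[j]
            sup_act += a * lbs[j]
    return inf_act, sup_act

# row x1 + 2*x2 - x3 with 0 <= x1, x2 <= 1 and 0 <= x3 <= 2
# gives the activities (-2, 3)
print(row_activities({1: 1, 2: 2, 3: -1},
                     {1: 0, 2: 0, 3: 0},
                     {1: 1, 2: 1, 3: 2}))
```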

Consider an inequality constraint

A_i· x ≥ b_i

with i ∈ M. For presolve we want to ensure that such a constraint, or a variable bound derived from it, holds for all x ∈ P_MIP. There are various possibilities to achieve this, and the simplest approach for single-row presolve is to optimize in linear time over the bounding box, i.e. to evaluate

inf{A_i· x | ℓ ≤ x ≤ u} and sup{A_i· x | ℓ ≤ x ≤ u}. (4)

If sup(A_iV x_V) with V = N \ {j} is finite, we can determine bounds on variable x_j. Depending on the sign of a_ij we distinguish two cases. For a_ij > 0, a valid lower bound on x_j is given by

x_j ≥ (b_i − sup(A_iV x_V)) / a_ij, (5)

and hence we can replace ℓ_j by

max{ℓ_j, (b_i − sup(A_iV x_V)) / a_ij}. (6)

Analogously, for a_ij < 0, a valid upper bound on x_j is given by

x_j ≤ (b_i − sup(A_iV x_V)) / a_ij, (7)

and hence we can replace u_j by

min{u_j, (b_i − sup(A_iV x_V)) / a_ij}. (8)

If x_j is an integer variable, we can round, i.e. we replace (6) by

max{ℓ_j, ⌈(b_i − sup(A_iV x_V)) / a_ij⌉}

and analogously (8) by

min{u_j, ⌊(b_i − sup(A_iV x_V)) / a_ij⌋}.

The observations above together with (4) can be used to realize an approach called feasibility-based bound tightening (FBBT), which is an iterative range-reduction technique [18]. It involves tightening the variable ranges using all of the constraint restrictions. This procedure is iterated as long as the variable ranges keep changing. FBBT might fail to converge finitely (see Achterberg et al. [4]). Therefore, in practice one stops iterating when the improvement after one or several iterations is too small; an alternative approach to deal with this special case is described in [9].
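One pass of the single-row bound derivation above can be sketched as follows. This is a minimal illustration assuming finite bounds and a '≥' row; a real implementation must additionally handle infinite partial activities. The function name and data layout are hypothetical.

```python
import math

def fbbt_row(coeffs, rhs, lbs, ubs, int_vars=frozenset()):
    """One FBBT pass on a row  sum_j a_j * x_j >= rhs.

    coeffs: sparse row {variable index: a_j}; lbs/ubs are tightened
    in place. int_vars marks integer variables, whose new bounds are
    rounded. Returns True if any bound changed.
    """
    # maximal activity sup(A_i. x) of the full row
    sup_act = sum(a * (ubs[j] if a > 0 else lbs[j]) for j, a in coeffs.items())
    changed = False
    for j, a in coeffs.items():
        # maximal activity of the row excluding x_j, i.e. V = N \ {j}
        sup_rest = sup_act - a * (ubs[j] if a > 0 else lbs[j])
        bound = (rhs - sup_rest) / a
        if a > 0:                          # yields a lower bound on x_j
            if j in int_vars:
                bound = math.ceil(bound - 1e-9)
            if bound > lbs[j]:
                lbs[j], changed = bound, True
        else:                              # yields an upper bound on x_j
            if j in int_vars:
                bound = math.floor(bound + 1e-9)
            if bound < ubs[j]:
                ubs[j], changed = bound, True
    return changed
```

For example, on the row 2*x1 + x2 >= 3 with x1, x2 in [0, 1], one pass raises both lower bounds to 1, fixing both variables.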
Instead of trivially optimizing over the bounding box, one could calculate

min{A_i· x | x ∈ P_MIP} and max{A_i· x | x ∈ P_MIP}.

However, this approach would usually be very time consuming as it amounts to solving problem (1) with a different objective function. A lightweight alternative to solving the complete MIPs is to only consider the linear programming relaxation, i.e.

min{A_i· x | x ∈ P_LP} and max{A_i· x | x ∈ P_LP}.

Still, this procedure is often too time-consuming.
A more reasonable compromise between computational complexity and tightness of the bounds is to maximize or minimize over only a small number of constraints of the system instead of all constraints. This can again be done by solving a MIP or an LP. If we use just a single constraint, then solving the corresponding LP is of particular interest, since such problems can be solved in time linear in the number of variables, see Dantzig [17] and Balas and Zemel [7].
A closely related procedure is the so-called optimization-based bound tightening (OBBT), which was first applied in the context of global optimization by Shectman and Sahinidis [38] and Quesada and Grossmann [34]. Here, the block we want to minimize and maximize consists of just one variable:

min{x_j | x ∈ P_MIP} and max{x_j | x ∈ P_MIP}.

Again, instead of solving the full MIP one can solve any relaxation of the problem to obtain valid bounds for x_j.
Presolve methods can be classified into two groups: primal and dual presolve methods. Primal presolve methods derive reductions based on feasibility arguments and are thus valid independently of the objective function. In contrast, dual presolve methods derive reductions by utilizing information from the objective function while ensuring that at least one optimal solution is retained, provided the problem was feasible.

Two-Row and Two-Column Presolve Methods
In this section we outline five presolve methods for which we developed the pairing mechanisms presented in Section 4. First, Section 3.1 presents a primal presolve method that increases the sparsity of the constraint matrix by cancelling nonzero coefficients using pairs of rows, and the method in Section 3.2 extends this idea to using columns for nonzero cancellation. The next two primal presolve methods, outlined in Section 3.3 and Section 3.4, improve variable bounds via FBBT on pairs of rows. Although these two methods work in a similar way, they have different advantages. In short, the method in Section 3.3 is easier to implement and can be applied to single variables, but it cannot tighten bounds on all variables present in the involved constraints. Finally, Section 3.5 presents a way to reuse the method of Section 3.4 in order to improve a dual presolve method.

Two-Row Nonzero Cancellation
This presolve method tries to find pairs consisting of an equality and a second constraint such that adding the appropriately scaled equality to the other constraint sparsifies the latter by cancelling nonzero coefficients. This approach to reduce the number of nonzeros in the constraint matrix A has already been used by Chang and McCormick [14], Gondzio [23] and, more recently, Achterberg et al. [4]. More precisely, assume two constraints

A_iU x_U + A_iV x_V + A_iW x_W = b_i,
A_rU x_U + A_rV x_V + A_rY x_Y •_r b_r,

with i, r ∈ M and disjoint subsets of the column indices U, V, W, Y ⊆ N. Further, assume there exists a scalar λ ∈ R such that A_rU − λA_iU = 0 and a_rk − λa_ik ≠ 0 for all k ∈ V. Subtracting λ times the equality i from constraint r yields

(A_rV − λA_iV) x_V − λA_iW x_W + A_rY x_Y •_r b_r − λb_i.

The difference in the number of nonzeros of A is |U| − |W|. Since the case |U| − |W| ≤ 0 does not seem to offer any advantage, the reduction is applied only if the number of nonzeros actually decreases. In all cases, this procedure takes at most O(|N|) time.
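The search for a cancelling multiplier can be sketched as follows. This is an illustrative helper with hypothetical names (rows stored as sparse dictionaries); the paper's actual implementation is not shown here, and for simplicity only multipliers λ = a_rj / a_ij induced by shared nonzeros are tried.

```python
def cancel_with_equality(eq_row, eq_rhs, row, rhs, eps=1e-9):
    """Sparsify `row` by subtracting a multiple of the equality `eq_row`.

    For each candidate scalar lambda = a_rj / a_ij we count how many
    shared nonzeros would cancel (|U|) versus how many fill-ins would
    appear (|W|), and apply the best choice only if it strictly reduces
    the nonzero count. Returns the (possibly new) row and right-hand side.
    """
    best_lam, best_gain = None, 0
    fill_in = sum(1 for k in eq_row if k not in row)  # |W|, lambda-independent
    for j, a_ij in eq_row.items():
        if j not in row:
            continue
        lam = row[j] / a_ij
        cancelled = sum(1 for k, a in eq_row.items()
                        if k in row and abs(row[k] - lam * a) < eps)
        if cancelled - fill_in > best_gain:
            best_lam, best_gain = lam, cancelled - fill_in
    if best_lam is None:
        return row, rhs
    new_row = dict(row)
    for k, a in eq_row.items():
        v = new_row.get(k, 0.0) - best_lam * a
        if abs(v) < eps:
            new_row.pop(k, None)   # cancelled nonzero
        else:
            new_row[k] = v         # changed coefficient or fill-in
    return new_row, rhs - best_lam * eq_rhs
```

For instance, subtracting twice the equality x1 + x2 = 1 from the row 2*x1 + 2*x2 + x3 <= 5 leaves x3 <= 3, a net reduction of two nonzeros.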
For mixed-integer programming, reducing the number of nonzeros of A has two main advantages. First, many subroutines in a MIP solver depend on this number; especially the LP solver benefits from sparse basis matrices. Second, nonzero cancellation may open up possibilities for other presolve techniques to perform useful reductions or improvements on the formulation. One special case occurs if W = ∅, that is, the support of the equality constraint is a subset of the support of the other constraint. This case is of particular interest because decompositions may take place.

Two-Column Nonzero Cancellation
This presolve method extends the idea of the previous presolve method to columns, i.e. we now aim for nonzero cancellations based on two columns of the constraint matrix A.
To be more precise, consider the contribution of two columns of problem (1),

A_·j x_j + A_·k x_k with supp(A_·j) = U ∪ V ∪ W and supp(A_·k) = U ∪ V ∪ Y, (12)

where j, k ∈ N and U, V, W, Y ⊆ M are disjoint subsets of the row indices. We first discuss the case where x_j is a continuous variable. Suppose there exists a scalar λ ∈ R such that

λA_Uj − A_Uk = 0 and λA_Vj − A_Vk ≠ 0.

By rewriting (12) as

A_·j (x_j + λx_k) + (A_·k − λA_·j) x_k

and introducing a new variable z := x_j + λx_k, we obtain

A_·j z + (A_·k − λA_·j) x_k. (13)

From the definition of z, it follows that the lower and upper bounds of z are

ℓ_z = ℓ_j + λℓ_k for λ > 0, ℓ_z = ℓ_j + λu_k for λ < 0,

and

u_z = u_j + λu_k for λ > 0, u_z = u_j + λℓ_k for λ < 0,

respectively. However, in order to keep the original bounds ℓ_j ≤ x_j ≤ u_j, the constraint

ℓ_j ≤ z − λx_k ≤ u_j (14)

needs to be explicitly added to problem (1). As is the case for the sparsification method from Section 3.1, the new representation (13) is of interest when |U| − |W| > 0 holds. One difference to the row-based version is that the reformulation contains the additional overhead of adding the constraint (14) to the problem (1). However, if x_j is a free variable, i.e. ℓ_j = −∞ and u_j = ∞ hold, the constraint (14) is redundant and does not need to be added to the problem (1). When the bounds of x_j are implied by the constraints and the bounds of other variables, x_j can also be treated as a free variable. In this case one speaks of an implicit free variable.
We now consider the case where x j is an integer variable. To maintain integrality, we require that x k is also an integer variable and λ is an integral scalar. This guarantees that, when reversing the transformation during post-processing, we can obtain an integral value of x k from the values of x j and the new integer variable z.
Note that applying this presolve method to the problem (1) can be seen as applying the row-based version to the associated dual problem. Therefore it takes time linear in the number of constraints, i.e. its time complexity is O(|M|). Due to the additional overhead resulting from constraint (14) the difference in the number of nonzeros |U | − |W | is generally required to be larger than in the row-based method.
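The case distinction for the bounds of the substitute variable z can be sketched as a small helper. This is a hypothetical illustration assuming finite bounds; with IEEE infinities the formulas propagate in the obvious way.

```python
def combined_bounds(lj, uj, lk, uk, lam):
    """Bounds of the substitute variable z = x_j + lam * x_k (lam != 0),
    following the sign distinction on lam described in the text."""
    if lam > 0:
        return lj + lam * lk, uj + lam * uk
    return lj + lam * uk, uj + lam * lk

# e.g. x_j in [0, 1], x_k in [0, 2], lam = -3  gives  z in [-6, 1]
print(combined_bounds(0, 1, 0, 2, -3))
```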

LP-based Bound Tightening on Two Constraints
The idea of using two rows of the constraint matrix A to determine tighter variable bounds or redundant constraints has already been described in [4]. We will first present the presolve method and then discuss an important observation to omit unnecessary calculations.
Consider two arbitrary constraints of (1),

A_rU x_U + A_rV x_V ≥ b_r,
A_sU x_U + A_sW x_W ≥ b_s, (15)

where r, s ∈ M and U, V, W ⊆ N are disjoint, U being the shared support of the two rows. Note that for any i ∈ M with •_i ∈ {≤, =} the corresponding constraint can be normalized to be of the form above. Together with the variable bounds ℓ ≤ x ≤ u, we exploit the special form of the two constraints to set up the following two LPs:

y_max := max{A_rU x_U | A_sU x_U ≥ b_s − sup(A_sW x_W), ℓ_U ≤ x_U ≤ u_U},
y_min := min{A_rU x_U | A_sU x_U ≥ b_s − sup(A_sW x_W), ℓ_U ≤ x_U ≤ u_U}.

As shown in [17] and [7], each of these problems can be solved in linear time. Provided we have finite values for y_max and y_min, i.e. the above problems are feasible and bounded, the results can be used to, again in linear time, derive potentially stronger variable bounds and detect redundancy of constraint r. Bounds on variables x_j, j ∈ V, can be calculated via

x_j ≥ (b_r − y_max − sup(A_rV' x_V')) / a_rj for a_rj > 0,
x_j ≤ (b_r − y_max − sup(A_rV' x_V')) / a_rj for a_rj < 0, (16)

where V' = V \ {j}, and constraint r is redundant if

y_min + inf(A_rV x_V) ≥ b_r (17)

holds. Note the similarity of (16) to (5) and (7), where sup(A_rU x_U) is replaced by y_max, as well as the fact that y_max is at least as strong as, and potentially stronger than, sup(A_rU x_U).
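Since each of these LPs is a continuous knapsack over a bounding box, it can be solved greedily in the spirit of Dantzig [17]. The following is an illustrative sketch with hypothetical names; sorting makes it O(n log n), whereas linear time is achievable via median selection.

```python
def box_knapsack_max(c, a, b, lbs, ubs):
    """max sum_j c_j*x_j  s.t.  sum_j a_j*x_j <= b,  lbs <= x <= ubs.

    Continuous knapsack over a box: start from the objective-optimal
    corner, then repair infeasibility at the least objective cost.
    Assumes finite bounds and a feasible instance. Returns (value, x).
    """
    # best point for the objective when ignoring the knapsack row
    x = {j: (ubs[j] if c[j] >= 0 else lbs[j]) for j in c}
    act = sum(a[j] * x[j] for j in c)
    if act > b:
        # moves toward the other bound that reduce the activity,
        # ordered by objective loss per unit of activity reduction
        moves = [j for j in c
                 if a[j] * ((lbs[j] if c[j] >= 0 else ubs[j]) - x[j]) < 0]
        moves.sort(key=lambda j: c[j] / a[j])
        for j in moves:
            other = lbs[j] if c[j] >= 0 else ubs[j]
            delta = a[j] * (other - x[j])     # negative activity change
            if act + delta <= b:              # full move would overshoot
                x[j] += (b - act) / delta * (other - x[j])
                act = b
                break
            x[j], act = other, act + delta    # move all the way
    return sum(c[j] * x[j] for j in c), x
```

For example, max x1 + x2 subject to x1 + 2*x2 <= 2 and x in [0, 1]^2 yields the optimum 1.5 at x1 = 1, x2 = 0.5.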
Note that the roles of the rows r and s are interchangeable. As the following lemma shows, bound tightening on the constraints r and s may only improve the bounds on variables x j , j ∈ V if there exists at least one variable x i , i ∈ U with a ri < 0 < a si or a si < 0 < a ri .
Lemma 3.1. If for all variables x i , i ∈ U , the corresponding coefficients a ri and a si are either both non-negative or both non-positive, then inequality (16) cannot improve the bounds on the variables x j , j ∈ V , over regular FBBT on constraint r.
Proof. Let x_i, i ∈ U, be a variable such that its coefficients a_ri and a_si are both non-negative. In this case, setting the variable x_i to its upper bound maximizes the value of y_max and also contributes most to satisfying the constraint of the LP defining y_max. Analogously, a variable with non-positive coefficients a_ri and a_si should be set to its lower bound. Hence, the value y_max of the optimal solution coincides with sup(A_rU x_U), so that (16) yields the same bounds as (5) and (7) on constraint r.
The lemma above shows that in order to improve the bounds over what can be achieved using FBBT on a single constraint, it is of crucial importance to find pairs of inequalities that contain variables such that their respective coefficients have opposite sign.
Remark 3.2. By rewriting the objective function for y_min as max{−A_rU x_U} we can, by the same arguments as in Lemma 3.1, reason that equal signs are advantageous for detecting redundant constraints. Since we focus on bound tightening, we did not pursue this line of thought in this work, but it certainly is an interesting direction for future research.
Next we give an example to further illustrate the basic idea as well as the statements made in Lemma 3.1 and Remark 3.2.
Example 3.3. Consider the following two constraints: ... Then we get y_max = 7 and y_min = 4, and applying (16) and (17) yields ... Therefore, the lower bound of x_1 could not be tightened over simple FBBT on the first constraint, but we have detected that the first constraint can be removed from the problem as it is redundant with respect to the second constraint. If we keep the variable bounds and change the problem to ..., we obtain y_max = 3 and y_min = 0. Again, we apply (16) as well as (17) and get ... In this case, due to opposite signs of coefficients appearing in both constraints, we were able to tighten the lower bound on x_1, but the first constraint is no longer redundant with respect to the second one. Note that FBBT on the first constraint would still yield −3 as the lower bound on x_1.
In certain cases, presolve methods interact with each other. This is one reason why different presolve methods are often grouped by their runtime behavior and executed in repeated rounds [20]. Of particular interest are cases where one method generates a structure that helps another method to perform further useful reductions. As an example, we will look at an interaction between the method of Section 3.1 and the method described here.
Example 3.4. Consider the following problem: ... We perform LP-based bound tightening for the first and second constraint, obtaining y_max = 4. Now we would like to determine a tighter lower bound on x_3, but this is not possible as sup(x_3 + x_4) = ∞. However, if we first use the presolve method of Section 3.1 and add the equality to the first constraint, then the variables x_4 and x_5 disappear. As a result, we obtain x_3 ≥ (6 − y_max)/1 = 2.

Bound Tightening on Convex Combinations of Two Constraints
Belotti [8] presented an algorithm to efficiently derive bounds for all variables from the LP relaxation of the problem

A_r· x ≥ b_r,
A_s· x ≥ b_s, (18)
ℓ ≤ x ≤ u,

where r, s ∈ M. Note again that for any constraint i with •_i ∈ {≤, =} the corresponding constraint can be normalized to the same form as constraints r and s above. We will first summarize the original method and then discuss an extension to two rows together with disjoint set packing constraints.

Basic Bound Tightening on Convex Combinations of Two Constraints
The basic idea is to reformulate the problem using convex combinations of the two constraints in (18). Let λ ∈ [0, 1]. For the convex combination of the coefficient vectors we write ā^rs(λ) := λA_r· + (1 − λ)A_s·, and b̄^rs(λ) := λb_r + (1 − λ)b_s for the convex combination of the right-hand sides. For single indices j ∈ N or index subsets V ⊆ N we write ā_j^rs(λ) and ā_V^rs(λ), respectively, to address the corresponding entries of ā^rs(λ). One may now perform FBBT on the combined constraint ā^rs(λ) x ≥ b̄^rs(λ). Note that for λ ∈ {0, 1} this cannot improve the bounds over FBBT on the original constraints, and we therefore restrict ourselves to λ ∈ (0, 1). In short, possibly stronger variable bounds [ℓ̄_j, ū_j] on a variable x_j can be derived by solving the following one-dimensional optimization problems:

ℓ̄_j = max{(b̄^rs(λ) − L_j(λ)) / ā_j^rs(λ) | λ ∈ (0, 1), ā_j^rs(λ) > 0}, (19)
ū_j = min{(b̄^rs(λ) − L_j(λ)) / ā_j^rs(λ) | λ ∈ (0, 1), ā_j^rs(λ) < 0}, (20)

where L_j(λ) := sup(ā_{N\{j}}^rs(λ) x_{N\{j}}) denotes the maximal activity of the combined constraint without variable x_j. Belotti [8] has then shown that these problems can be solved by evaluating the piecewise linear L_j(λ) for a small finite set of values for λ, namely the set of all breakpoints of L_j(λ). The next proposition summarizes the result.
Proposition 3.5. The set Λ := {λ ∈ (0, 1) | ā_k^rs(λ) = 0 for some k ∈ N} of breakpoints of the functions L_j(λ) contains all optimal solutions to the problems (19) and (20).
Note that problems (19) and (20) may not have optimal solutions or may be unbounded. However, these edge cases, which are discussed in more detail in the original paper [8], do not affect the overall viability of this approach.
Computing L_j(λ) for all variables and each λ ∈ Λ individually would result in an algorithm with running time in O(|N|^3). However, it is possible to simultaneously solve the optimization problems for all variables in O(|N|^2) time by using constant-time update operations for each λ ∈ Λ instead of recomputing each L_j(λ) individually. We will outline the basic idea as well as the core operations; for full details of the algorithm, see [8].
In order to derive bounds for λ ∈ [λ_1, λ_2] we first need to update L_r and L_s. More precisely, for each variable x_j, j ∈ N, with ā_j^rs(λ_1) = 0, i.e. each variable whose combined coefficient switches sign at λ_1, we need to update L_r and L_s via (25) and (26), respectively. If a combined coefficient changes sign from positive to negative, we need to set the corresponding variable from its upper bound to its lower bound. Analogously, a variable needs to be set from its lower bound to its upper bound if its combined coefficient changes sign from negative to positive.
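As an illustration of the basic method, the following brute-force sketch derives a lower bound on one variable by evaluating FBBT on the combined constraint at every sign-switch breakpoint. Unlike Belotti's incremental scheme it recomputes the partial activity from scratch for each λ, so it is quadratic per variable; all names are hypothetical.

```python
def convex_comb_bounds(ar, br, as_, bs, lbs, ubs, j):
    """Best lower bound on x_j from FBBT on the combined row
    lam*(A_r. x >= b_r) + (1-lam)*(A_s. x >= b_s), evaluated at each
    breakpoint in (0, 1) where a combined coefficient switches sign."""
    n = len(ar)
    lambdas = set()
    for k in range(n):
        if ar[k] != as_[k]:
            lam = as_[k] / (as_[k] - ar[k])  # zero of lam*ar_k + (1-lam)*as_k
            if 0.0 < lam < 1.0:
                lambdas.add(lam)
    best = lbs[j]
    for lam in lambdas:
        abar = [lam * ar[k] + (1.0 - lam) * as_[k] for k in range(n)]
        bbar = lam * br + (1.0 - lam) * bs
        if abar[j] <= 0.0:
            continue                          # no lower bound from this lam
        # maximal activity of the combined row without x_j
        sup_rest = sum(abar[k] * (ubs[k] if abar[k] > 0 else lbs[k])
                       for k in range(n) if k != j)
        best = max(best, (bbar - sup_rest) / abar[j])
    return best
```

For the rows x1 - x2 >= 0 and x1 + x2 >= 2 with x in [0, 10]^2, neither row alone tightens the lower bound of x1, but the combination at λ = 1/2 proves x1 >= 1.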

Further Improvement via Disjoint Set Packing Constraints
In this section, we show that Belotti's approach can be extended to derive bounds from problem (18) together with an arbitrary number of disjoint set packing constraints of the form

Σ_{j ∈ S_p} x_j ≤ 1 for all S_p ∈ S,

where S = {S_1, ..., S_|S|} is a partitioning of a subset of the integer variable index set I. More precisely, for λ ∈ (0, 1) we want to derive bounds from the problem

ā^rs(λ) x ≥ b̄^rs(λ),
Σ_{j ∈ S_p} x_j ≤ 1 for all S_p ∈ S, (28)
ℓ ≤ x ≤ u,
x_j ∈ Z for all j ∈ I ⊆ N.
To further simplify notation, we write S := ∪_{S_p ∈ S} S_p and T := N \ S.
In the following two examples, we illustrate the effect of the addition of set packing constraints and then proceed to analyze how Belotti's algorithm needs to be adjusted in order to achieve these results with minimal additional computational effort.
Example 3.6. Consider the following mixed-integer problem with one set packing constraint: ... Figure 1a shows a comparison between the bound resulting from applying the original method presented in Section 3.4.1 and the bound obtained by solving the following optimization problem for λ ∈ (0, 1): ...
Whereas the original method cannot tighten the bound at all, when considering the set packing constraint we get a positive lower bound on x_3 for λ ∈ (1/3, 2/3). First, this illustrates one case for which adding a set packing constraint indeed yields a better bound on x_3. Second, the example shows that the set Λ needs to be extended, as it no longer contains all optimal solutions to (30). In this example, there is the new breakpoint λ = 1/2 where we have ā_1(1/2) = 1/2 = ā_2(1/2). This new breakpoint is clearly visible in Figure 1a. We will see later that at all new breakpoints λ* the index argmax_{j ∈ S_p} {ā_j(λ)} changes between λ < λ* and λ > λ*.

Example 3.7. Consider the following mixed-integer program: ... Again, Figure 1b shows for λ ∈ (0, 1) the comparison between the results obtained by the method presented in Section 3.4.1 and solving ... Similar to the previous example, the set packing constraint allows us to derive stronger bounds. Whereas the original method yields a lower bound of 1/3 for λ = 3/4, the set packing constraint allows us to tighten that bound to 1 for λ = 2/3, and we get new breakpoints again, namely λ_1 = 1/3 and λ_2 = 2/3. Note that for these breakpoints we also have ..., respectively. As a final note, while L_j(λ) and ā_j^rs(λ) are always (piecewise) linear, this does not necessarily hold for the quotient L_j(λ)/ā_j^rs(λ).

Note that in both examples we have strengthened the bounds of a variable that did not appear in any set packing constraint. In the following, we assume j ∈ T, i.e. the variable x_j is not present in any set packing constraint. For a variable x_j, j ∈ S_p, S_p ∈ S, we can still derive bounds by simply relaxing the corresponding set packing constraint to Σ_{k ∈ S_p\{j}} x_k ≤ 1. At the end of the section we present derivations that are stronger than what is achievable by relaxing the set packing constraint.
Let j ∈ T. Since for each subset S_p ∈ S we can choose at most one variable x_k, k ∈ S_p, to be set to one, we obtain, for any given λ ∈ (0, 1) with ā_j^rs(λ) > 0, the valid bound

x_j ≥ (b̄^rs(λ) − Σ_{S_p ∈ S} max{0, max_{k ∈ S_p} ā_k^rs(λ)} − sup(ā_{T\{j}}^rs(λ) x_{T\{j}})) / ā_j^rs(λ), (33)

and analogously for λ ∈ (0, 1) with ā_j^rs(λ) < 0 we have

x_j ≤ (b̄^rs(λ) − Σ_{S_p ∈ S} max{0, max_{k ∈ S_p} ā_k^rs(λ)} − sup(ā_{T\{j}}^rs(λ) x_{T\{j}})) / ā_j^rs(λ). (34)

Note that (33) and (34) solve, in their respective cases, the problems min x_j s.t. (28) as well as max x_j s.t. (28) to optimality. This also shows that for any given λ ∈ (0, 1) the set packing constraint corresponding to a subset S_p ∈ S is redundant if and only if ā_j^rs(λ) > 0 holds for at most one index j ∈ S_p. To simplify notation, define

Φ_p(λ) := max{0, max_{k ∈ S_p} ā_k^rs(λ)} for S_p ∈ S and
Γ_j(λ) := Σ_{S_p ∈ S} Φ_p(λ) + sup(ā_{T\{j}}^rs(λ) x_{T\{j}}).

Transferring the bounds (33) and (34) to the problems (19) and (20) then yields

ℓ̄_j = max{(b̄^rs(λ) − Γ_j(λ)) / ā_j^rs(λ) | λ ∈ (0, 1), ā_j^rs(λ) > 0}, (35)
ū_j = min{(b̄^rs(λ) − Γ_j(λ)) / ā_j^rs(λ) | λ ∈ (0, 1), ā_j^rs(λ) < 0}. (36)

Similar to problems (19) and (20), the problems (35) and (36) may not have optimal solutions or may be unbounded; again, these edge cases do not affect the overall viability of the approach, and hence we restrict ourselves to cases where these problems are bounded and have an optimal solution. In order to show that Belotti's algorithm can be extended to the problems above, we prove that solving these problems requires only a linear number of evaluations of Γ_j(λ). As a first step towards the analogue of Proposition 3.5 for the problems (35) and (36), we show that Γ_j(λ) is, just like L_j(λ) in the original case, a piecewise linear function, and we determine all its breakpoints.
Lemma 3.8. For any j ∈ T, the function Γ_j(λ) is piecewise linear on its domain (0, 1), and the set Λ̃, consisting of Λ together with all breakpoints of the functions Φ_p(λ), S_p ∈ S, contains all its breakpoints.
Proof. Each Φ_p(λ), S_p ∈ S, is defined as the maximum over the constant zero function and the set of linear terms ā_k^rs(λ), k ∈ S_p, and is hence piecewise linear. Consequently, Γ_j(λ) is piecewise linear as the sum of linear and piecewise linear functions.
Note that |Λ̃| is linear in the number of variables since the set packing constraints are disjoint. Further, the set Λ̃ can be computed in O(|N| log |N|) time. A simple, well-known algorithm for this is the following. For each S_p ∈ S we first sort the linear functions ā_k^rs(λ), k ∈ S_p, by their gradients a_rk − a_sk. For the ā_k^rs(λ) with the smallest gradient we then have ā_k^rs(λ) = Φ_p(λ) for small enough values of λ. From there, we go through the remaining ā_k^rs(λ) in ascending order of their gradients and iteratively compute the next breakpoint. Since each ā_k^rs(λ) needs to be considered at most once, the total running time is in O(|N| log |N|).

Theorem 3.9. If an optimal solution to problem (35) exists, then at least one optimal solution is also contained in Λ̃. The same statement holds for problem (36).
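The sort-by-gradient computation of the breakpoints of one Φ_p can be sketched with the standard upper-envelope (convex-hull-of-lines) sweep. The representation of lines as (intercept, gradient) pairs is our own convention, and restricting the breakpoints to λ ∈ (0, 1) is left to the caller.

```python
def upper_envelope(lines):
    """Upper envelope of linear functions f(lam) = c + g * lam.

    `lines` is a list of (intercept c, gradient g) pairs; include (0, 0)
    to model the constant zero function inside Phi_p. Returns the lines
    forming the envelope in order of increasing gradient together with
    the breakpoints where the maximizer changes. O(n log n) due to sort.
    """
    lines = sorted(lines, key=lambda f: (f[1], f[0]))
    # among parallel lines only the largest intercept can be maximal
    filtered = []
    for c, g in lines:
        if filtered and filtered[-1][1] == g:
            filtered.pop()
        filtered.append((c, g))
    hull, breaks = [], []
    for c, g in filtered:
        while hull:
            c0, g0 = hull[-1]
            x = (c0 - c) / (g - g0)   # where the new line overtakes the top
            if breaks and x <= breaks[-1]:
                hull.pop()            # top line is nowhere maximal
                breaks.pop()
            else:
                breaks.append(x)
                break
        hull.append((c, g))
    return hull, breaks
```

For instance, for the three functions 0, 1 − λ, and λ the envelope is max(1 − λ, λ) with the single breakpoint λ = 1/2; the zero function is dominated and dropped.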
Proof. By Lemma 3.8 the set Λ̃ contains all breakpoints of the piecewise linear function Γ_j(λ). Together with ā^rs(λ) being linear, we have established the same conditions as in the proof of Proposition 3.5. The remaining proof of this theorem is then identical and can be found in full detail as the proof of Proposition 1 in [8].
To extend the original algorithm for computing bounds, one now proceeds as follows. First, the extended algorithm must compute L_r and L_s in a slightly different way. For easier notation, let κ_p ∈ S_p be the index of a variable x_{κ_p} with a_{sκ_p} = max_{j ∈ S_p} {a_sj} and a_{rκ_p} − a_{sκ_p} ≥ max{a_rj − a_sj | j ∈ S_p, a_sj = a_{sκ_p}}. In other words, of the variables x_j, j ∈ S_p, with maximum coefficient ā_j^rs(λ) we choose the one with maximal gradient of ā_j^rs(λ). Second, the extended algorithm must compute the new set of breakpoints Λ̃ which, as mentioned above, can be done in O(|N| log |N|) time and therefore does not increase the total time complexity. Note that in the extension it is required to track which variable coefficient currently determines the value of Φ_p(λ) for each S_p ∈ S. However, this only leads to linear computational overhead if the breakpoints and variables are labelled accordingly while computing Λ̃. The bound calculations corresponding to (23) and (24) as well as the updates corresponding to (25) and (26) essentially work the same but involve larger case distinctions due to the set packing constraints.
Before closing this section we will discuss what can be done for variables appearing in the set packing constraints. Let j ∈ S_p for some fixed S_p ∈ S and λ ∈ Λ̃. We distinguish four cases.
Together with the set packing constraint corresponding to S_p, the original problem is infeasible if the above condition does not hold.
A proper lower bound for the given λ could be derived by strengthening the summand Φ_p(λ) of Γ_j(λ) to max{0, max_{k ∈ S_p\{j}} ā_k^rs(λ)} and considering the additional breakpoints of this strengthening that may not already be contained in Λ̃. However, we are not aware of an efficient way to incorporate these calculations without creating additional computational overhead.
iii. ā_j^rs(λ) < 0 ∧ Φ_p(λ) = 0: Now the set packing constraint corresponding to S_p has no effect, and we get an upper bound on x_j via (34).

iv. ā_j^rs(λ) < 0 ∧ Φ_p(λ) > 0: Any solution with x_j = 1 immediately forces x_k = 0 for all k ∈ S_p, k ≠ j. We can therefore adjust (34) to this case and set x_j = 0 if ... holds. Note that a negative left-hand side does not necessarily imply infeasibility.
Example 3.10. This example illustrates the second case of the above distinction. Consider the following mixed-integer problem.
By Lemma 3.8 we have Λ̃ = {1/3, 2/3}, and for λ = 1/3 we get ..., so we can, by the second case of the above distinction, check for feasibility via ... Because of ā_1(1/3) = ā_2(1/3) = 3 it is not even possible to strengthen Φ_1(1/3), and similar considerations hold for λ = 2/3. However, max{0, max_{k ∈ {2,3}} ā_k(λ)} has an additional breakpoint at λ = 1/2 for which it would be possible to tighten the lower bound of x_1.

Remark 3.11. This presolve method is very similar to the one presented in the previous section, so we want to highlight some of the differences between them. Most importantly, the method in Section 3.3 can be implemented in a straightforward manner, and it is possible to apply it selectively to single variables. However, there are two major points that speak in favor of the method of this section. First, the previous method is restricted to tightening bounds of variables that do not have nonzero coefficients in both constraints, whereas this section's method is able to tighten the bounds of all variables involved. Second, it does not seem to be the case that the set packing extension translates to the presolve method from Section 3.3 in a canonical way. Consider the case that all variables are binary and each is present in one of the disjoint set packing constraints. Adding all the set packing constraints to the optimization problems shown in (16) results in so-called Multiple-Choice Knapsack Problems, see [39], which would exceed the scope of this paper.

Exploiting Complementary Slackness on Two Columns of Continuous Variables
By applying bound propagation to the dual of the linear programming relaxation of problem (1), we are able to derive bounds on the dual variables. These bounds can be used to detect implied equalities and to fix variables in (1). This presolve method was first published by Achterberg et al. [4]. We first describe the basic idea behind this method. Then we discuss the extension to two columns. Finally, we go into some details of the implementation.

Description of the Basic Method
Here we want to outline the basic idea of this presolve method as it was published in [4]. Consider the primal LP (40) and its dual LP max{bᵀy | Aᵀy ≤ c, y ≥ 0, y ∈ ℝ^|M|} (41).
Suppose x* is feasible for (40) and y* is feasible for (41). A sufficient condition for x* and y* to be optimal for (40) and (41), respectively, is complementary slackness (see Schrijver [37]). It is possible to exploit certain cases of complementary slackness for mixed-integer programs as well. Let problem (1) be given with •_i = ≥ for all i ∈ M and lower bounds ℓ = 0 for the x variables. Considering only the continuous variable indices S := N \ I and applying bound propagation on the polyhedron to get valid bounds ℓ̄ ≤ y ≤ ū, we can make the following statements: the problem max{bᵀy | y ∈ P(S)} is a relaxation of the dual LP (41), since only the constraints that belong to continuous variables in the primal problem are considered.
Note that even if x_j is an integer variable in case (ii), we can fix it to its lower bound. The statements (i) and (ii) thus constitute a slight generalization of complementary slackness in the context of mixed-integer programming.

Extension to Two Columns of Continuous Variables
In order to obtain tight bounds for the dual variables, we could solve, for each variable y_i, i ∈ M, the two LPs min{y_i | y ∈ P(S)} and max{y_i | y ∈ P(S)}. However, in most cases this is too costly for presolve. On the other hand, considering only single rows, as shown in Section 2, sometimes delivers only weak bounds. A reasonable compromise is to consider two rows of P(S) simultaneously. For this case we apply a hashing-based approach for finding promising pairs of columns, as described in Section 4, and reuse the method of Section 3.4 for determining bounds on the dual variables y_i. We illustrate the approach by an example.
Example 3.12. Consider the following mixed-integer program. With S = {3, 4} we obtain from (42) the following polyhedron P(S):

−2y_1 + 2y_2 + y_4 ≤ 8
y_1 − y_2 + 2y_3 − 0.1y_4 ≤ −1.8
y_1, y_2, y_3, y_4 ≥ 0.    (43)

The sign pattern of the first two columns in (43) prevents us from getting tighter bounds while considering only single rows. Such sign patterns, or jammed substructures, can only be resolved by a more global view, or specifically in this case by a two-row approach as in Section 3.4. It should be noted that it would also be possible to use the presolve method described in Section 3.3 to determine tighter bounds for the dual variables. However, this was not done, because the presolve method shown in Section 3.4 always delivers bounds that are at least as strong as the bounds determined by LP-based bound tightening on two constraints.
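To make the jammed substructure in (43) concrete, the following sketch applies textbook activity-based bound propagation to the two rows. This is plain Python with interval arithmetic, not the SCIP implementation, and the function name is ours: on each single row a negative coefficient of an unbounded variable drives the minimal activity to −∞, so nothing is learned, whereas the combination row1 + 2·row2 cancels y_1 and y_2 and yields y_3 ≤ 1.1 and y_4 ≤ 5.5.

```python
import math

def propagate_upper_bounds(coefs, rhs, lb, ub):
    """Activity-based (FBBT-style) propagation of a row sum_j c_j*y_j <= rhs:
    for c_j > 0, y_j <= (rhs - minimal activity of the other terms) / c_j."""
    new_ub = list(ub)
    for j, c in enumerate(coefs):
        if c <= 0:
            continue
        min_act = 0.0
        for k, ck in enumerate(coefs):
            if k == j or ck == 0:
                continue
            min_act += ck * (lb[k] if ck > 0 else ub[k])
        if min_act == -math.inf:
            continue  # a term like -2*y1 with y1 unbounded blocks this row
        new_ub[j] = min(new_ub[j], (rhs - min_act) / c)
    return new_ub

# Polyhedron P(S) from (43): y >= 0, no finite upper bounds known yet.
row1 = ([-2.0, 2.0, 0.0, 1.0], 8.0)
row2 = ([1.0, -1.0, 2.0, -0.1], -1.8)
lb, ub = [0.0] * 4, [math.inf] * 4

# Single rows: the jammed sign pattern of y1, y2 yields no finite bounds.
assert propagate_upper_bounds(*row1, lb, ub) == ub
assert propagate_upper_bounds(*row2, lb, ub) == ub

# Two-row view: row1 + 2*row2 cancels y1 and y2 and resolves the jam.
comb = ([a + 2 * b for a, b in zip(row1[0], row2[0])], row1[1] + 2 * row2[1])
print(propagate_upper_bounds(*comb, lb, ub))  # y3 <= 1.1, y4 <= 5.5
```

The particular multiplier 2 corresponds to the convex combination λ = 2/3 of the two rows, up to scaling.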

Implementation Details
In order to further improve the runtime behavior of this method, it is advantageous to consider only a subset of the continuous variables when tightening the bounds of the dual variables y. Consider a continuous variable x_j, j ∈ N \ I, with an explicit upper bound. We can view this bound as an additional constraint, i.e., −x_j ≥ −u_j > −∞. The corresponding dual variable, say y_i, is a singleton column in the dual formulation, in other words |supp((Aᵀ)_{·i})| = 1. That is, the bounds of y_i can only be improved in the dual constraint (A_{·j})ᵀ y ≤ c_j. Unfortunately, y_i has an upper bound of ∞ and thus the minimal activity (2) needed for propagating bounds for the dual variables y_k, k ∈ supp(A_{·j}) \ {i}, is always −∞, as y_i has a coefficient of −1. Consequently, no new bounds can be identified from this row for these dual variables. In addition, the coefficient of −1 only allows the determination of a new lower bound for y_i.
At the beginning of our implementation, we verify which continuous variables have bounds that are implied by constraints and other variable bounds. Careful attention must be paid to interdependencies between bounds: if the redundancy of two bounds depends on each other, only one of them may be treated as redundant. Redundant variable bounds are subsequently removed, which usually gives us more implied free variables or variables with only one bound. As free variables are assigned to dual equalities and variables with only a lower bound to dual inequalities, this mostly results in a stronger dual formulation and therefore in tighter bounds on the dual variables. As described above, variables with a finite upper bound are excluded from further consideration.
Let S̄ be the set of continuous variables without unimplied upper bounds. Our current implementation always performs FBBT (see Section 2) on P(S̄) for determining bounds on the dual variables y, which is essentially the approach published in [4]. In addition, the current implementation in SCIP uses the previously described two-row approach directly before FBBT. In order to resolve unfavorable situations like the one shown in (43), the pairing mechanism explicitly searches for jammed substructures to find matching pairs of rows. This makes it possible for the bounds calculated in the two-row approach to be exploited and further tightened in a subsequent FBBT pass.

Pairing Methods
An exhaustive search for suitable pairs of rows or columns on which to apply the presolve methods presented in the previous section is usually too inefficient, since it is of quadratic complexity in the number of constraints or variables. In this section, we outline the hashing-based pairing mechanisms we applied in our computational experiments. A detailed introduction to hashing can be found in [27]. The basic idea of our pairing mechanisms is to scan through the problem and selectively record promising constraints or variables in a hashlist or hashtable. We then go through the problem a second time and check whether the hashlist or hashtable contains a fitting counterpart for the constraint or variable we are currently looking at.

Pair-Search Based on a Hashtable
The presolve methods presented in Section 3.1 and Section 3.2 require a high number of matching coefficients. To achieve a favorable trade-off between search time and effectiveness, we use a pairing mechanism on variable pairs and constraint pairs respectively. In the following we explain the mechanism as used for the presolve method from Section 3.1 and highlight the differences to the pairing mechanism required for the other presolve method at the end of this section.
For each pair of variable indices j, k in each equality i, the triple (j, k, a_ij/a_ik), consisting of the variable indices and their coefficients' ratio, is used as a key to store the quintuple (i, a_ij, j, a_ik, k). Efficiently storing and querying these tuples is very important for the overall runtime behavior of the algorithm. Therefore, we use the hashtable and collision resolution scheme described in [31, Sec. 2.1.8], which allows constant-time access under practical assumptions. If the key (j, k, a_ij/a_ik) is already contained in the hashtable, then the entry corresponding to the sparser row is kept. Afterwards, for each pair of variable indices j, k in each inequality r, the hashtable is queried for (j, k, a_rj/a_rk). If the conditions a_ij = a_rj, a_ik = a_rk, W = ∅ hold for the corresponding entry (i, a_ij, j, a_ik, k), then reformulation (11) is applied. To further decrease search time, we use the following set of limits to heuristically prevent unrewarding investigations.
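The two-pass mechanism can be sketched as follows. This is a simplified illustration in which a plain Python dict stands in for the hashtable and collision resolution scheme of [31], and the helper names and the sparse-row encoding are ours:

```python
def build_pair_table(equalities):
    """First pass: hash every variable pair of every equality row.
    equalities: list of (row_index, {var_index: nonzero coefficient}).
    Key (j, k, a_ij/a_ik) maps to the entry of the sparsest row seen."""
    table = {}
    for i, coefs in equalities:
        idx = sorted(coefs)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                j, k = idx[a], idx[b]
                key = (j, k, coefs[j] / coefs[k])
                best = table.get(key)
                # on a key collision, keep the entry of the sparser row
                if best is None or len(coefs) < len(best[1]):
                    table[key] = (i, coefs)
    return table

def find_cancellation_partner(table, coefs):
    """Second pass: query the table for an equality whose coefficient
    ratio matches some variable pair of the given inequality row."""
    idx = sorted(coefs)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            j, k = idx[a], idx[b]
            hit = table.get((j, k, coefs[j] / coefs[k]))
            if hit is not None:
                return hit  # candidate equality for the cancellation step
    return None

# x1 + 2*x2 + x3 = 4 can cancel two nonzeros of 2*x1 + 4*x2 + x4 <= 9,
# because the ratio 1/2 of the pair (x1, x2) matches the ratio 2/4.
table = build_pair_table([(0, {1: 1.0, 2: 2.0, 3: 1.0})])
print(find_cancellation_partner(table, {1: 2.0, 2: 4.0, 4: 1.0}))
```

A real implementation would additionally verify the coefficient conditions on the returned entry before applying the reformulation.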
First, we consider at most 4900 variable pairs per row to be hashed and added to the hashtable. The same limit is used when searching through the inequality rows after the hashtable is built. Further, we keep track of the number of failed hashtable queries for the current row. If no reduction is found on the current row, we add this number to a global counter. If a reduction is successfully applied to the current row, we instead decrease the global counter by the number of failed hashtable queries of this row. The presolve method stops when the global counter exceeds 100 · |M|. The rationale is that we want to keep going if, after many fruitless queries that almost exhausted the budget, we finally reach a useful section of rows. However, in case the useful section lies entirely at the beginning and we want to quit quickly afterwards, we do not allow a negative build-up, i.e., the global counter remains at zero if it were to become negative.
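The failure-budget bookkeeping described above might be sketched as follows; this is an illustrative simplification, and the function name and toy row encoding are ours:

```python
def run_with_failure_budget(rows, try_reduce, budget):
    """Adaptive stopping rule: failed hashtable queries of a row without a
    reduction accumulate in a global counter; a row with a successful
    reduction refunds its own failed queries instead. The counter is clamped
    at zero, so early successes cannot bank negative 'credit'. The search
    stops once the counter exceeds the budget (100 * |M| in the text)."""
    counter = 0
    applied = 0
    for row in rows:
        reduced, failed_queries = try_reduce(row)
        if reduced:
            applied += 1
            counter = max(0, counter - failed_queries)
        else:
            counter += failed_queries
        if counter > budget:
            break  # a fruitless stretch exhausted the budget
    return applied

# Two useful rows, then a fruitless stretch: with budget 25 the third
# fruitless row pushes the counter to 30 and the search stops early,
# never reaching the final useful row.
rows = [(True, 5), (True, 5), (False, 10), (False, 10), (False, 10), (True, 2)]
print(run_with_failure_budget(rows, lambda r: r, budget=25))  # -> 2
```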
In addition, note that a target matrix with as few nonzeros as possible does not necessarily give the best results. In fact, it seems important to preserve special structures by not applying a cancellation if it would destroy one of the following properties of the row r above: integrality of the coefficients; more specifically, coefficients +1 and −1; set packing, set covering, set partitioning, or logicor constraint types; variables with no or only one up-/downlock [1].
Also, it should be mentioned that adding scaled equations to other constraints needs to be done with care. In particular, too large or too small scaling factors λ can lead to numerical problems. Currently, as in [4], a limit of |λ| ≤ 1000 is used.
For the presolve method from Section 3.2, we consider pairs of constraint indices and their coefficients. The pairing mechanism itself is then identical, with the roles of rows and columns reversed. However, we only apply the reduction if |U| − |W| ≥ 100, as the reduction induces a larger computational overhead.
Finally, it is important to note that this hashing mechanism can miss possible reductions in two cases. First, a pair might not be evaluated because one of the imposed limits is hit before it is processed. Second, the hashing mechanism disregards pairs with an overlap of size one. This follows the intuition that the methods presented in Sections 3.1 and 3.2 become more powerful with increasing overlap.

Pair-Search Based on Multiple Hashlists
The following pairing mechanism is used for the presolve methods presented in Section 3.3 and Section 3.4. It is also used indirectly by the presolve method from Section 3.5, as the latter uses the method from Section 3.4, without its extension, in order to derive stronger bounds from the dual problem than could be achieved by regular FBBT. In all three cases, as highlighted by Lemma 3.1, it is important to find pairs of constraints such that variables shared by the constraints have coefficients with opposite signs. To this end, we devised a pairing mechanism that uses an open surjective hash function H(j, k) for variable indices j, k ∈ N and works as follows.

− Step 1: Create the four sets L_{++}, L_{−−}, L_{+−}, L_{−+}: for each row and each pair of variable indices j, k occurring in it, file the row under the hash value h = H(j, k) in the hashlist matching the sign pattern of the coefficients of j and k.
− Step 2: For each key value h for which L^h_{++} and L^h_{−−} have both been created, perform the two-row bound tightening on all row pairs (r, s) ∈ L^h_{++} × L^h_{−−}, r ≠ s, as this guarantees that for this row pair there exist at least two variables x_j, x_k whose coefficients a_rj, a_sj and a_rk, a_sk have opposite signs in the constraints r and s. This is done analogously for key values h for which L^h_{+−} and L^h_{−+} have both been created. While doing so, add each processed pair's hash to a hashtable. To prevent unnecessarily processing the same pairs multiple times, we query this hashtable before applying the two-row bound tightening.
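A minimal sketch of the two-step hashlist mechanism follows. This is illustrative only: the simple arithmetic hash and all names are our own stand-ins, not SCIP's implementation:

```python
from collections import defaultdict
from itertools import product

def H(j, k, nbuckets=1 << 16):
    """Stand-in for the open surjective hash function on index pairs."""
    return (j * 1000003 + k) % nbuckets

def pair_rows_by_sign_pattern(rows):
    """rows: list of (row_index, {var_index: nonzero coefficient}).
    Step 1: file every row, per variable pair, into one of four hashlists
    keyed by the pair's coefficient sign pattern. Step 2: rows filed under
    the same key in L++ and L-- (or in L+- and L-+) form candidate pairs,
    as they share two variables whose coefficients have opposite signs."""
    lists = {s: defaultdict(set) for s in ("++", "--", "+-", "-+")}
    for r, coefs in rows:
        idx = sorted(coefs)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                j, k = idx[a], idx[b]
                pat = ("+" if coefs[j] > 0 else "-") + ("+" if coefs[k] > 0 else "-")
                lists[pat][H(j, k)].add(r)
    candidates = set()
    for s, t in (("++", "--"), ("+-", "-+")):
        for h in lists[s].keys() & lists[t].keys():
            for r, q in product(lists[s][h], lists[t][h]):
                if r != q:
                    candidates.add((min(r, q), max(r, q)))
    return candidates

rows = [(0, {1: 1.0, 2: 1.0}),    # pair (1,2) filed in L++
        (1, {1: -1.0, 2: -2.0}),  # pair (1,2) filed in L--  -> pairs with row 0
        (2, {1: 1.0, 3: 2.0})]    # shares only one variable with row 1
print(pair_rows_by_sign_pattern(rows))  # -> {(0, 1)}
```

A real implementation would additionally deduplicate processed pairs via a hashtable and enforce the working limits described below.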
In order to make this approach more efficient, several working limits are used in our implementation. First, creating the four hashlists L_{++}, L_{−−}, L_{+−}, L_{−+} for all rows and variable pairs is computationally too expensive. Therefore, for each row we consider at most 10000 variable pairs and also limit the total number of variable pairs considered over all rows to 10 · |M|. Second, for each processed pair we check whether new bounds have been found and stop searching if 1000 consecutively processed pairs have not yielded tighter bounds. Third, before applying two-row bound tightening on a row pair, we check whether its hash is already contained in the hashtable tracking the processed pairs. If 1000 consecutive pairs have been found to already be contained in the hashtable, the method also stops. Finally, the method stops after successfully processing |M| row pairs. Even if the method finds many reductions, it is important to prevent it from taking up too much time. To this end, we chose a limit that is linear in the problem size, because the number of constraints in real-world problems, as well as in our testset, can easily range from a few hundred to multiple millions, and future instance sizes will increase further.
Like the mechanism presented in Section 4.1, this hashing mechanism can miss possible reductions if one of the limits is hit or a pair has an overlap of size one.

Computational Results
In this section we present our computational results. We first discuss a comparison of each individual presolve method with a baseline run in which all five methods are disabled. Finally, we provide an evaluation of the combined impact of all five presolve methods.
The methods have been implemented in SCIP 6.0.2 with SoPlex 4.0.2 as LP solver, and the computations were performed on machines with a Xeon E3-1240 v5 CPU ("Skylake", 4 cores, HT disabled, 3.5 GHz base frequency) and 32 GB RAM. Aside from enabling or disabling the five presolve methods and setting a time limit of 7200 seconds, we ran SCIP with its default settings. The run with all five presolve methods disabled is referred to as Baseline. (Out of the five methods described in this paper, Two-Row Nonzero Cancellation is the only one that was already included in SCIP 6.0.2.) Our testset is the second version of the MIPLIB 2017 benchmark set [22,33] released in June 2019. It consists of 240 instances from varying optimization backgrounds. To further increase the accuracy of our results, we run each instance and each setting with four different random seed initializations as well as seed zero, which is used in the release version of SCIP. SCIP uses the random seed initializations to affect numerical perturbations in the LP solver, tie-breaking, and heuristic decision rules that occur in many parts of the solving process. Each combination of instance and seed is considered its own data point in order to tackle the problem of performance variability [16,30], where small changes that are neutral from a purely mathematical point of view can have a large impact on the performance of a MIP solver. Finally, we discard all combinations of instance and seed where computations ran into numerical difficulties or, as happened in rare cases, the new code revealed bugs in the existing framework.
The result tables in this section each present the comparison between two different runs and are structured as follows. Each row presents the aggregate results on a set of problem instances. The row all contains the entire testset. The subsets corresponding to the brackets [t, T] contain instances that were solved by at least one run and for which the maximum solving time (among both runs) lies between t seconds and T seconds. If a configuration did not solve the problem within the time limit, its solving time is recorded as T seconds. In this work, we set T to the time limit of 7200 seconds; with increasing t, this provides a hierarchy of subsets by difficulty. In particular, the bracket [0, 7200] contains all instances that have been solved within the time limit and deserves special attention, as it is the largest subset where computation times can be compared in a meaningful way. We then compare the number of instances solved, as well as the shifted geometric means of solving times and branch-and-bound nodes, in the columns solved, time, and nodes, respectively. The shifted geometric mean of values x_1, …, x_n is (∏_{i=1}^n (x_i + s))^{1/n} − s, where the shift s is set to 1 second and 100 nodes, respectively.
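For reference, the shifted geometric mean can be computed as follows; this is the standard definition, with the helper name being ours:

```python
import math

def shifted_geometric_mean(values, shift):
    """(prod_i (x_i + s))^(1/n) - s, computed in log space for stability.
    The shift s damps the influence of very small values in the aggregate."""
    n = len(values)
    return math.exp(sum(math.log(x + shift) for x in values) / n) - shift

# Solving times of 1, 10, and 100 seconds with the paper's shift of 1 second:
# the 100-second outlier dominates far less than in the arithmetic mean (37.0).
print(round(shifted_geometric_mean([1.0, 10.0, 100.0], 1.0), 2))  # -> 12.05
```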

Two-Row Nonzero Cancellation
For a detailed description of this method we refer to Section 3.1. The method found reductions on 390 instances. Despite being applied on a large number of instances, its performance impact, as presented in Table 1, is slightly negative in terms of runtime, and it also causes three fewer instances to be solved within the time limit. Although the impact is still only neutral on the subset [100,7200], the results show a tendency towards a positive impact on difficult instances. If a presolve method does not perform well, this is usually due to one or both of the following reasons. First, the method itself takes too much time and therefore causes a degradation on models where it does not find enough reductions. Second, the reductions create a new substructure that is unfavorable for the subsequent solve. In our case, both reasons appear to play a role. Of all the presented presolve techniques, this method creates the largest runtime overhead. However, this does not suffice to fully explain the degradation observed in Table 1.

Two-Column Nonzero Cancellation
This presolve method was described in detail in Section 3.2. Since the rules for applying a reduction in this method are more restrictive than in the row version, we observe reductions on only 115 instances. The performance results for this presolve method are presented in Table 2. In total, its impact can be considered neutral, with two more instances being solved at the cost of a slightly prolonged runtime for instances in the subset [1000,7200]. In contrast to the neutral performance impact on the MIPLIB 2017 benchmark set, it has a tremendous impact on the testset studied in [29]. These instances were constructed in [29] based on a deterministic equivalent mixed-integer programming formulation of the two-sided chance constrained program (TSCCP). In total, 30 instances are publicly available at [28]. We performed additional tests on these instances using the same settings as above. As in the other tests, each instance was solved five times with different random seed initializations, resulting in a testset of 150 MIPs.
The computational results presented in Table 3 show that the proposed method found reductions on all instances. Overall, the proposed method improves performance by 74% on the TSCCP testset. In addition, with this method, the solver was able to solve 147 instances within the time limit of 7200 seconds. In comparison, without this method the solver was able to solve only 98 instances within the same time limit.

LP-based Bound Tightening on Two Constraints
For a detailed description of this presolve method we refer to Section 3.3. The method applied reductions to 119 instances and the performance results are presented in Table 4.
On the entire testset, the method has neutral impact. However, with increasing difficulty we can see a positive impact of up to 4% for the instances in the subset [1000,7200].

Bound Tightening on Convex Combinations of Two Constraints
The extension of this presolve method presented in Section 3.4 is incorporated using SCIP's so-called clique table, which represents the solver's current knowledge of binary variables that cannot be set to one at the same time. The clique table is generated from set packing constraints explicitly stated in the model as well as from information extracted during earlier presolve, and it is also used for cutting plane separation, linear constraint propagation, and primal heuristics (see [21]). From this clique table we generate the set packing constraints used in the extension. It is important to keep in mind that, although this allows the method to find better bounds than the method presented in Section 3.3, it also increases the cost in terms of running time. In our implementation, this amounts to an average of 0.11 seconds spent per instance for the method from Section 3.3, which increases to an average of 0.31 seconds per instance for this method. Note, however, that this is in part due to this method running longer because it finds more reductions. As mentioned in Remark 3.11, this method works on a superset of the variables whose bounds LP-based bound tightening on two constraints from Section 3.3 can tighten. As a result, this method applies reductions on 142 instances, i.e., 23 more than LP-based bound tightening. The results presented in Table 5 show that the additional bound tightenings found by this method indeed yield a positive impact on solving times. As for LP-based bound tightening, the impact of this presolve method scales with instance difficulty. Its impact thus changes from neutral on the overall testset to a considerable 6% improvement on the subset [1000,7200]. The results presented in Table 6 illustrate the benefit of the clique extension over the standard procedure developed by Belotti [8].
It can be seen that, although the number of nodes increases when using the extension, it has an overall positive impact in terms of runtime. Further, when deactivating the clique extension, the number of instances on which reductions are found drops from 142 to 119.

Exploiting Complementary Slackness on Two Columns of Continuous Variables
For a detailed description of this presolve method we refer to Section 3.5. As can be seen in Table 7, the impact of the method is almost neutral on the overall testset as well as on all subsets. The method performs reductions on only 11 instances. However, the neutral impact is not necessarily due to the inefficiency of the reductions themselves, but rather a consequence of the very low number of instances to which this method can be applied because of missing continuous variables. On a more specialized testset of real-world supply chain instances, however, we observed a significant influence of this presolve method. In [36], a testset of 40 real-world supply chain instances was studied. We performed additional tests on these instances with the same settings as above. It should be emphasized that this method found reductions on each of the 40 instances. On 9 instances more than 10% of the variables were fixed, and on 8 instances more than 2% of the sides were changed to equalities.

All Presolve Methods
Finally, we ran all presolvers presented in this work in the following order: LPBound, CompSlack, ConvComb, TwoRowNonzeroCancel, TwoColNonzeroCancel. As previously illustrated in Example 3.4, different presolve methods can positively interact with each other, even if each method by itself is neutral or even slightly negative. However, the results presented in Table 8 show that enabling all five presolve methods presented in this work has a slightly negative impact on solving times. Table 9 shows the number of instances to which each of the presolvers applied reductions when run individually (left) and with all five methods enabled (right). We can make two noteworthy observations. First, we have additional confirmation that the presolve methods presented in this work do not enable each other to find additional reductions, i.e., each presolve method finds more reductions when run with the other four disabled. Second, bound tightening based on convex combinations applies reductions on 99 instances even when run after LP-based bound tightening. This shows that it is indeed the more powerful method. Before closing our presentation of computational results, we want to add that, for the practical development of a fast and stable MIP solver, there are three reasons to include presolve methods that show only neutral or even slightly negative impact on the MIPLIB 2017 benchmark set. First, there may be positive interactions between new and existing presolve methods, even though this was not the case among the five methods presented in this work. Second, presolve methods with neutral or slightly negative performance impact on the benchmark set may have significant impact on other instances, as we have seen for the two-sided chance constrained program instances with the technique described in Section 3.2 and for the real-world supply chain instances with the technique described in Section 3.5.
Finally, it is also important to actively involve new presolve methods in the development process of a solver to avoid overtuning of established methods. It could be, for example, that a new presolve method that is currently not performing well reveals its potential in combination with a newly developed heuristic. If this method had not been included in the development process, this would not have become apparent.

Conclusions
In this paper, we considered five two-row and two-column methods for mixed-integer presolve that are generally effective but suffer from the limitation that simply testing them on all pairs of rows or columns is computationally too expensive. We resolved this bottleneck by introducing two hashing-based pairing mechanisms that quickly identify promising pairs of rows and columns. In our discussion of the five two-row and two-column presolve methods, we focused on the required properties of promising pairs of rows or columns. Additionally, we discussed an extension based on set packing constraints to further strengthen the results of the bound tightening method applied to convex combinations of two rows. We then introduced the two hashing-based pairing mechanisms, implemented the ideas in the mixed-integer programming solver SCIP, and conducted a performance evaluation on the MIPLIB 2017 benchmark set [33].
The first pairing mechanism, based on a hashtable, aims at pairs of rows (or columns) such that at least two coefficients can be cancelled simultaneously by linearly combining the rows (or columns). This is required by the two nonzero cancellation methods, which turned out to have a neutral or slightly negative effect on performance.
The second pairing mechanism uses four hashlists to find pairs of rows that share at least two variables whose coefficients have opposite signs. This increases the chance of finding stronger variable bounds when applying the two-row bound strengthening methods presented in Sections 3.3 and 3.4. Using bound tightening based on convex combinations, we were able to improve solving times by 1% on the entire testset and by up to a considerable 6% on instances with longer solving times.
Finally, we experimented with applying the method presented in Section 3.4 to improve the bound strengthening required by the dual presolve method presented in Section 3.5, but observed that this leads to only few reductions and no performance impact. This is largely because many of the MIPLIB 2017 instances do not contain any continuous variables at all.
To conclude, the results of this paper demonstrate that two-row and two-column presolve methods hold potential for practically solving challenging mixed-integer programming problems when they are combined with a sophisticated selection mechanism. For the methods discussed in this paper, more extensive parameter tuning or extensions of the methods themselves may even lead to larger performance gains.
Other compelling directions for further investigation are finding new combinations of pairing mechanisms and two-row or two-column presolve methods as well as moving on to reductions based on three or even more rows or columns.