The cost of not knowing enough: mixed-integer optimization with implicit Lipschitz nonlinearities

It is folklore knowledge that nonconvex mixed-integer nonlinear optimization problems can be notoriously hard to solve in practice. In this paper we go one step further and drop analytical properties that are usually taken for granted in mixed-integer nonlinear optimization. First, we only assume Lipschitz continuity of the nonlinear functions and additionally consider multivariate implicit constraint functions that cannot be solved for any parameter analytically. For this class of mixed-integer problems we propose a novel algorithm based on an approximation of the feasible set in the domain of the nonlinear function—in contrast to an approximation of the graph of the function considered in prior work. This method is shown to compute approximate global optimal solutions in finite time and we also provide a worst-case iteration bound. In some first numerical experiments we show that the “cost of not knowing enough” is rather high by comparing our approach with the open-source global solver SCIP. This reveals that a lot of work is still to be done for this highly challenging class of problems and we thus finally propose some possible directions of future research.


Introduction
Mixed-integer nonlinear optimization problems (MINLPs) are one of the most important classes of models in mathematical optimization. This is due to its capability of modeling decisions via incorporating discrete aspects as well as the possibility of modeling nonlinear phenomena. However, the combination of these two aspects also makes these problems very hard to solve [1,2]. For a general overview about MINLP we refer to [3].
Both from a theoretic and algorithmic point of view it is important to classify MINLPs further. Probably the most important distinction is to be made between convex and nonconvex MINLPs. Since gradients yield valid cuts in the convex case, outer approximations of the feasible sets of the convex nonlinearities can be derived and exploited in algorithms; see, e.g., [4][5][6][7][8] and the references therein. In the nonconvex case this is not possible. Here, one typically needs to derive convex underestimators and concave overestimators that yield (piecewise) convex relaxations of the nonconvex nonlinearities [9][10][11]. These approaches and also most other algorithms for computing global optima of such nonconvex MINLPs are based on spatial branching. The method we propose is different in the sense that we generate an iteratively tightened but nonconvex outer approximation of the feasible set, which is algorithmically realized via constructing a tightened mixed-integer linear approximation of the MINLP. Moreover, spatial branching based on convex underestimators and concave overestimators usually exploit known analytical properties of the nonconvex nonlinear functions, which is obviously not possible if these properties, like, e.g., differentiability, are not known or even knowledge about the explicit representation is missing. In this case, one typically tries to resort to Lipschitz assumptions about the nonlinearities, which leads to the field of global Lipschitz optimization; see, e.g., [12][13][14][15][16][17][18] to name only a few. For a more detailed overview about this field see the textbook [19] and the references therein.
In this paper, we focus on a specific setting that can be observed in many applications (see below for some problem-specific references), namely that the Lipschitzian MINLP under consideration can be decomposed into a mixed-integer linear (MILP) part and a nonlinear part. Our working hypothesis in this and also the preceding works [20][21][22][23] is that the MILP part can be solved comparably fast and reliable whereas the nonlinearity really hampers the solution process-at least in combination with the MILP part of the problem. See also [24][25][26][27], where this working hypothesis is followed as well. At this point, we remark that the cited literature usually tries to get rid of the nonlinear functions by replacing or, so to say, re-modeling them using MILP-representable approximations. In this paper, we consider the case in which this re-modeling is not possible without adding additional unknowns and constraints to the problem.
One specific field of application that fits into the above discussion and that has been studied very successfully in recent years is mixed-integer nonlinear optimization of gas transport networks [22,[28][29][30][31]; see the recent book [32] and the survey [33] for more references. This application will also be studied later in our numerical case study. MINLPs from gas transport optimization are defined on graphs that represent the transport network. Mass balances and simple physical as well as technical bounds yield linear constraints in this context and controllable elements are typically modeled by mixed-integer (non)linear sets of variables and constraints. Here, the complicating nonlinearity is the system of differential equations modeling the gas flow through pipes. In its full beauty, these equations yield implicit and highly nonlinear constraints that can be evaluated only by simulation and that do not possess additional and desired analytical properties like convexity; see [34][35][36][37][38][39]. In almost all contributions of the literature cited above the authors make, depending on their special focus, tailored assumptions that allow to develop very effective solution approaches. In this paper, we follow the way paved by the paper [21], where general methods have been developed (and tested on gas transport problems) that only require Lipschitzian, 1-dimensional, and explicitly given nonlinearities. In this paper, we show that the approach can be extended to the case where the nonlinearity is only given implicitly. To this end, we relax the assumptions used in [21] as follows: 1. We consider constraints that are only implicitly given, i.e., constraints of type . , x n ) = x i cannot be achieved analytically for any i ∈ {1, . . . , n}. 2. We consider multi-dimensional constraint functions F(x) = 0 with x ∈ R n and n ≥ 2.
Regarding (1) it is of course possible to re-model the implicit constraint as F(x) = y and by adding additional linear constraints y = 0. We, however, refrain from such a re-modeling in this paper and consider the case were such a re-modeling, which also enlarges the problem, is not appropriate. It will turn out that both aspects drastically accentuate the computational hardness of the problem. Our contribution is the following. We formalize the problem class in Sect. 2 and the corresponding assumptions sketched above (Sect. 3). Based on this, we state an algorithm and prove its correctness, i.e., that it computes approximate global optimal solutions (or proves the infeasibility of the problem) in finite time. Moreover, we prove a worst-case iteration bound for the algorithm. Finally, we present a numerical case study in Sect. 4, where we apply the method to nonconvex problems from gas transport. This case study reveals that the "cost of not knowing enough" is rather high: Only small gas transport networks can be solved in reasonable time under the stated very weak assumptions. Moreover, a comparison with the open-source global solver SCIP shows that the latter is outperforming our approach if additional structure of the problem is at hand that is exploited by SCIP but not by our approach. This is why we consider the addressed problem class as an open computational challenge for which we state possible directions of future research in Sect. 5.

Problem definition
We consider problems of the form where i.e., x c(i) is a sub-vector of the vector x c of all continuous variables. We remark that all variables are finitely bounded by −∞ <x ≤x < ∞. Moreover, we note that the assumption that the nonlinearities only depend on the continuous variables is only made to simplify the technical discussions later on-it can always be formally satisfied by introducing auxiliary continuous variables. Instead of optimizing the objective of (1) over the feasible set F given by (1b, 1c) we replace F by an approximating sequence F k ≈ F and globally optimize the sequence of problems The iteration can then be stopped once a solution x k of (2) is close enough to the original feasible set F. To this end let be the ε-relaxed version of the original problem (1). Note that we only relax the nonlinearities whereas all other constraints stay as they are. The precise choice of the approximate sets F k will be detailed later in Sect. 3.

Algorithm
The main idea of our algorithm for solving Problem (1) to approximate global optimality is to split the problem into its mixed-integer linear and its nonlinear part. The mixed-integer linear part is solved in the so-called master problem, which additionally contains a successively tightened approximation of the zero sets of the nonlineari- . Let x k denote the master problem's solution in iteration k. Then, for every master problem solution, we check the feasibility of the solution w.r.t. the nonlinearities F i . In case of feasibility, we have found a global optimal solution of the original problem; and in case of infeasibility, we construct a tighter approximation of the zero sets for the next master problem based on the information obtained by evaluating the nonlinearities F i at the current master solution x k . For the latter, we need (i) the function values , as well as (ii) the global Lipschitz constants L i of the F i w.r.t. the norm · ∞ on R n i , i.e., it holds

Remark 3.1 Let us note that for a continuously differentiable function F i and arbitrary
holds by the fundamental theorem of calculus. Hence one obtains an estimate of the needed Lipschitz constant by setting We now show how to construct the successively tightened approximations of the zero sets in case of Assumption 1. Let k i denote this approximation of the zero set of function F i in iteration k of the algorithm. Initially, we start with the box 0 defined by the original bounds of the problem (or after presolving; see Sect. 4). Assume now that the evaluation of the nonlinearity yields |F i (x k c(i) )| ≥ ε for a prescribed tolerance ε > 0. We then exclude the box in the next iteration. If |F i (x k c(i) )| < ε holds, we set B k i = ∅. Putting these boxes together, we obtain the nonconvex outer approximation With this at hand, we can now formulate the master problem: The main goal of this section is to prove the correctness of the algorithm, i.e., finite termination of the algorithm at approximate global optimal points. However, before we do this, we first state and prove some properties of the master problem that will be used later and formally state the algorithm.

Proposition 3.2 Assume that Assumption 1 holds. Then, it holds
for all i ∈ [p] and all k. Thus, the master problem (6) is a relaxation of (1) for all k.
Proof The proposition follows directly from the construction (4) and (5) and the Lipschitz continuity of F i .
The next lemma shows that we can use state-of-the-art MILP software for solving the master problems.

Lemma 3.3
The nonconvex master problem (6) can be modeled as a mixed-integer linear problem. The number of additional variables and constraints required to formulate , can be formulated with mixed-integer linear constraints as well. We define the index set for the boxes to be excluded. Moreover, we definē for all i ∈ [p] and j ∈ I k i ; see Fig. 1 for an illustration. In (7) we use the notation 1 for the vector of ones in appropriate dimension.
To model the gray area in Fig. 1, we have to exclude the white rectangles B j i for all j ∈ I k i . This can be done in the following way: We remark that (

Algorithm 1 Global Optimization of MINLPs with Implicit Nonlinearities
Require: Problem (1) and ε > 0. Ensure: Returns an approximate global optimal point for Problem (1) or an indication of infeasibility.

5:
Let x k denote the optimal solution of (6). 6: Evaluate end for 11: Increase k ← k + 1. 12: end while ensure that B j i is excluded, we have to add the inequality (8f). One can easily see that System (8) requires O(k |c(i)|) additional variables and constraints for every i ∈ [p].
We additionally remark that after having solved the master problem, all F i can be evaluated in parallel in every iteration. The overall algorithm is formally stated in Algorithm 1.
We now prove correctness of the algorithm, i.e., that the algorithm terminates after a finite number of iterations with an approximate global optimal solution or with an indication of infeasibility of the original problem. Here, an approximate global optimum is an optimal point that does not violate any constraint more than a given ε > 0, i.e., an approximate global optimum is a solution of Problem (3).

Theorem 3.4
Suppose that Assumption 1 holds. Then, Algorithm 1 terminates after a finite number of iterations at an approximate globally optimal point x k of Problem (1) or with an indication that Problem (1) is infeasible.
Proof We assume that the algorithm does not terminate after a finite number of iterations. That is, there exists an i ∈ [p] and a corresponding subsequence (indexed by l) of iterates with We investigate the master problem's solutions x l c(i) . Since all variables are bounded, the subsequence (x l c(i) ) has a convergent subsequence (x μ c(i) ). Thus, we can write for all sufficiently large indices α and β of the μ-subsequence and arbitrarily small δ > 0. However, all iterates x l c(i) are excluded from the subsequent feasible set, see (4), and together with (9) we can write for all α, β. This contradicts (10).
Next, we establish a worst-case bound for required number of iterations.

Theorem 3.5 For given i ∈ [p]
, let n i = |c(i)|, i.e., x c(i) ∈ R n i . Furthermore, let , with side-length ε/L i . To see this, we note that each of the intervals [x j ,x j ] for j ∈ c(i) can be decomposed as into σ i j L i /ε + 1 subintervals of length lower or equal ε/L i . Taking the product of these intervals gives the desired cover of the hypercube. Now, by definition, in an iteration k during which k i is changed, a point x c(i) together with its neighborhood B k i is excluded from k i . In order for this to happen, needs to hold. Consequently, the neighborhood B k i contains a hypercube with sidelength 2ε/L i . As a consequence, if x c(i) is contained in H k ε , then no future iterate can be placed in this H k ε . This proves the claimed bound on the iterations. The last theorem shows that the maximum number of required iterations is bounded above by O((L max /ε)n), wheren = max{c(i) : i ∈ [p]} is the maximum number of arguments of the nonlinear functions F i and L max = max{L i : i ∈ [p]} is the maximum Lipschitz constant. Moreover, the asymptotic bounds derived in [40,41] indicate that, in general, no better worst-case iteration bound can be expected.

Remark 3.6
During the course of the algorithm, the excluded boxes B k i are determined based on the master problem's solution. To tighten the feasible set of the initial master problem, we first equidistantly sample points from the initial box 0 i . Afterward, we sort these sampling points by the excluded box volume, see (4), in descending order and add them to the initial master problem if the sampling points are not excluded by the box of a previous sampling point. By doing so, we obtain a tighter feasible region from the beginning at the cost of a larger MILP formulation.

Numerical case study
In this sect., we present computational results of Algorithm 1 applied to stationary gas transport optimization. We briefly describe the model in Sect. 4.1 and discuss the results in Sect. 4.2.

Stationary gas transport optimization
One central task in the gas industry is to transport prescribed supplied and discharged flows at minimum costs. Gas mainly flows from higher to lower pressures and in order to transport gas over large distances through pipeline systems, it is required to increase the gas pressure. This is done by compressors. Usually, these compressors add discrete aspects to the problem. In contrast to that, for this case study we decide to choose the easiest and, thus, continuous modeling of a compressor so that we can purely focus on the main aspects of the proposed algorithm, i.e., the iteratively tightened outer approximation of the nonconvex feasible set induced by the nonlinearities of the model.
We model a gas network as a directed graph G = (V , A) with node set V and arc set A. The set of nodes is partitioned into the set of entry nodes V + , where gas is supplied, the set of exit nodes V − , where gas is discharged, and the set of inner nodes V 0 . The set of arcs consists of pipes A pi and compressors A cm . We associate positive gas flow on arcs a = (u, v) with mass flow in arc direction, i.e., q a > 0 if gas flows from u to v and q a < 0 if gas flows from v to u. Moreover, mass flow variables q a are bounded from below and above, i.e., q a ∈ [q a ,q a ]. The sets δ in (u) := {a ∈ A : a = (v, u)} and δ out (u) := {a ∈ A : a = (u, v)} are the sets of in-and outgoing arcs for node u ∈ V . Thus, we model mass conservation by where q u denotes the supplied or discharged flows. In addition, we need bounded pressure variables p u ∈ [p u ,p u ] for each node u ∈ V . Pipes a ∈ A pi are used to transport the gas through the network. We consider their length L a , their diameter D a , their cross-sectional area A a , their slope s a , and their friction factor λ a , which we model using the formula of Nikuradse; see, e.g., [36]. Gas flow in networks is described by a system of partial differential equationsthe Euler equations for compressible fluids [42]. In what follows, we consider the stationary case and assume small velocities, constant temperatureT , and constant compressibility factorẑ. This leads to the so-called Weymouth equation, see [24,36], that describes the relation between pressure and mass flow on a pipe via where c denotes the speed of sound in natural gas. Finally, we describe our model of compressors a = ( p u , p v ) ∈ A cm . They are used to increase the inflow gas pressure to a higher outflow pressure, which we model via for given lower and upper compression bounds − a and + a , respectively. For more complicated compressor models, see, e.g., [35,37]. We can, in principle, also include these more complex and mixed-integer nonlinear models but refrain from these complicated models to be able to focus on the main algorithmic aspects of the proposed approach.
We now collect all component models and obtain the mixed-integer optimization problem min a∈A cm P a s.t. pressure and flow bounds, mass conservation: (11), pipe model: (12), compressor model: (13).
Note that the only nonlinearity of the model is given by the pipe model (12). Consequently, we compute the global Lipschitz constant analytically and obtain (4) by evaluating (12).

Results
Our test instances are the networks GasLib-4 and GasLib-4-Tree (see Fig. 2 left and right) as well as the networks GasLib-11 and GasLib-24; see [43]. The two latter instances are publicly available at http://gaslib.zib.de and the two first ones are available at https://github.com/m-schmidt-math-opt/cost-of-not-knowing-enough.
To simplify the presentation and to focus on the main aspects of the proposed method, we modified the networks GasLib-11 and GasLib-24. For the GasLib-11, we removed the valve since this element is not covered in the model discussed in this paper. This makes the resulting network a tree. Moreover, we increased the lower pressure bound at the exits exit02 and exit03 to provide a higher compression. For the GasLib-24 network, we deleted the resistor, the short pipe, and the control valve. The resulting network still contains cycles. Moreover, we again increased the lower pressure bounds of the exits exit02, exit03, and exit04.
The Lipschitz constants are computed by applying Remark 3.1 to (12). For all cyclic networks, we fixed the mass flows where possible (usually on the arcs that connect entries or exits with cyclic parts of the network) two reduce the dimension of the nonlinear constraint (12) by one. The modified networks together with the preprocessed data are publicly available as input data in the GitHub repository at https://github.com/m-schmidt-math-opt/cost-of-not-knowing-enough, where also all implementations can be found.
For our algorithm we use the following relative termination criterion. Let x k be the kth iterate that contains the values p k u , p k v , and q k a for the inflow pressure, the outflow pressure, and for the mass flow on pipe a. Denote furtherp k u ,p k v , andq k a the values that one gets by solving Equation (12) for the respective value and by plugging in the two other values from the current iteration. For instance, holds. We terminate our algorithm if the maximum relative error |q k a − q k a | q k a is below 1% for all pipes of the network.
Our algorithm is implemented in Python 3.8.8 using the corresponding Python interface of Gurobi. We solve all MILPs with Gurobi 9.1.1 using 2 physical cores, 4 logical processors, and up to 4 threads; see [44]. We also implemented the nonconvex model using Pyomo 5.7.3 [45,46] and solved it to global optimality with the opensource global solver SCIP 7.0.2 [47] to obtain a benchmark for our method. The time limit is set to 15 min for all tests. All computations were performed on an Intel © Core TM i5-3360M CPU with 4 cores and 2.8 GHz each and 4 GB RAM.
The first part of Table 1 displays the results for the GasLib-4-Tree network. This network is a tree so that we never sample the flow space (see the "-" entries in the second column). This instance is always solved in less the 1s, independent of how coarse or fine the sampling is done before the main algorithm starts. SCIP (taken as a benchmark) solves the problem in 0.04 s and terminates with a global optimal objective function value of 6.74. Note that the objective function values in the first part of Table 1 are always slightly smaller, which is consistent to that we compute approximate global optimal points and that this approximation always corresponds to a relaxation. Finally, we see that the maximum relative error is always below 1 % as desired. We see that the number of required iterations is monotonically decreasing in dependence of the granularity of the sampling process that we apply before the main algorithm starts. This is to be expected since the more sampling points we use, the better our initial outer approximation of the zero set of the nonlinearities usually will be.
However, this is not always the case as can be seen in the middle part of the table that displays the results for the GasLib-4 network, which contains a cycle. SCIP solves this model in 0.64 s and delivers the optimal objective function value of 9.57. The algorithm inside SCIP is using the explicitly given algebraic representation of the constraints to set up a spatial branching process that is based on concave overestimators and convex underestimators of the nonlinearities. This significantly outperforms our method, which is not using the explicit representation of the constraint. Nevertheless, we can still solve the instance in approximately 150 s. Interestingly, less sampling in the preprocessing phase seems to be favorable since, otherwise, already the initial MILP is getting too large so that every iteration (in which we have to solve MILPs Table 1 The results of Algorithm 1 for the GasLib-4-Tree network (top), the GasLib-4 network (middle), and the GasLib-11 network (bottom). The first two columns state the equidistant step sizes for sampling the feasible pressure intervals (σ p ) and for the feasible mass flow intervals (σ q ), k is the number of iterations required by Algorithm 1, "t (sampling)" and "t (total)" is the time spent in the sampling routine and the overall solution time (in seconds). "Obj." is the approximate global optimal objective function value and "Rel. Err." is the achieved maximum relative error (in percent) of increasing size) is getting too costly. The extreme case is the very fine sampling shown in the last two rows of the middle part of the table, which leads to the case that our approach hits the time limit. Finally, the bottom part of the table shows the results for the GasLib-11 network. This network is larger than the two other ones but is again a tree. This leads to the fact that all mass flows can be fixed in the preprocessing and that the dimensions of the nonlinear constraints on the arcs are reduced by one. Again, less sampling leads to better results. We can solve the instance in 16 s if we do not apply any sampling. However, SCIP is much faster and solves the instance in 0.12 s, yielding an optimal objective function value of 12.93. For the GasLib-24 network, our method always reached the time limit, whereas SCIP solves the instance in 1.13 s with an optimal objective function value of 27.76. To sum up, we see that the "cost of not knowing" is rather high since the benchmark (SCIP) is solving the problems much faster. However, one has to remark that we compare a hand-crafted implementation of our method with a highly evolved and large-scale software package. SCIP is (of course, on purpose) exploiting all the given insights based on the explicit algebraic representation of the constraints to construct tight convex envelopes. We, in contrast to that, do not exploit (again, on purpose) all this information. The strength of our method lies in the fact that the required assumptions are very weak. However, for such instances, a comparison with a global solver such as SCIP does not make sense since such a global solver would simply not be able to solve the problem.
Let us finally take a closer look into the behavior of the proposed method by analyzing the iterates in the ( p u , p v ) space of an exemplary pipe during the solution process of the GasLib-11 network; see Fig. 3.
First, we see that the nonlinearity (the red curve) is rather moderate in this example, which is something that SCIP explicitly exploits since the constructed convex envelopes are rather tight in this case. Our method, on the other hand, is blind to such structural characteristics of the model. The blue box is the initial feasible box that our algorithm starts with. Black points correspond to iterates and the respective yellow boxes represent the excluded boxes. Since the Lipschitz constant stays the same over the course of the iteration, the size of the boxes is monotonic in the infeasibility of the corresponding iterate. One can nicely see the automatic adaptivity of the method: The nonlinearity is approximated much more accurately close to the optimal point. Moreover, the objective function of the problem solved in each iteration yields iterates mostly above the feasible curve.
Both aspects make clear that a more sophisticated sampling method in the preprocessing phase could lead to much better result than those presented for the equidistant sampling used in our numerical results. Finally, our method is, in general, very well suited for warmstarting since the MILPs do not differ very much from iteration to iteration. However, our method right now does not use this possibility since a direct use of the warmstarting capabilities of Gurobi cannot be used. The reason is that the variable space changes from iteration to iteration and that the old point is not feasible anymore since it is explicitly excluded from the feasible set by the last added box. Exploiting the potential of warmstarting would, in our opinion, improve the performance of the algorithm since it will speed up the running time in every iteration of the method.

Conclusion
Without any doubt, mixed-integer nonlinear optimization is hard. Fortunately, in many practically relevant situations effective solution approaches exist. In this paper, we considered especially hard cases of mixed-integer nonlinear optimization with implicit constraints that have the adverse property that evaluating the constraint does not give a feasible point and that they may be multi-dimensional. The numerical results are somehow sobering since we show that the "cost of not knowing enough" about the problem's structure is rather high. On the other hand, the class of considered problems is of high importance in many fields of applications.
Thus, there is need for computational improvement for this class of problems if they are tackled using MILP technology as a working horse. At this point, we suggest three possible directions of future research. by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.