1 Introduction

Mixed-integer nonlinear optimization problems (MINLPs) form one of today’s most important classes of optimization models. The reason is at least twofold. First, the capability of modeling nonlinearities makes it possible to include many sophisticated aspects of, e.g., physics, economics, engineering, or medicine. Second, the incorporation of integer variables makes it possible to model discrete decisions such as turning a machine on or off or deciding whether or not to invest in a product. Of course, this modeling flexibility comes at the price of models that are hard to solve for realistically sized instances since MINLPs are NP-hard in general [17, 29]. Nevertheless, the theoretical and algorithmic advances of recent years and decades make it possible today to solve rather large-scale instances to global optimality in a reasonable amount of time [6], in particular if problem-specific structural properties of the model can be exploited algorithmically; see, e.g., [7, 15, 16, 33, 34] for the convex as well as [1, 36, 49, 51] for the nonconvex case.

In addition, there has been a significant amount of research devoted to the cases in which only very few structural assumptions can be exploited. This is the framework considered in this paper, since we assume that certain functions of the model are not given explicitly but can only be evaluated, and that some analytical properties such as Lipschitz continuity are known. To illustrate why this is important, let us sketch three areas of applications in which only few assumptions on the structure of the model (or on specific parts of the model) can be made. First, many mixed-integer optimization problems subject to ordinary or partial differential equations fit into this context. In many cases, approaches in this field are driven by incorporating the so-called control-to-state map into the optimization model to “eliminate” the differential equation from the model; see, e.g., [5, 8, 23, 52]. This mapping, however, cannot be stated explicitly in general, and one thus has to resort to exploiting analytical properties such as Lipschitz continuity and the ability to evaluate the mapping, at least in an approximate way. Second, and closely related to the first example, optimization models incorporating constraints that rely on calls to expensive simulation software can be cast in the framework considered in this paper as well if enough analytic information is known about the input-output mapping of the simulation code; see, e.g., [2, 10]. Third and finally, bilevel optimization problems can also be interpreted as models in which a single constraint makes the problem much harder to solve [12, 13, 14]. In this case, it is the constraint that models the optimality of the decisions of the lower-level (or follower) problem given the decisions of the leader (which is the decision maker modeled in the upper-level problem). The best-reply function, which models the optimal response of the follower, usually cannot be written in closed form, but it can be evaluated (by solving the lower-level problem for a given feasible point of the upper level) and, under suitable assumptions [13], analytical properties such as Lipschitz continuity can be established.

This paper addresses a special class of MINLPs for which closed-form expressions for the nonlinearities are not available but Lipschitz continuity is guaranteed with known Lipschitz constants. Hence, all three areas of application discussed above can be addressed by the method proposed in this paper. Indeed, one part of our contribution is that the case studies presented later in Sects. 4 and 5 explicitly show the applicability of our method to bilevel problems with nonconvex quadratic lower-level problems (Sect. 4) and to problems on gas transport networks that are subject to differential equations (Sect. 5). Both are classes of problems that have received a lot of attention in recent years; see, e.g., [4, 12, 31] and [32, 41]. Before we are able to tackle these problems, we first need to formally state the problem class under consideration, which is what we do in Sect. 2. Afterward, in Sect. 3, we describe the main rationale of the method, present it formally, and analyze it theoretically. The latter leads to a correctness theorem showing that the method terminates finitely with an approximate solution, and we further derive a worst-case iteration bound.

The main contributions of our work are the following. We develop an algorithm that requires only very weak assumptions and can hence be applied to a very large class of problems that cannot be solved with classic MINLP solvers, which require that all constraints of the problem are given in closed form. To illustrate the generality of the method, we present its application to nonlinear gas network optimization as well as to bilevel problems with nonconvex quadratic lower-level problems. Our work should be seen as a generalization of the works [47, 48]. In particular, we generalize [47] to the multidimensional case, for which we present a numerical scheme that is more effective than that of [48] and that uses different geometries for the outer approximation than those used in [47]. Since our main workhorse is the Lipschitz continuity of the nonlinearities, we are still in line with the works [25, 26, 28, 42–44, 53], to name only a few. For a more detailed overview of this field, see the textbook [27] and the references therein as well as [47, 48], in which we discuss the positioning of the method within the literature in more detail.

2 Problem Definition

We consider the problem

$$\begin{aligned} \min _{x} \quad&c^\top x \end{aligned}$$
(1a)
$$\begin{aligned} \text {s.t.} \quad&Ax \ge b, \quad \underline{x} \le x \le \bar{{x}}, \quad x \in \mathbb {R}^{n} \times \mathbb {Z}^{m}, \end{aligned}$$
(1b)
$$\begin{aligned}&f_i(x_{I_i}) = x_{r_i}, \quad i \in [p], \end{aligned}$$
(1c)

where \(c \in \mathbb {R}^{n+m}\), \(A \in \mathbb {R}^{q \times (n+m)}\), and \(b \in \mathbb {R}^q\) are given data, \(\underline{x} \in \mathbb {R}^n \times \mathbb {Z}^m\) and \(\bar{{x}} \in \mathbb {R}^n \times \mathbb {Z}^m\) are finite bounds, and \([p] \mathrel {{\mathop :}{=}}\{1, \dotsc , p\}\). Hence, we consider a linear objective (1a), linear mixed-integer constraints (1b), and nonlinear constraints defined by the functions \(f_i: \mathbb {R}^{l_i} \rightarrow \mathbb {R}\). All \(f_i\), \(i \in [p]\), are Lipschitz continuous functions and \(l_i = |{I_i}|\) is the number of their arguments. Moreover, \(I_i\subset [n]\) is the index set of the variables on which the function \(f_i\) depends. Without loss of generality, we further assume that \(r_i\in [n]\) with \(r_i\notin I_i\) for all \(i \in [p]\). In what follows, we also write \(x_{\mathcal {I}_i} = (x_{I_i}, x_{r_i}) \in \mathbb {R}^{l_i + 1}\).

The main challenge when solving Problem (1) is that the nonlinear functions \(f_i\) are not given in closed form; we can only evaluate them and we know their Lipschitz constants.
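In an implementation, each nonlinearity is thus accessed only through an oracle that evaluates it and reports its Lipschitz constant. A minimal sketch of such an interface in Python (the language of our implementation in Sect. 4.3; all names are illustrative and not part of any library):

from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class LipschitzOracle:
    """Black-box access to one nonlinearity f_i of Problem (1)."""
    evaluate: Callable[[np.ndarray], float]  # returns f_i(x_{I_i})
    L: float                                 # known Lipschitz constant L_i
    I: List[int]                             # argument indices I_i
    r: int                                   # result index r_i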

The \(\varepsilon \)-relaxed version of the original problem (1) is given by

$$\begin{aligned} \min _{x} \quad&c^\top x \end{aligned}$$
(2a)
$$\begin{aligned} \text {s.t.} \quad&Ax \ge b, \quad \underline{x} \le x \le \bar{{x}}, \quad x \in \mathbb {R}^{n} \times \mathbb {Z}^{m}, \end{aligned}$$
(2b)
$$\begin{aligned}&|{f_i(x_{I_i}) - x_{r_i}}| \le \varepsilon , \quad i \in [p], \end{aligned}$$
(2c)

where \(\varepsilon > 0\) is a prescribed tolerance. A feasible point of (2) is called an \(\varepsilon \)-feasible point of Problem (1). Moreover, we call a global solution of (2) an approximate global optimal solution in what follows.

3 The Algorithm

In this section, we introduce an iterative procedure to solve Problem (1) to approximate global optimality. The main idea is to relax all nonlinearities by utilizing the Lipschitz continuity of these functions. In each iteration, the relaxed problem, which we will call the master problem, needs to be solved to global optimality. Subsequently, a subproblem is solved to tighten the relaxation for the next master problem. This procedure is then repeated until an \(\varepsilon \)-feasible solution is found or until it can be shown that the original problem is infeasible.

The master problem in iteration k reads

$$\begin{aligned} \min _{x} \quad&c^\top x \\ \text {s.t.} \quad&Ax \ge b, \quad \underline{x} \le x \le \bar{{x}}, \quad x \in \mathbb {R}^{n} \times \mathbb {Z}^{m}, \\&x_{\mathcal {I}_i} \in \Omega _i^k, \quad i \in [p], \end{aligned}$$
(M(k))

where \(\Omega _i^k\) is a relaxation of the graph of the nonlinearity \(f_i\). This relaxation will be stated in terms of mixed-integer linear constraints; see below. The idea behind this relaxation is to partition the domain of \(f_i\) into a set of boxes that are indexed by \(J_i^k = \{1,\dots , |{J_i^k}|\}\) and to linearly relax the graph over each box with the region \(\Omega _i^k(j)\), \(j \in J_i^k\), using the Lipschitz continuity of \(f_i\). Hence, we obtain

$$\begin{aligned} \Omega _i^k = \bigcup _{j \in J_i^k} \Omega _i^k(j). \end{aligned}$$
(3)

After solving the master problem, the boxes that contain the solution \(x^k\) are identified and split into smaller boxes to get a finer relaxation for the next iteration. The main purpose of solving the subproblem afterward is to find good splitting points for these boxes. To this end, the subproblem determines a point of the graph of each nonlinearity and, at the same time, tries to minimize the distance to the solution of the last master problem. Hence, the subproblem is given by

$$\begin{aligned} \min _{{\tilde{x}}} \quad \Vert {{\tilde{x}} - x^k}\Vert _2^2 \quad \text {s.t.} \quad f_i({\tilde{x}}_{I_i}) = {\tilde{x}}_{r_i}, \quad {\tilde{x}}_{\mathcal {I}_i} \in {\tilde{\Omega }}_i^k(j_i^k), \quad i \in [p], \end{aligned}$$
(4)

where \(j_i^k \in J_i^k\) denotes the box with \(x_{\mathcal {I}_i}^k \in \Omega _i^k(j_i^k)\) for all \(i \in [p]\) and \({\tilde{\Omega }}_i^k(j_i^k)\) is a suitably chosen subregion of \(\Omega _i^k(j_i^k)\). The reason for the use of these subregions will be discussed in detail in Sect. 3.2.

Figure 1 depicts the subproblem (4) with the regions \({\tilde{\Omega }}_i^k(j_i^k)\) and \(\Omega _i^k(j_i^k)\) (left) and the corresponding regions \(\Omega _i^{k+1}(j)\) and \(\Omega _i^{k+1}(j+1)\) of the master problem (M(k)) in the next iteration (right).

Before we continue with the discussion of the master problem, let us briefly explain our index notation. The index i of the respective nonlinearity is written as a subscript, while the iteration index k is written as a superscript. The box index j is written in parentheses because it can depend on both other indices i and k.

Fig. 1: Visualization of the subproblem (4) (left) and of the feasible region of the master problem (M(k)) in the next iteration (right) for a nonlinear function \(f_i:\mathbb {R}\rightarrow \mathbb {R}\)

3.1 Construction of the Master Problem’s Feasible Region

We now describe in detail how we construct the relaxations \(\Omega _i^k\). First, we define the box

$$\begin{aligned} B(\underline{v}, \bar{{v}}) \mathrel {{\mathop :}{=}}\{x \in \mathbb {R}^{d}:\underline{v} \le x \le \bar{{v}}\} \end{aligned}$$

for \(\underline{v}, \bar{{v}} \in \mathbb {R}^{d}\), \(\underline{v} \le \bar{{v}}\), and arbitrary dimension d.

For each \(i \in [p]\), we assume that we are given vectors \(\underline{v}_i^k(j), \bar{{v}}_i^k(j) \in \mathbb {R}^{l_i}\) for \(j \in J_i^k\) such that the boxes \(B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\) have pairwise disjoint interiors and cover the bounding box of \(x_{I_i}\), i.e., we have

$$\begin{aligned} B\left( \underline{x}_{I_i}, \bar{{x}}_{I_i}\right) = \bigcup _{j \in J_i^k} B\left( \underline{v}_i^k(j), \bar{{v}}_i^k(j)\right) . \end{aligned}$$
(4)

Let \(L_i\) be the Lipschitz constant of \(f_i\) on \(B(\underline{x}_{I_i}, \bar{{x}}_{I_i}) \subset \mathbb {R}^{l_i}\) w.r.t. a given norm \(\Vert {\cdot }\Vert \), where any (weighted) norm in \(\mathbb {R}^{l_i}\) can be used. Let \(m_i^k(j)\) be the center point of the box \(B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\), i.e.,

$$\begin{aligned} m_i^k(j) = \frac{1}{2} \left( \underline{v}_i^k(j) + \bar{{v}}_i^k(j) \right) \end{aligned}$$

holds. Due to the Lipschitz continuity of \(f_i\), we have

$$\begin{aligned} x_{r_i}&\le f_i(m_i^k(j)) + L_i \Vert {x_{I_i} - m_i^k(j)}\Vert , \\ x_{r_i}&\ge f_i(m_i^k(j)) - L_i \Vert {x_{I_i} - m_i^k(j)}\Vert \end{aligned}$$

for \(x_{I_i} \in B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\). Since \(\Vert {x_{I_i} - m_i^k(j)}\Vert \) attains its maximum over \(B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\) at the vertices of the box, we can replace \(x_{I_i}\) with \(\bar{{v}}_i^k(j)\). It thus holds

$$\begin{aligned} x_{r_i}&\le f_i(m_i^k(j)) + L_i \Vert {\bar{{v}}_i^k(j) - m_i^k(j)}\Vert = f_i(m_i^k(j)) + \frac{L_i}{2} \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert , \end{aligned}$$
(5a)
$$\begin{aligned} x_{r_i}&\ge f_i(m_i^k(j)) - L_i \Vert {\bar{{v}}_i^k(j) - m_i^k(j)}\Vert = f_i(m_i^k(j)) - \frac{L_i}{2} \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert \end{aligned}$$
(5b)

for \(x_{I_i} \in B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\). With this, we can define the region \(\Omega _i^k(j)\) for \(j \in J_i^k\) as the box

$$\begin{aligned} \begin{aligned} \Omega _i^k(j) \mathrel {{\mathop :}{=}}\Big \{(x_{I_i}, x_{r_i}) \in \mathbb {R}^{l_i + 1} :&\underline{v}_i^k(j) \le x_{I_i} \le \bar{{v}}_i^k(j),\\&x_{r_i} \le f_i(m_i^k(j)) + \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert ,\\&x_{r_i} \ge f_i(m_i^k(j)) - \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert \Big \}. \end{aligned} \end{aligned}$$

The next step is to define the region \(\Omega _i^k\) as the union of all \(\Omega _i^k(j)\); see (3). The covering property (4) yields a covering of the bounding box of \(x_{I_i}\), and the bounds (5) yield bounds for \(x_{r_i}\). Hence, it follows that \(\Omega _i^k\) is a relaxation of the graph of the function \(f_i\).
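To make this construction concrete, the following sketch computes the two bounds in (5) from a single evaluation of \(f_i\) at the box center (Python, which is also the language of our implementation in Sect. 4.3; the Euclidean norm is used here, but any fixed norm works, and all names are illustrative):

import numpy as np

def omega_bounds(f_i, L_i, v_lo, v_hi):
    """Bounds (5a)-(5b) on x_{r_i} over the box B(v_lo, v_hi)."""
    m = 0.5 * (v_lo + v_hi)                           # box center m_i^k(j)
    radius = 0.5 * L_i * np.linalg.norm(v_hi - v_lo)  # (L_i / 2) ||v_hi - v_lo||
    f_m = f_i(m)                                      # one oracle evaluation
    return f_m - radius, f_m + radius                 # extent in x_{r_i}-direction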

Proposition 3.1

For all \(i \in [p]\) and all k, the graph of \(f_i\) over \(B(\underline{x}_{I_i}, \bar{{x}}_{I_i})\) satisfies \({{\,\textrm{graph}\,}}(f_i) \subseteq \Omega _i^k\).

For what follows, we abbreviate

$$\begin{aligned} \mathcal {X}_i^k \mathrel {{\mathop :}{=}}\left\{ B(\underline{v}_i^k(1), \bar{{v}}_i^k(1)), \dots , B(\underline{v}_i^k(|{J_i^k}|), \bar{{v}}_i^k(|{J_i^k}|))\right\} , \end{aligned}$$

i.e., \(\mathcal {X}_i^k\) is the set of boxes that is used to define \(\Omega _i^k\). In our algorithm, we use \(\Omega _i^k\) to replace the nonlinear constraints \(f_i(x_{I_i}) = x_{r_i}\) for all \(i \in [p]\) in Problem (1) in order to obtain a relaxation.

Lemma 3.2

The master problem (M(k)) can be modeled as a mixed-integer linear problem.

Proof

We can write (M(k)) using the following Big-M formulation:

$$\begin{aligned} \min _{x, z} \quad&c^\top x \end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.} \quad&Ax \ge b, \quad \underline{x} \le x \le \bar{{x}}, \quad x \in \mathbb {R}^{n} \times \mathbb {Z}^{m}, \end{aligned}$$
(6b)
$$\begin{aligned}&x_{I_i} \ge \underline{v}_i^k(j) - M (1 - z_i^k(j)),&i \in [p], \ j \in J_i^k, \end{aligned}$$
(6c)
$$\begin{aligned}&x_{I_i} \le \bar{{v}}_i^k(j) + M (1 - z_i^k(j)),&i \in [p], \ j \in J_i^k, \end{aligned}$$
(6d)
$$\begin{aligned}&x_{r_i} \le f_i(m_i^k(j)) + \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert + M (1 - z_i^k(j)),&i \in [p], \ j \in J_i^k, \end{aligned}$$
(6e)
$$\begin{aligned}&x_{r_i} \ge f_i(m_i^k(j)) - \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert - M (1 - z_i^k(j)),&i \in [p], \ j \in J_i^k, \end{aligned}$$
(6f)
$$\begin{aligned}&\sum _{j \in J_i^k} z_i^k(j) = 1,&i \in [p], \end{aligned}$$
(6g)
$$\begin{aligned}&z_i^k(j) \in \{0, 1\},&i \in [p], \ j \in J_i^k. \end{aligned}$$
(6h)

The rationale of this model is as follows. For each nonlinearity \(i \in [p]\) and each box \(j \in J_i^k\), we introduce a binary variable \(z_i^k(j)\) that indicates whether the solution lies in this box or not. If \(z_i^k(j) = 1\), the constraints (6c)–(6f) are equivalent to the definition of \(\Omega _i^k(j)\). If \(z_i^k(j) = 0\), then (6c)–(6f) are always fulfilled if the constant M is chosen sufficiently large. Constraint (6g) finally ensures that for each nonlinearity \(i \in [p]\) exactly one box \(j \in J_i^k\) is chosen. \(\square \)

Let us remark that it is always possible in our setting to obtain finite and sufficiently large values of M by using the finite bounds on the variables in (1b). Also note that the present algorithm is a generalization of the algorithm in [47], with the main change being the use of boxes instead of polytopes. This change is necessary because the number of constraints needed to model the polytopes would increase exponentially with the number of arguments \(l_i\) of the nonlinearity \(f_i\).
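To illustrate the proof’s construction, the following sketch adds the constraints (6c)–(6h) for a single univariate nonlinearity (\(l_i = 1\)) to a Pyomo model; Pyomo is also used in our implementation (Sect. 4.3), but the function and all names below are ours and purely illustrative:

import pyomo.environ as pyo

def add_master_boxes(model, x_arg, x_res, boxes, f_mid, rad, M):
    """Big-M model (6) of (x_arg, x_res) in Omega_i^k for one univariate
    nonlinearity (l_i = 1); call once per nonlinearity on fresh names.

    boxes : list of intervals (lo, hi) covering the bounding box of x_arg
    f_mid : f_i evaluated at each interval midpoint m_i^k(j)
    rad   : (L_i / 2) * (hi - lo) for each interval, cf. (5)
    M     : sufficiently large constant derived from the bounds in (1b)
    """
    J = range(len(boxes))
    model.z = pyo.Var(J, domain=pyo.Binary)                               # (6h)
    model.one_box = pyo.Constraint(expr=sum(model.z[j] for j in J) == 1)  # (6g)
    model.arg_lo = pyo.Constraint(J, rule=lambda m, j:
        x_arg >= boxes[j][0] - M * (1 - m.z[j]))                          # (6c)
    model.arg_hi = pyo.Constraint(J, rule=lambda m, j:
        x_arg <= boxes[j][1] + M * (1 - m.z[j]))                          # (6d)
    model.res_hi = pyo.Constraint(J, rule=lambda m, j:
        x_res <= f_mid[j] + rad[j] + M * (1 - m.z[j]))                    # (6e)
    model.res_lo = pyo.Constraint(J, rule=lambda m, j:
        x_res >= f_mid[j] - rad[j] - M * (1 - m.z[j]))                    # (6f)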

3.2 Construction of the Subproblem

We now introduce the chosen subregion of \(\Omega _i^k(j)\) used in the subproblem (4). We define this region as

$$\begin{aligned} {\tilde{\Omega }}_i^k(j) = \Omega _i^k(j) \cap {\hat{\Omega }}_i^k(j), \end{aligned}$$

using the further subregion

$$\begin{aligned} {\hat{\Omega }}_i^k(j) = \{(x_{I_i}, x_{r_i}) \in \mathbb {R}^{l_i + 1}:(1 - \lambda ) \underline{v}_i^k(j) + \lambda \bar{{v}}_i^k(j) \le x_{I_i} \le \lambda \underline{v}_i^k(j) + (1 - \lambda ) \bar{{v}}_i^k(j)\}, \end{aligned}$$

for some \(\lambda \in (0, 1/2]\). This ensures that the solution of the subproblem (4) cannot come arbitrarily close to the boundary of the chosen box.

Let us also note that, for each iteration \(k \in \mathbb {N}\), the subproblem (4) can be separated into p smaller problems under reasonable assumptions. If the index sets \(\mathcal {I}_i= (I_i, r_i)\) are non-overlapping, i.e.,

$$\begin{aligned} \left( I_i\cup \{r_i\}\right) \cap \left( I_j \cup \{r_j\}\right) = \emptyset \quad \text {for all} \quad i,j \in [p],\ i \ne j, \end{aligned}$$
(7)

these smaller problems can be solved in parallel. The above assumption can always be satisfied by introducing additional auxiliary variables.

Lemma 3.3

Suppose that the index sets \(\mathcal {I}_i= (I_i, r_i)\) are non-overlapping, i.e., (7) holds. Then, the subproblem (4) is completely separable, i.e., we can solve the subproblem in iteration k by solving the p smaller problems given by

$$\begin{aligned} \min _{{\tilde{x}}_{\mathcal {I}_i}} \quad \Vert {{\tilde{x}}_{\mathcal {I}_i} - x_{\mathcal {I}_i}^k}\Vert _2^2 \quad \mathrm{s.t.} \quad f_i({\tilde{x}}_{I_i}) = {\tilde{x}}_{r_i}, \quad {\tilde{x}}_{\mathcal {I}_i} \in {\tilde{\Omega }}_i^k(j_i^k). \end{aligned}$$
(8)

Proof

The constraints of (4), i.e.,

$$\begin{aligned} f_i({\tilde{x}}_{I_i}) = {\tilde{x}}_{r_i}, \quad {\tilde{x}}_{\mathcal {I}_i} \in {\tilde{\Omega }}_i^k(j_i^k), \quad i \in [p], \end{aligned}$$

completely decouple along \(i \in [p]\) and so does the objective function

$$\begin{aligned} \Vert {{\tilde{x}} - x^k}\Vert _2^2 = \sum _{i \in [p]} \Vert {{\tilde{x}}_{\mathcal {I}_i} - x_{\mathcal {I}_i}^k}\Vert _2^2. \end{aligned}$$

Therefore, the solution of the subproblem (4) can be obtained by solving Problem (8) for all \(i \in [p]\). \(\square \)

Note that, in the formal sense of complexity theory, the subproblem can be as hard to solve as the originally given MINLP. However, the split into multiple and thus much smaller subproblems can make a huge difference in practice. Moreover, we completely separate the mixed-integer aspects from the nonlinear aspects of the problem, which can also be very helpful for solving the subproblems although they remain hard in the formal sense.
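Since feasible (not necessarily globally optimal) points of the subproblem suffice for the correctness of the method (see Sect. 3.3), each separated problem (8) can, for instance, be attacked with a local NLP solver. The following sketch eliminates \({\tilde{x}}_{r_i} = f_i({\tilde{x}}_{I_i})\), which is valid because \({\tilde{\Omega }}_i^k(j)\) restricts \(\Omega _i^k(j)\) only in the \(x_{I_i}\)-coordinates; SciPy’s bound-constrained local solver is one possible choice, and all names are illustrative:

import numpy as np
from scipy.optimize import minimize

def solve_subproblem(f_i, xk_I, xk_r, lo, hi, lam=0.25):
    """Local solve of the separated subproblem (8) for one nonlinearity.

    Minimizes the squared distance to the master solution (xk_I, xk_r)
    over the lambda-shrunken box of Sect. 3.2 after eliminating
    x_r = f_i(x_I); a local solution suffices for correctness.
    """
    lo_s = (1 - lam) * lo + lam * hi          # shrunken lower corner
    hi_s = lam * lo + (1 - lam) * hi          # shrunken upper corner
    obj = lambda xI: np.sum((xI - xk_I) ** 2) + (f_i(xI) - xk_r) ** 2
    res = minimize(obj, x0=np.clip(xk_I, lo_s, hi_s),
                   bounds=list(zip(lo_s, hi_s)))
    return res.x, float(f_i(res.x))           # splitting point on graph(f_i)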

3.3 Formal Statement of the Algorithm

Before we can formally introduce the algorithm, we need the following notation. Let \(B(\underline{v}, \bar{{v}}) \subseteq \mathbb {R}^{d}\) be a box with an interior point \(x \in {{\,\textrm{int}\,}}B(\underline{v}, \bar{{v}})\), i.e., \(\underline{v}< x < \bar{{v}}\). The point x splits the box into a set of boxes that we define as

$$\begin{aligned} S(\underline{v}, \bar{{v}}, x) \mathrel {{\mathop :}{=}}\{B(\underline{w}, \bar{{w}}):(\underline{w}_\ell = \underline{v}_\ell \wedge \bar{{w}}_\ell = x_\ell ) \vee (\underline{w}_\ell = x_\ell \wedge \bar{{w}}_\ell = \bar{{v}}_\ell ) \text { for all } \ell \in [d]\}. \end{aligned}$$

We can utilize this notation to obtain a finer relaxation of \({{\,\textrm{graph}\,}}(f_i)\) by splitting an element of \(\mathcal {X}_i^k\) using the solution of the subproblem (4) as the splitting point. This yields a set of smaller boxes that still fulfills the covering condition (4), as the following proposition states.

Proposition 3.4

For a given box \(B(\underline{v}, \bar{{v}}) \subset \mathbb {R}^{l_i}\) and an interior point \(x \in {{\,\textrm{int}\,}}B(\underline{v}, \bar{{v}})\), the set \(S(\underline{v}, \bar{{v}}, x)\) contains \(2^{l_i}\) smaller boxes that have pairwise disjoint interiors and that completely cover the box \(B(\underline{v}, \bar{{v}})\), i.e.,

$$\begin{aligned} B(\underline{v}, \bar{{v}}) = \bigcup _{b \in S(\underline{v}, \bar{{v}}, x)} b. \end{aligned}$$
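A minimal sketch of this splitting operation in Python (illustrative names; the box corners are NumPy arrays):

import itertools
import numpy as np

def split_box(v_lo, v_hi, x):
    """The set S(v_lo, v_hi, x): all 2^d subboxes of B(v_lo, v_hi)
    induced by the interior point x."""
    d = len(x)
    boxes = []
    for choice in itertools.product((False, True), repeat=d):
        mask = np.array(choice)
        lo = np.where(mask, x, v_lo)   # coordinate l: [x_l, v_hi_l] if chosen,
        hi = np.where(mask, v_hi, x)   # ... and [v_lo_l, x_l] otherwise
        boxes.append((lo, hi))
    return boxes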
[Algorithm 1 (pseudocode figure): solve the master problem (M(k)), check \(\varepsilon \)-feasibility, solve the subproblem (4), and split the boxes of violated nonlinearities.]

We can now present the complete method, which is given in Algorithm 1. Before we prove its correctness, let us first discuss its basic functionality. After the master problem (M(k)) is solved in Step 3, it is checked in Step 8 whether its solution is already \(\varepsilon \)-feasible for the original problem. To determine the boxes \(j_i^k \in J_i^k\) in Step 11, one can simply check the indicator variables \(z_i^k(j)\) of the MIP formulation (6). If the solution is not yet \(\varepsilon \)-feasible, then there are nonlinearities \(f_i\) whose feasibility violations are larger than \(\varepsilon \). For these nonlinearities, we refine the relaxation of the master problem in Step 15 and re-iterate.
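The following sketch condenses this functionality for the case of a single nonlinearity (p = 1); it reuses split_box and solve_subproblem from the sketches above, and solve_master is assumed to solve the MIP formulation (6) for the current box collection and to return a solution together with the index of its active box, or None if the master problem is infeasible:

def successive_relaxation(f, boxes0, eps, solve_master):
    """Illustrative condensation of Algorithm 1 for p = 1 nonlinearity."""
    boxes = list(boxes0)
    while True:
        sol = solve_master(boxes)               # solve (M(k))
        if sol is None:
            return None                         # Problem (1) is infeasible
        xI, xr, j = sol
        if abs(f(xI) - xr) <= eps:              # eps-feasibility check
            return xI, xr                       # approximate global solution
        xI_new, _ = solve_subproblem(f, xI, xr, *boxes[j])   # splitting point
        boxes = boxes[:j] + boxes[j + 1:] + split_box(*boxes[j], xI_new)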

Note that it is not necessary for the correctness of Algorithm 1 to solve the subproblem (4) to global optimality. Our rationale, however, is that optimal solutions of the subproblems yield better splitting points that lead to faster convergence in practice. For the correctness of the algorithm, it is sufficient to find feasible points of (4) that are guaranteed to exist due to the following lemma.

Lemma 3.5

All subproblems (4) are feasible if Property (7) is satisfied.

Proof

Because of Property (7), the nonlinear constraints \(f_i({\tilde{x}}_{I_i}) = {\tilde{x}}_{r_i}\) in (4) do not depend on each other. From Proposition 3.1, we know that the graph of \(f_i\) over the initial region \(B(\underline{x}_{I_i}, \bar{{x}}_{I_i})\) lies entirely in \(\Omega _i^k\). Since the nonempty subregion \({\tilde{\Omega }}_i^k(j)\) restricts \(\Omega _i^k\) in all but the last dimension, the claim follows. \(\square \)

Next, we prove that Algorithm 1 always terminates after finitely many iterations.

Theorem 3.6

There exists a constant \(K < \infty \) such that Algorithm 1 either terminates with an approximate global optimal solution \(x^{k^*}\) or with the indication of infeasibility in an iteration \(k^* \le K\).

Proof

The box \(\Omega _i^k(j)\) is bounded for each iteration k and all \(i \in [p]\) and \(j \in J_i^k\). For the \(x_{r_i}\)-coordinate, the bounding inequalities are given by

$$\begin{aligned} f_i(m_i^k(j)) - \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert \le x_{r_i} \le f_i(m_i^k(j)) + \frac{1}{2}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert . \end{aligned}$$

Therefore, the corresponding side length of the box \(\Omega _i^k(j)\) in its \(x_{r_i}\)-coordinate is

$$\begin{aligned} d_{r_i}^k(j) \mathrel {{\mathop :}{=}}L_i \Vert {\bar{{v}}_i^k(j) - \underline{v}_i^k(j)}\Vert . \end{aligned}$$

If \(d_{r_i}^k(j) \le \varepsilon \) holds, then the inequality

$$\begin{aligned} |{f_i(x_{I_i}^k) - x_{r_i}^k}| > \varepsilon \end{aligned}$$

in Step 14 of Algorithm 1 cannot be fulfilled. It follows that the box \(B(\underline{v}_i^k(j_i^k), \bar{{v}}_i^k(j_i^k))\) will not be split but remains the same in Step 17.

Next, we analyze how \(d_{r_i}^k(j)\) changes if a box is split into smaller boxes. Let

$$\begin{aligned} B\left( \underline{v}_i^{k+1}(j), \bar{{v}}_i^{k+1}(j)\right) \in S\left( \underline{v}_i^k(j_i^k), \bar{{v}}_i^k(j_i^k), {\tilde{x}}_i^k\right) \end{aligned}$$

be one of the smaller boxes that is added to \(\mathcal {X}_i^k\) in Step 15. The side length \(d_{r_i}^{k+1}(j)\) of the corresponding box \(\Omega _i^{k+1}(j)\) can be bounded from above via

$$\begin{aligned} d_{r_i}^{k+1}(j) &= L_i \Vert {\bar{{v}}_i^{k+1}(j) - \underline{v}_i^{k+1}(j)}\Vert \\ &\le (1 - \lambda ) L_i \Vert {\bar{{v}}_i^k(j_i^k) - \underline{v}_i^k(j_i^k)}\Vert = (1 - \lambda ) d_{r_i}^k(j_i^k). \end{aligned}$$

This means that \(d_{r_i}^k(j)\) is decreased by at least the factor \((1 - \lambda )\) each time a box is split in Step 15. Since \(\left( (1 - \lambda )^k\right) _{k \in \mathbb {N}}\) is a geometric sequence with \(|{1 - \lambda }| < 1\), it converges to zero, i.e., \((1 - \lambda )^k \rightarrow 0\) as \(k \rightarrow \infty \). It follows that any box \(B(\underline{v}_i^k(j), \bar{{v}}_i^k(j))\) (including the first box \(B(\underline{x}_{I_i}, \bar{{x}}_{I_i})\)) can only be split finitely many times in Step 15 before the side length \(d_{r_i}^k(j)\) of \(\Omega _i^k(j)\) fulfills \(d_{r_i}^k(j) \le \varepsilon \).

Moreover, for all \(i \in [p]\) and all k, there are only finitely many boxes in \(\mathcal {X}_i^k\), and each of these boxes can only be split finitely many times. Hence, there exists an iteration \(K < \infty \) in which no box is split in Step 15. This, however, can only be the case if the if-condition in Step 14 does not hold for any \(i \in [p]\). Thus, we have

$$\begin{aligned} |{f_i(x_{I_i}^K) - x_{r_i}^K}| \le \varepsilon \quad \text {for all } i \in [p], \end{aligned}$$

which is the if-condition in Step 8. This means that the algorithm terminates in Step 9 in an iteration K. Hence, it follows that there exists a \(K < \infty \) such that Algorithm 1 terminates in Step 5 or 9 in an iteration \(k^* \le K\). \(\square \)

We close this section by stating and proving a result for the worst-case number of required iterations of Algorithm 1.

Theorem 3.7

Algorithm 1 terminates after at most

$$\begin{aligned} K = \sum _{i \in [p]} \sum _{k = 0}^{S_i} 2^{k l_i} \end{aligned}$$

iterations with

$$\begin{aligned} S_i = \left\lceil \log _{(1 - \lambda )} \left( \frac{\varepsilon }{L_i \Vert {\bar{{x}}_{I_i} - \underline{x}_{I_i}}\Vert }\right) \right\rceil \end{aligned}$$

for \(i \in [p]\).

Proof

From the proof of Theorem 3.6, we know that a box \(B(\underline{v}_i^k(j_i^k), \bar{{v}}_i^k(j_i^k))\) can only be split finitely many times before the side length \(d_{r_i}^k(j)\) of \(\Omega _i^k(j)\) satisfies \(d_{r_i}^k(j) \le \varepsilon \). We can give an upper bound for how many iterations this takes for the first box \(B(\underline{x}_{I_i}, \bar{{x}}_{I_i})\) by solving the equation

$$\begin{aligned} (1 - \lambda )^k L_i \Vert {\bar{{x}}_{I_i} - \underline{x}_{I_i}}\Vert = \varepsilon \end{aligned}$$

for k, which yields

$$\begin{aligned} k = \log _{(1 - \lambda )} \left( \frac{\varepsilon }{L_i \Vert {\bar{{x}}_{I_i} - \underline{x}_{I_i}}\Vert }\right) . \end{aligned}$$

Since the box \(B(\underline{v}_i^k(j_i^k), \bar{{v}}_i^k(j_i^k))\) will not be split anymore once \(d_{r_i}^k(j) \le \varepsilon \) holds, we can round this value up to obtain

$$\begin{aligned} S_i = \left\lceil \log _{(1 - \lambda )} \left( \frac{\varepsilon }{L_i \Vert {\bar{{x}}_{I_i} - \underline{x}_{I_i}}\Vert }\right) \right\rceil . \end{aligned}$$

For each box that is split, there are \(2^{l_i}\) smaller boxes that are added to \(\mathcal {X}_i^k\). Therefore, for each \(i \in [p]\) the maximal number of iterations required until there are no boxes left in \(\mathcal {X}_i^k\) that can be split is bounded from above by

$$\begin{aligned} \sum _{k = 0}^{S_i} \left( 2^{l_i} \right) ^k = \sum _{k = 0}^{S_i} 2^{k \, l_i}. \end{aligned}$$
(9)

Since it is possible that in each iteration, there is only a single nonlinearity \(i \in [p]\) for which a box is split, we have to sum up (9) for each \(i \in [p]\) to get

$$\begin{aligned} K = \sum _{i \in [p]} \sum _{k = 0}^{S_i} 2^{k l_i} \end{aligned}$$

as an upper bound for the required number of iterations of Algorithm 1. \(\square \)

Remark 3.8

Theorem 3.7 states that choosing \(\lambda = 0.5\) results in the lowest worst-case number of iterations. In this case, no subproblem (4) needs to be solved, since one can simply evaluate the nonlinearity \(f_i\) at the center point \(m_i^k(j_i^k)\) of the current box to obtain the splitting point. However, in practice it can be better to choose a smaller parameter \(\lambda \), which allows the splitting point to be closer to the master problem’s solution and may thus result in a finer approximation of the nonlinearity near the optimal solution of Problem (1).
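For illustration, the worst-case bound of Theorem 3.7 can be evaluated numerically; the instance data in the following sketch are purely hypothetical:

import math

def worst_case_iterations(L, diam, dims, eps, lam=0.5):
    """Evaluates the bound K of Theorem 3.7; per nonlinearity i, L, diam,
    and dims contain L_i, the initial box diameter, and the dimension l_i."""
    K = 0
    for L_i, D_i, l_i in zip(L, diam, dims):
        S_i = math.ceil(math.log(eps / (L_i * D_i), 1 - lam))
        K += sum(2 ** (k * l_i) for k in range(S_i + 1))
    return K

# Hypothetical data: two bivariate nonlinearities with L_i = 2 and initial
# diameter 10; the bound grows quickly with l_i and with 1/eps.
print(worst_case_iterations([2, 2], [10, 10], [2, 2], eps=1e-1))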

4 Application to Nonlinear Bilevel Problems

The method developed in the previous section can be applied to nonlinear bilevel problems with nonconvex lower-level models, which is an extremely challenging class of problems. To illustrate this, we consider optimistic MIQP-QP bilevel problems of the form

$$\begin{aligned} \begin{aligned} \min _{x, y} \quad&\frac{1}{2} x^\top H_\textrm{u}x+ c_\textrm{u}^\top x+ \frac{1}{2} y^\top G_\textrm{u}y+ d_\textrm{u}^\top y\\ \text {s.t.} \quad&A x+ B y\le a, \quad \underline{x} \le x \le \bar{{x}}, \quad x\in \mathbb {R}^{{n_x}}, \\&y\in {{\,\mathrm{arg\,min}\,}}_{\tilde{y}} \bigg \{ \frac{1}{2} \tilde{y}^\top G_\textrm{l}\tilde{y}+ d_\textrm{l}^\top \tilde{y}: C x+ D \tilde{y}\le b, \ \underline{y} \le \tilde{y}\le \bar{{y}}, \ \tilde{y}\in \mathbb {R}^{n_y}\bigg \}, \end{aligned} \end{aligned}$$
(10)

where \(x\in \mathbb {R}^{n_x}\) and \(y\in \mathbb {R}^{n_y}\) denote the upper- and lower-level variables, which are finitely bounded by \(\underline{x}\), \(\bar{{x}}\), \(\underline{y}\), and \(\bar{{y}}\). Further, we have matrices \(A \in \mathbb {R}^{{m_\textrm{u}}\times {n_x}}\), \(B \in \mathbb {R}^{{m_\textrm{u}}\times {n_y}}\), \(C\in \mathbb {R}^{{m_\textrm{l}}\times {n_x}}\), \(D\in \mathbb {R}^{{m_\textrm{l}}\times {n_y}}\), as well as right-hand side vectors \(a\in \mathbb {R}^{m_\textrm{u}}\) and \(b\in \mathbb {R}^{m_\textrm{l}}\). In addition, we have \(c_\textrm{u}\in \mathbb {R}^{n_x}\) and \(d_\textrm{u}, d_\textrm{l}\in \mathbb {R}^{n_y}\). Finally, \(H_\textrm{u}\in \mathbb {R}^{{n_x}\times {n_x}}\), \(G_\textrm{u}\in \mathbb {R}^{{n_y}\times {n_y}}\) are positive semidefinite and symmetric matrices, while \(G_\textrm{l}\in \mathbb {R}^{{n_y}\times {n_y}}\) is a possibly indefinite and symmetric matrix. Thus, the upper level is a convex-quadratic problem over linear constraints and the lower-level problem

$$\begin{aligned} \min _{\tilde{y}} \quad \frac{1}{2} \tilde{y}^\top G_\textrm{l}\tilde{y}+ d_\textrm{l}^\top \tilde{y}\quad \text {s.t.} \quad C x+ D \tilde{y}\le b, \ \underline{y} \le \tilde{y}\le \bar{{y}}, \ \tilde{y}\in \mathbb {R}^{n_y}, \end{aligned}$$
(11)

is an \(x\)-parameterized, continuous, but nonconvex quadratic problem. Let \(\varphi (\cdot )\) be the optimal-value function of the lower level, i.e.,

$$\begin{aligned} \varphi ({x}) \mathrel {{\mathop :}{=}}\min _{\tilde{y}} \bigg \{ \frac{1}{2} \tilde{y}^\top G_\textrm{l}\tilde{y}+ d_\textrm{l}^\top \tilde{y}: C x+ D \tilde{y}\le b, \ \underline{y} \le \tilde{y}\le \bar{{y}}, \ \tilde{y}\in \mathbb {R}^{n_y}\bigg \}. \end{aligned}$$

With this, we can rewrite Problem (10) equivalently as the single-level problem

$$\begin{aligned} \min _{x, y} \quad&\frac{1}{2} x^\top H_\textrm{u}x+ c_\textrm{u}^\top x+ \frac{1}{2} y^\top G_\textrm{u}y+ d_\textrm{u}^\top y\end{aligned}$$
(12a)
$$\begin{aligned} \text {s.t.} \quad&A x+ B y\le a, \quad C x+ D y\le b, \end{aligned}$$
(12b)
$$\begin{aligned}&\underline{x} \le x \le \bar{{x}}, \quad \underline{y} \le y \le \bar{{y}}, \quad x\in \mathbb {R}^{n_x}, \quad y\in \mathbb {R}^{n_y}, \end{aligned}$$
(12c)
$$\begin{aligned}&\frac{1}{2} y^\top G_\textrm{l}y+ d_\textrm{l}^\top y\le \varphi (x), \end{aligned}$$
(12d)

see, e.g., [13]. We now reformulate Problem (12) so that it fits into the framework introduced above. To this end, we introduce the auxiliary variables \(\eta _1\) and \(\eta _2\) as well as the nonlinear function \(f: \mathbb {R}^{{n_y}} \rightarrow \mathbb {R}\) with \(f(y) = 1/2\, y^\top G_\textrm{l}y+ d_\textrm{l}^\top y\). Based on this notation, Problem (12) can be restated as

$$\begin{aligned} \min _{x, y, \eta _1, \eta _2} \quad&\frac{1}{2} x^\top H_\textrm{u}x+ c_\textrm{u}^\top x+ \frac{1}{2} y^\top G_\textrm{u}y+ d_\textrm{u}^\top y\end{aligned}$$
(13a)
$$\begin{aligned} \text {s.t.} \quad&A x+ B y\le a, \quad C x+ D y\le b, \end{aligned}$$
(13b)
$$\begin{aligned}&\underline{x} \le x \le \bar{{x}}, \quad \underline{y} \le y \le \bar{{y}}, \quad x\in \mathbb {R}^{n_x}, \quad y\in \mathbb {R}^{n_y}, \end{aligned}$$
(13c)
$$\begin{aligned}&\eta _2 - \eta _1 \le 0, \quad \varphi (x) = \eta _1, \quad f(y) = \eta _2, \quad \eta _1, \eta _2 \in \mathbb {R}. \end{aligned}$$
(13d)

Now, the method developed in Sect. 3 can be applied to (13) if (i) the nonconvex functions \(\varphi \) and f are Lipschitz continuous on the projections of the bilevel constraint region onto the decision spaces of the upper and lower level, respectively, i.e., on the domains

$$\begin{aligned} {\mathcal {F}}_x&\mathrel {{\mathop :}{=}}\left\{ x\in [\underline{x}, \bar{{x}}]:\exists y\in \mathbb {R}^{n_y}\text { such that } A x+ B y\le a, \, C x+ D y\le b, \, \underline{y} \le y \le \bar{{y}}\right\} , \\ {\mathcal {F}}_y&\mathrel {{\mathop :}{=}}\left\{ y\in [\underline{y}, \bar{{y}}]:\exists x\in \mathbb {R}^{n_x}\text { such that } A x+ B y\le a, \, C x+ D y\le b, \, \underline{x} \le x \le \bar{{x}}\right\} , \end{aligned}$$

and if (ii) the associated Lipschitz constants are computable.

What makes things more complicated compared to the general setup described in Sect. 3 is that we can only evaluate the optimal-value function \(\varphi (x)\) but cannot optimize over it. Thus, we cannot use the subproblem (4) directly to obtain a new splitting point. However, if we take the box center m as the new splitting point, solving the subproblem (4) reduces to evaluating \(\varphi (m)\). More precisely, using the box center corresponds to setting \(\lambda = 1/2\). This, however, is only applicable if \(\varphi (m)\) is well-defined, which we ensure with the following assumption.

Assumption 1

The set \({\mathcal {T}}(x) \mathrel {{\mathop :}{=}}\{ y\in \mathbb {R}^{{n_y}}:D y\le b - C x, \, \underline{y} \le y \le \bar{{y}}\}\) is nonempty for all \(x\in B(\underline{x}, \bar{{x}})\).

This assumption implies that \(-\infty< \varphi (x) < +\infty \) holds for all \(x\in B(\underline{x}, \bar{{x}})\), i.e., a minimizer \(y\) of Problem (11) exists for all \(x\in B(\underline{x}, \bar{{x}})\) and, thus, for every possible box center m.

Before we present the theoretical developments that are required to apply our method to the introduced class of bilevel problems, let us briefly discuss that approximate solutions of bilevel problems with nonconvex lower-level problems need to be interpreted with some care. In particular, it is shown in [3] that lower-level solutions that are only \(\varepsilon \)-feasible can lead to upper-level solutions that are arbitrarily far away from actual bilevel solutions. Since this also applies to our method, we later always report the difference between our solutions and the known optimal solutions of the bilevel problems in our test set.

4.1 Lipschitz Continuity Properties

To apply our method with the outlined modifications for the subproblem to Problem (13), it remains to show that the properties (i) and (ii) are fulfilled. We start with the nonconvex function f. Since the relevant domain \(B(\underline{y}, \bar{{y}})\) of this function is compact, continuous differentiability of f implies global Lipschitz continuity of f on this set. Since \(B(\underline{y}, \bar{{y}})\) is convex and compact, the tightest Lipschitz constant can be computed by solving the optimization problem

$$\begin{aligned} \max _{y\in B(\underline{y}, \bar{{y}})} || G_\textrm{l}y+ d_\textrm{l}||. \end{aligned}$$
(14)

Note that it would also be possible to compute the Lipschitz constant in (14) over the feasible set of the master problem, i.e., over the set \({\mathcal {F}}_y\). However, this requires solving an optimization problem not over a simple box but over a more complex polytope. In our computational study, we test both variants. We refer to the former as the “fast” method and to the latter as the “slow” method. In the absence of lower- and upper-level constraints except for simple variable bounds, both approaches coincide.
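Since \(y \mapsto ||G_\textrm{l}y + d_\textrm{l}||\) is convex, its maximum over the box \(B(\underline{y}, \bar{{y}})\) is attained at one of the \(2^{n_y}\) vertices. For small \(n_y\), the “fast” variant (14) can hence be computed by vertex enumeration, as the following illustrative sketch shows (NumPy arrays assumed; the “slow” variant over \({\mathcal {F}}_y\) requires an optimization solver instead):

import itertools
import numpy as np

def lipschitz_fast(G, d, y_lo, y_hi):
    """Solves (14) by vertex enumeration: ||G y + d|| is convex in y,
    so its maximum over a box is attained at one of its vertices."""
    n = len(d)
    best = 0.0
    for choice in itertools.product((False, True), repeat=n):
        y = np.where(np.array(choice), y_hi, y_lo)   # one box vertex
        best = max(best, float(np.linalg.norm(G @ y + d)))
    return best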

Next, we continue with the more difficult case of proving Lipschitz continuity of the optimal-value function \(\varphi \). To this end, we exploit a variant of the Hoffman Lemma; for the original version, see the main theorem in [24] or Lemma 5.8 in [50]. For ease of presentation, we assume from now on that the finite bounds on y are part of the lower-level inequality constraints and, thus, absorbed into the constraint system \(C x+ D \tilde{y}\le b\).

Lemma 4.1

(see Corollary 5.1 in [50]) Suppose Assumption 1 holds. There exists \(L_H > 0\) such that for any \(x,{\tilde{x}} \in B(\underline{x}, \bar{{x}})\) it holds: For any \(y\in {\mathcal {T}}(x)\), we can find a point \({\tilde{y}} \in {\mathcal {T}}({\tilde{x}})\) with

$$\begin{aligned} ||{\tilde{y}}-y|| \le L_H ||C (x-{\tilde{x}})|| \le L_H ||C|| \, ||{\tilde{x}}-x||. \end{aligned}$$

The scalar \(L_H\) is the so-called Hoffman constant. A sharp characterization of this constant and an algorithm to compute it can be found in [37, 40]. Based on the introduced variant of the Hoffman Lemma, we can now establish Lipschitz continuity of the optimal-value function \(\varphi \) under Assumption 1. Our proof follows the idea of the proof of Corollary 5.2 in [50]. There, the Lipschitz continuity of the optimal-value function of a linear program with right-hand side perturbation is demonstrated. In contrast to this, we have a quadratic program with right-hand-side perturbation.

Theorem 4.2

Suppose Assumption 1 holds. Then, there exists \(L > 0\) such that for any \(x,{\tilde{x}} \in B(\underline{x}, \bar{{x}})\) it holds

$$\begin{aligned} | \varphi ({\tilde{x}}) - \varphi (x) | \le L || {\tilde{x}} - x||. \end{aligned}$$

Proof

Let any \(x,{\tilde{x}} \in B(\underline{x}, \bar{{x}})\) be given. By Assumption 1, minimizers \(y\) and \({\tilde{y}}\) of Problem (11) exist for \(x\) and \({\tilde{x}}\), respectively. By Lemma 4.1, for \(y\) we can find a point \({\hat{y}} \in {\mathcal {T}}({\tilde{x}})\) such that

$$\begin{aligned} ||{\hat{y}} - y|| \le L_H ||C|| \, || {\tilde{x}} - x|| \end{aligned}$$

holds for some \(L_H > 0\). Based on this, we can conclude

$$\begin{aligned} \varphi ({\tilde{x}}) - \varphi (x) &\le \frac{1}{2} {\hat{y}}^\top G_\textrm{l}{\hat{y}} + d_\textrm{l}^\top {\hat{y}} - \left( \frac{1}{2} y^\top G_\textrm{l}y+ d_\textrm{l}^\top y\right) \\ &= \frac{1}{2} {\hat{y}}^\top G_\textrm{l}{\hat{y}} - \frac{1}{2} y^\top G_\textrm{l}y+ d_\textrm{l}^\top ({\hat{y}} - y) \\ &= \frac{1}{2} {\hat{y}}^\top G_\textrm{l}{\hat{y}} - \frac{1}{2} {\hat{y}}^\top G_\textrm{l}y+ \frac{1}{2} {\hat{y}}^\top G_\textrm{l}y- \frac{1}{2} y^\top G_\textrm{l}y + \frac{1}{2} y^\top G_\textrm{l}{\hat{y}} - \frac{1}{2} y^\top G_\textrm{l}{\hat{y}} + d_\textrm{l}^\top ({\hat{y}} - y) \\ &= \frac{1}{2} {\hat{y}}^\top G_\textrm{l}({\hat{y}} - y) + \frac{1}{2} y^\top G_\textrm{l}({\hat{y}} - y) + \frac{1}{2} {\hat{y}}^\top G_\textrm{l}y- \frac{1}{2} y^\top G_\textrm{l}{\hat{y}} + d_\textrm{l}^\top ({\hat{y}} - y) \\ &= \frac{1}{2} \left( {\hat{y}}^\top G_\textrm{l}+ d_\textrm{l}^\top + y^\top G_\textrm{l}+ d_\textrm{l}^\top \right) ({\hat{y}} - y) + \frac{1}{2} y^\top G_\textrm{l}^\top {\hat{y}} - \frac{1}{2} y^\top G_\textrm{l}{\hat{y}} \\ &\le \frac{1}{2} \left( ||G_\textrm{l}^\top {\hat{y}} + d_\textrm{l}|| + ||G_\textrm{l}^\top y+ d_\textrm{l}||\right) ||{\hat{y}} - y|| \\ &\le L_{G_\textrm{l}} ||{\hat{y}} - y|| \le L_H ||C|| \, L_{G_\textrm{l}} ||{\tilde{x}} - x||, \end{aligned}$$

where we use the symmetry of \(G_\textrm{l}\) and \(L_{G_\textrm{l}} \mathrel {{\mathop :}{=}}\max \{ || G_\textrm{l}y+ d_\textrm{l}|| : x\in B(\underline{x}, \bar{{x}}), y\in {\mathcal {T}}(x) \}\), which is well-defined due to Assumption 1.

Analogously, by Lemma 4.1, for \({\tilde{y}}\) we can find a point \({\hat{y}} \in {\mathcal {T}}(x)\) such that

$$\begin{aligned} ||{\hat{y}} - {\tilde{y}}|| \le L_H ||C|| \, ||x-{\tilde{x}}|| \end{aligned}$$

holds for the same \(L_H > 0\). With the same arguments as before, we obtain

$$\begin{aligned} \varphi (x) - \varphi ({\tilde{x}}) \le L_H ||C|| L_{G_\textrm{l}} ||{\tilde{x}}-x||. \end{aligned}$$

Consequently,

$$\begin{aligned} | \varphi ({\tilde{x}}) - \varphi (x) | \le L_H ||C|| L_{G_\textrm{l}} ||{\tilde{x}}-x|| \mathrel {{=}{\mathop :}}L ||{\tilde{x}}-x|| \end{aligned}$$

holds and the claim follows. \(\square \)

Let us finally note that the presented method is not restricted to lower-level problems that are nonconvex and quadratic. If the Lipschitz constant of the optimal-value function of a lower-level problem with more general nonlinearities is known, the method can be applied exactly as explained in this section.

4.2 Implementation Details

In this section, we discuss some implementation details to clarify how we modified and extended Algorithm 1 to get a more tailored method for the considered bilevel setup.

4.2.1 “Slow” and “Fast” Method for \(\varphi \)

Because \({\mathcal {T}}(x) \subseteq B(\underline{y}, \bar{{y}}) \) holds for all \(x\in B(\underline{x}, \bar{{x}})\), we immediately get

$$\begin{aligned} L_{G_\textrm{l}} = \max _{x\in B(\underline{x}, \bar{{x}}), y\in {\mathcal {T}}(x)} || G_\textrm{l}y+ d_\textrm{l}|| \le \max _{y\in B(\underline{y}, \bar{{y}})} || G_\textrm{l}y+ d_\textrm{l}||. \end{aligned}$$

Since we have to compute \(L_{G_\textrm{l}}\) for computing the Lipschitz constant of \(\varphi \), we can distinguish between the “fast” and the “slow” method for \(\varphi \) as well.

4.2.2 Additional Nonlinearities

Bilinear terms of the form \(x_i y_j\) in the lower- or upper-level objective function can easily be reformulated to fit into our setup. If such a term occurs, e.g., in the lower level, an additional variable \(y_k\) is introduced in the lower level. Moreover, the constraint \(y_k = x_i\) is added to the lower level, while the nonlinear objective term \(x_i y_j\) is replaced by the product \(y_ky_j\). The resulting bilevel problem then fits into our setup.
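For example, a lower-level objective term \(x_i \tilde{y}_j\) is replaced according to

$$\begin{aligned} \min _{\tilde{y}} \ \dots + x_i \tilde{y}_j + \dots \quad \rightarrow \quad \min _{\tilde{y}, \tilde{y}_k} \ \dots + \tilde{y}_k \tilde{y}_j + \dots \quad \text {s.t.} \quad \tilde{y}_k = x_i, \end{aligned}$$

so that the objective again only contains quadratic terms in lower-level variables, while the new coupling constraint is linear.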

4.2.3 Box Filtering

Figure 2 illustrates that, after splitting an initial bounding box (here \([0,3]^2\)) a few times, there might be boxes (such as \([2.25, 3]^2\)) that do not contain any point of the bilevel constraint region, which is colored red in Fig. 2. We can detect such boxes by checking whether their intersection with the bilevel constraint region is empty. This requires solving an LP feasibility problem and thus creates some additional computational effort. The benefit is that the constraints (6c)–(6f) and the corresponding binary variables are not added to the master problem for the filtered boxes. However, since the bilevel constraint region is also accounted for in the master problem, these filtered boxes are never split anyway, i.e., the constraints and variables that are not added to the master problem would be redundant if added. Hence, the additional computational effort of box filtering should only be undertaken if necessary.

Fig. 2: Example for box filtering

Note that box filtering is not necessary if there are no lower-level and upper-level constraints except for simple variable bounds since the intersection of the bilevel constraint region and every possible box created by the algorithm can never be empty. In contrast, box filtering is necessary if the “slow” method is applied since, e.g., Problem (14) for computing the Lipschitz constant of the nonconvex function f might be infeasible on newly created boxes. Thus, these boxes must be filtered after each box splitting and before Lipschitz constants are updated. This does not occur with the “fast” method, and box filtering is therefore not necessary for this method.

4.2.4 Tighter Lipschitz Constants for Box-Constrained Lower Levels

For instances with simple variable bounds on the lower level that are not influenced by upper-level decisions, we can compute tighter Lipschitz constants for \(\varphi \). To this end, we now explicitly take into account bilinear terms of the form \(x_i y_j\) in the lower-level objective function and do not reformulate these terms as described in Sect. 4.2.2. Hence, the lower-level objective function is given by

$$\begin{aligned} f(y, x) = \frac{1}{2} \begin{pmatrix} y\\ x\end{pmatrix}^\top \begin{bmatrix} E & F^\top \\ F & 0 \end{bmatrix} \begin{pmatrix} y\\ x\end{pmatrix} + \begin{pmatrix} e \\ 0 \end{pmatrix}^\top \begin{pmatrix} y\\ x\end{pmatrix}, \end{aligned}$$

where E and \(F \ne 0\) are suitably chosen matrices and e is a suitably chosen vector. Now, for given \({\hat{x}}, {\tilde{x}} \in B(\underline{x}, \bar{{x}})\), let \({\hat{y}}\) and \({\tilde{y}}\) be defined as

$$\begin{aligned} {\hat{y}} \mathrel {{\mathop :}{=}}{{\,\mathrm{arg\,min}\,}}_{y\in B(\underline{y}, \bar{{y}})} f(y, {\hat{x}}), \quad {\tilde{y}} \mathrel {{\mathop :}{=}}{{\,\mathrm{arg\,min}\,}}_{y\in B(\underline{y}, \bar{{y}})} f(y, {\tilde{x}}), \end{aligned}$$
(15)

where \(B(\underline{y}, \bar{{y}})\) is the lower level’s feasible set in this case. Note that \(B(\underline{y}, \bar{{y}}) = {\mathcal {T}}(x)\) holds for all \(x\in B(\underline{x}, \bar{{x}})\) because \(x\) only influences the lower-level objective function but not the lower level’s feasible set. Thus, it holds

$$\begin{aligned} \varphi ({\hat{x}}) - \varphi ({\tilde{x}}) &= f({\hat{y}},{\hat{x}}) - f({\tilde{y}}, {\tilde{x}}) \le f({\tilde{y}},{\hat{x}}) - f({\tilde{y}}, {\tilde{x}}) \\ &= {\tilde{y}}^\top F^\top {\hat{x}} - {\tilde{y}}^\top F^\top {\tilde{x}} \le ||F {\tilde{y}}|| \, || {\hat{x}} - {\tilde{x}} ||. \end{aligned}$$

Note that \(f({\hat{y}},{\hat{x}}) \le f({\tilde{y}}, {\hat{x}})\) holds since \({\hat{y}}\) minimizes \(f(y,{\hat{x}})\) over \(B(\underline{y}, \bar{{y}})\) and \({\tilde{y}} \in B(\underline{y}, \bar{{y}})\). This is not necessarily true in the case of general lower-level constraints, i.e., when optimizing in (15) over \({\mathcal {T}}({\hat{x}})\) and \({\mathcal {T}}({\tilde{x}})\), respectively, since \({\mathcal {T}}({\hat{x}}) \ne {\mathcal {T}}({\tilde{x}})\) and \({\tilde{y}} \notin {\mathcal {T}}({\hat{x}})\) might hold. Analogously, we obtain \(\varphi ({\tilde{x}}) - \varphi ({\hat{x}}) \le ||F {\hat{y}}|| \, || {\hat{x}} - {\tilde{x}} ||\) and, consequently,

$$\begin{aligned} | \varphi ({\hat{x}}) - \varphi ({\tilde{x}}) | \le L_F || {\hat{x}} - {\tilde{x}} || \end{aligned}$$

holds with \(L_F \mathrel {{\mathop :}{=}}\max \{||Fy||:y\in B(\underline{y}, \bar{{y}})\}\). Thus, \(L_F\) is a valid Lipschitz constant.

4.2.5 Big-Ms

As a valid Big-M in the master problem, we use the maximum of \(||\bar{{y}} - \underline{y}||_\infty \), \(||\bar{{x}} - \underline{x}||_\infty \), and

$$\begin{aligned} \max _{y\in B(\underline{y}, \bar{{y}})} \left( \frac{1}{2} y^\top G_\textrm{l}y+ d_\textrm{l}^\top y\right) - \min _{y\in B(\underline{y}, \bar{{y}})} \left( \frac{1}{2} y^\top G_\textrm{l}y+ d_\textrm{l}^\top y\right) . \end{aligned}$$

4.2.6 Lipschitz Constant Updates

The Lipschitz constants are updated after each box splitting since it is to be expected that the constants get smaller if they are computed on smaller sets.

4.3 Numerical Results

In our computational study below, we consider the QP-QP instances from the BASBLib library [38]. We first describe which instances need to be excluded because they do not fit into our setup. First, we exclude the three instances d_1992_01, b_1984_02, and dd_2012_02 because they contain nonlinear constraints. Next, the two instances y_1996_02 and lmp_1987_01 are excluded due to nonconvex upper-level objective functions, i.e., the matrix \(H_\textrm{u}\) is not positive semidefinite for these instances. The instance sc_1998_01 is not considered because C is the zero matrix and, thus, the resulting optimization problem is not a “true” bilevel problem, since the lower-level problem is not constrained by upper-level variables. Such problems can easily be solved by backward induction, i.e., an optimal solution to the lower-level problem can be determined first and, given this lower-level solution, the upper-level problem can be solved to optimality. Finally, we have to exclude the following four instances because they violate Assumption 1:

  • tmh_2007_01 (lower level is not feasible for \(x=9\)),

  • b_1988_01 (lower level is not feasible for \(x=9\)),

  • b_1998_07 (lower level is not feasible for \(x=9\)), and

  • cw_1990_02 (lower level is not feasible for \(x=7\)).

In total, 10 instances (out of 20) remain; see Table 1. Note that, due to the applied reformulations as described in Sect. 4.2.2, the reported number of variables might differ from those reported in the BASBLib. Moreover, the reported number of constraints does not include additional constraints necessary due to these reformulations. Variable bounds are also not counted as constraints.

Table 1 Overview of the considered BASBLib instances

We implemented the algorithm in Python 3.7.9. All computations were conducted on a machine with an Intel(R) Core(TM) i7-8565U CPU with 4 cores, 1.8 GHz to 4.6 GHz, and 16 GB RAM. The master problem (M(k)) and the subproblem (4) are modeled using Pyomo 5.7.2 [22] and solved with Gurobi 9.1.0 [21]. The Hoffman constant is computed with the algorithm described in [40] using the MATLAB code made publicly available by the authors of [40]. We set \(\varepsilon = 10^{-1}\) and use a time limit of 5 h. If an instance is solved within the time limit, we successively divide \(\varepsilon \) by ten until \(\varepsilon = 10^{-5}\) is reached to see how much accuracy our algorithm can achieve within the given time limit. The obtained results for the case of box filtering and determining the Lipschitz constant with the “slow” method are summarized in Table 2, while the results for the “fast” method without box filtering are given in Table 3. Note that Table 2 contains results for 4 instances, while Table 3 contains results for 10 instances. The reason is that 6 instances only have simple variable bounds so that there is no difference between the “slow” and the “fast” method or between box filtering and no box filtering (except for the additional computational effort due to the LP feasibility problems solved in the former case; see Sect. 4.2.3). In these cases, we only list the respective instances in Table 3. The tables are organized as follows. The first column states the ID of the instance and the second one the \(\varepsilon \) used for the termination criterion. The number of required iterations is denoted by k, “runtime” states the runtime in seconds, and the “final \(\varepsilon \)” column contains the tolerance that is actually reached. Finally, the columns “diff to opt.” and “diff to opt. value” contain the 2-norm distance of our solution to the one reported in the BASBLib and the respective difference in the objective value.

Table 2 Computational results with “slow” method and box filtering
Table 3 Computational results with “fast” method and no box filtering

Before we discuss the results in detail, let us comment on two important aspects. First, it can be expected that our method performs rather badly if there are multiplicities, i.e., multiple optimal solutions. To get a finer relaxation, all boxes covering the multiple solutions have to be split at least once if \(\varepsilon \)-feasibility is not reached by the first split. Second, it is to be expected that our method performs rather well for instances with small variable ranges, e.g., ranges such as \([0,1]^{n_x}\times [0,1]^{n_y}\) instead of \([0,1000]^{n_x}\times [0,1000]^{n_y}\), as well as small lower-level objective function ranges. In particular, the range of the lower level’s objective function is small in the case of small Lipschitz constants. The Lipschitz constant derived here for the optimal-value function is valid for all quadratic programs with right-hand-side perturbations that satisfy Assumption 1. However, for specific bilevel applications, much tighter Lipschitz constants might be derived by exploiting problem-specific structural properties.

In what follows, we comment on the obtained computational results in the light of these two aspects. Multiplicities are reported for instance as_1981_01. As can be seen in Table 2, for \(\varepsilon = 10^{-2}\), this instance is not solved to \(\varepsilon \)-feasibility within the time limit despite the fact that an optimal solution is already reached. This effect is even more pronounced if box filtering is deactivated and the “fast” method is used; see Table 3. In this case, the final \(\varepsilon \) is 47.3, which is very large. As this instance has rather many general constraints, box filtering with the “slow” method improves the solution process significantly. However, for instances with fewer general constraints like sa_1981_01 and sa_1981_02, the additional computational burden of the “slow” method and box filtering can outweigh its advantages.

Finally, let us discuss the results in dependence of the tightness of the Lipschitz constants and the size of the variable ranges. To this end, we first order the considered instances by increasing Lipschitz constants of the function f: b_1998_02, b_1998_03, d_2000_01, d_1978_01, fl_1995_01, sa_1981_02, as_1981_01, sa_1981_01, b_1998_04, and b_1998_05. Indeed, the three instances with the lowest Lipschitz constants are solved for the smallest \(\varepsilon \) within the time limit; see Tables 2 and 3. In addition to these 3 instances, the “fast” method with no box filtering also solves the instances d_1978_01, sa_1981_01, and b_1998_04, at least for the initial \(\varepsilon \). Besides having one of the lowest Lipschitz constants, the instance d_1978_01 has relatively small variable ranges. The other two instances sa_1981_01 and b_1998_04 have a relatively small number of lower- and upper-level variables, which reduces the dimensions of the boxes and therefore the worst-case number of iterations. Nevertheless, the disadvantage of the large Lipschitz constant of b_1998_04 is reflected in the results, since already more than 2 h are needed to compute an \(\varepsilon \)-feasible solution for \(\varepsilon = 10^{-1}\).

In total, our method solves 7 out of 10 instances. Interestingly, 2 out of the 3 unsolved instances even have a convex lower-level problem and could thus be solved with specialized methods such as the one in [30] that explicitly exploit this property. In addition, since our method only requires very weak assumptions and can be applied to a broad range of problems besides nonconvex bilevel problems, it cannot be expected to outperform specifically tailored methods like the BASBL solver [39].

5 Application to Gas Network Optimization

In this section, we use Algorithm 1 to solve stationary gas network optimization problems. We start by modeling the gas network and stating an implicit nonlinear pressure-law function for gas flow in pipes. We cannot state this function explicitly, but it can be evaluated rather cheaply. Then, we analyze its derivatives to derive suitable Lipschitz constants. Finally, numerical results on test instances show the successful application of our method.

5.1 Modeling

We model the gas network as a directed and weakly connected graph \((V, A)\), where the set of arcs \(A \) is composed of pipes \(A _\textrm{pi} \), short pipes \(A _\textrm{sp} \), valves \(A _\textrm{va} \), compressor stations \(A _\textrm{cs} \), and control valves \(A _\textrm{cv} \), i.e.,

$$\begin{aligned} A = A _\textrm{pi} \cup A _\textrm{sp} \cup A _\textrm{va} \cup A _\textrm{cs} \cup A _\textrm{cv}. \end{aligned}$$

The two main variables that describe the state of the gas flowing through the network are the pressure \(p \) and mass flow \(q \). Each node \(u \in V \) has a bounded pressure variable \(p _u \in [\underline{p}_u, \bar{{p}}_u ]\) and a given mass flow \(q _u \) that is supplied to or withdrawn from the network. A node \(u \in V \) is called an entry node if \(q _u > 0\), an exit node if \(q _u < 0\), and an inner node if \(q _u = 0\). In addition, each arc \(a \in A \) has a variable \(q _a \in [\underline{q}_a, \bar{{q}}_a ]\) that models the mass flow in the arc.

The balance equation

$$\begin{aligned} q _u + \sum _{a \in \delta ^{\textrm{in}}(u)} q _a = \sum _{a \in \delta ^{\textrm{out}}(u)} q _a \quad \text {for all } u \in V \end{aligned}$$
(16)

ensures that no gas is gained or lost. Here, \(\delta ^{\textrm{in}}(u) \) and \(\delta ^{\textrm{out}}(u) \) denote the sets of in- and outgoing arcs of node \(u \).

A short pipe \(a = (u, v) \in A _\textrm{sp} \) directly connects its nodes \(u \) and \(v \). Therefore, the related pressure values coincide:

$$\begin{aligned} p _u = p _v \quad \text {for all } a = (u, v) \in A _\textrm{sp}. \end{aligned}$$
(17)

A valve \(a = (u, v) \in A _\textrm{va} \) is either open or closed. If it is open, it is modeled as a short pipe. If it is closed, the mass flow \(q _a \) has to be zero and the pressures at \(u \) and \(v \) are decoupled, but their difference cannot exceed a given value \(\Delta \bar{{p}}_a \). This can be modeled by introducing a binary variable \(o_a \) that indicates whether the valve \(a \) is open (\(o_a = 1\)) or closed (\(o_a = 0\)). The valve model then reads

$$\begin{aligned} q _a&\ge \underline{q}_a o_a&\text {for all } a \in A _\textrm{va}, \end{aligned}$$
(18a)
$$\begin{aligned} q _a&\le \bar{{q}}_a o_a&\text {for all } a \in A _\textrm{va}, \end{aligned}$$
(18b)
$$\begin{aligned} p _u- p _v&\le \Delta \bar{{p}}_a (1 - o_a)&\text {for all } a = (u, v) \in A _\textrm{va}, \end{aligned}$$
(18c)
$$\begin{aligned} p _v- p _u&\le \Delta \bar{{p}}_a (1 - o_a)&\text {for all } a = (u, v) \in A _\textrm{va}. \end{aligned}$$
(18d)

Compressor stations \(a \in A _\textrm{cs} \) have a fixed flow direction, i.e., \(\underline{q}_a \ge 0\) holds, and can increase the pressure of the gas. We model this pressure increase by introducing the variable \(\Delta p _a \in [0, \Delta \bar{{p}}_a ]\). Then, the compressor station model is given by

$$\begin{aligned} p _v = p _u + \Delta p _a \quad \text {for all } a = (u, v) \in A _\textrm{cs}. \end{aligned}$$
(19)

Note that this model is a significant simplification of how compressor stations in real gas networks operate. More complicated and realistic models can be found in, e.g., [41, 45].

Control valves \(a \in A _\textrm{cv} \) are modeled similarly to compressor stations but decrease the gas pressure instead of increasing it. We again have \(\underline{q}_a \ge 0\) and

$$\begin{aligned} p _v = p _u- \Delta p _a \quad \text {for all } a = (u, v) \in A _\textrm{cv} \end{aligned}$$
(20)

with \(\Delta p _a \in [0, \Delta \bar{{p}}_a ]\).

Until now, we have modeled all gas network components in a (mixed-integer) linear way. The only components that are still missing are the pipes \(a \in A _\textrm{pi} \). We describe the pressure loss in a pipe as a function of the inflow pressure and the mass flow using a nonlinear and Lipschitz continuous function

$$\begin{aligned} p _v = p _{v, a}(p _u, q _a) \quad \text {for all } a = (u, v) \in A _\textrm{pi}, \end{aligned}$$
(21)

which we will analyze in the next section.

The goal is to minimize the overall activity of the compressor stations, i.e., the sum \(\sum _{a \in A _\textrm{cs}} \Delta p _a \) of all pressure increases. The full stationary gas network optimization problem thus consists of minimizing this objective subject to the constraints (16)–(21), the variable bounds, and the integrality of the valve variables \(o_a \).

This model has the form of Problem (1): the pipe equations (21) constitute the Lipschitz nonlinearities (1c), while all other constraints are linear and thus fit (1b). Hence, we can use Algorithm 1 to solve it.

In contrast to the gas network model considered in [47], we do not restrict ourselves to tree-structured networks. As a consequence, the mass flows in the network cannot be pre-computed, and the nonlinearity on each arc is multivariate since it depends on both the pressure and the mass flow. Hence, Algorithm 1 can be applied to a much broader class of gas transport models.

5.2 Lipschitz Continuity of the Gas Flow Equation

In this section, we derive and analyze the nonlinear pipe model (21). For the sake of readability, we henceforth omit the subscript \(a \) that indicates the pipe \(a \in A _\textrm{pi} \).

Gas flow along a pipe can be modeled by the stationary momentum equation. This ordinary differential equation (ODE) reads

$$\begin{aligned} \partial _{x}\left( p + \frac{\chi ^2}{\rho }\right) = -\frac{1}{2} \theta \frac{\chi |{\chi }|}{\rho }, \quad \chi = \rho v, \quad \theta = \frac{\lambda }{D} \end{aligned}$$

where \(p \), \(v\), \(\chi \), and \(\rho \) model the pressure, velocity, mass flux, and density of the gas, and \(\lambda \) as well as \(D\) denote the pipe’s friction coefficient and diameter; see, e.g., [19]. The relation between the mass flow \(q \) and the mass flux \(\chi \) is given by \(q = A\chi \), where \(A = \pi D^2 / 4\) is the cross-sectional area of the pipe.

The pressure \(p \) and density \(\rho \) are coupled by the equation of state

$$\begin{aligned} p = R_{\textrm{s}}T z \rho \end{aligned}$$
(22)

for real gas, where \(R_{\textrm{s}}\) denotes the specific gas constant. The compressibility factor \(z \) can be computed by the so-called AGA formula

$$\begin{aligned} z = 1 + \alpha p,\quad \alpha = 0.257 \frac{1}{p _\textrm{c}} - 0.533 \frac{T _\textrm{c}}{p _\textrm{c} T} < 0 \end{aligned}$$

with pseudocritical pressure \(p _\textrm{c} \), pseudocritical temperature \(T _\textrm{c} \), and temperature \(T \), which we assume to be constant; see, e.g., [35]. We only consider a positive compressibility factor, which is equivalent to

$$\begin{aligned} p < \frac{1}{|{\alpha }|}. \end{aligned}$$
(23)

We further introduce the speed of sound \(c \) which is defined via

$$\begin{aligned} \frac{1}{c ^2} = \frac{\partial \rho }{\partial p} \end{aligned}$$

and the squared Mach number

$$\begin{aligned} \eta = \frac{v^2}{c ^2}. \end{aligned}$$

Lemma 5.1

It holds that

$$\begin{aligned} \eta = R_{\textrm{s}}T \frac{\chi ^2}{p ^2}. \end{aligned}$$

Proof

We solve the equation of state (22) for \(\rho \) and obtain

$$\begin{aligned} \rho = \frac{p}{R_{\textrm{s}}T z}. \end{aligned}$$

We can use this and the definition of the speed of sound to get

$$\begin{aligned} \eta&= v^2 \frac{1}{c ^2} = \left( \frac{\chi }{\rho }\right) ^2 \frac{\partial \rho }{\partial p} = \left( \frac{\chi R_{\textrm{s}}T z}{p}\right) ^2 \frac{R_{\textrm{s}}T z- p R_{\textrm{s}}T \alpha }{(R_{\textrm{s}}T z)^2}\\&= \left( \frac{\chi R_{\textrm{s}}T z}{p}\right) ^2 \frac{1}{R_{\textrm{s}}T z ^2} = R_{\textrm{s}}T \frac{\chi ^2}{p ^2}. \end{aligned}$$

\(\square \)

We assume that the velocity of the gas is subsonic, i.e., \(\eta < 1\) holds, as is the case in real-world gas networks. This is equivalent to

$$\begin{aligned} p > |{\chi }| \sqrt{R_{\textrm{s}}T}. \end{aligned}$$
(24)

In what follows, we only consider pressures within the interval \((|{\chi }| \sqrt{R_{\textrm{s}}T}, 1 / |{\alpha }|)\). Therefore, we have

$$\begin{aligned} p ^2 - \chi ^2 R_{\textrm{s}}T> 0 \quad \text {and} \quad 1 + \alpha p > 0, \end{aligned}$$

which we will use many times throughout this section.
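The constant \(\alpha \) and this admissible pressure window are easy to compute. The following sketch does so in SI units; the gas parameters \(R_\textrm{s}\), \(T \), \(p _\textrm{c} \), and \(T _\textrm{c} \) are hypothetical example values chosen only for illustration. The constants defined here are reused in the sketches below.

```python
import math

# Hypothetical gas parameters in SI units (illustration only).
R_S = 500.0    # specific gas constant in J/(kg K)
T = 283.15     # gas temperature in K
P_C = 46.0e5   # pseudocritical pressure in Pa
T_C = 190.0    # pseudocritical temperature in K

# AGA formula: z = 1 + ALPHA * p with ALPHA < 0.
ALPHA = 0.257 / P_C - 0.533 * T_C / (P_C * T)

def pressure_window(chi):
    """The open interval (|chi| sqrt(R_s T), 1/|alpha|) from (24) and (23)."""
    lower = abs(chi) * math.sqrt(R_S * T)  # subsonic flow, see (24)
    upper = 1.0 / abs(ALPHA)               # positive compressibility, see (23)
    return lower, upper

lo, hi = pressure_window(chi=300.0)  # hypothetical mass flux in kg/(m^2 s)
print(lo / 1e5, hi / 1e5)            # window expressed in bar
```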

For a pipe \((u, v)\) with length \(L\) and pressure \(p _u \) at node \(u \), the pressure function reads

$$\begin{aligned} p (x, p _u, \chi )&= F^{-1}\left( F(p _u) - \frac{1}{2} R_{\textrm{s}}T \chi |{\chi }| \theta x\right) , \end{aligned}$$
(25a)
$$\begin{aligned} F(p)&= \frac{1}{\alpha } p + \left( \chi ^2 R_{\textrm{s}}T- \frac{1}{\alpha ^2}\right) \ln (|{1+\alpha p}|) - \chi ^2 R_{\textrm{s}}T \ln (p), \end{aligned}$$
(25b)

for \(x\in [0, L]\); see [19, 20]. From [20], we further know the following properties of F.

Lemma 5.2

The function F as defined in (25b) is differentiable for \(p \in (|{\chi }| \sqrt{R_{\textrm{s}}T}, 1 / |{\alpha }|)\) with

$$\begin{aligned} F'(p) = \frac{p ^2 - \chi ^2 R_{\textrm{s}}T}{p (1 + \alpha p)} > 0. \end{aligned}$$
(26)

The second derivative fulfills

$$\begin{aligned} F''(p) = \frac{p ^2 + \chi ^2 R_{\textrm{s}}T (1 + 2 \alpha p)}{p ^2 (1 + \alpha p)^2} > 0. \end{aligned}$$

The property in (26) implies that F is strictly increasing. Therefore, the inverse in (25a) is well-defined. To evaluate the pressure function \(p \) in (25a), the equation

$$\begin{aligned} F(p) = F(p _u) - \frac{1}{2} R_{\textrm{s}}T \chi |{\chi }| \theta x \end{aligned}$$
(27)

needs to be solved. This can be done numerically using Newton’s method since F is strictly increasing and convex; see [20]. In the same way, (27) can be solved for \(p _u \) if \(p \) and \(\chi \) are given.
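As an illustration, the following sketch evaluates the pressure function (25a) by applying Newton's method to (27), using F from (25b) and its derivative from (26). It reuses the hypothetical gas constants from the sketch above; the pipe data (friction coefficient, diameter, length) and the input values are likewise made-up examples. Since F is strictly increasing and convex and we start at \(p _u \), the Newton iterates decrease monotonically to the root for positive flow.

```python
def F(p, chi):
    """F from (25b)."""
    c2 = chi**2 * R_S * T
    return (p / ALPHA
            + (c2 - 1.0 / ALPHA**2) * math.log(abs(1.0 + ALPHA * p))
            - c2 * math.log(p))

def dF(p, chi):
    """F' from (26); positive on the admissible pressure window."""
    return (p**2 - chi**2 * R_S * T) / (p * (1.0 + ALPHA * p))

def p_v(p_u, chi, theta, length, tol=1e-8, max_iter=100):
    """Outflow pressure p(L, p_u, chi) via Newton's method on (27)."""
    rhs = F(p_u, chi) - 0.5 * R_S * T * chi * abs(chi) * theta * length
    p = p_u  # starting point
    for _ in range(max_iter):
        step = (F(p, chi) - rhs) / dF(p, chi)
        p -= step
        if abs(step) < tol:
            return p
    raise RuntimeError("Newton's method did not converge")

# Hypothetical pipe: lambda = 0.01, D = 1 m (theta = lambda / D), L = 10 km.
print(p_v(p_u=60.0e5, chi=300.0, theta=0.01, length=10_000.0) / 1e5)  # in bar
```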

We are interested in the pressure at the end of the pipe, i.e., for \(x = L\), and need to find a Lipschitz constant for

$$\begin{aligned} p _v = p _v (p _u, \chi ) \mathrel {{\mathop :}{=}}p (L, p _u, \chi ). \end{aligned}$$

To this end, we make the following assumption.

Assumption 2

We assume that the variables \(\chi \) and \(p _u \) of each pipe \((u, v)\) are bounded by

$$\begin{aligned} \chi \in [\underline{\chi }, \bar{{\chi }}] = [\underline{q} / A, \bar{{q}} / A], \quad p _u \in [\underline{p}_u, \bar{{p}}_u ] \subset \left( |{\chi }| \sqrt{R_{\textrm{s}}T}, \frac{1}{|{\alpha }|}\right) \end{aligned}$$

with

$$\begin{aligned} p _v (p _u, \chi ) \in \left( |{\chi }| \sqrt{R_{\textrm{s}}T}, \frac{1}{|{\alpha }|}\right) \quad \text {for all } (p _u, \chi ) \in \mathcal {F}\mathrel {{\mathop :}{=}}[\underline{p}_u, \bar{{p}}_u ] \times [\underline{\chi }, \bar{{\chi }}]. \end{aligned}$$

The following lemma guarantees the Lipschitz continuity of \(p _v (p _u, \chi )\) on \(\mathcal {F}\).

Lemma 5.3

Let \(f:\mathcal {F}\rightarrow \mathbb {R}\) be a partially differentiable function on a compact and convex subset \(\mathcal {F}\subset \mathbb {R}^d\) with \(d \in \mathbb {N}\). Then, f is Lipschitz continuous on \(\mathcal {F}\) with Lipschitz constant \(L = 1\) w.r.t. the weighted 1-norm

$$\begin{aligned} \Vert {{\tilde{x}}}\Vert _{w} \mathrel {{\mathop :}{=}}\sum _{i = 1}^d \left| {{\tilde{x}}_i}\right| w_i \end{aligned}$$
(28)

with positive weights \(w \in \mathbb {R}^d\) and \(w_i \ge \max _{x \in \mathcal {F}} \left| {\partial _{x_i}f(x)}\right| \) for all \(i \in [d]\).

Proof

Let \({\tilde{x}}, {\tilde{y}} \in \mathcal {F}\). We define the auxiliary function \(g:[0,1] \rightarrow \mathbb {R}\) as \(g(\lambda ) \mathrel {{\mathop :}{=}}f(\lambda {\tilde{x}} + (1-\lambda ) {\tilde{y}})\). Now, we use the fundamental theorem of calculus to prove the claim:

$$\begin{aligned} |{f({\tilde{x}}) - f({\tilde{y}})}|&= |{g(1) - g(0)}|\\&= \left| {\int _0^1 g^\prime (\lambda ) \,\textrm{d}\lambda }\right| \\&= \left| {\int _0^1 ({\tilde{x}} - {\tilde{y}})^\top \nabla f(\lambda {\tilde{x}} + (1-\lambda ) {\tilde{y}}) \,\textrm{d}\lambda }\right| \\&= \left| {\int _0^1 \sum _{i = 1}^d ({\tilde{x}}_i - {\tilde{y}}_i) \partial _{x_i}f(\lambda {\tilde{x}} + (1-\lambda ) {\tilde{y}}) \,\textrm{d}\lambda }\right| \\&\le \int _0^1 \sum _{i = 1}^d \left| {({\tilde{x}}_i - {\tilde{y}}_i) \partial _{x_i}f(\lambda {\tilde{x}} + (1-\lambda ) {\tilde{y}})}\right| \,\textrm{d}\lambda \\&\le \int _0^1 \sum _{i = 1}^d \left| {{\tilde{x}}_i - {\tilde{y}}_i}\right| \max _{x\in \mathcal {F}} \left| {\partial _{x_i}f(x)}\right| \,\textrm{d}\lambda \\&\le \left\| {{\tilde{x}} - {\tilde{y}}}\right\| _{w}. \end{aligned}$$

\(\square \)

Using this weighted 1-norm allows us to obtain tighter bounds in (5) than with the usual 1-norm. To actually compute the weights, one could solve \(\max _{x\in \mathcal {F}} \left| {\partial _{x_i}f(x)}\right| \), which is an NLP for each \(i \in [d]\). In the case of Algorithm 1, the set \(\mathcal {F}\) will always be a box. For the function \(p _v (p _u, \chi )\), we give the optimal solution, or at least a suitable upper bound, for these NLPs in the following two theorems.

Theorem 5.4

If Assumption 2 holds, the derivative \(\partial _{p _u}p _v (p _u, \chi ) > 0\) attains its maximum on a given box \(\emptyset \ne [p _u ^-, p _u ^+] \times [\chi ^-, \chi ^+] \subseteq [\underline{p}_u, \bar{{p}}_u ] \times [\underline{\chi }, \bar{{\chi }}]\) at \((p _u ^-, \chi ^+)\) if \(\chi ^+ \ge 0\) and at \((p _u ^+, \chi ^+)\) otherwise.

Theorem 5.5

Let \(\mathcal {F}= [p _u ^-, p _u ^+] \times [\chi ^-, \chi ^+] \subseteq [\underline{p}_u, \bar{{p}}_u ] \times [\underline{\chi }, \bar{{\chi }}]\) be a given box with \(\mathcal {F}\ne \emptyset \). If Assumption 2 holds, the derivative \(\partial _{\chi }p _v (p _u, \chi ) \le 0\) has the following properties:

$$\begin{aligned} \min _{(p _u, \chi ) \in \mathcal {F}\cap \mathbb {R}\times \mathbb {R}_{\ge 0}} \partial _{\chi }p _v (p _u, \chi )&= \partial _{\chi }p _v (p _u ^-, \chi ^+)&\text {if } \chi ^+ \ge 0,\\ \min _{(p _u, \chi ) \in \mathcal {F}\cap \mathbb {R}\times \mathbb {R}_{\le 0}} \partial _{\chi }p _v (p _u, \chi )&> \partial _{\chi }p _v (p _v (p _u ^-, \chi ^-), -\chi ^-)&\text {if } \chi ^- < 0. \end{aligned}$$

The proofs of these two theorems can be found in the appendix of this paper as they are rather long and technical.

Theorems 5.4 and 5.5 can be used to compute the bounds in (5) not only for the initial box \([\underline{p}_u, \bar{{p}}_u ] \times [\underline{\chi }, \bar{{\chi }}]\) but also for every new box that is created in Step 15 of Algorithm 1. This can significantly tighten the bounds in (5) as the iterations proceed.
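A sketch of how this can look in code is given below. It reuses the evaluator p_v from the previous sketch and restricts itself to boxes with nonnegative flow, where Theorems 5.4 and 5.5 locate the extremal derivative values at the corner \((p _u ^-, \chi ^+)\). Since the closed-form derivative expressions are only developed in the appendix, the sketch approximates the two partial derivatives by central finite differences; the step sizes and box data are hypothetical.

```python
def d_pu(p_u, chi, theta, length, h=1.0):
    """Central finite-difference estimate of the derivative of p_v w.r.t. p_u."""
    return (p_v(p_u + h, chi, theta, length)
            - p_v(p_u - h, chi, theta, length)) / (2.0 * h)

def d_chi(p_u, chi, theta, length, h=1e-4):
    """Central finite-difference estimate of the derivative of p_v w.r.t. chi."""
    return (p_v(p_u, chi + h, theta, length)
            - p_v(p_u, chi - h, theta, length)) / (2.0 * h)

def weights(pu_lo, pu_hi, chi_lo, chi_hi, theta, length):
    """Weights (w_1, w_2) for the norm (28) on the given box, following the
    corner rules of Theorems 5.4 and 5.5 in the nonnegative-flow case."""
    assert chi_lo >= 0.0, "sketch covers only the case chi >= 0"
    w1 = d_pu(pu_lo, chi_hi, theta, length)    # Theorem 5.4: max at (pu-, chi+)
    w2 = -d_chi(pu_lo, chi_hi, theta, length)  # Theorem 5.5: min at (pu-, chi+)
    return w1, w2

# Hypothetical box: pressures in Pa, mass flux in kg/(m^2 s).
print(weights(55.0e5, 65.0e5, 100.0, 300.0, theta=0.01, length=10_000.0))
```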

5.3 Numerical Results

Now, we apply Algorithm 1 to two test problems from the GasLib library [46], which contains stationary gas network benchmark instances. To this end, we implemented our method in Python 3.8.10. The computations were done on a machine with an Intel(R) Core(TM) i7-8550U CPU with 4 cores, 1.8 GHz to 4.0 GHz, and 16 GB RAM. The master problems (M(k)) and the subproblems (4) have been modeled using GAMS 36.2.0 [9]. We used the solver CPLEX 20.1.0.1 [11] to solve the master problems, which are MIPs, and the solver SNOPT 7.7.7 [18] for the subproblems, which are NLPs.

To improve the performance of our method, we detect boxes \([p _u ^-, p _u ^+] \times [\chi ^-, \chi ^+]\) in \(\mathcal {X}_i^k\) that lie outside the feasible set of a pipe. From (29) and (30), we know that this is the case if \(\chi ^- \ge 0\) and \(p _v (p _u ^+, \chi ^-) < \underline{p}_v \) holds or if \(\chi ^+ \le 0\) and \(p _v (p _u ^-, \chi ^+) > \bar{{p}}_v \) holds. Additionally, we can fix the flow in pipes that are not part of a cycle. This allows us to reduce the dimension of the corresponding nonlinearities by one. To get well-scaled problems, we model all pressure values in bar instead of Pa and exclusively use mass flow values (in kg  s\(^{-1}\)) instead of mass flux values. As a result, we have to scale all values of \(\partial _{\chi }p _v (p _u, \chi )\) that are used to compute the bounds in (5) by a factor of \(10^{-5}/A\).
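A minimal sketch of this filtering test follows, based on the evaluator p_v from above and the monotonicity of \(p _v \) in \(p _u \) and \(\chi \); the box data and pressure bounds are hypothetical example values.

```python
def box_infeasible(pu_lo, pu_hi, chi_lo, chi_hi, pv_lo, pv_hi, theta, length):
    """True if no point of the box can meet the outflow pressure bounds
    [pv_lo, pv_hi], exploiting that p_v increases in p_u and decreases in chi."""
    if chi_lo >= 0.0 and p_v(pu_hi, chi_lo, theta, length) < pv_lo:
        return True   # even the largest reachable outflow pressure is too low
    if chi_hi <= 0.0 and p_v(pu_lo, chi_hi, theta, length) > pv_hi:
        return True   # even the smallest reachable outflow pressure is too high
    return False

# Hypothetical box and bounds in Pa.
print(box_infeasible(55.0e5, 65.0e5, 100.0, 300.0,
                     pv_lo=50.0e5, pv_hi=70.0e5,
                     theta=0.01, length=10_000.0))  # False
```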

To compute a sufficiently large value M for the master problem constraints (6c)–(6f), we use the difference between the values that can occur on the right-hand sides of the inequalities and the bounds of the variables on the left-hand side. This leads to the formula

$$\begin{aligned} M \mathrel {{\mathop :}{=}}\max \left\{ \left( \bar{{p}}_u- \underline{p}_u \right) , \left( \bar{{q}} - \underline{q}\right) , \left( \frac{1}{|{\alpha }|} - \underline{p}_v \right) , \left( \bar{{p}}_v- \frac{\max \left\{ \bar{{q}}, |{\underline{q}}|\right\} }{A} \sqrt{R_{\textrm{s}}T}\right) \right\} . \end{aligned}$$
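For completeness, here is this formula as a small function, reusing the hypothetical constants from the sketches above; all bounds are made-up example values (pressures in Pa, mass flows in kg/s).

```python
def big_m(pu_lo, pu_hi, q_lo, q_hi, pv_lo, pv_hi, area):
    """Big-M value for the master problem constraints (6c)-(6f),
    following the displayed formula."""
    return max(
        pu_hi - pu_lo,
        q_hi - q_lo,
        1.0 / abs(ALPHA) - pv_lo,
        pv_hi - max(q_hi, abs(q_lo)) / area * math.sqrt(R_S * T),
    )

print(big_m(pu_lo=55.0e5, pu_hi=65.0e5, q_lo=-200.0, q_hi=200.0,
            pv_lo=50.0e5, pv_hi=70.0e5, area=math.pi / 4.0))
```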
Fig. 3 Schematic representation of GasLib-11

The first instance we solve is GasLib-11, which is shown in Fig. 3. It contains 11 nodes including 3 entries and 3 exits, 8 pipes, 2 compressor stations, and a single valve. Because the network has a cycle, not all flows on all arcs are known a priori.

Fig. 4 Schematic representation of GasLib-24

The second instance we solve is GasLib-24; see Fig. 4. It consists of 24 nodes including 3 entries and 5 exits, 19 pipes, 3 compressor stations, a single control valve, a single short pipe, and a single resistor, which we replace by a short pipe. There are two cycles in the network.

Table 4 The runtimes and numbers of iterations of Algorithm 1 for different parameters \(\lambda \)

For both instances, we tested the values 0.125, 0.25, 0.375, and 0.5 for the parameter \(\lambda \). We chose the value \(\varepsilon = 0.1\) (in bar) for the termination criterion. The resulting runtimes and numbers of iterations are listed in Table 4. In both cases, \(\lambda =0.25\) and \(\lambda =0.375\) yield better results than \(\lambda =0.125\) or \(\lambda =0.5\). For \(\lambda =0.125\), this can be explained by the fact that the boxes do not shrink fast enough when split, because the splitting point can be quite close to the boundary of its box. Choosing \(\lambda =0.5\), on the other hand, removes the possibility of placing the splitting points closer to the master problem solution and thus potentially closer to the optimal solution of Problem (1).

For GasLib-11, the best result was achieved for \(\lambda =0.375\) for which an \(\varepsilon \)-feasible solution was found in 47.41s. The method terminated after 74 iterations with a mean iteration time of 0.64s. The mean time to solve the master problem was 0.35s, and the mean time to solve the subproblems was 0.14s. The remaining 0.16s were used to (re-)build the model in each iteration.

For GasLib-24, the best result was achieved for \(\lambda =0.25\) for which an \(\varepsilon \)-feasible solution was found in 158.43s. With 95 iterations, the mean iteration time was 1.67s. This mean includes 1.05s to solve the master problem, 0.23s to solve the subproblems, and 0.39s to (re-)build the model in each iteration.

Fig. 5 The maximal error \(\max _{i \in [p]} |{f_i(x_{I_i}) - x_{r_i}}|\) in each iteration for GasLib-11 with \(\lambda =0.375\) (left) and GasLib-24 with \(\lambda =0.25\) (right) on a logarithmic scale

Figure 5 shows the progress of the maximal error over the course of the iterations for the best-performing choice of \(\lambda \) on both test instances. One can see that the error falls rapidly in the first iterations. After that, it fluctuates strongly with only a slight downward trend until the threshold \(\max _{i \in [p]} |{f_i(x_{I_i}) - x_{r_i}}| \le \varepsilon \) is reached. This behavior is typical, since the approximation progress gained from splitting a box in Step 15 of Algorithm 1 diminishes as the boxes get smaller. The fluctuations can be explained by the master problem solution switching to a larger box \(j_i^k \in J_i^k\) after the previous box has been refined a number of times in Step 15.

6 Conclusion

In this paper, we developed a successive linear relaxation method for solving MINLPs with nonlinearities that are not given in closed form. Instead, we only assume that these multivariate nonlinearities can be evaluated and that their global Lipschitz constants are known. We illustrated the flexibility of this class of models and of our method by showing that it can be applied to bilevel optimization models with nonlinear and nonconvex lower-level problems as well as to nonconvex MINLPs for gas transport problems that are constrained by differential equations. Moreover, we proved finite termination of the method at approximately globally optimal solutions and derived a worst-case iteration bound.

Finally, let us sketch three important topics for future research that are beyond the scope of this paper. First, one can weaken the assumptions made in this paper, as is done in [47]. In particular, the assumption of exact function evaluations is rather strong in some applications such as, e.g., the case of PDE constraints. Second, one could incorporate general-purpose local solvers to compute feasible points that lead to upper bounds. Together with the lower bounds that we directly obtain from solving the master problem in every iteration, this would yield an optimality gap and thus a further termination criterion besides the one based on the approximation accuracy used in this paper. Third, the proposed method is obviously not (and also not expected to be) competitive with MINLP solution methods that explicitly exploit the structural properties of nonlinearities given in closed form. However, there are many possible ways of improving the general method proposed in this paper. For instance, the incorporation of presolving and dimension-reduction techniques could improve its performance considerably.