Multivariate McCormick relaxations

Tsoukalas, A.; Mitsos, A.

doi:10.1007/s10898-014-0176-0

Published: 02 April 2014

Volume 59, pages 633–662, (2014)

Journal of Global Optimization


An Erratum to this article was published on 25 October 2016

Abstract

McCormick (Math Prog 10(1):147–175, 1976) provides the framework for convex/concave relaxations of factorable functions, via rules for the product of functions and compositions of the form $F\circ f$, where $F$ is a univariate function. Herein, the composition theorem is generalized to allow multivariate outer functions $F$, and theory for the propagation of subgradients is presented. The generalization interprets the McCormick relaxation approach as a decomposition method for the auxiliary variable method. In addition to extending the framework, the new result provides a tool for the proof of relaxations of specific functions. Moreover, a direct consequence is an improved relaxation for the product of two functions, at least as tight as McCormick’s result, and often tighter. The result also allows the direct relaxation of multilinear products of functions. Furthermore, the composition result is applied to obtain improved convex underestimators for the minimum/maximum and the division of two functions for which current relaxations are often weak. These cases can be extended to allow composition of a variety of functions for which relaxations have been proposed.


1 Introduction

Many nonlinear programs (NLPs), in particular in engineering design, are nonconvex and multimodal. Their global optimization typically relies on the construction of converging convex/concave relaxations, i.e., convex/concave functions that under-/over-estimate the objective function and the constraints. In principle it would be desirable to construct the convex/concave envelopes, but this is typically not practical.

One approach to constructing convex relaxations is given by the so-called $\alpha $BB and $\gamma $BB relaxations, developed by Floudas and coworkers [1, 2, 21]. The methods are applicable to twice continuously differentiable functions and rely on an estimation of the Hessian of the original functions. For elementary functions, convex/concave envelopes are known or it is possible to calculate tight relaxations. Note for instance the construction of envelopes of univariate functions described by Maranas and Floudas [22], the work by Liberti and Pantelides [20] for monomials of odd degree, and the work of Tawarmalani and Sahinidis [46, 47] for a class of fractional and lower semi-continuous functions.

McCormick [23, 24] has provided a framework for the convex/concave relaxations of factorable functions, i.e., functions that can be represented as a finite recursive composition of binary sums, binary products and a given library of univariate intrinsic functions. The relaxations of the univariate intrinsic functions are propagated based on two main theorems, which essentially allow the relaxation of expressions in the form $F_1\circ f_1 + F_2 \circ f_2 \cdot F_3\circ f_3$. These relaxations are in general nonsmooth [30]. If all functions involved are smooth and the convex/concave envelopes of the functions are used in the composition theorem, then the convergence order is at least quadratic [7] even if natural interval extensions with linear convergence order are used for the enclosures of functions.

An alternative to McCormick’s relaxation is the auxiliary variable method (AVM) which employs auxiliary variables for each factor involved [42, 48–50]. More precisely, instead of relaxing the functions, the nonconvex optimization problem is relaxed, i.e., the nonconvex problem is reformulated introducing auxiliary variables in such a way that the intrinsic functions are decoupled and can be relaxed one by one. A lower bound to the nonconvex problem is calculated via a relaxed NLP or linear program (LP).

Mitsos et al. [30] proposed the propagation of relaxations and their subgradients through procedures, thus extending McCormick relaxations to global optimization problems with embedded algorithms; examples in [30] demonstrate that optimizing in the original dimensional space can, for a class of problems, result in drastic computational savings compared to the AVM. The nonsmoothness of the relaxations requires the use of nonsmooth optimization methods [16] for the calculation of lower bounds to the nonconvex optimization problem. The McCormick relaxations can be generalized in other ways [38], allowing also the relaxations of NLPs with dynamics (i.e., an ordinary differential equation or differential algebraic system) embedded [11, 12, 39, 40] as well as the relaxation of implicit functions [43]. Recently, Sahlodin and Chachuat [37] also proposed the so-called McCormick-Taylor models, whereby McCormick relaxations are propagated in addition to interval bounds for enclosing the remainder term. Two implementations of McCormick’s relaxations are MC++ [10] and modMC [13]; the former is freely available and used herein to calculate the McCormick relaxations.

While McCormick relaxations are clearly a very important tool, they have the limitation of only allowing univariate composition, i.e., univariate outer function. Herein, a generalization to multivariate outer functions is proposed via a reformulation of McCormick’s Composition theorem in terms of a simple optimization problem. The new theorem directly allows the relaxation and subgradient propagation through procedures similar to [30] under mild assumptions. It even gives rules for the propagation of the subdifferential.

The auxiliary variable method has two clear advantages compared to McCormick’s relaxations, namely that the relaxations are $(i)$ at least as tight and in some cases tighter and $(ii)$ differentiable for a larger class of functions [48]. On the other hand, McCormick relaxations have the advantage that the relaxations are constructed in the original space and allow for several generalizations. The generalization of McCormick relaxations presented here makes the relationship of the two approaches explicit, yielding an interpretation, to the best of our knowledge previously unknown, of McCormick relaxations as a decomposition method to solve the relaxed NLP constructed by AVM. We note that such decomposition methods for AVM have not been implemented by the global optimization community.

The proposed generalization allows a more direct relaxation of the product of functions, which proves to be at least as tight and in some cases tighter than McCormick’s product rule. It also allows the direct relaxation of multilinear product of functions, i.e., without resorting to recursive application of the bilinear rule. Similarly, the proposed theorem results in at least as tight and often tighter relaxations for the minimum/maximum and the division of two functions.

The rest of the paper is organized as follows. In Sect. 2 we review McCormick’s Composition Theorem and we give its generalization to multivariate outer functions, while in Sect. 3 we provide a way to propagate subgradient information. In Sect. 4 we discuss the relationship with AVM. We apply our results to compute relaxations of the product of two functions in Sect. 5, the minimum/maximum of two functions in Sect. 6 and the division of two functions in Sect. 7. We conclude and discuss future directions in Sect. 8.

2 Convex underestimator theorems

Theorem 1 is the main result in McCormick [23] and constructs convex/concave relaxations of composite functions where the outer function is univariate. Therein, $\mathrm{mid}(\alpha ,\beta ,\gamma )$ denotes the median of three real numbers, i.e., the middle value once they are sorted; in the trivial case that $\alpha =\beta =\gamma $ we have $\mathrm{mid}(\alpha ,\beta ,\gamma )=\alpha $.

Theorem 1

(McCormick composition theorem [23]) Let $Z\subset \mathbb {R}^n$ and $X \subset \mathbb {R}$ be nonempty compact convex sets. Consider the composite function $g=F \circ f(\cdot )$ where $f:Z \rightarrow \mathbb {R}$, $F:X\rightarrow \mathbb {R}$ and let $f(Z)\subset X$. Suppose that convex/concave relaxations $f^{cv},f^{cc}: Z\rightarrow \mathbb {R} $ of $f$ on $Z$ are known. Let $F^{cv}: X \rightarrow \mathbb {R}$ be a convex relaxation of $F$ on $X$ and let $x^{\min } \in X$ be a point where $F^{cv}$ attains its minimum on $X$. Then $\bar{g}^{cv}: Z\rightarrow \mathbb {R}$,

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=F^{cv}\left( \mathrm{mid} \{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}\right) , \end{aligned}$$

(1)

is a convex relaxation of $g$ on $Z$.
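As an illustration of Eq. (1), the following minimal sketch evaluates the relaxation for a hypothetical example that does not appear in the original text: $g(z)=(z^2-1)^2$ on $Z=[-2,2]$, with inner function $f(z)=z^2-1$ and outer function $F(x)=x^2$. The names `mid`, `f_cv`, `f_cc`, `F_cv` and `g_cv` are illustrative assumptions.

```python
def mid(a, b, c):
    """Median of three real numbers."""
    return sorted((a, b, c))[1]

# Hypothetical example: g(z) = (z**2 - 1)**2 on Z = [-2, 2].
Z_LO, Z_HI = -2.0, 2.0

def f(z):       # inner function
    return z * z - 1.0

def f_cv(z):    # f is convex, so it serves as its own convex relaxation
    return z * z - 1.0

def f_cc(z):    # secant of f through the endpoints of Z (constant, f(-2) = f(2) = 3)
    return 3.0

def F_cv(x):    # outer function F(x) = x**2 is convex on X = [-1, 3]
    return x * x

X_MIN = 0.0     # minimizer of F_cv on X

def g(z):
    return f(z) ** 2

def g_cv(z):
    """Convex relaxation of g according to Eq. (1)."""
    return F_cv(mid(f_cv(z), f_cc(z), X_MIN))

# Numerical sanity checks on a grid (not a proof).
grid = [Z_LO + (Z_HI - Z_LO) * k / 200 for k in range(201)]
underestimates = all(g_cv(z) <= g(z) + 1e-12 for z in grid)
midpoint_convex = all(
    g_cv(0.5 * (a + b)) <= 0.5 * (g_cv(a) + g_cv(b)) + 1e-12
    for a in grid for b in grid
)
```

In this example the relaxation equals zero for $|z|\le 1$, where $x^{\min }$ lies between $f^{cv}(z)$ and $f^{cc}(z)$, and coincides with $g$ elsewhere.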

A similar theorem exists for the concave relaxation. Below we give an equivalent, yet more convenient to generalize, definition of the McCormick relaxation.

Proposition 1

Let $g^{cv}: Z\rightarrow \mathbb {R}$

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x \in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\} \end{aligned}$$

(2)

For the function $\bar{g}^{cv}$ defined by (1) there holds

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=g^{cv}(\mathbf{z}) \end{aligned}$$

for all $\mathbf{z}\in Z$.

Proof

Note that we clearly have $f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})$ for all $\mathbf{z} \in Z$ and that since $f(Z)\subset X$ there holds $[f^{cv}(\mathbf{z}),f^{cc}(\mathbf{z})]\cap X\ne \emptyset $ for all $\mathbf{z}\in Z$. Furthermore, let $x^{\min }$ be a minimizer of $F^{cv}$ on $X$.

We consider all three cases. If

$$\begin{aligned} \text {mid} \{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=x^{\min } \end{aligned}$$

then we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}(x^{\min })=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

If on the other hand

$$\begin{aligned} \text {mid}\{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=f^{cv}(\mathbf{z} ) \end{aligned}$$

we note that $f^{cv}(\mathbf{z} ) \le f^{cc}(\mathbf{z} )$ and thus $x^{\min }\le f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})$. Since $F^{cv}$ is convex it must be nondecreasing for $x\ge x^{\min }$ [30]. In addition we have $x^{\min } \le f^{cv}(\mathbf{z})\le f(\mathbf{z})$ and therefore $f^{cv}(\mathbf{z})\in X$. Thus, we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}\circ f^{cv}(\mathbf{z})=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

Similarly, if

$$\begin{aligned} \text {mid}\{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=f^{cc}(\mathbf{z} ) \end{aligned}$$

we have $f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})\le x^{\min }$. Since $F^{cv}$ is convex it must be nonincreasing for $x\le x^{\min }$. In addition we have $f(\mathbf{z}) \le f^{cc}(\mathbf{z})\le x^{\min }$ and therefore $f^{cc}(\mathbf{z})\in X$. Thus, we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}(f^{cc}(\mathbf{z}))=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

$\square $
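The equivalence of (1) and (2) can also be checked numerically. The sketch below is only an illustration under assumptions: the outer relaxation `F_cv`, its minimizer `X_MIN`, and the randomly drawn pairs standing in for $f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})$ are hypothetical, and the minimization in (2) is carried out by a simple ternary search rather than by any method prescribed in the text.

```python
import random

def mid(a, b, c):
    """Median of three real numbers."""
    return sorted((a, b, c))[1]

def F_cv(x):
    # Hypothetical convex outer relaxation with minimizer x_min = 1 on X = [-4, 4].
    return abs(x - 1.0) + 0.5 * (x - 1.0) ** 2

X_MIN = 1.0

def min_on_interval(func, lo, hi, iters=200):
    """Minimize a convex univariate function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) <= func(m2):
            hi = m2
        else:
            lo = m1
    return func(0.5 * (lo + hi))

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    # a and b play the roles of f_cv(z) and f_cc(z) at some point z.
    a, b = sorted(random.uniform(-4.0, 4.0) for _ in range(2))
    via_mid = F_cv(mid(a, b, X_MIN))        # Eq. (1)
    via_min = min_on_interval(F_cv, a, b)   # Eq. (2)
    max_gap = max(max_gap, abs(via_mid - via_min))
```

Up to the tolerance of the search, `max_gap` should vanish, in line with Proposition 1.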

Theorem 2 gives a generalization of Theorem 1 for multivariate outer functions. Its proof makes use of Lemma 1, which we will also use in the development of subgradient propagation in Sect. 3. Note that $\partial g(x)$ denotes the subdifferential of $g$ at $x$, i.e., the set of all subgradients.

Lemma 1

[17] Let $f_1,\ldots ,f_m$ be $m$ convex functions from $\mathbb {R}^n \rightarrow \mathbb {R}$ and let $F$ be a convex and non-decreasing function from $\mathbb {R}^m \rightarrow \mathbb {R}$. Then $g(x)=F(f_1(x),\ldots ,f_m(x))$ is a convex function. Furthermore

$$\begin{aligned} \partial g(x)=\left\{ \sum _{i=1}^m \rho _i s_i: (\rho _1,\ldots ,\rho _m) \in \partial F(f_1(x),\ldots ,f_m(x)),s_i \in \partial f_i(x) \quad \forall i=1,\ldots ,m \right\} . \end{aligned}$$
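The composition rule of Lemma 1 can be sketched in code. The functions below are hypothetical choices made for illustration only: $F(y_1,y_2)=\max \{y_1,y_2\}$, which is convex and non-decreasing, $f_1(x)=x_1^2+x_2^2$ and $f_2(x)=|x_1|+2x_2$. The sketch picks one element of each subdifferential and checks the subgradient inequality at random points.

```python
import random

def f1(x):
    return x[0] ** 2 + x[1] ** 2

def f2(x):
    return abs(x[0]) + 2.0 * x[1]

def subgrad_f1(x):
    return (2.0 * x[0], 2.0 * x[1])

def subgrad_f2(x):
    sign = (x[0] > 0) - (x[0] < 0)   # 0 at the kink is a valid element of [-1, 1]
    return (float(sign), 2.0)

def F(y1, y2):
    return max(y1, y2)

def subgrad_F(y1, y2):
    # One subgradient of max; at a tie either unit vector is valid.
    return (1.0, 0.0) if y1 >= y2 else (0.0, 1.0)

def g(x):
    return F(f1(x), f2(x))

def subgrad_g(x):
    """One element of the subdifferential of g: sum_i rho_i * s_i."""
    rho = subgrad_F(f1(x), f2(x))
    s = (subgrad_f1(x), subgrad_f2(x))
    return tuple(rho[0] * s[0][k] + rho[1] * s[1][k] for k in range(2))

# Check g(y) >= g(x) + s^T (y - x) at random points (a sanity check, not a proof).
random.seed(1)
worst_slack = float("inf")
for _ in range(2000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    s = subgrad_g(x)
    slack = g(y) - g(x) - sum(s[k] * (y[k] - x[k]) for k in range(2))
    worst_slack = min(worst_slack, slack)
```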

Theorem 2

Let $Z\subset \mathbb {R}^n$ and $X \subset \mathbb {R}^m$ be nonempty compact convex sets. Consider the composite function $g=F(f_1(\mathbf z),\ldots ,f_m(\mathbf{z}))$, where $F:X\rightarrow \mathbb {R}$ and for $i\in I= \{1,\ldots ,m\}$, $f_i:Z \rightarrow \mathbb {R}$ are continuous functions, and let

$$\begin{aligned} \{\left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z})\right) | {\mathbf{z}} \in Z\}\subset X. \end{aligned}$$

(3)

Suppose that convex relaxations $f^{cv}_i: Z\rightarrow \mathbb {R} $ and concave relaxations $f^{cc}_i: Z\rightarrow \mathbb {R} $ of $f_i$ on $Z$ are known for every $i\in I$. Let $F^{cv}: X \rightarrow \mathbb {R}$ be a convex relaxation of $F$ on $X$ and $F^{cc}: X \rightarrow \mathbb {R}$ be a concave relaxation of $F$ on $X$. Then $g^{cv}: Z\rightarrow \mathbb {R}$,

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$

is a convex relaxation of $g$ on $Z$ and $g^{cc}: Z\rightarrow \mathbb {R}$,

$$\begin{aligned} g^{cc}(\mathbf{z})=\max _{\mathbf{x} \in X} \left\{ F^{cc}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$

is a concave relaxation of $g$ on $Z$.
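To make the convex part of Theorem 2 concrete, consider a hypothetical example that is not taken from the text: $F(x_1,x_2)=(x_1-x_2)^2$, which is convex so that $F^{cv}=F$, with $f_1(z)=z^2$ and $f_2(z)=e^z$ on $Z=[-1,2]$, and $X$ assumed large enough to contain the box of relaxation values. For this particular $F$ the inner minimization has a closed form, namely the squared gap between the intervals $[f_1^{cv},f_1^{cc}]$ and $[f_2^{cv},f_2^{cc}]$. A minimal sketch:

```python
import math

Z_LO, Z_HI = -1.0, 2.0

def secant(func, lo, hi):
    """Chord of func through (lo, func(lo)) and (hi, func(hi))."""
    slope = (func(hi) - func(lo)) / (hi - lo)
    return lambda z: func(lo) + slope * (z - lo)

def f1(z):
    return z * z

f2 = math.exp

# Both inner functions are convex: each is its own convex relaxation,
# and the secant over Z is a concave (affine) overestimator.
f1_cv, f1_cc = f1, secant(f1, Z_LO, Z_HI)
f2_cv, f2_cc = f2, secant(f2, Z_LO, Z_HI)

def g(z):
    return (f1(z) - f2(z)) ** 2

def g_cv(z):
    """min of (x1 - x2)**2 over the box [f1_cv, f1_cc] x [f2_cv, f2_cc]."""
    gap = max(0.0, f1_cv(z) - f2_cc(z), f2_cv(z) - f1_cc(z))
    return gap * gap

# Numerical sanity checks on a grid (not a proof).
grid = [Z_LO + (Z_HI - Z_LO) * k / 120 for k in range(121)]
underestimates = all(g_cv(z) <= g(z) + 1e-9 for z in grid)
midpoint_convex = all(
    g_cv(0.5 * (a + b)) <= 0.5 * (g_cv(a) + g_cv(b)) + 1e-9
    for a in grid for b in grid
)
```

Whenever the two intervals overlap the relaxation is zero; at the endpoints of $Z$, where the secants touch the functions, it coincides with $g$.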

Proof

First we prove that $g^{cv}$ underestimates $g$ on $Z$. Using (3) and the fact that $f^{cv}_i(\mathbf{z})\le f_i(\mathbf{z}) \le f^{cc}_i(\mathbf{z}) $ we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})&= \min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \\&\le \min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| x_i = f_i(\mathbf{z}), \quad \forall i\in I \right\} \\&= F^{cv} \left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z}) \right) \\&\le F \left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z}) \right) =g(\mathbf{z}). \end{aligned}$$

Next we prove that $g^{cv}$ is convex. Consider the function $h$ defined on $X \times X$ by

$$\begin{aligned} h({\varvec{\chi }}^{cv},{\varvec{\chi }}^{cc})=\min _{\mathbf{x} \in X \subset \mathbb {R}^m}\{ F^{cv}(\mathbf{x})| -\mathbf{x}\le -{\varvec{\chi }}^{cv},\mathbf{x} \le -{\varvec{\chi }}^{cc} \} \end{aligned}$$

with $F^{cv}$ convex.

From convexity of $F^{cv}$, the function $h$ is nondecreasing and convex, being the perturbation function of a convex problem [8]. Observing that

$$\begin{aligned} g^{cv}(\mathbf{z})=h\left( f_1^{cv}(\mathbf z),\ldots ,f_m^{cv}(\mathbf{z}),-f_1^{cc}(\mathbf z),\ldots ,-f_m^{cc}(\mathbf{z})\right) \end{aligned}$$

and applying Lemma 1 we obtain convexity of $g^{cv}$. Note that the negative sign on the right-hand side of the second constraint in the definition of $h$ was chosen to negate the concave terms $f_i^{cc}$, so that $g^{cv}$ decomposes into a convex function of convex functions. The proof for $g^{cc}$ is analogous. $\square $

We note that $g^{cv}/g^{cc}$ is not in general the convex/concave envelope of $g$ even if $F^{cv},f^{cv}$ are the convex envelopes and $F^{cc},f^{cc}$ the concave envelopes of $F,f$ respectively, see e.g., Fig. 1.

The definitions of $g^{cv}/g^{cc}$ at a point $\mathbf{z}$ involve the minimization/maximization of the convex/concave relaxation $F^{cv},F^{cc}$ of $F$, where the convex/concave relaxations are computed over $X$ and the optimization is over $X\cap B$, where $B$ is the box defined by $f_i^{cv}(\mathbf{z})$, $f_i^{cc}(\mathbf{z})$. This is typically a relatively easy convex problem to solve as $F^{cv}$, $F^{cc}$ are usually simple functions. In many cases, including the binary product of functions (Sect. 5), the solution can be described as a function of $\mathbf{z}$ in closed form.

Similarly to McCormick’s relaxations, nested functions can be handled by recursive application of the theorem and do not present any difficulty. The only requirement is the availability of closed form solutions or reliable algorithms to solve the convex problems.

For the rest of the paper, unless otherwise stated, we assume that

$$\begin{aligned} \left[ f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z})\right] \times \cdots \times \left[ f_m^{cv}(\mathbf{z}),f_m^{cc}(\mathbf{z})\right] \subset X. \end{aligned}$$

(4)

Note that this is without loss of generality as we can always take

$$\begin{aligned} \bar{f}_i^{cv}(\mathbf{z})=\max \left\{ f_i^{cv}(\mathbf{z}),\min _{\mathbf{x} \in X}x_i \right\} ,\quad \bar{f}_i^{cc}(\mathbf{z})=\min \left\{ f_i^{cc}(\mathbf{z}),\max _{\mathbf{x}\in X}x_i \right\} . \end{aligned}$$

More specifically, if we assume that $X$ is a box defined as $\left[ f_1^L,f_1^U\right] \times \cdots \times \left[ f_m^L,f_m^U\right] $ where $\left[ f_i^L,f_i^U\right] $ is an inclusion function of $f_i$ on $Z$ we can take

$$\begin{aligned} \bar{f}_i^{cv}(\mathbf{z})=\max \left\{ f_i^{cv}(\mathbf{z}),f_i^L \right\} ,\quad \bar{f}_i^{cc}(\mathbf{z})=\min \left\{ f_i^{cc}(\mathbf{z}),f_i^U \right\} . \end{aligned}$$
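The interval-based tightening above is straightforward to implement. The following sketch uses hypothetical data chosen for illustration: $f(z)=z^2$ on $Z=[-1,2]$ with bounds $[f^L,f^U]=[0,4]$, a tangent as convex underestimator and a secant as concave overestimator. The helper name `clip_relaxations` is an assumption.

```python
def clip_relaxations(f_cv, f_cc, f_lo, f_hi):
    """Return relaxations tightened with the interval bounds [f_lo, f_hi]."""
    def cv_bar(z):
        return max(f_cv(z), f_lo)   # max of a convex function and a constant is convex
    def cc_bar(z):
        return min(f_cc(z), f_hi)   # min of a concave function and a constant is concave
    return cv_bar, cc_bar

# Hypothetical data: f(z) = z**2 on Z = [-1, 2] with interval bounds [0, 4].
def f_cv(z):
    return 2.0 * z - 1.0            # tangent at z = 1, a valid convex underestimator

def f_cc(z):
    return z + 2.0                  # secant through (-1, 1) and (2, 4)

cv_bar, cc_bar = clip_relaxations(f_cv, f_cc, 0.0, 4.0)
```

At $z=-1$ the tangent gives $-3$, which lies below $f^L=0$; the clipped relaxation returns $0$ instead and remains a valid convex underestimator.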

Corollary 3 gives a simplified version of Theorem 2 in the case of monotonicity, which we also utilize to compute convex/concave relaxations for the minimum/maximum of two functions, Sect. 6.

Corollary 3

If in addition to the assumptions of Theorem 2, Assumption (4) holds and

1. $F^{cv}$ is monotonic increasing then
$$\begin{aligned} g^{cv}(\mathbf{z})=F^{cv}\left( f_1^{cv}(\mathbf{z}),\ldots ,f_m^{cv}(\mathbf{z})\right) \end{aligned}$$
is a convex relaxation of $g$.
2. $F^{cv}$ is monotonic decreasing then
$$\begin{aligned} g^{cv}(\mathbf{z})=F^{cv}\left( f_1^{cc}(\mathbf{z}),\ldots ,f_m^{cc}(\mathbf{z})\right) \end{aligned}$$
is a convex relaxation of $g$.
3. $F^{cc}$ is monotonic increasing then
$$\begin{aligned} g^{cc}(\mathbf{z})=F^{cc}\left( f_1^{cc}(\mathbf{z}),\ldots ,f_m^{cc}(\mathbf{z})\right) \end{aligned}$$
is a concave relaxation of $g$.
4. $F^{cc}$ is monotonic decreasing then
$$\begin{aligned} g^{cc}(\mathbf{z})=F^{cc}\left( f_1^{cv}(\mathbf{z}),\ldots ,f_m^{cv}(\mathbf{z})\right) \end{aligned}$$
is a concave relaxation of $g$.

The convexity of $g^{cv}$ and concavity of $g^{cc}$ in this case is well known, e.g., [8].
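Item 1 of Corollary 3 can be illustrated with a small sketch. The example is hypothetical and chosen for simplicity: $F(x_1,x_2)=\max \{x_1,x_2\}$, which is convex and non-decreasing so that $F^{cv}=F$, with $f_1(z)=-z^2$ and $f_2(z)=z-1.5$ on $Z=[-1,1]$.

```python
# Hypothetical example on Z = [-1, 1]: g(z) = max(-z**2, z - 1.5).
def f1(z):
    return -z * z

def f1_cv(z):
    return -1.0          # secant of the concave f1 through z = -1 and z = 1

def f2(z):
    return z - 1.5

f2_cv = f2               # f2 is affine, hence its own convex relaxation

F = max                  # convex and non-decreasing in each argument, so F_cv = F

def g(z):
    return F(f1(z), f2(z))

def g_cv(z):
    """Item 1 of Corollary 3: plug the convex relaxations into F_cv."""
    return F(f1_cv(z), f2_cv(z))

grid = [-1.0 + 2.0 * k / 100 for k in range(101)]
underestimates = all(g_cv(z) <= g(z) + 1e-12 for z in grid)
```

No inner optimization problem needs to be solved here, since monotonicity fixes the minimizer at the lower corner of the box.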

3 Subgradient propagation

Theorem 2 allows the evaluation of the convex/concave relaxation of an arbitrary composite function at a point $\mathbf{z}$, provided that convex/concave relaxations of the intrinsic functions are available. As demonstrated in Mitsos et al. [30] the calculation of subgradients of the convex/concave relaxations is useful. In this section the results of Mitsos et al. [30] are generalized to multivariate outer functions and to the entire subdifferential.

Lemma 2

[Adapted from strong duality theorem.] Consider the problem

$$\begin{aligned} h({\varvec{\chi }}^{cv},{\varvec{\chi }}^{cc})=\min _{\mathbf{x} \in X \subset \mathbb {R}^m}\{ F^{cv}(\mathbf{x})| -\mathbf{x}\le -{\varvec{\chi }}^{cv}, \mathbf{x} \le -{\varvec{\chi }}^{cc} \} \end{aligned}$$

with $F^{cv}$ convex. Then $\mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] $ is an optimal solution of the dual problem $D(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc})$ given by

$$\begin{aligned} \max _{({{\varvec{\lambda }}^{cv}},{\varvec{\lambda }}^{cc})} \min _{\mathbf{x} \in X}\{ F^{cv}(\mathbf{x})+{({\varvec{\lambda }}^{cv})}^{T} (-\mathbf{x}+\hat{\varvec{\chi }}^{cv}) + ({\varvec{\lambda }}^{cc})^{T}(\mathbf{x} + \hat{\varvec{\chi }}^{cc}) \} \end{aligned}$$

at $\hat{{\varvec{\chi }}}= \left[ \begin{array}{c} \hat{\varvec{\chi }}^{cv}\\ \hat{\varvec{\chi }}^{cc} \end{array}\right] $ with $\hat{{\varvec{\chi }}}^{cv}\le -\hat{{\varvec{\chi }}}^{cc}$, if and only if $\mathbf{u}\in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).$

Proof

The proof is based on the proof of the strong duality theorem in [14]. If $ \mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] \ge 0 $ solves the dual then

$$\begin{aligned} \min _{\mathbf{x} \in X} F^{cv}(\mathbf{x})+\mathbf{u}^T \left[ \begin{array}{c} -\mathbf{x}+\hat{\varvec{\chi }}^{cv} \\ \mathbf{x} + \hat{\varvec{\chi }}^{cc} \end{array} \right] =h(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc}). \end{aligned}$$

For any fixed $\bar{{\varvec{\chi }}}=\left[ \begin{array}{c} \bar{\varvec{\chi }}^{cv}\\ \bar{\varvec{\chi }}^{cc} \end{array}\right] \in \mathbb {R}^{2m}$, if $\mathbf{x} \in X$ with $\left[ \begin{array}{c} -\mathbf{x} \\ \mathbf{x} \end{array} \right] \le \left[ \begin{array}{c} -\bar{\varvec{\chi }}^{cv}\\ -\bar{\varvec{\chi }}^{cc} \end{array} \right] $, then there holds

$$\begin{aligned} F^{cv}(\mathbf{x})-\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})\ge F^{cv}(\mathbf{x})+\mathbf{u}^T \left[ \begin{array}{c} -\mathbf{x}+\hat{\varvec{\chi }}^{cv} \\ \mathbf{x} + \hat{\varvec{\chi }}^{cc} \end{array} \right] \ge h(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc}). \end{aligned}$$

Thus, for fixed $\bar{{\varvec{\chi }}}$

$$\begin{aligned} -\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})+h(\bar{{\varvec{\chi }}})= \begin{array}{cc} \displaystyle \min _{\mathbf{x} \in X} &{} F^{cv}(\mathbf{x})-\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})\\ \text {s.t.} &{} -\mathbf{x}\le -\bar{{\varvec{\chi }}}^{cv}\\ &{} \mathbf{x}\le -\bar{{\varvec{\chi }}}^{cc} \end{array}\ge h(\hat{{\varvec{\chi }}}) \end{aligned}$$

and $\mathbf{u}\in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).$

For the converse assume $\mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] \in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).$ Noting that $h$ is non-decreasing, let $\mathbf{e}^i$ denote the $i$th unit vector and observe

$$\begin{aligned} h(\hat{\varvec{\chi }})\ge h(\hat{\varvec{\chi }}-\mathbf{e}^i)\ge h(\hat{\varvec{\chi }}) -u_i \end{aligned}$$

for all $i$, and thus $\mathbf{u}\ge \mathbf{0}$ and $\mathbf{u}$ is dual feasible. For any $\tilde{\mathbf{x}}\in X$ let $\tilde{{\varvec{\chi }}}=\left[ \begin{array}{c} \tilde{\mathbf{x}} \\ -\tilde{\mathbf{x}} \end{array} \right] $. We have

$$\begin{aligned} F^{cv}(\tilde{\mathbf{x}})= h(\tilde{{\varvec{\chi }}})\ge h(\hat{{\varvec{\chi }}}) +\mathbf{u}^T (\tilde{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})=h(\hat{{\varvec{\chi }}}) +\mathbf{u}^T \left[ \begin{array}{c} \tilde{\mathbf{x}}-\hat{{\varvec{\chi }}}^{cv}\\ -\tilde{\mathbf{x}}-\hat{{\varvec{\chi }}}^{cc} \end{array} \right] . \end{aligned}$$

or rearranging

$$\begin{aligned} h(\hat{{\varvec{\chi }}})\le F^{cv}(\tilde{\mathbf{x}}) +\mathbf{u}^T \left[ \begin{array}{c} -\tilde{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cv}\\ \tilde{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cc} \end{array} \right] , \end{aligned}$$

and therefore

$$\begin{aligned} h(\hat{{\varvec{\chi }}})\le \min _{\mathbf{x}\in X} F^{cv}(\mathbf{x}) +\mathbf{u}^T \left[ \begin{array}{c} -{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cv}\\ {\mathbf{x}}+\hat{{\varvec{\chi }}}^{cc} \end{array} \right] . \end{aligned}$$

On the other hand, weak duality yields the opposite inequality and thus equality holds and $\mathbf{u}$ is optimal.$\square $
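Lemma 2 can be illustrated on a one-dimensional instance. The sketch below assumes, purely for illustration, $F^{cv}(x)=x^2$ and a set $X$ large enough not to be active, so that $h(\chi ^{cv},\chi ^{cc})=\min \{x^2\,|\,\chi ^{cv}\le x\le -\chi ^{cc}\}$ has a closed-form solution. It computes the multipliers from the stationarity condition $2x-\lambda ^{cv}+\lambda ^{cc}=0$ and checks on a grid that they satisfy the subgradient inequality for $h$.

```python
def clip(x, lo, hi):
    return max(lo, min(x, hi))

def h(chi_cv, chi_cc):
    """min x**2 subject to chi_cv <= x <= -chi_cc (hypothetical F_cv(x) = x**2)."""
    x_star = clip(0.0, chi_cv, -chi_cc)
    return x_star ** 2

def multipliers(chi_cv, chi_cc):
    """Multipliers (lam_cv, lam_cc) from stationarity 2*x - lam_cv + lam_cc = 0."""
    x_star = clip(0.0, chi_cv, -chi_cc)
    lam_cv = 2.0 * x_star if (x_star == chi_cv and x_star > 0) else 0.0
    lam_cc = -2.0 * x_star if (x_star == -chi_cc and x_star < 0) else 0.0
    return lam_cv, lam_cc

def worst_subgradient_slack(chi_hat, grid):
    """Smallest value of h(chi) - h(chi_hat) - u^T (chi - chi_hat) over the grid."""
    u = multipliers(*chi_hat)
    worst = float("inf")
    for a in grid:
        for c in grid:
            if a <= -c:   # keep only perturbations with a nonempty feasible set
                rhs = h(*chi_hat) + u[0] * (a - chi_hat[0]) + u[1] * (c - chi_hat[1])
                worst = min(worst, h(a, c) - rhs)
    return worst

grid = [-4.0 + 0.25 * k for k in range(33)]
slack_lower_active = worst_subgradient_slack((1.0, -3.0), grid)   # feasible set [1, 3]
slack_upper_active = worst_subgradient_slack((-3.0, 1.0), grid)   # feasible set [-3, -1]
```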

Theorem 4

The subdifferential of $g^{cv}$ at $\hat{\mathbf{z}}$ is given by

$$\begin{aligned} \partial g^{cv}(\hat{\mathbf{z}})= \left\{ \begin{array}{c} \sum \limits _{i=1}^m \rho _i^{cv} s_i^{cv}-\rho _i^{cc} s_i^{cc}| \end{array} \begin{array}{c} \left( \rho _1^{cv},\ldots ,\rho _m^{cv},\rho _1^{cc},\ldots ,\rho _m^{cc}\right) \in \Lambda (\hat{\mathbf{z}}),\\ s_i^{cv} \in \partial f_i^{cv}(\hat{\mathbf{z}}),s_i^{cc} \in \partial f_i^{cc}(\hat{\mathbf{z}}) \quad \forall i=1,\ldots ,m \end{array} \right\} , \end{aligned}$$

where

$$\begin{aligned} \Lambda (\hat{\mathbf{z}})&= \mathop {\mathrm{arg\,max}}\limits _{({\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc})} \left\{ \min _{\mathbf{x} \in X} L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},\hat{\mathbf{z}})\right\} ,\\ L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},\hat{\mathbf{z}})&= F^{cv}(\mathbf{x})+\sum _{i=1}^m \left[ {\lambda }_i^{cv} \left( -x_i+f^{cv}_i(\hat{\mathbf{z}})\right) + \lambda _i^{cc}\left( x_i - f^{cc}_i(\hat{\mathbf{z}})\right) \right] . \end{aligned}$$

Proof

The subdifferential of the convex function $-f_i^{cc}$ is given by

$$\begin{aligned} \partial (-f_i^{cc})(\hat{\mathbf{z}})=\left\{ \mathbf{s}: -\mathbf{s}\in \partial (f_i^{cc})(\hat{\mathbf{z}}) \right\} . \end{aligned}$$

Since

$$\begin{aligned} g^{cv}(\hat{\mathbf{z}})=h(f_1^{cv}(\hat{\mathbf{z}}),\ldots ,f_m^{cv}(\hat{\mathbf{z}}),-f_1^{cc}(\hat{\mathbf{z}}),\ldots ,-f_m^{cc}(\hat{\mathbf{z}})) \end{aligned}$$

the result follows from Lemmata 1 and 2. $\square $
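The propagation rule of Theorem 4 can be sketched for a univariate outer function. The example is hypothetical: $F^{cv}(x)=x^2$ with $X$ assumed large enough not to be active, $f(z)=1-z^2$ on $Z=[-2,2]$, concave relaxation $f^{cc}=f$ with gradient $-2z$, and convex relaxation given by the constant secant $f^{cv}=-3$. The multipliers follow from the stationarity condition $2x-\lambda ^{cv}+\lambda ^{cc}=0$; all names are illustrative assumptions.

```python
def clip(x, lo, hi):
    return max(lo, min(x, hi))

def relax_square(f_cv, f_cc, s_cv, s_cc, z):
    """Value and one subgradient of g_cv(z) = min{x**2 : f_cv(z) <= x <= f_cc(z)}."""
    a, b = f_cv(z), f_cc(z)
    x_star = clip(0.0, a, b)
    lam_cv = max(0.0, 2.0 * a)     # positive only if the lower bound is active (a > 0)
    lam_cc = max(0.0, -2.0 * b)    # positive only if the upper bound is active (b < 0)
    subgrad = lam_cv * s_cv(z) - lam_cc * s_cc(z)
    return x_star ** 2, subgrad

# Hypothetical inner function f(z) = 1 - z**2 on Z = [-2, 2].
def f_cv(z):
    return -3.0                    # secant of the concave f through z = -2 and z = 2

def s_cv(z):
    return 0.0                     # gradient of the constant secant

def f_cc(z):
    return 1.0 - z * z             # f is concave, hence its own concave relaxation

def s_cc(z):
    return -2.0 * z                # gradient of f_cc

value, subgrad = relax_square(f_cv, f_cc, s_cv, s_cc, 1.5)

# Subgradient inequality on a grid: g_cv(y) >= g_cv(z) + s * (y - z).
grid = [-2.0 + 4.0 * k / 80 for k in range(81)]
worst_slack = float("inf")
for z in grid:
    gz, sz = relax_square(f_cv, f_cc, s_cv, s_cc, z)
    for y in grid:
        gy, _ = relax_square(f_cv, f_cc, s_cv, s_cc, y)
        worst_slack = min(worst_slack, gy - gz - sz * (y - z))
```

At $z=1.5$ the upper bound is active, and the propagated subgradient agrees with the derivative $4z(z^2-1)$ of $(1-z^2)^2$.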

We note that in some cases, including fractional (Sect. 7) and multilinear (Sect. 5) terms, a tighter convex relaxation $F^{cv}$ of $F$ than the ones available in closed form, can be calculated through a convex optimization problem of the form

$$\begin{aligned} F^{cv}(\mathbf{x})=\left\{ \min _{\mathbf{w}\in W} r_1(\mathbf{x},\mathbf{w})| \mathbf{r}_2(\mathbf{x},\mathbf{w})\le 0 \right\} , \end{aligned}$$

with $W\subset \mathbb {R}^{n_w}$, $r_1:X \times W \rightarrow \mathbb {R}$, $\mathbf{r}_2:X \times W \rightarrow \mathbb {R}^{n_r}$. The convex underestimator of $g$ will be given by

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{} \displaystyle \min _{\mathbf{x} \in X,\mathbf{w} \in W} &{} r_1(\mathbf{x},\mathbf{w})\\ &{}\text {s.t.}&{} \mathbf{r}_2(\mathbf{x},\mathbf{w})\le 0 \\ &{} &{}f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}), \quad \forall i \end{array}, \end{aligned}$$

(5)

where the defining problem has Lagrangian

$$\begin{aligned} \bar{L}(\mathbf{x}, {\mathbf{w}},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},{\varvec{\mu }})= r_1(\mathbf{x},\mathbf{w})+ \sum _{i\in I} \lambda ^{cv}_i \left( f^{cv}_i({\mathbf{z}}){-}x_i\right) +\sum _{i\in I} \lambda ^{cc}_i \left( x_i{-}f^{cc}_i({\mathbf{z}})\right) +{\varvec{\mu }}^T \mathbf{r}_2(\mathbf{x},\mathbf{w}). \end{aligned}$$

(6)

We can calculate a subgradient of $g^{cv}(\mathbf{z})$ via Theorem 4, using the Lagrange multipliers associated with the constraints $f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z})$. This is formalized in Proposition 2.

Proposition 2

If strong duality holds for (5) and $\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\left( \hat{\varvec{\lambda }}^{cv},\hat{\varvec{\lambda }}^{cc},\hat{\varvec{\mu }}\right) \right) $ is an optimal primal dual pair of (5) then $\left( \hat{\mathbf{x}},\left( \hat{\varvec{\lambda }}^{cv},\hat{\varvec{\lambda }}^{cc}\right) \right) $ is an optimal primal dual pair for the problem

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$

(7)

with Lagrangian

$$\begin{aligned} L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc})= F^{cv}(\mathbf{x})+ \sum _{i\in I} \lambda ^{cv}_i \left( f^{cv}_i({\mathbf{z}})-x_i\right) +\sum _{i\in I} \lambda ^{cc}_i \left( x_i-f^{cc}_i({\mathbf{z}})\right) , \end{aligned}$$

(8)

where the constraints defining the box $X$ are not dualized.

Proof

From strong duality we have

$$\begin{aligned}&\displaystyle \bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) =r_1(\hat{\mathbf{x}},\hat{\mathbf{w}}),\end{aligned}$$

(9)

$$\begin{aligned}&\displaystyle \mathbf{r}_2(\hat{\mathbf{x}},\hat{\mathbf{w}})\le 0,\nonumber \\&\displaystyle f^{cv}_i( {\mathbf{z}})\le \hat{x}_i \le f^{cc}_i( {\mathbf{z}}) \quad \text {for all } i,\end{aligned}$$

(10)

$$\begin{aligned}&\displaystyle \hat{{\varvec{\mu }}}^T \mathbf{r}_2(\hat{\mathbf{x}},\hat{\mathbf{w}})=0,\end{aligned}$$

(11)

$$\begin{aligned}&\displaystyle \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\hat{x}_i\right) =0, \quad \hat{\lambda }^{cc}_i \left( \hat{x}_i-f^{cc}_i({\mathbf{z}})\right) =0\quad \text {for all } i. \end{aligned}$$

(12)

Using (12), (9) we obtain

$$\begin{aligned} L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) =\bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) . \end{aligned}$$

Keeping in mind (12) and (10), to show that $\left( \hat{\mathbf{x}},\left( \hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \right) $ is an optimal point of (7) together with its corresponding Lagrange multipliers, we only need

$$\begin{aligned} \hat{\mathbf{x}} \in \arg \min _{\mathbf{x}\in X} L\left( \mathbf{x},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) . \end{aligned}$$

Assume to the contrary that there exists an $\bar{\mathbf{x}}\in X$ with

$$\begin{aligned} L\left( \bar{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) <L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \end{aligned}$$

and let

$$\begin{aligned} \bar{\mathbf{w}}\in \begin{array}{cc} \displaystyle \arg \min _{\mathbf{w}} &{} r_1(\bar{\mathbf{x}},\mathbf{w})\\ \text {s.t.}&{} \mathbf{r}_2(\bar{\mathbf{x}},\mathbf{w})\le 0 \\ \end{array}. \end{aligned}$$

We have $r_1(\bar{\mathbf{x}},\bar{\mathbf{w}})=F^{cv}(\bar{\mathbf{x}})$ and $\mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\le 0$. Then we have

$$\begin{aligned} \bar{L}\left( \bar{\mathbf{x}},\bar{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right)&= r_1(\bar{\mathbf{x}},\bar{\mathbf{w}}) + \sum _{i\in I} \left( \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\bar{x}_i\right) \right) \\&+\sum _{i\in I} \left( \hat{\lambda }^{cc}_i \left( \bar{x}_i-f^{cc}_i({\mathbf{z}})\right) \right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&= F^{cv}(\bar{\mathbf{x}})+\sum _{i\in I} \left( \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\bar{x}_i\right) \right) \\&+\sum _{i\in I} \left( \hat{\lambda }^{cc}_i \left( \bar{x}_i-f^{cc}_i({\mathbf{z}})\right) \right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&= L\left( \bar{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&< L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&\le L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \\&= \bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) , \end{aligned}$$

which is a contradiction since

$$\begin{aligned} (\hat{\mathbf{x}},\hat{\mathbf{w}}) \in \arg \min _{\mathbf{x} \in X,\mathbf{w}\in W} \bar{L}\left( {\mathbf{x}}, {\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) . \end{aligned}$$

$\square $

4 McCormick relaxations and the auxiliary variable method

In this section we revisit the relationship between McCormick relaxations and the AVM [41, 42]. AVM lies at the heart of the state-of-the-art software BARON [35, 36], and handles composite functions implicitly by substituting argument functions with auxiliary variables.

While it is well known that both methods provide lower bounding mechanisms for factorable functions, the restatement of McCormick relaxations in Theorem 1 and the subsequent generalization make the relationship between the two approaches explicit and reduce the occasional gap between the resulting relaxations.

As mentioned in the introduction, an advantage of AVM over McCormick’s approach is the potentially tighter bounds when terms are repeated. While multivariate McCormick can provide better bounds than univariate McCormick, it can still be weaker than AVM for the same reason. Multivariate McCormick can provide tighter bounds than AVM when tighter convex relaxations can be made practically available through optimization problems, as is the case for the fractional terms discussed in Sect. 7.

McCormick’s approach allows for optimization of the bounding problem in the original space. While there is no general rule dictating that a smaller number of variables will lead to superior performance, it has been demonstrated that in a class of problems with few variables and complex expressions, operating in the original space can give a drastic improvement of CPU time [30].

To illustrate the relationship of the two methodologies, consider the functions $\displaystyle f_1: Z_1\subset \mathbb {R}^2\rightarrow \mathbb {R}$, $\displaystyle f_2: Z_2\subset \mathbb {R}\rightarrow \mathbb {R}$, $\displaystyle f_3: Z_3\subset \mathbb {R}\rightarrow \mathbb {R}$, $\displaystyle f_4: Z_4\subset \mathbb {R}\rightarrow \mathbb {R}$, and the composite function

$$\begin{aligned} g(z)=f_1(f_3(z),f_4(z))+f_2(f_3(z)), \end{aligned}$$

$z\in Z\subset \mathbb {R}.$ Assume that for all intrinsic functions $f_i$, convex and concave relaxations $f_i^{cv}$, $f_i^{cc}$ on $Z_i$ are available. Furthermore assume that $Z_i$ are boxes and that $Z \subset Z_3$, $Z \subset Z_4$, $f_3(Z_3)\times f_4(Z_4)\subset Z_1$, $f_3(Z_3) \subset Z_2$. Note that the univariate McCormick theorem cannot handle this directly.

To solve $\{\min _{z\in Z} g(z)\}$ AVM could formulate the problem in two different ways, depending on whether it would recognize the common term $f_3(z)$.

$$\begin{aligned} \begin{array}{cc|cc} \text {Formulation 1} &{} &{}\text {Formulation 2}&{}\\ \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1\\ w_2\in Z_2,w_3\in Z_3\\ w_3'\in Z_3,w_4\in Z_4 \end{array}}&{} w_1+w_2 &{}\displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_4\in Z_4 \end{array}}&{} w_1+w_2\\ s.t. &{} \displaystyle w_1=f_1(w_3,w_4) &{}s.t. &{} \displaystyle w_1=f_1(w_3,w_4)\\ &{}\displaystyle w_2=f_2(w_3')&{} &{}\displaystyle w_2=f_2(w_3)\\ &{}\displaystyle w_4=f_4(z)&{} &{}\displaystyle w_4=f_4(z)\\ &{}\displaystyle w_3=f_3(z)&{} &{}\displaystyle w_3=f_3(z)\\ &{}\displaystyle w'_3=f_3(z) &{} &{} \end{array}. \end{aligned}$$

(13)

with corresponding convex relaxations

$$\begin{aligned} \begin{array}{cc|cc} \text {Formulation 1} &{} &{}\text {Formulation 2}&{}\\ \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_3'\in Z_3,w_4\in Z_4 \end{array}}&{} w_1+w_2 &{} \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_4\in Z_4 \end{array}}&{} w_1+w_2\\ s.t. &{} f_1^{cv}(w_3,w_4)\le w_1\le f_1^{cc}(w_3,w_4) &{}s.t. &{} f_1^{cv}(w_3,w_4)\le w_1\le f_1^{cc}(w_3,w_4)\\ &{} f_2^{cv}(w_3')\le w_2\le f_2^{cc}(w_3')&{} &{} f_2^{cv}(w_3)\le w_2\le f_2^{cc}(w_3)\\ &{} f_4^{cv}(z)\le w_4\le f_4^{cc}(z)&{} &{} f_4^{cv}(z)\le w_4\le f_4^{cc}(z)\\ &{} f_3^{cv}(z)\le w_3\le f_3^{cc}(z)&{} &{} f_3^{cv}(z)\le w_3\le f_3^{cc}(z)\\ &{} f_3^{cv}(z)\le w_3'\le f_3^{cc}(z)&{}&{} \end{array}. \end{aligned}$$

(14)

Formulation 2 is tighter and will likely give a better bound. It is not hard to see that multivariate McCormick gives the same bound as Formulation 1 by solving the problem

$$\begin{aligned} \min _z\left\{ \begin{array}{cc} \displaystyle \min _{\hat{w}_1,\hat{w}_2}&{} \hat{w}_1+\hat{w}_2 \\ s.t.&{} \begin{array}{ccc} \left( \begin{array}{cc} \displaystyle \min _{\hat{w}_3,\hat{w}_4} &{} f_1^{cv}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}_3 \le f_3^{cc}(z) \\ &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) &{}{\le } \hat{w}_1 {\le } &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}_3,\hat{w}_4} &{} f_1^{cc}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}_3 \le f_3^{cc}(z) \\ &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) \\ \left( \begin{array}{cc} \displaystyle \min _{\hat{w}'_3} &{} f_2^{cv}(\hat{w}'_3)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}'_3 \le f_3^{cc}(z) \\ \end{array} \right) &{}\le \hat{w}_2 \le &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}'_3} &{} f_2^{cc}(\hat{w}'_3)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}'_3 \le f_3^{cc}(z) \\ \end{array} \right) \end{array}\\ \end{array} \right\} . \end{aligned}$$

(15)

Equation (15) can be interpreted as a decomposition method for the first formulation of (14). In general all inner problems will be easy to solve analytically and a numerical algorithm will only be needed to minimize the resulting relaxation with respect to the original variable $z$.

In the multivariate McCormick framework it is possible to introduce just sufficiently many artificial variables to improve the resulting relaxation and match the AVM. In our example this would yield the problem

$$\begin{aligned} \min _{z,w_3}\left\{ \begin{array}{cc} \displaystyle \min _{\hat{w}_1,\hat{w}_2}&{} \hat{w}_1+\hat{w}_2 \\ s.t.&{} \begin{array}{ccc} \left( \begin{array}{cc} \displaystyle \min _{\hat{w}_3,\hat{w}_4} &{} f_1^{cv}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) &{}\le \hat{w}_1 \le &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}_3,\hat{w}_4} &{} f_1^{cc}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) \\ f_3^{cv}(z)&{}\le w_3 \le &{} f_3^{cc}(z) \\ \end{array}\\ \end{array} \right\} \!. \end{aligned}$$

(16)

This yields the same bound as AVM while increasing the optimization space by a single variable.

It is instructive to consider only one level of composition, i.e., a function $F(f(\mathbf{z}))$, to compare the two methodologies. As it turns out, using a cutting plane algorithm to minimize McCormick relaxations with the subgradient propagation mechanism of Sect. 3 is strongly related to applying generalized Benders decomposition [15] to the lower bounding problem defined by AVM.

To minimize $F(f(\mathbf{z}))$ AVM would formulate the problem

$$\begin{aligned} \min _{\mathbf{x} \in X,\mathbf{z}\in Z} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} . \end{aligned}$$

(17)

If we apply (generalized) Benders decomposition on (17) treating ${\mathbf{z}}$ as the “complicating” variables, the master problem is

$$\begin{aligned}&\min _{\mathbf{z}\in Z} V(\mathbf{z}),\\&\text {where}\quad V(\mathbf{z})= \max _{\lambda ^{cv}\ge 0,\lambda ^{cc}\ge 0}\min _{\mathbf{x}\in X}\left\{ F^{cv}(\mathbf{x})+\sum _i\lambda ^{cv}_i \left( f^{cv}_i(\mathbf{z})-\mathbf{x}\right) +\sum _i \lambda ^{cc}_i \left( \mathbf{x}-f^{cc}_i(\mathbf{z})\right) \right\} \end{aligned}$$

for all $\mathbf{z}$.

The restricted master $V_r(\mathbf{z})$ employs a subset $\Lambda $ of multipliers. A $\hat{\mathbf{z}}$ obtained by solving the restricted master will be suboptimal if

$$\begin{aligned} V_r(\hat{\mathbf{z}})< \max _{\lambda ^{cv}\ge 0,\lambda ^{cc}\ge 0} \min _{\mathbf{x}\in X} \left\{ F^{cv}(\mathbf{x})+\sum _i \lambda ^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})-\mathbf{x}\right) +\sum _i \lambda ^{cc}_i \left( \mathbf{x}-f^{cc}_i(\hat{\mathbf{z}})\right) \right\} . \end{aligned}$$

(18)

The cut obtained is

$$\begin{aligned} V(\mathbf{z})\ge \min _{\mathbf{x} \in X} F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i \left( f^{cv}_i(\mathbf{z})-\mathbf{x}\right) +\sum _i \hat{\lambda }^{cc}_i \left( \mathbf{x}-f^{cc}_i(\mathbf{z})\right) , \end{aligned}$$

(19)

where

$$\begin{aligned} \left( \hat{\lambda }^{cv},\hat{\lambda }^{cc}\right) \in \mathop {\mathrm{arg\,max}}\limits _{\lambda ^{cv}\ge 0,\lambda ^{cc}\ge 0} \min _\mathbf{x\in X} \left\{ F^{cv}(\mathbf{x})+\sum _i \lambda ^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})-\mathbf{x}\right) +\sum _i \lambda ^{cc}_i \left( \mathbf{x}-f^{cc}_i(\hat{\mathbf{z}})\right) \right\} , \end{aligned}$$

cutting off $\hat{\mathbf{z}}$. The generalized Benders cut (19) can be further relaxed by linearization around $\hat{\mathbf{z}}$, yielding

$$\begin{aligned} V(\mathbf{z})&\ge \min _{\mathbf{x} \in X} F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})+\mathbf{s}^{cv}_i(\mathbf{z}{-}\hat{\mathbf{z}}){-}\mathbf{x} \right) {-}\sum _i \hat{\lambda }^{cc}_i\left( f^{cc}_i(\hat{\mathbf{z}})+\mathbf{s}^{cc}_i(\mathbf{z}{-}\hat{\mathbf{z}}){-}\mathbf{x} \right) \\&= \sum _i \left( \hat{\lambda }^{cv}_i \mathbf{s}^{cv}_i-\hat{\lambda }^{cc}_i \mathbf{s}^{cc}_i\right) (\mathbf{z}-\hat{\mathbf{z}})+\min _{\mathbf{x} \in X} F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i f^{cv}_i(\hat{\mathbf{z}})-\sum _i\hat{\lambda }^{cc}_i f^{cc}_i(\hat{\mathbf{z}})\\&=g^{cv}(\hat{\mathbf{z}})+\sum _i \left( \hat{\lambda }^{cv}_i \mathbf{s}^{cv}_i-\hat{\lambda }^{cc}_i \mathbf{s}^{cc}_i\right) (\mathbf{z}-\hat{\mathbf{z}}), \end{aligned}$$

where $\mathbf{s}^{cv}_i$, $\mathbf{s}^{cc}_i$ are subgradients of $f^{cv}_i$, $f^{cc}_i$ at $\hat{\mathbf{z}}$. It can be seen that the linearized cut is equivalent to a subgradient inequality for $g^{cv}$ as obtained by Theorem 4. Note that the generalized Benders subproblem (18) generating the cut is identical to (the dual of) the problem solved to provide a function evaluation and generate the subgradient.

Therefore, in the single level composition, applying generalized Benders decomposition to AVM is equivalent to minimizing $g(\mathbf{z})$ through a first-order algorithm which at iteration $k+1$ chooses point $\mathbf{z}_{k+1}$ for evaluation by solving the linear relaxation

$$\begin{aligned} \min _{w,\mathbf{z}} \left\{ w: w\ge g(\mathbf{z}_i)+\mathbf{s}_i^T (\mathbf{z}-\mathbf{z}_i), \quad \forall i\le k \right\} , \end{aligned}$$

where $g(\mathbf{z}_i)$, $\mathbf{s}_i$ are the function evaluation and subgradient returned by the oracle at iteration $i$. This is not a very efficient algorithm to minimize $g(\mathbf{z})$ and here we use it just to illustrate the equivalence. More efficient first-order methods can be found, for example, in [31].
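The first-order scheme just described can be sketched for a scalar $z$ as follows. This is only an illustration under stated assumptions: the oracle `toy_oracle` is a hypothetical stand-in for the evaluation of the relaxation and its subgradient, all function names are invented here, and the piecewise-linear master problem is solved by enumerating breakpoints rather than by an LP solver.

```python
def solve_master(cuts, lo, hi):
    """Minimize max_i (g_i + s_i * (z - z_i)) over z in [lo, hi].

    In one dimension the minimizer of a convex piecewise-linear model lies at
    an endpoint or at the intersection of two cuts, so those are enumerated.
    """
    def model(z):
        return max(g + s * (z - zi) for zi, g, s in cuts)

    candidates = [lo, hi]
    for a in range(len(cuts)):
        for b in range(a + 1, len(cuts)):
            za, ga, sa = cuts[a]
            zb, gb, sb = cuts[b]
            if sa != sb:
                # Solve ga + sa * (z - za) = gb + sb * (z - zb) for z.
                z = (gb - ga + sa * za - sb * zb) / (sa - sb)
                if lo <= z <= hi:
                    candidates.append(z)
    z_best = min(candidates, key=model)
    return z_best, model(z_best)


def cutting_plane(oracle, lo, hi, z0, tol=1e-8, max_iter=100):
    """Cutting-plane loop: evaluate the oracle, add a cut, re-solve the master."""
    cuts = []
    z, best_z = z0, z0
    best_val, lower = float("inf"), float("-inf")
    for _ in range(max_iter):
        val, slope = oracle(z)          # function value and subgradient at z
        cuts.append((z, val, slope))
        if val < best_val:
            best_z, best_val = z, val
        z, lower = solve_master(cuts, lo, hi)   # next trial point, lower bound
        if best_val - lower <= tol:
            break
    return best_z, best_val, lower


def toy_oracle(z):
    """Hypothetical convex test function (z - 0.5)^2 with its derivative."""
    return (z - 0.5) ** 2, 2.0 * (z - 0.5)


z_star, upper, lower = cutting_plane(toy_oracle, -1.0, 2.0, z0=-1.0)
print(z_star, upper, lower)
```

The value returned by the master problem is a lower bound on the minimum, and the best oracle value is an upper bound, so the loop stops once the two agree within the tolerance.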

We note that in the (univariate and multivariate) McCormick relaxation framework, it is straightforward to apply Theorem 4 recursively to generate subgradients for nested compositions of functions. This is not to say that it would be impossible to construct equivalent nested decomposition schemes for AVM, which has a staircase structure. In the context of stochastic programming, Birge [6] explores nested decomposition schemes for LPs. In the presence of NLPs with a staircase structure, O’Neill [32] proposes a decomposition framework combining primal and dual decomposition ideas, which is however rather involved.

Note also that the advantage of retaining the original variable space is important only in the case of such complex expressions that would need an introduction of a great number of variables to construct the AVM equivalent. Thus, the above observations on the equivalence of the two approaches for a single composition are mainly of theoretical interest.

5 Product rule

Interesting examples of multivariate composition are products of functions. Note that bilinear terms and bilinear products of functions are very important in applications, see for instance the recent articles [19, 28]. Let $\mathrm{mult}(x_1,x_2)=x_1 x_2$. As given in [3, 23], the convex/concave envelopes of $\mathrm{mult}(\cdot ,\cdot )$ on $\left[ x_1^L,x_1^U\right] \times \left[ x_2^L,x_2^U\right] $ are

$$\begin{aligned} \mathrm{mult}^{cv}&= \max \left\{ x_2^U x_1+x_1^U x_2- x_1^U x_2^U, x_2^L x_1+x_1^L x_2- x_1^L x_2^L \right\} ,\\ \mathrm{mult}^{cc}&= \min \left\{ x_2^L x_1+x_1^U x_2- x_1^U x_2^L, x_2^U x_1+x_1^L x_2- x_1^L x_2^U \right\} . \end{aligned}$$
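As a minimal sketch, the two envelopes can be coded directly from the formulas above. The function names `mult_cv` and `mult_cc` and the sample box are assumptions made for illustration only.

```python
def mult_cv(x1, x2, x1L, x1U, x2L, x2U):
    """Convex envelope of x1 * x2 on [x1L, x1U] x [x2L, x2U]."""
    return max(x2U * x1 + x1U * x2 - x1U * x2U,
               x2L * x1 + x1L * x2 - x1L * x2L)


def mult_cc(x1, x2, x1L, x1U, x2L, x2U):
    """Concave envelope of x1 * x2 on [x1L, x1U] x [x2L, x2U]."""
    return min(x2L * x1 + x1U * x2 - x1U * x2L,
               x2U * x1 + x1L * x2 - x1L * x2U)


box = (-1.0, 2.0, -3.0, 1.0)  # illustrative bounds x1L, x1U, x2L, x2U
x1, x2 = 0.5, -1.0
print(mult_cv(x1, x2, *box), x1 * x2, mult_cc(x1, x2, *box))
```

At every point of the box the product is bracketed by the two values, and at the four vertices both envelopes coincide with the product.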

Theorem 2 directly gives convex/concave relaxations for the product of two functions.

Corollary 5

Let $\displaystyle g(\mathbf{z})=\mathrm{mult}(f_1(\mathbf{z}),f_2(\mathbf{z}))$, with $\displaystyle f_1:Z\subset \mathbb {R}^n \rightarrow \mathbb {R}$, $\displaystyle f_2:Z \subset \mathbb {R}^n \rightarrow \mathbb {R}$. Let also $f_i^L$, $f_i^U$ denote bounds for $f_i$, i.e., $ f_i^L\le f_i(\mathbf{z})\le f_i^U$ and $\displaystyle f^{cv}_i$, $f^{cc}_i$ convex and concave relaxations of $f_i$ on $Z$ respectively.

$$\begin{aligned} \begin{array}{ccc}g^{cv}(\mathbf{z})=&{} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle \max \left\{ f_2^U x_1+f_1^U x_2- f_1^U f_2^U, f_2^L x_1+f_1^L x_2- f_1^L f_2^L \right\} \\ &{}\text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$

(20)

is a convex relaxation of $g$ on $Z$ and

$$\begin{aligned} \begin{array}{ccc}g^{cc}(\mathbf{z})=&{} \displaystyle \max _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle \min \left\{ f_2^L x_1+f_1^U x_2- f_1^U f_2^L, f_2^U x_1+f_1^L x_2- f_1^L f_2^U \right\} \\ &{}\text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$

(21)

is a concave relaxation of $g$ on $Z$.

The convex relaxation for $g$ that McCormick proposed [23] is

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=\max \left\{ \alpha _1(\mathbf{z})+\alpha _2(\mathbf{z})-f_1^L f_2^L,\beta _1(\mathbf{z})+\beta _2(\mathbf{z})-f_1^U f_2^U \right\} \end{aligned}$$

(22)

where

$$\begin{aligned} \begin{array}{cc} \alpha _1(\mathbf{z})= \min \left\{ f_2^L f_1^{cv}(\mathbf{z}),f_2^L f_1^{cc}(\mathbf{z})\right\} ,&{} \alpha _2(\mathbf{z})= \min \left\{ f_1^L f_2^{cv}(\mathbf{z}),f_1^L f_2^{cc}(\mathbf{z})\right\} ,\\ \beta _1(\mathbf{z})= \min \left\{ f_2^U f_1^{cv}(\mathbf{z}),f_2^U f_1^{cc}(\mathbf{z})\right\} ,&{} \beta _2(\mathbf{z})= \min \left\{ f_1^U f_2^{cv}(\mathbf{z}),f_1^U f_2^{cc}(\mathbf{z})\right\} . \end{array} \end{aligned}$$

The corresponding concave relaxation is

$$\begin{aligned} \bar{g}^{cc}(\mathbf{z})=\min \left\{ \gamma _1(\mathbf{z})+\gamma _2(\mathbf{z})-f_1^U f_2^L,\delta _1(\mathbf{z})+\delta _2(\mathbf{z})-f_1^L f_2^U \right\} \end{aligned}$$

(23)

where

$$\begin{aligned} \begin{array}{cc} \gamma _1(\mathbf{z})= \max \left\{ f_2^L f_1^{cv}(\mathbf{z}),f_2^L f_1^{cc}(\mathbf{z})\right\} ,&{} \gamma _2(\mathbf{z})= \max \left\{ f_1^U f_2^{cv}(\mathbf{z}),f_1^U f_2^{cc}(\mathbf{z})\right\} ,\\ \delta _1(\mathbf{z})= \max \left\{ f_2^U f_1^{cv}(\mathbf{z}),f_2^U f_1^{cc}(\mathbf{z})\right\} ,&{} \delta _2(\mathbf{z})= \max \left\{ f_1^L f_2^{cv}(\mathbf{z}),f_1^L f_2^{cc}(\mathbf{z})\right\} . \end{array} \end{aligned}$$

Proposition 3 shows that the proposed relaxations $g^{cv}/g^{cc}$ are always at least as tight as McCormick’s rule $\bar{g}^{cv}/\bar{g}^{cc}$, while Fig. 1 shows that they can be tighter.

Proposition 3

$g^{cv}(\mathbf{z}) \ge \bar{g}^{cv}(\mathbf{z})$ for all $\mathbf{z} \in Z$ and $g^{cc}(\mathbf{z}) \le \bar{g}^{cc}(\mathbf{z})$ for all $\mathbf{z} \in Z$.

Proof

Using the well-known fact, e.g., [51], that for any function $\phi (\mathbf{x},\mathbf{y})$ defined on $\mathcal {X} \times \mathcal {Y}$ there holds

$$\begin{aligned} \min _{\mathbf{x}\in \mathcal {X}} \max _{\mathbf{y} \in \mathcal {Y}} \phi (\mathbf{x},\mathbf{y}) \ge \max _{\mathbf{y} \in \mathcal {Y}} \min _{\mathbf{x}\in \mathcal {X}} \phi (\mathbf{x},\mathbf{y}), \end{aligned}$$

by interchanging the minimization and maximization operators we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})&\ge \max \left\{ \begin{array}{cc} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle f_2^U x_1+f_1^U x_2- f_1^U f_2^U \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ \displaystyle &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array}, \begin{array}{cc} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle f_2^L x_1+f_1^L x_2- f_1^L f_2^L \\ \displaystyle \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \right\} \nonumber \\ \end{aligned}$$

(24)

$$\begin{aligned}&\ge \max \left\{ \begin{array}{cc} \displaystyle \min _{x_i\in \mathbb {R}} &{} \displaystyle f_2^U x_1+f_1^U x_2- f_1^U f_2^U \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array}, \begin{array}{cc} \displaystyle \min _{x_i\in \mathbb {R}} &{} \displaystyle f_2^L x_1+f_1^L x_2- f_1^L f_2^L \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \right\} \nonumber \\&= \quad \max \left\{ \beta _1(\mathbf{z})+\beta _2(\mathbf{z})-f_1^U f_2^U,\alpha _1(\mathbf{z})+\alpha _2(\mathbf{z})-f_1^L f_2^L \right\} =\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

(25)

The proof that $g^{cc}(\mathbf{z}) \le \bar{g}^{cc}(\mathbf{z})$ for all $\mathbf{z} \in Z$ is similar and is omitted for brevity.$\square $
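The inequality of Proposition 3 can be checked numerically at a single point $\mathbf{z}$ with the sketch below. It takes the values $f_i^{cv}(\mathbf{z})$, $f_i^{cc}(\mathbf{z})$ and the bounds as plain numbers. The function names are hypothetical, and the inner problem of (20) is solved by enumeration (box vertices plus the points where the two affine pieces meet on an edge) rather than by a closed-form expression; this is an assumption of the sketch, valid because the objective is the maximum of two affine functions.

```python
import itertools


def mccormick_product_cv(f1cv, f1cc, f2cv, f2cc, f1L, f1U, f2L, f2U):
    """McCormick's product rule, Eq. (22), evaluated at one point z."""
    a1 = min(f2L * f1cv, f2L * f1cc)
    a2 = min(f1L * f2cv, f1L * f2cc)
    b1 = min(f2U * f1cv, f2U * f1cc)
    b2 = min(f1U * f2cv, f1U * f2cc)
    return max(a1 + a2 - f1L * f2L, b1 + b2 - f1U * f2U)


def multivariate_product_cv(f1cv, f1cc, f2cv, f2cc, f1L, f1U, f2L, f2U):
    """Eq. (20): minimize the bilinear convex envelope over the feasible box."""
    # Feasible box: relaxation values intersected with the interval bounds.
    lo1, hi1 = max(f1cv, f1L), min(f1cc, f1U)
    lo2, hi2 = max(f2cv, f2L), min(f2cc, f2U)

    def h(x1, x2):
        return max(f2U * x1 + f1U * x2 - f1U * f2U,
                   f2L * x1 + f1L * x2 - f1L * f2L)

    candidates = list(itertools.product((lo1, hi1), (lo2, hi2)))
    # The two affine pieces are equal on the line A*x1 + B*x2 = C.
    A, B = f2U - f2L, f1U - f1L
    C = f1U * f2U - f1L * f2L
    if B != 0:
        for x1 in (lo1, hi1):
            x2 = (C - A * x1) / B
            if lo2 <= x2 <= hi2:
                candidates.append((x1, x2))
    if A != 0:
        for x2 in (lo2, hi2):
            x1 = (C - B * x2) / A
            if lo1 <= x1 <= hi1:
                candidates.append((x1, x2))
    return min(h(x1, x2) for x1, x2 in candidates)


# Illustrative data: both factors bounded in [-1, 1], relaxations in [-0.5, 0.5].
args = (-0.5, 0.5, -0.5, 0.5, -1.0, 1.0, -1.0, 1.0)
print(multivariate_product_cv(*args), mccormick_product_cv(*args))
```

For this illustrative data the multivariate rule evaluates to $-1$ while McCormick’s rule evaluates to $-2$, an instance where both intervals contain zero and the first inequality in (24) is strict.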

Note that the first inequality in (24) can be strict only if $f_1^L<0<f_1^U$ or $f_2^L<0<f_2^U$ and the second only if $\left[ f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z})\right] \not \subset \left[ f_1^L,f_1^U\right] $ or $\left[ f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z})\right] \not \subset \left[ f_2^L,f_2^U\right] $, that is, only if Assumption 4 does not hold. Scott and Barton [38] have observed that $\bar{g}^{cv}$ can be tightened by intersecting with the interval bounds. However, from the definition of $g^{cv}$ we have that $g^{cv}$ is at least as tight and in some cases tighter than the result by Scott and Barton. If $f_1^U=f_1^L$ or $f_2^U=f_2^L$, at least one of the functions is constant and the computation of the convex and concave envelopes of their product is trivial.

The relaxations obtained by Eqs. (20), (21) can be represented in closed form. If $f_1^U>f_1^L$ and $f_2^U>f_2^L$, $g^{cv}(\mathbf{z})$ can be shown to be given by

$$\begin{aligned} g^{cv}(\mathbf{z})=\min \left\{ \begin{array}{cc} \max &{} \left\{ f_2^U f_1^{cv}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta )-f_1^U f_2^U,\right. \\ &{}\left. f_2^L f_1^{cv}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}) ,\kappa f_1^{cv}(\mathbf{z})+\zeta ) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U f_1^{cc}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta )-f_1^U f_2^U,\right. \\ &{}\left. f_2^L f_1^{cc}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta ) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cv}(\mathbf{z}) -f_1^U f_2^U,\right. \\ &{}\left. f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cv}(\mathbf{z}) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cc}(\mathbf{z}) -f_1^U f_2^U,\right. \\ &{}\left. f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cc}(\mathbf{z}) -f_1^L f_2^L \right\} \end{array} \right\} \end{aligned}$$

where

$$\begin{aligned} \kappa =\frac{f_2^L-f_2^U}{f_1^U-f_1^L},\quad \zeta =\frac{f_1^U f_2^U-f_1^L f_2^L}{f_1^U-f_1^L}. \end{aligned}$$

Similarly, if $f_1^U>f_1^L$ and $f_2^U>f_2^L$ then $g^{cc}(\mathbf{z})$ is given by

$$\begin{aligned} g^{cc}(\mathbf{z})=\max \left\{ \begin{array}{cc} \min &{} \left\{ f_2^L f_1^{cv}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta )-f_1^U f_2^L,\right. \\ &{}\left. f_2^U f_1^{cv}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta ) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L f_1^{cc}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta )-f_1^U f_2^L,\right. \\ &{}\left. f_2^U f_1^{cc}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta ) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cv}(\mathbf{z}) -f_1^U f_2^L,\right. \\ &{}\left. f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cv}(\mathbf{z}) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cc}(\mathbf{z}) -f_1^U f_2^L,\right. \\ &{}\left. f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cc}(\mathbf{z}) -f_1^L f_2^U \right\} \end{array} \right\} \end{aligned}$$

where

$$\begin{aligned} \kappa =\frac{f_2^U-f_2^L}{f_1^U-f_1^L},\quad \zeta =\frac{f_1^U f_2^L-f_1^L f_2^U}{f_1^U-f_1^L}. \end{aligned}$$

In addition to bilinear products of functions, multilinear products of functions are often used in applications. The class of functions considered herein can be summarized as $G(\mathbf{z})=\sum _{t \in T} c_t \Pi _{i \in I_t} f_i(\mathbf{z})$, where $T$ and $I_t \subset I$ are index sets and $c_t$ are constants. Such functions can be handled by recursive application of McCormick’s product rule, but this approach gives weaker relaxations than possible, compare for instance [4, 5, 9, 25]. In contrast, Theorem 2 provides the framework to directly handle such terms and provide tighter relaxations. Herein, only the convex relaxations are discussed; the concave relaxations are analogous.

Rikun [34] considers $F: \mathbb {R}^n \rightarrow \mathbb {R}$, $F(\mathbf{x})=\sum _{t \in T} c_t \Pi _{i \in I_t} x_i$ on a hypercube $X=X_1 \times X_2 \times \cdots \times X_n$, where $X_i=[x_i^L,x_i^U]$. He proves that the convex envelope $F^{cv,env}$ at a point $\mathbf{x}$ can be evaluated by the following optimization problem

$$\begin{aligned} F^{cv,env}(\mathbf{x})=\min _{{\varvec{\lambda }}}&\sum _k \lambda _k F(\mathbf{x}^k)&\\&\mathbf{x}=\sum _k \lambda _k \mathbf{x}^{k}&\\&\sum _k \lambda _k=1&\\&\lambda _k\ge 0,&\quad \forall k, \end{aligned}$$

where $\mathbf{x}^k$, $k=1,\ldots ,m$, denote the vertices of $X$. Note that this is an LP, albeit of size $m$. Note also that explicit representations exist for subclasses of this function such as the explicit facets of trilinear terms by Meyer and Floudas [25].

By Theorem 2 a convex relaxation of $G$ on $Z$ can be constructed as

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{\mathbf{x}} \quad&F^{cv,env}(\mathbf{x}) \\&f_i^{cv}(\mathbf{z})\le&x_i \le f_i^{cc}(\mathbf{z}), \quad i \in I. \end{aligned}$$

Noting that $\displaystyle \min _{y} \min _{w} h(y,w)=\min _{y,w} h(y,w)$ we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{ \mathbf{x},{\varvec{\lambda }}}&\sum _k \lambda _k F(\mathbf{x}^k)&\\&\mathbf{x}=\sum _k \lambda _k \mathbf{x}^{k}&\\&\sum _k \lambda _k=1&\\&\lambda _k\ge 0,&\quad \forall k\\&f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}),&\quad i \in I. \end{aligned}$$

which is still an LP of similar size.

By Proposition 2, Theorem 4 can still be used for the computation of subgradients if we take into account in the construction of $\mathbf{\sigma }^{cv}$ only the Lagrangian multipliers associated with the constraints $f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z})$.

6 Convex/concave envelopes and relaxations of min/max operators

The operators $\min $ and $\max $ often arise in engineering optimization formulations. It is well-known that the minimum of concave functions is a concave function, but the same does not hold in general for convex functions. To the authors’ best knowledge relaxations for such functions are not available in the literature or in most numerical codes. For instance, the operators $\min /\max $ are currently not handled by the state-of-the-art general-purpose solvers BARON [36, 49] and ANTIGONE [27–29], while in MC++ [10] they are handled using the well-known reformulation

$$\begin{aligned} \min \left( f_1(\mathbf{z}),f_2(\mathbf{z})\right) =\frac{1}{2}\left( f_1(\mathbf{z})+f_2(\mathbf{z})-|f_1(\mathbf{z})-f_2(\mathbf{z})| \right) \end{aligned}$$

(26)

and applying the univariate McCormick composition theorem to the negative absolute value. However, the constructed relaxations are not as tight as the ones proposed here.

Calculating interval enclosures for the function $\min (f_1(x),f_2(x))$ given interval enclosures for $f_1$ and $f_2$ is straightforward and is also done in MC++ [10].

Proposition 4

Consider $Z \subset \mathbb {R}^n$ and $f_1,f_2:Z \rightarrow \mathbb {R}$. Suppose that interval enclosures are given for $f_1$ and $f_2$ on $Z$, i.e., bounds $f_1^L,f_1^U$, $f_2^L,f_2^U$ such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

Then we have

$$\begin{aligned} \min \left( f_1^L,f_2^L\right) \le \min (f_1(\mathbf{z}),f_2(\mathbf{z})) \le \min \left( f_1^U,f_2^U\right) \end{aligned}$$

It is noteworthy that these bounds are not exact, as shown in the next example.

Example 1

Take $\min (z,-z)$ with $z \in [-1,1]$. The range is clearly $[-1,0]$ and the rule given in Proposition 4 gives a valid but overestimated enclosure as $[-1,1]$.
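Example 1 can be reproduced with a few lines of code. The helper names `min_interval` and `sampled_range` are assumptions made for this sketch; sampling is used only to illustrate the true range and is not a rigorous bounding procedure.

```python
def min_interval(f1, f2):
    """Interval enclosure of min(f1, f2) from enclosures (lower, upper)."""
    (f1L, f1U), (f2L, f2U) = f1, f2
    return (min(f1L, f2L), min(f1U, f2U))


def sampled_range(func, lo, hi, n=21):
    """Smallest and largest value of func on an equidistant grid of n points."""
    values = [func(lo + (hi - lo) * i / (n - 1)) for i in range(n)]
    return (min(values), max(values))


# z in [-1, 1] gives the enclosures [-1, 1] for both z and -z.
enclosure = min_interval((-1.0, 1.0), (-1.0, 1.0))
observed = sampled_range(lambda z: min(z, -z), -1.0, 1.0)
print(enclosure, observed)
```

The enclosure is $[-1,1]$ while the sampled values stay in $[-1,0]$: the rule is valid but overestimates the upper bound because it ignores that both arguments depend on the same $z$.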

We can utilize Corollary 3 to compute convex/concave relaxations for the minimum/maximum of two functions. The computation of convex/concave relaxations of the minimum/maximum of two functions by Theorem 2 requires the convex/concave envelopes of $\min (x_1,x_2)/\max (x_1,x_2)$ on an arbitrary rectangle, which are easy to derive but, to the authors’ best knowledge, are not explicitly available in the literature.

Lemma 3

Consider $Z=X_1 \times X_2\subset \mathbb {R}^2$ with $X_1=\left[ x_1^L,x_1^U\right] $ and $X_2=\left[ x_2^L,x_2^U\right] $ and let $\mathbf{z} =(x_1,x_2)$. The convex envelope of $\min (x_1,x_2)$ on $Z$ is given by $\min ^{cv}:Z \rightarrow \mathbb {R}$,

$$\begin{aligned} \mathrm{min}^{\mathrm{cv}}(\hbox {x}_1,\mathrm{x}_2)&= \max \left( \mathrm{min}^{\mathrm{cv},1}(\mathrm{x}_1,\mathrm{x}_2),\mathrm{min}^{\mathrm{cv},2}(\mathrm{x}_1,\mathrm{x}_2)\right) \quad \text {with} \\ \mathrm{min}^{\mathrm{cv},1}(\mathrm{x}_1,\mathrm{x}_2)&= \min \left( x_1^L,x_2^L\right) + \frac{x_1-x_1^L}{x_1^U-x_1^L} \left( \min (x_1^U, x_2^L)-\min \left( x_1^L, x_2^L\right) \right) \\&+ \,\,\frac{x_2-x_2^L}{x_2^U-x_2^L} \left( \min (x_1^L, x_2^U)-\min \left( x_1^L, x_2^L\right) \right) \\ \mathrm{min}^{\mathrm{cv},2}(\mathrm{x}_1,\mathrm{x}_2)&= \min \left( x_1^U,x_2^U\right) +\frac{x_1-x_1^U}{x_1^L-x_1^U} \left( \min (x_1^L, x_2^U)-\min \left( x_1^U, x_2^U\right) \right) \\&+\,\,\frac{x_2-x_2^U}{x_2^L-x_2^U} \left( \min (x_1^U, x_2^L)-\min \left( x_1^U, x_2^U\right) \right) \\ \end{aligned}$$

and the concave envelope of $\max (x_1,x_2)$ on $Z$ is given by $\max ^{cc}:Z \rightarrow \mathbb {R}$,

$$\begin{aligned} \mathrm{max}^{\mathrm{cc}}(\mathrm{x}_1,\mathrm{x}_2)&= \min \left( \mathrm{max}^{\mathrm{cc},1}(\mathrm{x}_1,\mathrm{x}_2),\mathrm{max}^{\mathrm{cc},2}(\mathrm{x}_1,\mathrm{x}_2)\right) \quad \text { with}\\ \mathrm{max}^{\mathrm{cc},1}(\mathrm{x}_1,\mathrm{x}_2)&= \max \left( x_1^L,x_2^L\right) + \frac{x_1-x_1^L}{x_1^U-x_1^L} \left( \max \left( x_1^U, x_2^L\right) -\max \left( x_1^L, x_2^L\right) \right) \\&+\,\, \frac{x_2-x_2^L}{x_2^U-x_2^L} \left( \max \left( x_1^L, x_2^U\right) -\max \left( x_1^L, x_2^L\right) \right) \\ \mathrm{max}^{\mathrm{cc},2}(\mathrm{x}_1,\mathrm{x}_2)&= \max \left( x_1^U,x_2^U\right) +\frac{x_1-x_1^U}{x_1^L-x_1^U} \left( \max \left( x_1^L, x_2^U\right) -\max \left( x_1^U, x_2^U\right) \right) \\&+\,\,\frac{x_2-x_2^U}{x_2^L-x_2^U} \left( \max \left( x_1^U, x_2^L\right) -\max \left( x_1^U, x_2^U\right) \right) \\ \end{aligned}$$

Proof

The proof is in the Appendix.$\square $
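The convex envelope of Lemma 3 can be sketched in code as follows. The function name `min_cv` and the test boxes are illustrative assumptions, and the sketch presumes nondegenerate intervals ($x_i^U>x_i^L$).

```python
def min_cv(x1, x2, x1L, x1U, x2L, x2U):
    """Convex envelope of min(x1, x2) on [x1L, x1U] x [x2L, x2U] (Lemma 3)."""
    # Values of min(.,.) at the four vertices of the rectangle.
    m_LL, m_UL = min(x1L, x2L), min(x1U, x2L)
    m_LU, m_UU = min(x1L, x2U), min(x1U, x2U)
    t1 = (x1 - x1L) / (x1U - x1L)   # relative position of x1 in its interval
    t2 = (x2 - x2L) / (x2U - x2L)   # relative position of x2 in its interval
    # Plane through the vertices (L,L), (U,L), (L,U).
    cv1 = m_LL + t1 * (m_UL - m_LL) + t2 * (m_LU - m_LL)
    # Plane through the vertices (U,U), (L,U), (U,L); note that
    # (x1 - x1U) / (x1L - x1U) equals 1 - t1, and likewise for x2.
    cv2 = m_UU + (1.0 - t1) * (m_LU - m_UU) + (1.0 - t2) * (m_UL - m_UU)
    return max(cv1, cv2)


print(min_cv(0.5, 0.75, -1.0, 2.0, 0.0, 1.5), min(0.5, 0.75))
```

On the unit square the expression reduces to $\max (0,x_1+x_2-1)$, which gives a quick sanity check of the implementation.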

A convex relaxation of the maximum of two functions is trivially given by the maximum of the convex relaxations of the two functions and a concave relaxation of the minimum of two functions as the minimum of the concave relaxations of the two functions.

Proposition 5

Consider $Z \subset \mathbb {R}^n$ and $g_1,g_2,f_1,f_2:Z \rightarrow \mathbb {R}$ such that $g_1(\mathbf{z})=\min \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) $, $g_2(\mathbf{z})=\max \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) $. Suppose that interval enclosures are given for $f_1$ and $f_2$ on $Z$, i.e., bounds $f_1^L,f_1^U$, $f_2^L,f_2^U$ such that

$$\begin{aligned} f_1^L \le f_1({\mathbf{z}}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex and concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z})\le f_2^{cc}(\mathbf{z}). \end{aligned}$$

Recall that Proposition 4 gives interval enclosures for $g_1$ on $Z$. The following procedure defines a convex relaxation $g_1^{cv}:Z \rightarrow \mathbb {R}$ of $g_1$ on $Z$.

If $f_1^U \le f_2^L$ then $g_1^{cv}(\mathbf{z})=f_1^{cv}(\mathbf{z})$. Similarly, if $f_2^U \le f_1^L$ then $g_1^{cv}(\mathbf{z})=f_2^{cv}(\mathbf{z})$. Otherwise

$$\begin{aligned} g_1^{cv}(\mathbf{z})=\max \left( g_1^{cv,1}(\mathbf{z}),g_1^{cv,2}(\mathbf{z})\right) \end{aligned}$$

where

$$\begin{aligned} g_1^{cv,1}(\mathbf{z})&= \min \left( f_1^L,f_2^L\right) + \frac{f_1^{cv}(\mathbf{z})-f_1^L}{f_1^U-f_1^L} \left( \min \left( f_1^U, f_2^L\right) -\min \left( f_1^L, f_2^L\right) \right) \\&+\,\, \frac{f_2^{cv}(\mathbf{z})-f_2^L}{f_2^U-f_2^L} \left( \min \left( f_1^L, f_2^U\right) -\min \left( f_1^L, f_2^L\right) \right) \\ g_1^{cv,2}(\mathbf{z})&= \min \left( f_1^U,f_2^U\right) +\frac{f_1^{cv}(\mathbf{z})-f_1^U}{f_1^L-f_1^U} \left( \min \left( f_1^L, f_2^U\right) -\min \left( f_1^U, f_2^U\right) \right) \\&+\,\,\frac{f_2^{cv}(\mathbf{z})-f_2^U}{f_2^L-f_2^U} \left( \min \left( f_1^U, f_2^L\right) -\min \left( f_1^U, f_2^U\right) \right) . \end{aligned}$$

Furthermore, the following procedure defines a concave relaxation $g_2^{cc}:Z \rightarrow \mathbb {R}$ of $g_2$ on $Z$. If $f_1^U \le f_2^L$ then $g_2(\mathbf{z})=f_2(\mathbf{z})$ and $g_2^{cc}(\mathbf{z})=f_2^{cc}(\mathbf{z})$. Similarly, if $f_2^U \le f_1^L$ then $g_2^{cc}(\mathbf{z})=f_1^{cc}(\mathbf{z})$. Otherwise $g_2^{cc}(\mathbf{z})=\min \left( g_2^{cc,1}(\mathbf{z}),g_2^{cc,2}(\mathbf{z})\right) $ where

$$\begin{aligned} g_2^{cc,1}(\mathbf{z})&= \max \left( f_1^L,f_2^L\right) + \frac{f_1^{cc}(\mathbf{z})-f_1^L}{f_1^U-f_1^L} \left( \max \left( f_1^U, f_2^L\right) -\max \left( f_1^L, f_2^L\right) \right) \\&+\,\, \frac{f_2^{cc}(\mathbf{z})-f_2^L}{f_2^U-f_2^L} \left( \max \left( f_1^L, f_2^U\right) -\max \left( f_1^L, f_2^L\right) \right) \\ g_2^{cc,2}(\mathbf{z})&= \max \left( f_1^U,f_2^U\right) +\frac{f_1^{cc}(\mathbf{z})-f_1^U}{f_1^L-f_1^U} \left( \max \left( f_1^L, f_2^U\right) -\max \left( f_1^U, f_2^U\right) \right) \\&+\,\,\frac{f_2^{cc}(\mathbf{z})-f_2^U}{f_2^L-f_2^U} \left( \max \left( f_1^U, f_2^L\right) -\max \left( f_1^U, f_2^U\right) \right) \end{aligned}$$

Proof

Since $\min (\cdot ,\cdot )$ and $\max (\cdot ,\cdot )$ are monotonically increasing, the result follows by Corollary 3.$\square $
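The convex relaxation $g_1^{cv}$ of Proposition 5 can be sketched as below. The function names and the test functions $f_1(z)=z^2$, $f_2(z)=1-z$ on $Z=[-1,2]$ are illustrative assumptions; both are convex, so they serve as their own convex relaxations, with bounds $[0,4]$ and $[-1,2]$. The overlapping case additionally assumes $f_i^U>f_i^L$.

```python
def min_relaxation_cv(f1cv, f2cv, f1L, f1U, f2L, f2U):
    """Convex relaxation of min(f1, f2) at one point, following Proposition 5."""
    if f1U <= f2L:        # f1 is never larger than f2
        return f1cv
    if f2U <= f1L:        # f2 is never larger than f1
        return f2cv
    m_LL, m_UL = min(f1L, f2L), min(f1U, f2L)
    m_LU, m_UU = min(f1L, f2U), min(f1U, f2U)
    t1 = (f1cv - f1L) / (f1U - f1L)
    t2 = (f2cv - f2L) / (f2U - f2L)
    cv1 = m_LL + t1 * (m_UL - m_LL) + t2 * (m_LU - m_LL)
    cv2 = m_UU + (1.0 - t1) * (m_LU - m_UU) + (1.0 - t2) * (m_UL - m_UU)
    return max(cv1, cv2)


def g1_cv(z):
    """Relaxation of min(z^2, 1 - z) on Z = [-1, 2] (illustrative data)."""
    return min_relaxation_cv(z * z, 1.0 - z, 0.0, 4.0, -1.0, 2.0)


for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, g1_cv(z), min(z * z, 1.0 - z))
```

Applied to Example 1, i.e., $f_1(z)=z$ and $f_2(z)=-z$ on $[-1,1]$, the same function returns the constant $-1$, which coincides with the convex envelope of $\min (z,-z)=-|z|$ on that interval.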

Note that there is no guarantee that the proposed relaxation is the envelope even if the estimators of the factors are, as shown in Fig. 2.

Reformulating the $\min (\cdot ,\cdot )$ and $\max (\cdot ,\cdot )$ operators using the absolute value of the difference results in weak natural interval extensions and also in weaker McCormick relaxations, as shown in Proposition 6. Figure 2 shows that the inequality in Proposition 6 can be strict.
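The weakness of the natural interval extension can be seen on a small example. The sketch below compares the exact range of $\min (x,y)$ on a box with the natural interval extension of the reformulation $\min (x,y)=0.5\,(x+y-|x-y|)$. The interval helpers are written only for this illustration and do not correspond to any particular library.

```python
def interval_min_direct(a, b):
    """Exact range of min(x, y) for x in a=(aL, aU) and y in b=(bL, bU)."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def interval_abs(w):
    """Range of |w| for w in the interval (wL, wU)."""
    wL, wU = w
    if wL >= 0:
        return (wL, wU)
    if wU <= 0:
        return (-wU, -wL)
    return (0.0, max(-wL, wU))


def interval_min_via_abs(a, b):
    """Natural interval extension of 0.5 * (x + y - |x - y|)."""
    s = (a[0] + b[0], a[1] + b[1])   # enclosure of x + y
    d = (a[0] - b[1], a[1] - b[0])   # enclosure of x - y
    ad = interval_abs(d)             # enclosure of |x - y|
    return (0.5 * (s[0] - ad[1]), 0.5 * (s[1] - ad[0]))
```

For both arguments in $[0,1]$, the direct range is $[0,1]$, whereas the reformulation yields $[-0.5,1]$.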

Proposition 6

Consider $Z \subset \mathbb {R}^n$ and $g_1,f_1,f_2:Z \rightarrow \mathbb {R}$ such that $g_1(\mathbf{z})=\min \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) $. Suppose that interval enclosures are given for $f_1$ and $f_2$ on $Z$, i.e., bounds $f_1^L,f_1^U$, $f_2^L,f_2^U$ such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex/concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z}) \le f_2^{cc}(\mathbf{z}). \end{aligned}$$

For the overlapping case $f_1^L< f_2^U$, $f_2^L < f_1^U$, the convex/concave relaxations for min/max proposed in Theorem 5 are at least as tight as the ones obtained by McCormick’s composition Theorem applied to the reformulation via the absolute value.

Proof

The proof is given in the Appendix $\square $

Relaxations of $\min \left( f_1,\ldots ,f_m \right) $ can be computed either recursively, or by direct application of Theorem 2 if an envelope/relaxation of $\min \left( x_1,\ldots ,x_m \right) $ is available on the appropriate domain. In $[0,1]^m$, for example, it can be shown that the convex envelope of $\min \left( x_1,\ldots ,x_m \right) $ is $\displaystyle \max (0,\sum \nolimits _i x_i-m+1)$.
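A quick numerical sanity check of this expression (not a proof) is sketched below, where $m$ is taken to be the number of arguments. The helper names are hypothetical. The check confirms that the expression coincides with $\min$ at every vertex of the unit box and does not exceed it at a sample interior point.

```python
import itertools


def min_cv_unit_box(x):
    """Evaluate max(0, sum(x) - m + 1) with m = len(x), for x in [0, 1]^m."""
    m = len(x)
    return max(0.0, sum(x) - m + 1)


def matches_min_at_vertices(m):
    """True if the expression equals min(x) at every vertex of [0, 1]^m."""
    return all(min_cv_unit_box(v) == min(v)
               for v in itertools.product((0.0, 1.0), repeat=m))
```

Since $\min$ is concave, agreement at the vertices of the box is what one expects from a polyhedral convex envelope.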

If the relaxation for the multiterm operator is the envelope, direct application of the multivariate composition will result in at least as tight relaxations. If in contrast the relaxations for the multiterm operator are weak, it may be advisable to use the bivariate composition recursively.

7 Fractional terms

Fractional terms $f_1(\mathbf{z})/f_2(\mathbf{z})$ often arise in engineering optimization formulations. In the McCormick relaxation framework, e.g., in MC++ [10], they are handled rigorously using the representation $f_1(\mathbf{z}) \times \left( f_2(\mathbf{z})\right) ^{-1}$, i.e., as a bilinear product with the inverse function embedded. The multivariate composition theorem can handle fractional terms more naturally and yields relaxations that are at least as tight and often tighter. For the rest of this section we assume that $f_2^L>0$ or $f_2^U<0$, so that the division is well defined.

Consider the fractional term $\frac{x_1}{x_2}$ on $X_1\times X_2=\left[ x_1^L,x_1^U\right] \times \left[ x_2^L,x_2^U\right] $ which we will denote via the division function $\mathrm{div}(\cdot ,\cdot )$. Tawarmalani and Sahinidis [47] discuss convex relaxations and the envelope for the positive orthant, i.e., for $x_1^L>0$, $x_2^L>0$. One relaxation by Zamora and Grossmann [53, 54] is given by

$$ \begin{aligned} \mathrm{div}^{cv,Z \& G}(x_1,x_2)=\frac{1}{x_2}\left( \frac{x_1+\sqrt{x_1^L x_1^U}}{\sqrt{x_1^L}+\sqrt{x_1^U}} \right) ^2. \end{aligned}$$

(27)

The function $ \mathrm{div}^{cv,Z \& G}$ is the convex envelope when $x_2^L\rightarrow 0$ and $x_2^U\rightarrow \infty $. A piecewise linear relaxation of $\mathrm{div}$ [33, 47] is given by

$$\begin{aligned} \mathrm{div}^{cv,lin}(x_1,x_2)=\max \left\{ \frac{x_1 x_2^U-x_1^Lx_2+x_1^Lx_2^U}{(x_2^U)^2}, \frac{x_1 x_2^L-x_1^U x_2+x_1^Ux_2^L}{(x_2^L)^2} \right\} . \end{aligned}$$

(28)

Another method to obtain a valid convex relaxation of $\mathrm{div}(\cdot ,\cdot )$ on $X_1\times X_2$, assuming that either $x_2^U<0$ or $x_2^L>0$, is to apply the product rule of McCormick [23] (defined in Eq. (22)) using the representation $x_1 \times \mathrm{Inv}(x_2)$ where $\mathrm{Inv}(\cdot )=(\cdot )^{-1}$. Let $\mathrm{Inv}^L$, $\mathrm{Inv}^U$ denote the implied bounds, and $\mathrm{Inv}^{cv}(x_2)$, $\mathrm{Inv}^{cc}(x_2)$ the convex and concave relaxations of $\mathrm{Inv}$ on $X_2$. It is easy to verify that the result is

$$\begin{aligned} \mathrm{div}^{cv,mc}(x_1,x_2)=\max \left\{ \begin{array}{c} \mathrm{Inv}^L x_1 + \min \left\{ x_1^L \mathrm{Inv}^{cv}(x_2),x_1^L \mathrm{Inv}^{cc}(x_2) \right\} -x_1^L \mathrm{Inv}^L,\\ \mathrm{Inv}^U x_1 + \min \left\{ x_1^U \mathrm{Inv}^{cv}(x_2),x_1^U \mathrm{Inv}^{cc}(x_2) \right\} -x_1^U \mathrm{Inv}^U \end{array} \right\} \end{aligned}$$

(29)

which for the positive orthant reduces to

$$\begin{aligned} \mathrm{div}^{cv,mc,+}(x_1,x_2)=\max \left\{ \frac{x_1}{x_2^U} + \frac{x_1^L}{x_2} -\frac{x_1^L}{x_2^U}, \frac{x_1}{x_2^L} + \frac{x_1^U}{x_2} -\frac{x_1^U}{x_2^L} \right\} , \end{aligned}$$

(30)

as computed by Quesada and Grossmann [33] following the same procedure. It is shown in [33] that $\mathrm{div}^{cv,lin}$ is a linearization of $\mathrm{div}^{cv,mc,+}$ at $x_2^L, x_2^U$ and thus

$$\begin{aligned} \mathrm{div}^{cv,lin}(x_1,x_2)\le \mathrm{div}^{cv,mc,+}(x_1,x_2). \end{aligned}$$
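The closed-form relaxations (27), (28) and (30) are straightforward to evaluate. The following sketch, restricted to the positive orthant ($x_1^L>0$, $x_2^L>0$), implements them and checks the ordering $\mathrm{div}^{cv,lin}\le \mathrm{div}^{cv,mc,+}\le x_1/x_2$ on a grid. The function names and the grid-based check are illustrative assumptions rather than part of the cited works.

```python
import math


def div_cv_zg(x1, x2, x1L, x1U):
    """Relaxation (27); assumes 0 < x1L <= x1 <= x1U and x2 > 0."""
    r = (x1 + math.sqrt(x1L * x1U)) / (math.sqrt(x1L) + math.sqrt(x1U))
    return r * r / x2


def div_cv_lin(x1, x2, x1L, x1U, x2L, x2U):
    """Piecewise linear relaxation (28)."""
    return max((x1 * x2U - x1L * x2 + x1L * x2U) / x2U ** 2,
               (x1 * x2L - x1U * x2 + x1U * x2L) / x2L ** 2)


def div_cv_mc_pos(x1, x2, x1L, x1U, x2L, x2U):
    """McCormick-based relaxation (30) on the positive orthant."""
    return max(x1 / x2U + x1L / x2 - x1L / x2U,
               x1 / x2L + x1U / x2 - x1U / x2L)


def check_ordering(x1L, x1U, x2L, x2U, n=11, tol=1e-12):
    """Check lin <= mc+ <= x1/x2 and Z&G <= x1/x2 on an n-by-n grid."""
    for i in range(n):
        for j in range(n):
            x1 = x1L + (x1U - x1L) * i / (n - 1)
            x2 = x2L + (x2U - x2L) * j / (n - 1)
            lin = div_cv_lin(x1, x2, x1L, x1U, x2L, x2U)
            mc = div_cv_mc_pos(x1, x2, x1L, x1U, x2L, x2U)
            zg = div_cv_zg(x1, x2, x1L, x1U)
            exact = x1 / x2
            if lin > mc + tol or mc > exact + tol or zg > exact + tol:
                return False
    return True
```

A grid check of this kind only samples the box; it is meant as a consistency test of an implementation, not as a substitute for the arguments in [33].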

The concave envelope for the positive orthant is computed in [46] to be

$$\begin{aligned} \mathrm{div}^{cc,mc,+}(x_1,x_2)=\frac{1}{x_2^L x_2^U}\min \left\{ x_2^U x_1-x_1^L x_2+x_1^L x_2^L,x_2^L x_1-x_1^U x_2+x_1^U x_2^U \right\} . \end{aligned}$$

(31)
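Expression (31) can be checked in the same spirit. The sketch below evaluates it and tests that it coincides with $x_1/x_2$ at the four vertices of the box and overestimates $x_1/x_2$ on a grid. It assumes $x_1^L>0$ and $x_2^L>0$, and the function names are hypothetical.

```python
def div_cc_pos(x1, x2, x1L, x1U, x2L, x2U):
    """Expression (31); assumes x1L > 0 and x2L > 0."""
    return min(x2U * x1 - x1L * x2 + x1L * x2L,
               x2L * x1 - x1U * x2 + x1U * x2U) / (x2L * x2U)


def overestimates_on_grid(x1L, x1U, x2L, x2U, n=11, tol=1e-12):
    """True if div_cc_pos >= x1/x2 at every point of an n-by-n grid."""
    for i in range(n):
        for j in range(n):
            x1 = x1L + (x1U - x1L) * i / (n - 1)
            x2 = x2L + (x2U - x2L) * j / (n - 1)
            if div_cc_pos(x1, x2, x1L, x1U, x2L, x2U) < x1 / x2 - tol:
                return False
    return True
```

On the box $[1,2]\times [1,2]$, for instance, the expression evaluates to $1.25$ at $(1.5,1.5)$, where $x_1/x_2=1$.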

Finally, Tawarmalani and Sahinidis [46, 47] prove that the convex envelope at a point can be evaluated by solving an optimization problem

$$\begin{aligned} \mathrm{div}^{cv,env}(x_1,x_2)=&\min \limits _{y_p,z_p,z_c^e,\lambda } z_c^e \nonumber \\&\text {s.t. } z_p y_p \ge x_1^L (1-\lambda )^2 \nonumber \\&(z_c^e-z_p)(x_2-y_p)=x_1^U \lambda ^2 \nonumber \\&y_p \ge x_2^L (1-\lambda ) \nonumber \\&y_p \ge x_2- x_2^U \lambda \nonumber \\&y_p \le x_2^U (1-\lambda ) \nonumber \\&y_p \le x_2- x_2^L \lambda \nonumber \\&x_1=x_1^L +\left( x_1^U-x_1^L\right) \lambda \nonumber \\&z_c^e \ge z_p \nonumber \\&\lambda \in [0,1], z_p \ge 0 \end{aligned}$$

(32)

which can be reformulated as a semi-definite program.

In MC++ [10] a relaxation for $G(\mathbf{z})=\frac{f_1(\mathbf{z})}{f_2(\mathbf{z})}$ is obtained using the representation $f_1(\mathbf{z}) \times \left( \mathrm{Inv} \circ f_2(\mathbf{z})\right) $ and applying first McCormick’s composition theorem to obtain $(\mathrm{Inv} \circ f_2)^{cv}(\mathbf{z})$, $(\mathrm{Inv} \circ f_2)^{cc}(\mathbf{z})$ and then McCormick’s product rule (22). The resulting relaxation is given by

$$\begin{aligned}&\!\!\!\bar{g}^{cv,MC++}(\mathbf{z})\\&~ =\max \left\{ \begin{array}{c} \min \left\{ \frac{1}{f_2^U} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^U} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^L \left( {\mathrm{Inv}}\circ f_2\right) ^{cv}(\mathbf{z}),f_1^L \left( {\mathrm{Inv}}\circ f_2\right) ^{cc}(\mathbf{z}) \right\} -\frac{f_1^L}{f_2^U},\\ \min \left\{ \frac{1}{f_2^L} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^L} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^U \left( {\mathrm{Inv}}\circ f_2\right) ^{cv}(\mathbf{z}),f_1^U \left( {\mathrm{Inv}}\circ f_2\right) ^{cc}(\mathbf{z}) \right\} -\frac{f_1^U}{f_2^L} \end{array} \right\} . \end{aligned}$$

Since both $\mathrm{Inv}^{\mathrm{cv}}, \mathrm{Inv}^{\mathrm{cc}}$ are decreasing in $(-\infty , 0)$ and $(0, \infty )$, by Corollary 3 we have

$$\begin{aligned} \left( \mathrm{Inv}\circ f_2\right) ^{cv}(\mathbf{z}) =\mathrm{Inv}^{cv} ( f_2^{cc}(\mathbf{z})), \quad \left( \mathrm{Inv}\circ f_2\right) ^{cc}(\mathbf{z})=\mathrm{Inv}^{cc} (f_2^{cv}(\mathbf{z})), \end{aligned}$$

and thus

$$\begin{aligned}&\!\!\!\bar{g}^{cv,MC++}(\mathbf{z})\nonumber \\&~ = \max \left\{ \begin{array}{llll} \min \left\{ \frac{1}{f_2^U} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^U} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^L \mathrm{Inv}^{cv}( f_2^{cc}(\mathbf{z})),f_1^L \mathrm{Inv}^{cc}( f_2^{cv}(\mathbf{z})) \right\} -\frac{f_1^L}{f_2^U},\\ \min \left\{ \frac{1}{f_2^L} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^L} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^U \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z})),f_1^U \mathrm{Inv}^{cc}( f_2^{cv}(\mathbf{z})) \right\} -\frac{f_1^U}{f_2^L} \end{array} \right\} \!.\nonumber \\ \end{aligned}$$

(33)
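For positive denominators, this product-rule relaxation can be evaluated as in the following sketch. It follows the product rule with $\mathrm{Inv}^L=1/f_2^U$ and $\mathrm{Inv}^U=1/f_2^L$, so the constant terms are $f_1^L/f_2^U$ and $f_1^U/f_2^L$. Several choices here are assumptions made for illustration: $f_2^L>0$, $\mathrm{Inv}^{cv}(x)=1/x$ (the function is convex for $x>0$), $\mathrm{Inv}^{cc}$ taken as the secant of $1/x$ on $[f_2^L,f_2^U]$, and relaxation values of $f_2$ that lie within $[f_2^L,f_2^U]$. The function names are hypothetical and are not the MC++ interface.

```python
def inv_cv(x):
    """Convex relaxation of 1/x for x > 0 (the function itself)."""
    return 1.0 / x


def inv_cc(x, xL, xU):
    """Concave relaxation of 1/x on [xL, xU] with xL > 0: the secant."""
    return 1.0 / xL + 1.0 / xU - x / (xL * xU)


def g_cv_product_rule(f1_cv, f1_cc, f2_cv, f2_cc, f1L, f1U, f2L, f2U):
    """Convex relaxation of f1/f2 via f1 * Inv(f2), assuming f2L > 0."""
    # Inv is decreasing, so its relaxations are evaluated at the
    # opposite relaxation of f2.
    icv = inv_cv(f2_cc)              # (Inv o f2)^cv
    icc = inv_cc(f2_cv, f2L, f2U)    # (Inv o f2)^cc
    t1 = (min(f1_cv / f2U, f1_cc / f2U)
          + min(f1L * icv, f1L * icc) - f1L / f2U)
    t2 = (min(f1_cv / f2L, f1_cc / f2L)
          + min(f1U * icv, f1U * icc) - f1U / f2L)
    return max(t1, t2)
```

As an example, for bounds $[1,2]$ on both factors and relaxation values $f_1^{cv}=1.4$, $f_1^{cc}=1.6$, $f_2^{cv}=1.3$, $f_2^{cc}=1.7$, the first term of the maximum is active and equals $0.7+1/1.7-0.5$.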

The multivariate composition theorem provides a direct method to calculate convex relaxations:

Corollary 6

Consider $Z \subset \mathbb {R}^n$ and $G,f_1,f_2:Z \rightarrow \mathbb {R}$ such that $G(\mathbf{z})=\frac{f_1(\mathbf{z})}{f_2(\mathbf{z})}$. Suppose that interval enclosures are given for $f_1$ and $f_2$ on $Z$, i.e., bounds $f_1^L,f_1^U$, $f_2^L,f_2^U$ such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex/concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z}) \le f_2^{cc}(\mathbf{z}). \end{aligned}$$

A valid convex relaxation for $G$ on $Z$ is given by $g^{cv}$

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{} \displaystyle \min _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}}&{} \mathrm{div}^{cv}_{X_1\times X_2}(x_1,x_2) \\ &{}s.t. &{} f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{}&{} f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}), \end{array} \end{aligned}$$

(34)

where $\mathrm{div}^{cv}_{X_1\times X_2}(x_1,x_2)$ is any valid convex relaxation of $\mathrm{div}(\cdot ,\cdot )$ on $X_1\times X_2$.

Similarly, a concave relaxation is obtained by

$$\begin{aligned} \begin{array}{ccc} g^{cc}(\mathbf{z})=&{} \displaystyle \max _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}}&{} \mathrm{div}^{cc}_{X_1\times X_2}(x_1,x_2) \\ &{}s.t. &{} f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{}&{} f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}), \end{array} \end{aligned}$$

(35)
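When the outer relaxation is available in closed form and is monotone, the inner minimization in (34) can be solved by inspection. The sketch below assumes the positive orthant ($f_1^L>0$, $f_2^L>0$) and uses $\mathrm{div}^{cv,mc,+}$ from (30) as the outer relaxation. On this domain (30) is nondecreasing in $x_1$ and nonincreasing in $x_2$, so a minimizer is given by the smallest feasible $x_1$ and the largest feasible $x_2$. The function names are hypothetical, and the brute-force grid search is included only to cross-check the closed-form choice.

```python
def div_cv_mc_pos(x1, x2, x1L, x1U, x2L, x2U):
    """McCormick-based relaxation (30) on the positive orthant."""
    return max(x1 / x2U + x1L / x2 - x1L / x2U,
               x1 / x2L + x1U / x2 - x1U / x2L)


def g_cv_division(f1_cv, f2_cc, f1L, f1U, f2L, f2U):
    """Solve (34) by monotonicity for the outer relaxation (30)."""
    x1 = max(f1_cv, f1L)   # smallest feasible x1
    x2 = min(f2_cc, f2U)   # largest feasible x2
    return div_cv_mc_pos(x1, x2, f1L, f1U, f2L, f2U)


def g_cv_division_brute(f1_cv, f1_cc, f2_cv, f2_cc,
                        f1L, f1U, f2L, f2U, n=41):
    """Approximate (34) by a grid search over the feasible rectangle."""
    lo1, hi1 = max(f1_cv, f1L), min(f1_cc, f1U)
    lo2, hi2 = max(f2_cv, f2L), min(f2_cc, f2U)
    best = float("inf")
    for i in range(n):
        for j in range(n):
            x1 = lo1 + (hi1 - lo1) * i / (n - 1)
            x2 = lo2 + (hi2 - lo2) * j / (n - 1)
            best = min(best, div_cv_mc_pos(x1, x2, f1L, f1U, f2L, f2U))
    return best
```

For outer relaxations that are not monotone, or for mixed-sign domains, the inner problem would have to be solved by other means.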

Proposition 7

Consider the relaxation $g^{cv}$ constructed in Corollary 6 for $G(\mathbf{z})=f_1(\mathbf{z})/f_2(\mathbf{z})$ and suppose that the relaxations for $\mathrm{div}(\cdot ,\cdot )$ on $X_1\times X_2$ are at least as tight as $\mathrm{div}^{cv,mc}$ and $\mathrm{div}^{cc,mc}$. Then $g^{cv}$ is at least as tight as $\bar{g}^{cv,MC++}$ as defined in Eq. (33).

Proof

The proof is given in the Appendix.$\square $

Figure 3 shows that the proposed relaxations can be substantially tighter than the ones obtained via the McCormick relaxations. Moreover, it shows that if weak relaxations are used for the outer function in the multivariate composition theorem, the relaxations can be weaker than the univariate McCormick relaxations.

Implementing Theorem 2 for the division of two functions is straightforward if the outer function is relaxed via (27), (28) or (29), as these relaxations are given in closed form. Subgradient computation is also straightforward and Theorem 4 can be utilized to further propagate them to outer functions.

The use of the convex envelope defined in (32) is more involved but we can use it by solving

$$\begin{aligned} \begin{array}{cccc} \displaystyle \min _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}}&{} \mathrm{div}^{cv,env}(x_1,x_2) &{}= \displaystyle \min _{\begin{array}{c} y_p,z_p,z_c^e,\lambda \\ ,x_1,x_2 \end{array}} &{} z_c^e \\ s.t. &{}\displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) &{} s.t. &{}\displaystyle z_py_p \ge x_1^L (1-\lambda )^2 \\ &{}\displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z})&{} &{}\displaystyle (z_c^e-z_p)(x_2-y_p)=x_1^U \lambda ^2 \\ &{}&{}&{}\displaystyle y_p \ge x_2^L (1-\lambda ) \\ &{}&{}&{}\displaystyle y_p \ge x_2- x_2^U \lambda \\ &{}&{}&{}\displaystyle y_p \le x_2^U (1-\lambda ) \\ &{}&{}&{}\displaystyle y_p \le x_2- x_2^L \lambda \\ &{}&{}&{}\displaystyle x_1=x_1^L +(x_1^U-x_1^L) \lambda \\ &{}&{}&{}\displaystyle z_c^e \ge z_p \\ &{}&{}&{}\displaystyle \lambda \in [0,1], z_p \ge 0\\ &{}&{}&{}\displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{}&{}&{}\displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$

which for a given $\mathbf{z}$ can be written as a semi-definite program. Similarly to the discussion of multilinear products, we can obtain subgradients by using the Lagrange multipliers associated with the constraints $f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})$, $f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z})$.

If the envelope is not used, one can easily take the maximum of $ \mathrm{div}^{cv,Z \& G}$ and $\mathrm{div}^{cv,mc,+}$. Tawarmalani and Sahinidis [47] show that this can be beneficial compared to either of the two terms alone.
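Since the pointwise maximum of two convex underestimators is again a convex underestimator, the combination is simple to implement. The sketch below, for the positive orthant, combines (27) and (30); the function names are hypothetical. On the box $[1,2]\times [1,2]$, neither relaxation dominates the other: (27) is larger at $(1.4,1.7)$, whereas (30) is larger at $(1.5,1)$.

```python
import math


def div_cv_zg(x1, x2, x1L, x1U):
    """Relaxation (27); assumes 0 < x1L <= x1 <= x1U and x2 > 0."""
    r = (x1 + math.sqrt(x1L * x1U)) / (math.sqrt(x1L) + math.sqrt(x1U))
    return r * r / x2


def div_cv_mc_pos(x1, x2, x1L, x1U, x2L, x2U):
    """McCormick-based relaxation (30) on the positive orthant."""
    return max(x1 / x2U + x1L / x2 - x1L / x2U,
               x1 / x2L + x1U / x2 - x1U / x2L)


def div_cv_combined(x1, x2, x1L, x1U, x2L, x2U):
    """Pointwise maximum of (27) and (30)."""
    return max(div_cv_zg(x1, x2, x1L, x1U),
               div_cv_mc_pos(x1, x2, x1L, x1U, x2L, x2U))
```

Both terms are nondecreasing in $x_1$ and nonincreasing in $x_2$ on the positive orthant, so the combined relaxation can be used as the outer function in (34) with the inner minimization again solved by inspection.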

8 Concluding remarks

We presented a multivariate generalization of McCormick’s composition theorem [23]. McCormick’s results for the relaxation of composite functions with a univariate outer function are the basis of so-called McCormick relaxations, which are one of the key ideas in constructing convex relaxations in deterministic global optimization. Our generalization to multivariate outer functions results in tighter relaxations for important classes of functions, including binary products of functions, the division of functions and the minimum/maximum of functions. Similarly to McCormick’s composition and product theorems, the multivariate composition can be applied recursively, and in fact the implementation of our result is very similar to that of McCormick’s relaxations; many of our improvements have been implemented in both MC++ [10] and modMC [13]. In contrast to the univariate McCormick relaxations, our result also enables the direct relaxation of classes of functions such as multilinear products of functions. This is particularly important since in recent years many relaxations have been proposed for relatively complicated expressions, and it has been shown that using them directly is advantageous compared to the recursive application of simple rules. For instance, an important class of functions are the so-called edge-concave functions treated in [26, 44, 45]; the work presented herein can be used to obtain tight relaxations for functions that are a composition of an edge-concave outer function and an arbitrary inner function; the relaxation can be achieved via reasoning similar to our theorems for relaxations of bilinear, multilinear and fractional terms. It would be very useful to collect all these rules, implement them in the proposed multivariate McCormick relaxations and then perform a thorough computational comparison of the advances obtained.
Moreover, it would be interesting to consider other important functions found in applications, such as $|f_1(\mathbf{z})-f_2(\mathbf{z})|$ and $(f_1(\mathbf{z})-f_2(\mathbf{z}))^2$ which are found for instance in parameter estimation. Also, it would be interesting to consider discontinuous functions as done in [52].

Similarly to univariate McCormick relaxations, our result is also applicable to functions calculated by algorithms [30]. It is well-known that univariate McCormick relaxations are nonsmooth and recently subgradient propagation has been proposed [30]. For the proposed multivariate framework it is also possible to propagate subgradients and in fact, we provide the framework to obtain, at least in principle, the entire subdifferential.

An alternative to McCormick relaxations is the AVM. Our reformulation and generalization of McCormick’s composition theorem makes the connection with this method more explicit. In particular, it illustrates that the McCormick relaxation framework can be interpreted as a decomposition method for AVM. It would be of interest to indeed utilize such decomposition methods in the AVM. Moreover, we discussed the tightness of relaxations of the AVM compared to the multivariate McCormick relaxations. In cases that common subexpressions are recognized in the AVM this can result in tighter relaxations than the McCormick relaxations [48]; the same holds for the simple recursive application of the proposed multivariate McCormick relaxations. In some cases, it is possible to introduce just enough auxiliary variables to close this gap, and it would be interesting to explore this opportunity computationally. Moreover, the proposed multivariate relaxations can result in tighter relaxations in specific cases by enabling the use of complicated but tight relaxations of some functions. It would be interesting to computationally compare the two methods.

References

1. Adjiman, C.S., Floudas, C.A.: Rigorous convex underestimators for general twice-differentiable problems. J. Glob. Optim. 9(1), 23–40 (1996)
2. Akrotirianakis, I.G., Floudas, C.A.: A new class of improved convex underestimators for twice continuously differentiable constrained NLPs. J. Glob. Optim. 30(4), 367–390 (2004)
3. Al-Khayyal, F.A., Falk, J.E.: Jointly constrained biconvex programming. Math. Oper. Res. 8(2), 273–286 (1983)
4. Bao, X., Khajavirad, A., Sahinidis, N.V., Tawarmalani, M.: Global optimization of nonconvex problems with multilinear intermediates. Math. Program. Comput. (2013, submitted for publication)
5. Belotti, P., Cafieri, S., Lee, J., Liberti, L., Miller, A.: On the composition of convex envelopes for quadrilinear terms. In: Chinchuluun, A., Pardalos, P.M., Enkhbat, R., Pistikopoulos, E.N. (eds.) Optimization, Simulation, and Control, vol. 76 of Springer Optimization and Its Applications, pp. 1–16. Springer, New York (2013)
6. Birge, J.R.: Decomposition and partitioning methods for multistage stochastic linear programs. Oper. Res. 33(5), 989–1007 (1985)
7. Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52(1), 1–28 (2012)
8. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge, UK (2004)
9. Cafieri, S., Lee, J., Liberti, L.: On convex relaxations of quadrilinear terms. J. Glob. Optim. 47, 661–685 (2010)
10. Chachuat, B.: MC++: a versatile library for bounding and relaxation of factorable functions. http://www3.imperial.ac.uk/environmentenergyoptimisation/software (2013)
11. Chachuat, B., Singer, A.B., Barton, P.I.: Global mixed integer dynamic optimization. AIChE J. 51(8), 2235–2253 (2005)
12. Chachuat, B., Singer, A.B., Barton, P.I.: Global methods for dynamic optimization and mixed-integer dynamic optimization. Ind. Eng. Chem. Res. 45(25), 8373–8392 (2006)
13. Corbett, C., Maier, M., Beckers, M., Naumann, U., Ghobeity, A., Mitsos, A.: Compiler-generated subgradient code for McCormick relaxations. Technical Report AIB 2011-25, RWTH Aachen. http://www.stce.rwth-aachen.de/software/modMC.html (2011)
14. Freund, R.: Nonlinear Programming. Lecture Notes, MIT (2012)
15. Geoffrion, A.M.: Generalized Benders decomposition. J. Optim. Theory Appl. 10(4), 237–260 (1972)
16. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I Fundamentals. Springer, Berlin (1993)
17. Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)
18. Horst, R., Pardalos, P.M., Thoai, N.V.: Introduction to Global Optimization. Springer, Netherlands (2000)
19. Khajavirad, A., Sahinidis, N.V.: Convex envelopes of products of convex and component-wise concave functions. J. Glob. Optim. 52(3), 391–409 (2012)
20. Liberti, L., Pantelides, C.C.: Convex envelopes of monomials of odd degree. J. Glob. Optim. 25(2), 157–168 (2003)
21. Maranas, C.D., Floudas, C.A.: A global optimization approach for Lennard-Jones microclusters. J. Chem. Phys. 97(10), 7667–7678 (1992)
22. Maranas, C.D., Floudas, C.A.: Finding all solutions of nonlinearly constrained systems of equations. J. Glob. Optim. 7(2), 143–182 (1995)
23. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: part I convex underestimating problems. Math. Program. 10(1), 147–175 (1976)
24. McCormick, G.P.: Nonlinear Programming: Theory, Algorithms, and Applications. Wiley, New York (1983)
25. Meyer, C.A., Floudas, C.A.: Trilinear monomials with mixed sign domains: facets of the convex and concave envelopes. J. Glob. Optim. 29(2), 125–155 (2004)
26. Meyer, C.A., Floudas, C.A.: Convex envelopes for edge-concave functions. Math. Program. 103(2), 207–224 (2005)
27. Misener, R., Floudas, C.A.: A framework for globally optimizing mixed-integer signomial programs. J. Optim. Theory Appl. (2013, in press) doi:10.1007/s10957-013-0396-3
28. Misener, R., Floudas, C.A.: GloMIQO: global mixed-integer quadratic optimizer. J. Glob. Optim. 57(1), 3–50 (2013)
29. Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. (2014, accepted for publication)
30. Mitsos, A., Chachuat, B., Barton, P.I.: McCormick-based relaxations of algorithms. SIAM J. Optim. 20(2), 573–601 (2009)
31. Nemirovski, A.: Efficient Methods in Convex Programming. http://www2.isye.gatech.edu/nemirovs/Lect_EMCO.pdf (2005)
32. O’Neill, R.P.: Nested decomposition of multistage convex programs. SIAM J. Control Optim. 14(3), 409–418 (1976)
33. Quesada, I., Grossmann, I.E.: A global optimization algorithm for linear fractional and bilinear programs. J. Glob. Optim. 6(1), 39–76 (1995)
34. Rikun, A.D.: A convex envelope formula for multilinear functions. J. Glob. Optim. 10(4), 425–437 (1997)
35. Ryoo, H.S., Sahinidis, N.V.: A branch-and-reduce approach to global optimization. J. Glob. Optim. 8(2), 107–138 (1996)
36. Sahinidis, N.V.: BARON: a general purpose global optimization software package. J. Glob. Optim. 8(2), 201–205 (1996)
37. Sahlodin, A.M., Chachuat, B.: Convex/concave relaxations of parametric ODEs using Taylor models. Comput. Chem. Eng. 35(5), 844–857 (2011)
38. Scott, J.K., Stuber, M.D., Barton, P.I.: Generalized McCormick relaxations. J. Glob. Optim. 51(4), 569–606 (2011)
39. Singer, A.B., Barton, P.I.: Global solution of optimization problems with parameter-embedded linear dynamic systems. J. Optim. Theory Appl. 121(3), 613–646 (2004)
40. Singer, A.B., Barton, P.I.: Bounding the solutions of parameter dependent nonlinear ordinary differential equations. SIAM J. Sci. Comput. 27(6), 2167–2182 (2006)
41. Smith, E., Pantelides, C.C.: A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Comput. Chem. Eng. 23(4–5), 457–478 (1999)
42. Smith, E., Pantelides, C.C.: Global optimisation of nonconvex MINLPs. Comput. Chem. Eng. 21, S791–S796 (1997)
43. Stuber, M.D., Barton, P.I.: Robust simulation and design using semi-infinite programs with implicit functions. Int. J. Reliab. Saf. 5, 378–397 (2011)
44. Tardella, F.: On the existence of polyhedral convex envelopes. In: Floudas, C.A., Pardalos, P. (eds.) Frontiers in Global Optimization, vol. 74 of Nonconvex Optimization and Its Applications, pp. 563–573 (2003)
45. Tardella, F.: Existence and sum decomposition of vertex polyhedral convex envelopes. Optim. Lett. 2(3), 363–375 (2008)
46. Tawarmalani, M., Sahinidis, N.V.: Semidefinite relaxations of fractional programs via novel convexification techniques. J. Glob. Optim. 20(2), 133–154 (2001)
47. Tawarmalani, M., Sahinidis, N.V.: Convex extensions and envelopes of lower semi-continuous functions. Math. Program. 93(2), 247–263 (2002)
48. Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands (2002)
49. Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005)
50. Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: a theoretical and computational study. Math. Program. 99(3), 563–591 (2004)
51. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1953)
52. Wechsung, A., Barton, P.I.: Global Optimization of Discontinuous Functions. In: AIChE annual meeting (2010)
53. Zamora, J.M., Grossmann, I.E.: A global MINLP optimization algorithm for the synthesis of heat exchanger networks with no stream splits. Comput. Chem. Eng. 22(3), 367–384 (1998)
54. Zamora, J.M., Grossmann, I.E.: A branch and contract algorithm for problems with concave univariate, bilinear and linear fractional terms. J. Glob. Optim. 14(3), 217–249 (1999)


Acknowledgments

We thank the anonymous reviewers for the thorough review and helpful comments, in particular the reviewer who suggested the equivalence of the subgradient inequality with the linearization of the generalized Benders cut in the auxiliary variable method.

Author information

Authors and Affiliations

Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
A. Tsoukalas & A. Mitsos
AVT Process Systems Engineering (SVT), RWTH Aachen University, Turmstrasse 46, 52064 Aachen, Germany
A. Mitsos

Corresponding author

Correspondence to A. Mitsos.

Additional information

An erratum to this article can be found at http://dx.doi.org/10.1007/s10898-016-0470-0.

Appendix

Herein, proofs for three results are given.

1.1 Proof of Lemma 3

For the proof we make use of the following Lemma:

Lemma 4

([18] Theorem 1.23) Let $T$ be a convex set and $S\subset T$ a simplex. If $f$ is concave on $T$ and $f^{cv,S}$, $f^{cv,T}$ are the convex envelopes of $f$ over $S$, $T$ respectively, then $f^{cv,S}\ge f^{cv,T}$ on $S$.

Now we prove Lemma 3

Proof

First we note that $\mathrm{min}^{\mathrm{cv}}(x_1,x_2)$ is convex in $Z$ as the maximum of two affine functions. We have

$$\begin{aligned} \mathrm{min}^{\mathrm{cv},1}\left( x_1^L,x_2^L\right)&= \min \left( x_1^L,x_2^L\right) ,\\ \mathrm{min}^{\mathrm{cv},1}\left( x_1^L,x_2^U\right)&= \min \left( x_1^L,x_2^U\right) ,\\ \mathrm{min}^{\mathrm{cv},1}\left( x_1^U,x_2^L\right)&= \min \left( x_1^U,x_2^L\right) ,\\ \mathrm{min}^{\mathrm{cv},1}\left( x_1^U,x_2^U\right)&= \min \left( x_1^U,x_2^L\right) +\min \left( x_1^L,x_2^U\right) -\min \left( x_1^L,x_2^L\right) \end{aligned}$$

and

$$\begin{aligned} \mathrm{min}^{\mathrm{cv},2}\left( x_1^L,x_2^L\right)&= \min \left( x_1^L,x_2^U\right) +\min \left( x_1^U,x_2^L\right) -\min \left( x_1^U,x_2^U\right) ,\\ \mathrm{min}^{\mathrm{cv},2}\left( x_1^L,x_2^U\right)&= \min \left( x_1^L,x_2^U\right) ,\\ \mathrm{min}^{\mathrm{cv},2}\left( x_1^U,x_2^L\right)&= \min \left( x_1^U,x_2^L\right) ,\\ \mathrm{min}^{\mathrm{cv},2}\left( x_1^U,x_2^U\right)&= \min \left( x_1^U,x_2^U\right) . \end{aligned}$$

Without loss of generality we can assume that either $x_1^L\le x_1^U \le x_2^L \le x_2^U$, if the bounds do not overlap or $x_1^L\le x_2^U, x_2^L\le x_1^U$, if the bounds overlap. In the former case we have

$$\begin{aligned} \mathrm{min}^{\mathrm{cv},1}\left( x_1^U,x_2^U\right)&= x_1^U+x_1^L-x_1^L=x_1^U=\min \left( x_1^U,x_2^U\right) ,\\ \mathrm{min}^{\mathrm{cv},2}\left( x_1^L,x_2^L\right)&= x_1^L+x_1^U-x_1^U=x_1^L=\min \left( x_1^L,x_2^L\right) . \end{aligned}$$

In the latter case

$$\begin{aligned} \mathrm{min}^{\mathrm{cv},1}\left( x_1^U,x_2^U\right)&= x_2^L+x_1^L-\min \left( x_1^L,x_2^L\right) =\max \left( x_1^L,x_2^L\right) \le \min \left( x_1^U,x_2^U\right) ,\\ \mathrm{min}^{\mathrm{cv},2}\left( x_1^L,x_2^L\right)&= x_2^L+x_1^L-\min \left( x_1^U,x_2^U\right) \le x_2^L+x_1^L -\max \left( x_1^L,x_2^L\right) = \min \left( x_1^L,x_2^L\right) . \end{aligned}$$

Since $\min (x_1,x_2)$ is concave, $\mathrm{min}^{\mathrm{cv},1},\mathrm{min}^{\mathrm{cv},2}$ are affine, and $\mathrm{min}^{\mathrm{cv},1}(x_1,x_2) \le \min (x_1,x_2)$, $\mathrm{min}^{\mathrm{cv},2}(x_1,x_2) \le \min (x_1,x_2)$ at all vertices of the box $Z$, it follows that $\mathrm{min}^{\mathrm{cv},1}$, $\mathrm{min}^{\mathrm{cv},2}$, and thus also $\mathrm{min}^{\mathrm{cv}}$, are convex underestimators of $\min (x_1,x_2)$ on $Z$.

Also, let $S_1$ be the simplex defined by the points $\left\{ \left( x_1^L,x_2^L\right) ,\left( x_1^L,x_2^U\right) ,\left( x_1^U,x_2^L\right) \right\} $. Since $\mathrm{min}^{\mathrm{cv},1}(x_1,x_2)=\min \left( x_1,x_2\right) $ at all vertices of $S_1$, it follows, see for example [18], that $\mathrm{min}^{\mathrm{cv},1}(x_1,x_2)$ is the convex envelope of $\min (x_1,x_2)$ in $S_1$. Similarly, let $S_2$ be the simplex defined by the points $\left\{ \left( x_1^L,x_2^U\right) ,\left( x_1^U,x_2^L\right) ,\left( x_1^U,x_2^U\right) \right\} $. Since $\mathrm{min}^{\mathrm{cv},2}(x_1,x_2)=\min (x_1,x_2)$ at all vertices of $S_2$, it follows that $\mathrm{min}^{\mathrm{cv},2}(x_1,x_2)$ is the convex envelope of $\min (x_1,x_2)$ in $S_2$.

From Lemma 4, if $F_Z^{cv}$ is the convex envelope of $\min (x_1,x_2)$ on $Z$, then we have

$$\begin{aligned} F_Z^{cv}(\mathbf{z})&\le \mathrm{min}^{\mathrm{cv},1}(\mathbf{z}) \quad \text {for all} \quad \mathbf{z} \in S_1,\\ F_Z^{cv}(\mathbf{z})&\le \mathrm{min}^{\mathrm{cv},2}(\mathbf{z}) \quad \text {for all} \quad \mathbf{z} \in S_2. \end{aligned}$$

Thus

$$\begin{aligned} F_Z^{cv}(\mathbf{z})\le \max \left( \mathrm{min}^{\mathrm{cv},1}(\mathbf{z}),\mathrm{min}^{\mathrm{cv},2}(\mathbf{z}) \right) \quad \text {for all} \quad \mathbf{z} \in S_1\cup S_2=Z \end{aligned}$$

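and, since $\mathrm{min}^{\mathrm{cv}}$ is itself a convex underestimator of $\min (x_1,x_2)$ on $Z$ while the envelope is the tightest such underestimator, $\mathrm{min}^{\mathrm{cv}}$ is the convex envelope of $\min (x_1,x_2)$ on $Z$.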

The proof for the concave envelope of $\max (x_1,x_2)$ is similar and is omitted.$\square $
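The vertex values used in the proof can be reproduced numerically. The sketch below assumes that $\mathrm{min}^{\mathrm{cv},1}$ and $\mathrm{min}^{\mathrm{cv},2}$ have the same form as $g_1^{cv,1}$ and $g_1^{cv,2}$ in Theorem 5 with $f_i^{cv}(\mathbf{z})$ replaced by $x_i$; the function names are hypothetical. On the overlapping box $[0,2]\times [1,3]$ the maximum of the two affine functions matches $\min$ at all four vertices and underestimates it on a grid.

```python
def min_cv1(x1, x2, x1L, x1U, x2L, x2U):
    """Affine function interpolating min at (L,L), (U,L) and (L,U)."""
    mLL = min(x1L, x2L)
    return (mLL
            + (x1 - x1L) / (x1U - x1L) * (min(x1U, x2L) - mLL)
            + (x2 - x2L) / (x2U - x2L) * (min(x1L, x2U) - mLL))


def min_cv2(x1, x2, x1L, x1U, x2L, x2U):
    """Affine function interpolating min at (U,U), (L,U) and (U,L)."""
    mUU = min(x1U, x2U)
    return (mUU
            + (x1 - x1U) / (x1L - x1U) * (min(x1L, x2U) - mUU)
            + (x2 - x2U) / (x2L - x2U) * (min(x1U, x2L) - mUU))


def min_cv(x1, x2, x1L, x1U, x2L, x2U):
    """Pointwise maximum of the two affine underestimators."""
    return max(min_cv1(x1, x2, x1L, x1U, x2L, x2U),
               min_cv2(x1, x2, x1L, x1U, x2L, x2U))


def underestimates_on_grid(x1L, x1U, x2L, x2U, n=21, tol=1e-12):
    """True if min_cv <= min(x1, x2) at every point of an n-by-n grid."""
    for i in range(n):
        for j in range(n):
            x1 = x1L + (x1U - x1L) * i / (n - 1)
            x2 = x2L + (x2U - x2L) * j / (n - 1)
            if min_cv(x1, x2, x1L, x1U, x2L, x2U) > min(x1, x2) + tol:
                return False
    return True
```

On this box, $\mathrm{min}^{\mathrm{cv},2}$ evaluates to $-1$ at the vertex $(0,1)$, strictly below $\min (0,1)=0$, which illustrates why both affine pieces are needed.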

1.2 Proof of Proposition 6

Proof

Since the negative absolute value is concave and piecewise affine, its convex envelope on an interval is the secant. Thus, application of McCormick’s composition Theorem 1 as reformulated in (2) gives

$$\begin{aligned} \bar{g}_1^{cv,abs}(\mathbf{z})&= \min _{w} \mathrm{min}^{\mathrm{cv,abs}}( \mathbf{z},w)\nonumber \\&\text {s.t. } f_1^{cv}(\mathbf{z})-f_2^{cc}(\mathbf{z})\le w \le f_1^{cc}(\mathbf{z})-f_2^{cv}(\mathbf{z}) \\&\qquad w^L \le w \le w^U \nonumber , \end{aligned}$$

(36)

where $w^L=f_1^L-f_2^U$ and $w^U= f_1^U-f_2^L$, with

$$\begin{aligned} \mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},w) =0.5 \left( f_1^{cv}(\mathbf{z})+f_2^{cv}(\mathbf{z})-|w^L|+\frac{-|w^U|+|w^L|}{w^U-w^L}\left( w-w^L\right) \right) . \end{aligned}$$

On the other hand, Theorem 2 gives a convex relaxation for $g_1$ on $Z$

$$\begin{aligned} g_1^{cv}(\mathbf{z})&= \min _{\mathbf{x}} \mathrm{min}^{\mathrm{cv}}(\mathbf{x}) \nonumber \\&\text {s.t. } f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}) \\&\qquad x_i^L \le x_i \le x_i^U, \quad i=1,2, \nonumber \end{aligned}$$

(37)

where $x_i^L=f_i^L$, $x_i^U=f_i^U$ and $\mathrm{min}^{\mathrm{cv}}$ is the convex envelope of $\min (\cdot ,\cdot )$ on $X=\left[ x_1^L,x_1^U\right] \times \left[ x_2^L,x_2^U\right] $. We will show that, for an arbitrary but fixed $\mathbf{z}$, the optimal value of (36) is less than or equal to the optimal value of (37), and thus $\bar{g}_1^{cv,abs}(\mathbf{z})\le g_1^{cv}(\mathbf{z})$. To do so, we first reformulate (36) by introducing two new variables $x_1,x_2$ with $w=x_1-x_2$ and eliminating $w$, obtaining

$$\begin{aligned} \bar{g}_1^{cv,abs}(\mathbf{z})&= \min _{x_1,x_2} \mathrm{min}^{\mathrm{cv,abs}}( \mathbf{z},x_1-x_2)\nonumber \\&\text {s.t. } f_1^{cv}(\mathbf{z})-f_2^{cc}(\mathbf{z})\le x_1-x_2 \le f_1^{cc}(\mathbf{z})-f_2^{cv}(\mathbf{z}) \\&\qquad w^L \le x_1-x_2 \le w^U \nonumber , \end{aligned}$$

(38)

which is equivalent to (36). Next we show that the optimization problem on the right-hand side of (38) is a relaxation of the optimization problem on the right-hand side of (37) and thus has an optimal value that is no larger.

First we will show that any feasible point of (37) is also feasible in (38). Indeed, take any feasible $(x_1,x_2)$. By feasibility we have

$$\begin{aligned} f_1^L \le x_1 \le f_1^U, \qquad f_2^L \le x_2 \le f_2^U \end{aligned}$$

and thus

$$\begin{aligned} f_1^L-f_2^U \le x_1 -x_2 \le f_1^U - f_2^L. \end{aligned}$$

Similarly by feasibility

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}), \qquad f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{aligned}$$

and thus

$$\begin{aligned} f_1^{cv}(\mathbf{z})-f_2^{cc}(\mathbf{z}) \le x_1 -x_2 \le f_1^{cc}(\mathbf{z}) - f_2^{cv}(\mathbf{z}) \end{aligned}$$

and $(x_1,x_2)$ is also feasible in (38).

It remains to show that the objective function of (38) underestimates the objective function of (37). Take any $(x_1,x_2)$ which is feasible in (37). By construction of $\mathrm{min}^{\mathrm{cv,abs}}$, in which the secant underestimates $-|w|$, we have

$$\begin{aligned} \mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},\mathrm{x}_1-\mathrm{x}_2)\le 0.5\left( \mathrm{f}_1^{\mathrm{cv}}(\mathbf{z})+\hbox {f}_2^{\mathrm{cv}}(\mathbf{z}) -|\hbox {x}_1-\hbox {x}_2|\right) . \end{aligned}$$

By feasibility of $(x_1,x_2)$ in (37) we also have

$$\begin{aligned} f_1^{cv}(\mathbf{z})\le x_1, \qquad f_2^{cv}(\mathbf{z})\le x_2 \end{aligned}$$

and thus combining the last two inequalities we also obtain

$$\begin{aligned} \mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},x_1-x_2) \le 0.5\left( x_1+x_2 -|x_1-x_2|\right) =\min (x_1,x_2) \end{aligned}$$

that is, $\mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},\cdot )$ underestimates $\min (\cdot ,\cdot )$ on $X$. Moreover, $\mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},\cdot )$ is affine and thus also convex on $X$. Since $\mathrm{min}^{\mathrm{cv}}$ is the convex envelope of $\min (\cdot ,\cdot )$ on $X$, we directly obtain

$$\begin{aligned} \mathrm{min}^{\mathrm{cv,abs}}(\mathbf{z},\hbox {w}=\hbox {x}_1-\hbox {x}_2) \le \mathrm{min}^{\mathrm{cv}}(\hbox {x}_1,\hbox {x}_2), \end{aligned}$$

and the result follows.

The result for the concave envelope is analogous and so are the results for the $\max $.$\square $
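The key inequality of this proof, namely that the secant-based objective lies below $\min (x_1,x_2)$ at every point feasible in (37), can be illustrated numerically. The sketch below assumes the secant of $-|w|$ on $[w^L,w^U]$ through its endpoint values; the interval bounds and the relaxation values at a fixed $\mathbf{z}$ are hypothetical numbers, not taken from the article.

```python
def secant_neg_abs(w, wL, wU):
    """Secant of -|w| on [wL, wU], i.e. the convex envelope of this concave function."""
    slope = (-abs(wU) + abs(wL)) / (wU - wL)
    return -abs(wL) + slope * (w - wL)


def min_cv_abs(f1cv, f2cv, w, wL, wU):
    """Objective of the abs-based relaxation for fixed values f1cv, f2cv at a point z."""
    return 0.5 * (f1cv + f2cv + secant_neg_abs(w, wL, wU))


def worst_gap(f1, f2, n=25):
    """Largest value of min_cv_abs - min(x1, x2) over a grid of points feasible in (37).

    f1 and f2 are dicts with keys L, U (interval bounds) and cv, cc
    (values of the convex and concave relaxations at a fixed z).
    """
    wL, wU = f1["L"] - f2["U"], f1["U"] - f2["L"]
    lo1, hi1 = max(f1["L"], f1["cv"]), min(f1["U"], f1["cc"])
    lo2, hi2 = max(f2["L"], f2["cv"]), min(f2["U"], f2["cc"])
    worst = float("-inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x1 = lo1 + (hi1 - lo1) * i / n
            x2 = lo2 + (hi2 - lo2) * j / n
            gap = min_cv_abs(f1["cv"], f2["cv"], x1 - x2, wL, wU) - min(x1, x2)
            worst = max(worst, gap)
    return worst


# Hypothetical data for illustration only.
f1 = {"L": -1.0, "U": 2.0, "cv": -0.5, "cc": 1.5}
f2 = {"L": 0.0, "U": 3.0, "cv": 0.5, "cc": 2.5}
print(worst_gap(f1, f2))  # a non-positive value is consistent with the proof
```

A non-positive result on the grid is consistent with the underestimation claim; it does not by itself compare the optimal values of (36) and (37).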

1.3 Proof of Proposition 7

Proof

Assume first that $\mathrm{div}^{cv,mc}$ is used for $\mathrm{div}^{cv}_{X_1\times X_2}$. In that case the relaxations are obtained by

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{}\displaystyle \min _{\begin{array}{c} x_1\in X_1 \\ x_2\in X_2 \end{array}}&{} \displaystyle \mathrm{div}^{cv,mc}(x_1,x_2) \\ &{}s.t. &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$

or, substituting (29) for $\mathrm{div}^{cv,mc}(x_1,x_2)$ together with the bounds $f_1^L\le x_1 \le f_1^U$, $f_2^L\le x_2 \le f_2^U$ on which the relaxation is computed, we obtain

$$\begin{aligned}&\begin{array}{llll} g^{cv}(\mathbf{z})&{}=&{} \displaystyle \min _{\begin{array}{c} x_1\in X_1\\ x_2\in X_2 \end{array}} &{} \max \left\{ \begin{array}{c} \displaystyle \frac{1}{f_2^U} x_1 + \min \left\{ f_1^L \mathrm{Inv}^{cv}(x_2),f_1^L \mathrm{Inv}^{cc}(x_2) \right\} -\frac{f_1^L}{f_2^U},\\ \displaystyle \frac{1}{f_2^L} x_1 + \min \left\{ f_1^U \mathrm{Inv}^{cv}(x_2),f_1^U \mathrm{Inv}^{cc}(x_2) \right\} -\frac{f_1^U}{f_2^L}. \end{array} \right\} \\ &{} &{}s.t.&{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{}&{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array}\\&\qquad \qquad \ge \max \left\{ \begin{array}{lll}\displaystyle \min _{\begin{array}{c} x_1\in X_1 \\ x_2\in X_2 \end{array}} &{}\displaystyle \frac{x_1}{f_2^U} + \min \big \{ f_1^L \mathrm{Inv}^{cv}(x_2),\\ &{}\qquad f_1^L \mathrm{Inv}^{cc}(x_2) \big \} -\frac{f_1^L}{f_2^U},\\ s.t.&{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}). \end{array} , \begin{array}{lll}\displaystyle \min _{\begin{array}{c} x_1\in X_1 \\ x_2\in X_2 \end{array}} &{}\displaystyle \frac{x_1}{f_2^L} + \min \big \{ f_1^U \mathrm{Inv}^{cv}(x_2),\\ &{}\qquad f_1^U \mathrm{Inv}^{cc}(x_2) \big \} -\frac{f_1^U}{f_2^L}\\ s.t. &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \right\} \end{aligned}$$

where the inequality was obtained (similarly to the proof of Proposition 3) by interchanging the $\min $ and $\max $ operators.
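The interchange step relies on the general fact that the minimum of a pointwise maximum is never smaller than the maximum of the individual minima. A minimal sketch on a finite grid follows; the two functions are hypothetical stand-ins and not the terms of the division relaxation.

```python
def min_of_max(funcs, points):
    """min over points of the pointwise max of the functions."""
    return min(max(f(p) for f in funcs) for p in points)


def max_of_min(funcs, points):
    """max over functions of each function's own min over the points."""
    return max(min(f(p) for p in points) for f in funcs)


# Hypothetical example: two affine functions on a grid over [0, 1] x [0, 1].
pieces = [
    lambda p: p[0] - p[1],
    lambda p: p[1] - p[0],
]
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]

lhs = min_of_max(pieces, grid)
rhs = max_of_min(pieces, grid)
print(lhs, rhs, lhs >= rhs)
```

In this example the inequality is strict, which shows that the bound obtained after the interchange can be weaker than the original minimization.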

Since the inner problems are separable in $x_1$ and $x_2$, we have

$$\begin{aligned} g^{cv}(\mathbf{z})\ge \max \left\{ \zeta _1(\mathbf{z}) +\zeta _2(\mathbf{z})-\frac{f_1^L}{f_2^U},\zeta _3(\mathbf{z}) +\zeta _4(\mathbf{z})-\frac{f_1^U}{f_2^L} \right\} \end{aligned}$$

(39)

with

$$\begin{aligned} \begin{array}{cc} \begin{array}{ccc} \displaystyle \zeta _1(\mathbf{z})= &{} \displaystyle \min _{x_1\in X_1} &{} \displaystyle \frac{1}{f_2^U} x_1 \\ &{}s.t. &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ \end{array}, &{} \begin{array}{ccc} \displaystyle \zeta _2(\mathbf{z})= &{} \displaystyle \min _{x_2 \in X_2} &{} \displaystyle \min \left\{ f_1^L \mathrm{Inv}^{cv}(x_2),f_1^L \mathrm{Inv}^{cc}(x_2) \right\} \\ &{}s.t. &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \\ \end{array}\\ \begin{array}{ccc} \displaystyle \zeta _3(\mathbf{z})= &{} \displaystyle \min _{x_1\in X_1} &{} \frac{1}{f_2^L} x_1 \\ &{}s.t. &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\ \end{array}, &{} \begin{array}{ccc} \displaystyle \zeta _4(\mathbf{z})= &{} \displaystyle \min _{x_2 \in X_2} &{} \displaystyle \min \left\{ f_1^U \mathrm{Inv}^{cv}(x_2),f_1^U \mathrm{Inv}^{cc}(x_2) \right\} \\ &{}s.t. &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le \displaystyle f_2^{cc}(\mathbf{z}) \\ \end{array} \end{array} \end{aligned}$$

We have

$$\begin{aligned} \zeta _1(\mathbf{z})= \min \left\{ \frac{1}{f_2^U} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^U} f_1^{cc}(\mathbf{z}) \right\} ,\quad \zeta _3(\mathbf{z})= \min \left\{ \frac{1}{f_2^L} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^L} f_1^{cc}(\mathbf{z}) \right\} . \end{aligned}$$

(40)

Also, we have

$$\begin{aligned} \zeta _2(\mathbf{z})= \min \left\{ \begin{array}{cc} \displaystyle \min _{x_2\in X_2} &{} \displaystyle f_1^L \mathrm{Inv}^{cv}(x_2)\\ s.t. &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \\ \end{array}, \begin{array}{cc} \displaystyle \min _{x_2 \in X_2} &{} \displaystyle f_1^L \mathrm{Inv}^{cc}(x_2)\\ s.t. &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \\ \end{array} \right\} . \end{aligned}$$

We treat separately the cases $f_1^L\ge 0$ and $f_1^L<0$. If $f_1^L\ge 0$, then, since both $\mathrm{Inv}^{\mathrm{cv}}$ and $\mathrm{Inv}^{\mathrm{cc}}$ are monotonically decreasing and $\mathrm{Inv}^{\mathrm{cv}}\le \mathrm{Inv}^{\mathrm{cc}}$, we have

$$\begin{aligned} \zeta _2(\mathbf{z})= \min \left\{ \begin{array}{c} f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z})),\\ f_1^L \mathrm{Inv}^{cc}(f_2^{cc}(\mathbf{z})) \end{array} \right\} = f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z})) \end{aligned}$$

But

$$\begin{aligned} f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z}))\le f_1^L \mathrm{Inv}^{cv}(f_2^{cv}(\mathbf{z}))\le f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})) \end{aligned}$$

and therefore

$$\begin{aligned} \zeta _2(\mathbf{z})= \min \left\{ \begin{array}{c} f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z}))\\ f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})) \end{array} \right\} . \end{aligned}$$

If, on the other hand, $f_1^L<0$, then, since both $-\mathrm{Inv}^{\mathrm{cv}}$ and $-\mathrm{Inv}^{\mathrm{cc}}$ are monotonically increasing and $\mathrm{Inv}^{\mathrm{cv}}\le \mathrm{Inv}^{\mathrm{cc}}$, we have

$$\begin{aligned} \zeta _2(\mathbf{z})= \min \left\{ \begin{array}{c} f_1^L \mathrm{Inv}^{cv}(f_2^{cv}(\mathbf{z})),\\ f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})) \end{array} \right\} = f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})). \end{aligned}$$

In this case, since $f_1^L$ is negative, we have

$$\begin{aligned} f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z}))\le f_1^L \mathrm{Inv}^{cc}(f_2^{cc}(\mathbf{z}))\le f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z})) \end{aligned}$$

and we obtain again

$$\begin{aligned} \zeta _2(\mathbf{z})= \min \left\{ \begin{array}{c} f_1^L \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z}))\\ f_1^L \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})) \end{array} \right\} . \end{aligned}$$

(41)

Therefore, we have established that Eq. (41) holds independently of the sign of $f_1^L$. By a similar reasoning we deduce that

$$\begin{aligned} \zeta _4(\mathbf{z})= \min \left\{ \begin{array}{c} f_1^U \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z}))\\ f_1^U \mathrm{Inv}^{cc}(f_2^{cv}(\mathbf{z})) \end{array} \right\} . \end{aligned}$$

(42)
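The closed forms (41) and (42) can be compared against a brute-force minimization over $x_2$. The sketch below assumes a strictly positive denominator range, takes $\mathrm{Inv}^{cv}(x)=1/x$ and takes $\mathrm{Inv}^{cc}$ to be the secant of $1/x$ on $[f_2^L,f_2^U]$; these choices, the function names and all numerical values are assumptions made for illustration.

```python
def inv_cv(x):
    """Assumed convex relaxation of 1/x on a positive interval: the function itself."""
    return 1.0 / x


def make_inv_cc(xL, xU):
    """Assumed concave relaxation of 1/x on [xL, xU] with 0 < xL < xU: the secant."""
    def inv_cc(x):
        # Secant through (xL, 1/xL) and (xU, 1/xU), written in closed form.
        return (xL + xU - x) / (xL * xU)
    return inv_cc


def zeta_grid(c, lo, hi, inv_cc, n=200):
    """Brute-force min over x2 in [lo, hi] of min{c*Inv_cv(x2), c*Inv_cc(x2)}."""
    xs = (lo + (hi - lo) * k / n for k in range(n + 1))
    return min(min(c * inv_cv(x), c * inv_cc(x)) for x in xs)


def zeta_closed(c, f2cv, f2cc, inv_cc):
    """Closed form of (41)/(42): min{c*Inv_cv(f2cc), c*Inv_cc(f2cv)}."""
    return min(c * inv_cv(f2cc), c * inv_cc(f2cv))


# Hypothetical data: f2 in [1, 4], relaxation values f2cv = 1.5 and f2cc = 3 at a fixed z.
inv_cc = make_inv_cc(1.0, 4.0)
for c in (0.5, -2.0):  # c plays the role of f1L or f1U, with either sign
    print(c, zeta_grid(c, 1.5, 3.0, inv_cc), zeta_closed(c, 1.5, 3.0, inv_cc))
```

Both signs of the coefficient are tried because the proof treats them separately and arrives at the same expression in each case.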

From inequality (39) and Eqs. (33), (40), (41) and (42) we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})\ge \bar{g}^{cv,MC++}(\mathbf{z}). \end{aligned}$$

If the available relaxations of $\mathrm{div}(\cdot ,\cdot )$ on $X_1\times X_2$ are tighter than $\mathrm{div}^{cv,mc}$ and $\mathrm{div}^{cc,mc}$, the resulting relaxation is at least as tight as the one obtained above.$\square $

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

About this article

Cite this article

Tsoukalas, A., Mitsos, A. Multivariate McCormick relaxations. J Glob Optim 59, 633–662 (2014). https://doi.org/10.1007/s10898-014-0176-0

Received: 17 April 2013
Accepted: 18 March 2014
Published: 02 April 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10898-014-0176-0
