On piecewise linear approximations of bilinear terms: structural comparison of univariate and bivariate mixed-integer programming formulations

Bilinear terms naturally appear in many optimization problems. Their inherent non-convexity typically makes them challenging to solve. One approach to tackle this difficulty is to use bivariate piecewise linear approximations for each variable product, which can be represented via mixed-integer linear programming (MIP) formulations. Alternatively, one can reformulate the variable products as a sum of univariate functions. Each univariate function can again be approximated by a piecewise linear function and modelled via an MIP formulation. In the literature, heterogeneous results are reported concerning which approach works better in practice, but little theoretical analysis is provided. We fill this gap by structurally comparing bivariate and univariate approximations with respect to two criteria. First, we compare the number of simplices sufficient for an ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \varepsilon $$\end{document}-approximation. We derive upper bounds for univariate approximations and compare them to a lower bound for bivariate approximations. We prove that for a small prescribed approximation error ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \varepsilon $$\end{document}, univariate ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \varepsilon $$\end{document}-approximations require fewer simplices than bivariate ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \varepsilon $$\end{document}-approximations. The second criterion is the tightness of the continuous relaxations (CR) of corresponding sharp MIP formulations. Here, we prove that the CR of a bivariate MIP formulation describes the convex hull of a variable product, the so-called McCormick relaxation. In contrast, we show by a volume argument that the CRs corresponding to univariate approximations are strictly looser. This allows us to explain many of the computational effects observed in the literature and to give theoretical evidence on when to use which kind of approximation.


Introduction
Many real-world optimization problems contain bilinear terms. For example, the modelling of economic interactions quite often results in products of prices and (production) quantities in optimization models; see e.g. [11,18]. Other applications of bilinear programming include water management [20], gas network optimization [13,14,31] or pooling problems [8,33]. In practice, such bilinear terms of continuous variable products x y are often approximated by piecewise linear functions, because they can be modelled using mixed-integer linear formulations; see e.g. [6,15,17,26,30,39,44]. For any pre-specified ε > 0, this can be done in such a way that the maximum approximation error, given as the maximum absolute pointwise deviation between the pwl. approximation and the non-linear function, is at most ε for each term. One straightforward approach is to use mixed-integer programming (MIP) formulations for bivariate piecewise linear functions that approximate x y; see e.g. [16,29,47,50]. At the same time, it is well known that x y can be reformulated as a sum of univariate functions using additional variables and constraints. For example, in [3,28,38,49] the authors suggest to use the substitution x y = p 2 1 − p 2 2 with p 1 := 1 2 (x + y) and p 2 := 1 2 (x − y). The monomials p 2 1 and p 2 2 can then be approximated by two univariate piecewise linear functions, using a separate MIP formulation for each of these functions. This raises the main question of this article: which approach is more efficient in which situation?
In [36], it is suggested that there is no clear answer as to whether or not to reformulate products of variables by several univariate functions. This claim is supported by heterogeneous computational results from the literature. On the one hand, it is shown in [50] in a small computational study in the context of planning decentralized energy grids that a bivariate piecewise linear approximation may outperform a quadratic univariate formulation on certain instances. On the other hand, in [1] the authors obtain very good computational results with a quadratic univariate reformulation. Similarly, [21,41] report good results for a univariate logarithmic reformulation. The authors of the latter articles suspect that this is due to the smaller number of simplices required by the MIP formulations they use. From the computational experience in the literature reviewed above, we conclude that the actual choice of univariate and bivariate piecewise linear functions used to approximate x y is crucial for their respective performance. From a theoretical point of view, the literature offers much fewer analysis of the two approaches. Firstly, the best choice of a bivariate piecewise linear approximationuniquely determined by the given triangulation of the domain-is not straightforward. In particular, finding an explicit construction rule for the optimal triangulation (w.r.t. the number of triangles) of a rectangular domain in order to approximate x y is still an open problem. In [29], the author gives an implicit construction via a mixed-integer quadratically constrained quadratic program (MIQCQP). In the univariate case, there exist algorithms that can compute optimal piecewise linear approximations, for example for continuous functions (see [35]). However, these algorithms do not provide an algebraic expression of the approximation error. Further, [21] is the only theoretical analysis on the topic of univariate reformulations we are aware of. The authors derive an upper bound for the approximation error of a univariate logarithmic reformulation. They use it to construct ε-approximations that are more compact than direct bivariate piecewise linear approximations on problem instances from the field of paper production. However, as the triangulations are chosen heuristically, their results are not sufficient to state that in general univariate reformulations require less simplices.
Altogether, there is no rigorous comparison up to now which allows a comparison of the two approximation approaches with respect to the required number of simplices. Apart from the mentioned studies, there is-to the best of our knowledge-no theoretical analysis that would give a recommendation under which circumstances any one of the approaches is preferable. Furthermore, we are not aware of any works which analyse the continuous relaxations of the corresponding MIP formulations. A tighter continuous relaxation results in a tighter root relaxation of the branch-and-bound tree and therefore helps to keep the tree small. Since the number of simplices determines the number of necessary binary variables, less simplices directly lead to a smaller branch-and-bound tree.
In this paper, we fill the observed gap in the literature concerning a theoretical comparison of univariate and bivariate MIP formulations for piecewise linear approximations of x y. We establish hierarchies among them with respect to the following two criteria: (i) the number of simplices that are required to guarantee an approximation of x y with a given accuracy and (ii) the tightness of the continuous relaxation of an MIP formulation with respect to the graph of x y in terms of the enclosed volume.
Naturally, both aspects are crucial for the efficient solution of optimization problems containing bilinear terms with branch-and-cut algorithms. In this respect, we will highlight two important findings. First, we prove that commonly used monomial univariate reformulations always require fewer simplices than any bivariate approximation, as long as the prescribed error is small. Second, we show that the continuous relaxations of bivariate approximations always equal the McCormick relaxations and are genuinely tighter than the continuous relaxations of univariate reformulations. In addition, we derive a hierarchy among the univariate reformulations with respect to both questions. The remainder of this paper is structured as follows. In Sect. 2, we introduce the general notation and concepts that are used throughout the paper. Afterwards, we compare structural properties of the bivariate and univariate approximations in Sect. 3. In particular, in Sect. 3.1 we compare the number of required simplices, and in Sect. 3.2 the strength of the continuous relaxations of MIP formulations. In Sect. 3.3, we discuss how these results can be used for practical applications. We show why approximations with as few simplices as possible are advantageous for setting up good piecewise linear relaxations of x y and explain how to convert known cutting planes for quadratic expressions into univariately reformulated models. Finally, we draw our conclusions in Sect. 4. in practice most often triangulations are used, see e.g. [47]. Therefore, we limit ourselves to pwl. functions over triangulations. This is without loss of generality as a pwl. function defined on a polytopal partition can always be represented by a pwl. function over a triangulation, namely by triangulating each polytope.
In the following, we formally introduce the relevant definitions in this context. For the sake of simplicity, we restrict ourselves to continuous functions over compact domains. Further, we use the notation V (P) for the vertex set of a polyhedron P ⊂ R d .

Definition 1
A n-simplex S is the convex hull of n + 1 affinely independent points in R d . We call S a full-dimensional simplex if n = d holds.
A triangulation is a partition consisting of full-dimensional simplices as defined next.
Using the above definition of a triangulation, we can define pwl. functions as follows.

Definition 3 Let B ⊂ R d be a compact set, and let
In particular, for univariate pwl. functions g : Piecewise linear functions can be used to approximate non-linear functions, as shown in the next definition.

Definition 4
Let B ⊂ R d be a compact set, and let T := {S 1 , . . . , S k }, k ∈ N, be a triangulation of B. We call a pwl. function g : B → R a pwl. approximation of a continuous function G : Note that in this definition of a pwl. approximation, we restrict ourselves to interpolations. This is partly because some mixed-integer programming models of pwl. functions require continuity of the approximation, and partly because some of the results from the literature presented here have been developed specifically for interpolations (cf. [22,29,40]). Usually, the error of a pwl. approximation is measured by the maximum absolute pointwise deviation between the pwl. approximation itself and the non-linear function to be approximated; see e.g. [21,22,36,50]. In the following, we also use this definition of the approximation error and extend it to separable functions by introducing the so-called combined approximation error. The latter reflects the cancellations between positive and negative local approximation errors of the individual univariate summands a separable function decomposes into.
the approximation error on a simplex S ∈ T . Consequently, we define the approximation error of g (or, equivalently, of T ) w.r.t. G over the domain B as In the special case that G(x) = n i=1 G i (x i ) is a separable function and g(x) = n i=1 g i (x i ) is a separable pwl. approximation of G with pwl. approximations g i of G i , we define the combined approximation error as Given some ε > 0, we call g an ε-approximation and T (or (T i ) i∈n ) an ε-triangulation (or an ε-family of triangulations) if the (combined) approximation error is less than or equal to ε.
For our results regarding the approximation error of univariate reformulations of non-linear functions, we use the following straightforward upper bound for the combined approximation error of a separable function.

Lemma 1 Consider a compact set B and a separable continuous function G
be a separable pwl. approximation of G where each g i is a pwl. approximation of G i . Then the combined approximation error fulfils

Mixed-integer formulations of pwl. functions
Consider a continuous function G : B → R and its pwl. approximation g : B → R. In the following, we focus on representations of the graph of g, defined as where we allow the restriction of g to a subsetB ⊆ B. When solving optimization problems where g occurs in the objective function or in the constraints, it is impractical to work with Definition 3 directly. Instead, we need an explicit representation of the "if"-condition in Eq. (1). Very often this is done by expressing g in terms of gra(g). For example, minimizing over g is equivalent to minimizing z subject to (x, z) ∈ gra(g). The graph of a pwl. function can be modelled with the help of additional auxiliary continuous and binary variables as well as linear constraints (cf. [24][25][26][27]).
Note that the dimensions p and q of the continuous and the binary auxiliary variables, respectively, do not necessarily coincide. In [46], several such MIP formulations for the graph of a pwl. function are presented, e.g. the incremental method or the multiple-choice method, with their respective sizes stated in Table 1. All MIP formulations mentioned there have the desirable property to be sharp. In order to define sharpness, we need some more notation. For this reason, we define the terms convex envelope and concave envelope, which we use to describe the convex hull of the graph.

Definition 7 Consider a continuous function
as the convex envelope and the concave envelope of G with respect toB ⊂ B.
We have for the convex hull of gra(g). For brevity, we use the notation gra(g) := gra B (g), convenv(g) := convenv B (g) and caveenv(g) := caveenv B (g). An MIP formulation of a graph is called sharp if its PCR coincides with the convex hull of the graph.
To obtain a finer measure of the strength of an MIP formulation M g , we further consider the volume of its PCR, namely vol(proj (x,z) C(M g )). The volume of an MIP formulation M g for a corresponding pwl. function g is minimal if M g is sharp, i.e. we have vol(conv(gra(g)) = vol(proj (x,z) C(M g )). If M g is not sharp, the volume can be larger. We say that a MIP formulation is looser or tighter than another if the volume of its PCR is larger or smaller, respectively. These terms are suitable in the sense that the volume of the PCR is the integral over the maximum pointwise deviation to gra(g). The volume can therefore be interpreted as an overall error measure of the continuous relaxation.

Structural properties of univariate and bivariate piecewise linear approximations
Our work focusses on the structural analysis of pwl. approximations of the non-linear function where D := [x,x] × [y,ȳ] ⊂ R 2 is a box domain with x <x and y <ȳ. It is a straightforward idea to approximate F via a bivariate pwl. function f : D → R. Using an MIP formulation M f , we can then model gra( f ) as in order to obtain a mixed-integer linear representation of f . Alternatively, we can equivalently reformulate F as a sum of univariate functions in order to approximate F by approximating each individual function in the sum. This reformulation can be done in various ways. Table 1 summarizes-to the best of our knowledge-all univariate reformulations of F used in the optimization literature. It shows the corresponding variable substitutions, the additionally required constraints as well as bibliographical references for the use of each reformulations in optimization.
Although we also list the logarithmic reformulation Ln in Table 1, we will not discuss it further in this work for various reasons. Firstly, the literature reports numerical difficulties in connection with the use of this reformulation in practice (see [10,22,50]), which is plausible given the asymptotic behavior of the logarithm for inputs close to zero. Secondly, Ln is only applicable in the case x > 0 and y > 0. Although this condition can always be fulfilled via a simple bound-shifting trick (see [21]), a shifted approximation does not retain its accuracy in general, as elementary examples show. Further, the upper bounds on the combined error of a pwl. approximation based on Ln stated in [21] deteriorate with increasing shift values as well.
In the following, we exemplarily derive an MIP formulation for a univariate approximation of gra(F) via reformulation Bin1 from Table 1. First, the graph of F can be stated as The domains of the additional variables p 1 and p 2 are consequently given by ) and gra( f Bin1 2 ). We can then model an approximation of gra(F) as together with the MIP formulation Corresponding MIP formulations for Bin2 and Bin3 are stated in "Appendix A".
In the remainder of this section, we will compare bivariate MIP formulations for the approximation of gra(F) as given in (3) to univariate MIP formulations, such as (5), using two different metrics of efficiency. In Sect. 3.1, we analyse the number of simplices required in each case to construct an ε-approximation. We will show that using Bin1, Bin2 and Bin3, we can construct ε-families of triangulations with a smaller number of simplices than needed for any bivariate ε-triangulation if the prescribed approximation accuracy ε is sufficiently small. Furthermore, we will prove that a particular equidistant family of triangulations is εoptimal for Bin1. In Sect. 3.2, we then investigate the tightness of the continuous relaxations of univariate and bivariate MIP formulations. On the one hand, we show that the PCR of any bivariate MIP formulation coincides with the convex hull of gra(F), which is known as the McCormick relaxation [32]. On the other hand, we show how to compute the PCRs of the considered univariate MIP formulations and prove that these are indeed weaker relaxations of gra(F) than the McCormick relaxation. Moreover, we show that using Bin1 yields the tightest continuous relaxation among the studied univariate reformulations. Finally, we indicate in Sect. 3.3 how to use these theoretical results in practice. In particular, we outline how to overcome the fact that univariate MIP formulations yield weaker continuous relaxations by adding the linear inequalities describing the convex hull, which are known as the McCormick cuts, to the univariate MIP formulations in a reformulated fashion, as done in [1]. Furthermore, we suggest under which circumstances which univariate reformulation should be chosen.

Number of simplices
We start our comparison between bivariate and univariate pwl. approximations of the bilinear function F by considering the size of the resulting MIP formulation. In this respect, the overall number of binary variables is a crucial factor for the computational complexity of the resulting optimization problem. This number, however, strongly depends on the specific modelling of the MIP formulation, see [47]. The number of binary variables can be reduced significantly, for example, by a logarithmic encoding of the simplices, compared to a straightforward modelling approach as shown in [29,48]. Therefore, we will instead compare pwl. approximations by the number of simplices required to obtain a prescribed approximation guarantee, which directly impacts the number of binary variables in any modelling of the arising MIP formulation.
To this end, we introduce the concept of ε-optimal triangulations for the pwl. approximation of a non-linear function. We use the same definition as in [29,41] and refer to [5] for more context on optimal triangulations and possible alternative definitions.

Definition 9
Let B ⊆ R d be a compact set, and let g : B → R be a pwl. ε-approximation of the continuous function G : B → R w.r.t. the underlying ε-triangulation T of B. We say that T is ε-optimal if |T | is minimal among all ε-triangulations of B.
In the special case that is a pwl. approximation of G, such that each g i is a pwl. approximation of G i , we say that the corresponding family of triangulations It is not obvious how to determine ε-optimal triangulations in general. To the best of our knowledge, the complexity status of this problem is still open. The only related result we are aware of is the NP-hardness of finding minimum edge-weighted triangulations, where the aim is to minimize the sum of the edge weights, see [37]. However, finding an ε-optimal triangulation corresponds to minimizing the maximum edge weight in the chosen triangulation, as shown in [29]. Thus, we will mostly work with lower and upper bounds on the required number of simplices for a pwl. approximation. More precisely, we will show that for a sufficiently small prescribed approximation accuracy ε > 0 we can construct ε-families of triangulations for Bin1, Bin2 and Bin3, such that the corresponding number of simplices is smaller than that of any bivariate ε-triangulation.

Univariate pwl. approximations
We will now consider the construction of ε-approximations for univariate reformulations of F. For this purpose, we study equidistant triangulations for pwl. approximations of univariate quadratic functions. We then prove that a particular family of equidistant triangulations is ε-optimal for reformulation Bin1. Finally, we derive upper bounds for the size of ε-optimal triangulations in the reformulations Bin2 and Bin3 by using equidistant triangulations.
Finding ε-triangulations for univariate functions has been extensively covered in the literature under the term minimax approximation. For an overview, we refer to [35], where the author also provides an algorithm for finding an ε-optimal piecewise polynomial approximation of degree n for a given continuous univariate function. In particular, this algorithm can be used to find pwl. approximations. Another approach can be found in [42]. Here, the authors present a mixed-integer non-linear optimization program (MINLP) for computing an ε-optimal continuous pwl. approximation for a given univariate function. However, both approaches do not provide closed functional relations for the required number of simplices depending on ε. In contrast, our focus here will be on deriving functional relations for the number of simplices of ε-families of triangulations in Bin1, Bin2 and Bin3. We start with a relation for ε-optimal families of triangulations in reformulation Bin1. In order to do so, we make use of the following lemma about linear approximations of univariate quadratic functions, which is straightforward to prove via differential calculus.
It is attained at the centre of the domain, i.e. at x * := x+x 2 . The following result extends Lemma 2 to pwl. approximations of univariate quadratic functions. It says that an equidistant placement of vertices minimizes the approximation error.
formed by an equidistant placement of the n + 1 vertices x 0 := x < . . . < x n :=x ∈ R. Further, let g : [x,x] → R be the pwl. approximation of G w.r.t. T . Then the corresponding approximation error is given by Furthermore, the approximation error of g is minimal among all pwl. approximations of G over n simplices.
Proof Let the triangulation T := {S 0 , S 1 , . . . , S n−1 } be given by the simplices As the corresponding pwl. approximation g is linear over each S i and coincides with G at the vertices, its linear segments are given by functions g i : Lemma 2 states that the approximation error over each simplex S i is attained at the respective midpoint, with Thus, the approximation error is minimized by an equidistant placement of the vertices, i.e.
Note that the approximation error for a univariate quadratic function only depends on the diameter of the domain and the number of simplices of the triangulation and is thus invariant under shifts of the domain itself.
We can now prove that particular equidistant families of triangulations are ε-optimal for reformulation Bin1. Then the combined approximation error of (T Bin1 2

is given by a pair of equidistant triangulations with
Proof First, note that D 1 × D 2 is a quadratic box with a width of (x − x +ȳ − y)/2. Furthermore, the feasible domain of the variable substitution in Bin1, given by is a rhombus inscribed into this box. This situation is depicted in Fig. 1. we assume n 1 ≤ n 2 . Further, for any p * 1 ∈ D 1 and p * 2 ∈ D 2 , we define the projections of I onto the coordinate axes as We consider now the following two exhaustive cases 1) and 2): . . , n 2 : From this assumption it follows that T 1 has to be equidistant. Moreover, we know from Lemma 3 that ε f Bin1 1 ,F Bin1 1 (T 1 ) =ε. By the same arguments, we also know that It is obvious by geometric reasoning that the diameter of the projection I p * 1 is longer than (x − x +ȳ − y)/(2n 1 ), see Fig. 1a. As a result, there must be at least one vertex p 2, j contained in I p * 1 . As the approximation error at a vertex is always zero, it follows that the approximation error at In summary, we have Again, by geometric arguments, I p * 1 must be longer than (x − x +ȳ − y)/2n 1 . However, due to the fact that the approximation error at a vertex is always zero, I p * 1 cannot contain any vertex p 2, j ∈ N (T 2 ) as this would imply that we have a point in I p * 1 at which the combined approximation error is greater thanε, namely ( p * 1 , p 2, j ). Consequently, we have This means that at the midpoint p * 2 of D 2 (which is also the midpoint of Obviously, I p * 2 = D 1 , and therefore D 1 cannot contain

(a) (b)
any points with an approximation error of zero, which is a contradiction to the fact that f Bin1 1 is a pwl. approximation (interpolation).
It is not straightforward how to obtain a similar result as Lemma 4 for reformulations Bin2 and Bin3. The difficulty stems from the fact that in these two cases we have to approximate three functions simultaneously, instead of only two as in Bin1. However, we can still use equidistant triangulations to determine upper bounds on the number of simplices for Bin2 and Bin3.
Proof To obtain an ε-family of triangulations for Bin2, we use Lemma 3 to construct εtriangulations for each of the two concave terms −x 2 , approximated by − f Bin2 2 and −y 2 , approximated by − f Bin2 3 , as well as a 2ε-triangulation for the convex term (x + y) 2 , approximated by f Bin2 1 . This directly yields the number of simplices stated in the claim. Taking into account the prefactor of 0.5 in the variable substitution, Lemma 1 then certifies that we have indeed found an ε-family of triangulations.
The same result as above holds for Bin3, as it consists of the same quadratic terms, only with switched signs. The upper bounds for ε-families of triangulations derived so far are summarized in Table 2.
If we do not require ε-approximations for each of the terms −x 2 (or x 2 ) and −y 2 (or y 2 ) in Bin2 (or Bin3), but rather only require a 2ε-approximation for the combined approximation of these two functions, we can still apply Lemma 1 to obtain equidistant ε-families of triangulations, and it is possible in many cases to improve the bounds presented in Table 2. We can determine these improved bounds by solving a mixed-integer quadratically constrained quadratic program (MIQCQP) as follows.

Remark 1
Let ε > 0 be a prescribed maximum combined error for a pwl. approximation of F either via Bin2 or Bin3. Then we can compute the minimum possible number of simplices for any corresponding family of equidistant ε-triangulations as the optimal value n * of the following optimization problem: n * := min n 1 ,n 2 ,n 3 The variables n 1 , n 2 and n 3 model the number of simplices used for the triangulations T Bin2 ) in the pwl. approximation of the terms −x 2 , −y 2 and + p 2 (or x 2 , y 2 and − p 2 ) respectively, see "Appendix A" for the complete models. The two inequality constraints of Problem (6) model the max-expression in the upper bound Table 2 Upper bounds on the minimal number of simplices in an ε-family of triangulations in Bin1, Bin2 and Bin3. For Bin1, this is also the size of an ε-optimal family of triangulations

Reformulation
Max. required number of simplices on the combined approximation error provided by Lemma 1; the respective terms on the left-hand sides stem from Lemma 3. Note that Problem (6) can be equivalently reformulated as a non-convex MIQCQP: n * := min n 1 ,n 2 ,n 3 ,η 1 ,η 2 with auxiliary variables η 1 and η 2 .
We cannot make a general hierarchical statement among the univariate reformulation Bin1, Bin2 and Bin3, since we do not know ε-optimal families of triangulations for Bin2 and Bin3. However, the simple fact that in Bin1 we only approximate two instead of three univariate functions suggests that ε-optimal families of triangulations for Bin2 and Bin3 consist of more simplices than those for Bin1. xxx

Bivariate pwl. approximations
Finding a bivariate ε-optimal triangulation for the approximation of F over a rectangular domain is still an open problem, see the elaborations in [29] and the references therein. However, it will be sufficient for us to determine a lower bound on the number of simplices in an ε-optimal triangulation to see that in essence bivariate pwl. approximations of F require more simplices than univariate ones. In order to derive this lower bound, we first prove the following rather general lemma, which has been presented in the dissertation [12] of the second author. It gives sufficient conditions under which the maximum approximation error between a non-linear function and its pwl. approximation is attained at a facet of one of the simplices of the triangulation. Proof Let S ∈ T , and let g S be the linear approximation of G over the simplex S. Furthermore, let x ∈ S be a point in the interior of the simplex S, and let L x be a line such that G is linear along S ∩ L x . Naturally, g S is also linear along S ∩ L x , which therefore also holds for the function g S − G. Thus, g S − G attains its minimum on one end point of the line segment S ∩ L x and its maximum on the other end point. Therefore, the error function |g S − G| over S ∩ L x attains its maximum, i.e. the maximal approximation error, on one of the facets of S. As S ∈ T and x ∈ S were chosen arbitrarily, this finishes the proof.
With the help of the above lemma, we can now characterize the approximation error of a bivariate pwl. approximation of F. Note that the following result is well known in the literature. We show it again in order to demonstrate the utility of Lemma 6 in delivering a concise proof.
Proof It is obvious that the prerequisites of Lemma 6 apply to F. In particular, for each point in some simplex S ∈ T , F is linear along each of the two coordinate axes. Consequently, the approximation error is attained over a facet e of S. We can now parametrize the functions f |e and F |e , i.e. the restrictions of f and F onto e, using the convex combination of its endpoints (x 0 , y 0 ) and (x 1 , y 1 ). By writing each point (x, y) ∈ e as (x, y) = (x 0 , y 0 ) + (1 − λ)(x 1 , y 1 ) for some λ ∈ [0, 1], we can express f |e , F |e and E f |e ,F |e as functions in λ: Lemma 2 implies that the approximation error, i.e. the maximum of the quadratic error function E f |e ,F |e , has a value of and is attained at λ * = 0.5, corresponding to the centre of e.
Lemma From 7, we can conclude that the (maximum) error of a bivariate pwl. approximation of F corresponding to a given triangulation of D is always attained at the centre of a facet of one of its simplices. In [29], the author uses this property to formulate the problem to find ε-optimal triangulations as an MIQCQP. To the best of our knowledge, this is the only work considering provably ε-optimal triangulations of the rectangular domain D for the approximation of F. Unfortunately, due to the size of the resulting optimization model, this approach is computationally intractable even for small instances. However, in order to prove that univariate ε-families of triangulations require fewer simplices than any bivariate ε-triangulation for a sufficiently small approximation error ε, it suffices to derive a suitable lower bound for the size of an ε-triangulation. The following lemma gives such a lower bound by using so-called ε-optimal triangles. An ε-optimal triangle satisfies a prescribed approximation error bound of ε while taking a maximum possible area. The idea of the following lower bound is to assume that there exists a triangulation consisting exclusively of ε-optimal triangles. simplices.
Proof In [40], the authors show with the help of a version of Lemma 7 that the area of an ε-optimal triangle is 2 √ 5ε. The area of the rectangular domain D is (x −x)(ȳ − y). Assuming that we can triangulate D solely by ε-optimal triangles, we obtain the indicated lower bound. Figure 2 shows two different 0.25-optimal triangles as an example. Together they form a parallelogram. Therefore, copies of the two triangles can be arranged to obtain a triangulation of the plane R 2 . However, it is unclear if or how we can use ε-optimal triangles to triangulate polyhedral domains, such as boxes. The problem with using only ε-optimal triangles is their orientation in the plane. Since we want to triangulate an axis-parallel box domain, we have at least four edges that are axis-parallel. However, there is no ε-optimal triangle that has an axis-parallel edge. If a triangle has at least one axis-parallel edge, its maximal area can be at most 4ε instead of 2 √ 5ε, as shown in [29]. For more information about ε-optimal triangles, we refer the reader to [4,40]. For an overview of actual triangulations of box domains to approximate variable products, see [7].
Furthermore, it is easy to see that the lower bound from Lemma 8 is not always tight. From Monsky's Theorem in [34], we know that we cannot triangulate a rectangle with an odd number of simplices such that all simplices have the same area. As a consequence, at least for all values of ε for which the lower bound is an odd number, we need at least one additional simplex than the lower bound suggests.

Comparison of univariate and bivariate approximations
We close Sect. 3.1 by comparing univariate and bivariate approaches with respect to the required number of simplices. Our main result concerning ε-approximations of F then says the following: Via the reformulations Bin1, Bin2 and Bin3 we can always obtain ε-families of triangulations with fewer simplices than any bivariate ε-triangulation, if the approximation accuracy ε is sufficiently small. This finding is formally stated in Theorem 1.
For any given ε, we can compare the bounds stated in Table 2 and Lemma 8 respectively in order to determine if univariate or bivariate approximation yields smaller triangulations.
To illustrate Theorem 1, we provide some exemplary numerical results for the concrete domain D = [0, 2] × [0, 6] in Table 3. We list the numbers of simplices in the triangulations constructed via Lemma 4 for Bin1 and Remark 1 for Bin2 and Bin3 together with the actual approximation error in the columns entitled |T | and ε f ,F (T ), respectively. For the bivariate approximation, we list the lower bounds from Lemma 8.
For all approximation accuracies lower than 0.25, the equidistant pair of triangulations in Bin1 dominates all other triangulations. Further, for the smallest considered approximation accuracy ε = 0.05, all univariate numbers fall below the bivariate lower bound. In particular, Bin1 requires three times less simplices than the bivariate lower bound postulates. This demonstrates the advantage of univariate reformulations for pwl. approximations most clearly.

Envelopes and strength of the continuous relaxations
An important property of any MIP formulation is the tightness of its continuous relaxation (CR), i.e. the set obtained by relaxing the integrality constraints. Very often, MIP formulations of pwl. functions are used to represent or approximate the non-linear parts of an optimization problem. The usual solution method is then a branch-and-cut approach, in which a continuous relaxation of that problem is solved at each node in the branch-and-bound tree to compute bounds on the objective function value of the optimization problem. In general, a tighter relaxation is more desirable as it yields a smaller branch-and-bound tree, which in turn often leads to shorter computation times. Thus, when comparing MIP formulations for the approximation of gra(F) it is relevant to study the quality of the respective CRs.
In the following, we compare the bivariate MIP formulation (3) with the univariate MIP formulations (5), (10) and (12), where the latter two are stated explicitly in "Appendix A". Since these MIP formulations require additional auxiliary variables, we compare the quality of their respective continuous relaxation based on the volume of their PCRs, i.e. after projection to the surrounding space of gra(F). This will lead to two main results. Firstly, we show that the PCR of any bivariate MIP formulation equals conv(gra(F)). Secondly, we show that the PCRs of univariate MIP formulations are strict relaxations of conv(gra(F)).

Continuous relaxations of bivariate pwl. approximations
According to Definition 8, the PCR of a sharp MIP formulation actually coincides with the convex hull of the modelled pwl. graph. This means that in this sense, all sharp MIP formulations of a graph are equivalent. Sharpness is a property many well-known MIP formulations fulfil, such as the convex-combination method, the multiple-choice method and the incremental method (see [46]).
In the following, we consider sharp MIP formulations M f for gra( f ), where f is a bivariate pwl. approximations of F. For these, we show that the PCR proj (x,y,z) (C(M f )) is not only independent of the chosen MIP formulation, but also independent of the underlying triangulation that defines f . For this purpose, we first recall some important notions concerning the convex and the concave envelope of a given function; see [45] for a more extensive treatment of the subject.

Definition 10
Let B ⊂ R n be a polytope with vertices V (B). We say that a continuous function G : holds for every x ∈ B. In this case, we also call the function G itself convex polyhedral. Analogously, the function G has a vertex polyhedral concave envelope if holds for every x ∈ B; the function G is then called concave polyhedral.
For functions that are convex or concave polyhedral, we can show that this property also carries over to their pwl. approximations. This new result allows us to directly give an algebraic representation of proj (x,z) C(M f ) from the convex and concave envelope of F.

Lemma 9 Let B ⊂ R n be a polytope and G : B → R be a convex (concave) polyhedral function. Further, let g be a pwl. approximation of G over B, defined by a triangulation T . Then convenv B (g) (caveenv B (g)) is convex (concave) polyhedral as well and convenv
Proof It suffices to show the statement for the convex polyhedral case as the concave polyhedral one is analogous. As g is a pwl. approximation of G, we have g( It remains to show that g(x) ≥ convenv V (B) (g)(x) for all x ∈ B. To this end, let x ∈ B, and let S ∈ T be a simplex with vertices s 0 , . . . , s n , chosen such that x ∈ S holds. Then there exist λ i ≥ 0, i = 0, . . . , n, such that x = n i=0 λ i s i , with n i=0 λ i = 1. Thus, it follows

This results in convenv
This leads to the following central result for pwl. approximations f of F. It says that the PCR of (3) is (i) independent of the actual choice of f and (ii) independent of the MIP formulation modelling gra( f ) as long as the MIP formulation is sharp.
Proof In [43,Remark 1.3], it is shown that multi-linear functions on boxes are both convex and concave polyhedral. Thus, F has a vertex polyhedral convex and concave envelope. By Lemma 9, every pwl. approximation f of F is also convex and concave polyhedral. In addition, F(x, y) = f (x, y) = x y holds for all (x, y) ∈ V (D). It follows that and therefore conv(gra(F)) = conv(gra( f )).
From the sharpness of M f for gra( f ), we can conclude that proj (x,z) C(M f ) = conv(gra( f )) = conv(gra(F)), which completes the proof.
From the literature, conv(gra(F)) is known as the McCormick relaxation of F (cf. [32]). It is defined by the two functions C L : D → R and C U : The McCormick relaxation is the tightest relaxation of gra(F) that any MIP formulation can obtain. In the following remark, we discuss how the relaxation of bivariate MIP formulations can be tightened when additional restrictions are added for x and y.

Remark 2
We consider the special case where D is intersected with a compact set Z ∈ R 2 . This might be the case if F occurs as a term in the objective function or a constraint of an optimization problem. For this case, the set Z can model a large variety of possible constraints involving the variables x and y. We know the following: Convex envelopes as functions D → R This means that the PCR of M f restricted to D ∩ Z can potentially be tightened by adding additional constraints. See [2], where the authors consider the set Z := {(x, y) ∈ R 2 | x y ≤ u} for some u ∈ R and derive conv(gra D∩Z ( f )) by adding additional linear and conic constraints to conv(gra( f )) ∩ (Z × R).

Continuous relaxations of univariate pwl. approximations
We now turn to the PCRs of sharp univariate MIP formulations as in (5), (10) and (12). We point out that univariate reformulations are described by separable functions over rectangular domains. Such functions are known to be sum decomposable; see [45]. This means that the envelopes of separable functions are determined by the sum of the envelopes of their univariate summands; see also [19]. As a consequence of this, the convex and concave envelopes of pwl. univariate approximations of F, and thus the PCRs of the corresponding MIP formulations, depend on both the choice of the univariate reformulation and the chosen triangulations defining the pwl. approximations. The dependency on the triangulations is in contrast to the result we had in the bivariate case. The consequence is that the tightness of the PCR is influenced by the approximation error and thus depends on the number and placement of the vertices of the triangulations. For further details we refer to [9], where the effects of the approximation error on PCRs are discussed, and neglect the approximation error in the following. We rather assume that the approximation error is sufficiently small so that it does not interfere with the comparison of the PCRs. Consequently, we focus on the envelopes that we obtain from the non-linear univariate reformulations Bin1, Bin2 and Bin3, i.e. (4), (9) and (11). Note that each of the univariate reformulation Bin1, Bin2, and Bin3 is a sum of quadratic functions which are all either convex or concave. The convex (concave) envelope of each convex (concave) summand is the convex (concave) function itself. In contrast, a convex (concave) function is vertex polyhedral; its concave (convex) envelope is therefore given as the linear interpolant which uses the domain bounds as vertices. In Table 4, we list the convex and concave envelopes of the pwl. approximations f Bin1 : D → R, f Bin2 : D → R and f Bin3 : D → R of F that we obtain by exploiting sum decomposability as explained above.
We emphasize that these envelopes are strict under-resp. overestimators of F and thus only give a relaxation of conv(gra(F)) in the sense of Eq. (2). Further, we also state the respective PCRs in Table 4. The following proposition compares the volumes of these PCRs. It states that among the three univariate reformulations, the PCR proj (x,z) C(M f Bin1 ) is a strictly tighter relaxation of gra(F) than proj (x,z) C(M f Bin2 ) and proj (x,z) C(M f Bin3 ), which coincide in terms of volume.

Lemma 10 The volumes V D
Bin1 , V D Bin2 and V D Bin3 of the projections proj (x,y,z) C(M f Bin1 ), proj (x,y,z) C(M f Bin2 ) and proj (x,y,z) C(M f Bin3 ), respectively, form the following hierarchy: . Proof For the volumes of the two projections proj (x,y,z) C(M f Bin2 ) and proj (x,y,z) Both volumes of V D Bin2 and V D Bin3 are given by The volume of the projection proj (x,y,z) C(M f Bin1 ) is given as Together with (7), we obtain which completes the proof.

Comparison of the univariate and bivariate continuous relaxations
We now compare the PCRs that result from the univariate and bivariate MIP formulations.
The following theorem says that the PCRs of the univariate MIP formulations always yield looser relaxations of gra(F) than the PCR of a bivariate MIP formulation.
Further, we know from Lemma 10 that Bin1 provides the tightest CR among the univariate reformulations. It now holds that the difference of these two volumes is always greater than zero, i.e. Thus, To quantify this downside of the univariate MIP formulations, we calculate the ratio between the volume of their PCRs to the volume of conv(gra(F)). We denote the ratios by Obviously, the ratios R D Bin1 , R D Bin2 and R D Bin3 are invariant under axial shifts of the domain D. This means that the ratios depend only on the length of the axes (x − x) and (ȳ − y). In Fig. 3, we plot R D Bin1 , R D Bin2 and R D Bin3 with respect to the elongation and scaling of the domain by varying (x − x) and (ȳ − y). In accordance with Theorem 3, Bin1 always yields a better ratio than either of Bin2 or Bin3. Furthermore, it is noteworthy that the more rectangularly stretched D is, the worse the ratios of the univariate reformulations become. The ratios start from 2.5 (Bin1) and 3.5 (Bin2, Bin3) on the quadratic domain D = [0, 1] × [0, 1] and then increase towards infinity as the domain becomes more rectangular.
To illustrate the shapes of the different PCRs, we have plotted them exemplarily for the quadratic domain D = [0, 1] × [0, 1] in Fig. 4. Although the volumes V D Bin2 and V D Bin3 are the same, it can be shown that C L 2 is a tighter convex underestimator for F over D than C L 3 . The opposite is true for the concave overestimators, where C U 3 is a tighter convex overestimator than C U 2 . These observations are of particular interest in the context of an optimization problem. If for example F appears in the objective function of a minimization problem, Bin2 gives a tighter convex underestimator, while Bin3 gives a tighter convex overestimator if F instead appears in the objective function of a maximization problem. However, this clear hierarchy does not hold for Bin1, which yields tighter or less tight relaxations than Bin2 or Bin3 depending on the elongation of the domain and the optimization sense. Formal proofs of these hierarchical observations are given in Section A.1.

Discussion and guidelines for practice
In Sect. 3.1, we have shown that univariate MIP formulations are superior to bivariate MIP formulations when it comes to the size of the underlying triangulation required to attain a certain high approximation accuracy for F. However, this is in part bought by the fact that their corresponding PCRs are looser, as we showed in Sect. 3.2. In this section, we discuss some consequences of these observations for the practical use of pwl. approximations in the modelling of optimization problems.
On the one hand, a bivariate MIP formulation is favourable if we are interested in obtaining good dual bounds for a pwl. approximation of a given non-convex MIQCQPs early in the solution process, for example. This is mainly because in the root node it yields the best possible linear-programming (LP) bound as its PCR equals the McCormick envelope, independent of the number of simplices used, as we showed in Theorem 2. In contrast, in Theorem 3 we have proved that the PCR of any univariate MIP formulation is looser than the bivariate PCR. Therefore, the initial LP bound at the root node is weaker.
On the other hand, if instead the optimal solution of a high-accuracy MIP approximation of a certain MIQCQP is required, the results of Sect. 3.1 suggest to pursue a univariate reformulation scheme, as it requires less simplices to obtain an ε-approximations for some prescribed guarantee ε. To compensate for the disadvantage of looser PCRs in this case, we can easily tighten the univariate reformulation by incorporating a univariate variant of the well-known McCormick cuts, which are known to completely describe the convex hull of F. To this end, we can simply replace the term x y in the corresponding univariate reformulation of the constraint at hand. We exemplarily state the resulting version of the McCormick cuts z ≥ x y + x y − x y, z ≥x y + xȳ −xȳ, For Bin2 and Bin3, the corresponding McCormick cuts are straightforward to compute as well. With an increasing prescribed accuracy of a pwl. approximation, a bivariate approach requires unproportionally more simplices and consequently binary variables. Hence, a univariate reformulation approach together with the addition of the four inequalities (8) quickly becomes the cheaper alternative in terms of complexity. This recommendation is in line with the results of [1], where pwl. approximations are utilized to solve MINLPs arising in the context of alternating current optimal power flow. The authors reformulate the bilinear terms in their original model for the problem by the univariate reformulation Bin2. Additionally, they add the reformulated McCormick cuts shown in (8). It turns out that the resulting univariate model is solved much faster than the bivariate one, while the solutions of both models are of the same approximation quality. To the best of our knowledge, the authors of [1] are the first who use such a univariate reformulation enhanced with additional cutting planes.
Although the figures stated in Table 3 suggest that Bin1 compares favourably to Bin2 and Bin3 in terms of the number of required simplices, the structure of the constraint set of the considered optimization problem is crucial. If, for instance, bounds for the term x − y are known a priori, for example inferred from the problem data, using Bin3 can be advantageous (cf. [50]). The same holds for Bin2, if bounds for the term x + y are available. Moreover, in case that for a subset x 1 , x 2 , . . . , x n of the variables at hand many of the bilinear terms x i x j with i, j ∈ {1, 2, . . . , n} occur in the constraints of the problem, using Bin2 or Bin3 can again be beneficial. The reason for this is the following general observation. If the same non-linear function G occurs multiple times in an optimization problem (except for linear factors), we can replace this function with the same variableg everywhere in the model and add the constraintg = G only once. This way, we need only one pwl. approximation for all occurrences of G. Thus, if we reformulate the terms x i x j via Bin2 or Bin3, for each of the O(n) many quadratic monomials x 2 i and x 2 j only one pwl. approximation has to be constructed. Apart from this, we only need one pwl. approximation for each of the O(n 2 )- In case of Bin1, however, we need two different pwl. approximations for each of the O(n 2 )-many p 2 1,i, j, = ( 1 2 (x i + x j )) 2 and p 2 2,i, j = ( 1 2 (x i − x j )) 2 .

Conclusion and discussion
In this paper, we studied MIP formulations for pwl. approximations of bilinear terms in optimization models. More precisely, we compared MIP formulations for direct bivariate pwl. approximations of variable products to MIP formulations for pwl. approximations after univariate reformulations with respect to two different metrics of efficiency. First, we proved that for a sufficiently small prescribed approximation error ε, all considered univariate reformulations allow more compact ε-approximations than any bivariate ε-approximation requiresas measured by the number of simplices in the underlying triangulation. In this sense, concerning the size of the resulting pwl. approximations, and consequently the required number of binary variables, our results are a strong indication for using univariate reformulations in optimization problems. Second, we showed that, in contrast, all univariate reformulations lead to genuinely weaker continuous relaxations than bivariate MIP formulations. These two opposing characteristics of the respective MIP formulations explain many of the mixed computational results found in the literature. Finally, we discussed our theoretical results with regard to their application in practice. Notably, the looser relaxations of the univariate reformulation approaches can be improved to equal those of a bivariate pwl. approximation by adding linear cutting planes, the so-called McCormick cuts. A first algorithmic approach constructed in this fashion can already be found in the literature ( [1]), reporting very good computational results for the considered application. In this way, the authors profit from compact MIP formulations as well as from tight relaxations at the same time. Both our theoretical results and these first empirical evidence indicate that it would be promising to study generic algorithms for MIQCQPs based on univariate reformulations as part of future research on the topic. thank the anonymous referees for their insightful comments, which led to a substantial improvement of the paper.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A.1: A hierarchy of convex underestimators
In the following, we derive a hierarchy for the convex underestimators that result from the continuous relaxations of the univariate reformulations (see Table 4). The following results are useful, for example, if F occurs as a term in the objective function to be minimized in some optimization problem. This is because the choice of convex underestimators determines the tightness of the resulting continuous relaxation (while the overestimators of F are not relevant due to the optimization sense). We start by comparing the convex underestimators C L 1 with C L 3 , belonging to Bin1 and Bin3 respectively.

Proposition 1
The convex envelope C L 1 : D → R resulting from the univariate reformulation Bin1 is a tighter convex underestimator of F over D than the convex envelope C L 3 : D → R resulting from the univariate reformulation Bin3, i.e. we have and there exists a point (x, y) ∈ D with Proof We note that the first condition is equivalent to proving that the optimal objective value of the maximization problem max (x,y)∈D with C 31 : D → R and C 31 (x, y) := 4(C L 3 (x, y) − C L 1 (x, y)) = (x − y) 2 − (x + x −ȳ − y)(x − y) + (x −ȳ)(x − y), is less than or equal to 0, which we do in the following.
In Problem (13), we maximize a univariate convex quadratic function in x − y, which means that the maximum is attained at one of the two bounds of the domain of x − y over D, i.e. at either at (x,ȳ) or at (x, y). Evaluating C 31 at these two points yields This means that the optimal objective value of Problem (13) is indeed 0. Now consider the point (x, y). We have Thus, C L 1 is strictly tighter than C L 3 .
The same results as above also holds with respect to C L 2 and C L 3 , belonging to Bin2 and Bin3 respectively.

Proposition 2
The convex envelope C L 2 : D → R resulting from the univariate reformulation Bin2 is a tighter convex underestimator of F over D than the convex envelope C L 3 (x, y) resulting from the univariate reformulation Bin3, i.e. we have with C 23 : D → R and C 23 (x, y) := 2(C L 2 (x, y) − C L 3 (x, y)) = 2x y − (ȳ + y)x − (x + x)y + x y +xȳ.
Problem (14) minimizes a bilinear function over a box. It is obvious that C 23 is linear along both the x-axis and the y-axis, i.e. along the edges of the box. This means that C 23 is edgeconcave, and therefore the minimum of C 23 over D is attained at one of the vertices V D = {(x,ȳ), (x, y), (x, y), (x,ȳ)} of the box. By evaluation, we obtain: Between C L 1 and C L 2 , belonging to Bin1 and Bin2 respectively, C L 2 is the tighter convex underestimator; however, this only holds over square-shaped domains.

Proposition 3
The convex envelope C L 2 : D → R resulting from the univariate reformulation Bin2 is a tighter convex underestimator of F over D than the convex envelope C L 1 (x, y) resulting from the univariate reformulation Bin1 if D is a square. In this case, we have C L 2 (x, y) − C L 1 (x, y) ≥ 0 ∀(x, y) ∈ D, and there exists a point (x, y) ∈ D with C L 2 (x, y) − C L 1 (x, y) > 0.
In other words, there exists a point (x, y) ∈ D with C L 2 (x, y) − C L 1 (x, y) > 0.