Enhancements of Discretization Approaches for Non-Convex Mixed-Integer Quadratically Constraint Quadratic Programming‹

We study mixed-integer programming (MIP) relaxation techniques for the solution of non-convex mixed-integer quadratically constrained quadratic programs (MIQCQPs). We present two MIP relaxation methods for non-convex continuous variable products that enhance existing approaches. One is based on a separable reformulation, while the other extends the well-known MIP relaxation normalized multiparametric disaggregation technique (NMDT). In addition, we introduce a logarithmic MIP relaxation for univariate quadratic terms, called sawtooth relaxation, based on [4]. We combine the latter with the separable reformulation to derive MIP relaxations of MIQCQPs. We provide a comprehensive theoretical analysis of these techniques, and perform a broad computational study to demonstrate the effectiveness of the enhanced MIP relaxations in terms producing tight dual bounds for MIQCQPs.


Introduction
In this work, we study relaxations of general mixed-integer quadratically constrained quadratic programs (MIQCQPs), which are defined as . . , m, x i P rx i ,x i s i P 1, . . . , n, y P t0, 1u k , ‹ B. Beach and R. Hildebrand are supported by AFOSR grant FA9550-21-0107. Furthermore, we acknowledge financial support by the Bavarian Ministry of Economic Affairs, Regional Development and Energy through the Center for Analytics -Data -Applications (ADA-Center) within the framework of "BAYERN DIGITAL II".
for Q 0 , Q j P R nˆn , c 0 , c j P R n , d 0 , d j P R k and b j P R, j " 1, . . . m. More precisely, we consider discretization techniques for non-convex MIQCQPs that allow for relaxations of the set of feasible solutions based on mixed-integer programming (MIP) formulations. To this end, we study a number of MIP formulations that form relaxations of the quadratic equations z " x 2 and z " xy. These MIP formulations can then be applied to (1) by introducing auxiliary variables and one such quadratic equation for each quadratic term to form a relaxation of the overall problem. In particular, we consider the strength of various MIP relaxations applied directly to a given problem, which is the simplest approach to enable the solution of MIQCQPs via an MIP solver. Our focus here is to analyze these approaches both theoretically and computationally with respect to the quality of the dual bound they deliver for MIQCQPs.
Background MIQCQPs naturally arise in the solution of many real-world optimization problems, stemming e.g. from the contexts of power supply systems ( [1]), gas networks ( [17,25]), water management ( [21]) or pooling/mixing ( [3,9,13,28,29]). See [23,35] and the references therein for more examples. For the solution of such problems, there are a number of different approaches, which differ in case the problems are convex or non-convex. Within this work, we focus on the most general case, i.e. non-convex MIQCQPs, and only require finite upper and lower bounds on the variables, as stated in (1).
In the literature, a variety of solution techniques for non-convex MIQCQPs exists. The most prominent class among them are McCormick -based techniques, see e.g. [10,11,12,14,33,34]. For quadratic programs, in particular, convexification can be applied to bivariate monomials xy by introducing a new variable z " xy and constructing the convex hull over the bounds on x and y. This yields the so-called McCormick relaxation, which is the smallest convex set containing the feasible set of the equation z " xy for given finite bounds on x and y. This relaxation is known to be a polytope described by four linear inequalities (see [32]), and it is tighter the smaller the a priori known bounds on x and y are. Hence, one standard approach is spatial branch-and-bound, where the key idea is to split the domain recursively into two subregions. For instance, one can choose the two subregions where x ďx and x ěx, respectively, for some valuex. By branching on subregions, we can improve the convexification of the feasible region by adding valid inequalities to the subproblems. Thus, applying spatial branchand-bound in conjunction with convexification (such as McCormick Relaxations) sequentially tightens the relaxation of the problem.
Alternatively, similar effects can be achieved through some kind of binarization. This is a general term that describes the conversion of continuous or integer variables into binary variables. By branching on these new binary variables, we also partition the space into subproblems in a way that simulates spatial branchand-bound. The binarization of the partition makes the resulting problem a piecewise linear (pwl. ) relaxation of the original problem with binary auxiliary variables. McCormick-based methods can differ in the way the partition and the binarization is performed. The partition can be performed purely on one variable or on both variables, equidistantly or non-equidistantly. The binarization can be done linearly or logarithmically in the number of partition elements, see [38,30]. In a broader sense, (axial-)spatial branching for bilinear terms can also be seen as a piecewise McCormick linearization approach. Here, the partition is not performed a priori, but rather an initial partition is refined via branching on continuous variables. An overview of spatial-branching techniques can be found in [5].
Another common idea for linearizing variable products is to use quadratic convex reformulations as in [6,24,20,19,4]. This technique transforms the nonconvex parts of the problem into univariate terms via reformulations. In [4], the authors apply diagonal perturbation to convexify the quadratic matrices. The resulting univariate quadratic correction terms are then linearized by introducing new variables and constraints of the form r " x 2 , which are then approximated by pwl. functions. The binarization of the univariate pwl. functions is done logarithmically by using the so-called sawtooth function, introduced in [40]. An advantage of this approach is that only linearly many expressions of the form r " x 2 have to be linearized instead of quadratically many equations of the form z ij " x i x j , with respect to the dimension of the original quadratic matrix. This approach yields a convex MIQCQP relaxation instead of the MIP relaxation obtained via a direct modelling using bilinear terms.
A further set of approaches relies on separable reformulations of the nonconvex variable products, as done e.g. in [8]. Here, each term of the form xy is reformulated as a sum of separable univariate terms, for example using the equivalent reformulation xy " 1 {2px 2`y2´p x´yq 2 q " 1 {2pr`s´tq withr " x 2 , s " y 2 and t " px´yq 2 as described by [2]. The univariate constraints, here equations of the form r " x 2 , s " y 2 and t " px´yq 2 , are then relaxed. Again, this approach can be combined with a logarithmic encoding of the univariate linear segments, as in [20,4]. In [8], the authors analyze the following possible reformulations: Bin2 : xy " 1 {2`px`yq 2´x2´y2˘, Bin3 : xy " 1 {2`x 2`y2´p x´yq 2˘.
They prove that MIP-based approximations of each of these univariate reformulations require fewer binary variables than a bivariate MIP-based approximation that guarantees the same maximal approximation error, if this prescribed error is small enough. However, this comes at the cost of weaker linear programming (LP) relaxations. Alternatively, one can also obtain an MIP relaxation of xy directly via a bivariate pwl. relaxation, see e.g. [8,7,25,38]. One way to do this is to perform a triangulation of the domain, which defines a pwl. approximation of the variable product. This pwl. approximation can then easily be converted into a relaxation of the feasible set by axis-parallel shifting, which yields a pwl. underestimator and overestimator. Bivariate pwl. approximations can also be binarized using (logarithmically-many) binary variables, see e.g. [25,38,30]. Contribution We compare different MIP relaxations approaches, both known ones and new ones, in terms of the dual bound they impose for non-convex MIQCQPs. Among others, we extend the known separable approximation approaches Bin2 and Bin3 from [8] to MIP relaxations for z " xy. Additionally, we introduce a novel MIP relaxation for the equation z " xy called hybrid separable (HybS) that is based on a sophisticated combination of Bin2 and Bin3 that allows us to relax only linearly-many univariate quadratic terms (in the dimension of the quadratic matrix). We combine these separable MIP relaxations for z " xy with a MIP relaxation, called sawtooth relaxation, for z " x 2 that requires only logarithmically-many binary variables. Thus, we can obtain MIP relaxations for MIQCQPs. The sawtooth relaxation is an extension of the sawtooth approximation from [4] which has the strong property of hereditary sharpness. Hereditary sharpness of a MIP formulation means that the formulation is tight in the space of the original variables, even after branching on integer variables. We can show that the sawtooth relaxation is also hereditary sharp. Furthermore, we extend the well-known McCormick-based approach called normalized multiparametric disaggregation technique (NMDT) by a double discretization of both variables in the equation z " xy. We refer to the latter as doubly discretized NMDT (D-NMDT).
In a theoretical analysis of the considered models, we show that both HybS and D-NMDT have theoretical advantages, such as fewer binary variables and better LP relaxations compared to Bin2, Bin3, and NMDT.
Finally, we perform an extensive numerical study where we use the considered formulations to generate MIP relaxations of non-convex MIQCQPs and boxQPs. Foremost, we test the different relaxation techniques in their ability to generate tight dual bounds for the original quadratic problems. It will be seen that the extensions HybS and D-NMDT have a clear advantage over their predecessors Bin2, Bin3 and NMDT. This effect becomes even more apparent on the dense instances.
Outline We proceed as follows. In Section 2, we introduce several useful concepts and notations used throughout the work. In Section 3, we present core formulations used repeatedly in our linear relaxations of quadratic terms. In Section 4, we introduce two new MIP relaxations for equations of the form z " xy. In Section 5, we prove various properties about the strengths of the MIP relaxations focusing on volume, sharpness, and optimal choice of breakpoints. In Section 6 we prove that the sawtooth relaxation is hereditarily sharp. In Section 7, we present our computational study.

MIP Formulations
Throughout this article, we use the following convenient notation: for any two integers i ď j, we define i, j :" ti, i`1, . . . , ju, and for an integer i ě 1 we define i :" 1, i . We will denote sets using capital letters, variables using lower case letters and vectors of variables using bold face. For a vector u " pu 1 , . . . , u n q and some index set I Ď n , we write u I :" pu i q iPI . Thus, e.g. u i " pu 1 , . . . , u i q.
Furthermore, we introduce the following notation: for a function F : X Ñ R and a subset B Ď X, let gra B pF q, epi B pF q and hyp B pF q denote the graph, epigraph and hypograph of the function F over the set B, respectively. That is, gra B pF q :" tpu, zq P BˆR : z " F puqu, epi B pF q :" tpu, zq P BˆR : z ě F puqu, hyp B pF q :" tpu, zq P BˆR : z ď F puqu.
In the following, we introduce MIP formulations as we will use them to represent these sets as well as the different notions of the strength of an MIP formulation explored in this work.
We will study mixed-integer linear sets, so-called mixed-integer programming (MIP) formulations, of the form P IP :" tpu, v, zq P R d`1ˆr 0, 1s pˆt 0, 1u q : Apu, v, zq ď bu for some matrix A and vector b of suitable dimensions. The linear programming (LP) relaxation or continuous relaxation P LP of P IP is given by P LP :" tpu, v, zq P R d`1ˆr 0, 1s pˆr 0, 1s q : Apu, v, zq ď bu.
We will often focus on the projections of these sets onto the variables u, i.e. proj u pP IP q :" tu P R d`1 : Dpv, zq P r0, 1s pˆt 0, 1u q s.t. pu, v, zq P P IP u. (2) The corresponding projected linear relaxation proj u pP LP q onto the u-space is defined accordingly.
In order to assess the quality of an MIP formulation, we will work with several possible measures of formulation strength. First, we define notions of sharpness, as in [4,27]. These relate to the tightness of the LP relaxation of an MIP formulation. Whereas properties such as total unimodularity guarantee an LP relaxation to be a complete description for the mixed-integer points in the full space, we are interested here in LP relaxations that are tight description of the mixed-integer points in the projected space. Definition 1. We say that the MIP formulation P IP is sharp if proj u pP LP q " convpproj u pP IP qq.
holds. Further, we call it hereditarily sharp if, for all I Ď L andẑ P t0, 1u |I| , we have proj u pP LP | z I "ẑ q " conv`proj u pP IP | z I "ẑ q˘.
Sharpness expresses a tightness at the root node of a branch-and-bound tree.
Hereditarily sharp means that fixing any subset of binary variables to 0 or 1 preserves sharpness, and therefore this means sharpness is preserved throughout a branch-and-bound tree.
In this article, we study certain non-polyhedral sets U Ď R d`1 and will develop MIP formulations P IP to form relaxations of U in the projected space, as defined in the following.
1. The maximum error of P IP w.r.t. U is defined as Epu, U q.
2. The average error of P IP w.r.t. U is defined as E avg pP IP , U q :" volpP IP zU q.
Via integral calculus, the second, volume-based error measure can be interpreted as the average pointwise error of all points u P proj u pP IP q. Note that whenever the volume of U is zero (i.e. it is a lower-dimensional set), the average error just reduces to the volume of P IP . Both of the defined error quantities for an MIP relaxation P IP can also be used to measure the tightness of the corresponding LP relaxation P LP . In Section 5.3.2, we use these to compare formulations when P LP is not sharp.

Core Relaxations
In the definition of the discretizations studied in this work, we repeatedly make use of several "core" formulations for specific sets of feasible points. They are introduced in the following.

McCormick Envelopes
For our discretizations of MIQCQPs, we will frequently need to consider terms of the form xy for continuous or integer variables x and y within certain bounds D x :" rx,xs and D y :" rȳ,ȳs, respectively.
To this end, we will often use the function F : D Ñ R, F px, yq " xy, D :" D xˆDy , and refer to the set of feasible solutions to the equation z " xy via the graph of F , i.e. gra D pF q " tpx, y, zq P DˆR : z " xyu. In order to simplify the exposition, we will, for example, often write gra D pxyq or refer to a relaxation of the equation z " xy instead of gra D pF q. We will do this similarly for the epigraph and hypograph of F as well as for the univariate function f : D x Ñ R, f pxq " x 2 and equations of the form z " x 2 , for example.
The convex hull of the equation z " xy for px, yq P D is given by a set of linear equations known as the McCormick envelope, see [32]: Mpx, yq :" px, y, zq P rx,xsˆrȳ,ȳsˆR : (4) ( .
x¨y`x¨ȳ´x¨ȳ ď z ďx¨y`x¨ȳ´x¨ȳ x¨y`x¨ȳ´x¨ȳ, ď z ď x¨y`x¨ȳ´x¨ȳ.
Note that, by construction in [40,4], S L is defined such that when α P t0, 1u L , the relationship between g j and g j´1 is g j " mint2g j´1 , 2p1´g j´1 qu for j " 1, . . . , L, which means that it is given by the "tooth" function G : r0, 1s Ñ r0, 1s, Gpxq " mint2x, 2p1´xqu. Therefore, each g j represents the output of a "sawtooth" function of x, as described in [40,37], i.e. when α P t0, 1u L , we have Now, we define the function F L : r0, 1s Ñ r0, 1s, which is a close approximation to x 2 in the sense stated in the following proposition, which summarizes useful information about this approximation from [40,4].
Using the relationships (11) and (12) between x and g, any constraint of the form z " x 2 can be approximated via the function The sawtooth functions G j for j " 1, 2, 3.  We use the above definitions to give an MIP formulation that approximates equations of the form z " x 2 .
Definition 4 (Sawtooth Approximation, [4]). Given some L P N, the depth-L sawtooth approximation for z " x 2 on the interval x P r0, 1s is given by The set (14) is a compact approximation of gra r0,1s pF L q in terms of the number of variables and constraints.
Based on the sawtooth approximation, we can now present the sawtooth relaxation for z " x 2 from [4], illustrated in Figure 3, which arises by shifting each approximating function F j , j " 0, . . . , L, down by its maximum error 2´2 j´2 (established in Proposition 1, Item 2) and then adding additional outer-approximation cuts to x 2 at x " 0 and x " 1.
Definition 5 (Sawtooth Relaxation, SR [4]). Given some L P N, the depth-L sawtooth relaxation for z " x 2 on the interval x P r0, 1s is given by px, zq P r0, 1sˆR : Dpg, αq P r0, 1s L`1ˆt 0, 1u L : (16) ( , Remark 1 (Transformation to General Bounds). Most of our MIP formulations can be extended to general intervals x P rx,xs by mapping rx,xs to r0, 1s via the The successive piecewise linear approximations of x 2 shifted down to be underestimators. The markers indicate the places where the underestimators coincide with x 2 and in fact show that the affine segments are tangent lines to the function. The inequality z ě F L pxq´2´2 L´2 in fact creates 2 L tangent lower bounds.
substitutionx " x´x x´x P r0, 1s and applying the sawtooth formulation to model the equation Thus, for general intervals, we first apply the approximation toẑ "x 2 , then add the equationsx " x´x x´x ,ẑ " z´xp2x´xq px´xq 2 . In our computational study in Section 7, these constraints are implemented as defining expressions forx andẑ, and the MIP formulations are constructed forx andẑ then. See Appendix B for the generalized MIP formulations under this transformation. Now, we consider the LP relaxation of S L , where each variable α j is relaxed to the interval r0, 1s. Then, via the constraints (10), we see that the weakest lower bounds on each g j w.r.t. g j´1 can be attained via setting α j " g j´1 , yielding a lower bound of 0. Thus, after projecting out α, the LP relaxation of S L in terms of just x and g can be stated as T L " px, gq P r0, 1sˆr0, 1s L`1 : (17) ( , The sawtooth relaxation (15) is sharp by Theorem 2 (proved later in this work), which follows in much the same way as the sharpness of the sawtooth Fig. 3. The sawtooth relaxation from Definition 5 at depths L " 0, 1, 2. The shaded region is the relaxation. Some additional inequalities are plotted to help visualize the inequalities with respect to the functions F j . Fig. 4. The tightened sawtooth relaxations R L,L 1 for the pairs pL, L1q " p0, 1q, p0, 2q, p1, 2q. By increasing L1 beyond L, we tighten the lower bound by creating more inequalities. This is done by only adding linearly-many variables and inequalities in the extended formulation to gain exponentially-many equally spaced cuts in the projection.
approximation (14), as established in [4,Theorem 1]. Thus, the LP relaxation of the sawtooth relaxation (15) yields the same lower bound on z as the MIP version due to sharpness and the convexity of F L . This allows us to define an LP outer approximation for inequalities of the form z ě x 2 : Definition 6 (Sawtooth Epigraph Relaxation, SER). Given some L P N, the depth-L sawtooth epigraph relaxation for z ě x 2 on the interval x P r0, 1s is given by We will prove in Proposition 2 that the maximum error for the sawtooth epigraph relaxation is 2´2 L´4 .
Finally, we combine the depth-L sawtooth relaxation (15) with the depth-L 1 sawtooth epigraph relaxation (18) for some L 1 ě L to obtain a sawtooth relaxation which is stronger in the lower bound, but uses the same number of binary variables.
Definition 7 (Tightened Sawtooth Relaxation, TSR). Given some L, L 1 P N with L 1 ě L, the tightened sawtooth relaxation for z " x 2 on the interval x P r0, 1s with upper-bounding depth L and lower-bounding depth L 1 is given by R L,L1 :" tpx, zq P r0, 1sˆR : Dpg, αq P r0, 1s L1`1ˆt 0, 1u L : (21)u, (20) px, g 0,L , αq P S L (21a) We will prove in Theorem 2 that the tightened sawtooth relaxation is also sharp, and in Theorem 3 that it is hereditarily sharp. We also note that the sawtooth epigraph relaxation (18), which requires no binary variables, can be used to tighten relaxations of univariate quadratic terms when using any alternative methods. For example, we use this relaxation to tighten the univariate forms of the relaxations NMDT ( [11], Section 4.2) and D-NMDT (Section 4.3).

MIP Formulations for Non-Convex MIQCQPs
In this section, we focus on MIP formulations for relaxing bilinear equations of the form z " xy. For convenience, we define a dense MIQCQP as an MIQCQP for which all terms of the form x 2 i and x i x j appear in either the objective or in some constraint. The novel formulations presented herein, D-NMDT and HybS, are extensions of existing formulations, designed to significantly reduce the number of binary variables required to reach the same level of approximation accuracy compared to their original counterparts NMDT, Bin2 and Bin3 for dense MIQCQPs, which will all be introduced in the following.

Separable MIP Formulations
We present three MIP formulations based on separable reformulations. A separable reformulation turns a multivariate expression into a sum of univariate functions. To this end, we make use of the reformulation approaches Bin2 and Bin3, given via Bin2: xy " 1 {2ppx`yq 2´x2´y2 q, Bin3: xy " 1 {2px 2`y2´p x´yq 2 q, see e.g. [8], and combine them with the sawtooth relaxation (20) to derive MIP relaxations for the occurring equations of the form z " xy. While the following MIP formulations based on Bin2 and Bin3 are natural extensions of the MIP approximations studied in [8] to MIP relaxations, we will also combine both reformulations to a new formulation in which the MIP relaxation requires significantly less binary variables if it is used to solve problems of the form (1) where the matrices Q j are dense.
Definition 8 (Bin2). The MIP relaxation Bin2 of z " xy, x, y P r0, 1s 2 , with a lower-bounding depth L 1 P N and an upper-bounding depth L P N, is defined as follows: p " x`y z " 1 {2pz p´zx´zy q px, y, zq P Mpx, yq px, z x q, py, z y q, pp, z p q P R L,L1 x, y P r0, 1s, p P r0, 2s.
Definition 9 (Bin3). The MIP relaxation Bin3 of z " xy, x, y P r0, 1s 2 , with a lower-bounding depth L 1 P N and an upper-bounding depth of L P N, is defined as follows: p " x´y z " 1 {2pz x`zy´zp q px, y, zq P Mpx, yq px, z x q, py, z y q, pp, z p q P R L,L1 x, y P r0, 1s, p P r´1, 1s.
Note that we apply the tightened sawtooth relaxation R L,L1 , defined in (20), not only to x, y P r0, 1s, but also to the variable p, where the domain is either r0, 2s or r´1, 1s. This is done by following the transformation in Remark 1 to map p and z p to the interval r0, 1s and then applying (20) to the transformed variables.
We now combine Bin2 and Bin3 to derive an MIP relaxation for z " xy based on bounding z in the following two ways: and then replacing each right-hand side with proper upper and lower bounds. We choose this setting so that we only have to model lower bounds for the px´yq 2and px`yq 2 -terms and can thus apply the sawtooth epigraph relaxation (18) to circumvent the use of binary variables for these terms. To this end, we introduce the continuous auxiliary variables p 1 , p 2 , z x , z y , z p1 , z p2 and z to obtain an equivalent relaxation for z " xy: Finally, we replace x 2 and y 2 in the non-convex constraints (24b) with a sawtooth relaxation (21c) of depth L and p 2 1 and p 2 2 in the convex constraints (24c) by a sawtooth epigraph relaxation (21e) with depth L 1 to obtain a relaxation of z " xy in (24d). The resulting model is especially interesting as, in contrast to Bin2 and Bin3, it does not require binary variables to model equations of the form p 2 1 " px`yq 2 and p 2 2 " px´yq 2 , since we only need to incorporate lower bounds as used in Q L .
Definition 10 (Hybrid Separable HybS). Let x, y P r0, 1s, and let L, L 1 P N. The following MIP relaxation for z " xy, which combines the relaxations Bin2 and Bin3, is called the hybrid separable MIP relaxation, in short HybS, with a lower-bounding depth of L 1 and an upper-bounding depth of L: p 1 " x`y, p 2 " x´y px, z x q, py, z y q P R L,L1 pp 1 , z p1 q, pp 2 , z p2 q P Q L1 1 {2pz p1´zx´zy q ď z ď 1 {2pz x`zy´zp2 q px, y, zq P Mpx, yq x, y P r0, 1s, p 1 P r0, 2s, p 2 P r´1, 1s.
As Q L1 in (25) is originally defined for variables in r0, 1s, we again use the transformation from Remark 1 to extend it to other domains. Note that, when some constraint of an MIQCQP has a dense quadratic matrix, the number of (24c)-type constraints is quadratic in the dimension of x. Thus, the number of binary variables for Bin2 and Bin3 is in Opn 2 Lq, while the formulation HybS requires only nL binary variables. As we will show in Section 5, the formulation HybS also has a strictly tighter LP relaxation than that of either formulation Bin2 or Bin3. This implies a smaller volume of the projected LP relaxation as well. We also note, however, that the MIP relaxation is not strictly tighter. For example, let L " L 1 " 1 and consider the point px, yq " p 1 4 , 3 4 q. The upper bound on z " xy produced by the MIP relaxation Bin2 at this point is z ď 3 16 , i.e. the exact value. The MIP relaxation HybS (as well as Bin3), however, has a weaker upper bound of z ď 1 4 at this point. When we apply any of the separable formulations Bin2, Bin3 and HybS to compute dual bounds for MIQCQPs in Section 7, all original univariate quadratic terms of the form x 2 i (i.e. those not resulting from any reformulations) are modelled via the tightened sawtooth relaxation (20).

Remark 2.
We can alternatively obtain a convex mixed-integer quadratic relaxation of z " xy by directly incorporating the quadratic constraints z x ď x 2 , z y ď y 2 , z p1 ě p 2 1 and z p2 ě p 2 2 in (24) exactly instead of using pwl. relaxations.

Base-2 NMDT
Now we present the base-2 version of the Normalized Multiparametric Disaggregation Technique (NMDT) by Castro [11], as described in [4,3], along with its univariate form (see [4,Appendix A]). Later, we also introduce a tightened version of NMDT for univariate quadratic terms that takes advantage of the sawtooth epigraph relaxation (18). In NMDT, the key idea for relaxing z " xy is to discretize one variable, e.g. x, using binary variables β P t0, 1u L and a residual term ∆ L x and then relaxing the resulting products β i y and ∆ L x y using McCormick envelopes.
The following derivation of NMDT can be transferred one-to-one to bases different to 2. The term NMDT was coined in [11], where the formulation was originally stated in base 10. The core idea behind the base-2 NMDT is to discretize x as then multiply by y to obtain the exact representation We then use McCormick envelopes to model all remaining product terms, α i y and ∆ L x¨y , to obtain the final formulation.
Definition 11 (NMDT, [11]). The MIP relaxation NMDT of z " xy with x P r0, 1s, y P r0, 1s and a depth of L P N is defined as follows: x " Since McCormick envelopes are exact reformulations of the variable products if at least one of the variables is required to be binary, the maximum error of NMDT with respect to z " xy is purely due to the McCormick relaxation of ∆ L z " ∆ L x¨y , with a value of 2´L´2. An advantage of the NMDT approach compared to the other formulations in this section is that it requires fewer binary variables to reach a desired level of accuracy for bipartite MIQCQPs, for which the quadratic part in each constraint is of the form x T Qy. This is due to the fact that one has only to discretize either x P R n or y P R m . Thus, to reach a maximum error of 2´2 L´2 for each bilinear term, NMDT requires only 2L mintm, nu binary variables instead of the Lpm`nq variables required by the approaches D-NMDT (see Section 4.3) or HybS. In contrast, NMDT requires twice the number of binary variables to reach the same level of accuracy if all quadratic terms x i x k and x 2 l with k " 1, . . . , n and l " 1, . . . , m must be modelled, for example if Q is dense.
Next, we show how to model univariate quadratic equations z " x 2 with the NMDT technique: Definition 12 (Univariate NMDT ( [11])). The MIP relaxation NMDT of z " x 2 with x P r0, 1s and a depth of L P N is defined as follows: x " Note that for any depth L, the univariate formulation NMDT yields a maximum error of 2´L´2 instead of the 2´2 L´2 in the sawtooth relaxation (15), see [4, page 39]. Further, the formulation NMDT is not sharp. For example at x " 1 2 , its LP relaxation admits the solution β j " 1 2 for all j P L , ∆ L x " 2´L´1, u j " 0 for all j P L , ∆ L z " 0 and z " 0, which is not in the convex hull of gra r0,1s px 2 q. We can tighten the lower bound obtained from (28) by adding the sawtooth epigraph relaxation (18) of depth L 1 (with L 1 ě L), i.e. px, zq P Q L1 . We refer to NMDT with this lower-bound tightening for quadratic terms as T-NMDT.

Definition 13 (Univariate T-NMDT). The MIP relaxation T-NMDT of
z " x 2 with x P r0, 1s and a depth of L, L 1 P N with L 1 ě L is defined as follows:

Doubly Discretized NMDT
The key idea behind the novel MIP relaxation Doubly Discretized NMDT ( D-NMDT) for z " xy is to further increase the accuracy of NMDT by discretizing the second variable y as well, which leads to a double NMDT substitution, namely in the ∆ L x y-term. In this way, for problems where NMDT would require discretizing all x i -variables, e.g. if we have some dense constraint, we can double the accuracy of the relaxation for the equations z ij " x i x j without adding additional binary variables by taking advantage of the fact that both variables are discretized anyway. In NMDT, we could choose to discretize either x or y for each equation of the form z " xy. For D-NMDT, we consider both options of discretization, and then, by introducing a parameter λ P r0, 1s, we can model a hybrid version of the two resulting MIP relaxations. Namely, we write xy " λxy`p1´λqxy, then discretize y first in the relaxation of λxy and x first in the relaxation of p1´λqxy. Finally, the complete MIP relaxation D-NMDT is obtained by relaxing the resulting products via McCormick envelopes (see Appendix A for the detailed derivation).
Definition 14 (D-NMDT). The MIP relaxation D-NMDT of z " xy with x, y P r0, 1s, a depth of L P N and the parameter λ P r0, 1s is defined as follows: x " Again, as McCormick envelopes are exact reformulations of bilinear products if one of the variables is binary, we only have a linearization error in the relaxation of the continuous variable product ∆ L x ∆ L y . This yields a maximum error of 2´2 L´2 for D-NMDT. For bounds on the terms p1´λq∆ L x`λ x and λ∆ L y`p 1´λqy, see Appendix B.
Remark 3. For our implementation of the D-NMDT technique used in Section 7, we set λ " 1 2 for the sake of formulation symmetry in x and y. Further, we include the first two McCormick envelopes for both λ " 0 and λ " 1, which can be done without increasing the number of binary variables.
To model the univariate quadratic terms with this method, we set y " x and z " xy. This results in an MIP relaxation for z " x 2 , which is equivalent to the MIP relaxation T-NMDT from Definition 13.
Definition 15 (Univariate D-NMDT). The MIP relaxation D-NMDT of z " x 2 with x P r0, 1s and a depth of L P N is defined as follows: x " Again, as McCormick envelopes are exact reformulations of bilinear products if one of the variables is required to be binary, we only have a linearization error in the relaxation of the continuous variable product ∆ L x ∆ L x . This yields a maximum error of 2´2 L´2 for univariate D-NMDT. Note that the upper bound of this formulation is formed by exactly the same pwl. approximation for z " x 2 as the sawtooth formulations. Unfortunately, the univariate D-NMDT is not sharp; for example, at x " 1 2 , its LP relaxation admits the solution β j " 1 2 for all j P L , ∆ L x " 2´L´1, ∆ L z " 0, u j " 0 for all j P L and z " 0, which is not in the convex hull of gra r0,1s px 2 q.
To formulate a tightened version of D-NMDT, we tighten (31) in the lowerbounding constraints, removing all McCormick lower bounds and adding the sawtooth epigraph relaxation (18) of depth L 1 (with L 1 ě L).

Definition 16 (Univariate T-D-NMDT). The MIP relaxation T-D-NMDT
of z " x 2 with x P r0, 1s and depths L, L 1 P N with L 1 ě L is defined as follows: In Table 1 in Section 5, we give a summary of the number of binary variables and constraints as well as the accuracy of each MIP relaxation when applied to a dense MIQCQP of the form (1).

Remark 4 (Binary Variables and Dense MIQCQPs).
When modelling Problem (1) using the MIP relaxations Bin2 and Bin3 at depth L, we have L binary variables created whenever the tigthened sawtooth relaxation R L,L1 is used. For Bin2, we need the relaxations px i , z xi q P R L,L1 and pp ij , z pij q P R L,L1 for all pairs i ‰ j, where p ij " x i`xj . Note that p ij " p ji . Thus, we need pn`1 2 pn´1q 2 qL " 1 2 pn 2`1 qL binary variables. We have the same result for Bin3, where instead we have p ij " x i´xj for all pairs i ‰ j. Although this means p ij ‰ p ji , we still have p 2 ij " p 2 ji . Thus, a careful implementation also has 1 2 pn 2`1 qL binary variables. HybS uses significantly fewer binary variables as it only requires px i , z xi q P R L,L1 for each i. Hence, there are only nL binary variables. Surprisingly, this relaxation decreases the error bound from Bin2 and Bin3 by half. The strength in this approach is gained without quadratically-many binary variables by using the tightening set Q L1 with the p 1 -and p 2 -variables.
For both NMDT and D-NMDT, for each variable, we will need a discretization of the form x " ř L j"1 2´jβ j`∆ L x with β P t0, 1u L . Thus, both of these formulations use nL binary variables in the case of a dense MIQCQP. However, the improved binarizations in D-NMDT reduces the errors exponentially compared to NMDT.
Note that it is possible that some preprocessing or reformulation, such as via a convex quadratic reformulation (QCR) may improve the number of binary variables needed. We do not use such reformulations in this work, but just focus on applying our MIP relaxations as is.

Theoretical Analysis
In this section, we give a theoretical analysis of the presented MIP relaxations for the equation z " xy over x, y P r0, 1s as well as the equation z " x 2 over x P r0, 1s, respectively, in order to allow for a comparison of structural properties between them. In particular, we will analyse their maximum and average errors, formulation strengths, i.e. (hereditary) sharpness and LP relaxation volumes, as well as the optimal placement of breakpoints to minimize average errors. The results we will arrive at are summarized in Table 1. Table 1. A summary of characteristics of the different MIP relaxations. Binary variables and constraints are given in the worst-case, in which every possible quadratic term must be modelled, for example if some matrix Qi is dense. The average error for HybS, Bin2 and Bin3 with respect to gra r0,1s 2 pxyq is calculated for L1 Ñ 8 and without the McCormick envelopes added. Finally, the average errors for Bin2 and Bin3 apply only to L ě 1; the corresponding volumes are 7 12 for L " 0. Finite L1 leads to slightly increased error bounds for the methods Bin2, Bin3 and HybS.

Maximum Error
We start by discussing the maximum errors of the presented MIP formulations.

Core Formulations
First, we discuss the maximum errors of the core formulations from Section 3.1. For the sawtooth approximation (14), the maximum error is an overestimation by 2´2 L´2 , see [4]. The maximum error of the sawtooth epigraph relaxation is 2´2 L´4 , which we prove in the following. The tightened sawtooth relaxation stated in (20) uses the sawtooth approximation for overestimation while the lower bound, which is incident with the sawtooth epigraph relaxation (18), gains an extra layer of accuracy, with a maximum error of 2´2 L´4 . Due to the overestimator, the (tightened) sawtooth relaxation has the same maximum error of 2´2 L´2 as the sawtooth approximation.  (22). In the left column, we show the case L " L1 " 1. In the right column, we show L " 1 and L1 Ñ 8.

Proposition 2 (Error of the sawtooth epigraph relaxation). The maximum error of the sawtooth epigraph relaxation
Proof. The lower-bounding inequalities on z induced by the px, zq-projection of the sawtooth epigraph relaxation, i.e. proj x,z pQ L q, are exactly the supporting valid linear inequalities to z ě x 2 at the points x k :" k 2 L`1 , k " 0, . . . , 2 L ; see Proposition 1. The maximum error is attained at the intersection of two consecutive linear segments on the boundary of the feasible region defined by these inequalities, i.e. at px k , z k q :" p x k`xk`1 2 , x k x k`1 q " ppk`1 2 q2´L´1, kpk1 q2´2 L´2 q. Thus, the maximum error is given by independent from the choice of k.

Separable MIP Formulations
In order to generate MIP relaxations of MIQCQPs, we need to discretize equations of the form z " x 2 and z " xy. In  (25). In the left column, we show the case L " L1 " 1. In the right column, we show L " 1 and L1 Ñ 8.
the approaches Bin2, Bin3 and HybS, we use the tightened sawtooth relaxation to discretize z " x 2 , which by Proposition 1 has a maximum error of 2´2 L´2 . For z " xy, we use a different separation in each approach. First, we consider the maximum error in the bounds on z in which x 2 and y 2 are overestimated and p 2 is underestimated. This applies to both bounds on z in HybS, the lower bound on z in Bin2 and the upper bound on z in Bin3. In this case, the maximum overestimation of both z x " x 2 and z y " y 2 is 2´2 L´2 , occurring at the grid centres x k " y k " pk`1 2 q2´L, k " 0, . . . , 2 L´1 . If we combine these points with a point on the graph of p 2 for z p , i.e. with error 0, we obtain a lower bound for the maximum error in the separable formulations. Namely, if P IP L,L1 denotes either of the MIP relaxations Bin2, Bin3 or HybS of gra r0,1s 2 pxyq with depths L, L 1 , we have independent of the choice of k. This yields the following proposition.
Proposition 3. The maximum error in the MIP relaxations Bin2, Bin3 and HybS for z " xy with x, y P r0, 1s is at least 2´2 L´2 .
The maximum underestimation of p 2 is 2´2 L1´2 (twice the domain width, which means the error quadruples). This means we have an upper bound of on the maximum error in the lower bound on z in Bin2, the upper bound on z in Bin3 and both the upper and lower bound on z in HybS. We can use this observation to give an upper bound on the maximum error in the MIP relaxation HybS for z " xy.
Proposition 4. The maximum error in the MIP relaxation HybS for z " xy with x, y P r0, 1s is at most 2´2 L´2`2´2L1´3 .
Next, we consider the upper bound on z in Bin2 and the lower bound on z in Bin3. Here, we are interested in the overestimation of the p 2 -terms and the underestimation of x 2 and y 2 . The maximum overestimation of p 2 is 2´2 L (again, doubling the the domain width quadruples the error). Combined with the maximum underestimation of the sawtooth relaxation for x 2 and y 2 of 2´2 L1´4 , this yields an upper bound on the maximum error on z of in terms of overestimation in Bin2 and underestimation in Bin3. Thus, we obtain the following upper bound for the maximum error in Bin2 and Bin3.
Proposition 5. The maximum error in the MIP relaxations Bin2 and Bin3 for z " xy with x, y P r0, 1s is at most 2´2 L´1`2´2L1´3 .
In summary, we have the same lower bound for the maximum error of 2´2 L´2 in Bin2, Bin3 and HybS. However, the known upper bound 2´2 L´1`2´2L1´4 in HybS is slightly better than that of Bin2 and Bin3 with 2´2 L´1`2´2L1´3 .
Remark 5. In the MIP relaxations Bin2, Bin3 and HybS, increasing L 1 does not introduce any new binary variables. Therefore, we note that in our computations in Section 7 we choose L 1 to be significantly larger than L, such that the maximum error depends primarily on L. As L 1 increases to infinity, the maximum errors in all three MIP relaxations converge to 2´2 L´2 .

NMDT-Based Formulations
Next we analyse the NMDT-based formulations. We show how to reduce the error calculations for these formulations to the error of a sequence of McCormick relaxations. For univariate NMDT, this follows since when we fix β P t0, 1u L , two things happen: (1) x is allowed to vary only with ∆ L x P r0, 2´Ls, and (2) the Mc-Cormick relaxation py, β i , u i q P Mpy, β i q is exact for each i " 1, . . . , L, i.e. the relaxation implies u i " yβ i . These two facts mean that the only error incurred on this small interval over which x can vary stems from the single McCormick relaxation Mp∆ L x , yq. A similar reasoning can be done for D-NMDT as well.
For NMDT, the error in z " xy is purely in the McCormick relaxation of the term ∆ L x y over grid pieces of the form rk2´L, pk`1q2´Lsˆr0, 1s with k P 0, 2 L . This yields a maximum error of 1 4 p2´L¨1q " 2´L´2. Likewise, for D-NMDT, the maximum error in z " xy is purely in the McCormick relaxation of the term ∆ L x¨∆ L y over the grid pieces of the form rk x 2´L, pk x`1 q2´Lsˆrk y 2´L, pk y`1 q2´Ls with k x , k y P 0, 2 L and a maximum error of 1 4 p2´L2´Lq " 2´2 L´2 . For univariate equations z " x 2 , the maximum error for D-NMDT is 2´2 L´2 , arising due to the McCormick relaxation of the term p∆ L x q 2 . For NMDT, the error is incurred by the McCormick relaxation of the term ∆ L x¨x , for which the usual maximizing point x " 0. Thus, the maximum error is slightly less than 2´L´2. For L ě 1, the maximum lower-bounding error for univariate NMDT is The maximum upper-bounding error is computed in a similar manner. For an easy comparison, we provide the following error-maximal points for univariate NMDT with L ě 1; for the lower bound they are and for the upper bound We remark that the upper bound for univariate NMDT matches that of D-NMDT if L " 1.
A summary of the results of the maximum error analysis can be found in Table 1. It should be noted that for a fixed depth L, HybS and D-NMDT provide the smallest maximum errors among the considered MIP relaxations.

Average Error and Minimizing the Average Error
In this section, we will study the average error of an MIP relaxation by computing the volume enclosed by the projected MIP relaxation as an additional measure of its relaxation quality.
First, we compute the volumes of all presented MIP relaxations. Then we prove that the uniform discretizations, which are used by definition in each MIP formulation in this article, are indeed optimal in terms of the minimizing the volume of the projected MIP relaxation if the number of discretization points is fixed (i.e. if L and L 1 are fixed).

Separable Formulations
In all separable formulations, we use the sawtooth relaxation (15) for equations of the form z " x 2 . In [4,Propostion 6], the authors show that the volume of this relaxation R L,L is 3 {16¨2´2 L . Furthermore, from [4,Proposition 5] it follows that for any fixed number of breakpoints a uniform discretization minimizes the volume of the sawtooth epigraph relaxation.
Next, we consider the volumes for the MIP relaxations of z " xy. We start by showing that Bin2, Bin3 and HybS induce a grid structure in terms of relaxation error and have constant volumes over the resulting grid pieces. While the grid structure for HybS is obvious, we have yet to show it for Bin2 and Bin3. From [8, Table 4], we further know that for L, L 1 Ñ 8 the z-values in the projected LP relaxation of Bin2 (22) are bounded from below by the convex function C L 2 : rx,xsˆrȳ,ȳs Ñ R and from above by the concave func- The same holds for Bin3 (23) and the convex and concave functions C L 3 : rx,xsr ȳ,ȳs Ñ R and C U 3 : rx,xsˆrȳ,ȳs Ñ R, As the upper bound on the z-value in HybS is the same as that for Bin2 and the lower bound is the same as that for Bin3, the respective projected LP relaxations P LP L,L1 in the limit for Bin2, Bin3 and HybS are In the following discussion, we will let L 1 Ñ 8. This simplifies the proofs considerably and is relevant in so far as in our computations we use a relatively high value of L 1 " 10, which has a resulting maximum error below standard machine precision and yet has no influence on the number of binary variables and uses only OpL 1 q constraints. Although for different values of L 1 the volumes are different, the hierarchy of MIP formulations that we establish is independent from this choice. We start with the volume of the MIP relaxation HybS. Proposition 6. Let P IP pL x ,L y q,L1 be the MIP relaxation HybS from (25) without the McCormick inequalities, where we now allow for independent discretization depths L x and L y to overestimate x 2 and y 2 , respectively (i.e. with px, z x q P R Lx,L1 and py, z y q P R Ly,L1 ). Then the volume of P IP pL x ,L y q,L1 converges to the same value over each grid piece of the form rk x 2´L x , pk x`1 q2´L x sˆrk y 2´L y , pk y1 q2´L y s, where k x P 0, 2 L x and k y P 0, 2 L y for L 1 Ñ 8. Furthermore, for the total volume of P IP pL x ,L y q,L1 , we have under Hausdorff distance. In HybS, we have pp 1 , z p1 q, pp 2 , z p2 q P Q L1 (via the transformation in Remark 1) as well as p 1 " x`y and p 2 " x´y. Thus, we have in the limit, as L 1 Ñ 8: z p1 ě px`yq 2 and z p2 ě px´yq 2 .
Furthermore, since F L pxq ě x 2 for all x P r0, 1s, L P tL x , L y u, and px, z x q P R L x ,L1 , py, z y q P R L y ,L1 , we obtain z x ď F L x pxq and z y ď F L y pyq.
Therefore, the inequality (25) implies the following in the limit: Now we apply these inequalities to grid pieces of the form rx,xsˆrȳ,ȳs. Let x :" k x 2´L x ,x :" pk x`1 q2´L x , ȳ :" k y 2´L y andȳ :" pk y`1 q2´L y , and define l x :"x´x " 2´L x as well as l y :"ȳ´ȳ " 2´L y . Then, as F L x pxq "´px`xqx`xx for x P rx,xs and F L x pyq "´pȳ`ȳqy`ȳȳ for y P rȳ,ȳs, the above bounds on z are exactly the envelopes C L 2 px, yq for the lower bound and C U 3 px, yq for the upper bound, respectively. Thus, by Proposition 12, which is proved later, the volume of proj x,y,z pP IP pL x ,L y q,L1 q over the grid piece is in the limit. Note that this does not depend on the choice of k x and k y (and thus the choice of grid piece).
Since we have 2 LxLy grid pieces overall, the total volume in the limit is then given by which finishes the proof.
[ \ The following proposition establishes the volumes of the MIP relaxations and grid structure for the MIP relaxations Bin2 and Bin3. As this derivation is extensive, we prove it in Appendix C.
Proposition 7. Let P IP pL x ,L y q,L1 be the either the MIP relaxation Bin2 from (22) or Bin3 from (23). Then the volume of P IP pL x ,L y q,L1 converges to the same value over each grid piece of the form rk Now that we have calculated the average error, i.e. the volume of the MIP relaxations, for uniform breakpoints, we show that among all possible breakpoint choices, a uniform placement of breakpoints minimizes the average error. For z " x 2 and the sawtooth functions, this has already been shown in [4]; for equations z " xy it still has to be shown. We prove average error minimization for uniform breakpoint placement in HybS and do not consider the formulations Bin2 and Bin3 here, as they are hard to analyse in this respect, which is also mentioned in [8] for approximations. In Proposition 6, we show that HybS has a grid structure where on each grid piece, the average error is 1 6 pl x l 3 y`ly l 3 x q, where l x and l y are the widths of the grid piece in x-and y-direction respectively. In the following, we consider a piecewise relaxation defined via these grid pieces and show that the total average error is minimized by a uniform breakpoint placement, as is the result of HybS.
. ă x n " 1 and 0 " y 0 ă y 1 ă . . . ă y m " 1 be sets of breakpoints. For each grid piece rx i´1 , x i sˆry j´1 , y j s, consider a relaxation of gra r0,1s 2 pxyq with average error 1 6 pl xi l 3 yj`lyj l 3 xi q, where l xi :" x i´xi´1 and l yj :" y j´yj´1 are the widths of the grid piece with i P n and j P m . Then a uniform spacing of these breakpoints minimizes the average error over all piecewise relaxations of this form.
Proof. The problem of minimizing the average error of a piecewise relaxation of this form can be formulated as The objective function in (40) sums the average errors over the single grid pieces while the constraints ensure that all single grid lengths sum up to 1 and are greater than or equal to 0. It can be rewritten to Thus, (40) decomposes into two independent problems where the respective optimal solutions x˚and y˚, can be composed to create px˚, y˚q, which is optimal for the original problem (40). The subproblems are and These are exactly the sawtooth-area optimization problems from [4, Proposition 5], such that a uniform placement of the breakpoints where each l xi " 1 n is optimal for (41), and l yj " 1 m is optimal for (42). Consequently, a uniform placement of grid points is optimal for (40) and the total volume is 1 6 p 1 m 2`1 n 2 q.

NMDT-Based Formulations
For equations of the form z " x 2 , univariate D-NMDT uses piecewise McCormick relaxations. In [4,Proposition 5], it is shown that uniform discretization is optimal for fixed numbers of breakpoints. However, for NMDT the calculation of the volume is much more complicated, so we omit it here. We next derive the average error of the MIP relaxations NMDT and D-NMDT for z " xy.
Proof. Note that the discretization in NMDT and D-NMDT yields piecewise McCormick relaxations over a uniformly spaced grid, where each grid piece corresponds to some fixed integer solution β x , β y P t0, 1u L , ∆ L x , ∆ L y P r0, 2´Ls. The volume of of the McCormick envelope over a single grid piece is 1 6 l 2 x l 2 y , where l x is its x-length and l y is its y-length (see e.g. [31, page 22]). The average error is then the sum over all grid piece volumes. Now, for NMDT we have 2 L grid pieces with l y " 1 and l x " 2´L, yielding a volume per grid piece of 1 6 2´2 L and thus a total volume of 1 6 2´L. Similarly, for D-NMDT we have 2 2L grid pieces with l x " l y " 2´L, which yields a volume per grid piece of 1 6 2´4 L and thus a total volume of 1 6 2´2 L .
We now prove that a uniform placement of breakpoints minimizes the average error in a piecewise McCormick relaxation. For n " 2 L and m " 1, this yields precisely the NMDT relaxation of depth L, and if n " m " 2 L , then this yields precisely the D-NMDT relaxation of depth L. Hence, they are optimal discretizations. The average error in NMDT is 1 6n " 1 6 2´L, and 1 6n 2 " 1 6 2´2 L in D-NMDT. This follows from the proof below. Theorem 1. Let 0 " x 0 ă x 1 ă . . . ă x n " 1 and 0 " y 0 ă y 1 ă . . . ă y m " 1 be sets of breakpoints. Then a uniform spacing of these breakpoints minimizes the average error over all piecewise McCormick relaxations of gra r0,1s 2 pxyq.
Proof. Let l x k :" rx k´1 , x k s and l x l :" ry l´1 , y l s with k P n and l x k P m be the lengths of the grid pieces rx k´1 , x k sˆry l´1 , y l s. The volume of the Mc-Cormick envelope Mprx k´1 , x k s, ry l´1 , y l sq over a single grid piece is 1 6 l 2 x k l 2 x l , see [31, page 22]. Therefore, the problem of minimizing the average error of a piecewise McCormick relaxation can be formulated as The objective function in (43) sums the average errors over the single grid pieces while the constraints ensure that all single grid lengths sum up to 1 and are greater than or equal to 0. Rewriting it to lets (44) decompose into the two independent convex subproblems Applying the KKT conditions to (45) and (46), which are sufficient for global optimality here, directly shows that a uniform placement of the breakpoints with l xi " 1 n and l yj " 1 m is optimal for (43). The total average error is then 1 6nm . Corollary 2. Let 0 " x 0 ă x 1 ă . . . ă x n " 1 and 0 " y 0 ă y 1 " 1 be sets of breakpoints with n " 2 L and P IP L a depth-L NMDT MIP relaxation of gra r0,1s 2 pxyq from (27). Then P IP L is an optimal piecewise McCormick relaxation with an average error of E avg pP IP L , gra r0,1s 2 pxyqq " 1 6 2´L. Corollary 3. Let 0 " x 0 ă x 1 ă . . . ă x n " 1 and 0 " y 0 ă y 1 ă . . . ă y m " 1 be sets of breakpoints with n " m " 2 L and P IP L a depth-L D-NMDT MIP relaxation of gra r0,1s 2 pxyq from (30). Then P IP L is an optimal piecewise McCormick relaxation with an average error of E avg pP IP L , gra r0,1s 2 pxyqq " 1 6 2´L. We summarize the key results of Section 5.2 in the remark below and in Table 1.
Remark 6 (Tightness of MIP Relaxations). For an equation z " x 2 , the tightened sawtooth relaxation (20), and the formulations that employ it, have the smallest volume in the projected MIP relaxation among all studied formulations: they are equivalent in upper bound, with a tightened lower bound, compared to univariate NMDT and D-NMDT. For z " xy, D-NMDT is the tightest formulation, as it yields the convex hull of gra D pxyq on each grid piece D " rk x 2´L, pk x`1 q2´Lsˆrk y 2´L, pk y`1 q2´Ls, k x , k y P 0, 2 L´1 . Combining these facts, T-D-NMDT is the tightest relaxation presented for the full MIQCQP.

Formulation Strength
In the previous section, we discussed the maximum and average errors incurred from using certain discretizations. We will now consider the strength of the resulting MIP relaxations by analysing their LP relaxation. First, we will check for sharpness and later compare them via the volume of the projected LP relaxation. Sharpness means that the projected LP relaxation equals the convex hull of the set to be formulated. If we now consider the volume of a projected LP relaxation, it can minimally be the volume of the convex hull, which precisely holds if the formulation is sharp. If a formulation is not sharp, the volume of the projected LP relaxation yields a measure of how much the formulation is "not sharp". The volume of LP relaxation as a measure for formulation strength was previously used in [8].

Sharpness
We start with the core formulations from Section 3. It is well known that the McCormick relaxation yields the convex hull of the feasible set of z " xy over box domains. Therefore, it is obviously sharp. In [4], it is shown that the sawtooth approximation for z " x 2 is sharp. We use this result to prove that sharpness also holds for the tightened sawtooth relaxation (20). See Figure 4 for examples of this relaxation under different parameter choices.
Theorem 2 (Sharpness of the tightened sawtooth relaxation). Consider the tightened sawtooth relaxation P IP L,L1 described in (20) in the space of px, z, g, αq for L, L 1 P N with L ď L 1 , i.e. R L,L1 " proj x,z pP IP L,L1 q. The MIP relaxation P IP L,L1 is sharp.
Then P IP L,L1 is sharp if and only if both P IPL ,L1 and P IPĹ ,L1 are sharp. This simplification holds since P IP L,L1 " P IPL ,L1 X P IPĹ ,L1 and since the upper bound P IPL ,L1 strictly overestimates x 2 , while in lower bound, P IPĹ ,L1 strictly underestimates x 2 , such that sharpness of the two can be considered separately. Now, the sharpness of P IPL ,L1 follows directly from the sharpness of the sawtooth approximation (14), which holds by [4,Theorem 1]. For the sharpness of P IPĹ ,L1 , the proof closely follows the proof of sharpness in [4, Theorem 1], except that, after choosing some fixed x P r0, 1s, we frame the contradiction as follows: 1. Choose g˚as in [4, Theorem 1], and choose the minimum possible value of z˚given g˚, such that z˚is incident with one of its lower bounds. 2. Observe that the chosen solution admits a feasible solution in P IP L,L1 , such that if it is minimal in the LP, then we are done. 3. Suppose for a contradiction that there exists a better z-minimal solution pẑ,ĝq than the proposed solution pz˚, g˚q, such that some incident lower bound must have been improved. 4. Observe that the improved incident lower bound must be of the form z ě f j px, g˚q´2´2 L´2 for some j ě 0, as the lower bounds 0 and 2x´1 do not change with the choice of g˚. Thus, f j px, g˚q´2´2 L´2 ě f j px,ĝq´2´2 L´2 5. Show that f j px, g˚q´f j px, g˚q ă 0, a contradiction on the choice of pŷ,ĝq.
Thus, the solution (z˚, g˚) was optimal to begin with, and therefore sharpness must hold.
The proof that f j px, g˚q´f j px, g˚q ă 0 follows in exactly the same manner as [4, Theorem 1] and is thus omitted here.
[ \ In [4], besides sharpness, it is further shown that the sawtooth approximation is also hereditarily sharp. The following theorem states that the same is true for the tightened sawtooth relaxation (20) and z " x 2 .
Theorem 3. The tightened sawtooth relaxation for z " x 2 is hereditarily sharp.
As the proof of Theorem 3 takes up a significant amount of space, we moved it to Section 6 . Next, we show that neither of the MIP relaxations Bin2, Bin3 or HybS for z " xy are sharp. That is, their projected LP relaxation does not equal Mpx, yq for any L, L 1 P N. Note that we have included the McCormick inequalities in the definitions of Bin2, Bin3 and HybS to make the formulations stronger. The following proofs, however, refer to the fact that if one omits the McCormick inequalities in these formulations, then they are not sharp. Together with the McCormick inequalities, of course, they are sharp trivially. We now show`lim L,L1Ñ8 proj x,y,z pP LP L,L1 q˘z convpproj x,y,z pP IP 1,1 qq ‰ H, which implies proj x,y,z pP LP L,L1 qz convpproj x,y,z pP IP L,L1 qq ‰ H, such that P IP L,L1 is not sharp for any L, L 1 P N. The argument works in the following manner: proj x,y,z pP LP L,L1 q˙z convpproj x,y,z pP IP 1,1 qq ‰ H ñ proj x,y,z pP LP L,L1 qz convpproj x,y,z pP IP L,L1 qq ‰ H ñ proj x,y,z pP LP L,L1 q ‰ convpproj x,y,z pP IP L,L1 qq. To this end, we show that there exist points px, y, zq P lim L,L1Ñ8 proj x,y,z pP LP L,L1 q with px, y, zq R proj x,y,z pP IP 1,1 q. Observe that, for any L, the point px, xq is feasible within the LP relaxation of the tightened sawtooth relaxation (20) for x 2 , with α i " g i´1 , g i " 0. Thus, for all L, L 1 ě 0 and for allx,ŷ P r0, 1s 2 , we have that P LP L,L1 , and thus also its limit lim L,L1Ñ8 proj x,y,z pP LP L,L1 q, admits the values z x "x, z y "ŷ and z p1 " px`ŷq 2 . Therefore, for px, yq " p0, 1 4 q, we obtain z " 1 2 ppx`yq 2´x´y q "´3 16 , such that p0, 1 4 ,´3 16 q P P LP 8,8 . Next, in order to prove p0, 1 4 ,´3 16 q R convpproj x,y,z pP IP 1,1 qq, we show mintz : py, zq P proj y,z pP IP 1,1 | x"0 qu "´1 8 . If this holds, then we have mintz : py, zq P convpproj y,z pP IP 1,1 | x"0 qqu "´1 8 , such that p0, 1 4 ,´3 16 q R convpproj x,y,z pP IP 1,1 qq. We derive a representation of proj y,z pP IP 1,1 | x"0 q that becomes an LP after branching spatially at y " 1 2 to resolve the upper bound on z y . We then minimize z over both branches via solving an MIP. Let x " 0. Then the bounds on z, z x , z y , z p1 within proj x,y,z pP IP 1,1 q are z x ď 0, z y ď y´1 4 mint2y, 2p1´yqu " maxt y 2 , 3y´1 2 u z p1 ě 4`y 2´1 4 mint2 y 2 , 2p1´y 2 q´1 16 u˘" maxty´1 4 , 3y´9 4 u z p1 ě 4p y 2´1 4 q " 2y´1 z p1 ě 0 z p1 ě 4p2 y 2´1 q " 4py´1q z ě z p1´zx´zy y P r0, 1s.
Note that the two pieces of the upper bound on z y meet at y " 1 2 . Using this to separately minimize z over the above set, once over y P r0, 1 2 s and once over y P r 1 2 , 1s, e.g. using an MIQCQP solver, we obtain two globally minimizing solutions with z "´1 8 , namely at y " 1 4 and at y " 3 4 . Thus, we conclude that p0, 1 4 ,´3 16 q R convpproj x,y,z pP IP 1,1 qq, such that P IP L,L1 is not sharp for any 1 ď L ď L 1 .
[ \ Proposition 11. Let P IP L,L1 be either of the two MIP relaxations Bin2 (22) or Bin3 (23). Then, without the inequalities from the McCormick envelope Mpx, yq, P IP L,L1 is not sharp for any L, L 1 P N.
Proof. Since Bin2 (22) has the same lower-bounding constraints as HybS, the proof follows directly from Proposition 10. Moreover, for Bin3 (23), the proof follows in exactly the same way as the proof of Proposition 10, except for the upper-bounding version of the same point, px, y, zq " p0, 1 4 , 3 8 q, and acting on the upper-bounding constraints from (25) and maximizing z instead. As the proof is very similar, with the corresponding upper bound z " 1 8 on proj y,z pP IP 1,1 | x"0 q, we omit it here.

LP Relaxation Volume
Having proved that none of the separable MIP formulations is sharp, which implies that they are also not hereditarily sharp, we now turn to considering the volume of projected LP relaxations.
For L " L 1 , the volume for the tightened sawtooth formulation (7) is 3 16 2´2 L , which has been shown in [4]. For general L 1 , by integrating over the overapproximation and underapproximation errors separately with the same analysis as in [4], we can derive a general volume of 1 6 2´2 L`1 48 2´2 L1 . We omit the precise calculation here. Further, as shown in the previously mentioned article, since univariate D-NMDT yields piecewise McCormick relaxations, we obtain a relaxation volume of 1 4 2´2 L . Finally, we do not present the univariate NMDT volumes, as they are difficult to compute due to the non-uniform structure of the relaxation.
In our analysis of the separable MIP formulations, we only consider the limits for L, L 1 Ñ 8. This allows us to evaluate the volumes independently of the underlying discretizations. For the additional volumes resulting from discretization errors, we refer to [4,Appendix], where the volume over the error function of the sawtooth approximation is given. We start with HybS.
Proposition 12. Let P LP L,L1 be the LP relaxation of the MIP relaxation HybS stated in (25) over the general domain rx ,xsˆrȳ ,ȳs. Without the McCormick envelope constraints, the volume of the limit of the projected LP relaxation lim L,L1Ñ8 proj x,y,z pP LP L,L1 q is 1 6 pl x l 3 y`ly l 3 x q, where l x "x´x and l y "ȳ´ȳ . Proof. The z-values in the projected LP relaxation of (25) are bounded by the convex function C L 2 and the concave function C U 3 , which are stated above in (33) and (36), respectively. The volume of the projected LP relaxation (25) is then calculated via integration: [ \ Proposition 13. Let P LP L,L1 be the LP relaxation of either the MIP relaxation Bin2 or Bin3 stated in (22) and (23), respectively, over the domain rx ,xsˆrȳ ,ȳs. Without the McCormick envelope constraints, the volume of the limit of the projected LP relaxation is lim L,L1Ñ8 volpproj x,y,z pP LP L,L1 qq " where l x "x´x and l y "ȳ´ȳ .
Proof. The z-values in the projected LP relaxation of (22) and (23) are bounded by the convex function C L 2 and the concave function C U 3 , which are stated above in (33) and (36), respectively. The volume calculation is then done via integration: [ \ We use Proposition 12 and Proposition 13 to prove that HybS yields strictly tighter LP relaxations than Bin2 and Bin3.

Proposition 14.
Without the McCormick envelope constraints, the LP relaxation of the MIP relaxation HybS in the limit as L, L 1 Ñ 8 is strictly tighter than that of Bin2 or Bin3. Moreover, the volume of the projected LP relaxation of formulation HybS in the limit as L, L 1 Ñ 8 is smaller by 1 4 l 2 x l 2 y . Proof. In [8, Appendix, Proposition 2] it has been shown that C L 2 is a tighter convex underestimator than C L 3 and that C U 3 is a tighter concave overestimator than C U 2 for z " xy. Thus, since the HybS approach converges to C L 2 as an underestimator and C L 3 as an overestimator, it is strictly tighter than either of Bin2 or Bin3. The volume calculation can again be done via integration: [ \

Proof of Theorem 3: Hereditary Sharpness of the Tightened Sawtooth Relaxation
This section is devoted to proving Theorem 3 which states that the tightened sawtooth relaxation (20) for z " x 2 is hereditarily sharp. This is a similar, albeit, more difficult result as the related one in [4] regarding the original sawtooth approximation. It is not clear how to obtain the former as a corollary of the latter. Furthermore, we use the result of [4] to shorten the work needed here. Before we begin the proof, we first introduce some required notation and restate several helpful results from [4]. For integers L 1 ě L ě 0, let P IP L,L1 be the tightened sawtooth relaxation from (20) in the space of px, z, g, αq and let P LP L,L1 be its LP relaxation, where in the latter all α-variables are relaxed to the interval r0, 1s. For convenience, and to avoid the variable redundancy g 0 " x throughout this section, we will omit the use of g 0 and use the abbreviated notation g " g 1,L1 . To further simplify the notation, we omit the subscript L, L 1 when the context is clear and simply write P IP and P LP instead of P IP L,L1 and P LP L,L1 . Now let I Ď L be the index set of the binary variables α which are fixed to given values ᾱ P t0, 1u I . This can be thought of as considering the branch in a branch-and-bound tree where α " ᾱ holds. Then we wish to show that at this node in the tree, sharpness also holds. More precisely, the goal is to show that P IP is sharp under the restriction α I " ᾱ , where α I " rα i1 , . . . , α i |I| s J and I " ti 1 , . . . , i |I| u. Hereditary sharpness of P IP then means convpproj x,z pP IP | α I "ᾱ qq " proj x,z pP LP | α I "ᾱ q.
In particular, these variables must satisfy (21a) and (21b). We also define the corresponding projections onto x, namely X IP :" proj x pP IP,ᾱ q andX LP :" proj x pP LP,ᾱ q.
We remark thatX LP " convpX IP q, which is shown later in (53).
This applies analogously toP LP,ᾱ . We now state some important results from [4] that establish bounds on each variable g i withinP LP,ᾱ px,g,αq and a closed-form optimal solution for g when minimizing z withinP IP,ᾱ or anyP IP,ᾱ j .

Lemma 1 (Bounds in Projection, Lemma 3 from [4]
). For all i P 0, L , we have proj gi pP LP,ᾱ px,g,αq q " convpproj gi pP IP,ᾱ px,g,αq qq ": ra i , b i s ‰ H. Furthermore, it holds that ra L , b L s " r0, 1s, and ra i´1 , b i´1 s can be computed from ra i , b i s as Note that in the last case, a i´1 ď 1 2 and b i´1 ě 1 2 hold.
Note that Lemma 1 with i " 0 and g 0 " x yieldsX LP " convpX IP q, viǎ which has also been used in [4]. Next, we adapt Lemma 5 from [4], which establishes that, when minimizing or maximizing z within P LP L,L | α I "ᾱ given a fixed value forx, each g i can directly be computed from g i´1 and the bounds established in Lemma 1. In particular, for the sawtooth relaxation (i.e. I " H), when minimizing z over the MIPfeasible points with a fixed x, we find that g i " mint2g i´1 , 1´2g i´1 u. That is, the g-variables take one of the two upper bounds that restrict them. However, in this section we have fixed several of the α-variables and have thus changed the feasible domain for each g-variable. Now, it could be that b i becomes an additional upper bound.
Lemma 2 (Adapted from Lemma 5 from [4]). Let a i and b i be defined as in Lemma 1 for all i P L 1 and letx P ra 0 , b 0 s. Further, define g˚as where, for i P I, it holds G i pg i´1 q " 2g i´1 if α i " 0, and G i pg i"1 q " 2p1´g i´1 q otherwise. Then we have g˚P arg mintz : pz, gq P proj z,g pP LP,ᾱ | x"x qu, (54a) g˚P arg mintz : pz, gq P proj z,g pP LP,ᾱ j | x"x qu @j P ´2, L 1 .

(54b)
That is, each g i with unfixed α i can take on one of its upper bounds w.r.t. g i´1 when minimizing z withinP LP,ᾱ | x"x andP LP,ᾱ j | x"x . Furthermore, this choice is unique for all i ď j, i.e. |arg mintz : pz, g j q P proj z,g j pP LP,ᾱ j q| x"x qu| " 1.
Finally, there exists some j P ´2, L 1 for whicȟ f j px, g˚q " mintz : pz, gq P proj z,g pP LP,ᾱ q| x"x qu. (55) Proof. The proofs of the optimality results (54a) and (54b) on g˚for j ě 1 closely follow the structure of the proof of Theorem 2, with the same underlying reasoning as in the proof of [4,Lemma 5]. In fact, the uniqueness of the optimizer also follows from the proof. Thus, the details are omitted here. To establish the optimality results for j ď 0, we observe that in this casef j is purely a function of x, such that the choice of g has no effect onf j , and g˚is thus still optimal. Finally, to fulfil (55), let j max P ´2, L 1 be chosen such that max jP ´2,L1 f j px, g˚q "f jmax px, g˚q.
Then we have mintz : pz, gq P proj z,g pP LP,ᾱ q| x"x qu " max jP ´2,L1 f j px, g˚q "f jmax px, g˚q " mintz : pz, gq P proj z,g pP LP,ᾱ jmax q| x"x qu ď mintz : pz, gq P proj z,g pP LP,ᾱ q| x"x qu, as required.
[ \ The next auxiliary result we need is a a lemma concerning reflections over x " 1 2 inP IP,ᾱ px,g,αq andP LP,ᾱ px,g,αq for the case where α 1 is not fixed.
That is, the maximum errors from the lower bounds coincide. Similarly, where g˚is defined on Lemma 2. Lastly, Proof. Recall thatP IP,ᾱ px,g,αq is formed from the constraints in S L and T L1 , along with fixing binary variables α I " ᾱ . It is easy to check that px, g, αq PP IP,ᾱ px,g,αq if an only if p1´x, g,ᾱq PP IP,ᾱ px,g,αq , whereᾱ 1 :" 1´α 1 andᾱ i :" α i for i P Izt1u. Thus, (56) holds due to this correspondence.
For j P 0, L 1 , we have Similarly, it holdsx The last equation forf´2 in the lemma statement follows from the reverse calculation of above. The same secondary result holds iff j px, gq is replaced withf px, gq. This follows since each constituting function (for the pair j "´1, j "´2) is symmetric about x " 1 2 w.r.t. the maximum error; the pointwise maximum over the functions retains the same symmetry. Similarly, the same result holds if I " H, such thatX " r0, 1s.
[ \ The following lemma formalizes the convex hull of convex functions over gaps. We denote the boundary of the set X by BX.
(61) Then we have convpepi X pF qq " epi convpXq pF X q.
This lemma is proved in Appendix C. We are now ready to prove Theorem 3.

Proof (of Theorem 3).
As discussed before, we only need to show thatP IP,ᾱ L,L1 is sharp to conclude that P IP L,L1 is hereditarily sharp. In particular, we need to show that convpproj x,z pP IP,ᾱ L,L1 qq " proj x,z pP LP,ᾱ L,L1 q.
Reduction to L 1 " L: Recall that L 1 ě L holds by definition.
Claim. We claim that it suffices to reduce L 1 to L to conclude hereditary sharpness of P IP L,L1 .
In particular, note that at the boundary points BX IP " t0, 2 8 , 6 8 , 1u, the tight lowerbounding inequalities are z ě 0, y ě 2x´1 and z ě F 1´2´4 . Thus, on the gap p 2 8 , 6 8 q the functionsF 2 ,F 3 are not needed to describe the convex hull of the MIP.
4. When we restrict to the domain convpX IP qzX IP and consider the convex hulls, we have equality as we reduce L 1 , i.e. convpproj x,z pP IP,ᾱ L,L1´1 q| convpX IP qzX IP " convpproj x,z pP IP,ᾱ L,L1 q| convpX IP qzX IP .
This is due to Item 2, the convexity ofF and Lemma 4.
Thus, the convex hull remains unchanged across the gaps inX IP , and since the LP relaxation does not weaken, sharpness in lower bound is maintained; see Figure 7. This implies thatP IP,ᾱ L,L1 is sharp ifP IP,ᾱ L,L1´1 is sharp. The claim then holds by induction.W e now proceed to prove sharpness ofP IP,ᾱ L,L by induction on L. Base case: If L " 0, then there are no binary variables and, hence, nothing to branch on; therefore, the result holds trivially. Induction on L: For the inductive step, we assume thatP IP,ᾱ L´1,L´1 is hereditarily sharp for all possible fixings of α-variables, and show thatP IP,ᾱ L,L is hereditarily sharp.
We begin by observing that proj x,z´P IP,ᾱ L,L¯" epiXIPpF q.
By Lemma 4, it follows that whereFXIP is defined as in Lemma 4. Thus, proving Theorem 3 is equivalent to proving that proj x,z pP LP,ᾱ L,L q " epi convpX IP q pFXIPq. In particular, it suffices to show that for anyx P convpX IP q, we havě which we do in the following. Case I:x PX IP . By Theorem 2, P IP L,L is sharp (i.e. when I " H). Thus, the LP lower bounds on z are incident with the MIP lower bounds for MIP-feasible points x PX IP , such that we have proj x,z pP LP,ᾱ L,L q| xPX IP " epiXIP pF q| xPX IP . This implies (63). Case II:x P convpX IP qzX IP . Letx´,x`PX IP as defined in Lemma 4. Since x RX IP , it follows thatx´,x`P BX IP . Case II.A: 1 R I. Assume 1 R I. Case II.A.1: rx´,x`s Ď BX IP X r0, 1{2s. We make use of the induction hypothesis here. To this end, we will work with L´1 layers. We will decorate variables and parameters from the smaller set using "˜".
Sincez is minimal, we havez "f j px,gq for some j. We claim that z " f j 1 px, gq for some j 1 .
Note that the above calculations also imply that, for allj P ´2, L´1 and for all px,gq P proj x,g pP LP px,g,αq q, we have for some j P ´2, L that Φ z pfjpx,gqq " f j pΦ x pxq, Φ g pgqq. Further, since eachj maps to a unique j (with only the inactive j "´1 skipped), this implies that Φ z pf px,gqq "f pΦ x pxq, Φ g pgqq. Thus, we can conclude that (21d) and (21e) hold.
Next, we argue that px, g, αq P proj x,g,α 2,L pP LP,ᾱ px,g,αq q. This implies in particular that (21a) as well as (21b) hold and that we have α I " ᾱ I .
Since g 1 "x " 2x, we observe thatP LP,ᾱ px,g,αq can be written as the set of points px, g, αq P r0, 1sˆr0, 1s Lˆr 0, 1s L such that In this form, it is straightforward to confirm px, g, αq PP LP,ᾱ px,g,αq from the corresponding form forP LP px,g,αq : since the indices for both the map on g and on the shift fromĨ to I are shifted by 1 in the same direction, with the same choice of ᾱ , all equality constraints on g i , i PĨ, are preserved through the mapping. Further, the relationship between each g i and g i´1 is likewise preserved, as the corresponding α i is the same, and finally the choice of g 1 is feasible given x. Thus, all constraints are satisfied, such that px, g, αq PP LP,ᾱ px,g,αq , yielding for the choice of z above that px, z, g, αq PP LP,ᾱ L,L | xPconvpX IP Xr0, 1 {2sq . Further, from the form forP LP,ᾱ px,g,αq above, we observe that Φ x pX IP q "X IP X r0, 1 2 s and Φ x pX LP q " convpX IP X r0, 1 2 sq. To show the first portion, we have already shown that Φ x pX IP q ĎX IP X r0, 1 2 s. To prove the other direction, we simply reverse the map for any px, g, αq PP IP,ᾱ px,g,αq | xPr0, 1 {2s , ignoring α 1 : letting x " g 1 " x 2 ,g " g 2,L andα " α 2,L , it is easy to confirm px,g,αq PP px,g,αq . To show that proj x´Φ pP LP,ᾱ L´1,L´1 q¯" convpX IP X r0, 1 2 sq, we observe that convpX IP q| xPr0, 1 {2s is a closed interval with boundary points inX IP X r0, 1 2 s " Φ x pX IP q, such that convpX IP X r0, 1 2 sq " convpΦ x pX IP qq " Φ x pconvpX IP qq " Φ x pX LP q, since Φ is linear in x.W e now show two facts: Claim 1. Letx PX LP andz˚P arg mintz : pz,gq P projz ,g pP LP,ᾱ L´1,L´1 |x "x qu with the corresponding solutiong˚defined in Lemma 2. Then p 1 4z˚, Φ g pg˚qq P arg mintz : pz, gq P proj z,g pP LP,ᾱ L,L | x"Φxpxq qu.
Consequently, Φ z pFXIPpxqq "FXIPpΦ x pxqq holds onX. Now, by Lemma 4, across any gapx´,x`PX for which px´,x`q XX " H andx P rx´,x`s, we have thatFXIP pxq is on the line between the points px´,f px´qq and px`,f px`qq. Thus, sincex :" Φ x pxq, and since Φ is linear in x and z,f pxq lies on the line between the points pΦ x px´q,f px´qqq and Φ x ppx`q,f px`qq. Now, observe that, since Φ x pXq " X X r0, 1 2 s, we have that px´, x`q :" pΦ x px´q, Φ x px`qq is a gap in X, with x´, x`P X and px´, x`q X X " H. Furthermore, as x`, x´P X, we have thatFXIP pxq " Φ x pFXIP px´qq, and similarly for x`. Then, by Lemma 4, we have for x P px`, x´q thatFXIPpΦ x pxqq " FXIPpxq " Φ x pFXIPpxqq, as required.C ase II.A.2: rx´,x`s Ď convpX IP X r1{2, 1sq. Applying Lemma 3 toP IP,ᾱ , we immediately recover sharpness on 1´Φ x pX LP q " convpX IP X r 1 2 , 1sq. To see this, let x P Φ x pX LP q. Then, via Lemma 3, we obtain exactly the same feasible regions for g, α with x " 1´x as with x "x, i.e. proj g,α 2,L pP IP,ᾱ px,g,αq | x"x q " proj g,α 2,L pP IP,ᾱ px,g,αq | x"1´x q, and moreover, similar to Lemma 3, it is not hard to show that we havex 2´F pxq " p1´xq 2´F p1´xq. Thus, we have that botȟ F p1´xq and min gPP LP,ᾱ L,L | x"x pf p1´x, gqq maintain the same distance below p1´xq 2 asFXIP pxq and min gPP LP,ᾱ L,L | x"x pf px, gqq, respectively. Since the second pair coincides, so must the first pair, such thať and therefore sharpness holds on 1´Φ x pX LP q. Case II.A.3: 1 2 P rx´,x`s. Since we showed sharpness on both convpX IP Xr0, 1 2 sq and convpX IP Xr 1 2 , 1sq, we only have to show sharpness on the gap px´,x`q inX IP . Note, in this case, 1 2 RX IP . We wish to show that min gPP LP,ᾱ | x"x pf px, gqq coincides with the line between px´,f px´qq and px`,f px`qq.
To show this, we first note that both endpoints coincide withf jmax px, g˚q for some j max , and by Lemma 3, both this value of j and the corresponding solution g˚must be the same for both gap endpoints. Further, sincex´,x` are the endpoints of a gap, we have thatf px´q "x 2 andf px`q "x 2 . This can be seen as follows: first, by [4, Lemma 6], we have that eachf j , j ě 0, is incident with x 2 exactly at the points x " k 2 j`1 2 j`1 , k " 0, . . . , 2 j´1 . Furthermore, the points at which the α-vector changes, and thus the possible gaps inX IP , are exactly the pointsx " k2´L, which must take the form above for some j P 0, L´1 , so thatF j´1 pxq " x 2 for x P tx´,x`u. Since each otheř f j pxq ď x 2 at these points, this yieldsf pxq " x 2 for x P tx´,x`u. Now, let ra 1 , b 1 s be the bounds on g 1 from Lemma 1. Then we have g1 " b 1 : through the mapping Φ, we have g1 "x "b 0 at bothx´andx`, whereb 0 is defined in the manner of Lemma 1. Thus, since g 1 is subject to every constraint inP IP,ᾱ px,g,αq thatx is inP IP,ᾱ px,g,αq , we have that b 1 ďb 0 " g1 ď b 1 , such that g1 " b 1 .
Furthermore, by the convexity of proj x,g´P LP,ᾱ px,g,αq¯, since px´, g˚q, px`, g˚q P proj x,g´P LP,ᾱ px,g,αq¯, we have that px, g˚q P proj x,g´P LP,ᾱ px,g,αq¯f or allx P px´,x`q. Thus, we have for any suchx that yielding by Lemma 2 that g˚P arg mintz : pz, gq P proj z,g pP LP,ᾱ L,L q| x"x u. Thus, we havef px, g˚q " min gPP LP,ᾱ L,L | x"x pf px, gqq "f pxq is linear inx across the gap rx´,x`s and coincides withf pxq at the endpoints, as required. Therefore, we have thatP LP,ᾱ L,L is sharp across the gap. We have now established sharpness ofP LP,ᾱ L,L over all of convpP px,g,αq q, and thus the proof is complete for 1 R I. Case II.B: 1 P I. Finally, to recover sharpness if 1 P I, we only have to observe that inserting 1 into I, thereby restricting α 1 " 1 or α 1 " 0, simply restrictš P IP,ᾱ L,L to either x P Φ x pX IP q or x P 1´Φ x pX IP q, on which sharpness holds exactly as the sharpness result on the image of Φ (or its reflection) with 1 P I, with one difference: we define Φ so that α 1 "α 1 . However, this difference has no effect on the z-minimal solutions for g1 withinX LP , and thus no effect on sharpness.

Computational Results
In order to test the MIP relaxations from Section 4 with respect to their ability to determine dual bounds, we now perform an indicative computational study. More precisely, we will derive MIP relaxations of non-convex MIQCQP instances. The MIP relaxations are then solved using Gurobi [26] as an MIP solver to determine dual bounds and a callback function that uses the non-linear programming (NLP) solver IPOPT [39] to find a feasible solution for the MIQCQP. The formulations are tested for several discretization depths. To compare the considered methods to state-of-the-art spatial branching based solvers, we also run Gurobi as an MIQCQP solver. All instances were solved in Python 3.8.3, via Gurobi 9.5.1 and IPOPT 3.12.13 on the 'Woody' cluster, using the "Kaby Lake" nodes with two Xeon E3-1240 v6 chips (4 cores, HT disabled), running at 3.7 GHZ with 32 GB of RAM. For more information, see the Woody Cluster Website of Friedrich-Alexander-Universität Erlangen-Nürnberg. The global relative optimality tolerance in Gurobi was set to the default value of 0.01%, for all MIPs and MIQCQPs.

Study Design
In the following, we explain the design of our study and go into detail regarding the instance set as well as the various parameter configurations.
Instances. We consider a three-part benchmark set of 60 instances: 20 nonconvex boxQP instances from [20,4,15] and earlier works, 20 AC optimal power flow (ACOPF) instances from the NESTA benchmark set (v0.7.0) (see [16]), previously used in [1], and 20 MIQCQP instancess from the QPLIB [22]. In appendix D you will find links that contain download options and detailed descriptions of the instances. For an overview of the IDs of all instances, see Table 9. The benchmark set is equally divided into 30 sparse and 30 dense instances. We refer to dense problems as instances where either the objective function and/or at least one quadratic function in the constraint set is of the form x J Qx, where x P R n are all variables of the problem and Q P R n,n is a matrix with at least 25% of its entries being nonzero.
Parameters. For each instance, we solve the resulting MIP relaxation of each method from Section 4 using various approximation depths of L P t1, 2, 4, 6u and a time limit of 8 hours. All sawtooth and separable MIP relaxations are solved once with L 1 " L and once with a tightened underestimator version for univariate quadratic terms where L 1 " maxt2, 1.5Lu. This tightening is done as described in Definition 7 by adding linear cuts and without introducing further binary variables. In the separable methods Bin2, Bin3 and HybS this leads to a tightening of the relaxation of z " xy terms as well as of z " x 2 terms in the original MIQCQP. Note that the tightened MIP relaxations T-NMDT and T-D-NMDT are equivalent to the non-tightened MIP relaxations NMDT and D-NMDT when applied to bilinear terms of the form z " xy. However, they differ from them in that all lower bounding McCormick constraints in the univariate quadratic terms of the form z " x 2 are replaced by a tighter sawtooth epigraph relaxation (18) as described in Sections 4.2 and 4.3.
In Table 2, one can see an overview of the different parameters in our study. In total, we have 40 parameter configurations for 60 original problems, which means that we solve 2400 MIP instances. For the comparison with Gurobi as a state-of-the-art MIQCQP solver, we solve an additional 960 MIP instances and 120 MIQCQP instances. These additional MIP instances arise from disabling the cuts in Gurobi for the winner of the NMDT-based methods and the winner of the separable methods. The 120 MIQCQP instances are built from solving all 60 benchmark problems once with cuts enabled and once with cuts disabled. See Table 2. In the study, we consider the parameters cuts, depth and formulation on 60 MIQCQP instances and thus solve 2¨4¨5¨60 " 2400 MIP relaxations.
Depth L " 1, 2, 4, 6 L1 " L Tightened: L " 1, 2, 4, 6 L1 " maxt2, 1.5Lu Callback function. Solving all MIP relaxations, we use a callback function with the local NLP solver IPOPT that works as follows: given any MIP-feasible solution, the callback function fixes any integer variables from the original problem (before applying any of the discretization techniques from this work) according to this solution and then solves the resulting NLP locally via IPOPT in an attempt to find a feasible solution for the original MIQCQP problem.

Results
In the following, we present the results of our study. In particular, we aim to answer the following questions regarding dual bounds: -Are our enhanced methods D-NMDT and HybS computationally superior to their predecessors NMDT and Bin2 or Bin3, respectively? -Is it beneficial to use tightened versions of the MIP relaxations, i.e., to choose L 1 ą L? -How do our methods compare to the state-of-the-art MIQCQP solver Gurobi?
We provide performance profile plots as proposed by Dolan and More [18] to illustrate the results of the computational study regarding the dual bounds, see Figure 8 - Figure 16. The performance profiles work as follows: Let d p,s be the best dual bound obtained by MIP relaxation or MIQCQP solver s for instance p after a certain time limit. With the performance ratio r p,s :" d p,s { min s d p,s , the performance profile function value P pτ q is the percentage of problems solved by approach s such that the ratios r p,s are within a factor τ P R of the best possible ratios. All performance profiles are generated with the help of Perprof-py by Siqueira et al. [36]. The plots are divided into three blocks, one for NMDTbased methods, one for separable methods and one for the best methods against Gurobi as an MIQCQP solver. In addition to the performance profiles across all instances, we also show performance profiles for the dense and sparse subsets of the instance set.
Although the main criterion of the study is the dual bound, we also discuss run times. Here, we use the shifted geometric mean, which is a common measure for comparing two different MIP-based solution approaches. The shifted geometric mean of n numbers t 1 , . . . , t n with shift s is defined as`ś n i"1 pt i`s q˘1 {n´s . It has the advantage that it is neither affected by very large outliers (in contrast to the arithmetic mean) nor by very small outliers (in contrast to the geometric mean). We use a typical shift s " 10. Moreover, we only include those instances in the computation of the shifted geometric mean, where at least one solution method delivered an optimal solution within the run time limit of 8 hours.
Finally, we will highlight some important results regarding primal bounds for NMDT-based and separable methods and in the comparison of our methods with Gurobi [26] as an MIQCQP solver.

NMDT-based MIP relaxations
We start our analysis of the results by looking at the NMDT-based MIP relaxations. In Figure 8 we show performance profiles for the dual bounds that are obtained by the different NMDTbased MIP relaxations. The plot is based on all 60 instances of the benchmark set. Starting from L " 2, we can see that both D-NMDT and T-D-NMDT deliver notably tighter bounds within the run time limit of 8 hours. The largest difference is at L " 4, where D-NMDT and T-D-NMDT are able to find dual bounds that are within a factor 1.05 of the overall best bounds for nearly all instances. In contrast, NMDT and T-NMDT require a corresponding factor of more than 1.1. In addition, the tightened versions perform somewhat better than the corresponding counterparts, especially for L " 4.
To gain a deeper insight into the benefits of D-NMDT and the tightening of NMDT-based relaxations, we divide the benchmark set into sparse and dense instances. For sparse instances, the advantage of the new methods is rather small; see Figure 9. Here, T-D-NMDT provides marginally better bounds than the other methods in case of L " 4 and L " 6. For L " 1 and L " 2, however, T-NMDT dominates all other approaches. Moreover, the tightened versions outperform their counterparts for all depths L.
For dense instances, D-NMDT and T-D-NMDT are clearly superior to NMDT and T-NMDT; see Figure 10. Regardless of the relaxation depth, the new methods yield the tightest dual bounds, with T-D-NMDT being superior to D-NMDT only in case of L " 2, where the tightened version T-D-NMDT is able to find the best dual bound for roughly 10% more instances than D-NMDT. Tightening the NMDT method does not deliver better bounds, in fact, T-NMDT is surpassed by NMDT for L " 1.
Regarding the run times of the various NMDT-based approaches, Table 3 shows significantly lower run times for D-NMDT and T-D-NMDT. Again, T-D-NMDT is slightly ahead of D-NMDT.
In Table 4, we can see that the QP heuristic (IPOPT) we mentioned at the beginning of this section delivers high-quality feasible solutions for the original (MIQC-)QP instances. With increasing L values, IPOPT is able to find more feasible solutions with all NMDT-based methods quite similarly. For L " 6, T-D-NMDT combined with IPOPT yields feasible solutions for 50 out of 60 benchmark instances, 47 of which have a relative optimality gap below 1% and 46 of which are even globally optimal, i.e., which have a gap below 0.01%.
In summary, both T-D-NMDT and D-NMDT are clearly superior to the previously known NMDT approach. The double discretization and the associated reduction in the number of binary variables while maintaining the same relaxation error are most likely the reason for this. Surprisingly, the tightening of the lower bounds in the univariate quadratic terms and the resulting introduction of new constraints does not lead to higher run times. Thus, the latter is recommended. Moreover, T-D-NMDT is slightly ahead of the other methods in computing good solutions for the MIP relaxations that are used by the NLP solver IPOPT to find feasible solutions for the original MIQCQP instances. Altogether, we consider T-D-NMDT to be the winner among the NMDT-based methods.   Figure 11 the performance profiles of the separable MIP relaxations with regard to dual bounds using all instances can be seen. Starting with L " 2, the new introduced methods HybS and T-HybS deliver significantly better dual bounds. Except for L " 2, where T-HybS dominates HybS, we do not obtain better dual bounds by tightening the separable MIP relaxations. With L " 4 and L " 6, HybS yields dual bounds that are within a factor 1.05 of the overall best bounds among separable MIP relaxations for nearly all instances. All other methods require a corresponding factor of at least 1.2.
In Figure 12 and Figure 13, we divide the benchmark set into sparse and dense instances again to obtain a more in-depth look at the benefits of HybS. For sparse instances, using HybS and T-HybS has no clear advantage, as Figure 12 shows. However, with L " 1 and L " 2, the tightened variants deliver notably better dual bounds. For L " 1, the dual bounds computed with T-Bin2 and T-Bin3 are in almost all cases the overall best found bounds. Their counterparts Bin2 and Bin3 are only able to provide the overall best bounds for about 50% of the instances. For L " 2, we see a similar picture. T-Bin2 and T-Bin3 deliver the best bounds for roughly 80% of the instances, while Bin2 and Bin3 achieve this only in 40% of the cases.
For dense instances, the picture is much clearer. Here, HybS and T-HybS are considerably better than Bin2, Bin3, and their tightened variants, particularly from L " 2 to L " 6; see Figure 13. With L " 2, HybS and T-HybS are able to compute dual bounds that are within a factor 1.05 of the overall best bounds for nearly all instances. All other methods require a corresponding factor of more than 1.2. For L " 4 and L " 6, we obtain by HybS the best overall bounds for roughly 90% of all instances, while all other approaches provide the best  bounds for less than 50% of the instances. With the exception of L " 2, where tightening HybS results in slightly better dual bounds, the tightened versions of the separable MIP relaxations attain significantly weaker dual bounds than their corresponding counterparts.
In Table 5, the shifted geometric mean values of the run times for solving the separable MIP relaxations on all instances are given. Here, HybS clearly outperforms all other methods, including its tightened variant T-HybS. HybS is at least a factor of two better than (T-)Bin2 and (T-)Bin3. However, tightening the lower bounds in HybS results partially in notably higher run times, e.g. by a factor of more than two in case of L " 4. Regarding the ability to find feasible solutions, all separable methods perform quite similarly and find more feasible solutions with higher L values. With L " 6, HybS in combination with IPOPT is able to compute feasible solutions for 51 out of 60 benchmark instances, 43 of which have a relative optimality gap below 1% and 40 of which are even globally optimal, i.e., which have a gap below 0.01%. Table 6. Number of feasible solutions found with different relative optimality gaps. The first number corresponds to a gap of less than 0.01%, the second to a gap of less than 1% and the third number indicates the number of feasible solutions. Bin2 T All in all, the clear winner among the separable methods is HybS. For large L values, HybS provides the best bounds, the shortest run times, and finds in combination with IPOPT the most and best feasible solutions for the original MIQCQP instances. This advantage is especially noticeable on dense instances and consistent with the theoretical findings from Section 5. While in HybS the number of binary variables increases linearly in the number of variable products, it increases quadratically in Bin2 and Bin3. Furthermore, based on the computational results, a tightening of the separable methods is not advisable, except for sparse instances with small L values. This is most likely due to the large number of additional constraints that are needed to underestimate p 2 1 and p 2 2 ; see Table 1.

7.2.3
Comparison with state-of-the-art MIQCQP Solver Gurobi Finally, we compare the two winners T-D-NMDT and HybS of the previous subsections with the state-of-the-art MIQCQP solver Gurobi 9.5.1. We perform the comparison in two ways. Firstly, with Gurobi's default settings, and secondly, with cuts disabled, i.e., we set the parameter "Cuts = 0". The reason for running Gurobi again with cuts turned off is that cuts are one of the most important components of MIQCQP/MIP solvers that rely on the structure of the problem. While constructing the MIP relaxations with T-D-NMDT and HybS, the original problem is transformed in such a way that Gurobi can no longer recognize the original quadratic structure of the problem. However, many cuts would still be valid and applicable in the MIP relaxations, for instance, RLT and PSD cuts.
We start our comparison with showing performance profiles for Gurobi, T-D-NMDT, HybS, and their variants without cuts ("-NC") on all instances in Figure 14. As expected, Gurobi performs best for all L values, followed by its variant without cuts in second place. However, as the depth L increases, the MIP relaxations provide gradually tighter dual bounds. For L " 6, T-D-NMDT and HybS are able to find the best dual bounds for more than 50% of the cases, while Gurobi delivers the best bounds for roughly 90% and its variant without cuts for about 70% of the cases. Surprisingly, in contrast to T-D-NMDT, disabling cuts in case of HybS has little effect on the quality of the dual bounds.
As before, we divide the benchmark set into sparse and dense instances. For sparse instances, the dual bounds computed by T-D-NMDT and HybS become progressively tighter with increasing L; see Figure 15. For L " 4 and L " 6, T-D-NMDT and HybS are able to find the best dual bounds in about 60% of the instances, while Gurobi delivers the best bounds for roughly 80%. Compared to Gurobi-NC, our new methods T-D-NMDT, HybS, and most notably HybS-NC perform almost equally well.
In the case of dense instances, a different picture emerges, see Figure 16. Again, Gurobi and also Gurobi-NC are dominant for all approximation depths. However, for L " 1, T-D-NMDT delivers dual bounds that are within a factor 1.1 of the dual bounds provided by the variant of Gurobi without cuts. With higher L values, T-D-NMDT, HybS, and HybS-NC compute in about 40% of the cases the best bounds, while Gurobi yields the best bounds in all cases and Gurobi-NC for roughly 70% of the instances.
In Table 7 we show the shifted geometric mean values of the run times for solving all instances with Gurobi and the corresponding MIP relaxations constructed with T-D-NMDT and HybS. The variants of Gurobi, T-D-NMDT, and HybS without cuts are also contained. Gurobi has significantly shorter run times than all other approaches. However, with L " 1 and L " 2, T-D-NMDT, HybS, T-D-NMDT-NC and HybS-NC are somewhat faster than Gurobi-NC.
Remark 7. Note, that for calculating the shifted geometric mean only those instances are used for which at least one method computed the optimal solution within the run time limit of 8 hours. Since with higher L values the complexity of the MIP relaxations increases, fewer instances are solved to optimality by T-D-NMDT and HybS. Therefore, the shifted geometric mean decreases for Gurobi and Gurobi-NC with higher L values. This inherent nature of the shifted geometric mean is also the reason why we see different values in Tables 3, 5 and 7 for the same methods.
In combination with IPOPT as a QP heuristic, T-D-NMDT, HybS, and their variants without cuts are competitive with Gurobi for high L values when it comes to finding feasible solutions, as Table 8 shows. HybS-NC with IPOPT is able to find feasible with a relative optimality gap below 1% for 48 out of 60 benchmark instances, while Gurobi finds 50 feasible solutions with a gap below 1%. T-D-NMDT computes 46 solutions that are globally optimal, whereas Gurobi achieves this for 50 instances. Surprisingly, the variant without cuts of HybS delivers more feasible solutions than its variant with cuts enabled. Finally, we note that some MIQCQP instances have been solved to global optimality by the MIP relaxation methods, while Gurobi reached the run time limit of 8 hours. For instance, T-D-NMDT with IPOPT is able to solve the QPLIB instance "QPLIB 0698" to global optimality for L P t2, 4, 6u with a run time below 5 minutes, while Gurobi has a relative optimality gap of more than 5% after a run time of 8 hours.
Overall, the comparison with Gurobi as a state-of-the-art MIQCQP solver has shown that the new methods T-D-NMDT and HybS can be relevant for practical applications. For sparse instances, the dual bounds provided by T-D-NMDT and HybS are of similar quality to those provided by Gurobi. In terms of MIQCQP-feasible solutions, for most instances the two methods are able to find very high quality solutions in combination with IPOPT as NLP solver.
Moreover, there is still plenty of room for improvement. First, numerical studies have shown before that an adaptive refinement of nonlinearities drastically decreases run times for solving MINLPs by piecewise linear MIP relaxations; see [7] for example. Hence, an approach with an adaptive refinement of the approximation depth L is even more promising. Second, HybS and its variant without cuts HybS-NC have performed very similarly in our computational study. In addition, HybS-NC was relatively close to Gurobi-NC in both solution quality and dual bounds for the MIQCQPs. Since most MIQCQP-specific cuts can still be integrated into the HybS approach, we believe that HybS can be further improved by embedding it in a branch-and-cut solution framework that is able to add MIQCQP-specific cuts, such as BQP and PSD cuts, to the MIP relaxations. In this way, we obtain both tighter dual bounds and MIP relaxation solutions that are more likely to yield feasible solutions for the MIQCQP in combination with IPOPT.  Table 7. Shifted geometric mean for run times on all instances for best MIP relaxation compared to Gurobi as MIQCQP solver with cuts and without cuts (-NC).

Conclusion
We introduced two enhanced MIP relaxation techniques for non-convex quadratic products of the form z " xy, called hybrid separable (HybS) and doubly discretized normalized multiparametric disaggregation technique (D-NMDT). While HybS is based on a combination of separable reformulations, D-NMDT is an extension of the well known normalized multiparametric disaggregation technique (NMDT).We showed that these new MIP relaxations have clear theoretical advantages over their predecessors, such as a significantly lower number of integer variables and tighter linear programming relaxations. In addition to new MIP relaxations for z " xy, we introduced a hereditarily sharp MIP relaxation called sawtooth relaxation for z " x 2 terms, which requires only a logarithmic number of binary variables with respect to the relaxation error. The sawtooth relaxation is based on the sawtooth approximation from [4].
In a broad computational study, we compared HybS, combined with the sawtooth relaxation, and D-NMDT against their predecessors from the literature and the state-of-the-art MIQCQP solver Gurobi. We showed that the new MIP relaxations determine far better dual bounds than their predecessors, while also exhibiting shorter run times. The new methods are able to find high-quality solutions to the original quadratic problems when used in conjunction with a primal solution callback function and a local non-linear programming solver. Finally, we show that we can partially compete with Gurobi as MIQCQP solvers and give some indications on how to further improve the new approaches.
Two of the most promising directions in this context are employing adaptivity and adding MIQCQP-specific cuts that are valid but not recognized by the MIP solvers. This is the subject of future work.

A Detailed Derivation of the MIP Relaxation D-NMDT
For the derivation of the MIP relaxation D-NMDT for gra r0,1s 2 pxyq, we first define ∆ L x P r0, 2´Ls, ∆ L y P r0, 2´Ls, β x P t0, 1u L , β y P t0, 1u L . (65) Then we use the NMDT representation (26), expand the ∆ L x y-term and obtain z " xy " y˜L Alternatively, if we discretize y first, then expand the term ∆ L y x, we obtain z " L ÿ j"1 2´jpβ y j x`β x j ∆ L y q`∆ L x ∆ L y .
Finally, we obtain the complete MIP relaxation D-NMDT stated in (30) by applying McCormick envelopes to the product terms β y j pp1´λq∆ L x`λ xq, β x j pλ∆ L yp 1´λqyq and ∆ L x ∆ L y . For bounds on the terms pp1´λq∆ L x`λ xq and pλ∆ L yp 1´λqyq, see Appendix B.

B MIP Relaxations on General Intervals
In this section, we generalize the MIP relaxations for gra r0,1s 2 pxyq and gra 2 r0,1s px 2 q discussed in this article to general box domains px, yq P rx,xsˆP rȳ,ȳs and x " l xx`x and z " xy " pl xx`x qy " l xẑ`x¨y to obtain x P r0, 2´Lpx´xqs, y P rȳ,ȳs, β P t0, 1u L .
In this way, we are able to formulate the MIP relaxation NMDT on a general box domain as follows: x " l x L ÿ j"1 px, α j , u j q P Mpx, β j q j P 1, . . . , L p∆ L x , y, ∆ L z q P Mp∆ L x , yq ∆ L x P r0, 2´Ll x s, y P rȳ,ȳs, β P t0, 1u L Finally, we present the modelling of D-NMDT on general box domains. Analogously as for NMDT, we apply McCormick envelopes to model all remaining product terms α j y and ∆ L x¨y . Further, we introduce the variablesx P r0, 1s andẑ P r0, 1s to map the domain to r0, 1s intervals by using the transformations x :" l xx`x and y :" l yẑ`ȳ as well as z " xy " pl xx`x qpl yẑ`ȳ q " l x l yxẑ`lxx ȳ`l yẑ x`xȳ " l x l yẑ`lxx ȳ`l yẑ x`xȳ.
As in the derivation of (30), we then obtain the formulation D-NMDT by applying McCormick envelopes to the product terms β i pp1´λq∆ L x`λx q, α i pλ∆ L z`p 1λ qẑq and ∆ L x ∆ L z . As in (30), we incorporate the following bounds to construct McCormick envelopes: p1´λq∆ L x`λx P r0, p1´λq2´L`λs λ∆ L z`p 1´λqẑ P r0, λ2´L`p1´λqs.
We note that generalizing the sawtooth epigraph relaxation (18) works analogously.
For NMDT and D-NMDT, we derive the general formulations by using the derivations in Appendix A with x " y. In the case of NMDT, where the original model is (28), this leads to px, β i , u i q P Mpx, α i q i P 1, . . . , L p∆ L x , x, ∆ L z q P Mp∆ L x , xq ∆ L x P r0, 2´Ll x s, x P rx,xs, β P t0, 1u L .
For D-NMDT, we obtain (31) for general domains as follows: x P r0, 2´Ls, x P rx,xs, β P t0, 1u L , with l x ∆ L x`x P rx, l x 2´L`xs.

C Auxiliary Results and Proofs
In this appendix, we give the proofs of Lemma 4 and Proposition 7 which we have moved here for better readability.

C.1 Epigraphs Over Non-Contiguous Domains
Here we present the proof of Lemma 4.
Proof (Lemma 4). We first note that we have F X pxq ě F pxq for all x P convpXq: for all x P convpXq, we have that either F X pxq " F pxq or that F X pxq is the line between two points on the graph of f , which must lie above the graph of f by the convexity of f . Further, we have that F X is convex, as it is a maximum between the convex function F and some of its secant lines, which are also convex. Now, trivially, by the convexity of F X , we have convpepi X pF qq " convpepi X pF X qq Ď convpepi convpXq pF X qq " epi convpXq pF X q To show that epi convpXq pF X q Ď convpepi X pF qq, let px, yq P epi convpXq pF X q. Then if x P X, y ě F X pxq " F pxq, such that px, yq P epi X pF q Ď convpepi X pF qq.
On the other hand, if x P convpXqzX, then by definition of F X we have that there exist some λ P r0, 1s and x 1 , x 2 P X such that x " λx 1`p 1´λqx 2 and F X pxq " λF px 1 q`p1´λqF px 2 q. Then we have that px, yq is a convex combination of the points px 1 , f px 1 q`py´F X pxqqq and px 2 , F px 2 q`py´F X pxqqq, which are in epi X pF q (since y´F X pxq ě 0), yielding px, yq P convpepi X pF qq as required.

C.2 Volume Proof for Bin2 and Bin3
Now we prove Proposition 7.
Proof (Proposition 7). Let P IP L,L1 be the MIP relaxation Bin2, where F L is the sawtooth approximation of z x " x 2 and z y " y 2 that consists of secant lines to x 2 between consecutive breakpoints x k " k2´L and y k " k2´L for k P 0, 2 L . Further, for L 1 Ñ 8 we have lim L1Ñ8 tpp, z p q P r0, 1sˆR : pp, z p q P Q L1 u " tpp, z p q P r0, 1sˆR : pp, z p q P epi r0,1s pp 2 qu under Hausdorff distance. As a result, we obtain lim L,L1Ñ8 pproj x,y,z pP IP L,L1 qq " tpx, y, zq P r0, 1s 2ˆR : 2´y2˘u . Now let and l x " l y " 2´p L´1q be the distance between any two consecutive breakpoints x k , x k´1 and y k , y k´1 , respectively, and consider the volume of proj x,y,z pP IP L,L1 q over the grid piece rx k´1 , x k sˆry k´1 , y k s: L`x`y 2˘´p x`y 2 q 2˘d ydx.
The first two integrals are each the overapproximation volumes for the sawtooth approximation over two consecutive univariate domain segments, each of which has an area of 1 6 2´3 L , see [4, Appendix A]. Thus, since l x " l y " 2˚2´L, we have that the first two integrals add up to 2 3 2´4 L .
To process the third integral, we apply the two substitutions u " px´x k´1 q`py´y k´1 q 2 and v " px´x k´1 q´py´y k´1 q 2 . The integral then becomes 2 ż x k x k´1 ż y k y k´1`F L`x`y 2˘´p x`y 2 q 2˘d ydx The steps J1 and J2 rely on the observation that F L is the secant line to x 2 across the intervals r L , x k´1`yk´1 2`22´2 L s, due to the positions of x k´1 and y k´1 . In addition, for somex P rx k´1 , x k s, the error between and x 2 and the secant line to x 2 at points x k´1 and x k is given by px´x k´1 qpx k´x q -the product of distances to each endpoint. Thus, for u P r0, 2´Ls, we have F L´u`xk´1`yk´1 2¯´p u`x k´1`yk´1 2 q 2 " up2 L´u q, yielding the validity of step J2. On the other hand, to show that step J1 is valid, we observe for u P r0, 2´Ls that F L´u`xk´1`yk´1 2¯´p u`x k´1`yk´1 2 q 2 " pu´2 L qp2´2 L´u q holds, such that the second integral becomes the first integral under the substitutionũ " 2´L´u, since the secant-error portion of the integrand is symmetric about u " 2´L. Thus, the volume related to the second integral is 4 3 2´4 L . The volume of P IP L,L1 over each grid piece converges to 2¨2´4 L , yielding a total volume convergence of lim L1Ñ8 volpproj x,y,z pP IP L,L1 qq " 2 2pL´1q p2¨2´4 L q " 1 2 2´2 L .
The proof for Bin3 is similar and therefore omitted here.

D Instance set
In Table 9 we show a listing of all instances of the computational study from Section 7. The boxQP instances are publicly available at https://github.com /joehuchette/quadratic-relaxation-experiments. The ACOPF are also publicly available at https://github.com/robburlacu/acopflib. The QPLIB instances are available at https://qplib.zib.de/. In total, we have 60 instances, of which 30 are dense and 30 are sparse. Table 9. IDs of all 60 instances used in the computational study. In bold are the IDs of the instances that are dense.