Abstract
We present a technique for producing valid dual bounds for nonconvex quadratic optimization problems. The approach leverages an elegant piecewise linear approximation for univariate quadratic functions due to Yarotsky (Neural Netw 94:103–114, 2017), formulating this (simple) approximation using mixed-integer programming (MIP). Notably, the number of constraints, binary variables, and auxiliary continuous variables used in this formulation grows logarithmically in the approximation error. Combining this with a diagonal perturbation technique to convert a nonseparable quadratic function into a separable one, we present a mixed-integer convex quadratic relaxation for nonconvex quadratic optimization problems. We study the strength (or sharpness) of our formulation and the tightness of its approximation. Further, we show that our formulation represents feasible points via a Gray code. We close with computational results on problems with quadratic objectives and/or constraints, showing that our proposed method (i) across the board outperforms existing MIP relaxations from the literature, and (ii) on hard instances produces better bounds than exact solvers within a fixed time budget.
Notes
Furthermore, Yarotsky [57] observes that it is straightforward to represent each of the sawtooth functions as a composition of the standard ReLU activation function \(\sigma (x) = \max \{0,x\}\). For example, \(G_1(x) = 2\sigma (x) - 4 \sigma (x-\frac{1}{2}) + 2 \sigma (x-1)\). In this way, \(F_L\) can be written as a neural network with a very particular choice of architecture and weight values.
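To make this concrete, here is a small NumPy sketch (the names `relu`, `sawtooth`, `F`, and the evaluation grid are ours) that builds \(F_L\) by composing the sawtooth above and compares it against \(x^2\). This only illustrates Yarotsky's approximation itself, not the MIP formulation built on top of it.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sawtooth(x):
    # G_1 from the footnote: 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1),
    # a triangle on [0, 1] peaking at (1/2, 1).
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def F(x, L):
    # Yarotsky's piecewise linear approximation of x^2 on [0, 1]:
    #   F_L(x) = x - sum_{s=1}^{L} G_s(x) / 4^s,
    # where G_s is the s-fold composition of the sawtooth.
    g, approx = x, x
    for s in range(1, L + 1):
        g = sawtooth(g)
        approx = approx - g / 4**s
    return approx

# F_L interpolates x^2 at multiples of 2^{-L}; the worst-case error
# 2^{-2L-2} is attained midway between those breakpoints.
xs = np.linspace(0.0, 1.0, 1025)
for L in (1, 2, 3, 4):
    print(L, np.max(np.abs(F(xs, L) - xs**2)))
```

Since the grid contains the midpoints between breakpoints for these L, the printed maxima match \(2^{-2L-2}\) exactly.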
This can be accomplished in a number of ways: for example, by computing the minimum eigenvalue of D, or by solving a semidefinite programming problem [23].
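The eigenvalue route can be sketched in a few lines; the function name `diagonal_shift` and the example matrix are ours, and the SDP alternative mentioned in the footnote would generally yield a less conservative shift.

```python
import numpy as np

def diagonal_shift(Q):
    # Split x'Qx into a convex quadratic plus separable concave terms:
    #   x'Qx = x'(Q + delta*I)x - delta * sum_i x_i^2,
    # where delta = max(0, -lambda_min(Q)) makes Q + delta*I positive
    # semidefinite.
    Q = 0.5 * (Q + Q.T)                  # symmetrize defensively
    lam_min = np.linalg.eigvalsh(Q)[0]   # eigenvalues in ascending order
    delta = max(0.0, -lam_min)
    return Q + delta * np.eye(Q.shape[0]), delta

Q = np.array([[1.0, 3.0], [3.0, 1.0]])   # indefinite: eigenvalues -2 and 4
Q_shifted, delta = diagonal_shift(Q)
print(delta, np.linalg.eigvalsh(Q_shifted)[0])
```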
In fact, any Gray code, not just the reflected Gray code studied in this paper, yields a (potentially distinct) logarithmic formulation for a univariate function. Here, we mean the formulation constructed with the reflected Gray code, which is the most common choice in any case.
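For reference, the binary-reflected Gray code has a well-known closed form; a short sketch (function name ours):

```python
def reflected_gray(L):
    # Binary-reflected Gray code on L bits: the k-th codeword is
    # k XOR (k >> 1).  Consecutive codewords differ in exactly one bit,
    # which is the property the logarithmic formulation relies on.
    return [k ^ (k >> 1) for k in range(2**L)]

codes = reflected_gray(3)
print([format(c, "03b") for c in codes])
# Verify the one-bit-flip property between consecutive codewords:
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
```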
CPLEX does not support nonconvex quadratic constraints of this form, so we do not include a corresponding approach with the diagonal shift.
References
Adjiman, C.S., Androulakis, I.P., Floudas, C.A.: A global optimization method, \(\alpha \)BB, for general twice-differentiable constrained NLPs–II. Implementation and computational results. Comput. Chem. Eng. 22(9), 1159–1179 (1998)
Adjiman, C.S., Dallwig, S., Floudas, C.A., Neumaier, A.: A global optimization method, \(\alpha \)BB, for general twice-differentiable constrained NLPs–I. Theoretical advances. Comput. Chem. Eng. 22(9), 1137–1158 (1998)
Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. In: Lodi, A., Nagarajan, V. (eds.) Proceedings of the 20th Conference on Integer Programming and Combinatorial Optimization, pp. 27–42. Springer International Publishing, Cham (2019). arXiv:1811.08359
Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. In: Lodi, A., Nagarajan, V. (eds.) Integer Programming and Combinatorial Optimization, pp. 27–42. Springer International Publishing, Cham (2019)
Androulakis, I., Maranas, C.D.: \(\alpha \)BB: a global optimization method for general constrained nonconvex problems. J. Glob. Optim. 7(4), 337–363 (1995)
Androulakis, I.P., Maranas, C.D., Floudas, C.A.: \(\alpha \)BB: a global optimization method for general constrained nonconvex problems. J. Glob. Optim. 7(4), 337–363 (1995)
Bader, J., Hildebrand, R., Weismantel, R., Zenklusen, R.: Mixed integer reformulations of integer programs and the affine TU-dimension of a matrix. Math. Program. 169(2), 565–584 (2018)
Billionnet, A., Elloumi, S., Lambert, A.: Extending the QCR method to general mixed-integer programs. Math. Program. 131(1–2), 381–401 (2012). https://doi.org/10.1007/s10107-010-0381-7
Billionnet, A., Elloumi, S., Lambert, A.: Exact quadratic convex reformulations of mixed-integer quadratically constrained problems. Math. Program. 158(1), 235–266 (2016). https://doi.org/10.1007/s10107-015-0921-2
Bonami, P., Günlük, O., Linderoth, J.: Globally solving nonconvex quadratic programming problems with box constraints via integer programming methods. Math. Program. Comput. 10(3), 333–382 (2018). https://doi.org/10.1007/s12532-018-0133-x
Bunel, R., Lu, J., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: Branch and bound for piecewise linear neural network verification (2019). arXiv:1909.06588
Burer, S., Saxena, A.: The MILP road to MIQCP. In: Lee, J., Leyffer, S. (eds.) Mixed Integer Nonlinear Programming, pp. 373–405. Springer, New York (2012)
Castillo, P.A.C., Castro, P.M., Mahalec, V.: Global optimization of MIQCPs with dynamic piecewise relaxations. J. Glob. Optim. 71(4), 691–716 (2018). https://doi.org/10.1007/s10898-018-0612-7
Castro, P.M.: Normalized multiparametric disaggregation: an efficient relaxation for mixed-integer bilinear problems. J. Glob. Optim. 64(4), 765–784 (2015)
Castro, P.M.: Tightening piecewise McCormick relaxations for bilinear problems. Comput. Chem. Eng. 72, 300–311 (2015). https://doi.org/10.1016/j.compchemeng.2014.03.025
Castro, P.M., Liao, Q., Liang, Y.: Comparison of mixed-integer relaxations with linear and logarithmic partitioning schemes for quadratically constrained problems. Optim. Eng. (2021). https://doi.org/10.1007/s11081-021-09603-5
Chen, J., Burer, S.: Globally solving nonconvex quadratic programming problems via completely positive programming. Math. Program. Comput. 4(1), 33–52 (2012)
Croxton, K.L., Gendron, B., Magnanti, T.L.: A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Manag. Sci. 49(9), 1268–1273 (2003)
Dantzig, G.B.: On the significance of solving linear programming problems with some integer variables. Econometrica: Journal of the Econometric Society, 30–44 (1960)
Dey, S.S., Gupte, A.: Analysis of MILP techniques for the pooling problem. Oper. Res. 63(2), 412–427 (2015)
Dey, S.S., Kazachkov, A.M., Lodi, A., Mu, G.: Cutting plane generation through sparse principal component analysis. http://www.optimization-online.org/DB_HTML/2021/02/8259.html
Dong, H.: Relaxing nonconvex quadratic functions by multiple adaptive diagonal perturbations. SIAM J. Optim. 26(3), 1962–1985 (2016)
Dong, H., Luo, Y.: Compact disjunctive approximations to nonconvex quadratically constrained programs (2018)
Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Elloumi, S., Lambert, A.: Global solution of non-convex quadratically constrained quadratic programs. Optim. Methods Software 34(1), 98–114 (2019). https://doi.org/10.1080/10556788.2017.1350675
Fortet, R.: L’algèbre de Boole et ses applications en recherche opérationnelle. Trabajos de Estadistica 11(2), 111–118 (1960). https://doi.org/10.1007/bf03006558
Foss, F.A.: The use of a reflected code in digital control systems. Transactions of the I.R.E. Professional Group on Electronic Computers EC-3(4), 1–6 (1954). https://doi.org/10.1109/irepgelc.1954.6499244
Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program. Ser. A 106, 225–236 (2006)
Frangioni, A., Gentile, C.: SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Oper. Res. Lett. 35(2), 181–185 (2007)
Furini, F., Traversi, E., Belotti, P., Frangioni, A., Gleixner, A., Gould, N., Liberti, L., Lodi, A., Misener, R., Mittelmann, H., Sahinidis, N.V., Vigerske, S., Wiegele, A.: QPLIB: a library of quadratic programming instances. Math. Program. Comput. 11(2), 237–265 (2019)
Galli, L., Letchford, A.N.: A compact variant of the QCR method for quadratically constrained quadratic 0–1 programs. Optim. Lett. 8(4), 1213–1224 (2014). https://doi.org/10.1007/s11590-013-0676-8
Galli, L., Letchford, A.N.: A binarisation heuristic for non-convex quadratic programming with box constraints. Oper. Res. Lett. 46(5), 529–533 (2018). https://doi.org/10.1016/j.orl.2018.08.005
Glover, F.: Improved linear integer programming formulations of nonlinear integer problems. Manag. Sci. 22(4), 455–460 (1975). https://doi.org/10.1287/mnsc.22.4.455
Hammer, P., Rubin, A.: Some remarks on quadratic programming with 0–1 variables. Revue Française d’Automatique, Informatique, Recherche Opérationnelle 4(3), 67–79 (1970)
Hansen, P., Jaumard, B., Ruiz, M., Xiong, J.: Global minimization of indefinite quadratic functions subject to box constraints. Naval Res. Logist. (NRL) 40(3), 373–392 (1993). https://doi.org/10.1002/1520-6750(199304)40:3<373::AID-NAV3220400307>3.0.CO;2-A
Huchette, J., Vielma, J.P.: Nonconvex piecewise linear functions: Advanced formulations and simple modeling tools. Oper. Res. (to appear). arXiv:1708.00050
Huchette, J.A.: Advanced mixed-integer programming formulations: methodology, computation, and application. Ph.D. thesis, Massachusetts Institute of Technology (2018)
Kaibel, V., Pashkovich, K.: Constructing Extended Formulations from Reflection Relations, pp. 77–100. Springer, Berlin (2013)
Lee, J., Wilson, D.: Polyhedral methods for piecewise-linear functions I: the lambda method. Discrete Appl. Math. 108, 269–285 (2001)
Magnanti, T.L., Stratila, D.: Separable concave optimization approximately equals piecewise linear optimization. In: Bienstock, D., Nemhauser, G. (eds.) Lecture Notes in Computer Science, vol. 3064, pp. 234–243. Springer (2004)
Misener, R., Floudas, C.A.: Global optimization of mixed-integer quadratically-constrained quadratic programs (MIQCQP) through piecewise-linear and edge-concave relaxations. Math. Program. 136(1), 155–182 (2012). https://doi.org/10.1007/s10107-012-0555-6
Nagarajan, H., Lu, M., Wang, S., Bent, R., Sundar, K.: An adaptive, multivariate partitioning algorithm for global optimization of nonconvex programs. J. Glob. Optim. 74, 639–675 (2019)
Padberg, M.: Approximating separable nonlinear functions via mixed zero-one programs. Oper. Res. Lett. 27, 1–5 (2000)
Pardalos, P., Vavasis, S.: Quadratic programming with one negative eigenvalue is NP-hard. J. Glob. Optim. 1(1), 15–22 (1991)
Phan-huy-Hao, E.: Quadratically constrained quadratic programming: some applications and a method for solution. Zeitschrift für Oper. Res. 26(1), 105–119 (1982)
Savage, C.: A survey of combinatorial Gray codes. SIAM Rev. 39(4), 605–629 (1997)
Saxena, A., Bonami, P., Lee, J.: Convex relaxations of non-convex mixed integer quadratically constrained programs: Projected formulations. Math. Program. 130, 359–413 (2011)
Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks (2018). arXiv:1810.03370
Serra, T., Tjandraatmadja, C., Ramalingam, S.: Bounding and counting linear regions of deep neural networks. In: Thirty-fifth International Conference on Machine Learning (2018)
Tjeng, V., Xiao, K., Tedrake, R.: Verifying neural networks with mixed integer programming. In: International Conference on Learning Representations (2019)
Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Oper. Res. 58(2), 303–315 (2010)
Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Oper. Res. 58(2), 303–315 (2010). https://doi.org/10.1287/opre.1090.0721
Vielma, J.P., Nemhauser, G.L.: Modeling disjunctive constraints with a logarithmic number of binary variables and constraints. Math. Program. 128(1–2), 49–72 (2009). https://doi.org/10.1007/s10107-009-0295-4
Wei, Y.: Triangular function analysis. Comput. Math. Appl. 37(6), 37–56 (1999). https://doi.org/10.1016/s0898-1221(99)00075-9
Wiese, S.: A computational practicability study of MIQCQP reformulations. https://docs.mosek.com/whitepapers/miqcqp.pdf (2021). Accessed 22 Feb 2021
Xia, W., Vera, J.C., Zuluaga, L.F.: Globally solving nonconvex quadratic programs via linear integer programming techniques. INFORMS J. Comput. 32(1), 40–56 (2020). https://doi.org/10.1287/ijoc.2018.0883
Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017)
This work was supported by AFOSR (Grant FA9550-21-0107) and ONR (Grant N00014-20-1-2156). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research or the Air Force Office of Scientific Research.
Appendices
Normalized multi-parametric disaggregation technique
We present a standard approach to discretizing continuous variables for handling bilinear products in nonlinear models. This approach is perhaps the most straightforward way to convert bilinear problems to MILPs and has been referred to as the Normalized Multi-Parametric Disaggregation Technique (NMDT) [14]. We adapt the bilinear approach here to squaring a single variable.
Consider \(x \in [0,1]\), and let L be a positive integer. We then use the representation \(x = \sum _{i=1}^{L} 2^{-i} \beta _i + \varDelta x\) (52a) with \(\beta _i \in \{0,1\}\) and \(\varDelta x \in [0, 2^{-L}]\) (52b), where L is the number of binary variables to use.
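As a sanity check on this representation, a small sketch (the function name `nmdt_split` is ours) recovers the \(\beta _i\) and the residual \(\varDelta x\) greedily for a given x:

```python
def nmdt_split(x, L):
    # Greedy binary disaggregation for the representation
    #   x = sum_{i=1}^{L} 2^{-i} * beta_i + Delta_x,
    # with beta_i in {0, 1} and the residual Delta_x in [0, 2^{-L}].
    betas, r = [], x
    for i in range(1, L + 1):
        bit = 1 if r >= 2.0**-i else 0
        betas.append(bit)
        r -= bit * 2.0**-i
    return betas, r  # r is Delta_x

print(nmdt_split(0.8125, 3))  # 0.8125 = 1/2 + 1/4 + residual 0.0625
```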
Multiplying (52a) by x, and substituting the representation into the \(x \varDelta x\) term, we obtain
Now, using the fact that \(x + \varDelta x \in [0, 1+2^{-L}]\), we first lift the model by adding variables \(u_i\) and \(\varDelta u\) such that \(u_i = (x + \varDelta x) \beta _i\) and \(\varDelta u = \varDelta x^2\), and then relax these equations using McCormick envelopes.
Given bounds \(x \in [ {x}^{\min }, {x}^{\max }]\) and \(\beta \in [0, 1]\), the McCormick envelope \({\mathcal {M}}(x,\beta )\) is defined as the following relaxation of \(u = x \beta \):
To approximate \(u = x^2\) with \(x \in [0, {x}^{\max }]\), this becomes
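The four McCormick inequalities, and their exactness at binary \(\beta \) (the case exploited below), can be checked numerically; a sketch with our own helper name:

```python
import numpy as np

def mccormick_u(x, beta, xmin, xmax):
    # The McCormick envelope M(x, beta) for u = x * beta with
    # x in [xmin, xmax] and beta in [0, 1]:
    #   u >= xmin*beta,   u >= xmax*beta + x - xmax,
    #   u <= xmax*beta,   u <= xmin*beta + x - xmin.
    # At a fixed (x, beta) these bound u between lo and hi.
    lo = max(xmin * beta, xmax * beta + x - xmax)
    hi = min(xmax * beta, xmin * beta + x - xmin)
    return lo, hi

# At binary beta the envelope is tight: lo = hi = x * beta.
for x in np.linspace(0.0, 1.0, 11):
    for beta in (0.0, 1.0):
        lo, hi = mccormick_u(x, beta, 0.0, 1.0)
        assert abs(lo - x * beta) < 1e-12 and abs(hi - x * beta) < 1e-12
```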
We present two ways to use this approach. The first is the most direct use of \(\texttt {NMDT}{}\), as used in [14]. This model is
Here, the only error introduced in the relaxation is from \(\varDelta u = x\varDelta x\), yielding a maximum error of \(2^{-L-2}\), occurring when \(\varDelta x = 2^{-L-1}\).
Alternatively, we consider the expansion of the \(x \varDelta x\) term. We thus obtain the T-NMDT relaxation for \(y=x^2\).
Since \(\beta _i\) is binary, \(u_i = \beta _i (x + \varDelta x)\) is represented exactly. Thus, the only possible error is introduced in the relaxation of \(\varDelta y = \varDelta x^2\), which yields a maximum error of \(2^{-2L-2}\), occurring when \(\varDelta x = 2^{-L-1}\).
Now, the expected error of T-NMDT is the expected error from the relaxation of \(\varDelta y = \varDelta x^2\). Modeling \(\varDelta x\) as a uniform random variable on its range \([0,2^{-L}]\), and noting that the only overestimator from (56) is \(\varDelta y \le 2^{-L} \varDelta x\), we obtain an expected overapproximation error of \(\frac{1}{6}\, 2^{-2L}\).
Similarly, the expected underapproximation error can be computed as \(\frac{1}{12} 2^{-2L}\).
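Both expected-error constants can be verified by simulation; a quick check (variable names ours, with the secant overestimator \(\varDelta y \le 2^{-L}\varDelta x\) and the two tangent underestimators of \(\varDelta x^2\) on \([0, 2^{-L}]\)):

```python
import numpy as np

L = 4
h = 2.0**-L                            # Delta_x ranges over [0, h]
dx = np.random.default_rng(0).uniform(0.0, h, size=1_000_000)

# Overestimator: Delta_y <= h * Delta_x (the secant).
exp_over = np.mean(h * dx - dx**2)     # analytically (1/6) * h^2
# Underestimators: Delta_y >= 0 and Delta_y >= 2*h*Delta_x - h^2.
exp_under = np.mean(dx**2 - np.maximum(0.0, 2 * h * dx - h * h))
                                       # analytically (1/12) * h^2
print(exp_over / h**2, exp_under / h**2)
```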
Additional baseline computation summaries
In Table 9 we summarize the results of our baseline experiments stratified by the number of decision variables as in, e.g., Table 4 of Dey et al. [21].
General representations with sawtooth bases
The premise of our formulation is that the function \(y=x^2\) can be approximated arbitrarily closely by a series of sawtooth functions. We discuss here whether such approximations conveniently extend to other polynomials.
In [54], the authors present a Fourier series-like method that leverages orthogonal triangular functions to derive a convergent class of \(L_2\)-optimal approximations for general functions on the interval \([-\pi ,\pi ]\). Define the periodic triangular functions
The authors then build their orthogonal basis functions using an orthogonal linear transformation of the basis
However, as with Fourier series approximations, this method has the limitation that all approximating functions take equal values at the endpoints of the interval, yielding poor approximations for functions whose endpoint values differ. Thus, to obtain good approximations for \(x^3\) on \([-\pi ,\pi ]\), we first add the linear function \(-\pi ^2 x\) to enforce equality at the endpoints.
Then, applying this method to \(x^2\) and \(x^3 - \pi ^2 x\) on the interval \(x \in [-\pi ,\pi ]\), we obtain the following \(L_1\)-errors. Note that almost all of the Y(nx) functions are relevant for approximating \(x^3 - \pi ^2 x\) (and no X(nx)'s), while only a few X(nx) functions (and no Y(nx)'s) are relevant for approximating \(x^2\).
To investigate the prospects of sparsely approximating \(x^3\) with triangular functions directly, we solved the following MIP to obtain the \(L_1\)-optimal triangular approximation to \(x^3\) on the interval [0, 1], using re-scaled versions of the basis functions above and explicitly including a linear shift. We approximate the \(L_1\) error discretely via the error at uniformly spaced points \(x_1, \dots, x_{N_p} \in [0,1]\), allowing the inclusion of only \(N_f\) triangular functions.
The result, shown in Fig. 4, suggests that it is not possible to use this triangular basis to obtain a similar-quality sparse approximation for \(x^3\) as for \(x^2\): the best achievable error rate for \(x^3\) is roughly \(O(N_f^{-2})\), compared to \(O(2^{-2N_f})\) for the quadratic. See also Table 10 where we compare the convergence of the two approximations.
Beach, B., Hildebrand, R. & Huchette, J. Compact mixed-integer programming formulations in quadratic optimization. J Glob Optim 84, 869–912 (2022). https://doi.org/10.1007/s10898-022-01184-6