
Ideal formulations for constrained convex optimization problems with indicator variables


Motivated by modern regression applications, in this paper we study the convexification of a class of convex optimization problems with indicator variables and combinatorial constraints on the indicators. Unlike most previous work on the convexification of sparse regression problems, we simultaneously consider the nonlinear non-separable objective, indicator variables, and combinatorial constraints. Specifically, we give the convex hull description of the epigraph of the composition of a one-dimensional convex function and an affine function under arbitrary combinatorial constraints. As special cases of this result, we derive ideal convexifications for problems with hierarchy, multi-collinearity, and sparsity constraints. Moreover, we give a short proof that for a separable objective function, the perspective reformulation is ideal independently of the constraints of the problem. Our computational experiments with sparse regression problems demonstrate the potential of the proposed approach to improve relaxation quality without significant computational overhead.
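For background on the perspective reformulation referenced above, a standard one-dimensional illustration (our own, using the quadratic \(f(\beta )=\beta ^2\); it is not a result stated in this paper) is:

$$\begin{aligned} {{\,\mathrm{cl \ conv}\,}}\left\{ (z,\beta ,t) \; | \; t \ge \beta ^2, \; \beta (1-z) = 0, \; z \in \{0,1\}\right\} = \left\{ (z,\beta ,t) \; | \; tz \ge \beta ^2, \; 0 \le z \le 1, \; t \ge 0\right\} , \end{aligned}$$

where \(tz \ge \beta ^2\) is the second-order-cone representable form of the perspective constraint \(t \ge \beta ^2/z\) (interpreted as \(\beta = 0\), \(t \ge 0\) when \(z = 0\)).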




  1. Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Oper. Res. Lett. 37(3), 187–191 (2009)


  2. Angulo, G., Ahmed, S., Dey, S.S., Kaibel, V.: Forbidden vertices. Math. Oper. Res. 40(2), 350–360 (2015)


  3. Anstreicher, K.M.: On convex relaxations for quadratically constrained quadratic programming. Math. Program. 136(2), 233–251 (2012)


  4. Atamtürk, A., Gómez, A.: Strong formulations for quadratic optimization with M-matrices and indicator variables. Math. Program. 170(1), 141–176 (2018)


5. Atamtürk, A., Gómez, A.: Rank-one convexification for sparse regression. Optimization Online (2019)

  6. Atamtürk, A., Gómez, A., Han, S.: Sparse and smooth signal estimation: convexification of L0 formulations. J. Mach. Learn. Res. 3, 1–43 (2021)


7. Bacci, T., Frangioni, A., Gentile, C., Tavlaridis-Gyparakis, K.: New MINLP formulations for the unit commitment problems with ramping constraints. Optimization Online (2019)

  8. Belotti, P., Góez, J.C., Pólik, I., Ralphs, T.K., Terlaky, T.: A conic representation of the convex hull of disjunctive sets and conic cuts for integer second order cone optimization. In: Numerical Analysis and Optimization, pp. 1–35. Springer (2015)

  9. Bertsimas, D., Cory-Wright, R., Pauphilet, J.: Mixed-projection conic optimization: A new paradigm for modeling rank constraints. arXiv preprint arXiv:2009.10395 (2020a)

  10. Bertsimas, D., King, A.: OR Forum - An algorithmic approach to linear regression. Oper. Res. 64(1), 2–16 (2016)


  11. Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)


  12. Bertsimas, D., Pauphilet, J., Van Parys, B., et al.: Sparse regression: scalable algorithms and empirical performance. Stat. Sci. 35(4), 555–578 (2020b)


13. Bertsimas, D., Van Parys, B.: Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. Ann. Stat. 48(1), 300–323 (2020)


  14. Bien, J., Taylor, J., Tibshirani, R.: A lasso for hierarchical interactions. Ann. Stat. 41(3), 1111 (2013)


  15. Bienstock, D., Michalka, A.: Cutting-planes for optimization of convex functions over nonconvex sets. SIAM J. Opt. 24(2), 643–677 (2014)


  16. Burer, S.: On the copositive representation of binary and continuous nonconvex quadratic programs. Math. Program. 120(2), 479–495 (2009)


  17. Burer, S., Kılınç-Karzan, F.: How to convexify the intersection of a second order cone and a nonconvex quadratic. Math. Program. 162(1–2), 393–429 (2017)


  18. Carrizosa, E., Mortensen, L., Morales, D.R.: On linear regression models with hierarchical categorical variables. Tech. rep. (2020)

  19. Ceria, S., Soares, J.: Convex programming for disjunctive convex optimization. Math. Program. 86, 595–614 (1999)


  20. Cozad, A., Sahinidis, N.V., Miller, D.C.: Learning surrogate models for simulation-based optimization. AIChE J. 60(6), 2211–2227 (2014)


  21. Cozad, A., Sahinidis, N.V., Miller, D.C.: A combined first-principles and data-driven approach to model building. Comput. Chem. Eng. 73, 116–127 (2015)


  22. Dedieu, A., Hazimeh, H., Mazumder, R.: Learning sparse classifiers: Continuous and mixed integer optimization perspectives. J. Mach. Learn. Res. 15, 1–4 (2021)


  23. Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017)

  24. Dong, H.: On integer and MPCC representability of affine sparsity. Oper. Res. Lett. 47(3), 208–212 (2019)


  25. Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176(1–2), 95–135 (2019)


  26. Dong, H., Chen, K., Linderoth, J.: Regularization vs. relaxation: A conic optimization perspective of statistical variable selection. arXiv preprint arXiv:1510.06083 (2015)

  27. Dong, H., Linderoth, J.: On valid inequalities for quadratic programming with continuous variables and binary indicators. In: Goemans, M., Correa, J. (eds.) Integer Programming and Combinatorial Optimization, pp. 169–180. Springer, Berlin (2013)

  28. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)


  29. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)


  30. Frangioni, A., Furini, F., Gentile, C.: Approximated perspective relaxations: a project and lift approach. Comput. Opt. Appl. 63(3), 705–735 (2016)


  31. Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program. 106, 225–236 (2006)


  32. Frangioni, A., Gentile, C.: SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Oper. Res. Lett. 35(2), 181–185 (2007)


  33. Frangioni, A., Gentile, C.: A computational comparison of reformulations of the perspective relaxation: SOCP vs. cutting planes. Oper. Res. Lett. 37(3), 206–210 (2009)


  34. Frangioni, A., Gentile, C., Grande, E., Pacifici, A.: Projected perspective reformulations with applications in design problems. Oper. Res. 59(5), 1225–1232 (2011)


  35. Frangioni, A., Gentile, C., Hungerford, J.: Decompositions of semidefinite matrices and the perspective reformulation of nonseparable quadratic programs. Math. Oper. Res. 45(1), 15–33 (2020)


  36. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124, 183–205 (2010)


  37. Han, S., Gómez, A., Atamtürk, A.: 2x2 convexifications for convex quadratic optimization with indicator variables. arXiv preprint arXiv:2004.07448 (2020)

38. Hardy, G.H.: A Course of Pure Mathematics. Courier Dover Publications (1908)

  39. Hastie, T., Tibshirani, R., Wainwright, M.: Statistical learning with sparsity: the lasso and generalizations. In: Monographs on Statistics and Applied Probability, vol. 143. Chapman and Hall/CRC (2015)

40. Hazimeh, H., Mazumder, R.: Learning hierarchical interactions at scale: a convex optimization approach. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 1833–1843. PMLR (2020)

  41. Hazimeh, H., Mazumder, R., Saab, A.: Sparse regression at scale: branch-and-bound rooted in first-order optimization. arXiv preprint arXiv:2004.06152 (2020)

  42. Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off” constraints. Comput. Opt. Appl. 52(2), 537–558 (2012)

43. Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27(4) (2012)

  44. Jeon, H., Linderoth, J., Miller, A.: Quadratic cone cutting surfaces for quadratic programs with on-off constraints. Dis. Opt. 24, 32–50 (2017)


  45. Kılınç-Karzan, F., Yıldız, S.: Two-term disjunctions on the second-order cone. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 345–356. Springer. (2014)

  46. Küçükyavuz, S., Shojaie, A., Manzour, H., Wei, L.: Consistent second-order conic integer programming for learning Bayesian networks. arXiv preprint arXiv:2005.14346 (2020)

  47. Manzour, H., Küçükyavuz, S., Wu, H.-H., Shojaie, A.: Integer programming for learning directed acyclic graphs from continuous data. INFORMS J. Opt. 3(1), 46–73 (2021)


  48. Miller, A.: Subset selection in regression. Chapman and Hall/CRC (2002)

  49. Modaresi, S., Kılınç, M.R., Vielma, J.P.: Intersection cuts for nonlinear integer programming: convexification techniques for structured sets. Math. Program. 155(1–2), 575–611 (2016)


  50. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)


  51. Pilanci, P., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. 151, 63–87 (2015)


  52. Richard, J.-P.P., Tawarmalani, M.: Lifting inequalities: a framework for generating strong cuts for nonlinear programs. Math. Program. 121(1), 61–104 (2010)


  53. Sato, T., Takano, Y., Miyashiro, R., Yoshise, A.: Feature subset selection for logistic regression via mixed integer optimization. Comput. Opt. Appl. 64(3), 865–880 (2016)


  54. Stubbs, R.A., Mehrotra, S.: A branch-and-cut method for 0–1 mixed convex programming. Math. Program. 86(3), 515–532 (1999)


55. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)

  56. Vielma, J.P.: Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Program. 177(1–2), 21–53 (2019)


  57. Wang, A. L., Kılınç-Karzan, F.: The generalized trust region subproblem: solution complexity and convex hull results. Forthcoming in Math. Program. (2020a)

  58. Wang, A.L., Kılınç-Karzan, F.: On convex hulls of epigraphs of QCQPs. In: Bienstock, D., Zambelli, G. (eds.) Integer Programming and Combinatorial Optimization, pp. 419–432. Cham. Springer International Publishing (2020b)

  59. Wang, A. L., Kılınç-Karzan, F.: On the tightness of SDP relaxations of QCQPs. Forthcoming in Math. Program. (2021)

  60. Wei, L., Gómez, A., Küçükyavuz, S.: On the convexification of constrained quadratic optimization problems with indicator variables. In: Bienstock, D., Zambelli, G. (eds.) Integer Programming and Combinatorial Optimization, pp. 433–447. Cham. Springer International Publishing (2020)

  61. Wu, B., Sun, X., Li, D., Zheng, X.: Quadratic convex reformulations for semicontinuous quadratic programming. SIAM J. Opt. 27(3), 1531–1553 (2017)


  62. Xie, W., Deng, X.: Scalable algorithms for the sparse ridge regression. SIAM J. Opt. 30(4), 3359–3386 (2020)


  63. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)


  64. Zheng, X., Sun, X., Li, D.: Improving the performance of MIQP solvers for quadratic programs with cardinality and minimum threshold constraints: a semidefinite program approach. INFORMS J. Comput. 26(4), 690–703 (2014)




We thank the AE and two referees whose comments expanded and improved our computational study, and also led to the result in Appendix 1.



Corresponding author

Correspondence to Andrés Gómez.


This research is supported, in part, by ONR grant N00014-19-1-2321, and NSF grants 1818700, 2006762, and 2007814.

A preliminary version of this work appeared in [60].

A The special case when \({{\,\mathrm{conv}\,}}(Q)\) is compact


In this section, we give an extended formulation of \({{\,\mathrm{cl \ conv}\,}}(Z_Q)\) based on an extended formulation of \({{\,\mathrm{conv}\,}}(Q_0)\). This alternative formulation is preferable when the number of facets of \({{\,\mathrm{conv}\,}}(Q)\) is polynomially bounded while \({{\,\mathrm{conv}\,}}(Q_0)\) has exponentially many facets. We denote by \(\{F_\ell \}_{1 \le \ell \le k}\) the facets of \({{\,\mathrm{conv}\,}}(Q)\) that do not contain zero, and we write each \(F_\ell \) as \(F_\ell := \{z \; | \; A_\ell z \le b_\ell \}\). Angulo et al. [2] prove that \({{\,\mathrm{conv}\,}}(Q_0) = {{\,\mathrm{conv}\,}}\left( \bigcup _{1 \le \ell \le k} F_\ell \right) \), and a natural extended formulation of \({{\,\mathrm{conv}\,}}(Q_0)\) is as follows:

$$\begin{aligned}&z = \sum _{\ell \in [k]} {{\hat{z}}}_\ell&\text {(32a)} \end{aligned}$$
$$\begin{aligned}&A_\ell {{\hat{z}}}_\ell \le \lambda _\ell b_\ell&\ell \in [k]&\text {(32b)} \end{aligned}$$
$$\begin{aligned}&\sum _{\ell \in [k]} \lambda _\ell = 1, \; \lambda \ge \mathbf{0 }.&\text {(32c)} \end{aligned}$$
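The disjunctive system above can be exercised numerically on a small instance. The sketch below is our own toy example (none of the data comes from the paper): it tests whether a point lies in \({{\,\mathrm{conv}\,}}(F_1 \cup F_2)\) for two segments in the plane by solving the feasibility LP with SciPy.

```python
# Membership test for conv(F_1 ∪ F_2) via the extended formulation:
#   z = zhat_1 + zhat_2,  A_l zhat_l <= lam_l b_l,  lam_1 + lam_2 = 1, lam >= 0.
# Toy instance (not from the paper):
#   F_1 = {z in R^2 : z_1 = 1, 0 <= z_2 <= 1}  (vertical segment),
#   F_2 = {z in R^2 : z_2 = 1, 0 <= z_1 <= 1}  (horizontal segment).
import numpy as np
from scipy.optimize import linprog

# Each facet F_l written as A_l z <= b_l.
A1 = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b1 = np.array([1., -1., 1., 0.])   # encodes z_1 = 1 and 0 <= z_2 <= 1
A2 = np.array([[0., 1.], [0., -1.], [1., 0.], [-1., 0.]])
b2 = np.array([1., -1., 1., 0.])   # encodes z_2 = 1 and 0 <= z_1 <= 1

def in_hull(z):
    """Feasibility LP in variables x = (zhat_1, zhat_2, lam_1, lam_2)."""
    # Equalities: zhat_1 + zhat_2 = z (2 rows), lam_1 + lam_2 = 1 (1 row).
    A_eq = np.array([[1., 0., 1., 0., 0., 0.],
                     [0., 1., 0., 1., 0., 0.],
                     [0., 0., 0., 0., 1., 1.]])
    b_eq = np.array([z[0], z[1], 1.])
    # Inequalities: A_l zhat_l - b_l lam_l <= 0 for l = 1, 2.
    A_ub = np.block([[A1, np.zeros((4, 2)), -b1[:, None], np.zeros((4, 1))],
                     [np.zeros((4, 2)), A2, np.zeros((4, 1)), -b2[:, None]]])
    b_ub = np.zeros(8)
    bounds = [(None, None)] * 4 + [(0., None)] * 2   # zhat free, lam >= 0
    res = linprog(np.zeros(6), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.status == 0   # feasible iff z is in the hull

# conv(F_1 ∪ F_2) is the triangle with vertices (1,0), (1,1), (0,1).
print(in_hull(np.array([1.0, 0.5])))    # point on F_1 -> True
print(in_hull(np.array([0.75, 0.75])))  # interior convex combination -> True
print(in_hull(np.array([0.2, 0.2])))    # outside -> False
```

The last query is infeasible because every convex combination of points of \(F_1 \cup F_2\) satisfies \(z_1 + z_2 \ge 1\), mirroring the role of the inequality \(\mathbf{1 }^\top z \ge 1\) in the proof of Theorem 4.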

Theorem 4

$$\begin{aligned}&{{\,\mathrm{cl \ conv}\,}}(Z_Q)\\&\quad = {{\,\mathrm{proj}\,}}_{(z,\beta ,t)} \Big \{(z,{{\hat{z}}}, \lambda , \beta ,t) \in {\mathbb {R}}_+^{(k+1)p + k} \times {\mathbb {R}}^{p}\times {\mathbb {R}}\; | \; (32a)-(32b),\; \sum _{\ell \in [k]} \lambda _\ell \le 1, \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad t \ge f(\mathbf{1 }^\top \beta ),\;t \ge (\mathbf{1 }^\top \lambda )f\left( \frac{\mathbf{1 }^\top \beta }{\mathbf{1 }^\top \lambda }\right) \Big \}. \end{aligned}$$

Proof. Let

$$\begin{aligned} Z \!=\!\Big \{(z,{{\hat{z}}}, \lambda , \beta ,t) \!\in \! {\mathbb {R}}_+^{(k+1)p + k} \times {\mathbb {R}}^{p}\!\times \! {\mathbb {R}}\; | \;&(32a)-(32b),\; \sum _{\ell \in [k]} \lambda _\ell \!\le \! 1, t \!\ge \! f(\mathbf{1 }^\top \beta ), \\&t \ge (\mathbf{1 }^\top \lambda )f\left( \frac{\mathbf{1 }^\top \beta }{\mathbf{1 }^\top \lambda }\right) \Big \}. \end{aligned}$$

First we show that \({{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z) \subseteq {{\,\mathrm{cl \ conv}\,}}(Z_Q)\). Given any \((z,{{\hat{z}}}, \lambda , \beta ,t)\in Z\), note that constraints \(z\in {{\,\mathrm{conv}\,}}(Q)\) and \(t \ge f(\mathbf{1 }^\top \beta )\) defining \( {{\,\mathrm{cl \ conv}\,}}(Z_Q)\) are trivially satisfied. It remains to show that \(t \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) ,\; \; \forall \pi \in {\mathcal {F}}\). For each \(\pi \in {\mathcal {F}}\), we have

$$\begin{aligned} \pi ^\top z = \sum _{\ell \in [k]} \pi ^\top {{\hat{z}}}_\ell = \sum _{\ell \in [k]} \lambda _\ell \pi ^\top \left( \frac{{{\hat{z}}}_\ell }{\lambda _\ell }\right) \ge \sum _{\ell \in [k]} \lambda _\ell , \end{aligned}$$

where the inequality follows from the fact that we must have either \(\lambda _\ell = 0\) and \({{\hat{z}}}_\ell = \mathbf{0 }\) or \(\lambda _\ell > 0\) and \(\frac{{{\hat{z}}}_\ell }{\lambda _\ell } \in F_\ell \) since each \(F_\ell \) is a polytope contained in the half-space defined by inequality \(\mathbf{1 }^\top z\ge 1\). Thus, from Lemma 1, we have \(t \ge (\mathbf{1 }^\top \lambda )f\left( \frac{\mathbf{1 }^\top \beta }{\mathbf{1 }^\top \lambda }\right) \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) ,\; \; \forall \pi \in {\mathcal {F}}\), hence \({{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z) \subseteq {{\,\mathrm{cl \ conv}\,}}(Z_Q)\).

Now, it remains to prove that \({{\,\mathrm{cl \ conv}\,}}(Z_Q) \subseteq {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). Take any \((z , \beta , t) \in {{\,\mathrm{cl \ conv}\,}}(Z_Q)\). If \(z \in {{\,\mathrm{conv}\,}}(Q_0)\), then there exist \({{\hat{z}}}_\ell \) and \(\lambda _\ell \) that satisfy (32) and \(\mathbf{1} ^\top \lambda = 1\); since \(t \ge f(\mathbf{1 }^\top \beta )\) for all \((z , \beta , t) \in {{\,\mathrm{cl \ conv}\,}}(Z_Q)\), it follows that \((z, \beta , t) \in {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). If \(z \in {{\,\mathrm{conv}\,}}(Q)\backslash {{\,\mathrm{conv}\,}}(Q_0)\), then, by Lemma 2, we can write \(z = \lambda _0 z_0\) with \(0 \le \lambda _0 < 1\), and we may assume \(z_0\) lies on one of the facets of \({{\,\mathrm{conv}\,}}(Q_0)\) defined by \({{\hat{\pi }}}^\top z_0 = 1\) for some \({{\hat{\pi }}} \in {\mathcal {F}}\). By definition, \(\pi ^\top z_0 \ge {{\hat{\pi }}}^\top z_0 = 1\) for all \(\pi \in {\mathcal {F}}\), which implies \(\lambda _0 = {{\hat{\pi }}}^\top z = \min _{\pi \in {\mathcal {F}}} \pi ^\top z\). Since \(z_0 \in {{\,\mathrm{conv}\,}}(Q_0)\), there exist \({{\hat{z}}}_\ell , \lambda _\ell \) such that \(z_0 = \sum _{\ell \in [k]} {{\hat{z}}}_\ell \) and (32b)–(32c) hold. Then

$$\begin{aligned} z =&\sum _{\ell \in [k]} (\lambda _0 {{\hat{z}}}_\ell ) \\&A_\ell (\lambda _0 {{\hat{z}}}_\ell ) \le \lambda _0\lambda _\ell b_\ell ,&\ell \in [k] \\&\sum _{\ell \in [k]} \lambda _0 \lambda _\ell \le 1, \lambda \ge \mathbf{0 }, \end{aligned}$$

and we have \(\sum _{\ell \in [k]} \lambda _0 \lambda _\ell = \lambda _0 = \min _{\pi \in {\mathcal {F}}} \pi ^\top z\). Using Lemma 1, we find that \(t \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) ,\; \; \forall \pi \in {\mathcal {F}}\) implies that \(t \ge (\sum _{\ell \in [k]} \lambda _0 \lambda _\ell ) f\left( \frac{\mathbf{1 }^\top \beta }{\sum _{\ell \in [k]} \lambda _0 \lambda _\ell }\right) \ge (\sum _{\ell \in [k]} \lambda _\ell ) f\left( \frac{\mathbf{1 }^\top \beta }{\sum _{\ell \in [k]} \lambda _\ell }\right) \). Hence, \((z, \beta , t) \in {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). \(\square \)


Wei, L., Gómez, A. & Küçükyavuz, S. Ideal formulations for constrained convex optimization problems with indicator variables. Math. Program. 192, 57–88 (2022).



  • Convexification
  • Perspective formulation
  • Indicator variables
  • Combinatorial constraints