Convexification of Learning from Constraints

  • Iaroslav Shcherbatyi
  • Bjoern Andres
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9796)


Regularized empirical risk minimization with constrained labels (in contrast to fixed labels) is a remarkably general abstraction of learning. For common loss and regularization functions, this optimization problem assumes the form of a mixed integer program (MIP) whose objective function is non-convex. In this form, the problem is resistant to standard optimization techniques. We construct MIPs with the same solutions whose objective functions are convex. Specifically, we characterize the tightest convex extension of the objective function, given by the Legendre-Fenchel biconjugate. Computing values of this tightest convex extension is NP-hard. However, by applying our characterization to every function in an additive decomposition of the objective function, we obtain a class of looser convex extensions that can be computed efficiently. For some decompositions and for common loss and regularization functions, we derive a closed form.
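As a rough illustration of what a Legendre-Fenchel biconjugate does, the sketch below computes a discrete biconjugate of a one-dimensional non-convex function on a grid. It is not the characterization derived in the paper: the double-well function `f`, the grids `XS` and `SS`, and the helper names `conjugate` and `biconjugate` are all assumptions made for this example.

```python
# Hypothetical 1-D example (not from the paper): a non-convex double-well function.
def f(x):
    return min((x - 1.0) ** 2, (x + 1.0) ** 2)

# Assumed grids of points x and slopes s; both are illustrative choices.
XS = [i / 10 for i in range(-30, 31)]   # x in [-3, 3]
SS = [j / 10 for j in range(-60, 61)]   # s in [-6, 6]

def conjugate(s):
    """Discrete Legendre-Fenchel conjugate: f*(s) = max over grid x of (s*x - f(x))."""
    return max(s * x - f(x) for x in XS)

# Precompute f* on the slope grid so the biconjugate does not recompute it.
F_STAR = {s: conjugate(s) for s in SS}

def biconjugate(x):
    """Discrete biconjugate: f**(x) = max over grid s of (s*x - f*(s))."""
    return max(s * x - F_STAR[s] for s in SS)

if __name__ == "__main__":
    # Compare f with its discrete biconjugate at a few points.
    for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
        print(f"x={x:+.1f}  f={f(x):.3f}  f**={biconjugate(x):.3f}")
```

On the grid, the biconjugate never exceeds `f` and fills in the non-convex region between the two wells, which is the sense in which it acts as a convex lower bound. Evaluating it this way requires a search over all grid points, which hints at why cheaper, looser extensions built from an additive decomposition are attractive.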



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Saarland University, Saarbrücken, Germany
