Mathematical Programming, Volume 169, Issue 1, pp 141–176

DC formulations and algorithms for sparse optimization problems

Full Length Paper Series B


We propose a DC (Difference of two Convex functions) formulation approach for sparse optimization problems with a cardinality or rank constraint. Using the largest-k norm, we provide an exact DC representation of the cardinality constraint. We then transform the cardinality-constrained problem into a penalty function form and derive exact penalty parameter values for some optimization problems, in particular for quadratic minimization problems, which often appear in practice. A DC Algorithm (DCA) is presented, in which the dual step at each iteration can be carried out efficiently because a subgradient of the largest-k norm is readily available. Furthermore, if there are no additional constraints, each DCA subproblem can be solved in linear time via a soft thresholding operation. The framework is extended to the rank-constrained problem as well as to the cardinality- and rank-minimization problems. Numerical experiments demonstrate the efficiency of the proposed DCA in comparison with existing methods that use other penalty terms.
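As an illustration of the approach summarized above, the following is a minimal sketch, not the authors' implementation. It assumes a least-squares objective f(x) = 0.5*||Ax - b||^2 and the penalty rho*(||x||_1 - |||x|||_k), where |||x|||_k is the largest-k norm (the sum of the k largest absolute entries); the penalty is zero exactly when x has at most k nonzeros. The DC split used here, g(x) = (L/2)||x||^2 + rho*||x||_1 and h(x) = (L/2)||x||^2 - f(x) + rho*|||x|||_k with L the largest eigenvalue of A^T A, is an assumption chosen so that each subproblem reduces to soft thresholding. All function names are hypothetical.

```python
import numpy as np


def largest_k_norm(x, k):
    """Largest-k norm: sum of the k largest absolute entries (1 <= k <= len(x))."""
    a = np.abs(np.asarray(x, dtype=float))
    n = a.size
    # np.partition puts the k largest values in the last k positions.
    return float(np.sum(np.partition(a, n - k)[n - k:]))


def largest_k_subgradient(x, k):
    """One subgradient of the largest-k norm: sign(x_i) on k largest-magnitude indices."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.zeros(n)
    idx = np.argpartition(np.abs(x), n - k)[n - k:]
    s[idx] = np.sign(x[idx])
    return s


def soft_threshold(v, tau):
    """Elementwise soft thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def dca_sparse_least_squares(A, b, k, rho, max_iter=500, tol=1e-8):
    """DCA sketch for min 0.5*||Ax-b||^2 + rho*(||x||_1 - |||x|||_k).

    Assumes A is nonzero so that L > 0.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    L = np.linalg.norm(A, 2) ** 2  # largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)
        # Dual step: a subgradient of h at the current iterate.
        s = L * x - grad + rho * largest_k_subgradient(x, k)
        # Primal step: minimize g(x) - s^T x, solved in closed form.
        x_new = soft_threshold(s / L, rho / L)
        done = np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x))
        x = x_new
        if done:
            break
    return x
```

For example, calling `dca_sparse_least_squares(np.eye(4), np.array([3.0, -2.0, 0.1, 0.0]), k=2, rho=1.0)` keeps the two largest-magnitude entries of b and sets the others to zero. The value of rho here is an arbitrary choice for illustration; the exact penalty parameter values derived in the paper are not reproduced in this sketch.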


Keywords

Sparse optimization · Cardinality constraint · Rank constraint · DCA · Largest-k norm · Ky Fan k norm · Proximal operation

Mathematics Subject Classification

47A30 90C20 90C26 90C90 



The research of the first author was supported by JSPS KAKENHI Grant Numbers 15K01204, 22510138, and 26242027. The research of the second author was supported by JST CREST Grant Number JPMJCR15K5, Japan. The authors are very grateful to the reviewers, whose comments enabled us to improve the readability of the paper.



Copyright information

© Springer-Verlag GmbH Germany and Mathematical Optimization Society 2017

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, Chuo University, Tokyo, Japan
  2. Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, Tokyo, Japan
  3. RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  4. Data Science Research Laboratories, NEC Corporation, Kanagawa, Japan
