Mathematical Programming Computation, Volume 11, Issue 3, pp 381–420

Certifiably optimal sparse principal component analysis

  • Lauren Berk
  • Dimitris Bertsimas
Full Length Paper


This paper addresses the sparse principal component analysis (SPCA) problem for covariance matrices in dimension \(n\), aiming to find solutions with sparsity \(k\) using mixed integer optimization. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that enables us to solve SPCA to certifiable optimality in seconds for \(n\) in the 100s and \(k\) in the 10s. The same algorithm can be applied to problems with \(n\) in the 10,000s or higher to find high-quality feasible solutions in seconds, while taking several hours to prove optimality. We apply our methods to a number of real data sets to demonstrate that our approach scales to the same problem sizes attempted by other methods while providing superior solutions: it explains a higher portion of variance and permits complete control over the desired sparsity. The software that was reviewed as part of this submission has been given the DOI (digital object identifier)
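To make the problem statement concrete, the following is a minimal sketch of SPCA solved by exhaustive enumeration on a tiny instance. It is not the Optimal-SPCA branch-and-bound algorithm from the paper; it only illustrates the objective, assuming the standard formulation of maximizing \(x^\top \Sigma x\) subject to \(\Vert x\Vert_2 = 1\) and at most \(k\) nonzero entries in \(x\), which amounts to finding the \(k \times k\) principal submatrix of \(\Sigma\) with the largest top eigenvalue. The function names `top_eigenvalue` and `sparse_pca_exhaustive` and the example matrix are hypothetical, chosen for illustration.

```python
from itertools import combinations
import math


def top_eigenvalue(matrix, iters=500):
    """Approximate the largest eigenvalue of a symmetric PSD matrix
    by power iteration (illustrative; convergence slows when the two
    largest eigenvalues are close)."""
    n = len(matrix)
    # Non-uniform start vector, to reduce the chance of starting
    # orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # Zero matrix, or the iterate fell into the null space.
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the final unit vector.
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))


def sparse_pca_exhaustive(sigma, k):
    """Enumerate every support of size k and return the best
    (explained variance, support) pair. Cost grows as C(n, k),
    so this is only practical for very small n."""
    n = len(sigma)
    best_value, best_support = -math.inf, None
    for support in combinations(range(n), k):
        sub = [[sigma[i][j] for j in support] for i in support]
        value = top_eigenvalue(sub)
        if value > best_value:
            best_value, best_support = value, support
    return best_value, best_support


# Hypothetical 4x4 covariance matrix; variables 1 and 2 are correlated.
sigma = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 1.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
value, support = sparse_pca_exhaustive(sigma, k=2)
print(support, round(value, 6))
```

Enumerating supports of size exactly \(k\) suffices here because, for a positive semidefinite \(\Sigma\), enlarging a support cannot decrease the top eigenvalue of the corresponding submatrix. The number of supports grows combinatorially in \(n\) and \(k\), which is the motivation for a branch-and-bound approach that prunes supports using bounds rather than visiting all of them.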


Keywords

Sparse principal component analysis; Principal component analysis; Mixed integer optimization; Sparse eigenvalues

Mathematics Subject Classification

62H25 65F15 65K05 90C06 90C26 90C27 




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Operations Research Center, Massachusetts Institute of Technology, Cambridge, USA
