Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes

Abstract

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain as much of the variance in the data as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), each of which can be used either as a constraint or as a penalty. Three of our formulations, notably the one with the L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via the alternating maximization (AM) method. We show that AM is equivalent to GPower for all formulations. In addition, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods is aimed at (1) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), (2) obtaining solutions explaining more variance, and (3) dealing with big data problems (our cluster code can solve a 357 GB problem in a minute).
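
To make the AM iteration concrete, below is a minimal NumPy sketch of the alternating updates for one of the eight formulations: L2 variance with an L1 penalty, computing a single sparse loading vector. This is an illustrative sketch, not the paper's 24am implementation; the function name, initialization, tolerance, and stopping rule are assumptions made here for the example.

```python
import numpy as np

def am_spca_l1_penalty(A, gamma, max_iter=200, tol=1e-6):
    """Illustrative sketch of alternating maximization (AM) for the
    single-unit SPCA formulation with L2 variance and L1 penalty:
        max over ||x||_2 = 1, ||y||_2 <= 1 of  x^T A y - gamma * ||y||_1,
    where A is a p-by-n data matrix with columns a_1, ..., a_n."""
    p, n = A.shape
    # Initialize x as the normalized column of A with the largest norm.
    j = int(np.argmax(np.linalg.norm(A, axis=0)))
    x = A[:, j] / np.linalg.norm(A[:, j])
    y = np.zeros(n)
    f_old = -np.inf
    for _ in range(max_iter):
        # y-step: for fixed x, the maximizer over the unit ball is the
        # soft-thresholded correlation vector A^T x, rescaled to unit length.
        c = A.T @ x
        s = np.sign(c) * np.maximum(np.abs(c) - gamma, 0.0)
        f = np.linalg.norm(s)  # best objective value attainable for this x
        if f == 0.0:
            return y  # gamma too large: only the zero loading survives
        y = s / f
        # x-step: for fixed y, the maximizer over the unit sphere is A y / ||A y||.
        x = A @ y
        x /= np.linalg.norm(x)
        # The value f is nondecreasing across iterations; stop when it stalls.
        if f - f_old <= tol * max(f, 1.0):
            break
        f_old = f
    return y  # sparse loading vector with unit L2 norm
```

In this sketch each round costs essentially two matrix-vector products with A, which is the kind of kernel that lends itself to the multi-core, GPU, and cluster parallelism discussed in the paper.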



Notes

  1. In the \(L_1\)-penalized formulations this can be seen from the inequality \(\Vert x\Vert_1 \le \sqrt{\Vert x\Vert_0}\,\Vert x\Vert_2\); see the short derivation after these notes.

  2. Open source code with efficient implementations of the algorithms developed in this paper is published at https://github.com/optml/24am.

  3. https://www.kaggle.com/kasikrit/att-database-of-faces/data.

  4. https://github.com/optml/24am.

  5. http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.

  6. The colors in Tables 8 and 9 correspond to the formulations marked with the same color in Table 2.

  7. http://software.intel.com/en-us/articles/intel-mkl/.

  8. https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2.

  9. http://developer.nvidia.com/cublas.
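
For completeness, the inequality used in Note 1 is a direct consequence of the Cauchy-Schwarz inequality applied on the support of \(x\):

\[
\Vert x\Vert_1 \;=\; \sum_{i:\,x_i \neq 0} 1\cdot |x_i|
\;\le\; \Big(\sum_{i:\,x_i \neq 0} 1\Big)^{1/2}\Big(\sum_i x_i^2\Big)^{1/2}
\;=\; \sqrt{\Vert x\Vert_0}\,\Vert x\Vert_2.
\]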


Author information

Corresponding author

Correspondence to Martin Takáč.

Additional information

MT was partially supported by National Science Foundation Grants CCF-1618717, CMMI-1663256 and CCF-1740796.


Cite this article

Richtárik, P., Jahani, M., Ahipaşaoğlu, S.D. et al. Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes. Optim Eng (2020). https://doi.org/10.1007/s11081-020-09562-3


Keywords

  • Sparse PCA
  • Alternating maximization
  • GPower
  • Big data analytics
  • Unsupervised learning