
Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes

Abstract

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain as much of the variance in the data as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two ways (constraint, penalty). Three of our formulations, notably the one with the L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation, which we propose to solve via the alternating maximization (AM) method. We show that AM is equivalent to GPower for all formulations. In addition, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods is aimed at (1) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), (2) obtaining solutions that explain more variance, and (3) dealing with big data problems (our cluster code can solve a 357 GB problem in a minute).
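The paper's 24am codes are written in C++; the block below is a minimal, self-contained C++ sketch (not the authors' implementation) of the alternating maximization iteration for just one of the eight formulations, the L0-constrained problem with L2 variance: maximize \(\Vert Ax\Vert_2\) over unit-norm \(x\) with at most \(s\) nonzeros. The AM reformulation maximizes \(y^\top A x\) over unit-norm \(y\) and feasible \(x\); each subproblem has a closed-form solution (a normalization step for \(y\), a hard-thresholding step for \(x\)), which coincides with a GPower-type update. The data matrix, dimensions, sparsity level and iteration count below are illustrative assumptions, not values from the paper.

```cpp
// Minimal AM/GPower-style sketch for L0-constrained SPCA with L2 variance.
// Assumed/illustrative: random data A (n x p), sparsity s, fixed iteration count.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int n = 200, p = 50, s = 5;   // samples, variables, target sparsity (assumed)
    std::mt19937 gen(42);
    std::normal_distribution<double> gauss(0.0, 1.0);

    // Random data matrix A, stored row-major.
    std::vector<double> A(n * p);
    for (double& a : A) a = gauss(gen);

    // Random unit-norm starting loading vector x.
    std::vector<double> x(p), y(n), w(p);
    double nrm = 0.0;
    for (double& xi : x) { xi = gauss(gen); nrm += xi * xi; }
    for (double& xi : x) xi /= std::sqrt(nrm);

    for (int iter = 0; iter < 100; ++iter) {
        // y-step: y = A x / ||A x||_2 (closed form for fixed x).
        double ynrm = 0.0;
        for (int i = 0; i < n; ++i) {
            double dot = 0.0;
            for (int j = 0; j < p; ++j) dot += A[i * p + j] * x[j];
            y[i] = dot; ynrm += dot * dot;
        }
        ynrm = std::sqrt(ynrm);
        for (double& yi : y) yi /= ynrm;

        // x-step: w = A^T y, keep the s largest-magnitude entries, renormalize.
        for (int j = 0; j < p; ++j) {
            double dot = 0.0;
            for (int i = 0; i < n; ++i) dot += A[i * p + j] * y[i];
            w[j] = dot;
        }
        std::vector<double> mag(p);
        for (int j = 0; j < p; ++j) mag[j] = std::fabs(w[j]);
        std::nth_element(mag.begin(), mag.begin() + (p - s), mag.end());
        const double thresh = mag[p - s];   // s-th largest magnitude
        double xnrm = 0.0;
        for (int j = 0; j < p; ++j) {
            x[j] = (std::fabs(w[j]) >= thresh) ? w[j] : 0.0;
            xnrm += x[j] * x[j];
        }
        xnrm = std::sqrt(xnrm);
        for (double& xi : x) xi /= xnrm;
    }

    // Explained variance of the resulting sparse component: ||A x||_2^2.
    double var = 0.0;
    for (int i = 0; i < n; ++i) {
        double dot = 0.0;
        for (int j = 0; j < p; ++j) dot += A[i * p + j] * x[j];
        var += dot * dot;
    }
    std::printf("explained variance ||Ax||^2 = %.4f\n", var);
    return 0;
}
```

The other seven formulations change only the two closed-form steps (e.g., soft-thresholding for the L1 penalty, or a sign step when variance is measured in the L1 norm); the overall alternation is unchanged.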



Notes

  1. In the \(L_1\)-penalized formulations this can be seen from the inequality \(\Vert x\Vert_1 \le \sqrt{\Vert x\Vert_0}\,\Vert x\Vert_2\); a short derivation is sketched after these notes.

  2. Open source code with efficient implementations of the algorithms developed in this paper is published here: https://github.com/optml/24am.

  3. https://www.kaggle.com/kasikrit/att-database-of-faces/data.

  4. https://github.com/optml/24am.

  5. http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.

  6. Note that the different colors in Tables 8 and 9 correspond to the formulations with the same color in Table 2.

  7. http://software.intel.com/en-us/articles/intel-mkl/.

  8. https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2.

  9. http://developer.nvidia.com/cublas.
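
The inequality in Note 1 is stated without proof; the following short derivation (our addition, using only the Cauchy–Schwarz inequality and the definitions of the norms) fills that step. Write \(S = \{ i : x_i \ne 0 \}\), so that \(|S| = \Vert x\Vert_0\). Then

\[
\Vert x\Vert_1 \;=\; \sum_{i \in S} |x_i| \cdot 1 \;\le\; \Big(\sum_{i \in S} x_i^2\Big)^{1/2} \Big(\sum_{i \in S} 1^2\Big)^{1/2} \;=\; \sqrt{\Vert x\Vert_0}\,\Vert x\Vert_2 .
\]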


Author information

Correspondence to Martin Takáč.

Additional information

MT was partially supported by National Science Foundation Grants CCF-1618717, CMMI-1663256 and CCF-1740796.


About this article


Cite this article

Richtárik, P., Jahani, M., Ahipaşaoğlu, S.D. et al. Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes. Optim Eng 22, 1493–1519 (2021). https://doi.org/10.1007/s11081-020-09562-3


Keywords

  • Sparse PCA
  • Alternating maximization
  • GPower
  • Big data analytics
  • Unsupervised learning