Abstract
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain as much of the variance in the data as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector: we employ two norms for measuring variance (\(L_2\), \(L_1\)) and two sparsity-inducing norms (\(L_0\), \(L_1\)), each of which can be used in two ways (as a constraint or as a penalty). Three of our formulations, notably the one with the \(L_0\) constraint and \(L_1\) variance, have not previously been considered in the literature. We give a unifying reformulation, which we propose to solve via the alternating maximization (AM) method, and we show that AM is equivalent to GPower (Journée et al. 2010) for all formulations. In addition, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods serves three purposes: (1) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), (2) obtaining solutions that explain more variance, and (3) handling big data problems (our cluster code can solve a 357 GB problem in a minute).
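To make the AM iteration concrete, the following is a minimal, single-threaded NumPy sketch for one of the eight formulations (\(L_2\) variance, \(L_0\) constraint): the \(y\)-step sets \(y = Ax/\Vert Ax\Vert_2\), and the \(x\)-step keeps the \(s\) largest-magnitude entries of \(A^\top y\) and renormalizes. This is an illustrative sketch only; the function name am_spca_l0 and its defaults are our own, and the paper's released codes are the parallel multi-core/GPU/cluster implementations linked below.

```python
import numpy as np

def am_spca_l0(A, s, iters=200, tol=1e-8, seed=0):
    """One sparse loading vector x with ||x||_2 = 1 and at most s nonzeros.

    Alternating maximization of y^T A x over unit-norm y and
    s-sparse unit-norm x, for an m x n data matrix A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    obj_old = 0.0
    for _ in range(iters):
        # y-step: maximize y^T A x over the unit sphere => y = Ax / ||Ax||_2
        Ax = A @ x
        obj = np.linalg.norm(Ax)  # current objective value ||Ax||_2
        y = Ax / obj
        # x-step: maximize (A^T y)^T x over {||x||_2 <= 1, ||x||_0 <= s}
        # => keep the s largest-magnitude entries of A^T y, then normalize
        g = A.T @ y
        keep = np.argsort(np.abs(g))[-s:]
        x = np.zeros(n)
        x[keep] = g[keep]
        x /= np.linalg.norm(x)
        if abs(obj - obj_old) <= tol * max(obj, 1.0):
            break
        obj_old = obj
    return x

# Toy usage: a 6-sparse loading vector of a random 500 x 100 data matrix
A = np.random.default_rng(1).standard_normal((500, 100))
x = am_spca_l0(A, s=6)
print(np.count_nonzero(x), np.linalg.norm(A @ x))  # sparsity, objective
```

Each iteration of this sketch is dominated by the two matrix–vector products with \(A\) and \(A^\top\), which is the work that parallel implementations distribute.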
Notes
In the \(L_1\)-penalized formulations this can be seen from the inequality \(\Vert x\Vert_1 \le \sqrt{\Vert x\Vert_0}\,\Vert x\Vert_2\).
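For completeness, this inequality is Cauchy–Schwarz applied on the support of \(x\):
\[
\Vert x\Vert_1 = \sum_{i:\,x_i \ne 0} |x_i|\cdot 1 \;\le\; \Big(\sum_{i:\,x_i \ne 0} x_i^2\Big)^{1/2}\Big(\sum_{i:\,x_i \ne 0} 1\Big)^{1/2} = \Vert x\Vert_2\,\sqrt{\Vert x\Vert_0}.
\]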
Open-source code with efficient implementations of the algorithms developed in this paper is available at https://github.com/optml/24am.
References
Amini AA, Wainwright MJ (2009) High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann Stat 37:2877–2921
Aravkin A, Becker S (2016) Dual smoothing and value function techniques for variational matrix decomposition. In: Handbook of robust low-rank and sparse matrix decomposition: applications in image and video processing. CRC Press, Boca Raton
Bah B, Tanner J (2010) Improved bounds on restricted isometry constants for Gaussian matrices. SIAM J Matrix Anal Appl 31:2882–2898
Beck A, Vaisbourd Y (2016) The sparse principal component analysis problem: optimality conditions and algorithms. J Optim Theory Appl 170:119–143
Berk L, Bertsimas D (2019) Certifiably optimal sparse principal component analysis. Math Program Comput 11:381–420
Bouwmans T, Sobral A, Javed S, Jung SK, Zahzah EH (2017) Decomposition into low-rank plus additive matrices for background/foreground separation: a review for a comparative evaluation with a large-scale dataset. Comput Sci Rev 23:1–71
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58:Article 11
Croux C, Filzmoser P, Fritz H (2013) Robust sparse principal component analysis. Technometrics 55:202–214
d’Aspremont A, El Ghaoui L, Jordan MI, Lanckriet G (2007) A direct formulation for sparse PCA using semidefinite programming. SIAM Rev 49:434–448
d’Aspremont A, Bach F, El Ghaoui L (2008) Optimal solutions for sparse principal component analysis. J Mach Learn Res 9:1269–1294
Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the lasso and generalizations. Chapman and Hall/CRC, Boca Raton
Hubert M, Reynkens T, Schmitt E, Verdonck T (2016) Sparse PCA for high-dimensional data with outliers. Technometrics 58:424–434
Jolliffe IT (1986) Principal component analysis. Springer, New York
Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12(3):531–547
Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
Kwak N (2008) Principal component analysis based on \(l_1\) norm maximization. IEEE Trans Pattern Anal Mach Intell 30:1672–1680
Lei J, Vu VQ (2015) Sparsistency and agnostic inference in sparse PCA. Ann Stat 43:299–322
Lu Z, Zhang Y (2012) An augmented Lagrangian approach for sparse principal component analysis. Math Program Ser A 135:149–193. https://doi.org/10.1007/s10107-011-0452-4
Luss R, Teboulle M (2013) Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev 55:65–98
Mackey L (2008) Deflation methods for sparse PCA. Adv Neural Inf Process Syst 21:1017–1024
Magdon-Ismail M (2017) NP-hardness and inapproximability of sparse PCA. Inf Process Lett 126:35–38
Meng D, Zhao Q, Xu Z (2012) Improve robustness of sparse PCA by \(l_1\)-norm maximization. Pattern Recogn 45:487–497
Moghaddam B, Weiss Y, Avidan S (2006) Spectral bounds for sparse PCA: exact and greedy algorithms. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems. MIT Press, Cambridge, vol 18, pp 915–922
Qi X, Luo R, Zhao H (2013) Sparse principal component analysis by choice of norm. J Multivar Anal 114:127–160
Richtárik P (2011) Finding sparse approximations to extreme eigenvectors: generalized power method for sparse PCA and extensions. In: Proceedings of signal processing with adaptive sparse structured representations
Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J Multivar Anal 99(6):1015–1034
Trendafilov NT (2014) From simple structure to sparse components: a review. Comput Stat 29:431–454
Trendafilov NT, Jolliffe IT (2006) Projected gradient approach to the numerical solution of the SCoTLASS. Comput Stat Data Anal 50:242–253
Vu VQ, Lei J (2013) Minimax sparse principal subspace estimation in high dimensions. Ann Stat 41:2905–2947
Vu VQ, Cho J, Lei J, Rohe K (2013) Fantope projection and selection: a near-optimal convex relaxation of sparse PCA. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26. Curran Associates, Red Hook, New York, pp 2670–2678
Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10:515–534
Zhang Y, El Ghaoui L (2011) Large-scale sparse principal component analysis with application to text data. Adv Neural Inf Process Syst 24:532–539
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286
Additional information
MT was partially supported by National Science Foundation Grants CCF-1618717, CMMI-1663256 and CCF-1740796.
Cite this article
Richtárik, P., Jahani, M., Ahipaşaoğlu, S.D. et al. Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes. Optim Eng 22, 1493–1519 (2021). https://doi.org/10.1007/s11081-020-09562-3
Keywords
- Sparse PCA
- Alternating maximization
- GPower
- Big data analytics
- Unsupervised learning