Abstract
In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, that explain a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique, and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction to and motivation for sparse PCA, and detail our implementation of the algorithm in d’Aspremont et al. (SIAM Rev 49(3):434–448, 2007). We then apply these results to some classic clustering and feature selection problems arising in biology.
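For readers who want a concrete starting point, the following is a minimal Python sketch of the l1-penalized semidefinite relaxation behind this approach (d’Aspremont et al. 2007): maximize Tr(Sigma X) - rho * sum|X_ij| subject to Tr(X) = 1 and X positive semidefinite. It is an illustration only, not the paper's implementation; it hands the relaxation to the off-the-shelf cvxpy modeling package and a generic SDP solver rather than the smooth first-order method of Nesterov (2005) used in the paper, and the toy data, problem sizes, and penalty value rho = 0.5 are assumptions chosen for the example.

    import numpy as np
    import cvxpy as cp  # generic convex-optimization front end (assumed available)

    np.random.seed(0)
    A = np.random.randn(20, 5)      # toy data: 20 samples, 5 variables (illustrative)
    Sigma = A.T @ A / 20.0          # sample covariance matrix
    rho = 0.5                       # l1 penalty controlling sparsity (assumed value)

    # Semidefinite relaxation: max Tr(Sigma X) - rho * sum of |X_ij|
    # subject to Tr(X) = 1 and X positive semidefinite.
    X = cp.Variable((5, 5), symmetric=True)
    objective = cp.Maximize(cp.trace(Sigma @ X) - rho * cp.sum(cp.abs(X)))
    problem = cp.Problem(objective, [X >> 0, cp.trace(X) == 1])
    problem.solve()

    # The leading eigenvector of the optimal X approximates the sparse factor;
    # near-zero entries indicate variables dropped from the component.
    eigvals, eigvecs = np.linalg.eigh(X.value)
    print(np.round(eigvecs[:, -1], 3))

Increasing rho drives more loadings to (near) zero, trading explained variance for interpretability, which is the trade-off exploited in the clustering and gene selection experiments.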
References
Alizadeh A, Eisen M, Davis R, Ma C, Lossos I, Rosenwald A (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503–511
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750
Cadima J, Jolliffe IT (1995) Loadings and correlations in the interpretation of principal components. J Appl Stat 22:203–214
Candès EJ, Tao T (2005) Decoding by linear programming. IEEE Trans Inf Theory 51(12):4203–4215
d’Aspremont A (2005) Smooth optimization with approximate gradient. arXiv:math.OC/0512344
d’Aspremont A, El Ghaoui L, Jordan MI, Lanckriet GRG (2007) A direct formulation for sparse PCA using semidefinite programming. SIAM Rev 49(3):434–448
Donoho DL, Tanner J (2005) Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc Natl Acad Sci 102(27):9446–9451
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
Huang TM, Kecman V (2005) Gene extraction for cancer diagnosis by support vector machines-an improvement. Artif Intell Med 35:185–194
Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12:531–547
Moghaddam B, Weiss Y, Avidan S (2006a) Generalized spectral bounds for sparse LDA. In: International conference on machine learning
Moghaddam B, Weiss Y, Avidan S (2006b) Spectral bounds for sparse PCA: exact and greedy algorithms. Adv Neural Inf Process Syst 18
Moler C, Van Loan C (2003) Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev 45(1):3–49
Nesterov Y (1983) A method of solving a convex programming problem with convergence rate O(1/k²). Sov Math Dokl 27(2):372–376
Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103(1):127–152
Pataki G (1998) On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math Oper Res 23(2):339–358
Srebro N, Shakhnarovich G, Roweis S (2006) An investigation of computational and informational limits in Gaussian mixture clustering. In: Proceedings of the 23rd international conference on machine learning, pp 865–872
Su Y, Murali TM, Pavlovic V, Schaffer M, Kasif S (2003) RankGene: identification of diagnostic genes based on expression data. Bioinformatics 19:1578–1579
Sturm JF (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11:625–653
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58(1):267–288
Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin
Zhang Z, Zha H, Simon H (2002) Low rank approximations with sparse factors I: basic algorithms and error analysis. SIAM J Matrix Anal Appl 23(3):706–727
Zhang Z, Zha H, Simon H (2004) Low rank approximations with sparse factors II: penalized methods with discrete Newton-like iterations. SIAM J Matrix Anal Appl 25(4):901–920
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 67(2):301–320
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286
Cite this article
Luss, R., d’Aspremont, A. Clustering and feature selection using sparse principal component analysis. Optim Eng 11, 145–157 (2010). https://doi.org/10.1007/s11081-008-9057-z