Clustering and feature selection using sparse principal component analysis


In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, that is, linear combinations of the data variables that explain a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique, and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction to and motivation for sparse PCA, and detail our implementation of the algorithm in d’Aspremont et al. (SIAM Rev. 49(3):434–448, 2007). We then apply these results to some classic clustering and feature selection problems arising in biology.
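To make the sparsity constraint concrete, the following is a minimal Python sketch of a sparse leading principal component computed by truncated power iteration on a covariance matrix. This is an illustrative stand-in only, not the semidefinite relaxation of d’Aspremont et al. that the paper implements; all function and variable names here are ours.

```python
import numpy as np

def sparse_pc(A, k, n_iter=100):
    """Leading sparse principal component of a covariance matrix A with at
    most k nonzero loadings, via simple truncated power iteration.
    Illustrative sketch only: NOT the semidefinite relaxation (DSPCA)
    of d'Aspremont et al. used in the paper."""
    # initialize with the dense leading eigenvector of A
    x = np.linalg.eigh(A)[1][:, -1]
    for _ in range(n_iter):
        y = A @ x
        y[np.argsort(np.abs(y))[:-k]] = 0.0  # keep the k largest-magnitude coords
        x = y / np.linalg.norm(y)
    return x  # sparse unit vector; x @ A @ x is the variance it explains

# toy data: two correlated informative variables plus three noise variables
rng = np.random.default_rng(1)
z = rng.standard_normal((200, 1))
X = np.hstack([z, z, 0.1 * rng.standard_normal((200, 3))])
v = sparse_pc(np.cov(X, rowvar=False), k=2)
print(np.nonzero(v)[0])  # loadings concentrate on the informative variables
```

On this toy example the sparse factor puts all its weight on the two informative variables, which is exactly the interpretability property the paper exploits for clustering and feature selection.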


  1. Alizadeh A, Eisen M, Davis R, Ma C, Lossos I, Rosenwald A (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503–511

  2. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750

  3. Cadima J, Jolliffe IT (1995) Loadings and correlations in the interpretation of principal components. J Appl Stat 22:203–214

  4. Candès EJ, Tao T (2005) Decoding by linear programming. IEEE Trans Inf Theory 51(12):4203–4215

  5. d’Aspremont A (2005) Smooth optimization with approximate gradient. ArXiv:math.OC/0512344

  6. d’Aspremont A, El Ghaoui L, Jordan MI, Lanckriet GRG (2007) A direct formulation for sparse PCA using semidefinite programming. SIAM Rev 49(3):434–448

  7. Donoho DL, Tanner J (2005) Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc Natl Acad Sci 102(27):9446–9451

  8. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422

  9. Huang TM, Kecman V (2005) Gene extraction for cancer diagnosis by support vector machines-an improvement. Artif Intell Med 35:185–194

  10. Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12:531–547

  11. Moler C, Van Loan C (2003) Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev 45(1):3–49

  12. Moghaddam B, Weiss Y, Avidan S (2006a) Generalized spectral bounds for sparse LDA. In: International conference on machine learning

  13. Moghaddam B, Weiss Y, Avidan S (2006b) Spectral bounds for sparse PCA: exact and greedy algorithms. Adv Neural Inf Process Syst 18

  14. Nesterov Y (1983) A method of solving a convex programming problem with convergence rate O(1/k^2). Sov Math Dokl 27(2):372–376

  15. Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103(1):127–152

  16. Pataki G (1998) On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math Oper Res 23(2):339–358

  17. Su Y, Murali TM, Pavlovic V, Schaffer M, Kasif S (2003) Rankgene: identification of diagnostic genes based on expression data. Bioinformatics 19:1578–1579

  18. Srebro N, Shakhnarovich G, Roweis S (2006) An investigation of computational and informational limits in Gaussian mixture clustering. In: Proceedings of the 23rd international conference on machine learning, pp 865–872

  19. Sturm J (1999) Using SeDuMi 1.0x, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11:625–653

  20. Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58(1):267–288

  21. Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin

  22. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 67(2):301–320

  23. Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286

  24. Zhang Z, Zha H, Simon H (2002) Low rank approximations with sparse factors I: basic algorithms and error analysis. SIAM J Matrix Anal Appl 23(3):706–727

  25. Zhang Z, Zha H, Simon H (2004) Low rank approximations with sparse factors II: penalized methods with discrete Newton-like iterations. SIAM J Matrix Anal Appl 25(4):901–920

Author information



Corresponding author

Correspondence to Alexandre d’Aspremont.

Cite this article

Luss, R., d’Aspremont, A. Clustering and feature selection using sparse principal component analysis. Optim Eng 11, 145–157 (2010).


Keywords

  • Sparse principal component analysis
  • Semidefinite programming
  • Clustering
  • Feature selection