Computational Statistics, Volume 29, Issue 3–4, pp 489–513

Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

  • Charles Bouveyron
  • Camille Brunet-Saumard
Original Paper


Interest in variable selection for clustering has increased recently, driven by the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering itself and the interpretation of the results. Existing approaches have demonstrated the importance of variable selection for clustering, but they turn out to be either very time-consuming or not sparse enough in high-dimensional spaces. This work proposes to select the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method was recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model that fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation (loading) matrix of the discriminative subspace through \(\ell _{1}\)-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the benefits of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.
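The core idea, that zero rows in the loading matrix drop the corresponding original variables, can be illustrated with a minimal NumPy sketch. It is not one of the paper's three approaches. The function names `fisher_directions`, `soft_threshold` and `selected_variables`, the threshold `lam`, the small ridge term and the toy data are all hypothetical. The direction step is a simplified stand-in for the Fisher-EM F-step: it uses a hard partition instead of E-step posteriors and does not enforce orthonormality. Sparsity comes from entrywise soft-thresholding, the proximal operator of an \(\ell _{1}\) penalty.

```python
import numpy as np


def fisher_directions(X, labels, d):
    """Top-d solutions of the generalized eigenproblem S_B u = lambda S_W u.

    Hypothetical, simplified stand-in for the Fisher-EM F-step: it uses a hard
    partition and does not enforce the orthonormality constraint.
    """
    Xc = X - X.mean(axis=0)               # center on the global mean
    p = X.shape[1]
    S_W = np.zeros((p, p))                # within-cluster scatter
    S_B = np.zeros((p, p))                # between-cluster scatter
    for k in np.unique(labels):
        Xk = Xc[labels == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)
        S_B += len(Xk) * np.outer(mk, mk)
    S_W += 1e-6 * np.eye(p)               # assumed small ridge so the Cholesky factor exists
    L = np.linalg.cholesky(S_W)           # S_W = L L^T
    L_inv = np.linalg.inv(L)
    _, W = np.linalg.eigh(L_inv @ S_B @ L_inv.T)  # eigenvalues in ascending order
    U = L_inv.T @ W[:, ::-1][:, :d]       # map the d leading eigenvectors back
    return U / np.linalg.norm(U, axis=0)  # unit-norm columns


def soft_threshold(U, lam):
    """Entrywise proximal operator of lam * ||U||_1 (lasso-style shrinkage)."""
    return np.sign(U) * np.maximum(np.abs(U) - lam, 0.0)


def selected_variables(U_sparse):
    """Indices of original variables with a nonzero loading on some axis."""
    return np.flatnonzero(np.any(U_sparse != 0, axis=1))


# Toy data: two clusters that differ only in variables 0 and 1; variables 2-9 are noise.
rng = np.random.default_rng(0)
n, p = 200, 10
shift = np.zeros(p)
shift[:2] = 4.0
X = np.vstack([rng.normal(size=(n, p)), rng.normal(size=(n, p)) + shift])
labels = np.repeat([0, 1], n)

U = fisher_directions(X, labels, d=1)     # d <= K - 1 = 1 discriminative axis
U_sparse = soft_threshold(U, lam=0.3)
print(selected_variables(U_sparse))
```

On this toy data the sparse axis should keep only the two shifted variables. A variable is discarded when its entire row of the loading matrix is zero. In an EM-style loop, the hard labels would be replaced by the current cluster posteriors.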


Model-based clustering · Variable selection · Discriminative subspace · Fisher-EM algorithm · \(\ell _{1}\)-type penalizations



The authors warmly thank Cathy Maugis for providing the results of Selvarclust on the zoo, glass, satimage and usps358 data sets.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, Paris, France
  2. Laboratoire LAREMA, UMR CNRS 6093, Université d’Angers, Angers, France
