Statistics and Computing, Volume 22, Issue 1, pp 301–324

Simultaneous model-based clustering and visualization in the Fisher discriminative subspace



Abstract

Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains, but it remains a difficult task in terms of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace whose intrinsic dimension is lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can adapt to various situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets demonstrate the good performance of the proposed approach compared to existing clustering methods, while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.


Keywords: High-dimensional clustering · Model-based clustering · Discriminative subspace · Fisher criterion · Visualization · Parsimonious models
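
The abstract describes the Fisher-EM algorithm only at a high level. As a rough illustration of the alternating scheme it names, the following Python sketch interleaves an F step (estimating an orthonormal discriminative subspace from a soft Fisher criterion, via the generalized eigenvectors of the between- and within-class scatter matrices) with standard E and M steps of a Gaussian mixture fitted in that subspace. This is a hypothetical simplification for intuition, not the authors' implementation: the function name fisher_em_sketch, the ridge terms, and the QR orthonormalization are assumptions, and the sketch fits a plain Gaussian mixture in the projected space rather than the full DLM model, which also accounts for the noise outside the subspace.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.stats import multivariate_normal

    def fisher_em_sketch(X, K, d, n_iter=50, seed=0):
        # d is the latent dimension; the Fisher subspace has at most K-1 dims.
        n, p = X.shape
        rng = np.random.default_rng(seed)
        T = rng.dirichlet(np.ones(K), size=n)        # n x K soft memberships
        xbar = X.mean(axis=0)
        S = (X - xbar).T @ (X - xbar)                # total scatter matrix
        for _ in range(n_iter):
            # F step: subspace maximizing a soft Fisher criterion, i.e.
            # the top generalized eigenvectors of (B, W).
            nk = T.sum(axis=0)                       # soft cluster sizes
            M = (T.T @ X) / nk[:, None]              # soft cluster means
            B = ((M - xbar).T * nk) @ (M - xbar)     # between-class scatter
            W = S - B + 1e-6 * np.eye(p)             # within-class scatter (ridged)
            _, V = eigh(B, W)                        # eigenvalues in ascending order
            U, _ = np.linalg.qr(V[:, -d:])           # orthonormal basis; QR stands
                                                     # in for the paper's Gram-Schmidt step
            # M step: mixture parameters in the projected space.
            Y = X @ U
            pis = nk / n
            mus = (T.T @ Y) / nk[:, None]
            # E step: responsibilities from the latent Gaussian densities.
            logp = np.empty((n, K))
            for k in range(K):
                Yc = Y - mus[k]
                cov = (T[:, k][:, None] * Yc).T @ Yc / nk[k] + 1e-6 * np.eye(d)
                logp[:, k] = np.log(pis[k]) + multivariate_normal(mus[k], cov).logpdf(Y)
            logp -= logp.max(axis=1, keepdims=True)  # numerical stabilization
            T = np.exp(logp)
            T /= T.sum(axis=1, keepdims=True)
        return T.argmax(axis=1), U

On a toy dataset with K=3 groups and d=2, the returned labels give the partition and X @ U gives 2-dimensional coordinates for visualization, reflecting the paper's point that clustering and a discriminative low-dimensional representation are obtained simultaneously.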


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, Paris, France
  2. IBISC, TADIB, FRE CNRS 3190, Université d’Evry Val d’Essonne, Evry Courcouronnes, France
