Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

Bouveyron, Charles; Brunet, Camille

doi:10.1007/s11222-011-9249-9

Published: 13 April 2011

Volume 22, pages 301–324, (2012)

Statistics and Computing

Abstract

Clustering in high-dimensional spaces is now a recurrent problem in many scientific domains, but it remains a difficult task in terms of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace whose intrinsic dimension is lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can accommodate a variety of situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets highlight the good performance of the proposed approach compared with existing clustering methods, while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.
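
The abstract only names the ingredients of the approach, so the following is a minimal sketch of the general idea rather than the authors' Fisher-EM algorithm. It alternates between estimating an orthonormal discriminative subspace from the current soft partition (a Fisher-type criterion) and fitting a Gaussian mixture to the projected data. The function name `fisher_em_sketch`, the `ridge` regulariser, the crude initialisation, and the use of QR for orthonormalisation are all assumptions made for illustration. The sketch also ignores the part of the data outside the subspace and the 12 constrained submodels, and it assumes d is at most K - 1.

```python
import numpy as np

def fisher_em_sketch(X, K, d, n_iter=20, ridge=1e-6, seed=0):
    """Illustrative alternation, NOT the published Fisher-EM algorithm."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Crude initialisation (assumption): hard-assign to K randomly chosen points.
    centers = X[rng.choice(n, size=K, replace=False)]
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    T = np.eye(K)[dist.argmin(axis=1)]            # n x K responsibilities
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n + ridge * np.eye(p)         # regularised total covariance
    L = np.linalg.cholesky(S)
    for _ in range(n_iter):
        nk = T.sum(axis=0) + 1e-12                # soft cluster sizes
        # Subspace step: maximise a Fisher-type criterion for the soft partition.
        M = (T.T @ Xc) / nk[:, None]              # K x p centred soft means
        SB = (M * nk[:, None]).T @ M / n          # soft between-cluster scatter
        A = np.linalg.solve(L, SB)
        A = np.linalg.solve(L, A.T).T             # L^{-1} SB L^{-T}
        _, V = np.linalg.eigh((A + A.T) / 2)      # eigenvalues in ascending order
        W = np.linalg.solve(L.T, V[:, -d:])       # back-transformed top-d directions
        U, _ = np.linalg.qr(W)                    # p x d, orthonormal columns
        # Mixture step: Gaussian parameters in the d-dimensional subspace.
        Y = X @ U
        logp = np.empty((n, K))
        for k in range(K):
            mu = T[:, k] @ Y / nk[k]
            R = Y - mu
            Sig = (R * T[:, k][:, None]).T @ R / nk[k] + ridge * np.eye(d)
            _, logdet = np.linalg.slogdet(Sig)
            maha = np.einsum("ij,ij->i", R @ np.linalg.inv(Sig), R)
            logp[:, k] = np.log(nk[k] / n) - 0.5 * (logdet + maha + d * np.log(2 * np.pi))
        # Responsibility step: normalise the posteriors in a numerically stable way.
        logp -= logp.max(axis=1, keepdims=True)
        T = np.exp(logp)
        T /= T.sum(axis=1, keepdims=True)
    return T, U
```

Calling `T, U = fisher_em_sketch(X, K=2, d=1)` on an n x p array returns the soft cluster memberships and a basis of the estimated subspace; `X @ U` then gives the low-dimensional coordinates that can be plotted alongside the clustering.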

Author information

Authors and Affiliations

Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, 90 rue de Tolbiac, 75013, Paris, France
Charles Bouveyron
IBISC, TADIB, FRE CNRS 3190, Université d’Evry Val d’Essonne, 40 rue de Pelvoux, 91020, Evry Courcouronnes, France
Camille Brunet

Corresponding author

Correspondence to Camille Brunet.

About this article

Cite this article

Bouveyron, C., Brunet, C. Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Stat Comput 22, 301–324 (2012). https://doi.org/10.1007/s11222-011-9249-9

Received: 15 June 2010
Accepted: 16 March 2011
Published: 13 April 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11222-011-9249-9
