
Graphical tools for model-based mixture discriminant analysis

  • Regular Article
  • Published in: Advances in Data Analysis and Classification

Abstract

The paper introduces a methodology for visualizing, on a dimension-reduced subspace, the classification structure and the geometric characteristics induced by an estimated Gaussian mixture model for discriminant analysis. In particular, we consider the case of mixtures of mixture models with varying parametrizations, which allow for parsimonious models. The approach extends existing work on dimension reduction for model-based clustering based on Gaussian mixtures. Information on the dimension reduction subspace is provided by the variation in class locations and, depending on the estimated mixture model, by the variation in class dispersions. Projections along the estimated directions provide summary plots that help to visualize the structure of the classes and their characteristics. A suitable modification of the method allows us to recover the most discriminant directions, i.e., those that show maximal separation among classes. The approach is illustrated using simulated and real data.
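In practice, summary plots of this kind can be produced with the MCLUST software for R cited in the references. The following is a minimal sketch, not the author's own code: it assumes the MclustDA() and MclustDR() interfaces of the mclust package (version 4 or later) and uses the iris data purely for illustration; function and argument names should be checked against the installed version.

library(mclust)   # MCLUST version >= 4 (Fraley et al. 2012)

X   <- iris[, 1:4]      # predictors
cls <- iris$Species     # known class labels

## Gaussian mixture discriminant analysis; "EDDA" fits one component per
## class with a common covariance parametrization selected by BIC
mod <- MclustDA(X, cls, modelType = "EDDA")

## Dimension reduction directions estimated from the fitted model
dr <- MclustDR(mod)
summary(dr)                      # estimated basis and eigenvalues
plot(dr, what = "scatterplot")   # data projected on the leading directions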




References

  • Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57:289–300

  • Bensmail H, Celeux G (1996) Regularized Gaussian discriminant analysis through eigenvalue decomposition. J Am Stat Assoc 91:1743–1748

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Bouveyron C, Brunet-Saumard C (2013) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal. doi:10.1016/j.csda.2012.12.008

  • Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth, New York

  • Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793

  • Chen CH, Li KC (2001) Generalization of Fisher's linear discriminant analysis via the approach of sliced inverse regression. J Korean Stat Soc 30:193–217

  • Cook RD, Forzani L (2009) Likelihood-based sufficient dimension reduction. J Am Stat Assoc 104(485):197–208

  • Cook RD, Weisberg S (1991) Discussion of Li (1991). J Am Stat Assoc 86:328–332

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B (Methodological) 39:1–38

  • Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87

  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188

  • Flury B, Riedwyl H (1988) Multivariate statistics: a practical approach. Chapman & Hall Ltd., London

  • Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631

  • Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington

  • Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84:165–175

  • Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing JR, Caligiuri M, Bloomfield C, Lander E (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

  • Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B (Statistical Methodology) 58(1):155–176

  • Hastie T, Tibshirani R, Buja A (1994) Flexible discriminant analysis by optimal scoring. J Am Stat Assoc 89:1255–1270

  • Hastie T, Buja A, Tibshirani R (1995) Penalized discriminant analysis. Ann Stat 23:73–102

  • Hennig C (2004) Asymmetric linear dimension reduction for classification. J Comput Graph Stat 13(4):930–945

  • Kent JT (1991) Discussion of Li (1991). J Am Stat Assoc 86:336–337

  • Li KC (1991) Sliced inverse regression for dimension reduction (with discussion). J Am Stat Assoc 86:316–342

  • Mardia K, Kent J, Bibby J (1979) Multivariate analysis. Academic Press, London

  • Maugis C, Celeux G, Martin-Magniette ML (2009) Variable selection for clustering with Gaussian mixture models. Biometrics 65(3):701–709

  • Pardoe I, Yin X, Cook RD (2007) Graphical tools for quadratic discriminant analysis. Technometrics 49(2):172–183

  • R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484. doi:10.1007/s11222-009-9138-7

  • Velilla S (2008) A method for dimension reduction in quadratic classification problems. J Comput Graph Stat 17(3):572–589

  • Velilla S (2010) On the structure of the quadratic subspace in discriminant analysis. J Multivariate Anal 101(5):1239–1251

  • Zhu M, Hastie TJ (2003) Feature extraction for nonparametric discriminant analysis. J Comput Graph Stat 12(1):101–120


Acknowledgments

The author is grateful to the Associate Editor and two referees for their inspiring comments and recommendations that led to a substantial improvement of the manuscript. This work was partly supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development through grant R01 HD070936.

Author information


Correspondence to Luca Scrucca.

Appendix


1.1 Proof of Proposition 1

Assume an EDDA mixture model with a common full class covariance matrix. This assumption implies that the matrix \({\varvec{M}}_\mathsf{II }\) in Eq. (2) vanishes, so the kernel matrix simplifies to

$$\begin{aligned} {\varvec{M}}= {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}_X^{-1} {\varvec{M}}_\mathsf{I }, \end{aligned}$$

where \({\varvec{M}}_\mathsf{I }= \sum _{k=1}^K \pi _k ({\varvec{\mu }}_{k} - {\varvec{\mu }})({\varvec{\mu }}_{k} - {\varvec{\mu }}){}^{\top }= {\varvec{\varSigma }}_B\), the between-class covariance matrix. The basis of the subspace \(\mathcal{S }({\varvec{\beta }})\) provided by GMMDRC is obtained as the solution of the following generalized eigenvalue problem

$$\begin{aligned} {\varvec{M}}{\varvec{\beta }}_j = l_j {\varvec{\varSigma }}_X {\varvec{\beta }}_j, \end{aligned}$$

with \(l_1 \ge \dots \ge l_d\), and \(d = \min (p,K-1)\). Thus, \({\varvec{\beta }}_j\) is the \(j\)th eigenvector associated with the \(j\)th largest eigenvalue \(l_j\) (\(j=1,\ldots ,d\)) of the \((p \times p)\) matrix

$$\begin{aligned} {\varvec{\varSigma }}^{-1/2}_X {\varvec{M}}{\varvec{\varSigma }}^{-1/2}_X&= {\varvec{\varSigma }}^{-1/2}_X {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}_X^{-1} {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}^{-1/2}_X \\&= \left( {\varvec{\varSigma }}^{-1/2}_X {\varvec{\varSigma }}_B {\varvec{\varSigma }}^{-1/2}_X\right) {}^{\top }\left( {\varvec{\varSigma }}^{-1/2}_X {\varvec{\varSigma }}_B {\varvec{\varSigma }}^{-1/2}_X\right) . \end{aligned}$$

The subspace estimated by SIR is obtained as the solution of

$$\begin{aligned} {\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }} = l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_X {\varvec{\beta }}_j^{\mathsf{SIR }} \end{aligned}$$
(6)

which is given by the eigen-decomposition of \({\varvec{\varSigma }}^{-1/2}_X {\varvec{\varSigma }}_B {\varvec{\varSigma }}^{-1/2}_X\). It is easily seen that \({\varvec{\beta }}_j = {\varvec{\beta }}_j^{\mathsf{SIR }}\) and \(l_j = (l_j^{\mathsf{SIR }})^2\), for \(j=1,\ldots ,d\). Thus, the basis of the subspace provided by GMMDRC under the EDDA model with a common full class covariance matrix is equivalent to the basis estimated by SIR.
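As a numerical illustration (not part of the proof), this equivalence can be checked directly in base R. The sketch below uses the iris data and sample moments as stand-ins for the population quantities \({\varvec{\varSigma }}_B\) and \({\varvec{\varSigma }}_X\); both the data set and the use of sample estimates are assumptions of the example.

## Numerical check of Proposition 1: GMMDRC directions under EDDA with a
## common full class covariance equal the SIR directions, with l_j = (l_j^SIR)^2.
X  <- as.matrix(iris[, 1:4])
cl <- iris$Species
n  <- nrow(X); p <- ncol(X)

mu      <- colMeans(X)
Sigma_X <- cov(X) * (n - 1) / n              # MLE of the total covariance
Sigma_B <- matrix(0, p, p)                   # between-class covariance
for (k in levels(cl)) {
  dk      <- colMeans(X[cl == k, , drop = FALSE]) - mu
  Sigma_B <- Sigma_B + mean(cl == k) * tcrossprod(dk)
}

## Symmetric inverse square root of Sigma_X
e       <- eigen(Sigma_X, symmetric = TRUE)
Sx_inv2 <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

## GMMDRC kernel M = Sigma_B Sigma_X^{-1} Sigma_B vs the SIR kernel Sigma_B
eig_gmm <- eigen(Sx_inv2 %*% Sigma_B %*% solve(Sigma_X) %*% Sigma_B %*% Sx_inv2,
                 symmetric = TRUE)
eig_sir <- eigen(Sx_inv2 %*% Sigma_B %*% Sx_inv2, symmetric = TRUE)

d <- min(p, nlevels(cl) - 1)
cbind(gmmdrc = eig_gmm$values[1:d], sir_squared = eig_sir$values[1:d]^2)

## Directions in the original scale span the same subspace (up to sign)
B_gmm <- Sx_inv2 %*% eig_gmm$vectors[, 1:d]
B_sir <- Sx_inv2 %*% eig_sir$vectors[, 1:d]
abs(cor(X %*% B_gmm, X %*% B_sir))           # approximately the identity matrix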

We now consider the relation of GMMDRC with LDA canonical variates. From (6), we may subtract \(l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_{B} {\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) from both sides and, recalling the decomposition of the total variance, \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write

$$\begin{aligned} {\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }}- l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }}&= l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_X {\varvec{\beta }}_j^{\mathsf{SIR }}- l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }}\\ \left( 1 - l_j^{\mathsf{SIR }}\right) {\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }}&= l_j^{\mathsf{SIR }}\left( {\varvec{\varSigma }}_X - {\varvec{\varSigma }}_B\right) {\varvec{\beta }}_j^{\mathsf{SIR }}\\ {\varvec{\varSigma }}_B {\varvec{\beta }}_j^{\mathsf{SIR }}&= l_j^{\mathsf{SIR }}/\left( 1 - l_j^{\mathsf{SIR }}\right) {\varvec{\varSigma }}_W {\varvec{\beta }}_j^{\mathsf{SIR }}. \end{aligned}$$

It is clear that \(l_j^{{\mathsf{SIR }}} /(1-l_{j}^{{\mathsf{SIR }}})\) and \({\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) are, respectively, the \(j\)th eigenvalue and the associated eigenvector of \({\varvec{\varSigma }}_W^{-1} {\varvec{\varSigma }}_B\), i.e., of the eigenproblem obtained by maximizing the Rayleigh quotient used to derive the canonical variates in LDA. Thus, the basis of the subspace \(\mathcal{S }({\varvec{\beta }}^\mathsf{LDA })\) is equivalent to \(\mathcal{S }({\varvec{\beta }}^{\mathsf{SIR }})\), which in turn is equivalent to that provided by GMMDRC under the stated model assumption.
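Again as an illustration only, the eigenvalue relation with LDA can be verified numerically. The sketch below (base R, iris data, sample moments in place of the population quantities, all assumptions of the example) solves the two generalized eigenproblems and checks that the LDA eigenvalues equal \(l_j^{\mathsf{SIR }}/(1 - l_j^{\mathsf{SIR }})\) with the same discriminant directions.

## Numerical check of the SIR/LDA relation in Proposition 1.
X  <- as.matrix(iris[, 1:4])
cl <- iris$Species
n  <- nrow(X); p <- ncol(X)

mu      <- colMeans(X)
Sigma_B <- matrix(0, p, p)                   # between-class covariance
Sigma_W <- matrix(0, p, p)                   # pooled within-class covariance
for (k in levels(cl)) {
  Xk      <- X[cl == k, , drop = FALSE]
  wk      <- nrow(Xk) / n
  Sigma_B <- Sigma_B + wk * tcrossprod(colMeans(Xk) - mu)
  Sigma_W <- Sigma_W + wk * cov(Xk) * (nrow(Xk) - 1) / nrow(Xk)
}
Sigma_X <- Sigma_B + Sigma_W                 # decomposition of the total variance

## Solve the generalized eigenproblem A b = l B b via B^{-1/2} A B^{-1/2}
geneig <- function(A, B) {
  e  <- eigen(B, symmetric = TRUE)
  Bi <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  ee <- eigen(Bi %*% A %*% Bi, symmetric = TRUE)
  list(values = ee$values, vectors = Bi %*% ee$vectors)
}

d   <- min(p, nlevels(cl) - 1)
sir <- geneig(Sigma_B, Sigma_X)              # Sigma_B b = l^SIR Sigma_X b
lda <- geneig(Sigma_B, Sigma_W)              # Sigma_B b = l^LDA Sigma_W b

## Eigenvalue relation: l^LDA = l^SIR / (1 - l^SIR)
cbind(lda = lda$values[1:d],
      sir_transformed = sir$values[1:d] / (1 - sir$values[1:d]))

## Same discriminant directions, up to sign and scale
abs(cor(X %*% sir$vectors[, 1:d], X %*% lda$vectors[, 1:d]))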

1.2 Proof of Proposition 2

The kernel matrix of SAVE can be written in the original scale of the variables as

$$\begin{aligned} {\varvec{M}}_\mathsf{SAVE } = \sum _{k=1}^K \omega _k \left( {\varvec{I}}_p - {\varvec{\varSigma }}_X^{-1/2} {\varvec{\varSigma }}_k {\varvec{\varSigma }}_X^{-1/2} \right) ^2. \end{aligned}$$

Recalling that \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write the expression within parentheses as follows:

$$\begin{aligned} {\varvec{\varSigma }}_X^{-1/2} ({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k) {\varvec{\varSigma }}_X^{-1/2}&= {\varvec{\varSigma }}_X^{-1/2} ({\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) {\varvec{\varSigma }}_X^{-1/2} \\&= {\varvec{\varSigma }}_X^{-1/2} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_X^{-1/2} + {\varvec{\varSigma }}_X^{-1/2} ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) {\varvec{\varSigma }}_X^{-1/2}. \end{aligned}$$

Then, since the cross-product terms in the expansion of the square vanish (because \(\sum _{k=1}^K \omega _k {\varvec{\varSigma }}_k = {\varvec{\varSigma }}_W\)), we obtain

$$\begin{aligned} {\varvec{M}}_\mathsf{SAVE }&= \sum _{k=1}^K \omega _k \left( {\varvec{\varSigma }}_X^{-1/2} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_X^{-1/2} + {\varvec{\varSigma }}_X^{-1/2}({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) {\varvec{\varSigma }}_X^{-1/2} \right) ^2 \\&= {\varvec{\varSigma }}_X^{-1/2} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_X^{-1} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_X^{-1/2} \\&+ {\varvec{\varSigma }}_X^{-1/2} \left( \sum _{k=1}^K \omega _k ({\varvec{\varSigma }}_k - {\varvec{\varSigma }}_W) {\varvec{\varSigma }}_X^{-1} ({\varvec{\varSigma }}_k - {\varvec{\varSigma }}_W){}^{\top }\right) {\varvec{\varSigma }}_X^{-1/2}\\&= {\varvec{\varSigma }}_X^{-1/2} {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}_X^{-1} {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}_X^{-1/2} + {\varvec{\varSigma }}_X^{-1/2} {\varvec{M}}_\mathsf{II }{\varvec{\varSigma }}_X^{-1/2} \\&= {\varvec{\varSigma }}_X^{-1/2} ( {\varvec{M}}_\mathsf{I }{\varvec{\varSigma }}_X^{-1} {\varvec{M}}_\mathsf{I }+ {\varvec{M}}_\mathsf{II }) {\varvec{\varSigma }}_X^{-1/2}, \end{aligned}$$

where \({\varvec{M}}_\mathsf{I }\) and \({\varvec{M}}_\mathsf{II }\) are those obtained from an EDDA Gaussian mixture model with a single component for each class and different class covariance matrices (VVV).
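The identity above can also be checked numerically. The following base R sketch (iris data and sample class moments standing in for the population quantities, both assumptions of the example) computes the SAVE kernel from its definition and compares it with the decomposition in terms of \({\varvec{M}}_\mathsf{I }\) and \({\varvec{M}}_\mathsf{II }\).

## Numerical check of Proposition 2: the SAVE kernel coincides with
## Sigma_X^{-1/2} (M_I Sigma_X^{-1} M_I + M_II) Sigma_X^{-1/2}.
X  <- as.matrix(iris[, 1:4])
cl <- iris$Species
n  <- nrow(X); p <- ncol(X)

mu      <- colMeans(X)
Sigma_X <- cov(X) * (n - 1) / n              # MLE of the total covariance
omega   <- numeric(0)                        # class proportions
Sigma_k <- list()                            # MLE class covariance matrices
for (k in levels(cl)) {
  Xk           <- X[cl == k, , drop = FALSE]
  omega[k]     <- nrow(Xk) / n
  Sigma_k[[k]] <- cov(Xk) * (nrow(Xk) - 1) / nrow(Xk)
}
Sigma_W <- Reduce(`+`, Map(`*`, Sigma_k, omega))   # pooled within-class covariance

## Symmetric inverse square root of Sigma_X
e       <- eigen(Sigma_X, symmetric = TRUE)
Sx_inv2 <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

## SAVE kernel computed from its definition
M_save <- Reduce(`+`, lapply(levels(cl), function(k) {
  A <- diag(p) - Sx_inv2 %*% Sigma_k[[k]] %*% Sx_inv2
  omega[k] * (A %*% A)
}))

## Decomposition in terms of M_I and M_II (one component per class, model VVV)
M_I  <- Reduce(`+`, lapply(levels(cl), function(k)
          omega[k] * tcrossprod(colMeans(X[cl == k, , drop = FALSE]) - mu)))
M_II <- Reduce(`+`, lapply(levels(cl), function(k)
          omega[k] * (Sigma_k[[k]] - Sigma_W) %*% solve(Sigma_X) %*%
                     (Sigma_k[[k]] - Sigma_W)))
M_dec <- Sx_inv2 %*% (M_I %*% solve(Sigma_X) %*% M_I + M_II) %*% Sx_inv2

max(abs(M_save - M_dec))                     # numerically zero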

1.3 Proof of Proposition 3

The proof is analogous to that of Proposition 2 in Scrucca (2010) and is not replicated here.


About this article

Cite this article

Scrucca, L. Graphical tools for model-based mixture discriminant analysis. Adv Data Anal Classif 8, 147–165 (2014). https://doi.org/10.1007/s11634-013-0147-1
