Abstract
This paper introduces a methodology for visualizing, in a dimension-reduced subspace, the classification structure and the geometric characteristics induced by an estimated Gaussian mixture model for discriminant analysis. In particular, we consider the case of mixture-of-mixtures models with varying parameterizations, which allow for parsimonious models. The approach extends existing work on dimension reduction for model-based clustering based on Gaussian mixtures. Information on the dimension-reduction subspace is provided by the variation in class locations and, depending on the estimated mixture model, by the variation in class dispersions. Projections along the estimated directions provide summary plots that help to visualize the structure of the classes and their characteristics. A suitable modification of the method allows us to recover the most discriminant directions, i.e., those showing maximal separation among classes. The approach is illustrated using simulated and real data.
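To make the construction concrete, the following sketch shows how projection directions for summary plots can be derived from estimated class proportions, means, and covariances. It is a simplified illustration using only the variation in class locations together with a pooled within-class covariance; the helper `discriminant_directions` and its interface are hypothetical, not the paper's GMMDR implementation.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_directions(means, covs, priors, d=2):
    """Directions maximizing between-class variation relative to the total
    covariance (hypothetical helper; a simplified illustration, not the
    paper's GMMDR estimator)."""
    means = np.asarray(means, dtype=float)    # (K, p) class means
    priors = np.asarray(priors, dtype=float)  # (K,) mixing proportions
    mu = priors @ means                       # overall mean
    C = means - mu
    Sigma_B = (priors[:, None] * C).T @ C     # between-class covariance
    Sigma_W = sum(w * np.asarray(S, dtype=float) for w, S in zip(priors, covs))
    Sigma_X = Sigma_B + Sigma_W               # total covariance
    # generalized eigenproblem Sigma_B v = l Sigma_X v
    evals, evecs = eigh(Sigma_B, Sigma_X)
    order = np.argsort(evals)[::-1][:d]       # keep the d leading directions
    return evals[order], evecs[:, order]

# three classes separated only along the first coordinate
evals, evecs = discriminant_directions(
    [[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]],
    [np.eye(2)] * 3,
    [1 / 3, 1 / 3, 1 / 3],
)
# all discriminatory information lies along the first axis
assert abs(evals[0] - 8 / 11) < 1e-8 and abs(evecs[1, 0]) < 1e-8
```

Projecting the data onto the returned directions then yields low-dimensional summary plots of the class structure.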
References
Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57:289–300
Bensmail H, Celeux G (1996) Regularized Gaussian discriminant analysis through eigenvalue decomposition. J Am Stat Assoc 91:1743–1748
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Bouveyron C, Brunet-Saumard C (2013) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal. doi:10.1016/j.csda.2012.12.008
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth, New York
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793
Chen CH, Li KC (2001) Generalization of Fisher's linear discriminant analysis via the approach of sliced inverse regression. J Korean Stat Soc 30:193–217
Cook RD, Forzani L (2009) Likelihood-based sufficient dimension reduction. J Am Stat Assoc 104(485):197–208
Cook RD, Weisberg S (1991) Discussion of Li (1991). J Am Stat Assoc 86:328–332
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B (Methodological) 39:1–38
Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188
Flury B, Riedwyl H (1988) Multivariate statistics: a practical approach. Chapman & Hall Ltd., London
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington
Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84:165–175
Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing JR, Caligiuri M, Bloomfield C, Lander E (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B (Statistical Methodology) 58(1):155–176
Hastie T, Tibshirani R, Buja A (1994) Flexible discriminant analysis by optimal scoring. J Am Stat Assoc 89:1255–1270
Hastie T, Buja A, Tibshirani R (1995) Penalized discriminant analysis. Ann Stat 23:73–102
Hennig C (2004) Asymmetric linear dimension reduction for classification. J Comput Graph Stat 13(4):930–945
Kent JT (1991) Discussion of Li (1991). J Am Stat Assoc 86:336–337
Li KC (1991) Sliced inverse regression for dimension reduction (with discussion). J Am Stat Assoc 86:316–342
Mardia K, Kent J, Bibby J (1979) Multivariate analysis. Academic Press, London
Maugis C, Celeux G, Martin-Magniette ML (2009) Variable selection for clustering with Gaussian mixture models. Biometrics 65(3):701–709
Pardoe I, Yin X, Cook R (2007) Graphical tools for quadratic discriminant analysis. Technometrics 49(2):172–183
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484. doi:10.1007/s11222-009-9138-7
Velilla S (2008) A method for dimension reduction in quadratic classification problems. J Comput Graph Stat 17(3):572–589
Velilla S (2010) On the structure of the quadratic subspace in discriminant analysis. J Multivariate Anal 101(5):1239–1251
Zhu M, Hastie TJ (2003) Feature extraction for nonparametric discriminant analysis. J Comput Graph Stat 12(1):101–120
Acknowledgments
The author is grateful to the Associate Editor and two referees for their inspiring comments and recommendations, which led to a substantial improvement of the manuscript. This work was partly supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development through grant R01 HD070936.
Appendix
1.1 Proof of Proposition 1
Assume an EDDA mixture model with a common full class covariance matrix. This condition implies that the matrix \({\varvec{M}}_\mathsf{II }\) in Eq. (2) cancels out, so the kernel matrix simplifies to
\[
{\varvec{M}} = {\varvec{M}}_\mathsf{I }\,{\varvec{\varSigma }}_X^{-1}{\varvec{M}}_\mathsf{I },
\]
where \({\varvec{M}}_\mathsf{I }= \sum _{k=1}^K \pi _k ({\varvec{\mu }}_{k} - {\varvec{\mu }})({\varvec{\mu }}_{k} - {\varvec{\mu }}){}^{\top }= {\varvec{\varSigma }}_B\), the between-class covariance matrix. The basis of the subspace \(\mathcal{S }({\varvec{\beta }})\) provided by GMMDRC is obtained as the solution of the following problem:
\[
{\varvec{M}}\,{\varvec{\beta }}_j = l_j\,{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j, \qquad {\varvec{\beta }}_j^{\top }{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j = 1,
\]
with \(l_1 \ge \dots \ge l_d\), and \(d = \min (p,K-1)\). Thus, \({\varvec{\beta }}_j\) is the \(j\)th eigenvector associated with the \(j\)th largest eigenvalue \(l_j\) (\(j=1,\ldots ,d\)) of the \((p \times p)\) matrix
\[
{\varvec{\varSigma }}_X^{-1/2}{\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1/2}.
\]
The subspace estimated by SIR is obtained as the solution of
\[
{\varvec{\varSigma }}_B\,{\varvec{\beta }}_j^{\mathsf{SIR }} = l_j^{\mathsf{SIR }}\,{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j^{\mathsf{SIR }},
\]
which is given by the eigen-decomposition of \({\varvec{\varSigma }}^{-1/2}_X {\varvec{\varSigma }}_B{\varvec{\varSigma }}^{-1/2}_X\). It is easily seen that \({\varvec{\beta }}_j = {\varvec{\beta }}_j^{\mathsf{SIR }}\) and \(l_j = (l_j^{\mathsf{SIR }})^2\), for \(j=1,\ldots ,d\). Thus, the basis of the subspace provided by GMMDRC under model EDDA with full common class covariance matrix is equivalent to the basis estimated by SIR.
We now consider the relation of GMMDRC with LDA canonical variates. From (6), we may subtract \(l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_{B} {\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) from both sides and, recalling the decomposition of the total variance, \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write
\[
(1 - l_j^{\mathsf{SIR }})\,{\varvec{\varSigma }}_B\,{\varvec{\beta }}_j^{\mathsf{SIR }} = l_j^{\mathsf{SIR }}\,{\varvec{\varSigma }}_W\,{\varvec{\beta }}_j^{\mathsf{SIR }}.
\]
It is clear that \(l_j^{{\mathsf{SIR }}} /(1-l_{j}^{{\mathsf{SIR }}})\) and \({\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) are, respectively, the \(j\)th eigenvalue and the associated eigenvector of \({\varvec{\varSigma }}_W^{-1/2} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_W^{-1/2}\), the decomposition solving the Rayleigh quotient used to derive canonical variates in LDA. Thus, the basis of the subspace \(\mathcal{S }({\varvec{\beta }}^\mathsf{LDA })\) is equivalent to \(\mathcal{S }({\varvec{\beta }}^{\mathsf{SIR }})\), which in turn is equivalent to that provided by GMMDRC under the specific model assumption.
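The chain of equivalences in this proof can be checked numerically. The sketch below assumes the GMMDRC kernel under the common-covariance EDDA model takes the form \({\varvec{\varSigma }}_B{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B\) (an assumption consistent with the squared-eigenvalue relation \(l_j = (l_j^{\mathsf{SIR }})^2\) stated above) and verifies both that relation and the LDA relation \(l_j^{\mathsf{SIR }}/(1-l_j^{\mathsf{SIR }})\) on random parameter values.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p, K = 4, 3
priors = np.array([0.3, 0.3, 0.4])
means = rng.normal(size=(K, p))            # class means
A = rng.normal(size=(p, p))
Sigma_W = A @ A.T + p * np.eye(p)          # common full class covariance

mu = priors @ means                        # overall mean
C = means - mu
Sigma_B = (priors[:, None] * C).T @ C      # between-class covariance
Sigma_X = Sigma_B + Sigma_W                # total covariance

# SIR: Sigma_B b = l Sigma_X b  (generalized symmetric eigenproblem)
l_sir = eigh(Sigma_B, Sigma_X, eigvals_only=True)[::-1]
# assumed GMMDRC kernel: Sigma_B Sigma_X^{-1} Sigma_B
M = Sigma_B @ np.linalg.solve(Sigma_X, Sigma_B)
l_gmm = eigh(M, Sigma_X, eigvals_only=True)[::-1]
# LDA Rayleigh quotient: Sigma_B b = r Sigma_W b
r_lda = eigh(Sigma_B, Sigma_W, eigvals_only=True)[::-1]

assert np.allclose(l_gmm, l_sir ** 2)            # GMMDRC eigenvalues = SIR squared
assert np.allclose(r_lda, l_sir / (1 - l_sir))   # LDA Rayleigh relation
```

Since \(l \mapsto l^2\) and \(l \mapsto l/(1-l)\) are both increasing on \([0,1)\), the three criteria also rank the directions identically, as the proof requires.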
1.2 Proof of Proposition 2
The kernel matrix of SAVE can be written in the original scale of the variables as
\[
{\varvec{M}}_{\mathsf{SAVE }} = \sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k)\,{\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k).
\]
Recalling that \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write the expression within parentheses as follows:
\[
{\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k = {\varvec{\varSigma }}_B + ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k).
\]
Then, since the cross terms vanish because \(\sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) = {\varvec{0}}\),
\[
{\varvec{M}}_{\mathsf{SAVE }} = {\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B + \sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k)\,{\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) = {\varvec{M}}_\mathsf{I }\,{\varvec{\varSigma }}_X^{-1}{\varvec{M}}_\mathsf{I } + {\varvec{M}}_\mathsf{II },
\]
where \({\varvec{M}}_\mathsf{I }\) and \({\varvec{M}}_\mathsf{II }\) are those obtained from an EDDA Gaussian mixture model with a single component for each class and different class covariance matrices (VVV).
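This decomposition can be checked numerically. The sketch below assumes the SAVE kernel in the original scale has the sandwich form \(\sum _k \pi _k ({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k){\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k)\) and that the dispersion term is \(\sum _k \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k){\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k)\); both forms are illustrative assumptions, not quotations from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, K = 3, 3
priors = np.array([0.2, 0.5, 0.3])
covs = []
for _ in range(K):
    A = rng.normal(size=(p, p))
    covs.append(A @ A.T + p * np.eye(p))   # distinct SPD class covariances
means = rng.normal(size=(K, p))

mu = priors @ means                        # overall mean
C = means - mu
Sigma_B = (priors[:, None] * C).T @ C      # between-class covariance
Sigma_W = sum(w * S for w, S in zip(priors, covs))   # pooled within cov.
Sigma_X = Sigma_B + Sigma_W                # total covariance
Xinv = np.linalg.inv(Sigma_X)

# assumed SAVE kernel in the original scale
M_save = sum(w * (Sigma_X - S) @ Xinv @ (Sigma_X - S)
             for w, S in zip(priors, covs))
# location part and dispersion part of the decomposition
M_loc = Sigma_B @ Xinv @ Sigma_B
M_disp = sum(w * (Sigma_W - S) @ Xinv @ (Sigma_W - S)
             for w, S in zip(priors, covs))

assert np.allclose(M_save, M_loc + M_disp)   # cross terms cancel exactly
```

The cross terms cancel because the class covariances average (with weights \(\pi _k\)) to the pooled within-class covariance, which is the algebraic step the proof relies on.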
1.3 Proof of Proposition 3
The proof is analogous to that of Prop. 2 in Scrucca (2010) and is not replicated here.
Scrucca, L. Graphical tools for model-based mixture discriminant analysis. Adv Data Anal Classif 8, 147–165 (2014). https://doi.org/10.1007/s11634-013-0147-1
Keywords
- Dimension reduction
- Model-based discriminant analysis
- Gaussian mixtures
- Canonical variates for mixture modeling