Abstract
This paper introduces a methodology for visualizing, in a dimension-reduced subspace, the classification structure and the geometric characteristics induced by an estimated Gaussian mixture model for discriminant analysis. In particular, we consider the case of mixture-of-mixtures models with varying parameterizations, which allow for parsimonious models. The approach extends existing work on dimension reduction for model-based clustering based on Gaussian mixtures. Information on the dimension-reduction subspace is provided by the variation in class locations and, depending on the estimated mixture model, by the variation in class dispersions. Projections along the estimated directions provide summary plots that help to visualize the structure of the classes and their characteristics. A suitable modification of the method allows us to recover the most discriminant directions, i.e., those showing maximal separation among classes. The approach is illustrated using simulated and real data.
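To make the construction concrete, the following sketch shows how projection directions for summary plots can be derived from estimated class proportions, means, and covariances. It is a simplified illustration using only the variation in class locations together with a pooled within-class covariance; the helper `discriminant_directions` and its interface are hypothetical, not the paper's GMMDR implementation.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_directions(means, covs, priors, d=2):
    """Directions maximizing between-class variation relative to the total
    covariance (hypothetical helper; a simplified illustration, not the
    paper's GMMDR estimator)."""
    means = np.asarray(means, dtype=float)    # (K, p) class means
    priors = np.asarray(priors, dtype=float)  # (K,) mixing proportions
    mu = priors @ means                       # overall mean
    C = means - mu
    Sigma_B = (priors[:, None] * C).T @ C     # between-class covariance
    Sigma_W = sum(w * np.asarray(S, dtype=float) for w, S in zip(priors, covs))
    Sigma_X = Sigma_B + Sigma_W               # total covariance
    # generalized eigenproblem Sigma_B v = l Sigma_X v
    evals, evecs = eigh(Sigma_B, Sigma_X)
    order = np.argsort(evals)[::-1][:d]       # keep the d leading directions
    return evals[order], evecs[:, order]

# three classes separated only along the first coordinate
evals, evecs = discriminant_directions(
    [[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]],
    [np.eye(2)] * 3,
    [1 / 3, 1 / 3, 1 / 3],
)
# all discriminatory information lies along the first axis
assert abs(evals[0] - 8 / 11) < 1e-8 and abs(evecs[1, 0]) < 1e-8
```

Projecting the data onto the returned directions then yields low-dimensional summary plots of the class structure.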
References
Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57:289–300
Bensmail H, Celeux G (1996) Regularized Gaussian discriminant analysis through eigenvalue decomposition. J Am Stat Assoc 91:1743–1748
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Bouveyron C, Brunet-Saumard C (2013) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal. doi:10.1016/j.csda.2012.12.008
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth, New York
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793
Chen CH, Li KC (2001) Generalization of Fisher's linear discriminant analysis via the approach of sliced inverse regression. J Korean Stat Soc 30:193–217
Cook RD, Forzani L (2009) Likelihood-based sufficient dimension reduction. J Am Stat Assoc 104(485):197–208
Cook RD, Weisberg S (1991) Discussion of Li (1991). J Am Stat Assoc 86:328–332
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B (Methodological) 39:1–38
Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188
Flury B, Riedwyl H (1988) Multivariate statistics: a practical approach. Chapman & Hall Ltd., London
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington
Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84:165–175
Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing JR, Caligiuri M, Bloomfield C, Lander E (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B (Statistical Methodology) 58(1):155–176
Hastie T, Tibshirani R, Buja A (1994) Flexible discriminant analysis by optimal scoring. J Am Stat Assoc 89:1255–1270
Hastie T, Buja A, Tibshirani R (1995) Penalized discriminant analysis. Ann Stat 23:73–102
Hennig C (2004) Asymmetric linear dimension reduction for classification. J Comput Graph Stat 13(4):930–945
Kent JT (1991) Discussion of Li (1991). J Am Stat Assoc 86:336–337
Li KC (1991) Sliced inverse regression for dimension reduction (with discussion). J Am Stat Assoc 86:316–342
Mardia K, Kent J, Bibby J (1979) Multivariate analysis. Academic Press, London
Maugis C, Celeux G, Martin-Magniette ML (2009) Variable selection for clustering with Gaussian mixture models. Biometrics 65(3):701–709
Pardoe I, Yin X, Cook R (2007) Graphical tools for quadratic discriminant analysis. Technometrics 49(2):172–183
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484. doi:10.1007/s11222-009-9138-7
Velilla S (2008) A method for dimension reduction in quadratic classification problems. J Comput Graph Stat 17(3):572–589
Velilla S (2010) On the structure of the quadratic subspace in discriminant analysis. J Multivariate Anal 101(5):1239–1251
Zhu M, Hastie TJ (2003) Feature extraction for nonparametric discriminant analysis. J Comput Graph Stat 12(1):101–120
Acknowledgments
The author is grateful to the Associate Editor and two referees for their inspiring comments and recommendations, which led to a substantial improvement of the manuscript. This work was partly supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development through grant R01 HD070936.
Appendix
1.1 Proof of Proposition 1
Assume an EDDA mixture model with a common full class covariance matrix. This condition implies that the matrix \({\varvec{M}}_\mathsf{II }\) in Eq. (2) cancels out, so the kernel matrix simplifies to
\[
{\varvec{M}} = {\varvec{M}}_\mathsf{I }\,{\varvec{\varSigma }}_X^{-1}{\varvec{M}}_\mathsf{I },
\]
where \({\varvec{M}}_\mathsf{I }= \sum _{k=1}^K \pi _k ({\varvec{\mu }}_{k} - {\varvec{\mu }})({\varvec{\mu }}_{k} - {\varvec{\mu }}){}^{\top }= {\varvec{\varSigma }}_B\), the between-class covariance matrix. The basis of the subspace \(\mathcal{S }({\varvec{\beta }})\) provided by GMMDRC is obtained as the solution of the following problem:
\[
{\varvec{M}}\,{\varvec{\beta }}_j = l_j\,{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j, \qquad {\varvec{\beta }}_j^{\top }{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j = 1,
\]
with \(l_1 \ge \dots \ge l_d\), and \(d = \min (p,K-1)\). Thus, \({\varvec{\beta }}_j\) is the \(j\)th eigenvector associated with the \(j\)th largest eigenvalue \(l_j\) (\(j=1,\ldots ,d\)) of the \((p \times p)\) matrix
\[
{\varvec{\varSigma }}_X^{-1/2}{\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1/2}.
\]
The subspace estimated by SIR is obtained as the solution of
\[
{\varvec{\varSigma }}_B\,{\varvec{\beta }}_j^{\mathsf{SIR }} = l_j^{\mathsf{SIR }}\,{\varvec{\varSigma }}_X\,{\varvec{\beta }}_j^{\mathsf{SIR }},
\]
which is given by the eigen-decomposition of \({\varvec{\varSigma }}^{-1/2}_X {\varvec{\varSigma }}_B{\varvec{\varSigma }}^{-1/2}_X\). It is easily seen that \({\varvec{\beta }}_j = {\varvec{\beta }}_j^{\mathsf{SIR }}\) and \(l_j = (l_j^{\mathsf{SIR }})^2\), for \(j=1,\ldots ,d\). Thus, the basis of the subspace provided by GMMDRC under model EDDA with full common class covariance matrix is equivalent to the basis estimated by SIR.
We now consider the relation of GMMDRC with LDA canonical variates. From (6), we may subtract \(l_j^{\mathsf{SIR }}{\varvec{\varSigma }}_{B} {\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) from both sides and, recalling the decomposition of the total variance, \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write
\[
(1 - l_j^{\mathsf{SIR }})\,{\varvec{\varSigma }}_B\,{\varvec{\beta }}_j^{\mathsf{SIR }} = l_j^{\mathsf{SIR }}\,{\varvec{\varSigma }}_W\,{\varvec{\beta }}_j^{\mathsf{SIR }}.
\]
It is clear that \(l_j^{{\mathsf{SIR }}} /(1-l_{j}^{{\mathsf{SIR }}})\) and \({\varvec{\beta }}_{j}^{{\mathsf{SIR }}}\) are, respectively, the \(j\)th eigenvalue and the associated eigenvector of \({\varvec{\varSigma }}_W^{-1/2} {\varvec{\varSigma }}_B {\varvec{\varSigma }}_W^{-1/2}\), the decomposition solving the Rayleigh quotient used to derive canonical variates in LDA. Thus, the basis of the subspace \(\mathcal{S }({\varvec{\beta }}^\mathsf{LDA })\) is equivalent to \(\mathcal{S }({\varvec{\beta }}^{\mathsf{SIR }})\), which in turn is equivalent to that provided by GMMDRC under the specific model assumption.
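The chain of equivalences in this proof can be checked numerically. The sketch below assumes the GMMDRC kernel under the common-covariance EDDA model takes the form \({\varvec{\varSigma }}_B{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B\) (an assumption consistent with the squared-eigenvalue relation \(l_j = (l_j^{\mathsf{SIR }})^2\) stated above) and verifies both that relation and the LDA relation \(l_j^{\mathsf{SIR }}/(1-l_j^{\mathsf{SIR }})\) on random parameter values.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p, K = 4, 3
priors = np.array([0.3, 0.3, 0.4])
means = rng.normal(size=(K, p))            # class means
A = rng.normal(size=(p, p))
Sigma_W = A @ A.T + p * np.eye(p)          # common full class covariance

mu = priors @ means                        # overall mean
C = means - mu
Sigma_B = (priors[:, None] * C).T @ C      # between-class covariance
Sigma_X = Sigma_B + Sigma_W                # total covariance

# SIR: Sigma_B b = l Sigma_X b  (generalized symmetric eigenproblem)
l_sir = eigh(Sigma_B, Sigma_X, eigvals_only=True)[::-1]
# assumed GMMDRC kernel: Sigma_B Sigma_X^{-1} Sigma_B
M = Sigma_B @ np.linalg.solve(Sigma_X, Sigma_B)
l_gmm = eigh(M, Sigma_X, eigvals_only=True)[::-1]
# LDA Rayleigh quotient: Sigma_B b = r Sigma_W b
r_lda = eigh(Sigma_B, Sigma_W, eigvals_only=True)[::-1]

assert np.allclose(l_gmm, l_sir ** 2)            # GMMDRC eigenvalues = SIR squared
assert np.allclose(r_lda, l_sir / (1 - l_sir))   # LDA Rayleigh relation
```

Since \(l \mapsto l^2\) and \(l \mapsto l/(1-l)\) are both increasing on \([0,1)\), the three criteria also rank the directions identically, as the proof requires.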
1.2 Proof of Proposition 2
The kernel matrix of SAVE can be written in the original scale of the variables as
\[
{\varvec{M}}_{\mathsf{SAVE }} = \sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k)\,{\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k).
\]
Recalling that \({\varvec{\varSigma }}_X = {\varvec{\varSigma }}_B + {\varvec{\varSigma }}_W\), we may write the expression within parentheses as follows:
\[
{\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k = {\varvec{\varSigma }}_B + ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k).
\]
Then, since the cross terms vanish because \(\sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) = {\varvec{0}}\),
\[
{\varvec{M}}_{\mathsf{SAVE }} = {\varvec{\varSigma }}_B\,{\varvec{\varSigma }}_X^{-1}{\varvec{\varSigma }}_B + \sum _{k=1}^K \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k)\,{\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k) = {\varvec{M}}_\mathsf{I }\,{\varvec{\varSigma }}_X^{-1}{\varvec{M}}_\mathsf{I } + {\varvec{M}}_\mathsf{II },
\]
where \({\varvec{M}}_\mathsf{I }\) and \({\varvec{M}}_\mathsf{II }\) are those obtained from an EDDA Gaussian mixture model with a single component for each class and different class covariance matrices (VVV).
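This decomposition can be checked numerically. The sketch below assumes the SAVE kernel in the original scale has the sandwich form \(\sum _k \pi _k ({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k){\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_X - {\varvec{\varSigma }}_k)\) and that the dispersion term is \(\sum _k \pi _k ({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k){\varvec{\varSigma }}_X^{-1}({\varvec{\varSigma }}_W - {\varvec{\varSigma }}_k)\); both forms are illustrative assumptions, not quotations from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, K = 3, 3
priors = np.array([0.2, 0.5, 0.3])
covs = []
for _ in range(K):
    A = rng.normal(size=(p, p))
    covs.append(A @ A.T + p * np.eye(p))   # distinct SPD class covariances
means = rng.normal(size=(K, p))

mu = priors @ means                        # overall mean
C = means - mu
Sigma_B = (priors[:, None] * C).T @ C      # between-class covariance
Sigma_W = sum(w * S for w, S in zip(priors, covs))   # pooled within cov.
Sigma_X = Sigma_B + Sigma_W                # total covariance
Xinv = np.linalg.inv(Sigma_X)

# assumed SAVE kernel in the original scale
M_save = sum(w * (Sigma_X - S) @ Xinv @ (Sigma_X - S)
             for w, S in zip(priors, covs))
# location part and dispersion part of the decomposition
M_loc = Sigma_B @ Xinv @ Sigma_B
M_disp = sum(w * (Sigma_W - S) @ Xinv @ (Sigma_W - S)
             for w, S in zip(priors, covs))

assert np.allclose(M_save, M_loc + M_disp)   # cross terms cancel exactly
```

The cross terms cancel because the class covariances average (with weights \(\pi _k\)) to the pooled within-class covariance, which is the algebraic step the proof relies on.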
1.3 Proof of Proposition 3
The proof is analogous to that of Prop. 2 in Scrucca (2010) and is not replicated here.
Scrucca, L. Graphical tools for model-based mixture discriminant analysis. Adv Data Anal Classif 8, 147–165 (2014). https://doi.org/10.1007/s11634-013-0147-1
Keywords
- Dimension reduction
- Model-based discriminant analysis
- Gaussian mixtures
- Canonical variates for mixture modeling