Probabilistic assessment of model-based clustering


Abstract

Finite mixtures provide a powerful tool for modeling heterogeneous data. Model-based clustering is a widely used grouping technique that assumes a one-to-one correspondence between clusters and mixture model components. Although there are many directions of active research within the model-based clustering framework, very little attention has been paid to studying the specific nature of detected clustering solutions. In this paper, we develop an approach for assessing the variability in classifications made by the Bayes decision rule. The proposed technique allows assessing the significance of each assignment. We also apply the developed instrument to identify influential observations that affect the structure of the detected partitioning. The proposed diagnostic methodology is studied and illustrated on synthetic data and applied to the analysis of three well-known classification datasets.
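
As a concrete illustration of the classification mechanism studied in the paper, the following minimal sketch (Python; the two-component mixture parameters and the toy observations are hypothetical, chosen purely for illustration) computes the posterior probabilities \(\pi_{ik}\) of component membership and applies the Bayes decision rule, assigning each observation to the component with the largest posterior probability:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-component bivariate Gaussian mixture (illustrative values only).
taus = np.array([0.6, 0.4])                              # mixing proportions tau_k
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]       # component means mu_k
Sigmas = [np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])] # covariance matrices Sigma_k

def posteriors(X):
    """Posterior probabilities pi_ik = tau_k phi(x_i; mu_k, Sigma_k) / sum over j."""
    weighted = np.column_stack([
        tau * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
        for tau, mu, Sigma in zip(taus, mus, Sigmas)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)

X = np.array([[0.2, -0.1], [1.5, 1.6], [2.9, 3.2]])      # toy observations
pi = posteriors(X)
labels = pi.argmax(axis=1)   # Bayes decision rule: largest posterior wins
print(np.round(pi, 3), labels)
```

How reliable such argmax assignments are, given that the mixture parameters are estimated rather than known, is exactly the question the proposed methodology addresses.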



Acknowledgments

The authors acknowledge partial support provided by Lockheed Martin Corporation.

Author information

Correspondence to Volodymyr Melnykov.

Appendix

1.1 Proof of Proposition 2.1 from Sect. 2.3

In this section, we provide the main steps needed to prove Proposition 2.1. First, note that the mixture denominator \(\sum_{j} \tau_j \phi(\boldsymbol{x}_i; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)\) is common to both posterior probabilities and cancels in the difference, as do the normalizing constants \((2\pi)^{-p/2}\), so that

$$\begin{aligned} h_{ikk'}(\boldsymbol{\Psi}) &= \log \pi_{ik}(\boldsymbol{\Psi}) - \log \pi_{ik'}(\boldsymbol{\Psi}) = \log\bigl(\tau_k \phi(\boldsymbol{x}_i;\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\bigr) - \log\bigl(\tau_{k'} \phi(\boldsymbol{x}_i;\boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})\bigr)\\ &= \log \tau_k - \log \tau_{k'} - \frac{1}{2}\log |\boldsymbol{\Sigma}_k| + \frac{1}{2} \log |\boldsymbol{\Sigma}_{k'}| - \frac{1}{2}(\boldsymbol{x}_i-\boldsymbol{\mu}_k)^{T}\boldsymbol{\Sigma}_k^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_k)\\ &\quad + \frac{1}{2}(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})^{T}\boldsymbol{\Sigma}_{k'}^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'}). \end{aligned}$$
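
This expansion can be checked numerically. The sketch below (Python; the observation and the parameter values are hypothetical) compares the log ratio of the weighted component densities with the closed form above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for a quick numeric check of the expansion above.
x = np.array([1.0, 2.0])
tau_k, tau_kp = 0.6, 0.4
mu_k, mu_kp = np.array([0.0, 0.0]), np.array([3.0, 3.0])
S_k, S_kp = np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])

# Left-hand side: log pi_ik - log pi_ik' (the shared mixture denominator cancels).
lhs = (np.log(tau_k) + multivariate_normal.logpdf(x, mu_k, S_k)
       - np.log(tau_kp) - multivariate_normal.logpdf(x, mu_kp, S_kp))

def quad(x, mu, S):
    """Mahalanobis quadratic form (x - mu)^T S^{-1} (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(S, d)

# Right-hand side: the expanded form; the (p/2) log(2*pi) terms have cancelled.
rhs = (np.log(tau_k) - np.log(tau_kp)
       - 0.5 * np.log(np.linalg.det(S_k)) + 0.5 * np.log(np.linalg.det(S_kp))
       - 0.5 * quad(x, mu_k, S_k) + 0.5 * quad(x, mu_kp, S_kp))

assert np.isclose(lhs, rhs)
```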

For \(k^\prime \ne K\), we obtain

$$\begin{aligned} \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \tau_k} = \frac{1}{\tau_k}, \qquad \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \tau_{k'}} = -\frac{1}{\tau_{k'}}, \qquad \text{and} \qquad \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \tau_s} = 0 \quad \text{for } s \ne k, k'. \end{aligned}$$

For \(k^\prime = K\), we have

$$\begin{aligned} \frac{\partial h_{ikK}(\boldsymbol{\Psi})}{\partial \tau_k} &= \frac{\partial \log \tau_k}{\partial \tau_k} - \frac{\partial \log\left(1-\tau_1-\tau_2-\cdots-\tau_{K-1}\right)}{\partial \tau_k} = \frac{1}{\tau_k} + \frac{1}{\tau_K}\\ \text{and} \qquad \frac{\partial h_{ikK}(\boldsymbol{\Psi})}{\partial \tau_s} &= -\frac{\partial \log\left(1-\tau_1-\tau_2-\cdots-\tau_{K-1}\right)}{\partial \tau_s} = \frac{1}{\tau_K} \quad \text{for } s \ne k. \end{aligned}$$
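
Both sets of derivatives are easy to verify by central finite differences once \(\tau_K\) is rebuilt from the free parameters \(\tau_1, \ldots, \tau_{K-1}\). A minimal sketch for the \(k' = K\) case (Python; a hypothetical three-component univariate mixture):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 3-component univariate mixture; tau_3 is implied by the constraint.
tau_free = np.array([0.5, 0.3])                 # free parameters tau_1, tau_2
mus, sds = np.array([0.0, 2.0, 5.0]), np.ones(3)
x, k, K = 1.0, 0, 2                             # 0-based indices: k = 1, k' = K = 3

def h(tf):
    t = np.append(tf, 1.0 - tf.sum())           # rebuild (tau_1, tau_2, tau_3)
    lw = np.log(t) + norm.logpdf(x, mus, sds)   # logs of the weighted densities
    return lw[k] - lw[K]                        # h_{ikK}

eps = 1e-6
e1 = np.array([eps, 0.0])
num = (h(tau_free + e1) - h(tau_free - e1)) / (2 * eps)   # numerical d h / d tau_1
ana = 1.0 / tau_free[k] + 1.0 / (1.0 - tau_free.sum())    # 1/tau_k + 1/tau_K
assert np.isclose(num, ana, rtol=1e-6)
```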

From standard results of matrix differential calculus, we immediately obtain

$$\begin{aligned} \frac{\partial \boldsymbol{x}_i^{T} \boldsymbol{\Sigma}_s^{-1} \boldsymbol{\mu}_s}{\partial \boldsymbol{\mu}_s} = \boldsymbol{\Sigma}_s^{-1}\boldsymbol{x}_i, \qquad \frac{\partial \boldsymbol{\mu}_s^{T} \boldsymbol{\Sigma}_s^{-1} \boldsymbol{\mu}_s}{\partial \boldsymbol{\mu}_s} = 2\boldsymbol{\Sigma}_s^{-1}\boldsymbol{\mu}_s, \qquad \frac{\partial \log |\boldsymbol{\Sigma}_s|}{\partial \boldsymbol{\Sigma}_s} = \boldsymbol{\Sigma}_s^{-1},\\ \text{and} \qquad \frac{\partial (\boldsymbol{x}_i-\boldsymbol{\mu}_s)^{T}\boldsymbol{\Sigma}_s^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_s)}{\partial \boldsymbol{\Sigma}_s} = -\boldsymbol{\Sigma}_s^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_s)(\boldsymbol{x}_i-\boldsymbol{\mu}_s)^{T} \boldsymbol{\Sigma}_s^{-1}. \end{aligned}$$

As a result, it follows that

$$\begin{aligned} \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\mu}_k} &= -\frac{1}{2}\frac{\partial (\boldsymbol{x}_i-\boldsymbol{\mu}_k)^{T}\boldsymbol{\Sigma}_k^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_k)}{\partial \boldsymbol{\mu}_k} = \boldsymbol{\Sigma}_k^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_k),\\ \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\mu}_{k'}} &= \frac{1}{2}\frac{\partial (\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})^{T}\boldsymbol{\Sigma}_{k'}^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})}{\partial \boldsymbol{\mu}_{k'}} = -\boldsymbol{\Sigma}_{k'}^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'}),\\ &\quad \text{and} \quad \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\mu}_s} = \boldsymbol{0} \quad \text{for } s \ne k, k'. \end{aligned}$$
Fig. 11 Additional illustration for the simulation study from Sect. 4.1: proportions of correct (black) and incorrect (red) assignments plotted versus the sample size for various levels of overlap \(\omega\)

Similarly, we obtain

$$\begin{aligned} \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\Sigma}_k} &= -\frac{1}{2}\frac{\partial \log |\boldsymbol{\Sigma}_k|}{\partial \boldsymbol{\Sigma}_k} - \frac{1}{2}\frac{\partial (\boldsymbol{x}_i-\boldsymbol{\mu}_k)^{T}\boldsymbol{\Sigma}_k^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_k)}{\partial \boldsymbol{\Sigma}_k}\\ &= \frac{1}{2} \boldsymbol{\Sigma}_k^{-1}\bigl((\boldsymbol{x}_i-\boldsymbol{\mu}_k)(\boldsymbol{x}_i-\boldsymbol{\mu}_k)^{T} \boldsymbol{\Sigma}_k^{-1} - \boldsymbol{I}_p\bigr),\\ \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\Sigma}_{k'}} &= \frac{1}{2}\frac{\partial \log |\boldsymbol{\Sigma}_{k'}|}{\partial \boldsymbol{\Sigma}_{k'}} + \frac{1}{2}\frac{\partial (\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})^{T}\boldsymbol{\Sigma}_{k'}^{-1}(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})}{\partial \boldsymbol{\Sigma}_{k'}}\\ &= -\frac{1}{2} \boldsymbol{\Sigma}_{k'}^{-1}\bigl((\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})(\boldsymbol{x}_i-\boldsymbol{\mu}_{k'})^{T} \boldsymbol{\Sigma}_{k'}^{-1} - \boldsymbol{I}_p\bigr),\\ &\quad \text{and} \quad \frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\Sigma}_s} = \boldsymbol{0} \quad \text{for } s \ne k, k'. \end{aligned}$$
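
The mean and covariance gradients can be verified in the same spirit through directional central differences: for a symmetric perturbation \(\boldsymbol{E}\) of \(\boldsymbol{\Sigma}_k\), the directional derivative equals \(\operatorname{tr}(\boldsymbol{G}\boldsymbol{E})\), where \(\boldsymbol{G}\) is the (symmetric) gradient stated above. A sketch under the same kind of hypothetical parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters; h is viewed as a function of (mu_k, Sigma_k) only.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
tau_k, tau_kp = 0.6, 0.4
mu_k, mu_kp = np.array([0.0, 0.0]), np.array([3.0, 3.0])
S_k, S_kp = np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])

def h(mu, S):
    return (np.log(tau_k) + multivariate_normal.logpdf(x, mu, S)
            - np.log(tau_kp) - multivariate_normal.logpdf(x, mu_kp, S_kp))

Sinv = np.linalg.inv(S_k)
d = x - mu_k
grad_mu = Sinv @ d                                          # stated mean gradient
grad_S = 0.5 * Sinv @ (np.outer(d, d) @ Sinv - np.eye(2))   # stated cov gradient

eps = 1e-5
v = rng.standard_normal(2)                # direction for mu_k
E = rng.standard_normal((2, 2))
E = E + E.T                               # symmetric direction for Sigma_k
num_mu = (h(mu_k + eps * v, S_k) - h(mu_k - eps * v, S_k)) / (2 * eps)
num_S = (h(mu_k, S_k + eps * E) - h(mu_k, S_k - eps * E)) / (2 * eps)
assert np.isclose(num_mu, grad_mu @ v, atol=1e-6)
assert np.isclose(num_S, np.trace(grad_S @ E), atol=1e-6)
```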

After adjusting derivatives with respect to covariance matrices for symmetry, the following gradient vector can be obtained:

$$\begin{aligned} \nabla h_{ikk'}(\boldsymbol{\Psi}) = \left( \left(\frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \tau_s}\right)_{s=1,\ldots,K-1}, \left(\frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \boldsymbol{\mu}_s}\right)^{T}_{s=1,\ldots,K}, \left(\frac{\partial h_{ikk'}(\boldsymbol{\Psi})}{\partial \operatorname{vech}\{\boldsymbol{\Sigma}_s\}}\right)^{T}_{s=1,\ldots,K} \right)^{T}. \end{aligned}$$

Now, a direct application of the delta method concludes the proof.
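
For the reader's convenience, the delta-method step takes the standard textbook form (stated here generically under the usual regularity conditions; the precise formulation of Proposition 2.1 is given in Sect. 2.3 of the paper and is not reproduced in this excerpt):

$$\begin{aligned} \sqrt{n}\left(h_{ikk'}(\hat{\boldsymbol{\Psi}}) - h_{ikk'}(\boldsymbol{\Psi})\right) \xrightarrow{d} N\left(0,\; \nabla h_{ikk'}(\boldsymbol{\Psi})^{T}\, \mathcal{I}^{-1}(\boldsymbol{\Psi})\, \nabla h_{ikk'}(\boldsymbol{\Psi})\right), \end{aligned}$$

where \(\hat{\boldsymbol{\Psi}}\) denotes the maximum likelihood estimator of \(\boldsymbol{\Psi}\) and \(\mathcal{I}(\boldsymbol{\Psi})\) is the Fisher information matrix associated with a single observation. The resulting asymptotic variance of \(h_{ikk'}(\hat{\boldsymbol{\Psi}})\) is what makes the significance assessment of individual assignments possible.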

1.2 Additional illustrations for the simulation study from Sect. 4.1

Figure 3 shows the relationship between the proportions of correct and incorrect classifications and the overlap value. In Fig. 11, we provide an additional display that plots these proportions against the sample size. As can be seen, for a given level of overlap, the proportion of correct significant classifications improves dramatically as the sample size increases, while the proportion of correct classifications shows only a mild improvement.


Cite this article

Zhu, X., Melnykov, V. Probabilistic assessment of model-based clustering. Adv Data Anal Classif 9, 395–422 (2015). https://doi.org/10.1007/s11634-015-0215-9

