Abstract
Mixture models with (multivariate) Gaussian components are a popular tool in model-based clustering. Such models are often fitted by maximum likelihood, typically via the EM algorithm. At convergence, the maximum likelihood parameter estimates are reported, but in most cases little emphasis is placed on the variability associated with these estimates. In part this may be because standard errors are not directly calculated by the model-fitting algorithm, either because they are not required to fit the model or because they are difficult to compute. The examination of standard errors in model-based clustering is therefore typically neglected. Sampling-based methods, such as the jackknife (JK), bootstrap (BS) and parametric bootstrap (PB), are intuitive, generalizable approaches to assessing parameter uncertainty in model-based clustering using a Gaussian mixture model. This paper provides a review and empirical comparison of the jackknife, bootstrap and parametric bootstrap methods for producing standard errors and confidence intervals for mixture parameters. However, the performance of these sampling methods in the presence of small and/or overlapping clusters requires consideration; here the weighted likelihood bootstrap (WLBS) approach is demonstrated to be effective in addressing this concern in a model-based clustering framework. The JK, BS, PB and WLBS methods are illustrated and contrasted through simulation studies and through the well-known Old Faithful and Thyroid data sets. The MclustBootstrap function, available in the most recent release of the popular R package mclust, facilitates the implementation of the JK, BS, PB and WLBS approaches to estimating parameter uncertainty in the context of model-based clustering.
The JK, WLBS and PB approaches to variance estimation are shown to be robust and to provide good coverage across a range of real and simulated data sets when performing model-based clustering, but care is advised when using the BS in such settings. When model fit is poor (for example, for data with small and/or overlapping clusters), the JK and BS suffer because the specified model cannot be fitted in many of the sub-samples formed. The PB also suffers when model fit is poor, since it relies on data sets simulated from the fitted model as the basis for the variance estimation calculations. However, the WLBS generally provides a robust solution, because all observations are represented with some weight in each of the sub-samples formed under this approach.
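The mechanism behind this robustness can be illustrated with a short sketch (illustrative only, not taken from the paper's code): rather than resampling observations with replacement, each WLBS replicate retains every observation and assigns it a random positive weight, for example uniform Dirichlet weights obtained by standardising independent exponential draws as in Newton and Raftery (1994).

```r
## Illustrative WLBS weight generation (an assumption-based sketch):
## each replicate keeps all n observations, each with a positive weight.
set.seed(1)
n <- 10
w <- rexp(n)          # independent Exp(1) draws
w <- n * w / sum(w)   # standardise: Dirichlet weights rescaled to average 1
w                     # every entry is strictly positive
all(w > 0)            # TRUE: no observation is dropped, unlike the BS/JK
```

By contrast, a standard bootstrap replicate assigns integer multinomial weights, so a substantial fraction of observations (about 37% for large n) receive weight zero; a small cluster can then vanish entirely from a replicate, which is what drives the model-fitting failures noted above.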
This work is supported by the Insight Research Centre (SFI/12/RC/2289) and Science Foundation Ireland under the Research Frontiers Programme (2007/RFP/MATH281).
Appendices
Appendix A: Pairs plots of a simulated data set from Simulation Setting Three
Simulation Setting Three explores the performance and computational features of the JK, BS, PB and WLBS approaches to parameter variance estimation in a higher dimensional setting featuring overlapping and small clusters. Figures 7, 8 and 9 provide pairs plots from a single simulated data set under this setting for which \(n = 500\), \(p = 25\) and \(G = 5\). Each of the different colours/symbols in the plots denotes one of the 5 distinct clusters of observations simulated.
Appendix B: Covariance parameter estimates and standard errors for the Thyroid data
Cluster covariance parameter estimates (with associated standard errors) are presented below, obtained using the jackknife (JK), bootstrap (BS), parametric bootstrap (PB) and weighted likelihood bootstrap (WLBS) methods, for the optimal mixture of Gaussians model fitted to groups 1, 2 and 3 of the Thyroid data in turn, where \(G = 3\) and \(p = 5\) and the optimal model has unequal diagonal covariance structure across clusters.
The following code produces all variance estimation results for the Thyroid data set, using the MclustBootstrap function in mclust.
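The script itself is not reproduced in this excerpt; the following is a minimal sketch of the workflow, assuming the thyroid data set distributed with mclust (the column selection, nboot value and modelNames argument are illustrative assumptions, not the paper's exact script).

```r
## Sketch of the MclustBootstrap workflow for the Thyroid data
## (assumes the thyroid data set shipped with mclust; nboot and
## modelNames are illustrative choices).
library(mclust)

data(thyroid)
X <- thyroid[, -1]   # drop the Diagnosis label, keep the 5 measurements

## Fit the mixture of Gaussians; the text reports G = 3 with unequal
## diagonal covariance structure ("VVI" in mclust's nomenclature)
mod <- Mclust(X, G = 3, modelNames = "VVI")

## Resampling-based variance estimation for each approach
boot.bs   <- MclustBootstrap(mod, nboot = 999, type = "bs")    # bootstrap
boot.wlbs <- MclustBootstrap(mod, nboot = 999, type = "wlbs")  # weighted likelihood bootstrap
boot.pb   <- MclustBootstrap(mod, nboot = 999, type = "pb")    # parametric bootstrap
boot.jk   <- MclustBootstrap(mod, type = "jk")                 # jackknife: one fit per left-out case

## Standard errors and percentile confidence intervals
summary(boot.wlbs, what = "se")
summary(boot.wlbs, what = "ci")
```

The same summary calls apply to each of the four fitted objects, yielding the standard errors and intervals reported in the tables above.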
O’Hagan, A., Murphy, T.B., Scrucca, L. et al. Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap. Comput Stat 34, 1779–1813 (2019). https://doi.org/10.1007/s00180-019-00897-9