Abstract
Latent class analysis is used to perform model-based clustering for multivariate categorical responses. Selection of the variables most relevant for clustering is an important task which can affect the quality of clustering considerably. This work considers a Bayesian approach for selecting the number of clusters and the best clustering variables. The main idea is to reformulate the problem of group and variable selection as a probabilistically driven search over a large discrete space using Markov chain Monte Carlo (MCMC) methods. Both selection tasks are carried out simultaneously using an MCMC approach based on a collapsed Gibbs sampling method, whereby several model parameters are integrated out of the model, substantially improving computational performance. Post-hoc procedures for parameter and uncertainty estimation are outlined. The approach is tested on simulated and real data.
References
Aitkin, M., Anderson, D., Hinde, J.: Statistical modelling of data on teaching styles. J. R. Stat. Soc. Ser. A 144, 419–461 (1981)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Bartholomew, D.J., Knott, M.: Latent Variable Models and Factor Analysis, 2nd edn. Kendall’s Library of Statistics, Hodder Arnold (1999)
Bennet, N.: Teaching Styles and Pupil Progress. Open Books, London (1976)
Bensmail, H., Celeux, G., Raftery, A., Robert, C.: Inference in model-based cluster analysis. Stat. Comput. 7, 1–10 (1997)
Cappé, O., Robert, C.P., Rydén, T.: Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 65(3), 679–700 (2003)
Carpaneto, G., Toth, P.: Algorithm 548: solution of the assignment problem [H]. ACM Trans. Math. Softw. 6, 104–111 (1980)
Celeux, G., Hurn, M., Robert, C.P.: Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc. 95, 957–970 (2000)
Celeux, G., Forbes, F., Robert, C.P., Titterington, D.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651–673 (2006)
Chopin, N., Robert, C.P.: Properties of nested sampling. Biometrika 97(3), 741–755 (2010)
Dean, N., Raftery, A.E.: Latent class analysis variable selection. Ann. Inst. Stat. Math. 62, 11–35 (2010)
Dellaportas, P., Papageorgiou, I.: Multivariate mixtures of normals with unknown number of components. Stat. Comput. 16, 57–68 (2006)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from incomplete data via the EM Algorithm. J. R. Stat. Soc. B 39, 1–38 (1977)
Fraley, C., Raftery, A.: Model-based methods of classification: using the mclust software in chemometrics. J. Stat. Softw. 18, 1–13 (2007)
Frühwirth-Schnatter, S.: Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econom. J. 7(1), 143–167 (2004)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models: Modeling and Applications to Random Processes. Springer, Berlin (2006)
Garrett, E.S., Zeger, S.L.: Latent class model diagnosis. Biometrics 56, 1055–1067 (2000)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Geweke, J.: Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57(6), 1317–1339 (1989)
Gollini, I., Murphy, T.B.: Mixture of latent trait analyzers for model-based clustering of categorical data. Stat. Comput. (to appear) (2013)
Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215–231 (1974)
Green, P.: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Ley, E., Steel, M.F.J.: On the effect of prior assumptions in Bayesian model averaging with applications to growth regression. J. Appl. Econom. 24, 651–674 (2009)
Marin, J.M., Mengersen, K., Robert, C.P.: Bayesian modelling and inference on mixtures of distributions. In: Dey, D., Rao, C. (eds.) Bayesian Thinking: Modeling and Computation, vol. 25, 1st edn, chap. 16, pp. 459–507. Handbook of Statistics, North Holland, Amsterdam (2005)
McDaid, A.F., Murphy, T.B., Friel, N., Hurley, N.: Improved Bayesian inference for the stochastic block model with application to large networks. Comput. Stat. & Data Anal. 60, 12–31 (2013)
McLachlan, G., Peel, D.: Finite Mixture Models. John Wiley & Sons, New York (2002)
Meng, X.L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat. Sin. 6, 831–860 (1996)
Moran, M., Walsh, C., Lynch, A., Coen, R.F., Coakley, D., Lawlor, B.A.: Syndromes of behavioural and psychological symptoms in mild Alzheimer's disease. Int. J. Geriatr. Psychiatry 19, 359–364 (2004)
Newton, M.A., Raftery, A.E.: Approximate Bayesian inference with the weighted likelihood bootstrap. J. R. Stat. Soc. Ser. B (Methodol.) 56(1), 3–48 (1994)
Nobile, A.: Bayesian finite mixtures: a note on prior specification and posterior computation. Tech. Rep. 05–3, University of Glasgow, Glasgow, UK (2005)
Nobile, A., Fearnside, A.: Bayesian finite mixtures with an unknown number of components: the allocation sampler. Stat. Comput. 17, 147–162 (2007)
Pan, J.C., Huang, G.H.: Bayesian inferences of latent class models with an unknown number of classes. Psychometrika, pp. 1–26 (2013)
Pandolfi, S., Bartolucci, F., Friel, N.: A generalized multiple-try version of the reversible jump algorithm. Comput. Stat. & Data Anal. 72, 298–314 (2014)
Plummer, M., Best, N., Cowles, K., Vines, K.: CODA: convergence diagnosis and output analysis for MCMC. R News 6, 7–11 (2006)
R Core Team.: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2013). http://www.R-project.org/
Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc. 101, 168–178 (2006)
Raftery, A.E., Newton, M.A., Satagopan, J.M., Krivitsky, P.N.: Estimating the integrated likelihood via posterior simulation using the harmonic mean identity (with discussion). In: Bernardo, J., Bayarri, M., Berger, J., Dawid, A., Heckerman, D., Smith, A., West, M. (eds.) Bayesian Statistics, vol. 8, pp. 1–45. Oxford University Press, Oxford (2007)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. Ser. B (Stat. Methodol.) 59, 731–792 (1997)
Rousseau, J., Mengersen, K.: Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73, 689–710 (2011)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Smart, K.M., Blake, C., Staines, A., Doody, C.: The discriminative validity of "nociceptive", "peripheral neuropathic", and "central sensitization" as mechanisms-based classifications of musculoskeletal pain. Clin. J. Pain 27, 655–663 (2011)
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 64, 583–639 (2002)
Stephens, M.: Bayesian analysis of mixture models with an unknown number of components: an alternative to reversible jump methods. Ann. Stat. 28(1), 40–74 (2000a)
Stephens, M.: Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B 62, 795–809 (2000b)
Tadesse, M.G., Sha, N., Vannucci, M.: Bayesian variable selection in clustering high-dimensional data. J. Am. Stat. Assoc. 100, 602–617 (2005)
Walsh, C.: Latent class analysis identification of syndromes in Alzheimer's disease: a Bayesian approach. Metodol. Zvezki Adv. Methodol. Stat. 3, 147–162 (2006)
White, A., Murphy, B.: BayesLCA: Bayesian Latent Class Analysis (2013). http://CRAN.R-project.org/package=BayesLCA, R package version 1.3
Wyse, J., Friel, N.: Block clustering with collapsed latent block models. Stat. Comput. 22, 415–428 (2012)
Acknowledgments
The work of Arthur White and Thomas Brendan Murphy was partly supported by Science Foundation Ireland under the Clique Strategic Research Cluster (08/SRC/I1407), while the work of Jason Wyse was partly carried out at the Insight Centre for Data Analytics, which is supported by Science Foundation Ireland [SFI/12/RC/2289].
Appendix: Comparison with reversible jump MCMC
In this section we investigate how the performance of a collapsed Gibbs sampler compares with a reversible jump MCMC (RJMCMC) sampler that retains all model parameters. We divide this investigation into two tasks: selecting (a) the number of classes, and (b) which variables to include. We implement the approach of Pan and Huang (2013) using already available software to investigate the efficacy of RJMCMC for the former task, and outline our own approach to perform the latter for the case where the observed data is binary only. We find that the RJMCMC approach performs reasonably well when selecting the number of classes, although it is slower than the collapsed sampler, and that it performs poorly when performing variable selection.
1.1 Number of classes
To identify the number of groups in a dataset using RJMCMC methods, we apply software implementing the approach of Pan and Huang (2013). We applied the software to the binary and non-binary Dean and Raftery datasets described in Sect. 2.2, running the sampler for 100,000 iterations in both cases. All priors were left at their default settings. In both cases, the non-informative variables were removed, since the sole task was to identify the correct number of classes.
As the software for this approach is implemented as a C++ program, its runtime can be considered broadly comparable to that of our own collapsed sampler, which is implemented in C. For the binary and non-binary datasets, the software took roughly 25 and 90 min to run, respectively, on the same hardware described previously. In both cases this was markedly longer than the collapsed sampler, despite the fact that the model was exploring the group space only and the dimension of the data had been reduced.
The results from the samplers are shown in Table 14. In the case of the binary data, the correct number of groups is chosen as the most likely candidate, although with a lower posterior probability than under the collapsed sampler. In the case of the non-binary data, \(G = 2\) is incorrectly chosen as the most likely candidate, with some uncertainty surrounding which model is the most suitable.
1.2 Variable selection
Recall that for the variable inclusion/exclusion step, a variable \({m^*}\) is selected at random from \(1, \ldots , M.\) An inclusion or exclusion move is then proposed, based on the current status of the variable. In what follows, we assume that the state space has \(G\) groups, and that the data is binary, so that \(X_{nm} \in \{0, 1\}\), for all \(n = 1, \ldots , N\) and \(m = 1, \ldots , M.\)
1.2.1 Inclusion step
Suppose we select a variable \({m^*},\) which is currently excluded from the model. For the inclusion step, dropping the variable index, we propose the following move:
1. Generate \(u_1, \ldots , u_{G-1} \sim \hbox {Uniform}(-\epsilon , \epsilon ),\) and set \(u_{G} = - \sum ^{G-1}_{i=1} u_i.\)
2. Set
$$\begin{aligned} \log \left( \frac{\theta _{1}}{1 - \theta _{1}}\right) = \log \left( \frac{\beta }{1 - \beta }\right) + u_1, \end{aligned}$$
which is equivalent to setting
$$\begin{aligned} \theta _1 = \frac{\beta e^{u_1}}{1 + \beta (e^{u_1} - 1)}. \end{aligned}$$
Similarly, for \(g = 2, \ldots , G,\) set:
$$\begin{aligned} \theta _g = \frac{\beta e^{u_g}}{1 + \beta (e^{u_g} - 1)}. \end{aligned}$$
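The two steps above can be sketched in code. The following is a minimal illustration (function and variable names are hypothetical, not from the paper's implementation): it draws the auxiliary values \(u\) subject to the zero-sum constraint and applies the logit-shift transformation to obtain the proposed class-specific parameters.

```python
import numpy as np

def propose_inclusion(beta, G, epsilon, rng):
    """Sketch of the inclusion proposal: from the common parameter beta
    of an excluded binary variable, propose class-specific theta_1..theta_G."""
    # Step 1: draw u_1, ..., u_{G-1} uniformly on (-epsilon, epsilon)
    # and set u_G so that the u's sum to zero.
    u = rng.uniform(-epsilon, epsilon, size=G - 1)
    u = np.append(u, -u.sum())
    # Step 2: shift the logit of beta by u_g, i.e.
    # theta_g = beta * exp(u_g) / (1 + beta * (exp(u_g) - 1)).
    theta = beta * np.exp(u) / (1.0 + beta * (np.exp(u) - 1.0))
    return theta, u

rng = np.random.default_rng(7)
theta, u = propose_inclusion(0.3, 4, 1.0, rng)
```

Note that, by construction, \(\hbox {logit}(\theta _g) = \hbox {logit}(\beta ) + u_g\) for every class, so each proposed \(\theta _g\) stays in \((0, 1)\).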
Then the proposed move is accepted with probability \(\alpha = \min \{1, A\},\) where
$$\begin{aligned} A = \frac{p(\mathbf {X} \mid \mathbf {Z}, \xi ^*)\, p(\xi ^*)\, p(\xi ^* \rightarrow \xi )}{p(\mathbf {X} \mid \mathbf {Z}, \xi )\, p(\xi )\, p(\xi \rightarrow \xi ^*)\, q(\mathbf {u})} \left| \det {\mathcal J} \right| , \end{aligned}$$
where \(q(\mathbf {u}) = (2\epsilon )^{-(G-1)}\) is the density of the proposed \(u_1, \ldots , u_{G-1},\) the ratio of likelihood contributions for the candidate variable is
$$\begin{aligned} \frac{\prod ^G_{g=1} \theta _g^{S_{g{m^*}}} (1 - \theta _g)^{S^C_{g{m^*}}}}{\beta ^{N_{m^*}} (1 - \beta )^{N - N_{m^*}}}, \end{aligned}$$
and we define \(S_{g{m^*}} = \sum ^N_{n=1} X_{nm^*}Z_{ng},\) \(S^C_{g{m^*}} = \sum ^N_{n=1} (1 - X_{nm^*})Z_{ng},\) and \(N_{m^*} = \sum ^N_{n=1} X_{nm^*}.\) Here we use \(p(\xi \rightarrow \xi ^*) = 1/M\) to denote the probability of the proposed move. Finally, the Jacobian \({\mathcal J}\) is defined as \({\mathcal J}_{1g} = \frac{\partial \theta _{gm^*}}{\partial \rho _{m^*}},\) and \({\mathcal J}_{kg} = \frac{\partial \theta _{gm^*}}{\partial u_{k-1}},\) for \(g = 1, \dots , G \hbox { and } k = 2, \dots , G.\)
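The sufficient statistics \(S_{g{m^*}}\), \(S^C_{g{m^*}}\) and \(N_{m^*}\) defined above are simple cross-tabulations of the candidate variable against the allocation matrix. A short illustration (hypothetical helper name, assuming \(\mathbf {Z}\) is stored as an \(N \times G\) one-hot matrix):

```python
import numpy as np

def variable_stats(X, Z, m_star):
    """Compute S_{g m*}, S^C_{g m*} and N_{m*} for a candidate binary
    variable m_star, where Z is the N x G one-hot allocation matrix."""
    x = X[:, m_star]          # binary responses for variable m*
    S = Z.T @ x               # S_{g m*}: count of ones within each class
    Sc = Z.T @ (1 - x)        # S^C_{g m*}: count of zeros within each class
    N_m = x.sum()             # N_{m*}: total count of ones
    return S, Sc, N_m

# Four observations, two classes: the first two belong to class 1.
X = np.array([[1], [0], [1], [1]])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
S, Sc, N_m = variable_stats(X, Z, 0)
```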
1.2.2 Exclusion step
If the variable \({m^*}\) is currently included in the model, we propose the exclusion step
$$\begin{aligned} \log \left( \frac{\rho }{1 - \rho }\right) = \frac{1}{G} \sum ^G_{g=1} \log \left( \frac{\theta _g}{1 - \theta _g}\right) , \end{aligned}$$
where again we have dropped the variable index. Using this expression, we then obtain
$$\begin{aligned} u_g = \log \left( \frac{\theta _g}{1 - \theta _g}\right) - \log \left( \frac{\rho }{1 - \rho }\right) , \end{aligned}$$
for \(g = 1, \ldots , G-1\), demonstrating the required bijection between \({\varvec{\theta }}_{m^*}\) and \((\rho _{m^*}, \mathbf {u})\). The proposed move is again accepted with probability \(\alpha = \min \{1, A^{-1}\},\) where the calculations are inverted, so that the likelihood, prior and proposal ratios appear with their roles reversed and the Jacobian enters as \(\left| \det {\mathcal J} \right| ^{-1}.\) The probability of the proposed move remains \(p(\xi \rightarrow \xi ^*) = 1/M\).
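As a numerical check on the bijection, the sketch below (hypothetical helper names) maps \({\varvec{\theta }}\) back to \((\rho , \mathbf {u})\) by placing \(\hbox {logit}(\rho )\) at the mean of the \(\hbox {logit}(\theta _g)\), an assumption consistent with the zero-sum constraint on \(\mathbf {u}\) in the inclusion step, and verifies that the inclusion transformation recovers \({\varvec{\theta }}\) exactly.

```python
import numpy as np

def propose_exclusion(theta):
    """Map theta_1..theta_G back to (rho, u). Placing logit(rho) at the
    mean of the logit(theta_g) forces sum_g u_g = 0, matching the
    constraint used in the inclusion step (an assumption of this sketch)."""
    logit = np.log(theta / (1.0 - theta))
    m = logit.mean()                      # logit of the proposed rho
    rho = 1.0 / (1.0 + np.exp(-m))
    u = logit - m                         # u_1, ..., u_G, summing to zero
    return rho, u

# Round trip: applying the inclusion map to (rho, u) recovers theta.
theta = np.array([0.2, 0.5, 0.7])
rho, u = propose_exclusion(theta)
theta_back = rho * np.exp(u) / (1.0 + rho * (np.exp(u) - 1.0))
```

The round trip holds because \(\hbox {logit}(\theta ^{\text {back}}_g) = \hbox {logit}(\rho ) + u_g = \hbox {logit}(\theta _g)\).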
1.2.3 Dean and Raftery data application
We apply this approach to the binary Dean and Raftery dataset described previously in Sect. 2.2. Here, we fix the number of groups to the true value \(G = 2\), so that the model search is based on variable selection only. The sampler was run for 50,000 iterations, with \(\epsilon = 1\), which resulted in an average acceptance rate for the inclusion/exclusion moves of approximately 0.12.
The posterior probabilities for variable inclusion from the sampler are shown in Table 15. None of the informative variables is selected as frequently as under the collapsed sampler: the model finds only weak evidence for variable 1, and fails to distinguish between the other variables.
White, A., Wyse, J. & Murphy, T.B. Bayesian variable selection for latent class analysis using a collapsed Gibbs sampler. Stat Comput 26, 511–527 (2016). https://doi.org/10.1007/s11222-014-9542-5