Abstract
Growth mixture models are an important tool for detecting group structure in repeated measures data. Unlike traditional clustering methods, they explicitly model the repeated measurements on each observation, and the statistical framework on which they are based allows model selection methods to be used to choose the number of clusters. However, the basic growth mixture model assumes that all of the measurements in the data carry grouping information that separates the clusters. In other clustering contexts, it has been shown that including non-clustering variables in clustering procedures can lead to poor estimation of the group structure, both in the number of clusters and in cluster membership and parameters. In this paper, we present an extension of the growth mixture model that incorporates stepwise variable selection, based on the work of Maugis, Celeux, and Martin-Magniette (2009) and Raftery and Dean (2006). Results from a simulation study suggest that the method performs well in correctly selecting the clustering variables and improves recovery of the cluster structure compared with the basic growth mixture model. The paper also presents an application of the model to a clinical study dataset and concludes with a discussion and suggestions for future work in this area.
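For orientation, the stepwise selection cited above can be illustrated by the variable-inclusion criterion of Raftery and Dean (2006) for ordinary Gaussian mixture clustering. This is a sketch of that earlier criterion, not the growth mixture extension developed in this paper; how the paper adapts it to repeated measurements is not shown here. The notation is an assumption chosen for illustration: $Y^{(1)}$ is the set of currently selected clustering variables, $Y^{(3)}$ is a single candidate variable, and BIC is taken as $2\log L - d\log n$, so larger values are better.

```latex
% Sketch of the Raftery and Dean (2006) inclusion criterion (notation assumed)
% BIC = 2 log(maximized likelihood) - (number of free parameters) log(n); larger is better
\mathrm{BIC}_{\mathrm{diff}}\bigl(Y^{(3)}\bigr)
  = \mathrm{BIC}_{\mathrm{clust}}\bigl(Y^{(3)}, Y^{(1)}\bigr)
  - \Bigl[\, \mathrm{BIC}_{\mathrm{clust}}\bigl(Y^{(1)}\bigr)
  + \mathrm{BIC}_{\mathrm{reg}}\bigl(Y^{(3)} \mid Y^{(1)}\bigr) \Bigr]
% BIC_diff > 0 : evidence that Y^{(3)} carries clustering information
% BIC_diff <= 0: Y^{(3)} is better explained by a regression on Y^{(1)}
```

In the greedy search of Raftery and Dean (2006), an inclusion step adds the candidate with the largest difference if that difference is positive, and an exclusion step removes the selected variable with the smallest difference if that difference is negative; the steps alternate until neither changes the selected set. Maugis, Celeux, and Martin-Magniette (2009) refine the regression term so that the candidate may depend on only a subset of $Y^{(1)}$.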
References
BAUDRY, J.P., RAFTERY, A.E., CELEUX, G., LO, K., and GOTTARDO, R. (2010), “Combining Mixture Components for Clustering”, Journal of Computational and Graphical Statistics, 19, 332–353.
BIERNACKI, C., and GOVAERT, G. (1997), “Using the Classification Likelihood to Choose the Number of Clusters”, Computing Science and Statistics, 29, 451–457.
BIERNACKI, C., and GOVAERT, G. (1999), “Choosing Models in Model-Based Clustering and Discriminant Analysis”, Journal of Statistical Computation and Simulation, 64, 49–71.
DEAN, N., and RAFTERY, A.E. (2010), “Latent Class Analysis Variable Selection”, Annals of the Institute of Statistical Mathematics, 62(1), 11–35.
DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data Via the EM Algorithm”, Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
EVERITT, B., LANDAU, S., LEESE, M., and STAHL, D. (2011), Cluster Analysis, Wiley Series in Probability and Statistics, Chichester, UK: Wiley.
FRALEY, C., and RAFTERY, A.E. (1998), “How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis”, The Computer Journal, 41(8), 578–588.
FRALEY, C., and RAFTERY, A.E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation”, Journal of the American Statistical Association, 97, 611–631.
FRALEY, C., RAFTERY, A.E., MURPHY, T.B., and SCRUCCA, L. (2012), “Mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation”, Report No. 597, Department of Statistics, University of Washington.
GRÜN, B., and LEISCH, F. (2008), “Finite Mixtures of Generalized Linear Regression Models”, in Recent Advances in Linear Models and Related Areas: Essays in Honour of Helge Toutenburg, Physica-Verlag.
GUPTA, M.R., and CHEN, Y. (2011), “Theory and Use of the EM Algorithm”, Foundations and Trends® in Signal Processing, 4(3), 223–296.
HARTIGAN, J.A. (1975), Clustering Algorithms, Wiley.
HARTIGAN, J.A. (1981), “Consistency of Single Linkage for High-Density Clusters”, Journal of the American Statistical Association, 76, 388–394.
HENNIG, C. (2010), “Methods for Merging Gaussian Mixture Components”, Advances in Data Analysis and Classification, 4, 3–34.
HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2(1), 193–218.
JAMES, G.M., and SUGAR, C.A. (2003), “Clustering for Sparsely Sampled Functional Data”, Journal of the American Statistical Association, 98, 397–408.
KERIBIN, C. (2000), “Consistent Estimation of the Order of Mixture Models”, Sankhya, 62, 49–66.
LAZARSFELD, P.F., and HENRY, N.W. (1968), Latent Structure Analysis, Houghton Mifflin.
MACQUEEN, J.B. (1967), “Some Methods for Classification and Analysis of Multivariate Observations”, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1), University of California Press, pp. 281–297.
MAUGIS, C., CELEUX, G., and MARTIN-MAGNIETTE, M.-L. (2009), “Variable Selection for Clustering with Gaussian Mixture Models”, Biometrics, 65(3), 701–709.
MCLACHLAN, G.J., and KRISHNAN, T. (2008), The EM Algorithm and Extensions, Wiley.
MCNICHOLAS, P.D., and SUBEDI, S. (2012), “Clustering Gene Expression Time Course Data Using Mixtures of Multivariate t-Distributions”, Journal of Statistical Planning and Inference, 142(5), 1114–1127.
MELNYKOV, V. (2016), “Merging Mixture Components for Clustering Through Pairwise Overlap”, Journal of Computational and Graphical Statistics, 25(1), 66–90.
MURPHY, T.B., DEAN, N., and RAFTERY, A.E. (2010), “Variable Selection and Updating in Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications”, Annals of Applied Statistics, 4, 396–421.
MUTHÉN, B., and SHEDDEN, K. (1999), “Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm”, Biometrics, 55(2), 463–469.
PEARSON, K. (1894), “Contributions to the Mathematical Theory of Evolution”, Philosophical Transactions of the Royal Society of London, Series A, 185, 71–110.
R CORE TEAM (2015), “R: A Language and Environment for Statistical Computing”, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
RAFTERY, A.E. (1995), “Bayesian Model Selection in Social Research (With Discussion)”, Sociological Methodology, 25, 111–196.
RAFTERY, A.E., and DEAN, N. (2006), “Variable Selection for Model-Based Clustering”, Journal of the American Statistical Association, 101(473), 168–178.
RAM, N., and GRIMM, K.J. (2009), “Methods and Measures: Growth Mixture Modeling: A Method for Identifying Differences in Longitudinal Change Among Unobserved Groups”, International Journal of Behavioral Development, 33(6), 565–576.
RUSAKOV, D., and GEIGER, D. (2005), “Asymptotic Model Selection for Naive Bayesian Networks”, Journal of Machine Learning Research, 6, 1–35.
SCHWARZ, G.E. (1978), “Estimating the Dimension of a Model”, Annals of Statistics, 6(2), 461–464.
SCRUCCA, L. (2016), “Identifying Connected Components in Gaussian Finite Mixture Models for Clustering”, Computational Statistics and Data Analysis, 93, 5–17.
STEEL, R.G.D., and TORRIE, J.H. (1960), Principles and Procedures of Statistics with Special Reference to the Biological Sciences, McGraw Hill.
THASE, M.E., GREENHOUSE, J.B., FRANK, E., REYNOLDS, C.F., PILKONIS, P.A., HURLEY, K., GROCHOCINSKI, V., and KUPFER, D.J. (1997), “Treatment of Major Depression with Psychotherapy or Psychotherapy-Pharmacotherapy Combinations”, Archives of General Psychiatry, 54(11), 1009–1015.
TITTERINGTON, D.M., SMITH, A.F.M., and MAKOV, U.E. (1985), Statistical Analysis of Finite Mixture Distributions (Vol. 7), New York: Wiley.
WARD, J.H. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58(301), 236–244.
WISHART, D. (1969), “Mode Analysis: A Generalization of Nearest Neighbor Which Reduces Chaining Effects”, in Numerical Taxonomy, ed. A.J. Cole, Academic Press, pp. 282–311.
Cite this article
Flynt, A., Dean, N. Growth Mixture Modeling with Measurement Selection. J Classif 36, 3–25 (2019). https://doi.org/10.1007/s00357-018-9275-9
DOI: https://doi.org/10.1007/s00357-018-9275-9