Growth Mixture Modeling with Measurement Selection

Journal of Classification

Abstract

Growth mixture models are an important tool for detecting group structure in repeated measures data. Unlike traditional clustering methods, they explicitly model the repeated measurements on each observation, and the statistical framework on which they are based allows model selection methods to be used to choose the number of clusters. However, the basic growth mixture model assumes that all of the measurements in the data carry grouping information that separates the clusters. In other clustering contexts, it has been shown that including non-clustering variables in clustering procedures can lead to poor estimation of the group structure, both in terms of the number of clusters and in terms of cluster membership and parameters. In this paper, we present an extension of the growth mixture model that incorporates stepwise variable selection, based on the work of Maugis, Celeux, and Martin-Magniette (2009) and Raftery and Dean (2006). Results from a simulation study suggest that the method performs well in correctly selecting the clustering variables and improves recovery of the cluster structure compared with the basic growth mixture model. The paper also presents an application of the model to a clinical study dataset and concludes with a discussion and suggestions for future work in this area.

References

  • BAUDRY, J.P., RAFTERY, A.E., CELEUX, G., LO, K., and GOTTARDO, R. (2010), “Combining Mixture Components for Clustering”, Journal of Computational and Graphical Statistics, 19, 332–353.

  • BIERNACKI, C., and GOVAERT, G. (1997), “Using the Classification Likelihood to Choose the Number of Clusters”, Computing Science and Statistics, 29, 451–457.

  • BIERNACKI, C., and GOVAERT, G. (1999), “Choosing Models in Model-Based Clustering and Discriminant Analysis”, Journal of Statistical Computation and Simulation, 64, 49–71.

  • DEAN, N., and RAFTERY, A.E. (2010), “Latent Class Analysis Variable Selection”, Annals of the Institute of Statistical Mathematics, 62(1), 11–35.

  • DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data Via the EM Algorithm”, Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.

  • EVERITT, B., LANDAU, S., LEESE, M., and STAHL, D. (2011), Cluster Analysis, Wiley Series in Probability and Statistics, Chichester, UK: Wiley.

  • FRALEY, C., and RAFTERY, A.E. (1998), “How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis”, The Computer Journal, 41(8), 578–588.

  • FRALEY, C., and RAFTERY, A.E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation”, Journal of the American Statistical Association, 97, 611–631.

  • FRALEY, C., RAFTERY, A.E., MURPHY, T.B., and SCRUCCA, L. (2012), “Mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation”, Report No. 597, Department of Statistics, University of Washington.

  • GRÜN, B., and LEISCH, F. (2008), “Finite Mixtures of Generalized Linear Regression Models”, in Recent Advances in Linear Models and Related Areas: Essays in Honour of Helge Toutenburg, Physica-Verlag.

  • GUPTA, M.R., and CHEN, Y. (2011), “Theory and Use of the EM Algorithm”, Foundations and Trends® in Signal Processing, 4(3), 223–296.

  • HARTIGAN, J.A. (1975), Clustering Algorithms, Wiley.

  • HARTIGAN, J.A. (1981), “Consistency of Single Linkage for High-Density Clusters”, Journal of the American Statistical Association, 76, 388–394.

  • HENNIG, C. (2010), “Methods for Merging Gaussian Mixture Components”, Advances in Data Analysis and Classification, 4, 3–34.

  • HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2(1), 193–218.

  • JAMES, G.M., and SUGAR, C.A. (2003), “Clustering for Sparsely Sampled Functional Data”, Journal of the American Statistical Association, 98, 565–576.

  • KERIBIN, C. (2000), “Consistent Estimation of the Order of Mixture Models”, Sankhya, 62, 49–66.

  • LAZARSFELD, P.F., and HENRY, N.W. (1968), Latent Structure Analysis, Houghton Mifflin.

  • MACQUEEN, J.B. (1967), “Some Methods for Classification and Analysis of Multivariate Observations”, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press.

  • MAUGIS, C., CELEUX, G., and MARTIN-MAGNIETTE, M-L. (2009), “Variable Selection for Clustering with Gaussian Mixture Models”, Biometrics, 65(3), 701–709.

  • MCLACHLAN, G.J., and KRISHNAN, T. (2008), The EM Algorithm and Extensions, Wiley.

  • MCNICHOLAS, P.D., and SUBEDI, S. (2012), “Clustering Gene Expression Time Course Data Using Mixtures of Multivariate t-Distributions”, Journal of Statistical Planning and Inference, 142(5), 1114–1127.

  • MELNYKOV, V. (2016), “Merging Mixture Components for Clustering Through Pairwise Overlap”, Journal of Computational and Graphical Statistics, 24(1), 66–90.

  • MURPHY, T.B., DEAN, N., and RAFTERY, A.E. (2010), “Variable Selection and Updating in Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications”, Annals of Applied Statistics, 4, 396–421.

  • MUTHÉN, B., and SHEDDEN, K. (1999), “Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm”, Biometrics, 55(2), 463–469.

  • PEARSON, K. (1894), “Contributions to the Mathematical Theory of Evolution”, Philosophical Transactions of the Royal Society of London, Series A, 185, 71–110.

  • R CORE TEAM (2015), “R: A Language and Environment for Statistical Computing”, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.

  • RAFTERY, A.E. (1995), “Bayesian Model Selection in Social Research (With Discussion)”, Sociological Methodology, 25, 111–196.

  • RAFTERY, A.E., and DEAN, N. (2006), “Variable Selection for Model-Based Clustering”, Journal of the American Statistical Association, 101(473), 168–178.

  • RAM, N., and GRIMM, K.J. (2009), “Methods and Measures: Growth Mixture Modeling: A Method for Identifying Differences in Longitudinal Change Among Unobserved Groups”, International Journal of Behavioral Development, 33(6), 565–576.

  • RUSAKOV, D., and GEIGER, D. (2005), “Asymptotic Model Selection for Naive Bayesian Networks”, Journal of Machine Learning Research, 6, 1–35.

  • SCHWARZ, G.E. (1978), “Estimating the Dimension of a Model”, Annals of Statistics, 6(2), 461–464.

  • SCRUCCA, L. (2016), “Identifying Connected Components in Gaussian Finite Mixture Models for Clustering”, Computational Statistics and Data Analysis, 93, 5–17.

  • STEEL, R.G.D., and TORRIE, J.H. (1960), Principles and Procedures of Statistics with Special Reference to the Biological Sciences, McGraw Hill.

  • THASE, M.E., GREENHOUSE, J.B., FRANK, E., REYNOLDS, C.F., PILKONIS, P.A., HURLEY, K., GROCHOCINSKI, V., and KUPFER, D.J. (1997), “Treatment of Major Depression with Psychotherapy or Psychotherapy-Pharmacotherapy Combinations”, Archives of General Psychiatry, 54(11), 1009–1015.

  • TITTERINGTON, D.M., SMITH, A.F.M., and MAKOV, U.E. (1985), Statistical Analysis of Finite Mixture Distributions (Vol. 7), New York: Wiley.

  • WARD, J.H. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58(301), 236–244.

  • WISHART, D. (1969), “Mode Analysis: A Generalization of Nearest Neighbor Which Reduces Chaining Effects”, in Numerical Taxonomy, ed. A.J. Cole, Academic Press, pp. 282–311.

Author information

Corresponding author

Correspondence to Abby Flynt.

About this article

Cite this article

Flynt, A., Dean, N. Growth Mixture Modeling with Measurement Selection. J Classif 36, 3–25 (2019). https://doi.org/10.1007/s00357-018-9275-9

  • DOI: https://doi.org/10.1007/s00357-018-9275-9
