Abstract
Factor mixture modeling (FMM) has been increasingly used in the behavioral and social sciences to examine unobserved population heterogeneity. Covariates (e.g., gender, race) are often included in FMM to help understand the formation and characterization of latent subgroups or classes. This Monte Carlo simulation study evaluated the performance of one-step and three-step approaches to covariate inclusion across three scenarios that differed in how direct covariate effects on factors were specified: correct specification (study 1), model misspecification (study 2), and model overfitting (study 3). Results showed that the performance of the two approaches was comparable when class separation was large and the covariate effects were correctly specified. However, one-step FMM had better class enumeration than the three-step approach when class separation was poor, and was more robust to misspecification or overfitting of direct covariate effects. Recommendations regarding covariate inclusion approaches are provided herein depending on class separation and sample size. A large sample size (1,000 or more) and the use of the sample size-adjusted BIC (saBIC) in class enumeration are recommended.
Data availability
The data sets generated and/or analyzed during the current study are not publicly available due to the large volume of data in Monte Carlo simulations (i.e., 200 replications per condition), but are available from the corresponding author on reasonable request.
Code availability
Example code for generating and analyzing data is available on the Open Science Framework (https://osf.io/amupe/?view_only=9cfdf249a13b4ae5b9031f713acc1ede).
Notes
Additional approaches to class enumeration are available such as the Lo-Mendell-Rubin (LMR) test and adjusted LMR test (Lo et al., 2001), and bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000). However, the former two were not examined given that previous simulation studies have shown that they do not perform as well as ICs (Henson et al., 2007; Nylund et al., 2007; Tein et al., 2013). BLRT was not considered due to the long execution time. Entropy was also not included due to the unreliable performance that has been documented in the literature (e.g., E. Kim et al., 2016; Tein et al., 2013), which was confirmed by an examination of class enumeration based on entropy for a subset of conditions (interested readers are referred to Tables S3 and S4). In addition, a major limitation with the use of entropy for class enumeration is that the one-class model cannot be compared with models with two or more classes in terms of fit.
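The information criteria and the entropy statistic discussed in this note can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation code: it assumes the usual definitions of AIC (Akaike, 1974), BIC (Schwarz, 1978), and saBIC (Sclove, 1987), together with the commonly reported relative entropy computed from posterior class probabilities. The function names, log-likelihoods, and parameter counts are hypothetical and serve only to show the mechanics.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return AIC, BIC, and sample size-adjusted BIC for one fitted model."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # saBIC replaces n with (n + 2) / 24 in the BIC penalty term
        "saBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def relative_entropy(posteriors):
    """Relative entropy from an n-by-K list of posterior class probabilities."""
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        # ln(1) = 0 in the denominator, so the statistic does not exist
        raise ValueError("relative entropy is undefined for a one-class model")
    # Total classification uncertainty; terms with p = 0 contribute nothing
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical fits: number of classes -> (log-likelihood, free parameters)
fits = {1: (-5230.4, 12), 2: (-5171.9, 19), 3: (-5166.3, 26)}
n_obs = 1000

sabic = {k: information_criteria(ll, p, n_obs)["saBIC"] for k, (ll, p) in fits.items()}
best_k = min(sabic, key=sabic.get)  # the model with the smallest saBIC is retained
print(best_k, sabic)
```

Because the denominator of relative entropy contains ln K, the statistic cannot be computed for K = 1. This is the limitation described above: entropy gives no basis for comparing the one-class model with models that have two or more classes, whereas the ICs are defined for any number of classes.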
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
Allan, N. P., MacPherson, L., Young, K. C., Lejuez, C. W., & Schmidt, N. B. (2014). Examining the latent structure of anxiety sensitivity in adolescents using factor mixture modeling. Psychological Assessment, 26(3), 741–751. https://doi.org/10.1037/a0036744
Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 329–341. https://doi.org/10.1080/10705511.2014.915181
Bakk, Z., Tekle, F. B., & Vermunt, J. K. (2013). Estimating the association between latent class membership and external variables using bias adjusted three-step approaches. Sociological Methodology, 43(1), 272–311. https://doi.org/10.1177/0081175012470644
Bauer, D. J. (2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42(4), 757–786. https://doi.org/10.1080/00273170701710338
Bernstein, A., Stickle, T. R., & Schmidt, N. B. (2013). Factor mixture model of anxiety sensitivity and anxiety psychopathology vulnerability. Journal of Affective Disorders, 149(1-3), 406–417. https://doi.org/10.1016/j.jad.2012.11.024
Cetin-Berber, D. D., & Leite, W. L. (2018). A comparison of one-step and three-step approaches for including covariates in the shared parameter growth mixture model. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 588–599. https://doi.org/10.1080/10705511.2018.1428806
Cho, S., & Cohen, A. S. (2010). A multilevel mixture IRT model with an application to DIF. Journal of Educational and Behavioral Statistics, 35(3), 336–370. https://doi.org/10.3102/1076998609353111
De Ayala, R. J., Kim, S.-H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243–276. https://doi.org/10.1080/15305058.2002.9669495
Diallo, T. M. O., & Lu, H. (2017). On the application of the three-step approach to growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 24(5), 714–732. https://doi.org/10.1080/10705511.2017.1322516
Diallo, T. M. O., Morin, A. J. S., & Lu, H. (2017). The impact of total and partial inclusion or exclusion of active and inactive time invariant covariates in growth mixture models. Psychological Methods, 22(1), 166–190. https://doi.org/10.1037/met0000084
Dimitrov, D. M., Al-Saud, F. A. A.-M., & Alsadaawi, A. S. (2015). Investigating population heterogeneity and interaction effects of covariates: The case of a large-scale assessment for teacher licensure in Saudi Arabia. Journal of Psychoeducational Assessment, 33(7), 674–686. https://doi.org/10.1177/0734282914562121
Elhai, J. D., Naifeh, J. A., Forbes, D., Ractliffe, K. C., & Tamburrino, M. (2011). Heterogeneity in clinical presentations of posttraumatic stress disorder among medical patients: Testing factor structure variation using factor mixture modeling. Journal of Traumatic Stress, 24(4), 435–443. https://doi.org/10.1002/jts.20653
Henson, J. M., Reise, S. P., & Kim, K. H. (2007). Detecting mixtures from structural model differences using latent variable mixture modeling: A comparison of relative model fit statistics. Structural Equation Modeling: A Multidisciplinary Journal, 14(2), 202–226. https://doi.org/10.1080/10705510709336744
Hu, J., Leite, W. L., & Gao, M. (2017). An evaluation of the use of covariates to assist in class enumeration in linear growth mixture modeling. Behavior Research Methods, 49(3), 1179–1190. https://doi.org/10.3758/s13428-016-0778-1
Jensen, T. M. (2017). Constellations of dyadic relationship quality in stepfamilies: A factor mixture model. Journal of Family Psychology, 31(8), 1051–1062. https://doi.org/10.1037/fam0000355
Kim, M., Vermunt, J., Bakk, Z., Jaki, T., & Van Horn, M. L. (2016). Modeling predictors of latent classes in regression mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 601–614. https://doi.org/10.1080/10705511.2016.1158655
Kim, E. S., Joo, S.-H., Lee, P., Wang, Y., & Stark, S. (2016). Measurement invariance testing across between-level latent classes using multilevel factor mixture modeling. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 870–887. https://doi.org/10.1080/10705511.2016.1196108
Kim, E. S., & Wang, Y. (2018). Investigating sources of heterogeneity with 3-step multilevel factor mixture modeling: Beyond testing measurement invariance in cross-national studies. Structural Equation Modeling: A Multidisciplinary Journal, 26(2), 165–181. https://doi.org/10.1080/10705511.2018.1521284
Lee, H., & Beretvas, S. N. (2014). Evaluation of two types of differential item functioning in factor mixture models with binary outcomes. Educational and Psychological Measurement, 74(5), 831–858. https://doi.org/10.1177/0013164414526881
Li, M., & Harring, J. R. (2017). Investigating approaches to estimating covariate effects in growth mixture modeling: A simulation study. Educational and Psychological Measurement, 77(5), 766–791. https://doi.org/10.1177/0013164416653789
Li, L., & Hser, Y.-I. (2011). On inclusion of covariates for class enumeration of growth mixture models. Multivariate Behavioral Research, 46(2), 266–302. https://doi.org/10.1080/00273171.2011.556549
Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767–778. https://doi.org/10.1093/biomet/88.3.767
Lubke, G. H., & Muthén, B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1), 21–39. https://doi.org/10.1037/1082-989X.10.1.21
Lubke, G., & Muthén, B. O. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling: A Multidisciplinary Journal, 14(1), 26–47. https://doi.org/10.1080/10705510709336735
Masyn, K. E. (2017). Measurement invariance and differential item functioning in latent class analysis with stepwise multiple indicator multiple cause modeling. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 180–197. https://doi.org/10.1080/10705511.2016.1254049
McLachlan, G., & Peel, D. (2000). Finite mixture models. Hoboken, NJ: Wiley.
Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
No, U., & Hong, S. (2018). A comparison of mixture modeling approaches in latent class models with external variables under small samples. Educational and Psychological Measurement, 78(6), 925–951. https://doi.org/10.1177/0013164417726828
Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535–569. https://doi.org/10.1080/10705510701575396
Nylund-Gibson, K., & Masyn, K. E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 782–797. https://doi.org/10.1080/10705511.2016.1221313
Park, J., & Yu, H.-T. (2018). A comparison of approaches for estimating covariate effects in nonparametric multilevel latent class models. Structural Equation Modeling: A Multidisciplinary Journal, 25(5), 778–790. https://doi.org/10.1080/10705511.2018.1448711
Piper, M. E., Bolt, D. M., Kim, S.-Y., Japuntich, S. J., Smith, S. S., Niederdeppe, J., Cannon, D. S., & Baker, T. B. (2008). Refining the tobacco dependence phenotype using the Wisconsin Inventory of Smoking Dependence Motives. Journal of Abnormal Psychology, 117(4), 747–761. https://doi.org/10.1037/a0013298
Rice, K. G., Richardson, C. M. E., & Tueller, S. (2014). The short form of the Revised Almost Perfect Scale. Journal of Personality Assessment, 96(3), 368–379. https://doi.org/10.1080/00223891.2013.838172
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3), 333–343. https://doi.org/10.1007/BF02294360
Stegmann, G., & Grimm, K. J. (2018). A new perspective on the effects of covariates in mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 25(2), 167–178. https://doi.org/10.1080/10705511.2017.1318070
Tay, L., Newman, D. A., & Vermunt, J. K. (2011). Using mixed-measurement item response theory with covariates (MM-IRT-C) to ascertain observed and unobserved measurement equivalence. Organizational Research Methods, 14(1), 147–176. https://doi.org/10.1177/1094428110366037
Tein, J., Coxe, S., & Cham, H. (2013). Statistical power to detect the correct number of classes in latent profile analysis. Structural Equation Modeling: A Multidisciplinary Journal, 20(4), 640–657. https://doi.org/10.1080/10705511.2013.824781
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. https://doi.org/10.1093/pan/mpq025
Vermunt, J. K., & Magidson, J. (2021). How to perform three-step latent class analysis in the presence of measurement non-invariance or differential item functioning. Structural Equation Modeling: A Multidisciplinary Journal, 28(3), 356–364. https://doi.org/10.1080/10705511.2020.1818084
Wang, Y., Hsu, H.-Y., & Kim, E. (2021). Investigating the impact of covariate inclusion on sample size requirements of factor mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal. Advance online publication. https://doi.org/10.1080/10705511.2021.1910036
Wang, Y., Kim, E., Ferron, J. M., Dedrick, R. F., Tan, T. X., & Stark, S. (2020). Testing measurement invariance across unobserved groups: The role of covariates in factor mixture modeling. Educational and Psychological Measurement, 81(1), 61–89. https://doi.org/10.1177/0013164420925122
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Yan Wang designed the study and worked on the simulation code. All three authors ran simulations and analyzed data. Yan Wang led the writing of the manuscript, and Chunhua Cao and Eunsook Kim both contributed to the writing.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflicts of interest/Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Open practices statement
Supplemental tables and simulation code for data generation and analysis are available on the Open Science Framework (https://osf.io/amupe/?view_only=f72fb1198c4947dbab731cebb5c416a3).
About this article
Cite this article
Wang, Y., Cao, C. & Kim, E. Covariate inclusion in factor mixture modeling: Evaluating one-step and three-step approaches under model misspecification and overfitting. Behav Res (2022). https://doi.org/10.3758/s13428-022-01964-8
DOI: https://doi.org/10.3758/s13428-022-01964-8
Keywords
- Factor mixture model
- Covariate
- One-step
- Three-step
- Model misspecification