Covariate inclusion in factor mixture modeling: Evaluating one-step and three-step approaches under model misspecification and overfitting

Abstract

Factor mixture modeling (FMM) has been increasingly used in the behavioral and social sciences to examine unobserved population heterogeneity. Covariates (e.g., gender, race) are often included in FMM to help explain how latent subgroups, or classes, form and what characterizes them. This Monte Carlo simulation study evaluated the performance of one-step and three-step approaches to covariate inclusion under three scenarios concerning the direct effects of covariates on factors: correct specification (study 1), misspecification (study 2), and overfitting (study 3). The two approaches performed comparably when class separation was large and the direct covariate effects were correctly specified. However, one-step FMM yielded more accurate class enumeration than the three-step approach when class separation was poor, and it was more robust to misspecified or overfitted direct covariate effects. We provide recommendations for choosing a covariate inclusion approach based on class separation and sample size. A large sample (1000 or more) and the use of the sample size-adjusted BIC (saBIC) for class enumeration are recommended.
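For readers unfamiliar with the three-step approach, the sketch below illustrates its middle step in the widely used maximum-likelihood formulation, whereas the one-step approach estimates the mixture model and covariate effects simultaneously. In the three-step version, an unconditional mixture model is fitted first (step 1), cases are then assigned to their most likely class and the resulting classification error is summarized (step 2), and that error is held fixed when class membership is regressed on covariates (step 3). This preview does not show how the authors implemented the approach, so this is only a minimal sketch under stated assumptions: the function names and the input format (a list of per-case posterior class probabilities from step 1) are hypothetical, and the model-fitting steps themselves are omitted.

```python
import math

def modal_assignment(posteriors):
    """Assign each case to its most likely class (index of the largest posterior)."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

def classification_error_matrix(posteriors):
    """Return D with D[k][j] = P(assigned class = j | true class = k).

    posteriors[i][k] is the step-1 posterior probability that case i belongs
    to class k (hypothetical input format). Each row of D sums to 1.
    """
    n_classes = len(posteriors[0])
    assigned = modal_assignment(posteriors)
    D = [[0.0] * n_classes for _ in range(n_classes)]
    for row, j in zip(posteriors, assigned):
        for k in range(n_classes):
            D[k][j] += row[k]  # posterior mass of class k that was assigned to class j
    for k in range(n_classes):
        expected_size = sum(row[k] for row in posteriors)  # expected size of class k
        D[k] = [d / expected_size for d in D[k]]
    return D

def step3_logits(D):
    """Logits log(D[k][j] / D[k][last]) held fixed in step 3.

    The last class is the reference; a zero entry in D would make the log undefined.
    """
    ref = len(D) - 1
    return [[math.log(D[k][j] / D[k][ref]) for j in range(len(D))]
            for k in range(len(D))]

# Hypothetical step-1 posteriors for four cases and two classes.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
D = classification_error_matrix(posteriors)
print(modal_assignment(posteriors))
print(D)
print(step3_logits(D))
```

Because each row of D sums to 1, the fixed logits describe how often cases from each true class end up in each assigned class; poorer class separation spreads this mass off the diagonal, which is one intuition for why the three-step approach can struggle when classes overlap.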

Fig. 1
Fig. 2
Fig. 3

Data availability

The data sets generated and/or analyzed during the current study are not publicly available because the Monte Carlo simulations produced a large volume of data (i.e., 200 replications per condition), but they are available from the corresponding author on reasonable request.

Code availability

Example code for generating and analyzing the data is available on the Open Science Framework (https://osf.io/amupe/?view_only=9cfdf249a13b4ae5b9031f713acc1ede).

Notes

  1. Other approaches to class enumeration are available, such as the Lo-Mendell-Rubin (LMR) test, the adjusted LMR test (Lo et al., 2001), and the bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000). The LMR and adjusted LMR tests were not examined because previous simulation studies have shown that they do not perform as well as information criteria (ICs; Henson et al., 2007; Nylund et al., 2007; Tein et al., 2013). BLRT was not considered because of its long execution time. Entropy was also excluded because its performance has been documented as unreliable in the literature (e.g., E. Kim et al., 2016; Tein et al., 2013); we confirmed this by examining entropy-based class enumeration for a subset of conditions (see Tables S3 and S4). In addition, a major limitation of using entropy for class enumeration is that it cannot compare the one-class model with models that have two or more classes in terms of fit.
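The class enumeration indices discussed in this note can be computed from quantities reported for each fitted model. The following is a minimal sketch, assuming the conventional definitions AIC = -2LL + 2p, BIC = -2LL + p ln(n), and saBIC = -2LL + p ln((n + 2)/24), where LL is the log-likelihood, p the number of free parameters, and n the sample size, together with the relative entropy index commonly reported by mixture-modeling software, 1 - [sum over cases i and classes k of (-p_ik ln p_ik)] / (n ln K). The function names and the log-likelihoods in the example are hypothetical illustrations, not results from this study. Because ln 1 = 0, the relative entropy formula has a zero denominator when K = 1, which illustrates why entropy cannot compare a one-class model with models that have more classes.

```python
import math

def information_criteria(loglik, n_params, n):
    """Return AIC, BIC, and sample size-adjusted BIC (saBIC); smaller is better."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "saBIC": deviance + n_params * math.log((n + 2) / 24),
    }

def pick_classes(fits, n, criterion="saBIC"):
    """fits maps number of classes K -> (loglik, n_params); return the K minimizing the criterion."""
    return min(fits, key=lambda k: information_criteria(*fits[k], n)[criterion])

def relative_entropy(posteriors):
    """Relative entropy of posterior class probabilities (1 = perfectly certain classification)."""
    n, n_classes = len(posteriors), len(posteriors[0])
    if n_classes < 2:
        # ln(1) = 0 makes the denominator zero, so a one-class model has no entropy value.
        raise ValueError("relative entropy is undefined for a one-class model")
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * math.log(n_classes))

# Hypothetical fits (log-likelihood, number of parameters) for 1-3 classes, n = 1000.
fits = {1: (-5200.0, 12), 2: (-5100.0, 19), 3: (-5090.0, 26)}
for criterion in ("AIC", "BIC", "saBIC"):
    print(criterion, pick_classes(fits, 1000, criterion))
```

With these made-up values, BIC and saBIC both select two classes, whereas AIC selects three, because the AIC penalty (2 per parameter) is smaller than the saBIC penalty (ln(1002/24), about 3.73 per parameter at n = 1000).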

References

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

Yan Wang designed the study and developed the simulation code. All three authors ran the simulations and analyzed the data. Yan Wang led the writing of the manuscript, and Chunhua Cao and Eunsook Kim both contributed to the writing.

Corresponding author

Correspondence to Yan Wang.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflicts of interest/Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Open practices statement

Supplemental tables and simulation code for data generation and analysis are available on the Open Science Framework (https://osf.io/amupe/?view_only=f72fb1198c4947dbab731cebb5c416a3).

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

ESM 1

(PDF 331 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Cao, C. & Kim, E. Covariate inclusion in factor mixture modeling: Evaluating one-step and three-step approaches under model misspecification and overfitting. Behav Res (2022). https://doi.org/10.3758/s13428-022-01964-8

  • DOI: https://doi.org/10.3758/s13428-022-01964-8

Keywords

  • Factor mixture model
  • Covariate
  • One-step
  • Three-step
  • Model misspecification