Growth mixture modeling (GMM; Muthén, 2001, 2002; Muthén & Muthén, 2000; Muthén & Shedden, 1999) has been increasingly used to separate individuals into unknown subgroups characterized by distinct growth trajectories over time (Bauer, 2007). GMM is similar to latent class growth modeling (LCGM; Nagin, 1999), but LCGM sets the within-class variances and covariances of the growth factors to zero (Muthén, 2004). Despite the popularity of GMM, class enumeration has remained an active topic of research (Li & Hser, 2011; Liu & Hancock, 2014; Peugh & Fan, 2012, 2014; Tofighi & Enders, 2007). One question that still needs research is under which conditions including covariates in GMM facilitates identification of the correct number of classes (Li & Hser, 2011; Liu & Hancock, 2014; Peugh & Fan, 2014; Tofighi & Enders, 2007).

The literature offers a few recommendations about incorporating covariates in GMM for class enumeration. Muthén (2004) suggested that properly selected covariates may facilitate GMM class enumeration and the estimation of class proportions and class membership. Lubke and Muthén (2007) studied covariate effects in factor mixture modeling and suggested that incorporating covariates is beneficial for parameter recovery and class assignment when the covariates predict only latent class membership. Using continuous, normally distributed covariates, Tofighi and Enders (2007) concluded that covariates hampered GMM class enumeration unless the sample size was 200. Peugh and Fan (2014) added nuance by finding that the effect of adding covariates to GMM varied across model selection criteria with smaller sample sizes (i.e., 300, 500), whereas with larger sample sizes (i.e., N = 3,000) the addition of covariates decreased correct class enumeration rates with most fit criteria. However, using binary and non-normally distributed covariates, Li and Hser (2011) found that including covariates is beneficial for GMM class enumeration. Taken together, this evidence indicates that the effect of including covariates on class enumeration in GMM is complex.

The purpose of this paper is to add to current knowledge about GMM by examining whether the way covariates are included in the model affects class enumeration. More specifically, we compare GMM with covariates added as predictors of the intercept and slope and GMM with covariates as direct predictors of the outcome against the unconditional GMM. In particular, the direct-effect GMM, which is an extension of the direct-effect latent growth model discussed by Stoel, Van den Wittenboer, and Hox (2004), has not previously been examined as a means of improving class enumeration in GMM. These model specifications are compared across variations of class separation, sample size, covariate effect, and magnitude of covariance between intercept and slope.

The structure of this paper is as follows. We first describe the three GMM specifications that are the focus of this paper and then present the model fit indices commonly used for class enumeration. Next, we review previous studies of class enumeration in mixture modeling. Finally, we present two Monte Carlo simulation studies examining class enumeration with three GMM specifications and several fit indices across a variety of conditions.

Growth mixture models

GMM can be used to describe a variety of linear and nonlinear growth trajectories. In this paper, we focus on the linear growth pattern that has been widely used in applied research (e.g., Abroms et al., 2005; Greenbaum et al., 2005; McDonough, Sacker, & Wiggins, 2005; Stoolmiller, Kim, & Capaldi, 2005). We assume that covariates are antecedents of class membership (Muthén, 2004), which implies that they may affect class membership and can be used as auxiliary variables for class enumeration.

Unconditional growth mixture model

An unconditional growth mixture model (U-GMM; Muthén, 2002) can be represented by the following equations:

$$ {y}_{ij}^k={\eta}_{iI}^k+{\eta}_{iS}^k{t}_j+{\varepsilon}_{ij}^k $$
(1)
$$ {\eta}_{i I}^k={\gamma}_{00}^k+{\xi}_{i0}^k $$
(2)
$$ {\eta}_{i S}^k={\gamma}_{01}^k+{\xi}_{i1}^k $$
(3)

Equation 1 presents the measurement part of the U-GMM, where $y_{ij}^k$ is the value of the outcome variable for participant i at occasion j in class k, $t_j$ is a time code assumed here to be the same for all individuals, $\eta_{iI}^k$ is the random intercept, $\eta_{iS}^k$ is the random slope, and $\varepsilon_{ij}^k$ is the residual. Equations 2 and 3 are the structural part of the growth mixture model, where $\gamma_{00}^k$ and $\gamma_{01}^k$ are the mean intercept and mean slope of the latent growth factors across all participants and occasions in class k, and $\xi_{i0}^k$ and $\xi_{i1}^k$ are the residuals of the latent growth factors at the between level.

$$ \begin{array}{l}\ln \left[\frac{P\left({c}_{ik}=1\right)}{P\left({c}_{iK}=1\right)}\right]={\lambda}_k\\ {}{c}_{ik}=\left\{\begin{array}{ll}1 & \text{if subject } i \text{ belongs to class } k\\ 0 & \text{otherwise}\end{array}\right.\end{array} $$
(4)

Equation 4 is the unconditional multinomial logit part of the U-GMM, where $c_{ik}$ is one if participant i belongs to the kth class and zero otherwise, and $\lambda_k$ is the log odds of belonging to class k instead of reference class K (Lubke & Muthén, 2007).
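
As a worked illustration (not part of the original derivation), Equation 4 can be inverted to obtain the class probabilities, assuming the usual identification constraint that the logit of the reference class is fixed at zero ($\lambda_K = 0$):

$$ P\left({c}_{ik}=1\right)=\frac{\exp \left({\lambda}_k\right)}{\sum_{m=1}^K\exp \left({\lambda}_m\right)},\quad {\lambda}_K=0 $$

With two classes, for example, $\lambda_1 = 0$ implies $P(c_{i1}=1) = 1/(1+1) = .50$, that is, equal class proportions.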

Growth mixture models with covariates

Previous studies about the effects of covariates on class enumeration in GMM (i.e., Li & Hser, 2011; Tofighi & Enders, 2007) investigated a conditional GMM where covariate(s) predict the growth factors and class membership. With such a model, it is assumed that the growth factors fully mediate the relationship between predictors and the outcome variables (Stoel, Van den Wittenboer, & Hox, 2004). We will refer to this conditional GMM as growth predictor growth mixture model (GP-GMM), because it is an expansion of the growth predictor latent growth model discussed by Stoel, Van den Wittenboer, and Hox (2004) to characterize models with covariates predicting the latent growth factors. The GP-GMM can be represented by the following equations:

$$ {y}_{ij}^k={\eta}_{iI}^k+{\eta}_{iS}^k{t}_j+{\varepsilon}_{ij}^k $$
(5)
$$ {\eta}_{i I}^k={\gamma}_{00}^k+{\displaystyle \sum_{p=1}^P{\gamma}_{p0}^k}{x}_{i p}+{\xi}_{i0}^k $$
(6)
$$ {\eta}_{i S}^k={\gamma}_{01}^k+{\displaystyle \sum_{p=1}^P{\gamma}_{p1}^k}{x}_{i p}+{\xi}_{i1}^k $$
(7)

While the GP-GMM assumes that the growth factors fully mediate the relationships between predictors and outcomes, the direct effect growth mixture model (DE-GMM) relaxes this assumption. The DE-GMM is an expansion of the direct effect latent growth model (Stoel, Van den Wittenboer, & Hox, 2004) to characterize GMM with covariates predicting the observed variables directly. The DE-GMM can be represented by the following equations:

$$ {y}_{ij}^k={\eta}_{iI}^k+{\eta}_{iS}^k{t}_j+{\displaystyle \sum_{p=1}^P{\beta}_{j p}^k}{x}_{ip}+{\varepsilon}_{ij}^k $$
(8)
$$ {\eta}_{i I}^k={\gamma}_{00}^k+{\xi}_{i0}^k $$
(9)
$$ {\eta}_{i S}^k={\gamma}_{01}^k+{\xi}_{i1}^k $$
(10)

In Equation 8, $\beta_{jp}^k$ is the effect of predictor $x_{ip}$ on the outcome at time j for class k. Equation 11 can be used with both the GP-GMM and DE-GMM to describe the relationship between the log odds of class membership and predictors:

$$ \ln \left[\frac{P\left({c}_{ik}=1\left|{x}_{i1}\dots {x}_{iP}\right.\right)}{P\left({c}_{iK}=1\left|{x}_{i1}\dots {x}_{iP}\right.\right)}\right]={\lambda}_k+{\displaystyle \sum_{p=1}^P{\delta}_p^k{x}_{ip}} $$
(11)

where $x_{ip}$ is the pth covariate used as a predictor of latent class membership. In Equations 6 and 7, $\gamma_{p0}^k$ and $\gamma_{p1}^k$ are the direct effects of $x_{ip}$ on the intercept and slope factors, respectively. In Equation 11, the covariates $x_{ip}$ predict the log odds of belonging to class k rather than reference class K, with effects equal to $\delta_p^k$ (Lubke & Muthén, 2007).
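
The mapping from covariates to class probabilities in Equation 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `class_probabilities` and the numeric values in the usage example are hypothetical, and the logit of the reference class K is fixed at zero.

```python
import numpy as np

def class_probabilities(x, lam, delta):
    """Class membership probabilities implied by Equation 11.

    x     : (N, P) covariate values
    lam   : (K-1,) log-odds intercepts lambda_k of the non-reference classes
    delta : (K-1, P) covariate effects delta_p^k of the non-reference classes
    The reference class K is the last column; its logit is fixed at zero.
    """
    x = np.asarray(x, dtype=float)
    lam = np.asarray(lam, dtype=float)
    delta = np.asarray(delta, dtype=float)
    logits = lam + x @ delta.T                             # shape (N, K-1)
    logits = np.column_stack([logits, np.zeros(len(x))])   # append reference class
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)

# Hypothetical two-class example with one covariate.
probs = class_probabilities(x=[[-1.0], [0.0], [1.0]], lam=[0.0], delta=[[0.6]])
```

With $\lambda_1 = 0$, a participant at the covariate mean (x = 0) has equal probability of belonging to either class, and a positive $\delta$ raises the probability of class 1 as x increases.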

Both the GP-GMM and the DE-GMM assume that the within-class model is correctly specified. Because the full mediation assumption of the GP-GMM may not hold, the use of the DE-GMM could protect against the effect of misspecification of the relationship between predictors and outcomes in the within-class model.
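
To make the full mediation assumption concrete, Equations 6 and 7 can be substituted into Equation 5. The reduced form below is a worked step added for illustration and uses only the symbols already defined:

$$ {y}_{ij}^k={\gamma}_{00}^k+{\gamma}_{01}^k{t}_j+{\displaystyle \sum_{p=1}^P\left({\gamma}_{p0}^k+{\gamma}_{p1}^k{t}_j\right)}{x}_{ip}+{\xi}_{i0}^k+{\xi}_{i1}^k{t}_j+{\varepsilon}_{ij}^k $$

Substituting Equations 9 and 10 into Equation 8 in the same way gives

$$ {y}_{ij}^k={\gamma}_{00}^k+{\gamma}_{01}^k{t}_j+{\displaystyle \sum_{p=1}^P{\beta}_{jp}^k}{x}_{ip}+{\xi}_{i0}^k+{\xi}_{i1}^k{t}_j+{\varepsilon}_{ij}^k $$

Comparing the two expressions shows that the GP-GMM is equivalent to a DE-GMM with the restriction $\beta_{jp}^k = \gamma_{p0}^k + \gamma_{p1}^k t_j$: each covariate has two effects per class (one on the intercept and one on the slope) instead of one free effect per measurement occasion.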

Research about the effect of covariates on class enumeration in GMM has focused solely on the GP-GMM. Muthén (2004) suggested taking a first look at the number of classes without covariates and then adding covariates to help make a final decision about the number of classes. However, Muthén (2006) warned that researchers should not be surprised if the class formation changes after adding covariates, because adding covariates might change the normality of the latent intercept and the latent slope factor within class. The task of selecting the number of classes is further complicated by the availability of several model selection indices and statistics, which differ in their sensitivity to model complexity, sample size, the presence of covariates, and the presence of covariance between the intercept and the slope in the model. No agreement has yet been reached regarding which index works best as an indicator of the number of classes (Lubke & Muthén, 2007; Tofighi & Enders, 2007; Yang, 2006). In the next section, we review fit indices and statistics commonly used for class enumeration in mixture modeling as well as studies comparing their performance.

Class enumeration in growth mixture modeling

In this study, we focus on the following information-based fit indices: Akaike’s information criterion (AIC; Akaike, 1987), the Bayesian information criterion (BIC; Schwarz, 1978), and the adjusted BIC (ABIC; Sclove, 1987). We also examine the following likelihood ratio tests: the Lo-Mendell-Rubin likelihood ratio test (LMR; Lo, Mendell, & Rubin, 2001) and the bootstrapped likelihood ratio test (BLRT; McLachlan, 1987; McLachlan & Peel, 2000).

The AIC, the BIC, and the ABIC can be calculated with Equations 12 to 14. All three are functions of the log-likelihood (LL) of the fitted model and the number of parameters (p); the BIC and the ABIC also depend on the sample size (N):

$$ AIC=-2LL+2p $$
(12)
$$ BIC=-2LL+p\ln (N) $$
(13)
$$ ABIC=-2LL+p\ln \left(\frac{N+2}{24}\right) $$
(14)
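
Equations 12 to 14 translate directly into code. The following Python sketch is illustrative only; the function name `information_criteria` and the input values in the usage example are hypothetical and are not results from this study.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return the AIC, BIC, and ABIC of Equations 12 to 14.

    log_likelihood : LL of the fitted model
    n_params       : number of estimated parameters (p)
    n_obs          : sample size (N)
    """
    deviance = -2.0 * log_likelihood
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    abic = deviance + n_params * math.log((n_obs + 2) / 24)
    return {"AIC": aic, "BIC": bic, "ABIC": abic}

# Hypothetical values: LL = -1000 with 10 parameters and N = 400.
fit = information_criteria(log_likelihood=-1000.0, n_params=10, n_obs=400)
```

Because ln((N + 2)/24) is smaller than ln(N), the ABIC penalizes each additional parameter less heavily than the BIC at any given sample size.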

As an alternative to information-based indices, the likelihood ratio test (LRT) compares the fit of nested models (Bollen, 1989):

$$ L R=-2\left( L{L}_{k-1}- L{L}_k\right) $$
(15)

where $LL_{k-1}$ is the log-likelihood of the model with k-1 classes and $LL_k$ is the log-likelihood of the model with k classes. LR is a function of the difference between these two log-likelihoods and, for regular nested models, is referred to a chi-square distribution. With mixture models, however, the standard LRT presented above cannot be used because the LR is not asymptotically distributed as chi-square (McLachlan & Peel, 2000). Lo, Mendell, and Rubin (2001) addressed this problem by proposing an approximation to the distribution of the LR in the mixture context that is a weighted sum of chi-squares. The LMR statistic compares the fit of the k-1 class model with that of the k class model; a significant p value rejects the k-1 class model in favor of the k class model. The bootstrapped LRT (BLRT) uses bootstrap samples to estimate the distribution of the log-likelihood ratio test statistic rather than relying on an assumed chi-square distribution (Nylund et al., 2007). The BLRT is interpreted in the same way as the LMR, except that the statistic is compared against the empirical bootstrap sampling distribution: a significant p value rejects the k-1 class model in favor of the k class model (Tofighi & Enders, 2007).

Yang (2006) examined the class enumeration performance of the information indices with latent class analysis models. The results showed that the ABIC was more accurate in detecting the correct number of latent classes than the AIC. The BIC was very sensitive to sample size and performed well at a sample size of 1,000, whereas the AIC was consistent, but increasing the sample size did not improve its accuracy substantially. Tofighi and Enders (2007) recommended the ABIC. Nylund et al. (2007) found that the BLRT had the best performance for class enumeration, followed by the BIC and ABIC. However, Nylund et al. (2007) only investigated unconditional GMM. Lubke and Muthén (2007) concluded that covariates helped correct assignment of classes, especially when class separation was large. However, in their study, the covariate predicted only class membership and had no effect on the factor mean, which is not common in GMM. Li and Hser (2011) found that including covariates in GMM helped identify the correct number of classes when the sample size was equal to or larger than 400. They also found that the BIC performed best across sample sizes, the ABIC performed well when the sample size was larger than 400, and the AIC had the poorest performance. The BIC outperformed the LRTs in general.
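
The likelihood ratio statistic of Equation 15 and the empirical p value behind the BLRT can be sketched as follows. This is a simplified illustration, not the procedure implemented in any particular software: the function names are hypothetical, the bootstrap LR values are assumed to come from refitting the k-1 and k class models to data generated under the k-1 class model (a step not shown here), and the p value is taken as the simple proportion of bootstrap values at least as large as the observed one.

```python
def lr_statistic(ll_k_minus_1, ll_k):
    """Likelihood ratio statistic of Equation 15."""
    return -2.0 * (ll_k_minus_1 - ll_k)

def bootstrap_p_value(lr_observed, lr_bootstrap):
    """Proportion of bootstrap LR values at least as large as the observed LR."""
    lr_bootstrap = list(lr_bootstrap)
    n_exceed = sum(1 for lr in lr_bootstrap if lr >= lr_observed)
    return n_exceed / len(lr_bootstrap)

# Hypothetical log-likelihoods and bootstrap draws, for illustration only.
lr_obs = lr_statistic(ll_k_minus_1=-1010.0, ll_k=-1000.0)
p_value = bootstrap_p_value(lr_obs, [1.0, 5.0, 25.0, 3.0])
```

A small p value indicates that an LR as large as the observed one is rare when the k-1 class model is true, which is the basis for rejecting the k-1 class model in favor of the k class model.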

Peugh and Fan (2014) found that a sample size of 3,000 was needed to optimize class identification. Under conditions of largest sample size (N = 3,000), large class separation (MD = 2), maximum variance explained (25 %), and equal class proportion, BIC, LMR, and BLRT were able to identify the correct number of classes. With a sample size of 500, all fit criteria performed poorly. By comparing their results with a previous study by Peugh and Fan (2012) that did not include covariates in the GMM, Peugh and Fan (2014) concluded that adding covariates decreased the detection rate of BIC and BLRT under large sample size, large class separation, and large variance explained.

Li and Hser (2011), Peugh and Fan (2014), and Tofighi and Enders (2007) have studied the effect of covariate inclusion in GMM and class enumeration performance of fit indices and statistics when the generating model had covariates, but these studies used the GP-GMM for class enumeration with covariates. The current study expands the existing GMM literature by further investigating the performance of two different types of GMM with covariates and GMM without covariates on class enumeration. Our research questions are: (a) Is there any difference in performance between the U-GMM, GP-GMM and DE-GMM with respect to accuracy of class enumeration? (b) How do commonly used indices and statistics for class enumeration perform across different GMM specifications?

GMM can differ with respect to which parameters are allowed to vary between classes. We conducted two studies to address the research questions above, each with a different degree of model complexity with respect to allowing parameters to vary between classes. Study 1 compared versions of the U-GMM, GP-GMM and DE-GMM where intercept means were allowed to vary between classes. Study 2 compared versions of these models where intercept and slope means, residual variances and covariance were allowed to vary between classes. By comparing the results of Study 1 and Study 2, we were able to examine whether class enumeration performance depended on which parameters were allowed to vary between classes.

Method

Population model

Both Study 1 and Study 2 used a two-class linear GP-GMM with one covariate as the population model, because it is popular in applied research (e.g., Abroms et al., 2005; Greenbaum et al., 2005; McDonough, Sacker, & Wiggins, 2005; Stoolmiller, Kim, & Capaldi, 2005) and has also been used in previous simulation studies (e.g., Lubke & Muthén, 2007; Nylund et al., 2007):

$$ {y}_{ij}^k={\eta}_{iI}^k+{\eta}_{iS}^k{t}_j+{\varepsilon}_{ij}^k $$
(16)
$$ {\eta}_{i I}^k={\gamma}_{00}^k+{\gamma}_{10}^k{x}_{i1}+{\xi}_{i0}^k $$
(17)
$$ {\eta}_{i S}^k={\gamma}_{01}^k+{\gamma}_{11}^k{x}_{i1}+{\xi}_{i1}^k $$
(18)

Population parameters for Equations 16 to 18 used in the Monte Carlo simulations in Studies 1 and 2 are shown in Table 1. Data were simulated for a normally distributed outcome measured at five equally spaced occasions and one covariate with a standard normal distribution. In a simulation study with four and seven waves, Tofighi and Enders (2007) found that the number of repeated measures had only a minor impact on class enumeration. We chose five waves because this number is within the range examined by Tofighi and Enders (2007). The number of classes was set at two based on a review of previous LCA and GMM simulation studies (e.g., Li & Hser, 2011; Lubke & Muthén, 2007; Nylund et al., 2007). Lubke and Muthén (2007) stated that unequal class proportions do not have a noticeable effect on model selection. Therefore, we set the population class proportions at .50. Following previous studies (e.g., Li & Hser, 2011; Lubke & Muthén, 2007; Nylund et al., 2007; Yang, 2006), we defined effects of the covariate on the growth factors that were invariant across classes. We selected population coefficients for the covariate such that it explains 15 % of the variance of the intercept and 5 % of the variance of the slope. In turn, population residual variances for the intercept, slope, and outcome were chosen so that the intercept and the slope together account for 20 % of the variance of the outcome at each measurement occasion. These targets are similar to those of Li and Hser (2011), who simulated a covariate that explained 20 % of the growth factors’ variance, and Tofighi and Enders (2007), who generated covariates accounting for 15 % of the variance of the intercept and slope.
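
The data-generating process described above can be sketched in Python with NumPy. This is only an illustration: the data in this study were generated in R, the function name `simulate_gp_gmm` is hypothetical, and every numeric parameter value in the code is a placeholder chosen for the example rather than a value from Table 1.

```python
import numpy as np

def simulate_gp_gmm(n, rng, n_waves=5):
    """Simulate one dataset from a two-class linear GP-GMM with one covariate."""
    # All numeric values below are hypothetical placeholders, not Table 1 values.
    gamma00 = np.array([0.0, 1.0])            # class-specific intercept means
    gamma01 = np.array([0.5, 0.5])            # slope means (equal across classes here)
    gamma10, gamma11 = 0.4, 0.1               # class-invariant covariate effects
    lam, delta = 0.0, 0.6                     # logit intercept and covariate effect
    psi = np.array([[1.0, 0.1], [0.1, 0.2]])  # residual covariance of growth factors
    theta = 1.0                               # residual variance of the outcome

    t = np.arange(n_waves)                    # equally spaced time codes 0, 1, ...
    x = rng.standard_normal(n)                # standard normal covariate
    p_class1 = 1.0 / (1.0 + np.exp(-(lam + delta * x)))
    k = np.where(rng.random(n) < p_class1, 0, 1)   # 0 = class 1, 1 = reference class
    xi = rng.multivariate_normal(np.zeros(2), psi, size=n)
    eta_i = gamma00[k] + gamma10 * x + xi[:, 0]    # Equation 17
    eta_s = gamma01[k] + gamma11 * x + xi[:, 1]    # Equation 18
    eps = rng.normal(0.0, np.sqrt(theta), size=(n, n_waves))
    y = eta_i[:, None] + eta_s[:, None] * t + eps  # Equation 16
    return y, x, k

rng = np.random.default_rng(2024)
y, x, k = simulate_gp_gmm(400, rng)
```

With a logit intercept of zero and a covariate centered at zero, the expected class proportions are .50, matching the equal class proportions used in the population model.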

Table 1 Population values for Monte Carlo simulation studies

Manipulated conditions in Study 1

In Study 1, we manipulated three conditions: class separation (multivariate Mahalanobis distance [MD] of 1.3, 2.6, and 3.9), sample size (N = 400, 1,000, and 2,000), and the covariate’s effect on class membership (pseudo-R² = 0.1 and 0.3). The variables manipulated resulted in a total of 3 × 3 × 2 = 18 unique conditions. For each condition, 100 datasets were generated using the R statistical package.

The three levels of class separation (MD = 1.3, 2.6, and 3.9) correspond to differences of 1, 2, and 3 standard deviations between the means of the intercept factors of the two classes. Lubke and Muthén (2007) simulated class separation of 0, 1, 1.5, and 2 standard deviations but mentioned that real data could have higher levels of class separation than simulated in their study. In Study 1, the separation between classes was determined solely by the population mean difference on the intercept factor.

Tofighi and Enders (2007) performed a review of GMM applications and found a large range of sample sizes (i.e., 110–5,833) in applied studies. In this simulation, we generated datasets with sample sizes of 400, 1,000, and 2,000, which fall approximately between the 25th and 75th percentiles of the sample sizes in the studies identified by Tofighi and Enders (2007).
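
For reference, the Mahalanobis distance used to quantify class separation has the general form sketched below. The sketch assumes a covariance matrix common to both classes; which variables and which covariance matrix produced the values 1.3, 2.6, and 3.9 is not stated in this section, so the function (with the hypothetical name `mahalanobis_distance`) illustrates only the definition, not the authors' exact computation.

```python
import numpy as np

def mahalanobis_distance(mean_1, mean_2, cov):
    """Mahalanobis distance between two mean vectors given a common covariance."""
    diff = np.asarray(mean_1, dtype=float) - np.asarray(mean_2, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solve cov @ z = diff rather than inverting cov explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical example: with an identity covariance matrix, the Mahalanobis
# distance reduces to the Euclidean distance between the mean vectors.
md = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], np.eye(2))
```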

None of the previous studies about class enumeration in GMM manipulated the strength of the effect of covariates on class membership. The McKelvey and Zavoina pseudo-R² (Hu et al., 2006; McKelvey & Zavoina, 1975) can be used to manipulate this effect. The McKelvey and Zavoina pseudo-R² can be interpreted as the proportion of the variance of a hypothetical continuous variable underlying the latent class variable that is explained by the covariate. Covariate effects corresponding to pseudo-R² values of 0.1 and 0.3 were manipulated in this study.
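
For a two-class model with a single covariate, the McKelvey and Zavoina pseudo-R² of a logit model is the variance of the linear predictor divided by that variance plus the variance of the standard logistic distribution, π²/3. The Python sketch below uses this definition to convert between a logit coefficient and a pseudo-R². Whether the population coefficients in this study were derived in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

LOGISTIC_ERROR_VARIANCE = math.pi ** 2 / 3  # variance of the standard logistic distribution

def mz_pseudo_r2(delta, var_x=1.0):
    """McKelvey-Zavoina pseudo-R2 of a binary logit with a single covariate."""
    explained = delta ** 2 * var_x
    return explained / (explained + LOGISTIC_ERROR_VARIANCE)

def delta_for_pseudo_r2(r2, var_x=1.0):
    """Logit coefficient that yields a target McKelvey-Zavoina pseudo-R2."""
    return math.sqrt(r2 * LOGISTIC_ERROR_VARIANCE / ((1.0 - r2) * var_x))

delta_small = delta_for_pseudo_r2(0.1)
delta_large = delta_for_pseudo_r2(0.3)
```

Under these assumptions and a standard normal covariate, pseudo-R² values of 0.1 and 0.3 correspond to logit coefficients of roughly 0.60 and 1.19.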

Manipulated conditions in Study 2

We varied the intercept mean and slope mean differences in the population model so that the two classes have unique mean intercepts and mean slopes. In addition to the three conditions manipulated in Study 1 (class separation, sample size, and covariate effect), we manipulated whether the residual covariance matrix of the intercept and slope was allowed to vary between classes. The variables manipulated resulted in a total of 3 × 3 × 2 × 2 = 36 unique conditions. For each condition, 200 datasets were generated using the R statistical package.

Analysis procedure

For each simulated dataset, the U-GMM shown in Equations 1 to 4, the GP-GMM shown in Equations 5, 6, 7, and 11, and the DE-GMM shown in Equations 8 to 11 were fit with one to three latent classes using Mplus 7.1 with maximum likelihood estimation. In Study 1, these models were fit once to each simulated dataset, allowing the intercept means to vary between classes. In Study 2, these models were fit twice to each simulated dataset, once allowing the factor means to vary between classes and once allowing the factor means, residual variances, and covariance of the growth factors to vary between classes. The covariate’s effects on the growth factors in the GP-GMM and on the outcomes in the DE-GMM were set to be invariant across classes. Similarly to Peugh and Fan (2014), we fit GMM to each simulated dataset with the correct number of classes (i.e., two classes) and with incorrect but neighboring numbers of classes (i.e., one and three classes), because a review of previous GMM studies indicated that class enumeration procedures resulted in class counts that were close to the correct number of classes regardless of simulated condition. For example, in Li and Hser (2011), the number of classes selected using different criteria across all conditions fell mostly within one to three classes even though the authors fit models with one to five classes.

Class enumeration was compared across information indices (i.e., AIC, BIC, and ABIC) and likelihood ratio tests (i.e., LMR and BLRT). The outcome of the Monte Carlo simulation study was a dichotomous indicator of whether the correct number of classes, two classes in this case, was identified using each class enumeration criterion. For the AIC, BIC, and ABIC, if the model with two classes produced a smaller value than the models with one and three classes, an outcome of 1 was assigned to indicate correct class enumeration. For the LMR and BLRT, a significant p value (p ≤ .05) indicates that the model with k classes fits better than the model with k-1 classes. Therefore, the number of classes was determined by the following criterion: if p ≤ .05 for the comparison between models with one and two classes and p >.05 for the comparison between models with two and three classes, a value of 1 was assigned to the outcome and regarded as correct class enumeration. This is a strict criterion given that the LMR and BLRT not only had to indicate better fit of two class model over three class model (as in Nylund et al., 2007), but also had to indicate better fit of the two class model over the one class model.
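
The scoring rules described above can be written compactly. The following Python sketch is an illustration of those rules, with hypothetical function names and made-up input values; it is not the code used in this study.

```python
def ic_selects_two_classes(ic_by_classes):
    """Return 1 if the two-class model has the smallest AIC, BIC, or ABIC.

    ic_by_classes maps the number of classes (1, 2, 3) to the index value.
    """
    two = ic_by_classes[2]
    return int(two < ic_by_classes[1] and two < ic_by_classes[3])

def lrt_selects_two_classes(p_1_vs_2, p_2_vs_3, alpha=0.05):
    """Return 1 if the LMR or BLRT rejects one class but not two classes."""
    return int(p_1_vs_2 <= alpha and p_2_vs_3 > alpha)

def percent_correct(indicators):
    """Percentage of replications with correct class enumeration."""
    indicators = list(indicators)
    return 100.0 * sum(indicators) / len(indicators)

# Hypothetical results from four replications of one condition.
bic_hits = [
    ic_selects_two_classes({1: 5210.4, 2: 5150.2, 3: 5163.9}),
    ic_selects_two_classes({1: 5190.0, 2: 5195.5, 3: 5207.1}),
    ic_selects_two_classes({1: 5230.8, 2: 5161.3, 3: 5170.0}),
    ic_selects_two_classes({1: 5201.7, 2: 5188.8, 3: 5186.2}),
]
bic_rate = percent_correct(bic_hits)
```

Averaging the dichotomous indicator within a condition gives the percentage of correct model identification for that condition.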

Mixture models are known for a high frequency of convergence problems, local maxima, and improper solutions (Li & Hser, 2011; Lubke & Muthén, 2007; Nylund et al., 2007). To reduce the risk of these problems, we used 400 random sets of starting values in the initial stage and 100 optimizations in the final stage. These numbers are much larger than the default Mplus settings of 20 and 4 for the initial and final stages, respectively, and previous research has obtained adequate results by just doubling the Mplus defaults (e.g., Peugh & Fan, 2014).

Results

In this section, we report the performance of model fit indices and statistics for selecting the correct number of classes with the three GMM specifications in both Study 1 and Study 2. To aid in the interpretation of the results of the Monte Carlo simulation study (Bandalos & Leite, 2013), a mixed-design ANOVA was conducted for each fit index, with the dichotomous indicator of whether the index selected the correct model as the outcome. This is a linear probability model (Agresti, 2002), which is in general not recommended for dichotomous outcomes, but it is useful here because our goal is simply to compare effects across manipulated conditions rather than to interpret parameter estimates. Furthermore, as Agresti (2002) pointed out, this model is valid over a restricted range of predictors, and in our simulation all of our predictors are categorical variables. Class separation, sample size, and covariate effect on class membership were between-iteration effects in both Studies 1 and 2, because these factors were manipulated across different simulated datasets. GMM type was a within-iteration effect in Study 1; both GMM type and whether the residual covariance matrix of the intercept and the slope was allowed to vary between classes were within-iteration effects in Study 2. Generalized eta squared (η²; Olejnik & Algina, 2003) was used to quantify the effects of the manipulated factors. We considered effects equal to or larger than 0.01 to be substantial and focused interpretation on them, because the large number of replications in simulation studies results in high levels of power, and only those effects reaching some pre-specified level of effect size should be interpreted for practical significance (Bandalos & Leite, 2015). These effects are shown in Table 2 (complete results are available upon request from the first author).

Table 2 Effect sizes of manipulated factors in Study 1 and Study 2

Study 1

Table 3 presents a comparison of class enumeration with the AIC, BIC, ABIC, LMR, and BLRT across simulated conditions. With correct class enumeration by the AIC as the outcome, GMM type (η² = .018) and the interaction of GMM type by class separation (η² = .018) had the largest effect sizes, followed by the interaction of class separation by sample size (η² = .010). GMM type, class separation, and sample size had significant main effects. The two GMM with a covariate performed better than the GMM without a covariate only when class separation was small (MD = 1.3) and sample size was 1,000 or less. For these conditions, the DE-GMM outperformed the GP-GMM.

Table 3 Percentage of correct model identification of fit indices in Study 1

For class enumeration with the BIC, we observed significant main effects of class separation, sample size, and covariate effect. Class separation (η² = .604) had the largest effect size, followed by sample size (η² = .131) and covariate effect (η² = .027). However, two two-way interactions were also present: class separation by sample size (η² = .110) and class separation by covariate effect (η² = .010). The two GMM with a covariate outperformed the GMM without covariates only when the covariate effect was stronger (pseudo-R² = .3) and class separation was 2.6. We found that class enumeration with the BIC was very sensitive to class separation and sample size. In these conditions, the GP-GMM performed slightly better than the DE-GMM when the pseudo-R² was 0.1 but slightly worse when the pseudo-R² was 0.3.

For class enumeration with the ABIC, we observed substantial main effects of all four manipulated factors: class separation, sample size, covariate effect, and GMM type. Class separation (η² = .530) had the largest effect size, followed by the interaction of class separation by sample size (η² = .041), sample size (η² = .014), and covariate effect (η² = .014). However, three-way interactions were also present: class separation by sample size by GMM type, and sample size by covariate effect by GMM type. The two GMM with a covariate performed better than the GMM without a covariate when class separation was 1.3, or when class separation was 2.6 in combination with a sample size of 1,000 or smaller. The DE-GMM resulted in higher accuracy of class enumeration than the GP-GMM in these conditions.

For class enumeration with the LMR, there were substantial main effects of class separation and sample size, as well as two-way interactions. Class separation (η² = .328) had the largest effect size, followed by sample size (η² = .043), the interaction of class separation by GMM type (η² = .025), and the interaction of class separation by sample size (η² = .015). The two GMM with a covariate performed better than the GMM without a covariate when class separation was 1.3, or when class separation was 2.6 in combination with a sample size of 1,000 or smaller. In most conditions, the DE-GMM outperformed the GP-GMM.

The BLRT performed similarly to the LMR. Class separation had the largest effect size (η² = .385), followed by the interaction of class separation by sample size (η² = .023), the interaction of class separation by GMM type (η² = .011), and the main effect of sample size (η² = .010). The GP-GMM and DE-GMM performed better than the U-GMM when class separation was 1.3, or when class separation was 2.6 in combination with a sample size of 1,000 or smaller, and the DE-GMM outperformed the GP-GMM except when the sample size was 400.

In summary, the GMM with covariates were the best models for class enumeration when sample sizes and Mahalanobis distances were at the smallest levels simulated, with the DE-GMM outperforming the GP-GMM most of the time, but the U-GMM worked better than both GMM with covariates as sample sizes and distances increased. Among fit indices and statistics, the AIC was consistently the best index for class enumeration when sample sizes and Mahalanobis distances were at the smallest levels simulated. The BIC showed a very high detection rate, but only at high class separation (MD = 3.9) and larger sample sizes (1,000 or larger). The ABIC showed a pattern similar to that of the BIC but performed better than the BIC at class separations of 2.6 or smaller, or at a class separation of 3.9 with a sample size of 400. In the most favorable conditions (class separation of 3.9 and sample size of 1,000 or larger), the ABIC and BIC had similar detection rates. The LMR and BLRT performed similarly. Comparing the ABIC with the LRTs under favorable conditions (class separation of 3.9 and sample size of 1,000 or above), the ABIC had a higher detection rate and performed well for GMM both with and without covariates. Under the same conditions, the LRTs worked better for GMM without a covariate.

Study 2

Table 4 presents a comparison of class enumeration with the AIC, BIC, ABIC, LMR, and BLRT across simulated conditions. For class enumeration with the AIC, GMM type (η² = .058), class separation (η² = .057), and sample size (η² = .015) had effect sizes larger than .01. In general, the GMM without covariates performed better than the GMM with a covariate, and the DE-GMM outperformed the GP-GMM by small differences.

Table 4 Percentage of correct model identification of fit indices in Study 2

For the BIC, class separation (η² = .574) had the largest effect size, followed by sample size (η² = .325), covariate effect (η² = .034), whether the residual covariance of intercept and slope was allowed to vary between classes (η² = .028), model (η² = .025), the interaction of model and whether the residual covariance was allowed to vary between classes (η² = .019), the interaction of model, class separation, and sample size (η² = .017), and the four-way interaction of model, class separation, sample size, and whether the residual covariance was allowed to vary between classes (η² = .011). The BIC did not achieve a correct enumeration rate above 0.8 until class separation was 2.6 in combination with a sample size of 2,000, or class separation was 3.9 with a sample size of 1,000 or larger. With the BIC, the GMM without covariates performed better than the two GMM with a covariate. When class separation and sample size were at the largest levels (MD = 3.9, N = 2,000), all growth mixture models had the same 100 % detection rate with the BIC. GMM not allowing the residual covariance matrix to vary between classes resulted in slightly better class enumeration than GMM allowing it to vary.

For the ABIC, class separation (η² = .473) had the largest effect size, followed by the interaction of class separation by sample size (η² = .075), sample size (η² = .055), the interaction of class separation by sample size by model (η² = .014), covariate effect (p < .001, η² = .014), whether the residual covariance matrix of growth factors was allowed to vary between classes (η² = .013), type of model (η² = .012), and the interaction of model and whether the residual covariance varied between classes (η² = .012). The U-GMM resulted in higher accuracy of class enumeration than both the DE-GMM and GP-GMM except when class separation and sample size were small (MD = 1.3, N = 400). When class separation was 3.9 and sample size was 2,000, the detection rate for the ABIC was at or near 100 %.

For the LMR, class separation (η² = .184) had the largest effect size, followed by sample size (η² = .079) and the interaction of class separation by sample size (η² = .022). In general, GMM without covariates performed better than GMM with a covariate except when class separation and sample size were small (MD = 1.3, N = 400). The DE-GMM resulted in class enumeration better than or equal to that of the GP-GMM in most conditions. When class separation was large (MD = 3.9) in combination with a sample size of 1,000 or larger, all three models detected the correct number of classes in at least 72 % of iterations.

For BLRT, class separation had the largest effect size (η² = .441), followed by sample size, interaction of class separation by sample size (η² = .023), interaction of class separation by model (η² = .011), and sample size (η² = .010). The U-GMM produced better class enumeration than the two models with covariates, and of those two the DE-GMM outperformed the GP-GMM. However, when class separation was 3.9 in combination with a sample size of 1,000 or larger, all models had correct class enumeration of 92 % or higher.
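The η² effect sizes reported for each fit criterion come from an analysis of variance on the correct-enumeration outcome. This section does not state whether classical or partial η² was used, so the following minimal sketch shows both definitions. The sums of squares are hypothetical values chosen for illustration and are not taken from the study.

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta squared: share of total variation attributed to one effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect variation relative to effect plus error."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a factorial ANOVA on correct-enumeration
# outcomes; these numbers are illustrative only and are not from the study.
sums_of_squares = {
    "class separation": 40.0,
    "sample size": 25.0,
    "model": 5.0,
}
ss_error = 30.0
ss_total = sum(sums_of_squares.values()) + ss_error

for effect, ss in sums_of_squares.items():
    print(
        f"{effect}: eta^2 = {eta_squared(ss, ss_total):.3f}, "
        f"partial eta^2 = {partial_eta_squared(ss, ss_error):.3f}"
    )
```

Under the classical definition the values for all effects and the error term sum to 1, whereas partial values need not, which is one way to tell the two apart in a reported table.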

In summary, the AIC worked better than the other fit indices and statistics only with small sample sizes and small class separation. BIC was very sensitive to sample size and class separation, and showed very high detection rates only at high class separation (MD = 3.9) and larger sample sizes (1,000 or above). ABIC performed better than BIC, especially when class separation was small (MD = 1.3). LMR performed slightly better than BLRT at small class separation (MD = 1.3) and small sample size (N = 400), while BLRT showed much higher detection rates than LMR at large class separation and sample size (MD = 3.9, N = 1,000 or above). The ABIC performed slightly better than BLRT at the small sample size of 400. The GMM without a covariate generally performed better than the GMM with covariates; only when class separation and sample size were small (MD = 1.3, N = 400) did GMM with a covariate slightly outperform GMM without covariates. Allowing the residual covariance matrix of intercept and slope to vary between classes had little or no effect on the accuracy of class enumeration. In both studies, an interaction of sample size and class separation was observed, which supported the finding of Lubke and Neale (2006) that smaller class separation can be partially compensated for by increasing the sample size in order to identify the correct model. There were only small differences between GP-GMM and DE-GMM in terms of identifying the correct number of classes. Comparing the data simulated in Study 1 and Study 2, we found that allowing the residual covariance matrix of intercept and slope to differ between the two classes increased the detection rates of the AIC and ABIC only slightly under unfavorable conditions (MD = 1.3, N = 400).
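The detection rates summarized here are proportions of simulation replications in which a criterion selected the true number of classes. A minimal sketch of that tabulation follows. The function name, the condition labels, and the replication results are hypothetical, and the true class count of two is an assumption made for the example.

```python
from collections import defaultdict


def correct_enumeration_rates(replications, true_classes):
    """Return the share of replications per condition that pick the true class count.

    `replications` is an iterable of (condition, selected_classes) pairs, where
    `condition` is any hashable label such as (MD, N, model).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, selected in replications:
        totals[condition] += 1
        if selected == true_classes:
            hits[condition] += 1
    return {cond: hits[cond] / totals[cond] for cond in totals}


# Hypothetical replication results: (MD, N, model) and the class count
# chosen by one criterion in each replication. Not data from the study.
example = [
    ((3.9, 1000, "U-GMM"), 2),
    ((3.9, 1000, "U-GMM"), 2),
    ((3.9, 1000, "U-GMM"), 3),
    ((3.9, 1000, "U-GMM"), 2),
    ((1.3, 400, "U-GMM"), 1),
    ((1.3, 400, "U-GMM"), 2),
]
rates = correct_enumeration_rates(example, true_classes=2)
for condition, rate in rates.items():
    status = "above" if rate > 0.8 else "not above"
    print(condition, f"{rate:.0%}", status, "the 0.8 benchmark")
```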

Discussion

While some methodological studies about class enumeration in growth mixture models did not include covariates in the generating model (e.g., Liu & Hancock, 2014; Nylund et al., 2007; Peugh & Fan, 2012; Yang, 2006), other studies such as Li and Hser (2011) and Tofighi and Enders (2007) studied GMM performance when the generating model had a covariate and compared the performance of different growth mixture models. However, these papers examined only the GP-GMM and not the DE-GMM. Therefore, the purpose of this study was to evaluate further whether the inclusion of covariates in GMM was beneficial to class enumeration. Two GMMs with covariates (i.e., the GP-GMM and DE-GMM) and an unconditional GMM were fit to data generated from the GP-GMM. Commonly used fit indices and test statistics for class enumeration were compared.

In our studies, we found that the inclusion of a covariate helps with class enumeration only when class separation and sample size are small (i.e., MD = 1.3 and N = 400), and in this case it is better to fit the DE-GMM or the GP-GMM than the unconditional GMM. However, the inclusion of a covariate in this case makes little practical difference because the percentages of correct class identification are low across all fit criteria. We consider the difference between models to be of practical significance only in conditions where the unconditional model achieves a percentage of correct identification above 70 % but the other models do not. Therefore, we conclude that under favorable conditions, such as class separation of 3.9 or class separation of 2.6 with a sample size of 2,000, GMM without covariates performs better than GMM with covariates. These results are supported by the existing literature: Peugh and Fan (2014) concluded that adding covariates decreased the detection rate of BIC and BLRT under large sample size (i.e., N = 3,000), large class separation (MD = 2), and large variance explained. Yet Peugh and Fan (2014) did not compare the performance of different GMMs with and without covariates in the same paper with the same population model. Tofighi and Enders (2007) simulated a GP-GMM with covariates and concluded that the inclusion of covariates hampered GMM class extraction capability. However, it is worth noting that Tofighi and Enders (2007) reached this conclusion under the most favorable conditions in their study (i.e., sample size = 1,000; largest class separation of 3 standard deviations, which was equivalent to MD = 3.9 in our study; normal distribution within class; mixing percentages of 20:33:47 %), and therefore their conclusion is similar to the results of our study.
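Class separation is expressed throughout as MD. Assuming MD denotes the Mahalanobis distance between the class-specific mean vectors of the growth factors (intercept and slope) under a common within-class covariance matrix, it can be computed as in the sketch below. The function name and all numeric values are hypothetical and are not the population values used in the simulations.

```python
import math


def mahalanobis_distance(mean_a, mean_b, cov):
    """Mahalanobis distance between two 2-dimensional mean vectors.

    `cov` is a 2x2 covariance matrix [[v11, v12], [v21, v22]] assumed to be
    common to both classes (an assumption of this sketch).
    """
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if v11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1 = mean_a[0] - mean_b[0]
    d2 = mean_a[1] - mean_b[1]
    # Quadratic form d' * inverse(cov) * d, written out for the 2x2 case.
    quad = (v22 * d1 * d1 - (v12 + v21) * d1 * d2 + v11 * d2 * d2) / det
    return math.sqrt(quad)


# Hypothetical (intercept, slope) means for two classes and a shared
# growth factor covariance matrix; illustrative values only.
class_1_means = (2.0, 0.5)
class_2_means = (4.0, 1.0)
shared_cov = [[1.0, 0.1], [0.1, 0.25]]

md = mahalanobis_distance(class_1_means, class_2_means, shared_cov)
print(f"MD = {md:.2f}")
```

Because the distance is scaled by the within-class covariance, the same raw mean difference yields a larger MD when the growth factors vary less within class.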

Li and Hser (2011) concluded that inclusion of a covariate was beneficial to GMM class enumeration when GMM misspecification was minor (i.e., defined as GMM without a covariate fit to data generated by GMM with a covariate and the same covariate mean). They simulated GMM with covariates and fixed the class separation as a class mean difference equivalent to 3 standard deviations, yet they varied the covariate mean difference between classes. Their results differed from ours, and this difference might be due to the fact that in Li and Hser’s (2011) study covariates were binary or non-normally distributed, while the covariate was normally distributed in our study. Lubke and Muthén (2007) concluded that including a covariate helped the class assignment of factor mixture models. They concluded that correct class assignment increased as the covariate difference between classes increased (MD of the covariate increased from .5 to 2). In our study, the covariate effect was defined as the variance in the intercept and the slope explained by the covariate as well as its effect on class membership, which was manipulated through the pseudo R². In Lubke and Muthén’s (2007) study, the covariate in the simulation model predicted only class membership and not the intercept and the slope. These differences may explain why our results contrast with Li and Hser (2011) and Lubke and Muthén (2007), but agree with Tofighi and Enders (2007) and Peugh and Fan (2014).

On the performance of the fit indices, the ABIC was the best indicator, followed by the LRTs, when class separation and sample size were large (MD = 3.9, N = 1,000 or larger). Peugh and Fan (2014) also found that the sample-size adjusted BIC performed better than LMR and BLRT when the class-mixing proportions were equal. Specific to our studies, ABIC showed a higher detection rate and worked well under both GMM with a covariate and GMM without a covariate. Under the same conditions, LMR and BLRT performed better in GMM without covariates. This conclusion is consistent with the findings of Peugh and Fan (2014), who found that adding covariates decreased the detection rate of BLRT compared with GMM without covariates. We found that the BIC was sensitive to sample size and class separation, and achieved satisfactory performance only when class separation was 3.9 in combination with a sample size of 1,000 or larger. Under these conditions, GMM type did not affect the BIC’s performance. Liu and Hancock (2014) also found that BIC performed well in linear GMM with MD = 2 or larger and a large sample size. The AIC performed consistently, especially in the unconditional GMM, but unlike the other indices it did not improve substantially as class separation increased.

Our comparison of fit indices resulted in conclusions that are similar to those of previous studies. On the performance of ABIC, Yang (2006) also concluded that the ABIC was accurate in the context of latent class analysis. Tofighi and Enders (2007) concluded that ABIC detected the correct number of classes even when sample size was 400. Nylund et al. (2007) also concluded that the ABIC performed well in GMM when sample size was 500 or larger. On the performance of BIC, Yang (2006), Tofighi and Enders (2007), Nylund et al. (2007), and Liu and Hancock (2014) all reached the same conclusion that the BIC performed adequately when sample size was above 1,000. These conclusions about the sensitivity of BIC to sample size are not surprising since BIC is a function of sample size. On the performance of AIC, our results mirrored those of Yang (2006) and Nylund et al. (2007), who concluded that AIC performs poorly because it does not improve much as sample size increases.
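The dependence of BIC and ABIC on sample size can be made concrete with the standard definitions: AIC = -2LL + 2p, BIC = -2LL + p ln(N), and the sample-size adjusted BIC replaces N with (N + 2)/24. The sketch below computes the three criteria and picks the class count with the smallest value. The function names, log-likelihoods, and parameter counts are hypothetical and serve only to illustrate the mechanics.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, and sample-size adjusted BIC (ABIC) for one fitted model."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "ABIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def select_classes(fits, n_obs, criterion):
    """Pick the class count whose model has the smallest value of `criterion`.

    `fits` maps number of classes -> (log_likelihood, n_params).
    """
    values = {
        k: information_criteria(ll, p, n_obs)[criterion]
        for k, (ll, p) in fits.items()
    }
    return min(values, key=values.get)


# Penalty per free parameter at the sample sizes discussed in the text.
for n in (400, 1000, 2000):
    print(
        f"N = {n}: AIC 2.000, BIC {math.log(n):.3f}, "
        f"ABIC {math.log((n + 2) / 24):.3f}"
    )

# Hypothetical log-likelihoods and parameter counts, for illustration only.
fits = {1: (-5210.0, 9), 2: (-5188.0, 14), 3: (-5180.0, 19)}
for name in ("AIC", "BIC", "ABIC"):
    print(name, "selects", select_classes(fits, n_obs=400, criterion=name), "classes")
```

The AIC penalty per parameter is constant, while the BIC and ABIC penalties grow with ln(N), so the criteria can disagree about the number of classes for the same set of fitted models.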

The LRTs performed very well in class enumeration, particularly with the unconditional GMM when class separation was 3.9. LMR performed almost as well in class identification as BLRT. Our results about the performance of LMR supported the conclusions of Tofighi and Enders (2007), Nylund et al. (2007), and Liu and Hancock (2014). Our results about the BLRT’s performance are consistent with Nylund et al. (2007). The poor performance of the LRTs on GMM with covariates might be due to the model complexity of GMM with covariates.

Conclusion

This paper focused on evaluating the inclusion of covariates in GMM under various specifications, the difference between DE-GMM and GP-GMM, and the performance of model fit indices. Consistent with previous studies, we concluded that GMM depends on larger sample sizes and is difficult to implement with small sample sizes. When class separation is large and a sufficient sample size is available (e.g., 1,000), the ABIC, LMR, and BLRT can be employed together to detect the correct number of classes. Based on the findings of this study, the practice we recommend is to assess the class separation without covariates if the sample size is large. However, if the sample size is small (e.g., 400) and the class separation is not clear, we recommend including a covariate to improve class enumeration accuracy, with the awareness that class enumeration is frequently inadequate with small sample sizes. Covariates should be selected based on substantive theory and empirical analysis (Muthén, 2004). Also, when selecting covariates, care must be given to whether the predictor predicts only latent class membership or also the latent growth factors within class (Bauer, 2007). If a predictor is to be included, the DE-GMM is recommended because it performs slightly better than the GP-GMM on most occasions.

The findings of this study are limited by the fact that only linear growth models were examined. Furthermore, only one level of class proportions and number of waves was examined, and we simulated data where the covariate had effects on the intercept and slope but not direct effects on the outcome (i.e., the full mediation assumption discussed by Stoel, Van den Wittenboer, and Hox (2004) held in the simulated data). Future research could focus on population models with nonlinear growth and multiple levels of the number of waves. We also restricted our study to the most commonly used fit criteria. A comprehensive evaluation of fit criteria that were not examined in this study, including the consistent AIC (C-AIC), Hannan and Quinn (HQ), Hurvich and Tsai AIC, normalized entropy criterion (NEC), classification likelihood criterion (CLC), integrated completed likelihood criterion (ICL), and adjusted Lo-Mendell-Rubin (aLMR), can be found in Peugh and Fan (2014). We analyzed only conditions where the growth trajectory was correctly specified, so it remains an open question whether class identification with large sample sizes using different fit criteria is robust to some degree of misspecification of the growth trajectory.

A separate but related area of research concerns how to include covariates in GMM when the interest is in either estimating the effect of covariates on class membership or estimating the effects of class membership on distal outcomes. In this area, several multiple-step alternatives to fitting a single GMM with covariates have been proposed, such as the pseudo class (PC) method (Clark & Muthén, 2009), Vermunt’s three-step method (Vermunt, 2010), and the Lanza, Tan, and Bray (2013) method. Comparisons across these methods, such as the recent study by Asparouhov and Muthén (2014), are an area of active research.