Power and sample size computation for Wald tests in latent class models

Latent class (LC) analysis is used by social, behavioral, and medical science researchers among others as a tool for clustering (or unsupervised classification) with categorical response variables, for analyzing the agreement between multiple raters, for evaluating the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for modeling heterogeneity in developmental trajectories. Despite the increased popularity of LC analysis, little is known about statistical power and required sample size in LC modeling. This paper shows how to perform power and sample size computations in LC models using Wald tests for the parameters describing association between the categorical latent variable and the response variables. Moreover, the design factors affecting the statistical power of these Wald tests are studied. More specifically, we show how design factors which are specific for LC analysis, such as the number of classes, the class proportions, and the number of response variables, affect the information matrix. The proposed power computation approach is illustrated using realistic scenarios for the design factors. A simulation study conducted to assess the performance of the proposed power analysis procedure shows that it performs well in all situations one may encounter in practice.


Introduction
Latent class (LC) analysis was initially introduced in the 1950s by Lazarsfeld (1950) as a tool for identifying subgroups of individuals giving similar responses to sets of dichotomous attitude questions. It took another two decades until LC analysis started attracting the attention of other statisticians. Since then, various important extensions of the original LC model have been proposed, such as models for polytomous responses, models with covariates, models with multiple latent variables, and models with parameter constraints (Goodman 1974; Dayton and Macready 1976; Formann 1982; McCutcheon 1987; Dayton and Macready 1988; Vermunt 1996; Magidson and Vermunt 2004). More recently, statistical software for LC analysis has become generally available, e.g., Latent GOLD (Vermunt and Magidson 2013b), Mplus (Muthén and Muthén 2012), LEM (Vermunt 1997), the SAS routine PROC LCA (Lanza, Collins, Lemmon, and Schafer 2007), and the R package poLCA (Linzer and Lewis 2011), which has contributed to the increased popularity of this model among applied researchers. Applications of LC analysis include building typologies of respondents based on social survey data (McCutcheon 1987), identifying subgroups based on health risk behaviors (Collins and Lanza 2010), identifying phenotypes of stalking victimization (Hirtenlehner, Starzer, and Weber 2012), and finding symptom subtypes of clinically diagnosed disorders (Keel et al. 2004). Applications which are specific for medical research include the estimation of the sensitivity and specificity of diagnostic tests in the absence of a gold standard (Rindskopf and Rindskopf 1986; Yang and Becker 1997) and the analysis of the agreement between raters (Uebersax and Grove 1990).
Despite the increased popularity of LC analysis in a broad range of research areas, no specific attention has been paid to power analysis for LC models. However, as in the application of other statistical methods, users of LC models wish to confirm the validity of their research hypotheses. This requires that a study has sufficient statistical power; that is, that it is able to confirm a research hypothesis when it is true. Also, reviewers of journal publications and research grant proposals often request sample size and/or power computations (Nakagawa and Foster 2004). However, in the literature on LC analysis, methods for sample size and/or power computation as well as a thorough study on the design factors affecting the power of statistical tests used in LC analysis, are lacking.
In this paper, we present a method for assessing the power of tests related to the class-specific response probabilities, which in confirmatory LC analysis are the parameters of main interest. Relevant tests include tests for whether response probabilities are equal across latent classes, whether response probabilities are equal to specific values, whether response probabilities are equal across response variables (indicators), and whether sensitivities or specificities are equal across indicators (Goodman 1974; Holt and Macready 1989; Vermunt 2010b). Since the class-specific response probabilities are typically parameterized using logit equations (Formann 1992; Vermunt 1997), as in logistic regression analysis, hypotheses about these LC model parameters can be tested using Wald tests (Agresti 2002). The proposed power analysis method is therefore referred to as a Wald based power analysis.
For logistic regression models, Demidenko (2007, 2008) and Whittemore (1981) described the large-sample approximation for the power of the Wald test. In this paper, we show how to use this procedure in the context of LC analysis. An important difference compared to standard logistic regression analysis is that in an LC analysis the predictor in the logistic models for the responses, the latent class variable, is unobserved. This implies that the uncertainty about the individuals' class memberships should be taken into account in the power and sample size computation. As will be shown, factors affecting this uncertainty include the number of classes, the class proportions, the strength of the association between classes and indicator variables, and the number of indicator variables (Collins and Lanza 2010; Vermunt 2010a).
The remainder of this paper is organized as follows. Section 2 presents the LC model for dichotomous responses and discusses the relevant hypotheses for the parameters of the LC model. Section 3 discusses power computation for Wald tests in LC analysis and, moreover, shows how the LC specific design factors affect the power via the information matrix. Section 4 presents a numerical study in which we assess the performance of the proposed method and illustrates power/sample size computation for different scenarios of the relevant design factors. In the last section, we provide a brief discussion of the main results of our study.

The LC Model
The LC model is a probabilistic clustering or unsupervised classification model for dichotomous or categorical response variables (Goodman 1974; McCutcheon 1987; Hagenaars 1988; Magidson and Vermunt 2004; Vermunt 2010b). Taking the dichotomous case as an example, let $y_{ij}$ be the value of response pattern $i$ for the binary variable $Y_j$, for $j = 1, 2, 3, \ldots, p$, where $y_{ij} = 1$ represents a positive response and 0 a negative response. We denote the full response vector by $\mathbf{y}_i$. For example, for $p = 3$, $\mathbf{y}_i$ takes on one of the following eight triplets of 0's and/or 1's: (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), and (1, 1, 1). The three response variables could, for example, represent the answers to the following questions: "Do you support gay marriage?", "Do you support a raise of minimum wages?", and "Do you support the initiative for health care reform?". In a sample of $n$ persons, a particular person could answer these questions with 'no', 'yes', and 'yes', respectively, in which case the response pattern for this subject becomes (0, 1, 1). In such an application, the aim of the analysis would be to determine whether one can identify two latent classes with different response tendencies (say republicans and democrats), and subsequently to classify subjects into one of these classes based on their observed responses, or to compare the probability of a positive response to a given response variable between the republican and the democrat classes.
In general, for $p$ dichotomous response variables, we have $2^p$ tuples of 0's and/or 1's. We denote the number of individuals with response pattern $\mathbf{y}_i$ by $n_i$, where the total sample size is $n = \sum_{i=1}^{2^p} n_i$. The LC model assumes that the response probabilities depend on a discrete latent variable, which we denote by $X$ with categories $t = 1, 2, 3, \ldots, c$. The probability of having response pattern $\mathbf{y}_i$ is modeled as a mixture of $c$ class-specific probability functions (Dayton and Macready 1976; Goodman 1974; McCutcheon 1987; McLachlan and Peel 2000; Vermunt 2010b). That is,
$$P(\mathbf{y}_i) = \sum_{t=1}^{c} P(X = t)\, P(\mathbf{Y} = \mathbf{y}_i \mid X = t), \tag{1}$$
where $P(X = t)$, which we also denote by $\pi_t$, represents the relative size of class $t$, and $P(\mathbf{Y} = \mathbf{y}_i \mid X = t)$ is the corresponding class-specific joint response probability. The class-specific probability for binary variable $Y_j$ is usually modeled using a logistic parameterization; that is, $\theta_{jt} = P(Y_j = 1 \mid X = t) = \exp(\beta_{jt}) / (1 + \exp(\beta_{jt}))$, where $\beta_{jt}$ is the log-odds of giving a positive response on item $j$ in class $t$. Moreover, assuming that the response variables are independent within classes, which is referred to as the local independence assumption, the LC model represented by equation (1) can be rewritten as follows:
$$P(\mathbf{y}_i) = \sum_{t=1}^{c} \pi_t \prod_{j=1}^{p} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}, \tag{2}$$
where $\pi_t$ is such that $0 < \pi_t < 1$ and $\sum_{t=1}^{c} \pi_t = 1$. The vector of parameters, $\Psi$, consists of the sub-vector $\pi$, the class proportions, and the sub-vector $\beta$, the class-specific logits for the indicator variables. For example, for $c = 2$ and $p = 3$, the parameter vector will be $\Psi = (\pi', \beta')' = (\pi_1, \beta_{11}, \beta_{21}, \beta_{31}, \beta_{12}, \beta_{22}, \beta_{32})'$. In the application presented above, these parameters would correspond to the proportion of 'republicans', the log-odds that a republican responds 'yes' instead of 'no' to questions $Y_1$, $Y_2$, and $Y_3$, and the log-odds that a democrat responds 'yes' instead of 'no' to questions $Y_1$, $Y_2$, and $Y_3$.
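The mixture under local independence can be illustrated with a minimal Python sketch. The function name `pattern_probabilities`, the nested-list layout `theta[t][j]` (class first, item second, the reverse of the subscript order in $\theta_{jt}$), and the numerical values are assumptions made for illustration only; they do not come from any LC software.

```python
from itertools import product

def pattern_probabilities(pi, theta):
    """Return {pattern: P(y)} for all 2^p binary response patterns.

    pi[t]       -- class proportion of class t (hypothetical values below)
    theta[t][j] -- P(Y_j = 1 | X = t), assuming local independence
    """
    p = len(theta[0])
    probs = {}
    for y in product((0, 1), repeat=p):
        total = 0.0
        for pi_t, theta_t in zip(pi, theta):
            joint = pi_t  # start with the class proportion
            for y_j, th in zip(y, theta_t):
                # Bernoulli contribution of item j within class t
                joint *= th if y_j == 1 else 1.0 - th
            total += joint
        probs[y] = total
    return probs

# Two classes, three items, illustrative response probabilities.
pi = [0.5, 0.5]
theta = [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]]
probs = pattern_probabilities(pi, theta)
```

Multiplying each probability by a sample size $n$ gives the expected frequency of the corresponding response pattern under the assumed population model.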
In general, for an LC model having $c$ classes and $p$ binary indicator variables, we have $m = c - 1 + c \cdot p$ free model parameters. These parameters are usually estimated by maximum likelihood (ML) (Dayton and Macready 1976; Goodman 1974; McLachlan and Peel 2000; Vermunt 2010b), which involves seeking the values of $\Psi$, say $\hat{\Psi}$, which maximize the log-likelihood function:
$$\ell(\Psi) = \sum_{i=1}^{2^p} n_i \log P(\mathbf{y}_i). \tag{3}$$
Maximizing the log-likelihood function in (3) produces a unique estimate for $\Psi$, provided that the LC model in equation (1) is identifiable. As indicated by Goodman (1974), a necessary condition for an LC model to be identified is that the number of independent response patterns is at least as large as the number of free model parameters. That is, $2^p - 1 \geq m = c - 1 + c \cdot p$. A sufficient condition for local identification is that the Jacobian is of full rank (McHugh 1956). Because the analytic evaluation of the rank of the Jacobian is very difficult, Forcina (2008) proposed checking identification of LC models by evaluating the rank of the Jacobian for a large number of random parameter values. For the scenarios considered in this paper we applied Forcina's method, which showed that the models were identified.
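The counting condition is easy to check before any further computation. The sketch below uses hypothetical helper names; it covers only the necessary condition, so a model that passes may still fail the Jacobian rank check.

```python
def n_free_parameters(c, p):
    """Number of free parameters m = (c - 1) + c * p for c classes, p binary items."""
    return (c - 1) + c * p

def passes_counting_condition(c, p):
    """Necessary (not sufficient) identification condition: 2^p - 1 >= m."""
    return 2 ** p - 1 >= n_free_parameters(c, p)

# Example: two classes with three items are exactly at the boundary (7 >= 7),
# whereas three classes with three items fail the condition (7 < 11).
two_class_ok = passes_counting_condition(2, 3)
three_class_ok = passes_counting_condition(3, 3)
```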
Typically, researchers using LC models do not only wish to obtain point estimates for the $\Psi$ parameters, but are also interested in tests concerning these parameters. For simplicity we will focus on a single type of test, which in most applications is the test of main interest: the test to determine whether there is a significant association between the latent classes and a particular indicator variable. Inference regarding this association involves testing the null hypothesis that the response logit does not differ across latent classes for the indicator variable concerned. This null hypothesis can be formulated as $H_0: \beta_{j1} = \beta_{j2} = \ldots = \beta_{jc}$, for $j = 1, 2, 3, \ldots, p$. An equivalent formulation of this hypothesis is
$$H_0: \beta_{j1} - \beta_{j2} = 0,\ \beta_{j1} - \beta_{j3} = 0,\ \ldots,\ \beta_{j1} - \beta_{jc} = 0,$$
or, using matrix notation, $H_0: H\beta_j = 0$, where $H$ is a $c - 1$ by $c$ design matrix with linear contrasts and $\beta_j$ is a $c$ by 1 column vector with the parameters for $Y_j$, i.e., $\beta_j = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jc})'$. Under the null hypothesis of no association, the difference $\beta_{j1} - \beta_{jt}$ occurs by chance alone, implying that the indicator does not contribute to the definition of classes in a statistically significant way.
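As an illustration of the matrix formulation, the following sketch builds a contrast matrix whose rows encode the differences between the first class and each of the other classes. The function names are hypothetical, and other contrast codings that span the same hypothesis would serve equally well.

```python
def contrast_matrix(c):
    """Return a (c - 1) x c matrix H; row t encodes beta_j1 - beta_j(t+1)."""
    rows = []
    for t in range(1, c):
        row = [0] * c
        row[0] = 1    # first class is the reference
        row[t] = -1   # class t + 1 is compared with it
        rows.append(row)
    return rows

def apply_contrasts(H, beta_j):
    """Compute the vector H * beta_j with plain Python lists."""
    return [sum(h * b for h, b in zip(row, beta_j)) for row in H]

H = contrast_matrix(3)
# Equal logits across classes satisfy the null hypothesis H * beta_j = 0.
under_null = apply_contrasts(H, [2.0, 2.0, 2.0])
```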
As already indicated in the introduction section, various other types of hypotheses concerning the class-specific logit parameters may be of interest. Examples include tests for whether $\beta_{jt}$ is equal to a particular value (e.g., $\beta_{11} = 1$), whether the $\beta_{jt}$ parameters are equal across two or more items (e.g., $\beta_{1t} - \beta_{2t} = 0$), and whether the value is the opposite of the value for another class (e.g., $\beta_{11} + \beta_{12} = 0$) (Goodman 1974). In medical research, we may be interested in comparing the sensitivity and specificity of diagnostic tests (see, for example, Yang and Becker 1997), yielding hypotheses such as $\beta_{11} - \beta_{21} = 0$ and $\beta_{12} - \beta_{22} = 0$, respectively. Note that all these hypotheses can be expressed in the general form $H\beta = 0$.

The Wald Statistic and Its Asymptotic Properties
One of the properties of the ML estimator is that, under certain regularity conditions (McHugh 1956; White 1982), the estimator $\hat{\Psi}$ converges in probability to $\Psi$ as the sample size tends to infinity. That is, for any sequence $\hat{\Psi}_n$ we have $\hat{\Psi}_n \xrightarrow{a.s.} \Psi$. The other interesting property of the ML estimator is that it has a limiting normal distribution. More specifically, for large sample size $n$,
$$\sqrt{n}\,(\hat{\Psi}_n - \Psi) \xrightarrow{d} N(0, V),$$
where $\xrightarrow{d}$ denotes convergence in distribution, $V = I^{-1}(\Psi)$ is the asymptotic covariance of $\sqrt{n}\,\hat{\Psi}_n$, and $I(\Psi)$ is the $m$ by $m$ information matrix (McHugh 1956; Redner 1981; Rencher 2000; Wald 1943; Wolfe 1970). The latter has the following block structure:
$$I(\Psi) = \begin{pmatrix} I_1\{\pi_t, \pi_s\} & I_2\{\pi_t, \beta_{kl}\} \\ I_3\{\beta_{jq}, \pi_s\} & I_4\{\beta_{jq}, \beta_{kl}\} \end{pmatrix},$$
for $t, s = 1, 2, \ldots, c - 1$, $l, q = 1, 2, \ldots, c$, and $k, j = 1, 2, \ldots, p$. The sub-matrices $I_1$, $I_2$, $I_3$, and $I_4$ are of dimensions $c - 1$ by $c - 1$, $c - 1$ by $c \cdot p$, $c \cdot p$ by $c - 1$, and $c \cdot p$ by $c \cdot p$, respectively. The terms between braces indicate the parameters involved in the sub-matrix concerned.
Using the algebraic properties of block matrices, it follows that
$$V = I^{-1}(\Psi) = \begin{pmatrix} A^{-1} & -I_1^{-1} I_2 B^{-1} \\ -I_4^{-1} I_3 A^{-1} & B^{-1} \end{pmatrix}, \tag{5}$$
where $A = I_1 - I_2 I_4^{-1} I_3$ and $B = I_4 - I_3 I_1^{-1} I_2$. A necessary condition for $A$ to be invertible, which is a requirement to obtain the covariance matrix of $\hat{\Psi}_n$, is that both $I_1$ and $I_4$ are non-singular matrices (Rencher 2000). In the Appendix section, we provide details on the expressions for $I_1$, $I_2$, $I_3$, and $I_4$.
The consistency and multivariate normality discussed above apply to the estimators of the component parameters as well. That is, using the property of multivariate normal random variables which states that the sub-vectors of a multivariate normal are also normal, the limiting distributions of $\hat{\pi}$ and $\hat{\beta}$ become
$$\sqrt{n}\,(\hat{\pi} - \pi) \xrightarrow{d} N(0, A^{-1}) \quad \text{and} \quad \sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, B^{-1}).$$
The sub-vector $\hat{\beta}_j$ of $\hat{\beta}$ is also asymptotically normally distributed, with mean $\beta_j$ and with covariance $V_j$, being a $c$ by $c$ sub-matrix of $B^{-1}$. In the remaining part of the paper, we focus on this $\beta_j$. Using the Continuous Mapping Theorem (Mann and Wald 1943), for a design matrix $H$ that defines the contrasts of the null hypothesis, one can show that $\sqrt{n}\,(H\hat{\beta}_j - H\beta_j) \xrightarrow{d} N(0, HV_jH')$. The quadratic form of the test for the hypothesis $H_0: H\beta_j = 0$ yields the well-known Wald statistic:
$$W = n\,(H\hat{\beta}_j)'(H\hat{V}_jH')^{-1}(H\hat{\beta}_j),$$
where $\hat{V}_j$ is the estimate of $V_j$. Under the null hypothesis, that is, if $H_0: H\beta_j = 0$ holds, the Wald statistic $W$ has an asymptotic (central) chi-square distribution with $c - 1$ degrees of freedom (Rencher 2000; Wald 1943). That is,
$$W \xrightarrow{d} \chi^2(c - 1).$$
Under the alternative hypothesis, $W$ follows a non-central chi-square distribution with $c - 1$ degrees of freedom and non-centrality parameter $\lambda$. That is,
$$W \xrightarrow{d} \chi^2(c - 1, \lambda),$$
where
$$\lambda = n\,(H\beta_j)'(HV_jH')^{-1}(H\beta_j). \tag{10}$$
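As a worked special case of the non-centrality parameter, consider $c = 2$ classes, so that the test has a single degree of freedom. The entries $v_{11}$, $v_{12}$, and $v_{22}$ of $V_j$ are symbols introduced here only for illustration.

```latex
\[
H = \begin{pmatrix} 1 & -1 \end{pmatrix}, \qquad
V_j = \begin{pmatrix} v_{11} & v_{12} \\ v_{12} & v_{22} \end{pmatrix}, \qquad
H V_j H' = v_{11} - 2 v_{12} + v_{22},
\]
\[
\lambda = n\,(H\beta_j)'(H V_j H')^{-1}(H\beta_j)
        = \frac{n\,(\beta_{j1} - \beta_{j2})^2}{v_{11} - 2 v_{12} + v_{22}} .
\]
```

In this case $\lambda$ is the squared logit difference between the two classes divided by the variance of the estimated difference for one observation, multiplied by the sample size. It therefore grows linearly in $n$ and quadratically in the effect size.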

Power and Sample Size Computation
With the establishment of the distribution of the test statistic under the null and alternative hypotheses and the availability of a closed form expression for the non-centrality parameter λ, it becomes possible to compute the power of the test for a given sample size or the sample size for a given power. As in any power analysis, we first have to define the population model. In our case, this involves defining the number of classes and the number of response variables, and, moreover, specifying the values for the class proportions π and the class-specific logits β. For the assumed population model, we can compute the inverse information matrix V which appears in the formula of the non-centrality parameter.
Once the population parameters are set and V is computed, power computation for a given sample size and required sample size computation for a given power proceeds along the steps described below.

Steps for Power Computation
Power computation proceeds as follows:
1. Compute the non-centrality parameter $\lambda$ for the specified sample size $n$ (use the expression in equation 10).
2. For a given value of type I error $\alpha$, read the $100(1 - \alpha)$ percentile value from the (central) chi-square distribution. That is, find $\chi^2_{(1-\alpha)}(c - 1)$ such that, under the null hypothesis, $P\left(W > \chi^2_{(1-\alpha)}(c - 1)\right) = \alpha$. This value is referred to as the critical value of the test.
3. Compute the power as the probability that a random variable $W$ from the non-central chi-square distribution (with non-centrality parameter $\lambda$ given in step 1) will assume a value greater than the critical value obtained under step 2.

Steps for Sample Size Computation
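Sample size computation proceeds as follows:
1. For a given value of $\alpha$, read the $100(1 - \alpha)$ percentile value from the (central) chi-square distribution (see the second step for power computation).
2. For a given power and the critical value obtained in step 1, find the non-centrality parameter $\lambda$ such that, under the alternative hypothesis, the condition that the power is equal to $P\left(W > \chi^2_{(1-\alpha)}(c - 1)\right)$ is satisfied.
3. Given the value of $\lambda$ obtained in step 2 and the parameter values of the population model, solve equation (10) for the sample size; that is, $n = \lambda / \left[(H\beta_j)'(HV_jH')^{-1}(H\beta_j)\right]$, rounded up to the next integer.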
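Both procedures can be sketched in Python using only the standard library. This is an illustrative sketch, not the R script or the Latent GOLD implementation mentioned in this paper. All function names are hypothetical, and `delta` stands for the per-observation non-centrality $(H\beta_j)'(HV_jH')^{-1}(H\beta_j)$, which is assumed to have been computed beforehand from the population model. The chi-square distribution functions are evaluated with a series expansion and a Poisson mixture, which is adequate for the moderate values of $\lambda$ that arise here (roughly below 1400).

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chi2_critical(alpha, df):
    """Critical value: the 100(1 - alpha) percentile, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def noncentral_sf(x, df, lam):
    """P(W > x) for a non-central chi-square, as a Poisson mixture of central ones."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson weight for i = 0
    sf, i = 0.0, 0
    while i <= half or weight > 1e-17:
        sf += weight * (1.0 - chi2_cdf(x, df + 2 * i))
        i += 1
        weight *= half / i
    return sf

def wald_power(n, delta, alpha, df):
    """Power for sample size n: lambda = n * delta, then the upper tail probability."""
    return noncentral_sf(chi2_critical(alpha, df), df, n * delta)

def required_sample_size(target_power, delta, alpha, df):
    """Find lambda giving the target power by bisection, then n = ceil(lambda / delta)."""
    crit = chi2_critical(alpha, df)
    lo, hi = 0.0, 1.0
    while noncentral_sf(crit, df, hi) < target_power:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if noncentral_sf(crit, df, mid) < target_power:
            lo = mid
        else:
            hi = mid
    return hi, math.ceil(hi / delta)

# Hypothetical per-observation non-centrality for a three-class test (df = c - 1 = 2).
power_at_100 = wald_power(100, 0.12, 0.05, 2)
lam, n_needed = required_sample_size(0.80, 0.12, 0.05, 2)
```

Because the power increases monotonically in $\lambda$, a simple bisection suffices for step 2 of the sample size computation; a statistical library with a non-central chi-square distribution could replace the hand-written functions.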

Software Implementation
The above procedure for power computation can be applied using existing software for LC analysis that allows defining starting values or fixed values for the logit parameters and that provides the (inverse) information matrix as output, for example, LEM (Vermunt 1997), Mplus (Muthén and Muthén 2012), or Latent GOLD (Vermunt and Magidson 2013b). More specifically, with an LC analysis software package, the inverse information matrix V can be obtained. This will typically require the following two steps:
A. Create a data set containing all possible data patterns and with the expected frequencies according to the LC model of interest as weights. This can be achieved by running the LC software with the population parameters specified as fixed values and with the estimated frequencies as requested output. The created output is, in fact, a data set which is exactly in agreement with the population model. Such a data set is sometimes referred to as an 'exemplary' data set (O'Brien 1986).
B. Analyze the (exemplary) data set created in step A with the LC model of interest and request the variance-covariance matrix of the parameters (the inverse information matrix) as output. Note that when analyzing a data set which is exactly in agreement with the model, the observed information matrix is identical to the expected information matrix. The same applies to the approximate observed information matrix based on the outer-product of the gradient contributions of the data patterns (see Appendix).
The above two steps provide us with the inverse information matrix V. The actual power or sample size computations using the steps described above can subsequently be performed using software that allows performing matrix computations and that has functions for obtaining the critical value from the chi-squared distribution and the non-centrality value from the noncentral chi-squared distribution. For this purpose, one can use R. An R script is available from the first author.
The procedure described above is fully automated in version 5.0 of the Latent GOLD program (Vermunt and Magidson 2013b). Users define the population model and specify either the sample size or the required power. The program computes the power or the required sample size for the Wald tests it reports by default, as well as for other Wald tests defined by the user. In the Appendix, we give an example of the Latent GOLD syntax for power computation.

Design Factors Affecting the Power of a Wald Test in LC Models
Now let us look in more detail at the factors affecting the power of the Wald test in LC models. It should be noted that the power is determined by the value of the type I error and the value of the non-centrality parameter $\lambda$. The larger the type I error and the larger $\lambda$, the larger the power. As can be observed from equation (10), $\lambda$ is a function of the sample size $n$, the precision of the estimator ($V_j$), and the effect size $H\beta_j$. Note that in our case the effect size is the difference between the class-specific $\beta$ parameters or, equivalently, the strength of the association between the classes and the response variable concerned.
Specific for LC models is that the precision of the estimator is affected by the fact that class membership is unobserved; that is, that we are uncertain about a person's class membership. Recall from equation (5) that the block of $V$ concerning the $\beta$ parameters is obtained as the inverse of $B = I_4 - I_3 I_1^{-1} I_2$. This means that $B$ becomes larger when $I_4$ and $I_1$ become larger and when $I_2$ and $I_3$ become smaller. To show how the uncertainty about the class membership affects $B$, let us have a closer look at $I_4$, which is the most important term in $B$. Its elements are obtained as follows:
$$I_4(\beta_{jq}, \beta_{kl}) = \sum_{i=1}^{2^p} P(X = q \mid \mathbf{y}_i)\, P(X = l \mid \mathbf{y}_i)\, (y_{ij} - \theta_{jq})(y_{ik} - \theta_{kl})\, P(\mathbf{y}_i),$$
where $\theta_{jq} = \exp(\beta_{jq}) / (1 + \exp(\beta_{jq}))$. As can be seen, specific for an LC analysis is that the elements of the information matrix are not only a function of the model parameters, but also of the posterior class membership probabilities $P(X = q \mid \mathbf{y}_i)$. For example, the contribution of response pattern $i$ to the information on parameter $\beta_{jq}$ equals $P(X = q \mid \mathbf{y}_i)^2 (y_{ij} - \theta_{jq})^2 P(\mathbf{y}_i)$. In other words, response pattern $i$ contributes with "weight" $P(X = q \mid \mathbf{y}_i)^2$ to the information on a parameter of class $q$. The total contribution to the parameters of all $c$ classes equals $\sum_{t=1}^{c} P(X = t \mid \mathbf{y}_i)^2$. This shows that the information is maximal when $P(X = q \mid \mathbf{y}_i)$ equals 1 for one class and 0 for the other classes, in which case the total contribution equals 1. This occurs when the classes are perfectly separated or when the class membership is observed rather than latent.
The entries of $I_1$ also become larger when the posterior class membership probabilities get closer to either 0 or 1. The matrices $I_2$ and $I_3$ capture the overlap in information between the class proportions and the $\beta$ parameters. The elements of these matrices are 0 when separation is perfect and become larger with lower class separation.
The implication of the above is that the power can be increased by increasing the separation between the classes; that is, by influencing the factors affecting the posterior class membership probabilities. The posterior class membership probabilities depend on the number of classes, the class proportions, the class-specific conditional response probabilities, and the number of response variables (Collins and Lanza 2010; Vermunt 2010a). More specifically, class separation is better with fewer latent classes, a more uniform class distribution, response variables which are more strongly related to the classes, and a larger number of response variables.
Note that the conditional response probabilities have a dual role. The more the conditional response probabilities $\theta_{jq}$ or the logit parameters $\beta_{jq}$ differ across latent classes, the larger the effect size and thus also the higher the power of the test for the parameters of indicator variable $Y_j$. However, a larger difference between classes in the response on $Y_j$ also increases the class separation, and thus the power of all tests, including the ones for the other response variables.

Numerical Study
In this section, we present a numerical study that illustrates the Wald based power analysis for different configurations of design factors. As was shown in Section 3, in addition to the usual factors (i.e., sample size, level of significance, and effect size), power computation in LC models involves the specification of design factors such as the number of classes, the number of observed response variables, the class sizes, and the class-specific probabilities (or logits) for the response variables, which we refer to as LC-specific design factors.
As already indicated in Section 3.3, LC-specific design configurations yielding better separated classes, or posterior class membership probabilities which are closer to either 0 or 1, yield more precise estimators, and as a result larger power of the Wald tests. Therefore, in order to be able to compare different design configurations, it is important to have a measure for class separation. For this purpose, we use the entropy based R-square. The entropy of the posterior class membership probabilities for data pattern $i$, denoted by $E_i$, equals $E_i = -\sum_{t=1}^{c} P(X = t \mid \mathbf{y}_i) \log P(X = t \mid \mathbf{y}_i)$. Note that $E_i$ gets closer to 0 when the posteriors are closer to 0 and 1. The average entropy across data patterns, denoted by $E$, equals $E = \sum_{i=1}^{2^p} E_i P(\mathbf{y}_i)$. The entropy based R-square can now be obtained as follows: $R^2_{entropy} = 1 - E/E(0)$. Here, $E(0)$ is the maximum entropy given the class proportions; that is, $E(0) = -\sum_{t=1}^{c} P(X = t) \log P(X = t)$. The entropy based R-square takes on values between 0 and 1, where larger values of $R^2_{entropy}$ indicate larger separation between classes. Values lower than .5, between .5 and .75, and larger than .75 correspond to LC models with small, medium, and large class separation, respectively. Closer inspection of the expression $R^2_{entropy} = 1 - E/E(0)$ shows that the largest entropy based R-square is obtained when $E$ equals 0. This occurs when $P(X = t \mid \mathbf{y}_i)$ is either 0 or 1 for each response pattern $\mathbf{y}_i$; that is, when class separation is perfect.
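The entropy based R-square can be computed directly from the population parameters by enumerating all response patterns. The following Python sketch uses a hypothetical function name and the layout `theta[t][j]` for $P(Y_j = 1 \mid X = t)$; the parameter values are illustrative.

```python
import math
from itertools import product

def entropy_r2(pi, theta):
    """Entropy based R-square: 1 - E / E(0), with E averaged over all patterns."""
    p = len(theta[0])
    avg_entropy = 0.0
    for y in product((0, 1), repeat=p):
        joint = []
        for pi_t, theta_t in zip(pi, theta):
            prob = pi_t
            for y_j, th in zip(y, theta_t):
                prob *= th if y_j == 1 else 1.0 - th
            joint.append(prob)
        p_y = sum(joint)
        if p_y == 0.0:
            continue  # pattern cannot occur under the model
        # Entropy of the posterior P(X = t | y); zero posteriors contribute 0.
        e_i = -sum((jt / p_y) * math.log(jt / p_y) for jt in joint if jt > 0)
        avg_entropy += e_i * p_y
    e_0 = -sum(pt * math.log(pt) for pt in pi)  # entropy of the class proportions
    return 1.0 - avg_entropy / e_0

pi = [0.5, 0.5]
r2_weak = entropy_r2(pi, [[0.7] * 6, [0.3] * 6])
r2_strong = entropy_r2(pi, [[0.9] * 6, [0.1] * 6])
```

When the indicators carry no information about the classes (all $\theta_{jt}$ equal across classes), the posteriors equal the class proportions and the measure is 0.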

Manipulation of the Design Factors
The LC-specific design factors that were varied are the number of classes, the number of indicator variables, the class-specific conditional probabilities, and the class proportions. The number of classes varied from 2 to 4 (i.e., $c = 2, 3, 4$). The number of indicator variables was set to $p = 6$ and $p = 10$. The class-specific conditional probabilities $\theta_{jt}$ were 0.7, 0.8, and 0.9 (or, depending on the class, $1 - 0.7$, $1 - 0.8$, and $1 - 0.9$), corresponding to a weak, medium, and strong association between classes and indicator variables. The $\theta_{jt}$ were high for class 1, say 0.8, and low for class $c$, say $1 - 0.8$; with $c = 3$, class 2 had high $\theta_{jt}$ values for the first half of the items and low values for the other items; with $c = 4$, class 2 had low $\theta_{jt}$ values for the first half of the items and high values for the other items, and class 3 had high $\theta_{jt}$ values for the first half of the items and low values for the other items. The class sizes were equal or unequal, where for the unequal conditions we used class proportions of (0.75, 0.25), (0.5, 0.3, 0.2), (0.6, 0.3, 0.1), and (0.4, 0.3, 0.2, 0.1), respectively.
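The pattern of high and low response probabilities described above can be written down compactly. The sketch below is one reading of that description; the function name `build_theta` and the row-per-class layout are assumptions for illustration.

```python
def build_theta(c, p, high):
    """Return a list of c rows (classes), each with p response probabilities."""
    low = 1.0 - high
    half = p // 2
    all_high = [high] * p
    all_low = [low] * p
    high_then_low = [high] * half + [low] * (p - half)
    low_then_high = [low] * half + [high] * (p - half)
    if c == 2:
        return [all_high, all_low]
    if c == 3:
        # class 2: high on the first half of the items, low on the rest
        return [all_high, high_then_low, all_low]
    if c == 4:
        # class 2: low then high; class 3: high then low
        return [all_high, low_then_high, high_then_low, all_low]
    raise ValueError("only c = 2, 3, 4 are covered by the design")

# Medium association, three classes, six indicators.
theta_medium = build_theta(3, 6, 0.8)
```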
In addition to the four LC-specific design factors, we varied the sample size, power, level of significance, and effect size (Cohen 1988). For power computation, the sample size was set to 75, 100, 200, 300, 500, 700, 1000, and 1500, whereas for sample size determination, the power was set to .8, .9, and .95. The type I error was fixed to 0.05. The effect size is already specified via the response probabilities $\theta_{jt}$, where it should be noted that the logit coefficients $\beta_{jt}$ for which the Wald tests are performed equal $\beta_{jt} = \log\left[\theta_{jt} / (1 - \theta_{jt})\right]$. Table 1 presents the entropy based R-square for several combinations of the LC-specific design factors. It shows how the value of this R-square measure is affected by the number of classes, the class proportions, the number of indicators, and the strength of the class-indicator associations, given specific values of the other design factors. As can be seen, the smaller the number of classes, the larger the number of indicator variables, and/or the stronger the class-indicator associations, the larger the value of the entropy based R-square. Moreover, the more equal the class sizes, the larger the entropy based R-square. It can also be seen that the entropy based R-square may become very low when all conditions are less favorable.

Effects of Design Factors on Power and Sample Size
To investigate the effect of class separation on the power of the Wald test for the significance of a class-indicator association, the power is computed for five of the design configurations that were presented in Table 1 under different sample sizes. The results are presented in Table 2. From this table, we can see that the power of a Wald test for class-indicator association strongly depends on the class separation. When classes are well separated, a sample size of 100 can be large enough to achieve a power of .8 or more. With a class separation of .330, .607, and .624, a sample size of 900, 370, and 140, respectively, is required to achieve such a power. With very badly separated classes as in the worst condition, even a sample size of 1500 is not large enough to achieve a power of .8. Table 3 reports the required sample size for a specified power for various combinations of LC-specific design factors. We use the condition with $c = 3$, $p = 6$, equal size classes, and medium class-indicator associations as the baseline. This condition requires sample sizes of 82, 108, and 131, respectively, to achieve the three reported power levels. The other conditions are obtained by varying one design factor at a time.
The results in Table 3 show that, as expected, the required sample size depends on the number of classes, the number of indicators, the strength of the class-indicator associations, and the class sizes. More specifically, keeping the other LC-specific design factors constant, the larger the number of classes and the smaller the number of indicators, the larger the required sample size to achieve the specified power level. The strength of the class-indicator associations turns out to be one of the key factors affecting the power; for example, to obtain a power of .80, we need at least 419 observations when these associations are weak, but only 34 observations when these are strong. Moreover, many more observations are required when the class sizes are unequal than when they are equal; for example, to achieve a power of .95, we need approximately 130, 225, and 600 observations for the (0.334, 0.333, 0.333), (0.5, 0.3, 0.2), and (0.6, 0.3, 0.1) condition, respectively. In summary, these results show that the strength of the class-indicator associations and the class distribution have a much stronger impact on the power than the number of classes and the number of indicator variables. The fact that the strength of the class-indicator association is so important can be explained by the fact that it affects both the class separation and the effect size. For example, for $p = 6$, $c = 3$, and equal class sizes, when the $\theta_{jt}$ value changes from .9 to .7, the class separation drops from .880 to .332 and the difference between classes in their conditional response probabilities drops from .8 to .4. Thus, a $\theta_{jt}$ value of .9 yields not only a much larger R-square value but also a much larger effect size than a $\theta_{jt}$ value of .7. The class sizes are important because the power of a test regarding differences between groups depends strongly on the size of the smallest group.
[Table 3 note: The required sample sizes are for the null hypothesis $H_0: \beta_{j1} = \beta_{j2} = \ldots = \beta_{jc}$ with $j = 1$ and $c = 3$. The baseline model is the model with $c = 3$, $p = 6$, equal size classes, and medium association between classes and indicators. One design factor is varied to get the other conditions reported in the table.]

Performance of the Power Computation Procedure
An important question is whether the theoretical power computed using the formulae presented in this paper agrees with the actual power when using the Wald test with empirical data. To answer this question, we conducted a simulation study in which the theoretical power is compared with the actual power in data sets generated from the assumed population model. Note that the actual power equals the proportion of simulated data sets in which the null hypothesis is rejected.
The population model is a three-class LC model with six indicators and equal class sizes. We varied the strength of the class-indicator associations (same three levels as above) and the sample size (75, 100, 200, 300, 500, 700, and 1000). The actual power was computed using 500 samples from the population under the alternative hypothesis. For each of these samples, the LC model is estimated and it is checked whether the Wald value for the test of interest exceeds the critical value. Table 4 presents the theoretical and actual power of the Wald test under the investigated simulation conditions. As can be seen, both measures show the same overall trend, namely that the power increases with increasing sample size and increasing effect size (and class separation). However, the actual power of the Wald test is always slightly lower than its theoretical value, where the differences are larger for the smaller sample size and the weaker class separation conditions. An explanation for these differences is that the estimated asymptotic variance-covariance matrix used in the simulated power computations overestimates the variability of the $\beta_j$ parameters. On the other hand, substantive conclusions are the same for the simulated and theoretical power levels reported in Table 4. With the small effect size and the corresponding weak class separation condition, a sample size of 500 is needed to achieve a power of .8; with the medium class separation, a sample size of 100 suffices; and with the strong class separation, fewer than 75 observations are needed.
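The data-generating and counting parts of such a simulation can be sketched in Python as follows. The estimation of the LC model and the computation of the Wald value for each sample are assumed to be done with LC software and are not shown; the function names, the seed, and the example Wald values are hypothetical.

```python
import random

def simulate_lc_data(n, pi, theta, seed=0):
    """Draw n response vectors: first a class from pi, then independent items."""
    rng = random.Random(seed)
    classes = range(len(pi))
    data = []
    for _ in range(n):
        t = rng.choices(classes, weights=pi)[0]  # latent class of this person
        data.append([1 if rng.random() < th else 0 for th in theta[t]])
    return data

def rejection_rate(wald_values, critical_value):
    """Actual power: proportion of samples whose Wald value exceeds the critical value."""
    rejected = sum(1 for w in wald_values if w > critical_value)
    return rejected / len(wald_values)

# Three equal classes, six indicators, medium association (illustrative values).
pi = [1 / 3, 1 / 3, 1 / 3]
theta = [[0.8] * 6, [0.8] * 3 + [0.2] * 3, [0.2] * 6]
sample = simulate_lc_data(200, pi, theta, seed=1)

# Made-up Wald values from four hypothetical replications, df = 2 critical value.
example_rate = rejection_rate([1.0, 7.0, 6.5, 3.0], 5.99)
```

With 500 replications, the proportion of rejections is itself an estimate with binomial sampling error, which is worth keeping in mind when comparing it with the theoretical power.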

Note to Table 4: The power presented here is for the null hypothesis H0: β_j1 = β_j2 = ... = β_jc with j = 1. Moreover, c = 3, p = 6, and class sizes are equal.

Discussion and Conclusion
In LC analysis, the association between class membership and the response variables is usually modeled using a logistic parametrization. This paper dealt with power analysis for Wald tests for these logit coefficients, for example, for the hypothesis of no association between class membership and the response provided on one of the indicators. We showed that, in addition to the usual design factors (effect size, sample size, and level of significance), the power of Wald tests in LC models depends on factors affecting the amount of uncertainty about the subjects' class memberships. More specifically, factors affecting the class separation also affect the power. The most important of these LC-specific design factors are the number of classes, the class proportions, the strength of the class-indicator associations, and the number of indicator variables. A numerical study was conducted to illustrate the proposed power and sample size computation procedures. More precisely, it was shown how class separation, quantified using the entropy-based R-square, is affected by the number of classes, the class proportions, the strength of the class-indicator associations, and the number of indicator variables, and, moreover, how class separation affects the power. It turned out that under the most favorable conditions a sample size of 100 suffices to achieve a power of .8 or .9. For the situation where the entropy-based R-square is small, a considerably larger sample size is required. It was shown that under the least favorable conditions, even a sample size of 2000 did not suffice to achieve an acceptable power level. This demonstrates the importance of performing a power analysis prior to conducting a study that will make use of LC analysis.
If power turns out to be too low given the planned sample size, instead of increasing the sample size, one may try to increase the class separation, for example, by using a larger number of indicators or, if possible, by using indicators of better quality. Note that improving the quality of indicators has a dual effect on the power of the Wald test for class-indicator associations: It increases both the effect size and the class separation. This dual effect could be seen in our numerical study, where we saw a dramatic reduction of the required sample size when the θ_jt value increased from .7 to .9. In practice, improving the quality of the indicators will not be easy, even in the type of more confirmatory LC analyses we were dealing with.
A simulation study was conducted to evaluate whether the theoretical power corresponds with the actual power of the Wald test. It turns out that the estimated power obtained with the formulae provided in this paper is slightly larger than the actual power, where we see a larger overestimation for smaller sample sizes and lower power levels. This implies that, to be on the safe side, one may use a slightly larger sample size than the estimated one to achieve the specified power.
In this paper, we restricted ourselves to power computations for Wald tests. However, likelihood-ratio tests are often used in LC models as well, either for testing the same kinds of hypotheses as discussed here or for comparing models with different numbers of latent classes. Future research will focus on power computation for likelihood-ratio tests in LC models.
Another limitation of the current work is that we restricted ourselves to simple LC models. In future work, we will investigate whether the methods discussed in this paper can be extended to more complex LC models, such as LC models with covariates, latent Markov models, mixture growth models, and mixture regression models.
Most of the simulation studies on LC and mixture modelling show that larger sample sizes may be needed than those found with the power computation method described in the current paper (see, for example, Yang 2006; Nylund, Asparouhov, and Muthén 2007; Tofighi and Enders 2008). Those studies are, however, about deciding on the number of classes, whereas here we focus on the class-indicator association for a single response variable assuming that the number of classes is known. Note also that these studies typically do not look at significance testing, but at the performance of measures like BIC, which may have less power because of their penalty for model complexity. Further research is needed on the power of statistical tests for deciding about the number of classes, for example, of the bootstrapped likelihood-ratio test.
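This shows that the computation of the information matrix requires the first-order partial derivatives $\partial \log P(y_i) / \partial \psi_l$. For a class proportion $\pi_t$ and a class-specific response logit $\beta_{jt}$, these take on the following form:

\[
\frac{\partial \log P(y_i)}{\partial \pi_t} = \frac{P(X = t \mid y_i)}{\pi_t} - \frac{P(X = c \mid y_i)}{\pi_c},
\qquad
\frac{\partial \log P(y_i)}{\partial \beta_{jt}} = P(X = t \mid y_i)\,(y_{ij} - \theta_{jt}).
\]

This yields the following forms for the entries of the sub-matrices $I_1$, $I_2$, $I_3$, and $I_4$:

\[
I_1(\pi_t, \pi_s) = \sum_{i=1}^{2^p}
\left[ \frac{P(X = t \mid y_i)}{\pi_t} - \frac{P(X = c \mid y_i)}{\pi_c} \right]
\left[ \frac{P(X = s \mid y_i)}{\pi_s} - \frac{P(X = c \mid y_i)}{\pi_c} \right]
P(y_i, \Psi),
\]

\[
I_2(\pi_t, \beta_{jl}) = \sum_{i=1}^{2^p}
\left[ \frac{P(X = t \mid y_i)}{\pi_t} - \frac{P(X = c \mid y_i)}{\pi_c} \right]
P(X = l \mid y_i)\,(y_{ij} - \theta_{jl})\, P(y_i, \Psi),
\]

\[
I_4(\beta_{jq}, \beta_{kl}) = \sum_{i=1}^{2^p}
P(X = q \mid y_i)\,(y_{ij} - \theta_{jq})\,
P(X = l \mid y_i)\,(y_{ik} - \theta_{kl})\, P(y_i, \Psi).
\]

By symmetry of the information matrix, $I_3$ is the transpose of $I_2$.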
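The sums over all 2^p response patterns can be evaluated directly for a small model. The following is a minimal sketch, assuming binary indicators, class c as the reference category for the class proportions, and the parameter ordering (π_1, ..., π_{c-1}, β_11, ..., β_pc). The function name `lc_information` and the parameter values in the usage lines are illustrative assumptions. As a check on the derivatives, the function also returns the expected value of each score, which should be zero.

```python
import itertools

def lc_information(pi, theta):
    """Per-observation expected information for an LC model.

    pi[t]       : class proportions (summing to 1); the last class is the reference
    theta[j][t] : P(y_j = 1 | X = t) for indicator j and class t
    Returns (information matrix, expected score vector).
    """
    c, p = len(pi), len(theta)
    n_par = (c - 1) + p * c
    info = [[0.0] * n_par for _ in range(n_par)]
    mean_score = [0.0] * n_par
    for y in itertools.product((0, 1), repeat=p):  # all 2^p response patterns
        # Joint probabilities P(X = t, y) and the marginal P(y)
        joint = []
        for t in range(c):
            prob = pi[t]
            for j in range(p):
                prob *= theta[j][t] if y[j] == 1 else 1.0 - theta[j][t]
            joint.append(prob)
        p_y = sum(joint)
        post = [value / p_y for value in joint]  # posterior P(X = t | y)
        # Scores with respect to the class proportions ...
        score = [post[t] / pi[t] - post[c - 1] / pi[c - 1] for t in range(c - 1)]
        # ... and with respect to the logits beta_jt
        score += [post[t] * (y[j] - theta[j][t]) for j in range(p) for t in range(c)]
        for a in range(n_par):
            mean_score[a] += score[a] * p_y
            for b in range(n_par):
                info[a][b] += score[a] * score[b] * p_y
    return info, mean_score

# Illustrative two-class model with three indicators
info, mean_score = lc_information([0.6, 0.4], [[0.8, 0.2]] * 3)
print(len(info), max(abs(s) for s in mean_score))
```

The upper-left block of the returned matrix corresponds to I_1, the off-diagonal blocks to I_2 and its transpose, and the lower-right block to I_4. Multiplying the matrix by the sample size gives the information for n observations.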

A.2 An Example of the Latent GOLD Setup for Wald Based Power Computation
The Latent GOLD 5.0 (Vermunt and Magidson 2013a) Syntax system implements the power computation procedure described in this paper. In order to perform such a Wald power computation, one should first create a small "example" data set; that is, a data set with the structure of the data one is interested in. With six binary response variables (y1 through y6), this file could be of the form:

    y1 y2 y3 y4 y5 y6
    0 0 0 0 0 0

which is basically a data set with a single observation with a response of 0 on all six variables. For this small data set, one defines the model of interest and requests the power or the required sample size using the output options. This is done using the Latent GOLD "options", "variables", and "equations" sections.

In the "variables" section, we define the variables which are in the model and also their number of categories. These are the six response variables and the latent variable "x". The "equations" section specifies the logit equations defining the model of interest, as well as the values of the population parameters. Note that the value 1.386294361 for a logit coefficient corresponds to a conditional response probability of .80.
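The correspondence between a logit coefficient and a conditional response probability can be checked with a few lines of Python. This is only a convenience sketch for preparing population values; the helper names `logit` and `inv_logit` are illustrative and are not part of Latent GOLD.

```python
import math

def logit(prob):
    """Logit coefficient corresponding to a response probability."""
    return math.log(prob / (1.0 - prob))

def inv_logit(beta):
    """Response probability corresponding to a logit coefficient."""
    return 1.0 / (1.0 + math.exp(-beta))

print(round(logit(0.80), 9))             # 1.386294361
print(round(inv_logit(1.386294361), 2))  # 0.8
```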
The "output" line in the "options" section lists the output requested. With WaldPower=<number>, one requests a power or sample size computation. When using a "number" between 0 and 1, the program reports the required sample size for that power, and when using a value larger than 1, the program reports the power obtained with that sample size. The optional statement WaldTest='filename' can be used to define user-specified Wald tests in addition to the tests which are provided by default. The linear contrasts for the user-defined hypotheses of interest are defined in a text file.
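The two uses of the requested number (a target power between 0 and 1, or a sample size larger than 1) can be mimicked in a small Python sketch. This is not Latent GOLD code and makes several assumptions: the Wald statistic is taken to follow a noncentral chi-square distribution with 2 degrees of freedom under the alternative (as for an equality test across three classes), the noncentrality parameter is taken to grow linearly with the sample size, and the per-observation noncentrality value of 0.02 is purely illustrative. All function names are hypothetical.

```python
import math

def wald_power_df2(ncp, alpha=0.05):
    """P(noncentral chi-square(df=2, ncp) > critical value at level alpha)."""
    if ncp > 1000.0:
        return 1.0  # exp(-ncp / 2) would underflow; power is essentially 1
    # For df = 2 the central survival function is exp(-x / 2),
    # so half the critical value is simply -log(alpha).
    half_x = -math.log(alpha)
    half_l = ncp / 2.0
    pois = math.exp(-half_l)  # Poisson weight for j = 0
    term = 1.0                # (x/2)^j / j! for j = 0
    inner = 1.0               # sum of (x/2)^i / i! for i <= j
    total = 0.0
    n_terms = int(half_l + 12.0 * math.sqrt(half_l) + 50)
    for j in range(n_terms):
        # Central chi-square survival with df = 2 + 2j, weighted by Poisson(j)
        total += pois * math.exp(-half_x) * inner
        pois *= half_l / (j + 1)
        term *= half_x / (j + 1)
        inner += term
    return total

def required_n(ncp_per_obs, target_power, alpha=0.05):
    """Smallest sample size whose power reaches the target."""
    n = 1
    while wald_power_df2(n * ncp_per_obs, alpha) < target_power:
        n += 1
    return n

def wald_power_request(number, ncp_per_obs, alpha=0.05):
    """A number in (0, 1) is read as a target power, a larger one as a sample size."""
    if 0.0 < number < 1.0:
        return required_n(ncp_per_obs, number, alpha)
    return wald_power_df2(number * ncp_per_obs, alpha)

print(wald_power_request(0.8, 0.02))  # required sample size for power .8
print(wald_power_request(500, 0.02))  # power obtained with n = 500
```

In an actual application the per-observation noncentrality would be derived from the population parameters and the information matrix rather than chosen by hand.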