Satorra and Saris (1985) developed a method for estimating the power of the likelihood ratio test (LRT; i.e., a SEM’s χ2 fit statistic) in SEM. This method can be used to estimate the power to detect overall misspecification of a SEM, as well as the power to detect misspecification due to specific parameters. We first discuss the power related to the overall fit of the model, and then explain how the same procedure can be used for power calculations related to specific parameters.
Theoretical background: Power to reject overall exact model fit
At the population level, the variables in a SEM may be related to each other. The population covariance matrix of the variables is denoted by Σpopulation. A researcher who plans to use SEM specifies a model that presumably explains the variances and covariances of the variables. The parameters in that model (for example, factor loadings, factor (co)variances, and residual variances in a factor model) lead to a so-called model-implied covariance matrix, denoted by Σmodel. If the researcher specified the correct model, then the specified model indeed gives rise to the population covariance matrix, and Σpopulation = Σmodel. If the specified model is not exactly correct, Σpopulation is generated by some other model, and no choice of parameter values in the specified model reproduces it exactly, so that Σpopulation ≠ Σmodel. The smallest discrepancy between Σpopulation and Σmodel that can be achieved by fitting the specified model to Σpopulation is denoted by F0; F0 = 0 only when the model is correctly specified.
The χ2 test of overall fit in SEM tests whether the hypothesized model fits exactly in the population, that is, the H0 that the population discrepancy F0 is zero. When H0 is true, the expected value of the χ2 statistic equals the expected sampling error, which is equal to the degrees of freedom (df) of the model. The df of a model can be calculated by counting the number of observed statistics p (the number of unique elements in the observed covariance matrix and mean vector of the variables; with k observed variables, the covariance matrix has k(k + 1)/2 unique elements, and modeling the means adds k more) and the number of model parameters to be estimated, q. The model’s df is then equal to df = p − q. Calculation of a model’s degrees of freedom will be illustrated in the example analysis in the next section.
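The counting rule can be written as a small Python helper. This is a sketch under simple assumptions: a single-group model in which each unique variance, covariance and, optionally, mean counts as one observed statistic. The function names `count_moments` and `model_df` are hypothetical and not part of lavaan or power4SEM. Note that a nonnegative df is necessary, but not sufficient, for a model to be identified.

```python
def count_moments(k, include_means=False):
    """Number of observed statistics p for k observed variables.

    The covariance matrix has k(k + 1)/2 unique elements (k variances and
    k(k - 1)/2 covariances); modeling the means adds k more statistics.
    """
    p = k * (k + 1) // 2
    if include_means:
        p += k
    return p


def model_df(k, q, include_means=False):
    """Degrees of freedom df = p - q for a model with q free parameters."""
    df = count_moments(k, include_means) - q
    if df < 0:
        raise ValueError("more free parameters than observed statistics: not identified")
    return df


# The path model of Example 1: 7 observed variables and
# 7 variances + 4 covariances + 10 regression coefficients = 21 free parameters.
print(count_moments(7), model_df(7, 7 + 4 + 10))  # 28 7
```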
Fitting the hypothesized model to data leads to an observed χ2 statistic. The p value associated with the observed χ2 statistic and the model’s df gives the probability of observing a sample discrepancy at least as large as the observed one, when any discrepancy is solely due to random sampling error. When this probability is smaller than the nominal α level, H0 is rejected, implying that the model does not hold exactly in the population. In other words, we conclude that the model is misspecified.
The H0 thus represents the case that the model fits exactly in the population. When this is true, the expected χ2 value will be equal to the expected sampling error, i.e., with E() denoting the expected value: E(χ2) = E(sampling error) = df. The H1 is that the model does not fit exactly in the population. When H1 is true but the (misspecified) H0 model is fit to the data, the test statistic also asymptotically follows a χ2 distribution (assuming multivariate normality and limited misfit), but a noncentral one with a larger mean and larger sampling variance. As a result, the distribution of the χ2 statistic under H1 lies more to the right, and is more spread out, than the distribution of the χ2 statistic under H0. The expected χ2 value under H1 consists not only of discrepancies due to sampling error, but also of discrepancies due to misspecification, i.e., E(χ2) = E(sampling error) + E(misspecification error). The expected misspecification error is called the noncentrality parameter, denoted by λ. Therefore, under H1, the expected χ2 statistic equals df + λ. The exact size of λ depends on the population discrepancy F0 and the sample size (see Moshagen & Erdfelder, 2016):
$$ \lambda = n \times F_0, \tag{1} $$
where n = N under normal-theory maximum likelihood estimation.
To summarize, under H0 the test statistic follows a central χ2 distribution, with an expected value (i.e., mean) equal to its df parameter, and sampling variance equal to 2 × df. Under H1, the test statistic follows a χ2 distribution that is noncentral, with a mean equal to its df plus its noncentrality parameter λ—a nonnegative number that quantifies the degree of misspecification error—and sampling variance equal to 2df + 4λ (i.e., greater misspecification leads to more variability between replications of a study). Table 2 provides an overview of the hypotheses, models, and distributions associated with H0 and H1.
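These moments can be checked by simulation. The Python sketch below is an illustration only, not how power4SEM works: it draws noncentral χ2 values with the standard construction as a sum of df squared unit-variance normal variates whose squared means add up to λ, and compares the sample mean and variance with df + λ and 2df + 4λ. The function name `simulate_chi2`, the seed, and the number of replications are arbitrary choices made for this sketch.

```python
import math
import random
import statistics


def simulate_chi2(df, lam, reps, seed=1):
    """Draw `reps` values of a (non)central chi-square statistic.

    A noncentral chi-square with `df` degrees of freedom and noncentrality
    `lam` is the sum of `df` squared unit-variance normal variates whose
    squared means add up to `lam`; here the whole shift sits on the first one.
    """
    rng = random.Random(seed)
    shift = math.sqrt(lam)
    draws = []
    for _ in range(reps):
        value = (rng.gauss(0.0, 1.0) + shift) ** 2
        value += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
        draws.append(value)
    return draws


df = 5
for lam in (0.0, 10.0):  # H0 (central) and an H1 with lambda = 10
    draws = simulate_chi2(df, lam, reps=50_000)
    print(f"lambda = {lam:4.1f}: "
          f"mean {statistics.fmean(draws):6.2f} (theory {df + lam:5.1f}), "
          f"variance {statistics.variance(draws):6.2f} (theory {2 * df + 4 * lam:5.1f})")
```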
Table 2 Overview of the hypotheses, models, and distributions associated with H0 and H1 of the overall χ2 test

Figure 1 shows a central χ2 distribution with df = 5 in red, and a noncentral χ2 distribution with df = 5 and λ = 10 in blue. The noncentral χ2 distribution is the χ2 distribution associated with H1. The vertical line indicates the critical χ2 value of the central χ2 distribution associated with H0, for α = .05. H0 is rejected only if the observed χ2 value is larger than this critical value. The blue area under the H1 curve to the right of the critical value therefore shows the statistical power: the probability of rejecting H0 given that H1 is true. This probability is easy to obtain once the distributions of the test statistic under H0 and H1 are known. The most challenging part of computing χ2-based power in SEM is obtaining the noncentrality parameter associated with a specific H1.
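Once the critical value and λ are known, the power is one minus the noncentral χ2 CDF evaluated at the critical value. The standard-library Python sketch below computes this for the two distributions of Figure 1. It is an illustration, not the app's implementation: the central CDF uses the power series of the regularized incomplete gamma function, the noncentral CDF is the usual Poisson(λ/2) mixture of central χ2 CDFs, and the critical value is found by bisection; all function names are hypothetical. In R, `pchisq()` and `qchisq()` with the `ncp` argument do the same job directly.

```python
import math


def chi2_cdf(x, k):
    """CDF of a central chi-square with k df (series for the regularized
    lower incomplete gamma function P(k/2, x/2))."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def ncx2_cdf(x, k, lam):
    """CDF of a noncentral chi-square: a Poisson(lam / 2) mixture of central
    chi-square CDFs with k, k + 2, k + 4, ... degrees of freedom."""
    if lam == 0:
        return chi2_cdf(x, k)
    h = lam / 2.0
    j_max = int(h + 20 * math.sqrt(h) + 50)  # later Poisson weights are negligible
    return sum(
        math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) * chi2_cdf(x, k + 2 * j)
        for j in range(j_max + 1)
    )


def chi2_critical(df, alpha=0.05):
    """Critical value of the central chi-square under H0, by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_power(lam, df, alpha=0.05):
    """Probability that the H1 (noncentral) statistic exceeds the H0 critical value."""
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


crit = chi2_critical(5)      # the vertical line in Figure 1
power = chi2_power(10.0, 5)  # area under the H1 curve beyond that line
print(f"critical value = {crit:.3f}, power = {power:.3f}")
```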
Satorra and Saris (1985) showed that in order to obtain the noncentrality parameter for the χ2 test in SEM, one can fit the H0 model to the covariances (and means) implied by the population model under H1. Because the model is fit to population moments rather than to sample data, there is no sampling error (E(sampling error) = 0). All resulting discrepancies therefore arise from misspecification error, so that
$$ \mathrm{E}\left(\chi^2\right) = 0 + \mathrm{E}\left(\mathrm{misspecification\ error}\right) = 0 + \lambda. \tag{2} $$
The χ2 value obtained in this way is therefore the noncentrality parameter λ under H1.
Practically, a researcher performing a SEM power analysis first has to formulate the H0 model. This is the model that the researcher thinks is the correct model. Next, the researcher has to think about a situation in which the H0 model should be rejected. That is, they have to define what H1 actually represents, by formulating a model with one or more additional parameters that are not zero. They then calculate the statistical power to reject the H0 model when H1 is true. Although conceptually it is easier to think about the H0 model first, and then define how the H0 model might be wrong (or what misspecification one wants to be able to detect with sufficient power), in order to perform power calculations, one has to specify the H1 model first, followed by the H0 model.
The following steps are used to obtain the statistical power (Saris & Satorra, 1993):
- Step 1: Calculate the model-implied population covariance matrix under the alternative-hypothesized model (Model H1). The calculated covariance matrix is treated as population data in Step 2.
- Step 2: Fit the null-hypothesized model (Model H0) to the model-implied covariance matrix from Step 1.
- Step 3: Use the χ2 value from Step 2 as the noncentrality parameter λ to calculate the statistical power.
We will illustrate these three steps with power analyses for the overall fit of a path model.
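Before turning to the path model, the three steps can be traced on a deliberately tiny, hypothetical example in a minimal Python sketch. Assumptions: the H1 model consists of two standardized variables correlating .30, the H0 model fixes their covariance to zero, N = 200, α = .05, and λ = N × F0 as in Equation 1. Because this H0 model only has free variances, its ML estimates are simply the diagonal of the H1 matrix, so Step 2 can be done in closed form; realistic models require an iterative ML fit, which power4SEM delegates to semTools. The helper names `log_det` and `f_ml` are hypothetical.

```python
import math
from statistics import NormalDist


def log_det(m):
    """log-determinant of a symmetric positive-definite matrix (via Cholesky)."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][t] * chol[j][t] for t in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def f_ml(s, sigma_diag):
    """ML discrepancy ln|Sigma| - ln|S| + tr(S Sigma^-1) - k for a diagonal Sigma."""
    k = len(s)
    sigma = [[sigma_diag[i] if i == j else 0.0 for j in range(k)] for i in range(k)]
    trace = sum(s[i][i] / sigma_diag[i] for i in range(k))
    return log_det(sigma) - log_det(s) + trace - k


# Step 1: population covariance matrix implied by a hypothetical H1 model:
# two standardized variables that correlate .30.
rho = 0.30
sigma_h1 = [[1.0, rho], [rho, 1.0]]

# Step 2: fit a hypothetical H0 model (both variances free, covariance fixed
# at 0) to sigma_h1. For this diagonal model the ML estimates of the variances
# are the diagonal elements of the fitted matrix, so no optimizer is needed.
estimates = [sigma_h1[i][i] for i in range(2)]
f0 = f_ml(sigma_h1, estimates)
N = 200
lam = N * f0  # Equation 1 with n = N: the chi-square of this fit is lambda

# Step 3: df = p - q = 3 - 2 = 1. With df = 1 the statistic is the square of
# a normal variate shifted by sqrt(lambda), so power has a closed form.
z = NormalDist().inv_cdf(1 - 0.05 / 2)
power = (1 - NormalDist().cdf(z - math.sqrt(lam))) + NormalDist().cdf(-z - math.sqrt(lam))
print(f"F0 = {f0:.4f}, lambda = {lam:.2f}, power = {power:.3f}")
```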
Example 1: Calculating the power of the χ2 test for overall fit of a path model
As an example, we use the path model that was analyzed by Ma et al. (2020). It evaluates the effects of role conflict, role ambiguity, coworker support, and family support on three outcomes: emotional exhaustion (EE), depersonalization (DP), and decreased personal accomplishment (DPA). This path model is shown in Fig. 2, using the thinner black lines (so the thicker gray lines should be ignored for now). The model contains seven variances, four covariances, and 10 regression coefficients to be estimated, leading to a total of 21 parameters. The number of unique elements in the observed covariance matrix equals (7 × 8)/2 = 28. Thus, df = 28 − 21 = 7. With a significance level of α = .05, exact fit of this model would be rejected if the χ2 value obtained were larger than the critical value of a χ2 distribution with df = 7 and α = .05, which equals χ2 = 14.067. In order to calculate the power of the overall χ2 test, we follow the three steps as outlined above.
- Step 1 - We have to specify an H1 model that contains more parameters than the model to be tested (H0). We have to specify the population values for all parameters in the model, including the parameters that are also included in the model under H0. For this example, we use the standardized parameter estimates obtained by Ma et al. (2019) as population values for the parameters that are also included in the H0 model. Figure 2 shows the path model with the thinner black lines representing these population parameters. In general, it may be convenient to specify the parameter values in standardized form, so that one can base values on the guidelines regarding small, medium, and large effects in the appropriate research domain. Next, we have to specify the parameters that are present under H1 but not under H0. These parameters define exactly how the model under H0 is misspecified. As there are many options for defining H1, deciding what the exact misspecification should entail may require considerable deliberation. In principle, we would advise researchers to think about the parameters that should really lead to rejection of H0 if they are not zero. Regarding the values of these parameters, our recommendation would be to choose the minimum value that would be of interest. In our example, we added two small effects to the model associated with H1: an effect of .10 of role ambiguity on EE, and an effect of .10 of family support on EE. In addition, we added a covariance of .30 between the residuals of family support and coworker support. Note that specifying only these three extra parameters implies that we chose population values of zero for the rest of the parameters, such as the effect of role conflict on DPA. Figure 2 shows the population values of all parameters under H1, with the extra parameters indicated by thicker gray lines. The goal of Step 1 of the procedure is to generate population data based on H1.
If one wants to do these calculations in R, one can, for example, specify the population values in designated matrices and use matrix algebra to compute the model-implied covariance matrix. Appendix 1 provides the R code to calculate the model-implied covariance matrix with matrix algebra for this example. Alternatively, the power4SEM app lets users specify the model in lavaan syntax with all parameters fixed, and performs these calculations behind the scenes using functions from the semTools package (Jorgensen, Pornprasertmanit, Schoemann & Rosseel, 2020). Below, we show the lavaan syntax that specifies our example model under H1.
All parameters are fixed at the (chosen) population values using the multiplication operator. For example, the population direct effect of RoleAmbi on CoSup is specified as being −.253 using “CoSup ~ -.253*RoleAmbi.” In the app, a graphical display of the model will appear at the right side of the dialog box. This figure is created using the semPlot package (Epskamp, 2019). Although the outline of these figures may not always be optimal, especially with larger models, this graphical display can be used to check whether all population values are indeed specified as fixed parameters. If the model syntax still contains unspecified/free direct effects or (co)variances, these will be displayed in red.
Note that we started by using the standardized parameter values reported by Ma et al. to ensure meaningful interpretation of the size of the parameters. However, by adding the extra parameters to the H1 model, we also changed the population variances of two of the variables. As a result, the standardized values of the parameters may also change, which compromises the interpretation of the specified parameter values in a standardized metric. If one clicks the button “View H1 values” in the app, a pop-up window appears that contains the model-implied covariance matrix of the H1 model. The variances of the variables are on the diagonal of this covariance matrix. In a path model where all variances equal 1, all parameters are in the standardized metric. In a factor model, the same is true when the common factors are scaled by fixing the factor variances to 1. If the model-implied variances are not equal to 1, users may want to change some population values (for example, by increasing or decreasing residual variances) such that the model-implied variances are 1. The pop-up window also contains a table with the values of the H1 parameters in the standardized metric. In our example, the model-implied variances of EE and DP are no longer exactly 1, but they are close enough to 1 that the differences between the standardized values of the added direct effects and the specified values are within rounding error.
- Step 2 - The next step is to specify the model under H0. In our app, the lower input box on the left can be used to enter the lavaan syntax specifying the model to be tested. A graphical display of the model to be analyzed is shown next to the input box. Since this model contains free parameters, this figure shows those parameters in red. Figures 3 and 4 show screenshots of the app with the input boxes and the graphical displays of our example model. If we hit the green button “Calculate NCP,” power4SEM fits the H0 model to the population data generated under H1, with the specified intended sample size, using the function SSpower() from the semTools package (Jorgensen et al., 2020). The resulting χ2 value is the noncentrality parameter that we need to calculate the power. In our example, the noncentrality parameter equals 26.638.
- Step 3 - In the second tab of the app, we can calculate the power of the χ2 test using the obtained noncentrality parameter. By filling in the noncentrality parameter (λ = 26.638), df = 7, and α = .05, the two associated χ2 distributions and the calculated power will appear at the right side. In this example, we see that the power to reject the overall fit of the path model, given the chosen H1 model, equals .982. At the lower left part of this tab, the minimum sample size that would be needed to obtain a specific power level can be calculated. In this example, a sample of 109 would be needed to obtain a power of .80.
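The matrix algebra behind Step 1 can be sketched in Python for a recursive path model with the formula Σ = (I − B)^-1 Ψ (I − B)^-T, where B holds the direct effects and Ψ the variances of exogenous variables and the residual (co)variances. This is an illustration under stated assumptions, not a translation of the R code in Appendix 1: the three-variable model and its values are hypothetical, the model must be recursive (no feedback loops), and for the helper `standardize_residuals` the variables must be listed in causal order. That helper sets each residual variance so that the model-implied variance equals 1, which keeps the specified effects in the standardized metric. All function names are hypothetical; plain lists are used to avoid dependencies.

```python
def mat_mult(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def implied_cov(B, Psi):
    """Sigma = (I - B)^-1 Psi (I - B)^-T for a recursive path model.

    B[i][j] is the direct effect of variable j on variable i; Psi holds the
    variances of exogenous variables and the (co)variances of residuals.
    """
    k = len(B)
    # In a recursive (acyclic) model B is nilpotent (B^k = 0), so the inverse
    # is the finite series I + B + B^2 + ... + B^(k-1).
    term = [[float(i == j) for j in range(k)] for i in range(k)]
    inverse = [row[:] for row in term]
    for _ in range(k - 1):
        term = mat_mult(term, B)
        inverse = [[inverse[i][j] + term[i][j] for j in range(k)] for i in range(k)]
    inverse_t = [list(col) for col in zip(*inverse)]
    return mat_mult(mat_mult(inverse, Psi), inverse_t)


def standardize_residuals(B, Psi):
    """Adjust the diagonal of Psi so that every model-implied variance equals 1.

    Variables must be ordered causally (predictors before outcomes): then the
    variance of variable i no longer changes once its own Psi[i][i] is set.
    Off-diagonal (residual) covariances in Psi are left unchanged.
    """
    Psi = [row[:] for row in Psi]
    for i in range(len(B)):
        Psi[i][i] += 1.0 - implied_cov(B, Psi)[i][i]
        if Psi[i][i] <= 0:
            raise ValueError(f"predictors explain 100% or more of variable {i}")
    return Psi


# Hypothetical model X -> M -> Y with an extra direct effect X -> Y
# (illustrative values only, not those of Ma et al.).
B = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.1, 0.4, 0.0]]
Psi = standardize_residuals(B, [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print([round(Psi[i][i], 3) for i in range(3)])  # [1.0, 0.75, 0.79]
for row in implied_cov(B, Psi):
    print([round(v, 3) for v in row])
```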
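Step 3 of Example 1 can also be reproduced outside the app. The standard-library Python sketch below computes the power from λ, df, and α, and finds the smallest λ that reaches a target power. It is an illustration, not power4SEM's code: the central χ2 CDF uses the power series of the regularized incomplete gamma function, the noncentral CDF is the usual Poisson(λ/2) mixture of central CDFs, and the critical value and required λ are found by bisection; all function names are hypothetical. For df = 7 and λ = 26.638 it should reproduce the power of about .98 reported above.

```python
import math


def chi2_cdf(x, k):
    """CDF of a central chi-square with k df (regularized incomplete gamma series)."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def ncx2_cdf(x, k, lam):
    """Noncentral chi-square CDF as a Poisson(lam / 2) mixture of central CDFs."""
    if lam == 0:
        return chi2_cdf(x, k)
    h = lam / 2.0
    j_max = int(h + 20 * math.sqrt(h) + 50)
    return sum(
        math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) * chi2_cdf(x, k + 2 * j)
        for j in range(j_max + 1)
    )


def chi2_critical(df, alpha=0.05):
    """Critical value of the central chi-square under H0, by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1 - alpha else (lo, mid)
    return (lo + hi) / 2


def chi2_power(lam, df, alpha=0.05):
    """Power of the chi-square test for noncentrality parameter lam."""
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


def required_lambda(df, target=0.80, alpha=0.05):
    """Smallest lambda whose power reaches `target`, by bisection on lambda."""
    crit = chi2_critical(df, alpha)

    def power_at(lam):
        return 1.0 - ncx2_cdf(crit, df, lam)

    lo, hi = 0.0, 1.0
    while power_at(hi) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if power_at(mid) < target else (lo, mid)
    return hi


print(f"power = {chi2_power(26.638, 7):.3f}")  # Step 3 of Example 1
print(f"lambda needed for .80 power with df = 7: {required_lambda(7):.3f}")
```

Because Equation 1 makes λ proportional to n, a required λ translates into a required sample size by rescaling: n_required = n_used × λ_required / λ_obtained, rounded up, where n_used is the sample size with which λ was obtained in Step 2.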
Theoretical background: Power of the χ2 difference test
The χ2 statistic can be used to evaluate the overall fit of a model, but it can also be used to test the difference between two nested models with the χ2 difference (Δχ2) test. For example, one may use the χ2 difference test to test whether removing a certain direct effect in a path model leads to significantly worse model fit. A specific model (Model A) is said to be nested within a less restricted model (Model B) with more parameters (i.e., fewer df) than Model A if Model A can be derived from Model B by introducing restrictions only. For example, path model A is nested within path model B if it is obtained by fixing one of the path coefficients in Model B to zero, or by constraining two path coefficients in Model B to be equal to each other. This is known as parameter nesting: two models are nested when the more restrictive model can be obtained from the less restrictive one by fixing or constraining some of its free parameters.
The H0 for the χ2 difference test is that the difference between the population discrepancy values for the two models (Model A and Model B) is zero: ΔF0 = F0_A − F0_B = 0, or in other words that the two models fit equally well. The H1 is that the models do not fit equally well, or specifically, that the more restricted Model A fits worse than Model B, so that F0_A − F0_B > 0, or equivalently, ΔF0 > 0.
Under the usual assumptions, the difference in χ2 values between two nested models is itself asymptotically χ2 distributed:
$$ \Delta\chi^2 = \chi^2_{\mathrm{A}} - \chi^2_{\mathrm{B}}, \tag{3} $$
with degrees of freedom for the difference equal to the difference in degrees of freedom for the two models:
$$ \Delta df = df_{\mathrm{A}} - df_{\mathrm{B}}. \tag{4} $$
When Model A and Model B fit equally well in the population (so H0 is true), the models have the same F0, leading to the same noncentrality parameter λ, such that Δλ = λA − λB = 0. In this case, the Δχ2 between the models asymptotically follows a central χ2 distribution. Under H1, when the two models do not fit equally well, the noncentrality parameter of the more restricted Model A will be larger, such that Δλ = λA − λB > 0. In this case, under the assumption that neither Model A nor Model B is badly misspecified, the Δχ2 between the models asymptotically follows a noncentral χ2 distribution with noncentrality parameter Δλ (Steiger et al., 1985). See Table 3 for an overview of the hypotheses, models, and distributions associated with H0 and H1 of the χ2 difference test.
Table 3 Overview of the hypotheses, models, and distributions associated with H0 and H1 of the χ2 difference test between two nested models Model A (most restrictive) and Model B (least restrictive)

The difference in model fit thus can be tested by comparing Δχ2 to a χ2 distribution with Δdf, which is called the χ2 difference test. If Δχ2 is significant, the H0 of equal fit for both models is rejected, so the less restrictive Model B should be retained. If Δχ2 is not significant, the fit of the restricted model (Model A) is not significantly worse than the fit of the unrestricted model (Model B), so the H0 of equal fit cannot be rejected. In this case, the more restricted model (Model A) may be preferred based on the parsimony principle.
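The Δχ2 test follows directly from Equations 3 and 4. In the minimal Python sketch below, the fit statistics of the two models are hypothetical numbers chosen for illustration, the p value is the upper tail of a central χ2 distribution with Δdf degrees of freedom (the distribution under H0), and the function names are hypothetical. The central CDF is computed with the series of the regularized incomplete gamma function so that only the standard library is needed.

```python
import math


def chi2_cdf(x, k):
    """CDF of a central chi-square with k df (regularized incomplete gamma series)."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_difference_test(chi2_a, df_a, chi2_b, df_b):
    """Delta chi-square test of Model A nested in Model B (Equations 3 and 4)."""
    delta_df = df_a - df_b
    if delta_df <= 0:
        raise ValueError("Model A must be the more restricted model (df_A > df_B)")
    delta_chi2 = chi2_a - chi2_b
    p_value = 1.0 - chi2_cdf(delta_chi2, delta_df)  # upper tail under H0
    return delta_chi2, delta_df, p_value


# Hypothetical fit results: Model A fixes one path of Model B to zero.
d_chi2, d_df, p = chi2_difference_test(chi2_a=12.5, df_a=6, chi2_b=5.2, df_b=5)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.4f}")
```

Passing the saturated model as Model B (χ2 = 0, df = 0) turns this into the overall χ2 test of Model A.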
Note that because all overidentified models (so all models with df > 0) are nested in the saturated model (the model with df = 0), the overall χ2 test is actually a special case of the Δχ2 test. That is, when Model B is the saturated model, its χ2 and df are both zero, so that Δχ2 and Δdf are the same as the overall χ2 and df of Model A.
Power calculations for the χ2 difference test are straightforward once the noncentrality parameter Δλ is obtained. Obtaining Δλ involves generating population data from the less restricted Model B. When the more restricted Model A is fitted to these data, the model will not fit perfectly and will yield a nonzero discrepancy value F0_A. Fitting Model B to the population data will lead to a perfect fit, so F0_B = 0 and λB = 0. Therefore, the noncentrality parameter for the χ2 difference test equals the noncentrality parameter from Model A: Δλ = λA − 0 = λA (MacCallum, Browne & Cai, 2006). In practice, we do not need to fit Model B to the data to verify that it will fit perfectly. Therefore, power calculations for the χ2 difference test involve the same three steps as before, with the H1 model (used to generate population data) being the Model B with the parameter(s) to be tested, and the H0 model (model to be fitted to the population data) being the more restricted Model A.
Example 2: Calculating the power of the Δχ2 test
Suppose that a researcher wants to know the statistical power of the Δχ2 test to detect a direct effect of Y1 on Y5 in the model from Fig. 5. The two nested models that would be compared with a Δχ2 test in this case are models with and without estimating the direct effect.
- Step 1 - The first step is to calculate the model-implied covariance matrix from the model with the direct effect, i.e., the model under H1. As in the earlier example, one has to choose population values for each parameter in the model. In this example, we chose medium-sized standardized values for the direct effects that are also included in the model under H0. We will calculate the power to detect a small standardized effect of .10 of Y1 on Y5. The (residual) variances are chosen in such a way that the total variances of all variables are 1, so that the specified effects are equal to the standardized effects.
We entered the following code into the first textbox (but see Appendix 2 for the calculation of the model-implied covariance matrix using matrix algebra). Note that paths omitted from the specification are path coefficients that are assumed to be zero in the population, such as the effect of Variable 1 on Variable 4. One can view the model-implied covariance matrix by clicking the button “View H1 values.” The resulting model is shown graphically to the right of the syntax, with all parameters displayed in black because they are fixed.
- Step 2 - Next, the model under H0, which is the model without the direct effect, is fitted to the covariance matrix from Step 1. In the app, the H0 model can be specified in the textbox at the lower left side using lavaan syntax. The H0 model is the model that does not contain the parameter(s) of interest; in our example, the effect of Y1 on Y5 is fixed at zero. Fitting this model to the population data with a given sample size provides a χ2 value, which equals the noncentrality parameter. In this example, the app fits the H0 model with N = 200, which results in a noncentrality parameter of λ = 4.007. The noncentrality parameter reflects the misfit that arises because the direct effect of Y1 on Y5 is .10 in the population but is not included in model H0.
- Step 3 - The power of the Δχ2 test is calculated by inserting the values of the noncentrality parameter (4.007), the degrees of freedom of the test (1; the difference in the number of parameters between model H0 and model H1) and the sample size (200) in the second tab of the app. The result then shows that under the specified conditions, the power to detect the effect of Y1 on Y5 equals 52%, which is quite low. With the button at the lower left of this page in the app, one can calculate how large the sample should be to reach different power levels. In this example, one would need a sample size of 391 to obtain 80% power for the Δχ2 test.
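For a 1-df test the power has a closed form, because a noncentral χ2 variable with df = 1 is the square of a normal variable with mean √λ and variance 1. The Python sketch below uses this to reproduce the power at N = 200 and to search for the minimum sample size, assuming, as in Equation 1, that λ grows linearly with n. It is a sketch, not the app's algorithm; the function names and the `n_offset` parameter (to switch between n = N and n = N − 1) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_df1(lam, alpha=0.05):
    """Power of a 1-df chi-square test: the statistic is (Z + sqrt(lam))^2."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    root = math.sqrt(lam)
    return (1 - STD_NORMAL.cdf(z - root)) + STD_NORMAL.cdf(-z - root)


def min_sample_size_df1(lam_obs, n_obs, target=0.80, alpha=0.05, n_offset=0):
    """Smallest N reaching `target` power, if lambda = n * F0 with n = N - n_offset.

    lam_obs is the noncentrality parameter obtained with sample size n_obs;
    n_offset = 0 corresponds to n = N as in Equation 1.
    """
    f0 = lam_obs / (n_obs - n_offset)
    n = n_offset + 1
    while power_df1((n - n_offset) * f0, alpha) < target:
        n += 1
    return n


print(f"power at N = 200: {power_df1(4.007):.3f}")
print("minimum N for .80 power (n = N):    ", min_sample_size_df1(4.007, 200))
print("minimum N for .80 power (n = N - 1):", min_sample_size_df1(4.007, 200, n_offset=1))
```

With n = N the search returns 392, one more than the 391 reported by the app, whereas n = N − 1 returns 391. One-case differences of this kind can arise from the n convention and from rounding, so a minimum sample size is best read as approximate.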
By calculating the power of the Δχ2 test, we anticipated a situation in which one has an a priori hypothesis about this specific effect, and therefore would test the significance of this specific effect with the Δχ2 test with df = 1. Note that the same noncentrality parameter can be used to calculate the power of the overall χ2 test to reject exact fit of model H0, because the overall χ2 test is actually a Δχ2 test against the saturated model. In this example, the H0 model is correctly specified except for one direct effect, because the other parameters that are fixed to zero in H0 are indeed zero in the population. Still, the overall χ2 test would have df = 5, because it simultaneously tests all parameters that are not included in the model, regardless of how many of those parameters are nonzero in the population. In this example, the overall χ2 test with df = 5 would have 29.2% power to reject exact fit, compared with 52% for the focused Δχ2 test with df = 1.
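Because both tests use the same λ, the drop in power from the focused test to the overall test comes only from the larger df. The Python sketch below repeats the standard-library χ2 helpers (hypothetical names; series-based CDFs and a bisection search for the critical value, not the app's implementation) and computes the power for λ = 4.007 with df = 1 and df = 5, which should match the values of about .52 and .29 reported above.

```python
import math


def chi2_cdf(x, k):
    """CDF of a central chi-square with k df (regularized incomplete gamma series)."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def ncx2_cdf(x, k, lam):
    """Noncentral chi-square CDF as a Poisson(lam / 2) mixture of central CDFs."""
    if lam == 0:
        return chi2_cdf(x, k)
    h = lam / 2.0
    j_max = int(h + 20 * math.sqrt(h) + 50)
    return sum(
        math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) * chi2_cdf(x, k + 2 * j)
        for j in range(j_max + 1)
    )


def chi2_critical(df, alpha=0.05):
    """Critical value of the central chi-square under H0, by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1 - alpha else (lo, mid)
    return (lo + hi) / 2


def chi2_power(lam, df, alpha=0.05):
    """Power of the chi-square test for noncentrality parameter lam."""
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


# Same lambda (Example 2), different df: the focused 1-df test versus the
# overall 5-df test of exact fit of model H0.
for df in (1, 5):
    print(f"df = {df}: power = {chi2_power(4.007, df):.3f}")
```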