Introduction

Data in social and behavioral sciences are commonly acquired through questionnaires, and constructs of interests are tapped by indicators that typically contain measurement errors. Because of its capability of modeling measurement errors, structural equation modeling (SEM) is regarded as a proper method for the analysis of such data (Bentler, 1980). While SEM can yield important information about the structural relationship of the manifest and latent variables, regression analysis using composite scores directly serves the purpose of prediction and classification of individuals. For a correctly specified model with two latent variables, a recent study showed that regression analysis via weighted composites yields larger effect size than SEM for the regression coefficient between the two constructs (Yuan & Fang, 2021). However, substantive studies often involve more constructs and it is also hard to have a correctly specified model with real data. Because path analysisFootnote 1 with composite scores is a widely used method for data analysis, it is important to find out whether the method will deliver estimates with greater effect sizes than SEM in the general case. This article aims to address this issue via meta comparison. In particular, we will compare the two classes of methods with respect to effect sizeFootnote 2 and/or statistical power corresponding to the estimates of the path coefficients of the structural model, and the square root of the sample size times the effect size plays the role of non-centrality parameter in the z- or t-statistics. The datasets come from textbooks, software manuals and published studies where SEM has been used in the original analyses.

Composite scores are not free of measurement errors when variables/indicators contain measurement errors. A criticism against path analysis using composite scores is that the estimates of the path coefficients are biased (Croon, 2002; Devlieger & Rosseel, 2017; Hoshino & Bentler, 2013). However, recent literature pointed out that the population values of factor loadings and path coefficients of the structural model under SEM are artificially determined (Yuan & Deng, 2021). This is because latent variables do not naturally carry units and we have to assign a scale to each latent variable under the SEM methodology. For an exogenous latent variable, this can be done by fixing the factor loading for one of its indicators at 1.0 or by fixing the variance of the latent variable at 1.0, but the choice between the two is arbitrary. The scale of an endogenous latent variable is typically aligned to one of its indicators by fixing the factor loading at 1.0. However, the choice of the indicator as well as the value 1.0 are arbitrary. Note that the population values of the factor loadings as well as those of the path coefficients change with the change of the scaling method, without affecting the population covariance matrix of the indicators. Thus, for models involving latent variables, the particular population values of parameters are neither an integrated part of the observed data nor that of the population distribution.

In addition, the scales of composite scores are also artificially assigned. For example, the total scores contain the same information as the average scores. It can be shown that for a set of given values of the path coefficients among latent variables, one can obtain identical values for these coefficients under path analysis with composite scores by adjusting the scales of the composites (e.g., Croon, 2002; Devlieger, Mayer & Rosseel, 2016), and vice versa. Consequently, biasFootnote 3 in the estimates by path analysis with composite scores is artificial rather than substantively grounded. Such a fact leaves more space for researchers to choose methods according to the purpose of a study. In particular, a key step of path analysis with either composites or latent variables is to test the null hypothesis on the path coefficients or to confirm the role of the theoretical predictors. We should choose an approach that delivers the greatest effect size or signal-to-noise ratio in order to conduct this step most efficiently.

Both path analysis with composite scores and SEM can be used for the purpose of prediction, but they are from different perspectives and for different purposes. The predictive relationship under SEM is among latent variables that represent the population distribution, and all individuals are equivalent under such a relationship. In practice, when observed scores are used for prediction or diagnosis, individuals are no longer equivalent. A person with greater test scores is expected to perform better on the criterion variable, and such a relationship is directly characterized by the path coefficients under path analysis with the composite scores. In addition, even when predictors contain measurement errors, regression model with the LS estimates still yields the best linear unbiased predictor for a future value (see Fuller, 1987, p. 75). Consequently, the issue of whether path analysis with composite scores corresponds to larger signal-to-noise ratios than SEM deserves further study. If the answer is yes, then path analysis with composites not only enjoys more prediction accuracy but also facilitates more efficient selection of predictors. Note that it is the LS estimates of the regression coefficients that yield a predicted value of the outcome variable with the smallest mean-square error (MSE), not parameter estimates under SEM or bias-corrected parameter estimates (Fuller, 1987). We will further discuss the issue in the concluding section.

We may believe that SEM will yield greater signal-to-noise ratios for the path coefficients of the structural model since the effect of measurement errors is removed via the measurement model. However, this is not the case for models with two-latent variables according to the results in Yuan and Fang (2021). With more latent variables, the issues become rather complicated. This is because both measurement errors and correlations among the composites affect both the values of the regression coefficients and those of the standard deviations of their estimates (Buonaccorsi, 2010, p. 109). Both measurement and structural models might be misspecified, which also affect the signal-to-noise ratios of the parameter estimates under both SEM and path analysis with composite scores. It is impossible to quantify the combined effect of these sources on parameter estimates analytically. The Monte Carlo method also becomes impractical because the location and size of model misspecification are typically unknown, and each empirical study tends to have its own features in both model and data. Meta comparison is consequently a viable and informative approach for us to find out the strength of different methods in practice. While the distribution of the data/sample also affects the performances of different methods, we will use a robust technique so that the effects of nonnormally distributed data or data contamination on both parameter estimates and their standard errors are properly regulated.

In the following section, we will review the properties of several weighted composites, including those under the partial least-squares approach to SEM. A robust-transformation technique will be described in a subsequent section, and parameter estimates by both path analysis with composite scores and SEM become more efficient when applying normal-distribution-based methods to the transformed data. Datasets, models, and results will be presented in a follow-up section. Recommendation and discussion will be provided in the concluding section. An A contains the information on statistical packages and programs that are used to carry out the procedures described in this article.

Weighted composites

There are an infinite number of ways to formulate composite scores. We can only study a few of them, including the sum score and those possessing certain optimal properties. They are briefly described in this section.

The most widely used composite is the sum score or the simple average, and we will refer to it as the equally weighted composite (EWC). EWCs are easy to formulate but they do not use any statistical/psychometric information of the indicators/variables. It is known that both the path coefficients and R2 values under path analysis are affected by measurement reliabilities of the composites. More reliable composites correspond to more efficient estimates and greater R2 values, but an EWC can be less reliable than a single indicator in the composition of the EWC. That is, the sum score (x1 + x2 + … + xm) can be less reliable than an individual score xj.

The Bartlett- and regression-factor scores are among the most widely used composites for subscales (see e.g., Bartholomew, 2009; Lawley & Maxwell, 1971; Schuster & Lubbe, 2020). While the literature has demonstrated that Bartlett-factor scores possess different statistical properties from those of the regression-factor scores (Croon, 2002; Devlieger et al., 2016), Yuan and Deng (2021) showed that their differences in path analysis are only for the values of the estimated path coefficients. They also showed that, for regression analysis, the two types of factor scores yield identical standardized path coefficients, identical effect size for testing the significance of the path coefficients, and identical R2 value. In addition, the two types of factor scores possess the property of attaining the maximum reliability among all weighted composites (e.g., Yuan & Bentler, 2002). Thus, Bartlett-factor scores are equivalent to regression-factor scores for the purpose of path analysis, and we will only need to include one of them in our comparison of weighted-composite path analysis against SEM.

A method that systematically generates weighted composites is the partial least-squares approach to structural equation modeling (PLS-SEM), which has established its identity in many disciplines. Systematic introductions to PLS-SEM can be found in Wold (1980), Hair, Hult, Ringle, and Sarstedt, (2017) and Henseler (2021). We will briefly review the key elements of this method to make our presentation self-contained. To distinguish the commonly used SEM methodology from PLS-SEM, the former is referred to as covariance-based SEM (CB-SEM). PLS-SEM is essentially path analysis with weighted composites via the method of least-squares (LS) regression, and the method consists of two stages. Weighted composites are computed in the first stage, and path analysis with the obtained composites is conducted in the second stage.

Under PLS-SEM, indicators are unidimensional, that is, each indicator only loads on a single construct. The indicators for a particular construct form a block. The first stage of PLS-SEM offers two ways to compute weighted composites. One is termed as mode A and the other is termed as mode B (Wold, 1980, 1982). Both are computed by LS regression iteratively via the so-called environmental variables, which are temporary variables that are derived according to the model structure. Suppose the model has m constructs, and a construct η is directly connectedFootnote 4 to other q constructs out of the remaining m − 1. Let the block-wise average scores be the starting values for the composites, and the composite of η is denoted as \(\tilde {\eta }\). Then the environment variable of η (denoted as ηev) is defined as a linear combination of the q composites whose latent-variable counterparts are directly connected to η. The coefficients of the q composites in ηev can be either the sign of the correlation between \(\tilde {\eta }\) and the directly connected construct (termed as the centroid scheme) or the value of the correlation itself (termed as the factorial scheme, Henseler, 2021, pp. 90–91). Let yj (j = 1, 2, …, pη) be the block of indicators of η. Under mode A, the weight wj in the composite \(\tilde {\eta }=w_{1}y_{1}+w_{2}y_{2}+\ldots +w_{p_{\eta }}y_{p_{\eta }}\) is updated iteratively by the slope parameter of LS regression of yj on the environmental variable ηev. Under mode B, the weights (w1, w2, …, \(w_{p_{\eta }}\)) in the composite \(\tilde {\eta }\) are updated iteratively by the coefficients of multiple regression of ηev on yj, (j = 1, 2, …, pη). The iteration alternates across the m blocks of indicators until all the weights are stabilized, and the m composites are consequently obtained. In our analysis of the empirical data, we will use the centroid scheme in the formulation of ηev, while analytical studies by Dijkstra (1983) and Schneeweiss (1993) imply that different schemes yield essentially the same results.

In the SEM literature, a latent variable can either directly predict or being predicted by a set of manifest variables. The manifest variables being directly predicted by a latent variables are called reflective/effect indicators, and those that directly predict a latent variables are called formative/causal indicators. Starting from Wold (1980, 1982), mode A has been recommended to compute the weights for composites of reflective indicators and mode B to compute the weights for composites of formative indicators. While such recommendations are intuitive, Dijkstra and Henseler (2015) noted that mode A converges faster and yields numerically more stable weights than mode B, but the composite obtained by mode A can be less reliable than the corresponding EWC or a single indicator of the block (Yuan, Wen & Tang, 2020). When applying mode B to models with reflective indicators, Yuan and Deng (2021) showed that, at the level of population, the obtained composites are equivalent to the Bartlett-factor scores, which are known to have the maximum reliability among all weighted composites. However, results in Yuan and Deng (2021) showed that sampling and/or specification errors may cause mode B to yield negative weights even when the items are positively scored, while all the weights by mode A as well as all the factor loadings under CB-SEM are positive for the same dataset. They consequently proposed a procedure to transform mode A to mode B according to the structure of a one-factor model for each block of indicators. They also showed that the weighted composites following the transformed mode (denoted as BA) enjoy the same statistical properties as those directly obtained under mode B. We will use the mode BA instead of mode B in the following data analyses because the weights under mode BA enjoy the numerical stability of those under mode A and the resulting composites enjoy the statistical properties of those under mode B.

Note that the equivalence of PLS-SEM mode B, mode BA and regression analysis with the Bartlett-factor scores is for correctly specified models, while the equivalence between the Bartlett- and regression-factor scores in conducting regression analysis does not need a correctly specified model as long as the two types of factor scores are computed based on the same model (Yuan & Deng, 2021). Since it is unlikely to have a correctly specified model with real data, we will include four types of composites in our comparison of path analysis with weighted composites against CB-SEM. They are equally weighted composites, Bartlett-factor scores, composites by PLS-SEM mode A, and those by PLS-SEM mode BA. So there are a total of five different ways in estimating the coefficients of the structural model, one is CB-SEM and the other four are path analyses with different composites.

Robust transformation with real data

This section describes a robust transformation technique for more efficient parameter estimates with real data. Most classical methods of multivariate analysis are developed for normally distributed data. They are not for real data. The so-called full information maximum likelihood (FIML) method for CB-SEM is normal-distribution-based maximum likelihood (NML). It becomes a pseudo maximum likelihood method unless the true population distribution of the sample is multivariate normal. With real data, a robust method can easily perform better than the pseudo NML method with respect to type I error control and statistical power, as will be further discussed in this section.

Data in social sciences seldom follow normal distributions. Such a fact has been repeatedly confirmed by meta analyses (e.g., Blanca, Arnau, Löpez-Montiel, Bono, & Bendayan, 2013; Cain, Zhang, & Yuan, 2017; Micceri, 1989). However, CB-SEM and path analysis are routinely conducted under the normality assumption. With real data typically having heavier tails than that of the normal distribution, results by NML are most likely misleading or inefficient at least. In particular, the observed significance of the likelihood ratio statistic Tml for the overall model structure under CB-SEM is partly due to the extra kurtoses of the data in addition to an imperfect model (Hu, Bentler & Kano, 1992), and the true standard errors of the parameter estimates by NML increase with the size of the population kurtoses (Bentler, 1983; Browne, 1982). For such data, robust methodsFootnote 5 by accounting for the distribution shape of the data not only yield more efficient parameter estimates but also more reliable test statistics and fit indices (Yuan & Bentler, 1998; Yuan & Zhong, 2013). Robust M-estimation is a class of such methods under which the contribution of each observation to the final estimate is rationalized via scalar-valued weights. Cases sitting further away from the center of the data will get smaller weights (Huber, 1981). Yuan, Chan, and Bentler (2000) showed that robust M-estimation can be carried out via a transformation procedure. The sample means and sample covariance matrix of the transformed data are the robust estimates of the population means and covariances, respectively. To use this transformation, the researcher only needs to choose a cutoff value to determine the percent of the outlying cases whose effects on the estimated parameters are regulated. In particular, with Huber-type robust M-estimator (Huber, 1981), the cutoff value can be chosen via the quantile of a chi-square distribution. For example, with p = 10 variables and we choose the 95% quantile c.95 of \(\chi _{10}^{2}\) as the cutoff value, cases sitting further from the center of the data than c.95 will be assigned smaller weights in estimatingFootnote 6 the means and variances-covariances. The deviation of each case from the center of the data is computed by the Mahalanobis distance, with means and precision matrix being computed iteratively using the robust estimates. The further a case is from the center, the smaller the weights will become so that the contribution/influence of the case on the resulting parameter estimates is regulated. Note that, unless certain values of the observed variables for a case are huge, the weights are not zero so that the case is still counted in the estimation process. Cases with extreme values correspond to tiny weights, which are approximately equivalent to being removed from the analysis.

Following the logic that the sample covariance matrix is most efficient only when data are normally distributed, Yuan et al., (2000) proposed using Mardia’s (1970) measure of multivariate kurtosis to facilitate the choice of the cutoff value in conducting robust transformations. They recommended to start with the quantile c.95 as the cutoff value. If the standardized multivariate kurtosis Ms of the corresponding sample is not significantly different from that of a multivariate normal distribution, then the resulting transformed data will be obtained according to this weighting rule. If Ms is still significantly different from that of a multivariate normal distribution, then choosing c.90, c.85, … as the cutoff values for weights until the resulting Ms is not statistically significant. R and SAS code for conducting the robust transformation can be downloaded at https://www3.nd.edu/~kyuan/overview_robustMethod/ and their use is documented in Yuan and Gomer (2021). Existing studies indicate that the results following a robust transformation are much more reliable in the analyses of real data (e.g., Yuan et al., 2000). We denote the robust transformation using Huber-type weights as the H(α)-transformation, where α indicates that the quantile c1−α of \({\chi _{p}^{2}}\) is used as the cutoff value for regulating cases.

The scenario for robust M-estimation to work best is for samples whose populations follow elliptically distributions (Maronna, Martin, Yohai, & Salibián-Barrera, 2019). However, outliers or data contamination can cause the observed data skewed. It is difficult to distinguish whether the skewness in the sample is due to a skewed underlying population distribution or data contamination. Results with real data in Yuan et al., (2000) and Yuan and Bentler (1998) indicate that robust methods yield more reliable model evaluation than NML although the datasets are not symmetrically distributed. Monte Carlo results in Yuan, Chan, and Tian (2016) imply that robust methods yield more accurate parameter estimates than NML for asymmetric distributions with heavier tails than that of the normal distribution. The discussions by Huber (1981, p. 172) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986, pp. 401–402) imply that robust methods are still preferred even when the underlying population distribution is skewed.

Note that raw data are needed for the application of robust transformation. With raw data, we can actually check the normality assumption via the measure of multivariate kurtosis (Mardia, 1970; Cain et al., 2017). While sample skewnesses also reflect departure from normality, it has been pointed out that, without mean structures, results of path analysis and CB-SEM are asymptotically unaffected by skewness of the underlying population distribution (Yuan & Bentler, 2000). Consequently, we do not pay attention to skewness of the observed data in the following section. When only sample means and covariances are available, we will have to assume that the raw data follow a multivariate normal distribution in order to proceed with the analysis, as will be done in the following section.

Like CB-SEM, path analysis with weighted composites can also be conducted using the NML method. Because PLS-SEM uses LS regression to estimate the path coefficients of the structural model, we will also use LS regression in estimating the path coefficients for the method of path analyses with the Bartlett-factor scores and equally weighted composites.

Data and modeling results

In this section, we compare path analyses with weighted composites against CB-SEM in estimating the coefficients of the structural model. Nine real datasets and 11 models are used in the comparison. These datasets and models were not chosen to obtain particular results, nor were they selected so that a model has to fit a dataset well enough. Rather, they are from SEM textbooks, software demonstrations and authors who were willing to share their raw data with us. These datasets represent different areas where theoretical constructs are modeled for studies of substantive interests. To save space, we will only include the results of the path coefficients of the structural model. Estimates of other model parameters under CB-SEM are reported in supplementary material (https://www3.nd.edu/~kyuan/SEM_PAcomparison/CB_SEMtables.pdf). Note that the estimated factor loadings (λ) and error variances (ψ) in the supplementary material are used to compute the Bartlett-factor scores that are used in path analysis.

Each of the 11 models in this section is represented by a path diagram. The path coefficients in each path diagram are labeled using LISREL (Jöreskog & Sörbom, 1993) notation under which γ is used for paths from exogenous variables to endogenous variables, and β is used for paths between endogenous variables. In the presentation we will report the estimates of the path coefficients, their SEs and the corresponding z-statistics. Because an empirical effect size is just the value of a z-statistic divided by \(\sqrt {N}\), we will not separately report the values of the empirical effect size.

Dataset 1

The first dataset (sample correlations and standard deviations) was originally presented in Wheaton, Muthén, Alwin, and Summers (1977), who studied stability of alienation as predicted by socioeconomic status. This dataset has been used to illustrate the CB-SEM methodology and its applications in textbooks and software manuals (e.g., Bentler, 2006; Jöreskog & Sörbom, 1993). The dataset has p = 6 variables and N = 932 cases. The six variables are anomie 1967, powerlessness 1967, anomie 1971, powerlessness 1971, education 1966, and socioeconomic index (SEI) 1966. Several models have been used to fit this dataset in the literature. We will consider the mediation model in Fig. 1 (the same as the model in Figure 3.3 of Joreskog and Sorbom, 1993, p. 88), which contains three path coefficients: γ11, γ21, β21. An alternative model (Figure 2.11 of Bentler, 2006, p. 39) is to have correlated errors between anomie 1967 and anomie 1971, and between powerlessness 1967 and powerlessness 1971. We do not consider the model with error-covariances because they empirically share the same effect with the values of the path coefficients. In particular, path analysis with weighted composites accounts for all associations through path coefficients. Including error-covariances under CB-SEM does not yield a fair comparison between the two classes of methods.

Fig. 1
figure 1

A model for stability of alienation (Bentler, 2006; Jöreskog & Sörbom, 1993; Wheaton et al.,, 1977; N = 932)

Because only the sample covariance matrix is available, we will assume that the population distribution of the six variables is multivariate normal. Fitting the model in Fig. 1 to Dataset 1 by NML results in Tml = 71.470 under CB-SEM. The p-value is essentially zero when compared Tml against \({\chi _{6}^{2}}\). The corresponding value of the root mean square error of approximation (RMSEA, Steiger & Lind, 1980) is .108, which also indicates that the model fit is not adequate. However, the comparative fit index (CFI =.969) implies that the model fits the data reasonably well (Bentler, 1990; Hu & Bentler, 1999). Such an inconsistency between different fit indices is not a surprise since they evaluate the model from different perspectives (Lai & Green, 2016).

Table 1 contains the estimates of the three path coefficients, their standard errors (SEs) and the corresponding z-statistics. As we noted earlier that the estimates of the coefficients are not directly comparable, since they depend on the scales of the predictors as well as that of the outcome variable. While the z-statistics following path analyses with the four types of composites are rather close, CB-SEM yields uniformly the smallest z-statistics for each of the coefficients. In particular, the z-statistic for γ21 under CB-SEM is close to half of those under path analyses with weighted composites. The results in Table 1 indicate that PLS-SEM mode BA yields the largest z-statistic for γ11, while path analysis with EWCs yields the largest z-statistic for γ21, and PLS-SEM mode A corresponds to the largest z-statistic for β21. Because, for correctly specified models and conditional on the population weights, path analysis using the Bartlett-factor scores and PLS-SEM mode BA are equivalent, the observed differences between the results of the two methods in Table 1 can be due to model misspecification and sampling errors.

Table 1 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 1 (p = 6, N = 932, Tml = 71.470, df = 6, p-value =.000; RMSEA =.108, and CFI =.969)

Dataset 2

The second dataset (sample covariance matrix) is from Table 7.4 of the LISREL manual (Jöreskog & Sörbom, 1993, p. 193), which was from Wiley and Hornik (1973) who studied communication processes with data collected from El Salvador. The study consists of two constructs, television watching by children and television possession by their family. Each construct was taped by two congeneric measures at three points in time. With a total of p = 12 manifest variables, the sample covariance matrix is from a dataset with N = 189 participants. The path diagram presented in Fig. 2 is for a model given by Joreskog and Sorbom (1993, p. 194). It is an auto regression model for each construct, and the two constructs at time 1 are correlated. There are four path coefficients: γ11, γ22, β31 and β42. Again, we will regard the raw data as being normally distributed in the analysis.

Fig. 2
figure 2

A longitudinal model for two versions of television possession and watching at three occasions (Jöreskog & Sörbom, 1993; Wiley & Hornik, 1973; N = 189)

Fitting the model in Fig. 2 to the television dataset by NML yields Tml = 185.378, and the p-value is essentially 0 according to \(T_{ml}\sim \chi _{49}^{2}\). The corresponding RMSEA =.122 also indicates that the model fit is poor, but, with CFI =.938, the model can be regarded as acceptable in real data analysis. The estimates of the four coefficients by the five methods are given in Table 2. Again, the z-statistics under CB-SEM are uniformly the smallest. While PLS-SEM mode BA yields uniformly the largest values of the z-statistic, the results under path analyses by the four types of weighted composites are close.

Table 2 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 2 (p = 12, N = 189, Tml = 185.378, df = 49, p-value = 0; RMSEA =.122, and CFI =.938)

Dataset 3

The third dataset is from Table 10.1 of Schumacker and Lomax (2010, p. 202), which is for a study of home resource and educational achievement. With p = 9 variables, the sample covariance matrix is based on a sample of size N = 200. The model, represented by Fig. 3, is as given in Figure 10.1 of Schumacker and Lomax (2010, p. 196), which implies that the effects of both home resource and ability on achievement are mediated by aspiration. The model contains five path coefficients: γ11, γ21, γ12, γ22, β21. Because only the sample covariance matrix is available, we regard the raw data as being normally distributed in the analysis.

Fig. 3
figure 3

A model of home resources and educational achievement (Schumacker & Lomax, 2010, N = 200)

Fitting the model in Fig. 3 to the educational-achievement data by NML results in Tml = 57.167, which corresponds to p-value = 3.39 × 10− 5 when referred to \(\chi _{21}^{2}\). With CFI =.974 and RMSEA =.093, the model might be regarded as adequate in practice. Table 3 contains the estimates of the five path coefficients by each of the five methods. While the z-statistics following path analyses are comparable across the four different weighted composites, those following CB-SEM are uniformly the smallest. In particular, the estimate of γ21 under CB-SEM is not statistically significant (z = 1.900) at the level of .05. The smallest z-statistic for γ21 by path analyses is 3.959 and with the Bartlett-factor scores. Again, the largest z-statistics for the five parameters are either by PLS-SEM mode BA (γ11, γ21) or by path analyses using the Bartlett-factor scores (γ12, γ22, β21).

Table 3 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 3 (p = 9, N = 200, Tml = 57.167, df = 21, p-value = 3.39 × 10− 5; RMSEA =.093, and CFI =.974)

Note that an estimate that is not statistically significant at the level of .05 does not imply that its population counterpart is 0. Many factors contribute to effect size and power in statistical modeling, as was discussed earlier in this article.

Dataset 4

The fourth dataset is from Table 8.1 of Kline (1998, p. 254), which originated from Worland, Weeks, Janes, and Strock (1984). The authors studied the effect of familial risk on academic performance and classroom adjustment with junior and senior high school students. Figure 4 contains the path diagram for the model as presented on page 255 of Kline (1998), which suggests that the effect of familial risk on classroom adjustment is completely mediated by cognitive ability and scholastic achievement. With one exogenous and three endogenous constructs, the model contains three path coefficients: γ11, β21, and β32. However, only a 12-variable correlation matrix based on a sample of size N = 158 is available according to Kline (1998). We will treat the sample correlation matrix as a sample covariance matrix and the underlying population distribution as normally distributed in our analysis.

Fig. 4
figure 4

A model of familial risk for psychopathology and child adjustment (Kline, 1998; Worland et al.,, 1984, N = 158)

Fitting the structural model in Fig. 4 to the 12 × 12 correlation matrix by NML results in Tml = 181.424 (df = 51, RMSEA =.128, CFI =.892). The model cannot be regarded as achieving an adequate fit according to established norms (Hu and Bentler, 1999), but it serves the purpose for comparing the different methods when the model fit is poor. Table 4 contains the estimated coefficients and the corresponding z-statistics. CB-SEM yields the smallest z-statistics for γ11 and β21 but path analysis with EWCs yields the smallest z-statistic for β32. In contrast, PLS-SEM mode BA yields the largest z-statistics for all the three parameters across the five methods.

Table 4 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 4 (p = 12, N = 158, Tml = 181.424, df = 51, p-value = 0; RMSEA =.128, and CFI =.892)

Dataset 5

The fifth dataset is from Table 14.1 of Kline (2016, p. 342), which contains the sample correlations and standard deviations of p = 12 variables. The data were from Houghton and Jinkerson (2007) who studied constructive thought strategies and job satisfaction, based on a sample of size N = 263 full-time university employees. With three indicators for each latent variables, the 12 variables respectively measure four constructs: 1) constructive thinking, 2) dysfunctional thinking, 3) subjective well-being, and 4) job satisfaction. Two models were estimated for this dataset in Kline (2016). The first model, as represented by only the solid arrows in Fig. 5, hypothesizes that constructive thinking reduces dysfunctional thinking, which leads to an enhanced sense of well-being, which in turn results in greater job satisfaction. This model has four path coefficients: γ11, β21, β31, and β32. The second model includes both the solid and dashed arrows, which hypothesizes that constructive thinking also has direct effects on sense of well-being and job satisfaction. This second model has six path coefficients: γ11, γ21, γ31, β21, β31, β32.

Fig. 5
figure 5

Two models of thought strategies and job satisfaction (Houghton & Jinkerson, 2007; Kline, 2016, N = 263)

The NML likelihood ratio statistic for the first model is Tml = 66.061, corresponding to a p-value of .064 when referred to \(\chi _{50}^{2}\). With RMSEA =.035 and CFI =.984, the model would be regarded as having achieved close fit in practice. Table 5 contains the parameter estimates for this model by the five different methods. Different from the previous datasets, CB-SEM yields the largest z-statistic for parameter γ11 (z = − 1.813), although it is still not significant at the level of .05. For the other three coefficients (β21, β31, and β32), CB-SEM again yields uniformly the smallest z-statistics. In particular, z = − 1.980 for β31 under CB-SEM is only marginally significant, the z-statistics for this parameter by the other four methods are all larger than 2.997 in absolute value.

Table 5 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 5, excluding the two dashed arrows (p = 12, N = 263, Tml = 66.061, df = 50, p-value =.064; RMSEA =.035, and CFI =.984)

Fitting the second model to the job satisfaction data by NML yields Tml = 62.231, corresponding to a p-value =.081 when referred to \(\chi _{48}^{2}\). The corresponding fit indices (RMSEA =.034, CFI =.986) also imply that the model has achieved close fit in practice. Table 6 contains the estimates of the six coefficients of the structural model and their corresponding z-statistics. None of the z-statistics for the first three coefficients (γ11, γ21, γ31) is significant at the level of .05. The z-statistic for β31 under CB-SEM (-1.936) is not significant either while those by the methods of path analysis with the four types of composite scores are. Actually, for the three coefficients among the endogenous constructs (β21, β31, β32), the z-statistics under CB-SEM are uniformly the smallest.

Dataset 6

Table 6 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 5, including the two dashed arrows (p = 12, N = 263, Tml = 62.231, df = 48, p-value =.081; RMSEA =.034, and CFI =.986)

Weston and Gore (2006, Table 2) presented a sample covariance matrix for a dataset with N = 403 cases and p = 12 variables. The data were from a survey of college students who participated in a vocational psychology research project. The indicator variables were based on a subset of items that measure social dimensions (Holland, 1997). With three indicators for each construct, the 12 variables respectively tap 1) self-efficacy beliefs, 2) outcome expectations, 3) career-related interests, and 4) occupational considerations. Weston and Gore Jr considered two structural models, as represented in Fig. 6. The model represented by the solid arrows (model 1) suggests that the effect of self-efficacy beliefs on career-related interests is partially mediated by outcome expectations, while the effect of self-efficacy beliefs on occupational considerations is completely through the two mediator variables (outcome expectations and career-related interests). This first model has four path coefficients in the structural model: γ11, γ21, β21, β32. The second model, represented by both the solid and dashed arrows in Fig. 6, suggests that the effects of self-efficacy beliefs on career-related interests and on occupational considerations are partially mediated by outcome expectations, and the effect of outcome expectations on occupational considerations is also partially mediated by career-related interests. This second model has six path coefficients in the structural model: γ11, γ21, γ31, β21, β31, β32.

Fig. 6
figure 6

Two mediated models for self-efficacy belief on occupational considerations (Weston & Gore, 2006, N = 403)

Fitting the first model in Fig. 6 to the vocational-psychology dataset by NML results in Tml = 416.061, which corresponds to a p-value that is essentially 0 when referred to \(\chi _{50}^{2}\). With CFI =.913, and RMSEA =.135, the model might not be regarded as fitting the data adequately although it is substantively sound (see Weston and Gore, 2006 and references therein). Such a discrepancy between theory and goodness of model-fit is not unusual in empirical modeling. Table 7 contains the estimates of the coefficients represented by the four solid arrows. CB-SEM yields the greatest z-statistic for parameter γ21, but for the other three parameters the z-statistics following CB-SEM are uniformly the smallest. The methods yield the greatest z-statistics for γ11, β21 and β32 are PLS-SEM mode BA, path analysis using the Bartlett-factor scores, and PLS-SEM mode BA, respectively.

Table 7 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 6, excluding the two dashed arrows (p = 12, N = 403, Tml = 416.061, df = 50, p-value =.000; RMSEA =.135, and CFI =.913)

Table 8 contains the results of fitting the second model in Fig. 6 to the vocational-psychology data. With two additional parameters the statistic Tml drops to 361.848, which corresponds to RMSEA =.128, and CFI =.926. The partial mediation model still cannot be regarded as adequate according to the established norms on fit indices (Hu & Bentler, 1999). However, the value of Tml as well as those of the derived RMSEA and CFI are affected by other factors in addition to the structural model. In particular, extra kurtosis of the observed sample alone can cause the significance of Tml even if the model is literally correct (Hu et al., 1992), but we do not have such information for this dataset, because only a sample covariance matrix is available. In Table 8, CB-SEM yields uniformly the smallest z-statistics for all the six parameters. The greatest z-statistics for γ11, γ11, γ11, β21, β31, β32 are obtained by PLS-SEM mode BA, PLS-SEM mode BA, PLS-SEM mode A, path analyses via Bartlett-factor scores, equally weighted composites, and Bartlett-factor scores, respectively.

Table 8 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 6, including the two dashed arrows (p = 12, N = 403, Tml = 361.848, df = 48, p-value =.000; RMSEA =.128, and CFI =.926)

Dataset 7

The seventh is a raw dataset from Neumann (1994), who studied the relationship of psychopathology and alcoholism. The dataset consists of p = 10 variables and N = 335 cases. The single exogenous construct (family history) is tapped by two indicators: family history of psychopathology and family history of alcoholism. The other eight indicators are respectively the age of 1st problem with alcohol, age of 1st detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight indicators respectively measure: age of onset, alcohol symptoms, psychopathology symptoms, and global functioning. Neumann’s (1994) theoretical model for this data set is represented by Fig. 7, which has six path coefficients: γ11, β21, β31, β32, β41, β42.

Fig. 7
figure 7

A model of symptoms of alcoholism and psychopathology (Neumann, 1994, N = 335)

With raw data, we can check the distribution properties of this psychopathology-alcoholism sample, which corresponds to a standardized multivariate kurtosis Ms = 14.763. Consequently, the sample cannot be regarded as normally distributed. Using the 95% quantile (c.95) of \(\chi _{10}^{2}\) as the cutoff value for regulating observations according to Huber-type weights, the resulting kurtosis of the transformed sample is Ms = 2.248, still statistically significant when referred to N(0,1). Further tuning with c.90 as the cutoff value for regulating observations results in Ms = −.110. Thus, we can regard the transformed data by H(.10) as approximately normally distributed. Our analysis and comparison below will be for this transformed sample.

Fitting the model in Fig. 7 to the transformed sample by NML results in Tml = 40.985, corresponding to a p-value =.069 when referred to \(\chi _{29}^{2}\). Fit indices (RMSEA =.035, CFI =.987) also indicate that the model reached close fit. The estimates of the six coefficients are reported in Table 9. Estimates of β31 by the 5 methods are consistently not statistically significant at .05 level although the z-statistic under CB-SEM is the largest. For the other 5 parameters, PLS-SEM mode BA yields the largest z-statistics for γ11 and β41, PLS-SEM mode A yields the largest z-statistics for β21 and β42, and path analysis with the Bartlett-factor scores yields the largest z-statistic for β32. Again, a non-significant z-statistic at the .05 level does not mean that we can regard the parameter β31 as 0. Many factors contribute to the significance of a parameter estimate in statistical modeling in general.

Table 9 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 7 (p = 10, N = 335, Tml = 40.985, df = 29, p-vale =.069; RMSEA =.035, and CFI =.987)

Dataset 8

The eighth is a raw dataset from a study of health and stress, and was examined in Yuan and Deng (2021), with p = 24 and N = 264. The 264 participants were recruited from both high-pressure professionals and from a psychiatric hospital. The 24 variables are indicators of four constructs. The three exogenous constructs are respectively: 1) emotional exhaustion with 5 indicators, 2) cynicism with 4 indicators, and 3) professional efficacy with 6 indicators. The single endogenous construct (depression) has 9 indicators obtained from patient health questionnaire (PHQ9, Kroenke, Spitzer & Williams, 2001). The path diagram for the structural model is in Fig. 8, which has three path coefficients: γ11, γ12, and γ13.

Fig. 8
figure 8

A model of burnout and depression (Yuan and Deng, 2021, N = 264)

As noted in Yuan and Deng (2021), with a standardized multivariate kurtosis Ms = 23.292, we cannot regard the sample as following a multivariate normal distribution. Using Huber-type weights and cutoff value c.95 (95% quantile of \(\chi _{24}^{2}\)) in the robust transformation, the standardized multivariate kurtosis of the transformed sample is Ms = 8.420, which is still highly significant. We subsequently tuned the cutoff value and end up with Ms = −.003 under the transformation H(.22). The results presented below are based on fitting the model in Fig. 8 to this robustly transformed sample by NML.

The likelihood ratio statistic for the overall model is Tml = 469.579, highly significant when referred to \(\chi _{246}^{2}\). However, fit indices CFI =.948 and RMSEA =.059 suggest that the model in Fig. 8 fits the sample reasonably well. The estimates of the three coefficients by the five methods are given in Table 10, where the z-statistics for both γ11 and γ13 under CB-SEM are not statistically significant at the .05 level. The z-statistics for γ13 are not significant under path analyses with the four weighted composites. But the four estimates of γ11 under path analyses are significant. Path analysis using the Bartlett-factor scores yields the largest z-statistic for γ12 while PLS-SEM mode BA yields the largest z-statistic for γ11.

Table 10 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 8 (p = 24, N = 264, Tml = 469.579, df = 246, p-value =.000; RMSEA =.059, and CFI =.948)

Dataset 9

The ninth dataset is from Table 8.2 of Bollen (1989, p. 334), which contains a sample covariance matrix of p = 11 industrialization-democracy variables based on a sample of N = 75 developing countries. These 11 variables are indicators of three constructs. Industrialization in 1960 is an exogenous construct and has three indicators: gross national product (GNP) per capita, energy consumption per capita, and the percent of the labor force in industrial occupations. Democracy 1960 is an endogenous construct and has four indicators: freedom of the press, freedom of group opposition, fairness of elections, and the elective nature and effectiveness of the legislative body. Democracy 1965 is another endogenous construct and has the same set of indicators as democracy 1960 but measured at a later time. Bollen (1989) considered multiple models for this dataset. We will estimate the model in Fig. 9, which has three path coefficients: γ11, γ21, and β21.

Fig. 9
figure 9

A model of industrialization and democracy (Bollen, 1989, N = 75), each of the four factor loadings on democracy 1960 equals that on democracy 1965

The difference of this model from the previous ones is that each of the four factor loadings on democracy 1960 is constrained to be equal to that on democracy 1965, due to the nature of panel data. The constraints were also part of the model in Figure 8.1 of Bollen (1989, p. 324), which also contains six pairs of correlated errors. We do not include the 6 error-covariances in the model since they empirically share the responsibility of the path coefficients. In particular, under CB-SEM error-covariances and the path coefficients in the structural model jointly account for the association of the indicators, whereas path analysis via weighted composites and PLS-SEM account for the association only through the path coefficients in the structural model. Thus, it is not a fair comparison between the two classes of methods when models under CB-SEM have correlated errors.

Fitting the model (with three extra constraints on factor loadings) in Fig. 9 to the p = 11 industrialization-democracy variables by NML yields Tml = 72.709, which corresponds to a p-value =.004 when referred to \(\chi _{44}^{2}\). Fit indices CFI =.957 and RMSEA =.094 indicate that the model fits the data reasonably well. Table 11 contains the estimates of the three path coefficients by the five methods. Note that the Bartlett-factor scores are computed using the factor loadings and error variances estimated by NML, and they implicitly used the equality constraints. However, the other three types of composites do not use the equality constraints. PLS-SEM mode BA yields the smallest z-statistic for β21 while CB-SEM yields the smallest z-statistic for γ21. The largest z-statistics for γ11, γ21 and β21 are obtained by PLS-SEM mode BA, path analysis with EWCs, and path analysis with the Bartlett-factor scores, respectively.

Table 11 Parameter estimates (est), their SEs (se) and z-statistics for the path coefficients in Fig. 9 (p = 11, N = 75, Tml = 72.709, df = 44, p-value =.004; RMSEA =.094, and CFI =.957)

We have compared CB-SEM against path analysis with four types of composites in estimating the path coefficients of the structural models. Eleven models over nine datasets were examined, with a total of 47 parameters. CB-SEM yielded the smallest z-statistics 38 times, and largest z-statistics 3 times.Footnote 7 Path analysis with Bartlett-factor scores never had the smallest z-statistics nor PLS-SEM mode A. The frequencies for each method to yield the largest and smallest z-statistics are reported in Table 12. For each parameter, we also ranked the five z-statistics from the smallest (rank 1) to the largest (rank 5) in absolute values. The average value of the rank for each method across the 47 parameters is obtained and so is the average of the absolute value of the 47 z-statistics. They are included in Table 12 as well. All the results in Table 12 indicate that CB-SEM has the smallest powerFootnote 8 and/or effect size in testing the significance of the path coefficients of the structural models. In contrast, PLS-SEM mode BA yielded the largest average z-statistics and average rank, followed by path analysis with Bartlett-factor scores. In particular, the results in Table 12 indicate that path analysis with equally weighted composites also corresponds to greater effect size and statistical power in testing the path coefficients than CB-SEM.

Table 12 Counts for each method to yield the largest and smallest z-statistics as well as the mean rank (1 to 5) and mean value of the z-statistics across the 47 parameters

Because the focus of the article is on the effect size and statistical power in estimating/testing path coefficients, we did not include R2 values in the comparison. But measurement errors are known to affect the R2 values. In particular, the R2 for regression analysis with composite scores will become smaller when either the outcome variable or the predictors contain more measurement errors. The value of the resulting R2 is proportional to the reliability of the outcome variable and a weighted average of the reliabilities of the predictors (Cochran, 1970). There are 30 endogenous constructs within the 11 models we examined. CB-SEM yields the largest R2 values for 29 out of the 30 cases. The only exception is for the regression model of democracy 1960 being predicted by industrialization 1960 (the left part of Fig. 9), where PLS-SEM mode BA corresponds to a greater value of R2 than CB-SEM, due to a substantial change in weights from PLS-SEM mode A to PLS-SEM mode BA. However, the R2 value under CB-SEM is not achievable when the model is used for prediction or classification of individuals. This is because we have to use the individual scores to predict the individual outcome variables, and measurement errors cannot be avoided with the observed scores. The reliabilities of the observed scores affect not only the value of R2 but also the prediction accuracy. Among path analyses with weighted composites, PLS-SEM mode BA yields the largest R2 in most cases, followed by path analysis using the Bartlett-factor scores, PLS-SEM mode A, and path analysis using equally weighted composites. Thus, PLS-SEM mode BA and path analyses using Bartlett-factor scores are also preferred from the perspective of prediction of individuals with the smallest MSE.

Conclusions and discussion

In social and behavioral sciences, theoretical constructs are commonly measured by indicators that contain measurement errors. CB-SEM is regarded as a proper method for modeling the relationship among the constructs because of its function in separating measurement errors and the latent constructs. However, the estimates of the path coefficients under CB-SEM may contain sizable sampling errors due to simultaneously estimating many parameters. In practice, the substantive interests are often prediction and/or classification of individuals. Then path analysis with composite scores has the advantage of directly estimating the relationship of the observed scores. This article shows that path analysis via weighted composites has an additional advantage of yielding path coefficients with less relative errors, as reflected by greater effect size and statistical power. The most interesting finding is that even path analysis with EWCs by the LS method yields estimates with greater signal-to-noise ratio than CB-SEM on average.

The results of this article imply that path analyses with Bartlett-factor scores or PLS-SEM mode BA tend to yield greater effect size for the path coefficients of the structural model than PLS-SEM mode A or path analysis with EWCs. Such a pattern is closely related to measurement reliability of the composites. In particular, Bartlett-factor scores and the weighted composites under PLS-SEM mode BA are most reliable among all weighted combinations of the indicators (Yuan and Bentler, 2002; Yuan & Deng, 2021). Consequently, we recommend mode BA instead of mode A for models with reflective indicators. Because formative indicators typically do not contain measurement errors or they do not share a common construct, mode B is still a viable option for computing the weights of formative indicators.

Although latent variables are error-free and CB-SEM directly models their relationship, measurement errors still interfere with the accuracy of the resulting parameter estimates. For a model containing two constructs (η, ξ) with a structural model η = γξ + ζ, let \(\hat \gamma _{ls}\) be the LS estimate of γ when ξ and η were directly used in the estimation, which is an ideal scenario. Then, the NML estimate of γ under CB-SEM is mathematically less accurate than \(\hat \gamma _{ls}\) even as the numbers of the indicators for both ξ and η go to infinity. This is because both the number of factor loadings and the number of error variances increase as the number of indicators increases, and CB-SEM has to face the issue of increasing number of model parameters. In contrast, as the number of indicators increases and under proper conditions, the reliabilities of all the composites \(\tilde \xi \) and \(\tilde {\eta }\) approach 1.0 (Hayashi, Yuan, & Sato, 2021). Let \(\hat \gamma _{ls}^{*}\) be the LS estimate for the model \(\tilde {\eta }=\gamma ^{*}\tilde \xi +\zeta ^{*}\). The signal-to-noise ratio in \(\hat \gamma _{ls}^{*}\) then converges to that of the LS estimate \(\hat \gamma _{ls}\). That is, as the number of indicators increases, the accuracy of the LS estimate of the regression coefficient under path analysis with composite scores approaches that under the ideal scenario. The results imply that, as the number of indicators increases, the advantage of path analysis with weighted composites over CB-SEM becomes more obvious, and the difference between the different types of composites will be less clear.

While path analysis with composite scores yields path coefficients with greater signal-to-noise ratio, the method itself is unable to offer the information such as reliability of individual items or the goodness of the overall model structure. These are important features of measurement and modeling, and can be directly obtained from CB-SEM. Other aspects of measurement such as unidimensionality as well as convergent and discriminant validity are also better evaluated under CB-SEM. Thus, while path analysis with composites has extra strength in prediction, classification and diagnosis for individuals, CB-SEM still offers valuable information that path analysis via composite scores does not provide. We recommend CB-SEM analysis being conducted first and followed by path analysis using Bartlett-factor scores or PLS-SEM mode BA even if prediction is of primary interest.

Under the assumption of normality on the distribution of the composite scores, the mechanism for path analysis to yield prediction with the smallest MSE is due to the fact that the predicted value is the conditional mean of the outcome variable. When data are not normally distributed, the prediction is most accurate in the class of linear predictors (Fuller, 1987, p. 75). In addition to the properties inherited from conditional means, composites under PLS-SEM mode BA also possess the property of maximum reliability, which also directly contributes to minimizing the MSE of the prediction. However, path analysis with weighted composites or PLS-SEM is a two-step procedure for modeling the relationship of the constructs. We can also obtain the conditional mean of an endogenous construct directly from the covariance matrices under the CB-SEM model. While this direct approach may seem more expedient, it is also a two-step procedure since parameters under CB-SEM have to be estimated before conditional means can be computed. With a correctly formulated model structure, the direct approach may yield equivalent predictors as those under path analysis using the Bartlett-factor scores or PLS-SEM mode BA. The matter becomes complicated with misspecified models. Regardless of whether the model is correctly specified or not, PLS-SEM has two mechanisms to maximize the predictive relationship among constructs: 1) The estimated weights under PLS-SEM maximize the correlation between the indicators in different blocks by regression analysis via the environmental variables (see e.g., Boardman, Hui & Wold, 1981); and 2) the estimated path coefficients maximize the correlation between the predictors and the outcome variable according to LS regression. Similarly, with misspecified models, the parameter estimates under CB-SEM also alter their values in order to account for the extra association of the observed indicators (Yuan, Marshall & Bentler, 2003). Additional studies are needed for comparing the two approaches.

The literatures of regression analysis with weighted composites and measurement-error models contain proposals for correcting the regression coefficients so that they become consistent with those under CB-SEM or regression analysis with true scores (e.g., Croon, 2002; Devlieger et al.,, 2016; Devlieger & Rosseel, 2017; Dijkstra & Henseler, 2015; Fuller, 1987; Hoshino & Bentler, 2013; Yuan et al.,, 2020). However, the corrected estimates may not work well when they are teamed with the composites for prediction or diagnosis of individuals. This is because the composites may not be on the same scales as those of the latent variables under CB-SEM or the true scores. Even if they are on the same scales, the corrected estimates will break the property of the “best in the class of linear predictors” of the intact LS estimates (Fuller, 1987, p. 75). In addition, the estimated path coefficients under two different types of composites are not exchangeable for the purpose of prediction since they may have different scales (Croon, 2002; Devlieger et al.,, 2016).