# Why generalized structured component analysis is not universally preferable to structural equation modeling

## Abstract

Generalized structured component analysis has emerged in marketing and psychometric literature as an alternative to structural equation modeling. A recent simulation study recommends that, in most cases, this analysis is preferable to structural equation modeling because it outperforms the latter when the model is misspecified. This article examines the characteristics of generalized structured component analysis and reveals that the surprising previous findings are attributable to an incomplete experimental design and an error incurred during the software implementation of generalized structured component analysis. Simulated data show that generalized structured component analysis provides inconsistent estimates. In some instances, model misspecification can nearly neutralize this inconsistency, but in others it will reinforce the inconsistency. Moreover, generalized structured component analysis is hardly suitable for mediation analysis because it substantially overestimates the direct effect. Thus, generalized structured component analysis *cannot* be recommended universally over structural equation modeling.

### Keywords

Generalized structured component analysis · Structural equation modeling · Model misspecification · Mediation

“Since its introduction in marketing […], structural equation models with latent variables have been used extensively in measurement and hypothesis testing” (Bagozzi and Yi 1988, p. 74). This statement, which is as true today as it was in 1988, begins the most frequently cited article ever published in this journal, on the subject of structural equation modeling (SEM). It is not surprising, therefore, that SEM has become an established element in the methodological toolbox of marketing researchers (Baumgartner and Homburg 1996). Researchers have embraced the advantages of SEM, which include its abilities to model latent variables, to account for measurement error (Bagozzi and Phillips 1982), and to test a series of dependence relationships simultaneously (Shook et al. 2004). Traditionally, marketing research has predominantly used two SEM techniques: covariance-based SEM and partial least squares path modeling (Fornell and Bookstein 1982; Reinartz et al. 2009). Hwang and Takane (2004) have also proposed a third SEM technique: generalized structured component analysis (GSCA).

In this context, misspecified (and therefore incorrect) models are a permanent threat to the advancement of marketing science and model-based predictions in marketing practice. Marketing researchers are well aware of the consequences of model misspecification: in marketing research, “model misspecification looms large” (Chandy 2003, p. 353), and in marketing practice, “model misspecification can affect resource allocation decisions and other marketing efforts that are important to a firm” (Schweidel et al. 2008, p. 82). As for any marketing models, it is crucial to identify and eliminate misspecification for structural equation models (Hu and Bentler 1998), whose estimates otherwise would not be trustworthy.

A recent article advises marketing researchers to avoid SEM and adopt GSCA instead (Hwang et al. 2010b). These authors report that GSCA outperforms SEM when the models are misspecified, and that GSCA obtains more accurate estimates in the case of misspecified models than in the case of well-specified models. They also advise that, “if correct model specification cannot be ensured, the researcher should use generalized structured component analysis” (Hwang et al. 2010b, p. 710). Yet this condition seems somewhat rhetorical; virtually all models are biased in some way (Browne and Cudeck 1993). In turn, GSCA apparently would be the universal method of choice in normal circumstances. Should marketing researchers follow this advice? If GSCA consistently achieves accurate estimates from misspecified models, it would offer significant opportunities for empirical research. Using GSCA, researchers could expect accurate estimates without having to pursue the correct specification of their models.

Logic tells us, however, that this claim cannot be true. Model misspecification means developing a hypothetico-deductive system on the basis of incorrect assumptions. Consequently, a system itself becomes inconsistent, and “the distinction between truth by derivation and falsity by derivation becomes blurred” (Bunge 1967, p. 437); this makes it impossible to tell whether conclusions based on the system are right or wrong. Therefore, GSCA’s superior parameter accuracy in the case of misspecified models refers to a situation in which the correct conclusions are coincidentally drawn from an inconsistent model. However, as we will show by means of a computational experiment, there are other situations in which GSCA estimates facilitate incorrect conclusions in case of both misspecified and well-specified models.

Consistency can be achieved only if the separate assumptions of a system are true (cf. Bunge 1967), which means there is no way to avoid the consequences of misspecification other than to specify the model correctly. Thus, GSCA cannot be a remedy for model misspecification. Still, the possibility remains that GSCA could be relatively robust against model misspecification. This means that, while it is impossible for GSCA to work better in the case of misspecified models than in the case of well-specified models, it might be possible that misspecification affects the estimates of GSCA less than it affects those obtained by covariance-based SEM. With this study, we investigate exactly what GSCA is and how it behaves to discern how Hwang et al. (2010b) achieved their findings. In so doing, we offer some guidelines for what researchers should consider when they use GSCA or interpret studies that have used it.

Moreover, this paper makes four key contributions. First, we provide new insights into what GSCA does and what characteristics it has. We show that GSCA creates weighted sums of indicator variables (i.e., composites) that maximize the average coefficient of determination (R^{2}) of prespecified linear equations between the composites. Second, we reveal that Hwang and Takane (2004) erred in their description of GSCA’s algorithm, which influenced all software implementations of GSCA. We identify the methodological articles affected by this problem and advise marketing researchers about which of the resulting conclusions they should disregard. Third, we demonstrate that Hwang et al.’s (2010b) findings, which are based on a simulation study, reflect their specific population model choice; in general, GSCA provides inconsistent estimates. If the bias induced by the model misspecification neutralizes GSCA’s inconsistency, then GSCA provides estimates that are closer to the true values; alternatively, the model misspecification could catalyze GSCA’s inconsistency. Fourth, we show that GSCA exhibits undesirable behavior in the case of a mediation analysis, such that it overestimates the direct effect. Overall then, GSCA *cannot* be universally recommended for use in marketing research, regardless of whether a correct model specification has been achieved. Instead, researchers should make deliberate choices based on conceptual, empirical, and simulation-based comparisons of extant structural equation modeling techniques (e.g., Dijkstra 1983; Fornell and Bookstein 1982; Lu et al. 2011; Reinartz et al. 2009).

## Generalized structured component analysis

Hwang and Takane (2004) propose generalized structured component analysis (GSCA) as an alternative to SEM. GSCA maximizes the average or sum of explained variances of linear composites and is equivalent (as we will show later) to an approach developed by Glang (1988), which he called “maximization of the sum of explained variances.” Thus Hwang and Takane (2004) might more accurately be considered promulgators of Glang’s (1988) work than the inventors of GSCA. GSCA consists of three defining elements: (1) a way to specify linear models, (2) an optimization criterion, and (3) an algorithm to obtain estimates. We illustrate all three elements of the GSCA approach next.

### The GSCA model specification

As its name suggests, GSCA is a component-based approach, which means that composites result from linear combinations of the observed variables (Meredith and Millsap 1985). The approach assumes that all observed variables and composites are centered and scaled to unit variance (Hwang and Takane 2004). The definition of the composites depends on whether a construct is formative or reflective (for a general discussion of this distinction, see Diamantopoulos and Winklhofer 2001). For each formative construct, GSCA defines a composite of the construct’s indicators, relying on the assumption that formative constructs do not contain measurement error on either the indicator or the construct level. For each reflective construct, GSCA defines a composite of the construct’s indicators and transforms each reflective indicator into a single-indicator composite with unit weight, such that it can define relationships that link the reflective construct’s composite to the single-indicator composite(s).

Each reflective indicator x_{i} is a composition of the true value plus random and systematic error:

\( x_i = \tau_i + \varepsilon_i + \delta_i \quad (1) \)

where τ_{i} denotes the true score, ε_{i} the random error, and δ_{i} the systematic error of indicator i. The definition of the construct score \( \widetilde{\eta} = \sum\nolimits_i w_i x_i \) as obtained from GSCA makes it possible to determine the variance of GSCA’s construct scores as follows (under the usual assumption that random errors are neither correlated with each other nor with the true construct score):

\( \operatorname{var}\left( \widetilde{\eta} \right) = \operatorname{var}\left( \sum\nolimits_i w_i \tau_i \right) + \sum\nolimits_i w_i^2 \operatorname{var}\left( \varepsilon_i \right) + \operatorname{var}\left( \sum\nolimits_i w_i \delta_i \right) \quad (2) \)

As long as the reflective indicators are affected by random measurement error, the construct scores will also contain some measurement error. However, if there is no systematic measurement error, the use of multiple indicators reduces the proportion of variance in the construct scores due to random measurement error compared to the variance due to the true score. The same does not hold true for systematic measurement error because the proportion of variance due to this type of error is reinforced almost proportionally to the variance due to the true score.
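This differential impact of random and systematic measurement error on composite scores can be illustrated with a small simulation. The sketch below uses hypothetical values (true-score variance 1, random error standard deviation .6, a shared systematic error weighted by .4) rather than any model from this article; it shows that the random-error share of the composite variance shrinks as the number of indicators K grows, whereas the systematic-error share does not.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                        # large sample, so variance estimates are stable
tau = rng.normal(size=N)           # true construct score (variance 1)
delta = rng.normal(size=N)         # systematic error shared by all indicators

emp = {}
for K in (1, 3, 10):               # number of reflective indicators
    # indicator = true score + shared systematic error + own random error
    eps = rng.normal(scale=0.6, size=(N, K))
    X = tau[:, None] + 0.4 * delta[:, None] + eps
    score = X.mean(axis=1)         # equally weighted composite score
    emp[K] = np.var(score)
    # theory: the random-error term shrinks with K, the systematic term does not
    print(K, round(emp[K], 3), round(1 + 0.4**2 + 0.6**2 / K, 3))
```

The empirical composite variance matches the theoretical value 1 + 0.4² + 0.6²/K, so only the random-error component is diluted by adding indicators.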

GSCA expresses a structural equation model, such as the example in Fig. 1, by means of two matrices, **W** and **A**, so for GSCA, these two figures are different representations of the same model. The dashed lines (parameterized by the composite weights) denote the defining relationships of the composites and therefore do not enter into the explained variance of the respective composite. The weights of the single-indicator composites equal 1 (w_{11} = w_{22} = w_{33} = w_{74} = w_{85} = w_{96} = 1). GSCA identifies this model by constraining the variances of all composites to equal 1 and setting the construct-level measurement errors to 0 (e_{7} = e_{8} = 0). The model equation of GSCA can be written as follows:

\( \mathbf{ZW} = \mathbf{ZWA} + \mathbf{E} \quad (3) \)

In Eq. 3, **Z** is the data matrix of form N × J, with N as the number of observations and J as the number of observed variables, and **W** is a J × T matrix containing the measurement weights, with T as the number of composite variables in the model. Thus **W** describes how the composites can be built from the observed variables. For the example in Fig. 1, **W** is a 9 × 9 matrix: six of its columns define the single-indicator composites (with unit weights), and the remaining three columns contain the free weights that build the constructs from their indicators.

The left-hand side of Eq. 3 can be abbreviated as **Γ** = **ZW** (form N × T), a matrix that contains the values of all composite variables. The right-hand side of Eq. 3 reveals the already described matrices **Z** and **W**, as well as **A** (form T × T), which contains the component loadings and path coefficients. For the example in Fig. 1, the six leftmost columns of **A** contain the component loadings (submatrix **C**), and the three rightmost columns contain the path coefficients of the structural paths (submatrix **B**). Overall then, the right-hand side of Eq. 3 consists of the predicted values of the composites (**ZWA** = **ΓA**, of form N × T) and the residual values matrix **E** (form N × T). Equation 3 implies that all composite scores can be explained by the composite scores in the model, a notation that follows the spirit of a reticular action model (McArdle and McDonald 1984).
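The matrix representation of Eq. 3 can be made concrete with a toy numerical sketch. The dimensions, weights, and coefficients below are hypothetical (N = 100 observations, J = 4 indicators, T = 3 composites), not the model of Fig. 1; the point is only to show the shapes of **Z**, **W**, **Γ**, **A**, and **E** and how the residuals arise.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, T = 100, 4, 3

Z = rng.normal(size=(N, J))
Z = (Z - Z.mean(0)) / Z.std(0)     # standardized data matrix, N x J

# W (J x T): two single-indicator composites plus one composite of z3 and z4
# (weights are illustrative, not estimated)
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.6],
              [0.0, 0.0, 0.7]])
Gamma = Z @ W                      # composite scores, N x T

# A (T x T): composite 3 carries loadings on composites 1 and 2
A = np.zeros((T, T))
A[2, 0], A[2, 1] = 0.8, 0.7

E = Gamma - Gamma @ A              # residuals of the model equation  Gamma = Gamma A + E
print(Gamma.shape, E.shape)        # (100, 3) (100, 3)
```

Every composite score is thus expressed as a linear prediction from the other composite scores plus a residual, exactly as Eq. 3 states.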

### The GSCA optimization criterion

The unknown parameters of GSCA (i.e., the variable elements of **W** and **A**) “are estimated such that the sum of squares of all residuals (e_{i}) is as small as possible” (Hwang et al. 2010b, p. 700). The following function thus must be minimized:

\( f_{\mathrm{GSCA}} = \operatorname{SS}\left( \mathbf{ZW} - \mathbf{ZWA} \right) = \operatorname{SS}\left( \mathbf{ZW}\left( \mathbf{I} - \mathbf{A} \right) \right) \quad (4) \)

This criterion can equivalently be expressed in terms of the correlation matrix of the observed variables, **S**:

\( f_{\mathrm{GSCA}} = N \cdot \operatorname{tr}\left( \left( \mathbf{I} - \mathbf{A} \right)^{\prime} \mathbf{W}^{\prime} \mathbf{S} \mathbf{W} \left( \mathbf{I} - \mathbf{A} \right) \right) \quad (5) \)

In this equation, matrix **I** is the identity matrix of the same dimension as **A**. Because the sample size N is a positive constant (for a particular optimization), it can be disregarded for optimization purposes. Equation 5 shows that the raw data are not required for the optimization function, as long as the correlation matrix is available.

GSCA’s model statistic, the FIT, is directly related to this criterion:

\( \mathrm{FIT} = 1 - \frac{f_{\mathrm{GSCA}}}{\operatorname{SS}\left( \mathbf{ZW} \right)} \)

Because all composites are standardized, the FIT equals the average explained variance (R^{2}) of all composite variables in the model. As a proportion of variance (or average thereof), the value of FIT can range from 0 to 1.

Thus, GSCA maximizes the average explained variance of linear composites. Equivalently, it maximizes the sum of the R-square values of linear composites.
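The equivalence between the raw-data criterion and its correlation-based form, and the resulting FIT, can be checked numerically. The sketch below uses a hypothetical toy model (two single-indicator composites predicted by one two-indicator composite) and a simplified FIT that averages over all composites; for standardized data, SS(**ZW**(**I** − **A**)) equals N · tr((**I** − **A**)′**W**′**SW**(**I** − **A**)).

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, T = 500, 4, 3

# correlated, standardized indicator data (hypothetical)
g = rng.normal(size=(N, 1))
Z = 0.7 * g + rng.normal(size=(N, J))
Z = (Z - Z.mean(0)) / Z.std(0)
S = (Z.T @ Z) / N                     # correlation matrix of the data

# W: two single-indicator composites plus one composite of z3 and z4
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.6],
              [0.0, 0.0, 0.7]])
Gamma = Z @ W

# least-squares values for the free elements of A, given W
# (composite 3 predicts composites 1 and 2)
A = np.zeros((T, T))
c3 = Gamma[:, 2]
A[2, 0] = (c3 @ Gamma[:, 0]) / (c3 @ c3)
A[2, 1] = (c3 @ Gamma[:, 1]) / (c3 @ c3)

I = np.eye(T)
f_raw = np.sum((Gamma @ (I - A)) ** 2)                    # SS of all residuals
f_cor = N * np.trace((I - A).T @ W.T @ S @ W @ (I - A))   # correlation-based form
assert np.isclose(f_raw, f_cor)       # the raw data are not needed

fit = 1 - f_raw / np.sum(Gamma ** 2)  # share of composite variance explained
print(round(fit, 3))
```

Because the two forms of the criterion coincide for any **W** and **A**, any data set with the same correlation matrix yields the same GSCA solution, which is the property exploited later in the computational experiment.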

### The GSCA algorithm

The third element of GSCA is an algorithm that must estimate the variable elements of the matrices **A** and **W** to minimize the optimization criterion f_{GSCA}. In principle, this task can be fulfilled in several ways, such as by using numerical optimization, applying existing algorithms such as Glang’s (1988), or creating a new algorithm. Hwang and Takane (2004) recommend an alternating least squares (ALS) algorithm (de Leeuw et al. 1976) to minimize f_{GSCA}. Their algorithm consists of two steps: calculate **A** keeping **W** constant, and calculate **W** keeping **A** constant (for a detailed description of ALS, including a discussion of its convergence, see Hwang and Takane 2004). It has been implemented in the software programs VisualGSCA (Hwang 2007) and GeSCA (Hwang and Park 2009), as well as in a protected MATLAB code (Hwang and Takane 2004).
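The alternating logic of the two steps can be sketched for a minimal hypothetical model: one free composite built from two indicators that predicts two single-indicator composites. This is not Hwang and Takane’s full ALS (which updates all weights and coefficients of a general model jointly); it is a minimal two-block coordinate descent in the same spirit, where the A-step is ordinary least squares and the W-step has a closed form under the unit-variance constraint.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500

# toy data: two single-indicator composites (z1, z2) and one composite
# built from z3 and z4 (hypothetical model, for illustration only)
g = rng.normal(size=(N, 1))
Z = 0.7 * g + rng.normal(size=(N, 4))
Z = (Z - Z.mean(0)) / Z.std(0)
Z2 = Z[:, 2:]                      # indicators of the two-indicator composite
M = (Z2.T @ Z2) / N                # their correlation matrix

w = np.array([0.5, 0.5])
w = w / np.sqrt(w @ M @ w)         # start with a standardized composite
for _ in range(100):               # alternate the two least-squares steps
    c = Z2 @ w
    # A-step: OLS slopes of z1 and z2 on the composite c
    a = np.array([c @ Z[:, 0], c @ Z[:, 1]]) / (c @ c)
    # W-step: given a, the criterion-minimizing standardized weights are
    # proportional to M^{-1} Z2' (a1*z1 + a2*z2)
    y = Z[:, :2] @ a
    w = np.linalg.solve(M, Z2.T @ y)
    w = w / np.sqrt(w @ M @ w)     # rescale so that var(Z2 @ w) = 1
print(np.round(w, 3))
```

Each step weakly decreases the sum of squared residuals given the other block of parameters, which is what makes alternating least squares converge.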

## An error in the GSCA algorithm and its consequences

Unfortunately, the algorithm as described by Hwang and Takane (2004) and implemented in the available software (which we denote GSCA_{2004}) does not maximize FIT but rather a weighted average over the explained variances of the composites:

\( \mathrm{FIT}\left( \mathrm{GSCA}_{2004} \right) = \frac{1}{T}\left( \sum\nolimits_{t=1}^{T-D} R_t^2 + \frac{1}{N}\sum\nolimits_{d=1}^{D} R_d^2 \right) \quad (8) \)

As Eq. 8 illustrates, the model statistic FIT(GSCA_{2004}) weights the explained variances of the D constructs in the model by a factor of 1/N. Because the optimization criterion depends on the sample size, any fixed correlation matrix will produce varying GSCA_{2004} estimates for different sample sizes. The asymptotic properties of the GSCA_{2004} algorithm are particularly interesting: for large values of N, the importance of the structural model for determining composite weights approaches 0, and the estimates become more and more similar to those that result from regressions between principal components—exactly the finding reported by Tenenhaus (2008).

The use of the GSCA_{2004} algorithm leads to substantially different results. In comparing the results obtained by GSCA_{2004} and the correct implementation of GSCA, Hwang et al. (2010c) confirm that a

> noticeable difference was found in the direction of the (average) relative biases of the parameter estimates under correct specification. Specifically, in the article, we reported that the loading estimates of generalized structured component analysis generally had a tolerable level of positive bias (less than 10%), whereas the path coefficient estimates were negatively biased. However, the new simulation showed that the loading estimates had an acceptable level of negative bias, whereas the path coefficient estimates were positively biased.

The error in the GSCA_{2004} algorithm turns the positive bias of GSCA into negative bias, and vice versa, which suggests that its impact is substantial. At first glance, it might appear counterintuitive that GSCA underestimates loadings and overestimates path coefficients, because this situation conflicts with findings in prior research pertaining to principal component analysis and common factor analysis (cf. Widaman 1990). We therefore replicate Hwang et al.’s (2010b) computational experiment in the next section and show that this unexpected behavior is due to cross-loadings in the population model. Cross-loadings mean that an observed indicator is influenced by one or more variables other than the intended latent variable. Although marketing researchers try to avoid cross-loadings because of their detrimental effects on discriminant validity (Fornell and Larcker 1981), such cross-loadings are very common as a result of the frequent problem of common method variance (Podsakoff et al. 2003).

A secondary objective of the computational experiment is to illustrate the substantiality of the error in the GSCA_{2004} algorithm. Until around 2010, all GSCA software—whether published like VisualGSCA (Hwang 2007) and GeSCA (Hwang and Park 2009) or unpublished like the MATLAB code (Hwang et al. 2010b; Hwang and Takane 2004)—implemented GSCA_{2004} instead of GSCA. This error has consequences for all empirical studies that have applied or explored GSCA or introduced extensions of GSCA (up to August 2011, Hwang 2007, 2009; Hwang et al. 2007a, 2010a, 2010b; Hwang and Park 2009; Hwang and Takane 2004; Hwang et al. 2007b; Tenenhaus 2008) and renders their empirical findings with regard to GSCA invalid.

## Reexamining the parameter recovery of GSCA

The computational experiment by Hwang et al. (2010b) is the foundation for their recommendation to prefer GSCA over SEM. We therefore describe their experiment, identify three serious shortcomings, and replicate and extend it.

Hwang et al. (2010b) based their experiment on a population model with three constructs, two positive structural paths (β_{1} = β_{2} = .6), and three positive cross-loadings of *χ* = .21. They also implemented an experimental factor model specification by estimating two different models: Model 1 (“correct specification”), which includes the cross-loadings and constrains the path β_{3} to 0, and Model 2 (“misspecification”), which omits the cross-loadings and freely estimates β_{3}. They used GSCA_{2004} and SEM to estimate both models. In addition to the model specification experimental condition, they manipulated the sample size and data distribution. The factors model specification and applied method had significant and substantial effects on parameter accuracy, but neither the sample size nor the data distribution had substantial effects (if they were significant, they were not substantial, η^{2} < .005).^{1}

The key finding of their experiment revealed that in the misspecification experimental condition, GSCA recovered the parameters of the population model significantly better than SEM did. However, this finding was possible only because their experiment contains three severe shortcomings, two of which affected internal validity and one of which affected external validity.

First, the experiment manipulates two factors at once, so the observed effect cannot be attributed to either of the two factors. Hwang et al. (2010b) combined the measurement model specification (correctly specified versus misspecified) and the consideration of an additional structural path (fixed to 0 versus freely estimated), which means that they mixed model underparameterization and model overparameterization. With underparameterization, “one or more parameters are fixed to zeros whose population values are nonzeros,” whereas overparameterization means that “one or more parameters are estimated whose population values are zeros” (Hu and Bentler 1998, p. 427). Underparameterization and overparameterization have different consequences for structural equation models (La Du and Tanaka 1989). Conceptually, underparameterization renders a theoretical model wrong, whereas overparameterization renders a theoretical model weaker. Combining the manipulations makes it impossible to attribute a change in the criterion (i.e., parameter recovery) to any particular manipulation. Moreover, it remains unclear whether GSCA is immune to measurement model misspecification or able to recognize an effect of zero as such. Second, Hwang et al. (2010b) did not apply GSCA but rather used GSCA_{2004}. This erroneous application renders their findings inapplicable. According to their corresponding web errata, the conclusions of the experiment generally remained unaffected. However, because the updated results were not reported entirely, it is hard to verify the extent to which their conclusions hold. Third, they chose a particular population model but made generalizations to other models. They did not control for potential interferences of the constituting elements of the population model, such as the sign of coefficients or the existence of cross-loadings.

Our replication therefore extends their design in two ways. First, we estimate the models not only with SEM and GSCA_{2004} but also with the correctly implemented GSCA, and we disentangle the misspecification factor into two subfactors, each with two levels: cross-loadings, which are either modeled or ignored, and a direct effect, which can be freely estimated or fixed to zero. Second, to explore the contingencies for these phenomena, we use four population models. In Model A, we retain Hwang et al.’s (2010b) original model with three positive cross-loadings (*χ* = .21) and two positive structural paths (β_{1} = β_{2} = .6). In the second population model (Model B), we slightly modify their model to exclude all cross-loadings (*χ* = 0; standardized error variances increase accordingly). All other parameters remain unchanged. Models C and D are similar to Models A and B, except for the sign of the structural paths (β_{1} = β_{2} = −.6). We provide the correlation matrices for the four population models in Table 1; our data exactly reproduce these four correlation matrices. Although most experiments require replications to cope with data uncertainty, they are not needed in our case, because for different data with the same correlation matrix, GSCA always produces the same estimates (see Eq. 5). To reduce complexity, we keep the sample size (N = 100) and data distribution (normal) constant. Overall, our extended experimental design thus includes 4 × 2 × 2 = 16 conditions (four population models; cross-loadings modeled or ignored; direct effect free or fixed).
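Data that exactly reproduce a target correlation matrix can be generated by whitening a random sample (so that its empirical covariance is exactly the identity) and then recoloring it with the Cholesky factor of the target matrix. The sketch below is a generic illustration of this technique; the 3 × 3 target matrix is hypothetical, not one of the four population models.

```python
import numpy as np

def exact_correlation_sample(R, N, seed=0):
    """Return an N x J data matrix whose sample correlation matrix equals R."""
    J = R.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, J))
    X = X - X.mean(0)
    # whiten: make the empirical covariance exactly the identity
    L_emp = np.linalg.cholesky((X.T @ X) / N)
    X_white = X @ np.linalg.inv(L_emp).T
    # recolor with the Cholesky factor of the target correlation matrix
    return X_white @ np.linalg.cholesky(R).T

R = np.array([[1.0, 0.49, 0.49],
              [0.49, 1.0, 0.49],
              [0.49, 0.49, 1.0]])     # hypothetical target matrix
Z = exact_correlation_sample(R, N=100)
print(np.allclose((Z.T @ Z) / 100, R))   # True
```

Because GSCA depends on the data only through the correlation matrix, such a sample makes the estimates deterministic, which is why no replications are needed in the experiment.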

Table 1 Correlation matrices resulting from the four population models

**A) Population model with positive structural paths (β_{1} = β_{2} = .6) and cross-loadings (*χ* = .21)**

| | z_{1} | z_{2} | z_{3} | z_{4} | z_{5} | z_{6} | z_{7} | z_{8} | z_{9} |
|---|---|---|---|---|---|---|---|---|---|
| z_{1} | 1 | | | | | | | | |
| z_{2} | .4900 | 1 | | | | | | | |
| z_{3} | .4900 | .4900 | 1 | | | | | | |
| z_{4} | .4410 | .4410 | .4410 | 1 | | | | | |
| z_{5} | .2940 | .2940 | .2940 | .5782 | 1 | | | | |
| z_{6} | .3469 | .3469 | .3469 | .6823 | .5782 | 1 | | | |
| z_{7} | .2646 | .2646 | .2646 | .5204 | .4410 | .6145 | 1 | | |
| z_{8} | .1764 | .1764 | .1764 | .3469 | .2940 | .4410 | .5782 | 1 | |
| z_{9} | .1764 | .1764 | .1764 | .3469 | .2940 | .4410 | .5782 | .4900 | 1 |

**B) Population model with positive structural paths (β_{1} = β_{2} = .6) and no cross-loadings (*χ* = 0)**

| | z_{1} | z_{2} | z_{3} | z_{4} | z_{5} | z_{6} | z_{7} | z_{8} | z_{9} |
|---|---|---|---|---|---|---|---|---|---|
| z_{1} | 1 | | | | | | | | |
| z_{2} | .4900 | 1 | | | | | | | |
| z_{3} | .4900 | .4900 | 1 | | | | | | |
| z_{4} | .2940 | .2940 | .2940 | 1 | | | | | |
| z_{5} | .2940 | .2940 | .2940 | .4900 | 1 | | | | |
| z_{6} | .2940 | .2940 | .2940 | .4900 | .4900 | 1 | | | |
| z_{7} | .1764 | .1764 | .1764 | .2940 | .2940 | .2940 | 1 | | |
| z_{8} | .1764 | .1764 | .1764 | .2940 | .2940 | .2940 | .4900 | 1 | |
| z_{9} | .1764 | .1764 | .1764 | .2940 | .2940 | .2940 | .4900 | .4900 | 1 |

**C) Population model with negative structural paths (β_{1} = β_{2} = −.6) and cross-loadings (*χ* = .21)**

| | z_{1} | z_{2} | z_{3} | z_{4} | z_{5} | z_{6} | z_{7} | z_{8} | z_{9} |
|---|---|---|---|---|---|---|---|---|---|
| z_{1} | 1 | | | | | | | | |
| z_{2} | .4900 | 1 | | | | | | | |
| z_{3} | .4900 | .4900 | 1 | | | | | | |
| z_{4} | −.1827 | −.1827 | −.1827 | 1 | | | | | |
| z_{5} | −.2940 | −.2940 | −.2940 | .4994 | 1 | | | | |
| z_{6} | −.2997 | −.2997 | −.2997 | .5091 | .4994 | 1 | | | |
| z_{7} | .1096 | .1096 | .1096 | −.1862 | −.1827 | −.0409 | 1 | | |
| z_{8} | .1764 | .1764 | .1764 | −.2997 | −.2940 | −.1827 | .4994 | 1 | |
| z_{9} | .1764 | .1764 | .1764 | −.2997 | −.2940 | −.1827 | .4994 | .4900 | 1 |

**D) Population model with negative structural paths (β_{1} = β_{2} = −.6) and no cross-loadings (*χ* = 0)**

| | z_{1} | z_{2} | z_{3} | z_{4} | z_{5} | z_{6} | z_{7} | z_{8} | z_{9} |
|---|---|---|---|---|---|---|---|---|---|
| z_{1} | 1 | | | | | | | | |
| z_{2} | .4900 | 1 | | | | | | | |
| z_{3} | .4900 | .4900 | 1 | | | | | | |
| z_{4} | −.2940 | −.2940 | −.2940 | 1 | | | | | |
| z_{5} | −.2940 | −.2940 | −.2940 | .4900 | 1 | | | | |
| z_{6} | −.2940 | −.2940 | −.2940 | .4900 | .4900 | 1 | | | |
| z_{7} | .1764 | .1764 | .1764 | −.2940 | −.2940 | −.2940 | 1 | | |
| z_{8} | .1764 | .1764 | .1764 | −.2940 | −.2940 | −.2940 | .4900 | 1 | |
| z_{9} | .1764 | .1764 | .1764 | −.2940 | −.2940 | −.2940 | .4900 | .4900 | 1 |

Table 2 presents the parameter estimates obtained from SEM, GSCA, and GSCA_{2004} for each of the 16 experimental conditions. Experimental conditions II and III are the original conditions analyzed by Hwang et al. (2010b). All other cells represent results from the new experimental conditions, which have not been analyzed previously.

Table 2 Results of the computational experiment

| Population model | Cross-loadings | Direct effect | Condition | Method | Estimate for β_{1} | Estimate for β_{2} | Estimate for β_{3} |
|---|---|---|---|---|---|---|---|
| (A) Positive structural paths (β_{1} = β_{2} = .6), cross-loadings (*χ* = .21) | Modeled | Free | I (overparameterized) | SEM | .600 | .600 | .000 |
| | | | | GSCA | .674 | .875 | −.136 |
| | | | | GSCA_{2004} | .527 | .546 | .006 |
| | | Fixed to zero | II (well-specified) | SEM | .600 | .600 | – |
| | | | | GSCA | .666 | .778 | – |
| | | | | GSCA_{2004} | .527 | .549 | – |
| | Ignored | Free | III (misspecified) | SEM | .662 | .847 | −.148 |
| | | | | GSCA | .522 | .601 | −.002 |
| | | | | GSCA_{2004} | .517 | .578 | .005 |
| | | Fixed to zero | IV (misspecified) | SEM | .646 | .740 | – |
| | | | | GSCA | .522 | .600 | – |
| | | | | GSCA_{2004} | .517 | .581 | – |
| (B) Positive structural paths (β_{1} = β_{2} = .6), no cross-loadings (*χ* = 0) | Modeled | Free | V (overparameterized) | SEM | .600 | .600 | .000 |
| | | | | GSCA | .572 | .724 | −.034 |
| | | | | GSCA_{2004} | .447 | .410 | .085 |
| | | Fixed to zero | VI (overparameterized) | SEM | .600 | .600 | – |
| | | | | GSCA | .571 | .701 | – |
| | | | | GSCA_{2004} | .447 | .448 | – |
| | Ignored | Free | VII (overparameterized) | SEM | .600 | .600 | .000 |
| | | | | GSCA | .445 | .407 | .086 |
| | | | | GSCA_{2004} | .445 | .407 | .086 |
| | | Fixed to zero | VIII (well-specified) | SEM | .600 | .600 | – |
| | | | | GSCA | .445 | .445 | – |
| | | | | GSCA_{2004} | .445 | .445 | – |
| (C) Negative structural paths (β_{1} = β_{2} = −.6), cross-loadings (*χ* = .21) | Modeled | Free | IX (overparameterized) | SEM | −.600 | −.600 | .000 |
| | | | | GSCA | −.511 | −.608 | .047 |
| | | | | GSCA_{2004} | −.389 | −.318 | .117 |
| | | Fixed to zero | X (well-specified) | SEM | −.600 | −.600 | – |
| | | | | GSCA | −.511 | −.637 | – |
| | | | | GSCA_{2004} | −.389 | −.364 | – |
| | Ignored | Free | XI (misspecified) | SEM | −.526 | −.389 | .113 |
| | | | | GSCA | −.397 | −.309 | .122 |
| | | | | GSCA_{2004} | −.397 | −.305 | .121 |
| | | Fixed to zero | XII (misspecified) | SEM | −.537 | −.462 | – |
| | | | | GSCA | −.397 | −.357 | – |
| | | | | GSCA_{2004} | −.397 | −.354 | – |
| (D) Negative structural paths (β_{1} = β_{2} = −.6), no cross-loadings (*χ* = 0) | Modeled | Free | XIII (overparameterized) | SEM | −.600 | −.600 | .000 |
| | | | | GSCA | −.572 | −.724 | .034 |
| | | | | GSCA_{2004} | −.447 | −.410 | .085 |
| | | Fixed to zero | XIV (overparameterized) | SEM | −.600 | −.600 | – |
| | | | | GSCA | −.571 | −.701 | – |
| | | | | GSCA_{2004} | −.447 | −.448 | – |
| | Ignored | Free | XV (overparameterized) | SEM | −.600 | −.600 | .000 |
| | | | | GSCA | −.445 | −.407 | .086 |
| | | | | GSCA_{2004} | −.445 | −.407 | .086 |
| | | Fixed to zero | XVI (well-specified) | SEM | −.600 | −.600 | – |
| | | | | GSCA | −.445 | −.445 | – |
| | | | | GSCA_{2004} | −.445 | −.445 | – |

If we considered only the results from conditions II and III, we would confirm the conclusions drawn by Hwang et al. (2010b): SEM recovers the parameters perfectly for a well-specified model (condition II), but for a misspecified model (condition III), it performs poorly. In contrast, GSCA_{2004} delivers near-perfect estimates in both conditions. For GSCA, the results are similar when we review the misspecified model, but they differ for the well-specified model. That is, condition II suggests that GSCA cannot recover parameters when the model is well-specified.

The results from all 16 conditions offer richer, different conclusions. Five in particular are worth emphasizing. First, we find that in experimental conditions X and XI (which differ from conditions II and III only in the sign of the population effects), GSCA performs better when estimating well-specified models rather than misspecified ones. Thus, Hwang et al.’s (2010b) primary finding cannot be reproduced with another model and appears attributable to the specific population model they chose.

Second, regarding the substantiality of the difference between GSCA and GSCA_{2004}, we find that conditions III, IV, VII, VIII, XI, XII, XV, and XVI indicate relatively small differences. However, all other conditions (I, II, V, VI, IX, X, XIII, XIV) reveal substantial differences between these methods. The two groups of conditions differ according to whether the cross-loadings are estimated. If no cross-loadings are estimated, the differences between the GSCA and GSCA_{2004} estimates seem negligible. However, when we estimate cross-loadings, GSCA and GSCA_{2004} produce clearly different results.

Third, our experiment indicates that GSCA estimates are inconsistent. Whereas SEM is able to perfectly recover parameters for well-specified models (conditions I, II, V–X, XIII–XVI), GSCA cannot recover parameters perfectly in any condition.

Fourth, we note the effects of experimentally applied constraints. When we compare a condition in which the direct effect is constrained with a corresponding condition without this constraint (in the structural model, condition I versus II, III versus IV, V versus VI, and so on), we find that GSCA’s estimates are not invariant to the correct constraints. If a parameter in a GSCA model is constrained, it affects the other parameters. In contrast, SEM estimates remain unaffected by correct constraints.

Fifth, with regard to GSCA’s behavior, we note that modeling cross-loadings leads to an increase in the absolute size of the estimates \( {\widehat{\beta }_1} \) and \( {\widehat{\beta }_2} \) but a decrease in the absolute size of \( {\widehat{\beta }_3} \). This pattern emerges regardless of whether the population model has cross-loadings.

Overall, our extended experiment reveals five contingencies that influence the accuracy of GSCA estimates: (1) correct model specification, (2) setting constraints in the structural model, (3) modeling cross-loadings, (4) the existence of cross-loadings in the population model, and (5) the sign of the path coefficients if the population model contains cross-loadings. In contrast, the accuracy of SEM estimates depends only on correct model specification, not on the other four contingencies.

## GSCA’s behavior in models with mediation

The finding that GSCA delivers inconsistent estimates is particularly worrisome if we consider the direct effect. For conditions I, V, VII, IX, XIII, and XV, which imply well-specified models, the method should identify a path of 0 as such. Whereas SEM is able to recover a population effect of 0, GSCA obtains standardized estimates with an absolute value of up to .136. Conditions I and VII even deserve a special comment: both conditions imply a well-specified model, and GSCA delivers estimates for their direct effects that are clearly different from 0 (−.136 and .086, respectively). If the sample size were large enough (e.g., about 200 observations in the case of an estimate of −.136), GSCA would identify these effects as significant. Therefore, the larger the sample size, the more likely GSCA is to commit a Type-I error. It is striking that this bias applies particularly to condition VII, which represents a common type of (sub-)model in marketing and other business and social sciences.

This behavior can be explained analytically. Consider a simple mediation model in which the first construct influences the third construct only through the second, mediating construct. GSCA’s estimate of the direct effect can then be written as follows^{2}:

\( \widehat{\beta}_3 = \operatorname{cor}\left( \widetilde{\eta}_1, \widetilde{\eta}_3 \right) \cdot \frac{1 - \operatorname{rel}\left( \widetilde{\eta}_2 \right)}{1 - R^2\left( \widetilde{\eta}_2 \right)} \)

Unless the mediating construct is measured perfectly reliably, GSCA will provide a non-zero estimate for a direct effect. The exact size of the estimation bias depends on the reliability of the mediating construct \( \left( {{\text{rel}}\left( {{{\widetilde{\eta }}_2}} \right)} \right) \), its coefficient of determination \(R^{2} {\left( {\widetilde{\eta }_{2} } \right)}\), and the correlation between the mediated constructs’ scores \( \left( {{\text{cor}}\left( {{{\widetilde{\eta }}_1},{{\widetilde{\eta }}_3}} \right)} \right) \).
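The mechanism behind this bias, a partial regression on an unreliable mediator score, can be reproduced with a plain regression sketch. The values below are hypothetical (full mediation in the population with paths of .6, a mediator score with reliability .8); the sketch mirrors the phenomenon with ordinary least squares rather than the full GSCA machinery.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500_000
a, b, rel = 0.6, 0.6, 0.8          # hypothetical population values

eta1 = rng.normal(size=N)
eta2 = a * eta1 + np.sqrt(1 - a**2) * rng.normal(size=N)   # full mediation
eta3 = b * eta2 + np.sqrt(1 - b**2) * rng.normal(size=N)   # no direct effect

# mediator construct score with reliability .8 (standardized)
m = np.sqrt(rel) * eta2 + np.sqrt(1 - rel) * rng.normal(size=N)

# regress eta3 on eta1 and the unreliable mediator score
X = np.column_stack([eta1, m])
beta = np.linalg.lstsq(X, eta3, rcond=None)[0]
print(round(beta[0], 3))           # direct-effect estimate, clearly nonzero
```

Although the population direct effect is exactly zero, the estimate comes out at roughly .10 (analytically, ab(1 − rel)/(1 − a²·rel) for these values), so with a large enough sample such a spurious direct effect would be declared significant.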

## Conclusion

Structural equation modeling plays an important role not only in marketing but also in management, psychology, sociology, educational research, and beyond. Researchers embrace SEM’s advantages, such as its abilities to model latent variables, correct for measurement error, specify error covariance structures, and estimate entire theories simultaneously. Hwang and Takane (2004) have reintroduced Glang’s (1988) concept of the maximization of the sum of explained variances under a new name, generalized structured component analysis (GSCA), and presented it as an alternative to SEM.

With this article, we attempt to provide a better understanding of what GSCA is and how it works. We have shown that GSCA creates standardized composite variables as weighted sums of indicator variables, such that the average R-square value resulting from predefined linear relationships is maximized.^{3} Moreover, this average R-square value equals GSCA’s model statistic, the FIT.
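As a sketch of this criterion (the criterion only, not the alternating least squares algorithm that optimizes the weights), the following Python fragment forms standardized composites from arbitrary unit weights, under assumed illustrative loadings, and computes the average R-square over the model's linear relations — each indicator on its composite plus the single structural path:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

def standardize(v):
    return (v - v.mean(axis=0)) / v.std(axis=0)

def r2(y, x):
    # R-square of a simple regression of standardized y on standardized x:
    # the squared correlation.
    return np.corrcoef(y, x)[0, 1] ** 2

# Two latent variables, three reflective indicators per block (illustrative loadings).
eta1 = rng.standard_normal(n)
eta2 = 0.5 * eta1 + np.sqrt(0.75) * rng.standard_normal(n)
X1 = standardize(0.8 * eta1[:, None] + 0.6 * rng.standard_normal((n, 3)))
X2 = standardize(0.8 * eta2[:, None] + 0.6 * rng.standard_normal((n, 3)))

# Standardized composites as weighted sums of indicators. Unit weights are used
# here; GSCA's algorithm would choose the weights to maximize the criterion below.
c1 = standardize(X1.sum(axis=1))
c2 = standardize(X2.sum(axis=1))

# The criterion: the average R-square over all predefined linear relations.
r2s = [r2(X1[:, j], c1) for j in range(3)]
r2s += [r2(X2[:, j], c2) for j in range(3)]
r2s.append(r2(c2, c1))
fit = float(np.mean(r2s))
print(round(fit, 2))
```

Raising any single R-square at the expense of the others does not improve this criterion, which is why the optimized weights need not recover the population loadings or paths.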

A mistake in the formulation of the algorithm and its subsequent implementation means that several academic publications that use GSCA are at least partially invalid. In particular, the erroneous implementation of GSCA facilitated the incorrect claim by Hwang et al. (2010b) that GSCA outperforms SEM. Using a simulation study, they conclude that “if correct model specification cannot be ensured, the researcher should use generalized structured component analysis” instead of SEM. However, we have shown that these findings are attributable to the specific choice of a population model; in general, GSCA provides inconsistent estimates. If the bias induced by the model misspecification or the inclusion of cross-loadings neutralizes GSCA’s inconsistency, then GSCA provides estimates that are closer to the true values. But model misspecification can also aggravate GSCA’s inconsistency, in which case the GSCA estimates are much farther from the truth than are the parallel estimates derived from SEM. Furthermore, GSCA’s inconsistency has particularly negative consequences for mediation analysis, because GSCA is likely to overestimate the direct effect.

### Implications

Our study offers several key implications for marketing researchers. First and foremost, they should recognize that Hwang et al.’s (2010b) findings are invalid, owing to an error in the algorithm and an experimental design that lacks internal validity. In general, GSCA can neither yield more accurate estimates from misspecified models than from correctly specified ones nor approach SEM’s parameter accuracy.

To obtain accurate estimates, researchers must instead specify their models correctly and use adequate estimation techniques. Applying GSCA cannot alleviate the potential bias that results from model misspecification. Rather, researchers who use GSCA should expect inconsistent estimates. These estimates tend not only to be farther from the true values than SEM estimates but also to have less value for meta-analyses, because aggregating GSCA results cannot reveal the true parameter.

With the findings from this study, researchers can critically examine existing studies that used GSCA and put their estimates in perspective. Typically, path coefficients will be attenuated, which means that the true relationship is likely to be stronger than GSCA indicates. The opposite holds in the case of mediation, in which GSCA is likely to overestimate the direct effect. If a GSCA model involves direct and indirect effects, researchers can use our Eq. 9 to determine what estimate the direct effect would take if, in reality, no direct effect existed. With Eq. 9, researchers also can distinguish effects that likely exist from those that are methodological artifacts of GSCA. Finally, researchers should expect GSCA path coefficient estimates to be inflated when cross-loadings are present.

In some situations, GSCA can be expected to yield accurate, consistent estimates, namely, if the construct measurement is perfectly reliable and valid. Such situations might occur for observable variables (e.g., time, turnover, marketing expenditures) or, possibly, in applications of formative measurement. In other situations, researchers using GSCA will obtain inaccurate estimates. Because GSCA does not provide any known benefits in return for this lower parameter accuracy, we cannot recommend that marketing researchers use it as a substitute for SEM.

### Further research

Methodology research should further explore the nature and behavior of GSCA to identify other situations in which GSCA provides value for marketing researchers. In principle, GSCA has several characteristics that could provide a foundation for methodological advantages, such as the existence of a global optimization criterion, the independence of distributional assumptions, and the convergence behavior of the algorithm. Perhaps GSCA’s characteristics would be beneficial in settings with small sample sizes, complex models, or highly nonnormally distributed data. A promising path for further research would be to equip GSCA with some form of correction for attenuation (cf. Croon 2002). Its statistical power also demands further investigation, in that Hwang et al. (2010b) find that GSCA_{2004}, in combination with bootstrapping, yields relatively small standard errors. We find the same pattern in the standard errors of GSCA. For condition VIII, a well-specified model, we compare the standard errors of SEM and GSCA: for the coefficients β_{1} and β_{2}, SEM yields standard errors of .155 and .154, whereas GSCA (with 10,000 bootstrap samples) yields .079 and .081, respectively. In this example, GSCA thus has greater statistical power than SEM.
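To make the role of bootstrapping concrete, the following Python sketch estimates the standard error of a single OLS path coefficient from the bootstrap distribution. It is a generic illustration under assumed numbers (one standardized path with slope .5, n = 200), not GSCA's full resampling procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# A single standardized path: y = 0.5 * x + error.
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(n)

def path_coef(x, y):
    # OLS slope of y on x.
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Bootstrap: re-estimate the coefficient on resampled data sets and take the
# standard deviation of the bootstrap distribution as the standard error.
boot = np.empty(2_000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[b] = path_coef(x[idx], y[idx])
se = float(boot.std(ddof=1))
print(round(se, 3))  # close to the analytic OLS standard error of about .06
```

For a plain OLS path the bootstrap and analytic standard errors agree; the open question raised above is why GSCA's bootstrap standard errors come out so much smaller than SEM's for comparable coefficients.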

As long as empirical evidence of GSCA’s superiority over other techniques remains lacking, marketing researchers should not make assumptions about GSCA’s behavior. Instead, they should resort to the ample support provided by a plethora of conceptual, empirical, and simulation-based comparisons of structural equation modeling techniques (e.g., Dijkstra 1983; Fornell and Bookstein 1982; Lu et al. 2011; Reinartz et al. 2009) to make deliberate choices among their options.

## Footnotes

1. Hwang et al. (2010b) also assessed the standard errors of the estimates. Because GSCA relies on bootstrapping, a well-established approach for obtaining standard errors, we do not pursue this issue further.

2. This equation holds only if neither the population nor the estimated model contains cross-loadings.

3. Extensions for nonlinear relationships also are available (e.g., Hwang et al. 2010a).

## Notes

### Acknowledgements

The author thanks Theo K. Dijkstra, Werner J. Reinartz, Allard C. R. van Riel, Christian M. Ringle, José L. Roldán, Marko Sarstedt and three anonymous reviewers for helpful comments.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

### References

- Bagozzi, R. P., & Phillips, L. W. (1982). Representing and testing organizational theories: A holistic construal. *Administrative Science Quarterly, 27*(3), 459–489.
- Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural equation models. *Journal of the Academy of Marketing Science, 16*(1), 74–94.
- Baumgartner, H., & Homburg, C. (1996). Applications of structural equation modeling in marketing and consumer research: A review. *International Journal of Research in Marketing, 13*(2), 139–161.
- Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. Bollen & J. Long (Eds.), *Testing structural equation models* (pp. 136–162). Thousand Oaks: Sage.
- Bunge, M. (1967). *Scientific research I: The search for system*. Berlin: Springer.
- Chandy, R. K. (2003). Research as innovation: Rewards, perils, and guideposts for research and reviews in marketing. *Journal of the Academy of Marketing Science, 31*(3), 351–355.
- Croon, M. (2002). Using predicted latent scores in general latent structure models. In G. A. Marcoulides & I. Moustaki (Eds.), *Latent variable and latent structure models* (pp. 195–223). Mahwah: Lawrence Erlbaum Associates.
- de Leeuw, J., Young, F. W., & Takane, Y. (1976). Additive structure in qualitative data: An alternating least squares method with optimal scaling features. *Psychometrika, 41*(4), 471–503.
- Diamantopoulos, A., & Winklhofer, H. M. (2001). Index construction with formative indicators: An alternative to scale development. *Journal of Marketing Research, 38*(2), 269–277.
- Dijkstra, T. K. (1983). Some comments on maximum likelihood and partial least squares methods. *Journal of Econometrics, 22*(1–2), 67–90.
- Fornell, C., & Bookstein, F. L. (1982). Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. *Journal of Marketing Research, 19*(4), 440–452.
- Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. *Journal of Marketing Research, 18*(1), 39–50.
- Glang, M. (1988). *Maximierung der Summe erklärter Varianzen in linear-rekursiven Strukturgleichungsmodellen mit multiplen Indikatoren: Eine Alternative zum Schätzmodus B des Partial-Least-Squares-Verfahrens* (Engl.: Maximization of the sum of explained variances in linear-recursive structural equation models with multiple indicators: An alternative to Mode B of the partial least squares approach). PhD thesis, University of Hamburg.
- Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2011). An assessment of the use of partial least squares structural equation modeling in marketing research. *Journal of the Academy of Marketing Science*, in print.
- Henseler, J. (2010). *A comparative study on parameter recovery of three approaches to structural equation modeling: A rejoinder*. SSRN eLibrary Manuscript no. 1585305.
- Horst, P. (1961). Relations among *m* sets of measures. *Psychometrika, 26*(2), 129–149.
- Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. *Psychological Methods, 3*(4), 424–453.
- Hwang, H. (2007). *VisualGSCA 1.0, a graphical user interface software program for generalized structured component analysis*. Montreal: Department of Psychology, McGill University.
- Hwang, H. (2009). Regularized generalized structured component analysis. *Psychometrika, 74*(3), 517–530.
- Hwang, H., & Park, S. (2009). *GeSCA, version as of 9 December 2009*. Montreal: McGill University. http://www.sem-gesca.org
- Hwang, H., & Takane, Y. (2004). Generalized structured component analysis. *Psychometrika, 69*(1), 81–99.
- Hwang, H., Desarbo, W. S., & Takane, Y. (2007). Fuzzy clusterwise generalized structured component analysis. *Psychometrika, 72*(2), 181–198.
- Hwang, H., Takane, Y., & Malhotra, N. K. (2007). Multilevel generalized structured component analysis. *Behaviormetrika, 34*(2), 95–109.
- Hwang, H., Ho, M.-H. R., & Lee, J. (2010a). Generalized structured component analysis with latent interactions. *Psychometrika, 75*(2), 228–242.
- Hwang, H., Malhotra, N. K., Kim, Y., Tomiuk, M. A., & Hong, S. (2010b). A comparative study on parameter recovery of three approaches to structural equation modeling. *Journal of Marketing Research, 47*(4), 699–712.
- Hwang, H., Malhotra, N. K., Kim, Y., Tomiuk, M. A., & Hong, S. (2010c). Web errata for “A comparative study on parameter recovery of three approaches to structural equation modeling (Hwang, Naresh, Kim, Tomiuk, & Hong, 2010, Vol. XLVII, August, pp. 699–712).” http://www.marketingpower.com/AboutAMA/Documents/JMR_Web_Appendix/2010.4/erratum_comparative_study_on_parameter_recovery.pdf, retrieved 2011-11-06.
- La Du, T. J., & Tanaka, J. S. (1989). Influence of sample size, estimation method, and model specification on goodness-of-fit assessments in structural equation models. *Journal of Applied Psychology, 74*(4), 625–635.
- Lu, I. R. R., Kwan, E., Thomas, D. R., & Cedzynski, M. (2011). Two new methods for estimating structural equation models: An illustration and a comparison with two established methods. *International Journal of Research in Marketing, 28*(3), 258–268.
- McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the reticular action model for moment structures. *British Journal of Mathematical and Statistical Psychology, 37*(2), 234–251.
- McDonald, R. P. (1968). A unified treatment of the weighting problem. *Psychometrika, 33*(3), 351–381.
- Meredith, W., & Millsap, R. E. (1985). On component analyses. *Psychometrika, 50*(4), 495–507.
- Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. *Journal of Applied Psychology, 88*(5), 879–903.
- Reinartz, W. J., Haenlein, M., & Henseler, J. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM. *International Journal of Research in Marketing, 26*(4), 332–344.
- Schweidel, D. A., Fader, P. S., & Bradlow, E. T. (2008). Understanding service retention within and across cohorts using limited information. *Journal of Marketing, 72*(1), 82–94.
- Shook, C. L., Ketchen, D. J., Jr., Hult, G. T. M., & Kacmar, K. M. (2004). An assessment of the use of structural equation modeling in strategic management research. *Strategic Management Journal, 25*(4), 397–404.
- Tenenhaus, M. (2008). Component-based structural equation modelling. *Total Quality Management and Business Excellence, 19*(7), 871–886.
- Widaman, K. F. (1990). Bias in pattern loadings represented by common factor analysis and component analysis. *Multivariate Behavioral Research, 25*(1), 89–95.