Abstract
Composite-based structural equation modeling (SEM), and especially partial least squares path modeling (PLS), has become increasingly widespread in marketing. To fully exploit the potential of these methods, researchers must know about their relative performance and the settings that favor each method’s use. While numerous simulation studies have aimed to evaluate the performance of composite-based SEM methods, practically all of them defined populations using common factor models, thereby assessing the methods on erroneous grounds. This study is the first to offer a comprehensive assessment of composite-based SEM techniques on the basis of composite model data, considering a broad range of model constellations. Results of a large-scale simulation study substantiate that PLS and generalized structured component analysis are consistent estimators when the underlying population is composite model-based. While both methods outperform sum scores regression in terms of parameter recovery, PLS achieves slightly greater statistical power.
Introduction
Structural equation modeling (SEM) has become a quasi-standard for analyzing cause–effect relationships between latent variables. Its ability to model latent variables while simultaneously accounting for various forms of measurement error makes SEM useful for a plethora of research questions (e.g., Babin et al. 2008; Steenkamp and Baumgartner 2000), particularly in the marketing field, which typically focuses on examining unobservable phenomena such as consumer attitudes, perceptions, and intentions.
To estimate structural equation models, researchers can draw on two main approaches: factor-based SEM (Jöreskog 1973) as carried out by software programs such as Amos, EQS, LISREL, or Mplus, and partial least squares path modeling (PLS; Wold 1974) as implemented in software programs such as ADANCO, PLS-Graph, SmartPLS, or XLSTAT. A review of all empirical studies published in the 30-year period between 1986 and 2015 in the Journal of Marketing (JM) and Journal of the Academy of Marketing Science (JAMS)—the top two marketing journals according to the 2015 Thomson Reuters Journal Citation Report—demonstrates the relevance of these two SEM methods for applied research. Our search yielded a total of 193 studies that used factor-based SEM, while 53 studies applied PLS. Looking at the cumulative number of studies that appeared between 1986 and 2015 (Fig. 1) shows that the use of both SEM approaches has significantly increased over time. The review also shows that PLS use has gained momentum relative to factor-based SEM in recent years. While in the early 1990s the ratio of factor-based SEM studies to PLS studies was 4.5, this ratio decreased to 2 in the period 2011 to 2015. Regressing the number of studies on the linear effects of time yields significant models (factor-based SEM: F = 62.409, p < 0.01; PLS: F = 25.836, p < 0.01) and time effects (factor-based SEM: β = 0.831, t = 7.900, p < 0.01; PLS: β = 0.693, t = 5.083, p < 0.01). For factor-based SEM, a quadratic effect of time is negative but not significant (β = −0.139, t = −0.314, p > 0.10), indicating that its use has grown linearly over time. In contrast, the quadratic time effect for PLS is positive and significant (β = 0.984, t = 1.817, p < 0.10), indicating its use has accelerated over time.
Several of the PLS-based models featured in this review have had a lasting effect on the field. For example, the American Customer Satisfaction Index (ACSI; Fornell et al. 1996) ranks among the most salient models in studying customer satisfaction (e.g., Anderson and Fornell 2000) and has given rise to related indices such as the European Customer Satisfaction Index (Eklöf and Westlund 2002). The ACSI has become a key performance indicator for companies and government agencies, as well as entire industries and sectors. Many follow-up studies use ACSI results, for example, to assess the impact of customer satisfaction on market share (e.g., Rego et al. 2013) or stock returns (e.g., Fornell et al. 2016), furthering our understanding of customer satisfaction’s value relevance. PLS has also been used extensively in services research, for example, to analyze the impact of service failures (e.g., Gelbrich 2010; Heidenreich et al. 2015; van der Heijden et al. 2013) or the drivers of service satisfaction (Brady et al. 2012; Dellande et al. 2004; Hennig-Thurau et al. 2006). Finally, the technology acceptance model (Burke 2011), arguably one of the most widely used PLS-based models for studying user adoption in the information systems discipline, has been applied to the different elements of the marketing mix (Haenlein and Kaplan 2011), such as salesperson behavior (Sundaram et al. 2007) or product adoption (Kaplan et al. 2007).
Wold (1980) created PLS as a complementary approach to factor-based SEM that would emphasize prediction while simultaneously relaxing the demands on data and specification of relationships (e.g., Dijkstra 2010; Jöreskog and Wold 1982). In early writing, researchers noted that PLS estimation is “deliberately approximate” to factor-based SEM (Hui and Wold 1982, p. 127), a characteristic that has come to be known as the PLS bias (e.g., Chin et al. 2003). A number of studies have used simulations to demonstrate the alleged PLS bias (e.g., Goodhue et al. 2012a; McDonald 1996; Rönkkö and Evermann 2013), which manifests itself in estimates of the relationships between the constructs and their indicators that are higher, and estimates of the relationships among the constructs that are lower, than the prespecified values.^{Footnote 1} The parameter estimates approach what has been labeled the “true” parameter values obtained from factor models when both the number of indicators per construct and the sample size increase (Hui and Wold 1982).
While prior simulation studies have aimed to evaluate the performance of composite-based PLS, practically all of the simulations defined populations using factor models in which the indicator covariances define the nature of the data (for a notable exception, see Becker et al. 2013a). That is, these studies used factor-based SEM as the benchmark against which the PLS parameters were evaluated, with the assumption that they should be the same. In contrast, in composite model populations, the data are defined by means of linear combinations of indicators. Therefore, prior simulation studies universally estimated and evaluated PLS models that were incorrectly specified with regard to the population model (Rigdon 2016; Rigdon et al. 2014).
Early research warned about the questionable legitimacy of such analyses. More than 25 years ago, Lohmöller (1989) and Schneeweiß (1991) argued that PLS can be seen as a consistent estimator of parameters as long as researchers are explicit about which type of population parameter they attempt to estimate. More recently, Marcoulides et al. (2012, p. 717) noted that “the comparison of PLS to other methods cannot and should not be applied indiscriminately” and referred to any evaluation of PLS vis-à-vis factor-based SEM methods as “comparing apples with oranges” (p. 725). Similarly, in their comparative study on parameter recovery of factor-based SEM, PLS, and generalized structured component analysis (GSCA; Hwang 2009), Hwang et al. (2010, p. 710) note that their common factor model-based data generation approach “may have had an unfavorable effect on the performance of partial least squares and generalized structured component analysis” and conclude that “it appears necessary in future studies to investigate whether a particular data generation procedure may influence the relative performance of the different approaches.” While these concerns have been echoed by many other authors (e.g., Henseler et al. 2014; McDonald 1996; Rigdon 2012; Tenenhaus 2008), some researchers continue to adhere to the reflex-like application of factor model populations to judge the relative performance of PLS (e.g., McIntosh et al. 2014; Rönkkö et al. 2015) and other composite-based techniques, such as GSCA and sum scores regression (e.g., Rönkkö and Evermann 2013; Rönkkö et al. 2016).
Apart from these parameterization issues, previous simulation studies uniformly focused on structural model estimates but neglected the measurement models. This narrow focus is problematic, particularly since PLS analyses of common factor-based populations tend to yield higher measurement model estimates, which likely support the measures’ reliability and validity (e.g., Henseler et al. 2015). Understanding the methods’ performance in estimating measurement models is crucial to correctly appreciate their suitability for evaluating measurement model quality, a fundamental step in model evaluation practice. Similarly, prior studies have generally analyzed parameter estimation bias while neglecting the methods’ sensitivity to Type I and Type II errors, which are fundamental performance features of every statistical method (Goodhue et al. 2012a). In light of the above, and despite the diversity of literature on composite-based SEM, it is reasonable to conclude that our understanding of these methods’ actual performance is still rather limited.
Addressing these gaps in research, this study is the first to offer a comprehensive assessment of PLS and other composite-based SEM techniques, considering a broad range of model constellations. Specifically, we examine the relative efficacy of PLS, GSCA, and sum scores regression under a variety of conditions that researchers typically encounter in practice.^{Footnote 2} By efficacy we refer to the ability of composite-based methods to support a researcher’s need to statistically test hypothesized relationships among constructs and indicators in structural and measurement models (Goodhue et al. 2012a). More specifically, we test whether these composite-based SEM techniques differ in their ability to: (1) produce accurate path estimates, (2) avoid false positives (Type I errors), and (3) avoid false negatives (Type II errors, related to statistical power). Most importantly, our approach correctly assesses the methods’ efficacy on the basis of composite model populations, rather than common factor model populations. As such, this study addresses numerous calls for further research in this respect (e.g., Chin 2010; Hwang et al. 2010; Marcoulides and Chin 2013).
Our results show that PLS and GSCA perform very similarly in terms of parameter accuracy, whereas sum scores regression does not perform as well when indicator weights differ on the same construct. Further analyses show that PLS achieves higher power levels than GSCA. The results directly contradict prior research regarding the performance of PLS and GSCA using factor model-based populations, where these methods have been shown to overestimate measurement model parameters and underestimate structural model parameters (e.g., Barroso et al. 2010; Chin et al. 2003; Reinartz et al. 2009). When using the methods to estimate data from composite model-based populations, the parameter bias is reversed in direction and much smaller, approaching zero as sample size increases. Thus, PLS and GSCA are consistent estimators of composite-based models, whereas this is not the case with sum scores regression.
In what follows, we first compare key aspects of factor-based and composite-based SEM, deriving recommendations for their use. Our comparisons focus on PLS since it is regarded as “the most fully developed and general system for path analysis with composites” (McDonald 1996, p. 240). Next, we describe our simulation design, followed by the description and interpretation of the results. Based on our findings, we derive guidelines for choosing among composite-based SEM methods. Finally, we discuss the implications of our findings for marketing researchers, as well as our study’s limitations, along with avenues for future research.
Factor-based SEM and PLS
While factor-based SEM and PLS share the same objective—determining the relationships among constructs and indicators—they take different routes to achieve it. Numerous studies have contrasted the two approaches, focusing on aspects such as distributional assumptions, their efficacy for estimating reflective vs. formative measurement models, and sample size requirements (e.g., Chin 1998; Hair et al. 2011; Hair et al. 2012a; Henseler et al. 2009). More recent research, however, emphasizes the interplay between model estimation and the treatment of construct measures as the key factor distinguishing the two approaches (Henseler et al. 2016; Rigdon 2012; Sarstedt et al. 2016).
Factor-based SEM initially divides the variance of each indicator into two parts: (1) the common variance, which is estimated from the variance shared with other indicators in the measurement model of a construct, and (2) the unique variance, which consists of both the specific and the error variance (Bollen 1989; Rigdon 1998). In estimating the model parameters, factor-based SEM only draws on the common variance, assuming that the variance of a set of indicators can be perfectly explained by the existence of one unobserved variable (the common factor) and individual random error (Spearman 1927; Thurstone 1947). This procedure conforms to the measurement philosophy underlying reflective measurement models, which, in essence, is why factor-based SEM has limitations when it comes to estimating formatively specified constructs (e.g., Lee and Cadogan 2013).
Different from factor-based SEM, PLS considers the total variance of the indicators in estimating the model (e.g., Tenenhaus et al. 2005). To do so, PLS linearly combines the indicators to form composites, therefore generally conforming to the measurement philosophy underlying formative measurement models (Henseler et al. 2016). However, PLS’s designation as composite-based refers only to the method’s way of representing constructs that approximate the conceptual variables of a theoretical model—the method readily accommodates both measurement model types without identification issues (Hair et al. 2011).
Composites formed by PLS explicitly serve as proxies of the conceptual variables under investigation (e.g., Henseler et al. 2016; Rigdon 2012).^{Footnote 3} Because of this trait, researchers in other fields, such as chemometrics, refer to the acronym PLS as “projection to latent structures” (Wold et al. 2001, p. 110). That is, PLS approximates common factor-based reflective measurement models (Hui and Wold 1982) and will produce biased estimates if the common factor model holds—just like factor-based SEM produces biased estimates when used to estimate data generated from a composite model (Sarstedt et al. 2016). Referring to the proxy nature that underlies all measurement (e.g., Cliff 1983; Rigdon 2012), Sarstedt et al. (2016, p. 4002) recently noted that composites produced by PLS “can be used to measure any type of property to which the focal concept refers, including attitudes, perceptions, and behavioral intentions. (…) As with any type of measurement conceptualization, however, researchers need to offer a clear construct definition and specify items that closely match this definition—that is, they must share conceptual unity.” PLS-based composites can also be applied as a method for dimension reduction, similar to principal components analysis, where the aim is to condense the measures so they adequately cover a conceptual variable’s salient features (Dijkstra and Henseler 2011).
To estimate the model parameters, PLS draws on composites formed from the indicators and applies a series of ordinary least squares regressions to estimate partial model structures, with the objective of minimizing the error terms (i.e., the residual variance) of the endogenous constructs. Since PLS does not estimate all model relationships simultaneously, the approach enables complex models to be estimated with small sample sizes, situations in which factor-based SEM often does not converge or produces inadmissible solutions (Henseler et al. 2014). This characteristic has greatly contributed to PLS’s popularity but has also triggered debates among methodologists (e.g., Marcoulides et al. 2012; Rigdon et al. 2014). As Hair et al. (2013, p. 2) note, “some researchers abuse this advantage by relying on extremely small samples relative to the underlying population”, and “PLS-SEM has an erroneous reputation for offering special sampling capabilities that no other multivariate analysis tool has.”
Factor-based SEM estimates model parameters based on the statistical objective of minimizing the discrepancy between the empirical and model-implied covariance matrices. This difference serves as the basis for the χ^{2}-based indices, which allow testing a model’s goodness-of-fit. As such, factor-based SEM focuses on explanatory modeling, which is “the use of statistical models for testing causal explanations” (Shmueli 2010, p. 390). Different from PLS, the construct scores of factor-based SEM need not be known or assumed at any stage of the estimation process (Jöreskog 1973). A crucial consequence of this indeterminacy is that the correlation between a common factor and any variable outside the factor model is itself indeterminate—it may be high or low, depending on which set of factor scores one chooses (Schönemann and Steiger 1978). As a consequence, factor-based SEM offers very limited value for predictive modeling (Becker et al. 2013a; Evermann and Tate 2016).
PLS, on the other hand, focuses on prediction and, as such, is concerned with generalization (Shmueli et al. 2016), which is the ability to predict sample data, or preferably out-of-sample data (e.g., Shmueli 2010). Complementing the range of metrics for assessing a model’s predictive relevance (e.g., Hair et al. 2017), researchers have advocated several PLS-based goodness-of-fit indices, such as the standardized root mean square residual (SRMR) or the root mean square residual covariance. However, the literature casts doubt on whether measured fit—as understood in a factor-based SEM context—is a relevant concept for PLS (Hair et al. 2017; Lohmöller 1989; Rigdon 2012). Different from factor-based SEM, the discrepancy between the empirical and the model-implied covariance matrices, which serves as the basis for fit indices such as the SRMR, is a by-product of the PLS algorithm and not explicitly minimized. Correspondingly, Lohmöller (1989, p. 222) long ago noted that these goodness-of-fit measures “should not be used for a decision about the fit of the model, because these indices are not optimized in the estimation procedure.”
Factor-based SEM and PLS are not interchangeable but rather complementary, a fact stressed by the methods’ originators (Jöreskog and Wold 1982). Researchers need to apply the SEM approach that best suits their research objective, measurement properties, and model setup. Table 1 summarizes the differences between factor-based SEM and PLS, along with sample references discussing each criterion.
Simulation design and model estimation
An extensive body of literature provides the technical underpinnings of PLS (e.g., Lohmöller 1989; Tenenhaus et al. 2005; Wold 1982) and GSCA (e.g., Henseler 2012; Hwang 2009; Hwang et al. 2010). GSCA is an alternative method for path analysis with composites. The method replaces factors with exact linear combinations of observed variables, employs a least squares criterion to estimate model parameters, and retains the advantages of PLS (e.g., less restrictive distributional assumptions and no improper solutions). Model estimation using sum scores regression (sum scores) is similar to PLS but differs in that the sum scores approach assumes equal (typically unit) indicator weights, whereas the weights in PLS represent the partialized effect of each indicator on its corresponding construct, controlling for the effects of all the other indicators of that construct (e.g., Goodhue et al. 2012b; Tenenhaus 2008). We do not elaborate further on the methodological underpinnings of these techniques, but refer to the relevant literature.
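The distinction between sum scores and differentiated indicator weights can be illustrated with a small sketch. The indicator data and the weight vector below are hypothetical, chosen only to show that a unit-weight composite is the special case of a weighted composite in which every weight is equal:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 observations on three indicators of one construct
X = rng.standard_normal((200, 3))

# Sum scores: every indicator receives the same (unit) weight
sum_score = X @ np.ones(3)

# Differentiated weights, as a PLS- or GSCA-type estimation might return
# (illustrative values, not estimates from any real dataset)
w = np.array([0.6, 0.3, 0.1])
weighted_composite = X @ w

# With equal weights, the composite reduces to a plain row sum
print(np.allclose(sum_score, X.sum(axis=1)))  # True
```

In practice both kinds of composite scores would additionally be standardized before entering the structural regressions; the sketch omits that step for brevity.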
Our simulation study considers the path model shown in Fig. 2, which Reinartz et al. (2009) also used in their comprehensive simulation study comparing PLS and factor-based SEM. The model mirrors the structure of the ACSI model (e.g., Fornell et al. 1996) and the European Customer Satisfaction Index model (e.g., Eklöf and Westlund 2002), whose estimations routinely draw on PLS. We also chose this type of model because it reflects the typical degree of complexity of composite-based structural equation models in the marketing discipline (Hair et al. 2012a).
For the simulation study, we selected low (i.e., 0.15; γ _{ 2 }, γ _{ 3 }, β _{ 6 }), medium (i.e., 0.30; β _{ 3 }), and high (i.e., 0.50; γ _{ 1 }, γ _{ 6 }, β _{ 1 }, β _{ 2 }, β _{ 4 }, β _{ 5 }) prespecified values of standardized path coefficients (Fig. 2). Following corresponding calls in the literature (e.g., Marcoulides et al. 2012a), we extended the simulation study by adding a construct (ξ _{2}) with two null paths (γ _{ 4 } and γ _{ 5 }) to the original model.
Our choice of manipulated factors and their factor levels follows prior research (e.g., Becker et al. 2013b; Chin et al. 2003; Henseler 2012; Hwang et al. 2010; Vilares and Coelho 2013). These conditions compare well with those seen in the marketing field as evidenced in prior reviews of PLS use (e.g., Hair et al. 2012a). Specifically, we manipulate the following factors:

- The number of indicators per latent variable: 2, 4, 6, 8 (4 factor levels)
- The standardized indicator weights for different numbers of indicators per latent variable: equal, unequal (2 factor levels)^{Footnote 4}
- The data distribution: normal, nonnormal (i.e., diff normal), and extremely nonnormal (i.e., log normal) (3 factor levels)^{Footnote 5}
- The sample size: 100, 250, 500, 1000, 10,000 (5 factor levels)
This is a full factorial design with 4·2·3·5 = 120 factor-level combinations. To obtain stable average outcomes for our analyses, we conducted 300 replications of each factor-level combination, resulting in the generation of 120·300 = 36,000 datasets. For the three methods under investigation (i.e., PLS, GSCA, and sum scores), this simulation study draws on a total of 108,000 model estimations.
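The cell counts of the design follow directly from the factor levels listed above and can be verified with a short sketch:

```python
import itertools

indicators = [2, 4, 6, 8]                        # indicators per latent variable
weights = ["equal", "unequal"]                   # standardized indicator weights
dists = ["normal", "diff normal", "log normal"]  # data distributions
sizes = [100, 250, 500, 1000, 10000]             # sample sizes

# Full factorial design: one cell per factor-level combination
cells = list(itertools.product(indicators, weights, dists, sizes))

datasets = len(cells) * 300     # 300 replications per cell
estimations = datasets * 3      # PLS, GSCA, and sum scores

print(len(cells), datasets, estimations)  # 120 36000 108000
```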
For the data generation, we use the prespecified measurement model (Table A1) and structural model (Fig. 2) coefficients to determine the indicators’ population correlation matrix (Becker et al. 2013a; Ringle et al. 2014; Schuberth et al. 2016). Its Cholesky decomposition allows us to extract the indicator data for a prespecified number of observations and the desired data distribution (see the Online Appendix for further information). For the data generation and the (partial) regression model estimations by means of the sum scores approach, we use the statistical software R (R Core Team 2016). PLS estimations draw on the semPLS (Monecke and Leisch 2012) package, while the GSCA estimations rely on the ASGSCA (Romdhani et al. 2014) package. Moreover, the R software’s snowfall package (Knaus 2013) allowed us exercising parallel computing on several hundred processors of a highperformance computing cluster.
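The Cholesky-based data generation step can be sketched as follows. The population correlation matrix below is a small hypothetical example for three indicators, not the matrix implied by the Fig. 2 model, and the sketch covers only the normal-distribution condition; the mechanics (multiplying independent standard normal draws by the Cholesky factor) are the same:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population correlation matrix for three indicators
sigma = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])

L = np.linalg.cholesky(sigma)           # lower-triangular Cholesky factor
z = rng.standard_normal((200_000, 3))   # independent standard normal draws
x = z @ L.T                             # correlated indicator data

# The sample correlations recover the population matrix up to sampling error
print(np.corrcoef(x, rowvar=False).round(2))
```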
Results
In order to draw a comprehensive picture of each approach’s efficacy for estimating path models with composite model data, our analyses address the following aspects of the methods’ performance: (1) parameter accuracy, (2) the direction of the estimation bias, and (3) statistical power and Type I errors. Prior to these analyses, we examined whether the candidate solutions for the iterations of each method tend to get closer to the desired solution (i.e., convergence), which is a key area of concern when using iterative algorithms (Henseler 2010). All methods examined in this study (i.e., PLS, GSCA, and sum scores) converge across all factor-level combinations, providing support for their ability to provide proper solutions.
Parameter accuracy
In order to be useful, an SEM method should produce parameter estimates that are very similar to the prespecified parameters of the artificially generated datasets. The mean absolute error (MAE; e.g., Hulland et al. 2010) is a commonly used quantity to measure how close estimates and prespecified parameters are. The MAE is defined as

\( \mathrm{MAE}=\frac{1}{t}{\sum}_{j=1}^t\left|{\widehat{\theta}}_j-{\theta}_j\right| \)  (Eq. 1)

where t equals the number of parameters, θ _{ j } is the prespecified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. Thus, the lower the MAE, the higher a method’s parameter accuracy. An MAE for the measurement models of, for example, 0.05 indicates that the absolute deviation between estimated and prespecified weights is on average 0.05 units.
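A minimal sketch of the MAE computation (the study itself used R; this illustration uses Python, and the estimate and parameter values are hypothetical):

```python
import numpy as np

def mae(estimates, parameters):
    """Mean absolute error between estimated and prespecified parameters."""
    estimates, parameters = np.asarray(estimates), np.asarray(parameters)
    return float(np.mean(np.abs(estimates - parameters)))

# Hypothetical path estimates vs. prespecified population values
print(round(mae([0.45, 0.35, 0.10], [0.50, 0.30, 0.15]), 4))  # 0.05
```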
Figure 3 displays the MAE results for the measurement models in a series of eight charts.^{Footnote 6} The left column of charts shows the results for unequal weights, whereas the right column shows those for equal weights. The four rows display the results for two, four, six, and eight indicators per measurement model. Each chart in Fig. 3 illustrates the PLS and GSCA results for different sample sizes (i.e., 100, 250, 500, 1000, and 10,000) on the x-axis, and the MAE values of the measurement models (MAE MM) on the y-axis. When using the sum scores approach, no measurement model estimation occurs as this approach draws on equal weights by design. Thus, assessing sum scores’ performance in measurement models is not meaningful.
When comparing the charts in Fig. 3 across rows and columns, we find that the MAE values vary only marginally depending on the number of indicators per measurement model and the type of weights (i.e., equal vs. unequal). MAE values are lower for fewer indicators per measurement model, but the differences are not pronounced. In contrast, the sample size has a significant bearing on the results: MAE values drop considerably as sample size increases. When comparing PLS and GSCA, we find that GSCA always yields slightly lower MAE values across all simulation conditions. These differences become smaller with higher sample sizes.
Complementing our assessment of the methods’ parameter accuracy in the measurement models, the following analyses address their accuracy when estimating the relationships between latent variables in the structural model. The presentation of MAE values for the structural model in Fig. 4 follows the same principle as in Fig. 3, with columns and rows showing the MAE values for different types of weights and numbers of indicators. The x-axis maps the sample size, while the y-axis shows each method’s MAE values for the structural model (MAE SM). An MAE value for the structural model of, for example, 0.05 indicates that the absolute deviation between estimated and prespecified path coefficients is on average 0.05 units. Different from our previous assessment, the charts also display sum scores, since this approach entails an explicit estimation of structural model relationships.
Similar to the MAE results of the measurement models, we find that the MAE values of the structural model only marginally vary across different numbers of indicators per measurement model and the type of weights (i.e., equal vs. unequal). The MAE values produced by PLS and GSCA are almost identical, except for small sample sizes of 100 where PLS is slightly more accurate. As sample size increases, parameter accuracy generally improves for all three methods, but this improvement is less pronounced for the sum scores approach compared to PLS and GSCA.
Further contrasting the methods’ performance shows that PLS and GSCA generally have higher parameter accuracy than the sum scores approach when indicator weights are unequal. The differences between PLS and GSCA on the one hand and the sum scores approach on the other are particularly pronounced for larger sample sizes; for example, with four indicators and 10,000 observations the sum scores approach’s MAE values are four times as high as those of PLS and GSCA. Only in two situations with unequal weights (when measurement models have six or eight indicators and 100 observations), does the sum scores approach perform slightly better than PLS and GSCA. As sample sizes increase, however, its performance deteriorates relative to the other approaches.
Not surprisingly, a different picture emerges when indicator weights are equal. In this situation, the sum scores approach performs better or as well as PLS and GSCA across all simulation conditions. The differences between the methods are marginal, however, especially for sample sizes of 250 and higher, independent of the number of indicators. Only for 100 observations and when measurement models have six or eight indicators, differences between the sum scores approach and the other two methods are more pronounced.
To further our assessment of parameter accuracy, and in order to facilitate the comparison of the results with those of Reinartz et al. (2009), we also computed the mean absolute relative error (MARE), defined as

\( \mathrm{MARE}=\frac{1}{t}{\sum}_{j=1}^t\left|\frac{{\widehat{\theta}}_j-{\theta}_j}{\theta_j}\right| \)  (Eq. 2)

where t equals the number of parameters, θ _{ j } is the prespecified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. This equation is similar to the MAE (Eq. 1) in that the MARE represents the MAE relative to the prespecified parameter values. A MARE value of, for example, 0.10 indicates that the prespecified coefficients were missed by 10% on average. Thus, if a prespecified parameter has a relatively low value, even a comparatively small MAE entails a rather large MARE.^{Footnote 7}
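A matching sketch for the MARE (again in Python with hypothetical values) illustrates how the same absolute miss weighs more heavily on a smaller prespecified parameter:

```python
import numpy as np

def mare(estimates, parameters):
    """Mean absolute relative error; prespecified parameters must be nonzero."""
    estimates, parameters = np.asarray(estimates), np.asarray(parameters)
    return float(np.mean(np.abs((estimates - parameters) / parameters)))

# An absolute miss of 0.05 on a large vs. a small prespecified coefficient
print(round(mare([0.55], [0.50]), 4))  # 0.1
print(round(mare([0.20], [0.15]), 4))  # 0.3333
```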
The findings resulting from the MARE assessment parallel those from the MAE analysis.^{Footnote 8} While the types of weights and number of indicators have a limited impact, the MARE improves considerably for increasing sample sizes in both the measurement and structural models (Figures A1 and A2 in the Online Appendix). For example, for 100 observations the methods’ MARE values in the structural model (Fig. A2) are around 0.25, and decrease to 0.15 for 250 observations, 0.10 for 500 and 1000 observations, and finally to less than 0.05 for 10,000 observations, except for the sum scores approach when indicator weights are unequal, particularly for large sample sizes. Most notably, the MARE values in Reinartz et al. (2009) show a similar pattern, but at a considerably higher level. For example, regardless of the number of indicators, MARE values in Reinartz et al. (2009) never drop below 0.10, even for a sample size of 10,000. This difference underlines the importance of aligning the data generation procedure with the assumptions underlying composite-based SEM methods such as PLS.
Direction of the estimation bias
The previous analysis indicated the degree of estimation bias. The next step is to evaluate the direction of this bias, that is, whether the methods systematically underestimate or overestimate the prespecified parameter values. Our assessment of each method’s direction of the estimation bias builds on the mean error (ME), which is defined analogously to the MAE (Eq. 1), but without taking absolute values:

\( \mathrm{ME}=\frac{1}{t}{\sum}_{j=1}^t\left({\widehat{\theta}}_j-{\theta}_j\right) \)  (Eq. 3)

Here, t equals the number of parameters, θ _{ j } is the prespecified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. The goal is an ME close to zero, which substantiates that systematic underestimation or overestimation is not an issue. A substantially positive (negative) ME indicates that the estimated parameter values are systematically larger (smaller) than the prespecified values of this simulation study. An ME for the measurement models of, for example, −0.05 implies that the estimated values are on average 0.05 units lower than the prespecified parameters.
Analyzing the ME of the weights estimates in the measurement models (Fig. A3 in the Online Appendix), we find that PLS and GSCA perform very well with practically no systematic bias across most simulation conditions. Both methods show slight underestimation tendencies, which are more pronounced when sample sizes are small (i.e., 100), particularly for six and eight indicator models, independent of whether the indicator weights are equal or not. Even in these settings, however, the ME values never exceed 0.025 units, which is very low in absolute terms. Thus, as sample size increases the ME diminishes, approaching zero for 10,000 observations.
With regard to the ME in the structural model (Fig. A4 in the Online Appendix), PLS and GSCA show slight overestimation tendencies. The overestimation is more pronounced when sample sizes are small (i.e., 100), particularly for six- and eight-indicator models. With larger sample sizes, however, the magnitude of the overestimation becomes trivial, especially for sample sizes of 250 and higher.
PLS and GSCA clearly outperform the sum scores approach when indicator weights are unequal. The sum scores approach always underestimates the path coefficients in this setting, particularly when the measurement models have only two or four indicators. In this situation, the sum scores approach’s ME values peak at 0.046 units. When indicator weights are equal, by contrast, the sum scores approach’s pronounced underestimation tendency gives way to a slight overestimation tendency. In this situation, the approach always yields ME values equal to or lower than those of PLS and GSCA.
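The attenuation that drives the sum scores approach’s underestimation can be sketched in a minimal simulation: a single standardized path between two composites, with hypothetical unequal weights on the predictor side. This is far simpler than the study’s design, but it shows why an equal-weight proxy shrinks the estimated path:

```python
import math
import random

random.seed(7)
n = 100_000                       # large n so sampling error is negligible
beta = 0.50                       # hypothetical prespecified path coefficient
w1, w2 = 0.8, 0.4                 # hypothetical unequal indicator weights
norm = math.hypot(w1, w2)
w1, w2 = w1 / norm, w2 / norm     # scale so the composite has unit variance

x1 = [random.gauss(0, 1) for _ in range(n)]   # uncorrelated standardized indicators
x2 = [random.gauss(0, 1) for _ in range(n)]
eta1 = [w1 * a + w2 * b for a, b in zip(x1, x2)]              # true composite
noise_sd = math.sqrt(1 - beta ** 2)
eta2 = [beta * e + random.gauss(0, noise_sd) for e in eta1]   # endogenous variable

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

beta_true_weights = corr(eta1, eta2)            # path using the correct weights
sum_score = [a + b for a, b in zip(x1, x2)]     # equal-weight (sum score) proxy
beta_sum_scores = corr(sum_score, eta2)         # attenuated path estimate
```

With these weights the sum score correlates about 0.95 with the true composite, so the estimated path shrinks from roughly 0.50 to roughly 0.47; the more unequal the weights, the stronger the attenuation.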
Statistical power and Type I errors
The next analysis addresses the question of how well the methods reveal relationships with low (β = 0.15), medium (β = 0.30), and high (β = 0.50) population effect sizes. This assessment is particularly important to applied researchers, because any failure to correctly depict such relationships as significant may lead to erroneous conclusions with adverse consequences for theory building and testing (Cohen 1988). To assess each method’s statistical power, we calculated the relative occurrence of significant paths found by each method and averaged the results across paths of identical population effect size (Reinartz et al. 2009). For example, in terms of low effect sizes, the analysis considers the model parameters β _{ 6 }, γ _{ 2 }, and γ _{ 3 } (Fig. 2).
The results show that PLS performs best in terms of statistical power, followed by GSCA and the sum scores approach (Table A2 in the Online Appendix). For low effect sizes and small sample sizes, PLS yields power values of approximately 0.70, whereas the sum scores approach and GSCA have power values of 0.60 and 0.40, respectively. The differences become less pronounced as sample sizes increase and ultimately diminish for 1000 observations or more. This pattern of results also holds for medium and high effect sizes, with the only difference being that the methods’ power values already align at a sample size of 250.
Mistakes in the interpretation of results and false conclusions can also occur if the methods erroneously render a null relationship as significant (i.e., Type I error; false positives). For this reason, we analyze the null relationships (γ _{ 4 } and γ _{ 5 }) in the structural model (Fig. 2) and determine the probability of rendering these relationships significant—hence, low probabilities indicate small Type I errors. We find that all three methods have very low Type I error rates of less than 10% (Table A2). This especially holds for PLS, which has the lowest error rates in most of the factor level constellations.
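The logic of this power and Type I error assessment can be illustrated with a stripped-down Monte Carlo sketch for a single bivariate path. The setup (one observed predictor, a t-test on the correlation, α = 0.05) is deliberately much simpler than the study’s full model, so the resulting power values are not comparable to those reported above:

```python
import math
import random

random.seed(1)
T_CRIT = 1.984   # approx. two-sided 5% critical t value for df = 98 (n = 100)

def empirical_power(beta, n=100, reps=1000):
    """Share of replications in which a single standardized path tests significant."""
    noise_sd = math.sqrt(1 - beta ** 2)
    hits = 0
    for _ in range(reps):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [beta * a + random.gauss(0, noise_sd) for a in x]
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        r = sxy / math.sqrt(sxx * syy)
        t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
        hits += abs(t_stat) > T_CRIT
    return hits / reps

power_low = empirical_power(0.15)   # low population effect size
type_i = empirical_power(0.00)      # null path: empirical Type I error rate
```

A null path (β = 0) should be flagged as significant in only about 5% of replications; how far the power for β = 0.15 falls short of conventional thresholds illustrates why small samples combined with low effect sizes are problematic.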
Discussion and conclusions
Summary of results
Along with the increasing prominence of composite-based SEM methods, particularly in the marketing field (Hair et al. 2012a; Henseler et al. 2009), researchers have started calling for their emancipation from factor-based SEM methods. As Rigdon (2012, p. 353) notes with regard to PLS, “instead of emphasizing how it is ‘like’ factor-based SEM but with advantages and disadvantages across different conditions, PLS path modeling should celebrate its status as a purely composite-based method.” To address this call and further the emancipation of composite-based SEM from its factor-based sibling, researchers require a more nuanced understanding of these methods’ performances based on strong simulation results (Henseler et al. 2014; Sarstedt et al. 2016). However, reviewing prior simulation studies gives rise to substantial concern.
Any simulation study begins by specifying a population—a true state from which data are sampled (Paxton et al. 2001). Even though a number of simulation studies have focused on evaluating the performance of composite-based SEM techniques, most notably PLS, practically all of these studies began by assuming that the common factor model represents the truth (Becker et al. 2013a; Rigdon 2016). Hence, most previous simulations evaluated models that were incorrectly specified relative to the population and thereby introduced a substantial research design bias. Addressing corresponding calls in prior research (Chin 2010; Hwang et al. 2010; Marcoulides and Chin 2013; Marcoulides et al. 2012), this study is the first to provide a systematic assessment of and comparison between PLS, GSCA, and the sum scores approach based on composite model data that is consistent with the functional principles of the estimators.
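The distinction at the heart of this argument can be made concrete with a minimal data-generation sketch. All weights, loadings, and the indicator intercorrelation below are hypothetical; the point is only the structural difference between the two populations:

```python
import math
import random

random.seed(0)
n = 1000
rho = 0.3   # hypothetical intercorrelation among the composite's indicators

def correlated_triplet():
    # Equicorrelated standard normals; the shared component c is merely a
    # device to induce the correlation rho, not a latent variable of the model.
    c = random.gauss(0, 1)
    return [math.sqrt(rho) * c + math.sqrt(1 - rho) * random.gauss(0, 1)
            for _ in range(3)]

# Composite model population: the construct IS the weighted sum of its
# indicators; no separate entity generates them (weights hypothetical).
w = [0.6, 0.4, 0.2]
X = [correlated_triplet() for _ in range(n)]
eta = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

# Common factor model population (what most prior simulations assumed):
# indicators are error-laden reflections of a latent factor f.
lam = [0.8, 0.7, 0.6]
Y = []
for _ in range(n):
    f = random.gauss(0, 1)
    Y.append([li * f + math.sqrt(1 - li ** 2) * random.gauss(0, 1) for li in lam])

var_eta = sum(e * e for e in eta) / n - (sum(eta) / n) ** 2   # should match w'Sigma w
```

Fitting a composite-based method such as PLS to X and eta matches the method’s functional principles, whereas fitting it to Y evaluates it against a population it was never meant to estimate; the reverse holds for factor-based SEM.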
Our results show that PLS and GSCA are consistent estimators of measurement and structural model coefficients when the underlying population is composite model-based—approaching the prespecified values as sample sizes increase. However, both methods show slight tendencies to underestimate measurement model parameters and to overestimate structural model parameters. In fact, in absolute terms the parameter biases in measurement models are quite pronounced for a small sample size of 100, yielding MAE values above 0.10, regardless of the number of indicators and the weighting pattern (i.e., equal or unequal). In the extreme case (a condition with 100 observations and eight indicators with unequal weights at relatively low levels), PLS and GSCA show relative deviations of measurement model parameter estimates (as expressed through the MARE) of 60% and more. For larger sample sizes, however, the relative deviations decrease substantially, dropping below 10% for 10,000 observations.
Compared to the measurement models, structural model MAE values are lower when the sample size is small (i.e., 100 observations), with values of around 0.075. For 250 observations, structural model MAE values drop below 0.05, diminishing further as sample size increases. Similarly, the relative deviations of parameter estimates (i.e., the MARE values) in the structural model quickly decrease for larger sample sizes. As sample size increases to 10,000, the relative deviation falls below 5%, providing support for the consistency of the PLS and GSCA methods. This finding does not hold for the sum scores approach, however, which exhibits MARE values of 7.5% and higher for large sample sizes when indicator weights are unequal. As unequal indicator weights are more the rule than the exception in applied research, these results cast substantial doubt on prior research that called for using sum scores over PLS or GSCA but relied on factor model-based data in making this recommendation (Rönkkö and Evermann 2013).
Similarly, our results demonstrate that prior allegations that PLS is not suitable for disclosing null effects are without merit; these allegations were made without formally testing the claim (Rönkkö and Evermann 2013; Rönkkö et al. 2016) or rested on factor model-based data (Rönkkö et al. 2015). Our study reveals that model estimation using PLS, GSCA, and sum scores yields low Type I error rates of around 5%, regardless of sample size and measurement model setup. Additionally, when statistical power is considered, our results indicate that PLS should be the preferred method vis-à-vis GSCA. While both methods achieve power values close to 1.00 for sample sizes of 500 or more, PLS excels at small sample sizes, particularly when effect sizes are low. In this situation, PLS’s power is approximately 75% higher than that of GSCA.
Implications for marketing researchers
None of the composite-based SEM approaches dominates on all criteria. Instead, each approach has certain strengths and weaknesses, which make it suitable when the focus is on parameter accuracy in the measurement models, parameter accuracy in the structural model, or statistical power. Against this background, Fig. 5 visualizes the recommendations to marketing researchers that result from the simulation study.
When the aim is to maximize parameter recovery accuracy, researchers should draw on PLS or GSCA. The latter method has slight advantages in estimating measurement models, but PLS offers higher accuracy in the estimation of structural model parameters. Only when sample size is small (i.e., 100) and measurement models are complex (i.e., six or more indicators) does the sum scores approach perform favorably in terms of structural model parameter accuracy, while having the same Type I error rate and statistical power as PLS. In this situation, the bias in indicator weights due to sampling variability is higher than the bias resulting from the assignment of equal weights. The lower average weights that occur when using many indicators amplify this effect. Except for this specific condition, however, our results advise against the use of the sum scores approach. When the primary focus is disclosing the significance of structural model effects, researchers should choose PLS in light of the method’s high statistical power compared to the sum scores approach and GSCA.
Our results in terms of parameter recovery directly contradict previous findings regarding the performance of composite-based PLS in factor model-based populations, where the method has been shown to overestimate measurement model parameters while underestimating structural model parameters (e.g., Barroso et al. 2010; Chin et al. 2003; Reinartz et al. 2009). In composite model-based populations, the direction of parameter bias is reversed and its magnitude much smaller. For example, PLS’s relative deviations in structural model estimates are always smaller than those obtained when using the method to estimate factor model-based data, as in Reinartz et al.’s (2009) study, which used a highly similar simulation design. Interestingly, the very same holds when comparing PLS’s performance in this study with that of factor-based SEM in Reinartz et al.’s (2009) study when estimating factor model-based populations. For example, in their study, factor-based SEM’s parameter estimates deviate between 30% and 65% for small sample sizes (Fig. 2 in Reinartz et al. 2009), depending on the number of indicators in the measurement models, while in this study the deviations produced by PLS are around 25% (Fig. A2). Clearly, when each method is used on a correctly specified population in simulation studies, PLS produces smaller biases than factor-based SEM.
Our results also cast doubt on the widely held belief that PLS is universally applicable in small sample size situations (e.g., Chin and Newsted 1999; Haenlein and Kaplan 2004; Hair et al. 2011). While this study is not the first to question PLS’s performance in this regard (Goodhue et al. 2012a; Rönkkö and Evermann 2013), it is the first to do so on the grounds of an appropriately specified (i.e., composite model) population. However, this conclusion primarily holds for the measurement models, where the relative deviations of parameter estimates are pronounced. PLS’s performance is much less affected in the structural model, where the relative deviations for small sample sizes are comparable to those produced by factor-based SEM for sample sizes of 250 to 500 when estimating factor model-based data (Reinartz et al. 2009). The implications of this aspect of our analysis are twofold.
First, these results give confidence that the ACSI model (Fornell et al. 1996) and the technology acceptance model (Davis 1989) and its extensions (Venkatesh et al. 2003), which have been validated using hundreds or thousands of observations, offer valid results on both the measurement and structural model levels (Anderson et al. 2004). Researchers frequently interpret the latent variable scores from these models and use the scores to run between-model mean comparisons. For example, drawing on latent variable scores from PLS analyses, Hult et al. (2017) recently used ACSI scores to investigate the extent to which managers’ perceptions of the levels and drivers of their customers’ satisfaction and loyalty align with those of their actual customers. Similarly, a multitude of studies on the marketing–finance interface draws on ACSI scores derived from PLS-based indicator weighting (e.g., Angulo-Ruiz et al. 2014; Fornell et al. 2016; Habel and Klarmann 2015; Lee et al. 2015) to offer evidence for the value relevance of customer satisfaction and related marketing asset variables. Our results regarding the parameter accuracy of indicator weights provide support for the validity of such analyses, given that the sample size is sufficiently large, as is the case with the original ACSI data. However, for small sample sizes (e.g., 100), indicator weight estimates show pronounced biases that may compromise the validity of latent variable scores.
Second, caution needs to be exercised when interpreting PLS or GSCA results on the item level when sample size is small, since the biases produced in this situation potentially cast doubt on any prioritization made on the grounds of indicator weights. This result is particularly relevant for studies implementing formative measurement models, which continue to feature prominently in marketing research (e.g., Ranjan and Read 2016; Rubera et al. 2016; Wolter and Cronin 2016). Similarly, findings from an importance-performance map analysis (Ringle and Sarstedt 2016), which contrasts each item’s total effect on a target construct (importance) with the average latent variable scores (performance), are prone to biased indicator weights with small sample sizes. Considering the prominence of composite-based SEM when sample sizes are small, this caveat holds for many studies. For example, Hair et al.’s (2012b) review of PLS use in strategic management shows that more than half of all models estimated with PLS draw on sample sizes of 100 or less. In other fields such as marketing (Hair et al. 2012a), management information systems (Ringle et al. 2012), and supply chain management (Kaufmann and Gaeckler 2015), this is the case for about a quarter of all models. In our review of PLS use in JM and JAMS between 1986 and 2015, the majority of studies (64.15%) incorporated at least one model with a sample size of less than 250; seven of these 53 PLS studies (13.21%) had at least one model with fewer than 100 observations, with Green et al. (1995) using a sample size as low as 39. Different from factor-based SEM, PLS provides measurement model estimates even when the sample size is very small. But authors, reviewers, and editors should question the value of those estimates in small sample size situations and rather focus on the structural model outcomes. However, when the aim is to interpret measurement model results, we concur with Rigdon (2016, p. 600), who recently noted that “with respect to both composite-based and factor-based approaches to SEM, if sample size is small, the best course is to get more data.” In B-to-B research, for example, large samples often are not available. In such situations, when populations are small and/or data are difficult to obtain, the application of PLS with smaller samples denotes a viable attempt to advance knowledge in these areas.
Future research avenues
Our results in terms of parameter recovery in the measurement models reinforce Rigdon’s (2012, p. 353) call that methodologists “should work to complete and validate a purely composite-based approach to evaluating modeling results.” Using composite-based SEM instead of factor-based SEM does not mean rejecting rigor. Instead, it means defining rigor in composite terms (Rigdon 2014, 2016). With regard to PLS, such an evaluation approach should take the method’s prediction orientation into account (Shmueli et al. 2016). Further advancing the set of prediction-oriented evaluation criteria is a particularly promising area of future research (see Becker et al. 2013a; Evermann and Tate 2016; Sarstedt et al. 2014; Schubring et al. 2016).
With respect to the recovery of composite model parameters, subsequent studies should extend the simulation design by considering more complex model structures, such as hierarchical component models (e.g., Becker et al. 2012), interaction terms (e.g., Henseler and Fassott 2010), mediating effects (Hair et al. 2017), and nonlinear effects (e.g., Rigdon et al. 2010). While the use of these modeling elements has recently become more en vogue, little is known about the efficacy of different composite-based SEM methods when estimating such model structures. Follow-up simulation studies should also complement our design, which focused on parameter accuracy as well as Type I and II errors, by considering the methods’ predictive power and different weighting modes (i.e., correlation versus regression weights; Becker et al. 2013a). Future research should also include a broader set of composite-based SEM methods, such as regularized generalized canonical correlation analysis (Tenenhaus and Tenenhaus 2011), best fitting proper indices (Dijkstra and Henseler 2011), and the extended PLS algorithm, which Lohmöller (1989) recommended to overcome some of the original PLS method’s restrictions in terms of model specifications. For example, the extended PLS algorithm allows for imposing restrictions on model parameters (e.g., the orthogonalization of latent variables to circumvent collinearity issues) and for assigning an indicator to multiple constructs. Researchers have largely overlooked the extended PLS algorithm; consequently, its performance has not yet been examined across a broader set of data and model constellations.
Complementing our empirical perspective, future studies should aim at providing analytical support for the methods’ consistency in estimating measurement and structural model parameters. Providing such an analytical rationale would complement prior research on the consistency of PLS estimates in common factor model populations (Hui and Wold 1982), thereby broadening our understanding of the methods’ performance in composite modelbased populations.
Finally, while the parameter bias that PLS produces when estimating common factor models has been extensively debated in the literature (e.g., Goodhue et al. 2012a; Reinartz et al. 2009; Rönkkö and Evermann 2013), the bias that factor-based SEM produces when estimating composite models has not yet been explored in depth. Initial results show that this bias can be substantial (Sarstedt et al. 2016), but future research should aim at evaluating in greater detail how data and model constellations affect factor-based SEM’s parameter accuracy in this context. Understanding this bias would help researchers to clarify the consequences of mistakenly using factor-based SEM on composite model-based populations. This endeavor would be particularly fruitful since researchers increasingly oppose adherence to the common factor model in SEM analyses (e.g., Rigdon 2012; Schönemann and Wang 1972; Treiblmaier et al. 2011). For example, among 72 articles published during 2012 in what Atinc et al. (2012) consider the four leading management journals (Academy of Management Journal, Journal of Applied Psychology, Journal of Management, and Strategic Management Journal) that tested one or more common factor model(s), fewer than 10% contained a common factor model that did not have to be rejected. While these results do not necessarily imply that the composite model predominantly holds in practice, understanding the consequences of using factor-based SEM on such models would contribute to broadening our understanding of this method type, which is still the mainstay for analyzing cause–effect models in marketing.
Notes
 1.
Note that researchers frequently distinguish between latent variables/constructs and composites. We use the term latent variable/construct to refer to the entities that represent conceptual variables in a structural equation model.
 2.
Our comparison does not consider consistent PLS (PLSc; Dijkstra 2014; Dijkstra and Henseler 2015), which corrects the PLS estimates for attenuation to mimic common factor models. As our objective is to compare composite-based SEM techniques on the basis of composite model data, PLSc is not relevant to our study.
 3.
Note that constructs in factorbased SEM are also proxies for the conceptual variables under investigation (Rigdon 2012).
 4.
Table A1 in the Online Appendix shows the indicator weights for different numbers of indicators.
 5.
For further details about the nonnormal data, see the additional information on the data generation in the Online Appendix.
 6.
As the analyses show only marginal differences between normal and nonnormal data, the following results presentations use the joint outcomes of the different data distribution types considered in this simulation study.
 7.
For example, for the condition with 500 observations and two indicators with equal weights of 0.625, PLS yields a MAE value of 0.05814 in the measurement models, which translates into a MARE value of 0.093. In contrast, a very similar MAE value of 0.06029 for the condition with 500 observations and eight indicators with equal weights of 0.25 translates into a MARE value of 0.241.
 8.
Note that the MARE is not defined for the two null paths γ _{ 4 } and γ _{ 5 } (Fig. 2). Hence, we did not include these two paths in the MARE computations.
References
Anderson, E. W., & Fornell, C. G. (2000). Foundations of the American customer satisfaction index. Total Quality Management, 11(7), 869–882.
Anderson, E. W., Fornell, C. G., & Mazvancheryl, S. K. (2004). Customer satisfaction and shareholder value. Journal of Marketing, 68(4), 172–185.
Angulo-Ruiz, F., Donthu, N., Prior, D., & Rialp, J. (2014). The financial contribution of customer-oriented marketing capability. Journal of the Academy of Marketing Science, 42(4), 380–399.
Atinc, G., Simmering, M. J., & Kroll, M. J. (2012). Control variable use and reporting in macro and micro management research. Organizational Research Methods, 15(1), 57–74.
Babin, B. J., Hair, J. F., & Boles, J. S. (2008). Publishing research in marketing journals using structural equation modeling. Journal of Marketing Theory and Practice, 16(4), 279–285.
Barroso, C., Carrión, G. C., & Roldán, J. L. (2010). Applying maximum likelihood and PLS on different sample sizes: studies on SERVQUAL model and employee behavior model. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 427–447). Berlin: Springer.
Becker, J. M., Klein, K., & Wetzels, M. (2012). Hierarchical latent variable models in PLS-SEM: guidelines for using reflective-formative type models. Long Range Planning, 45(5–6), 359–394.
Becker, J. M., Rai, A., & Rigdon, E. E. (2013a). Predictive validity and formative measurement in structural equation modeling: Embracing practical relevance. In Proceedings of the International Conference on Information Systems, Milan.
Becker, J. M., Rai, A., Ringle, C. M., & Völckner, F. (2013b). Discovering unobserved heterogeneity in structural equation models to avert validity threats. MIS Quarterly, 37(3), 665–694.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Brady, M. K., Voorhees, C. M., & Brusco, M. J. (2012). Service sweethearting: its antecedents and customer consequences. Journal of Marketing, 76(2), 81–98.
Burke, S. J. (2011). Competitive positioning strength: market measurement. Journal of Strategic Marketing, 19(5), 421–428.
Cassel, C., Hackl, P., & Westlund, A. H. (1999). Robustness of partial least-squares method for estimating latent variable quality structures. Journal of Applied Statistics, 26(4), 435–446.
Chin, W. W. (1998). The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 295–358). Mahwah: Erlbaum.
Chin, W. W. (2010). Bootstrap crossvalidation indices for PLS path model assessment. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares (pp. 83–97). Berlin: Springer.
Chin, W. W., & Newsted, P. R. (1999). Structural equation modeling analysis with small samples using partial least squares. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 307–341). Thousand Oaks, CA: Sage.
Chin, W. W., Marcolin, B. L., & Newsted, P. R. (2003). A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Information Systems Research, 14(2), 189–217.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18(1), 115–126.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.
R Core Team (2016). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340.
Dellande, S., Gilly, M. C., & Graham, J. L. (2004). Gaining compliance and losing weight: the role of the service provider in health care services. Journal of Marketing, 68(3), 78–91.
Diamantopoulos, A., & Riefler, P. (2011). Using formative measures in international marketing models: a cautionary tale using consumer animosity as an example. In M. Sarstedt, M. Schwaiger, & C. R. Taylor (Eds.), Advances in international marketing (Vol. 22, pp. 11–30). Bingley: Emerald.
Dijkstra, T. K. (2010). Latent variables and indices: Herman Wold’s basic design and partial least squares. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 23–46). Berlin: Springer.
Dijkstra, T. K. (2014). PLS' Janus face – response to professor Rigdon's ‘rethinking partial least squares modeling: in praise of simple methods’. Long Range Planning, 47(3), 146–153.
Dijkstra, T. K., & Henseler, J. (2011). Linear indices in nonlinear structural equation models: best fitting proper indices and other composites. Quality & Quantity, 45(6), 1505–1518.
Dijkstra, T. K., & Henseler, J. (2015). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316.
Eklöf, J. A., & Westlund, A. H. (2002). The pan-European customer satisfaction index program: current work and the way ahead. Total Quality Management, 13(8), 1099–1106.
Evermann, J., & Tate, M. (2016). Assessing the predictive performance of structural equation model estimators. Journal of Business Research, 69(10), 4565–4582.
Fornell, C. G., Johnson, M. D., Anderson, E. W., Cha, J., & Bryant, B. E. (1996). The American customer satisfaction index: nature, purpose, and findings. Journal of Marketing, 60(4), 7–18.
Fornell, C., Morgeson, F. V., & Hult, G. T. M. (2016). Stock returns on customer satisfaction do beat the market: gauging the effect of a marketing intangible. Journal of Marketing, 80(5), 92–107.
Gelbrich, K. (2010). Anger, frustration, and helplessness after service failure: coping strategies and effective informational support. Journal of the Academy of Marketing Science, 38(5), 567–585.
Goodhue, D. L., Lewis, W., & Thompson, R. (2012a). Does PLS have advantages for small sample size or nonnormal data? MIS Quarterly, 36(3), 981–1001.
Goodhue, D. L., Lewis, W., & Thompson, R. (2012b). Comparing PLS to regression and LISREL: a response to Marcoulides, Chin, and Saunders. MIS Quarterly, 36(3), 703–716.
Green, D. H., Donald, W. B., & Ryans, A. B. (1995). Entry strategy and longterm performance: conceptualization and empirical examination. Journal of Marketing, 59(4), 1–16.
Habel, J., & Klarmann, M. (2015). Customer reactions to downsizing: when and how is satisfaction affected? Journal of the Academy of Marketing Science, 43(6), 768–789.
Haenlein, M., & Kaplan, A. M. (2004). A beginner's guide to partial least squares analysis. Understanding Statistics, 3(4), 283–297.
Haenlein, M., & Kaplan, A. M. (2011). The influence of observed heterogeneity on path coefficient significance: technology acceptance within the marketing discipline. Journal of Marketing Theory and Practice, 19(2), 153–168.
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLSSEM: indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151.
Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012a). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.
Hair, J. F., Sarstedt, M., Pieper, T. M., & Ringle, C. M. (2012b). The use of partial least squares structural equation modeling in strategic management research: a review of past practices and recommendations for future applications. Long Range Planning, 45(5–6), 320–340.
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2013). Partial least squares structural equation modeling: rigorous applications, better results and higher acceptance. Long Range Planning, 46(1–2), 1–12.
Hair, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2017). A primer on partial least squares structural equation modeling (PLSSEM) (2nd ed.). Thousand Oaks, CA: Sage.
Heidenreich, S., Wittkowski, K., Handrich, M., & Falk, T. (2015). The dark side of customer cocreation: exploring the consequences of failed cocreated services. Journal of the Academy of Marketing Science, 43(3), 279–296.
Hennig-Thurau, T., Groth, M., Paul, M., & Gremler, D. D. (2006). Are all smiles created equal? How emotional contagion and emotional labor affect service relationships. Journal of Marketing, 70(3), 58–73.
Henseler, J. (2010). On the convergence of the partial least squares path modeling algorithm. Computational Statistics, 25(1), 107–120.
Henseler, J. (2012). Why generalized structured component analysis is not universally preferable to structural equation modeling. Journal of the Academy of Marketing Science, 40(3), 402–413.
Henseler, J., & Fassott, G. (2010). Testing moderating effects in PLS path models: an illustration of available procedures. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 713–735). Berlin: Springer.
Henseler, J., & Sarstedt, M. (2013). Goodness-of-fit indices for partial least squares path modeling. Computational Statistics, 28(2), 565–580.
Henseler, J., Ringle, C. M., & Sinkovics, R. R. (2009). The use of partial least squares path modeling in international marketing. In R. R. Sinkovics & P. N. Ghauri (Eds.), Advances in international marketing (Vol. 20, pp. 277–320). Bingley: Emerald.
Henseler, J., Dijkstra, T. K., Sarstedt, M., Ringle, C. M., Diamantopoulos, A., Straub, D. W., Ketchen, D. J., Hair, J. F., Hult, G. T. M., & Calantone, R. J. (2014). Common beliefs and reality about partial least squares: comments on Rönkkö & Evermann (2013). Organizational Research Methods, 17(2), 182–209.
Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135.
Henseler, J., Hubona, G. S., & Ray, P. A. (2016). Using PLS path modeling in new technology research: updated guidelines. Industrial Management & Data Systems, 116(1), 1–19.
Hui, B. S., & Wold, H. O. A. (1982). Consistency and consistency at large of partial least squares estimates. In K. G. Jöreskog & H. O. A. Wold (Eds.), Systems under indirect observation, part II (pp. 119–130). Amsterdam: North-Holland.
Hulland, J., Ryan, M. J., & Rayner, R. K. (2010). Modeling customer satisfaction: a comparative performance evaluation of covariance structure analysis versus partial least squares. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 307–325). Berlin: Springer.
Hult, G. T. M., Morgeson III, F. V., Morgan, N. A., Mithas, S., & Fornell, C. (2017). Do managers know what their customers think and why? Journal of the Academy of Marketing Science, 45(1), 37–54.
Hwang, H. (2009). Regularized generalized structured component analysis. Psychometrika, 74(3), 517–530.
Hwang, H., Malhotra, N. K., Kim, Y., Tomiuk, M. A., & Hong, S. (2010). A comparative study on parameter recovery of three approaches to structural equation modeling. Journal of Marketing Research, 47(4), 699–712.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 255–284). New York, NY: Seminar Press.
Jöreskog, K. G., & Wold, H. O. A. (1982). The ML and PLS techniques for modeling with latent variables: historical and comparative aspects. In H. O. A. Wold & K. G. Jöreskog (Eds.), Systems under indirect observation, part I (pp. 263–270). Amsterdam: North-Holland.
Kaplan, A. M., Schoder, D., & Haenlein, M. (2007). Factors influencing the adoption of mass customization: the impact of base category consumption frequency and need satisfaction. Journal of Product Innovation Management, 24(2), 101–116.
Kaufmann, L., & Gaeckler, J. (2015). A structured review of partial least squares in supply chain management research. Journal of Purchasing and Supply Management, 21(4), 259–272.
Knaus, J. (2013). R package snowfall: Easier cluster computing (version 1.84-6). cran.r-project.org/web/packages/snowfall/.
Lee, N., & Cadogan, J. W. (2013). Problems with formative and higher-order reflective variables. Journal of Business Research, 66(2), 242–247.
Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica.
Marcoulides, G. A., & Chin, W. W. (2013). You write, but others read: common methodological misunderstandings in PLS and related methods. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 31–64). New York, NY: Springer.
Marcoulides, G. A., Chin, W. W., & Saunders, C. (2012). When imprecise statistical statements become problematic: a response to Goodhue, Lewis, and Thompson. MIS Quarterly, 36(3), 717–728.
McDonald, R. P. (1996). Path analysis with composite variables. Multivariate Behavioral Research, 31(2), 239–270.
McIntosh, C. N., Edwards, J. R., & Antonakis, J. (2014). Reflections on partial least squares path modeling. Organizational Research Methods, 17(2), 210–251.
Monecke, A., & Leisch, F. (2012). semPLS: structural equation modeling using partial least squares. Journal of Statistical Software, 48(3), 1–32.
Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: design and implementation. Structural Equation Modeling, 8(2), 287–312.
Ranjan, K. R., & Read, S. (2016). Value co-creation: concept and measurement. Journal of the Academy of Marketing Science, 44(3), 290–315.
Rego, L. L., Morgan, N. A., & Fornell, C. (2013). Reexamining the market share–customer satisfaction relationship. Journal of Marketing, 77(5), 1–20.
Reinartz, W. J., Haenlein, M., & Henseler, J. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM. International Journal of Research in Marketing, 26(4), 332–344.
Rigdon, E. E. (1998). Structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 251–294). Mahwah: Erlbaum.
Rigdon, E. E. (2012). Rethinking partial least squares path modeling: in praise of simple methods. Long Range Planning, 45(5–6), 341–358.
Rigdon, E. E. (2014). Rethinking partial least squares path modeling: breaking chains and forging ahead. Long Range Planning, 47(3), 161–167.
Rigdon, E. E. (2016). Choosing PLS path modeling as analytical method in European management research: a realist perspective. European Management Journal, 34(6), 598–605.
Rigdon, E. E., Ringle, C. M., & Sarstedt, M. (2010). Structural modeling of heterogeneous data with partial least squares. In N. K. Malhotra (Ed.), Review of marketing research (pp. 255–296). Armonk: Sharpe.
Rigdon, E. E., Becker, J.-M., Rai, A., Ringle, C. M., Diamantopoulos, A., Karahanna, E., Straub, D. W., & Dijkstra, T. K. (2014). Conflating antecedents and formative indicators: a comment on Aguirre-Urreta and Marakas. Information Systems Research, 25(4), 780–784.
Ringle, C. M., & Sarstedt, M. (2016). Gain more insight from your PLS-SEM results: the importance-performance map analysis. Industrial Management & Data Systems, 116(9), 1865–1886.
Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36(1), iii–xiv.
Ringle, C. M., Sarstedt, M., & Schlittgen, R. (2014). Genetic algorithm segmentation in partial least squares structural equation modeling. OR Spectrum, 36(1), 251–276.
Romdhani, H., Grinek, S., Hwang, H., & Labbe, A. (2014). R package ASGSCA: Association studies for multiple SNPs and multiple traits using generalized structured equation models (Version 1.4.0), http://bioconductor.org/packages/ASGSCA/.
Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.
Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: caveat emptor. Personality and Individual Differences, 87, 76–84.
Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: time for some serious second thoughts. Journal of Operations Management, 47–48(November), 9–27.
Rubera, G., Chandrasekaran, D., & Ordanini, A. (2016). Open innovation, product portfolio innovativeness and firm performance: the dual role of new product development capabilities. Journal of the Academy of Marketing Science, 44(2), 166–184.
Sarstedt, M., Ringle, C. M., Henseler, J., & Hair, J. F. (2014). On the emancipation of PLS-SEM: a commentary on Rigdon (2012). Long Range Planning, 47(3), 154–160.
Sarstedt, M., Hair, J. F., Ringle, C. M., Thiele, K. O., & Gudergan, S. P. (2016). Estimation issues with PLS and CB-SEM: where the bias lies! Journal of Business Research, 69(10), 3998–4010.
Schneeweiß, H. (1991). Models with latent variables: LISREL versus PLS. Statistica Neerlandica, 45(2), 145–157.
Schönemann, P. H., & Steiger, J. H. (1978). On the validity of indeterminate factor scores. Bulletin of the Psychonomic Society, 12(4), 287–290.
Schönemann, P. H., & Wang, M.M. (1972). Some new results on factor indeterminacy. Psychometrika, 37(1), 61–91.
Schuberth, F., Henseler, J., & Dijkstra, T. K. (2016). Partial least squares path modeling using ordinal categorical indicators. Quality & Quantity, forthcoming.
Schubring, S., Lorscheid, I., Meyer, M., & Ringle, C. M. (2016). The PLS agent: predictive modeling with PLS-SEM and agent-based simulation. Journal of Business Research, 69(10), 4604–4612.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Shmueli, G., Ray, D., Manuel, J., Estrada, V., & Chatla, S. B. (2016). The elephant in the room: evaluating the predictive performance of PLS models. Journal of Business Research, 69(10), 4552–4564.
Spearman, C. (1927). The abilities of man. London: MacMillan.
Steenkamp, J.-B. E. M., & Baumgartner, H. (2000). On the use of structural equation models for marketing modeling. International Journal of Research in Marketing, 17(2/3), 195–202.
Sundaram, S., Schwarz, A., Jones, E., & Chin, W. W. (2007). Technology use on the front line: how information technology enhances individual performance. Journal of the Academy of Marketing Science, 35(1), 101–112.
Tenenhaus, M. (2008). Component-based structural equation modelling. Total Quality Management & Business Excellence, 19(7–8), 871–886.
Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257–284.
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: The University of Chicago Press.
Treiblmaier, H., Bentler, P. M., & Mair, P. (2011). Formative constructs implemented via common factors. Structural Equation Modeling: A Multidisciplinary Journal, 18(1), 1–17.
van der Heijden, G. A. H., Schepers, J. J. L., Nijssen, E. J., & Ordanini, A. (2013). Don’t just fix it, make it better! Using frontline service employees to improve recovery performance. Journal of the Academy of Marketing Science, 41(5), 515–530.
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: toward a unified view. MIS Quarterly, 27(3), 425–478.
Vilares, M. J., & Coelho, P. S. (2013). Likelihood and PLS estimators for structural equation modeling: an assessment of sample size, skewness and model misspecification effects. In J. Lita da Silva, F. Caeiro, I. Natário, & C. A. Braumann (Eds.), Advances in regression, survival analysis, extreme values, Markov processes and other statistical applications (pp. 11–33). Berlin: Springer.
Wold, H. O. A. (1974). Causal flows with latent variables: partings of ways in the light of NIPALS modelling. European Economic Review, 5(1), 67–86.
Wold, H. O. A. (1980). Model construction and evaluation when theoretical knowledge is scarce: theory and application of PLS. In J. Kmenta & J. B. Ramsey (Eds.), Evaluation of econometric models (pp. 47–74). New York, NY: Academic Press.
Wold, H. O. A. (1982). Soft modeling: the basic design and some extensions. In K. G. Jöreskog & H. O. A. Wold (Eds.), Systems under indirect observation, part II (pp. 1–54). Amsterdam: North-Holland.
Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130.
Wolter, J. S., & Cronin, J. J. (2016). Reconceptualizing cognitive and affective customer-company identification: the role of self-motives and different customer-based outcomes. Journal of the Academy of Marketing Science, 44(3), 397–413.
Acknowledgements
Earlier versions of the manuscript were presented at the 2015 Academy of Marketing Science Annual Conference in Denver, Colorado, and at the 2nd International Symposium on Partial Least Squares Path Modeling: The Conference for PLS Users in Seville, 2015. The authors thank Jan-Michael Becker (University of Cologne), Jörg Henseler (University of Twente), and Rainer Schlittgen (University of Hamburg) for their helpful comments on the simulation study and its data generation, which improved earlier versions of the manuscript. Even though this research does not explicitly refer to the use of the statistical software SmartPLS (http://www.smartpls.com), Ringle acknowledges a financial interest in SmartPLS.
Additional information
John Hulland served as Area Editor for this article.
Electronic Supplementary Material: ESM 1 (DOCX 704 kb)
Cite this article
Hair, J.F., Hult, G.T.M., Ringle, C.M. et al. Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods. J. of the Acad. Mark. Sci. 45, 616–632 (2017). https://doi.org/10.1007/s11747-017-0517-x
Keywords
Composite · Generalized structured component analysis · GSCA · Partial least squares · PLS · SEM · Simulation · Structural equation modeling · Sum scores regression