Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods

Abstract

Composite-based structural equation modeling (SEM), and especially partial least squares path modeling (PLS), has become increasingly widespread in marketing. To fully exploit the potential of these methods, researchers must understand their relative performance and the settings that favor each method’s use. While numerous simulation studies have aimed to evaluate the performance of composite-based SEM methods, practically all of them defined populations using common factor models, thereby assessing the methods on erroneous grounds. This study is the first to offer a comprehensive assessment of composite-based SEM techniques on the basis of composite model data, considering a broad range of model constellations. Results of a large-scale simulation study substantiate that PLS and generalized structured component analysis are consistent estimators when the underlying population is composite model-based. While both methods outperform sum scores regression in terms of parameter recovery, PLS achieves slightly greater statistical power.

Introduction

Structural equation modeling (SEM) has become a quasi-standard with respect to analyzing cause–effect relationships between latent variables. Its ability to model latent variables while simultaneously taking into account various forms of measurement error makes SEM useful for a plethora of research questions (e.g., Babin et al. 2008; Steenkamp and Baumgartner 2000), particularly in the marketing field, which typically focuses on examining unobservable phenomena such as consumer attitudes, perceptions, and intentions.

To estimate structural equation models, researchers can draw on two main approaches: Factor-based SEM (Jöreskog 1973) as carried out by software programs such as Amos, EQS, LISREL, or Mplus, and partial least squares path modeling (PLS; Wold 1974) as implemented in software programs such as ADANCO, PLS-Graph, SmartPLS, or XLSTAT. A review of all empirical studies published in the 30-year period between 1986 and 2015 in the Journal of Marketing (JM) and Journal of the Academy of Marketing Science (JAMS)—the top two marketing journals according to the 2015 Thomson Reuters Journal Citation Report—demonstrates the relevance of these two SEM methods for applied research. Our search yielded a total of 193 studies that used factor-based SEM, while 53 studies applied PLS. Looking at the cumulative number of studies that appeared between 1986 and 2015 (Fig. 1) shows that the use of both SEM approaches has significantly increased over time. The review also shows that PLS use has gained momentum relative to factor-based SEM in recent years. While in the early 90s, the ratio of factor-based SEM studies to PLS studies was 4.5, this ratio decreased to 2 in the period 2011 to 2015. Regressing the number of studies on the linear effects of time yields significant models (factor-based SEM: F = 62.409, p < 0.01; PLS: F = 25.836, p < 0.01) and time effects (factor-based SEM: β = 0.831, t = 7.900, p < 0.01; PLS: β = 0.693, t = 5.083, p < 0.01). For factor-based SEM, a quadratic effect of time is negative but not significant (β = −0.139, t = −0.314, p > 0.10), indicating that its use has grown linearly over time. In contrast, the quadratic time effect for PLS is positive and significant (β = 0.984, t = 1.817, p < 0.10), indicating its use has accelerated over time.

Fig. 1 Development of factor-based SEM and PLS applications in marketing

Several of the PLS-based models featured in this review have had a lasting effect on the field. For example, the American Customer Satisfaction Index (ACSI; Fornell et al. 1996) ranks among the most salient models in studying customer satisfaction (e.g., Anderson and Fornell 2000) and has given rise to related indices such as the European Customer Satisfaction Index (Eklöf and Westlund 2002). The ACSI has become a key performance indicator for companies and government agencies, as well as entire industries and sectors. Many follow-up studies use ACSI results, for example, to assess the impact of customer satisfaction on market share (e.g., Rego et al. 2013) or stock returns (e.g., Fornell et al. 2016), furthering our understanding of customer satisfaction’s value relevance. PLS has also been used extensively in services research, for example, to analyze the impact of service failures (e.g., Gelbrich 2010; Heidenreich et al. 2015; van der Heijden et al. 2013) or the drivers of service satisfaction (Brady et al. 2012; Dellande et al. 2004; Hennig-Thurau et al. 2006). Finally, the technology acceptance model (Davis 1989), arguably one of the most widely used PLS-based models for studying user adoption in the information systems discipline, has been applied to the different elements of the marketing mix (Haenlein and Kaplan 2011), such as salesperson behavior (Sundaram et al. 2007) or product adoption (Kaplan et al. 2007).

Wold (1980) created PLS as a complementary approach to factor-based SEM that would emphasize prediction while simultaneously relaxing the demands on data and specification of relationships (e.g., Dijkstra 2010; Jöreskog and Wold 1982). In early writing, researchers noted that PLS estimation is “deliberately approximate” to factor-based SEM (Hui and Wold 1982, p. 127), a characteristic that has come to be known as the PLS bias (e.g., Chin et al. 2003). A number of studies have used simulations to demonstrate the alleged PLS bias (e.g., Goodhue et al. 2012a; McDonald 1996; Rönkkö and Evermann 2013), which manifests itself in estimates of the relationships between the constructs and their indicators that are higher, and estimates of the relationships among the constructs that are lower, than the prespecified values.Footnote 1 The parameter estimates will approach what has been labeled the “true” parameter values obtained from factor models when both the number of indicators per construct and the sample size increase (Hui and Wold 1982).

While prior simulation studies have aimed to evaluate the performance of composite-based PLS, practically all of the simulations defined populations using factor models in which the indicator covariances define the nature of the data (for a notable exception, see Becker et al. 2013a). That is, these studies used factor-based SEM as the benchmark against which the PLS parameters were evaluated with the assumption that they should be the same. In contrast, in composite model populations, the data are defined by means of linear combinations of indicators. Therefore, prior simulation studies universally estimated and evaluated PLS models that were incorrectly specified with regard to the population model (Rigdon 2016; Rigdon et al. 2014).

Early research warned about the questionable legitimacy of such analyses. More than 25 years ago, Lohmöller (1989) and Schneeweiß (1991) argued that PLS can be seen as a consistent estimator of parameters as long as researchers specify which type of population parameter they attempt to estimate. More recently, Marcoulides et al. (2012, p. 717) noted “the comparison of PLS to other methods cannot and should not be applied indiscriminately” and referred to any evaluation of PLS vis-à-vis factor-based SEM methods as “comparing apples with oranges” (p. 725). Similarly, in their comparative study on parameter recovery of factor-based SEM, PLS, and generalized structured component analysis (GSCA; Hwang 2009), Hwang et al. (2010, p. 710) note that their common factor model-based data generation approach “may have had an unfavorable effect on the performance of partial least squares and generalized structured component analysis” and conclude that “it appears necessary in future studies to investigate whether a particular data generation procedure may influence the relative performance of the different approaches.” While these concerns have been echoed by many other authors (e.g., Henseler et al. 2014; McDonald 1996; Rigdon 2012; Tenenhaus 2008), some researchers continue to adhere to the reflex-like application of factor model populations to judge the relative performance of PLS (e.g., McIntosh et al. 2014; Rönkkö et al. 2015) and other composite-based techniques, such as GSCA and sum scores regression (e.g., Rönkkö and Evermann 2013; Rönkkö et al. 2016).

Apart from these parameterization issues, previous simulation studies uniformly focused on structural model estimates, but neglected the measurement models. This narrow focus is problematic, particularly since PLS analyses of common factor–based populations tend to yield inflated measurement model estimates, which may misleadingly support the measures’ reliability and validity (e.g., Henseler et al. 2015). Understanding the methods’ performance in estimating measurement models is crucial to correctly appreciate their suitability for evaluating measurement model quality—a fundamental step in model evaluation practice. Similarly, prior studies have generally analyzed parameter estimation bias, while neglecting the methods’ sensitivity to Type I and II errors, which are fundamental performance features of every statistical method (Goodhue et al. 2012a). In light of the above and despite the diversity of literature on composite-based SEM, it is reasonable to conclude that our understanding of these methods’ actual performance is still rather limited.

Addressing these gaps in research, this study is the first to offer a comprehensive assessment of PLS and other composite-based SEM techniques, considering a broad range of model constellations. Specifically, we examine the relative efficacy of PLS, GSCA, and sum scores regression under a variety of conditions that researchers typically encounter in practice.Footnote 2 By efficacy we refer to the ability of composite-based methods to support a researcher’s need to statistically test hypothesized relationships among constructs and indicators in structural and measurement models (Goodhue et al. 2012a). More specifically, we test whether these composite-based SEM techniques have different abilities in terms of: (1) producing accurate path estimates, (2) avoiding false positives (Type I errors), and (3) avoiding false negatives (Type II errors, related to statistical power). Most importantly, our approach correctly assesses the methods’ efficacy on the basis of composite model populations, rather than common factor model populations. As such, this study addresses numerous calls for further research in this respect (e.g., Chin 2010; Hwang et al. 2010; Marcoulides and Chin 2013).

Our results show that PLS and GSCA perform very similarly in terms of parameter accuracy, whereas sum scores regression does not perform as well when indicator weights on the same construct differ. Further analyses show that PLS achieves higher power levels than GSCA. The results directly contradict prior research regarding the performance of PLS and GSCA using factor model-based populations, where these methods have been shown to overestimate measurement model parameters and underestimate structural model parameters (e.g., Barroso et al. 2010; Chin et al. 2003; Reinartz et al. 2009). When using the methods to estimate data from composite model-based populations, the parameter bias is reversed in direction and much smaller in magnitude, approaching zero as sample size increases. Thus, PLS and GSCA are consistent estimators of composite-based models, whereas this is not the case with sum scores regression.

In what follows, we first compare key aspects of factor-based and composite-based SEM, deriving recommendations for their use. Our comparisons focus on PLS since it is regarded as “the most fully developed and general system for path analysis with composites” (McDonald 1996, p. 240). Next, we describe our simulation design, followed by a description and interpretation of the results. Based on our findings, we derive guidelines for choosing among composite-based SEM methods. Finally, we discuss the implications of our findings for marketing researchers as well as our study’s limitations, along with avenues for future research.

Factor-based SEM and PLS

While factor-based SEM and PLS share the same objective—determining the relationships among constructs and indicators—they take different routes to achieve it. Numerous studies have contrasted the two approaches, focusing on aspects such as distributional assumptions, their efficacy for estimating reflective vs. formative measurement models, and sample size requirements (e.g., Chin 1998; Hair et al. 2011; Hair et al. 2012a; Henseler et al. 2009). More recent research, however, emphasizes the interplay between model estimation and treatment of construct measures as the key distinguishing factor of the two approaches (Henseler et al. 2016; Rigdon 2012; Sarstedt et al. 2016).

Factor-based SEM initially divides the variance of each indicator into two parts: (1) the common variance, which is estimated from the variance shared with other indicators in the measurement model of a construct, and (2) the unique variance, which consists of both the specific and the error variance (Bollen 1989; Rigdon 1998). In estimating the model parameters, factor-based SEM only draws on the common variance, assuming that the variance of a set of indicators can be perfectly explained by the existence of one unobserved variable (the common factor) and individual random error (Spearman 1927; Thurstone 1947). This procedure conforms to the measurement philosophy underlying reflective measurement models, which, in essence, is why factor-based SEM has limitations when it comes to estimating formatively specified constructs (e.g., Lee and Cadogan 2013).

Different from factor-based SEM, PLS considers the total variance of the indicators in estimating the model (e.g., Tenenhaus et al. 2005). To do so, PLS linearly combines the indicators to form composites, therefore generally conforming to the measurement philosophy underlying formative measurement models (Henseler et al. 2016). However, PLS’s designation as composite-based refers only to the method’s way to represent constructs that approximate the conceptual variables from a theoretical model—the method readily accommodates both measurement model types without identification issues (Hair et al. 2011).

Composites formed by PLS explicitly serve as proxies of the conceptual variables under investigation (e.g., Henseler et al. 2016; Rigdon 2012).Footnote 3 Because of this trait, researchers in other fields, such as chemometrics, refer to the acronym PLS as “projection to latent structures” (Wold et al. 2001, p. 110). That is, PLS approximates common factor-based reflective measurement models (Hui and Wold 1982) and will produce biased estimates if the common factor model holds—just like factor-based SEM produces biased estimates when using the method to estimate data generated from a composite model (Sarstedt et al. 2016). Referring to the proxy nature that underlies all measurement (e.g., Cliff 1983; Rigdon 2012), Sarstedt et al. (2016, p. 4002) recently noted that composites produced by PLS “can be used to measure any type of property to which the focal concept refers, including attitudes, perceptions, and behavioral intentions. (…) As with any type of measurement conceptualization, however, researchers need to offer a clear construct definition and specify items that closely match this definition—that is, they must share conceptual unity.” PLS-based composites can also be applied as a method for dimension reduction, similar to principal components analysis, where the aim is to condense the measures so they adequately cover a conceptual variable’s salient features (Dijkstra and Henseler 2011).

To estimate the model parameters, PLS draws on composites formed from the indicators and applies a series of ordinary least squares regressions to estimate partial model structures with the objective of minimizing the error terms (i.e., the residual variance) of the endogenous constructs. Since PLS does not estimate all model relationships simultaneously, the approach enables complex models to be estimated with small sample sizes, situations in which factor-based SEM often does not converge or produces inadmissible solutions (Henseler et al. 2014). This characteristic has greatly contributed to PLS’s popularity but has also triggered debates among methodologists (e.g., Marcoulides et al. 2012; Rigdon et al. 2014). As Hair et al. (2013, p. 2) note, “some researchers abuse this advantage by relying on extremely small samples relative to the underlying population”, and “PLS-SEM has an erroneous reputation for offering special sampling capabilities that no other multivariate analysis tool has.”
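
To make this estimation logic concrete, the following minimal R sketch illustrates the basic PLS iteration for a single two-construct model (one exogenous and one endogenous indicator block), using Mode A outer estimation and a centroid-type inner scheme. The function name, the two-block setup, and the starting values are illustrative assumptions; the sketch does not reproduce the semPLS implementation used in the simulation reported below.

```r
# Minimal PLS sketch: one exogenous block X1 and one endogenous block X2.
# Mode A outer estimation, centroid-type inner scheme (illustrative only).
pls_two_blocks <- function(X1, X2, max_iter = 300, tol = 1e-7) {
  X1 <- scale(X1); X2 <- scale(X2)                # standardize indicators
  w1 <- rep(1, ncol(X1)); w2 <- rep(1, ncol(X2))  # arbitrary starting weights
  for (i in seq_len(max_iter)) {
    y1 <- scale(X1 %*% w1)                        # outer proxies (composite scores)
    y2 <- scale(X2 %*% w2)
    s  <- sign(drop(cor(y1, y2)))                 # centroid scheme: sign of the correlation
    z1 <- s * y2                                  # inner proxy of construct 1
    z2 <- s * y1                                  # inner proxy of construct 2
    w1_new <- drop(cov(X1, z1))                   # Mode A: indicator-proxy covariances
    w2_new <- drop(cov(X2, z2))
    if (max(abs(c(w1_new - w1, w2_new - w2))) < tol) { w1 <- w1_new; w2 <- w2_new; break }
    w1 <- w1_new; w2 <- w2_new
  }
  y1 <- drop(scale(X1 %*% w1)); y2 <- drop(scale(X2 %*% w2))
  path <- unname(coef(lm(y2 ~ y1))[2])            # partial OLS regression in the structural model
  list(w1 = w1, w2 = w2, path = path)
}
```

Because both composite scores are standardized, the final OLS slope equals their correlation, that is, the standardized path coefficient.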

Factor-based SEM estimates model parameters based on the statistical objective of minimizing the discrepancy between the empirical and model-implied covariance matrices. This difference serves as a basis for the χ2-based indices, which allow testing a model’s goodness-of-fit. As such, factor-based SEM focuses on explanatory modeling, which is “the use of statistical models for testing causal explanations” (Shmueli 2010, p. 390). Different from PLS, the construct scores of factor-based SEM need not be known or assumed at any stage of the estimation process (Jöreskog 1973). A crucial consequence of this indeterminacy is that the correlation between a common factor and any variable outside the factor model is itself indeterminate—it may be high or low, depending on which set of factor scores one chooses (Schönemann and Steiger 1978). As a consequence, factor-based SEM offers very limited value for predictive modeling (Becker et al. 2013a; Evermann and Tate 2016).

PLS, on the other hand, focuses on prediction and, as such, is concerned with generalization (Shmueli et al. 2016), which is the ability to predict sample data, or preferably out-of-sample data (e.g., Shmueli 2010). Complementing the range of metrics for assessing a model’s predictive relevance (e.g., Hair et al. 2017), researchers have advocated several PLS-based goodness-of-fit indices such as the standardized root mean square residual (SRMR) or the root mean square residual covariance. However, the literature casts doubt on whether measured fit—as understood in a factor-based SEM context—is a relevant concept for PLS (Hair et al. 2017; Lohmöller 1989; Rigdon 2012). Different from factor-based SEM, the discrepancy between the empirical and the model-implied covariance matrices, which serves as the basis for fit indices such as SRMR, is a byproduct of the PLS algorithm and not explicitly minimized. Correspondingly, Lohmöller (1989, p. 222) long ago noted that these goodness-of-fit measures “should not be used for a decision about the fit of the model, because these indices are not optimized in the estimation procedure.”

Factor-based SEM and PLS are not interchangeable but rather complementary, a fact that was stressed by the methods’ originators (Jöreskog and Wold 1982). Researchers need to apply the SEM approach that best suits their research objective, measurement properties, and model setup. Table 1 summarizes the differences between factor-based SEM and PLS along with sample references discussing each criterion.

Table 1 Comparison between factor-based SEM and PLS-SEM

Simulation design and model estimation

An extensive body of literature provides the technical underpinnings of PLS (e.g., Lohmöller 1989; Tenenhaus et al. 2005; Wold 1982) and GSCA (e.g., Henseler 2012; Hwang 2009; Hwang et al. 2010). GSCA is an alternative method for path analysis with composites. The method replaces factors by exact linear combinations of observed variables, employs a least squares criterion to estimate model parameters, and retains the advantages of PLS (e.g., less restrictive distributional assumptions and no improper solutions). Model estimation using sum scores regression (sum scores) is similar to PLS, but differs in that the sum scores approach assumes equal (typically unit) indicator weights, whereas weightings in PLS represent the partialized effect of the indicators on their corresponding construct, and thus control for the individual effects of all the other indicators of that construct (e.g., Goodhue et al. 2012b; Tenenhaus 2008). We do not elaborate further on the methodological underpinnings of these techniques, but refer to the relevant literature.
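
To illustrate the contrast, a sum scores estimate of a single structural path can be sketched as follows; the function name and the two-block setup are illustrative assumptions.

```r
# Sum scores sketch: each construct score is the unweighted sum of its
# standardized indicators (equal weights by design); the structural path is
# the standardized OLS coefficient between the two scores.
sum_scores_path <- function(X_exo, X_end) {
  s_exo <- drop(scale(rowSums(scale(X_exo))))
  s_end <- drop(scale(rowSums(scale(X_end))))
  unname(coef(lm(s_end ~ s_exo))[2])
}
```

Unlike PLS and GSCA, no weights are estimated, so any inequality of the indicators’ population weights feeds directly into the structural model estimates.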

Our simulation study considers the path model shown in Fig. 2, which Reinartz et al. (2009) also used in their comprehensive simulation study on the comparison of PLS and factor-based SEM. The model mirrors the structure of the ACSI model (e.g., Fornell et al. 1996) and the European Customer Satisfaction Index model (e.g., Eklöf and Westlund 2002) whose estimations routinely draw on PLS. We also chose this type of model, because it reflects the typical degree of complexity used in composite-based structural equation modeling in the marketing discipline (Hair et al. 2012a).

Fig. 2 Simulation model

For the simulation study, we selected low (i.e., 0.15; γ2, γ3, β6), medium (i.e., 0.30; β3), and high (i.e., 0.50; γ1, γ6, β1, β2, β4, β5) pre-specified values of standardized path coefficients (Fig. 2). In response to corresponding calls in the literature (e.g., Marcoulides et al. 2012), we extended the simulation study by adding a construct (ξ2) with two null paths (γ4 and γ5) to the original model.

Our choice of manipulated factors and their factor levels follows prior research (e.g., Becker et al. 2013b; Chin et al. 2003; Henseler 2012; Hwang et al. 2010; Vilares and Coelho 2013). These conditions compare well with those seen in the marketing field as evidenced in prior reviews of PLS use (e.g., Hair et al. 2012a). Specifically, we manipulate the following factors:

  • The number of indicators per latent variable: 2, 4, 6, 8 (4 factor levels)

  • The standardized indicator weights for different numbers of indicators per latent variable: equal, unequal (2 factor levels)Footnote 4

  • The data distribution: normal, non-normal (i.e., diff normal), and extremely non-normal (i.e., log normal) (3 factor levels)Footnote 5

  • The sample size: 100, 250, 500, 1000, 10,000 (5 factor levels)

This is a full factorial design with 4·2·3·5 = 120 factor level combinations. To obtain stable average outcomes for our analyses, we conducted 300 replications of each factor-level combination resulting in the generation of 120·300 = 36,000 datasets. For the three methods under investigation (i.e., PLS, GSCA, and sum scores), this simulation study draws on a total number of 108,000 model estimations.
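
A minimal sketch of this design, using illustrative factor-level labels, shows how these numbers arise:

```r
# Full factorial simulation design (factor levels as listed above)
conditions <- expand.grid(
  indicators   = c(2, 4, 6, 8),
  weights      = c("equal", "unequal"),
  distribution = c("normal", "diff normal", "log normal"),
  n            = c(100, 250, 500, 1000, 10000)
)
nrow(conditions)             # 120 factor-level combinations
nrow(conditions) * 300       # 36,000 generated datasets (300 replications each)
nrow(conditions) * 300 * 3   # 108,000 model estimations (PLS, GSCA, sum scores)
```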

For the data generation, we use the pre-specified measurement model (Table A1) and structural model (Fig. 2) coefficients to determine the indicators’ population correlation matrix (Becker et al. 2013a; Ringle et al. 2014; Schuberth et al. 2016). Its Cholesky decomposition allows us to generate the indicator data for a pre-specified number of observations and the desired data distribution (see the Online Appendix for further information). For the data generation and the (partial) regression model estimations by means of the sum scores approach, we use the statistical software R (R Core Team 2016). PLS estimations draw on the semPLS (Monecke and Leisch 2012) package, while the GSCA estimations rely on the ASGSCA (Romdhani et al. 2014) package. Moreover, the R software’s snowfall package (Knaus 2013) allowed us to run the computations in parallel on several hundred processors of a high-performance computing cluster.
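
For the normal-data conditions, the generation step can be sketched as follows; Sigma stands for the indicators’ population correlation matrix implied by the pre-specified coefficients, and the non-normal conditions additionally require the transformations described in the Online Appendix.

```r
# Generate n observations whose population correlation matrix equals Sigma
generate_data <- function(Sigma, n) {
  R <- chol(Sigma)                               # upper triangular factor: t(R) %*% R = Sigma
  Z <- matrix(rnorm(n * ncol(Sigma)), nrow = n)  # iid standard normal draws
  Z %*% R                                        # correlated indicator data
}
```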

Results

In order to draw a comprehensive picture of each approach’s efficacy for estimating path models with composite model data, analyses of our results address the following aspects related to the methods’ performance: (1) parameter accuracy, (2) the direction of the estimation bias, and (3) statistical power and Type I errors. Prior to these analyses, we examined whether the candidate solutions for the iterations of each method tend to get closer to the desired solution (i.e., convergence), which is a key area of concern when using iterative algorithms (Henseler 2010). All methods examined in this study (i.e., PLS, GSCA, and sum scores) converge across all factor level combinations, supporting their ability to produce proper solutions.

Parameter accuracy

In order to be useful, an SEM method should produce parameter estimates that are very similar to the pre-specified parameters of the artificially generated datasets. The mean absolute error (MAE; e.g., Hulland et al. 2010) is a commonly used quantity to measure how close the estimates are to the pre-specified parameters. The MAE is defined as

$$ MAE=\frac{1}{t}\sum_{j=1}^t\left|{\widehat{\theta}}_j-{\theta}_j\right|, $$
(1)

where t equals the number of parameters, \( {\theta}_j \) is the pre-specified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. Thus, the lower the MAE, the higher a method’s parameter accuracy. An MAE for the measurement models of, for example, 0.05 indicates that the absolute deviation between estimated and pre-specified weights is on average 0.05 units.
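
In code, the criterion reduces to a single line; est and true are assumed to hold the estimated and pre-specified parameter vectors of one replication.

```r
# Mean absolute error (Eq. 1)
mae <- function(est, true) mean(abs(est - true))

mae(est = c(0.52, 0.28, 0.17), true = c(0.50, 0.30, 0.15))  # 0.02
```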

Figure 3 displays the MAE results for the measurement models in a series of eight charts.Footnote 6 The left column of charts shows the results for unequal weights, whereas the right column shows those for equal weights. The four rows display the results for two, four, six, and eight indicators per measurement model. Each chart in Fig. 3 illustrates the PLS and GSCA results for different sample sizes (i.e., 100, 250, 500, 1000, and 10,000) on the x-axis, and the MAE values of the measurement models (MAE MM) on the y-axis. When using the sum scores approach, no measurement model estimation occurs as this approach draws on equal weights by design. Thus, assessing sum scores’ performance in measurement models is not meaningful.

Fig. 3 Mean absolute error (MAE) in the measurement models (MM). Note: Charts in the left (right) column show the results for unequal (equal) indicator weights; the four rows of charts show the results for 2, 4, 6, and 8 indicators per measurement model. Each chart illustrates the results of PLS and GSCA for different sample sizes (i.e., 100, 250, 500, 1000, and 10,000) on the x-axis, and the MAE of the measurement models (MM) on the y-axis. An MAE for the measurement models of, for example, 0.05 indicates that the absolute deviation between estimated and pre-specified weights is on average 0.05 units

When comparing the charts in Fig. 3 across rows and columns, we find that the MAE values vary only marginally depending on the number of indicators per measurement model and the type of weights (i.e., equal vs. unequal). MAE values are lower for fewer indicators per measurement model but the differences are not pronounced. In contrast, the sample size has a significant bearing on the results—MAE values drop considerably as sample size increases. When comparing PLS and GSCA, we find that GSCA always yields slightly lower MAE values across all simulation conditions. These differences become smaller with higher sample sizes.

Complementing our assessment of the methods’ parameter accuracy in the measurement models, the following analyses address their accuracy when estimating the relationships between latent variables in the structural model. The presentation of MAE values for the structural model in Fig. 4 follows the same principle as in Fig. 3 with columns and rows showing the MAE values for different types of weights and numbers of indicators. The x-axis maps the sample size, while the y-axis shows each method’s MAE values for the structural model (MAE SM). An MAE value for the structural model of, for example, 0.05 indicates that the absolute deviation between estimated and pre-specified path coefficients is on average 0.05 units. Different from our previous assessment, the charts also display sum scores since this approach entails an explicit estimation of structural model relationships.

Fig. 4 Mean absolute error (MAE) in the structural model (SM). Note: Charts in the left (right) column show the results for unequal (equal) indicator weights; the four rows of charts show the results for 2, 4, 6, and 8 indicators per measurement model. Each chart illustrates the results of PLS, sum scores, and GSCA for different sample sizes (i.e., 100, 250, 500, 1000, and 10,000) on the x-axis, and the MAE of the structural model (SM) on the y-axis. An MAE for the structural model of, for example, 0.05 indicates that the absolute deviation between estimated and pre-specified path coefficients is on average 0.05 units

Similar to the MAE results of the measurement models, we find that the MAE values of the structural model only marginally vary across different numbers of indicators per measurement model and the type of weights (i.e., equal vs. unequal). The MAE values produced by PLS and GSCA are almost identical, except for small sample sizes of 100 where PLS is slightly more accurate. As sample size increases, parameter accuracy generally improves for all three methods, but this improvement is less pronounced for the sum scores approach compared to PLS and GSCA.

Further contrasting the methods’ performance shows that PLS and GSCA generally have higher parameter accuracy than the sum scores approach when indicator weights are unequal. The differences between PLS and GSCA on the one hand and the sum scores approach on the other are particularly pronounced for larger sample sizes; for example, with four indicators and 10,000 observations, the sum scores approach’s MAE values are four times as high as those of PLS and GSCA. Only in two situations with unequal weights (when measurement models have six or eight indicators and 100 observations) does the sum scores approach perform slightly better than PLS and GSCA. As sample sizes increase, however, its performance deteriorates relative to the other approaches.

Not surprisingly, a different picture emerges when indicator weights are equal. In this situation, the sum scores approach performs as well as or better than PLS and GSCA across all simulation conditions. The differences between the methods are marginal, however, especially for sample sizes of 250 and higher, independent of the number of indicators. Only for 100 observations and measurement models with six or eight indicators are the differences between the sum scores approach and the other two methods more pronounced.

To further our assessment of parameter accuracy, and in order to facilitate the comparison of the results with those of Reinartz et al. (2009), we also computed the mean absolute relative error (MARE) defined as

$$ MARE=\frac{1}{t}\sum_{j=1}^t\frac{\left|{\widehat{\theta}}_j-{\theta}_j\right|}{\theta_j} $$
(2)

where t equals the number of parameters, \( {\theta}_j \) is the pre-specified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. This equation is similar to the MAE (Eq. 1) in that the MARE represents the MAE relative to the pre-specified parameter values. A MARE value of, for example, 0.10 indicates that the pre-specified coefficients were missed by 10% on average. Thus, if a pre-specified parameter has a relatively low value, even a comparatively small MAE entails a relatively large MARE.Footnote 7
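
A corresponding sketch excludes the null paths, for which the relative error is undefined (see Footnote 8).

```r
# Mean absolute relative error (Eq. 2); null paths (pre-specified value 0) are excluded
mare <- function(est, true) {
  keep <- true != 0
  mean(abs(est[keep] - true[keep]) / true[keep])
}

mare(est = c(0.52, 0.28, 0.17), true = c(0.50, 0.30, 0.15))  # 0.08
```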

The findings resulting from the MARE assessment parallel those from the MAE analysis.Footnote 8 While the types of weights and number of indicators have a limited impact, the MARE improves considerably for increasing sample sizes in both the measurement and the structural models (Figures A1 and A2 in the Online Appendix). For example, for 100 observations the methods’ MARE values in the structural model (Fig. A2) are around 0.25, and decrease to 0.15 for 250 observations, 0.10 for 500 and 1000 observations, and finally, less than 0.05 for 10,000 observations—except for the sum scores approach when indicator weights are unequal, particularly for large sample sizes. Most notably, the MARE values in Reinartz et al. (2009) show a similar pattern, but at a considerably higher level. For example, regardless of the number of indicators, MARE values in Reinartz et al. (2009) never drop below 0.10, even for a sample size of 10,000. This difference underlines the importance of aligning the data generation procedure with the assumptions underlying composite-based SEM methods such as PLS.

Direction of the estimation bias

The previous analysis indicated the degree of estimation bias. The next step is to evaluate the direction of this bias, that is, whether the methods systematically underestimate or overestimate the pre-specified parameter values. Our assessment of each method’s direction of the estimation bias builds on the mean error (ME), which is defined analogously to the MAE (Eq. 1), but without taking the absolute value:

$$ ME=\frac{1}{t}\sum_{j=1}^t\left({\widehat{\theta}}_j-{\theta}_j\right). $$
(3)

Here, t equals the number of parameters, \( {\theta}_j \) is the pre-specified parameter j, and \( {\widehat{\theta}}_j \) is the parameter estimate in any replication. The goal is to obtain an ME close to zero, which indicates that systematic underestimation or overestimation is not an issue. A substantial positive (negative) ME indicates that the estimated parameter values are systematically larger (smaller) than the pre-specified values of this simulation study. An ME for the measurement models of, for example, −0.05 implies that the estimated values are on average 0.05 units lower than the pre-specified parameters.
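
The corresponding sketch simply drops the absolute value, so the sign of the result reveals the direction of the bias.

```r
# Mean error (Eq. 3): negative values signal systematic underestimation,
# positive values systematic overestimation
me <- function(est, true) mean(est - true)

me(est = c(0.48, 0.27, 0.14), true = c(0.50, 0.30, 0.15))  # -0.02
```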

Analyzing the ME of the weights estimates in the measurement models (Fig. A3 in the Online Appendix), we find that PLS and GSCA perform very well with practically no systematic bias across most simulation conditions. Both methods show slight underestimation tendencies, which are more pronounced when sample sizes are small (i.e., 100), particularly for six and eight indicator models, independent of whether the indicator weights are equal or not. Even in these settings, however, the ME values never exceed 0.025 units, which is very low in absolute terms. Thus, as sample size increases the ME diminishes, approaching zero for 10,000 observations.

With regard to the ME in the structural model (Fig. A4 in the Online Appendix), PLS and GSCA show slight overestimation tendencies. The overestimation is more pronounced when sample sizes are small (i.e., 100), particularly for six and eight indicator models. But with larger sample sizes, the magnitude of the overestimation becomes trivial, especially for sample sizes of 250 and higher.

PLS and GSCA clearly outperform the sum scores approach when indicator weights are unequal. The sum scores approach always underestimates the path coefficients, particularly when the measurement models only have two or four indicators. In this situation, the sum scores approach’s ME values peak at 0.046 units. The pronounced underestimation tendency for the sum scores approach when indicator weights are unequal, however, translates into a slight overestimation tendency when indicator weights are equal. In this situation, the approach always yields ME values equal to or lower than PLS and GSCA.

Statistical power and Type I errors

The next analysis addresses the question of how well the methods reveal relationships with low (β = 0.15), medium (β = 0.30), and high (β = 0.50) population effect sizes. This assessment is particularly important to applied researchers, because any failure to correctly depict such relationships as significant may lead to erroneous conclusions with adverse consequences for theory building and testing (Cohen 1988). To test for each method’s statistical power, we calculated the relative occurrence of significant paths found by each method and averaged the results across paths of identical population effect size (Reinartz et al. 2009). For example, in terms of low effect sizes, the analysis considers the model parameters β6, γ2, and γ3 (Fig. 2).

The results show that PLS performs best in terms of statistical power, followed by the sum scores approach and GSCA (Table A2 in the Online Appendix). For low effect sizes and small sample sizes, PLS yields power values of approximately 0.70, whereas the sum scores approach and GSCA have power values of 0.60 and 0.40, respectively. The differences become less pronounced as sample sizes increase and ultimately diminish for 1000 observations or more. This pattern of results also holds for medium and high effect sizes, with the only difference being that the methods’ power values already align at a sample size of 250.

Mistakes in the interpretation of results and false conclusions can also occur if the methods erroneously render a null relationship as significant (i.e., Type I error; false positives). For this reason, we analyze the null relationships (γ4 and γ5) in the structural model (Fig. 2) and determine the probability of rendering these relationships significant—hence, low probabilities indicate small Type I errors. We find that all three methods have very low Type I error rates of less than 10% (Table A2). This especially holds for PLS, which has the lowest error rates in most of the factor level constellations.
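
Both quantities are rejection rates aggregated over replications. A minimal sketch, assuming a matrix pvals with one row per replication and one column per path that holds the p-values from each method’s significance tests:

```r
# Share of replications in which each path is rendered significant at level alpha
rejection_rate <- function(pvals, alpha = 0.05) colMeans(pvals < alpha)

# For paths with a nonzero population coefficient, the rejection rate estimates
# statistical power; for the null paths (gamma_4 and gamma_5), it estimates the
# Type I error rate.
```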

Discussion and conclusions

Summary of results

Along with the increasing prominence of composite-based SEM methods, particularly in the marketing field (Hair et al. 2012a; Henseler et al. 2009), researchers have started calling for their emancipation from factor-based SEM methods. As Rigdon (2012, p. 353) notes with regard to PLS, “instead of emphasizing how it is ‘like’ factor-based SEM but with advantages and disadvantages across different conditions, PLS path modeling should celebrate its status as a purely composite-based method.” To address this call and further the emancipation of composite-based SEM from its factor-based sibling, researchers require a more nuanced understanding of these methods’ performances based on strong simulation results (Henseler et al. 2014; Sarstedt et al. 2016). However, reviewing prior simulation studies gives rise to substantial concern.

Any simulation study begins by specifying a population—a true state from which data are sampled (Paxton et al. 2001). Even though a number of simulation studies have focused on evaluating the performance of composite-based SEM techniques, most notably PLS, practically all of these studies began by assuming that the common factor model represents the truth (Becker et al. 2013a; Rigdon 2016). Hence, most previous simulations evaluated models that were incorrectly specified relative to the population and, thereby, introduced a substantial research design bias. Addressing corresponding calls in prior research (Chin 2010; Hwang et al. 2010; Marcoulides and Chin 2013; Marcoulides et al. 2012), this study is the first to provide a systematic assessment of and comparison between PLS, GSCA, and the sum scores approach based on composite model data that is consistent with the functional principles of the estimators.

Our results show that PLS and GSCA are consistent estimators of measurement and structural model coefficients when the underlying population is composite model-based—approaching the pre-specified values as sample sizes increase. However, both methods show slight tendencies to underestimate measurement model parameters and small overestimation tendencies of structural model parameters. In fact, in absolute terms the parameter biases in measurement models are quite pronounced for a small sample size of 100, yielding MAE values above 0.10, regardless of the number of indicators and weights pattern (i.e., equal or unequal). In the extreme case (a condition with 100 observations and eight indicators with unequal weights at relatively low levels), PLS and GSCA show relative deviations of measurement model parameter estimates (as expressed through MARE) of 60% and more. For larger sample sizes, however, the relative deviations decrease substantially, dropping below 10% for 10,000 observations.

Compared to the measurement models, structural model MAE values are lower when the sample size is small (i.e., 100 observations), with values of around 0.075. For 250 observations, structural model MAE values drop below 0.05, diminishing further as sample size increases. Similarly, the relative deviations of parameter estimates (i.e., the MARE values) in the structural model quickly decrease for larger sample sizes. As sample size increases to 10,000, the relative deviation falls below 5%, providing support for the consistency of the PLS and GSCA methods. This finding does not hold for the sum scores approach, however, which exhibits MARE values of 7.5% and higher for large sample sizes when indicator weights are unequal. As unequal indicator weights are more the rule than the exception in applied research, these results cast substantial doubt on prior research that called for using sum scores over PLS or GSCA, but in making this recommendation relied on factor model-based data (Rönkkö and Evermann 2013).

Similarly, our results demonstrate that prior allegations that PLS is not suitable for disclosing null effects, which were made either without formally testing this claim (Rönkkö and Evermann 2013; Rönkkö et al. 2016) or on the basis of factor model-based data (Rönkkö et al. 2015), are without merit. Our study reveals that model estimation using PLS, GSCA, and sum scores yields low Type I error rates of around 5%, regardless of sample size and measurement model set-up. Additionally, when statistical power is considered, our results indicate that PLS should be the preferred method vis-à-vis GSCA. While both methods achieve power values close to 1.00 for sample sizes of 500 or more, PLS excels at small sample sizes, particularly when effect sizes are low. In this situation, PLS’s power is approximately 75% higher than that of GSCA.

Implications for marketing researchers

None of the composite-based SEM approaches dominates in all criteria. Instead, each approach has certain strengths and weaknesses, which make it suitable when the focus is on parameter accuracy in the measurement models, the structural model, or on statistical power. Against this background, Fig. 5 visualizes the recommendations to marketing researchers that result from the simulation study.

Fig. 5 Guidelines for choosing among composite-based SEM methods

When the aim is to maximize parameter recovery accuracy, researchers should draw on PLS or GSCA. The latter method has slight advantages in terms of estimating measurement models, but PLS offers higher accuracy in the estimation of structural model parameters. Only when sample size is small (i.e., 100) and measurement models are complex (i.e., 6 or more indicators) does the sum scores approach perform favorably in terms of structural model parameter accuracy while having the same Type I error rate and statistical power as PLS. In this situation, the bias in indicator weights due to sampling variability is higher than the bias resulting from the assignment of equal weights. The lower average weights that occur when using many indicators amplify this effect. Except for this specific condition, however, our results advise against the use of the sum scores approach. When the primary focus is disclosing the significance of structural model effects, researchers should choose PLS in light of the method’s high statistical power compared to the sum scores approach and GSCA.

Our results in terms of parameter recovery directly contradict previous findings regarding the performance of composite-based PLS in factor model-based populations, where the method has been shown to overestimate measurement model parameters while underestimating structural model parameters (e.g., Barroso et al. 2010; Chin et al. 2003; Reinartz et al. 2009). In composite model-based populations, the parameter bias is reversed in direction and much smaller in magnitude. For example, PLS’s relative deviations in structural model estimates are always smaller than those obtained when using the method to estimate factor model-based data, as in Reinartz et al.’s (2009) study, which used a highly similar simulation design. Interestingly, the very same holds when comparing PLS’s performance in this study with that of factor-based SEM in Reinartz et al.’s (2009) study when estimating factor model-based populations. For example, in their study, factor-based SEM’s parameter estimates deviate between 30% and 65% for small sample sizes (Fig. 2 in Reinartz et al. 2009), depending on the number of indicators in the measurement models, while in this study deviations produced by PLS are around 25% (Fig. A2). Clearly, when using PLS on correctly specified populations in simulation studies, the biases are smaller than when using factor-based SEM on correctly specified populations.

Our results also cast doubt on the widely held belief that PLS is universally applicable in small sample size situations (e.g., Chin and Newsted 1999; Haenlein and Kaplan 2004; Hair et al. 2011). While this study is not the first to question PLS’s performance in this regard (Goodhue et al. 2012a; Rönkkö and Evermann 2013), it is the first to do so on the grounds of an appropriately specified (i.e., composite model) population. However, this conclusion primarily holds for the measurement models, where the relative deviations of parameter estimates are pronounced. PLS’s performance is much less affected in the structural model, where the relative deviations for small sample sizes are comparable to those produced by factor-based SEM for sample sizes of 250 to 500 when estimating factor model-based data (Reinartz et al. 2009). The implications of this aspect of our analysis are twofold.

First, these results give confidence that the ACSI model (Fornell et al. 1996) or the technology acceptance model (Davis 1989) and its extensions (Venkatesh et al. 2003), which have been validated using hundreds or thousands of observations, offer valid results on both measurement and structural model levels (Anderson et al. 2004). Researchers frequently interpret the latent variable scores from these models and use the scores to run between-model mean-comparisons. For example, drawing on latent variables scores from PLS analyses, Hult et al. (2017) recently used ACSI scores to investigate the extent to which managers’ perceptions of the levels and drivers of their customers’ satisfaction and loyalty align with that of their actual customers. Similarly, a multitude of studies on the marketing-finance interface draws on ACSI scores derived from PLS-based indicator weighting (e.g., Angulo-Ruiz et al. 2014; Fornell et al. 2016; Habel and Klarmann 2015; Lee et al. 2015) to offer evidence for the value relevance of customer satisfaction and related marketing asset variables. Our results regarding the parameter accuracy of indicator weights provide support for the validity of such analyses, given that the sample size is sufficiently large, as it is the case with the original ACSI data. However, for small sample sizes (e.g., 100), indicator weights estimates show pronounced biases that may compromise the validity of latent variable scores.

Second, caution needs to be exercised when interpreting PLS or GSCA results on an item level when sample size is small since the biases produced in this situation potentially cast doubt on any prioritization on the grounds of indicator weights. This result is particularly relevant for studies implementing formative measurement models, which continue to feature prominently in marketing research (e.g., Ranjan and Read 2016; Rubera et al. 2016; Wolter and Cronin 2016). Similarly, findings from an importance-performance map analysis (Ringle and Sarstedt 2016), which contrasts each item’s total effect on a target construct (importance) with the average latent variable scores (performance), are prone to biased indicator weights with small sample sizes. Considering the prominence of composite-based SEM when sample sizes are small, this caveat holds for many studies. For example, Hair et al.’s (2012b) review of PLS use in strategic management shows that more than half of all models estimated with PLS draw on sample sizes of 100 or less. In other fields such as marketing (Hair et al. 2012a), management information systems (Ringle et al. 2012), and supply chain management (Kaufmann and Gaeckler 2015), this is the case for about a quarter of all models. In our review of PLS use in JM and JAMS between 1986 and 2015, the majority of studies (64.15%) incorporated at least one model with a sample size of less than 250; seven of these 53 PLS studies (13.21%) had at least one model with less than 100 observations, with Green et al. (1995) using a sample size as low as 39. Different from factor-based SEM, PLS provides measurement model estimates even when the sample size is very small. But authors, reviewers, and editors should question the value of those estimates in small sample size situations and rather focus on the structural model outcomes. However, when the aim is to interpret measurement model results, we concur with Rigdon (2016, p. 600) who recently noted that “with respect to both composite-based and factor-based approaches to SEM, if sample size is small, the best course is to get more data.” In B-to-B research, for example, large samples often are not available. In such situations, when populations are small and/or data is difficult to obtain, the application of PLS with smaller samples represents a viable attempt to advance knowledge in these areas.

Future research avenues

Our results in terms of parameter recovery in the measurement models reinforce Rigdon’s (2012, p. 353) call that methodologists “should work to complete and validate a purely composite-based approach to evaluating modeling results.” Using composite-based SEM instead of factor-based SEM does not mean rejecting rigor. Instead, it means defining rigor in composite terms (Rigdon 2014, 2016). With regard to PLS, such an evaluation approach should take the method’s prediction orientation into account (Shmueli et al. 2016). Further advancing the set of prediction-oriented evaluation criteria is a particularly promising area of future research (see Becker et al. 2013a; Evermann and Tate 2016; Sarstedt et al. 2014; Schubring et al. 2016).

With respect to the recovery of composite model parameters, subsequent studies should extend the simulation design by considering more complex model structures, such as hierarchical component models (e.g., Becker et al. 2012), interaction terms (e.g., Henseler and Fassott 2010), mediating effects (Hair et al. 2017), and nonlinear effects (e.g., Rigdon et al. 2010). While the use of these modeling elements has recently become more en vogue, little is known about the efficacy of different composite-based SEM methods when estimating such model structures. Follow-up simulation studies should also complement our design, which focused on parameter accuracy as well as Type I and II errors, by considering the methods’ predictive power and different weighting modes (i.e., correlation versus regression weights; Becker et al. 2013a). Future research should also include a broader set of composite-based SEM methods such as regularized generalized canonical correlation analysis (Tenenhaus and Tenenhaus 2011), best fitting proper indices (Dijkstra and Henseler 2011), and the extended PLS algorithm, which Lohmöller (1989) recommended to overcome some of the original PLS method’s restrictions in terms of model specifications. For example, the extended PLS algorithm allows for imposing restrictions on model parameters (e.g., the orthogonalization of latent variables to circumvent collinearity issues) and for assigning an indicator to multiple constructs. Researchers have largely overlooked the extended PLS algorithm; consequently, its performance has not yet been examined across a broader set of data and model constellations.

Complementing our empirical perspective, future studies should aim at providing analytical support for the methods’ consistency in estimating measurement and structural model parameters. Providing such an analytical rationale would complement prior research on the consistency of PLS estimates in common factor model populations (Hui and Wold 1982), thereby broadening our understanding of the methods’ performance in composite model-based populations.

Finally, while the parameter bias that PLS produces when estimating common factor models has been extensively debated in the literature (e.g., Goodhue et al. 2012a; Reinartz et al. 2009; Rönkkö and Evermann 2013), the bias that factor-based SEM produces when estimating composite models has not yet been explored in depth. Initial results show that this bias can be substantial (Sarstedt et al. 2016), but future research should aim at evaluating in greater detail how data and model constellations affect factor-based SEM’s parameter accuracy in this context. Understanding this bias would help researchers to clarify the consequences of mistakenly using factor-based SEM on composite model-based populations. This endeavor would be particularly fruitful since researchers increasingly oppose adherence to the common factor model in SEM analyses (e.g., Rigdon 2012; Schönemann and Wang 1972; Treiblmaier et al. 2011). For example, among 72 articles published during 2012 in what Atinc et al. (2012) consider the four leading management journals (Academy of Management Journal, Journal of Applied Psychology, Journal of Management, and Strategic Management Journal) that tested one or more common factor model(s), fewer than 10% contained a common factor model that did not have to be rejected. While these results do not necessarily imply that the composite model predominantly holds in practice, understanding the consequences of using factor-based SEM on such models would contribute to broadening our understanding of this method type, which is still the mainstay for analyzing cause–effect models in marketing.

Notes

  1. Note that researchers frequently distinguish between latent variables/constructs and composites. We use the term latent variable/construct to refer to the entities that represent conceptual variables in a structural equation model.

  2. Our comparison does not consider consistent PLS (PLSc; Dijkstra 2014; Dijkstra and Henseler 2015) that corrects the PLS estimates for attenuation to mimic common factor models. As our objective is to compare composite-based SEM techniques on the basis of composite model data, PLSc is not relevant to our study.

  3. Note that constructs in factor-based SEM are also proxies for the conceptual variables under investigation (Rigdon 2012).

  4. Table A1 in the Online Appendix shows the indicator weights for different numbers of indicators.

  5. For further details about the non-normal data, see the additional information on the data generation in the Online Appendix.

  6. As the analyses show only marginal differences between normal and non-normal data, the following results presentations use the joint outcomes of the different data distribution types considered in this simulation study.

  7. For example, for the condition with 500 observations and two indicators with equal weights of 0.625, PLS yields an MAE value of 0.05814 in the measurement models, which translates into a MARE value of 0.093. By contrast, a very similar MAE value of 0.06029 for the condition with 500 observations and eight indicators with equal weights of 0.25 translates into a MARE value of 0.241.

  8. Note that the MARE is not defined for the two null paths γ4 and γ5 (Fig. 2). Hence, we did not include these two paths in the MARE computations.

References

  1. Anderson, E. W., & Fornell, C. G. (2000). Foundations of the American customer satisfaction index. Total Quality Management, 11(7), 869–882.

  2. Anderson, E. W., Fornell, C. G., & Mazvancheryl, S. K. (2004). Customer satisfaction and shareholder value. Journal of Marketing, 68(4), 172–185.

  3. Angulo-Ruiz, F., Donthu, N., Prior, D., & Rialp, J. (2014). The financial contribution of customer-oriented marketing capability. Journal of the Academy of Marketing Science, 42(4), 380–399.

  4. Atinc, G., Simmering, M. J., & Kroll, M. J. (2012). Control variable use and reporting in macro and micro management research. Organizational Research Methods, 15(1), 57–74.

  5. Babin, B. J., Hair, J. F., & Boles, J. S. (2008). Publishing research in marketing journals using structural equation modeling. Journal of Marketing Theory and Practice, 16(4), 279–285.

    Article  Google Scholar 

  6. Barroso, C., Carrión, G. C., & Roldán, J. L. (2010). Applying maximum likelihood and PLS on different sample sizes: studies on SERVQUAL model and employee behavior model. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 427–447). Berlin: Springer.

    Google Scholar 

  7. Becker, J. M., Klein, K., & Wetzels, M. (2012). Hierarchical latent variable models in PLS-SEM: guidelines for using reflective-formative type models. Long Range Planning, 45(5–6), 359–394.

    Article  Google Scholar 

  8. Becker, J. M., Rai, A., & Rigdon, E. E. (2013a). Predictive validity and formative measurement in structural equation modeling: Embracing practical relevance. In 2013a Proceedings of the international conference on information systems. Milan.

  9. Becker, J. M., Rai, A., Ringle, C. M., & Völckner, F. (2013b). Discovering unobserved heterogeneity in structural equation models to avert validity threats. MIS Quarterly, 37(3), 665–694.

    Google Scholar 

  10. Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

    Google Scholar 

  11. Brady, M. K., Voorhees, C. M., & Brusco, M. J. (2012). Service sweethearting: its antecedents and customer consequences. Journal of Marketing, 76(2), 81–98.

    Article  Google Scholar 

  12. Burke, S. J. (2011). Competitive positioning strength: market measurement. Journal of Strategic Marketing, 19(5), 421–428.

    Article  Google Scholar 

  13. Cassel, C., Hackl, P., & Westlund, A. H. (1999). Robustness of partial least-squares method for estimating latent variable quality structures. Journal of Applied Statistics, 26(4), 435–446.

    Article  Google Scholar 

  14. Chin, W. W. (1998). The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 295–358). Mahwah: Erlbaum.

    Google Scholar 

  15. Chin, W. W. (2010). Bootstrap cross-validation indices for PLS path model assessment. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares (pp. 83–97). Berlin: Springer.

    Google Scholar 

  16. Chin, W. W., & Newsted, P. R. (1999). Structural equation modeling analysis with small samples using partial least squares. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 307–341). Thousand Oaks, CA: Sage.

    Google Scholar 

  17. Chin, W. W., Marcolin, B. L., & Newsted, P. R. (2003). A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Information Systems Research, 14(2), 189–217.

    Article  Google Scholar 

  18. Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18(1), 115–126.

    Article  Google Scholar 

  19. Cohen, J. (1988). Statistical power analysis for the behavioural sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

    Google Scholar 

  20. Core Team, R. (2016). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

    Google Scholar 

  21. Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340.

    Article  Google Scholar 

  22. Dellande, S., Gilly, M. C., & Graham, J. L. (2004). Gaining compliance and losing weight: the role of the service provider in health care services. Journal of Marketing, 68(3), 78–91.

    Article  Google Scholar 

  23. Diamantopoulos, A., & Riefler, P. (2011). Using formative measures in international marketing models: a cautionary tale using consumer animosity as an example. In M. Sarstedt, M. Schwaiger, & C. R. Taylor (Eds.), Advances in international marketing (Vol. 22, pp. 11–30). Bingley: Emerald.

    Google Scholar 

  24. Dijkstra, T. K. (2010). Latent variables and indices: Herman Wold’s basic design and partial least squares. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 23–46). Berlin: Springer.

    Google Scholar 

  25. Dijkstra, T. K. (2014). PLS' Janus face – response to professor Rigdon's ‘rethinking partial least squares modeling: in praise of simple methods’. Long Range Planning, 47(3), 146–153.

    Article  Google Scholar 

  26. Dijkstra, T. K., & Henseler, J. (2011). Linear indices in nonlinear structural equation models: best fitting proper indices and other composites. Quality & Quantity, 45(6), 1505–1518.

    Article  Google Scholar 

  27. Dijkstra, T. K., & Henseler, J. (2015). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316.

    Article  Google Scholar 

  28. Eklöf, J. A., & Westlund, A. H. (2002). The pan-European customer satisfaction index program: current work and the way ahead. Total Quality Management, 13(8), 1099–1106.

    Article  Google Scholar 

  29. Evermann, J., & Tate, M. (2016). Assessing the predictive performance of structural equation model estimators. Journal of Business Research, 69(10), 4565–4582.

    Article  Google Scholar 

  30. Fornell, C. G., Johnson, M. D., Anderson, E. W., Cha, J., & Bryant, B. E. (1996). The American customer satisfaction index: nature, purpose, and findings. Journal of Marketing, 60(4), 7–18.

    Article  Google Scholar 

  31. Fornell, C., Morgeson, F. V., & Hult, G. T. M. (2016). Stock returns on customer satisfaction do beat the market: gauging the effect of a marketing intangible. Journal of Marketing, 80(5), 92–107.

    Article  Google Scholar 

  32. Gelbrich, K. (2010). Anger, frustration, and helplessness after service failure: coping strategies and effective informational support. Journal of the Academy of Marketing Science, 38(5), 567–585.

    Article  Google Scholar 

  33. Goodhue, D. L., Lewis, W., & Thompson, R. (2012a). Does PLS have advantages for small sample size or non-normal data? MIS Quarterly, 36(3), 981–1001.

    Google Scholar 

  34. Goodhue, D. L., Lewis, W., & Thompson, R. (2012b). Comparing PLS to regression and LISREL: a response to Marcoulides, Chin, and Saunders. MIS Quarterly, 36(3), 703–716.

    Google Scholar 

  35. Green, D. H., Donald, W. B., & Ryans, A. B. (1995). Entry strategy and long-term performance: conceptualization and empirical examination. Journal of Marketing, 59(4), 1–16.

    Article  Google Scholar 

  36. Habel, J., & Klarmann, M. (2015). Customer reactions to downsizing: when and how is satisfaction affected? Journal of the Academy of Marketing Science, 43(6), 768–789.

    Article  Google Scholar 

  37. Haenlein, M., & Kaplan, A. M. (2004). A beginner's guide to partial least squares analysis. Understanding Statistics, 3(4), 283–297.

    Article  Google Scholar 

  38. Haenlein, M., & Kaplan, A. M. (2011). The influence of observed heterogeneity on path coefficient significance: technology acceptance within the marketing discipline. Journal of Marketing Theory and Practice, 19(2), 153–168.

    Article  Google Scholar 

  39. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151.

    Article  Google Scholar 

  40. Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012a). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.

    Article  Google Scholar 

  41. Hair, J. F., Sarstedt, M., Pieper, T. M., & Ringle, C. M. (2012b). The use of partial least squares structural equation modeling in strategic management research: a review of past practices and recommendations for future applications. Long Range Planning, 45(5–6), 320–340.

    Article  Google Scholar 

  42. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2013). Partial least squares structural equation modeling: rigorous applications, better results and higher acceptance. Long Range Planning, 46(1–2), 1–12.

    Article  Google Scholar 

  43. Hair, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2017). A primer on partial least squares structural equation modeling (PLS-SEM) (2nd ed.). Thousand Oaks, CA: Sage.

    Google Scholar 

  44. Heidenreich, S., Wittkowski, K., Handrich, M., & Falk, T. (2015). The dark side of customer co-creation: exploring the consequences of failed co-created services. Journal of the Academy of Marketing Science, 43(3), 279–296.

    Article  Google Scholar 

  45. Hennig-Thurau, T., Groth, M., Paul, M., & Gremler, D. D. (2006). Are all smiles created equal? How emotional contagion and emotional labor affect service relationships. Journal of Marketing, 70(3), 58–73.

    Article  Google Scholar 

  46. Henseler, J. (2010). On the convergence of the partial least squares path modeling algorithm. Computational Statistics, 25(1), 107–120.

    Article  Google Scholar 

  47. Henseler, J. (2012). Why generalized structured component analysis is not universally preferable to structural equation modeling. Journal of the Academy of Marketing Science, 40(3), 402–413.

    Article  Google Scholar 

  48. Henseler, J., & Fassott, G. (2010). Testing moderating effects in PLS path models: an illustration of available procedures. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 713–735). Berlin: Springer.

    Google Scholar 

  49. Henseler, J., & Sarstedt, M. (2013). Goodness-of-fit indices for partial least squares path modeling. Computational Statistics, 28(2), 565–580.

    Article  Google Scholar 

  50. Henseler, J., Ringle, C. M., & Sinkovics, R. R. (2009). The use of partial least squares path modeling in international marketing. In R. R. Sinkovics & P. N. Ghauri (Eds.), Advances in international marketing (Vol. 20, pp. 277–320). Bingley: Emerald.

    Google Scholar 

  51. Henseler, J., Dijkstra, T. K., Sarstedt, M., Ringle, C. M., Diamantopoulos, A., Straub, D. W., Ketchen, D. J., Hair, J. F., Hult, G. T. M., & Calantone, R. J. (2014). Common beliefs and reality about partial least squares: comments on Rönkkö & Evermann (2013). Organizational Research Methods, 17(2), 182–209.

    Article  Google Scholar 

  52. Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135.

    Article  Google Scholar 

  53. Henseler, J., Hubona, G. S., & Ray, P. A. (2016). Using PLS path modeling in new technology research: updated guidelines. Industrial Management & Data Systems, 116(1), 1–19.

    Article  Google Scholar 

  54. Hui, B. S., & Wold, H. O. A. (1982). Consistency and consistency at large of partial least squares estimates. In K. G. Jöreskog & H. O. A. Wold (Eds.), Systems under indirect observation, part II (pp. 119–130). Amsterdam: North Holland.

    Google Scholar 

  55. Hulland, J., Ryan, M. J., & Rayner, R. K. (2010). Modeling customer satisfaction: a comparative performance evaluation of covariance structure analysis versus partial least squares. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: concepts, methods and applications (pp. 307–325). Berlin: Springer.

    Google Scholar 

  56. Hult, G. T. M., Morgeson III, F. V., Morgan, N. A., Mithas, S., & Fornell, C. (2017). Do managers know what their customers think and why? Journal of the Academy of Marketing Science, 45(1), 37–54.

    Article  Google Scholar 

  57. Hwang, H. (2009). Regularized generalized structured component analysis. Psychometrika, 74(3), 517–530.

    Article  Google Scholar 

  58. Hwang, H., Malhotra, N. K., Kim, Y., Tomiuk, M. A., & Hong, S. (2010). A comparative study on parameter recovery of three approaches to structural equation modeling. Journal of Marketing Research, 47(4), 699–712.

    Article  Google Scholar 

  59. Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 255–284). New York, NJ: Seminar Press.

    Google Scholar 

  60. Jöreskog, K. G., & Wold, H. O. A. (1982). The ML and PLS techniques for modeling with latent variables: historical and comparative aspects. In H. O. A. Wold & K. G. Jöreskog (Eds.), Systems under indirect observation, part I (pp. 263–270). Amsterdam: North-Holland.

    Google Scholar 

  61. Kaplan, A. M., Schoder, D., & Haenlein, M. (2007). Factors influencing the adoption of mass customization: the impact of base category consumption frequency and need satisfaction. Journal of Product Innovation Management, 24(2), 101–116.

    Article  Google Scholar 

  62. Kaufmann, L., & Gaeckler, J. (2015). A structured review of partial least squares in supply chain management research. Journal of Purchasing and Supply Management, 21(4), 259–272.

    Article  Google Scholar 

  63. Knaus, J. (2013). R package snowfall: Easier cluster computing (version: 1.84–6). cran.r-project.org/web/packages/snowfall/.

  64. Lee, N., & Cadogan, J. W. (2013). Problems with formative and higher-order reflective variables. Journal of Business Research, 66(2), 242–247.

    Article  Google Scholar 

  65. Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica.

    Google Scholar 

  66. Marcoulides, G. A., & Chin, W. W. (2013). You write, but others read: common methodological misunderstandings in PLS and related methods. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 31–64). New York, NJ: Springer.

    Google Scholar 

  67. Marcoulides, G. A., Chin, W. W., & Saunders, C. (2012). When imprecise statistical statements become problematic: a response to Goodhue, Lewis, and Thompson. MIS Quarterly, 36(3), 717–728.

    Google Scholar 

  68. McDonald, R. P. (1996). Path analysis with composite variables. Multivariate Behavioral Research, 31(2), 239–270.

    Article  Google Scholar 

  69. McIntosh, C. N., Edwards, J. R., & Antonakis, J. (2014). Reflections on partial least squares path modeling. Organizational Research Methods, 17(2), 210–251.

    Article  Google Scholar 

  70. Monecke, A., & Leisch, F. (2012). semPLS: structural equation modeling using partial least squares. Journal of Statistical Software, 48(3), 1–32.

    Article  Google Scholar 

  71. Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: design and implementation. Structural Equation Modeling, 8(2), 287–312.

    Article  Google Scholar 

  72. Ranjan, K. R., & Read, S. (2016). Value co-creation: concept and measurement. Journal of the Academy of Marketing Science, 44(3), 290–315.

    Article  Google Scholar 

  73. Rego, L. L., Morgan, N. A., & Fornell, C. (2013). Reexamining the market share–customer satisfaction relationship. Journal of Marketing, 77(5), 1–20.

    Article  Google Scholar 

  74. Reinartz, W. J., Haenlein, M., & Henseler, J. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM. International Journal of Research in Marketing, 26(4), 332–344.

    Article  Google Scholar 

  75. Rigdon, E. E. (1998). Structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 251–294). Mahwah: Erlbaum.

    Google Scholar 

  76. Rigdon, E. E. (2012). Rethinking partial least squares path modeling: in praise of simple methods. Long Range Planning, 45(5–6), 341–358.

    Article  Google Scholar 

  77. Rigdon, E. E. (2014). Rethinking partial least squares path modeling: breaking chains and forging ahead. Long Range Planning, 47(3), 161–167.

    Article  Google Scholar 

  78. Rigdon, E. E. (2016). Choosing PLS path modeling as analytical method in European management research: a realist perspective. European Management Journal, 34(6), 598–605.

    Article  Google Scholar 

  79. Rigdon, E. E., Ringle, C. M., & Sarstedt, M. (2010). Structural modeling of heterogeneous data with partial least squares. In N. K. Malhotra (Ed.), Review of marketing research (pp. 255–296). Armonk: Sharpe.

    Google Scholar 

  80. Rigdon, E. E., Becker, J.-M., Rai, A., Ringle, C. M., Diamantopoulos, A., Karahanna, E., Straub, D. W., & Dijkstra, T. K. (2014). Conflating antecedents and formative indicators: a comment on Aguirre-Urreta and Marakas. Information Systems Research, 25(4), 780–784.

    Article  Google Scholar 

  81. Ringle, C. M., & Sarstedt, M. (2016). Gain more insight from your PLS-SEM results: the importance-performance map analysis. Industrial Management & Data Systems, 116(9), 1865–1886.

    Article  Google Scholar 

  82. Ringle, C. M., Sarstedt, S., & Straub, D. W. (2012). A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36(1), iii–xiv.

    Google Scholar 

  83. Ringle, C. M., Sarstedt, M., & Schlittgen, R. (2014). Genetic algorithm segmentation in partial least squares structural equation modeling. OR Spectrum, 36(1), 251–276.

    Article  Google Scholar 

  84. Romdhani, H., Grinek, S., Hwang, H., & Labbe, A. (2014). R package ASGSCA: Association studies for multiple SNPs and multiple traits using generalized structured equation models (Version 1.4.0), http://bioconductor.org/packages/ASGSCA/.

  85. Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.

    Article  Google Scholar 

  86. Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: caveat emptor. Personality and Individual Differences, 87, 76–84.

    Article  Google Scholar 

  87. Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: time for some serious second thoughts. Journal of Operations Management, 47-48(November), 9–27.

    Article  Google Scholar 

  88. Rubera, G., Chandrasekaran, D., & Ordanini, A. (2016). Open innovation, product portfolio innovativeness and firm performance: the dual role of new product development capabilities. Journal of the Academy of Marketing Science, 44(2), 166–184.

    Article  Google Scholar 

  89. Sarstedt, M., Ringle, C. M., Henseler, J., & Hair, J. F. (2014). On the emancipation of PLS-SEM: a commentary on Rigdon (2012). Long Range Planning, 47(3), 154–160.

    Article  Google Scholar 

  90. Sarstedt, M., Hair, J. F., Ringle, C. M., Thiele, K. O., & Gudergan, S. P. (2016). Estimation issues with PLS and CBSEM: where the bias lies! Journal of Business Research, 69(10), 3998–4010.

    Article  Google Scholar 

  91. Schneeweiß, H. (1991). Models with latent variables: LISREL versus PLS. Statistica Neerlandica, 45(2), 145–157.

    Article  Google Scholar 

  92. Schönemann, P. H., & Steiger, J. H. (1978). On the validity of indeterminate factor scores. Bulletin of the Psychonomic Society, 12(4), 287–290.

    Article  Google Scholar 

  93. Schönemann, P. H., & Wang, M.-M. (1972). Some new results on factor indeterminacy. Psychometrika, 37(1), 61–91.

    Article  Google Scholar 

  94. Schuberth, F., Henseler, J., & Dijkstra, T. K. (2016). Partial least squares path modeling using ordinal categorical indicators. Quality & Quantity, forthcoming.

  95. Schubring, S., Lorscheid, I., Meyer, M., & Ringle, C. M. (2016). The PLS agent: predictive modeling with PLS-SEM and agent-based simulation. Journal of Business Research, 69(10), 4604–4612.

    Article  Google Scholar 

  96. Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.

    Article  Google Scholar 

  97. Shmueli, G., Ray, D., Manuel, J., Estrada, V., & Chatla, S. B. (2016). The elephant in the room: evaluating the predictive performance of PLS models. Journal of Business Research, 69(10), 4552–4564.

    Article  Google Scholar 

  98. Spearman, C. (1927). The abilities of man. London: MacMillan.

    Google Scholar 

  99. Steenkamp, J.-B. E. M., & Baumgartner, H. (2000). On the use of structural equation models for marketing modeling. International Journal of Research in Marketing, 17(2/3), 195–202.

    Article  Google Scholar 

  100. Sundaram, S., Schwarz, A., Jones, E., & Chin, W. W. (2007). Technology use on the front line: how information technology enhances individual performance. Journal of the Academy of Marketing Science, 35(1), 101–112.

    Article  Google Scholar 

  101. Tenenhaus, M. (2008). Component-based structural equation modelling. Total Quality Management & Business Excellence, 19(7–8), 871–886.

    Article  Google Scholar 

  102. Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257–284.

    Article  Google Scholar 

  103. Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.

    Article  Google Scholar 

  104. Thurstone, L, L. (1947). Multiple factor analysis. Chicago: The University of Chicago Press.

  105. Treiblmaier, H., Bentler, P. M., & Mair, P. (2011). Formative constructs implemented via common factors. Structural Equation Modeling: A Multidisciplinary Journal, 18(1), 1–17.

    Article  Google Scholar 

  106. van der Heijden, G. A. H., Schepers, J. J. L., Nijssen, E. J., & Ordanini, A. (2013). Don’t just fix it, make it better! Using frontline service employees to improve recovery performance. Journal of the Academy of Marketing Science, 41(5), 515–530.

    Article  Google Scholar 

  107. Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: toward a unified view. MIS Quarterly, 27(3), 425–478.

    Google Scholar 

  108. Vilares, M. J., & Coelho, P. S. (2013). Likelihood and PLS estimators for structural equation modeling: an assessment of sample size, skewness and model misspecification effects. In J. Lita da Silva, F. Caeiro, I. Natário, & C. A. Braumann (Eds.), Advances in regression, survival analysis, extreme values, Markov processes and other statistical applications (pp. 11–33). Berlin: Springer.

    Google Scholar 

  109. Wold, H. O. A. (1974). Causal flows with latent variables: partings of ways in the light of NIPALS modelling. European Economic Review, 5(1), 67–86.

    Article  Google Scholar 

  110. Wold, H. O. A. (1980). Model construction and evaluation when theoretical knowledge is scarce: theory and application of PLS. In J. Kmenta & J. B. Ramsey (Eds.), Evaluation of econometric models (pp. 47–74). New York, NJ: Academic Press.

    Google Scholar 

  111. Wold, H. O. A. (1982). Soft modeling: the basic design and some extensions. In K. G. Jöreskog & H. O. A. Wold (Eds.), Systems under indirect observations, part II (pp. 1–54). Amsterdam: North-Holland.

    Google Scholar 

  112. Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130.

    Article  Google Scholar 

  113. Wolter, J. S., & Cronin, J. J. (2016). Re-conceptualizing cognitive and affective customer-company identification: the role of self-motives and different customer-based outcomes. Journal of the Academy of Marketing Science, 44(3), 397–413.

    Article  Google Scholar 

Acknowledgements

Earlier versions of the manuscript were presented at the 2015 Academy of Marketing Science Annual Conference in Denver, Colorado, and at the 2nd International Symposium on Partial Least Squares Path Modeling: The Conference for PLS Users in Seville, 2015. The authors thank Jan-Michael Becker (University of Cologne), Jörg Henseler (University of Twente), and Rainer Schlittgen (University of Hamburg) for their support and helpful comments on the simulation study and its data generation, which improved earlier versions of the manuscript. Even though this research does not explicitly refer to the use of the statistical software SmartPLS (http://www.smartpls.com), Ringle acknowledges a financial interest in SmartPLS.

Author information

Corresponding author

Correspondence to G. Tomas M. Hult.

Additional information

John Hulland served as Area Editor for this article.

Electronic Supplementary Material

ESM 1 (DOCX 704 kb)

Cite this article

Hair, J.F., Hult, G.T.M., Ringle, C.M. et al. Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods. J. of the Acad. Mark. Sci. 45, 616–632 (2017). https://doi.org/10.1007/s11747-017-0517-x

Keywords

  • Composite
  • Generalized structured component analysis
  • GSCA
  • Partial least squares
  • PLS
  • SEM
  • Simulation
  • Structural equation modeling
  • Sum scores regression