1 Introduction

Marketing research aims at explaining the exchange between marketers and customers (Bagozzi 1975). Hence, Marketing comprises both a design component, i.e., how things should be done, and a behavioral component, i.e., what actors in the marketplace do (Hunt 2010). Specifically, Marketing research develops strategies such as marketing mix (McCarthy 1978) and market orientation (Kohli and Jaworski 1990), wherefore it can be regarded as a “science of the artificial” (Simon 1969). Moreover, it investigates the effects of these human-made concepts, so-called artifacts, on, for example, the customer’s behavior. Hence, it can be regarded as a Behavioral science. As a consequence, it must combine the research paradigms of both Design and Behavioral research (Henseler 2017).

To meet the demands of both Design and Behavioral sciences in empirical research, structural equation modeling (SEM) is eminently suitable. It allows researchers to model abstract concepts by a set of observable variables and to connect constructs, i.e., the statistical representations of the concepts, via a structural model (Bollen 1989). In doing so, complex relationships among the constructs and random measurement errors in the observable variables can be taken into account. Hence, SEM is a flexible approach that allows researchers to statistically model their theories. Moreover, researchers can use SEM to empirically falsify their theories, which makes it a favorable tool in many disciplines including the field of Marketing research (Baumgartner and Homburg 1996; Steenkamp and Baumgartner 2000).

In SEM, three ways of modeling abstract concepts have been established: the reflective measurement model, the causal–formative measurement model, and the composite model (see, e.g., Bollen and Bauldry 2011). Both the reflective and the causal–formative measurement model operationalize the concept under investigation as a latent variable, which has proven particularly valuable for concepts of Behavioral sciences such as attitudes and personality traits. While in the reflective measurement model, it is assumed that the concept manifests itself in its measures, i.e., the measures share a common cause (Reichenbach 1956), in the causal–formative measurement model, the observable variables are assumed to cause the concept (Diamantopoulos et al. 2008). In contrast, the composite model can be used to operationalize abstract concepts that are conceived as constructions, i.e., as artifacts, which are objects designed to fulfill a certain purpose (Hevner et al. 2004; Henseler 2015, 2017; Schuberth et al. 2018; Benitez et al. 2020; see Footnote 1). In doing so, the concept is modeled as an emergent variable, i.e., instead of a latent variable, a linear combination of observable variables represents the concept in the statistical model (Cohen et al. 1990; Reise 1999). Hence, it is assumed that a combination of ingredients composes the artifact, which, in turn, is represented by an emergent variable in the model.

Although emergent variables are encountered less often in empirical research than latent variables as a means of modeling abstract concepts, they have been acknowledged and employed in various disciplines. For example, concepts such as marketing mix known from Marketing research (Fornell and Bookstein 1982) and concepts from Information Systems research such as IT infrastructure flexibility (Benitez et al. 2018) and IT integration capability (Braojos et al. 2020) have been modeled as emergent variables. Moreover, in Tourism and Hospitality research and Ecology, concepts such as tourist engagement (Rasoolimanesh et al. 2019) and plant community structure (Antoninka et al. 2011) were represented in the statistical model by an emergent variable. Similarly, in Evolution and Human Behavior research, the concept of environmental harshness was modeled as an emergent variable (Mell et al. 2018).

To broaden the accessibility of SEM to composite models, and therefore to open it to sciences of the artificial, SEM has been extended recently by confirmatory composite analysis (CCA; Henseler et al. 2014; Schuberth et al. 2018). For a recent introduction of CCA to Business research, see Henseler and Schuberth (forthcoming). CCA is a subtype of SEM, and thus follows the typical steps of SEM, namely, model specification, model identification, model estimation, and model assessment. To specify a model in CCA, emergent variables are employed and interrelated to represent the abstract concepts of interest. After the model has been specified and it has been ensured that the model is identified, i.e., that the parameters can be uniquely retrieved from the variance–covariance matrix of the observable variables, the model parameters need to be estimated. In doing so, an estimator is employed that conforms to the assumptions implied by the specified composite model. Finally, the estimated composite model is assessed globally and locally, i.e., the overall model fit and each emergent variable with its respective observable variables are examined. Consequently, CCA is similar to CFA, but instead of empirically falsifying reflective measurement models, it empirically falsifies composite models (Schuberth et al. 2018).

For SEM, various estimators have been proposed (Reinartz et al. 2009). In particular, the partial least squares (PLS) estimator [as developed by Wold (1975)] has gained increasing attention over the last two decades (Henseler et al. 2009; Hair et al. 2012; Shiau et al. 2019). PLS is a two-step estimation approach, which, in the first step, obtains weights by an iterative algorithm to create linear combinations of observable variables that serve as proxies for the constructs in the structural model. Subsequently, in the second step, the path coefficients are estimated by ordinary least squares based on the correlation matrix of the proxies from the previous step. Recently, PLS has been the subject of critical scientific examinations highlighting several shortcomings such as inconsistent parameter estimates for reflective and causal–formative measurement models and the lack of a test for exact overall model fit (Rönkkö and Evermann 2013; Rönkkö et al. 2015, 2016). To overcome these shortcomings, several enhancements have been introduced such as consistent partial least squares (PLSc) to consistently estimate the parameters of reflective and causal–formative measurement models (Dijkstra and Henseler 2015a) and a bootstrap-based test to assess the exact overall model fit (Dijkstra and Henseler 2015b). Moreover, studies provide arguments for and against the use of PLS (e.g., Rigdon 2016) and discuss how PLS can be of value for different types of research (Henseler 2018).

Most recently, in an article in the Journal of Business Research, Hair et al. (2020) discuss “CCA” as the measurement model assessment steps in partial least squares structural equation modeling [PLS-SEM, a framework introduced by Hair et al. (2014)]. However, the method they present differs from CCA as originally developed by Schuberth et al. (2018) in many regards. First, Hair et al.’s method “can facilitate the assessment of reflective as well as formative measurement models” (Hair et al. 2020, p. 108) and has the purpose “to confirm both reflective and formative measurement models” (p. 101). In contrast, CCA is not meant for assessing measurement models, whether reflective or formative. Instead, its purpose is to assess composite models. Second, Hair et al.’s method “does not require the assessment of fit” (Hair et al. 2020, p. 108), whereas the overall model fit assessment is an essential step in completing a CCA. Third, the methods’ relations to PLS differ strongly. Hair et al.’s method is strongly linked to PLS-SEM; it “is a systematic methodological process for confirming measurement models in PLS-SEM” (Hair et al. 2020, p. 104). In contrast, although CCA can use the iterative PLS algorithm as an estimator for the weights and the construct correlation matrix, this is in no way mandatory. There are many estimators for CCA, such as generalized canonical correlation analysis (Kettenring 1971) and generalized structured component analysis (GSCA, Glang 1988; Hwang and Takane 2004). Besides these, maximum-likelihood, unweighted least squares, and weighted least squares estimators for composite models are in principle also conceivable. Consequently, CCA is not tied to PLS. If anything, the iterative PLS algorithm could be regarded as a possible component of CCA. Finally, while there are mathematical proofs and empirical evidence for the efficacy of CCA, such evidence does not exist for Hair et al.’s method. What is worse, one can easily provide evidence against it (see Sect. 5), and the literature has already called the rules of thumb underlying Hair et al.’s method ‘unjustifiable’ (Rönkkö et al. 2016). Overall, CCA and Hair et al.’s method differ in nature, purpose, and efficacy. To prevent researchers from confusing CCA with Hair et al.’s method, it is proposed to dub the latter the “method of confirming measurement quality” (MCMQ), a term used by Hair et al. (2020, p. 101). Table 1 summarizes the main differences between CCA and MCMQ.

Table 1 Main differences between confirmatory composite analysis and the method of confirming measurement quality

Against the background of the situation outlined above, the present paper makes a double contribution. First, it describes both CCA and MCMQ and highlights their differences in nature and purpose to prevent researchers from confusing the two. Second, it exposes both methods to a series of scenarios to demonstrate their efficacy in fulfilling their intended purpose. The outcomes show that whereas CCA can indeed distinguish between correctly and incorrectly specified composite models, there are instances in which MCMQ fails to discriminate between correctly and incorrectly specified formative and reflective measurement models, i.e., it tends to confirm measurement models although they are incorrectly specified. These findings imply that (1) if researchers want to assess composite models, they should apply CCA; (2) if researchers want to assess the quality of formative and reflective measurement models, they should use SEM including CFA; and (3) MCMQ could benefit from rethinking, redesigning, and additional investigation.

2 Different types of outer models

To provide solid ground for the discussion, in the following, the three main outer model types in the context of SEM are briefly presented, see e.g., Bollen and Bauldry (2011): (a) the reflective measurement model; (b) the causal–formative measurement model; and (c) the composite model.

The three outer models differ with regard to the relationship of the construct to its observable variables and with regard to the type of construct that represents the abstract concept in the statistical model. Among the three outer models, the following two types of constructs are distinguished: latent variable and emergent variable. While latent variables cannot be inferred with certainty from the data (Borsboom 2008), emergent variables, also referred to as composites, are composed of their observable variables and thus are fully determined by them. It is stressed “that researchers should avoid using the confusing convention of referring to composites as latent variables” (Hardin and Marcoulides 2011, pp. 754–755). The remainder of this section elaborates on each type of outer model.

Fig. 1 Three different types of outer models

Figure 1a displays the reflective measurement model, which is also often referred to as the common factor model. The reflective measurement model assumes that a latent variable (\(\eta\)) causes the observable variables (y) and their interrelations (Jöreskog 1969). Since the variance in the observable variables usually cannot be fully explained by the latent variable, the remaining variance is captured by random errors (\(\epsilon\)). Typically, these random errors are assumed to be uncorrelated with each other and uncorrelated with the latent variable. As a consequence, the latent variable is the only explanation for the correlations among the observable variables.

The causal–formative measurement model, as illustrated in Fig. 1b, is an alternative type of measurement model (Diamantopoulos 2008). At its core is a set of causal indicators (y), which causally affect the latent variable (\(\eta\)). In doing so, the causal indicators are allowed to be freely correlated. Since the indicators typically do not cause all of the variation in the latent variable, an error term (\(\zeta\)) captures the remaining variance in the construct. It is assumed that the error term is uncorrelated with the causal indicators and that all effects of the causal indicators on other variables in the model are fully mediated by the latent variable.

The composite model is depicted in Fig. 1c. In contrast to the two types of measurement models, the composite model assumes that the construct (\(\eta\)) is fully composed of its observable variables (y), see, e.g., Grace and Bollen (2008). Thus, the construct emerges from its observable variables, wherefore this type of construct is also called an emergent variable (Cohen et al. 1990; Cole et al. 1993; Reise 1999; Benitez et al. 2020). As a consequence, there is no error term at the construct level. Similar to the causal–formative measurement model, the observable variables can freely covary.

The choice of the outer model reflects a researcher’s understanding of the world and is not tied to any specific estimator. Every type of outer model requires different rules for model identification and imposes different constraints on the variance–covariance matrix of the observable variables. While the causal–formative measurement model as well as the composite model typically put no constraints on the variance–covariance matrix of the observable variables causing and composing the construct, respectively, the reflective measurement model assumes that the measures are independent once the latent variable is controlled for (Lazarsfeld 1959). However, the former two impose constraints on the covariances between the observable variables connected to a construct and the observable variables connected to other constructs in the model. All these constraints can be exploited in model fit assessment to examine whether the specified model is consistent with the collected data.

3 Confirmatory composite analysis

Only recently, Schuberth et al. (2018) introduced CCA as a subtype of SEM that aims at assessing composite models. Like all forms of SEM, CCA consists of four steps: (1) model specification, (2) model identification, (3) model estimation, and (4) model assessment. The latter is particularly important to assess the estimated model and thus a researcher’s theory (Mulaik et al. 1989; Yuan 2005). Approaches that omit one or more of these steps—in particular the last one—would be considered incomplete and hence inapt for fulfilling the purpose of SEM (Rönkkö and Evermann 2013); in other words, “if SEM is used, then model fit testing and assessment is paramount, indeed crucial, and cannot be fudged for the sake of “convenience” or simple intellectual laziness on the part of the investigator” (Barrett 2007, p. 823). The following subsections briefly explain each of the four steps in CCA.

3.1 Specifying composite models

Composite models as typically studied in CCA consist of a set of emergent variables that are allowed to freely covary, although in general, constraints on the variance–covariance matrix of the emergent variables are also conceivable (Dijkstra 2017). Each emergent variable \(\eta _j\) is a linear combination (weighted sum using weights \(w_{ji}\)) of \(I_j\) observable variables \(y_{ji}\) (see Footnote 2):

$$\begin{aligned} \eta _j = \sum _{i=1}^{I_j} w_{ji} \cdot y_{ji} \end{aligned}$$

Due to their nature as weighted sums of other variables, emergent variables are essentially prescriptions for dimension reduction (Dijkstra and Henseler 2011). The analogy between CCA and CFA is obvious and intended: Whereas a CFA usually studies a set of interrelated latent variables, a CCA examines a set of interrelated emergent variables.
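As a minimal sketch of the weighted sum above, the following Python snippet builds the scores of one emergent variable from a block of observable variables. The data values, the weights, and the function name `emergent_variable_scores` are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def emergent_variable_scores(y_block, weights):
    """Return eta_j = sum_i w_ji * y_ji for every observation (row) of y_block."""
    y_block = np.asarray(y_block, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if y_block.shape[1] != weights.shape[0]:
        raise ValueError("one weight per observable variable is required")
    return y_block @ weights

# Hypothetical block of I_j = 3 observable variables and four observations.
y = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
w = np.array([0.5, 0.3, 0.2])  # assumed weights w_j1, w_j2, w_j3
eta = emergent_variable_scores(y, w)
print(eta)
```

Three columns are reduced to a single score per observation, which is the sense in which an emergent variable acts as a prescription for dimension reduction.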

Fig. 2 A composite model consisting of three correlated emergent variables

Figure 2 shows an exemplary composite model consisting of three interrelated emergent variables, each of which is composed of three observable variables. This composite model allows the observable variables of each emergent variable to be freely correlated as highlighted by the double-headed arrows among the observable variables belonging to one emergent variable. Similarly, all emergent variables are allowed to be freely correlated. Finally, the emergent variables fully convey the correlations between the observable variables of two different emergent variables.
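The statement that the emergent variables fully convey the between-block correlations can be illustrated numerically. The sketch below assumes the rank-one form commonly used for composite models: the covariances between two blocks equal the correlation of their emergent variables times the outer product of each block's covariances with its own emergent variable. All numbers and the function names `scale_weights` and `implied_covariance` are hypothetical.

```python
import numpy as np

def scale_weights(cov, w):
    """Scale weights so that the emergent variable has unit variance (w' S w = 1)."""
    w = np.asarray(w, dtype=float)
    return w / np.sqrt(w @ cov @ w)

def implied_covariance(block_covs, weights, construct_corr):
    """Model-implied covariance matrix of a composite model (assumed rank-one form).

    block_covs     : within-block covariance matrices (left unconstrained)
    weights        : weight vectors, scaled so that w' S w = 1 in each block
    construct_corr : correlation matrix of the emergent variables
    """
    # Covariances between each block's observable variables and its emergent variable.
    loadings = [s @ w for s, w in zip(block_covs, weights)]
    starts = np.concatenate(([0], np.cumsum([len(w) for w in weights])))
    sigma = np.zeros((starts[-1], starts[-1]))
    for i in range(len(weights)):
        for j in range(len(weights)):
            rows = slice(starts[i], starts[i + 1])
            cols = slice(starts[j], starts[j + 1])
            if i == j:
                sigma[rows, cols] = block_covs[i]
            else:
                sigma[rows, cols] = construct_corr[i, j] * np.outer(loadings[i], loadings[j])
    return sigma

# Hypothetical population: three emergent variables with three observable variables each.
within = np.array([[1.0, 0.3, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])
block_covs = [within, within, within]
weights = [scale_weights(within, [0.6, 0.4, 0.2]) for _ in range(3)]
construct_corr = np.array([[1.0, 0.3, 0.5],
                           [0.3, 1.0, 0.4],
                           [0.5, 0.4, 1.0]])
sigma = implied_covariance(block_covs, weights, construct_corr)

# Weighting a between-block submatrix returns the emergent variables' correlation.
print(weights[0] @ sigma[0:3, 3:6] @ weights[1])
```

Each between-block submatrix has rank one in this sketch, which is the kind of constraint that model fit assessment can later exploit.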

3.2 Identifying composite models

Model identification plays a preponderant role in SEM including CFA and CCA. Once the model is specified, it needs to be ensured that the parameters of the specified model can be uniquely retrieved from the variance–covariance matrix of the observable variables, i.e., it needs to be assessed whether the model is identified. Interpreting parameter estimates of underidentified models leads to questionable conclusions, as several sets of parameters exist that satisfy the constraints of the model-implied variance–covariance matrix, i.e., several parameter sets lead to the same model-implied variance–covariance matrix.

Similar to CFA, a necessary condition for the identification of a composite model is to fix the scale of each emergent variable in the model. This can be done by either fixing one weight per emergent variable or fixing the variance of each emergent variable. Typically, the variance of each emergent variable is fixed to one by employing properly scaled weights. If this approach is applied, the sign of each weight vector of every block also needs to be determined, similar to a reflective measurement model if the variance of the latent variable is fixed to one. Moreover, no emergent variable is allowed to be isolated, i.e., each emergent variable must be correlated with at least one other emergent variable in the model (see Footnote 3). Otherwise, the model-implied covariances between the observable variables of an isolated emergent variable and the observable variables of the remaining emergent variables are all equal to zero, which implies for the isolated emergent variable an infinite number of weight sets that satisfy the scaling condition.
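The scaling and sign conditions can be sketched as follows. The sign rule used here (a positive covariance between the emergent variable and a chosen "dominant" observable variable) is only one possible convention and is an assumption of this example, as are the numbers and the function name `scale_and_orient`.

```python
import numpy as np

def scale_and_orient(block_cov, raw_weights, dominant=0):
    """Scale weights to unit variance of the emergent variable and fix their sign."""
    block_cov = np.asarray(block_cov, dtype=float)
    w = np.asarray(raw_weights, dtype=float)
    w = w / np.sqrt(w @ block_cov @ w)     # var(eta) = w' S w = 1
    if (block_cov @ w)[dominant] < 0:      # covariance of eta with the dominant indicator
        w = -w
    return w

# Hypothetical block of two observable variables.
s = np.array([[1.0, 0.3],
              [0.3, 1.0]])
w_a = scale_and_orient(s, [-2.0, -1.0])
w_b = scale_and_orient(s, [4.0, 2.0])
print(w_a, w_b)
```

Both raw weight vectors describe the same emergent variable up to scale and sign, and the two conditions map them to one and the same scaled weight vector.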

3.3 Estimating composite models

As is common in SEM, CCA deliberately distinguishes between the model and the estimator. Although it seems natural to employ a composite-based estimator such as the iterative PLS algorithm to estimate the weights and the emergent variables’ correlations of the composite models (Henseler 2016), in general, other estimators such as GSCA or approaches to generalized canonical correlation analysis can be employed. The guiding criteria in the decision in favor of a certain estimator are its statistical properties and its implied assumptions about the underlying model and population, i.e., the estimator must conform to the assumptions imposed by the composite model. Predominantly, researchers should favor unbiased and/or consistent estimators over biased/inconsistent estimators. A consistent estimator produces estimates that converge in probability towards their population counterpart; an unbiased estimator produces estimates whose expected value equals the population counterpart. Purely practical aspects such as computation time and convergence behavior tend to play a minor role in the selection of estimators.
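To make the role of an estimator more tangible, the following is a rough sketch of one variant of the iterative PLS algorithm, namely Mode B with centroid inner weighting, applied directly to a covariance matrix. The choice of this variant, the assumption that all emergent variables are connected, the function name `pls_mode_b`, and the population values are assumptions of this example and not a description of any particular software.

```python
import numpy as np

def pls_mode_b(cov, blocks, max_iter=300, tol=1e-10):
    """Iterative PLS algorithm (Mode B, centroid scheme) on a covariance matrix.

    cov    : covariance (or correlation) matrix of all observable variables
    blocks : list of index lists, one per emergent variable
    Returns the scaled weight vectors and the correlation matrix of the proxies.
    No emergent variable may be isolated, otherwise the update is undefined.
    """
    cov = np.asarray(cov, dtype=float)
    blocks = [np.asarray(b) for b in blocks]

    def scale(w, b):
        return w / np.sqrt(w @ cov[np.ix_(b, b)] @ w)

    weights = [scale(np.ones(len(b)), b) for b in blocks]
    for _ in range(max_iter):
        new_weights = []
        for j, bj in enumerate(blocks):
            # Covariance of block j with its inner proxy, i.e., the sign-weighted
            # sum of the proxies of all other emergent variables.
            inner = np.zeros(len(bj))
            for k, bk in enumerate(blocks):
                if k == j:
                    continue
                cross = cov[np.ix_(bj, bk)] @ weights[k]
                inner += np.sign(weights[j] @ cross) * cross
            # Mode B: regression weights of the inner proxy on block j.
            w = np.linalg.solve(cov[np.ix_(bj, bj)], inner)
            new_weights.append(scale(w, bj))
        change = max(np.max(np.abs(n - o)) for n, o in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    k = len(blocks)
    proxy_corr = np.array([[weights[i] @ cov[np.ix_(blocks[i], blocks[j])] @ weights[j]
                            for j in range(k)] for i in range(k)])
    return weights, proxy_corr

# Hypothetical population composite model: three blocks of two observable variables.
within = np.array([[1.0, 0.3], [0.3, 1.0]])
w_true = np.array([0.6, 0.4])
w_true = w_true / np.sqrt(w_true @ within @ w_true)
rho = np.array([[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]])
lam = within @ w_true
sigma = np.kron(rho, np.outer(lam, lam))
for j in range(3):
    sigma[2 * j:2 * j + 2, 2 * j:2 * j + 2] = within

weights, proxy_corr = pls_mode_b(sigma, [[0, 1], [2, 3], [4, 5]])
print(weights[0], w_true)
print(proxy_corr)
```

When fed a population matrix that obeys a composite model, this sketch retrieves the population weights and the emergent variables' correlations, which illustrates what conforming to the assumptions of the composite model means for an estimator.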

3.4 Assessing composite models

Similar to CFA and SEM, model assessment is a crucial step in CCA. In doing so, the composite model is usually assessed globally and locally. While the global model assessment examines the model as a whole, local model assessment investigates each emergent variable separately including the relationship to its observable variables.

The global model assessment, also known as overall model fit assessment, is a crucial step in SEM as well as CCA and its importance is acknowledged across various disciplines and emphasized in literally every textbook on SEM (e.g., Schumacker and Lomax 2016; Kline 2015). The most well-known ways to assess a model’s fit are: (i) fit indices and (ii) the test for exact overall model fit. The former quantifies the misfit on a continuum and the decision about whether the model acceptably fits the collected data is usually based on heuristic rules. In contrast, the latter assesses the exact fit of the overall model by means of statistical testing. In doing so, model fit assessment and decision making are based on a p-value.

In general, the test for exact overall model fit compares the sample variance–covariance matrix of the observable variables to their model-implied counterpart. In doing so, the test assesses the null hypothesis that the model-implied variance–covariance matrix based on the population parameters equals the population variance–covariance matrix of the observable variables: \(H_0: {\varvec{\varSigma }}({\varvec{\theta }}) = {\varvec{\varSigma} }\). In other words, the test for exact overall model fit examines whether it is plausible that the world functions as described by the specified model.

Various tests are available for this endeavor depending on the statistical properties of the employed estimator and the assumptions about the population. To keep the number of needed assumptions low, Schuberth et al. (2018) proposed employing a nonparametric test for exact overall model fit assessment that obtains the distribution of the discrepancy measure under the null hypothesis through the bootstrap (Beran and Srivastava 1985). This test has the advantage that as long as the selected estimator produces consistent estimates for the population parameters, the asymptotic distribution of the test statistic is not tied to a specific estimator (see Footnote 4). The asymptotic properties of the test are mathematically derived, and its finite sample performance for composite models has been demonstrated (Schuberth et al. 2018; see Footnote 5).
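The logic of such a bootstrap-based test can be sketched in a few lines. The example below is a simplified illustration under several assumptions: the discrepancy is a squared Euclidean distance, the data are transformed so that the null hypothesis holds in the transformed sample before resampling, and a deliberately trivial stand-in model (all observable variables uncorrelated) replaces a real composite-model estimator. The function names and simulated data are hypothetical.

```python
import numpy as np

def sym_power(a, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals ** power) @ vecs.T

def discrepancy(s, sigma):
    """Squared Euclidean distance between two covariance matrices (one possible choice)."""
    return 0.5 * np.sum((s - sigma) ** 2)

def bootstrap_fit_test(x, implied_cov, n_boot=200, seed=0):
    """Bootstrap-based test of H0: Sigma(theta) = Sigma.

    x           : (n, p) data matrix
    implied_cov : callable mapping a sample covariance matrix to the
                  model-implied covariance matrix of the estimated model
    Returns the observed discrepancy and the bootstrap p-value.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    s = np.cov(x, rowvar=False)
    sigma_hat = implied_cov(s)
    d_obs = discrepancy(s, sigma_hat)
    # Transform the data so that their covariance matrix equals sigma_hat,
    # i.e., so that the null hypothesis is true for the transformed sample.
    z = (x - x.mean(axis=0)) @ sym_power(s, -0.5) @ sym_power(sigma_hat, 0.5)
    d_boot = np.empty(n_boot)
    for b in range(n_boot):
        zb = z[rng.integers(0, n, size=n)]
        sb = np.cov(zb, rowvar=False)
        d_boot[b] = discrepancy(sb, implied_cov(sb))
    return d_obs, float(np.mean(d_boot >= d_obs))

# Stand-in model for illustration only: all observable variables are uncorrelated.
def independence_model(s):
    return np.diag(np.diag(s))

rng = np.random.default_rng(1)
x_ok = rng.standard_normal((300, 4))                       # model holds
mix = np.full((4, 4), 0.8) + 0.2 * np.eye(4)               # correlations of 0.8
x_bad = rng.standard_normal((300, 4)) @ np.linalg.cholesky(mix).T  # model violated

d_ok, p_ok = bootstrap_fit_test(x_ok, independence_model)
d_bad, p_bad = bootstrap_fit_test(x_bad, independence_model)
print(p_ok, p_bad)
```

In an actual CCA, `implied_cov` would wrap the chosen estimator and return the model-implied covariance matrix of the composite model; the resampling logic stays the same.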

Once the model shows an acceptable model fit, the researcher can proceed with the local assessment of the composite model, i.e., each emergent variable with its respective observable variables is considered separately. In doing so, the significance of the estimated weights, their sign, and size are examined and matched with a researcher’s theory. Similarly, the correlations among the emergent variables are matched with the expectations of a researcher. Moreover, if the weight estimates are subject to multicollinearity, it is recommended to inspect the correlational patterns of the observable variables forming an emergent variable.

4 Hair’s method of confirming measurement quality

Hair et al. (2020) introduce MCMQ as the measurement model assessment step in PLS-SEM (Hair et al. 2014). MCMQ entails different evaluation steps, depending on whether analysts define their measurement models as reflective or formative. The next two subsections briefly describe the evaluation steps for reflective and formative measurement models and add some critical reflections where needed. For a more elaborate treatment, we refer to Hair et al. (2020).

4.1 Assessing reflective measurement models using MCMQ

Once the PLS algorithm has been executed, MCMQ requires seven steps to evaluate a reflective measurement model (see Table 2 of Hair et al. 2020):

  1.

    Estimate of loadings and significance

    In PLS-SEM, loading estimates are correlations between the construct scores and the corresponding observable variables. According to Hair et al. (2020), the loading estimates must be significantly different from zero and have a value of 0.708 or greater. However, Hair et al. (2020) fail to mention that, if interpreted as loadings of a reflective measurement model, the loading estimates in PLS-SEM tend to be strongly upward biased (McDonald 1996).

  2.

    Indicator reliability (items)

    The concept of indicator reliability is strongly connected to the reflective measurement model and indicates how much of the variance in an indicator is explained by its underlying latent variable. In PLS-SEM, the indicator reliability is calculated as the squared loading estimate. Hair et al. (2020) do not provide any further explanations or any threshold level for the indicator reliability, so it remains unclear how and for what purpose researchers should employ it. Since the loading estimates in PLS-SEM are upwardly biased for reflective measurement models, so are the indicator reliabilities.

  3.

    Composite reliability (construct)

    According to Hair et al. (2020), the composite reliability of construct scores should range from 0.7 to 0.95. The coefficient of composite reliability is inherently tied to the reflective measurement model and assumes that the measures of a latent variable are prone to random measurement errors. It equals the ratio of the variance of the latent variable and the variance of a linear combination of the latent variable’s indicators (Jöreskog 1971). Since in PLS-SEM, the composite reliability is calculated based on upwardly biased loadings, composite reliability will be inflated. For instance, a construct measured by two observable variables with an indicator reliability of 0.28 each will yield a composite reliability greater than 0.7, although the true reliability of the construct scores is 0.4375. Researchers relying on MCMQ may thus mistakenly believe that their measures are reliable, whereas in fact, the reliability of their measures is far below the threshold of 0.7 that is recommended by Nunnally and Bernstein (1994). A more detailed presentation including negative consequences is given in the “Appendix”.

  4.

    Average variance extracted (AVE)

    According to Fornell and Larcker (1981), the average variance extracted (AVE) should exceed 0.5. The AVE assumes a reflective measurement model and indicates how much of the variance in the indicators is explained by the underlying latent variable (Fornell and Larcker 1981). In PLS-SEM, the AVE is determined by averaging the indicator reliabilities (Hair et al. 2020). Since the indicator reliabilities are upwardly biased, so is the AVE. For instance, a construct that is measured by two observable variables with an indicator reliability of 0.28 each will yield an AVE clearly higher than 0.5 in PLS-SEM, although the true AVE would only be 0.28. Researchers relying on MCMQ may thus mistakenly conclude that their measures possess convergent validity, whereas in fact, they do not.

  5.

    Discriminant validity—HTMT

    Whether two constructs can be statistically discriminated should be assessed by means of the heterotrait-monotrait ratio of correlations (HTMT, Henseler et al. 2015). The HTMT assumes a reflective measurement model and provides a consistent estimate for the correlation between two latent variables if the reflective measurement model is tau-equivalent, see the Appendix of Henseler et al. (2015). Since the HTMT is only based on the correlation matrix of the observable variables, its values remain untouched by PLS-SEM. However, it is noted that the HTMT assumes an underlying reflective measurement model for which PLS-SEM produces inconsistent estimates, regardless of whether Mode A or B is employed to calculate the weights.

  6.

    Nomological validity

    The nomological validity of constructs can be assessed by means of verifying that the correlational pattern of the constructs is in line with the expected relationships based on extant theory (Hair et al. 2020). However, in PLS-SEM, for reflective measurement models, the correlations among latent variables are biased, and thus, conclusions about the nomological validity are questionable.

  7.

    Predictive validity

    Finally, predictive validity “assesses the extent to which a construct score predicts scores on some criterion measure” (Hair et al. 2020, p. 105). If a researcher’s goal is pure prediction, he or she is of course not tied to the rules of confirmatory research.

Remarkably, the assessment of overall model fit is not part of the evaluation steps. As a consequence, MCMQ does not assess whether the specified model is consistent with the collected data. Moreover, several of the assessment steps involve assessment criteria that assume a reflective measurement model for which in PLS-SEM biased estimates are produced. Therefore, conclusions based on these criteria are questionable.
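The numerical example used in the steps above (two observable variables with an indicator reliability of 0.28 each) can be traced with a short calculation. The sketch assumes standardized indicators and equal weights for both indicators, which is what one would expect for a two-indicator block with equal loadings; the variable names are illustrative.

```python
import math

indicator_reliability = 0.28                 # squared population loading of each indicator
lam = math.sqrt(indicator_reliability)       # true loading
r12 = lam * lam                              # implied correlation between the two indicators

# True values under the reflective measurement model (equally weighted sum of both indicators).
true_ave = indicator_reliability
true_reliability = (2 * lam) ** 2 / ((2 * lam) ** 2 + 2 * (1 - lam ** 2))

# PLS-type loading: correlation between an indicator and the equally weighted,
# standardized composite of both indicators.
pls_loading = (1 + r12) / math.sqrt(2 * (1 + r12))
pls_ave = pls_loading ** 2                   # average of the squared loadings
pls_composite_reliability = (2 * pls_loading) ** 2 / (
    (2 * pls_loading) ** 2 + 2 * (1 - pls_loading ** 2))

print(round(true_reliability, 4), round(true_ave, 2))
print(round(pls_loading, 2), round(pls_ave, 2), round(pls_composite_reliability, 2))
```

Under these assumptions, the composite-based loading is about 0.8, so the AVE (about 0.64) and the composite reliability (about 0.78) pass the thresholds of 0.5 and 0.7, although the true AVE is 0.28 and the true reliability is 0.4375.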

4.2 Assessing formative measurement models using MCMQ

For formative measurement models, MCMQ entails five evaluation steps (see Table 3 of Hair et al. 2020):

  1.

    Convergent validity—redundancy

    The latent variable’s coefficient of determination (\(R^2\)) refers to convergent validity. It quantifies the extent of the latent variable’s variance that is explained by its formative indicators. Instead of the \(R^2\), one can also assess its square root, which represents the effect of an optimal linear combination of formative indicators on the latent variable. As the former and the latter are typically standardized in PLS-SEM, this effect equals the path coefficient between the two. Hair et al. (2020) postulate a path coefficient of at least 0.7, which corresponds to a minimal \(R^2\) value of 0.49. However, this rule of thumb neglects that in PLS-SEM the path coefficients are biased if the dependent or the independent variable (or both) is latent. For instance, if the latent variable has two reflective indicators with a reliability of 0.28 each (which will remain unnoticed by MCMQ; see the explanation above), a path coefficient of 0.7 is impossible.

  2.

    Indicator multicollinearity

    According to Hair et al. (2020), researchers must assess indicator multicollinearity because it “creates problems with formative measurement models” (p. 105). Variance inflation factors (VIFs) should not exceed 3, and bivariate correlations should be lower than 0.5 (Hair et al. 2020). Notably, bivariate correlations can exceed 0.8 without necessarily yielding a VIF above 3. At the same time, VIF values greater than 10 can occur even if no bivariate correlation exceeds 0.5. Consequently, this assessment step is likely to often render measurement models ‘problematic’. Moreover, multicollinearity is a characteristic of the sample and not a problem of the underlying model.

  3.

    Size and significance of indicator weights

    Convergent validity and absence of multicollinearity are prerequisites for the next step, the assessment of the size and significance of indicator weights (Hair et al. 2020). The weights of the formative indicators should be significant; indicators with insignificant weights may or may not be discarded (Hair et al. 2020).

  4.

    Contribution of indicators (size and significance of loadings)

    The loadings of formative indicators should be significantly different from zero and exceed 0.5; otherwise, the researcher can discard or retain formative indicators (Hair et al. 2020).

  5.

    Predictive validity

    Finally, the assessment of formatively measured constructs’ predictive validity is analogous to that of reflectively measured constructs (Hair et al. 2020). Therefore, it is only repeated that researchers who are interested in pure predictive research do not have to follow the rules for rigorous confirmatory research.

Again, the assessment of overall model fit is not part of the evaluation steps. As a consequence, MCMQ does not inform researchers on whether a specified formative measurement model is consistent with the collected data.
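Two numerical claims made in the steps above can be checked with a few lines of Python. The correlation matrices are hypothetical examples constructed for this purpose. The bound on the path coefficient additionally assumes that the proxy of the reflectively measured latent variable is the equally weighted sum of its two indicators and that it relates to other variables only through the latent variable.

```python
import numpy as np

def variance_inflation_factors(corr):
    """VIFs of all indicators: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.asarray(corr, dtype=float)))

# A bivariate correlation of 0.8 with VIFs that stay below 3.
two_indicators = np.array([[1.0, 0.8],
                           [0.8, 1.0]])
# No bivariate correlation above 0.5 in absolute value, yet VIFs above 10.
three_indicators = np.array([[1.0, -0.45, 0.5],
                             [-0.45, 1.0, 0.5],
                             [0.5, 0.5, 1.0]])
vif_two = variance_inflation_factors(two_indicators)
vif_three = variance_inflation_factors(three_indicators)
print(vif_two)
print(vif_three)

# Upper bound for the correlation between the proxy of a latent variable and any
# other variable: the square root of the proxy's reliability (0.4375 for two
# indicators with a reliability of 0.28 each).
max_path = np.sqrt(0.4375)
print(max_path)
```

In this example, the two rules of thumb for multicollinearity disagree in both directions, and the attainable path coefficient stays below the postulated minimum of 0.7.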

5 Assessment of confirmatory composite analysis and Hair’s method of confirming measurement quality by means of a scenario analysis

The inventors of CCA and MCMQ make different claims about what their methods are suitable for: CCA is meant to assess composite models (Schuberth et al. 2018), and MCMQ’s objective is the “confirmation of the measurement models” (Hair et al. 2020, p. 103), whether reflective or formative. However, whereas Schuberth et al. (2018) demonstrated the efficacy of CCA by means of a Monte Carlo simulation, MCMQ has not been exposed to any rigorous examination so far. Although Hair et al. (2020, p. 108) state that MCMQ “is a superior approach”, they do not provide any evidence. Consequently, two fundamental questions about the absolute and relative efficacy of MCMQ arise: Does MCMQ actually detect problematic models? And if so, is it superior to SEM including CCA and CFA? A scenario analysis can help to answer these questions.

5.1 Setup of the scenario analysis

Scenario analysis allows us to illustrate how well different methods retrieve parameters from a given population; “estimation” is thus not done on the basis of a sample. Hence, parameters are retrieved rather than estimated. This avoids distributional assumptions and the uncertainty introduced by sampling.

To examine to what extent SEM, including CCA and CFA, and MCMQ are able to assess the quality of a model, the methods are exposed to six scenarios. Each scenario is a combination of a population model and a model specified by a fictitious researcher. In Scenarios 1a, 2a, and 3a, the researcher’s model is correctly specified and the methods employed to assess the model are expected not to indicate any problems. In contrast, in Scenarios 1b, 2b, and 3b, the studied population functions differently from what the researcher assumes. In these situations, the researcher’s model is incorrectly specified. Hence, the methods employed to assess the model are expected to sound the alarm and indicate that the model is problematic.

In Scenarios 1a and 1b, a researcher thinks that the population functions according to a multiple-indicators, multiple-causes (MIMIC, Jöreskog and Goldberger 1975) model. A MIMIC model entails that a latent variable is measured with both reflective and causal–formative indicators, i.e., it combines causal–formative and reflective measurement. In our case, there are six indicators, three of which are considered reflective and the other three to be causal–formative. Whereas in Scenario 1a, the population indeed functions according to a MIMIC model, in Scenario 1b, the population functions according to a different model. Since the purpose of MCMQ is to confirm formative and reflective measurement models, a MIMIC model is a formidable test case for MCMQ. Both MCMQ and classical SEM are used to assess the MIMIC model. Obviously, in Scenario 1a, the methods should not reject the researcher’s model, whereas in Scenario 1b, the methods should make the researcher aware that his or her model is problematic.

In Scenarios 2a and 2b, a researcher thinks that the population functions according to a composite model. Concretely, the specified model consists of two correlated emergent variables, each composed of three observable variables. Whereas in Scenario 2a, the population indeed functions according to the composite model, in Scenario 2b, the population functions according to a different model.
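To make the idea of retrieving parameters from a population concrete, the following sketch constructs the correlation matrix implied by a composite model with two correlated emergent variables and then applies a simplified two-block PLS iteration with Mode B directly to that matrix. All numerical values are hypothetical and are not the values of Table 2; the function names are illustrative, and the iteration omits the sign handling and inner weighting schemes of full PLS implementations.

```python
import numpy as np

def scale_weights(w, S):
    """Scale weights so that the composite w'x has unit variance."""
    w = np.asarray(w, dtype=float)
    return w / np.sqrt(w @ S @ w)

def composite_population(S11, S22, w1, w2, rho):
    """Correlation matrix implied by two correlated emergent variables.

    Within-block correlations are unrestricted; the between-block
    correlations equal rho * (loadings of block 1)(loadings of block 2)'.
    """
    w1, w2 = scale_weights(w1, S11), scale_weights(w2, S22)
    S12 = rho * np.outer(S11 @ w1, S22 @ w2)
    return np.block([[S11, S12], [S12.T, S22]]), w1, w2

def pls_mode_b(S, p1, tol=1e-12, max_iter=500):
    """Simplified two-block PLS iteration with regression weights (Mode B)."""
    S11, S12, S22 = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    w1 = scale_weights(np.ones(p1), S11)
    w2 = scale_weights(np.ones(S.shape[0] - p1), S22)
    for _ in range(max_iter):
        new_w1 = scale_weights(np.linalg.solve(S11, S12 @ w2), S11)
        new_w2 = scale_weights(np.linalg.solve(S22, S12.T @ new_w1), S22)
        change = max(np.abs(new_w1 - w1).max(), np.abs(new_w2 - w2).max())
        w1, w2 = new_w1, new_w2
        if change < tol:
            break
    return w1, w2, float(w1 @ S12 @ w2)

# Hypothetical within-block correlations and weights (assumed values).
S11 = np.array([[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]])
S22 = np.array([[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]])
S_pop, w1_pop, w2_pop = composite_population(
    S11, S22, [0.6, 0.4, 0.2], [0.4, 0.5, 0.5], rho=0.3)

w1_hat, w2_hat, r_hat = pls_mode_b(S_pop, p1=3)
print(np.allclose(w1_hat, w1_pop), np.allclose(w2_hat, w2_pop))
```

Because the input is a population correlation matrix rather than a sample, no sampling error is involved: under these assumptions the weights and the correlation between the emergent variables are recovered up to numerical precision when the specified model matches the population.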

Since CCA serves to assess composite models, these scenarios allow us to illustrate its performance. Notably, Hair et al. regard MCMQ “as a separate approach to confirming linear composite constructs in measurement models” (p. 104). For them, “reflective measurement models are composite latent constructs” (Hair et al. 2020, p. 104), and “[f]ormative composite measurement models are linear combinations of a set of indicators that form the construct” (p. 105). Moreover, MCMQ “enables researchers to develop and validate measures within a nomological network. Each composite, therefore, must relate to at least one other composite. Hence, the validity of a composite depends on the nomological network in which it is embedded.” (Hair et al. 2020, p. 103) These statements suggest that MCMQ might be suitable for assessing composite models as analyzed in CCA. Therefore, MCMQ is also employed in Scenarios 2a and 2b. In Scenario 2a, neither CCA nor MCMQ should sound the alarm, whereas in Scenario 2b, the methods should alert the researcher that the specified model is problematic.

Against the background of the recent history of PLS-SEM, and especially since Rigdon (2012), it is suggested that in PLS-SEM reflective and formative measurement refer to a model where the emergent variable is built by correlation weights (Mode A) and regression weights (Mode B), respectively. As MCMQ is applied to PLS-SEM, a further composite model is investigated in the last two scenarios. In contrast to Scenario 2a, in Scenario 3a, a population is considered where the second emergent variable is built by correlation weights. To generate the population weights for this emergent variable, an approach suggested by Cho and Choi (2020) is applied. Similar to Scenarios 2a and 2b, in Scenario 3a, the specified model matches the underlying population, while in Scenario 3b, the model is misspecified. It is noted that the model where an emergent variable is built by correlation weights is nested within the composite model analyzed in CCA. Hence, CCA is also employed in Scenarios 3a and 3b.
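The difference between correlation weights (Mode A) and regression weights (Mode B) can be sketched for a single block as follows. The sketch assumes that the correlations between the block's indicators and a proxy for the construct are given; it does not reproduce the population-generating approach of Cho and Choi (2020). The function name and all values are hypothetical.

```python
import numpy as np

def outer_weights(S_block, r_proxy, mode):
    """Outer weights of one block given indicator-proxy correlations.

    mode "A": correlation weights, proportional to r_proxy.
    mode "B": regression weights, proportional to S_block^{-1} r_proxy.
    """
    S_block = np.asarray(S_block, dtype=float)
    r_proxy = np.asarray(r_proxy, dtype=float)
    if mode == "A":
        w = r_proxy.copy()
    elif mode == "B":
        w = np.linalg.solve(S_block, r_proxy)
    else:
        raise ValueError("mode must be 'A' or 'B'")
    return w / np.sqrt(w @ S_block @ w)  # unit-variance composite

# Hypothetical indicator correlations and indicator-proxy correlations.
S_block = np.array([[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]])
r_proxy = np.array([0.4, 0.5, 0.3])

w_mode_a = outer_weights(S_block, r_proxy, "A")
w_mode_b = outer_weights(S_block, r_proxy, "B")
print("Mode A:", np.round(w_mode_a, 3))
print("Mode B:", np.round(w_mode_b, 3))
```

The two sets of weights coincide when the indicators of the block are mutually uncorrelated; otherwise Mode B additionally takes the correlations among the indicators into account.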

Similar to Scenarios 1 and 2, in Scenario 3a, the specified model is correct, while in Scenario 3b, the population does not function according to the specified model. Consequently, in Scenario 3a, CCA and MCMQ should indicate no problems in the specified model. In contrast, in Scenario 3b, the specified model is incorrect and this should also be reflected by CCA and MCMQ.

Table 2 provides an overview of the six scenarios. It includes the population models, their correlation matrices, and the models as specified by the fictitious researcher. In Scenarios 3a and 3b, the second construct \(\eta _2\) is deliberately displayed by a hexagon to highlight the fact that it is an emergent variable, although the arrows point from the construct to the indicators. This implies that the observable variables of the second emergent variable are also allowed to be freely correlated, and it also emphasizes the difference between this special type of emergent variable and the classical reflective measurement model known from SEM. All values are rounded to the second decimal place.

Table 2 The six scenarios

MCMQ was exposed to all six scenarios. Since MCMQ is the evaluation step in PLS-SEM, PLS-SEM was conducted as outlined in a current primer on that framework (concretely, Hair et al. 2017). Subsequently, the evaluation steps proposed by Hair et al. (2020) were performed and the corresponding conclusions were drawn. Three evaluation steps were omitted. First, discriminant validity was not assessed because it is not applicable to models with fewer than two reflectively measured constructs. Second, nomological validity was not assessed in Scenarios 1a and 1b, as only one construct, i.e., the latent variable, is available. Third, although PLS can be employed for predictive modeling (Cepeda Carrión et al. 2016), predictive validity was not considered because it is not a mandatory part of confirmatory research. It is well known that in causal research, i.e., explanatory and confirmatory research, the correctness of the specified model is essential, while a “‘wrong’ model can sometimes predict better than the correct one” (Shmueli 2010, p. 6). Moreover, similar guidelines (such as, for instance, in Hair et al. 2017) do not include predictive validity.Footnote 6

SEM and CCA were performed along their four constituting steps: model specification, identification, estimation, and assessment. To keep the differences between the methods as small as possible, PLS was employed as an estimator. To correct the parameters of the reflective measurement model for attenuation in Scenarios 1a and 1b, PLSc was applied for SEM. To obtain the parameters in Scenarios 2a and 2b, the iterative PLS algorithm was employed using Mode B for both emergent variables. Similarly, in Scenarios 3a and 3b, Mode B was used for the first emergent variable and Mode A for the second one.Footnote 7 Finally, three coefficients helped to assess the overall fit of the models: (1) the standardized root mean square residual (SRMR, Hu and Bentler 1998) and (2) the geodesic discrepancy (\(d_{\mathrm {G}}\)) as measures of misfit (Dijkstra and Henseler 2015a) and (3) the normed fit index (NFI, Bentler and Bonett 1980) as a relative fit measure. For the latter, the model-implied correlation matrix of the competing model equals a unit matrix.
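The three fit coefficients can be sketched for a pair of correlation matrices, an observed (here: population) matrix `S` and a model-implied matrix `Sigma`. The sketch makes two assumptions that are not stated in the text: the SRMR is averaged over the lower triangle including the diagonal, and the NFI is based on the maximum-likelihood discrepancy function. The function names and the example matrices are hypothetical.

```python
import numpy as np

def srmr(S, Sigma):
    """Root mean square residual correlation (lower triangle incl. diagonal)."""
    idx = np.tril_indices_from(S)
    return float(np.sqrt(np.mean((S - Sigma)[idx] ** 2)))

def geodesic_discrepancy(S, Sigma):
    """d_G = 0.5 * sum of squared log eigenvalues of S^{-1} Sigma."""
    L = np.linalg.cholesky(S)
    # L^{-1} Sigma L^{-T} is symmetric and has the eigenvalues of S^{-1} Sigma.
    M = np.linalg.solve(L, np.linalg.solve(L, Sigma).T)
    phi = np.linalg.eigvalsh((M + M.T) / 2)
    return float(0.5 * np.sum(np.log(phi) ** 2))

def ml_discrepancy(S, Sigma):
    """Maximum-likelihood discrepancy (assumed here as the basis of the NFI)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return float(logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p)

def nfi(S, Sigma):
    """Normed fit index with a unit matrix as the baseline model."""
    baseline = np.eye(S.shape[0])
    return 1.0 - ml_discrepancy(S, Sigma) / ml_discrepancy(S, baseline)

# Hypothetical matrices: a matching and a non-matching model-implied matrix.
S = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
Sigma_match = S.copy()
Sigma_mismatch = np.array([[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]])

for label, Sigma in [("match", Sigma_match), ("mismatch", Sigma_mismatch)]:
    print(label, srmr(S, Sigma), geodesic_discrepancy(S, Sigma), nfi(S, Sigma))
```

When the model-implied matrix reproduces the population matrix, the two misfit measures are zero and the NFI equals one up to floating-point error; any discrepancy between the two matrices moves the measures away from these values.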

5.2 Results and conclusion

Table 3 presents the parameters of all models involved: both the parameters of the population models and the retrieved parametersFootnote 8 of the specified models. In Scenario 1a, as expected and well known in the literature (e.g., Dijkstra 1981), the iterative PLS algorithm produces Fisher-consistent estimates for the composite model but not for the reflective measurement model. To retrieve the population parameters correctly from reflective measurement models, PLSc is used as an estimator in SEM. In contrast, PLS-SEM and thus MCMQ rely on traditional PLS without a correction for attenuation and, as a consequence, do not retrieve the population parameters. This is a deliberate choice of the inventors of PLS-SEM because in their view, PLSc “adds very little to the body of knowledge” (Hair et al. 2019b, p. 570). In Scenario 2a, which purely deals with emergent variables, the iterative PLS algorithm delivers the true parameter values as a result. Similarly, in Scenario 3a, which contains the special emergent variable formed by correlation weights, the PLS algorithm using Mode B for the first emergent variable and Mode A for the second emergent variable retrieves the population parameters. Consequently, both CCA and MCMQ are based on the population parameters in Scenarios 2a and 3a. In Scenarios 1b, 2b, and 3b, where the specified model does not match the population model, all estimators provide parameter values that deviate from the population values. As a consequence, CCA, SEM, and MCMQ are all based on incorrect parameters, which shows that a misspecification of the model generally prevents the fictitious researcher from obtaining an accurate understanding of the world.

Table 3 Population and retrieved parameters for the six scenarios

For Scenarios 1a and 1b, the results of MCMQ and SEM are displayed in Table 4. MCMQ confirms the quality of the specified model in both scenarios, and none of the proposed steps could detect the problem of the specification in Scenario 1b. To assess the outer model of the first construct \(\eta _1\), MCMQ’s rules for formative measurement models are followed, which lead, for both scenarios, to the same results. As suggested by Hair et al. (2020), convergent validity is established since the path coefficient between the emergent variable and the latent variable is above 0.7 (see Table 3). Moreover, multicollinearity is not regarded as an issue since the correlations among the three causal indicators are below 0.5 (see the correlation matrix of Scenarios 1a and 1b in Table 2). Similarly, the VIFs are all below the suggested threshold of 3.Footnote 9 The retrieved weights for the causal–formative indicators are all sizable (see Table 3) and range from 0.347 to 0.607.Footnote 10 Similarly, the loadings of the causal indicators range from 0.512 to 0.746 (see Table 3) and thus are all above the recommended threshold of 0.5.Footnote 10 To assess the reflective measurement model of the latent variable, MCMQ’s quality criteria for reflective measurement models are applied. The criteria do not indicate any problems for either Scenario 1a or 1b. The corresponding loadings are all above 0.7 (see Table 3).Footnote 10 Since there is no threshold mentioned by Hair et al. (2020), indicator reliabilities ranging from 0.663 to 0.755 are regarded as sufficient. For both scenarios, the composite reliability measures are also between the proposed thresholds of 0.70 and 0.95: Scenario 1a: Cronbach’s \(\alpha = 0.810\) and Jöreskog’s \(\rho = 0.887\), and Scenario 1b: Cronbach’s \(\alpha = 0.810\) and Jöreskog’s \(\rho = 0.886\). Similarly, the AVE is 0.725 in Scenario 1a and 0.721 in Scenario 1b and thus above the proposed threshold of 0.5. As a consequence, MCMQ does not alert the researcher that his or her model of Scenario 1b is misspecified.
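For readers who wish to trace the reliability and AVE criteria applied above, the following sketch computes them with their standard formulas for standardized indicators. The correlation matrix and the loadings are hypothetical values chosen for illustration, not the values of Tables 2 and 3; Cronbach’s \(\alpha\) is computed in its standardized form from the indicator correlation matrix, which is an assumption of this sketch.

```python
import numpy as np

def cronbach_alpha(R):
    """Standardized Cronbach's alpha from an indicator correlation matrix."""
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    return float(k / (k - 1) * (1 - np.trace(R) / R.sum()))

def joreskog_rho(loadings):
    """Joereskog's rho (composite reliability) from standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1 - lam ** 2)))

def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

# Hypothetical reflective block with three indicators (assumed values).
R_reflective = np.array([[1.0, 0.6, 0.6],
                         [0.6, 1.0, 0.6],
                         [0.6, 0.6, 1.0]])
lam = np.array([0.80, 0.85, 0.90])

alpha_value = cronbach_alpha(R_reflective)
rho_value = joreskog_rho(lam)
ave_value = ave(lam)
print("alpha within [0.70, 0.95]:", 0.70 <= alpha_value <= 0.95)
print("rho within [0.70, 0.95]:", 0.70 <= rho_value <= 0.95)
print("AVE above 0.5:", ave_value > 0.5)
```

All three quantities are functions of the loadings or of the correlations within one block only, which is consistent with the observation that they take nearly identical values in Scenarios 1a and 1b.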

Table 4 Results of MCMQ and SEM for Scenarios 1a and 1b

The model fit assessment criteria in Table 4 correctly indicate no problems in the specified model of Scenario 1a. The SRMR as well as the geodesic distance are both zero, while the NFI equals 1. In contrast, for Scenario 1b, both the SRMR and the geodesic distance substantially exceed 0. Moreover, the NFI is clearly below 1. As a consequence, the researcher is alerted that his or her model is misspecified.

Similar to the assessment of the reflective and causal–formative measurement models, Table 5 shows that MCMQ’s quality criteria for formative measurement models do not indicate any problems with the specified composite models for both Scenarios 2a and 2b. Since there are no reflective measures of the two emergent variables, convergent validity has not been assessed. Considering the correlations among the observable variables (see the correlation matrix of Scenarios 2a and 2b in Table 2) and the VIF values for the weights of the two emergent variables, multicollinearity is not an issue.Footnote 11 Moreover, all weights are sizable, and the loadings are above the proposed threshold of 0.5 (see Table 3).Footnote 10 As a consequence, MCMQ does not alert the researcher that his or her composite model in Scenario 2b is misspecified.

Table 5 Results of MCMQ and CCA for the Scenarios 2a and 2b

In contrast, the proposed model fit criteria in CCA do not reject the specified composite model of Scenario 2a and correctly indicate problems in the composite model specification of Scenario 2b (see Table 5). The SRMR and the geodesic distance are both 0 in Scenario 2a and clearly above 0 in Scenario 2b. Similarly, the NFI is 1 in the scenario where the composite model is correctly specified and substantially below 1 in Scenario 2b. Consequently, CCA sounds the alarm that the composite model of Scenario 2b is misspecified.

For Scenarios 3a and 3b, the results of CCA and MCMQ are displayed in Table 6. MCMQ’s quality criteria for formative measurement models do not indicate any problems: the path coefficient between the two constructs is above 0.7, and thus, following Hair et al. (2020), convergent validity is established (see Table 3). Considering the correlations among the observable variables (see the correlation matrices of Table 2) and the VIF values for the weights of the first emergent variable,Footnote 12 multicollinearity is not an issue. Moreover, all weights are sizable and none of the loadings of the first construct are below the suggested threshold of 0.5 (see Table 3).Footnote 10

To assess the second emergent variable, MCMQ’s rules for reflective measurement models are applied. Again, for both scenarios, no problems are indicated. All loadings associated with the second emergent variable are above the proposed threshold of 0.708 (see Table 3).Footnote 10 As no decision criterion for indicator reliability is mentioned by Hair et al. (2020), the squared loadings ranging from 0.680 to 0.755 are regarded as sufficient. Composite reliability is assessed by means of Cronbach’s \(\alpha\) and Jöreskog’s \(\rho\). In both scenarios, the two reliability measures are between the proposed thresholds of 0.70 and 0.95.Footnote 13 Considering the AVE, the values for the second construct are all above the proposed threshold of 0.5.Footnote 14 Since there is only one reflective construct, discriminant validity is not assessed. Nomological validity can be regarded as established since the correlation between the construct scores of the second and the first constructs is in line with the researcher’s expectations. As a consequence, MCMQ’s quality criteria do not indicate any problems with the specified model in Scenario 3b, where the model is indeed misspecified.

Table 6 Results of MCMQ and CCA for the Scenarios 3a and 3b

The model fit criteria of CCA correctly do not sound the alarm in the case of Scenario 3a. The SRMR and the geodesic distance are both zero.Footnote 15 Moreover, the NFI is equal to one. However, for Scenario 3b, the SRMR and the geodesic distance are substantially larger than 0. Similarly, the NFI is below 1. As a consequence, CCA correctly alerts a researcher that his or her model is misspecified.

6 Conclusion and discussion

Dijkstra and Henseler extended SEM and invented a statistical method that assesses the overall fit of a composite model and, making use of the inventor’s privilege to name the invention, they called it ‘confirmatory composite analysis’ (Henseler et al. 2014, see also the author note in that paper). Moreover, Schuberth et al. (2018) fully developed this method and provide evidence for its efficacy. The name of the method was chosen on purpose: The method allows the analysis of composite models; it is confirmatory in nature because it (dis-)confirms whether the specified model is consistent with the data; in the words of Bollen (1989 p. 68): “If a model is consistent with reality, then the data should be consistent with the model”. Following the principle that a name should say something about the meaning of the named object, one could thus say that the name ‘confirmatory composite analysis’ fits the method like a glove. Moreover, the name emphasizes the proximity to CFA: CCA and CFA share everything except for the specified model. Whereas CFA assesses reflective measurement models, also known as common factor models, CCA assesses composite models in which the abstract concept is represented by an emergent variable instead of a latent variable.

In a recent paper published in the Journal of Business Research, Hair et al. (2020) took the term ‘confirmatory composite analysis’ and used it for something else, namely, the evaluation step of PLS-SEM. While it is comprehensible that they employ rebranding to distance their method from negative connotations (evoked, e.g., by papers such as Rönkkö and Evermann 2013; Rönkkö et al. 2015; Dijkstra and Henseler 2015b; Rönkkö et al. 2016; Goodhue et al. 2017), it is unfortunate that they used the name of an extant method that is substantially different. As this paper has shown, Hair et al.’s method neither assesses composite models nor is it suitable for confirmatory purposes. Consequently, it creates unnecessary ambiguity and confusion to call MCMQ ‘confirmatory composite analysis’, as CCA introduced by Schuberth et al. (2018) was developed exactly for that purpose. Instead, it is recommended to either keep the original name, the ‘evaluation step of PLS-SEM’, or use the descriptive name ‘method of confirming measurement quality’, as was done in this paper.

Whereas CCA has been demonstrated to serve as an effective method to assess composite models [in addition to the scenario analysis presented in this paper, see, for instance, the Monte Carlo simulations conducted by Schuberth et al. (2018) and Schuberth et al. (2020)], so far, none of the alleged capabilities of MCMQ have been demonstrated. In contrast, this article provided evidence that MCMQ is unable to fulfill its promises. As known in the literature and reconfirmed by this study, MCMQ is not suited to evaluate reflective and causal–formative measurement models. This is mainly due to two reasons: First, model fit assessment plays no role in MCMQ, which leaves a researcher uninformed about whether his or her specified model is an acceptable representation of the studied population. Second, several of the evaluation criteria employed in MCMQ assume a reflective measurement model for which PLS-SEM produces inconsistent estimates (e.g., Henseler et al. 2014). Moreover, this study showed that MCMQ fails to disconfirm invalid composite models including models where the emergent variable is formed by correlation weights (“reflective measurement” in PLS-SEM parlance). This casts additional doubt on applying evaluation criteria that assume a reflective measurement model, such as Jöreskog’s \(\rho\) and AVE, to composite models. From the findings of the scenario analysis, it is concluded that MCMQ lacks sensitivity. While it indeed confirms measurement quality in case of valid measurement, at the same time, it fails to disconfirm measurement quality in cases in which the specified model is in fact invalid. Ergo, the evaluation steps subsumed under MCMQ confirm measurement quality, but they do not necessarily assess it.Footnote 16

This does not mean that MCMQ’s rules for reflective measurement models should be discarded. In fact, they have been widely proposed in guidelines for the assessment of reflective measurement models estimated by PLSc, e.g., Benitez et al. (2020), Müller et al. (2018) and Henseler et al. (2016). However, as shown by this study, they cannot replace the crucial step of model fit assessment in SEM. Every rigorous scientific study should engage in assessing the fit of the outer models before confirming their quality.

Apart from promulgating an ineffective method, Hair et al. (2020) make several statements on CCA that are incorrect, and not all of these errors can be attributed to confounding CCA with MCMQ. Table 7 lists the most important statements and sets the record straight.

Table 7 Incorrect statements of Hair et al. (2020) about confirmatory composite analysis and corresponding corrections

Finally, the study at hand provides a very simple rule for researchers who would like to assess the quality of their models: Always assess a model with the method that was developed for this model. Concretely, CFA is used for assessing reflective measurement models (common factor models), SEM is used for assessing causal–formative measurement models, and CCA is used for assessing composite models.