Higher-Order PLS-PM Approach for Different Types of Constructs

Partial least squares path modeling (PLS-PM) has become very popular in recent years for measuring concepts that depend on several aspects and are based on different types of relationships. PLS-PM is a useful tool for exploring relationships and for analyzing the influence of the different aspects on the complex phenomenon under study. In particular, the use of higher-order constructs has allowed researchers to extend the application of PLS-PM to more advanced and complex models. In this work, our attention is focused on higher-order constructs that include reflective or formative relationships. Although the dispute between formative and reflective models is not recent, it is still alive in the current literature, mostly within the context of structural equation models. This paper focuses on the theoretical and mathematical differences between formative and reflective measurement models within the PLS-PM approach. A simulation study is proposed to show how well these approaches fit in different modeling situations. The approaches are also compared in an empirical application in a sustainability context. The findings from the simulation and the empirical application can help researchers to estimate and use the higher-order PLS-PM approach in reflective and formative type models.


Introduction
Over the last 30 years, many researchers have focused their attention on measuring the importance of constructs and the nature of the relationships between constructs. Their work on reflective and formative relationships has focused primarily on identification and estimation issues (Blalock 1982; Bollen and Lennox 1991). The choice between formative and reflective models has received increasing attention in the literature in recent years (Andreev et al. 2009; Diamantopoulos et al. 2008; Coltman et al.

Formative Versus Reflective Constructs
In the last few years, many articles have focused on the different construct models and on the theoretical errors resulting from model misspecification (Bollen and Lennox 1991; Diamantopoulos and Winklhofer 2001; Jarvis et al. 2003; MacKenzie et al. 2005; Roy et al. 2012). In a reflective model (Edwards and Bagozzi 2000; Diamantopoulos and Winklhofer 2001) the construct is considered the cause and the indicators its manifestations. For example, as Eboli et al. (2018) claim, "intelligence determines the responses of a subject to a questionnaire designed to assess this aspect and not vice versa". Hence, if the intelligence of a person increases, this will lead to an increase in the number of correct answers to all questions (Simonetto 2012). Thus, the construct determines its indicators (as shown on the left of Fig. 1) and each indicator, being a manifestation of the construct, can be removed if its coefficient is not statistically significant (Bollen and Lennox 1991). In a formative model, the indicators determine the latent construct (Bollen and Lennox 1991) (Fig. 1, on the right). According to MacCallum and Browne (1993), "in many cases indicators could be viewed as causing rather than being caused by the latent variable measured by the indicators". In this kind of model, a single indicator cannot be removed without affecting the definition of the construct. As Mazziotta and Pareto (2013, 2019) highlight in their works, "a typical example of formative model is the measurement of wellbeing of society. This depends on health, income, occupation, services, environment, etc., and not vice versa. Therefore, if any one of these factors improves, well-being will increase (even if the other factors do not change). However, if well-being increases, this will not necessarily be accompanied by an improvement in all the other factors".
Today, formative models are becoming a standard tool in socio-economic research, particularly in the fields of causal modeling and multidimensional evaluation. Although the theoretical debate on the use of formative models is still very heated (there are still unsolved methodological problems in structural equation models comprising formative constructs), in practice these models are applied in many studies (Diamantopoulos et al. 2008). In general, it is important to understand whether the indicators are reflective or formative in nature, because an incorrect specification of the latent constructs can undermine construct content validity, misrepresent a model, and lead to less useful theories for researchers (Coltman et al. 2008). According to Coltman et al. (2008), there are three theoretical considerations when deciding whether the measurement model is formative or reflective: the nature of the construct, the direction of causality between the indicators and the latent construct, and the characteristics of the indicators used to measure the construct (Eboli et al. 2018).
Statistical instruments can also be used to determine whether a construct is reflective or formative. The constructs are assessed through the theory underpinning each latent variable (LV) (Bagozzi 2007; Chin 1998b), and their validity by means of Cronbach's alpha and the Average Variance Extracted (AVE) (Coltman et al. 2008). Additionally, the direction of indicator-construct causality can be checked using the Weight-Loading Sign (WLS). A negative WLS indicates a causality issue, meaning that the indicator-construct link is impossible or reversed (Kock 2013; Wagner 1982), a telltale sign of an instance of Simpson's paradox (Kock 2015; Roni et al. 2015). Roy et al. (2012) claim, "Wrongly modeling a reflective model as formative, and vice versa, is known as model misspecification". A misspecification in the measurement model affects the structural paths of the LV, thus leading to erroneous path coefficients (Jarvis et al. 2003; MacKenzie et al. 2005). For this reason, it is critical to understand when the use of formative or reflective measurement models is appropriate and how such models should be formulated (Roy et al. 2012).
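As an illustration of the two validity measures mentioned above, the following minimal sketch computes Cronbach's alpha from raw indicator columns and the AVE from standardized loadings. The simulated block, the loading value 0.8 and the function names are assumptions made for this example only; they are not taken from the study.

```python
import random
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a block given as a list of indicator columns."""
    k = len(items)
    n = len(items[0])
    item_vars = [statistics.variance(col) for col in items]
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(item_vars) / statistics.variance(totals))


def ave(loadings):
    """Average Variance Extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical reflective block: three indicators driven by one latent score.
# With loading 0.8 and error s.d. 0.6 each indicator has unit variance.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(500)]
block = [[0.8 * z + rng.gauss(0, 0.6) for z in latent] for _ in range(3)]

alpha = cronbach_alpha(block)
ave_value = ave([0.8, 0.8, 0.8])
```

Because the three simulated indicators share a common cause, they are strongly intercorrelated and alpha is high; a formative block need not show this pattern, which is why these measures are associated with reflective measurement.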

Higher-Order Constructs in PLS-PM
The PLS-PM approach analyzes multiple relationships between blocks of Manifest Variables (MVs), assuming that each block of variables is represented by a Latent Variable (LV), or theoretical concept, and that the relationships between the blocks are established on the basis of the knowledge (theory) of the phenomenon analyzed. PLS-PM is evolving as a statistical modeling technique, and there are several published articles on the method (Bollen 1989; Chin 2010; Joseph et al. 2014). Two very important review papers on the PLS approach to Structural Equation Modeling (SEM) are Chin (1998b) and Tenenhaus et al. (2005). More recently, Lauro et al. (2018) presented recent developments in PLS-PM for the treatment of non-metric data, hierarchical data, longitudinal data and multi-block data. PLS-PM is made up of two elements: the "Measurement Model" (also called the "Outer Model"), which describes the relationships between the MVs and their respective LVs, and the "Structural Model" (also called the "Inner Model"), which describes the relationships between the LVs (Esposito et al. 2010; Roy et al. 2012; Eboli et al. 2018). As Lauro et al. (2018) write, "the PLS-PM approach consists of an iterative algorithm that computes the estimation of the LVs, measured by a set of MVs, and the relationships between them, by means of an interdependent system of equations based on multiple and simple regression. The idea is to determine the scores of the LVs through a process, that, iteratively, computes, first, an outer and, secondly, an inner estimation" (Tenenhaus et al. 2005). For this reason, the procedure is named partial (Aria et al. 2018).
In recent years, higher-order constructs (HOCs) have become very popular in the context of PLS-PM models (Edwards 2001; Jarvis et al. 2003; Johnson et al. 2012). According to Chin (1998b) and Chin et al. (2003), "HOC Models, also known as Hierarchical Models, are explicit representations of multidimensional constructs that exist at a higher level of abstraction and are related to other constructs at a similar level of abstraction completely mediating the influence from or to their underlying dimensions". Law et al. (1998) define "[...] a construct as multidimensional when it consists of a number of interrelated attributes or dimensions and exists in multidimensional domains. These dimensions can be conceptualized under an overall abstraction, and it is theoretically meaningful and parsimonious to use this overall abstraction as a representation of the dimensions." Typically, HOC Models are characterized by the number of levels in the model (often restricted to second-order models) (Rindskopf and Rose 1988) and by the different relationships between the HOCs and the lower-order constructs (LOCs) (reflective and formative relationships) (Edwards 2001; Jarvis et al. 2003; Joseph et al. 2014; Wetzels et al. 2009; Becker et al. 2012). As shown by Becker et al. (2012), "a higher (or second)-order construct is a general concept that is either represented (reflective) or constituted (formative) by its dimensions (lower (or first)-order constructs). Therefore, the relation between the higher and lower-order constructs is not a question of causality, but rather a question of the nature of the hierarchical LV, as the higher-order construct (the general concept) does not exist without its lower-order constructs (dimensions). If the higher-order construct is reflective, the general concept is manifested by several specific dimensions, themselves being latent (unobserved). If the higher-order construct is formative, it is a combination of several specific (latent) dimensions in a general concept" (Edwards 2001; Wetzels et al. 2009). As we can see in Fig. 2, there are four main types of HOC Models discussed in the literature (Jarvis et al. 2003; Wetzels et al. 2009) and used in applications (Johnson et al. 2012). These types depend on the relationship between the first-order latent variables and their manifest variables, and on the relationship between the second-order latent variables and the first-order latent variables (Becker et al. 2012).
Type I is the Reflective-Reflective Measurement Model, also known as the Second-Order Construct Type I, one of the models most frequently applied in SEM nowadays. Type II is the Reflective-Formative Measurement Model. According to Chin's clarification, the LOCs are reflectively measured constructs that do not share a common cause but rather form a general concept that fully mediates the impact on subsequent endogenous variables (Chin 1998b). In recent years, this type of model has become the most widely used in empirical applications, also thanks to the recent availability of appropriate modeling software (Roni et al. 2015). Several works describe different methods used to estimate reflective-formative HOCs (Becker et al. 2012; Ciavolino 2012). Type III is the Formative-Reflective Measurement Model, which reverses the roles found in the Reflective-Formative Type II described above. In this instance, the HOC is a common concept of several specific formative LOCs. Examples in the empirical literature are rather scarce, but a meaningful application of such a model could be firm performance as a reflective HOC measured by several different indices of firm performance as formative LOCs (Becker et al. 2012). Finally, Type IV is the Formative-Formative Measurement Model, the least frequently implemented in SEM. It is appropriate when both the HOC and the LOCs are formative constructs.
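The four types can be summarized as a small lookup table. The sketch below is only a mnemonic device; the names `HocType`, `HOC_TYPES` and `describe` are hypothetical, and the formative-formative case is numbered Type IV, following the order in which the four types are listed above.

```python
from typing import NamedTuple


class HocType(NamedTuple):
    loc_measurement: str  # how each LOC relates to its MVs
    hoc_relation: str     # how the HOC relates to its LOCs


# First word: first-order (LOC) measurement; second word: HOC-LOC relation.
HOC_TYPES = {
    "I": HocType("reflective", "reflective"),
    "II": HocType("reflective", "formative"),
    "III": HocType("formative", "reflective"),
    "IV": HocType("formative", "formative"),
}


def describe(type_label):
    """Return a short label such as 'Type II: reflective-formative'."""
    t = HOC_TYPES[type_label]
    return f"Type {type_label}: {t.loc_measurement}-{t.hoc_relation}"
```

For example, `describe("II")` yields the reflective-formative label discussed by Becker et al. (2012).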
Empirical studies report that, among HOC PLS-PM models, the reflective-formative and formative-formative types were the most frequently employed. This indicates the predominance of formative-type hierarchical LV models, even though clear guidelines on their use are lacking in the literature (Joseph et al. 2014). For PLS-PM, guidelines are mainly available for HOCs with reflective relationships (Lohmöller 2013; Wetzels et al. 2009; Wold 1982), even though Joseph et al. (2014) show that HOCs with reflective relationships at both the first and second order of the hierarchy represent only a minority (20%) of the models applied in MIS Quarterly. There is therefore a strong need for guidelines on the use and modeling of HOCs with formative relationships in PLS-SEM (Becker et al. 2012).

Estimation of Higher-Order Constructs in PLS-PM
Within the PLS-PM framework, two main approaches have been developed in the literature to estimate the parameters of HOCs: the Repeated Indicators Approach (Lohmöller 2013) and the Two Step Approach (Joseph et al. 2014; Wetzels et al. 2009). Recently, two other approaches have been presented: the Mixed Two Step Approach and the PLS Component Regression Approach (Cataldo et al. 2017). All these approaches are briefly described in this section. For a more detailed discussion and step-by-step illustration, see Becker et al. (2012). The Two Step Approach, as described by Cataldo et al. (2017), "consists of two phases: first, the LV scores of the LOCs are computed without the HOC (Rajala and Westerlund 2010); then, the PLS-PM analysis is performed using the computed scores as indicators of the HOCs. The implementation is not performed through a single PLS run; this implies that any Second-Order Construct, investigated in stage two, is not taken into account when estimating the LV scores in stage one" (Fig. 3).
The Mixed Two Step Approach begins by running PLS-PM with the indicators of the LOCs used as the MVs of the HOC, as in the Repeated Indicators Approach. In this way, the algorithm yields the scores of the LOCs. Then, the scores of the blocks become indicators of the HOC, and the PLS-PM algorithm is run again (Fig. 4). Finally, the PLS Component Regression Approach is described by Cataldo et al. (2017) as consisting of three steps (Fig. 5): "firstly, a HOC is formed of all the MVs of the LOCs; then, PLS-Regression is applied in order to obtain h components for each block; once h components have been obtained, they represent the MVs of the HOC and the PLS-PM algorithm is performed".
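The difference between the approaches lies mainly in what is used as the block of indicators for the HOC. The following minimal sketch shows only these block layouts, not the PLS-PM estimation itself; the function names, the `score_` prefix and the MV labels are hypothetical and chosen for illustration.

```python
def repeated_indicator_blocks(loc_blocks, hoc_name="HOC"):
    """Repeated Indicators layout: the HOC block reuses every MV of its LOCs."""
    blocks = {name: list(mvs) for name, mvs in loc_blocks.items()}
    blocks[hoc_name] = [mv for mvs in loc_blocks.values() for mv in mvs]
    return blocks


def two_step_stage_two_blocks(loc_blocks, hoc_name="HOC"):
    """Two Step layout, stage two: one LV score per LOC measures the HOC."""
    return {hoc_name: [f"score_{name}" for name in loc_blocks]}


# Hypothetical model with two LOCs and five MVs in total.
loc_blocks = {"I1": ["x11", "x12"], "I2": ["x21", "x22", "x23"]}

stage_one = repeated_indicator_blocks(loc_blocks)
stage_two = two_step_stage_two_blocks(loc_blocks)
```

Under these assumptions, the Mixed Two Step Approach would obtain the LOC scores from a run on the `stage_one` layout and then run the algorithm again on the `stage_two` layout, whereas the Two Step Approach computes the stage-one scores from a model without the HOC block.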
Some works in the literature already compare these approaches for the reflective-formative type of HOC in PLS-PM. In particular, Becker et al. (2012) compared the Repeated Indicators Approach and the Two Step Approach using a simulation study. The Two Step Approach has the advantage of estimating a more parsimonious model and proves suitable for the estimation of Second-Order Constructs, since it produces estimates that are better than those obtained through the Repeated Indicators Approach. However, the Two Step Approach has some important limitations related to the components of each block: only one component is chosen for each block, and this component has strong representative power but weak predictive power in the analysis of the HOC. For these reasons, the two other approaches, the Mixed Two Step Approach and the PLS Component Regression Approach, have recently been proposed to overcome these drawbacks. Since the aim of PLS-PM is to estimate the relationships between the LVs, these two approaches provide components that are at the same time representative of their blocks and predictive of the Second-Order Construct (Cataldo et al. 2017). Moreover, the PLS Component Regression Approach overcomes the problem of the number of components of the First-Order Constructs, since the number of components to be extracted can be chosen manually or according to a specific criterion (Cataldo et al. 2017). Cataldo et al. (2017) compared all four approaches through a simulation study with only one type of HOC, the reflective-formative type. Their findings suggest that the Mixed Two Step and PLS Component Regression Approaches are always the best choices, in terms of the bias and Mean Squared Error (MSE) of the estimates, when the researcher aims to study the formative relationships of the structural model with constructs measured reflectively by their indicators.
Starting from the simulation study of Cataldo et al. (2017), in this paper we propose an extended version of that study, in which the approaches are compared across all the types of HOC reported by Jarvis et al. (2003).

Simulation Study
The objective of this paper is to analyze and compare, within the same simulation design, the performance of the different approaches available in PLS-PM to estimate HOCs: the Repeated Indicators Approach, the Two Step Approach, the Mixed Two Step Approach and the PLS Component Regression Approach. In the literature there are works that use simulation studies to compare the performance of the first two approaches for modeling HOCs, for example Ciavolino and Nitti (2013) and Becker et al. (2012). In this simulation work, we have included all the types of HOCs discussed in the literature (Jarvis et al. 2003) in our conceptual framework, using different sample sizes, in order to understand the effect of the sample size on each type of HOC. Performance has been evaluated in terms of prediction accuracy, estimate bias and efficiency of each approach. A Monte Carlo simulation, carried out in the R language, is used to compare the performance of these approaches. The data generation process is consistent with the procedure described by Paxton et al. (2001) for a Monte Carlo SEM study. The same model structure and population parameters as in Cataldo et al. (2017) have been adopted. Firstly, we defined the structure of the model and the parameters of the population. Then, we randomly generated the Second-Order LV and, given the parameters and the error terms, computed the First-Order LVs. Finally, according to the outer parameters and error terms, we generated the First- and Second-Order MVs. The underlying population model used for the simulation consisted of one Second-Order LV (denoted by $II$) and four First-Order LVs (denoted by $I_1$, $I_2$, $I_3$ and $I_4$). [...] block depended on the number of components of the First-Order dimension extracted by the PLS Regression. The starting point was the generation of the First-Order LVs $I_q$ as random variables $I_q \sim N(0, 1)$.
The generated data were rescaled to the interval [1, 100].
For the formative structural model, the Second-Order Construct $II_j$ was computed as the product of the First-Order LVs $I_q$ and the path coefficient vector $\beta_{qj}$, plus an error component $\zeta_j$, according to Eq. (1):
$$II_j = \sum_{q} \beta_{qj} I_q + \zeta_j \tag{1}$$
For the reflective structural model, each First-Order construct $I_q$ was computed as the product of $II_j$ and the path coefficient $\beta_{qj}$, plus an error component $\zeta_q$, according to Eq. (2):
$$I_q = \beta_{qj} II_j + \zeta_q \tag{2}$$
The path coefficient vector ($\beta$) of the structural model was assumed to have all elements equal to 0.7.
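The two structural relations of Eqs. (1) and (2) can be sketched as follows. This is an illustrative sketch in Python, not the R code used for the study; the normal error term with standard deviation `error_sd`, the seed and the function names are assumptions, since only the path coefficient of 0.7 and the $N(0, 1)$ First-Order LVs are stated in the text.

```python
import random

BETA = 0.7  # path coefficient stated in the text


def formative_second_order(first_order, rng, error_sd=1.0):
    """Eq. (1): II = sum_q BETA * I_q + zeta, one value per observation."""
    n = len(first_order[0])
    return [
        sum(BETA * lv[i] for lv in first_order) + rng.gauss(0, error_sd)
        for i in range(n)
    ]


def reflective_first_order(second_order, n_first, rng, error_sd=1.0):
    """Eq. (2): I_q = BETA * II + zeta_q for each of the n_first LVs."""
    return [
        [BETA * s + rng.gauss(0, error_sd) for s in second_order]
        for _ in range(n_first)
    ]


rng = random.Random(42)
n_obs = 250
# Four First-Order LVs drawn from N(0, 1), as in the population model.
first = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(4)]
second = formative_second_order(first, rng)
reflected = reflective_first_order(second, 4, rng)
```

Setting `error_sd=0.0` removes the error term, which makes it easy to check that the formative construct is exactly the weighted sum of its dimensions.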
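For the formative measurement model, the LV was assumed to be generated by its own MVs, following Eq. (3):
$$I_q = \sum_{p} w_{pq} x_{pq} + \delta_q \tag{3}$$
For the reflective measurement model, the MVs were generated from the LVs, given the lambda coefficients, following Eq. (4):
$$x_{pq} = \lambda_{pq} I_q + \varepsilon_{pq} \tag{4}$$
where $x_{pq}$ is the $p$-th MV of the $q$-th block, $w_{pq}$ and $\lambda_{pq}$ are the outer weights and loadings, and the error term was distributed as a continuous uniform, $\varepsilon_{pq} \sim U(-1, 1)$. We estimated each approach with a centroid inner weighting scheme.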
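The measurement relations of Eqs. (3) and (4), together with the rescaling of the generated data to [1, 100], can be sketched as follows. The loadings, weights, seed and function names are hypothetical values chosen for illustration; only the $U(-1, 1)$ error of the reflective model and the target interval come from the text.

```python
import random


def reflective_mvs(lv, loadings, rng):
    """Eq. (4): x_p = lambda_p * LV + eps, with eps ~ U(-1, 1)."""
    return [[lam * v + rng.uniform(-1, 1) for v in lv] for lam in loadings]


def formative_lv(mvs, weights, rng, error_sd=0.0):
    """Eq. (3): LV = sum_p w_p * x_p + delta (error_sd is an assumption)."""
    n = len(mvs[0])
    return [
        sum(w * col[i] for w, col in zip(weights, mvs)) + rng.gauss(0, error_sd)
        for i in range(n)
    ]


def rescale(values, low=1.0, high=100.0):
    """Min-max rescaling of a variable to the interval [low, high]."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


rng = random.Random(7)
lv = [rng.gauss(0, 1) for _ in range(100)]
# Three reflective MVs with hypothetical loadings, rescaled to [1, 100].
mvs = [rescale(col) for col in reflective_mvs(lv, [0.8, 0.7, 0.6], rng)]
```

In the reflective case the LV is generated first and the MVs follow from it, whereas in the formative case the direction is reversed and the LV is built from its MVs.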
In order to assess and compare the estimation methods, we used the Relative Bias (RB) and the Standard Deviation (StD), following the indications in Cataldo et al. (2017).
The RB was computed as:
$$RB = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{\theta}_i - \theta}{\theta}$$
where $n$ represents the number of replications in the simulation, $\hat{\theta}_i$ is the parameter estimate for each replication and $\theta$ is the corresponding population parameter. The formula is equivalent to the mean RB (Reinartz et al. 2002).
The StD was computed as:
$$StD = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(\hat{\theta}_i - E(\hat{\theta})\right)^2}$$
where $E(\hat{\theta})$ is the mean of the estimates across the 500 simulated datasets. A positive RB indicates an overestimation of the true parameter, a negative RB an underestimation. The StD, instead, provides information on the efficiency of the estimates. For each scheme, we chose to compare the communality index, which can always be calculated for every type of relationship, both for the structural model and for the measurement model. Furthermore, for the reflective models the communality was equal to the AVE net of a constant, whereas for the formative models it approximated the redundancy index.
Table 1 reports the simulation results relating to the communality, relative bias and standard deviation. The results are grouped according to the estimation approach used, the sample size and the type of higher-order construct. The estimated communality was always significant for all approaches except the Repeated Indicators Approach, which had very low values. As can be seen in Fig. 6, the communality was always higher with the PLS-Regression approach, both as the sample size increased and for each type of model relationship. With the Mixed Two Step approach, it was always lower than with PLS-Regression but higher than with the Two Step Approach. Only for the reflective-reflective type did the communality index of the two approaches have the same value. In Table 1 we also note that, as the sample size increased, the variability of the estimates was lower for PLS-Regression than for the other approaches.
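The two evaluation measures defined above, RB and StD, can be computed from a list of replicated estimates as in the following minimal sketch. The function names are hypothetical, and the use of the $n - 1$ denominator in the standard deviation is an assumption of this example.

```python
import math


def relative_bias(estimates, true_value):
    """Mean of (estimate - true) / true over all replications."""
    n = len(estimates)
    return sum((e - true_value) / true_value for e in estimates) / n


def std_of_estimates(estimates):
    """Standard deviation of the estimates around their mean (n - 1 denominator)."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))


# Hypothetical path estimates from four replications, true value 0.7.
estimates = [0.66, 0.71, 0.69, 0.74]
rb = relative_bias(estimates, 0.7)
std = std_of_estimates(estimates)
```

A positive `rb` signals overestimation of the population parameter and a negative one underestimation, while a smaller `std` indicates more efficient estimates.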
In the "Appendix", Tables 5, 6, 7 and 8 report the simulation results relating to the path coefficients (computed as the average of the 500 replications), relative bias and standard deviation. The results are grouped according to the estimation approach used and the sample size. For each combination, the path coefficients, relative bias and standard deviation of the four parameters are reported. As can be seen in Tables 5, 6, 7 and 8, the estimated paths did not differ much across sample sizes, approaches and frameworks. The path estimates were always significant and, while remaining low overall, the variability of the estimates was lower when using the Mixed Two Step and PLS-Regression Approaches. Finally, the relative biases of the path coefficients are reported in detail in Tables 9, 10, 11 and 12, grouped by framework (see the "Appendix"). The Two Step Approach heavily underestimated all the path coefficients linking the First-Order Constructs with the Second-Order LV at all sample sizes. Looking at the newly proposed methods, we can see that for relatively small samples (n = 100; n = 250) the Mixed Two Step Approach worked best, producing relative biases close to zero, while the two methods performed equally well for large samples (n = 500; n = 1000).
The Mixed Two Step and PLS-Regression approaches demonstrated greater accuracy in predicting the higher-level construct, since their communality index was higher than that of the Two Step Approach. The difference is remarkable for all sample sizes. Therefore, these approaches were also the best option for predicting the Second-Order LV. Overall, the results show that these two methods were always the best choice in terms of the bias and MSE of the estimates.

Application Case Study
The approaches have also been compared in an empirical application in a sustainability context. Global Sustainability was conceived as a Third-Order Construct affecting the Second-Order dimensions, which in turn shaped the First-Order LVs, underlying [...] (Cataldo et al. 2020). For the sake of simplicity and for illustrative purposes only, the analysis focuses on the social area of Sustainability and its related goals, with 54 elementary indicators. This decision was made in order to facilitate the implementation of the different approaches. This section reports the main results of the analysis. The application case focuses on the Type IV category reported by Jarvis et al. (2003), a model resulting from the combination of formative LOCs and a formative HOC (Fig. 7).
The official data come from the United Nations "Sustainable Development Goals" database. The analysis was carried out for the 28 countries of the European Union. The data refer to the three-year period 2015-2017, and in particular to the year 2017 (for some variables, because data for that year were lacking, earlier years of the period were taken as the reference). Since the elementary indicators (EIs) have different units and scales, for the purposes of comparison they were first normalized to values between 0 and 1, where, for each EI, 0 was assigned to the least sustainable country and 1 to the most sustainable. The "plspm" package for the R programming language (Sanchez 2013) was used to perform the PLS-PM analysis, with formative indicators and the centroid scheme for the inner estimation. Table 2 reports the main quality measures of the four approaches.
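The normalization of the elementary indicators described above can be sketched as a min-max transformation. The function name and the `higher_is_better` flag are hypothetical; the flag is an assumption made here to handle indicators for which a larger raw value means a less sustainable country, so that 1 always denotes the most sustainable one.

```python
def normalize_indicator(values, higher_is_better=True):
    """Scale one indicator to [0, 1]; 1 marks the most sustainable country."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("indicator has no variation across countries")
    scaled = [(v - lo) / (hi - lo) for v in values]
    # Reverse the scale when a larger raw value means lower sustainability.
    return scaled if higher_is_better else [1.0 - s for s in scaled]


# Hypothetical raw values of one indicator for three countries.
raw = [10.0, 20.0, 30.0]
positive = normalize_indicator(raw)
negative = normalize_indicator(raw, higher_is_better=False)
```

After this step all indicators share the same range and orientation, which makes the blocks comparable before the PLS-PM analysis.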
The communality of the Mixed Two Step Approach and the PLS-Regression Approach was higher than that of the Repeated Indicators Approach and the Two Step Approach. Therefore, the amount of variability of the MVs captured by the social area construct of the SDGs was higher with these two methods than with the two classic methods known in the literature. It is important to note that the approaches differ in how they use the scores of the First-Order dimensions. To assess the significance of the path coefficients, Table 3 reports the value and significance of the structural coefficients linking the First-Order dimensions to the Second-Order construct.
The models estimated with the Repeated Indicators Approach and the Two Step Approach show some non-significant paths. In all the approaches, the goal "Good Health and Well-Being" proves to be the most influential among all the factors. This dimension is not significant for the Two Step Approach, precisely because of the way it is calculated in that approach.
The quality of the model (Tenenhaus et al. 2004) (Table 4) is quite high in all approaches except the Repeated Indicators Approach, and slightly higher when the components are estimated with the Mixed Two Step Approach and the PLS Component Regression Approach. Therefore, the application case, focused on the Type IV category reported by Jarvis et al. (2003), demonstrated how well the approaches fit the construct in terms of communality and path significance. We can see that the PLS-Regression approach works better than the other approaches reported in the literature.

Conclusions and Future Research
The debate over the reflective or formative approach is still open. In the literature it is possible to find considerations on this theme that sometimes conflict with each other. The goal of this work has been to present the approaches for PLS-PM parameter estimation in the presence of HOCs. The performance of the higher-order approaches has been compared in different modeling situations through a simulation study in which we tested all types of HOCs in our conceptual framework. The PLS-Regression and Mixed Two Step approaches, compared with the Repeated Indicators Approach and the Two Step Approach, have almost always shown more stable estimates, both when the sample size changes and when the relationships between the LVs and the MVs change. In all the cases considered, the communalities are higher when using the PLS-Regression Approach. The PLS-Regression Approach has shown a higher variability than the Mixed Two Step Approach for small samples. As the sample size increases, PLS-Regression improves in terms of estimates, becoming more stable than the Mixed Two Step Approach. In terms of relative error, the results vary with the type of relationship linking the constructs. The Mixed Two Step and PLS Regression approaches are always the best choices in terms of bias and MSE of the estimates. They slightly outperform the Repeated Indicators Approach and the Two Step Approach in terms of prediction accuracy. In general, with large samples these two methods have the same performance. More precisely, the PLS Regression Approach would seem to be the best because, as shown in the simulations, it works better than the Mixed Two Step Approach.
In the empirical example on Sustainability, we have verified that, with small samples, if we want to study the formative relationships of the structural model with constructs measured formatively by their indicators, the Mixed Two Step Approach and the PLS Component Regression Approach are the most powerful methods in terms of model quality and path significance. It is interesting to note that it is not possible to define strict rules for choosing between a reflective and a formative model. To decide on the mode, researchers must consider the latent construct and the indicators (observable or observed) at their disposal. The simulation study we have performed considers a simple situation with only one Second-Order LV and four First-Order LVs. In the future we want to study more complex models that also include Third- or Fourth-Order LVs, with more MVs and LVs. In this way we will see how the different approaches perform as the complexity increases. We also intend to test the performance of higher-order PLS path models with qualitative external information, to take into account characteristics of the units or of the variables (Ciavolino et al. 2015). In our application, for example, we could have considered the name of the EU country as qualitative external information, in order to compare the performance of the different nations.
Funding Open access funding provided by Università di Foggia within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.