Modeling indices using partial least squares: How to determine the optimum weights?

Indices are often used to model theoretical concepts in economics and finance. Beyond the econometric models used to test the relationships between these variables, partial least squares path modeling (PLS-PM) allows the study of complex models, but it is an estimator that is still in its infancy in economics and finance research. Thus, the use of PLS-PM for composite analysis needs to be explored further. As one such attempt, this paper is focused on the determination of the indices’ optimum weights. For this purpose, the effects of the market potential index (MPI) on foreign direct investment (FDI) and gross domestic product (GDP) were analysed by implementing different weighting schemes. The assessment of the model shows that PLS Mode B leads to better model fit.


Introduction
Socio-economic position (SEP) is a widely used concept in epidemiological research that is used to determine the well-being of societies (Howe et al. 2008). It refers to physical resources, social resources, and status within a social hierarchy (Krieger 2001). Therefore, scholars often use indices to assess relative SEPs within a population. Moreover, national governments are interested both in the positions and potentials of their economies in the world (Bobek and Vide 2005). Thus, they also benefit from the use of indices. The creation of an index requires the use of formative indicators (Diamantopoulos and Winklhofer 2001). Such composite indicators allow the public to easily comprehend the information provided (Freudenberg 2003). They are widely used in the field of economic and business statistics to clarify their political importance and operational adequacy in decision making (Munda and Nardo 2003).
Although the indicators of SEP tend to be monetary measures, such as income or consumption expenditure, more complex conceptualizations of SEP are needed to cover other aspects of well-being (Howe et al. 2008). Therefore, scholars from economics and finance attempt to develop increasingly complex indices and build more complex models to develop socioeconomic theories.
Moreover, when constructing indices from a set of variables, scholars need to decide on the weights to assign to each indicator. Principal Components Analysis (PCA) has been one of the methods recommended for determining weights for components (Filmer and Pritchett 2001). However, beyond its complexity, PCA is considered problematic when used with the discrete data included in the indices, and the first principal component often explains only a low proportion of the total variation in asset data.
Even though some other methods have been suggested, none has proved simpler or more suitable for discrete data (Howe et al. 2008). Therefore, latent variable approaches have been proposed (Ferguson et al. 2003; Montgomery and Hewett 2005). One recent and useful method for this purpose is partial least squares (PLS), which will be discussed in detail in this paper.
As a widely used variance-based SEM technique, PLS enables the researcher to analyze complex relationships between variables while also examining their direct, indirect and moderating relationships (Nitzl et al. 2016). PLS appears to be an excellent method of studying emergent variables (as opposed to latent variables). Emergent variables are defined by their indicators; in other words, they are composites of variables (Benitez et al. 2020; Henseler and Schuberth 2020; Reise 1999). They are also termed 'composite constructs' (Benitez et al. 2018), 'aggregate constructs' (Edwards 2001), and 'formative constructs' (Petter et al. 2007). However, in this paper, we follow the suggestion of Benitez et al. (2020) and use the term emergent variables to emphasize that the construct emerges from the indicators. An emergent variable is a composite with the additional property that it follows the axiom of unity (see Henseler and Schuberth 2021). If an emergent variable has effects in a larger model, the emergent variable mediates all the effects of its elements (whereas not all composites do that). One may think of the analogy of common factors and latent variables and ask the question "Why do we need the term latent variable?" The answer is that all latent variables are common factors, but not all common factors are latent variables. Latent variables are those common factors that obey the axiom of local independence (not all common factors do that).
The PLS literature before around 2015 mostly uses the term "PLS-SEM", which reflects an outdated understanding of what formative and reflective mean. This literature associates reflective with PLS Mode A and formative with PLS Mode B. Consequently, the model to be analyzed does not exactly match the analyzed model (Henseler 2021). This is probably why proponents of "PLS-SEM" discourage the assessment of model fit. As a result, the term PLS path modeling (PLS-PM) was preferred in this study rather than PLS-SEM.
Owing to its alleged ability to model both factors and composites, PLS-PM has been termed a "silver bullet" (Hair et al. 2011) and it is widely used in various fields of business administration research including information systems (Marcoulides and Saunders 2006), strategic management (Hair et al. 2012), marketing (Hair et al. 2012), operations management (Peng and Lai 2012), human resource management (Ringle et al. 2020), and finance (Avkiran and Ringle 2018).
Whereas factors can be used to model latent variables in behavioral research, including attitudes and personal traits, composites can be applied to model strong concepts (Höök and Löwgren 2012) such as social and economic indices. Unlike in measurement models, in the composite model the indicators do not cause the construct, but combine to compose it (Benitez et al. 2020). Composite models do not suffer from factor indeterminacy, and the indicator weights specify, for each observation, how to determine the location parameter (Henseler and Schuberth 2021). To specify and assess composite models, Jörg Henseler and Theo K. Dijkstra developed confirmatory composite analysis (CCA) as an innovative set of procedures (Henseler and Schuberth 2020).
Any choice of weights can be questioned and debated in public discussions (Anand and Sen 1997).By focusing on the composites, this paper illustrates the use of PLS-PM as an analytical tool for determining the weights of indices by comparing alternative weighting schemes that are available in PLS.

Foundations of PLS-PM and composite analysis
In recent years, structural equation modeling (SEM) has become an important statistical tool used especially in the social and behavioral sciences (Benitez et al. 2020). Estimators for SEM can be classified as either covariance- or variance-based. While the first estimates model parameters using the empirical variance-covariance matrix, the second estimates the model parameters using proxies based on linear combinations of observed variables (Henseler et al. 2016). On the other hand, SEM via PLS is an approach based on regression, which minimizes the residual variances of variables (Hair et al. 2011; Merli et al. 2019).
In scientific disciplines, theoretical constructs are the building blocks of theories (Henseler 2017). In general, most theoretical constructs can only indirectly be measured through observable indicators, and no single indicator can capture the full theoretical meaning of an underlying construct (Steenkamp and Baumgartner 2000). Thus, multiple indicators are necessary for measuring a theoretical construct. Therefore, researchers may benefit from SEM to examine the relationships between constructs. SEM is widely preferred by researchers since it permits them to graphically model and estimate parameters for relationships between theoretical constructs and to test behavioral science theories (Bollen 1989).
Based on their estimation objectives, SEM estimators can be categorized as covariance-based and variance-based (Henseler et al. 2009). As a variance-based estimator for SEM, PLS-PM can estimate linear, non-linear, recursive, and non-recursive structural models (Dijkstra and Henseler 2015; Dijkstra and Schermelleh-Engel 2014).
PLS-PM can be employed for modeling two types of constructs: emergent variables and latent variables (Benitez et al. 2020). Figure 1 shows a composite model in which the hexagon represents an emergent variable, and the rectangles stand for the observable variables. Latent variables are those that are not directly observable but are inferred through a measurement model from directly measured observed variables (Hair et al. 2016). On the other hand, emergent variables are constructs emerging from the indicators; in other words, they are defined by their indicators (Reise 1999). The composite model can be employed to operationalize these concepts (Henseler 2017). The relationship between an emergent variable and its indicators is definitional rather than causal (Henseler 2015). In the literature, the use of indices has gained momentum as an alternative to traditional psychometric measurement models (Diamantopoulos and Winklhofer 2001). Since the indices are composites, PLS is expected to be a useful tool for construct analysis.

Application areas of PLS vs. econometric models
In the field of economics and finance, econometric analysis is widely used to analyze the relationships between constructs such as Gross Domestic Product (GDP) (Bajrami et al. 2022), Foreign Direct Investment (FDI) (Song et al. 2021), agricultural employment (Jiang et al. 2022), the unemployment rate (Drachal 2020), total entrepreneurship activity (Gautam and Lal 2021) and those from indices such as the economic inequality index (Brida et al. 2021), the global competitiveness index (Sergi et al. 2021), the green finance index (Ye et al. 2022), and the digital index (Litvintseva and Karelin 2020).
Since the performance of econometric models is sensitive to which analytical method is used (Clements and Hendry 1998), it might pay off in some situations to consider the use of PLS-PM. Besides estimating complex models that have many latent and/or emergent variables and handling both reflective (if consistent PLS is employed, see Dijkstra ) and composite models, PLS-PM avoids some small sample size problems (Henseler et al. 2009). On the other hand, the visual interface of PLS makes it easier for many users to understand even very complex models and to interpret their results. Moreover, since different econometric models may provide different forecasting accuracies (Song et al. 2003), PLS-PM can also be used as a triangulation for model testing. In addition to cross-sectional studies, PLS-PM has also been proposed for longitudinal studies to analyze unobservable and complex variables over time (Roemer 2016).
Various ways of forming composites in PLS: Guidelines for determining optimum weights for indices
PLS-PM includes three approaches to measurement models: PLS Mode A, PLS Mode B, and Sum Scores (Yuan et al. 2020). In PLS Mode A, the weights are calculated as correlations between the observed variables and the corresponding construct scores. In contrast, in PLS Mode B, the weights are calculated as the estimated coefficients of an ordinary least squares regression from the construct scores on the corresponding observed variables (Schuberth et al. 2021). If the constructs are conceptualized as composites, then regression weights (PLS Mode B) should be preferred. On the other hand, if a model includes common factors, the use of correlation weights (PLS Mode A) is recommended in combination with a correction for attenuation as provided by PLSc (Dijkstra and Henseler 2015a,b).
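The difference between the three weighting schemes can be illustrated with a minimal NumPy sketch. It assumes that the indicators of one block are available as a standardized matrix `X` and that `z` is a standardized proxy of the construct scores supplied by the inner model; the function names `standardize` and `outer_weights` are hypothetical and do not correspond to any PLS software. Rescaling the weights so that the composite has unit variance is an assumption made here for comparability, and the data are simulated.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit (population) variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def outer_weights(X, z, mode):
    """Return the weights of one indicator block (illustrative sketch).

    X    : (n, k) standardized indicators (assumed input)
    z    : (n,) standardized proxy of the construct scores (assumed input)
    mode : "A" (correlation weights), "B" (regression weights) or "sum"
    """
    n = X.shape[0]
    if mode == "A":
        # Correlation of each indicator with the construct proxy
        w = X.T @ z / n
    elif mode == "B":
        # OLS coefficients from regressing the proxy on the indicators
        w, *_ = np.linalg.lstsq(X, z, rcond=None)
    elif mode == "sum":
        # Sum scores: every indicator receives the same weight
        w = np.ones(X.shape[1])
    else:
        raise ValueError("mode must be 'A', 'B' or 'sum'")
    # Rescale so that the composite X @ w has unit variance
    return w / (X @ w).std()

# Simulated, correlated indicators (not the data used in this paper)
rng = np.random.default_rng(0)
X = standardize(rng.normal(size=(200, 4)) + rng.normal(size=(200, 1)))
z = standardize(X @ np.array([0.6, 0.1, 0.3, 0.2]) + rng.normal(size=200))
for mode in ("A", "B", "sum"):
    print(mode, np.round(outer_weights(X, z, mode), 3))
```

In a full PLS-PM run the proxy `z` is updated iteratively from the neighbouring constructs; the sketch only shows a single outer-weight update for a given proxy.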
Consequently, scholars tend to choose weighting schemes mostly based on the epistemic relationship between the construct and its indicators, i.e., the choice is mostly based on whether the measurement model is formative or reflective (Diamantopoulos and Winklhofer 2001; Henseler 2010). Often, scholars simply use PLS Mode A for reflective measurements and PLS Mode B for formative measurements of latent variables. However, this oversimplification has cast doubt on PLS-PM's suitability (Aguirre-Urreta and Marakas 2014a, b; Rigdon et al. 2014). In line with the criticism of Rönkkö et al. (2016), Schuberth et al. (2021, p. 108) conclude that 'both Mode A and Mode B produce inconsistent parameter estimates for latent variable models such as the reflective measurement and causal formative measurement models.' Instead, PLS Mode B and PLS Mode A should be understood as different options for determining composite weights. Compared to PLS Mode B, PLS Mode A has the additional constraint that the weight estimates are proportional to the loadings of a common factor model (Henseler and Schuberth 2022).
This study aims to develop guidelines for determining weighting schemes in line with the principle of Occam's razor: "as parsimonious as possible, as flexible as practical" (Sharma et al. 2019). Many social science scholars give importance to parsimony in theory development since it 'explains much by little' (Benitez et al. 2020; Sharma et al. 2019).
Once a model is estimated, its fit should be assessed to examine whether the model is consistent with the collected data. For this purpose, a bootstrap-based test that relies on a discrepancy measure, such as the SRMR, can be used (Schuberth et al. 2020). To test the hypothesis of exact model fit, the discrepancy measure should be compared with the 95% or 99% quantiles of its corresponding distribution as obtained through bootstrapping (Henseler et al. 2016). The null hypothesis stating that the indicator population correlation matrix equals the model-implied counterpart is rejected if the discrepancy measure exceeds the 95% or 99% quantiles of its reference distribution (Schuberth et al. 2020).
Based on the explanations above, guidelines are developed to determine the weighting scheme for composites that exhibit an optimum alignment with the data. Those are shown in Table 1. We follow each step of our proposed guidelines to determine the weights of indices as composites.
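The decision rule of this test can be sketched in a few lines of Python. The sketch assumes one common definition of the SRMR, namely the root mean square of the differences between the lower-triangular elements (diagonal included) of the empirical and the model-implied correlation matrix; software packages may define it slightly differently. The function names are hypothetical, and the bootstrap values of the discrepancy measure under the null hypothesis are assumed to be supplied by the estimation software.

```python
import numpy as np

def srmr(R_emp, R_model):
    """Root mean square residual over the lower triangle (diagonal included)."""
    R_emp = np.asarray(R_emp, dtype=float)
    R_model = np.asarray(R_model, dtype=float)
    rows, cols = np.tril_indices(R_emp.shape[0])
    resid = R_emp[rows, cols] - R_model[rows, cols]
    return float(np.sqrt(np.mean(resid ** 2)))

def exact_fit_rejected(discrepancy, bootstrap_values, level=0.95):
    """Reject exact fit if the discrepancy exceeds the chosen quantile.

    bootstrap_values : discrepancies from the reference distribution
                       (assumed to be produced elsewhere)
    level            : 0.95 for HI95 or 0.99 for HI99
    Returns (rejected, critical_value).
    """
    critical = float(np.quantile(bootstrap_values, level))
    return bool(discrepancy > critical), critical

# Toy matrices for illustration only (not results from this paper)
R_emp = [[1.0, 0.5], [0.5, 1.0]]
R_model = [[1.0, 0.3], [0.3, 1.0]]
print(srmr(R_emp, R_model))
```

The same decision function applies to other discrepancy measures, since only the observed value and its reference distribution enter the comparison.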

Description of the example
We provide an illustrative example to test various weighting schemes in a macroeconomic construct using PLS, where the Market Potential Index (MPI) was assumed to be a predictor of foreign direct investment (FDI) and gross domestic product (GDP). Figure 2 shows the proposed model to be estimated and tested.
As the global economy becomes increasingly interconnected and markets become increasingly accessible, international expansion has become increasingly imperative for companies. Thus, managers need objective criteria to identify, evaluate and select suitable foreign markets (Ozturk et al. 2015). To satisfy this need, the Market Potential Index (MPI) was proposed by Cavusgil (1997) initially for Western firms to assess and compare the The raw data of these variables were rescaled into a scale of 1-100 (Sakarya et al. 2007). Next, the weights of these indicators were determined based on a Delphi process among international business scholars and professionals (Sheng and Mullen 2011). With the aim of guiding U.S. companies in their expansion into international markets, the index was created for 26 countries classified as "emerging markets" before 2014. By that year, it had been calculated for 87 countries that were in the top 100 performers in terms of total GDP, that had a population of at least one million, and for which reliable data was available for the majority of the indicators used (GlobalEdge 2021). The index measures have been published every year on the GlobalEdge web site.
Based on a dynamic panel data method, Carstensen and Toubal (2004) showed that market potential has a substantial positive effect on FDI. FDI is a critical component of countries' economies. It has been assumed that FDI leads to increasing income, production, prices, employment, economic growth, development and general welfare of the recipient country (Kok and Ersoy 2009). Moreover, beyond just resources, such investment also brings technology, access to markets, and improvements in human capital (Stiglitz 2000). Since FDI is widely regarded as an aggregation of capital, technology, marketing, and management, many countries consider it an important part of their economic development strategies (Cheng and Kwan 2000).
Another core indicator in judging the position of a country's economy over time or relative to that of other countries is GDP (Van den Bergh 2009). Since GDP was first suggested as an indicator of progress seven decades ago, it has come to be seen as an old-fashioned metric for today's world, and it has been criticized because it ignores social costs, environmental impacts, and income inequality (Costanza et al. 2014).
Acknowledging the above shortcomings, we consider two basic indicators of economic progress by proposing two hypotheses:
H1: MPI has a positive effect on FDI.
H2: FDI has a positive effect on GDP.
Although previous studies that adopted PLS-PM have considered more complex models that included more constructs, second-order constructs and/or moderation effects (Benitez et al. 2020), the hypothesized model seems suitable for our purposes since (1) we aim to provide a guideline for using PLS-PM in confirmatory and explanatory economic research using indices; (2) the considered model with both its latent and emergent variables is a good example of how PLS-PM can leverage its full capacities; and (3) we use endogenous variables with single-indicator constructs.
The composite model is employed to operationalize the indices (Market Potential Index, in the present example). In doing so, Market Potential is assumed to be composed of market size, market growth rate, market intensity, market consumption capacity, commercial infrastructure, economic freedom, market receptivity, and country risk. Scholars and analysts of both economics and finance might also consider other approaches, such as export market potential (Sheng and Mullen 2011), the Bitcoin market potential index (Hileman 2015) or other industry-specific indexes. The hexagon shown in Fig. 2 represents the exogenous construct (here: a composite) and the rectangles represent the elements forming the construct (while the dashed rectangles represent the elements eliminated after the first composite analysis).
On the other hand, the endogenous variables in the model, namely FDI and GDP, were modeled as single-indicator constructs.

Data collection and preparation
To test the hypotheses, we collected secondary data from the following online databases: (1) the scores of the eight Market Potential Index indicators from the GlobalEdge web site (https://globaledge.msu.edu/mpi); (2) FDI data (net inflows) from the database of the World Bank (https://data.worldbank.org/indicator/BX.KLT.DINV.CD.WD?display=default); and (3) GDP data from the database of the World Bank (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?display=default).
Moreover, we excluded data from 2020 and 2021 to eliminate the lockdown effect of the COVID pandemic on economies. Since the MPI calculation method was changed after 2015, we considered data for the years 2016, 2017, 2018 and 2019. We used the averages of the 4-year data for each indicator to smooth any potential bias. Next, we aggregated all the data and we ended up with a list of 82 countries and complete data for all the indicators.
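The averaging step can be sketched as follows. This is a minimal illustration that uses only the Python standard library; the record layout, the function name `four_year_averages`, the indicator names and the country codes are hypothetical, the numbers are made up, and dropping every country with any missing value is an assumption about how completeness was enforced.

```python
from statistics import mean

YEARS = (2016, 2017, 2018, 2019)

def four_year_averages(records, indicators, years=YEARS):
    """Average each indicator over the given years, per country.

    records    : dict mapping (country, year) -> dict of indicator values
    indicators : names of the indicators that must all be present
    Countries with a missing value in any year are dropped (assumption).
    """
    countries = sorted({country for country, _ in records})
    averaged = {}
    for country in countries:
        rows = [records.get((country, year), {}) for year in years]
        complete = all(row.get(name) is not None
                       for row in rows for name in indicators)
        if complete:
            averaged[country] = {name: mean(row[name] for row in rows)
                                 for name in indicators}
    return averaged

# Made-up placeholder values, not the study's data
toy = {("AAA", year): {"market_size": float(i + 1)}
       for i, year in enumerate(YEARS)}
toy[("BBB", 2016)] = {"market_size": 5.0}   # incomplete series, will be dropped
print(four_year_averages(toy, ["market_size"]))
```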

Estimation of the weights in the composite model
Various software packages, including PLS-Graph, SmartPLS, WarpPLS, XLSTAT-PLS and ADANCO, can be used to estimate the parameters of a PLS path model (Benitez et al. 2020). However, most of them do not allow testing a model's goodness of fit. In this study, we used ADANCO 2.0.1 Professional for Windows (http://www.composite-modeling.com/) to estimate the composite model.
Firstly, we needed to set a dominant indicator in the composite model. The dominant indicator is used to dictate the orientation of the construct and it is expected to positively correlate with the construct. Since FDI and GDP were specified as single-indicator constructs, we only needed to determine a dominant indicator for MPI. To select that, face validity can be used (Benitez et al. 2020). Since in the original MPI calculation Market Size had received the highest weight (25/100), in our example, we chose it as the dominant indicator for MPI. Next, we used the factor weighting scheme for inner weighting, and statistical inferences were based on the bootstrap procedure, relying on 4,999 bootstrap runs as recommended by Henseler et al. (2016). It is important that each bootstrap sample has the same size as the original sample. ADANCO takes this into account automatically.
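The resampling logic behind such a bootstrap procedure can be sketched as below. This is not ADANCO's implementation: the function names are hypothetical, `statistic` stands for any estimator that is re-run on each resample, and the percentile interval is only one of several ways to base inference on the bootstrap draws. The sketch draws exactly as many rows as the original sample contains, with replacement.

```python
import numpy as np

def bootstrap_estimates(data, statistic, runs=4999, seed=0):
    """Re-estimate `statistic` on `runs` resamples of the rows of `data`.

    Each resample draws n rows with replacement, where n is the original
    sample size, so every bootstrap sample matches the original in size.
    """
    data = np.asarray(data)
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(runs):
        idx = rng.integers(0, n, size=n)   # row indices, sampled with replacement
        draws.append(statistic(data[idx]))
    return np.array(draws)

def percentile_interval(draws, level=0.95):
    """Two-sided percentile interval from the bootstrap draws."""
    tail = (1.0 - level) / 2.0
    return tuple(np.quantile(draws, [tail, 1.0 - tail], axis=0))

# Usage with simulated data: bootstrap the correlation of two columns
rng = np.random.default_rng(42)
sample = rng.normal(size=(82, 2))
draws = bootstrap_estimates(sample, lambda d: np.corrcoef(d, rowvar=False)[0, 1])
print(percentile_interval(draws))
```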
When we ran the analysis, the model produced insignificant factor loadings for market growth rate, market intensity, market receptivity and economic freedom.After elimination of these elements, we ran it again.
As the next step, we tried PLS Mode A, PLS Mode B and Sum Score as the weighting schemes for the constructs. Since both FDI and GDP are single-indicator constructs, their loadings equal one and thus remain unaffected by a change in weighting scheme. Therefore, we focus on the weighting schemes for MPI.

Assessment of the composite model
The steps followed to assess the composite model are summarized in Table 2. The assessment started with the evaluation of the overall fit of the model. Table 3 summarizes the values of the discrepancy measures as well as the 95% and 99% quantiles of their corresponding reference distribution for our example. Only for PLS Mode B was the value of the SRMR below the recommended threshold value of 0.080 (Henseler et al. 2014). Next, when we considered whether the discrepancy measures were below the 95% quantile of their reference distribution (HI95) or at least below the 99% quantile (HI99), we observed that only in PLS Mode B do the assessment criteria perform well.
Even though composite models are typically estimated by Mode B (regression weights) in PLS-PM (Benitez et al. 2020), we evaluated all the weighting schemes on the same basis. Table 4 shows the weights and loadings obtained through PLS Mode A, PLS Mode B and Sum Scores as weighting schemes. It is important to note the difference between weights and loadings. While weights represent the degree of importance of each indicator (ingredient) to the construct, composite loadings represent the correlation between the indicator and the corresponding emergent variable (Cenfetelli and Bassellier 2009). After evaluating the significance of the weight and loading estimates, scholars can decide whether or not to keep non-significant indicators for a construct's content validity (Hair et al. 2016). Based on this explanation, we observe that the loadings of the remaining four indicators (Market Size, Market Consumption Capacity, Commercial Infrastructure, and Country Risk) are significant in all the weighting schemes.
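The relation between weights and composite loadings can be made concrete with a short sketch. It assumes standardized indicators with correlation matrix `R` and a weight vector `w`; because the covariance between indicator j and the composite equals the j-th element of `R @ w`, and the composite variance equals `w' R w`, the loadings follow directly once the weights are scaled to unit composite variance. The function name `composite_loadings` is hypothetical and the data are simulated.

```python
import numpy as np

def composite_loadings(R, w):
    """Correlations between the indicators and the composite built with w.

    R : (k, k) correlation matrix of the indicators
    w : (k,) weights; rescaled internally so the composite has unit variance
    """
    R = np.asarray(R, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / np.sqrt(w @ R @ w)   # enforce w' R w = 1
    return R @ w

# Simulated indicators and arbitrary weights, for illustration only
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 3)) + rng.normal(size=(150, 1))
R = np.corrcoef(X, rowvar=False)
w = np.array([0.5, 0.3, 0.2])
print(np.round(composite_loadings(R, w), 3))
```

The sketch shows why an indicator with a small weight can still have a sizeable loading when it is correlated with the other indicators of the block.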

Discussion and further research
Estimating and assessing models for indices as composites in the fields of economics and finance poses a challenge for scholars employing SEM, particularly in determining a suitable weighting scheme. To overcome this difficulty, this study suggests employing PLS-PM and provides guidelines to determine the optimum weighting scheme to be used in the analysis. For this purpose, we propose a five-step approach to choosing between PLS Mode A, PLS Mode B, and Sum Scores. This enables researchers to determine the optimum weights for the indices in their analyses.
We used widely known constructs in economics and applied each step we proposed in the guidelines. We compared the outcomes of three weighting schemes: PLS Mode A, PLS Mode B, and Sum Scores. Even though an isolated composite is not identified, as soon as an emergent variable is embedded in a wider model (i.e., it is studied in its nomological net), there is only one possible set of weights that fulfills the requirements of an emergent variable.
Based on the results from the illustrative example that included composites, and similar to the findings of Schuberth et al. (2020) and Becker et al. (2012), PLS Mode B produced not only consistent estimates but also acceptable discrepancy measures. Therefore, it was preferable to the other weighting schemes. On the other hand, in the case of a large degree of multicollinearity among the indicators of a composite, the use of PLS Mode A might be worthwhile (Schuberth et al. 2020).
Since any preliminary study is limited by its design, we recommend future research to conduct more complex index studies.For instance, researchers could attempt to develop higher-order models and/or test the relationships between various indices.

Conclusions
Indices are often used in economics and finance research to test various types of theoretical concepts, such as market potential, as well as to estimate and test their relationships. PLS-PM is a useful estimator for this purpose. However, determining optimum weights for indices remains an open issue in PLS-PM analyses. This study provides guidelines on PLS-PM for composite analysis by testing three weighting schemes (PLS Mode A, PLS Mode B, and Sum Score). To achieve this, an example was provided with an estimation of the effect of MPI on FDI and GDP. Two hypotheses were developed based on the extant literature. As a result, the key contribution of this study to the methodological literature on empirical research in economics and finance was the attempt to determine the optimum weights of indices. Further research is recommended to test how PLS-PM would perform as an alternative to econometric analyses and how to transfer modeling tactics to panel data.
The empirical study revealed that whereas PLS Mode A and sum scores yielded a significant model misfit, the model's goodness of fit was acceptable when PLS Mode B was employed. Therefore, based on the results, we recommend using PLS Mode B to determine weights in PLS-PM studies that involve the use of indices in economics and finance.

Table 1
Guidelines to determine weights for indices
Step 1. Estimate the model based on the extant literature
Step 2. Assess the overall model fit for different weighting schemes
Step 3. Update the model with elimination or combination of variables
Step 4. Keep the weighting scheme that yields insignificant model misfit
Step 5. Extract construct scores

Fig. 2 Modeling the Market Potential Index and its impact on foreign direct investment (FDI) and gross domestic product (GDP)

Table 2
Steps to assess common factor and composite models
HI95
d_ULS < HI95 or d_ULS < HI99
d_G < HI95 or d_G < HI99

Table 3
Results of the Composite Analysis