Generalized properties for Hanafi–Wold’s procedure in partial least squares path modeling

Partial least squares path modeling is a statistical method for analyzing complex dependence relationships among several blocks of observed variables, each represented by a latent variable. The computation of latent variable scores is an essential step of the method, achieved through an iterative procedure called here Hanafi–Wold’s procedure. The present paper generalizes properties of this procedure that are already known in the literature, and derives additional convergence results from them.


Introduction
Partial least squares path modeling (PLS-PM), originally developed by Wold (1982, 1985), is a powerful multivariate statistical method for analyzing relationships among several blocks of observed variables, usually called manifest variables (MVs). Each block is assumed to represent an unobserved variable, the so-called latent variable (LV).
PLS-PM is considered an alternative to covariance structure analysis (Jöreskog 1970), traditionally used for parameter estimation in Structural Equation Modeling (SEM). However, PLS-PM rests on a statistical approach quite different from covariance structure analysis for SEM, and leads to different parameters being estimated. According to the most recent literature (Dijkstra 2017), it is better defined as an estimation method for composite-based path modeling.
PLS-PM focuses on LV score computation, accounting for variances of MVs and correlations among LVs. It has a number of attractive features (e.g., great flexibility, robustness, and few demands concerning distributional assumptions and identification requirements) and has become increasingly important in many areas (Esposito Vinzi et al. 2010; Hair et al. 2017).
The fundamental principle of PLS-PM is that all the information concerning the relationships between K blocks of observable variables $X_1, X_2, \ldots, X_K$ is assumed to be conveyed by K composites $z_1, z_2, \ldots, z_K$. Each composite $z_k$ is a proxy for the corresponding LV $\xi_k$, which cannot be directly observed and is assumed to represent the block of $p_k$ MVs $X_k = [x_{k,1}, x_{k,2}, \ldots, x_{k,p_k}]$. The same N observations are measured on all blocks of MVs, stacked in the matrices $X_1, X_2, \ldots, X_K$.
In PLS-PM the investigator starts with a conceptual picture of the model. In the conventional graphical representation of the conceptual model (see Fig. 1), the MVs $x_{k,j}$ ($1 \le j \le p_k$; $1 \le k \le K$) are represented by squares and the LVs by circles. Using prior substantive knowledge and intuition, the investigator specifies the links between LVs (represented by arrows) when these LVs are assumed to be related. The investigator therefore defines which LVs are connected to others and which are not.
Once the conceptual model is designed, the design of the formal model and the estimation algorithm are straightforward. Two models are considered in PLS-PM. The first one, called the measurement model, relates the MVs to the LVs. Each MV $x_{k,j}$ is related to its LV $\xi_k$ by a simple regression ($1 \le j \le p_k$; $1 \le k \le K$):

$$x_{k,j} = \pi^0_{k,j} + \pi_{k,j}\,\xi_k + \varepsilon_{k,j}. \tag{1}$$

The only hypothesis made on the model in Eq. (1) is called the predictor specification condition:

$$E(x_{k,j} \mid \xi_k) = \pi^0_{k,j} + \pi_{k,j}\,\xi_k. \tag{2}$$

This condition implies that the residuals $\varepsilon_{k,j}$ have zero mean and are, moreover, mean independent of the LV $\xi_k$. The error terms of each block are allowed to correlate freely.
The second model, called the structural model, specifies the relationships between LVs. An LV that never appears as a dependent variable is called exogenous; otherwise, it is called endogenous. In the structural model, a dependent LV $\xi_k$ is related by a multiple regression model to the corresponding predictor LVs $\xi_{k'}$, $k' \in J_k$, where $J_k = \{k' : \xi_k \text{ is predicted by } \xi_{k'}\}$:

$$\xi_k = \beta_{k0} + \sum_{k' \in J_k} \beta_{kk'}\,\xi_{k'} + \zeta_k, \tag{3}$$

where $\zeta_k$ denotes the residual term, on which the usual hypotheses implied by the predictor specification condition are made.
The estimation of the parameters of the models in Eqs. (1) and (3) is based on the PLS-PM algorithm, which proceeds in three stages. The first stage computes the LV scores $z_1, z_2, \ldots, z_K$. Each $z_k$ is constructed as a linear combination of its indicators, $z_k = X_k w_k$, and is standardized ($\mathrm{var}(z_k) = 1$). The second stage estimates the model parameters in Eqs. (1) and (3), i.e., the parameters $\pi_{k,j}$ ($1 \le j \le p_k$, $1 \le k \le K$) and $\beta_{kk'}$. Finally, the third stage estimates the location parameters $\pi^0_{k,j}$ ($1 \le j \le p_k$, $1 \le k \le K$) and $\beta_{k0}$. The first two stages use centred data $X_1, X_2, \ldots, X_K$, and the last two stages use classical OLS regression and raise no computational difficulty. The present paper focuses on the computation of LV scores (i.e., the first stage of the PLS-PM algorithm). Hanafi (2007) points out that there are two main procedures for the computation of scores in the first stage of the PLS-PM algorithm: the original procedure proposed by Wold (1982, 1985) and extended by Hanafi (2007) (called here Hanafi-Wold's procedure), and an alternative procedure introduced by Lohmöller (1989).
As demonstrated by Hanafi (2007), the advantage of Hanafi-Wold's procedure is that it is monotonically convergent and reaches a stable solution faster than Lohmöller's procedure. The latter does not converge monotonically, but is implemented in most PLS-PM software applications, such as PLS-Graph (Chin 2003), PLS-GUI (Hubona 2015), SmartPLS (Ringle et al. 2015) and XLSTAT's PLSPM package (Addinsoft XLSTAT 2019), among others. The present paper focuses on Hanafi-Wold's procedure. Hanafi (2007) showed that the sequence of LV scores computed through Hanafi-Wold's procedure increases two different criteria, depending on which of two schemes is used for computing the weights relating LV scores (see Theorem 1 below); these schemes are well known in the PLS-PM community as the centroid and factorial schemes. Both criteria are defined as functions of the correlation matrix of the LV scores. These monotony properties make it easy to establish the monotone convergence of Hanafi-Wold's procedure. Here, monotone convergence means that the values taken iteratively by each criterion define a convergent real sequence.
The present paper proposes generalizations of some properties already established in the literature (Hanafi 2007), from which additional convergence results for Hanafi-Wold's procedure are established.
The paper is organized as follows. Section 2 briefly describes Hanafi-Wold's procedure for the computation of LV scores and summarizes its monotony properties as obtained by Hanafi (2007). Generalized forms of these properties are provided in Sect. 3. Conclusions and future research are presented in Sect. 4.

Hanafi-Wold's procedure
The basic idea of Hanafi-Wold's procedure was initially proposed by Wold (1985, p. 586) in the particular case of six blocks, and extended by Hanafi (2007, p. 280) to (i) any number of blocks and (ii) any conceptual design model.
Let $C = [c_{k,l}]$ be a (K, K) symmetric matrix that encodes the links between the LVs. It is defined from the conceptual design of the model. The elements of the matrix C are defined as follows: $c_{k,l} = c_{l,k} = 1$ if there is an arrow between the LVs $\xi_k$ and $\xi_l$, and $c_{k,l} = c_{l,k} = 0$ otherwise.
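As a small illustration of this definition, the matrix C can be built from the list of arrows of the conceptual model. The sketch below is only an example: the function name `design_matrix`, the 0-based indexing and the three-block model with arrows from the first and second LVs to the third are hypothetical and are not taken from Fig. 1.

```python
import numpy as np


def design_matrix(n_blocks, arrows):
    """Build the symmetric 0/1 matrix C from a list of arrows between LVs.

    `arrows` holds 0-based index pairs (k, l). Direction is ignored, because
    c[k, l] = c[l, k] = 1 whenever an arrow links LV k and LV l.
    """
    C = np.zeros((n_blocks, n_blocks), dtype=int)
    for k, l in arrows:
        if k == l:
            raise ValueError("an LV cannot be linked to itself")
        C[k, l] = C[l, k] = 1
    return C


# Hypothetical three-block model: LV 1 -> LV 3 and LV 2 -> LV 3.
C = design_matrix(3, [(0, 2), (1, 2)])
print(C)
```

Only the presence of a link is stored, so the same matrix is obtained whichever way the arrows point.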
Overall, it is a symmetrical procedure (Dolce et al. 2018) that concerns the computation of the weight vectors $w_k$ associated with $z_k = X_k w_k$ ($1 \le k \le K$), under the constraints that these LV scores are centered and have unit variance ($\mathrm{var}(z_k) = 1$). Hanafi-Wold's procedure consists of iteratively building a sequence of LV scores $z_k^{(s)} = X_k w_k^{(s)}$ ($1 \le k \le K$; $s = 0, 1, 2, \ldots$), where r denotes the Pearson correlation coefficient. The procedure begins with an arbitrary choice of initialization. The LV scores are then updated following Steps 2, 3, 4 and 5. The process is iterated until the squared norm of the change in the LV scores between two successive iterations is smaller than or equal to a fixed threshold. Clearly, the procedure depends on two schemes (Step 2), according to how the elements $\theta^{(s)}_{k_0,l}$ are calculated. Step 4 is restricted here to the calculation of the weights using the so-called mode B. An alternative calculation of the weights, known as mode A (see Hanafi 2007), can also be used, but the present work is limited to mode B.
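The iteration described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that each block is updated in turn using the most recent scores of the other blocks, that the inner estimate of a block is the sum of the linked scores weighted by the sign of the correlation (centroid scheme) or by the correlation itself (factorial scheme), that mode B amounts to an OLS regression of the inner estimate on the block, and that the stopping rule uses the sum over blocks of the squared change in the scores. The names `standardize` and `hanafi_wold_mode_b`, the parameters `tol`, `max_sweeps` and `seed`, and the simulated data are all hypothetical.

```python
import numpy as np


def standardize(v):
    """Center a vector and scale it to unit variance (1/N convention)."""
    v = v - v.mean()
    return v / v.std()


def hanafi_wold_mode_b(blocks, C, scheme="centroid", tol=1e-10,
                       max_sweeps=500, seed=0):
    """Sketch of the iterative computation of LV scores with mode B weights.

    blocks : list of (N, p_k) arrays; C : symmetric 0/1 design matrix.
    Assumes every LV is linked to at least one other LV and that each
    centered block has full column rank.
    """
    if scheme not in ("centroid", "factorial"):
        raise ValueError("scheme must be 'centroid' or 'factorial'")
    rng = np.random.default_rng(seed)
    blocks = [X - X.mean(axis=0) for X in blocks]  # centered data
    K = len(blocks)
    # Arbitrary initialization: random weights, standardized scores.
    z = [standardize(X @ rng.standard_normal(X.shape[1])) for X in blocks]
    for sweep in range(1, max_sweeps + 1):
        z_prev = [zk.copy() for zk in z]
        for k, X in enumerate(blocks):
            # Correlations with the most recent scores of the other blocks
            # (scores are standardized, so the mean product is Pearson's r).
            r = np.array([np.mean(z[k] * z[l]) for l in range(K)])
            theta = np.sign(r) if scheme == "centroid" else r
            # Inner estimate: weighted sum of the scores linked to block k.
            Z = sum(C[k, l] * theta[l] * z[l] for l in range(K) if l != k)
            # Mode B (assumed form): OLS regression of Z on the block X_k.
            w, *_ = np.linalg.lstsq(X, Z, rcond=None)
            z[k] = standardize(X @ w)
        # Assumed stopping rule: total squared change in the scores.
        change = sum(float(np.sum((a - b) ** 2)) for a, b in zip(z, z_prev))
        if change <= tol:
            break
    return z, sweep


# Simulated example: three blocks sharing a common factor.
rng = np.random.default_rng(42)
common = rng.standard_normal(60)
blocks = [common[:, None] + rng.standard_normal((60, p)) for p in (3, 2, 4)]
C = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])
scores, n_sweeps = hanafi_wold_mode_b(blocks, C, scheme="factorial")
```

Whatever the scheme, the returned scores are centered and have unit variance, as required by the constraints on $z_k$.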
Hanafi-Wold's procedure can be presented in a compact form that depends on the chosen scheme (centroid or factorial). For the centroid scheme, the compact form is Eq. (4), with the associated quantities given in Eq. (5). For the factorial scheme, the compact form is Eq. (6), with the associated quantities given in Eq. (7). The compact forms (4) and (6) are obtained straightforwardly by substitution among the steps of the procedure.

Theorem 1 (Hanafi 2007, p. 282) (i) Let $z_k^{(s)}$ ($1 \le k \le K$; $s = 0, 1, 2, \ldots$) be a sequence of LV scores generated by Hanafi-Wold's procedure. When the centroid scheme is considered, the inequalities (8) hold, where f is the criterion given in (9). (ii) Let $z_k^{(s)}$ be a sequence of LV scores generated by Hanafi-Wold's procedure. When the factorial scheme is considered, the inequalities (10) hold, where g is the criterion given in (11).

As a direct consequence of Theorem 1, the monotone convergence of Hanafi-Wold's procedure was established. Corollary 1 (respectively, Corollary 2) in Hanafi (2007, pp. 284-287) established the monotone convergence of Hanafi-Wold's procedure for the centroid (respectively, factorial) scheme. That is to say, the real sequence of values taken by the criterion f (respectively, g) along the iterations is convergent.
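The two criteria of Theorem 1 can be evaluated directly from the correlation matrix of the LV scores. The sketch below assumes the forms usually given for them in the PLS-PM literature, namely the sum over linked pairs of the absolute correlations for the centroid scheme and of the squared correlations for the factorial scheme. The exact constants in Eqs. (9) and (11) may differ, and the function names and the example matrices are hypothetical.

```python
import numpy as np


def criterion_f(R, C):
    """Assumed centroid criterion: sum over (k, l) of c[k, l] * |r[k, l]|."""
    return float(np.sum(C * np.abs(R)))


def criterion_g(R, C):
    """Assumed factorial criterion: sum over (k, l) of c[k, l] * r[k, l]**2."""
    return float(np.sum(C * R ** 2))


# Hypothetical correlation matrix of three LV scores and a chain design
# in which LV 1 is linked to LV 2 and LV 2 is linked to LV 3.
R = np.array([[1.0, 0.5, -0.2],
              [0.5, 1.0, 0.4],
              [-0.2, 0.4, 1.0]])
C = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
f_value = criterion_f(R, C)
g_value = criterion_g(R, C)
print(f_value, g_value)
```

Because C is symmetric and the sum runs over all ordered pairs, each link is counted twice in this sketch; the unlinked pair (LV 1, LV 3) does not contribute.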

Generalization of Hanafi's Theorem 1 and its consequences
The present section provides a generalisation of Hanafi's Theorem 1.

Theorem 2 (i) Let $z_k^{(s)}$ ($1 \le k \le K$; $s = 0, 1, 2, \ldots$) be a sequence of LV scores generated by Hanafi-Wold's procedure. When the centroid scheme is considered, the equality (12) holds, where f is given in (9) and $\lambda_k^{(s)}$ is given in (5).

(ii) Let $z_k^{(s)}$ ($1 \le k \le K$; $s = 0, 1, 2, \ldots$) be a sequence of LV scores generated by Hanafi-Wold's procedure. When the factorial scheme is considered, the equalities (13) hold, where g is given in (11) and $\mu_k^{(s)}$ is given in (7).
Theorem 2 generalizes Theorem 1. Indeed, it suffices to note that the right-hand side of Eq. (12) in Theorem 2(i) is always non-negative. As a consequence, the inequalities (8) in Theorem 1(i) hold.
In the same way, the right-hand side of the equalities (13) in Theorem 2(ii) is also always non-negative. As a consequence, the inequalities (10) in Theorem 1(ii) hold.
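The non-negativity argument can be illustrated numerically for a single block. The sketch below is not taken from the paper: it assumes that mode B replaces the current score by the standardized projection of the inner estimate onto the block, and it defines `lam` as the standard deviation of that projection, which is only an assumption about the quantity $\lambda_k^{(s)}$ of Eq. (5). Under these assumptions, the gain in covariance with the inner estimate equals `lam` times half the mean squared change in the score, which cannot be negative. The constants in Eq. (12) may differ, and the random data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 40, 3

# One centered block X_k and a centered stand-in for its inner estimate.
X = rng.standard_normal((N, p))
X -= X.mean(axis=0)
Z = rng.standard_normal(N)
Z -= Z.mean()


def cov(a, b):
    """Covariance with the 1/N convention."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


# Current score: an arbitrary standardized combination of the block.
z_old = X @ rng.standard_normal(p)
z_old /= z_old.std()

# Assumed mode B update: standardized OLS projection of Z onto the block.
w, *_ = np.linalg.lstsq(X, Z, rcond=None)
projection = X @ w
lam = float(projection.std())
z_new = projection / lam

# Gain in covariance with the inner estimate produced by the update.
gain = cov(z_new, Z) - cov(z_old, Z)
# Half of lam times the mean squared change in the score.
bound = 0.5 * lam * float(np.mean((z_new - z_old) ** 2))
print(gain, bound)
```

The two printed values agree up to rounding error, so the gain vanishes only when the update leaves the score unchanged.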

Proof of Theorem 2 (i)
For $k = 1, 2, \ldots, K$, let $f_k$ be a function of the weight vector $w_k$, which can also be written in terms of $Z_k^{(s)}$ given in (5) as a covariance (cov denotes the covariance). The proof relies on a property of $f_k$ shown by Hanafi (2007, p. 281). Evaluating $f_k$ at the two weight vectors concerned and combining the resulting equalities, then summing over k, gives Eq. (17). Substituting Eq. (16) into the right-hand side of Eq. (17) gives the result.

Proof of Theorem 2 (ii)
For $k = 1, 2, \ldots, K$, let $g_k$ be the corresponding function for the factorial scheme. Hanafi (2007, p. 285) shows that $g_k$ can be written in an equivalent form. With this form, the proof of Theorem 2 (ii) is straightforward: summing the resulting equalities over k gives Eq. (19), and substituting Eq. (18) into the right-hand side of Eq. (19) gives the result.

Theorem 2 yields additional convergence results for Hanafi-Wold's procedure, as summarized in the following corollary.

Corollary 3 Let $z_k^{(s)}$ be a sequence of LV scores generated by Hanafi-Wold's procedure. Then (i) when the centroid scheme is considered, the sequence λ

Proof of Corollary 3
As a matter of fact, starting from Theorem 2(i), the following inequalities hold for each k

Conclusion and perspectives
The main contribution of the present paper is the generalization of some properties of Hanafi-Wold's procedure established in Hanafi (2007). A generalized form of Theorem 1 in Hanafi (2007) is given here by Theorem 2.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 1 Conceptual model involving three blocks of manifest variables

Moreover, convergence results additional to those already established by Corollary 1 and Corollary 2 in Hanafi (2007) are presented in Corollary 3 of the present paper. The results presented in this paper do not allow us to conclude that the sequence of differences $z_k^{(s+1)} - z_k^{(s)}$ converges to zero, but they make a substantial contribution towards this result. Indeed, Corollary 3 suggests focusing further work on finding strictly positive lower bounds for the sequences $\lambda_k^{(s)}$ and $\mu_k^{(s)}$. Given such bounds, τ and κ respectively, Corollary 3 allows one to conclude straightforwardly that these differences converge to zero. For the centroid scheme, this follows from Corollary 3 (i) for every k and every s. In the same way, for the factorial scheme, it follows from Corollary 3 (ii) for every k and every s.

Acknowledgements
Open access funding provided by Università degli Studi di Napoli Federico II within the CRUI-CARE Agreement.