Clustered Sparse Structural Equation Modeling for Heterogeneous Data

Joint analysis combining clustering and structural equation modeling is one of the most popular approaches to analyzing heterogeneous data. The methods based on this approach estimate a path diagram of the same shape for each cluster and interpret the clusters according to the magnitudes of the coefficients. However, these methods become difficult to interpret as the number of clusters and/or paths increases, and they cannot handle situations in which the path diagram differs across clusters. To tackle these problems, we propose two methods that simplify the path structure and facilitate interpretation by estimating a different path diagram for each cluster using sparse estimation. The proposed methods and related methods are compared using numerical simulation and real data examples. The proposed methods are superior to the existing methods in terms of both fit and interpretation.


Introduction
In recent years, various types of multivariate data have become available for analysis. Structural Equation Modeling (SEM; Bollen 1989; Heck and Thomas 2015; Smid et al. 2020) is a method that can reveal the relationships between observed and latent variables in such data. SEM is effective in terms of visualization because the estimated model can be drawn as a path diagram, and it is used in many fields, such as marketing, psychology, and education, because it estimates relationships among variables and the path structure (causal structure) behind the data. Confirmatory factor analysis can be formulated as an SEM model because it estimates the size of each arrow in a path diagram after a causal structure is assumed based on the prior knowledge and experience of analysts and/or researchers. There are two approaches to SEM, namely, the covariance-based and component-based approaches (Hair Jr. et al., 2017; Reinartz et al., 2009). The covariance-based approach, also called covariance structure analysis, estimates the parameters of a model so that the covariance matrix implied by the model is close to the sample covariance matrix of the data. The component-based approach, in contrast, assumes that components are composed of observed variables, as in principal component analysis (PCA), and estimates parameters by minimizing the error between the observed variables and the components; the typical example is partial least squares SEM (PLS-SEM; Chin 1998).
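As a minimal illustration of the covariance-based idea, the following sketch builds the model-implied covariance of a toy one-factor model and evaluates the maximum-likelihood discrepancy between a sample covariance and the model-implied one (all names and values are illustrative, not from the original):

```python
import numpy as np

def implied_cov(loadings, psi, theta):
    """Model-implied covariance of a one-factor model:
    Sigma = lambda * psi * lambda' + Theta (Theta diagonal)."""
    lam = np.asarray(loadings).reshape(-1, 1)
    return psi * (lam @ lam.T) + np.diag(theta)

def ml_discrepancy(S, Sigma):
    """ML fit function of covariance structure analysis:
    F = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p."""
    p = S.shape[0]
    inv = np.linalg.inv(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_Sig = np.linalg.slogdet(Sigma)
    return logdet_Sig + np.trace(S @ inv) - logdet_S - p

# When the model-implied covariance equals the sample covariance,
# the discrepancy is exactly zero.
lam, psi, theta = [0.8, 0.7, 0.6], 1.0, [0.36, 0.51, 0.64]
Sigma = implied_cov(lam, psi, theta)
fit = ml_discrepancy(Sigma, Sigma)
```

The parameters are chosen so the implied variances are 1; fitting a model then amounts to minimizing this discrepancy over the free parameters.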
When SEM is applied to multivariate data, the individuals may be obtained not from a single population but from several clusters that potentially exist; the data obtained in this case are called heterogeneous data. For example, consumer belief structures are assumed to differ across market segments in the expectancy-value model (Bagozzi, 1982), and the decision-making process for brand choice may differ among groups of consumers (Kamakura et al., 1996). When SEM is applied to heterogeneous data, there is a risk of bias in the estimation because a single path structure is estimated even when different path structures potentially exist for each cluster. To overcome this problem, one approach is to consider the cluster structure behind the data and to perform clustering and the estimation of the arrow sizes in SEM simultaneously (Fordellone & Vichi, 2020; Hwang et al., 2007; Jedidi et al., 1997a, b). We refer to the methods based on this approach as clustered SEM. Clustered SEM accounts for the heterogeneity of the data even when the cluster structure is not known beforehand, and it has the benefit that cluster features can be grasped visually through the path structures. On the covariance-based side, mixture SEM (MSEM; Jedidi et al., 1997a, b; Muthén and Shedden 1999), which assumes a Gaussian mixture distribution for SEM, has been proposed. On the component-based side, fuzzy clusterwise GSCA (FCGSCA; Hwang et al., 2007), which combines generalized structured component analysis (GSCA), an extension of PLS-SEM, with fuzzy clustering, and a method combining PLS-SEM with Kmeans clustering (Fordellone and Vichi, 2020) have been proposed. However, there are two problems with the existing clustered SEM. First, the shape of the path diagram is assumed to be the same for every cluster, and the clusters are interpreted only through the sizes of the path coefficients, which makes it difficult to interpret and compare clusters as their number increases. Second, the existing methods estimate the coefficients in the same way for all clusters, even if the shape of a reasonable path diagram differs across clusters; nevertheless, owing to the characteristics of SEM, it is not feasible to assume a different path diagram for each cluster in advance.
Therefore, in this study, we propose clustered sparse SEM to grasp a different path structure for each cluster and to interpret each cluster's path diagram more easily using sparse estimation. Concretely, clustered sparse SEM has three advantages. First, even when the number of clusters and the number of variables are large, the estimated coefficients are easier to interpret than in non-sparse clustered SEM because some coefficients are estimated as exactly zero. Second, it is easy to interpret the distinctive features of each cluster because a different path diagram can be estimated for each cluster, which non-sparse clustered SEM cannot provide. Generally, it is natural for clusters to have different path structures, so the assumption of sparse SEM is a relaxation of that of non-sparse SEM. Third, sparse estimation also eliminates relationships among variables that do not affect the cluster structure; thus, only the relationships that affect clustering remain, and clustering accuracy improves (Pan & Shen, 2007; Xie et al., 2010). The purpose of this study is to propose extensions of the existing methods via two approaches and to compare and review them. The first approach extends sparse SEM (SSEM) to heterogeneous data. Huang (2018) and Liang and Jacobucci (2020) proposed multi-group sparse SEM (MGSSEM), a sparse SEM for multi-group datasets in which labels dividing individuals into known groups are available. We propose partitioning sparse SEM (PS-SEM), which estimates cluster-specific path structures by incorporating Kmeans clustering into MGSSEM. In MGSSEM, the causal structure specific to each group and the causal structure common to all groups can be expressed; similarly, in PS-SEM, the causal structure specific to each cluster and the causal structure common to all clusters can be expressed. The second approach extends MSEM by sparse estimation. Mixture models using sparse estimation have been proposed, for example, by Galimberti et al. (2009) and Xie et al. (2010) in the framework of factor analysis, and by Fop et al. (2019), Pan and Shen (2007), and Zhou et al. (2009) in the framework of clustering. In this paper, we propose mixture sparse SEM (MS-SEM) as an extension of MSEM based on sparse estimation. In addition, MSEM combines the EM algorithm with the conjugate gradient method (Powell, 1977), which slows down convergence, whereas MS-SEM does not use the conjugate gradient method and therefore has the advantage of faster convergence.
This paper is organized as follows: in Sect. 2, the proposed methods, PS-SEM and MS-SEM, are described; in Sect. 3, numerical simulations and their results are presented to compare and investigate the performance of the proposed and existing methods; in Sect. 4, the proposed methods are evaluated and compared on real data, and some interpretation examples are provided; and in Sect. 5, we summarize and discuss this paper.

Proposed Method
In this section, we describe the proposed PS-SEM and MS-SEM methods. Both perform sparse estimation of SEM and clustering simultaneously; as a result, they obtain a different causal structure for each cluster, and the interpretation of each cluster becomes easier. PS-SEM is an extension of MGSSEM that can form groups without group labels. MS-SEM is a mixture SEM with sparse estimation. The model is the same for both methods, and their parameters are estimated using the EM algorithm. In the parameter estimation, iterative conditional fitting is used to estimate the covariance matrix of the variables.
Before describing the proposed methods, we introduce some notation. Given a vector x ∈ R^{P×1} and a matrix Y ∈ R^{N×P}, the notation is defined in Table 1.

The Model
Let V ∈ R^{N×P} denote the data matrix and G the number of clusters. Then, the model for PS-SEM and MS-SEM can be defined as

η_g = α_g + B_g η_g + ζ_g,  (1)

where η_g = (v_g', f_g')' is a random vector composed of the observed variable vector v_g ∈ R^{P×1} and the latent variable vector f_g ∈ R^{M×1} for cluster g = 1, 2, …, G. v_g has different meanings in PS-SEM and MS-SEM: in PS-SEM, v_g denotes the observed variables of the individuals belonging to cluster g, whereas in MS-SEM, v_g is assumed to follow the normal distribution corresponding to cluster g. α_g ∈ R^{(P+M)×1} is the intercept vector, comprising an intercept vector for the observed variables and one for the latent variables. B_g ∈ R^{(P+M)×(P+M)} is the coefficient matrix, whose blocks describe the relations from observed to observed variables, from latent to observed variables, from observed to latent variables, and from latent to latent variables. diag(B_g) is set to 0_{P+M} because the relation from an observed (latent) variable to itself is not considered in most SEM applications. Ψ_g ∈ R^{(P+M)×(P+M)} is the covariance matrix of ζ_g. Under these assumptions, the expectation and covariance matrix of η_g can be written as

μ_g^{(η)} = (I_{P+M} − B_g)^{−1} α_g  and  Σ_g^{(η)} = (I_{P+M} − B_g)^{−1} Ψ_g (I_{P+M} − B_g)^{−T},  (2)

respectively. As v_g = (I_P, O_{P×M}) η_g, the expectation of v_g, the covariance matrix of v_g, and the covariance matrix between v_g and η_g follow directly (Eq. 3).
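To make the moment derivation concrete, the following sketch computes the model-implied mean and covariance of the observed part for one cluster (the toy dimensions, names, and values are illustrative, not from the original):

```python
import numpy as np

def cluster_moments(alpha_g, B_g, Psi_g, P):
    """Moments of eta_g = alpha_g + B_g eta_g + zeta_g for one cluster:
    mu = (I - B)^{-1} alpha,  Sigma = (I - B)^{-1} Psi (I - B)^{-T}.
    The observed part v_g is the first P coordinates of eta_g."""
    d = B_g.shape[0]
    A = np.linalg.inv(np.eye(d) - B_g)
    mu_eta = A @ alpha_g
    Sigma_eta = A @ Psi_g @ A.T
    return mu_eta[:P], Sigma_eta[:P, :P]

# Toy cluster: P = 2 observed variables, M = 1 latent variable that
# loads on both observed variables.
P, M = 2, 1
alpha = np.zeros(P + M)
B = np.zeros((P + M, P + M))
B[0, 2] = 0.9     # latent -> v1
B[1, 2] = 0.8     # latent -> v2
Psi = np.diag([0.19, 0.36, 1.0])   # residual and latent variances
mu_v, Sigma_v = cluster_moments(alpha, B, Psi, P)
```

A different B_g, alpha_g, and Psi_g per cluster yields the cluster-specific moments used by both proposed methods.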

Partitioning Sparse SEM
PS-SEM can consider parameters for both common effects shared by all clusters and cluster-specific effects for each cluster. MGSSEM (Huang, 2018; Lindstrøm and Dahl, 2020) likewise considers effects common to all groups and group-specific effects, but it requires known group labels. In many practical situations, however, the group structure is unknown. Therefore, we extend MGSSEM to address this situation. The parameter estimation of PS-SEM can be performed in the same way as in MGSSEM.
Let Θ_KMSSEM denote the parameter space of PS-SEM. Then, the parameter vector θ ∈ Θ_KMSSEM consists of a part θ̄ for effects common to all clusters and parts θ̃_g for cluster-specific effects; specifically, θ_g collects the elements of α_g, vec(B_g), and vec(Ψ_g). We call the former the common effect and the latter the cluster effect, and we assume that each element of θ_g is the sum of a common effect and a cluster effect, θ_gq = θ̄_q + θ̃_gq. The objective function of PS-SEM is defined as

U_KMSSEM(θ, λ) = L_KMSSEM(θ) − R_KMSSEM(θ, λ),  (4)

where L_KMSSEM(θ) is the log-likelihood function, similar to that of multi-group SEM (Jöreskog, 1971) (Eq. 5), and R_KMSSEM(θ, λ) is the penalization term, similar to that of multi-group sparse SEM (Huang, 2018). N_g denotes the number of individuals belonging to cluster g, and v_gn denotes the n-th observed variable vector of v_g. The first term of R_KMSSEM(θ, λ) penalizes the common effects θ̄, and the second term penalizes the cluster effects θ̃_g (g = 1, 2, …, G). Here, λ is a regularization parameter. c_θ̄q and c_θ̃gq are regularization indicators for θ̄_q and θ̃_gq, respectively, each taking the value 0 or 1. Using these indicators, we can determine in advance whether a parameter is sparsely estimated: if the indicator is 1, the parameter is estimated with penalization, whereas if it is 0, the parameter is estimated without penalization. The two penalization terms in R_KMSSEM(θ, λ) induce a zero or non-zero pattern in θ_gq, the q-th element of θ_g, which can fall into four cases.
The first case is that both a common effect and a cluster effect are present; the path diagram then has the same shape in every cluster, but the coefficients differ across clusters. The second case is that there is only a cluster effect; whether the path diagram has an arrow then differs across clusters, that is, each cluster has a differently shaped path diagram. The third case is that there is only a common effect; both the shape and the coefficients are then the same in every cluster. The fourth case is that there is neither a common nor a cluster effect; no cluster's path diagram then has an arrow for the parameter, meaning there is no relationship between these variables. In particular, our goal of obtaining a path diagram of a different shape for each cluster is achieved in the second case.
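The penalty and the four cases can be sketched as follows; the indicator and parameter values are illustrative, and the decomposition θ_gq = θ̄_q + θ̃_gq is assumed:

```python
import numpy as np

def penalty(theta_common, theta_cluster, c_common, c_cluster, lam):
    """L1 penalty over common effects theta_q and cluster effects
    theta_gq; the 0/1 indicators c decide which parameters are penalized."""
    common = np.sum(c_common * np.abs(theta_common))
    cluster = np.sum(c_cluster * np.abs(theta_cluster))
    return lam * (common + cluster)

def effect_case(theta_q, theta_gq, tol=1e-8):
    """Classify one coefficient into the four cases described above."""
    has_common = abs(theta_q) > tol
    has_cluster = np.any(np.abs(theta_gq) > tol)
    if has_common and has_cluster:
        return "same shape, different coefficients"
    if has_cluster:
        return "different shape per cluster"
    if has_common:
        return "same shape and coefficients"
    return "no arrow in any cluster"

# One parameter shared by G = 3 clusters: no common effect, a cluster
# effect only in cluster 2, so the path exists only in that cluster.
case = effect_case(0.0, np.array([0.0, 0.7, 0.0]))
```

This is the second case, the one that yields differently shaped path diagrams across clusters.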
The log-likelihood in Eq. 5 can be rewritten into a form suited to iterative optimization. Maximization of the objective function alternates between a parameter estimation phase and a clustering phase. The parameter estimation phase uses the EM algorithm (Rubin and Thayer, 1982) to update the SEM parameters, especially in the M-step; the clustering phase determines the cluster to which each individual belongs.

E-step
By Eq. 4, the completely penalized log-likelihood of PS-SEM (Eq. 8) is written in terms of ζ_gn (= η_gn − α_g − B_g η_gn) rather than η_gn, because the likelihood of η_gn involves the determinant of the estimated parameters, which makes the estimation difficult. The expectation of U^C_KMSSEM(θ, λ) is derived from Eq. 8, where θ̂^(t) is the parameter after the t-th update of the parameter estimation phase, and ŵ_g^(t) is obtained after the t-th update of the clustering phase. The E-step is performed by computing the conditional expectations in Eqs. 10, 11, and 12.
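The conditional expectations in the E-step reduce to standard normal conditioning of the latent part on the observed part; the following is a minimal sketch with toy numbers (not from the original):

```python
import numpy as np

def latent_expectation(v, mu, Sigma, P):
    """E-step style conditional moments of the latent part f given the
    observed part v under a joint normal for eta = (v', f')'."""
    mu_v, mu_f = mu[:P], mu[P:]
    S_vv = Sigma[:P, :P]
    S_fv = Sigma[P:, :P]
    S_ff = Sigma[P:, P:]
    K = S_fv @ np.linalg.inv(S_vv)          # regression of f on v
    cond_mean = mu_f + K @ (v - mu_v)
    cond_cov = S_ff - K @ S_fv.T
    return cond_mean, cond_cov

# Joint normal with one latent variable correlated with two observed ones.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.72, 0.9],
                  [0.72, 1.0, 0.8],
                  [0.9, 0.8, 1.0]])
m, C = latent_expectation(np.array([1.0, 1.0]), mu, Sigma, P=2)
```

Observing both indicators above their means pulls the expected latent score up while shrinking its conditional variance.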

Updating formula on B g
In relation to B_g, the updating formulas of the common part B̄ and the cluster part B̃_g are given in closed form.

Updating formula on α g
In relation to α_g, the updating formulas of the common part ᾱ and the cluster part α̃_g are given in closed form.

Updating formula on Ψ_g
In relation to Ψ_g, the updating formulas of the non-diagonal elements of Ψ̄ and Ψ̃_g are given in closed form, while the diagonal elements are updated separately. Estimation of the covariance matrix of ζ_g is achieved by iterative conditional fitting (ICF) (Chaudhuri et al., 2007), in which the diagonal and non-diagonal elements of the covariance matrix are updated differently. The diagonal elements of Ψ_g are updated by updating φ_g in Eq. 18. As they are variances of the residual variables, they are not regularized, that is, λ = 0. Therefore, we need not consider a common effect for the residual variances but only a cluster effect; in other words, we can set φ̄_jj = 0, so that φ_gjj = φ̃_gjj.

In the clustering phase of PS-SEM, each individual is assigned to the cluster for which the likelihood function is maximized. To compute the likelihood, Eq. 5 is rewritten in terms of p(v_gn; μ_g, Σ_g), the probability density function of the normal distribution. u_ng is an element of the cluster assignment matrix U ∈ R^{N×G} and takes the value 1 or 0: u_ng = 1 if the n-th individual belongs to the g-th cluster, and u_ng = 0 otherwise. u_ng is updated by assigning each individual to the likelihood-maximizing cluster, and using û_ng^{(t+1)}, w_g is updated accordingly. From the above updating formulas, the PS-SEM algorithm is described as Algorithm 1.
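The clustering phase described above, assigning each individual to the cluster with the maximal likelihood, can be sketched as follows; the densities and means are illustrative:

```python
import numpy as np

def normal_logpdf(v, mu, Sigma):
    """Log density of a multivariate normal, written out explicitly."""
    d = mu.shape[0]
    diff = v - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def hard_assign(V, mus, Sigmas):
    """PS-SEM style hard clustering phase: each individual goes to the
    cluster with the highest log-likelihood (u_ng = 1 for that cluster)."""
    N, G = V.shape[0], len(mus)
    U = np.zeros((N, G), dtype=int)
    for n in range(N):
        ll = [normal_logpdf(V[n], mus[g], Sigmas[g]) for g in range(G)]
        U[n, int(np.argmax(ll))] = 1
    return U

# Two well-separated clusters: points near each mean should be assigned
# to that cluster.
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
Sigmas = [np.eye(2), np.eye(2)]
V = np.array([[0.1, -0.2], [4.8, 5.1]])
U = hard_assign(V, mus, Sigmas)
```

Each row of U has exactly one 1, matching the indicator-matrix definition above.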

Algorithm 1
The algorithm of PS-SEM

Mixture Sparse SEM

The objective function of MS-SEM is defined as

U_MSSEM(θ, λ) = L_MSSEM(θ) − R_MSSEM(θ, λ),

where c_θgq is a regularization indicator. R_MSSEM(θ, λ) has a single penalization term because the parameters of MS-SEM are not divided into common and cluster effects, unlike those of PS-SEM. Maximization of the objective function is conducted by the EM algorithm alone, which differs from PS-SEM, because the E-step of MS-SEM computes the conditional expectation of z_ng, which performs the clustering. In addition, the M-step of MS-SEM updates the mixing proportions π_g, although they are not SEM parameters.

E-step
Considering another latent variable z = (z_1, z_2, …, z_G)' ∈ R^{G×1}, which represents the assignment of an individual to a cluster such that z_g ∈ {0, 1} and Σ_{g=1}^G z_g = 1, the completely penalized log-likelihood is written in terms of z. The expectation of U^C_MSSEM(θ, λ) is then obtained by replacing each z_ng with its conditional expectation given the data and the current parameters.

M-step
The M-step is performed in the same way as in PS-SEM, except for π_g. Here, π_g, B_g, α_g, and Ψ_g are estimated iteratively. Consequently, the parameters are updated as follows.
Updating Formula on π_g  The updating formula on π_g is given with the responsibilities as π_g^{(t+1)} = (1/N) Σ_{n=1}^{N} r_ng^{(t+1)}.
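The responsibility computation and the resulting update of π_g can be sketched as follows, assuming the per-cluster log densities have already been evaluated (the numbers are illustrative):

```python
import numpy as np

def responsibilities(logdens, pi):
    """E-step of an MS-SEM style mixture: r_ng proportional to
    pi_g * N(v_n; mu_g, Sigma_g), computed from per-cluster log
    densities with the log-sum-exp trick for numerical stability."""
    a = np.log(pi)[None, :] + logdens           # N x G unnormalized log r
    a -= a.max(axis=1, keepdims=True)
    r = np.exp(a)
    return r / r.sum(axis=1, keepdims=True)

def update_pi(r):
    """M-step update of the mixing proportions: pi_g = (1/N) sum_n r_ng."""
    return r.mean(axis=0)

# Two clusters, three individuals; the log densities favor cluster 1 for
# the first two individuals and cluster 2 for the third.
logdens = np.array([[-1.0, -5.0],
                    [-1.0, -5.0],
                    [-6.0, -1.0]])
r = responsibilities(logdens, np.array([0.5, 0.5]))
pi_new = update_pi(r)
```

The rows of r sum to 1, and the updated mixing proportions reflect the soft cluster sizes.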

Updating formula on B g
The updating formula on B_g is given in the same way as in PS-SEM.

Updating Formula on α_g

The updating formula on α_g is given in the same way as in PS-SEM.

Updating Formula on Ψ_g
The updating formula on the non-diagonal elements of Ψ_g is given as follows.
Then, we obtain the update in terms of W^{(t+1)}, and the updating formula on the diagonal elements of Ψ_g is obtained analogously. Clustering in MS-SEM is the same as in model-based clustering; that is, r_ng denotes the degree to which individual n belongs to cluster g. When performing hard clustering in MS-SEM, each individual is assigned to the cluster for which r_ng is maximal.
According to the above updating formulas, the algorithm of MS-SEM is described as Algorithm 2.

Algorithm 2
The algorithm of MS-SEM

Simulation Study
In this section, we conduct a simulation study to evaluate the performance of the proposed methods. The simulation design and results are described below.

Simulation Design
Both PS-SEM and MS-SEM estimate path diagrams and the sizes of the arrows in them, and both assume a latent cluster structure in which each cluster has a different path diagram. Therefore, we generated artificial data from different path diagrams and compared the proposed methods with existing methods. Concretely, we let P = 8, M = 2, and G = 3; the path diagram of Cluster 1 is shown in Fig. 1, that of Cluster 2 in Fig. 2, and that of Cluster 3 in Fig. 3. The dashed lines in Figs. 2 and 3 represent relations that are assumed to exist beforehand although they do not exist in the true path diagram. Accordingly, if the proposed methods estimate correctly, the corresponding parameters are estimated as 0 by sparse estimation.
The true coefficient matrices B_1, B_2, and B_3 express the path diagrams in Figs. 1, 2, and 3; the bold entries of these matrices correspond to the dashed lines. Moreover, to maintain identification, we set the other parameters as follows.
By fixing these parameters and deriving μ_g and Σ_g^{(vv)} from them, N_g observed data in cluster g are generated from v_gn ∼ N(μ_g, Σ_g^{(vv)}), where N_g = N/G. The data are then V = (v_11, v_12, …, v_1N1, v_21, v_22, …, v_2N2, v_31, v_32, …, v_3N3)'. We apply each method to the data 100 times with different sample sizes, N = 90, N = 150, and N = 300. b_gjk, an element of B_g, is initialized by b_gjk ∼ U(0, 3), where U(x, y) denotes a uniform distribution. In PS-SEM, an initial value of the indicator matrix U is required; in each row of U, we randomly set one element to 1 and all others to 0. In MS-SEM, the initial value of π_g is set as π_g = ρ_g / Σ_{j=1}^G ρ_j (g = 1, 2, 3), where ρ_g ∼ U(0.9, 1), so that π_g satisfies Σ_g π_g = 1. A random start is conducted 30 times because the algorithms depend on the initial values.
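The data-generation scheme above, N_g = N/G individuals per cluster, each drawn from its cluster's model-implied normal distribution, can be sketched as follows; the toy means and covariances stand in for the model-implied moments of Figs. 1, 2, and 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_clustered_data(mus, Sigmas, N):
    """Generate heterogeneous data: N/G individuals per cluster, each
    cluster drawn from its own normal distribution, stacked in order."""
    G = len(mus)
    Ng = N // G
    V = np.vstack([rng.multivariate_normal(mus[g], Sigmas[g], size=Ng)
                   for g in range(G)])
    labels = np.repeat(np.arange(G), Ng)
    return V, labels

# Three toy clusters in P = 2 dimensions (illustrative stand-ins for the
# model-implied moments of the three true path diagrams).
mus = [np.zeros(2), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [np.eye(2)] * 3
V, labels = generate_clustered_data(mus, Sigmas, N=90)
```

The returned labels give the ground-truth clustering used later to compute the evaluation indices.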
We applied four methods: PS-SEM, MS-SEM, MSEM, and KM+MGSSEM, where KM+MGSSEM is a tandem analysis that applies MGSSEM after clustering by Kmeans. The three methods other than MSEM need the regularization parameter λ to be set. λ is chosen by grid search based on BIC, because in Huang (2018) the selection of λ was based on BIC. Let θ̂(λ) be the parameter optimized for a given λ; the BIC of PS-SEM and MS-SEM are then computed with d(λ), the number of nonzero parameters. As MS-SEM has the constraint Σ_g π_g = 1, d(λ) − 1 appears in the BIC of MS-SEM. The grids are Λ_KMSSEM = {0.01, 0.02, …, 0.1} and Λ_MSSEM = {1.1, 1.2, …, 2}; the grid for KM+MGSSEM is the same as Λ_KMSSEM.
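The BIC-based grid search can be sketched as follows; `fake_fit` is a hypothetical stand-in for fitting the model at a given λ, returning a log-likelihood and the number of nonzero parameters:

```python
import numpy as np

def bic(loglik, n_nonzero, N, constrained=0):
    """BIC = -2 logL + d log N; for MS-SEM one degree of freedom is lost
    to the constraint sum_g pi_g = 1, so pass constrained=1 there."""
    return -2.0 * loglik + (n_nonzero - constrained) * np.log(N)

def select_lambda(grid, fit):
    """Grid search: fit(lam) returns (loglik, n_nonzero); keep the lam
    with the smallest BIC."""
    scores = {lam: bic(*fit(lam), N=150) for lam in grid}
    return min(scores, key=scores.get), scores

# Hypothetical fit results: larger lambda means fewer nonzero parameters
# but a lower log-likelihood.
def fake_fit(lam):
    return -200.0 - 100.0 * lam, int(round(30 - 100 * lam))

best, scores = select_lambda([0.01, 0.05, 0.1], fake_fit)
```

Here the penalty on parameter count dominates, so the sparsest candidate wins.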
We evaluated each method using five indices: the adjusted Rand index (ARI) (Hubert and Arabie, 1985), accuracy for each cluster (ACC), true positive rate (TPR), false positive rate (FPR), and root mean square error (RMSE). ARI is calculated between the true and estimated clustering structures; an ARI close to 1 indicates a good estimated clustering structure, and a lower ARI indicates a poor one. ACC is calculated for each cluster from u_ng and û_ng, the elements of the true and estimated indicator matrices, respectively; the estimated label with the largest overlap with each true label is taken as the corresponding label, so ACC is the sensitivity for the true cluster labels. TPR refers to the rate at which relations that do not exist in the true structure (the dashed lines in Figs. 2 and 3) are estimated to be zero by sparse estimation. Conversely, FPR is the rate at which relations that do exist in the true structure are incorrectly estimated to be 0. Therefore, the closer the TPR is to 1 and the closer the FPR is to 0, the better. Finally, RMSE is the estimation error between the true parameter vector θ = (θ_1, θ_2, …, θ_R)' and the estimated parameter vector θ̂ = (θ̂_1, θ̂_2, …, θ̂_R)'. RMSE is included in the evaluation to assess the magnitude of the estimation error.
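The sparsity-pattern indices, TPR and FPR in the sense defined above, and the RMSE can be computed as in the following sketch; the matrices are illustrative. (The ARI is available, for example, as `adjusted_rand_score` in scikit-learn.)

```python
import numpy as np

def sparsity_rates(true_B, est_B, tol=1e-8):
    """TPR/FPR in the sense used here: TPR is the share of truly absent
    paths estimated as zero; FPR is the share of truly present paths
    incorrectly shrunk to zero."""
    true_zero = np.abs(true_B) < tol
    est_zero = np.abs(est_B) < tol
    tpr = est_zero[true_zero].mean() if true_zero.any() else np.nan
    fpr = est_zero[~true_zero].mean() if (~true_zero).any() else np.nan
    return tpr, fpr

def rmse(theta_true, theta_hat):
    """Root mean square error over all free parameters."""
    d = np.asarray(theta_true) - np.asarray(theta_hat)
    return np.sqrt(np.mean(d ** 2))

true_B = np.array([[0.0, 0.5], [0.0, 0.0]])
est_B = np.array([[0.0, 0.48], [0.1, 0.0]])
tpr, fpr = sparsity_rates(true_B, est_B)
err = rmse(true_B.ravel(), est_B.ravel())
```

Here two of the three truly absent paths are zeroed (TPR = 2/3), and no truly present path is lost (FPR = 0).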

Simulation Result
The simulation results are shown in Tables 2, 3, and 4. These are the means and standard deviations of the evaluation indices over the simulation replications for N = 90, N = 150, and N = 300, and the bold type in each table shows the best mean for each indicator. As MSEM does not perform sparse estimation, TPR and FPR were not calculated for it. For the other methods, TPR and FPR were calculated for each cluster; for example, TPR_cls2 means the TPR in Cluster 2, and TPR_cls1 was not calculated because the true path structure of Cluster 1 does not require sparse estimation. First, in terms of ARI, PS-SEM demonstrates the best accuracy for all cases of N = 90, 150, and 300. From the results of ACC, it can be confirmed that the clustering results of PS-SEM are stable. As the clustering of PS-SEM is hard clustering, it is more accurate than that of MS-SEM and MSEM, which use soft clustering. However, KM+MGSSEM has the lowest clustering accuracy, even though it clusters by the Kmeans method, which is also a hard clustering method; this is because the tandem method does not perform clustering and parameter estimation simultaneously.

Real Data Example

We apply the methods to real data; GFI values referenced from previous studies are used for comparison. The data used are from a survey on satisfaction with and perception of cell phone service providers in the German cell phone market (Sarstedt & Mooi, 2014) and are considered to include a heterogeneous structure. They are often used as an application example for methods such as MGSEM and clustered SEM (Fordellone & Vichi, 2020; Hair Jr. et al., 2016; Matthews et al., 2016). The survey questions were structured on a Likert scale, and the subjects' responses were obtained on a seven-point scale from "1: not at all true (not satisfied at all)" to "7: most true (very satisfied)". The subjects were service users of the target providers, and the sample size was N = 344. The data had 31 variables, shown in Table 5, obtained from seven components, namely evaluations of the company's competence (COMP), the company's likeability (LIKE), customer loyalty (CUSL), quality (QUAL), performance (PERF), corporate social responsibility (CSOR), and attractiveness (ATTR), together with a customer satisfaction construct (CUSA), similar to Hair Jr. et al. (2016). We used the corporate reputation model as the path structure (Fig. 4) to apply each method; it was proposed by Eberl (2010) and used by Fordellone and Vichi (2020), Hair Jr. et al. (2016), and Matthews et al. (2016).
According to the data and path structure, PS-SEM, MS-SEM, MSEM, and KM+MGSSEM were compared. As the meanings of the latent variables and the relationships among variables are determined in the corporate reputation model (Eberl, 2010), we do not regularize the coefficients representing the relationships between the latent and observed variables, but only the coefficients representing the relationships among the latent variables, to avoid interpretations of the latent variables that differ from the corporate reputation model.
As all methods require the number of clusters to be determined in advance, the optimal number was determined by information criteria, applying each method with G = 2, 3, and 4. Table 6 shows the values of the log-likelihood and the information criteria for each number of clusters for the four methods, together with the selected regularization parameters; bold letters in Table 6 indicate, for each method, where the log-likelihood is maximal and the information criteria are minimal. According to Table 6, the log-likelihood is maximal at G = 4 for all methods. This suggests that increasing the number of clusters increases the log-likelihood because the data are more segmented; however, the number of parameters also increases, resulting in larger values of the information criteria. Accordingly, we report the results with G = 2, which is regarded as the best number of clusters. This choice agrees with Fordellone and Vichi (2020) and Matthews et al. (2016), which use the corporate reputation model. For selecting the tuning parameter for sparseness, we used BIC in the same way as for choosing the number of clusters.
Table 7 shows the results of the parameter estimation. We report the coefficients representing the relationships among the latent variables because these are the ones sparsely estimated. −→ in the table indicates the direction of an arrow in the path diagram; for example, CSOR−→LIKE represents the relationship from CSOR to LIKE. B_1 and B_2 are the coefficient matrices of Cluster 1 and Cluster 2, respectively, and the bold letters in the table indicate coefficients estimated to be 0 by sparse estimation.
Before interpreting the results, we examine the validity of the estimated models. The GFI, one of the fit indices of SEM, is 0.436 for PS-SEM, 0.434 for MS-SEM, 0.435 for MSEM, and 0.407 for KM+MGSSEM. From this result, it can be said that the proposed methods estimate models that fit the data comparably to the existing methods while simplifying the path structure by sparse estimation, improving interpretability. Next, we provide the interpretation of the results for each method in Table 7.
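For reference, the GFI under ML estimation can be computed from the sample covariance S and the model-implied covariance Σ̂ as in the following sketch of the standard formula (not the authors' code; the matrices are illustrative):

```python
import numpy as np

def gfi(S, Sigma):
    """Goodness-of-fit index under ML estimation:
    GFI = 1 - tr[(Sigma^{-1} S - I)^2] / tr[(Sigma^{-1} S)^2]."""
    A = np.linalg.solve(Sigma, S)
    I = np.eye(S.shape[0])
    return 1.0 - np.trace((A - I) @ (A - I)) / np.trace(A @ A)

# A perfectly fitting model has GFI = 1; a misspecified one scores lower.
S = np.array([[1.0, 0.3], [0.3, 1.0]])
perfect = gfi(S, S)
misfit = gfi(S, np.eye(2))
```

The closer the model-implied covariance is to the sample covariance, the closer the GFI is to 1.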
First, we discuss the results for PS-SEM. The coefficient representing CSOR−→LIKE in B_1 was estimated to be 0. Thus, the evaluation of the company's social responsibility by the respondents belonging to Cluster 1 does not affect its likeability. In contrast, the coefficient representing CSOR−→COMP in B_2 was estimated as 0; hence, for the respondents belonging to Cluster 2, social responsibility affects the company's likeability but does not affect its competence.

The contrast between the two proposed methods reflects the difference between model-based clustering and Kmeans clustering. The simulation results suggest that PS-SEM has a relatively high clustering accuracy and can obtain a sparse coefficient matrix, whereas MS-SEM can estimate coefficients closer to the true causal structure. As mentioned in Subsection 2.2, PS-SEM has the advantage that its computational cost is lower than that of MS-SEM because it is not necessary to compute the expected value for all individuals in the E-step. MS-SEM has the advantage of more stable estimation, because Kmeans-based clustering depends more on the initial values and is hard clustering. MS-SEM is also more suitable for cases where the numbers of individuals in the clusters are not equal.
Finally, we discuss the scope for further research. There are four directions for future work. First, there is room for improvement in the regularization term and the objective function, because MS-SEM did not produce sufficiently sparse estimates in the simulation study. The number of clusters and the regularization parameter are determined by grid search in this study, so the construction of a rational selection criterion also remains an issue. SEM with sparse estimation is also called a semi-confirmatory method (Huang, 2020): it explores the optimal path structure by pruning a set of path structures assumed to some extent in advance. In other words, it is not intended to discover the optimal path diagram from a full path diagram that assumes relationships among all variables and latent variables, nor to determine the number of latent variables. This is due to the identifiability of SEM and the fact that the order of pruning by sparse estimation is not uniquely determined. These problems make it difficult to compare path diagrams estimated from the full path for each cluster. Therefore, how far the restriction of the path diagram assumed in advance can be weakened is also an issue when searching for comparable path diagrams for each cluster with the proposed methods. Second, the numerical simulation was designed to evaluate the proposed methods, but it was performed in a specific situation. For example, the RMSE of MS-SEM is the best among the methods, but the differences among them are relatively small, and they depend on the true path structures and coefficients. To address this, the computational time should be improved and further numerical simulations conducted. Third, in the real example, we adopted GFI as an evaluation index following previous studies (Huang, 2018; Steiger, 1998); however, the evaluation needs further consideration from various perspectives. Finally, although MSEM originally combines the EM algorithm with the conjugate gradient method, MS-SEM does not use the conjugate gradient method; incorporating the conjugate gradient method into MS-SEM to improve the computational time is left for future work.
B_g^{(vv)} ∈ R^{P×P} is a coefficient matrix that describes the relations from observed variables to observed variables, B_g^{(vf)} ∈ R^{P×M} from latent variables to observed variables, B_g^{(fv)} ∈ R^{M×P} from observed variables to latent variables, and B_g^{(ff)} ∈ R^{M×M} from latent variables to latent variables. S(x, y) = sign(x) max(|x| − y, 0) is the soft-thresholding operator, and W_gq(λ) stands for the updated estimate under regularization parameter λ. When λ = 0, it becomes the unpenalized update θ_q^{(t+1)}.
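The soft-thresholding operator S(x, y) defined above can be implemented directly; with λ = 0 it reduces to the identity, matching the statement that the update then becomes the unpenalized estimate:

```python
import numpy as np

def soft_threshold(x, y):
    """S(x, y) = sign(x) * max(|x| - y, 0): the proximal operator of the
    L1 penalty used in the coordinate-wise updates."""
    return np.sign(x) * np.maximum(np.abs(x) - y, 0.0)

# Coefficients below the threshold are set exactly to zero; the others
# are shrunk toward zero by the threshold amount.
vals = soft_threshold(np.array([1.5, -0.3, 0.05]), 0.1)
```

This exact-zeroing is what produces the sparse, cluster-specific path diagrams.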

Table 1
Notation table

Table 5
Data description

Table 6
Information criterion for each method for each number of clusters