A unified model-implied instrumental variable approach for structural equation modeling with mixed variables

The model-implied instrumental variable (MIIV) estimator is an equation-by-equation estimator of structural equation models that is more robust to structural misspecifications than full information estimators. Previous studies have concentrated on endogenous variables that are all continuous (MIIV-2SLS) or all ordinal . We develop a unified MIIV approach that applies to a mixture of binary, ordinal, censored, or continuous endogenous observed variables. We include estimates of factor loadings, regression coefficients, variances, and covariances along with their asymptotic standard errors. In addition, we create new goodness of fit tests of the model and overidentification tests of single equations. Our simulation study shows that the proposed MIIV approach is more robust to structural misspecifications than diagonally weighted least squares (DWLS) and that both the goodness of fit model tests and the overidentification equations tests can detect structural misspecifications. We also find that the bias in asymptotic standard errors for the MIIV estimators of factor loadings and regression coefficients are often lower than the DWLS ones, though the differences are small in large samples. Our analysis shows that scaling indicators with low reliability can adversely affect the MIIV estimators. Also, using a small subset of MIIVs reduces small sample bias of coefficient estimates, but can lower the power of overidentification tests of equations. Supplementary Information The online version contains supplementary material available at 10.1007/s11336-021-09771-4.


Introduction
Structural equation modeling (SEM) is widely used in social and behavior sciences. Endogenous observed variables (e.g., indicators) in a SEM model can be continuous, binary, ordinal, or censored. When all endogenous observed variables are continuous, systemwide maximum likelihood (ML) is the dominant estimator. When binary or ordinal observed endogenous variables are present, researchers frequently use unweighted least squares (ULS, Muthén, 1978) and diagonally weighted least squares (DWLS, Muthén et al., 1997). Typically, these estimators require the estimation of a polychoric correlation matrix as a first stage. In the second stage, they use the polychoric correlation matrix as input and apply systemwide estimators such as ML, ULS, and DWLS. They are systemwide in the sense that all parameter estimates for all equations in the whole structural equation model are estimated simultaneously. Despite their popularity, various studies (e.g., Bollen, 1996;Jin et al., 2016;Nestler, 2013;Yang-Wallentin et al., 2010) have shown that the systemwide estimators can spread the bias due to model misspecifications such as nonzero coefficients ("omitted variables") or covariances of errors mistakenly set to zero. Our focus is on these types of misspecifications.
To reduce the impacts of such model misspecifications, Bollen (1996) proposed a two-stage least squares (2SLS) approach using model-implied instrumental variables (MIIVs) when all endogenous observed variables are continuous. Bollen and Maydeu-Olivares (2007) extend this approach to models with observed ordinal endogenous variables and called it the polychoric instrumental variable (PIV) estimator. The MIIVs for the 2SLS and PIV estimators are easily found from the pool of variables in the SEM model by the algorithm described in Bollen (1996, page 114) and implemented in SAS by Bollen and Bauer (2004), in Stata by Bauldry (2014), and in R by Fisher et al. (2017). The instrumental variables from this approach are model-implied instrumental variables; in that according to the structure of the model, these MIIVs should be uncorrelated with the error of the equation to be estimated. 2SLS with MIIVs is denoted by MIIV-2SLS to distinguish it from the 2SLS with auxiliary, external instrumental variables that are not part of the original model. In practice, auxiliary instruments (Bollen, 2012) are more common than MIIVs so it is important to distinguish between them. Both MIIV-2SLS and PIV approaches estimate the parameters in an equation-by-equation manner, in which each indicator equation with its factor loadings and each latent variable equation with its regression coefficients is estimated separately from each other. Furthermore, the estimator is non-interative (Bollen, 1996), which eliminates the non-convergence issue that can occur with the systemwide estimators. The simulation study of  showed that the MIIV-2SLS estimator has accuracy comparable to the traditional ML estimator if the model is correctly specified and is often more robust if the model is misspecified. Similarly, Jin et al. (2016) and Nestler (2013) showed that the PIV estimator is as accurate as the ULS and DWLS estimators in the correctly specified model and is more accurate in the structurally misspecified model. The reader is directed to Bollen et al. (2018a) and Bollen (2020) for conditions under which the MIIV-2SLS and PIV approaches remain robust to structural misspecifications. Recently, the principle of MIIV-2SLS and PIV has been generalized by Bollen et al. (2014) to SEM with generalized method of moments estimators, by Fisher et al. (2019) to dynamic time-series models, by Nestler (2014) to handle equality constraints, by Nestler (2015a) to nonlinear SEM models, and by Nestler (2015b) to growth curve models, just to name a few.
Another advantage of the MIIV-2SLS and PIV method is the local, equation-specific test of model specification. The most common method to improve model fit and search for model misspecification with systemwide estimators is to sequentially apply the modification index (Sörbom, 1989). However, the modification index can lead to severe errors (MacCallum, 1986;MacCallum et al., 1992). Bollen (1996, pages 117-118) and Kirby and Bollen (2009) recommend the Sargan (Sargan, 1958) test for equation-by-equation overidentification tests of misspecification. Jin and Cao (2018) showed that these tests are not suitable for ordinal endogenous variables and proposed alternative overidentification tests for such variables.
The cited studies focused on situations where all endogenous variables were either continuous (MIIV-2SLS) or all were ordinal (PIV). To our knowledge, the only exception is a parallel study by Fisher and Bollen (2020), which also considered inference using MIIV with different types of observed variables. The main purpose of our study is to unify the MIIV-2SLS and PIV approaches and propose an approach for a SEM model with different variable types, such as continuous, ordinal, binary, or a mixture of different types. Hereafter, the unified approach is termed the MIIV approach. We will present the MIIV point estimators, which is in line with the estimator in Fisher and Bollen (2020). Two sets of standard errors are proposed. One set of standard errors matches those in Bollen (1996) for observed continuous variables in finite sample and is asymptotically equivalent to those in Bollen and Maydeu-Olivares (2007) for observed ordinal variables. The other is asymptotically equivalent to Bollen (1996) but equivalent to Bollen and Maydeu-Olivares (2007) in finite samples. Fisher and Bollen (2020) only focused on point estimation and standard error estimation. In contrast, we will also develop goodness-of-fit tests for the whole model and overidentification tests for individual equations. Similarly to the standard error estimators, one set of overidentification tests for equations matches Kirby and Bollen (2009) for observed continuous variables and the other set matches Jin and Cao (2018) for observed ordinal variables.
The rest of the paper is organized as follows. We first briefly present the SEM model and the MIIV idea. Then, we develop the estimation of SEMs with the unified MIIV approach. Next, we provide model goodness-of-fit tests and overidentification tests of equations. We conduct a Monte Carlo simulation to investigate the small sample properties of the proposed approach. An empirical example illustrates our approach. A discussion and conclusion section ends the paper.

SEM Model
Consider the SEM model where y * is the vector of continuous variables, is the matrix of factor loadings, η is the vector of latent variables, B is the matrix of latent regression coefficient, and ε and ζ are disturbances. Without loss of generality, we assume that y * has a zero mean. We can justify the zero mean assumption by forming mean deviation variables for the observed continuous variables and for observed binary and ordinal variables researchers commonly assume that their underlying variables have means of zero. We also assume that η and ε are independent and that ζ and ε are independent. The η vector consists of endogenous and exogenous latent variables with the exogenous variables independent of the errors in ζ that correspond to the endogenous variables. Furthermore, the disturbances ε and ζ are homoscedastic, where var(ε) = and var(ζ ) = .
The observed vector is denoted by y, which is not necessarily the same as y * . If y j , the jth entry in y, is continuous, then y j = y * j , where y * j is the jth entry in y * . If y j is binary or ordinal, then it is obtained by categorizing the underlying continuous variable y * j according to the threshold values. If y j is censored, then it is obtained by censoring y * j at a boundary. If y * only contains observed continuous variables, we only need it to have finite moments up to the fourth order. If y * contains any observed binary, ordinal, or censored variables, we assume η, ε, and ζ follow a multivariate normal distribution.
Let be the population covariance matrix of y * . If some entries in y * are discrete, the corresponding variances are fixed to 1 for identification. Estimation of S, the sample counterpart of , depends on the types of indicators. For example, using Olsson (1979), we can estimate the polychoric correlation for two ordinal variables, and using Olsson et al. (1982), we can estimate the polyserial correlation for one ordinal indicator and one continuous indicator. Throughout the paper, we assume that S is a consistent estimator of and that where n is the sample size, s is the vector of nonredundant free entries in S, σ is the vector of corresponding free entries in , and ϒ is the asymptotic covariance matrix.
Our SEM model implies that equals the implied covariance matrix of where θ is the vector of all model parameters (i.e., free parameters in , B, , and ) and (·) −T takes the transpose of the inverse of the enclosed matrix. The diagonal entries of that correspond to binary and ordinal indicators are not free parameters, but are restricted so that the corresponding entries in (θ ) are 1 for identification. The traditional systemwide approaches estimate θ by minimizing the fit function where σ (θ ) is the vector of free entries in (θ ) and the weight matrixŴ is a consistent estimator of W . Different weight matrices yield different estimators, e.g., W = I in unweighted least squares (ULS; Muthén, 1978), W is the inverse of diagonal elements of ϒ in diagonally weighted least squares (DWLS; Muthén et al., 1997), and W = ϒ −1 in weighted least squares (WLS; Browne, 1984).

Latent to Observed Variable (L2O) Transformation
Each latent variable must be given a scale or metric. To set the scale of η, the most common approach is to choose one indicator per latent variable and to set its factor loading to one so that it becomes the scaling indicator. When possible, we should choose the scaling indicator to have factor complexity of one and a high R 2 . Suppose that we partition y * into ( y * T 1 , y * T 2 ) T and ε into (ε T 1 , ε T 2 ) T such that y * 1 = η + ε 1 is for the scaling indicators and y * 2 = 2 η + ε 2 contains unknown factor loadings for the nonscaling indicators. Following Bollen and Maydeu-Olivares (2007), the SEM model is equivalent to which is simply a multivariate regression system with "observed" variables. This is referred to as the Latent-to-Observed (L2O) variable transformation (Bollen, 2019), because the original equation system with latent variables is transformed to one without latent variables. When we have noncontinuous endogenous observed variables, the L2O transformation results in underlying scaling variables rather than observed ones. Regardless, in most cases, this leads the regressors to be correlated with one or more parts of the composite error terms. We can use MIIVs to consistently estimate the regression coefficients in B and 2 . Equation (5) indicates that we can partition θ into two vectors: θ 1 (free parameters in 2 and B) and θ 2 (the nonredundant free parameters in and ). To estimate θ 1 , Bollen (1996) and Bollen and Maydeu-Olivares (2007) proposed to consider the equation system (5) in a row-by-row manner. Suppose that the jth row of the system (5) is where z * j is the right-hand side explanatory variable vector, and θ ( j) 1 is the vector of parameters. We assume that there exists a non-empty subset of z * j that is correlated with the error e j . As the names implies, the MIIVs, denoted by v * j , are selected from the variables y * , excluding y * j and the elements in z * j that have nonzero correlations with e j . The valid MIIVs v * j must be correlated with the endogenous variable z * j as well as uncorrelated with the error term e j , and the number of v * j must be no lower than the number of z * j . We can find the pool of all valid MIIVs for a given equation by using the algorithms proposed by Bollen (1996). Due to space limitation, we direct the readers to Bollen (2019) for examples of Eq. (6) and the algorithm of selecting valid IVs. It is worth mentioning that the choice of the scaling indicator will affect the choice of MIIVs and the estimates. Though the estimator remains consistent, its asymptotic variance could differ. As mentioned earlier, it is preferable to choose as scaling indicators those indicators that are thought to be most closely related to the latent variable that they measure. Bollen (1996;2001), , and Kirby and Bollen (2009) have investigated the case where y is continuous. In contrast, Bollen and Maydeu-Olivares (2007), Jin et al. (2016), Cao (2018), andNestler (2013) have investigated the case where y is ordinal. Fisher and Bollen (2020) is the only study we know of that considered indicators of different types.

Estimation by the MIIV Approach
In this section, we propose a unified approach for estimating and testing models with continuous, binary, or ordinal endogenous observed variables. The approach applies as long as the sample moment matrix is a consistent estimator of the population moment matrix, with an estimate of the asymptotic covariance matrix of the elements of such matrices.

Point Estimator of θ 1
If Eq. (6) is correctly specified and the MIIVs are valid, then θ with vz, j being the covariance matrix between v * j and z * j , and vv, j being the covariance matrix of v * j , and vy, j being the covariance matrix between v * j and y * j . The MIIV estimator iŝ where S ··, j is the sample counterpart of ··, j . The same estimator was proposed by Bollen and Maydeu-Olivares (2007) and Fisher and Bollen (2020). If all observed variables are continuous, the MIIV estimator (7) is equivalent to the estimator in Bollen (1996) where y j , Z j , and V j are the demeaned data matrices of y * j , z * j , and v * j , respectively. Similar to MIIV-2SLS and PIV, the MIIV estimator is a consistent estimator of θ ( j) 1 , as long as Eq. (6) is correctly specified, the inverses in Eq. (7) exist, and the MIIVs are valid. This implies that we can maintain consistency ofθ ( j) 1 in a misspecified SEM model under certain conditions (e.g., Bollen, 2001;Bollen et al., 2018b;Bollen, 2020). In contrast, the systemwide estimators sometimes spread the bias from a poorly specified equation in one part of the model to a correctly specified equation in another part.
To obtain the estimator of the standard error, K j (σ ) is estimated by K j (s) and estimation of ϒ depends on the distributional assumption. In general, we can estimate ϒ using the approach in Muthén (1984). If all observed variables are ordinal, we can estimate ϒ from Jöreskog (1994). If all observed variables are continuous, we estimate ϒ using the fourth-order moments as in the asymptotic distribution-free approach (Browne, 1984). If all observed variables are normally distributed, we can estimate ϒ by 2 D + (S ⊗ S) D +T , where D is a duplication matrix and D + = D T D −1 D T (Browne, 1987).

Equivalence of Standard Errors ofθ 1
The asymptotic distribution attained by Bollen and Maydeu-Olivares (2007) when all observed variables are ordinal is the same as the asymptotic distribution (9). The same distribution is also used by Fisher and Bollen (2020). As mentioned above, this asymptotic distribution is applicable regardless of the validity of the MIIVs. If Eq. (6) is correctly specified and all MIIVs are valid, vy, j = vz, j θ ( j) 1 , then the first term in equation (10) vanishes and where The covariance matrix (11) can be estimated by whereˆ j is the estimator of with evaluated at S and ϒ evaluated at some estimatorΥ. If all observed variables are continuous and all MIIVs are valid, Bollen (1996) showed that vv, j vz, j is the asymptotic covariance matrix of the predicted right-hand side endogenous variable of Eq. (6) and is the variance of the error term e j . We expect the asymptotic covariance matrix ϕ 2 j −1 zẑ, j to be equivalent to (11) when all observed variables are continuous and all MIIVs are valid, since both are asymptotic covariance matrices ofθ ( j) 1 . It is worth emphasizing that we need all MIIVs to be valid in order for (11) and (13) to be valid, but neither the observed continuous variables nor the disturbances need to be normally distributed (Greene, 2002, page 77). We estimate the asymptotic covariance zẑ, j and equation (11), their finite sample estimators may still differ, as revealed in the following proposition. For ease of presentation, all proofs are placed in the appendix.

and the estimator (12) is the same asφ
Proposition 1 shows that the estimator 2 D + (S ⊗ S) D +T equates the estimator (12) and the standard error in Bollen (1996) in finite samples. Such an estimator is a consistent estimator of ϒ under the normality assumption. In contrast, if the normality assumption is relaxed and ϒ is estimated from the sample fourth-order moments, the estimator (12) is not necessarily equal For Proposition 1 to hold, we do not even need the MIIVs to be valid. Regardless of the validity of MIIVs, we always have which makes the estimator (12) the same asφ 2 As revealed in the following proposition, if we relax the assumption that ϒ is estimated by 2 D + (S ⊗ S) D +T but assume that all MIIVs are valid, the finite sample equivalence still holds.

Proposition 2.
Suppose that all observed variables are continuous with finite fourth-order moments, Eq. (6) is correctly specified, and all MIIVs are valid. Then, j = ϕ 2 j vv, j .
Proposition 2 implies that we can estimate j byφ 2 j S vv, j to attain the finite sample equivalence of (12) andφ 2 . It is also worth mentioning that Proposition 2 does not require the observed continuous variables to be normally distributed.

Point Estimator of θ 2
To estimate θ 2 givenθ 1 , we minimize the least-squares fit function T θ 1 , θ 2 , where T (θ ) is given by (4). The weight matrices commonly used for systemwide estimators can naturally be used here. Bollen and Maydeu-Olivares (2007) and Fisher and Bollen (2020) used ULS. It is worth mentioning that the estimator of θ 2 is not robust against model misspecification, since we use the systemwide estimation here [see Bollen and Maydeu-Olivares (2007) page 321 and Nestler (2015b)]. It is also worth mentioning that non-convergence is not an issue when estimating θ 1 , since it is based on the closed-form estimator (7). However, when T θ 1 , θ 2 is minimized to estimate θ 2 , non-convergence becomes a potential issue for the MIIV approach as well. However, we can always estimateθ 1 even if non-convergence prevents us from estimating θ 2

Standard Error ofθ 2
Bollen and Maydeu-Olivares (2007) derived the asymptotic distribution of the PIV estimator θ 2 under the ULS fit function. Their equation (31) was taken by Fisher and Bollen (2020) in equation (33). Under certain conditions, it is shown in the appendix that where We estimate the standard errors from distribution (15) by evaluating θ, σ , and ϒ atθ, s, and ϒ, respectively. Our H matrix is more general than the H matrix in equation (33) of Fisher and Bollen (2020) because we allow the use of a more general weight matrix W to develop the standard errors whereas they assume W = I as is true for ULS. 1

Goodness-of-fit Tests for Model
Bollen and Maydeu-Olivares (2007) propose goodness-of-fit tests for the model as a whole for PIV. We develop similar model test statistics for the MIIV approach here. We show in the appendix that, if the model is correctly specified, nT (θ) evaluated at the MIIV estimates converges in distribution to a weighted sum of independent Chi-square random variables with 1 degree of freedom, where the weights are the eigenvalues of with J (θ ) = ∂σ (θ) /∂θ T . Hence, we can apply the Satorra and Bentler (1994) adjustments to assess the overall goodness of fit of the model. In particular, the mean-scaled statistic and the mean-variance adjusted statistic are respectively, where r is the difference between the number of nonredundant free entries in and the number of free parameters. The mean-scaled statistic is approximated by a Chi-square distribution with r degrees of freedom and the mean-variance adjusted statistic is approximated by a Chi-square distribution with tr M 2 /tr M 2 degrees of freedom.

Overidentification Test for Equations
We mentioned earlier that an advantage of the MIIV approach is that we have an overidentification tests for every overidentified equation in the system (5). This provides a test of the validity of the MIIVs and the validity of the model specification. The reader is directed to Bollen (2019) for the implications of such a test. In this section, we present the asymptotic Chi-square distributed overidentification test that is applicable to continuous, ordinal, and binary endogenous observed variables. The derivation by and large follows the derivation in Jin and Cao (2018).
If equation (6) is correctly specified and the MIIVs are valid, g j (σ ) = 0. Then the delta method indicates that It is easy to show that is an idempotent matrix, but not necessarily symmetric. LetQ j be an estimator of Q j by replacing with S and θ 1 . Together with Eq. (7), it can be shown that where the second equality holds since respectively. The following theorem shows that is asymptotically Chi-square distributed.
Theorem 1. As n → ∞, F converges in distribution to a Chi-square distribution with L j − K j degrees of freedom, provided that Eq. (6) is correctly specified and all MIIVs are valid, where L j is the number of MIIVs, K j is the number of explanatory variables inz j , and L j − K j > 0.
The expression of F is the same as the equation (9) in Jin and Cao (2018) for observed ordinal variables. We have shown in Theorem 1 that it is also valid for a mixture of different types of variables. We can interpret the test statistic F as the generalized Wald statistic of Andrews (1987).

Connection with Sargan's Chi-Square
The test statistic (19) is asymptotically Chi-square in general, including the case where all observed variables are continuous. The Sargan's Chi-square statistic 1 is the residual vector from Eq. (6). Equivalently, The asymptotic distribution of F Sargan is derived by Sargan (1958 Otherwise, the distribution (17) applies. Despite of the asymptotic equivalence, F and F Sargan are not necessarily the same due to the finite sample estimator of j . A properly chosen estimator of −1/2 j G j −1/2 j is needed to yield the established test statistics in the literature. By Proposition 2, if j is estimated byφ 2 j S vv, j , then F = F Sargan at any sample size. Alternatively, j can be estimated bỹ is asymptotically the same as F, whereG j is the Moore-Penrose inverse ofQ jQ T j with If all variables are normal andΥ = 2 D + (S ⊗ S) D +T , then˜ j reduces toφ 2 j S vv, j (Proposition 1) andF is exactly the same as F Sargan at any sample size.

Satorra-Bentler Adjusted Overidentification Tests of Equations
Besides the Chi-square distributed test statistic, Jin and Cao (2018) proposed alternative test statistics in the spirit of Satorra and Bentler (1994) and showed that the Satorra-Bentler-based statistics tend to have better small sample properties than the Chi-square distributed test statistic. In this section, the test statistic in Jin and Cao (2018) is extended to the mixture of different types of variables as well.
We can express the Sargan's Chi-square statistic F Sargan as Jin and Cao (2018) termed (23) as "Naive Sargan's Chi-square statistic" and showed that its asymptotic distribution is a weighted sum of independent Chi-square random variables with 1 degrees of freedom, if all observed variables are ordinal. The next theorem generalizes their results.
Theorem 2 implies that we can apply the Satorra-Bentler-type adjustments to F Sargan . In particular, the mean-scaled statistic and and the mean-variance adjusted statistic are respectively, whereˆ j is the estimator of j , replacing by S and θ 1 . We can approximate the mean-scaled statistic by a Chi-square distribution with L j − K j degrees of freedom and approximate the mean-variance adjusted statistic by a Chi-square distribution with tr ˆ j 2 /tr ˆ 2 j degrees of freedom.
If the observed variables are ordinal, the mean-scaled and mean-variance adjusted statistics derived by Jin and Cao (2018) are respectively, whereˆ j is an estimator of Despite that j is not necessarily the same as j , the following corollary shows their connections. Corollary 1 indicates that F m and F mv are asymptotically the same as the statistics developed by Jin and Cao (2018), if only ordinal variables are observed and all MIIVs are valid. If all observed variables are continuous and all MIIVs are valid, j = Q j , and F m , F mv , and F Sargan are asymptotically the same. The implication is that F m and F mv can always be computed as test statistics and be approximated by their corresponding Chi-square distributions. Similarly to the Chi-square distributed test statistic, the consistent estimator˜ j needs to be used to ensure small sample equivalence of F m , F mv , and F Sargan under the assumptions in Proposition 1, whereaŝ j only ensures asymptotic equivalence. Hereafter, the mean-scaled statistic and mean-variance adjusted statistic using˜ j are denoted byF m andF mv , respectively.

Monte Carlo Simulation
In this section a simulation study is conducted to investigate the finite sample properties of the proposed MIIV approach. The simulation is performed in R (R Core Team, 2020) and is built on the packages lavaan (Rosseel, 2012) and MIIVsem (Fisher et al., 2017). 2 Due to space limitation, we present only limited results in the subsequent section.

Simulation Design
To explore some of our proposed statistics, we use the SEM model in Li (2016) with two exogenous and three endogenous latent variables. The latent regression of the true data generating process with standardized coefficients is shown in Fig. 1. Every latent variable is measured by three indicators. The indicators of η 1 , η 3 , and η 5 are continuous, whereas the indicators of η 2 and η 4 are ordinal with five categories. The response probabilities are 0.04, 0.05, 0.21, 0.46, and 0.24, which is the slightly asymmetric condition in Li (2016). Each ordinal variable has an underlying continuous variable, and we set the error variances of the latter so that their total variances equal one. For simplicity, we also let the variances of continuous indicators equal one, but the error variances of continuous indicators are freely estimated. The factor loadings are set to 0.8, 0.65, and 0.5 with corresponding R 2 of indicators being 0.64, 0.42, and 0.25.
Given the scaling indicator, MIIVs are automatically generated from the hypothesized model using the package MIIVsem. Previous studies (e.g., Jin and Cao, 2018) suggested that when the sample size is small, using fewer IVs results in less bias than using all IVs. Hence, we consider three estimators in the simulation study, i.e., MIIV with one more IV than the number of endogenous variables (denoted by MIIV1), MIIV with all possible IVs (denoted by MIIVall), and DWLS. We use the same set of MIIVs for estimation and in the model goodness-of-fit tests and the equation overidentification tests.
Since the MIIV approach uses the scaling indicators to set the scale of η, we explore two choices of the scaling indicator, i.e., the indicators with standardized factor loadings 0.8 and the indicators with standardized factor loadings 0.5. The former sets the scale of η using the indicator with the highest R 2 , whereas the latter uses the indicator with the lowest R 2 . We also use the same MIIV scaling indicators for DWLS, so that we can compare the effects of scaling on different estimators. Fisher and Bollen (2020) have shown that the scaling indicator with a higher Shea (1997)'s R 2 contributes to lower biases in the MIIV approach. The R 2 of the scaling indicator measures the reliability of that indicator, whereas the Shea (1997)'s R 2 is a diagnostic for "weak" instrumental variables. Choosing the scaling indicator as the one with the lowest reliability or lowest Shea (1997)'s R 2 weakens the instrument compared to choosing the scaling indicator with the higher values. MIIVall uses all MIIVs in estimation. MIIV1 uses one more than the minimum number of MIIVs and these are chosen to maximize the Shea (1997)'s R 2 . Figure 2 illustrates the values of Shea (1997)'s R 2 for the data generating process. There are ten equations for factor loadings and three equations for latent regression coefficients in our model. Figure 2 shows that the maximum Shea (1997)'s R 2 never exceeds 0.5 and the highest values occur when the scaling indicator has the highest R 2 value. The Shea (1997)'s R 2 is considerably lower for the scaling indicators with the lowest R 2 s. Indeed for two of these equations, the Shea (1997)'s R 2 value is lower than 0.1. Although the high R 2 scaling indicator condition generally exceeds the low R 2 scaling indicator condition, the last two equations for the latent variable model have values of 0.1 or less. Hence, the simulation allows us to examine how very weak instrumental variables affect the estimates.
We fit both correctly specified and misspecified models. The hypothesized model is correctly specified, if it includes all paths in Figure 1. The misspecified model omits the dashed path b 43 . For each model, six sample sizes are considered, namely, n = 200, 400, 800, 1200, 2000, and 3200. The number of replication is 10, 000, for each combination of the hypothesized model, sample size and choice of scaling indicator.

Outcome Measures
If only the coefficients and factor loadings of θ 1 are of interest, MIIV always converges. Non-convergence is only a potential problem for MIIV when θ 2 is estimated, but it is always a potential problem for DWLS. In the current study, θ 2 is estimated using the DWLS fit function, given the MIIV estimator of θ 1 . Even for the converged estimates, the covariance matrices of error terms and factors are possibly non-positive definite. Both non-convergence and non-positive definite solutions are regarded as improper and are discarded in the current study.
To investigate the accuracy of the MIIV estimator, we compute the relative bias 100 × median of θ −1 θ r − θ , r = 1, ..., R , whereθ r is the estimate of θ in the r th replication and R is the number of proper replications. We use the median instead of the mean to prevent outliers from affecting the central tendency estimate in the smaller sample sizes. We focus on relative bias greater than 5 percent bias. The relative bias is also used to investigate the accuracy of the  standard error estimates. However, we do not know the true value of the standard error. Hence, we use the sample standard deviation of the estimates as the pseudo-true value. Due to the presence of outliers, we use the interquartile range to trim the estimates, where the interquartile range is calculated by the third empirical quartile minus the first empirical quartile. Any estimate of a parameter is trimmed if the estimate is less than the first empirical quartile minus 3 times the interquartile range or larger than the third empirical quartile plus the 3 times the interquartile range. Since the converged replications with negative definite covariance matrices also reflect sampling fluctuation, the standard deviation of the converged estimates after trimming is treated as robust estimate of the standard error.
Concerning the goodness-of-fit test for the model as a whole, both the mean scaled test T m and the mean-variance adjusted test T mv are investigated. For DWLS, only the mean-scaled test is used. The goodness-of-fit tests of models are only computed for proper solutions. Regarding the overidentification tests for equations, the Chi-square tests ((19) and (21)), the mean scaled tests (F m andF m ), and the mean-variance adjusted tests (F mv andF mv ) are investigated. We will only focus on the overidentification equation tests for the misspecified model, since it contains both correctly and incorrectly specified equations when estimating θ 1 . Because the overidentification equation tests does not require estimation of θ 2 , we compute them for all converged solutions, including the solutions with negative covariance matrices. We use the empirical percentage of test statistics that exceed the χ 2 critical value at the significance level 0.05 for all test statistics.  Table 1 shows that MIIV and DWLS generally have similar percentages of proper solutions when the R 2 of the scaling indicator is high. If the R 2 is low and n is low, MIIV can yield more improper solutions than DWLS. Recall that MIIV always converges if only θ 1 is estimated. Hence, the improper solutions for MIIV originates from estimating θ 2 .

Correctly Specified Model
When the model is correctly specified, all estimators are consistent estimators. We report the average of the absolute value of the relative bias for each of the four parameter sets ( , B, , and ). Figure 3 shows that the biases of all estimators converges toward zero as sample size grows. Under the high R 2 condition, the average absolute value of the relative bias of the parameter estimates is low for DWLS, MIIVall and MIIV1. In contrast, when both the R 2 and sample size are low, DWLS maintains low bias, while MIIVall and MIIV1 exhibit more bias that diminishes as the sample size increases. It is also clear that MIIV1 tends to be less biased than MIIVall, especially when the scaling indicator has a low R 2 . When the R 2 is low, MIIVall has the highest average absolute value of the relative bias for the latent variable equation error variances ( ). Figure 4 reveals the averaged absolute value of relative bias of the standard error estimates for each parameter set. We found some extreme outliers at the smaller sample sizes, so we trimmed values that were greater than three times the interquartile range above the third quartile or below the first quartile. This resulted in no more than 1.3% of cases trimmed which occurred when n = 200. The percentage trimmed rapidly decreased toward zero as n increased. For MIIV, both standard errors from (9) and (12) are expected to be accurate in the correctly specified model. If the R 2 of the scaling indicator is high, all standard errors are accurate. The only exception is that the relative bias of the standard errors is larger for the latent variable regression coefficients (B) for DWLS at the smaller sample size. If the R 2 is low, the pattern is more complex. Under these conditions, the relative bias of the standard errors of the factor loadings and latent variable regression coefficients are lowest for MIIV1 (Eq. 9), MIIV1 (Eq. 12), and MIIVall (Eq. 12). The bias is greatest for MIIVall (equation 9) and DWLS, though these biases diminish as sample size increases. Continuing with the low R 2 results, the bias of the standard errors of the variance  Averaged absolute value of the relative bias of the standard error estimators when the model is correctly specified. Note: The MIIV standard errors ofˆ andB are computed from Eq. 9 or 12. The MIIV standard errors ofˆ andˆ are computed from Eq. 15, hence MIIV Eq. 9 is the same as MIIV Eq. 12.
parameter estimates are highest for MIIVall followed by MIIV1 and DWLS. These too diminish as the sample size increases. When MIIV1 is used, the standard error estimator (12) tends to be similar to (9). However, the standard error estimator (9) tends to be more biased than (12) for MIIVall. Given that the model is correctly specified, we can explore the empirical size of the goodnessof-fit tests for models by computing the percentage of test statistics that exceed the critical value of a Chi-square with significance level 0.05. Figure 5 shows that using different number of IVs does not have strong effects on the empirical size. The mean-variance adjusted statistics for MIIV are the closest to the nominal level of all of the model test statistic (Jin et al., 2016). The size of the mean-scaled statistic for DWLS tends to be too high at about the same magnitude for the high and low R 2 scaling indicators. The mean-scaled statistics for MIIV are the least accurate, tending to be somewhat higher under the high R 2 condition and much higher under the low R 2 condition.

Misspecified Model
In the misspecified model, b 43 is mistakenly fixed to 0, indicating that the latent regression with η 4 as the dependent variable ( j = 12) is misspecified. Bollen (2001) and Bollen et al. (2018a) give the conditions under which the MIIV-2SLS and PIV estimator of coefficients will be robust under structural misspecifications such as in this model. Using their results, we know that MIIV factor loading ( ) estimates are identical under the correct and incorrect specifications and hence will have identical properties. Similarly, the MIIV estimates of the coefficients of b 51 , b 52 , b 53 and b 54 from the η 5 equation are robust to the structural misspecification. In contrast, the MIIV estimates of b 41 and b 42 from the latent regression for η 4 are not robust due to the omission of η 3 from the η 4 equation. In addition, the MIIV estimates of b 31 and b 32 from the latent regression for η 3 (with j = 11) are not robust. The reason is that in the misspecified model, it mistakenly appears that y 10 , y 11 , and y 12 , the indicators of η 4 , are suitable MIIVs because the path from ζ 3 to these indicators is cutoff by setting b 43 to zero. Hence, we expect MIIVall to be consistent for , b 51 , b 52 , b 43 and b 54 , but inconsistent for b 31 , b 32 , b 41 , and b 42 . In contrast, MIIV1 can be consistent for b 31 and b 32 as long as y 10 , y 11 , and y 12 are excluded. These robustness conditions describe the status of the factor loadings and latent variable coefficients, but there are no analogous analytic conditions for the variance parameters.
MIIV1 and MIIVall versions of MIIV are robust in the estimation of the factor loadings ( ). Therefore, the results of estimating the factor loadings for MIIV1 and MIIVall should be the same in the misspecified as they were in the correctly specified model, since the misspecification in the latent regression model does not affect the MIIVs for the measurement model. Figure 6 confirms this. Both MIIV1 and MIIVall yield consistent estimators of regardless of the R 2 of the scaling indicator. DWLS does not seem to spread the bias to the measurement model and in the smaller samples the bias is even lower than that of MIIV1 and MIIVall, though the differences are slight in the bigger samples. In contrast, DWLS spreads the bias over all regression coefficients from the entire latent regression part. The DWLS average of absolute bias ranges from a low of 6 to 8 percentage bias for b 31 and b 32 to 50 percent bias for b 51 , b 52 , b 53 and b 54 and 70 percent bias for b 41 and b 42 . The MIIV1 and MIIVall estimators of b 51 , b 52 , b 53 and b 54 have large small sample biases, but decrease toward zero as n increases. The MIIV1 estimates of b 31 and b 32 have low biases, whereas the MIIVall estimator has a large bias if the scaling indicator has a large R 2 . It is however interesting to see that the small sample bias of MIIVall is low if the scaling indicator has a low R 2 . Hence, the MIIVall estimator of b 31 and b 32 can be inconsistent, but the asymptotic bias can still be low. When it comes to the effect of the scaling indicator, using a strong scaling indicator generally yields a lower bias than using a weak one for all parameters that can be consistently estimated.
If the model is misspecified, we still expect the standard errors from Eq. (9) to correctly quantify the variation ofθ 1 , even thoughb 31 ,b 32 ,b 41 , andb 42 are not consistent estimators of the true population value. However, the standard errors from (12) can be biased for these same coefficients, since it needs the MIIVs to be valid. From Eq. 10, if S vy, j − S vz, j γ j (s) is far away from 0, the bias is large. Otherwise the bias is small. It is seen from Figure 7 that all MIIV standard errors have low biases as n increases, regardless of the R 2 . In contrast, the DWLS standard errors tend to be biased for the estimates of B. It is interesting to see that the standard error (12) also yields a low bias when the MIIVs are invalid. Nevertheless, we still prefer standard errors from Eq. 9, as it is theoretically valid. The bias of estimated standard errors is often higher if the R 2 of the scaling indicator is low. To produce Fig. 7, the percentage of trimmed replications is generally small (e.g., no more than 1.7% are trimmed when n = 200 and no more than 0.3% are trimmed when n = 400).
The empirical powers of the goodness-of-fit tests for the full model are in Fig. 8. Recall that the mean and variance adjusted test statistics (T mv ) for MIIV1 and MIIVall had the most accurate Type I error probabilities in the correctly specified model under both the high and low R 2 conditions. The T m test statistic from MIIV1 and MIIVall and the DWLS test statistic rejected too frequently with the correct model. For the misspecified model, using a strong scaling indicator tends to yield higher power than using a weak scaling indicator for all MIIV and DWLS test statistics. Nevertheless as n increases, the power of all tests tends to increase. Under the high R 2 condition, MIIV1 (T m ) has the highest power with the test statistics MIIV1 (T mv ), DWLS, and MIIVall (T m ) the next highest. MIIVall (T mv ) has the lowest power. Under the low R 2 condition, MIIV1 (T m ), MIIVall (T m ), and DWLS have the highest power followed by MIIV1 (T mv ) and MIIVall (T mv ). If we want the best combination of accurate Type I error and high statistical power under the high R 2 condition, then MIIV1 (T mv ) is the best choice. The situation is more ambiguous for the low R 2 condition.
. Figure 9 illustrates that the sizes of all proposed overidentification tests become closer to the significance level 0.05 for the correctly specified η 5 equation with the rate of convergence more rapid if the R 2 of the scaling indicator is high. The power of the MIIVall tests for the η 3 ( j = 11) and η 4 ( j = 12) equations increase as n increases. However, the power of the MIIV1 tests is still far from 1, which is in line with the results in Jin and Cao (2018). It is also seen that the MIIV1 tests of F, F m , and F mv perform almost the same and that the MIIV1 tests ofF,F m , and F mv perform almost the same. Hence, they are almost visually indistinguishable in the figure. For MIIVall, using a strong scaling indicator yields a higher power than using a weak scaling indicator.

Empirical Example
To demonstrate the use of our MIIV estimator, we consider the Reisenzein (1986) dataset with 138 observations that is available in the MIIVsem package. It is hypothesized that the effect of Controllability (perceived controllability over a situation) on Help (decision to help another person) is mediated by Sympathy and Anger, shown in Fig. 10. Each factor is measured by three indicators. Every observed variable of Controllability and Anger has nine categories and are treated as continuous. The observed variables of Sympathy and Help are measured by either the five-point or nine-point Likert scales. They are converted to five-point Likert scales and are treated as categorical.  We first use MIIV1, MIIVall and DWLS to estimate the model in Fig. 10 without the crossloading s * 3 on Help. For both MIIV1 and MIIVall, there exists eight indicator equations with factor loadings (with left-hand side variables c 2 , c 3 , s * 2 , s * 3 , a 2 , a * 3 , h * 2 , h * 3 ) and three latent variable equations with regression coefficients (with left-hand side variables Sympathy, Anger, and Help). It is seen from the top panel of Table 2 that all goodness-of-fit tests reject the null hypothesis that the model fits the data well. However, the robust CFI, TLI, and RMSEA with DWLS are 0.996, 0.995, and 0.053, indicating a reasonable fit. In contrast, Table 3 shows that the MIIVall specification tests are highly significant when used to test the equations for s * 3 and Help. Here, the Bonferroni correction is used for simplicity to adjust the effect of multiple testing. Hence, the equations can be misspecified leading to invalid MIIVs. If we add the cross-loading s * 3 on Help, all goodness-of-fit tests become insignificant (bottom panel of Table 2). It is seen from Table 4 that MIIV and DWLS yield similar point estimates and standard errors when the revised model is fitted.

Discussion and Conclusion
We proposed a unified MIIV approach that handles a mixture of continuous, ordinal, or binary observed endogenous variables in SEM. Our method only requires a consistent estimator of the covariance matrix and an asymptotic covariance matrix of its elements. We provide point estimators of all parameters and their asymptotic standard errors. Furthermore, we provide model goodness of fit test statistics and local tests of overidentified equations. The latter of which correspond to the classic Sargan's Chi-square test and the tests in Jin and Cao (2018). In addition, we give Satorra-Bentler-type modifications to these test statistics. The simulation study shows that the small sample properties of the proposed MIIV approach is generally in line with the theoretical results. That is, the MIIV estimators are more robust to structural misspecifications than are the systemwide estimators; the overidentification equation tests provide useful local tests of fit, and the model goodness of fit tests provide useful diagnostics on global fit. The performance of these MIIV tests is best with strong MIIVs and deteriorates if the MIIVs are weak. The MIIV procedure is applicable to a large class of latent variable models, as long as they can be expressed as (1) and (2). Among these are the confirmatory factor analysis model (Jin et al., 2016;Jin and Cao, 2018;Nestler, 2013), the latent growth model (Nestler, 2015b), and specification tests for nonlinear terms in the latent variable model (Nestler, 2015a). The MIIV estimation essentially depends on S, which can be interpreted as a method of moment. Various latent variable models such as the item response theory models are expressed from the likelihood perspective. When using the probit link, some connections between the MIIV estimates and item response theory parameters are possible (Takane and de Leeuw, 1987). But for item response-type models in general the connections to a MIIV approach are less clear.
The finite sample equivalence among various test statistics depend on the estimator of . The estimatorsˆ and˜ are asymptotically the same but differ by a consistent estimator of 0 if all MIIVs are valid. If some MIIVs are not valid,ˆ and˜ are not always asymptotically the same. In other words, the established equivalence is only for the null distribution. A rigorous power analysis and extensive simulation studies to investigate the distributions of the test statistics under the alternative hypothesis and the effects of using different estimators in order to make further recommendations.
We also explored weak instruments in our simulation. An IV is commonly viewed as weak if its correlation with the endogenous variable is small. With the presence of weak IVs, it is known in the econometrics literature that the 2SLS estimator is biased, and the overidentification test has the wrong size when even a small correlation between instruments and equation errors exist (e.g., Bound et al., 1995;Hahn and Hausman, 2003;. In our simulation, we considered both MIIV1 and MIIVall. The weak IVs are not excluded from MIIVall, but MIIV1 only incorporates the strongest IVs for the given data set. We found that scaling indicators with low R 2 s were associated with weak instruments. A number of authors have proposed diagnostics for weak instruments including Hahn and Hausman (2002), Kleibergen and Paap (2006), Olea and Pflueger (2013), Shea (1997), and Stock and Yogo (2005). Fisher and Bollen (2020) showed that the accuracy of estimation is negatively related with the Shea's partial R 2 . Further studies are needed to provide guidelines on what to do when all IVs are weak or when some IVs are strong but others are weak. Finally, we recognize the limits of any simulation study and encourage other simulation studies that explore a wider variety of conditions to determine the degree to which our findings generalize.

Appendix: Mathematical Proof
Proof of Proposition 1. Let σ vz be an element in vz, j , σ vv be an element in vv, j , and σ vy be an element in vy, j . Then For any other σ that does not belong to vz, j , vv, j , nor vy, j , ∂ g j (σ ) /∂σ = 0. In matrix form, the nonzero part in ∂ g j (σ ) /∂σ T is essentially subject to a permutation of columns, where ⊗ is the Kronecker product. IfΥ = 2 D + (S ⊗ S) D +T , the estimator of ϒ that corresponds to the nonzero part in ∂ g j (s) /∂σ T is S yy, j S vv, j S T zy, j ⊗ S vv, j S zy, j ⊗ S vv, j S zz, j ⊗ S vv, j + S vy, j S T where rank {} and tr {} denote the rank and the trace of the enclosed matrix, respectively. Since G is the Moore-Penrose inverse of Q Q T , condition (29) is naturally satisfied. Hence, it suffices to show condition (30). It can be easily shown that G Q Q T is an idempotent matrix when G is a Moore-Penrose inverse. Hence, where the second equality holds because of Corollary 9.3.8 in Harville (1997). Note that, as the Moore-Penrose inverse, Hence, condition (30) is also satisfied. Further, as a gram matrix, Recall that Q is idempotent, then the degrees of freedom can be simplified to That is, F is asymptotically Chi-square distributed with L j − K j degrees of freedom.
Proof of Theorem 2. Equation (18) and its weight matrix is symmetric. Hence, the distribution (17) indicates that it converges in distribution to a weighted sum of independent Chi-square random variables with 1 degrees of freedom each and the weights are the eigenvalues of ϕ −2 j Q T 1/2 −1 vv, j 1/2 Q. Finally, plugging in the expression of Q into the weights yields the expression of in the theorem.