
Estimating Structural Equation Models Using James–Stein Type Shrinkage Estimators

  • Theory and Methods

Psychometrika

An Erratum to this article was published on 05 May 2021

Abstract

We propose a two-step procedure to estimate structural equation models (SEMs). In the first step, the latent variable is replaced by its conditional expectation given the observed data; this conditional expectation is estimated using a James–Stein type shrinkage estimator. In the second step, the dependent variables are regressed on this shrinkage estimator. In addition to linear SEMs, we derive shrinkage estimators for models that are polynomial in the latent variable. We empirically demonstrate the feasibility of the proposed method via simulation, contrasting the proposed estimator with maximum likelihood (ML) and model-implied instrumental variable (MIIV) estimators in a limited number of simulation scenarios, and we illustrate the method on a case study.

References

  • Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33(3), 335–345.

  • Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

  • Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. Sociological Methodology, 25, 223–251.

  • Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109–121.

  • Bollen, K. A. (2018). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54, 1–16.

  • Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models (2nd ed.). New York: Chapman and Hall.

  • Efron, B. (2011). Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496), 1602–1614.

  • Efron, B., & Morris, C. (1973). Stein’s estimation rule and its competitors—An empirical Bayes approach. Journal of the American Statistical Association, 68(341), 117–130.

  • Efron, B., & Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association, 70(350), 311–319.

  • Fisher, Z., Bollen, K., Gates, K., & Rönkkö, M. (2019). MIIVsem: Model implied instrumental variable (MIIV) estimation of structural equation models. R package version 0.5.4.

  • Fuller, W. A. (1987). Measurement error models. New York: Wiley.

  • Fuller, W. A., & Hidiroglou, M. A. (1978). Regression estimation after correcting for attenuation. Journal of the American Statistical Association, 73(361), 99–104.

  • Gleser, L. J. (1990). Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. Contemporary Mathematics, 112, 99–114.

  • Harman, H. H. (1976). Modern factor analysis. Chicago: University of Chicago Press.

  • James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).

  • Jöreskog, K. G., & Sörbom, D. (1993). LISREL8: User’s reference guide. Lincolnwood: Scientific Software International.

  • Kano, Y. (1990). Noniterative estimation and the choice of the number of factors in exploratory factor analysis. Psychometrika, 55(2), 277–291.

  • Kelley, T. L. (1947). Fundamentals of statistics. Cambridge: Harvard University Press.

  • Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effect of latent variables. Psychological Bulletin, 96(1), 201–210.

  • Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). New York: McGraw-Hill/Irwin.

  • R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.

  • Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26, 393–415.

  • Shi, J., Feng, J., & Song, W. (2018). Estimation in linear regression with Laplace measurement error using Tweedie-type formula. Journal of Systems Science and Complexity, 32, 1–20.

  • Spearman, C. (1904). ‘General intelligence’, objectively determined and measured. The American Journal of Psychology, 15(2), 201–293.

  • Spearman, C. (1927). The abilities of man. London: Macmillan.

  • Warren, R. D., White, J. K., & Fuller, W. A. (1974). An errors-in-variables analysis of managerial role performance. Journal of the American Statistical Association, 69(348), 886–893.

  • Whittemore, A. S. (1989). Errors-in-variables regression using Stein estimates. The American Statistician, 43(4), 226–228.

  • Yalcin, I., & Amemiya, Y. (2001). Nonlinear factor analysis as a statistical method. Statistical Science, 16(3), 275–294.

Acknowledgements

This work was financially supported by a Special Research Fund (BOF) Starting Grant 01N00717 from Ghent University. The authors thank the editor, associate editor and the referees for their insightful and constructive comments.

Author information

Corresponding author

Correspondence to Elissa Burghgraeve.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history: the author affiliation has been updated in the Erratum to this article.

Appendices

Appendix A

We give a detailed derivation of the expression for \(\mathbb {E}(\eta _i^2\mid y_{i1})\). It is obtained via the cumulant generating function (cgf) of \(\eta _i\) given \(y_{i1}\) (Efron, 2011). If \(y_{i1}\) follows a normal distribution, the cgf of \(\eta _i\) given \(y_{i1}\) is given by

$$\begin{aligned} \frac{y_{i1}^2}{2\text {Var}(\epsilon _{i1})} - \frac{\left( y_{i1}-\mathbb {E}(\eta _i)\right) ^2}{2\text {Var}(\epsilon _{i1})}(1-R_i), \quad \text {with} \quad R_i=\frac{\text {Var}(\eta _{i})}{\text {Var}(y_{i1})}. \end{aligned}$$
(A1)

We can recover the moments in terms of the cumulants by computing the derivatives of (A1). The first moment \(\mathbb {E}(\eta _i\mid y_{i1})\) is equal to the first cumulant \(\kappa _1\). The expression for \(\mathbb {E}(\eta _i^2\mid y_{i1})\) can be written in terms of the first two cumulants.

We have

$$\begin{aligned} \mathbb {E}(\eta _i^2\mid y_{i1})&=\kappa _2 + \kappa _1^2\\&=\text {Var}(\epsilon _{i1})R_i+\left[ y_{i1}-(1-R_i)(y_{i1}-\mathbb {E}(\eta _i))\right] ^2\\&=\text {Var}(\epsilon _{i1})R_i +y_{i1}^2-2(1-R_i)y_{i1}(y_{i1}-\mathbb {E}(\eta _i))+(1-R_i)^2(y_{i1}-\mathbb {E}(\eta _i))^2 \\&= R_i(\text {Var}(\epsilon _{i1})+2(1-R_i)\mathbb {E}(\eta _i)y_{i1})+R_i^2y_{i1}^2+(1-R_i)^2\mathbb {E}(\eta _i)^2. \end{aligned}$$

In order to show the generalization of this approach for polynomials of order \(k\ge 3\), we give the expression \(\mathbb {E}(\eta _i^3\mid y_{i1})\) in terms of the cumulants:

$$\begin{aligned} \mathbb {E}(\eta _i^3\mid y_{i1})&=\kappa _3+3\kappa _2\kappa _1+\kappa _1^3. \end{aligned}$$
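
To make the mapping from cumulants to moments concrete, here is a minimal R sketch, assuming \(\mathbb {E}(\eta _i)\), \(\mathrm {Var}(\eta _i)\) and \(\mathrm {Var}(\epsilon _{i1})\) are known (in practice they are replaced by estimates; the function and argument names are ours):

```r
# Conditional moments of eta given the scaling indicator y1, under normality.
# mu_eta = E(eta), var_eta = Var(eta), var_eps = Var(epsilon_1);
# R = var_eta / (var_eta + var_eps) is the reliability of y1.
cond_moments <- function(y1, mu_eta, var_eta, var_eps) {
  R  <- var_eta / (var_eta + var_eps)
  k1 <- y1 - (1 - R) * (y1 - mu_eta)   # first cumulant  = E(eta | y1)
  k2 <- var_eps * R                    # second cumulant = Var(eta | y1)
  k3 <- 0                              # third cumulant vanishes under normality
  c(m1 = k1,                           # E(eta   | y1)
    m2 = k2 + k1^2,                    # E(eta^2 | y1) = kappa_2 + kappa_1^2
    m3 = k3 + 3 * k2 * k1 + k1^3)      # E(eta^3 | y1)
}

# Example: a single observation y1 = 1.5 with reliability R = 0.8
cond_moments(y1 = 1.5, mu_eta = 0, var_eta = 4, var_eps = 1)
```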

Appendix B

In this paper we use the estimator described by Harman (1976, Section 7.2) to estimate the variance of the measurement error. There are, however, several other ways to estimate this variance, which we discuss in this appendix. When a direct estimate of the reliability of \(y_{i1}\) is available (e.g., Cronbach's alpha), an estimate of \(\mathrm {Var}(\epsilon _{i1})\) follows immediately: since the reliability equals \(\mathrm {Var}(\eta _i)/\mathrm {Var}(y_{i1})\), we have \(\mathrm {Var}(\epsilon _{i1}) = (1-\text {reliability})\,\mathrm {Var}(y_{i1})\).

If a direct estimate of the reliability is not available, we need at least two indicators to estimate \(\mathrm {Var}(\epsilon _{i1})\), and these indicators need to be linearly related to the latent variable. In what follows, we discuss estimators for \(\mathrm {Var}(\eta _i)\), from which an estimator of \(\mathrm {Var}(\epsilon _{i1}) = \mathrm {Var}(y_{i1}) - \mathrm {Var}(\eta _i)\) can be constructed directly. Let \(y_{i1} = \eta _i + \epsilon _{i1}\) and \(y_{i2} = \alpha _{i2} + \beta _{i2}\eta _i + \epsilon _{i2}\) with \(\beta _{i2} \ne 0\), and suppose there exists a z such that \(\mathrm {Cov}(y_{i1}, z) \ne 0\), \(\mathrm {Cov}(y_{i2}, z) \ne 0\) and \(\mathrm {Cov}(\epsilon _{i1}, z) = \mathrm {Cov}(\epsilon _{i2}, z) = 0\). It then follows that

$$\begin{aligned} \frac{\text {Cov}(y_{i1},z)\text {Cov}(y_{i1},y_{i2})}{\text {Cov}(y_{i2},z)} =\frac{\text {Cov}(\eta _i ,z)\text {Cov}(\eta _i,\beta _{i2}\eta _i)}{\text {Cov}(\beta _{i2} \eta _i,z)} = \mathrm {Var}(\eta _i) . \end{aligned}$$
(A2)

Replacing the covariances in the left-hand side of (A2) with their empirical counterparts thus yields an estimator of \(\mathrm {Var}(\eta _i)\). The question remains: how can we find a z? The easiest solution is to require a third indicator \(y_{i3} = \alpha _{i3} + \beta _{i3}\eta _i + \epsilon _{i3}\) with \(\beta _{i3} \ne 0\), which can play the role of z. Note that the functional form of this third indicator can also be more complicated; e.g., \(y_{i3} = \alpha _{i3} + \beta _{i3} \eta _i + \beta _{j3} \eta _j^2 + \epsilon _{i3}\) can also play the role of z, as long as \(\mathrm {Cov}(y_{i1}, y_{i3}) \ne 0\) and \(\mathrm {Cov}(y_{i2}, y_{i3}) \ne 0\).

If there are more than three indicators and the measurement model (3) holds with loadings different from 0, all indicators except the scaling indicator can play the role of \(y_{i2}\) and z in (A2), and the resulting estimates of \(\mathrm {Var}(\eta _i)\) can be averaged (Kano, 1990). Notice that this is very similar to the method described by Harman (1976).

Equation (A2), however, also allows other choices of z without requiring a third indicator. If \(\eta _i\) does not follow a symmetric distribution, then it follows directly that \(y_{i1}y_{i2}\), \(y_{i1}^2\) and \(y_{i2}^2\) can all play the role of z (Fuller, 1987).
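
As a sketch, the estimator in (A2) with a third indicator playing the role of z, using sample covariances in place of the population covariances (the function and data names are ours):

```r
# Estimate Var(eta) via (A2): Cov(y1, z) Cov(y1, y2) / Cov(y2, z),
# with a third indicator y3 in the role of z; then
# Var(epsilon_1) = Var(y1) - Var(eta).
var_eta_hat <- function(y1, y2, y3) cov(y1, y3) * cov(y1, y2) / cov(y2, y3)
var_eps_hat <- function(y1, y2, y3) var(y1) - var_eta_hat(y1, y2, y3)

# Check on simulated data with Var(eta) = 4 and Var(epsilon) = 1
set.seed(1)
eta <- rnorm(1e4, sd = 2)
y1  <- eta + rnorm(1e4)
y2  <- 0.5 + 0.8 * eta + rnorm(1e4)
y3  <- 1.0 + 1.2 * eta + rnorm(1e4)
var_eta_hat(y1, y2, y3)  # close to 4
var_eps_hat(y1, y2, y3)  # close to 1
```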

Appendix C

In this appendix, we prove the theorems presented in the paper.

1.1 Proof of Theorem 1:

Let \(\varvec{\hat{A}}_i\) be defined as in (18). Under assumptions A1 and A2, \(\varvec{\hat{A}}_i\) is consistent and asymptotically normal.

Proof

It holds that

$$\begin{aligned} \bar{\eta }_i = \bar{\varvec{\eta }}_i\varvec{A}_i + \bar{\zeta }_i, \end{aligned}$$

and

$$\begin{aligned} \bar{y}_{1i} = \bar{\eta }_i + \bar{\epsilon }_i, \end{aligned}$$

so that

$$\begin{aligned} \bar{y}_{1i} = \bar{\varvec{\eta }}_i\varvec{A}_i + \bar{\zeta }_i +\bar{\epsilon }_i. \end{aligned}$$

Furthermore,

$$\begin{aligned} \bar{\varvec{y}}_{1i} = \bar{\varvec{\eta }}_i + \bar{\varvec{\epsilon }}_i, \end{aligned}$$

so that

$$\begin{aligned} \bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/n = \bar{\varvec{\eta }}_i^\mathrm{T}\bar{\varvec{\eta }}_i\varvec{A}_i/n + \bar{\varvec{\eta }}_i^\mathrm{T}\bar{\zeta }_i/n + \bar{\varvec{\eta }}_i^\mathrm{T}\bar{\epsilon }_i/n + \bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\varvec{\eta }}_i\varvec{A}_i/n + \bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\zeta }_i/n + \bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\epsilon }_i/n. \end{aligned}$$

The weak law of large numbers states that

$$\begin{aligned} \text {plim}(\bar{\varvec{\eta }}_i^\mathrm{T}\bar{\varvec{\eta }}_i/n )= & {} \varvec{\Sigma }_{\varvec{\eta }_i, \varvec{\eta }_i}, \quad \text {plim}(\bar{\varvec{\eta }}_i^\mathrm{T}\bar{\zeta }_i/n) = \mathbb {E}(\varvec{\eta }_i \zeta _i) = \varvec{0}\\ \text {plim}( \bar{\varvec{\eta }}_i^\mathrm{T}\bar{\epsilon }_i/n )= & {} \mathbb {E}(\varvec{\eta }_i \epsilon _i) = \varvec{0}, \quad \text {plim}(\bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\varvec{\eta }}_i/n) = \mathbb {E}(\varvec{\epsilon }_i \varvec{\eta }_i^\mathrm{T}) = \varvec{0}\\ \text {plim}(\bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\zeta }_i/n)= & {} \mathbb {E}(\varvec{\epsilon }_i \zeta _i) = \varvec{0}, \quad \text {plim}(\bar{\varvec{\epsilon }}_i^\mathrm{T}\bar{\epsilon }_i/n) = \mathbb {E}(\varvec{\epsilon }_i \epsilon _i) = \varvec{0}, \end{aligned}$$

with plim the probability limit of the term in parentheses as n goes to infinity. Note that the five equalities to zero hold because we assume that all errors are mutually uncorrelated and are uncorrelated with the latent variable.
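
These limits can be checked numerically; below is a small Monte Carlo sketch under the stated assumptions (the names and simulated distributions are ours):

```r
# Empirical check of the probability limits: with mutually uncorrelated,
# mean-zero errors, the sample cross-moments converge to zero.
set.seed(2)
n    <- 1e5
eta  <- rnorm(n, sd = 2)  # latent variable, Var(eta) = 4
zeta <- rnorm(n)          # structural disturbance
eps  <- rnorm(n)          # measurement error
c(eta_zeta = mean(eta * zeta),  # -> E(eta * zeta) = 0
  eta_eps  = mean(eta * eps),   # -> E(eta * eps)  = 0
  eps_zeta = mean(eps * zeta),  # -> E(eps * zeta) = 0
  eta_eta  = mean(eta^2))       # -> Var(eta)      = 4
```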

Combining these results leads to

$$\begin{aligned} \text {plim}(\bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/n ) = \varvec{\Sigma }_{\varvec{\eta }_i, \varvec{\eta }_i}\varvec{A}_i. \end{aligned}$$

By Slutsky's lemma, it then follows that

$$\begin{aligned} \text {plim}(\varvec{\hat{A}}_i ) = \text {plim}(\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}^{-1}) \text {plim}(\bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/n) =\varvec{\Sigma }_{\varvec{\eta }_i, \varvec{\eta }_i}^{-1}\varvec{\Sigma }_{\varvec{\eta }_i, \varvec{\eta }_i}\varvec{A}_i = \varvec{A}_i, \end{aligned}$$

proving consistency.

Asymptotic normality follows directly from

$$\begin{aligned} \sqrt{n} \varvec{\hat{A}}_i = \hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}^{-1} \bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/\sqrt{n} \end{aligned}$$

because \(\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}^{-1} \overset{p}{\rightarrow } \varvec{\Sigma }_{\varvec{\eta }_i, \varvec{\eta }_i}^{-1} \) and \(\bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/\sqrt{n}\) converges to a normal distribution by the central limit theorem. Slutsky's theorem then shows that \(\sqrt{n} \varvec{\hat{A}}_i\) converges in distribution to a normal distribution. The asymptotic variance of \(\sqrt{n} \varvec{\hat{A}}_i \) is provided in Theorem 2. \(\square \)

1.2 Proof of Theorem 2:

Theorem 2

Let \(\hat{\varvec{A_i}}\) be defined as in (18). The variance of \(\hat{\varvec{A_i}}\) can be written as

$$\begin{aligned} \mathrm {Var}(\hat{\varvec{A}_i }) = \frac{\hat{\sigma }_r^2}{n}\left( \hat{\varvec{R}}^\mathrm{T}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\right) ^{-1} + \mathrm {Var}[\mathbb {E}(\hat{\varvec{A}_i } \mid \hat{\varvec{R}})]. \end{aligned}$$

Proof

Using the Law of Total Variance, we get

$$\begin{aligned} \mathrm {Var}(\hat{\varvec{A}_i }) = \mathbb {E}[\mathrm {Var}(\hat{\varvec{A}_i } \mid \hat{\varvec{R}})] + \mathrm {Var}[\mathbb {E}(\hat{\varvec{A}_i } \mid \hat{\varvec{R}})]. \end{aligned}$$
(A3)

The first term can be estimated by the conventional least-squares variance estimator which treats \(\varvec{\hat{R}}\) as fixed. Our estimator for \(\varvec{A}_i\) is obtained by regressing \(\bar{y}_{1i}\) on the \(n \times r\) design matrix \(\bar{\varvec{y}}_{1i}\hat{\varvec{R}}\) with \(\hat{\varvec{R}} = (\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}/n)^{-1}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\). The variance of \(\hat{\varvec{A}_i }\) given \(\hat{\varvec{R}}\) can then be written as

$$\begin{aligned} \sigma _r^2\left( \varvec{\hat{R}}^\mathrm{T}\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}\varvec{\hat{R}}\right) ^{-1}&=\sigma _r^2\left( \hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}^\mathrm{T} (\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}/n)^{-1}\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i} (\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}/n)^{-1}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\right) ^{-1}\\&= \frac{\sigma _r^2}{n}\left( \hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}^\mathrm{T}(\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}/n)^{-1}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\right) ^{-1}\\&=\frac{\sigma _r^2}{n}\left( \hat{\varvec{R}}^\mathrm{T}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\right) ^{-1}. \end{aligned}$$

We can estimate \(\sigma _r^2\) with \(\hat{\sigma }_r^2\), the estimated variance of the residuals. So Eq. (A3) becomes

$$\begin{aligned} \mathrm {Var}(\hat{\varvec{A}_i }) = \frac{\hat{\sigma }_r^2}{n}\left( \hat{\varvec{R}}^\mathrm{T}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\right) ^{-1} + \mathrm {Var}[\mathbb {E}(\hat{\varvec{A}_i } \mid \hat{\varvec{R}})]. \end{aligned}$$

\(\square \)
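
As an illustration, a minimal R sketch of the first term in (A3), treating \(\hat{\varvec{R}}\) as fixed; here sigma2_r, R_hat and Sigma_hat stand for \(\hat{\sigma }_r^2\), \(\hat{\varvec{R}}\) and \(\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\) (the names are ours), and the second term of (A3) would require an additional resampling step that is not shown:

```r
# First term of (A3): (sigma_r^2 / n) * (R_hat' Sigma_hat)^(-1),
# i.e. the least-squares variance of A_hat conditional on R_hat.
var_A_given_R <- function(sigma2_r, R_hat, Sigma_hat, n) {
  (sigma2_r / n) * solve(t(R_hat) %*% Sigma_hat)
}
```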

Fig. 2 Boxplot of the indicators of \(\eta _1\), where the horizontal bar shows the median.

Fig. 3 Boxplot of the indicators of \(\eta _2\), where the horizontal bar shows the median.

Fig. 4 Boxplot of the indicators of \(\eta _3\), where the horizontal bar shows the median.

Appendix D

In this “Appendix”, we give more details on the empirical example provided in Sect. 6. The Industrialization and Political Democracy dataset (Bollen, 1989) contains 75 observations of 11 continuous variables.

  • \(y_1\): The gross national product (GNP) per capita in 1960.

  • \(y_2\): The inanimate energy consumption per capita in 1960.

  • \(y_3\): The percentage of the labor force in industry in 1960.

  • \(y_4\): Expert ratings of the freedom of the press in 1960.

  • \(y_5\): The freedom of political opposition in 1960.

  • \(y_6\): The fairness of elections in 1960.

  • \(y_7\): The effectiveness of the elected legislature in 1960.

  • \(y_8\): Expert ratings of the freedom of the press in 1965.

  • \(y_9\): The freedom of political opposition in 1965.

  • \(y_{10}\): The fairness of elections in 1965.

  • \(y_{11}\): The effectiveness of the elected legislature in 1965.

We give a descriptive analysis of the dataset by means of boxplots (Figs. 2, 3, and 4). The boxplots are grouped by the latent variable the indicators measure.
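
A short R sketch that reproduces such boxplots, assuming the data are taken from the PoliticalDemocracy dataset shipped with lavaan (there the industrialization indicators \(y_1\)–\(y_3\) of this appendix are named x1–x3 and the democracy indicators \(y_4\)–\(y_{11}\) are named y1–y8; the grouping of indicators per latent variable is our reading of the model):

```r
library(lavaan)
data(PoliticalDemocracy)
# eta_1: industrialization 1960; eta_2: democracy 1960; eta_3: democracy 1965
boxplot(PoliticalDemocracy[, c("x1", "x2", "x3")],
        main = "Indicators of eta_1")
boxplot(PoliticalDemocracy[, c("y1", "y2", "y3", "y4")],
        main = "Indicators of eta_2")
boxplot(PoliticalDemocracy[, c("y5", "y6", "y7", "y8")],
        main = "Indicators of eta_3")
```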

Appendix E

We compare the performance of the estimators in terms of the MSE. To do so, we conduct paired t tests comparing \((\hat{\beta }-\beta )^2\). We calculate the squared errors in each simulation step and compute their differences between the considered estimators. We then use the t-statistic \(\displaystyle \text {T}=\frac{\bar{d}}{\text {SE}(\bar{d})}\), with \(\bar{d}\) the mean of these differences and \(\text {SE}(\bar{d})\) its standard error, and compute the corresponding p values. The results can be found in Table 13.

Table 13 Considering both the measurement and the latent variable model (linear and quadratic), this table gives the p values of paired t tests comparing \((\hat{\beta }-\beta )^2\) between the different estimators, for multiple sample sizes and distributions of \(\eta \).
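
A minimal R sketch of this test for one pair of estimators, assuming beta_hat_a and beta_hat_b collect the estimates across simulation replications (the names are ours):

```r
# Paired t test on per-replication squared errors of two estimators.
compare_mse <- function(beta_hat_a, beta_hat_b, beta) {
  d    <- (beta_hat_a - beta)^2 - (beta_hat_b - beta)^2
  tval <- mean(d) / (sd(d) / sqrt(length(d)))    # T = d_bar / SE(d_bar)
  pval <- 2 * pt(-abs(tval), df = length(d) - 1)
  c(t = tval, p = pval)
}
# Equivalent one-liner:
# t.test((beta_hat_a - beta)^2, (beta_hat_b - beta)^2, paired = TRUE)
```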

Cite this article

Burghgraeve, E., De Neve, J. & Rosseel, Y. Estimating Structural Equation Models Using James–Stein Type Shrinkage Estimators. Psychometrika 86, 96–130 (2021). https://doi.org/10.1007/s11336-021-09749-2
