Abstract
We propose a two-step procedure to estimate structural equation models (SEMs). In the first step, the latent variable is replaced by its conditional expectation given the observed data, which is estimated using a James–Stein type shrinkage estimator. In the second step, the dependent variables are regressed on this shrinkage estimator. In addition to linear SEMs, we also derive shrinkage estimators for polynomial relations in the latent variable. We empirically demonstrate the feasibility of the proposed method via simulation and contrast the proposed estimator with maximum likelihood (ML) and model-implied instrumental variable (MIIV) estimators under a limited number of simulation scenarios. We illustrate the method on a case study.
Change history
05 May 2021
An Erratum to this paper has been published: https://doi.org/10.1007/s11336-021-09766-1
References
Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33(3), 335–345.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. Sociological Methodology, 25, 223–251.
Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109–121.
Bollen, K. A. (2018). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54, 1–16.
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models (2nd ed.). New York: Chapman and Hall.
Efron, B. (2011). Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496), 1602–1614.
Efron, B., & Morris, C. (1973). Stein’s estimation rule and its competitors—An empirical Bayes approach. Journal of the American Statistical Association, 68(341), 117–130.
Efron, B., & Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association, 70(350), 311–319.
Fisher, Z., Bollen, K., Gates, K., & Rönkkö, M. (2019). MIIVsem: Model implied instrumental variable (MIIV) estimation of structural equation models. R package version 0.5.4.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Fuller, W. A., & Hidiroglou, M. A. (1978). Regression estimation after correcting for attenuation. Journal of the American Statistical Association, 73(361), 99–104.
Gleser, L. J. (1990). Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. Contemporary Mathematics, 112, 99–114.
Harman, H. H. (1976). Modern factor analysis. Chicago: University of Chicago Press.
James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).
Jöreskog, K. G., & Sörbom, D. (1993). LISREL8: User’s reference guide. Lincolnwood: Scientific Software International.
Kano, Y. (1990). Noniterative estimation and the choice of the number of factors in exploratory factor analysis. Psychometrika, 55(2), 277–291.
Kelley, T. L. (1947). Fundamentals of statistics. Cambridge: Harvard University Press.
Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effect of latent variables. Psychological Bulletin, 96(1), 201–210.
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). New York: McGraw-Hill/Irwin.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26, 393–415.
Shi, J., Feng, J., & Song, W. (2018). Estimation in linear regression with Laplace measurement error using Tweedie-type formula. Journal of Systems Science and Complexity, 32, 1–20.
Spearman, C. (1904). ‘General intelligence’, objectively determined and measured. The American Journal of Psychology, 15(2), 201–293.
Spearman, C. (1927). The ability of man. London: Macmillan.
Warren, R. D., White, J. K., & Fuller, W. A. (1974). An errors-in-variables analysis of managerial role performance. Journal of the American Statistical Association, 69(348), 886–893.
Whittemore, A. S. (1989). Errors-in-variables regression using Stein estimates. The American Statistician, 43(4), 226–228.
Yalcin, I., & Amemiya, Y. (2001). Nonlinear factor analysis as a statistical method. Statistical Science, 16(3), 275–294.
Acknowledgements
This work was financially supported by a Special Research Fund (BOF) Starting Grant 01N00717 from Ghent University. The authors thank the editor, associate editor and the referees for their insightful and constructive comments.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The author affiliation has been updated.
Appendices
Appendix A
We give a detailed derivation of the expression for \(\mathbb {E}(\eta _i^2\mid y_{i1})\). It is obtained via the cumulant generating function (cgf) of \(\eta _i\) given \(y_{i1}\) (Efron, 2011). If \(y_{i1}\) follows a normal distribution, the cgf of \(\eta _i\) given \(y_{i1}\) is given by
We can recover the moments in terms of the cumulants by computing the derivatives of (A1). The first moment \(\mathbb {E}(\eta _i\mid y_{i1})\) is equal to the first cumulant \(\kappa _1\). The expression for \(\mathbb {E}(\eta _i^2\mid y_{i1})\) can be written in terms of the first two cumulants.
We have
In order to show the generalization of this approach for polynomials of order \(k\ge 3\), we give the expression \(\mathbb {E}(\eta _i^3\mid y_{i1})\) in terms of the cumulants:
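For reference, the general moment–cumulant relations that underlie these expressions (they hold for any cgf, with \(\kappa _j\) the \(j\)-th derivative of the cgf at zero, here the conditional cumulants of \(\eta _i\) given \(y_{i1}\)) are

```latex
\begin{aligned}
\mathbb{E}(\eta_i \mid y_{i1})   &= \kappa_1,\\
\mathbb{E}(\eta_i^2 \mid y_{i1}) &= \kappa_2 + \kappa_1^2,\\
\mathbb{E}(\eta_i^3 \mid y_{i1}) &= \kappa_3 + 3\,\kappa_1\kappa_2 + \kappa_1^3.
\end{aligned}
```

Higher-order moments follow the same pattern, which is what makes the extension to polynomials of order \(k\ge 3\) mechanical.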
Appendix B
In this paper we use the estimator described by Harman (1976, Section 7.2) to estimate the variance of the measurement error. There are, however, several other ways to estimate this variance, which we discuss in this appendix. When a direct estimate of the reliability of \(y_{i1}\) is available (e.g., Cronbach’s alpha), an estimate of \(\mathrm {Var}(\epsilon _{i1})\) can be derived from it.
If a direct estimate of the reliability is not available, we need at least two indicators to estimate \(\mathrm {Var}(\epsilon _{i1})\) and these indicators need to be linearly related to the latent variable. In what follows we will discuss estimators for \(\mathrm {Var}(\eta _i)\) from which an estimator of \(\mathrm {Var}(\epsilon _{i1}) = \mathrm {Var}(y_{i1}) - \mathrm {Var}(\eta _i)\) can be constructed directly. Let \(y_{i1} = \eta _i + \epsilon _{i1}\) and \(y_{i2} = \alpha _{i2} + \beta _{i2}\eta _i + \epsilon _{i2}\) with \(\beta _{i2} \ne 0\), and suppose there exists a z such that \(\mathrm {Cov}(y_{i1}, z) \ne 0\), \(\mathrm {Cov}(y_{i2}, z) \ne 0\) and \(\mathrm {Cov}(\epsilon _{i1}, z) = \mathrm {Cov}(\epsilon _{i2}, z) = 0\), then it follows that
So replacing the covariances in the left hand side of (A2) with their empirical counterparts results in an estimator of \(\mathrm {Var}(\eta _i)\). The question now remains: how can we find a z? The easiest solution is to require a third indicator \(y_{i3} = \alpha _{i3} + \beta _{i3}\eta _i + \epsilon _{i3}\) with \(\beta _{i3} \ne 0\) which can play the role of z. Note that the functional form of this third indicator can also be more complicated, e.g. \(y_{i3} = \alpha _{i3} + \beta _{i3} \eta _i + \beta _{j3} \eta _j^2 + \epsilon _{i3}\) can also play the role of z as long as \(\mathrm {Cov}(y_{i1}, y_{i3}) \ne 0\) and \(\mathrm {Cov}(y_{i2}, y_{i3}) \ne 0\).
If there are more than three indicators and the measurement model (3) holds with loadings different from 0, all indicators except the scaling indicator can play the role of \(y_{i2}\) and z in (A2), and the resulting estimates of \(\mathrm {Var}(\eta )\) can be averaged (Kano, 1990). Notice that this is very similar to the method described by Harman (1976).
Equation (A2), however, also allows other choices of z without requiring a third indicator. If \(\eta _i\) does not follow a symmetric distribution, then it follows directly that \(y_{i1}y_{i2}\), \(y_{i1}^2\) and \(y_{i2}^2\) can all play the role of z (Fuller, 1987).
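To illustrate the use of a third indicator in the role of z, the sketch below simulates data from the measurement model above and computes the plug-in estimator implied by the stated moment conditions, namely \(\widehat{\mathrm{Var}}(\eta ) = \widehat{\mathrm{Cov}}(y_1, y_2)\,\widehat{\mathrm{Cov}}(y_1, z)/\widehat{\mathrm{Cov}}(y_2, z)\). The loadings, intercepts, and error variances are hypothetical choices for the example, not values from the paper.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(42)
n = 200_000
eta = [random.gauss(0, 1) for _ in range(n)]               # latent variable, Var(eta) = 1
y1 = [e + random.gauss(0, 0.6) for e in eta]               # scaling indicator, Var(eps1) = 0.36
y2 = [0.5 + 0.8 * e + random.gauss(0, 0.6) for e in eta]   # second indicator
y3 = [-0.2 + 1.2 * e + random.gauss(0, 0.6) for e in eta]  # third indicator, plays the role of z

# Plug-in estimator of Var(eta); the ratio cancels the unknown loading beta_2.
var_eta_hat = cov(y1, y2) * cov(y1, y3) / cov(y2, y3)

# Var(eps1) = Var(y1) - Var(eta)
var_eps1_hat = cov(y1, y1) - var_eta_hat
```

With these simulated values the estimates should recover \(\mathrm {Var}(\eta ) = 1\) and \(\mathrm {Var}(\epsilon _{1}) = 0.36\) up to sampling error.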
Appendix C
In this appendix, we prove the theorems presented in the paper.
1.1 Proof of Theorem 1:
Let \(\varvec{\hat{A}}_i\) be defined as in (18). Under assumptions A1 and A2, \(\varvec{\hat{A}}_i\) is consistent and asymptotically normal.
Proof
It holds that
and
so that
Furthermore,
so that
The weak law of large numbers states that
with plim the probability limit of the term in parentheses as n goes to infinity. Note that the five equalities to zero hold because we assume that all errors are mutually uncorrelated and are uncorrelated with the latent variable.
Combining these results leads to
so that, by Slutsky’s lemma,
proving consistency.
Asymptotic normality follows directly from
because \(\hat{\varvec{\Sigma }}_{\varvec{\eta }_i}^{-1} \overset{p}{\rightarrow } \varvec{\Sigma }_{\varvec{\eta }_i}^{-1} \) and \(\bar{\varvec{y}}_{1i}^\mathrm{T} \bar{y}_{1i}/\sqrt{n}\) converges to a normal distribution due to the central limit theorem. Slutsky’s theorem then shows that \(\sqrt{n} \varvec{\hat{A}}_i\) converges in distribution to a normal distribution. The asymptotic variance of \(\sqrt{n} \varvec{\hat{A}}_i \) is provided in Theorem 2. \(\square \)
1.2 Proof of Theorem 2:
Theorem 2
Let \(\hat{\varvec{A_i}}\) be defined as in (18). The variance of \(\hat{\varvec{A_i}}\) can be written as
Proof
Using the Law of Total Variance, we get
The first term can be estimated by the conventional least-squares variance estimator which treats \(\varvec{\hat{R}}\) as fixed. Our estimator for \(\varvec{A}_i\) is obtained by regressing \(\bar{y}_{1i}\) on the \(n \times r\) design matrix \(\bar{\varvec{y}}_{1i}\hat{\varvec{R}}\) with \(\hat{\varvec{R}} = (\bar{\varvec{y}}_{1i}^\mathrm{T}\bar{\varvec{y}}_{1i}/n)^{-1}\hat{\varvec{\Sigma }}_{\varvec{\eta }_i, \varvec{\eta }_i}\). The variance of \(\hat{\varvec{A}_i }\) given \(\hat{\varvec{R}}\) can then be written as
We can estimate \(\sigma _r^2\) with \(\hat{\sigma }_r^2\), the estimated variance of the residuals. So Eq. (A3) becomes
\(\square \)
Appendix D
In this appendix, we give more details on the empirical example provided in Sect. 6. The Industrialization and Political Democracy dataset (Bollen, 1989) contains 75 observations on 11 continuous variables:

- \(y_1\): The gross national product (GNP) per capita in 1960.
- \(y_2\): The inanimate energy consumption per capita in 1960.
- \(y_3\): The percentage of the labor force in industry in 1960.
- \(y_4\): Expert ratings of the freedom of the press in 1960.
- \(y_5\): The freedom of political opposition in 1960.
- \(y_6\): The fairness of elections in 1960.
- \(y_7\): The effectiveness of the elected legislature in 1960.
- \(y_8\): Expert ratings of the freedom of the press in 1965.
- \(y_9\): The freedom of political opposition in 1965.
- \(y_{10}\): The fairness of elections in 1965.
- \(y_{11}\): The effectiveness of the elected legislature in 1965.

We give a descriptive analysis of the dataset by means of boxplots, grouped by the latent variable they measure (Figs. 2, 3, and 4).
Appendix E
We compare the performance of the estimators in terms of the MSE. To do so, we conduct paired t-tests on the squared errors \((\hat{\beta }-\beta )^2\): in each simulation step we calculate the squared error of each estimator and compute the difference between the estimators under comparison. We then use the t-statistic \(\displaystyle \text {T}=\frac{\bar{d}}{\text {SE}(\bar{d})}\), with \(\bar{d}\) the mean difference and \(\text {SE}(\bar{d})\) its standard error, and compute the corresponding p-values. The results can be found in Table 13.
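The paired comparison can be sketched as follows. The estimates for the two estimators are placeholders generated around a hypothetical true value (estimator "a" is given the smaller error variance); in the paper the inputs would be the estimates from the simulation study, and the p-value here uses a normal approximation, which is adequate for a large number of replications.

```python
import math
import random
import statistics

random.seed(1)
R = 500      # number of simulation replications
beta = 0.7   # hypothetical true parameter value

# Placeholder estimates from two competing estimators in each replication.
beta_hat_a = [beta + random.gauss(0, 0.10) for _ in range(R)]
beta_hat_b = [beta + random.gauss(0, 0.12) for _ in range(R)]

# Per-replication difference in squared errors.
d = [(a - beta) ** 2 - (b - beta) ** 2 for a, b in zip(beta_hat_a, beta_hat_b)]

d_bar = statistics.fmean(d)                  # mean difference
se_d = statistics.stdev(d) / math.sqrt(R)    # SE of the mean difference
t_stat = d_bar / se_d                        # T = d_bar / SE(d_bar)

# Two-sided p-value from the normal approximation (R is large).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t_stat) / math.sqrt(2))))
```

A negative T here indicates that estimator "a" has the smaller mean squared error.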
About this article

Burghgraeve, E., De Neve, J., & Rosseel, Y. Estimating Structural Equation Models Using James–Stein Type Shrinkage Estimators. Psychometrika 86, 96–130 (2021). https://doi.org/10.1007/s11336-021-09749-2