Model averaging based on James–Stein estimators

Abstract

Existing model averaging methods are generally based on ordinary least squares (OLS) estimators. However, it is well known that the James–Stein (JS) estimator dominates the OLS estimator under quadratic loss, provided that the dimension of the coefficient vector is larger than two. We therefore focus on model averaging based on JS estimators instead of OLS estimators. We develop a weight choice method and prove its asymptotic optimality. A simulation experiment shows promising results for the proposed model average estimator.
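The dominance result that motivates this approach is easy to check numerically. The following minimal sketch (ours, not code from the paper) compares the Monte Carlo quadratic risk of the OLS/maximum likelihood estimator of a \(p\)-dimensional normal mean with that of the James–Stein shrinkage estimator; the dimension, true mean, and replication count are illustrative assumptions.

```python
# Minimal Monte Carlo sketch (not from the paper): with p > 2, the
# James-Stein estimator should show smaller quadratic risk than OLS/MLE.
import numpy as np

rng = np.random.default_rng(0)
p, sigma2, reps = 10, 1.0, 20_000
mu = np.ones(p)  # arbitrary true mean vector (illustrative choice)

ols_loss = js_loss = 0.0
for _ in range(reps):
    y = mu + rng.normal(scale=np.sqrt(sigma2), size=p)
    shrink = 1.0 - (p - 2) * sigma2 / np.sum(y ** 2)  # JS shrinkage factor
    ols_loss += np.sum((y - mu) ** 2)                 # OLS/MLE estimate is y itself
    js_loss += np.sum((shrink * y - mu) ** 2)

# Expect roughly p = 10 for OLS and a strictly smaller value for JS.
print(f"OLS risk ~ {ols_loss / reps:.3f}, JS risk ~ {js_loss / reps:.3f}")
```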


References

  • Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265

  • Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618

  • Claeskens G, Croux C, Van Kerckhoven J (2006) Variable selection for logit regression using a prediction-focused information criterion. Biometrics 62:972–979

  • Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189

  • Hansen BE (2008) Least squares forecast averaging. J Econ 146:342–350

  • Hansen BE, Racine J (2012) Jackknife model averaging. J Econ 167:38–46

  • Hansen BE (2014) Model averaging, asymptotic risk, and regressor groups. Quant Econ (to appear)

  • Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417

  • James W, Stein C (1961) Estimation with quadratic loss. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 361–379

  • Li K-C (1987) Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: discrete index set. Ann Stat 15:958–975

  • Liang H, Zou G, Wan ATK, Zhang X (2011) On optimal weight choice in a frequentist model average estimator. J Am Stat Assoc 106:1053–1066

  • Mallows CL (1973) Some comments on \(C_p\). Technometrics 15:661–675

  • Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall, London

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88:486–494

  • Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 197–206

  • Stein C (1981) Estimation of the mean of a multivariate normal distribution. Ann Stat 9:1135–1151

  • Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econ 156:277–283

  • Yang Y (2001) Adaptive regression by mixing. J Am Stat Assoc 96:574–586

  • Yuan Z, Yang Y (2005) Combining linear regression models: when and how? J Am Stat Assoc 100:1202–1214

  • Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200

  • Zhang X, Wan ATK, Zhou Z (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142

  • Zhang X, Lu Z, Zou G (2013a) Adaptively combined forecasting for discrete response time series. J Econ 176:80–91

  • Zhang X, Wan ATK, Zou G (2013b) Model averaging by jackknife criterion in models with dependent data. J Econ 174:82–94

Acknowledgments

I am grateful to two anonymous referees for their very helpful comments and suggestions. This work was supported by grant 11226274 from the National Natural Science Foundation of China.

Author information

Corresponding author

Correspondence to Shangwei Zhao.

Appendix: Proofs of theorems

Proof of Theorem 1

By the condition stated in Theorem 1 and Stein’s Lemma (Stein 1981), we have

$$\begin{aligned} E\,(\hat{\mu }_i^{\text{JS}}(w)-y_i)(y_i-\mu _i) = \sigma ^2E\,\partial (\hat{\mu }_i^{\text{JS}}(w)-y_i)/\partial y_i = \sigma ^2E\,\partial \hat{\mu }_i^{\text{JS}}(w)/\partial y_i -\sigma ^2. \end{aligned}$$
(5)

Using (5) and some techniques on matrix derivatives, we have

$$\begin{aligned}
R^{\text{JS}}(w)
&= E\Vert \hat{\mu }^{\text{JS}}(w)-\mu \Vert ^2
 = E\Vert \hat{\mu }^{\text{JS}}(w)-y+y-\mu \Vert ^2\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2+E\Vert y-\mu \Vert ^2+2E(\hat{\mu }^{\text{JS}}(w)-y)'(y-\mu )\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2+n\sigma ^2 +2E\sum _{i=1}^n(\hat{\mu }_i^{\text{JS}}(w)-y_i)(y_i-\mu _i)\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2+n\sigma ^2 +2\sigma ^2E\operatorname{tr}\bigl(\partial \hat{\mu }^{\text{JS}}(w)/\partial y'\bigr) -2n\sigma ^2\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 +2\sigma ^2E\operatorname{tr}\Bigl( \partial \sum _{m=1}^M w_m\bigl( 1-a_mg_m(y)\bigr) P_my\big/\partial y'\Bigr)\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 + 2\sigma ^2E\sum _{m=1}^M w_m(1-a_mg_m(y))\operatorname{tr}(P_m)\\
&\quad + 2\sigma ^2\sum _{m=1}^M w_mE\operatorname{tr}\biggl( P_my\,\partial \Bigl( 1-a_m\frac{\Vert \hat{\mu }_m-y\Vert ^2}{\Vert \hat{\mu }_m\Vert ^2}\Bigr) \big/\partial y'\biggr)\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 + 2\sigma ^2E\sum _{m=1}^M w_m(1-a_mg_m(y))k_m\\
&\quad +2\sigma ^2\sum _{m=1}^M w_mE\operatorname{tr}\biggl( P_my\,\partial \Bigl( 1+a_m-a_m\frac{y'y}{y'P_my}\Bigr) \big/\partial y'\biggr)\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 + 2\sigma ^2E\sum _{m=1}^M w_m(1-a_mg_m(y))k_m\\
&\quad -2\sigma ^2\sum _{m=1}^M w_mE\,a_m\Bigl( \partial \Bigl( \frac{y'y}{y'P_my}\Bigr) \big/\partial y'\Bigr) P_my \\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 + 2\sigma ^2w'k-2\sigma ^2E\sum _{m=1}^M w_ma_mg_m(y)k_m\\
&\quad -2\sigma ^2\sum _{m=1}^M w_mE\,a_m\bigl( 2(y'P_my)^{-1}y'-2y'y(y'P_my)^{-2}y'P_m \bigr) P_my \\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2-n\sigma ^2 + 2\sigma ^2w'k-2\sigma ^2E\sum _{m=1}^M w_ma_mg_m(y)k_m\\
&\quad -4\sigma ^2\sum _{m=1}^M w_ma_m+ 4\sigma ^2\sum _{m=1}^M w_mE\,a_m y'y(y'P_my)^{-1}\\
&= E\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2+ 2\sigma ^2w'k-2\sigma ^2E\sum _{m=1}^M w_ma_mg_m(y)(k_m-2)-n\sigma ^2\\
&= E\,S(w)-n\sigma ^2.
\end{aligned}$$
(6)

This completes the proof. \(\square \)
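To make the criterion concrete, here is a minimal sketch (our reconstruction from the proof, not the author's code) that computes the James–Stein model-average fit \(\hat{\mu }^{\text{JS}}(w)=\sum _{m=1}^M w_m(1-a_mg_m(y))P_my\) and the feasible criterion \(\hat{S}(w)=\Vert \hat{\mu }^{\text{JS}}(w)-y\Vert ^2+2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum _{m=1}^M w_ma_mg_m(y)(k_m-2)\), taking \(k_m=\operatorname{tr}(P_m)\) and \(a_m=(k_m-2)/(n-k_m+2)\) as implied by (10b). The helper name and the use of numpy are our choices, and every candidate model needs \(k_m>2\).

```python
# Sketch (our reconstruction, hypothetical helper name): the JS
# model-average fit and the criterion S-hat(w) from Theorem 1.
import numpy as np

def js_fit_and_criterion(y, X_list, w, sigma2_hat):
    """Return (mu_hat_JS(w), S_hat(w)) for candidate designs X_list.

    Each X_m must have full column rank with k_m > 2 columns.
    """
    n = y.shape[0]
    mu_js = np.zeros(n)
    penalty = 0.0
    for w_m, X_m in zip(w, X_list):
        fit_m = X_m @ np.linalg.lstsq(X_m, y, rcond=None)[0]  # P_m y
        k_m = X_m.shape[1]                                    # k_m = tr(P_m)
        a_m = (k_m - 2) / (n - k_m + 2)                       # as in (10b)
        g_m = np.sum((y - fit_m) ** 2) / np.sum(fit_m ** 2)   # g_m(y)
        mu_js += w_m * (1.0 - a_m * g_m) * fit_m              # JS-shrunken fit
        penalty += w_m * (k_m - a_m * g_m * (k_m - 2))        # w'k minus correction
    S_hat = np.sum((mu_js - y) ** 2) + 2.0 * sigma2_hat * penalty
    return mu_js, S_hat
```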

Proof of Theorem 2

Let \(U_m=(1-a_mg_m(y))P_m\), \(U(w)=\sum \nolimits _{m=1}^M w_mU_m\), \(\tilde{X}=[X_1,\ldots ,X_M]\), and \(\tilde{P}=\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\). Then \(\hat{\mu }^{\text {JS}}(w)=U(w)y\). It is straightforward to show that

$$\begin{aligned}
\hat{S}(w)
&= \Vert \hat{\mu }^{\text{JS}}(w)-\mu -e\Vert ^2+ 2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum _{m=1}^M w_ma_mg_m(y)(k_m-2)\\
&= L^{\text{JS}}(w)-2e'\hat{\mu }^{\text{JS}}(w)+ 2e'\tilde{P}\mu + 2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum _{m=1}^M w_ma_mg_m(y)(k_m-2)\\
&\quad +\Vert e\Vert ^2 +2e'\mu -2e'\tilde{P}\mu \\
&\equiv S^*(w)+\Vert e\Vert ^2 +2e'\mu -2e'\tilde{P}\mu ,
\end{aligned}$$
(7)

where the last three terms do not depend on \(w\). Thus,

$$\begin{aligned} \hat{w} = \underset{w \in {{\mathcal {W}}}}{\text {argmin}} S^*(w). \end{aligned}$$
(8)

It is seen that \(L^{\text {JS}}(w)- T(w) = \Vert U(w)e\Vert ^2+2\mu '(U(w)-I_n)U(w)e\) and \( S^*(w)-L^{\text {JS}}(w)=-2e'U(w)e - 2e'(U(w)-\tilde{P})\mu + 2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum \nolimits _{m=1}^Mw_ma_mg_m(y)(k_m-2)\). From the proof of Theorem 3 of Liang et al. (2011), (8), and condition (C.2), the following convergences are sufficient for (3):

$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\Vert U(w)e\Vert ^2 \stackrel{p}{\longrightarrow } 0, \end{aligned}$$
(9a)
$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\,e'U(w)e \stackrel{p}{\longrightarrow } 0, \end{aligned}$$
(9b)
$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\,\hat{\sigma }^2w'k \stackrel{p}{\longrightarrow } 0, \end{aligned}$$
(9c)
$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\,\hat{\sigma }^2\Bigl|\sum _{m=1}^M w_ma_mg_m(y)(k_m-2)\Bigr| \stackrel{p}{\longrightarrow } 0, \end{aligned}$$
(9d)
$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\,|\mu '(I_n-U(w))U(w)e| \stackrel{p}{\longrightarrow } 0, \end{aligned}$$
(9e)
$$\begin{aligned}&\sup _{w\in \mathcal{W}}T^{-1}(w)\,|e'(U(w)-\tilde{P})\mu | \stackrel{p}{\longrightarrow } 0. \end{aligned}$$
(9f)

Using condition (C.1), we have

$$\begin{aligned} \hat{\sigma }^2&= y'(I_n-P_{m^*})y\,(n-m^*)^{-1}\le \Vert y\Vert ^2(n-m^*)^{-1}=O_p(1), \end{aligned}$$
(10a)
$$\begin{aligned} a_mg_m(y)&= \frac{\Vert \hat{\mu }_m-y\Vert ^2(n-k_m+2)^{-1}}{\Vert \hat{\mu }_m\Vert ^2(k_m-2)^{-1}} = \frac{y'(I_n-P_m)y\,(n-k_m+2)^{-1}}{y'P_my\,(k_m-2)^{-1}}\\ &\le \frac{\Vert y\Vert ^2(n-k_m+2)^{-1}}{\hat{\beta }_m'X_m'X_m\hat{\beta }_m\,(k_m-2)^{-1}}= O_p(1),\end{aligned}$$
(10b)
$$\begin{aligned} \Vert U(w)e\Vert ^2&\le M\sum _{m=1}^M w_m^2(1-a_mg_m(y))^2\Vert P_me\Vert ^2=O_p(1),\end{aligned}$$
(10c)
$$\begin{aligned} e'U(w)e&\le \sum _{m=1}^M w_m|1-a_mg_m(y)|\,e'P_me=O_p(1),\end{aligned}$$
(10d)
$$\begin{aligned} \Vert \tilde{P}e\Vert ^2&= O_p(1). \end{aligned}$$
(10e)

From (10a)–(10d) and condition (C.2), we obtain (9a)–(9d).

In addition,

$$\begin{aligned} |\mu '(I_n-U(w))U(w)e|\le \Vert (I_n-U(w))\mu \Vert \,\Vert U(w)e\Vert =T^{1/2}(w)\Vert U(w)e\Vert , \end{aligned}$$
(11)

which, together with (10c) and condition (C.2), implies (9e). Finally,

$$\begin{aligned} |e'(U(w)-\tilde{P})\mu |&= |e'\tilde{P}(U(w)-I_n)\mu | \le \Vert (I_n-U(w))\mu \Vert \,\Vert \tilde{P}e\Vert \\ &= T^{1/2}(w)\Vert \tilde{P}e\Vert , \end{aligned}$$
(12)

which, together with (10e) and condition (C.2), implies (9f). This completes the proof. \(\square \)
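As a usage sketch of the weight choice \(\hat{w}\) in (8), the minimization of \(\hat{S}(w)\) can be handed to any constrained optimizer, taking \(\mathcal{W}\) to be the usual simplex \(\{w: w_m\ge 0,\ \sum _{m=1}^M w_m=1\}\). The paper does not prescribe a solver, so the SLSQP call below, the nested candidate set, and the variance estimate from the largest model are all our assumptions; the sketch reuses `js_fit_and_criterion` from the earlier block.

```python
# Sketch (our assumptions): choose weights by minimizing S-hat(w) over the
# simplex, reusing js_fit_and_criterion from the sketch after Theorem 1.
import numpy as np
from scipy.optimize import minimize

def choose_weights(y, X_list, sigma2_hat):
    """Approximate hat-w = argmin over the simplex of S_hat(w)."""
    M = len(X_list)
    objective = lambda w: js_fit_and_criterion(y, X_list, w, sigma2_hat)[1]
    res = minimize(
        objective,
        x0=np.full(M, 1.0 / M),          # start from equal weights
        method="SLSQP",                  # our solver choice; the paper names none
        bounds=[(0.0, 1.0)] * M,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x

# Toy usage with nested candidate models (each k_m > 2 so a_m is defined):
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 8))
y = X[:, :5] @ np.array([1.0, 0.8, 0.6, 0.4, 0.2]) + rng.normal(size=n)
X_list = [X[:, :k] for k in (3, 5, 8)]
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_hat = resid @ resid / (n - 8)     # residual variance from largest model
print(choose_weights(y, X_list, sigma2_hat))
```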

Cite this article

Zhao, S. Model averaging based on James–Stein estimators. Metrika 77, 1013–1022 (2014). https://doi.org/10.1007/s00184-014-0483-y
