Abstract
Existing model averaging methods are generally based on ordinary least squares (OLS) estimators. However, it is well known that the James–Stein (JS) estimator dominates the OLS estimator under quadratic loss, provided that the dimension of the coefficient vector is greater than two. We therefore study model averaging based on JS estimators instead of OLS estimators. We develop a weight choice method and prove its asymptotic optimality. A simulation experiment shows promising results for the proposed model average estimator.
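The dominance property invoked in the abstract can be checked numerically. Below is a minimal Monte Carlo sketch using the textbook James–Stein estimator \((1-(p-2)\sigma^2/\Vert y\Vert^2)\,y\) for a multivariate normal mean (an illustration of the classical result, not of the paper's model-average estimator; the true mean `mu` is an arbitrary choice for the experiment):

```python
import numpy as np

# Monte Carlo check: for y ~ N(mu, sigma^2 I_p) with p >= 3, the
# James-Stein estimator (1 - (p-2)*sigma^2/||y||^2) * y has smaller
# expected quadratic loss than the OLS/MLE estimator y itself.
rng = np.random.default_rng(0)
p, sigma2, reps = 10, 1.0, 50_000
mu = np.full(p, 0.5)  # illustrative true mean; any mu gives dominance

y = rng.normal(mu, np.sqrt(sigma2), size=(reps, p))
shrink = 1.0 - (p - 2) * sigma2 / np.sum(y ** 2, axis=1)
js = shrink[:, None] * y  # JS estimate for each replication

risk_ols = np.mean(np.sum((y - mu) ** 2, axis=1))  # close to p*sigma2 = 10
risk_js = np.mean(np.sum((js - mu) ** 2, axis=1))
print(risk_ols, risk_js)
```

The estimated JS risk comes out strictly below the OLS risk of \(p\sigma^2\), and the gap is largest when \(\Vert\mu\Vert\) is small.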
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618
Claeskens G, Croux C, Van Kerckhoven J (2006) Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62:972–979
Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189
Hansen BE (2008) Least squares forecast averaging. J Econ 146:342–350
Hansen BE, Racine J (2012) Jackknife model averaging. J Econ 167:38–46
Hansen BE (2014) Model averaging, asymptotic risk, and regressor groups. Quant Econ (to appear)
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417
James W, Stein C (1961) Estimation with quadratic loss. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 361–379
Li K-C (1987) Asymptotic optimality for \(C_L\), cross-validation and generalized cross-validation: discrete index set. Ann Stat 15:958–975
Liang H, Zou G, Wan ATK, Zhang X (2011) On optimal weight choice in a frequentist model average estimator. J Am Stat Assoc 106:1053–1066
Mallows CL (1973) Some comments on \(C_p\). Technometrics 15:661–675
Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall, London
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88:486–494
Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 197–206
Stein C (1981) Estimation of the mean of a multivariate normal distribution. Ann Stat 9:1135–1151
Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econ 156:277–283
Yang Y (2001) Adaptive regression by mixing. J Am Stat Assoc 96:574–586
Yuan Z, Yang Y (2005) Combining linear regression models: when and how? J Am Stat Assoc 100:1202–1214
Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200
Zhang X, Wan ATK, Zhou Z (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142
Zhang X, Lu Z, Zou G (2013a) Adaptively combined forecasting for discrete response time series. J Econ 176:80–91
Zhang X, Wan ATK, Zou G (2013b) Model averaging by jackknife criterion in models with dependent data. J Econ 174:82–94
Acknowledgments
I am very grateful to two anonymous referees for their helpful comments and suggestions. This work was supported by Grant 11226274 from the National Natural Science Foundation of China.
Appendix: Proofs of theorems
Proof of Theorem 1
By the condition stated in Theorem 1 and Stein’s Lemma (Stein 1981), we have
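The displayed identity was not preserved here; for reference, the standard univariate form of Stein's Lemma (Stein 1981) being invoked states that if \(z \sim N(\theta, \sigma^2)\) and \(h\) is almost differentiable with \(E|h'(z)| < \infty\), then

```latex
E\bigl[(z-\theta)\,h(z)\bigr] \;=\; \sigma^{2}\,E\bigl[h'(z)\bigr],
```

with the multivariate version obtained by applying this coordinatewise.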
Using (5) and some techniques on matrix derivatives, we have
This completes the proof. \(\square \)
Proof of Theorem 2
Let \(U_m=(1-a_mg_m(y))P_m\), \(U(w)=\sum \nolimits _{m=1}^M w_mU_m\), \(\tilde{X}=[X_1,\ldots ,X_M]\), and \(\tilde{P}=\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\). Then \(\hat{\mu }^{\text {JS}}(w)=U(w)y\). It is straightforward to show that
where the last three terms do not depend on \(w\). Thus,
It is seen that \(L^{\text {JS}}(w)- T(w) = \Vert U(w)e\Vert ^2+2\mu '(U(w)-I_n)U(w)e\) and \( S^*(w)-L^{\text {JS}}(w)=-2e'U(w)e - 2e'(U(w)-\tilde{P})\mu + 2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum \nolimits _{m=1}^Mw_ma_mg_m(y)(k_m-2)\). From the proof of Theorem 3 of Liang et al. (2011), (8), and condition (C.2), the following formulas are sufficient for (3):
Using condition (C.1), we have
From (10a)–(10d) and condition (C.2), we obtain (9a)–(9d).
In addition,
which, together with (10c) and condition (C.2), implies (9e). Finally,
which, together with (10e) and condition (C.2), implies (9f). This completes the proof. \(\square \)
Zhao, S. Model averaging based on James–Stein estimators. Metrika 77, 1013–1022 (2014). https://doi.org/10.1007/s00184-014-0483-y