Abstract
Existing model averaging methods are generally based on ordinary least squares (OLS) estimators. However, it is well known that the James–Stein (JS) estimator dominates the OLS estimator under quadratic loss, provided that the dimension of the coefficient vector is greater than two. We therefore study model averaging based on JS estimators instead of OLS estimators. We develop a weight choice method and prove its asymptotic optimality. A simulation experiment shows promising results for the proposed model average estimator.
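The dominance property invoked in the abstract can be checked numerically. Below is a minimal Monte Carlo sketch using the textbook James–Stein estimator \((1-(p-2)\sigma^2/\Vert y\Vert^2)\,y\) for a multivariate normal mean (an illustration of the classical result, not of the paper's model-average estimator; the true mean `mu` is an arbitrary choice for the experiment):

```python
import numpy as np

# Monte Carlo check: for y ~ N(mu, sigma^2 I_p) with p >= 3, the
# James-Stein estimator (1 - (p-2)*sigma^2/||y||^2) * y has smaller
# expected quadratic loss than the OLS/MLE estimator y itself.
rng = np.random.default_rng(0)
p, sigma2, reps = 10, 1.0, 50_000
mu = np.full(p, 0.5)  # illustrative true mean; any mu gives dominance

y = rng.normal(mu, np.sqrt(sigma2), size=(reps, p))
shrink = 1.0 - (p - 2) * sigma2 / np.sum(y ** 2, axis=1)
js = shrink[:, None] * y  # JS estimate for each replication

risk_ols = np.mean(np.sum((y - mu) ** 2, axis=1))  # close to p*sigma2 = 10
risk_js = np.mean(np.sum((js - mu) ** 2, axis=1))
print(risk_ols, risk_js)
```

The estimated JS risk comes out strictly below the OLS risk of \(p\sigma^2\), and the gap is largest when \(\Vert\mu\Vert\) is small.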
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618
Claeskens G, Croux C, Van Kerckhoven J (2006) Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62:972–979
Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189
Hansen BE (2008) Least squares forecast averaging. J Econ 146:342–350
Hansen BE, Racine J (2012) Jackknife model averaging. J Econ 167:38–46
Hansen BE (2014) Model averaging, asymptotic risk, and regressor groups. Quant Econ (to appear)
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417
James W, Stein C (1961) Estimation with quadratic loss. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 361–379
Li K-C (1987) Asymptotic optimality for \(C_L\), cross-validation and generalized cross-validation: discrete index set. Ann Stat 15:958–975
Liang H, Zou G, Wan ATK, Zhang X (2011) On optimal weight choice in a frequentist model average estimator. J Am Stat Assoc 106:1053–1066
Mallows CL (1973) Some comments on \(C_p\). Technometrics 15:661–675
Miller AJ (2002) Subset selection in regression, 2nd edn. Chapman & Hall, London
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88:486–494
Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 197–206
Stein C (1981) Estimation of the mean of a multivariate normal distribution. Ann Stat 9:1135–1151
Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econ 156:277–283
Yang Y (2001) Adaptive regression by mixing. J Am Stat Assoc 96:574–586
Yuan Z, Yang Y (2005) Combining linear regression models: when and how? J Am Stat Assoc 100:1202–1214
Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200
Zhang X, Wan ATK, Zhou Z (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142
Zhang X, Lu Z, Zou G (2013a) Adaptively combined forecasting for discrete response time series. J Econ 176:80–91
Zhang X, Wan ATK, Zou G (2013b) Model averaging by jackknife criterion in models with dependent data. J Econ 174:82–94
Acknowledgments
I am very grateful to two anonymous referees for their helpful comments and suggestions. This work was supported by Grant 11226274 from the National Natural Science Foundation of China.
Appendix: Proofs of theorems
Proof of Theorem 1
By the condition stated in Theorem 1 and Stein’s Lemma (Stein 1981), we have
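The displayed identity was not preserved here; for reference, the standard univariate form of Stein's Lemma (Stein 1981) being invoked states that if \(z \sim N(\theta, \sigma^2)\) and \(h\) is almost differentiable with \(E|h'(z)| < \infty\), then

```latex
E\bigl[(z-\theta)\,h(z)\bigr] \;=\; \sigma^{2}\,E\bigl[h'(z)\bigr],
```

with the multivariate version obtained by applying this coordinatewise.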
Using (5) and some techniques on matrix derivatives, we have
This completes the proof. \(\square \)
Proof of Theorem 2
Let \(U_m=(1-a_mg_m(y))P_m\), \(U(w)=\sum \nolimits _{m=1}^M w_mU_m\), \(\tilde{X}=[X_1,\ldots ,X_M]\), and \(\tilde{P}=\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\). Then \(\hat{\mu }^{\text {JS}}(w)=U(w)y\). It is straightforward to show that
where the last three terms do not depend on \(w\). Thus,
It is seen that \(L^{\text {JS}}(w)- T(w) = \Vert U(w)e\Vert ^2+2\mu '(U(w)-I_n)U(w)e\) and \( S^*(w)-L^{\text {JS}}(w)=-2e'U(w)e - 2e'(U(w)-\tilde{P})\mu + 2\hat{\sigma }^2w'k-2\hat{\sigma }^2\sum \nolimits _{m=1}^Mw_ma_mg_m(y)(k_m-2)\). From the proof of Theorem 3 of Liang et al. (2011), (8), and condition (C.2), the following formulas are sufficient for (3):
Using condition (C.1), we have
From (10a)–(10d) and condition (C.2), we obtain (9a)–(9d).
In addition,
which, together with (10c) and condition (C.2), implies (9e). Finally,
which, together with (10e) and condition (C.2), implies (9f). This completes the proof. \(\square \)
Zhao, S. Model averaging based on James–Stein estimators. Metrika 77, 1013–1022 (2014). https://doi.org/10.1007/s00184-014-0483-y