Abstract
In linear regression, ridge regression has become the standard treatment when the covariates are highly collinear, and the choice of the ridge parameter plays a central role in it. In this paper, instead of selecting a single ridge parameter, we consider a model averaging method that combines multiple ridge estimators with \(M_n\) different ridge parameters, where \(M_n\) may grow to infinity with the sample size \(n\). We show that when the fitting model is correctly specified, the resulting model averaging estimator is \(n^{1/2}\)-consistent. When the fitting model is misspecified, the asymptotic optimality of the model averaging estimator is also established rigorously. The results of simulation studies and our case study concerning the urbanization level of Chinese ethnic areas demonstrate the usefulness of the model averaging method.
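As a rough numerical sketch of the idea (not the paper's actual weight-choice criterion, which is developed in the main text and not reproduced in this excerpt), one can compute \(M_n\) ridge estimators on a grid of ridge parameters and combine them with weights on the simplex; the weights below are simply equal, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # highly collinear covariates
beta = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = X @ beta + rng.normal(size=n)

ks = np.array([0.1, 1.0, 10.0])                  # M_n = 3 ridge parameters
betas = np.column_stack(
    [np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y) for k in ks]
)

# Illustrative weight vector on the simplex (equal weights); the paper
# instead chooses the weights by a data-driven criterion.
w = np.full(len(ks), 1.0 / len(ks))
beta_avg = betas @ w                             # model averaging estimator
mu_hat = X @ beta_avg                            # averaged fit of the mean
```
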
References
Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618
Chen X, Zou G, Zhang X (2013) Frequentist model averaging for linear mixed-effects models. Front Math China 8:497–515
Claeskens G, Croux C, van Kerckhoven J (2006) Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62:972–979
Cule E, Vineis P, De Iorio M (2011) Significance testing in ridge regression for genetic data. BMC Bioinform 12:372
Dempster A, Schatzoff M, Wermuth N (1975) A simulation study of alternatives to ordinary least squares. J Am Stat Assoc 70:77–106
Flynn CJ, Hurvich CM, Simonoff JS (2013) Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J Am Stat Assoc 108:1031–1043
Gao Y, Zhang X, Wang S, Zou G (2016) Model averaging based on leave-subject-out cross-validation. J Econ 192:139–151
Ghosh S, Yuan Z (2009) An improved model averaging scheme for logistic regression. J Multivar Anal 100:1670–1681
Golub G, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223
Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189
Hansen BE, Racine J (2012) Jackknife model averaging. J Econ 167:38–46
Hoerl A, Kennard R (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Lee T (1987) Algorithm AS 223: optimum ridge parameter selection. J R Stat Soc C 36:112–118
Leung G, Barron AR (2006) Information theory and mixing least-squares regressions. IEEE Trans Inf Theory 52:3396–3410
Liu Q, Okui R (2013) Heteroskedasticity-robust Cp model averaging. Econ J 16:463–472
Lu X, Su L (2015) Jackknife model averaging for quantile regressions. J Econ 188:40–58
Magnus J, De Luca G (2016) Weighted-average least squares (WALS): a survey. J Econ Surv 30:117–148
Moral-Benito E (2015) Model averaging in economics: an overview. J Econ Surv 29:46–75
Nordberg L (1982) A procedure for determination of a good ridge parameter in linear regression. Commun Stat Simul Comput 11:285–309
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Schomaker M (2012) Shrinkage averaging estimation. Stat Pap 53(4):1015–1034
Schomaker M, Wan ATK, Heumann C (2010) Frequentist model averaging with missing observations. Comput Stat Data Anal 54:3336–3347
Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econ 156:277–283
Wang H, Zhang X, Zou G (2009) Frequentist model averaging estimation: a review. J Syst Sci Complex 22:732–748
Yu Y, Thurston S, Hauser R, Liang H (2013) Model averaging procedure for partially linear single-index models. J Stat Plan Inference 143:2160–2170
Yuan Z, Yang Y (2005) Combining linear regression models: when and how? J Am Stat Assoc 100:1202–1214
Zhang X (2015) Consistency of model averaging estimators. Econ Lett 130:120–123
Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200
Zhang X, Wang W (2017) Optimal model averaging estimation for partially linear models. Stat Sin (forthcoming)
Zhang X, Wan ATK, Zhou SZ (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142
Zhang X, Wan A, Zou G (2013) Model averaging by jackknife criterion in models with dependent data. J Econ 174:82–94
Zhang X, Zou G, Liang H (2014) Model averaging and weight choice in linear mixed-effects models. Biometrika 101:205–218
Zhang X, Zou G, Carroll R (2015) Model averaging based on Kullback-Leibler distance. Stat Sin 25:1583–1598
Acknowledgements
The authors would like to thank two anonymous referees for their insightful comments and very constructive suggestions that have substantially improved earlier versions of this paper. Zhao’s research was supported by a grant from the Ministry of Education of China (Grant No. 17YJC910011) and a grant from Minzu University of China (Grant No. 2017QNPY34). Yu’s research was supported by the National Natural Science Foundation of China (Grant Nos. 11661079 and 11301463).
Electronic supplementary material
Appendices
A.1 Notation and regularity conditions
Let \(\lambda _{\min }(\mathbf{B})\) and \(\lambda _{\max }(\mathbf{B})\) denote the minimum and maximum eigenvalues of a general real matrix \(\mathbf{B}\), and let \(\Vert \mathbf{B}\Vert \) denote its spectral norm, i.e., \(\Vert \mathbf{B}\Vert = \lambda _{\max }^{1/2}(\mathbf{B}'\mathbf{B})\). Let \(R(\mathbf{w})=\text {E}\{L(\mathbf{w})|\tilde{\mathbf{X}}\}=\text {E}\left\{ \Vert \varvec{\mu }-\widehat{\varvec{\mu }}(\mathbf{w})\Vert ^2|\tilde{\mathbf{X}}\right\} \), let \(\xi _n=\inf _{\mathbf{w}\in \mathcal {W}} R(\mathbf{w})\), and let \(\mathbf{w}_m^0\) be the weight vector whose m-th element is one and whose other elements are zero. We need the following regularity conditions, where all limiting processes are with respect to \(n\rightarrow \infty \).
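The spectral-norm definition above can be checked numerically; a minimal sketch with a randomly generated matrix, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 3))

# ||B|| = lambda_max^{1/2}(B'B), the spectral norm of B
spec = np.sqrt(np.linalg.eigvalsh(B.T @ B).max())
assert np.isclose(spec, np.linalg.norm(B, 2))    # matches NumPy's 2-norm
```
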
Condition (C.1)
\(\lambda _{\min }(\mathbf{X}'\mathbf{X}/n)\) and \(\lambda _{\max }(\mathbf{X}'\mathbf{X}/n)\) are bounded below and above by positive constants \(c_0\) and \(c_1\), a.s., respectively, and \(n^{-1/2}\mathbf{X}'\mathbf{e}=O_p(1)\).
Condition (C.2)
There exists a \( {m^*}\in \{1,\ldots ,M_n\}\) such that \(n^{-1/2}k_{m^*}=O_p(1) \).
Condition (C.3)
\(p^*={O(n^{-1})}\), a.s., where \( p^*=\max _{1\le m\le M_n}\max _{1\le i\le n}P^{m}_{ii}\), and \(P^{m}_{ii}\) is the i-th diagonal element of \(\mathbf{P}_m = \mathbf{X}(\mathbf{X}'\mathbf{X}+k_m\mathbf{I}_p)^{-1}\mathbf{X}'\).
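The quantity \(p^*\) in Condition (C.3) can be computed without ever forming the \(n\times n\) matrices \(\mathbf{P}_m\); a minimal sketch with a simulated design (the ridge parameters below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.normal(size=(n, p))
ks = [0.5, 5.0]                                   # illustrative ridge parameters

def ridge_hat_diag(X, k):
    """Diagonal of P_m = X (X'X + k I_p)^{-1} X' via an n x p solve."""
    G = np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T).T  # n x p
    return np.einsum('ij,ij->i', X, G)            # row-wise x_i' (...)^{-1} x_i

# p* = max over m and i of the i-th diagonal element of P_m
p_star = max(ridge_hat_diag(X, k).max() for k in ks)
```

Condition (C.3) requires this maximum leverage-type quantity to shrink at rate \(1/n\).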
Condition (C.4)
\(\sup _{1\le i\le n}\text {E}(e^4_i|\tilde{\mathbf{x}}_i)=O(1)\), a.s., where \(e_i\) is the random error defined in (8).
Condition (C.5)
\(\varvec{\mu }'\varvec{\mu }/n=O(1)\), a.s.
Condition (C.6)
\(\xi _n^{-2}\sum _{m=1}^{M_n} R(\mathbf{w}_m^0)=o(1)\), a.s.
The first part of Condition (C.1) guarantees the identifiability of the model and is common in the model selection literature (Flynn et al. 2013). The second part of Condition (C.1) is mild and holds in typical situations, e.g., when the \(\{\mathbf{X}_i,e_i\}\)'s are independent and satisfy some moment conditions. Condition (C.2) is also mild: it requires only that there exist an \(m^*\) such that \(k_{m^*}\) grows at a rate no faster than \(n^{1/2}\). In fact, the ridge parameters adopted in Sect. 3 satisfy this condition, because \(p\max _{1\le j\le p}\widehat{\alpha }_j^2 \ge \Vert \varvec{\beta }_0\Vert ^2 + \varepsilon _n\), with \(\varepsilon _n = 2\varvec{\beta }_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{e}+ \Vert (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{e}\Vert ^2 = O_p(n^{-1/2})\) under Condition (C.1), and therefore \(k^* = \widehat{\sigma }^2/\max _{1\le j\le p}\widehat{\alpha }_j^2 \le p\widehat{\sigma }^2/(\Vert \varvec{\beta }_0\Vert ^2 + \varepsilon _n) = O_p(1)\). Condition (C.3) is reasonable and weaker than Condition (C.2) of Zhang (2015), because \(\{\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'-\mathbf{P}_m\}\) is positive semi-definite and thus \(\mathbf{l}_{i}'\{\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'-\mathbf{P}_m\}\mathbf{l}_{i}\ge 0\), where \(\mathbf{l}_{i}\) is the i-th column of \(\mathbf{I}_n\). Condition (C.4) is commonly used in the literature (Wan et al. 2010); it excludes random error distributions from certain heavy-tailed families, such as the t-distribution with no more than 4 degrees of freedom or the Pareto distribution with shape parameter no greater than 4. Conditions (C.5) and (C.6) are the same as (23) and (21) in Zhang et al. (2013), respectively.
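For concreteness, the ridge parameter \(k^*=\widehat{\sigma }^2/\max _j\widehat{\alpha }_j^2\) discussed above can be computed as follows. This is a minimal sketch under the assumption, consistent with the Hoerl–Kennard proposal, that the \(\widehat{\alpha }_j\) are the OLS coefficients expressed in the canonical coordinates given by the eigenvectors of \(\mathbf{X}'\mathbf{X}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 150, 4
X = rng.normal(size=(n, p))
beta0 = np.array([2.0, -1.0, 0.5, 1.0])
y = X @ beta0 + rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - p)              # usual residual variance estimate

# alpha-hat: OLS coefficients in the canonical (eigenvector) coordinates
_, Gamma = np.linalg.eigh(X.T @ X)
alpha_hat = Gamma.T @ beta_ols

k_star = sigma2_hat / np.max(alpha_hat ** 2)      # remains O_p(1) as n grows
```
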
A.2 Proof of Theorem 1
This proof follows the framework in Zhang (2015). Define \(\mathbf{D}_m\) as the \(n\times {n}\) diagonal matrix with \(h^{m}_{ii} = (1-P^{m}_{ii})^{-1}\) being its i-th diagonal element. From (6) and the fact that
we have
Now we define \(\mathbf{Q}_m\) as the \(n\times {n}\) diagonal matrix whose i-th diagonal element is \(Q_{m,ii}=P^m_{ii}/(1-P^m_{ii})\). Then \( \mathbf{D}_m=\mathbf{Q}_m + \mathbf{I}_n\). By (A.2),
Let \(\mathbf{V}\) be the \(M_n \times M_n \) matrix whose (m, j)-th entry is
It follows from (A.3)–(A.4) that
Next, we will show that
with \(m^*\) being defined in Condition (C.2), and that for any \(\mathbf{w}\in \mathcal{W}\),
To show (A.6), note that
then (A.6) holds under Conditions (C.1)–(C.2). Moreover, by the definition of \(\mathbf{Q}_m\), Conditions (C.1)–(C.3), and Eq. (A.4), it is seen that (A.7) also holds because of the fact that
uniformly for every \(m,j\in \{1,\ldots ,M_n\}\). Denote
Based on (A.6), (A.7), and Condition (C.1), we have
In addition, by the definitions of \(\widehat{\mathbf{w}}\) and \(\eta _n\), we have that
which, together with (A.5), implies that
and thus
Then, by Condition (C.1) and (A.7)–(A.8), we have
which concludes the proof.
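The diagonal matrices \(\mathbf{D}_m\) and \(\mathbf{Q}_m\) used in this proof satisfy \(\mathbf{D}_m=\mathbf{Q}_m+\mathbf{I}_n\) by construction; a minimal numerical check on a simulated design, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 50, 3, 2.0
X = rng.normal(size=(n, p))
P = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)  # P_m, an n x n matrix

d = 1.0 / (1.0 - np.diag(P))                      # diagonal of D_m
q = np.diag(P) / (1.0 - np.diag(P))               # diagonal of Q_m
assert np.allclose(d, q + 1.0)                    # D_m = Q_m + I_n entrywise
```
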
A.3 Proof of Theorem 2
Let
where \(\mathbf{Q}_m\) is defined in Appendix A.2, and \(\mathbf{B}(\mathbf{w})=\sum _{m=1}^{M_n} w_m\mathbf{B}_m\). By (A.3), we have
Define \(\mathbf{M}(\mathbf{w})=\mathbf{A}(\mathbf{w})\mathbf{B}(\mathbf{w})+\mathbf{B}(\mathbf{w})\mathbf{A}(\mathbf{w})+\mathbf{B}(\mathbf{w})\mathbf{B}(\mathbf{w})\), then it is seen from (A.9) that
where
with \(\mathbf{P}(\mathbf{w}) = \sum _{m=1}^{M_n} w_m \mathbf{P}_m\). Since \(\mathbf{e}'\mathbf{e}\) is independent of \(\mathbf{w}\), to prove Theorem 2, by (A.10), it suffices to show that
and
We first prove (A.11). Recall that
By (A.13), the Dominated Convergence Theorem, Conditions (C.4) and (C.6), and the assumption that there exists a positive constant \(\bar{\sigma }^2\) such that \(\lambda _{\tiny \max }(\varOmega )=\bar{\sigma }^2<\infty \) a.s., we have that, for any fixed \(\delta >0\), as \(n\rightarrow \infty \)
where the fourth inequality follows from Chebyshev's inequality and the last inequality is a direct result of (A.13). Similarly, by Conditions (C.3), (C.4) and (C.6) and the Dominated Convergence Theorem, we obtain
where we have used the fact that
Moreover, by Condition (C.3) and Eq. (A.16)
In addition, it follows from Conditions (C.1) and (C.3) that uniformly in m,
and
Therefore,
Then, from Condition (C.5) and Eq. (A.16),
By Conditions (C.4)–(C.5) and Eqs. (A.16) and (A.18), we have
Likewise, with Condition (C.4) and Eqs. (A.16) and (A.18), we have
Equation (A.11) can then be proved by combining Eqs. (A.14)–(A.17) and (A.19)–(A.21) together.
We now prove (A.12). Note that
then, to show (A.12), it remains to verify that
and
Since \(\mathbf{P}_m\) is positive semi-definite and \(\lambda _{\tiny \max }(\mathbf{P}_m)\le 1\) for any \(1\le m\le M_n\), for any \(\mathbf{w}\in \mathcal{W}\), \(\mathbf{P}(\mathbf{w})\) is positive semi-definite and
Then, by Conditions (C.3)–(C.4), (A.16) and (A.26), we can verify (A.23) by noting that
where we have used the fact that for any fixed \(\delta >0\)
In addition, it follows from Condition (C.3) and (A.26) that,
Combining (A.16) and (A.27) will lead to (A.24). In addition, recalling that \( \left\| {\mathbf{A}(\mathbf{w})\varvec{{\mu }}} \right\| ^2 \le R(\mathbf{w})\), we have
which with (A.23) will lead to (A.25). This concludes the proof.
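The eigenvalue bound used above, namely that \(\mathbf{P}(\mathbf{w})\) is positive semi-definite with \(\lambda _{\max }\{\mathbf{P}(\mathbf{w})\}\le 1\) for any weight vector on the simplex, can also be checked numerically; a minimal sketch with arbitrary illustrative ridge parameters and weights:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 3
X = rng.normal(size=(n, p))
ks = [0.1, 1.0, 10.0]
w = np.array([0.2, 0.5, 0.3])                     # weights on the simplex

# P(w) = sum_m w_m P_m with P_m = X (X'X + k_m I_p)^{-1} X'
Pw = sum(wm * X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
         for wm, k in zip(w, ks))
eig = np.linalg.eigvalsh((Pw + Pw.T) / 2.0)       # symmetrize for stability
assert eig.min() >= -1e-10 and eig.max() <= 1.0   # PSD with lambda_max <= 1
```
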
Zhao, S., Liao, J. & Yu, D. Model averaging estimator in ridge regression and its large sample properties. Stat Papers 61, 1719–1739 (2020). https://doi.org/10.1007/s00362-018-1002-4