Model averaging estimator in ridge regression and its large sample properties

  • Regular Article
  • Published in Statistical Papers

Abstract

In linear regression, when the covariates are highly collinear, ridge regression has become the standard treatment. The choice of ridge parameter plays a central role in ridge regression. In this paper, instead of ending up with a single ridge parameter, we consider a model averaging method to combine multiple ridge estimators with \(M_n\) different ridge parameters, where \(M_n\) can go to infinity with sample size n. We show that when the fitting model is correctly specified, the resulting model averaging estimator is \(n^{1/2}\)-consistent. When the fitting model is misspecified, the asymptotic optimality of the model averaging estimator is also established rigorously. The results of simulation studies and our case study concerning the urbanization level of Chinese ethnic areas demonstrate the usefulness of the model averaging method.


References

  • Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618

  • Chen X, Zou G, Zhang X (2013) Frequentist model averaging for linear mixed-effects models. Front Math China 8:497–515

  • Claeskens G, Croux C, van Kerckhoven J (2006) Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62:972–979

  • Cule E, Vineis P, De Iorio M (2011) Significance testing in ridge regression for genetic data. BMC Bioinform 12:372

  • Dempster A, Schatzoff M, Wermuth N (1975) A simulation study of alternatives to ordinary least squares. J Am Stat Assoc 70:77–106

  • Flynn CJ, Hurvich CM, Simonoff JS (2013) Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J Am Stat Assoc 108:1031–1043

  • Gao Y, Zhang X, Wang S, Zou G (2016) Model averaging based on leave-subject-out cross-validation. J Econom 192:139–151

  • Ghosh S, Yuan Z (2009) An improved model averaging scheme for logistic regression. J Multivar Anal 100:1670–1681

  • Golub G, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223

  • Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189

  • Hansen BE, Racine J (2012) Jackknife model averaging. J Econom 167:38–46

  • Hoerl A, Kennard R (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67

  • Lee T (1987) Algorithm AS 223: optimum ridge parameter selection. J R Stat Soc C 36:112–118

  • Leung G, Barron AR (2006) Information theory and mixing least-squares regressions. IEEE Trans Inf Theory 52:3396–3410

  • Liu Q, Okui R (2013) Heteroskedasticity-robust Cp model averaging. Econom J 16:463–472

  • Lu X, Su L (2015) Jackknife model averaging for quantile regressions. J Econom 188:40–58

  • Magnus J, De Luca G (2016) Weighted-average least squares (WALS): a survey. J Econ Surv 30:117–148

  • Moral-Benito E (2015) Model averaging in economics: an overview. J Econ Surv 29:46–75

  • Nordberg L (1982) A procedure for determination of a good ridge parameter in linear regression. Commun Stat Simul Comput 11:285–309

  • Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York

  • Schomaker M (2012) Shrinkage averaging estimation. Stat Pap 53(4):1015–1034

  • Schomaker M, Wan ATK, Heumann C (2010) Frequentist model averaging with missing observations. Comput Stat Data Anal 54:3336–3347

  • Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econom 156:277–283

  • Wang H, Zhang X, Zou G (2009) Frequentist model averaging estimation: a review. J Syst Sci Complex 22:732–748

  • Yu Y, Thurston S, Hauser R, Liang H (2013) Model averaging procedure for partially linear single-index models. J Stat Plan Inference 143:2160–2170

  • Yuan Z, Yang Y (2005) Combining linear regression models: when and how? J Am Stat Assoc 100:1202–1214

  • Zhang X (2015) Consistency of model averaging estimators. Econ Lett 130:120–123

  • Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200

  • Zhang X, Wang W (2017) Optimal model averaging estimation for partially linear models. Stat Sin (forthcoming)

  • Zhang X, Wan ATK, Zhou SZ (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142

  • Zhang X, Wan A, Zou G (2013) Model averaging by jackknife criterion in models with dependent data. J Econom 174:82–94

  • Zhang X, Zou G, Liang H (2014) Model averaging and weight choice in linear mixed-effects models. Biometrika 101:205–218

  • Zhang X, Zou G, Carroll R (2015) Model averaging based on Kullback–Leibler distance. Stat Sin 25:1583–1598

Acknowledgements

The authors would like to thank two anonymous referees for their insightful comments and very constructive suggestions that have substantially improved earlier versions of this paper. Zhao’s research was supported by a grant from the Ministry of Education of China (Grant No. 17YJC910011) and a grant from Minzu University of China (Grant No. 2017QNPY34). Yu’s research was supported by the National Natural Science Foundation of China (Grant Nos. 11661079 and 11301463).

Author information

Corresponding author

Correspondence to Dalei Yu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (rar 10 KB)

Supplementary material 2 (rar 408 KB)

Appendices


A.1 Notations and regularity conditions

Let \(\lambda _{\min }(\mathbf{B})\) and \(\lambda _{\max }(\mathbf{B})\) be the minimum and maximum eigenvalues of a general real matrix \(\mathbf{B}\). Denote by \(\left\| \mathbf{B}\right\| \) the spectral norm of a real matrix \(\mathbf{B}\), i.e. \( \left\| \mathbf{B}\right\| = \lambda _{\max }^{1/2}(\mathbf{B}'\mathbf{B})\). Let \(R(\mathbf{w})={E\{L(\mathbf{w})|\tilde{\mathbf{X}}\}=E\left\{ \Vert \varvec{{\mu }}-\widehat{\varvec{{\mu }}}(\mathbf{w})\Vert ^2|\tilde{\mathbf{X}}\right\} }\), \(\xi _n=\inf \limits _{\mathbf{w}\in \mathcal{W}} R(\mathbf{w})\), and let \(\mathbf{w}_m^0\) be the weight vector whose m-th element is one and whose other elements are zero. We need the following regularity conditions, where all limiting processes are with respect to \(n\rightarrow \infty \).

Condition (C.1)

\(\lambda _{\min }(\mathbf{X}'\mathbf{X}/n)\) and \(\lambda _{\max }(\mathbf{X}'\mathbf{X}/n)\) are bounded below and above by positive constants \(c_0\) and \(c_1\), a.s., respectively, and \(n^{-1/2}\mathbf{X}'\mathbf{e}=O_p(1)\).

Condition (C.2)

There exists an \( {m^*}\in \{1,\ldots ,M_n\}\) such that \(n^{-1/2}k_{m^*}=O_p(1) \).

Condition (C.3)

\(p^*={O(n^{-1})}\), a.s., where \( p^*=\max _{1\le m\le M_n}\max _{1\le i\le n}P^{m}_{ii}\), and \(P^{m}_{ii}\) is the i-th diagonal element of \(\mathbf{P}_m = \mathbf{X}(\mathbf{X}'\mathbf{X}+k_m\mathbf{I}_p)^{-1}\mathbf{X}'\).

Condition (C.4)

\(\sup _{1\le i\le n}E(e^4_i|\tilde{\mathbf{x}}_i)=O(1)\), a.s., where \(e_i\) is the random error defined in (8).

Condition (C.5)

\({\varvec{{\mu }}'\varvec{{\mu }}}/{n}=O(1)\), a.s.

Condition (C.6)

\(\xi _n^{-2}\sum \limits _{m=1}^{M_n} R(\mathbf{w}_m^0)=o(1)\), a.s.

The first part of Condition (C.1) guarantees the identifiability of the model and is common in the literature on model selection (Flynn et al. 2013). The second part of Condition (C.1) is mild and holds in typical situations, e.g., when the \(\{\mathbf{X}_i,e_i\}\) are independent and satisfy suitable moment conditions. Condition (C.2) is also mild: it requires only that there exist an \( m^*\) such that \(k_{m^*}\) grows at a rate no faster than \(n^{1/2}\). In fact, the ridge parameters adopted in Sect. 3 satisfy this condition, because \(p\max _{1\le j\le p}\widehat{\alpha }_j^2 \ge \Vert \varvec{{\beta }}_0\Vert ^2 + \varepsilon _n\), where \(\varepsilon _n = 2\varvec{{\beta }}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{e}+ \Vert (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{e}\Vert ^2 = O_p(n^{-1/2})\) under Condition (C.1), and therefore \(k^* = \widehat{\sigma }^2/\max _{ 1\le j\le p}\widehat{\alpha }_j^2 \le p\widehat{\sigma }^2/( \Vert \varvec{{\beta }}_0\Vert ^2 + \varepsilon _n) = O_p(1)\). Condition (C.3) is reasonable and weaker than Condition (C.2) of Zhang (2015), because \(\{\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'-\mathbf{P}_m\}\) is positive semi-definite and thus \(\mathbf{l}_{i}'\{\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'-\mathbf{P}_m\}\mathbf{l}_{i}\ge 0\), where \(\mathbf{l}_{i}\) is the i-th column of \(\mathbf{I}_n\). Condition (C.4) is commonly used in the literature (Wan et al. 2010); it excludes random errors from certain heavy-tailed families, such as the t-distribution with no more than 4 degrees of freedom or the Pareto distribution with shape parameter no greater than 4. Conditions (C.5) and (C.6) are the same as (23) and (21) in Zhang et al. (2013), respectively.
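As a numerical illustration of the boundedness argument above, the sketch below computes a Hoerl–Kennard-type ridge parameter \(k^* = \widehat{\sigma }^2/\max _j\widehat{\alpha }_j^2\) on simulated data for increasing n. The construction of \(\widehat{\alpha }\) as the OLS coefficients rotated into the eigenbasis of \(\mathbf{X}'\mathbf{X}\), and the simulated design, are assumptions made for illustration only, since Sect. 3 is not reproduced here.

```python
import numpy as np

def hoerl_kennard_k(X, y):
    """Hoerl-Kennard-type ridge parameter k* = sigma2_hat / max_j alpha_hat_j^2,
    with alpha_hat the OLS coefficients in the canonical (eigenvector) basis."""
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)      # usual unbiased error-variance estimate
    _, T = np.linalg.eigh(X.T @ X)        # X'X = T diag(lambda) T'
    alpha = T.T @ beta_ols                # canonical coordinates of the OLS fit
    return sigma2 / np.max(alpha**2)

rng = np.random.default_rng(0)
p, beta0 = 4, np.array([1.0, -0.5, 0.25, 0.75])
for n in (100, 1000, 10000):
    X = rng.standard_normal((n, p))
    y = X @ beta0 + rng.standard_normal(n)
    k_star = hoerl_kennard_k(X, y)
    # k* stays bounded in probability as n grows, consistent with (C.2)
    print(n, round(k_star, 3))
```

Since \(k^*=O_p(1)\) here, \(n^{-1/2}k^*\rightarrow 0\), which is stronger than what Condition (C.2) requires.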

A.2 Proof of Theorem 1

This proof follows the framework in Zhang (2015). Define \(\mathbf{D}_m\) as the \(n\times {n}\) diagonal matrix with \(h^{m}_{ii} = (1-P^{m}_{ii})^{-1}\) being its i-th diagonal element. From (6) and the fact that

$$\begin{aligned} \mathbf{X}'_{[-i]}\mathbf{Y}_{[-i]}= \mathbf{X}'\mathbf{Y}- \mathbf{X}_i\mathbf{Y}_i,\quad h^{m}_{ii}P^{m}_{ii}=h^{m}_{ii}-1 \quad \text {and}\quad h^{m}_{ii}P^{m}_{ii}P^{m}_{ii}=h^{m}_{ii}-1-P^{m}_{ii},\nonumber \\ \end{aligned}$$
(A.1)

we have

$$\begin{aligned} \widetilde{\varvec{{\mu }}}_{k_m}= & {} \mathbf{P}_m \mathbf{Y}- \text{ diag }(P_{ii}^m) \mathbf{Y}+ \text{ diag }(h_{ii}^m) \text{ diag }(P_{ii}^m)\mathbf{P}_m \mathbf{Y}\nonumber \\&- \text{ diag }(h_{ii}^m) \text{ diag }(P_{ii}^m) \text{ diag }(P_{ii}^m)\mathbf{Y}\nonumber \\= & {} \mathbf{D}_m\mathbf{P}_m\mathbf{Y}- \mathbf{D}_m\mathbf{Y}+ \mathbf{Y}. \end{aligned}$$
(A.2)
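The closed form (A.2) can be checked numerically: for a fixed ridge parameter, the leave-one-out fitted values produced by \(\mathbf{D}_m\mathbf{P}_m\mathbf{Y}-\mathbf{D}_m\mathbf{Y}+\mathbf{Y}\) coincide exactly with brute-force refits that delete each observation in turn. The sketch below uses simulated data (an assumption made purely for the check); the identity is exact only because the penalty \(k_m\) is held fixed across the leave-one-out fits.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 30, 3, 2.0
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(n)

A = X.T @ X + k * np.eye(p)
P = X @ np.linalg.solve(A, X.T)          # P_m = X (X'X + k I_p)^{-1} X'
h = 1.0 / (1.0 - np.diag(P))             # h_ii = (1 - P_ii)^{-1}, i.e. diagonal of D_m
mu_short = h * (P @ y) - h * y + y       # closed form (A.2): D_m P_m Y - D_m Y + Y

# brute-force leave-one-out refits (same penalty k in every fit)
mu_loo = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ai = X[mask].T @ X[mask] + k * np.eye(p)
    bi = np.linalg.solve(Ai, X[mask].T @ y[mask])
    mu_loo[i] = X[i] @ bi

print(np.max(np.abs(mu_short - mu_loo)))  # should agree up to floating-point error
```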

Now we define \(\mathbf{Q}_m\) as the \(n\times {n}\) diagonal matrix whose i-th diagonal element is \(Q_{m,ii}=P^m_{ii}/(1-P^m_{ii})\). Then \( \mathbf{D}_m=\mathbf{Q}_m + \mathbf{I}_n\). By (A.2),

$$\begin{aligned} \widetilde{\varvec{{\mu }}}(\mathbf{w})= & {} \sum _{m=1}^{M_n}\{w_m\mathbf{D}_m (\mathbf{P}_m - \mathbf{I}_n)\mathbf{Y}\} + \mathbf{Y}\nonumber \\= & {} \sum _{m=1}^{M_n} \{w_m(\mathbf{Q}_m + \mathbf{I}_n) (\mathbf{P}_m - \mathbf{I}_n)\mathbf{Y}\} + \mathbf{Y}. \end{aligned}$$
(A.3)

Let \(\mathbf{V}\) be the \(M_n \times M_n \) matrix whose (m, j)-th entry is

$$\begin{aligned} V_{mj} = (\mathbf{e}+\mathbf{X}\varvec{{\beta }}_0)'(\mathbf{I}_n - \mathbf{P}_m)(\mathbf{Q}_m+\mathbf{Q}_j+\mathbf{Q}_m\mathbf{Q}_j)(\mathbf{I}_n-\mathbf{P}_j)(\mathbf{e}+\mathbf{X}\varvec{{\beta }}_0). \end{aligned}$$
(A.4)

It follows from (A.3)–(A.4) that

$$\begin{aligned} \mathrm {CV}(\mathbf{w})&=\left\| \sum _{m=1}^{M_n} w_m(\mathbf{Q}_m + \mathbf{I}_n)(\mathbf{P}_m - \mathbf{I}_n)\mathbf{Y}\right\| ^2\nonumber \\&= \Vert \mathbf{e}\Vert ^2+\{\widehat{\varvec{{\beta }}} (\mathbf{w})- \varvec{{\beta }}_0\}'\mathbf{X}'\mathbf{X}\{\widehat{\varvec{{\beta }}}(\mathbf{w})- \varvec{{\beta }}_0\} - 2\mathbf{e}'\mathbf{X}\{\widehat{\varvec{{\beta }}}(\mathbf{w})- \varvec{{\beta }}_0\}+ \mathbf{w}'\mathbf{V}\mathbf{w}.\nonumber \\ \end{aligned}$$
(A.5)
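Computationally, (A.3) makes the criterion easy to minimize: writing \(\mathbf{a}_m=(\mathbf{Q}_m+\mathbf{I}_n)(\mathbf{P}_m-\mathbf{I}_n)\mathbf{Y}\), we have \(\mathrm {CV}(\mathbf{w})=\mathbf{w}'\mathbf{G}\mathbf{w}\) with \(G_{mj}=\mathbf{a}_m'\mathbf{a}_j\), a convex quadratic over the weight simplex. The sketch below (simulated data and a small, hypothetical grid of ridge parameters) solves this program exactly by enumerating support sets, which is feasible when \(M_n\) is small; in practice a quadratic-programming solver would be used instead.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, p = 80, 4
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
ks = [0.1, 1.0, 10.0, 100.0]             # candidate ridge parameters k_1,...,k_M

# a_m = (Q_m + I_n)(P_m - I_n)Y, so CV(w) = ||sum_m w_m a_m||^2 = w'Gw
a = []
for k in ks:
    P = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
    h = 1.0 / (1.0 - np.diag(P))         # diagonal of Q_m + I_n
    a.append(h * ((P - np.eye(n)) @ y))
G = np.array(a) @ np.array(a).T          # G_{mj} = a_m'a_j, positive definite here

# exact minimization of w'Gw over the simplex by enumerating supports: on each
# face, the equality-constrained minimizer is G_SS^{-1} 1 / (1' G_SS^{-1} 1)
M = len(ks)
best_w, best_cv = None, np.inf
for r in range(1, M + 1):
    for S in combinations(range(M), r):
        GS = G[np.ix_(S, S)]
        wS = np.linalg.solve(GS, np.ones(r))
        wS /= wS.sum()
        if np.all(wS >= 0):              # keep only feasible face solutions
            w = np.zeros(M); w[list(S)] = wS
            if w @ G @ w < best_cv:
                best_w, best_cv = w, w @ G @ w

print(best_w, best_cv)
```

Because the singleton supports are among the enumerated faces, the selected weights can do no worse than the best single candidate, i.e. \(\mathrm {CV}(\widehat{\mathbf{w}})\le \min _m \mathrm {CV}(\mathbf{w}_m^0)\).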

Next, we will show that

$$\begin{aligned} \sqrt{n}(\widehat{\varvec{{\beta }}}_{k_{m^*}}- \varvec{{\beta }}_0 )=O_p(1) \end{aligned}$$
(A.6)

with \(m^*\) being defined in Condition (C.2), and that for any \(\mathbf{w}\in \mathcal{W}\),

$$\begin{aligned} \mathbf{w}'\mathbf{V}\mathbf{w}=O_p(1). \end{aligned}$$
(A.7)

To show (A.6), note that

$$\begin{aligned} \sqrt{n}\left( \widehat{\varvec{{\beta }}}_{k_{m^*}}- \varvec{{\beta }}_0\right) = \left( n^{-1}\mathbf{X}'\mathbf{X}+n^{-1}k_{m^*}\mathbf{I}_p\right) ^{-1} \left( n^{-1/2}\mathbf{X}'\mathbf{e}-n^{-1/2}k_{m^*}\varvec{{\beta }}_0\right) , \end{aligned}$$

then (A.6) holds under Conditions (C.1) and (C.2). Moreover, by the definition of \(\mathbf{Q}_m\), Conditions (C.1) and (C.3), and Eq. (A.4), it is seen that (A.7) also holds because

$$\begin{aligned} |V_{mj}|\le \Vert \mathbf{e}+\mathbf{X}\varvec{{\beta }}_0\Vert ^2\Vert \mathbf{I}_n - \mathbf{P}_m\Vert \Vert \mathbf{I}_n - \mathbf{P}_j\Vert (\Vert \mathbf{Q}_j\Vert + \Vert \mathbf{Q}_m\Vert + \Vert \mathbf{Q}_m\Vert \Vert \mathbf{Q}_j\Vert )=O_p(1), \end{aligned}$$

uniformly for every \(m,j\in \{1,\ldots ,M_n\}\). Denote

$$\begin{aligned} \eta _n = \left( \widehat{\varvec{{\beta }}}_{k_{m^*}}- \varvec{{\beta }}_0 \right) '\mathbf{X}'\mathbf{X}\left( \widehat{\varvec{{\beta }}}_{k_{m^*}}-\varvec{{\beta }}_0 \right) - 2\mathbf{e}'\mathbf{X}\left( \widehat{\varvec{{\beta }}}_{k_{m^*}} - \varvec{{\beta }}_0\right) + V_{m^{*}m^{*}}. \end{aligned}$$

Based on (A.6), (A.7), and Condition (C.1), we have

$$\begin{aligned} \eta _n = O_p(1). \end{aligned}$$
(A.8)

In addition, by the definitions of \(\widehat{\mathbf{w}}\) and \(\eta _n\), we have that

$$\begin{aligned} \mathrm {CV}(\widehat{\mathbf{w}})\le \mathrm {CV}( \mathbf{w}_{m^*}^0 ) =\Vert \mathbf{e}\Vert ^2 + \eta _n, \end{aligned}$$

which, together with (A.5), implies that

$$\begin{aligned}&\lambda _{\tiny \min }\left( \mathbf{X}'\mathbf{X}/n \right) \Vert \sqrt{n}\{\widehat{\varvec{{\beta }}}(\widehat{\mathbf{w}})- \varvec{{\beta }}_0 \}\Vert ^2 \\&\quad \le \eta _n + 2\Vert n^{-1/2}\mathbf{e}'\mathbf{X}\Vert \Vert \sqrt{n}\{\widehat{\varvec{{\beta }}}(\widehat{\mathbf{w}})- \varvec{{\beta }}_0 \}\Vert - \widehat{\mathbf{w}}' \mathbf{V}\widehat{\mathbf{w}} \end{aligned}$$

and thus

$$\begin{aligned}&\lambda _{\tiny \min }\left( \frac{\mathbf{X}'\mathbf{X}}{n}\right) \left[ \Vert \sqrt{n}\{\widehat{\varvec{{\beta }}}(\widehat{\mathbf{w}})- \varvec{{\beta }}_0 \}\Vert - \lambda _{\tiny \min }^{-1}\left( \frac{\mathbf{X}'\mathbf{X}}{n}\right) \left\| \frac{\mathbf{e}'\mathbf{X}}{n^{1/2}}\right\| \right] ^2\\&\quad \le \eta _n + \lambda _{\tiny \min }^{-1}\left( \frac{\mathbf{X}'\mathbf{X}}{n}\right) \left\| \frac{\mathbf{e}'\mathbf{X}}{n^{1/2}}\right\| ^2 - \widehat{\mathbf{w}}' \mathbf{V}\widehat{\mathbf{w}} . \end{aligned}$$

Then, by Condition (C.1) and (A.7)–(A.8), we have

$$\begin{aligned}&\Vert \sqrt{n}\{\widehat{\varvec{{\beta }}}(\widehat{\mathbf{w}})- \varvec{{\beta }}_0 \}\Vert \le c_0^{-1/2}\left( \eta _n + c_0^{-1}\Vert n^{-1/2}\mathbf{e}'\mathbf{X}\Vert ^2 - \widehat{\mathbf{w}}' \mathbf{V}\widehat{\mathbf{w}} \right) ^{1/2} + c_0^{-1}\Vert n^{-1/2}\mathbf{e}'\mathbf{X}\Vert \nonumber \\&\quad = O_p(1), \end{aligned}$$

which concludes the proof.

A.3 Proof of Theorem 2

Let

$$\begin{aligned} \mathbf{A}_m=\mathbf{I}_n - \mathbf{P}_m,\quad \mathbf{A}(\mathbf{w})= \sum _{m=1}^{M_n} w_m\mathbf{A}_m,\quad {\mathrm{and}}\quad \mathbf{B}_m=\mathbf{Q}_m\mathbf{A}_m, \end{aligned}$$

where \(\mathbf{Q}_m\) is defined in Appendix A.2, and \(\mathbf{B}(\mathbf{w})=\sum _{m=1}^{M_n} w_m\mathbf{B}_m\). By (A.3), we have

$$\begin{aligned} \mathbf{Y}-\widetilde{\varvec{{\mu }}}(\mathbf{w}) = \sum _{m=1}^{M_n} \{w_m(\mathbf{Q}_m\mathbf{A}_m + \mathbf{A}_m) \mathbf{Y}\}= \{\mathbf{B}(\mathbf{w}) + \mathbf{A}(\mathbf{w})\} \mathbf{Y}. \end{aligned}$$
(A.9)

Define \(\mathbf{M}(\mathbf{w})=\mathbf{A}(\mathbf{w})\mathbf{B}(\mathbf{w})+\mathbf{B}(\mathbf{w})\mathbf{A}(\mathbf{w})+\mathbf{B}(\mathbf{w})\mathbf{B}(\mathbf{w})\), then it is seen from (A.9) that

$$\begin{aligned} \mathrm {CV}(\mathbf{w})= & {} \left\| \{\mathbf{B}(\mathbf{w}) + \mathbf{A}(\mathbf{w})\}\mathbf{Y}\right\| ^2\nonumber \\= & {} \mathbf{Y}' \mathbf{A}(\mathbf{w})\mathbf{A}(\mathbf{w}) \mathbf{Y}+ \mathbf{Y}' \mathbf{M}(\mathbf{w}) \mathbf{Y}\nonumber \\= & {} L(\mathbf{w}) + r(\mathbf{w}) + \mathbf{e}'\mathbf{e}, \end{aligned}$$
(A.10)

where

$$\begin{aligned} r(\mathbf{w})= & {} 2\varvec{{\mu }}'\mathbf{A}(\mathbf{w})\mathbf{e}- 2\mathbf{e}'\mathbf{P}(\mathbf{w})\mathbf{e}+ 2\text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\right\} - 2\text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\right\} \\&+ \varvec{{\mu }}' \mathbf{M}(\mathbf{w}) \varvec{{\mu }}+2\varvec{{\mu }}'\mathbf{M}(\mathbf{w})\mathbf{e}+\mathbf{e}'\mathbf{M}(\mathbf{w})\mathbf{e}, \end{aligned}$$

with \(\mathbf{P}(\mathbf{w}) = \sum _{m=1}^{M_n} w_m \mathbf{P}_m\). Since \(\mathbf{e}'\mathbf{e}\) is independent of \(\mathbf{w}\), to prove Theorem 2, by (A.10), it suffices to show that

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{r(\mathbf{w})}{R(\mathbf{w})} = o_p(1) \end{aligned}$$
(A.11)

and

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\left| \frac{L(\mathbf{w})}{R(\mathbf{w})}-1\right| = o_p(1). \end{aligned}$$
(A.12)

We first prove (A.11). Recall that

$$\begin{aligned} R(\mathbf{w}) = \left\| \mathbf{A}(\mathbf{w})\varvec{{\mu }}\right\| ^2 + \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\mathbf{P}(\mathbf{w})\right\} . \end{aligned}$$
(A.13)

By (A.13), the Dominated Convergence Theorem, Conditions (C.4) and (C.6), and the assumption that there exists a positive constant \(\bar{\sigma }^2\) such that \(\lambda _{\tiny \max }(\varvec{{\varOmega }})\le \bar{\sigma }^2<\infty \) a.s., we have that, for any fixed \(\delta >0\), as \(n\rightarrow \infty \)

$$\begin{aligned} P\left( \sup _{\mathbf{w}\in \mathcal{W}}R^{ - 1} (\mathbf{w})|\varvec{{\mu }}' \mathbf{A}(\mathbf{w})\mathbf{e}| \ge \delta \right)&\le P\left( \sup _{\mathbf{w}\in \mathcal{W}}|\varvec{{\mu }}' \mathbf{A}(\mathbf{w})\mathbf{e}| \ge \xi _n \delta \right) \nonumber \\&\le P\left( \max _{1 \le m \le M_n} |\varvec{{\mu }}'\mathbf{A}_m \mathbf{e}| \ge \xi _n \delta \right) \nonumber \\&\le \sum _{m = 1}^{M_n} P\left( |\varvec{{\mu }}' \mathbf{A}_m \mathbf{e}| \ge \xi _n \delta \right) \nonumber \\&\le \sum _{m = 1}^{M_n} \frac{ 1 }{ \delta ^2 }E\left( \frac{ |\varvec{{\mu }}' \mathbf{A}_m \mathbf{e}|^2 }{ \xi _n^2 } \right) \nonumber \\&= \sum _{m = 1}^{M_n} \frac{{1 }}{{\delta ^2 }}E_{\tilde{\mathbf{X}}} \left( \frac{ \varvec{{\mu }}' \mathbf{A}_m \varvec{{\varOmega }}\mathbf{A}_m \varvec{{\mu }}}{ \xi _n^2 } \right) \nonumber \\&\le \frac{\bar{\sigma }^2}{{\delta ^2 }}\sum _{m = 1}^{M_n} E_{\tilde{\mathbf{X}}} \left\{ \frac{\left\| A(\mathbf{w}_m^0)\varvec{{\mu }}\right\| ^2}{ \xi _n^2 } \right\} \nonumber \\&\le \frac{ \bar{\sigma } ^2 }{ \delta ^2 } E_{\tilde{\mathbf{X}}} \left\{ \sum \nolimits _{m = 1}^{M_n} \frac{ R(\mathbf{w}_m^0 ) }{ \xi _n^2 } \right\} \nonumber \\&\rightarrow 0, \end{aligned}$$
(A.14)

where the fourth inequality follows from Chebyshev's inequality and the last inequality is a direct result of (A.13). Similarly, by Conditions (C.3), (C.4) and (C.6) and the Dominated Convergence Theorem, we obtain

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \mathbf{e}'\mathbf{P}{(\mathbf{w})}\mathbf{e}- \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\right\} \right| }{R(\mathbf{w})} = o_p(1), \end{aligned}$$
(A.15)

where we have used the fact that

$$\begin{aligned} M_n\xi _n^{-1} \le \xi _n^{-2}\sum \limits _{m=1}^{M_n} R(\mathbf{w}_m^0)=o(1), \quad a.s.. \end{aligned}$$
(A.16)

Moreover, by Condition (C.3) and Eq. (A.16)

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\right\} \right| }{R(\mathbf{w})}\le & {} \xi _n^{-1}\lambda _{\tiny \max }(\varvec{{\varOmega }}){\sup _{\mathbf{w}\in \mathcal{W}}} \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\right\} \nonumber \\\le & {} M_n\xi _n^{-1}\bar{\sigma }^2 {\max _{m\in \{1,\ldots ,M_n\}}} \text{ trace }(\mathbf{P}_m)\nonumber \\= & {} o(1), \quad a.s.. \end{aligned}$$
(A.17)

In addition, it follows from Conditions (C.1) and (C.3) that uniformly in m,

$$\begin{aligned} \left\| \mathbf{A}_m \right\| = \left\| \mathbf{I}_n - \mathbf{P}_m \right\| \le 1 + \left\| \mathbf{P}_m \right\| \le 1 + c_1 /c_0, \end{aligned}$$

and

$$\begin{aligned} \left\| \mathbf{Q}_m\right\| = \max _{1\le i\le n} \left\{ P^{m}_{ii}/(1-P^{m}_{ii})\right\} = O(1/n), \quad a.s.. \end{aligned}$$

Therefore,

$$\begin{aligned}&\sup _{\mathbf{w}\in \mathcal{W}}\left\| \mathbf{M}(\mathbf{w}) \right\| = \sup _{\mathbf{w}\in \mathcal{W}}\left\| \mathbf{A}(\mathbf{w})\mathbf{B}(\mathbf{w}) + \mathbf{B}(\mathbf{w})\mathbf{A}(\mathbf{w})+ \mathbf{B}(\mathbf{w})\mathbf{B}(\mathbf{w}) \right\| \nonumber \\&\quad = \sup _{\mathbf{w}\in \mathcal{W}}\left\| \sum \nolimits _{m = 1}^{M_n} {\sum \nolimits _{l = 1}^{M_n} {w_m w_l \left( \mathbf{A}_m \mathbf{B}_l +\mathbf{B}_m \mathbf{A}_l + \mathbf{B}_m \mathbf{B}_l \right) } } \right\| \nonumber \\&\quad \le \max _{1\le m\le M_n}\max _{1\le l\le M_n} {\left( \left\| {\mathbf{A}_m \mathbf{Q}_l\mathbf{A}_l } \right\| + \left\| {\mathbf{Q}_m\mathbf{A}_m \mathbf{A}_l } \right\| + \left\| \mathbf{Q}_m \mathbf{A}_m \mathbf{Q}_l \mathbf{A}_l\right\| \right) } \nonumber \\&\quad = O(1/n), \quad a.s.. \end{aligned}$$
(A.18)

Then, from Condition (C.5) and Eq. (A.16),

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \varvec{{\mu }}'\mathbf{M}{(\mathbf{w})}\varvec{{\mu }}\right| }{R(\mathbf{w})}\le \xi _n^{-1}\Vert \varvec{{\mu }}\Vert ^2\sup _{\mathbf{w}\in \mathcal{W}}\Vert \mathbf{M}(\mathbf{w})\Vert = o(1) \quad a.s.. \end{aligned}$$
(A.19)

By Conditions (C.4) and (C.5) and Eqs. (A.16) and (A.18), we have

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \varvec{{\mu }}'\mathbf{M}{(\mathbf{w})}\mathbf{e}\right| }{R(\mathbf{w})} \le \xi _n^{-1}\Vert \varvec{{\mu }}\Vert \Vert \mathbf{e}\Vert \sup _{\mathbf{w}\in \mathcal{W}}\Vert \mathbf{M}(\mathbf{w})\Vert = o_p(1). \end{aligned}$$
(A.20)

Likewise, with Condition (C.4) and Eqs. (A.16) and (A.18), we have

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \mathbf{e}'\mathbf{M}{(\mathbf{w})}\mathbf{e}\right| }{R(\mathbf{w})} \le \xi _n^{-1} \Vert \mathbf{e}\Vert ^2\sup _{\mathbf{w}\in \mathcal{W}}\Vert \mathbf{M}(\mathbf{w})\Vert = o_p (1). \end{aligned}$$
(A.21)

Equation (A.11) can then be proved by combining Eqs. (A.14)–(A.17) and (A.19)–(A.21) together.

We now prove (A.12). Note that

$$\begin{aligned}&|L(\mathbf{w})-R(\mathbf{w})|\nonumber \\&\quad =|\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^{2} - \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\mathbf{P}(\mathbf{w})\right\} -2\varvec{{\mu }}'\mathbf{A}^{}(\mathbf{w})\mathbf{P}(\mathbf{w})\mathbf{e}|, \end{aligned}$$
(A.22)

then, to show (A.12), it remains to verify that

$$\begin{aligned}&\sup _{\mathbf{w}\in \mathcal{W}}\frac{\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^{2}}{R(\mathbf{w})} = o_p(1), \end{aligned}$$
(A.23)
$$\begin{aligned}&\sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\mathbf{P}(\mathbf{w})\right\} \right| }{R(\mathbf{w})} = o(1),~a.s. \end{aligned}$$
(A.24)

and

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{|\varvec{{\mu }}'\mathbf{A}^{}(\mathbf{w})\mathbf{P}(\mathbf{w})\mathbf{e}|}{R(\mathbf{w})} = o_p(1). \end{aligned}$$
(A.25)

Since \(\mathbf{P}_m\) is positive semi-definite and \(\lambda _{\tiny \max }(\mathbf{P}_m)\le 1\) for any \(1\le m\le M_n\), for any \(\mathbf{w}\in \mathcal{W}\), \(\mathbf{P}(\mathbf{w})\) is positive semi-definite and

$$\begin{aligned} \lambda _{\tiny \max }\left\{ \mathbf{P}(\mathbf{w})\right\} \le 1. \end{aligned}$$
(A.26)

Then, by Conditions (C.3) and (C.4), (A.16) and (A.26), we can verify (A.23) by noting that

$$\begin{aligned} \sup _{\mathbf{w}\in \mathcal{W}}\frac{\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^{2}}{R(\mathbf{w})}\le & {} \lambda _{\tiny \max }\left\{ \mathbf{P}(\mathbf{w})\right\} \sup _{\mathbf{w}\in \mathcal{W}}R^{-1}(\mathbf{w})\left| \mathbf{e}'\mathbf{P}{(\mathbf{w})}\mathbf{e}\right| \\= & {} o_p(1), \end{aligned}$$

where we have used the fact that for any fixed \(\delta >0\)

$$\begin{aligned} P\left( \sup _{\mathbf{w}\in \mathcal{W}}\frac{\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^{2}}{R(\mathbf{w})}> \delta \right)\le & {} P\left( \sup _{\mathbf{w}\in \mathcal{W}}\frac{ \mathbf{e}'\mathbf{P}(\mathbf{w})\mathbf{e}}{R(\mathbf{w})} > \delta \right) \\\le & {} \sum \nolimits _{m = 1}^{M_n} \delta ^{-1} E_{\tilde{\mathbf{X}}}\left\{ \frac{ \text{ trace }\left( \mathbf{P}_m \varvec{{\varOmega }}\right) }{\xi _n }\right\} \\\le & {} \frac{\bar{\sigma } ^2}{ \delta } \sum \nolimits _{m = 1}^{M_n} {\sum \nolimits _{i = 1}^n E_{\tilde{\mathbf{X}}}(P_{ii}^m \xi _n^{-1})} \\\rightarrow & {} 0. \end{aligned}$$

In addition, it follows from Condition (C.3) and (A.26) that,

$$\begin{aligned}&\sup _{\mathbf{w}\in \mathcal{W}}\frac{\left| \text{ trace }\left\{ \mathbf{P}(\mathbf{w})\varvec{{\varOmega }}\mathbf{P}(\mathbf{w})\right\} \right| }{R(\mathbf{w})} \nonumber \\&\quad \le \xi _n^{ - 1} \sup _{\mathbf{w}\in \mathcal{W}}\sum \nolimits _{m = 1}^{M_n} {\sum \nolimits _{l = 1}^{M_n} {w_m w_l \text{ trace }\left( \mathbf{P}_m \varvec{{\varOmega }}\mathbf{P}_l \right) } }\nonumber \\&\quad \le \xi _n^{ - 1}\sup _{\mathbf{w}\in \mathcal{W}}\sum \nolimits _{m = 1}^{M_n} {\sum \nolimits _{l = 1}^{M_n} {w_m w_l \lambda _{\tiny \max }(\varvec{{\varOmega }})\lambda _{\max } (\mathbf{P}_l )\text{ trace }\left( \mathbf{P}_m \right) } } \nonumber \\&\quad \le \bar{\sigma }^2\xi _n^{ - 1} \max _{1\le m\le M_n}\max _{1\le l\le M_n} { \lambda _{\max } (\mathbf{P}_l )\text{ trace }\left( \mathbf{P}_m \right) } \nonumber \\&\quad = O(\xi _n^{-1}),~ a.s. \end{aligned}$$
(A.27)

Combining (A.16) and (A.27) leads to (A.24). In addition, recalling that \( \left\| {\mathbf{A}(\mathbf{w})\varvec{{\mu }}} \right\| ^2 \le R(\mathbf{w})\), we have

$$\begin{aligned}&\sup _{\mathbf{w}\in \mathcal{W}}\frac{|\varvec{{\mu }}'\mathbf{A}^{}(\mathbf{w})\mathbf{P}(\mathbf{w})\mathbf{e}|}{R(\mathbf{w})} \\&\quad \le \sup _{\mathbf{w}\in \mathcal{W}}\left\{ R^{-2}(\mathbf{w})\Vert \mathbf{A}^{}(\mathbf{w})\varvec{{\mu }}\Vert ^2\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^2\right\} ^{1/2}\\&\quad \le \sup _{\mathbf{w}\in \mathcal{W}}\left\{ R^{-1}(\mathbf{w})\Vert \mathbf{P}(\mathbf{w})\mathbf{e}\Vert ^2\right\} ^{1/2}, \end{aligned}$$

which, together with (A.23), leads to (A.25). This concludes the proof.


Cite this article

Zhao, S., Liao, J. & Yu, D. Model averaging estimator in ridge regression and its large sample properties. Stat Papers 61, 1719–1739 (2020). https://doi.org/10.1007/s00362-018-1002-4
