Appendix 1: Proof of Theorem 1
Let us first write the restricted two parameter estimator (4) as
$$\begin{aligned} \hat{\beta }_{R}(k,d)=\left[ I-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\right] \hat{\beta }(k,d)+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
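For illustration, the following minimal numerical sketch (Python with numpy; the data \(X\), \(y\), \(R\), \(r\) and the values of \(k\), \(d\) are illustrative assumptions, not taken from the paper) evaluates this estimator and confirms that it satisfies the restriction \(R\beta =r\) exactly:

```python
import numpy as np

# Illustrative data; the true estimator inputs come from the application at hand.
rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))      # full-row-rank restriction matrix
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X                          # S = X'X
Sk = S + k * np.eye(p)               # S_k = X'X + kI
beta_ols = np.linalg.solve(S, X.T @ y)          # ordinary least squares
beta_ridge = np.linalg.solve(Sk, X.T @ y)       # ridge estimator beta(k)
beta_kd = d * beta_ols + (1 - d) * beta_ridge   # two parameter estimator beta(k,d)

Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)    # S_k^{-1}R'(RS_k^{-1}R')^{-1}
beta_R = (np.eye(p) - G @ R) @ beta_kd + G @ r  # restricted two parameter estimator

print(R @ beta_R - r)   # ~0: the restriction R*beta_R = r holds up to rounding
```

Here \(\hat{\beta }(k,d)\) is computed in the convex-combination form \(d\hat{\beta }+(1-d)\hat{\beta }(k)\) used later in this proof.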
Let
$$\begin{aligned} \hat{\beta }_{R(i)}(k,d)=\left[ I-S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R\right] \hat{\beta } _{(i)}(k,d)+S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r\nonumber \\ \end{aligned}$$
(12)
be the vector of regression coefficients computed with the \(i\)th data point set aside. By the Sherman-Morrison-Woodbury theorem, one can verify that
$$\begin{aligned} S_{k(i)}^{-1}=(X_{(i)}^{\prime }X_{(i)}+kI)^{-1}=(S_{k}-x_{i}x_{i}^{\prime })^{-1}=S_{k}^{-1}+\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{ 1-h_{ii}(k)} \end{aligned}$$
(13)
where \(h_{ii}(k)=x_{i}^{\prime }S_{k}^{-1}x_{i}\). Equation (13) leads to
$$\begin{aligned} (RS_{k(i)}^{-1}R^{\prime })^{-1}&= \left[ RS_{k}^{-1}R^{\prime }+\frac{ RS_{k}^{-1}x_{i}}{\sqrt{1-h_{ii}(k)}}\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }}{\sqrt{1-h_{ii}(k)}}\right] ^{-1}=(RS_{k}^{-1}R^{ \prime })^{-1} \nonumber \\&-\,\frac{\frac{1}{1-h_{ii}(k)}(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \end{aligned}$$
(14)
where \(h_{ii}^{R}(k)=x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}\). Multiplying (13) and (14), we get
$$\begin{aligned}&S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R =S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&\qquad -\,\frac{\frac{1}{1-h_{ii}(k)}S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \nonumber \\&\qquad +\,\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1-h_{ii}(k)}-\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{1-h_{ii}(k)} \nonumber \\&\qquad \times \, \frac{\frac{1}{1-h_{ii}(k)}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}}. \end{aligned}$$
(15)
Premultiplying (15) by \(x_{i}^{\prime }\) and simplifying, the following result holds:
$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R&= \left[ \frac{1}{1-h_{ii}(k)}-\frac{1}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \frac{h_{ii}^{R}(k)}{\left( 1-h_{ii}(k)\right) ^{2}}\right] \nonumber \\&\times \, x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R. \end{aligned}$$
(16)
Similar to the result in (16), we have
$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(17)
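The updating formulas (13) and (16) can be checked numerically; (17) follows in the same way. A minimal sketch under the same illustrative assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))      # full-row-rank restriction matrix
k = 0.7
i = 5                                # index of the withheld observation
xi = X[i]

Sk = X.T @ X + k * np.eye(p)         # S_k
Ski = np.linalg.inv(Sk)
h = xi @ Ski @ xi                    # h_ii(k)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)   # S_k^{-1}R'(RS_k^{-1}R')^{-1}
hR = xi @ G @ R @ Ski @ xi           # h_ii^R(k)

X_i = np.delete(X, i, axis=0)
Ski_loo = np.linalg.inv(X_i.T @ X_i + k * np.eye(p))   # S_{k(i)}^{-1}

# (13): Sherman-Morrison-Woodbury update of S_k^{-1}
print(np.allclose(Ski_loo, Ski + np.outer(Ski @ xi, xi @ Ski) / (1 - h)))  # True

# (16): leverage rescaling of x_i'S_k^{-1}R'(RS_k^{-1}R')^{-1}R
G_loo = Ski_loo @ R.T @ np.linalg.inv(R @ Ski_loo @ R.T)
print(np.allclose(xi @ G_loo @ R, (xi @ G @ R) / (1 - h + hR)))            # True
```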
\(\hat{\beta }_{(i)}(k,d)\) can be derived by writing \(\hat{\beta }(k,d)\) as \( \hat{\beta }(k,d)=d\hat{\beta }+(1-d)\hat{\beta }(k)\). Then using \(\hat{\beta } _{(i)}=\hat{\beta }-\frac{1}{1-h_{ii}}S^{-1}x_{i}e_{i}\) (see Myers 1990; Montgomery et al. 2001) and \(\hat{\beta }_{(i)}(k)=\hat{\beta }(k)-\frac{1}{ 1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k}\) (see Walker and Birch 1988), \(\hat{\beta }_{(i)}(k,d)\) becomes
$$\begin{aligned} \hat{\beta }_{(i)}(k,d)=\hat{\beta }(k,d)-\frac{d}{1-h_{ii}}S^{-1}x_{i}e_{i}- \frac{(1-d)}{1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k} \end{aligned}$$
and the two parameter residual with the \(i\)th observation withheld becomes
$$\begin{aligned} e_{(i)}(k,d)=y_{i}-x_{i}^{\prime }\hat{\beta }_{(i)}(k,d)=\frac{de_{i}}{ 1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)}. \end{aligned}$$
(18)
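A numerical sketch of (18), comparing a from-scratch leave-one-out fit with the shortcut formula (illustrative data as before):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
k, d = 0.7, 0.4
i = 5
xi, yi = X[i], y[i]

S = X.T @ X
Sk = S + k * np.eye(p)
b_ols = np.linalg.solve(S, X.T @ y)
b_rid = np.linalg.solve(Sk, X.T @ y)
e_i = yi - xi @ b_ols                    # OLS residual e_i
e_ik = yi - xi @ b_rid                   # ridge residual e_{i,k}
h = xi @ np.linalg.solve(S, xi)          # h_ii
hk = xi @ np.linalg.solve(Sk, xi)        # h_ii(k)

# direct leave-one-out two parameter fit
X_i, y_i = np.delete(X, i, axis=0), np.delete(y, i)
S_i = X_i.T @ X_i
b_ols_i = np.linalg.solve(S_i, X_i.T @ y_i)
b_rid_i = np.linalg.solve(S_i + k * np.eye(p), X_i.T @ y_i)
b_kd_i = d * b_ols_i + (1 - d) * b_rid_i

lhs = yi - xi @ b_kd_i                   # e_(i)(k,d) computed from scratch
rhs = d * e_i / (1 - h) + (1 - d) * e_ik / (1 - hk)   # shortcut (18)
print(np.isclose(lhs, rhs))              # True
```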
Noting that the two parameter residual is
$$\begin{aligned} e(k,d)=y-X\hat{\beta }(k,d)=de+(1-d)e_{k} \end{aligned}$$
and the restricted two parameter residual is
$$\begin{aligned} e_{R}(k,d)=y-X\hat{\beta }_{R}(k,d)=e(k,d)+XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r), \end{aligned}$$
and by using Eqs. (12), (16), (17) and (18), the restricted two parameter residual with the \(i\)th observation withheld becomes
$$\begin{aligned} e_{R(i)}(k,d)&= y_{i}-x_{i}^{\prime }\hat{\beta }_{R(i)}(k,d)=\frac{de_{i}}{1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}\Bigg [x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\hat{\beta }(k,d) \\&-\,\frac{d}{1-h_{ii}}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}e_{i}-\frac{(1-d)}{1-h_{ii}(k)} h_{ii}^{R}(k)e_{i,k}\Bigg ] \\&-\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}}+\left[ 1- \frac{h_{ii}^{R}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \\&\times \, \frac{(1-d)e_{i,k}}{1-h_{ii}(k)}+\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)} x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r) \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}-\frac{1-h_{ii}}{ 1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}} \\&+\,\frac{1-h_{ii}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}+\frac{ 1-h_{ii}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r). \end{aligned}$$
This completes the proof.
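As a sanity check on the closed form just derived, the sketch below recomputes \(e_{R(i)}(k,d)\) directly from Eq. (12) on the reduced data and compares it with the final expression (illustrative data as before):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4
i = 5
xi, yi = X[i], y[i]

def restricted_kd(Xa, ya):
    """Restricted two parameter estimator (4) and beta(k,d) on data (Xa, ya)."""
    Sa = Xa.T @ Xa
    Skai = np.linalg.inv(Sa + k * np.eye(p))
    b_kd = d * np.linalg.solve(Sa, Xa.T @ ya) + (1 - d) * Skai @ (Xa.T @ ya)
    Ga = Skai @ R.T @ np.linalg.inv(R @ Skai @ R.T)
    return b_kd - Ga @ (R @ b_kd - r), b_kd

# direct leave-one-out restricted fit, eq. (12)
b_R_loo, _ = restricted_kd(np.delete(X, i, axis=0), np.delete(y, i))
direct = yi - xi @ b_R_loo

# closed form from the proof
S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
_, b_kd = restricted_kd(X, y)
e_i = yi - xi @ np.linalg.solve(S, X.T @ y)
e_ik = yi - xi @ Ski @ (X.T @ y)
h = xi @ np.linalg.solve(S, xi)
hk = xi @ Ski @ xi
hR = xi @ G @ R @ Ski @ xi
A = xi @ G @ R @ np.linalg.solve(S, xi)
w = 1 - hk + hR
closed = ((1 - A / w) * d * e_i / (1 - h)
          + (1 - hR / w) * (1 - d) * e_ik / (1 - hk)
          + xi @ G @ (R @ b_kd - r) / w)
print(np.isclose(direct, closed))        # True
```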
Appendix 2: Proof of Theorem 2
The total mean square error of estimating the expected values of the response, \(E(y_{i})\), from the fitted value, \(\tilde{y}_{i}\), is
$$\begin{aligned} \Gamma =\frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}MSE(\tilde{y}_{i})=\frac{1}{\sigma ^{2}}\left\{ \sum \limits _{i=1}^{n}Var(\tilde{y}_{i})+\sum \limits _{i=1}^{n}\left[ Bias(\tilde{y}_{i})\right] ^{2}\right\} \end{aligned}$$
(19)
where \(\tilde{y}_{i}\) is the fitted value obtained by any estimator. Mallows' \(C_{p}\) statistic (Mallows 1973) estimates \(\Gamma \) as
$$\begin{aligned} C_{p}=\frac{SS_{\mathrm{Re}\,s}}{\hat{\sigma }^{2}}-n+2p \end{aligned}$$
where \(SS_{\mathrm{Re}\,s}=\sum e_{i}^{2}\) is the least squares residual sum of squares.
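For reference, a minimal sketch of Mallows' \(C_{p}\) for the full least squares model (illustrative data; \(\hat{\sigma }^{2}\) is the full-model estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)   # illustrative response

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_ols
ss_res = e @ e                           # SS_Res = sum of squared residuals
sigma2_hat = ss_res / (n - p)            # full-model estimate of sigma^2
cp = ss_res / sigma2_hat - n + 2 * p
print(cp)                                # equals p here, since the full model is fitted
```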
The \(\hat{\beta }_{R}(k,d)\) estimator is rewritten in the form,
$$\begin{aligned} \hat{\beta }_{R}(k,d)=M_{k}S_{kd}S^{-1}X^{\prime }y+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$
where \(S_{kd}=X^{\prime }X+kdI\).
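This rewriting can be checked numerically. The sketch below assumes \(M_{k}=S_{k}^{-1}-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}\), the form of \(M_{k}\) consistent with (4) and with the expectation computed later in this appendix (\(M_{k}\) itself is defined in the main text); the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)              # S_kd = X'X + kdI
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k (see lead-in)

b_kd = d * np.linalg.solve(S, X.T @ y) + (1 - d) * Ski @ (X.T @ y)
b_R1 = (np.eye(p) - G @ R) @ b_kd + G @ r                 # form (4), Appendix 1
b_R2 = Mk @ Skd @ np.linalg.solve(S, X.T @ y) + G @ r     # rewritten form
print(np.allclose(b_R1, b_R2))           # True
```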
Since the fitted value of \(y_{i}\) under the restricted two parameter estimator
$$\begin{aligned} \hat{y}_{i}(k,d)=x_{i}^{\prime }\hat{\beta }_{R}(k,d)=x_{i}^{\prime }M_{k}S_{kd}S^{-1}X^{\prime }y+x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$
is a linear combination of the random vector \(y\), we obtain the total variance and the squared bias of \(\hat{y}_{i}(k,d)\), respectively, as
$$\begin{aligned} \frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))=\sum \limits _{i=1}^{n}x_{i}^{\prime }M_{k}S_{kd}S^{-1}S_{kd}M_{k}x_{i}=tr(XM_{k}S_{kd}S^{-1}S_{kd}M_{k}X^{\prime }) \end{aligned}$$
and
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ E(y_{i})\!-\!E(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ x_{i}^{\prime }\beta \!-\!x_{i}^{\prime }E(\hat{\beta }_{R}(k,d))\right] ^{2}.\nonumber \\ \end{aligned}$$
(20)
By using
$$\begin{aligned} E(\hat{\beta }_{R}(k,d))&= S_{k}^{-1}S_{kd}\beta -S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(RS_{k}^{-1}S_{kd}\beta -r) \\&= M_{k}S_{kd}\beta +S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r, \end{aligned}$$
Equation (20) becomes
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] ^{\prime } \\&\times \, \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \\&= (X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&-\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&-\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
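Because \(\hat{\beta }_{R}(k,d)\) is affine in \(y\), the expectation used above can be checked by evaluating the estimator at \(E(y)=X\beta \); a sketch under the same illustrative assumptions (including the assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)            # illustrative true coefficients
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

Ey = X @ beta                            # E(y) under the linear model
lhs = Mk @ Skd @ np.linalg.solve(S, X.T @ Ey) + G @ r   # estimator at E(y)
rhs = Mk @ Skd @ beta + G @ r                           # stated expectation
print(np.allclose(lhs, rhs))             # True
```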
The restricted two parameter residual sum of squares can be written as
$$\begin{aligned} SS_{\mathrm{Re}\,s}^{R}(k,d)&= (y-X\hat{\beta }_{R}(k,d))^{\prime }(y-X\hat{\beta }_{R}(k,d)) \\&= y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y \\&-\,2y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
Then the expected value of \(SS_{\mathrm{Re}\,s}^{R}(k,d)\) is
$$\begin{aligned} E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right)&= E\left[ y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y\right] \nonumber \\&-\,2(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \nonumber \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(21)
The expected value of the quadratic form equals
$$\begin{aligned}&E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&\quad =\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&\quad +\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta . \end{aligned}$$
(22)
In view of (21) and (22), we obtain
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= E\left( SS_{\mathrm{Re}\, s}^{R}(k,d)\right) \\&-\,\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] . \end{aligned}$$
By expanding the term \(tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \), we get
$$\begin{aligned}&tr\left[ I-XM_{k}S_{kd}S^{-1}X^{\prime }-XS^{-1}S_{kd}M_{k}X^{\prime }+XS^{-1}S_{kd}M_{k}X^{\prime }XM_{k}S_{kd}S^{-1}X^{\prime }\right] \\&\quad =n-tr(M_{k}S_{kd})-tr(S_{kd}M_{k})+tr(XM_{k}S_{kd}S^{-1}X^{\prime }XS^{-1}S_{kd}M_{k}X^{\prime }). \end{aligned}$$
Hence,
$$\begin{aligned} \sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))\!+\!\sum \limits _{i=1}^{n}\left[ Bias(\hat{y} _{i}(k,d))\right] ^{2}\!=\!E(SS_{\mathrm{Re}\,s}^{R}(k,d))-n\sigma ^{2}+2\sigma ^{2}tr(M_{k}S_{kd}). \end{aligned}$$
(23)
In view of Eq. (19), we divide Eq. (23) by \(\sigma ^{2}\). Then, replacing \(\sigma ^{2}\) by \(\hat{\sigma }_{RLS}^{2}\) and using the properties of the trace operator, we obtain the estimator of the expression in Eq. (23), which proves the theorem.
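The trace expansion above is deterministic and can be checked directly; a sketch under the same illustrative assumptions (same assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

B = X @ Mk @ Skd @ np.linalg.solve(S, X.T)     # X M_k S_kd S^{-1} X'
lhs = np.trace((np.eye(n) - B).T @ (np.eye(n) - B))
rhs = n - 2 * np.trace(Mk @ Skd) + np.trace(B @ B.T)
print(np.isclose(lhs, rhs))              # True
```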
Appendix 3: Proof of Theorem 4
The derivative of \({\textit{PRESS}}^{R}(k,d)\) with respect to \(d\) for fixed \(k\) can be expressed as
$$\begin{aligned}&\frac{\partial {\textit{PRESS}}^{R}(k,d)}{\partial d}\nonumber \\&\quad =2\sum \left[ \frac{e_{Ri}(k,d)}{1-h_{ii}(k)+h_{ii}^{R}(k)} +\frac{h_{ii}-h_{ii}(k)+h_{ii}^{R}(k)-x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}\right] \frac{\partial e_{R(i)}(k,d)}{\partial d}.\nonumber \\ \end{aligned}$$
(24)
Since the bracketed term in Eq. (24) equals \(e_{R(i)}(k,d)\) and \({\textit{PRESS}}^{R}(k,d)=\sum e_{R(i)}^{2}(k,d)\), equating Eq. (24) to zero and solving for \(d\) completes the proof.
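The differentiation step can be checked numerically: the sketch below compares the analytic \(d\)-derivative of \({\textit{PRESS}}^{R}(k,d)\), built from the closed form of Appendix 1, against a central finite difference (illustrative data; since \({\textit{PRESS}}^{R}\) is quadratic in \(d\), the central difference is exact up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d, eps = 0.7, 0.4, 1e-6

S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
b_ols = np.linalg.solve(S, X.T @ y)
b_rid = Ski @ (X.T @ y)

def pieces(i):
    """Per-observation quantities from Appendix 1."""
    xi = X[i]
    e_i, e_ik = y[i] - xi @ b_ols, y[i] - xi @ b_rid
    h, hk = xi @ np.linalg.solve(S, xi), xi @ Ski @ xi
    hR = xi @ G @ R @ Ski @ xi
    A = xi @ G @ R @ np.linalg.solve(S, xi)
    return xi, e_i, e_ik, h, hk, hR, A, 1 - hk + hR

def e_R_loo(dd):
    """Closed-form leave-one-out restricted residuals e_{R(i)}(k, dd)."""
    b_kd = dd * b_ols + (1 - dd) * b_rid
    out = np.empty(n)
    for i in range(n):
        xi, e_i, e_ik, h, hk, hR, A, w = pieces(i)
        out[i] = ((1 - A / w) * dd * e_i / (1 - h)
                  + (1 - hR / w) * (1 - dd) * e_ik / (1 - hk)
                  + xi @ G @ (R @ b_kd - r) / w)
    return out

def de_R_loo():
    """Analytic d-derivative of e_{R(i)}(k,d); it is free of d."""
    out = np.empty(n)
    for i in range(n):
        xi, e_i, e_ik, h, hk, hR, A, w = pieces(i)
        out[i] = ((1 - A / w) * e_i / (1 - h)
                  - (1 - hR / w) * e_ik / (1 - hk)
                  + xi @ G @ R @ (b_ols - b_rid) / w)
    return out

analytic = 2 * np.sum(e_R_loo(d) * de_R_loo())
fd = (np.sum(e_R_loo(d + eps) ** 2) - np.sum(e_R_loo(d - eps) ** 2)) / (2 * eps)
print(np.isclose(analytic, fd, rtol=1e-6))   # True
```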
Appendix 4: Proof of Theorem 5
By Eqs. (21) and (22), the derivative of \(E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) \) with respect to \(d\) for fixed \(k\) is
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= \frac{\partial }{\partial d}E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \frac{\partial }{\partial d}\Big \{\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&+\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \Big \} \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \sigma ^{2}tr\left\{ \frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \right\} \nonumber \\&+\,\frac{\partial }{\partial d}\left[ \beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \right] \nonumber \\&+\,2k\beta ^{\prime }SS^{-1}M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(25)
By substituting the result
$$\begin{aligned}&\frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] =-kXM_{k}S^{-1}X^{\prime }-kXS^{-1}M_{k}X^{\prime } \\&\quad +\,kXS^{-1}M_{k}SM_{k}X^{\prime }+kXM_{k}SM_{k}S^{-1}X^{\prime }+2k^{2}dXS^{-1}M_{k}SM_{k}S^{-1}X^{\prime } \end{aligned}$$
into Eq. (25) and using the trace properties, we have
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= -2\sigma ^{2}ktr(M_{k})+2\sigma ^{2}ktr(M_{k}SM_{k})+2\sigma ^{2}k^{2}dtr(S^{-1}M_{k}SM_{k}) \\&-\,k\beta ^{\prime }SM_{k}\beta -k\beta ^{\prime }M_{k}S\beta +k\beta ^{\prime }M_{k}SM_{k}S\beta +k\beta ^{\prime }SM_{k}SM_{k}\beta \\&+\,2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
By using the equality \(M_{k}SM_{k}=M_{k}-kM_{k}^{2}\), \(\frac{\partial E\left( SS_{ \mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}\) reduces to
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= -2\sigma ^{2}k^{2}tr(M_{k}^{2})+2\sigma ^{2}k^{2}d\,tr(S^{-1}M_{k}SM_{k})-k^{2}\beta ^{\prime }M_{k}^{2}S\beta -k^{2}\beta ^{\prime }SM_{k}^{2}\beta \nonumber \\&\quad +\,2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(26)
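The identity \(M_{k}SM_{k}=M_{k}-kM_{k}^{2}\) invoked above can be checked numerically (illustrative data, same assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))
k = 0.7

S = X.T @ X
Ski = np.linalg.inv(S + k * np.eye(p))
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

print(np.allclose(Mk @ S @ Mk, Mk - k * Mk @ Mk))   # True
```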
Let \(\Gamma ^{R}(k,d)=\frac{E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\sigma ^{2}} -n+2tr(XM_{k}S_{kd}S^{-1}X^{\prime })\). Then, by Eq. (26) and \(\frac{\partial tr(XM_{k}S_{kd}S^{-1}X^{\prime })}{\partial d}=k\,tr(M_{k})\), the derivative of \(\Gamma ^{R}(k,d)\) with respect to \(d\) for fixed \(k\) is
$$\begin{aligned} \frac{\partial \Gamma ^{R}(k,d)}{\partial d}&= d\left[ 2k^{2}tr(S^{-1}M_{k}SM_{k})+2\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}SM_{k}\beta \right] \\&-\,2k^{2}tr(M_{k}^{2})-\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}^{2}S\beta -\sigma ^{-2}k^{2}\beta ^{\prime }SM_{k}^{2}\beta \\&+\,2\sigma ^{-2}k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r+2ktr(M_{k}). \end{aligned}$$
Using the equality \(k^{2}tr(M_{k}^{2})-ktr(M_{k})=-ktr(M_{k}SM_{k})\) and equating \(\frac{\partial \Gamma ^{R}(k,d)}{\partial d}\) to zero, the \(d\) value that minimizes \(\Gamma ^{R}(k,d)\) is obtained as
$$\begin{aligned} d=\frac{k\beta ^{\prime }M_{k}^{2}S\beta +k\beta ^{\prime }SM_{k}^{2}\beta -2\sigma ^{2}tr(M_{k}SM_{k})-2\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r}{2\sigma ^{2}ktr(S^{-1}M_{k}SM_{k})+2k\beta ^{\prime }M_{k}SM_{k}\beta }. \end{aligned}$$
Since \(C_{p}^{R}(k,d)\) is an estimate of \(\Gamma ^{R}(k,d)\), replacing \(\sigma ^{2}\) with \(\hat{\sigma }_{RLS}^{2}\) and \(\beta \) with \(\hat{\beta }_{RLS}\) proves the theorem.
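As a final check, \(\Gamma ^{R}(k,d)\) can be evaluated deterministically from (21), (22) and the definition above, and the displayed \(d\) verified to be a stationary point; the sketch below does so for an illustrative \(\beta \), \(\sigma ^{2}\), and design (same assumed \(M_{k}\); since \(\Gamma ^{R}\) is quadratic in \(d\), the central difference is exact up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)            # illustrative true coefficients
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, sigma2 = 0.7, 1.0

S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above
Si = np.linalg.inv(S)

def gamma(d):
    """Gamma^R(k,d) computed from (21), (22) and the definition above."""
    B = X @ Mk @ (S + k * d * np.eye(p)) @ Si @ X.T
    IB = np.eye(n) - B
    ess = sigma2 * np.trace(IB.T @ IB) + np.sum((IB @ X @ beta - X @ G @ r) ** 2)
    return ess / sigma2 - n + 2 * np.trace(Mk @ (S + k * d * np.eye(p)))

# the displayed minimizer d
num = (k * beta @ Mk @ Mk @ S @ beta + k * beta @ S @ Mk @ Mk @ beta
       - 2 * sigma2 * np.trace(Mk @ S @ Mk) - 2 * beta @ Mk @ S @ G @ r)
den = 2 * sigma2 * k * np.trace(Si @ Mk @ S @ Mk) + 2 * k * beta @ Mk @ S @ Mk @ beta
d_star = num / den

eps = 1e-5
slope = (gamma(d_star + eps) - gamma(d_star - eps)) / (2 * eps)
print(np.isclose(slope, 0.0, atol=1e-6))   # True: d_star is a stationary point
```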