
Predictive performance of linear regression models


Abstract

In this paper, the cross-validation methods, namely \(C_{p}\), PRESS and GCV, are presented under the multiple linear regression model when multicollinearity exists and additional information imposes restrictions among the parameters that hold exactly. The selection of the biasing parameters is made so as to minimize these cross-validation criteria. An example illustrates the comprehensive predictive assessment of the various estimators and shows the usefulness of the computations. In addition, the performance of the estimators under several different conditions is examined via a simulation study. The results show that the versions of the cross-validation methods based on the biased estimators and on their restricted forms give better predictive performance in the presence of multicollinearity.


[Figs. 1–20 omitted]


References

  • Allen DM (1971) Mean square error of prediction as a criterion for selecting variables. Technometrics 13:469–475

  • Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16(1):125–127

  • Groß J (2003a) Restricted ridge estimation. Stat Probab Lett 65:57–64

  • Groß J (2003b) Linear regression. Springer, Berlin

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67

  • Hoerl AE, Kennard RW (1976) Ridge regression: iterative estimation of the biasing parameter. Commun Stat 5(1):77–88

  • Kaçıranlar S, Sakallıoğlu S, Özkale MR, Güler H (2011) More on the restricted ridge regression estimation. J Stat Comput Simul 81(11):1433–1448

  • Liu K (1993) A new class of biased estimate in linear regression. Commun Stat 22(2):393–402

  • Liu XQ, Jiang HY (2012) Optimal generalized ridge estimator under the generalized cross-validation criterion in linear regression. Linear Algebra Appl 436:1238–1245

  • Liu XQ, Li B (2012) General linear estimators under the prediction error sum of squares criterion in a linear regression model. J Appl Stat 39(6):1353–1361

  • Liu H, Weiss RE, Jennrich RI, Wenger NS (1999) PRESS model selection in repeated measures data. Comput Stat Data Anal 30:169–184

  • Mallows CL (1973) Some comments on \(C_{p}\). Technometrics 15:661–675

  • Mayer LS, Willke TA (1973) On biased estimation in linear models. Technometrics 15:497–508

  • Montgomery DC, Askin RG (1981) Problems for nonnormality and multicollinearity for forecasting methods based on least squares. AIIE Trans 13:102–115

  • Montgomery DC, Friedman DJ (1993) Prediction using regression models with multicollinear predictor variables. IIE Trans 25(3):73–85

  • Montgomery DC, Peck EA, Vining GG (2001) Introduction to linear regression analysis. Wiley, New York

  • Myers RH (1990) Classical and modern regression with applications. Duxbury Press, California

  • Özkale MR, Kaçıranlar S (2007a) A prediction-oriented criterion for choosing the biasing parameter in Liu estimation. Commun Stat 36(10):1889–1903

  • Özkale MR, Kaçıranlar S (2007b) The restricted and unrestricted two parameter estimators. Commun Stat 36(15):2707–2725

  • Snee RD (1977) Validation of regression models: methods and examples. Technometrics 19:415–428

  • Tarpey T (2000) A note on the prediction sum of squares statistic for restricted least squares. Am Stat 54(2):116–118

  • Wahba G, Golub GH, Heath M (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2):215–223

  • Walker E, Birch JB (1988) Influence measures in ridge regression. Technometrics 30(2):221–227


Author information


Correspondence to M. Revan Özkale.

Appendices

Appendix 1: Proof of Theorem 1

Let us first write the restricted two parameter estimator (4) as

$$\begin{aligned} \hat{\beta }_{R}(k,d)=\left[ I-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\right] \hat{\beta }(k,d)+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
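For concreteness, the sketch below evaluates this formula on simulated data; the design matrix, the restriction \(R\beta =r\), and the values of \(k\) and \(d\) are all hypothetical, and \(\hat{\beta }(k,d)\) is formed as \(d\hat{\beta }+(1-d)\hat{\beta }(k)\), the representation used later in this appendix. The printed quantity confirms that the restriction is satisfied exactly by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)      # near-collinear columns
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

k, d = 0.5, 0.3                                    # hypothetical biasing parameters
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])   # hypothetical restriction

S, Sk = X.T @ X, X.T @ X + k * np.eye(p)
b_ols   = np.linalg.solve(S,  X.T @ y)             # least squares estimator
b_ridge = np.linalg.solve(Sk, X.T @ y)             # ridge estimator
b_kd    = d * b_ols + (1 - d) * b_ridge            # two parameter estimator

SkiRT = np.linalg.solve(Sk, R.T)                   # S_k^{-1} R'
G     = np.linalg.inv(R @ SkiRT)                   # (R S_k^{-1} R')^{-1}
b_R_kd = (np.eye(p) - SkiRT @ G @ R) @ b_kd + SkiRT @ G @ r

print(R @ b_R_kd - r)                              # ~0: restriction holds exactly
```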

Let

$$\begin{aligned} \hat{\beta }_{R(i)}(k,d)=\left[ I-S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R\right] \hat{\beta } _{(i)}(k,d)+S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r\nonumber \\ \end{aligned}$$
(12)

be the vector of regression coefficients computed with the \(i\)th data point set aside. By the Sherman-Morrison-Woodbury theorem, we have

$$\begin{aligned} S_{k(i)}^{-1}=(X_{(i)}^{\prime }X_{(i)}+kI)^{-1}=(S_{k}-x_{i}x_{i}^{\prime })^{-1}=S_{k}^{-1}+\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{ 1-h_{ii}(k)} \end{aligned}$$
(13)

where \(h_{ii}(k)=x_{i}^{\prime }S_{k}^{-1}x_{i}\). Equation (13) leads to

$$\begin{aligned} (RS_{k(i)}^{-1}R^{\prime })^{-1}&= \left[ RS_{k}^{-1}R^{\prime }+\frac{ RS_{k}^{-1}x_{i}}{\sqrt{1-h_{ii}(k)}}\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }}{\sqrt{1-h_{ii}(k)}}\right] ^{-1}=(RS_{k}^{-1}R^{ \prime })^{-1} \nonumber \\&-\,\frac{\frac{1}{1-h_{ii}(k)}(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \end{aligned}$$
(14)

where \(h_{ii}^{R}(k)=x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}\). Multiplying (13) and (14), we get

$$\begin{aligned}&S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R =S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&\qquad -\,\frac{\frac{1}{1-h_{ii}(k)}S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \nonumber \\&\qquad +\,\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1-h_{ii}(k)}-\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{1-h_{ii}(k)} \nonumber \\&\qquad \times \, \frac{\frac{1}{1-h_{ii}(k)}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}}. \end{aligned}$$
(15)

Premultiplying (15) by \(x_{i}^{\prime }\) and simplifying, the following result holds:

$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R&= \left[ \frac{1}{1-h_{ii}(k)}-\frac{1}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \frac{h_{ii}^{R}(k)}{\left( 1-h_{ii}(k)\right) ^{2}}\right] \nonumber \\&\times \, x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R. \end{aligned}$$
(16)

Similar to the result in (16), we have

$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(17)
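These identities are easy to confirm numerically. The sketch below (simulated data, hypothetical \(k\) and restriction, our variable names) checks Eqs. (13), (16) and (17) against direct inversion of the downdated matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 30, 4, 0.5
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])

i, xi = 0, X[0]
Xi = np.delete(X, i, axis=0)                         # data with row i removed
Sk_inv  = np.linalg.inv(X.T @ X + k * np.eye(p))
Ski_inv = np.linalg.inv(Xi.T @ Xi + k * np.eye(p))   # direct inversion of S_{k(i)}

h_k  = xi @ Sk_inv @ xi                              # h_ii(k)
G    = np.linalg.inv(R @ Sk_inv @ R.T)
h_Rk = xi @ Sk_inv @ R.T @ G @ R @ Sk_inv @ xi       # h_ii^R(k)
gamma = 1 - h_k + h_Rk

# Eq. (13): rank-one downdate via Sherman-Morrison-Woodbury
print(np.allclose(Ski_inv, Sk_inv + np.outer(Sk_inv @ xi, Sk_inv @ xi) / (1 - h_k)))

# Eqs. (16) and (17): both sides agree up to the factor 1/gamma
Gi = np.linalg.inv(R @ Ski_inv @ R.T)
print(np.allclose(xi @ Ski_inv @ R.T @ Gi @ R, (xi @ Sk_inv @ R.T @ G @ R) / gamma))
print(np.allclose(xi @ Ski_inv @ R.T @ Gi @ r, (xi @ Sk_inv @ R.T @ G @ r) / gamma))
```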

\(\hat{\beta }_{(i)}(k,d)\) can be derived by writing \(\hat{\beta }(k,d)\) as \( \hat{\beta }(k,d)=d\hat{\beta }+(1-d)\hat{\beta }(k)\). Then using \(\hat{\beta } _{(i)}=\hat{\beta }-\frac{1}{1-h_{ii}}S^{-1}x_{i}e_{i}\) (see Myers 1990; Montgomery et al. 2001) and \(\hat{\beta }_{(i)}(k)=\hat{\beta }(k)-\frac{1}{ 1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k}\) (see Walker and Birch 1988), \(\hat{\beta }_{(i)}(k,d)\) becomes

$$\begin{aligned} \hat{\beta }_{(i)}(k,d)=\hat{\beta }(k,d)-\frac{d}{1-h_{ii}}S^{-1}x_{i}e_{i}- \frac{(1-d)}{1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k} \end{aligned}$$

and the two parameter residual with the \(i\)th observation withheld becomes

$$\begin{aligned} e_{(i)}(k,d)=y_{i}-x_{i}^{\prime }\hat{\beta }_{(i)}(k,d)=\frac{de_{i}}{ 1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)}. \end{aligned}$$
(18)
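Equation (18) says that the leave-one-out two parameter residual needs no refitting. A small numerical check (simulated data, hypothetical \(k\) and \(d\), our variable names) compares the shortcut with an explicit refit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, d = 30, 4, 0.5, 0.3
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

S, Sk = X.T @ X, X.T @ X + k * np.eye(p)
b_ols, b_ridge = np.linalg.solve(S, X.T @ y), np.linalg.solve(Sk, X.T @ y)

i, xi = 0, X[0]
Xi, yi = np.delete(X, i, 0), np.delete(y, i)

# explicit refit of d*OLS + (1-d)*ridge without observation i
b_i   = np.linalg.solve(Xi.T @ Xi,                 Xi.T @ yi)
b_i_k = np.linalg.solve(Xi.T @ Xi + k * np.eye(p), Xi.T @ yi)
e_refit = y[i] - xi @ (d * b_i + (1 - d) * b_i_k)

# shortcut (18), computed from the full-data fits only
e_i, e_ik = y[i] - xi @ b_ols, y[i] - xi @ b_ridge
h_ii, h_k = xi @ np.linalg.solve(S, xi), xi @ np.linalg.solve(Sk, xi)
print(np.isclose(e_refit, d * e_i / (1 - h_ii) + (1 - d) * e_ik / (1 - h_k)))  # True
```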

Noting that the two parameter residual is

$$\begin{aligned} e(k,d)=y-X\hat{\beta }(k,d)=de+(1-d)e_{k} \end{aligned}$$

and the restricted two parameter residual is

$$\begin{aligned} e_{R}(k,d)=y-X\hat{\beta }_{R}(k,d)=e(k,d)+XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r), \end{aligned}$$

and by using Eqs. (12), (16), (17) and (18), the restricted two parameter residual with the \(i\)th observation withheld becomes

$$\begin{aligned} e_{R(i)}(k,d)&= y_{i}-x_{i}^{\prime }\hat{\beta }_{R(i)}(k,d)=\frac{de_{i}}{1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}\Bigg [x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\hat{\beta }(k,d) \\&-\,\frac{d}{1-h_{ii}}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}e_{i}-\frac{(1-d)}{1-h_{ii}(k)} h_{ii}^{R}(k)e_{i,k}\Bigg ] \\&-\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}}+\left[ 1- \frac{h_{ii}^{R}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \\&\times \, \frac{(1-d)e_{i,k}}{1-h_{ii}(k)}+\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)} x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r) \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}-\frac{1-h_{ii}}{ 1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}} \\&+\,\frac{1-h_{ii}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}+\frac{ 1-h_{ii}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r). \end{aligned}$$

This completes the proof.
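The closed form of \(e_{R(i)}(k,d)\) can be checked against an explicit refit based on Eq. (12). The sketch below (simulated data, hypothetical \(k\), \(d\) and restriction, our variable names) does this for a single observation, using the middle grouping of the display above; summing the squared values over \(i\) then gives \({\textit{PRESS}}^{R}(k,d)\) without \(n\) refits.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, d = 30, 4, 0.5, 0.3
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])

def beta_R(Xm, ym):
    """Restricted two parameter estimator, as written at the start of Appendix 1."""
    S, Sk = Xm.T @ Xm, Xm.T @ Xm + k * np.eye(p)
    b_kd = d * np.linalg.solve(S, Xm.T @ ym) + (1 - d) * np.linalg.solve(Sk, Xm.T @ ym)
    SkiRT = np.linalg.solve(Sk, R.T)
    return b_kd - SkiRT @ np.linalg.solve(R @ SkiRT, R @ b_kd - r)

S, Sk = X.T @ X, X.T @ X + k * np.eye(p)
S_inv, Sk_inv = np.linalg.inv(S), np.linalg.inv(Sk)
G = np.linalg.inv(R @ Sk_inv @ R.T)
W = Sk_inv @ R.T @ G                                  # S_k^{-1} R'(R S_k^{-1} R')^{-1}
b_ols, b_ridge = S_inv @ X.T @ y, Sk_inv @ X.T @ y
b_kd = d * b_ols + (1 - d) * b_ridge

i, xi = 0, X[0]
# explicit refit with observation i withheld, Eq. (12)
e_refit = y[i] - xi @ beta_R(np.delete(X, i, 0), np.delete(y, i))

# closed form of e_{R(i)}(k,d) from Theorem 1 (middle grouping of the display above)
e_i, e_ik = y[i] - xi @ b_ols, y[i] - xi @ b_ridge
h_ii, h_k = xi @ S_inv @ xi, xi @ Sk_inv @ xi
h_Rk  = xi @ W @ R @ Sk_inv @ xi                      # h_ii^R(k)
a_i   = xi @ W @ R @ S_inv @ xi
gamma = 1 - h_k + h_Rk
e_closed = ((1 - a_i / gamma) * d * e_i / (1 - h_ii)
            + (1 - d) * e_ik / gamma
            + xi @ W @ (R @ b_kd - r) / gamma)
print(np.isclose(e_refit, e_closed))                  # True
```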

Appendix 2: Proof of Theorem 2

The total mean square error of estimating the expected values of the response, \(E(y_{i})\), from the fitted value, \(\tilde{y}_{i}\), is

$$\begin{aligned} \Gamma =\frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}MSE(\tilde{y}_{i})=\frac{1}{\sigma ^{2}}\left\{ \sum \limits _{i=1}^{n}Var(\tilde{y}_{i})+\sum \limits _{i=1}^{n}\left[ Bias(\tilde{y}_{i})\right] ^{2}\right\} \end{aligned}$$
(19)

where \(\tilde{y}_{i}\) is the fitted value obtained by any estimator. Mallows’ \(C_{p}\) statistic (Mallows 1973) estimates \(\Gamma \) as

$$\begin{aligned} C_{p}=\frac{SS_{\mathrm{Re}\,s}}{\hat{\sigma }^{2}}-n+2p \end{aligned}$$

where \(SS_{\mathrm{Re}\,s}=\sum e_{i}^{2}\) is the least squares residual sum of squares.
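As a brief numerical illustration of this formula (simulated data, our variable names), the snippet below computes \(C_{p}\) for a sequence of nested least squares models, estimating \(\sigma ^{2}\) from the largest model; values of \(C_{p}\) close to the number of regressors suggest little bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X_full = rng.normal(size=(n, 5))
y = X_full[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.7, size=n)

def rss(Xm, ym):
    """Least squares residual sum of squares for the given design."""
    b = np.linalg.lstsq(Xm, ym, rcond=None)[0]
    e = ym - Xm @ b
    return e @ e

sigma2_hat = rss(X_full, y) / (n - X_full.shape[1])    # sigma^2 from the largest model

for p_sub in range(1, 6):                              # nested candidate models
    Cp = rss(X_full[:, :p_sub], y) / sigma2_hat - n + 2 * p_sub
    print(p_sub, round(Cp, 2))
```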

The \(\hat{\beta }_{R}(k,d)\) estimator is rewritten in the form,

$$\begin{aligned} \hat{\beta }_{R}(k,d)=M_{k}S_{kd}S^{-1}X^{\prime }y+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$

where \(S_{kd}=X^{\prime }X+kdI\).

Since the fitted value of \(y_{i}\) under the restricted two parameter estimator

$$\begin{aligned} \hat{y}_{i}(k,d)=x_{i}^{\prime }\hat{\beta }_{R}(k,d)=x_{i}^{\prime }M_{k}S_{kd}S^{-1}X^{\prime }y+x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$

is a linear combination of the random vector \(y\), we obtain the total variance and the total squared bias of \(\hat{y}_{i}(k,d)\), respectively, as

$$\begin{aligned} \frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))=\sum \limits _{i=1}^{n}x_{i}^{\prime }M_{k}S_{kd}S^{-1}S_{kd}M_{k}x_{i}=tr(XM_{k}S_{kd}S^{-1}S_{kd}M_{k}X^{\prime }) \end{aligned}$$

and

$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ E(y_{i})\!-\!E(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ x_{i}^{\prime }\beta \!-\!x_{i}^{\prime }E(\hat{\beta }_{R}(k,d))\right] ^{2}.\nonumber \\ \end{aligned}$$
(20)

By using

$$\begin{aligned} E(\hat{\beta }_{R}(k,d))&= S_{k}^{-1}S_{kd}\beta -S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(RS_{k}^{-1}S_{kd}\beta -r) \\&= M_{k}S_{kd}\beta +S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r, \end{aligned}$$

Equation (20) becomes

$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] ^{\prime } \\&\times \, \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \\&= (X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&-\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&-\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$

The restricted two parameter residual sum of squares can be written as

$$\begin{aligned} SS_{\mathrm{Re}\,s}^{R}(k,d)&= (y-X\hat{\beta }_{R}(k,d))^{\prime }(y-X\hat{\beta }_{R}(k,d)) \\&= y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y \\&-\,2y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$

Then the expected value of \(SS_{\mathrm{Re}\,s}^{R}(k,d)\) is

$$\begin{aligned} E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right)&= E\left[ y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y\right] \nonumber \\&-\,2(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \nonumber \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(21)

The expected value of the quadratic form equals

$$\begin{aligned}&E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&\quad =\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&\quad +\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta . \end{aligned}$$
(22)

In view of (21) and (22), we obtain

$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= E\left( SS_{\mathrm{Re}\, s}^{R}(k,d)\right) \\&-\,\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] . \end{aligned}$$

By expanding the term \(tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \), we get

$$\begin{aligned}&tr\left[ I-XM_{k}S_{kd}S^{-1}X^{\prime }-XS^{-1}S_{kd}M_{k}X^{\prime }+XS^{-1}S_{kd}M_{k}X^{\prime }XM_{k}S_{kd}S^{-1}X^{\prime }\right] \\&\quad =n-tr(M_{k}S_{kd})-tr(S_{kd}M_{k})+tr(XM_{k}S_{kd}S^{-1}X^{\prime }XS^{-1}S_{kd}M_{k}X^{\prime }). \end{aligned}$$

Hence,

$$\begin{aligned} \sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))\!+\!\sum \limits _{i=1}^{n}\left[ Bias(\hat{y} _{i}(k,d))\right] ^{2}\!=\!E(SS_{\mathrm{Re}\,s}^{R}(k,d))-n\sigma ^{2}+2\sigma ^{2}tr(M_{k}S_{kd}). \end{aligned}$$
(23)

In view of Eq. (19), we divide Eq. (23) by \(\sigma ^{2}\). Then, replacing \(\sigma ^{2}\) with \(\hat{\sigma }_{RLS}^{2}\) and using the cyclic property of the trace operator, the estimator of the expression in Eq. (23) is obtained, which proves the theorem.
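Reading the estimator off Eq. (23), a sketch of \(C_{p}^{R}(k,d)=SS_{\mathrm{Re}\,s}^{R}(k,d)/\hat{\sigma }_{RLS}^{2}-n+2tr(M_{k}S_{kd})\) follows. The data, \(k\), \(d\) and the restriction are hypothetical; \(M_{k}\) is taken as \(S_{k}^{-1}-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}\), which is consistent with the two forms of \(\hat{\beta }_{R}(k,d)\) used above but is our reading of the notation, and \(\hat{\sigma }_{RLS}^{2}\) is the restricted least squares variance estimate, with \(n-p+(\text{number of restrictions})\) degrees of freedom assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, d = 30, 4, 0.5, 0.3
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])   # hypothetical restriction

S   = X.T @ X
Sk  = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)
S_inv, Sk_inv = np.linalg.inv(S), np.linalg.inv(Sk)
G  = np.linalg.inv(R @ Sk_inv @ R.T)
Mk = Sk_inv - Sk_inv @ R.T @ G @ R @ Sk_inv            # assumed form of M_k (see lead-in)

# restricted least squares plug-ins
b_ols = np.linalg.solve(S, X.T @ y)
b_rls = b_ols - S_inv @ R.T @ np.linalg.solve(R @ S_inv @ R.T, R @ b_ols - r)
sigma2_rls = np.sum((y - X @ b_rls) ** 2) / (n - p + R.shape[0])   # assumed d.f.

# restricted two parameter fit and its residual sum of squares
b_R_kd  = Mk @ Skd @ np.linalg.solve(S, X.T @ y) + Sk_inv @ R.T @ G @ r
SSres_R = np.sum((y - X @ b_R_kd) ** 2)

Cp_R = SSres_R / sigma2_rls - n + 2 * np.trace(Mk @ Skd)
print(Cp_R)
```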

Appendix 3: Proof of Theorem 4

The derivative of \({\textit{PRESS}}^{R}(k,d)\) with respect to \(d\) for fixed \(k\) can be expressed as

$$\begin{aligned}&\frac{\partial {\textit{PRESS}}^{R}(k,d)}{\partial d}\nonumber \\&\quad =2\sum \left[ \frac{e_{Ri}(k,d)}{1-h_{ii}(k)+h_{ii}^{R}(k)} +\frac{h_{ii}-h_{ii}(k)+h_{ii}^{R}(k)-x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}\right] q\nonumber \\ \end{aligned}$$
(24)

Equating Eq. (24) to zero completes the proof.
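In practice, instead of solving Eq. (24) explicitly, one can evaluate \({\textit{PRESS}}^{R}(k,d)\) on a grid of \(d\) values for the given \(k\) and take the minimizer. The sketch below does this by explicit leave-one-out refitting via Eq. (12) (Theorem 1 would avoid the refits); the data, \(k\) and the restriction are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 30, 4, 0.5
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])

def beta_R(Xm, ym, k, d):
    """Restricted two parameter estimator for the subsample (Xm, ym)."""
    S, Sk = Xm.T @ Xm, Xm.T @ Xm + k * np.eye(Xm.shape[1])
    b_kd = d * np.linalg.solve(S, Xm.T @ ym) + (1 - d) * np.linalg.solve(Sk, Xm.T @ ym)
    SkiRT = np.linalg.solve(Sk, R.T)
    return b_kd - SkiRT @ np.linalg.solve(R @ SkiRT, R @ b_kd - r)

def press_R(k, d):
    # leave-one-out sum of squared prediction errors by explicit refitting, Eq. (12)
    out = 0.0
    for i in range(n):
        Xi, yi = np.delete(X, i, 0), np.delete(y, i)
        out += (y[i] - X[i] @ beta_R(Xi, yi, k, d)) ** 2
    return out

d_grid = np.linspace(0.0, 1.0, 101)
d_best = d_grid[np.argmin([press_R(k, d) for d in d_grid])]
print(d_best)
```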

Appendix 4: Proof of Theorem 5

From Eqs. (21) and (22), we obtain the derivative of \(E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) \) with respect to \(d\) for fixed \(k\):

$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= \frac{\partial }{\partial d}E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \frac{\partial }{\partial d}\Big \{\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&+\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \Big \} \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \sigma ^{2}tr\left\{ \frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \right\} \nonumber \\&+\,\frac{\partial }{\partial d}\left[ \beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \right] \nonumber \\&+\,2k\beta ^{\prime }SS^{-1}M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(25)

By substituting the result

$$\begin{aligned}&\frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] =-kXM_{k}S^{-1}X^{\prime }-kXS^{-1}M_{k}X^{\prime } \\&\quad +\,kXS^{-1}M_{k}SM_{k}X^{\prime }+kXM_{k}SM_{k}S^{-1}X^{\prime }+2k^{2}dXS^{-1}M_{k}SM_{k}S^{-1}X^{\prime } \end{aligned}$$

into Eq. (25) and using the trace properties, we have

$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= -2\sigma ^{2}ktr(M_{k})+2\sigma ^{2}ktr(M_{k}SM_{k})+2\sigma ^{2}k^{2}dtr(S^{-1}M_{k}SM_{k}) \\&-\,k\beta ^{\prime }SM_{k}\beta -k\beta ^{\prime }M_{k}S\beta +k\beta ^{\prime }M_{k}SM_{k}S\beta +k\beta ^{\prime }SM_{k}SM_{k}\beta \\&+\,2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$

By using the equality \(M_{k}SM_{k}=M_{k}-kM_{k}^{2}\), \(\frac{\partial E\left( SS_{ \mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}\) reduces to

$$\begin{aligned}&\frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d} \!=\!-2\sigma ^{2}k^{2}tr(M_{k}^{2})\!+\!2\sigma ^{2}k^{2}dtr(S^{-1}M_{k}SM_{k})-k^{2}\beta ^{\prime }M_{k}^{2}S\beta \!-\!k^{2}\beta ^{\prime }S \nonumber \\&\quad \times \, M_{k}^{2}\beta +2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(26)

Let \(\Gamma ^{R}(k,d)=\frac{E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\sigma ^{2}} -n+2tr(XM_{k}S_{kd}S^{-1}X^{\prime })\). Then, by Eq. (26) and \(\frac{\partial tr(XM_{k}S_{kd}S^{-1}X^{\prime })}{\partial d}=ktr(M_{k})\), we obtain the derivative of \(\Gamma ^{R}(k,d)\) with respect to \(d\) for fixed \(k\):

$$\begin{aligned} \frac{\partial \Gamma ^{R}(k,d)}{\partial d}&= d\left[ 2k^{2}tr(S^{-1}M_{k}SM_{k})+2\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}SM_{k}\beta \right] \\&-\,2k^{2}tr(M_{k}^{2})-\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}^{2}S\beta -\sigma ^{-2}k^{2}\beta ^{\prime }SM_{k}^{2}\beta \\&+\,2\sigma ^{-2}k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r+2ktr(M_{k}). \end{aligned}$$

Using the equality \(k^{2}tr(M_{k}^{2})-ktr(M_{k})=-ktr(M_{k}SM_{k})\) and equating \(\frac{\partial \Gamma ^{R}(k,d)}{\partial d}\) to zero, the \(d\) value that minimizes \(\Gamma ^{R}(k,d)\) is obtained as

$$\begin{aligned} d=\frac{k\beta ^{\prime }M_{k}^{2}S\beta +k\beta ^{\prime }SM_{k}^{2}\beta -2\sigma ^{2}tr(M_{k}SM_{k})-2\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r}{2\sigma ^{2}ktr(S^{-1}M_{k}SM_{k})+2k\beta ^{\prime }M_{k}SM_{k}\beta }. \end{aligned}$$

Since \(C_{p}^{R}(k,d)\) is an estimate of \(\Gamma ^{R}(k,d)\), we prove the theorem by writing \(\hat{\sigma }_{RLS}^{2}\) instead of \(\sigma ^{2}\) and \( \hat{\beta }_{RLS}\) instead of \(\beta \).
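A sketch of this plug-in rule follows. As in the sketch for Appendix 2, \(M_{k}\) is taken as \(S_{k}^{-1}-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}\) and \(\hat{\sigma }_{RLS}^{2}\) uses \(n-p+(\text{number of restrictions})\) degrees of freedom; both are our assumptions about notation not shown in this excerpt, and the data, \(k\) and the restriction are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 30, 4, 0.5
X = rng.normal(size=(n, p)); X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
R, r = np.array([[1.0, -1.0, 0.0, 0.0]]), np.array([0.0])

S  = X.T @ X
Sk = S + k * np.eye(p)
S_inv, Sk_inv = np.linalg.inv(S), np.linalg.inv(Sk)
G  = np.linalg.inv(R @ Sk_inv @ R.T)
Mk = Sk_inv - Sk_inv @ R.T @ G @ R @ Sk_inv            # assumed form of M_k

# restricted least squares plug-ins for beta and sigma^2
b_ols = np.linalg.solve(S, X.T @ y)
b  = b_ols - S_inv @ R.T @ np.linalg.solve(R @ S_inv @ R.T, R @ b_ols - r)
s2 = np.sum((y - X @ b) ** 2) / (n - p + R.shape[0])   # assumed degrees of freedom

# closed-form d from the display above, with the plug-ins substituted
num = (k * b @ Mk @ Mk @ S @ b + k * b @ S @ Mk @ Mk @ b
       - 2 * s2 * np.trace(Mk @ S @ Mk)
       - 2 * b @ Mk @ S @ Sk_inv @ R.T @ G @ r)
den = 2 * s2 * k * np.trace(S_inv @ Mk @ S @ Mk) + 2 * k * b @ Mk @ S @ Mk @ b
print(num / den)                                       # candidate d for this fixed k
```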


Cite this article

Özkale, M.R. Predictive performance of linear regression models. Stat Papers 56, 531–567 (2015). https://doi.org/10.1007/s00362-014-0596-4
