Appendix 1: Proof of Theorem 1
Let us first write the restricted two parameter estimator (4) as
$$\begin{aligned} \hat{\beta }_{R}(k,d)=\left[ I-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\right] \hat{\beta }(k,d)+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
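For illustration, the following minimal numerical sketch (Python with numpy; the data \(X\), \(y\), \(R\), \(r\) and the values of \(k\), \(d\) are illustrative assumptions, not taken from the paper) evaluates this estimator and confirms that it satisfies the restriction \(R\beta =r\) exactly:

```python
import numpy as np

# Illustrative data; the true estimator inputs come from the application at hand.
rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))      # full-row-rank restriction matrix
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X                          # S = X'X
Sk = S + k * np.eye(p)               # S_k = X'X + kI
beta_ols = np.linalg.solve(S, X.T @ y)          # ordinary least squares
beta_ridge = np.linalg.solve(Sk, X.T @ y)       # ridge estimator beta(k)
beta_kd = d * beta_ols + (1 - d) * beta_ridge   # two parameter estimator beta(k,d)

Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)    # S_k^{-1}R'(RS_k^{-1}R')^{-1}
beta_R = (np.eye(p) - G @ R) @ beta_kd + G @ r  # restricted two parameter estimator

print(R @ beta_R - r)   # ~0: the restriction R*beta_R = r holds up to rounding
```

Here \(\hat{\beta }(k,d)\) is computed in the convex-combination form \(d\hat{\beta }+(1-d)\hat{\beta }(k)\) used later in this proof.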
Let
$$\begin{aligned} \hat{\beta }_{R(i)}(k,d)=\left[ I-S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R\right] \hat{\beta } _{(i)}(k,d)+S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r\nonumber \\ \end{aligned}$$
(12)
be the vector of regression coefficients computed with the \(i\)th data point set aside. By the Sherman-Morrison-Woodbury theorem, one can verify that
$$\begin{aligned} S_{k(i)}^{-1}=(X_{(i)}^{\prime }X_{(i)}+kI)^{-1}=(S_{k}-x_{i}x_{i}^{\prime })^{-1}=S_{k}^{-1}+\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{ 1-h_{ii}(k)} \end{aligned}$$
(13)
where \(h_{ii}(k)=x_{i}^{\prime }S_{k}^{-1}x_{i}\). Equation (13) leads to
$$\begin{aligned} (RS_{k(i)}^{-1}R^{\prime })^{-1}&= \left[ RS_{k}^{-1}R^{\prime }+\frac{ RS_{k}^{-1}x_{i}}{\sqrt{1-h_{ii}(k)}}\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }}{\sqrt{1-h_{ii}(k)}}\right] ^{-1}=(RS_{k}^{-1}R^{ \prime })^{-1} \nonumber \\&-\,\frac{\frac{1}{1-h_{ii}(k)}(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \end{aligned}$$
(14)
where \(h_{ii}^{R}(k)=x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}\). Multiplying (13) and (14), we get
$$\begin{aligned}&S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R =S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&\qquad -\,\frac{\frac{1}{1-h_{ii}(k)}S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \nonumber \\&\qquad +\,\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1-h_{ii}(k)}-\frac{S_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}}{1-h_{ii}(k)} \nonumber \\&\qquad \times \, \frac{\frac{1}{1-h_{ii}(k)}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}x_{i}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}}. \end{aligned}$$
(15)
Premultiplying (15) by \(x_{i}^{\prime }\) and simplifying, the following result holds:
$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}R&= \left[ \frac{1}{1-h_{ii}(k)}-\frac{1}{1+\frac{h_{ii}^{R}(k)}{1-h_{ii}(k)}} \frac{h_{ii}^{R}(k)}{\left( 1-h_{ii}(k)\right) ^{2}}\right] \nonumber \\&\times \, x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R \nonumber \\&= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R. \end{aligned}$$
(16)
Similar to the result in (16), we have
$$\begin{aligned} x_{i}^{\prime }S_{k(i)}^{-1}R^{\prime }(RS_{k(i)}^{-1}R^{\prime })^{-1}r= \frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(17)
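The updating formulas (13) and (16) can be checked numerically; (17) follows in the same way. A minimal sketch under the same illustrative assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))      # full-row-rank restriction matrix
k = 0.7
i = 5                                # index of the withheld observation
xi = X[i]

Sk = X.T @ X + k * np.eye(p)         # S_k
Ski = np.linalg.inv(Sk)
h = xi @ Ski @ xi                    # h_ii(k)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)   # S_k^{-1}R'(RS_k^{-1}R')^{-1}
hR = xi @ G @ R @ Ski @ xi           # h_ii^R(k)

X_i = np.delete(X, i, axis=0)
Ski_loo = np.linalg.inv(X_i.T @ X_i + k * np.eye(p))   # S_{k(i)}^{-1}

# (13): Sherman-Morrison-Woodbury update of S_k^{-1}
print(np.allclose(Ski_loo, Ski + np.outer(Ski @ xi, xi @ Ski) / (1 - h)))  # True

# (16): leverage rescaling of x_i'S_k^{-1}R'(RS_k^{-1}R')^{-1}R
G_loo = Ski_loo @ R.T @ np.linalg.inv(R @ Ski_loo @ R.T)
print(np.allclose(xi @ G_loo @ R, (xi @ G @ R) / (1 - h + hR)))            # True
```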
\(\hat{\beta }_{(i)}(k,d)\) can be derived by writing \(\hat{\beta }(k,d)\) as \( \hat{\beta }(k,d)=d\hat{\beta }+(1-d)\hat{\beta }(k)\). Then using \(\hat{\beta } _{(i)}=\hat{\beta }-\frac{1}{1-h_{ii}}S^{-1}x_{i}e_{i}\) (see Myers 1990; Montgomery et al. 2001) and \(\hat{\beta }_{(i)}(k)=\hat{\beta }(k)-\frac{1}{ 1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k}\) (see Walker and Birch 1988), \(\hat{\beta }_{(i)}(k,d)\) becomes
$$\begin{aligned} \hat{\beta }_{(i)}(k,d)=\hat{\beta }(k,d)-\frac{d}{1-h_{ii}}S^{-1}x_{i}e_{i}- \frac{(1-d)}{1-h_{ii}(k)}S_{k}^{-1}x_{i}e_{i,k} \end{aligned}$$
and the two parameter residual with the \(i\)th observation withheld becomes
$$\begin{aligned} e_{(i)}(k,d)=y_{i}-x_{i}^{\prime }\hat{\beta }_{(i)}(k,d)=\frac{de_{i}}{ 1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)}. \end{aligned}$$
(18)
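A numerical sketch of (18), comparing a from-scratch leave-one-out fit with the shortcut formula (illustrative data as before):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
k, d = 0.7, 0.4
i = 5
xi, yi = X[i], y[i]

S = X.T @ X
Sk = S + k * np.eye(p)
b_ols = np.linalg.solve(S, X.T @ y)
b_rid = np.linalg.solve(Sk, X.T @ y)
e_i = yi - xi @ b_ols                    # OLS residual e_i
e_ik = yi - xi @ b_rid                   # ridge residual e_{i,k}
h = xi @ np.linalg.solve(S, xi)          # h_ii
hk = xi @ np.linalg.solve(Sk, xi)        # h_ii(k)

# direct leave-one-out two parameter fit
X_i, y_i = np.delete(X, i, axis=0), np.delete(y, i)
S_i = X_i.T @ X_i
b_ols_i = np.linalg.solve(S_i, X_i.T @ y_i)
b_rid_i = np.linalg.solve(S_i + k * np.eye(p), X_i.T @ y_i)
b_kd_i = d * b_ols_i + (1 - d) * b_rid_i

lhs = yi - xi @ b_kd_i                   # e_(i)(k,d) computed from scratch
rhs = d * e_i / (1 - h) + (1 - d) * e_ik / (1 - hk)   # shortcut (18)
print(np.isclose(lhs, rhs))              # True
```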
Noting that the two parameter residual is
$$\begin{aligned} e(k,d)=y-X\hat{\beta }(k,d)=de+(1-d)e_{k} \end{aligned}$$
and the restricted two parameter residual is
$$\begin{aligned} e_{R}(k,d)=y-X\hat{\beta }_{R}(k,d)=e(k,d)+XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r), \end{aligned}$$
and by using Eqs. (12), (16), (17) and (18), the restricted two parameter residual with the \(i\)th observation withheld becomes
$$\begin{aligned} e_{R(i)}(k,d)&= y_{i}-x_{i}^{\prime }\hat{\beta }_{R(i)}(k,d)=\frac{de_{i}}{1-h_{ii}}+\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}\Bigg [x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}R\hat{\beta }(k,d) \\&-\,\frac{d}{1-h_{ii}}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}e_{i}-\frac{(1-d)}{1-h_{ii}(k)} h_{ii}^{R}(k)e_{i,k}\Bigg ] \\&-\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}}+\left[ 1- \frac{h_{ii}^{R}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \\&\times \, \frac{(1-d)e_{i,k}}{1-h_{ii}(k)}+\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)} x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r) \\&= \left[ 1-\frac{x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}-\frac{1-h_{ii}}{ 1-h_{ii}(k)+h_{ii}^{R}(k)}\right] \frac{de_{i}}{1-h_{ii}} \\&+\,\frac{1-h_{ii}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}+\frac{ 1-h_{ii}(k)}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{(1-d)e_{i,k}}{1-h_{ii}(k)} \\&+\,\frac{1}{1-h_{ii}(k)+h_{ii}^{R}(k)}x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(R\hat{\beta }(k,d)-r). \end{aligned}$$
This completes the proof.
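As a sanity check on the closed form just derived, the sketch below recomputes \(e_{R(i)}(k,d)\) directly from Eq. (12) on the reduced data and compares it with the final expression (illustrative data as before):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4
i = 5
xi, yi = X[i], y[i]

def restricted_kd(Xa, ya):
    """Restricted two parameter estimator (4) and beta(k,d) on data (Xa, ya)."""
    Sa = Xa.T @ Xa
    Skai = np.linalg.inv(Sa + k * np.eye(p))
    b_kd = d * np.linalg.solve(Sa, Xa.T @ ya) + (1 - d) * Skai @ (Xa.T @ ya)
    Ga = Skai @ R.T @ np.linalg.inv(R @ Skai @ R.T)
    return b_kd - Ga @ (R @ b_kd - r), b_kd

# direct leave-one-out restricted fit, eq. (12)
b_R_loo, _ = restricted_kd(np.delete(X, i, axis=0), np.delete(y, i))
direct = yi - xi @ b_R_loo

# closed form from the proof
S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
_, b_kd = restricted_kd(X, y)
e_i = yi - xi @ np.linalg.solve(S, X.T @ y)
e_ik = yi - xi @ Ski @ (X.T @ y)
h = xi @ np.linalg.solve(S, xi)
hk = xi @ Ski @ xi
hR = xi @ G @ R @ Ski @ xi
A = xi @ G @ R @ np.linalg.solve(S, xi)
w = 1 - hk + hR
closed = ((1 - A / w) * d * e_i / (1 - h)
          + (1 - hR / w) * (1 - d) * e_ik / (1 - hk)
          + xi @ G @ (R @ b_kd - r) / w)
print(np.isclose(direct, closed))        # True
```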
Appendix 2: Proof of Theorem 2
The total mean square error of estimating the expected values of the response, \(E(y_{i})\), from the fitted value, \(\tilde{y}_{i}\), is
$$\begin{aligned} \Gamma =\frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}MSE(\tilde{y}_{i})=\frac{1}{\sigma ^{2}}\left\{ \sum \limits _{i=1}^{n}Var(\tilde{y}_{i})+\sum \limits _{i=1}^{n}\left[ Bias(\tilde{y}_{i})\right] ^{2}\right\} \end{aligned}$$
(19)
where \(\tilde{y}_{i}\) is the fitted value obtained by any estimator. Mallows' \(C_{p}\) statistic (Mallows 1973) estimates \(\Gamma \) as
$$\begin{aligned} C_{p}=\frac{SS_{\mathrm{Re}\,s}}{\hat{\sigma }^{2}}-n+2p \end{aligned}$$
where \(SS_{\mathrm{Re}\,s}=\sum e_{i}^{2}\) is the least squares residual sum of squares.
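For reference, a minimal sketch of Mallows' \(C_{p}\) for the full least squares model (illustrative data; \(\hat{\sigma }^{2}\) is the full-model estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)   # illustrative response

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_ols
ss_res = e @ e                           # SS_Res = sum of squared residuals
sigma2_hat = ss_res / (n - p)            # full-model estimate of sigma^2
cp = ss_res / sigma2_hat - n + 2 * p
print(cp)                                # equals p here, since the full model is fitted
```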
The \(\hat{\beta }_{R}(k,d)\) estimator is rewritten in the form,
$$\begin{aligned} \hat{\beta }_{R}(k,d)=M_{k}S_{kd}S^{-1}X^{\prime }y+S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$
where \(S_{kd}=X^{\prime }X+kdI\).
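This rewriting can be checked numerically. The sketch below assumes \(M_{k}=S_{k}^{-1}-S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}\), the form of \(M_{k}\) consistent with (4) and with the expectation computed later in this appendix (\(M_{k}\) itself is defined in the main text); the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)              # S_kd = X'X + kdI
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k (see lead-in)

b_kd = d * np.linalg.solve(S, X.T @ y) + (1 - d) * Ski @ (X.T @ y)
b_R1 = (np.eye(p) - G @ R) @ b_kd + G @ r                 # form (4), Appendix 1
b_R2 = Mk @ Skd @ np.linalg.solve(S, X.T @ y) + G @ r     # rewritten form
print(np.allclose(b_R1, b_R2))           # True
```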
Since the fitted value of \(y_{i}\) under the restricted two parameter estimator
$$\begin{aligned} \hat{y}_{i}(k,d)=x_{i}^{\prime }\hat{\beta }_{R}(k,d)=x_{i}^{\prime }M_{k}S_{kd}S^{-1}X^{\prime }y+x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \end{aligned}$$
is a linear combination of the random vector \(y\), we obtain the total variance and the squared bias of \(\hat{y}_{i}(k,d)\), respectively, as
$$\begin{aligned} \frac{1}{\sigma ^{2}}\sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))=\sum \limits _{i=1}^{n}x_{i}^{\prime }M_{k}S_{kd}S^{-1}S_{kd}M_{k}x_{i}=tr(XM_{k}S_{kd}S^{-1}S_{kd}M_{k}X^{\prime }) \end{aligned}$$
and
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ E(y_{i})\!-\!E(\hat{y}_{i}(k,d))\right] ^{2}\!=\!\sum \limits _{i=1}^{n}\left[ x_{i}^{\prime }\beta \!-\!x_{i}^{\prime }E(\hat{\beta }_{R}(k,d))\right] ^{2}.\nonumber \\ \end{aligned}$$
(20)
By using
$$\begin{aligned} E(\hat{\beta }_{R}(k,d))&= S_{k}^{-1}S_{kd}\beta -S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}(RS_{k}^{-1}S_{kd}\beta -r) \\&= M_{k}S_{kd}\beta +S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r, \end{aligned}$$
Equation (20) becomes
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] ^{\prime } \\&\times \, \left[ X\beta -XM_{k}S_{kd}S^{-1}X^{\prime }X\beta -XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \\&= (X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&-\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&-\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
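Because \(\hat{\beta }_{R}(k,d)\) is affine in \(y\), the expectation used above can be checked by evaluating the estimator at \(E(y)=X\beta \); a sketch under the same illustrative assumptions (including the assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)            # illustrative true coefficients
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

Ey = X @ beta                            # E(y) under the linear model
lhs = Mk @ Skd @ np.linalg.solve(S, X.T @ Ey) + G @ r   # estimator at E(y)
rhs = Mk @ Skd @ beta + G @ r                           # stated expectation
print(np.allclose(lhs, rhs))             # True
```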
The restricted two parameter residual sum of squares can be written as
$$\begin{aligned} SS_{\mathrm{Re}\,s}^{R}(k,d)&= (y-X\hat{\beta }_{R}(k,d))^{\prime }(y-X\hat{\beta }_{R}(k,d)) \\&= y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y \\&-\,2y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}X^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
Then the expected value of \(SS_{\mathrm{Re}\,s}^{R}(k,d)\) is
$$\begin{aligned} E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right)&= E\left[ y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y\right] \nonumber \\&-\,2(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r \nonumber \\&+\,r^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS_{k}^{-1}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(21)
The expected value of the quadratic form equals
$$\begin{aligned}&E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&\quad =\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&\quad +\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta . \end{aligned}$$
(22)
In view of (21) and (22), we obtain
$$\begin{aligned} \sum \limits _{i=1}^{n}\left[ Bias(\hat{y}_{i}(k,d))\right] ^{2}&= E\left( SS_{\mathrm{Re}\, s}^{R}(k,d)\right) \\&-\,\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] . \end{aligned}$$
By expanding the term \(tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \), we get
$$\begin{aligned}&tr\left[ I-XM_{k}S_{kd}S^{-1}X^{\prime }-XS^{-1}S_{kd}M_{k}X^{\prime }+XS^{-1}S_{kd}M_{k}X^{\prime }XM_{k}S_{kd}S^{-1}X^{\prime }\right] \\&\quad =n-tr(M_{k}S_{kd})-tr(S_{kd}M_{k})+tr(XM_{k}S_{kd}S^{-1}X^{\prime }XS^{-1}S_{kd}M_{k}X^{\prime }). \end{aligned}$$
Hence,
$$\begin{aligned} \sum \limits _{i=1}^{n}Var(\hat{y}_{i}(k,d))\!+\!\sum \limits _{i=1}^{n}\left[ Bias(\hat{y} _{i}(k,d))\right] ^{2}\!=\!E(SS_{\mathrm{Re}\,s}^{R}(k,d))-n\sigma ^{2}+2\sigma ^{2}tr(M_{k}S_{kd}). \end{aligned}$$
(23)
In view of Eq. (19), we divide Eq. (23) by \(\sigma ^{2}\). Then, replacing \(\sigma ^{2}\) by \(\hat{\sigma }_{RLS}^{2}\) and using the properties of the trace operator, we obtain the estimator of the expression in Eq. (23), which proves the theorem.
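The trace expansion above is deterministic and can be checked directly; a sketch under the same illustrative assumptions (same assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))
k, d = 0.7, 0.4

S = X.T @ X
Sk = S + k * np.eye(p)
Skd = S + k * d * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

B = X @ Mk @ Skd @ np.linalg.solve(S, X.T)     # X M_k S_kd S^{-1} X'
lhs = np.trace((np.eye(n) - B).T @ (np.eye(n) - B))
rhs = n - 2 * np.trace(Mk @ Skd) + np.trace(B @ B.T)
print(np.isclose(lhs, rhs))              # True
```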
Appendix 3: Proof of Theorem 4
The derivative of \({\textit{PRESS}}^{R}(k,d)\) with respect to \(d\) for fixed \(k\) can be expressed as
$$\begin{aligned}&\frac{\partial {\textit{PRESS}}^{R}(k,d)}{\partial d}\nonumber \\&\quad =2\sum \left[ \frac{e_{Ri}(k,d)}{1-h_{ii}(k)+h_{ii}^{R}(k)} +\frac{h_{ii}-h_{ii}(k)+h_{ii}^{R}(k)-x_{i}^{\prime }S_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}RS^{-1}x_{i}}{1-h_{ii}(k)+h_{ii}^{R}(k)}\frac{de_{i}}{1-h_{ii}}\right] \frac{\partial e_{R(i)}(k,d)}{\partial d}.\nonumber \\ \end{aligned}$$
(24)
Since the bracketed term in Eq. (24) equals \(e_{R(i)}(k,d)\) and \({\textit{PRESS}}^{R}(k,d)=\sum e_{R(i)}^{2}(k,d)\), equating Eq. (24) to zero and solving for \(d\) completes the proof.
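The differentiation step can be checked numerically: the sketch below compares the analytic \(d\)-derivative of \({\textit{PRESS}}^{R}(k,d)\), built from the closed form of Appendix 1, against a central finite difference (illustrative data; since \({\textit{PRESS}}^{R}\) is quadratic in \(d\), the central difference is exact up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, d, eps = 0.7, 0.4, 1e-6

S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
b_ols = np.linalg.solve(S, X.T @ y)
b_rid = Ski @ (X.T @ y)

def pieces(i):
    """Per-observation quantities from Appendix 1."""
    xi = X[i]
    e_i, e_ik = y[i] - xi @ b_ols, y[i] - xi @ b_rid
    h, hk = xi @ np.linalg.solve(S, xi), xi @ Ski @ xi
    hR = xi @ G @ R @ Ski @ xi
    A = xi @ G @ R @ np.linalg.solve(S, xi)
    return xi, e_i, e_ik, h, hk, hR, A, 1 - hk + hR

def e_R_loo(dd):
    """Closed-form leave-one-out restricted residuals e_{R(i)}(k, dd)."""
    b_kd = dd * b_ols + (1 - dd) * b_rid
    out = np.empty(n)
    for i in range(n):
        xi, e_i, e_ik, h, hk, hR, A, w = pieces(i)
        out[i] = ((1 - A / w) * dd * e_i / (1 - h)
                  + (1 - hR / w) * (1 - dd) * e_ik / (1 - hk)
                  + xi @ G @ (R @ b_kd - r) / w)
    return out

def de_R_loo():
    """Analytic d-derivative of e_{R(i)}(k,d); it is free of d."""
    out = np.empty(n)
    for i in range(n):
        xi, e_i, e_ik, h, hk, hR, A, w = pieces(i)
        out[i] = ((1 - A / w) * e_i / (1 - h)
                  - (1 - hR / w) * e_ik / (1 - hk)
                  + xi @ G @ R @ (b_ols - b_rid) / w)
    return out

analytic = 2 * np.sum(e_R_loo(d) * de_R_loo())
fd = (np.sum(e_R_loo(d + eps) ** 2) - np.sum(e_R_loo(d - eps) ** 2)) / (2 * eps)
print(np.isclose(analytic, fd, rtol=1e-6))   # True
```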
Appendix 4: Proof of Theorem 5
By Eqs. (21) and (22), the derivative of \(E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) \) with respect to \(d\) for fixed \(k\) is
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= \frac{\partial }{\partial d}E[y^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })y] \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \frac{\partial }{\partial d}\Big \{\sigma ^{2}tr\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \nonumber \\&+\,(X\beta )^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \Big \} \nonumber \\&-\,\frac{\partial }{\partial d}\left[ 2\beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }XS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r\right] \nonumber \\&= \sigma ^{2}tr\left\{ \frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] \right\} \nonumber \\&+\,\frac{\partial }{\partial d}\left[ \beta ^{\prime }X^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })X\beta \right] \nonumber \\&+\,2k\beta ^{\prime }SS^{-1}M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(25)
By substituting the result
$$\begin{aligned}&\frac{\partial }{\partial d}\left[ (I-XM_{k}S_{kd}S^{-1}X^{\prime })^{\prime }(I-XM_{k}S_{kd}S^{-1}X^{\prime })\right] =-kXM_{k}S^{-1}X^{\prime }-kXS^{-1}M_{k}X^{\prime } \\&\quad +\,kXS^{-1}M_{k}SM_{k}X^{\prime }+kXM_{k}SM_{k}S^{-1}X^{\prime }+2k^{2}dXS^{-1}M_{k}SM_{k}S^{-1}X^{\prime } \end{aligned}$$
into Eq. (25) and using the trace properties, we have
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= -2\sigma ^{2}ktr(M_{k})+2\sigma ^{2}ktr(M_{k}SM_{k})+2\sigma ^{2}k^{2}dtr(S^{-1}M_{k}SM_{k}) \\&-\,k\beta ^{\prime }SM_{k}\beta -k\beta ^{\prime }M_{k}S\beta +k\beta ^{\prime }M_{k}SM_{k}S\beta +k\beta ^{\prime }SM_{k}SM_{k}\beta \\&+\,2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
By using the equality \(M_{k}SM_{k}=M_{k}-kM_{k}^{2}\), \(\frac{\partial E\left( SS_{ \mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}\) reduces to
$$\begin{aligned} \frac{\partial E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\partial d}&= -2\sigma ^{2}k^{2}tr(M_{k}^{2})+2\sigma ^{2}k^{2}d\,tr(S^{-1}M_{k}SM_{k})-k^{2}\beta ^{\prime }M_{k}^{2}S\beta -k^{2}\beta ^{\prime }SM_{k}^{2}\beta \nonumber \\&\quad +\,2k^{2}d\beta ^{\prime }M_{k}SM_{k}\beta +2k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r. \end{aligned}$$
(26)
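The identity \(M_{k}SM_{k}=M_{k}-kM_{k}^{2}\) invoked above can be checked numerically (illustrative data, same assumed \(M_{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p))
k = 0.7

S = X.T @ X
Ski = np.linalg.inv(S + k * np.eye(p))
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above

print(np.allclose(Mk @ S @ Mk, Mk - k * Mk @ Mk))   # True
```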
Let \(\Gamma ^{R}(k,d)=\frac{E\left( SS_{\mathrm{Re}\,s}^{R}(k,d)\right) }{\sigma ^{2}} -n+2tr(XM_{k}S_{kd}S^{-1}X^{\prime })\). Then, by Eq. (26) and \(\frac{\partial tr(XM_{k}S_{kd}S^{-1}X^{\prime })}{\partial d}=k\,tr(M_{k})\), the derivative of \(\Gamma ^{R}(k,d)\) with respect to \(d\) for fixed \(k\) is
$$\begin{aligned} \frac{\partial \Gamma ^{R}(k,d)}{\partial d}&= d\left[ 2k^{2}tr(S^{-1}M_{k}SM_{k})+2\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}SM_{k}\beta \right] \\&-\,2k^{2}tr(M_{k}^{2})-\sigma ^{-2}k^{2}\beta ^{\prime }M_{k}^{2}S\beta -\sigma ^{-2}k^{2}\beta ^{\prime }SM_{k}^{2}\beta \\&+\,2\sigma ^{-2}k\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r+2ktr(M_{k}). \end{aligned}$$
Using the equality \(k^{2}tr(M_{k}^{2})-ktr(M_{k})=-ktr(M_{k}SM_{k})\) and equating \(\frac{\partial \Gamma ^{R}(k,d)}{\partial d}\) to zero, the \(d\) value that minimizes \(\Gamma ^{R}(k,d)\) is obtained as
$$\begin{aligned} d=\frac{k\beta ^{\prime }M_{k}^{2}S\beta +k\beta ^{\prime }SM_{k}^{2}\beta -2\sigma ^{2}tr(M_{k}SM_{k})-2\beta ^{\prime }M_{k}SS_{k}^{-1}R^{\prime }(RS_{k}^{-1}R^{\prime })^{-1}r}{2\sigma ^{2}ktr(S^{-1}M_{k}SM_{k})+2k\beta ^{\prime }M_{k}SM_{k}\beta }. \end{aligned}$$
Since \(C_{p}^{R}(k,d)\) is an estimate of \(\Gamma ^{R}(k,d)\), replacing \(\sigma ^{2}\) with \(\hat{\sigma }_{RLS}^{2}\) and \(\beta \) with \(\hat{\beta }_{RLS}\) proves the theorem.
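As a final check, \(\Gamma ^{R}(k,d)\) can be evaluated deterministically from (21), (22) and the definition above, and the displayed \(d\) verified to be a stationary point; the sketch below does so for an illustrative \(\beta \), \(\sigma ^{2}\), and design (same assumed \(M_{k}\); since \(\Gamma ^{R}\) is quadratic in \(d\), the central difference is exact up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 4, 2
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)            # illustrative true coefficients
R = rng.standard_normal((q, p))
r = rng.standard_normal(q)
k, sigma2 = 0.7, 1.0

S = X.T @ X
Sk = S + k * np.eye(p)
Ski = np.linalg.inv(Sk)
G = Ski @ R.T @ np.linalg.inv(R @ Ski @ R.T)
Mk = Ski - G @ R @ Ski                   # assumed M_k, as above
Si = np.linalg.inv(S)

def gamma(d):
    """Gamma^R(k,d) computed from (21), (22) and the definition above."""
    B = X @ Mk @ (S + k * d * np.eye(p)) @ Si @ X.T
    IB = np.eye(n) - B
    ess = sigma2 * np.trace(IB.T @ IB) + np.sum((IB @ X @ beta - X @ G @ r) ** 2)
    return ess / sigma2 - n + 2 * np.trace(Mk @ (S + k * d * np.eye(p)))

# the displayed minimizer d
num = (k * beta @ Mk @ Mk @ S @ beta + k * beta @ S @ Mk @ Mk @ beta
       - 2 * sigma2 * np.trace(Mk @ S @ Mk) - 2 * beta @ Mk @ S @ G @ r)
den = 2 * sigma2 * k * np.trace(Si @ Mk @ S @ Mk) + 2 * k * beta @ Mk @ S @ Mk @ beta
d_star = num / den

eps = 1e-5
slope = (gamma(d_star + eps) - gamma(d_star - eps)) / (2 * eps)
print(np.isclose(slope, 0.0, atol=1e-6))   # True: d_star is a stationary point
```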