
Rank-based shrinkage estimation for identification in semiparametric additive models


Abstract

In this paper, we propose a novel and robust procedure for model identification in semiparametric additive models based on rank regression and spline approximation. Under mild conditions, we establish the theoretical properties of the identified nonparametric functions and the linear parameters. Furthermore, we demonstrate that the proposed rank estimate achieves substantial efficiency gains across a wide spectrum of non-normal error distributions while losing almost no efficiency for normal errors, compared with the least squares estimate. Even in the worst-case scenario, the asymptotic relative efficiency of the proposed rank estimate with respect to the least squares estimate, which is shown to have an expression closely related to that of the signed-rank Wilcoxon test in comparison with the t-test, has a lower bound of 0.864. Finally, an efficient algorithm is presented for computation, and the selection of tuning parameters is discussed. Simulation studies and a real data analysis are conducted to illustrate the finite sample performance of the proposed method.



Acknowledgements

The authors are grateful to the Editor, the Associate Editor and two anonymous referees, whose comments led to a significant improvement of the paper. This work was supported in part by the National Natural Science Foundation of China (Grant No. 11671059).

Author information

Corresponding author

Correspondence to Jing Yang.

Appendix

In the proofs, \(C\) denotes a generic constant that may take different values at different places. Let \(\gamma _0=(\gamma _{01}^T,\gamma _{02}^T, \ldots ,\gamma _{0p}^T)^T\) be a pK-dimensional vector satisfying \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\) for \( 1\le j \le p_0\) and \(f_{0j}=B_j^T \gamma _{0j}\) for \(p_0 < j \le p\). Before proving the theoretical results, we introduce some notation for convenience. Let

$$\begin{aligned} \theta _n=\sqrt{K/n}, ~~\gamma ^{*}=\theta _n^{-1}(\gamma -\gamma _0), ~~Z_i= \big ( B_1(X_{i1})^T,\ldots ,B_p(X_{ip})^T \big )^T, \end{aligned}$$
$$\begin{aligned} Z_{ij}=Z_i-Z_j, ~~Z=(Z_1,\ldots ,Z_n)^T, ~~\Delta _i=\sum _{l=1}^{p} f_{0l}(X_{il}) -Z_i^T \gamma _0, \end{aligned}$$
$$\begin{aligned} \bar{K}=pK, ~~~\text{ and }~~~ Q_n(\gamma ^{*})=\tau \theta _n^2 \gamma ^{*^T} Z^TZ \gamma ^{*} + \gamma ^{*^T} S_n(0) + L_n(0). \end{aligned}$$

Based on this notation, the objective function \(L_n(\gamma )\) defined in (4) can be rewritten as

$$\begin{aligned} L_n^{*}(\gamma ^{*})= \frac{1}{n}\sum _{i<j} { |(\varepsilon _i+\Delta _i)-(\varepsilon _j+\Delta _j)-\theta _n Z_{ij}^T \gamma ^{*}| }. \end{aligned}$$

Further, denote by \(S_n(\gamma ^{*})\) the gradient function of \(L_n^{*}(\gamma ^{*})\), that is,

$$\begin{aligned} S_n(\gamma ^{*}) = \frac{\partial L_n^{*}(\gamma ^{*})}{\partial \gamma ^{*}}= -\frac{\theta _n}{n} \sum _{i \ne j} { \text{ sgn } \{ \varepsilon _i + \Delta _{i} -\varepsilon _j - \Delta _{j} - \theta _n Z_{ij}^T \gamma ^{*} \} Z_{ij} }, \end{aligned}$$

where \(\text{ sgn }(\cdot )\) denotes the sign function.
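To make the pairwise objective and its subgradient concrete, here is a minimal numerical sketch (not the authors' implementation; the design matrix is a random stand-in for the stacked B-spline basis, and the sketch works with the equivalent \(i<j\) form of the loss, whereas the gradient display above sums over \(i \ne j\) and therefore counts each pair twice):

```python
# A minimal numerical sketch (not the authors' implementation) of the pairwise
# rank objective L_n(gamma) = (1/n) * sum_{i<j} |Y_ij - Z_ij^T gamma| and its
# sign-based subgradient; Z is a random stand-in for the stacked B-spline design.
import numpy as np

rng = np.random.default_rng(0)
n, Kbar = 50, 8                       # sample size and total basis dimension (pK)
Z = rng.normal(size=(n, Kbar))
gamma_true = rng.normal(size=Kbar)
Y = Z @ gamma_true + rng.standard_t(3, size=n)   # heavy-tailed errors

i, j = np.triu_indices(n, k=1)        # all pairs with i < j

def rank_loss(gamma):
    """(1/n) * sum_{i<j} |(Y_i - Y_j) - (Z_i - Z_j)^T gamma|."""
    r = Y - Z @ gamma
    return np.abs(r[i] - r[j]).sum() / n

def rank_subgrad(gamma):
    """-(1/n) * sum_{i<j} sgn(r_i - r_j) (Z_i - Z_j), a subgradient of rank_loss."""
    r = Y - Z @ gamma
    s = np.sign(r[i] - r[j])
    return -(s[:, None] * (Z[i] - Z[j])).sum(axis=0) / n

# Finite-difference check of the first coordinate of the subgradient:
g0 = rng.normal(size=Kbar)
e0 = np.zeros(Kbar)
e0[0] = 1.0
fd = (rank_loss(g0 + 1e-6 * e0) - rank_loss(g0 - 1e-6 * e0)) / 2e-6
print(fd, rank_subgrad(g0)[0])        # the two values should agree closely
```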

We first quote several lemmas that are used frequently in the sequel; their detailed proofs can be found in Feng et al. (2015).

Lemma 1

Suppose that assumptions (A1)–(A4) hold. Then

$$\begin{aligned} S_n(\gamma ^{*})-S_n(0)=2 \tau \theta _n^2 Z^TZ \gamma ^{*} +o_p(1)\mathbf 1 _{\bar{K}}, \end{aligned}$$

where \(\tau \) is defined in Theorem 3 and \(\mathbf 1 _{\bar{K}}\) is a \(\bar{K}\)-dimensional vector of ones.

Lemma 2

Let \(\check{\gamma }^{*}=\arg \min L_n^{*}(\gamma ^{*})\) and \(\tilde{\gamma }^{*}=\arg \min Q_n(\gamma ^{*})\). Suppose that assumptions (A1)–(A4) hold. Then

$$\begin{aligned} \Vert \check{\gamma }^{*}-\tilde{\gamma }^{*}\Vert ^2=o_p(K). \end{aligned}$$

Lemma 3

Suppose that assumptions (A1)–(A4) hold. Then

$$\begin{aligned} S_n(0)=O_p(1)\mathbf 1 _{\bar{K}}. \end{aligned}$$

Proof of Theorem 1

By the definition of \(Q_n(\gamma ^{*})\), which is a convex quadratic function of \(\gamma ^{*}\), its minimizer has the closed form (see also the convexity lemma in Pollard 1991)

$$\begin{aligned} \tilde{\gamma }^{*}=-(2 \tau \theta _n^2 Z^TZ )^{-1}S_n(0). \end{aligned}$$

Note that, according to Lemma A.3 of Huang et al. (2004), there exists an interval \([C_1,C_2]\), \(0<C_1<C_2<\infty \), such that all the eigenvalues of \(\frac{K}{n}Z^TZ\) fall into \([C_1,C_2]\) with probability tending to 1. Write \(S_n(0)=( S_{n1}(0),\ldots ,S_{n\bar{K}}(0) )^T\), then we have

$$\begin{aligned} \Vert \tilde{\gamma }^{*} \Vert ^2= & {} \frac{1}{4 \tau ^2}S_n(0)^T \left( \frac{K}{n}Z^TZ\right) ^{-1}\left( \frac{K}{n}Z^TZ\right) ^{-1}S_n(0) \\= & {} O_p(1)S_n(0)^T S_n(0)=O_p(1) \sum _{i=1}^{\bar{K}} { S_{ni}(0)^2 }=O_p(\bar{K}), \end{aligned}$$

where the last equality holds due to Lemma 3. As \(\bar{K}=pK\), it follows that \(\Vert \tilde{\gamma }^{*}\Vert ^2=O_p(K)\). Therefore, by the elementary inequality \(\Vert a+b\Vert ^2 \le 2\Vert a\Vert ^2+2\Vert b\Vert ^2\) and Lemma 2, we obtain

$$\begin{aligned} \Vert \check{\gamma }^{*}\Vert ^2=\Vert \check{\gamma }^{*} - \tilde{\gamma }^{*} + \tilde{\gamma }^{*}\Vert ^2 \le 2\Vert \check{\gamma }^{*} - \tilde{\gamma }^{*} \Vert ^2 + 2\Vert \tilde{\gamma }^{*}\Vert ^2=o_p(K)+O_p(K)=O_p(K). \end{aligned}$$

This is equivalent to \(\Vert \check{\gamma }- \gamma _0 \Vert ^2=O_p(K^2/n)\) since \(\check{\gamma }^{*}=\theta _n^{-1}(\check{\gamma }-\gamma _0)\) and \(\theta _n=\sqrt{K/n}\).

In addition, by the properties of B-splines in De Boor (2001), there exist constants \(C_3\) and \(C_4\) such that

$$\begin{aligned} C_3 K \Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2 \le \Vert \check{\gamma }_j - \gamma _{0j} \Vert ^2 \le C_4 K \Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2. \end{aligned}$$

Thus, we can derive that \(\Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2=O_p(K/n)\). Consequently, by the fact that \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\), we have

$$\begin{aligned} \Vert \check{f}_j - f_{0j} \Vert ^2= & {} \Vert \check{\gamma }_j^T B_j- f_{0j} \Vert ^2 \le \Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2 + \Vert \gamma _{0j}^T B_j- f_{0j} \Vert ^2 \\= & {} O_p(K/n) + O_p(K^{-2r})= O_p(n^{-2r/(2r+1)}), \end{aligned}$$

where the last equality holds due to the assumption on the number of knots, \(K=O_p\big ( n^{1/(2r+1)} \big )\), under which \(K/n\) and \(K^{-2r}\) are both of order \(n^{-2r/(2r+1)}\). This completes the proof. \(\square \)

Proof of Theorem 2

We first prove (i). Let \(\delta _n=\theta _n+\lambda _1 +\lambda _2\); we begin by showing that \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \delta _n)\). Let \(\gamma =\gamma _0+\bar{K}^{1/2} \delta _n v\), where \(v\) is a \(\bar{K}\)-dimensional vector. It suffices to show that, for any given \(\xi >0\), there exists a sufficiently large \(C\) such that

$$\begin{aligned} P\left\{ \inf _{\Vert v\Vert =C}L_n^{\lambda }(\gamma ) > L_n^{\lambda }(\gamma _0) \right\} \ge 1-\xi . \end{aligned}$$
(10)

By virtue of the identity \(|x-y|-|x|=-y\text{ sgn }(x)+2(y-x)\{ I(0<x<y)-I(y<x<0) \}\) and the definition of \(L_n^{\lambda }(\gamma )\), it follows that

$$\begin{aligned}&L_n^{\lambda }(\gamma ) - L_n^{\lambda }(\gamma _0) \nonumber \\&\quad = \frac{1}{n}\sum _{i<j} { \big \{ | Y_{ij}-Z_{ij}^T \gamma |-| Y_{ij}-Z_{ij}^T \gamma _0 | \big \} } + n \sum _{k=1}^p \big \{ p_{\lambda _1}\big ( \sqrt{ \gamma _k^T D_k \gamma _k } \big ) - p_{\lambda _1}\big ( \sqrt{ \gamma _{0k}^T D_k \gamma _{0k} } \big ) \big \} \nonumber \\&\qquad + n \sum _{k=1}^p { \big \{p_{\lambda _2}\big ( \sqrt{ \gamma _{k}^T E_k \gamma _{k} } \big ) -p_{\lambda _2}\big ( \sqrt{ \gamma _{0k}^T E_k \gamma _{0k} } \big ) \big \}} \nonumber \\&\quad = \frac{-1}{n}\sum _{i<j} { Z_{ij}^T (\gamma -\gamma _0) \text{ sgn }(Y_{ij}-Z_{ij}^T\gamma _0) } + \frac{2}{n}\sum _{i<j} { (Z_{ij}^T\gamma -Y_{ij}) } \cdot \nonumber \\&\qquad \big \{ I(0<Y_{ij}-Z_{ij}^T\gamma _0< Z_{ij}^T (\gamma -\gamma _0)) - I(Z_{ij}^T (\gamma -\gamma _0)<Y_{ij}-Z_{ij}^T\gamma _0<0) \big \} \nonumber \\&\qquad +\, n \sum _{k=1}^p { \big \{ p_{\lambda _1}\big ( \sqrt{ \gamma _k^T D_k \gamma _k } \big ) - p_{\lambda _1}\big ( \sqrt{ \gamma _{0k}^T D_k \gamma _{0k} } \big ) \big \}} \nonumber \\&\qquad +\,n \sum _{k=1}^p { \big \{p_{\lambda _2}\big ( \sqrt{ \gamma _{k}^T E_k \gamma _{k} } \big ) -p_{\lambda _2}\big ( \sqrt{ \gamma _{0k}^T E_k \gamma _{0k} } \big )\big \}} \nonumber \\&\quad \triangleq L_1+L_2+L_3+L_4. \end{aligned}$$
(11)
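The expansion in (11) rests entirely on the identity quoted above, so a quick numerical sanity check may be reassuring (illustration only, not part of the proof; the identity holds for all \(x \ne 0\), and the boundary case \(x=0\) has probability zero for continuous errors):

```python
# A quick numerical check of the identity
# |x - y| - |x| = -y*sgn(x) + 2*(y - x)*{ I(0 < x < y) - I(y < x < 0) }.
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=10_000), rng.normal(size=10_000)
lhs = np.abs(x - y) - np.abs(x)
rhs = -y * np.sign(x) + 2 * (y - x) * (
    ((0 < x) & (x < y)).astype(float) - ((y < x) & (x < 0)).astype(float)
)
print(np.max(np.abs(lhs - rhs)))      # ~0 up to floating-point error
```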

From Lemma 3, it is easy to verify that \(\frac{-1}{n}\sum _{i<j} { \text{ sgn }(Y_{ij}-Z_{ij}^T\gamma _0) Z_{ij} }=O_p(\theta _n^{-1})\mathbf 1 _{\bar{K}}\); thus we have \(L_1=O_p( \delta _n \theta _n^{-1}\bar{K}^{1/2} \Vert v\Vert )=O_p( n^{1/2}\delta _n \Vert v\Vert )=o_p(n \delta _n^2 \Vert v\Vert )\) due to the assumption \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \). Moreover, using arguments similar to those in the proof of Lemma 1, we can obtain

$$\begin{aligned} L_2=\tau (\gamma -\gamma _0)^T Z^TZ (\gamma -\gamma _0)(1+o_p(1)). \end{aligned}$$

Applying Lemma A.3 of Huang et al. (2004) to \(L_2\) yields \(L_2=O_p(n\delta _n^2 \Vert v\Vert ^2)\). Hence, by choosing a sufficiently large \(C\), \(L_2\) dominates \(L_1\) with probability tending to 1.

On the other hand, it is well known from the properties of B-splines that \(D_k\) and \(E_k\) are of rank \(K-1\) and that all of their positive eigenvalues are of order \(1/K\). Then, according to the inequality \(p_{\lambda }(|x|)-p_{\lambda }(|y|) \le \lambda |x-y|\), we have

$$\begin{aligned} L_3 \le nC \lambda _1 \sum _{k=1}^p{\Vert \gamma _k-\gamma _{0k}\Vert }/\sqrt{K} = O_p(n \lambda _1 \delta _n \Vert v\Vert )=O_p(n \delta _n^2 \Vert v\Vert ). \end{aligned}$$

Thus \(L_3\) is dominated by \(L_2\) if a sufficiently large \(C\) is chosen. Similarly, it is easy to verify that \(L_4\) is also dominated by \(L_2\). Since \(L_2>0\), (10) holds, which means \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \delta _n)\).

Finally, we show that the convergence rate can be further improved to \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \theta _n)\). In fact, as the model is fixed as \(n \rightarrow \infty \), we can find a constant \(C>0\) such that \(\gamma _{0k}^TD_k\gamma _{0k} >C\) for \(k \le s\) and \(\gamma _{0k}^TE_k\gamma _{0k} >C\) for \(k \le p_0\). Since \(\Vert \hat{\gamma }-\gamma _0\Vert ^2=O_p(\bar{K} \delta _n^2) =o_p(\bar{K})\) by the above result and \(\lambda _k=o_p(1)\), \(k=1,2\), we have

$$\begin{aligned}&P \left( p_{\lambda _1}\left( \sqrt{ \gamma _{0k}^T D_k \gamma _{0k} }\right) =p_{\lambda _1}\left( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } \right) \right) \rightarrow 1, ~~~ k \le s, \\&P \left( p_{\lambda _2}\left( \sqrt{ \gamma _{0k}^T E_k \gamma _{0k} } \right) =p_{\lambda _2}\left( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \right) \right) \rightarrow 1, ~~~ k \le p_0. \end{aligned}$$

These facts indicate that

$$\begin{aligned}&P \left( n \sum _{k=1}^p {p_{\lambda _1}\left( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } \right) } - n \sum _{k=1}^p {p_{\lambda _1}\left( \sqrt{ \gamma _{0k}^T D_k \gamma _{0k} }\right) } \ge 0 \right) \rightarrow 1, \\&P \left( n \sum _{k=1}^p {p_{\lambda _2}\left( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \right) } - n \sum _{k=1}^p {p_{\lambda _2}\left( \sqrt{ \gamma _{0k}^T E_k \gamma _{0k} }\right) } \ge 0 \right) \rightarrow 1. \end{aligned}$$

Removing the regularization terms \(L_3\) and \(L_4\) in (11), the rate can be improved to \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \theta _n)\) by the same reasoning as above; that is, \(\Vert \hat{\gamma }-\gamma _0\Vert ^2= O_p(\bar{K} \theta _n^2)=O_p(K^2/n)\). As a consequence, following the same approach as in the second part of the proof of Theorem 1, we obtain \(\Vert \hat{f}_j - f_{0j} \Vert ^2 = O_p(n^{-2r/(2r+1)})\), which completes the proof of part (i).

Next, we focus on proving part (ii) as an illustration; part (iii) can be proved similarly, and its detailed proof is omitted. Suppose that \(B_j^T\hat{\gamma }_j\) does not represent a linear function for some \( p_0+1 \le j \le s\). Define \(\bar{\gamma }\) to be the same as \(\hat{\gamma }\) except that \(\hat{\gamma }_j\) is replaced by its projection onto the subspace \(\{ \gamma _j : B_j^T\gamma _j \text{ is a linear function} \}\). Therefore, we have

$$\begin{aligned} 0\ge & {} L_n^{\lambda }(\hat{\gamma }) - L_n^{\lambda }(\bar{\gamma }) = ( L_n^{\lambda }(\hat{\gamma }) - L_n^{\lambda }(\gamma _0) ) - ( L_n^{\lambda }(\bar{\gamma }) - L_n^{\lambda }(\gamma _0) ) \nonumber \\= & {} \frac{1}{n}\sum _{i<j} { \big \{ | Y_{ij}-Z_{ij}^T \hat{\gamma } |-| Y_{ij}-Z_{ij}^T \gamma _0 | \big \} } - \frac{1}{n}\sum _{i<j} { \big \{ | Y_{ij}-Z_{ij}^T \bar{\gamma } |-| Y_{ij}-Z_{ij}^T \gamma _0 | \big \} } \nonumber \\&+ \,n \sum _{k=1}^p { \big \{ p_{\lambda _1}( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } ) - p_{\lambda _1}( \sqrt{ \bar{\gamma }_{k}^T D_k \bar{\gamma }_{k} } ) \big \}} \nonumber \\&+\, n\sum _{k=1}^p { \big \{p_{\lambda _2}( \sqrt{ \hat{\gamma }_{k}^T E_k \hat{\gamma }_{k} } ) - p_{\lambda _2}( \sqrt{ \bar{\gamma }_{k}^T E_k \bar{\gamma }_{k} } ) \big \} } \nonumber \\\triangleq & {} M_1(\hat{\gamma },\gamma _0)-M_2(\bar{\gamma },\gamma _0) + M_3(\hat{\gamma },\bar{\gamma })+ M_4(\hat{\gamma },\bar{\gamma }). \end{aligned}$$
(12)

Note that, by the same arguments as in the derivation of (11), it is not difficult to verify that

$$\begin{aligned} M_1(\hat{\gamma },\gamma _0)= \tau (\hat{\gamma }-\gamma _0)^T Z^TZ (\hat{\gamma }-\gamma _0)(1+o_p(1)) + \theta _n^{-1} (\hat{\gamma }-\gamma _0)^T S_n(0) \end{aligned}$$

and

$$\begin{aligned} M_2(\bar{\gamma },\gamma _0)= \tau (\bar{\gamma }-\gamma _0)^T Z^TZ (\bar{\gamma }-\gamma _0)(1+o_p(1)) + \theta _n^{-1} (\bar{\gamma }-\gamma _0)^T S_n(0). \end{aligned}$$

Therefore, we can show that

$$\begin{aligned}&M_1(\hat{\gamma },\gamma _0)-M_2(\bar{\gamma },\gamma _0) \\&\quad = \tau \{ (\hat{\gamma }-\bar{\gamma }+\bar{\gamma }-\gamma _0)^T Z^TZ (\hat{\gamma }-\bar{\gamma }+\bar{\gamma }-\gamma _0)\\&\qquad - (\bar{\gamma }-\gamma _0)^T Z^TZ (\bar{\gamma }-\gamma _0) \} (1+o_p(1)) +\, \theta _n^{-1} (\hat{\gamma }-\bar{\gamma })^T S_n(0) \\&\quad = \tau (\hat{\gamma }-\bar{\gamma })^T Z^TZ (\hat{\gamma }-\bar{\gamma }) + 2 \tau (\bar{\gamma }-\gamma _0)^T Z^TZ (\hat{\gamma }-\bar{\gamma }) + \theta _n^{-1} (\hat{\gamma }-\bar{\gamma })^T S_n(0) \\&\quad \ge 2 \tau (\bar{\gamma }-\gamma _0)^T Z^TZ (\hat{\gamma }-\bar{\gamma }) + \theta _n^{-1} (\hat{\gamma }-\bar{\gamma })^T S_n(0) \triangleq N_1+N_2. \end{aligned}$$

Recall that \(\bar{\gamma }_k\) is the projection of \(\hat{\gamma }_{k}\) onto \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\), so \(\hat{\gamma }_k-\bar{\gamma }_k\) is orthogonal to this space. Furthermore, the space \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\) is exactly the eigenspace of \(E_k\) corresponding to the zero eigenvalue. Consequently, by the characterization of eigenvalues in terms of the Rayleigh quotient, \((\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k) / \Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert ^2\) lies between the minimum and maximum positive eigenvalues of \(E_k\), which are of order \(1/K\). Taking into account the fact that \(\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}=(\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k)\) since \(\bar{\gamma }_{k}^T E_k \bar{\gamma }_{k}=0\), we derive \(\Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert =O_p(\sqrt{K \hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}})\). According to Lemma 3, Lemma A.3 of Huang et al. (2004) and the result \(\Vert \bar{\gamma }-\gamma _0\Vert =O_p(K/\sqrt{n})\) from part (i), it follows that

$$\begin{aligned}&\Vert N_1 \Vert \le O_p\left( \frac{n}{K} \Vert \bar{\gamma }-\gamma _0 \Vert \cdot \Vert \hat{\gamma }-\bar{\gamma }\Vert \right) = O_p\left( \sqrt{nK}\sum _{k=1}^p { \sqrt{\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}} }\right) , \\&\Vert N_2 \Vert \le O_p\left( \theta _n^{-1} \Vert \hat{\gamma }-\bar{\gamma }\Vert \cdot \Vert S_n(0)\Vert \right) = O_p\left( \sqrt{nK}\sum _{k=1}^p { \sqrt{\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}} }\right) . \end{aligned}$$

These facts lead to

$$\begin{aligned} M_1(\hat{\gamma },\gamma _0)-M_2(\bar{\gamma },\gamma _0) \ge - O_p\left( \sqrt{nK}\sum _{k=1}^p { \sqrt{\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}} } \right) . \end{aligned}$$
(13)
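To spell out the Rayleigh-quotient step behind (13): for any vector \(v\) orthogonal to the null space of the positive semi-definite matrix \(E_k\),

$$\begin{aligned} \lambda _{\min }^{+}(E_k) \Vert v\Vert ^2 \le v^T E_k v \le \lambda _{\max }^{+}(E_k) \Vert v\Vert ^2, \end{aligned}$$

where \(\lambda _{\min }^{+}\) and \(\lambda _{\max }^{+}\) denote the smallest and largest positive eigenvalues of \(E_k\). Since both are of order \(1/K\), taking \(v=\hat{\gamma }_k-\bar{\gamma }_k\) gives \(\Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert ^2 \asymp K (\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k) = K \hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}\), which is exactly the bound used above.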

On the other hand, according to the proof of (i), we have \(P \big ( p_{\lambda _1}( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } ) = p_{\lambda _1}( \sqrt{ \bar{\gamma }_{k}^T D_k \bar{\gamma }_{k} } )\big ) \rightarrow 1\) and \(P \big ( \bar{\gamma }_k^T E_k \bar{\gamma }_k =0 \big ) \rightarrow 1\). Substituting these results into (12) yields

$$\begin{aligned} P \left( M_1(\hat{\gamma },\gamma _0)-M_2(\bar{\gamma },\gamma _0)+ n \sum _{k=1}^p { p_{\lambda _2}( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } )} \le 0 \right) \rightarrow 1. \end{aligned}$$
(14)

In addition, based on the result of (i) and the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \), it is easy to verify that

$$\begin{aligned} \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } = \sqrt{ (\hat{\gamma }_k-\gamma _{0k})^T E_k (\hat{\gamma }_k-\gamma _{0k}) } =O_p(\sqrt{K/n})=o_p(\lambda _2). \end{aligned}$$

Hence, we have \(P \big ( p_{\lambda _2}( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } )= \lambda _2\sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \big ) \rightarrow 1\) by the definition of the SCAD penalty function.
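For completeness, recall the SCAD penalty of Fan and Li (2001), with \(a>2\) (the commonly used value is \(a=3.7\)):

$$\begin{aligned} p_{\lambda }(t)= \begin{cases} \lambda t, & 0\le t\le \lambda ,\\ \frac{2a\lambda t-t^2-\lambda ^2}{2(a-1)}, & \lambda <t\le a\lambda ,\\ \frac{(a+1)\lambda ^2}{2}, & t>a\lambda . \end{cases} \end{aligned}$$

In particular, \(p_{\lambda }\) is exactly linear on \([0,\lambda ]\), so any argument that is \(o_p(\lambda )\) eventually falls into the first regime; this is precisely the property invoked above with \(\lambda =\lambda _2\).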

As a consequence, if \(\hat{\gamma }_k^T E_k \hat{\gamma }_k > 0\), we have

$$\begin{aligned} n \sum _{k=1}^p { p_{\lambda _2}\left( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k }\right) }=O_p\left( n \lambda _2 \sum _{k=1}^p { \sqrt{\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}} }\right) . \end{aligned}$$
(15)

Combining (13) and (15) with the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \) (so that \(n\lambda _2\) dominates \(\sqrt{nK}\)), it follows that

$$\begin{aligned} M_1(\hat{\gamma },\gamma _0)-M_2(\bar{\gamma },\gamma _0)+ n \sum _{k=1}^p { p_{\lambda _2}\left( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \right) } > 0, \end{aligned}$$

which contradicts (14). This completes the proof of Theorem 2. \(\square \)

Proof of Theorem 3

Note that, by the results of Theorem 2, we only need to consider a correctly specified partially linear additive model as in (2) without regularization terms. Specifically, the corresponding objective function is

$$\begin{aligned} \Phi _n(\alpha ,\beta )= \frac{1}{n}\sum _{i<j} { | Y_{ij}-V_{ij}^T \alpha - X_{ij}^{(2)^T} \beta | }, \end{aligned}$$

where \(V_i= \big ( B_1(X_{i1})^T,\ldots ,B_{p_0}(X_{ip_0})^T \big )^T\), \(X_i^{(2)}=(X_{i(p_0+1)},\ldots ,X_{is})^T\) and \(\alpha =(\gamma _1^T,\ldots ,\gamma _{p_0}^T)^T\) is the corresponding coefficient vector of the spline approximation. Let \((\hat{\alpha }^T,\hat{\beta }^T)^T=\arg \min \Phi _n(\alpha ,\beta )\), \(\tilde{\Delta }_i=\sum _{l=1}^{p_0} { f_{0l}(X_{il}) }-V_i^T \hat{\alpha } \), \(\delta _n= n^{-1/2}\) (redefined here with a slight abuse of notation) and \(\beta ^*=\delta _n^{-1}(\beta -\beta _0)\). Then \(\hat{\beta }^*\) must be the minimizer of the following function:

$$\begin{aligned} \Phi _n^{*}(\beta ^{*})= \frac{1}{n}\sum _{i<j} { | (\varepsilon _i+\tilde{\Delta }_i) - (\varepsilon _j+\tilde{\Delta }_j)- \delta _n X_{ij}^{(2)^T} \beta ^{*} | }. \end{aligned}$$

Denote by \(S_n^*(\beta ^{*})\) the gradient function of \(\Phi _n^{*}(\beta ^{*})\), that is

$$\begin{aligned} S_n^*(\beta ^{*}) = \frac{\partial \Phi _n^{*}(\beta ^{*})}{\partial \beta ^{*}}= -\frac{\delta _n}{n} \sum _{i \ne j} { \text{ sgn } \{ (\varepsilon _i+\tilde{\Delta }_i) - (\varepsilon _j+\tilde{\Delta }_j)- \delta _n X_{ij}^{(2)^T} \beta ^{*} \} X_{ij}^{(2)} }. \end{aligned}$$

Then, we can show that

$$\begin{aligned} S_n^*(\beta ^{*})-S_n^*(0)= & {} -\frac{\delta _n}{n} \sum _{i \ne j} { \text{ sgn } \big ( (\varepsilon _i+\tilde{\Delta }_i) - (\varepsilon _j+\tilde{\Delta }_j)- \delta _n X_{ij}^{(2)^T} \beta ^{*} \big ) X_{ij}^{(2)}} \\&+\frac{\delta _n}{n} \sum _{i \ne j} { \text{ sgn } \big ( (\varepsilon _i+\tilde{\Delta }_i) - (\varepsilon _j+\tilde{\Delta }_j) \big ) X_{ij}^{(2)} }. \end{aligned}$$

Taking into consideration the results obtained in Theorem 2, we have \(\tilde{\Delta }_i=O_p(K^{-r})=o_p(1)\) as \(n \rightarrow \infty \). Hence, following a proof similar to that of Lemma 1, it is not difficult to obtain

$$\begin{aligned} S_n^*(\beta ^{*})-S_n^*(0)=2\tau \delta _n^2 \Sigma \beta ^{*}, \end{aligned}$$
(16)

where \(\Sigma \) is defined in assumption (A3). Further, let \(B_n(\beta ^{*})=\tau \delta _n^2 \beta ^{*^T} \Sigma \beta ^{*} + \beta ^{*^T} S_n^*(0) + \Phi _n^{*}(0)\) and denote its minimizer by \(\tilde{\beta }^*\). It is then not difficult to verify that \(\tilde{\beta }^*=-(2\tau )^{-1}(\delta _n^2 \Sigma )^{-1}S_n^*(0)\). Based on Equation (16) and arguments similar to those in Lemma 2, it follows that

$$\begin{aligned} \hat{\beta }^*=\tilde{\beta } ^*+o_p(1)=-(2\tau )^{-1}(\delta _n^2 \Sigma )^{-1}S_n^*(0)+o_p(1). \end{aligned}$$
(17)

In addition, since the random error \(\varepsilon _i\) is independent of \(X_i\), some calculations yield

$$\begin{aligned} \delta _n^{-2}S_n^*(0) ~\mathop \rightarrow \limits ^d~ N \big (0,E\big \{ (2H(\varepsilon )-1)^2 \big \} \Sigma \big ), \end{aligned}$$
(18)

where \(H(\cdot )\) stands for the cumulative distribution function of \(\varepsilon \). Furthermore, noting that \(H(\varepsilon )\) is uniformly distributed on \((0,1)\), it can be shown that

$$\begin{aligned} E\{ (2H(\varepsilon )-1)^2 \}= & {} \int (2H(\varepsilon )-1)^2 h(\varepsilon ) d\varepsilon \nonumber \\= & {} \int 4H(\varepsilon )^2 h(\varepsilon ) d\varepsilon - 4 \int H(\varepsilon ) h(\varepsilon ) d\varepsilon + \int h(\varepsilon ) d\varepsilon \nonumber \\= & {} \int 4H(\varepsilon )^2 dH(\varepsilon ) - 4 \int H(\varepsilon ) dH(\varepsilon )+1 = 1/3. \end{aligned}$$
(19)

Therefore, substituting (18) and (19) into (17), we complete the proof. \(\square \)
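As an aside (an illustrative cross-check, not part of the original proof), the constant \(1/3\) in (19) is distribution-free because \(H(\varepsilon )\sim \text{Uniform}(0,1)\) by the probability integral transform; a short simulation confirms this:

```python
# Monte Carlo cross-check of (19): E{(2H(eps) - 1)^2} = 1/3 for any
# continuous error distribution, since H(eps) ~ Uniform(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
for dist in (stats.norm, stats.t(df=3), stats.laplace):
    eps = dist.rvs(size=200_000, random_state=rng)
    print(np.mean((2 * dist.cdf(eps) - 1) ** 2))   # each value ~ 1/3
```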

Proof of Theorem 4

Based on the asymptotic results of Theorem 3 and those for the least squares B-spline estimate given in Theorem 3 of Lian (2012a), we immediately obtain \(\text{ ARE }(\hat{\beta }_{RR},\hat{\beta }_{LS})=12 \tau ^2 \sigma ^2\). In addition, a result of Hodges and Lehmann (1956) indicates that this ARE has a lower bound of 0.864, attained at the density \(h(x)=\frac{3}{20\sqrt{5}}(5-x^2)I(|x|\le \sqrt{5})\). This completes the proof. \(\square \)
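As a worked numerical check (assuming \(\tau =\int h^2(x)\,dx\), as in the classical Wilcoxon theory; \(\tau \) is defined in Theorem 3): for normal errors with variance \(\sigma ^2\) one has \(\int h^2(x)\,dx = 1/(2\sigma \sqrt{\pi })\), so

$$\begin{aligned} \text{ ARE }(\hat{\beta }_{RR},\hat{\beta }_{LS})=12 \tau ^2 \sigma ^2 = \frac{12\sigma ^2}{4\pi \sigma ^2}=\frac{3}{\pi } \approx 0.955, \end{aligned}$$

while for the least favorable density \(h\) above, \(\sigma ^2=1\) and \(\int h^2(x)\,dx=3\sqrt{5}/25\), giving \(12 \big ( 3\sqrt{5}/25 \big )^2=540/625=0.864\) exactly.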


Cite this article

Yang, J., Yang, H. & Lu, F. Rank-based shrinkage estimation for identification in semiparametric additive models. Stat Papers 60, 1255–1281 (2019). https://doi.org/10.1007/s00362-017-0874-z
