Abstract
In this paper, we propose a novel and robust procedure for model identification in semiparametric additive models based on rank regression and spline approximation. Under mild conditions, we establish the theoretical properties of the identified nonparametric functions and the linear parameters. Furthermore, we demonstrate that the proposed rank estimate achieves substantial efficiency gains across a wide spectrum of non-normal error distributions, while losing almost no efficiency relative to the least squares estimate under normal errors. Even in the worst-case scenario, the asymptotic relative efficiency of the proposed rank estimate versus the least squares estimate, which is shown to have an expression closely related to that of the signed-rank Wilcoxon test in comparison with the t-test, has a lower bound of 0.864. Finally, an efficient algorithm is presented for computation, and the selection of tuning parameters is discussed. Simulation studies and a real data analysis illustrate the finite sample performance of the proposed method.
References
David HA (1998) Early sample measures of variability. Stat Sci 13:368–377
De Boor C (2001) A practical guide to splines, revised edn. Springer, New York
Deng G, Liang H (2010) Model averaging for semiparametric additive partial linear models. Sci China Math 53:1363–1376
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Feng L, Zou C, Wang Z, Wei X, Chen B (2015) Robust spline-based variable selection in varying coefficient model. Metrika 78:85–118
Härdle W, Huet S, Mammen E, Sperlich S (2004) Bootstrap inference in semiparametric generalized additive models. Econ Theory 20:265–300
Hettmansperger TP, McKean JW (2011) Robust nonparametric statistical methods, 2nd edn. Chapman and Hall, Boca Raton
Hodges JL, Lehmann EL (1956) The efficiency of some nonparametric competitors of the t-test. Ann Math Stat 27:324–335
Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313
Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14:763–788
Jiang J, Zhou H, Jiang X, Peng J (2007) Generalized likelihood ratio tests for the structure of semiparametric additive models. Can J Stat 35:381–398
Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B 72:49–69
Leng C (2010) Variable selection and coefficient estimation via regularized rank regression. Stat Sin 20:167–181
Li Q (2000) Efficient estimation of additive partially linear models. Int Econ Rev 41:1073–1092
Li J, Li Y, Zhang R (2015) B spline variable selection for the single index models. Stat Pap. doi:10.1007/s00362-015-0721-z
Lian H (2012a) Shrinkage estimation for identification of linear components in additive models. Stat Probab Lett 82:225–231
Lian H (2012b) Semiparametric estimation of additive quantile regression models by two-fold penalty. J Bus Econ Stat 30:337–350
Liu X, Wang L, Liang H (2011) Estimation and variable selection for semiparametric additive partial linear models. Stat Sin 21:1225–1248
Mammen E, Park B (2006) A simple smooth backfitting method for additive models. Ann Stat 34:2252–2271
Opsomer JD, Ruppert D (1999) A root-n consistent backfitting estimator for semiparametric additive modeling. J Comput Graph Stat 8:715–732
Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econ Theory 7:186–199
Sievers GL, Abebe A (2004) Rank estimation of regression coefficients using iterated reweighted least squares. J Stat Comput Simul 74:821–831
Sun J, Lin L (2014) Local rank estimation and related test for varying-coefficient partially linear models. J Nonparametr Stat 26:187–206
Tang Q (2015) Robust estimation for spatial semiparametric varying coefficient partially linear regression. Stat Pap 56:1137–1161
Wang L, Kai B, Li R (2009) Local rank inference for varying coefficient models. J Am Stat Assoc 104:1631–1645
Wang M, Song L (2013) Identification for semiparametric varying coefficient partially linear models. Stat Probab Lett 83:1311–1320
Wei C, Liu C (2012) Statistical inference on semi-parametric partial linear additive models. J Nonparametr Stat 24:809–823
Wei C, Luo Y, Wu X (2012) Empirical likelihood for partially linear additive errors-in-variables models. Stat Pap 53:485–496
Xue L (2009) Consistent variable selection in additive models. Stat Sin 19:1281–1296
Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31:333–346
Yu K, Park B, Mammen E (2008) Smooth backfitting in generalized additive models. Ann Stat 36:228–260
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Acknowledgements
The authors are grateful to the Editor, Associate Editor and two anonymous referees whose comments led to a significant improvement of the paper. This work was supported in part by the National Natural Science Foundation of China (Grant No. 11671059).
Appendix
Throughout the proofs, C denotes a generic constant that may take different values at different places. Let \(\gamma _0=(\gamma _{01}^T,\gamma _{02}^T, \ldots ,\gamma _{0p}^T)^T\) be a pK-dimensional vector satisfying \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\) for \( 1\le j \le p_0\) and \(f_{0j}=B_j^T \gamma _{0j}\) for \(p_0 < j \le p\). To prove the theoretical results, we first introduce some notation. Let
Based on the notations, the objective function \(L_n(\gamma )\) defined in (4) can be rewritten as
Further, denote by \(S_n(\gamma ^{*})\) the gradient of \(L_n^{*}(\gamma ^{*})\), that is,
where \(\text{ sgn }(\cdot )\) denotes the sign function.
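To fix ideas, the following is a minimal numerical sketch of a Wilcoxon-type rank estimate for a plain linear model — an illustration only, not the paper's penalized spline procedure; the data-generating model, sample size, and optimizer are our own assumptions. The objective is the pairwise dispersion \(\sum _{i<j}|e_i-e_j|\) of the residuals, whose subgradient has the pairwise sign form displayed above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 80, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0])
# Heavy-tailed errors, where rank estimates are expected to gain efficiency.
y = X @ beta_true + rng.standard_t(df=3, size=n)

def rank_loss(beta):
    """Wilcoxon-type dispersion: sum of |e_i - e_j| over all pairs i < j."""
    e = y - X @ beta
    d_ij = e[:, None] - e[None, :]
    return np.abs(d_ij[np.triu_indices(n, k=1)]).sum()

# The loss is convex (pairwise absolute values of linear functions of beta).
beta_hat = minimize(rank_loss, x0=np.zeros(d), method="Nelder-Mead").x
```

Note that the pairwise loss is invariant to an intercept shift, which is why only slope coefficients are estimated here.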
We first collect several lemmas that are used frequently in the sequel; for their detailed proofs, see Feng et al. (2015).
Lemma 1
Suppose that the assumptions (A1)–(A4) hold, then
where \(\tau \) is defined in Theorem 3 and \(\mathbf 1 _{\bar{K}}\) is a \(\bar{K}\)-dimensional vector of ones.
Lemma 2
Let \(\hat{\gamma }^{*}=\arg \min L_n^{*}(\gamma ^{*})\) and \(\tilde{\gamma }^{*}=\arg \min Q_n(\gamma ^{*})\). Suppose that the assumptions (A1)–(A4) hold, then
Lemma 3
Suppose that the assumptions (A1)–(A4) hold, then
Proof of Theorem 1
By the definition of \(A_n(\gamma ^{*})\), it follows from the convexity lemma in Pollard (1991) that
Note that, according to Lemma A.3 of Huang et al. (2004), there exists an interval \([C_1,C_2]\), \(0<C_1<C_2<\infty \), such that all the eigenvalues of \(\frac{K}{n}Z^TZ\) fall into \([C_1,C_2]\) with probability tending to 1. Write \(S_n(0)=( S_{n1}(0),\ldots ,S_{n\bar{K}}(0) )^T\), then we have
where the last equality holds due to Lemma 3. As \(\bar{K}=pK\), it follows that \(|\tilde{\gamma }^{*}|^2=O_p(K)\). Therefore, based on the triangle inequality and Lemma 2, we obtain
This is equivalent to \(\Vert \check{\gamma }- \gamma _0 \Vert ^2=O_p(K^2/n)\) since \(\check{\gamma }^{*}=\theta _n^{-1}(\check{\gamma }-\gamma _0)\) and \(\theta _n=\sqrt{K/n}\).
In addition, by the properties of splines in De Boor (2001), there exist constants \(C_3\) and \(C_4\) satisfying
Thus, we can derive that \(\Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2=O_p(K/n)\). Consequently, by the fact that \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\), we have
where the last equality holds due to the assumption that the number of knots \(K=O_p\big ( n^{1/(2r+1)} \big )\). This completes the proof. \(\square \)
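The approximation-error ingredient \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\) used above can be observed numerically. The following sketch (the test function, grid, and knot placements are our own illustrative choices, not taken from the paper) fits least-squares cubic B-splines with an increasing number of interior knots and records the sup-norm error:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

f = lambda x: np.sin(2 * np.pi * x)   # smooth test function (our choice)
x = np.linspace(0.0, 1.0, 2001)

def sup_error(K, degree=3):
    """Sup-norm error of the least-squares cubic spline with K interior knots."""
    interior = np.linspace(0.0, 1.0, K + 2)[1:-1]
    # Clamped knot vector: repeat boundary knots degree+1 times.
    t = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    spl = make_lsq_spline(x, f(x), t, k=degree)
    return float(np.max(np.abs(spl(x) - f(x))))

errs = [sup_error(K) for K in (2, 4, 8, 16)]  # error shrinks as K grows
```

For a smooth target and cubic splines the error decays rapidly in K, consistent with the \(K^{-r}\) rate.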
Proof of Theorem 2
We first prove (i). Let \(\delta _n=\theta _n+\lambda _1 +\lambda _2\). We begin by showing that \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \delta _n)\). Let \(\gamma =\gamma _0+\bar{K}^{1/2} \delta _n v\), where v is a \(\bar{K}\)-dimensional vector. It suffices to show that, for any given \(\xi >0\), there exists a sufficiently large C such that
By virtue of the identity \(|x-y|-|x|=-y\text{ sgn }(x)+2(y-x)\{ I(0<x<y)-I(y<x<0) \}\) and the definition of \(L_n^{\lambda }(\gamma )\), it follows that
From Lemma 3, it is easy to verify that \(\frac{-1}{n}\sum _{i<j} { \text{ sgn }(Y_{ij}-Z_{ij}^T\gamma _0) Z_{ij} }=\theta _n^{-1} \mathbf 1 _{\bar{K}}\); thus \(L_1=O_p( \delta _n \theta _n^{-1}\bar{K}^{1/2} \Vert v\Vert )=O_p( n^{1/2}\delta _n \Vert v\Vert )=o_p(n \delta _n^2 \Vert v\Vert )\) by the assumption \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \). Moreover, by arguments similar to those in the proof of Lemma 1, we obtain
Applying Lemma A.3 of Huang et al. (2004) to \(L_2\) yields \(L_2=O_p(n\delta _n^2 \Vert v\Vert ^2)\). Hence, by choosing a sufficiently large C, \(L_2\) dominates \(L_1\) with probability tending to 1.
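The algebraic identity \(|x-y|-|x|=-y\,\text{sgn}(x)+2(y-x)\{ I(0<x<y)-I(y<x<0) \}\) invoked at the start of this proof holds for every \(x\ne 0\) and can be checked numerically; a quick sketch (random inputs of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)   # x != 0 almost surely
y = rng.normal(size=100_000)

lhs = np.abs(x - y) - np.abs(x)
rhs = (-y * np.sign(x)
       + 2.0 * (y - x) * (((0 < x) & (x < y)).astype(float)
                          - ((y < x) & (x < 0)).astype(float)))
max_gap = float(np.max(np.abs(lhs - rhs)))  # zero up to floating-point error
```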
On the other hand, by the well-known properties of B-splines, \(D_k\) and \(E_k\) have rank \(K-1\) and all their positive eigenvalues are of order \(1/K\); then, by the inequality \(p_{\lambda }(|x|)-p_{\lambda }(|y|) \le \lambda |x-y|\), we have
Thus \(L_3\) is dominated by \(L_2\) if a sufficiently large C is chosen. Similarly, \(L_4\) is also dominated by \(L_2\). Recalling that \(L_2>0\), (10) holds, which means \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \delta _n)\).
Finally, we show that the convergence rate can be further improved to \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \theta _n)\). In fact, as the model is fixed as \(n \rightarrow \infty \), we can find a constant \(C>0\) such that \(\gamma _{0k}^TD_k\gamma _{0k} >C\) for \(k \le s\) and \(\gamma _{0k}^TE_k\gamma _{0k} >C\) for \(k \le p_0\). Since \(\Vert \hat{\gamma }-\gamma _0\Vert ^2=O_p(\bar{K} \delta _n^2) =o_p(\bar{K})\) by the result above and \(\lambda _k=o_p(1), k=1,2\), we have
These facts indicate that
Removing the regularization terms \(L_3\) and \(L_4\) in (11), the rate can be improved to \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \theta _n)\) by the same reasoning as above; that is, \(\Vert \hat{\gamma }-\gamma _0\Vert ^2= O_p(\bar{K} \theta _n^2)=O_p(K^2/n)\). As a consequence, following the same approach as in the proof of the second part of Theorem 1, we obtain \(\Vert \hat{f}_j - f_{0j} \Vert ^2 = O_p(n^{-2r/(2r+1)})\), which completes the proof of part (i).
Next, we focus on proving part (ii); part (iii) can be proved similarly and its details are omitted. Suppose that \(B_j^T\hat{\gamma }_j\) does not represent a linear function for some \( p_0+1 \le j \le s\). Define \(\bar{\gamma }\) to be the same as \(\hat{\gamma }\) except that \(\hat{\gamma }_j\) is replaced by its projection onto the subspace { \(\gamma _j : B_j^T\gamma _j\) represents a linear function }. Therefore, we have
Note that, by the same arguments as in the derivation of (11), it is not difficult to verify that
and
Therefore, we can show that
Recall that \(\bar{\gamma }_k\) is the projection of \(\hat{\gamma }_{k}\) onto \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\), so \(\hat{\gamma }_k-\bar{\gamma }_k\) is orthogonal to this space. Furthermore, the space \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\) is exactly the eigenspace of \(E_k\) corresponding to the zero eigenvalue. Consequently, by the characterization of eigenvalues in terms of the Rayleigh quotient, \((\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k) / \Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert ^2\) lies between the minimum and maximum positive eigenvalues of \(E_k\), which are of order \(1/K\). Taking into account the fact that \(\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}=(\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k)\) since \(\bar{\gamma }_{k}^T E_k \bar{\gamma }_{k}=0\), we derive \(\Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert =O_p(\sqrt{K \hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}})\). According to Lemma 3, Lemma A.3 of Huang et al. (2004) and the result \(\Vert \bar{\gamma }-\gamma _0\Vert =O_p(K/\sqrt{n})\) from part (i), it follows that
These facts lead to
On the other hand, according to the proof of (i), we have \(P \big ( p_{\lambda _1}( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } ) = p_{\lambda _1}( \sqrt{ \bar{\gamma }_{k}^T D_k \bar{\gamma }_{k} } )\big ) \rightarrow 1\) and \(P \big ( \bar{\gamma }_k^T E_k \bar{\gamma }_k =0 \big ) \rightarrow 1\). Substituting these results into (12) yields
In addition, based on the result of (i) and the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \), it is easy to verify that
Hence, we have \(P \big ( p_{\lambda _2}( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } )= \lambda _2\sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \big ) \rightarrow 1\) by the definition of SCAD penalty function.
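For reference, the SCAD penalty of Fan and Li (2001) satisfies \(p_{\lambda }(t)=\lambda t\) for \(0\le t\le \lambda \), is quadratic on \((\lambda ,a\lambda ]\), and is constant at \(\lambda ^2(a+1)/2\) for \(t>a\lambda \) (with the usual choice \(a=3.7\)); this linear-near-zero behaviour is exactly what is used above. A direct transcription:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(t) for t >= 0 (Fan and Li 2001)."""
    t = np.asarray(t, dtype=float)
    small = lam * t                                            # t <= lam
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))    # lam < t <= a*lam
    large = lam**2 * (a + 1) / 2                               # t > a*lam
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))
```

The three pieces match at \(t=\lambda \) and \(t=a\lambda \), so the penalty is continuous.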
As a consequence, if \(\hat{\gamma }_k^T E_k \hat{\gamma }_k > 0\), we have
Combining (13) and (15) along with the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \), it follows that
which contradicts (14). This completes the proof of Theorem 2. \(\square \)
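The Rayleigh-quotient step in the proof of part (ii) — for v orthogonal to the null space of a positive semidefinite matrix E, the quotient \(v^TEv/\Vert v\Vert ^2\) lies between the smallest and largest positive eigenvalues of E — can be illustrated with a generic rank-deficient matrix standing in for \(E_k\) (whose exact form depends on the B-spline basis and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8
# A rank-(K-1) positive semidefinite matrix with a one-dimensional null space.
A = rng.normal(size=(K, K - 1))
E = A @ A.T

eigvals, eigvecs = np.linalg.eigh(E)   # ascending eigenvalues
null_vec = eigvecs[:, 0]               # eigenvector of the (numerically) zero eigenvalue
pos = eigvals[1:]                      # positive eigenvalues

v = rng.normal(size=K)
v -= (v @ null_vec) * null_vec         # project v orthogonal to the null space
rayleigh = float(v @ E @ v) / float(v @ v)
```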
Proof of Theorem 3
Note that, by the results of Theorem 2, we only need to consider a correctly specified partially linear additive model as (2) without regularization terms. Specifically, the corresponding objective function is
where \(V_i= \big ( B_1(X_{i1})^T,\ldots ,B_{p_0}(X_{ip_0})^T \big )^T\), \(X_i^{(2)}=(X_{i(p_0+1)},\ldots ,X_{is})^T\) and \(\alpha =(\gamma _1^T,\ldots ,\gamma _{p_0}^T)^T\) is the corresponding coefficient vector of the spline approximation. Let \((\hat{\alpha }^T,\hat{\beta }^T)^T=\arg \min \Phi _n(\alpha ,\beta )\), \(\tilde{\Delta }_i=\sum _{l=1}^{p_0} { f_{0l}(X_{il}) }-V_i^T \hat{\alpha } \), \(\delta _n= n^{-1/2}\) and \(\beta ^*=\delta _n^{-1}(\beta -\beta _0)\). Then, \(\hat{\beta }^*\) must be the minimizer of the following function
Denote by \(S_n^*(\beta ^{*})\) the gradient function of \(\Phi _n^{*}(\beta ^{*})\), that is
Then, we can show that
In view of the results obtained in Theorem 2, we have \(\tilde{\Delta }_i=O_p(K^{-r})=o_p(1)\) as \(n \rightarrow \infty \). Hence, following a proof similar to that of Lemma 1, it is not difficult to obtain
where \(\Sigma \) is defined in assumption (A3). Further, let \(B_n(\beta ^{*})=\tau \delta _n^2 \beta ^{*^T} \Sigma \beta ^{*} + \beta ^{*^T} S_n^*(0) + \Phi _n^{*}(0)\), with minimizer denoted by \(\tilde{\beta }^*\). It is not difficult to verify that \(\tilde{\beta }^*=-(2\tau )^{-1}(\delta _n^2 \Sigma )^{-1}S_n^*(0)\). Based on Equation (16) and arguments similar to those of Lemma 2, it follows that
In addition, by the assumption that the random error \(\varepsilon _i\) is independent of \(X_i\), together with some calculations, we have
where \(H(\cdot )\) stands for the cumulative distribution function of \(\varepsilon \). Furthermore, it can be shown that
Therefore, substituting (18) and (19) into (17), we complete the proof. \(\square \)
Proof of Theorem 4
Based on the asymptotic results of Theorem 3 and the least squares B-spline estimate given in Theorem 3 of Lian (2012a), we immediately obtain \(\text{ ARE }(\hat{\beta }_{RR},\hat{\beta }_{LS})=12 \tau ^2 \sigma ^2\). In addition, a result of Hodges and Lehmann (1956) indicates that this ARE has a lower bound of 0.864, attained at the density \(h(x)=\frac{3}{20\sqrt{5}}(5-x^2)I(|x|\le \sqrt{5})\). This completes the proof. \(\square \)
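As a numerical check of Theorem 4 (a sketch assuming mean-zero error densities), \(\tau =\int h^2(x)\,dx\) and \(\sigma ^2\) can be computed by quadrature: the Hodges–Lehmann density gives exactly 0.864, while normal errors give \(3/\pi \approx 0.955\), consistent with the claim in the abstract that little efficiency is lost under normality.

```python
import numpy as np
from scipy.integrate import quad

def are_rank_vs_ls(density, support):
    """ARE(beta_RR, beta_LS) = 12 * tau^2 * sigma^2, with tau = integral of h^2."""
    a, b = support
    tau = quad(lambda x: density(x) ** 2, a, b)[0]
    sigma2 = quad(lambda x: x * x * density(x), a, b)[0]   # mean-zero density assumed
    return 12 * tau**2 * sigma2

# Least favourable density of Hodges and Lehmann (1956): attains the bound 0.864.
c = 3 / (20 * np.sqrt(5))
h = lambda x: c * (5 - x**2)
are_min = are_rank_vs_ls(h, (-np.sqrt(5), np.sqrt(5)))

# Standard normal errors: ARE = 3/pi, so almost no efficiency is lost.
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
are_normal = are_rank_vs_ls(phi, (-np.inf, np.inf))
```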
Yang, J., Yang, H. & Lu, F. Rank-based shrinkage estimation for identification in semiparametric additive models. Stat Papers 60, 1255–1281 (2019). https://doi.org/10.1007/s00362-017-0874-z