Abstract
In this paper, we propose a novel and robust procedure for model identification in semiparametric additive models based on rank regression and spline approximation. Under mild conditions, we establish the theoretical properties of the identified nonparametric functions and the linear parameters. Furthermore, we demonstrate that the proposed rank estimate achieves substantial efficiency gains across a wide spectrum of non-normal error distributions, while losing almost no efficiency relative to the least squares estimate under normal errors. Even in the worst-case scenario, the asymptotic relative efficiency of the proposed rank estimate versus the least squares estimate, which is shown to have an expression closely related to that of the signed-rank Wilcoxon test in comparison with the t-test, has a lower bound of 0.864. Finally, an efficient algorithm is presented for computation, and the selection of tuning parameters is discussed. Simulation studies and a real data analysis illustrate the finite sample performance of the proposed method.
References
David HA (1998) Early sample measures of variability. Stat Sci 13:368–377
De Boor C (2001) A practical guide to splines, revised edn. Springer, New York
Deng G, Liang H (2010) Model averaging for semiparametric additive partial linear models. Sci China Math 53:1363–1376
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Feng L, Zou C, Wang Z, Wei X, Chen B (2015) Robust spline-based variable selection in varying coefficient model. Metrika 78:85–118
Härdle W, Huet S, Mammen E, Sperlich S (2004) Bootstrap inference in semiparametric generalized additive models. Econ Theory 20:265–300
Hettmansperger TP, McKean JW (2011) Robust nonparametric statistical methods, 2nd edn. Chapman and Hall, Boca Raton
Hodges JL, Lehmann EL (1956) The efficiency of some nonparametric competitors of the t-test. Ann Math Stat 27:324–335
Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313
Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14:763–788
Jiang J, Zhou H, Jiang X, Peng J (2007) Generalized likelihood ratio tests for the structure of semiparametric additive models. Can J Stat 35:381–398
Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B 72:49–69
Leng C (2010) Variable selection and coefficient estimation via regularized rank regression. Stat Sin 20:167–181
Li Q (2000) Efficient estimation of additive partially linear models. Int Econ Rev 41:1073–1092
Li J, Li Y, Zhang R (2015) B spline variable selection for the single index models. Stat Pap. doi:10.1007/s00362-015-0721-z
Lian H (2012a) Shrinkage estimation for identification of linear components in additive models. Stat Probab Lett 82:225–231
Lian H (2012b) Semiparametric estimation of additive quantile regression models by two-fold penalty. J Bus Econ Stat 30:337–350
Liu X, Wang L, Liang H (2011) Estimation and variable selection for semiparametric additive partial linear models. Stat Sin 21:1225–1248
Mammen E, Park B (2006) A simple smooth backfitting method for additive models. Ann Stat 34:2252–2271
Opsomer JD, Ruppert D (1999) A root-n consistent backfitting estimator for semiparametric additive modeling. J Comput Graph Stat 8:715–732
Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econ Theory 7:186–199
Sievers GL, Abebe A (2004) Rank estimation of regression coefficients using iterated reweighted least squares. J Stat Comput Simul 74:821–831
Sun J, Lin L (2014) Local rank estimation and related test for varying-coefficient partially linear models. J Nonparametr Stat 26:187–206
Tang Q (2015) Robust estimation for spatial semiparametric varying coefficient partially linear regression. Stat Pap 56:1137–1161
Wang L, Kai B, Li R (2009) Local rank inference for varying coefficient models. J Am Stat Assoc 104:1631–1645
Wang M, Song L (2013) Identification for semiparametric varying coefficient partially linear models. Stat Probab Lett 83:1311–1320
Wei C, Liu C (2012) Statistical inference on semi-parametric partial linear additive models. J Nonparametr Stat 24:809–823
Wei C, Luo Y, Wu X (2012) Empirical likelihood for partially linear additive errors-in-variables models. Stat Pap 53:485–496
Xue L (2009) Consistent variable selection in additive models. Stat Sin 19:1281–1296
Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31:333–346
Yu K, Park B, Mammen E (2008) Smooth backfitting in generalized additive models. Ann Stat 36:228–260
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Acknowledgements
The authors are grateful to the Editor, Associate Editor and two anonymous referees whose comments led to a significant improvement of the paper. This work was supported in part by the National Natural Science Foundation of China (Grant No. 11671059).
Appendix
Throughout the proofs, C denotes a generic constant that may take different values at different places. Let \(\gamma _0=(\gamma _{01}^T,\gamma _{02}^T, \ldots ,\gamma _{0p}^T)^T\) be a pK-dimensional vector satisfying \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\) for \( 1\le j \le p_0\) and \(f_{0j}=B_j^T \gamma _{0j}\) for \(p_0 < j \le p\). To prove the theoretical results, we first introduce some notation. Let
Based on the notations, the objective function \(L_n(\gamma )\) defined in (4) can be rewritten as
Further, denote by \(S_n(\gamma ^{*})\) the gradient of \(L_n^{*}(\gamma ^{*})\), that is,
where \(\text{ sgn }(\cdot )\) denotes the sign function.
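To fix ideas, the following is a minimal numerical sketch of a Wilcoxon-type rank estimate for a plain linear model — an illustration only, not the paper's penalized spline procedure; the data-generating model, sample size, and optimizer are our own assumptions. The objective is the pairwise dispersion \(\sum _{i<j}|e_i-e_j|\) of the residuals, whose subgradient has the pairwise sign form displayed above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 80, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0])
# Heavy-tailed errors, where rank estimates are expected to gain efficiency.
y = X @ beta_true + rng.standard_t(df=3, size=n)

def rank_loss(beta):
    """Wilcoxon-type dispersion: sum of |e_i - e_j| over all pairs i < j."""
    e = y - X @ beta
    d_ij = e[:, None] - e[None, :]
    return np.abs(d_ij[np.triu_indices(n, k=1)]).sum()

# The loss is convex (pairwise absolute values of linear functions of beta).
beta_hat = minimize(rank_loss, x0=np.zeros(d), method="Nelder-Mead").x
```

Note that the pairwise loss is invariant to an intercept shift, which is why only slope coefficients are estimated here.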
We first collect several lemmas that are used frequently in the sequel; for their detailed proofs, see Feng et al. (2015).
Lemma 1
Suppose that the assumptions (A1)–(A4) hold, then
where \(\tau \) is defined in Theorem 3 and \(\mathbf 1 _{\bar{K}}\) is a \(\bar{K}\)-dimensional vector of ones.
Lemma 2
Let \(\hat{\gamma }^{*}=\arg \min L_n^{*}(\gamma ^{*})\) and \(\tilde{\gamma }^{*}=\arg \min Q_n(\gamma ^{*})\). Suppose that the assumptions (A1)–(A4) hold, then
Lemma 3
Suppose that the assumptions (A1)–(A4) hold, then
Proof of Theorem 1
By the definition of \(A_n(\gamma ^{*})\), it follows from the convexity lemma in Pollard (1991) that
Note that, according to Lemma A.3 of Huang et al. (2004), there exists an interval \([C_1,C_2]\), \(0<C_1<C_2<\infty \), such that all the eigenvalues of \(\frac{K}{n}Z^TZ\) fall into \([C_1,C_2]\) with probability tending to 1. Write \(S_n(0)=( S_{n1}(0),\ldots ,S_{n\bar{K}}(0) )^T\), then we have
where the last equality holds due to Lemma 3. As \(\bar{K}=pK\), it follows that \(|\tilde{\gamma }^{*}|^2=O_p(K)\). Therefore, based on the triangle inequality and Lemma 2, we obtain
This is equivalent to \(\Vert \check{\gamma }- \gamma _0 \Vert ^2=O_p(K^2/n)\) since \(\check{\gamma }^{*}=\theta _n^{-1}(\check{\gamma }-\gamma _0)\) and \(\theta _n=\sqrt{K/n}\).
In addition, by the properties of splines in De Boor (2001), there exist constants \(C_3\) and \(C_4\) satisfying
Thus, we can derive that \(\Vert \check{\gamma }_j^T B_j- \gamma _{0j}^T B_j\Vert ^2=O_p(K/n)\). Consequently, by the fact that \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\), we have
where the last equality holds due to the assumption that the number of knots \(K=O_p\big ( n^{1/(2r+1)} \big )\). This completes the proof. \(\square \)
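The approximation-error ingredient \(\Vert f_{0j}-B_j^T \gamma _{0j}\Vert =O_p(K^{-r})\) used above can be observed numerically. The following sketch (the test function, grid, and knot placements are our own illustrative choices, not taken from the paper) fits least-squares cubic B-splines with an increasing number of interior knots and records the sup-norm error:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

f = lambda x: np.sin(2 * np.pi * x)   # smooth test function (our choice)
x = np.linspace(0.0, 1.0, 2001)

def sup_error(K, degree=3):
    """Sup-norm error of the least-squares cubic spline with K interior knots."""
    interior = np.linspace(0.0, 1.0, K + 2)[1:-1]
    # Clamped knot vector: repeat boundary knots degree+1 times.
    t = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    spl = make_lsq_spline(x, f(x), t, k=degree)
    return float(np.max(np.abs(spl(x) - f(x))))

errs = [sup_error(K) for K in (2, 4, 8, 16)]  # error shrinks as K grows
```

For a smooth target and cubic splines the error decays rapidly in K, consistent with the \(K^{-r}\) rate.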
Proof of Theorem 2
We first prove (i). Let \(\delta _n=\theta _n+\lambda _1 +\lambda _2\). We begin by showing that \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \delta _n)\). Let \(\gamma =\gamma _0+\bar{K}^{1/2} \delta _n v\), where v is a \(\bar{K}\)-dimensional vector. It suffices to show that, for any given \(\xi >0\), there exists a sufficiently large C such that
By virtue of the identity \(|x-y|-|x|=-y\text{ sgn }(x)+2(y-x)\{ I(0<x<y)-I(y<x<0) \}\) and the definition of \(L_n^{\lambda }(\gamma )\), it follows that
From Lemma 3, it is easy to verify that \(\frac{-1}{n}\sum _{i<j} { \text{ sgn }(Y_{ij}-Z_{ij}^T\gamma _0) Z_{ij} }=\theta _n^{-1} \mathbf 1 _{\bar{K}}\); thus \(L_1=O_p( \delta _n \theta _n^{-1}\bar{K}^{1/2} \Vert v\Vert )=O_p( n^{1/2}\delta _n \Vert v\Vert )=o_p(n \delta _n^2 \Vert v\Vert )\) by the assumption \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \). Moreover, by arguments similar to those in the proof of Lemma 1, we obtain
Applying Lemma A.3 of Huang et al. (2004) to \(L_2\) yields \(L_2=O_p(n\delta _n^2 \Vert v\Vert ^2)\). Hence, by choosing a sufficiently large C, \(L_2\) dominates \(L_1\) with probability tending to 1.
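The algebraic identity \(|x-y|-|x|=-y\,\text{sgn}(x)+2(y-x)\{ I(0<x<y)-I(y<x<0) \}\) invoked at the start of this proof holds for every \(x\ne 0\) and can be checked numerically; a quick sketch (random inputs of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)   # x != 0 almost surely
y = rng.normal(size=100_000)

lhs = np.abs(x - y) - np.abs(x)
rhs = (-y * np.sign(x)
       + 2.0 * (y - x) * (((0 < x) & (x < y)).astype(float)
                          - ((y < x) & (x < 0)).astype(float)))
max_gap = float(np.max(np.abs(lhs - rhs)))  # zero up to floating-point error
```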
On the other hand, by the well-known properties of B-splines, \(D_k\) and \(E_k\) have rank \(K-1\) and all their positive eigenvalues are of order \(1/K\); then, by the inequality \(p_{\lambda }(|x|)-p_{\lambda }(|y|) \le \lambda |x-y|\), we have
Thus \(L_3\) is dominated by \(L_2\) if a sufficiently large C is chosen. Similarly, \(L_4\) is also dominated by \(L_2\). Recalling that \(L_2>0\), (10) holds, which means \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \delta _n)\).
Finally, we show that the convergence rate can be further improved to \(\Vert \hat{\gamma }-\gamma _0\Vert =O_p(\bar{K}^{1/2} \theta _n)\). In fact, as the model is fixed as \(n \rightarrow \infty \), we can find a constant \(C>0\) such that \(\gamma _{0k}^TD_k\gamma _{0k} >C\) for \(k \le s\) and \(\gamma _{0k}^TE_k\gamma _{0k} >C\) for \(k \le p_0\). Since \(\Vert \hat{\gamma }-\gamma _0\Vert ^2=O_p(\bar{K} \delta _n^2) =o_p(\bar{K})\) by the result above and \(\lambda _k=o_p(1), k=1,2\), we have
These facts indicate that
Removing the regularization terms \(L_3\) and \(L_4\) in (11), the rate can be improved to \(\Vert \hat{\gamma }-\gamma _0\Vert = O_p(\bar{K}^{1/2} \theta _n)\) by the same reasoning as above; that is, \(\Vert \hat{\gamma }-\gamma _0\Vert ^2= O_p(\bar{K} \theta _n^2)=O_p(K^2/n)\). As a consequence, following the same approach as in the proof of the second part of Theorem 1, we obtain \(\Vert \hat{f}_j - f_{0j} \Vert ^2 = O_p(n^{-2r/(2r+1)})\), which completes the proof of part (i).
Next, we focus on proving part (ii); part (iii) can be proved similarly and its details are omitted. Suppose that \(B_j^T\hat{\gamma }_j\) does not represent a linear function for some \( p_0+1 \le j \le s\). Define \(\bar{\gamma }\) to be the same as \(\hat{\gamma }\) except that \(\hat{\gamma }_j\) is replaced by its projection onto the subspace { \(\gamma _j : B_j^T\gamma _j\) represents a linear function }. Therefore, we have
Note that, by the same arguments as in the derivation of (11), it is not difficult to verify that
and
Therefore, we can show that
Recall that \(\bar{\gamma }_k\) is the projection of \(\hat{\gamma }_{k}\) onto \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\), so \(\hat{\gamma }_k-\bar{\gamma }_k\) is orthogonal to this space. Furthermore, the space \(\{ \gamma _{k}: \gamma _{k}^T E_k \gamma _{k}=0 \}\) is exactly the eigenspace of \(E_k\) corresponding to the zero eigenvalue. Consequently, by the characterization of eigenvalues in terms of the Rayleigh quotient, \((\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k) / \Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert ^2\) lies between the minimum and maximum positive eigenvalues of \(E_k\), which are of order \(1/K\). Taking into account the fact that \(\hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}=(\hat{\gamma }_k-\bar{\gamma }_k)^T E_k (\hat{\gamma }_k-\bar{\gamma }_k)\) since \(\bar{\gamma }_{k}^T E_k \bar{\gamma }_{k}=0\), we derive \(\Vert \hat{\gamma }_k-\bar{\gamma }_k\Vert =O_p(\sqrt{K \hat{\gamma }_{k}^T E_k \hat{\gamma }_{k}})\). According to Lemma 3, Lemma A.3 of Huang et al. (2004) and the result \(\Vert \bar{\gamma }-\gamma _0\Vert =O_p(K/\sqrt{n})\) from part (i), it follows that
These facts lead to
On the other hand, according to the proof of (i), we have \(P \big ( p_{\lambda _1}( \sqrt{ \hat{\gamma }_k^T D_k \hat{\gamma }_k } ) = p_{\lambda _1}( \sqrt{ \bar{\gamma }_{k}^T D_k \bar{\gamma }_{k} } )\big ) \rightarrow 1\) and \(P \big ( \bar{\gamma }_k^T E_k \bar{\gamma }_k =0 \big ) \rightarrow 1\). Substituting these results into (12) yields
In addition, based on the result of (i) and the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \), it is easy to verify that
Hence, we have \(P \big ( p_{\lambda _2}( \sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } )= \lambda _2\sqrt{ \hat{\gamma }_k^T E_k \hat{\gamma }_k } \big ) \rightarrow 1\) by the definition of SCAD penalty function.
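For reference, the SCAD penalty of Fan and Li (2001) satisfies \(p_{\lambda }(t)=\lambda t\) for \(0\le t\le \lambda \), is quadratic on \((\lambda ,a\lambda ]\), and is constant at \(\lambda ^2(a+1)/2\) for \(t>a\lambda \) (with the usual choice \(a=3.7\)); this linear-near-zero behaviour is exactly what is used above. A direct transcription:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(t) for t >= 0 (Fan and Li 2001)."""
    t = np.asarray(t, dtype=float)
    small = lam * t                                            # t <= lam
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))    # lam < t <= a*lam
    large = lam**2 * (a + 1) / 2                               # t > a*lam
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))
```

The three pieces match at \(t=\lambda \) and \(t=a\lambda \), so the penalty is continuous.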
As a consequence, if \(\hat{\gamma }_k^T E_k \hat{\gamma }_k > 0\), we have
Combining (13) and (15) along with the condition \(n^{r/(2r+1)} \min \{ \lambda _1,\lambda _2 \} \rightarrow \infty \), it follows that
which contradicts (14). This completes the proof of Theorem 2. \(\square \)
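The Rayleigh-quotient step in the proof of part (ii) — for v orthogonal to the null space of a positive semidefinite matrix E, the quotient \(v^TEv/\Vert v\Vert ^2\) lies between the smallest and largest positive eigenvalues of E — can be illustrated with a generic rank-deficient matrix standing in for \(E_k\) (whose exact form depends on the B-spline basis and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8
# A rank-(K-1) positive semidefinite matrix with a one-dimensional null space.
A = rng.normal(size=(K, K - 1))
E = A @ A.T

eigvals, eigvecs = np.linalg.eigh(E)   # ascending eigenvalues
null_vec = eigvecs[:, 0]               # eigenvector of the (numerically) zero eigenvalue
pos = eigvals[1:]                      # positive eigenvalues

v = rng.normal(size=K)
v -= (v @ null_vec) * null_vec         # project v orthogonal to the null space
rayleigh = float(v @ E @ v) / float(v @ v)
```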
Proof of Theorem 3
Note that, by the results of Theorem 2, we only need to consider a correctly specified partially linear additive model as (2) without regularization terms. Specifically, the corresponding objective function is
where \(V_i= \big ( B_1(X_{i1})^T,\ldots ,B_{p_0}(X_{ip_0})^T \big )^T\), \(X_i^{(2)}=(X_{i(p_0+1)},\ldots ,X_{is})^T\) and \(\alpha =(\gamma _1^T,\ldots ,\gamma _{p_0}^T)^T\) is the corresponding coefficient vector of the spline approximation. Let \((\hat{\alpha }^T,\hat{\beta }^T)^T=\arg \min \Phi _n(\alpha ,\beta )\), \(\tilde{\Delta }_i=\sum _{l=1}^{p_0} { f_{0l}(X_{il}) }-V_i^T \hat{\alpha } \), \(\delta _n= n^{-1/2}\) and \(\beta ^*=\delta _n^{-1}(\beta -\beta _0)\). Then, \(\hat{\beta }^*\) must be the minimizer of the following function
Denote by \(S_n^*(\beta ^{*})\) the gradient function of \(\Phi _n^{*}(\beta ^{*})\), that is
Then, we can show that
In view of the results obtained in Theorem 2, we have \(\tilde{\Delta }_i=O_p(K^{-r})=o_p(1)\) as \(n \rightarrow \infty \). Hence, following a proof similar to that of Lemma 1, it is not difficult to obtain
where \(\Sigma \) is defined in assumption (A3). Further, let \(B_n(\beta ^{*})=\tau \delta _n^2 \beta ^{*^T} \Sigma \beta ^{*} + \beta ^{*^T} S_n^*(0) + \Phi _n^{*}(0)\), with minimizer denoted by \(\tilde{\beta }^*\). It is not difficult to verify that \(\tilde{\beta }^*=-(2\tau )^{-1}(\delta _n^2 \Sigma )^{-1}S_n^*(0)\). Based on Equation (16) and arguments similar to those of Lemma 2, it follows that
In addition, by the assumption that the random error \(\varepsilon _i\) is independent of \(X_i\), together with some calculations, we have
where \(H(\cdot )\) stands for the cumulative distribution function of \(\varepsilon \). Furthermore, it can be shown that
Therefore, substituting (18) and (19) into (17), we complete the proof. \(\square \)
Proof of Theorem 4
Based on the asymptotic results of Theorem 3 and the least squares B-spline estimate given in Theorem 3 of Lian (2012a), we immediately obtain \(\text{ ARE }(\hat{\beta }_{RR},\hat{\beta }_{LS})=12 \tau ^2 \sigma ^2\). In addition, a result of Hodges and Lehmann (1956) indicates that this ARE has a lower bound of 0.864, attained at the density \(h(x)=\frac{3}{20\sqrt{5}}(5-x^2)I(|x|\le \sqrt{5})\). This completes the proof. \(\square \)
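As a numerical check of Theorem 4 (a sketch assuming mean-zero error densities), \(\tau =\int h^2(x)\,dx\) and \(\sigma ^2\) can be computed by quadrature: the Hodges–Lehmann density gives exactly 0.864, while normal errors give \(3/\pi \approx 0.955\), consistent with the claim in the abstract that little efficiency is lost under normality.

```python
import numpy as np
from scipy.integrate import quad

def are_rank_vs_ls(density, support):
    """ARE(beta_RR, beta_LS) = 12 * tau^2 * sigma^2, with tau = integral of h^2."""
    a, b = support
    tau = quad(lambda x: density(x) ** 2, a, b)[0]
    sigma2 = quad(lambda x: x * x * density(x), a, b)[0]   # mean-zero density assumed
    return 12 * tau**2 * sigma2

# Least favourable density of Hodges and Lehmann (1956): attains the bound 0.864.
c = 3 / (20 * np.sqrt(5))
h = lambda x: c * (5 - x**2)
are_min = are_rank_vs_ls(h, (-np.sqrt(5), np.sqrt(5)))

# Standard normal errors: ARE = 3/pi, so almost no efficiency is lost.
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
are_normal = are_rank_vs_ls(phi, (-np.inf, np.inf))
```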
Yang, J., Yang, H. & Lu, F. Rank-based shrinkage estimation for identification in semiparametric additive models. Stat Papers 60, 1255–1281 (2019). https://doi.org/10.1007/s00362-017-0874-z