
Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression

  • Published in: Annals of the Institute of Statistical Mathematics

Abstract

Semiparametric partially linear varying coefficient models (SPLVCM) are frequently used in statistical modeling. With high-dimensional covariates both in parametric and nonparametric part for SPLVCM, sparse modeling is often considered in practice. In this paper, we propose a new estimation and variable selection procedure based on modal regression, where the nonparametric functions are approximated by \(B\)-spline basis. The outstanding merit of the proposed variable selection procedure is that it can achieve both robustness and efficiency by introducing an additional tuning parameter (i.e., bandwidth \(h\)). Its oracle property is also established for both the parametric and nonparametric part. Moreover, we give the data-driven bandwidth selection method and propose an EM-type algorithm for the proposed method. Monte Carlo simulation study and real data example are conducted to examine the finite sample performance of the proposed method. Both the simulation results and real data analysis confirm that the newly proposed method works very well.
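The EM-type iteration mentioned in the abstract can be illustrated on its simplest special case, modal linear regression (Yao and Li 2011), where one maximizes \(n^{-1}\sum_i\phi_h(y_i-\mathbf{x}_i^T{\varvec{\beta }})\) with a Gaussian kernel \(\phi_h\). The sketch below is our own minimal illustration, not the paper's SPLVCM code; the data-generating model and all names are assumptions. The E-step computes kernel weights from the current residuals, and the M-step is a weighted least squares update.

```python
import numpy as np

def modal_regression_em(X, y, h, n_iter=200, tol=1e-8):
    """EM-type algorithm for modal linear regression.

    Maximizes (1/n) * sum_i phi_h(y_i - x_i' beta), where phi_h is a
    Gaussian kernel with bandwidth h (Yao and Li, 2011).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)           # E-step: kernel weights
        w /= w.sum()
        WX = X * w[:, None]                       # M-step: weighted least squares
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# skewed error: mode at 0 but mean 0.6, so OLS is biased for the intercept
eps = np.where(rng.random(n) < 0.8,
               rng.normal(0.0, 0.3, n), rng.normal(3.0, 1.0, n))
y = X @ np.array([1.0, 2.0]) + eps
beta_hat = modal_regression_em(X, y, h=0.5)       # close to (1, 2)
```

Because the weights decay in the residuals at a rate governed by \(h\), observations far from the conditional mode receive negligible weight, which is the source of the robustness; as \(h\rightarrow \infty \) the weights flatten and the update approaches ordinary least squares.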

Fig. 1


References

  • Cai, Z., Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425.

  • Cai, Z., Fan, J., Li, R. (2000). Efficient estimation and inference for varying-coefficient models. Journal of the American Statistical Association, 95, 888–902.

  • Candes, E., Tao, T. (2007). The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). The Annals of Statistics, 35, 2313–2351.

  • Cheng, M., Zhang, W., Chen, L. (2009). Statistical estimation in generalized multiparameter likelihood models. Journal of the American Statistical Association, 104, 1179–1191.

  • Fairfield, K., Fletcher, R. (2002). Vitamins for chronic disease prevention in adults: scientific review. The Journal of the American Medical Association, 287, 3116–3126.

  • Fan, J., Gijbels, I. (1996). Local polynomial modelling and its applications. New York: Chapman and Hall.

  • Fan, J., Huang, T. (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli, 11, 1031–1057.

  • Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, J., Zhang, W. (1999). Statistical estimation in varying coefficient models. The Annals of Statistics, 27, 1491–1518.

  • Fan, J., Zhang, W. (2000). Simultaneous confidence bands and hypotheses testing in varying-coefficient models. Scandinavian Journal of Statistics, 27, 715–731.

  • Hastie, T., Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society, Series B, 55, 757–796.

  • Huang, J., Wu, C., Zhou, L. (2002). Varying-coefficient models and basis function approximation for the analysis of repeated measurements. Biometrika, 89, 111–128.

  • Kai, B., Li, R., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.

  • Lam, C., Fan, J. (2008). Profile-kernel likelihood inference with diverging number of parameters. The Annals of Statistics, 36, 2232–2260.

  • Lee, M. (1989). Mode regression. Journal of Econometrics, 42, 337–349.

  • Leng, C. (2009). A simple approach for varying-coefficient model selection. Journal of Statistical Planning and Inference, 139, 2138–2146.

  • Li, J., Palta, M. (2009). Bandwidth selection through cross-validation for semi-parametric varying-coefficient partially linear models. Journal of Statistical Computation and Simulation, 79, 1277–1286.

  • Li, J., Zhang, W. (2011). A semiparametric threshold model for censored longitudinal data analysis. Journal of the American Statistical Association, 106, 685–696.

  • Li, J., Ray, S., Lindsay, B. (2007). A nonparametric statistical approach to clustering via mode identification. Journal of Machine Learning Research, 8, 1687–1723.

  • Li, Q., Huang, C., Li, D., Fu, T. (2002). Semiparametric smooth coefficient models. Journal of Business and Economic Statistics, 20, 412–422.

  • Li, R., Liang, H. (2008). Variable selection in semiparametric regression modeling. The Annals of Statistics, 36, 261–286.

  • Lin, Z., Yuan, Y. (2012). Variable selection for generalized varying coefficient partially linear models with diverging number of parameters. Acta Mathematicae Applicatae Sinica, English Series, 28, 237–246.

  • Lu, Y. (2008). Generalized partially linear varying-coefficient models. Journal of Statistical Planning and Inference, 138, 901–914.

  • Nierenberg, D., Stukel, T., Baron, J., Dain, B., Greenberg, E. (1989). Determinants of plasma levels of beta-carotene and retinol. American Journal of Epidemiology, 130, 511–521.

  • Schumaker, L. (1981). Spline functions: basic theory. New York: Wiley.

  • Scott, D. (1992). Multivariate density estimation: theory, practice and visualization. New York: Wiley.

  • Stone, C. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10, 1040–1053.

  • Tang, Y., Wang, H., Zhu, Z., Song, X. (2012). A unified variable selection approach for varying coefficient models. Statistica Sinica, 22, 601–628.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • Wang, H., Zhu, Z., Zhou, J. (2009). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.

  • Wang, L., Li, H., Huang, J. (2008). Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association, 103, 1556–1569.

  • Xia, Y., Zhang, W., Tong, H. (2004). Efficient estimation for semivarying-coefficient models. Biometrika, 91, 661–681.

  • Xie, H., Huang, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37, 673–696.

  • Yao, W., Li, L. (2011). A new regression model: modal linear regression. Technical report, Kansas State University, Manhattan. http://www-personal.ksu.edu/~wxyao/

  • Yao, W., Lindsay, B., Li, R. (2012). Local modal regression. Journal of Nonparametric Statistics, 24, 647–663.

  • Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.

  • Zhang, W., Lee, S., Song, X. (2002). Local polynomial fitting in semivarying coefficient model. Journal of Multivariate Analysis, 82, 166–188.

  • Zhao, P., Xue, L. (2009). Variable selection for semiparametric varying coefficient partially linear models. Statistics and Probability Letters, 79, 2148–2157.

  • Zou, H. (2006). The adaptive LASSO and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.

  • Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301–320.

  • Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). The Annals of Statistics, 36, 1509–1533.


Acknowledgments

We sincerely thank the two referees and the associate editor for their valuable comments, which have led to a greatly improved presentation of our work.

Author information

Corresponding author

Correspondence to Riquan Zhang.

Additional information

The research was supported in part by the National Natural Science Foundation of China (11171112, 11101114, 11201190), the National Statistical Science Research Major Program of China (2011LZ051) and the Natural Science Foundation of Zhejiang Province Education Department (Y201121276).

Appendices

Appendix

To establish the asymptotic properties of the proposed estimators, the following regularity conditions are needed. For convenience and simplicity, let \(C\) denote a positive constant that may differ from place to place throughout this paper.

  • (C1) The index variable \(U\) has a bounded support \(\Omega \) and its density function \(f_U(\cdot )\) is positive and has a continuous second derivative. Without loss of generality, we assume \(\Omega \) to be the unit interval [0,1].

  • (C2) The varying coefficient functions \(\alpha _1(u), \ldots , \alpha _p(u)\) are \(r\)th continuously differentiable on [0,1], where \(r\!>\!2\).

  • (C3) Let \(\Sigma _1(u)=\mathrm{E}\{\mathbf{X X}^T|U=u\}\) and \(\Sigma _2(u)=\mathrm{E}\{\mathbf{Z Z}^T|U=u\}\) be continuous with respect to \(u\). Furthermore, for given \(u\), \(\Sigma _1(u)\) and \(\Sigma _2(u)\) are positive definite matrices, and their eigenvalues are bounded. In addition, we assume \(\max _i\Vert \mathbf{X}_i\Vert /\sqrt{n}=o_p(1)\) and \(\max _i\Vert \mathbf{Z}_i\Vert /\sqrt{n}=o_p(1)\).

  • (C4) Let \(t_1,\ldots ,t_K\) be the interior knots of [0,1]. Moreover, let \(t_0=0\), \(t_{K+1}=1\), \(\xi _i=t_i-t_{i-1}\) and \(\xi =\max \{\xi _i\}\). Then, there exists a constant \(C_0\) such that

    $$\begin{aligned} \frac{\xi }{\min \{\xi _i\}}\le C_0, \quad \max \{|\xi _{i+1}-\xi _i|\}=o(K^{-1}). \end{aligned}$$
  • (C5) \(F(x,z,u,h)\) and \(G(x,z,u,h)\) are continuous with respect to \((x,z,u)\).

  • (C6) \(F(x,z,u,h)<0\) for any \(h>0\).

  • (C7) \(\mathrm{E}(\phi ^{\prime }_h(\varepsilon )|\mathbf{x,z},u)=0\) and \(\mathrm{E}(\phi ^{\prime \prime }_h(\varepsilon )^2|\mathbf{x,z},u)\), \(\mathrm{E}(\phi ^{\prime }_h(\varepsilon )^3|\mathbf{x,z},u)\) and \(\mathrm{E}(\phi ^{\prime \prime \prime }_h(\varepsilon )|\mathbf{x,z},u)\) are continuous with respect to \(x\).

  • (C8) \(\mathrm{liminf}_{n\rightarrow \infty } \mathrm{liminf}_{\Vert {\varvec{\gamma }}_j\Vert _H\rightarrow 0^+} {\lambda _{1j}}^{-1}{p^{\prime }_{\lambda _{1j}}(\Vert {\varvec{\gamma }}_j\Vert _H)}>0, j=s_1,\ldots ,p\), and \(\mathrm{liminf}_{n\rightarrow \infty } \mathrm{liminf}_{\beta _k\rightarrow 0^+} {\lambda _{2k}}^{-1} {p^{\prime }_{\lambda _{2k}}(|\beta _k|)}>0, k=s_2,\ldots ,d\).
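Condition (C4) holds trivially for equally spaced interior knots. As a concrete sketch (our own illustration, not the paper's code; it assumes \(\hbar \) denotes the spline degree, so the basis dimension is \(q=K+\hbar +1\)), the Cox–de Boor recursion below builds the clamped B-spline basis \(B(u)\) on [0,1] and verifies that the \(q\) basis functions are nonnegative and form a partition of unity:

```python
import numpy as np

def bspline_basis(u, knots, degree):
    """All B-spline basis functions at points u (Cox-de Boor recursion).

    `knots` is the full clamped knot vector; the result has
    len(knots) - degree - 1 columns, one per basis function.
    """
    u = np.asarray(u, dtype=float)
    # degree-0 basis: indicator of each half-open knot span
    B = np.array([(knots[i] <= u) & (u < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    B[u == knots[-1], len(knots) - degree - 2] = 1.0  # close the last span
    for d in range(1, degree + 1):
        B_new = np.zeros((len(u), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            den_l = knots[i + d] - knots[i]
            den_r = knots[i + d + 1] - knots[i + 1]
            if den_l > 0:
                B_new[:, i] += (u - knots[i]) / den_l * B[:, i]
            if den_r > 0:
                B_new[:, i] += (knots[i + d + 1] - u) / den_r * B[:, i + 1]
        B = B_new
    return B

degree = 3                                    # cubic splines
K = 5                                         # number of interior knots
interior = np.linspace(0.0, 1.0, K + 2)[1:-1] # equally spaced: C0-quasi-uniform
knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
u = np.linspace(0.0, 1.0, 201)
Bu = bspline_basis(u, knots, degree)          # design matrix with q = K + degree + 1 columns
```

Stacking the rows \(B(U_i)^T\) over the sample gives the spline design matrix used to approximate each varying coefficient \(\alpha _j(u)\approx B(u)^T{\varvec{\gamma }}_j\).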

Remark 4

The conditions (C1)–(C3) are commonly adopted for the SPLVCM, for example in Fan and Huang (2005), Li and Liang (2008) and Zhao and Xue (2009). Condition (C4) implies that \(t_0,\ldots ,t_{K+1}\) is a \(C_0\)-quasi-uniform sequence of partitions of [0,1]. Conditions (C5)–(C7) are used for modal nonparametric regression in Yao et al. (2012). The condition \(\mathrm{E}(\phi ^{\prime }_h(\varepsilon )|\mathbf{x,z},u)=0\) ensures that the proposed estimate is consistent; it is satisfied if the error density is symmetric about zero, although we do not require the error distribution to be symmetric. If the assumption \(\mathrm{E}(\phi ^{\prime }_h(\varepsilon )|\mathbf{x,z},u)=0\) does not hold, the proposed estimate actually estimates the function \(\tilde{m}(\mathbf{x,z},u)=\mathrm{argmax}_m\mathrm{E}(\phi _h(Y-m)|\mathbf{x,z},u)\). Condition (C8) is an assumption on the penalty function, similar to those used in Fan and Li (2001), Li and Liang (2008) and Zhao and Xue (2009).
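The role of the condition \(\mathrm{E}(\phi ^{\prime }_h(\varepsilon )|\mathbf{x,z},u)=0\) can be checked numerically. With a Gaussian kernel, \(\phi ^{\prime }_h\) is an odd function, so the condition holds automatically for errors symmetric about zero, while a skewed error gives a nonzero mean and shifts the estimand toward \(\tilde{m}\). A small Monte Carlo sketch (our own illustration; the kernel and the error laws are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def phi_h_prime(t, h):
    """Derivative of the Gaussian kernel phi_h(t) = exp(-t^2/(2h^2)) / (sqrt(2*pi)*h)."""
    return -t / h**2 * np.exp(-0.5 * (t / h) ** 2) / (np.sqrt(2 * np.pi) * h)

h = 0.5
sym = rng.normal(0.0, 1.0, 200_000)           # error symmetric about zero
skew = rng.exponential(1.0, 200_000) - 1.0    # mean zero, but asymmetric
m_sym = phi_h_prime(sym, h).mean()            # approximately 0
m_skew = phi_h_prime(skew, h).mean()          # clearly nonzero
```

For the symmetric error, `m_sym` is zero up to Monte Carlo noise, so the modal estimator is consistent for the regression function itself; for the skewed error, `m_skew` is bounded away from zero, so the modal fit targets the (smoothed) conditional mode rather than the mean.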

Proof of Theorem 1

Proof

Let \(\delta =n^{-r/(2r+1)}+a_n\) and let \(\mathbf{v}=(\mathbf{v}_1^T,\mathbf{v}_2^T)^T\) be a vector, where \(\mathbf{v}_1\) is a \(d\)-dimensional vector and \(\mathbf{v}_2\) is a \(p\times q\)-dimensional vector, \(q=K+\hbar +1\). Define \({\varvec{\beta }}={\varvec{\beta }}_0+\delta \mathbf{v}_1\) and \({\varvec{\gamma }}={\varvec{\gamma }}_0+\delta \mathbf{v}_2\), where \({\varvec{\gamma }}_0\) is the best approximation of \(\alpha (u)\) in the \(B\)-spline space. We first show that, for any given \(\varrho >0\), there exists a large constant \(C\) such that

$$\begin{aligned} P\left\{ \sup _{\Vert \mathbf{v}\Vert =C}\mathcal{L } ({\varvec{\gamma }},{\varvec{\beta }}) <\mathcal{L }({\varvec{\gamma }}_0,{\varvec{\beta }}_0) \right\} \ge 1-\varrho , \end{aligned}$$
(14)

where \(\mathcal{L }({\varvec{\gamma }},{\varvec{\beta }})\) is defined in (7). Let \(\Xi ({\varvec{\gamma }},{\varvec{\beta }})=\frac{1}{K} \{\mathcal{L }({\varvec{\gamma }},{\varvec{\beta }})- \mathcal{L }({\varvec{\gamma }}_0,{\varvec{\beta }}_0)\}\), then by Taylor expansion, we have that

$$\begin{aligned} \Xi ({\varvec{\gamma }},{\varvec{\beta }})&= -\frac{\delta }{K}\sum \limits _{i=1} ^n\phi _h^{\prime } \left( \varepsilon _i\!+\!\mathbf{X}_i^TR(U_i) \right) \left( \mathbf{Z}_i^T\mathbf{v}_1\!+\!\mathbf{W}_i^T\mathbf{v}_2 \right) \\&+\frac{\delta ^2}{K}\sum \limits _{i=1}^n\phi _h^{\prime \prime } \left( \varepsilon _i\!+\!\mathbf{X}_i^TR(U_i)\right) \left( \mathbf{Z}_i^T\mathbf{v}_1\!+\!\mathbf{W}_i^T\mathbf{v}_2 \right) ^{2}\\&+\frac{\delta ^3}{K}\sum \limits _{i=1}^n\phi _h^{\prime \prime \prime } (\zeta _i) \left( \mathbf{Z}_i^T\mathbf{v}_1\!+\!\mathbf{W}_i^T\mathbf{v}_2 \right) ^3 \\&+\frac{n}{K}\sum \limits _{j=1}^p \left\{ p_{\lambda _{1j}}(\Vert {\varvec{\gamma }}_j\Vert _H)\!-\! p_{\lambda _{1j}}(\Vert {\varvec{\gamma }}_{j0}\Vert _H) \right\} \\&+ \frac{n}{K}\sum \limits _{k=1}^d \left\{ p_{\lambda _{2k}}(|{\beta }_k|)\!-\!p_{\lambda _{2k}}(|\beta _{k0}|) \right\} \\&\triangleq I_1\!+\!I_2\!+\!I_3\!+\!I_4\!+\!I_5, \end{aligned}$$

where \(\zeta _i\) is between \(\varepsilon _i+\mathbf{X}_i^TR(U_i)\) and \(\varepsilon _i+\mathbf{X}_i^TR(U_i)-\delta (\mathbf{Z}_i^T\mathbf{v}_1+\mathbf{W}_i^T\mathbf{v}_2)\),

$$\begin{aligned} R(u)=(R_1(u),\ldots ,R_p(u))^T \quad \mathrm{and}\quad R_j(u)=\alpha _j(u)-B(u)^T{\varvec{\gamma }}_{j0}, \quad j=1,\ldots ,p. \end{aligned}$$

By conditions (C1) and (C2) and Corollary 6.21 in Schumaker (1981), we have

$$\begin{aligned} \Vert R_j(u)\Vert =O(K^{-r}). \end{aligned}$$
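The rate \(\Vert R_j(u)\Vert =O(K^{-r})\) can be observed numerically: for a smooth function and cubic splines (\(r=4\)), doubling the number of equally spaced interior knots should shrink the best \(L_\infty \) approximation error by roughly \(2^{-4}\). A quick sketch using SciPy's `make_lsq_spline` (our own illustration, not part of the proof):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def max_spline_error(f, n_knots, degree=3):
    """Max error of a least-squares spline fit to f on [0, 1]
    with n_knots equally spaced interior knots."""
    x = np.linspace(0.0, 1.0, 2001)
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    t = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    spl = make_lsq_spline(x, f(x), t, k=degree)
    return np.max(np.abs(spl(x) - f(x)))

f = lambda u: np.sin(2 * np.pi * u)
err_K = max_spline_error(f, n_knots=9)     # mesh width 0.1
err_2K = max_spline_error(f, n_knots=19)   # mesh width 0.05
# halving the mesh should shrink the error by about 2^{-4} = 1/16
```
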

Then, by Taylor expansion, we have

$$\begin{aligned}&\sum \limits _{i=1}^n\phi _h^{\prime } \left( \varepsilon _i+\mathbf{X}_i^TR(U_i) \right) \left( \mathbf{Z}_i^T\mathbf{v}_1+\mathbf{W}_i^T\mathbf{v}_2\right) \\&\quad = \sum \limits _{i=1}^n \left[ \phi _h^{\prime }(\varepsilon _i)+\phi _h^{\prime \prime }(\varepsilon _i)\mathbf{X}_i^TR(U_i)+\phi _h^{\prime \prime \prime } \left( \varepsilon ^*_i \right) \left( \mathbf{X}_i^TR(U_i) \right) ^2 \right] \left( \mathbf{Z}_i^T\mathbf{v}_1+\mathbf{W}_i^T\mathbf{v}_2 \right) \!, \end{aligned}$$

where \(\varepsilon _i^*\) is between \(\varepsilon _i\) and \(\varepsilon _i+\mathbf{X}_i^TR(U_i)\).

Invoking conditions (C4) and (C7), after some direct calculations, we get

$$\begin{aligned} \sum _{i=1}^n\phi _h^{\prime } \left( \varepsilon _i+\mathbf{X}_i^TR(U_i)\right) \left( \mathbf{Z}_i^T\mathbf{v}_1+\mathbf{W}_i^T\mathbf{v}_2 \right) = O_p(nK^{-r}\Vert \mathbf{v}\Vert ). \end{aligned}$$
(15)

Hence, we have \(I_1=O_p(n\delta K^{-(r+1)}\Vert \mathbf{v}\Vert )=O_p(n\delta ^2K^{-1}\Vert \mathbf{v}\Vert )\).

For \(I_2\), we can prove

$$\begin{aligned} I_2=\mathrm{E} (F(\mathbf{X,Z},U,h))O_p(nK^{-1}\delta ^2\Vert \mathbf{v}\Vert ^2). \end{aligned}$$

Therefore, by choosing a sufficiently large \(C\), \(I_2\) dominates \(I_1\) uniformly in \(\Vert \mathbf{v}\Vert =C\).

Similarly, we can prove that

$$\begin{aligned} I_3=O_p(nK^{-1}\delta ^3\Vert \mathbf{v}\Vert ^3). \end{aligned}$$

Since \(a_n\rightarrow 0\), we have \(\delta \rightarrow 0\). It follows that \(\delta \Vert \mathbf{v}\Vert \rightarrow 0\) for \(\Vert \mathbf{v}\Vert =C\), which leads to \(I_3=o_p(I_2)\). Therefore, \(I_3\) is also dominated by \(I_2\) uniformly in \(\Vert \mathbf{v}\Vert =C\).

Moreover, invoking \(p_\lambda (0)=0\), and by the standard argument of the Taylor expansion, we get that

$$\begin{aligned} I_5&\le \sum \limits _{k=1}^{s_2} \left\{ K^{-1}n\delta p^{\prime }_{\lambda _{2k}} (|\beta _{k0}|)\mathrm{sgn}(\beta _{k0})|v_{1k}|\right. \\&\left. +K^{-1}n\delta ^2 p^{\prime \prime }_{\lambda _{2k}} (|\beta _{k0}|)|v_{1k}|^2(1+o_p(1)) \right\} \\&\le \sqrt{s_2} \left( K^{-1}n\delta a_n\Vert \mathbf{v}\Vert +K^{-1}n\delta ^2 b_n\Vert \mathbf{v}\Vert ^2 \right) . \end{aligned}$$

Then, by the condition \(b_n\rightarrow 0\), it is easy to show that \(I_5\) is dominated by \(I_2\) uniformly in \(\Vert \mathbf{v}\Vert =C\). With the same argument, we can prove that \(I_4\) is also dominated by \(I_2\) uniformly in \(\Vert \mathbf{v}\Vert =C\).

By condition (C6), \(F(\mathbf{x,z},u,h)<0\); hence, by choosing a sufficiently large \(C\), we have \(\Xi ({\varvec{\gamma }},{\varvec{\beta }})<0\), which implies that (14) holds with probability at least \(1-\varrho \). Hence, there exists a local maximizer such that

$$\begin{aligned} \Vert \hat{{\varvec{\beta }}}-{\varvec{\beta }}_0\Vert =O_p\left( \delta \right) \quad \mathrm{and} \quad \Vert \hat{{\varvec{\gamma }}}-{\varvec{\gamma }}_0\Vert =O_p\left( \delta \right) , \end{aligned}$$
(16)

which completes the proof of part (i).

Now, we prove part (ii). Note that

$$\begin{aligned} \Vert \hat{\alpha }_j(\cdot )-\alpha _{j0}(\cdot )\Vert ^2&= \int _0^1 |\hat{\alpha }_j(u)-\alpha _{j0}(u)|^2\mathrm{d}u\\&= \int _0^1 \left\{ B^T(u)\hat{{\varvec{\gamma }}}_j-B^T(u){\varvec{\gamma }}_j+R_j(u) \right\} ^2\mathrm{d}u\\&\le 2 \int _0^1 \left\{ B^T(u)\hat{{\varvec{\gamma }}}_j-B^T(u){\varvec{\gamma }}_j \right\} ^2\mathrm{d}u+ 2\int _0^1R_j(u)^2\mathrm{d}u\\&= 2(\hat{{\varvec{\gamma }}}_j-{\varvec{\gamma }}_j)^TH(\hat{{\varvec{\gamma }}}_j-{\varvec{\gamma }}_j)+2\int _0^1R_j(u)^2\mathrm{d}u, \end{aligned}$$

where \(H=\int _0^1 B(u)B^T(u)\mathrm{d}u\). Invoking \(\Vert H\Vert =O(1)\) and (16), we have

$$\begin{aligned} (\hat{{\varvec{\gamma }}}_j-{\varvec{\gamma }}_j) ^TH(\hat{{\varvec{\gamma }}}_j-{\varvec{\gamma }}_j)=O_p \left( n^{-\frac{2r}{2r+1}}+a_n^2\right) . \end{aligned}$$

In addition, it is easy to show that

$$\begin{aligned} \int _0^1R_j(u)^2\mathrm{d}u=O_p\left( n^{-\frac{2r}{2r+1}}\right) . \end{aligned}$$

Consequently, \(\Vert \hat{\alpha }_j(\cdot )-\alpha _{j0}(\cdot )\Vert =O_p\left( n^{-\frac{r}{2r+1}}+a_n\right) , j=1,\ldots ,p,\) which completes the proof of part (ii). \(\square \)

Proof of Theorem 2

Proof

By the property of the SCAD penalty function, \(a_n=0\) as \(\lambda _{\max }\rightarrow 0\). Then, by Theorem 1, it is sufficient to show that, as \(n\rightarrow \infty \), for any \({\varvec{\gamma }}\) satisfying \(\Vert {\varvec{\gamma }}-{\varvec{\gamma }}_0\Vert =O_p(n^{-r/(2r+1)})\), any \(\beta _k\) satisfying \(|\beta _k-\beta _{k0}|=O_p(n^{-r/(2r+1)}), k=1,\ldots ,s_2\), and some given small \(\nu =Cn^{-r/(2r+1)}\), with probability tending to 1 we have

$$\begin{aligned} \frac{\partial {\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})}{\partial \beta _k}<0, \quad \mathrm{for} \; 0<\beta _k<\nu , \ k=s_2+1,\ldots ,d \end{aligned}$$
(17)

and

$$\begin{aligned} \frac{\partial {\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})}{\partial \beta _k}>0, \quad \mathrm{for}\; -\nu <\beta _k<0, \ k=s_2+1,\ldots ,d. \end{aligned}$$
(18)

Consequently, (17) and (18) imply that the maximum of \({\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})\) is attained at \(\beta _k=0, \ k=s_2+1,\ldots ,d\).

By an argument similar to the proof of Theorem 1, we can show that

$$\begin{aligned} \frac{\partial {\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})}{\partial \beta _k}&= \frac{\partial Q({\varvec{\gamma }},{\varvec{\beta }})}{\partial \beta _k}-np^{\prime }_{\lambda _{2k}} (|\beta _{k}|)\mathrm{sgn}(\beta _{k})\\&= \sum \limits _{i=1}^n Z_{ik}\phi ^{\prime }_h \left( Y_i-\mathbf{W}_i^T{\varvec{\gamma }}-\mathbf{Z}_i^T{\varvec{\beta }} \right) -np^{\prime }_{\lambda _{2k}} (|\beta _{k}|)\mathrm{sgn}(\beta _{k})\\&= \sum \limits _{i=1}^n\left\{ Z_{ik}\phi ^{\prime }_h \left( \varepsilon _i+\mathbf{X}_i^TR(U_i) \right) -\phi ^{\prime \prime }_h \left( \varepsilon _i+\mathbf{X}_i^TR(U_i) \right) Z_{ik} \left[ \mathbf{W}_i^T({\varvec{\gamma }}-{\varvec{\gamma }}_0) +\mathbf{Z}_i^T({\varvec{\beta }}-{\varvec{\beta }}_0) \right] \right. \\&\quad \left. +\phi ^{\prime \prime \prime }_h(\eta _i)Z_{ik}\left[ \mathbf{W}_i^T({\varvec{\gamma }}-{\varvec{\gamma }}_0)+\mathbf{Z}_i^T({\varvec{\beta }}-{\varvec{\beta }}_0)\right] ^2\right\} -np^{\prime }_{\lambda _{2k}} (|\beta _{k}|)\mathrm{sgn}(\beta _{k})\\&= n\lambda _{2k}\left\{ -\lambda _{2k}^{-1}p^{\prime }_{\lambda _{2k}} (|\beta _{k}|)\mathrm{sgn}(\beta _{k})+O_p\left( \lambda _{2k}^{-1}n^{-\frac{r}{2r+1}}\right) \right\} , \end{aligned}$$

where \(\eta _i\) is between \(Y_i-\mathbf{W}_i^T{\varvec{\gamma }}-\mathbf{Z}_i^T{\varvec{\beta }}\) and \(\varepsilon _i+\mathbf{X}_i^TR(U_i)\).

By condition (C8), \(\mathrm{liminf}_{n\rightarrow \infty } \mathrm{liminf}_{\beta _k\rightarrow 0^+} {\lambda _{2k}}^{-1} {p^{\prime }_{\lambda _{2k}}(|\beta _k|)}>0\), and \(\lambda _{2k}n^{\frac{r}{2r+1}}>\lambda _{\min }n^{\frac{r}{2r+1}} \rightarrow \infty \), the sign of the derivative is completely determined by that of \(\beta _k\), so (17) and (18) hold. This completes the proof of part (i).

For part (ii), applying similar techniques as in part (i), we have, with probability tending to 1, that \(\hat{{\varvec{\gamma }}}_{j}=0, j=s_1+1,\ldots ,p\), and hence \(\hat{\alpha }_{j}(\cdot )=0\). Invoking \(\sup _u\Vert B(u)\Vert =O(1)\), the result follows from \(\hat{\alpha }_j(u)=B(u)^T\hat{{\varvec{\gamma }}}_j\). \(\square \)

Proof of Theorem 3

Proof

From Theorems 1 and 2, we know that, as \(n\rightarrow \infty \), with probability tending to 1, \({\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})\) attains its maximal value at \((\hat{{\varvec{\beta }}}_a^T, 0)^T\) and \((\hat{{\varvec{\gamma }}}_a^T, 0)^T\). Let \({\mathcal{L }}_1({\varvec{\gamma }},{\varvec{\beta }})=\partial {\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})/\partial {\varvec{\beta }}_a\) and \({\mathcal{L }}_2({\varvec{\gamma }},{\varvec{\beta }})=\partial {\mathcal{L }}({\varvec{\gamma }},{\varvec{\beta }})/\partial {\varvec{\gamma }}_a\); then \((\hat{{\varvec{\beta }}}_a^T, 0)^T\) and \((\hat{{\varvec{\gamma }}}_a^T, 0)^T\) must satisfy the following two equations

$$\begin{aligned}&\frac{1}{n}{\mathcal{L }}_1 \left( \left( \hat{{\varvec{\gamma }}}_a^T, 0\right) ^T, \left( \hat{{\varvec{\beta }}}_a^T, 0\right) ^T\right) \nonumber \\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \mathbf{Z}_{ia}\phi ^{\prime }_h \left\{ Y_i-\mathbf{W}_{ia}^T\hat{{\varvec{\gamma }}}_a -\mathbf{Z}_{ia}^T\hat{{\varvec{\beta }}}_a \right\} -p^{\prime }_{\lambda _{2}} (|\hat{{\varvec{\beta }}}_a|)\circ \mathrm{sgn}(\hat{{\varvec{\beta }}}_a)=0 \end{aligned}$$
(19)

and

$$\begin{aligned} \frac{1}{n}{\mathcal{L }}_2 \left( \left( \hat{{\varvec{\gamma }}}_a^T, 0\right) ^T, \left( \hat{{\varvec{\beta }}}_a^T, 0 \right) ^T\right) =\frac{1}{n}\sum _{i=1}^n \mathbf{W}_{ia}\phi ^{\prime }_h \left\{ Y_i-\mathbf{W}_{ia}^T\hat{{\varvec{\gamma }}}_a -\mathbf{Z}_{ia}^T\hat{{\varvec{\beta }}}_a \right\} -{\varvec{\kappa }}=0,\nonumber \\ \end{aligned}$$
(20)

where “\(\circ \)” denotes the Hadamard (componentwise) product and the \(k\)th component of \(p^{\prime }_{\lambda _{2}} (|\hat{{\varvec{\beta }}}_a|)\) is \(p^{\prime }_{\lambda _{2k}} (|\hat{\beta }_{k}|), 1\le k\le s_2\); \({\varvec{\kappa }}\) is a \(q \times s_1\)-dimensional vector whose \(j\)th block subvector is \(H\frac{\hat{{\varvec{\gamma }}}_j}{\Vert \hat{{\varvec{\gamma }}}_j\Vert _H} p_{\lambda _1}^{\prime }(\Vert \hat{{\varvec{\gamma }}}_j\Vert _H)\). Applying a Taylor expansion to \(p^{\prime }_{\lambda _{2k}} (|\hat{\beta }_{k}|)\), we get

$$\begin{aligned} p^{\prime }_{\lambda _{2k}} (|\hat{\beta }_{k}|)=p^{\prime }_{\lambda _{2k}} (|\beta _{k0}|) +\{p^{\prime \prime }_{\lambda _{2k}} (|\beta _{k0}|)+o_p(1)\}(\hat{\beta }_{k}-\beta _{k0}), \quad k=1,\ldots , s_2. \end{aligned}$$

By the condition \(b_n\rightarrow 0\) and noting that \(p^{\prime }_{\lambda _{2k}} (|\beta _{k0}|)=0\) as \(\lambda _{\max }\rightarrow 0\), some simple calculations yield

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=1}^n \mathbf{Z}_{ia}\left\{ \phi ^{\prime }_h(\varepsilon _i) \!+\!\phi ^{\prime \prime }_h(\varepsilon _i) \left\{ \mathbf{X}_i^TR^*(U_i)\!-\! \left[ \mathbf{Z}_{ia}^T(\hat{{\varvec{\beta }}}_a\!-\!{{\varvec{\beta }}}_{a0}) \!+\! \mathbf{W}_{ia}^T(\hat{{\varvec{\gamma }}}_a\!-\!{{\varvec{\gamma }}} _{a0})\right] \right\} \right. \nonumber \\&\quad + \left. \phi ^{\prime \prime \prime }_h(\zeta _i) \left\{ \mathbf{X}_i^TR^*(U_i)\!-\!\left[ \mathbf{Z}_{ia}^T(\hat{{\varvec{\beta }}}_a\!-\!{{\varvec{\beta }}}_{a0})\!+\! \mathbf{W}_{ia}^T(\hat{{\varvec{\gamma }}}_a\!-\!{{\varvec{\gamma }}}_{a0}) \right] \right\} ^2\right\} \!+\!o_p(\hat{{\varvec{\beta }}}_a\!-\!{\varvec{\beta }}_{a0})\!=\!0,\nonumber \\ \end{aligned}$$
(21)

where \(\zeta _i\) is between \(\varepsilon _i\) and \(Y_i-\mathbf{W}_{ia}^T\hat{{\varvec{\gamma }}}_{a} -\mathbf{Z}_{ia}^T\hat{{\varvec{\beta }}}_{a}\), and \(R^*(u)=(R_1(u),\ldots ,R_{s_1}(u))^T\). Invoking (20), and using arguments similar to those for (21), we have

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=1}^n \mathbf{W}_{ia}\left\{ \phi ^{\prime }_h(\varepsilon _i) \!+\!\phi ^{\prime \prime }_h(\varepsilon _i) \left\{ \mathbf{X}_i^TR^*(U_i)\!-\!\left[ \mathbf{Z}_{ia}^T(\hat{\varvec{\beta }}_a\!-\!{\varvec{\beta }}_{a0})\!+\! \mathbf{W}_{ia}^T(\hat{{\varvec{\gamma }}}_a\!-\!{{\varvec{\gamma }}}_{a0}) \right] \right\} \right. \nonumber \\&\quad + \left. \phi ^{\prime \prime \prime }_h(\bar{\zeta _i})\left\{ \mathbf{X}_{ia}^TR^*(U_i)\!-\!\left[ \mathbf{Z}_{ia}^T(\hat{{\varvec{\beta }}}_a\!-\!{\varvec{\beta }}_{a0})\!+\! \mathbf{W}_{ia}^T(\hat{{\varvec{\gamma }}}_a\!-\!{{\varvec{\gamma }}}_{a0})\right] \right\} ^2\right\} \!+\!o_p(\hat{{\varvec{\gamma }}}_a\!-\!{\varvec{\gamma }}_{a0})\!=\!0,\nonumber \\ \end{aligned}$$
(22)

where \(\bar{\zeta }_i\) is also between \(\varepsilon _i\) and \(Y_i-\mathbf{W}_{ia}^T\hat{{\varvec{\gamma }}}_{a} -\mathbf{Z}_{ia}^T\hat{{\varvec{\beta }}}_{a}\).

Let \(\Phi _n=\frac{1}{n}\sum \nolimits _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{W}_{ia}\mathbf{W}_{ia}^T\) and \(\Psi _n=\frac{1}{n}\sum \nolimits _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{W}_{ia}\mathbf{Z}_{ia}^T\); then, by the result of Theorem 2 and regularity conditions (C3) and (C7), after some calculations based on (22), it follows that

$$\begin{aligned} \hat{{\varvec{\gamma }}}_a-{\varvec{\gamma }}_{a0}=(\Phi _n+o_p(1))^{-1}\left\{ - \Psi _n(\hat{{\varvec{\beta }}}_a-{{\varvec{\beta }}}_{a0})+\Lambda _n\right\} , \end{aligned}$$
(23)

where \(\Lambda _n=\frac{1}{n}\sum \nolimits _{i=1}^n\mathbf{W}_{ia}\left[ \phi ^{\prime }_h(\varepsilon _i)+\phi ^{\prime \prime }_h(\varepsilon _i) \mathbf{X}_{ia}^TR^*(U_i)\right] \). Furthermore, we can prove

$$\begin{aligned} \Phi _n \stackrel{\mathrm{P}}{\longrightarrow } \Phi \!=\!\mathrm{E} \left( F(\mathbf{X,Z},U,h)\mathbf{W}_a\mathbf{W}_a^T\right) \quad \mathrm{and} \quad \Psi _n \stackrel{\mathrm{P}}{\longrightarrow } \Psi \!=\!\mathrm{E}\left( F(\mathbf{X,Z},U,h)\mathbf{W}_a\mathbf{Z}_a^T\right) . \end{aligned}$$

Therefore, we can write

$$\begin{aligned} \hat{{\varvec{\gamma }}}_a-{\varvec{\gamma }}_{a0}=-(\Phi +o_p(1))^{-1} \Psi (\hat{{\varvec{\beta }}}_a-{{\varvec{\beta }}}_{a0})+(\Phi +o_p(1)) ^{-1}\Lambda _n. \end{aligned}$$
(24)

Substituting (24) into (21), we obtain

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{Z}_{ia}\left[ \mathbf{Z}_{ia}-\Psi ^T\Phi ^{-1}\mathbf{W}_{ia}\right] ^T(\hat{{\varvec{\beta }}}_a-{{\varvec{\beta }}}_{a0})+o _p(\hat{{\varvec{\beta }}}_a-{\varvec{\beta }}_{a0})\nonumber \\&\quad =\frac{1}{n}\sum \limits _{i=1}^n\mathbf{Z}_{ia}\left[ \phi ^{\prime }_h(\varepsilon _i) +\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{X}_{ia}^TR^*(U_i)-\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{W}_{ia}^T\frac{1}{n}\sum \limits _{j=1}^n \Phi ^{-1}\mathbf{W}_{ja}\phi ^{\prime }_h(\varepsilon _j)\right] \nonumber \\&\qquad -\frac{1}{n}\sum \limits _{i=1}^n\mathbf{Z}_{ia}\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{W}_{ia}^T\frac{1}{n}\sum \limits _{j=1}^n\mathbf{X}_{ja}^TR^*(U_j). \end{aligned}$$
(25)

Note that

$$\begin{aligned} \mathrm{E}\left( \frac{1}{n}\sum _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i)\Psi ^T\Phi ^{-1}\mathbf{W}_{ia}\left[ \mathbf{Z}_{ia}^T-\mathbf{W}_{ia}^T\Phi ^{-1}\Psi \right] \right) =0 \end{aligned}$$

and

$$\begin{aligned} \mathrm{Var}\left( \frac{1}{n}\sum _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i) \Psi ^T\Phi ^{-1}\mathbf{W}_{ia}\left[ \mathbf{Z}_{ia}^T-\mathbf{W}_{ia}^T\Phi ^{-1}\Psi \right] \right) =o_p(1/n). \end{aligned}$$

Hence, it is easy to show that

$$\begin{aligned}&\left\{ \frac{1}{n}\sum \limits _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i){\check{\mathbf{Z}}}_{ia} {\check{\mathbf{Z}}}_{ia}^T+o_p(1)\right\} \sqrt{n}(\hat{{\varvec{\beta }}}_a -{{\varvec{\beta }}}_{a0})\nonumber \\&\quad = \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n{ \check{\mathbf{Z}}}_{ia}\phi ^{\prime }_h(\varepsilon _i) +\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n{ \check{\mathbf{Z}}}_{ia}\phi ^{\prime \prime }_h(\varepsilon _i)\mathbf{X}_{ia}^TR^*(U_i)\triangleq J_1+J_2. \end{aligned}$$
(26)

By the definition of \(R^*(U_i)\), we can prove \(J_2=o_p(1)\). Moreover, we have

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\phi ^{\prime \prime }_h(\varepsilon _i){ \check{\mathbf{Z}}}_{ia} {\check{\mathbf{Z}}}_{ia}^T\stackrel{\mathrm{P}}{\longrightarrow } \Sigma . \end{aligned}$$

It remains to show that

$$\begin{aligned} J_1 \stackrel{\mathrm{d}}{\longrightarrow } N(0,\Delta ), \end{aligned}$$
(27)

where \(\Delta =\mathrm{E}(G(\mathbf{X,Z},U,h){\check{\mathbf{Z}}}_a{ \check{\mathbf{Z}}}_a^T)\).

Combining (26) and (27) and using Slutsky's theorem, it follows that

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\beta }}}_a-{{\varvec{\beta }}}_{a0})\stackrel{\mathrm{d}}{\longrightarrow }N(0,\Sigma ^{-1}\Delta \Sigma ^{-1}). \end{aligned}$$

Next, we prove (27). Note that for any vector \({\varvec{\varsigma }}\) whose components are not all zero,

$$\begin{aligned} {\varvec{\varsigma }}^TJ_1=\sum _{i=1}^n \frac{1}{\sqrt{n}}{\varvec{\varsigma }}^T {\check{\mathbf{Z}}}_{ia}\phi ^{\prime }_h(\varepsilon _i)=\sum _{i=1}^n a_i\xi _i, \end{aligned}$$

where \(a_i^2=\frac{1}{n}G(\mathbf{X}_i,\mathbf{Z}_i,U_i,h){\varvec{\varsigma }}^T {\check{\mathbf{Z}}}_{ia}{\check{\mathbf{Z}}}_{ia}^T{\varvec{\varsigma }}\) and, conditional on \(\{\mathbf{X}_i,\mathbf{Z}_i,U_i\}\), the \(\xi _i\) are independent with mean zero and variance one. By checking the Lindeberg condition, it follows easily that if

$$\begin{aligned} \frac{\max _i a_i^2}{\sum _{i=1}^n a_i^2}\stackrel{\mathrm{P}}{\longrightarrow } 0, \end{aligned}$$
(28)

then \(\sum _{i=1}^n a_i\xi _i/\sqrt{\sum _{i=1}^n a_i^2}\stackrel{\mathrm{d}}{\longrightarrow } N(0,1)\). Thus, we can conclude that (27) holds.

Now, we only need to show that (28) holds. Note that \(({\varvec{\varsigma }}^T {\check{\mathbf{Z}}}_{ia})^2\le \Vert {\varvec{\varsigma }}\Vert ^2\Vert {\check{\mathbf{Z}}}_{ia}\Vert ^2\); hence \(a_i^2\le \frac{1}{n}G(\mathbf{X}_i,\mathbf{Z}_i,U_i,h)\Vert {\varvec{\varsigma }}\Vert ^2 \Vert {\check{\mathbf{Z}}}_{ia}\Vert ^2\). Since

$$\begin{aligned} \Vert {\check{\mathbf{Z}}}_{ia}\Vert =\Vert \mathbf{Z}_{ia}-\Psi ^T\Phi ^{-1}\mathbf{W}_{ia}\Vert \le \Vert \mathbf{Z}_{ia}\Vert +\Vert \Psi ^T\Phi ^{-1}\mathbf{W}_{ia}\Vert , \end{aligned}$$

and by the conditions \(\max _i\Vert \mathbf{X}_i\Vert /\sqrt{n}=o_p(1)\) and \(\max _i\Vert \mathbf{Z}_i\Vert /\sqrt{n}=o_p(1)\) in (C3), using the properties of the spline basis (Schumaker 1981) and the definition \(\mathbf{W}_{ia}=I_p\otimes B(U_i)\cdot \mathbf{X}_{ia}\), together with conditions (C5) and (C7), we can prove

$$\begin{aligned} \max _i\Vert \check{\mathbf{Z}}_{ia}\Vert /\sqrt{n}=o_p(1). \end{aligned}$$

Applying Slutsky's theorem, (28) holds, which completes the proof of Theorem 3. \(\square \)

Proof of Theorem 4

Proof

According to Eq. (24) and the asymptotic normality of \(\hat{\varvec{\beta }}_{a}-{\varvec{\beta }}_{a0}\) in Theorem 3, for any vector \(\mathbf{d}_n\) of dimension \(q\times s_1\) whose components are not all 0, by conditions (C1)–(C5) and (C7), and using Slutsky's theorem and the properties of the multivariate normal distribution, it follows that

$$\begin{aligned} \left\{ \mathbf{d}_n^T\mathrm{var}(\hat{\varvec{\gamma }}_a)\mathbf{d}_n\right\} ^{-1/2}\mathbf{d}_n^T(\hat{\varvec{\gamma }}_a-{{\varvec{\gamma }}}_{a0}) \stackrel{\mathrm{d}}{\longrightarrow } N(0, 1), \end{aligned}$$

where

$$\begin{aligned} \mathrm{var}(\hat{\varvec{\gamma }}_a)={\Phi }^{-1}\Psi \frac{\Sigma ^{-1}\Delta \Sigma ^{-1}}{n}\Psi ^T\Phi ^{-1}. \end{aligned}$$

For any \(q\times s_1\)-vector \(\mathbf{c}_n\) whose components are not all 0, by the definition of \(\hat{\varvec{\alpha }}_a\) and \(\tilde{\varvec{\alpha }}_a\), choosing \(\mathbf{d}_n=\mathbf{W}_a^T\mathbf{c}_n\) yields

$$\begin{aligned} \left\{ \mathbf{c}_n^T\mathrm{var}(\hat{\varvec{\alpha }}_a(u))\mathbf{c}_n\right\} ^{-1/2}\mathbf{c}_n^T(\hat{\varvec{\alpha }}_a(u)-\tilde{{\varvec{\alpha }}}_a(u)) \stackrel{\mathrm{d}}{\longrightarrow } N(0, 1). \end{aligned}$$

\(\square \)

About this article

Cite this article

Zhao, W., Zhang, R., Liu, J. et al. Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66, 165–191 (2014). https://doi.org/10.1007/s10463-013-0410-4

