
A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates

  • Regular Article
  • Published in Statistical Papers

Abstract

In this paper, a new robust and efficient estimation approach based on local modal regression is proposed for partially linear models with large-dimensional covariates. We show that the resulting estimators of both the parametric and nonparametric components are more efficient in the presence of outliers or heavy-tailed error distributions, and asymptotically as efficient as the corresponding least squares estimators when there are no outliers and the error distribution is normal. We also establish the asymptotic properties of the proposed estimators when the covariate dimension diverges at the rate of \(o( {\sqrt{n} })\). To achieve sparsity and enhance interpretability, we develop a variable selection procedure based on the SCAD penalty to select significant parametric covariates and show that the method enjoys the oracle property under mild regularity conditions. Moreover, we propose a practical modified MEM algorithm for the proposed procedures. Monte Carlo simulations and a real data analysis are conducted to illustrate the finite sample performance of the proposed estimators. Finally, based on the idea of the sure independence screening procedure proposed by Fan and Lv (J R Stat Soc 70:849–911, 2008), a robust two-step approach is introduced to deal with ultra-high-dimensional data.
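To make the estimation step concrete, the sketch below implements the plain (unpenalized) MEM iteration for modal linear regression of Li et al. (2007) and Yao and Li (2014), on which the paper's modified MEM algorithm builds. This is our illustration, not the authors' code; the function name, the Gaussian kernel \(\phi _h\), and the default settings are assumptions for demonstration only.

    import numpy as np

    def mem_modal_regression(X, y, h, n_iter=200, tol=1e-8):
        """Plain MEM iteration for modal linear regression
        (Li et al. 2007; Yao and Li 2014); an illustrative sketch,
        not the authors' modified algorithm."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares start
        for _ in range(n_iter):
            r = y - X @ beta
            # E-step: weight each point by the Gaussian kernel phi_h(residual)
            w = np.exp(-0.5 * (r / h) ** 2)
            w /= w.sum()
            # M-step: weighted least squares update
            XtW = X.T * w
            beta_new = np.linalg.solve(XtW @ X, XtW @ y)
            if np.linalg.norm(beta_new - beta) < tol:
                return beta_new
            beta = beta_new
        return beta

    # toy usage: a clean linear signal with heavy-tailed errors
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=2, size=200)
    print(mem_modal_regression(X, y, h=1.0))

The E-step downweights observations whose residuals lie far from the conditional mode, which is the mechanism behind the robustness to outliers and heavy tails, and the EM-type ascent property (Li et al. 2007) guarantees that the kernel objective never decreases across iterations.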


References

  • Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
  • Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
  • Chen B, Yu Y, Zou H, Liang H (2012) Profiled adaptive Elastic-Net procedure for partially linear models with high-dimensional covariates. J Stat Plann Inference 142:1733–1745
  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39:1–21
  • Engle R, Granger C, Rice J, Weiss A (1986) Semiparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81:310–320
  • Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman and Hall, London
  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
  • Fan J, Lv J (2008) Sure independence screening for ultra-high-dimensional feature space. J R Stat Soc 70:849–911
  • Fan J, Hu TC, Truong YK (1994) Robust nonparametric function estimation. Scand J Stat 21:433–446
  • Hardle W, Liang H, Gao JT (2000) Partially linear models. Springer, New York
  • Huber PJ (1981) Robust statistics. Wiley, New York
  • Johnson RW (2003) Kiplinger's personal finance. J Stat Educ 57:104–123
  • Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36:261–286
  • Li J, Ray S, Lindsay B (2007) A nonparametric statistical approach to clustering via mode identification. J Mach Learn Res 8:1687–1723
  • Li GR, Peng H, Zhu LX (2011) Nonconcave penalized M-estimation with diverging number of parameters. Stat Sin 21:391–420
  • Mallows CL (1973) Some comments on \(C_p\). Technometrics 15:661–675
  • Ni X, Zhang HH, Zhang D (2009) Automatic model selection for partially linear models. J Multivar Anal 100:2100–2111
  • Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econom Theory 7:186–199
  • Rao BLSP (1983) Nonparametric functional estimation. Academic Press, Orlando
  • Robinson PM (1988) Root-\(n\)-consistent semiparametric regression. Econometrica 56:931–954
  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
  • Severini TA, Staniswalis JG (1994) Quasi-likelihood estimation in semiparametric models. J Am Stat Assoc 89:501–511
  • Speckman PE (1988) Kernel smoothing in partial linear models. J R Stat Soc 50:413–436
  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc 58:267–288
  • Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355
  • Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37:673–696
  • Yang H, Yang J (2014) A robust and efficient estimation and variable selection method for partially linear single-index models. J Multivar Anal 129:227–242
  • Yao W, Li L (2014) A new regression model: modal linear regression. Scand J Stat 41:656–671
  • Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
  • Zeger S, Diggle P (1994) Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 50:689–699
  • Zhang R, Zhao W, Liu J (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25:523–544
  • Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
  • Zhao W, Zhang R, Liu Y, Liu J (2015) Empirical likelihood based modal regression. Stat Pap 56:411–430
  • Zhou H, You J, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Pap 51:629–650
  • Zhu LX, Fang KT (1996) Asymptotics for kernel estimation of sliced inverse regression. Ann Stat 24:1053–1068
  • Zhu L, Huang M, Li R (2012) Semiparametric quantile regression with high-dimensional covariates. Stat Sin 22:1379–1401
  • Zhu L, Li R, Cui H (2013) Robust estimation for partially linear models with large-dimensional covariates. Sci China Math 56:2069–2088
  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 11671059).

Author information

Corresponding author

Correspondence to Ning Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (xls 51 KB)

Supplementary material 2 (xls 69 KB)

Appendix: Proofs of Theorems

Proof of Theorem 1

We want to show that, for any given \(\delta > 0\), there exists a sufficiently large constant \(C\) such that

$$\begin{aligned} P\left\{ {\mathop {\sup }\limits _{\Vert \mathbf {v} \Vert = C} {R}( {\beta _0 + {{\mu _n}}\mathbf {v}} ) < {R}( \beta _0 )} \right\} \ge 1 - \delta , \end{aligned}$$
(13)

where \(R(\beta ) = \frac{1}{n}\sum \limits _{i = 1}^n {\phi _{h_1}}( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T\beta })\) and \({\mu _n}=p_n^{1/2}n^{-1/2}\).

For simplicity, we define \({D_n}(\mathbf{{v}}) = {R}( {\beta _0 + {{\mu _n}}{} \mathbf{{v}}}) - {R }(\beta _0)\) and then obtain that

$$\begin{aligned} {D_n}( \mathbf{{v}})&= \frac{1}{n} {\sum \limits _{i = 1}^n\left\{ {{\phi _{{h_1}}}}\left( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T({\beta _0 + {\mu _n}\mathbf {v}} )}\right) - {{\phi _{{h_1}}}}\left( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T\beta _0 }\right) \right\} }\\&= \frac{1}{n} {\sum \limits _{i = 1}^n\left\{ {{\phi _{{h_1}}}}\left( {{\varepsilon _i} - {\sigma _i} - {\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}} \right) -{{\phi _{{h_1}}}} ( {{\varepsilon _i} - {\sigma _i}})\right\} }\\&= \frac{1}{n}\sum \limits _{i = 1}^n \left\{ { - \phi _{{h_1}}^\prime } ( {{\varepsilon _i} - {\sigma _i}})\left( {\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\right) + \frac{1}{2}\phi _{{h_1}}^{\prime \prime }( {{\varepsilon _i} - {\sigma _i}} ){{\left( {\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\right) }^2}\right. \\&\left. \quad - \frac{1}{6}\phi _{{h_1}}^{\prime \prime \prime }( {{t_i}} ){{\left( {\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\right) }^3}\right\} \\&= {I_1}+{I_2}+{I_3}, \end{aligned}$$

where \({\sigma _i} = {{\widetilde{Y}}_i} - {{{\widehat{\widetilde{Y}}}}_i} - {({{\widetilde{X}}_i} - {{{\widehat{\widetilde{X}}}}_i})^T}{\beta _0}\), and \(t_i\) lies between \({\varepsilon _i} - {\sigma _i}\) and \({\varepsilon _i} - {\sigma _i}-{\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\). Under the regularity conditions (A1)–(A4), the consistency of the kernel estimators implies that \({\max _{1 \le i \le n}}\Vert {{\sigma _i}} \Vert = {o_p}(1)\) almost surely, a fact that will be used repeatedly in the proofs. A detailed discussion of this argument can be found in Lemmas 3.5.1 and A.1 of Hardle et al. (2000).

Based on the fact that \(\xi = E(\xi ) + {\mathrm{O}_p}(\sqrt{Var(\xi )})\), the regularity condition (A7) and the uniform convergence of \({{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}}\) entail that \({I_1} = {\mathrm{O}_p}(\frac{{C{\mu _n}}}{{\sqrt{n} }})\); see Rao (1983) and Zhu and Fang (1996) for the technical details. Similarly, we have \({I_3} = {\mathrm{O}_p}({C^3}\mu _n^3)\). For \(I_2\), we have \({I_2} = \frac{1}{2}\mu _n^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\), where \({\varSigma _1} = E\{ F(U,{h_1}){\widetilde{X}}{{\widetilde{X}}^T}\}.\) Since \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), it follows that \({I_1} = {o_p}\left( {{I_2}} \right) \) and \({I_3} = {o_p}\left( {{I_2}} \right) \); a similar argument can be found in Li et al. (2011).
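To spell out the dominance (our elaboration of the rates just displayed, using \(\mu _n = \sqrt{p_n/n}\) and assuming, as the regularity conditions ensure, that the eigenvalues of \(\varSigma _1\) are bounded away from zero and infinity so that \(|I_2| \asymp C^2\mu _n^2\)):

$$\begin{aligned} \frac{I_1}{I_2} = {\mathrm{O}_p}\left( \frac{C\mu _n/\sqrt{n}}{C^2\mu _n^2}\right) = {\mathrm{O}_p}\left( \frac{1}{C\sqrt{p_n}}\right) ,\qquad \frac{I_3}{I_2} = {\mathrm{O}_p}\left( \frac{C^3\mu _n^3}{C^2\mu _n^2}\right) = {\mathrm{O}_p}\left( C\sqrt{p_n/n}\right) , \end{aligned}$$

and both ratios tend to zero: the first for \(C\) large, and the second because \({p_n^2}/n \rightarrow 0\) implies \(p_n/n \rightarrow 0\).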

By the regularity condition (A6), \(F(u,{h_1}) < 0\); hence \({\varSigma _1}\) is a negative definite matrix. Since \(\left\| \mathbf {v} \right\| = C\), we can choose \(C\) large enough that \(I_2\) dominates both \(I_1\) and \(I_3\) with probability at least \(1- \delta \). It follows that Eq. (13) holds, and hence \(\widehat{\beta }\) is a root-\((n/{p_n})\) consistent estimator of \(\beta \). \(\square \)

Proof of Theorem 2

Let \({{\widehat{\gamma }}_i} = {{{\widehat{\widetilde{X}}}}_i}^T( {\widehat{\beta }- {\beta _0}} )\). If \(\widehat{\beta }\) maximizes Eq. (6), then \(\widehat{\beta }\) satisfies the following equation:

$$\begin{aligned} 0&= \sum \limits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{{h_1}}^\prime } \left( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T\widehat{\beta }}\right) = \sum \limits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{{h_1}}^\prime }( {{\varepsilon _i} -{\sigma _i}- {{\widehat{\gamma }}_i}} )\\&= \sum \limits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\left\{ \phi _{{h_1}}^\prime ( {{\varepsilon _i}} ) - \phi _{{h_1}}^{\prime \prime }( {{\varepsilon _i}} ){({\sigma _i} + {{\widehat{\gamma }}_i})} + \frac{1}{2}\phi _{{h_1}}^{\prime \prime \prime }( {\varepsilon _i^ * }){({\sigma _i} + {{\widehat{\gamma }}_i})}^2\right\} }\\&=I_4+I_5+I_6, \end{aligned}$$

where \({\varepsilon _i^ * }\) is between \(\varepsilon _i\) and \(\varepsilon _i-{\sigma _i}-{\widehat{\gamma }}_i\).

For \(I_5\), we have

$$\begin{aligned} - \sum \limits _{i = 1}^n{{{\widehat{\widetilde{X}}}}_i}{\phi _{{h_1}}^{\prime \prime }( {{\varepsilon _i}} )({\sigma _i} + {{\widehat{\gamma }}_i})}= & {} - \sum \limits _{i = 1}^n {\phi _{{h_1}}^{\prime \prime }( {{\varepsilon _i}})\left\{ {{{\widehat{\widetilde{X}}}}_i}{{{\widehat{\widetilde{X}}}}_i}^T(\widehat{\beta }- {\beta _0}) + {o_p}(1)\right\} }\\= & {} - n{\varSigma _1}(\widehat{\beta }- {\beta _0}) + {o_p}(1), \end{aligned}$$

where \({\varSigma _1} = E\{ {F( {U,{h_1}}){\widetilde{X}}{{ \widetilde{X}}^T}}\}\), and the last equality follows from the regularity conditions (A5) and (A7).

Based on \({| {{{\widehat{\gamma }}_i}} |^2} = {\mathrm{O}_p}({\Vert {\widehat{\beta }- {\beta _0}}\Vert ^2})\) and \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), we have \({I_6} = o_p({I_5})\). A direct calculation then shows that \(\sqrt{n} (\widehat{\beta }- {\beta _0}) = \frac{1}{{\sqrt{n} }}\varSigma _1^{ - 1}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{{h_1}}^\prime ( {{\varepsilon _i}})} + {o_p}(1).\)

Note that \(E\{ {\phi _{{h_1}}^\prime ( \varepsilon )\mid U = u} \} = 0\); by the central limit theorem, we then have

$$\begin{aligned} \sqrt{n} (\widehat{\beta }- {\beta _0})\mathop \rightarrow \limits ^d N(0,\varSigma _1^{ - 1}{\varSigma _2}\varSigma _1^{ - 1}), \end{aligned}$$

where \({\varSigma _2} = Var\{ \widetilde{X}\phi _{{h_1}}^\prime (\varepsilon )\}\). This completes the proof. \(\square \)
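As an informal check of the efficiency claim in the abstract (our remark, not part of the authors' proof), suppose \(\phi _{h}(t) = h^{-1}\phi (t/h)\) with \(\phi \) the standard normal density, as is common in modal regression (Yao et al. 2012), and let \(h_1 \rightarrow \infty \) with \(\varepsilon \) independent of \(\widetilde{X}\), \(E(\varepsilon ) = 0\) and \(Var(\varepsilon ) = \sigma ^2\). Then

$$\begin{aligned} \phi _{h_1}^\prime (t) \approx - \phi (0)h_1^{ - 3}\,t,\qquad \phi _{h_1}^{\prime \prime }(t) \approx - \phi (0)h_1^{ - 3}, \end{aligned}$$

so \(\varSigma _1 \approx - \phi (0)h_1^{ - 3}E( {\widetilde{X}}{{\widetilde{X}}^T})\), \(\varSigma _2 \approx \phi (0)^2h_1^{ - 6}\sigma ^2E( {\widetilde{X}}{{\widetilde{X}}^T})\), and the sandwich matrix \(\varSigma _1^{ - 1}{\varSigma _2}\varSigma _1^{ - 1}\) collapses to \(\sigma ^2\{ E( {\widetilde{X}}{{\widetilde{X}}^T})\} ^{ - 1}\), the least squares covariance.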

Proof of Theorem 3

Since Theorem 3 is parallel to Theorem 6, we present a detailed proof only for Theorem 6. \(\square \)

Proof of Theorem 4

It is sufficient to show that, for any given \(\delta > 0\), there exists a sufficiently large constant \(C\) such that

$$\begin{aligned} P\left\{ {\mathop {\sup }\limits _{\left\| \mathbf {v} \right\| = C} {Q_\lambda }\left( {\beta _0 + {{{\omega _n}}}\mathbf {v}} \right) < {Q_\lambda }\left( \beta _0 \right) } \right\} \ge 1 - \delta , \end{aligned}$$
(14)

where \({\omega _n} = p_n^{1/2}({n^{ - 1/2}} + {a_n})\).
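For concreteness (our addition, following the SCAD penalty of Fan and Li 2001 used in the paper; here \(a_n\) and \(b_n\) are, as is standard, the maxima of \(|p_\lambda ^\prime |\) and \(|p_\lambda ^{\prime \prime }|\) over the nonzero true coefficients, with the precise definitions fixed in the main text), the SCAD first derivative is

$$\begin{aligned} p_\lambda ^\prime (t) = \lambda \left\{ I(t\le \lambda ) + \frac{(a\lambda -t)_+}{(a-1)\lambda }I(t>\lambda )\right\} ,\qquad t>0,\ a=3.7 . \end{aligned}$$

In particular, \(p_\lambda ^\prime (t)=0\) for \(t>a\lambda \); hence if \(\lambda \rightarrow 0\) while the nonzero coefficients stay bounded away from zero, then \(a_n=0\) for all large \(n\) and \(\omega _n\) reduces to \(p_n^{1/2}n^{-1/2}\).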

Let \({I_7} = - \sum \nolimits _{j = 1}^{{k_n}} {\{ p_{\lambda }(\left| {{\beta _{{0j}}} + {\omega _n}{v_j}} \right| ) - p_{\lambda }(\left| {{\beta _{{0j}}}} \right| )\} } \), where \({k_n}\) is the number of components of \(\beta _{{0a}}\). Note that \(p_{\lambda }(0) = 0\) and \(p_{\lambda }(| {{\beta _{{j}}}} |) \ge 0\) for all \(\beta _j\). By the proof of Theorem 1, we have

$$\begin{aligned} \frac{1}{n}\left\{ {Q_\lambda }\left( {{\beta _0} + {\omega _n}{} \mathbf{{v}}} \right) - {Q_\lambda }\left( {{\beta _0}} \right) \right\} \le {{\bar{I}}_1} + {{\bar{I}}_2} + {{\bar{I}}_3} + {I_7}, \end{aligned}$$
(15)

where \({{\bar{I}}_1}\), \({{\bar{I}}_2}\), and \({{\bar{I}}_3}\) are the same as \(I_1\), \(I_2\), and \(I_3\) except that the factor \({\mu _n}\) is replaced by \(\omega _n\).

By the Taylor expansion and the Cauchy–Schwarz inequality, \(I_7\) is bounded by

$$\begin{aligned} \sqrt{{k_n}} \omega {}_n{a_n}\left\| \mathbf {v} \right\| + \omega {{}_n^2}{b_n}{\left\| \mathbf {v} \right\| ^2}. \end{aligned}$$
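Spelling out the step behind this bound (our elaboration): a two-term Taylor expansion of each \(p_\lambda \) around \(|\beta _{0j}|\), followed by the Cauchy–Schwarz inequality applied to the first-order sum, gives

$$\begin{aligned} |I_7| \le \sum \limits _{j = 1}^{k_n}\left\{ |p_\lambda ^\prime (|\beta _{0j}|)|\,\omega _n|v_j| + |p_\lambda ^{\prime \prime }(|\beta _{0j}|)|\,\omega _n^2v_j^2(1 + o(1))\right\} \le \sqrt{k_n}\,\omega _na_n\left\| \mathbf {v} \right\| + \omega _n^2b_n{\left\| \mathbf {v} \right\| ^2}. \end{aligned}$$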

Consequently, since \(b_n \rightarrow 0\), \(I_7\) is dominated by \({{\bar{I}}_2} = \frac{1}{2}\omega _n^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\) provided \(C\) is taken sufficiently large. Hence, for large \(C\), \({\bar{I}}_2\) dominates the other three terms in Eq. (15). Since \({\bar{I}}_2<0\), Eq. (14) holds, and the result in Theorem 4 follows. \(\square \)

To prove Theorem 5, we need the following lemma.

Lemma 1

Under the conditions of Theorem 5, with probability tending to 1, for any given \({\beta _a}\) satisfying \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {\mathrm{O}_p}(\sqrt{p_n/n} )\) and any constant \(C\), we have

$$\begin{aligned} {Q_\lambda }\left\{ {\left( \begin{array}{l} {\beta _a}\\ 0 \end{array} \right) } \right\} = \mathop {\max }\limits _{\left\| {{\beta _b}} \right\| \le C{{({p_n}/n)}^{1/2}}} {Q_\lambda }\left\{ {\left( \begin{array}{l} {\beta _a}\\ {\beta _b} \end{array} \right) } \right\} . \end{aligned}$$
(16)

Proof of Lemma 1

From the proof of Theorem 2, we have

$$\begin{aligned} R^\prime (\beta ) = \frac{{\partial R(\beta )}}{{\partial \beta }} = \frac{1}{n}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{h_1}^\prime ({\varepsilon _i})} - {\varSigma _1}(\beta - {\beta _0}) + o({\omega _n}). \end{aligned}$$

It can be shown that each component of \(\frac{1}{n}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{h_1}^\prime ({\varepsilon _i})}\) is \({O_p}({n^{ - 1/2}})\). Since \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {O_p}(\sqrt{p_n/n} )\), we have \(R_j^\prime (\beta ) = {O_p}(\sqrt{p_n/n} )\) componentwise. Therefore, for \({\beta _j} \ne 0\) and \(j = {k_n} + 1, \ldots ,{p_n}\),

$$\begin{aligned} \frac{{\partial {Q_\lambda }(\beta )}}{{\partial {\beta _j}}}= & {} nR_j^\prime (\beta ) - np_{\lambda }^\prime (\left| {{\beta _j}} \right| )\mathrm{{sgn}}({\beta _j})\\= & {} - n{\lambda }\left\{ \lambda ^{ - 1}p_{\lambda }^\prime (\left| {{\beta _j}} \right| )\mathrm{{sgn}}({\beta _j}) + {O_p}\left( \frac{{\sqrt{p_n/n} }}{{{\lambda }}}\right) \right\} . \end{aligned}$$

Since \({{\lim {{\inf }_{n \rightarrow \infty }}\lim {{\inf }_{t \rightarrow {0^ + }}}p_{\lambda }^\prime (t)} / {{\lambda }}} > 0\) and \({({n / {{p_n}}})^{1/2}}{\lambda } \rightarrow \infty \), the sign of the derivative for \({\beta _j} \in ( - C\sqrt{p_n/n}, C\sqrt{p_n/n})\) is determined entirely by the sign of \({\beta _j}\). Therefore, Eq. (16) holds. \(\square \)

Proof of Theorem 5

From Lemma 1, it follows that \(\widehat{\beta }_b^\lambda = 0\). We next show the asymptotic normality of \(\widehat{\beta }_a^\lambda \). By Theorem 4, there exists a \(\widehat{\beta }_a^\lambda \) that is a root-\((n/{p_n})\) consistent local maximizer of \({Q_\lambda }\{ {{{\left( {\beta _a^T,0} \right) }^T}}\}\) and that satisfies the following equations:

$$\begin{aligned} \frac{{\partial {Q_\lambda }(\beta )}}{{\partial {\beta _j}}}\left| {_{\beta = {{\left( (\widehat{\beta }_a^\lambda )^T,0\right) }^T}}} \right. = 0,\quad \mathrm{{for}} \quad j=1,2,\ldots ,{k_n}. \end{aligned}$$

Therefore,

$$\begin{aligned}&nR_j^\prime (\widehat{\beta }^\lambda ) - np_{\lambda }^\prime (| {\widehat{\beta }_j^\lambda }|)\mathrm{{sgn}}(\widehat{\beta }_j^\lambda )\\&\quad = \sum \limits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\left\{ \phi _{{h_1}}^{\prime }\left( {{\varepsilon _i}} \right) - \phi _{{h_1}}^{\prime \prime }\left( {{\varepsilon _i}} \right) {({\sigma _i} + {{\widehat{\gamma }}_i})} + \frac{1}{2}\phi _{{h_1}}^{\prime \prime \prime }\left( {\varepsilon _i^ * } \right) {({\sigma _i} + {{\widehat{\gamma }}_i})}^2\right\} }\\&\qquad -\, n\left\{ p_{\lambda }^{\prime }(\left| {{\beta _{0j}}} \right| )\mathrm{{sgn}}({\beta _{0j}}) + \left( p_{\lambda }^{\prime \prime }(\left| {{\beta _{0j}}} \right| ) + {o_p}(1)\right) (\widehat{\beta }_j^\lambda - {\beta _{0j}})\right\} , \end{aligned}$$

where \({\varepsilon _i^ * }\) lies between \(\varepsilon _i\) and \(\varepsilon _i-{\sigma _i}-{\widehat{\gamma }}_i\).

By arguments similar to those in the proof of Theorem 2, it follows from the central limit theorem and Slutsky's theorem that

$$\begin{aligned} \sqrt{n} (\varSigma _1^{(1)} + {\varPsi _\lambda })\{ \widehat{\beta }_a^\lambda - {\beta _{0a}} + {(\varSigma _1^{(1)} + {\varPsi _\lambda })^{ - 1}}\mathbf{s}_n\} \rightarrow {N}( {\mathbf{{0}},\varSigma _2^{(1)}}) \end{aligned}$$

in distribution, where \(\varSigma _1^{(1)}\), \(\varSigma _2^{(1)}\) are the submatrices of \(\varSigma _1\) and \(\varSigma _2\) corresponding to \(\beta _{0a}\). \(\square \)

Proof of Theorem 6

For notational clarity, we let \({K_i} = K({\frac{{{Z_i} -Z}}{{{h_2}}}})\) and \(l\left( r \right) = - {\phi _{{h_3}}}(r)\). Then, Eq. (11) can be rewritten as

$$\begin{aligned} ( {{{\widehat{a}^{\lambda }}},\widehat{b}^{\lambda }} ) = \mathop {\arg \min }\limits _{a,b} \sum \limits _{i = 1}^n l{({ {Y_i} - X_i^T\widehat{\beta }^{\lambda }- a - ( {{Z_i} - Z})b} )}{K_i}. \end{aligned}$$

Let \({\theta } = {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{a}}-f(Z),{h_2}( {{{b}}-f'(Z)})}]\), \(z_i^ * = {[ {1,{{{{( {{Z_i} - Z} )}^T}}/{{h_2}}}} ]^T}\), \({s_i} = X_i^T( {{\beta _0} - {{\widehat{\beta }}^\lambda }})\), \({\delta _i} = {Y_i} - X_i^T\widehat{\beta }^{\lambda } - f(Z) - f'(Z)({Z_i} - Z)\), \({\delta _i^*} = {Y_i} - X_i^T\beta _0 - f(Z) - f'(Z)({Z_i} - Z)\) and \(f_i={f({Z_i}) - f(Z) - f'({Z})({Z_i} - Z)}\). Then, \({\theta _n}= {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{\widehat{a}}^\lambda }-f(Z),{h_2}( {{{\widehat{b}}^\lambda }-f'(Z)})}]\) minimizes the function

$$\begin{aligned} {J_n}({\theta })&=\sum \limits _{i = 1}^n {\left\{ {l( { {Y_i} - X_i^T\widehat{\beta }^{\lambda } - a - b({Z_i} - Z)}) - l({\delta _i})} \right\} } {K_i}\\&= \sum \limits _{i = 1}^n {\{ {l( {{\delta _i} - {{( {nh_2^q})}^{ - {1 / 2}}}} ( {{\theta ^T}z_i^ * })) - l({\delta _i})}\}} {K_i}. \end{aligned}$$

Since the function \({J_n}({\theta })\) is convex in \(\theta \), it is sufficient to prove that \({J_n}({\theta })\) converges pointwise to its conditional expectation (Pollard 1991).

Given \({{{{\mathbf{{X}}}}}} = {({{{{X}}}_1}, \ldots ,{{{{X}}}_n})^T}\) and \({{{{\mathbf{{Z}}}}}} = {({{{{Z}}}_1}, \ldots ,{{{{Z}}}_n})^T}\), we can obtain that

$$\begin{aligned} E\left( {{J_n}(\theta )\left| {\mathbf{{Z}}} \right. } \right)&=-\, {( {nh_2^q} )^{ - {1 / 2}}}\sum \limits _{i = 1}^n {\varphi _{h_3}'( {f_i + {s_i}}|Z_i)( {{\theta ^T}z_i^ * })} {K_i}\\&\quad +\, \frac{{1 }}{2}{( {nh_2^q})}^{-1}{\sum \limits _{i = 1}^n {\varphi _{h_3}''( {f_i + {s_i}}|Z_i )}{( {{\theta ^T}z_i^ * })} ^2}{K_i}({1 + o_p(1)})\\&=-\, {( {nh_2^q} )^{ - {1 / 2}}}\sum \limits _{i = 1}^n {\varphi _{h_3}'( {f_i}|Z_i)( {{\theta ^T}z_i^ * })} {K_i}\\&\quad +\, \frac{{1 }}{2}{( {nh_2^q})}^{-1}{\sum \limits _{i = 1}^n {\varphi _{h_3}''( 0|Z_i )}{( {{\theta ^T}z_i^ * })} ^2}{K_i}+ o_p\{{( {nh_2^q})}^{-1}\}, \end{aligned}$$

where the last equality follows from the regularity condition (B3); similar arguments appear in (D.2) and (D.3) of Zhu et al. (2013).

Then, we can obtain that

$$\begin{aligned} {J_n}({\theta })&={( {nh_2^q})^{ - {1 / 2}}}\sum \limits _{i = 1}^n {\phi _{h_3}'({f_i+{\varepsilon _i}} )( {{\theta ^T}z_i^ * })} {K_i}\\&\quad +\frac{1 }{2}{( {nh_2^q})^{ - 1}}{\sum \limits _{i = 1}^n {{\varphi _{h_3}''( 0|Z_i )}( {{\theta ^T}z_i^ * })} ^2}{K_i} + {o_p}\{ {{{( {nh_2^q})}^{{1 / 2}}}} \}\\&={( {nh_2^q})^{ - {1 / 2}}}\sum \limits _{i = 1}^n {\phi _{h_3}'({{\delta _i^*}} )( {{\theta ^T}z_i^ * })} {K_i}\\&\quad +\frac{1 }{2}{( {nh_2^q})^{ - 1}}{\sum \limits _{i = 1}^n {{\varphi _{h_3}''( 0|Z_i )}( {{\theta ^T}z_i^ * })} ^2}{K_i} + {o_p}\{ {{{( {nh_2^q})}^{{1 / 2}}}} \} \end{aligned}$$

which is parallel to (4.6) of Fan et al. (1994). The rest of the proof follows along the lines of Fan et al. (1994) by treating the dimension of \(Z\) as fixed, so the details are omitted here. \(\square \)
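As with the parametric step, the minimization in Eq. (11) can be carried out by a kernel-weighted MEM iteration in the spirit of Yao et al. (2012). The sketch below is our illustration only (scalar \(Z\) for simplicity, hypothetical helper name); it fits the local linear modal estimate of \(f\) at a point \(z_0\) from the partial residuals \(Y_i - X_i^T\widehat{\beta }^\lambda \).

    import numpy as np

    def local_modal_fit(z0, Z, resid, h2, h3, n_iter=100, tol=1e-8):
        """Local linear modal fit at a point z0 (scalar Z for simplicity),
        in the spirit of Eq. (11); resid holds Y_i - X_i^T beta_hat."""
        d = Z - z0
        D = np.column_stack([np.ones_like(d), d])   # local design (1, Z_i - z0)
        K = np.exp(-0.5 * (d / h2) ** 2)            # smoothing weights K_i
        theta = np.array([np.average(resid, weights=K), 0.0])
        for _ in range(n_iter):
            r = resid - D @ theta
            # E-step: combined weights K_i * phi_{h3}(residual)
            w = K * np.exp(-0.5 * (r / h3) ** 2)
            w /= w.sum()
            # M-step: weighted local linear least squares
            DtW = D.T * w
            theta_new = np.linalg.solve(DtW @ D, DtW @ resid)
            if np.linalg.norm(theta_new - theta) < tol:
                return theta_new
            theta = theta_new
        return theta                                # (f_hat(z0), f_hat'(z0))

The first component of the returned vector estimates \(f(z_0)\) and the second estimates \(f'(z_0)\), matching the local linear parametrization \((a, b)\) of Eq. (11).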


Cite this article

Yang, H., Li, N. & Yang, J. A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates. Stat Papers 61, 1911–1937 (2020). https://doi.org/10.1007/s00362-018-1013-1
