Abstract
In this paper, a new robust and efficient estimation approach based on local modal regression is proposed for partially linear models with large-dimensional covariates. We show that the resulting estimators of both the parametric and nonparametric components are more efficient in the presence of outliers or heavy-tailed error distributions, and are asymptotically as efficient as the corresponding least squares estimators when there are no outliers and the error distribution is normal. We also establish the asymptotic properties of the proposed estimators when the covariate dimension diverges at the rate \(o\left( {\sqrt{n} } \right)\). To achieve sparsity and enhance interpretability, we develop a variable selection procedure based on the SCAD penalty to select significant parametric covariates, and show that the method enjoys the oracle property under mild regularity conditions. Moreover, we propose a practical modified MEM algorithm for the proposed procedures. Monte Carlo simulations and a real data analysis are conducted to illustrate the finite sample performance of the proposed estimators. Finally, based on the idea of the sure independence screening procedure proposed by Fan and Lv (J R Stat Soc 70:849–911, 2008), a robust two-step approach is introduced to deal with ultra-high dimensional data.
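To fix ideas, the following sketch fits modal linear regression by the MEM algorithm of Yao, Lindsay and Li (2012) with a Gaussian kernel. It illustrates the modal estimation idea only, not the authors' full procedure (which also profiles out the nonparametric component); the function name, bandwidth choice, and stopping rule are ours.

```python
import numpy as np

def modal_linear_regression(X, y, h, n_iter=200, tol=1e-8):
    """Illustrative MEM fit of modal linear regression.

    Maximizes (1/n) * sum_i phi_h(y_i - x_i' beta), where phi_h is a
    Gaussian kernel density with bandwidth h.  Each M-step is a weighted
    least squares fit, so the objective is non-decreasing over iterations.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        # E-step: weights proportional to the kernel at current residuals;
        # large residuals (outliers) receive weights close to zero.
        w = np.exp(-0.5 * (r / h) ** 2)
        # M-step: weighted least squares update.
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With heavy contamination the modal fit stays close to the bulk of the data where ordinary least squares would be pulled toward the outliers, which is the robustness property the abstract refers to.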
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
Chen B, Yu Y, Zou H, Liang H (2012) Profiled adaptive Elastic-Net procedure for partially linear models with high-dimensional covariates. J Stat Plann Inference 142:1733–1745
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39:1–21
Engle R, Granger C, Rice J, Weiss A (1986) Semiparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81:310–320
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman and Hall, London
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Lv J (2008) Sure independence screening for ultra-high-dimensional feature space. J R Stat Soc 70:849–911
Fan J, Hu TC, Truong YK (1994) Robust nonparametric function estimation. Scand J Stat 21:433–446
Hardle W, Liang H, Gao JT (2000) Partial linear models. Springer, New York
Huber PJ (1981) Robust estimation. Wiley, New York
Johnson RW (2003) Kiplinger's personal finance. J Stat Educ 57:104–123
Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36:261–286
Li J, Ray S, Lindsay B (2007) A nonparametric statistical approach to clustering via mode identification. J Mach Learn Res 8:1687–1723
Li GR, Peng H, Zhu LX (2011) Nonconcave penalized M-estimation with diverging number of parameters. Stat Sin 21:391–420
Mallows CL (1973) Some comments on \(Cp\). Technometrics 15:661–675
Ni X, Zhang HH, Zhang D (2009) Automatic model selection for partially linear models. J Multivar Anal 100:2100–2111
Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econom Theory 7:186–199
Rao BLSP (1983) Nonparametric functional estimation. Academic Press, Orlando
Robinson PM (1988) Root \(n\)-consistent semiparametric regression. Econometrica 56:931–954
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Severini TA, Staniswalis JG (1994) Quasi-likelihood estimation in semiparametric models. J Am Stat Assoc 89:501–511
Speckman PE (1988) Kernel smoothing in partial linear models. J R Stat Soc 50:413–436
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc 58:267–288
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355
Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37:673–696
Yang H, Yang J (2014) A robust and efficient estimation and variable selection method for partially linear single-index models. J Multivar Anal 129:227–242
Yao W, Li L (2014) A new regression model: modal linear regression. Scand J Stat 41:656–671
Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
Zeger S, Diggle P (1994) Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 50:689–699
Zhang R, Zhao W, Liu J (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25:523–544
Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
Zhao W, Zhang R, Liu Y, Liu J (2015) Empirical likelihood based modal regression. Stat Papers 56:411–430
Zhou H, You J, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Papers 51:629–650
Zhu LX, Fang KT (1996) Asymptotics for kernel estimation of sliced inverse regression. Ann Stat 24:1053–1068
Zhu L, Huang M, Li R (2012) Semiparametric quantile regression with high-dimensional covariates. Stat Sin 22:1379–1401
Zhu L, Li R, Cui H (2013) Robust estimation for partially linear models with large-dimensional covariates. Sci China Math 56:2069–2088
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 11671059).
Electronic supplementary material
Appendix: Proofs of Theorems
Proof of Theorem 1
We want to show that for any given \(\delta > 0\), there exists a large constant \(C\) such that
\[P\Big \{ \mathop {\sup }\limits _{\left\| \mathbf {v} \right\| = C} R( {\beta _0} + {\mu _n}\mathbf {v}) < R( {\beta _0})\Big \} \ge 1 - \delta , \qquad (13)\]
where \(R(\beta ) = \frac{1}{n}\sum \limits _{i = 1}^n {\phi _{h_1}}( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T\beta })\), \({\mu _n}={p_n}^{{1 / 2}} {{n^{{{ - 1} / 2}}}}\).
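Here and below, \(\phi _{h}(\cdot )\) denotes the rescaled kernel standard in the modal regression literature (Yao et al. 2012):

```latex
\phi _{h}(t) = \frac{1}{h}\,\phi \!\left( \frac{t}{h}\right) ,
```

where \(\phi \) is a symmetric, bounded kernel density (in practice often the standard normal density) and \(h > 0\) is a bandwidth governing the trade-off between robustness and efficiency.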
For simplicity, we define \({D_n}(\mathbf{{v}}) = {R}( {\beta _0 + {{\mu _n}}{} \mathbf{{v}}}) - {R }(\beta _0)\) and then obtain that
where \({\sigma _i} = {{\widetilde{Y}}_i} - {{{\widehat{\widetilde{Y}}}}_i} - {({{\widetilde{X}}_i} - {{{\widehat{\widetilde{X}}}}_i})^T}{\beta _0}\), and \(t_i\) lies between \({\varepsilon _i} - {\sigma _i}\) and \({\varepsilon _i} - {\sigma _i}-{\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\). By the regularity conditions (A1)–(A4), the consistency of the kernel estimators implies that \({\max _{1 \le i \le n}}\Vert {{\sigma _i}} \Vert = {o_p}(1)\) almost surely, a fact that will be used repeatedly in our proof. A detailed discussion of this argument can be found in Lemmas 3.5.1 and A.1 of Hardle et al. (2000).
Based on the fact that \(\xi = E(\xi ) + {\mathrm{O}_p}(\sqrt{Var(\xi )})\), the regularity condition (A7) and the uniform convergence of \({{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}}\) entail that \({I_1} = {\mathrm{O}_p}(\frac{{C{\mu _n}}}{{\sqrt{n} }})\); see Rao (1983) and Zhu and Fang (1996) for the technical details. Similarly, we have \({I_3} = {\mathrm{O}_p}({C^3}{\mu _n}^3)\). For \(I_2\), we have \({I_2} = \frac{1}{2}{\mu _n}^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\), where \({\varSigma _1} = E\{ F(U,{h_1}){\widetilde{X}}{{\widetilde{X}}^T}\}.\) Since \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), it follows that \({I_1} = {o_p}\left( {{I_2}} \right) \) and \({I_3} = {o_p}\left( {{I_2}} \right) \). A similar argument can be found in Li et al. (2011).
By the regularity condition (A6), \(F(u,{h_1}) < 0\); hence, \({\varSigma _1}\) is a negative definite matrix. Since \(\left\| \mathbf {v} \right\| = C\), we can choose \(C\) large enough that \(I_2\) dominates both \(I_1\) and \(I_3\) with probability at least \(1- \delta \). It follows that Eq. (13) holds. Hence, \(\widehat{\beta }\) is a root-\({{n / {{p_n}}}}\) consistent estimator of \(\beta _0\). \(\square \)
Proof of Theorem 2
Let \({{\widehat{\gamma }}_i} = {{{\widehat{\widetilde{X}}}}_i}^T( {\widehat{\beta }- {\beta _0}} )\). Since \(\widehat{\beta }\) maximizes the objective in Eq. (6), it satisfies the following equation:
where \({\varepsilon _i^ * }\) is between \(\varepsilon _i\) and \(\varepsilon _i-{\sigma _i}-{\widehat{\gamma }}_i\).
For \(I_5\), we have
where \({\varSigma _1} = E\{ {F( {u,{h_1}}){\widetilde{X}}{{ \widetilde{X}}^T}}\},\) and the last equality is derived from the regularity conditions (A5) and (A7).
Based on \({| {{{\widehat{\gamma }}_i}} |^2} = {\mathrm{O}_p}({\Vert {\widehat{\beta }- {\beta _0}}\Vert ^2})\) and \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), we have \({I_6} = o_p({I_5})\). A direct calculation then shows that \(\sqrt{n} (\widehat{\beta }- {\beta _0}) = \frac{1}{{\sqrt{n} }}\varSigma _1^{ - 1}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{{h_1}}^\prime ( {{\varepsilon _i}})} + {o_p}(1).\)
Note that \(E\left( {{\phi }^\prime _h( \varepsilon )\left| {U = u} \right. } \right) = 0\), and by the central limit theorem, we have
where \({\varSigma _2} = Var\{ \widetilde{X}\phi _{{h_1}}^\prime (\varepsilon )\}\). This completes the proof. \(\square \)
Proof of Theorem 3
Since Theorem 3 is parallel to Theorem 6, we only present a detailed proof for Theorem 6. \(\square \)
Proof of Theorem 4
It is sufficient to show that for any given \(\delta > 0\), there exists a large constant \(C\) such that
\[P\Big \{ \mathop {\sup }\limits _{\left\| \mathbf {v} \right\| = C} {Q_\lambda }( {\beta _0} + {\omega _n}\mathbf {v}) < {Q_\lambda }( {\beta _0})\Big \} \ge 1 - \delta , \qquad (14)\]
where \({\omega _n} = p_n^{{1 / 2}}({n^{ - {1 / 2}}} + {a_n}).\)
Let \({I_7} = - \sum \nolimits _{j = 1}^{{k_n}} {\{ p{}_{{\lambda }}(\left| {{\beta _{{0j}}} + {\omega _n}{v_j}} \right| ) - p{}_{{\lambda }}(\left| {{\beta _{{0j}}}} \right| )\} } \), where \({k_n}\) is the number of components of \(\beta _{{0a}}\). Note that \(p{}_{{\lambda }}(0) = 0\) and \(p{}_{{\lambda }}(| {{\beta _{{j}}}} |) \ge 0\) for all \(\beta _j\). By the proof of Theorem 1, we have
where \({{\bar{I}}_1}\), \({{\bar{I}}_2}\), and \({{\bar{I}}_3}\) are the same as \(I_1\), \(I_2\) and \(I_3\) except that the factor \({\mu _n}\) is replaced by \(\omega _n\).
By the Taylor expansion and the Cauchy–Schwarz inequality, \(I_7\) is bounded by
Consequently, as \(b_n \rightarrow 0\), \(I_7\) is dominated by \({{\bar{I}}_2} = \frac{1}{2}\omega _n^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\), provided \(C\) is taken sufficiently large. Hence, for large \(C\), \({\bar{I}}_2\) dominates the other three terms in Eq. (15). Since \({\bar{I}}_2<0\), Eq. (14) holds, and the result in Theorem 4 follows.
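For concreteness, we recall the SCAD penalty of Fan and Li (2001), which drives the bound on \(I_7\) above. Its first derivative is

```latex
p_{\lambda }'(t) = \lambda \left\{ I(t \le \lambda ) + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda }\, I(t > \lambda ) \right\} , \qquad t > 0,
```

with \(p_{\lambda }(0) = 0\) and \(a = 3.7\) as suggested by Fan and Li (2001). In particular, \(p_{\lambda }'(t) = 0\) whenever \(t > a\lambda \), so for the nonzero coefficients the penalty contribution vanishes once \(\lambda \rightarrow 0\), which is what allows \(I_7\) to be dominated by \({\bar{I}}_2\).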
To prove Theorem 5, we need the following lemma. \(\square \)
Lemma 1
Under the conditions in Theorem 5, with probability tending to 1, for any given \({\beta _a}\) satisfying \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {\mathrm{O}_p}(\sqrt{{{{p_n}} / n}} )\) and any constant C, we have
Proof of Lemma 1
From the proof of Theorem 2, we have
It can be shown that \(\frac{1}{n}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{h_1}^\prime ({\varepsilon _i})}= {O_p}({n^{ - {1 / 2}}})\). By the assumption that \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {O_p}(\sqrt{{{{p_n}} / n}} )\), we have \(R_j^\prime (\beta ) = {O_p}(\sqrt{{{{p_n}} / n}} )\). Therefore, for \({\beta _j} \ne 0\) and \(j = {k_n} + 1, \ldots ,{p_n}\),
Since \({{\lim {{\inf }_{n \rightarrow \infty }}\lim {{\inf }_{t \rightarrow {0^ + }}}p_{{\lambda }}^\prime (t)} / {{\lambda }}} > 0\) and \({({n / {{p_n}}})^{{1 / 2}}}{\lambda } \rightarrow \infty \), the sign of the derivative for \({\beta _j} \in ( - C\sqrt{{{{p_n}} / n}}, C\sqrt{{{{p_n}} / n}})\) is completely determined by that of \({\beta _j}\). Therefore, Eq. (16) holds. \(\square \)
Proof of Theorem 5
From Lemma 1, it follows that \(\widehat{\beta }_b^\lambda = 0\). We next show the asymptotic normality of \(\widehat{\beta }_a^\lambda \). By Theorem 4, there exists a root-\({{n / {{p_n}}}}\) consistent local maximizer \(\widehat{\beta }_a^\lambda \) of \({Q_\lambda }\{ {{{\left( {\beta _a^T,0} \right) }^T}}\}\), which satisfies the following equations:
Therefore,
where \({\varepsilon _i^ * }\) is between \(\varepsilon _i\) and \(\varepsilon _i-{\widehat{\gamma }}_i\).
By arguments similar to the proof of Theorem 2, the central limit theorem and Slutsky's theorem yield
in distribution, where \(\varSigma _1^{(1)}\), \(\varSigma _2^{(1)}\) are the submatrices of \(\varSigma _1\) and \(\varSigma _2\) corresponding to \(\beta _{0a}\). \(\square \)
Proof of Theorem 6
For notational clarity, we let \({K_i} = K({\frac{{{Z_i} -Z}}{{{h_2}}}})\) and \(l\left( r \right) = - {\phi _{{h_3}}}(r)\). Then, Eq. (11) can be rewritten as
Let \({\theta } = {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{a}}-f(Z),{h_2}( {{{b}}-f'(Z)})}]\), \(z_i^ * = {[ {1,{{{{( {{Z_i} - Z} )}^T}}/{{h_2}}}} ]^T}\), \({s_i} = X_i^T( {{\beta _0} - {{\widehat{\beta }}^\lambda }})\), \({\delta _i} = {Y_i} - X_i^T\widehat{\beta }^{\lambda } - f(Z) - f'(Z)({Z_i} - Z)\), \({\delta _i^*} = {Y_i} - X_i^T\beta _0 - f(Z) - f'(Z)({Z_i} - Z)\) and \(f_i={f({Z_i}) - f(Z) - f'({Z})({Z_i} - Z)}\). Then, \({\theta _n}= {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{\widehat{a}}^\lambda }-f(Z),{h_2}( {{{\widehat{b}}^\lambda }-f'(Z)})}]\) minimizes the function
Since the function \({J_n}({\theta })\) is convex in \(\theta \), by the convexity lemma of Pollard (1991) it is sufficient to prove that \({J_n}({\theta })\) converges pointwise to its conditional expectation.
Given \({{{{\mathbf{{X}}}}}} = {({{{{X}}}_1}, \ldots ,{{{{X}}}_n})^T}\) and \({{{{\mathbf{{Z}}}}}} = {({{{{Z}}}_1}, \ldots ,{{{{Z}}}_n})^T}\), we can obtain that
where the last equality is derived from the regularity condition (B3). Similar arguments appear in (D.2) and (D.3) of Zhu et al. (2013).
Then, we can obtain that
which is parallel to (4.6) of Fan et al. (1994). The rest of the proof follows literally from Fan et al. (1994) by treating the dimension of Z as fixed, so the detail is omitted here. \(\square \)
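The local step analyzed above can be sketched as a kernel-weighted MEM iteration. The following minimal illustration assumes Gaussian kernels for both \(K\) and \(\phi _{h_3}\), takes \(\widehat{\beta }^\lambda \) as given through the partial residuals, and uses names of our own choosing.

```python
import numpy as np

def local_modal_fit(Z, resid, z0, h2, h3, n_iter=100, tol=1e-8):
    """Local linear modal estimate of (f(z0), f'(z0)) via kernel-weighted MEM.

    resid_i = Y_i - X_i' beta_hat are the partial residuals.  The routine
    maximizes sum_i K((Z_i - z0)/h2) * phi_{h3}(resid_i - a - b*(Z_i - z0))
    over (a, b); each M-step is a doubly weighted local linear fit.
    """
    d = Z - z0
    K = np.exp(-0.5 * (d / h2) ** 2)           # smoothing kernel weights
    D = np.column_stack([np.ones_like(d), d])  # local linear design
    # Start from the (non-robust) local linear least squares fit.
    theta = np.linalg.solve(D.T @ (K[:, None] * D), D.T @ (K * resid))
    for _ in range(n_iter):
        r = resid - D @ theta
        # E-step: combine smoothing weights with modal (outlier-damping) weights.
        w = K * np.exp(-0.5 * (r / h3) ** 2)
        # M-step: weighted local linear update.
        theta_new = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * resid))
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Repeating this fit over a grid of \(z_0\) values traces out the robust estimate of the nonparametric component; the bandwidth \(h_3\) plays the role described for \(\phi _{h_3}\) in the objective above.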
Yang, H., Li, N. & Yang, J. A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates. Stat Papers 61, 1911–1937 (2020). https://doi.org/10.1007/s00362-018-1013-1