Abstract
In this paper, a new robust and efficient estimation approach based on local modal regression is proposed for partially linear models with large-dimensional covariates. We show that the resulting estimators of both the parametric and nonparametric components are more efficient in the presence of outliers or heavy-tailed error distributions, and are asymptotically as efficient as the corresponding least squares estimators when there are no outliers and the error distribution is normal. We also establish the asymptotic properties of the proposed estimators when the covariate dimension diverges at the rate \(o\left( {\sqrt{n} } \right)\). To achieve sparsity and enhance interpretability, we develop a variable selection procedure based on the SCAD penalty to select significant parametric covariates, and show that the method enjoys the oracle property under mild regularity conditions. Moreover, we propose a practical modified MEM algorithm for the proposed procedures. Monte Carlo simulations and a real data analysis are conducted to illustrate the finite sample performance of the proposed estimators. Finally, based on the idea of the sure independence screening procedure proposed by Fan and Lv (J R Stat Soc 70:849–911, 2008), a robust two-step approach is introduced to deal with ultra-high dimensional data.
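To fix ideas, the following sketch fits modal linear regression by the MEM algorithm of Yao, Lindsay and Li (2012) with a Gaussian kernel. It illustrates the modal estimation idea only, not the authors' full procedure (which also profiles out the nonparametric component); the function name, bandwidth choice, and stopping rule are ours.

```python
import numpy as np

def modal_linear_regression(X, y, h, n_iter=200, tol=1e-8):
    """Illustrative MEM fit of modal linear regression.

    Maximizes (1/n) * sum_i phi_h(y_i - x_i' beta), where phi_h is a
    Gaussian kernel density with bandwidth h.  Each M-step is a weighted
    least squares fit, so the objective is non-decreasing over iterations.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        # E-step: weights proportional to the kernel at current residuals;
        # large residuals (outliers) receive weights close to zero.
        w = np.exp(-0.5 * (r / h) ** 2)
        # M-step: weighted least squares update.
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With heavy contamination the modal fit stays close to the bulk of the data where ordinary least squares would be pulled toward the outliers, which is the robustness property the abstract refers to.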
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
Chen B, Yu Y, Zou H, Liang H (2012) Profiled adaptive Elastic-Net procedure for partially linear models with high-dimensional covariates. J Stat Plann Inference 142:1733–1745
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39:1–21
Engle R, Granger C, Rice J, Weiss A (1986) Semiparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81:310–320
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman and Hall, London
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Lv J (2008) Sure independence screening for ultra-high-dimensional feature space. J R Stat Soc 70:849–911
Fan J, Hu TC, Truong YK (1994) Robust nonparametric function estimation. Scand J Stat 21:433–446
Hardle W, Liang H, Gao JT (2000) Partial linear models. Springer, New York
Huber PJ (1981) Robust estimation. Wiley, New York
Johnson RW (2003) Kiplinger's personal finance. J Stat Educ 57:104–123
Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36:261–286
Li J, Ray S, Lindsay B (2007) A nonparametric statistical approach to clustering via mode identification. J Mach Learn Res 8:1687–1723
Li GR, Peng H, Zhu LX (2011) Nonconcave penalized M-estimation with diverging number of parameters. Stat Sin 21:391–420
Mallows CL (1973) Some comments on \(Cp\). Technometrics 15:661–675
Ni X, Zhang HH, Zhang D (2009) Automatic model selection for partially linear models. J Multivar Anal 100:2100–2111
Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econom Theory 7:186–199
Rao BLSP (1983) Nonparametric functional estimation. Academic Press, Orlando
Robinson PM (1988) Root \(n\)-consistent semiparametric regression. Econometrica 56:931–954
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Severini TA, Staniswalis JG (1994) Quasi-likelihood estimation in semiparametric models. J Am Stat Assoc 89:501–511
Speckman PE (1988) Kernel smoothing in partial linear models. J R Stat Soc 50:413–436
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc 58:267–288
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355
Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37:673–696
Yang H, Yang J (2014) A robust and efficient estimation and variable selection method for partially linear single-index models. J Multivar Anal 129:227–242
Yao W, Li L (2014) A new regression model: modal linear regression. Scand J Stat 41:656–671
Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
Zeger S, Diggle P (1994) Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 50:689–699
Zhang R, Zhao W, Liu J (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25:523–544
Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
Zhao W, Zhang R, Liu Y, Liu J (2015) Empirical likelihood based modal regression. Stat Papers 56:411–430
Zhou H, You J, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Papers 51:629–650
Zhu LX, Fang KT (1996) Asymptotics for kernel estimation of sliced inverse regression. Ann Stat 24:1053–1068
Zhu L, Huang M, Li R (2012) Semiparametric quantile regression with high-dimensional covariates. Stat Sin 22:1379–1401
Zhu L, Li R, Cui H (2013) Robust estimation for partially linear models with large-dimensional covariates. Sci China Math 56:2069–2088
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 11671059).
Electronic supplementary material
Appendix: Proofs of Theorems
Proof of Theorem 1
We want to show that for any given \(\delta > 0\), there exists a large constant \(C\) such that
\[P\Big \{ \mathop {\sup }\limits _{\left\| \mathbf {v} \right\| = C} R( {\beta _0} + {\mu _n}\mathbf {v}) < R( {\beta _0})\Big \} \ge 1 - \delta , \qquad (13)\]
where \(R(\beta ) = \frac{1}{n}\sum \limits _{i = 1}^n {\phi _{h_1}}( {{{{\widehat{\widetilde{Y}}}}_i} - {{{\widehat{\widetilde{X}}}}_i}^T\beta })\), \({\mu _n}={p_n}^{{1 / 2}} {{n^{{{ - 1} / 2}}}}\).
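Here and below, \(\phi _{h}(\cdot )\) denotes the rescaled kernel standard in the modal regression literature (Yao et al. 2012):

```latex
\phi _{h}(t) = \frac{1}{h}\,\phi \!\left( \frac{t}{h}\right) ,
```

where \(\phi \) is a symmetric, bounded kernel density (in practice often the standard normal density) and \(h > 0\) is a bandwidth governing the trade-off between robustness and efficiency.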
For simplicity, we define \({D_n}(\mathbf{{v}}) = {R}( {\beta _0 + {{\mu _n}}{} \mathbf{{v}}}) - {R }(\beta _0)\) and then obtain that
where \({\sigma _i} = {{\widetilde{Y}}_i} - {{{\widehat{\widetilde{Y}}}}_i} - {({{\widetilde{X}}_i} - {{{\widehat{\widetilde{X}}}}_i})^T}{\beta _0}\), and \(t_i\) lies between \({\varepsilon _i} - {\sigma _i}\) and \({\varepsilon _i} - {\sigma _i}-{\mu _n}{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}\). By the regularity conditions (A1)–(A4), the consistency of the kernel estimators implies that \({\max _{1 \le i \le n}}\Vert {{\sigma _i}} \Vert = {o_p}(1)\) almost surely, a fact that will be used repeatedly in our proof. A detailed discussion of this argument can be found in Lemmas 3.5.1 and A.1 of Hardle et al. (2000).
Based on the fact that \(\xi = E(\xi ) + {\mathrm{O}_p}(\sqrt{Var(\xi )})\), the regularity condition (A7) and the uniform convergence of \({{\mathbf{{v}}^T}{{{\widehat{\widetilde{X}}}}_i}}\) entail that \({I_1} = {\mathrm{O}_p}(\frac{{C{\mu _n}}}{{\sqrt{n} }})\); see Rao (1983) and Zhu and Fang (1996) for the technical details. Similarly, we have \({I_3} = {\mathrm{O}_p}({C^3}{\mu _n}^3)\). For \(I_2\), we have \({I_2} = \frac{1}{2}{\mu _n}^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\), where \({\varSigma _1} = E\{ F(U,{h_1}){\widetilde{X}}{{\widetilde{X}}^T}\}.\) Since \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), it follows that \({I_1} = {o_p}\left( {{I_2}} \right) \) and \({I_3} = {o_p}\left( {{I_2}} \right) \). A similar argument can be found in Li et al. (2011).
By the regularity condition (A6), \(F(u,{h_1}) < 0\); hence, \({\varSigma _1}\) is a negative definite matrix. Since \(\left\| \mathbf {v} \right\| = C\), we can choose \(C\) large enough that \(I_2\) dominates both \(I_1\) and \(I_3\) with probability at least \(1- \delta \). It follows that Eq. (13) holds. Hence, \(\widehat{\beta }\) is a root-\({{n / {{p_n}}}}\) consistent estimator of \(\beta _0\). \(\square \)
Proof of Theorem 2
Let \({{\widehat{\gamma }}_i} = {{{\widehat{\widetilde{X}}}}_i}^T( {\widehat{\beta }- {\beta _0}} )\). Since \(\widehat{\beta }\) maximizes the objective in Eq. (6), it satisfies the following equation:
where \({\varepsilon _i^ * }\) is between \(\varepsilon _i\) and \(\varepsilon _i-{\sigma _i}-{\widehat{\gamma }}_i\).
For \(I_5\), we have
where \({\varSigma _1} = E\{ {F( {u,{h_1}}){\widetilde{X}}{{ \widetilde{X}}^T}}\},\) and the last equality is derived from the regularity conditions (A5) and (A7).
Based on \({| {{{\widehat{\gamma }}_i}} |^2} = {\mathrm{O}_p}({\Vert {\widehat{\beta }- {\beta _0}}\Vert ^2})\) and \({{p_n^2} / n} \rightarrow 0\) as \(n \rightarrow \infty \), we have \({I_6} = o_p({I_5})\). A direct calculation then shows that \(\sqrt{n} (\widehat{\beta }- {\beta _0}) = \frac{1}{{\sqrt{n} }}\varSigma _1^{ - 1}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{{h_1}}^\prime ( {{\varepsilon _i}})} + {o_p}(1).\)
Note that \(E\left( {{\phi }^\prime _h( \varepsilon )\left| {U = u} \right. } \right) = 0\), and by the central limit theorem, we have
where \({\varSigma _2} = Var\{ \widetilde{X}\phi _{{h_1}}^\prime (\varepsilon )\}\). This completes the proof. \(\square \)
Proof of Theorem 3
Since Theorem 3 is parallel to Theorem 6, we only present a detailed proof for Theorem 6. \(\square \)
Proof of Theorem 4
It is sufficient to show that for any given \(\delta > 0\), there exists a large constant \(C\) such that
\[P\Big \{ \mathop {\sup }\limits _{\left\| \mathbf {v} \right\| = C} {Q_\lambda }( {\beta _0} + {\omega _n}\mathbf {v}) < {Q_\lambda }( {\beta _0})\Big \} \ge 1 - \delta , \qquad (14)\]
where \({\omega _n} = p_n^{{1 / 2}}({n^{ - {1 / 2}}} + {a_n}).\)
Let \({I_7} = - \sum \nolimits _{j = 1}^{{k_n}} {\{ p{}_{{\lambda }}(\left| {{\beta _{{0j}}} + {\omega _n}{v_j}} \right| ) - p{}_{{\lambda }}(\left| {{\beta _{{0j}}}} \right| )\} } \), where \({k_n}\) is the number of components of \(\beta _{{0a}}\). Note that \(p{}_{{\lambda }}(0) = 0\) and \(p{}_{{\lambda }}(| {{\beta _{{j}}}} |) \ge 0\) for all \(\beta _j\). By the proof of Theorem 1, we have
where \({{\bar{I}}_1}\), \({{\bar{I}}_2}\), and \({{\bar{I}}_3}\) are the same as \(I_1\), \(I_2\) and \(I_3\) except that the factor \({\mu _n}\) is replaced by \(\omega _n\).
By the Taylor expansion and the Cauchy–Schwarz inequality, \(I_7\) is bounded by
Consequently, as \(b_n \rightarrow 0\), \(I_7\) is dominated by \({{\bar{I}}_2} = \frac{1}{2}\omega _n^2{\mathbf{{v}}^T}{\varSigma _1}{} \mathbf{{v}}(1 + {o_p}(1))\), provided \(C\) is taken sufficiently large. Hence, for large \(C\), \({\bar{I}}_2\) dominates the other three terms in Eq. (15). Since \({\bar{I}}_2<0\), Eq. (14) holds, and the result in Theorem 4 follows.
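For concreteness, we recall the SCAD penalty of Fan and Li (2001), which drives the bound on \(I_7\) above. Its first derivative is

```latex
p_{\lambda }'(t) = \lambda \left\{ I(t \le \lambda ) + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda }\, I(t > \lambda ) \right\} , \qquad t > 0,
```

with \(p_{\lambda }(0) = 0\) and \(a = 3.7\) as suggested by Fan and Li (2001). In particular, \(p_{\lambda }'(t) = 0\) whenever \(t > a\lambda \), so for the nonzero coefficients the penalty contribution vanishes once \(\lambda \rightarrow 0\), which is what allows \(I_7\) to be dominated by \({\bar{I}}_2\).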
To prove Theorem 5, we need the following lemma. \(\square \)
Lemma 1
Under the conditions in Theorem 5, with probability tending to 1, for any given \({\beta _a}\) satisfying \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {\mathrm{O}_p}(\sqrt{{{{p_n}} / n}} )\) and any constant C, we have
Proof of Lemma 1
From the proof of Theorem 2, we have
It can be shown that \(\frac{1}{n}\sum \nolimits _{i = 1}^n {{{{\widehat{\widetilde{X}}}}_i}\phi _{h_1}^\prime ({\varepsilon _i})}= {O_p}({n^{ - {1 / 2}}})\). By the assumption that \(\left\| {{\beta _a} - {\beta _{0a}}} \right\| = {O_p}(\sqrt{{{{p_n}} / n}} )\), we have \(R_j^\prime (\beta ) = {O_p}(\sqrt{{{{p_n}} / n}} )\). Therefore, for \({\beta _j} \ne 0\) and \(j = {k_n} + 1, \ldots ,{p_n}\),
Since \({{\lim {{\inf }_{n \rightarrow \infty }}\lim {{\inf }_{t \rightarrow {0^ + }}}p_{{\lambda }}^\prime (t)} / {{\lambda }}} > 0\) and \({({n / {{p_n}}})^{{1 / 2}}}{\lambda } \rightarrow \infty \), the sign of the derivative for \({\beta _j} \in ( - C\sqrt{{{{p_n}} / n}}, C\sqrt{{{{p_n}} / n}})\) is completely determined by that of \({\beta _j}\). Therefore, Eq. (16) holds. \(\square \)
Proof of Theorem 5
From Lemma 1, it follows that \(\widehat{\beta }_b^\lambda = 0\). We next show the asymptotic normality of \(\widehat{\beta }_a^\lambda \). By Theorem 4, there exists a root-\({{n / {{p_n}}}}\) consistent local maximizer \(\widehat{\beta }_a^\lambda \) of \({Q_\lambda }\{ {{{\left( {\beta _a^T,0} \right) }^T}}\}\), which satisfies the following equations:
Therefore,
where \({\varepsilon _i^ * }\) is between \(\varepsilon _i\) and \(\varepsilon _i-{\widehat{\gamma }}_i\).
By arguments similar to the proof of Theorem 2, the central limit theorem and Slutsky's theorem yield
in distribution, where \(\varSigma _1^{(1)}\), \(\varSigma _2^{(1)}\) are the submatrices of \(\varSigma _1\) and \(\varSigma _2\) corresponding to \(\beta _{0a}\). \(\square \)
Proof of Theorem 6
For notational clarity, we let \({K_i} = K({\frac{{{Z_i} -Z}}{{{h_2}}}})\) and \(l\left( r \right) = - {\phi _{{h_3}}}(r)\). Then, Eq. (11) can be rewritten as
Let \({\theta } = {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{a}}-f(Z),{h_2}( {{{b}}-f'(Z)})}]\), \(z_i^ * = {[ {1,{{{{( {{Z_i} - Z} )}^T}}/{{h_2}}}} ]^T}\), \({s_i} = X_i^T( {{\beta _0} - {{\widehat{\beta }}^\lambda }})\), \({\delta _i} = {Y_i} - X_i^T\widehat{\beta }^{\lambda } - f(Z) - f'(Z)({Z_i} - Z)\), \({\delta _i^*} = {Y_i} - X_i^T\beta _0 - f(Z) - f'(Z)({Z_i} - Z)\) and \(f_i={f({Z_i}) - f(Z) - f'({Z})({Z_i} - Z)}\). Then, \({\theta _n}= {\left( {nh_2^q} \right) ^{{1 / 2}}}[ {{{\widehat{a}}^\lambda }-f(Z),{h_2}( {{{\widehat{b}}^\lambda }-f'(Z)})}]\) minimizes the function
Since the function \({J_n}({\theta })\) is convex in \(\theta \), by the convexity lemma of Pollard (1991) it is sufficient to prove that \({J_n}({\theta })\) converges pointwise to its conditional expectation.
Given \({{{{\mathbf{{X}}}}}} = {({{{{X}}}_1}, \ldots ,{{{{X}}}_n})^T}\) and \({{{{\mathbf{{Z}}}}}} = {({{{{Z}}}_1}, \ldots ,{{{{Z}}}_n})^T}\), we can obtain that
where the last equality is derived from the regularity condition (B3). Similar arguments appear in (D.2) and (D.3) of Zhu et al. (2013).
Then, we can obtain that
which is parallel to (4.6) of Fan et al. (1994). The rest of the proof follows literally from Fan et al. (1994) by treating the dimension of Z as fixed, so the detail is omitted here. \(\square \)
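The local step analyzed above can be sketched as a kernel-weighted MEM iteration. The following minimal illustration assumes Gaussian kernels for both \(K\) and \(\phi _{h_3}\), takes \(\widehat{\beta }^\lambda \) as given through the partial residuals, and uses names of our own choosing.

```python
import numpy as np

def local_modal_fit(Z, resid, z0, h2, h3, n_iter=100, tol=1e-8):
    """Local linear modal estimate of (f(z0), f'(z0)) via kernel-weighted MEM.

    resid_i = Y_i - X_i' beta_hat are the partial residuals.  The routine
    maximizes sum_i K((Z_i - z0)/h2) * phi_{h3}(resid_i - a - b*(Z_i - z0))
    over (a, b); each M-step is a doubly weighted local linear fit.
    """
    d = Z - z0
    K = np.exp(-0.5 * (d / h2) ** 2)           # smoothing kernel weights
    D = np.column_stack([np.ones_like(d), d])  # local linear design
    # Start from the (non-robust) local linear least squares fit.
    theta = np.linalg.solve(D.T @ (K[:, None] * D), D.T @ (K * resid))
    for _ in range(n_iter):
        r = resid - D @ theta
        # E-step: combine smoothing weights with modal (outlier-damping) weights.
        w = K * np.exp(-0.5 * (r / h3) ** 2)
        # M-step: weighted local linear update.
        theta_new = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * resid))
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Repeating this fit over a grid of \(z_0\) values traces out the robust estimate of the nonparametric component; the bandwidth \(h_3\) plays the role described for \(\phi _{h_3}\) in the objective above.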
Yang, H., Li, N. & Yang, J. A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates. Stat Papers 61, 1911–1937 (2020). https://doi.org/10.1007/s00362-018-1013-1