Abstract
When the observed data contain outliers, the classical least squares method is well known to be non-robust. To overcome this difficulty, Wang et al. (J Am Stat Assoc 108(502):632–643, 2013) proposed a robust variable selection method based on the exponential squared loss (ESL) function with a tuning parameter. Although the ESL function has since been applied to many important statistical models, to date no work has studied the partially nonlinear model with the ESL function in the presence of outliers. To fill this gap, we propose a robust and efficient estimation method for the partially nonlinear model based on the ESL function. Under certain conditions, we show that the proposed estimators achieve the best convergence rates, and we establish their asymptotic normality. In addition, we develop a new minorization–maximization (MM) algorithm to compute the estimates of both the nonparametric and parametric parts, and we present a procedure for deriving initial values. Finally, we provide a data-driven approach for selecting the tuning parameters. Numerical simulations and a real data analysis illustrate that, in the presence of outliers, the proposed ESL method is more robust and efficient for partially nonlinear models than the existing linear approximation method and the composite quantile regression method.
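As a rough illustration of the two ingredients named in the abstract (this is not the paper's exact algorithm, which also estimates the nonparametric part and selects the tuning parameter data-adaptively), the sketch below fits a plain linear model by maximizing the ESL objective \(\sum_i \exp(-r_i^2/\gamma)\) with an MM iteration: since \(\exp(-u)\) is convex in \(u=r^2/\gamma\), its tangent at the current residuals minorizes the objective, and maximizing the surrogate reduces to a weighted least squares step with weights \(w_i=\exp(-r_i^2/\gamma)\). All function names, the simulated data, and the choice \(\gamma=2\) are our own.

```python
import numpy as np

def esl_mm_linear(X, y, gamma=2.0, n_iter=200, tol=1e-10):
    """Maximize sum_i exp(-r_i^2/gamma) for a linear model via MM.

    Because exp(-u) is convex in u = r^2/gamma, its tangent at the
    current residuals minorizes the objective; maximizing the surrogate
    is a weighted least squares step with w_i = exp(-r_i^2/gamma).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-r**2 / gamma)                 # gross outliers get ~0 weight
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Toy data: y = 1 + 2x + noise, with 10% gross outliers.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)
y[:20] += 15.0                                    # contamination

ols = np.linalg.lstsq(X, y, rcond=None)[0]        # pulled toward the outliers
rob = esl_mm_linear(X, y, gamma=2.0)              # essentially ignores them
```

The MM ascent property guarantees that each weighted least squares step does not decrease the ESL objective, which is the mechanism exploited by the algorithm proposed in the paper.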
References
Becker MP, Yang I, Lange K (1997) EM algorithms without missing data. Stat Methods Med Res 6:38–54
Huang TM, Chen H (2008) Estimating the parametric component of nonlinear partial spline model. J Multivar Anal 99(8):1665–1680
Huet S, Bouvier A, Poursat M-A, Jolivet E (2004) Statistical tools for nonlinear regression: a practical guide with S-plus and R examples. Springer, New York
Jiang Y, Li H (2014) Penalized weighted composite quantile regression in the linear regression model with heavy-tailed autocorrelated errors. J Korean Stat Soc 43:531–543
Jiang Y (2015) Robust estimation in partially linear regression models. J Appl Stat 42(11):2497–2508
Jiang Y (2016) An exponential-squared estimator in the autoregressive model with heavy-tailed errors. Stat Interface 9(2):233–238
Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39(1):305–332
Lange K, Hunter DR, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Comput Graph Stat 9(1):1–20
Li R, Nie L (2007) A new estimation procedure for a partially nonlinear model via a mixed-effects approach. Can J Stat 35(3):399–411
Li R, Nie L (2008) Efficient statistical inference procedures for partially nonlinear models and their applications. Biometrics 64(3):904–911
Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36(1):261–286
Liu JC, Zhang RQ, Zhao WH, Lv YZ (2013) A robust and efficient estimation method for single index models. J Multivar Anal 122:226–238
Lv J, Yang H, Guo CH (2015a) An efficient and robust variable selection method for longitudinal generalized linear models. Comput Stat Data Anal 82:74–88
Lv J, Yang H, Guo CH (2015b) Robust smooth-threshold estimating equations for generalized varying-coefficient partially linear models based on exponential score function. J Comput Appl Math 280:125–140
Mack YP, Silverman BW (1982) Weak and strong uniform consistency of kernel regression estimates. Probab Theory Relat Fields 61(3):405–415
Ruppert D, Sheather SJ, Wand MP (1995) An effective bandwidth selector for local least squares regression. J Am Stat Assoc 90(432):1257–1270
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, New York
Song LX, Zhao Y, Wang XG (2010) Sieve least squares estimation for partially nonlinear models. Stat Probab Lett 80(17–18):1271–1283
Song WX, Yao W, Xing YR (2014) Robust mixture regression model fitting by Laplace distribution. Comput Stat Data Anal 71:128–137
Tang LJ, Zhou ZG, Wu CC (2012) Efficient estimation and variable selection for infinite variance autoregressive models. J Appl Math Comput 40:399–413
Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643
Yao W, Li L (2014) A new regression model: modal linear regression. Scand J Stat 41(3):656–671
Yao W, Lindsay BG, Li R (2012) Local modal regression. J Nonparametric Stat 24(3):647–663
Yatchew A (1997) An elementary estimator of the partial linear model. Econ Lett 57(2):135–143
Yu C, Chen K, Yao W (2015) Outlier detection and robust mixture modeling using nonconvex penalized likelihood. J Stat Plan Inference 164:27–38
Zhang RQ, Zhao WH, Liu JC (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametric Stat 25(2):523–544
Acknowledgements
Jiang’s research is partially supported by the National Natural Science Foundation of China (No. 11301221) and the Fundamental Research Funds for the Central Universities (No. 11615455). Partial work was done when the first author visited the Department of Statistics and Actuarial Science of HKU. Fei’s work is supported in part by the National Natural Science Foundation of China (No. 11561071).
Appendix
For convenience, we define the following notation:
Before we prove Theorem 1, we first prove the following two lemmas.
Lemma 1
Assume that Conditions (C1)–(C2) and (C5)–(C7) hold. Then we have
and
for \(j=0,1\).
The proof of Lemma 1 is similar to that of Lemma 1 in Yao et al. (2012). Therefore, we omit it here.
Lemma 2
Under Conditions (C1)–(C7), with probability approaching 1, there exists a consistent local maximizer of (2.2), denoted by \(\tilde{\varvec{\theta }}\), such that
where \(c_n \,\hat{=}\,(nh_1)^{-1/2}+h_1^2\).
Proof of Lemma 2
Recall that \(\ell _n(\varvec{\theta }) = \ell _n(a, b, \varvec{\beta })\) is defined by (2.2). It suffices to show that for any given \(\delta >0\), there exists a large constant \(C>0\) such that
for any \((d+2)\)-dimensional vector \(\mathbf{v}\) satisfying \(\Vert \mathbf{v}\Vert =C\). Note that
By applying the first-order Taylor expansion and noting Conditions (C3)–(C4), we obtain
so that
where \(\varepsilon _i^*\) is a point between \(\varepsilon _i+r_i\) and \(\varepsilon _i+r_i-c_n{\mathbf {z}}_i^{*\!\top \!}\mathbf{v}\).
Under regularity conditions (C3)–(C4), the mean and variance of \(I_1\) can be directly calculated as
Therefore, we have
Similarly, we can obtain \(I_3=O_p(nc_n^3)\).
By Lemma 1, it follows that
where \(\varvec{\Sigma }_1(t)=F(t,\gamma _1)E[{\mathbf {A}}({\mathbf {x}})|T=t]\). Noting that \(\Vert \mathbf{v}\Vert =C\), we can choose \(C\) sufficiently large so that \(I_2\) dominates both \(I_1\) and \(I_3\) with probability at least \(1-\delta \). By Condition (C1), we have \(F(t,\gamma _1)<0\), so \(\varvec{\Sigma }_1(t)\) is negative definite. The proof of Lemma 2 is completed. \(\square \)
Proof of Theorem 1
Note that
From (6.4) and the Taylor expansion, we know that \(\tilde{{\varvec{\upalpha }}}\) satisfies
where \(\varepsilon _i^*\) lies between \(\varepsilon _i\) and \(\varepsilon _i+\tilde{r}_i\),
Therefore, the second term on the right-hand side of (6.5) is
From Lemma 1, we have
By applying Lemma 2, we have \(\Vert \tilde{{\varvec{\upalpha }}}-{\varvec{\upalpha }}_{0}\Vert =O_p((nh_1)^{-1/2}+h_1^2)\). Thus,
Since
we have
so that
where \(\mathbf {w}_n=\sum _{i=1}^{n}{\mathbf {z}}_i^*K_{i,h_1}\phi _{\gamma _1}'(\varepsilon _i)\). By Condition (C2), we have \(E(\mathbf {w}_n)= \mathbf{0}\) and
Let \(\mathbf {w}_n^* \,\hat{=}\,\sqrt{h_1/n}\, \mathbf {w}_n\) and \(\zeta _i \,\hat{=}\,\sqrt{h_1/n}\,{\mathbf {d}}^{\!\top \!}{\mathbf {z}}_i^*K_{i,h_1}\phi _{\gamma _1}'(\varepsilon _i)\), where \({\mathbf {d}}\) is a unit vector satisfying \(\Vert {\mathbf {d}}\Vert =1\). Then, \({\mathbf {d}}^{\!\top \!}\mathbf {w}_n^*=\sum _{i=1}^{n}\zeta _i\). By (6.6), we have
Since \(\phi _{\gamma _1}(\cdot )\) is bounded and \(K(\cdot )\) has compact support, direct calculation gives \(nE|\zeta _1|^3=O((nh_1)^{-1/2})\rightarrow 0\). By Lyapunov's condition, we obtain
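For completeness, the Lyapunov condition invoked here (with third absolute moments, i.e. \(\delta =1\)) takes the following standard form; since the \(\zeta _i\) are i.i.d. with mean zero,

```latex
\frac{\sum _{i=1}^{n}E|\zeta _i|^{3}}
     {\Big (\sum _{i=1}^{n}\mathrm {Var}(\zeta _i)\Big )^{3/2}}
=\frac{nE|\zeta _1|^{3}}{\big (n\,\mathrm {Var}(\zeta _1)\big )^{3/2}}
\rightarrow 0,
```

because the denominator converges to a positive constant by (6.6), while the numerator satisfies \(nE|\zeta _1|^{3}=O((nh_1)^{-1/2})\rightarrow 0\); the central limit theorem for triangular arrays then yields the asymptotic normality of \({\mathbf {d}}^{\!\top \!}\mathbf {w}_n^*\).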
Therefore
The proof is completed. \(\square \)
Lemma 3
Let \((X_1,U_1)^{\!\top \!}, \ldots , (X_n, U_n)^{\!\top \!}\) be an i.i.d. random sample from the population random vector \((X, U)^{\!\top \!}\) with joint density \(p(x, u)\). Assume that \(E|U|^s<\infty \) and \(\sup _{x}\int |u|^sp(x,u)\,\mathrm {d}u<\infty \) for some \(s \ge 2\). Let \(K(\cdot )>0\) be a bounded function with bounded support, satisfying the Lipschitz condition. Then,
provided that \(n^{2t-1}h\rightarrow \infty \) for some \(t<1-s^{-1}\).
The proof of Lemma 3 can be found in Mack and Silverman (1982).
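To make Lemma 3 concrete, the following sketch (illustrative only; the function names, constants, and bandwidth choice are our own) checks numerically that the sup-norm deviation of a kernel-weighted average \(n^{-1}\sum _i K_h(X_i-x)U_i\) from its limit shrinks as \(n\) grows. It uses the Epanechnikov kernel, which is bounded, compactly supported, and Lipschitz, as the lemma requires.

```python
import numpy as np

def kernel_average(x_grid, X, U, h):
    """n^{-1} sum_i K_h(X_i - x) U_i with the Epanechnikov kernel."""
    t = (X[None, :] - x_grid[:, None]) / h
    K = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0) / h
    return K @ U / len(X)

def sup_deviation(n, h, rng):
    """Sup over an interior grid of |kernel average - its limit|.

    With X ~ U(0,1) (so f_X = 1) and E[U|X=x] = 2x, the limit of the
    kernel average at an interior point x is f_X(x) E[U|X=x] = 2x.
    """
    X = rng.uniform(0.0, 1.0, n)
    U = 2.0 * X + rng.normal(0.0, 0.5, n)
    grid = np.linspace(0.1, 0.9, 50)   # interior grid avoids boundary bias
    est = kernel_average(grid, X, U, h)
    return np.max(np.abs(est - 2.0 * grid))

rng = np.random.default_rng(1)
d_small = sup_deviation(500, 0.3 * 500 ** (-0.2), rng)
d_large = sup_deviation(20000, 0.3 * 20000 ** (-0.2), rng)
```

With the bandwidth shrinking at the rate \(h \propto n^{-1/5}\), the larger sample gives a visibly smaller uniform deviation, in line with the \(O_p\) rate asserted by the lemma.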
Lemma 4
Under the conditions in Theorem 1, we have
Proof of Lemma 4
Let \(\tilde{{\varvec{\uplambda }}}=\sqrt{nh_1}(\tilde{{\varvec{\upalpha }}}-{\varvec{\upalpha }}_{0})\). Then, \(\tilde{{\varvec{\uplambda }}}\) is the maximizer of
Using the Taylor expansion, we have
where \({\varvec{\Delta }}_n = \frac{1}{n}\sum _{i=1}^{n} \tau _n^2{\mathbf {z}}_i^{*}{\mathbf {z}}_i^{*\!\top \!} K_{i,h_1}\phi _{\gamma _1}''(\varepsilon _i)\). By Lemma 3, we obtain
By the regularity conditions, we have \(E[{\varvec{\Delta }}_n]=f_T(t)\varvec{\Sigma }_1(t)+O(h_1^2)\). Therefore,
According to (6.7), we have
Since \(\tilde{{\varvec{\uplambda }}}=\sqrt{nh_1}(\tilde{{\varvec{\upalpha }}}-{\varvec{\upalpha }}_{0})\), we obtain
holds uniformly in \(t\in {\mathbb {T}}\). The proof is completed. \(\square \)
Lemma 5
Let Conditions (C1)–(C7) hold, and assume \(nh_1^4\rightarrow 0\) and \(nh_1^2/[\log (1/h_1)] \rightarrow \infty \) as \(n\rightarrow \infty \). Then, with probability approaching 1, there exists a local maximizer \({\varvec{\hat{\beta }}}_n\) of (2.3) such that
Proof of Lemma 5
Let \(\chi _i=\tilde{m}(T_i)-m(T_i)\) and \(R(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\phi _{\gamma _2}(Y_i-\tilde{m}(T_i)-g({\mathbf {x}}_i; \varvec{\beta }))\). Using the Taylor expansion and (6.4), we have
where \(\Vert \varvec{\mathrm{e}}\Vert =C\) for a large constant \(C\), and \(\varepsilon _i^*\) lies between \(\varepsilon _i-\chi _i- \varvec{\mathrm{e}}^{\!\top \!} g'({\mathbf {x}}_i; \varvec{\beta }_0)/ \sqrt{n}\) and \(\varepsilon _i-\chi _i\). By Lemma 4, we have
By (6.8), together with \(nh_1^4\rightarrow 0\) and \(nh_1^2/[\log (1/h_1)]\rightarrow \infty \) as \(n\rightarrow \infty \), we have \(E(I_6)=O(Ch_1^2/\sqrt{n})\) and \(\text{ Var }(I_6)=O(C^2/n^2)\). Therefore, \(I_6=O_p(n^{-1})\). Similarly, \(I_8=O_p(n^{-3/2})\). For \(I_7\), we have
where \(\varvec{\Sigma }_4=E[F(t,\gamma _2)g'({\mathbf {x}};\varvec{\beta }_0)g'({\mathbf {x}};\varvec{\beta }_0)^{\!\top \!}]\). Noting that \(\Vert \varvec{\mathrm{e}}\Vert =C\), we can choose \(C\) sufficiently large so that \(I_7\) dominates both \(I_6\) and \(I_8\) with probability at least \(1-\delta \). Since \(F(t,\gamma _2)<0\), we have
The proof is completed. \(\square \)
Proof of Theorem 2
Let \(\tilde{\varphi }_i = \tilde{m}(T_i)-m(T_i)+g'({\mathbf {x}};\varvec{\beta }_0)^{\!\top \!}({\varvec{\hat{\beta }}}_n-\varvec{\beta }_0)\). Then, \({\varvec{\hat{\beta }}}_n\) satisfies the following equation
where
By (6.8), we can write \(J_1\) as
Since \(nh_1^4\rightarrow 0\) and \(nh_1^2/[\log (1/h_1)] \rightarrow \infty \) as \(n\rightarrow \infty \), we have
so that
By calculating the second moment, it can be shown that \(J_1-J_3\xrightarrow {\mathrm{P}} 0\), where \(J_3=-\sum _{j=1}^{n} {\varvec{\kappa }}(T_j)\) with
On the other hand, \(J_2=-n\varvec{\Sigma }_4({\varvec{\hat{\beta }}}_n-\varvec{\beta }_0)\). Since \(|\tilde{\varphi }_i|=O_p(\Vert {\varvec{\hat{\beta }}}_n-\varvec{\beta }_0\Vert )=o_p(1)\) and \(|\tilde{\varphi }_i|^2=o_p(1)O_p(\Vert {\varvec{\hat{\beta }}}_n-\varvec{\beta }_0\Vert )=o_p(\Vert {\varvec{\hat{\beta }}}_n-\varvec{\beta }_0\Vert )\), we obtain
Therefore,
The proof is completed by Slutsky’s Theorem and the Central Limit Theorem. \(\square \)
Proof of Theorem 3
According to Theorem 2, we have \(\Vert {\varvec{\hat{\beta }}}_n-\varvec{\beta }_0\Vert =O_p(1/\sqrt{n})\). Following the ideas in the proof of Theorem 1, we can readily obtain the result of Theorem 3. \(\square \)
Cite this article
Jiang, Y., Tian, GL. & Fei, Y. A robust and efficient estimation method for partially nonlinear models via a new MM algorithm. Stat Papers 60, 2063–2085 (2019). https://doi.org/10.1007/s00362-017-0909-5