Abstract
This study considers rank estimation of the regression coefficients of the single index regression model. Conditions needed for the consistency and asymptotic normality of the proposed estimator are established. Monte Carlo simulation experiments demonstrate the robustness and efficiency of the proposed estimator compared to the semiparametric least squares estimator. A real-life example illustrates that the rank regression procedure effectively corrects model nonlinearity even in the presence of outliers in the response space.
References
Abebe, A., McKean, J. W. (2013). Weighted Wilcoxon estimators in nonlinear regression. Australian and New Zealand Journal of Statistics, 55(4), 401–420.
Andrews, D. W. K. (1994). Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica, 62(1), 43–72.
Bindele, H. F., Abebe, A. (2012). Bounded influence nonlinear signed-rank regression. Canadian Journal of Statistics, 40(1), 172–189.
Carroll, R. J., Fan, J., Gijbels, I., Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489.
Chang, W. H., McKean, J. W., Naranjo, J. D., Sheather, S. J. (1999). High-breakdown rank regression. Journal of the American Statistical Association, 94(445), 205–219.
Delecroix, M., Härdle, W., Hristache, M. (2003). Efficient estimation in conditional single-index regression. Journal of Multivariate Analysis, 86(2), 213–226.
Delecroix, M., Hristache, M., Patilea, V. (2006). On semiparametric M-estimation in single-index regression. Journal of Statistical Planning and Inference, 136(3), 730–769.
Feng, L., Zou, C., Wang, Z. (2012). Rank-based inference for the single-index model. Statistics & Probability Letters, 82(3), 535–541.
Hájek, J., Šidák, Z., Sen, P. K. (1999). Theory of rank tests. Probability and mathematical statistics (2nd ed.). San Diego, CA: Academic Press, Inc.
Han, A. K. (1987). Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator. Journal of Econometrics, 35(2), 303–316.
Härdle, W., Stoker, T. M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84(408), 986–995.
Härdle, W., Tsybakov, A. B. (1993). How sensitive are average derivatives? Journal of Econometrics, 58(1–2), 31–48.
Härdle, W., Hall, P., Ichimura, H. (1993). Optimal smoothing in single-index models. The Annals of Statistics, 21(1), 157–178.
Hettmansperger, T. P., McKean, J. W. (1998). Robust nonparametric statistical methods, volume 5 of Kendall’s library of statistics. London: Edward Arnold.
Hettmansperger, T. P., McKean, J. W. (2011). Robust nonparametric statistical methods, volume 119 of monographs on statistics and applied probability (2nd ed.). Boca Raton, FL: CRC Press.
Horowitz, J. L., Härdle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariates. Journal of the American Statistical Association, 91(436), 1632–1640.
Hristache, M., Juditsky, A., Spokoiny, V. (2001). Direct estimation of the index coefficient in a single-index model. The Annals of Statistics, 29(3), 595–623.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449–1458.
Klein, R. W., Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica, 61(2), 387–421.
Kutner, M., Nachtsheim, C., Neter, J., Li, W. (2004). Applied linear statistical models. New York: McGraw-Hill/Irwin.
Liu, J., Zhang, R., Zhao, W., Lv, Y. (2013). A robust and efficient estimation method for single index models. Journal of Multivariate Analysis, 122, 226–238.
McCullagh, P., Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
Naranjo, J. D., Hettmansperger, T. P. (1994). Bounded influence rank regression. Journal of the Royal Statistical Society. Series B, 56(1), 209–220.
Newey, W. K. (2004). Efficient semiparametric estimation via moment restrictions. Econometrica, 72(6), 1877–1897.
Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of econometrics, 4, 2111–2245.
Powell, J. L., Stock, J. H., Stoker, T. M. (1989). Semiparametric estimation of index coefficients. Econometrica, 57(6), 1403–1430.
R Development Core Team. (2009). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Rao, T. S., Das, S., Boshnakov, G. N. (2014). A frequency domain approach for the estimation of parameters of spatio-temporal stationary random processes. Journal of Time Series Analysis, 35(4), 357–377.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley series in probability and mathematical statistics. New York: Wiley.
Sherman, R. P. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. The Annals of Statistics, 22(1), 439–459.
Whitt, W. (2011). Stochastic-process limits: An introduction to stochastic-process limits and their application to queues. New York: Springer.
Xia, Y. (2006). Asymptotic distribution for two estimators of the single-index model. Econometric Theory, 22, 1112–1137.
Xia, Y., Tong, H., Li, W. K., Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 363–410.
Xiang, X. (1995). A strong law of large number for $L$-statistics in the non-i.d. case. Communications in Statistics. Theory and Methods, 24(7), 1813–1819.
Yin, X., Cook, R. D. (2005). Direction estimation in single-index regressions. Biometrika, 92(2), 371–384.
Yu, Y., Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97(460), 1042–1054.
Appendix
This appendix contains the proofs of the main theoretical results, together with a key lemma due to Delecroix et al. (2006) that ensures the uniform strong consistency of the leave-one-out Nadaraya–Watson estimator. For details on the proof of this lemma, readers are referred to the aforementioned paper.
Lemma 3
Let \(\mathscr {B}_{n}:=\{\mathbf {\beta }:~\Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert \le d_{n}\}\), where \(d_{n}\) is some sequence decreasing to zero. Then,
(a)
if \(\delta >0\), we have,
$$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}\left| I_{\{{\mathbf x}:\widehat{\mu }_{\mathbf {\beta },h}^{i}({\mathbf x}^{\tau }\mathbf {\beta })\ge c\}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| \le I_{\varGamma ^{\delta }}({\mathbf X}_{i})+I_{(\delta ,\infty )}(Z_{n}), \end{aligned}$$where \(\varGamma ^{\delta }=\{{\mathbf x}:~|\mu _{\mathbf {\beta }_{0},h}({\mathbf x}^{\tau }\mathbf {\beta }_{0})-c|\le \delta \}\) and
$$\begin{aligned} Z_{n}=\max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}} \left| \widehat{\mu }_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -\mu _{\mathbf {\beta }_{0},h}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{0})\right| . \end{aligned}$$
(b)
Assume \(d_{n}=o(1/\sqrt{n})\), and there exists a sequence \(\delta _{n}\rightarrow 0\) such that \(\delta _{n}/n^{-a\varepsilon }\rightarrow \infty \) and \(\delta _{n}[d_{n}\sqrt{n}]^{-a\varepsilon }\rightarrow \infty \), for some \(a>0\), then \(I_{(\delta _{n},\infty )}(Z_{n})=o_{p}(n^{-\alpha })\), for all \(\alpha >0\). Moreover, together with assumptions \((I_2)\)–\((I_4)\), assuming that \(E(|Y|^2)<\infty \), we have
$$\begin{aligned} \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}| \widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\quad a.s.\; \text{ as } n\rightarrow \infty \text{, } \end{aligned}$$and
$$\begin{aligned} \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\nabla _{\mathbf {\beta }}[\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })]-\nabla _{\mathbf {\beta }}[g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })]|I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\quad a.s.\; \text{ as } n\rightarrow \infty \text{. } \end{aligned}$$
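As a numerical illustration of the uniform consistency asserted in Lemma 3 (not part of the formal argument), the following Python sketch builds the leave-one-out Nadaraya–Watson estimator with a Gaussian kernel; the link \(g(t)=\sin t\), the bandwidth, and the design are hypothetical choices, not prescribed by the lemma.

```python
import numpy as np

def nw_loo(y, t, h):
    """Leave-one-out Nadaraya-Watson estimates: the i-th fitted value
    g_hat^i(t_i) uses all observations except the i-th."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)  # Gaussian kernel
    np.fill_diagonal(w, 0.0)                                 # leave one out
    return w @ y / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, size=(n, 2))
beta0 = np.array([1.0, 0.5])
t = x @ beta0                        # single index X' beta_0
g_true = np.sin(t)                   # hypothetical link
y = g_true + 0.1 * rng.normal(size=n)

ghat = nw_loo(y, t, h=0.1)
# maximum error over interior index values (trimming, as with I_Gamma)
err = np.max(np.abs(ghat - g_true)[np.abs(t) < 1.0])
```

The trimming to \(|t|<1\) plays the role of the indicator \(I_{\varGamma }\): the kernel estimator is only uniformly well behaved away from regions where the density of the index is small.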
Proof of Lemma 1
(i) By definition, we have \(A_{n}(\hat{\alpha }_{n})\le A_{n}(\alpha _{0,n})\) and \(E(A_{n}(\alpha _{0,n}))\le E(A_{n}(\hat{\alpha }_{n}))\). These inequalities give \(A_{n}(\hat{\alpha }_{n})-E(A_{n}(\hat{\alpha }_{n}))\le A_{n}(\hat{\alpha }_{n})-E(A_{n}(\alpha _{0,n}))\le A_{n}(\alpha _{0,n})-E(A_{n}(\alpha _{0,n}))\). Thus,
Since \(\alpha _{0,n}\) is unique for any fixed n, \(\alpha _{0,n}\rightarrow \alpha _{0}\) and \(\displaystyle \sup _{\alpha \in \varTheta }|A_{n}(\alpha )-E(A_{n}(\alpha ))|\rightarrow 0\;\;a.s.\) as \(n\rightarrow \infty \), we have \(\hat{\alpha }_{n}\rightarrow \alpha _{0}\;\;a.s.\) as \(n\rightarrow \infty \). \(\square \)
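The mechanism of Lemma 1, uniform almost-sure convergence of the criterion plus a unique population minimizer forcing consistency of the argmin, can be seen on a toy criterion (a squared-error objective with minimizer \(\alpha _{0}=2\); the example is hypothetical and unrelated to the paper's \(A_{n}\)).

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 4.0, 401)   # candidate values of alpha

def argmin_hat(n):
    """Minimize A_n(alpha) = n^{-1} sum_k (Z_k - alpha)^2 over the grid;
    A_n converges uniformly (a.s.) to E(Z - alpha)^2, minimized at E[Z] = 2."""
    z = rng.normal(2.0, 1.0, size=n)
    a_n = (z ** 2).mean() - 2.0 * grid * z.mean() + grid ** 2
    return grid[np.argmin(a_n)]

estimates = [argmin_hat(n) for n in (100, 10_000, 1_000_000)]
```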
Proof of Lemma 2
We provide the proof of Eq. (9); those of Eqs. (10) and (11) can be obtained using similar arguments. By Chebyshev’s inequality, we have, for any \(\varepsilon >0\),
Setting \(a_{ni}(\mathbf {\beta }_0)=R(\nu _{ni}(\mathbf {\beta }_0))/(n+1)\), \(b_{ni}(\mathbf {\beta }_0)=R(z_{i}(\mathbf {\beta }_0))/(n+1)\), let us introduce the following notation: \(\psi _{i}(\mathbf {\beta }_0)=\varphi (a_{ni}(\mathbf {\beta }_0)) -\varphi (b_{ni}(\mathbf {\beta }_0))\) and \(U_{i}(\mathbf {\beta }_0)=I_{\varGamma _{n}}({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0} [\widehat{g}_{\mathbf {\beta }_{0},h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)] -\,I_{\varGamma }({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\).
We now show that \(J_{in}\rightarrow 0\) as \(n\rightarrow \infty \), for \(i=1,2,3\). Indeed, from the boundedness of \(\varphi \), there exists a positive constant L such that \(|\varphi (u)|\le L\), for all \(u\in (0,1)\). Also
For \(i=1,\ldots ,n\), \(F_{\nu }(\nu _{ni}(\mathbf {\beta }_0))\) and \(F(z_{i}(\mathbf {\beta }_0))\) are independent random variables uniformly distributed on (0, 1). Following Chapter 6 of Hájek et al. (1999), we obtain \(a_{ni}(\mathbf {\beta }_0)-F_{\nu }(\nu _{ni}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\) and \(b_{ni}(\mathbf {\beta }_0)-F(z_{i}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\), for each i. Thus, by continuity of \(\varphi \) and by Lemma 3, we have \(\varphi (a_{ni}(\mathbf {\beta }_0))-\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))\rightarrow 0\;a.s.\) and \(\varphi (F(z_{i}(\mathbf {\beta }_0)))-\varphi (b_{ni}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\), for each i. Also, by Lemma 3, we have \(\nu _{ni}(\mathbf {\beta }_0)-z_{i}(\mathbf {\beta }_0)\rightarrow 0\;a.s.\), from which, by the continuity of the probability measure and the continuity of \(\varphi \), we have \(\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))-\varphi (F(z_{i}(\mathbf {\beta }_0)))\rightarrow 0\;a.s.\), for each i. On the other hand,
For \({\mathbf X}_{i}\in \varGamma \) and for all \(\varepsilon >0\), there exists \(N>0\) such that for all \(n\ge N\),
\(\varepsilon \) being arbitrary, letting \(\varepsilon \rightarrow 0\), we have \(\Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert \le J({\mathbf X}_{i})<\infty \;a.s. \), as J is integrable. Thus, by Lemma 3, \(|I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})|\Vert \nabla _{\mathbf {\beta }_0} [\widehat{g}_{\mathbf {\beta }_{0},h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert \rightarrow 0\;a.s.\) and \(\Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]-\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\;a.s.\), for all i. Therefore, \(\Vert U_{i}(\mathbf {\beta }_0)\Vert \rightarrow 0\;a.s.\), for all i. Then
by applying the dominated convergence theorem together with Lemma 3. Next, using the Cauchy–Schwarz inequality, we have
By the strong law of large numbers (SLLN), \(n^{-1}\sum _{i=1}^{n}J^{4}({\mathbf X}_{i}) \rightarrow E\{J^{4}({\mathbf X})\}<\infty \;a.s.\) Also, from the above discussion, \(\max _{1\le i\le n}\left| \psi _i(\mathbf {\beta }_0)\right| ^4\rightarrow 0\;a.s.\) Thus, applying the dominated convergence theorem once again, we have \(J_{2n}\rightarrow 0\;a.s.\) Moreover, using the simple inequality \(ab\le (a^2+b^2)/2\) together with the Cauchy–Schwarz inequality, we have
By Lemma 3, \(\displaystyle \max _{1\le i\le n}\left\| U_{i}(\mathbf {\beta }_0)\right\| ^{4}\rightarrow 0\;a.s.\), and again, by the SLLN, \(\displaystyle \frac{1}{n}\sum _{i=1}^{n}J^{4}({\mathbf X}_{i})\) converges almost surely to \(E\{J^{4}({\mathbf X})\}<\infty .\) Also, as before, \(\displaystyle \max _{1\le i\le n}\left| \psi _{i}(\mathbf {\beta }_0)\right| ^2\rightarrow 0\;a.s.\) Thus, once again, a direct application of the dominated convergence theorem gives \(J_{3n}\rightarrow 0\;a.s.\) and consequently, \(\displaystyle \lim _{n\rightarrow \infty } P_{\mathbf {\beta }_0}\left( \sqrt{n}\Vert \widetilde{S}_{n} (\mathbf {\beta }_0)-S_{n}(\mathbf {\beta }_0)\Vert >\varepsilon \right) =0\). \(\square \)
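A recurring step in the proofs above, taken from Chapter 6 of Hájek et al. (1999), is that normalized ranks \(R(z_{i})/(n+1)\) converge almost surely to the corresponding distribution-function values \(F(z_{i})\). A quick numerical check, assuming standard normal residuals purely for illustration:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def max_rank_cdf_gap(n):
    """max_i | R(z_i)/(n+1) - F(z_i) | for an i.i.d. N(0,1) sample."""
    z = rng.normal(size=n)
    ranks = z.argsort().argsort() + 1                    # R(z_i)
    F = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))   # N(0,1) CDF
    return np.max(np.abs(ranks / (n + 1) - F))

gaps = [max_rank_cdf_gap(n) for n in (100, 1_000, 100_000)]
```

Since \(R(z_{i})/n\) is the empirical distribution function evaluated at \(z_{i}\), the gap shrinks at the Glivenko–Cantelli rate.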
Proof of Theorem 1
In this proof, we take L to be an arbitrary positive constant, not necessarily the same at each occurrence, and set \(b_{ni}(\mathbf {\beta })=R(\nu _{ni}(\mathbf {\beta }))/(n+1)\) and \(a_{ni}(\mathbf {\beta })=R(z_{i}(\mathbf {\beta }))/(n+1)\). By the definitions of \(\widetilde{D}_{n}(\mathbf {\beta })\) and \(D_{n}(\mathbf {\beta })\), we have
Considering the first term on the right-hand side of Eq. (14), we have
where L is the bound of \(\varphi \), which is bounded by assumption \((I_1)\). By the Cauchy–Schwarz inequality, we have
The strong law of large numbers gives \(n^{-1}\sum _{i=1}^{n}|Y_{i}|^{2}\rightarrow E[|Y|^{2}]<\infty \;a.s.\) On the other hand, we have,
as \(n\rightarrow \infty \), by Lemma 3. Similarly,
Again, by Lemma 3,
and
Thus, \(n^{-1}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|\rightarrow 0\;a.s.\) Moreover,
Following the same argument as above, we have \(n^{-1}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| ^{2}\rightarrow 0\;a.s.\), and a direct application of the strong law of large numbers gives
Turning to the second term on the right-hand side of Eq. (14), it can be further decomposed as follows:
Considering the first term on the right-hand side of this equation, we have
which converges to 0 a.s. by Lemma 3. Now, set \(F_{i\nu }(s)=P(\nu _{in}(\mathbf {\beta })\le s)\) and \(F_{i}(s)=P(z_{i}(\mathbf {\beta })\le s)\). Then,
As in the proof of Lemma 2, since for \(i=1,\ldots ,n\) and for all \(\mathbf {\beta }\in \mathscr {B}\), \(F_{i\nu }(\nu _{in}(\mathbf {\beta }))\) and \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent uniformly distributed random variables on (0, 1), following Hájek et al. (1999), we have, \(b_{ni}(\mathbf {\beta })-F_{i\nu }(\nu _{in}(\mathbf {\beta }))\rightarrow 0\;a.s.\) and \(a_{ni}(\mathbf {\beta })-F_{i}(z_{i}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for each i. Applying the generalized continuous mapping theorem (Whitt 2011), we have \(\varphi (b_{ni}(\mathbf {\beta }))-\varphi (F_{i\nu }(\nu _{in}(\mathbf {\beta })))\rightarrow 0\;a.s.\) and \(\varphi (F_{i}(z_{i}(\mathbf {\beta })))-\varphi (a_{ni}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for each i and for all \(\mathbf {\beta }\in \mathscr {B}\). Also, since \(\nu _{in}(\mathbf {\beta })-z_{i}(\mathbf {\beta })\rightarrow 0\;a.s.\), by the continuity of the probability measure and the continuity of \(\varphi \), we have \(\varphi (F_{i\nu }(\nu _{in}(\mathbf {\beta })))-\varphi (F_{i}(z_{i}(\mathbf {\beta })))\rightarrow 0\;a.s.\), for each i and for all \(\mathbf {\beta }\in \mathscr {B}\). Thus,
From this, we have
which converges almost surely to zero. Furthermore,
By the strong law of large numbers, the entire expression on the right-hand side of this inequality converges a.s. to \(E\{|Y|^2\}+E\{J^{2}({\mathbf X})\}+2\left( E\{|Y|^2\}E\{J^{2}({\mathbf X})\}\right) ^{1/2}<\infty \), by assumptions \((I_2)\)–(iii) and \((I_4)\). Thus,
Now, combining all these facts, we have \(\displaystyle \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\widetilde{D}_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })|\ \rightarrow \ 0\;\;a.s.\) \(\square \)
Proof of Theorem 2
Note that \(\varphi \) has a bounded first derivative. So, \(\varphi \in Lip(1)\). Moreover, by \((I_2)\)–(iii) and \((I_4)\), we have \(\text{ Var }(z_i(\mathbf {\beta })) < \infty \), for all i and \(\mathbf {\beta } \in \mathscr {B}\). Then
where \(\sigma ^2_\mathrm{{max}}(\mathbf {\beta }) = \max \{\text{ Var }(z_1(\mathbf {\beta })), \ldots , \text{ Var }(z_n(\mathbf {\beta }))\}\). Setting \(\alpha _n = 1/n\) and \(\beta =1\) in the theorem of Xiang (1995), we find that for every \(\mathbf {\beta } \in \mathscr {B}\), \(D_n(\mathbf {\beta }) - E\{D_n(\mathbf {\beta })\} \rightarrow 0 \ a.s.\)
To complete the proof, we have to show that \(\{D_{n}(\mathbf {\beta })\}_{n\ge 1}\) is stochastically equicontinuous. To that end, taking \(\mathbf {\beta }_{1},\mathbf {\beta }_{2}\in \mathscr {B}\), we have
As in the proof of Theorem 1, set \(a_{ni}(\mathbf {\beta })=R(z_{i}(\mathbf {\beta }))/(n+1)\). Then,
Note that \(z_{i}(\mathbf {\beta }_1)-z_{i}(\mathbf {\beta }_2)=g_{\mathbf {\beta }_{1}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{1})-g_{\mathbf {\beta }_{2}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{2})\). Since \(g_{\mathbf {\beta }}(\cdot )\) is differentiable with respect to \(\mathbf {\beta }\), applying the mean value theorem on the function \(g_{\mathbf {\beta }}({\mathbf X}^{\tau }\mathbf {\beta })\), there exists \(\mathbf {\xi }=\lambda \mathbf {\beta }_{1}+(1-\lambda )\mathbf {\beta }_{2}\) for some \(\lambda \in (0,1)\) such that
Then, by assumption \((I_2)\)–(iii) we have
Furthermore, set \(h_{i}(\mathbf {\beta })=\varphi \{F_{i}(z_{i}(\mathbf {\beta }))\}=\varphi \{F_{i}(Y_{i} -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta }))\}\), where \(F_{i}\) is the cumulative distribution function of \(z_{i}(\mathbf {\beta })\); \(h_{i}\) is therefore almost surely differentiable. By the mean value theorem, there exists \(\eta =\lambda \mathbf {\beta }_{1}+(1-\lambda )\mathbf {\beta }_{2}\), for some \(\lambda \in (0,1)\), such that \(h_{i}(\mathbf {\beta }_{1})-h_{i}(\mathbf {\beta }_{2})=h'_{i}(\eta )(\mathbf {\beta }_{1} -\mathbf {\beta }_{2})\), with \(h'_{i}(\eta )=-\nabla _{\eta }[g_{\eta }({\mathbf X}_{i}^{\tau }\eta )]f_{i}(z_{i}(\eta )) \varphi ^{\prime }\{F_{i}(z_{i}(\eta ))\}\) and \(f_{i}(t)=dF_{i}(t)/dt\). It is worth pointing out that \(f_{i}\), being a density, is almost surely bounded. Thus, by assumption \((I_2)\)–(iii) again, together with the boundedness of \(\varphi ^{\prime }\), we have \(\Vert h'_{i}(\eta )\Vert \le MJ({\mathbf X}_{i})\; a.s.\), where M is such that \(|f_{i}(z_{i}(\eta ))\varphi ^{\prime }\{F_{i}(z_{i}(\eta ))\}|\le M\;a.s.\) On the other hand, since for \(i=1,\ldots ,n\) the \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent random variables uniformly distributed on (0, 1) for all \(\mathbf {\beta }\in \mathscr {B}\), following Hájek et al. (1999) again as in Theorem 1, we obtain \(a_{ni}(\mathbf {\beta })-F_{i}(z_{i}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for each i. By continuity of \(\varphi \), we have \(\varphi \left( a_{ni}(\mathbf {\beta })\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }))\}\rightarrow 0\;a.s.\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for each i. Thus,
for all \(\mathbf {\beta }\in \mathscr {B}\). Now
where L is such that \(|\varphi (t)|\le L\), for all \(t\in (0,1)\). Also, with probability 1, we have
Moreover,
as \(\max _{1\le i\le n}|\varphi \left( a_{ni}(\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}|^{2}\rightarrow 0\;\;a.s.\) and
where \(J_{4n}\), defined in Eq. (15), converges almost surely to a finite quantity by the strong law of large numbers under assumptions \((I_2)\)–(iii) and \((I_4)\). Similarly,
converges almost surely to zero. Hence, with probability 1, we have
where
For n large enough, \(B_{n}\) does not depend on \(\mathbf {\beta }\). Since every term in the definition of \(B_{n}\) converges almost surely to a finite quantity, so does \(B_{n}\). Therefore, \(\{D_{n}(\mathbf {\beta })\}_{n\ge 1}\) is stochastically equicontinuous (Rao et al. 2014). \(\square \)
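For intuition about the objective whose equicontinuity was just established, one can compute a Jaeckel-type rank dispersion of the residuals \(z_{i}(\mathbf {\beta })=Y_{i}-g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })\) with the Wilcoxon score \(\varphi (u)=\sqrt{12}(u-1/2)\). In the sketch below the link is treated as known and equal to \(\sin \), the errors are \(t_{3}\), and the off-index direction is arbitrary: all hypothetical choices for illustration only.

```python
import numpy as np

SQ12 = np.sqrt(12.0)
phi = lambda u: SQ12 * (u - 0.5)  # Wilcoxon score: integral 0, square-integral 1

def dispersion(beta, x, y, g=np.sin):
    """D_n(beta) = n^{-1} sum_i phi(R(z_i)/(n+1)) z_i(beta), a rank-based
    measure of residual spread (link g assumed known here)."""
    z = y - g(x @ beta)
    ranks = z.argsort().argsort() + 1
    return np.mean(phi(ranks / (len(y) + 1)) * z)

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=(n, 2))
b0 = np.array([1.0, 0.5])
y = np.sin(x @ b0) + rng.standard_t(df=3, size=n)     # heavy-tailed errors

d_true = dispersion(b0, x, y)                   # at the true index direction
d_off = dispersion(np.array([0.2, 1.5]), x, y)  # at a wrong direction

# smoothness in beta, echoing the stochastic equicontinuity argument
b1 = b0 + 0.01 * rng.normal(size=2)
gap = abs(dispersion(b1, x, y) - dispersion(b0, x, y))
```

With a nondecreasing score the dispersion is nonnegative and grows with the spread of the residuals, so misspecifying the index direction inflates it.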
Proof of Theorem 4
Note that by Jensen's inequality,
Thus, together with Theorem 1, applying the dominated convergence theorem to the right-hand side of this inequality, we obtain the result. On the other hand,
Thus,
From Theorems 1, 2 and Eq. (16), the terms on the right-hand side of Eq. (17) converge to zero with probability 1. \(\square \)
Proof of Theorem 5
By assumption \((I_6)\), \(\mathbf {\beta }_{0,n}=\mathop {{{\mathrm{Argmin}}}}\limits _{\mathbf {\beta }}E(D_{n}(\mathbf {\beta }))\) which implies that
for all \(\mathbf {\beta }\in \mathscr {B}\). On the other hand, by Theorem 4, we have
Thus, \(\forall ~\varepsilon >0\), there exists \(N>0\) such that for all \(n\ge N\), \(|E(\widetilde{D}_{n}(\mathbf {\beta }))-E(D_{n}(\mathbf {\beta }))|<\varepsilon /2\) for all \(\mathbf {\beta }\in \mathscr {B}\). This implies that
Also, for all \(n\ge N\), \(|E(D_{n}(\mathbf {\beta }_{0,n}))-E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))|<\varepsilon /2\). Thus, we have
Substituting Eq. (19) into Eq. (18) gives \(-\varepsilon +E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))<E(\widetilde{D}_{n}(\mathbf {\beta }))\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for all \(n\ge N\). Now, \(\varepsilon \) being arbitrary, letting \(\varepsilon \rightarrow 0\), we have \(E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))\le E(\widetilde{D}_{n}(\mathbf {\beta }))\), for all \(\mathbf {\beta }\in \mathscr {B}\), which completes the proof. \(\square \)
Proof of Theorem 6
Note that
So,
By continuity of \(\varphi \) and the fact that for \(i=1,\ldots ,n\), \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent uniformly distributed in (0, 1), once again following Hájek et al. (1999), we have \(\left| \varphi \left( \frac{R(z_{i}(\mathbf {\beta }))}{n+1}\right) -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right| \rightarrow 0\;a.s.\), for all i and \(\mathbf {\beta }\in \mathscr {B}\). Thus,
On the other hand, \(n^{-1}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i})\rightarrow E[J^{2}({\mathbf X})]<\infty \;a.s.\) Hence, \(\displaystyle \lim _{n\rightarrow \infty }\sup _{\mathbf {\beta }\in \mathscr {B}}|S_{n} (\mathbf {\beta })-T_{n}(\mathbf {\beta })|=0\;a.s.\) \(\square \)
Proof of Theorem 7
Note that
A direct application of the strong law of large numbers shows that \(\nabla _{\mathbf {\beta }_0}T_{n}(\mathbf {\beta }_0)\rightarrow {\mathbf W}\;a.s.\) If we assume that \({\mathbf X}\) is independent of \(\varepsilon \), we have
But
from integration by parts, since \(f(\varepsilon )\varphi (F(\varepsilon ))\rightarrow 0\) as \(\varepsilon \rightarrow \pm \infty \). Now, putting \(u=F(\varepsilon )\), we have
On the other hand, by assumption \((I_1)\), \(E\left[ \varphi \big (F(\varepsilon )\big )\right] =\int _{0}^{1}\varphi (t)dt=0\). Thus,
On the other hand, to simplify notation, set \({\mathbf A}_{i}=\nabla _{\mathbf {\xi }}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\), \({\mathbf B}_{i}=\nabla _{\mathbf {\xi }}^{2}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\) and \({\mathbf C}_{i}=\nabla _{\mathbf {\xi }}^{3}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\).
From this, it can easily be shown that each term on the right-hand side of this equation is bounded by
which converges almost surely to \(3L\times E[\exp \{\lambda \Vert {\mathbf X}\Vert \}\{J({\mathbf X})+J^{2}({\mathbf X})+J^{3}({\mathbf X})\}]<\infty \), by the strong law of large numbers under \((I_2)\)–(iii) and \((I_4)\). Thus, \(\nabla _{\mathbf {\beta }}^{2}T_{n}(\mathbf {\xi })\) is almost surely bounded and the result follows from Theorem 2. \(\square \)
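The score identities used in the last two proofs — \(\int _{0}^{1}\varphi (t)\,dt=0\) from \((I_{1})\), \(\int _{0}^{1}\varphi ^{2}(t)\,dt=1\), and the change of variable \(u=F(\varepsilon )\) that turns \(E[f(\varepsilon )\varphi '(F(\varepsilon ))]\) into an integral over (0, 1) — can be verified numerically. The check below uses the Wilcoxon score and standard normal errors (hypothetical choices), for which the change of variable gives \(E[f(\varepsilon )\varphi '(F(\varepsilon ))]=\sqrt{12}\int f^{2}=\sqrt{12}/(2\sqrt{\pi })\).

```python
import numpy as np
from math import erf, sqrt, pi

SQ12 = sqrt(12.0)
phi = lambda u: SQ12 * (u - 0.5)   # Wilcoxon score
dphi = SQ12                        # phi'(u) is constant

# quadrature grid for eps ~ N(0, 1)
eps = np.linspace(-8.0, 8.0, 200_001)
f = np.exp(-eps ** 2 / 2.0) / sqrt(2.0 * pi)            # density
F = 0.5 * (1.0 + np.vectorize(erf)(eps / sqrt(2.0)))    # CDF
w = f * (eps[1] - eps[0])                               # f(eps) d(eps)

m1 = np.sum(phi(F) * w)        # E[phi(F(eps))]   = int_0^1 phi(u) du  -> 0
m2 = np.sum(phi(F) ** 2 * w)   # E[phi^2(F(eps))] = int_0^1 phi^2(u) du -> 1
ibp = np.sum(dphi * f * w)     # E[f(eps) phi'(F(eps))] = sqrt(12) int f^2
```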
Proof of Theorem 8
We mimic the proof given in Hettmansperger and McKean (1998) for the linear model. Set
It follows by a routine argument that \(\sqrt{n}(S_{n}(\mathbf {\beta }_{0})-T_{n}(\mathbf {\beta }_{0}))\) converges to \(\mathbf {0}\) in probability. Hence, the proof will be completed by showing that \(\sqrt{n}T_{n}(\mathbf {\beta }_{0})\) converges to the intended distribution. Using the Cramér–Wold device (Serfling 1980), let
where \({\mathbf a}\in \mathbb {R}^p\). Since F is the distribution of \(\varepsilon (\mathbf {\beta }_{0})\) and \(\int _{0}^{1}\varphi (t)\mathrm{d}t=0\), we have \(E(U)=0\). Also, since \(\int _{0}^{1}\varphi ^{2}(t)\mathrm{d}t=1\),
Note that U is the sum of independent functions of random variables which are not necessarily identically distributed; hence, we need to establish the limit distribution by the Lindeberg–Feller central limit theorem. To this end, set \(\sigma _{n}^{2}=\mathrm{Var}(U)\). Defining \(A_{n}\) by
we need to show that
By assumption \((I_3)\)–(iii), \(\Vert \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )\Vert \le J({\mathbf X}_{i})\) and so,
\(J(\cdot )\), being integrable, is almost surely bounded. Thus, there exists a positive constant c such that \(J({\mathbf X}_{i})\le c\;a.s.\), and therefore, \(n^{-1/2}|{\mathbf a}^{\tau } \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )|\le n^{-1/2}c\Vert {\mathbf a}\Vert \;a.s.\) Hence,
Set \(\lambda _{n}=n^{-1/2}c\Vert {\mathbf a}\Vert \). Then, \(\lambda _{n}\rightarrow 0\) as \(n\rightarrow \infty \), and is independent of i. Since \(\sigma ^{2}_{n}\) converges to a positive quantity, the ratio \(\sigma _{n}/\lambda _{n}\rightarrow \infty \) as \(n\rightarrow \infty \). Now conditioning on \({\mathbf X}_{i}\), it is easy to see that
In this expression, \(\lim _{n \rightarrow \infty } n^{-1}\sum _{i=1}^{n}E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )[\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )]^{\tau }\}< \infty \) by \((I_2)\)–(iii), \((I_4)\) and \((I_6)\). From the boundedness of \(\varphi \) and applying the dominated convergence theorem, we have
This shows that the limit in (20) goes to zero as \(n\rightarrow \infty \). \(\square \)
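The Lindeberg–Feller argument can be probed by simulation: since \(F(\varepsilon _{i})\) is uniform on (0, 1) and \(\int _{0}^{1}\varphi ^{2}=1\), the weighted sums \(U\) should be centered and approximately Gaussian already at moderate n. The sketch below uses the Wilcoxon score and a hypothetical index model with link \(\sin \) (neither is prescribed by the theorem).

```python
import numpy as np

rng = np.random.default_rng(8)
SQ12 = np.sqrt(12.0)
b0 = np.array([1.0, 0.5])
a = np.array([1.0, -1.0])   # Cramer-Wold direction

def simulate_U(n):
    """One draw of U = n^{-1/2} sum_i a' grad_beta g(X_i' b0) phi(F(eps_i));
    for g(t) = sin(t), grad_beta g(X' b0) = cos(X' b0) X."""
    x = rng.normal(size=(n, 2))
    grad = np.cos(x @ b0)[:, None] * x
    u = rng.uniform(size=n)               # F(eps_i) ~ Uniform(0, 1)
    return np.sum((grad @ a) * SQ12 * (u - 0.5)) / np.sqrt(n)

draws = np.array([simulate_U(200) for _ in range(4000)])
# empirical coverage of the +/- 1.96 standardized interval, about 0.95
cover = (np.abs(draws - draws.mean()) / draws.std() < 1.96).mean()
```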
Proof of Theorem 10
Recall that from Eq. (2), for any \({\mathbf X}_{i}\in \varGamma \),
where \(z_{(1)}(\mathbf {\beta })\le z_{(2)}(\mathbf {\beta })\le \cdots \le z_{(n)}(\mathbf {\beta })\). Since R(t) is a step function, it has finitely many jumps; the set of jump points therefore has probability zero. Since \(g_{\mathbf {\beta }}(\cdot )\) is assumed to be three times continuously differentiable by \((I_2)\)–(iii), \(D_{n}(\mathbf {\beta })\) is almost surely differentiable. From this, taking into account Theorem 6 and expanding \(D_{n}(\mathbf {\beta })\) around \(\mathbf {\beta }_{0}\) up to order 2, we have with probability 1,
where \(\mathbf {\xi }=\lambda \mathbf {\beta }_{0}+(1-\lambda )\mathbf {\beta }\), for \(\lambda \in (0,1)\). Thus,
From this, we have
as \(\Vert \nabla _{\mathbf {\beta }_0}T_n(\mathbf {\beta }_0)\Vert \) and \(\Vert \nabla _{\mathbf {\beta }}T_{n}(\mathbf {\xi })\Vert \) are bounded by \(L n^{-1}\sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]\). On the other hand, \(n^{-1}\sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]\rightarrow E\big [J({\mathbf X})+J^{2}({\mathbf X})\big ]<\infty \;a.s.\), by assumptions \((I_2)\)–(iii) and \((I_4)\). Now, for any \(\mathbf {\beta }\in \mathscr {B}_{n}\), \(\Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert \le c/\sqrt{n}\). This implies that
By Markov’s inequality, we have for any \(\varepsilon >0\) and for n large enough,
A direct application of the dominated convergence theorem gives
Thus, \(\displaystyle \lim _{n\rightarrow \infty } P_{\mathbf {\beta }_{0}}\Big [\sup _{\mathbf {\beta }\in \mathscr {B}_{n}}|D_{n}(\mathbf {\beta })-M_{n}(\mathbf {\beta })|>\varepsilon \Big ]=0\). The proof of Eq. (8) is obtained similarly, while that of Eq. (7) is obtained by combining Eq. (6) and Theorem 1. \(\square \)
Proof of Theorem 11
Equation (12) gives \(\sqrt{n}\big (\tilde{\mathbf {\beta }}_{n}-\mathbf {\beta }_{0}\big ) = -\widetilde{{\mathbf W}}_{n}^{-1}\sqrt{n}\widetilde{S}_{n}(\mathbf {\beta }_{0}) + o_p(1)\) and by (9) we have \(\sqrt{n}\widetilde{S}_{n}(\mathbf {\beta }_{0}) = \sqrt{n}S_{n}(\mathbf {\beta }_{0}) + o_p(1)\). Moreover, \(\widetilde{{\mathbf W}}_{n} = {\mathbf W}+ o_p(1)\) by (13). Since \({\mathbf W}\) is positive definite, we have \(\sqrt{n}\big (\tilde{\mathbf {\beta }}_{n}-\mathbf {\beta }_{0}\big ) = -{\mathbf W}^{-1}\sqrt{n} S_{n}(\mathbf {\beta }_{0}) + o_p(1)\). The result follows by Theorem 8. \(\square \)
Cite this article
Bindele, H.F., Abebe, A. & Meyer, K.N. General rank-based estimation for regression single index models. Ann Inst Stat Math 70, 1115–1146 (2018). https://doi.org/10.1007/s10463-017-0618-9