
General rank-based estimation for regression single index models

Annals of the Institute of Statistical Mathematics

Abstract

This study considers rank estimation of the regression coefficients of the single index regression model. Conditions needed for the consistency and asymptotic normality of the proposed estimator are established. Monte Carlo simulation experiments demonstrate the robustness and efficiency of the proposed estimator compared to the semiparametric least squares estimator. A real-life example illustrates that the rank regression procedure effectively corrects model nonlinearity even in the presence of outliers in the response space.


References

  • Abebe, A., McKean, J. W. (2013). Weighted Wilcoxon estimators in nonlinear regression. Australian and New Zealand Journal of Statistics, 55(4), 401–420.

  • Andrews, D. W. K. (1994). Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica, 62(1), 43–72.


  • Bindele, H. F., Abebe, A. (2012). Bounded influence nonlinear signed-rank regression. Canadian Journal of Statistics, 40(1), 172–189.

  • Carroll, R. J., Fan, J., Gijbels, I., Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489.

  • Chang, W. H., McKean, J. W., Naranjo, J. D., Sheather, S. J. (1999). High-breakdown rank regression. Journal of the American Statistical Association, 94(445), 205–219.

  • Delecroix, M., Härdle, W., Hristache, M. (2003). Efficient estimation in conditional single-index regression. Journal of Multivariate Analysis, 86(2), 213–226.

  • Delecroix, M., Hristache, M., Patilea, V. (2006). On semiparametric-estimation in single-index regression. Journal of Statistical Planning and Inference, 136(3), 730–769.

  • Feng, L., Zou, C., Wang, Z. (2012). Rank-based inference for the single-index model. Statistics & Probability Letters, 82(3), 535–541.

  • Hájek, J., Šidák, Z., Sen, P. K. (1999). Theory of rank tests. Probability and mathematical statistics (2nd ed.). San Diego, CA: Academic Press, Inc.

  • Han, A. K. (1987). Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator. Journal of Econometrics, 35(2), 303–316.


  • Härdle, W., Stoker, T. M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84(408), 986–995.

  • Härdle, W., Tsybakov, A. B. (1993). How sensitive are average derivatives? Journal of Econometrics, 58(1–2), 31–48.

  • Härdle, W., Hall, P., Ichimura, H. (1993). Optimal smoothing in single-index models. The Annals of Statistics, 21(1), 157–178.

  • Hettmansperger, T. P., McKean, J. W. (1998). Robust nonparametric statistical methods, volume 5 of Kendall’s library of statistics. London: Edward Arnold.

  • Hettmansperger, T. P., McKean, J. W. (2011). Robust nonparametric statistical methods, volume 119 of monographs on statistics and applied probability (2nd ed.). Boca Raton, FL: CRC Press.

  • Horowitz, J. L., Härdle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariates. Journal of the American Statistical Association, 91(436), 1632–1640.

  • Hristache, M., Juditsky, A., Spokoiny, V. (2001). Direct estimation of the index coefficient in a single-index model. The Annals of Statistics, 29(3), 595–623.

  • Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1–2), 71–120.


  • Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449–1458.


  • Klein, R. W., Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica, 61(2), 387–421.

  • Kutner, M., Nachtsheim, C., Neter, J., Li, W. (2004). Applied linear statistical models. New York: McGraw-Hill/Irwin.

  • Liu, J., Zhang, R., Zhao, W., Lv, Y. (2013). A robust and efficient estimation method for single index models. Journal of Multivariate Analysis, 122, 226–238.

  • McCullagh, P., Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.

  • Naranjo, J. D., Hettmansperger, T. P. (1994). Bounded influence rank regression. Journal of the Royal Statistical Society. Series B, 56(1), 209–220.

  • Newey, W. K. (2004). Efficient semiparametric estimation via moment restrictions. Econometrica, 72(6), 1877–1897.


  • Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of econometrics, 4, 2111–2245.

  • Powell, J. L., Stock, J. H., Stoker, T. M. (1989). Semiparametric estimation of index coefficients. Econometrica, 57(6), 1403–1430.

  • R Development Core Team. (2009). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

  • Rao, T. S., Das, S., Boshnakov, G. N. (2014). A frequency domain approach for the estimation of parameters of spatio-temporal stationary random processes. Journal of Time Series Analysis, 35(4), 357–377.

  • Serfling, R. J. (1980). Approximation theorems of mathematical statistics., Wiley series in probability and mathematical statistics. New York: Wiley.


  • Sherman, R. P. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. The Annals of Statistics, 22(1), 439–459.


  • Whitt, W. (2011). Stochastic-process limits: An introduction to stochastic-process limits and their application to queues. New York: Springer.

  • Xia, Y. (2006). Asymptotic distribution for two estimators of the single-index model. Econometric Theory, 22, 1112–1137.


  • Xia, Y., Tong, H., Li, W. K., Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 363–410.

  • Xiang, X. (1995). A strong law of large numbers for $L$-statistics in the non-i.d. case. Communications in Statistics. Theory and Methods, 24(7), 1813–1819.


  • Yin, X., Cook, R. D. (2005). Direction estimation in single-index regressions. Biometrika, 92(2), 371–384.

  • Yu, Y., Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97(460), 1042–1054.



Corresponding author

Correspondence to Huybrechts F. Bindele.

Appendix

This appendix contains proofs of the main theoretical results, together with a key lemma due to Delecroix et al. (2006) that ensures the uniform strong consistency of the leave-one-out Nadaraya–Watson estimator. For details regarding the proof of this lemma, readers are referred to that paper.
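
For reference in what follows (our brief recapitulation, using the notation of the main text), the single index regression model takes the form

$$\begin{aligned} Y=g({\mathbf X}^{\tau }\mathbf {\beta }_{0})+\varepsilon , \end{aligned}$$

with unknown link function g and true index vector \(\mathbf {\beta }_{0}\); the residuals entering the rank dispersion are \(z_{i}(\mathbf {\beta })=Y_{i}-g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })\), with feasible counterparts \(\nu _{ni}(\mathbf {\beta })=Y_{i}-\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })\) based on the leave-one-out Nadaraya–Watson fit.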

Lemma 3

Let \(\mathscr {B}_{n}:=\{\mathbf {\beta }:~\Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert \le d_{n}\}\), where \(d_{n}\) is some sequence decreasing to zero. Then,

  1. (a) If \(\delta >0\), we have

    $$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}\left| I_{\{{\mathbf x}:\widehat{\mu }_{\mathbf {\beta },h}^{i}({\mathbf x}^{\tau }\mathbf {\beta })\ge c\}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| \le I_{\varGamma ^{\delta }}({\mathbf X}_{i})+I_{(\delta ,\infty )}(Z_{n}), \end{aligned}$$

    where \(\varGamma ^{\delta }=\{{\mathbf x}:~|\mu _{\mathbf {\beta }_{0},h}({\mathbf x}^{\tau }\mathbf {\beta }_{0})-c|\le \delta \}\) and

    $$\begin{aligned} Z_{n}=\max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}} \left| \widehat{\mu }_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -\mu _{\mathbf {\beta }_{0},h}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{0})\right| . \end{aligned}$$
  2. (b) Assume \(d_{n}=o(1/\sqrt{n})\) and that there exists a sequence \(\delta _{n}\rightarrow 0\) such that \(\delta _{n}/n^{-a\varepsilon }\rightarrow \infty \) and \(\delta _{n}[d_{n}\sqrt{n}]^{-a\varepsilon }\rightarrow \infty \), for some \(a>0\); then \(I_{(\delta _{n},\infty )}(Z_{n})=o_{p}(n^{-\alpha })\), for all \(\alpha >0\). Moreover, together with assumptions \((I_2)\)–\((I_4)\), assuming that \(E(|Y|^2)<\infty \), we have

    $$\begin{aligned} \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}| \widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\quad a.s.\; \text{ as } n\rightarrow \infty \text{, } \end{aligned}$$

    and

    $$\begin{aligned} \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\nabla _{\mathbf {\beta }}[\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })]-\nabla _{\mathbf {\beta }}[g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })]|I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\quad a.s.\; \text{ as } n\rightarrow \infty \text{. } \end{aligned}$$
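
For concreteness, the following minimal sketch (ours, for illustration only; it assumes a Gaussian kernel and applies no trimming, neither of which is prescribed by the paper) computes the leave-one-out Nadaraya–Watson estimates \(\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })\), \(i=1,\ldots ,n\), that Lemma 3 controls uniformly.

```python
import numpy as np

def loo_nadaraya_watson(X, y, beta, h):
    """Leave-one-out Nadaraya-Watson estimates of E[Y | X^T beta = t],
    each evaluated at t_i = X_i^T beta with observation i held out.
    Sketch only: Gaussian kernel, no trimming indicator I_Gamma."""
    t = X @ beta                           # projected single index, shape (n,)
    diff = (t[:, None] - t[None, :]) / h   # pairwise (t_i - t_j) / h
    K = np.exp(-0.5 * diff**2)             # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)               # drop observation i from its own fit
    denom = K.sum(axis=1)
    return (K @ y) / np.where(denom > 0, denom, np.nan)
```

The feasible residuals used in the proofs are then \(\nu _{ni}(\mathbf {\beta })=Y_{i}-\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })\), that is, y - loo_nadaraya_watson(X, y, beta, h).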

Proof of Lemma 1

(i) By definition, we have \(A_{n}(\hat{\alpha }_{n})\le A_{n}(\alpha _{0,n})\) and \(E(A_{n}(\alpha _{0,n}))\le E(A_{n}(\hat{\alpha }_{n}))\). These inequalities give \(A_{n}(\hat{\alpha }_{n})-E(A_{n}(\hat{\alpha }_{n}))\le A_{n}(\hat{\alpha }_{n})-E(A_{n}(\alpha _{0,n}))\le A_{n}(\alpha _{0,n})-E(A_{n}(\alpha _{0,n}))\). Thus,

$$\begin{aligned} |A_{n}(\hat{\alpha }_{n})-E(A_{n}(\alpha _{0,n}))|\le & {} \max \{|A_{n}(\hat{\alpha }_{n})-E(A_{n}(\hat{\alpha }_{n}))|, |A_{n}(\alpha _{0,n})-E(A_{n}(\alpha _{0,n}))|\}\\\le & {} \sup _{\alpha \in \varTheta }|A_{n}(\alpha )-E(A_{n}(\alpha ))|. \end{aligned}$$

Since \(\alpha _{0,n}\) is unique for any fixed n, \(\alpha _{0,n}\rightarrow \alpha _{0}\) and \(\displaystyle \sup _{\alpha \in \varTheta }|A_{n}(\alpha )-E(A_{n}(\alpha ))|\rightarrow 0\;\;a.s.\) as \(n\rightarrow \infty \), we have \(\hat{\alpha }_{n}\rightarrow \alpha _{0}\;\;a.s.\) as \(n\rightarrow \infty \). \(\square \)
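
For completeness, the final step can be made explicit (our elaboration of a standard argmin argument; cf. Newey and McFadden 1994): the two defining inequalities also give

$$\begin{aligned} 0\le E(A_{n}(\hat{\alpha }_{n}))-E(A_{n}(\alpha _{0,n}))\le 2\sup _{\alpha \in \varTheta }|A_{n}(\alpha )-E(A_{n}(\alpha ))|\rightarrow 0\quad a.s., \end{aligned}$$

so \(\hat{\alpha }_{n}\) asymptotically minimizes \(\alpha \mapsto E(A_{n}(\alpha ))\); since this function has the unique minimizer \(\alpha _{0,n}\) for each n, with \(\alpha _{0,n}\rightarrow \alpha _{0}\) and the minimum well separated in the sense of the uniqueness assumption, it follows that \(\hat{\alpha }_{n}\rightarrow \alpha _{0}\;a.s.\)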

Proof of Lemma 2

We provide the proof of Eq. (9); those of Eqs. (10) and (11) follow from similar arguments. By Chebyshev’s inequality, we have, for any \(\varepsilon >0\),

$$\begin{aligned} P_{\mathbf {\beta }_0}\left( \sqrt{n}\Vert \widetilde{S}_{n}(\mathbf {\beta }_0) -S_{n}(\mathbf {\beta }_0)\Vert >\varepsilon \right) \le \frac{1}{\varepsilon ^2} E\left\{ n\big \Vert \widetilde{S}_{n}(\mathbf {\beta }_0)-S_{n}(\mathbf {\beta }_0)\big \Vert ^2\right\} . \end{aligned}$$

Setting \(a_{ni}(\mathbf {\beta }_0)=R(\nu _{ni}(\mathbf {\beta }_0))/(n+1)\), \(b_{ni}(\mathbf {\beta }_0)=R(z_{i}(\mathbf {\beta }_0))/(n+1)\), let us introduce the following notation: \(\psi _{i}(\mathbf {\beta }_0)=\varphi (a_{ni}(\mathbf {\beta }_0)) -\varphi (b_{ni}(\mathbf {\beta }_0))\) and \(U_{i}(\mathbf {\beta }_0)=I_{\varGamma _{n}}({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0} [\widehat{g}_{\mathbf {\beta }_{0},h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)] -\,I_{\varGamma }({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\).

$$\begin{aligned}&\frac{1}{\varepsilon ^2}E\left\{ n\big \Vert \widetilde{S}_{n}(\mathbf {\beta }_0)-S_{n}(\mathbf {\beta }_0)\big \Vert ^2\right\} \\&\quad =\frac{1}{n\varepsilon ^2}E\left[ \left( \sum _{i=1}^{n}\left\{ I_{\varGamma _{n}} ({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\varphi \left( a_{ni}(\mathbf {\beta }_0)\right) \right. \right. \right. \\&\qquad \left. \left. \left. -\,I_{\varGamma }({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\varphi \left( b_{ni}(\mathbf {\beta }_0)\right) \right\} \right) ^2\right] \\&\quad =\frac{1}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i=1}^{n}U_{i}^{2} (\mathbf {\beta }_0)\varphi ^{2}\left( a_{ni}(\mathbf {\beta }_0)\right) \right] \\&\qquad +\,\frac{1}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i=1}^{n} \left\{ \nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}}({\mathbf X}_{i}^{\tau } \mathbf {\beta }_0)]\right\} ^{2}\psi _{i}^{2}(\mathbf {\beta }_0)\right] \\&\qquad +\,\frac{2}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i<j}^{n}\left\{ \nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}}({\mathbf X}_{i}^{\tau } \mathbf {\beta }_0)]\varphi \left( a_{nj}(\mathbf {\beta }_0)\right) \right\} U_{i}(\mathbf {\beta }_0)\psi _{j}(\mathbf {\beta }_0)\right] \\&\quad =J_{1n}+J_{2n}+J_{3n}. \end{aligned}$$

We now show that \(J_{in}\rightarrow 0\) as \(n\rightarrow \infty \), for \(i=1,2,3\). Indeed, from the boundedness of \(\varphi \), there exists a positive constant L such that \(|\varphi (u)|\le L\), for all \(u\in (0,1)\). Also

$$\begin{aligned} |\psi _i(\mathbf {\beta }_0)|\le & {} |\varphi (a_{ni}(\mathbf {\beta }_0))-\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))|\\&+\,|\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))-\varphi (F(z_{i} (\mathbf {\beta }_0)))|+|\varphi (F(z_{i}(\mathbf {\beta }_0)))-\varphi (b_{ni}(\mathbf {\beta }_0))|. \end{aligned}$$

For \(i=1,\ldots ,n\), \(F_{\nu }(\nu _{ni}(\mathbf {\beta }_0))\) and \(F(z_{i}(\mathbf {\beta }_0))\) are independent random variables, uniformly distributed on (0, 1). Following Chapter 6 of Hájek et al. (1999), one obtains \(a_{ni}(\mathbf {\beta }_0)-F_{\nu }(\nu _{ni}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\) and \(b_{ni}(\mathbf {\beta }_0)-F(z_{i}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\), for each i. Thus, by continuity of \(\varphi \) and by Lemma 3, we have \(\varphi (a_{ni}(\mathbf {\beta }_0))-\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))\rightarrow 0\;a.s.\) and \(\varphi (F(z_{i}(\mathbf {\beta }_0)))-\varphi (b_{ni}(\mathbf {\beta }_0))\rightarrow 0\;a.s.\), for each i. Also, by Lemma 3, we have \(\nu _{ni}(\mathbf {\beta }_0)-z_{i}(\mathbf {\beta }_0)\rightarrow 0\;a.s.\), from which, by the continuity of the probability measure and the continuity of \(\varphi \), we have \(\varphi (F_{\nu }(\nu _{ni}(\mathbf {\beta }_0)))-\varphi (F(z_{i}(\mathbf {\beta }_0)))\rightarrow 0\;a.s.\), for each i. On the other hand,

$$\begin{aligned} \Vert U_{i}(\mathbf {\beta }_0)\Vert&\le |I_{\varGamma _{n}}({\mathbf X}_{i}) -I_{\varGamma }({\mathbf X}_{i})|\Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert +\Vert \nabla _{\mathbf {\beta }_0} [\widehat{g}_{\mathbf {\beta }_{0},h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\\&\quad -\,\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)] \Vert I_{\varGamma } ({\mathbf X}_{i}). \end{aligned}$$

For \({\mathbf X}_{i}\in \varGamma \) and for all \(\varepsilon >0\), there exists \(N>0\) such that for all \(n\ge N\),

$$\begin{aligned} \Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert <\Vert \nabla _{\mathbf {\beta }_0} [g_{\mathbf {\beta }_{0}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert +\varepsilon \le J({\mathbf X}_{i})+\varepsilon . \end{aligned}$$

Since \(\varepsilon \) is arbitrary, letting \(\varepsilon \rightarrow 0\) gives \(\Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert \le J({\mathbf X}_{i})<\infty \;a.s. \), as J is integrable. Thus, by Lemma 3, \(|I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})|\Vert \nabla _{\mathbf {\beta }_0} [\widehat{g}_{\mathbf {\beta }_{0},h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert \rightarrow 0\;a.s.\) and \(\Vert \nabla _{\mathbf {\beta }_0}[\widehat{g}_{\mathbf {\beta }_{0},h}^{i} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]-\nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_{0}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\Vert I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\;a.s.\), for all i. Therefore, \(\Vert U_{i}(\mathbf {\beta }_0)\Vert \rightarrow 0\;a.s.\), for all i. Then

$$\begin{aligned} \Vert J_{1n}\Vert \le \frac{L^2}{\varepsilon ^2}E\left( \max _{1\le i\le n} \left\| U_{i}(\mathbf {\beta }_0)\right\| ^{2}\right) \rightarrow 0\;a.s., \end{aligned}$$

by applying the dominated convergence theorem together with Lemma 3. Next, using Cauchy–Schwarz inequality, we have

$$\begin{aligned} \big \Vert J_{2n}\big \Vert\le & {} \frac{1}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i}) \left| \psi _i(\mathbf {\beta }_0)\right| ^2\right] \\\le & {} \frac{1}{\varepsilon ^2}E\left[ \left( \frac{1}{n}\sum _{i=1}^{n} J^{4}({\mathbf X}_{i})\right) ^{1/2}\left( \max _{1\le i\le n}\left| \psi _i(\mathbf {\beta }_0)\right| ^4\right) ^{1/2}\right] . \end{aligned}$$

By the strong law of large numbers (SLLN), \(n^{-1}\sum _{i=1}^{n}J^{4}({\mathbf X}_{i}) \rightarrow E\{J^{4}({\mathbf X})\}<\infty \;a.s.\) Also, from the above discussion, \(\max _{1\le i\le n}\left| \psi _i(\mathbf {\beta }_0)\right| ^4\rightarrow 0\;a.s.\) Thus, applying the dominated convergence theorem once again, we have \(J_{2n}\rightarrow 0\;a.s.\) Moreover, using the simple inequality \(ab\le (a^2+b^2)/2\) together with Cauchy–Schwarz inequality, we have

$$\begin{aligned} \Vert J_{3n}\Vert\le & {} \frac{2L}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i<j}^{n} J({\mathbf X}_{i})\left\| U_{i}(\mathbf {\beta }_0)\right\| \left| \psi _{j}(\mathbf {\beta }_0) \right| \right] \\\le & {} \frac{L}{\varepsilon ^2}E\left[ \frac{1}{n}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i}) \left\| U_{i}(\mathbf {\beta }_0)\right\| ^{2}\right] +\frac{L}{\varepsilon ^2}E \left[ \frac{1}{n}\sum _{j=1}^{n}\left| \psi _{j}(\mathbf {\beta }_0)\right| ^{2}\right] \\\le & {} \frac{L}{\varepsilon ^2}E\left[ \left( \frac{1}{n}\sum _{i=1}^{n}J^{4} ({\mathbf X}_{i})\right) ^{1/2}\left( \max _{1\le i\le n}\left\| U_{i}(\mathbf {\beta }_0) \right\| ^{4}\right) ^{1/2}\right] \\&+\, \frac{L}{\varepsilon ^2}E \left[ \max _{1\le j\le n}\left| \psi _{j}(\mathbf {\beta }_0)\right| ^{2}\right] . \end{aligned}$$

By Lemma 3, \(\displaystyle \max _{1\le i\le n}\left\| U_{i}(\mathbf {\beta }_0)\right\| ^{4}\rightarrow 0\;a.s.\), and again, by the SLLN, \(\displaystyle \frac{1}{n}\sum _{i=1}^{n}J^{4}({\mathbf X}_{i})\) converges almost surely to \(E\{J^{4}({\mathbf X})\}<\infty .\) Also, as before, \(\displaystyle \max _{1\le i\le n}\left| \psi _{i}(\mathbf {\beta }_0)\right| ^2\rightarrow 0\;a.s.\) Hence, once again, a direct application of the dominated convergence theorem gives \(J_{3n}\rightarrow 0\;a.s.\) and consequently, \(\displaystyle \lim _{n\rightarrow \infty } P_{\mathbf {\beta }_0}\left( \sqrt{n}\Vert \widetilde{S}_{n} (\mathbf {\beta }_0)-S_{n}(\mathbf {\beta }_0)\Vert >\varepsilon \right) =0\). \(\square \)

Proof of Theorem 1

In this proof, we let L denote an arbitrary positive constant, not necessarily the same at each occurrence, and, interchanging the roles played in the proof of Lemma 2, set \(b_{ni}(\mathbf {\beta })=R(\nu _{ni}(\mathbf {\beta }))/(n+1)\) and \(a_{ni}(\mathbf {\beta })=R(z_{i}(\mathbf {\beta }))/(n+1)\). By definition of \(\widetilde{D}_{n}(\mathbf {\beta })\) and \(D_{n}(\mathbf {\beta })\), we have

$$\begin{aligned} \widetilde{D}_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })= & {} \frac{1}{n}\sum _{i=1}^{n} \left[ I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right] \varphi (b_{ni} (\mathbf {\beta }))\nu _{ni}(\mathbf {\beta })\\&+\,\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi (b_{ni} (\mathbf {\beta }))\nu _{ni}(\mathbf {\beta })-\varphi (a_{ni}(\mathbf {\beta })) z_{i}(\mathbf {\beta })\right] \nonumber . \end{aligned}$$
(14)

Considering the first term on the right-hand side of Eq. (14), we have

$$\begin{aligned}&\left| \frac{1}{n}\sum _{i=1}^{n}\left[ I_{\varGamma _{n}}({\mathbf X}_{i}) -I_{\varGamma }({\mathbf X}_{i})\right] \varphi (b_{ni}(\mathbf {\beta }))\nu _{ni} (\mathbf {\beta })\right| \\&\quad \le \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}} ({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| |\varphi (b_{ni}(\mathbf {\beta }))||\nu _{ni} (\mathbf {\beta })|\\&\quad \le \frac{L}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i}) -I_{\varGamma }({\mathbf X}_{i})\right| |\nu _{ni}(\mathbf {\beta })|\\&\quad \le \frac{L}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |Y_{i}|\\&\qquad +\, \frac{L}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|\\&\qquad +\, \frac{L}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|, \end{aligned}$$

where L is a bound for \(\varphi \), which is bounded by assumption \((I_1)\). By the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |Y_{i}|\le & {} \left( \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| ^{2}\right) ^{1/2}\left( \frac{1}{n} \sum _{i=1}^{n}|Y_{i}|^{2}\right) ^{1/2}. \end{aligned}$$

The strong law of large numbers gives \(n^{-1}\sum _{i=1}^{n}|Y_{i}|^{2}\rightarrow E[|Y|^{2}]<\infty \;a.s.\) On the other hand, we have,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| ^{2}\le & {} \max _{1\le i\le n}\sup _{h\in \mathscr {H}_{n}} \left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| ^{2}\\\le & {} \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}\left| I_{\{{\mathbf x}:\widehat{\mu }_{\mathbf {\beta },h}^{i}({\mathbf x}^{\tau } \mathbf {\beta })\ge c\}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| ^{2}\\&\rightarrow 0\quad a.s., \end{aligned}$$

as \(n\rightarrow \infty \), by Lemma 3. Similarly,

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|\\&\quad \le \left( \frac{1}{n} \sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i}) \right| ^{2}\right) ^{1/2}\left( \frac{1}{n}\sum _{i=1}^{n}| \widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta })-g_{\mathbf {\beta }} ({\mathbf X}_{i}^{\tau }\mathbf {\beta })|^{2}I_{\varGamma }({\mathbf X}_{i})\right) ^{1/2}\\&\quad \le \left( \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n}, h\in \mathscr {H}_{n}}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| ^{2}\right) ^{1/2}\\&\qquad \times \left( \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}| \widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|^{2}I_{\varGamma }({\mathbf X}_{i})\right) ^{1/2}. \end{aligned}$$

Again, by Lemma 3,

$$\begin{aligned} \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}} \left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| ^{2}\rightarrow 0\;a.s. \end{aligned}$$

and

$$\begin{aligned} \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}| \widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|^{2}I_{\varGamma }({\mathbf X}_{i})\rightarrow 0\;a.s. \end{aligned}$$

Thus, \(n^{-1}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|\rightarrow 0\;a.s.\) Moreover,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| |g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|&\le \left( \frac{1}{n}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma } ({\mathbf X}_{i})\right| ^{2}\right) ^{1/2}\\&\quad \times \left( \frac{1}{n}\sum _{i=1}^{n}|g_{\mathbf {\beta }} ({\mathbf X}_{i}^{\tau }\mathbf {\beta })|^{2}\right) ^{1/2}. \end{aligned}$$

Following the same argument as above, we have \(n^{-1}\sum _{i=1}^{n}\left| I_{\varGamma _{n}}({\mathbf X}_{i})-I_{\varGamma }({\mathbf X}_{i})\right| ^{2}\rightarrow 0\;a.s.\), and a direct application of the strong law of large numbers gives

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^{n}|g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|^{2}\rightarrow E\{|g_{\mathbf {\beta }}({\mathbf X}^{\tau }\mathbf {\beta })|^2\}\le E\{J^{2}({\mathbf X})\}<\infty \;a.s.,\\&\quad \text{ by } \text{ assumption } (I_3)-\mathrm{(iii)}. \end{aligned}$$

The second term on the right-hand side of Eq. (14) can be further decomposed as follows:

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi (b_{ni} (\mathbf {\beta }))\nu _{ni}(\mathbf {\beta })-\varphi (a_{ni}(\mathbf {\beta })) z_{i}(\mathbf {\beta })\right] \\&\quad =\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}) \varphi (b_{ni}(\mathbf {\beta }))[\nu _{ni}(\mathbf {\beta })-z_{i}(\mathbf {\beta })]\\&\qquad +\,\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi (b_{ni} (\mathbf {\beta }))-\varphi (a_{ni}(\mathbf {\beta }))\right] z_{i}(\mathbf {\beta }). \end{aligned}$$

Considering the first term on the right-hand side of this equation, we have

$$\begin{aligned}&\left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\varphi (b_{ni} (\mathbf {\beta }))[\nu _{ni}(\mathbf {\beta })-z_{i}(\mathbf {\beta })]\right| \le \frac{L}{n}\sum _{i=1}^{n}|\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau } \mathbf {\beta })-g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|I_{\varGamma }({\mathbf X}_{i})\\&\quad \le \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}_{n},h\in \mathscr {H}_{n}}|\widehat{g}_{\mathbf {\beta },h}^{i}({\mathbf X}_{i}^{\tau }\mathbf {\beta }) -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })|I_{\varGamma }({\mathbf X}_{i}) \end{aligned}$$

which converges to 0 a.s. by Lemma 3. Now, set \(F_{i\nu }(s)=P(\nu _{ni}(\mathbf {\beta })\le s)\) and \(F_{i}(s)=P(z_{i}(\mathbf {\beta })\le s)\). Then,

$$\begin{aligned} \varphi (b_{ni}(\mathbf {\beta }))-\varphi (a_{ni}(\mathbf {\beta }))= & {} [\varphi (b_{ni}(\mathbf {\beta }))-\varphi (F_{i\nu }(\nu _{ni}(\mathbf {\beta })))]\\&+\,[\varphi (F_{i\nu }(\nu _{ni}(\mathbf {\beta })))-\varphi (F_{i}(z_{i}(\mathbf {\beta })))]\\&+\,[\varphi (F_{i}(z_{i}(\mathbf {\beta })))-\varphi (a_{ni}(\mathbf {\beta }))]. \end{aligned}$$

As in the proof of Lemma 2, since for \(i=1,\ldots ,n\) and for all \(\mathbf {\beta }\in \mathscr {B}\), \(F_{i\nu }(\nu _{ni}(\mathbf {\beta }))\) and \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent random variables, uniformly distributed on (0, 1), following Hájek et al. (1999) we have \(b_{ni}(\mathbf {\beta })-F_{i\nu }(\nu _{ni}(\mathbf {\beta }))\rightarrow 0\;a.s.\) and \(a_{ni}(\mathbf {\beta })-F_{i}(z_{i}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for each i. Applying the generalized continuous mapping theorem (Whitt 2011), we have \(\varphi (b_{ni}(\mathbf {\beta }))-\varphi (F_{i\nu }(\nu _{ni}(\mathbf {\beta })))\rightarrow 0\;a.s.\) and \(\varphi (F_{i}(z_{i}(\mathbf {\beta })))-\varphi (a_{ni}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for each i and for all \(\mathbf {\beta }\in \mathscr {B}\). Also, since \(\nu _{ni}(\mathbf {\beta })-z_{i}(\mathbf {\beta })\rightarrow 0\;a.s.\), by the continuity of the probability measure and the continuity of \(\varphi \), we have \(\varphi (F_{i\nu }(\nu _{ni}(\mathbf {\beta })))-\varphi (F_{i}(z_{i}(\mathbf {\beta })))\rightarrow 0\;a.s.\), for each i and for all \(\mathbf {\beta }\in \mathscr {B}\). Thus,

$$\begin{aligned}&\Big |\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi (b_{ni} (\mathbf {\beta }))-\varphi (a_{ni}(\mathbf {\beta }))\right] z_{i}(\mathbf {\beta })\Big |\\&\le \left( \frac{1}{n}\sum _{i=1}^{n}|\varphi (b_{ni}(\mathbf {\beta })) -\varphi (a_{ni}(\mathbf {\beta }))|^{2}\right) ^{1/2}\left( \frac{1}{n}\sum _{i=1}^{n} |z_{i}(\mathbf {\beta })|^2\right) ^{1/2}. \end{aligned}$$

From this, we have

$$\begin{aligned} n^{-1}\sum _{i=1}^{n}|\varphi (b_{ni} (\mathbf {\beta }))-\varphi (a_{ni}(\mathbf {\beta }))|^{2}\le \max _{1\le i\le n} \sup _{\mathbf {\beta }\in \mathscr {B}}|\varphi (b_{ni}(\mathbf {\beta }))-\varphi (a_{ni} (\mathbf {\beta }))|^{2}, \end{aligned}$$

which converges almost surely to zero. Furthermore,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}|z_{i}(\mathbf {\beta })|^2\le & {} \frac{1}{n} \sum _{i=1}^{n}\left( |Y_{i}|+|J({\mathbf X}_{i})|\right) ^{2} \le \frac{1}{n}\sum _{i=1}^{n}|Y_{i}|^{2}+\frac{1}{n}\sum _{i=1}^{n}| J({\mathbf X}_{i})|^{2} \nonumber \\&+\,2\left( \frac{1}{n}\sum _{i=1}^{n}|Y_{i}|^{2}\right) ^{1/2}\left( \frac{1}{n} \sum _{i=1}^{n}J^{2}({\mathbf X}_{i})\right) ^{1/2} := J_{4n}. \end{aligned}$$
(15)

By the strong law of large numbers, the entire expression on the right-hand side of this inequality converges a.s. to \(E\{|Y|^2\}+E\{J^{2}({\mathbf X})\}+2\left( E\{|Y|^2\}E\{J^{2}({\mathbf X})\}\right) ^{1/2}<\infty \), by assumptions \((I_2)\)–(iii) and \((I_4)\). Thus,

$$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B}}\left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma } ({\mathbf X}_{i})\left[ \varphi (b_{ni}(\mathbf {\beta }))-\varphi (a_{ni}(\mathbf {\beta }))\right] z_{i}(\mathbf {\beta })\right| \rightarrow 0\;a.s. \end{aligned}$$

Now, combining all these facts, we have \(\displaystyle \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\widetilde{D}_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })|\ \rightarrow \ 0\;\;a.s.\) \(\square \)

Proof of Theorem 2

Note that \(\varphi \) has a bounded first derivative. So, \(\varphi \in Lip(1)\). Moreover, by \((I_2)\)–(iii) and \((I_4)\), we have \(\text{ Var }(z_i(\mathbf {\beta })) < \infty \), for all i and \(\mathbf {\beta } \in \mathscr {B}\). Then

$$\begin{aligned} \sum _{i = 1}^n \frac{\text{ Var }(z_i(\mathbf {\beta }))}{n^2} \le \frac{\sigma ^2_\mathrm{{max}}(\mathbf {\beta })}{n} = O(1/n), \end{aligned}$$

where \(\sigma ^2_\mathrm{{max}}(\mathbf {\beta }) = \max \{\text{ Var }(z_1(\mathbf {\beta })), \ldots , \text{ Var }(z_n(\mathbf {\beta }))\}\). Setting \(\alpha _n = 1/n\) and \(\beta =1\) in the theorem of Xiang (1995), we find that for every \(\mathbf {\beta } \in \mathscr {B}\), \(D_n(\mathbf {\beta }) - E\{D_n(\mathbf {\beta })\} \rightarrow 0 \ a.s.\)

To complete the proof, we have to show that \(\{D_{n}(\mathbf {\beta })\}_{n\ge 1}\) is stochastically equicontinuous. To that end, taking \(\mathbf {\beta }_{1},\mathbf {\beta }_{2}\in \mathscr {B}\), we have

$$\begin{aligned}&D_{n}(\mathbf {\beta }_{1})-D_{n}(\mathbf {\beta }_{2})\\&\quad =\frac{1}{n} \sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi \left( \frac{R(z_{i}(\mathbf {\beta }_1))}{n+1}\right) z_{i}(\mathbf {\beta }_{1})-\varphi \left( \frac{R(z_{i}(\mathbf {\beta }_2))}{n+1}\right) z_{i}(\mathbf {\beta }_{2})\right] . \end{aligned}$$

As in the proof of Theorem 1, set \(a_{ni}(\mathbf {\beta })=R(z_{i}(\mathbf {\beta }))/(n+1)\). Then,

$$\begin{aligned} D_{n}(\mathbf {\beta }_{1})-D_{n}(\mathbf {\beta }_{2})= & {} \frac{1}{n} \sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi \left( a_{ni} (\mathbf {\beta }_1)\right) z_{i}(\mathbf {\beta }_{1})-\varphi \left( a_{ni} (\mathbf {\beta }_2)\right) z_{i}(\mathbf {\beta }_{2})\right] \\= & {} \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\varphi \left( a_{ni} (\mathbf {\beta }_1)\right) \left[ z_{i}(\mathbf {\beta }_{1})-z_{i}(\mathbf {\beta }_{2})\right] \\&+\,\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi \left( a_{ni} (\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}\right] z_{i} (\mathbf {\beta }_2)\\&+\,\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi \{F_{i}(z_{i} (\mathbf {\beta }_1))\}-\varphi \{F_{i}(z_{i}(\mathbf {\beta }_2))\}\right] z_{i}(\mathbf {\beta }_2)\\&+\, \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\left[ \varphi \{F_{i}(z_{i} (\mathbf {\beta }_2))\}-\varphi \left( a_{ni}(\mathbf {\beta }_2)\right) \right] z_{i}(\mathbf {\beta }_2). \end{aligned}$$

Note that \(z_{i}(\mathbf {\beta }_1)-z_{i}(\mathbf {\beta }_2)=g_{\mathbf {\beta }_{2}} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{2})-g_{\mathbf {\beta }_{1}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{1})\). Since \(g_{\mathbf {\beta }}(\cdot )\) is differentiable with respect to \(\mathbf {\beta }\), applying the mean value theorem to the function \(g_{\mathbf {\beta }}({\mathbf X}^{\tau }\mathbf {\beta })\), there exists \(\mathbf {\xi }=\lambda \mathbf {\beta }_{1}+(1-\lambda )\mathbf {\beta }_{2}\) for some \(\lambda \in (0,1)\) such that

$$\begin{aligned} g_{\mathbf {\beta }_{1}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{1}) -g_{\mathbf {\beta }_{2}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{2}) =\nabla _{\mathbf {\xi }}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })] (\mathbf {\beta }_{1}-\mathbf {\beta }_{2}). \end{aligned}$$

Then, by assumption \((I_2)\)–(iii) we have

$$\begin{aligned} |g_{\mathbf {\beta }_{1}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{1}) -g_{\mathbf {\beta }_{2}}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{2})|=|\nabla _{\mathbf {\xi }}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })](\mathbf {\beta }_{1}-\mathbf {\beta }_{2})|\le J({\mathbf X}_{i})\Vert \mathbf {\beta }_{1}-\mathbf {\beta }_{2}\Vert . \end{aligned}$$

Furthermore, set \(h_{i}(\mathbf {\beta })=\varphi \{F_{i}(z_{i}(\mathbf {\beta }))\}=\varphi \{F_{i}(Y_{i} -g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta }))\}\), where \(F_{i}\) is the cumulative distribution function of \(z_{i}(\mathbf {\beta })\) and is therefore almost surely differentiable. So, by the mean value theorem, there exists \(\eta =\lambda \mathbf {\beta }_{1}+(1-\lambda )\mathbf {\beta }_{2}\) for some \(\lambda \in (0,1)\) such that \(h_{i}(\mathbf {\beta }_{1})-h_{i}(\mathbf {\beta }_{2})=h'_{i}(\eta )(\mathbf {\beta }_{1} -\mathbf {\beta }_{2})\), with \(h'_{i}(\eta )=-\nabla _{\eta }[g_{\eta }({\mathbf X}_{i}^{\tau }\eta )]f_{i}(z_{i}(\eta )) \varphi ^{\prime }\{F_{i}(z_{i}(\eta ))\}\) and \(f_{i}(t)=dF_{i}(t)/dt\). It is worth pointing out that \(f_{i}\), being a density, is almost surely bounded. Thus, by assumption \((I_2)\)–(iii) again, together with the boundedness of \(\varphi ^{\prime }\), we have \(\Vert h'_{i}(\eta )\Vert \le MJ({\mathbf X}_{i})\; a.s.\), where M is such that \(|f_{i}(z_{i}(\eta ))\varphi ^{\prime }\{F_{i}(z_{i}(\eta ))\}|\le M\;a.s.\) On the other hand, since for \(i=1,\ldots ,n\) the \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent and uniformly distributed on the interval (0, 1) for all \(\mathbf {\beta }\in \mathscr {B}\), following Hájek et al. (1999) again, as in Theorem 1, it is obtained that \(a_{ni}(\mathbf {\beta })-F_{i}(z_{i}(\mathbf {\beta }))\rightarrow 0\;a.s.\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for each i. By continuity of \(\varphi \), we have \(\varphi \left( a_{ni}(\mathbf {\beta })\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }))\}\rightarrow 0\;a.s.\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for each i. Thus,

$$\begin{aligned} \max _{1\le i\le n}|\varphi \left( a_{ni}(\mathbf {\beta })\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }))\}|\rightarrow 0\;a.s., \end{aligned}$$

for all \(\mathbf {\beta }\in \mathscr {B}\). Now

$$\begin{aligned} \left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\varphi \left( a_{ni} (\mathbf {\beta }_1)\right) \left[ z_{i}(\mathbf {\beta }_{1})-z_{i}(\mathbf {\beta }_{2}) \right] \right| \le \Vert \mathbf {\beta }_{1}-\mathbf {\beta }_{2}\Vert \frac{L}{n}\sum _{i=1}^{n} J({\mathbf X}_{i}), \end{aligned}$$

where L is such that \(|\varphi (t)|\le L\), for all \(t\in (0,1)\). Also, with probability 1, we have

$$\begin{aligned}&\left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}) \left[ \varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}-\varphi \{F_{i}(z_{i} (\mathbf {\beta }_2))\}\right] z_{i}(\mathbf {\beta }_2)\right| \\ {}\le & {} \Vert \mathbf {\beta }_1 -\mathbf {\beta }_2\Vert \frac{M}{n}\sum _{i=1}^{n}J({\mathbf X}_{i})|z_{i}(\mathbf {\beta }_2)|\\\le & {} \Vert \mathbf {\beta }_1-\mathbf {\beta }_2\Vert M\left( \frac{1}{n}\sum _{i=1}^{n}J^{2} ({\mathbf X}_{i})\right) ^{1/2}\left( \frac{1}{n}\sum _{i=1}^{n}|z_{i} (\mathbf {\beta }_2)|^{2}\right) ^{1/2}\\\le & {} \Vert \mathbf {\beta }_1-\mathbf {\beta }_2\Vert M\left( \frac{1}{n}\sum _{i=1}^{n}J^{2} ({\mathbf X}_{i})\right) ^{1/2}\left( \frac{1}{n}\sum _{i=1}^{n}[|Y_{i}| +J({\mathbf X}_{i})]^{2}\right) ^{1/2}. \end{aligned}$$

Moreover,

$$\begin{aligned}&\left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}) \left[ \varphi \left( a_{ni}(\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}\right] z_{i}(\mathbf {\beta }_2)\right| \\\le & {} \frac{1}{n}\sum _{i=1}^{n}\left| \varphi \left( a_{ni} (\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}\right| |z_{i} (\mathbf {\beta }_2)|\\\le & {} \left( \max _{1\le i\le n}|\varphi \left( a_{ni}(\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}|^{2}\right) ^{1/2}\left( \frac{1}{n}\sum _{i=1}^{n}[|Y_{i}|+J({\mathbf X}_{i})]^{2}\right) ^{1/2}\rightarrow 0\;a.s. \end{aligned}$$

as \(\max _{1\le i\le n}|\varphi \left( a_{ni}(\mathbf {\beta }_1)\right) -\varphi \{F_{i}(z_{i}(\mathbf {\beta }_1))\}|^{2}\rightarrow 0\;\;a.s.\) and

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\left( |Y_{i}|+|J({\mathbf X}_{i})|\right) ^{2}\le & {} J_{4n}, \end{aligned}$$

where \(J_{4n}\), defined in Eq. (15), converges almost surely to a finite quantity by the strong law of large numbers under assumptions \((I_2)\)–(iii) and \((I_4)\). Similarly,

$$\begin{aligned} \left| \frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}) \left[ \varphi \{F_{i}(z_{i}(\mathbf {\beta }_2))\}-\varphi \left( a_{ni} (\mathbf {\beta }_2)\right) \right] z_{i}(\mathbf {\beta }_2)\right| \end{aligned}$$

converges almost surely to zero. Hence, with probability 1, we have

$$\begin{aligned} |D_{n}(\mathbf {\beta }_{1})-D_{n}(\mathbf {\beta }_{2})|\le B_n\Vert \mathbf {\beta }_{1} -\mathbf {\beta }_{2}\Vert , \end{aligned}$$

where

$$\begin{aligned} B_{n}:= \frac{L}{n}\sum _{i=1}^{n}J({\mathbf X}_{i})+ M \left( \frac{1}{n}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i})\right) ^{1/2}J_{4n}^{1/2}+o(1). \end{aligned}$$

Note that \(B_{n}\) does not depend on \(\mathbf {\beta }\), and since every term in its definition converges almost surely to a finite quantity, so does \(B_{n}\). Therefore, \(\{D_{n}(\mathbf {\beta })\}_{n\ge 1}\) is stochastically equicontinuous (Rao et al. 2014). \(\square \)
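
To make the objects handled in this proof concrete, a minimal numerical sketch of the rank dispersion (ours, for illustration only; it assumes the Wilcoxon score, takes the residual vector as given, and omits the trimming indicator \(I_{\varGamma }\)) is:

```python
import numpy as np
from scipy.stats import rankdata

def wilcoxon_score(u):
    # Wilcoxon score phi(u) = sqrt(12) (u - 1/2): integrates to 0 with unit
    # L2 norm on (0, 1), matching the standardization used for phi here
    return np.sqrt(12.0) * (u - 0.5)

def rank_dispersion(z, score=wilcoxon_score):
    """D_n = n^{-1} sum_i phi(R(z_i)/(n+1)) z_i for a residual vector z."""
    n = z.size
    u = rankdata(z) / (n + 1.0)  # R(z_i)/(n+1), lying in (0, 1)
    return float(np.mean(score(u) * z))
```

Evaluating this with \(z_{i}(\mathbf {\beta })\), or with the feasible residuals \(\nu _{ni}(\mathbf {\beta })\) for \(\widetilde{D}_{n}\), and minimizing over \(\mathbf {\beta }\) gives the estimator whose consistency these proofs establish.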

Proof of Theorem 4

Note that, by Jensen’s inequality,

$$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|E(\widetilde{D}_{n}(\mathbf {\beta })) -E(D_{n}(\mathbf {\beta }))|\le E\Big (\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\widetilde{D}_{n}(\mathbf {\beta }) -D_{n}(\mathbf {\beta })|\Big ). \end{aligned}$$
(16)

Thus, together with Theorem 1, applying the dominated convergence theorem to the right-hand side of this inequality, we obtain the result. On the other hand,

$$\begin{aligned} \widetilde{D}_{n}(\mathbf {\beta })-E(\widetilde{D}_{n}(\mathbf {\beta }))= & {} \widetilde{D}_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })+D_{n}(\mathbf {\beta }) -E(D_{n}(\mathbf {\beta }))+E(D_{n}(\mathbf {\beta }))\\&-E(\widetilde{D}_{n}(\mathbf {\beta })). \end{aligned}$$

Thus,

$$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\widetilde{D}_{n} (\mathbf {\beta })-E(\widetilde{D}_{n}(\mathbf {\beta }))|&\le \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|\widetilde{D}_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })|\nonumber \\&\quad +\sup _{\mathbf {\beta }\in \mathscr {B}}|D_{n}(\mathbf {\beta })-E(D_{n}(\mathbf {\beta }))|\nonumber \\&\quad + \sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|E(D_{n}(\mathbf {\beta })) -E(\widetilde{D}_{n}(\mathbf {\beta }))|.\nonumber \\ \end{aligned}$$
(17)

From Theorems 1 and 2 and Eq. (16), the terms on the right-hand side of Eq. (17) converge to zero with probability 1. \(\square \)

Proof of Theorem 5

By assumption \((I_6)\), \(\mathbf {\beta }_{0,n}=\mathop {{{\mathrm{Argmin}}}}\limits _{\mathbf {\beta }}E(D_{n}(\mathbf {\beta }))\), which implies that

$$\begin{aligned} E(D_{n}(\mathbf {\beta }_{0,n}))\le E(D_{n}(\mathbf {\beta })), \end{aligned}$$

for all \(\mathbf {\beta }\in \mathscr {B}\). On the other hand, by Theorem 4, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\mathbf {\beta }\in \mathscr {B},h\in \mathscr {H}_{n}}|E(\widetilde{D}_{n}(\mathbf {\beta })) -E(D_{n}(\mathbf {\beta }))|=0. \end{aligned}$$

Thus, \(\forall ~\varepsilon >0\), there exists \(N>0\) such that for all \(n\ge N\), \(|E(\widetilde{D}_{n}(\mathbf {\beta }))-E(D_{n}(\mathbf {\beta }))|<\varepsilon /2\) for all \(\mathbf {\beta }\in \mathscr {B}\). This implies that

$$\begin{aligned} -\varepsilon /2+E(D_{n}(\mathbf {\beta }_{0,n}))<E(\widetilde{D}_{n}(\mathbf {\beta })). \end{aligned}$$
(18)

Also, for all \(n\ge N\), \(|E(D_{n}(\mathbf {\beta }_{0,n}))-E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))|<\varepsilon /2\). Thus, we have

$$\begin{aligned} -\varepsilon /2+E\left( \widetilde{D}_{n}\left( \mathbf {\beta }_{0,n}\right) \right) <E\left( D_{n}\left( \mathbf {\beta }_{0,n}\right) \right) . \end{aligned}$$
(19)

Substituting (19) into (18) gives \(-\varepsilon +E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))<E(\widetilde{D}_{n}(\mathbf {\beta }))\), for all \(\mathbf {\beta }\in \mathscr {B}\) and for all \(n\ge N\). Since \(\varepsilon \) is arbitrary, letting \(\varepsilon \rightarrow 0\) yields \(E(\widetilde{D}_{n}(\mathbf {\beta }_{0,n}))\le E(\widetilde{D}_{n}(\mathbf {\beta }))\), for all \(\mathbf {\beta }\in \mathscr {B}\), which completes the proof. \(\square \)

Proof of Theorem 6

Note that

$$\begin{aligned} S_{n}(\mathbf {\beta })-T_{n}(\mathbf {\beta })=\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma } ({\mathbf X}_{i})\nabla _{\mathbf {\beta }}[g_{\mathbf {\beta }}({\mathbf X}_{i}^{\tau }\mathbf {\beta })] \left[ \varphi \left( \frac{R(z_{i}(\mathbf {\beta }))}{n+1}\right) -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right] . \end{aligned}$$

So,

$$\begin{aligned} |S_{n}(\mathbf {\beta })-T_{n}(\mathbf {\beta })|\le & {} \frac{1}{n}\sum _{i=1}^{n} J({\mathbf X}_{i})\left| \varphi \left( \frac{R(z_{i}(\mathbf {\beta }))}{n+1}\right) -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right| \quad \text{ by } (I_2)\text{--(iii)}\\\le & {} \left\{ \frac{1}{n}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i})\right\} ^{1/2}\left\{ \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B}}\left| \varphi \left( \frac{R(z_{i}(\mathbf {\beta }))}{n+1}\right) \right. \right. \\&\left. \left. -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right| ^{2}\right\} ^{1/2}. \end{aligned}$$

By continuity of \(\varphi \) and the fact that, for \(i=1,\ldots ,n\), the \(F_{i}(z_{i}(\mathbf {\beta }))\) are independent and uniformly distributed on (0, 1), once again following Hájek et al. (1999), we have \(\left| \varphi \left( \frac{R(z_{i}(\mathbf {\beta }))}{n+1}\right) -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right| \rightarrow 0\;a.s.\), for all i and \(\mathbf {\beta }\in \mathscr {B}\). Thus,

$$\begin{aligned} \max _{1\le i\le n}\sup _{\mathbf {\beta }\in \mathscr {B}}\left| \varphi \left( \frac{R(z_{i} (\mathbf {\beta }))}{n+1}\right) -\varphi \left( F_{i}(z_{i}(\mathbf {\beta }))\right) \right| ^{2}\rightarrow 0\;a.s. \end{aligned}$$

On the other hand, \(n^{-1}\sum _{i=1}^{n}J^{2}({\mathbf X}_{i})\rightarrow E[J^{2}({\mathbf X})]<\infty \;a.s.\) Hence, \(\displaystyle \lim _{n\rightarrow \infty }\sup _{\mathbf {\beta }\in \mathscr {B}}|S_{n} (\mathbf {\beta })-T_{n}(\mathbf {\beta })|=0\;a.s.\) \(\square \)

Proof of Theorem 7

Note that

$$\begin{aligned} \nabla _{\mathbf {\beta }_0}T_{n}(\mathbf {\beta }_0)= & {} -\frac{1}{n} \sum _{i=1}^n I_{\varGamma }({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0} [g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{0})]\{\nabla _{\mathbf {\beta }_0} [g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\}^{\tau }f(z_{i}(\mathbf {\beta }_{0})) \varphi '(F(z_{i}(\mathbf {\beta }_{0})))\\&+\,\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i})\nabla _{\mathbf {\beta }_0}^{2} [g_{\mathbf {\beta }_{0}}({\mathbf X}_i^{\tau }\mathbf {\beta }_{0})]\varphi (F(z_{i}(\mathbf {\beta }_{0}))). \end{aligned}$$

A direct application of the strong law of large numbers shows that \(\nabla _{\mathbf {\beta }_0}T_{n}(\mathbf {\beta }_0)\rightarrow {\mathbf W}\;a.s.\) If we assume that \({\mathbf X}\) is independent of \(\varepsilon \), we have

$$\begin{aligned} {\mathbf W}= & {} -E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}^{\tau }\mathbf {\beta }_{0})\big ) [\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}^{\tau }\mathbf {\beta }_0)\big )]^{\tau }\}E\{f(\varepsilon )\varphi '(F(\varepsilon ))\}\\&+\,E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0}^{2} [g_{\mathbf {\beta }_{0}}({\mathbf X}^{\tau }\mathbf {\beta }_{0})]\}E\{\varphi (F(\varepsilon ))\}. \end{aligned}$$

But

$$\begin{aligned} E\left[ f(\varepsilon )\varphi '(F(\varepsilon ))\right] =\int _{-\infty }^{\infty }f (\varepsilon )\varphi '(F(\varepsilon ))dF(\varepsilon ) =-\int _{-\infty }^{\infty }f'(\varepsilon )\varphi (F(\varepsilon ))d\varepsilon , \end{aligned}$$

from integration by parts, since \(f(\varepsilon )\varphi (F(\varepsilon ))\rightarrow 0\) as \(\varepsilon \rightarrow \pm \infty \). Now, putting \(u=F(\varepsilon )\), we have

$$\begin{aligned} \int _{-\infty }^{\infty }f'(\varepsilon )\varphi (F(\varepsilon ))d\varepsilon = -\int _{0}^{1}\varphi (u)\varphi _{f}(u)du=-\gamma _{\varphi }^{-1}. \end{aligned}$$

On the other hand, by assumption \((I_1)\), \(E\left[ \varphi \big (F(\varepsilon )\big )\right] =\int _{0}^{1}\varphi (t)dt=0\). Thus,

$$\begin{aligned} {\mathbf W}=\gamma _{\varphi }^{-1}E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0} \big (g_{\mathbf {\beta }_0}({\mathbf X}^{\tau }\mathbf {\beta }_{0})\big ) [\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}^{\tau } \mathbf {\beta }_0)\big )]^{\tau }\}. \end{aligned}$$
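
For a concrete instance (our illustration): with the Wilcoxon score \(\varphi (u)=\sqrt{12}\,(u-1/2)\), which satisfies the standardizations \(\int _{0}^{1}\varphi (t)dt=0\) and \(\int _{0}^{1}\varphi ^{2}(t)dt=1\), and standard normal errors, \(\varphi _{f}(u)=F^{-1}(u)\), so the substitution \(u=F(\varepsilon )\) gives

$$\begin{aligned} \gamma _{\varphi }^{-1}=\int _{0}^{1}\varphi (u)\varphi _{f}(u)du =\sqrt{12}\int _{-\infty }^{\infty }\Big (F(\varepsilon )-\frac{1}{2}\Big )\varepsilon f(\varepsilon )d\varepsilon =\sqrt{12}\int _{-\infty }^{\infty }f^{2}(\varepsilon )d\varepsilon =\sqrt{3/\pi }, \end{aligned}$$

whose square, \(3/\pi \approx 0.955\), is the classical asymptotic efficiency of Wilcoxon rank procedures relative to least squares under normal errors.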

On the other hand, to simplify notation, set \({\mathbf A}_{i}=\nabla _{\mathbf {\xi }}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\), \({\mathbf B}_{i}=\nabla _{\mathbf {\xi }}^{2}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\) and \({\mathbf C}_{i}=\nabla _{\mathbf {\xi }}^{3}[g_{\mathbf {\xi }}({\mathbf X}_{i}^{\tau }\mathbf {\xi })]\). Then

$$\begin{aligned} \nabla _{\mathbf {\xi }}^{2}T_{n}(\mathbf {\xi })= & {} -\frac{3}{n}\sum _{i=1}^n I_{\varGamma } ({\mathbf X}_{i}){\mathbf B}_i{\mathbf A}_i^{\tau }f_{i}(z_{i}(\mathbf {\xi }))\varphi '(F_{i}(z_{i}(\mathbf {\xi })))\\&+\,\frac{1}{n}\sum _{i=1}^{n} I_{\varGamma }({\mathbf X}_{i}){\mathbf C}_i\varphi (F_{i}(z_{i}(\mathbf {\xi })))\\&+\,\frac{1}{n}\sum _{i=1}^{n} I_{\varGamma }({\mathbf X}_{i}){\mathbf A}_{i}{\mathbf A}_{i}^{\tau }{\mathbf A}_{i} f'_{i}(z_{i}(\mathbf {\xi }))\varphi '(F_{i}(z_{i}(\mathbf {\xi })))\\&+\,\frac{1}{n}\sum _{i=1}^{n} I_{\varGamma }({\mathbf X}_{i}){\mathbf A}_{i}{\mathbf A}_{i}^{\tau }{\mathbf A}_{i} f^{2}_{i}(z_{i}(\mathbf {\xi }))\varphi ''(F_{i}(z_{i}(\mathbf {\xi }))). \end{aligned}$$

From this, it can easily be shown that each term on the right-hand side of this equation is bounded by

$$\begin{aligned} 3L n^{-1}\sum _{i=1}^{n}\exp \{\lambda \Vert {\mathbf X}_{i}\Vert \}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})+J^{3}({\mathbf X}_{i})], \end{aligned}$$

which converges almost surely to \(3L\times E[\exp \{\lambda \Vert {\mathbf X}\Vert \}\{J({\mathbf X})+J^{2}({\mathbf X})+J^{3}({\mathbf X})\}]<\infty \), by the strong law of large numbers under \((I_2)\)–(iii) and \((I_4)\). Thus, \(\nabla _{\mathbf {\xi }}^{2}T_{n}(\mathbf {\xi })\) is almost surely bounded and the result follows from Theorem 2. \(\square \)

Proof of Theorem 8

We mimic the proof given in Hettmansperger and McKean (1998) for the linear model. Set

$$\begin{aligned} T_{n}(\mathbf {\beta }_{0})=\frac{1}{n}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}) \nabla _{\mathbf {\beta }_0}[g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_{0})] \varphi [F(\varepsilon _{i}(\mathbf {\beta }_{0}))]. \end{aligned}$$

It follows by a routine argument that \(\sqrt{n}(S_{n}(\mathbf {\beta }_{0})-T_{n}(\mathbf {\beta }_{0}))\) converges to \(\mathbf {0}\) in probability. Hence, the proof will be completed by showing that \(\sqrt{n}T_{n}(\mathbf {\beta }_{0})\) converges to the intended distribution. Using the Cramér–Wold device (Serfling 1980), let

$$\begin{aligned} U=n^{-1/2}\sum _{i=1}^{n}I_{\varGamma }({\mathbf X}_{i}){\mathbf a}^{\tau }\nabla _{\mathbf {\beta }_0} [g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)]\varphi [F(\varepsilon _{i} (\mathbf {\beta }_{0}))], \end{aligned}$$

where \({\mathbf a}\in \mathbb {R}^p\). Since F is the distribution of \(\varepsilon (\mathbf {\beta }_{0})\) and \(\int _{0}^{1}\varphi (t)\mathrm{d}t=0\), we have \(E(U)=0\). Also, since \(\int _{0}^{1}\varphi ^{2}(t)\mathrm{d}t=1\),

$$\begin{aligned} \mathrm{Var}(U)=\frac{1}{n}\sum _{i=1}^{n}E\big (I_{\varGamma }({\mathbf X}_{i}){\mathbf a}^{\tau } \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big ) [\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0) \big )]^{\tau }{\mathbf a}\big )\ \rightarrow \ {\mathbf a}^{\tau }\varSigma {\mathbf a}\quad a.s. \end{aligned}$$

Note that U is a sum of independent, but not necessarily identically distributed, random variables; hence, we establish the limit distribution by the Lindeberg–Feller central limit theorem. To this end, set \(\sigma _{n}^{2}=\mathrm{Var}(U)\). Defining \(A_{n}\) by

$$\begin{aligned} A_{n}=\frac{1}{\sqrt{n}}I_{\varGamma }({\mathbf X}_{i}){\mathbf a}^{\tau }\nabla _{\mathbf {\beta }_0} \big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )\varphi [F(\varepsilon _{i}(\mathbf {\beta }_{0}))], \end{aligned}$$

we need to show that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{\sigma _{n}^{2}}\sum _{i=1}^{n}E[A_{n}^{2}I\left\{ |A_{n}|>\varepsilon \sigma _{n}\right\} ]=0. \end{aligned}$$
(20)

By assumption \((I_3)\)–(iii), \(\Vert \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )\Vert \le J({\mathbf X}_{i})\) and so,

$$\begin{aligned} \frac{1}{\sqrt{n}}\big | {\mathbf a}^{\tau } \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0} ({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )\big | \le \frac{1}{\sqrt{n}}J({\mathbf X}_{i})\Vert {\mathbf a}\Vert . \end{aligned}$$

\(J(\cdot )\), being integrable, is almost surely bounded. Thus, there exists a positive constant c such that \(J({\mathbf X}_{i})\le c\;a.s.\), and therefore \(n^{-1/2}|{\mathbf a}^{\tau } \nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )|\le n^{-1/2}c\Vert {\mathbf a}\Vert \;a.s.\) Hence,

$$\begin{aligned} \frac{1}{\sqrt{n}}\big |{\mathbf a}^{\tau } \nabla _{\mathbf {\beta }_0} \big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )\big |\rightarrow 0\;a.s.\quad \text{ as }\;\;n\rightarrow \infty . \end{aligned}$$

Set \(\lambda _{n}=n^{-1/2}c\Vert {\mathbf a}\Vert \). Then \(\lambda _{n}\rightarrow 0\) as \(n\rightarrow \infty \), and \(\lambda _{n}\) does not depend on i. Since \(\sigma ^{2}_{n}\) converges to a positive quantity, the ratio \(\sigma _{n}/\lambda _{n}\rightarrow \infty \) as \(n\rightarrow \infty \). Now, conditioning on \({\mathbf X}_{i}\), it is easy to see that

$$\begin{aligned} E[A_{n}^{2}I\{|A_{n}|>\varepsilon \sigma _{n}\}]\le & {} E\Big [\varphi ^{2} [F(\varepsilon (\mathbf {\beta }_{0}))]I\Big (\big |\varphi [F(\varepsilon (\mathbf {\beta }_{0}))]\big |>\varepsilon \sigma _{n}/\lambda _{n}\Big )\Big ]\\&\times \frac{1}{n}\sum _{i=1}^{n}E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0} \big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )[\nabla _{\mathbf {\beta }_0} \big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )]^{\tau }\}. \end{aligned}$$

In this expression, \(\lim _{n \rightarrow \infty } n^{-1}\sum _{i=1}^{n}E\{I_{\varGamma }({\mathbf X})\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )[\nabla _{\mathbf {\beta }_0}\big (g_{\mathbf {\beta }_0}({\mathbf X}_{i}^{\tau }\mathbf {\beta }_0)\big )]^{\tau }\}< \infty \) by \((I_2)\)–(iii), \((I_4)\) and \((I_6)\). From the boundedness of \(\varphi \) and applying the dominated convergence theorem, we have

$$\begin{aligned} E\Big [\varphi ^{2}[F(\varepsilon (\mathbf {\beta }_{0}))]I\Big (\big |\varphi [F(\varepsilon (\mathbf {\beta }_{0}))]\big |>\varepsilon \sigma _{n}/\lambda _{n}\Big )\Big ] \rightarrow 0\quad as\quad n\rightarrow \infty . \end{aligned}$$

This establishes the Lindeberg condition (20), completing the proof. \(\square \)

Proof of Theorem 10

Recall that from Eq. (2), for any \({\mathbf X}_{i}\in \varGamma \),

$$\begin{aligned} D_{n}(\mathbf {\beta })=\frac{1}{n}\sum _{i=1}^{n}\varphi \Big (\frac{R(z_{i}(\mathbf {\beta }))}{n+1}\Big )z_{i}(\mathbf {\beta })=\frac{1}{n}\sum _{i=1}^{n}\varphi \Big (\frac{i}{n+1}\Big ) z_{(i)}(\mathbf {\beta }), \end{aligned}$$

where \(z_{(1)}(\mathbf {\beta })\le z_{(2)}(\mathbf {\beta })\le \cdots \le z_{(n)}(\mathbf {\beta })\) are the ordered residuals. Since \(R(\cdot )\) is a step function with finitely many jumps, the set of jump points has probability zero. Since \(g_{\mathbf {\beta }}(\cdot )\) is assumed to be three times continuously differentiable by \((I_2)\)–(iii), \(D_{n}(\mathbf {\beta })\) is almost surely differentiable. From this, taking into account Theorem 6 and expanding \(D_{n}(\mathbf {\beta })\) around \(\mathbf {\beta }_{0}\) to second order, we have, with probability 1,

$$\begin{aligned} D_{n}(\mathbf {\beta })=D_{n}(\mathbf {\beta }_0)+(\mathbf {\beta }-\mathbf {\beta }_0)^{\tau }S_{n} (\mathbf {\beta }_0)+\frac{1}{2}(\mathbf {\beta }-\mathbf {\beta }_{0})^{\tau }\nabla _{\mathbf {\beta }} T_{n}(\mathbf {\xi })(\mathbf {\beta }-\mathbf {\beta }_0)+o(1), \end{aligned}$$

where \(\mathbf {\xi }=\lambda \mathbf {\beta }_{0}+(1-\lambda )\mathbf {\beta }\), for \(\lambda \in (0,1)\). Thus,

$$\begin{aligned} M_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })= & {} (\mathbf {\beta }-\mathbf {\beta }_{0})^\tau \nabla _{\mathbf {\beta }_0}T_n(\mathbf {\beta }_0) (\mathbf {\beta }-\mathbf {\beta }_{0}) \\&-\frac{1}{2}(\mathbf {\beta }-\mathbf {\beta }_{0})^{\tau }\nabla _{\mathbf {\beta }} T_{n}(\mathbf {\xi })(\mathbf {\beta }-\mathbf {\beta }_0)+o(1). \end{aligned}$$

From this, we have

$$\begin{aligned} |M_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })|\le & {} \Vert \nabla _{\mathbf {\beta }_0} T_n(\mathbf {\beta }_0)\Vert \Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert ^{2} +\frac{1}{2}\Vert \nabla _{\mathbf {\beta }}T_{n}(\mathbf {\xi })\Vert \Vert \mathbf {\beta } -\mathbf {\beta }_{0}\Vert ^{2}+o(1)\\= & {} \left\{ \Vert \nabla _{\mathbf {\beta }_0}T_n(\mathbf {\beta }_0)\Vert +\frac{1}{2}\Vert \nabla _{\mathbf {\beta }}T_{n}(\mathbf {\xi })\Vert \right\} \Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert ^{2}+o(1)\\\le & {} \frac{3L}{2}\Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert ^{2}\frac{1}{n} \sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]+o(1), \end{aligned}$$

since \(\Vert \nabla _{\mathbf {\beta }_0}T_n(\mathbf {\beta }_0)\Vert \) and \(\Vert \nabla _{\mathbf {\beta }}T_{n}(\mathbf {\xi })\Vert \) are bounded by \(L n^{-1}\sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]\). On the other hand, \(n^{-1}\sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]\rightarrow E\big [J({\mathbf X})+J^{2}({\mathbf X})\big ]<\infty \;a.s.\), by assumptions \((I_2)\)–(iii) and \((I_4)\). Now, for any \(\mathbf {\beta }\in \mathscr {B}_{n}\), \(\Vert \mathbf {\beta }-\mathbf {\beta }_{0}\Vert \le c/\sqrt{n}\). This implies that

$$\begin{aligned} \sup _{\mathbf {\beta }\in \mathscr {B}_{n}}|M_{n}(\mathbf {\beta })-D_{n}(\mathbf {\beta })|\le \frac{3c^{2}L}{2n}\frac{1}{n}\sum _{i=1}^{n}[J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]+o(1). \end{aligned}$$

By Markov’s inequality, we have for any \(\varepsilon >0\) and for n large enough,

$$\begin{aligned} P_{\mathbf {\beta }_{0}}\Big [\sup _{\mathbf {\beta }\in \mathscr {B}_{n}}|D_{n} (\mathbf {\beta })-M_{n}(\mathbf {\beta })|>\varepsilon \Big ]\le & {} \frac{1}{\varepsilon } E\left[ \sup _{\mathbf {\beta }\in \mathscr {B}_{n}}|M_{n}(\mathbf {\beta })-D_{n} (\mathbf {\beta })|\right] \\\le & {} \frac{3c^{2}L}{2n\varepsilon }E\left\{ \frac{1}{n}\sum _{i=1}^{n} [J({\mathbf X}_{i})+J^{2}({\mathbf X}_{i})]\right\} . \end{aligned}$$

A direct application of the dominated convergence theorem gives

$$\begin{aligned} \lim _{n\rightarrow \infty }E\left\{ \frac{1}{n}\sum _{i=1}^{n}[J({\mathbf X}_{i}) +J^{2}({\mathbf X}_{i})]\right\} = E\left\{ [J({\mathbf X})+J^{2}({\mathbf X})]\right\} <\infty . \end{aligned}$$

Thus, \(\displaystyle \lim _{n\rightarrow \infty } P_{\mathbf {\beta }_{0}}\Big [\sup _{\mathbf {\beta }\in \mathscr {B}_{n}}|D_{n}(\mathbf {\beta })-M_{n}(\mathbf {\beta })|>\varepsilon \Big ]=0\). The proof of Eq. (8) is obtained similarly, while that of Eq. (7) is obtained by combining Eq. (6) and Theorem 1. \(\square \)

Proof of Theorem 11

Equation (12) gives \(\sqrt{n}\big (\tilde{\mathbf {\beta }}_{n}-\mathbf {\beta }_{0}\big ) = -\widetilde{{\mathbf W}}_{n}^{-1}\sqrt{n}\widetilde{S}_{n}(\mathbf {\beta }_{0}) + o_p(1)\), and by (9) we have \(\sqrt{n}\widetilde{S}_{n}(\mathbf {\beta }_{0}) = \sqrt{n}S_{n}(\mathbf {\beta }_{0}) + o_p(1)\). Moreover, \(\widetilde{{\mathbf W}}_{n} = {\mathbf W}+ o_p(1)\) by (13). Since \({\mathbf W}\) is positive definite, we have \(\sqrt{n}\big (\tilde{\mathbf {\beta }}_{n}-\mathbf {\beta }_{0}\big ) = -{\mathbf W}^{-1}\sqrt{n} S_{n}(\mathbf {\beta }_{0}) + o_p(1)\). The result follows by Theorem 8. \(\square \)
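
Spelling the conclusion out (our restatement, combining the last display with the proof of Theorem 8): since \(\sqrt{n}S_{n}(\mathbf {\beta }_{0})\) is asymptotically \(N(\mathbf {0},\varSigma )\) and \({\mathbf W}\) is symmetric and positive definite, it follows that

$$\begin{aligned} \sqrt{n}\big (\tilde{\mathbf {\beta }}_{n}-\mathbf {\beta }_{0}\big )\rightarrow N\big (\mathbf {0},\,{\mathbf W}^{-1}\varSigma {\mathbf W}^{-1}\big )\quad \text{ in } \text{ distribution, } \end{aligned}$$

the sandwich-form limit typical of rank-based estimators.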


Cite this article

Bindele, H.F., Abebe, A. & Meyer, K.N. General rank-based estimation for regression single index models. Ann Inst Stat Math 70, 1115–1146 (2018). https://doi.org/10.1007/s10463-017-0618-9
