
Optimal subsampling for composite quantile regression in big data


Abstract

The composite quantile regression (CQR) is an efficient and robust alternative to least squares for estimating regression coefficients in a linear model. We investigate optimal subsampling for CQR with massive datasets. By establishing the consistency and asymptotic normality of the CQR estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities under the L- and A-optimality criteria. The L-optimality criterion minimizes the trace of the asymptotic variance–covariance matrix of the estimator for linearly transformed regression parameters, and the A-optimality criterion minimizes that of the estimator for the regression parameters themselves. The L-optimal subsampling probabilities are easy to implement because they do not depend on the densities of the responses given the covariates. Based on the L-optimal subsampling probabilities, we propose algorithms for computing the resulting estimators, and we establish their asymptotic distributions and asymptotic optimality. To obtain standard errors for the CQR estimators without estimating these conditional densities, we propose an iterative subsampling procedure based on the L-optimal subsampling probabilities. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
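To fix ideas, the following is a minimal, self-contained sketch (our illustration; the function names, the synthetic data, and the optimizer are our own choices, not the authors' code) of the two-step procedure the abstract describes: fit a pilot CQR on a small uniform subsample, form the L-optimal subsampling probabilities, draw the main subsample, and solve the weighted CQR problem. The non-smooth CQR loss is minimized here with a derivative-free method for simplicity; a serious implementation would use linear programming.

    import numpy as np
    from scipy.optimize import minimize

    def check_loss(u, tau):
        # Quantile check loss: rho_tau(u) = u * (tau - I(u < 0)).
        return u * (tau - (u < 0))

    def cqr_loss(theta, X, y, taus, w):
        # Weighted CQR objective: sum_k sum_i w_i rho_{tau_k}(y_i - b_k - x_i' beta),
        # with theta = (beta_1, ..., beta_p, b_1, ..., b_K).
        p = X.shape[1]
        beta, b = theta[:p], theta[p:]
        r = y - X @ beta
        return sum(np.sum(w * check_loss(r - bk, tk)) for bk, tk in zip(b, taus))

    def l_opt_probs(X, y, beta, b, taus):
        # pi_i proportional to || sum_k {tau_k - I(eps_i < b_k)} xtilde_ik || with
        # xtilde_ik = (x_i, e_k); the e_k block contributes sum_k psi_k^2 to the norm.
        eps = y - X @ beta
        psi = np.stack([tk - (eps < bk) for bk, tk in zip(b, taus)])  # K x N
        s = psi.sum(axis=0)
        norms = np.sqrt(s**2 * np.sum(X**2, axis=1) + np.sum(psi**2, axis=0))
        return norms / norms.sum()

    rng = np.random.default_rng(0)
    N, p, K, n0, n = 20000, 3, 5, 500, 1000            # illustrative sizes, ours
    taus = np.arange(1, K + 1) / (K + 1)
    X = rng.normal(size=(N, p))
    y = X @ np.array([1.0, -0.5, 2.0]) + rng.standard_t(df=3, size=N)

    def fit(idx, w):
        # Naive derivative-free minimization; adequate only for this small sketch.
        res = minimize(cqr_loss, np.zeros(p + K), args=(X[idx], y[idx], taus, w),
                       method="Nelder-Mead",
                       options={"maxiter": 50000, "maxfev": 50000})
        return res.x

    pilot_idx = rng.choice(N, size=n0, replace=True)   # step 1: uniform pilot
    theta_pilot = fit(pilot_idx, np.ones(n0))
    pi = l_opt_probs(X, y, theta_pilot[:p], theta_pilot[p:], taus)
    idx = rng.choice(N, size=n, replace=True, p=pi)    # step 2: L-optimal subsample
    theta_lopt = fit(idx, 1.0 / (N * pi[idx]))         # weights 1/(N pi_i)
    print("beta estimate:", theta_lopt[:p])

Note that l_opt_probs never forms the \((p+K)\)-vectors \(\tilde{\mathbf{x }}_{ik}\): since \(\tilde{\mathbf{x }}_{ik}=(\mathbf{x }_i^\textsf {T}, \mathbf{e }_k^\textsf {T})^\textsf {T}\), the squared norm splits into \(|\sum _k \psi _k|^2\Vert \mathbf{x }_i\Vert ^2+\sum _k\psi _k^2\), the same decomposition used in the proof of Theorem 4 below.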


References

  • Ai M, Yu J, Zhang H, Wang H (2021) Optimal subsampling algorithms for big data regressions. Stat Sinica 31:749–772

  • Atkinson A, Donev A, Tobias R (2007) Optimum experimental designs, with SAS, vol 34. Oxford University Press, Oxford

  • Drineas P, Mahoney MW, Muthukrishnan S (2006) Sampling algorithms for \(l_2\) regression and applications. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 1127–1136

  • Drineas P, Mahoney MW, Muthukrishnan S, Sarlós T (2011) Faster least squares approximation. Numer Math 117:219–249

  • Fonollosa J, Sheik S, Huerta R, Marco S (2015) Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sens Actuators B-Chem 215:618–629

  • Goodson DZ (2011) Mathematical methods for physical and analytical chemistry. Wiley, New York

  • Gu Y, Zou H (2020) Sparse composite quantile regression in ultrahigh dimensions with tuning parameter calibration. IEEE Trans Inf Theory. https://doi.org/10.1109/TIT.2020.3001090

  • Hjort NL, Pollard D (2011) Asymptotics for minimisers of convex processes. arXiv preprint arXiv:1107.3806

  • Jiang XJ, Jiang JC, Song XY (2012) Oracle model selection for nonlinear models based on weighted composite quantile regression. Stat Sinica 22:1479–1506

  • Jiang R, Zhou ZG, Qian WM, Chen Y (2013) Two-step composite quantile regression for single-index models. Comput Stat Data Anal 64:180–191

  • Jiang R, Qian WM, Zhou ZG (2016) Single-index composite quantile regression with heteroscedasticity and general error distributions. Stat Pap 57:185–203

  • Jiang R, Hu XP, Yu KM, Qian WM (2018) Composite quantile regression for massive datasets. Statistics 52:980–1004

  • Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B 72:49–69

  • Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332

  • Knight K (1998) Limiting distributions for \(L_1\) regression estimators under general conditions. Ann Stat 26:755–770

  • Koenker R (2005) Quantile regression, vol 38. Cambridge University Press, Cambridge

  • Lin N, Xie R (2011) Aggregated estimating equation estimation. Stat Interface 4:73–83

  • Ma P, Mahoney M, Yu B (2015) A statistical perspective on algorithmic leveraging. J Mach Learn Res 16:861–911

  • Ning Z, Tang L (2014) Estimation and test procedures for composite quantile regression with covariates missing at random. Stat Prob Lett 95:15–25

  • Raskutti G, Mahoney M (2016) A statistical perspective on randomized sketching for ordinary least-squares. J Mach Learn Res 17:1–31

  • Sun J (2020) An improvement on the efficiency of complete-case-analysis with nonignorable missing covariate data. Comput Stat. https://doi.org/10.1007/s00180-020-00964-6

  • Tang L, Zhou Z (2015) Weighted local linear CQR for varying-coefficient models with missing covariates. TEST 24(3):583–604

  • Tang L, Zhou Z, Wu C (2012) Weighted composite quantile estimation and variable selection method for censored regression model. Stat Prob Lett 3:653–663

  • van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, London

  • Wang H (2019) More efficient estimation for logistic regression with optimal subsamples. J Mach Learn Res 20:1–59

  • Wang S, Xiang L (2017) Two-layer EM algorithm for ALD mixture regression models: a new solution to composite quantile regression. Comput Stat Data Anal 115:136–154

  • Wang H, Ma Y (2021) Optimal subsampling for quantile regression in big data. Biometrika 108:99–112

  • Wang H, Zhu R, Ma P (2018) Optimal subsampling for large sample logistic regression. J Am Stat Assoc 113:829–844

  • Wang H, Yang M, Stufken J (2019) Information-based optimal subdata selection for big data linear regression. J Am Stat Assoc 114:393–405

  • Yao Y, Wang H (2019) Optimal subsampling for softmax regression. Stat Pap 60:585–599

  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126


Acknowledgements

We are grateful to the two reviewers and the associate editor for a number of constructive and helpful comments and suggestions that have clearly improved our manuscript. Xiaohui Yuan was partly supported by the NSFC (Nos. 11571051, 11671054, 11701043).


Appendix


Proof of Theorem 1

Define

$$\begin{aligned} A_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} A^*_{ik}(\mathbf{u }), \end{aligned}$$

where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k}( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} ( \varepsilon _i^*-b_{0k})\), \(\tilde{\mathbf{x }}_{ik}^*=(\mathbf{x }_i^{*\textsf {T}}, \mathbf{e }_k^\textsf {T})^\textsf {T}\), and \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i=1,\ldots ,n\). As a function of \(\mathbf{u }\), \(A_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\). Thus, we can focus on \(A_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\).

Let \(\psi _\tau (u)=\tau -I(u<0)\). By the identity (Knight 1998),

$$\begin{aligned} \rho _{\tau }(u-v)-\rho _{\tau }(u)= & {} -v\psi _\tau (u)+\int _0^v \{I(u\le s)-I(u\le 0)\}ds, \end{aligned}$$

we obtain

$$\begin{aligned} A^*_{ik}(\mathbf{u })= & {} \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k}( \varepsilon _i^*-b_{0k})\\= & {} - \frac{1}{\sqrt{n}}\mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\\&+ \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*- b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$
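As a quick sanity check of the identity (our addition, not part of the original argument), take \(0<u<v\), so that \(u-v<0\). Then

$$\begin{aligned} \rho _{\tau }(u-v)-\rho _{\tau }(u)&=(u-v)(\tau -1)-u\tau =v-u-v\tau ,\\ -v\psi _\tau (u)+\int _0^v \{I(u\le s)-I(u\le 0)\}ds&=-v\tau +(v-u), \end{aligned}$$

and the two sides agree; the remaining sign configurations can be verified in the same way.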

Substituting this expansion into \(A_n^*(\mathbf{u })\), we write

$$\begin{aligned} A_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i}\int _0^{ {\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik} /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^* +A^*_{2n}(\mathbf{u }), \end{aligned}$$
(15)

where

$$\begin{aligned}&\mathbf{Z }_{n}^*= - \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^*,\\&A^*_{2n}(\mathbf{u })= \sum _{i=1}^n\frac{1}{N\pi ^*_i} A^*_{2n,i}(\mathbf{u }), \\&A^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^* /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$

We first prove the asymptotic normality of \(\mathbf{Z }_{n}^* \). Denote

$$\begin{aligned} {\varvec{\eta }}_i^*= & {} -\frac{1}{N\pi ^*_i} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$

then we can write \(\mathbf{Z }_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n {\varvec{\eta }}_i^*\). Direct calculation yields

$$\begin{aligned} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}=O_p(N^{-1/2}), \\ \text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} E\{({\varvec{\eta }}_i^*)^{\otimes 2} |{\mathbb {D}}_N\}-\{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2}- \{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2} - o_p(1). \end{aligned}$$

It follows that

$$\begin{aligned}&E\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= 0, \\&\text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= \frac{1}{N^2} \sum _{i=1}^N \text{ cov }\left\{ \sum _{k=1}^K\left[ \tau _k-I(\varepsilon _i<b_{0k})\right] \tilde{\mathbf{x }}_{ik}\right\} . \end{aligned}$$

Consider the \((s,t)\)th element of \( \text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}\), denoted by \(\sigma _{st}\). By the Cauchy–Schwarz inequality, \(|\sigma _{st}| \le \sqrt{\sigma _{ss}}\sqrt{\sigma _{tt}}\), and the \(c_r\) inequality gives \(\sqrt{\sigma _{ss}}\sqrt{\sigma _{tt}}\le \frac{1}{N^2}\sum _{i=1}^NK(\Vert \mathbf{x }_i\Vert ^2+1)=O(N^{-1})\) under Assumption 1(b). By Chebyshev’s inequality, \( E({\varvec{\eta }}_i^*|{\mathbb {D}}_N) =O_p(N^{-1/2})\).

We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) under the conditional distribution given \({\mathbb {D}}_N\). Specifically, we want to show that for \(\epsilon >0\),

$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^*} \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^*\{\tau _k-I(\varepsilon _i-b_{0k}<0 )\} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^* \epsilon } \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^* \{\tau _k-I(\varepsilon _i-b_{0k}<0)\} \biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N \biggr \} \nonumber \\&\quad = \sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\} \tilde{\mathbf{x }}_{ik} \biggr \Vert >1 \biggr ) \end{aligned}$$
(16)

goes to zero in probability. If condition (7) holds, then the right-hand side of (16) satisfies

$$\begin{aligned}&\sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\quad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr \Vert>1 \biggr ) \nonumber \\&\qquad \le K^2 \sum _{i=1}^N\frac{1}{ N^2 \pi _i} (1+ \Vert \mathbf{x }_{i}\Vert )^2 I\biggr (\frac{K(1+ \Vert \mathbf{x }_{i}\Vert )}{\sqrt{n} N \pi _i \epsilon }>1 \biggr ) \nonumber \\&\qquad \le I\biggr (\max _{1\le i\le N} \frac{\Vert \mathbf{x }_{i}\Vert +1 }{\pi _i}>\frac{\sqrt{n} N\epsilon }{ K} \biggr )\biggr ( K^2\sum _{i=1}^N\frac{(1+ \Vert \mathbf{x }_{i}\Vert )^2}{N^2\pi _i}\biggr ). \end{aligned}$$

By Assumption 2(a), \(\max _{1\le i\le N}\frac{\Vert {\mathbf{x }}_i\Vert +1}{\pi _i}=o_p(\sqrt{n}N)\). By Assumption 2(b), \(K^2\sum _{i=1}^N\frac{(1+ \Vert {\mathbf{x }}_{i}\Vert )^2}{N^2\pi _i}=O_p(1)\). It follows that

$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N\}=o_p(1), \end{aligned}$$

which shows that Lindeberg’s conditions hold with probability approaching one.

Given \({\mathbb {D}}_N\), \({\varvec{\eta }}^*_i\), \(i=1,\ldots ,n\), are independent and identically distributed with mean \(E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(N^{-1/2})\) and covariance \(\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\). Thus, conditional on \({\mathbb {D}}_N\), as \(n,N \rightarrow +\infty \), with probability approaching one,

$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2}\{ \mathbf{Z }_{n}^*- \sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$

Since \(\sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(\sqrt{n}/\sqrt{N})=o_p(1)\), it follows that

$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2} \mathbf{Z }_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(17)

Next, we prove that

$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$

Write the conditional expectation of \(A^*_{2n}(\mathbf{u })\) as

$$\begin{aligned}&E\{A^*_{2n}(\mathbf{u })|{\mathbb {D}}_N\} \nonumber \\&\qquad = \frac{n}{N}\sum _{i=1}^N E\{A_{2n,i}(\mathbf{u })\}+\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]. \end{aligned}$$
(18)

By Assumption 1,

$$\begin{aligned}&\frac{n}{N}\sum _{i=1}^N E(A_{2n,i}(\mathbf{u })) \nonumber \\&\quad = \frac{n}{N}\sum _{i=1}^N \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{F(b_{0k}+s)-F(b_{0k})\}ds \nonumber \\&\quad =\frac{\sqrt{n}}{N}\sum _{i=1}^N \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}dt \nonumber \\&\quad =\frac{1}{2}\mathbf{u }^\textsf {T}\left( \frac{1}{N}\sum _{i=1}^N \sum _{k=1}^K f(b_{0k})\tilde{\mathbf{x }}_{ik}\tilde{\mathbf{x }}_{ik}^\textsf {T}\right) \mathbf{u }+o(1) \nonumber \\&\quad = \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o(1). \end{aligned}$$
(19)

The second term of (18) has mean 0 and its variance satisfies

$$\begin{aligned} \text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \le \frac{n^2}{N^2} \sum _{i=1}^N E\{A_{2n,i}^2(\mathbf{u })\}. \end{aligned}$$
(20)

From the fact that \(A_{2n,i}(\mathbf{u })\) is nonnegative, we obtain

$$\begin{aligned} A_{2n,i}(\mathbf{u })\le & {} \left| \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\}ds \right| \nonumber \\\le & {} \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \left| \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\} \right| ds \nonumber \\\le & {} \frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|. \end{aligned}$$
(21)

By Assumption 1(b), \(\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert =o(\sqrt{N})\). Combining this fact, (20), and (21), we have

$$\begin{aligned}&\text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \nonumber \\&\quad \le \left\{ K\frac{\Vert \mathbf{u }\Vert }{\sqrt{N}}(1+\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert ) \right\} \frac{\sqrt{n}}{\sqrt{N}} \frac{n}{N} \sum _{i=1}^N E\{ A_{2n,i}(\mathbf{u })\}= o(1). \end{aligned}$$
(22)

From (18), (19), (22), and Chebyshev’s inequality, it follows that

$$\begin{aligned} E\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} =\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(23)

Next, we examine the conditional variance \(\text{ var }\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} \). Since, conditional on \({\mathbb {D}}_N\), the \(A^*_{2n,i}(\mathbf{u })\), \(i=1,\ldots , n\), are independent and identically distributed,

$$\begin{aligned} \text{ var }\left\{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\right\}= & {} \frac{1}{N^2} \sum _{i=1}^n \text{ var }\biggr \{ \frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i} \biggr |{\mathbb {D}}_N\biggr \} \nonumber \\\le & {} \frac{n}{N^2}E\biggr [\biggr \{\frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i}\biggr \}^2 \biggr |{\mathbb {D}}_N\biggr ]. \end{aligned}$$
(24)

By (21), the right-hand side of (24) satisfies

$$\begin{aligned} \frac{n}{N^2}\sum _{i=1}^N \frac{A^2_{2n,i}(\mathbf{u })}{\pi _i}\le & {} \frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}(\mathbf{u })}{\pi _i}\biggr (\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}| \biggr ) \nonumber \\\le & {} \frac{1}{\sqrt{n}N} \biggr (K\Vert \mathbf{u }\Vert \max _{1\le i \le N}\frac{\Vert \mathbf{x }_i\Vert +1}{\pi _i}\biggr )\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u }). \end{aligned}$$
(25)

From (19), (25) and Assumption 2(a), we have

$$\begin{aligned} \text{ var }\biggr \{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\biggr \}=o_p(1). \end{aligned}$$
(26)

From (23), (26), and Chebyshev’s inequality,

$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p|{\mathbb {D}}_N}(1). \end{aligned}$$
(27)

Here \(a=o_{p|{\mathbb {D}}_N}(1)\) means that \(a\) converges to 0 in conditional probability given \({\mathbb {D}}_N\), in probability; that is, for any \(\delta >0\), \(P(|a|>\delta |{\mathbb {D}}_N) {\mathop {\longrightarrow }\limits ^{p}}0\) as \(N\rightarrow +\infty \). Since \(0\le P(|a|>\delta |{\mathbb {D}}_N) \le 1\), this conditional probability converges to 0 in probability if and only if \(P(|a|>\delta )= E\{P(|a|>\delta |{\mathbb {D}}_N)\} \rightarrow 0\). Thus, \(a=o_{p|{\mathbb {D}}_N}(1)\) is equivalent to \(a=o_{p}(1)\), and we use the notation \(o_p\) only from here on.

From (15) and (27), we have

$$\begin{aligned} A_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^*+ \frac{1}{2} \mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$

Since \(A_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0)\), satisfies

$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0) = - \mathbf{E }_N^{-1} \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$

Thus, we have

$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$

Combining (17) and Slutsky’s Theorem, we have that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),

$$\begin{aligned} P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }), \end{aligned}$$
(28)

where \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Note that the conditional probability in (28) is a bounded random variable, so convergence in probability to a constant implies convergence in the mean. Therefore, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),

$$\begin{aligned}&P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }]\\&\quad =E(P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N])\\&\quad \rightarrow \varPhi _{p+K}(\mathbf{a }). \end{aligned}$$

This finishes the proof of Theorem 1.

Proof of Theorem 2

Note that

$$\begin{aligned} \text{ tr }(\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1})= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\mathbf{E }_N^{-1} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}\mathbf{E }_N^{-1}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [\frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr (\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2 \biggr ) \nonumber \\\ge & {} \frac{1}{N^2} \biggr [ \sum _{i=1}^N \biggr \Vert \sum _{k=1}^K\{I(\varepsilon _i<b_{0k})-\tau _k\}\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert \biggr ]^2 \end{aligned}$$

where the third equality uses \(\sum _{i=1}^N \pi _i=1\), and the last step follows from the Cauchy–Schwarz inequality, with equality if and only if \(\pi _i \propto \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\}\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\Vert \).
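Spelled out (a step we add for the reader), write \(a_i=\Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\}\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\Vert \) and use \(\sum _{i=1}^N\pi _i=1\):

$$\begin{aligned} \biggl (\sum _{i=1}^N a_i\biggr )^2=\biggl (\sum _{i=1}^N \sqrt{\pi _i}\cdot \frac{a_i}{\sqrt{\pi _i}}\biggr )^2 \le \biggl (\sum _{i=1}^N \pi _i\biggr )\biggl (\sum _{i=1}^N \frac{a_i^2}{\pi _i}\biggr ), \end{aligned}$$

with equality if and only if \(a_i/\sqrt{\pi _i}\) is proportional to \(\sqrt{\pi _i}\), that is, \(\pi _i\propto a_i\).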

Proof of Theorem 3

Note that

$$\begin{aligned} \text{ tr }(\mathbf{V }_\pi )= & {} \text{ tr }\biggr (\sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [ \frac{1}{\pi _i} \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr [\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \biggr ] \nonumber \\\ge & {} \frac{1}{N^2} \biggr [\sum _{i=1}^N \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert \biggr ]^2 \end{aligned}$$

where the third equality uses \(\sum _{i=1}^N \pi _i=1\), and the last step follows from the Cauchy–Schwarz inequality, with equality if and only if \(\pi _i \propto \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \Vert \).
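The optimality can also be checked numerically. The short script below (ours, for illustration only; the array a stands in for the norms \(\Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \Vert \)) confirms that \(\pi _i\propto a_i\) attains the smallest value of \(\sum _{i=1}^N a_i^2/(N^2\pi _i)\) among the probability vectors it tries:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 1000
    a = np.abs(rng.normal(size=N)) + 1e-3    # stand-ins for the norms above

    def trace_v(pi):
        # tr(V_pi) as in the display above, as a function of the probabilities.
        return np.sum(a**2 / pi) / N**2

    pi_opt = a / a.sum()                     # pi_i proportional to a_i
    pi_unif = np.full(N, 1.0 / N)
    assert trace_v(pi_opt) <= trace_v(pi_unif)
    for _ in range(1000):                    # random alternatives never do better
        q = rng.dirichlet(np.ones(N))
        assert trace_v(pi_opt) <= trace_v(q) + 1e-12
    print(trace_v(pi_opt), trace_v(pi_unif))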

Proof of Theorem 4

Recall that \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i= 1,\ldots ,n\). Denote

$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} A^*_{ik}(\mathbf{u }), \end{aligned}$$

where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} (\varepsilon _i^*-b_{0k})\).

As a function of \(\mathbf{u }\), \({\tilde{A}}_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). Thus we can focus on \({\tilde{A}}_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). We can write

$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*-b_{0k} <0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*- b_{0k}\le 0 )\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ {\tilde{A}}^*_{2n}(\mathbf{u }), \end{aligned}$$
(29)

where

$$\begin{aligned}&\tilde{\mathbf{Z }}_{n}^*=- \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*,\\&{\tilde{A}}^*_{2n}(\mathbf{u })=\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}{\tilde{A}}^*_{2n,i}(\mathbf{u }),\\&{\tilde{A}}^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*\le b_{0k}+s)-I(\varepsilon _i^*\le b_{0k})\}ds. \end{aligned}$$

Conditioning on \(({\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)\), we first prove the asymptotic normality of \(\tilde{\mathbf{Z }}_{n}^* \). Denote

$$\begin{aligned} \tilde{{\varvec{\eta }}}_i^*= & {} -\frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$

then we can write \(\tilde{\mathbf{Z }}_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n \tilde{{\varvec{\eta }}}_i^*\). Direct calculation yields

$$\begin{aligned} E(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} =O_p(N^{-1/2}),\nonumber \\ \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} \sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+o_p(1).\nonumber \\ \end{aligned}$$
(30)

Let \(\tilde{\varepsilon }_i= y_i-\tilde{{\varvec{\beta }}}^\textsf {T}_U\mathbf{x }_i\), \(i=1,\ldots ,N\), then we can write

$$\begin{aligned} \tilde{\pi }_{i}^{Lopt}= & {} \frac{\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert }{\sum _{j=1}^N\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{jk}\Vert },\ \ i=1,\ldots ,N. \end{aligned}$$

Thus we have

$$\begin{aligned}&\sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2} \nonumber \\&\quad =\frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert } \times \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert \nonumber \\&\quad = \tilde{\varDelta }_1 \times \tilde{\varDelta }_2. \end{aligned}$$
(31)

Let

$$\begin{aligned} \varDelta _1= & {} \frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\Vert }, \\ \varDelta _2= & {} \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert . \end{aligned}$$

Next, we show that \(\tilde{\varDelta }_1=\varDelta _1+o_p(1)\) and \(\tilde{\varDelta }_2=\varDelta _2+o_p(1)\). Note that the \((j_1,j_2)\)th element of \(\tilde{\varDelta }_1-\varDelta _1\) \((j_1, j_2=1,\ldots ,p+K)\) is bounded by

$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\le \frac{1}{N} \sum _{i=1}^N \biggr [\sum _{k=1}^K |\tau _k-I(\varepsilon _i<b_{0k})| \Vert \tilde{\mathbf{x }}_{ik}\Vert \biggr ]^{2} \\&\qquad \times \left| \frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert }-\frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert } \right| \\&\quad \le \frac{1}{N} \sum _{i=1}^N K^{2}(\Vert \mathbf{x }_{i}\Vert +1 )^2 \\&\qquad \times \frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert }. \end{aligned}$$

Observing that

$$\begin{aligned} \biggr |K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\le K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2, \end{aligned}$$

we have

$$\begin{aligned}&\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\\&\quad =\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{x }_{i}\biggr \Vert ^2+\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{e }_{k}\biggr \Vert ^2\\&\quad =\biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\Vert \mathbf{x }_{i}\Vert ^2+\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2\\&\quad \ge \biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}). \end{aligned}$$

Similarly,

$$\begin{aligned} \biggr \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\ge \biggr |\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}).\nonumber \\ \end{aligned}$$
(32)

Using the inequality

$$\begin{aligned} \frac{(\Vert {\mathbf{x }}_{i}\Vert +1)^2}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2\frac{\Vert {\mathbf{x }}_i\Vert ^2+1}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2K, \end{aligned}$$
(33)

we have

$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{|\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}||\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|)^2}, \end{aligned}$$
(34)

where \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\). Note that for each i, \(| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\) is bounded and converges in probability to 0, as \(n_0\rightarrow \infty \). Thus, for \(k=1,\ldots ,K\), \(E\{| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\}\rightarrow 0\). For any \(\epsilon >0\),

$$\begin{aligned}&P\biggr \{\frac{1}{N} \sum _{i=1}^N\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)>\epsilon \biggr \}\\&\qquad \le \frac{1}{\epsilon N} \sum _{i=1}^N\sum _{k=1}^KE\{|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|\}(\Vert \mathbf{x }_{i}\Vert +1)\rightarrow 0, \end{aligned}$$

which implies that the term in (34) converges in probability to 0. Thus, \(|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}{\mathop {\longrightarrow }\limits ^{p}}0\). Similarly, it is easy to verify that \(|\tilde{\varDelta }_2-\varDelta _2|{\mathop {\longrightarrow }\limits ^{p}}0\). These facts, together with (30) and (31), show that

$$\begin{aligned} \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U )= & {} \sum _{i=1}^N \frac{1}{N^2 \pi _{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+ o_p(1)\\= & {} \mathbf{V }_{Lopt}+o_p(1). \end{aligned}$$

We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\). For \(\epsilon >0\),

$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik}^* \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt} \epsilon } \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik}^*\biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U \biggr \} \nonumber \\&\quad \le I\left( \max _{1\le i\le N} \frac{ \Vert \mathbf{x }_{i}\Vert +1 }{\tilde{\pi }_{i}^{Lopt}} >\frac{\sqrt{n} N \epsilon }{K} \right) \left( K^2 \sum _{i=1}^N\frac{(\Vert \mathbf{x }_{i}\Vert +1)^2}{ N^2\tilde{\pi }_{i}^{Lopt}}\right) . \end{aligned}$$
(35)

Using the inequalities (32) and (33), it is easy to verify that

$$\begin{aligned} \frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}= & {} \frac{\Vert \mathbf{x }_i\Vert +1 }{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{ik} \Vert } \sum _{j=1}^N\left\| \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{jk}\right\| \nonumber \\\le & {} \frac{\Vert \mathbf{x }_i\Vert +1}{|\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|\sqrt{\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}}} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )\nonumber \\\le & {} \frac{\sqrt{2K}}{\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )=O(N). \end{aligned}$$
(36)

Thus, \(\max _{1\le i \le N }\frac{\Vert {\mathbf{x }}_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}=o_p(\sqrt{n} N)\) and the right-hand side of (35) is \(o_p(1)\), which shows that

$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U \}=o_p(1). \end{aligned}$$

Given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\), the \(\tilde{{\varvec{\eta }}}_i^*\), \(i=1,\ldots ,n\), are i.i.d. with mean \(o_p(1)\) and variance \(\mathbf{V }_{Lopt}+o_p(1)\). Note that if \((NK)^{-1}\sum _{i=1}^N(1+\Vert \mathbf{x }_i\Vert )^{-1}[\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}]^{\otimes 2}\) is asymptotically positive definite, then so is \(\mathbf{V }_{Lopt}\). Thus, conditional on \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\) in probability, as \(n_0\rightarrow \infty \), \(n\rightarrow \infty \), and \(N\rightarrow \infty \), if \(n/N =o(1)\) and \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\), then

$$\begin{aligned} \mathbf{V }_{Lopt} ^{-1/2} \tilde{\mathbf{Z }}_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(37)

For \({\tilde{A}}^*_{2n}(\mathbf{u })\) in (29), we have

$$\begin{aligned} E\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}= & {} \frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1), \end{aligned}$$
(38)

where the last equality follows from (23).

We now examine its variance. By (21) and (36), we have

$$\begin{aligned}&\text{ var }\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}\le \frac{n}{N^2}E\biggr \{ \frac{\{{\tilde{A}}^*_{2n,i}(\mathbf{u })\}^2}{(\tilde{\pi }_{i}^{*Lopt})^2}\biggr |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\biggr \}\nonumber \\&\qquad =\frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}^2(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\le \frac{n}{N^2}\sum _{i=1}^N\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|\frac{A_{2n,i}(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \frac{\sqrt{n}}{N^2}\sum _{i=1}^N\frac{\Vert \mathbf{x }_{i}\Vert +1}{\tilde{\pi }_{i}^{Lopt}}A_{2n,i}(\mathbf{u })\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \max _{1\le i \le N }\frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}\frac{1}{\sqrt{n}N}\frac{n}{N}\sum _{i=1}^NA_{2n,i}(\mathbf{u })=O_p(n^{-1/2}). \end{aligned}$$
(39)

From (38), (39), and Chebyshev’s inequality,

$$\begin{aligned} {\tilde{A}}^*_{2n}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(40)

From (29) and (40),

$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$

Since \({\tilde{A}}_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0)\), satisfies

$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0) = - \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$

Thus, we have

$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$

This asymptotic expression, together with (37), shows that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),

$$\begin{aligned} P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }). \end{aligned}$$

Here, \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Since the conditional probability is a bounded random variable, convergence in probability to a constant implies convergence in the mean. Therefore, \(P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }]\rightarrow \varPhi _{p+K}(\mathbf{a })\) for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\), which finishes the proof of Theorem 4.


Cite this article

Yuan, X., Li, Y., Dong, X. et al. Optimal subsampling for composite quantile regression in big data. Stat Papers 63, 1649–1676 (2022). https://doi.org/10.1007/s00362-022-01292-1
