Appendix
Proof of Theorem 1
Define
$$\begin{aligned} A_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} A^*_{ik}(\mathbf{u }), \end{aligned}$$
where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k}( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} ( \varepsilon _i^*-b_{0k})\), \(\tilde{\mathbf{x }}_{ik}^*=(\mathbf{x }_i^{*\textsf {T}}, \mathbf{e }_k^\textsf {T})^\textsf {T}\), and \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i=1,\ldots ,n\). As a function of \(\mathbf{u }\), \(A_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\). Thus, we can focus on \(A_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\).
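For concreteness, the following minimal Python sketch (an illustration only, with hypothetical variable names) evaluates \(A_n^*(\mathbf{u })\) for a given subsample of size \(n\); minimizing it over \(\mathbf{u }\) reproduces \( \sqrt{n}( \tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\) numerically.

import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def A_n_star(u, x_sub, y_sub, pi_sub, beta0, b0, taus, N):
    # A_n^*(u) = sum_k sum_i {rho_tau_k(eps_i^* - b_0k - u' x_tilde_ik^*/sqrt(n))
    #                         - rho_tau_k(eps_i^* - b_0k)} / (N * pi_i^*)
    # x_sub: (n, p) subsampled covariates, y_sub: (n,) responses,
    # pi_sub: (n,) sampling probabilities of the drawn units,
    # beta0: (p,) slope, b0: (K,) intercepts, taus: (K,) levels, u: (p + K,).
    n, p = x_sub.shape
    eps = y_sub - x_sub @ beta0                      # eps_i^*
    total = 0.0
    for k, (tau, b) in enumerate(zip(taus, b0)):
        shift = (x_sub @ u[:p] + u[p + k]) / np.sqrt(n)   # u' x_tilde_ik^* / sqrt(n)
        total += np.sum((check_loss(eps - b - shift, tau)
                         - check_loss(eps - b, tau)) / (N * pi_sub))
    return total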
Let \(\psi _\tau (u)=\tau -I(u<0)\). By the identity (Knight 1998),
$$\begin{aligned} \rho _{\tau }(u-v)-\rho _{\tau }(u)= & {} -v\psi _\tau (u)+\int _0^v \{I(u\le s)-I(u\le 0)\}ds, \end{aligned}$$
we obtain
$$\begin{aligned} A^*_{ik}(\mathbf{u })= & {} \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k}( \varepsilon _i^*-b_{0k})\\= & {} - \frac{1}{\sqrt{n}}\mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\\&+ \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*- b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$
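As a quick numerical sanity check of this identity (an illustration only, with hypothetical names), the two sides can be compared on random inputs, approximating the integral by a signed Riemann sum.

import numpy as np

rng = np.random.default_rng(0)

def rho(u, tau):
    # check loss rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def knight_rhs(u, v, tau, grid=200000):
    # -v * psi_tau(u) + int_0^v {I(u <= s) - I(u <= 0)} ds
    psi = tau - (u < 0)
    s = np.linspace(0.0, v, grid)
    integrand = (u <= s).astype(float) - float(u <= 0)
    return -v * psi + integrand.mean() * v           # signed Riemann sum of the integral

for _ in range(5):
    u, v, tau = rng.normal(), rng.normal(), rng.uniform(0.05, 0.95)
    lhs = rho(u - v, tau) - rho(u, tau)
    assert abs(lhs - knight_rhs(u, v, tau)) < 1e-3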
Thus, we write
$$\begin{aligned} A_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i}\int _0^{ {\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^* /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^* +A^*_{2n}(\mathbf{u }), \end{aligned}$$
(15)
where
$$\begin{aligned}&\mathbf{Z }_{n}^*= - \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\pi ^*_i} \{\tau _k- I(\varepsilon _i^*-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}^*,\\&A^*_{2n}(\mathbf{u })= \sum _{i=1}^n\frac{1}{N\pi ^*_i} A^*_{2n,i}(\mathbf{u }), \\&A^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^* /\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*-b_{0k}\le 0)\}ds. \end{aligned}$$
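In the same hypothetical setup as the sketch of \(A_n^*(\mathbf{u })\) above, the linear term \(\mathbf{Z }_{n}^*\) can be evaluated directly from the subsample.

import numpy as np

def Z_n_star(x_sub, y_sub, pi_sub, beta0, b0, taus, N):
    # Z_n^* = -(1/sqrt(n)) sum_k sum_i {tau_k - I(eps_i^* - b_0k < 0)} x_tilde_ik^* / (N pi_i^*)
    n, p = x_sub.shape
    K = len(taus)
    eps = y_sub - x_sub @ beta0                      # eps_i^*
    z = np.zeros(p + K)
    for k, (tau, b) in enumerate(zip(taus, b0)):
        w = (tau - (eps - b < 0)) / (N * pi_sub)     # weighted score for level tau_k
        z[:p] += w @ x_sub                           # slope block of x_tilde_ik^*
        z[p + k] = w.sum()                           # intercept block e_k
    return -z / np.sqrt(n)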
We first prove the asymptotic normality of \(\mathbf{Z }_{n}^* \). Denote
$$\begin{aligned} {\varvec{\eta }}_i^*= & {} -\frac{1}{N\pi ^*_i} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$
then we can write \(\mathbf{Z }_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n {\varvec{\eta }}_i^*\). Direct calculation yields
$$\begin{aligned} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0 )\}\tilde{\mathbf{x }}_{ik}=O_p(N^{-1/2}), \\ \text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)= & {} E\{({\varvec{\eta }}_i^*)^{\otimes 2} |{\mathbb {D}}_N\}-\{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2}- \{E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}^{\otimes 2} \\= & {} \sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr \{\sum _{k=1}^K [\tau _k-I(\varepsilon _i-b_{0k}<0)] \tilde{\mathbf{x }}_{ik}\biggr \} ^{\otimes 2} - o_p(1). \end{aligned}$$
It follows that
$$\begin{aligned}&E\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= 0, \\&\text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}= \frac{1}{N^2} \sum _{i=1}^N \text{ cov }\left\{ \sum _{k=1}^K\left[ \tau _k-I(\varepsilon _i<b_{0k})\right] \tilde{\mathbf{x }}_{ik}\right\} . \end{aligned}$$
Consider the (s, t)th element of \( \text{ cov }\{E({\varvec{\eta }}_{i}^*|{\mathbb {D}}_N)\}\), denoted by \(\sigma _{st}\). By the Cauchy-Schwarz and \(c_r\) inequalities, we have \(|\sigma _{st}| \le \sqrt{\sigma _{ss}}\sqrt{\sigma _{tt}}\le \frac{1}{N^2}\sum _{i=1}^NK(\Vert \mathbf{x }_i\Vert ^2+1)=O(N^{-1})\) under Assumption 1(b). By Chebyshev’s inequality, \( E({\varvec{\eta }}_i^*|{\mathbb {D}}_N) =O_p(N^{-1/2})\).
We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) under the conditional distribution given \({\mathbb {D}}_N\). Specifically, we want to show that for \(\epsilon >0\),
$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^*} \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^*\{\tau _k-I(\varepsilon _i^*-b_{0k}<0 )\} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \pi _i^* \epsilon } \sum _{k=1}^K \tilde{\mathbf{x }}_{ik}^* \{\tau _k-I(\varepsilon _i^*-b_{0k}<0)\} \biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N \biggr \} \nonumber \\&\quad = \sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i-b_{0k}<0)\} \tilde{\mathbf{x }}_{ik} \biggr \Vert >1 \biggr ) \end{aligned}$$
(16)
goes to zero in probability. If condition (7) holds, then the right-hand side of (16) satisfies
$$\begin{aligned}&\sum _{i=1}^N\frac{1}{ N^2 \pi _i} \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \nonumber \\&\quad \times I\biggr (\frac{1}{\sqrt{n} N \pi _i \epsilon } \biggr \Vert \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr \Vert>1 \biggr ) \nonumber \\&\qquad \le K^2 \sum _{i=1}^N\frac{1}{ N^2 \pi _i} (1+ \Vert \mathbf{x }_{i}\Vert )^2 I\biggr (\frac{K(1+ \Vert \mathbf{x }_{i}\Vert )}{\sqrt{n} N \pi _i \epsilon }>1 \biggr ) \nonumber \\&\qquad \le I\biggr (\max _{1\le i\le N} \frac{\Vert \mathbf{x }_{i}\Vert +1 }{\pi _i}>\frac{\sqrt{n} N\epsilon }{ K} \biggr )\biggr ( K^2\sum _{i=1}^N\frac{(1+ \Vert \mathbf{x }_{i}\Vert )^2}{N^2\pi _i}\biggr ). \end{aligned}$$
By Assumption 2(a), \(\max _{1\le i\le N}\frac{\Vert {\mathbf{x }}_i\Vert +1}{\pi _i}=o_p(\sqrt{n}N)\). By Assumption 2(b), \(K^2\sum _{i=1}^N\frac{(1+ \Vert {\mathbf{x }}_{i}\Vert )^2}{N^2\pi _i}=O_p(1)\). It follows that
$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}{\varvec{\eta }}_i^* \Vert ^2 I(\Vert {\varvec{\eta }}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N\}=o_p(1), \end{aligned}$$
which shows that Lindeberg’s conditions hold with probability approaching one.
Given \({\mathbb {D}}_N\), \({\varvec{\eta }}^*_i\), \(i=1,\ldots ,n\), are independent and identically distributed with mean \(E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(N^{-1/2})\) and covariance matrix \(\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\). Thus, conditional on \({\mathbb {D}}_N\), as \(n,N \rightarrow +\infty \), with probability approaching one,
$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2}\{ \mathbf{Z }_{n}^*- \sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)\}&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
Since \(\sqrt{n} E({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=O_p(\sqrt{n}/\sqrt{N})=o_p(1)\), it follows that
$$\begin{aligned} \{\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N) \}^{-1/2} \mathbf{Z }_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(17)
Next, we prove that
$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$
Let \(A_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik} /\sqrt{n}} \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\}ds\) denote the full-data analogue of \(A^*_{2n,i}(\mathbf{u })\). Since, given \({\mathbb {D}}_N\), the subsampled indices are independent and each equals \(i\) with probability \(\pi _i\), we have \(E\{A^*_{2n}(\mathbf{u })|{\mathbb {D}}_N\}=\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })\), which can be decomposed as
$$\begin{aligned}&E\{A^*_{2n}(\mathbf{u })|{\mathbb {D}}_N\} \nonumber \\&\qquad = \frac{n}{N}\sum _{i=1}^N E\{A_{2n,i}(\mathbf{u })\}+\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]. \end{aligned}$$
(18)
By Assumption 1,
$$\begin{aligned}&\frac{n}{N}\sum _{i=1}^N E(A_{2n,i}(\mathbf{u })) \nonumber \\&\quad = \frac{n}{N}\sum _{i=1}^N \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{F(b_{0k}+s)-F(b_{0k})\}ds \nonumber \\&\quad =\frac{\sqrt{n}}{N}\sum _{i=1}^N \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}dt \nonumber \\&\quad =\frac{1}{2}\mathbf{u }^\textsf {T}\left( \frac{1}{N}\sum _{i=1}^N \sum _{k=1}^K f(b_{0k})\tilde{\mathbf{x }}_{ik}\tilde{\mathbf{x }}_{ik}^\textsf {T}\right) \mathbf{u }+o(1) \nonumber \\&\quad = \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o(1). \end{aligned}$$
(19)
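The third equality in (19) rests on the first-order expansion \(\sqrt{n}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}=f(b_{0k})t+o(1)\), which Assumption 1 provides: for each \(i\) and \(k\),
$$\begin{aligned} \sqrt{n}\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}}\{F(b_{0k}+t/\sqrt{n})-F(b_{0k})\}dt= & {} \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}} f(b_{0k})t\,dt+o(1)\\= & {} \frac{1}{2} f(b_{0k})({\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik})^2+o(1), \end{aligned}$$
so that averaging over \(i\) and summing over \(k\) gives the quadratic form \(\frac{1}{2}\mathbf{u }^\textsf {T}\bigl (\frac{1}{N}\sum _{i=1}^N \sum _{k=1}^K f(b_{0k})\tilde{\mathbf{x }}_{ik}\tilde{\mathbf{x }}_{ik}^\textsf {T}\bigr )\mathbf{u }+o(1)\).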
The second term of (18) has mean 0 and its variance satisfies
$$\begin{aligned} \text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \le \frac{n^2}{N^2} \sum _{i=1}^N E\{A_{2n,i}^2(\mathbf{u })\}. \end{aligned}$$
(20)
From the fact that \(A_{2n,i}(\mathbf{u })\) is nonnegative, we obtain
$$\begin{aligned} A_{2n,i}(\mathbf{u })\le & {} \left| \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\}ds \right| \nonumber \\\le & {} \sum _{k=1}^K \int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}/\sqrt{n}} \left| \{I(\varepsilon _i\le b_{0k}+s)-I(\varepsilon _i\le b_{0k})\} \right| ds \nonumber \\\le & {} \frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|. \end{aligned}$$
(21)
By Assumption 1(b), \(\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert =o(\sqrt{N})\). Combining this fact, (20), and (21), we have
$$\begin{aligned}&\text{ var }\biggr (\frac{n}{N}\sum _{i=1}^N [ A_{2n,i}(\mathbf{u })-E\{A_{2n,i}(\mathbf{u })\}]\biggr ) \nonumber \\&\quad \le \left\{ K\frac{\Vert \mathbf{u }\Vert }{\sqrt{N}}(1+\max _{1\le i \le N}\Vert \mathbf{x }_i\Vert ) \right\} \frac{\sqrt{n}}{\sqrt{N}} \frac{n}{N} \sum _{i=1}^N E\{ A_{2n,i}(\mathbf{u })\}= o(1). \end{aligned}$$
(22)
From (18), (19), (22), and Chebyshev’s inequality, it follows that
$$\begin{aligned} E\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} =\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(23)
Next, we examine the conditional variance of \(A^*_{2n}(\mathbf{u })\), i.e., \(\text{ var }\left\{ A^*_{2n}(\mathbf{u })| {\mathbb {D}}_N \right\} \). Since, conditional on \({\mathbb {D}}_N\), \(A^*_{2n,i}(\mathbf{u })\), \(i=1,\ldots , n\), are independent and identically distributed, we have
$$\begin{aligned} \text{ var }\left\{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\right\}= & {} \frac{1}{N^2} \sum _{i=1}^n \text{ var }\biggr \{ \frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i} \biggr |{\mathbb {D}}_N\biggr \} \nonumber \\\le & {} \frac{n}{N^2}E\biggr [\biggr \{\frac{A^*_{2n,i}(\mathbf{u })}{\pi ^*_i}\biggr \}^2 \biggr |{\mathbb {D}}_N\biggr ]. \end{aligned}$$
(24)
By (21), the right-hand side of (24) satisfies
$$\begin{aligned} \frac{n}{N^2}\sum _{i=1}^N \frac{A^2_{2n,i}(\mathbf{u })}{\pi _i}\le & {} \frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}(\mathbf{u })}{\pi _i}\biggr (\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}| \biggr ) \nonumber \\\le & {} \frac{1}{\sqrt{n}N} \biggr (K\Vert \mathbf{u }\Vert \max _{1\le i \le N}\frac{\Vert \mathbf{x }_i\Vert +1}{\pi _i}\biggr )\frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u }). \end{aligned}$$
(25)
From (23), (25), and Assumption 2(a), we have
$$\begin{aligned} \text{ var }\biggr \{ A^*_{2n}(\mathbf{u }) |{\mathbb {D}}_N\biggr \}=o_p(1). \end{aligned}$$
(26)
From (23), (26), and Chebyshev’s inequality,
$$\begin{aligned} A^*_{2n}(\mathbf{u })= & {} \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p|{\mathbb {D}}_N}(1). \end{aligned}$$
(27)
Here \(a=o_{p|{\mathbb {D}}_N}(1)\) means that \(a\) converges to 0 in conditional probability given \({\mathbb {D}}_N\), in probability; namely, for any \(\delta >0\), \(P(|a|>\delta |{\mathbb {D}}_N) {\mathop {\longrightarrow }\limits ^{p}}0\) as \(N\rightarrow +\infty \). Since \(0\le P(|a|>\delta |{\mathbb {D}}_N) \le 1\), this conditional probability converges to 0 in probability if and only if \(P(|a|>\delta )= E\{P(|a|>\delta |{\mathbb {D}}_N)\} \rightarrow 0\). Thus, \(a=o_{p|{\mathbb {D}}_N}(1)\) is equivalent to \(a=o_{p}(1)\), and we use the notation \(o_p\) only in what follows.
From (15) and (27), we have
$$\begin{aligned} A_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \mathbf{Z }_{n}^*+ \frac{1}{2} \mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_{p}(1). \end{aligned}$$
Since \(A_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0)\), satisfies
$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_S- {\varvec{\theta }}_0) = - \mathbf{E }_N^{-1} \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$
Thus, we have
$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \mathbf{Z }_{n}^*+o_p(1). \end{aligned}$$
Note that \(\text{ cov }({\varvec{\eta }}_i^*|{\mathbb {D}}_N)=\mathbf{V }_\pi +o_p(1)\). Combining (17) and Slutsky’s theorem, we have that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned} P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }), \end{aligned}$$
(28)
where \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Note that the conditional probability in (28) is a bounded random variable; thus, convergence in probability to a constant implies convergence in mean. Therefore, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned}&P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }]\\&\quad =E(P[ \{\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_S-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N])\\&\quad \rightarrow \varPhi _{p+K}(\mathbf{a }). \end{aligned}$$
This finishes the proof of Theorem 1.
Proof of Theorem 2
Note that
$$\begin{aligned} \text{ tr }(\mathbf{E }_N^{-1}\mathbf{V }_\pi \mathbf{E }_N^{-1})= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\mathbf{E }_N^{-1} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}\mathbf{E }_N^{-1}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [\frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr (\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2 \biggr ) \nonumber \\\ge & {} \frac{1}{N^2} \biggr [ \sum _{i=1}^N \biggr \Vert \sum _{k=1}^K\{I(\varepsilon _i<b_{0k})-\tau _k\}\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\biggr \Vert \biggr ]^2 \end{aligned}$$
where the last equality uses \(\sum _{i=1}^N \pi _i=1\) and the final step follows from the Cauchy-Schwarz inequality, in which equality holds if and only if \(\pi _i \propto \Vert \sum _{k=1}^K [I(\varepsilon _i<b_{0k})-\tau _k]\mathbf{E }_N^{-1}\tilde{\mathbf{x }}_{ik}\Vert \). This proves Theorem 2.
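As an illustration of this characterization (a minimal sketch with hypothetical names, not code from the paper), the minimizing probabilities can be computed from the full data as follows.

import numpy as np

def a_optimal_probs(X, eps, b0, taus, E_N_inv):
    # pi_i proportional to || E_N^{-1} sum_k {I(eps_i < b_0k) - tau_k} x_tilde_ik ||
    # X: (N, p) covariates, eps: (N,) residuals y_i - beta_0' x_i,
    # b0: (K,) intercepts, taus: (K,) quantile levels, E_N_inv: (p + K, p + K).
    N, p = X.shape
    K = len(taus)
    scores = np.zeros((N, p + K))
    for k, (tau, b) in enumerate(zip(taus, b0)):
        w = (eps < b).astype(float) - tau            # I(eps_i < b_0k) - tau_k
        scores[:, :p] += w[:, None] * X              # slope block of x_tilde_ik
        scores[:, p + k] = w                         # intercept block e_k
    norms = np.linalg.norm(scores @ E_N_inv.T, axis=1)
    return norms / norms.sum()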
Proof of Theorem 3
Note that
$$\begin{aligned} \text{ tr }(\mathbf{V }_\pi )= & {} \text{ tr }\biggr (\sum _{i=1}^N \frac{1}{N^2\pi _i} \biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2} \sum _{i=1}^N \text{ tr }\biggr ( \frac{1}{\pi _i}\biggr [\sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik}\biggr ]^{\otimes 2}\biggr ) \nonumber \\= & {} \frac{1}{N^2}\sum _{i=1}^N \biggr [ \frac{1}{\pi _i} \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2\biggr ] \nonumber \\= & {} \frac{1}{N^2}\biggr (\sum _{i=1}^N \pi _i \biggr ) \biggr [\sum _{i=1}^N \frac{1}{\pi _i}\biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert ^2 \biggr ] \nonumber \\\ge & {} \frac{1}{N^2} \biggr [\sum _{i=1}^N \biggr \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \biggr \Vert \biggr ]^2 \end{aligned}$$
where the last equality uses \(\sum _{i=1}^N \pi _i=1\) and the final step follows from the Cauchy-Schwarz inequality, in which equality holds if and only if \(\pi _i \propto \Vert \sum _{k=1}^K \{I(\varepsilon _i<b_{0k})-\tau _k\} \tilde{\mathbf{x }}_{ik} \Vert \). This proves Theorem 3.
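In terms of the hypothetical a_optimal_probs sketch given after Theorem 2, these probabilities are obtained by simply replacing \(\mathbf{E }_N^{-1}\) with the identity matrix, for example:

# L-optimal probabilities: same score vectors, without the E_N^{-1} rotation
pi_Lopt = a_optimal_probs(X, eps, b0, taus, np.eye(X.shape[1] + len(taus)))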
Proof of Theorem 4
Recall that \(\varepsilon _i^*=y_i^*-{\varvec{\beta }}^\textsf {T}_0\mathbf{x }_i^*\), \(i= 1,\ldots ,n\). Denote
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} A^*_{ik}(\mathbf{u }), \end{aligned}$$
where \(A^*_{ik}(\mathbf{u }) = \rho _{\tau _k} ( \varepsilon _i^*-b_{0k}- \mathbf{u }^\textsf {T}\tilde{\mathbf{x }}_{ik}^* /\sqrt{n})-\rho _{\tau _k} (\varepsilon _i^*-b_{0k})\).
As a function of \(\mathbf{u }\), \({\tilde{A}}_n^*(\mathbf{u })\) is convex and its minimizer is \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). Thus we can focus on \({\tilde{A}}_n^*(\mathbf{u })\) when assessing the properties of \( \sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\). We can write
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} -\mathbf{u }^\textsf {T}\frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*-b_{0k} <0 )\}\tilde{\mathbf{x }}_{ik}^* \nonumber \\&+\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*-b_{0k}\le s)-I(\varepsilon _i^*- b_{0k}\le 0 )\}ds\nonumber \\= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ {\tilde{A}}^*_{2n}(\mathbf{u }), \end{aligned}$$
(29)
where
$$\begin{aligned}&\tilde{\mathbf{Z }}_{n}^*=- \frac{1}{\sqrt{n}}\sum _{k=1}^K\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*,\\&{\tilde{A}}^*_{2n}(\mathbf{u })=\sum _{i=1}^n \frac{1}{N\tilde{\pi }_{i}^{*Lopt}}{\tilde{A}}^*_{2n,i}(\mathbf{u }),\\&{\tilde{A}}^*_{2n,i}(\mathbf{u })= \sum _{k=1}^K\int _0^{{\mathbf{u }}^\textsf {T}\tilde{{\mathbf{x }}}_{ik}^*/\sqrt{n}} \{I(\varepsilon _i^*\le b_{0k}+s)-I(\varepsilon _i^*\le b_{0k})\}ds. \end{aligned}$$
Conditioning on \(({\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)\), we first prove the asymptotic normality of \(\tilde{\mathbf{Z }}_{n}^* \). Denote
$$\begin{aligned} \tilde{{\varvec{\eta }}}_i^*= & {} -\frac{1}{N\tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k- I(\varepsilon _i^*<b_{0k} )\}\tilde{\mathbf{x }}_{ik}^*, \end{aligned}$$
then we can write \(\tilde{\mathbf{Z }}_{n}^*= \frac{1}{\sqrt{n}} \sum _{i=1}^n \tilde{{\varvec{\eta }}}_i^*\). Direct calculation yields
$$\begin{aligned} E(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} -\frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} =O_p(N^{-1/2}),\nonumber \\ \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U)= & {} \sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+o_p(1).\nonumber \\ \end{aligned}$$
(30)
Let \(\tilde{\varepsilon }_i= y_i-\tilde{{\varvec{\beta }}}^\textsf {T}_U\mathbf{x }_i\), \(i=1,\ldots ,N\), then we can write
$$\begin{aligned} \tilde{\pi }_{i}^{Lopt}= & {} \frac{\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert }{\sum _{j=1}^N\Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{jk}\Vert },\ \ i=1,\ldots ,N. \end{aligned}$$
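A minimal sketch of how these weights could be computed in practice (hypothetical names; it reuses the a_optimal_probs sketch from the proof of Theorem 2 and assumes the pilot estimates \(\tilde{{\varvec{\beta }}}_U\) and \({\tilde{b}}_{U,k}\) are available as beta_pilot and b_pilot):

eps_pilot = y - X @ beta_pilot          # pilot residuals tilde_eps_i = y_i - beta_tilde_U' x_i
pi_Lopt_pilot = a_optimal_probs(X, eps_pilot, b_pilot, taus,
                                np.eye(X.shape[1] + len(taus)))
# the norm is unchanged by the sign convention {I(.) - tau_k} versus {tau_k - I(.)}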
Thus we have
$$\begin{aligned}&\sum _{i=1}^N \frac{1}{N^2\tilde{\pi }_{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2} \nonumber \\&\quad =\frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\Vert } \times \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert \nonumber \\&\quad = \tilde{\varDelta }_1 \times \tilde{\varDelta }_2. \end{aligned}$$
(31)
Let
$$\begin{aligned} \varDelta _1= & {} \frac{1}{N}\sum _{i=1}^N \frac{ [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\} \tilde{\mathbf{x }}_{ik} ]^{\otimes 2} }{ \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\Vert }, \\ \varDelta _2= & {} \frac{1}{N} \sum _{i=1}^N\biggr \Vert \sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}\biggr \Vert . \end{aligned}$$
Next, we show that \(\tilde{\varDelta }_1=\varDelta _1+o_p(1)\) and \(\tilde{\varDelta }_2=\varDelta _2+o_p(1)\). Note that the \((j_1,j_2)\)th element of \(\tilde{\varDelta }_1-\varDelta _1\) \((j_1, j_2=1,\ldots ,p+K)\) is bounded by
$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\le \frac{1}{N} \sum _{i=1}^N \biggr [\sum _{k=1}^K |\tau _k-I(\varepsilon _i<b_{0k})| \Vert \tilde{\mathbf{x }}_{ik}\Vert \biggr ]^{2} \\&\qquad \times \left| \frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert }-\frac{1}{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert } \right| \\&\quad \le \frac{1}{N} \sum _{i=1}^N K^{2}(\Vert \mathbf{x }_{i}\Vert +1 )^2 \\&\qquad \times \frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{\Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\Vert \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\Vert }. \end{aligned}$$
Observing that
$$\begin{aligned} \biggr |K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\le K^{-1}\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2, \end{aligned}$$
we have
$$\begin{aligned}&\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\\&\quad =\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{x }_{i}\biggr \Vert ^2+\biggr \Vert \sum _{k=1}^K[\tau _k-I(\varepsilon _i<b_{0k})]\mathbf{e }_{k}\biggr \Vert ^2\\&\quad =\biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2\Vert \mathbf{x }_{i}\Vert ^2+\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}^2\\&\quad \ge \biggr |\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}). \end{aligned}$$
Similarly,
$$\begin{aligned} \biggr \Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})]\tilde{\mathbf{x }}_{ik}\biggr \Vert ^2\ge \biggr |\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}\biggr |^2(\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}).\nonumber \\ \end{aligned}$$
(32)
Using the inequality
$$\begin{aligned} \frac{(\Vert {\mathbf{x }}_{i}\Vert +1)^2}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2\frac{\Vert {\mathbf{x }}_i\Vert ^2+1}{\Vert {\mathbf{x }}_i\Vert ^2+K^{-1}}\le 2K, \end{aligned}$$
(33)
we have
$$\begin{aligned}&|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{|\sum _{k=1}^K\{\tau _k-I(\varepsilon _i<b_{0k})\}||\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|}\nonumber \\&\quad \le \frac{2 K^{3}}{N} \sum _{i=1}^N\frac{\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)}{(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|)^2}, \end{aligned}$$
(34)
where \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\). Note that for each i, \(| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\) is bounded and converges in probability to 0, as \(n_0\rightarrow \infty \). Thus, for \(k=1,\ldots ,K\), \(E\{| I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k}) - I(\varepsilon _i<b_{0k})|\}\rightarrow 0\). For any \(\epsilon >0\),
$$\begin{aligned}&P\biggr \{\frac{1}{N} \sum _{i=1}^N\sum _{k=1}^K|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|(\Vert \mathbf{x }_{i}\Vert +1)>\epsilon \biggr \}\\&\qquad \le \frac{1}{\epsilon N} \sum _{i=1}^N\sum _{k=1}^KE\{|I(\varepsilon _i<b_{0k})-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})|\}(\Vert \mathbf{x }_{i}\Vert +1)\rightarrow 0, \end{aligned}$$
which implies that the term in (34) converges in probability to 0. Thus, \(|\tilde{\varDelta }_1-\varDelta _1|_{(j_1,j_2)}{\mathop {\longrightarrow }\limits ^{p}}0\). Similarly, it is easy to verify that \(|\tilde{\varDelta }_2-\varDelta _2|{\mathop {\longrightarrow }\limits ^{p}}0\). These facts, together with (30) and (31), show that
$$\begin{aligned} \text{ cov }(\tilde{{\varvec{\eta }}}_i^*|{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U )= & {} \sum _{i=1}^N \frac{1}{N^2 \pi _{i}^{Lopt}} \biggr [\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik} \biggr ]^{\otimes 2}+ o_p(1)\\= & {} \mathbf{V }_{Lopt}+o_p(1). \end{aligned}$$
We now check Lindeberg’s conditions (Theorem 2.27 of van der Vaart 1998) given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\). For \(\epsilon >0\),
$$\begin{aligned}&\sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert>\sqrt{n}\epsilon ) |{\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U\} \nonumber \\&\quad = \sum _{i=1}^n E\biggr \{\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt}} \sum _{k=1}^K \{\tau _k-I(\varepsilon _i^*<b_{0k})\} \tilde{\mathbf{x }}_{ik}^* \biggr \Vert ^2 \nonumber \\&\qquad \times I\biggr (\biggr \Vert \frac{1}{\sqrt{n} N \tilde{\pi }_{i}^{*Lopt} \epsilon } \sum _{k=1}^K \{\tau _k-I(\varepsilon _i^*<b_{0k})\} \tilde{\mathbf{x }}_{ik}^*\biggr \Vert>1 \biggr ) \biggr | {\mathbb {D}}_N, \tilde{{\varvec{\theta }}}_U \biggr \} \nonumber \\&\quad \le I\left( \max _{1\le i\le N} \frac{ \Vert \mathbf{x }_{i}\Vert +1 }{\tilde{\pi }_{i}^{Lopt}} >\frac{\sqrt{n} N \epsilon }{K} \right) \left( K^2 \sum _{i=1}^N\frac{(\Vert \mathbf{x }_{i}\Vert +1)^2}{ N^2\tilde{\pi }_{i}^{Lopt}}\right) . \end{aligned}$$
(35)
Using the inequalities (32) and (33), it is easy to verify that
$$\begin{aligned} \frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}= & {} \frac{\Vert \mathbf{x }_i\Vert +1 }{\Vert \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{ik} \Vert } \sum _{j=1}^N\left\| \sum _{k=1}^K[\tau _k-I(\tilde{\varepsilon }_j<{\tilde{b}}_{U,k})] \tilde{\mathbf{x }}_{jk}\right\| \nonumber \\\le & {} \frac{\Vert \mathbf{x }_i\Vert +1}{|\sum _{k=1}^K\{\tau _k-I(\tilde{\varepsilon }_i<{\tilde{b}}_{U,k})\}|\sqrt{\Vert \mathbf{x }_{i}\Vert ^2+K^{-1}}} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )\nonumber \\\le & {} \frac{\sqrt{2K}}{\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|} K \sum _{j=1}^N (1+\Vert \mathbf{x }_j\Vert )=O(N). \end{aligned}$$
(36)
Thus, \(\max _{1\le i \le N }\frac{\Vert {\mathbf{x }}_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}=o_p(\sqrt{n} N)\) and the right-hand side of (35) is \(o_p(1)\), which shows that
$$\begin{aligned} \sum _{i=1}^n E\{\Vert n^{-1/2}\tilde{{\varvec{\eta }}}_i^* \Vert ^2 I(\Vert \tilde{{\varvec{\eta }}}_i^*\Vert >\sqrt{n}\epsilon ) |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U \}=o_p(1). \end{aligned}$$
Given \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\), \(\tilde{{\varvec{\eta }}}_i^*\), \(i=1,\ldots ,n\), are independent and identically distributed with mean \(O_p(N^{-1/2})\) and covariance \(\mathbf{V }_{Lopt}+o_p(1)\). Note that if \((NK)^{-1}\sum _{i=1}^N(1+\Vert \mathbf{x }_i\Vert )^{-1}[\sum _{k=1}^K \{\tau _k-I(\varepsilon _i<b_{0k})\}\tilde{\mathbf{x }}_{ik}]^{\otimes 2}\) is asymptotically positive definite, then \(\mathbf{V }_{Lopt}\) is asymptotically positive definite. Thus, conditional on \({\mathbb {D}}_N\) and \(\tilde{{\varvec{\theta }}}_U\), with probability approaching one, if \(n_0\rightarrow \infty \), \(n\rightarrow \infty \), \(N\rightarrow \infty \), \(n/N =o(1)\), and \(\min _{0\le j\le K,j\in {\mathbb {Z}}}|j-\sum _{k=1}^K\tau _k|>0\), then
$$\begin{aligned} \mathbf{V }_{Lopt} ^{-1/2} \tilde{\mathbf{Z }}_{n}^*&{\mathop {\longrightarrow }\limits ^{d}}&N(\mathbf{0 },\mathbf{I }). \end{aligned}$$
(37)
For \({\tilde{A}}^*_{2n}(\mathbf{u })\) in (29), we have
$$\begin{aligned} E\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}= & {} \frac{n}{N}\sum _{i=1}^N A_{2n,i}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1), \end{aligned}$$
(38)
where the last equality is from (23).
We now examine its conditional variance. By (21) and (36), we have
$$\begin{aligned}&\text{ var }\{{\tilde{A}}^*_{2n}(\mathbf{u })|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\}\le \frac{n}{N^2}E\biggr \{ \frac{\{{\tilde{A}}^*_{2n,i}(\mathbf{u })\}^2}{(\tilde{\pi }_{i}^{*Lopt})^2}\biggr |{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U\biggr \}\nonumber \\&\qquad =\frac{n}{N^2}\sum _{i=1}^N \frac{A_{2n,i}^2(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\le \frac{n}{N^2}\sum _{i=1}^N\frac{1}{\sqrt{n}}\sum _{k=1}^K |\mathbf{u }^\textsf {T} \tilde{\mathbf{x }}_{ik}|\frac{A_{2n,i}(\mathbf{u })}{\tilde{\pi }_{i}^{Lopt}}\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \frac{\sqrt{n}}{N^2}\sum _{i=1}^N\frac{\Vert \mathbf{x }_{i}\Vert +1}{\tilde{\pi }_{i}^{Lopt}}A_{2n,i}(\mathbf{u })\nonumber \\&\qquad \le K \Vert \mathbf{u }\Vert \max _{1\le i \le N }\frac{\Vert \mathbf{x }_i\Vert +1}{\tilde{\pi }_{i}^{Lopt}}\frac{1}{\sqrt{n}N}\frac{n}{N}\sum _{i=1}^NA_{2n,i}(\mathbf{u })=O_p(n^{-1/2}). \end{aligned}$$
(39)
From (38), (39), and Chebyshev’s inequality,
$$\begin{aligned} {\tilde{A}}^*_{2n}(\mathbf{u })=\frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
(40)
From (29) and (40),
$$\begin{aligned} {\tilde{A}}_n^*(\mathbf{u })= & {} \mathbf{u }^\textsf {T} \tilde{\mathbf{Z }}_{n}^*+ \frac{1}{2}\mathbf{u }^\textsf {T}\mathbf{E }\mathbf{u }+o_p(1). \end{aligned}$$
Since \({\tilde{A}}_n^*(\mathbf{u })\) is a convex function, the corollary on page 2 of Hjort and Pollard (2011) implies that its minimizer, \(\sqrt{n}(\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0)\), satisfies
$$\begin{aligned} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}- {\varvec{\theta }}_0) = & {} - \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$
Thus, we have
$$\begin{aligned} \{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)= & {} -\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \mathbf{E }^{-1}_N \tilde{\mathbf{Z }}_{n}^*+o_p(1). \end{aligned}$$
This asymptotic expression, together with (37) and Slutsky’s theorem, shows that, for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\),
$$\begin{aligned} P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }|{\mathbb {D}}_N,\tilde{{\varvec{\theta }}}_U]{\mathop {\longrightarrow }\limits ^{p}}\varPhi _{p+K}(\mathbf{a }). \end{aligned}$$
Here, \(\varPhi _{p+K}(\mathbf{a })\) denotes the distribution function of the \((p+K)\)-dimensional standard multivariate normal distribution. Since the conditional probability is a bounded random variable, convergence in probability to a constant implies convergence in mean. Therefore, \(P[\{\mathbf{E }_N^{-1}\mathbf{V }_{Lopt} \mathbf{E }_N^{-1}\}^{-1/2} \sqrt{n} (\tilde{{\varvec{\theta }}}_{Lopt}-{\varvec{\theta }}_0)\le \mathbf{a }]\rightarrow \varPhi _{p+K}(\mathbf{a })\) for any \(\mathbf{a }\in {\mathbb {R}}^{p+K}\), and this finishes the proof of Theorem 4.