
Robust optimal subsampling based on weighted asymmetric least squares


Abstract

With the development of contemporary science, large amounts of generated data contain heterogeneity and outliers in the response and/or covariates. Subsampling is an effective method to overcome the limitation of computational resources, but when the data contain heterogeneity and outliers, incorrect subsampling probabilities may select inferior subdata, and statistical inference based on such subdata may perform poorly. Combining asymmetric least squares with \(L_2\) estimation, this paper proposes a double-robustness framework (DRF) that simultaneously tackles heterogeneity and outliers in the response and/or covariates. Poisson subsampling is implemented based on the DRF for massive data, and a more robust probability is derived to select the subdata. Under some regularity conditions, we establish the asymptotic properties of the DRF-based subsampling estimator. Numerical studies and real data demonstrate the effectiveness of the proposed method.


References

  • Ai M, Wang F, Yu J, Zhang H (2021) Optimal subsampling for large-scale quantile regression. J Complexity 62:101512
  • Ai M, Yu J, Zhang H, Wang H (2021) Optimal subsampling algorithms for big data regressions. Stat Sin 31(2):749–772
  • Aigner D, Amemiya T, Poirier D (1976) On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. Int Econ Rev 17(2):377–396
  • Barry A, Oualkacha K, Charpentier A (2022) A new GEE method to account for heteroscedasticity using asymmetric least-square regressions. J Appl Stat 49(14):3564–3590
  • Bowman A (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360
  • Ciuperca G (2021) Variable selection in high-dimensional linear model with possibly asymmetric errors. Comput Stat Data Anal 155:107112
  • Drineas P, Mahoney M, Muthukrishnan S, Sarlós T (2011) Faster least squares approximation. Numer Math 117:219–249
  • Efron B (1991) Regression percentiles using asymmetric squared error loss. Stat Sin 1(1):93–125
  • Fan J, Han F, Liu H (2014) Challenges of big data analysis. Natl Sci Rev 1(2):293–314
  • Gijbels I, Karim R, Verhasselt A (2019) On quantile-based asymmetric family of distributions: properties and inference. Int Stat Rev 87(3):471–504
  • Gu Y, Zou H (2016) High-dimensional generalizations of asymmetric least squares regression and their applications. Ann Stat 44(6):2661–2694
  • Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35(4):1491–1523
  • Hjort N, Pollard D (2011) Asymptotics for minimisers of convex processes. arXiv:1107.3806
  • Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46(1):33–50
  • Liao L, Park C, Choi H (2019) Penalized expectile regression: an alternative to penalized quantile regression. Ann Inst Stat Math 71(2):409–438
  • Lin L, Li F (2019) A global bias-correction DC method for biased estimation under memory constraint. arXiv:1904.07477
  • Ma P, Mahoney M, Yu B (2015) A statistical perspective on algorithmic leveraging. J Mach Learn Res 16(1):861–919
  • Ma P, Zhang X, Xing X, Ma J, Mandal A (2022) Asymptotic analysis of sampling estimators for randomized numerical linear algebra algorithms. J Mach Learn Res 23(177):1–45
  • Meng C, Xie R, Mandal A, Zhang X, Zhong W, Ma P (2021) LowCon: a design-based subsampling approach in a misspecified linear model. J Comput Graph Stat 30:694–708
  • Newey W, Powell J (1987) Asymmetric least squares estimation and testing. Econometrica 55(4):819–847
  • Pollard D (1982) Empirical choice of histogram and kernel density estimators. Scand J Stat 9(2):65–78
  • Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econom Theory 7(2):186–199
  • Pukelsheim F (2006) Optimal design of experiments. Society for Industrial and Applied Mathematics, Philadelphia
  • Schifano E, Wu J, Wang C, Yan J, Chen H (2016) Online updating of statistical inference in the big data setting. Technometrics 58(3):393–403
  • Scott D (2012) Parametric statistical modeling by minimum integrated square error. Technometrics 43(3):274–285
  • Shao Y, Wang L (2022) Optimal subsampling for composite quantile regression model in massive data. Stat Pap 63:1139–1161
  • van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
  • Wang H, Ma Y (2021) Optimal subsampling for quantile regression in big data. Biometrika 108(1):99–112
  • Wang H, Zhu R, Ma P (2018) Optimal subsampling for large sample logistic regression. J Am Stat Assoc 113(522):829–844
  • Wang H, Yang M, Stufken J (2019) Information-based optimal subdata selection for big data linear regression. J Am Stat Assoc 114(525):393–405
  • Wang L, Elmstedt J, Wong W, Xu H (2021) Orthogonal subsampling for big data linear regression. Ann Appl Stat 15(3):1273–1290
  • Xiong S, Li G (2008) Some results on the convergence of conditional distributions. Stat Probab Lett 78(18):3249–3253
  • Yan Q, Li Y, Niu M (2022) Optimal subsampling for functional quantile regression. Stat Pap. https://doi.org/10.1007/s00362-022-01367-z
  • Yu J, Wang H (2022) Subdata selection algorithm for linear model discrimination. Stat Pap 63:1883–1906
  • Yu J, Wang H, Ai M, Zhang H (2022) Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. J Am Stat Assoc 117(537):265–276
  • Yu J, Ai M, Ye Z (2023a) A review on design inspired subsampling for big data. Stat Pap. https://doi.org/10.1007/s00362-022-01386-w
  • Yu J, Liu J, Wang H (2023b) Information-based optimal subdata selection for non-linear models. Stat Pap. https://doi.org/10.1007/s00362-023-01430-3
  • Yuan X, Li Y, Dong X, Liu T (2022) Optimal subsampling for composite quantile regression in big data. Stat Pap 63:1649–1676
  • Zhu X, Li F, Wang H (2021) Least-square approximation for a distributed system. J Comput Graph Stat 30(4):1004–1018

Acknowledgements

The authors would like to thank the Editor and two referees for the constructive suggestions that led to a significant improvement of the article. This research is supported in part by the National Natural Science Foundation of China (12171277, 12271294, 12071248).

Author information

Correspondence to Mingqiu Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Technical details

Proof of Proposition 1

Direct calculation yields

$$\begin{aligned}&\int _{-\infty }^{+\infty } f(u)^{2} du\\&\quad =\int _{-\infty }^{+\infty } \frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(\sqrt{\tau }+\sqrt{1-\tau })^{2}} \exp \left\{ -2 \rho _{\tau }\left( \frac{u-\mu _{\tau }}{\sigma }\right) \right\} du \\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})} \int _{-\infty }^{+\infty } \exp \left\{ -2|\tau -\mathbb {1}\left( u \le \mu _{\tau }\right) |\frac{\left( u-\mu _{\tau }\right) ^{2}}{\sigma ^{2}}\right\} du\\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})}\left[ \int _{\mu _{\tau }}^{+\infty } \exp \left\{ -2 \tau \left( \frac{u-\mu _{\tau }}{\sigma }\right) ^{2}\right\} du\right. \\&\qquad +\left. \int _{-\infty }^{\mu _{\tau }} \exp \left\{ -2(1-\tau ) \left( \frac{u-\mu _{\tau }}{\sigma }\right) ^{2}\right\} du\right] \\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})}\times \left( \frac{\sigma \sqrt{\pi }}{2\sqrt{2\tau }}+\frac{\sigma \sqrt{\pi }}{2\sqrt{2(1-\tau )}} \right) \\&\quad =\frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })} \end{aligned}$$

and

$$\begin{aligned}&\int _{-\infty }^{+\infty } f(u)^{2}du-\frac{2}{N}\sum _{i=1}^{N}f(u_i)\\&\quad =\frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })}\\&\qquad -\frac{2}{N}\sum _{i=1}^{N}\frac{2}{\sqrt{\pi \sigma ^2}}\frac{\sqrt{\tau (1-\tau )}}{\sqrt{\tau }+\sqrt{1-\tau }}\exp \left\{ -|\tau -\mathbb {1}\left( u_{i} \le \mu _{\tau }\right) |\left( \frac{u_i-\mu _{\tau }}{\sigma } \right) ^{2}\right\} \\&\quad =\frac{1}{N}\sum _{i=1}^{N} \frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })}\left[ 1-2\sqrt{2}\exp \left\{ -|\tau -\mathbb {1}(u_i\le \mu _{\tau }) |\left( \frac{u_i-\mu _{\tau }}{\sigma }\right) ^2 \right\} \right] . \end{aligned}$$

The desired result follows. \(\square \)
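
As a numerical sanity check of this closed form, one can compare adaptive quadrature with the formula; a minimal Python sketch (NumPy/SciPy assumed; the values of \(\tau \), \(\mu _{\tau }\) and \(\sigma \) are illustrative):

```python
import numpy as np
from scipy.integrate import quad

def f(u, tau, mu, sigma):
    # density in Proposition 1:
    # f(u) = (2/sqrt(pi sigma^2)) sqrt(tau(1-tau))/(sqrt(tau)+sqrt(1-tau))
    #        * exp(-|tau - 1(u <= mu)| ((u - mu)/sigma)^2)
    c = (2.0 / np.sqrt(np.pi * sigma**2)
         * np.sqrt(tau * (1.0 - tau)) / (np.sqrt(tau) + np.sqrt(1.0 - tau)))
    w = np.abs(tau - (u <= mu))                 # |tau - 1(u <= mu_tau)|
    return c * np.exp(-w * ((u - mu) / sigma) ** 2)

tau, mu, sigma = 0.3, 1.0, 2.0                  # illustrative values
f2 = lambda u: f(u, tau, mu, sigma) ** 2
numeric = quad(f2, -np.inf, mu)[0] + quad(f2, mu, np.inf)[0]
closed = (np.sqrt(2.0 * tau * (1.0 - tau))
          / (sigma * np.sqrt(np.pi) * (np.sqrt(tau) + np.sqrt(1.0 - tau))))
print(numeric, closed)  # the two numbers agree to quadrature accuracy
```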

Lemma 1

(Gu and Zou 2016) Denote \(r(v_{i}) = \rho _{\tau }(\varepsilon _{i} - v_{i}) - \rho _{\tau }(\varepsilon _{i}) + 2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})\), \(i = 1, \ldots , N\). The asymmetric squared error loss \(\rho _{\tau }(\cdot )\) is continuously differentiable, but is not twice differentiable at zero when \(\tau \ne 0.5\). Moreover, for any \(\varepsilon _{i}\), \(v_{i} \in {\mathbb {R}}\) and \(\tau \in (0,1)\), we have

$$\begin{aligned} (\tau \wedge (1-\tau ))v_{i}^{2} \le r(v_{i}) \le (\tau \vee (1-\tau ))v_{i}^{2}, \end{aligned}$$

where \(\tau \wedge (1-\tau ) = \min \{\tau , 1-\tau \}\) and \(\tau \vee (1-\tau ) = \max \{\tau , 1-\tau \}\). It follows that \(\rho _{\tau }(\cdot )\) is strongly convex.
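
The two-sided bound is easy to confirm by simulation; a minimal sketch, assuming the standard expectile loss \(\rho _{\tau }(u)=|\tau -\mathbb {1}(u\le 0)|u^{2}\) and weight \(\psi _{\tau }(u)=|\tau -\mathbb {1}(u\le 0)|\), with illustrative Gaussian draws:

```python
import numpy as np

def rho(u, tau):
    # asymmetric squared error loss |tau - 1(u <= 0)| u^2
    return np.abs(tau - (u <= 0)) * u**2

def psi(u, tau):
    # weight |tau - 1(u <= 0)|, so that rho'(u) = 2 u psi(u)
    return np.abs(tau - (u <= 0))

rng = np.random.default_rng(0)
tau = 0.7
eps, v = rng.standard_normal(10**6), rng.standard_normal(10**6)
r = rho(eps - v, tau) - rho(eps, tau) + 2.0 * eps * v * psi(eps, tau)
print(np.all(min(tau, 1 - tau) * v**2 <= r + 1e-12),
      np.all(r <= max(tau, 1 - tau) * v**2 + 1e-12))  # both True
```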

Lemma 2

(Corollary of Hjort and Pollard (2011)) Suppose \(\varvec{Z}_{n}(\varvec{d})\) is convex and can be represented as \(\frac{1}{2} \varvec{d}' \varvec{V} \varvec{d} + \varvec{W}'_{n}\varvec{d} + C_{n} + a_{n}(\varvec{d})\), where \(\varvec{V}\) is symmetric and positive definite, \(\varvec{W}_{n}\) is stochastically bounded, \(C_{n}\) is an arbitrary constant and \(a_{n}(\varvec{d})\) goes to zero in probability for each \(\varvec{d}\). Then \(\varvec{\beta }_{n} = \arg \min \varvec{Z}_{n}\) is only \(o_P(1)\) away from \(\varvec{\alpha }_{n} = -\varvec{V}^{-1}\varvec{W}_{n}\), where \(\varvec{\alpha }_{n} = \arg \min (\frac{1}{2}\varvec{d}'\varvec{V} \varvec{d} + \varvec{W}'_{n}\varvec{d} + C_{n})\). If \(\varvec{W}_{n} \overset{d}{\rightarrow } \varvec{W}\), then \(\varvec{\beta }_{n} \overset{d}{\rightarrow } -\varvec{V}^{-1}\varvec{W}\).
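
The content of Lemma 2 can be illustrated numerically: minimize a convex \(\varvec{Z}_{n}\) with a vanishing perturbation and check that the minimizer approaches \(-\varvec{V}^{-1}\varvec{W}\). A minimal sketch (the quartic perturbation \(a_{n}(\varvec{d})=\sum _j d_j^4/n\) is an illustrative choice, not taken from the lemma):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
p = 3
A = rng.standard_normal((p, p))
V = A @ A.T + p * np.eye(p)            # symmetric positive definite V
W = rng.standard_normal(p)             # plays the role of W_n
alpha = -np.linalg.solve(V, W)         # -V^{-1} W

for n in (10, 10**3, 10**5):
    # convex Z_n(d) = (1/2) d'Vd + W'd + a_n(d), with a_n(d) -> 0
    Zn = lambda d: 0.5 * d @ V @ d + W @ d + np.sum(d**4) / n
    beta_n = minimize(Zn, np.zeros(p)).x
    print(n, np.linalg.norm(beta_n - alpha))   # gap shrinks as n grows
```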

Lemma 3

If Conditions 1, 3, 4 and 5 hold, then as \(n, N \rightarrow \infty \):

  (a) \(\underset{\varvec{\beta }\in \Lambda _\textrm{B}}{\textrm{sup}}|Q_n(\varvec{\beta }) - Q_N(\varvec{\beta })|\rightarrow 0\) in conditional probability given \({\mathcal {F}}_{N}\);

  (b) \(\parallel \tilde{\varvec{\beta }} - \varvec{\beta }_{t}\parallel = o_{P}(1)\).

Proof

Direct calculation yields

$$\begin{aligned} {\mathbb {E}}\left\{ Q_n(\varvec{\beta })\mid {\mathcal {F}}_{N} \right\} = \frac{1}{N}\sum _{i=1}^{N}\frac{{\mathbb {E}}(R_i)\omega _i(\varvec{\beta }_0)\rho _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })}{\pi _i}=Q_N(\varvec{\beta }). \end{aligned}$$

Since \((\varvec{x}_{i}, y_{i})\)’s are i.i.d., we have

$$\begin{aligned}&{\mathbb {E}}\left( Q_n(\varvec{\beta })-Q_N(\varvec{\beta })\mid {\mathcal {F}}_{N}\right) ^{2}\nonumber \\&\quad =\frac{1}{N^2}{\mathbb {V}}\left\{ \sum _{i=1}^{N}\frac{R_i}{\pi _i}\omega _i(\varvec{\beta }_0)\rho _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })\mid {\mathcal {F}}_{N}\right\} \nonumber \\&\quad =\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })-\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad \le \underset{1\le i \le N}{\max }\left( \frac{1}{N\pi _i}\right) \left\{ \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\right\} -\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) \left\{ \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\right\} -\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) , \end{aligned}$$
(A.1)

where the last equality is derived by

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })&\le \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)(\tau \vee (1-\tau ))^2(y_i-\varvec{x}_i'\varvec{\beta })^4\\&\le \frac{2}{N}\sum _{i=1}^{N}(y_i-\varvec{x}_i'\varvec{\beta }_t)^4+\frac{2}{N}\sum _{i=1}^{N}[\varvec{x}_i'(\varvec{\beta }_t-\varvec{\beta })]^4\\&\le \frac{2}{N}\sum _{i=1}^{N}\varepsilon _i^4+\frac{2}{N}\sum _{i=1}^{N}\Vert \varvec{x}_i\Vert ^4\Vert \varvec{\beta }_t-\varvec{\beta }\Vert ^4\\&= \frac{2}{N}\sum _{i=1}^{N}\varepsilon _i^4+2({\mathbb {E}}\Vert \varvec{x}_i\Vert ^4+o_P(1))\Vert \varvec{\beta }_t-\varvec{\beta }\Vert ^4\\&=O_P(1), \end{aligned}$$

where the last equality is due to Conditions 1, 3 and 4. So \({\mathbb {E}}\left\{ Q_n(\varvec{\beta })-Q_N(\varvec{\beta })\mid {\mathcal {F}}_{N} \right\} ^{2}\rightarrow 0\) as \(N\rightarrow \infty \) and \(n\rightarrow \infty \). Combining (A.1) and Chebyshev's inequality, \(Q_n(\varvec{\beta }) - Q_N(\varvec{\beta }) \rightarrow 0\) in conditional probability given \({\mathcal {F}}_{N}\). Since \(Q_n(\varvec{\beta })\) is a convex function of \(\varvec{\beta }\), by the Convexity Lemma of Pollard (1991), \(\underset{\varvec{\beta }\in \Lambda _{\textrm{B}}}{\text {sup}}|Q_n(\varvec{\beta }) - Q_N(\varvec{\beta })|\rightarrow 0\) in conditional probability given \({\mathcal {F}}_{N}\).

\(Q_N(\varvec{\beta })\) has a unique minimizer \(\hat{\varvec{\beta }}_{f}\) by Lemma A of Newey and Powell (1987). Thus, based on Theorem 5.9 of van der Vaart (1998) and the remark following it, we have

$$\begin{aligned} \Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert = o_{P\mid {\mathcal {F}}_{N}}(1). \end{aligned}$$

Xiong and Li (2008) showed that if a sequence is bounded in conditional probability, then it is bounded in unconditional probability, so we have \(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert = o_{P}(1)\). Newey and Powell (1987) proved that \(\Vert \hat{\varvec{\beta }}_{f} - \varvec{\beta }_{t} \Vert = o_{P}(1)\). By the triangle inequality, then

$$\begin{aligned} \Vert \tilde{\varvec{\beta }}-\varvec{\beta }_{t}\Vert \le \Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert + \Vert \hat{\varvec{\beta }}_{f} - \varvec{\beta }_{t}\Vert = o_{P}(1). \end{aligned}$$

This completes the proof. \(\square \)
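
For concreteness, the estimator \(\tilde{\varvec{\beta }}\) analyzed above can be sketched as Poisson subsampling followed by minimization of the inverse-probability-weighted objective \(Q_{n}(\varvec{\beta })\). The sketch below takes the DRF weights \(\omega _{i}(\varvec{\beta }_{0})\) and the probabilities \(\pi _{i}\) as precomputed inputs; it is an illustration of the objective, not the paper's complete algorithm:

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    # asymmetric squared error loss |tau - 1(u <= 0)| u^2
    return np.abs(tau - (u <= 0)) * u**2

def poisson_subsample_als(X, y, pi, omega, tau, rng):
    N = len(y)
    R = rng.random(N) < pi                   # Poisson sampling indicators R_i
    Xs, ys, ps, ws = X[R], y[R], pi[R], omega[R]

    def Qn(beta):                            # Q_n(beta) over the subsample
        return np.sum(ws * rho(ys - Xs @ beta, tau) / ps) / N

    beta0 = np.linalg.lstsq(Xs, ys, rcond=None)[0]   # OLS warm start
    return minimize(Qn, beta0, method="BFGS").x
```

Since \(\rho _{\tau }\) is continuously differentiable and strongly convex (Lemma 1), a quasi-Newton minimizer such as BFGS converges reliably here.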

Lemma 4

Denote \(\varvec{L}^{*}(\varvec{\beta })=\frac{\partial Q_n(\varvec{\beta })}{\partial \varvec{\beta }}\). If Conditions 1, 3, 4 and 5 hold, then as \(n, N \rightarrow \infty \),

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_{t})=O_{P}\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

Proof

For any \(\varvec{\beta } \in \Lambda _{\textrm{B}}\), direct calculation yields

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }) =&-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })(y_i-\varvec{x}_i'\varvec{\beta })R_i\varvec{x}_i}{\pi _i}. \end{aligned}$$
(A.2)

Substituting \(\varvec{\beta }=\varvec{\beta }_{t}\) into (A.2), we have

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_t) =&-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)R_i\varepsilon _i\varvec{x}_i}{\pi _i}. \end{aligned}$$

Let \(L_{j_1}^{*}(\varvec{\beta }_{t})\), \(L_{j_2}^{*}(\varvec{\beta }_{t})\) be the elements of \(\varvec{L}^{*}(\varvec{\beta }_{t})\) and \(x_{ij_1}\), \(x_{ij_2}\) be the elements of \(\varvec{x}_i\), \(j_1,j_2 = 0, 1, \ldots , p\), then the conditional expectation and conditional covariance are

$$\begin{aligned}&{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})=-\frac{2}{N}\sum _{i=1}^{N}\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i=O_{P}\left( \frac{1}{\sqrt{N}}\right) , \end{aligned}$$
(A.3)
$$\begin{aligned}&Cov(L_{j_1}^{*}(\varvec{\beta }_t),L_{j_2}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\nonumber \\&\quad =\frac{4}{N^2}\sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i){\mathbb {V}}(R_i)\varepsilon ^2_ix_{ij_1}x_{ij_2}}{\pi _i^2}\nonumber \\&\quad =\frac{4}{N^2}\sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}}{\pi _i}-\frac{4}{N^2}\sum _{i=1}^{N} \omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}\nonumber \\&\quad \le \underset{1\le i\le N}{\max }\left\{ \frac{1}{N\pi _i}\right\} \frac{4}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}+O_{P}\left( \frac{1}{N}\right) \nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) , \end{aligned}$$
(A.4)

where the last equality is due to Conditions 1, 4 and 5. Equation (A.3) holds because

$$\begin{aligned} {\mathbb {E}}\{{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\}={\mathbb {E}}\left\{ -\frac{2}{N}\sum _{i=1}^{N}\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\right\} =\varvec{0}, \end{aligned}$$

and

$$\begin{aligned} Cov\left\{ {\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\right\}&= \frac{4}{N^2}\sum _{i=1}^{N}{\mathbb {E}}\left( \omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _i)\varepsilon _i^2\varvec{x}_i\varvec{x}_i'\right) \\&\quad -\frac{4}{N^2}\sum _{i=1}^{N}\left\{ {\mathbb {E}}\left( \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\right) {\mathbb {E}}\left( \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i'\right) \right\} \\&=\frac{4}{N^2}\sum _{i=1}^{N}{\mathbb {E}}\left( \omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _i)\varepsilon _i^2\varvec{x}_i\varvec{x}_i'\right) \\&=O\left( \frac{1}{N}\right) . \end{aligned}$$

By Chebyshev's inequality, (A.3) follows. Therefore, from (A.3), (A.4) and Chebyshev's inequality, the result is derived. \(\square \)

Lemma 5

Denote \(Z=\frac{1}{N}\sum _{i=1}^{N}[\rho _{\tau }(\varepsilon _{i}-v_{i})-\rho _{\tau }(\varepsilon _{i})]\). Under Conditions 1 and 3, \(Z\) admits the decomposition

$$\begin{aligned} Z \simeq -\frac{2}{N}\sum _{i=1}^{N}v_{i}\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})+\frac{1}{N}\sum _{i=1}^{N}v_{i}^{2}\psi _{\tau }(\varepsilon _{i}), \end{aligned}$$

where \(v_{i}=\varvec{x}_{i}'\left( \varvec{\beta }-\varvec{\beta }_{t} \right) \), \(\varvec{\beta }\in \Lambda _{\textrm{B}}\).

Proof

From Lemma 1, we have

$$\begin{aligned} Z&=\frac{1}{N}\sum _{i=1}^{N}[\rho _{\tau }(\varepsilon _{i}-v_{i})-\rho _{\tau }(\varepsilon _{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})+r(v_{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})+r(v_{i})-v_{i}^{2}\psi _{\tau }(\varepsilon _{i})+v_{i}^{2}\psi _{\tau }(\varepsilon _{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2v_{i}\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})+v_{i}^{2}\psi _{\tau }(\varepsilon _{i})]+O_P\left( \frac{1}{N}\sum _{i=1}^{N}v_{i}^{2} \right) . \end{aligned}$$

This completes the proof. \(\square \)
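
Numerically, the expansion in Lemma 5 is accurate once the \(v_{i}\) are small; a minimal sketch (the scale of \(v_{i}\) is illustrative):

```python
import numpy as np

def rho(u, tau):
    return np.abs(tau - (u <= 0)) * u**2

def psi(u, tau):
    return np.abs(tau - (u <= 0))

rng = np.random.default_rng(2)
tau, N = 0.3, 10**6
eps = rng.standard_normal(N)
v = 0.01 * rng.standard_normal(N)        # small v_i = x_i'(beta - beta_t)
Z = np.mean(rho(eps - v, tau) - rho(eps, tau))
approx = np.mean(-2.0 * v * eps * psi(eps, tau) + v**2 * psi(eps, tau))
print(Z, approx)   # agree up to the O_P(mean(v_i^2)) remainder
```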

Proof of Theorem 3

The first part of Theorem 3 has been shown in Lemma 3. Now we prove the second part. Let \(\varvec{\xi } = \varvec{\beta }-\varvec{\beta }_t\) and

$$\begin{aligned} \varvec{Z}(\varvec{\xi })=\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\left\{ \rho _{\tau }(\varepsilon _{i}-\varvec{x}_i'\varvec{\xi })-\rho _{\tau }(\varepsilon _{i})\right\} R_i}{N\pi _i}. \end{aligned}$$

Note that \(\varvec{Z}(\varvec{\xi })\) is convex and minimized by \(\tilde{\varvec{\beta }}-\varvec{\beta }_t\). Thus, we focus on \(\varvec{Z}(\varvec{\xi })\) when assessing the properties of \(\tilde{\varvec{\beta }}-\varvec{\beta }_t\). Denote \(\varvec{Z}_{N2}=\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _{i})R_i\varvec{x}_i\varvec{x}_i'}{N\pi _i}\), then

$$\begin{aligned} \varvec{Z}_{N2}&=\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})+{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N}) \\&=o_{P\mid {\mathcal {F}}_{N}}(1)+\varvec{D}_N, \end{aligned}$$

where \(\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})=o_{P\mid {\mathcal {F}}_{N}}(1)\) can be derived by (A.5), (A.6) and Chebyshev’s inequality. Denote \(\varvec{Z}_{N3}=\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})\), then

$$\begin{aligned} {\mathbb {E}}(\varvec{Z}_{N3}\mid {\mathcal {F}}_{N})=\varvec{0}, \end{aligned}$$
(A.5)

and let \(Z_{N3j_1}\), \(Z_{N3j_2}\) be the elements of \(\varvec{Z}_{N3}\) and \(x_{ij_1}\), \(x_{ij_2}\) be the elements of \(\varvec{x}_i\), \(j_1,j_2 =0, 1, \ldots , p\),

$$\begin{aligned} Cov(Z_{N3j_1},Z_{N3j_2}\mid {\mathcal {F}}_{N})&\le \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})(x_{ij_1}x_{ij_2})^2\pi _i(1-\pi _i)}{N^2\pi _{i}^2}\nonumber \\&\le \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})x_{ij_1}x_{ij_2}}{N^2\pi _{i}}\nonumber \\&\le \underset{1\le i\le N}{\max }\left( \frac{1}{N\pi _i}\right) \left( \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})(x_{ij_1}x_{ij_2})^2}{N}\right) \nonumber \\&=O_P\left( \frac{1}{n}\right) . \end{aligned}$$
(A.6)

From Lemma 5, we have

$$\begin{aligned} \varvec{Z}(\varvec{\xi })&= -\varvec{\xi }'\sum _{i=1}^{N}\frac{2\omega _i(\varvec{\beta }_0)\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})R_i\varvec{x}_i}{N\pi _i}+\varvec{\xi }'\sum _{i=1}^{N}\frac{ \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _{i})\varvec{x}_i\varvec{x}_i'}{N}\varvec{\xi }+o_P(1) \\&= \varvec{\xi }'\varvec{L}^{*}(\varvec{\beta }_t)+\varvec{\xi }'\varvec{D}_N\varvec{\xi }+o_P(1). \end{aligned}$$

Since \(\varvec{Z}(\varvec{\xi })\) is convex, and from Lemma 2,

$$\begin{aligned} \tilde{\varvec{\beta }}-\varvec{\beta }_t=-{\frac{1}{2}}\varvec{D}_N^{-1}\varvec{L}^{*}(\varvec{\beta }_t)+o_P(1). \end{aligned}$$
(A.7)

By Condition 2 and Lemma 4, we have

$$\begin{aligned} \tilde{\varvec{\beta }}-\varvec{\beta }_t=O_{P\mid {\mathcal {F}}_{N}}\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

This completes the proof of Theorem 3. \(\square \)

Proof of Theorem 4

By Lemma 4,

$$\begin{aligned} {\mathbb {E}}\{\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N}\}=O_{P\mid {\mathcal {F}}_{N}}\left( \frac{1}{\sqrt{N}}\right) , \quad {\mathbb {V}}\{\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N}\}=4\varvec{V}_{\pi }+o_{P\mid {\mathcal {F}}_{N}}(1). \end{aligned}$$
(A.8)

Now we check the Lindeberg-Feller condition. Note that

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_t)=-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)R_i\varepsilon _i\varvec{x}_i}{\pi _i}:=-2\sum _{i=1}^{N}\varvec{\eta }_i. \end{aligned}$$

For every \(\epsilon >0\), we have

$$\begin{aligned} \sum _{i=1}^{N}{\mathbb {E}}\{\Vert \varvec{\eta }_i\Vert ^2\mathbb {1}(\Vert \varvec{\eta }_i\Vert >\epsilon )\mid {\mathcal {F}}_{N}\}&\le \frac{1}{\epsilon }\sum _{i=1}^{N}{\mathbb {E}}\{\Vert \varvec{\eta }_i\Vert ^3\mid {\mathcal {F}}_{N}\}\\&\le \frac{1}{\epsilon }\sum _{i=1}^{N}{\mathbb {E}}\left\{ \frac{\Vert \omega _i(\varvec{\beta }_0)R_i\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\Vert ^3}{N^3\pi _i^3}\mid {\mathcal {F}}_{N}\right\} \\&\le \frac{1}{\epsilon }\underset{1\le i\le N}{\max }\left\{ \frac{1}{(N\pi _i)^2}\right\} \frac{1}{N}\sum _{i=1}^{N}\Vert \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\Vert ^3\\&\le \frac{1}{\epsilon }O_P\left( \frac{1}{n^2}\right) =o_P(1), \end{aligned}$$

where the last inequality holds by Conditions 1, 4 and 5.

Given \({\mathcal {F}}_{N}\), using (A.8) and Lindeberg-Feller central limit theorem,

$$\begin{aligned} (4\varvec{V}_{\pi })^{-1/2}\{\varvec{L}^{*}(\varvec{\beta }_t)-{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\}\overset{d}{\rightarrow } {\mathbb {N}}(\varvec{0},\varvec{I}) \end{aligned}$$
(A.9)

with \(\varvec{V}_{\pi }^{-1/2}{\mathbb {E}}\left( \varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N} \right) = O_{P}\left( \sqrt{{n}/{N}}\right) = o_{P}(1)\), since \(\varvec{V}_{\pi }^{-1/2}=O_{P}(\sqrt{n})\) and \({\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})=O_{P}(1/\sqrt{N})\) by (A.8).

By Theorem 3, (A.7), (A.9) and Slutsky's theorem, we conclude that as \(n\rightarrow \infty \), \(N\rightarrow \infty \), conditional on \({\mathcal {F}}_{N}\), with probability approaching one,

$$\begin{aligned} \{\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N\}^{-1/2}(\tilde{\varvec{\beta }}-\varvec{\beta }_t)\overset{d}{\rightarrow } {\mathbb {N}}(\varvec{0},\varvec{I}), \end{aligned}$$

where \(\varvec{D}_N\) and \(\varvec{V}_{\pi }\) are defined in (12) and (13), respectively. \(\square \)
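
In practice the limiting covariance \(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N\) must be estimated. One natural plug-in, sketched below, replaces the full-data sums defining \(\varvec{D}_N\) and \(\varvec{V}_{\pi }\) by inverse-probability-weighted subsample sums; this construction is our illustration and is not spelled out in the appendix:

```python
import numpy as np

def psi(u, tau):
    return np.abs(tau - (u <= 0))

def sandwich_cov(Xs, ys, ps, ws, beta, tau, N):
    """Plug-in D_N^{-1} V_pi D_N^{-1} from a subsample (Xs, ys) drawn with
    probabilities ps, with DRF weights ws; N is the full-data size."""
    e = ys - Xs @ beta
    wp = ws * psi(e, tau)
    # D_N = (1/N) sum_i omega_i psi_tau(eps_i) x_i x_i',
    # estimated by weighting each sampled term with 1/pi_i
    D = (Xs * (wp / ps)[:, None]).T @ Xs / N
    # V_pi = (1/N^2) sum_i omega_i^2 psi_tau^2(eps_i) eps_i^2 x_i x_i' / pi_i,
    # estimated with one extra factor 1/pi_i for the sampling
    g = (wp * e)[:, None] * Xs
    V = (g / ps[:, None] ** 2).T @ g / N**2
    Dinv = np.linalg.inv(D)
    return Dinv @ V @ Dinv
```

Normal-approximation confidence intervals for \(\varvec{\beta }_t\) then follow from the display above with this covariance estimate in place of \(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N\).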

Proof of Theorem 5

Define \(h_i^{\textrm{Aopt}}=\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert \), \(i=1,\ldots ,N\). Without loss of generality, assume that \(h_i^{\textrm{Aopt}}>0\) for all \(i\), and set \(h_{N+1}^{\textrm{Aopt}}=+\infty \). Minimizing \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) amounts to solving the following optimization problem:

$$\begin{aligned}&\min \ {\tilde{H}}:=\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N) \\&s.t. \ \sum _{i=1}^{N}\pi _i=n, 0\le \pi _i\le 1 \ \text {for } i=1,\ldots ,N. \end{aligned}$$

Without loss of generality, we assume that \(h_1^{\textrm{Aopt}}\le h_2^{\textrm{Aopt}}\le \ldots \le h_N^{\textrm{Aopt}}\),

$$\begin{aligned} \text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)&=\frac{1}{N^2}\sum _{i=1}^{N}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\\&=\frac{1}{N^2}\frac{1}{n}\left( \sum _{i=1}^{N}\pi _i\right) \left( \sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\right) \\&\ge \frac{1}{N^2}\frac{1}{n}\left( \sum _{i=1}^{N}h_i^{\textrm{Aopt}}\right) ^2, \end{aligned}$$

where the last step is from the Cauchy-Schwarz inequality and the equality holds if and only if \(\pi _{i}\propto h_i^{\textrm{Aopt}}\). Now we consider two cases:

Case 1. If all \(\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}\le 1\), then \(\pi _i^{\textrm{Aopt}}=\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}\), where \(i=1,\ldots ,N\).

Case 2. Assume there exists some \(i\) such that \(\pi _i^{\textrm{Aopt}}=\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}>1\); by the definition of \(k\), the number of such \(i\) is \(k\). The original problem then turns into the following optimization problem:

$$\begin{aligned}&\min \ \frac{1}{N^2}\sum _{i=1}^{N-k}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&s.t. \ \sum _{i=1}^{N-k}\pi _i=n-k, 0\le \pi _i\le 1 \ \text {for } i=1,\ldots ,N-k,\ \pi _{N-k+1}=\ldots =\pi _{N}=1. \end{aligned}$$

Similarly to the derivation of \(\pi _i^{\textrm{Aopt}}\) in Case 1, by the Cauchy–Schwarz inequality,

$$\begin{aligned}&\frac{1}{N^2}\sum _{i=1}^{N-k}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&\quad =\frac{1}{N^2}\frac{1}{(n-k)}\left( \sum _{i=1}^{N-k}\pi _i\right) \left( \sum _{i=1}^{N-k}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\right) \\&\quad \ge \frac{1}{N^2}\frac{1}{(n-k)}\left( \sum _{i=1}^{N-k}h_i^{\textrm{Aopt}}\right) ^2, \end{aligned}$$

and the equality holds if and only if \(\pi _{i}\propto h_i^{\textrm{Aopt}}\), i.e. \(\pi _{i}^{\textrm{Aopt}}=\frac{(n-k)h_i^{\textrm{Aopt}}}{\sum _{j=1}^{N-k}h_j^{\textrm{Aopt}}}\), \(i=1,\ldots ,N-k\). Assume there exists \(\tilde{\textrm{M}}\) such that

$$\begin{aligned} \underset{1\le i\le N}{\max }\pi _{i}^{\textrm{Aopt}}=\underset{1\le i\le N}{\max }\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}=1, \end{aligned}$$

and \(h_{N-k}^{\textrm{Aopt}}\le \tilde{\textrm{M}}\le h_{N-k+1}^{\textrm{Aopt}}\), so \(\sum _{i=1}^{N-k}h_i^{\textrm{Aopt}}=(n-k)\tilde{\textrm{M}}\) holds. Thus, the set \(\{1, \ldots , N\}\) can be divided into two parts, i.e. \(\{1, \ldots , N-k\}\) and \(\{N-k+1, \ldots , N\}\), which correspond to \(\pi _{i}^{\textrm{Aopt}}=\frac{h_i^{\textrm{Aopt}}}{\tilde{\textrm{M}}}\) and \(\pi _{i}^{\textrm{Aopt}}=1\). So we have

$$\begin{aligned} \text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\nonumber \\&=\frac{1}{N^2}\left\{ (n-k)\tilde{\textrm{M}}^2+\sum _{i=N-k+1}^{N}(h_i^{\textrm{Aopt}})^2\right\} . \end{aligned}$$
(A.10)

Display (A.10) shows that the lower bound of \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) is attained when equality holds in the Cauchy–Schwarz inequality.

Substituting \(\pi _{i}^{\textrm{Aopt}}=\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}\) into \({\tilde{H}}\) gives

$$\begin{aligned} {\tilde{H}}&:=\frac{1}{N^2}\sum _{i=1}^{N}\left\{ \frac{1}{\pi _{i}^{\textrm{Aopt}}}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \nonumber \\&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _{i}^{\textrm{Aopt}}}(h_i^{\textrm{Aopt}})^2\nonumber \\&=\frac{1}{N^2}\left\{ (n-k)\tilde{\textrm{M}}^2+\sum _{i=N-k+1}^{N}(h_i^{\textrm{Aopt}})^2\right\} , \end{aligned}$$
(A.11)

which coincides with the lower bound of \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) in (A.10). Therefore, by (A.10) and (A.11), \(\pi _{i}^{\textrm{Aopt}}=\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}\) is the optimal solution.

Now we prove the existence and rationality of \(\tilde{\textrm{M}}\) for \(\tilde{\textrm{M}} \in (h_{N-k}^{\textrm{Aopt}}, h_{N-k+1}^{\textrm{Aopt}}]\). The definition of \(k\) implies that

$$\begin{aligned} \frac{(n-k+1)h_{N-k+1}^{\textrm{Aopt}}}{\sum _{i=1}^{N-k+1} h_{i}^{\textrm{Aopt}}} \ge 1 \quad \text{ and } \quad \frac{(n-k) h_{N-k}^{\textrm{Aopt}}}{\sum _{i=1}^{N-k} h_{i}^{\textrm{Aopt}}}<1. \end{aligned}$$

Taking \(\tilde{\textrm{M}}_{1}=h_{N-k+1}^{\textrm{Aopt}}\) and \(\tilde{\textrm{M}}_{2}=h_{N-k}^{\textrm{Aopt}}\), we have

$$\begin{aligned} \frac{(n-k+1) h_{N-k+1}^{\textrm{Aopt}}+(k-1) \tilde{\textrm{M}}_{1}}{\sum _{i=1}^{N-k+1} h_{i}^{\textrm{Aopt}}+(k-1) \tilde{\textrm{M}}_{1}} \ge 1 \quad \text{ and } \quad \frac{(n-k) h_{N-k}^{\textrm{Aopt}}+k \tilde{\textrm{M}}_{2}}{\sum _{i=1}^{N-k} h_{i}^{\textrm{Aopt}}+k \tilde{\textrm{M}}_{2}}<1, \end{aligned}$$

which implies the fact that

$$\begin{aligned} n\frac{h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{1}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{1})} \ge 1 \text{ and } n\frac{h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{2}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{2})}<1. \end{aligned}$$

Since \(\underset{1\le i\le N}{\max }\frac{h_{i}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}\) is continuous in \(\tilde{\textrm{M}}\), the existence of \(\tilde{\textrm{M}}\) follows.

For the rationality of \(\tilde{\textrm{M}}\), it suffices to show that \(\underset{1\le i\le N}{\max }\frac{h_{i}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}=\frac{h_{N}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}\) is nondecreasing in \(\tilde{\textrm{M}}\) on \((h_{1}^{\textrm{Aopt}},h_{N}^{\textrm{Aopt}})\), so that it equals \(\frac{1}{n}\) at the threshold. For any \(h_N^{\textrm{Aopt}}\ge \tilde{\textrm{M}}'\ge \tilde{\textrm{M}}\), we have \(\tilde{\textrm{M}}'\wedge h_N^{\textrm{Aopt}}\ge \tilde{\textrm{M}}\wedge h_N^{\textrm{Aopt}}\) and \(\left( \tilde{\textrm{M}}' / \tilde{\textrm{M}}\right) \sum _{i=1}^{N}(h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}) \ge \sum _{i=1}^{N}(h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}')\). So the rationality is proved. \(\square \)
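
In code, \(\pi _{i}^{\textrm{Aopt}}\) can be computed without locating \(\tilde{\textrm{M}}\) directly: increase \(k\) until the feasibility condition defining \(k\) holds. A minimal sketch (assuming \(n<N\) and strictly positive \(h_{i}^{\textrm{Aopt}}\)):

```python
import numpy as np

def a_optimal_probs(h, n):
    """Capped A-optimal probabilities from Theorem 5:
    pi_i = n (h_i ^ M) / sum_j (h_j ^ M), with max_i pi_i = 1."""
    N = len(h)
    order = np.argsort(h)                    # sort h ascending
    hs = h[order]
    csum = np.cumsum(hs)
    k = 0                                    # number of pi_i truncated at 1
    # smallest k with (n - k) h_{N-k} <= sum_{j <= N-k} h_j
    while k < n and (n - k) * hs[N - k - 1] > csum[N - k - 1]:
        k += 1
    pi = np.empty(N)
    pi[order[:N - k]] = (n - k) * hs[:N - k] / csum[N - k - 1]
    if k > 0:
        pi[order[N - k:]] = 1.0              # the k largest h_i get pi_i = 1
    return pi
```

Poisson subsampling then selects unit \(i\) independently with probability \(\pi _{i}^{\textrm{Aopt}}\), as in the sketch after Lemma 3, so the expected subsample size is \(\sum _{i}\pi _{i}^{\textrm{Aopt}}=n\).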

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ren, M., Zhao, S., Wang, M. et al. Robust optimal subsampling based on weighted asymmetric least squares. Stat Papers 65, 2221–2251 (2024). https://doi.org/10.1007/s00362-023-01480-7
