Robust estimation for nonrandomly distributed data

Li, Shaomin; Wang, Kangning; Xu, Yong

doi:10.1007/s10463-022-00852-4

Published: 12 October 2022

Volume 75, pages 493–509, (2023)

Annals of the Institute of Statistical Mathematics

Shaomin Li¹, Kangning Wang² & Yong Xu³


Abstract

In recent years, many methodologies for distributed data have been developed. However, two problems remain. First, most of these methods require the data to be randomly and uniformly distributed across different machines. Second, most of them are not robust. To solve these problems, we propose a distributed pilot modal regression (MR) estimator, which achieves robustness and can adapt when the data are stored nonrandomly. First, we collect a random pilot sample from different machines; then, we approximate the global MR objective function by a communication-efficient surrogate that can be efficiently evaluated from the pilot sample and the local gradients. The final estimator is obtained by minimizing the surrogate function on the master machine, while the other machines only need to calculate their gradients. Theoretical results show that the new estimator is asymptotically as efficient as the global MR estimator. Simulation studies illustrate the utility of the proposed approach.
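The procedure described above can be sketched in code. The following is only an illustration under assumptions that this preview does not state: a linear model, a Gaussian kernel $\phi_h$, a surrogate of the form $Q_{\mathcal{P}}^h(\beta)-\langle \nabla Q_{\mathcal{P}}^h(\hat\beta_{\mathcal{P}})-\nabla Q_N^h(\hat\beta_{\mathcal{P}}),\beta\rangle$ (chosen to agree with the surrogate gradient used in the Appendix), Bernoulli pilot sampling, and plain gradient descent as the optimizer. The names `kernel_grad`, `minimize_gd` and `pilot_surrogate_estimator` are hypothetical.

```python
import numpy as np

def kernel_grad(X, y, beta, h):
    """Gradient of Q^h(beta) = -(1/m) * sum_i phi_h(y_i - x_i' beta), Gaussian phi_h (assumed)."""
    r = y - X @ beta
    phi = np.exp(-0.5 * (r / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    # d/dbeta of -phi_h(r_i) equals -(r_i / h^2) * phi_h(r_i) * x_i
    return -(X * (r * phi / h**2)[:, None]).mean(axis=0)

def minimize_gd(grad_fn, beta0, step=2.0, iters=500):
    """Plain gradient descent; an assumed optimizer, not the paper's algorithm."""
    beta = beta0.copy()
    for _ in range(iters):
        beta = beta - step * grad_fn(beta)
    return beta

def pilot_surrogate_estimator(machines, n_pilot, h, rng):
    """machines: list of (X_k, y_k) blocks, possibly stored nonrandomly."""
    N = sum(len(y) for _, y in machines)
    # Step 1: every machine sends each of its rows with probability n_pilot / N,
    # so the pooled pilot sample is a random sample of the whole data set.
    Xp, yp = [], []
    for X, y in machines:
        take = rng.random(len(y)) < n_pilot / N
        Xp.append(X[take])
        yp.append(y[take])
    Xp, yp = np.vstack(Xp), np.concatenate(yp)
    # Step 2: pilot estimator on the master, started from least squares.
    beta0 = np.linalg.lstsq(Xp, yp, rcond=None)[0]
    beta_p = minimize_gd(lambda b: kernel_grad(Xp, yp, b, h), beta0)
    # Step 3: one communication round; machines return local gradients at beta_p.
    g_N = sum(len(y) * kernel_grad(X, y, beta_p, h) for X, y in machines) / N
    g_P = kernel_grad(Xp, yp, beta_p, h)
    # Step 4: minimize the surrogate, whose gradient is grad Q_P(b) - g_P + g_N.
    beta_tilde = minimize_gd(lambda b: kernel_grad(Xp, yp, b, h) - g_P + g_N, beta_p)
    return beta_p, beta_tilde

# Toy usage: machines hold blocks sorted by the response (nonrandom storage).
rng = np.random.default_rng(0)
N, K = 20000, 10
beta_true = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ beta_true + rng.normal(size=N)
order = np.argsort(y)
machines = [(X[idx], y[idx]) for idx in np.array_split(order, K)]
beta_pilot, beta_tilde = pilot_surrogate_estimator(machines, n_pilot=2000, h=1.0, rng=rng)
```

Only the pilot rows and one gradient vector per machine are communicated in this sketch, which is the sense in which the surrogate is communication-efficient.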

References

Battey, H., Fan, J., Liu, H., Lu, J., Zhu, Z. (2018). Distributed testing and estimation under sparse high dimensional models. Annals of Statistics, 46, 1352–1382.
Chen, X., Liu, W., Zhang, Y. (2019). Quantile regression under memory constraint. Annals of Statistics, 47, 3244–3273.
Chen, Y., Genovese, C., Tibshirani, R., Wasserman, L. (2016). Nonparametric modal regression. Annals of Statistics, 44, 489–514.
Duchi, J., Jordan, M., Wainwright, M., Zhang, Y. (2014). Optimality guarantees for distributed statistical estimation. arXiv preprint arXiv:1405.0782.
Fan, J., Wang, D., Wang, K., Zhu, Z. (2019). Distributed estimation of principal eigenspaces. Annals of Statistics, 47, 3009.
Fan, J., Guo, Y., Wang, K. (2021). Communication-efficient accurate statistical estimation. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.1969238.
Feng, Y., Fan, J., Suykens, J. (2020). A statistical learning approach to modal regression. Journal of Machine Learning Research, 21(2), 1–35.
Gopal, S., Yang, Y. (2013). Distributed training of large-scale logistic models. In: International Conference on Machine Learning, pp. 289–297.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Jordan, M. I., Lee, J. D., Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681.
Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46, 33–50.
Lee, J., Liu, Q., Sun, Y., Taylor, J. (2017). Communication-efficient sparse regression. Journal of Machine Learning Research, 18, 115–144.
Pan, R., Ren, T., Guo, B., Li, F., Li, G., Wang, H. (2021). A note on distributed quantile regression by pilot sampling and one-step updating. Journal of Business and Economic Statistics. https://doi.org/10.1080/07350015.2021.1961789.
Shamir, O., Srebro, N., Zhang, T. (2014). Communication-efficient distributed optimization using an approximate newton-type method. International Conference on Machine Learning, 32, 1000–1008.
Tu, J., Liu, W., Mao, X., Chen, X. (2021). Variance reduced median-of-means estimator for byzantine-robust distributed inference. Journal of Machine Learning Research, 22(84), 1–67.
Wang, F., Huang, D., Zhu, Y., Wang, H. (2020). Efficient estimation for generalized linear models on a distributed system with nonrandomly distributed data. arXiv preprint arXiv:2004.02414.
Wang, J., Kolar, M., Srebro, N., Zhang, T. (2017). Efficient distributed learning with sparsity. International Conference on Machine Learning, 70, 3636–3645.
Wang, K., Li, S. (2021). Robust distributed modal regression for massive data. Computational Statistics and Data Analysis, 160, 107225.
Wang, K., Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.
Wang, K., Li, S., Sun, X., Lin, L. (2019). Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection. Computational Statistics and Data Analysis, 133, 257–276.
Wang, K., Wang, H., Li, S. (2022). Renewable Quantile Regression for Streaming Datasets. Knowledge-based Systems, 235, 107675.
Wang, K., Zhang, B., Sun, X., Li, S. (2022). Efficient statistical estimation for a non-randomly distributed system with application to large-scale data neural network. Expert Systems With Applications, 197, 116698.
Yao, W., Li, L. (2014). A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41, 656–671.
Yao, W., Lindsay, B., Li, R. (2012). Local modal regression. Journal of Nonparametric Statistics, 24, 647–663.
Zhang, Y., Duchi, J. C., Wainwright, M. (2013). Communication-efficient algorithms for statistical optimization. Journal of Machine Learning Research, 14, 3321–3363.
Zhao, W., Zhang, R., Liu, J., Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.
Zhu, X., Li, F., Wang, H. (2021). Least-square approximation for a distributed system. Journal of Computational and Graphical Statistics, 30(4), 1004–1018.

Acknowledgements

The research was supported by NNSF project of China (12101056 and 11901356).

Author information

Authors and Affiliations

Center for Statistics and Data Science, Beijing Normal University, No.18 Jinfeng Road, Zhuhai, 519087, China
Shaomin Li
School of Statistics, Shandong Technology and Business University, No.191 Binhai Middle Road, Yantai, 264005, China
Kangning Wang
School of Business Administration, Shandong Technology and Business University, No.191 Binhai Middle Road, Yantai, 264005, China
Yong Xu

Corresponding authors

Correspondence to Kangning Wang or Yong Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

First, we compute the order of $\Vert \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\Vert _{\infty }$. Letting ${\varvec{\Sigma }}=E({\varvec{X}}{\varvec{X}}^{T})$, we have

$$\begin{aligned}&~~~~\Vert \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\Vert _{\infty } \\&=O_{p}\left( \left\| {\varvec{\Sigma }}-\frac{1}{N}\widetilde{{\varvec{X}}} \widetilde{{\varvec{X}}}^{T} \right\| _{\infty }\right) +O_{p}\left( \left\| \frac{1}{n}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}-{\varvec{\Sigma }} \right\| _{\infty }\right) +O_{p}(N^{-1/2}), \end{aligned}$$

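where $\widetilde{{\varvec{X}}}=({\varvec{X}}_{1},\dots ,{\varvec{X}}_{N})$ is the $p\times N$ matrix of all covariate vectors and $\widetilde{{\varvec{X}}}_{{\mathcal {P}}}=({\varvec{X}}_{i},i\in {{\mathcal {P}}})$ is its pilot-sample counterpart, so that both $\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}$ and $\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}$ are $p\times p$. It is easy to obtain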

$$\begin{aligned} P\left( \left| \frac{1}{N}\sum \nolimits _{i=1}^{N}X_{ij}X_{ik}-{\varvec{\Sigma }}_{jk}\right| >t\right) \leqslant \exp (-c_{1}\min (t^{2},t)N), \end{aligned}$$

where $c_{1}$ is a constant that depends on ${\varvec{\Sigma }}$. By a union bound over all (j, k) pairs,

$$\begin{aligned} P\left( \left\| \frac{1}{N}\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}-{\varvec{\Sigma }}\right\| _{\infty }>t\right) \leqslant \exp (2\log p-c_{1}\min (t^{2},t)N). \end{aligned}$$

Thus, letting $t=C\sqrt{\frac{\log p}{N}}$ for a sufficiently large constant $C$, we have $\Vert {\varvec{\Sigma }}-\frac{1}{N}\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}\Vert _{\infty }=O_{p}(\sqrt{\log p/N})$, which is $O_{p}(N^{-1/2})$ when the dimension $p$ is fixed. By a similar argument, $\Vert \frac{1}{n}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}-{\varvec{\Sigma }}\Vert _{\infty }=O_{p}(n^{-1/2})$. Since $n\leqslant N$, the pilot term dominates, and we get

$$\begin{aligned} \left\| \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\right\| _{\infty }=O_{p}\left( n^{-1/2}\right) . \end{aligned}$$
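The sup-norm concentration of the sample second-moment matrix that drives this rate can be checked numerically. The sketch below assumes standard normal covariates, so that ${\varvec{\Sigma }}$ is the identity; the helper `max_norm_error` and the chosen sample sizes are illustrative only.

```python
import numpy as np

def max_norm_error(m, p, rng):
    """Largest entrywise deviation of the sample second-moment matrix from Sigma = I."""
    X = rng.normal(size=(m, p))          # m observations of a p-dimensional covariate
    return np.max(np.abs(X.T @ X / m - np.eye(p)))

rng = np.random.default_rng(1)
p = 5
sizes = [1000, 4000, 16000, 64000]
# Average the error over 50 replications for each sample size.
errors = [np.mean([max_norm_error(m, p, rng) for _ in range(50)]) for m in sizes]
# If the error is of order m^{-1/2}, these rescaled values should be roughly constant.
scaled = [e * np.sqrt(m) for e, m in zip(errors, sizes)]
```

Replacing `m` by the pilot size $n$ gives the $n^{-1/2}$ term, and by the full size $N$ the $N^{-1/2}$ term, in the bound above.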

By applying Lemma 6 in Zhang et al. (2013) with $F_{1}=\widetilde{{L}}_\text {N}^h({\varvec{\beta }})$ in the notation therein, we can also obtain

$$\begin{aligned} \left\| \widetilde{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_\text {N} \right\| =O_{p}\left( \left\| \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})\right\| \right) . \end{aligned}$$

A simple calculation yields

$$\begin{aligned} \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})=\nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})- \nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})+\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}), \end{aligned}$$

and, noting that $\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})={\varvec{0}}$, we obtain

$$\begin{aligned} \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})= \left( \nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right) - \left( \nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right) . \end{aligned}$$

By the integral form of Taylor’s expansion, we have

$$\begin{aligned} \nabla Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})={\varvec{H}}_{{\mathcal {P}}}(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}) \text{ and } \nabla Q_{N}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla Q_{N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})={\varvec{H}}_{N}(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}), \end{aligned}$$

where ${\varvec{H}}_{{\mathcal {P}}}=\int _{0}^{1}\nabla ^{2}Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt$ and ${\varvec{H}}_{N}=\int _{0}^{1}\nabla ^{2}Q_{N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt$ satisfy $\Vert {\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}Q_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )$ and $\Vert {\varvec{H}}_{N}-\nabla ^{2}Q_{N}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )$, respectively. Thus, we have

$$\begin{aligned}&~~~~\left\| \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})\right\| \\&=\left\| ({\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}{{Q}}_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)) (\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}})-({\varvec{H}}_{N}-\nabla ^{2}{{Q}}_\text {N}^h({{\varvec{\beta }}}_0)) (\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right. \\&~~~~~~\left. +(\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0}))(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}) \right\| \\&\leqslant \Vert {\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}{{Q}}_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert {\varvec{H}}_{N}-\nabla ^{2}{{Q}}_\text {N}^h({{\varvec{\beta }}}_0)\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&~~~~~~+\Vert \nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&=(O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert )+O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert )+\Vert \nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})- \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})\Vert )\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&=O_{p}\left( \frac{1}{\sqrt{n}}\right) \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert . \end{aligned}$$

This completes the proof of (a).

Together with

$$\begin{aligned} \hat{{\varvec{\beta }}}_\text {N}-{\varvec{\beta }}_{0}=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\nabla Q_\text {N}^h({{\varvec{\beta }}}_{0})+O_{p}\left( \frac{1}{N}\right) , \end{aligned}$$

we can get that

$$\begin{aligned}&~~~~\widetilde{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0}\\&=(\widetilde{{{\varvec{\beta }}}}_\text {N}-\hat{{{\varvec{\beta }}}}_\text {N})+( \hat{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0})\\&=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\left( \frac{1}{N}\sum \nolimits _{i=1}^{N}{\varvec{X}}_{i}{\dot{\phi }}_{h}\left( \epsilon _{i}\right) \right) + O_{p}\left( \frac{1}{N}+n^{-1/2}\Vert \hat{{\varvec{\beta }}}_{{\mathcal {P}}}-\hat{{\varvec{\beta }}}_\text {N}\Vert \right) . \end{aligned}$$

Under the assumptions $n/\sqrt{N}\rightarrow \infty$ and $\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert =O_{p}(n^{-1/2})$, we can obtain that $\sqrt{N}(\frac{1}{N}+n^{-1/2}\Vert \hat{{\varvec{\beta }}}_{{\mathcal {P}}}-\hat{{\varvec{\beta }}}_\text {N}\Vert )=o_{p}(1)$. Thus, we have that
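A short arithmetic check shows why the pilot size must grow faster than $\sqrt{N}$. Using $\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert =O_{p}(n^{-1/2})$, the scaled remainder is of order $\sqrt{N}(1/N+1/n)$. The function name `remainder_scale` and the two pilot-size rules compared below are illustrative assumptions.

```python
import math

def remainder_scale(N, n):
    """Order of sqrt(N) * (1/N + n^{-1/2} * n^{-1/2}), the scaled remainder term."""
    return math.sqrt(N) * (1.0 / N + 1.0 / n)

for N in (10**4, 10**6, 10**8):
    fast = remainder_scale(N, n=N**0.75)       # n / sqrt(N) -> infinity
    slow = remainder_scale(N, n=math.sqrt(N))  # n / sqrt(N) stays bounded
    print(N, fast, slow)
```

With $n=N^{3/4}$ the remainder decays like $N^{-1/4}$, whereas with $n=\sqrt{N}$ it stays close to 1 and does not vanish.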

$$\begin{aligned}&~~~~\sqrt{N}(\widetilde{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0})\\&=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\left( \frac{1}{\sqrt{N}} \sum \nolimits _{i=1}^{N}{\varvec{X}}_{i}{\dot{\phi }}_{h}\left( \epsilon _{i}\right) \right) +o_{p}(1)\\&\rightarrow _d N({\varvec{0}}, \xi (h){\varvec{\Sigma }}^{-1}), \end{aligned}$$

where $\xi (h)=\frac{E({\dot{\phi }}_{h}^2(\epsilon ))}{[E(\ddot{\phi }_{h}(\epsilon ))]^2}$. The proof of (b) is completed. $\square$
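To give a feel for the variance factor $\xi (h)$, the sketch below evaluates it under two assumptions that are not taken from this appendix: a Gaussian kernel $\phi _{h}(t)=(h\sqrt{2\pi })^{-1}\exp (-t^{2}/(2h^{2}))$ and standard normal errors. Under these assumptions a direct calculation gives $\xi (h)=(1+h^{-2})^{3}/(1+2h^{-2})^{3/2}$, which the code compares with numerical integration. Both function names are hypothetical.

```python
import numpy as np

def xi_numeric(h, half_width=12.0, points=200001):
    """xi(h) = E[phi_h'(eps)^2] / (E[phi_h''(eps)])^2 by a Riemann sum, eps ~ N(0, 1)."""
    t = np.linspace(-half_width, half_width, points)
    dt = t[1] - t[0]
    dens = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)            # error density (assumed)
    phi = np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    d1 = -(t / h**2) * phi                                        # first derivative of phi_h
    d2 = (t**2 / h**2 - 1.0) / h**2 * phi                         # second derivative of phi_h
    num = np.sum(d1**2 * dens) * dt
    den = (np.sum(d2 * dens) * dt) ** 2
    return num / den

def xi_closed_form(h):
    """Closed form valid only for the Gaussian kernel with N(0, 1) errors."""
    u = 1.0 / h**2
    return (1.0 + u) ** 3 / (1.0 + 2.0 * u) ** 1.5

for h in (0.5, 1.0, 2.0, 10.0):
    print(h, xi_numeric(h), xi_closed_form(h))
```

In this special case $\xi (h)\geqslant 1$ and $\xi (h)\rightarrow 1$ as $h\rightarrow \infty$, the error variance, so a large bandwidth recovers the least-squares variance ${\varvec{\Sigma }}^{-1}$.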

About this article

Cite this article

Li, S., Wang, K. & Xu, Y. Robust estimation for nonrandomly distributed data. Ann Inst Stat Math 75, 493–509 (2023). https://doi.org/10.1007/s10463-022-00852-4

Received: 04 May 2021
Revised: 08 July 2022
Accepted: 16 August 2022
Published: 12 October 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10463-022-00852-4
