Robust estimation for nonrandomly distributed data

Annals of the Institute of Statistical Mathematics

Abstract

In recent years, many methodologies for distributed data have been developed. However, two problems remain. First, most of these methods require the data to be randomly and uniformly distributed across machines. Second, most of them are not robust. To address both problems, we propose a distributed pilot modal regression (MR) estimator, which is robust and adapts to data that are stored nonrandomly. We first collect a random pilot sample from the different machines; we then approximate the global MR objective function by a communication-efficient surrogate that can be evaluated from the pilot sample and the local gradients alone. The final estimator is obtained by minimizing the surrogate function on the master machine, while the other machines only need to compute their gradients. Theoretical results show that the new estimator is asymptotically as efficient as the global MR estimator. Simulation studies illustrate the utility of the proposed approach.
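
The pilot-sample procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mr_loss`, `mr_grad`, `minimize_gd`, `pilot_surrogate_mr`) are hypothetical, the kernel is taken to be Gaussian, the objective is written as a negative kernel average so that it is minimized, the machines are assumed to hold equally sized shards, and plain gradient descent stands in for whatever optimizer is actually used. The surrogate is the one whose gradient matches the identity used in the proof of Theorem 1 in the Appendix.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def mr_loss(beta, X, y, h):
    # Q^h(beta) = -(1/n) * sum_i phi_h(y_i - x_i' beta); the minus sign is an assumption.
    r = y - X @ beta
    return -np.mean(np.exp(-0.5 * (r / h) ** 2) / (h * SQRT_2PI))

def mr_grad(beta, X, y, h):
    # Gradient of mr_loss with respect to beta.
    r = y - X @ beta
    k = np.exp(-0.5 * (r / h) ** 2) / (h * SQRT_2PI)
    return -(X * (k * r / h**2)[:, None]).mean(axis=0)

def minimize_gd(f, g, beta0, iters=500, tol=1e-12):
    # Gradient descent with Armijo backtracking (a simple stand-in optimizer).
    beta = np.array(beta0, dtype=float)
    for _ in range(iters):
        grad = g(beta)
        if grad @ grad < tol:
            break
        step, f0 = 8.0, f(beta)
        while step > 1e-12 and f(beta - step * grad) > f0 - 0.5 * step * (grad @ grad):
            step *= 0.5
        beta = beta - step * grad
    return beta

def pilot_surrogate_mr(shards, n_pilot, h, rng):
    # shards: list of (X_k, y_k), one per machine; equal shard sizes are assumed.
    per = n_pilot // len(shards)
    picks = [rng.choice(len(y), size=per, replace=False) for _, y in shards]
    Xp = np.vstack([X[i] for (X, _), i in zip(shards, picks)])
    yp = np.concatenate([y[i] for (_, y), i in zip(shards, picks)])

    # Pilot estimator: minimize the pilot-sample objective, started from pilot OLS.
    b_ols = np.linalg.lstsq(Xp, yp, rcond=None)[0]
    b_pilot = minimize_gd(lambda b: mr_loss(b, Xp, yp, h),
                          lambda b: mr_grad(b, Xp, yp, h), b_ols)

    # Each machine sends only its local gradient at b_pilot; the master averages them.
    N = sum(len(y) for _, y in shards)
    g_global = sum(len(y) * mr_grad(b_pilot, X, y, h) for X, y in shards) / N

    # Surrogate: Q_P(beta) - <grad Q_P(b_pilot) - grad Q_N(b_pilot), beta>.
    shift = mr_grad(b_pilot, Xp, yp, h) - g_global
    b_tilde = minimize_gd(lambda b: mr_loss(b, Xp, yp, h) - shift @ b,
                          lambda b: mr_grad(b, Xp, yp, h) - shift, b_pilot)
    return b_tilde, b_pilot

# Usage on simulated data stored nonrandomly: machines hold consecutive ranges of y.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 3))
beta0 = np.array([1.0, -2.0, 0.5])
y = X @ beta0 + 0.5 * rng.normal(size=4000)
order = np.argsort(y)
shards = [(X[ix], y[ix]) for ix in np.array_split(order, 4)]
b_tilde, b_pilot = pilot_surrogate_mr(shards, n_pilot=400, h=1.0, rng=rng)
print(b_tilde)
```

Only the pilot rows and one gradient vector per machine reach the master in this sketch, which is the communication pattern the abstract describes; the bandwidth `h`, the pilot size, and the simulated data are illustrative choices.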

References

  • Battey, H., Fan, J., Liu, H., Lu, J., Zhu, Z. (2018). Distributed testing and estimation under sparse high dimensional models. Annals of Statistics, 46, 1352–1382.

  • Chen, X., Liu, W., Zhang, Y. (2019). Quantile regression under memory constraint. Annals of Statistics, 47, 3244–3273.

  • Chen, Y., Genovese, C., Tibshirani, R., Wasserman, L. (2016). Nonparametric modal regression. Annals of Statistics, 44, 489–514.

  • Duchi, J., Jordan, M., Wainwright, M., Zhang, Y. (2014). Optimality guarantees for distributed statistical estimation. arXiv preprint arXiv:1405.0782.

  • Fan, J., Wang, D., Wang, K., Zhu, Z. (2019). Distributed estimation of principal eigenspaces. Annals of Statistics, 47, 3009.

  • Fan, J., Guo, Y., Wang, K. (2021). Communication-efficient accurate statistical estimation. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.1969238.

  • Feng, Y., Fan, J., Suykens, J. (2020). A statistical learning approach to modal regression. Journal of Machine Learning Research, 21(2), 1–35.

  • Gopal, S., Yang, Y. (2013). Distributed training of large-scale logistic models. In: International Conference on Machine Learning, pp. 289–297.

  • Huber, P. J. (1981). Robust statistics. New York: Wiley.

  • Jordan, M. I., Lee, J. D., Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681.

  • Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46, 33–50.

  • Lee, J., Liu, Q., Sun, Y., Taylor, J. (2017). Communication-efficient sparse regression. Journal of Machine Learning Research, 18, 115–144.

  • Pan, R., Ren, T., Guo, B., Li, F., Li, G., Wang, H. (2021). A note on distributed quantile regression by pilot sampling and one-step updating. Journal of Business and Economic Statistics. https://doi.org/10.1080/07350015.2021.1961789.

  • Shamir, O., Srebro, N., Zhang, T. (2014). Communication-efficient distributed optimization using an approximate Newton-type method. International Conference on Machine Learning, 32, 1000–1008.

  • Tu, J., Liu, W., Mao, X., Chen, X. (2021). Variance reduced median-of-means estimator for Byzantine-robust distributed inference. Journal of Machine Learning Research, 22(84), 1–67.

  • Wang, F., Huang, D., Zhu, Y., Wang, H. (2020). Efficient estimation for generalized linear models on a distributed system with nonrandomly distributed data. arXiv preprint arXiv:2004.02414.

  • Wang, J., Kolar, M., Srebro, N., Zhang, T. (2017). Efficient distributed learning with sparsity. International Conference on Machine Learning, 70, 3636–3645.

  • Wang, K., Li, S. (2021). Robust distributed modal regression for massive data. Computational Statistics and Data Analysis, 160, 107225.

  • Wang, K., Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.

  • Wang, K., Li, S., Sun, X., Lin, L. (2019). Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection. Computational Statistics and Data Analysis, 133, 257–276.

  • Wang, K., Wang, H., Li, S. (2022). Renewable quantile regression for streaming datasets. Knowledge-Based Systems, 235, 107675.

  • Wang, K., Zhang, B., Sun, X., Li, S. (2022). Efficient statistical estimation for a non-randomly distributed system with application to large-scale data neural network. Expert Systems with Applications, 197, 116698.

  • Yao, W., Li, L. (2014). A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41, 656–671.

  • Yao, W., Lindsay, B., Li, R. (2012). Local modal regression. Journal of Nonparametric Statistics, 24, 647–663.

  • Zhang, Y., Duchi, J. C., Wainwright, M. (2013). Communication-efficient algorithms for statistical optimization. Journal of Machine Learning Research, 14, 3321–3363.

  • Zhao, W., Zhang, R., Liu, J., Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.

  • Zhu, X., Li, F., Wang, H. (2021). Least-square approximation for a distributed system. Journal of Computational and Graphical Statistics, 30(4), 1004–1018.

Acknowledgements

The research was supported by NNSF project of China (12101056 and 11901356).

Author information

Correspondence to Kangning Wang or Yong Xu.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

First, we compute the order of \(\Vert \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\Vert _{\infty }\). Letting \({\varvec{\Sigma }}=E({\varvec{X}}{\varvec{X}}^{T})\), we have

$$\begin{aligned}&~~~~\Vert \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\Vert _{\infty } \\&=O_{p}\left( \left\| {\varvec{\Sigma }}-\frac{1}{N}\widetilde{{\varvec{X}}} \widetilde{{\varvec{X}}}^{T} \right\| _{\infty }\right) +O_{p}\left( \left\| \frac{1}{n}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}-{\varvec{\Sigma }} \right\| _{\infty }\right) +O_{p}(N^{-1/2}), \end{aligned}$$

where \(\widetilde{{\varvec{X}}}=({\varvec{X}}_{1},\dots ,{\varvec{X}}_{N})\) is the \(p\times N\) matrix of covariates and \(\widetilde{{\varvec{X}}}_{{\mathcal {P}}}=({\varvec{X}}_{i},i\in {{\mathcal {P}}})\) is its \(p\times n\) pilot-sample counterpart. It is easy to obtain

$$\begin{aligned} P\left( \left| \frac{1}{N}\sum \nolimits _{i=1}^{N}X_{ij}X_{ik}-{\varvec{\Sigma }}_{jk}\right| >t\right) \leqslant \exp (-c_{1}\min (t^{2},t)N), \end{aligned}$$

where \(c_{1}\) is a constant that depends on \({\varvec{\Sigma }}\). By a union bound over all \(p^{2}\) pairs \((j,k)\),

$$\begin{aligned} P\left( \left\| \frac{1}{N}\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}-{\varvec{\Sigma }}\right\| _{\infty }>t\right) \leqslant \exp (2\log p-c_{1}\min (t^{2},t)N). \end{aligned}$$

Thus, letting \(t=C\sqrt{\frac{\log p}{N}}\) for a sufficiently large constant \(C\), we have \(\Vert {\varvec{\Sigma }}-\frac{1}{N}\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}\Vert _{\infty }=O_{p}(\sqrt{\log p/N})\), which is \(O_{p}(N^{-1/2})\) when \(p\) is treated as fixed. By a similar argument, \(\Vert \frac{1}{n}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}-{\varvec{\Sigma }}\Vert _{\infty }=O_{p}(n^{-1/2})\). Since \(n\leqslant N\), it follows that

$$\begin{aligned} \left\| \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\right\| _{\infty }=O_{p}\left( n^{-1/2}\right) . \end{aligned}$$
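
As a worked check of the choice of \(t\) in the union bound above (a sketch of the standard argument, not a step quoted from the original proof), assume \(p\geqslant 2\) and that \(N\) is large enough for \(t=C\sqrt{\log p/N}\leqslant 1\), so that \(\min (t^{2},t)=t^{2}\). Substituting \(t^{2}N=C^{2}\log p\) into the right-hand side of the bound gives

$$\begin{aligned} \exp (2\log p-c_{1}t^{2}N)=\exp \left( (2-c_{1}C^{2})\log p\right) =p^{\,2-c_{1}C^{2}}, \end{aligned}$$

which is smaller than any prescribed \(\delta >0\) once \(C^{2}\geqslant (2+\log (1/\delta )/\log 2)/c_{1}\). This is the sense in which the maximum entrywise deviation is of order \(\sqrt{\log p/N}\) in probability.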

By applying Lemma 6 in Zhang et al. (2013) with \(F_{1}=\widetilde{{L}}_\text {N}^h({\varvec{\beta }})\) in the notation therein, we can also obtain

$$\begin{aligned} \left\| \widetilde{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_\text {N} \right\| =O_{p}\left( \left\| \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})\right\| \right) . \end{aligned}$$

A simple calculation yields

$$\begin{aligned} \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})=\nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})- \nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})+\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}), \end{aligned}$$

and, noting that \(\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})={\varvec{0}}\), we obtain

$$\begin{aligned} \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})= \left( \nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla {{Q}}_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right) - \left( \nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right) . \end{aligned}$$

By the integral form of Taylor’s expansion, we have

$$\begin{aligned} \nabla Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})={\varvec{H}}_{{\mathcal {P}}}(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}) \text{ and } \nabla Q_{N}^h(\hat{{\varvec{\beta }}}_\text {N})-\nabla Q_{N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}})={\varvec{H}}_{N}(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}), \end{aligned}$$

where \({\varvec{H}}_{{\mathcal {P}}}=\int _{0}^{1}\nabla ^{2}Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt\) and \({\varvec{H}}_{N}=\int _{0}^{1}\nabla ^{2}Q_{N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt\) satisfy \(\Vert {\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}Q_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )\) and \(\Vert {\varvec{H}}_{N}-\nabla ^{2}Q_{N}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )\), respectively. Thus, we have

$$\begin{aligned}&~~~~\left\| \nabla \widetilde{{L}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})\right\| \\&=\left\| ({\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}{{Q}}_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)) (\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}})-({\varvec{H}}_{N}-\nabla ^{2}{{Q}}_\text {N}^h({{\varvec{\beta }}}_0)) (\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}})\right. \\&~~~~~~\left. +(\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0}))(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}) \right\| \\&\leqslant \Vert {\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}{{Q}}_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert {\varvec{H}}_{N}-\nabla ^{2}{{Q}}_\text {N}^h({{\varvec{\beta }}}_0)\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&~~~~~~+\Vert \nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})\Vert \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&=(O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert )+O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert )+\Vert \nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})- \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})\Vert )\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert \\&=O_{p}\left( \frac{1}{\sqrt{n}}\right) \Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert . \end{aligned}$$

This completes the proof of (a).

Together with

$$\begin{aligned} \hat{{\varvec{\beta }}}_\text {N}-{\varvec{\beta }}_{0}=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\nabla Q_\text {N}^h({{\varvec{\beta }}}_{0})+O_{p}\left( \frac{1}{N}\right) , \end{aligned}$$

we obtain

$$\begin{aligned}&~~~~\widetilde{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0}\\&=(\widetilde{{{\varvec{\beta }}}}_\text {N}-\hat{{{\varvec{\beta }}}}_\text {N})+( \hat{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0})\\&=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\left( \frac{1}{N}\sum \nolimits _{i=1}^{N}{\varvec{X}}_{i}{\dot{\phi }}_{h}\left( \epsilon _{i}\right) \right) + O_{p}\left( \frac{1}{N}+n^{-1/2}\Vert \hat{{\varvec{\beta }}}_{{\mathcal {P}}}-\hat{{\varvec{\beta }}}_\text {N}\Vert \right) . \end{aligned}$$

Under the assumptions \(n/\sqrt{N}\rightarrow \infty\) and \(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert =O_{p}(n^{-1/2})\), we obtain \(\sqrt{N}(\frac{1}{N}+n^{-1/2}\Vert \hat{{\varvec{\beta }}}_{{\mathcal {P}}}-\hat{{\varvec{\beta }}}_\text {N}\Vert )=O_{p}(N^{-1/2}+\sqrt{N}/n)=o_{p}(1)\). Thus, we have

$$\begin{aligned}&~~~~\sqrt{N}(\widetilde{{{\varvec{\beta }}}}_\text {N}-{\varvec{\beta }}_{0})\\&=-\frac{1}{E(\ddot{\phi }_{h}(\epsilon ))}{\varvec{\Sigma }}^{-1}\left( \frac{1}{\sqrt{N}} \sum \nolimits _{i=1}^{N}{\varvec{X}}_{i}{\dot{\phi }}_{h}\left( \epsilon _{i}\right) \right) +o_{p}(1)\\&\rightarrow _d N({\varvec{0}}, \xi (h){\varvec{\Sigma }}^{-1}), \end{aligned}$$

where \(\xi (h)=\frac{E({\dot{\phi }}_{h}^2(\epsilon ))}{[E(\ddot{\phi }_{h}(\epsilon ))]^2}\). This completes the proof of (b). \(\square\)
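
To give a feel for the variance factor \(\xi (h)\), the following sketch estimates it by Monte Carlo. It assumes a Gaussian kernel \(\phi _{h}(t)=\exp (-t^{2}/(2h^{2}))/(h\sqrt{2\pi })\) and normally distributed errors with standard deviation 0.5; both choices, the sample size, and the function name `xi_gaussian_kernel` are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def xi_gaussian_kernel(eps, h):
    """Monte Carlo estimate of xi(h) = E[phi_h'(eps)^2] / (E[phi_h''(eps)])^2
    for the Gaussian kernel phi_h(t) = exp(-t^2 / (2 h^2)) / (h * sqrt(2 pi))."""
    k = np.exp(-0.5 * (eps / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    d1 = -(eps / h**2) * k                  # first derivative of phi_h at eps
    d2 = (eps**2 / h**4 - 1.0 / h**2) * k   # second derivative of phi_h at eps
    return np.mean(d1**2) / np.mean(d2) ** 2

rng = np.random.default_rng(0)
eps = 0.5 * rng.standard_normal(200_000)    # assumed error distribution N(0, 0.25)
for h in (0.5, 1.0, 5.0, 50.0):
    print(h, xi_gaussian_kernel(eps, h))
```

For these Gaussian errors, the estimate should approach the error variance 0.25 as \(h\) grows, which is the least-squares variance factor; smaller bandwidths trade some efficiency under normal errors for robustness.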

About this article

Cite this article

Li, S., Wang, K. & Xu, Y. Robust estimation for nonrandomly distributed data. Ann Inst Stat Math 75, 493–509 (2023). https://doi.org/10.1007/s10463-022-00852-4
