Abstract
In recent years, many methods for analyzing distributed data have been developed. However, two problems remain. First, most of these methods require the data to be randomly and uniformly distributed across machines. Second, most of them are not robust. To address both problems, we propose a distributed pilot modal regression (MR) estimator, which is robust and adapts to data that are stored nonrandomly. First, we collect a random pilot sample from the different machines; then, we approximate the global MR objective function by a communication-efficient surrogate that can be evaluated efficiently from the pilot sample and the local gradients. The final estimator is obtained by minimizing the surrogate function on the master machine, while the other machines only need to compute their gradients. Theoretical results show that the new estimator is asymptotically as efficient as the global MR estimator. Simulation studies illustrate the utility of the proposed approach.
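The three steps described above (pilot sampling, one round of local gradients, surrogate minimization on the master) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a Gaussian kernel \(\phi _{h}\), a gradient-corrected surrogate in the style of Jordan et al. (2019), a two-parameter linear model, and plain gradient descent as the optimizer. The names `grad`, `descend`, and `surrogate_grad`, and all tuning values, are hypothetical.

```python
import math
import random

def phi_h(e, h):
    """Gaussian kernel with bandwidth h: phi_h(e) = phi(e / h) / h."""
    return math.exp(-0.5 * (e / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def grad(beta, data, h):
    """Gradient of the loss Q(beta) = -(1/m) * sum_i phi_h(y_i - b0 - b1 * x_i)."""
    g0 = g1 = 0.0
    for x, y in data:
        e = y - beta[0] - beta[1] * x
        w = e / h ** 2 * phi_h(e, h)  # equals minus the derivative of phi_h at e
        g0 -= w
        g1 -= w * x
    m = len(data)
    return [g0 / m, g1 / m]

def descend(grad_fn, beta, step=2.0, iters=300):
    """Plain gradient descent (an assumed optimizer, adequate for this toy problem)."""
    for _ in range(iters):
        g = grad_fn(beta)
        beta = [b - step * gj for b, gj in zip(beta, g)]
    return beta

random.seed(0)
N, K, n_k, h = 20000, 10, 100, 1.0
beta_true = [1.0, 2.0]

# Nonrandom storage: observations are sorted by x, so each machine
# holds a different slice of the covariate range.
xs = sorted(random.uniform(-1.0, 1.0) for _ in range(N))
full = [(x, beta_true[0] + beta_true[1] * x + random.gauss(0.0, 0.5)) for x in xs]
machines = [full[k * N // K:(k + 1) * N // K] for k in range(K)]

# Step 1: random pilot sample drawn from every machine, then a pilot estimate.
pilot = [pt for m in machines for pt in random.sample(m, n_k)]
beta_bar = descend(lambda b: grad(b, pilot, h), [0.0, 0.0])

# Step 2: one communication round; each machine sends its gradient at beta_bar.
local = [grad(beta_bar, m, h) for m in machines]
g_N = [sum(len(m) * g[j] for m, g in zip(machines, local)) / N for j in range(2)]

# Step 3: the master minimizes the surrogate
#   Q_P(beta) - <grad Q_P(beta_bar) - grad Q_N(beta_bar), beta>,
# whose gradient is grad Q_P(beta) plus a fixed shift.
g_P = grad(beta_bar, pilot, h)
shift = [g_N[j] - g_P[j] for j in range(2)]

def surrogate_grad(b):
    g = grad(b, pilot, h)
    return [g[j] + shift[j] for j in range(2)]

beta_hat = descend(surrogate_grad, beta_bar)
print("pilot estimate    :", beta_bar)
print("surrogate estimate:", beta_hat)
```

Only the pilot observations and one gradient vector per machine reach the master. Because the kernel loss is nonconvex and the correction term is linear, the surrogate is minimized locally, starting from the pilot estimate.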
References
Battey, H., Fan, J., Liu, H., Lu, J., Zhu, Z. (2018). Distributed testing and estimation under sparse high dimensional models. Annals of Statistics, 46, 1352–1382.
Chen, X., Liu, W., Zhang, Y. (2019). Quantile regression under memory constraint. Annals of Statistics, 47, 3244–3273.
Chen, Y., Genovese, C., Tibshirani, R., Wasserman, L. (2016). Nonparametric modal regression. Annals of Statistics, 44, 489–514.
Duchi, J., Jordan, M., Wainwright, M., Zhang, Y. (2014). Optimality guarantees for distributed statistical estimation. arXiv preprint arXiv:1405.0782.
Fan, J., Wang, D., Wang, K., Zhu, Z. (2019). Distributed estimation of principal eigenspaces. Annals of Statistics, 47, 3009.
Fan, J., Guo, Y., Wang, K. (2021). Communication-efficient accurate statistical estimation. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.1969238.
Feng, Y., Fan, J., Suykens, J. (2020). A statistical learning approach to modal regression. Journal of Machine Learning Research, 21(2), 1–35.
Gopal, S., Yang, Y. (2013). Distributed training of large-scale logistic models. In: International Conference on Machine Learning, pp. 289–297.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Jordan, M. I., Lee, J. D., Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681.
Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46, 33–50.
Lee, J., Liu, Q., Sun, Y., Taylor, J. (2017). Communication-efficient sparse regression. Journal of Machine Learning Research, 18, 115–144.
Pan, R., Ren, T., Guo, B., Li, F., Li, G., Wang, H. (2021). A note on distributed quantile regression by pilot sampling and one-step updating. Journal of Business and Economic Statistics. https://doi.org/10.1080/07350015.2021.1961789.
Shamir, O., Srebro, N., Zhang, T. (2014). Communication-efficient distributed optimization using an approximate Newton-type method. International Conference on Machine Learning, 32, 1000–1008.
Tu, J., Liu, W., Mao, X., Chen, X. (2021). Variance reduced median-of-means estimator for Byzantine-robust distributed inference. Journal of Machine Learning Research, 22(84), 1–67.
Wang, F., Huang, D., Zhu, Y., Wang, H. (2020). Efficient estimation for generalized linear models on a distributed system with nonrandomly distributed data. arXiv preprint arXiv:2004.02414.
Wang, J., Kolar, M., Srebro, N., Zhang, T. (2017). Efficient distributed learning with sparsity. International Conference on Machine Learning, 70, 3636–3645.
Wang, K., Li, S. (2021). Robust distributed modal regression for massive data. Computational Statistics and Data Analysis, 160, 107225.
Wang, K., Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.
Wang, K., Li, S., Sun, X., Lin, L. (2019). Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection. Computational Statistics and Data Analysis, 133, 257–276.
Wang, K., Wang, H., Li, S. (2022). Renewable quantile regression for streaming datasets. Knowledge-Based Systems, 235, 107675.
Wang, K., Zhang, B., Sun, X., Li, S. (2022). Efficient statistical estimation for a non-randomly distributed system with application to large-scale data neural network. Expert Systems with Applications, 197, 116698.
Yao, W., Li, L. (2014). A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41, 656–671.
Yao, W., Lindsay, B., Li, R. (2012). Local modal regression. Journal of Nonparametric Statistics, 24, 647–663.
Zhang, Y., Duchi, J. C., Wainwright, M. (2013). Communication-efficient algorithms for statistical optimization. Journal of Machine Learning Research, 14, 3321–3363.
Zhao, W., Zhang, R., Liu, J., Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.
Zhu, X., Li, F., Wang, H. (2021). Least-square approximation for a distributed system. Journal of Computational and Graphical Statistics, 30(4), 1004–1018.
Acknowledgements
This research was supported by the NNSF of China (projects 12101056 and 11901356).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Theorem 1
First, we compute the order of \(\Vert \nabla ^2{Q}_\text {N}^h({\varvec{\beta }}_{0})-\nabla ^2{Q}_{{\mathcal {P}}}^h({\varvec{\beta }}_{0})\Vert _{\infty }\). Letting \({\varvec{\Sigma }}=E({\varvec{X}}{\varvec{X}}^{T})\), we have that
where \(\widetilde{{\varvec{X}}}=({\varvec{X}}_{1},\dots ,{\varvec{X}}_{N})^{\text {T}}\) and \(\widetilde{{\varvec{X}}}_{{\mathcal {P}}}=({\varvec{X}}_{i},i\in {{\mathcal {P}}})^{\text {T}}\). It is easy to obtain
where \(c_{1}\) is a constant that depends on \({\varvec{\Sigma }}\). By a union bound over all (j, k) pairs,
Thus, letting \(t=C\sqrt{\frac{\log p}{N}}\), we have \(\Vert {\varvec{\Sigma }}-\frac{1}{N}\widetilde{{\varvec{X}}}\widetilde{{\varvec{X}}}^{T}\Vert _{\infty }=O_{p}(N^{-1/2})\). By a similar argument, \(\Vert \frac{1}{n}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}\widetilde{{\varvec{X}}}_{{\mathcal {P}}}^{T}-{\varvec{\Sigma }}\Vert _{\infty }=O_{p}(n^{-1/2})\). Then, we can get that
By applying Lemma 6 in Zhang et al. (2013) with \(F_{1}=\widetilde{{L}}_\text {N}^h({\varvec{\beta }})\) in the notation therein, we can also obtain
A simple calculation yields
and, noting that \(\nabla {{Q}}_\text {N}^h(\hat{{\varvec{\beta }}}_\text {N})={\varvec{0}}\), we obtain
By the integral form of Taylor’s expansion, we have
where \({\varvec{H}}_{{\mathcal {P}}}=\int _{0}^{1}\nabla ^{2}Q_{{\mathcal {P}}}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt\) and \({\varvec{H}}_{N}=\int _{0}^{1}\nabla ^{2}Q_{N}^h(\hat{{\varvec{\beta }}}_{{\mathcal {P}}}+t(\hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}))dt\) satisfy \(\Vert {\varvec{H}}_{{\mathcal {P}}}-\nabla ^{2}Q_{{\mathcal {P}}}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )\) and \(\Vert {\varvec{H}}_{N}-\nabla ^{2}Q_{N}^h({{\varvec{\beta }}}_0)\Vert =O_{p}(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert +\Vert \hat{{\varvec{\beta }}}_\text {N}-{{\varvec{\beta }}}_{0}\Vert )\), respectively. Thus, we have
This completes the proof of (a).
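The concentration rate used at the start of the proof, namely that the sample second-moment matrix deviates from \({\varvec{\Sigma }}\) by \(O_{p}(N^{-1/2})\), can be checked numerically. The sketch below is illustrative only. It assumes standard normal covariates with \({\varvec{\Sigma }}\) equal to the identity, a fixed dimension \(p=5\), and that \(\Vert \cdot \Vert _{\infty }\) denotes the entrywise maximum, which is consistent with the union bound over (j, k) pairs. The function name `max_entry_error` is hypothetical.

```python
import math
import random

def max_entry_error(N, p, rng):
    """Entrywise max of |(1/N) * sum_i X_i X_i^T - I_p| for X_i ~ N(0, I_p)."""
    S = [[0.0] * p for _ in range(p)]
    for _ in range(N):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(p):
            for k in range(j, p):  # the matrix is symmetric, so j <= k suffices
                S[j][k] += x[j] * x[k]
    return max(abs(S[j][k] / N - (1.0 if j == k else 0.0))
               for j in range(p) for k in range(j, p))

rng = random.Random(0)
errors = {N: max_entry_error(N, 5, rng) for N in (500, 2000, 8000)}
for N, err in errors.items():
    # If the error is O_p(N^{-1/2}), sqrt(N) * err should stay bounded as N grows.
    print(N, err, math.sqrt(N) * err)
```

The same check with the pilot size \(n\) in place of \(N\) illustrates the \(O_{p}(n^{-1/2})\) rate for the pilot sample.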
Together with
we can get that
Under the assumptions \(n/\sqrt{N}\rightarrow \infty\) and \(\Vert \hat{{\varvec{\beta }}}_\text {N}-\hat{{\varvec{\beta }}}_{{\mathcal {P}}}\Vert =O_{p}(n^{-1/2})\), we can obtain that \(\sqrt{N}(\frac{1}{N}+n^{-1/2}\Vert \hat{{\varvec{\beta }}}_{{\mathcal {P}}}-\hat{{\varvec{\beta }}}_\text {N}\Vert )=o_{p}(1)\). Thus, we have that
where \(\xi (h)=\frac{E({\dot{\phi }}_{h}^2(\epsilon ))}{[E(\ddot{\phi }_{h}(\epsilon ))]^2}\). The proof of (b) is completed. \(\square\)
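The factor \(\xi (h)\) can be evaluated numerically once the kernel and the error distribution are fixed. The sketch below is an illustration under assumptions that are not taken from the paper: a Gaussian kernel \(\phi _{h}\) and errors \(\epsilon \sim N(0,\sigma ^{2})\). Under these two assumptions, direct integration gives \(\xi (h)=\sigma ^{2}(h^{2}+\sigma ^{2})^{3}/\{h^{3}(h^{2}+2\sigma ^{2})^{3/2}\}\), which the code compares with a Monte Carlo estimate. The function names are hypothetical.

```python
import math
import random

def xi_closed_form(h, sigma):
    """xi(h) for a Gaussian kernel and N(0, sigma^2) errors (derived for this sketch)."""
    return sigma ** 2 * (h ** 2 + sigma ** 2) ** 3 / (h ** 3 * (h ** 2 + 2 * sigma ** 2) ** 1.5)

def xi_monte_carlo(h, sigma, draws=100_000, seed=1):
    """Monte Carlo estimate of E[phi_h'(e)^2] / (E[phi_h''(e)])^2."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(draws):
        e = rng.gauss(0.0, sigma)
        k = math.exp(-0.5 * (e / h) ** 2) / (h * math.sqrt(2.0 * math.pi))  # phi_h(e)
        d1 = -e / h ** 2 * k                        # first derivative of phi_h at e
        d2 = (e ** 2 / h ** 4 - 1.0 / h ** 2) * k   # second derivative of phi_h at e
        num += d1 * d1
        den += d2
    return (num / draws) / (den / draws) ** 2

for h in (0.5, 1.0, 4.0):
    print(h, xi_monte_carlo(h, 0.5), xi_closed_form(h, 0.5))
```

In this Gaussian setting the closed form tends to \(\sigma ^{2}\) as \(h\rightarrow \infty\), so a very large bandwidth recovers the least-squares variance factor.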
About this article
Cite this article
Li, S., Wang, K. & Xu, Y. Robust estimation for nonrandomly distributed data. Ann Inst Stat Math 75, 493–509 (2023). https://doi.org/10.1007/s10463-022-00852-4
DOI: https://doi.org/10.1007/s10463-022-00852-4