Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13930)

Abstract

Distributed optimization with open collaboration is a popular field since it provides small groups, companies, universities, and individuals with an opportunity to jointly solve huge-scale problems. However, standard optimization algorithms are fragile in such settings due to the possible presence of so-called Byzantine workers – participants that can send (intentionally or not) incorrect information instead of the information prescribed by the protocol (e.g., send the anti-gradient instead of stochastic gradients). Thus, the problem of designing distributed methods with provable robustness to Byzantine workers has received a lot of attention recently. In particular, several works consider a very promising way to achieve Byzantine tolerance by exploiting variance reduction and robust aggregation. The existing approaches use SAGA- and SARAH-type variance-reduced estimators, while another popular estimator – SVRG – has not been studied in the context of Byzantine-robustness. In this work, we close this gap in the literature and propose a new method – Byzantine-Robust Loopless Stochastic Variance Reduced Gradient (BR-LSVRG). We derive non-asymptotic convergence guarantees for the new method in the strongly convex case and compare its performance with existing approaches in numerical experiments.

The research was supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project No. 0714-2020-0005.
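At a high level, the method combines a loopless SVRG-type gradient estimator on each regular worker with a robust aggregation of the received vectors on the server: the server aggregates, takes a gradient-type step, and occasionally refreshes the reference point. The NumPy sketch below illustrates this general scheme only; the worker interface, the aggregator choice, and the hyperparameters are placeholders, and it is not the exact BR-LSVRG algorithm or its theoretical step-size rule.

```python
import numpy as np

def br_lsvrg_step(x, w, workers, agg, gamma=0.1, p=0.01, rng=None):
    """One illustrative step of a Byzantine-robust loopless-SVRG scheme (sketch).

    Each honest worker i forms a loopless-SVRG estimator
        g_i = grad f_{i,j}(x) - grad f_{i,j}(w) + grad f_i(w)
    for a uniformly sampled local index j; Byzantine workers may send
    arbitrary vectors instead. The server applies a robust aggregation
    rule `agg`, takes a gradient-type step, and refreshes the reference
    point w with a small probability p (the "loopless" part of SVRG).
    """
    rng = rng or np.random.default_rng()
    estimates = []
    for wk in workers:                                 # placeholder worker interface
        j = rng.integers(wk.num_samples)               # sample one local index
        g_i = wk.stoch_grad(x, j) - wk.stoch_grad(w, j) + wk.full_grad(w)
        estimates.append(g_i)
    g = agg(np.stack(estimates))                       # robust aggregation on the server
    x_new = x - gamma * g                              # gradient-type update
    if rng.random() < p:                               # loopless reference-point update
        w = x_new.copy()
    return x_new, w

# Example: x, w = br_lsvrg_step(x, w, workers, agg=lambda v: np.median(v, axis=0))
```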

Notes

  1. This term takes its origin in [22] and has become standard in the literature [26]. By using this term, we do not intend to offend any group of people but rather follow the standard terminology of the community.

  2. See [13] for a recent survey on variance-reduced methods.

References

  1. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  2. Baruch, G., Baruch, M., Goldberg, Y.: A little is enough: circumventing defenses for distributed learning. Adv. Neural Inf. Process. Syst. 32 (2019)

  3. Blanchard, P., El Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: byzantine tolerant gradient descent. Adv. Neural Inf. Process. Syst. 30 (2017)

  4. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)

  5. Damaskinos, G., El-Mhamdi, E.M., Guerraoui, R., Guirguis, A., Rouault, S.: AGGREGATHOR: byzantine machine learning via robust gradient aggregation. Proc. Mach. Learn. Res. 1, 81–106 (2019)

  6. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inf. Process. Syst. 27 (2014)

  7. Fang, C., Li, C.J., Lin, Z., Zhang, T.: SPIDER: near-optimal non-convex optimization via stochastic path-integrated differential estimator. Adv. Neural Inf. Process. Syst. 31 (2018)

  8. Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)

  9. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning (2016)

  10. Gorbunov, E., Borzunov, A., Diskin, M., Ryabinin, M.: Secure distributed training at scale. In: International Conference on Machine Learning, pp. 7679–7739. PMLR (2022). http://proceedings.mlr.press/v162/gorbunov22a/gorbunov22a.pdf

  11. Gorbunov, E., Hanzely, F., Richtárik, P.: A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent. In: International Conference on Artificial Intelligence and Statistics, pp. 680–690. PMLR (2020)

  12. Gorbunov, E., Horváth, S., Richtárik, P., Gidel, G.: Variance reduction is an antidote to byzantines: better rates, weaker assumptions and communication compression as a cherry on the top. arXiv preprint arXiv:2206.00529 (2022)

  13. Gower, R.M., Schmidt, M., Bach, F., Richtárik, P.: Variance-reduced methods for machine learning. Proc. IEEE 108(11), 1968–1983 (2020)

  14. Guerraoui, R., Rouault, S., et al.: The hidden vulnerability of distributed learning in byzantium. In: International Conference on Machine Learning, pp. 3521–3530. PMLR (2018)

  15. He, L., Karimireddy, S.P., Jaggi, M.: Byzantine-robust decentralized learning via self-centered clipping. arXiv preprint arXiv:2202.01545 (2022)

  16. Hofmann, T., Lucchi, A., Lacoste-Julien, S., McWilliams, B.: Variance reduced stochastic gradient descent with neighbors. Adv. Neural Inf. Process. Syst. 28 (2015)

  17. Horváth, S., Lei, L., Richtárik, P., Jordan, M.I.: Adaptivity of stochastic gradient methods for nonconvex optimization. SIAM J. Math. Data Sci. 4(2), 634–648 (2022)

  18. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26 (2013)

  19. Karimireddy, S.P., He, L., Jaggi, M.: Learning from history for byzantine robust optimization. In: International Conference on Machine Learning, pp. 5311–5319. PMLR (2021)

  20. Karimireddy, S.P., He, L., Jaggi, M.: Byzantine-robust learning on heterogeneous datasets via bucketing. In: International Conference on Learning Representations (2022). https://arxiv.org/pdf/2006.09365.pdf

  21. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467. PMLR (2020). http://proceedings.mlr.press/v117/kovalev20a/kovalev20a.pdf

  22. Lamport, L., Shostak, R., Pease, M.: The byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)

  23. Li, C.: Demystifying GPT-3 language model: a technical overview (2020)

  24. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: a simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295. PMLR (2021)

  25. Lojasiewicz, S.: A topological property of real analytic subsets. Coll. du CNRS, Les équations aux dérivées partielles 117(87–89), 2 (1963)

  26. Lyu, L., et al.: Privacy and robustness in federated learning: attacks and defenses. IEEE Trans. Neural Netw. Learn. Syst. (2022)

  27. Nesterov, Y.: Lectures on Convex Optimization. SOIA, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4

  28. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621. PMLR (2017)

  29. Ouyang, L., et al.: Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022)

  30. Pillutla, K., Kakade, S.M., Harchaoui, Z.: Robust aggregation for federated learning. IEEE Trans. Signal Process. 70, 1142–1154 (2022)

  31. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  32. Polyak, B.T.: Gradient methods for the minimisation of functionals. USSR Comput. Math. Math. Phys. 3(4), 864–878 (1963)

  33. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 400–407 (1951)

  34. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)

  35. Wu, Z., Ling, Q., Chen, T., Giannakis, G.B.: Federated variance-reduced stochastic gradient descent with robustness to byzantine attacks. IEEE Trans. Signal Process. 68, 4583–4596 (2020)

  36. Xie, C., Koyejo, O., Gupta, I.: Fall of empires: breaking byzantine-tolerant SGD by inner product manipulation. In: Uncertainty in Artificial Intelligence, pp. 261–270. PMLR (2020)

  37. Yin, D., Chen, Y., Kannan, R., Bartlett, P.: Byzantine-robust distributed learning: towards optimal statistical rates. In: International Conference on Machine Learning, pp. 5650–5659. PMLR (2018)

  38. Zinkevich, M., Weimer, M., Li, L., Smola, A.: Parallelized stochastic gradient descent. Adv. Neural Inf. Process. Syst. 23 (2010)

Author information

Corresponding author

Correspondence to Eduard Gorbunov.

A Examples of Robust Aggregators

In [20], the authors propose a procedure called Bucketing (see Algorithm 2) that robustifies certain aggregation rules, such as:

  • geometric median (GM): \(\hat{x} = \arg \min _{x\in \mathbb {R}^d}\sum _{i=1}^n \Vert x - x_i\Vert \);

  • coordinate-wise median (CM): \(\hat{x} = \arg \min _{x\in \mathbb {R}^d}\sum _{i=1}^n \Vert x - x_i\Vert _1\);

  • Krum estimator [3]: \(\hat{x} = \arg \min _{x_i \in \{x_1, \ldots , x_n\}} \sum _{j \in S_i} \Vert x_j - x_i\Vert ^2\), where \(S_i \subseteq \{x_1, \ldots , x_n\}\) is the subset of the \(n - |\mathcal{B}| - 2\) vectors closest (w.r.t. the \(\ell _2\)-norm) to \(x_i\).
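For concreteness, the following sketch implements these three rules in NumPy (the geometric median only approximately, via a few Weiszfeld-type iterations). The function names, the iteration budget, and the tie-breaking details are illustrative choices and are not taken from [20] or [12].

```python
import numpy as np

def geometric_median(xs, iters=50, eps=1e-8):
    """Approximate GM: argmin_x sum_i ||x - x_i|| via Weiszfeld-type iterations."""
    x = xs.mean(axis=0)                              # start from the plain average
    for _ in range(iters):
        d = np.linalg.norm(xs - x, axis=1)
        w = 1.0 / np.maximum(d, eps)                 # guard against division by zero
        x = (w[:, None] * xs).sum(axis=0) / w.sum()
    return x

def coordinate_wise_median(xs):
    """CM: argmin_x sum_i ||x - x_i||_1, i.e. the per-coordinate median."""
    return np.median(xs, axis=0)

def krum(xs, num_byzantine):
    """Krum [3]: return the input with the smallest sum of squared distances
    to its n - num_byzantine - 2 nearest neighbours."""
    n = xs.shape[0]
    k = max(n - num_byzantine - 2, 1)                # number of neighbours to score
    sq_dists = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=2) ** 2
    scores = [np.sort(np.delete(sq_dists[i], i))[:k].sum() for i in range(n)]
    return xs[int(np.argmin(scores))]
```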

(Algorithm 2: Bucketing)
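Bucketing itself is lightweight. The sketch below assumes the standard form of the procedure from [20]: randomly permute the received vectors, average them within buckets of size s, and apply the base aggregation rule to the bucket means. The wrapper interface and the example call are illustrative, and the bucket size should be chosen as prescribed in [20] and Theorem 3.

```python
import numpy as np

def bucketing(xs, base_agg, s, rng=None):
    """Compose a base aggregation rule with Bucketing (sketch of Algorithm 2 from [20]).

    xs: (n, d) array of received vectors; base_agg: rule applied to the bucket
    means (e.g. the geometric_median or coordinate_wise_median sketches above);
    s: bucket size.
    """
    rng = rng or np.random.default_rng()
    n = xs.shape[0]
    perm = rng.permutation(n)                          # shuffle the workers
    buckets = [perm[i:i + s] for i in range(0, n, s)]  # ceil(n/s) buckets of size <= s
    means = np.stack([xs[b].mean(axis=0) for b in buckets])
    return base_agg(means)                             # robust rule on the bucket means

# Example usage (reusing the sketches above):
#   aggregated = bucketing(gradients, coordinate_wise_median, s=3)
```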

The following result establishes the robustness of the aforementioned aggregation rules in combination with Bucketing.

Theorem 3

(Theorem D.1 from [12]). Assume that \(\{x_1,x_2,\ldots ,x_n\}\) is such that there exists a subset \(\mathcal{G}\subseteq [n]\), \(|\mathcal{G}| = G \ge (1-\delta )n\) and \(\sigma \ge 0\) such that \(\frac{1}{G(G-1)}\sum _{i,l \in \mathcal{G}}\mathbb {E}\Vert x_i - x_l\Vert ^2 \le \sigma ^2\). Assume that \(\delta \le \delta _{\max }\). If Algorithm 2 is run with , then

  • GM \(\circ \) Bucketing satisfies Definition 1 with \(c = \mathcal{O}(1)\) and ,

  • CM \(\circ \) Bucketing satisfies Definition 1 with \(c = \mathcal{O}(d)\) and ,

  • Krum \(\circ \) Bucketing satisfies Definition 1 with \(c = \mathcal{O}(1)\) and .

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fedin, N., Gorbunov, E. (2023). Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient. In: Khachay, M., Kochetov, Y., Eremeev, A., Khamisov, O., Mazalov, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2023. Lecture Notes in Computer Science, vol 13930. Springer, Cham. https://doi.org/10.1007/978-3-031-35305-5_3

  • DOI: https://doi.org/10.1007/978-3-031-35305-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35304-8

  • Online ISBN: 978-3-031-35305-5

  • eBook Packages: Computer Science, Computer Science (R0)
