
Fundamentals of Robust Machine Learning

A chapter of the book Robust Machine Learning

Abstract

As explained in the previous chapter, distributing the learning procedure significantly simplifies the task of training complex models on a large amount of data. The workload of each node is reduced roughly in proportion to the total size of the network, while the nodes retain control over their local data. However, these benefits of federated machine learning rest upon the unrealistic assumption that each node correctly executes the prescribed algorithm and that all its data are trustworthy. This assumption need not hold true in practice. In this chapter, we revisit the problem of federated machine learning in the case when some of the participating nodes deviate from the set of instructions prescribed to them. As discussed in the introduction of this book, this deviation can be caused by bad data, software bugs, hardware failures, or even malicious attackers controlling some of the nodes. In the presence of such adversarial nodes, traditional distributed-learning methods fail to guarantee good accuracy. We explain in this chapter how traditional server-based gradient-descent methods can be rendered robust against a minority of adversarial nodes, and we analyze the training error of the resulting robust gradient-descent algorithm.
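As an illustration of the server-based approach described in the abstract, the following Python sketch shows one plausible robust gradient-descent loop. It is not the chapter's exact algorithm: the coordinate-wise trimmed mean is only one possible robust aggregation rule, and the function names and the toy quadratic losses in the usage lines are hypothetical.

    # Minimal sketch (assumption: not the chapter's exact algorithm). The server
    # replaces plain averaging of the nodes' gradients with a robust aggregation
    # rule; the coordinate-wise trimmed mean below is one illustrative choice.
    import numpy as np

    def trimmed_mean(gradients, q):
        """Drop the q smallest and q largest values in each coordinate, average the rest."""
        g = np.sort(np.stack(gradients), axis=0)  # sort each coordinate across nodes
        return g[q:len(gradients) - q].mean(axis=0)

    def robust_gradient_descent(local_grad_fns, theta0, q, lr=0.1, steps=100):
        """local_grad_fns: one callable per node, mapping the model theta to a gradient.
        Up to q of the nodes may be adversarial and return arbitrary vectors."""
        theta = theta0
        for _ in range(steps):
            grads = [grad_fn(theta) for grad_fn in local_grad_fns]
            theta = theta - lr * trimmed_mean(grads, q)  # robust aggregation instead of a plain mean
        return theta

    # Toy usage: five honest nodes with quadratic losses ||theta - t||^2 / 2, one adversarial node.
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(5, 3))
    honest = [lambda th, t=t: th - t for t in targets]
    adversarial = [lambda th: np.full(3, 1e6)]
    theta_hat = robust_gradient_descent(honest + adversarial, np.zeros(3), q=1)

With plain averaging, the single adversarial node above would drag the model arbitrarily far from the honest optima; the trimmed mean discards its contribution in every coordinate.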


Notes

  1. The quality of robustness achieved by an aggregation rule, which gives substance to the notion of more robustness (or less robustness), is formalized later in Sect. 4.3.

  2. More formally, let \(\tau\) be the permutation on \([n]\) such that \(v_{\tau(1)} \leq \cdots \leq v_{\tau(n)}\). The i-th order statistic of \(v_{1}, \ldots, v_{n}\) is simply \(v_{(i)} = v_{\tau(i)}\) for all \(i \in [n]\).

  3. Hint: If \(\frac{q}{n} \leq \frac{1}{2} \left( \frac{c}{1 + c} \right)\), then \(\frac{q}{n-2q} \leq \frac{c}{2}\); a short verification of this implication is given after these notes.

  4. The same reasoning applies using Jensen's inequality on the square function \((\cdot)^2\).

  5. Refer to the proof of Theorem 3.1 for the reasoning.

  6. Recall that \(L \geq \mu\) when Assumptions 3.1 and 4.2 hold true at the same time (see Appendix A.2).

  7. We ignore \(\mu\), as it can be fixed a priori by simply normalizing (or scaling) the point-wise loss function.
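To complement Note 3, here is one way to verify the hinted implication. This is our own short derivation, under the assumptions \(n > 2q\) and \(c > 0\):

\[
\frac{q}{n} \leq \frac{1}{2}\left(\frac{c}{1+c}\right)
\iff 2q(1+c) \leq c\,n
\iff 2q \leq c\,(n-2q)
\iff \frac{q}{n-2q} \leq \frac{c}{2},
\]

where the first equivalence follows by multiplying both sides by \(2n(1+c) > 0\), the second by subtracting \(2qc\) from both sides, and the last by dividing both sides by \(2(n-2q) > 0\).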



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Guerraoui, R., Gupta, N., Pinot, R. (2024). Fundamentals of Robust Machine Learning. In: Robust Machine Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-97-0688-4_4

  • DOI: https://doi.org/10.1007/978-981-97-0688-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0687-7

  • Online ISBN: 978-981-97-0688-4

  • eBook Packages: Computer Science, Computer Science (R0)
