Abstract
As explained in the previous chapter, distributing the learning procedure greatly simplifies the task of training complex models on large amounts of data: the workload of each node shrinks roughly in proportion to the size of the network, while every node retains control over its local data. However, the benefits of federated machine learning rest on the unrealistic assumption that every node correctly executes the prescribed algorithm and that all of its data are trustworthy. This assumption need not hold in practice. In this chapter, we revisit federated machine learning in the case when some of the participating nodes deviate from the instructions prescribed to them. As discussed in the introduction of this book, such deviations can be caused by bad data, software bugs, hardware failures, or even malicious attackers controlling some of the nodes. In the presence of such adversarial nodes, traditional distributed-learning methods fail to guarantee good accuracy. We explain in this chapter how the traditional server-based gradient-descent method can be rendered robust against a minority of adversarial nodes, and we analyze the training error of the resulting robust gradient-descent algorithm.
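The server-based robust scheme sketched above can be illustrated with a small simulation. This is a minimal sketch, not the chapter's exact algorithm: the quadratic loss, the learning rate, and the choice of coordinate-wise median as the robust aggregation rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: every honest node shares the quadratic loss f(w) = 0.5 * ||w - w_star||^2,
# so an honest node's gradient at w is simply (w - w_star).
n, q, d = 10, 3, 5          # n nodes, of which q < n/2 are adversarial (illustrative values)
w_star = np.ones(d)         # common minimizer

def honest_gradient(w):
    return w - w_star

def byzantine_gradient(w):
    # An adversarial node may report an arbitrary vector instead of its true gradient.
    return 100.0 * rng.standard_normal(d)

def coordinate_wise_median(grads):
    # A classic robust aggregation rule: replace averaging by a per-coordinate median.
    return np.median(np.stack(grads), axis=0)

# Server-based gradient descent with robust aggregation.
w = np.zeros(d)
lr = 0.5
for _ in range(100):
    grads = [honest_gradient(w) for _ in range(n - q)]
    grads += [byzantine_gradient(w) for _ in range(q)]
    w = w - lr * coordinate_wise_median(grads)

err = np.linalg.norm(w - w_star)  # essentially zero: the median filters out the outliers
```

Replacing `coordinate_wise_median` by a plain average makes the same loop diverge under these attacks, which is precisely the failure of traditional distributed learning that robust aggregation addresses.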
Notes
- 1.
The quality of robustness achieved by an aggregation rule, which gives substance to the notion of more robustness (or less robustness), is formalized later in Sect. 4.3.
- 2.
More formally, let \(\tau\) be the permutation on \([n]\) such that \(v_{\tau(1)} \leq \cdots \leq v_{\tau(n)}\). The \(i\)-th order statistic of \(v_1, \ldots, v_n\) is simply \(v_{(i)} = v_{\tau(i)}\) for all \(i \in [n]\).
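The sorting permutation and the resulting order statistics can be computed directly; this small example (with made-up values) mirrors the definition above.

```python
import numpy as np

v = np.array([3.0, 1.0, 4.0, 1.5, 9.0])

# tau is the permutation with v[tau[0]] <= v[tau[1]] <= ... <= v[tau[n-1]]
tau = np.argsort(v)

# The i-th order statistic v_(i) is v[tau[i-1]] (0-indexed here).
order_stats = v[tau]

print(order_stats.tolist())  # [1.0, 1.5, 3.0, 4.0, 9.0]
```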
- 3.
Hint: If \(\frac{q}{n} \leq \frac{1}{2}\left(\frac{c}{1+c}\right)\), then \(\frac{q}{n-2q} \leq \frac{c}{2}\).
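The hint follows by rearranging the hypothesis (assuming \(c > 0\)):
\[
\frac{q}{n} \leq \frac{c}{2(1+c)}
\;\Longrightarrow\;
n \geq \frac{2q(1+c)}{c}
\;\Longrightarrow\;
n - 2q \geq \frac{2q(1+c)}{c} - 2q = \frac{2q}{c}
\;\Longrightarrow\;
\frac{q}{n-2q} \leq \frac{c}{2}.
\]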
- 4.
The same reasoning can be carried out using Jensen's inequality applied to the square function \((\cdot)^2\).
- 5.
Refer to the proof of Theorem 3.1 for the reasoning.
- 6.
- 7.
We ignore \(\mu\), as it can be fixed a priori by simply normalizing (or scaling) the point-wise loss function.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Guerraoui, R., Gupta, N., Pinot, R. (2024). Fundamentals of Robust Machine Learning. In: Robust Machine Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-97-0688-4_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0687-7
Online ISBN: 978-981-97-0688-4