Abstract
As explained in the previous chapter, distributing the learning procedure greatly simplifies the task of training complex models on large amounts of data: the workload of each node shrinks roughly in proportion to the size of the network, while every node retains control over its local data. However, the benefits of federated machine learning rest on the unrealistic assumption that every node correctly executes the prescribed algorithm and that all of its data are trustworthy. This assumption need not hold in practice. In this chapter, we revisit federated machine learning in the case when some of the participating nodes deviate from the instructions prescribed to them. As discussed in the introduction of this book, such deviations can be caused by bad data, software bugs, hardware failures, or even malicious attackers controlling some of the nodes. In the presence of such adversarial nodes, traditional distributed-learning methods fail to guarantee good accuracy. We explain in this chapter how the traditional server-based gradient-descent method can be rendered robust against a minority of adversarial nodes, and we analyze the training error of the resulting robust gradient-descent algorithm.
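The server-based robust scheme sketched above can be illustrated with a small simulation. This is a minimal sketch, not the chapter's exact algorithm: the quadratic loss, the learning rate, and the choice of coordinate-wise median as the robust aggregation rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: every honest node shares the quadratic loss f(w) = 0.5 * ||w - w_star||^2,
# so an honest node's gradient at w is simply (w - w_star).
n, q, d = 10, 3, 5          # n nodes, of which q < n/2 are adversarial (illustrative values)
w_star = np.ones(d)         # common minimizer

def honest_gradient(w):
    return w - w_star

def byzantine_gradient(w):
    # An adversarial node may report an arbitrary vector instead of its true gradient.
    return 100.0 * rng.standard_normal(d)

def coordinate_wise_median(grads):
    # A classic robust aggregation rule: replace averaging by a per-coordinate median.
    return np.median(np.stack(grads), axis=0)

# Server-based gradient descent with robust aggregation.
w = np.zeros(d)
lr = 0.5
for _ in range(100):
    grads = [honest_gradient(w) for _ in range(n - q)]
    grads += [byzantine_gradient(w) for _ in range(q)]
    w = w - lr * coordinate_wise_median(grads)

err = np.linalg.norm(w - w_star)  # essentially zero: the median filters out the outliers
```

Replacing `coordinate_wise_median` by a plain average makes the same loop diverge under these attacks, which is precisely the failure of traditional distributed learning that robust aggregation addresses.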
Notes
- 1.
The quality of robustness achieved by an aggregation rule, which gives substance to the notion of more robustness (or less robustness), is formalized later in Sect. 4.3.
- 2.
More formally, let \(\tau\) be the permutation on \([n]\) such that \(v_{\tau(1)} \leq \cdots \leq v_{\tau(n)}\). The \(i\)-th order statistic of \(v_1, \ldots, v_n\) is simply \(v_{(i)} = v_{\tau(i)}\) for all \(i \in [n]\).
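The sorting permutation and the resulting order statistics can be computed directly; this small example (with made-up values) mirrors the definition above.

```python
import numpy as np

v = np.array([3.0, 1.0, 4.0, 1.5, 9.0])

# tau is the permutation with v[tau[0]] <= v[tau[1]] <= ... <= v[tau[n-1]]
tau = np.argsort(v)

# The i-th order statistic v_(i) is v[tau[i-1]] (0-indexed here).
order_stats = v[tau]

print(order_stats.tolist())  # [1.0, 1.5, 3.0, 4.0, 9.0]
```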
- 3.
Hint: If \(\frac{q}{n} \leq \frac{1}{2}\left(\frac{c}{1+c}\right)\), then \(\frac{q}{n-2q} \leq \frac{c}{2}\).
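The hint follows by rearranging the hypothesis (assuming \(c > 0\)):
\[
\frac{q}{n} \leq \frac{c}{2(1+c)}
\;\Longrightarrow\;
n \geq \frac{2q(1+c)}{c}
\;\Longrightarrow\;
n - 2q \geq \frac{2q(1+c)}{c} - 2q = \frac{2q}{c}
\;\Longrightarrow\;
\frac{q}{n-2q} \leq \frac{c}{2}.
\]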
- 4.
The same reasoning can be carried out using Jensen's inequality applied to the square function \((\cdot)^2\).
- 5.
Refer to the proof of Theorem 3.1 for the reasoning.
- 6.
- 7.
We ignore \(\mu\), as it can be fixed a priori by simply normalizing (or scaling) the point-wise loss function.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Guerraoui, R., Gupta, N., Pinot, R. (2024). Fundamentals of Robust Machine Learning. In: Robust Machine Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-97-0688-4_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0687-7
Online ISBN: 978-981-97-0688-4