Skip to main content

Data Poisoning Attacks Against Federated Learning Systems

Part of the Lecture Notes in Computer Science book series (LNSC,volume 12308)

Abstract

Federated learning (FL) is an emerging paradigm for distributed training of large-scale deep neural networks in which participants’ data remains on their own devices with only model updates being shared with a central server. However, the distributed nature of FL gives rise to new threats caused by potentially malicious participants. In this paper, we study targeted data poisoning attacks against FL systems in which a malicious subset of the participants aim to poison the global model by sending model updates derived from mislabeled data. We first demonstrate that such data poisoning attacks can cause substantial drops in classification accuracy and recall, even with a small percentage of malicious participants. We additionally show that the attacks can be targeted, i.e., they have a large negative impact only on classes that are under attack. We also study attack longevity in early/late round training, the impact of malicious participant availability, and the relationships between the two. Finally, we propose a defense strategy that can help identify malicious participants in FL to circumvent poisoning attacks, and demonstrate its effectiveness.

Keywords

  • Federated learning
  • Adversarial machine learning
  • Label flipping
  • Data poisoning
  • Deep learning

This is a preview of subscription content, access via your institution.

Buying options

Chapter
EUR   29.95
Price includes VAT (Finland)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR   42.79
Price includes VAT (Finland)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR   54.99
Price includes VAT (Finland)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.
Fig. 7.
Fig. 8.

Notes

  1. 1.

    https://github.com/git-disl/DataPoisoning_FL.

References

  1. An Act: Health insurance portability and accountability act of 1996. Public Law 104-191 (1996)

    Google Scholar 

  2. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. arXiv preprint arXiv:1807.00459 (2018)

  3. Baracaldo, N., Chen, B., Ludwig, H., Safavi, J.A.: Mitigating poisoning attacks on machine learning models: a data provenance based approach. In: 10th ACM Workshop on Artificial Intelligence and Security, pp. 103–110 (2017)

    Google Scholar 

  4. Bhagoji, A.N., Chakraborty, S., Mittal, P., Calo, S.: Analyzing federated learning through an adversarial lens. In: International Conference on Machine Learning, pp. 634–643 (2019)

    Google Scholar 

  5. Biggio, B., Nelson, B., Laskov, P.: Support vector machines under adversarial label noise. In: Asian Conference on Machine Learning, pp. 97–112 (2011)

    Google Scholar 

  6. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Conference on International Conference on Machine Learning, pp. 1467–1474 (2012)

    Google Scholar 

  7. Biggio, B., Pillai, I., Rota Bulò, S., Ariu, D., Pelillo, M., Roli, F.: Is data clustering in adversarial settings secure? In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 87–98 (2013)

    Google Scholar 

  8. Blanchard, P., Guerraoui, R., Stainer, J., et al.: Machine learning with adversaries: byzantine tolerant gradient descent. In: NeurIPS, pp. 119–129 (2017)

    Google Scholar 

  9. Bonawitz, K., et al.: Towards federated learning at scale: System design. In: SysML 2019 (2019, to appear). https://arxiv.org/abs/1902.01046

  10. Bursztein, E.: Attacks against machine learning - an overview (2018). https://elie.net/blog/ai/attacks-against-machine-learning-an-overview/

  11. Chen, S., et al.: Automated poisoning attacks and defenses in malware detection systems: an adversarial machine learning approach. Comput. Secur. 73, 326–344 (2018)

    CrossRef  Google Scholar 

  12. Demontis, A., et al.: Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In: 28th USENIX Security Symposium, pp. 321–338 (2019)

    Google Scholar 

  13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

    Google Scholar 

  14. Fang, M., Cao, X., Jia, J., Gong, N.Z.: Local model poisoning attacks to byzantine-robust federated learning. In: USENIX Security Symposium (2020, to appear)

    Google Scholar 

  15. Fang, M., Yang, G., Gong, N.Z., Liu, J.: Poisoning attacks to graph-based recommender systems. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 381–392 (2018)

    Google Scholar 

  16. Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866 (2018)

  17. Hard, A., et al.: Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018)

  18. Hitaj, B., Ateniese, G., Perez-Cruz, F.: Deep models under the GAN: information leakage from collaborative deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618 (2017)

    Google Scholar 

  19. Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: poisoning attacks and countermeasures for regression learning. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 19–35. IEEE (2018)

    Google Scholar 

  20. Kairouz, P., et al.: Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019)

  21. Khazbak, Y., Tan, T., Cao, G.: MLGuard: mitigating poisoning attacks in privacy preserving distributed collaborative learning (2020)

    Google Scholar 

  22. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

    Google Scholar 

  23. Liu, C., Li, B., Vorobeychik, Y., Oprea, A.: Robust linear regression against training data poisoning. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 91–102 (2017)

    Google Scholar 

  24. Maiorca, D., Biggio, B., Giacinto, G.: Towards adversarial malware detection: lessons learned from pdf-based attacks. ACM Comput. Surv. (CSUR) 52(4), 1–36 (2019)

    CrossRef  Google Scholar 

  25. Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: 18th ACM International Conference on Multimedia, pp. 1485–1488 (2010)

    Google Scholar 

  26. Mathews, K., Bowman, C.: The California consumer privacy act of 2018 (2018)

    Google Scholar 

  27. Melis, L., Song, C., De Cristofaro, E., Shmatikov, V.: Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE (2019)

    Google Scholar 

  28. Mhamdi, E.M.E., Guerraoui, R., Rouault, S.: The hidden vulnerability of distributed learning in byzantium. arXiv preprint arXiv:1802.07927 (2018)

  29. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.K.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 19(6), 1893–1905 (2014)

    CrossRef  Google Scholar 

  30. Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38 (2017)

    Google Scholar 

  31. Nasr, M., Shokri, R., Houmansadr, A.: Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 739–753. IEEE (2019)

    Google Scholar 

  32. Nelson, B., et al.: Exploiting machine learning to subvert your spam filter. LEET 8, 1–9 (2008)

    Google Scholar 

  33. Nguyen, T.D., Rieger, P., Miettinen, M., Sadeghi, A.R.: Poisoning attacks on federated learning-based IoT intrusion detection system (2020)

    Google Scholar 

  34. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.P.: SoK: security and privacy in machine learning. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 399–414. IEEE (2018)

    Google Scholar 

  35. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8024–8035 (2019)

    Google Scholar 

  36. Paudice, A., Muñoz-González, L., Gyorgy, A., Lupu, E.C.: Detection of adversarial training examples in poisoning attacks through anomaly detection. arXiv preprint arXiv:1802.03041 (2018)

  37. Paudice, A., Muñoz-González, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C., et al. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11329, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_1

    CrossRef  Google Scholar 

  38. General Data Protection Regulation: Regulation (EU) 2016/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46. Off. J. Eur. Union (OJ) 59(1–88), 294 (2016)

    Google Scholar 

  39. Rubinstein, B.I., et al.: Antidote: understanding and defending against poisoning of anomaly detectors. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp. 1–14 (2009)

    Google Scholar 

  40. Ryffel, T., et al.: A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017 (2018)

  41. Schlesinger, A., O’Hara, K.P., Taylor, A.S.: Let’s talk about race: identity, chatbots, and AI. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2018)

    Google Scholar 

  42. Shafahi, A., et al.: Poison frogs! Targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, pp. 6103–6113 (2018)

    Google Scholar 

  43. Shen, S., Tople, S., Saxena, P.: Auror: Defending against poisoning attacks in collaborative deep learning systems. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 508–519 (2016)

    Google Scholar 

  44. Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: NeurIPS, pp. 3517–3529 (2017)

    Google Scholar 

  45. Suciu, O., Marginean, R., Kaya, Y., Daume III, H., Dumitras, T.: When does machine learning fail? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX Security Symposium, pp. 1299–1316 (2018)

    Google Scholar 

  46. Sun, Z., Kairouz, P., Suresh, A.T., McMahan, H.B.: Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963 (2019)

  47. Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W.: Towards demystifying membership inference attacks. arXiv preprint arXiv:1807.09173 (2018)

  48. Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W.: Demystifying membership inference attacks in machine learning as a service. IEEE Trans. Serv. Comput. (2019)

    Google Scholar 

  49. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  50. Xiao, H., Xiao, H., Eckert, C.: Adversarial label flips attack on support vector machines. In: ECAI, pp. 870–875 (2012)

    Google Scholar 

  51. Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C., Roli, F.: Is feature selection secure against training data poisoning? In: International Conference on Machine Learning, pp. 1689–1698 (2015)

    Google Scholar 

  52. Xiao, H., Biggio, B., Nelson, B., Xiao, H., Eckert, C., Roli, F.: Support vector machines under adversarial label contamination. Neurocomputing 160, 53–62 (2015)

    CrossRef  Google Scholar 

  53. Yang, C., Wu, Q., Li, H., Chen, Y.: Generative poisoning attack method against neural networks. arXiv preprint arXiv:1703.01340 (2017)

  54. Yang, G., Gong, N.Z., Cai, Y.: Fake co-visitation injection attacks to recommender systems. In: NDSS (2017)

    Google Scholar 

  55. Yin, D., Chen, Y., Kannan, R., Bartlett, P.: Byzantine-robust distributed learning: towards optimal statistical rates. In: International Conference on Machine Learning, pp. 5650–5659 (2018)

    Google Scholar 

  56. Zhao, L., et al.: Shielding collaborative learning: mitigating poisoning attacks through client-side detection. IEEE Trans. Dependable Secure Comput. (2020)

    Google Scholar 

  57. Zhao, M., An, B., Gao, W., Zhang, T.: Efficient label contamination attacks against black-box learning models. In: IJCAI, pp. 3945–3951 (2017)

    Google Scholar 

  58. Zhu, C., Huang, W.R., Li, H., Taylor, G., Studer, C., Goldstein, T.: Transferable clean-label poisoning attacks on deep neural nets. In: International Conference on Machine Learning, pp. 7614–7623 (2019)

    Google Scholar 

  59. Zhu, L., Liu, Z., Han, S.: Deep leakage from gradients. In: Advances in Neural Information Processing Systems, pp. 14747–14756 (2019)

    Google Scholar 

Download references

Acknowledgements

This research is partially sponsored by NSF CISE SaTC 1564097 and 2038029. The second author acknowledges an IBM PhD Fellowship Award and the support from the Enterprise AI, Systems & Solutions division led by Sandeep Gopisetty at IBM Almaden Research Center. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding agencies and companies mentioned above.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Stacey Truex .

Editor information

Editors and Affiliations

A DNN Architectures and Configuration

A DNN Architectures and Configuration

All NNs were trained using PyTorch version 1.2.0 with random weight initialization. Training and testing was completed using a NVIDIA 980 Ti GPU-accelerator. When necessary, all CUDA tensors were mapped to CPU tensors before exporting to Numpy arrays. Default drivers provided by Ubuntu 19.04 and built-in GPU support in PyTorch was used to accelerate training. Details can be found in our repository: https://github.com/git-disl/DataPoisoning_FL.

Fashion-MNIST: We do not conduct data pre-processing. We use a Convolutional Neural Network with the architecture described in Table 6. In the table, Conv = Convolutional Layer, and Batch Norm = Batch Normalization.

CIFAR-10: We conduct data pre-processing prior to training. Data is normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. Values reflect mean and standard deviation of the ImageNet dataset  [13] and are commonplace, even expected when using Torchvision  [25] models. We additionally perform data augmentation with random horizontal flipping, random cropping with size 32, and default padding. Our CNN is detailed in Table 5.

 

Table 5. CIFAR-10 CNN.
Table 6. Fashion-MNIST CNN.

Rights and permissions

Reprints and Permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tolpegin, V., Truex, S., Gursoy, M.E., Liu, L. (2020). Data Poisoning Attacks Against Federated Learning Systems. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds) Computer Security – ESORICS 2020. ESORICS 2020. Lecture Notes in Computer Science(), vol 12308. Springer, Cham. https://doi.org/10.1007/978-3-030-58951-6_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58951-6_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58950-9

  • Online ISBN: 978-3-030-58951-6

  • eBook Packages: Computer ScienceComputer Science (R0)