
Threats on Machine Learning Technique by Data Poisoning Attack: A Survey

  • Conference paper
  • Part of the conference proceedings: Advances in Cyber Security (ACeS 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1487)

Abstract

As machine learning systems provide ever more services in our daily lives, attacks on these services increase every day. Attackers try to distort the functionality of these services and divert them from their real task by corrupting the model through poisoning. A poisoned system can grant an unauthorized person the right to enter and exit the system as a legitimate user at any time and from anywhere, degrading the credibility of systems built with intelligent technologies. This paper extensively introduces the mechanisms of data poisoning attacks, which target systems based on machine learning technology, explaining how these attacks strike data sources and the learning model during either the training or the testing phase. It also describes the defense methods and strategies presented in the literature, outlines the risks and effects caused by this attack, and discusses future directions that offer researchers in this field opportunities to prevent and repel this attack effectively.
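As a concrete illustration of the training-phase attack surface the survey describes, the sketch below shows a simple label-flipping poisoning attack against a standard classifier. This is a minimal, hypothetical example, not the paper's own method: the dataset, model, and poisoning rates are illustrative assumptions, chosen only to show how corrupting a fraction of training labels degrades the learned model's accuracy on clean test data.

```python
# Minimal sketch of a label-flipping data poisoning attack.
# Assumptions: a scikit-learn workflow, a synthetic binary dataset,
# and illustrative poisoning rates; none of these come from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Clean binary classification data standing in for an ML service's training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, rate, rng):
    """Poison the training set by flipping the labels of a random fraction."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # binary labels: 0 <-> 1
    return y_poisoned

# Train on increasingly poisoned labels and measure accuracy on clean test data.
for rate in (0.0, 0.1, 0.3):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, flip_labels(y_train, rate, rng))
    print(f"poison rate {rate:.0%}: test accuracy {model.score(X_test, y_test):.3f}")
```

Running this typically shows test accuracy falling as the poisoning rate rises, which is the basic degradation effect that the training-phase attacks surveyed here exploit in more sophisticated, targeted forms.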



Acknowledgment

The authors would like to thank the University of Mosul, College of Computer Sciences and Mathematics, for the facilities provided.

Author information


Correspondence to Ibrahim M. Ahmed or Manar Younis Kashmoola.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Ahmed, I.M., Kashmoola, M.Y. (2021). Threats on Machine Learning Technique by Data Poisoning Attack: A Survey. In: Abdullah, N., Manickam, S., Anbar, M. (eds) Advances in Cyber Security. ACeS 2021. Communications in Computer and Information Science, vol 1487. Springer, Singapore. https://doi.org/10.1007/978-981-16-8059-5_36


  • DOI: https://doi.org/10.1007/978-981-16-8059-5_36


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8058-8

  • Online ISBN: 978-981-16-8059-5

  • eBook Packages: Computer Science (R0)
