Abstract
Targeted clean-label data poisoning is a type of adversarial attack on machine learning systems in which an adversary injects a few correctly labeled, minimally perturbed samples into the training data, causing a model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks, no reliable defense for clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic applications. In this work, we propose a simple yet highly effective Deep k-NN defense against both feature collision and convex polytope clean-label attacks on the CIFAR-10 dataset. We demonstrate that our proposed strategy detects over 99% of poisoned examples in both attacks and removes them without compromising model performance. Additionally, through ablation studies, we derive simple guidelines for selecting the value of k as well as for implementing the Deep k-NN defense on real-world datasets with class imbalance. Our proposed defense shows that current clean-label poisoning attack strategies can be annulled, and serves as a strong, easy-to-implement baseline against which future clean-label poisoning attacks can be tested. Our code is available on GitHub.
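To make the filtering rule concrete, the sketch below shows one way a Deep k-NN style defense can be implemented: each training point is compared against the plurality label of its k nearest neighbors in deep feature space, and points whose labels disagree are removed before retraining. This is a minimal NumPy illustration on synthetic features, not the authors' released implementation; in practice the features would come from the penultimate layer of the trained network, and the choice of k is the hyperparameter studied in the ablations mentioned above.

```python
import numpy as np
from collections import Counter


def deep_knn_filter(features, labels, k):
    """Return a keep-mask: True for points whose label matches the plurality
    label of their k nearest neighbors in deep feature space."""
    n = features.shape[0]
    keep = np.ones(n, dtype=bool)
    # Pairwise Euclidean distances; fine for a small demo, but a KD-tree or
    # an approximate index would be preferable for large training sets.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    for i in range(n):
        neighbors = np.argsort(dists[i])[1:k + 1]  # skip the point itself
        plurality_label, _ = Counter(labels[neighbors].tolist()).most_common(1)[0]
        if plurality_label != labels[i]:
            keep[i] = False  # label disagrees with its neighborhood: flag as poison
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-class "feature space": two well-separated clusters plus one point
    # planted inside the wrong cluster while carrying a clean (but mismatched) label.
    feats = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                       rng.normal(5.0, 0.5, (50, 2)),
                       [[5.0, 5.0]]])
    labels = np.array([0] * 50 + [1] * 50 + [0])
    mask = deep_knn_filter(feats, labels, k=10)
    print("flagged indices:", np.where(~mask)[0])  # the planted point (index 100)
```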
N. Peri, N. Gupta and W. R. Huang—The first three authors contributed equally to this work.
Notes
1. We use conventional training loss for all except the adversarial training defense.
2. There is no filtering in adversarial training.
Acknowledgement
Dickerson and Gupta were supported in part by NSF CAREER Award IIS-1846237, DARPA GARD HR00112020007, DARPA SI3-CMD S4761, DoD WHS Award HQ003420F0035, and a Google Faculty Research Award. Goldstein and his students were supported by the DARPA GARD and DARPA QED4RML programs. Additional support was provided by the National Science Foundation DMS division, and the JP Morgan Fellowship program.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peri, N., et al. (2020). Deep k-NN Defense Against Clean-Label Data Poisoning Attacks. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. Lecture Notes in Computer Science, vol. 12535. Springer, Cham. https://doi.org/10.1007/978-3-030-66415-2_4
DOI: https://doi.org/10.1007/978-3-030-66415-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66414-5
Online ISBN: 978-3-030-66415-2