Generating More Effective and Imperceptible Adversarial Text Examples for Sentiment Classification

Du, Xiaohu; Yi, Zibo; Li, Shasha; Ma, Jun; Yu, Jie; Tan, Yusong; Wu, Qinbo

doi:10.1007/978-3-030-57884-8_37


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12239)

Included in the following conference series:

International Conference on Artificial Intelligence and Security


Abstract

In this paper, we propose a novel white-box attack against a word-level CNN text classifier. First, when generating perturbations, we use a metric that combines Euclidean distance and cosine distance to find the most semantically similar substitute word. This effectively increases the attack success rate: the success rate of global search rises from 75.8% to 85.8%. Second, we introduce a coefficient of variation (CV) factor to control how dispersed the modified words are within an adversarial example. This matters because greedy search sometimes modifies positions that lie close together, which produces adversarial examples with poor readability. More dispersed modifications are less perceptible to humans and keep the text more readable. We evaluate the effectiveness of the attack by its success rate, and we measure how dispersed the modified words in the generated adversarial examples are by their CV value. Finally, combining the two methods both increases the attack success rate and makes the modification positions in the generated examples more dispersed.
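The abstract names three measurable ingredients: a substitution metric that combines Euclidean and cosine distance, the coefficient of variation (CV) of the modified word positions, and the attack success rate. The Python sketch below illustrates each one under stated assumptions. It is not the authors' implementation. In particular, the following are all assumptions, because the abstract does not specify them:

- the `alpha`-weighted sum used to combine the two distances (`alpha` is a hypothetical parameter);
- feeding raw token indices into the CV;
- the toy embeddings.

```python
import math
import statistics
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def combined_distance(u: Vector, v: Vector, alpha: float = 0.5) -> float:
    """Hypothetical combination: an alpha-weighted sum of the two distances.
    The paper's actual combination rule is not stated in the abstract."""
    return alpha * euclidean(u, v) + (1.0 - alpha) * cosine_distance(u, v)


def nearest_substitutes(word: str, embeddings: Dict[str, Vector],
                        k: int = 3, alpha: float = 0.5) -> List[str]:
    """Rank every other vocabulary word by combined distance to `word`."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    candidates.sort(key=lambda w: combined_distance(target, embeddings[w], alpha))
    return candidates[:k]


def coefficient_of_variation(positions: Sequence[int]) -> float:
    """Population std / mean of the modified token indices (assumed input)."""
    if len(positions) < 2:
        return 0.0
    mean = statistics.mean(positions)
    return statistics.pstdev(positions) / mean if mean else 0.0


def attack_success_rate(original: Sequence[int], adversarial: Sequence[int]) -> float:
    """Fraction of examples whose predicted label changed under attack."""
    flipped = sum(1 for o, a in zip(original, adversarial) if o != a)
    return flipped / len(original)


if __name__ == "__main__":
    # Toy 2-d embeddings, purely illustrative.
    emb = {"good": [1.0, 0.0], "great": [0.9, 0.1],
           "fine": [0.7, 0.3], "bad": [-1.0, 0.0]}
    print(nearest_substitutes("good", emb, k=2))            # ['great', 'fine']
    print(coefficient_of_variation([0, 1, 2]))              # clustered edits near the start
    print(coefficient_of_variation([10, 11, 12]))           # same cluster near the end
    print(attack_success_rate([1, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
```

The CV divides by the mean. Applied to raw token indices, the same tight cluster of edits therefore scores lower near the end of a sentence than near the start. How the paper actually applies CV should be checked against the full text.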



Author information

Authors and Affiliations

College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan, China
Xiaohu Du, Zibo Yi, Shasha Li, Jun Ma, Jie Yu, Yusong Tan & Qinbo Wu


Corresponding author

Correspondence to Xiaohu Du.

Editor information

Editors and Affiliations

Nanjing University of Information Science, Nanjing, China
Xingming Sun
Nanjing University of Information Science, Nanjing, China
Jinwei Wang
Purdue University, West Lafayette, IN, USA
Elisa Bertino


About this paper

Cite this paper

Du, X. et al. (2020). Generating More Effective and Imperceptible Adversarial Text Examples for Sentiment Classification. In: Sun, X., Wang, J., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2020. Lecture Notes in Computer Science, vol 12239. Springer, Cham. https://doi.org/10.1007/978-3-030-57884-8_37


DOI: https://doi.org/10.1007/978-3-030-57884-8_37
Published: 01 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57883-1
Online ISBN: 978-3-030-57884-8
eBook Packages: Computer Science, Computer Science (R0)
