Abstract
Many recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples. Adversarial attacks on DNNs for natural language processing tasks are notoriously more challenging than those in computer vision. This paper proposes an attention-based genetic algorithm (dubbed AGA) for generating adversarial examples under a black-box setting. In particular, the attention mechanism helps identify the relatively more important words in a given text. Based on this information, bespoke crossover and mutation operators are developed that guide AGA to focus on perturbing these important words, thereby saving computational resources. Experiments on three widely used datasets demonstrate that AGA achieves a higher success rate with fewer than \(48\%\) of the queries required by the peer algorithms. In addition, the underlying DNN can be made more robust by using the adversarial examples obtained by AGA for adversarial training.
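To make the idea of attention-guided variation operators concrete, the snippet below is a minimal sketch, not the authors' implementation: it mutates word-substitution candidates with probability proportional to each word's attention weight and recombines two candidates mainly at high-attention positions. The helpers `attention` (per-word importance scores) and `synonyms` (a substitution-candidate generator) are hypothetical placeholders standing in for whatever attention model and synonym source an attack would actually use.

```python
# Illustrative sketch of attention-guided mutation and crossover for a
# word-substitution attack (assumed interfaces, not the paper's code).
import random
from typing import Callable, List


def attention_guided_mutation(
    words: List[str],
    attention: List[float],                 # per-word importance scores
    synonyms: Callable[[str], List[str]],   # candidate substitutions per word
    rate: float = 0.2,
) -> List[str]:
    """Replace words with synonyms, favouring positions with high attention."""
    total = sum(attention) or 1.0
    child = list(words)
    for i, word in enumerate(words):
        # Mutation probability is proportional to this word's attention share.
        if random.random() < rate * len(words) * attention[i] / total:
            candidates = synonyms(word)
            if candidates:
                child[i] = random.choice(candidates)
    return child


def attention_guided_crossover(
    parent_a: List[str],
    parent_b: List[str],
    attention: List[float],
) -> List[str]:
    """Mix two candidate texts, exchanging words only at high-attention positions."""
    # Treat roughly the top quarter of positions (by attention) as important.
    threshold = sorted(attention, reverse=True)[max(1, len(attention) // 4) - 1]
    return [
        random.choice([a, b]) if attention[i] >= threshold else a
        for i, (a, b) in enumerate(zip(parent_a, parent_b))
    ]
```

In a full attack, operators of this kind would sit inside an ordinary genetic-algorithm loop in which fitness is measured by how much a candidate text reduces the victim model's confidence in the original label, so that query effort concentrates on the words the attention scores mark as influential.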
Keywords
- Attention mechanism
- Adversarial attack
- Genetic algorithm
- Natural language processing
This work was supported by UKRI Future Leaders Fellowship (MR/S017062/1), EPSRC (2404317), NSFC (62076056), Royal Society (IES/R2/212077) and Amazon Research Award.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, S., Li, K., Min, G. (2022). Attention-Based Genetic Algorithm for Adversarial Attack in Natural Language Processing. In: Rudolph, G., Kononova, A.V., Aguirre, H., Kerschke, P., Ochoa, G., Tušar, T. (eds) Parallel Problem Solving from Nature – PPSN XVII. PPSN 2022. Lecture Notes in Computer Science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_24
DOI: https://doi.org/10.1007/978-3-031-14714-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14713-5
Online ISBN: 978-3-031-14714-2
eBook Packages: Computer Science, Computer Science (R0)