
Targeted Universal Adversarial Perturbations for Automatic Speech Recognition

  • Conference paper
Information Security (ISC 2021)

Abstract

Automatic speech recognition (ASR) is an essential technology in commercial products today. However, the underlying deep learning models used in ASR systems are vulnerable to adversarial examples (AEs), which are generated by applying small or imperceptible perturbations to audio in order to fool these models. Recently, universal adversarial perturbations (UAPs) have attracted much research interest. Unlike conventional audio AEs, UAPs are not tied to a specific input audio signal: given a generic audio signal, an audio AE can be generated by directly applying a UAP. This paper presents a method of generating UAPs based on a targeted phrase. To the best of our knowledge, our proposed method is the first to successfully attack ASR models with connectionist temporal classification (CTC) loss. In addition to generating UAPs, we empirically show that the UAPs can be considered signals that are themselves transcribed as the target phrase. We also show that the UAPs preserve temporal dependency, such that the audio AEs generated using these UAPs also preserve temporal dependency.
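The optimization behind a targeted UAP can be illustrated with a deliberately simplified sketch. The following is not the paper's method: it substitutes a random linear model with a softmax output for a real ASR network, and per-example cross-entropy toward a single fixed target class for CTC loss over a target phrase. All names and parameters (`W`, `eps`, `lr`, `steps`) are hypothetical. What it does show is the core idea: one shared perturbation `delta` is optimized over a whole batch of different inputs, with a projection step keeping it small.

```python
# Toy sketch of optimizing a targeted universal perturbation (UAP).
# Stand-ins (not the paper's setup): a random linear "acoustic model"
# with softmax outputs, and cross-entropy toward one target class in
# place of CTC loss over a target phrase.
import numpy as np

rng = np.random.default_rng(0)
D, C, B = 20, 5, 16          # feature dim, classes, batch of "utterances"
W = rng.normal(size=(D, C))  # frozen toy model weights
X = rng.normal(size=(B, D))  # batch of generic input signals
target = 3                   # stand-in for the target transcription
eps, lr, steps = 0.5, 0.02, 300

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def target_prob(delta):
    """Mean model probability of the target class over the batch."""
    return softmax((X + delta) @ W)[:, target].mean()

delta = np.zeros(D)          # the universal perturbation, shared by all inputs
p0 = target_prob(delta)
onehot = np.eye(C)[target]
for _ in range(steps):
    P = softmax((X + delta) @ W)
    # for softmax + cross-entropy, dL/dx = (p - y) @ W.T; average over batch
    grad = ((P - onehot) @ W.T).mean(axis=0)
    # signed gradient step, then project back into the L-inf ball of radius eps
    delta = np.clip(delta - lr * np.sign(grad), -eps, eps)

print(p0, target_prob(delta))
```

In the actual attack setting, the toy model would be replaced by an ASR network such as DeepSpeech, the per-class loss by CTC loss against the target phrase, and the batch by many real utterances, so that the single `delta` transfers across inputs.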


Notes

  1. https://github.com/SeanNaren/deepspeech.pytorch.

  2. We used the open source implementation from https://github.com/AI-secure/Characterizing-Audio-Adversarial-Examples-using-Temporal-Dependency.

  3. https://pypi.org/project/pyroomacoustics/.



Author information

Correspondence to Wei Zong.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zong, W., Chow, YW., Susilo, W., Rana, S., Venkatesh, S. (2021). Targeted Universal Adversarial Perturbations for Automatic Speech Recognition. In: Liu, J.K., Katsikas, S., Meng, W., Susilo, W., Intan, R. (eds) Information Security. ISC 2021. Lecture Notes in Computer Science(), vol 13118. Springer, Cham. https://doi.org/10.1007/978-3-030-91356-4_19


  • DOI: https://doi.org/10.1007/978-3-030-91356-4_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91355-7

  • Online ISBN: 978-3-030-91356-4

  • eBook Packages: Computer Science (R0)
