
Targeted Universal Adversarial Perturbations for Automatic Speech Recognition

  • Conference paper
Information Security (ISC 2021)

Abstract

Automatic speech recognition (ASR) is an essential technology in commercial products today. However, the underlying deep learning models used in ASR systems are vulnerable to adversarial examples (AEs), which are generated by applying small or imperceptible perturbations to audio in order to fool these models. Recently, universal adversarial perturbations (UAPs) have attracted much research interest. Unlike conventional audio AEs, UAPs are not tied to a specific input audio signal: given a generic audio signal, an audio AE can be generated by directly applying a UAP. This paper presents a method of generating UAPs based on a targeted phrase. To the best of our knowledge, our proposed method is the first to successfully attack ASR models with connectionist temporal classification (CTC) loss. In addition to generating UAPs, we empirically show that the UAPs can be considered signals that are themselves transcribed as the target phrase. We also show that the UAPs preserve temporal dependency, such that the audio AEs generated using these UAPs also preserve temporal dependency.
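The optimization behind a targeted UAP can be illustrated with a deliberately simplified sketch. The following is not the paper's method: it substitutes a random linear model with a softmax output for a real ASR network, and per-example cross-entropy toward a single fixed target class for CTC loss over a target phrase. All names and parameters (`W`, `eps`, `lr`, `steps`) are hypothetical. What it does show is the core idea: one shared perturbation `delta` is optimized over a whole batch of different inputs, with a projection step keeping it small.

```python
# Toy sketch of optimizing a targeted universal perturbation (UAP).
# Stand-ins (not the paper's setup): a random linear "acoustic model"
# with softmax outputs, and cross-entropy toward one target class in
# place of CTC loss over a target phrase.
import numpy as np

rng = np.random.default_rng(0)
D, C, B = 20, 5, 16          # feature dim, classes, batch of "utterances"
W = rng.normal(size=(D, C))  # frozen toy model weights
X = rng.normal(size=(B, D))  # batch of generic input signals
target = 3                   # stand-in for the target transcription
eps, lr, steps = 0.5, 0.02, 300

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def target_prob(delta):
    """Mean model probability of the target class over the batch."""
    return softmax((X + delta) @ W)[:, target].mean()

delta = np.zeros(D)          # the universal perturbation, shared by all inputs
p0 = target_prob(delta)
onehot = np.eye(C)[target]
for _ in range(steps):
    P = softmax((X + delta) @ W)
    # for softmax + cross-entropy, dL/dx = (p - y) @ W.T; average over batch
    grad = ((P - onehot) @ W.T).mean(axis=0)
    # signed gradient step, then project back into the L-inf ball of radius eps
    delta = np.clip(delta - lr * np.sign(grad), -eps, eps)

print(p0, target_prob(delta))
```

In the actual attack setting, the toy model would be replaced by an ASR network such as DeepSpeech, the per-class loss by CTC loss against the target phrase, and the batch by many real utterances, so that the single `delta` transfers across inputs.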


Notes

  1. https://github.com/SeanNaren/deepspeech.pytorch.

  2. We used the open source implementation from https://github.com/AI-secure/Characterizing-Audio-Adversarial-Examples-using-Temporal-Dependency.

  3. https://pypi.org/project/pyroomacoustics/.



Author information

Correspondence to Wei Zong.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zong, W., Chow, YW., Susilo, W., Rana, S., Venkatesh, S. (2021). Targeted Universal Adversarial Perturbations for Automatic Speech Recognition. In: Liu, J.K., Katsikas, S., Meng, W., Susilo, W., Intan, R. (eds) Information Security. ISC 2021. Lecture Notes in Computer Science(), vol 13118. Springer, Cham. https://doi.org/10.1007/978-3-030-91356-4_19


  • DOI: https://doi.org/10.1007/978-3-030-91356-4_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91355-7

  • Online ISBN: 978-3-030-91356-4

  • eBook Packages: Computer Science (R0)
