Abstract
Automatic speech recognition (ASR) applications are ubiquitous these days, and a variety of commercial products use powerful ASR capabilities to transcribe user speech. However, like other deep learning models, the models underlying ASR systems are vulnerable to adversarial example (AE) attacks. Audio AEs sound non-suspicious to a casual listener but are incorrectly transcribed by an ASR system. Existing black-box AE techniques require an excessive number of requests to be sent to the targeted system, and such suspicious behavior can potentially trigger a threat alert on that system. This paper proposes a method of generating black-box AEs that significantly reduces the required number of requests. We describe the proposed method and present experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zong, W., Chow, Y.-W., Susilo, W. (2021). Black-Box Audio Adversarial Example Generation Using Variational Autoencoder. In: Gao, D., Li, Q., Guan, X., Liao, X. (eds.) Information and Communications Security. ICICS 2021. Lecture Notes in Computer Science, vol. 12919. Springer, Cham. https://doi.org/10.1007/978-3-030-88052-1_9
Print ISBN: 978-3-030-88051-4
Online ISBN: 978-3-030-88052-1
eBook Packages: Computer Science (R0)