Privacy-Preserving Speaker Verification and Speech Recognition

  • Conference paper
Emerging Technologies for Authorization and Authentication (ETAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13782)

Abstract

This paper proposes an approach to speaker verification and speech recognition for environments that require authentication and privacy protection while keeping accuracy and data utility high. Our methodology protects audio files and users’ identities through encryption and hashing algorithms while still predicting the speaker’s identity accurately. For speech recognition, we introduce a mechanism that anonymizes the transcript of the recognized speech: Named Entity Recognition (NER) identifies sensitive entities, which are then removed from the text according to the user’s preferences. A privacy-preserving version of the original audio is then obtained by text-to-speech conversion of the anonymized transcript, so that the anonymized audio and transcript can be transmitted together to third parties or service providers without violating privacy restrictions. The proposed methodology has been validated through a set of experiments on the well-known LibriSpeech dataset. ECAPA-TDNN, a type of Time Delay Neural Network, was used for speaker verification; Deep Speech, a type of Recurrent Neural Network, for speech recognition; NER for entity recognition; and cryptography and hashing for privacy protection. The results demonstrate the validity of our approach: it protects the privacy of user data and biometric information while performing data analysis with high accuracy, and its results remain similar to those obtained with no privacy mechanisms in place, including when several privacy mechanisms are used.
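
As an illustration of two steps named in the abstract, the following minimal sketch hashes a speaker identifier and removes sensitive entities from a transcript according to user preferences. It is not the authors' implementation. The names `Entity`, `pseudonymize_speaker` and `anonymize_transcript`, the use of a keyed hash (HMAC with SHA-256), the `[REDACTED]` placeholder, and the hand-written entity spans are all assumptions made for this example; in a real pipeline the spans would come from an NER model.

```python
import hashlib
import hmac
from typing import Iterable, NamedTuple, Set


class Entity(NamedTuple):
    """One named entity as a character span (hypothetical stand-in for NER output)."""
    start: int   # index of the first character of the entity
    end: int     # index one past the last character of the entity
    label: str   # entity type, e.g. "PERSON", "GPE" or "DATE"


def pseudonymize_speaker(speaker_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) to store in place of the raw identifier."""
    return hmac.new(secret_key, speaker_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_transcript(
    text: str,
    entities: Iterable[Entity],
    sensitive_labels: Set[str],
    placeholder: str = "[REDACTED]",
) -> str:
    """Replace every entity whose label the user marked as sensitive."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        # Keep entities the user did not mark; skip spans overlapping a redaction.
        if ent.label not in sensitive_labels or ent.start < cursor:
            continue
        pieces.append(text[cursor:ent.start])
        pieces.append(placeholder)
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


transcript = "Alice Smith will fly to Paris on Friday"
# Assumed entity spans; a real system would obtain them from an NER model.
entities = [Entity(0, 11, "PERSON"), Entity(24, 29, "GPE"), Entity(33, 39, "DATE")]

# The user chooses to hide people and places but keep dates.
print(anonymize_transcript(transcript, entities, {"PERSON", "GPE"}))
# [REDACTED] will fly to [REDACTED] on Friday

# The stored identifier is a 64-character hex digest, not the raw speaker name.
print(pseudonymize_speaker("speaker-19", b"demo-key"))
```

The anonymized string could then be passed to a text-to-speech tool to produce the privacy-preserving audio described above. A keyed hash is chosen here so that identifiers cannot be recomputed without the key; an unkeyed SHA-256 digest would be the simpler alternative.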

This work was partially supported by the EU H2020 project SIFIS-Home, G.A. n. 952652.

Notes

  1. Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence: https://bit.ly/3y5wf6e.

  2. https://gtts.readthedocs.io/en/latest/.

  3. http://www.openslr.org/12/.

  4. https://github.com/speechbrain/speechbrain.

  5. https://github.com/mozilla/DeepSpeech.

  6. https://cryptography.io/en/latest/.

  7. https://docs.python.org/3.5/library/hashlib.html.

  8. https://spacy.io/.

  9. https://gtts.readthedocs.io/en/latest/.

  10. https://github.com/speechbrain/speechbrain.

  11. https://github.com/mozilla/DeepSpeech.

References

  1. Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22(10), 1533–1545 (2014)

  2. Aloufi, R., Haddadi, H., Boyle, D.: Emotionless: privacy-preserving speech analysis for voice assistants. arXiv preprint arXiv:1908.03632 (2019)

  3. Amberkar, A., Awasarmol, P., Deshmukh, G., Dave, P.: Speech recognition using recurrent neural networks. In: 2018 International Conference on Current Trends Towards Converging Technologies (ICCTCT), pp. 1–4. IEEE (2018)

  4. Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182. PMLR (2016)

  5. Barker, E.B., et al.: Secure hash standard (SHS) [includes change notice from 2/25/2004] (2002)

  6. Blazhevski, D., Bozhinovski, A., Stojchevska, B., Pachovski, V.: Modes of operation of the AES algorithm (2013)

  7. Bolton, T., Dargahi, T., Belguith, S., Al-Rakhami, M.S., Sodhro, A.H.: On the security and privacy challenges of virtual assistants. Sensors 21(7), 2312 (2021)

  8. Campbell, W.M., Sturim, D.E., Reynolds, D.A.: Support vector machines using GMM supervectors for speaker verification. IEEE Sig. Process. Lett. 13(5), 308–311 (2006)

  9. Chung, J.S., Nagrani, A., Zisserman, A.: VoxCeleb2: deep speaker recognition. arXiv preprint arXiv:1806.05622 (2018)

  10. Cramer, R., Damgård, I.B., et al.: Secure Multiparty Computation. Cambridge University Press, Cambridge (2015)

  11. Desplanques, B., Thienpondt, J., Demuynck, K.: ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification. arXiv preprint arXiv:2005.07143 (2020)

  12. Dworkin, M.J., et al.: Advanced encryption standard (AES) (2001)

  13. Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210. PMLR (2016)

  14. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)

  15. Hannun, A., et al.: Deep speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)

  16. Heron, S.: Advanced encryption standard (AES). Netw. Secur. 2009(12), 8–12 (2009)

  17. Hosseini, H., Yun, S., Park, H., Louizos, C., Soriaga, J., Welling, M.: Federated learning of user authentication models. arXiv preprint arXiv:2007.04618 (2020)

  18. Huang, K., Liu, X., Fu, S., Guo, D., Xu, M.: A lightweight privacy-preserving CNN feature extraction framework for mobile sensing. IEEE Trans. Dependable Secure Comput. 18(3), 1441–1455 (2019)

  19. Juang, B.H., Rabiner, L.R.: Hidden Markov models for speech recognition. Technometrics 33(3), 251–272 (1991)

  20. Kenny, P.: Bayesian speaker verification with heavy-tailed priors. In: Proceedings of Odyssey 2010 (2010)

  21. Krawczyk, H., Bellare, M., Canetti, R.: HMAC: keyed-hashing for message authentication. Technical report (1997)

  22. Kröger, J.L., Gellrich, L., Pape, S., Brause, S.R., Ullrich, S.: Personal information inference from voice recordings: user awareness and privacy concerns. Proc. Priv. Enhancing Technol. 2022(1), 6–27 (2022)

  23. Kuchling, A.: Python cryptography toolkit. Release 2(1), 1–16 (2008)

  24. Liu, J., Juuti, M., Lu, Y., Asokan, N.: Oblivious neural network predictions via MiniONN transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631 (2017)

  25. Malik, M., Malik, M.K., Mehmood, K., Makhdoom, I.: Automatic speech recognition: a survey. Multimed. Tools Appl. 80(6), 9411–9457 (2021). https://doi.org/10.1007/s11042-020-10073-7

  26. McLaren, M., Lawson, A., Lei, Y., Scheffer, N.: Adaptive Gaussian backend for robust language identification. In: Interspeech, pp. 84–88 (2013)

  27. Mohit, B.: Named entity recognition. In: Zitouni, I. (ed.) Natural Language Processing of Semitic Languages. TANLP, pp. 221–245. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-45358-8_7

  28. Nagrani, A., Chung, J.S., Zisserman, A.: VoxCeleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017)

  29. Nguyen, H.V., Bai, L.: Cosine similarity metric learning for face verification. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6493, pp. 709–720. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19309-5_55

  30. Paar, C., Pelzl, J.: Understanding Cryptography: A Textbook for Students and Practitioners. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04101-3

  31. Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)

  32. Parcollet, T., et al.: SpeechBrain: a general-purpose speech toolkit (2022)

  33. Pathak, M.A., Raj, B.: Privacy-preserving speaker verification and identification using Gaussian mixture models. IEEE Trans. Audio Speech Lang. Process. 21(2), 397–406 (2012)

  34. Po, D.K.: Similarity based information retrieval using Levenshtein distance algorithm. Int. J. Adv. Sci. Res. Eng. 6(04), 06–10 (2020)

  35. Qian, J., et al.: VoiceMask: anonymize and sanitize voice input on mobile devices. arXiv preprint arXiv:1711.11460 (2017)

  36. Rahulamathavan, Y.: Privacy-preserving similarity calculation of speaker features using fully homomorphic encryption. arXiv preprint arXiv:2202.07994 (2022)

  37. Ravanelli, M., et al.: SpeechBrain: a general-purpose speech toolkit. arXiv preprint arXiv:2106.04624 (2021)

  38. Room, C.: Named entity recognition. Algorithms 8(3), 48 (2020)

  39. Safavi, S., Russell, M., Jančovič, P.: Automatic speaker, age-group and gender identification from children’s speech. Comput. Speech Lang. 50, 141–156 (2018)

  40. Schuller, B., Batliner, A.: Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Wiley, Hoboken (2013)

  41. Schuller, B., Rigoll, G., Lang, M.: Hidden Markov model-based speech emotion recognition. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (ICASSP 2003), vol. 2, pp. II-1. IEEE (2003)

  42. Swietojanski, P., Ghoshal, A., Renals, S.: Convolutional neural networks for distant speech recognition. IEEE Sig. Process. Lett. 21(9), 1120–1124 (2014)

  43. Tan, C.B., Hijazi, M.H.A., Khamis, N., Zainol, Z., Coenen, F., Gani, A., et al.: A survey on presentation attack detection for automatic speaker verification systems: state-of-the-art, taxonomy, issues and future direction. Multimed. Tools Appl. 80(21), 32725–32762 (2021). https://doi.org/10.1007/s11042-021-11235-x

  44. Treiber, A., Nautsch, A., Kolberg, J., Schneider, T., Busch, C.: Privacy-preserving PLDA speaker verification using outsourced secure computation. Speech Commun. 114, 60–71 (2019)

  45. Vaidya, T., Sherr, M.: You talk too much: limiting privacy exposure via voice input. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 84–91. IEEE (2019)

  46. Yi, X., Paulet, R., Bertino, E.: Homomorphic encryption. In: Yi, X., Paulet, R., Bertino, E. (eds.) Homomorphic Encryption and Applications. SCS, pp. 27–46. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12229-8_2

Author information

Corresponding author

Correspondence to Wisam Abbasi.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Abbasi, W. (2023). Privacy-Preserving Speaker Verification and Speech Recognition. In: Saracino, A., Mori, P. (eds) Emerging Technologies for Authorization and Authentication. ETAA 2022. Lecture Notes in Computer Science, vol 13782. Springer, Cham. https://doi.org/10.1007/978-3-031-25467-3_7

  • DOI: https://doi.org/10.1007/978-3-031-25467-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25466-6

  • Online ISBN: 978-3-031-25467-3

  • eBook Packages: Computer Science, Computer Science (R0)
