Privacy-Preserving Speaker Verification and Speech Recognition

Abbasi, Wisam

doi:10.1007/978-3-031-25467-3_7

Wisam Abbasi (ORCID: orcid.org/0000-0002-6901-1838)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13782)

Included in the following conference series:

International Workshop on Emerging Technologies for Authorization and Authentication

Abstract

This paper proposes an approach to speaker verification and speech recognition for environments that require both authentication and privacy protection while keeping accuracy and data utility high. The methodology protects audio files and users’ identities with encryption and hashing algorithms while still predicting the speaker’s identity accurately. For speech recognition, we introduce a mechanism that anonymizes the transcript of the recognized speech: Named Entity Recognition (NER) identifies sensitive entities, which are removed from the text according to the user’s preferences. A privacy-preserving version of the original audio is then obtained by text-to-speech conversion of the anonymized transcript. The anonymized audio and transcript can together be transmitted to third parties or service providers without violating privacy restrictions. The proposed methodology has been validated with a set of experiments on a well-known audio dataset, LibriSpeech. ECAPA-TDNN, a type of Time Delay Neural Network, was used for speaker verification; Deep Speech, a type of Recurrent Neural Network, was used for speech recognition; NER was used for entity recognition; and cryptography and hashing were used for privacy protection. The results demonstrate the validity of our approach: it protects the privacy of user data and biometric information while performing data analysis with a high degree of accuracy, and its results are similar to those obtained with no privacy mechanisms in place, including when several privacy mechanisms are used.
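As a rough illustration of two steps named in the abstract, hashing a user identity and anonymizing a transcript according to user preferences, the following minimal Python sketch may help. It is not the paper's implementation. The function names, the use of a keyed hash (HMAC-SHA-256), the `(start, end, label)` span format standing in for NER output, and the choice to substitute a `[LABEL]` placeholder rather than delete the entity outright are all assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize_speaker(speaker_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a speaker identifier.

    Assumption: a keyed hash (HMAC) is used so that identifiers cannot be
    recovered by hashing guesses without the key.
    """
    return hmac.new(secret_key, speaker_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_transcript(transcript: str, entities, labels_to_remove) -> str:
    """Replace selected entity spans in a transcript with placeholders.

    `entities` is a list of (start, end, label) character spans, in the
    shape an NER model could produce (hypothetical input here; spans are
    assumed not to overlap). `labels_to_remove` holds the entity types the
    user wants hidden; all other entities are left untouched.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if label not in labels_to_remove:
            continue
        pieces.append(transcript[cursor:start])  # text before the entity
        pieces.append(f"[{label}]")              # placeholder for the entity
        cursor = end
    pieces.append(transcript[cursor:])           # remaining text
    return "".join(pieces)


# Usage with hand-written spans standing in for NER output.
transcript = "alice smith moved to pisa in march"
entities = [(0, 11, "PERSON"), (21, 25, "GPE"), (29, 34, "DATE")]
print(anonymize_transcript(transcript, entities, {"PERSON", "GPE"}))
print(pseudonymize_speaker("speaker-1", b"example-key"))
```

In this sketch the user's preferences are simply the set of entity labels to hide, so the same transcript can be released at different levels of detail; the anonymized text could then be passed to a text-to-speech step.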

This work was partially supported by the EU H2020 project SIFIS-Home, G.A. n. 952652.

Notes

1.
Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence: https://bit.ly/3y5wf6e.
2.
https://gtts.readthedocs.io/en/latest/.
3.
http://www.openslr.org/12/.
4.
https://github.com/speechbrain/speechbrain.
5.
https://github.com/mozilla/DeepSpeech.
6.
https://cryptography.io/en/latest/.
7.
https://docs.python.org/3.5/library/hashlib.html.
8.
https://spacy.io/.
9.
https://gtts.readthedocs.io/en/latest/.
10.
https://github.com/speechbrain/speechbrain.
11.
https://github.com/mozilla/DeepSpeech.

Author information

Authors and Affiliations

Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
Wisam Abbasi
Department of Computer Science at the University of Pisa, Pisa, Italy
Wisam Abbasi

Authors

Wisam Abbasi

Corresponding author

Correspondence to Wisam Abbasi.

Editor information

Editors and Affiliations

Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
Andrea Saracino
Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
Paolo Mori

About this paper

Cite this paper

Abbasi, W. (2023). Privacy-Preserving Speaker Verification and Speech Recognition. In: Saracino, A., Mori, P. (eds) Emerging Technologies for Authorization and Authentication. ETAA 2022. Lecture Notes in Computer Science, vol 13782. Springer, Cham. https://doi.org/10.1007/978-3-031-25467-3_7

DOI: https://doi.org/10.1007/978-3-031-25467-3_7
Published: 31 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25466-6
Online ISBN: 978-3-031-25467-3
eBook Packages: Computer Science, Computer Science (R0)
