Abstract
This paper proposes an approach to speaker verification and speech recognition in environments that require authentication and privacy protection, while accuracy and data utility must remain high. Our methodology aims at protecting audio files and users’ identities through the use of encryption and hashing algorithms, while at the same time providing accurate speaker’s identity prediction. In addition, for speech recognition, we introduce a mechanism to anonymize the resulting transcript of the recognized spoken language using the Named Entity Recognition method by removing sensitive entities from the text according to the user’s preferences. Furthermore, a privacy-preserving version of the original audio is obtained by performing a text-to-speech translation of the anonymized transcript, which together, the anonymous audio and transcript can be transmitted to third parties or service providers without violating privacy restrictions. The proposed methodology has been validated with a set of experiments on a well-known audio dataset, the Librispeech dataset. A type of Time Delay Neural Networks, ECAPA-TDNN was used for speaker verification, Deep Speech as a type of Recurrent Neural Networks was used for speech recognition, NER for entity recognition, cryptography and hashing for privacy protection. The results demonstrate the validity of our approach to protecting the privacy of user data and biometric information while simultaneously performing data analysis with a high degree of accuracy and similarity with the results obtained with no privacy mechanisms in place, also considering the use of several privacy mechanisms.
This work was partially supported by the EU H2020 project SIFIS-Home, G.A. n. 952652.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence: https://bit.ly/3y5wf6e.
- 2.
- 3.
http://wwwopenslr.org!12/.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
References
Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22(10), 1533–1545 (2014)
Aloufi, R., Haddadi, H., Boyle, D.: Emotionless: privacy-preserving speech analysis for voice assistants. arXiv preprint arXiv:1908.03632 (2019)
Amberkar, A., Awasarmol, P., Deshmukh, G., Dave, P.: Speech recognition using recurrent neural networks. In: 2018 International Conference on Current Trends Towards Converging Technologies (ICCTCT), pp. 1–4. IEEE (2018)
Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and mandarin. In: International Conference on Machine Learning, pp. 173–182. PMLR (2016)
Barker, E.B., et al.: Secure hash standard (SHS) [includes change notice from 2/25/2004] (2002)
Blazhevski, D., Bozhinovski, A., Stojchevska, B., Pachovski, V.: Modes of operation of the AES algorithm (2013)
Bolton, T., Dargahi, T., Belguith, S., Al-Rakhami, M.S., Sodhro, A.H.: On the security and privacy challenges of virtual assistants. Sensors 21(7), 2312 (2021)
Campbell, W.M., Sturim, D.E., Reynolds, D.A.: Support vector machines using GMM supervectors for speaker verification. IEEE Sig. Process. Lett. 13(5), 308–311 (2006)
Chung, J.S., Nagrani, A., Zisserman, A.: VoxCeleb2: deep speaker recognition. arXiv preprint arXiv:1806.05622 (2018)
Cramer, R., Damgård, I.B., et al.: Secure Multiparty Computation. Cambridge University Press, Cambridge (2015)
Desplanques, B., Thienpondt, J., Demuynck, K.: ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification. arXiv preprint arXiv:2005.07143 (2020)
Dworkin, M.J., et al.: Advanced encryption standard (AES) (2001)
Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210. PMLR (2016)
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)
Hannun, A., et al.: Deep speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)
Heron, S.: Advanced encryption standard (AES). Netw. Secur. 2009(12), 8–12 (2009)
Hosseini, H., Yun, S., Park, H., Louizos, C., Soriaga, J., Welling, M.: Federated learning of user authentication models. arXiv preprint arXiv:2007.04618 (2020)
Huang, K., Liu, X., Fu, S., Guo, D., Xu, M.: A lightweight privacy-preserving CNN feature extraction framework for mobile sensing. IEEE Trans. Dependable Secure Comput. 18(3), 1441–1455 (2019)
Juang, B.H., Rabiner, L.R.: Hidden Markov models for speech recognition. Technometrics 33(3), 251–272 (1991)
Kenny, P.: Bayesian speaker verification with, heavy tailed priors. In: Proceedings of Odyssey 2010 (2010)
Krawczyk, H., Bellare, M., Canetti, R.: HMAC: keyed-hashing for message authentication. Technical report (1997)
Kröger, J.L., Gellrich, L., Pape, S., Brause, S.R., Ullrich, S.: Personal information inference from voice recordings: user awareness and privacy concerns. Proc. Priv. Enhancing Technol. 2022(1), 6–27 (2022)
Kuchling, A.: Python cryptography toolkit. Release 2(1), 1–16 (2008)
Liu, J., Juuti, M., Lu, Y., Asokan, N.: Oblivious neural network predictions via minionn transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631 (2017)
Malik, M., Malik, M.K., Mehmood, K., Makhdoom, I.: Automatic speech recognition: a survey. Multimed. Tools Appl. 80(6), 9411–9457 (2021). https://doi.org/10.1007/s11042-020-10073-7
McLaren, M., Lawson, A., Lei, Y., Scheffer, N.: Adaptive Gaussian backend for robust language identification. In: Interspeech, pp. 84–88 (2013)
Mohit, B.: Named entity recognition. In: Zitouni, I. (ed.) Natural Language Processing of Semitic Languages. TANLP, pp. 221–245. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-45358-8_7
Nagrani, A., Chung, J.S., Zisserman, A.: VoxCeleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017)
Nguyen, H.V., Bai, L.: Cosine similarity metric learning for face verification. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6493, pp. 709–720. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19309-5_55
Paar, C., Pelzl, J.: Understanding Cryptography: A Textbook for Students and Practitioners. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04101-3
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)
Parcollet, T., et al.: SpeechBrain: a general-purpose speech toolkit (2022)
Pathak, M.A., Raj, B.: Privacy-preserving speaker verification and identification using gaussian mixture models. IEEE Trans. Audio Speech Lang. Process. 21(2), 397–406 (2012)
Po, D.K.: Similarity based information retrieval using Levenshtein distance algorithm. Int. J. Adv. Sci. Res. Eng. 6(04), 06–10 (2020)
Qian, J., et al.: VoiceMask: anonymize and sanitize voice input on mobile devices. arXiv preprint arXiv:1711.11460 (2017)
Rahulamathavan, Y.: Privacy-preserving similarity calculation of speaker features using fully homomorphic encryption. arXiv preprint arXiv:2202.07994 (2022)
Ravanelli, M., et al.: SpeechBrain: a general-purpose speech toolkit. arXiv preprint arXiv:2106.04624 (2021)
Room, C.: Named entity recognition. Algorithms 8(3), 48 (2020)
Safavi, S., Russell, M., Jančovič, P.: Automatic speaker, age-group and gender identification from children’s speech. Comput. Speech Lang. 50, 141–156 (2018)
Schuller, B., Batliner, A.: Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Wiley, Hoboken (2013)
Schuller, B., Rigoll, G., Lang, M.: Hidden Markov model-based speech emotion recognition. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (ICASSP 2003), vol. 2, pp. II-1. IEEE (2003)
Swietojanski, P., Ghoshal, A., Renals, S.: Convolutional neural networks for distant speech recognition. IEEE Sig. Process. Lett. 21(9), 1120–1124 (2014)
Tan, C.B., Hijazi, M.H.A., Khamis, N., Zainol, Z., Coenen, F., Gani, A., et al.: A survey on presentation attack detection for automatic speaker verification systems: state-of-the-art, taxonomy, issues and future direction. Multimed. Tools nd Appl. 80(21), 32725–32762 (2021). https://doi.org/10.1007/s11042-021-11235-x
Treiber, A., Nautsch, A., Kolberg, J., Schneider, T., Busch, C.: Privacy-preserving PLDA speaker verification using outsourced secure computation. Speech Commun. 114, 60–71 (2019)
Vaidya, T., Sherr, M.: You talk too much: limiting privacy exposure via voice input. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 84–91. IEEE (2019)
Yi, X., Paulet, R., Bertino, E.: Homomorphic encryption. In: Yi, X., Paulet, R., Bertino, E. (eds.) Homomorphic Encryption and Applications. SCS, pp. 27–46. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12229-8_2
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Abbasi, W. (2023). Privacy-Preserving Speaker Verification and Speech Recognition. In: Saracino, A., Mori, P. (eds) Emerging Technologies for Authorization and Authentication. ETAA 2022. Lecture Notes in Computer Science, vol 13782. Springer, Cham. https://doi.org/10.1007/978-3-031-25467-3_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-25467-3_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25466-6
Online ISBN: 978-3-031-25467-3
eBook Packages: Computer ScienceComputer Science (R0)