Abstract
Attempts to spoof automatic speaker verification (ASV) systems, i.e., voice biometric systems, have succeeded in the past, aided by advances in technology. Although countermeasures have been developed for individual spoofing attacks, a single versatile countermeasure is still urgently needed. Hence, designing a voice privacy system has become crucial. The energy losses in a speech production model contain speaker-specific information and thus provide acoustic cues for voice privacy. In this chapter, the design of a 2nd-order resonator and the linear prediction modeling of speech production are exploited to design a voice privacy system. The performance of the proposed system is compared with the secondary baseline system of the INTERSPEECH 2020 Voice Privacy Challenge. Improved performance in terms of equal error rate (EER) and word error rate (WER) is achieved for various subsets of the corpora. Furthermore, anonymization by cryptography, which has limitations in complexity and implementation cost, is discussed in detail for privacy preservation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gupta, P., Singh, S., Prajapati, G.P., Patil, H.A. (2023). Voice Privacy in Biometrics. In: Paunwala, C., et al. Biomedical Signal and Image Processing with Artificial Intelligence. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-15816-2_1
DOI: https://doi.org/10.1007/978-3-031-15816-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15815-5
Online ISBN: 978-3-031-15816-2
eBook Packages: Engineering, Engineering (R0)