Part of the book series: EAI/Springer Innovations in Communication and Computing (EAISICC)

Abstract

Advances in technology have made it possible to spoof automatic speaker verification (ASV) systems, i.e., voice biometric systems. Although countermeasures have been developed for individual spoofing attacks, a versatile countermeasure is still urgently needed, which makes the design of a voice privacy system crucial. The energy losses in a speech production model contain speaker-specific information and thus provide acoustic cues for voice privacy. In this chapter, the design of a 2nd-order resonator and the linear prediction modeling of speech production are exploited to design a voice privacy system. The performance of the proposed system is compared with the secondary baseline system of the INTERSPEECH 2020 Voice Privacy Challenge, and improved equal error rate (EER) and word error rate (WER) are achieved for various subsets of the corpora. Furthermore, anonymization by cryptography, which has limitations in complexity and implementation cost, is discussed in detail for privacy preservation.
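
As a rough illustration of why resonator energy losses matter, the sketch below uses the standard textbook mapping between a formant's centre frequency and bandwidth and the pole of a 2nd-order digital resonator. It is not the chapter's implementation: the function names, the example formant (500 Hz), bandwidths, and sampling rate are all assumptions chosen for illustration. A wider bandwidth (more energy loss) pulls the pole toward the origin, and that pole radius is the kind of quantity a linear-prediction-based system could read off or modify.

```python
import math


def resonator_coeffs(formant_hz, bandwidth_hz, fs_hz):
    """Coefficients (b0, a1, a2) of H(z) = b0 / (1 + a1 z^-1 + a2 z^-2).

    Hypothetical helper: maps a formant frequency and bandwidth to a
    complex-conjugate pole pair r * exp(+/- j*theta).
    """
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius (damping)
    theta = 2.0 * math.pi * formant_hz / fs_hz     # pole angle (frequency)
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    b0 = 1.0 + a1 + a2                             # unity gain at DC
    return b0, a1, a2


def pole_radius_and_angle(a1, a2):
    """Recover the pole radius and angle from the denominator coefficients."""
    r = math.sqrt(a2)
    theta = math.acos(-a1 / (2.0 * r))
    return r, theta


def impulse_response(b0, a1, a2, n):
    """First n samples of the resonator's impulse response (direct form)."""
    y, y1, y2 = [], 0.0, 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        yk = b0 * x - a1 * y1 - a2 * y2
        y.append(yk)
        y2, y1 = y1, yk
    return y


if __name__ == "__main__":
    fs = 16000.0  # assumed sampling rate in Hz
    for bw in (50.0, 200.0):  # assumed narrow and wide bandwidths in Hz
        b0, a1, a2 = resonator_coeffs(500.0, bw, fs)
        r, theta = pole_radius_and_angle(a1, a2)
        print(f"bandwidth={bw:.0f} Hz  pole radius={r:.4f}  "
              f"frequency={theta * fs / (2 * math.pi):.1f} Hz")
```

Running the script prints the pole radius for a narrow and a wide bandwidth at the same centre frequency, which shows the radius shrinking as losses increase.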

Author information

Corresponding author

Correspondence to Gauri P. Prajapati.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Gupta, P., Singh, S., Prajapati, G.P., Patil, H.A. (2023). Voice Privacy in Biometrics. In: Paunwala, C., et al. Biomedical Signal and Image Processing with Artificial Intelligence. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-15816-2_1

  • DOI: https://doi.org/10.1007/978-3-031-15816-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15815-5

  • Online ISBN: 978-3-031-15816-2

  • eBook Packages: Engineering, Engineering (R0)
