
On the study of replay and voice conversion attacks to text-dependent speaker verification

Multimedia Tools and Applications

Abstract

Automatic speaker verification (ASV) aims to automatically accept or reject a claimed identity based on a speech sample. Recent individual studies have confirmed that state-of-the-art text-independent ASV systems are vulnerable to replay, speech synthesis and voice conversion attacks on various databases. However, the behaviour of text-dependent ASV systems under different spoofing attacks has not been systematically assessed. In this work, we first conduct a systematic analysis of the vulnerability of text-dependent ASV systems to replay and voice conversion attacks using the same protocol and database, namely the RSR2015 database, which represents mobile-device-quality speech. We then analyse the interplay between voice conversion and speaker verification by linking voice conversion objective evaluation measures to speaker verification error rates, examining the vulnerabilities from the perspective of voice conversion.
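The abstract links voice conversion objective measures to speaker verification error rates. The sketch below shows the two kinds of quantity involved, under common conventions that are assumptions here rather than details taken from the paper: the decision threshold is fixed at the equal error rate (EER) point using genuine target trials and zero-effort impostor trials, spoofed trials (replay or converted speech) are then scored against that same threshold to give a spoofing false acceptance rate, and mel-cepstral distortion (MCD) stands in as one example of a voice conversion objective measure. The function names are hypothetical and the scores are toy values, not results from the study.

```python
import math


def eer_threshold(target_scores, nontarget_scores):
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold. The EER is approximated as the
    mean of FAR and FRR at the threshold where the two are closest.
    """
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def false_acceptance_rate(scores, threshold):
    """Fraction of (impostor or spoofed) trials accepted at a fixed threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Frame-averaged MCD in dB between time-aligned mel-cepstral sequences.

    Each frame is a list of coefficients (energy term c0 assumed excluded);
    frame alignment (e.g. by dynamic time warping) is assumed done beforehand.
    """
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, c in zip(ref_frames, conv_frames):
        total += const * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, c)))
    return total / len(ref_frames)


# Toy scores only, to illustrate the protocol.
target = [2.1, 1.8, 2.5, 0.9]       # genuine target trials
nontarget = [-1.0, 0.2, -0.5, 1.0]  # zero-effort impostor trials
spoof = [1.5, 0.8, 2.0, -0.2]       # spoofed trials against the same targets

eer, thr = eer_threshold(target, nontarget)
print(eer, thr, false_acceptance_rate(spoof, thr))  # prints 0.25 1.0 0.5
```

Fixing the threshold on non-spoofed trials mirrors a deployed system that is tuned without knowledge of attacks; a rise in false acceptances on spoofed trials at that threshold then quantifies the vulnerability, and per-attack MCD values can be set against those rates.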


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. http://festvox.org/index.html

  2. http://aholab.ehu.es/ahocoder/


Author information

Correspondence to Zhizheng Wu.


About this article


Cite this article

Wu, Z., Li, H. On the study of replay and voice conversion attacks to text-dependent speaker verification. Multimed Tools Appl 75, 5311–5327 (2016). https://doi.org/10.1007/s11042-015-3080-9


  • DOI: https://doi.org/10.1007/s11042-015-3080-9
