Abstract
Automatic speaker verification (ASV) is the task of automatically accepting or rejecting a claimed identity based on a speech sample. Recently, individual studies have confirmed the vulnerability of state-of-the-art text-independent ASV systems to replay, speech synthesis and voice conversion attacks on various databases. However, the behaviour of text-dependent ASV systems has not been systematically assessed in the face of various spoofing attacks. In this work, we first conduct a systematic analysis of the vulnerability of text-dependent ASV systems to replay and voice conversion attacks using the same protocol and database, namely the RSR2015 database, which represents mobile-device-quality speech. We then analyse the interplay of voice conversion and speaker verification by linking the objective evaluation measures of voice conversion with the speaker verification error rates, so as to examine the vulnerabilities from the perspective of voice conversion.
Cite this article
Wu, Z., Li, H. On the study of replay and voice conversion attacks to text-dependent speaker verification. Multimed Tools Appl 75, 5311–5327 (2016). https://doi.org/10.1007/s11042-015-3080-9
DOI: https://doi.org/10.1007/s11042-015-3080-9