On the study of replay and voice conversion attacks to text-dependent speaker verification

Wu, Zhizheng; Li, Haizhou

doi:10.1007/s11042-015-3080-9

Published: 03 December 2015

Multimedia Tools and Applications, volume 75, pages 5311–5327 (2016)

Zhizheng Wu¹ & Haizhou Li²

Abstract

Automatic speaker verification (ASV) automatically accepts or rejects a claimed identity based on a speech sample. Recently, individual studies have confirmed the vulnerability of state-of-the-art text-independent ASV systems to replay, speech synthesis and voice conversion attacks on various databases. However, the behaviour of text-dependent ASV systems has not been systematically assessed in the face of various spoofing attacks. In this work, we first conduct a systematic analysis of the vulnerability of text-dependent ASV systems to replay and voice conversion attacks using the same protocol and database, in particular the RSR2015 database, which represents mobile-device-quality speech. We then analyse the interplay of voice conversion and speaker verification by linking the voice conversion objective evaluation measures with the speaker verification error rates, in order to examine the vulnerabilities from the perspective of voice conversion.
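
The accept/reject decision and the error rates mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the score values, the function names `error_rates` and `equal_error_rate`, and the choice of fixing the threshold at the equal error rate point of a zero-effort impostor baseline are all assumptions made for illustration. The sketch shows how a threshold tuned without spoofing in mind might be evaluated against spoofed trials.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) for a system that accepts scores >= threshold.

    FAR: fraction of impostor trials wrongly accepted.
    FRR: fraction of genuine trials wrongly rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning thresholds at the observed scores.

    Returns (eer, threshold), where the threshold is the first candidate
    (in ascending order) that minimises |FAR - FRR|.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical verification scores (higher means "more like the target").
genuine = [2.1, 1.8, 2.5, 1.2, 2.9]          # target speaker trials
zero_effort = [-1.0, -0.5, 0.3, -1.7, 0.1]   # ordinary impostor trials
spoofed = [1.5, 0.9, 2.0, -0.2, 1.3]         # replayed or converted speech

# Fix the operating point on the baseline, then test it under attack.
eer, threshold = equal_error_rate(genuine, zero_effort)
far_spoof, _ = error_rates(genuine, spoofed, threshold)

print(f"baseline EER: {eer:.2f} at threshold {threshold}")
print(f"FAR under spoofing at the same threshold: {far_spoof:.2f}")
```

With these toy scores the baseline separates genuine and impostor trials cleanly, yet a majority of the spoofed trials clear the same threshold, which is the kind of gap a vulnerability analysis sets out to measure.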

Author information

Authors and Affiliations

The Centre for Speech Technology Research (CSTR), University of Edinburgh, Edinburgh, UK
Zhizheng Wu
Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore, Singapore
Haizhou Li

Corresponding author

Correspondence to Zhizheng Wu.

About this article

Cite this article

Wu, Z., Li, H. On the study of replay and voice conversion attacks to text-dependent speaker verification. Multimed Tools Appl 75, 5311–5327 (2016). https://doi.org/10.1007/s11042-015-3080-9

Received: 03 August 2015
Revised: 03 October 2015
Accepted: 13 November 2015
Published: 03 December 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11042-015-3080-9
