Speaker Recognition Anti-spoofing

  • Nicholas Evans
  • Tomi Kinnunen
  • Junichi Yamagishi
  • Zhizheng Wu
  • Federico Alegre
  • Phillip De Leon
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Progress in the development of spoofing countermeasures for automatic speaker recognition is less advanced than equivalent work related to other biometric modalities. This chapter outlines the potential for even state-of-the-art automatic speaker recognition systems to be spoofed. While the use of a multitude of different datasets, protocols and metrics complicates the meaningful comparison of different vulnerabilities, we review previous work related to impersonation, replay, speech synthesis and voice conversion spoofing attacks. The chapter also presents an analysis of the early work to develop spoofing countermeasures. The literature shows that there is significant potential for automatic speaker verification systems to be spoofed, that significant further work is required to develop generalised countermeasures, that there is a need for standard datasets, evaluation protocols and metrics, and that greater emphasis should be placed on text-dependent scenarios.

Keywords

Vocal Tract, Replay Attack, Speech Synthesis, False Acceptance Rate, Synthetic Speech

Notes

Acknowledgments

This work was partially supported by the TABULA RASA project funded under the 7th Framework Programme of the European Union (EU) (grant agreement number 257289), by the Academy of Finland (project no. 253120) and by EPSRC grants EP/I031022/1 (NST) and EP/J002526/1 (CAF).

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Nicholas Evans (1)
  • Tomi Kinnunen (2)
  • Junichi Yamagishi (3, 4)
  • Zhizheng Wu (5)
  • Federico Alegre (1)
  • Phillip De Leon (6)
  1. Department of Multimedia Communications, Campus SophiaTech, EURECOM, Biot, France
  2. Speech and Image Processing Unit, School of Computing, University of Eastern Finland (UEF), Joensuu, Finland
  3. National Institute of Informatics, Chiyoda-ku, Japan
  4. University of Edinburgh, Edinburgh, UK
  5. Emerging Research Lab, School of Computer Engineering, Nanyang Technological University (NTU), Singapore, Singapore
  6. Department 3-O, Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, USA
