40 Years of Progress in Automatic Speaker Recognition

Sadaoki Furui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5558)


Research in automatic speaker recognition has now spanned four decades. This paper surveys the major themes and advances made in the past 40 years of research so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech-based human biometrics. Although many techniques have been developed, many challenges have yet to be overcome before we can achieve the ultimate goal of creating human-like machines. Such a machine needs to be able to deliver satisfactory performance under a broad range of operating conditions. A much greater understanding of the human speech process is still required before automatic speaker recognition systems can approach human performance.


Keywords: Speaker recognition, speaker identification, speaker verification, speaker diarization, text-dependent, text-independent, robust recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

Sadaoki Furui, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan