Text and Language-Independent Speaker Recognition Using Suprasegmental Features and Support Vector Machines

  • Anvita Bajpai
  • Vinod Pathangay
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 40)


This paper demonstrates the presence of speaker-specific suprasegmental information in the Linear Prediction (LP) residual signal, which is obtained by removing the predictable part of the speech signal. Adding this information to existing speaker recognition systems based on segmental and subsegmental features can yield a better-performing combined system. The speaker-specific suprasegmental information can be perceived by listening to the residual and can also be seen as excitation peaks in the residual waveform. The challenge, however, lies in capturing this information from the residual signal. Standard signal processing and statistical techniques are not known to capture higher-order correlations among the residual samples. The Hilbert envelope of the residual is shown to further enhance the excitation peaks present in the residual signal. A speaker-specific pattern is also observed in the autocorrelation sequence of the Hilbert envelope, and further in the statistics of this autocorrelation sequence, which indicates the presence of speaker-specific suprasegmental information in the residual signal. In this work, no distinction is made between voiced and unvoiced sounds when extracting these features. A Support Vector Machine (SVM) is used to classify the patterns in the variance of the autocorrelation sequence for the speaker recognition task.
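
The feature chain named in the abstract (LP residual, Hilbert envelope, autocorrelation sequence, variance of that sequence) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the LP order of 10, the 400-sample frames with a 200-sample hop, the 160-lag range, and the choice to take the variance across frames for each lag are all assumptions, since the abstract does not give these details.

```python
import numpy as np


def lp_residual(frame, order=10):
    """LP residual of one frame (autocorrelation method; order is an assumption)."""
    n = len(frame)
    windowed = frame * np.hamming(n)
    # Autocorrelation lags 0..order of the windowed frame.
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with slight diagonal loading.
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags] + (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Predict each sample from the previous `order` samples and subtract.
    pred = np.zeros(n)
    for k in range(order):
        pred[k + 1:] += a[k] * frame[: n - k - 1]
    return frame - pred


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    # Zero the negative frequencies and double the positive ones.
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def normalized_autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag, scaled so that lag 0 is about 1."""
    x = x - np.mean(x)
    full = np.correlate(x, x, mode="full")
    mid = len(x) - 1
    ac = full[mid : mid + max_lag + 1]
    return ac / (ac[0] + 1e-12)


def autocorr_variance_feature(signal, frame_len=400, hop=200, order=10, max_lag=160):
    """Per-lag variance, across frames, of the envelope autocorrelation.

    All frames are used, with no voiced/unvoiced decision. The frame sizes
    and lag range are illustrative assumptions. The signal must contain at
    least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start : start + frame_len]
        envelope = hilbert_envelope(lp_residual(frame, order))
        rows.append(normalized_autocorr(envelope, max_lag))
    return np.var(np.array(rows), axis=0)
```

One such vector per utterance could then be passed to any SVM implementation as a training or test pattern, for example `sklearn.svm.SVC`, which is a different toolkit from the one the paper may have used.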


Speaker recognition, Suprasegmental features, Linear prediction residual, Hilbert envelope, Support Vector Machines





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Anvita Bajpai (1)
  • Vinod Pathangay (2)
  1. DeciDyn Systems, Bangalore, India
  2. Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, India
