Spectral Subband Centroids as Complementary Features for Speaker Authentication

  • Norman Poh Hoon Thian
  • Conrad Sanderson
  • Samy Bengio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3072)

Abstract

Most conventional features used in speaker authentication are based, in one way or another, on estimating the spectral envelope, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). This study examines Spectral Subband Centroids (SSCs) instead. Each SSC is the centroid frequency of one subband; SSCs have properties similar to formant frequencies, but each is confined to its own subband. Experiments carried out on the NIST2001 database using SSCs, MFCCs, LFCCs and their combinations by concatenation suggest that SSCs are somewhat more robust than conventional MFCC and LFCC features and are partially complementary to them.
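To make "the centroid frequency in each subband" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes equal-width, non-overlapping rectangular subbands on a linear frequency scale and a power-compression exponent `gamma`; the function names, the number of subbands and the test signal are all hypothetical choices made for illustration. Each centroid is the power-weighted mean frequency of the bins in its subband.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_centroids(frame, sample_rate, num_subbands=4, gamma=1.0):
    """Return one centroid frequency (Hz) per subband.

    Assumptions (not taken from the paper): rectangular, non-overlapping,
    equal-width subbands; the last subband absorbs any leftover bins.
    """
    power = power_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    bins_per_band = len(power) // num_subbands
    centroids = []
    for m in range(num_subbands):
        lo = m * bins_per_band
        hi = len(power) if m == num_subbands - 1 else lo + bins_per_band
        # Compress the power spectrum before weighting the frequencies.
        weights = [p ** gamma for p in power[lo:hi]]
        total = sum(weights)
        if total == 0.0:
            # Empty band: fall back to the midpoint of the band.
            centroids.append((freqs[lo] + freqs[hi - 1]) / 2)
        else:
            weighted = sum(f * w for f, w in zip(freqs[lo:hi], weights))
            centroids.append(weighted / total)
    return centroids


# Usage: a 1 kHz tone sampled at 8 kHz; the centroid of the subband that
# contains 1 kHz should sit very close to the tone's frequency.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
centroids = subband_centroids(frame, sample_rate=8000)
print(centroids)
```

Because every centroid is a weighted mean of frequencies inside its own subband, it can never leave that subband, which is the sense in which the features resemble formant frequencies restricted to a band. Per-frame centroids computed this way could then be concatenated with cepstral coefficients to form a combined feature vector.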

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Norman Poh Hoon Thian (1)
  • Conrad Sanderson (1)
  • Samy Bengio (1)
  1. IDIAP, Martigny, Switzerland
