Spectral Subband Centroids as Complementary Features for Speaker Authentication

  • Conference paper
Biometric Authentication (ICBA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3072)

Abstract

Most conventional features used in speaker authentication are based on estimation of spectral envelopes in one way or another, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). In this study, Spectral Subband Centroids (SSCs) are examined. Each SSC is the centroid frequency of one subband. SSCs have properties similar to formant frequencies but are confined to their given subband. Empirical experiments carried out on the NIST2001 database using SSCs, MFCCs, LFCCs and their combinations by concatenation suggest that SSCs are somewhat more robust than conventional MFCC and LFCC features and are partially complementary to them.
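The abstract does not spell out how a subband centroid is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: it takes the centroid to be the power-weighted mean frequency inside each subband, a commonly used definition. The function name, the Hamming window, the equal-width non-overlapping linear subbands, and the exponent `gamma` applied to the power spectrum are all illustrative choices that the abstract does not confirm.

```python
import numpy as np

def spectral_subband_centroids(frame, sample_rate, n_subbands=4, gamma=1.0):
    """Return one centroid frequency (Hz) per subband for a single frame.

    Assumptions (not taken from the paper): rectangular, non-overlapping,
    equal-width subbands on a linear frequency axis; Hamming windowing.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Subband edges from 0 Hz up to the Nyquist frequency.
    edges = np.linspace(0.0, sample_rate / 2.0, n_subbands + 1)
    centroids = np.empty(n_subbands)
    for m in range(n_subbands):
        lo, hi = edges[m], edges[m + 1]
        # The last subband includes its upper edge (the Nyquist bin).
        if m == n_subbands - 1:
            mask = (freqs >= lo) & (freqs <= hi)
        else:
            mask = (freqs >= lo) & (freqs < hi)
        weights = power[mask] ** gamma
        total = weights.sum()
        # Fall back to the band midpoint if the subband carries no energy.
        if total > 0:
            centroids[m] = (freqs[mask] * weights).sum() / total
        else:
            centroids[m] = 0.5 * (lo + hi)
    return centroids

# Usage: a 1500 Hz tone sampled at 8 kHz, analysed in four 1 kHz subbands.
sr = 8000
t = np.arange(256) / sr
ssc = spectral_subband_centroids(np.sin(2 * np.pi * 1500.0 * t), sr)

# Combination by concatenation, as mentioned in the abstract. `other_features`
# is a placeholder vector standing in for MFCCs or LFCCs computed elsewhere.
other_features = np.zeros(12)
combined = np.concatenate([other_features, ssc])
```

Because each centroid is a weighted average of frequencies inside its own subband, it always lies within that subband's edges, which matches the abstract's remark that SSCs behave like formant frequencies restricted to a given subband.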

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thian, N.P.H., Sanderson, C., Bengio, S. (2004). Spectral Subband Centroids as Complementary Features for Speaker Authentication. In: Zhang, D., Jain, A.K. (eds) Biometric Authentication. ICBA 2004. Lecture Notes in Computer Science, vol 3072. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25948-0_86

  • DOI: https://doi.org/10.1007/978-3-540-25948-0_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22146-3

  • Online ISBN: 978-3-540-25948-0

  • eBook Packages: Springer Book Archive
