Spectral Subband Centroids as Complementary Features for Speaker Authentication

Thian, Norman Poh Hoon; Sanderson, Conrad; Bengio, Samy

doi:10.1007/978-3-540-25948-0_86

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3072)

Included in the following conference series:

International Conference on Biometric Authentication

Abstract

Most conventional features used in speaker authentication are based, in one way or another, on estimates of the spectral envelope, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). In this study, Spectral Subband Centroids (SSCs) are examined. Each SSC feature is the centroid frequency of one subband. SSCs have properties similar to formant frequencies, but each is confined to its own subband. Empirical experiments carried out on the NIST2001 database using SSCs, MFCCs, LFCCs and their combinations by concatenation suggest that SSCs are somewhat more robust than conventional MFCC and LFCC features, as well as being partially complementary to them.
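
As an illustration of what a "centroid frequency in each subband" can look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes equal-width, non-overlapping rectangular subbands between 0 Hz and the Nyquist frequency, a Hamming window, and a power spectrum raised to an exponent. The function name `subband_centroids` and the parameters `n_subbands` and `gamma` are hypothetical choices made for this example; the abstract does not specify any of them.

```python
import numpy as np


def subband_centroids(frame, sample_rate, n_subbands=4, gamma=1.0):
    """Return the centroid frequency (Hz) of each subband for one frame.

    Assumptions (not taken from the paper): equal-width rectangular
    subbands, a Hamming window, and weights equal to power ** gamma.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Subband edges spread linearly from 0 Hz to the Nyquist frequency.
    edges = np.linspace(0.0, sample_rate / 2.0, n_subbands + 1)

    centroids = np.empty(n_subbands)
    for m in range(n_subbands):
        lo, hi = edges[m], edges[m + 1]
        mask = (freqs >= lo) & (freqs < hi)
        if m == n_subbands - 1:
            # Include the Nyquist bin in the last subband.
            mask |= freqs == hi
        weights = power[mask] ** gamma
        total = weights.sum()
        if total > 0:
            # Power-weighted mean frequency inside the subband.
            centroids[m] = (freqs[mask] * weights).sum() / total
        else:
            # Silent subband: fall back to the subband midpoint.
            centroids[m] = 0.5 * (lo + hi)
    return centroids


# Usage: a 1500 Hz tone sampled at 8 kHz, analysed in four subbands.
sample_rate = 8000
t = np.arange(256) / sample_rate
tone = np.sin(2 * np.pi * 1500.0 * t)
ssc = subband_centroids(tone, sample_rate)
print(ssc)
```

In this sketch the second subband (1000 to 2000 Hz) contains the tone, so its centroid lands close to 1500 Hz, which mirrors the formant-like behaviour described in the abstract: the feature tracks where the energy sits, but only within its own subband.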

Author information

Authors and Affiliations

IDIAP, Rue du Simplon 4, CH-1920, Martigny, Switzerland
Norman Poh Hoon Thian, Conrad Sanderson & Samy Bengio

Editor information

Editors and Affiliations

Biometrics Research Centre, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
David Zhang
Department of Computer Science and Engineering, Michigan State University
Anil K. Jain

About this paper

Cite this paper

Thian, N.P.H., Sanderson, C., Bengio, S. (2004). Spectral Subband Centroids as Complementary Features for Speaker Authentication. In: Zhang, D., Jain, A.K. (eds) Biometric Authentication. ICBA 2004. Lecture Notes in Computer Science, vol 3072. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25948-0_86

DOI: https://doi.org/10.1007/978-3-540-25948-0_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22146-3
Online ISBN: 978-3-540-25948-0
eBook Packages: Springer Book Archive