Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System

Higgins, J. E.; Dodd, T. J.; Damper, R. I.

doi:10.1007/3-540-48219-9_37

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2096))

Included in the following conference series:

International Workshop on Multiple Classifier Systems

Abstract

In previous work, we have confirmed the performance gains that can be obtained in speaker recognition by splitting the (clean) wide-band speech signal into several subbands, employing separate pattern classifiers for each subband, and then using multiple classifier fusion (‘recombination’) techniques to produce a final decision. However, our earlier work used fairly rudimentary recognition techniques (dynamic time warping), just sum or product fusion rules and the spoken word seven only. The question then arises: Can subband processing still deliver performance gains when using state-of-the-art recognition techniques, more sophisticated recombination, and different spoken digits? To answer this, we have applied hidden Markov modelling and artificial neural network (ANN) recombination to text-dependent speaker identification, for spoken digits seven and nine. We find that ANN recombination performs about as well as the sum rule operating in log probability space, but the ANN results are not unique. They depend critically on user-specified parameters, initialisation, etc. On clean speech, all classifiers achieve close to 100% identification. Subband techniques offer advantages when the speech signal is significantly degraded by noise.
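The fusion rules named in the abstract can be illustrated with a minimal sketch. It assumes each subband classifier outputs one log-likelihood per enrolled speaker; the function names, the three-band layout, and the toy scores are hypothetical and are not taken from the paper. Summing in log probability space is equivalent to the product rule on the probabilities themselves, while the second function shows a sum rule applied to the probabilities directly.

```python
import math

def fuse_sum_log(band_log_probs):
    """Sum rule in log probability space.

    band_log_probs[b][s] is the log-likelihood of speaker s from subband b.
    Adding logs across subbands equals multiplying the probabilities.
    """
    n_speakers = len(band_log_probs[0])
    return [sum(band[s] for band in band_log_probs) for s in range(n_speakers)]

def fuse_sum_prob(band_log_probs):
    """Sum rule on probabilities: exponentiate each score, then add across subbands."""
    n_speakers = len(band_log_probs[0])
    return [sum(math.exp(band[s]) for band in band_log_probs)
            for s in range(n_speakers)]

def identify(scores):
    """Return the index of the speaker with the highest fused score."""
    return max(range(len(scores)), key=lambda s: scores[s])

# Hypothetical scores: 3 subbands (rows) x 3 speakers (columns).
toy_scores = [
    [-1.0, -2.0, -3.0],
    [-2.5, -1.5, -3.0],
    [-1.2, -2.2, -2.8],
]

fused_log = fuse_sum_log(toy_scores)
fused_prob = fuse_sum_prob(toy_scores)
print(identify(fused_log), identify(fused_prob))
```

In this toy case the second subband alone would pick speaker 1, but both fusion rules select speaker 0 once all three subbands are combined. An ANN recombiner, as studied in the paper, would replace these fixed rules with a trained mapping from the subband scores to a decision.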

Author information

Authors and Affiliations

Image, Speech and Intelligent Systems Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
J. E. Higgins, T. J. Dodd & R. I. Damper

Editor information

Editors and Affiliations

Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Josef Kittler
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli

About this paper

Cite this paper

Higgins, J.E., Dodd, T.J., Damper, R.I. (2001). Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_37

DOI: https://doi.org/10.1007/3-540-48219-9_37
Published: 22 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42284-6
Online ISBN: 978-3-540-48219-2
eBook Packages: Springer Book Archive
