
Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System

  • Conference paper
Multiple Classifier Systems (MCS 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2096)


Abstract

In previous work, we confirmed the performance gains that can be obtained in speaker recognition by splitting the (clean) wide-band speech signal into several subbands, employing a separate pattern classifier for each subband, and then using multiple classifier fusion (‘recombination’) techniques to produce a final decision. However, that earlier work used fairly rudimentary recognition techniques (dynamic time warping), only the sum or product fusion rules, and only the spoken word ‘seven’. The question then arises: can subband processing still deliver performance gains when using state-of-the-art recognition techniques, more sophisticated recombination, and different spoken digits? To answer this, we have applied hidden Markov modelling and artificial neural network (ANN) recombination to text-dependent speaker identification for the spoken digits ‘seven’ and ‘nine’. We find that ANN recombination performs about as well as the sum rule operating in log probability space, but the ANN results are not unique: they depend critically on user-specified parameters, initialisation, and so on. On clean speech, all classifiers achieve close to 100% identification accuracy. Subband techniques offer advantages when the speech signal is significantly degraded by noise.
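The abstract contrasts fusion rules for combining subband classifiers. Below is a minimal sketch of two of them, assuming each subband classifier (for example, one HMM per subband) emits one log-likelihood per enrolled speaker and that all speakers have equal priors. Summing log-likelihoods across subbands is taken here as one reading of the "sum rule operating in log probability space"; since a sum of logs is the log of a product, it selects the same speaker as the product rule on likelihoods. The second function applies the sum rule to per-subband posteriors instead. The function names and score values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Scores = Sequence[Sequence[float]]  # rows: subbands, columns: enrolled speakers


def log_sum_rule(log_scores: Scores) -> int:
    """Add each speaker's log-likelihoods over all subbands; return the best speaker.

    Equivalent to the product rule on likelihoods (log of a product = sum of logs).
    """
    n_speakers = len(log_scores[0])
    totals = [sum(band[s] for band in log_scores) for s in range(n_speakers)]
    return max(range(n_speakers), key=lambda s: totals[s])


def subband_posteriors(band: Sequence[float]) -> List[float]:
    """Softmax of one subband's log-likelihoods, assuming equal speaker priors."""
    m = max(band)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in band]
    z = sum(exps)
    return [e / z for e in exps]


def posterior_sum_rule(log_scores: Scores) -> int:
    """Add per-subband posteriors (sum rule in probability space); return the best speaker."""
    n_speakers = len(log_scores[0])
    posts = [subband_posteriors(band) for band in log_scores]
    totals = [sum(p[s] for p in posts) for s in range(n_speakers)]
    return max(range(n_speakers), key=lambda s: totals[s])


# Hypothetical log-likelihoods for 3 subbands x 3 speakers.
scores = [
    [-10.0, -12.0, -15.0],
    [-10.0, -12.0, -15.0],
    [-30.0, -11.0, -12.0],  # a subband assumed to be corrupted by noise
]
print(log_sum_rule(scores))        # 1
print(posterior_sum_rule(scores))  # 0
```

In this toy input the third subband's very low score for speaker 0 dominates the log-domain total, so the two rules disagree. This only illustrates that the choice of fusion rule can change the decision; it makes no claim about the results reported in the paper.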




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Higgins, J.E., Dodd, T.J., Damper, R.I. (2001). Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_37


  • DOI: https://doi.org/10.1007/3-540-48219-9_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42284-6

  • Online ISBN: 978-3-540-48219-2

  • eBook Packages: Springer Book Archive
