Addable Stress Speech Recognition with Multiplexing HMM: Training and Non-training Decision

Wireless Personal Communications

Abstract

In stress speech recognition, a model that can process speech under multiple stress types must be designed with both accuracy and addability in mind. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, a recognition model trained specifically to recognize words affected by a given stress helps improve the recognition rate. However, since each stress speech model produces its own independently recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCC feature extraction is applied to the input speech. The result is fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on the tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept that applies, in order, a majority vote, a segment-weighted difference square score, and a next-best score. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7 % recognition rate, compared with 94.2 % for the training-based decision method.
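
The coarse stage of the decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data layout (one list of N tentative words per stress speech model), the function names, and the example vocabulary are all assumptions made for illustration. The abstract does not define the segment-weighted difference square score or the next-best score, so the finer stages are represented only by a placeholder `fine_score` mapping that is consulted when the vote ties.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: tentative[model_name] holds the N tentative words,
# one per HMM segment, produced by that stress speech model.
Tentative = Dict[str, List[str]]


def majority_vote(tentative: Tentative) -> Tuple[Optional[str], List[str]]:
    """Coarse stage: count every tentative word across all models and segments.

    Returns (winner, tied). winner is None when the top count is shared,
    in which case tied lists the candidates a finer stage must separate.
    """
    counts = Counter(w for words in tentative.values() for w in words)
    if not counts:
        raise ValueError("no tentative words supplied")
    top = max(counts.values())
    tied = sorted(w for w, c in counts.items() if c == top)
    return (tied[0], tied) if len(tied) == 1 else (None, tied)


def decide(tentative: Tentative, fine_score: Dict[str, float]) -> str:
    """Coarse-to-fine decision: vote first, fall back to a score on ties.

    fine_score stands in for the finer stages; its definition (higher is
    better) is an assumption of this sketch, not taken from the paper.
    """
    winner, tied = majority_vote(tentative)
    if winner is not None:
        return winner
    return max(tied, key=lambda w: fine_score.get(w, float("-inf")))


if __name__ == "__main__":
    # Made-up tentative words for four models with N = 3 segments each.
    tentative = {
        "neutral": ["one", "one", "nine"],
        "angry": ["nine", "one", "nine"],
        "loud": ["one", "nine", "five"],
        "lombard": ["five", "one", "nine"],
    }
    print(majority_vote(tentative))
    print(decide(tentative, {"one": 0.4, "nine": 0.7}))
```

In this made-up example "one" and "nine" each receive five votes, so the vote alone cannot decide and the placeholder score picks between them. Keeping the vote separate from the tie-break mirrors the coarse-to-fine ordering, and adding a new stress speech model only adds another entry to the dictionary.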

Acknowledgments

We would like to thank our collaborator Dr. Pinit Khumhom for his valuable assistance in revising this paper. This research was supported by a grant from the Thailand Research Fund (TRF) through the Royal Golden Jubilee Scholarship Ph.D. program (Grant No. PHD/0134/2545).

Author information

Corresponding author

Correspondence to Kosin Chamnongthai.

About this article

Cite this article

Amornkul, P., Chamnongthai, K. & Temdee, P. Addable Stress Speech Recognition with Multiplexing HMM: Training and Non-training Decision. Wireless Pers Commun 76, 503–521 (2014). https://doi.org/10.1007/s11277-014-1721-3

  • DOI: https://doi.org/10.1007/s11277-014-1721-3
