
Studying and enhancing talking condition recognition in stressful and emotional talking environments based on HMMs, CHMM2s and SPHMMs

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

This research is devoted to studying and enhancing talking condition recognition in stressful and emotional talking environments (two completely separate environments) based on three distinct classifiers: Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s), and Suprasegmental Hidden Markov Models (SPHMMs). The stressful talking environments used in this work comprise the neutral, shouted, slow, loud, soft, and fast talking conditions, while the emotional talking environments comprise the neutral, angry, sad, happy, disgust, and fear emotions. The results show that SPHMMs outperform both HMMs and CHMM2s in improving talking condition recognition in stressful and emotional talking environments. They also demonstrate that talking condition recognition in stressful talking environments exceeds that in emotional talking environments by 2.7%, 1.8%, and 3.3% based on HMMs, CHMM2s, and SPHMMs, respectively. Based on subjective assessment by human judges, the recognition performance of stressful talking conditions exceeds that of emotional ones by 5.2%.
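The abstract does not give implementation details, but the overall scheme (one model trained per talking condition, with the highest-scoring model deciding the condition of a test utterance) is easy to illustrate. The sketch below is a minimal, hypothetical version of that scheme: the MFCC features, 5-state Gaussian-emission HMMs, and the hmmlearn/librosa APIs are assumptions for illustration, not the paper's method, and plain first-order HMMs stand in for all three classifiers (CHMM2s add second-order circular transitions and SPHMMs integrate suprasegmental prosodic models, neither of which hmmlearn provides).

```python
# Minimal sketch of one-model-per-talking-condition classification.
# Assumptions (not from the paper): MFCC features, 5-state Gaussian HMMs,
# and the hmmlearn/librosa libraries standing in for HMMs, CHMM2s, SPHMMs.
import numpy as np
import librosa
from hmmlearn import hmm

CONDITIONS = ["neutral", "shouted", "slow", "loud", "soft", "fast"]

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Frame-level MFCC observation sequence for one utterance."""
    y, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_condition_models(utterances_by_condition, n_states=5):
    """Fit one Gaussian-emission HMM per talking condition."""
    models = {}
    for cond, paths in utterances_by_condition.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)                   # all frames, stacked
        lengths = [f.shape[0] for f in feats]  # per-utterance frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[cond] = m
    return models

def classify(models, wav_path):
    """Return the condition whose HMM gives the utterance the highest log-likelihood."""
    obs = mfcc_features(wav_path)
    return max(models, key=lambda c: models[c].score(obs))

# Usage (hypothetical file layout; requires `import glob`):
#   models = train_condition_models(
#       {c: glob.glob(f"data/{c}/*.wav") for c in CONDITIONS})
#   print(classify(models, "data/test/utterance.wav"))
```

In this setup, swapping in CHMM2s or SPHMMs would change only the per-condition model; the maximum-likelihood decision rule stays the same.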



Author information

Correspondence to Ismail Shahin.


About this article

Cite this article

Shahin, I. Studying and enhancing talking condition recognition in stressful and emotional talking environments based on HMMs, CHMM2s and SPHMMs. J Multimodal User Interfaces 6, 59–71 (2012). https://doi.org/10.1007/s12193-011-0082-4
