Abstract
This research studies and enhances talking condition recognition in stressful and in emotional talking environments (two completely separate environments) using three distinct classifiers: Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s) and Suprasegmental Hidden Markov Models (SPHMMs). The stressful talking environments used in this work comprise the neutral, shouted, slow, loud, soft and fast talking conditions, while the emotional talking environments comprise the neutral, angry, sad, happy, disgust and fear emotions. The results show that SPHMMs outperform both HMMs and CHMM2s at improving talking condition recognition in stressful and emotional talking environments. The results also demonstrate that talking condition recognition in stressful talking environments outperforms that in emotional talking environments by 2.7%, 1.8% and 3.3% based on HMMs, CHMM2s and SPHMMs, respectively. Based on subjective assessment by human judges, recognition performance for stressful talking conditions exceeds that for emotional ones by 5.2%.
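All three classifier families share the same decision rule: train one model per talking condition, score the test utterance against each model, and report the condition whose model assigns the highest likelihood. A minimal self-contained sketch of that rule for discrete-observation HMMs is below; the two-state toy parameters and the condition names are illustrative assumptions, not the paper's trained models or features.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    pi: initial state probabilities, A: state transition matrix,
    B: emission matrix (states x symbols)."""
    alpha = pi * B[:, obs[0]]          # forward variable at t = 0
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale                     # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik

def classify(obs, models):
    """Maximum-likelihood decision: pick the talking condition whose
    HMM gives the observation sequence the highest log-likelihood."""
    return max(models, key=lambda cond: forward_loglik(obs, *models[cond]))

# Toy 2-state, 3-symbol models for two hypothetical conditions.
models = {
    "neutral": (np.array([0.6, 0.4]),
                np.array([[0.7, 0.3], [0.4, 0.6]]),
                np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])),
    "shouted": (np.array([0.5, 0.5]),
                np.array([[0.6, 0.4], [0.3, 0.7]]),
                np.array([[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]])),
}
obs = [0, 1, 0, 1, 0]  # e.g. vector-quantized acoustic feature indices
print(classify(obs, models))  # → neutral
```

In the paper's setting the observations would be acoustic features (e.g. MFCCs, hence continuous-density models), and SPHMMs additionally overlay suprasegmental prosodic information on the segmental models; the decision rule itself is unchanged.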
Shahin, I. Studying and enhancing talking condition recognition in stressful and emotional talking environments based on HMMs, CHMM2s and SPHMMs. J Multimodal User Interfaces 6, 59–71 (2012). https://doi.org/10.1007/s12193-011-0082-4