Abstract
This research studies and enhances talking condition recognition in stressful and in emotional talking environments (two completely separate environments) using three distinct classifiers: Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s) and Suprasegmental Hidden Markov Models (SPHMMs). The stressful talking environments used in this work comprise the neutral, shouted, slow, loud, soft and fast talking conditions, while the emotional talking environments comprise the neutral, angry, sad, happy, disgust and fear emotions. The results show that SPHMMs outperform both HMMs and CHMM2s at improving talking condition recognition in stressful and emotional talking environments. The results also demonstrate that talking condition recognition in stressful talking environments outperforms that in emotional talking environments by 2.7%, 1.8% and 3.3% based on HMMs, CHMM2s and SPHMMs, respectively. Based on subjective assessment by human judges, recognition performance for stressful talking conditions exceeds that for emotional ones by 5.2%.
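All three classifier families share the same decision rule: train one model per talking condition, score the test utterance against each model, and report the condition whose model assigns the highest likelihood. A minimal self-contained sketch of that rule for discrete-observation HMMs is below; the two-state toy parameters and the condition names are illustrative assumptions, not the paper's trained models or features.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    pi: initial state probabilities, A: state transition matrix,
    B: emission matrix (states x symbols)."""
    alpha = pi * B[:, obs[0]]          # forward variable at t = 0
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale                     # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik

def classify(obs, models):
    """Maximum-likelihood decision: pick the talking condition whose
    HMM gives the observation sequence the highest log-likelihood."""
    return max(models, key=lambda cond: forward_loglik(obs, *models[cond]))

# Toy 2-state, 3-symbol models for two hypothetical conditions.
models = {
    "neutral": (np.array([0.6, 0.4]),
                np.array([[0.7, 0.3], [0.4, 0.6]]),
                np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])),
    "shouted": (np.array([0.5, 0.5]),
                np.array([[0.6, 0.4], [0.3, 0.7]]),
                np.array([[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]])),
}
obs = [0, 1, 0, 1, 0]  # e.g. vector-quantized acoustic feature indices
print(classify(obs, models))  # → neutral
```

In the paper's setting the observations would be acoustic features (e.g. MFCCs, hence continuous-density models), and SPHMMs additionally overlay suprasegmental prosodic information on the segmental models; the decision rule itself is unchanged.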
Shahin, I. Studying and enhancing talking condition recognition in stressful and emotional talking environments based on HMMs, CHMM2s and SPHMMs. J Multimodal User Interfaces 6, 59–71 (2012). https://doi.org/10.1007/s12193-011-0082-4