Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method

Sheikhan, Mansour; Bejani, Mahdi; Gharavian, Davood

doi:10.1007/s00521-012-0814-8

Original Article
Published: 20 January 2012

Volume 23, pages 215–227, (2013)
Neural Computing and Applications

Mansour Sheikhan¹,
Mahdi Bejani¹ &
Davood Gharavian²

Abstract

A speech signal carries both linguistic information and paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain that recognition rate for spontaneous speech. Emotion recognition is therefore an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, including the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared with that of Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are selected using the analysis of variance (ANOVA) method. The proposed modular scheme was derived from a comparative study of the features and characteristics of each individual emotional state, with the aim of improving recognition performance. Empirical results show that, even after discarding 22% of the features, the average emotion recognition accuracy improves by 2.2%. In addition, the proposed modular neural-SVM classifier improves recognition accuracy by at least 8% compared with the simulated monolithic classifiers.
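As a rough illustration of the ANOVA-based feature selection named in the abstract, the sketch below ranks features by the standard one-way ANOVA F statistic (between-class variance over within-class variance) and keeps the top-scoring fraction. It is not the authors' implementation: the function names `anova_f_score` and `select_features`, the toy data, and the `keep_fraction` parameter are assumptions made for this example, and the value 0.78 simply mirrors the "discarding 22% of features" figure without claiming to reproduce the paper's selection criterion.

```python
from typing import Dict, Hashable, List, Sequence


def anova_f_score(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for a single feature.

    `groups` holds the feature's values, one inner sequence per class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0.0:
        return float("inf") if ms_between > 0.0 else 0.0
    return ms_between / ms_within


def select_features(
    samples: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    keep_fraction: float,
) -> List[int]:
    """Return sorted indices of the features with the highest F scores."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        by_class: Dict[Hashable, List[float]] = {}
        for row, label in zip(samples, labels):
            by_class.setdefault(label, []).append(row[j])
        scores.append(anova_f_score(list(by_class.values())))
    n_keep = max(1, round(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical toy data: 3 features, 2 emotion classes.
samples = [
    [1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.8, 6.0, 1.5],  # "neutral"
    [3.0, 5.0, 2.2], [3.2, 6.0, 2.4], [2.8, 4.0, 2.0],  # "anger"
]
labels = ["neutral"] * 3 + ["anger"] * 3
selected = select_features(samples, labels, keep_fraction=0.78)
```

In this toy set, feature 1 has identical class means, so its F score is zero and it is the one dropped. The retained feature subset would then be passed to whichever classifier is being evaluated.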

Author information

Authors and Affiliations

EE Department, Faculty of Engineering, Islamic Azad University, South Tehran Branch, P.O. Box: 11365-4435, Tehran, Iran
Mansour Sheikhan & Mahdi Bejani
EE Department, Shahid Abbaspour University of Technology, Tehran, Iran
Davood Gharavian

Corresponding author

Correspondence to Mansour Sheikhan.

About this article

Cite this article

Sheikhan, M., Bejani, M. & Gharavian, D. Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Comput & Applic 23, 215–227 (2013). https://doi.org/10.1007/s00521-012-0814-8

Received: 10 November 2011
Accepted: 05 January 2012
Published: 20 January 2012
Issue Date: July 2013
DOI: https://doi.org/10.1007/s00521-012-0814-8
