
Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Over the past four decades, enormous effort has been devoted to developing automatic speech recognition systems that extract linguistic information, but much research is still needed to decode paralinguistic information such as speaking style and emotion. This paper investigates the effect of adding the first three normalized formant frequencies and the pitch frequency as supplementary features to an emotion recognition system whose feature vector consists of Mel-frequency cepstral coefficients and energy-related features. Normalization is performed by a dynamic time warping-multi-layer perceptron (DTW-MLP) hybrid model after determining the frequency range most affected by emotion. To reduce the number of features, fast correlation-based filter (FCBF) and analysis of variance (ANOVA) methods are used. Emotional states are recognized with a Gaussian mixture model (GMM). Experimental results show that first-formant (F1)-based warping combined with ANOVA-based feature selection performs best among the systems simulated in this study, and the average emotion recognition accuracy is competitive with most recent research in this field.
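The back end described in the abstract (ANOVA-based feature selection followed by per-emotion Gaussian mixture models scored by maximum log-likelihood) could be sketched roughly as below. The synthetic data, feature counts, and GMM settings are illustrative assumptions for the sketch, not the paper's actual configuration:

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data standing in for per-utterance feature vectors
# (e.g. MFCC/energy statistics plus normalized F1-F3 and pitch).
n_per_class, n_feats = 60, 10
classes = ["anger", "happiness", "sadness"]
X_parts, y_parts = [], []
for k, emo in enumerate(classes):
    mean = np.zeros(n_feats)
    mean[:4] = 1.5 * k  # only the first 4 features carry emotion information
    X_parts.append(rng.normal(mean, 1.0, size=(n_per_class, n_feats)))
    y_parts.append(np.full(n_per_class, k))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

# ANOVA-based selection: keep features with the largest between-class F-statistic.
F = np.array([f_oneway(*[X[y == k, j] for k in range(len(classes))]).statistic
              for j in range(n_feats)])
keep = np.argsort(F)[::-1][:4]
Xs = X[:, keep]

# One GMM per emotion; classify an utterance by maximum log-likelihood.
gmms = {k: GaussianMixture(n_components=2, random_state=0).fit(Xs[y == k])
        for k in range(len(classes))}
scores = np.column_stack([gmms[k].score_samples(Xs) for k in range(len(classes))])
pred = scores.argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

With well-separated synthetic classes, the F-test reliably ranks the four informative features first, and the per-class GMMs recover most labels on the training set.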

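For context on the dynamic time warping component of the normalization stage, here is a minimal textbook DTW aligning two hypothetical formant tracks. It illustrates only the generic alignment idea, not the authors' DTW-MLP hybrid; the two tracks are invented for the example:

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the cumulative alignment cost and the warping path as
    (i, j) index pairs.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# A hypothetical neutral-style F1 track aligned against an emotional one (Hz).
neutral = np.array([500.0, 520.0, 700.0, 710.0, 500.0])
emotional = np.array([500.0, 700.0, 705.0, 712.0, 500.0])
cost, path = dtw_path(neutral, emotional)
```

The warping path pairs each frame of the neutral track with the closest-matching frame of the emotional track, which is the kind of correspondence a DTW-based normalizer needs before remapping formant values.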


Author information

Correspondence to Davood Gharavian.

Cite this article

Gharavian, D., Sheikhan, M. & Ashoftedel, F. Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model. Neural Comput & Applic 22, 1181–1191 (2013). https://doi.org/10.1007/s00521-012-0884-7
