Applied Intelligence, Volume 37, Issue 4, pp 602–612

Mandarin emotion recognition combining acoustic and emotional point information

  • Lijiang Chen
  • Xia Mao
  • Pengfei Wei
  • Yuli Xue
  • Mitsuru Ishizuka


In this contribution, we introduce a novel approach that combines acoustic information and emotional point information for robust automatic recognition of a speaker’s emotion. Six discrete emotional states are recognized in this work. First, a multi-level model for emotion recognition from acoustic features is presented; the derived features are selected by Fisher rate to distinguish different types of emotions. Second, a novel emotional point model for Mandarin is established using a Support Vector Machine and a Hidden Markov Model. This model contains 28 emotional syllables that carry rich emotional information. Finally, the acoustic information and the emotional point information are integrated by a soft decision strategy. Experimental results show that applying emotional point information to speech emotion recognition is effective.
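The Fisher rate mentioned above scores each feature by the ratio of its between-class variance to its within-class variance, so that features which separate the emotion classes well are retained. The function names below (`fisher_rate`, `select_top_k`) are illustrative, not from the paper; this is a minimal sketch of the general technique, assuming a feature matrix with one row per utterance and one integer emotion label per row.

```python
import numpy as np

def fisher_rate(features, labels):
    """Per-feature Fisher rate: between-class variance / within-class variance.

    features: (n_samples, n_features) array of acoustic features.
    labels:   (n_samples,) array of integer emotion class ids.
    """
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        group = features[labels == c]
        group_mean = group.mean(axis=0)
        # Between-class scatter: class-mean deviation weighted by class size.
        between += len(group) * (group_mean - overall_mean) ** 2
        # Within-class scatter: deviation of samples from their class mean.
        within += ((group - group_mean) ** 2).sum(axis=0)
    return between / within

def select_top_k(features, labels, k):
    """Indices of the k features with the highest Fisher rate."""
    scores = fisher_rate(features, labels)
    return np.argsort(scores)[::-1][:k]
```

A feature whose class means are far apart relative to its in-class spread receives a high score, so selecting the top-k scores keeps the most discriminative acoustic features.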


Keywords: Mandarin emotion recognition · Emotional point · Fisher rate · Support vector machine · Hidden Markov model



This research is supported by the International Science and Technology Cooperation Program of China (No. 2010DFA11990) and the National Natural Science Foundation of China (No. 61103097).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Lijiang Chen (1)
  • Xia Mao (1)
  • Pengfei Wei (1)
  • Yuli Xue (1)
  • Mitsuru Ishizuka (2)

  1. School of Electronic and Information Engineering, Beihang University, Beijing, China
  2. Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan
