International Journal of Speech Technology

Volume 19, Issue 4, pp 805–816

Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering

  • Chengwei Huang
  • Baolin Song
  • Li Zhao


In this paper we propose a feature normalization method for speaker-independent speech emotion recognition. The performance of a speech emotion classifier depends heavily on the training data, and a large number of unknown speakers poses a significant challenge. To address this problem, we first extract and analyse 481 basic acoustic features. Second, we use principal component analysis and linear discriminant analysis jointly to construct a speaker-sensitive feature space. Third, we classify the emotional utterances into pseudo-speaker groups in this space using fuzzy k-means clustering. Finally, we normalize the original basic acoustic features of each utterance according to its group assignment. To verify the normalization algorithm, we adopt a Gaussian mixture model based classifier for recognition tests. The experimental results show that our normalization algorithm is effective both on our locally collected database and on the eNTERFACE’05 Audio-Visual Emotion Database. The emotional features obtained with our method are robust to speaker changes, and an improved recognition rate is observed.
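The pipeline described above (PCA and LDA to build a speaker-sensitive space, fuzzy k-means to form pseudo-speaker groups, per-group normalization of the original features, and a GMM-based recognition test) can be sketched roughly as follows. All shapes, feature counts, cluster counts, and the plain fuzzy c-means implementation are illustrative assumptions on synthetic data, not the authors' exact configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's 481 basic acoustic features:
# 200 utterances from 4 speakers, 50 features each (shapes are illustrative).
X = rng.normal(size=(200, 50))
speaker = rng.integers(0, 4, size=200)

# Steps 1-2: PCA followed by LDA (supervised on speaker labels) to build
# a speaker-sensitive subspace.
Z = PCA(n_components=10).fit_transform(X)
Z = LinearDiscriminantAnalysis(n_components=3).fit_transform(Z, speaker)

# Step 3: fuzzy k-means (fuzzy c-means) clustering into pseudo-speaker groups.
def fuzzy_kmeans(Z, k=4, m=2.0, iters=50):
    n = Z.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)           # membership rows sum to 1
    for _ in range(iters):
        W = U ** m
        C = (W.T @ Z) / W.sum(axis=0)[:, None]  # weighted cluster centres
        D = np.linalg.norm(Z[:, None] - C[None], axis=2) + 1e-12
        U = 1.0 / (D ** (2.0 / (m - 1.0)))      # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U.argmax(axis=1)

group = fuzzy_kmeans(Z, k=4)

# Step 4: z-score normalization of the ORIGINAL features within each group.
Xn = X.copy()
for g in np.unique(group):
    idx = group == g
    Xn[idx] = (X[idx] - X[idx].mean(axis=0)) / (X[idx].std(axis=0) + 1e-8)

# Step 5 (recognition test): one GaussianMixture per emotion class,
# classification by maximum log-likelihood (toy emotion labels here).
emotion = rng.integers(0, 3, size=200)
gmms = {e: GaussianMixture(n_components=2, random_state=0).fit(Xn[emotion == e])
        for e in range(3)}
pred = np.array([max(gmms, key=lambda e: gmms[e].score(x[None])) for x in Xn])
```

The membership update in `fuzzy_kmeans` follows the standard fuzzy c-means rule (memberships proportional to distance raised to the power −2/(m−1), renormalized per row); the hard group assignment taken at the end mirrors the paper's use of group information for normalization.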


Keywords: Speech emotion recognition; Feature normalization; Speaker clustering



This work is partially supported by the National Natural Science Foundation of China (Nos. 61231002, 61273266, 51075068) and the Doctoral Fund of the Ministry of Education of China (No. 20110092130004).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. School of Information Science and Engineering, Southeast University, Nanjing, China
