Improving Emotion Recognition Performance by Random-Forest-Based Feature Selection

  • Olga Egorow
  • Ingo Siegert
  • Andreas Wendemuth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11096)


As the technical systems around us aim at more natural interaction, automatic emotion recognition from speech is receiving ever-growing attention. One important question remains unresolved: the definition of the most suitable features across different data types. In the present paper, we employed a random-forest-based feature selection method known from other research fields to select the most important features for three benchmark datasets. Investigating feature selection within the same corpus as well as across corpora, we achieved an increase in performance using only 40 to 60% of the features of the well-known emobase feature set.
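The general idea of random-forest-based feature selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which importance measure or toolkit was used, so the sketch assumes a Gini-based mean impurity decrease, shallow trees, and synthetic data in which only feature 0 carries class information. All function names (`forest_importance`, `select_top`) and the `keep_fraction` value of 0.5 (chosen from within the 40 to 60% range reported above) are hypothetical. It uses only the Python standard library to stay self-contained; in practice an established library implementation would be the more likely choice.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y, features):
    """Best (feature, threshold, impurity decrease) among the candidate features."""
    n, parent = len(y), gini(y)
    best = (None, None, 0.0)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[2]:
                best = (f, t, gain)
    return best

def grow(X, y, rng, m, importance, n_total, depth):
    """Grow one tree recursively, adding weighted Gini decreases to `importance`."""
    if depth == 0 or len(set(y)) < 2:
        return
    candidates = rng.sample(range(len(X[0])), m)  # random feature subset per node
    f, t, gain = best_split(X, y, candidates)
    if f is None:
        return
    importance[f] += gain * len(y) / n_total
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    for part in (left, right):
        grow([X[i] for i in part], [y[i] for i in part],
             rng, m, importance, n_total, depth - 1)

def forest_importance(X, y, n_trees=30, max_depth=3, seed=0):
    """Normalised mean-decrease-in-impurity importance over bootstrapped trees."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))  # assumed subset size: square root of feature count
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow([X[i] for i in idx], [y[i] for i in idx],
             rng, m, importance, n, max_depth)
    total = sum(importance)
    return [v / total for v in importance] if total else importance

def select_top(importance, keep_fraction=0.5):
    """Indices of the highest-ranked features, keeping the given fraction."""
    k = max(1, round(keep_fraction * len(importance)))
    ranked = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
    return sorted(ranked[:k])

# Synthetic data (an assumption): 6 features, only feature 0 determines the class.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(80)]
y = [1 if row[0] > 0 else 0 for row in X]

importance = forest_importance(X, y)
selected = select_top(importance)
print("importances:", [round(v, 3) for v in importance])
print("selected feature indices:", selected)
```

In a setting like the one described above, a classifier would then be trained on the selected columns only, either on the same corpus or on a different one, and compared against the full feature set.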


Keywords: Speech emotion recognition, Feature selection, Random forest



This work has been sponsored by the German Federal Ministry of Education and Research in the program Zwanzig20 – Partnership for Innovation as part of the research alliance 3Dsensation (grant number 03ZZ0414). It was also supported by the project Intention-based Anticipatory Interactive Systems (IAIS) funded by the European Funds for Regional Development (EFRE) and by the Federal State of Sachsen-Anhalt, Germany (grant number ZS/2017/10/88785).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Cognitive Systems Group, Otto von Guericke University, Magdeburg, Germany
