
Improving Emotion Detection with Sub-clip Boosting

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11053)

Abstract

With the emergence of systems such as Amazon Echo, Google Home, and Siri, voice has become a prevalent mode for humans to interact with machines. Emotion detection from voice promises to transform a wide range of applications, from adding emotional awareness to voice assistants to creating more sensitive robotic helpers for the elderly. Unfortunately, emotion expression varies dramatically between individuals, making detection a challenging problem. To tackle this challenge, we introduce the Sub-Clip Classification Boosting (SCB) Framework, a multi-step methodology for emotion detection from non-textual features of audio clips. SCB features a highly effective sub-clip boosting methodology for classification that, unlike traditional boosting over feature subsets, works at the sub-instance level. Multiple sub-instance classifications increase the likelihood that an emotion cue will be found within a voice clip, even if its location varies between speakers. First, each parent voice clip is decomposed into overlapping sub-clips. Each sub-clip is then independently classified. Next, the Emotion Strength of each sub-classification is scored to form a sub-classification and strength pair. Finally, we design a FilterBoost-inspired "Oracle" that utilizes these sub-classification and Emotion Strength pairs to determine the parent clip classification. To tune classification performance, we explore the relationships between sub-clip properties such as length and overlap. Evaluation on 3 prominent benchmark datasets demonstrates that our SCB method consistently outperforms all state-of-the-art methods across diverse languages and speakers. Code related to this paper is available at: https://arcgit.wpi.edu/toto/EMOTIVOClean.
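The pipeline the abstract describes — decompose into overlapping sub-clips, classify each, score Emotion Strength, and combine via an Oracle — can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the function names, the stand-in (label, strength) pairs, and the use of summed Emotion Strength as the vote weight are all assumptions for exposition.

```python
from collections import defaultdict

def make_subclips(clip, length, step):
    """Decompose a parent clip (any sequence of samples or frames)
    into overlapping sub-clips of `length`, advancing by `step`."""
    return [clip[i:i + length] for i in range(0, len(clip) - length + 1, step)]

def oracle(sub_results):
    """Combine (label, emotion_strength) pairs produced by a per-sub-clip
    classifier: accumulate strength per label and return the strongest.
    A simple stand-in for the FilterBoost-inspired Oracle."""
    totals = defaultdict(float)
    for label, strength in sub_results:
        totals[label] += strength
    return max(totals, key=totals.get)

# Example: a 10-frame clip, 4-frame sub-clips, 50% overlap (step = 2).
subclips = make_subclips(list(range(10)), length=4, step=2)

# Hypothetical per-sub-clip outputs; a real classifier would supply these.
parent_label = oracle([("angry", 0.9), ("happy", 0.4), ("happy", 0.7)])
```

Note how the overlap (controlled by `step` relative to `length`) trades off the number of sub-classifications against redundancy, which is exactly the length/overlap tuning the abstract mentions.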

Keywords

  • Classification
  • Emotion
  • Boosting
  • Sub-clip
  • Sub-classification


Notes

  1. https://www.amazon.com/.
  2. https://madeby.google.com/home/.
  3. http://www.apple.com/ios/siri/.
  4. https://arcgit.wpi.edu/toto/EMOTIVOClean.
  5. http://www.rml.ryerson.ca/rml-emotion-database.html.
  6. http://www.emodb.bilderbar.info/.
  7. http://kahlan.eps.surrey.ac.uk/savee/.
  8. http://audeering.com/technology/opensmile/.


Author information


Correspondence to Ermal Toto.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Toto, E., Foley, B.J., Rundensteiner, E.A. (2019). Improving Emotion Detection with Sub-clip Boosting. In: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science, vol 11053. Springer, Cham. https://doi.org/10.1007/978-3-030-10997-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-10997-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10996-7

  • Online ISBN: 978-3-030-10997-4

  • eBook Packages: Computer Science (R0)