
Multistage classification scheme to enhance speech emotion recognition

  • Published in: International Journal of Speech Technology

Abstract

During the past decades, emotion recognition from speech has become one of the most explored areas in affective computing. However, such systems lack universality across languages, and research in this direction is restrained by the unavailability of emotional speech databases in many spoken languages. Arabic is one language facing this inadequacy. The proposed work aims at developing a speech emotion recognition system for the Arabic-speaking community. A speech database with elicited emotions (anger, happiness, sadness, disgust, surprise, and neutrality) was recorded from 14 subjects who are non-native but proficient speakers of the language. Prosodic, spectral, and cepstral features were extracted after pre-processing. The features were then subjected to single-stage classification using two supervised learning methods, the support vector machine (SVM) and the extreme learning machine (ELM), and the resulting speech emotion recognition systems were compared in terms of accuracy, specificity, precision, and recall. Further analysis was carried out by adopting three multistage classification schemes. The first followed a two-stage classification, identifying gender first and then the emotion. The second used a divide-and-conquer approach built on cascaded binary classifiers, and the third a parallel approach that classifies with individual feature sets and combines the outputs through a decision logic. The results show that these multistage classification schemes can improve the performance of a speech emotion recognition system compared with single-stage classification. Comparable results were obtained when the same experiments were carried out on the Emo-DB database.
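
To make the first scheme concrete, the following minimal Python sketch illustrates gender-first, two-stage classification with SVMs. It is an illustration under stated assumptions, not the authors' implementation: the feature matrix X and the label arrays are random placeholders standing in for the extracted prosodic, spectral, and cepstral features and the recorded annotations.

    # Minimal sketch (assumption: not the authors' code) of the first
    # multistage scheme: predict speaker gender, then apply a
    # gender-specific emotion classifier.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))            # placeholder feature vectors
    y_gender = rng.integers(0, 2, size=200)   # 0 = female, 1 = male
    y_emotion = rng.integers(0, 6, size=200)  # six elicited emotions

    # Stage 1: gender classifier trained on all utterances.
    gender_clf = SVC(kernel="rbf").fit(X, y_gender)

    # Stage 2: one emotion SVM per gender, trained only on that
    # gender's utterances.
    emotion_clfs = {
        g: SVC(kernel="rbf").fit(X[y_gender == g], y_emotion[y_gender == g])
        for g in (0, 1)
    }

    def predict_emotion(x):
        # Route one feature vector through both stages.
        g = gender_clf.predict(x.reshape(1, -1))[0]
        return emotion_clfs[g].predict(x.reshape(1, -1))[0]

In practice the stage-2 models would be trained on the actual gendered partitions of the recorded database, and performance would be reported on held-out utterances using the metrics listed above.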





Author information


Corresponding author

Correspondence to S. S. Poorna.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Poorna, S.S., Nair, G.J. Multistage classification scheme to enhance speech emotion recognition. Int J Speech Technol 22, 327–340 (2019). https://doi.org/10.1007/s10772-019-09605-w

