Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition

Swain, Monorama; Sahoo, Subhasmita; Routray, Aurobinda; Kabisatpathy, P.; Kundu, Jogendra N.

doi:10.1007/s10772-015-9275-7

Published: 03 March 2015

Volume 18, pages 387–393 (2015)

International Journal of Speech Technology

Monorama Swain¹,
Subhasmita Sahoo²,
Aurobinda Routray²,
P. Kabisatpathy³ &
…
Jogendra N. Kundu¹

Abstract

Emotions are expressed in broadly similar ways by all human beings, but their expression is shaped by culture. This poses a major challenge to the universality of speech emotion detection systems, because the cultural behaviour of a society affects how emotions are expressed and perceived. An emotion recognition system customized for the languages within one cultural group is therefore feasible. This work proposes speaker-dependent and speaker-independent emotion recognition systems for two dialects of Odisha: Sambalpuri and Cuttacki. Spectral speech features, namely log power, Mel-frequency cepstral coefficients (MFCC), delta MFCC, double-delta MFCC, log frequency power coefficients (LFPC), and linear predictive cepstral coefficients (LPCC), are used with hidden Markov model (HMM) and support vector machine (SVM) classifiers to classify an utterance into one of seven discrete emotion classes: anger, happiness, disgust, fear, sadness, surprise, and neutral. To compare system accuracy, the features are used individually as well as in combination, while sampling frequency, frame length, and frame overlap are varied. The best average recognition accuracy for the speaker-independent system is 82.14%, obtained with the SVM classifier using only MFCC as the feature vector. For the speaker-dependent system, an increase in accuracy of more than 10% is seen. MFCC with the SVM classifier not only gives the best overall performance at an 8 kHz sampling frequency, but also performs consistently across all emotion classes, with less computational complexity than the other classifiers and feature combinations. Hence, it can be applied efficiently to emotion recognition over the telephone in call-centre applications.
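
The front end described in the abstract (framing at a chosen sampling frequency, frame length, and overlap, followed by per-frame features and their delta and double-delta tracks) can be illustrated with a minimal sketch. This is not the authors' implementation. The 20 ms frame length, the 50% overlap, the synthetic test tone, and all function names are assumptions made for illustration; only the 8 kHz sampling frequency comes from the abstract. Log power stands in for the full feature set, and the delta uses a common regression formula that the abstract does not confirm.

```python
import math

def frame_signal(samples, sample_rate, frame_ms, overlap):
    """Split samples into overlapping frames.

    frame_ms is the frame length in milliseconds; overlap is the fraction
    (0 <= overlap < 1) shared by consecutive frames.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def log_power(frame, eps=1e-12):
    """Natural log of the mean squared amplitude of one frame."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(power + eps)

def delta(values, width=2):
    """First-order regression delta over a per-frame feature track.

    d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width,
    with the edge values repeated as padding.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(values) - 1
    out = []
    for t in range(len(values)):
        num = 0.0
        for n in range(1, width + 1):
            ahead = values[min(t + n, last)]
            behind = values[max(t - n, 0)]
            num += n * (ahead - behind)
        out.append(num / denom)
    return out

# Assumed settings: 8 kHz audio, 20 ms frames, 50% overlap.
sample_rate = 8000
# One second of a synthetic 200 Hz tone whose amplitude rises over time.
signal = [math.sin(2 * math.pi * 200 * t / sample_rate) * (t / sample_rate)
          for t in range(sample_rate)]

frames = frame_signal(signal, sample_rate, frame_ms=20, overlap=0.5)
energy = [log_power(f) for f in frames]
d_energy = delta(energy)        # delta track
dd_energy = delta(d_energy)     # double-delta track

# One feature vector per frame: (log power, delta, double delta).
features = list(zip(energy, d_energy, dd_energy))
```

In a full system, each per-frame vector would also carry MFCC, LFPC, or LPCC coefficients, and the resulting sequences (for an HMM) or utterance-level summaries (for an SVM) would be passed to the classifier. Changing `sample_rate`, `frame_ms`, or `overlap` reproduces the kind of parameter sweep the abstract mentions.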

Author information

Authors and Affiliations

Department of Electronics & Telecommunication Engineering, Silicon Institute of Technology, Bhubaneswar, Odisha, India
Monorama Swain & Jogendra N. Kundu
Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Subhasmita Sahoo & Aurobinda Routray
Department of Electronics and Instrumentation, National Institute of Science and Technology, Berhampur, Odisha, India
P. Kabisatpathy

Corresponding author

Correspondence to Monorama Swain.

About this article

Cite this article

Swain, M., Sahoo, S., Routray, A. et al. Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. Int J Speech Technol 18, 387–393 (2015). https://doi.org/10.1007/s10772-015-9275-7

Received: 29 September 2014
Accepted: 17 February 2015
Published: 03 March 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10772-015-9275-7
