Abstract
Emotions are universal human traits expressed in broadly similar ways across individuals; their expression, however, is shaped by culture, which poses a major obstacle to a universal speech emotion detection system. Because the cultural behaviour of a society affects how emotions are expressed and perceived, an emotion recognition system customized for the languages of a cultural group is feasible. In this work, speaker-dependent and speaker-independent emotion recognition systems are proposed for two dialects of Odisha: Sambalpuri and Cuttacki. Spectral speech features such as log power, Mel-frequency cepstral coefficients (MFCC), delta MFCC, double-delta MFCC, log frequency power coefficients, and linear predictive cepstral coefficients are used with hidden Markov model (HMM) and support vector machine (SVM) classifiers to classify a speech sample into one of seven discrete emotion classes: anger, happiness, disgust, fear, sadness, surprise, and neutral. For a thorough comparative study of system accuracy, features are evaluated both individually and in combination, varying the sampling frequency, frame length, and frame overlap. The best average recognition accuracy for the speaker-independent system is 82.14 % using the SVM classifier with MFCC alone as the feature vector; the speaker-dependent system improves on this by more than 10 %. The results also show that MFCC with the SVM classifier not only gives the best overall performance at an 8 kHz sampling frequency, but also performs consistently across all emotion classes with lower computational complexity than the other classifiers and feature combinations. Hence, it can be applied efficiently to emotion recognition over the telephone in call-centre applications.
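The front end summarized above (segmenting speech into overlapping frames at a given sampling frequency and extracting per-frame spectral features such as log power, before handing the feature vectors to a classifier) can be sketched in NumPy. This is a minimal illustration only: the 25 ms frame length and 10 ms overlap are assumed values for the example, not settings reported in the paper, which varied these parameters.

```python
import numpy as np

def frame_signal(x, fs=8000, frame_ms=25, overlap_ms=10):
    """Split signal x into overlapping frames.

    frame_ms and overlap_ms are illustrative values; the paper
    experiments with several frame lengths and overlaps.
    """
    flen = int(fs * frame_ms / 1000)            # samples per frame
    hop = flen - int(fs * overlap_ms / 1000)    # frame advance in samples
    n = 1 + max(0, (len(x) - flen) // hop)      # number of full frames
    return np.stack([x[i * hop : i * hop + flen] for i in range(n)])

def log_power(frames, eps=1e-10):
    """Per-frame log power, one of the spectral features listed above."""
    return np.log(np.mean(frames ** 2, axis=1) + eps)

# One second of synthetic noise standing in for speech at 8 kHz
x = np.random.default_rng(0).standard_normal(8000)
frames = frame_signal(x)        # shape: (num_frames, samples_per_frame)
feats = log_power(frames)       # one log-power value per frame
```

In a full system, each frame would additionally yield MFCC and related coefficients, and the resulting per-utterance feature sequence would be fed to the HMM or SVM classifier.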
Swain, M., Sahoo, S., Routray, A. et al. Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. Int J Speech Technol 18, 387–393 (2015). https://doi.org/10.1007/s10772-015-9275-7