International Journal of Speech Technology, Volume 18, Issue 3, pp 387–393

Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition

  • Monorama Swain
  • Subhasmita Sahoo
  • Aurobinda Routray
  • P. Kabisatpathy
  • Jogendra N. Kundu


Emotions are a broad phenomenon, expressed in a similar way by every human being; however, their expression is affected by culture. This poses a major challenge to the universality of speech emotion detection systems. The cultural behaviour of a society affects the way emotions are expressed and perceived. Hence, an emotion recognition system customized for the languages within a cultural group is feasible. In this work, speaker-dependent and speaker-independent emotion recognition systems are proposed for two dialects of Odisha: Sambalpuri and Cuttacki. Spectral speech features, such as log power, Mel-frequency cepstral coefficients (MFCC), delta MFCC, double-delta MFCC, log frequency power coefficients (LFPC), and linear predictive cepstral coefficients, are used with hidden Markov model (HMM) and support vector machine (SVM) classifiers to classify an utterance into one of seven discrete emotion classes: anger, happiness, disgust, fear, sadness, surprise, and neutral. To compare system accuracy more thoroughly, the features are used individually as well as in combination, while varying the sampling frequency, frame length, and frame overlap. The best average recognition accuracy obtained for the speaker-independent system is 82.14 %, using the SVM classifier with only MFCC as the feature vector. For the speaker-dependent system, however, accuracy increases by more than 10 %. The results also show that MFCC with the SVM classifier not only gives the best overall performance at an 8 kHz sampling frequency, but also performs consistently across all emotion classes, with less computational complexity than the other classifiers and feature combinations. Hence, it can be applied efficiently in call-centre applications for emotion recognition over the telephone.
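The framing parameters mentioned in the abstract (sampling frequency, frame length, frame overlap) can be illustrated with a minimal sketch. It is not the authors' implementation: the 25 ms frame length, the 50 % overlap, the synthetic test tone, and the function names frame_signal, log_power and delta are all assumptions made for illustration, and the delta here is a simple central difference applied to log power rather than to MFCC.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25.0, overlap=0.5):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_power(frame, eps=1e-10):
    """Natural log of the mean squared amplitude of one frame."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(power + eps)

def delta(values):
    """First-order delta as a central difference, repeating the edge values."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i + 2] - padded[i]) / 2.0 for i in range(len(values))]

# Hypothetical input: one second of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(sample_rate)]

frames = frame_signal(signal, sample_rate, frame_ms=25.0, overlap=0.5)
energies = [log_power(f) for f in frames]

# One feature vector per frame: (log power, delta, double delta).
features = list(zip(energies, delta(energies), delta(delta(energies))))
```

Changing sample_rate, frame_ms or overlap changes the number and length of the frames, which is the kind of variation the study explores. Per-frame vectors like these would then be passed to a classifier such as an SVM or HMM.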


Recognition of emotion · Emotional speech · SVM · HMM · MFCC · LFPC



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Monorama Swain (1)
  • Subhasmita Sahoo (2)
  • Aurobinda Routray (2)
  • P. Kabisatpathy (3)
  • Jogendra N. Kundu (1)

  1. Department of Electronics & Telecommunication Engineering, Silicon Institute of Technology, Bhubaneswar, India
  2. Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
  3. Department of Electronics and Instrumentation, National Institute of Science and Technology, Berhampur, India
