Abstract
We describe a novel methodology for detecting emotions from speech signals. The methodology is useful when sequence information can safely be ignored, because it constructs static feature vectors to represent a sequence of values; this is the case in the current application. In the initial feature-extraction stage, each speech signal is cut into three segments according to a relative time interval process, and each segment is described using 988 acoustic features. The proposed methodology then consists of two steps. The first step constructs an emotion model for each emotion using principal component analysis and computes the distance of every observation to each emotion model. In the second step, these distance values are used to train a support vector machine classifier that identifies the affective content of a speech signal. The method is not restricted to speech signals; it can also be used to analyse other data of a similar nature. The proposed method is tested on four emotional databases. The results show competitive performance, with an average accuracy of at least 80 % on three of the databases for the detection of basic emotion types.
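The two-step pipeline summarised in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the three segments are taken to be equal relative thirds of the signal, each utterance is assumed to be already summarised as one fixed-length feature vector (for example, the per-segment acoustic features concatenated), and the distance to an emotion model is assumed to be the SIMCA-style orthogonal distance, i.e. the norm of the residual left after projecting onto that emotion's principal subspace. The helper names (`split_relative`, `PCAEmotionModel`, `fit_emotion_models`, `distance_features`) and the `n_components` value are hypothetical.

```python
import numpy as np


def split_relative(signal, n_segments=3):
    """Cut a 1-D signal into n_segments pieces of (nearly) equal relative duration.

    Assumption: equal relative thirds; the exact relative-time-interval rule
    is not specified in the abstract.
    """
    return np.array_split(np.asarray(signal), n_segments)


class PCAEmotionModel:
    """PCA model of a single emotion class (SIMCA-style; an assumption)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal axes of the centred class data,
        # ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def distance(self, X):
        """Orthogonal distance: norm of the residual left after projecting
        each centred observation onto this class's principal subspace."""
        Xc = np.asarray(X, dtype=float) - self.mean_
        scores = Xc @ self.components_.T          # coordinates in the subspace
        reconstruction = scores @ self.components_
        return np.linalg.norm(Xc - reconstruction, axis=1)


def fit_emotion_models(X, y, n_components=2):
    """Step 1a: fit one PCA model per emotion label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {label: PCAEmotionModel(n_components).fit(X[y == label])
            for label in np.unique(y)}


def distance_features(models, X):
    """Step 1b: distance of every observation to every emotion model.

    Returns an array of shape (n_samples, n_emotions); in step 2 this matrix,
    not the raw acoustic features, is what the classifier is trained on.
    """
    return np.column_stack([models[label].distance(X) for label in sorted(models)])


# Toy usage: two "emotions" lying on different 1-D subspaces of R^3.
X = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0],
              [0, 1, 0], [0, 2, 0], [0, 3, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
models = fit_emotion_models(X, y, n_components=1)
D = distance_features(models, np.array([[5.0, 0, 0], [0, 7.0, 0]]))
print(np.round(D, 6))  # approximately [[0, 5], [7, 0]]
```

Each row of the distance matrix has one entry per emotion model, so its width equals the number of emotions rather than the number of acoustic features. In the second step this matrix would be used to train a support vector machine, for instance scikit-learn's `sklearn.svm.SVC`; that class is named here only as one possible choice.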
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kobayashi, V. (2014). A Hybrid Distance-Based Method and Support Vector Machines for Emotional Speech Detection. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2013. Lecture Notes in Computer Science, vol 8399. Springer, Cham. https://doi.org/10.1007/978-3-319-08407-7_6
DOI: https://doi.org/10.1007/978-3-319-08407-7_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08406-0
Online ISBN: 978-3-319-08407-7
eBook Packages: Computer Science, Computer Science (R0)