Multi-algorithm Fusion for Speech Emotion Recognition
In this paper, we propose a speech emotion recognition system based on multi-algorithm fusion. Two prominent speech analysis algorithms, Mel Frequency Cepstral Coefficients (MFCC) and the Discrete Wavelet Transform (DWT), are used to extract emotion information from the speech signal. MFCC, a representation of the short-term power spectrum of a sound, is a classical approach to speech analysis, whereas the DWT is a multiresolution approach that captures frequency information together with time information. After each algorithm extracts its features through acoustic analysis of the emotional speech signal, the two feature sets are fused at the feature level. The final emotional state is determined by a Support Vector Machine (SVM) classifier. The proposed system is evaluated on the widely used Berlin emotional speech database. The results are very promising, as the fused system outperformed the individual algorithms.
Keywords: Multi-algorithm Fusion, MFCC, DWT, Speech Emotion Recognition
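The abstract outlines a four-stage pipeline: MFCC extraction, DWT extraction, feature-level fusion, and SVM classification. The NumPy-only sketch below illustrates one way such a pipeline could be assembled; it is not the authors' implementation. Every concrete choice is an assumption not stated in the abstract: the MFCC settings (512-sample Hamming frames, 160-sample hop, 26 mel filters, 13 coefficients), a Haar wavelet with four decomposition levels, the per-sub-band statistics, mean and standard-deviation pooling of MFCC frames, fusion by plain concatenation followed by z-score normalization, and a linear one-vs-rest SVM trained with the Pegasos subgradient method in place of whatever SVM formulation and kernel the paper used. The helper `make_toy_utterance` is hypothetical and generates noisy tones, not emotional speech; no data from the Berlin database is involved.

```python
import numpy as np

# All parameter values are hypothetical: the abstract does not specify frame sizes,
# filter counts, wavelet family, decomposition depth, or SVM settings.

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Frame-level MFCCs: windowed power spectrum -> mel filterbank -> log -> DCT-II."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:  # guarantee at least one frame
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(signal[idx] * np.hamming(n_fft))) ** 2 / n_fft
    # Triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2.0) / 700.0)
    hz_pts = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_mels + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Orthonormal DCT-II of the log mel energies; keep the first n_ceps coefficients.
    grid = np.arange(n_ceps)[:, None] * (np.arange(n_mels)[None, :] + 0.5)
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * grid / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T  # shape (n_frames, n_ceps)

def haar_dwt(x, levels=4):
    """Multi-level orthonormal Haar DWT; returns [cA_L, cD_L, ..., cD_1]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        if len(approx) % 2:  # pad odd lengths by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass (detail) band
        approx = (even + odd) / np.sqrt(2.0)         # low-pass (approximation) band
    return [approx] + details[::-1]

def dwt_features(signal, levels=4):
    """Per-sub-band mean |coefficient|, standard deviation, and log energy."""
    feats = []
    for band in haar_dwt(signal, levels):
        feats += [np.mean(np.abs(band)), np.std(band), np.log(np.sum(band ** 2) + 1e-10)]
    return np.array(feats)

def fused_features(signal, sr):
    """Feature-level fusion: concatenate utterance-level MFCC statistics with DWT features."""
    m = mfcc(signal, sr)
    return np.concatenate([m.mean(axis=0), m.std(axis=0), dwt_features(signal)])

class LinearSVM:
    """One-vs-rest linear SVM trained with Pegasos (stochastic subgradient on hinge loss + L2)."""

    def __init__(self, lam=0.01, epochs=100, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        Xb = np.hstack([X, np.ones((len(X), 1))])  # constant column acts as a (regularized) bias
        self.classes_ = np.unique(y)
        self.W = np.zeros((len(self.classes_), Xb.shape[1]))
        for c, label in enumerate(self.classes_):
            t, w, step = np.where(y == label, 1.0, -1.0), np.zeros(Xb.shape[1]), 0
            for _ in range(self.epochs):
                for i in rng.permutation(len(Xb)):
                    step += 1
                    eta = 1.0 / (self.lam * step)
                    violated = t[i] * (w @ Xb[i]) < 1.0  # margin checked before the update
                    w *= 1.0 - eta * self.lam
                    if violated:
                        w += eta * t[i] * Xb[i]
            self.W[c] = w
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return self.classes_[np.argmax(Xb @ self.W.T, axis=1)]

def make_toy_utterance(freq, sr, rng, dur=0.25):
    """Hypothetical stand-in for a recording: a noisy tone, NOT emotional speech."""
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi)) + 0.05 * rng.standard_normal(t.size)

if __name__ == "__main__":
    sr, rng = 16000, np.random.default_rng(1)
    X = np.array([fused_features(make_toy_utterance(f, sr, rng), sr) for f in [200] * 20 + [2000] * 20])
    y = np.array([0] * 20 + [1] * 20)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8  # z-score so both feature sets share a scale
    clf = LinearSVM().fit((X - mu) / sd, y)
    print("fused dim:", X.shape[1], "train accuracy:", np.mean(clf.predict((X - mu) / sd) == y))
```

Concatenation is the simplest form of feature-level fusion. Normalizing after concatenation matters because MFCC statistics and DWT log energies sit on different numeric scales, and an unnormalized linear SVM can be dominated by whichever feature set has the larger range.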