Abstract
The Hilbert-Huang transform (HHT) has been widely used in many areas since its inception, owing to its advantages for analyzing nonlinear and non-stationary signals. The resulting Hilbert spectrum accurately reflects how the signal energy is distributed across scales. In this paper, a novel feature called ECC is proposed. It is extracted from the Hilbert energy spectrum, which describes the distribution of the instantaneous energy. The experimental results clearly demonstrate that ECC outperforms the traditional short-term average energy. Combining ECC with mel frequency cepstral coefficients (MFCC) captures the energy distribution in both the time domain and the frequency domain, and this feature set achieves better recognition than the combination of short-term average energy, pitch and MFCC. Further improvements of ECC are then developed: TECC is obtained by combining ECC with the Teager energy operator, and EFCC is obtained by introducing the instantaneous frequency into the energy. In the experiments, seven emotional states are recognized; the highest recognition rate is 83.57%, with the classification accuracy for boredom reaching 100%. The numerical results indicate that the proposed features ECC, TECC and EFCC can substantially improve the performance of speech emotion recognition.
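The abstract builds on three standard quantities: the short-term average energy (the baseline feature), the discrete Teager energy operator (used for TECC), and the instantaneous amplitude and frequency obtained from the Hilbert transform (used for the Hilbert energy spectrum and EFCC). The sketch below computes these building blocks only. It is not the authors' ECC, TECC or EFCC, whose exact definitions are not given here. Several assumptions apply. The Hilbert transform is applied to a single mono-component signal, standing in for the intrinsic mode functions that empirical mode decomposition would normally produce. Short-term average energy is taken as the mean squared amplitude per frame. All function names are hypothetical. The analytic signal is built with a naive O(N²) DFT so the example needs only the standard library.

```python
import cmath
import math


def short_term_energy(x, frame_len, hop):
    """Short-term average energy (assumed definition): mean of squared
    samples in each frame of length frame_len, advancing by hop samples."""
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def teager_energy(x):
    """Discrete Teager energy operator (Kaiser):
    psi[n] = x[n]^2 - x[n-1] * x[n+1], computed for interior samples only."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def analytic_signal(x):
    """Analytic signal z = x + j*H{x} via a naive DFT: keep DC (and Nyquist
    for even N), double positive frequencies, zero negative frequencies."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = [s * w for s, w in zip(spectrum, weights)]
    # inverse DFT back to the time domain
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_frequency(x, fs):
    """Instantaneous amplitude |z[n]| and instantaneous frequency in Hz,
    estimated from the wrapped phase difference between consecutive samples."""
    z = analytic_signal(x)
    amp = [abs(v) for v in z]
    phase = [cmath.phase(v) for v in z]
    freq = []
    for a, b in zip(phase, phase[1:]):
        # wrap the phase step into [-pi, pi) before converting to Hz
        d = (b - a + math.pi) % (2 * math.pi) - math.pi
        freq.append(d * fs / (2 * math.pi))
    return amp, freq
```

As a check, a 50 Hz cosine of amplitude 2 sampled at 800 Hz gives a short-term average energy of 2 per 16-sample frame, a constant Teager energy of 4·sin²(π/8) = 2 − √2, an instantaneous amplitude of 2 and an instantaneous frequency of 50 Hz. The Teager value depends on both amplitude and frequency, which is why it carries information that plain squared amplitude does not.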
Additional information
Project supported by the State Key Laboratory of Robotics and System (Grant No. SKLS-2009-MS-10) and the Shanghai Leading Academic Discipline Project (Grant No. J50103).
About this article
Cite this article
Li, X., Zheng, Y. & Li, X. Extraction of novel features for emotion recognition. J. Shanghai Univ.(Engl. Ed.) 15, 479–486 (2011). https://doi.org/10.1007/s11741-011-0772-3