Time Dependent ARMA for Automatic Recognition of Fear-Type Emotions in Speech
Speech signals are non-stationary processes whose characteristics change in both time and frequency. The structure of a speech signal is also affected by several paralinguistic phenomena such as emotions, pathologies, and cognitive impairments, among others. Non-stationarity can be modeled using several parametric techniques. A novel approach based on time dependent auto-regressive moving average (TARMA) models is proposed here to model the non-stationarity of speech signals. The approach is tested on the recognition of “fear-type” emotions in speech, modeling syllables and unvoiced segments extracted from recordings of the Berlin and eNTERFACE'05 databases. The results indicate that TARMA models can be used for the automatic recognition of emotions in speech.
Keywords: Non-stationary signals · Speech emotion recognition · Continuous speech · Time dependent ARMA models
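The coefficients of a time dependent ARMA model are functions of time, commonly estimated through a functional basis expansion so that ordinary least squares can be applied. As a rough illustration of the idea only, the sketch below fits a simplified time-varying AR model (the MA part is omitted) whose coefficients follow a polynomial basis; the function name, basis choice, and model orders are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def fit_tvar(x, p=2, q_basis=3):
    """Fit a time-varying AR(p) model by polynomial basis expansion.

    Hypothetical sketch: each AR coefficient is expanded as
        a_i(t) = sum_k c[i, k] * (t / N)**k,
    so the model becomes linear in the constants c[i, k] and can be
    solved with ordinary least squares.
    Returns the (p x q_basis) coefficient matrix and the residuals.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    t = np.arange(p, N)
    tau = t / N  # normalized time in [0, 1)
    # Regression matrix: one column per lagged sample x[t-i] times basis tau**k
    cols = [x[t - i] * tau**k
            for i in range(1, p + 1)
            for k in range(q_basis)]
    Phi = np.column_stack(cols)
    y = x[t]
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ c
    return c.reshape(p, q_basis), resid
```

The residual sequence (or its variance) can then serve as a measure of how well the time-varying model tracks a non-stationary segment such as a syllable or an unvoiced frame.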