Time Dependent ARMA for Automatic Recognition of Fear-Type Emotions in Speech

  • J. C. Vásquez-Correa
  • J. R. Orozco-Arroyave
  • J. D. Arias-Londoño
  • J. F. Vargas-Bonilla
  • L. D. Avendaño
  • Elmar Nöth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)


Speech signals are non-stationary processes whose characteristics change in both time and frequency. The structure of a speech signal is also affected by the presence of several paralinguistic phenomena such as emotions, pathologies, and cognitive impairments, among others. Non-stationarity can be modeled using several parametric techniques. A novel approach based on time dependent auto-regressive moving average (TARMA) models is proposed here to model the non-stationarity of speech signals. The model is tested in the recognition of "fear-type" emotions in speech. The proposed approach is applied to model syllables and unvoiced segments extracted from recordings of the Berlin and enterface05 databases. The results indicate that TARMA models can be used for the automatic recognition of emotions in speech.
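The core idea above can be illustrated with a minimal sketch: a time-dependent AR model whose coefficients are expanded over a polynomial basis in time and estimated by ordinary least squares. This is an illustrative functional-series TAR fit on synthetic data, not the authors' implementation; the function name, basis choice, and model order are assumptions made for the example.

```python
import numpy as np

def fit_tar(x, p=2, q_basis=3):
    """Fit a time-dependent AR(p) model whose coefficients vary as
    polynomials in normalized time (a functional-series TAR sketch).

    Returns an array of shape (p, q_basis): row i holds the basis
    weights of the i-th AR coefficient a_i(t) = sum_k c[i, k] * t**k.
    """
    n = len(x)
    t = np.linspace(0.0, 1.0, n)          # normalized time axis
    rows, targets = [], []
    for m in range(p, n):
        feats = []
        for i in range(1, p + 1):          # lagged samples ...
            for k in range(q_basis):       # ... times the time basis
                feats.append(x[m - i] * t[m] ** k)
        rows.append(feats)
        targets.append(x[m])
    A = np.array(rows)
    y = np.array(targets)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef.reshape(p, q_basis)

# Synthetic non-stationary AR(2) process with a slowly drifting resonance,
# standing in for a short speech segment.
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for m in range(2, n):
    f = 0.05 + 0.1 * m / n                 # resonance frequency drifts in time
    a1 = 2 * 0.98 * np.cos(2 * np.pi * f)
    a2 = -0.98 ** 2
    x[m] = a1 * x[m - 1] + a2 * x[m - 2] + 0.1 * rng.standard_normal()

coef = fit_tar(x, p=2, q_basis=3)
```

In a recognition pipeline, the estimated basis weights (or features derived from them) would serve as the descriptors of each syllable or unvoiced segment that are fed to a classifier.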


Non-stationary signals · Speech emotion recognition · Continuous speech · Time dependent ARMA models





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • J. C. Vásquez-Correa (1)
  • J. R. Orozco-Arroyave (1, 2)
  • J. D. Arias-Londoño (1)
  • J. F. Vargas-Bonilla (1)
  • L. D. Avendaño (3)
  • Elmar Nöth (2)
  1. Faculty of Engineering, Universidad de Antioquia UdeA, Medellín, Colombia
  2. Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
  3. Laboratory for Stochastic Mechanical Systems and Automation (SMSA), Department of Mechanical and Aeronautical Engineering, University of Patras, Patras, Greece
