
Time Dependent ARMA for Automatic Recognition of Fear-Type Emotions in Speech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

Speech signals are non-stationary processes that change in both time and frequency. The structure of a speech signal is also affected by the presence of several paralinguistic phenomena such as emotions, pathologies, and cognitive impairments, among others. Non-stationarity can be modeled using several parametric techniques. A novel approach based on time dependent auto-regressive moving average (TARMA) models is proposed here to model the non-stationarity of speech signals. The approach is tested on the recognition of “fear-type” emotions in speech: it is applied to model syllables and unvoiced segments extracted from recordings of the Berlin and enterface05 databases. The results indicate that TARMA models can be used for the automatic recognition of emotions in speech.
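
As an illustration of the modeling idea summarized in the abstract, the sketch below fits a functional-series time-varying autoregressive (TAR) model, i.e. the AR part of a TARMA model, to a short speech segment: each AR coefficient is expanded over a Chebyshev polynomial basis in normalized time, and the expansion weights are estimated by ordinary least squares. This is a minimal sketch under assumed design choices (model order, basis dimension, plain least-squares estimation); the moving-average part and the estimators actually used in the paper are not reproduced here.

import numpy as np

def fit_tar(signal, order=4, basis_dim=3):
    """Fit a functional-series time-varying AR model (the AR part of a TARMA model).

    Each AR coefficient a_i(t) is expanded over a Chebyshev polynomial basis in
    normalized time, and the expansion weights are estimated by ordinary least
    squares. Model order and basis dimension are illustrative choices only.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    t = np.linspace(-1.0, 1.0, n)                                  # normalized time axis
    basis = np.polynomial.chebyshev.chebvander(t, basis_dim - 1)   # shape (n, basis_dim)

    rows, targets = [], []
    for k in range(order, n):
        past = x[k - order:k][::-1]            # x[k-1], ..., x[k-order]
        rows.append(np.kron(past, basis[k]))   # past samples crossed with the time basis
        targets.append(x[k])
    X = np.asarray(rows)
    y = np.asarray(targets)

    # Least-squares estimate of the basis-expansion weights, one row per AR lag
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ weights
    return weights.reshape(order, basis_dim), residuals

The time-varying coefficient trajectories and the residual variance of such a fit are the kind of quantities that can be turned into features for an emotion classifier trained on segments from the Berlin and enterface05 recordings; the specific features and classifier of the paper are not reconstructed here.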



Author information

Corresponding author

Correspondence to J. C. Vásquez-Correa.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vásquez-Correa, J.C., Orozco-Arroyave, J.R., Arias-Londoño, J.D., Vargas-Bonilla, J.F., Avendaño, L.D., Nöth, E. (2015). Time Dependent ARMA for Automatic Recognition of Fear-Type Emotions in Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science (LNAI), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_11

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
