Two stage emotion recognition based on speaking rate

International Journal of Speech Technology

Abstract

This paper proposes a two-stage speech emotion recognition approach based on speaking rate. The emotions considered in this study are anger, disgust, fear, happy, neutral, sadness, sarcastic and surprise. In the first stage, the eight emotions are categorized by speaking rate into three broad groups: active (fast), normal and passive (slow). In the second stage, these three groups are further classified into individual emotions using vocal tract characteristics. Gaussian mixture models (GMMs) are used to develop the emotion models. Emotion classification performance at the broader level, based on speaking rate, is found to be around 99% for speaker- and text-dependent cases. Overall emotion classification performance is observed to improve with the proposed two-stage approach. Along with spectral features, formant features are explored in the second stage to achieve robust emotion recognition in speaker-, gender- and text-independent cases.
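The two-stage decision described above can be sketched in Python. Everything below is an illustrative assumption rather than the paper's actual configuration: the speaking-rate thresholds, the emotion-to-group assignment, and the single diagonal Gaussians (with invented formant-like means and variances) standing in for the trained GMMs.

```python
import math

# Stage 1: coarse grouping by speaking rate (syllables per second).
# The thresholds and the emotion-to-group assignment are illustrative
# assumptions, not values reported in the paper.
GROUPS = {
    "active":  ["anger", "fear", "happy", "surprise"],
    "normal":  ["neutral", "sarcastic"],
    "passive": ["disgust", "sadness"],
}

def stage1_group(rate, slow=3.5, fast=5.5):
    """Map a speaking-rate estimate to a broad emotion group."""
    if rate >= fast:
        return "active"
    if rate <= slow:
        return "passive"
    return "normal"

# Stage 2: score vocal-tract features against per-emotion models.
# A single diagonal Gaussian per emotion stands in for the paper's
# GMMs; the (F1, F2)-like means (Hz) and variances are made up.
MODELS = {
    "anger":     ((700.0, 1600.0), (1e4, 4e4)),
    "fear":      ((650.0, 1900.0), (1e4, 4e4)),
    "happy":     ((600.0, 1800.0), (1e4, 4e4)),
    "surprise":  ((550.0, 2000.0), (1e4, 4e4)),
    "neutral":   ((500.0, 1500.0), (1e4, 4e4)),
    "sarcastic": ((520.0, 1400.0), (1e4, 4e4)),
    "disgust":   ((480.0, 1300.0), (1e4, 4e4)),
    "sadness":   ((450.0, 1200.0), (1e4, 4e4)),
}

def log_likelihood(x, mean, var):
    """Diagonal-Gaussian log-likelihood of feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def recognize(rate, features):
    """Two-stage decision: rate picks the group, features pick the emotion."""
    group = stage1_group(rate)
    return max(GROUPS[group],
               key=lambda e: log_likelihood(features, *MODELS[e]))
```

With these made-up models, `recognize(6.0, (700.0, 1600.0))` returns `"anger"`: the fast rate restricts the search to the active group, and the feature vector scores highest under the anger Gaussian. The design point the sketch illustrates is that stage 1 shrinks the candidate set before the more expensive spectral comparison in stage 2.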



Author information

Correspondence to Shashidhar G. Koolagudi.


Cite this article

Koolagudi, S.G., Krothapalli, R.S. Two stage emotion recognition based on speaking rate. Int J Speech Technol 14, 35–48 (2011). https://doi.org/10.1007/s10772-010-9085-x
