Abstract
This paper proposes a two-stage speech emotion recognition approach based on speaking rate. The emotions considered in this study are anger, disgust, fear, happy, neutral, sadness, sarcastic and surprise. In the first stage, the eight emotions are categorized by speaking rate into three broad groups, namely active (fast), normal and passive (slow). In the second stage, these three broad groups are further classified into individual emotions using vocal tract characteristics. Gaussian mixture models (GMMs) are used for developing the emotion models. Emotion classification performance at the broader level, based on speaking rate, is found to be around 99% for the speaker- and text-dependent cases. Overall emotion classification performance is observed to improve with the proposed two-stage approach. Along with spectral features, formant features are explored in the second stage to achieve robust emotion recognition in the speaker-, gender- and text-independent cases.
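The two-stage flow described above can be sketched in code. The following is a minimal illustration, assuming scikit-learn's GaussianMixture and hypothetical feature inputs (extract-your-own speaking-rate and spectral/formant feature matrices); the group-to-emotion assignments shown are placeholders for illustration, not the paper's published grouping.

```python
# Minimal sketch of the two-stage GMM classification scheme from the abstract.
# Feature extraction, the RATE_GROUPS assignments, and all function names are
# illustrative assumptions; the paper's stage-2 features are spectral and
# formant based ("vocal tract characteristics").
from sklearn.mixture import GaussianMixture

# Stage-1 groups by speaking rate; member emotions per group are assumed.
RATE_GROUPS = {
    "active":  ["anger", "fear", "happy", "surprise"],   # assumed grouping
    "normal":  ["neutral", "sarcastic"],                 # assumed grouping
    "passive": ["disgust", "sadness"],                   # assumed grouping
}

def train_stage(feature_sets, n_components=8):
    """Fit one diagonal-covariance GMM per class on its training features.

    feature_sets: dict mapping class label -> (n_frames, n_dims) array.
    """
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag").fit(X)
            for label, X in feature_sets.items()}

def classify(models, X):
    """Pick the class whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].score(X))

def recognize(rate_feats, spectral_feats, group_models, emotion_models):
    # Stage 1: broad group (active/normal/passive) from speaking-rate features.
    group = classify(group_models, rate_feats)
    # Stage 2: individual emotion within that group, from spectral + formant
    # features, restricting the search to the group's member emotions.
    candidates = {e: emotion_models[e] for e in RATE_GROUPS[group]}
    return classify(candidates, spectral_feats)
```

Restricting stage 2 to the emotions within the predicted rate group is what lets the second-stage models discriminate among fewer, acoustically closer classes, which is the motivation for the hierarchy.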
Cite this article
Koolagudi, S.G., Krothapalli, R.S. Two stage emotion recognition based on speaking rate. Int J Speech Technol 14, 35–48 (2011). https://doi.org/10.1007/s10772-010-9085-x