Abstract
Speaker emotion recognition is an important research problem with many applications in human–robot and human–computer interaction. This work addresses recognizing a speaker's emotion from a speech utterance using prosody features: pitch, log energy, zero-crossing rate, and the first three formant frequencies. Feature vectors are constructed from 11 statistical parameters of each feature. An artificial neural network (ANN) is chosen as the classifier owing to its universal function-approximation capability. In an ANN-based classifier, the time required for training the network as well as for classification depends on the dimension of the feature vector. This work therefore focuses on developing a speaker emotion recognition system using prosody features together with dimensionality reduction of the feature vectors, for which principal component analysis (PCA) is used. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional speech database are used to evaluate the performance of the proposed approach on seven emotion classes. The proposed method is compared with existing approaches and achieves better performance: from the experimental results, recognition rates of 75.32% for the Berlin emotional database and 84.5% for the LDC emotional speech database are obtained.
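The pipeline in the abstract — summarizing each prosody contour with a fixed set of statistics and then reducing the stacked feature vectors with PCA — can be sketched as below. This is a minimal illustration, not the paper's implementation: the particular 11 statistics and the contour data are assumptions, and the paper's own statistical parameters may differ.

```python
import numpy as np

def prosody_stats(contour):
    """Summarize one prosody contour (e.g. a pitch track) with 11
    simple statistics. The exact statistics here are illustrative."""
    x = np.asarray(contour, dtype=float)
    dx = np.diff(x)
    return np.array([
        x.mean(), x.std(), x.min(), x.max(), np.median(x),
        x.max() - x.min(),                 # range
        np.percentile(x, 25),              # lower quartile
        np.percentile(x, 75),              # upper quartile
        dx.mean(), dx.std(),               # first-difference statistics
        ((x[:-1] * x[1:]) < 0).mean(),     # sign-change (crossing) rate
    ])

def pca_reduce(X, k):
    """Project feature vectors X (n_samples x d) onto the top-k
    principal components, via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Toy usage: 50 random "utterance contours" -> 11-D vectors -> 4-D PCA space.
rng = np.random.default_rng(0)
X = np.stack([prosody_stats(rng.standard_normal(200)) for _ in range(50)])
Z = pca_reduce(X, 4)
print(X.shape, Z.shape)  # (50, 11) (50, 4)
```

In practice the reduced vectors `Z` would be fed to the ANN classifier; the benefit claimed in the paper is that the lower dimension shortens both training and classification time.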
Notes
Linguistic Data Consortium (LDC), http://www.ldc.upenn.edu/.
References
Abelin, A. & Allwood, J. (2000). Cross-linguistic interpretation of emotional prosody. In Proceedings of the ISCA workshop on speech and emotion.
Anagnostopoulos, C. N., & Vovoli, E. (2010). Sound processing features for speaker-dependent and phrase-independent emotion recognition in Berlin database. In G. Papadopoulos, W. Wojtkowski, G. Wojtkowski, S. Wrycza, & J. Zupancic (Eds.), Information systems development (pp. 413–421). Boston: Springer.
Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2015). Features and classifiers for emotion recognition from speech: A survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–177.
Atassi, H. & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In Proceedings of 20th IEEE international conference on tools with artificial intelligence (pp. 147–152).
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.
Bisio, I., Delfino, A., Lavagetto, F., Marchese, M., & Sciarrone, A. (2013). Gender-driven emotion recognition through speech signals for ambient intelligence applications. IEEE Transactions on Emerging Topics in Computing, 1(2), 244–257.
Breazeal, C. (2001). Designing social robots. Cambridge, MA: MIT Press.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. & Weiss, B. (2005). A database of German emotional speech, In Proceedings of interspeech.
Burkhardt, F., & Sendlmeier, W. (2000). Verification of acoustical correlates of emotional speech using formant-synthesis. In Proceedings of the ISCA workshop on speech and emotion.
Cao, H., Verma, R., & Nenkova, A. (2015). Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Computer Speech & Language, 29(1), 186–202.
Cen, L., Ser, W., & Yu, Z. L. (2008). Speech emotion recognition using canonical correlation analysis and probabilistic neural network, In Proceedings of 7th international conference on machine learning and applications (pp. 859–862).
Chiaverini, S., Siciliano, B., & Villani, L. (1999). A survey of robot interaction control scheme with experimental comparison. IEEE/ASME Transactions on Mechatronics, 4(3), 273–285.
de Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
FirozShah, A., Vimal Krishnan, V. R., Raji Sukumar, A., Jayakumar, A., & Babu Anto, P. (2009). Speaker independent automatic emotion recognition from speech: A comparison of MFCCs and discrete wavelet transforms. In Proceedings of international conference on advances in recent technologies in communication and computing (pp. 528–531).
Fu, L., Mao, X., & Chen, L. (2008a). Relative speech emotion recognition based artificial neural network. In Proceedings of IEEE Pacific-Asia workshop on computational intelligence and industrial application (pp. 140–144).
Fu, L., Mao, X., & Chen, L. (2008b). Speaker independent emotion recognition using HMMs fusion system with relative features. In Proceedings of 1st international conference on intelligent networks and intelligent systems (pp. 608–611).
Giannakopoulos, T., Pikrakis, A., & Theodoridis, S. (2009). A dimensional approach to emotion recognition of speech from movies. In Proceedings of IEEE international conference on acoustics, speech and signal processing (pp. 65–68).
Iliou, T., & Anagnostopoulos, C. N. (2009). Comparison of different classifiers for emotion recognition. In Proceedings of panhellenic conference in informatics (pp. 102–106).
Kostoulas, T. P., & Fakotakis, N. (2006). A speaker dependent emotion recognition framework, CSNDSP. In Proceedings of 5th international symposium computers, systems, networks and digital signal processing (pp. 305–309).
Kostoulas, T., Ganchev, T., Lazaridis, A., & Fakotakis, N. (2010). Enhancing emotion recognition from speech through feature selection. In P. Sojka, A. Horák, I. Kopeček, & K. Pala (Eds.), International conference on text, speech and dialogue, Lecture Notes in Artificial Intelligence (Vol. 6231, pp. 338–344).
Loni, D. Y., & Subbaraman, S. (2014). Formant estimation of speech and singing voice by combining wavelet with LPC and Cepstrum techniques. In 9th international conference on industrial and information systems (ICIIS) IEEE (pp. 1–7).
Luengo, I., Navas, E., & Hernaez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12, 490–501.
Lugger, M., & Yang, B. (2007a). An incremental analysis of different feature groups in speaker independent emotion recognition. In Proceedings of international congress phonetic sciences (pp. 2149–2152).
Lugger, M., & Yang, B. (2007b). The relevance of voice quality features in speaker independent emotion recognition. In Proceedings of IEEE international conference on acoustics, speech and signal processing (pp. 17–20).
McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., & Stroeve, S. (2000). Approaching automatic recognition of emotion from voice: A rough benchmark. In Proceedings of ISCA workshop speech emotion in Belfast, UK (pp. 207–212).
Mishra, H. K., & Sekhar, C. C. (2009). Variational gaussian mixture models for speech emotion recognition. In Proceedings of 7th international conference on advances in pattern recognition (pp. 183–186).
Neiberg, D., Elenius, K., & Laskowski, K. (2006). Emotion recognition in spontaneous speech using GMMs. In Proceedings of INTERSPEECH conference (pp. 809–812).
Oudeyer, P. Y. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59(1), 157–183.
Pantic, M., & Rothkrantz, L. J. (2003). Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE, 91(9), 1370–1390.
Picard, R. (1997). Affective computing. Cambridge, MA: MIT Press.
Ramakrishnan, S. (2012). Recognition of emotion from speech: A review. In S. Ramakrishnan (Ed.), Speech enhancement, modeling and recognition: Algorithms and applications (pp. 121–138). London: IntechOpen.
Ross, M., Shaffer, H., Cohen, A., Freudberg, R., & Manley, H. (1974). Average magnitude difference function pitch extractor. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22(5), 353–362.
Schuller, B., Muller, R., Lang, M., & Rigoll, G. (2005a). Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles. In Proceedings of 9th Eurospeech Interspeech (pp. 805–809).
Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., et al. (2007). The relevance of feature type for the automatic classification of emotional user states: Low level descriptors and functionals. In Proceedings of interspeech (pp. 2253–2256).
Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., et al. (2009). Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image and Vision Computing, 27, 1760–1774.
Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A., et al. (2010). Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective Computing, 1, 119–131.
Shimamura, T., & Kobayashi, H. (2001). Weighted autocorrelation for pitch extraction of noisy speech. IEEE Transactions on Speech and Audio Processing, 9(7), 727–730.
Sidorova, J. (2007). Speech emotion recognition. Ph.D. Thesis, Universitat Pompeu Fabra, Barcelona.
Talkin, D. (1995). A robust algorithm for pitch tracking (RAPT). In W. B. Kleijn & K. K. Paliwal (Eds.), Speech coding and synthesis (pp. 495–518). Amsterdam: Elsevier.
Tan, L. N., & Alwan, A. (2013). Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication, 55(7), 841–856.
Tickle, A. (2000). English and Japanese speakers' emotion vocalizations and recognition: A comparison highlighting vowel quality. In ISCA workshop on speech and emotion, Belfast.
Tolkmitt, F. J., & Scherer, K. R. (1986). Effect of experimentally induced stress on vocal parameter. Journal of Experimental Psychology: Human Perception and Performance, 12(3), 302–313.
Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. Journal of the Acoustical Society of America, 52(4), 1238–1250.
Wu, S., Falk, T.H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In Proceedings of 16th international conference on digital signal processing.
Wu, C. H., & Liang, W. B. (2011). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels. IEEE Transactions on Affective Computing, 2, 10–21.
Yang, C., Ji, L., & Liu, G. (2009a). Study to speech emotion recognition based on TWINsSVM. In Proceedings of 5th international conference on natural computation (pp. 312–316).
Yang, T., Yang, J., & Bi, F. (2009b). Emotion statuses recognition of speech signal using intuitionistic fuzzy set. In Proceedings of world congress on software engineering (pp. 204–207).
Yun, S., & Yoo, C. D. (2009). Speech emotion recognition via a max-margin framework incorporating a loss function based on the Watson and Tellegen's emotion model. In Proceedings of IEEE international conference on acoustics, speech and signal processing (pp. 4169–4172).
Gudmalwar, A.P., Rama Rao, C.V. & Dutta, A. Improving the performance of the speaker emotion recognition based on low dimension prosody features vector. Int J Speech Technol 22, 521–531 (2019). https://doi.org/10.1007/s10772-018-09576-4