Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)


Most speech processing applications suffer a degradation in performance when operated in emotional environments, largely because of the mismatch between the development and operating environments. Model adaptation and feature adaptation schemes have been employed to adapt speech systems developed in neutral environments to emotional environments. In this study, only the anger emotion is considered among emotional environments, and signal-level conversion from anger to neutral speech is investigated. Emotion in human speech is concentrated over a small region of the entire utterance. The regions of speech that are most influenced by the emotive state of the speaker are considered the emotionally significant regions of an utterance. Physiological constraints of the human speech production mechanism are exploited to detect these emotionally significant regions. The variation of prosody parameters (pitch, duration, and energy) with their position in the sentence is analyzed to obtain modification factors. The speech signal in the emotionally significant regions is modified using the corresponding modification factors to generate a neutral version of the anger speech. Speech samples from the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) are used in this study, and a subjective listening test is performed to evaluate the effectiveness of the proposed conversion.
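The core idea of the conversion — scale prosody parameters only inside the detected emotionally significant regions, leaving the rest of the utterance untouched — can be sketched on frame-level contours. This is a minimal illustration, not the paper's method: the function name, the frame-level representation, and the factor values (0.8 for pitch, 0.7 for energy) are assumptions for illustration, and a real implementation would also modify duration (e.g., via PSOLA) and derive the factors from position-dependent prosody analysis.

```python
# Hypothetical sketch: attenuate pitch and energy toward neutral, but only
# in frames flagged as emotionally significant. Factor values and the
# frame-level contours below are illustrative assumptions.
def to_neutral(pitch_hz, energy, significant, pitch_factor=0.8, energy_factor=0.7):
    """Scale pitch and energy inside emotionally significant frames.

    pitch_hz, energy : per-frame contours (lists of floats)
    significant      : per-frame booleans marking emotionally significant frames
    """
    new_pitch, new_energy = [], []
    for f0, e, sig in zip(pitch_hz, energy, significant):
        if sig:
            # anger typically raises F0 and intensity; scale them back down
            new_pitch.append(f0 * pitch_factor)
            new_energy.append(e * energy_factor)
        else:
            # outside the significant region the signal is left unchanged
            new_pitch.append(f0)
            new_energy.append(e)
    return new_pitch, new_energy

# frames 2-3 flagged as emotionally significant (illustrative values)
pitch = [220.0, 240.0, 300.0, 310.0, 230.0]
energy = [1.0, 1.1, 2.0, 2.2, 1.0]
mask = [False, False, True, True, False]
neutral_pitch, neutral_energy = to_neutral(pitch, energy, mask)
```

After the contours are modified, the signal itself would be resynthesized from them; the selective masking is what distinguishes this scheme from uniform prosody modification over the whole utterance.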


Keywords: Emotionally significant regions · Emotion recognition · Automatic speech recognition · Physiological constraints · Emotional environments · Emotive to neutral conversion · Adaptation scheme


References

  1. Alku, P.: Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Commun. 11(2), 109–118 (1992)
  2. Batliner, A., Steidl, S., Seppi, D., Schuller, B.: Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach. Adv. Hum. Comput. Interact. 2010, 3 (2010)
  3. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction. IEEE Sig. Process. Mag. 18(1), 32–80 (2001)
  4. Gangamohan, P., Kadiri, S.R., Yegnanarayana, B.: Analysis of emotional speech at subsegmental level. In: INTERSPEECH, pp. 1916–1920 (2013)
  5. Hansen, J.H., Bou-Ghazale, S.E., Sarikaya, R., Pellom, B.: Getting started with SUSAS: a speech under simulated and actual stress database. In: Eurospeech, vol. 97, pp. 1743–1746 (1997)
  6. Hansen, J.H., Womack, B.D.: Feature analysis and neural network-based classification of speech under stress. IEEE Trans. Speech Audio Process. 4(4), 307–313 (1996)
  7. Kadiri, S.R., Gangamohan, P., Yegnanarayana, B.: Discriminating neutral and emotional speech using neural networks. In: ICON (2014)
  8. Koolagudi, S.G., Maity, S., Kumar, V.A., Chakrabarti, S., Rao, K.S.: IITKGP-SESC: speech database for emotion analysis. In: Ranka, S., Aluru, S., Buyya, R., Chung, Y.-C., Dua, S., Grama, A., Gupta, S.K.S., Kumar, R., Phoha, V.V. (eds.) IC3 2009. CCIS, vol. 40, pp. 485–492. Springer, Heidelberg (2009)
  9. Krothapalli, S.R., Yadav, J., Sarkar, S., Koolagudi, S.G., Vuppala, A.K.: Neural network based feature transformation for emotion independent speaker identification. Int. J. Speech Technol. 15(3), 335–349 (2012)
  10. Murty, K.S.R., Yegnanarayana, B.: Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)
  11. Murty, K.: Significance of excitation source information for speech analysis. Ph.D. thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras (2009)
  12. Ortony, A., Clore, G.L., Collins, A.: The Cognitive Structure of Emotions. Cambridge University Press, Cambridge (1990)
  13. Raja, G.S., Dandapat, S.: Speaker recognition under stressed condition. Int. J. Speech Technol. 13(3), 141–161 (2010)
  14. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digit. Signal Process. 10(1), 19–41 (2000)
  15. Schuller, B., Stadermann, J., Rigoll, G.: Affect-robust speech recognition by dynamic emotional adaptation. In: Proceedings of Speech Prosody (2006)
  16. Stevens, K.N.: Acoustic Phonetics, vol. 30. MIT Press, Cambridge (2000)
  17. Tao, J., Kang, Y., Li, A.: Prosody conversion from neutral speech to emotional speech. IEEE Trans. Audio Speech Lang. Process. 14(4), 1145–1154 (2006)
  18. Valbret, H., Moulines, E., Tubach, J.P.: Voice transformation using PSOLA technique. In: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, pp. 145–148. IEEE (1992)
  19. Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., Wendemuth, A.: Vowels formants analysis allows straightforward detection of high arousal emotions. In: 2011 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2011)
  20. Vlasenko, B., Prylipko, D., Wendemuth, A.: Towards robust spontaneous speech recognition with emotional speech adapted acoustic models. In: Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), pp. 103–107. Saarbrücken, Germany (2012)
  21. Vlasenko, B., Wendemuth, A.: Location of an emotionally neutral region in valence-arousal space: two-class vs. three-class cross corpora emotion recognition evaluations. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2014)
  22. Vuppala, A.K., Kadiri, S.R.: Neutral to anger speech conversion using non-uniform duration modification. In: 2014 9th International Conference on Industrial and Information Systems (ICIIS), pp. 1–4. IEEE (2014)
  23. Vuppala, A.K., Limmayya, J., Raghavendra, G.: Neutral speech to anger speech conversion using prosody modification. In: Prasath, R., Kathirvalavakumar, T. (eds.) MIKE 2013. LNCS, vol. 8284, pp. 383–390. Springer, Heidelberg (2013)
  24. Vydana, H.K., Kadiri, S.R., Vuppala, A.K.: Vowel-based non-uniform prosody modification for emotion conversion. Circuits Syst. Signal Process. 34, 1–21 (2015)
  25. Vydana, H.K., Kumar, P.P., Krishna, K., Vuppala, A.K.: Improved emotion recognition using GMM-UBMs. In: 2015 International Conference on Signal Processing and Communication Engineering Systems (SPACES), pp. 53–57. IEEE (2015)
  26. Yang, B., Lugger, M.: Emotion recognition from speech signals using new harmony features. Signal Process. 90(5), 1415–1423 (2010)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Speech and Vision Lab, International Institute of Information Technology, Hyderabad, India
