
Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion

  • Conference paper
  • First Online:
Mining Intelligence and Knowledge Exploration (MIKE 2015)

Abstract

Most speech processing applications suffer a degradation in performance when operated in emotional environments, largely because of the mismatch between the development and operating environments. Model adaptation and feature adaptation schemes have been employed to adapt speech systems developed in neutral environments to emotional environments. In this study, we consider only the anger emotion and investigate signal-level conversion from anger to neutral speech. Emotion in human speech is concentrated over a small region of the entire utterance; the regions of speech that are highly influenced by the emotive state of the speaker are considered the emotionally significant regions of the utterance. Physiological constraints of the human speech production mechanism are exploited to detect these emotionally significant regions. The variation of prosody parameters (pitch, duration, and energy) with their position in the sentence is analyzed to obtain modification factors, and the speech signal in the emotionally significant regions is modified using the corresponding factors to generate a neutral version of the anger speech. Speech samples from the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) are used in this study, and a subjective listening test is performed to evaluate the effectiveness of the proposed conversion.
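
The region-wise modification step described in the abstract can be sketched in code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it applies hypothetical duration and energy modification factors to one detected emotionally significant region, using naive linear-interpolation resampling for the duration change (a real prosody-modification method such as PSOLA would alter pitch and duration independently; this sketch does not). The function name, region boundaries, and factor values are all illustrative.

```python
import numpy as np

def modify_region(signal, start, end, duration_factor, energy_factor):
    """Apply duration and energy modification factors to the region
    signal[start:end], leaving the rest of the utterance untouched.

    A simplified sketch: duration is changed by resampling the region
    with linear interpolation (which also shifts pitch, unlike PSOLA),
    and energy is changed by plain amplitude scaling.
    """
    region = signal[start:end].astype(float)
    # Duration modification: resample the region to the new length.
    new_len = int(round(len(region) * duration_factor))
    old_positions = np.linspace(0, len(region) - 1, num=new_len)
    stretched = np.interp(old_positions, np.arange(len(region)), region)
    # Energy modification: simple amplitude scaling of the region.
    stretched *= energy_factor
    # Splice the modified region back between the unmodified parts.
    return np.concatenate(
        [signal[:start].astype(float), stretched, signal[end:].astype(float)]
    )

# Illustrative usage: shorten and attenuate a (hypothetical) anger
# region spanning samples 20..60 of a 100-sample utterance.
utterance = np.ones(100)
neutralized = modify_region(utterance, 20, 60, duration_factor=1.5,
                            energy_factor=0.5)
```

In practice the modification factors would come from the position-dependent prosody analysis the paper describes, and the resampling step would be replaced by a pitch-synchronous method so that pitch, duration, and energy can be controlled separately.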



Author information

Correspondence to Hari Krishna Vydana.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vydana, H.K., Raju, V.V.V., Gangashetty, S.V., Vuppala, A.K. (2015). Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds.) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol. 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26832-3_28


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26831-6

  • Online ISBN: 978-3-319-26832-3

  • eBook Packages: Computer Science, Computer Science (R0)
