Abstract
This work attempts to convert a given neutral speech utterance to a target emotional style using signal processing techniques; sadness and anger are the emotions considered. For emotion conversion, we propose signal processing methods that process neutral speech in three ways: (i) modifying the energy spectra, (ii) modifying the source features, and (iii) modifying the prosodic features. The energy spectra of different emotions are analyzed, and a method is proposed to modify the energy spectra of neutral speech after dividing it into frequency bands. For the source features, epoch strength and epoch sharpness are studied extensively, and a new method is proposed to modify and incorporate these parameters using appropriate modification factors. Prosodic features such as pitch contour and intensity are also modified. New pitch contours corresponding to the target emotions are derived from the pitch contours of the neutral test utterances and incorporated into those utterances. Intensity is modified by dividing each neutral utterance into three equal segments and scaling the intensity of each segment separately, according to modification factors suited to the target emotion. Subjective evaluation using mean opinion scores was carried out to assess the quality of the converted emotional speech. Although the modified speech does not fully resemble the target emotion, these subjective tests demonstrate the potential of the methods to change the speaking style.
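The segment-wise intensity modification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the utterance is split into three equal segments and each is scaled by a per-segment gain. The gain values shown are illustrative assumptions, not the modification factors derived in the study.

```python
import numpy as np

def modify_intensity(signal, gains):
    """Scale three equal segments of a neutral utterance by per-segment
    modification factors (initial, middle, final). Sketch only."""
    assert len(gains) == 3, "one gain per segment"
    n = len(signal)
    bounds = [0, n // 3, 2 * n // 3, n]  # segment boundaries
    out = np.empty(n, dtype=float)
    for g, (lo, hi) in zip(gains, zip(bounds, bounds[1:])):
        out[lo:hi] = g * np.asarray(signal[lo:hi], dtype=float)
    return out

# Example: boost the final third, as might suit an angry target style
# (hypothetical gains chosen for illustration).
neutral = np.ones(9)
converted = modify_intensity(neutral, gains=(1.0, 1.1, 1.3))
```

In practice the gains would be estimated from an analysis of emotional versus neutral utterances, and the scaling would typically be applied to frame-level energy rather than raw samples.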
Cite this article
Haque, A., Rao, K.S. Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech. Int J Speech Technol 20, 15–25 (2017). https://doi.org/10.1007/s10772-016-9386-9