Neutral Speech to Anger Speech Conversion Using Prosody Modification

  • Anil Kumar Vuppala
  • J. Limmayya
  • G. Raghavendra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)


In this paper, the dynamics of prosodic features are exploited for speech emotion conversion, specifically the conversion of neutral speech to anger speech. Prosody is analysed on the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC). The prosodic features considered are the pitch contour, the intensity contour, and the duration contour. Objective evaluation is performed in terms of the averages of the pitch contour and the intensity contour. Subjective listening tests show that the emotion is perceived more effectively when the pitch contour is modified at the beginning and ending of the utterance than when it is modified over the whole utterance. The results show that the synthesized anger speech is perceived as very close to natural anger speech.
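The objective measure mentioned in the abstract (averages of the pitch and intensity contours) can be illustrated with a minimal sketch. This is not the authors' implementation: the autocorrelation pitch tracker, the 30 ms / 10 ms framing, the 75–500 Hz search range, the 0.3 voicing threshold, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def intensity_contour_db(x, sr, frame_ms=30, hop_ms=10):
    """Frame-wise RMS level in dB (relative to full scale 1.0)."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(rms + 1e-12)  # small offset avoids log(0)

def pitch_contour_hz(x, sr, fmin=75.0, fmax=500.0, frame_ms=30, hop_ms=10):
    """Frame-wise F0 from the autocorrelation peak; 0.0 marks unvoiced frames."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    f0 = []
    for frame in frame_signal(x, frame_len, hop):
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 0:  # silent frame
            f0.append(0.0)
            continue
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        # Assumed voicing rule: peak must be a sizeable fraction of the energy.
        f0.append(sr / lag if ac[lag] / ac[0] > 0.3 else 0.0)
    return np.array(f0)

def contour_means(x, sr):
    """Average pitch (voiced frames only) and average intensity of a signal."""
    f0 = pitch_contour_hz(x, sr)
    voiced = f0[f0 > 0]
    mean_f0 = float(voiced.mean()) if voiced.size else 0.0
    return mean_f0, float(intensity_contour_db(x, sr).mean())

if __name__ == "__main__":
    # Synthetic stand-in for speech: a 200 Hz tone, 1 s at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 200 * t)
    print(contour_means(tone, sr))
```

Computing these two averages for a neutral utterance, its converted version, and a natural anger utterance would give the kind of comparison the abstract describes; real speech would need a more robust pitch tracker than this one.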


Emotion conversion, neutral speech, anger speech, phase vocoder, pitch shift, intensity contour, duration contour





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Anil Kumar Vuppala (2)
  • J. Limmayya (1)
  • G. Raghavendra (1)
  1. Department of ECE, RGU IIIT-Nuzvid, India
  2. International Institute of Information Technology, Hyderabad, India
