Circuits, Systems, and Signal Processing, Volume 35, Issue 5, pp 1643–1663

Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

  • Hari Krishna Vydana
  • Sudarsana Reddy Kadiri
  • Anil Kumar Vuppala


The objective of this work is to develop a rule-based emotion conversion method that improves emotional perception. The performance of emotion conversion using the linear modification model is improved by using vowel-based non-uniform prosody modification. The present approach integrates features such as position and identity to address the non-uniformity in prosody caused by the emotional state of the speaker. We concentrate mainly on the strength, duration and pitch contour of vowels at different parts of the sentence. The influence of emotions on these parameters is exploited to convert speech from the neutral emotion to the target emotion. The non-uniform prosody modification factors for emotion conversion are based on the position of the vowel in the word and the position of the word in the sentence. The study is carried out using the Indian Institute of Technology-Simulated Emotion speech corpus. The proposed algorithm is evaluated by a subjective listening test, in which the proposed approach performs better than the existing approaches.
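The idea of position-dependent modification factors can be illustrated with a minimal sketch. It assumes that each vowel receives multiplicative duration, pitch and energy factors looked up from the position of its word in the sentence and its own position in the word. The names `Factors`, `PLACEHOLDER_FACTORS`, `position_label` and `factors_for_sentence` are hypothetical, and the numeric values are placeholders chosen for illustration only; they are not the factors derived in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factors:
    """Multiplicative prosody modification factors for one vowel."""
    duration: float
    pitch: float
    energy: float


# No modification: every parameter is left unchanged.
UNIFORM = Factors(duration=1.0, pitch=1.0, energy=1.0)

# HYPOTHETICAL placeholder values, keyed by
# (position of word in sentence, position of vowel in word).
# These numbers are illustrative and are not taken from the paper.
PLACEHOLDER_FACTORS = {
    ("initial", "initial"): Factors(0.9, 1.2, 1.1),
    ("final", "final"): Factors(1.1, 0.9, 0.95),
}


def position_label(index: int, count: int) -> str:
    """Label a 0-based index within `count` items as initial, middle or final."""
    if not 0 <= index < count:
        raise ValueError("index must satisfy 0 <= index < count")
    if index == 0:
        return "initial"
    if index == count - 1:
        return "final"
    return "middle"


def factors_for_sentence(vowels_per_word):
    """Return, for each word, one Factors object per vowel in that word.

    `vowels_per_word` lists the number of vowels in each word of the sentence.
    Positions without a table entry fall back to UNIFORM (no modification).
    """
    n_words = len(vowels_per_word)
    result = []
    for w, n_vowels in enumerate(vowels_per_word):
        word_pos = position_label(w, n_words)
        result.append([
            PLACEHOLDER_FACTORS.get(
                (word_pos, position_label(v, n_vowels)), UNIFORM
            )
            for v in range(n_vowels)
        ])
    return result
```

For a three-word sentence with two, one and three vowels, `factors_for_sentence([2, 1, 3])` returns three lists of factors, one entry per vowel. A uniform scheme would instead apply the same factors to every vowel regardless of position.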


Keywords: Prosody, Prosody modification, Non-uniform prosody modification, Vowels, Emotion conversion, Pitch, Duration, Energy, Epochs, Vowel onset points



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Hari Krishna Vydana (1)
  • Sudarsana Reddy Kadiri (1)
  • Anil Kumar Vuppala (1)
  1. Speech and Vision Laboratory (SVL), LTRC, International Institute of Information Technology, Hyderabad (IIIT-H), Hyderabad, India
