Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

Abstract

The objective of this work is to develop a rule-based emotion conversion method that improves emotional perception. The performance of emotion conversion with the linear modification model is improved by using vowel-based non-uniform prosody modification. The approach integrates features such as position and identity to address the non-uniformity in prosody that arises from the emotional state of the speaker. We concentrate mainly on the strength, duration and pitch contour of vowels in different parts of the sentence. The influence of emotion on these parameters is exploited to convert speech from the neutral emotion to the target emotion. The non-uniform prosody modification factors for emotion conversion depend on the position of the vowel in the word and the position of the word in the sentence. The study uses the Indian Institute of Technology-Simulated Emotion speech corpus. The proposed algorithm is evaluated with a subjective listening test, in which the proposed approach is observed to perform better than the existing approaches.
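
To make the idea of position-dependent modification factors concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each vowel has already been located and measured, and it looks up a factor set by the position of the word in the sentence and the position of the vowel in the word. The `Vowel` class, the `position_label` and `convert` helpers, and every numeric factor are hypothetical and chosen only for illustration; the abstract does not give the actual values or the signal-level modification procedure.

```python
from dataclasses import dataclass


@dataclass
class Vowel:
    word_index: int      # zero-based index of the containing word in the sentence
    vowel_index: int     # zero-based index of this vowel within its word
    vowels_in_word: int  # number of vowels in the containing word
    duration_ms: float   # vowel duration in the neutral utterance
    mean_f0_hz: float    # mean pitch of the vowel in the neutral utterance
    strength: float      # relative strength (energy) in the neutral utterance


def position_label(index: int, count: int) -> str:
    """Map a zero-based index to 'initial', 'middle' or 'final'."""
    if index == 0:
        return "initial"
    if index == count - 1:
        return "final"
    return "middle"


# Factors that leave a vowel unchanged, used when no rule matches.
IDENTITY = {"duration": 1.0, "pitch": 1.0, "strength": 1.0}

# Placeholder factors keyed by (word position, vowel position).
# These numbers are invented for illustration; they are NOT from the paper.
EXAMPLE_FACTORS = {
    ("initial", "initial"): {"duration": 0.5, "pitch": 1.5, "strength": 2.0},
    ("final", "final"): {"duration": 1.5, "pitch": 0.5, "strength": 1.0},
}


def convert(vowels, words_in_sentence, factors):
    """Return (duration, pitch, strength) targets for each vowel."""
    targets = []
    for v in vowels:
        key = (
            position_label(v.word_index, words_in_sentence),
            position_label(v.vowel_index, v.vowels_in_word),
        )
        f = factors.get(key, IDENTITY)
        targets.append(
            (
                v.duration_ms * f["duration"],
                v.mean_f0_hz * f["pitch"],
                v.strength * f["strength"],
            )
        )
    return targets


if __name__ == "__main__":
    sentence = [
        Vowel(0, 0, 2, 100.0, 200.0, 1.0),  # first vowel of the first word
        Vowel(1, 0, 1, 90.0, 180.0, 1.0),   # vowel in a middle word
        Vowel(2, 1, 2, 80.0, 160.0, 1.0),   # last vowel of the last word
    ]
    for target in convert(sentence, words_in_sentence=3, factors=EXAMPLE_FACTORS):
        print(target)
```

A uniform (linear) scheme would apply one factor set to every vowel; here the lookup key is what makes the modification non-uniform. The targets would then be handed to a prosody modification stage, which this sketch does not cover.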

Author information

Correspondence to Anil Kumar Vuppala.

Cite this article

Vydana, H.K., Kadiri, S.R. & Vuppala, A.K. Vowel-Based Non-uniform Prosody Modification for Emotion Conversion. Circuits Syst Signal Process 35, 1643–1663 (2016). https://doi.org/10.1007/s00034-015-0134-1

Keywords

  • Prosody
  • Prosody modification
  • Non-uniform prosody modification
  • Vowels
  • Emotion conversion
  • Pitch
  • Duration
  • Energy
  • Epochs
  • Vowel onset points