Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

Vydana, Hari Krishna; Kadiri, Sudarsana Reddy; Vuppala, Anil Kumar

doi:10.1007/s00034-015-0134-1

Published: 06 August 2015

Volume 35, pages 1643–1663 (2016)
Circuits, Systems, and Signal Processing

Hari Krishna Vydana¹,
Sudarsana Reddy Kadiri¹ &
Anil Kumar Vuppala¹

Abstract

The objective of this work is to develop a rule-based emotion conversion method that improves emotional perception. The performance of emotion conversion with the linear modification model is improved by using vowel-based non-uniform prosody modification. The approach integrates features such as position and identity to address the non-uniformity in prosody that arises from the emotional state of the speaker. We concentrate mainly on the strength, duration and pitch contour of vowels at different parts of the sentence. The influence of emotions on these parameters is exploited to convert speech from the neutral emotion to the target emotion. The non-uniform prosody modification factors for emotion conversion are based on the position of the vowel in the word and the position of the word in the sentence. The study is carried out using the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC). The proposed algorithm is evaluated with a subjective listening test, in which it is observed to perform better than the existing approaches.
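The idea of position-dependent modification factors can be illustrated with a minimal Python sketch. It is not the authors' implementation. The position labels, the `Vowel` and `Factors` types, the `modify` function and every numeric factor below are hypothetical placeholders chosen for illustration; the abstract does not state the actual factor values. The sketch also reduces the pitch contour of each vowel to a single F0 value, which is a simplification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Vowel:
    word_pos: str       # position of the word in the sentence
    vowel_pos: str      # position of the vowel in the word
    duration_ms: float  # vowel duration
    f0_hz: float        # single F0 value standing in for the pitch contour
    strength: float     # relative strength of the vowel


@dataclass(frozen=True)
class Factors:
    duration: float = 1.0
    pitch: float = 1.0
    strength: float = 1.0


def position_label(index: int, count: int) -> str:
    """Map an index within a sequence to a coarse position label."""
    if index == 0:
        return "initial"
    if index == count - 1:
        return "final"
    return "middle"


def modify(vowels, table, default=Factors()):
    """Scale each vowel by the factors looked up for its position pair.

    A uniform scheme would apply one Factors value to every vowel; here
    the factors depend on (word position, vowel position).
    """
    out = []
    for v in vowels:
        f = table.get((v.word_pos, v.vowel_pos), default)
        out.append(replace(
            v,
            duration_ms=v.duration_ms * f.duration,
            f0_hz=v.f0_hz * f.pitch,
            strength=v.strength * f.strength,
        ))
    return out


# Placeholder values only; they are NOT taken from the paper.
ILLUSTRATIVE_TABLE = {
    ("initial", "initial"): Factors(duration=0.75, pitch=1.25, strength=1.5),
    ("final", "final"): Factors(duration=1.5, pitch=0.75, strength=1.0),
}

if __name__ == "__main__":
    neutral = [
        Vowel("initial", "initial", 100.0, 200.0, 1.0),
        Vowel("middle", "middle", 80.0, 180.0, 1.0),
    ]
    for v in modify(neutral, ILLUSTRATIVE_TABLE):
        print(v)
```

In a real system the table would hold one set of factors per target emotion, and the scaled values would drive a prosody modification stage that resynthesizes the speech signal. Neither of those steps is shown here.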

Author information

Authors and Affiliations

Speech and Vision Laboratory (SVL), LTRC, International Institute of Information Technology, Hyderabad (IIIT-H), Hyderabad, India
Hari Krishna Vydana, Sudarsana Reddy Kadiri & Anil Kumar Vuppala

Corresponding author

Correspondence to Anil Kumar Vuppala.

About this article

Cite this article

Vydana, H.K., Kadiri, S.R. & Vuppala, A.K. Vowel-Based Non-uniform Prosody Modification for Emotion Conversion. Circuits Syst Signal Process 35, 1643–1663 (2016). https://doi.org/10.1007/s00034-015-0134-1

Received: 05 August 2014
Revised: 24 June 2015
Accepted: 27 June 2015
Published: 06 August 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s00034-015-0134-1
