Modifying Spectral Envelope to Synthetically Adjust Voice Quality and Articulation Parameters for Emotional Speech Synthesis

  • Yanqiu Shao
  • Zhuoran Wang
  • Jiqing Han
  • Ting Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3784)


Both prosodic and spectral features are important for emotional speech synthesis. Besides prosodic effects, voice quality and articulation parameters should also be modified in emotional speech synthesis systems. Conventionally, separate rules and filters are designed to process each of these parameters. This paper shows that modifying the spectral envelope adjusts voice quality and articulation as a whole, so the parameters need not be modified one by one according to rules. Accordingly, an automatic spectral envelope model based on machine learning methods would make the synthesis system more flexible. The perception test in this paper also shows that the best emotional synthetic speech is obtained when both prosodic and spectral features are modified.
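
The abstract does not describe the authors' envelope model, so the following is only an illustrative sketch of the general idea that one edit to the spectral envelope can change several cues at once. It assumes the envelope is given as dB levels at a few frequencies, and it uses two hypothetical controls that are not taken from the paper: `warp`, which moves spectral peaks in frequency (standing in for an articulation change), and `tilt_db_per_octave`, which changes the high-frequency balance (standing in for a voice quality change).

```python
import math


def interp(xs, ys, x):
    """Piecewise-linear interpolation, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def modify_envelope(freqs_hz, env_db, tilt_db_per_octave=0.0, warp=1.0,
                    ref_hz=1000.0):
    """Return a modified envelope (dB) sampled at the same frequencies.

    Both controls are illustrative assumptions, not the paper's parameters.
    freqs_hz must be positive and increasing.
    """
    out = []
    for f in freqs_hz:
        # Read the original envelope at f / warp, so a peak at f0 moves to f0 * warp.
        level = interp(freqs_hz, env_db, f / warp)
        # Add a gain that is linear in log2(frequency) and zero at ref_hz.
        level += tilt_db_per_octave * math.log2(f / ref_hz)
        out.append(level)
    return out


freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
peaked = [0.0, 10.0, 0.0, 0.0, 0.0]   # toy envelope with one peak at 500 Hz
shifted = modify_envelope(freqs, peaked, warp=2.0)
tilted = modify_envelope(freqs, [0.0] * 5, tilt_db_per_octave=-6.0)
```

In this toy case the single function call moves the peak from 500 Hz to 1000 Hz and, independently, imposes a 6 dB per octave roll-off around 1 kHz; a learned envelope model would replace these two hand-set controls.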


Keywords: Emotion Recognition; Vocal Tract; Voice Quality; Speech Synthesis; Pitch Contour





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yanqiu Shao (1)
  • Zhuoran Wang (1)
  • Jiqing Han (1)
  • Ting Liu (1)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
