Expressive Speech Synthesis System Using Unit Selection

  • Mukta Gahlawat
  • Amita Malik
  • Poonam Bansal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)


Speech that sounds realistic in everyday environments is hard to synthesize. Synthesizing emotion is one way to make speech sound realistic and natural: using the right emotion makes synthesized speech more effective and natural for the listener. Implementing emotions is very difficult, as the word “emotion” has no single definition. There have been various attempts at emotional speech synthesis, but a perfect or near-ideal system has not been developed so far. This paper is an attempt to create an emotional speech synthesizer using an emotional database recorded in our own voice. We implemented it using unit selection and the CART method. We took a classroom environment for teaching pre-school students, with three emotions (neutral, happy, and sad), tested our synthesizer with twenty listeners, and found that the listeners identified the emotional state of the speaker to a significant degree.
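
As a rough illustration of the unit selection idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which one recorded unit is chosen per target position so that the sum of target costs and join costs is minimal, found by dynamic programming. The unit fields (`id`, `emotion`, `f0_start`, `f0_end`), both cost functions, and the toy data are hypothetical and chosen only for the example; the CART step is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Unit = Dict[str, object]  # hypothetical unit record: id, emotion, f0_start, f0_end


def target_cost(target: Unit, unit: Unit) -> float:
    """Hypothetical target cost: penalize a mismatch in emotion label."""
    return 0.0 if target["emotion"] == unit["emotion"] else 1.0


def join_cost(left: Unit, right: Unit) -> float:
    """Hypothetical join cost: pitch discontinuity at the concatenation point."""
    return abs(float(left["f0_end"]) - float(right["f0_start"])) / 100.0


def select_units(
    targets: Sequence[Unit],
    candidates: Sequence[Sequence[Unit]],
    t_cost: Callable[[Unit, Unit], float] = target_cost,
    j_cost: Callable[[Unit, Unit], float] = join_cost,
) -> Tuple[List[Unit], float]:
    """Pick one candidate per target minimizing total target + join cost."""
    if not targets:
        return [], 0.0
    # prev[k] = cheapest cost of any path ending in candidate k of the previous position
    prev = [t_cost(targets[0], c) for c in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j] = best predecessor of candidate j at position i
    for i in range(1, len(targets)):
        cur, ptr = [], []
        for c in candidates[i]:
            costs = [prev[k] + j_cost(p, c) for k, p in enumerate(candidates[i - 1])]
            best_k = min(range(len(costs)), key=costs.__getitem__)
            cur.append(costs[best_k] + t_cost(targets[i], c))
            ptr.append(best_k)
        back.append(ptr)
        prev = cur
    # Trace the cheapest path backwards from the best final candidate.
    j = min(range(len(prev)), key=prev.__getitem__)
    total = prev[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)], total


# Toy data (invented for illustration): two target positions, two candidates each.
targets = [{"emotion": "happy"}, {"emotion": "happy"}]
candidates = [
    [
        {"id": "h1", "emotion": "happy", "f0_start": 200, "f0_end": 220},
        {"id": "h2", "emotion": "sad", "f0_start": 150, "f0_end": 150},
    ],
    [
        {"id": "i1", "emotion": "sad", "f0_start": 150, "f0_end": 140},
        {"id": "i2", "emotion": "happy", "f0_start": 225, "f0_end": 230},
    ],
]
chosen, total = select_units(targets, candidates)
print([u["id"] for u in chosen], total)
```

In this sketch the emotion label enters only through the target cost, so a request for "happy" steers the search toward units recorded in that emotion while the join cost still discourages audible pitch jumps between neighbouring units.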





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Mukta Gahlawat (1)
  • Amita Malik (2)
  • Poonam Bansal (1)
  1. Maharaja Surajmal Institute of Technology, New Delhi, India
  2. DCRUST, Sonepat, India
