Emotional Prosodic Model Evaluation for Greek Expressive Text-to-Speech Synthesis

  • Dimitrios Tsonos
  • Pepi Stavropoulou
  • Georgios Kouroupetroglou
  • Despina Deligiorgi
  • Nikolaos Papatheodorou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8514)

Abstract

In this study we introduce a novel experimental approach to the evaluation of emotional prosodic models in Expressive Speech Synthesis. It is based on the dimensional description of emotion expressivity and adopts the Self-Assessment Manikin (SAM) Test. We applied this approach to evaluate an emotional prosodic model for Greek expressive Text-to-Speech synthesis. We used two pseudo-sentences for each of the Greek and English HMM-based synthetic voices implemented in the MARY TTS platform. Fifteen native Greek participants were asked to assess eleven emotional states for each sentence. The results show that the “Arousal” dimension is perceived as intended, followed by the “Pleasure” and “Dominance” dimensions. These preliminary findings are consistent with the results of previous studies.
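
As a rough illustration of the kind of comparison such an evaluation involves, the sketch below averages Self-Assessment Manikin ratings per dimension and measures how far they fall from the intended values. It is a minimal sketch under stated assumptions, not the authors' analysis: the function names, the state labels, the rating scale, and all numbers are hypothetical placeholders and are not data from the study.

```python
from statistics import mean

# The three dimensions named in the abstract.
DIMENSIONS = ("pleasure", "arousal", "dominance")


def mean_ratings(ratings):
    """Average the per-participant ratings of one stimulus, per dimension."""
    return {d: mean(r[d] for r in ratings) for d in DIMENSIONS}


def mean_abs_error(intended, perceived):
    """Mean absolute gap between intended and perceived values, per dimension.

    Both arguments map an emotional-state label to a {dimension: value} dict.
    """
    return {
        d: mean(abs(intended[s][d] - perceived[s][d]) for s in intended)
        for d in DIMENSIONS
    }


# Placeholder values for illustration only; NOT data from the study.
intended = {
    "state_a": {"pleasure": 7, "arousal": 8, "dominance": 6},
    "state_b": {"pleasure": 3, "arousal": 2, "dominance": 4},
}
raw_ratings = {
    "state_a": [
        {"pleasure": 6, "arousal": 8, "dominance": 5},
        {"pleasure": 5, "arousal": 7, "dominance": 4},
    ],
    "state_b": [
        {"pleasure": 4, "arousal": 2, "dominance": 5},
        {"pleasure": 4, "arousal": 3, "dominance": 6},
    ],
}

perceived = {state: mean_ratings(r) for state, r in raw_ratings.items()}
errors = mean_abs_error(intended, perceived)
```

A smaller value in `errors` for a dimension would suggest that listeners perceived that dimension closer to what the prosodic model intended. Other measures, such as a correlation across emotional states, could be substituted.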

Keywords

Expressive Speech Synthesis, prosody, evaluation, Text-to-Speech, emotional state

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dimitrios Tsonos (1)
  • Pepi Stavropoulou (1)
  • Georgios Kouroupetroglou (1)
  • Despina Deligiorgi (2)
  • Nikolaos Papatheodorou (1)
  1. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
  2. Department of Physics, National and Kapodistrian University of Athens, Athens, Greece
