Synthesising Expressive Speech – Which Synthesiser for VOCAs?

  • Jan-Oliver Wülfing
  • Chi Tai Dang
  • Elisabeth André (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12284)

Abstract

For people with complex communication needs who depend on Voice Output Communication Aids (VOCAs), the ability of speech synthesisers to convey not only sentences but also emotions would be a great enrichment. Emotion is essential and entirely natural in interpersonal speech communication. Hence, we are interested in the expressiveness of speech synthesisers and how listeners perceive it. We present the results of a study in which 82 participants listened to sentences synthesised with different emotional contours by three synthesisers. Participants’ ratings of expressiveness and naturalness indicate that the synthesiser CereVoice performs better than the other two.

Keywords

Complex Communication Needs · Voice Output Communication Aid · Expressive Speech Synthesis · Online Survey

Notes

Acknowledgements

The work presented here is partially supported by ‘PROMI - Promotion inklusive’ and the employment centre. We thank the students Lena Tikovsky and Ewald Heinz for their contribution to this work.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Jan-Oliver Wülfing (1)
  • Chi Tai Dang (1)
  • Elisabeth André (1, email author)

  1. Human-Centred Multimedia, University of Augsburg, Augsburg, Germany