Abstract
One of the major factors affecting the acceptance of robots in Human-Robot Interaction applications is the voice with which they interact with humans. The robot’s voice can be used to express empathy, which is an affective response of the robot to the human user. This study aims to determine whether social robots with an empathetic voice are acceptable to users in healthcare applications. A pilot study was conducted using an empathetic voice spoken by a voice actor. Only prosody in speech was used to express empathy, without any visual cues. The emotions needed for an empathetic voice were also identified: not only the stronger primary emotions, but also the nuanced secondary emotions. These emotions were then synthesised using prosody modelling. A second study, replicating the pilot test with the synthesised voices, was conducted to investigate whether empathy is perceived from the synthetic voice as well. This paper reports the modelling and synthesis of an empathetic voice, and experimentally shows that people prefer an empathetic voice for healthcare robots. The results can be further used to develop empathetic social robots that improve people’s acceptance of social robots.
Notes
Anthropomorphism refers to the tendency of humans to see human-like characteristics, emotions, and motivations in non-human entities such as animals, gods, and objects.
The focus is on the elderly because Healthbots, the application on which this study is based, is developed for aged-care facilities.
Affect is a concept used in psychology to describe the experience of feeling or emotion; here it refers to the addition of emotions/feelings to the robot’s speech. This has led to a field called affective computing, which includes developing systems that can recognise, interpret, and respond to emotions, and also produce them.
In this study, neutral voice is defined as voice spoken naturally (i.e. without stress). For the robot with expressive voice, stress was included to express urgency.
Empathy is the ability to understand and share the feelings of another.
The study is approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) on 20/10/2017 for 3 years. Ref. No. 019845.
The distinction between first language (L1) and second language speakers is based on New Zealand English. Participants were classified as L1 if they had lived in New Zealand since at least age seven.
Version XM of Qualtrics. Copyright 2019 Qualtrics. Qualtrics and all other Qualtrics product or service names are registered trademarks or trademarks of Qualtrics, Provo, UT, USA. https://www.qualtrics.com.
Primary emotions are innate emotions that support reactive response behaviour (e.g. angry, happy, sad, fear). The basic/primary emotions are based on the studies by Ekman [37]. Secondary emotions arise from higher cognitive processes, based on an ability to evaluate preferences over outcomes and expectations (e.g. relief, hope) [38, 39]. Various theories define primary and secondary emotions [40], but here we look at emotions that have been studied as part of human-robot interaction and speech synthesis research.
JLCorpus contains five primary and five secondary emotions. “Assertive” was one of the secondary emotions; the actors of the corpus were instructed to speak seriously and confidently while recording it. Reviewers of a previous paper strongly criticised the use of “assertive” as an emotion and asked us to reconsider it. From the existing list of emotions in Russell’s circumplex model of emotions (Fig. 6), “confident” was the best match. This journey clearly indicates the difficulty of analysing and classifying secondary emotions, as they can be hard to define. The corpus is available at: github.com/tli725/JL-Corpus.
r indicates the effect size.
References
Tapus A (2009) Assistive robotics for healthcare and rehabilitation. In: Int. conf. on control systems and computer science, Romania, pp 1–7
Toh LPE, Causo A, Tzuo P, Chen I, Yeo SH (2016) A review on the use of robots in education and young children. Educ Technol Soc 19:148–163
Triebel R, Arras K, Alami R, Beyer L, Breuers S, Chatila R, Chetouani M, Cremers D, Evers V, Fiore M (2016) Spencer: a socially aware service robot for passenger guidance and help in busy airports. In: Field and service robotics, pp 607–622
Pineau J, Montemerlo M, Pollack M, Roy N, Thrun S (2002) Towards robotic assistants in nursing homes: challenges and results. Robot Auton Syst 42:271–281
Chu M, Khosla R, Khaksar SMS, Nguyen K (2017) Service innovation through social robot engagement to improve dementia care quality. Assist Technol 29(1):8–18
Centre for Automation and Robotic Engineering Science - Healthbots. https://cares.blogs.auckland.ac.nz/research/healthcare-assistive-technologies/healthbots/. Accessed 29 Oct 2019
Broadbent E, Stafford R, MacDonald B (2009) Acceptance of healthcare robots for the older population: review and future directions. Int J Social Robot 1(4):319
Igic A, Watson CI, Stafford RQ, Broadbent E, Jayawardena C, MacDonald BA (2010) Perception of synthetic speech with emotion modelling delivered through a robot platform: an initial investigation with older listeners. In: Australasian int. conf. on speech science and technology, Australia, pp 189–192
Igic A (2010) Synthetic speech for a healthcare robot: investigation, issues and implementation. Master’s thesis, The University of Auckland, New Zealand
Fussell SR, Kiesler S, Setlock LD, Victoria Y (2008) How people anthropomorphize robots. In: ACM/IEEE int. conf. on human-robot interaction, Netherlands, pp 145–152
Heerink M, Kröse B, Evers V, Wielinga B (2010) Assessing acceptance of assistive social agent technology by older adults: the Almere model. Int J Social Robot 2(4):361–375
Heerink M (2011) Exploring the influence of age, gender, education and computer experience on robot acceptance by older adults. In: Int. conf. on Human-robot interaction, Switzerland, pp 147–148
Duffy BR (2003) Anthropomorphism and the social robot. Robot Auton Syst 42(3–4):177–190
Heerink M, Krose B, Evers V, Wielinga B (2006) The influence of a robot’s social abilities on acceptance by elderly users. In: IEEE int. symposium on robot and human interactive communication, UK, pp 521–526
Markowitz J (2017) Speech and language for acceptance of social robots: an overview. Voice Interact Design 2:1–11
Breazeal C, Scassellati B (1999) A context-dependent attention system for a social robot. In: Int. joint conf.s on artificial intelligence, USA, pp 1146–1151
Chella A, Barone RE, Pilato G, Sorbello R (2008) An emotional storyteller robot. In: AAAI spring symposium on emotion, personality, and social behavior, USA, pp 17–22
Mavridis N (2015) A review of verbal and non-verbal human-robot interactive communication. Robot Auton Syst 63:22–35
Nass C, Brave S (2005) Wired for speech: how voice activates and advances the human–computer relationship. MIT Press, Cambridge
Goetz J, Kiesler S, Powers A (2003) Matching robot appearance and behavior to tasks to improve human-robot cooperation. In: IEEE int. workshop on robot and human interactive communication, USA, pp 55–60
Scheutz M, Schermerhorn P, Kramer J, Middendorff C (2006) The utility of affect expression in natural language interactions in joint human–robot tasks. In: ACM conf. on human–robot interaction, USA, pp 226–233
Eyssel F, Ruiter L, Kuchenbrandt D, Bobinger S, Hegel F (2012) If you sound like me, you must be more human: on the interplay of robot and user features on human-robot acceptance and anthropomorphism. In: ACM/IEEE int. conf. on human–robot interaction, USA, pp 125–126
Fung P, Bertero D, Wan Y, Dey A, Chan RHY, Siddique F, Yang Y, Wu C, Lin R (2016) Towards empathetic human-robot interactions. In: Int. conf. on intelligent text processing & computational linguistics, Turkey, pp 173–193
James J, Watson CI, MacDonald B (2018) Artificial empathy in social robots: an analysis of emotions in speech. In: IEEE int. symposium on robot and human interactive communication, China, pp 632–637
Cuff BMP, Brown SJ, Taylor L, Howat DJ (2016) Empathy: a review of the concept. Emot Rev 8(2):144–153
Asada M (2015) Towards artificial empathy. Int J Social Robot 7(1):19–33
Taylor P (2009) Text-to-speech synthesis. Cambridge University Press, Cambridge
Crumpton J, Bethel CL (2015) Validation of vocal prosody modifications to communicate emotion in robot speech. In: Int. conf. on collaboration technologies and systems, USA, pp 39–46
Alam F, Danieli M, Riccardi G (2018) Annotating and modeling empathy in spoken conversations. Comput Speech Lang 50:40–61
Li X, Watson CI, Igic A, MacDonald B (2009) Expressive speech for a virtual talking head. In: Australasian conf. on robotics and automation, Australia, pp 5009–5014
Moore LA (2006) Empathy: a clinician’s perspective. ASHA Leader 11(10):16–35
Niculescu A, van Dijk B, Nijholt A, Li H, See SL (2013) Making social robots more attractive: the effects of voice pitch, humor and empathy. Int J Social Robot 5(2):171–191
Watson C, Liu W, MacDonald B (2013) The effect of age and native speaker status on synthetic speech intelligibility. In: ISCA workshop on speech synthesis, Spain, pp 195–200
Broadbent E, Tamagawa R, Kerse N, Knock B, Patience A, MacDonald B (2009) Retirement home staff and residents preferences for healthcare robots. In: IEEE int. symposium on robot and human interactive communication, Japan, pp 645–650
Moyers TB, Martin T, Manuel JK, Miller WR, Ernst D (2003) The motivational interviewing treatment integrity (miti) code: Version 2.0. http://casaa.unm.edu/download/miti.pdf. Accessed 29 Oct 2019
Field A, Miles J, Field Z (2012) Discovering statistics using R. Sage, Thousand Oaks, pp 666–673
Ekman P (1992) An argument for basic emotions. Cogn Emotion 6(3–4):169–200
Damasio A (1994) Descartes’ error: emotion, reason and the human brain. Avon Books, New York
Becker-Asano C, Wachsmuth I (2010) Affective computing with primary and secondary emotions in a virtual human. Auton Agent Multi-Agent Syst 20(1):32
Kemper TD (1987) How many emotions are there? Wedding the social and the autonomic components. Am J Sociol 93(2):263–289
Ochs M, Sadek D, Pelachaud C (2012) A formal model of emotions for an empathic rational dialog agent. Auton Agent Multi-Agent Syst 24(3):410–440
Boukricha H, Wachsmuth I, Carminati MN, Knoeferle P (2013) A computational model of empathy: empirical evaluation. In: Humaine association conf. on affective computing and intelligent interaction, USA, pp 1–6
Schröder M (2001) Emotional speech synthesis: a review. In: Eurospeech, Scandinavia, pp 561–564
Breazeal C (2001) Emotive qualities in robot speech. In: IEEE/RSJ IROS, USA, pp 1389–1394. IEEE
Crumpton J, Bethel CL (2016) A survey of using vocal prosody to convey emotion in robot speech. Int J Social Robot 8(2):271–285
Paltoglou G, Thelwall M (2012) Seeing stars of valence and arousal in blog posts. IEEE Trans Affect Comput 4(1):116–123
James J, Tian L, Watson CI (2018) An open source emotional speech corpus for human robot interaction applications. In: Interspeech, India, pp 2768–2772
James J, Watson CI, Stoakes H (2019) Influence of prosodic features and semantics on secondary emotion production and perception. In: Int. congress of phonetic sciences, Australia, pp 1779–1782
Kisler T, Schiel F, Sloetjes H (2012) Signal processing via web services: the use case webmaus. In: Digital humanities conf, Germany, pp 30–34
James J, Mixdorff H, Watson CI (2019) Quantitative model-based analysis of \(f_0\) contours of emotional speech. In: Int. congress of phonetic sciences, Australia, pp 72–76
Mixdorff H, Cossio-Mercado C, Hönemann A, Gurlekian J, Evin D, Torres H (2015) Acoustic correlates of perceived syllable prominence in German. In: Annual conf. of the int. speech communication association, Germany, pp 51–55
Mixdorff H (2000) A novel approach to the fully automatic extraction of Fujisaki model parameters. In: IEEE int. conf. on acoustics, speech, and signal processing, Turkey, pp 1281–1284
Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377
Watson CI, Marchi A (2014) Resources created for building New Zealand English voices. In: Australasian int. conf. of speech science and technology, New Zealand, pp 92–95
Jain S (2015) Towards the creation of customised synthetic voices using Hidden Markov Models on a Healthcare Robot. Master’s thesis, The University of Auckland, New Zealand
Boersma P (1993) Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Inst Phonetic Sci 17:97–110
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Int. conf. on machine learning, Italy, pp 148–156
Eide E, Aaron A, Bakis R, Hamza W, Picheny M, Pitrelli J (2004) A corpus-based approach to expressive speech synthesis. In: ISCA ITRW on speech synthesis, USA, pp 79–84
Ming H, Huang D, Dong M, Li H, Xie L, Zhang S (2015) Fundamental frequency modeling using Wavelets for emotional voice conversion. In: Int. conf. on affective computing and intelligent interaction, China, pp 804–809
Robinson C, Obin N, Roebel A (2019) Sequence-to-sequence modelling of \(F_0\) for speech emotion conversion. In: Int. conf. on acoustics, speech, and signal processing, UK, pp 6830–6834
Taguette version: 0.9. https://www.taguette.org. Publisher: Zenodo
Miro. https://miro.com/app/
Powers A, Kiesler S, Fussell S, Torrey C (2007) Comparing a computer agent with a humanoid robot. In: Proceedings of the ACM/IEEE int. conf. on human-robot interaction, pp 145–152
McGinn C, Torre I (2019) Can you tell the robot by the voice? an exploratory study on the role of voice in the perception of robots. In: 2019 14th ACM/IEEE int. conf. on human–robot interaction (HRI), pp 211–221. IEEE
Anzalone SM, Boucenna S, Ivaldi S, Chetouani M (2015) Evaluating the engagement with social robots. Int J Social Robot 7(4):465–478
Leite I, Castellano G, Pereira A, Martinho C, Paiva A (2014) Empathic robots for long-term interaction. Int J Social Robot 6(3):329–341
Tamagawa R, Watson CI, Kuo IH, MacDonald BA, Broadbent E (2011) The effects of synthesized voice accents on user perceptions of robots. Int J Social Robot 3(3):253–262
Acknowledgements
This research was supported by the Centre for Automation and Robotic Engineering Science, University of Auckland Seed funding. The authors would like to thank the professional actors who recorded their voices for the JLCorpus and the perception test participants for their time and effort.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical Standard
The authors also thank the very detailed review provided by the reviewers of the journal that helped improve this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix-Scale of Empathy Questionnaire
Rights and permissions
About this article
Cite this article
James, J., Balamurali, B.T., Watson, C.I. et al. Empathetic Speech Synthesis and Testing for Healthcare Robots. Int J of Soc Robotics 13, 2119–2137 (2021). https://doi.org/10.1007/s12369-020-00691-4