Abstract
One of the major factors affecting the acceptance of robots in Human-Robot Interaction applications is the voice with which they interact with humans. The robot’s voice can be used to express empathy, which is an affective response of the robot to the human user. This study aims to determine whether social robots with an empathetic voice are acceptable to users in healthcare applications. A pilot study was conducted using an empathetic voice spoken by a voice actor. Only prosody in speech was used to express empathy, without any visual cues. The emotions needed for an empathetic voice were also identified: not only the stronger primary emotions, but also the nuanced secondary emotions. These emotions were then synthesised using prosody modelling. A second study, replicating the pilot test with the synthesised voices, was conducted to investigate whether empathy is perceived from the synthetic voice as well. This paper reports the modelling and synthesis of an empathetic voice, and experimentally shows that people prefer an empathetic voice for healthcare robots. The results can be further used to develop empathetic social robots that improve people’s acceptance of social robots.
Notes
Anthropomorphism refers to the tendency of humans to see human-like characteristics, emotions, and motivations in non-human entities such as animals, gods, and objects.
The focus is on the elderly because Healthbots, the application on which this study is based, is developed for aged-care facilities.
Affect is a concept used in psychology to describe the experience of feeling or emotion; here it refers to the addition of emotions/feelings to the robot’s speech. This has led to a field called affective computing, which includes developing systems that can recognise, interpret, and respond to emotions, and also produce them.
In this study, neutral voice is defined as voice spoken naturally (i.e. without stress). For the robot with expressive voice, stress was included to express urgency.
Empathy is the ability to understand and share the feelings of another.
The study is approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) on 20/10/2017 for 3 years. Ref. No. 019845.
The distinction between first language (L1) and second language speakers is based on New Zealand English. Participants were classified as L1 if they had lived in New Zealand since at least age seven.
Version XM of Qualtrics. Copyright 2019 Qualtrics. Qualtrics and all other Qualtrics product or service names are registered trademarks or trademarks of Qualtrics, Provo, UT, USA. https://www.qualtrics.com.
Primary emotions are innate emotions that support reactive response behaviour (e.g. angry, happy, sad, fear). The basic/primary emotions are based on the studies by Ekman [37]. Secondary emotions arise from higher cognitive processes, based on an ability to evaluate preferences over outcomes and expectations (e.g. relief, hope) [38, 39]. Various theories define primary and secondary emotions [40], but here we look at emotions that have been studied as part of human-robot interaction and speech synthesis research.
JLCorpus contains five primary and five secondary emotions. “Assertive” was one of the secondary emotions; the actors of the corpus were instructed to speak seriously and confidently while recording it. Reviewers of a previous paper strongly criticised the use of “assertive” as an emotion and asked us to reconsider it. From the existing list of emotions in Russell’s circumplex model of emotions (Fig. 6), “confident” was the best match. This journey clearly indicates the difficulty of analysing and classifying secondary emotions, as they can be hard to define. The corpus is available at: github.com/tli725/JL-Corpus.
r indicates the effect size.
References
Tapus A (2009) Assistive robotics for healthcare and rehabilitation. In: Int. conf. on control systems and computer science, Romania, pp 1–7
Toh LPE, Causo A, Tzuo P, Chen I, Yeo SH (2016) A review on the use of robots in education and young children. Educ Technol Soc 19:148–163
Triebel R, Arras K, Alami R, Beyer L, Breuers S, Chatila R, Chetouani M, Cremers D, Evers V, Fiore M (2016) Spencer: a socially aware service robot for passenger guidance and help in busy airports. In: Field and service robotics, pp 607–622
Pineau J, Montemerlo M, Pollack M, Roy N, Thrun S (2002) Towards robotic assistants in nursing homes: challenges and results. Robot Auton Syst 42:271–281
Chu M, Khosla R, Khaksar SMS, Nguyen K (2017) Service innovation through social robot engagement to improve dementia care quality. Assist Technol 29(1):8–18
Centre for Automation and Robotic Engineering Science - Healthbots. https://cares.blogs.auckland.ac.nz/research/healthcare-assistive-technologies/healthbots/. Accessed 29 Oct 2019
Broadbent E, Stafford R, MacDonald B (2009) Acceptance of healthcare robots for the older population: review and future directions. Int J Social Robot 1(4):319
Igic A, Watson CI, Stafford RQ, Broadbent E, Jayawardena C, MacDonald BA (2010) Perception of synthetic speech with emotion modelling delivered through a robot platform: an initial investigation with older listeners. In: Australasian int. conf. on speech science and technology, Australia, pp 189–192
Igic A (2010) Synthetic speech for a healthcare robot: investigation, issues and implementation. Master’s thesis, The University of Auckland, New Zealand
Fussell SR, Kiesler S, Setlock LD, Victoria Y (2008) How people anthropomorphize robots. In: ACM/IEEE int. conf. on human-robot interaction, Netherlands, pp 145–152
Heerink M, Kröse B, Evers V, Wielinga B (2010) Assessing acceptance of assistive social agent technology by older adults: the Almere model. Int J Social Robot 2(4):361–375
Heerink M (2011) Exploring the influence of age, gender, education and computer experience on robot acceptance by older adults. In: Int. conf. on Human-robot interaction, Switzerland, pp 147–148
Duffy BR (2003) Anthropomorphism and the social robot. Robot Auton Syst 42(3–4):177–190
Heerink M, Krose B, Evers V, Wielinga B (2006) The influence of a robot’s social abilities on acceptance by elderly users. In: IEEE int. symposium on robot and human interactive communication, UK, pp 521–526
Markowitz J (2017) Speech and language for acceptance of social robots: an overview. Voice Interact Design 2:1–11
Breazeal C, Scassellati B (1999) A context-dependent attention system for a social robot. In: Int. joint conf.s on artificial intelligence, USA, pp 1146–1151
Chella A, Barone RE, Pilato G, Sorbello R (2008) An emotional storyteller robot. In: AAAI spring symposium on emotion, personality, and social behavior, USA, pp 17–22
Mavridis N (2015) A review of verbal and non-verbal human-robot interactive communication. Robot Auton Syst 63:22–35
Nass C, Brave S (2005) Wired for speech: how voice activates and advances the human–computer relationship. MIT Press, Cambridge
Goetz J, Kiesler S, Powers A (2003) Matching robot appearance and behavior to tasks to improve human-robot cooperation. In: IEEE int. workshop on robot and human interactive communication, USA, pp 55–60
Scheutz M, Schermerhorn P, Kramer J, Middendorff C (2006) The utility of affect expression in natural language interactions in joint human–robot tasks. In: ACM conf. on human–robot interaction, USA, pp 226–233
Eyssel F, Ruiter L, Kuchenbrandt D, Bobinger S, Hegel F (2012) If you sound like me, you must be more human: on the interplay of robot and user features on human-robot acceptance and anthropomorphism. In: ACM/IEEE int. conf. on human–robot interaction, USA, pp 125–126
Fung P, Bertero D, Wan Y, Dey A, Chan RHY, Siddique F, Yang Y, Wu C, Lin R (2016) Towards empathetic human-robot interactions. In: Int. conf. on intelligent text processing & computational linguistics, Turkey, pp 173–193
James J, Watson CI, MacDonald B (2018) Artificial empathy in social robots: an analysis of emotions in speech. In: IEEE int. symposium on robot and human interactive communication, China, pp 632–637
Cuff BMP, Brown SJ, Taylor L, Howat DJ (2016) Empathy: a review of the concept. Emot Rev 8(2):144–153
Asada M (2015) Towards artificial empathy. Int J Social Robot 7(1):19–33
Taylor P (2009) Text-to-speech synthesis. Cambridge University Press, Cambridge
Crumpton J, Bethel CL (2015) Validation of vocal prosody modifications to communicate emotion in robot speech. In: Int. conf. on collaboration technologies and systems, USA, pp 39–46
Alam F, Danieli M, Riccardi G (2018) Annotating and modeling empathy in spoken conversations. Comput Speech Lang 50:40–61
Li X, Watson CI, Igic A, MacDonald B (2009) Expressive speech for a virtual talking head. In: Australasian conf. on robotics and automation, Australia, pp 5009–5014
Moore LA (2006) Empathy: a clinician’s perspective. ASHA Leader 11(10):16–35
Niculescu A, van Dijk B, Nijholt A, Li H, See SL (2013) Making social robots more attractive: the effects of voice pitch, humor and empathy. Int J Social Robot 5(2):171–191
Watson C, Liu W, MacDonald B (2013) The effect of age and native speaker status on synthetic speech intelligibility. In: ISCA workshop on speech synthesis, Spain, pp 195–200
Broadbent E, Tamagawa R, Kerse N, Knock B, Patience A, MacDonald B (2009) Retirement home staff and residents preferences for healthcare robots. In: IEEE int. symposium on robot and human interactive communication, Japan, pp 645–650
Moyers TB, Martin T, Manuel JK, Miller WR, Ernst D (2003) The motivational interviewing treatment integrity (miti) code: Version 2.0. http://casaa.unm.edu/download/miti.pdf. Accessed 29 Oct 2019
Field A, Miles J, Field Z (2012) Discovering statistics using R. Sage, Thousand Oaks, pp 666–673
Ekman P (1992) An argument for basic emotions. Cogn Emotion 6(3–4):169–200
Damasio A (1994) Descartes’ error: emotion, reason and the human brain. Avon Books, New York
Becker-Asano C, Wachsmuth I (2010) Affective computing with primary and secondary emotions in a virtual human. Auton Agent Multi-Agent Syst 20(1):32
Kemper TD (1987) How many emotions are there? Wedding the social and the autonomic components. Am J Sociol 93(2):263–289
Ochs M, Sadek D, Pelachaud C (2012) A formal model of emotions for an empathic rational dialog agent. Auton Agent Multi-Agent Syst 24(3):410–440
Boukricha H, Wachsmuth I, Carminati MN, Knoeferle P (2013) A computational model of empathy: empirical evaluation. In: Humaine association conf. on affective computing and intelligent interaction, USA, pp 1–6
Schröder M (2001) Emotional speech synthesis: a review. In: Eurospeech, Scandinavia, pp 561–564
Breazeal C (2001) Emotive qualities in robot speech. In: IEEE/RSJ IROS, USA, pp 1389–1394. IEEE
Crumpton J, Bethel CL (2016) A survey of using vocal prosody to convey emotion in robot speech. Int J Social Robot 8(2):271–285
Paltoglou G, Thelwall M (2012) Seeing stars of valence and arousal in blog posts. IEEE Trans Affect Comput 4(1):116–123
James J, Tian L, Watson CI (2018) An open source emotional speech corpus for human robot interaction applications. In: Interspeech, India, pp 2768–2772
James J, Watson CI, Stoakes H (2019) Influence of prosodic features and semantics on secondary emotion production and perception. In: Int. congress of phonetic sciences, Australia, pp 1779–1782
Kisler T, Schiel F, Sloetjes H (2012) Signal processing via web services: the use case webmaus. In: Digital humanities conf, Germany, pp 30–34
James J, Mixdorff H, Watson CI (2019) Quantitative model-based analysis of \(f_0\) contours of emotional speech. In: Int. congress of phonetic sciences, Australia, pp 72–76
Mixdorff H, Cossio-Mercado C, Hönemann A, Gurlekian J, Evin D, Torres H (2015) Acoustic correlates of perceived syllable prominence in German. In: Annual conf. of the int. speech communication association, Germany, pp 51–55
Mixdorff H (2000) A novel approach to the fully automatic extraction of Fujisaki model parameters. In: IEEE int. conf. on acoustics, speech, and signal processing, Turkey, pp 1281–1284
Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377
Watson CI, Marchi A (2014) Resources created for building New Zealand English voices. In: Australasian int. conf. of speech science and technology, New Zealand, pp 92–95
Jain S (2015) Towards the creation of customised synthetic voices using Hidden Markov Models on a Healthcare Robot. Master’s thesis, The University of Auckland, New Zealand
Boersma P (1993) Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Inst Phonetic Sci 17:97–110
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Int. conf. on machine learning, Italy, pp 148–156
Eide E, Aaron A, Bakis R, Hamza W, Picheny M, Pitrelli J (2004) A corpus-based approach to expressive speech synthesis. In: ISCA ITRW on speech synthesis, USA, pp 79–84
Ming H, Huang D, Dong M, Li H, Xie L, Zhang S (2015) Fundamental frequency modeling using Wavelets for emotional voice conversion. In: Int. conf. on affective computing and intelligent interaction, China, pp 804–809
Robinson C, Obin N, Roebel A (2019) Sequence-to-sequence modelling of \(F_0\) for speech emotion conversion. In: Int. conf. on acoustics, speech, and signal processing, UK, pp 6830–6834
Taguette version: 0.9. https://www.taguette.org. Publisher: Zenodo
Miro. https://miro.com/app/
Powers A, Kiesler S, Fussell S, Torrey C (2007) Comparing a computer agent with a humanoid robot. In: Proceedings of the ACM/IEEE int. conf. on human-robot interaction, pp 145–152
McGinn C, Torre I (2019) Can you tell the robot by the voice? an exploratory study on the role of voice in the perception of robots. In: 2019 14th ACM/IEEE int. conf. on human–robot interaction (HRI), pp 211–221. IEEE
Anzalone SM, Boucenna S, Ivaldi S, Chetouani M (2015) Evaluating the engagement with social robots. Int J Social Robot 7(4):465–478
Leite I, Castellano G, Pereira A, Martinho C, Paiva A (2014) Empathic robots for long-term interaction. Int J Social Robot 6(3):329–341
Tamagawa R, Watson CI, Kuo IH, MacDonald BA, Broadbent E (2011) The effects of synthesized voice accents on user perceptions of robots. Int J Social Robot 3(3):253–262
Acknowledgements
This research was supported by the Centre for Automation and Robotic Engineering Science, University of Auckland Seed funding. The authors would like to thank the professional actors who recorded their voices for the JLCorpus and the perception test participants for their time and effort.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical Standard
The authors also thank the very detailed review provided by the reviewers of the journal that helped improve this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix-Scale of Empathy Questionnaire
Rights and permissions
About this article
Cite this article
James, J., Balamurali, B.T., Watson, C.I. et al. Empathetic Speech Synthesis and Testing for Healthcare Robots. Int J of Soc Robotics 13, 2119–2137 (2021). https://doi.org/10.1007/s12369-020-00691-4