Abstract
This paper addresses the role and relevance of speech synthesis and speech recognition in social robotics. To increase the generality of the study, the interaction of a human being with one and with two robots executing tasks was considered. Using these scenarios, a state-of-the-art speech synthesizer was compared with non-linguistic utterances in terms of (1) human preference and (2) the perceived capabilities of the robots; (3) speech recognition was compared with typed text as a means of entering commands, in terms of user preference; and (4) the importance of knowing the robots’ context and (5) the role of synthetic voice in acquiring this context were evaluated. Speech synthesis and speech recognition are different technologies, but generating and understanding speech should be regarded as different dimensions of the same spoken-language phenomenon. Here, robot context denotes all the information about the operating conditions and the completion status of the task being executed by the robot. Two robotic setups for online experiments were built. With the first setup, in which only one robot was employed, our findings indicate that highly natural synthetic speech is preferred over beep-like audio; that users prefer to enter commands by voice rather than by typing; and that the robot’s voice has a stronger effect on its perceived capability than the ability to enter commands by voice. The analysis presented here suggests that when users interacted with a single robot, its voice lost relevance as a social cue and a cause of anthropomorphization as the interaction went on, and users could better evaluate the robot’s capability with respect to its task. In the experiment with the second setup, a two-robot collaborative testbed was employed.
When the robots communicated with each other to resolve problems while trying to accomplish a mission, the user observed the situation from a more distanced position and the “reflective” perspective dominated. Our results indicate that acquiring the robots’ context was perceived as essential for successful human–robot collaboration toward a given objective, and that for this purpose synthesized speech was preferred over text on a screen.
Notes
https://www.slashfilm.com/25-best-movie-robots/5/, accessed January 16, 2019.
https://www.heykuri.com, accessed January 14, 2019.
https://www.anki.com/en-us/vector, accessed January 14, 2019.
Acknowledgements
The authors would like to thank Prof. Simon King (The University of Edinburgh) for providing the TTS system employed here, and Prof. Henny Admoni (CMU) for the preliminary discussions about this research.
Funding
This study was funded by Grants Conicyt-Fondecyt 1151306 and ONRG N°62909-17-1-2002.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wuth, J., Correa, P., Núñez, T. et al. The Role of Speech Technology in User Perception and Context Acquisition in HRI. Int J of Soc Robotics 13, 949–968 (2021). https://doi.org/10.1007/s12369-020-00682-5