The Role of Speech Technology in User Perception and Context Acquisition in HRI

International Journal of Social Robotics

Abstract

This paper addresses the role and relevance of speech synthesis and speech recognition in social robotics. To increase the generality of the study, we considered a human interacting with one robot and with two robots while they executed tasks. Using these scenarios, a state-of-the-art speech synthesizer was compared with non-linguistic utterances in terms of (1) user preference and (2) the perceived capabilities of the robots; (3) speech recognition was compared with typed text as a means of entering commands, in terms of user preference; and (4) the importance of knowing the robots’ context and (5) the role of synthetic voice in acquiring this context were evaluated. Speech synthesis and recognition are different technologies, but generating and understanding speech should be understood as different dimensions of the same spoken language phenomenon. Here, robot context denotes all the information about the operating conditions and the completion status of the task that the robot is executing. Two robotic setups for online experiments were built. With the first setup, in which only one robot was employed, our findings indicate that highly natural synthetic speech is preferred over beep-like audio; that users prefer to enter commands by voice rather than by typing text; and that the robot’s voice has a greater effect on the robot’s perceived capability than the possibility of entering commands by voice. The analysis suggests that when users interacted with a single robot, its voice lost relevance as a social cue and as a cause of anthropomorphization as the interaction progressed and users became better able to evaluate the robot’s capability with respect to its task. In the experiment with the second setup, a two-robot collaborative testbed was employed. When the robots communicated with each other to resolve problems while trying to accomplish a mission, the user observed the situation from a more distanced position and the “reflective” perspective dominated. Our results indicate that acquiring the robots’ context was perceived as essential for successful human–robot collaboration toward a given objective. For this purpose, synthesized speech was preferred over text on a screen.

Acknowledgements

The authors thank Prof. Simon King, The University of Edinburgh, for providing the TTS system employed here, and Prof. Henny Admoni, CMU, for preliminary discussions about this research.

Funding

This study was funded by Grants Conicyt-Fondecyt 1151306 and ONRG N°62909-17-1-2002.

Author information

Corresponding author

Correspondence to Néstor Becerra Yoma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wuth, J., Correa, P., Núñez, T. et al. The Role of Speech Technology in User Perception and Context Acquisition in HRI. Int J of Soc Robotics 13, 949–968 (2021). https://doi.org/10.1007/s12369-020-00682-5

  • DOI: https://doi.org/10.1007/s12369-020-00682-5
