1 Introduction

Robots are rapidly becoming a valuable resource in many areas such as homes, offices, shops, museums, and hospitals [1]. In particular, robots in hospitals can handle repetitive tasks such as checking appointments or prescriptions on behalf of human receptionists. Robots with task-oriented dialogue systems are essential for interacting with patients in hospitals, and using them in place of human operators can be expected to reduce time and effort. In our previous work [1], we designed a receptionist robot for a hospital reception environment, but some components of the system, such as the dialogue system and gesture generation, were purely rule-based.

Many researchers have developed task-oriented dialogue systems for Human-Robot Interaction (HRI) [2,3,4,5,6,7]. These robots adopt the conventional task-oriented dialogue architecture, in which several components are connected in a pipeline. In this approach, natural language understanding (NLU) identifies the user's intent and extracts semantic information (slot values) from the recognized user utterance. The output of the NLU is passed to dialogue state tracking (DST), which maintains a distribution over dialogue states capturing the user's intent and the slot values expressed so far across the dialogue history. This distribution is passed to the dialogue policy module, where the next system action is selected. A system action can be represented as a semantic frame containing an action name (e.g., confirm a user request, request the user's name) together with entity values (e.g., name, address, age). The generated system action is then passed to natural language generation (NLG) to produce the actual response.
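
For illustration, the sketch below shows (with hypothetical intent, action, and slot names) how such semantic frames might look and how an NLG step could turn one into a surface response; it is not taken from any particular system:

```python
# A minimal sketch of the semantic frames exchanged in a pipeline
# dialogue system; intent, action, and slot names are hypothetical.

# NLU output: the recognized intent and extracted slot values.
nlu_output = {
    "intent": "check_in",
    "slots": {"name": "John Smith", "time": "3 pm"},
}

# Dialogue policy output: the next system action as a semantic frame.
system_action = {
    "action": "request",   # e.g. confirm, request, inform
    "entity": "address",   # e.g. name, address, age
}

# NLG turns the semantic frame into an actual response.
def generate(action):
    templates = {("request", "address"): "May I have your address, please?"}
    return templates[(action["action"], action["entity"])]

print(generate(system_action))  # -> "May I have your address, please?"
```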

Recently, with the success of chit-chat systems based on end-to-end trainable neural network models [8, 9], researchers have started exploring end-to-end approaches to address the difficulties of the pipeline approach. The end-to-end methods are mainly based on the idea that recurrent neural networks (RNNs) can be trained directly on text transcripts of dialogues to learn distributed dialogue representations. Owing to this property of RNNs, end-to-end approaches tend to use a single module to generate a response rather than the separate modules of the pipeline methods.

We address the research question of how an end-to-end dialogue system can be applied to a robot. A different approach may be necessary when building a robot system, because an end-to-end dialogue system tracks the dialogue state in a trained hidden state, which makes it impossible to manually define the robot's behavior, such as gestures and expressions. We aim to fill this gap by demonstrating how to build a receptionist robot with an end-to-end dialogue system. Two questions arise for HRI with the end-to-end approach: 1) how can an end-to-end dialogue system be applied to the robot? 2) how can the robot be made to express its behavior?

To this end, we propose a robot dialogue system that can generate responses and gestures according to user input. Note that, as a first attempt at HRI with an end-to-end approach, we focus only on generating the robot's gestures. We utilize the Hybrid Code Network (HCN) [9] and extend it to produce a response together with a selected gesture. We apply a recurrent neural network (RNN) to select the robot's gesture depending on the system response generated by the HCN. The dialogue system is then integrated into a robot and deployed as a real receptionist. To examine the feasibility of the proposed system, we conduct an experiment with real users and compare it with a rule-based system. As evaluation metrics, we use the PARADISE framework and the Godspeed test.

The main contributions of this work are as follows:

  • We propose a robot dialogue system that produces responses and gestures from user input in an end-to-end manner.

  • We conduct a comparative experiment with real users between the proposed system and a baseline system.

  • The experimental results show that the proposed system achieves better dialogue efficiency, i.e., it completes a given task more efficiently.

  • The results also show that the proposed system is more efficient in terms of development speed.

This paper is organized as follows. Section 2 reviews related work. Section 3 presents our extended HCN for HRI. Section 4 describes the components of our receptionist robot system. Section 5 presents an experiment with the receptionist robot and real users. Section 6 discusses the results, and Section 7 summarizes and concludes this paper.

2 Related Work

2.1 Dialogue Systems for Service Robot

In previous studies, many researchers have developed task-oriented dialogue systems for HRI. Finite State Machine (FSM) and slot-based methods have been applied together with non-verbal behaviors such as facial expressions and gestures [2,3,4]. Statistical approaches such as the Partially Observable Markov Decision Process (POMDP) have been applied to dialogue systems to maintain a distribution over possible dialogue states [5, 6]. Reinforcement learning has been applied to combine chit-chat and task-based conversation in a dialogue system [7]. However, these systems are based on the traditional pipeline approach and rule-based behavior selection, and they share several drawbacks. According to [9], it is often unclear how the dialogue state is defined and how the dialogue history is maintained to select the system behavior from the current dialogue state. Moreover, the traditional approaches are expensive and time-consuming to deploy, which makes it difficult to scale them to new domains [10].

Fig. 1 The overall structure of the extended HCN. It consists of entity handling, a response selector, entity output, and a gesture selector. Except for the gesture selector, the modules are implemented in the same way as in the original HCN

2.2 Recent Trends in Dialogue System

With the recent success of chit-chat systems based on trainable end-to-end neural network models [8, 11], researchers have begun to explore end-to-end approaches to address the challenges of the traditional approaches. The end-to-end methods are mainly based on the idea that RNNs can be trained directly on text transcripts of dialogues to learn distributed dialogue representations. Owing to the benefits of RNNs, end-to-end approaches tend to use a single module to generate responses rather than the separate modules of traditional methods.

Bordes et al. [12] developed an end-to-end trainable framework using end-to-end memory networks (MemN2N) [13], which consist of inference modules and memory components that can be read and written. In similar studies, other researchers explored approaches using gated end-to-end memory networks [14], query reduction networks [15], and copy-augmented sequence-to-sequence networks [10]. However, according to Williams et al. [9], these purely RNN-based approaches lack a general mechanism for injecting domain knowledge. Such knowledge can often be encoded with a few lines of code, yet the models mentioned above require thousands of conversations to learn these simple behaviors. To address these limitations, they introduced a practical RNN-based end-to-end framework called HCN.

3 Dialogue System for Hospital Receptionist Robot

We adopt HCN and extend it with a gesture selector to produce both responses and gestures, as shown in Fig. 1. HCN combines rule-based, domain-specific components with a neural network-based response selector that tracks a latent dialogue state. The details of each component are described in the following paragraphs.

Once the user's utterance is provided, it is transformed into four different feature vectors: a word embedding, a bag-of-words vector, a context feature vector, and an action mask. We use pretrained word embeddings to obtain the embedding vector \(emb_t\) at time \(t\), and the utterance tokens form the bag-of-words vector \(b_t\). The entity handling module, a domain-specific component used to generate the context and action mask vectors, identifies entities in the user's utterance \(U=\{u_1,u_2,\ldots,u_T\}\) with \(T\) words and stores the identified entities. If a new entity of the same type is identified later, the old entity is replaced with the newly identified one. Finally, the module generates the action mask \(am_t\) and the context feature vector \(c_t\) at time \(t\) as part of the input to the response selector.

These feature vectors are then concatenated into the input feature vector \(f_t\) of the response selector:

$$\begin{aligned} f_t=[emb_t \oplus b_t \oplus am_t \oplus c_t]. \end{aligned}$$
(1)
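
The following sketch illustrates how such a feature vector could be assembled; the toy vocabulary and dimensions are assumptions for illustration, while the four entity types follow the slots used later in the baseline (name, address, time, location):

```python
import numpy as np

# Hypothetical sizes; the real dimensions depend on the dataset.
EMB_DIM, VOCAB_SIZE, N_ACTIONS = 300, 1000, 11
ENTITY_TYPES = ("name", "address", "time", "location")

def build_features(emb_t, tokens, known_entities, allowed_actions, vocab):
    """Concatenate the four feature vectors of Eq. (1) into f_t."""
    # Bag-of-words vector b_t over the utterance tokens.
    b_t = np.zeros(VOCAB_SIZE)
    for tok in tokens:
        if tok in vocab:
            b_t[vocab[tok]] = 1.0
    # Action mask am_t: which response templates are currently allowed.
    am_t = np.zeros(N_ACTIONS)
    am_t[list(allowed_actions)] = 1.0
    # Context feature c_t: which entity types have been collected so far.
    c_t = np.array([1.0 if e in known_entities else 0.0 for e in ENTITY_TYPES])
    return np.concatenate([emb_t, b_t, am_t, c_t])  # f_t

# Example usage with a toy vocabulary and a random embedding vector.
vocab = {"where": 0, "is": 1, "the": 2, "bathroom": 3}
f_t = build_features(np.random.rand(EMB_DIM),
                     ["where", "is", "the", "bathroom"],
                     {"name"}, [0, 3, 7], vocab)
print(f_t.shape)  # (300 + 1000 + 11 + 4,) = (1315,)
```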

The response selector (Fig. 2), consisting of LSTM [16], dense, and softmax layers, determines the response \({\hat{r}}\) from the input feature vector \(f_t\). To determine \({\hat{r}}\), the LSTM is first fed with the feature vectors \(f_1,\ldots,f_{t-1}\) of the previous turns. In the final step, the LSTM receives the feature vector \(f_t\) and produces an 11-dimensional probability distribution, whose size equals the number of response templates extracted from the dialogue dataset we used.
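
As a rough illustration only (not the authors' exact architecture or hyperparameters), such a response selector could be sketched in Keras as follows:

```python
import tensorflow as tf

# Hidden size is an assumption; the feature dimension follows the sketch above.
FEAT_DIM = 300 + 1000 + 11 + 4   # |emb_t| + |b_t| + |am_t| + |c_t|
N_ACTIONS = 11                   # number of response templates

response_selector = tf.keras.Sequential([
    tf.keras.Input(shape=(None, FEAT_DIM)),   # sequence f_1 .. f_t
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(N_ACTIONS, activation="softmax"),
])
response_selector.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy")
# At inference time, the dialogue history of feature vectors is fed in and
# the template with the highest probability is taken as r_hat.
```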

The entity output module generates a fully formed response based on the response template chosen by the response selector. For example, if the action template “api_call location <location>” is selected by the previous module, the entity output module fills in the stored entity to produce “api_call location bathroom”.
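
A minimal sketch of this template-filling step (the helper name is hypothetical):

```python
def fill_template(template: str, entities: dict) -> str:
    """Replace <slot> placeholders in the selected response template
    with the entities stored by the entity handling module."""
    for slot, value in entities.items():
        template = template.replace(f"<{slot}>", value)
    return template

# Example from the text: the stored entity fills the placeholder.
print(fill_template("api_call location <location>", {"location": "bathroom"}))
# -> "api_call location bathroom"
```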

Fig. 2 Overview of the response selector, consisting of embedding, LSTM, dropout, dense, and softmax layers. This module predicts the next response based on the combined features

Fig. 3 Overview of the trainable gesture selector, consisting of embedding, RNN, dense, dropout, and softmax layers. This module selects the appropriate robot gesture based on the selected response

The trainable gesture selector (Fig. 3), based on the idea of intent classification [17, 18], consists of embedding, LSTM, dropout, dense, and softmax layers. When the HCN generates a response, each word of the response is tokenized and transformed into a vector by the embedding layer. The sequence is then fed to the LSTM layer, followed by the dense and softmax layers, which output a probability distribution over gesture labels, each corresponding to one of the robot's pre-defined motion controls.
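
A minimal Keras sketch of such a gesture selector is given below; the vocabulary size, embedding dimension, and hidden size are assumptions, and the output dimension matches the eight gesture labels used for training (Sect. 4):

```python
import tensorflow as tf

# Vocabulary size, embedding dimension, and hidden size are assumptions.
VOCAB_SIZE, EMB_DIM, N_GESTURES = 5000, 100, 8

gesture_selector = tf.keras.Sequential([
    tf.keras.Input(shape=(None,), dtype="int32"),            # token ids of the response
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(N_GESTURES, activation="softmax"),  # gesture label probabilities
])
gesture_selector.compile(optimizer="adam",
                         loss="sparse_categorical_crossentropy",
                         metrics=["accuracy"])
```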

4 Hospital Receptionist Robot System

The receptionist robot system consists of four components: sensory perception, the extended HCN, the Social Human-Robot Interaction (SHRI) framework [19], and the robot platform, as shown in Fig. 4. The automatic speech recognition and face detection modules serve as the robot's sensory perception. The extended HCN is responsible for generating the robot's response together with the gesture the robot should perform. The SHRI framework acts as a bridge between the robot platform and external modules such as sensory perception and the dialogue system. It manages non-verbal functions such as turn-taking, gaze, emotions, and gestures. All components are developed on top of the Robot Operating System (ROS). We test the system on NAO, a humanoid robot widely used for research and educational purposes, as shown in Fig. 5. The details of each module are described in the following paragraphs.

Fig. 4 Overall structure of the receptionist robot system. It consists of speech recognition, face detection, the extended HCN, the Social Human-Robot Interaction (SHRI) framework, and the robot platform

Fig. 5 The humanoid robot NAO, which we chose to test the receptionist robot system

First, the speech recognition part of the sensory perception module is built with Google Cloud Speech, which applies a neural network model to convert audio into text. Face detection, the other part of the sensory perception module, uses the face detection module provided by the NAO robot platform, slightly modified to track the target face in front of the robot.

Second, the extended HCN is the dialogue system that infers which response and which gesture are required from the recognized user utterance. To train the dialogue system, we used the hospital receptionist dataset introduced in [20]. The dataset consists of conversations between humans and the system and covers four different tasks: requesting a prescription, confirming an appointment, asking for the waiting time, and asking for a location.

To train the gesture selector, we extracted a total of 19,671 sentences with 8 corresponding labels (multiple choice, welcome, request, greeting, inform, confirm answer, thanks, closing) from the conversation corpus provided by the Microsoft conversation challenge [21]. The trained HCN is then loaded into the system as the dialogue system; if the highest probability of the selected response or the confidence of the ASR is less than 50%, the system re-prompts the user, so that we can build an interactive application for practical purposes.
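
A minimal sketch of this re-prompt policy, with a hypothetical re-prompt utterance and function name, is shown below:

```python
import numpy as np

CONF_THRESHOLD = 0.5  # 50%, as described above

def select_or_reprompt(asr_confidence: float, action_probs: np.ndarray,
                       templates: list) -> str:
    """Re-prompt when either the ASR confidence or the probability of the
    selected response template falls below the 50% threshold."""
    best = int(np.argmax(action_probs))
    if asr_confidence < CONF_THRESHOLD or action_probs[best] < CONF_THRESHOLD:
        return "Sorry, could you say that again?"   # hypothetical re-prompt
    return templates[best]
```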

Lastly, to integrate with the robot platform, we used the SHRI framework, a modular human-robot software framework that manages the robot's social behavior and domain tasks. By separating the domain task, which controls the execution flow of the scenario, from the social behavior handled by the framework, developers can reduce the effort required to implement non-verbal features of the robot such as turn-taking, gaze, emotions, and gestures. In this work, the domain task is the conversational task.

The framework consists of three main components: social perception, a social task controller, and an action renderer, as shown in Fig. 4. Social perception interprets situations based on the output of the sensory perception modules. For example, audio-visual saliency is continuously evaluated, the user's turn-taking intention is inferred, and the cognitive and emotional states of the interaction participants are estimated. The social task controller carries out actions requested by the domain task, using tag information such as saying, gazing, pointing, and facial expressions. For example, if the domain task requests "<sm=tag:greeting> Hello. My name is Silbot", the controller generates a portable motion command format called a semantic motion. The action renderer, a robot-dependent component, is responsible for executing the actual motor control by interpreting the semantic motion generated by the social task controller.
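
As an illustration only, a tagged request of this form could be split into its semantic-motion tag and speech text as follows (the parsing code and output format are assumptions, not the SHRI implementation):

```python
import re

def parse_tagged_request(request: str):
    """Split a domain-task request such as
    "<sm=tag:greeting> Hello. My name is Silbot"
    into the semantic-motion tag and the text to be spoken."""
    match = re.match(r"<sm=tag:(\w+)>\s*(.*)", request)
    if match is None:
        return None, request           # no gesture tag, speech only
    return match.group(1), match.group(2)

tag, speech = parse_tagged_request("<sm=tag:greeting> Hello. My name is Silbot")
# tag == "greeting", speech == "Hello. My name is Silbot"
```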

Fig. 6 The user study conducted to evaluate our proposed system. The slot-based method, which is widely used in commercial products, was set as the baseline

5 Experiment

5.1 Experimental Environments

The human-robot dialogue system was evaluated through a user study in which human subjects interacted with NAO acting autonomously using the system described above. All interactions were conducted in English, with the participant seated in front of the robot and the experimenter seated next to the robot to provide assistance if needed, as shown in Fig. 6. The participants were given a description of the overall hospital reception scenario, including the tasks to be completed. The scene was recorded from the participant's point of view, centered on the robot. Each session did not exceed 20 minutes.

A total of 20 people (7 males, 13 females) agreed to participate in our study. Their ages ranged from 19 to 30 (M=24.1, SD=3.4), where M denotes the mean and SD the standard deviation. Participants did not receive any financial compensation, and most of them were students with little or no previous experience of interacting with a robot. For a fair comparison, we assigned half of the participants to interact with the proposed system and the other half to the baseline.

To explore the benefits of the proposed system, we compared two conditions: a robot using the proposed system and a robot using the conventional method. As a baseline, we use the slot-based method, the workhorse of conventional pipeline dialogue systems. The slot-based method is widely used in commercial conversation systems such as Google Dialogflow, Amazon Lex, and IBM Watson, and predefines the structure of the dialogue state as a set of slots to be filled during the conversation [12].

We use Google's Dialogflow to implement the baseline system. The baseline calls our API to query our knowledge base and uses seven intents (welcome, prescription, check-in, silence, location, farewell, and replacement) and four slots (name, address, time, and location). For robot gesture selection, the baseline selects gestures according to a rule-based method in which the gesture used for each response is defined manually. A between-subject design was used to compare the two conditions, so different participants were assigned to different conditions.

Table 1 Dialogue efficiency and quality results in both conditions. M represents the mean value and SD represents the standard deviation

We collect various objective measures from the log files and video recordings. We consider two objective metrics used in the PARADISE framework [22, 23]: dialogue efficiency and dialogue quality. Dialogue efficiency is assessed using the elapsed time, the number of tasks completed, and the number of utterances made by the user and the robot during the experimental session. Dialogue quality is measured by the number of timeouts, the number of re-prompts, and the ASR confidence. Specifically, a timeout occurs when the user misses the speaking window in which the robot is listening. A re-prompt is counted when the robot asks the user the same question again to obtain specific information. Note that the robot system is designed to ask the user the same question if the response probability or the ASR confidence is less than 50%.
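
For illustration, the per-session measures could be organized as follows (the log schema and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SessionLog:
    """Per-session measures extracted from a log file (hypothetical schema)."""
    elapsed_sec: float
    tasks_completed: int
    user_turns: int
    robot_turns: int
    timeouts: int
    reprompts: int
    asr_confidences: List[float]

def efficiency(log: SessionLog) -> dict:
    return {"elapsed_sec": log.elapsed_sec,
            "tasks_completed": log.tasks_completed,
            "utterances": log.user_turns + log.robot_turns}

def quality(log: SessionLog) -> dict:
    return {"timeouts": log.timeouts,
            "reprompts": log.reprompts,
            "mean_asr_conf": sum(log.asr_confidences) / len(log.asr_confidences)}
```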

To explore subjective measures, the participants were asked to fill out a questionnaire analyzing their perception of the robot. The questionnaire includes the user's overall rating and the Godspeed test [24], a measurement tool for HRI covering five key concepts: anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety.

An experimental scenario was designed to show how well the robot works as a receptionist. Participants were asked to complete as many of the given tasks as possible (asking for a prescription, checking in for a doctor's appointment, asking for the waiting time, and asking for the location of the bathroom). Every participant was asked to imagine that they were entering a hospital they had never visited before, where the robot was installed in the reception area to interact with patients. Before starting the experiment, they were asked to speak spontaneously in natural language. Moreover, they were given hints on how to communicate better with the robot, for example, “please wait for your turn to speak” and “please keep in mind that the robot only listens to you while its eyes turn blue”.

5.2 Experimental Result

Table 1 shows the experimental results in which participants performed the given tasks under both conditions. We perform a one-tailed T-test to determine whether there is a statistically significant difference between the two conditions. The null hypothesis \(H_0\) and its alternative \(H_a\) can be described as follows:

$$\begin{aligned} H_0: \mu _{C1}=\mu _{C2},\quad H_a: \mu _{C1}\ne \mu _{C2}, \end{aligned}$$
(2)

where \(C1\) and \(C2\) denote the two conditions. The significance level \(\alpha \) is set to 0.05.
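
For reference, such a test can be run with SciPy as sketched below; the samples are placeholders, not the data reported in Table 1, and the `alternative` argument requires SciPy 1.6 or later:

```python
import numpy as np
from scipy import stats

# Placeholder samples for the two conditions (not the reported data).
condition_1 = np.array([3, 4, 4, 4, 4, 3, 4, 4, 4, 4])   # proposed
condition_2 = np.array([4, 4, 4, 4, 3, 4, 4, 4, 4, 4])   # baseline

alpha = 0.05
# One-tailed independent-samples t-test (here H_a: mean of C1 < mean of C2).
t_stat, p_value = stats.ttest_ind(condition_1, condition_2, alternative="less")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {p_value < alpha}")
```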

The average number of tasks completed is 3.8 (SD=0.42) with the proposed method and 3.9 (SD=0.32) with the baseline, with no significant difference (one-tailed T-test, p = 0.24). However, there are significant differences in the number of user turns, the number of robot turns, and the elapsed time (one-tailed T-test, p = 0.028, 0.03, and 0.003, respectively). More specifically, users of the proposed method take on average 1.4 fewer turns than with the baseline method. Moreover, the robot with the proposed method takes on average 1.8 fewer turns, and the users have an average of 38.4 seconds shorter interaction time.

In terms of dialogue quality, the analysis shows no significant difference in the number of timeouts, the number of re-prompts, or the ASR confidence (one-tailed T-test, p = 0.07, 0.18, and 0.32, respectively). However, we find that the proposed method re-prompts on average 0.5 times more often.

We also analyze the robot perception and user satisfaction questionnaire to assess the acceptability of our proposed system compared to the baseline. The reliability of the questionnaire was tested by measuring its internal consistency with Cronbach's \(\alpha \), which was 0.89 (good consistency). Based on this value, we assume that the participants interpreted the robot characteristics covered by the questionnaire in the expected way in the given context. We averaged the 5-point Likert-scale ratings of the collected questionnaires. As a result, we found no significant difference between the two methods, i.e., the trainable gesture selector and the conventional method that manually defines the gesture for each response (one-tailed T-test, p = 0.24 and 0.39, respectively).
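
For reference, Cronbach's \(\alpha \) can be computed from a participants-by-items matrix of Likert ratings as sketched below (the ratings shown are placeholders, not our questionnaire data):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) score matrix."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_vars / total_var)

# Placeholder 5-point Likert ratings for 5 participants and 4 items.
ratings = np.array([[4, 5, 4, 4],
                    [3, 4, 4, 3],
                    [5, 5, 5, 4],
                    [4, 4, 3, 4],
                    [2, 3, 3, 2]])
print(round(cronbach_alpha(ratings), 2))
```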

6 Discussion

Both conditions show similar results in the experiment with real users, but our proposed system performs better in terms of dialogue efficiency. To gain more insight, we performed a detailed analysis of the dialogue log files and recordings, which revealed that the proposed model tends to comprehend the dialogue context better than the baseline system. Fig. 7 shows an example dialogue between the proposed system and a user. In the same way as the work presented in [22], the whole dialogue can be divided into four sub-dialogues corresponding to the given tasks: check-in (\(\hbox {U}1\sim \hbox {R}5\)), collect prescription (\(\hbox {U}7\sim \hbox {R}7\)), ask waiting time (\(\hbox {U}8\sim \hbox {R}8\)), and ask location (\(\hbox {U}9\sim \hbox {R}9\)). Silence means that the user did not provide any utterance. The example shows that collected information such as the name and address carries over naturally to other tasks (from check-in to collect prescription in this case) and that the system notices when the conversation is about to end (\(\hbox {U}10\sim \hbox {R}10\)).

Fig. 7 An example dialogue between the proposed system and a real user

On the other hand, the baseline system requests information (\(\hbox {U}8\sim \hbox {R}10\)) that has already been collected in the previous task (\(\hbox {U}1\sim \hbox {R}6\)), as shown in Fig. 8. We found the baseline system to be less suitable for complex conversations in terms of dialogue efficiency. It is somewhat inefficient at completing the given tasks, and although this could still be mitigated with handcrafted rules, real scenarios are more complicated and cannot easily be covered by such rules. In contrast, the proposed approach has the potential to learn such latent rules as the scenario becomes large and complex.

The analysis of the questionnaire shows no significant difference between the two conditions. In both methods, the gestures selected according to the robot's responses appear to be perceived equally by the users. However, we find that the proposed method is still superior in terms of development efficiency, because it automatically selects an appropriate gesture from the robot's response, whereas in the baseline the developer has to define each gesture for each response in advance.

Fig. 8 An example dialogue between the baseline system and a real user

7 Conclusion and Future Work

We presented and evaluated our autonomous hospital receptionist robot, which uses an end-to-end approach to generate responses and gestures. To this end, we extended HCN to select not only a response but also a proper gesture based on the generated response. We experimented with real users and found that our proposed system has an advantage in terms of dialogue efficiency, which indicates how efficiently users achieved the given tasks with the receptionist robot. Moreover, participants perceived no difference between the proposed and baseline systems in terms of robot perception, which means that no additional handcrafted work is required to define the robot's gesture for each response.

In future work, several improvements are possible to extend what is achievable for a receptionist robot with the end-to-end approach. The dialogue system was limited to a rather small domain; tests on other, perhaps broader, domains would be needed to see how the approach scales. Regarding the robot's behavior, we tested how users feel when using the receptionist robot, but the robot is equipped with only minimal features at this stage; combining verbal interaction with the robot's gesture and gaze would be a good example of an extension. To improve the user experience, we can extend our work to more diverse forms of expression, such as the robot's facial expression, voice pitch, and so on.