1 Introduction

An increasing number of service robots are being developed and deployed in public places such as museums [1, 2], airports [3], train stations [4], and shopping malls [5]. These service robots are expected to interact with people and act as receptionists or information staff. Initiating interaction is one of their most basic functions [6], and it has a vital impact on the subsequent interaction process [7].

However, many issues remain for robots initiating interaction in public places. Currently, service robots mainly use one of two approaches. The first is “the reactive approach”, in which a robot waits for a user to initiate interaction [8,9,10]. Robots that adopt this strategy usually use certain behaviors to exhibit their availability and recipiency [11] and to encourage users to initiate interactions [12, 13, 15]. This approach is robust to many complex issues [14]; nevertheless, it misses potential users who are hesitant or uncertain about how to interact with robots. The second is “the proactive approach”, in which a robot proactively seeks out people who need help [6, 8]. Compared with the reactive approach, this approach takes more initiative [16] and is thus more likely to help those who have potential needs but do not know that they can turn to robots [6]. It is also useful for advertising purposes. Yet a robot that is too proactive may be perceived as “rude”, “disturbing”, or even “annoying” [8, 15]. In summary, neither of the two approaches is natural enough in practical applications: the reactive approach is too passive, while the proactive approach may seem intrusive and cause annoyance.

The aim of this study was to design a more natural approach for robots to initiate interaction. We expected that with this approach, even those who had never encountered robots before could interact with them in a natural, efficient, and pleasant way. Our approach was mainly based on behavior patterns observed in human-human interaction, as these were established through long social practice and are accepted by most people without further explanation. Specifically, we designed a range of robot behaviors under the guidance of the approach, and conducted an experimental study to validate its practical benefits.

2 Related Work

2.1 Initiating Human-Human Interaction in Public Places

The interaction initiating process is extremely delicate, and requires cooperation [17,18,19]. Moreover, it is a dynamic process. In the process of initiating interaction, people will use various social cues, which are understandable to all participants of the interaction [20].

Hall [21] suggests a series of steps in the process of initiating interaction: (a) getting a person’s attention; (b) assessing that person’s willingness to interact; (c) creating physical proximity to enable interaction.

One key issue in initiating interaction in public places is to recognize the intention to interact. Goffman [20] points out that in public places, an “encounter” is normally initiated by one person, using specific eye contact (a brief glance) or certain body language and posture. The interaction is not considered as officially started until the cues sent by the initiator are recognized by the receiver. In return, the receiver usually sends back eye contact (e.g., gaze) or body language as recognition. Specifically, gaze, among other kinds of eye contact, is a vital social cue for initiating social contact in public places, as it is one of the most directional social cues to express the intention to interact [22, 23].

As suggested by Hall [21], once the interaction intention is recognized by both parties, the next step is to create physical proximity to enable interaction. Hall [24] further proposes the concept of “proxemics”, which refers to the physical distance and/or closeness between people. He defines four personal spatial zones, ordered from close to far: “Intimate Zone”, “Personal Zone”, “Social Zone”, and “Public Zone”. The widely accepted personal spatial zones, as summarized by Lambert [25], are listed in Table 1. Moreover, distance is not only a common social cue that can reflect and influence social relationships and attitudes [26, 27]; it also affects one’s ability to see or touch the other person, and thus plays an important role in the use of other social cues. For instance, Kendon [28] suggests that friends usually exchange greetings twice, first using body language at a far distance and again by smiling at a closer distance.

Table 1. Human-human personal spatial zones (cf. Lambert)

2.2 Initiating Human-Robot Interaction in Public Places

In general, robots are expected to obey common social norms when starting and maintaining communication with humans [29]. Most people are willing to try to interact with robots if the robots show appropriate social behavior [30].

“How to attract users’ attention” is a popular research topic. Studies find that robots with humanoid bodies are more likely to attract users’ attention [31], and that robots with a face attract significantly more users to stop [32]. Robots can also catch users’ eyes by moving their heads or blinking their eyes. However, when users are engaged in other things (e.g., watching news on television), speech turns out to be the best way to attract their attention, followed by a waving gesture and an eye LED blinking gesture, while attempting to build eye contact works worst in this kind of scenario [33]. Gaze behavior is only effective when the robot already has users’ attention [33]. Some researchers designed a set of possible behaviors, including head motions (e.g., looking at the nearest person), facial expressions (e.g., happy, sad, and angry), and speech acts (e.g., “come here”, “do you like robots?”). After self-learning from interactions with humans, the robot developed a “positive attitude” to attract attention, that is, saying nice things, looking at people, and smiling [34].

Gaze behavior is still an effective social cue in human-robot interaction, provided that the robot has already attracted users’ attention. Mutlu and colleagues [35] applied key gaze mechanisms from human conversation to robots, and found that humanlike gaze mechanisms successfully helped the robot signal different participant roles and manage turn exchanges, and shaped how participants perceived the robot and the conversation. In addition, other studies suggest that prolonged eye contact from the robot while approaching or being approached by users can increase users’ affection towards the robot [36].

“Human-Robot Proxemics” has been another focus of research. It is suggested that when approaching robots, users tend to keep robots in the personal or social zone, just as they do when approaching another human being [30, 37,38,39]. Similarly, when being approached by robots, users feel more comfortable when robots stop in the personal or social zone [40, 41]. If robots come too close, users will even step back to make themselves more comfortable. In addition, users prefer to be approached from the front rather than from the back, and prefer the front-left or front-right direction to the direct front-on direction [42,43,44].

Although findings on robots’ social behavior have accumulated, few studies have focused on the whole dynamic process of “initiating interaction”. So far, studies on initiating human-robot interaction have mainly concentrated on building algorithms, such as how to detect humans and track their positions, how to determine humans’ intentions and interests, and how to recognize humans, or on a single part of the initiating process, such as how to attract users’ attention. The present study focused on the whole process of “initiating interaction”, and further explored how behaviors designed according to the progressive interaction approach would be perceived by users.

3 The Progressive Interaction

3.1 The Approach of “the Progressive Interaction”

Based on analyses of human behavior and our previous investigations, we suggest that during the process of human-robot interaction in public places, robots are expected to send interaction signals more actively, and in a progressively enhancing manner. This is what we call “the progressive interaction”.

Specifically, as humans and robots get closer, humans have different expectations of robots. We divided the process into three stages, named “far field”, “mid field”, and “near field”, based on the order in which they appear in users’ mental world. Each stage corresponds to a certain distance range in the physical world, and certain behaviors fit users’ expectations for each stage:

  • Far Field: The aim of robots in this stage is to gain users’ attention and to make users aware that “I’m noticed by HIM/HER”. This is a critical stage: if it fails, the following human-robot interaction will seem abrupt or may even be impossible. This field corresponds to a distance of approximately 2.7 to 4.2 m. Robots are expected to use facial expressions and body movements to attract attention, such as smiling, friendly eye contact, waving, tilting the head to one side, and nodding;

  • Mid Field: Robots are supposed to further express “the intention to interact”, and make users clearly aware that “I’m the only one in HIS/HER eyes” and that HE/SHE intends to interact further. This also encourages users to approach the robot spontaneously. The distance corresponding to this field is about 1.2–2.7 m. Within this distance, users expect the robot to send out interactive signals using a combination of modalities, so that the user can be certain that he/she is the target. For example, the robot can greet by voice (such as “good morning” or “hello”) while smiling or waving;

  • Near Field: Robots need to “start a dialogue” first, so that they are perceived as taking the initiative and being friendly. The user’s impression is “HE/SHE is ‘Liao’ (hitting on) me”, and with this impression, the dialogue between human and robot develops naturally. The distance corresponding to the near field is within 1.2 m. Within this field, the role of language is highlighted. Users expect the robot to “initiate dialogue” first, for example by introducing himself/herself and asking whether any help is needed. At the same time, users expect more enthusiastic smiles and body movements (such as shaking hands or hugging).

The distance corresponding to each field was established in a previous study we carried out, in which participants (N = 32) were asked to approach a robot from a given direction at a pace they were comfortable with. While approaching, they were asked to indicate the appropriate points for the robot to attract attention (far field), express interaction intention (mid field), and start a dialogue (near field). The specific distances were measured and calculated afterwards.
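
As a minimal sketch, the mapping from distance to field described above could be written as a simple lookup. The function name `classify_field`, the constant names, the behavior labels, and the treatment of the boundary values (1.2 m and 2.7 m are assigned to the closer field) are assumptions made for illustration; the text only gives approximate ranges.

```python
from typing import Optional

# Field boundaries in metres, taken from the three field descriptions above.
NEAR_MAX_M = 1.2
MID_MAX_M = 2.7
FAR_MAX_M = 4.2

# Hypothetical labels paraphrasing the behaviors expected in each field.
FIELD_BEHAVIORS = {
    "far": ["facial expression", "body movement"],
    "mid": ["facial expression", "spoken greeting"],
    "near": ["facial expression", "self-introduction", "offer of help"],
}


def classify_field(distance_m: float) -> Optional[str]:
    """Return 'near', 'mid', or 'far', or None beyond the far field."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # Assumption: a boundary value belongs to the closer field.
    if distance_m <= NEAR_MAX_M:
        return "near"
    if distance_m <= MID_MAX_M:
        return "mid"
    if distance_m <= FAR_MAX_M:
        return "far"
    return None
```

For example, a user standing 2.0 m away would fall in the mid field, where a greeting combined with a facial expression is expected.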

3.2 Designing the Progressive Interaction

We applied the progressive interaction approach to a robot called Xiaodu. The focus of the present study is listed in Table 2.

Table 2. Designing the progressive interaction

The Robot “Xiaodu”.

Xiaodu is a “formal employee” of Baidu, 160 cm tall and 110 cm wide, and works as a receptionist in the company hall (see Fig. 1). Xiaodu benefits from Baidu’s AI techniques (e.g., NLP, dialogue system, speech recognition) and is able to communicate smoothly with users in multiple ways, such as communicating emotions and providing information and other services.

Fig. 1. Xiaodu at work

Facial Expressions in Far Field.

The aim of this stage is to attract users’ attention. Facial expressions of Xiaodu need to be perceived and understood by users. In return, users will be willing to pay attention to Xiaodu, and even further interact.

Six groups of facial expressions were designed by our UX designers and researchers, based on the features of facial expressions in human-human interaction, the facial expression database of Xiaodu, and popular stickers used in online chat. Participants were invited to evaluate these expressions from the far field in terms of how well they understood them, how much they liked them, how strongly they were attracted, and how much they intended to interact further. The expression “Raised eyebrows” was selected (as shown in Table 3).

Table 3. Facial expressions of Xiaodu for different fields

Facial Expression and Voice in Mid Field.

The core in mid field is to “initiate an interaction request”. For this stage, users need to be able to perceive the robot’s intention to interact. After the same process as in far field stage, the expression “smiling eyes” was selected for the mid field stage (see Table 3). As for voice, the classic way of “greeting” was selected—“Hi, how are you”.

In addition, participants favored the dual-modality design (“smiling eyes + greeting”) over smiling eyes alone. The dual-modality design left participants with the impression that the robot was more enthusiastic, more directional, and more interactive. In the later validation experiment, we further investigated whether the effect of voice remained significant after the whole interaction process.

Facial Expressions, Voice and Face Recognition in Near Field.

The aim for this stage is to “start a dialogue”. “Smiling eyes with heart-shaped blush” was designed and selected for this stage (as shown in Table 3). As for voice, Xiaodu introduced himself/herself and then the rules for subsequent interaction–“I’m Xiaodu. If you want to chat you can just speak to me, I’m listening”.

During user evaluation, we found that participants paid a lot of attention to the screen on the front of Xiaodu and would spontaneously keep their eyes on it. Because Xiaodu was capable of face recognition, we investigated in the later experiment whether displaying the user’s face would improve his/her experience (as shown in Table 4).

Table 4. The screen of Xiaodu in far field

4 Experimental Validation of the Progressive Interaction Approach

4.1 Objectives

The study had two aims. The first was to investigate whether user experience differed between the progressive interaction approach and the traditional approach to initiating interaction (the reactive approach). The second was to explore differences in user experience among the different designs of the progressive interaction approach.

4.2 Design

The experiment adopted a within-subject design, and all participants were required to experience all 5 trials. After each trial, participants were asked to fill in questionnaires regarding their emotions and attitudes.

One of the trials was the control condition, in which Xiaodu adopted the reactive approach. Specifically, Xiaodu kept the default facial expression; participants could walk up to the front of Xiaodu and touch the screen, and they would then be reminded to pick up the microphone to start a conversation.

The other four trials were the experimental conditions, all based on the progressive interaction approach. Specifically, a 2 (Mid field: with/without voice) × 2 (Near field: with/without face recognition) design was used for the experimental conditions (as shown in Fig. 2).

Fig. 2. Designs for the progressive interaction approach. Introduction in near field: “I’m Xiaodu. If you want to chat you can just speak to me, I’m listening.”

To minimize the influence of learning effect and fatigue effect, we randomized the order of trials for each participant.
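
The five trials and their per-participant randomization can be sketched as follows. This is an illustration only, not the procedure actually used: the names `build_trials` and `randomized_order` are hypothetical, and a simple seeded shuffle is assumed because the text does not state how the order was randomized.

```python
import itertools
import random
from typing import Dict, List, Optional

Trial = Dict[str, Optional[object]]


def build_trials() -> List[Trial]:
    """One reactive control trial plus the 2 x 2 progressive designs."""
    # The two factors do not apply to the reactive control, hence None.
    trials: List[Trial] = [
        {"approach": "reactive", "voice": None, "face_recognition": None}
    ]
    # Mid field: with/without voice; near field: with/without face recognition.
    for voice, face in itertools.product([True, False], repeat=2):
        trials.append(
            {"approach": "progressive", "voice": voice, "face_recognition": face}
        )
    return trials


def randomized_order(participant_seed: int) -> List[Trial]:
    """Return the five trials in a shuffled order (assumed: seeded shuffle)."""
    rng = random.Random(participant_seed)
    order = build_trials()
    rng.shuffle(order)
    return order
```

Seeding by participant keeps each order reproducible; a counterbalanced scheme such as a Latin square would be an alternative design choice.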

The present study only focused on the initiating process, and later human-robot dialogue was not included in this study. Thus, to avoid the impacts of the robot’s responses on participants’ emotions and attitudes, all participants were required to ask the same question “How’s the weather today”.

4.3 Participants

Twenty-five participants were recruited, including 12 males and 13 females. Their ages ranged from 20 to 45 years. Eleven of them reported previous experience with service robots in public places. All participants reported normal or corrected-to-normal vision and normal hearing.

All participants volunteered to participate in the study and consented to audio and video recording of the research process. At the end of the study, all participants were given appropriate compensation.

4.4 Experiment Environment and Equipment

Experimental Set-Up.

The experiment took place in the open office area of Baidu. The whole experiment area was approximately 8 m long and 2.5 m wide. A white curtain was set up behind Xiaodu, and one experimenter recorded users’ facial expressions from behind the curtain (as shown in Fig. 3).

Fig. 3. The experimental set-up

The Robot Xiaodu.

As mentioned before, Xiaodu is a humanoid robot in Baidu company. In different trials, it would display different kinds of behavior.

To save the time and other resources needed to develop a fully autonomous robot, the experiment used the Wizard of Oz (WoZ) paradigm [45]. The facial expressions and voice of Xiaodu were actually controlled by an experimenter, who practiced multiple times to ensure consistency among different trials.

4.5 Measures

Emotions.

Initiating interaction is a dynamic, subtle, and transient process. To better evaluate user experience in this process, we included “emotion” as an important dependent variable, and measured it in two ways: objective facial expression analysis and subjective self-report.

Objective Emotions.

Noldus FaceReader [46] offers a relatively objective method of measuring emotions by analyzing facial expressions, and it can automatically detect various emotions such as happiness, surprise, and anger. For a given time period, facial expressions in the recordings are analyzed and different measures are generated, such as emotional arousal level and emotional valence.

Considering the emotions that participants might experience in the experiment, we included four kinds of emotions: happiness, surprise, neutral, and negative emotion. Specifically, negative emotion was the combination of sadness, anger, fear, and disgust. We adopted emotional arousal level as the criterion. Its values range from 0 to 1, and higher values indicate stronger emotions.

Subjective Emotions.

Participants were asked to rate their emotional intensity on an 8-point Likert questionnaire (from 0 to 7; higher scores indicated stronger emotional intensity), in which they answered the question “How did you feel in the process of interacting with Xiaodu just now?” separately for each kind of emotion. Five kinds of emotions were rated, namely happiness, surprise, confusion, disgust, and neutral. The questionnaire was filled in after every trial.

Attitudes.

Aside from emotions, we were also interested in participants’ perceptions of and cognitions about Xiaodu. Thus, we also asked participants to evaluate the following attitudes, with a single question each. Participants rated each on a 7-point Likert questionnaire (from 1 to 7; higher scores indicated stronger agreement):

  • Naturalness: How natural do you think this interaction approach is?

  • Friendliness: How friendly do you think this interaction approach is?

  • Affection: How much do you like this interaction approach?

  • Interaction Intention: If Xiaodu initiated interaction with you in this way, how much would you like to interact with him?

4.6 Procedure

Participants were first welcomed and informed about the aim and general process of the study. After the introduction, there was a practice phase, in which participants could become familiar with the experimental set-up and the robot (those who encountered Xiaodu for the first time were usually excited), and find a comfortable pace at which to approach Xiaodu. Then the experiment began. Participants were asked to imagine that this was the first time they had encountered Xiaodu, and that they intended to check the weather through Xiaodu. All participants went through all five trials (one for the reactive approach and four for the progressive interaction approach). Participants’ facial expressions during the experiment were recorded by one experimenter. After every trial, they filled out questionnaires about their emotions and attitudes during the trial. Finally, after all trials, participants were interviewed about the reasons for their ratings.

4.7 Data Analysis

SPSS 23.0 was used for data analysis. First, we conducted descriptive analysis for all dependent variables. To evaluate the difference between the progressive interaction approach and the reactive approach, one-way repeated measures ANOVA was used. Further post hoc analysis was adopted if the ANOVA showed significant differences among the five trials.
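
To make the one-way repeated measures ANOVA concrete, the following is a minimal pure-Python sketch of how its F statistic is formed from the sums of squares. It is not the SPSS procedure used in the study: the function name `one_way_rm_anova` is hypothetical, the example ratings are invented, and the sketch omits the p-value, the sphericity test, and any correction.

```python
from typing import List, Tuple


def one_way_rm_anova(scores: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_conditions, df_error).

    scores[i][j] is the rating of participant i in condition j.
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of conditions (five trials in the study)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition the total variability into conditions, subjects, and error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical ratings: 3 participants x 3 conditions (not data from the study).
example = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 3.0, 6.0]]
f_value, df_cond, df_error = one_way_rm_anova(example)
```

Removing the between-subject sum of squares from the error term is what distinguishes the repeated measures version from a between-subjects ANOVA.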

Furthermore, we were interested in the effects of the two factors (“voice in mid field” and “face recognition in near field”) in the progressive interaction designs. Thus, we conducted the two-way repeated measures ANOVA to see whether voice and face recognition were preferred by participants.
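
In a 2 × 2 within-subject design, each main effect and the interaction has one degree of freedom, so its F value equals the squared one-sample t statistic of a per-participant contrast. The sketch below illustrates this equivalence under stated assumptions: the names `contrast_f` and `two_by_two_effects`, the cell order, and the example ratings are all hypothetical and are not taken from the study.

```python
from statistics import mean, variance
from typing import Dict, List, Tuple

# Assumed cell order per participant:
# (voice + face, voice only, face only, neither)
Cells = Tuple[float, float, float, float]


def contrast_f(contrasts: List[float]) -> float:
    """F(1, n - 1) for a contrast: squared one-sample t against zero."""
    n = len(contrasts)
    return n * mean(contrasts) ** 2 / variance(contrasts)


def two_by_two_effects(cells: List[Cells]) -> Dict[str, float]:
    """F values for both main effects and the interaction."""
    voice = [(vf + v) / 2 - (f + none) / 2 for vf, v, f, none in cells]
    face = [(vf + f) / 2 - (v + none) / 2 for vf, v, f, none in cells]
    interaction = [(vf - v) - (f - none) for vf, v, f, none in cells]
    return {
        "voice": contrast_f(voice),
        "face_recognition": contrast_f(face),
        "interaction": contrast_f(interaction),
    }


# Hypothetical ratings for four participants (not data from the study).
ratings: List[Cells] = [
    (5.0, 4.0, 4.0, 3.0),
    (6.0, 4.0, 5.0, 4.0),
    (4.0, 4.0, 3.0, 2.0),
    (5.0, 5.0, 3.0, 4.0),
]
effects = two_by_two_effects(ratings)
```

Because every effect reduces to a single contrast, there is only one variance per effect and no sphericity assumption to test.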

5 Results

5.1 Objective Emotions

The objective emotions of participants in the different trials are shown in Table 5. Because of technical issues, only 21 participants were included in this analysis. In general, participants’ facial expressions showed a neutral or happy state in all trials.

Table 5. Descriptive statistics of objective emotions

One-way repeated measures ANOVA was used to investigate the differences among trials. Mauchly’s test indicated that the assumption of sphericity had been violated for all emotions except happiness. Thus, we examined the effects of trials after applying the Greenhouse-Geisser correction (except for happiness). The findings suggested that there was no significant difference among trials for any kind of emotion (as shown in Table 6).

Table 6. The effects of trials on objective emotions

In addition, we conducted two-way repeated measures ANOVA to investigate the differences among the designs of the progressive interaction. Since both factors had two levels (with/without), the assumption of sphericity was automatically met. The effects of voice and face recognition, and their interaction effect, are listed in Table 7. There was no significant difference among the four designs, nor was the interaction effect of the two factors significant.

Table 7. The effects of voice and face recognition on objective emotions

5.2 Subjective Emotions

The ratings of the different kinds of self-reported emotions in the different trials are shown in Table 8. In general, participants reported more positive emotions (e.g., happiness) and fewer negative emotions (e.g., confusion) after the trials of the progressive interaction than after the trial of the reactive approach.

Table 8. Descriptive statistics of subjective emotions

As with the objective emotions, one-way repeated measures ANOVA was used and the results are listed in Table 9. Trials had significant effects on self-reported happiness, surprise, and confusion. Thus, post hoc analysis was conducted to see the difference between any two of the five trials. Compared to the reactive approach, participants reported significantly higher scores in happiness, surprise, but lower scores in confusion for the progressive interaction approach (as shown in Table 10).

Table 9. The effects of trials on subjective emotions
Table 10. Post Hoc analysis. Paired comparisons between the reactive approach and other four designs respectively.

Findings of two-way repeated measures ANOVA suggested that there was no significant difference in different designs of the progressive interaction approach (see Table 11).

Table 11. The effects of voice and face recognition on subjective emotions

5.3 Attitudes: Naturalness, Friendliness, Affection and Interaction Intention

Descriptive statistics are listed in Table 12. Compared to the reactive approach, Xiaodu with the progressive interaction approach was generally considered more natural and friendly, and participants also reported higher affection towards it and higher intention to interact with it.

Table 12. Descriptive statistics of attitudes

Again, we conducted one-way repeated measures ANOVA to see whether the differences among the approaches were significant, and found significant differences in all four aspects of attitudes (see Table 13). Thus, post hoc analysis was conducted, and the results are listed in Table 14.

Table 13. The effects of trials on attitudes
Table 14. Post Hoc analysis. Paired comparisons between the reactive condition and other four designs respectively.

In addition, two-factor repeated measures ANOVA was used to see the effect of two factors (voice/face recognition) in different progressive interaction approach designs. Results showed no significant difference of the two factors in all four aspects (see Table 15).

Table 15. The effects of voice and face recognition on attitudes

6 Discussion

In the present study, we compared two approaches to initiate interaction: the progressive interaction approach and the reactive approach. Our main findings were that the progressive interaction approach resulted in more positive self-reported emotions, and was perceived to be more natural and friendlier. Moreover, participants reported higher affection and higher interaction intention towards the progressive interaction approach. However, no difference was found in objective emotions of the two approaches, nor in the four designs of the progressive interaction approach.

Participants reported more positive emotions and attitudes towards the progressive interaction approach than the reactive approach. The findings were also confirmed in the interview after the experiment. We found that participants successfully received the robot’s intention to interact through signals the robot sent, such as facial expressions, greetings, and self-introductions. With these signals, participants started interacting with the robot naturally. On the contrary, participants were confused about “how to initiate dialogue” with the robot in the reactive approach condition.

Interestingly, no significant difference was found in the objective emotions between the progressive interaction approach and the reactive approach, and in all trials, participants mainly exhibited neutral and positive facial expressions. Culture might be one of the factors behind this finding. Studies suggest that, compared to Westerners, Easterners are more likely to control their facial expressions, since they pay more attention to the appropriateness of expressing emotions [47]. Moreover, it is common for Chinese people to use smiles or laughter to cover up negative emotions, except in front of close others [48]. Thus, it was possible that during the experiment, participants used awkward smiles to cover up their confusion and awkwardness, and that these smiles were recognized as positive emotions by the Noldus FaceReader. This assumption was consistent with the experimenters’ observations. Moreover, participants also reported that they were confused about “how to initiate dialogue” with the robot in the reactive approach condition.

There was no significant difference among the four designs of the progressive interaction approach in any subjective evaluation. In general, participants had positive attitudes and emotions towards the progressive interaction designs, which suggested that all four designs were relatively natural and friendly. However, it is worth noting that some participants expressed concerns about face recognition. 48% of participants reported that they did not like having their faces shown on the screen, while only 16% clearly indicated the opposite attitude. The main reasons for disliking it were: (1) privacy: participants feared that the image information might be recorded, analyzed, and stored, or even used for other purposes; (2) the image itself: the image captured by Xiaodu was not ideal because of unflattering angles, which made participants look fat or ugly; (3) unnatural interaction: participants needed to switch back and forth between Xiaodu’s face and the screen, so they could not focus on Xiaodu’s face as in daily interaction, and it felt awkward to communicate with their own image on the screen. On the other hand, those who liked face recognition gave two reasons: (1) a high-tech feeling: face recognition made Xiaodu seem more intelligent; (2) more directional: participants felt seen by Xiaodu and were certain that they were the target of further interaction. Thus, we suggest that face recognition should be used with caution, especially in the process of initiating interaction.

This study had several limitations. First, the experiment was conducted in a constrained open office area, which is not necessarily a public place. A more ideal environment would be the hall of the Baidu building. Second, this was a very targeted study, and thus its generalizability needs to be considered when applying the findings. Specifically, the humanoid robot used was often perceived as adorable, and the characteristics of the robot itself may have had a positive impact on users’ emotions and attitudes. Moreover, our definitions of the far, mid, and near fields and the corresponding expectations were targeted at Chinese users, and the role Xiaodu played also shaped how we turned the progressive interaction into concrete behavior patterns. Thus, the advantages of the progressive interaction need to be validated with other types of robots and in other cultural backgrounds. Another limitation was that we did not include the proactive approach in the comparison, which we would be interested in investigating in the future.

There are still many interesting topics worth investigating in this field, to which we will also dedicate future work: for example, how to design body language in the different fields, and how to make the greetings and introductions more natural and diverse.

7 Conclusions

The present study focused on service robots in public places, and proposed a new approach for robots to initiate interaction: the progressive interaction approach. Furthermore, the approach was preliminarily validated by an experimental study, in which it was compared to a relatively traditional approach, the reactive approach. Specifically, we found that: (1) compared to the reactive approach, the progressive interaction approach led to more positive self-reported emotions, and was perceived to be more natural and friendly; participants also reported higher affection and higher interaction intention towards the progressive interaction approach; (2) there was no significant difference among the four designs of the progressive interaction approach; (3) during the process of initiating interaction, face recognition did not lead to a more positive experience but raised many concerns among users. Thus, whether to use face recognition in applications should be considered cautiously. In conclusion, our study enriched the understanding of human-robot interaction, and made a step forward in designing a natural and friendly human-robot interaction process.