Experimental Procedure and Evaluation
The study was conducted remotely; each participant received links to the videos, an electronic consent form, and online questionnaires with the study instructions. After signing the consent form and reading the instructions, participants completed a practice session followed by 12 study sessions. Each session presented one of the six pairings of gaze patterns listed in Table 2, for a single condition out of the three listed in Table 1, so that each participant watched all six pairs of gaze patterns twice, once for version a and once for version b of their condition. To reduce the recency effect of participants forgetting previous conditions, counterbalanced pairwise comparisons were performed instead of three-way comparisons, and all six pairwise comparisons were combined into a rank-ordered list of the three gaze patterns [18]. In each session, participants watched two handover videos consecutively. The different objects and postures used in the experiment are shown in Figs. 4 and 3, respectively.
The instructions at the start of the experiment, as well as the caption of each video, stated that participants should pay close attention to the robot’s eyes in the video. After every two videos, the participants were asked to fill out a questionnaire that collected the subjective measures detailed below. The questionnaire was identical to the one used in our previous study [6] and in Zheng et al.’s study [3]. Questions 1 and 2 measure the metric likability (Cronbach’s \(\alpha = 0.83\)), Questions 3 and 4 measure the metric anthropomorphism (Cronbach’s \(\alpha = 0.91\)), and Question 5 measures the metric timing communication; a sketch of the reliability computation follows the questionnaire items.
1) Which handover did you like better? (1st or 2nd)
2) Which handover seemed more friendly? (1st or 2nd)
3) Which handover seemed more natural? (1st or 2nd)
4) Which handover seemed more humanlike? (1st or 2nd)
5) Which handover made it easier to tell when, exactly, the robot wanted the giver to give the object? (1st or 2nd)
6) Any other comments (optional)
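The Cronbach’s \(\alpha\) values reported above can be computed directly from the raw item responses. The following is a minimal sketch, assuming each paired item (e.g., Questions 1 and 2) is coded as a 0/1 forced choice per participant and session; the function name and example data are illustrative and not taken from the study.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative responses: 0 = chose 1st handover, 1 = chose 2nd handover
likability_items = np.array([[1, 1], [0, 0], [1, 0], [1, 1], [0, 0], [1, 1]])
print(f"alpha = {cronbach_alpha(likability_items):.2f}")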
Table 1 Study Conditions (24 participants per condition)
Table 2 Six pairings of the three gaze patterns and their reverse order for each object or posture. Each participant experienced two versions (a/b of a single condition) of these pairings, for a total of 12 pairings
Table 3 Combined preferences of gaze behaviors in the video study for the small and large object conditions
Table 4 Combined preferences of gaze behaviors in the video study for the non-fragile object and fragile object conditions
Table 5 Combined preferences of gaze behaviors in the video study for the standing and sitting conditions
Experimental Design
The experiment used a mixed between-within design, with likability, anthropomorphism, and timing communication as the dependent variables. The participants were divided into three groups of 24, and each group performed one of the three study conditions listed in Table 1. The order of the 12 sessions was randomized and counterbalanced among the subjects.
Analysis
The participants’ ratings of the likability and anthropomorphism of the gaze behaviors were obtained by averaging their responses to Questions 1-2 and 3-4, respectively. A one-sample Wilcoxon signed-rank test was used to check whether participants exhibited any bias towards selecting the first or the second handover. As in our previous work [6] and Zheng et al.’s work [3], the Bradley-Terry model [19] was used to evaluate participants’ rankings of the likability, anthropomorphism, and timing communication of the gaze behaviors. To evaluate hypothesis H1, i.e. \(P_i \ne P_j \,\forall i \ne j\), where \(P_i\) is the probability that one gaze condition is preferred over the others, the \(\chi^2\) values for each metric were computed as proposed by Yamaoka et al. [20]:
$$\begin{aligned} B = n \sum _{i<j} \log (P_i + P_j) - \sum _{i} a_i \log P_i, \end{aligned}$$
(1)
$$\begin{aligned} \chi ^2 = n g (g-1) \ln 2 - 2 B \ln 10, \end{aligned}$$
(2)
where \(g = 3\) is the number of gaze behaviors, n is the number of participants, and \(a_i\) is the sum of ratings in each row of Tables 3-7 (Appendix).
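For concreteness, the snippet below is a minimal sketch of how Eqs. (1)-(2) could be evaluated. It assumes the preference probabilities \(P_i\) have already been fitted with the Bradley-Terry model and that the logarithm in Eq. (1) is base 10 (an assumption suggested by the \(\ln 10\) conversion factor in Eq. (2)); the variable names and numbers are illustrative only, not the study’s data.

import numpy as np
from itertools import combinations

def bradley_terry_chi2(P, a, n):
    """Goodness-of-fit statistic from Eqs. (1)-(2).

    P : fitted preference probabilities of the g gaze behaviors
    a : per-behavior rating sums (row sums of the preference tables)
    n : number of participants
    """
    g = len(P)
    # Eq. (1): B = n * sum_{i<j} log10(P_i + P_j) - sum_i a_i * log10(P_i)
    B = n * sum(np.log10(P[i] + P[j]) for i, j in combinations(range(g), 2)) \
        - sum(a_i * np.log10(P_i) for a_i, P_i in zip(a, P))
    # Eq. (2): chi^2 = n * g * (g - 1) * ln(2) - 2 * B * ln(10)
    chi2 = n * g * (g - 1) * np.log(2) - 2 * B * np.log(10)
    return B, chi2

# Illustrative numbers only
P = np.array([0.55, 0.30, 0.15])   # e.g., Face-Hand-Face / Hand-Face / Hand
a = np.array([38, 25, 9])          # hypothetical row sums
B, chi2 = bradley_terry_chi2(P, a, n=24)
print(f"B = {B:.2f}, chi^2 = {chi2:.2f}")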
In order to examine H2-H4, we conducted two series of tests for each measured metric (likability, anthropomorphism, and timing communication) and for each study scenario:
- Binary proportion difference tests for matched pairs [21], in which the difference between the proportion \(p_b\) of participants who chose one gaze condition and the proportion \(p_c\) who chose the other was evaluated in each study scenario (a computational sketch follows this list). The distribution of the difference \(p_b - p_c\) is:
$$\begin{aligned} p_b - p_c \sim {\mathcal {N}}\left( 0,\,\sqrt{\frac{p_b + p_c - (p_b - p_c)^2}{n}}\right) , \end{aligned}$$
(3)
where \(n = 24\) is the number of participants in each scenario. The Z-score is calculated according to the following formula:
$$\begin{aligned} Z = \frac{p_b - p_c}{\sqrt{\mathrm{var}(p_b - p_c)}} \end{aligned}$$
(4)
A Z-score close to zero indicates that the observed difference in proportions is consistent with a zero-mean distribution, i.e., with no preference shift between the two scenarios.
- Equivalence tests based on McNemar’s test for matched proportions [22, 23], in which the proportion of participants who changed their gaze preferences in each study scenario was compared within equivalence bounds of \(\Delta = \pm 0.1\).
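The first of these tests can be computed directly from Eqs. (3)-(4). The snippet below is a minimal sketch, assuming \(p_b\) and \(p_c\) are the matched proportions observed for the two scenarios and n = 24; the values are illustrative, not the study’s data, and the McNemar-based equivalence test is not shown.

import numpy as np
from scipy.stats import norm

def matched_pairs_z(p_b, p_c, n):
    """Z-score for the difference of matched proportions, Eqs. (3)-(4)."""
    var = (p_b + p_c - (p_b - p_c) ** 2) / n   # variance of p_b - p_c
    z = (p_b - p_c) / np.sqrt(var)
    p_value = 2 * (1 - norm.cdf(abs(z)))       # two-sided p-value
    return z, p_value

# Illustrative proportions
z, p = matched_pairs_z(p_b=0.62, p_c=0.54, n=24)
print(f"Z = {z:.2f}, p = {p:.3f}")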
Table 6 Combined preferences of gaze behaviors in the in-person study for the small and large object conditions
Table 7 Combined preferences of gaze behaviors in the in-person study for the non-fragile object and fragile object conditions
Results
Quantitative Results
To test for order effects, we checked for bias towards selecting the first or the second handover and found none [like: z = -0.68, p = 0.50; friendly: z = 1.22, p = 0.22; natural: z = 0.20, p = 0.84; humanlike: z = 1.36, p = 0.17; timing communication: z = 1.23, p = 0.22].
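As a sketch of how this order-bias check could be run with the one-sample Wilcoxon signed-rank test mentioned above: assuming each choice is coded as 0 for the first handover and 1 for the second and tested against a hypothesized median of 0.5, with illustrative data rather than the study’s responses.

import numpy as np
from scipy.stats import wilcoxon

# Illustrative choices: 0 = picked the 1st handover, 1 = picked the 2nd
choices = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# One-sample Wilcoxon signed-rank test against a hypothesized median of 0.5
# (no bias towards either position); the z-scores reported in the text come
# from the normal approximation of this test.
res = wilcoxon(choices - 0.5)
print(f"W = {res.statistic}, p = {res.pvalue:.2f}")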
Tables 3-5 (Appendix) and Figs. 5-7 show the robot gaze preferences of the participants in terms of likability, anthropomorphism, and timing communication.
The gaze conditions differ significantly in their ratings (all \(\chi^2\) values are large, \(p < 0.0001\)), supporting H1. Participants prefer the Face-Hand-Face transition gaze over the Hand-Face and Hand gazes, and the Hand gaze is the least preferred condition.
Based on the binary proportion difference tests, we did not find evidence that the proportion of observers of a handover preferring one gaze condition over the other is affected by object size (Table 9, Appendix), object fragility (Table 10, Appendix), or the user’s posture (Table 11, Appendix). Hypotheses H2, H3, and H4 are therefore not supported (all p values are above 0.2).
However, based on the equivalence tests, we also did not find evidence that this proportion is equivalent across the two object sizes (Table 9, Appendix), object fragilities (Table 10, Appendix), or user postures (Table 11, Appendix). Thus, hypotheses H2, H3, and H4 can also not be rejected (all p values are above 0.15).
Open-ended Responses
All open-ended responses are presented in [17] with major insights detailed below.
Ten out of 72 participants gave at least one additional comment. Four of the eight participants who made Hand-Face gaze vs. Face-Hand-Face gaze comparisons preferred the Face-Hand-Face gaze over the Hand-Face gaze because of the robot’s extended eye contact.
P059 - “As much eye contact as possible.”
P048 - “I preferred handover 2 (Face-Hand-Face gaze) because the robot looked more at the human”
Two participants mentioned that they could not distinguish between the Face-Hand-Face gaze and the Hand-Face gaze, while two participants commented on the advantages and disadvantages of the two gaze patterns.
P041 - “In handover 1 (Hand-Face gaze) you could tell that the robot was ready to receive the object. However, handover 2 (Face-Hand-Face gaze) felt more humanized because the robot looked at the giver’s eyes right until the transfer was made”.
Four out of the six participants who commented on the comparison between the Hand-Face gaze and the Hand gaze preferred the Hand-Face gaze because of the eye movement.
P008 - “In my opinion, the change in eye movement creates a better human-robot interaction.”
P009 - “In the second handover (Hand-Face gaze) the eye movement, gave a good indication for the communication.”
Two participants mentioned that they could not distinguish between Hand-Face gaze and Hand gaze.
Six participants commented on the Face-Hand-Face gaze vs. Hand gaze comparison, and all of them preferred the Face-Hand-Face gaze over the Hand gaze.
P009 - “At handover 2 (Face-Hand-Face gaze), the robot looked at the object precisely when it wanted to take it, so it was perceived more understandable.”
P037 - “In my opinion video 2 (Face-Hand-Face gaze) best simulated human-like behavior out of all the videos I have seen so far.”
Table 8 Combined preferences of gaze behaviors in the in-person study for the standing and sitting conditions
Table 9 Results of the binary proportion difference test and equivalence test for matched pairs comparing small-object and large-object users’ preferences of robot gaze in handovers. The gaze condition in bold is the preferred choice in each pairwise comparison
Table 10 Results of the binary proportion difference test and equivalence test for matched pairs comparing fragile-object and non-fragile-object users’ preferences of robot gaze in handovers. The gaze condition in bold is the preferred choice in each pairwise comparison
Table 11 Results of the binary proportion difference test and equivalence test for matched pairs comparing sitting and standing users’ preferences of robot gaze in handovers. The gaze condition in bold is the preferred choice in each pairwise comparison