Three main analyses were carried out to address the research questions. They examined how the time spent looking at the different interest areas varied with the sound condition, with the activity being viewed, and during periods of direct gaze. Preliminary analysis showed that gender did not have a significant effect, so it was not included as a covariate in the main analyses.
Effect of activity and sound on fixating on the face and hands
The first analysis looked at whether the sound and the activity influenced the degree to which participants fixated on the face as opposed to the hands. We calculated the percentage of time that participants fixated each interest area, and carried out an ANOVA with activity (monologue, manual action, misdirection) and interest area (face, hands) as the within-participant variables, and sound (sound, no sound) as the between-participants variable. Figure 2 shows the mean dwell times for each of these interest areas and videos as a function of sound condition.
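The percentage dwell time measure described above can be illustrated with a minimal Python sketch. The fixation records, the interest-area labels, and the function name `percent_dwell_time` are hypothetical and are not taken from the study's analysis pipeline. The sketch also assumes that the denominator is the total viewing time of the video, which the text does not specify.

```python
from collections import defaultdict


def percent_dwell_time(fixations, total_time_ms):
    """Return the percentage of total_time_ms spent fixating each interest area.

    fixations: iterable of (interest_area, duration_ms) pairs (hypothetical format).
    total_time_ms: viewing time used as the denominator (an assumption).
    """
    if total_time_ms <= 0:
        raise ValueError("total_time_ms must be positive")
    totals = defaultdict(float)
    for area, duration_ms in fixations:
        totals[area] += duration_ms  # sum fixation durations per interest area
    return {area: 100.0 * ms / total_time_ms for area, ms in totals.items()}


# Made-up fixations for one participant watching one 10-second video.
example = [("face", 3000), ("hands", 4500), ("face", 1500), ("other", 1000)]
dwell = percent_dwell_time(example, total_time_ms=10_000)
print(dwell)
```

In a design like the one described, a value of this kind would be computed per participant, per video, and per interest area before being entered into the ANOVA.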
Our primary interest was whether the sound condition oriented participants’ eye movements to the actor’s face; our results did indeed reveal a significant sound by interest area interaction, F(1, 70) = 4.30, p = .042, η2 = .11. The interaction was broken down by examining the simple effects. As predicted, participants spent significantly more time fixating on the face with sound than without sound [t(70) = 1.87, p = .033 (one-tailed), d = 0.31], and vice versa for the hands [t(70) = 1.68, p = .048 (one-tailed), d = 0.28]. Whilst we feel justified in using one-tailed tests to assess the simple effects, we would like to draw attention to the fact that these effects are rather small and are no longer significant under stricter criteria. There was no significant activity by sound by interest area interaction, F(2, 140) = 0.708, p = .50, η2 = .01, which suggests that this effect was independent of activity.
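To make the caveat about stricter criteria concrete, the sketch below converts the reported one-tailed p values to their two-tailed equivalents. It assumes that "stricter criteria" refers to two-tailed testing at α = .05, which is only one possible reading, and that the observed effects lie in the predicted direction, in which case the two-tailed p value of a symmetric test is twice the one-tailed value. The function name is illustrative.

```python
ALPHA = 0.05  # conventional significance threshold (assumed)


def two_tailed_from_one_tailed(p_one_tailed):
    """Two-tailed p for a symmetric test when the effect is in the predicted direction."""
    if not 0.0 <= p_one_tailed <= 0.5:
        raise ValueError("expected a one-tailed p between 0 and 0.5")
    return 2.0 * p_one_tailed


# One-tailed p values as reported for the two simple effects.
reported = {"face": 0.033, "hands": 0.048}
two_tailed = {area: two_tailed_from_one_tailed(p) for area, p in reported.items()}

for area, p in two_tailed.items():
    print(f"{area}: two-tailed p = {p:.3f}, significant at .05: {p < ALPHA}")
```

Under these assumptions the two-tailed values are .066 for the face and .096 for the hands, neither of which falls below .05.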
Our secondary interest was whether the actor’s activity influenced the extent to which participants fixated the face as opposed to the hands. There was a significant activity by interest area interaction, F(2, 140) = 652, p < .001, η2 = .90, and Bonferroni-corrected t-tests were applied to break down the effect. Participants spent significantly more time fixating on the face than the hands in the monologue video [t(71) = 19.3, p < .001, d = 2.28], but the opposite pattern was found in both the manual action [t(71) = 15.9, p < .001, d = 1.87] and misdirection [t(71) = 26.3, p < .001, d = 3.10] videos. As predicted, participants spent significantly more time looking at the hands than the face in the two videos that included informative hand movements. The ANOVA also revealed significant main effects of activity, F(2, 140) = 16.0, p < .001, η2 = .19, and interest area, F(1, 70) = 8.67, p = .004, η2 = .11. The main effect of sound was not significant, F(1, 34) = 0.67, p = .41, η2 = .009, and neither was the interaction between sound and activity, F(2, 68) = 0.71, p = .50, η2 = .01. None of these main effects or interactions are particularly meaningful in light of the predictions (Footnote 1).
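The effect sizes reported for the within-participant comparisons above appear consistent with the common conversion d = t / √n for paired t-tests, where n = df + 1. The text does not state how d was computed, so this is an assumption, and the function name below is illustrative. The sketch recomputes d from the reported t values; small discrepancies are expected because the t values are themselves rounded.

```python
import math


def cohens_d_from_paired_t(t_value, df):
    """Approximate Cohen's d for a paired t-test as t / sqrt(n), with n = df + 1."""
    n = df + 1
    return t_value / math.sqrt(n)


# (reported t, reported d) for the face vs. hands comparison in each video.
reported = {
    "monologue": (19.3, 2.28),
    "manual action": (15.9, 1.87),
    "misdirection": (26.3, 3.10),
}

recomputed = {}
for video, (t_value, d_reported) in reported.items():
    recomputed[video] = cohens_d_from_paired_t(t_value, df=71)
    print(f"{video}: recomputed d = {recomputed[video]:.3f}, reported d = {d_reported:.2f}")
```

Under this assumption the recomputed values differ from the reported ones by less than 0.01 in each case.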
Effect of task and sound on fixation to the eyes and mouth
The second analysis addressed whether hearing sound (mostly speech), as opposed to hearing no sound, would draw people’s fixations towards the mouth rather than the eyes. For this analysis, only data that fell within the eye or mouth interest areas were considered. It is clear from comparing Figs. 2 and 3 that roughly 75% (Footnote 2) of the fixations on the face are captured by the eyes and mouth. A further factorial ANOVA contrasted gaze durations to the two interest areas (eyes and mouth) by activity (monologue, manual action, misdirection) and sound condition (sound or no sound). There was a significant interest area by sound interaction, F(2, 140) = 3.99, p = .05, η2 = .054. Contrary to our prediction (that sound would increase gaze to the mouth), Bonferroni-corrected post-hoc comparisons showed that participants spent significantly more time fixating the eyes when there was sound than when there was no sound [t(70) = 2.70, p = .009, d = 0.64], but the difference between the sound conditions was not significant for the mouth [t(70) = 0.52, p = .60, d = 0.13]. Hearing the corresponding sound therefore influenced the amount of time spent fixating the eyes but not the mouth.
There was no significant interest area by sound by activity interaction, F(2, 140) = 1.01, p = .37, η2 = .014. There was a significant main effect of sound, F(1, 70) = 6.28, p = .015, η2 = .082, but no significant main effect of interest area, F(1, 70) = 0.966, p = .80, η2 = .001, and no significant interest area by activity interaction, F(2, 140) = 2.42, p = .092, η2 = .033. There was also a significant sound by activity interaction, F(2, 140) = 6.33, p = .002, η2 = .083, and a significant main effect of activity, F(2, 140) = 215, p < .001, η2 = .76.
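The effect sizes reported alongside each F ratio can be related back to the test statistic. Assuming that η2 denotes partial eta squared (the text does not say so explicitly), it can be recovered from F and its degrees of freedom as F·df1 / (F·df1 + df2). The sketch below applies this to four of the effects reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported effect size) for effects in the second analysis.
effects = {
    "sound": (6.28, 1, 70, 0.082),
    "sound x activity": (6.33, 2, 140, 0.083),
    "interest area x sound x activity": (1.01, 2, 140, 0.014),
    "interest area x activity": (2.42, 2, 140, 0.033),
}

recovered = {}
for name, (f_value, df1, df2, eta_reported) in effects.items():
    recovered[name] = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: recovered = {recovered[name]:.3f}, reported = {eta_reported:.3f}")
```

For these four effects the recovered values agree with the reported ones to three decimal places, which supports, but does not confirm, the partial eta squared interpretation.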
Effect of gaze direction on fixations to the face
Finally, we examined the extent to which establishing eye contact made people fixate on the face. For each video, we manually coded (frame by frame) whether the actor’s gaze was directed towards the camera (direct) or elsewhere (averted). In the monologue the actor maintained direct eye contact 48% of the time, compared to 15% in the manual action and 33% in the misdirection condition. Figure 4 shows the percentage of time spent fixating on the face as a function of activity and of whether the actor maintained direct eye contact or averted their gaze. An ANOVA with activity (monologue, manual action, misdirection) and gaze direction (direct, averted) as within-subjects factors found a significant main effect of activity, F(2, 140) = 287, p < .0005, η2 = .80, mirroring the findings of the earlier analysis (activities that required hand actions oriented gaze to the hands more). More importantly, there was a significant main effect of gaze direction, F(2, 140) = 795, p < .00005, η2 = .92, and a significant gaze direction by activity interaction, F(2, 140) = 235, p < .00005, η2 = .77. The activity being viewed modulated the extent to which people’s eyes were drawn towards direct eye gaze. Participants spent more time looking at the face when direct eye contact was established in every activity (all p < .0005). To compare the size of this effect across activities, difference scores were calculated by subtracting the % dwell time when gaze was averted from the % dwell time when gaze was directed towards the observer. Bonferroni-corrected post-hoc t-tests on these scores revealed that the difference in fixations to the face as a function of gaze direction was significantly smaller in the monologue than in both the manual task [t(71) = 20.7, p < .0005, d = 2.44] and the misdirection condition [t(71) = 11.1, p < .0005, d = 1.31]. Moreover, the difference was significantly greater in the manual than in the misdirection task, t(71) = 11.4, p < .0005, d = 1.35.
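The difference-score comparison described above can be sketched as follows. All percentages are invented for five hypothetical participants, and the helper names are illustrative. Dividing α by the number of comparisons is one standard way to apply a Bonferroni correction; the text does not state whether the threshold or the p values were adjusted.

```python
import math
from statistics import mean, stdev


def difference_scores(direct, averted):
    """Per-participant gaze effect: % dwell on the face (direct) minus % dwell (averted)."""
    return [d - a for d, a in zip(direct, averted)]


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Invented % dwell time on the face for five participants (not study data).
face_pct = {
    "monologue": {"direct": [80, 85, 78, 90, 82], "averted": [75, 80, 70, 86, 79]},
    "manual action": {"direct": [60, 55, 65, 58, 62], "averted": [10, 12, 8, 15, 11]},
    "misdirection": {"direct": [50, 45, 52, 48, 55], "averted": [20, 18, 25, 22, 24]},
}
gaze_effect = {
    activity: difference_scores(v["direct"], v["averted"])
    for activity, v in face_pct.items()
}

comparisons = [
    ("monologue", "manual action"),
    ("monologue", "misdirection"),
    ("manual action", "misdirection"),
]
alpha_per_test = 0.05 / len(comparisons)  # Bonferroni-adjusted threshold

results = {}
for a, b in comparisons:
    results[(a, b)] = paired_t(gaze_effect[a], gaze_effect[b])
    t_value, df = results[(a, b)]
    print(f"{a} vs {b}: t({df}) = {t_value:.2f}, compare p against {alpha_per_test:.4f}")
```

The sketch stops at the t statistic; obtaining p values would require the t distribution, for example from a statistics library.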
Thus, eye contact was significantly more effective at driving fixations to the face during the viewing of the manual activities than during the monologue. We also ran the same analysis with sound included as a variable, but neither the main effect of sound nor any of the interactions involving sound was significant (all p > .09).
Table 2 shows the engagement scores for each of the videos as a function of sound condition. Data from three participants are missing due to data loss. An ANOVA with activity as a within-subjects factor and sound condition as a between-subjects factor found a significant main effect of activity, F(2, 135) = 50.2, p < .0005, η2 = .43, but no significant main effect of sound, F(1, 67) = 2.27, p = .12, η2 = .038, and no significant sound by activity interaction, F(2, 134) = 2.00, p = .14, η2 = .029. Whilst there were clear differences in engagement between the videos, sound did not significantly influence participants’ reported level of engagement.