
Examining joint attention with the use of humanoid robots—A new approach to study fundamental mechanisms of social cognition

Abstract

This article reviews methods to investigate joint attention and highlights the benefits of new methodological approaches that make use of the most recent technological developments, such as humanoid robots, for studying social cognition. After reviewing classical approaches that address joint attention mechanisms with the use of controlled screen-based stimuli, we describe recent accounts that have proposed the need for more natural and interactive experimental protocols. Although the recent approaches allow for more ecological validity, they often face the challenge of maintaining experimental control in more natural social interaction protocols. In this context, we propose that the use of humanoid robots in interactive protocols is a particularly promising avenue for targeting the mechanisms of joint attention. Using humanoid robots to interact with humans in naturalistic experimental setups has the advantage of combining excellent experimental control with ecological validity. In clinical applications, it offers new techniques for both diagnosis and therapy, especially for children with autism spectrum disorder. The review concludes with indications for future research, in the domains of healthcare applications and human–robot interaction in general.

Introduction

In this review, we describe a novel approach for studying the mechanisms of joint attention, namely the use of robot agents as dynamic “social stimuli” in naturalistic interactive scenarios. We argue that such a method provides more ecological validity than do classical screen-based protocols, while simultaneously allowing excellent experimental control. After a brief review of classical studies on joint attention and the more recent approaches, we focus on the approach of using embodied robots in interactive scenarios. In the final section, we describe application areas in which robots are used to train joint attention skills in children diagnosed with autism spectrum disorder (ASD). Using robots for examining joint attention (and social cognition in general) is very timely, due to the recent emergence of new approaches in the study of human social cognition, the so-called “Second-person Neuroscience” (Schilbach et al., 2013), new developments in clinical applications (Pennisi et al., 2016), and a current strong focus of academia, industry, and society on artificial intelligence, robotics, human–robot interaction, and the societal and economic impact of new digital technologies (Manyika et al., 2013).

Classical studies on joint attention

Joint attention, a fundamental mechanism of social cognition (Frischen, Bayliss, & Tipper, 2007; Jording, Hartz, Bente, Schulte-Rüther, & Vogeley, 2018), has been widely studied in laboratory settings with the use of screen-based tasks. Joint attention is observed as the phenomenon of attending in the same direction, or to the same object or event, to which another person is attending (Emery, 2000). The ability to discriminate between straight and averted gaze appears early in development (i.e., in 2-day-old babies; Farroni, Csibra, Simion, & Johnson, 2002; see also Vecera & Johnson, 1995), and it is considered a valid predictor of efficient development of linguistic abilities (e.g., Brooks & Meltzoff, 2005).

In the last 20 years, joint attention has been studied by using pictures or schematic faces presented to participants on a computer screen, and it is often operationalized as a modification of Posner’s cueing paradigm (Posner, 1980): the gaze-cueing paradigm. In a typical experimental condition, represented in Figs. 1a and 1b, participants view a schematic or realistic picture of a face presented in the center of the display. The first image is then replaced with the same image with eyes averted to the left or to the right (i.e., the gaze cue). Finally, a target may appear in the location signaled by the eyes (i.e., validly cued trials) or in the opposite location (i.e., invalidly cued trials). The averted gaze represents the cue, and its predictivity regarding target location is usually one of the variables manipulated in such paradigms. As in the classic spatial-cueing paradigm, responses are faster for validly than for invalidly cued trials (i.e., the gaze-cueing effect), indicating that attention is oriented in the direction signaled by the gaze and that switching focus to the uncued location is costly. One of the first studies investigating this phenomenon was carried out by Friesen and Kingstone (1998; see also Driver et al., 1999). Electrophysiological and neuropsychological evidence has highlighted the relationship between gaze direction and attention, indicating the existence of a specific neural substrate devoted to processing meaningful gaze direction (i.e., gaze directed toward an object rather than toward empty space), such as the superior temporal sulcus (STS; Allison, Puce, & McCarthy, 2000; Hoffman & Haxby, 2000; Pelphrey, Singerman, Allison, & McCarthy, 2003; Perrett et al., 1985). The STS has input–output connections to and from the fronto-parietal attentional networks (Corbetta, Miezin, Shulman, & Petersen, 1993; Corbetta & Shulman, 2002; Harries & Perrett, 1991; Nobre et al., 1997; Rafal, 1996). Through these connections, information about gaze direction reaches spatial attention systems, which orient attention in the corresponding direction, as occurs in joint attention.
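As a rough illustration of the trial logic just described, the sketch below simulates a gaze-cueing block and computes the cueing effect as the mean reaction time (RT) difference between invalidly and validly cued trials. All names and parameter values (RT means, standard deviations) are our own illustrative assumptions, not taken from any of the cited studies.

```python
import random
from statistics import mean

def run_gaze_cueing_block(n_trials=400, validity=0.5, seed=1):
    """Simulate one block of a gaze-cueing task.

    On each trial a central face's gaze is averted left or right (the cue);
    the target then appears on the cued side (valid trial) or the opposite
    side (invalid trial). RTs are drawn from illustrative distributions in
    which valid trials are assumed ~20 ms faster than invalid ones.
    """
    rng = random.Random(seed)
    rts = {"valid": [], "invalid": []}
    for _ in range(n_trials):
        cue_side = rng.choice(["left", "right"])
        is_valid = rng.random() < validity  # cue predictivity (50% = nonpredictive)
        base_rt = 350 if is_valid else 370  # hypothetical mean RTs in ms
        rts["valid" if is_valid else "invalid"].append(rng.gauss(base_rt, 30))
    return rts

rts = run_gaze_cueing_block()
# Positive value = facilitation at the gazed-at location (the gaze-cueing effect)
cueing_effect = mean(rts["invalid"]) - mean(rts["valid"])
```

With a nonpredictive cue (50% validity), a positive cueing effect here reflects the simulated facilitation at the gazed-at location, analogous to the faster responses on validly cued trials reported in the literature.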

Fig. 1

Examples of classical and novel paradigms used to study joint attention. (a) A gaze-cueing paradigm with schematic faces for congruent (upper frame) and incongruent (lower frame) trials (Friesen & Kingstone, 1998). From Ciardo et al., 2018. (b) Experimental setup in a gaze-following task using avatar faces. From “Studying the Influence of Race on the Gaze Cueing Effect Using Eye Tracking Method,” by G. Y. Menshikova, A. I. Kovalev, and E. G. Luniakova, 2017, National Psychological Journal, 2, p. 50, Fig. 1. Copyright 2017 by Lomonosov Moscow State University and the Russian Psychological Society (Menshikova, Kovalev, & Luniakova, 2017). (c) Adapted gaze-cueing procedure for gaze cueing in a real-world experimental setup. From “Mental State Attribution and the Gaze Cueing Effect,” by G. G. Cole, D. T. Smith, and M. A. Atkinson, 2015, Attention, Perception, & Psychophysics, 77, Fig. 5. Copyright 2015 by the Psychonomic Society (Cole, Smith, & Atkinson, 2015). (d) Gaze-cueing task in human–robot interaction (paradigm of Kompatsiari, Ciardo, et al., 2018).

Bottom-up and top-down components in joint attention

Early behavioral and electrophysiological studies investigating the gaze-cueing effect showed that the orienting of attention triggered by averted gaze can be defined as automatic (Jonides, 1981). Indeed, it has been shown that the gaze-cueing effect emerges early in time (Friesen & Kingstone, 1998; Frischen et al., 2007), and is not affected by the nature of the task (Friesen & Kingstone, 1998), by gaze predictivity (Driver et al., 1999), or by a secondary, resource-demanding task (i.e., a memory task; Law, Langton, & Logie, 2010). Event-related potentials (ERPs) showed that the occipital–parietal P1 and N1 components are modulated by gaze validity, indicating that visual processing already in the extrastriate cortex is modulated by gaze cues (Perez-Osorio, Müller, & Wykowska, 2017; Schuller & Rossion, 2001). Furthermore, Ricciardelli, Bricolo, Aglioti, and Chelazzi (2002) developed a prosaccade/antisaccade task to investigate whether observed averted gaze can interfere with goal-driven saccades (i.e., the gaze-following paradigm; see also Ciardo, Marino, Actis-Grosso, Rossetti, & Ricciardelli, 2014; Ciardo, Marino, Rossetti, Actis-Grosso, & Ricciardelli, 2013; Ricciardelli, Carcagno, Vallar, & Bricolo, 2013, for results using the same paradigm). Saccadic performance is less accurate when the gaze cue is incongruent with the saccade instruction. Recent studies, however, suggest that joint attention may not be purely bottom-up driven, but rather reflects a combination of bottom-up and top-down mechanisms.
Several factors have been identified that exert top-down modulation of the gaze-cueing effect: relevance for the task (e.g., Ricciardelli et al., 2013), other stimuli in the environment (e.g., Greene, Mooshagian, Kaplan, Zaidel, & Iacoboni, 2009; Ristic & Kingstone, 2005), whether the gazing agent is assumed to see the target (Teufel, Alexis, Clayton, & Davis, 2010), the believed reliability of the gazing agent (Wiese, Wykowska, & Müller, 2014), and whether the gaze is in line with action expectations (Perez-Osorio, Müller, Wiese, & Wykowska, 2015; Perez-Osorio et al., 2017). Furthermore, social information associated with the observed agent also plays a role in the gaze-cueing effect: age (e.g., Ciardo et al., 2014; Ciardo et al., 2013), social status (e.g., Ciardo et al., 2013; Dalmaso, Pavan, Castelli, & Galfano, 2012), social attitude (Carraro et al., 2017; Ciardo, Ricciardelli, Lugli, Rubichi, & Iani, 2015), and assumed intentionality (Wiese, Wykowska, Zwickel, & Müller, 2012; Wykowska, Wiese, Prosser, & Müller, 2014). Taken together, these results highlight a link between joint attention and other (higher-level) mechanisms of cognition (see Capozzi & Ristic, 2018, for a review), suggesting that engagement in joint attention in everyday life may depend on contextual and social information.

Joint attention, development, and individual differences

Gaze-following behavior plays a pivotal role in development. For example, children as young as 3 months are able to discriminate averted gaze and to shift attention to the corresponding location (Hood, Willen, & Driver, 1998). Moreover, longitudinal studies have shown that an early onset of gaze following predicts efficient development of linguistic abilities (e.g., Brooks & Meltzoff, 2005). Several studies have shown that joint attention depends on individual differences, such as self-esteem (Wilkowski, Robinson, & Friesen, 2009), gender (Bayliss & Tipper, 2006), and autistic traits (Bayliss, di Pellegrino, & Tipper, 2005). For instance, Bayliss et al. (2005) reported a negative correlation between the magnitude of the gaze-cueing effect and the score on the Autism-Spectrum Quotient questionnaire (Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001). Similarly, Ristic and Kingstone (2005) showed that adults diagnosed with high-functioning autism show the gaze-cueing effect only when gaze direction is informative with respect to the possible location of the target, suggesting that for adults diagnosed with autism spectrum disorder gaze direction does not have the special status typically observed in healthy controls. A study investigating joint attention in patients suffering from chronic schizophrenia showed a weaker gaze-cueing effect (Akiyama et al., 2008), whereas standard cueing effects were reported for nonsocial cues (i.e., arrows) and pointing gestures (Dalmaso, Galfano, Tarqui, Forti, & Castelli, 2013; see Marotta et al., 2014, for similar results with ADHD patients). Langdon and colleagues (2017) showed that when pictures of real faces instead of schematic faces are used, the larger gaze-cueing effect reported in schizophrenia patients can be attributed to a difficulty in disengaging from the gazed-at location once shared attention is established (Langdon, Seymour, Williams, & Ward, 2017).
Altogether, these findings strongly support the idea that the ability to respond to joint attention signals and the development of communicative and social skills are strongly connected. However, classical studies use pictures or schematic faces presented to participants on a computer screen and focus mainly on responding to joint attention. Such classical paradigms contribute to understanding the cognitive and neural mechanisms of joint attention, but they lack both the reciprocity of social interaction and ecological validity (Schilbach, 2015).

Recent approaches to study joint attention, highlighting the need for reciprocity

Recently, a new framework has been proposed according to which studying the mechanisms of social cognition requires experimental paradigms involving more “online” social interaction (Bolis & Schilbach, 2018; Edwards, Stephenson, Dalmaso, & Bayliss, 2015; Kajopoulos, Cheng, Kise, Müller, & Wykowska, in press; Risko, Laidlaw, Freeth, Foulsham, & Kingstone, 2012; Risko, Richardson, & Kingstone, 2016; Schilbach, 2014, 2015; Schilbach et al., 2013).

There is evidence that the static stimuli used in traditional paradigms do not evoke the same mechanisms of responding to joint attention as more dynamic social stimuli (for a review, see Risko et al., 2012). To begin with, even though Hietanen and Leppänen (2003), using static gaze cues, found a similar gaze-cueing effect across emotions (happy, sad, fearful), Putman and colleagues, using more complex dynamic representations of emotion and gaze, found that the gaze-cueing effect was modulated by emotion—that is, a larger cueing effect for fearful than for happy faces (Putman, Hermans, & van Honk, 2006). This modulation of the gaze-cueing effect by emotion might be related to emotion processing per se, which seems to be enhanced for dynamic stimuli (Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004; Sato & Yoshikawa, 2007). Importantly, studies have also examined the classical gaze-cueing paradigm using another human as a central cue. For example, Cole, Smith, and Atkinson (2015) examined the effect of mental state attribution on the gaze-cueing effect during a human–human interaction. They found a robust gaze-cueing effect even when the person’s view of the targets was occluded (a mental state of “not seeing”; see Fig. 1c), which contrasts with previous screen-based studies in which the gaze-cueing effect was modulated by the belief about whether the gazer could or could not see through a pair of goggles (Teufel et al., 2010). Interestingly, Cole and colleagues found a gaze-cueing effect approximately three times larger than that for standard screen-based stimuli (see Lachat, Conty, Hugueville, & George, 2012, for a different pattern of results when only the eyes are used as a cue instead of whole-head movements).

The abovementioned studies provide evidence that using more dynamic and naturalistic social stimuli in joint attention research might lead to different findings than static, screen-based stimuli. This is further confirmed by several efforts to study the mechanisms of joint attention in the “wild”—that is, in situations that involve or have the potential for real social interaction (for a review, see Risko et al., 2012). Here, too, the evidence suggests that results from laboratory paradigms are not necessarily valid in natural, real-world situations. For example, Gallup and colleagues showed that participants were more likely to follow the cues of confederates toward an attractive object when the confederates were walking in the same direction as them on the street (so that participants’ gaze direction could not be seen by the confederate), as compared to the opposite direction (in which case participants’ gaze direction could be detected by the confederate) (Gallup, Chong, & Couzin, 2012a). Interestingly, when the “pedestrians” were facing them, participants not only did not follow their gaze, but were also less likely to look at the attractive object than in the baseline condition, in which no one had looked at the object before (see also Gallup et al., 2012b, for similar results). Hayward, Voorhies, Morris, Capozzi, and Ristic (2017) compared gaze following between a real-world interaction and a typical laboratory task. During the real-world interaction, a confederate held an everyday conversation with the participant while maintaining eye contact, but shifted his/her gaze on five different occasions. Response to joint attention was operationalized as the proportion of the confederate’s gaze shifts that were followed by the participant. In the laboratory paradigm, participants performed a typical nonpredictive gaze-cueing task with a schematic face. In this task, response to joint attention was operationalized during the cue presentation period as the proportion of trials in which participants broke fixation on the central cue and executed a saccade toward the gazed-at location. Additionally, the authors measured the traditional gaze-cueing effect as reflected in reaction times to target detection. Although the attentional-shifting results were statistically reliable and consistent with the existing literature in both paradigms (real-world, laboratory), a comparison between the experiments showed no reliable association between attentional shifting in the cueing task and in the real-world interaction. So far, studies “in the wild” show that findings collected in the laboratory do not necessarily reveal all factors playing a role in social cognition (for a review, see Risko et al., 2012).

The need for more naturalistic online social interaction protocols is even clearer with respect to the mechanism of initiating joint attention (rather than only responding to joint attention bids). From this perspective, researchers have started using virtual agents in experiments addressing the initiation of joint attention (Bayliss et al., 2013; Dalmaso, Edwards, & Bayliss, 2016; Edwards et al., 2015; Schilbach et al., 2009). Virtual agents can provide high levels of behavioral realism—for instance, in mimicking human eye movement capabilities with respect to appearance and timing (Admoni & Scassellati, 2017). To address the issue of reciprocity in social interaction—for example, gaze contingency—some studies employed an experimental setup with an interactive eye-tracking system monitoring participants’ gaze position on a stimulus screen and controlling the gaze behavior of an anthropomorphic virtual character (Pfeiffer, Timmermans, Bente, Vogeley, & Schilbach, 2011; Schilbach et al., 2006; Wilms et al., 2010). By programming a virtual agent’s gaze behavior to be contingent on the participant’s gaze, Schilbach et al. (2009) compared the neural correlates of initiating and responding to joint attention. The authors found that, whereas following someone else’s gaze activated the anterior portion of the medial prefrontal cortex (MPFC), seeing someone else follow one’s own gaze direction additionally activated the ventral striatum, an area associated with different stages of reward processing, such as hedonic and motivational aspects (Liu et al., 2007; Rolls, Grabenhorst, & Parris, 2008), thereby highlighting that reciprocity in joint attention has an impact on crucial engagement factors. Moreover, Redcay et al. (2010) developed an experimental setup that allowed the examination of face-to-face interactions between a participant inside an MRI scanner and an experimenter outside of the scanner, through a real-time video feed of either live or previously recorded interaction. The experimenter and the participant were engaged in a game with the common goal of finding a target (Redcay, Kleiner, & Saxe, 2012). In each trial, the participant either responded to joint attention by following the experimenter’s gaze to the target object (only the experimenter could see the clue about the location) or initiated joint attention by cueing the experimenter to look at the object (only the participant could see the clue about the location). In contrast to previous studies (Schilbach et al., 2009), this paradigm required the intentional coordination of attention toward a common goal. The study found that the dorsomedial prefrontal cortex (dMPFC) was activated both when responding to and when initiating joint attention. However, initiating joint attention specifically recruited regions associated with attention-orienting and cognitive control systems (see Caruana, McArthur, Woolgar, & Brock, 2017, for an extensive review of fMRI studies of joint attention).
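The gaze-contingent manipulation used in these virtual agent setups boils down to a simple decision rule: once the eye tracker reports the participant's fixation, the agent either follows it or looks elsewhere. The following sketch is a hypothetical simplification; the function and parameter names are ours and do not come from the cited studies.

```python
import random

def agent_gaze_response(participant_fixation, objects, p_follow, rng):
    """Decide where a gaze-contingent virtual agent looks next.

    p_follow is the probability that the agent follows the participant's
    fixation (establishing joint attention); otherwise it looks at one of
    the other objects. In the real setups this choice is driven online by
    an eye tracker reporting the participant's current fixation.
    """
    if rng.random() < p_follow:
        return participant_fixation  # follow: respond to the joint attention bid
    others = [obj for obj in objects if obj != participant_fixation]
    return rng.choice(others)        # avert: look elsewhere

rng = random.Random(0)
objects = ["A", "B", "C"]
always_follows = agent_gaze_response("B", objects, p_follow=1.0, rng=rng)
never_follows = agent_gaze_response("B", objects, p_follow=0.0, rng=rng)
```

Setting p_follow between 0 and 1 yields agents of intermediate "disposition," which is the kind of parametric manipulation these interactive paradigms afford.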

At a behavioral level, Bayliss et al. (2013) developed a gaze-leading paradigm in which participants were asked to freely choose an object by looking at it. A centrally presented face would then either gaze in the same direction (gaze congruent) or in the opposite direction (gaze incongruent). After selecting the object, participants were required to look back at the central face (Bayliss et al., 2013). In line with the developmental importance of refocusing on our interaction partner (for a review, see Feinman, Roberts, Hsieh, Sawyer, & Swanson, 1992), successfully initiated joint attention modulated the return-to-face saccades: saccade onset times were slower when the face’s gaze was incongruent with participants’ gaze than in the congruent condition. Along a similar line, Edwards et al. (2015) showed that participants’ attention was shifted toward peripherally presented faces that had followed their gaze. Additionally, Dalmaso et al. (2016) showed that the gaze-cueing effect was more pronounced for faces that had previously not followed participants’ gaze than for faces that had followed it.

Taken together, these studies suggest that the two mechanisms of joint attention—responding to joint attention and initiating joint attention—are not identical in nature: they activate common (MPFC) as well as distinct brain areas, with initiating joint attention specifically recruiting areas related to reward processing, attentional orienting, and cognitive control. Importantly, this shows that studying the initiation of joint attention requires interactive protocols; classical “spectatorial” approaches, with participants passively observing screen-based stimuli, are not sufficient to elucidate the full range of mechanisms engaged in joint attention.

Limitations of recent approaches to study joint attention

Studies using more ecologically valid experimental protocols suggest that findings in naturalistic setups might differ from those of screen-based “spectatorial” paradigms. Such interactive protocols have certainly advanced our knowledge regarding responding to and initiating joint attention, but each protocol involves specific shortcomings. For example, virtual agents can enable reciprocal social interactions, but they remain screen-based agents and thus lack the realism of natural social interactions. Human–human interaction paradigms increase ecological validity but impose challenges regarding comparison between studies and replicability of results, since various factors, such as the velocity of the directional movement during the cueing procedure, could influence the gaze-cueing effect in these setups. These factors are difficult to replicate, and they are often not controlled for or not reported. Advancing to real-life paradigms poses an even higher risk of compromising experimental control. For instance, apart from the controllability and reproducibility of the cues, differences in gaze behavior arising in real-life situations, or from comparisons between live and screen-based cues, can be attributed at least to some extent to the variations in the visual stimuli to which participants are exposed across situations (Gobel, Kim, & Richardson, 2015).

Using robots to examine joint attention

Among the manifold recent approaches to examining human social cognition, there is growing interest in using humanoid robot agents in joint attention studies. In the more classical paradigms in which robot faces are presented on the screen, such stimuli allow researchers to ask what role humanness and human/natural agency play in evoking joint attention mechanisms. That is, with artificial humanoid agents, we can examine whether human-likeness is a crucial factor for engagement in joint attention. In more interactive protocols with embodied humanoids, the advantage is that they can overcome the issues of recent interactive protocols by offering excellent experimental control on the one hand and increased ecological validity and social presence on the other. In this section, we review studies that have used robot agents as attention-orienting stimuli in both screen-based and naturalistic protocols. Subsequently, we discuss possible limitations of using robots as interactive partners. In the final part of this section, we provide guidelines for the optimal use of embodied humanoid robots in joint attention research.

Screen-based paradigms examining joint attention with robot faces

The results from screen-based gaze-cueing paradigms with humanoid robots have not been entirely consistent. On the one hand, Admoni and colleagues found that two different robots, Zeno (Robokind) and Keepon, did not elicit a reflexive gaze-cueing effect (Admoni, Bank, Tan, Toneva, & Scassellati, 2011). However, conclusions from this study are limited by the lack of statistical power (see Table 1), given the small number of cued trials (eight cued trials, p. 1986). Along similar lines, Okumura, Kanakogi, Kanda, Ishiguro, and Itakura (2013) demonstrated that only human gaze elicited anticipatory gaze shifts in 12-month-old infants; robot gaze did not have the same effect. On the other hand, Chaminade and Okka (2013) found no difference in the magnitude of the gaze-cueing effects elicited by nonpredictive head shifts of a human face and of the NAO T14 robot (an upper-torso version of the robot). Additionally, Wiese et al. (2012), comparing the magnitude of the gaze-cueing effect elicited by a robot face and a human face with nonpredictive cues, demonstrated that both faces induced a gaze-cueing effect, but the robot engaged participants in joint attention to a smaller extent. In a follow-up study, the authors showed that the very same robot face elicited a gaze-cueing effect depending on whether participants believed its behavior was human-controlled or preprogrammed (the gaze-cueing effect was quantified both in reaction times and in the P1 component of the EEG signal). Martini, Buzzell, and Wiese (2015) studied the effect of the robot’s physical appearance (from 100% robot to 100% human) on mind attribution and the gaze-cueing effect, using a counterpredictive gaze-cueing paradigm. The authors found a positive linear relationship between mind attribution ratings and human-like appearance; however, this was not reflected in the gaze-cueing effect, which showed an inverted U-shaped pattern. Indeed, only agents with a moderate level of human-likeness (60% human morph) induced an automatic gaze-cueing effect, whereas the effect was eliminated for agents with both 100% human-likeness (human faces) and 100% robot-likeness (robot faces) (Martini et al., 2015).

Table 1 Summary of the studies examining joint attention in healthy population, from classical to more naturalistic and recent approaches

Concerning the study of initiating joint attention with robot faces, a screen-based gaze-leading paradigm has been developed using a robot face instead of a virtual agent. In this gaze-contingent eye-tracking task, with the face of the iCub humanoid robot (Metta, Sandini, Vernon, Natale, & Nori, 2008; Natale, Bartolozzi, Pucci, Wykowska, & Metta, 2017) presented on the screen, Willemse, Marchesi, and Wykowska (2018) manipulated the behavior of the robot to either follow the gaze of the participants (80% of the trials; “joint disposition” robot) or not (20% of the trials; “disjoint disposition” robot). In this way, the authors could dissociate whether the modulation of re-engagement times with the face arose from learning the agent’s disposition (identity-based learning) or from trial-by-trial contingency. The results showed that onset times of saccades returning to the robot’s face were faster for the robot that typically followed the participant’s gaze than for the disjoint robot. Interestingly, the results extended previous findings by showing that this effect arose from the learnt disposition of the robot (a main effect of disposition) and not from trial-wise contingency (Willemse et al., 2018).
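The identity-level manipulation in this gaze-leading design (the robot follows on 80% vs. 20% of trials) can be sketched as a fixed-ratio trial schedule. The proportions mirror the reported design, but the helper below is our own illustration, not the authors' implementation.

```python
import random

def build_disposition_schedule(n_trials, p_follow, seed=0):
    """Trial list for a gaze-leading task with a robot of fixed disposition.

    p_follow = 0.8 corresponds to the "joint disposition" robot (follows the
    participant's gaze on 80% of trials), and p_follow = 0.2 to the "disjoint"
    robot. Using a fixed ratio (rather than per-trial sampling) holds the
    identity-level disposition exactly constant across participants, while
    shuffling preserves trial-wise unpredictability.
    """
    n_follow = round(n_trials * p_follow)
    trials = ["follow"] * n_follow + ["no_follow"] * (n_trials - n_follow)
    random.Random(seed).shuffle(trials)
    return trials

joint_robot = build_disposition_schedule(100, p_follow=0.8)
disjoint_robot = build_disposition_schedule(100, p_follow=0.2)
```

Separating the fixed ratio (disposition) from the shuffled order (trial-wise contingency) is exactly what allows the two sources of modulation to be dissociated analytically.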

In this section, we have seen that the majority of screen-based joint attention experiments using robots as attention-orienting stimuli not only replicated classical findings on responding to and initiating joint attention, but also substantially advanced our knowledge regarding the role of human-likeness in inducing joint attention mechanisms (Martini et al., 2015; Willemse et al., 2018). However, as argued above, screen-based agents might not be sufficient for elucidating social cognitive mechanisms.

Joint attention examined with embodied robots and interactive protocols

Robots that are embodied and integrated into interactive protocols can act as dynamic social “partners,” which can engage mechanisms crucial for social cognition in daily life (Putman et al., 2006; see Fig. 1d). Being embodied, they increase social presence (Jung & Lee, 2004), and they are more “natural” than even virtual reality, as they can modify our environment and manipulate physical objects around us. Importantly, they also allow for reciprocity in interaction: for example, similarly to virtual agents, a robot’s gaze behavior can be programmed to be contingent on participants’ gaze. Moreover, similar to Gobel et al. (2015), one could exploit the dual function of robot gaze by manipulating participants’ beliefs about another human looking back at them through the robot’s eyes. Finally, although it is still somewhat too early to have humanoid robots deployed in the “wild,” interactive paradigms in the lab that require joint actions and common goals with a human, such as manipulating objects on a table, certainly have real-life relevance and are not constrained to tasks on screens or in 2-D environments. When using humanoid robots in interactive scenarios, one can maintain experimental control while also embedding the setup in a natural 3-D shared environment. Importantly for the purposes of studying joint attention, humanoids offer excellent experimental control—they can repeat the same specific behaviors over many trials, and they allow for “modularity of control” (Sciutti, Ansuini, Becchio, & Sandini, 2015); that is, their movements can be decomposed into specific elements, an impossible endeavor for a human. For instance, in the context of joint attention research, the trajectory time of the eye movement can be controlled and can follow predefined parameters over many repetitions.
Overall, we argue that combining embodied humanoid robots with well-controlled experimental designs offers an optimal combination of ecological validity and experimental control, and allows for tapping into specific cognitive mechanisms such as joint attention.
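To make the notion of "modularity of control" concrete: a humanoid's gaze shift can be generated from a parameterized trajectory, so that the cue's kinematics are identical on every trial. The minimum-jerk profile below is a common choice in robot motion control and is used here purely as an illustrative sketch, not as the controller of any particular robot mentioned above.

```python
def gaze_shift_trajectory(start_deg, end_deg, duration_ms, dt_ms=10):
    """Eye-pan angles (degrees) for a repeatable robot gaze shift.

    Uses a minimum-jerk position profile, so every repetition of the shift
    is identical down to the sample level -- the kind of parameter-level
    control over cue kinematics that embodied robots afford and that is
    difficult to obtain with human confederates.
    """
    steps = int(duration_ms / dt_ms)
    s = lambda t: 10 * t**3 - 15 * t**4 + 6 * t**5  # smooth 0 -> 1 profile
    return [start_deg + (end_deg - start_deg) * s(k / steps)
            for k in range(steps + 1)]

traj = gaze_shift_trajectory(0.0, 30.0, duration_ms=500)
```

Because start angle, amplitude, and duration are explicit parameters, each can be varied independently while everything else is held constant, which is what "decomposing movements into specific elements" amounts to in practice.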

A recent interactive study (Wykowska, Kajopoulos, Ramirez-Amaro, & Cheng, 2015) on joint attention involving an embodied robot iCub demonstrated that the gaze-cueing effect was of the same magnitude independent of whether participants believed iCub’s behavior was human-controlled or “programmed,” which is in slight contrast to previous studies with screen-based stimuli (Wiese et al., 2012). Similarly, Wiese, Weis, and Lofaro (2018) employing a gaze-cueing paradigm with Meka robot showed that the embodied robot elicited a gaze-cueing effect. Additionally, Kompatsiari, Perez-Osorio, et al. (2018) showed that the gaze-cueing effect during a gaze-cueing procedure with iCub humanoid robot was similar to those previously observed with human faces (Wykowska et al., 2014), at both the behavioral and neural level—that is, reaction times to target discrimination were faster, and the N1 ERP component peaked earlier and had higher amplitude on validly cued trials, relative to invalidly cued trials (Kompatsiari, Perez-Osorio, et al., 2018). Moreover, Kompatsiari and colleagues (2018) demonstrated that a real-time eye contact during a gaze-cueing paradigm with iCub enhances the gaze-cueing effect driven by a non-predictive cue (50% validity), while it suppresses orienting of attention driven by a counterpredictive gaze cue (25% validity), as compared to a prior no-eye-contact gaze. This paradigm, by encompassing an online eye contact prior to the gaze shift, challenges classical findings of screen-based paradigms that showed an automatic gaze-cueing effect elicited by counterpredictive cues (Driver et al., 1999; Friesen & Kingstone, 1998). 
Moreover, a similar nonpredictive gaze-cueing study showed that participants engaged in joint attention (measured by the gaze-cueing effect) only when the robot established eye contact before shifting its gaze, and that they fixated longer on iCub's face during eye contact than during no-eye-contact gaze (Kompatsiari, Ciardo, De Tommaso, & Wykowska, 2019a). These results advanced knowledge of the cognitive mechanisms affected by eye contact in joint attention research by demonstrating that eye contact has a "freezing" effect on attentional focus, resulting in longer disengagement times and thus a longer time to reallocate attention.

Besides being initiators of joint attention, humanoid robots can also be programmed to respond to the gaze of participants, thereby introducing reciprocity. In an interactive version of the screen-based gaze-contingent task, Willemse and Wykowska (2019) found an interaction between robot disposition (more likely to follow human gaze or more likely not to follow) and trial-wise contingency on re-engagement with the robot's face (measured as onset latencies of return saccades to the robot's face), thereby providing a different pattern of results than 2-D screen-based stimuli. Similar to human–human studies in joint attention research, studies using embodied humanoid robots also show that an embodied robot might produce a different pattern of results than screen-based stimuli (Kompatsiari et al., 2018; Willemse & Wykowska, 2019).
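The disposition manipulation in such gaze-contingent protocols can be sketched as below. This is a hypothetical illustration; the function, object labels, and probability values are ours, not the implementation of any cited study:

```python
import random

def robot_gaze_response(participant_target, follow_prob, rng):
    """Decide, trial by trial, whether the robot follows the participant's
    gaze to the object they just looked at (a contingent, joint-attention
    trial) or looks at the other object. follow_prob encodes the robot's
    "disposition" (e.g., 0.8 for a follower robot, 0.2 for a non-follower)."""
    if rng.random() < follow_prob:
        return participant_target
    return "left" if participant_target == "right" else "right"

# Over many trials, the two dispositions produce clearly different
# contingency rates, which is what participants could pick up on.
rng = random.Random(1)
follower = sum(robot_gaze_response("left", 0.8, rng) == "left"
               for _ in range(1000))
rng = random.Random(1)
nonfollower = sum(robot_gaze_response("left", 0.2, rng) == "left"
                  for _ in range(1000))
```

In an actual setup, `participant_target` would come from an online gaze classifier rather than being passed in directly.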

To provide the reader with a clearer view of the results obtained in joint attention research using different kinds of setups (from classical to more naturalistic), we summarize in Table 1 the gaze-cueing studies reported in the previous sections. Table 1 shows that the effect size of validity varies not only across setups but also within the same setup. However, in the majority of the reported studies the effect size lies in the range of a large effect (> .8), and in only a few studies is it medium (.5–.8). Although the largest effect sizes are reported in the screen-based human/schematic setups, it should be noted that more interactive setups (i.e., those including human or robot partners) still induce medium or large main validity effects. Moreover, it is worth noting that the smaller effect sizes observed in a number of studies can be attributed to a low number of trials, or to the inclusion of a manipulation that reduced the strength of the main validity effect due to the lack of a validity effect in one of the conditions (e.g., Hietanen et al., 2006; Jones et al., 2010; Kompatsiari et al., 2018; Kompatsiari, Perez-Osorio, et al., 2018; Martini et al., 2015).
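The effect-size ranges above follow Cohen's conventions. For a within-subject validity effect, the computation can be sketched as follows (a generic illustration with made-up numbers, not tied to any study in Table 1):

```python
from statistics import mean, stdev

def cohens_d_within(invalid_rts, valid_rts):
    """Cohen's d for a within-subject validity effect: the mean
    per-participant RT difference (invalid minus valid) divided by
    the sample SD of those differences."""
    diffs = [i - v for i, v in zip(invalid_rts, valid_rts)]
    return mean(diffs) / stdev(diffs)

def effect_size_label(d):
    """Conventional labels as used in the text: large > .8, medium .5-.8."""
    return "large" if d > 0.8 else "medium" if d >= 0.5 else "small"
```
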

Limitations in using robots as stimuli to study joint attention

Although embodied robots in interactive protocols can lead to new insights regarding the mechanisms of joint attention, it is important to note that robots obviously cannot substitute for a human interactive partner, or evoke exactly the same mechanisms as those involved in real-life spontaneous human–human interaction. However, this constraint is not exclusive to the use of robots. It also applies in general to controlled experimental setups for studying social interactions (even between human agents), since the agent's repetitive movements over a relatively long time period and the rather monotonous nature of the task cannot really represent a spontaneous interaction. Finally, even participants' knowledge that they are under examination might modify their behavior. Robot stimuli, however, might have a specific limitation related to their artificial nature: first, they might not be treated as social entities (and therefore not evoke all possible mechanisms of social cognition), and second, they might evoke negative attitudes in some participants. This is particularly related to the anxieties and fears that humans have toward robotic technology and artificial intelligence (Kaplan, 2004; Syrdal, Dautenhahn, Koay, & Walters, 2009). This issue could be addressed by measuring the bias toward robots (e.g., with qualitative measures) and applying statistical methods to control for effects of interindividual differences. Another potential constraint of using robots concerns comparability between studies and generalizability of results, since robots often differ considerably, and it is often the case that one lab works with only one specific robot whereas another lab uses a different robot platform. To address this limitation, comparisons should mainly be performed within the same robotic platform, or across robots that evoke similar gaze cues (i.e., robots with similar mechanical eye characteristics).

However, despite these limitations, we argue that embodied robots embedded in interactive protocols that are grounded in well-established paradigms targeting specific mechanisms of social cognition can be extremely informative and serve the function of social "stimuli" of higher ecological validity than classical screen-based stimuli. Simultaneously, they allow for maintaining a high degree of experimental control, in contrast to human–human interaction protocols.

General guidelines for using embodied robots in joint attention experimental protocols

From the results reviewed here, it emerges that embodied robots would benefit from complying with specific design properties for research and applications in the area of joint attention. In terms of appearance, robots probably need to have a moderately human-like appearance (a 60% human morph), as indicated by Martini and colleagues' study, which showed that agents with 100% robot-likeness or 100% human-likeness did not elicit a reflexive gaze-cueing effect (Martini et al., 2015). Additionally, despite the limitations of implementing biologically inspired robot eyes, in terms of both cost and complexity, mechanical human-like eyes that enable a gaze-cueing procedure are recommended (for a review, see Admoni & Scassellati, 2017). It would also be beneficial if robots were endowed with algorithms that allow for the establishment of eye contact with participants, since eye contact initiated by a humanoid robot has been shown to increase perceived human-likeness and engagement with the robot (Kompatsiari, Ciardo, Tikhanoff, Metta, & Wykowska, 2019b), and to enhance joint attention (Kompatsiari et al., 2018). Furthermore, gaze contingency of robot behavior implemented in a more naturalistic setup (i.e., without an eye tracker) would benefit from algorithms that allow online detection of the participant's gaze and assessment of saccadic eye movement parameters. Finally, in order to ensure the reproducibility of results and studies, authors should always report the controller used for producing the robot's movements, the desired kinematic parameters (e.g., eye velocity), and the actual measured parameters.
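As a sketch of this reporting practice, one might log the desired versus actually measured parameters of each gaze shift, so they can be reported alongside the results. The function and field names below are hypothetical, not part of any robot platform's API:

```python
def log_gaze_shift(controller, desired_eye_velocity, measured_eye_velocity):
    """Return one trial's kinematic record: the controller used, the
    desired (predefined) eye velocity in deg/s, the velocity actually
    measured (e.g., from joint encoders), and their deviation."""
    return {
        "controller": controller,
        "desired_eye_velocity": desired_eye_velocity,
        "measured_eye_velocity": measured_eye_velocity,
        "deviation": measured_eye_velocity - desired_eye_velocity,
    }
```

Aggregating such records over trials would make it straightforward to report both the commanded parameters and their measured variability.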

Application of joint attention studies in human–robot interaction in healthcare

In the previous sections, we discussed the new approach of using robots to investigate the mechanisms of joint attention. This section reports studies in which fundamental research reaches out to healthcare applications.

As in the neurotypical population, more natural settings are needed in clinical populations to achieve a good understanding of the mechanisms of social cognition (including joint attention). For example, individuals diagnosed with high-functioning autism have been shown to experience impairments in the ability to use implicit social cognition mechanisms: they have difficulties responding intuitively to socially relevant information during online, dynamic, and fast-paced interactions with others (Schilbach et al., 2013). However, explicit social cognition mechanisms in offline experimental protocols often remain intact (Schilbach et al., 2013). Indeed, individuals diagnosed with high-functioning autism are reported to respond differently when they judge an interaction in the role of an observer relative to being an actor: the observer role enables them to take the time to think about the interaction, while actively taking part in the interaction triggers their social impairments, as they experience an overwhelming amount of social information. Therefore, more naturalistic approaches are needed to fully understand the cognitive processes impaired in ASD.

Here, we focus on the use of robots in interactive protocols for individuals diagnosed with ASD. Because individuals diagnosed with ASD enjoy being engaged with mechanical and technological artifacts (Baron-Cohen, 2010; Hart, 2005), which are less overwhelming (due to their simplified design), less intimidating, and offer repetitive, predictable behaviors, it has been proposed that using robots during interventions could help therapists train social skills in children diagnosed with ASD (Cabibihan, Javed, Ang, & Aljunied, 2013; Scassellati, Admoni, & Matarić, 2012; Wiese, Wykowska, & Müller, 2014).

Children diagnosed with ASD, among other social and cognitive deficits, show impaired initiation of joint attention (e.g., reduced use of common joint attention strategies, such as gestures, finger pointing, and grasping the hand of an adult) and diminished responsiveness to joint attention bids (American Psychiatric Association, 2013; Charman et al., 1997; Johnson, Myers, & American Academy of Pediatrics Council on Children With Disabilities, 2007; Mundy, 2018; Mundy & Newell, 2007). The impact of reduced engagement in joint attention in ASD may be far-reaching, since joint attention contributes to the functional development of other mechanisms of social cognition (Mundy, 2018). Because training joint attention in children diagnosed with ASD has shown positive effects on social learning and development (Johnson et al., 2007; Mundy & Newell, 2007), intervention approaches for increasing joint attention have been encouraged (Johnson et al., 2007).

Following this line of reasoning, several authors have focused on training or assessing the joint attention skills of children diagnosed with ASD with the use of interactive sessions with a robot (Anzalone et al., 2014; Anzalone et al., 2019; Bekele, Crittendon, Swanson, Sarkar, & Warren, 2014; Boccanfuso et al., 2017; Chevalier et al., 2017; David, Costescu, Matu, Szentagotai, & Dobrean, 2018; Duquette, Michaud, & Mercier, 2008; Kajopoulos et al., 2015; Michaud et al., 2007; Simut, Vanderfaeillie, Peca, Van de Perre, & Vanderborght, 2016; Taheri, Meghdari, Alemi, & Pouretemad, 2018; Warren et al., 2015; Zheng et al., 2013; Zheng et al., 2018), often through a spatial attention-cueing paradigm: the child is prompted by the robot to look in a given direction in which a visual target is displayed (see Fig. 2). The robot can use increasing degrees of bids for joint attention, depending on the child's ability to respond (e.g., the robot will first move only the head, and if the child does not look at the target, the robot will prompt again by moving the head and pointing with the arm). However, using robots for training or examining joint attention skills with individuals diagnosed with ASD was questioned by Pennisi et al. (2016): in their systematic review of autism and social robotics, they outline that the results of studies on joint attention were mixed. Indeed, the five selected studies (published before November 3, 2014) on socially assistive robotics focusing on joint attention in children diagnosed with autism present contradictory and exploratory results. Anzalone et al. (2014) and Bekele et al. (2014) examined joint attention skills in children with ASD and typically developing children during a single interaction with a robot or a human partner. Both studies observed that a human partner needed less prompting (relative to a robot partner) to successfully orient the child's attention. Duquette et al. (2008) and Michaud et al. (2007), however, observed greater improvements in the joint attention skills of two children diagnosed with ASD after training with a robot partner for 22 sessions, relative to two children diagnosed with ASD after training with a human partner for the same number of sessions. Finally, Warren et al. (2015) and Zheng et al. (2013) successfully trained joint attention skills in six children diagnosed with ASD with a four-session robot-based therapy, but observed that the data obtained from their pilot study were not sufficient to suggest broader changes in the children's skills. To summarize Pennisi et al.'s (2016) review, the benefit of a robot partner, in comparison with a human partner, for training and/or examining joint attention is not clear; however, the reviewed studies were very exploratory in terms of their number of participants and methodology (e.g., no pre- or posttest of the trained skills, single interactions, etc.).
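The escalating-prompt logic common to these protocols can be sketched as follows. This is a hypothetical illustration; the prompt levels listed are examples, not the exact hierarchy of any cited study:

```python
# Prompt levels ordered from least to most salient (illustrative only).
PROMPT_LEVELS = [
    "head_turn",
    "head_turn+arm_point",
    "head_turn+arm_point+verbal_cue",
]

def run_prompt_sequence(child_responds):
    """Escalate through the prompt hierarchy until the child orients to
    the target. child_responds is a callable taking a prompt description
    and returning True if the child looked at the target.
    Returns (highest_level_used, success)."""
    for level, prompt in enumerate(PROMPT_LEVELS):
        if child_responds(prompt):
            return level, True
    return len(PROMPT_LEVELS) - 1, False
```

The highest level reached per trial is the kind of "level of prompting" measure that studies such as Anzalone et al. (2014) and Bekele et al. (2014) compared between robot and human partners.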

Fig. 2
figure2

Examples of setups using robots to train and examine joint attention in children diagnosed with ASD. (a) Setup using the robot CuDDler. From “Robot-Assisted Training of Joint Attention Skills in Children Diagnosed With Autism,” by J. Kajopoulos et al., 2015, in A. Arvah, J.-J. Cabibihan, A. M. Howard, M. A. Salichs, and H. He (Eds.), Social Robotics, Cham, Switzerland: Springer. Copyright 2015 by Springer International Publishing Switzerland. (b) Setup using the robot Nao. From the thesis “Impact of Sensory Preferences in Individuals With Autism Spectrum Disorder on Their Social Interaction With a Robot,” by P. Chevalier, 2016, Université Paris-Saclay. Copyright 2016 by the author.

In the following sections, we report and discuss more recent studies (published before July 15, 2018) evaluating the use of robots to train or examine joint attention in children diagnosed with ASD. Table 2 presents a summary of the articles reviewed here. Note, however, that the articles summarized in this review needed to satisfy two criteria: First, the studies reported in the articles needed to be human-centered (i.e., not focused only on the robotic system and its skills). Second, their main purpose had to be studying the use of robots in therapy for children with ASD (i.e., the research needed to include clinical trials or scientific experiments, there needed to be at least one experimental group of children diagnosed with ASD, and the study needed to involve at least three participants diagnosed with ASD).

Table 2 Summary of the articles reviewed here

Robot-assisted training of joint attention in children diagnosed with ASD

Results from more recent studies using robots to train joint attention still report mixed findings regarding the effectiveness of the method. For example, Simut et al. (2016) compared the behaviors of 30 children diagnosed with ASD during an interaction with a human or a robot partner in a joint attention task. As in Anzalone et al. (2014) and Bekele et al. (2014), they observed no differences in the children's performance in the joint attention tasks or in their behavior toward the different partners, except for longer gaze toward the robot partner. However, this was a single interaction, and no long-term effects could be observed. In a longer-term intervention, David et al. (2018) investigated whether the joint attention engagement of five children diagnosed with ASD depended on the social cues displayed by the robot during therapy sessions. They compared the effects of a human (~8 sessions) and a robot partner (~8 sessions) in training joint attention, and compared the children's joint attention performance to their preintervention performance. As in Anzalone et al. (2014) and Bekele et al. (2014), they observed similar patterns in their five participants' behaviors and joint attention performance independent of whether the children were trained by a robot or by a human partner. Furthermore, the robot partner needed to use a higher level of prompting than the human partner. However, the study included only a small number of participants, and the joint attention skills were not evaluated posttraining to assess the effectiveness of the therapy over a longer term.

Unlike the results of the previously discussed studies, Kajopoulos et al. (2015) found improvements in joint attention skills after a robot intervention. In their study, seven children diagnosed with ASD completed six joint attention training sessions with the robot CuDDler. Joint attention skills were evaluated before and after the training using the abridged Early Social Communication Scale (ESCS; Seibert & Hogan, 1982), which enables separate assessment of the mechanisms of responding to joint attention and initiating joint attention. The authors observed improvement in responding to joint attention, which is not surprising, given that the training protocol was designed to target specifically this mechanism with a head-cueing procedure. Importantly, however, the improvement in responding to joint attention was observed during a human–human interaction session (the experimenter administering the ESCS posttest) two to three days after the end of the training. This is an encouraging result, showing that skills trained during human–robot interaction can transfer to an interaction with a human. In Zheng et al. (2018), the authors presented an updated version of the setup from their previous experiments in Bekele et al. (2014) and Zheng et al. (2013). In the earlier studies, the setup required the child to wear a hat and an experimenter to validate when the participant was looking at the target after the robot's prompt (through a Wizard-of-Oz technique). In Zheng et al. (2018), the setup was automated and participants did not need to wear anything, which made it more convenient. The article describes the validation of the automated setup with 14 children diagnosed with ASD who completed four sessions of joint attention training. The authors observed that joint attention skills improved during the sessions (the children looked significantly more at the target cue than at the nontarget cue across sessions). However, as the authors point out, they did not use other screening tools to assess the improvements, and further studies should be conducted to replicate this result and to examine whether the improvement transfers to interactions with human partners. In summary, although several researchers have attempted robot-assisted training of joint attention for children with ASD, the results remain mixed.

In addition to studies focusing only on joint attention in children diagnosed with ASD, other studies have investigated robot-based sets of games designed to train social skills including, but not limited to, joint attention (Boccanfuso et al., 2017; Taheri et al., 2018). Boccanfuso et al. developed a low-cost robot, CHARLIE, to play a set of games designed to engage children in imitation, joint attention, and social tasks. Over a period of six weeks, eight children diagnosed with ASD interacted with a robot partner in addition to speech therapy, whereas a control group of three children diagnosed with ASD participated only in the speech therapy. The children were screened pre- and postintervention with different screening tools, including the unstructured imitation assessment (UIA; Ingersoll & Lalonde, 2010). The UIA measures a child's ability to imitate spontaneously during unstructured play with an adult and has a subscale assessing joint attention, which enabled the authors to track the children's improvements in their joint attention skills. The authors observed that both groups benefited independently of the type of training, and that the interaction with the robot did not provide additional benefits. In Taheri et al.'s study, the authors also developed a set of games involving imitation, joint attention, and social play with a robot. They compared the impact of a human partner and a robot partner on the improvement of the social skills of the six children diagnosed with ASD who participated in the study. However, as the study involved only six children from different age groups, and the games involved many skills, the authors reported that their results could not give a proper indication of the effects on specific skills such as joint attention, nor support conclusions regarding the impact of human versus robot partners.

The results from these studies, despite being mixed, suggest that training joint attention with a robot improves children's joint attention skills in a similar way as training with a human partner. However, this field of research requires more systematic and rigorous methods of testing, and larger samples for greater statistical power, in order to validate the effects of socially assistive robotics in training joint attention.

Examining the mechanisms of joint attention in children with ASD with robot interaction partners

Apart from training joint attention skills, robots can also be used as a tool to understand the cognitive or behavioral mechanisms of joint attention in children diagnosed with ASD, or potentially, in the future, as a diagnostic tool. For example, in Kajopoulos et al.'s (2015) work, in addition to training the mechanism of responding to joint attention bids, the authors used their experiment to observe the difference between the cognitive processes of responding to joint attention and initiating joint attention in children diagnosed with ASD. Because the children, trained with a spatial-cueing paradigm, improved only in responding to joint attention bids, this implies that responding to and initiating joint attention are distinct processes that are learned differently (as explained in Mundy, 2018). Their work also emphasized that robots can be used to target specific cognitive processes through well-known laboratory paradigms designed to address isolated cognitive mechanisms in a controlled manner. Similarly, in Anzalone et al.'s (2018) work, instead of using the robot for training joint attention skills, the robot was used to compare behavioral metrics of children with and without a diagnosis of ASD performing a joint attention task. Furthermore, behavioral metrics of children with ASD were compared, with the use of a robot, before and after a period in which the children used the Gaming Open Library for Intervention in Autism at Home (GOLIAH; Bono et al., 2016). GOLIAH is a set of games (not involving robots), done in a clinic and at home, that focuses on training specific abilities, particularly joint attention and imitation. As in their previous work (Anzalone et al., 2014), the authors used the robot Nao in a gaze-cueing paradigm to assess the children's responses to joint attention. An RGB-D camera (Microsoft Kinect) recorded gaze, body, and head behaviors during the experiment.
The metrics they used enabled statistical discrimination between children diagnosed with ASD (N = 42) and without ASD (N = 16): children diagnosed with ASD were less stable, and their heads and bodies moved more than those of neurotypical children during the joint attention interaction with the robot. This shows that the naturalistic interaction with the robot enabled measuring joint attention characteristics of children diagnosed with ASD and discriminating them from those of typically developing children. A comparison of the behavioral metrics of eight children diagnosed with ASD before and after six months of training joint attention skills with GOLIAH showed that their body and head displacement and gaze behavior moved closer to the pattern of typically developing children. In Chevalier et al. (2016), the authors used a spatial-cueing paradigm to assess participants' different behavioral responses to a joint attention prompt from a robot partner in relation to their sensory profiles. They hypothesized that different sensory profiles in children diagnosed with ASD could lead to different behavior, and that assessing these interindividual differences could advance knowledge of ASD and help better tune socially assistive robotics for this population. They assessed the sensory profiles of 11 children diagnosed with ASD and observed, after a single intervention with a robot, that the response time to the robot's joint attention bids seemed to be linked to the participants' visual and proprioceptive preferences. However, the study comprised only a single session with few participants. Even though these results were obtained from small groups of children and require replication, they are encouraging and support the idea of using naturalistic robotic settings to examine or diagnose the mechanisms of cognitive processing in children diagnosed with ASD.
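A simple version of such a movement-based stability metric, a generic sketch rather than Anzalone et al.'s actual implementation, is the summed frame-to-frame displacement of one tracked keypoint:

```python
import math

def total_displacement(keypoint_trajectory):
    """Sum of frame-to-frame Euclidean displacements of one tracked
    keypoint (e.g., the head center from an RGB-D skeleton): larger
    values mean more movement, i.e., less postural stability during
    the joint attention task."""
    return sum(math.dist(a, b)
               for a, b in zip(keypoint_trajectory, keypoint_trajectory[1:]))
```

Computed per child over a session, such a scalar can then feed a standard group comparison (ASD vs. typically developing) or be tracked across an intervention.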

Limitations in the use of socially assistive robots for training and examining joint attention in ASD

The use of robots to train or examine joint attention skills in children diagnosed with ASD still provides inconclusive results, as discussed above. However, the field is still very new, and all the studies have a rather exploratory, or proof-of-concept, character. Future research in training and examining joint attention with robots for children diagnosed with ASD should be conducted in a more systematic manner, with larger and well-screened samples, standardized pre- and posttests, and appropriately designed control groups or conditions. Indeed, as Scassellati et al. (2012) explain in their review of research in socially assistive robotics for children diagnosed with ASD, the research teams that develop these studies need to consist of experts from many fields (e.g., robotics, computer science, and psychology). Few research teams cover all these areas, and they tend to focus only on the strengths of their particular team. As a consequence, the experiments described are often not targeted at specific, isolated cognitive mechanisms, which makes it difficult to observe and interpret precisely what changes during the therapeutic intervention. To explore social cognition mechanisms, although it is difficult to use exactly the same protocols as those developed for adults, it is still possible, and recommended, to adapt existing protocols from experimental psychology to children and to observe well-specified, isolated cognitive mechanisms.

ASD comprises great interindividual variability, as the symptoms fall on a continuum (American Psychiatric Association, 2013). Studies investigating the use of robots to train or examine joint attention in children with ASD rarely consider this aspect of ASD. However, as pointed out by Milne (2011), individuals diagnosed with ASD present very large interindividual differences compared to data collected from control groups. Furthermore, numerous studies have used subgroups within their samples of individuals diagnosed with ASD to capitalize on the large differences in their symptoms and/or behaviors (Milne, 2011). The author also adds that although many cognitive deficits are observed in ASD, the ASD literature contains many examples of nonreplicated results, suggesting that some of the reported specific cognitive impairments might not be consistent and universal in ASD. These observations by Milne may therefore also relate to the differences in results observed in robot-assisted therapies of joint attention skills for children diagnosed with ASD.

It is also important to note that one major limitation of robot-based therapy reported in the studies discussed previously concerns the technology used and the design of the training. Indeed, robot-based interventions aim to be more and more automated, but are still limited in their range of actions due to technological constraints. Zheng et al. (2018) discuss that the design of the task used in their setup was limited by the automated system they developed, and that in longer training protocols the lack of varied tasks could make participants lose interest and therefore make the therapy less impactful. Similarly, Anzalone et al. (2019) discussed that their automated setup offered limited freedom and that children found their behavior constrained. Chevalier et al. (2016) reported that they had to use a Wizard-of-Oz setup (i.e., the experimenter controlled the robot instead of an automated system), because the face-tracker technology they used was unable to accurately follow the children's faces when the children covered their heads with their hands or looked straight down. Boccanfuso et al. (2017) used a teleoperated robot to test their games, instead of the automated system they developed, to ensure that the robot responded rapidly and accurately enough to efficiently test how engaging the games they had designed were. The difficulty of designing robot-based interventions (the quality of the games with regard to difficulty, interest, etc.) is also pointed out in the previously discussed studies. David et al. (2018) reported that they had to change the task gradually to keep the children's interest. Kajopoulos et al. (2015) and Chevalier et al. (2015) reported that, even though the interventions were designed with the help of caregivers, some children had difficulties understanding or performing the task.

Guidelines for robot-assisted training for ASD

As described above, the use of robots as a tool for training or examining joint attention skills in children diagnosed with ASD still yields mixed results. However, it should be noted that it is a promising avenue. Although it is a difficult process, progress might be achieved if future studies are based on closer collaboration with clinics, hospitals, or associations working with children diagnosed with ASD. This should allow for the recruitment of a larger number of participants over a longer time period. A larger number of participants could also mitigate the high dropout rates and loss of data due to technical issues. Unfortunately, to date, too many articles report results on too few participants and/or short case studies, which makes it very difficult to draw conclusions regarding the use of robots in training children diagnosed with ASD. Additionally, working closely with clinicians should enable the design of new training protocols with a higher degree of participant engagement (see Chevalier et al., 2017; Ferrari, Robins, & Dautenhahn, 2009; Robins & Dautenhahn, 2010, for reports discussing design strategies for socially assistive robotic interventions). Another point for improvement is the evaluation of the children's progress during training interventions. Using well-known paradigms or protocols is recommended in order to target very specific cognitive mechanisms. For example, with spatial attention-cueing paradigms one can train responding to joint attention bids, whereas with the robot's behavior being contingent on the gaze/head behavior of the participant (the robot following the gaze of a child), one can target the mechanism of initiating joint attention. Targeting one particular skill or set of skills in a well-known, structured way would ease the design of experiments and the replicability of results and studies.
Finally, using pre- and posttests to evaluate the progress of therapy and the improvement in skills is also highly recommended. Finding appropriate clinical tests may be a challenge depending on the country of study, as the ESCS, for example, is not translated into all languages; this is another reason to encourage close collaboration with clinicians. The above-mentioned guidelines should also help to take into account the great heterogeneity of patients with ASD, which would enable better tailoring of the protocols and more efficient tracking of whether certain behavioral subgroups exist in joint attention within the autism spectrum. On a side note, releasing open-source code for the training interventions could additionally help the replicability of studies.

Conclusions and outstanding questions

In this review, we have discussed new approaches to examining joint attention, with a special focus on the use of embodied robots with healthy individuals and with the clinical population of individuals diagnosed with ASD. We highlighted that classical approaches, with their observational stance and screen-based stimuli, do not capture all aspects of social cognition. Therefore, new approaches capitalizing on naturalistic and interactive setups (Schilbach et al., 2013) are more promising in terms of explaining various aspects of social cognition. However, using naturalistic approaches is challenging with respect to experimental control. In this context, humanoid robots can prove particularly useful, as they allow studying social cognition, and joint attention specifically, with both a high degree of experimental control and relatively high ecological validity. Such an approach provides new insights into the mechanisms of joint attention (such as the role of human-likeness and eye contact in eliciting gaze-cueing effects, and the difficulty of disengaging from a face during eye contact), and it has potential for application in healthcare, in training and examining joint attention in children diagnosed with ASD.

One crucial theoretical question that is not yet fully understood in joint attention research is how different nonverbal cues, such as the eyes, head, body posture, or pointing, are integrated in order to summon a human's attention. This question could readily be addressed with full-body humanoid robots equipped with mechanical eyes, since the robot's movements can be decomposed into individual components, but also into selected combinations of them, a property that Sciutti et al. (2015) termed "modularity of the control." This topic is also relevant for clinical studies. Previous research showed that in autism, a robot seemed to need a higher level of prompting than a human (e.g., a robot needed to use a combination of the face and arm, whereas a human needed only the face; see Anzalone et al., 2014; Bekele et al., 2014; David et al., 2018). However, those studies did not examine the cognitive processes involved, and the results are still very exploratory because of the small numbers of participants.
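The experimental design implied by this "modularity of the control" can be sketched simply: each nonzero combination of cue channels becomes one condition of a factorial study. The cue names below are illustrative; a given robot platform would expose its own set of controllable channels.

```python
from itertools import combinations

# Hypothetical cue channels a full-body humanoid could render independently.
CUES = ("eyes", "head", "pointing")

def cue_conditions(cues=CUES):
    """Enumerate all non-empty combinations of nonverbal cue channels.

    Each combination can be rendered as one experimental condition, so the
    contribution of single cues and of their integration to orienting a
    partner's attention can be measured separately.
    """
    conds = []
    for k in range(1, len(cues) + 1):
        conds.extend(combinations(cues, k))
    return conds

# With three channels: 3 single cues + 3 pairs + 1 triple = 7 conditions.
```

Comparing responses across these seven conditions would indicate, for instance, whether head-plus-pointing prompts orient attention more reliably than eyes alone, the kind of contrast the clinical studies cited above only probed informally.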

Similarly, the mechanical abilities of a humanoid robot could allow for exploring how the velocity of movements affects joint attention. This is also relevant for clinical studies in autism, as this population is known to have impaired processing of visual motion (Simmons et al., 2009). Some studies have observed that slowing down the playback of videos helps children diagnosed with ASD improve verbal cognition and behavior (Tardif, Latzko, Arciszewski, & Gepner, 2017) and explore facial signals more thoroughly (Charrier, Tardif, & Gepner, 2017).
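The core velocity manipulation amounts to time-stretching a movement while leaving its spatial path unchanged, the robotic analogue of the slowed-down video stimuli used by Tardif et al. (2017). A minimal sketch, with a trajectory represented as hypothetical (time, position) samples:

```python
def slow_down(trajectory, factor):
    """Time-stretch a movement by a constant factor, preserving its path.

    `trajectory` is a list of (time_s, position) samples; multiplying each
    timestamp by `factor` > 1 replays the identical spatial trajectory at
    proportionally lower velocity. This is an illustrative sketch, not the
    API of any particular robot platform.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [(t * factor, pos) for t, pos in trajectory]

# A gaze shift sampled at 0 s and 1 s, replayed at half speed,
# reaches the same endpoint at 2 s instead of 1 s.
halved = slow_down([(0.0, 0.0), (1.0, 0.5)], factor=2.0)
```

Because the same factor can be applied to head turns, arm movements, or whole-body gestures, velocity can be manipulated parametrically while all other cue properties are held constant.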

The possibility of changing the appearance of robots, by modifying, adding, or removing elements of their bodies and faces, could enable investigating how social and individual biases related to appearance affect joint attention. Understanding the impact of appearance on joint attention could greatly help in clinical applications; for example, plain robotic faces and bodies have been discussed as being more efficient for interacting with children with autism than more realistic, complex embodiments (Billard, Robins, Nadel, & Dautenhahn, 2007).

Another aspect of joint attention that could be thoroughly investigated using humanoid robots, but that is almost impossible to examine with screen-based experiments, is joint attention during joint action. This could substantially advance joint attention research, since the majority of dyadic or group interactions in real life involve actions. The findings would also directly help clinical studies target the impaired interaction processes more efficiently.

Author Note

This work is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant awarded to A.W., titled “InStance: Intentional Stance for Social Attunement,” grant agreement no. 715058) and by Minded Program–Marie Skłodowska-Curie grant agreement no. 754490, a fellowship awarded to P.C.

References

  1. Admoni, H., Bank, C., Tan, J., Toneva, M., & Scassellati, B. (2011). Robot gaze does not reflexively cue human attention. In L. Carlson, C. Hölscher, & T. F. Shipley (Eds.), Expanding the space of cognitive science: Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (pp. 1983–1988). Austin, TX: Cognitive Science Society. Retrieved from https://escholarship.org/uc/item/3pq1v9b0

  2. Admoni, H., & Scassellati, B. (2017). Social eye gaze in human–robot interaction: A review. Journal of Human–Robot Interaction, 6, 25–63. doi:https://doi.org/10.5898/JHRI.6.1.Admoni

  3. Akiyama, T., Kato, M., Muramatsu, T., Maeda, T., Hara, T., & Kashima, H. (2008). Gaze-triggered orienting is reduced in chronic schizophrenia. Psychiatry Research, 158, 287–296. doi:https://doi.org/10.1016/j.psychres.2006.12.004

  4. Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4, 267–278. doi:https://doi.org/10.1016/S1364-6613(00)01501-1

  5. Anzalone, S. M., Tilmont, E., Boucenna, S., Xavier, J., Jouen, A.-L., Bodeau, N., . . . Cohen, D. (2014). How children with autism spectrum disorder behave and explore the 4-dimensional (spatial 3D + time) environment during a joint attention induction task with a robot. Research in Autism Spectrum Disorders, 8, 814–826. doi:https://doi.org/10.1016/j.rasd.2014.03.002

  6. Anzalone, S. M., Xavier, J., Boucenna, S., Billeci, L., Narzisi, A., Muratori, F., . . . Chetouani, M. (2019). Quantifying patterns of joint attention during human–robot interactions: An application for autism spectrum disorder assessment. Pattern Recognition Letters, 118, 42–50. doi:https://doi.org/10.1016/j.patrec.2018.03.007

  7. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders: DSM-5. Arlington, VA: American Psychiatric Association.

  8. Baron-Cohen, S. (2010). Empathizing, systemizing, and the extreme male brain theory of autism. In I. Savic (Ed.), Sex differences in the human brain, their underpinnings and implications (Progress in Brain Research, Vol. 186, pp. 167–175). Amsterdam, The Netherlands: Elsevier. doi:https://doi.org/10.1016/B978-0-444-53630-3.00011-7

  9. Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001). The Autism-Spectrum Quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31, 5–17. doi:https://doi.org/10.1023/A:1005653411471

  10. Bayliss, A. P., Murphy, E., Naughtin, C. K., Kriticos, A., Schilbach, L., & Becker, S. I. (2013). “Gaze leading”: Initiating simulated joint attention influences eye movements and choice behavior. Journal of Experimental Psychology: General, 142, 76–92. doi:https://doi.org/10.1037/a0029286

  11. Bayliss, A. P., di Pellegrino, G., & Tipper, S. P. (2005). Sex differences in eye gaze and symbolic cueing of attention. Quarterly Journal of Experimental Psychology, 58A, 631–650. doi:https://doi.org/10.1080/02724980443000124

  12. Bayliss, A. P., & Tipper, S. P. (2006). Predictive gaze cues and personality judgments: Should eye trust you? Psychological Science, 17, 514–520. doi:https://doi.org/10.1111/j.1467-9280.2006.01737.x

  13. Bekele, E., Crittendon, J. A., Swanson, A., Sarkar, N., & Warren, Z. E. (2014). Pilot clinical application of an adaptive robotic system for young children with autism. Autism, 18, 598–608.

  14. Billard, A., Robins, B., Nadel, J., & Dautenhahn, K. (2007). Building robota, a mini-humanoid robot for the rehabilitation of children with autism. Assistive Technology, 19, 37–49.

  15. Boccanfuso, L., Scarborough, S., Abramson, R. K., Hall, A. V., Wright, H. H., & O’Kane, J. M. (2017). A low-cost socially assistive robot and robot-assisted intervention for children with autism spectrum disorder: Field trials and lessons learned. Autonomous Robots, 41, 637–655. doi:https://doi.org/10.1007/s10514-016-9554-4

  16. Bolis, D., & Schilbach, L. (2018). “I interact therefore I am”: The self as a historical product of dialectical attunement. Topoi. Advance online publication. doi:https://doi.org/10.1007/s11245-018-9574-0

  17. Bono, V., Narzisi, A., Jouen, A.-L., Tilmont, E., Hommel, S., Jamal, W., . . . MICHELANGELO Study Group. (2016). GOLIAH: A gaming platform for home-based intervention in autism—Principles and design. Frontiers in Psychiatry, 7, 70. doi:https://doi.org/10.3389/fpsyt.2016.00070

  18. Brooks, R., & Meltzoff, A. N. (2005). The development of gaze following and its relation to language. Developmental Science, 8, 535–543. doi:https://doi.org/10.1111/j.1467-7687.2005.00445.x

  19. Cabibihan, J.-J., Javed, H., Ang, M., & Aljunied, S. M. (2013). Why robots? A survey on the roles and benefits of social robots in the therapy of children with autism. International Journal of Social Robotics, 5, 593–618. doi:https://doi.org/10.1007/s12369-013-0202-2

  20. Capozzi, F., & Ristic, J. (2018). How attention gates social interactions. Annals of the New York Academy of Sciences, 1426, 179–198. doi:https://doi.org/10.1111/nyas.13854

  21. Carraro, L., Dalmaso, M., Castelli, L., Galfano, G., Bobbio, A., & Mantovani, G. (2017). The appeal of the devil’s eye: Social evaluation affects social attention. Cognitive Processing, 18, 97–103. doi:https://doi.org/10.1007/s10339-016-0785-2

  22. Caruana, N., McArthur, G., Woolgar, A., & Brock, J. (2017). Simulating social interactions for the experimental investigation of joint attention. Neuroscience & Biobehavioral Reviews, 74, 115–125. doi:https://doi.org/10.1016/j.neubiorev.2016.12.022

  23. Chaminade, T., & Okka, M. M. (2013). Comparing the effect of humanoid and human face for the spatial orientation of attention. Frontiers in Neurorobotics, 7, 12. doi:https://doi.org/10.3389/fnbot.2013.00012

  24. Charman, T., Swettenham, J., Baron-Cohen, S., Cox, A., Baird, G., & Drew, A. (1997). Infants with autism: An investigation of empathy, pretend play, joint attention, and imitation. Developmental Psychology, 33, 781–789. doi:https://doi.org/10.1037/0012-1649.33.5.781

  25. Charrier, A., Tardif, C., & Gepner, B. (2017). Amélioration de l’exploration visuelle d’un visage par des enfants avec autisme grâce au ralentissement de la dynamique faciale: Une étude préliminaire en oculométrie. L’Encéphale, 43, 32–40. doi:https://doi.org/10.1016/j.encep.2016.02.005

  26. Chevalier, P. (2016). Impact of sensory preferences in individuals with autism spectrum disorder on their social interaction with a robot (Thesis). Paris, France: Université Paris-Saclay. Retrieved from http://www.theses.fr/2016SACLY017

  27. Chevalier, P., Li, J. J., Ainger, E., Alcorn, A. M., Babovic, S., Charisi, V., . . . Evers, V. (2017). Dialogue design for a robot-based face-mirroring game to engage autistic children with emotional expressions. In A. Kheddar, E. Yoshida, S. S. Ge, K. Suzuki, J.-J. Cabibihan, F. Eyssel, & H. He (Eds.), Social robotics (pp. 546–555). Berlin, Germany: Springer.

  28. Chevalier, P., Martin, J.-C., Isableu, B., Bazile, C., Iacob, D.-O., & Tapus, A. (2016). Joint attention using human–robot interaction: Impact of sensory preferences of children with autism. In 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016) (INSPEC no. 16467876). Piscataway, NJ: IEEE Press.

  29. Ciardo, F., Marino, B. F. M., Actis-Grosso, R., Rossetti, A., & Ricciardelli, P. (2014). Face age modulates gaze following in young adults. Scientific Reports, 4, 4746. doi:https://doi.org/10.1038/srep04746

  30. Ciardo, F., Marino, B. F. M., Rossetti, A., Actis-Grosso, R., & Ricciardelli, P. (2013). Face age and social status exert different modulatory effects on gaze following behaviour. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Cooperative minds: Social interaction and group dynamics. Proceedings of the 35th Annual Meeting of the Cognitive Science Society (Vol. 35, pp. 2058–2063). Austin, TX: Cognitive Science Society. Retrieved from https://dx.escholarship.org/uc/item/0mg1j9np

  31. Ciardo, F., Ricciardelli, P., Lugli, L., Rubichi, S., & Iani, C. (2015). Eyes keep watch over you! Competition enhances joint attention in females. Acta Psychologica, 160, 170–177. doi:https://doi.org/10.1016/j.actpsy.2015.07.013

  32. Cole, G. G., Smith, D. T., & Atkinson, M. A. (2015). Mental state attribution and the gaze cueing effect. Attention, Perception, & Psychophysics, 77, 1105–1115. doi:https://doi.org/10.3758/s13414-014-0780-6

  33. Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visuospatial attention. Journal of Neuroscience, 13, 1202–1226. doi:https://doi.org/10.1523/JNEUROSCI.13-03-01202.1993

  34. Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215. doi:https://doi.org/10.1038/nrn755

  35. Dalmaso, M., Edwards, S. G., & Bayliss, A. P. (2016). Re-encountering individuals who previously engaged in joint gaze modulates subsequent gaze cueing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 271–284. doi:https://doi.org/10.1037/xlm0000159

  36. Dalmaso, M., Galfano, G., Tarqui, L., Forti, B., & Castelli, L. (2013). Is social attention impaired in schizophrenia? Gaze, but not pointing gestures, is associated with spatial attention deficits. Neuropsychology, 27, 608–613.

  37. Dalmaso, M., Pavan, G., Castelli, L., & Galfano, G. (2012). Social status gates social attention in humans. Biology Letters, 8, 450–452. doi:https://doi.org/10.1098/rsbl.2011.0881

  38. David, D. O., Costescu, C. A., Matu, S., Szentagotai, A., & Dobrean, A. (2018). Developing joint attention for children with autism in robot-enhanced therapy. International Journal of Social Robotics, 10, 595–605. doi:https://doi.org/10.1007/s12369-017-0457-0

  39. Driver, J. D., Davis, G., Ricciardelli, P., Kidd, P., Maxwell, E., & Baron-Cohen, S. (1999). Gaze perception triggers reflexive visuospatial orienting. Visual Cognition, 6, 509–540. doi:https://doi.org/10.1080/135062899394920

  40. Duquette, A., Michaud, F., & Mercier, H. (2008). Exploring the use of a mobile robot as an imitation agent with children with low-functioning autism. Autonomous Robots, 24, 147–157.

  41. Edwards, S. G., Stephenson, L. J., Dalmaso, M., & Bayliss, A. P. (2015). Social orienting in gaze leading: A mechanism for shared attention. Proceedings of the Royal Society B, 282, 20151141. doi:https://doi.org/10.1098/rspb.2015.1141

  42. Emery, N. J. (2000). The eyes have it: The neuroethology, function and evolution of social gaze. Neuroscience & Biobehavioral Reviews, 24, 581–604. doi:https://doi.org/10.1016/S0149-7634(00)00025-7

  43. Farroni, T., Csibra, G., Simion, F., & Johnson, M. H. (2002). Eye contact detection in humans from birth. Proceedings of the National Academy of Sciences, 99, 9602–9605. doi:https://doi.org/10.1073/pnas.152159999

  44. Feinman, S., Roberts, D., Hsieh, K.-F., Sawyer, D., & Swanson, D. (1992). A critical review of social referencing in infancy. In S. Feinman (Ed.), Social referencing and the social construction of reality in infancy (pp. 15–54). New York, NY: Springer. doi:https://doi.org/10.1007/978-1-4899-2462-9_2

  45. Ferrari, E., Robins, B., & Dautenhahn, K. (2009). Therapeutic and educational objectives in robot assisted play for children with autism. In Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication (pp. 108–114). Piscataway, NJ: IEEE Press.

  46. Friesen, C. K., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonomic Bulletin & Review, 5, 490–495. doi:https://doi.org/10.3758/BF03208827

  47. Frischen, A., Bayliss, A. P., & Tipper, S. P. (2007). Gaze cueing of attention. Psychological Bulletin, 133, 694–724. doi:https://doi.org/10.1037/0033-2909.133.4.694

  48. Gallup, A. C., Chong, A., & Couzin, I. D. (2012a). The directional flow of visual information transfer between pedestrians. Biology Letters, 8, 520–522. doi:https://doi.org/10.1098/rsbl.2012.0160

  49. Gallup, A. C., Hale, J. J., Sumpter, D. J. T., Garnier, S., Kacelnik, A., Krebs, J. R., & Couzin, I. D. (2012b). Visual attention and the acquisition of information in human crowds. Proceedings of the National Academy of Sciences, 109, 7245–7250. doi:https://doi.org/10.1073/pnas.1116141109

  50. Gobel, M. S., Kim, H. S., & Richardson, D. C. (2015). The dual function of social gaze. Cognition, 136, 359–364. doi:https://doi.org/10.1016/j.cognition.2014.11.040

  51. Greene, D. J., Mooshagian, E., Kaplan, J. T., Zaidel, E., & Iacoboni, M. (2009). The neural correlates of social attention: Automatic orienting to social and nonsocial cues. Psychological Research, 73, 499–511. doi:https://doi.org/10.1007/s00426-009-0233-3

  52. Harries, M. H., & Perrett, D. I. (1991). Visual processing of faces in temporal cortex: Physiological evidence for a modular organization and possible anatomical correlates. Journal of Cognitive Neuroscience, 3, 9–24. doi:https://doi.org/10.1162/jocn.1991.3.1.9

  53. Hart, M. (2005). Autism/Excel study. In Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 136–141). New York, NY: ACM Press. doi:https://doi.org/10.1145/1090785.1090811

  54. Hayward, D. A., Voorhies, W., Morris, J. L., Capozzi, F., & Ristic, J. (2017). Staring reality in the face: A comparison of social attention across laboratory and real world measures suggests little common ground. Canadian Journal of Experimental Psychology, 71, 212–225. doi:https://doi.org/10.1037/cep0000117

  55. Hietanen, J. K., & Leppänen, J. M. (2003). Does facial expression affect attention orienting by gaze direction cues? Journal of Experimental Psychology: Human Perception and Performance, 29, 1228–1243. doi:https://doi.org/10.1037/0096-1523.29.6.1228

  56. Hoffman, E. A., & Haxby, J. V. (2000). Distinct representations of eye gaze and identity in the distributed human neural system for face perception. Nature Neuroscience, 3, 80–84. doi:https://doi.org/10.1038/71152

  57. Hood, B. M., Willen, J. D., & Driver, J. (1998). Adult’s eyes trigger shifts of visual attention in human infants. Psychological Science, 9, 131–134. doi:https://doi.org/10.1111/1467-9280.00024

  58. Ingersoll, B., & Lalonde, K. (2010). The impact of object and gesture imitation training on language use in children with autism spectrum disorder. Journal of Speech, Language, and Hearing Research, 53, 1040–1051. doi:https://doi.org/10.1044/1092-4388

  59. Johnson, C. P., Myers, S. M., & American Academy of Pediatrics Council on Children With Disabilities. (2007). Identification and evaluation of children with autism spectrum disorders. Pediatrics, 120, 1183–1215. doi:https://doi.org/10.1542/peds.2007-2361

  60. Jonides, J. (1981). Voluntary versus automatic control over the mind’s eye’s movement. In J. Long & A. Baddeley (Eds.), Attention and performance IX (pp. 187–203). Hillsdale, NJ: Erlbaum.

  61. Jording, M., Hartz, A., Bente, G., Schulte-Rüther, M., & Vogeley, K. (2018). The “social gaze space”: A taxonomy for gaze-based communication in triadic interactions. Frontiers in Psychology, 9, 226. doi:https://doi.org/10.3389/fpsyg.2018.00226

  62. Jung, Y., & Lee, K. M. (2004). Effects of physical embodiment on social presence of social robots. In Proceedings of PRESENCE (pp. 80–87). Amsterdam, The Netherlands: Elsevier.

  63. Kajopoulos, J., Cheng, G., Kise, K., Müller, H. J., & Wykowska, A. (in press). Focusing on the face or getting distracted by social signals? The effect of distracting gestures on attentional focus in natural interaction. Psychological Research.

  64. Kajopoulos, J., Wong, A. H. Y., Yuen, A. W. C., Dung, T. A., Kee, T. Y., & Wykowska, A. (2015). Robot-assisted training of joint attention skills in children diagnosed with autism. In A. Agah, J.-J. Cabibihan, A. M. Howard, M. A. Salichs, & H. He (Eds.), Social robotics (Lecture Notes in Computer Science, Vol. 9979, pp. 296–305). Cham, Switzerland: Springer. doi:https://doi.org/10.1007/978-3-319-25554-5_30

  65. Kaplan, F. (2004). Who is afraid of the humanoid? Investigating cultural differences in the acceptance of robots. International Journal of Humanoid Robotics, 1, 465–480. doi:https://doi.org/10.1142/S0219843604000289

  66. Kompatsiari, K., Ciardo, F., De Tommaso, D., & Wykowska, A. (2019a). Measuring engagement elicited by eye contact in human–robot interaction. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, 4–8 November 2019. Piscataway, NJ: IEEE Press.

  67. Kompatsiari, K., Ciardo, F., Tikhanoff, V., Metta, G., & Wykowska, A. (2018). On the role of eye contact in gaze cueing. Scientific Reports, 8, 17842. doi:https://doi.org/10.1038/s41598-018-36136-2

    Article  PubMed  PubMed Central  Google Scholar 

  68. Kompatsiari, K., Ciardo, F., Tikhanoff, V., Metta, G., & Wykowska, A. (2019b). It’s in the eyes: The engaging role of eye contact in HRI. International Journal of Social Robotics. Advance online publication. doi:https://doi.org/10.1007/s12369-019-00565-4

  69. Kompatsiari, K., Perez-Osorio, J., De Tommaso, D., Metta, G., & Wykowska, A. (2018). Neuroscientifically-grounded research for improved human–robot interaction. Paper presented at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain.

  70. Lachat, F., Conty, L., Hugueville, L., & George, N. (2012). Gaze cueing effect in a face-to-face situation. Journal of Nonverbal Behavior, 36, 177–190. doi:https://doi.org/10.1007/s10919-012-0133-x

  71. Langdon, R., Seymour, K., Williams, T., & Ward, P. B. (2017). Automatic attentional orienting to other people’s gaze in schizophrenia. Quarterly Journal of Experimental Psychology, 70, 1549–1558. doi:https://doi.org/10.1080/17470218.2016.1192658

  72. Law, A. S., Langton, S. R. H., & Logie, R. H. (2010). Assessing the impact of verbal and visuospatial working memory load on eye-gaze cueing. Visual Cognition, 18, 1420–1438. doi:https://doi.org/10.1080/13506285.2010.496579

  73. Liu, X., Powell, D. K., Wang, H., Gold, B. T., Corbly, C. R., & Joseph, J. E. (2007). Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. Journal of Neuroscience, 27, 4587–4597. doi:https://doi.org/10.1523/JNEUROSCI.5227-06.2007

  74. Manyika, J., Chui, M., Bughin, J., Dobbs, R., Bisson, P., & Bosseler, A. (2013). Disruptive technologies: Advances that will transform life, business, and the global economy. San Francisco, CA: McKinsey Global Institute.

  75. Marotta, A., Casagrande, M., Rosa, C., Maccari, L., Berloco, B., & Pasini, A. (2014). Impaired reflexive orienting to social cues in attention deficit hyperactivity disorder. European Child and Adolescent Psychiatry, 23, 649–657. doi:https://doi.org/10.1007/s00787-013-0505-8

  76. Martini, M. C., Buzzell, G. A., & Wiese, E. (2015). Agent appearance modulates mind attribution and social attention in human–robot interaction. In A. Agah, J.-J. Cabibihan, A. M. Howard, M. A. Salichs, & H. He (Eds.), Social robotics (Lecture Notes in Computer Science, Vol. 9979, pp. 431–439). Berlin, Germany: Springer. doi:https://doi.org/10.1007/978-3-319-25554-5_43

  77. Menshikova, G. Y., Kovalev, A. I., & Luniakova, E. G. (2017). Studying the influence of race on the gaze cueing effect using eye tracking method. National Psychological Journal, 2, 46–58.

  78. Metta, G., Sandini, G., Vernon, D., Natale, L., & Nori, F. (2008). The iCub humanoid robot: An open platform for research in embodied cognition. In Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems (pp. 50–56). New York, NY: ACM Press. doi:https://doi.org/10.1145/1774674.1774683

  79. Michaud, F., Salter, T., Duquette, A., Mercier, H., Lauria, M., Larouche, H., & Larose, F. (2007). Assistive technologies and child–robot interaction. In AAAI Spring Symposium on Multidisciplinary Collaboration for Socially Assistive Robotics. Palo Alto, CA: Association for the Advancement of Artificial Intelligence. https://pdfs.semanticscholar.org/4010/e452401f828c43d65d0b02ea8025c8bea122.pdf

  80. Milne, E. (2011). Increased intra-participant variability in children with autistic spectrum disorders: Evidence from single-trial analysis of evoked EEG. Frontiers in Psychology, 2, 51. doi:https://doi.org/10.3389/fpsyg.2011.00051

  81. Mundy, P. (2018). A review of joint attention and social-cognitive brain systems in typical development and autism spectrum disorder. European Journal of Neuroscience, 47, 497–514. doi:https://doi.org/10.1111/ejn.13720

  82. Mundy, P., & Newell, L. (2007). Attention, joint attention, and social cognition. Current Directions in Psychological Science, 16, 269–274. doi:https://doi.org/10.1111/j.1467-8721.2007.00518.x

  83. Natale, L., Bartolozzi, C., Pucci, D., Wykowska, A., & Metta, G. (2017). iCub: The not-yet-finished story of building a robot child. Science Robotics, 2, eaaq1026. doi:https://doi.org/10.1126/scirobotics.aaq1026

  84. Nobre, A. C., Sebestyen, G. N., Gitelman, D. R., Mesulam, M. M., Frackowiak, R. S., & Frith, C. D. (1997). Functional localization of the system for visuospatial attention using positron emission tomography. Brain, 120, 515–533. doi:https://doi.org/10.1093/brain/120.3.515

  85. Okumura, Y., Kanakogi, Y., Kanda, T., Ishiguro, H., & Itakura, S. (2013). Infants understand the referential nature of human gaze but not robot gaze. Journal of Experimental Child Psychology, 116, 86–95. doi:https://doi.org/10.1016/j.jecp.2013.02.007

  86. Pelphrey, K. A., Singerman, J. D., Allison, T., & McCarthy, G. (2003). Brain activation evoked by perception of gaze shifts: The influence of context. Neuropsychologia, 41, 156–170. doi:https://doi.org/10.1016/S0028-3932(02)00146-X

  87. Pennisi, P., Tonacci, A., Tartarisco, G., Billeci, L., Ruta, L., Gangemi, S., & Pioggia, G. (2016). Autism and social robotics: A systematic review. Autism Research, 9, 165–183. doi:https://doi.org/10.1002/aur.1527

  88. Perez-Osorio, J., Müller, H. J., Wiese, E., & Wykowska, A. (2015). Gaze following is modulated by expectations regarding others’ action goals. PLoS ONE, 10, e0143614. doi:https://doi.org/10.1371/journal.pone.0143614

  89. Perez-Osorio, J., Müller, H. J., & Wykowska, A. (2017). Expectations regarding action sequences modulate electrophysiological correlates of the gaze-cueing effect. Psychophysiology, 54, 942–954. doi:https://doi.org/10.1111/psyp.12854

  90. Perrett, D. I., Smith, P. A. J., Mistlin, A. J., Chitty, A. J., Head, A. S., Potter, D. D., . . . Jeeves, M. A. (1985). Visual analysis of body movements by neurones in the temporal cortex of the macaque monkey: A preliminary report. Behavioural Brain Research, 16, 153–170. doi:https://doi.org/10.1016/0166-4328(85)90089-0

  91. Pfeiffer, U. J., Timmermans, B., Bente, G., Vogeley, K., & Schilbach, L. (2011). A non-verbal Turing test: Differentiating mind from machine in gaze-based social interaction. PLoS ONE, 6, e27591. doi:https://doi.org/10.1371/journal.pone.0027591

  92. Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25. doi:https://doi.org/10.1080/00335558008248231

  93. Putman, P., Hermans, E., & van Honk, J. (2006). Anxiety meets fear in perception of dynamic expressive gaze. Emotion, 6, 94–102.

  94. Rafal, R. (1996). Visual attention: Converging operations from neurology and psychology. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 139–192). Washington, DC: American Psychological Association. doi:https://doi.org/10.1037/10187-005

  95. Redcay, E., Dodell-Feder, D., Pearrow, M. J., Mavros, P. L., Kleiner, M., Gabrieli, J. D. E., & Saxe, R. (2010). Live face-to-face interaction during fMRI: A new tool for social cognitive neuroscience. NeuroImage, 50, 1639–1647. doi:https://doi.org/10.1016/j.neuroimage.2010.01.052

  96. Redcay, E., Kleiner, M., & Saxe, R. (2012). Look at this: The neural correlates of initiating and responding to bids for joint attention. Frontiers in Human Neuroscience, 6, 169. doi:https://doi.org/10.3389/fnhum.2012.00169

  97. Ricciardelli, P., Bricolo, E., Aglioti, S. M., & Chelazzi, L. (2002). My eyes want to look where your eyes are looking: Exploring the tendency to imitate another individual’s gaze. NeuroReport, 13, 2259–2264. doi:https://doi.org/10.1097/00001756-200212030-00018

  98. Ricciardelli, P., Carcagno, S., Vallar, G., & Bricolo, E. (2013). Is gaze following purely reflexive or goal-directed instead? Revisiting the automaticity of orienting attention by gaze cues. Experimental Brain Research, 224, 93–106. doi:https://doi.org/10.1007/s00221-012-3291-5

  99. Risko, E. F., Laidlaw, K. E., Freeth, M., Foulsham, T., & Kingstone, A. (2012). Social attention with real versus reel stimuli: Toward an empirical approach to concerns about ecological validity. Frontiers in Human Neuroscience, 6, 143. doi:https://doi.org/10.3389/fnhum.2012.00143

  100. Risko, E. F., Richardson, D. C., & Kingstone, A. (2016). Breaking the fourth wall of cognitive science: Real-world social attention and the dual function of gaze. Current Directions in Psychological Science, 25, 70–74. doi:https://doi.org/10.1177/0963721415617806

  101. Ristic, J., & Kingstone, A. (2005). Taking control of reflexive social attention. Cognition, 94, B55–B65. doi:https://doi.org/10.1016/j.cognition.2004.04.005

  102. Robins, B., & Dautenhahn, K. (2010). Developing play scenarios for tactile interaction with a humanoid robot: A case study exploration with children with autism. In S. S. Ge, H. Li, J.-J. Cabibihan, & Y. K. Tan (Eds.), Social robotics (pp. 243–252). Berlin, Germany: Springer.

  103. Rolls, E. T., Grabenhorst, F., & Parris, B. A. (2008). Warm pleasant feelings in the brain. NeuroImage, 41, 1504–1513. doi:https://doi.org/10.1016/j.neuroimage.2008.03.005

  104. Sato, W., Kochiyama, T., Yoshikawa, S., Naito, E., & Matsumura, M. (2004). Enhanced neural activity in response to dynamic facial expressions of emotion: An fMRI study. Cognitive Brain Research, 20, 81–91. doi:https://doi.org/10.1016/j.cogbrainres.2004.01.008

  105. Sato, W., & Yoshikawa, S. (2007). Enhanced experience of emotional arousal in response to dynamic facial expressions. Journal of Nonverbal Behavior, 31, 119–135. doi:https://doi.org/10.1007/s10919-007-0025-7

  106. Scassellati, B., Admoni, H., & Matarić, M. (2012). Robots for use in autism research. Annual Review of Biomedical Engineering, 14, 275–294. doi:https://doi.org/10.1146/annurev-bioeng-071811-150036

  107. Schilbach, L. (2014). On the relationship of online and offline social cognition. Frontiers in Human Neuroscience, 8, 278. doi:https://doi.org/10.3389/fnhum.2014.00278

  108. Schilbach, L. (2015). Eye to eye, face to face and brain to brain: Novel approaches to study the behavioral dynamics and neural mechanisms of social interactions. Current Opinion in Behavioral Sciences, 3, 130–135. doi:https://doi.org/10.1016/j.cobeha.2015.03.006

  109. Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., & Vogeley, K. (2013). Toward a second-person neuroscience. Behavioral and Brain Sciences, 36, 393–414. doi:https://doi.org/10.1017/S0140525X12000660

  110. Schilbach, L., Wilms, M., Eickhoff, S. B., Romanzetti, S., Tepest, R., Bente, G., . . . Vogeley, K. (2009). Minds made for sharing: Initiating joint attention recruits reward-related neurocircuitry. Journal of Cognitive Neuroscience, 22, 2702–2715. doi:https://doi.org/10.1162/jocn.2009.21401

  111. Schilbach, L., Wohlschlaeger, A. M., Kraemer, N. C., Newen, A., Shah, N. J., Fink, G. R., & Vogeley, K. (2006). Being with virtual others: Neural correlates of social interaction. Neuropsychologia, 44, 718–730. doi:https://doi.org/10.1016/j.neuropsychologia.2005.07.017

  112. Schuller, A.-M., & Rossion, B. (2001). Spatial attention triggered by eye gaze increases and speeds up early visual activity. NeuroReport, 12, 2381–2386. doi:https://doi.org/10.1097/00001756-200108080-00019

  113. Sciutti, A., Ansuini, C., Becchio, C., & Sandini, G. (2015). Investigating the ability to read others’ intentions using humanoid robots. Frontiers in Psychology, 6, 1362. doi:https://doi.org/10.3389/fpsyg.2015.01362

  114. Seibert, J. M., & Hogan, A. E. (1982). Procedures manual for the Early Social-Communication Scales (ESCS). Miami, FL: University of Miami, Mailman Center for Child Development.

  115. Simmons, D. R., Robertson, A. E., McKay, L. S., Toal, E., McAleer, P., & Pollick, F. E. (2009). Vision in autism spectrum disorders. Vision Research, 49, 2705–2739. doi:https://doi.org/10.1016/j.visres.2009.08.005

  116. Simut, R. E., Vanderfaeillie, J., Peca, A., Van de Perre, G., & Vanderborght, B. (2016). Children with autism spectrum disorders make a fruit salad with Probo, the social robot: An interaction study. Journal of Autism and Developmental Disorders, 46, 113–126. doi:https://doi.org/10.1007/s10803-015-2556-9

  117. Syrdal, D. S., Dautenhahn, K., Koay, K. L., & Walters, M. L. (2009). The Negative Attitudes Towards Robots Scale and reactions to robot behaviour in a live human–robot interaction study. Retrieved from http://dx.uhra.herts.ac.uk/handle/2299/9641

  118. Taheri, A., Meghdari, A., Alemi, M., & Pouretemad, H. (2018). Human–robot interaction in autism treatment: A case study on three pairs of autistic children as twins, siblings, and classmates. International Journal of Social Robotics, 10, 93–113. doi:https://doi.org/10.1007/s12369-017-0433-8

  119. Tardif, C., Latzko, L., Arciszewski, T., & Gepner, B. (2017). Reducing information’s speed improves verbal cognition and behavior in autism: A 2-cases report. Pediatrics, 139, e20154207. doi:https://doi.org/10.1542/peds.2015-4207

  120. Teufel, C., Alexis, D. M., Clayton, N. S., & Davis, G. (2010). Mental-state attribution drives rapid, reflexive gaze following. Attention, Perception, & Psychophysics, 72, 695–705. doi:https://doi.org/10.3758/APP.72.3.695

  121. Vecera, S. P., & Johnson, M. H. (1995). Gaze detection and the cortical processing of faces: Evidence from infants and adults. Visual Cognition, 2, 59–87. doi:https://doi.org/10.1080/13506289508401722

  122. Warren, Z. E., Zheng, Z., Swanson, A. R., Bekele, E., Zhang, L., Crittendon, J. A., . . . Sarkar, N. (2015). Can robotic interaction improve joint attention skills? Journal of Autism and Developmental Disorders, 45, 3726–3734. doi:https://doi.org/10.1007/s10803-013-1918-4

  123. Wiese, E., Weis, P., & Lofaro, D. (2018). Embodied social robots trigger gaze following in real-time. PsyArXiv preprint. doi:https://doi.org/10.31234/osf.io/8cx3s

  124. Wiese, E., Wykowska, A., & Müller, H. J. (2014). What we observe is biased by what other people tell us: Beliefs about the reliability of gaze behavior modulate attentional orienting to gaze cues. PLoS ONE, 9, e94529. doi:https://doi.org/10.1371/journal.pone.0094529

  125. Wiese, E., Wykowska, A., Zwickel, J., & Müller, H. J. (2012). I see what you mean: How attentional selection is shaped by ascribing intentions to others. PLoS ONE, 7, e45391. doi:https://doi.org/10.1371/journal.pone.0045391

  126. Wilkowski, B. M., Robinson, M. D., & Friesen, C. K. (2009). Gaze-triggered orienting as a tool of the belongingness self-regulation system. Psychological Science, 20, 495–501. doi:https://doi.org/10.1111/j.1467-9280.2009.02321.x

  127. Willemse, C., Marchesi, S., & Wykowska, A. (2018). Robot faces that follow gaze facilitate attentional engagement and increase their likeability. Frontiers in Psychology, 9, 70. doi:https://doi.org/10.3389/fpsyg.2018.00070

  128. Willemse, C., & Wykowska, A. (2019). In natural interaction with embodied robots we prefer it when they follow our gaze: A gaze-contingent mobile eyetracking study. PsyArXiv preprint. doi:https://doi.org/10.31234/osf.io/bnmvt

  129. Wilms, M., Schilbach, L., Pfeiffer, U., Bente, G., Fink, G. R., & Vogeley, K. (2010). It’s in your eyes—Using gaze-contingent stimuli to create truly interactive paradigms for social cognitive and affective neuroscience. Social Cognitive and Affective Neuroscience, 5, 98–107. doi:https://doi.org/10.1093/scan/nsq024

  130. Wykowska, A., Kajopoulos, J., Ramirez-Amaro, K., & Cheng, G. (2015). Autistic traits and sensitivity to human-like features of robot behavior. Interaction Studies, 16, 219–248. doi:https://doi.org/10.1075/is.16.2.09wyk

  131. Wykowska, A., Wiese, E., Prosser, A., & Müller, H. J. (2014). Beliefs about the minds of others influence how we process sensory information. PLoS ONE, 9, e94339. doi:https://doi.org/10.1371/journal.pone.0094339

  132. Zheng, Z., Zhang, L., Bekele, E., Swanson, A., Crittendon, J. A., Warren, Z., & Sarkar, N. (2013). Impact of robot-mediated interaction system on joint attention skills for children with autism. In IEEE 13th International Conference on Rehabilitation Robotics (ICORR) (INSPEC no. 6650408). Piscataway, NJ: IEEE Press. doi:https://doi.org/10.1109/ICORR.2013.6650408

  133. Zheng, Z., Zhao, H., Swanson, A. R., Weitlauf, A. S., Warren, Z. E., & Sarkar, N. (2018). Design, development, and evaluation of a noninvasive autonomous robot-mediated joint attention intervention system for young children with ASD. IEEE Transactions on Human–Machine Systems, 48, 125–135. doi:https://doi.org/10.1109/THMS.2017.2776865

Open Practices Statement

This article is a review of previous research, and no new data are reported here.

Author information

Contributions

A.W. and P.C. conceptualized the article. F.C. wrote the following paragraphs: “Classical Studies on Joint Attention,” “Bottom-Up and Top-Down Components in Joint Attention,” and “Joint Attention, Development, and Individual Differences.” K.K. wrote the “Recent Approaches to Study Joint Attention, Highlighting the Need for Reciprocity,” and “Limitations of Recent Approaches to Study Joint Attention” paragraphs and the “Using Robots to Examine Joint Attention” section. P.C. wrote the “Application of Joint Attention Studies in Human–Robot Interaction in Healthcare” section. All authors revised the manuscript.

Corresponding author

Correspondence to Pauline Chevalier.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Chevalier, P., Kompatsiari, K., Ciardo, F. et al. Examining joint attention with the use of humanoid robots-A new approach to study fundamental mechanisms of social cognition. Psychon Bull Rev 27, 217–236 (2020). https://doi.org/10.3758/s13423-019-01689-4

Keywords

  • Joint attention
  • Human–robot interaction
  • Healthy and clinical populations
  • Autism
  • Review