In recent years, there has been a growing interest in how visual attention is influenced by our interactions with other people. Numerous everyday activities, during which the most fundamental cognitive processes are engaged, do not take place in isolation but alongside other people. For example, many occupations that tax the highest levels of human performance, such as performing surgical procedures or piloting commercial aircraft, are carried out alongside coactors whose attention can influence that of others. Cognitive and social psychologists therefore wish to study these interactions in order to better understand how they might shape processes such as attention. Intuitively, another’s activity might influence our own attention in a number of ways. We can be affected by where they are looking, by what we know their interests or motivations to be, or by how they move their body and act. Nonetheless, attention to the manual and bodily actions of other humans has received little consideration. Indeed, it is common for action observation studies, including review articles, to make no reference to attentional orienting, or ‘attention’ at all (e.g., Atmaca, Sebanz, Prinz, & Knoblich, 2008; Braun, Ortega, & Wolpert, 2011; Galantucci & Sebanz, 2009; Paulus & Moore, 2007; Vesper, van der Wel, Knoblich, & Sebanz, 2011). It is hoped that understanding how attention is influenced by these phenomena will give psychologists a richer account of attention as a tool for understanding brain and behaviour.

Attention in the Social World

Orienting visual attention

In this context, attention refers to the aligning of a sense organ or of cognitive resources to a particular object or area of the environment, including to one or more features of the respective object or area. Features of an object or of visual space may include, for example, a particular shape, colour, or movement direction. The result of this alignment is that the attended area or feature can be processed more effectively, at the expense of other unattended areas or features. The direction of attention allows humans to focus processing capacity on the most salient or behaviourally relevant features of the environment. In vision, this sensory alignment can be achieved by movement of the eyes. Around three times per second, the eyes move in abrupt, rapid movements known as saccades. Between saccades, the eyes rest and extract visual information during periods known as fixations. With a large array of complex and competing visual information, saccades and fixations are guided in such a way as to facilitate behavioural goals—a process known as attentional orienting.

In addition to orienting attention by alignment of the eyes, however, it has been shown that certain objects and locations within the visual field can be processed more effectively than others when the eyes are stationary. The observation that ‘the mind’s eye’ can attend to certain areas or features of the environment over others, in the absence of eye movements, has led to the distinction between overt attention, where orienting occurs with eye and head movements, and covert attention, where attention is oriented in the absence of eye movements (Itti & Koch, 2000; Posner, 1980). Furthermore, attention is thought to be directed in two ways (Jonides, 1981). It can be captured by stimuli in a bottom-up, reflexive, or exogenous way, or it can be directed voluntarily, according to top-down, or endogenous, mechanisms. This distinction fundamentally rests on whether attention is directed by external or internal factors. For example, exogenous, external factors might include the appearance of a new object, motion onset, or an abrupt luminance change. In contrast, endogenous, internal factors can include one’s goals in visual search or one’s understanding of the meaning of stimuli.

Orienting to social cues: The case of gaze

Social attention can be defined as the study of how areas of space, objects, or their features are selectively processed depending upon the real or implied presence of other people, and the way information about other people is processed. Other humans can act as a cue to attention in several ways. For example, another’s eyes, head, or limbs are targets that can be a focus for attention in many contexts. These anatomical targets may themselves guide and cue attention to other aspects of the environment, such as when one points to an object of interest. Furthermore, the presence of another person—real or imagined—can influence how one attends to the human and nonhuman environment. For example, another’s social status or the presence of competition with them can influence which features of the environment are attended to when people interact (e.g., Gobel, Kim, & Richardson, 2015; Vick & Anderson, 2003). Moreover, our beliefs about the presence of other people may influence how we attend to their anatomy or the focus of their gaze, such as when we might ignore a television picture of another’s face but attend to them carefully during a live video conference (e.g., Laidlaw, Foulsham, Kuhn, & Kingstone, 2011). Needless to say, these factors can interact; thus, all are relevant to the study of social attention.

Much of the research effort directed at understanding social attention has focused on the role of eye gaze and head direction (see Birmingham & Kingstone, 2009; Frischen, Bayliss, & Tipper, 2007, for reviews). The reasoning behind this approach is that such observable cues provide an indication of where the focus of another’s attention is. For social organisms such as humans, these indicators may therefore provide particularly strong cues to orient our own attention, which can help in the achievement of other behavioural goals (e.g., Hare, Call, & Tomasello, 1998; Tomasello, Call, & Hare, 1998). Both covert and overt attention have been investigated in this respect, and common to both approaches has been an attempt to determine whether social cues are special in orienting attention when compared to other kinds of directional or symbolic stimuli (for example, schematic arrows or language).

Covert attention

In the case of covert attention, much of the work on gaze and arrow cueing has employed the Posner cueing paradigm (Posner, 1978, 1980), which established a number of currently accepted characteristics of attentional orienting. In this procedure, participants are asked to manually detect or identify a target onset following a cue. Cues can either be congruent with the direction or location of subsequent targets or incongruent with them. Shorter reaction times (RTs) to previously cued targets are thought to show that attention has been oriented by the cue. The investigation of both arrow and eye gaze stimuli in covert paradigms has a much longer history than the equivalent approaches for eye-movement measures. Early work established a subsequently long-held position: that nonpredictive arrow cues do not orient covert attention to the location of subsequent targets (Jonides, 1981). It later emerged, however, that central arrows can shift covert attention at longer stimulus onset asynchronies (SOAs; approximately 300 ms), whereas the asynchronies required by peripheral stimuli (e.g., luminance changes) can be as little as 100 ms (Cheal & Lyon, 1991). This finding was thought to show that centrally presented directional cues were only able to direct attention to peripheral locations endogenously, in contrast to cues in peripheral locations themselves, which capture attention exogenously.
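To make the dependent measure concrete, the following minimal sketch in Python simulates the logic of a Posner-style cueing analysis: RTs are collected for congruent and incongruent trials at each SOA, and the cueing effect is the difference between the two means. The trial counts, RT distributions, and SOA values are hypothetical placeholders for illustration only, not parameters taken from the studies reviewed here.

```python
# Illustrative sketch (not from the reviewed studies): how a cueing effect is
# typically computed in a Posner-style paradigm. Trial counts, SOAs, and RT
# values are hypothetical placeholders.
import random
import statistics

SOAS_MS = [100, 300]          # hypothetical stimulus onset asynchronies
N_TRIALS_PER_CELL = 50        # hypothetical trial count per condition


def simulated_rt(congruent: bool, soa_ms: int) -> float:
    """Return a fake reaction time (ms); congruent trials are made faster
    only at the longer SOA, mimicking endogenous orienting by a central cue."""
    base = random.gauss(350, 30)
    if congruent and soa_ms >= 300:
        base -= 25            # hypothetical facilitation from a valid cue
    return base


def cueing_effect(soa_ms: int) -> float:
    """Cueing effect = mean incongruent RT minus mean congruent RT (positive
    values indicate facilitation at the cued location)."""
    congruent = [simulated_rt(True, soa_ms) for _ in range(N_TRIALS_PER_CELL)]
    incongruent = [simulated_rt(False, soa_ms) for _ in range(N_TRIALS_PER_CELL)]
    return statistics.mean(incongruent) - statistics.mean(congruent)


for soa in SOAS_MS:
    print(f"SOA {soa} ms: cueing effect = {cueing_effect(soa):.1f} ms")
```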

Friesen and Kingstone (1998) first investigated the effects of gaze as a central cue, using a variant of the Posner cueing paradigm, and established that these cues induced a rapid effect (from 105 ms SOA onwards), which was present when gaze direction was nonpredictive and to be ignored by participants. These findings were interpreted as showing that, contrary to central arrow cues, central gaze cues could induce an automatic shift of covert attention. Further support for this position was offered by the finding that even counter-predictive gaze cues could direct attention at up to 700 ms SOA (Driver et al., 1999).

Nonetheless, since the original gaze cueing studies, evidence has emerged that directional cues such as arrows may also have reflexive characteristics in orienting attention (Hommel, Pratt, Colzato, & Godijn, 2001; Quadflieg, Mason, & Macrae, 2004; Ristic, Friesen, & Kingstone, 2002; Tipples, 2002), both with counterpredictive cues and at very short SOAs. Recent work (Gibson & Bryant, 2005) has suggested that the long-held view that arrows represent a voluntary orienting cue may have been based upon the extremely brief presentation of arrow cues (e.g., 25 ms) used in Jonides’s (1981) original study. Despite other attempts to demonstrate that gaze cues are more reflexive than arrow cues, recent work has suggested automatic shifts of attention with both directional stimuli (Friesen, Ristic, & Kingstone, 2004; Tipples, 2008).

Although both arrows and gaze orient attention reflexively, they may differ when it comes to inhibition of return (IOR; Posner & Cohen, 1984). IOR refers to the finding that after approximately 300 ms, RTs to targets that are preceded by a peripheral transient cue (e.g., a luminance change or the appearance of a new object) will be slowed when cues are congruent rather than incongruent (Klein, 2000; Taylor & Klein, 1998). That is, the facilitatory effect of a peripheral transient cue becomes inhibitory after about 300 ms. At much longer SOAs (e.g., 1,200 ms), gaze cueing has also been shown to cause inhibitory effects (Frischen & Tipper, 2004). However, even at these longer SOAs, this effect has not proved reliable without the presence of a central transient event to redirect attention away from the target location (Frischen, Smilek, Eastwood, & Tipper, 2007; McKee, Christie, & Klein, 2007).
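The sign convention at work here can be summarised in a short sketch. Using the same definition of the cueing effect as above (mean incongruent RT minus mean congruent RT), facilitation appears as a positive value and IOR as a negative one; the millisecond values below are hypothetical placeholders chosen only to illustrate the crossover for a peripheral transient cue.

```python
# Minimal sketch of the time course described above; all values are
# hypothetical placeholders, chosen only to show the sign flip from
# facilitation to inhibition of return (IOR) for a peripheral transient cue.
hypothetical_time_course = {
    100: +22.0,    # short SOA: cued targets detected faster (facilitation)
    300: +3.0,     # around the crossover point
    1200: -18.0,   # long SOA: cued targets detected more slowly (IOR)
}

for soa_ms, effect_ms in hypothetical_time_course.items():
    label = "facilitation" if effect_ms > 0 else "IOR"
    print(f"SOA {soa_ms:>4} ms: cueing effect {effect_ms:+.0f} ms ({label})")
```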

In summary, the behavioural evidence favours the view that gaze cues are roughly equivalent to other well-learned directional symbols such as arrows. Gaze and symbolic cues can orient attention reflexively; however, neither orients attention with the same characteristics as low-level, peripheral transient cues. The same evidence further suggests that attention shifts to gaze and other directional cues rely upon domain-general orienting processes rather than on a set of domain-specific mechanisms unique to eye gaze.

Overt attention

In the case of overt attention, researchers have found that eye gaze cues, which are typically presented centrally in a visual display, will reduce RTs to make voluntary saccades to a target when the gaze cue and target are in the same direction (Ricciardelli, Bricolo, Aglioti, & Chelazzi, 2002). This will occur even if the cues do not predict where the target will appear. Given that for overt attention the gaze cue is directly congruent with the direction of a performed motor behaviour (i.e., the cue and participant look the same way), many authors suggest that this and similar results may be due to facilitation and interference from observed oculomotor preparation (Mansfield, Farroni, & Johnson, 2003; Nummenmaa & Hietanen, 2006; Quintana et al., 2003). This position, which can be called the ‘gaze imitation hypothesis’, posits that observed eye gaze causes the activation of an oculomotor program in the observer, and that this subsequently affects the performance of the same gaze behaviour. The gaze behaviour that is supposedly imitated is goal-centred: the observer imitates a gaze directed towards the same location, rather than reproducing the oculomotor program per se, which would result in a saccade in the opposite direction when the other person faces the observer.

Further studies found that gaze cues did not induce errors on incongruent movements. This was replicated using equivalent schematic stimuli by Kuhn and Benson (2007), who additionally found that incongruent arrow cues could induce such errors in subsequent eye movements. Further evidence against the gaze imitation hypothesis comes from the observation that manual tasks can show a greater gaze cueing effect than saccadic tasks (Friesen & Kingstone, 2003). In addition, Koval, Thomas, and Everling (2005) found that in the case of antisaccades (where participants are instructed to make an eye movement in the opposite direction of a target), RTs are shorter following gaze cues that are congruent with these targets. In Koval et al.’s study, therefore, interference was not observed from gaze cues that were spatially incongruent with the saccades participants were asked to make. As such, their study implies that automatic imitation and saccadic motor contagion do not influence subsequent eye-movement behaviour. Instead, the authors suggest that saccade preparation is dependent upon the particular task, or goal, driving the saccadic behaviour. Despite these findings, the possibility remains that the task or goal of another’s saccadic behaviour may be imitated and so moderate gaze following. Studies have yet to probe this question in the case of oculomotor measures of gaze following, despite recent interest in the influence of higher level goal states in social attention (e.g., Teufel, Alexis, Clayton, & Davis, 2010; Wiese, Wykowska, Zwickel, & Müller, 2012). The possibility of goal imitation in saccadic behaviour therefore remains an interesting avenue for further work.

Despite evidence inconsistent with the assertion that gaze cues elicit imitative behaviour, they have consistently been shown to be a powerful cue for attention. Indeed, some evidence has pointed to the ability of others’ gaze to orient attention automatically. In favour of this position, at very short SOAs between the onset of a cue and a saccade target, gaze cues have been shown to interfere with saccade RTs, even when these cues are more likely to be in the opposite direction of targets (Kuhn & Kingstone, 2009). Nonetheless, the same study found that an equivalent pattern of results was elicited by arrow cues, similarly casting doubt upon whether eye gaze represents a special class of directional cue for overt attention. This ‘nothing special’ position has also found support in studies investigating saccade trajectories (Hermens & Walker, 2010).

Gaze, social attention, and the brain

In contrast to behavioural data, stronger support for specialized systems that orient to eye gaze comes from neurological findings. Evidence is primarily based upon the differential activation of particular neural regions during gaze cueing, in comparison to other stimuli. Whilst the neural areas identified are certainly not unique to gaze cueing, they are not generally identified with other attentional processes (Corbetta & Shulman, 2002). Regions thought to be uniquely associated with orienting to social and biological, versus nonbiological, cues are also identified as those which code for the perception of eye movements and the direction of eye gaze (Akiyama et al., 2006; Allison, Puce, & McCarthy, 2000; Hietanen, Nummenmaa, Nyman, Parkkola, & Hämäläinen, 2006).

The superior temporal sulcus (STS) is an area consistently identified as being sensitive to judgments about eye gaze in humans (Calder et al., 2007; Hoffman & Haxby, 2000; Perrett et al., 1985). A number of studies have established that in the case of the STS (although not adjacent regions, such as the superior temporal gyrus), greater activation occurs for judgments of gaze direction when compared with other directional cues (Hietanen, Leppänen, Nummenmaa, & Astikainen, 2008; Hietanen et al., 2006; Hooker et al., 2003; C. M. Tipper, Handy, Giesbrecht, & Kingstone, 2008). The STS has dense connections to the intraparietal sulcus (IPS), which is important in covert attention shifts and spatial processing (Corbetta et al., 1998; Harries & Perrett, 1991; Materna, Dicke, & Thier, 2008; Rafal, 1996). The role of these connections has found support in studies that have correlated STS activation with activity in IPS under conditions where averted gaze (i.e., where gaze is directed away from the observer) is viewed (George, Driver, & Dolan, 2001; Pelphrey, Singerman, Allison, & McCarthy, 2003; Wicker, Michel, Henaff, & Decety, 1998). Although the IPS has generally been shown to undergo greater activation following averted rather than direct gaze (i.e., when gaze is directed at the observer; Hoffman & Haxby, 2000), unlike the STS it is not thought to be sensitive to biological cues. Both socially relevant and nonsocially relevant directional judgments have been found to elicit activation in this area (Materna et al., 2008).

There is certainly disagreement regarding whether a gaze-orienting network is separable from regions associated with other kinds of orienting (Birmingham & Kingstone, 2009; Rombough, Barrie, & Iarocci, 2012). However, there is now a raft of evidence in humans and nonhuman primates to support the involvement of the STS and IPS in orienting to gaze and head direction, and thus in supporting social attention. Nonetheless, as the following discussion will highlight, the STS and IPS are by no means uniquely implicated in processing gaze or face information. In fact, these areas are associated with a range of tasks, including body orientation and the perception of manual and whole-body biological motion (Jellema, Baker, Wicker, & Perrett, 2000; Saxe, Xiao, Kovacs, Perrett, & Kanwisher, 2004; van Kemenade, Muggleton, Walsh, & Saygin, 2012).

The neural basis of attention to action

The past 15 to 20 years have witnessed a surge in interest in the neural mechanisms which support action perception in both human and nonhuman primates. Much of this research began with the discovery of mirror neurons in the macaque monkey. Mirror neurons are cells in premotor and parietal areas that respond to both the performance and observation of a goal-directed action. These neurons are thought to be sensitive to the goal state of observed actions, such that a commonly observed and performed action selectively activates a specific population of cells (di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992; Fogassi et al., 2005; Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Umiltà et al., 2001). Mirror neurons are therefore proposed to map an observed goal-directed action directly to one that is subsequently performed. Investigations with humans have also supported the presence of single-cell mirror neurons (Mukamel, Ekstrom, Kaplan, Iacoboni, & Fried, 2010). Furthermore, neuroimaging studies in humans have identified a widespread network of cortical areas involved in the understanding and performance of goal-directed actions, thus supporting the existence of a human ‘mirror neuron system’ (MNS). Many of these studies in humans have reported mirror neuron activity in areas assumed to be homologous to those where single-cell recordings have identified mirror neurons in monkeys (Filimon, Nelson, & Hagler, 2007; Hamilton & Grafton, 2006; Kilner, Friston, & Frith, 2007; Pobric & Hamilton, 2006; Van Overwalle & Baetens, 2009).

Whilst the discovery of mirror neurons has been a catalyst for much research investigating the neural basis of goal-directed action perception and understanding, a lively debate has emerged concerning the extent to which mirror neurons underlie these processes. Classic theories concerning mirror neurons have asserted that they represent an evolutionary specialisation for understanding others’ goal-directed actions, via direct access to representations that code for the performance of the same actions (Aglioti, Cesari, Romani, & Urgesi, 2008; Calvo-Merino, Glaser, Grezes, Passingham, & Haggard, 2005; Keysers & Perrett, 2004; Uddin et al., 2007). Specifically, it has been argued that mirror neurons enable action understanding by directly matching observed motor representations with those for performance of the same action, a position known as the direct matching hypothesis (Gallese, 2007). The action understanding afforded by this mechanism is thought to have a role in social processes such as empathy, theory of mind, language, and social attention (Gallese, 2008; Iacoboni, 2009; Williams, Whiten, Suddendorf, & Perrett, 2001). However, these theories have been criticised on both empirical and theoretical grounds.

Collecting direct evidence for mirror neurons in humans has not been possible (for obvious ethical reasons), with one important exception (Mukamel et al., 2010). Guided by clinical rather than empirical necessity, this study recorded from cortical areas (including the frontal eye fields; FEF) that were not identical to those previously identified in the macaque, and thus even these findings are problematic for extrapolating primate MNS localisation to humans. Further, whilst the macaque area F5 and the inferior parietal lobule (IPL) have been shown to contain mirror neurons, such cells do not represent the majority of neurons in these areas (an average of 30% of neurons in F5 and 20% in IPL; Kilner & Lemon, 2013). It is therefore far from certain that activity in homologous areas in humans (i.e., the IPL and inferior frontal gyrus; IFG) necessarily represents mirror neuron activity, without a more precise single-cell physiology of these human regions.

Indeed, fMRI data support the view that areas thought to be part of the human MNS, and homologous to macaque mirror neuron areas, are not specialised for action understanding but are linked to a range of processes (e.g., the processing of emotion, language, and number; Chochon, Cohen, Van De Moortele, & Dehaene, 1999; Simon, Mangin, Cohen, Le Bihan, & Dehaene, 2002). Even if neurophysiological data from human MNS areas represent the activity of single-cell mirror neurons, fMRI data have not always supported the view that areas associated with the human MNS are recruited during the performance and observation of the same action. For example, Lingnau, Gesierich, and Caramazza (2009) did not find evidence of fMRI adaptation, in human MNS areas, for actions that were first performed and then subsequently observed (see also Dinstein, Hasson, Rubin, & Heeger, 2007).

In addition to the lack of empirical verification of human mirror neurons, alternative theoretical perspectives have questioned the interpretation accorded to early studies in macaques. For example, it is difficult to explain how single cells could respond to goal-directed actions, but not non-goal-directed actions, without recruiting wider computational processes which would ‘make sense’ of the complexity of coding for goals (Hickok, 2009; Kosonogov, 2012). Heyes (2010) has questioned the evolutionary account of the MNS, pointing to the difficulty of aligning evidence from the human and monkey MNS and to evidence that areas associated with the MNS are also active in a range of social tasks. Heyes argues that mirror neurons are a corollary of the associative learning and sensorimotor experience which occur during social interaction, rather than an evolved mechanism making these interactions possible. Thus, while there is considerable evidence that mirror neurons are associated with the observation and performance of human actions, much of the evidence to date indicates that they may be one component of a diverse neural system for action understanding. In addition, the function of mirror neurons themselves may extend beyond merely understanding goal-directed action to a number of diverse processes in social cognition.

It is fair to say that little of the research concerning the neural underpinnings of action understanding and social cognition has explicitly examined the question of orienting attention to the manual or whole-body actions of others. Nonetheless, many of the areas that have been implicated in social attention have also been implicated in encoding manual action and movement. A number of adjacent and highly connected areas (e.g., the STS, the temporoparietal junction [TPJ], and the IPL) have been consistently linked with both action understanding and social attention (as well as higher level social cognition). The right TPJ, at the intersection between the inferior parietal cortex and the posterior temporal cortex, has also been extensively linked with reorienting attention for behavioural goals (Corbetta & Shulman, 2002; Devlin & Poldrack, 2007; Posner, Walker, Friedrich, & Rafal, 1984). The boundaries of the TPJ include parts of the posterior STS. As previously mentioned, the STS responds selectively to whole-body orientation as well as gaze orientation (Keysers & Perrett, 2004; Perrett et al., 1985). Perrett and colleagues (1985) identified this region as part of a biological orientation system. The neurons recorded in this area were most sensitive to gaze direction but also activated in response to head and whole-body orientation.

In the case of both gaze and actions, the STS is thought to be particularly responsive to the intentionality of perceived gaze or action, such that looking behaviours or manual movements that appear to be directed towards a target by an agent are associated with higher activation (Pelphrey, Morris, & McCarthy, 2004; Pelphrey et al., 2003). This interpretation is consistent with data showing that the STS is involved in higher order social cognitive processes such as theory of mind and recognition of intention and agency (Mar, Kelley, Heatherton, & Macrae, 2007; Rilling, Sanfey, Aronson, Nystrom, & Cohen, 2004; Saxe et al., 2004). Both sets of findings suggest that the STS supports higher order processing of gaze and action direction, rather than lower level visual analysis (Lee, Gao, & McCarthy, 2014; Schultz, Imamizu, Kawato, & Frith 2004).

The anatomical coverage of the TPJ also overlaps with the IPL. The IPL has also been linked specifically with orienting attention to averted gaze (Sato, Kochiyama, Uono, & Toichi, 2016). Further, it is considered part of the MNS: neurons here, as well as in the ventral premotor cortex, have been shown to represent both observed and performed goal-directed action (Chong, Cunnington, Williams, Kanwisher, & Mattingley, 2008; Fogassi et al., 2005). It has been reported that these areas are also specifically sensitive to movements embedded in complementary action goals and demonstrate different responses when the same movements are observed pursuing different goals. These cells have therefore also been thought to code for intentionality in goal-directed action. A number of studies have suggested that the IPL is a critical area for orienting attention following observed gaze (e.g., Driver & Mattingley, 1998; Vallar, 1993); however, this has been disputed, with conflicting work having argued that the parietal centre of the attention network is instead located in the superior parietal lobule (Gitelman et al., 1999). In addition to its observed association with attention to averted gaze, the IPS (as part of the MNS) may also have a role in attention shifts following observed action. It is activated during the planning, execution, and perception of manual action (Frey, Vinton, Norlund, & Grafton, 2005; Grezes & Decety, 2001). In addition, activation in this area is modulated by observed action goals (Hamilton & Grafton, 2006).

Whilst the promise of functional overlap between social attention and action understanding mechanisms is an interesting direction for future research, few if any studies have probed the role of these areas specifically for orienting attention to goal-directed action. What is clear is that areas that have been associated with attending to gaze direction and symbolic stimuli (e.g., arrows) overlap extensively with those which code for intentional biological motion and manual action, including the STS, IPL, IPS, and TPJ (Frischen et al. 2007; Hooker et al., 2003; Nummenmaa & Calder, 2009). Indeed, a number of neural regions that have been associated with processing gaze direction and the intentional nature of eye gaze behaviours have been linked to coding the direction of intentional manual actions. Nonetheless, there are at present no studies probing the overlapping or divergent activation of attentional networks following the observation of eye gaze and symbolic cues (e.g., Callejas, Shulman, & Corbetta, 2014) versus manual action cues, making this an interesting but unexplored area for future research.

Research at the neurophysiological level has now begun to probe whether gaze cueing may arise due to imitation or mirroring of another’s eye movements. This work has revealed that, despite evidence for neural regions specialized for gaze processing, there has so far been little support for ‘eye movement cells’ with mirror properties. Recently, some support has emerged for the presence of mirror neurons in areas that control saccadic activity and overt attention such as the lateral intraparietal area (LIP; Shepherd, Klein, Deaner, & Platt, 2009), and the supplementary motor area including the supplementary eye fields (Mukamel et al., 2010). However, no evidence yet exists for neurons with mirror properties in areas that are specifically linked with eye movements such as the frontal eye fields. Additionally, only Shepherd et al. (2009) recorded mirror properties of cells specifically for observed and performed eye-movement behaviours. In contrast, mirror neurons which activate during the performance and observation of actions have been found throughout the premotor cortex (PMC), which is primarily responsible for planning a range of manual actions (Calvo-Merino et al., 2005; Ferrari, Gallese, Rizzolatti, & Fogassi, 2003; Ferrari, Rozzi, & Fogassi, 2005).

Just as behavioural studies find little support for the gaze imitation hypothesis, it appears that shared representations for the performance and observation of an oculomotor movement lack a corresponding neural signature, in contrast to those for manual movements. Certainly further work is required to establish the case for mirror neurons in areas responsible for the control of attention and eye movements. The suggestion that processing eye-gaze direction and wider social orienting mechanisms could occur via mirroring or direct mapping processes has, however, attracted growing interest (Frischen, Loach, & Tipper, 2009; Rizzolatti & Sinigaglia, 2010; Shepherd, 2010). One recent fMRI study has already identified mirror neuron areas which may be responsive to averted and direct gaze in humans (Coudé et al., 2016).

Nonetheless, there are problems with linking mirror areas and social attention processes, even in the case of observed actions. One reason for this is that the PMC is not an area traditionally associated with endogenous or exogenous attentional orienting networks (Corbetta & Shulman, 2002; Rombough et al., 2012). In fact, neither areas that are uniquely associated with social orienting to gaze and head direction, nor more general circuits for endogenous and exogenous orienting, have been identified as involving PMC. Thus there is little neural basis for the involvement of this area of the MNS in orienting to action. Moreover, the PMC is generally thought to receive input both from the STS and IPS, suggesting that orienting and attention to movements might be processes that occur prior to the direct mapping mechanisms that are associated with PMC (Van Overwalle & Baetens, 2009). Thus, it may be that attention to biological movements, such as gaze and action, is a functional prerequisite to the direct mapping of representations, rather than a consequence of them (Thompson & Parasuraman, 2012).

Overt attention to action

Most of the systematic work concerning attention to the actions of others has employed measures of eye movements while participants observe goal-directed action. These paradigms have revealed that overt attention behaviours diverge when observing human manual actions versus mechanical or physical movements that share the same spatial or directional properties. Eye-movement paradigms have therefore provided strong evidence that overt attention may have specialised, domain-specific mechanisms for responding to observed actions and, moreover, that these mechanisms may also be influenced by the observer’s own current and future action plans.

Overt attention and action prediction

When individuals act alone in everyday tasks, predictive eye movements are important for the control and planning of manual actions. As such, a number of studies have shown that eye movements anticipate the goals of our own planned actions (Hayhoe & Ballard, 2005; Land & Hayhoe, 2001; Land, Mennie, & Rusted, 1999; Land & Tatler, 2009). Flanagan and Johansson (2003) also found that overt attention predicts not just the goals of one’s own actions but also those of others. Crucially, non-actor-propelled movement (e.g., a mechanical claw or projected object) along the same trajectory of motion does not elicit the same behaviour, even when it achieves the same goal. This pattern of performance can also be seen in preverbal infants, with spontaneous and proactive prediction generally increasing with age (Ambrosini, Costantini, & Sinigaglia, 2011; Ambrosini et al., 2013; Daum, Attig, Gunawan, Prinz, & Gredeback, 2012; Rosander & von Hofsten, 2011). Moreover, even when multiple potential action goals are present, predictive eye movements have been shown to rapidly and accurately reflect the goal of an observed action (Rotman, Troje, Johansson, & Flanagan, 2006).

The direct matching hypothesis and attention to action

Evidence suggesting a close relationship between predictive eye movements in performed and observed actions has led to the proposal that the MNS may mediate these processes. In the case of overt attention, the direct matching hypothesis postulates that the same predictive eye movements are represented by the motor system for observed as well as performed goal-directed actions. Direct matching has received considerable support in the literature concerning predictive eye movements.

Ambrosini et al. (2013) also found a relationship between infants’ ability to perform grasping actions and the level of predictive gaze behaviour during observation of the same movements. A relationship between 12-month-old infants’ predictive gaze and immediate prior experience of executing reaching responses to objects has also been demonstrated (Cannon, Woodward, Gredeback, von Hofsten, & Turek, 2012). Further, developmental perspectives have confirmed that infants’ motor experience determines the movements they most accurately predict (Stapel, Hunnius, Meyer, & Bekkering, 2016). With adults, Ambrosini et al. (2011) showed that predictive saccades were biased toward the object that matched the size of the actor’s grip. In their study, participants were more likely to saccade to the smaller of two objects when the observed reach was a precision grip and to the larger object when a power grip was viewed. Costantini, Ambrosini, and Sinigaglia (2012) found impairments to predictive gaze when participants observed actions using a particular grasp that was incompatible with the one they were asked to maintain while resting during the experiment. Virtual lesions of the left ventral premotor cortex were shown to modulate the predictive gaze effect (Costantini, Ambrosini, Cardellicchio, & Sinigaglia, 2013), as did observing actors move towards a target that was out of their reach (Costantini et al., 2012). When participants’ hand movements were restricted, predictive gaze was also impaired (Ambrosini et al., 2012).

A further feature of predictive gaze, which may support the direct matching hypothesis, is that anticipatory eye movements for action appear to be modulated by goal salience (Henrichs, Elsner, Elsner, & Gredeback, 2012). Considerable evidence from the mirror neuron literature in humans and nonhuman primates indicates that MNS activation is sensitive to the goal states of actions (Fogassi et al., 2005; Hamilton & Grafton, 2006; Rizzolatti & Fabbri-Destro, 2008; Van Overwalle & Baetens, 2009).

Despite the wealth of evidence consistent with the direct matching hypothesis, some objections have been raised. Most notably, Eshuis, Coventry, and Vulchanova (2009) found that the presentation of end effects when viewing action modulated predictive gaze and did not require a human actor to be viewed. This position has been further supported by the finding that in matched interactive contexts, predictive gaze to human and robot action is strikingly similar (Sciutti et al., 2012). As mirror neuron activation is generally thought to be selectively responsive to human biological actions (Kilner, Paulignan, & Blakemore, 2003; Tai, Scherfler, Brooks, Sawamoto, & Castiello, 2004), some authors suggest that predictive gaze is an example of a general goal-directed bias, which guides overt attention to action (Csibra & Gergely, 2007). In support of this position, some authors have found that predictive eye movements can also be sensitive to aspects of observed action goals that would not be available from simulating the action’s kinematics, including the relationship between an object’s physical properties and its function (Hunnius & Bekkering, 2010), or the efficacy with which an observed action achieves a goal (Biro, 2013). This evidence indicates that predictive eye movements are sensitive to movement–goal relationships that could not be explained solely by the presence of action performance processes during action observation.

The presence of goal-directed inference across a range of goal states during action observation may be explained by developmental approaches, which argue that correspondences between observed and performed actions are the result of learning and experience. The theory of associative sequence learning (ASL; Catmur, Walsh, & Heyes, 2009; Heyes, Bird, Johnson, & Haggard, 2005) suggests that mirror representations of goal-directed actions are determined primarily by the frequency with which infants and young children observe and perform complementary actions. Paulus et al. (2011) found that infants’ looking behaviour prior to an observed action was based on the frequency of what had previously been observed. This approach may explain why predictive gaze is modulated both by differences in the observed goal state and by kinematics when actions are observed.

Regardless of the underlying mechanisms subserving the predictive gaze phenomenon, there is now substantial evidence in favour of an attentional mechanism that, if not unique to the observation of human action, is attuned to the goal-directed context within which these movements are commonly observed. Crucially, research on predictive gaze has revealed that attention may select and orient to actions in a specialised way, when compared to equivalent lower level motion stimuli in the environment.

Covert attention to action

Action in covert cueing paradigms

Whilst there is now a substantial literature suggesting that eye movements and head direction can shift covert attention, much less is known about how actions direct covert shifts. A small number of studies have specifically addressed how manual gestures influence attention. Langton, O’Malley, and Bruce (1996) and Langton and Bruce (2000) initially used a Stroop-like (Stroop, 1935) paradigm to investigate whether processing of irrelevant pointing gestures interfered with verbal directional cues when making judgments of direction. This was shown to be the case, independently of head and eye-gaze orientation. These initial results indicated that participants were unable to ignore observed manual gestures when attempting to complete a task based upon speech. Langton and Bruce (2000) argued that like other directional social cues (e.g., head and eye gaze), manual actions can elicit exogenous visual orienting in the observer (Bruce & Langton, 1999; Driver et al., 1999; Friesen & Kingstone, 1998). The authors suggested that manual gesture direction may provide an orienting cue that is equally as salient as, or perhaps more salient than, other social cues such as eye gaze. This interpretation is consistent with studies showing that directional biological cues elicit reflexive cueing effects similar to those of eye gaze and head direction (Ariga & Watanabe, 2009; Watanabe, 2002).

Nonetheless, it is difficult to conclude whether the pointing gestures used in these studies generalize to other kinds of manual or goal-directed actions. Such pointing cues may act like other directional, symbolic stimuli that are known to reflexively orient attention, such as arrows (Friesen & Kingstone, 1998; Ristic et al., 2002; Tipples, 2002, 2008). These studies therefore do not determine whether pointing actions produce specialized attentional effects, as a result of being social and biological (like eye gaze), or influence attention in a more general, symbolic manner. In support of this distinction, some body postures alone, such as trunk orientation, do not cue attention (Hietanen, 2002). On the other hand, there is some evidence that biological motion, like both gaze and manual actions, may shift attention automatically. Using point-light displays of moving walkers, it has been shown that both human and nonhuman (e.g., a cat) biological motion can cue attention in a congruent direction, even when this is a counterpredictive cue to subsequent targets. Moreover, this effect disappears both with inverted motion orientation and with nonbiological motion stimuli (Shi, Weng, He, & Jiang, 2010). These results indicate that reflexive attention may be specialized for biological motion direction.

Additionally, Gervais, Reed, Beall, and Roberts (2010) directly investigated the cueing effects of noncommunicative manual actions. They utilized a nonpredictive cueing paradigm to determine whether running and throwing cues would orient attention to peripheral targets. Their results showed that both of these cues do indeed orient attention, additionally replicating the finding that trunk orientation alone does not elicit a cueing effect. Their results also yielded two other interesting findings. First, the SOA between cue and target was tested at 100 ms, 300 ms, and 600 ms. It was found that throw cues, but not run cues, shifted attention at the shortest SOA of 100 ms. This short SOA is like that observed with reflexive attention shifts following central gaze and arrow cues, as well as peripheral transient cues (Kingstone, Smilek, Ristic, Friesen, & Eastwood, 2003; Posner & Cohen, 1984). Second, and perhaps more strikingly, Gervais et al. found faster responses for throw and run cues across both cued and uncued targets compared to nonacting cues. This matches the finding in the gaze-cueing literature that gaze cues elicit shorter RTs across both cued and uncued locations compared to arrows (Quadflieg et al., 2004). These two findings (Gervais et al., 2010; Quadflieg et al., 2004) suggest that biological movement primes response readiness in the case of both gaze and action cues. This priming effect occurred following the throwing gestures, and to a lesser extent the running actions, compared to nonacting and nonhuman directional cues.

Manual actions may be perceived from the observer’s own egocentric viewpoint (where, for example, an actor might face away from the observer). Alternatively, and perhaps more commonly during social interaction, they may be viewed allocentrically, as with gaze, where an actor may face the observer. One question for research on attentional orienting by action is whether this perspective is important for the effects of observed action on attention. Belopolsky, Olivers, and Theeuwes (2008) investigated this with a cueing paradigm in which participants saw manual pointing gestures either allocentrically or egocentrically, and these gestures were either spatially compatible or incompatible with targets. This allowed the authors to examine whether perspective would influence the cue validity effect and, additionally, whether validity interacted with anatomical compatibility. The authors found that the validity effect for responding to cued locations did not interact with whether the actions were viewed allocentrically or egocentrically, or with the anatomical compatibility of the observed movement. These results suggest that the effect of such gestures on attention is independent of the direct mapping mechanisms that are known to drive compatibility effects between action observation and performance.

The interpretation of Belopolsky et al.’s (2008) paradigm is problematic, however. Despite being more ecologically valid, the pointing stimuli used are difficult to compare to the floating, isolated directional cues typically used in Posner cueing paradigms. In particular, pointing gestures are likely to be attended not just centrally but peripherally, and therefore may be attended exogenously due to low-level sensory input, rather than endogenously. This may explain the persistence of cueing effects across changes in perspective, which are known to modulate direct mapping mechanisms (e.g., Alaerts, Heremans, Swinnen, & Wenderoth, 2009). In addition, participants always performed the same right-handed finger press movement, so it is not known whether movement compatibility may have influenced the cueing effects.

Fischer, Prinz, and Lotz (2008) presented stronger evidence that covert attention may be oriented by the direct mapping of observed onto performed action. They presented participants with a central grasping gesture whose aperture was congruent with one of two differently sized objects. The authors found that when a subsequent target appeared at the location of the object congruent with the grasp, participants were faster to detect its onset, an effect that was present at an SOA (between grasping cue presentation and target onset) of as little as 250–300 ms. A second experiment replicated these findings with counterpredictive cues, establishing that grasp-based attentional orienting was reflexive. Additional evidence in favour of direct mapping has paralleled that of predictive gaze paradigms. Using a slightly modified procedure, Lindemann, Nuku, Rueschemeyer, and Bekkering (2011) found that this effect was sensitive to the animacy of the perceived grasp aperture. The authors again interpreted these results in the context of direct mapping, suggesting that covert attention shifts following observed actions are due to the simulation of the same attentional mechanisms required for the performance of a congruent action.

Social inhibition of return

In addition to studies where participants take part alone, attention to action has also been investigated when two or more people undertake a cueing task simultaneously. These paradigms modified earlier ones that investigated attentional processes when participants performed goal-directed reaching movements in isolation (Tremblay, Welsh, & Elliott, 2005; Welsh & Elliott, 2004; Welsh & Pratt, 2006). This includes a target–target aiming task, in which single participants are found to exhibit inhibition in the second of two RTs to the same target location. These results are typically interpreted in terms of IOR: attention returns to a now inhibited location in the second response because it had already been directed there during the first response. The presence of IOR implies that the execution of the participant’s first response induces a reflexive shift of attention to the location of aiming, much like low-level transient cues do in the traditional Posner cueing paradigm. The second aiming response is then inhibited when attention and/or motor processes are directed to the same target.

In the joint action paradigm used by Welsh et al. (2005), pairs of participants take turns to reach to locations on a shared work surface. Each participant’s trial consists of two target presentations that can appear at random, either to the left or right of a central fixation. This allows the analysis of each participant’s RT to initiate a response following their own previous response, and following the response of the other coacting participant. In the case of a participant following their own last action, Welsh et al. replicated previous work (e.g., Tremblay et al., 2005; Welsh & Pratt, 2006) showing that participants’ RTs were slower when the same location was reached to, versus a different location. Of particular interest, however, was the finding that a participant’s response to the same location as that of their coactor’s previous response was also slowed. This novel finding was interpreted as an example of IOR elicited by the actions of another.
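The logic of this analysis can be sketched in a few lines of code. The sketch below, which uses hypothetical actors, target sides, and RT values rather than the original data or analysis scripts, codes each reach by whether it goes to the same location as the immediately preceding response, separately for responses that follow the participant’s own reach (within-person) and those that follow the coactor’s reach (between-person); slower mean RTs for same-location responses would correspond to within- and between-person IOR.

```python
# Illustrative sketch of the trial coding described above; the sequence of
# responses and the RT values are hypothetical placeholders, not data or
# analysis code from Welsh et al. (2005).
from statistics import mean

# Each record: (responding actor, target side, reaction time in ms).
responses = [
    ("A", "left", 352), ("A", "left", 378),   # A repeats own location (within, same)
    ("B", "left", 366),                       # B repeats A's location (between, same)
    ("B", "right", 349),                      # B moves to a new location (within, different)
    ("A", "right", 371),                      # A repeats B's location (between, same)
    ("A", "left", 347),                       # A moves to a new location (within, different)
    ("B", "right", 345),                      # B responds away from A's location (between, different)
]


def mean_rts(responses):
    """Mean RT for same- vs. different-location responses, split by whether
    the preceding response was the participant's own (within-person) or the
    coactor's (between-person)."""
    buckets = {}
    for (prev_actor, prev_side, _), (actor, side, rt) in zip(responses, responses[1:]):
        relation = "within" if actor == prev_actor else "between"
        repeat = "same location" if side == prev_side else "different location"
        buckets.setdefault((relation, repeat), []).append(rt)
    return {key: mean(rts) for key, rts in buckets.items()}


for (relation, repeat), rt in sorted(mean_rts(responses).items()):
    print(f"{relation}-person, {repeat}: mean RT = {rt:.0f} ms")
```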

The presence of IOR was striking in this context, as this phenomenon had not been demonstrated following an action cue, and only two studies had done so in response to gaze cues (Frischen, Smilek, et al., 2007; Frischen & Tipper, 2004). In those studies, it was determined that long cue–target intervals were required, alongside a centrally presented perceptual transient such as the gaze cue offset or an independent transient event (see the Orienting to Social Cues: The Case of Gaze section). Given the robust finding that IOR occurs in the second of two subsequent goal-directed aiming movements (Tremblay et al., 2005; Welsh & Pratt, 2006), Welsh et al. suggested that observed aiming movements may elicit inhibitory mechanisms due to shared representations between observed and performed actions.

The transfer of inhibitory processes from one individual to another following goal-directed actions may be an example of action corepresentation. Corepresentation during joint action is defined as the shared representation of each other’s action and/or goal across two or more individuals. In these contexts, an actor represents not just their own action plans during a shared activity but also those of their partner. Corepresentation between actors can be supported by direct perceptual-motor experience of another’s actions, by communicative behaviours, or by inference and prediction of another’s actions based upon knowledge of the goals of their activity or goal-relevant stimuli (see Knoblich, Butterfill, & Sebanz, 2011; Vesper et al., 2016, for reviews).

Neural substrates of corepresentation are thought to involve a number of regions and networks, many of which have also been associated with other kinds of social cognition. Studies have identified higher activation in areas thought to be part of the human MNS when corepresentation of another’s action occurs within a joint task, including the IPL (Wen & Hsiah, 2015). Activation in the IPL is consistent with the MNS having a role in joint action, linking observed actions with the observer’s own performance (Molenberghs et al., 2009), as well as making discriminations between self- and other-originated actions (e.g., Uddin, Molnar-Szakacs, Zaidel, & Iacoboni, 2006). Corepresentation has also been associated with areas integral to making inferences about others’ intentions, including the medial prefrontal cortex and the STS (Chaminade, Marchant, Kilner, & Frith, 2012; Liepelt et al., 2016). Finally, activation has been observed in areas implicated in the control and discrimination of self- and other-related representations and agency, including the TPJ (Bardi, Gheza, & Brass, 2016; Spengler, von Cramon, & Brass, 2009). Taken together, these findings indicate that corepresentation is subserved by sensorimotor control and the mirroring of another’s action, as well as by inferences about another’s intentions and the control of self- and other-related representations.

The interpretation of Welsh et al. (2005) in terms of an ‘action corepresentation account’ was supported by the fact that there was no difference in the magnitude of IOR between trials following participants’ own responses and trials following those of their coactor. Thus, the effect may reflect the automatic shifts of attention and subsequent inhibition that are known to precede the planning of goal-directed action (Baldauf, Wolf, & Deubel, 2006) rather than the less reflexive mechanisms provoked by centrally presented gaze, arrow, and action cues. A number of further studies using this joint IOR paradigm have sought a fuller understanding of this action-based attentional mechanism. Cole, Skarratt, and Billing (2012) suggested that social IOR could be due to sensory transients, occurring when a coactor’s targets are presented at peripheral locations and therefore inducing the ‘standard’ IOR effect. This ‘attentional cueing’ account posits no role for action corepresentation. However, Welsh et al. (2005) had used LCD goggles to occlude the presentation of a coactor’s targets from the participant. They found that ‘between-person’ IOR was present when no target presentation was visible, suggesting that another’s action was wholly responsible for eliciting the effect.

In a follow-up study, Welsh et al. (2007) again used LCD goggles to determine which components of observed action might subserve inhibition. They found that either the observed culmination or the observed initiation of an action was sufficient to cause IOR. The latter finding, in particular, ruled out a similar but subtler objection to Welsh et al.’s (2005) original result: that the culmination of another’s reaching response was an exogenous sensory event that caused IOR with properties equivalent to a peripheral cue. On the contrary, a centrally observed reach was sufficient to inhibit subsequent responses. This finding appears to be inconsistent with the gaze cueing literature, which reports no IOR following centrally presented cues at the SOAs used in this study (Frischen, Smilek, et al., 2007; Frischen & Tipper, 2004), and therefore with an attentional cueing account of the effect. Welsh et al. (2007) also found a significant correlation between within- and between-person inhibition, which underscored the potentially shared mechanism between these two behaviours.

The importance of controlling for low-level stimuli in the joint IOR paradigm, as demonstrated by Welsh et al. (2007), was shown in two further sets of experiments. In the first, Welsh, Ray, Weeks, Dewey, and Elliott (2009) investigated social IOR in populations with autistic spectrum disorder (ASD). ASD is characterized by a triad of impairments in social communication and social interaction, together with a propensity towards repetitive patterns of behaviour (Baron-Cohen & Bolton, 1994). Importantly, groups of ASD participants have also demonstrated atypical behaviours in measures of social attention, such as reduced gaze cueing, yet the same dissociations do not occur for orienting to the low-level visual stimuli typically presented in Posner cueing paradigms (Goldberg et al., 2008; Johnson et al., 2005; Marotta et al., 2012; Rinehart, Bradshaw, Moss, Brereton, & Tonge, 2008; Ristic et al., 2005). Welsh, Ray, et al. found that this pattern was present in the social IOR paradigm, where a high-functioning ASD group was compared with typically developing (TD) controls. Under ‘full’ visual conditions, where neither any portion of the partner’s action nor the presentation of their target was occluded, an IOR effect was evident in both groups. However, when LCD goggles were employed to restrict the culmination of actions in peripheral vision, leaving only a central window of visibility, IOR was present only in the TD group.

Testing a TD sample of adults, Skarratt, Cole, and Kingstone (2010) revealed a further dissociation. They also restricted visual access to the coactor’s responses in peripheral vision, using a set of physical barriers, such that again there was only a central window of visibility. Under these conditions, and over two experiments, the authors established that although an animated partner’s behaviour does not elicit IOR, the behaviour of a real coactor does. Taken in conjunction with the findings of Welsh, Ray, et al. (2009), these findings suggest that when visual transients are restricted, the effect is sensitive to manipulations of social or interactive context, as well as being dissociable in populations with atypically developing traits in social interaction. This has led to the term social IOR being applied to the between-person IOR effect (Skarratt, Cole, & Kuhn, 2012), implying an IOR effect that requires some level of social interaction.

Action corepresentation and attention in social IOR

Demonstrations that orienting to action in social IOR paradigms can be affected by social interaction suggest that, in the absence of the peripheral transient cues typically needed to generate IOR, the effect may be generated by an independent mechanism. As Welsh et al. (2005; Welsh et al., 2007; Welsh, Ray, et al., 2009) argued, IOR following another’s action may be related to the exogenous attention shifts that typically precede goal-directed action. As a result, the authors suggest that inhibition may arise following another’s observed action due to the corepresentation of observed and performed action. Like other direct mapping accounts of attention to action, this position posits that the same mechanisms that control orienting prior to and during the performance of goal-directed action are also evoked during the observation of the same action. In further accordance with direct mapping approaches, Welsh et al. suggest that the MNS may be responsible for the corepresentation of observed and performed actions.

That social IOR should result from the corepresentation of another's actions may explain why the inhibition is observed under the restricted visual conditions employed by Welsh et al. (2007; Welsh, Ray, et al., 2009) and Skarratt et al. (2010), even though those conditions lack the stimuli typically thought to cause the effect. The role of corepresentation and the MNS in social IOR has received additional support from two sources. First, Hayes, Hansen, and Elliott (2010) found that, for both between-person (social) and within-person IOR, the salience of low-level stimulus and response events was approximately equivalent, suggesting shared inhibitory mechanisms following previous actions in both.

Second, Welsh, McDougall, and Weeks (2009) tested participants sitting side by side, rather than opposite each other, while making responses to a shared touchscreen. In a typical social IOR paradigm, when participants respond to the same location, they make kinematically different movements. For example, if Coactor A responds to a target on their right side, they make a simple elbow-extension movement; if Coactor B then responds to the same target location (which lies on their left-hand side), a cross-body movement must be made. Using the side-by-side arrangement, Welsh, McDougall, et al. (2009) were able to dissociate within-person from between-person IOR processes by separating trials in which participants responded to the same location using different movements from trials in which they made the same movement to a different location. The two conditions were compared with baseline trials, in which neither the movement nor the response location was the same as on the previous response. The authors demonstrated that both same-location/different-movement and same-movement/different-location trials were delayed relative to baseline on between- and within-person trials. Moreover, the magnitude of the delay on between-person trials was again related to within-person performance for both trial types. These data suggest shared mechanisms in the observation and performance of goal-directed aiming.

Recent data have extended these findings beyond RT measures (Cole, Wright, Doneva, & Skarratt, 2015). In an individual Posner-type free-choice task, in which participants are presented with a cue and then asked to select freely between the cued and uncued locations, they are less likely to respond at the previously cued location (Wilson & Pratt, 2007). Cole et al. found that this behaviour extended to the joint goal-directed aiming paradigm used in social IOR. This again indicates that cognitive processes associated with planning individual aiming movements may be elicited by observing the same movements. Interestingly, the presence of inhibition under these conditions indicates that IOR and social IOR are due to attentional mechanisms, rather than to a potentially competing account, namely that participants are committing the 'gambler's fallacy' (Croson & Sundali, 2005). The gambler's fallacy is the belief that repeated independent events are less likely than probability theory actually predicts. Humans are generally poor at generating truly random numbers or sequences (e.g., Bakan, 1960; Wagenaar, 1972) and show a bias away from repeated digits in such sequences. This behaviour may manifest in response times as a reduced preparedness to respond to repeatedly presented targets. Nonetheless, Lyons, Weeks, and Elliott (2013) demonstrated a relationship between IOR and the tendency to bet against repeated digits in a joint task, presenting interesting evidence that these two inhibitory processes may be fundamentally related. Understanding the direction of this relationship is a fruitful area for future attention and decision-making research.

Corepresentation and attention during joint action

Frischen et al. (2009) adopted a selective reaching procedure similar to that of Welsh et al. (2005) to investigate attentional processes during joint action. They asked participants to take part jointly in a negative priming task. Negative priming paradigms (S. P. Tipper, 1985; see S. P. Tipper, 2001, for a review) are thought to reveal attentional mechanisms whereby processing of goal-relevant stimuli is facilitated whilst irrelevant distracting stimuli are simultaneously inhibited. Negative priming tasks typically find that targets that were to-be-ignored distractors up to a few seconds earlier are inhibited more strongly than previously neutral objects. The more salient and behaviourally relevant the previous distractor stimuli are, the stronger the subsequent inhibition observed. In negative priming tasks in which goal-directed reaching responses are performed, inhibition is action-centred rather than visuospatially organized (MacDonald, Joordens, & Seergobin, 1999). This is indicated by the robust finding that distractors located close to the initiation point of the reaching hand elicit the strongest negative priming effect.

Frischen et al. (2009) adapted a reach-response negative priming paradigm for two participants who took turns making responses to targets in the presence of distractors. In the individual condition, participants made two consecutive responses, with probe trials measuring response time to locations defined relative to the target and distractor locations of the preceding prime trials. In the dual-person condition, the prime trial was performed by the experimenter and participants then made a single subsequent response. Results replicated the finding that distractors located close to the responding hand were inhibited most strongly in the individual condition. However, in the dual-person condition, the position of strongest inhibition shifted to those distractors located close to the hand of the experimenter. Further evidence that attentional effects pertaining to the position of the hand can transfer to that of another person was presented by Sun and Thomas (2013). They expanded the finding of Abrams, Davoli, Du, Knapp, and Paull (2008), who found that visual stimuli presented close to the hand induced faster shifts of attention. Sun and Thomas found that this effect also occurred when the hand of another person was placed close to targets, although only after the participants had taken part in a joint-action task.

These results suggest that witnessing the actions of another person modulates mechanisms of selective attention. Specifically, Frischen et al. (2009) interpret these findings as evidence that participants simulate the action-centred attentional mechanisms of another person during joint action. As such, their findings represent compelling evidence that the same covert attention processes that are employed in the performance of a goal-directed action are also elicited during observation of that action. In further support of this claim, the authors employed a control condition that sought to rule out attentional cueing due to sensory capture by the motion of the action itself.

However, controlling for sensory transients (e.g., Welsh et al., 2007) does not necessarily refute an attentional cueing account of effects like social IOR. The (seen) initial part of a coactor's response may be enough to shift an observer's attention. In support of this view, Atkinson, Simpson, Skarratt, and Cole (2014) found that centrally viewed pointing responses generated social IOR when peripheral events were obscured. This supports a similar finding by Skarratt et al. (2010), who observed the effect following only centrally presented eye/head cues. Atkinson et al. (2014) also demonstrated that social IOR was present even when participants observed an action different from the one they performed. Doneva and Cole (2014) and Doneva, Atkinson, Skarratt, and Cole (2017) observed that peripheral transients generated by luminance changes were sufficient to produce the effect in the absence of a coactor. Moreover, they showed that responses made with a different effector (feet, as opposed to hands) did not alter the basic effect. These findings cast doubt upon a corepresentation account of inhibition. Whilst IOR is not typically observed following head, gaze, and manual cues in individual cueing paradigms, evidence from joint-action procedures suggests that within an interactive context (i.e., when taking turns to respond), these social cues may be the equivalent of low-level sensory transients, and so generate the subsequent inhibitory effects. Further work has shown that participants' mere belief that an observed action is made to a specific location is not sufficient to generate social IOR. Welsh, Manzone, and McDougall (2014) found that an auditory cue with no spatial dimension failed to induce social IOR, indicating that, whether the effect is due to corepresentation or attentional cueing, some direct visual signal of action, head, or gaze movement is required.

Action goals and attention

If observed and performed actions are corepresented and covert attention makes use of these corepresentations, then this attentional process may be sensitive to the goals of observed actors. This perspective is consistent with an array of studies in the literature on action observation, which suggest that these corepresentations are based on the goals of actions (e.g., Van Overwalle & Baetens, 2009). Central to this assertion is the view that the goal of an action can be inferred from bodily movement (Becchio, Manera, Sartori, Cavallo, & Castiello, 2012), and that these goals are the principal information encoded during action corepresentation (Fogassi et al., 2005; Gazzola et al., 2007).

A number of studies have now shown that action planning and internally generated movement goals can influence attention and perception. Specifically, according to the theory of event coding (TEC; Hommel, Müsseler, Aschersleben, & Prinz, 2001), stimuli pertaining to action effects can be selectively processed over competing stimuli when particular action plans are relevant (e.g., Fagioli, Hommel, & Schubotz, 2007; Wykowska, Hommel, & Schubö, 2012). For example, when reach movements are to be performed, selection is biased towards location-related stimuli, whereas for grasping movements, size-related stimuli are processed preferentially. Interestingly, these effects are also observed when a reaching or grasping action is observed in another person (Fagioli et al., 2007), even though responses are made using the foot (i.e., no reaching or grasping responses are performed). These findings suggest that selection for action effects occurs following observed as well as performed action. TEC interprets such findings as evidence that actions are prepared according to their perceptual consequences rather than according to motor plans. The framework argues that stimulus and response representations are coded within the same space, so that preparing or observing a goal-directed action activates the stimulus codes relevant to the goals of that action. TEC therefore presents a single framework for interpreting the finding that observed and performed actions both bias attention towards stimulus codes based upon the goals of the action (e.g., Hommel, 2010).

In the case of attention during joint action, the evidence regarding whether orienting to another's action is modulated by goals is unclear. Using the social IOR paradigm, Cole et al. (2012) carried out a series of experiments to determine whether differences in the goal state of identically performed actions modulate social IOR. None of the experiments found that differences between the observed and performed goals of participant pairs modulated the size of the social IOR effect. Interestingly, in the most ecologically valid experiment, in which pairs performed different goal-directed actions (e.g., writing or erasing a character on paper), participants were faster to execute an action when the same goal was performed. Nonetheless, this RT benefit did not interact with the social IOR effect. These results suggest that the encoding of action goals, which has received much support in the literature, may be independent of the ability of the same actions to shift attention. This position has similarly found support from Janczyk, Welsh, and Dolk (2015), who observed that manipulations of action goals did not influence the social IOR effect. In addition, these findings indicate that the mechanisms eliciting shifts of attention to action may not be based on corepresentation of observed and performed action at all (as observed action representation is typically shown to be sensitive to the goal state of the encoded actions).

Nonetheless, recent work has provided evidence that the goals of actions may modulate shifts of attention (Ondobaka, de Lange, Newman-Norlund, Wiemers, & Bekkering, 2012). This study employed a variant of the basic social IOR procedure, in which participants took turns making reaching responses to sets of cards located on the left and right of a flat touchscreen. Using a confederate coactor, the authors manipulated responses to the higher or lower of two cards, and participants were asked to monitor the goal of the coactor and to adopt either the same goal or a different goal (i.e., match their move to the higher or lower cards, or make the opposite response). Results showed that this manipulation affected RTs. As in Cole et al. (2012), movements to both previously responded-to locations and opposite locations were faster when participants had the same goal. However, in Ondobaka et al.'s task, differences in action goal also modulated the delay in movements to the same location, such that the social IOR effect was abolished when participants had different goals.

In view of these conflicting findings, the role of goals in attending to others' actions is unresolved. Differences in the methods of the two studies (Cole et al., 2012; Ondobaka et al., 2012) may indicate why their findings differed. For example, in Ondobaka et al. (2012), participants had to infer action goals online, on a trial-by-trial basis, whereas in Cole et al. (2012) the goal states of participants were blocked. Critically, participants in Cole et al.'s task did not have to update the goal state of the other participant in order to perform their own response, so action planning was dependent only upon their own goal, not the other's. This possibility, however, was systematically examined by Cole, Atkinson, D'Souza, Welsh, and Skarratt (in press), whose results suggested that the task relevancy of the coactor's actions may not be an important component. Furthermore, Cole et al. failed to replicate the original findings of Ondobaka et al.

Attending to action in interactive contexts

If the critical factor determining attention to another's action is the task relevance of that action to the observer, then a key question when studying joint action is how interactive social context affects attention. Whilst Skarratt et al. (2010) probed this issue by comparing the performance of participants who undertook the task with a live coactor versus an animated one, little other work has addressed how the characteristics of joint tasks influence attention to another's action. Nonetheless, other work in the joint action and social attention literature indicates that live interactive contexts may be essential for a full understanding of how social cues are processed during visual cognition.

In the social attention literature, there is now some evidence that the degree to which experimental stimuli approximate a face-to-face social interaction can affect social attention. In the case of the gaze-cueing effect, the role of more complex, dynamic stimuli is subtle. In infants, motion appears to be necessary for gaze orienting (Farroni, Johnson, Brockbank, & Simion, 2000). Nonetheless, neither photographic faces nor dynamic movement in gaze cues has been shown to increase the magnitude of the gaze-cueing effect in adults (Hietanen & Leppänen, 2003). Recent work has even revealed that gaze cueing in a live, face-to-face situation elicits an effect comparable in magnitude to that of schematic eye-gaze stimuli (Cole, Smith, & Atkinson, 2015; Lachat, Conty, Hugueville, & George, 2012). Interestingly, however, some evidence suggests that social manipulations of the gaze-cueing effect may be most powerful under conditions where gaze stimuli are dynamic and thus better approximate the experience of gaze in the environment. In particular, modulation of gaze cueing by emotion has been shown to occur only with dynamic stimuli that present gaze motion (Putman, Hermans, & Van Honk, 2006). Procedures that have shown moderation of gaze cueing by mental-state attribution have employed dynamic cues such as videos, and human rather than schematic gaze cues (Teufel et al., 2010; Wiese et al., 2012). Evidence from the gaze-cueing literature therefore indicates that attention may be modulated by interactive factors when stimuli are dynamic and approximate a human gazer.

The importance of interactive context for attention to action is further underlined by findings from the wider joint action literature. Joint tasks such as the one that produces the joint Simon effect (JSE; Sebanz, Knoblich, & Prinz, 2003) have been found to be sensitive to the interactive context in which they are performed. In a typical joint Simon task, participants take part in a spatial compatibility task in which spatially oriented responses (such as a button on the left or right) must be made to stimuli on the basis of a nonspatial characteristic (e.g., whether they are red or green), although the stimuli also possess a task-irrelevant spatial dimension (e.g., they appear on the left or right of the display). When participants perform this task alone, they are typically faster to detect, for example, a red item when it is placed in a spatial location congruent with the red response, relative to an incongruent location (i.e., the Simon effect; Craft, Simon, & Richard, 1970; J. Simon, 1970). When participants perform this task with another person, such that each person is responsible for only one response, a Simon effect also emerges. This effect disappears, however, when a single participant performs only one of the two possible responses. This observation has led to considerable research interest. The individual Simon effect is thought to emerge because spatially oriented stimuli elicit the automatic activation of compatible response codes (Hommel, Müsseler, et al., 2001; Kornblum, Hasbroucq, & Osman, 1990). The JSE is therefore believed to emerge from the corepresentation of another person's response codes as a result of engaging in joint action. These corepresentations may be of another's action, task, or both (however, see Dolk, Hommel, Prinz, & Liepelt, 2013; Sebanz, Knoblich, & Prinz, 2005).

Although the JSE is not yet fully understood, much of the subsequent empirical work has revealed that it can be modulated by subtle social manipulations. For example, a negative interpersonal relationship between coactors can abolish the effect (Hommel, Colzato, & van den Wildenberg, 2009). When participants act alongside a wooden nonhuman hand, or believe a remote coactor to be a preprogrammed computer, the effect is also abolished (Tsai & Brass, 2007; Tsai, Kuo, Hung, & Tzeng, 2008). The effect with the wooden coactor can, however, be reinstated when participants are primed as to the animacy or agency of the coacting hand (Müller, Brass, et al., 2011). The effect is undiminished by removing online visual and auditory feedback from the coactor, or by removing the coactor from shared interpersonal space (Vlainic, Liepelt, Colzato, Prinz, & Hommel, 2010; Welsh et al., 2013). Researchers have also used this paradigm to test whether social identity processes influence joint action. Social identity theory argues that group memberships are internalised, such that they become integral to an individual's self and sense of identity (Tajfel & Turner, 1979). As a result of social identity processes, individuals show increased prejudice toward, and competition with, members of outgroups (Brewer, 1979). Consistent with this approach, interacting with another person from either a natural or a minimal, lab-induced outgroup weakens the JSE (McClung, Jentzsch, & Reicher, 2013; Müller, Kühn, et al., 2011).

These findings clearly have implications for the mechanisms that underlie orienting to action. If attention to action, as evidenced in social IOR and the predictive gaze paradigms, is subserved by the corepresentation or direct mapping of observed actions, then it may well be modulated by interactive factors. Indeed, according to the above findings, sharing a task with a coactor, the social relationship between observer and coactor, and beliefs regarding the animacy and intentionality of observed action may all affect the processes by which observed action orients attention. Any or all of these factors may account for the findings of Skarratt et al. (2010), who found that an animated partner did not elicit social IOR. Interactive factors may also influence attentional effects such as predictive gaze and grasp cueing. As a live interactive paradigm, the social IOR studies offer a useful tool for investigating the social and contextual factors underlying attention to action (and indeed to other social cues, such as gaze and head direction). Future research that applies interactive joint-task paradigms to the study of social attention promises to aid understanding of the contextual and social factors that determine orienting (see Atkinson, Doneva, Simpson, & Cole, under revision). As such, these approaches may also shed light on the underlying mechanisms determining attention to others' actions.

Summary and future directions

The aim of this review has been to highlight the diverse ways in which researchers have examined attention to others' actions. Despite the evidence drawn together in the preceding sections, the mechanisms underlying attention to observed action have not received nearly the same degree of systematic investigation as eye gaze and other social directional cues (Birmingham & Kingstone, 2009; Frischen et al., 2007). Action may be a strong signal indicating the direction of another's attention (Langton & Bruce, 2000). It may also be a powerful directional signal in its own right (Gervais et al., 2010). Finally, observing the actions of others may also orient attention because direct mapping, corepresentation, and mirroring of another's observed action may simulate the same attentional processes that operate when that action is performed (Falck-Ytter, 2012; Lindemann et al., 2011; Welsh et al., 2007). Nonetheless, neither the influence of manual action on attention nor the relationship between manual action and gaze in orienting has yet been the subject of an integrated formal model in the literature.

Perhaps the most critical direction for future research concerns the place of attending to action within the range of processes that are engaged during social contact. Some evidence suggests that attention to manual action is necessary for processes that are engaged during social interactions (Chong, Williams, Cunnington, & Mattingley, 2008; Longo & Bertenthal, 2009). These processes include imitation and corepresentation of action, and determining the intentions of others (Knoblich & Sebanz, 2008). Dovetailing with this view, evidence has emerged that the same processes may determine whether or not people pay attention to the actions of others (Ondobaka et al., 2012). The lack of clarity over the direction of these relationships parallels a similar debate concerning attention to eye gaze and head direction (Cole et al., 2015; Teufel et al., 2010; Wiese et al., 2012). Finally, questions remain regarding when manual action is selected as a cue for attention in natural scenes, alongside competing social signals. Using eye-movement paradigms, compelling evidence has emerged in favour of the primacy of head and eye-gaze selection in such scenes (Birmingham, Bischof, & Kingstone, 2008). Nonetheless, early work using these techniques leaves open the question of whether this finding persists when the scenes in question are presented as part of shared manual tasks or in the presence of action-focused narratives (Yarbus & Riggs, 1967).

A final concern for this field, as suggested by research on gaze selection and joint action, is the need to understand attention to action in ecologically valid contexts (see Skarratt et al., 2012). This approach can underscore the pervasiveness of effects observed in the lab while also revealing subtleties that were not previously apparent. For example, recent work in both the gaze and action cueing fields has underlined the reflexive nature of basic orienting processes, even across a range of social contexts (Cole et al., 2012; Hietanen & Leppänen, 2003). Nonetheless, real social interactions and realistic social stimuli have revealed novel findings that are not present in traditional lab approaches, which employ schematic or otherwise artificial stimuli (Risko, Laidlaw, Freeth, Foulsham, & Kingstone, 2012; Richardson & Gobel, 2015; Skarratt et al., 2010). A range of different methods exists, all of which can be thought of as representing an increase in ecological validity over early lab-based approaches. A pressing challenge for current social attention research is to reconcile these disparate paradigms in order to develop rich models of how we attend to others' actions.