Introduction

In our daily lives, we often use objects that greatly extend our bodily action capabilities, such as writing with a pen, using a hammer, or driving a car. According to an extended view of cognition, such objects can be considered as an extension of the human body (Clark 2004). This notion is exemplified by upper limb amputees who can attain an amazing degree of control over neural prostheses and who often consider the prosthesis as a part of their own body (Schultz and Kuiken 2011).

In addition to these clinical achievements, a growing number of studies have provided insight into the functional and neural mechanisms underlying the extension of one’s body schema to include tools and external objects (for review, see: Arbib et al. 2009). For instance, in a monkey study, it was found that the response properties of visuotactile neurons in the anterior intraparietal sulcus (IPS) changed after the monkey acquired the skill to use a rake as a tool (Iriki et al. 1996). More specifically, whereas these neurons initially responded to stimuli near the hand, after training with the tool, their receptive fields were found to extend into the more distant space surrounding the end of the tool. The authors suggested that the use of the tool extended the representation of the body by including the area that could only be reached with the rake. In humans, comparable effects of tool use have been established by investigating cross-modal congruency effects for stimuli presented at the end of a tool (Maravita et al. 2002; Holmes et al. 2004, 2007). Together these studies suggest that tool use changes one’s peripersonal space (i.e., the space directly surrounding one’s body) via a process of multisensory integration of information related to the tool.

Whereas the relation between tool use and changes in the body schema has been a topic of intensive investigation, less is known about the mechanisms whereby we learn to use novel tools. Observational learning and imitation likely provide important mechanisms for learning novel tool use, which would otherwise take a lot of time and effort (Massen and Prinz 2009). For instance, as children we learn how to eat with a knife and a fork by observing our parents or peers, and as adults we may learn how to operate the new espresso machine by carefully observing the actions of our colleagues. The ability to learn by observation and imitation has received much attention in recent years (Byrne and Russon 1998; Rizzolatti et al. 2001; Brass and Heyes 2005; van der Helden et al. 2010). A classical finding is that participants are faster to execute a movement after having observed an actor performing the same movement, even in cases where the observed movement is irrelevant to the subject’s task (Brass et al. 2000, 2001; Press et al. 2005; Jonas et al. 2007a). Several studies have suggested that an important network underlying imitation and observational learning is formed by the putative mirror neuron system and more specifically by the inferior frontal gyrus (IFG; Iacoboni et al. 1999; Kilner et al. 2003; Koski et al. 2003; Buccino et al. 2004; van der Helden et al. 2010). In contrast, other studies have argued for a more general involvement of the IFG in perception–action coupling beyond imitation (Dassonville et al. 2001; Newman-Norlund et al. 2007b, 2010). As a consequence, the precise functional significance of the neural mechanisms involved in imitation remains a matter of ongoing debate.

Different explanations have been suggested for these apparently contradictory findings. On the one hand, specialist theories of imitation suggest that imitation is subserved by a special purpose imitation system that evolved in order to relate observed biological movement to one’s own motor repertoire (Meltzoff and Moore 1989; Anisfeld 1991; Jones 2009). According to this theory, an important mechanism underlying imitation is the direct matching of observed movements onto one’s own motor repertoire. In other words, based on learned action-effect associations (i.e., the movement of our own hand results in a perceptual change in our visual field), the observation of the perceptual consequences of an action elicits to some extent the same motor program as used for bringing about the effect. Because the kinematics for bringing about a tool action are quite different from the kinematics of a manual action, observation of tool and hand actions should activate different motor programs. As a consequence, according to the specialist view of imitation, tool use imitation should be different from the imitation of biological actions, as both rely on different specialized neural systems and involve different action representations. On the other hand, generalist theories of imitation suppose that imitation is based on general cognitive mechanisms of associative learning and action control that are involved in other tasks as well (Anisfeld 1991; Brass and Heyes 2005). According to the generalist view, tool use imitation and biological imitation should be similar, as both rely on a common neural network involved in perception–action coupling.

The aims of this study were to distinguish between these two hypotheses and to extend our knowledge about the relation between tool use and imitation more generally. Therefore, subjects performed actions either with their hand or with a handheld tool in separate blocks. We manipulated the congruency of the grip type observed in the picture (grip congruency). Importantly, actions could be performed in response to a tool cue, a hand cue, or a symbolic cue (effector congruency). In this way, we were able to measure whether imitation of tool or hand actions is modulated by effector-specific information.

We used an experimental setup in which the subject was seated behind a table on which a graspable object was placed that could be grasped with a full grip or a precision grip at respectively the lower side or the upper side of the object. Subjects were instructed to always make a full grip or a precision grip in response to a color cue represented on a screen (e.g., green = grasp object at lower side with a full grip, red = grasp object at upper side with a precision grip). The color cue was superimposed on different effectors in the picture (i.e., a hand or a tool) that could be in a spatial position that was congruent or incongruent with respect to the required action. For instance, in congruent trials, the color cue was presented at the lower side of the object that could be grasped with a full grip and the color instructed the subject to actually grasp the lower side with a full grip. In incongruent trials, the color cue could for instance be presented at the upper side of the object that could be grasped with a precision grip, while the color instructed the subject to actually grasp the lower side with a full grip. In this way, we were able to measure the automatic interference effect of observing actions that could be congruent or incongruent with respect to the planned grip type and the effector displayed.
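The congruency logic described above can be illustrated with a minimal sketch. It uses the example color mapping given in the text (green = full grip at the lower side, red = precision grip at the upper side); the function and variable names are hypothetical and are not part of the original experiment code.

```python
# Example mapping taken from the text. The names below are illustrative
# assumptions, not the software used in the experiment.
COLOR_TO_ACTION = {
    "green": ("full", "lower"),       # full grip at the lower side
    "red": ("precision", "upper"),    # precision grip at the upper side
}


def classify_trial(cue_color, cue_location, mapping=COLOR_TO_ACTION):
    """Return the instructed grip and the congruency of the displayed cue.

    cue_color decides what the subject has to do; cue_location ("upper" or
    "lower") is where the colored effector appears in the picture and is
    irrelevant to the task itself.
    """
    grip, target_location = mapping[cue_color]
    congruency = "congruent" if cue_location == target_location else "incongruent"
    return grip, congruency
```

For example, a green cue shown at the upper side of the object instructs a full grip at the lower side, so the trial counts as incongruent.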

Methods

Participants

Twenty-four right-handed healthy adults participated in the experiment (7 men, mean age = 24.0 years) with normal or corrected-to-normal vision. Data from 3 participants were discarded from analysis, as the experiment could not be completed. In addition, data from 1 subject were excluded because of grasping the incorrect object part in more than 25% of all trials, leaving 19 participants for the final analysis.

Experimental setup and procedure

Participants were seated in front of a table facing a computer screen at a distance of approximately 100 cm. A centrally located response box was placed on the participant’s lap and served as a starting position for the grasping actions. A custom-made touch-sensitive manipulandum was attached to the table, consisting of a small cylinder (r = .08 cm, height = 1.80 cm) on top of a larger cylindrical base (r = 3.00 cm, height = 8.00 cm) that could be grasped with respectively a precision (upper part of object) and a power grip (lower part of object; see Fig. 1a).

Fig. 1

a Experimental setup. Participants were seated behind a table on which the manipulandum and a screen were placed. Actions were performed with either the right hand (left picture) or with a handheld tool (right picture). Stimuli were presented on a screen directly in front of the participant. b Starting picture representing the manipulandum with the actor’s hands out of view (left side) and example stimuli (right side) used in the tool cue, the hand cue, and the symbolic cue conditions. Participants were instructed to grasp the manipulandum with either a precision or a power grip depending on the color of the stimuli

In half of all experimental blocks, subjects grasped the manipulandum by using a handheld mechanical tool (tool blocks); in the other half of all blocks, subjects grasped the manipulandum by using their right hand (hand blocks). The tool was a handheld mechanical device that could be held with a power grip (see Fig. 1a; tool length = 78 cm; maximum distance between opened jaws = 8.5 cm). When using the tool to grasp the manipulandum, the movements of the hand differed from manual grasping in two ways: (1) the tool was always held with a full grip: closing the hand (i.e., making a full hand pincer grip) pulled a handle that closed the jaws, whereas extending the hand released the handle and opened the jaws; and (2) the jaws of the tool were placed in 90° opposition relative to the position of the hand. During tool blocks, the chair on which the participant was seated was moved backwards, such that the jaws of the tool ended up in the same position as the initial position of the subjects’ hands during hand blocks (see Fig. 1a).

Each trial started with the participant holding the starting button of the response box, either by means of their index finger (hand blocks) or by means of the base of the tool (tool blocks). A starting picture appeared for 1,000–1,500 ms, representing the manipulandum and the actor with both hands out of view (see Fig. 1b). Then the target stimulus appeared which remained on the screen until the computer detected a grasping response or a return to the starting button.

Target stimuli consisted of static pictures representing a hand cue (hand grasping the manipulandum), a tool cue (tool grasping the manipulandum), or a symbolic cue (dots presented near the manipulandum; see Fig. 1b). For hand cues and tool cues, the effector (the hand or the jaws of the tool) was colored red or green, by adjusting the color balance using Photoshop CS5 (Adobe Systems, Inc, CA). Symbolic cues consisted of red or green dots near the manipulandum and were included as a control condition to establish a baseline measure of performing hand and tool actions in response to unambiguous visual cues. Thus, in the case of symbolic cues, no effector was represented in the picture: only the colored dots were visible. Hand cues, tool cues, and symbolic cues were randomly presented. Pictures were presented at a size of 1,280 × 1,024 pixels, resulting in a stimulus size on the screen that matched the experimental setup in size (i.e., the object, hand, and tool represented in the picture had a size comparable to their real counterparts).

The participants were instructed to perform either a precision or a power grip, based on the color of the cue. Half of all participants were required to perform a power grip in response to the color green and a precision grip in response to the color red. The other half of all participants received the opposite instructions. In this way, participants were instructed to perform actions that were either congruent or incongruent with respect to the end location and grip type represented in the picture (e.g., grasping the upper part of the manipulandum, while observing a cue at the lower part of the manipulandum). Importantly, instructing actions by means of color cues ensured that participants were attending to both the grip type and the effector displayed in the picture, even though this information was task-irrelevant (i.e., the grip type displayed was not relevant to the action that the subject was required to perform). Participants were instructed to initiate their movement only when they were certain which part of the object they planned to grasp. All movements were made with the right hand, and following movement execution, participants were instructed to hold the object for about 1 s, without lifting or moving it, after which they returned to the starting position to initiate the next trial. No explicit instructions were given with respect to fixation while performing the movement, but as it is generally known that the eyes are often ahead of the hand (Neggers and Bekkering 2000), subjects most likely fixated on the target object that they had to grasp. Participants performed 4 blocks of 120 trials each. These blocks consisted of a 2 (action: tool and hand) × 3 (stimulus: hand cue, tool cue, or symbolic cue) × 2 (movement: congruent and incongruent) design, and block order (i.e., tool or hand) was counterbalanced between participants. In total, the experiment took about 1 h.
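The block structure described above can be sketched as follows. The text specifies 4 blocks of 120 trials, randomly ordered cue types, and a counterbalanced block order; the equal number of trials per cell (20) and the alternating block scheme are assumptions made for this illustration, and all names are hypothetical.

```python
import itertools
import random

CUE_TYPES = ("hand", "tool", "symbolic")
CONGRUENCY = ("congruent", "incongruent")
TRIALS_PER_BLOCK = 120  # stated in the text


def make_block(action, seed=None):
    """Build a shuffled trial list for one block performed with `action`.

    Assumes balanced cells: 120 trials / (3 cue types x 2 congruency levels)
    = 20 trials per cell. The source does not state the cell sizes.
    """
    cells = list(itertools.product(CUE_TYPES, CONGRUENCY))
    reps = TRIALS_PER_BLOCK // len(cells)
    trials = [
        {"action": action, "cue": cue, "congruency": congruency}
        for cue, congruency in cells
        for _ in range(reps)
    ]
    random.Random(seed).shuffle(trials)
    return trials


def block_order(participant_index):
    """Return the effector for each of the 4 blocks.

    Assumed scheme: hand and tool blocks alternate, and the starting
    effector alternates across participants.
    """
    if participant_index % 2 == 0:
        first, second = "hand", "tool"
    else:
        first, second = "tool", "hand"
    return [first, second, first, second]
```

Passing a fixed `seed` makes a block reproducible, which is convenient when checking that each cue type and congruency level occurs equally often.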

The experiment was controlled using Presentation software version 12.2 (Neurobehavioral Systems, Davis, CA). The response box and the manipulandum were connected to a computer to detect (1) reaction times (time between target onset and release of the starting position), (2) movement times (time between release of the starting position and grasping the manipulandum), and (3) end position (power or precision grip).
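The two timing measures defined above reduce to simple differences between event timestamps. The sketch below assumes that all three events are recorded in milliseconds on a common clock; the function name and argument names are hypothetical.

```python
def reaction_and_movement_time(target_onset_ms, release_ms, grasp_ms):
    """Return (reaction time, movement time) in ms for one trial.

    Reaction time: target onset to release of the starting position.
    Movement time: release of the starting position to grasping the
    manipulandum.
    """
    reaction_time = release_ms - target_onset_ms
    movement_time = grasp_ms - release_ms
    return reaction_time, movement_time
```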

Results

Trials with incorrect responses (i.e., grasping the incorrect part of the object; 3.4% of all trials), trials in which the subject did not hold the button at the onset of the picture (reaction times less than 100 ms; 1.1% of all trials), and trials exceeding a 2 SD cutoff for each subject’s mean RT (3.3% of all trials) were excluded from reaction time analysis. The averaged reaction times, movement times, and error rates were analyzed using a 2 × 3 × 2 repeated measures ANOVA with Action (tool and hand), Cue type (tool, hand, and symbolic) and Congruency (congruent and incongruent movements) as within-subjects factors. Reaction and movement times are represented in Fig. 2.
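The exclusion criteria for the reaction time analysis can be sketched as a per-participant filter. The text does not specify whether the 2 SD cutoff was applied on both sides of the mean, per condition or across conditions, or before or after the other exclusions; the version below assumes a symmetric cutoff computed across all remaining trials of a participant, and its names are hypothetical.

```python
from statistics import mean, stdev


def exclude_trials(trials, min_rt_ms=100.0, sd_cutoff=2.0):
    """Filter one participant's trials for the reaction time analysis.

    Each trial is a dict with keys 'rt' (ms) and 'correct' (bool).
    Step 1 drops incorrect grasps and reaction times below min_rt_ms.
    Step 2 drops trials further than sd_cutoff standard deviations from
    the participant's mean RT (assumed symmetric, computed after step 1).
    """
    kept = [t for t in trials if t["correct"] and t["rt"] >= min_rt_ms]
    if len(kept) < 2:
        return kept  # a standard deviation needs at least two values
    rts = [t["rt"] for t in kept]
    center, spread = mean(rts), stdev(rts)
    return [t for t in kept if abs(t["rt"] - center) <= sd_cutoff * spread]
```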

Fig. 2

Reaction and movement times. Graphs at the left represent RTs and MTs to actions performed with the tool, and graphs at the right represent RTs and MTs to actions performed with the hand. Bars on the left represent actions in response to tool cues, bars in the middle represent actions in response to hand cues, and bars on the right represent actions in response to symbolic cues. Light bars represent congruent movements and dark bars incongruent movements. Error bars represent standard errors

For reaction times, a main effect of Congruency, F(1, 18) = 41.9, P < .001, η2 = .70, reflected slower reaction times for incongruent (457 ms, SE = 20) compared to congruent movement cues (441 ms, SE = 18). A main effect of Cue type, F(2, 36) = 28.3, P < .001, η2 = .61, reflected faster reaction times to symbolic cues (433 ms, SE = 17) compared to hand cues (462 ms, SE = 20) and tool cues (452 ms, SE = 19). The main effect of Action was not significant (P = .25). No significant interactions were observed.

For movement times, a significant main effect of Action, F(1, 18) = 45.7, P < .001, η2 = .72, reflected that movement times were faster when the actions were performed with the hand (530 ms, SE = 18) than with the tool (789 ms, SE = 41). No other main effects or interactions were found significant for the analysis of movement times.
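The effect sizes reported above for reaction and movement times can be related to the F statistics with a short calculation. The sketch assumes that η2 denotes partial eta squared, which the text does not state explicitly; under that assumption the effect size follows from F and its degrees of freedom.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Uses eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Treating the reported effect size as *partial* eta squared is an
    assumption of this sketch.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)
```

With the values from the text, `partial_eta_squared(41.9, 1, 18)` rounds to .70, `partial_eta_squared(28.3, 2, 36)` to .61, and `partial_eta_squared(45.7, 1, 18)` to .72, in agreement with the reported effect sizes.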

For error rates (i.e., grasping the manipulandum at the incorrect part), a significant main effect of Action, F(1, 18) = 9.0, P < .01, η2 = .33, reflected that subjects made slightly more errors when using the tool (4.9%) than when grasping with the hand (1.9%). No other effects were significant in the analysis of error rates.

Discussion

The present study established a classical action congruency effect for hand actions and tool actions, reflected in faster responses if the observed movement was the same as the instructed movement, even though the observed movement was irrelevant to the subject’s task. Thereby this study extends previous findings that were based on finger lifting movements (Brass et al. 2000, 2001) and transitive hand movements (Newman-Norlund et al. 2007a, 2010; van Schie et al. 2008) to the domain of tool use.

Importantly, neither hand actions nor tool actions were differentially affected by the effector represented in the picture. Thus, when performing a tool action, the action congruency effect was comparable for cues representing a tool and a hand. Similarly, when performing a hand action, no difference was found in the action congruency effect between cues representing a hand or a tool. These findings are in line with previous studies that have reported comparable effects of biological and non-biological stimuli on action imitation (Press et al. 2005; Jansson et al. 2007; Newman-Norlund et al. 2010). Thereby these findings support generalist theories of imitation, according to which imitation is subserved by general cognitive mechanisms of associative learning and action control (Brass and Heyes 2005; Heyes 2011).

In addition, the finding that tool imitation is not modulated by effector-specific information is in line with several studies suggesting that tool use training results in the acquisition of effector-independent action representations. For instance, grasping kinematics are often highly comparable between actions performed with the hand or with a tool (Gentilucci et al. 2004), and in a recent study, it was found that observational priming of an action with a physical device was not influenced by whether the action was performed with the left or the right arm (Massen 2009). In a recent fMRI study, comparable activation was found in the brain’s parieto-frontal grasping network when subjects planned a hand grasping action or a grasping action with a handheld tool (Jacobs et al. 2010). In addition, in monkeys, it was found that grasping neurons in the ventral premotor cortex represented the outcome of a tool action rather than the precise kinematics by which the action was performed (Umilta et al. 2008).

Interestingly, some studies have shown tool-specific responses in the mirror neuron system (e.g., neurons are active both when viewing a tool action and executing a hand action; Ferrari et al. 2005) and during action observation in humans (Massen and Prinz 2009). One possibility is that representing tool actions in terms of the outcome requires a substantial amount of training with the tool, either by observation or by doing. In this context, it should be noted that most tool-responsive mirror neurons were observed in the later phases of the experiment, by which time the monkey already had some experience with tool observation (Ferrari et al. 2005). Furthermore, grasping neurons were found to be similarly responsive to the observation of a hand closure using a pair of normal pliers and a hand opening using a pair of “inverse pliers” (Umilta et al. 2008). Using a similar paradigm in humans, it was found that TMS-evoked motor potentials reflected the outcome of the action (e.g., the pliers opening or closing) rather than the actual hand kinematics involved (Cattaneo et al. 2009). Together these studies support the idea that tool actions are represented in an effector-independent fashion.

In the present experiment, the effector used for grasping was varied between blocks, whereas the grip type to be executed varied within blocks. This design was a logical consequence of the fact that it is difficult to vary the effector within blocks (i.e., subjects alternating between hand and tool actions from trial to trial). One possible confound of this design could be that information about the grip type was more task-relevant as it varied from trial to trial, whereas the effector used for grasping remained the same. In other words, the finding that imitation of hand and tool actions is effector-independent could be a consequence of subjects paying more attention to the grip type than to the effector represented in the picture. Indeed, previous studies have shown that the saliency of the action feature observed plays an important role in facilitating imitation (Bird et al. 2007; Franz et al. 2007). However, in contrast to many previous imitation studies, it should be noted that in the present study, only the color of the action cue determined what action subjects were required to perform, whereas both the grip type and the effector represented in the picture were irrelevant to the subject’s task. In fact, the color cue was always superimposed on the effector, thereby ensuring that subjects implicitly attended to both the grip type and the effector. The finding that only information about the grip type interfered with action imitation suggests that only task-irrelevant information about the grip but not about the effector can facilitate or inhibit imitation. In addition, it should be noted that several studies have shown that even when both the effector and the end location vary within blocks, imitation is mainly modulated by the end location of the observed action (Bekkering et al. 2000; Wohlschlager et al. 2003; Franz et al. 2007), thereby providing further support for the notion that the findings observed in the present study are not an artifact of the experimental setup.

In the present study, the end location of the action (up or down) was always concordant with a specific grip type (precision or full grip). Thus, based on the present study, it is difficult to determine whether the action congruency effect was driven mainly by the spatial location (up or down) or by the grip type (precision or full grip) of the observed effector. However, in a recent study using a similar setup (van Schie et al., in prep.), we showed that the planning of manual grasping actions was more efficient in response to cues representing information about the end location than in response to cues representing the grip type to be performed. In addition, several studies on action observation have shown that it is easier to attend to goal-related aspects (i.e., what is the end location of the action?) than to grip-related aspects (i.e., which grip is used for grasping the object?) of an observed action (Bach et al. 2005; van Elk et al. 2008). Similarly, in the present study, it is likely that the action congruency effect reflects the congruence between the observed end location and the planned end location of the action.

The present study shows that the imitation of tool actions is effector-independent, as measured with reaction times. In addition to reaction times, kinematic and/or EMG measures often provide valuable insight into the precise dynamics underlying action execution. For instance, with respect to tool use it has been shown that grasping kinematics are often highly comparable between actions performed with the hand or with a tool (Gentilucci et al. 2004). Although in the present study we did not measure kinematics, the finding of an action congruency effect in the reaction times but not in the movement times suggests that action congruency mainly affected the planning phase and not the execution phase of the action (see also: Chong et al. 2009). Typically, reaction times are considered a reliable measure of how plans for movement sequences are mentally represented (Rosenbaum et al. 1984), and faster reaction times often reflect a more efficient planning process. Thus, the finding that especially the reaction times were affected by the congruency of the observed action is in line with the action planning-control framework, according to which the planning phase of an action—which involves the incorporation of visual and cognitive information—occurs within the first few hundred milliseconds (Glover 2004).

Interestingly, both tool and hand actions were performed faster in response to symbolic cues compared to hand and tool cues. Although some previous studies have reported faster responses to biological compared to non-biological cues (Brass et al. 2000; Jonas et al. 2007a), in another imitation study, a comparable advantage of symbolic over biological cues was observed (Newman-Norlund et al. 2010). At a neural level, although some studies have suggested that motor-related areas respond preferentially to the observation of biological movements (Iacoboni et al. 1999; Perani et al. 2001; Heiser et al. 2003; Kilner et al. 2003; Tai et al. 2004), other studies have shown that biological and non-biological movements result in activation in overlapping brain areas (Gazzola et al. 2007; Jonas et al. 2007b). The apparent inconsistency between these studies may be related to the stimuli used. In the present experiment, the symbolic stimuli used were probably more salient than both the tool stimuli and the hand stimuli, thereby resulting in faster reaction times.

Previous studies have suggested that an important mechanism underlying imitation is spatial compatibility, i.e., the spatial congruence between a stimulus and a response results in faster reaction times (Aicken et al. 2007; Jansson et al. 2007; van Schie et al. 2008; Catmur and Heyes 2010). The present study is in line with this account, as the driving factor underlying the imitation of both hand and tool actions appears to be the spatial compatibility of the action cue with the prepared action, rather than the effector-specific information represented by the cue. The finding that both hand and tool actions showed comparable reaction time effects furthermore supports the idea that actions are planned primarily in terms of the end location (i.e., grasping the upper or lower part of the object) rather than the effector by which the action is performed. This interpretation is in accordance with the ideomotor principle, according to which actions are represented primarily in terms of the effects they produce (Prinz 1997; Hommel et al. 2001; Massen and Prinz 2009; Shin et al. 2010), and fits well with the hierarchical view of the motor system, according to which end locations are an organizing feature of action planning (Grafton and Hamilton 2007; Rosenbaum et al. 2007). The present study extends this view to tool actions as well, in line with the notion that tools often can be considered a natural extension of the human body (Arbib et al. 2009).

Conclusions

The main finding of the present study is that the imitation of hand and tool actions is not affected by effector-specific information. Thereby this study supports generalist rather than specialist theories of imitation and suggests that imitation is subserved by general cognitive mechanisms, such as spatial compatibility.