Method
Participants
A total of 16 volunteers (6 male) participated for payment. Fifteen participants were right-handed. The mean age was 23 years (range 20–27). All participants were naive to the purpose of the experiments and were native German speakers. All had normal or corrected-to-normal vision, and none had previously seen the stimuli.
Stimuli
We presented prime–target pairs consisting of action movies as primes and pictures of manipulable objects as targets. The prime stimuli consisted of eight gray-scale movie clips, each lasting 2,000 ms (25 frames/s). The movies showed hands performing an action in interaction with an unseen object. Movies were recorded using the MPI VideoLab (Kleiner et al. 2004). The actions were filmed in front of a black background, and the actor wore black clothing. He performed the action in interaction with real objects in order to ensure that the dynamics of the action were correct, which is not easily achieved with mere pantomime. The objects were painted black or covered in black cloth. Thereafter, luminance-based image thresholding was applied to each movie frame to segment the hands that performed the action from the unwanted “background” parts (actor, object, background). The size of the movie on the screen was 512 × 768 pixels (circa 18.8 × 25.3 cm) and subtended 11.9° × 17.8° at a viewing distance of 90 cm (a chin-rest was used to stabilize the observers’ viewing distance). We used the following eight actions as prime stimuli: (1) screwing with a screwdriver, (2) pounding with a hammer, (3) ironing with an electric iron, (4) typing on a computer keyboard, (5) rolling out with a rolling pin, (6) sweeping with a dustpan, (7) stapling with a stapler and (8) carrying a toolbox.
The target stimuli consisted of 56 gray-scale photographs of familiar man-made manipulable objects. The objects were inscribed into a square of 280 × 280 pixels to equate their maximal extent. Picture size on the screen was circa 10.3 × 10.3 cm (visual angle about 6.5° at a viewing distance of about 90 cm). Prime and target stimuli were presented in the center of a 21″ monitor with a resolution of 1,024 × 768 pixels and a refresh rate of 100 Hz.
Word labels in the picture–word matching task denoted the names of the objects at the basic level of abstraction (Rosch et al. 1976; e.g., “corkscrew”, “nutcracker”, “typewriter”; note that we used German words, as all participants were German native speakers). They were shown in white letters on a black background in the center of the screen. The word labels were about 0.9 cm high; their width ranged from 2.6 to 9.7 cm, depending on word length. Thus, the visual angles ranged from about 0.57° × 0.95° to about 0.57° × 3.81°.
Procedure
Participants were instructed to fixate the central fixation cross and to initiate each trial by pressing a button. After the button press the fixation cross remained visible for 1,000 ms, followed by a blank black screen for 700 ms. Then, a prime movie was shown (lasting 2,000 ms), followed by another blank black screen for 70 ms. Subsequently, the target object was displayed for 80 ms. The target object was replaced by a blank screen for 120 ms, followed by a word label (250 ms). Subjects were instructed to decide whether the word label matched the previously shown target picture and to respond as quickly and as accurately as possible by pressing one of two buttons (button assignment was counterbalanced across observers). After the response was recorded, the fixation point reappeared and the participant was able to initiate the next trial. The experimental session started with a short practice phase (10 trials, with stimuli other than those in the main experiment). The main experiment consisted of two blocks of 56 stimulus pairs each.
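The trial sequence above can be summarized as a timeline; the sketch below is a hypothetical reconstruction (event names are ours, durations are from the text) showing that each trial lasts 4,220 ms from fixation onset to word-label offset, excluding the response interval:

```python
# Hypothetical sketch of the trial timeline (durations in ms);
# event names are illustrative, not taken from the original software.
TRIAL_EVENTS = [
    ("fixation cross", 1000),
    ("blank screen",    700),
    ("prime movie",    2000),
    ("blank screen",     70),
    ("target object",    80),
    ("blank screen",    120),
    ("word label",      250),
]

total_ms = sum(duration for _, duration in TRIAL_EVENTS)
print(total_ms)  # 4220
```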
Design
In the congruent condition, the eight prime movies were combined with several (3 to 10, on average 7) target objects affording actions similar to the action shown in the movie. For example, the target objects scissors, nutcracker and pliers typically involve an action similar to the prime action “stapling with a stapler” in that they all share a typical hand movement: closing the hand to compress the handles. Overall, 8 prime actions were combined with 56 congruent target objects (examples are shown in Fig. 1).
Importantly, a target object was never the same object as the one used to record the prime action. Thus, even if a participant guessed the (deleted) object upon which the action was carried out, that object never appeared as a target stimulus. For example, when viewing hands typing on an unseen computer keyboard, the observer may well guess the object (keyboard) upon which the action is carried out. By never showing these objects as targets, we could rule out that a potentially observed “action priming” effect is, in fact, simply a priming effect of the pre-activated object representation on target recognition.
In the incongruent condition, the 56 target objects were randomly assigned to one of the 7 dissimilar prime actions such that each prime movie was combined with the same number of congruent and incongruent targets. This randomization was done once and retained unchanged for all subjects, which enabled us to equate congruent and incongruent prime–target pairs for semantic similarity (see below).
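One way to realize such a constrained assignment is rejection sampling: shuffle the targets, greedily assign each to a dissimilar prime with remaining quota, and restart whenever the constraints cannot be met. The sketch below is our hypothetical reconstruction, not the authors' actual procedure; function and variable names are illustrative:

```python
import random

def assign_incongruent(congruent, seed=0):
    """Assign each target to one of the other primes such that every prime
    receives exactly as many incongruent targets as it has congruent ones.
    `congruent` maps each prime name to its list of congruent targets."""
    rng = random.Random(seed)
    primes = list(congruent)
    quota = {p: len(congruent[p]) for p in primes}  # incongruent quota per prime
    targets = [(t, p) for p in primes for t in congruent[p]]
    while True:  # rejection sampling: retry until all constraints are satisfied
        rng.shuffle(targets)
        remaining = dict(quota)
        assignment = {}
        for target, own_prime in targets:
            options = [p for p in primes if p != own_prime and remaining[p] > 0]
            if not options:
                break  # dead end: only the target's own prime is left; restart
            choice = rng.choice(options)
            remaining[choice] -= 1
            assignment[target] = choice
        else:
            return assignment
```

Doing this once and reusing the result for all subjects (as in the study) keeps the stimulus pairing identical across participants, which is what makes the norming-based matching of semantic similarity possible.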
In both the congruent and the incongruent condition, 50% of the target pictures (chosen at random for each observer) were combined with the word label correctly denoting the object. On the remaining 50% of trials, the leftover word labels were randomly assigned to the target objects such that they did not match the target. The presentation order of the prime–target stimulus pairs was randomized.
Norming studies
In two norming studies, ratings of action similarity and semantic similarity were obtained in order (1) to ensure that in the congruent condition the similarity between the prime action and the action participants typically associate with the target object was indeed significantly higher than in the incongruent condition and (2) to ensure that semantic similarity did not differ between the congruent and incongruent conditions (ruling out a potential confound by global semantic similarity).
In the first norming experiment (action similarity), a prime movie and a target object were presented sequentially on every trial (same pairing and same procedure as in Experiment 1, see below). Prime–target pairs were presented in randomized order. Subjects (n = 11) judged the similarity between the observed prime action and the action they typically associate with the target object on a scale from 1 to 7, where 1 indicates very low and 7 very high similarity. A two-tailed two-sample t test revealed that action similarity was significantly higher for congruent than for incongruent prime–target pairs (incong. 2.41, cong. 6.00, P < 0.001), indicating that the experimental manipulation was effective (see Fig. 2).
In the second norming study (semantic similarity), a photograph of the object that had been used to record the prime action was presented before the congruent and incongruent target objects (same procedure as in the first norming study). Participants (n = 12) indicated the semantic similarity between the object used in the action movie and the target object (on a scale from 1 to 7, with 1 indicating very low and 7 very high similarity). A two-tailed two-sample t test did not reveal a significant difference in semantic similarity (incong. 4.58, cong. 4.64, P > 0.6) between prime–target stimulus pairs from the incongruent and congruent conditions (see Fig. 2). This result renders it unlikely that a potential priming effect in the main experiments (1 and 2) is due to differences in global semantic similarity.
Results
The analysis was restricted to trials on which the picture–word pairing was correct, as the processes underlying performance on incorrect picture–word trials are less constrained than on correct trials. The analysis of reaction times was additionally restricted to trials on which participants responded correctly.
Matching accuracy was higher in the congruent than in the incongruent condition (cong.: mean acc. = 94.3%, incong.: mean acc. = 89.6%). A one-tailed paired-sample t test revealed a significant effect of action congruency [t(15) = 2.833, P < 0.01]. Reaction times did not differ significantly across conditions [cong.: mean RT = 416 ms, incong.: mean RT = 424 ms, t(15) = −0.916, P > 0.18] (see Fig. 3, upper panels).
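The accuracy comparison is a paired-sample t test across the 16 participants (each contributes one accuracy per condition). A minimal sketch of the statistic, using clearly hypothetical per-subject accuracies rather than the study's data:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-sample t statistic: mean of per-subject differences divided by
    the standard error of the differences, df = n - 1."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical per-subject matching accuracies (proportions), NOT the study's data.
cong   = [0.96, 0.93, 0.95, 0.92, 0.97, 0.94]
incong = [0.91, 0.90, 0.92, 0.88, 0.93, 0.90]
print(round(paired_t(cong, incong), 2))
```

For a one-tailed test (as used here), the resulting t value is compared against the one-sided critical value at df = n − 1, or equivalently the two-sided p value from standard software is halved when the effect lies in the predicted direction.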
In agreement with our prediction, we found an action priming effect on matching accuracy, even though we controlled for semantic similarity as a potential confounding factor. We did not observe an effect on reaction times. Importantly, reaction times were not faster in the incongruent condition; thus, the observed priming effect on accuracy does not merely reflect a speed-accuracy trade-off.
Accuracy in the picture–word matching task averaged about 90% and was thus relatively high. When the perceptual system is taxed more strongly, for example by masking the target object, action priming effects may become stronger and may also emerge in reaction times.