Adequate social behavior nearly always requires the ability to appreciate that a given situation is perceived and interpreted differently by different people involved. Imagine a football game in which the players did not constantly monitor what other players and the referee could see and how a dive would be interpreted. Or think of situations in which people judge whether or not an approaching car is seen by a child—an ability that can save a life. Because representing how others perceive a given situation is so frequently encountered in, and so important for, social life, it is not surprising that people spontaneously adopt the perspective of others during conversations (Clark & Krych, 2004).

Spontaneous perspective adoption is also found in experimental paradigms involving visual stimuli of persons (Belopolsky, Olivers, & Theeuwes, 2008; Frischen, Loach, & Tipper, 2009; Samson, Apperly, Braithwaite, Andrews, & Scott, 2010; Thirioux, Jorland, Bret, Tramus, & Berthoz, 2009; Tversky & Hard, 2009; Zwickel & Müller, 2010). What is more, people even adopt the perspective of geometrical shapes if the movement of the shapes appears intentional (Zwickel, 2009). Surprisingly, the relationship of this spontaneous perspective taking (SPT) to processes engaged when people are explicitly instructed to adopt a certain perspective (instructed perspective taking, IPT) or to mentally rotate objects has as yet received little attention.

Traditionally, IPT, SPT, and mental object rotation have been investigated employing different paradigms—potentially obscuring the fact that all of these visuospatial tasks might be related. In this study, we distinguish between two ways in which tasks can be related: Tasks can be related in terms of underlying processes or representations. For example, when one imagines a perspective rotation of oneself, processes might be involved that are similar to those involved in a situation in which spontaneous perspective taking occurs without one being instructed to do so. These processes operate on representations that can be similar (e.g., representations of different human bodies) or dissimilar (e.g., representations of a human body and a nonliving object).

Importantly, common processes and common representations give rise to different patterns of interference. For example, if SPT and IPT draw on common processes, this would give rise to interference when both types of perspective taking are engaged, where the interference might be independent of the specific orientation representations involved in SPT and IPT. That is, whether the spontaneously adopted and the instructed orientations are the same or different, the same amount of interference might arise, because the same rotation processes are invoked even if the orientations are different. Such an interference pattern would mean that it might be difficult to adopt, say, a leftward-facing perspective while spontaneous adoption of another perspective occurs, whatever the orientation of the spontaneously adopted perspective (also leftward or, say, rightward). Alternatively, the orientation representations corresponding to the instructed and the spontaneously taken perspectives might interfere; for example, it might be more difficult to adopt a leftward-facing perspective when, at the same time, spontaneous adoption of the perspective of a rightward-facing person occurs. In this case, the orientation representations would be subject to interference, which could be taken as an indicator of common reference frames. See “Experiment 1” for a more formal treatment of these alternatives.

Even though IPT has received a great deal of attention in the literature, studies examining the underlying processes of SPT have been scarce. Given this, in Experiment 1, we set out to achieve a better understanding of the relationship between SPT and the well-studied IPT by using an interference-based logic. Having established interference between SPT and IPT in Experiment 1 and, therefore, the relationship of SPT to another visuospatial task, we went on to investigate the relationship of SPT to other visuospatial tasks in Experiments 2 and 3 in order to better understand the processes and representations that permit SPT to occur. Finding interference between SPT and other visuospatial tasks would then reveal what kind of representations and processes are involved in SPT.

On the basis of previous reports (Amorim, Isableu, & Jarraya, 2006; Kessler & Thomson, 2010; Zacks & Michelon, 2005) of the involvement of motoric processes in IPT, we speculated that if interference between SPT and IPT could be demonstrated, this interference would be attributable mainly to the involvement of embodied processes and representations. On this basis, participants were asked to mentally rotate a non-body-part object in Experiment 2 and a body-part object in Experiment 3. These tasks are similar, except that body-part objects, but not non-body-part objects, are thought to automatically evoke motoric processes (see below). Given this, these two tasks (rotation of body-part and non-body-part objects) can be used to test the hypothesis that common motoric processes cause interference between SPT and other visuospatial tasks, the mental rotation of body-part objects being one example. The following section highlights studies that have shown motoric involvement in visuospatial tasks.

Amorim et al. (2006), for instance, compared the performance of mental object rotations between objects with and without human features (e.g., abstract shapes with/without heads and feet). The results showed that participants could solve the mental rotation tasks better for stimuli with human features. These findings were taken as an indication for motoric processes being involved in the mental rotation of objects with human body features. Involvement of motoric processes was also found in an IPT paradigm. Kessler and Thomson (2010) compared performance of perspective taking when participants’ posture was either congruent or incongruent with the direction of the mental self-rotation required by the task. Participants solved the task faster when their body posture was congruent with the required rotation, which was taken as evidence for motoric embodiment. Taken together, findings such as these suggest that embodied processes and representations underlie both IPT and the rotation of objects with humanoid features.

If our reasoning is well-grounded, we would expect interference between SPT and tasks with motoric components. This logic motivated Experiments 2 and 3. By examining interference between SPT and mental rotation of non-body (Experiment 2) and body (Experiment 3) objects, we wanted to see whether SPT would share processes that are employed during the mental rotation of body parts. Alternatively, SPT might share processes that are engaged in the mental rotation of non-body parts. Finding common processes for SPT and the rotation of body-part objects, but not the rotation of non-body-part objects, would add support to the hypothesized involvement of motoric processes in SPT and, thus, to the similarity between SPT and IPT. Overall, achieving a better understanding of SPT, which can be seen as the “glue” for social functioning, not only would improve our understanding of social behavior, but also might help shed light on processes underlying socially less appropriate behavior. The different visuospatial tasks that were used in the experiments will be introduced next.

In spontaneous perspective taking paradigms, participants are not explicitly instructed to take a certain perspective. Rather, perspective taking occurs during solving some kind of cover task. For example, when people are asked to describe the location of objects, they often do so in relation to other people: When an observer faces another person, a bottle on the left as seen from the perspective of the observer is often described as being on the right of the observed person (Tversky & Hard, 2009). This tendency of SPT is so prevalent that, as a recent study showed (Zwickel, 2009), people in certain situations even spontaneously adopt the perspective of geometrical objects; that is, their egocentric reference frame is rotated to match the frame of the geometrical objects. In Zwickel’s study, participants watched animations that involved simple geometric shapes—the so-called Frith–Happé animations (Abell, Happé, & Frith, 2000). In these animations, two triangles move around in a self-propelled fashion on the screen. Despite the visual poverty of the stimuli, observers reliably describe certain movement patterns as indicative of intentions and emotional states (Heider & Simmel, 1944). In the study of Zwickel, in addition to watching these kinds of animations, participants were asked to make speeded judgments of the location of a briefly presented dot relative to one of the triangles. Even though participants were required to make the responses as seen from their own perspective, while the movements and pointing directions of the shapes were irrelevant, they showed slower responses when judgments from their perspective were in conflict with judgments from the perspective of the observed shapes. For example, when a triangle was pointing downward on the screen, dots presented on one or the other side of the triangle led to conflicting judgments when seen from the perspectives of the triangle and the observer, respectively. 
Importantly, slower responses in these inconsistent conditions were found only during animations that typically also give rise to descriptions of the shapes in terms of goals and mental states.

In IPT, often a distinction is drawn between level 1 and level 2 perspective taking (Flavell, Everett, Croft, & Flavell, 1981). Level 1 perspective taking refers to the ability to judge whether an object is seen from another person’s viewpoint, while level 2 perspective taking refers to the capacity to represent what a certain spatial layout looks like from someone else’s viewpoint. Michelon and Zacks (2006) showed that reaction times (RTs) to level 1 perspective taking are not affected by the angular difference between the participants’ and the to-be-judged viewing directions. In contrast, RT in level 2 perspective-taking tasks increases monotonically with angular separation between the observer’s and the to-be-judged looking directions. In what follows, IPT will refer to level 2 perspective taking only.

A particularly interesting study that highlights the close connection between IPT and SPT is that by Easton, Blanke, and Mohr (2009). In their study, participants with prior out-of-body experiences and thus, arguably, nonstandard SPT behavior performed worse when explicitly asked to switch between either imagining themselves in the position of someone else (IPT) or imagining the other person as being a reflection of themselves in a mirror. For participants with prior out-of-body experiences, performance when they were asked to switch between transformations of egocentric reference frames (IPT) and maintaining their egocentric reference frames (mirror task) was particularly poor in situations in which the visual input of the other person matched the task. A matching condition was, for example, seeing the other person front-facing in the mirror task, since this would also be the visual input in everyday life mirror experiences. This study thus points to a close relationship between SPT and IPT, because nonstandard SPT behavior, as expressed in prior out-of-body experiences, had an influence on IPT behavior. Further similarities between SPT and IPT have been found in an EEG paradigm (Thirioux, Mercier, Jorland, Berthoz, & Blanke, 2010), in which participants were asked to observe another person (SPT), imagine themselves in the position of the other person (IPT), or imagine the other person being a mirror reflection of themselves. Interestingly, SPT and IPT conditions were not distinguishable neurally, while both differed from the mirror instruction condition.

In paradigms of object rotation, participants are asked to decide whether two presented objects will be the same after applying certain rotations to them (Shepard & Metzler, 1971) or to judge whether an object—for example, the letter “R”—is a rotated version of a normal or mirror-reversed “R” (Cooper & Shepard, 1973). Mental rotation processes seem to differ for different objects (Parsons, 1987). Mental rotation of objects such as an “R” involves the rotation of an allocentric (object-centered) reference frame. Parsons (1987), by contrast, found evidence that the mental rotation of hand stimuli involves egocentric reference frames. In his study, participants made right/left judgments of hands by imagining aligning the orientation of the depicted hands with their own hands. In contrast to tasks in which “R” and mirror-reversed “R” are to be distinguished, RT did not depend on the orientation difference between the depicted object and a canonical form (e.g., upward presented stimuli), but rather on the orientation difference from participants’ hand position (Parsons, 1994). Moreover, RTs depended not only on the difference between participants’ hand orientation and the to-be-judged hand stimulus, but also on the anatomical awkwardness that would be associated with a real motoric rotation to align the hand with the stimulus. In general, actual movement times of instructed hand rotations closely correlated with the RTs for the right/left judgments (Parsons, 1987, 1994). This was taken as evidence that motoric processes are involved in tasks that require right/left judgments of body parts. See Corradi-Dell’Acqua and Tessari (2010) for an extensive review of the literature pertaining to the involvement of motoric processes in perception.

Interestingly, later studies have shown that, at least in participants with motoric deficits, different strategies might also be used when body part stimuli are rotated (Steenbergen, Nimwegen, & van Craje, 2007; Tomasino & Rumiati, 2004; Wilson et al., 2004). Wilson et al., for example, observed that children with a developmental coordination disorder displayed an RT pattern for right-/left-hand decisions that was more consistent with an object-based than with a motor imagery strategy. Similar results have been reported by Steenbergen et al. for persons with congenital hemiparesis.

Tomasino and Rumiati (2004) demonstrated that the involvement of motoric processes during mental rotation of body parts can be influenced by instruction, at least in patients with damage to mental rotation areas in the brain. In one experiment, participants solved a right-/left-hand judgment task under one of two conditions: Either they were instructed to imagine rotating the stimulus to an upright position and to decide from this position whether the thumb was on the right or the left (visual strategy), or they received no explicit instruction on how to solve the task, which was expected to induce a motoric strategy. Indeed, RTs in the motor strategy condition reflected anatomical constraints of hand rotation, indicating that motoric processes were used. As the results showed, participants with impairments in areas underlying object-based mental rotation performed worse in the visual than in the motoric instruction condition. The reverse pattern was observed for participants with impairments to brain areas underlying egocentric-based mental rotation. Thus, this study shows that, even though different strategies might be available for the mental rotation of body parts, the default strategy for solving right/left judgments of hands (without any explicit instruction) is a motor strategy that involves egocentric reference frames.

Overview of the experiments

To investigate the relation between SPT and IPT, participants in Experiment 1 were instructed to judge the location of an object from another viewpoint (e.g., Zacks & Michelon, 2005) under conditions in which SPT was or was not expected to occur. In Experiment 2, we examined the relationship between mental non-body-part object rotation and SPT—the question being whether SPT would involve rotation processes and/or representations that are also engaged when object rotations are performed. Finally, in Experiment 3, we tested whether differences between body-part and non-body-part rotations (Sack, Lindner, & Linden, 2007) would be reflected in different interaction patterns with SPT. The overall aim of this set of experiments was to further our understanding of which task components—rotation processes or orientation representations—of visuospatial tasks overlap with SPT.

If the other visuospatial tasks and SPT rely on common processes for performing the mental rotations required, these processes would interfere in any condition in which two rotation processes are invoked, independently of the specific orientations involved. By contrast, if the other visuospatial tasks and SPT rely on common representations (e.g., orientation of the stimuli/adopted perspective), interference would be expected in conditions in which SPT occurs, where the magnitude of interference is modulated by the difference in orientation between the stimuli and the adopted reference.

Experiment 1

In Experiment 1, participants viewed six of the animations already used by Klein, Zwickel, Prinz, and Frith (2009). In these animations, two triangles are moving in a seemingly self-propelled fashion for about 18 s. Crucially, half of the selected animations are constructed in a way that typically elicits descriptions of the animations in terms of mental concepts—as, for example, “The small one is surprising the large one.” These animations are referred to as theory-of-mind (ToM) animations. The other half of the animations do not lead to such attributions, but are rather described in physical terms—for example, “Two triangles are floating around.” These animations will be referred to as random animations. As Zwickel (2009) has shown, animations of the ToM variety make observers spontaneously adopt the visuospatial perspective of the triangles, whereas observers of random animations show no such effect. To examine how this SPT interacts with IPT, we briefly displayed an arrow and a dot within the large triangle during the animations. The arrow was pointing upward, downward, leftward, or rightward (see Fig. 1).

Fig. 1 Stimulus schema. Shown are all combinations of stimulus orientation (up, right, down, left), triangle orientation (up, down), and correct response (left, right). Congruent stimulus–triangle orientations are marked by a solid rectangle, incongruent conditions by a dashed rectangle. Participants were asked to adopt the viewing position along the pointing direction of the arrow and judge the location of the dot from this viewpoint. To test for common processes, conditions a and b are compared with conditions c and d. To test for common orientation codes, conditions b and c are compared with conditions a and d. See the text for details.

Participants were instructed to adopt the viewing position along the direction of the arrow (realizing IPT) and judge whether, from this perspective, the dot would be on the left or the right side. If IPT and SPT relied on common processes, this would be reflected in overall slower performance on trials on which SPT and a rotation of the triangle occurred—that is, on ToM trials (affording SPT) on which the triangle was not oriented upward (conditions a and b vs. c and d in Fig. 1). This RT pattern of a main effect of triangle orientation could be taken as evidence that the rotation processes underlying SPT and IPT share a common basis.

However, if IPT and SPT did not rely on common processes but, rather, on common representations, we would expect a pattern of slower responses only in conditions in which SPT occurs and the instructed and the spontaneously adopted perspectives are in conflict. Accordingly, slowing was expected for downward-pointing-triangle/upward-pointing-arrow and upward-pointing-triangle/downward-pointing-arrow conditions (conditions c and b vs. a and d in Fig. 1). No difference between upward- and downward-pointing triangles was expected in conditions in which participants were required to adopt the viewpoint of leftward- or rightward-pointing arrows, since neither upward- nor downward-pointing triangle directions were closer to the pointing direction of the arrows. Finding an interaction between the orientations of the triangle and the arrow would therefore suggest that the underlying codes of the orientation representations are related. Importantly, a difference between upward- and downward-pointing triangles should be observed only in conditions in which SPT occurs—that is, only on ToM trials.

A more formal description of the two models would be to conceive of the SPT and IPT tasks in terms of rotation processes operating on orientations:

$$ \begin{array}{l} \mathit{rotate}_{\mathit{SPT}}(\mathit{orientation}_{\mathit{SPT}}) \\ \mathit{rotate}_{\mathit{IPT}}(\mathit{orientation}_{\mathit{IPT}}) \end{array} $$

An interaction between the orientations would show that the orientation representations, \( \mathit{orientation}_{\mathit{SPT}} \) and \( \mathit{orientation}_{\mathit{IPT}} \), interfere. This interference could be mediated by the same underlying reference frames (e.g., egocentric frame) being used in both tasks. In contrast, a main effect of triangle orientation would argue in favor of a model that assumes an overlap between the processes \( \mathit{rotate}_{\mathit{SPT}} \) and \( \mathit{rotate}_{\mathit{IPT}} \) (e.g., common motor processes). Finding an effect restricted to one animation condition would show that the effect is not caused by low-level visual interactions of the arrow and triangle stimuli.
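The contrast between the two accounts can be made concrete with a small sketch. It is purely illustrative: the function names, the unit cost, and the linear scaling with angular difference are assumptions introduced here, not part of the reported analysis. The sketch only shows which cells of the triangle × arrow design each account predicts to be slowed when SPT occurs.

```python
# Illustrative sketch only: function names, the unit cost, and the linear
# scaling with angular difference are assumptions, not the authors' model.

ANGLES = {"up": 0, "right": 90, "down": 180, "left": 270}


def angular_difference(a, b):
    """Smallest angle (0-180 degrees) between two named orientations."""
    diff = abs(ANGLES[a] - ANGLES[b]) % 360
    return min(diff, 360 - diff)


def common_process_cost(triangle, arrow, spt_occurs, cost=1.0):
    """Shared rotation processes: a fixed cost whenever SPT requires a
    rotation (triangle not upright), whatever the arrow orientation."""
    return cost if spt_occurs and triangle != "up" else 0.0


def common_representation_cost(triangle, arrow, spt_occurs, cost=1.0):
    """Shared orientation codes: the cost grows with the angular difference
    between the spontaneously adopted and the instructed orientation."""
    if not spt_occurs:
        return 0.0
    return cost * angular_difference(triangle, arrow) / 180


if __name__ == "__main__":
    # Print the predicted costs for every cell, assuming SPT occurs.
    for arrow in ANGLES:
        for triangle in ("up", "down"):
            print(arrow, triangle,
                  common_process_cost(triangle, arrow, True),
                  common_representation_cost(triangle, arrow, True))
```

Under the process account, the cost depends on triangle orientation alone (a main effect). Under the representation account, upward and downward triangles differ only for upward and downward arrows and with opposite sign (an interaction), while leftward and rightward arrows are equally distant from both triangle orientations.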

Method

Participants, apparatus, stimuli, and design

Twelve participants (mean age 28 years; 8 female; all right-handed) with normal or corrected-to-normal eyesight took part in the experiment in exchange for money or course credits. Distance to the monitor was approximately 55 cm.

Six animations (three random, three ToM) from the Frith–Happé animations that had already been used by Zwickel (2009) served as stimuli. Three additional practice animations allowed participants to become familiar with the task. Each of the experimental animations had a duration of about 18 s and displayed a red (about 4° and 2° in height and width) and a blue (about 2° and 0.5° in height and width) triangle that moved in a seemingly self-propelled fashion. These animations were edited such that at six pseudorandom time points during the animations, an arrow appeared inside the larger triangle. At the same time, a dot of about 1° was presented to either the left or the right of the arrow.

During a training block at the beginning of the experiment, 3 trials with animations that were not used in the experiment were run to permit familiarization with the task. Animation condition was blocked, and block sequence was balanced across participants. Each block contained eight repetitions of the three animations of one condition. Thus, 51 trials were run in total. The timing and orientation of the arrows were pseudorandomized so that, across the eight repetitions of a given animation, each combination of arrow orientation, dot side, and triangle pointing direction occurred 3 times.
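The counts in this design can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and reading a "trial" as one presentation of an animation is an interpretation.

```python
from itertools import product

# Factor levels as described in the design.
arrow_orientations = ["up", "right", "down", "left"]
dot_sides = ["left", "right"]
triangle_directions = ["up", "down"]

# Every combination of arrow orientation, dot side, and triangle direction.
cells = list(product(arrow_orientations, dot_sides, triangle_directions))

repetitions = 8          # presentations of each animation within its block
probes_per_showing = 6   # arrow/dot onsets per presentation
probes_per_animation = repetitions * probes_per_showing
probes_per_cell = probes_per_animation // len(cells)

practice_trials = 3
conditions = 2                 # random and ToM blocks
animations_per_condition = 3
total_trials = practice_trials + conditions * animations_per_condition * repetitions

print(len(cells), probes_per_animation, probes_per_cell, total_trials)
# prints: 16 48 3 51
```

The 16 cells times 3 occurrences match the 48 probes obtained from eight repetitions with six probe onsets each, and the 3 practice trials plus 48 experimental presentations give the 51 trials reported.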

Procedure

Participants were told that they were going to see different animations and were asked to watch them and try to remember their content so as to be able to describe them later. Participants were asked to write down at the end of each block what had happened in the animations during that block, to make sure that they had actually paid attention to the content of the animations. These records were not further analyzed, except to ascertain that participants had actually been paying attention to the stimuli and not imagining nonexisting objects. In addition, participants were asked to respond to the arrows that appeared during the animations. Participants were told that the arrows would always appear within the red triangle. Participants responded with the “l” key if they thought that the dot was on the right side, relative to the arrow pointing direction, and with the “s” key if they thought that the dot was on the left side. If participants failed to respond within 400 ms, the animation paused and waited for a decision. After the decision, the animation resumed playing.
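The judgment required by the task, deciding on which side a dot lies as seen along the arrow's pointing direction, can be written as a sign test on a two-dimensional cross product. The sketch below is an illustration under stated assumptions: the helper names and the coordinate convention (x to the right, y upward) are introduced here; only the key mapping is taken from the procedure.

```python
# Hypothetical helper names; coordinates use x to the right and y upward.
HEADINGS = {"up": (0, 1), "right": (1, 0), "down": (0, -1), "left": (-1, 0)}

# Key mapping as described in the procedure: "l" for right, "s" for left.
KEY_FOR_SIDE = {"right": "l", "left": "s"}


def side_from_perspective(arrow, dot_offset):
    """Return 'left' or 'right': the side of the dot as seen when looking
    along the arrow. dot_offset is the dot position relative to the arrow."""
    hx, hy = HEADINGS[arrow]
    dx, dy = dot_offset
    cross = hx * dy - hy * dx  # > 0: dot is counterclockwise of the heading
    if cross == 0:
        raise ValueError("dot lies on the arrow's axis; side is undefined")
    return "left" if cross > 0 else "right"


def correct_key(arrow, dot_offset):
    """Key that a correct response would use for this arrow and dot."""
    return KEY_FOR_SIDE[side_from_perspective(arrow, dot_offset)]


if __name__ == "__main__":
    screen_left = (-1, 0)
    print(correct_key("up", screen_left))    # observer and arrow agree
    print(correct_key("down", screen_left))  # observer and arrow disagree
```

For an upward arrow the observer's own left and the arrow's left coincide, whereas for a downward arrow they are reversed, which is the kind of conflict between egocentric and adopted perspectives that the paradigm exploits.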

Data analysis

RT was calculated from stimulus presentation until buttonpress. To exclude trials on which participants did not comply with the task, exclusion proceeded in hierarchical steps. First, RTs longer than 3,000 ms or shorter than 150 ms were excluded as clear outliers (no responses). Next, incorrect responses were not analyzed (wrong responses). Finally, all RTs with an absolute difference of more than 2 standard deviations from the mean of the participant were also rejected (unfocused responses). This is the standard criterion that had previously been used in similar paradigms (e.g., Zwickel, 2009). For each participant, the remaining RTs were averaged separately for each combination of animation, arrow direction, and triangle orientation.
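The hierarchical exclusion steps can be sketched as follows. The thresholds come from the text; the function name, the trial format, the use of the sample standard deviation, and the choice to compute the mean and standard deviation over the correct trials that survived the first two steps are assumptions made for illustration.

```python
from statistics import mean, stdev


def trim_participant(trials, fast_ms=150, slow_ms=3000, sd_cutoff=2.0):
    """Apply the three exclusion steps to one participant's trials.

    trials: list of dicts with keys 'rt' (ms) and 'correct' (bool).
    Returns (kept_trials, counts_of_excluded_trials).
    """
    counts = {"no_response": 0, "wrong": 0, "unfocused": 0}

    # Step 1: RTs outside the 150-3,000 ms window ("no responses").
    in_range = [t for t in trials if fast_ms <= t["rt"] <= slow_ms]
    counts["no_response"] = len(trials) - len(in_range)

    # Step 2: incorrect responses ("wrong responses").
    correct = [t for t in in_range if t["correct"]]
    counts["wrong"] = len(in_range) - len(correct)

    # Step 3: RTs more than 2 SDs from the participant's mean ("unfocused").
    # Assumption: mean and SD are taken over the trials remaining here.
    if len(correct) < 2:
        return correct, counts
    rts = [t["rt"] for t in correct]
    m, sd = mean(rts), stdev(rts)
    kept = [t for t in correct if abs(t["rt"] - m) <= sd_cutoff * sd]
    counts["unfocused"] = len(correct) - len(kept)
    return kept, counts
```

The remaining RTs would then be averaged per participant within each combination of animation condition, arrow direction, and triangle orientation before entering the ANOVA.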

These mean RTs were subsequently subjected to a repeated measures ANOVA with the factors animation condition (random, ToM), triangle orientation (down, up), and stimulus orientation (up, right, down, left). When necessary, violation of sphericity was corrected according to Greenhouse–Geisser. To facilitate reading, however, only uncorrected degrees of freedom are reported. Significant interactions were followed up by separate ANOVAs for the two animation conditions and by planned contrasts that compared the upward and downward triangle orientation conditions for the different arrow orientations.
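A planned contrast of this kind can be computed as a paired comparison across participants. The sketch below assumes that each contrast is a paired t test on per-participant mean RTs; the text does not spell out the exact procedure, and the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for two lists of
    per-participant means given in the same participant order."""
    if len(cond_a) != len(cond_b):
        raise ValueError("both conditions need one value per participant")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical mean RTs (ms) for three participants, for illustration
    # only: downward vs. upward triangle at one arrow orientation.
    down_triangle = [820.0, 790.0, 905.0]
    up_triangle = [760.0, 750.0, 830.0]
    t, df = paired_t(down_triangle, up_triangle)
    print(f"t({df}) = {t:.2f}")
```

With 12 participants, such a test has 11 degrees of freedom.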

Results

On average, participants responded on more than 99% of all trials. However, 19% of the trials were excluded due to incorrect responses. Furthermore, 3% of all trials involved unfocused responses according to the criteria described above and were excluded on this basis.

RTs increased with an increase in angular distance from upright, with the longest RTs for downward-pointing arrows. In the ToM animation condition, the difference between upward- and downward-pointing triangle conditions was largest for upward- and downward-pointing arrows, but with reversed sign: For upward-pointing arrows, upward-pointing triangles led to faster responses than did downward-pointing triangles; the opposite was observed for downward-pointing arrows. These observations were reflected in a significant effect of stimulus orientation, F(3, 33) = 26.70, MSE = 12,843, p < .01, and a significant effect of animation condition, F(1, 11) = 5.56, MSE = 13,838, p < .05, as well as an interaction between stimulus and triangle orientation, F(3, 33) = 5.93, MSE = 2,937, p < .01. However, these effects were modulated by the significant three-way interaction, F(3, 33) = 6.78, MSE = 2,513, p < .01. All other main effects and interactions had p values above .10.

Examining the random condition separately revealed a significant main effect of stimulus orientation, F(3, 33) = 24.92, MSE = 5,630, p < .01; all other F values were < 1.01.

For the ToM condition, a significant effect of stimulus orientation was found, F(3, 33) = 18.06, MSE = 24,442, p < .01; there was no main effect of triangle orientation, F(1, 11) = 1.57, MSE = 4,712, p > .10, but, importantly, there was a significant interaction between stimulus and triangle orientation, F(3, 33) = 12.22, MSE = 2,703, p < .01.

Planned contrasts revealed significant differences between upward- and downward-pointing triangle conditions for the upward- and downward-pointing arrows (both ps < .01) in the ToM condition. For the other stimulus orientation conditions, no differences between the upward- and downward-pointing triangle conditions were found (both ps > .05). See Fig. 2 for the mean RTs.

Fig. 2 Mean reaction times (RTs) and standard errors for each combination of triangle movement (up, down), arrow pointing (up, right, down, left), and animation condition (random, ToM)

Descriptively, the incorrect responses mirrored the RT results, with an increase in errors with increasing angular deviations from upright and with fewer errors in ToM conditions in which the triangle and arrow pointing directions were congruent. The correlation between RTs and error rates was significantly positive, r(16) = .70, p < .01; that is, more errors occurred with slow responses. This excludes explanations in terms of speed–accuracy trade-offs.
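The check for a speed–accuracy trade-off rests on the sign of the correlation between condition-wise mean RTs and error rates: a trade-off would appear as a negative correlation (faster conditions producing more errors). A minimal sketch of the Pearson coefficient is given below; the function name is illustrative, and the inputs would be the condition means, which are not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences, e.g.
    mean RT and error rate for each design cell."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

A positive value, as reported, means that the slower conditions were also the more error-prone ones.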

Discussion

Experiment 1 was designed to investigate whether common processes and/or representations underlie IPT and SPT, by testing for interference effects between IPT and SPT. An increase in RT was found with an increase of angle between the upright and the rotated stimuli, as would be expected for IPT tasks (e.g., Michelon & Zacks, 2006). More relevant to the question at issue, the results clearly showed that incongruent stimulus and triangle orientations lead to slower responses than congruent orientations, for which both tasks require processing of the same orientation. Importantly, this interference was found only when SPT was expected to occur (that is, in the ToM animation condition), which makes explanations based on low-level visual interactions between triangle and arrow stimuli unlikely. Experiment 1 thus clearly shows that the orientations encoded in IPT and SPT influence each other. This interference could be explained by assuming that the same egocentric reference frame underlies both spatial transformation tasks.

No main effect of triangle orientation was found, which would have provided evidence for common underlying processes. However, the significant interaction between stimulus and triangle orientation might have obscured an underlying main effect of triangle orientation.

Experiment 2

The results of Experiment 1 suggest that IPT and SPT share common representational elements. In Experiment 2, we investigated the relationship between SPT and mental object rotation. If mental object rotation and SPT share common representational elements (e.g., the representational elements for the orientation of the object and of the adopted perspective are the same), an RT pattern similar to that revealed in Experiment 1 would be expected. Prior research allows no clear prediction; we are not aware of any study that has looked at the relation between SPT and mental object rotation. Also, the close connection between SPT and IPT as demonstrated in Experiment 1 does not lead to any clear predictions, since evidence for both dissociations and associations between IPT and mental object rotation has been reported (Hegarty & Waller, 2004; Kozhevnikov & Hegarty, 2001; Zacks & Michelon, 2005). Despite this, Experiment 2 was designed to shed light on the processes and representations that underlie SPT.

Method

Participants, apparatus, stimuli, design, procedure, and data analysis

Data of 13 participants (mean age 30 years; 7 female; all right-handed) were collected. One participant was recruited as a replacement for a participant who produced zero correct responses in one condition. To investigate the mental rotation of non-body objects, the arrow stimuli were replaced with either normal or mirror-reversed “Rs” that were presented upright or rotated to the right (90°), downward (180°), or to the left (270°). Participants responded to a normal “R” with the “l” key and to a mirror-reversed “R” with the “s” key. All other details were the same as in Experiment 1.

Results

The proportions of excluded trials were 2%, 18%, and 4% for no responses, wrong responses, and unfocused responses, respectively.

Again, RTs increased with an increase in angular distance between the orientation of the presented “R” and the upright version of it. However, this time, the largest difference between upward- and downward-pointing triangles was found in the random animation condition for downward-pointing “Rs.” Statistically, only the main effect of stimulus orientation, F(3, 33) = 11.92, MSE = 68,869, p < .01, and the three-way interaction, F(3, 33) = 3.02, MSE = 3,775, p < .05, were significant. All other main effects and interactions had p values above .10.

When analyzing the ToM animation condition separately, only a main effect of stimulus orientation was found, F(3, 33) = 6.61, MSE = 50,085, p < .05. Neither the main effect of triangle orientation, F(1, 11) = 3.23, MSE = 2,056, p > .05, nor the interaction, F < 1, was significant.

This time, in the random condition, a significant interaction between stimulus and triangle orientation was obtained, F(3, 33) = 4.37, MSE = 7,367, p < .05. In addition, the main effect of stimulus orientation was significant, F(3, 33) = 13.35, MSE = 25,569, p < .01, but not that of triangle orientation, F(1, 11) < 1.

Planned contrasts revealed only a significant difference between upward- and downward-pointing triangle conditions for the downward stimulus orientation, t(11) = 17.37, p < .01. Comparisons for all other stimulus orientation conditions resulted in ps > .05. The RT means are plotted in Fig. 3.
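A planned contrast of this kind amounts to a paired-samples t test on per-participant mean RTs for the two triangle orientations within one stimulus orientation. The following is a minimal sketch of that computation, not the analysis code used in the study; the function name `paired_t` and all RT values are hypothetical and serve only to illustrate how a t value with n − 1 = 11 degrees of freedom arises from 12 participants.

```python
import math


def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    # Work on within-participant differences (cond_a minus cond_b).
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Unbiased variance of the differences.
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    se = math.sqrt(var_d / n)
    return mean_d / se, n - 1


# Hypothetical mean RTs (ms) of 12 participants for the downward stimulus
# orientation; these are NOT the data of Experiment 2.
rt_triangle_up = [812, 790, 905, 760, 840, 870, 795, 910, 830, 780, 865, 820]
rt_triangle_down = [845, 800, 950, 770, 880, 905, 815, 960, 850, 805, 900, 840]

t_value, df = paired_t(rt_triangle_down, rt_triangle_up)
print(f"t({df}) = {t_value:.2f}")
```

With this argument order, a positive t indicates slower responses with downward-pointing triangles.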

Fig. 3

Mean reaction times (RTs) and standard errors for each combination of triangle movement (up, down), R orientation (up, right, down, left), and animation condition (random, ToM)

The mean percentages of incorrect responses across the different conditions are given in Table 1. Again, the correlation between RTs and incorrect responses across the different conditions was significantly positive, r(16) = .63, p < .01, which rules out speed–accuracy trade-offs.
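The logic of this check is that a speed–accuracy trade-off would show up as a negative correlation across conditions (faster conditions being more error-prone), whereas a positive correlation means that slower conditions were also the less accurate ones. A minimal sketch of the computation, assuming one mean RT and one error percentage per condition (2 animation × 4 stimulus orientation × 2 triangle orientation = 16 cells), is given below; the function name `pearson_r` and the example values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical condition means for 16 cells; NOT the values of Table 1.
mean_rt = [640, 720, 910, 730, 650, 735, 880, 725,
           645, 710, 905, 740, 660, 730, 900, 735]
error_pct = [9, 15, 28, 16, 10, 17, 25, 14,
             11, 13, 27, 18, 12, 16, 26, 15]

r = pearson_r(mean_rt, error_pct)
# A positive r argues against a speed-accuracy trade-off.
print(f"r = {r:.2f}")
```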

Table 1 Mean incorrect responses (%) across the different animation (random, ToM), stimulus orientation (up, right, down, left), and triangle orientation (up, down) conditions. Standard errors are given in parentheses

Discussion

The observed increase in RT with an increase in the angle of rotation that was needed to align the stimulus with an upright version is in line with what is typically found for mental object rotation (Zacks & Michelon, 2005). Similar to Experiment 1, a congruency effect occurred. Importantly, in contrast to Experiment 1, the congruency effect was found in the random animation condition. That is, SPT in the ToM animation condition did not interact with mental object rotation; instead, mentally rotating an object interacted with the orientation of the depicted triangles in the random animation condition.

Why was no congruency effect found in the ToM condition? In the ToM condition, the triangles are no longer perceived as objects per se but, rather, as intentional entities. This makes them rather dissimilar to inanimate objects. In contrast, in the random animation condition, the triangles are perceived as inanimate objects, leading participants to perceive their movements on the screen as the rotation of objects. In consequence, the orientation of the “object” triangle in the random animation condition interacts with mental object rotation, but not the orientation of the “agent” triangle in the ToM condition. This finding fits well with a study by Yu and Zacks (2010), which showed—although using still stimuli—that perceived animacy influences whether object or egocentric reference frames are used in the performance of spatial reasoning tasks.

In the present experiment, triangles in the ToM condition led to SPT and, therefore, the activation of an egocentric reference frame, which did not interact with the allocentric reference frame that was activated by mental object rotation. Therefore, Experiment 2 supports the suggestion by Kessler and Thomson (2010) that different reference frames underlie the mental rotation of objects and spatial perspective taking: Spatial perspective taking relies on an egocentric reference frame, whereas mental object rotation relies on an allocentric frame.

This account makes an interesting prediction. If participants showed no congruency effect in the ToM condition in Experiment 2 because the triangles were conceived of not as object-like, but rather as agent-like, one would expect to find a congruency effect when the mental rotation involves agent-like objects, as, for example, body-part objects.

Before describing how we tested this prediction in Experiment 3, we emphasize that the congruency effect in Experiment 2 was found only for “R” objects that were pointing downward. However, since the task was to distinguish between a normal and a mirror-reversed version of the letter “R,” no mental object rotation was required in the condition in which the “R” was presented in upright orientation, which is why mental object rotation could not have had an influence with an upright “R.” A further result of Experiment 2 was that no main effect of triangle orientation was found in the ToM condition, which suggests a dissociation between the rotational processes evoked during spontaneous perspective taking and during mental rotation of non-body objects.

Experiment 3

Experiments 1 and 2 had shown that SPT is related to IPT, but not to the mental rotation of non-body-part objects. In Experiment 3, we examined whether SPT would be related to the mental rotation of body parts that are associated with agents. At least two kinds of relations are possible: SPT and mental body-part object rotation might rely on similar reference frames, or SPT and mental body-part object rotation might be based on similar processes. In the former case, again a congruency effect between stimulus and triangle orientation would be expected. In the latter case, no congruency effect but generally slower responses would be expected when rotations of the triangle occur, because rotational processes would be activated that might give rise to interference independently of the specific orientations involved. SPT is expected to be evoked only in the ToM condition, and it would involve rotation processes only when the triangle is oriented downward; it follows that slower responses would be expected for the ToM condition when the triangle is oriented downward. Given the evidence that both mental rotation of body-part objects and IPT rely on motor processes (Amorim et al., 2006; Kessler & Rutherford, 2010; Kessler & Thomson, 2010; Kosslyn, DiGirolamo, Thompson, & Alpert, 1998; Wraga, Thompson, Alpert, & Kosslyn, 2003), we expected to find indications of common processes and, therefore, a main effect of triangle orientation in the ToM condition.
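The two predicted outcomes can be told apart from the cell means alone. The sketch below illustrates this with two hypothetical RT patterns for the ToM condition: one in which a downward triangle slows all responses by a constant amount (shared rotation processes) and one in which responses are slowed only when stimulus and triangle orientation mismatch (shared reference frames). The function names, the definition of the congruency contrast (restricted to the upward and downward stimulus orientations), and all RT values are assumptions made for illustration and are not taken from the experiments.

```python
ORIENTATIONS = ("up", "right", "down", "left")


def triangle_main_effect(means):
    """Mean RT with downward triangles minus mean RT with upward triangles,
    averaged over all stimulus orientations."""
    up = sum(means["up"][o] for o in ORIENTATIONS) / len(ORIENTATIONS)
    down = sum(means["down"][o] for o in ORIENTATIONS) / len(ORIENTATIONS)
    return down - up


def congruency_effect(means):
    """Mean RT for mismatching minus matching triangle/stimulus orientations.
    Assumption: only the up and down stimulus orientations enter the contrast."""
    congruent = (means["up"]["up"] + means["down"]["down"]) / 2
    incongruent = (means["up"]["down"] + means["down"]["up"]) / 2
    return incongruent - congruent


# Hypothetical cell means in ms: means[triangle orientation][stimulus orientation].
shared_processes = {
    "up": {"up": 600, "right": 700, "down": 900, "left": 700},
    "down": {"up": 640, "right": 740, "down": 940, "left": 740},
}
shared_frames = {
    "up": {"up": 600, "right": 700, "down": 940, "left": 700},
    "down": {"up": 640, "right": 700, "down": 900, "left": 700},
}

for label, means in (("shared processes", shared_processes),
                     ("shared frames", shared_frames)):
    print(label, triangle_main_effect(means), congruency_effect(means))
```

In the first pattern, the main-effect contrast is nonzero while the congruency contrast vanishes; in the second, the reverse holds. In the actual analyses, these contrasts correspond to the main effect of triangle orientation and to the stimulus × triangle orientation interaction of the ANOVA, respectively.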

Method

Participants, apparatus, stimuli, design, procedure, and data analysis

Experiment 3 was generally the same as Experiments 1 and 2, except where noted otherwise. Data of 12 participants (mean age 29 years; 9 female; all right-handed) were analyzed. This time, either a right or a left hand appeared back-facing inside the triangle (see Fig. 4 for example stimuli). Again, the stimulus was rotated to the right, downward, to the left, or upward. Participants were asked to press the “s” key when detecting a left hand and the “l” key when observing a right hand.

Fig. 4

Example stimuli of Experiment 3. Views of backs of hands in different orientations served as stimuli in Experiment 3. Depicted are a left and right hand in stimulus orientation condition “up”

Results

The proportions of excluded trials were 3%, 25%, and 3% for no responses, wrong responses, and unfocused responses, respectively.

Again, RTs increased with an increase in angular distance between the orientation at which the hands were presented and their upright version. This time, however, responses tended to be slower, overall, when the triangle was oriented downward, although this effect was driven solely by the ToM condition. The statistics supported this observation: A main effect of stimulus orientation, F(3, 33) = 16.38, MSE = 76,241, p < .01, was accompanied by an interaction between animation condition and triangle orientation, F(1, 11) = 5.58, MSE = 5,083, p < .05. All other effects of the omnibus ANOVA had p values above .10.

A separate ANOVA for the random animation condition yielded only a main effect of stimulus orientation, F(3, 33) = 13.48, MSE = 23,450, p < .01; for all other effects, p was above .10.

By contrast, the ANOVA for the ToM condition revealed, in addition to a main effect of stimulus orientation, F(3, 33) = 12.31, MSE = 49,444, p < .01, also a main effect of triangle orientation, F(1, 11) = 9.68, MSE = 4,141, p < .05. The interaction had an F value of < 1 (see Fig. 5 for the mean RTs).

Fig. 5

Mean reaction times (RTs) and standard errors for each combination of triangle orientation (up, down), hand orientation (up, right, down, left), and animation condition (random, ToM)

As in the two preceding experiments, a significant positive correlation between incorrect responses and RTs excluded explanations based on speed–accuracy trade-offs, r(16) = .82, p < .01. Table 1 shows the mean percentage of incorrect responses for each condition.

Discussion

Experiment 2 had shown that mental non-body-part object rotation can be dissociated from SPT. In Experiment 3, we asked whether this dissociation was observed because the stimuli involved an inanimate object, in which case it would disappear with body-part objects. With body-part objects introduced in Experiment 3, the results showed a main effect of triangle orientation in the ToM condition. No difference between upward- and downward-oriented triangles was found for the downward stimulus orientation condition, but this was also the condition with the longest RTs, suggesting that this failure to find a difference is owing to a ceiling effect. Importantly, the interaction was clearly not significant, rendering this missing difference spurious and statistically supporting a sole main effect.

This main effect of triangle orientation is consistent with the expectation that similar processes underlie the mental rotation of body-part objects and SPT (Kessler & Thomson, 2010), supporting the notion of embodied perspective taking (Kessler & Rutherford, 2010; Kessler & Thomson, 2010) and embodied mental rotation of body parts (Amorim et al., 2006). In contrast to Experiment 1, no interaction between stimulus orientation and triangle orientation was found in Experiment 3. As was discussed in the introduction to Experiment 1, finding no interaction would argue against common orientation representations shared by the two tasks (i.e., in Experiment 3, SPT and mental body-part rotation), while the main effect of triangle orientation would indicate that common rotation processes are engaged by both tasks. One might speculate that no interference between the specific orientation of the hand and the triangle was found in Experiment 3 because SPT and mental rotation of hand stimuli involve differential, although both body-related, reference frames.

General discussion

In a set of three experiments, we investigated the relation between SPT when intentionally moving objects were observed, IPT when participants were explicitly instructed to take another perspective, and mental object rotation of body- and non-body-part objects. The intention behind this investigation was to better understand the processes and representations involved in SPT by systematically examining with which other visuospatial tasks SPT interferes. SPT was manipulated by presenting participants with animations involving moving geometric objects that in one condition, but not the other, typically lead to a description of the animations in terms of intentional actions. Given previous findings (Zwickel, 2009), SPT was expected to occur in the former but not the latter condition. SPT was investigated by manipulating the explicit visuospatial task assigned to the participants in the three experiments.

In all experiments, RTs increased as the angular difference of stimulus orientation relative to upright presentation became larger. This pattern has traditionally been taken as evidence that, during mental rotation, an analogue process like that in an actual, physical rotation is occurring (Zacks & Michelon, 2005). Furthermore, in the present experiments, depending on the type of the explicit visuospatial task, the orientation of the triangles interfered with the explicit judgment task in either the ToM animation condition, as was the case in Experiments 1 (IPT) and 3 (mental rotation of body-part objects), or the random condition, as was the case in Experiment 2 (mental rotation of non-body-part objects). This pattern of specificity to the animation condition makes explanations based on low-level visual differences between the different orientations of the stimuli relative to the triangles quite unlikely.

Twelve participants were analyzed in each experiment. Importantly, all experiments involved the same number of participants, and all yielded significant effects, making it unlikely that differences in the outcome between the experiments are explicable by lack of statistical power in one or the other experiment. Also, the number of males and females was—with 8, 7, and 9 female participants in Experiments 1, 2, and 3, respectively—relatively similar and thus cannot explain the difference in the results of the three experiments.

With Experiment 1, we investigated the relationship between SPT and IPT in terms of common processes and representations. Participants were explicitly asked to adopt the perspective (orientation) of an arrow that appeared inside the triangle, and they were required to make a spatial judgment relative to this perspective. This yielded an interaction between the orientation of the arrow and the orientation that was expected to be spontaneously adopted in ToM animations. This interaction between orientation codes suggests the involvement of common egocentric reference frames during instructed and spontaneous perspective taking. We are not aware of another study that has looked at the relationship between these two perspective transformation processes at this detailed level. Our answer is that SPT and IPT share common reference frames and representational elements for orientations.

From this close relation between spontaneous and instructed perspective taking, suggestions can be derived as to how explicit perspective taking may be improved by means of visual aids. For example, displaying a human body with the correct orientation on a navigation map (e.g., on a mobile device) would likely help observers to perform the mental perspective transformations as required by the navigation task. Also, given the link between instructed visual perspective taking and ToM abilities (Hamilton, Brindley, & Frith, 2009), using visual cues that depict the to-be-adopted perspective might help individuals with ToM problems when required to adopt a certain visual perspective.

The issue of common processes and representations between SPT and the mental rotation of non-body-part objects was addressed in Experiment 2. The explicit visuospatial task was to judge whether a briefly presented stimulus was a rotated version of either a normal or a mirror-reversed “R.” It is assumed that these kinds of tasks involve a mental rotation of the presented stimulus (Shepard & Cooper, 1982), and, as was expected, an increase in RTs with an increase in angle of the required rotation was obtained. More important for the present question, no interference between SPT and the mental rotation task was evident in the ToM animation condition. Taken together with the overlap between SPT and IPT revealed in Experiment 1, this is consistent with findings of differences between IPT and mental (non-body-part) object rotation (Hegarty & Waller, 2004; Wraga, Shephard, Church, Inati, & Kosslyn, 2005), suggesting that different reference frames are being used when a mental self-rotation (egocentric) versus a mental non-body-part object rotation (allocentric) is performed.

However, while Experiment 2 yielded a null finding for the ToM condition, it revealed an interaction between the orientation of the triangles and the non-body-object for the random condition. This is exactly what one would expect if the triangles were perceived as objects in the random animation condition but as agents in the ToM animation condition. Because the triangles are perceived as agents in the latter condition, SPT occurs, and the triangles are coded in the (egocentric) reference frame of SPT, rather than in an allocentric reference frame. Additionally, the absence of an interaction in the random condition of Experiment 1 and its presence in Experiment 2 further support the notion that different reference frames underlie object coding and IPT/SPT.

Finally, in Experiment 3, we tested whether SPT and mental rotation interfered in the ToM condition if a body-part object such as a hand was to be rotated. This question was motivated by recent research pointing to embodiment of mental rotations of body and body-part objects (Amorim et al., 2006; Kosslyn et al., 1998; Wraga et al., 2003), but not of non-body objects (Kosslyn et al., 1998; Wraga et al., 2003). For example, Kosslyn et al. asked participants to mentally rotate a cube or a hand while their cerebral blood flow (BOLD activity) was measured by means of fMRI. Significantly stronger BOLD activity was found in motor areas during rotation of the hands, as compared with rotation of the cubes. Given that IPT also relies on embodiment processes (Kessler & Rutherford, 2010; Kessler & Thomson, 2010) and that Experiment 1 suggests a close relation between SPT and IPT, an interaction between SPT and the rotation of body-part objects was expected.

Experiment 3 yielded a clear answer. When SPT required rotation of the reference frame—that is, in downward-oriented triangle conditions—this rotation led to generally slower performance (main effect of triangle orientation) in the hand rotation task, independently of the relation between the hand and the triangle orientation. SPT was expected to occur only in the ToM animation condition, and indeed, this main effect was restricted to ToM animations.

Of note, no interaction was found between the orientation of the hand and the triangle stimulus, which argues against common representational elements being used for coding the orientations of one’s own body and of body parts; for instance, different body-related reference frames might be involved. Instead of common orientation representations, it appears that SPT shares common rotation processes with body-part object rotation. In particular, motoric processes are potential candidates for common rotation processes that might be involved in both SPT and the mental rotation of body-part objects.

The results of Experiments 2 and 3 suggest that the seemingly contradictory findings of both dissociations and associations between mental object rotation and IPT (Hegarty & Waller, 2004) might, at least in part, be due to whether body (part) or non-body (part) objects had to be rotated. This, of course, is not to deny that other task components, too, may have an influence on the processes involved, as has already been discussed in the introduction. For example, Wraga et al. (2003) showed that whether or not mentally rotating a three-dimensional object involves motoric components can be influenced by the prior task to be performed. When the prior task required mental rotation of hands, but not when it required rotation of an object, the subsequent object rotation involved motoric components. Similarly, Zacks, Mires, Tversky, and Hazeltine (2002) showed that instructions can influence whether or not an egocentric reference frame is activated. In this study, only the task instructions were varied, while the stimuli were kept constant. It turned out that having to make same/different judgments led participants to perform object-based rotations, whereas left/right judgments induced egocentric perspective transformations.

Taken together, the interference patterns that were obtained in the present study suggest a close relationship between SPT and other visuospatial tasks that involve social stimuli. Arguably, this observation is of help for achieving a better understanding of how humans perform spontaneous perspective taking. An improved understanding of this crucial component in social interactions (e.g., Clark, 2004; Schober, 2005) may, then, foster further research in areas as diverse as, for example, human–computer interaction or autism.

In summary, we show (1) that SPT relies on reference frames similar to those of IPT and (2) that SPT and the mental rotation of body-part objects rely on similar rotation processes, presumably involving motoric components. We further show (3) that SPT can be dissociated from the mental rotation of non-body-part objects and (4) that the perception of triangles as agents makes their object reference frame disappear. Given this, the present study advances our understanding of the functional and representational basis of SPT and its relation to other forms of mental spatial transformations, and it offers suggestions as to how these findings might be used to inform interface designs in spatial navigation.