Visual information (e.g., vanishing lines, texture density) dominates comprehension of perspective images (Posner, Nissen, & Klein, 1976; Snowden, Thompson, & Troscianko, 2012). However, visual information alone is not always sufficient to allow unambiguous interpretation of a perspective image. For example, with only visual information, a series of converging rectangles could indicate a downward passage viewed from above or an upward passage viewed from below. In most such cases, proprioceptive information (e.g., observer orientation and posture) disambiguates inadequate visual information. For example, the inclination of a perceiver’s neck indicates whether the perceiver is looking up or down. In most settings, proprioception provides veridical cues to orientation.

This system of comprehension presents an interesting problem for the perception of two-dimensional perspective images. An observer’s physical posture (and associated proprioceptive cues) can be discrepant from the posture implied by a perspective image. For example, consider an image of an arched cathedral ceiling, captured from below. When looking straight ahead at a screen displaying this image, the perspective cues in the image would imply craning one’s neck to look up. Actual posture, however, would not supply this information. How does an observer make sense of a visually implied perspective in light of contradictory proprioceptive cues? Given the prominent role of artificial perspective images in modern life (e.g., in films and television, paintings, photos, and video games), this question is of great importance.

The most effective means of comprehension might be for the visual system simply to ignore inconsistent proprioception. Because visual information dominates the perception of perspective images, the visual system could plausibly rely on proprioceptive information only when visual information is insufficient. Indeed, in the coordination of movement, proprioceptive cues become more influential as the quality of visual information declines (Holmes & Spence, 2004).

However, in natural environments, proprioception normally provides veridical cues to perspective, and cues from multiple sources are normally integrated in order to determine perception (Brunswik, 1956). For example, physical changes in viewpoint (e.g., shifting to the left or the right of a stimulus) facilitate recognition of object arrays relative to equivalent rotations of the display itself, independent of locomotion (Simons & Wang, 1998; Wang & Simons, 1999). And active exploration of simulated environments improves recognition of correct image orientations in comparison with the same spatial information acquired passively (Larish & Andersen, 1995; see also Péruch, Vercher, & Gauthier, 1995).

Moreover, many areas of cognition show a similarly close link between visual input and motor systems. Observing someone adopting a certain posture activates the same cerebral premotor and motor areas as actually adopting that posture (Alaerts, Heremans, Swinnen, & Wenderoth, 2009; Aziz-Zadeh, Wilson, Rizzolatti, & Iacoboni, 2006; Tettamanti, Buccino, Saccuman, Gallese, Danna, Scifo, & Perani, 2005), and such activation can then facilitate related action (Ottoboni, Tessari, Cubelli, & Umiltà, 2005; Tessari, Ottoboni, Symes, & Cubelli, 2010). Seeing a teapot with a handle on the left or on the right facilitates later responses with the left or right hand, respectively, and seeing objects typically handled with a specific grip facilitates performing that grip (Ellis & Tucker, 2000; Ellis, Tucker, Symes, & Vainio, 2007; Tucker & Ellis, 1998, 2004). The preponderance of evidence from observing others’ actions and anticipating one’s own actions, as well as from the relationship between observer movement and spatial scene recognition, thus suggests that perceptual and motor information are likely interdependent, such that perception informs motor representations and motor representations inform perception (see Hommel, Müsseler, Aschersleben, & Prinz, 2001, for a comprehensive framework).

Such close integration of visual and proprioceptive information predicts that assuming or simulating a posture consistent with the perspective implied by an image should facilitate comprehension of that image, relative to assuming or simulating an inconsistent posture. By image comprehension, we mean extracting and stating a spatial relationship depicted in a perspective image. We tested this prediction in a series of three experiments that manipulated the compatibility of actual or simulated head and neck posture with the perspective implied by perspective images and measured the speed and accuracy with which those images were comprehended.

General method

Design and participants

All experiments employed a 2 (Posture manipulation: posture taken or posture imagined) × 2 (Posture–image compatibility: compatible or incompatible) design. All participants had normal or corrected-to-normal vision and showed no head-related mobility impairments. Participants received 7€ or course credit in exchange for their participation.

Stimuli and apparatus

The stimulus images were 2-D perspective images (800 × 800 pixels in Experiments 1 and 2, 600 × 600 pixels in Experiment 3) showing three colored spheres at random positions (see Fig. 1). The image background was black, with two overlapping checkerboard-colored planes seen from either above (downward-oriented images) or below (upward-oriented images). The upward- and downward-oriented images were identical except for a 180° rotation. Each image contained three visual cues that allowed 3-D interpretation of depth: (a) vanishing lines, (b) object sizes, and (c) texture density. The three colored spheres (red, yellow, and blue) appeared at distinct positions without mutual occlusion. To prevent learning of specific sphere arrangements, each trial’s configuration was selected from 11,500 random combinations around the image center.
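Although the original stimulus-generation code is not published, the sampling scheme described above can be sketched as follows (a minimal illustration in Python; the 5 × 5 grid layout and all names are our assumptions):

```python
import random

# Hypothetical sketch of per-trial sphere placement: three distinct
# positions are drawn from the 25 candidate slots around the image
# center, so the spheres never occlude one another, and the three
# colors are assigned to those slots at random.
POSITIONS = [(row, col) for row in range(5) for col in range(5)]  # 25 slots (assumed 5 x 5 grid)
COLORS = ["red", "yellow", "blue"]

def sample_configuration(rng=random):
    slots = rng.sample(POSITIONS, k=3)  # distinct slots -> no mutual occlusion
    colors = COLORS[:]
    rng.shuffle(colors)
    return dict(zip(colors, slots))

config = sample_configuration()
# e.g., {'blue': (1, 4), 'red': (0, 2), 'yellow': (3, 3)}
```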

Fig. 1

Stimulus perspective images: Downward perspectives appear on the left. Upward perspectives appear on the right. The upper panels show the 25 possible sphere positions in both perspective images, which varied randomly and independently of sphere color (red, yellow, and blue). The middle panels show example stimulus configurations with no sky or ground added (Experiment 1). The bottom panels show an example stimulus configuration with sky and ground added (Experiments 2 and 3). Perspective cues consist of (a) vanishing lines, (b) object size, and (c) texture density

The apparatus for presenting the material in Experiments 1 and 2 is illustrated in the left panel of Fig. 2. Participants sat in front of three numbered flat screens (1,280 × 1,024 pixels, 75 Hz) mounted in a vertical column, with the middle screen directly in front of the participant at eye level. Looking at the upper or lower screen required tilting the head approximately 45° upward or downward. All screens displayed the same content. In Experiment 3 (right panel of Fig. 2), the screens were replaced with a head-mounted display (HMD; eMagin Z800 3DVisor, 800 × 600 resolution in each microdisplay). In all experiments, participants responded via a USB keyboard resting on or near their knees.

Fig. 2

Experimental apparatus in Experiments 1 and 2 (left) with screens and in Experiment 3 (right) with HMD. Participants either (a) looked up or down to see images with an upward or downward perspective, or (b) imagined looking up or down before images with an upward or downward perspective were presented in front of participants

Procedure

After initial instruction, participants completed 80 judgments (10 per unique condition) of perspective images, preceded by eight practice judgments, with a 60-sec break halfway through the main judgments. Before each judgment, participants were instructed either to actually take or to simulate taking an upward- or downward-oriented posture. In Experiments 1 and 2, posture was manipulated by asking participants to view the stimulus image on the lower screen (identified as “screen 1”) or on the upper screen (identified as “screen 3”). In Experiment 3, posture was manipulated with the short behavioral instructions “head to chest” or “head to neck.” In all experiments, simulated posture was manipulated by asking participants to imagine performing the corresponding action, after which the image was presented on the screen directly in front of them (screen 2, in Experiments 1 and 2) or in the HMD without any change in actual posture (Experiment 3). All instructions were delivered while participants were in a forward-oriented posture.

The order of posture instructions was randomized within blocks, so that each combination of actual or simulated posture instruction, posture direction, and image perspective occurred once before any repetition. Participants indicated that they had complied with the postural instructions by pressing the space bar, at which point a 1-sec fixation cross appeared in the center of the screen, immediately followed by an upward- or downward-oriented stimulus image. The participants’ task was to answer, as quickly as possible, “Is the yellow ball above the red ball?” by pressing the left arrow key for “no” and the right arrow key for “yes.” Participants used one hand to press the space bar and two fingers of the other hand to respond to the stimuli. The stimulus image remained onscreen until participants responded.
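The constraint that every condition combination occurs before any repeats amounts to shuffling the eight cells of the design within each of ten blocks. A minimal sketch under that reading (factor labels are our assumptions, not the authors’ code):

```python
import itertools
import random

# The 2 x 2 x 2 cells of the design: posture manipulation x posture
# direction x image perspective = 8 unique conditions, 10 repetitions
# each = 80 trials.
CONDITIONS = list(itertools.product(
    ["taken", "simulated"],   # posture manipulation
    ["up", "down"],           # posture direction
    ["upward", "downward"],   # image perspective
))

def build_trial_list(n_blocks=10, rng=random):
    trials = []
    for _ in range(n_blocks):
        block = CONDITIONS[:]
        rng.shuffle(block)    # each condition appears once per block, in random order
        trials.extend(block)
    return trials

assert len(build_trial_list()) == 80
```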

After completing the experiment, participants were debriefed and interviewed about their subjective interpretation of the perspective cues present in the images and the clarity of the instructions.

Dependent measures

Perspective image comprehension was assessed with answer speed and answer accuracy in Experiments 1 and 2, but with only answer speed in Experiment 3.

Response latencies below 500 ms or above 10,000 ms were excluded from analysis (less than 0.5 % of trials in any given experiment), and the remaining response latencies were log-transformed (Cohen & Cohen, 1983; Tabachnick & Fidell, 2007). Mean response latencies before transformation were 1,440 ms (SD = 802), 1,811 ms (SD = 1,009), and 2,088 ms (SD = 1,278) in Experiments 1, 2, and 3, respectively.
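For concreteness, the trimming and transformation just described can be expressed as a short preprocessing step (a sketch only; column names such as rt_ms are our invention):

```python
import numpy as np
import pandas as pd

def preprocess_latencies(trials: pd.DataFrame) -> pd.DataFrame:
    """Drop trials outside the 500-10,000 ms window (< 0.5 % of
    trials in any experiment) and log-transform the rest."""
    kept = trials[trials["rt_ms"].between(500, 10_000)].copy()
    kept["log_rt"] = np.log(kept["rt_ms"])  # reduces right skew
    return kept
```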

Response errors were relatively infrequent in Experiment 1 (M = 3.36 errors, SD = 3.7), even less frequent in Experiment 2 (M = 2.5, SD = 2.7), and nearly nonexistent in Experiment 3 (M = 1.44, SD = 1.8). Because of this increasingly restricted range, we did not consider answer accuracy in Experiment 3.

In order to compare across measures, we z-standardized both log-transformed response latency and answer accuracy and reverse-scored response latency so that higher numbers indicated better image comprehension for both measures. Both measures were then coded according to whether the head posture direction and image viewing direction were compatible (upward perspective with upward posture and downward perspective with downward posture) or incompatible (upward perspective with downward posture and downward perspective with upward posture; Estes, Verges, & Barsalou, 2008).
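The scoring scheme can likewise be sketched in a few lines (again an illustration under assumed column names; the text does not specify whether standardization was performed within participants or across the whole sample, so the sketch standardizes across all trials):

```python
import pandas as pd
from scipy.stats import zscore

def score_comprehension(trials: pd.DataFrame) -> pd.DataFrame:
    scored = trials.copy()
    # Reverse-score latency so that higher values = better comprehension.
    scored["speed_z"] = -zscore(scored["log_rt"])
    scored["accuracy_z"] = zscore(scored["correct"].astype(float))
    # Compatible: upward image with upward posture, downward with downward.
    scored["compatible"] = scored["posture_dir"] == scored["image_dir"]
    return scored
```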

Experiment 1

Thirty participants (mean age = 24.55 years, SD = 2.667) completed the procedures using a screen-based stimulus display and stimulus images without ground or sky present (Fig. 1). During postexperimental interview, 2 participants indicated that they had interpreted the perspective in the images differently than was intended, so they were excluded from analysis.

The effect of actual and simulated posture on image comprehension is graphed in Fig. 3. We analyzed these data with a 2 (Posture manipulation: posture taken or posture simulated) × 2 (Posture–image compatibility: compatible or incompatible) within-subjects multivariate analysis of variance (MANOVA) on standardized response speed and accuracy, as recommended by Davidson (1972) and O’Brien and Kaiser (1985). The MANOVA revealed only the predicted main effect of posture–image compatibility [F(1,27) = 6.175, p = .019, η2 p = .186]: image-compatible posture led to better image comprehension than did image-incompatible posture. Neither the main effect of posture manipulation [F(1,27) = 3.795, p = .062, η2 p = .122] nor the interaction between posture–image compatibility and posture manipulation [F(1,27) = 1.972, p = .148, η2 p = .076] was significant.
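The 2 × 2 within-subjects analysis can be reproduced in univariate form with standard tools; the sketch below uses statsmodels’ AnovaRM on per-cell means of a single composite comprehension score. This is not the authors’ analysis code: the paper reports a MANOVA over speed and accuracy jointly, which AnovaRM does not provide, and all column names here are assumptions.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def rm_anova(cell_means: pd.DataFrame):
    """cell_means: one row per participant x posture manipulation x
    compatibility cell, with 'comprehension' holding the averaged
    standardized speed/accuracy score for that cell."""
    model = AnovaRM(
        data=cell_means,
        depvar="comprehension",
        subject="participant",
        within=["manipulation", "compatibility"],
    )
    return model.fit()

# results = rm_anova(cell_means)
# print(results.anova_table)  # F, numerator/denominator df, and p per effect
```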

Fig. 3

Standardized image comprehension collapsed across measure and conditions (“Overall”) as well as by measure (Accuracy, Speed) and posture manipulation (Physical, Simulated), Experiment 1. Compatible images were better comprehended than were incompatible images. Error bars represent standard errors for individual cells

Looked at separately, the compatibility effect was not significant for response speed [F(1,27) = 3.875, p = .059, η2 p = .126] but was significant for response accuracy [F(1,27) = 4.751, p = .038, η2 p = .150]. The compatibility effect was significant for simulated [F(1,27) = 8.493, p = .007, η2 p =.239] but not for actual posture [F(1,27) = 1.249, p = .256, η2 p = .048].

These results provided initial support for a role for proprioception in perspective image comprehension. However, 2 participants inadvertently misinterpreted the perspective. This implies that the images allowed some degree of perspective ambiguity. In order to verify that the observed effects generalize to image comprehension with nonambiguous images, we conducted a second experiment.

Experiment 2

Twenty-two participants (mean age = 28.27 years, SD = 7.363) completed the procedures using a screen-based stimulus display and stimulus images with ground or sky added, in order to decrease ambiguity in the image perspective (Fig. 1). Participants reported no confusion regarding the intended image perspective.

The effect of actual and simulated posture on image comprehension is graphed in Fig. 4. A 2 (Posture–image compatibility: compatible or incompatible) × 2 (Posture manipulation: posture taken or posture simulated) within-subjects MANOVA on image comprehension yielded two main effects: a main effect of posture–image compatibility [F(1,21) = 10.647, p = .004, η2 p = .336] and a main effect of posture manipulation [F(1,21) = 6.014, p = .023, η2 p = .223]. The interaction between posture–image compatibility and the posture manipulation was not significant [F(1,21) = .615, p = .442, η2 p = .028].

Fig. 4

Standardized image comprehension collapsed across measure and conditions (“Overall”) as well as by measure (Accuracy, Speed) and posture manipulation (Physical, Simulated), Experiment 2. Compatible images were better comprehended than were incompatible images. Error bars represent standard errors for individual cells

Looked at separately, the compatibility effect was significant for response speed [F(1,21) = 5.490, p = .029, η2 p = .207] as well as for answer accuracy [F(1,21) = 5.446, p = .030, η2 p = .206]. The compatibility effect was significant for posture taken [F(1,21) = 7.689, p = .011, η2 p = .268] but was not significant for simulated posture [F(1,21) = 3.738, p = .067, η2 p = .151].

The unexpected main effect of posture manipulation indicated that simulated posture reduced image comprehension relative to assumed posture, perhaps reflecting greater mental effort in complying with the posture simulation instructions. Critically, this effect was independent of the observed compatibility effect.

These results provide a second demonstration that proprioception influences perspective image comprehension, this time with wholly unambiguous perspective images.

Experiment 3

Experiment 3 focused on ruling out visual, rather than postural, explanations for the observed effects. In Experiments 1 and 2, looking between different screens required participants to adjust their field of vision. Thus, the perspective images were consistent or inconsistent not only with posture but also with the eye movements and the flow of optical information created by looking from screen to screen. In order to rule out such optical explanations, we replaced the three-screen array with an HMD worn in a completely darkened room. Thus, the visual field remained constant despite changes in posture.

Forty-two participants (mean age = 24.65 years, SD = 3.67) completed the procedures using the HMD and stimulus images with ground or sky added. Recordings of head orientation confirmed the efficacy of the posture manipulation (up = +45.0°, down = −46.9°, imagined up = +0.5°, imagined down = −0.5°). Participants reported no confusion regarding the intended image perspective.

As noted, answer accuracy approached ceiling (M = 1.44 errors, SD = 1.8), perhaps as a result of the slower overall responding. Thus, we analyzed only answer speed. The effect of posture compatibility on answer speed is graphed in Fig. 5. A 2 (Posture–image compatibility: compatible or incompatible) × 2 (Posture manipulation: posture taken or posture simulated) within-subjects ANOVA yielded two main effects: a main effect of posture compatibility [F(1,41) = 8.348, p = .006, η2 p = .169] and a main effect of posture manipulation [F(1,41) = 7.477, p = .009, η2 p = .154]. The interaction between posture compatibility and the posture manipulation was not significant [F(1,41) = 1.102, p = .300, η2 p = .026]. Looked at separately, the compatibility effect was significant for simulated posture [F(1,41) = 6.395, p = .015, η2 p = .135] but not for actual posture [F(1,41) = 1.665, p = .204, η2 p = .039].

Fig. 5

Standardized response speed collapsed across posture manipulation (“Overall”) and by posture manipulation (Physical, Simulated), Experiment 3. Compatible images were better comprehended than were incompatible images. Error bars represent standard errors for individual cells

As in the previous experiments, image comprehension was better after compatible posture than after incompatible posture, and better after assumed posture than after imagined posture. These results provide further evidence that proprioception plays a role in the comprehension of unambiguous perspective images, this time with a constant visual field.

General discussion

In three experiments, we demonstrated that proprioceptive cues affect the comprehension of perspective images. Proprioception compatible with the perspective implied by an image facilitated image comprehension relative to incompatible proprioception. Experiments 1 and 2 demonstrated this using onscreen displays of perspective images when image perspective was strongly implied but still somewhat ambiguous (Experiment 1) as well as when perspective was disambiguated by the ground and sky (Experiment 2). Experiment 3 replicated these findings with unambiguous images using an HMD display, allowing us to deconfound posture and the visual input normally associated with changing posture.

We observed similar results whether proprioception actually occurred or was merely simulated. This result extends the importance of mental simulation from the domain of action comprehension and action preparation, where similar findings have been observed (e.g., Alaerts et al., 2009; Ottoboni et al., 2005), to the domain of visual perspective comprehension. This extension, in turn, supports the centrality of mental simulation to the comprehension of nearly every aspect of a human observer’s environment.

Additional features of these procedures are also important. The stimulus images were carefully constructed to keep the sphere configuration centered and invariant except for features related to changes in perspective. Additionally, in no case did the manipulations of posture refer to the semantic concepts of “up” and “down”: posture was manipulated in reference to numbered screens in Experiments 1 and 2 and in relation to participants’ own bodies in Experiment 3.

Note that these results do not suggest that the compatibility of a perspective image with preceding optical flow is unrelated to image comprehension; on the contrary, compatibility effects based on visual information are plausible. The flow of input across the retina influences vestibular and proprioceptive processing (Koenderink, 1986), and consistency or inconsistency between such optical flow and the perspective implied by an image might also create a compatibility effect in image comprehension.

It would be possible to respond correctly to our stimulus materials by ignoring depth and focusing on the relative height of the yellow and red balls in a flat plane (an analytic viewing attitude; Carlson, 1962). However, such a viewing attitude would predict null effects: depth cues are irrelevant to it, so manipulating depth cues would not be expected to influence participants’ decisions. We thus assume that participants adopted a realistic viewing attitude in which they encoded and interpreted depth cues as meaningful.

One possibility that our design did not account for is that posture might exert its influence by creating expectations rather than through integration with visual input. We explored this possibility by examining the time course of the observed effects. Because violated expectations fade, an expectations-based account predicts a diminishing posture–image incompatibility effect in later trials relative to earlier trials. We assessed this possibility by reanalyzing the data from all three experiments with experiment phase (early or late) as an added factor. We found clear evidence of learning: main effects of experiment phase (p < .001 in all experiments) showed performance improving over time. However, we found no comparable interaction between learning and posture–image compatibility in any of the experiments (ps = .129, .210, and .379 in Experiments 1, 2, and 3, respectively). The only qualification of posture–image compatibility by experiment phase suggested a speed–accuracy tradeoff in Experiment 2 [F(1,21) = 7.183, p = .014, η2 p = .255]: posture–image incompatibility significantly increased errors in earlier trials [F(1,21) = 6.277, p = .021, η2 p = .230] but not in later trials [F(1,21) = .040, p = .844, η2 p = .002], whereas it significantly slowed responses in later trials [F(1,21) = 6.385, p = .020, η2 p = .233] but not in earlier trials [F(1,21) = .969, p = .336, η2 p = .044]. Overall, these analyses suggest that expectancy violation is not a compelling alternative to posture (in)compatibility: the only interaction with experiment phase was consistent with a robust effect over time.
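The phase reanalysis amounts to deriving an early/late factor from each participant’s trial sequence. A minimal sketch under one plausible reading (a within-participant median split on trial order; the paper does not state how the split was made, and all names are assumed):

```python
import numpy as np
import pandas as pd

def add_phase(trials: pd.DataFrame) -> pd.DataFrame:
    """Label each trial 'early' or 'late' via a within-participant
    median split on trial order, so that phase can be entered as an
    added within-subjects factor in the reanalysis."""
    out = trials.copy()
    median_idx = out.groupby("participant")["trial_index"].transform("median")
    out["phase"] = np.where(out["trial_index"] <= median_idx, "early", "late")
    return out
```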

Our findings support a role for motor and posture simulation in understanding everyday environments and activities (Hommel, 2009). Not only do we refer to our own motor system to make sense of observed actions (Alaerts et al., 2009; Brass, Bekkering, Wohlschläger, & Prinz, 2000; Liepelt, Ullsperger, Obst, Spengler, von Cramon, & Brass, 2009), but motor and posture simulation are also involved in interpreting and comprehending perspective images. These results suggest that proprioceptive representations are involved in understanding the images we encounter as we watch television, browse the Internet, play computer games, or even look at a friend’s vacation photos.