Perspective taking and systematic biases in object location memory

Segen, Vladislava; Colombo, Giorgio; Avraamides, Marios; Slattery, Timothy; Wiener, Jan M.

doi:10.3758/s13414-021-02243-y

Open access
Published: 15 March 2021

Volume 83, pages 2033–2051 (2021)
Attention, Perception, & Psychophysics

Vladislava Segen ORCID: orcid.org/0000-0002-6677-723X^1,2,
Giorgio Colombo³,
Marios Avraamides^4,5,
Timothy Slattery² &
…
Jan M. Wiener^1,2

Abstract

The aim of the current study was to develop a novel task that allows for the quick assessment of spatial memory precision with minimal technical and training requirements. In this task, participants memorized the position of an object in a virtual room and then judged, from a different perspective, whether the object had moved to the left or to the right. Results revealed that participants exhibited a systematic bias in their responses that we termed the reversed congruency effect. Specifically, they performed worse when the camera and the object moved in the same direction than when they moved in opposite directions. Notably, participants responded correctly in almost 100% of the incongruent trials, regardless of the distance by which the object was displaced. In Experiment 2, we showed that this effect cannot be explained by the movement of the object on the screen, but that it relates to the perspective shift and the movement of the object in the virtual world. We also showed that the presence of additional objects in the environment reduces the reversed congruency effect such that it no longer predicts performance. In Experiment 3, we showed that the reversed congruency effect is greater in older adults, suggesting that the quality of spatial memory and perspective-taking abilities are critical. Overall, our results suggest that this effect is driven by difficulties in the precise encoding of object locations in the environment and in understanding how perspective shifts affect the projected positions of the objects in the two-dimensional image.

Our ability to orient and navigate depends largely on forming spatial representations that maintain information about the locations of landmarks and other objects (Epstein et al., 1999; Postma et al., 2004; Waller, 2006). Such representations can vary greatly in terms of the precision with which they hold information (Evensmoen et al., 2013). In the visual working memory literature, the precision of spatial representations has been investigated with tasks that involve memorizing first the position of an object presented in a 2D stimulus array on a blank screen, and then repositioning the object to its original position (Aagten-Murphy & Bays, 2019; Nilakantan et al., 2018; Pertzov et al., 2012; Pertzov et al., 2015; Stevenson et al., 2018). Moreover, psychophysical approaches with change detection tasks have also been used to quantify the precision of spatial representations (e.g., Brady & Alvarez, 2015; Luck & Vogel, 1997, 2013). In these tasks, participants are asked to indicate whether an object has moved or changed between encoding and test, with the amount of movement/change systematically manipulated. Such tasks, which are primarily designed to investigate visuospatial working memory capacity, showed that increasing the number of to-be-remembered items leads to a reduction in the quality of the representation for each of the items (Brady et al., 2011). In addition, the precision of encoding was shown to be negatively affected by typical and atypical aging (Liang et al., 2016; Nilakantan et al., 2018; Pertzov et al., 2015).

Although such approaches provide a detailed account of the precision with which object locations can be memorized, they typically focus on 2D spaces and do not investigate the precision of spatial representations in the 3D space that we encounter during navigation, where perspective shifts typically take place. In addition, tasks that use 2D spaces can often be solved by memorizing the pixel positions of the objects on the screen and thus do not require participants to infer how space is structured (Nardini et al., 2009). In contrast, the use of virtual environments and the introduction of perspective shifts between encoding and test allows investigating the ability to encode 3D spatial locations. It also ensures that participants cannot simply memorize the position of the objects on the screen. Instead, participants must remember the position of the object in the virtual world and understand how the visual projections of the objects and the room would change following a perspective shift.

There are several virtual reality navigation tasks that allow assessing the precision of spatial representations. In some tasks, participants have to learn the position of target locations within an environment—that is, virtual Morris water maze (vMWM) tasks (e.g., Daugherty et al., 2015; Moffat et al., 2007; Woolley et al., 2010), the flag localization task (Hartley et al., 2004), object-location memory tasks (Doeller et al., 2008)—while in other tasks they have to memorize their own locations before being transported to a new location and asked to navigate back to the previous location (e.g., Gillner et al., 2008). These experimental tasks provide rich data sets with a wide range of measures that allow assessing the precision with which spatial locations can be memorized, such as distances and angular differences between the estimated position of the target or own location and the correct locations, time spent searching in the vicinity of the correct location, and path trajectories amongst others. These tasks have also been used to investigate spatial encoding strategies (e.g., Mueller et al., 2008) and reference frames (e.g., King et al., 2002; King et al., 2004) used during navigation as well as effects of (a)typical aging on spatial navigation (e.g., Moffat et al., 2007). More recently, the vMWM has also been applied to investigate the precision of spatial representations in patients with hippocampal lesions (Kolarik et al., 2018; Kolarik et al., 2016).

Despite their utility for studying the precision of spatial memory, these tasks often require specialized equipment, software, and skills, as well as prolonged training and familiarization with the task, the virtual environment, and the equipment. For example, a typical virtual Morris water maze task consists of training trials during which participants learn the position of the hidden platform by navigating within the environment (Daugherty et al., 2015; Kolarik et al., 2018; Kolarik et al., 2016; Moffat et al., 2007; Woolley et al., 2010) as well as control trials where participants navigate to a visible platform. In addition, those tasks require participants to navigate/move within the environment using a keyboard or a joystick, which can introduce unwanted confounds that depend on gaming and computing experience (Murias et al., 2016; Richardson et al., 2011). This becomes a particular challenge if testing involves patients and older adults, who are often less experienced in using such devices (Charness & Boot, 2009; Diersch & Wolbers, 2019). Difficulties with the testing apparatus can inflate differences in navigation performance (Richardson et al., 2011; Waller, 2000). Moreover, the in-depth analysis of performance on those virtual navigation tasks, which is needed to estimate the precision of spatial representations (Kolarik et al., 2018; Kolarik et al., 2016), can often be quite complex (Cooke et al., 2019).

Spatial memory and perspective-taking tasks offer advantages for studying the precision of spatial representations over navigation tasks as they are easier to administer and require neither prolonged training nor specialized equipment. Typically, they involve an encoding stimulus portraying a place or an array of objects that participants have to memorize, followed by the presentation of a second stimulus presented from a different perspective with participants asked to judge whether it depicts the same place, or whether the objects have moved (Hartley et al., 2007; Hilton et al., 2020; Montefinese, Sulpizio, Galati, & Committeri, 2015; Muffato et al., 2019; Segen, Avraamides, Slattery, & Wiener, 2020a).

A popular spatial memory task that follows this paradigm is the Four Mountains task (Hartley et al., 2007), which involves viewing an image of a place defined by four mountains, followed by four new images. One of these images depicts the same place, but from a different perspective, while the other images display a slightly different arrangement of the mountains. Participants are asked to select, from the four images, the one that corresponds to the place they saw during encoding. The Four Mountains task was specifically designed to provide a test that is quick and easy to administer, tapping into viewpoint-independent spatial memory. What is more, the task has been successfully used to differentiate between healthy older adults and those with mild cognitive impairment (MCI) as well as between MCI, Alzheimer’s disease (AD), and frontotemporal dementia patients (Bird et al., 2010; Chan et al., 2016).

The Four Mountains task, however, does not systematically manipulate the amount of change of the spatial situation between encoding and test and is therefore not suited to assess the precision of spatial representations. Similarly, spatial memory tasks that focus on object location binding typically either move the object by a specific invariant amount (Montefinese et al., 2015) or swap two objects with each other (Hilton et al., 2020; Muffato et al., 2019; Segen et al., 2020a, 2020b). Again, such manipulations do not allow the assessment of the precision with which spatial locations are encoded.

Spatial memory precision has recently been associated with hippocampal functioning (Ekstrom & Yonelinas, 2020; Kolarik et al., 2018; Kolarik et al., 2016; Stevenson et al., 2018). For example, Stevenson et al. (2018) reported that increased high-frequency activity in the hippocampus was associated with the precision of spatial memory retrieval in a task using 2D stimuli. Moreover, Kolarik et al. (2016) and Kolarik et al. (2018) showed that hippocampal damage was associated with deficits in the ability to precisely remember the position of targets while coarse memory for the targets’ approximate locations was not affected. Importantly, the hippocampus and related regions undergo functional and anatomical changes in typical and atypical aging, which are often associated with declines in spatial memory (Hartley et al., 2007; Hilton et al., 2020; Montefinese et al., 2015; Muffato et al., 2019; Segen et al., 2020a). However, the nature of those deficits is not well understood as the findings reporting deficits are often mixed, specifically in healthy older adults and those with very early MCI (Colombo et al., 2017; Moodley et al., 2015; Segen et al., 2020b). Quantifying the precision of spatial memory may offer a more sensitive tool, compared with studies focusing on coarse spatial changes (Hartley et al., 2007; Hilton et al., 2020; Montefinese et al., 2014; Muffato et al., 2019; Segen et al., 2020a), to investigate spatial memory deficits in those groups. As a result, a quick and accurate tool that taps into the precision of spatial representations would provide a more nuanced understanding of the nature of spatial deficits across typically and atypically aged groups, and could in the future be extended to the early detection of MCI as well as to differential dementia diagnosis.

Here, we set out to develop a novel spatial memory task that aims to provide a quick and objective assessment of precision of spatial encoding, with minimal training requirements. To do so, we developed a two-alternative forced-choice (2AFC) task where participants had to judge the direction in which an object has moved in a 3D environment following a perspective shift. By systematically manipulating the distance by which the object was displaced, we estimated how accurately participants could detect the movement of objects in space following a perspective shift.

Experiment 1

Introduction

In this experiment, we introduce a novel task that was designed to provide a quick assessment of the precision of object location representations in healthy younger adults. We employed psychophysical methods using a 2AFC task in which participants had to judge the direction in which an object moved in an environment following a perspective shift. A 2AFC approach was chosen as it is better suited to rapidly and reliably assess precision of spatial memory than change detection tasks (Heywood-Everett et al., 2020). To investigate the precision of participants’ representations for object locations, we systematically manipulated the distance by which the object was displaced.

Method

Participants

In total, 44 participants aged between 20 and 48 years (M_age = 25.5, SD = 6.31) took part in the study (29 females, 15 males). The majority of the participants (40) were right-handed. Participants were recruited through Bournemouth University’s participant recruitment system and received monetary compensation for their time. Written informed consent was obtained in accordance with the Declaration of Helsinki.

Design

The experiment followed a within-subjects 2 (object direction: left/right) × 2 (camera direction: left/right) × 6 (object displacement distance [ODD]: 5, 8, 13, 22, 37, 61 cm) design.
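The factorial structure can be made concrete with a short Python sketch that enumerates the design cells. The variable names are illustrative and not taken from the authors' materials; the ratio check at the end is only a quick way to see that the displacement distances are spaced roughly geometrically.

```python
from itertools import product

DIRECTIONS = ("left", "right")
ODD_CM = (5, 8, 13, 22, 37, 61)  # object displacement distances listed in the text

# One cell per combination of object direction, camera direction, and ODD.
cells = [
    {"object_dir": obj, "camera_dir": cam, "odd_cm": odd}
    for obj, cam, odd in product(DIRECTIONS, DIRECTIONS, ODD_CM)
]

# Ratio between successive distances; near-constant ratios indicate
# roughly geometric (log-like) spacing of the displacement levels.
step_ratios = [b / a for a, b in zip(ODD_CM, ODD_CM[1:])]

print(len(cells))                          # number of design cells
print([round(r, 2) for r in step_ratios])  # successive ODD ratios
```

Each successive distance is about 1.6 to 1.7 times the previous one, so the levels are approximately evenly spaced on a logarithmic scale.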

Materials

Virtual environment

The virtual environment was designed using 3DS Max 2018 (Autodesk Inc.) and consisted of a square room (9.8 m × 9.8 m), on the walls of which there were posters depicting highly familiar and recognizable landmarks (Hamburger & Röser, 2014). A teal plank was placed diagonally in the middle of the room (14-m long), and a target object was placed centrally on that plank with its position varied within a range of 65 cm either to the left or right of the center. The target object could only move along the plank.

The experimental stimuli were renderings of the environment with a 47.7° horizontal field of view and a 15% shift in the vertical field of view to simulate human vision (see Fig. 1a). Creating an asymmetric viewing frustum that resembles natural vision has been found to improve distance estimates in virtual environments (Franz, 2005). The experiment was presented on an 80.9-cm screen (diagonal) with an aspect ratio of 16:9 and a resolution of 1,920 × 1,080 pixels. Participants were seated 80 cm from the monitor with their head positioned on a chin rest. The physical vertical field of view (FOV) of the screen at this distance was 28°, and the horizontal FOV was 47.7° and matched the horizontal FOV of the rendered stimuli.
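The reported physical field of view can be checked from the screen geometry. The following Python sketch assumes a flat screen viewed head-on from a point level with its centre (an idealization; the function name is ours) and derives the width and height from the diagonal and aspect ratio given above.

```python
import math

def screen_fov_deg(diagonal_cm, aspect_w, aspect_h, viewing_distance_cm):
    """Return (horizontal, vertical) visual angle in degrees for a flat
    screen viewed head-on from a point level with its centre."""
    diag_units = math.hypot(aspect_w, aspect_h)
    width_cm = diagonal_cm * aspect_w / diag_units
    height_cm = diagonal_cm * aspect_h / diag_units
    horizontal = 2 * math.degrees(math.atan(width_cm / 2 / viewing_distance_cm))
    vertical = 2 * math.degrees(math.atan(height_cm / 2 / viewing_distance_cm))
    return horizontal, vertical

# Values from the text: 80.9-cm diagonal, 16:9 aspect ratio, 80-cm viewing distance.
h_fov, v_fov = screen_fov_deg(80.9, 16, 9, 80.0)
print(round(h_fov, 1), round(v_fov, 1))
```

Under these assumptions the result lands within a fraction of a degree of the reported 47.7° and 28°; small differences would be expected from rounding of the screen size or viewing distance.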

The cameras were arranged around an invisible diagonal line that was perpendicular to the plank. In both encoding and test stimuli, participants would always see one corner of the room and two posters on either side of the corner (see Fig. 1a). There were two possible camera start and object start positions in encoding stimuli. The two possible camera start positions were 15° to the left (Position 1) or to the right (Position 2) of the diagonal line (see Fig. 1a). The target object was positioned on the plank, either 5 cm to the left or to the right of the center of the room. The camera always faced the center of the room.

The test stimuli were rendered from a different viewpoint with a 20° perspective shift. If the encoding stimulus was presented from Camera Position 1, the camera moved right, and if it was presented from Camera Position 2, the camera moved left (see Fig. 1a). At test, the target object moved by 5, 8, 13, 22, 37, or 61 cm from its start position, either to the left or to the right.

Stimuli were presented with OpenSesame 3.1.7 (Mathôt et al., 2012), and the left and right arrow keys on a standard computer keyboard were used to record responses.

Procedure

Each experimental trial started with a brief presentation of text instructing participants to remember the location of the target object (750 ms), followed by the presentation of a fixation cross and a scrambled stimulus mask (600 ms; see Fig. 1b). In the subsequent encoding phase, participants were presented with a rendering of one of the two target object start positions either from Camera Position 1 or Position 2 for 1.7 seconds. After the encoding phase, participants were again presented with a fixation cross and a scrambled stimulus mask for 600 ms. In the test phase, participants were presented with a rendering of the room following a 20° perspective shift. Their task was to decide whether the target object had moved to the left or to the right and to respond by pressing the corresponding key on a standard computer keyboard. In 50% of the trials, the target object moved left, and in the remaining 50% of the trials, the target object moved right.

The experiment consisted of 72 experimental trials presented in randomized order, with each object displacement distance repeated eight times. The task took around 10–15 minutes to complete and was administered as part of a larger study.

Results

Accuracy estimates were obtained using generalized linear mixed-effects (GLME) models fitted with the glmer function from the lme4 package (Bates, Kliegl, Vasishth, & Baayen, 2015) in R, with ODD (object displacement distance), camera direction, and object direction as fixed factors and random by-subject and by-stimulus intercepts. We also estimated corresponding p values using the lmerTest package (Kuznetsova, Brockhoff & Christensen, 2017). Both camera direction and object direction were effect coded, and ODD was scaled and log transformed and used as a continuous variable. Our results (see Table 1) showed that performance increased with an increase in the distance by which the target object was displaced between encoding and test. In addition, we found an interaction between camera direction and object direction, with lower performance in situations when the camera and the object moved in the same direction (e.g., the target object moves left, and the camera moves left). We also found a three-way interaction between camera direction, object direction, and ODD, in which the effect of camera and object direction was reduced with an increase in the ODD.
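To illustrate the predictor preparation described above, here is a minimal Python sketch (the models themselves were fitted in R). Two details are assumptions, since the text does not specify them: the effect codes are taken to be -1 and +1, and "scaled and log transformed" is read as a natural-log transform followed by standardization with the sample standard deviation.

```python
import math
from statistics import mean, stdev

# ASSUMPTION: the text says "effect coded" but does not give the code values.
EFFECT_CODE = {"left": -1.0, "right": 1.0}

def log_then_standardize(values):
    """Natural-log transform, then z-score (ASSUMED order of operations)."""
    logged = [math.log(v) for v in values]
    mu, sigma = mean(logged), stdev(logged)
    return [(x - mu) / sigma for x in logged]

odd_cm = [5, 8, 13, 22, 37, 61]
odd_z = log_then_standardize(odd_cm)

def interaction_code(camera_dir, object_dir):
    """Product of the two effect codes: +1 for same direction, -1 for opposite."""
    return EFFECT_CODE[camera_dir] * EFFECT_CODE[object_dir]

print([round(z, 2) for z in odd_z])
print(interaction_code("left", "left"), interaction_code("left", "right"))
```

With this coding, the Camera × Object Direction product separates trials in which the camera and object move in the same direction from trials in which they move in opposite directions, which is the contrast the interaction term captures.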

Table 1 Coefficients from accuracy GLME analysis

Full size table

Reversed congruency effect

To further investigate the Camera × Object Direction interaction, we split data into congruent and incongruent trials. In congruent trials, the camera and the object moved in the same direction, whereas in incongruent trials, the camera and the object moved in opposite directions. We then ran a GLME to investigate the effect of congruency and ODD on performance. The same random effect structure was used as in the previous analysis. The results (see Table 2) show that participants performed significantly worse in congruent trials than in incongruent trials, and we termed this bias the reversed congruency effect. We also found a two-way interaction, with the reversed congruency effect decreasing as the displacement distance increased (see Fig. 2). Specifically, our results show that in the congruent trials, participants consistently reported that the object moved in the opposite direction of the actual movement for small displacements (i.e., 5–22 cm). Only once the object was moved by 37 cm or more (i.e., 37 or 61 cm) did participants begin to perform above chance level in the congruent trials (see Fig. 2). A different pattern of results was found in incongruent trials, with participants responding correctly on more than 90% of the trials, regardless of the ODD.

Table 2 GLME analysis investigating the congruency

Full size table

Discussion

This experiment set out to establish a new task that allowed for a quick and easy assessment of the precision of spatial representations. Unexpectedly, we found that the combination of object and camera movement direction systematically biased participants' responses. Specifically, if the object and the camera moved in the same direction, participants perceived the movement of the object to be in the opposite direction. This was most pronounced at smaller displacement distances. If, however, the object and the camera moved in the opposite directions, participants reliably detected movement direction, even at the smallest displacement distances. We termed this the reversed congruency effect.

It is not obvious how spatial cognition theories, including those differentiating between egocentric and allocentric spatial representations (Burgess, 2006; Klatzky, 1998; Shelton & McNamara, 2001), could explain this reversed congruency effect. For example, if participants formed an allocentric representation of the environment (Burgess, 2008), they should reliably detect the direction of object movement regardless of whether the camera and the object moved in the same or opposite directions. This is because their representations of object locations are encoded relative to other features or landmarks in the environment and do not depend on the perspective from which the environment is viewed. Similarly, if participants encode the position of the object and other environmental cues in relation to their current position in space and engage in mental transformations to achieve spatial perspective taking (Holmes et al., 2018; King et al., 2002; Klencklen et al., 2012), we would expect them to adjust the expected positions of the objects in the environment based on their new position in the environment and perform the task without the systematic bias that we observed. Of course, neither the egocentric nor the allocentric strategy would guarantee that participants always responded correctly. Instead, performance would depend on the individual’s ability to generate precise spatial representations. Thus, we expected a linear increase in performance in both congruent and incongruent trials with increasing target object displacement, with the slope and intercept of the increase being determined by individual differences in precision.

If participants, as argued above, did not solve the task using a spatial strategy (i.e., egocentric or allocentric strategy), it is possible that they used a heuristic that may have given rise to the systematic bias we have observed. We considered a number of such heuristics for the reversed congruency effect (more information on those heuristics is available in the Supplementary Materials). First, given the relatively small extent of the camera movement between encoding and test, participants may have found it difficult to understand the perspective shift and, therefore, essentially ignored it. As a result, they would have remembered the position of the target object on the screen (i.e., in screen coordinates) and used this position to judge whether the object had moved to the left or right. The screen-based strategy would be akin to participants using an egocentric strategy that would ignore the perspective shift and use the absolute relationships between their body and the object to judge the direction in which the object has moved. This screen-based strategy, however, predicts correct responses for all trials, a pattern that we did not observe in the congruent trials. Second, participants could have encoded the position of the target object relative to other room-based cues, such as the room corner, but in the image rather than in the 3D space. During test, they may have compared this memorized relationship with that in the test image in order to decide whether the object moved left or right. This “corner-based” strategy does predict correct responses in all incongruent trials, thus predicting participants’ performance well in these trials. However, the corner-based strategy predicts incorrect responses for all congruent trials, which does not match the empirical data.
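The predictions of the two heuristics described above can be reproduced with a simplified top-down geometry. The Python sketch below is illustrative only: it works with horizontal visual angles on the floor plane, places the visible corner on the diagonal at half the room diagonal from the centre, and uses a hypothetical camera-to-centre distance of 4 m, which the text does not report. Function and variable names are ours.

```python
import math
from itertools import product

ROOM_SIDE_M = 9.8
CORNER = (0.0, ROOM_SIDE_M * math.sqrt(2) / 2)  # visible corner, on the diagonal (+y)
CAMERA_RADIUS_M = 4.0  # ASSUMPTION: camera distance from the room centre is not reported
ODD_M = (0.05, 0.08, 0.13, 0.22, 0.37, 0.61)

def screen_angle(point, camera_deg):
    """Signed horizontal angle (deg) of a floor point relative to the view axis.
    Positive means right of screen centre. The camera sits camera_deg to the
    right (negative: left) of the diagonal and faces the room centre (origin)."""
    phi = math.radians(camera_deg)
    cam_x, cam_y = CAMERA_RADIUS_M * math.sin(phi), -CAMERA_RADIUS_M * math.cos(phi)
    bearing_point = math.atan2(point[0] - cam_x, point[1] - cam_y)
    bearing_view = math.atan2(-cam_x, -cam_y)
    return math.degrees(bearing_point - bearing_view)

def predictions():
    """Predicted correctness of both heuristics for every trial type."""
    rows = []
    for cam_start, start_x, obj_sign, odd in product(
        (-15, 15), (-0.05, 0.05), (-1, 1), ODD_M
    ):
        # Position 1 (left of the diagonal) shifts right by 20 deg; Position 2 shifts left.
        cam_test = cam_start + 20 if cam_start < 0 else cam_start - 20
        cam_sign = 1 if cam_test > cam_start else -1
        enc_obj = (start_x, 0.0)                     # the plank runs along the x axis
        test_obj = (start_x + obj_sign * odd, 0.0)
        obj_shift = screen_angle(test_obj, cam_test) - screen_angle(enc_obj, cam_start)
        corner_shift = screen_angle(CORNER, cam_test) - screen_angle(CORNER, cam_start)
        rows.append({
            "congruent": cam_sign == obj_sign,
            # Screen-based: compare the object's on-screen position only.
            "screen_correct": math.copysign(1, obj_shift) == obj_sign,
            # Corner-based: compare the object's position relative to the corner.
            "corner_correct": math.copysign(1, obj_shift - corner_shift) == obj_sign,
        })
    return rows

rows = predictions()
for label, flag in (("congruent", True), ("incongruent", False)):
    subset = [r for r in rows if r["congruent"] == flag]
    print(label,
          sum(r["screen_correct"] for r in subset) / len(subset),
          sum(r["corner_correct"] for r in subset) / len(subset))
```

Under these assumptions the screen-based rule is correct on every trial, whereas the corner-based rule is correct on every incongruent trial and wrong on every congruent trial, because the corner's on-screen shift caused by the camera movement exceeds even the largest object displacement. A different camera distance would change the magnitudes and could change this pattern.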

We believe that the reversed congruency effect is primarily driven by the movement of the camera in the real world such that when the camera moves left, participants expect that the object would appear to move left as well. As a result, even if the object remained stationary, participants would experience “camera-induced object motion” to the right (as they expected that it would move to the left). This camera-induced object motion, together with actual object movement direction, would give rise to the reversed congruency effect. Specifically, if the object moves in the opposite direction to the camera (incongruent trials), the camera-induced object motion amplifies the actual object movement. In contrast, when the object moves in the same direction as the camera (congruent trials), the camera-induced object motion effect may be greater than the actual object movement. In such cases, participants would incorrectly perceive the direction of object movement. However, when the object movement is large enough, it will eliminate the induced motion effect caused by the camera movement, and participants may perceive the object movement in the correct direction. This interpretation is in line with our empirical data, as participants consistently misjudged the direction of movement for small object displacements in congruent trials with performance improving for larger displacements. In incongruent trials, on the other hand, participants responded correctly across all object displacement distances.
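The logic of this account can be caricatured with a toy model in which the perceived displacement is the actual displacement plus a fixed induced offset opposite to the camera movement. This Python sketch is not the authors' model: it is deterministic, and the offset of 30 cm is a hypothetical value chosen only because it lies between the 22-cm and 37-cm displacement levels.

```python
ODD_CM = (5, 8, 13, 22, 37, 61)
INDUCED_CM = 30.0  # HYPOTHETICAL magnitude of the camera-induced object motion

def perceived_direction(object_sign, camera_sign, odd_cm, induced_cm=INDUCED_CM):
    """Sign of the perceived displacement (-1 = left, +1 = right).
    The induced component points opposite to the camera movement."""
    perceived = object_sign * odd_cm - camera_sign * induced_cm
    return 1 if perceived > 0 else -1

def is_correct(object_sign, camera_sign, odd_cm):
    return perceived_direction(object_sign, camera_sign, odd_cm) == object_sign

for odd in ODD_CM:
    congruent = is_correct(+1, +1, odd)    # object and camera both move right
    incongruent = is_correct(+1, -1, odd)  # object right, camera left
    print(odd, "congruent:", congruent, "incongruent:", incongruent)
```

In this caricature, congruent trials are misjudged whenever the displacement is smaller than the induced offset and judged correctly once it is larger, while incongruent trials are always judged correctly. This mirrors the qualitative pattern described above; real responses are graded rather than all-or-none.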

To our knowledge, there have been no other reports that have described an “induced object motion effect” after a perspective shift in the spatial cognition literature. We did, however, find reports from studies with 2D stimuli that describe an induced object motion effect, called the induced Roelofs effect (IRE; Bridgeman et al., 1997). Specifically, when a dot and a surrounding rectangular frame move in opposite directions on the screen, participants perceive the movement of the dot as larger than when the dot and the frame move in the same direction (Abrams & Landgraf, 1990; see also, Bacon et al., 1982). The IRE has also been demonstrated using static stimuli showing that if the frame is shifted to the left, participants estimate the target object to be farther to the right (Bridgeman et al., 1997; Taghizadeh & Gail, 2014). Two explanations have been proposed for the IRE: (1) the frame biases the egocentric perceived midline in the direction of the frame shift, thus changing the location of the target relative to perceived midline (Bridgeman et al., 2000; Bridgeman et al., 1997); (2) the effect is induced by an allocentric influence with the relative relationship between the target and the frame directly affecting the perceived location of the target (de Grave et al., 2002; Taghizadeh & Gail, 2014). Importantly, both explanations suggest that the IRE stems from biased encoding as a result of the shift of the frame position on the screen. In our experiment, it is not clear what the frame would be as the stimuli were always presented full screen and thus did not move on the screen. Thus, the camera-induced object motion effect in our experiment is unlikely to be driven by the same mechanisms that underlie the IRE. Instead, we propose that the camera-induced object motion is the product of the camera movement in the “real world” (virtual environment) between encoding and test rather than of the movement of the object on the screen.

While we do not have a firm explanation for the camera-induced object motion effect we observed, we speculate that it is driven by difficulties in precisely encoding the position of the object in the environment and difficulties in understanding how the perspective shift between encoding and test affects the projected position of the object in the two-dimensional image. It is also possible that the camera-induced object motion effect experienced by participants arises from naïve theories that people hold about how the visual world works (for more in-depth discussion, see Bertamini, Latto, & Spooner, 2003). It is also worth noting that the encoding time was relatively short, which may have contributed to difficulties in precisely encoding the object position.

The primary aim of Experiment 1 was to introduce a new task to assess the precision with which participants can memorize the locations of objects in space. The reversed congruency effect described above, however, demonstrates that the perspective shift between encoding and test had a significant impact on participants’ judgments. Therefore, Experiment 2 was designed to facilitate our understanding of the reversed congruency effect. Specifically, we investigated whether the effect is driven by object movement on the screen or by camera movement in the virtual environment. Experiment 2 aimed to provide a conceptual replication of the reversed congruency effect, and also to investigate whether providing additional cues in the environment would eliminate or at least reduce the effect.