The brain mechanisms underlying memorization and recall during navigation in 2-D environments have been studied extensively in recent years. However, few studies have investigated human spatial memory in multifloored environments such as buildings (Buechner, Hoelscher, & Strube, 2007; Christou & Bülthoff, 1999; Hölscher, Meilinger, Vrachliotis, Brösamle, & Knauff, 2006; Montello & Pick, 1993; Richardson, Montello, & Hegarty, 1999; Wilson, Foreman, Stanton, & Duffy, 2004). Buechner et al. (2007), Hölscher et al. (2006), and Montello and Pick (1993) suggested that participants memorize multifloored environments as a collection of floors. For example, in Montello and Pick (1993), participants learned two separate routes, located one above the other in a multifloored building, each passing several landmark objects. After learning, participants discovered a stairway connecting the two routes. Participants pointed toward a recalled object more accurately and more rapidly when it was located on the same floor than when it was on a different floor. Yet this spatial knowledge had been acquired by observing one floor at a time. Therefore, the finding that the layout is memorized by floors might derive from the learning mode.

Previous publications have reported several learning-dependent effects in spatial memory for the recognition of planar scenes (e.g., Shelton & McNamara, 2001). Specifically, Shelton and McNamara (2004) suggested that route and survey perspectives and egocentric orientation influence the encoding and retrieval of a large virtual 2-D environment. In the present work, we manipulated the learning condition in a landmark object learning and recognition task in a virtual building, to investigate whether vertical relationships between landmark objects are harder to memorize because the environment was explored floor by floor during learning. We hypothesized that the ability to memorize a multifloored environment depends strongly on how participants learned the building: either by viewing travel along horizontal corridors one at a time or by viewing travel via simulated lifts moving between floors.

Method

Apparatus

We used a desktop virtual tour of a multifloored building (e.g., Ruddle, Payne, & Jones, 1997). The environment was displayed on a 30 × 48 cm monitor with a resolution of 1,600 × 1,200 pixels and a refresh rate of 60 Hz. The horizontal field of view was 51°. The landmarks consisted of nine virtual everyday objects (a fireplace, a bar, a writing desk, a bookcase, a boiler, a kitchen unit, a blackboard, a drawer, and a piano). Each object was placed in a different room of the virtual building, which had three adjacent rooms per story on each of three stories.

Participants

Fifty-six employees of EDF (28 men and 28 women) took part in this study at the EDF R&D Centre. Ages ranged from 23 to 57 years (M = 39.6 years for both men and women). All participants had normal or corrected-to-normal vision. The study was approved by the local Ethics Committee, and participants gave their informed consent before the experiment began.

Procedure

We randomly assigned 28 participants to the floor-learning group and the remaining 28 to the column-learning group. The experiment consisted of a learning stage followed by a testing stage.

Learning stage

During the learning stage, participants in the floor-learning group memorized the locations of objects during a passive tour of the environment, one corridor after another, whereas participants in the column-learning group memorized them during a tour of one column after another, as if they were inside a glass lift.

We told participants that there was one object per room, without informing them of the actual number or positions of the objects.

Both groups started from the same initial room and encountered the landmark objects in exactly the same sequence; to achieve this, the objects were placed in the building differently for each group (see Fig. 1a, b). Observation did not proceed strictly floor by floor in the floor-learning condition (e.g., after the ground floor had been observed, the camera had to ascend to the first floor), nor strictly column by column in the column-learning condition. For the sake of clarity, however, we refer to floor- and column-learning conditions.

Fig. 1

Description of the virtual environment and trials. a Pink ribbon shows the floor-learning route. b Pink ribbon shows the column-learning route, with the same sequence of objects as in panel a. In panels a and b, cameras along the ribbons indicate the orientation of the viewpoint in each room; white arrows show the movement in familiar segments; black arrows mark novel segments; chequered arrows correspond to the misaligned trials. There are six familiar and four novel segments in both cases. c In familiar segments, the correct object is the one that followed the starting object along the observation route. Here, in this floor-learning case, the “fireplace” was followed by the “bar.” Therefore, the familiar trial consists of viewing the room with the “fireplace,” then seeing the camera move to the adjacent room on the same floor and (correctly) choosing the “bar” among four objects. d The correct object for this novel trial is the “bookcase.” The “bar” serves as the probe distractor. e and f The column-learning case. g The subjective view at the starting position of a trial where the “fireplace” is the starting object. h The subjective view during the subsequent choice.

Participants viewed the building twice for 45 s at a constant speed of 2 m/s. The walls of the building were designed so that participants could see only one object at a time, for about 3 s. A text message appeared at the beginning of each tour to indicate that the participant had returned to the starting point of the tour.

Our aim was to study spatial memory in buildings under ecologically valid conditions. Vidal, Amorim, and Berthoz (2004) found that humans can build a mental representation of a 3-D environment, although this representation is probably oriented with respect to the vertical axis of the memorized structure. Following their findings, we restricted camera movements to forward horizontal translations and upright vertical translations only. As a consequence, in the floor-learning condition, when the camera moved up from the ground floor to the first floor, it had to be rotated by 180° to avoid the unnatural experience of visiting the first floor by “walking backward.” The same rotation was applied during the ascent from the first to the second floor. No such rotations occurred in the column-learning condition. We call the camera movements occurring at the end of a floor (or of a column, respectively) misaligned movements.

Testing stage

During the testing stage, participants completed a series of 24 recognition trials. A fixation cross appeared for about 1 s before each trial. The camera was then placed directly in one of the rooms viewed during learning. The landmark object that characterized this room during the learning stage was displayed so that participants could locate themselves in the environment (i.e., determine which room of the building they were in). We call this object the starting object (see Fig. 1c–g). After about 2 s, the camera moved for about 4 s from this room to an adjacent empty room, either within a floor (floor trial) or within a column (column trial). We call such a movement from one room to an adjacent one a segment. Depending on the learning condition, a trial could either replicate a segment of the learning route (familiar segment) or not (novel segment, or shortcut) (see Fig. 1a, c, d and b, e, f). Thus, for floor learners, a floor trial corresponded to a familiar segment and a column trial to a novel segment; conversely, for column learners, a column trial corresponded to a familiar segment and a floor trial to a novel one. After 500 ms, four objects appeared simultaneously (see Fig. 1h). By pressing the corresponding key, participants had to select the object that had been located in that room during learning (the arrival object), which was displayed together with three distractor objects.
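The structure of one recognition trial can be summarized in a short sketch. It assumes the approximate phase durations given above (about 1 s, 2 s, 4 s, and 500 ms) and encodes the rule that a trial is familiar when the test movement runs along the axis traversed during learning. The names `Phase`, `choice_onset`, and `segment_type` are hypothetical and do not come from the experiment software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    duration_s: float  # approximate duration, as reported ("about")


# Approximate phases of one recognition trial, taken from the text.
TRIAL_PHASES = (
    Phase("fixation cross", 1.0),
    Phase("starting room with starting object", 2.0),
    Phase("camera movement to adjacent empty room", 4.0),
    Phase("pause before the four objects appear", 0.5),
)


def choice_onset(phases=TRIAL_PHASES) -> float:
    """Approximate time (s) from fixation onset to the four-object display."""
    return sum(p.duration_s for p in phases)


def segment_type(learning: str, movement: str) -> str:
    """'familiar' if the test movement follows the learned axis, else 'novel'.

    learning, movement: 'floor' or 'column'. Misaligned trials are treated
    separately in the text and are not covered by this mapping.
    """
    valid = {"floor", "column"}
    if learning not in valid or movement not in valid:
        raise ValueError("learning and movement must be 'floor' or 'column'")
    return "familiar" if learning == movement else "novel"


print(choice_onset())
print(segment_type("floor", "column"))
```

With these values the four objects appear about 7.5 s after the fixation cross, and a column movement tested on a floor learner is classified as a novel segment.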

Participants completed a first block of 12 trials and then a second block of 12 trials, presented in a pseudorandom order relative to the first block. In each block, 6 trials tested familiar segments and 4 tested novel ones. The remaining 2 trials involved misaligned movements, which were familiar but perpendicular to the main direction of the learning path. Because of the particular status of these trials, and because of the possible effect of the 180° rotation experienced only by floor learners, we excluded the misaligned trials from the analysis. Trials were selected to be homologous across groups: for example, “fireplace as the starting object plus a within-floor movement toward the bar as the arrival object,” a familiar trial for the floor-learning group, was treated as equivalent to “fireplace plus a within-column movement toward the bar,” which is also familiar for the column-learning group thanks to the symmetrical placement of objects along the two learning routes.
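The counts of six familiar, four novel, and two misaligned segments follow from the geometry of a 3 × 3 grid of rooms. The sketch below assumes a serpentine route, which is what the 180° rotation at each floor change implies for floor learners, and an analogous column-by-column route for column learners. The `(floor, column)` room coordinates and all function names are hypothetical illustrations, not the layout files used in the study.

```python
from itertools import product

FLOORS, COLUMNS = 3, 3  # three stories with three adjacent rooms each


def serpentine_route(by_floor: bool) -> list[tuple[int, int]]:
    """Rooms as (floor, column) in visiting order (hypothetical serpentine layout).

    by_floor=True: tour each floor in turn, reversing direction on every other floor.
    by_floor=False: tour each column in turn, reversing direction on every other column.
    """
    outer, inner = (FLOORS, COLUMNS) if by_floor else (COLUMNS, FLOORS)
    route = []
    for i in range(outer):
        steps = range(inner) if i % 2 == 0 else reversed(range(inner))
        for j in steps:
            route.append((i, j) if by_floor else (j, i))
    return route


def adjacent_pairs():
    """Every pair of neighbouring rooms, on the same floor or in the same column."""
    for f, c in product(range(FLOORS), range(COLUMNS)):
        if c + 1 < COLUMNS:
            yield (f, c), (f, c + 1)  # horizontal neighbours
        if f + 1 < FLOORS:
            yield (f, c), (f + 1, c)  # vertical neighbours


def classify_segments(by_floor: bool) -> dict[str, int]:
    """Count familiar, misaligned, and novel segments for one learning route."""
    route = serpentine_route(by_floor)
    learned = {frozenset(step) for step in zip(route, route[1:])}
    counts = {"familiar": 0, "misaligned": 0, "novel": 0}
    for a, b in adjacent_pairs():
        same_floor = a[0] == b[0]
        along_main_axis = same_floor if by_floor else not same_floor
        if frozenset((a, b)) in learned:
            counts["familiar" if along_main_axis else "misaligned"] += 1
        else:
            counts["novel"] += 1
    return counts


for by_floor in (True, False):
    name = "floor learning" if by_floor else "column learning"
    print(name, classify_segments(by_floor))
```

Both routes yield 6 familiar, 2 misaligned, and 4 novel segments out of the 12 neighbouring room pairs, and every novel segment runs perpendicular to the learning direction.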

We encouraged participants to answer as accurately and as quickly as possible. Errors and reaction times (RTs) were recorded.

On each recognition trial, one of the three distractors was the object that would have been the correct answer had the movement been perpendicular to the one actually performed (see Fig. 1d–f). We call it the probe distractor. Thus, on trials testing novel segments, the probe distractor was the object viewed during learning immediately after the starting object. If responses were driven by the sequence of objects encountered along the learning route rather than by their locations, the probe distractor should be chosen more often than the other two distractors on novel-segment trials.

Preparation

A pretest inspired by Huttenlocher and Presson (1979) assessed whether any participant had a long-term spatial memory impairment. In addition, before the experiment, participants were trained on the task in a simpler environment (two floors with four objects, different from those used later). The learning mode used during training matched that of the participant's group. Participants received feedback on their performance, and any trial answered incorrectly was repeated.

Hypothesis

We reasoned that if human spatial memory of multifloored environments is preferentially used floor by floor, regardless of the learning mode, performance should be better for floor trials than for column trials. If the use of spatial memory depends on the learning condition, we should observe an interaction between the learning and recognition conditions. In our design, this interaction is equivalent to comparing performance on movements that replicate learning (i.e., familiar segments) with performance on shortcuts (i.e., novel segments). Finally, if learning spatial relations is easier for floor learners, they should outperform column learners, independently of the recognition condition.
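The three predictions can be written as contrasts over the four cell means. In this notation, which is ours rather than the original analysis, μ_LR is the mean error for learning condition L and recognition condition R, with F for floor and C for column; cells FF and CC are familiar, and cells FC and CF are novel.

```latex
\begin{aligned}
\bar{\mu}_{\text{familiar}} &= \tfrac{1}{2}\left(\mu_{FF} + \mu_{CC}\right), \qquad
\bar{\mu}_{\text{novel}} = \tfrac{1}{2}\left(\mu_{FC} + \mu_{CF}\right) \\
\psi_{\text{recognition}} &= \left(\mu_{FF} + \mu_{CF}\right) - \left(\mu_{FC} + \mu_{CC}\right) \\
\psi_{\text{learning}} &= \left(\mu_{FF} + \mu_{FC}\right) - \left(\mu_{CF} + \mu_{CC}\right) \\
\psi_{\text{interaction}} &= \left(\mu_{FF} - \mu_{FC}\right) - \left(\mu_{CF} - \mu_{CC}\right) \\
&= \left(\mu_{FF} + \mu_{CC}\right) - \left(\mu_{FC} + \mu_{CF}\right)
 = 2\left(\bar{\mu}_{\text{familiar}} - \bar{\mu}_{\text{novel}}\right)
\end{aligned}
```

The last line shows why, in this design, testing the interaction amounts to comparing familiar with novel segments.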

Results

All participants completed the whole session, which lasted 30 min on average, without unusual fatigue or nausea. The overall mean error rate was 36% (SEM = 3.8%), well below the chance level of 75% error (one correct answer among four alternatives), which indicates fairly accurate performance. One-tailed Student's t tests showed that performance in each condition was better than chance.

Errors and RTs were analyzed using a two-way ANOVA with learning condition (floor vs. column learning) and recognition condition (floor vs. column segments) as factors (see Table 1 and Fig. 2a). The design was treated as fully independent, with the four cells (two learning groups × two types of segments) considered as separate groups.
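A minimal sketch of this fully independent 2 × 2 analysis, assuming a balanced design with 28 scores per cell, is given here. The scores are randomly generated placeholders, not the study's data; the sketch only shows how the sums of squares are partitioned and why the error term has 4 × 28 − 4 = 108 degrees of freedom, matching the reported F(1, 108) values. The function name `two_way_anova` is hypothetical.

```python
import random
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA treating every cell as an independent group.

    cells: dict mapping (learning, recognition) -> list of scores.
    Returns {effect: (SS, df, F)} for both main effects and the interaction,
    plus "error": (SS, df).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes a complete, balanced design")
    scores = [x for v in cells.values() for x in v]
    grand = fmean(scores)
    cell_m = {k: fmean(v) for k, v in cells.items()}
    # In a balanced design, marginal means are means of the cell means.
    a_m = {a: fmean([cell_m[(a, b)] for b in b_levels]) for a in a_levels}
    b_m = {b: fmean([cell_m[(a, b)] for a in a_levels]) for b in b_levels}
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_m.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_m.values())
    ss_ab = n * sum((cell_m[(a, b)] - a_m[a] - b_m[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_m[k]) ** 2 for k, v in cells.items() for x in v)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab, df_err = df_a * df_b, len(scores) - len(cells)
    ms_err = ss_err / df_err
    return {
        "learning": (ss_a, df_a, ss_a / df_a / ms_err),
        "recognition": (ss_b, df_b, ss_b / df_b / ms_err),
        "interaction": (ss_ab, df_ab, ss_ab / df_ab / ms_err),
        "error": (ss_err, df_err),
    }


rng = random.Random(0)
# Placeholder error rates, 28 per cell; NOT the study's data.
cells = {
    (learning, recognition): [rng.uniform(0.0, 0.8) for _ in range(28)]
    for learning in ("floor", "column")
    for recognition in ("floor", "column")
}
result = two_way_anova(cells)
print(result["error"][1])  # 108 = 4 cells x 28 scores - 4 cell means
```

Each F value is the effect's mean square divided by the error mean square, so all three tests share the same 108 error degrees of freedom.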

Table 1 Percentages of errors for the different conditions (mean, SEM)
Fig. 2

Results for floor versus column recognitions in the floor- and column-learning groups. a Mean percentages of errors. b Reaction times. Floor recognitions for floor learners and column recognitions for column learners correspond to familiar segments (novel segments are defined symmetrically). Error bars represent ± SEM.

First, the ANOVA showed no significant effect of the recognition condition on errors, F(1, 108) < 0.001, p = .999, as confirmed by the Kruskal–Wallis test, χ² = 0.055, p = .815. Floor and column recognitions were thus of comparable difficulty, which led us to reject the hypothesis that spatial memory is preferentially used by floors.
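Each Kruskal–Wallis check reported here compares two levels of one factor, so the H statistic is referred to a χ² distribution with one degree of freedom, whose upper tail is erfc(√(H/2)). The sketch below computes H with average ranks and the usual tie correction; it is an illustration under these assumptions, not the software used for the reported analyses, and `kruskal_wallis` and `average_ranks` are hypothetical names.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """H statistic with tie correction; p-value only for two groups (df = 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h / (1 - ties / (n ** 3 - n)), 0.0)  # guard against tiny negative rounding
    df = len(groups) - 1
    # Upper tail of chi-square with 1 df: P(Z^2 > h) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2)) if df == 1 else None
    return h, df, p


print(kruskal_wallis([1, 2, 3], [4, 5, 6]))
```

Applied to the value reported for the learning effect, χ² = 7.84 with one degree of freedom, the same erfc expression gives p ≈ .005.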

We then examined whether performance depended on the learning route independently of the recognition condition. The ANOVA showed a significant effect of the learning condition, F(1, 108) = 8.3, p = .004, as confirmed by the Kruskal–Wallis test, χ² = 7.84, p = .005. Overall, floor learners were more accurate than column learners.

Third, we compared familiar and novel segments by testing the interaction between the two factors of the ANOVA. The interaction was significant, F(1, 108) = 13.0, p < .001, as confirmed by the Kruskal–Wallis test, χ² = 9.9, p = .001. In both learning conditions, performance was reliably better on trials with familiar segments than on trials with novel segments; separate t tests comparing familiar and novel recognitions within each learning condition confirmed this result (all ps < .005 in the two t tests). Taken together, these results show that performance depended on the learning route, in favor of familiar segments.

On shortcut trials, both groups responded reliably better than chance, showing that participants could exploit the spatial knowledge they had learned. On these trials, the number of times the probe distractor is chosen by chance among the three distractors follows a binomial distribution B(n, p), where p = 1/3 and n is the number of errors committed (n = 84 for floor learners and n = 116 for column learners). This distribution is approximated by the normal distribution N(np, np(1 − p)), with mean np and variance np(1 − p). Floor learners selected the probe distractor 23 times and column learners 36 times. Both counts lie within two standard deviations of the expected values (28 ± 8.6 for floor learners and 38.7 ± 10.2 for column learners), so both groups selected the probe distractor no more often than expected by chance. All participants were therefore able to carry out the spatial memory task, and not merely a sequential memory task; in the latter case, the probe distractor would have attracted more of their errors.

We also studied the RTs (see Table 2 and Fig. 2b).
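The two-standard-deviation check on probe-distractor choices can be reproduced in a few lines. This sketch applies the normal approximation to B(n, 1/3) described above to the counts reported in the text; the function name `probe_check` and its `k_sd` parameter are our own illustrative choices.

```python
import math


def probe_check(n_errors, probe_choices, p=1 / 3, k_sd=2.0):
    """Normal approximation to B(n, p): is the observed count within mean +/- k_sd * SD?"""
    mean = n_errors * p
    sd = math.sqrt(n_errors * p * (1 - p))
    low, high = mean - k_sd * sd, mean + k_sd * sd
    return mean, sd, (low, high), low <= probe_choices <= high


for group, n, k in (("floor learners", 84, 23), ("column learners", 116, 36)):
    mean, sd, (low, high), within = probe_check(n, k)
    print(f"{group}: expected {mean:.1f}, interval [{low:.1f}, {high:.1f}], "
          f"observed {k}, within: {within}")
```

For floor learners the interval is about [19.4, 36.6] around an expected 28.0; for column learners it is about [28.5, 48.8] around 38.7. Both observed counts fall inside their intervals.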

Table 2 Reaction times for the different conditions (in milliseconds) (mean, SEM)

First, the ANOVA showed no significant difference between floor and column recognitions, F(1, 108) = 0.138, p = .71, consistent with the error analysis. Second, unlike in the error analysis, there was no significant difference between the floor- and column-learning groups in the time needed for recognition, F(1, 108) = 1.18, p = .279. Third, the interaction contrasting familiar and novel recognitions was significant, F(1, 108) = 6.42, p = .012, consistent with the error analysis. The Kruskal–Wallis test confirmed these three results (χ² < 0.001, p = .986; χ² = 1.98, p = .159; and χ² = 5.28, p = .021, respectively).

Discussion

Existing studies on spatial memory in multifloored buildings suggest that people preferentially memorize them floor by floor. In contrast, in our learning and recognition experiment, we detected no advantage for floor recognition per se when the responses of all participants were pooled. When learning conditions are carefully controlled, however, performance can be analyzed more finely by type of learning. We thus found that horizontal (floor) learning yields clearly better accuracy, and that the learning route, by floors as well as by columns, favors familiar trials over novel ones in both accuracy and RT.

The performance of floor learners is consistent with the literature, in that they made better use of their memory when tested by floors. Column learners, however, showed the opposite pattern, performing better when tested by columns. This suggests that previously reported floor advantages could stem from an advantage for familiar segments over novel ones.

The 850-ms additional time required for shortcuts (novel vs. familiar trials) is in the range of previously reported values: Wolbers and Büchel (2005), whose participants were also moved passively through a virtual environment, observed an additional 500 ms for close shortcuts and 1,600 ms for remote shortcuts, as compared with direct recalls.

The reduced field of view and the absence of vestibular and motor cues in our experiment could have lowered participants’ performance (see Richardson et al., 1999). Sensed gravity was always consistent with the simulated motion, and participants in neither learning condition experienced transient linear accelerations. The main visuo-vestibular discrepancy occurred during the rotations, which could have been detrimental to floor learners only. Because floor learners nevertheless outperformed column learners, this difference might be even larger under real conditions.

Participants were passively shown a movie of the virtual building, unlike in Experiments 1 and 3 of Christou and Bülthoff (1999), where participants actively explored a single floor. Experiment 2 of these authors showed that, for learning viewpoints, performance in a passive situation is similar to that in the active case; our study extends this to the case of a multifloored building.

Admittedly, other factors, especially the structure of the environment, may influence spatial knowledge. In particular, Buechner et al. (2007), who studied the effect of learning on spatial memory in a real building, observed better memorization by floor than by column, which the authors attributed directly to the extended structure of the environment. In contrast, our study sought to reduce the influence of the environment, by using a regular grid of homogeneous rooms for both routes, in order to highlight the effect of learning. In addition, Shelton and McNamara (2001) studied how a scene is learned and remembered with respect to an intrinsic spatial reference frame, which is clearly provided by the walls of a rectangular room but not by those of a circular one. Their approach could be extended by adapting our protocol to study the effect of the environment on spatial learning in a parallelepiped versus a cylindrical multifloored building.

Our results emphasize that spatial memory of multifloored environments is structured according to the learning route. Moreover, participants were able to perform the recognition task on shortcut trials, which requires survey knowledge. Taken together, and in line with the literature on 2-D navigation, our findings support the idea that, in multifloored environments, spatial cognition results from an interplay of survey and route knowledge.