A memory palace is a spatial mnemonic technique in which information is associated with different aspects of an imagined environment, such as people, objects, or rooms, to assist in its recall (Yates 1992; Harman 2001). The goal of our user study was to examine whether a virtual memory palace, experienced immersively in a head-tracked stereoscopic HMD, can assist in recall better than a mouse-based interaction on a traditional, non-immersive, monoscopic desktop display. Previous work has examined the role of spatial organization, immersion, and interaction in assisting recall.
This study is different from the previous work in several ways. First, we are focusing on spatial memory using a 3D model of a virtual memory palace, rather than relying on other forms of memory (such as temporal/episodic). Second, both the training and testing (recall) phases take place within the same virtual memory palace. Third, participants used both the desktop and HMD displays, which allows us to compare each participant’s recall across displays. Lastly, the content used in previous studies was either abstract, verbal, textual, visually simplistic, low in diversity, or time based, whereas our study uses faces, with unique and diverse characteristics.
Participants
Our user study for this research was carried out under IRB ID 751321-1, approved on August 7, 2015, by the University of Maryland College Park IRB. In this study, we recruited 40 participants, 30 male and 10 female, from our campus and surrounding community. Each participant had normal or corrected-to-normal vision (self-reported). The study session for each participant lasted around 45 min.
Materials
For this study, we used a traditional desktop with a 30-inch (76.2 cm) diagonal monitor and an Oculus DK2 HMD. The rendering for the desktop was configured to match that of the Oculus with a resolution of \(1920 \times 1080\) pixels (across the two eyes) with a rendering field of view (FOV) of \(100^{\circ }\). In order to give the desktop display the same field of view as the HMD, the participants were positioned with their heads 10 inches (25.4 cm) away from the monitor. The software used to render the 3D environments on both the desktop and HMD was identical and was designed in-house using C++ and OpenGL-accelerated rendering. The rendering was designed to replicate a realistic-looking environment as closely as possible, incorporating realistic lighting, shadows, and textures. The models (the medieval town and palace) were purchased through the 3D modeling distribution Web site TurboSquid (3DMarko 2011, 2014).
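The relation between viewing distance and field of view is simple geometry: a flat screen of width w viewed head-on from distance d subtends a horizontal angle of 2·atan(w / 2d). The Python sketch below works through that arithmetic. The 16:9 aspect ratio is an assumption (the text gives only the diagonal), and the function names are illustrative rather than part of the study software.

```python
import math

def screen_width(diagonal_in, aspect_w=16, aspect_h=9):
    """Width of a flat screen from its diagonal; the aspect ratio is an assumption."""
    return diagonal_in * aspect_w / math.hypot(aspect_w, aspect_h)

def horizontal_fov_deg(width_in, distance_in):
    """Horizontal angle subtended by a screen viewed head-on from its center line."""
    return math.degrees(2 * math.atan((width_in / 2) / distance_in))

def distance_for_fov(width_in, fov_deg):
    """Viewing distance at which the screen subtends the requested horizontal angle."""
    return (width_in / 2) / math.tan(math.radians(fov_deg) / 2)

width = screen_width(30)                      # 30-inch diagonal, assumed 16:9
fov_at_10in = horizontal_fov_deg(width, 10)   # FOV at the reported 10-inch distance
dist_for_100 = distance_for_fov(width, 100)   # distance that would give exactly 100 degrees

print(f"screen width         = {width:.1f} in")
print(f"FOV at 10 in         = {fov_at_10in:.1f} deg")
print(f"distance for 100 deg = {dist_for_100:.1f} in")
```

Under the 16:9 assumption, a 10-inch distance gives a horizontal FOV of roughly 105°, and a 100° FOV corresponds to roughly 11 inches, so the reported distance agrees with the stated FOV to within about an inch. A different aspect ratio would shift these numbers slightly.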
Design
The participants were shown two scenes, on two display conditions (head-tracked HMD and a mouse-based interaction desktop), and two sets of faces (within-subject design), all treated as independent variables, with the measured accuracy of recall as the dependent variable. The two scenes (virtual memory palaces) consist of pre-constructed palace and medieval town environments filled with faces. We chose faces as the objects to be memorized because previous work (Harris 1980; McCabe 2015) has shown that memory palaces are effective in helping users recall face-name pairs. We carefully partitioned the faces into two sets of roughly equal familiarity, which we quantified using Google Trends data over the four months preceding the study. The faces are shown in “Appendix” (at the very end of the paper) in Figs. 11 and 12, and the Google Trends statistics are presented in Tables 1 and 2. There was no statistically significant difference between the two sets of Google Trends data: \(p = 0.45 > 0.05\).
The faces in the palace and medieval town were hand positioned for each environment, before the start of the study, and remained consistent throughout the study. We distributed the faces at varying distances from the users’ location (see Fig. 2) so that they surrounded and faced the user. Since we used perspective projection, the sizes of the faces varied. However, the distribution of the angular resolution of the faces across the two sets/environments was not statistically different, with \(p = 0.44 > 0.05\) (see Table 3 in “Appendix”).
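The two balance checks above (familiarity and angular resolution) each compare one numeric measure across the two face sets. The text does not name the statistical test that produced the reported p-values, so the sketch below substitutes a two-sample Monte Carlo permutation test purely as an illustration of how such a check can be run. The function name, the resample count, and the scores are all hypothetical; the scores are placeholders and are not the values from Tables 1 to 3.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Placeholder scores only; NOT the study's Google Trends or angular-size data.
set_1 = [12, 45, 7, 30, 22, 18, 51]
set_2 = [15, 40, 9, 28, 25, 14, 47]

p = permutation_p_value(set_1, set_2)
print(f"p = {p:.3f}")
```

A p-value above 0.05 from such a test would be read the same way as in the text: no evidence that the two sets differ on the measured quantity.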
Users were allowed to freely rotate their view but not translate. This effectively simulated a stereoscopic spherical panoramic image with the participant at its center. Our motivation behind this study design decision was that if even this limited level of immersion could show an improvement in recall, it could lead to a better-informed exploration of how greater levels of immersion relate to varying levels of recall.
Procedure
First, each participant familiarized themselves with all 42 faces and their names used in the study. The participants received a randomly permuted collection of printouts, each containing a face-name pair used in the study. Participants were given as much time as needed until they stated that they were comfortable with the faces. In general, participants did not spend more than 5 min on this familiarization.
Next, each participant was told about the training and testing procedure, including how many faces were going to be in each scene (21), how much time they had to view the faces (5 min), how the breaks would work, that the faces would be replaced with numbers in the recall phase, and that they were to give a name and confidence for their recalled faces for each numbered position. In almost every case, we recorded the answer as the name explicitly recalled by the participant. However, in rare, exceptional circumstances, when the participants gave an extremely detailed and unambiguous description of the face (“fat, wore a wig, was King of France, and is not Napoleon” for King Louis), we marked it correct. Next, each participant was placed either in front of a desktop monitor with a mouse or inside a head-tracked stereoscopic HMD. They were given as much time as they desired to get comfortable, looking around the scene without numbers or faces. The users rotated the scene on a desktop monitor with a mouse, and in the HMD setup they rotated their head and body, but no further navigation was possible.
Once each participant was comfortable with the setup and the controls, a set of 21 faces was added to the 3D scene and distributed around the entire space as shown in Fig. 2. We used two such scenes, a palace and a medieval town, shown in Fig. 3. The faces were divided into two consistent sets used for the whole study; if a face appeared in one set (or scene) for a given participant, it would not be shown again in the second set or scene.
To cover all possible treatments of the \(2 \times 2 \times 2\) Latin-square design, each participant was tested in both scenes, both display conditions (HMD and desktop), and both sets of faces, with their relative ordering counterbalanced across participants. The 21 faces within the scene were presented to the participants all at once, and the participants were able to view and memorize the faces in any order of their choosing. The faces were deterministically placed in the same order for all participants. However, since the participants were free to look in any direction, the order of presentation of faces was self-determined. Each participant was given 5 min to memorize the faces and their locations within the scene. After the 5-min period, the display went blank and each participant was given a 2-min break in which they were asked a series of questions. Questions we asked included how each participant learned about the study, what their profession/major was, and what their general hobbies or interests were. In the second half of the study, during the break for the alternative display, we asked how often a participant used a computer, what their previous experience was with VR, and their general impressions of VR. We consistently asked these questions of each participant, but did not record the responses.
The reasons for these study design decisions are rooted in foundational research in psychology on memory. From the seminal work by Miller (1956), we learn that working memory (Baddeley and Hitch 1974) can only retain \(7 \pm 2\) items. According to Atkinson and Shiffrin (1968), information in short-term memory decays and is lost within a period of 15–30 s. Because 21 faces exceed this capacity and the 2-min break exceeds this decay period, we feel confident that having participants recall 21 faces after a 2-min break will engage their long-term memory.
After the 2-min break, the scene would reappear on the display with numbers having replaced the faces, as shown in Fig. 4. Each participant was then asked to recall, in any order, which face had been at each numbered location. During this recall phase, each participant could look around and explore the scene just as they did in the training phase, using the mouse on the desktop or rotating their head-tracked HMD. Each participant had up to 5 min to recall the names of all the faces in the scene. Once the participant was confident in all their answers, or the 5-min period had passed, the testing phase ended. After a break, each participant was placed in the display condition that they had not yet been tested with. The process was then repeated with a different scene and a different set of 21 faces to avoid information overlap from the previous test.
For each numbered location in the scene, the participants verbally recalled the name of the face at that location, as well as a confidence rating for their answer, ranging from 1 to 10, with 10 being certain. If a participant had no answer for a location, it was given a score of 0. The results were hand-recorded by the study administrator, keeping track of the number, name, user confidence, and any changes in a previously given answer.
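As an illustration of the record-keeping described above, the sketch below stores one response per numbered location and computes the recall accuracy for a scene. The `Response` class, the helper names, and the case-insensitive name comparison are assumptions made for this example; in the study the results were recorded by hand, and the names in the usage data are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Response:
    location: int          # numbered position in the scene
    name: Optional[str]    # recalled name, or None when no answer was given
    confidence: int = 0    # 1-10 for an answer, 0 when there is no answer

def make_response(location, name=None, confidence=0):
    """Build a response, scoring a missing answer as 0 and validating the 1-10 range."""
    if name is None:
        return Response(location, None, 0)
    if not 1 <= confidence <= 10:
        raise ValueError("confidence must be between 1 and 10")
    return Response(location, name, confidence)

def recall_accuracy(responses, answer_key):
    """Fraction of locations in answer_key whose final recalled name matches."""
    # A later response for the same location overwrites an earlier one,
    # which models a participant changing a previously given answer.
    final = {r.location: r for r in responses}
    correct = 0
    for location, true_name in answer_key.items():
        r = final.get(location)
        if r is not None and r.name is not None:
            if r.name.strip().lower() == true_name.strip().lower():
                correct += 1
    return correct / len(answer_key)

# Placeholder data for a four-location scene (the study used 21 locations).
answer_key = {1: "Face A", 2: "Face B", 3: "Face C", 4: "Face D"}
responses = [
    make_response(1, "Face A", 9),   # correct
    make_response(2, "face b", 6),   # correct, differs only in case
    make_response(3, "Face D", 3),   # incorrect
    make_response(4),                # no answer, confidence 0
]

accuracy = recall_accuracy(responses, answer_key)
print(f"accuracy = {accuracy:.2f}")
```

Descriptions accepted in place of a name, as in the King Louis case, would still need a manual judgment by the administrator; this sketch does not attempt to automate that step.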
To mitigate any learning behavior from the first trial to the second, we employed a within-subject trial structure, using a 2 (HMD-condition to desktop-condition vs desktop-condition to HMD-condition) × 2 (Scene 1 vs Scene 2) × 2 (Face Set 1 vs Face Set 2) Latin-square design. By alternating between the displays shown first (2), the scenes (2), and the faces (2), we expect to mitigate any confounding effects. At the end, each participant was tested on the two display conditions, desktop and HMD, on two different scenes, and with two different sets of 21 faces. We note that participants could have used personal mnemonics to help remember the locations and ordering of faces. However, since we evaluated recall for each participant over a desktop and an HMD, their performance should be counterbalanced between the two display conditions.
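The counterbalancing scheme can be sketched by enumerating the 2 × 2 × 2 = 8 possible orderings and spreading participants evenly across them. The round-robin assignment below is an assumption, since the text does not say how participants were allotted to orderings, and the variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

DISPLAY_ORDERS = [("HMD", "desktop"), ("desktop", "HMD")]
SCENE_ORDERS = [("palace", "medieval town"), ("medieval town", "palace")]
FACE_SET_ORDERS = [(1, 2), (2, 1)]

def build_orderings():
    """All 8 orderings; each is a list of two trials (display, scene, face set)."""
    orderings = []
    for displays, scenes, face_sets in product(DISPLAY_ORDERS, SCENE_ORDERS, FACE_SET_ORDERS):
        # Pair the first display with the first scene and face set, then the second.
        orderings.append(list(zip(displays, scenes, face_sets)))
    return orderings

def assign(n_participants, orderings):
    """Round-robin assignment of participants to orderings (an assumed scheme)."""
    return {p: orderings[p % len(orderings)] for p in range(n_participants)}

orderings = build_orderings()
schedule = assign(40, orderings)

# Count how many participants start on each display condition.
first_display_counts = Counter(trials[0][0] for trials in schedule.values())

print(f"{len(orderings)} orderings")
for display, count in sorted(first_display_counts.items()):
    print(f"{display} first: {count} participants")
```

With 40 participants and 8 orderings, this scheme places 5 participants in each ordering, so each display condition, scene, and face set appears first equally often.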