A standardized set of 3-D objects for virtual reality research and applications
The use of immersive virtual reality as a research tool is rapidly increasing in numerous scientific disciplines. By combining ecological validity with strict experimental control, immersive virtual reality offers the potential to develop and test scientific theories in rich environments that closely resemble everyday settings. This article introduces the first standardized database of colored three-dimensional (3-D) objects that can be used in virtual reality and augmented reality research and applications. The 147 objects have been normed for name agreement, image agreement, familiarity, visual complexity, and corresponding lexical characteristics of the modal object names. The availability of standardized 3-D objects for virtual reality research is important, because reaching valid theoretical conclusions hinges critically on the use of well-controlled experimental stimuli. Sharing standardized 3-D objects across different virtual reality labs will allow science to move forward more quickly.
Keywords: Virtual reality, 3D-objects, Database, Stimuli
Visual representations of individual objects have been an essential type of experimental stimulus in several domains of scientific inquiry including attention, language, memory, and visual perception research. Already at the end of the 19th century, James McKeen Cattell developed an ingenious instrument that allowed for the consecutive presentation of individual pictures of objects (and other visual stimuli such as words and numerals) to an observer (Cattell, 1885). The use of visual stimuli in such an experimental context led to theoretically interesting findings such as that words are named faster than pictures and that pictures are named faster in one’s first language than in one’s second language (Cattell, 1885; Levelt, 2013). Over the years, picture-naming tasks have continued to play a pivotal role in psychological and neurological research—for instance, in the development of cognitive models of speech production (e.g., Levelt, Roelofs, & Meyer, 1999).
Reaching meaningful and valid theoretical conclusions critically hinges on the use of well-controlled experimental stimuli. Therefore, standardized, normative databases of picture stimuli have been crucial in controlling for the factors that influence picture recognition and picture-naming latencies, as well as in enabling the comparison of results across different studies and different samples of participants. The most influential standardized picture database to date was developed by Snodgrass and Vanderwart (1980). It consists of 260 black-and-white line drawings standardized for name agreement (the degree to which participants produce the same name for a given picture), image agreement (the degree to which participants’ mental image of a concept corresponds to the visually depicted concept), familiarity (the degree to which participants come in contact with or think about a depicted concept in everyday life), and visual complexity (the amount of detail or intricacy of line in the picture) in native speakers of American English. Over the years, similar picture databases have been introduced and standardized for other languages, including British English, Bulgarian, Dutch, French, German, Hungarian, Icelandic, Italian, Japanese, Mandarin Chinese, and Modern Greek (Alario & Ferrand, 1999; Barry, Morrison, & Ellis, 1997; Bonin, Peereman, Malardier, Méot, & Chalard, 2003; Dell’Acqua, Lotto, & Job, 2000; Dimitropoulou, Duñabeitia, Blitsas, & Carreiras, 2009; Martein, 1995; Nishimoto, Miyawaki, Ueda, Une, & Takahashi, 2005; Nisi, Longoni, & Snodgrass, 2000; Sanfeliu & Fernandez, 1996; Szekely et al., 2004; Van Schagen, Tamsma, Bruggemann, Jackson, & Michon, 1983; Viggiano, Vannucci, & Righi, 2004; Vitkovitch & Tyrrell, 1995; Wang, 1997).
Such black-and-white line drawings typically used in experiments are abstractions of real-world objects. They lack the texture, color, and shading information of the natural objects that we encounter in the real world. One may therefore doubt whether results obtained in studies using line drawings will fully generalize to everyday situations. In a first attempt to increase the ecological validity of experimental stimuli, standardized databases have been developed that include grayscale or colored photographs of objects (e.g., Adlington, Laws, & Gale, 2009; Brodeur, Dionne-Dostie, Montreuil, & Lepage, 2010; Migo, Montaldi, & Mayes, 2013; Moreno-Martínez & Montoro, 2012; Viggiano et al., 2004). Indeed, in certain cases color information in a picture or a line drawing enhances object recognition, such as when several objects within a category (e.g., types of fruit) have relatively similar shapes (e.g., apple, orange, peach) but different diagnostic colors (see, e.g., Laws & Hunter, 2006; Price & Humphreys, 1989; Rossion & Pourtois, 2004; Wurm, Legge, Isenberg, & Luebker, 1993). Importantly, the use of more ecologically valid stimuli significantly increases the odds of experimental findings being generalizable to everyday situations of object recognition, naming, and memory. Despite the availability of color and surface details in photographs of objects, there is still a large gap between observing a picture of an object on a small computer monitor in the lab and encountering that object in the real world. One important difference is the two-dimensional (2-D) nature of the line drawing or photograph versus the three-dimensional (3-D) nature of the objects we encounter in the wild.
In further pursuit of establishing the ecological validity of psychological and neuroscientific findings and theory in general, researchers have now started to exploit recent advances in immersive virtual reality (VR) technology (see Bohil, Alicea, & Biocca, 2011; Fox, Arena, & Bailenson, 2009; Peeters & Dijkstra, 2017; Slater, 2014). In immersive virtual environments, participants’ movements are tracked and their digital surroundings rendered, usually via large projection screens or head-mounted displays (Fox et al., 2009). This allows researchers to immerse participants in rich environments that resemble real-world settings, while maintaining full experimental control. Critically, such environments will often contain a multitude of 3-D objects. One can think of the furniture in a virtual classroom, the food items in a virtual restaurant, the groceries in a virtual supermarket, or even the clothes that a virtual agent or avatar is wearing. Whether participants recognize the 3-D objects will depend, among other factors, on those objects’ graphical quality. However, producing realistic 3-D objects takes time as well as graphic design skills. An open-access database of standardized 3-D objects for VR experiments and applications would be an important step forward in facilitating such research and making the findings comparable across different studies and different groups of participants.
The present study, therefore, introduces a database of 147 colored 3-D objects standardized for name agreement, image agreement, familiarity, visual complexity, and corresponding lexical characteristics of the modal object names. The 3-D objects are freely available from an online database and can be used for VR and augmented reality research and applications. Researchers may use them in the virtual, 3-D equivalents of traditional object recognition and object-naming experiments, to test whether original findings will generalize to situations of more naturalistic vision that include depth cues and richer environments (e.g., Eichert, Peeters, & Hagoort, 2017; Tromp, Peeters, Meyer, & Hagoort, 2017). Moreover, these 3-D objects can be used in any virtual setting that requires the presence of objects. Using a 3-D object from the database will be faster than designing the object from scratch. Moreover, on the basis of the standardized norms, researchers may select 3-D objects that fit the purpose of their specific research question.
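As a minimal sketch of how objects might be selected on the basis of the norms, the following Python fragment filters a list of norm records by simple thresholds. The record layout, the field names, the threshold values, and the example values are hypothetical illustrations; they are not taken from the online database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectNorms:
    """Hypothetical record layout for one object's norms."""
    modal_name: str
    name_agreement: float     # percentage of participants giving the modal name (0-100)
    image_agreement: float    # mean rating on the 5-point scale
    familiarity: float        # mean rating on the 5-point scale
    visual_complexity: float  # mean rating on the 5-point scale


def select_objects(norms, min_name_agreement=80.0, min_familiarity=3.0,
                   max_visual_complexity=5.0):
    """Keep the objects whose norms satisfy all three (illustrative) thresholds."""
    return [
        n for n in norms
        if n.name_agreement >= min_name_agreement
        and n.familiarity >= min_familiarity
        and n.visual_complexity <= max_visual_complexity
    ]


# Made-up values for illustration only; these are not norms from the database.
example = [
    ObjectNorms("hamer", 100.0, 4.5, 3.8, 2.1),
    ObjectNorms("lamp", 95.0, 3.2, 4.6, 2.4),
    ObjectNorms("tompouce", 60.0, 4.0, 2.5, 3.0),
]
selected = select_objects(example)
print([n.modal_name for n in selected])
```

Which thresholds are appropriate will depend on the research question; a naming study may require high name agreement, whereas a memory study may instead match conditions on familiarity and visual complexity.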
A total of 168 native Dutch speakers (84 female, 84 male; mean age 22 years old; age range 18–31 years) participated in the study. Each task (name agreement, image agreement, familiarity, and visual complexity) included 42 different participants (21 female, 21 male). One additional participant in the name agreement task was replaced due to technical problems during the experiment. All of the participants were Dutch; studied in Nijmegen, The Netherlands; and had Dutch as their single native language. They were university students, which means that they had been enrolled in at least 12 years of formal education. They all had normal or corrected-to-normal vision and no language or hearing impairments or history of neurological disease. The participants provided informed consent and were paid for participation. Ethical approval for the study was granted by the ethics board of Radboud University’s Faculty of Social Sciences.
In each task, after having provided informed consent, participants were seated in a chair in the middle of a CAVE system (Cruz-Neira, Sandin, & DeFanti, 1993), such that the three screens covered their entire horizontal visual field (see below). They put on VR glasses, which were part of a tracking system that monitored the position and direction of the participant’s head, so that the visual display was rendered from the correct perspective. The eyes of the participant were approximately 180 cm away from the middle screen. Objects were presented one by one in random order against a simple background for 7 s at the center of the screen facing the participant. We aimed to present the objects at their expected real-world size. A number (1 to 150) was displayed next to the object that corresponded with a number on the answer sheet or file. The procedure in each of the four tasks was kept similar to the procedure used for standardization of picture databases (e.g., Snodgrass & Vanderwart, 1980). For all four tasks, participants were informed that we were setting up a database of 3-D objects made by an in-house designer and that we would like to know people’s opinions about the objects. Each task consisted of a single session without breaks. To include as many objects as possible in the database, no practice session with practice objects preceded the task. Instead, the experimenter checked before the start of the experiment whether the participant completely understood the task. For these simple tasks, this procedure worked well.
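The random presentation order described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the experiment script itself: the function name and the per-participant seed are hypothetical, and no Vizard calls are shown.

```python
import random

PRESENTATION_SECONDS = 7  # display duration per object, as reported in the text


def make_trial_list(object_numbers, seed):
    """Return one participant's trials as (object_number, duration_in_s) pairs.

    Assumption: each participant gets an independent shuffle, seeded so that
    the order can be reproduced afterwards.
    """
    rng = random.Random(seed)
    order = list(object_numbers)
    rng.shuffle(order)
    return [(number, PRESENTATION_SECONDS) for number in order]


# Objects are identified by the numbers 1 to 150 shown next to them.
trials = make_trial_list(range(1, 151), seed=1)
print(trials[:3])
```

Keeping the object number fixed while shuffling only the order means that a response on the answer sheet can always be traced back to the same object, whatever position it was shown in.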
In the name agreement task, participants were instructed to carefully look at the object and type the name of each object into a laptop they held on their lap. They were told that a name could consist of a maximum of two words. They were asked to type in “OO” (for Object Onbekend, “unknown object”) if they did not know the object, “NO” (for Naam Onbekend, “name unknown”) if they knew the object but not its name, and “PT” (for Puntje van de Tong, “tip of the tongue”) for objects that elicited a tip-of-the-tongue state. Henceforth, these answer options will be referred to by their commonly used English acronyms: respectively, DKO (“don’t know the object”), DKN (“don’t know the name”), and TOT (“tip of the tongue”). Participants were told that they had 7 s to look at each object and type in its name. The task took about 25 min.
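The typed responses for a single object could be scored along the following lines. This is a hedged sketch rather than the scoring procedure actually used: it assumes that the modal name is the most frequent typed name, that the percentage of name agreement is taken over all responses, and that the information statistic H of Snodgrass and Vanderwart (1980) is computed over valid names only. The function name, the case-insensitive comparison, and the example responses are likewise assumptions.

```python
from collections import Counter
from math import log2

# Dutch response codes from the task instructions, mapped to the English acronyms.
RESPONSE_CODES = {"OO": "DKO", "NO": "DKN", "PT": "TOT"}


def name_agreement(responses):
    """Summarize the responses that all participants typed for one object."""
    cleaned = [r.strip() for r in responses if r.strip()]
    codes = Counter(RESPONSE_CODES[r] for r in cleaned if r in RESPONSE_CODES)
    # Assumption: names are compared case-insensitively; spelling variants
    # are not merged.
    names = Counter(r.lower() for r in cleaned if r not in RESPONSE_CODES)
    if not names:
        return {"modal_name": None, "percent": 0.0, "h": None, "codes": dict(codes)}
    modal_name, modal_count = names.most_common(1)[0]
    total_names = sum(names.values())
    # H = sum over names of p * log2(1 / p), with p the proportion of valid
    # naming responses that used that name (H is 0 when everyone agrees).
    h = sum((c / total_names) * log2(total_names / c) for c in names.values())
    return {
        "modal_name": modal_name,
        "percent": 100.0 * modal_count / len(cleaned),
        "h": h,
        "codes": dict(codes),
    }


# Made-up responses for illustration only.
result = name_agreement(["hamer", "hamer", "Hamer", "bijl", "OO", "hamer"])
print(result)
```

Ties between equally frequent names are not resolved here; `most_common` simply returns the name it encountered first, so a real scoring script would need an explicit rule for such cases.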
In the image agreement task, participants were instructed that for each object they would first see its name (i.e., the modal name derived from the name agreement task, defined as the unique name that was produced by the largest number of participants in the name agreement task) on the 3-D screen in front of them for 4 s, after which they would see the corresponding 3-D object for 7 s. They were instructed to (passively—i.e., without saying it out loud) read the name of the object and imagine what an object corresponding to that name would normally look like. On a rating form, they then rated for each object the correspondence between their mental image and the presented 3-D object on a 5-point scale. A rating of 1 indicated low agreement, which meant a poor match to their mental image. A rating of 5 indicated high agreement, which meant a strong match to their mental image. For each object they were asked to circle Geen Beeld (“no image”) if they did not manage to form a mental image for an object, and Ander Object (“different object”) if they had a different object in mind than the one depicted. This task took about 35 min.
In the familiarity task, participants were instructed to look at each object and rate on a 5-point scale how familiar they were with the object. Familiarity was defined as the degree to which the participant usually comes in contact with the object or thinks about the concept. A rating of 1 indicated that the participant was not familiar at all with the object. A rating of 5 indicated that the participant was very familiar with the object. This task took about 25 min.
In the visual complexity task, participants were instructed to look at each object and rate on a 5-point scale how visually complex they found it. Complexity was defined as the amount of detail or the intricacy of the lines in each object. Color was not mentioned in the instructions. A rating of 1 indicated an object with very few details, and a rating of 5 indicated a very detailed object. This task took about 25 min.
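For the three rating tasks, the per-object norms reduce to summary statistics over the 5-point ratings. The sketch below shows one way to compute them. It assumes that non-numeric responses (such as "no image" or "different object" in the image agreement task) are coded as `None` and left out of the mean; that coding and the function name are assumptions, not details given in the text.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return n, mean, and sample SD for one object's ratings on the 1-5 scale.

    Assumption: None marks a response without a rating, and such responses
    are excluded from the summary.
    """
    valid = [r for r in ratings if r is not None]
    for r in valid:
        if not 1 <= r <= 5:
            raise ValueError(f"rating outside the 5-point scale: {r}")
    return {
        "n": len(valid),
        "mean": mean(valid) if valid else None,
        "sd": stdev(valid) if len(valid) > 1 else None,
    }


# Made-up ratings for illustration only.
summary = summarize_ratings([5, 4, None, 3, 4])
print(summary)
```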
The CAVE system consisted of three screens (255 cm × 330 cm; VISCON GmbH, Neukirchen-Vluyn, Germany) that were arranged at right angles. Two projectors (F50, Barco N.V., Kortrijk, Belgium) illuminated each screen indirectly by means of a mirror behind the screen. For each screen, the two projectors showed two vertically displaced images that were overlapping in the middle of the screen. Thus, the complete display on each screen was visible only as a combined overlay of the two projections. Each object was presented on the screen facing the participants.
For optical tracking, infrared motion capture cameras (Bonita 10, Vicon Motion Systems Ltd, UK) and the Tracker 3 software (Vicon Motion Systems Ltd, UK) were used. Six cameras were positioned at the upper edges of the CAVE screens, and four cameras were placed at the bottom edges. All cameras were oriented toward the middle of the CAVE system. Optical head-tracking was accomplished by placing light reflectors on both sides of the VR glasses. Three spherical reflectors were mounted on a plastic rack, and two such racks with a mirrored version of the given geometry were manually attached to both sides of the glasses. The reflectors worked as passive markers that could be detected by the infrared tracking system in the CAVE. The tracking system was trained to the specific geometric structure of the markers and detected the position and orientation of the glasses with an accuracy of 0.5 mm.
A control room was located behind the experimental room containing the CAVE setup. The experimenter could visually inspect the participant and the displays on the screens through a large window behind the participant. The four tasks were programmed and run using Python-based 3-D application software (Vizard, Floating Client 5.4; WorldViz LLC, Santa Barbara, CA).
Results and discussion
Table 1. Summary statistics for all elicited data
Table 2. Correlation matrix for the collected norms
Table 3. Overview of standardized measures in the present study, three recent colored photograph databases, and the canonical line drawings database by Snodgrass and Vanderwart (1980)
The familiarity measure in the present study yielded a result numerically similar to that from the line drawing database by Snodgrass and Vanderwart (1980), which is slightly lower than the average familiarity ratings from the three databases with colored photographs. This difference may be due to the fact that both line drawings and 3-D objects are designed from scratch by a designer, whereas photographs of objects, by definition, represent objects more directly. Nevertheless, photographs and line drawings are typically 2-D abstractions of an actual 3-D real-world object. They represent an object, but they are not the represented object itself. In the case of 3-D VR research, however, a participant’s full immersion in a virtual world means that he or she should experience the 3-D objects as real objects. This difference also explains why certain semantic categories are not represented in the present database, though they are present in previous picture databases. Whereas traditional databases include, for instance, line drawings or (manipulated) photographs of individual body parts (Adlington et al., 2009; Duñabeitia et al., 2017; Moreno-Martínez & Montoro, 2012; Snodgrass & Vanderwart, 1980), no 3-D body parts are provided in the present database. Showing an individual body part in a 3-D virtual environment might decrease the participant’s experience of presence in the virtual world, since people usually do not come across individual, detached body parts in everyday life.
Numerically, the average image agreement and visual complexity of the 3-D objects in the present study are comparable to the norms for photographs and line drawings from the four other databases. The overall numerical similarity in image agreement suggests that, across the evaluated databases, participants on average agreed to a similar extent with the collected modal names. The overall similarity in average visual complexity scores suggests that the depicted objects in the present database, despite their 3-D nature, were not evaluated as being visually more complex than the stimuli in earlier databases. Note, however, that this might change if 2-D photographs of objects were directly compared to our 3-D objects in the same study with the same group of participants.
Table 4. Correlations (r) between ratings for 3-D objects from the present study, two recent colored photograph databases, and the canonical line drawings database by Snodgrass and Vanderwart (1980)
Overall, significant correlations between the present database and previous databases in terms of name agreement were either absent (Moreno-Martínez & Montoro, 2012; Snodgrass & Vanderwart, 1980) or weak (Brodeur et al., 2010). Thus, although a modal name may be the same across studies, this does not imply that the name agreement for that specific item was also similar. This is not surprising, because different stimuli and different languages (Dutch, English, and Spanish) were involved in the different studies. Weak to moderate significant positive correlations were observed between the present database and previous databases in terms of image agreement. This suggests that, overall, certain modal names (e.g., “hammer”) elicit a highly stable mental image that is clearly represented by both picture stimuli and our 3-D object. Other modal names (e.g., “lamp”) may consistently elicit lower image agreement across different studies because there is more variance in the mental images each elicits (e.g., different types of lamps) across the participants within studies. Moderate to strong significant positive correlations were observed between the present database and the three earlier databases for both familiarity and visual complexity in all three comparisons (see Table 4).
The familiarity result indicates that, broadly speaking, objects that were normed as more or less familiar in the present study were also more or less familiar to the participants who provided norms in the earlier picture databases. This can be explained by the fact that the participants providing norms for the different databases have all lived in Western cultures in which they may encounter similar objects in their daily life. Some cultural differences in the familiarity of specific objects may exist, for example, in different culture-specific types of food (e.g., the typical Dutch pastry tompouce that was included in the present database, or the crème caramel in Moreno-Martínez & Montoro, 2012). Such items were, however, by definition not included in these analyses, because they were present in only one of the databases.
The positive correlations in terms of visual complexity suggest that the objects depicted as visually more or less complex in the earlier databases were also designed and rated as being visually more or less complex in the present database. This overlap is explained by the inherent degree of visual complexity present in objects in everyday life, which is consequently represented as such in line drawings, pictures, and 3-D objects based on these real-world objects.
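The cross-database comparisons described above only involve items that occur in both databases. A minimal sketch of such a comparison is given below, assuming that items are matched on a shared (translated) concept label and that a plain Pearson correlation is computed over the shared items. The matching key, the function names, and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def shared_item_correlation(norms_a, norms_b):
    """Correlate one norm across two databases, using only shared items.

    Both arguments map a concept label to a rating; items that occur in only
    one database are dropped before the correlation is computed.
    """
    shared = sorted(set(norms_a) & set(norms_b))
    r = pearson_r([norms_a[k] for k in shared], [norms_b[k] for k in shared])
    return shared, r


# Made-up familiarity values for illustration only.
present = {"hammer": 3.8, "lamp": 4.6, "apple": 4.8, "tompouce": 2.5}
earlier = {"hammer": 3.5, "lamp": 4.4, "apple": 4.9, "creme caramel": 2.0}
shared, r = shared_item_correlation(present, earlier)
print(shared, round(r, 3))
```

In this example the culture-specific items ("tompouce", "creme caramel") drop out automatically, because each is present in only one of the two dictionaries.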
All in all, the comparisons of the present 3-D object database to four previous databases confirm the validity of the present set of 3-D objects. On the basis of these results, the present standardized 3-D object database sets the stage for better comparability of scientific findings that can result from the use of immersive VR and augmented reality settings within and across research labs and participant groups.
This study has introduced the first standardized set of 3-D objects for VR and augmented reality research and applications. The objects are freely available and can be selected as a function of the aim of a specific study or application, on the basis of the provided norms for name agreement, image agreement, familiarity, visual complexity, and the lexical characteristics of the object’s modal name. The 3-D objects can be adapted in size, color, texture, and visual complexity to fit the purposes of individual studies and applications. Note, however, that the collected norms are representative only of the 3-D objects as they are currently presented in the online database. Modifying, for instance, an object’s texture or color might change any of the collected norms. The 3-D objects can also be used for educational purposes as well as for testing patient populations in 3-D virtual environments. Researchers performing experiments in languages other than Dutch are invited to standardize the current set of 3-D objects for their local language and to expand the database by adding more objects. Sharing standardized 3-D objects across different labs will move VR research forward more quickly.
Open access funding provided by Max Planck Society.
I thank Jeroen Derks for designing the objects, Birgit Knudsen for assistance in running the experiments, Jeroen Geerts for creating the online repository, Albert Russel and Reiner Dirksmeyer for technical support, Peter Hagoort for making VR research possible at the Max Planck Institute, and two anonymous reviewers for valuable feedback.
- Alario, F.-X., & Ferrand, L. (1999). A set of 400 pictures standardized for French: Norms for name agreement, image agreement, familiarity, visual complexity, image variability, and age of acquisition. Behavior Research Methods, Instruments, & Computers, 31, 531–552. doi:10.3758/BF03200732
- Bonin, P., Peereman, R., Malardier, N., Méot, A., & Chalard, M. (2003). A new set of 299 pictures for psycholinguistic studies: French norms for name agreement, image agreement, conceptual familiarity, visual complexity, image variability, age of acquisition, and naming latencies. Behavior Research Methods, Instruments, & Computers, 35, 158–167. doi:10.3758/BF03195507
- Brodeur, M. B., Dionne-Dostie, E., Montreuil, T., & Lepage, M. (2010). The Bank of Standardized Stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS ONE, 5, e10773. doi:10.1371/journal.pone.0010773
- Cattell, J. M. (1885). Über die Zeit der Erkennung und Benennung von Schriftzeichen, Bildern und Farben. Philosophische Studien, 2, 635–650.
- Cruz-Neira, C., Sandin, D. J., & DeFanti, T. A. (1993). Surround-screen projection-based virtual reality: The design and implementation of the CAVE. In M. C. Whitton (Ed.), Proceedings of the 20th annual conference on computer graphics and interactive techniques (pp. 135–142). New York, NY: ACM Press.
- Dell’Acqua, R., Lotto, L., & Job, R. (2000). Naming times and standardized norms for the Italian PD/DPSS set of 266 pictures: Direct comparisons with American, English, French, and Spanish published databases. Behavior Research Methods, Instruments, & Computers, 32, 588–615. doi:10.3758/BF03200832
- Duñabeitia, J. A., Crepaldi, D., Meyer, A. S., New, B., Pliatsikas, C., Smolka, E., & Brysbaert, M. (2017). MultiPic: A standardized set of 750 drawings with norms for six European languages. The Quarterly Journal of Experimental Psychology. doi:10.1080/17470218.2017.1310261
- Eichert, N., Peeters, D., & Hagoort, P. (2017). Language-driven anticipatory eye movements in virtual reality. Behavior Research Methods (in press).
- Levelt, W. J. M. (2013). A history of psycholinguistics. Oxford, UK: Oxford University Press.
- Martein, R. (1995). Norms for name and concept agreement, familiarity, visual complexity and image agreement on a set of 216 pictures. Psychologica Belgica, 35, 205–225.
- Nisi, M., Longoni, A. M., & Snodgrass, J. G. (2000). Italian measurement on the relation of name, familiarity, and acquisition age for the 260 figures of Snodgrass and Vanderwart. Giornale Italiano di Psicologia, 27, 205–218.
- Peeters, D., & Dijkstra, T. (2017). Sustained inhibition of the native language in bilingual language production: A virtual reality approach. Bilingualism: Language and Cognition (in press).
- Sanfeliu, M. C., & Fernandez, A. (1996). A set of 254 Snodgrass–Vanderwart pictures standardized for Spanish: Norms for name agreement, image agreement, familiarity, and visual complexity. Behavior Research Methods, Instruments, & Computers, 28, 537–555. doi:10.3758/BF03200541
- Szekely, A., Jacobsen, T., D’Amico, S., Devescovi, A., Andonova, E., Herron, D., … Bates, E. (2004). A new on-line resource for psycholinguistic studies. Journal of Memory and Language, 51, 247–250. doi:10.1016/j.jml.2004.03.002
- Tromp, J., Peeters, D., Meyer, A. S., & Hagoort, P. (2017). The combined use of virtual reality and EEG to study language processing in naturalistic environments. Behavior Research Methods. doi:10.3758/s13428-017-0911-9
- Van Schagen, I., Tamsma, N., Bruggemann, F., Jackson, J. L., & Michon, J. A. (1983). Namen en normen voor plaatjes. Nederlands Tijdschrift voor de Psychologie, 38, 236–241.
- Wang, M. Y. (1997). The evaluation of perceptual and semantic characteristics for a set of object contour pictures. Chinese Journal of Psychology, 39, 157–172.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.