Faces contain some of the most valuable visual information that human beings encounter, because they allow us to discern others’ identities, emotions, and intentions (Leopold & Rhodes, 2010; Taubert, 2010). Therefore, it is not surprising that there has been a long history of behavioral, neuropsychological, and neuroimaging research dedicated to understanding human face processing (Rivolta, 2013). Many fundamental questions have been posed in this field, including: “Do faces represent a ‘special’ class of visual stimuli?” (e.g., Farah, Wilson, Drain, & Tanaka, 1998), “Which brain regions contribute to face processing?” (e.g., Nasr & Tootell, 2012), and “Do individuals with prosopagnosia [i.e., disordered face perception] experience difficulty processing stimulus categories other than faces?” (e.g., Yovel & Duchaine, 2006). In their attempts to clarify these and other important issues, researchers have often sought out appropriate classes of nonface objects with which to compare faces.

Object categories that have been compared to faces in the literature have included photographs of cars (e.g., Cassia, Turati, & Schwarzer, 2011), chairs (Levine, Banich, & Koch-Weser, 1988), shoes (e.g., Picozzi, Cassia, Turati, & Vescovo, 2009), eyeglasses (Farah, Levinson, & Klein, 1995), novel objects (i.e., “Greebles”; e.g., Gauthier, Behrmann, & Tarr, 2004), and houses (e.g., Rossion et al., 2000). Although a perfect class of objects with which to compare faces likely does not exist (Luck & Kappenman, 2012), photographs of houses have often been employed in face-processing studies, due to the numerous characteristics that they have in common with faces, including distinct, consistent internal features, mono-orientation, and relative familiarity in the everyday environment (Robbins, Shergill, Maurer, & Lewis, 2011; Yin, 1969). Furthermore, given the role that the spatial relationships between facial features play in human face processing (Haig, 1984), the ability to manipulate the spatial locations of key features of houses (e.g., windows, doors; Matheson & McMullen, 2009; Tanaka & Farah, 1993) increases the utility of houses as comparison stimuli for faces.

Despite the frequency with which house stimuli are used in face-processing studies, we are not aware of any large, well-controlled databases of photographs of houses that have been developed for research use. Across studies, there has been considerable variation in the methods of procuring house stimuli, resulting in great heterogeneity of the images used. For example, researchers have used images that were created using computer software (e.g., Collins, Zhu, Bhatt, Clark, & Joseph, 2012; Tanaka & Farah, 1993), obtained from the Internet (e.g., Jemel, Coutya, Langer, & Roy, 2009), or photographed specifically for the purposes of the study (e.g., Tanaka, Kaiser, Hagen, & Pierce, 2014). Factors on which house stimuli vary include their level of realism (i.e., photographic vs. computer-generated), physical features (e.g., number of stories, layout of key features such as windows and doors, symmetry), and inclusion of extraneous features (e.g., trees, driveways, shrubs, power lines). The lack of a large, well-controlled data set not only requires labor-intensive stimulus set development by individual researchers, but may also limit the replicability of face-processing research that employs house stimuli.

In the present study, we sought to address these issues by developing a large set of high-quality, well-controlled house stimuli (i.e., the “DalHouses”) for use in future face- and object-processing research. To do this, we took high-resolution photographs of 100 two- and three-story houses located in the vicinity of Halifax, Nova Scotia, Canada. Each image was carefully edited to remove all major extraneous details (e.g., tree branches, shrubs, power lines) and potentially identifying information (e.g., house numbers, cars, license plates). To further increase the utility of the DalHouses, we had a group of university students rate each stimulus on three stimulus dimensions. Specifically, we asked participants to rate (1) how typical each stimulus was of the category “houses,” (2) how face-like each stimulus appeared, and (3) how much they liked each stimulus.

Typicality ratings were gathered because typicality is one of the strongest predictors of face recognition performance and has been found to be correlated with other important constructs in the face-processing literature, such as familiarity and attractiveness (Zhao & Chellappa, 2006). Note that many of the university students whom we tested are from other locations, so the typicality ratings were not based solely on experience with houses in Halifax. Face-likeness ratings were gathered because this factor may also influence researchers’ selections of comparison stimuli, in light of questions that have been raised about the specificity of the fusiform face area to faces versus objects with face-like configurations of key features (Cabeza & Kingstone, 2006; Gauthier et al., 2004). Finally, since we intended to employ this stimulus set in a mere repeated exposure (MRE; Zajonc, 1968) experiment, we also collected ratings of likeability. We have included these likeability ratings in the present article because they may be of interest to other researchers studying the MRE effect and other factors that may influence preferences for faces.

To summarize, the carefully acquired and edited house stimuli that we describe in this article, and the initial ratings data that we provide, are an important contribution to face- and object-processing research.

Method

Participants

Forty-four university students were recruited through the Dalhousie University Department of Psychology Subject Pool and received course credit for their time. A technological issue resulted in one participant’s data not being saved. Furthermore, two participants’ ratings were considered unreliable because of a lack of variability in the ratings that they assigned (i.e., they gave more than 80 % of the stimuli the same rating on at least one dimension). The final sample included 41 participants, 33 of whom self-identified as female and 8 of whom self-identified as male. The participants ranged in age from 18.17 to 31.83 years (M = 20.95, SD = 2.45). To be included in the present study, participants were required to be English-speaking and to have normal or corrected-to-normal vision.

Stimuli

House stimuli were photographed using a Nikon D80 camera with an 18–200 mm VR lens. Only two- and three-story homes were photographed, because this configuration was deemed most similar to the structure of a human face (e.g., the first-story door and second- or third-story windows being roughly analogous to a mouth and eyes, respectively). As indicated previously, all of the houses photographed were located in the Halifax Regional Municipality in Nova Scotia, Canada. Houses were selected to include a range of older and newer homes, as well as a variety of architectural designs, in order to provide researchers with a wide range of options from which to select their preferred stimuli.

The house photographs were first converted from their original format to grayscale. Any potentially identifying features (e.g., house numbers, cars, license plates) were then removed, as were elements (e.g., tree branches, shrubs, power lines) that obscured a large portion of the house. Finally, the entire background of each image (e.g., sky, grass) was removed, so that every stimulus consisted of a house alone on a white background.

The final set of 100 stimuli was saved at a resolution of 72 pixels/cm. Each image was constrained to a height of approximately 375 pixels; widths varied across stimuli, with the widest being about 780 pixels across. All 100 house photographs can be downloaded from http://dx.doi.org/10.6084/m9.figshare.1279430 or obtained by contacting the corresponding or first authors.
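The removal of identifying features and backgrounds described above was presumably done by hand in an image editor; the authors do not report their editing software. As an illustration of the automatable steps only (grayscale conversion and scaling to the approximate 375-pixel height), the following is a minimal Python sketch using the Pillow library. The file and directory names are hypothetical, not the authors’ actual pipeline.

```python
# Hedged sketch of the automatable preparation steps: convert a photograph to
# grayscale and scale it to the approximate 375-pixel height of the final
# stimuli. Paths are assumptions; feature and background removal would still
# be done manually in an image editor.
from pathlib import Path
from PIL import Image

TARGET_HEIGHT = 375  # approximate vertical extent of the final stimuli, in pixels

def prepare_stimulus(src: Path, dst: Path) -> None:
    img = Image.open(src).convert("L")  # grayscale
    scale = TARGET_HEIGHT / img.height
    img = img.resize((round(img.width * scale), TARGET_HEIGHT), Image.LANCZOS)
    img.save(dst)

if __name__ == "__main__":
    out_dir = Path("dalhouses_prepared")
    out_dir.mkdir(exist_ok=True)
    for src in sorted(Path("raw_photos").glob("*.jpg")):
        prepare_stimulus(src, out_dir / f"{src.stem}_gray.png")
```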

Equipment

Participants completed all experimental tasks on a 15-in. MacBook Pro laptop running OS X Leopard (Version 10.5.8). PsyScope X Build 57 (Cohen, MacWhinney, Flatt, & Provost, 1993) software was used to present the visual information to participants and to collect responses. All statistical analyses were carried out using IBM SPSS Statistics (Version 21.0.0.0).

Procedure

Participants were asked to rate each stimulus on three dimensions: typicality, face-likeness, and likeability. Participants viewed the entire set of stimuli three times, rating each stimulus on only one dimension during each viewing. The order of the dimension ratings for each participant was determined randomly, as was the order in which the stimuli were presented during each rating task.

For the typicality ratings, participants were given the following instructions:

For each house you see on the screen, please give it a rating on the typicality scale. Rate typicality based on how typical the house shown is of your representation of a house.

Then they were asked to respond to the question “How typical is this house?”

For the likeability ratings, participants were given the following instructions:

For each house on the screen, please give it a rating on the likeability scale. Rate each image based on how much you like or dislike each house.

Then, they were asked to respond to the question “How likeable is this house?”

For the face-likeness ratings, participants were given the following instructions:

Faces and houses can be similar in complexity, symmetry, layout and shape.

For each house on the screen, please give it a rating on the facial similarity scale. Rate each image based on how face-like the house is.

Then they were asked to respond to the question “How face-like is this house?”

For all three rating dimensions, participants assigned each house stimulus a value on a 7-point Likert scale with two labeled anchors: for typicality, 1 = not very typical and 7 = very typical; for likeability, 1 = not very likeable and 7 = very likeable; and for face-likeness, 1 = not very face-like and 7 = very face-like.

For all three rating tasks, a visual scale displaying the numbers 1 to 7 and the associated anchors was presented below each to-be-rated stimulus. Ratings were not recorded until participants pressed the Enter/Return key, so they were able to edit their responses before submitting them.
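The rating task itself was implemented in PsyScope X (see Equipment). For illustration only, the sketch below reproduces the trial logic described above (an image above a labeled 1–7 scale, with the response editable until the Enter/Return key records it) using the PsychoPy library rather than PsyScope; the file name and layout values are hypothetical.

```python
# Illustration only: the authors used PsyScope X, not this code. A minimal
# PsychoPy sketch of one typicality trial with an editable 1-7 response that
# is recorded only when Enter/Return is pressed.
from psychopy import visual, event, core

win = visual.Window(size=(1024, 768), color="white", units="pix")
house = visual.ImageStim(win, image="dalhouse_001.png", pos=(0, 120))
prompt = visual.TextStim(win, text="How typical is this house?",
                         color="black", pos=(0, -150), height=28)
scale = visual.TextStim(win, text="1    2    3    4    5    6    7\n"
                                  "not very typical          very typical",
                        color="black", pos=(0, -220), height=22)
entry = visual.TextStim(win, text="", color="black", pos=(0, -300), height=32)

response = ""
while True:
    entry.text = response
    for stim in (house, prompt, scale, entry):
        stim.draw()
    win.flip()
    key = event.waitKeys(keyList=[str(d) for d in range(1, 8)]
                         + ["backspace", "return"])[0]
    if key == "return" and response:
        break                   # rating is recorded only on Enter/Return
    if key == "backspace":
        response = ""           # participant may change the response
    elif key != "return":
        response = key          # most recent digit replaces the previous one

print("Recorded rating:", response)
win.close()
core.quit()
```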

Results

Missing values

As was indicated previously, 41 participants rated each of the 100 houses on each of the three stimulus dimensions, yielding 4,100 potential ratings per dimension. For the typicality, likeability, and face-likeness dimensions, 0.85 % (35/4,100), 1.02 % (42/4,100), and 1.10 % (45/4,100) of the ratings were missing, respectively. These missing ratings resulted from participants pressing the Enter/Return key before they had entered a rating value. A further seven data points (three, three, and one for the typicality, likeability, and face-likeness scales, respectively) were manually removed because they fell outside the specified 1–7 rating scale.
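Researchers applying similar screening to their own ratings could script the checks described above (tallying missing responses per dimension and excluding values outside the 1–7 scale), as in the following sketch. It assumes a hypothetical long-format file with participant, house, dimension, and rating columns, one row per potential rating.

```python
# Hedged sketch of the data screening described above (file and column names
# are assumptions; missing responses are assumed to be stored as blanks/NaN).
import pandas as pd

ratings = pd.read_csv("dalhouses_ratings_long.csv")

for dim, group in ratings.groupby("dimension"):
    n_missing = group["rating"].isna().sum()
    print(f"{dim}: {n_missing}/{len(group)} ratings missing "
          f"({100 * n_missing / len(group):.2f} %)")

# Remove the few values that fell outside the specified 1-7 rating scale
ratings = ratings[ratings["rating"].isna() | ratings["rating"].between(1, 7)]
```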

Typicality ratings

The mean typicality rating for the stimulus set was 4.13 (SD = 0.66). Recall that the stimuli were rated on a 7-point Likert scale (i.e., 1 = not very typical and 7 = very typical). The houses that received the highest (M = 5.34, SD = 1.76) and lowest (M = 2.21, SD = 1.44) ratings on the typicality dimension are pictured in Fig. 1. The typicality ratings for individual house stimuli can be found in Table 1. Please note that all of the ratings data can be downloaded in spreadsheet format from http://dx.doi.org/10.6084/m9.figshare.1279430.
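The per-stimulus summaries reported in Table 1 (means, standard deviations, and ranges per house and dimension) can be reproduced from the downloadable ratings with a few lines of code. The sketch below assumes the same hypothetical long-format layout as above; the actual spreadsheet may be organized differently.

```python
# Sketch of the per-stimulus summaries reported in Table 1 (mean, standard
# deviation, and range per house and dimension). File and column names are
# hypothetical assumptions.
import pandas as pd

ratings = pd.read_csv("dalhouses_ratings_long.csv").dropna(subset=["rating"])
summary = (ratings.groupby(["dimension", "house"])["rating"]
                  .agg(["mean", "std", "min", "max"])
                  .round(2))

typicality = summary.loc["typicality"].sort_values("mean")
print("Lowest-rated house:\n", typicality.head(1))
print("Highest-rated house:\n", typicality.tail(1))
print("Set-level mean typicality:", round(typicality["mean"].mean(), 2))
```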

Fig. 1 House stimuli rated lowest and highest on the typicality, likeability, and face-likeness dimensions

Table 1 Means, standard deviations, and ranges of ratings for each of the house stimuli

Likeability ratings

The mean likeability rating for the stimulus set was 3.47 (SD = 0.85). Recall that the stimuli were rated on a 7-point Likert scale (i.e., 1 = not very likeable and 7 = very likeable). The houses that received the highest (M = 5.80, SD = 1.79) and lowest (M = 1.75, SD = 0.95) ratings on the likeability dimension are pictured in Fig. 1. The likeability ratings for individual house stimuli can be found in Table 1.

Face-likeness ratings

The mean face-likeness rating for the stimulus set was 3.09 (SD = 0.95). Recall that the stimuli were rated on a 7-point Likert scale (i.e., 1 = not very face-like and 7 = very face-like). The houses that received the highest (M = 6.65, SD = 1.14) and lowest (M = 1.68, SD = 0.94) ratings on the face-likeness dimension are pictured in Fig. 1. The face-likeness ratings for individual house stimuli can be found in Table 1.

Correlations

Pearson’s bivariate correlation coefficients were calculated for all three possible pairings of the dimension ratings. A significant positive correlation was observed between the typicality and likeability ratings that participants provided [r(100) = .71, p < .001]. Significant relationships were not observed between face-likeness and either typicality [r(100) = .10, p = .33] or likeability [r(100) = –.03, p = .80].
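For readers who wish to recompute these relationships from the shared ratings, the sketch below pairs the stimulus-level mean ratings and computes Pearson correlations with scipy.stats.pearsonr. The file and column names are hypothetical assumptions about how the ratings are stored.

```python
# Sketch of the reported correlations between stimulus-level mean ratings.
from itertools import combinations

import pandas as pd
from scipy.stats import pearsonr

ratings = pd.read_csv("dalhouses_ratings_long.csv").dropna(subset=["rating"])
means = (ratings.groupby(["house", "dimension"])["rating"].mean()
                .unstack("dimension"))   # one row per house, one column per dimension

for a, b in combinations(["typicality", "likeability", "face_likeness"], 2):
    r, p = pearsonr(means[a], means[b])
    print(f"{a} vs. {b}: r = {r:.2f}, p = {p:.3f}")
```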

Discussion

The main goal of this study was to develop a large set of high-quality, well-controlled house stimuli (i.e., the “DalHouses”) for use in future face-processing research. In order to create this stimulus set, we photographed and edited 100 houses and had participants rate them on three scales: typicality, likeability, and face-likeness. Ratings of each of these houses are presented in this article. Importantly, the ratings appear to have a high degree of face validity. For example, the houses that were rated as most face-like looked very much like faces (e.g., with windows placed in a spatial layout similar to the eyes and mouth of a face), whereas those that were rated as least face-like did not (see Fig. 1). Furthermore, the significant positive correlation that we observed between the typicality and likeability ratings is consistent with previous findings in the face-processing literature suggesting that people find more average or prototypical faces to be particularly attractive (Halberstadt & Rhodes, 2003). Interestingly, the face-likeness dimension was not related to either typicality or likeability.

Although the DalHouses and the associated ratings represent a valuable contribution to the face-processing literature, the present study does have some limitations. Our participant sample was relatively homogeneous: the majority of participants were young female undergraduate students enrolled in at least one psychology course. Furthermore, all participants lived in the same city in which the house stimuli had been photographed, so it is possible that they were somewhat more familiar with the style of houses depicted than participants from other areas of the country or world would be. Although we did not gather information about participants’ home towns, approximately 56 % of Dalhousie students come from provinces other than Nova Scotia, and about 14 % come from countries other than Canada (Dalhousie University, 2014). We also attempted to mitigate this potential issue by photographing a wide range of house styles that we believe are not specific to our geographic area. Nonetheless, it will be valuable for researchers to obtain additional ratings of the DalHouses across a broader participant sample in the future.

In conclusion, the present study involved the development of a set of high-quality, well-controlled house stimuli for use in future face- and object-processing research. We believe that this stimulus set (i.e., the DalHouses) will be useful to other face-processing researchers because it will minimize the effort required to acquire stimuli and allow for easier replication and extension across studies.