Beauty is mysterious. We know it when we see it, but it eludes explanation. One facet of beauty that has been explored is its relationship to category structure, and in psychology, two possible relationships have been suggested. The first line of research explores how beauty arises from the feature structure of categories. For example, golden retrievers may be considered beautiful dogs because they are typical of the category dogs, sharing many features with other dogs. A second line of research instead explores how beauty contributes to the structure of categories, as when beautiful individuals are perceived to be better leaders and more electable (Berggren, Jordahl, & Poutvaara, 2010). Together, these two views present a conundrum: Beauty is viewed as both arising from and contributing to the structure of categories. The present study disentangles these two accounts.

The first line of research, which we will refer to as the similarity-based view, holds that similarity relationships among category members play an important role in determining beauty. According to this view, items that are central by virtue of sharing features with other category members tend to be judged typical of their category and are processed more fluently (Nosofsky, 1988; Rosch, 1975; Storms, De Boeck, & Ruts, 2000). Fluency and familiarity are theorized to increase positive affect (Zajonc, 1968), which, in turn, is thought to increase perceptions of beauty (Reber, Schwarz, & Winkielman, 2004). For example, among a golden retriever, a dachshund, and a Great Dane, the golden retriever is the most similar to other dogs in its size, proportions, and other characteristics and, according to the similarity-based view, should be judged the most typical and the most beautiful of the three. Analogously, face morphs (Langlois et al., 2000), line drawings of animals (Halberstadt & Rhodes, 2003), and a variety of other real-world (Halberstadt, 2006) and artificial (Winkielman, Halberstadt, Fazendeiro, & Catty, 2006) stimuli with features that are central for their category are judged to be more beautiful than atypical category members. Importantly, the similarity-based view predicts that, by virtue of being fluently processed, highly typical objects should be viewed as the most beautiful.

According to a second line of research, referred to as the theory-based view, people’s perceptions of an item’s beauty combine with prior beliefs about the category to shape the structure of the category. Rather than arising from category structure, beauty can be a determinant of category structure. This view follows from theories of categorization suggesting that people’s prior beliefs, expectations, and intuitive theories of the category, rather than featural-similarity relationships among category members, determine typicality structure (Heit, 1997; Murphy & Medin, 1985; Wisniewski & Medin, 1994). On this view, Yao Ming is a good example of a professional basketball player because he satisfies certain expectations about the category (e.g., high scoring percentage, good rebounder, etc.), not because he shares many features with other category members.

In the case of basketball players, beautiful players are not necessarily viewed as central or typical category members, because beauty does not play a central role in people’s intuitive theories concerning basketball. However, in other domains, such as art, beauty does play a prominent role in people’s intuitive theories and, therefore, should influence category structure. Thus, depending on the role beauty plays in people’s prior beliefs and expectations about a category, the theory-based view suggests that being beautiful (or not) can make an object either more or less typical of that category.

One example of a theory-based attribution is the halo effect (Asch, 1946; Thorndike, 1920), whereby attractive individuals are perceived as more socially competent (Eagly, Ashmore, Makhijani, & Longo, 1991; Feingold, 1992), happier (Dion, Berscheid, & Walster, 1972), more trustworthy (Wilson & Eckel, 2006), and more competent in their occupations than others (Langlois et al., 2000). From a theory-based view, beautiful objects are not beautiful because they are typical or share more features with other category members. Rather, beauty can make an object seem more typical of its category when the category is associated with other positive characteristics (e.g., intelligence) or people have a prior expectation about how beauty relates to the category.

Thus, we broadly define similarity-based views as bottom-up processing of category members’ features and theory-based views as top-down reasoning based on category labels. These two views, although divergent, are not necessarily mutually exclusive: There are multiple determinants of category typicality (e.g., Barsalou, 1985; Lynch, Coley, & Medin, 2000), and bottom-up and top-down processes could be active simultaneously. For present purposes, however, we define the two views so that they make distinct, testable predictions.

We evaluated the two views using a task in which subjects learned to categorize works of abstract modern art as pieces from a college seniors’ art show or an art museum. The paintings used in our task were created by professional artists (Table 1), were largely unfamiliar to the subject pool, and were found in a previous multidimensional scaling (MDS) study to vary on two psychological dimensions: geometry, or how curvilinear versus angular a painting was, and complexity, or how “busy” the painting appeared (Fig. 1). The paintings were grouped, between subjects, into two categories on the basis of their similarity along one of the dimensions. Some subjects learned a category structure in which the paintings were grouped on the basis of differences in geometry, whereas other subjects learned a structure in which paintings were grouped on the basis of differences in complexity (Fig. 2). We followed others (Palmeri & Blalock, 2000; Wisniewski & Medin, 1994) in using meaningful category labels for these groupings to test the effects of theories; both the art museum and the student art show were perceived as equally likely sources of the paintings (see the Supplementary material).

Table 1 Paintings used in both the multidimensional scaling (MDS) study and the rating study
Fig. 1 The stimuli organized into four quadrants defined by two dimensions, geometry and complexity. Geometry describes the angularity of the lines and shapes in a painting, whereas complexity arises from the number of shapes and degree of overlap in a painting.

Fig. 2 The four possible combinations of category structure and labeling. In the categorization task, subjects were trained to categorize paintings as either student art show or art museum pieces. Paintings with roughly the same level of either geometry or complexity (see Fig. 1) were grouped together to form a category.

By systematically manipulating the grouping of paintings and the category labels associated with them, it is possible to test key predictions from both the similarity- and theory-based views of beauty. Both views predict that typicality and beauty will be correlated, but they differ in the causal direction of that relationship (whether typicality gives rise to beauty or beauty gives rise to typicality) and in whether the relationship should differ between the art museum and student art show category labels.

The similarity-based view predicts that featural similarity drives typicality and processing fluency, thereby affecting perceptions of beauty. According to this view, perceived typicality should differ depending on how the paintings are grouped. Because the present study uses two strongly contrasting categories, items that are furthest from members of the opposing category in the MDS space (i.e., that share the fewest features with opposing category members) should be processed most fluently and perceived as most typical (Davis & Love, 2010). Thus, the paintings at the extremes of the dimension used for grouping (e.g., highly complex paintings or very simple paintings, when the grouping dimension is complexity) should be rated the most typical of their categories, the most fluently processed, and hence, from a similarity-based view, the most beautiful.

In contrast, the theory-based view does not predict that changes in typicality and fluency caused by differences in grouping will affect perceptions of beauty. Rather, this view suggests that perceived beauty should impact typicality structure, as per the halo effect. More important, it also predicts a difference in this effect depending on category label, because subjects may have different prior expectations about how beauty relates to art museums and student art shows. Appearing in an art museum indicates that a piece is considered by experts to be beautiful or valuable (Danto, 1981). Artworks are also expected to be beautiful when created by famous artists (Isham, Ekstrom, & Banks, 2010), and the same pieces are perceived as more beautiful when created by a professional rather than by an amateur (Duerksen, 1972) or by a computer (Kirk, Skov, Hulme, Christensen, & Zeki, 2009). We confirmed that these findings extend to the paintings in our stimulus set (see the Supplementary material). Thus, subjects likely expect that paintings appearing in an art museum, a place populated with the work of famous, professional artists, will be beautiful. In contrast, appearing in a student art show does not carry this strong positive connotation. Specifically, the theory-based view predicts that beauty will be insensitive to the groupings of paintings along the two dimensions (geometry and complexity) and the associated changes in typicality and fluency. Instead, beauty will lead to increases in typicality, but only when the paintings are labeled as art museum pieces.

Method

Subjects

Ninety-three undergraduates from the University of Texas participated for class credit. Five were excluded for failing to exceed chance in the learning phase; mean categorization accuracy for the remaining 88 subjects was 81.7% (SD = 8 percentage points).
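
How “exceeding chance” was assessed is not stated. As one hypothetical operationalization, a subject could be required to pass a one-tailed exact binomial test against the two-category chance rate of .5. The sketch below finds the smallest number of correct responses that does so, under the additional assumption that all 60 learning trials are pooled; the function names are illustrative and are not the authors’ procedure.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """Exact probability of k or more correct out of n trials at chance rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n, alpha=0.05, p=0.5):
    """Smallest number correct whose one-tailed binomial p-value is <= alpha.

    Hypothetical exclusion criterion: the source does not name the test used.
    """
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None  # no score reaches significance (only possible for very small n)


if __name__ == "__main__":
    threshold = min_correct_above_chance(60)  # assumes all 60 trials are pooled
    print(f"At least {threshold} of 60 correct needed at alpha = .05")
```

A stricter variant might apply the same test only to the final learning block; which choice the authors made is unknown.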

Materials

Stimuli consisted of 20 abstract paintings (see Fig. 1) without a recognizable topic to ensure that subjects focused on paintings’ perceptual characteristics instead of their subject matter. These paintings had been found in the earlier MDS study to vary continuously along two perceptual dimensions, geometry and complexity.

Design and procedure

Categorization task

Paintings were grouped into four quadrants depending on their values along the geometry and complexity dimensions (see Fig. 1). Counterbalanced across subjects, two adjacent quadrants (roughly matching on geometry or complexity) were assigned a category label (“student art show” or “art museum”), with the remaining stimuli assigned the other label (see Fig. 2).
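
The four counterbalancing conditions can be sketched as follows. This is an illustration only: the coordinates are hypothetical stand-ins for the MDS solution, and placing the category boundary at 0 on each dimension is an assumption. Splitting by the sign of one coordinate puts two adjacent quadrants in each category, as in Fig. 2.

```python
from itertools import product

# Hypothetical (geometry, complexity) coordinates, centered so that the category
# boundary on each dimension lies at 0. Higher geometry = more angular,
# higher complexity = busier. These are NOT the study's MDS values.
EXAMPLE_PAINTINGS = {
    "p1": (1.2, -0.5),
    "p2": (-0.8, 0.9),
    "p3": (0.3, 0.4),
    "p4": (-1.0, -1.1),
}

LABELS = ("art museum", "student art show")
DIMENSIONS = {"geometry": 0, "complexity": 1}

# Four between-subjects conditions: grouping dimension x which label
# the high side of that dimension receives.
CONDITIONS = list(product(DIMENSIONS, LABELS))


def assign_labels(paintings, grouping_dim, high_label):
    """Label each painting by which side of the boundary it falls on."""
    if high_label not in LABELS:
        raise ValueError(f"unknown label: {high_label}")
    low_label = LABELS[1] if high_label == LABELS[0] else LABELS[0]
    idx = DIMENSIONS[grouping_dim]
    return {
        name: high_label if coords[idx] > 0 else low_label
        for name, coords in paintings.items()
    }


if __name__ == "__main__":
    for dim, high in CONDITIONS:
        print(dim, "| high side ->", high, "|", assign_labels(EXAMPLE_PAINTINGS, dim, high))
```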

During learning, each subject completed three blocks of trials; in each block, all 20 stimuli were presented once, in random order, for a total of 60 learning trials. On each trial, subjects were presented with a painting and were instructed to categorize it, on the basis of its visual forms, as a piece from an art museum or a student art show. After they responded, the screen cleared and feedback indicating the correct category was presented for 3,000 ms. Following feedback, a white screen was presented for 1,000 ms.
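
A minimal sketch of the blocked randomization described above, assuming hypothetical painting IDs and an optional seed for reproducibility. It builds only the trial order; response collection and screen timing would be handled by whatever presentation software is used, so the reported durations appear here only as constants.

```python
import random

FEEDBACK_MS = 3000  # feedback duration reported in the text
BLANK_MS = 1000     # white screen after feedback


def learning_schedule(stimuli, n_blocks=3, seed=None):
    """Return (block, stimulus) pairs: each stimulus once per block, reshuffled per block."""
    rng = random.Random(seed)
    trials = []
    for block in range(1, n_blocks + 1):
        order = list(stimuli)
        rng.shuffle(order)
        trials.extend((block, s) for s in order)
    return trials


if __name__ == "__main__":
    paintings = [f"painting_{i:02d}" for i in range(1, 21)]  # hypothetical IDs
    schedule = learning_schedule(paintings, seed=1)
    print(len(schedule), "trials; first five:", schedule[:5])
```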

Rating tasks

After the category-learning task, subjects rated each painting’s typicality (how well the painting represented its category), beauty (how appealing its visual forms were), and value (how valuable it was), in three separate tasks completed in that order. Within each task, each painting was presented once, in random order. On each trial, subjects were presented with a painting and, 2,500 ms later, a 7-point scale whose low, middle, and high points were labeled not at all, somewhat, and extremely, respectively, with respect to the characteristic being rated. After subjects keyed their rating, a white screen was presented for 1,500 ms.

Results

Relationships among the basic variables

Ratings of typicality, beauty, and value had high interrater reliability, as measured by Cronbach’s coefficient alpha (.81, .97, and .93, respectively). For descriptive purposes, we averaged over subjects to obtain mean ratings for beauty, value, and typicality for each painting, so the correlations below are computed across the 20 paintings (df = 18). Beauty and value ratings were highly correlated in both categories [art museum, r = .91, t(18) = 9.38, p < .001; student art show, r = .97, t(18) = 16.16, p < .001], suggesting that subjects considered the same quality of the paintings when rating both characteristics. Overall, neither beauty nor value was significantly correlated with typicality [beauty, r = .14, t(18) = 0.60, n.s.; value, r = .23, t(18) = 1.00, n.s.]; we explore the impact of category labels on this relationship in our hypothesis tests below. Given the strong correlation between beauty and value, subsequent analyses focused on beauty, the main variable of interest.
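
For readers unfamiliar with alpha as an interrater index, the sketch below computes it in the usual way, treating raters as the “items” and paintings as the cases: alpha = k/(k − 1) × (1 − sum of rater variances / variance of summed ratings), where k is the number of raters. The data in the usage lines are hypothetical; the reported values cannot be recovered without the raw ratings.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as 'items' and paintings as cases.

    ratings[i][j] is rater j's rating of painting i.
    """
    k = len(ratings[0])  # number of raters
    if k < 2:
        raise ValueError("alpha needs at least two raters")
    # Variance of each rater's ratings across paintings.
    rater_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the per-painting sum over raters.
    total_var = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 7-point ratings: 4 paintings x 3 raters.
    ratings = [[5, 6, 5], [2, 3, 2], [4, 4, 5], [6, 7, 6]]
    print(round(cronbach_alpha(ratings), 3))
```

Population and sample variances give the same alpha here, since the ratio is unchanged as long as one is used consistently.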

Typicality and the measures of processing fluency were also largely intercorrelated: Typicality and categorization reaction time were significantly negatively correlated, r = −.56, t(18) = −2.89, p < .01; typicality and categorization accuracy were significantly positively correlated, r = .54, t(18) = 2.69, p = .015; and reaction time and categorization accuracy were negatively correlated, but not significantly, r = −.37, t(18) = −1.68, n.s.
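
Because each correlation above is computed over 20 painting means, its significance test uses t = r√(n − 2) / √(1 − r²) with n − 2 = 18 degrees of freedom. A small sketch of that conversion, with illustrative helper names:

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


if __name__ == "__main__":
    # 20 painting means -> df = 18, as in the correlations reported above.
    for r in (0.14, 0.23, -0.37):
        print(f"r = {r:+.2f} -> t(18) = {r_to_t(r, 20):.2f}")
```

Plugging in the rounded values r = .14 and r = .23 gives t ≈ 0.60 and t ≈ 1.00, matching the text. For other coefficients small discrepancies are expected because r is reported to only two decimals (e.g., r = −.37 gives about −1.69 rather than the reported −1.68).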

Similarity-based versus theory-based views of beauty

Similarity- and theory-based views predict different patterns of results in terms of how the category structure (grouping of stimuli with respect to geometry or complexity dimensions) and category label (art museum or student art show) factors will relate to our measures of fluency and subjects’ perceptions of beauty. The similarity-based view predicts that stimulus grouping should affect perceptions of typicality, processing fluency, and subjects’ perceptions of beauty. The theory-based view suggests that beauty will not be affected by changes in category structure or processing fluency but, rather, will lead beautiful items to be perceived as more typical and processed more fluently when they are labeled with the art museum label. We test these predictions in a series of cross-classified random-effects models that estimate the relationships between category structure, beauty, and measures of typicality and fluency while controlling for subject- and painting-level variability (Baayen, Davidson, & Bates, 2008). Conceptually, these random-effects models are akin to running a separate regression for each subject and testing whether the mean slopes (bs below) relating our variables (e.g., typicality and beauty) are significantly different from zero across subjects. However, by estimating each subject’s slope simultaneously, we are able to pool information from the group-level data to better estimate individual subject slopes and simultaneously account for mean differences in our measures (e.g., beauty) between paintings.
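
The two-stage analysis described conceptually above can be sketched as follows: fit an ordinary least-squares slope per subject, then test the mean slope against zero with a one-sample t test. This simplified sketch is not the cross-classified model reported here; it neither pools information across subjects nor accounts for painting-level variability. (In R’s lme4, a model of the kind described might be written as typicality ~ beauty + (1 + beauty | subject) + (1 | painting), though the exact specification used is not given.) The data structure and names are hypothetical.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x for one subject."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def two_stage_slope_test(data):
    """data maps subject -> (x values, y values) across paintings.

    Stage 1: fit a separate regression for each subject.
    Stage 2: one-sample t test of the subject slopes against zero.
    Returns (mean slope, t, df). No partial pooling is performed.
    """
    slopes = [ols_slope(x, y) for x, y in data.values()]
    n = len(slopes)
    m = mean(slopes)
    se = stdev(slopes) / math.sqrt(n)
    return m, m / se, n - 1


if __name__ == "__main__":
    beauty = [1, 2, 3, 4, 5]  # hypothetical beauty ratings of five paintings
    data = {
        "s1": (beauty, [2.1, 2.4, 2.4, 2.9, 3.0]),
        "s2": (beauty, [3.0, 3.1, 3.5, 3.4, 3.9]),
        "s3": (beauty, [1.8, 2.2, 2.1, 2.6, 2.7]),
    }
    print(two_stage_slope_test(data))
```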

Category structure affects processing fluency and perceptions of typicality, but not beauty

Because the similarity-based view does not predict differences based on label, ratings were collapsed across the two label conditions. Consistent with previous research using strongly contrasting category pairs (Davis & Love, 2010), typicality and measures of processing fluency increased as a painting became more distant from the boundary separating categories along a grouping dimension (e.g., more angular in the high geometry category or more curvilinear in the low geometry category, when paintings were grouped with respect to geometry). Distance-from-the-bound significantly predicted typicality, b = .21, t(87) = 2.58, p = .01, reaction time, b = −.12, t(87) = −4.78, p < .001, and probability correct, b = .33, z = 5.27, p < .001, such that as paintings became more extreme in relation to the grouping dimension, they were perceived as more typical and were categorized more quickly and more accurately. However, distance-from-the-bound did not affect ratings of beauty, b = −.04, t(87) = −0.57, n.s. Instead, across subjects, paintings’ beauty ratings when categories were grouped with respect to geometry were very similar to and highly correlated with their beauty ratings when categories were grouped with respect to complexity, r = .94, t(18) = 11.35, p < .001 (see Fig. 3). These results are inconsistent with the similarity-based view: The paintings that were processed fluently and perceived as typical changed when the grouping dimension changed, but the paintings that were rated as beautiful did not.
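
Distance-from-the-bound can be illustrated with a short sketch. The coordinates are hypothetical, and the boundary is assumed to lie at 0 on each dimension. The point of the example is that the same paintings are ordered differently when the grouping dimension changes, so a similarity-based account would predict that beauty rankings shift as well, which is not what was observed.

```python
DIMENSIONS = {"geometry": 0, "complexity": 1}


def distance_from_bound(coords, grouping_dim, boundary=0.0):
    """Absolute distance from the category boundary along the grouping dimension."""
    return abs(coords[DIMENSIONS[grouping_dim]] - boundary)


def rank_by_distance(paintings, grouping_dim, boundary=0.0):
    """Painting names ordered from farthest to nearest the boundary."""
    return sorted(
        paintings,
        key=lambda name: distance_from_bound(paintings[name], grouping_dim, boundary),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical (geometry, complexity) coordinates, not the study's MDS values.
    paintings = {"p1": (1.2, -0.5), "p2": (-0.8, 0.9), "p3": (0.3, 0.4), "p4": (-1.0, -1.1)}
    print("geometry grouping:  ", rank_by_distance(paintings, "geometry"))
    print("complexity grouping:", rank_by_distance(paintings, "complexity"))
```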

Fig. 3 Relationship between mean beauty ratings when category structure is determined by grouping paintings by shared geometry versus complexity. Inconsistent with the similarity-based view, judgments of beauty are unaffected by category structure.

Beauty contributes to typicality in the art museum condition

The theory-based view predicts that beauty should increase typicality and, by extension, processing fluency for paintings assigned the art museum category label, a label associated with prior expectations of beauty. Because category structure is not predicted to affect typicality and fluency under this view, ratings were collapsed across grouping conditions. As was predicted, beauty and category label interacted significantly for typicality and for both processing fluency measures, such that the relationship between beauty and each measure was significantly stronger for paintings labeled as art museum (AM) pieces than for paintings labeled as student art show (SS) pieces [typicality, b AM = .22 vs. b SS = −.01, t(87) = 3.72, p = .002; reaction time, b AM = −.04 vs. b SS = .01, t(87) = −2.55, p = .02; probability correct, b AM = .22 vs. b SS = −.01, z = 4.59, p < .001] (see Table 2 and Fig. 4). For art museum pieces, these relationships were significantly different from zero [typicality, t(87) = 4.84, p < .001; reaction time, t(87) = 2.55, p = .01; probability correct, z = 4.83, p < .001]. However, for student show paintings, the effect of beauty was not significant [typicality, t(87) = −0.23, n.s.; reaction time, t(87) = 0.59, n.s.; probability correct, z = −0.25, n.s.]. These results are consistent with a theory-based view: Beauty affected category structure by contributing to how typical paintings were perceived to be and how fluently they were processed, but this effect was significant only for the art museum category, which has a strong prior association with beauty.
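
The interaction can be read as a difference between simple slopes. In an ordinary least-squares model with an intercept, a label dummy, beauty, and a beauty × label term, the interaction coefficient equals the beauty slope within the AM condition minus the beauty slope within the SS condition, and those within-condition slopes are the same as fitting each condition separately. The sketch below uses that equivalence on hypothetical data; it is a single-level simplification of the mixed-effects models reported, which also model subject and painting variability.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def slopes_by_label(beauty, typicality, labels):
    """Beauty -> typicality slope fitted separately within each label condition."""
    out = {}
    for lab in sorted(set(labels)):
        xs = [b for b, l in zip(beauty, labels) if l == lab]
        ys = [t for t, l in zip(typicality, labels) if l == lab]
        out[lab] = ols_slope(xs, ys)
    return out


if __name__ == "__main__":
    # Hypothetical mean ratings for four paintings under each label.
    beauty = [2.0, 3.5, 4.0, 5.5, 2.0, 3.5, 4.0, 5.5]
    typicality = [3.1, 3.6, 3.9, 4.2, 3.8, 3.7, 3.9, 3.8]
    labels = ["AM"] * 4 + ["SS"] * 4
    s = slopes_by_label(beauty, typicality, labels)
    print(s, "interaction (AM - SS):", s["AM"] - s["SS"])
```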

Table 2 Combined mixed effects models to predict measures of fluency from distance-from-the-bound, beauty, and category label
Fig. 4 Typicality as a function of beauty. Consistent with the theory-based view, more beautiful paintings are rated as more typical for the art museum category, but not for the student art show category.

Notably, when considered simultaneously, both beauty and distance-from-the-bound contributed significantly to fluency and typicality (see Table 2), suggesting that beauty and grouping contribute independently to the art museum category’s typicality structure. Thus, even though groupings did not impact painting beauty, both similarity- and theory-based factors may influence category typicality structure.
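
What it means for two predictors to contribute “when considered simultaneously” can be illustrated with a two-predictor least-squares fit, solved here from the centered normal equations. This is a single-level sketch with hypothetical variable names (beauty and distance-from-the-bound standing in as x1 and x2); the models in Table 2 additionally include random effects for subjects and paintings.

```python
from statistics import mean


def two_predictor_ols(x1, x2, y):
    """Return (intercept, b1, b2) for y ~ x1 + x2 by ordinary least squares."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]  # centered predictors and outcome
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12**2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2x2 normal equations [s11 s12; s12 s22] [b1, b2]' = [s1y, s2y]'.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


if __name__ == "__main__":
    beauty = [2.0, 3.5, 4.0, 5.5, 3.0, 4.5]        # hypothetical
    distance = [0.2, 1.1, 0.4, 0.9, 1.3, 0.6]      # hypothetical
    typicality = [3.2, 4.3, 3.9, 4.8, 4.4, 4.3]    # hypothetical
    print(two_predictor_ols(beauty, distance, typicality))
```

Because each coefficient is estimated while holding the other predictor fixed, a nonzero b1 and b2 together indicate that each predictor accounts for variance in typicality beyond the other.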

Discussion

Together, the results are inconsistent with the predictions of the similarity-based view and in accord with the theory-based view. Beauty does not arise from increases in typicality or fluency caused by category contrast. Instead, beauty contributes to the structure of categories for which subjects have strong prior expectations about beauty: Category members that are perceived as beautiful are viewed as more typical and are more fluently processed. Indeed, the fluency with which paintings were processed varied across different groupings in our experiment, but beauty did not. We also observed a halo effect whereby art museum paintings that were considered beautiful were rated more typical and processed more fluently. From the theory-based view, the differing impact of beauty on typicality under the two category labels reflects differences in subjects’ expectations. Art museums are expected to contain beautiful and valuable artworks, which causes beautiful paintings to be considered better, more typical examples of art museum pieces. Because student art shows are not as strongly associated with beautiful art, beauty did not contribute to this category’s typicality structure.

One potential criticism of the present study is that we have left beauty itself unexplained, a je ne sais quoi that paintings either have or do not have. Because we do not offer an account of beauty’s origins, a similarity-minded researcher might suggest that beauty is determined by similarity to an abstract art concept collated over an individual’s lifespan, rather than to the art museum and student art show categories that subjects learned here. Although we do not discount the role that previous experience may play in shaping perceptions of beauty, the mechanism by which similarity to a long-term average would affect perceptions of beauty is not clear. From a similarity-based view, averages are thought to affect perceptions of beauty via processing fluency. Our results demonstrate that fluency, in and of itself, did not give rise to perceptions of beauty in the present experiment, so a similarity-based view that depended on similarity to a long-term average would need to offer a different mechanism. Indeed, an approach that treated processing fluency as a cause of beauty would have difficulty explaining why student art show paintings that were processed more fluently were not rated as more beautiful.

Our findings may also help explain some observations in the beauty-in-averageness literature that are inconsistent with pure fluency-based accounts. Although averageness has been found to predict beauty in a number of real-world categories, there are cases beyond the present experiment in which it does not. For example, typical spiders are not rated as the most attractive or beautiful (Halberstadt, 2006), although typical dogs, fish, and wristwatches are (Halberstadt & Rhodes, 2003). This difference may be explained by our theories about these categories: We tend to view spiders as unpleasant and even dangerous, a more negative prior expectation than we hold about dogs or wristwatches. Similarly, student art shows are expected to contain less beautiful art than art museums. These findings are inconsistent with pure fluency-based accounts but in line with a theory-based view, which predicts a relationship between beauty and typicality only in categories that are viewed positively or associated with beauty.

These examples also suggest that the two views need not be mutually exclusive. Depending on the domain and how relevant theories are to it, theory-based effects, similarity-based effects, or both could be manifested. Unlike patterns of dots or simple drawings of wristwatches, artworks are complex, beauty-relevant, and associated with different cultural practices and personal experiences. As a result, we may hold strong theories about artworks that shape our perception of their beauty, whereas featural similarity may exert a greater influence on the perceived beauty of simpler stimuli for which we lack strong theories.

In summary, we explored one aspect of beauty’s nature: its relationship to individuals’ theories and perceptions of categories. Our results suggest that beauty is not merely a reflection of category structure, as is predicted by a similarity-based view. Instead, the relationship between beauty and category structure may be more complex than can be captured by similarity alone. In judgments of real-world stimuli, beauty itself can influence the structure of categories, in line with a theory-based explanation. Beauty remains mysterious; however, we have made some progress here in understanding it.