Is better beautiful or is beautiful better? Exploring the relationship between beauty and category structure
We evaluate two competing accounts of the relationship between beauty and category structure. According to the similarity-based view, beauty arises from category structure such that central items are favored due to their increased fluency. In contrast, the theory-based view holds that people’s theories of beauty shape their perceptions of categories. In the present study, subjects learned to categorize abstract paintings into meaningfully labeled categories and rated the paintings’ beauty, value, and typicality. Inconsistent with the similarity-based view, beauty ratings were highly correlated across conditions despite differences in fluency and assigned category structure. Consistent with the theory-based view, beautiful paintings were treated as central members for categories expected to contain beautiful paintings (e.g., art museum pieces), but not in others (e.g., student show pieces). These results suggest that the beauty of complex, real-world stimuli is not determined by fluency within category structure but, instead, interacts with people’s prior knowledge to structure categories.
Keywords: Aesthetic preferences; Categorization; Halo effect; Fluency
Beauty is mysterious. We know it when we see it, but it eludes explanation. One facet of beauty that has been explored is its relationship to category structure, and in psychology, two possible relationships have been suggested. The first line of research explores how beauty arises from the feature structure of categories. For example, golden retrievers may be considered beautiful dogs because they are typical of the category dogs, sharing many features with other dogs. A second line of research instead explores how beauty contributes to the structure of categories, as when beautiful individuals are perceived to be better leaders and more electable (Berggren, Jordahl, & Poutvaara, 2010). Together, these two views present a conundrum: Beauty is viewed as both arising from and contributing to the structure of categories. The present study disentangles these two accounts.
The first line of research, which we will refer to as the similarity-based view, holds that similarity relationships among category members play an important role in determining beauty. According to this view, items that are central by virtue of sharing features with other category members tend to be judged typical of their category and are processed more fluently (Nosofsky, 1988; Rosch, 1975; Storms, De Boeck, & Ruts, 2000). Fluency and familiarity are theorized to increase positive affect (Zajonc, 1968), which, in turn, is thought to increase perceptions of beauty (Reber, Schwarz, & Winkielman, 2004). For example, among a golden retriever, a dachshund, and a Great Dane, the golden retriever is the most similar to other dogs in its size, proportions, and other characteristics and should be judged the most typical and the most beautiful of the three, according to the similarity-based view. Analogously, face morphs (Langlois et al., 2000), line drawings of animals (Halberstadt & Rhodes, 2003), and a variety of other real-world (Halberstadt, 2006) and artificial (Winkielman, Halberstadt, Fazendeiro, & Catty, 2006) stimuli with features that are central for their category are judged to be more beautiful than atypical category members. Importantly, the similarity-based view predicts that, by virtue of being fluently processed, highly typical objects should be viewed as the most beautiful.
According to a second line of research, referred to as the theory-based view, people’s perceptions of an item’s beauty combine with prior beliefs about the category to shape the structure of the category. Rather than arising from category structure, beauty can be a determinant of category structure. This view follows from theories of categorization suggesting that people’s prior beliefs, expectations, and intuitive theories of the category, rather than featural-similarity relationships among category members, determine typicality structure (Heit, 1997; Murphy & Medin, 1985; Wisniewski & Medin, 1994). On this view, Yao Ming is a good example of a professional basketball player because he satisfies certain expectations about the category (e.g., high scoring percentage, good rebounder, etc.), not because he shares many features with other category members.
In the case of basketball players, beautiful players are not necessarily viewed as central or typical category members, because beauty does not play a central role in people’s intuitive theories concerning basketball. However, in other domains, such as art, beauty does play a prominent role in people’s intuitive theories and, therefore, should influence category structure. Thus, depending on the role beauty plays in people’s prior beliefs and expectations about a category, the theory-based view suggests that being beautiful (or not) can make an object either more or less typical of that category.
One example of a theory-based attribution is the halo effect (Asch, 1946; Thorndike, 1920), whereby attractive individuals are perceived as more socially competent (Eagly, Ashmore, Makhijani, & Longo, 1991; Feingold, 1992), happier (Dion, Berscheid, & Walster, 1972), more trustworthy (Wilson & Eckel, 2006), and more competent in their occupations than others (Langlois et al., 2000). From a theory-based view, beautiful objects are not beautiful because they are typical or share more features with other category members. Rather, beauty can make an object seem more typical of its category when the category is associated with other positive characteristics (e.g., intelligence) or people have a prior expectation about how beauty relates to the category.
Thus, we broadly define similarity-based views as bottom-up processing of category members’ features and theory-based views as top-down reasoning based on category labels. These two views, although divergent, are not necessarily mutually exclusive: There are multiple determinants of category typicality (e.g., Barsalou, 1985; Lynch, Coley, & Medin, 2000), and bottom-up and top-down processes could be active simultaneously. However, we clearly define the views such that they make different, testable predictions.
Table 1. Paintings used in both the multidimensional scaling (MDS) study and the rating study. Columns: painting title; MDS coordinate: geometry; MDS coordinate: complexity; mean beauty rating. Paintings listed: Cadmium Red Over Black; Octavio Paz Suite–Nocturne VI; Beside the Sea #42; Ochre and Black; Rite of Passage III; Trees in Blossom; Red Square: Painterly Realism of a Peasant Woman in Two Dimensions; Composition with Red, Blue and Yellow; Red, Orange, Tan and Purple; Collage with Squares Arranged According to the Laws of Chance; Composition No. 10.
By systematically manipulating the grouping of paintings and the category labels associated with them, it is possible to test key predictions from both the similarity- and theory-based views of beauty. Both views predict that typicality and beauty will be correlated but differ in terms of the direction of the correlation and how the relationship between beauty and typicality will differ between the art museum and student art show category labels.
The similarity-based view predicts that featural similarity drives typicality and processing fluency, thereby affecting perceptions of beauty. According to this view, perceived typicality should differ depending on how the paintings are grouped. Because the present study uses two strongly contrasting categories, items that are furthest from members of the opposing category in the MDS space (i.e., share the fewest features with opposing category members) are processed most fluently and are perceived as more typical (Davis & Love, 2010). Thus, the paintings at the extremes of the dimension used for grouping (e.g., highly complex paintings or very simple paintings, when the grouping dimension is complexity) should be rated the most typical of their categories, the most fluently processed, and hence, from a similarity-based view, the most beautiful.
In contrast, the theory-based view does not predict that changes in typicality and fluency caused by differences in grouping will affect perceptions of beauty. Rather, this view suggests that perceived beauty should impact typicality structure, as per the halo effect. More important, it also predicts a difference in this effect depending on category label, because subjects may have different prior expectations about how beauty relates to art museums and student art shows. Appearing in an art museum indicates that a piece is considered by experts to be beautiful or valuable (Danto, 1981). Artworks are also expected to be beautiful when created by famous artists (Isham, Ekstrom, & Banks, 2010), and the same pieces are perceived as more beautiful when created by a professional rather than by an amateur (Duerksen, 1972) or by a computer (Kirk, Skov, Hulme, Christensen, & Zeki, 2009). We confirmed that these findings extend to the paintings in our stimulus set (see the Supplementary material). Thus, subjects likely expect that paintings appearing in an art museum, a place populated with the work of famous, professional artists, will be beautiful. In contrast, appearing in a student art show does not carry this strong positive connotation. Specifically, the theory-based view predicts that beauty will be insensitive to the groupings of paintings along the two dimensions (geometry and complexity) and the associated changes in typicality and fluency. Instead, beauty will lead to increases in typicality, but only when the paintings are labeled as art museum pieces.
Ninety-three undergraduates from the University of Texas participated for class credit. Five were excluded for failing to exceed chance in the learning phase; mean categorization accuracy for all others was 81.7% (SD = 0.08).
Stimuli consisted of 20 abstract paintings (see Fig. 1) without a recognizable topic to ensure that subjects focused on paintings’ perceptual characteristics instead of their subject matter. These paintings were determined to vary continuously along two perceptual dimensions, geometry and complexity.
Design and procedure
Paintings were grouped into four quadrants depending on their values along the geometry and complexity dimensions (see Fig. 1). Counterbalanced across subjects, two adjacent quadrants (roughly matching on geometry or complexity) were assigned a category label (“student art show” or “art museum”), with the remaining stimuli assigned the other label (see Fig. 2).
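The grouping scheme can be illustrated with a minimal sketch. It assumes that the category boundary lies at zero on the grouping dimension and that distance-from-the-bound is the absolute coordinate on that dimension; neither detail is specified in the text, and the function name, painting names, and coordinates below are hypothetical.

```python
from typing import Dict, Tuple

def assign_categories(
    coords: Dict[str, Tuple[float, float]],
    grouping_dim: str,
    high_label: str,
    low_label: str,
    bound: float = 0.0,  # assumed boundary location, not given in the source
) -> Dict[str, Tuple[str, float]]:
    """Label each painting by its side of the bound on one MDS dimension.

    coords maps a painting name to (geometry, complexity).
    Returns {name: (category label, distance from the bound)}.
    """
    axis = {"geometry": 0, "complexity": 1}[grouping_dim]
    result = {}
    for name, point in coords.items():
        value = point[axis]
        label = high_label if value >= bound else low_label
        result[name] = (label, abs(value - bound))
    return result

# Hypothetical coordinates, for illustration only.
example = {"painting_a": (0.8, -0.2), "painting_b": (-0.5, 0.6)}
by_geometry = assign_categories(
    example, "geometry", "art museum", "student art show"
)
```

Swapping the two label arguments, or passing "complexity" as the grouping dimension, gives the other counterbalancing conditions under the same assumptions.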
During learning, each subject completed three trial blocks, which consisted of the individual presentation of the 20 stimuli in a random order, for a total of 60 learning trials. On each trial, subjects were presented with a painting and were instructed to categorize it, on the basis of its visual forms, as a piece from an art museum or a student art show. After they responded, the screen cleared, and feedback was presented for 3,000 ms, indicating the correct category assignment. Following feedback, a white screen was presented for 1,000 ms.
After the category-learning task, subjects were instructed to rate each painting’s typicality (how well the painting represented its category), beauty (how appealing its visual forms were), and value (how valuable it was). The typicality, beauty, and value rating tasks followed this instruction in that order. Within each task, each painting was presented once in a random order. On each trial, subjects were presented with a painting and, 2,500 ms later, a 7-point scale with low, center, and high points labeled not at all, somewhat, and extremely in terms of the characteristic to be rated in that task. After subjects keyed their rating, a white screen was presented for 1,500 ms.
Relationships among the basic variables (Footnote 1)
Ratings of typicality, beauty, and value had high interrater reliability, as measured by Cronbach’s coefficient alpha (.81, .97, and .93, respectively). For descriptive purposes, we averaged over subjects to obtain mean ratings for beauty, value, and typicality for each painting. Beauty and value ratings were highly correlated in both categories [art museum, r = .91, t(18) = 9.38, p < .001; student art show, r = .97, t(18) = 16.16, p < .001], suggesting that subjects considered the same quality of the paintings when rating both of these characteristics. Overall, neither beauty nor value was significantly correlated with typicality [beauty, r = .14, t(18) = 0.60, n.s.; value, r = .23, t(18) = 1.00, n.s.]; we explore the impact of category labels on this relationship in our hypothesis tests below. However, given the strong correlation between beauty and value, subsequent analyses focused on beauty, the main variable of interest.
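As an illustration of the reliability measure, the following is a minimal sketch of Cronbach's alpha. It assumes that each rater is treated as an "item" and each painting as a case; the text does not state how the coefficient was arranged. The function name and toy ratings are hypothetical.

```python
from statistics import variance
from typing import List, Sequence

def cronbach_alpha(rows: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a paintings-by-raters table.

    rows[i][j] is the rating of painting i by rater j.
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of row totals)
    """
    k = len(rows[0])                      # number of raters ("items")
    columns = list(zip(*rows))            # one tuple of ratings per rater
    rater_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - rater_var_sum / total_var)

# Toy data: two raters who order three paintings identically.
toy = [[1, 2], [2, 3], [3, 4]]
alpha = cronbach_alpha(toy)
```

In this toy case the two raters differ only by a constant offset, so the coefficient reaches its maximum of 1.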
Many of the measures of processing fluency were also correlated: Typicality and categorization reaction time were significantly negatively correlated, r = −.56, t(18) = −2.89, p < .01; typicality and categorization accuracy were significantly positively correlated, r = .54, t(18) = 2.69, p = .015; and reaction time and categorization accuracy were negatively correlated, but not significantly, r = −.37, t(18) = −1.68, n.s.
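The t statistics reported alongside these correlations follow from the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2), with n = 20 paintings and hence 18 degrees of freedom. A minimal sketch follows; the function name is illustrative. Because the reported r values are rounded, recomputed t values only approximate the reported ones.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson correlation over n observations to a t statistic
    with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

# Beauty-typicality correlation reported above: r = .14 over 20 paintings.
t_beauty = t_from_r(0.14, 20)   # close to the reported t(18) = 0.60
t_value = t_from_r(0.23, 20)    # close to the reported t(18) = 1.00
```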
Similarity-based versus theory-based views of beauty (Footnote 2)
Similarity- and theory-based views predict different patterns of results in terms of how the category structure (grouping of stimuli with respect to geometry or complexity dimensions) and category label (art museum or student art show) factors will relate to our measures of fluency and subjects’ perceptions of beauty. The similarity-based view predicts that stimulus grouping should affect perceptions of typicality, processing fluency, and subjects’ perceptions of beauty. The theory-based view suggests that beauty will not be affected by changes in category structure or processing fluency but, rather, will lead beautiful items to be perceived as more typical and processed more fluently when they are labeled with the art museum label. We address each of these questions in a series of cross-classified random effects models that test the relationships between category structure, beauty, and measures of typicality and fluency while controlling for subject- and painting-level variability (Baayen, Davidson, & Bates, 2008). Conceptually, these random effect models are akin to running a separate regression for each subject and testing whether the mean slopes (bs below) relating our variables (e.g., typicality and beauty) are significantly different from zero across subjects. However, by estimating each subject’s slope simultaneously, we are able to pool information from the group-level data to better estimate individual subject slopes and simultaneously account for mean differences in our measures (e.g., beauty) between paintings.
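The conceptual analogy in the preceding paragraph can be sketched as a two-stage analysis: fit one ordinary least-squares slope per subject, then test the mean slope against zero. This is not the cross-classified model itself, since it neither pools information across subjects nor models painting-level variability. The function names and toy data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y regressed on x for one subject."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def mean_slope_t(
    data: Dict[str, Tuple[List[float], List[float]]]
) -> Tuple[float, float, int]:
    """One-sample t-test of per-subject slopes against zero.

    data maps a subject id to (predictor ratings, outcome ratings),
    e.g. (beauty, typicality). Returns (mean slope, t, degrees of freedom).
    """
    slopes = [ols_slope(x, y) for x, y in data.values()]
    n = len(slopes)
    b = mean(slopes)
    t = b / (stdev(slopes) / math.sqrt(n))
    return b, t, n - 1

# Toy data: three subjects whose slopes are 1, 2, and 3.
toy = {
    "s1": ([1, 2, 3], [1, 2, 3]),
    "s2": ([1, 2, 3], [2, 4, 6]),
    "s3": ([1, 2, 3], [3, 6, 9]),
}
b, t, df = mean_slope_t(toy)
```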
Category structure affects processing fluency and perceptions of typicality, but not beauty
Beauty contributes to typicality in the art museum condition
Table 2. Combined mixed effects models to predict measures of fluency from distance-from-the-bound, beauty, and category label. Predictors include the beauty × category label interaction.
Notably, when considered simultaneously, both beauty and distance-from-the-bound contributed significantly to fluency and typicality (see Table 2), suggesting that beauty and grouping contribute independently to the art museum category’s typicality structure. Thus, even though groupings did not impact painting beauty, both similarity- and theory-based factors may influence category typicality structure.
Together, the results are inconsistent with the predictions of the similarity-based view and in accord with the theory-based view. Beauty does not arise from increases in typicality or fluency caused by category contrast. Instead, beauty contributes to the structure of categories for which subjects have strong prior expectations about beauty: Category members that are perceived as beautiful are viewed as more typical and are more fluently processed. Indeed, the fluency with which paintings were processed varied across different groupings in our experiment, but beauty did not. Instead, we observed a halo effect whereby art museum paintings that were considered beautiful were rated more typical and processed more fluently. The theory-based view suggests that the different impact of beauty on typicality between the two category labels reflects differences in subjects’ expectations. Art museums are expected to contain beautiful and valuable artworks, which causes beautiful paintings to be considered better, more typical examples of art museum pieces. Because student art shows are not as strongly associated with beautiful art, beauty did not contribute to this category’s typicality structure.
One potential criticism of the present study is that we have left beauty itself unexplained, a je ne sais quoi that paintings either have or do not have. Because we do not offer an account of beauty’s origins, a similarity-minded researcher may suggest that perhaps beauty is determined by similarity to an abstract art concept collated over an individual’s lifespan, not the art museum and student art show categories that subjects learned here. Although we do not discount the role that previous experience may play in shaping perceptions of beauty, the mechanism by which similarity to a long-term average would affect perceptions of beauty is not clear. From a similarity-based view, averages are thought to impact perceptions of beauty via processing fluency. Our results demonstrate that fluency, in and of itself, is not what gave rise to perceptions of beauty in the present experiment, and so a similarity-based view that depended on similarity to a long-term average would need to offer a different mechanism. Indeed, an approach that relied on processing fluency as a cause of beauty would have a difficult time explaining why student art show paintings that were processed more fluently were not rated as more beautiful.
To this end, our experiment may explain some additional observations in the beauty-in-averageness literature that are inconsistent with pure fluency-based accounts. While averageness has been found to predict beauty in a number of real-world categories, there are cases beyond the present experiment where it does not. For example, typical spiders are not rated as the most attractive or beautiful (Halberstadt, 2006), although typical dogs, fish, and wristwatches are (Halberstadt & Rhodes, 2003). This difference may be explained by our theories about these categories. Our beliefs about spiders, as unpleasant and even dangerous, are more negative than our prior expectations about dogs and wristwatches. Similarly, student art shows are expected to contain less beautiful art than art museums. These findings are in line with a theory-based view, which would predict a relationship between beauty and typicality only in positive categories or those affiliated with beauty, even though they are inconsistent with pure fluency-based accounts.
These examples also suggest that the two views need not be mutually exclusive. Depending on the domain and how relevant theories are to it, theory, similarity, or both effects could be manifested. Unlike patterns of dots or simple drawings of wristwatches, artworks are complex, beauty relevant, and associated with different cultural practices and personal experiences. As a result, we may have strong theories about artworks that shape our perception of their beauty, whereas featural similarity may exert a greater impact on our perception of simpler stimuli’s beauty in the absence of strong theories.
In summary, we explored one aspect of beauty’s nature: its relationship to individuals’ theories and perceptions of categories. Our results suggest that beauty is not merely a reflection of category structure, as is predicted by a similarity-based view. Instead, the relationship between beauty and category structure may be more complex than can be captured by similarity alone. In judgments of real-world stimuli, beauty itself can influence the structure of categories, in line with a theory-based explanation. Beauty remains mysterious; however, we have made some progress here in understanding it.
Footnote 1. When these ratings were examined in light of the MDS results, more geometric paintings were rated more typical, r = .62, t(18) = 3.36, p = .003, and more complex paintings were rated more beautiful, r = .81, t(18) = 5.84, p < .001; these relationships did not vary as a function of category label. Complexity was included as a factor in all models that included beauty to control for the correlation between the two, and doing so did not change the nature of the results. Additionally, beauty was centered according to each subject’s mean in all models to reduce collinearity in the random effects.
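Centering beauty on each subject's mean can be sketched as follows. This is an illustration only: the data layout, function name, and toy ratings are hypothetical, and the original analysis software is not described in the text.

```python
from statistics import mean
from typing import Dict

def center_within_subject(
    ratings: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Subtract each subject's own mean rating from that subject's ratings.

    ratings maps subject id -> {painting name: beauty rating}.
    """
    centered = {}
    for subject, by_painting in ratings.items():
        subject_mean = mean(list(by_painting.values()))
        centered[subject] = {
            painting: value - subject_mean
            for painting, value in by_painting.items()
        }
    return centered

# Toy data: one subject rating three paintings on the 7-point scale.
toy = {"s1": {"painting_a": 2, "painting_b": 4, "painting_c": 6}}
centered = center_within_subject(toy)
```

After centering, each subject's ratings sum to zero, so a subject's slope on beauty is no longer tied to that subject's overall rating level.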
Footnote 2. The relationships between distance-to-the-bound and fluency, distance-to-the-bound and beauty, and beauty and fluency are tested in groupings made according to theory (collapsed across category labels when testing the similarity-based view and collapsed across grouping conditions when testing the theory-based view). However, these relationships are consistent across all possible groupings and across experimental condition, which was counterbalanced between subjects.
This work was supported by the National Institutes of Health (MH09152), the Air Force Office of Scientific Research (FA9550-10-1-0268), the Army Research Laboratory (W911NF-09-2-0038), and the National Science Foundation (0927315).