Do individuals differ systematically in their aesthetic preferences, and, if so, how? The well-known adage, “There’s no accounting for taste,” suggests that individual differences (IDs) in aesthetic preference are either completely arbitrary or otherwise inexplicable (e.g., Chandler, 1928; Woodworth, 1938). However, modern behavioral research on empirical aesthetics has shown that scientifically meaningful statements can be made about average preferences among colors (see, e.g., Ling & Hurlbert, 2009; Ou, Luo, Woodcock, & Wright, 2004; Palmer & Schloss, 2010), shapes (e.g., Amir, Biederman, & Hayworth, 2011; Bar & Neta, 2006; Silvia & Barona, 2009), spatial compositions (e.g., McManus & Kitson, 1995; Palmer, Gardner, & Wickens, 2008), and music (e.g., Smith & Melara, 1990; Trainor & Heinmiller, 1998). It therefore seems reasonable that similar techniques could be used to characterize (and thereby “account for”) IDs in aesthetic preference.

Some headway has been made in this direction already. For example, Jacobsen (2004) modeled IDs in the preference for simple spatial compositions and isolated specific cues that seemed to drive the preference decisions in different individuals. McManus (1980) showed that preference for the shapes of triangles and rectangles varies widely across individuals and yet is consistent within an individual over a time span of more than 2 years. More recently, McManus, Cook, and Hunt (2010) tried to tie these differences to particular personality scales—including the Big Five personality traits as well as need for cognition, tolerance of ambiguity, schizotypy, vocational types, and aesthetic activities—but found no significant correlations.

An older body of research on IDs in aesthetic preference, by Eysenck (1940), is even more closely related to the present work. Using widely varying domains (e.g., black-and-white photographs, colors, polygons, and odors), Eysenck had participants order the stimuli within each domain from most to least preferred. He then correlated each individual’s ordering with the average ranking for that domain and found that a single factor could reliably predict the results. Eysenck interpreted this factor (t) as a measure of the degree to which each individual had what he called “good taste.” Interestingly, this t factor was uncorrelated with other ID factors, including general IQ (G).

In the present research, we take the study of IDs in aesthetic preference an important step further by identifying a new variable. We call this variable “preference for harmony” because it is an index of the degree to which a person systematically likes (or dislikes) stimuli that are harmonious, in the sense of being “good gestalts.” This construct brings together two strands of research: one on perceptual goodness (or “good gestalt,” or “Prägnanz”; e.g., Garner, 1974; Palmer, 1991), and the other on aesthetic preference (e.g., Palmer et al., 2008; Schloss & Palmer, 2011). In preference experiments, people judge how much they “like” stimuli using some measure of relative aesthetic preference (see Palmer et al., 2012). In perceptual goodness experiments, people judge how “good” the same stimuli are in terms of simplicity, regularity, and/or harmony, depending on the type of stimulus being judged. We will generically call this dimension “harmony” but intend it to stand for the appropriate term in each domain: “harmony” for music and color, “good fit” for spatial composition, and “figural goodness” for shape.

The finding that suggested the present research arose from studies of the relation between preference and harmony in color combinations (Schloss & Palmer, 2011). In the art historical literature, most color theorists have taken preference and harmony to be identical (e.g., Chevreul, 1839/1967; Itten, 1970). A few, including artist Josef Albers (1963), have disagreed, suggesting that harmonious combinations need not be liked, nor dissonant combinations disliked. Schloss and Palmer asked 48 participants to rate 992 pairs of 32 colors for both preference (how much they liked the color combination) and harmony (how well the two colors went together, regardless of preference). To convey the possibility that preference and harmony ratings need not be the same, the participants were given a musical analogy: Almost everyone would agree that Mozart’s music is more harmonious than Stravinsky’s, but some people like Stravinsky’s better than Mozart’s, and others the opposite.

Schloss and Palmer (2011) found a strong positive correlation between the average ratings of preference and harmony for the same color combinations (r = +.79), but the corresponding correlations for individuals ranged widely, from −.03 to +.70. Schloss and Palmer also found a systematic relation between preference for harmony and color training: Preference for harmony was highest in individuals with moderate amounts of color training (average r = +.52) and lower in individuals with either the least (average r = +.33) or the most (average r = +.25) color training, consistent with Berlyne’s (1971) theory of aesthetic dynamics.

These findings suggest a number of interesting questions: Do the same kinds of IDs in preference for harmony exist in other aesthetic domains, such as music and/or spatial domains? Would preference for harmony be correlated across domains, as one might expect if Eysenck’s (1940) t factor were due to preference for harmony? Finally, how are other personality factors, such as those measured by the Big Five Inventory (BFI; John, Donahue, & Kentle, 1991) or the Sensation Seeking Scale (SSS; Zuckerman, 1979), and/or levels of training and experience in the relevant domains related to preference for harmony?

Method

Participants

Ninety participants from three educational groups took part: 30 students each from psychology, art practice, and music (average age 21.4 years). None had color vision deficiency, as assessed with the Dvorine Pseudo-Isochromatic Plates. All gave informed consent and were naïve as to the purpose of the study. The Committee for the Protection of Human Subjects at the University of California, Berkeley, approved the experimental protocol.

Design

The participants rated 127 stimuli first for aesthetic preference and later for harmony (using different names in different domains; see below). The order of the two tasks was important, because previous studies on preferences for color pairs have shown preference ratings to be more variable than harmony ratings (Schloss & Palmer, 2011). Our general philosophy was to have participants make the more subjective, variable ratings first, in order to minimize influences of more objective tasks on the less objective ones. The instructions defined “preference” simply as how much the participant “liked” a given stimulus as compared to all others in the set, and no participant requested further clarification of this instruction. The instructions for the harmony ratings differed by domain (see below), in order to make their meaning more obvious, and included the musical analogy mentioned above. Finally, participants completed the 44-item BFI, the SSS, and two questionnaires about art and music training.

Stimuli

The color stimuli were 56 color pairs drawn from the Berkeley Color Project colors, consisting of all possible pairings of the red, yellow, blue, and green hues in their light and saturated cuts (Schloss & Palmer, 2011). In each pair, one color appeared as a small square (100 × 100 pixels) centered within, and partly occluding, a larger square (300 × 300 pixels) of the other color. The dot patterns (Garner, 1974) consisted of 22 five-dot images (60-pixel-diameter dots), with each dot centered on one of the points of a 3 × 3 matrix comprising the center of the screen and displacements of ±100 pixels vertically and/or horizontally. The dot patterns were chosen to represent each symmetry subgroup for this class of images (Palmer, 1991). The circle-in-a-frame images (Palmer & Guidi, 2011) consisted of 35 images of a single black dot (20-pixel diameter) within a white, horizontal, rectangular frame (200 × 300 pixels) with a 5-pixel black border. In each image, the dot appeared in one of 35 different positions arranged as a 5 × 7 grid, with 50 pixels between adjacent positions vertically and horizontally. Figure 1 shows representative samples of the stimuli from each domain, with the examples ranging from highly harmonious to highly disharmonious.
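The stimulus counts above follow from simple combinatorics. The Python sketch below enumerates them under stated assumptions: the color labels are hypothetical, treating each color pair as ordered (one color inside, the other surrounding) is inferred from 56 = 8 × 7 rather than stated, and the circle-in-a-frame grid is assumed to be centered on the frame.

```python
from itertools import permutations
from math import comb

# Hypothetical labels for the eight colors: four hues in two cuts.
HUES = ["red", "yellow", "blue", "green"]
CUTS = ["light", "saturated"]
COLORS = [f"{cut} {hue}" for hue in HUES for cut in CUTS]

# Ordered pairs of distinct colors, e.g. (inner square, larger square).
# That order matters is an inference from 56 = 8 * 7, not a stated detail.
COLOR_PAIRS = list(permutations(COLORS, 2))

# 3 x 3 dot matrix: screen center plus +/-100-pixel offsets (x, y).
DOT_GRID = [(dx, dy) for dy in (-100, 0, 100) for dx in (-100, 0, 100)]

# Number of ways to place five dots on the nine grid points; the 22 patterns
# used were selected from these to cover the symmetry subgroups.
FIVE_DOT_PLACEMENTS = comb(9, 5)

# 5 x 7 circle positions, 50 pixels apart, assumed centered on the frame.
FRAME_POSITIONS = [(dx, dy)
                   for dy in range(-100, 101, 50)
                   for dx in range(-150, 151, 50)]

print(len(COLOR_PAIRS), len(DOT_GRID), FIVE_DOT_PLACEMENTS, len(FRAME_POSITIONS))
# 56 9 126 35
```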

Fig. 1

Examples of the visual stimuli: (A) color pairs, (B) dot patterns, and (C) framed-dot images. The numbers below each display indicate its average rated harmony on a scale from −100 to +100

The musical stimuli consisted of fourteen 30-s audio clips of classical solo piano music chosen to vary in style (classical, romantic, transitional, and atonal), harmonic mode (major and minor, where relevant), and tempo (fast and slow). Table 1 lists the pieces, and the sound files can be accessed as Supplementary Online Materials.

Table 1 Musical compositions sampled for the auditory stimuli

Participants rated their preference for each stimulus on a 400-pixel continuous rating scale with Not at all below the left endpoint and Very much below the right endpoint. An identical scale was used during the second presentation of the stimuli to indicate how “X” each stimulus was, with endpoint labels appropriate to dimension “X” (i.e., Harmonious/Disharmonious for the color pairs and musical selections, Simple/Complex for the dot patterns, and Good fit/Bad fit for the circle-in-a-frame images). Participants were told that the vertical mark crossing the center of the scale represented a neutral point.

The visual stimuli were presented on a 20-in. iMac (2007) computer monitor (1,680 × 1,050 pixels; 60-Hz refresh rate) in a darkened room at a viewing distance of approximately 70 cm. The background was always neutral gray (CIE x = 0.312, y = 0.318, Y = 19.26). The chromaticity and luminance functions of the red, green, and blue guns were measured with a Minolta CS100 Chroma Meter, and these measurements were then used to calculate the RGB values needed to present the CIE xyY values of our colors accurately. The auditory stimuli were presented on the iMac computer through Sennheiser HD-270 headphones at a volume set by the participant. All displays were generated and presented using the Presentation software (www.neurobs.com).
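Calculating RGB values from measured primaries is a standard colorimetric step: convert each primary and the target color from xyY to XYZ, solve a 3 × 3 linear system for the linear intensities of the three primaries, and then invert the display's luminance function. A minimal sketch follows, assuming illustrative sRGB-like primaries and a simple power-law gamma in place of the measured primaries and luminance functions, which are not reported here.

```python
# Sketch: CIE xyY target -> RGB drive values from measured primaries.
# The primary values and the power-law gamma are illustrative assumptions.

def xyY_to_XYZ(x, y, Y):
    """Convert CIE xyY (chromaticity + luminance) to XYZ tristimulus values."""
    return (x / y * Y, Y, (1.0 - x - y) / y * Y)


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m @ v = b by Cramer's rule (fine for a well-conditioned 3 x 3)."""
    d = det3(m)
    v = []
    for col in range(3):
        mc = [row[:] for row in m]   # copy, then replace one column with b
        for r in range(3):
            mc[r][col] = b[r]
        v.append(det3(mc) / d)
    return v


def xyY_to_rgb(target_xyY, primaries_xyY, gamma=2.2, tol=1e-9):
    """Return gamma-encoded R, G, B values in [0, 1] for a target xyY color.

    primaries_xyY: (x, y, Y_max) of the red, green, and blue primaries.
    gamma: simple power-law display model (hypothetical; a measured luminance
           function would normally be inverted instead).
    """
    prim = [xyY_to_XYZ(*p) for p in primaries_xyY]
    # Columns of m are the XYZ values of each primary at full intensity.
    m = [[prim[c][r] for c in range(3)] for r in range(3)]
    linear = solve3(m, list(xyY_to_XYZ(*target_xyY)))
    if any(v < -tol or v > 1 + tol for v in linear):
        raise ValueError("color is outside the display gamut")
    return [min(max(v, 0.0), 1.0) ** (1.0 / gamma) for v in linear]


# Illustrative sRGB-like primaries (x, y, Y), not measured values.
PRIMARIES = [(0.64, 0.33, 21.26), (0.30, 0.60, 71.52), (0.15, 0.06, 7.22)]
print(xyY_to_rgb((0.312, 0.318, 19.26), PRIMARIES))
```

Solving the linear system once per color keeps the calibration explicit; with real measurements, only `PRIMARIES` and the gamma step would change.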

Procedure

The 127 stimuli were blocked by category in the following order: dot patterns, color pairs, circles in frames, and music. The stimuli were randomized within each block. Before participants were allowed to respond, each musical selection was presented in its entirety and each visual display was presented for 2,000 ms. A 500-ms interval occurred between trials. In the first phase, the participants were asked to rate their aesthetic preferences, and in the second phase, to rate the relevant dimension of harmony (see above). The recorded datum on each trial corresponded to the nearest integer x-coordinate (−100 to +100) at which the participant clicked on the scale.
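The mapping from a click position to the recorded integer rating is not spelled out beyond its range. One plausible version, assuming a linear mapping of the 400-pixel scale onto −100 to +100 and a hypothetical screen coordinate `x_left` for the scale's left endpoint, is:

```python
def click_to_rating(x_click, x_left, width=400):
    """Map a click on a horizontal scale to an integer rating in [-100, +100].

    x_left: screen x-coordinate of the scale's left endpoint (hypothetical).
    width:  scale length in pixels (400 in this study).
    """
    # Fraction of the way along the scale, clamped to the endpoints.
    frac = min(max((x_click - x_left) / width, 0.0), 1.0)
    # Linear map of [0, 1] onto [-100, +100]; Python's round() sends exact
    # ties to the even integer.
    return int(round(-100 + 200 * frac))


print(click_to_rating(300, 100))  # center of a scale starting at x = 100
```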

After rating all stimuli for both preference and harmony, the participants completed computerized versions of the 44-item BFI (John et al., 1991) and the SSS (Zuckerman, 1979), as well as a modified version of the Queens Questionnaire for Musical Background (Cuddy, Balkwill, Peretz, & Holden, 2005, as modified by Bhatara, Quintin, Heaton, Fombonne, & Levitin, 2009) and a questionnaire about their background in visual art and color (Schloss & Palmer, 2011).

Results

When computed from the ratings averaged over participants, the correlations between the preference and harmony ratings for the same stimuli were reliably positive in all four domains, ranging from +.97 for music to +.47 for dot patterns (see Fig. 2, left column). When computed from the ratings of individual participants, however, these correlations varied widely, spanning fully 70%–89% of the possible range from −1 to +1, depending on the domain (see Fig. 2, right column). Note also that the average of the individual correlations was systematically lower than the corresponding correlation computed from the group averages. This pattern suggests that people generally preferred harmony in all four tested domains, but that individual tendencies to prefer other kinds of structure were unsystematic relative to the harmony judgments and therefore tended to cancel out when the ratings were averaged.
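The two kinds of correlation plotted in Fig. 2 can be computed as follows. This sketch uses made-up ratings from two hypothetical participants to show how idiosyncratic departures from harmony lower individual correlations while partly canceling out in the averages, so that the correlation of the averaged ratings exceeds the mean individual correlation.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def individual_and_group_r(pref, harm):
    """pref, harm: one list of ratings per participant, stimuli in the same order.

    Returns (per-participant r values, their mean, r between the averaged ratings).
    """
    individual = [pearson(p, h) for p, h in zip(pref, harm)]
    avg_pref = [mean(col) for col in zip(*pref)]   # average over participants
    avg_harm = [mean(col) for col in zip(*harm)]
    return individual, mean(individual), pearson(avg_pref, avg_harm)


# Toy ratings (hypothetical): two participants who agree on harmony, but the
# second one's departures from harmony run in an idiosyncratic direction.
harm = [[80, 40, -40, -80], [80, 40, -40, -80]]
pref = [[80, 40, -40, -80], [20, 60, -60, -20]]
ind, mean_ind, group_r = individual_and_group_r(pref, harm)
print([round(r, 3) for r in ind], round(mean_ind, 3), round(group_r, 3))
# [1.0, 0.707] 0.854 0.949
```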

Fig. 2

Average and individual data regarding preference for harmony. The left column shows scatterplots of average harmony ratings versus average preference ratings, together with the best-fitting regression line for each stimulus domain. The right column shows histograms of the correlations for individual participants, together with the average correlations of the individuals (solid lines) and the correlations between the average preference and average harmony ratings (dotted lines)

It is clear from these data that participants vary widely in the relationship between their preference and harmony ratings. To investigate these IDs further, we calculated a preference-for-harmony (PfH) score for each participant in each domain, based on the average unsigned difference between preference and harmony ratings over all stimuli in that domain. Specifically, each participant’s PfH for a given domain ($d$) was calculated as 100 minus the average of the absolute differences between that participant’s preference rating ($P_i$) and harmony rating ($H_i$) for each stimulus ($i$) in that domain, as follows:

$$ \mathrm{PfH}(d) = 100 - \frac{1}{n_d} \sum_{i=1}^{n_d} \left| P_i - H_i \right|, $$

where $n_d$ is the number of stimuli in domain $d$. Given the ±100 rating scales for $P_i$ and $H_i$, the average absolute difference could range from 0 (equal $P_i$ and $H_i$ for every stimulus in domain $d$) to 200 (maximally different $P_i$ and $H_i$ for every stimulus in domain $d$). After this value is subtracted from 100, PfH scores can thus vary from +100 (maximal preference for harmony) to −100 (maximal preference for disharmony). For example, if a participant strongly liked harmonious stimuli and strongly disliked disharmonious stimuli, $P_i$ and $H_i$ would have similar values for each stimulus, leading to an average difference close to zero and a PfH score close to +100. If a participant strongly liked disharmonious stimuli and strongly disliked harmonious stimuli, his or her $P_i$ and $H_i$ values would differ greatly from each other, resulting in a PfH score closer to −100.
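The PfH definition translates directly into code. A minimal sketch, assuming each participant's ratings for one domain are stored as parallel lists with one entry per stimulus (the function name `pfh` is our own):

```python
def pfh(pref, harm):
    """Preference-for-harmony score for one participant in one domain.

    pref, harm: ratings on the -100..+100 scale, one per stimulus, same order.
    Returns 100 minus the mean absolute preference-harmony difference.
    """
    if len(pref) != len(harm) or not pref:
        raise ValueError("need equal-length, non-empty rating lists")
    mean_abs_diff = sum(abs(p - h) for p, h in zip(pref, harm)) / len(pref)
    return 100 - mean_abs_diff


print(pfh([50, -20, 10], [70, -40, -30]))  # differences 20, 20, 40
```

Identical preference and harmony ratings give +100, and ratings that sit at opposite ends of the scale for every stimulus give −100, matching the bounds described above.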

As is shown in Table 2, individual PfH scores were significantly and positively correlated for every pair of domains, ranging from a high of +.60 for color and music [t(88) = 7.04, p < .0001] to a low of +.32 for music and spatial composition [t(88) = 3.07, p < .005]. Table 2 also shows the correlations between the PfH scores in each of the four domains and scores on the BFI and the SSS, including their subscales; none of these reached significance (p > .05) after adjusting for family-wise error using the Sidak–Bonferroni method. The lack of significant correlations between PfH and the BFI subscales is not entirely surprising (see McManus et al., 2010), but because this correction might be too stringent, we repeated the analyses using only six variables: average PfH across all domains versus the total SSS score and the five BFI subscales. These analyses were also nonsignificant.
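Two computations underlie the statistics in this paragraph: the t statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, and the Šidák per-test alpha, 1 − (1 − α)^(1/m) for m tests. The sketch below reproduces t(88) ≈ 7.04 for r = +.60 with n = 90; the number of tests behind Table 2 is not stated here, so m = 6 (the size of the reduced analysis) is used purely for illustration.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def sidak_alpha(alpha, m):
    """Per-test alpha that holds the family-wise error rate at `alpha` over m tests."""
    return 1 - (1 - alpha) ** (1 / m)


print(round(t_from_r(0.60, 90), 2))    # 7.04
print(round(sidak_alpha(0.05, 6), 4))  # 0.0085
```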

Table 2 Cross-domain correlations of preference-for-harmony (PfH) scores

Figure 3 shows PfH scores in the four stimulus domains for the art, music, and psychology students. We found a main effect of domain [F(3, 348) = 19.44, p < .01], due to PfH scores being reliably higher for music [F(1, 268) = 26.94, p < .01] and lower for shape (dot patterns) [F(1, 268) = 13.35, p < .01] than for color and spatial composition, which did not differ from each other (F < 1). These differences are unlikely to be meaningful, however, because they depend on the particular stimulus samples used in the different domains. A main effect of educational group also emerged [F(2, 348) = 20.05, p < .01], with higher PfH scores for psychology students than for music [F(1, 238) = 33.93, p < .01] or art [F(1, 238) = 21.33, p < .01] students.

Fig. 3

Mean correlations between preference and harmony (average PfH) by educational group and stimulus domain. Significant differences are indicated by p values

We also found a significant interaction between stimulus domain and educational group [F(6, 348) = 3.20, p < .01]. The general pattern was that art and music majors tended to have lower PfH values than did psychology majors, but the lowest average PfH scores were always found in the students with the most training in the relevant domain (Fig. 3). Many, but not all, of these differences were statistically significant, as is indicated in Fig. 3.

Previous research has indicated that judgments of color preference are more variable across individuals than are judgments of color harmony (Schloss & Palmer, 2011). To test whether this pattern held across our domains, we calculated Cronbach’s alpha for the preference and the harmony ratings, both for the entire sample of participants and for each of the three educational groups (Fig. 4). Agreement on the preference ratings varied across participant groups but was consistently lower than agreement on the harmony ratings [F(1, 22) = 9.92, p < .01], indicating greater variability across individuals in ratings of preference than of harmony. Additionally, alpha for the preference judgments dropped significantly in the visual art and music groups relative to the psychology students [F(1, 10) = 5.06, p < .05], but there was no corresponding difference for the harmony judgments [F(1, 10) = 0.23, p = .64].
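The text does not specify how Cronbach’s alpha was computed as an index of agreement. A common approach, assumed in this sketch, treats each participant as an “item” and each stimulus as an observation, so that alpha rises as participants’ rating profiles covary. The ratings used below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha treating each rater as an 'item' and each stimulus as a case.

    ratings: list of per-rater rating lists, all in the same stimulus order.
    Uses sample variances (n - 1) throughout; only their ratio matters.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    item_var = sum(variance(r) for r in ratings)      # each rater's variance
    totals = [sum(col) for col in zip(*ratings)]      # summed rating per stimulus
    return k / (k - 1) * (1 - item_var / variance(totals))


print(round(cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]), 3))
```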

Fig. 4

Cronbach's alpha scores for ratings of preference (P) and harmony (H) in each of the four stimulus domains for the three educational groups

These group differences bear on a possible alternative interpretation of the present results: Perhaps some participants simply have no basis on which to judge harmony and therefore report their preferences as a proxy for harmony ratings. Participants who show high preferences for harmony might then be the ones for whom harmony is a poorly defined concept. If this were true, however, we should find systematically lower agreement (lower alpha scores) among the harmony ratings of the groups with less expertise, because they are the individuals for whom harmony would be poorly defined. In fact, agreement on harmony ratings was essentially the same in the different educational groups. To examine this issue more closely, we computed, for each participant, the average correlation between his or her ratings and those of (a) every other participant in the same educational group and (b) every participant in the other educational groups, yielding measures of within-group versus between-group agreement. This was done for both the preference and the harmony ratings in all domains. Within- and between-group agreement differed highly significantly for the preference ratings [F(1, 178) = 43.82, p < .001], but not significantly for the harmony ratings [F(1, 178) = 3.51, p = .06]. This pattern suggests that all groups were judging the same thing when they rated harmony, whereas their preferences differed.
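The within- versus between-group comparison can be sketched as follows, assuming Pearson correlations computed over each domain’s stimuli and dictionaries keyed by hypothetical participant IDs. The F tests comparing the two measures are not shown.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def within_between(ratings, groups):
    """Mean correlation of each participant with same-group and other-group raters.

    ratings: dict participant -> list of ratings (same stimulus order for all).
    groups:  dict participant -> group label; every group needs >= 2 members
             and there must be >= 2 groups, or one of the means is empty.
    Returns dict participant -> (mean within-group r, mean between-group r).
    """
    result = {}
    for p, rp in ratings.items():
        within = [pearson(rp, rq) for q, rq in ratings.items()
                  if q != p and groups[q] == groups[p]]
        between = [pearson(rp, rq) for q, rq in ratings.items()
                   if groups[q] != groups[p]]
        result[p] = (mean(within), mean(between))
    return result


# Toy data (hypothetical): two raters in each of two groups.
ratings = {"a1": [1, 2, 3, 4], "a2": [2, 3, 4, 6],
           "b1": [4, 3, 2, 1], "b2": [4, 2, 2, 1]}
groups = {"a1": "art", "a2": "art", "b1": "psych", "b2": "psych"}
for p, (w, b) in within_between(ratings, groups).items():
    print(p, round(w, 3), round(b, 3))
```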

To test whether PfH could be represented as a single explanatory factor with effects separate from level of expertise (years of self-reported art and music training for each student; see Table 3 for the group differences), we conducted confirmatory factor analyses. First, we tested the simplest model, without any expertise effects, in which PfH in each domain was predicted by a single, generalized latent factor (Fig. 5a). This single-factor model had a high goodness-of-fit index (GFI = .9705), but the chi-square test was only marginally nonsignificant (p = .056), the root-mean-square error of approximation (RMSEA) was .145, and the standardized root-mean-square residual (SRMR) was .05. Next, we tested a model with all possible predictors, trimming the connections that were found to be nonsignificant (Fig. 5b). We began with a saturated model that included art expertise, music expertise, sensation seeking (composite score), the five scores of the BFI, and a single latent factor as predictors of PfH scores in the four domains. The resulting model suggests that preference for harmony is best represented as a single general factor (H) influencing the PfH scores in all domains, with art and music expertise having additional effects only in certain relevant domains. This model fared better on the chi-square test (p = .514), with GFI = .9683, RMSEA < .001, and SRMR = .05. To compare the fits of the two models directly, we conducted a chi-square difference test, which was significant in favor of the expanded model [χ²(8, N = 90) = 15.59, p < .05]. Correlations among the variables relevant to the expanded model are shown in Table 4.
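A chi-square difference test refers the difference between two nested models’ chi-square values to a chi-square distribution whose degrees of freedom equal the difference in the models’ degrees of freedom. For even degrees of freedom the upper-tail probability has a closed form, used in the sketch below; applied to the reported χ²(8) = 15.59, it gives p ≈ .049, in line with the reported p < .05. The function name is our own, and odd degrees of freedom would need a general routine such as a regularized incomplete gamma function.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m this equals the probability that a Poisson(x / 2) count is < m.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term of the Poisson sum
    for k in range(1, df // 2):
        term *= half / k            # builds (x/2)^k / k! incrementally
        total += term
    return exp(-half) * total


print(round(chi2_sf_even_df(15.59, 8), 4))  # 0.0486
```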

Table 3 Average training levels of participants by major
Fig. 5

Structural equation models. (A) A priori one-factor model with n = 90, eight parameters, p = .056, and goodness of fit = .9705. (B) A second model created through an empirical trimming process, with n = 90, 11 parameters, p = .514, and goodness of fit = .9683

Table 4 Correlations between variables relevant to our expanded model

Discussion

The results of this experiment support our initial hypothesis that preference for harmony represents a domain-general individual difference in aesthetic preference: PfH scores were intercorrelated across all four tested domains, spanning both the visual and the auditory modalities. Preference for harmony seems to be represented best as a single general factor that is unrelated to the traditional personality measures studied (the five subscales of the BFI and the four subscales of the SSS). We believe that preference for harmony provides a plausible explanation for Eysenck’s t in any aesthetic domain in which the group averages show that people generally prefer more harmonious to less harmonious stimuli, because the preferences of people with a high preference for harmony will necessarily correlate more strongly with the group average than will those of people with a lower preference for harmony.

The results of structural equation modeling showed that, after the effect of the general factor was taken into account, expertise was significantly related to PfH only in the relevant domains (spatial harmony for art training and musical harmony for music training). However, the art and music majors also showed lower general-factor values, in addition to the direct effects of expertise. The direction of causality in this relationship is an interesting issue. Individuals with lower preferences for harmony might be predisposed to enter the fields of art and music, or formal art and music training might lower their preference for harmony. These possibilities are not mutually exclusive, of course.

Several caveats should be mentioned about the present results. One is that they can explain IDs only in domains to which the concept of “harmony” applies, and harmony implies relations among elements. Eysenck (1940) studied individual colors and odors, which are not explicitly relational in the way that our stimuli were. Nevertheless, the concept of harmony can be meaningful in such domains to the extent that it signifies how well a given color or odor “goes with” other colors or odors in general. For example, Schloss and Palmer (2011) found that light (pastel) and desaturated (muted) colors are rated as more harmonious across all possible combinations than are saturated (vivid) colors, suggesting that single pastel and muted colors may indeed be perceived as more harmonious. We collected ratings from 25 participants of how “harmonious” each of 32 single chromatic colors was and found that the average ratings correlated strongly and positively with the average harmony ratings of that color in combination with the 31 other colors (r = .71, p < .0001), even though the latter ratings were provided by 48 different participants (Schloss & Palmer, 2011). It is not obvious that this would also be true for odors or for other individual stimuli, however (e.g., rectangular shapes, as studied by McManus et al., 2010).

A second caveat is that the concept of harmony investigated here is presumably but one of many features relevant to IDs in aesthetic judgments. Harmony, for example, is not the same as complexity, which has previously been taken to denote the number of elements in different stimuli (e.g., Berlyne, 1971). Whether and how harmony and complexity might be related in terms of IDs is an important topic for further study.

Third, the concept of harmony studied here is a subjective perceptual attribute rather than an objective stimulus property that can at present be calculated from physically measurable features. The fact that variability is systematically lower for ratings of harmony than for ratings of preference does suggest, however, that harmony may be “more objective” than preference. Investigating the stimulus attribute(s) underlying perceived harmony will be an important avenue for further research.

We believe that our results constitute compelling evidence that preference for harmony is an individual difference in aesthetic style that crosses traditional domain boundaries and affects aesthetic judgments that go beyond those of domain-specific training. More broadly, our findings show that empirical research into aesthetic preference is valuable not only for describing the average preferences of a sample of people, but also for increasing our understanding of and ability to predict IDs in aesthetic preference. We further believe that the concept of preference for harmony, as defined here, can serve as a tool for future research in aesthetics and personality that seeks to investigate preferences across multiple domains and at multiple levels of analysis.