1 Aesthetic Experience: A Preliminary Definition

I am assuming that the term “aesthetic” can be construed notionally in a way that allows us to identify a specific type of mental relationship: a specific attitude toward the world. This was already Kant’s hypothesis, and the idea has resurfaced again and again in philosophical aesthetics. Of course, it has also been much criticized. Most of the critics think that the very idea of aesthetic experience prevents us from getting a true understanding of art. This criticism has been formulated by continental philosophy as well as by analytical philosophy. According to Hans-Georg Gadamer (1990), the idea of aesthetic experience misrepresents the ontology of the artwork; according to George Dickie (1964), a famous analytical philosopher, it misrepresents our relationship with art. In fact, Gadamer and Dickie disqualified the notion of “aesthetic experience” on different grounds. Dickie argued against the existence of a specific mental attitude (the “aesthetic attitude”), seen as a type of attitude distinguishing the attention paid to artworks from the attention paid to other objects, or distinguishing the “proper” attention to artworks from improper ones. Gadamer objected only to a specific understanding of the notion of “experience”: experience as a “subjectivist feel” (“Erlebnis”). But at the same time, he thought that paying attention to a (valuable) artwork is an outstanding experience (“Erfahrung”): an interpretative encounter with something that changes one’s existential outlook.

These criticisms are partly valid. “Aesthetic experience” does not refer to a specific mental attitude or to a specific type of attention (disinterested, distanced, and so on). And it is true also that you cannot adequately understand the ontology of artworks in terms of aesthetic experience and that you cannot understand the complexity of our relationship with art if you reduce it to a subjectivist feel of qualia. But this does not disqualify the notion of “aesthetic experience.” My aim here is to show not only that the notion of “aesthetic experience” is not vacuous, but also that it is central for understanding our relationship with artworks.

Admittedly, the expression “aesthetic relationship” (or any semantically equivalent paraphrase in some other language or some other culture) will probably activate different representations in different individuals at different times and in different groups or cultures. But I think it reasonable to assume that all situations we commonly describe as “aesthetic,” or would be willing to describe in this way, share two characteristics. Taken together they identify a specific intentional relationship:

  • All aesthetic experiences are attentional activities. Being engaged in an aesthetic relationship is paying attention to this or that: reading a poem, listening to Thelonious Monk, contemplating the garden of Ryoan-ji, and so on. I take this to imply that to engage in an aesthetic relationship means to engage in a cognitive relationship. I use the term “cognitive” here in a very broad sense, encompassing all the perceptual, conceptual, and imaginative activities we engage in to understand the world, ourselves, and other humans. But of course nobody would want to say that all cognitive acts are aesthetic acts. How can we distinguish aesthetically oriented attention from other types of attention? Is there a specific type of attention that would be aesthetic attention? Or is the cognitive dimension only a necessary condition but not a sufficient one for something to be called an “aesthetic experience”? Following Kant and some others, I opt for the second branch of the alternative, although I will try to argue later that when cognition is activated in an aesthetic setting it has a very specific profile.

  • So what is the second element that, taken together with attention, will yield a definition of “aesthetic experience” that meets our intuitions? Well, it seems to me that in all experiences that we would be willing to describe as aesthetic, the attentional activity is regulated by the satisfaction or dissatisfaction it causes. This hedonic component has been recognized by almost everybody taking a serious look at the question, even if many would be unhappy with the words “pleasure” or “satisfaction”: they would prefer to talk of “appreciation,” “evaluation,” “grading,” and so on. At my level of analysis, all this boils down to satisfaction (or dissatisfaction), which boils down to hedonic valence. But not every satisfaction or dissatisfaction will do. I shall try to show, following Kant and cognitive psychology, that the hedonic component must be derived from the unfolding attentional activity itself and not from its object. And, of course, satisfaction or hedonic valence is not the last word about aesthetic experience: it only explains the dynamics of ongoing attention.

As it stands, this definition, I agree, doesn’t take us very far. And it is not new: it is largely a reformulation of what Kant had to say in the famous § 1 of the “Analytik des Schönen” in the Kritik der Urteilskraft (The Critique of Judgment). But I don’t think that it is vacuous: it nicely captures the array of phenomena that we want to take into account when we are looking for something that could reasonably be called “aesthetic experience.” That said, we must of course be able to give more content to the two aspects of our tentative definition: the attentional aspect and the hedonic aspect.

I’ll begin with the question of attention. As George Dickie rightly contended some 30 years ago, there seems to be no empirical backing for the idea that there exists a specific type of attention that would be aesthetic attention. But I want to defend the idea that when attention functions in an aesthetic context, it acquires a very specific profile. It is this profile that will interest me now.

2 The Artist and the Connoisseur

“Bowerbirds” is the name given to a group of twenty or so species of passerines living in the Pacific, notably in Australia and New Guinea. They owe their reputation among ornithologists to the fact that males build complex and highly decorated architectures called “bowers.” The construction is made out of shrub branches interwoven in a remarkable way and skillfully decorated. The decorative design consists of items of all types collected and recycled: flowers, feathers, ribbons, bottle caps, broken glass or crockery, plastic utensils stolen from neighboring campsites, and so on. Often, the inside wall of the bower is “painted” with a mixture of berries, bark, charcoal, saliva, and dirt. The male bowerbird is busy with the architecture for several months every year, building it, upgrading it, repairing it, and “refreshing it,” for example by replacing wilted flowers. [Footnote 1]

Why does the male invest such a huge amount of energy in a construction that seemingly has no utilitarian function? Well, in fact it has a function: the bower is a central element in the male’s seduction strategy and in the selection process through which the female picks out her preferred one among the would-be lovers. During the courtship ritual, the bower successively fulfills three different roles. In the first place it works as a visual trap: it attracts the attention of the female, who then thoroughly inspects it visually. At this stage, it functions as a signal of fitness compared to other bowers the female has already inspected. If the result of the inspection is positive from the point of view of the female, the bower takes on a different function: the female moves inside to watch the second part of the seductive strategy of the male, the parade. Once the female is inside the theater building, the male places himself in front of it and engages in a ritualized dance and sound performance. While dancing, he emits all sorts of sounds, partly mimetic (for example, he mimics other birds) and partly self-mimetic (he imitates his own cries of threat). Once he has gone through his show, he of course tries to mate with the female. At this point the bower takes on a third function: it becomes a device that prevents forced mating (copulation). To get at the female, the male must circle around the bower, which gives the lady the opportunity to fly away if she prefers to do so.

As is evidenced by my talk about architecture, dance, theater, show, and so on, I want to suggest that the situation I am describing has important points in common with what in humans we would call artistic creation and aesthetic experience: sculptural and choreographic creativity on the one hand, expression of preferences grounded in experienced qualities of an attentional activity on the other. In fact, I want to suggest that the activities of the bowerbirds are very illuminating for a better understanding of the structure of artistic activity and aesthetic attention. My specific target in this paper is aesthetic attention, but a short look at the artistic side may be useful. There are two reasons for this. The first is that the productive side is easier to access: it gives rise to artifactual and bodily incarnations which can be analyzed directly, whereas the activity of reception is mostly perceptual and evaluative, which means that it consists in internal processes that are more difficult to access and assess. The second reason is that the productive activity and the receptive activity are coordinated, co-adapted: so if we can pin down a specificity of the relevant phenomena on the side of the male compared to other, unmarked activities of this same male, we can tentatively assume that on the side of the female there must also be some specificity compared to unmarked attentional activities. If this were not so, the whole interactive process would break down (or would never have existed on the level of phylogeny). So a better understanding of the productions of the male can help us to describe what is going on in the black box of the female brain. And the two taken together can help us to better understand aesthetic attention.

Of course, I am aware that the whole line of questioning I have just entered into may be judged to be completely flawed. It seems to imply a form of biological reductionism ignorant of the specificity of human cultural facts. We all know what the function of the bowers, the parade, and the attentional activity is: it is all about sexual selection. Which, clearly, is not the case with artistic creation and aesthetic experience. So how could sexual selection in animals be helpful for a better understanding of artistic creation and aesthetic experience in humans? Well, I am not saying that sexual selection can be helpful to understand artistic creation and aesthetic experience. I am only saying that the type of activities and processes that are activated in the process of sexual selection in bowerbirds throws some light on the activities and processes of artistic creation and aesthetic experience. What I want to say is that what I address here, and find interesting, are the structural homologies of mental processes and not the functional identity of behaviors. The idea is that there is a homology on the level of the poïetic and attentional processes or, to be more precise, a homology in the way they diverge from other unmarked productive and attentional processes. It seems to me that such a type of questioning implies no reductionist move, because structural homology argues neither for nor against functional equivalence. This is a common lesson taught by evolutionary biology as well as by structural anthropology or system-theoretical sociology: one and the same structure can be co-opted by different functions, and the same function can be performed by different structures. The two-way relationship between structure and function is not a “one to one” but a “one to many and many to one” relationship. Of course, the existence of such a structural relationship ultimately calls for an explanation.
The default explanation would be in terms of a predecessor-relationship linking sexual selection to aesthetic attention, although I concede that in the context of our present state of knowledge such an explanation would remain largely speculative.

But is there really a structural homology between our two sets of facts? Am I not simply playing with vague analogies and surreptitiously anthropomorphizing animal behavior? To neutralize this objection we must rephrase our question: if a human being (male or female) were building the type of construction built by the male bowerbird, and if she or he were engaging in the organized sequence of movements the male engages in, and if another human being (male or female) were relating to this construction and this organized sequence of movements the way the female bowerbird relates to the bower and the parade of the male, wouldn’t we say, and rightly so, that we are dealing with an artwork and with aesthetic experience?

To answer this question, we need to take a closer look at the behaviors just described. For the reasons already given, I’ll begin with the male. In what sense can we defend the hypothesis of a structural homology with human artistic practices?

A first answer is that we are dealing with two sets of activities that seem to be based on the same mental resources and the same modalities of externalizing them. Thus the construction of the bower mobilizes the same type of procedural competences as human art-making: the capacity of internal modeling; the ability to translate this model into a physical three-dimensional reality; a sequential planning of a global script split up into subroutines—construction and decoration; the capacity to make preferential decisions when confronted with alternative solutions; the access to a synthetic evaluation of the structure closing the whole sequence, and so on. The same holds for the parade when compared to human dance: bodily movements forming organized sequences not related to a transitive action-goal; a capacity to produce rhythmic sounds forming non-random sequences not related to first-order communication, and so on. Maybe the bird’s repertoire is largely innate (but there must still be individual differences, because if there were no individual differences sexual selection would break down). Maybe the behavior of the bird is not “intentional” in the sense human cultural activities are. Maybe the bird has no “conscious” phenomenal experience in the way humans have. As far as I know, all these are hotly disputed questions, but in any event, the hypothesis of a homology of “poïetic” and attentional processes does not require a homology as far as the levels of treatment are concerned. Of course, having or not having a phenomenal experience of these processes makes a difference, notably in terms of feedback, but this difference concerns the richness and plasticity of the processes and not their nature.

One could still formulate another objection: perhaps there exists some homology, but it certainly is not operating on the level of artistic production and on the level of aesthetic attention, but on a more generic level, that of production of artifacts as such and of attention as such. If this were the case, the bowerbirds would teach us nothing interesting for artistic creation or aesthetic attention.

Well, let’s once again reflect on the bower. Its foremost characteristic is the fact that it is a marked construction. To understand what I mean by a “marked construction,” we must compare the bower to the nest constructed by the female. The nest is a purely practical construction, used to breed and raise the young. It can therefore be regarded as homologous to shelters constructed by humans. The bower has no such practical function (if we except its function against forced mating). Its central function is to be part of a display: it exhibits the value of the male. So it is “marked” compared to the nest because it has a special status compared to the nest, although its construction is the result of the same motor and planning capacities. The same can be said about the parade. Its time structure is a marked sequence: it is not the time of common interactions. It is ritualized time. Specifically, the movements and vocalizations of the male lose their pragmatic significance. For example, the typical cry of threat emitted by the male loses its pragmatic significance on two levels. First, the sound is not a response to an external stimulus: it is endogenous. Second, it is not emitted to threaten the female or some other living being: it is emitted as a display of the male’s capacity to threaten.

In fact, the parade possesses several other quite remarkable characteristics. First, behaviors which outside of the ritual have no semiotic status are transformed into signs when they take place inside the ritualized time of the parade: body movements and building capacities are transformed into signs of the male’s fitness. Second, behaviors that already have a semiotic function outside of the ritual are transformed into meta-signals: when the male produces the threat-signal “SKRA,” the sound does not have the semiotic function of threatening, but its phenomenological qualities (loudness and so on) become signs of the male’s fitness. What is interesting here is the way in which the ritual moves all activities one level up compared to their “normal” status: non-sign becomes sign, sign becomes meta-sign.

Third, the whole process functions as a self-referential device: the bower, the dance, and the sounds refer to themselves as an embodiment of the value of the male. This may seem an unwarranted assertion: if, as I said, these processes are signals or signs of the male’s fitness, doesn’t this imply that they function quite normally in a hetero-referential way? Well, we’ll soon see that they function in a very peculiar way, which warrants the thesis of self-referentiality.

But before that, we must take a closer look at the partner of the male. As I said, the behavior of the female is exclusively attentional. But this attention is very peculiar compared to standard attention. The first point is that, like the “artistic” activity of the male, it is embedded in a ritual time frame that is internally structured. As we have already seen, it is sequentially organized into three phases. It starts with the visual inspection of the bower. The second phase, initiated by the female if and when she moves into the bower, is the phase of the parade. It is the most intriguing part of the whole sequence, as it implies a strong decoupling of the female’s attention from the larger environment and its stimuli. The third phase is the concluding appreciation, followed by the consent or refusal to engage in sexual intercourse. The sexual intercourse itself no longer belongs to the ritual time: it is pragmatic business as usual. Notice that the ritual time frame can be broken off prematurely if the female’s reaction to her inspection of the bower is negative, and that the whole ritual sequence is retrospectively disqualified if the final decision of the female is negative.

The second point, to which I have already alluded, is that, like the production of artifacts and the bodily movements of the male, the attentional sequence of the female is cut off from its normal pragmatic function. The ritual time of the parade is a shared time not only because the time of the male and the time of the female are synchronized but also because both are cut off from pragmatically oriented interactions and from normal attentional feedback with the real-life environment. The second function of the bower we encountered, that of a theater-house for the female, can be read as the expression of this absence of pragmatic functionality: it provides a shelter against predators and so allows the exclusive focusing on the ritual. But the absence of any direct pragmatic function is also characteristic of the interaction with the male partner. For the ritual to work, the female must be able to handle the whole situation as one not of direct interaction, but of “display.” She must process the signals emitted by the male as self-referential signals, that is to say, as signals whose content is not their standard one, for example a threat, but their exemplifying function: they denote what they are the result of, a certain degree of male fitness. Processing signs in this way is a very complex undertaking. The female must be able to neutralize all direct feedback loops, that is, all loops where a stimulus is paired with the direct behavioral response it normally produces. Thus she must not look at the berries decorating the bower as fruits to peck at and eventually to eat, but as objects to be processed only attentionally. The same holds true for the sounds emitted by the male: she must not react to their standard function (for example, by flying away when he emits threatening cries), but must attend to and appreciate their internal phenomenal qualities. In short, she must be able to cut off her perceptual processes from the ecological context.
The standard cycles of ecologically embedded perception are the cycle of perception-reaction-perception and the cycle of action-perception-action. During the mating ritual these cycles are replaced by the dynamics of an attentional flow regulated by an online evaluation of the phenomenal qualities of the perceived stimuli (intensity, color, rhythm, and so on). Of course, the non-pragmatically oriented interaction and, more specifically, the auto-teleological dynamics of appreciative attention are embedded in a larger sequence of actions, the process of sexual reproduction, which is of utmost pragmatic importance in the life of the partners. This shows that auto-teleological processes can be, and most of the time are, instrumental for achieving pragmatic ends: auto-teleology is often hetero-teleologically motivated. This is also the case with aesthetic experience.

It follows from the preceding description that the actions of the male and the attention of the female share important characteristics. The most interesting one is the following: all these activities are costly in terms of energy expenditure and risk exposure. The architecture involves a huge investment of time, energy, and ingenuity, as does the parade, whether it be dancing or singing. The same holds for the attentional processes of the female bird: they are very intense, close to what in psychology is called attentional overload. And of course the whole ritual is handicapping in terms of maximization of survival: the male and the female are focusing all their energies and attention on the ritual, and they process the ecological real-life context only in a marginal way; most of their resources, notably their attentional resources, are being mobilized by the situation of the parade. This may appear to be somewhat of a mystery if seen in the context of natural selection processes.

In biology there is a theory which has been developed especially to account for such paradoxical situations. It is called the theory of costly or honest signaling. It arose in the context of evolutionary biology to account for situations of incomplete knowledge, that is to say, situations of communicative interaction about attributes varying in quality, intensity, or degree among subjects and which are difficult to assess directly, although it is very important for the individuals to assess them correctly. [Footnote 2] In the case of bowerbirds, the issue is genetic fitness. For the female, it is important to correctly assess the fitness of the male; for the male, it is very important to signal his fitness. At the same time, fitness is very difficult to assess in a direct way, which means that we are in a typical situation of incomplete knowledge, which in turn opens up the possibility of cheating. The theory tries to explain how individuals who differ in terms of fitness and whose interests diverge may still obtain mutual benefit from honestly reporting their qualities instead of trying to cheat. But how can one prevent cheating? The theory of costly signaling answers that the only way to prevent cheating is to choose a signal that cannot be dishonest. This condition implies that the signal must be such that its very existence is the incarnation of the value it signals. Such a signal cannot be simulated: if you do not possess the qualities you want to signal you cannot signal them, because the signal operates through exemplification. The best known example is the peacock’s tail: the tail is objectively disabling because it is a major handicap when the bird soars to escape a predator. How could evolution select a trait that handicaps its possessor? Well, the answer is: it has been selected precisely because it is a handicap. [Footnote 3] Being a handicap makes it an honest signal: it depends directly on the qualities of the individual who exhibits it.
If the male peacock has survived until the mating season although the tail is a disabling feature, then the tail is an honest signal of his fitness and the female can take it as direct proof of that. Now, the bigger and the more colorful the tail is, the more handicapping it is. This explains why females always tend to choose the longest and the most colorful tail. This dynamic can lead to what has been called “a runaway process,” because every generation of females selects among the males the ones with the most colorful and longest tails; in the long run, the mean level of colorfulness and length of tail of the males will rise. [Footnote 4] If this process did not come to a halt, it would open a very grim perspective for the male population and more generally for the species. Happily, the cost-benefit balance will at some moment halt this potentially self-destructive dynamic. The logical possibility of a runaway process illustrates very nicely the difference between costly signals and inexpensive signals. A costly signal has to be paid cash. This is not the case with inexpensive signaling. Take language for example: it is the prototype of an inexpensive signaling device and, as we all know, you generally don’t have to pay cash for your utterances. That is the reason why language can so easily be used for simulating and cheating. It is easy to signal in a linguistic way qualities that you don’t possess. And you might even wonder if I am not doing it now. Who knows?
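The logic of the honest signal can be illustrated with a small numerical sketch. It is only a toy model and not part of the theory as presented above: the linear mating benefit, the quadratic cost that shrinks with quality, and all names and numbers (`signal_cost`, `mating_benefit`, the quality values 1.0 and 3.0) are assumptions chosen for illustration. Under these assumptions, each male does best with a signal proportional to his own quality, and a weak male who imitates the strong male's signal ends up worse off than if he had signaled honestly.

```python
# Toy model of costly (honest) signaling.
# All functional forms and numbers below are illustrative assumptions.

def signal_cost(signal, quality):
    # Assumption: the same signal is cheaper for a fitter male.
    return signal ** 2 / (2.0 * quality)

def mating_benefit(signal):
    # Assumption: the mating benefit grows linearly with signal size.
    return signal

def net_payoff(signal, quality):
    # What the male gains from the display minus what it costs him.
    return mating_benefit(signal) - signal_cost(signal, quality)

def best_signal(quality, candidates):
    # The candidate signal level with the highest net payoff for this male.
    return max(candidates, key=lambda s: net_payoff(s, quality))

# Candidate signal levels from 0.0 to 5.0 in steps of 0.1.
CANDIDATES = [i / 10 for i in range(0, 51)]

weak_quality, strong_quality = 1.0, 3.0  # hypothetical fitness values
weak_signal = best_signal(weak_quality, CANDIDATES)
strong_signal = best_signal(strong_quality, CANDIDATES)

# Payoff of a weak male who "cheats" by copying the strong male's signal.
cheating_payoff = net_payoff(strong_signal, weak_quality)
honest_payoff = net_payoff(weak_signal, weak_quality)

print("weak male's best signal:  ", weak_signal)
print("strong male's best signal:", strong_signal)
print("weak male, honest payoff: ", honest_payoff)
print("weak male, cheating payoff:", cheating_payoff)
```

In this sketch the size of the signal tracks the quality of the signaler because the cost of the display depends on that quality. With a signal whose cost is the same for everyone, as is roughly the case for a verbal claim, the weak male would have no reason to stay honest.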

The theory of costly signaling is heuristically very powerful: it is a tool which helps us to link together facts studied by many different disciplines: evolutionary biology, anthropology of religion, economic theory (the problem of conspicuous consumption), sociology (theories of symbolic capital), and so on. Bird and Smith (2005) have shown how the theory of costly signaling is able to bridge the gap between social anthropology (which emphasizes the intangible relationships, self-representations and symbolic representations, the question of status symbols, etc.) and “naturalistic” approaches seen in terms of selfish but socially immersed individuals. They note: “By paying attention to the problem of maintaining credibility when individuals are taking interdependent decisions (concerning joint alliances, conflicts, relations of trust, etc.) in situations of incomplete information, signaling theory gives us a new interpretation of symbolic activities such as the aesthetic development, initiation rites, the ethnic boundaries, the ceremonial festivities, the circulation of wealth, conspicuous consumption, monumental architecture, religious commitment and the supply of altruistic goods” (p. 222). As far as aesthetics is concerned, the heuristic function of the costly signaling model resides in the fact that it allows us to locate art and aesthetics in the broader context of other social facts with which they are associated in most societies, such as religion, ritual, politics, conspicuous consumption, and so on. The fact that aesthetics is only one of the multiple domains of costly signaling in humans somewhat loosens its exclusive ties with sexual selection, although it remains important to stress that what marks their common specificity among the forms of costly signaling is the fact that in both cases the signaling is realized through a display which asks for a specific type of attention.

But as it stands, the theory, when applied directly to problems studied by social sciences, has several drawbacks.

The first is that it is not certain that every costly signal is honest and that every honest signal is costly: cheating can sometimes be very costly, and in non-competitive relationships even inexpensive signaling can be honest. After all, acts of language are sometimes honest, aren’t they? But these difficulties imply neither that costly signaling does not exist nor that it is not mostly honest.

The second problem is that it is not always clear whether the social facts which the theory is supposed to explain are really costly signals, the danger being that, contrary to what happens with signaling theory in evolutionary biology, its use in human sciences risks defining the notion of “cost” in a vague or metaphorical manner.

The third is that when it is used in human and social sciences, costly signaling is generally interpreted in terms of a vague functional equivalence and not of a structural homology. When using the concept, human sciences are generally looking for (supposedly) functional equivalents to sexual selection, like symbolic capital, agonistic power relations, prestige politics, and so on. [Footnote 5] As far as art and aesthetics are concerned, these vague functional equivalences, although they are relevant to some extent, have no great explanatory power. They can explain why some people collect art, why some organize lavish performances, and so on, but they seem unable to explain on a general level why people create art and why they are interested in it even when no prestige comes into play: enjoying a movie or a poem, inventing geometrical perspective or abstract art, and so on, do not have much in common with functional equivalents of some strategy of egoistic genes.

The fourth drawback is that, generally speaking, the theory of costly signaling takes into account only the cost of issuing the signal: the signal is costly for the issuer (the male bird). But what about the female? She certainly is not emitting a costly signal, because she is emitting no signal at all. But to be illuminating for aesthetics, the theory must be able to say something about attention. As far as I know, very little attention has been paid to the question of the cost for the receiver of the signal: is the reception inexpensive or is it costly? Well, if we think about the behavior of the female, it appears that the costly signals emitted by the male command, on the side of the female, a type of attention that is itself costly and handicapping. We have seen that she has to synchronize her attentional behavior with that of the male: she has to tune in. This implies that her attentional profile must have the same characteristics as the “poïetic” profile of the male: loss of pragmatic significance, focusing, heavy investment in perceptual and neurological cost, capability of sustaining delay of decision making, risk of being attacked by a predator, and so on. We have seen that for the ritual to work, the female, cognitively as well as on the level of her emotional reactions, must look at the whole situation as one not of direct interaction but of “display.” She must be able to read the signals as self-referential signals. And to do this, she must be able, as we have seen, to process all relevant stimuli (the bower structure, the decoration, the colors, the movements, and the vocalization of the male) by neutralizing direct feedback loops pairing a stimulus with an immediate behavioral response. Instead of the standard stimulus-driven behavior she must switch to a self-reinforcing, attention-driven, costly behavior.
This is the price she has to pay if she wants to be able to assess whether the signal is honest: as the male can only emit the signal if he possesses the qualities he advertises, so too can the female only get the assurance she is looking for if she is willing and able to assess the signals through an attentional process which is costly compared to standard attention. The most important point is that she can only get the information she is looking for if she processes the signals of the male in this costly way: it is not enough to identify them as being costly for the male, because you cannot identify them independently of the fact of experiencing them as costly, which means experiencing them in a costly way. There is no shortcut because only the complete sequence of the ritual gives access to the needed information.

So my tentative conclusion at this point is that once we reframe the notion of costly signaling in a way which takes into account also the mode of attention demanded by the emission, it can have an illuminating capacity as far as the study of art and aesthetics is concerned.

3 Aesthetic Attention

With this in mind, it is time now to focus directly on the question of aesthetic attention. In what way does the setting called “aesthetic experience” affect the dynamics of attention? What are the inflections that characterize the attentional processes occurring in an aesthetic setting compared to those occurring in standard attentional processes? The notion of a standard attentional process is problematic, but for my purposes this is not very important because I will not say anything specific about it: I use it loosely as a contrasting element for the traits which I take to be specific to attention in an aesthetic context. What I have to say about this specificity draws heavily on cognitive psychology and I will be unable here to go into the specifics of the experimental settings or to discuss the legitimacy of extrapolating from these settings to the problem of aesthetically oriented attention. But in a general way, the extrapolation of the experimental results to the case of aesthetic attention can be justified on grounds of commonly accepted and commonsensically formulated characteristics of aesthetic experience. I would like to foreground three major specificities:

The aesthetic inflection of attention results in a reversal of the relative importance of bottom-up information processing compared to top-down processing. Standard pragmatic information processing puts emphasis on stimulus-driven, bottom-up, schematic, and automatic treatments. In aesthetic experience, information retrieval is more heavily attention-driven, top-down, concretizing and reflective. Now it is important not to construe this as a dichotomy. Every attentional process is partly stimulus-driven, bottom-up, schematic, automatic, and partly attention-driven, top-down, concretizing, and reflective. Looking at a picture in an aesthetic way doesn’t neutralize the pre-attentional stages of visual organization, which are constitutively automatic, bottom-up, and so on. What happens is that aesthetic attention, contrary to standard attention, which is driven by the norm of perceptual and cognitive economy, does not maximize these processes but rather emphasizes the attention-driven, top-down ones. I think this explains why we often consider aesthetic attention to be active and standard attention to be passive. Of course, literally speaking, this opposition does not make much sense because stimulus-driven, bottom-up perception is never passive, even at the pre-attentional level: as we know, the pre-attentional processing of a visual stimulus, for example, is made up of operations of selection, as one of the central functions of this pre-attentional stage of information processing is the reduction of the complexity of the proximal stimulus. But it is easy to understand why we can have the impression that standard perception and attention are “passive”: we do not have conscious access to pre-attentional cognitive processes and at least in settings of ecological familiarity even the attentional levels are largely automatic because they are founded on an acquired expertise.

But of course, not all non-aesthetic attentional processes operate this way: “hard-looking” processes do also exist in other contexts. We have only to think about the entomologist or the botanist looking for specimens of a hitherto unknown species, or less exotically, about a person looking for a misplaced item. I think what is peculiar to the aesthetic inflection of attention is foremost the fact that it is not only attention-driven but also has a peculiar auto-teleology built into it: the aim of looking aesthetically at something is the process of looking itself. The entomologist looking for specimens of unknown species is hard-looking because he is aiming to identify the discrete differential characteristics which will allow him to identify the specimen as belonging to species A or B. So, his hard-looking is still a looking which strives for the most economical way to achieve this result, which means that his attentional processes are guided by the final result he wants to achieve: the correct identification of the specimen. This, it seems, is not the situation when attention is aesthetically inflected: if you adopt the aesthetic stance towards a flower, a sound, or a picture, your activity is not driven by the transitive aim of identifying it correctly as the flower A, the sound of K, or the representation of Z. This identification surely is often part of the aesthetic process—but once you have achieved this goal of identification, the process is not over. One could even say that it is now that it really begins. Take the example of a picture: you’ll go on to look at it, descending attentionally beneath the level of representational identification, looking for the visual organization, the balance of colors, then perhaps ascending again, putting the colors in relation with the representational content, and so on.
What this amounts to is that when attention is aesthetically inflected it is self-reinforcing: attention calls for further attention in an internal process of continuous feedback—a point already implicit in Kant’s analysis of “aesthetic judgment.” What could motivate such a costly process? As I already suggested in the opening statement, the motivation is, I believe, hedonic reinforcement, a question I will try to address at the end of this paper.

Attention-driven information retrieval, which is typical of aesthetically inflected attention, enhances our capacities of discrimination, be they perceptual, categorical, or emotional. Practicing this type of attention, even when it does not produce object-knowledge, enhances our cognitive abilities. Aesthetic attention is, among other things, a way to achieve what psychologists call perceptual learning.Footnote 6 Perceptual learning is not acquisition of new object-knowledge but results in the lowering of the attentional threshold. Lowering of the attentional threshold is a typical outcome of top-down, attentionally driven processes. It has been studied notably in the area of videogames, but it is a general process corresponding to what the two psychologists Ahissar and Hochstein (2004) call the “reverse hierarchy theory.” Their idea is that what limits performance in the field of simple visual discrimination is not that the relevant information is absent from neural representations, but that neophytes do not have access to it. In other words, the same visual stimulus gives rise to the same neural representations in all subjects, because their capacities for processing stimuli sub-personally in a bottom-up way are biologically hardwired and therefore basically identical. The subjects differ only in terms of their ability or inability to attentionally access this information. So potentially the necessary information is there for everyone but people differ in their capacity to gain attentional access to it. As Ahissar and Hochstein showed, the training of top-down attention-driven information retrieval lowers the threshold of our attentional access and so enables us to reach further down in the hierarchy of information retrieval.
More precisely, the reverse hierarchy theory predicts that this development of attentional discrimination is due to a descending cascade of top-down transformations on a neural plane that enhances the relevant information and weakens the irrelevant information.

A well-known pictorial strategy to produce such “reverse hierarchy” processes is to create pictures that are difficult to treat in a coherent way by automatic, bottom-up processes. This is the case, for example, with post-impressionism: although post-impressionism is still figurative painting, it very often is on the borderline between figuration and design. Think of the later series of Monet’s “Water Lilies:” at first glance, some of them seem to be pure design; it is the attention-driven descending processing caused by the title of the work which will give it its figurative content, without at the same time neutralizing the all-over design effect, producing in this way a constitutively unstable attentional logic. Using very different techniques, Matisse and Bonnard produce the same instability, this time in the form of a tension between the principle of depth construction and the principle of surface scanning. To a naive eye, Bonnard’s treatment of the relationship of depth-effect and surface-effect is disturbing and produces perceptual dissonance. Attention-driven, top-down perceptual processing is able to reduce this dissonance by producing a process of perceptual learning, developing in the spectator the capacity to adopt the so-called “pictorial vision” stance and to switch between this stance and the canonical visual mode. Even if such learning is not explicit and does not give rise to propositional knowledge, at least not directly, it seems hardly deniable that an important part of the cognitive appeal of pictorial art is related to this dialectic between painterly vision and world vision.Footnote 7

The dynamics of aesthetic exploration is characterized by a prevalence of horizontally distributed exploration over vertically integrated exploration. Standard cognitive processes use preferentially bottom-up automatic processing to produce efficient beliefs and evaluations in the least costly way. Specifically, when we encounter a perceptual stimulus, we try to associate it in the most economical way with a maximum of properties which do not belong to the perception itself but which allow us to integrate it into a larger context. This generalization operates through a process known as “schematizing:” this process “impoverishes” the potential complexity (and richness) of the stimulus by projecting upon it an internalized general pattern (or category) and by ascribing to the perceived event the categorical attributes of the scheme. A “cognitive pattern”Footnote 8 or “template” (“Sollmuster” or “Superzeichen” in German) of this type is a short-cut allowing us to minimize the cost of cognitive processing and to maximize its effectiveness (all other things being equal). The cognitive patterns, or at least the perceptual ones, generally operate at the pre-attentional level. For example, when we look for a fraction of a second at a triangle lacking one vertex, we see a complete triangle with three vertices, because an anticipatory sub-personal mechanism has “filled in” the missing third vertex:

[Figure a: an incomplete triangle with one corner missing]

But of course, this mechanism operates not only at the level of perception. It also plays a central role in conceptual categorization, where it has been studied under various names (such as “schema,” “prototype,” or “horizon of expectation”) by many disciplines, for example cognitive psychology, social psychology, and sociology of knowledge along with descriptive phenomenology and hermeneutics. In all cases, the function of the anticipatory “simplification” is to reduce the amount of potential information contained in the ongoing experience of our “being-in-the-world” and so to ensure the quickest possible integration of the new stimulus in the stock of familiar stimuli. But in the case of attention in the context of an aesthetic experience, the quickest way to produce beliefs is no longer the goal of the process. We are looking on the contrary for “contextual complexity,”Footnote 9 which is characterized by the fact that the top-down and horizontal explorations outweigh one-way bottom-up processing. This does not necessarily mean that the field of perception is more important than intellectual discrimination. The difference is one of cognitive dynamics: instead of trying to reduce the complexity of information, the aim being to produce a stable belief in the most economical way that fits into a class of already existing beliefs or (more rarely) that reorders the prototype of that class, aesthetically oriented attention favors complexity of information, looks for multiple (top-down as well as bottom-up) relationships between the different levels of information-processing and is willing to linger on the same level to explore it horizontally in all its richness.

4 The Hedonic Component

But why would we engage in such a costly relationship? As I suggested earlier, in aesthetic experience the costly cognitive process is regulated by hedonic feedback. Of course, most if not all cognitive processes are tied to hedonic reactions. But whereas standard attention is regulated mostly by its final outcome and therefore is heavily hetero-teleological, aesthetically oriented attention is self-teleological because, in its case, the hedonic calculator is functioning online in a feedback-loop with the ongoing attention: in aesthetically oriented attention, the costly processing of the signal is driven by an internal reward.

The empirical existence of direct online feedback loops between attention and reward in aesthetic experience has been amply demonstrated, notably by the cognitive psychologists Rolf Reber, Norbert Schwarz and Piotr Winkielman (2003). Reber and his colleagues are working in the field of cognitive psychology and their empirical evidence is mostly behavioral, although Winkielman and Cacioppo (2001) used facial electromyography (EMG) as a way to measure participants’ affective response. But their findings are corroborated by neuroscientific research by Ramachandran and Hirstein (1999) or by the team of Semir Zeki, who identified the area of the medial orbito-frontal cortex that mediates this feedback in the case of experiences of visual and musical “beauty” and “ugliness.”Footnote 10

One very important difference between the psychological model of Reber and his colleagues and most neuroscientific models is that Reber’s experimental work is concerned with establishing a difference between hedonic value attributed to the processed object and hedonic value attributed to the processing itself. Reber’s experiments highlight the fact that, in aesthetic contexts, pleasure/displeasure is a reaction to the process of attention, which implies that although the properties of the object of attention are a central part of the distal cause of aesthetic appreciation, its proximal cause is the ongoing attentional activity focused on the object. For Reber, what is rewarding during the process is neither the represented object as such nor the final cognitive outcome of the processing of the object (a “determining judgment,” in Kant’s terms), but the act of processing itself (the Kantian “harmony of the faculties”). Neuroscientists, it seems to me, tend (more classically) to relate the hedonic response to the properties of the attended object.Footnote 11

Another important difference is that neuroscientists mostly address issues in visual art, and although Zeki extends his research to music, he too is interested mostly in arts where the level of perception is central, which excludes not only the whole domain of literature (the case of oral poetry is of course more complex) but also conceptual art and, more generally, many forms of contemporary art which minimize the perceptual level of attentional engagement. Although Reber’s experiments are also exclusively studies of visual stimuli, his model explicitly claims validity for perceptual and conceptual levels of processing.

For multiple reasons, I think that we should look for a generic model of aesthetic experience, valid for all modalities of aesthetically oriented attention, and for a model that foregrounds processing instead of object properties. Along these lines, aesthetic experience could be defined as a bidirectional feedback loop established between the attention paid to the object (artwork or whatever) and an online hedonic calculus evaluating the positive or negative valence of the attentional process as it unfolds in time. Several points must be stressed: it is the attentional process which is evaluated by the hedonic calculator and not directly the object (although the appreciation will generally be projected on the processed object); this implies that the processing is meta-cognitive and reflective in important ways; the feedback goes in both directions; the hedonic evaluation is done online, which means that it regulates and is affected by the attentional processing.

Can we go further and try to find out if there are specific characteristics of the profile of the attentional process which are linked in a deterministic way to positive hedonic valence and therefore to positive aesthetic experience? As already indicated, contrary to objectivist theories of aesthetic evaluation, which place aesthetic value directly in the object’s properties, a model based on the hypothesis that what is evaluated is not the qualities of the object per se but the quality of the way it is processed cannot, of course, look to object-properties to find an answer to this question. The relevant characteristic must be a characteristic of the processing itself. What is this characteristic? The standard answer in cognitive psychology as to what causes positive feedback in the case of aesthetic experience has long been that it is fluency of processing which is the hedonic regulator. The more the processing is experienced as fluent, the more the aesthetic experience will be positive. This would imply that the only variable on which the hedonic calculator draws is fluency or easiness of processing. This has notably been the initial explanation given by Reber et al. (2004).

This conclusion has been considered by many critics of Reber to be counterintuitive. The first objection comes from art history: if fluency is the end of the story, how can we explain that many works of art—and, more precisely, many highly successful ones—are intentionally designed so as to limit fluency of processing? This is notably the case with important parts (but of course not all) of poetry. It is also the case not only in modernist music but also in classical polyphony and so on. To answer this criticism, Reber has made his theory more complex. In Reber and Bullot (2013), “disfluency” is introduced as an artistic strategy to “manipulate fluency.” As the authors explicitly state, fluency remains the cause of the positive effect, which implies that disfluency is considered to be a source of negative affect. Why then should the artists be keen on introducing “disfluency?” Reber and Bullot state that its function is instrumental for manipulating the mode of engagement of the public: “[…] disfluency can elicit inferences about the artwork and a more analytical style of processing in appreciators who adopt the design stance and acquire art-historical understanding.” Later on they state: “For instance, artists may aim to elicit processing disfluency in order to prevent automatic identification of the content of a work, or elicit thoughts about issues that are culturally significant in their art-historical context.” As the use of the expression “automatic identification” suggests, disfluency seems simply to be a new word for the process called “defamiliarization” by the Russian formalists.
But in fact, disfluency is not the same thing as “defamiliarization.” Whereas the Russian formalists thought that defamiliarization was necessary to uphold satisfying aesthetic experiences, Reber and Bullot think its aim is to compel the public to go beyond the basic exposure stance (the stance of the naive spectator, so to say) and to take into account the design stance (personal and historical intentionality) and the artistic understanding stance. As Reber and Bullot state, this implies adopting a more “analytical style of processing,” which in fact culminates in a historical interpretation of the native signification of the artwork. But adopting the stance of historical analysis and interpretation is different from adopting the aesthetic stance. Of course, intentional and historical information may inform aesthetic experience: it can render it richer. But it is part of the input into the aesthetic experience and not part of the experience itself. The “analytical style of processing” is a standard cognitive approach to art, for example that of the art historian. If the function of “disfluency” is of this kind, it cannot be the right answer to the problems encountered by the theory of fluency because it displaces the problem from an aesthetic plane to the plane of the background information for the experience. It also gives a biased image of standard aesthetic experience in many arts: in movies, narrative, theater, poetry, and many others, the standard experience is not historicist in this way. On an analytical plane, we should not conflate cognitive understanding of the native intentional identity of artworks with the aesthetic appreciation of artworks.

But there exists a second objection to the fluency theory, which could perhaps show us a way out of the problem. This second objection comes from inside psychology itself. Several studies have shown that the attractiveness of fluency has a boundary condition: boredom.Footnote 12 When fluency is pushed too far, the hedonic valence is inverted and becomes negative. This fact indicates that fluency cannot be the whole story and suggests the existence of a second factor counterbalancing fluency. What could this factor be? Well, the most plausible candidate would be curiosity.Footnote 13 Artworks must not only be “beautiful,” they must also be “interesting,” that is, stimulate curiosity, and my tentative hypothesis would be that positive hedonic feedback is the result of fluency and curiosity counterbalancing each other. Curiosity is somewhat difficult to assess in psychological terms: although it is defined by a lack of information and by a drive to reduce the information gap, it is not, contrary to disfluency, experienced as dysphoric but rather is associated with positive feelings. This inherent positive hedonic valence of curiosity has perhaps been shaped by evolution, curiosity being a fitness-enhancing quality. But whatever the evolutionary cause, the reality of the positive hedonic valence of curiosity is well established.

In what way could curiosity go together with fluency to enhance positive hedonic value? I think it is important to notice that the two factors do not have the same status. Fluency and disfluency are two opposing experiences of processing dynamics, disfluency being generally experienced simply as that which hinders fluency. Curiosity is not an experience in this sense. It is a mental attitude (or disposition) opposed to that of lack of interest (and lack of interest is provoked, among other things, by boredom, which, as we have seen, is a limiting condition for experiencing fluency positively). Curiosity is an attitude of positive cognitive alertness for stimuli (objects, events) not yet processed or only partly processed. The positive valence does not depend on the nature of the stimuli but is tied to the simple fact that the stimuli are as yet not processed. This means that curiosity values the act of processing information as such. Loewenstein (1994), along with Lahroodi and Schmitt (2008), argue therefore that, in its purest forms, curiosity is characterized by an auto-teleological drive: when we are curious, we are valuing information in itself independently of any specific cognitive or pragmatic reward. This means that the reward of curiosity lies in the onset and the continuation of processing itself.

If this tentative outline is correct, then artists are not obliged to construct traps of “disfluency” to maintain the positive interest of the art-lover: they have to get him to become, and then to stay, interested in processing the object (work of art). That is, the work must be rich in the sense of opening up the possibility of an intense and open processing. This means that it must be complex: as Reber, among others, noticed, if people value fluency in a positive way, they nevertheless prefer complexity over simplicity. If curiosity is a factor of the dynamics of positive aesthetic evaluation, this would be what we should expect. All this does not mean that fluency is not important, but it certainly cannot explain positive aesthetic value on its own. It seems to me that a model based on the tensional interplay between fluency and curiosity is what we should look for.

I should, of course, add that this psychological description—if it helps to understand the internal dynamics of aesthetically oriented attention, its mechanics so to say—tells us nothing about the social and cultural factors that shape the attribution of hedonic valences and, of course, the attentional processes themselves. I was here only interested in the mechanics, even if it can be argued, and I would agree with this argument, that the most complex problems we have to face are those concerning the level of a correct understanding of the way social and cultural factors shape our attention and our allotment of positive or negative hedonic valence.

5 Some Concluding Remarks

I am not sure that the descriptive and explanatory outlines I have sketched above really fit together to draw an integrated portrait of aesthetic experience, but it seems to me that they constitute a possible starting-ground if we want to gain a better understanding of aesthetic experience. The difficulties that remain are numerous. One difficulty is the following: if aesthetic experience and artistic creation are phenomena of costly signaling, what about the second condition of costly signaling, the honesty condition? We saw that the decisive criterion explaining the existence of costly signals was their inbuilt honesty, due to the impossibility of simulating them. Is this condition valid for artistic creation and aesthetic relation? As of today, I am unable to give a satisfactory answer to this question. To explore the problem, one possible entry is the question of paraphrase and summary: although artworks can be paraphrased or summarized to convey information about them, they cannot be aesthetically experienced through a summary or a paraphrase. If this is the case, then one could perhaps develop an argument in favor of their constitutive “honesty:” they cannot be separated from their singular contingent identity, because the work of art is not the vehicle of the signal but its incarnation: the relation is one of self-exemplification. And I would argue that the idea of autonomy and of the artwork as a self-enclosed self-referential phenomenon, defended notably by Gadamer and Wittgenstein, should be studied in relation to this question. The impossibility of replacing an artwork by a summary or a paraphrase holds true not only for intermedial situations (for example, replacing a painting by its verbal description), but also for paraphrases or summaries in the same medium. To experience In Search of Lost Time, you have to read the whole novel: no summary will do.
Of course, a summary or a paraphrase can give me substantive information about the representational content of Proust’s work. But experiencing the work aesthetically is to experience it not only as representing a world but as incarnating it verbally in precisely the “form” Proust gave it. To elicit the same experience, one would have to copy it. The same holds true for aesthetic experiences relating to natural phenomena: no description of a landscape can replace the real experience in its singularity as experienced by a singular individual. Of course, the description of a landscape can itself be the object of an aesthetic experience, but in this case, the experience is tied to the description and not to the landscape. I am not sure that these hints are really conclusive but it could be interesting to push them further.

Another open question is that of the evolutionary aspect of the homologies between the processes of the bowerbirds on the one hand and artistic creation and aesthetic experience on the other. In fact, this question boils down to that of the functionality of costly signaling as evidenced by artworks and aesthetic experience. The hedonic feedback loop helps to explain how the process is possible on the level of the individual person but this does not tell us how and why it evolved and survived culturally, as a social fact present in a variety of forms in all human societies. Remember that costly signaling is characteristic of situations where the information that agents have access to is both incomplete and essential to them. This could perhaps help us to understand why art and aesthetic experience are so often present in risky communicative situations where inexpensive signaling does not seem to be appropriate: this is certainly the case when men or women want to seduce, when they want to impress a rival, when they want to show their power or their submission. But as I said, it would be simplistic to focus on these agonistic situations between individuals. Socially speaking, art and aesthetics are very often tied to existentially more elementary situations of risky communication: when we enter into a relationship with otherness, for example, with the spirits or the ancestors or the dead, or when we are faced with the conundrum of our own existential identity within the social, natural, and cosmic world—in short, in the countless lived situations in which our existential mood, our attunement, our “Gestimmtheit” (Heidegger) as individuals or as groups caught in a network of human and cosmic realities ceases to go without saying. In most human communities, these situations have given rise to a number of cross-culturally related phenomena and artifacts: dances, ornaments, sculptures, verbal productions, performances, and so on—what we here and today call art.