This will be our reply to violence: to make music more intensely, more beautifully, more devotedly than ever before.

―Leonard Bernstein

Music is so prevalent in human societies that its presence in everyday life could easily be taken for granted. It has been present in all civilizations around the world since their beginning. Most humans seem to be able to create it in some form and are susceptible to its influence. Music is a means of enriching one’s sensory experience and serves as a socializing agency that influences many behaviors (Hodges, 2020). The benefits of music for a society have been proclaimed from the time of the ancient Greeks through to the present. Aristotle believed that music could promote moral behavior, thus creating model citizens for the ideal state. Plato believed that music could bypass reason and penetrate into the core of the self, thus affecting the development of a person’s character (Popular Beethoven, n.d.). In its Survey of Public Participation in the Arts, the National Endowment for the Arts (NEA, 2020) reported a number of statistics that underscore the pervasiveness of music in everyday life: In 2017, 175 million adults in the United States used electronic media to access artistic or arts-related content; 128 million adults attended artistic, creative, or cultural activities, with live music performance being the most frequent; and 128 million adults created or performed art, with singing being the most frequent form of expression. In addition, of the adults who participated in the performing arts, 62% reported that they did so to spend time with family and friends. Finally, the Bureau of Labor Statistics reports that artists, including musicians, are becoming a larger portion of the U.S. labor force, increasing by 9.2% from 2006 to 2017 (NEA, 2019). Music is thus as relevant a cultural practice today as it was for Aristotle and the ancient societies.

Musicologists consider music to be a species-specific trait that is unique to humans. Music and art figure prominently in the evolution of a culture, so much so that Hodges (2020) contends that it is impossible to know a culture without knowing its art. In light of efforts to extend the science of behavior to cultural practices (see Mattaini & Cihon, 2019), a topic worthy of consideration for culturo-behavior scientists concerns the function of music for an individual or group of people. We might ask, for example, why the behavior of music making was selected in our ancestors, and how people benefit from its selection (see Brown et al., 2000; Morley, 2014).

Skinner (1981) proposed three levels at which the mechanisms of selection operate: the genetic, the operant, and the cultural. Skinner hoped that the science of behavior would one day be sophisticated enough to provide analyses of any number of cultural phenomena, and as articulated by Mattaini (1996), such analyses may promote the survival of groups of people and may even prevent a society’s demise. Mattaini’s perspective is not unlike that of contemporary evolutionary scientists, who today recognize that cultural-level selection mechanisms may have a greater impact on behavioral inheritance than genes (see Jablonka & Lamb, 2014). Through this lens, Wilson (2007) notes that today’s evolutionary science framework may aid not only in understanding but also in improving the human condition. Analyzing the function of cultural practices such as music may contribute to this agenda.

In this article we explore the selection mechanisms responsible for the retention of music-related behaviors across time and cultures. We explore the notion of music first as a biological-level adaptation and second as a cultural-level adaptation. We also propose that music is a highly complex form of arbitrarily applicable relational responding (AARRing; see Hayes et al., 2001) that requires cooperation between two or more individuals. As such, music as a form of AARRing may be culturally selected through metacontingencies that guide the form and function of music in the past and present. Our hope is that this provisional model, although complex, produces testable predictions that can aid in a culturo-behavioral understanding of music, its function, and the ways in which music might contribute to the well-being of a society.

Music as a Biological Adaptation

Darwin was intrigued by the evolutionary significance of music but was mystified as to the basis for its selection and retention. In The Descent of Man (Darwin, 1871/1981) he wrote that “neither the enjoyment nor the capacity of producing musical notes are faculties of the least use to man. . . . They must be ranked among the most mysterious with which he is endowed” (p. 733; see Sacks, 2008). Darwin was fascinated by the similarities between music and speech. Both music and speech require timing, consist of unlimited varieties of complex sequences, and are often accompanied by bodily movements (Corballis, 2010). Pinker (as cited in Sacks, 2008), on the other hand, disputed the relevance of music by declaring “what benefit could there be to diverting time and energy to making plunking noises. . . . Music is useless. It could vanish from our species and the rest of our lifestyle would be virtually unchanged.” Pinker further argued that music is merely a “happy by-product” of some other cultural adaptation that may have served some important purpose for our ancestors (Sacks, 2008). Morley (2014) considered the debate between two viewpoints: that musical behavior is a leisure activity that is superfluous, luxurious, and redundant, and that it is a core aspect of human life and experience. Morley astutely observed that

in a sense, aspects of Western societies’ consumption of music encompass both these perspectives simultaneously: in Western societies in particular music is packaged and sold as a product, a consumable luxury addition to life’s essentials, yet these sales are often predicated on the idea that music deals with fundamental human emotional concerns; it is used to adjust human behavior, solicit our affections and votes, and to elicit particular emotional responses in specific circumstances. (p. 148)

Nonetheless, because music of some variety has existed in every single culture throughout history, musicologists maintain that the ability to make and respond to music is indicative of a genetically endowed universal potential (see Morley, 2014, for a review of the evolutionary history of the development of musicality in humans). In short, music may have evolved as a general human capacity (Wallin et al., 2000).

Music presumably left no trace until the writing repertoire emerged; as is the case with verbal behavior, we can only speculate about its early forms (Skinner, 1986). As highlighted by Morley (2014), the earliest known instruments in the archaeological record include flutes shaped from mammoth tusks in the European Paleolithic era that are approximately 40,000 years old (e.g., Higham et al., 2012). Indeed, this date is very likely a substantial underestimate of when music appeared in our evolutionary history. As Morley acknowledged, “This is some 150,000 years after the estimated emergence of our species, and studies in developmental psychology and neuroscience strongly suggest that the human capacities that underpin musical production and perception have a much longer evolutionary history” (p. 148).

Musicologists (e.g., Cross, 2003; Wallin et al., 2000) have identified a number of features that music making in different cultures shares, which have fostered the notion of music making as an evolved capacity. For example, octaves, or intervals whose higher note has a sound-wave frequency of vibration twice that of its lower note, are equivalent in music from different cultures. Scales, or sets of notes ordered by pitch, typically consist of seven or fewer pitches. Rhythmic patterns are frequently based on duple or triple beats, and loud and fast music is often perceived to be exciting. Absolute pitch, in which a person can identify auditory stimuli presented in isolation without a reference, appears to be an inherited trait. Rhythmic behaviors in parent–infant bonding, such as rocking and patting, are coordinated with vocal pitch and dynamics when a parent cares for their child, and in fact infants who interact rhythmically with their caregivers via rocking or other movements show superior performance on several developmental milestones relative to those who do not (Hodges, 2020).

Morley (2014) highlighted that although music was ubiquitous across all human cultures, definitions of what could be characterized or classified as musical behavior differed considerably cross-culturally. Despite this cross-cultural variation, Morley defined the regularities observed:

It would appear then, that musical behavior amongst all humans involve the organization of sounds into pitches (frequently three to seven), unequally separated across the scale, including the perfect fifth interval, and favoring consonance over dissonance; they involve organizing sound sequences so that they have a deliberate structured temporal relationship with each other including attributing a regular beat to these stimuli. (p. 150)

Morley further noted that music is not just an auditory phenomenon, as it reflects an intentional, temporally organized bodily action. Moreover, it is not possible to make music without action, and this specific, planned arrangement of bodily action is necessary to produce both tonal and rhythmic sound. Of particular interest from a contextual-behavioral science perspective is that despite the structure and intentionality necessary to create music, the capacity remains for that piece of music to have multiple interpretations and be perceived or experienced in very different ways (Cross, 2003; Morley, 2014).

There appears to be a formal similarity between rhythm, pitch, and the actions with which they are often associated. In relational frame theory (Hayes et al., 2001), this can be referred to as non-arbitrarily applicable relational responding (NAARRing). A slow rhythm takes a similar auditory form to the proprioceptive experience of infants rocking. A fast rhythm and lower pitch, conversely, may correspond to more aggressive rocking that would have aversive functions for the infant. On the other hand, a faster rhythm may correspond to elevations in heart rate and arousal and therefore may have augmental functions (i.e., reinforcer strengthening functions) in contexts where high rates of action like tribal warfare or competitive sport are adaptive. In this way, elements of music may be nonarbitrarily related to experiences when the dynamic properties of sound (i.e., auditory stimulus event) correspond to experiences in other sensory modes. Musicologists believe that these similarities lend support to the notion of music as an evolved ability, found in all people across time and cultures (Brown et al., 2000), and, furthermore, provide evidence of traits that are common in all people, suggesting a phylogenic basis. Nonhumans even display rudimentary forms of rhythmic behaviors, often tied to migration or hibernation, and in fact rhythmicity is suggested to be present in all organisms with a central nervous system (Delcomyn, 1980).

Darwin’s study of song patterns in birds led him to propose that the main purpose of music is to facilitate sexual attraction and reproduction. Birdsong is most common during breeding season and shows characteristics of both rhythmicity and tonality, sometimes resembling duets and choruses between humans (Whaling, 2000). Some primate species vocalize with rhythmic patterns akin to human singing, and whales demonstrate approximations of songs with themes. Such precursors to music making in nonhumans seem to involve organisms responding to the vocalizations of one another in a coordinated manner (Wallin et al., 2000). Vocal behavior of this sort may attract mates, announce food or danger, or defend a territory—all with the benefit of promoting survival. Skinner (1984) acknowledged the importance of imitation in establishing behaviors with survival value. He stated that

once imitation evolved, contingencies of selection exist that should produce modeling. A young bird will eventually fly by itself, but it flies sooner if parent birds fly, and if early flying has survival value, then parental modeling should evolve, the parent birds flying often and in particularly conspicuous ways that can be imitated. (p. 218)

The same may be applied to song imitation. Not only do birds vocally imitate within species, but some birds mimic sounds from other bird species and elsewhere in their environment (Goller & Shizuka, 2018).

Further evidence of music as an evolved capacity that involves NAARRing stems from the domain of neuroscience, where neurological correlates of music making have been identified. Some research has isolated the role of the dopaminergic system in music-evoked pleasure, suggesting a phylogenic basis for the rewarding experience people may have while listening to music. Ferreri et al. (2019), for example, found that participants who were orally administered a dopamine precursor evaluated a passage of music more favorably and were more motivated to spend money to hear the music than participants who were administered an antagonist or a placebo. Music has also been proposed to stimulate oxytocin and sympathetic nervous system responses that compete with neural pain messages (Clarke et al., 2015). Differences in gray matter volume have been found in professional musicians relative to amateur musicians, predominantly in brain regions responsible for motor, auditory, and visual-spatial learning (Gaser & Schlaug, 2003). Although findings such as these do little to clarify the function of music, they do suggest that the mechanisms of selection produced anatomical changes in people as music making evolved. Heart rate and blood pressure have also been shown to change when people listen to music, lending further support for the notion that humans are phylogenetically susceptible to the influence of music, even at the physiological level. Hodges (2020) states that “genetic instructions create a brain and body that are predisposed to be musical” (p. 31).

Elicitive Effects

In general, classical conditioning research with music tends to suggest that music influences behavior by altering mood and arousal (e.g., Bruner, 1990; Husain et al., 2002). In consumer behavior research, for example, music as an unconditioned stimulus has been shown to affect explicit and implicit attitudes toward product brands and influence brand choice (Redker & Gibson, 2009). Background music pairing appeared to impact sales of wine, with a greater volume of wine purchased when classical music was played in a store than when Top-40 pop music was played (Areni & Kim, 1993). Classical conditioning has also been applied to explorations of whether music listening conditions involuntary musical imagery (e.g., Filippidi & Timmer, 2017).

The elicitive effects of music, particularly with respect to emotion, have long been studied (Juslin & Vastfjall, 2008; Susino & Schubert, 2019; Zenter et al., 2008), with familiarity of the music piece or particular music styles seeming pertinent (e.g., Pereira et al., 2011). One particular framework, the BRECVEMA framework, proposes that there are eight distinct psychological mechanisms when emotion is elicited in a listener via music (e.g., Juslin et al., 2014; Juslin & Vastfjall, 2008). In brief, they are described as follows: (a) brain stem reflex, which refers to an unconditioned response to music’s psychophysical cues (e.g., tempo, pitch, loudness); (b) rhythmic entrainment, whereby a body’s internal rhythm, such as heart rate, entrains or adjusts to the external rhythm of the music and subsequently affects a listener’s experienced emotion via proprioception cues; (c) evaluative conditioning, a classical conditioning pairing of a piece of music with other emotionally valenced stimuli (positive or negative), leading to a conditioned emotion; (d) contagion, a rather loosely defined idea that there is some natural imitation of the emotion that is perceived in the music piece itself; (e) visual imagery, which refers to the perspective that a listener may experience an emotion as mental images in the mind that are evoked while listening to music; (f) episodic memory, whereby the music elicits a particular explicit memory of an event in the listener’s life; (g) musical expectancy, insofar as an emotional response is due to whether the gradual unfolding of the syntactical structure of the music piece occurs in an expected or an unexpected manner; and (h) aesthetic judgment, which is a listener’s own personal and subjective evaluation of the music’s aesthetic value based on their own set of criteria.

There is a considerable body of literature on music preference as a defining feature of a person’s personality or self (e.g., Rentfrow & Gosling, 2003), and also on stereotyping “others” according to their musical genre preference, such as hip-hop, heavy metal, and dance music (e.g., Greasley & Lamont, 2016; Rentfrow et al., 2009; Rentfrow & Gosling, 2007). This represents an area ripe for examination from a relational responding stance derived from culturo-behavior science in terms of the contextual transformation of symbolic functions, as will be discussed shortly. Susino and Schubert (2017) proposed that symbolic transfer could occur cross-culturally with respect to music and stereotyping in their STEM (stereotype theory of emotion in music) model. In essence, STEM suggests that the emotion a listener perceives or experiences in music may be based on what Susino and Schubert term “stereotyped associations” that the listener holds about the cultural origin of the music piece (e.g., Brazilian culture encoded in samba music or Portuguese culture in fado music). Indeed, Susino and Schubert (2019) tested the idea that some music genres may be regularly paired via evaluative conditioning with a limited set of emotions linked to a held cultural stereotype that, for example, Japanese culture is “calmer” than more Western cultures, which are stereotypically viewed as more “angry.” Traditional Japanese music appears to reflect the calming properties of Japanese culture, and thus Susino and Schubert (2019) examined whether this stereotyping effect would compete with the emotions evoked by the music via psychophysical cues or autobiographical associations.

The previous summary underscores the relevance of music for the purposes of sexual attraction and reproduction in our ancestors, as well as, in rudimentary form, in nonhumans. It also highlights the elicitive effects of music in a classical conditioning paradigm, even in complex human cultures. However, this conceptualization of the function of music leaves unexplained the fact that music is typically created for and engaged in by groups of people (Brown, 2000). Moreover, focusing exclusively on the NAARRing properties of music does not fully explain the vast complexity of what music can communicate and how it may function for groups. In hunter-gatherer societies, for example, music was used in rituals and ceremonies in which entire tribes or villages participated, and was often accompanied by group dance and song. Today music is a common component of social events such as parties and celebrations in many cultures and features prominently in games all over the world. Live musical performances typically involve more than one performer and are intended to be enjoyed by groups of people. Music, therefore, seems to involve synchronous, coordinated behaviors among those engaging in it, including those producing and those listening to it.

In some instances, reinforcement contingencies support the emission of the same behaviors on the part of group members, such as members of an audience at a concert. In other instances, reinforcement contingencies support the emission of different behaviors on the part of group members, such as members of a tribe who drummed in different ways during a prehunt ceremony, with some using the body as a drum, others using actual drums, and others humming or chanting. Thus, in some cases, contingencies support the same behavior on the part of group members who produce some cumulative outcome, in what has been termed a macrocontingency (Glenn et al., 2020), such as people singing the same hymn during religious worship, the cumulative effect being a loud vocal production presented in unison. Alternatively, in other cases, contingencies may support different behaviors on the part of a group’s members that contribute to the group’s aggregate product, in what has been termed a metacontingency (Glenn et al., 2020).

Figure 1 provides an example of interlocking behaviors that might participate in the complex event of a modern concert. There are several interlocking behaviors that occur in the development of the concert that include social reinforcing functions. For example, two people may work together to produce the lyrics for a song, and three people may work together to develop the music. These five people must then work together (reciprocal arrows) to construct a coherent song where the words thematically and rhythmically match the musical sounds. The band must also practice the music and lyrics to a level of mastery in order to perform and may even work directly with the lyricists and musical developers to adapt the content throughout the production process. A large group of nonmusicians must also work together to organize the concert by acquiring the venue, advertising the concert, ensuring all equipment is functioning, selling tickets, setting up for the band, and troubleshooting throughout the concert event.

Fig. 1

Metacontingencies that may influence concert performances through the ongoing interaction of music and concert producers (interlocking behaviors), the concert itself (aggregate product), and responses of the crowd (aggregate outcomes). Note. IBC = interlocking behavioral contingency; AARRing = arbitrarily applicable relational responding; NAARRing = non-arbitrarily applicable relational responding

The aggregate product of all of this is the concert itself, which occurs within a concert context. The concert of course is the music being played. The concert context can be a local phenomenon, such as the type of venue or event, or a more temporally extended phenomenon, such as a social or political movement taking place outside of the concert itself. Both of these contextual variables likely influence the aggregate outcomes of the concert. More locally, if someone plays a heavy metal song at a wedding while the spouses initiate their first dance, this may be less likely to be reinforced than a classic love song selected by the newlyweds. More temporally extended, music that may have once been common may lose its flavor, possibly due to the evolution of music (e.g., traditional forms of tribal drumming may only produce reinforcement in a very limited context today).

The actions of the listener then support the continued development of topographically similar concert events. Outcomes that may reinforce the production of the event may be monetary or serve as indicators of future monetary reward. For example, whether the crowd dances or cheers may be a function of the history of each individual attending the concert and the meaning they ascribe to the music being produced (de Rose, accepted). If most or all of the audience is dancing or cheering, this may suggest shared appetitive experiences with the music that could indicate future purchasing or engagement with the group that produced the musical event. Concert attendees may also indicate their intent to engage further and pay to attend future events through online reviews and rating systems. Moreover, their reviews are contacted by others who did not attend the concert, which could increase future attendance. Finally, the band may experience increased sales following the concert.

It is important to note that the behaviors of the crowd are also likely reinforced in very complex ways. Concertgoing may have other social functions, as well as reproductive functions as described previously, as dancing can increase feelings of sexual arousal and events at the concert provide a common source of discourse for concertgoers and nonconcertgoers alike. Thus, the cultural contingencies surrounding the concert event may even further reinforce or strengthen the actions of the crowd that reinforce or punish the interlocking behaviors of the band and producers.

This same general model could be applied to any musical event in the past or present. What is key in all of these examples is that selection contingencies support the execution of behaviors that are coordinated and occur in synchrony with that of others. Thus, the selection mechanisms responsible for the transmission of music across time and cultures appear to operate at the level of a group of people; music must have benefits for a community’s survival beyond sexual attraction and reproduction. Additionally, this can be provisionally explained within existing behavioral principles as described in a metacontingency model. This metacontingency model is even further expanded by advances in our understanding of cultural evolution and the role of symbolic referential behavior, which we will elaborate on shortly.

Brown (2000) maintains that sexual selection may be only an indirect mechanism of selection, secondary to the benefits that music affords for a group or community. A further limitation of the sexual selection hypothesis, Brown acknowledges, is that it fails to account for how music facilitates or complements the development of other complex cultural practices: For example, during many points in history, music has served as a voice for large groups of people to speak out against social injustices. In addition, music is often used to educate or communicate ideas, including in educational settings for young children, and also serves as platform anthems for political campaigns. Finally, music is used to cope with adversity, a phenomenon recently observed during the COVID-19 crisis: Fink et al. (2021) reported substantial increases in music listening and music playing in a large, global sample of participants, who were surveyed about their experiences during lockdown. The sexual selection hypothesis does not account for the myriad of ways in which music functions in people’s everyday lives. Although music making may represent an inherited capacity in humans, with neurophysiological evidence to support this possibility, truly understanding the function of music requires an analysis of how phylogenic, ontogenic, and cultural-level contingencies interact in its selection. Today’s evolutionary science perspective grants considerable emphasis to the role of ontogenic and cultural-level contingencies as mechanisms of selection, a topic to which we now turn.

A Renewed Evolutionary Science

Evolutionary scientists today acknowledge that behavioral repertoires are shaped and sustained more from a person’s interactions with their particular sociocultural community and less from their genetic endowment. In other words, Skinner’s (1981) second and third levels of selection, ontogenic and cultural-level contingencies, respectively, may be more prominent sources of influence on the provenance of behavior than phylogeny. With regard to music, it may well be the case that humans evolved with a susceptibility for creating, engaging in, and enjoying music, but the transmission of music-making behavior cannot be due exclusively to genetic coding. Wilson et al. (2014) suggest that the traditional study of evolution offers few insights into the selection of complex and diverse cultural practices such as music. Evolutionary theory has expanded beyond Darwin’s framework to include social learning and cultural practices—Skinner’s (1981) second and third levels—as additional behavioral inheritance systems. Moreover, as acknowledged by Skinner (1981), the systems interact: Wilson et al. contend that social learning over the course of an individual’s lifetime produces variation in, selection, and retention of specific patterns of behavior that in turn cause the culture to change as well. Jablonka and Lamb (2014) define culture as a “system of socially transmitted behaviors, products, and preferences which can have cumulative effects, one effect being a persistence in cultural-level behaviors that serve some adaptive value for a group” (p. 158). According to this stance, ontogenic and cultural-level selection mechanisms may be inseparable. Jablonka and Lamb accordingly maintain that evolutionary change does not have to wait for genetic change to occur because the interaction between ontogenic and cultural-level contingencies is continuous and ongoing.

This revised understanding of evolutionary processes has permitted an analysis of the selection of highly complex behavioral repertoires, including language and symbolic thought. Jablonka and Lamb (2014) assert that the use of symbols is a “diagnostic trait” of humans that permeates every aspect of human societies and characterizes numerous cultural practices (p. 189). Signs, the authors state, become symbols when they are assigned meaning by a particular sociocultural community, at which time they may be said to become a part of a “symbolic inheritance system” in which their meaning is actualized based on their relationship to other signs in the cultural system (Jablonka & Lamb, 2014, p. 196). Jablonka and Lamb further note that symbolic inheritance systems are the basis of many cultural institutions, including art, music, religion, dance, and literature (p. 201), for which there are variations across different cultural systems. Importantly, the functions of symbols in a cultural system do not necessarily depend on a person’s direct experience with their physical environment. Wilson et al. (2014) introduce the term symbotype to refer to “higher order relations abstracted from and distinct and separate from physical properties . . . (which) become part of a social community, shaped by sociocultural contingencies” (p. 10). Symbolic inheritance systems including symbotypes permit people to participate in an “imagined reality” not tied to direct experience (Jablonka & Lamb, 2014, p. 196). Such inheritance systems are selected and retained because of the benefits they provide for a group or society.

As discussed previously, music is a core feature of many cultural activities and practices and is typically engaged in by groups of people of varying sizes. Music always seems to mean something to a group of people. For example, loud and fast music with heavy percussion has been symbolic of war, military combat, or victory in many cultures throughout history. Likewise, soft tunes with slow tempos, such as those of a lullaby, are often equated with soothing, bonding, and love. Musical passages were constructed to convey different meanings in prehistoric societies. For example, specific melodies or passages were assigned to specific ceremonies, rituals, festivals, or celebrations. Early musical passages included variations in pitch, rhythmic structure, tempo, and numbers of voices to convey celebration, worship, or healing (Thaut, 2015). Although similarities may be found between symbotypes across cultures, the specific characteristics of a symbolic inheritance system are culturally bound. A country’s national anthem typically only has meaning for a particular nation, for example, and songs that characterized a political era, such as protest songs from the 1960s, may have meaning only for those who lived during that era. Celebration songs are used to convey happiness, health, luck, and fortune to celebrate New Year’s in different countries, but the characteristics may differ across cultures. In China, ancient instruments are used for New Year’s celebrations, for example. How music functions as a symbolic inheritance system is thus determined and actualized by a particular sociocultural context and may spread as cultural practices are selected and transmitted into new geographic settings as migration occurs. Composers of music, operating in accordance with the contingencies of their cultural community, assign meaning to their compositions. Antonio Vivaldi composed his “Four Seasons” concerti such that each of the four concerti was symbolic of a particular season. As articulated by Jablonka and Lamb (2014), music promotes a shared experience among members of a community that is separate from one’s direct experience with the physical world. Imagining snow falling in a forest on a dark, cold evening may be a common psychological experience among people listening to Vivaldi’s “Winter” concerto that is completely detached from one’s physical surroundings. Other pieces may occasion similar imagery even in the absence of lyrics or titles to occasion covert experience. Music thus represents a rich and complex symbolic inheritance system.

This conceptualization of the evolution of symbolic thought and language is entirely consistent with the behavior-analytic conceptualization of relational responding and its phylogenic and ontogenic origins, to which we now turn.


Critchfield and Rehfeldt (2019) define relational behavior as responding to one stimulus in terms of another—in other words, interacting with the symbolic functions of stimuli. Most stimulus relations are arbitrary, in that the stimuli are related not on the basis of physical properties but because the reinforcement contingencies of a given verbal community teach people to relate stimuli in particular ways. Experience in one’s sociocultural community, therefore, is what gives stimuli meaning (Critchfield & Rehfeldt, 2019). When people develop repertoires of AARRing, the behavioral functions of stimuli are determined by their relationship to other stimuli, and a person comes to interact with the indirect, symbolic functions of stimuli without having to experience the actual contingency. The verbal utterance “Careful! The stove is hot,” for example, is sufficient for prohibiting a child from touching a hot burner and continues to function in this way as the child repeats the utterance themselves—the child need never experience the aversive consequence of touching the hot stove. Relational frame theorists refer to this process as the “literality” of relational networks, or “responding to verbal formulations in some ways as if one were responding to the actual contingency” (Hayes & Wilson, 1994, p. 290). AARRing is not only necessary for effective functioning in one’s day-to-day environment (Critchfield & Rehfeldt, 2019) but also required for a listener to understand, and engage appropriately in, the cultural practices of their community. Given the many ways that different communities of people have engaged in music historically, AARRing would seem to be the process by which music comes to have meaning in a given sociocultural context.

Barnes-Holmes et al. (2018) cogently elaborated on the evolutionary significance of AARRing, explaining the survival benefits it had for our ancestors, as well as the ways in which repertoires of relational responding among groups of people promoted cultural advances. In the early stages of our ancestors’ development, Barnes-Holmes et al. explain, basic speech sounds, gestures, or bodily movements came to mean different things, which meant that speakers could influence the behavior of listeners without them having to experience direct contingencies, which in turn resulted in the survival of the group. Early music, such as chanting and drumming, functioned in this way as well. Particular drum beats likely meant something in a particular ceremony or ritual, just as chants were used to convey specific messages of worship to spiritual entities. Still today different drumming patterns are used in different indigenous ceremonies and rituals depending on the type of ritual. The Congolese rhythm Zebola, for example, is used in healing rituals (Vinesett et al., 2015). Such behaviors were likely acquired via imitation and modeling or vocally instructed to groups of listeners by speakers. Over time, patterns of musical behaviors likely became more differentiated as complex repertoires of relational responding developed among groups of people.

In fact, Barnes-Holmes et al. (2018) explain that AARRing, once established in our ancestors, facilitated more complex forms of relating such that the fitness of a group could be promoted in increasingly complex ways. This no doubt occurred in the evolution of music making. As the uses of music became more differentiated, instruments were created to expand its scope beyond drumming and vocalizing. Creating music on instruments permitted variation in pitch, dynamics, and tone, which further expanded its use and diversified its functions. The emergence of the writing repertoire was also pivotal in the evolution of music, as now a musical notation system could be developed, and people could acquire a system for reading music.

Reading music in and of itself is based on symbolic relations between printed notes and their corresponding note name and placement on a keyboard or instrument fingerboard. Other characteristics of how a musical passage is played on a particular instrument (e.g., a string player using a specific part of the bow, or a percussionist operating the snares on a drum in a particular manner, or the dynamics of a particular passage) are based on a highly complex system of music vocabulary and conveyed textually in printed music. Of even greater complexity are the relations between the many instrumental parts of a band or orchestra score, all of which were composed in relation to one another to create a larger piece of music. The art of conducting also has symbolic properties, as the conductor’s actions, be they gestures, facial expressions, or movements of the baton, are symbolic of different expressions on the part of the group’s musicians. Words embedded in songs are also necessarily forms of AARRing that may be uniquely human and add a necessary layer of complexity to any analysis of music and selection. All of these examples illustrate how, in the evolution of music, AARRing can be considered an example of a cusp, or a behavior change that brings the organism’s (or community’s) behavior into contact with new contingencies that have even more far-reaching consequences (Rosales-Ruiz & Baer, 1997). Music is thus a prototypical example of how basic forms of relational behavior promote the development of more complex forms of relational behavior, which permit a culture’s symbolic inheritance system to develop and evolve as a result (Barnes-Holmes et al., 2018).

That music is a symbolic inheritance system, composed of numerous examples of symbotypes, or relational networks between stimuli, is further exemplified by the fact that people interact with the functions of music at the covert level. Many composers, for example, report “hearing music in their heads” as they compose (see Rehfeldt et al., 2020). People who are not trained as musicians can readily introduce music into their sensory environment by humming or singing a tune overtly or covertly, and doing so may evoke rich perceptual imagery. Listening to a song from one’s childhood, for example, may actualize memories of childhood experiences. Technology has greatly advanced the means by which musical stimuli can be introduced into one’s environment via digital sources, such that the indirect and perceptual functions of musical stimuli can be actualized readily and rapidly—exactly the shared reality to which Jablonka and Lamb (2014) refer. It is thus no wonder that listening to music serves such a powerful escape function for people enduring oppressive or challenging circumstances.

In a truly functional account, we must also understand how the external context of socially mediated reinforcement and punishment guides the evolution of musical behavior within groups. A relatively simple model of how AARRing may participate in the music experience of individuals can be seen in Fig. 2. It is important to remember that these individuals (i.e., the music listeners) are part of the metacontingencies that maintain music production as we described previously, but an analysis of the listener experience cannot be complete without accounting for complex verbal relations that can occur when listening to music. Although an oversimplification, we have discussed the nonarbitrarily applicable features of music, which include rhythm, pitch, and dynamics. In most modern music, there are also lyrics. A musical event therefore entails nonarbitrary musical referents that coordinate with experiences in auditory and other sense modes (e.g., heart rate, proprioceptive movement, visual movement). Additionally, because of AARRing, as described previously, lyrics also refer to events (e.g., “and we were driving through the mountains” is coordinated with the private experience of driving through the mountains). When a musical event occurs, this never happens in a vacuum, but rather occurs with a concurrent experience (e.g., attending a concert, listening to music on a long drive, being with friends or family). To augment means “to add or strengthen,” and each of these parts of the whole may augment the experience of other parts. For example, music alone is more reinforcing with the addition of lyrics. Being with a loved one is more reinforcing with the addition of the musical event that contains the musical and lyrical referents, especially when the referents cohere with the concurrent experience. 
By augmenting reinforcing functions of the whole event, music may therefore have evocative functions, a proposal that builds on work characterizing and classifying the emotions evoked by music (e.g., Zentner et al., 2008). For example, a fast-paced song with angry lyrics could evoke more aggressive behavior (see Susino & Schubert, 2017) in a context that already selects for aggression (e.g., listening to heavy metal music before playing a contact sport). Conversely, a slow-paced song with lyrics about love and affection may evoke approach behaviors toward another person or sexual behavior. Both of these outcomes serve an evolutionary function for the group by promoting survival through aggression or reproduction, respectively.

Fig. 2. Diagram of arbitrary relations that may occur when listening to music that together produce augmental and response-evoking functions as music interacts with the context within which music occurs and a shared cultural environment

It is important to note here that these relations do not occur inside the mind of the listener but originate within a shared cultural environment. Lyrical references to objects or events that are foreign to a group are less likely to have augmenting or response-evoking functions. That is, they have less meaning to a group that has not experienced the events to which the music refers. Take, for example, the contrast between modern rap and country music: the former alludes to events that are more common in the urban areas where this music originated, and the latter to events more common in rural areas. The objects and events referred to in the music are those that represent a shared experience of the group members. Moreover, because humans can experience events privately, the occurrence of the musical event may bring visual experiences of concurrent events (see Filippidi & Timmer, 2017) to the psychological present, a phenomenon Skinner described as conditioned seeing. For example, if a specific song was playing during a first date with a romantic partner, hearing the song at a different time could evoke seeing the partner as if in that moment. Conversely, when one is with that same partner at a later time, singing the lyrics or humming the music of the song could evoke memories for them related to the event when the song was first shared (e.g., Juslin et al., 2014; Juslin & Västfjäll, 2008).

These examples speak to the importance of AARRing within group cooperation. Relational frame theorists propose that cooperative behavior between speakers and listeners is a core feature of AARRing, although disagreement as to whether cooperation facilitated the emergence of early relational repertoires, or vice versa, persists. We now discuss the role of cooperation in the evolution of AARRing in general and in music as a symbolic inheritance system more specifically.


Central to the evolution of AARRing, according to Barnes-Holmes et al. (2018), is the repertoire of cooperative behavior that was shaped between speakers and listeners. The authors suggest that a number of basic forms of mutually entailed responding, such as heeding warnings to avoid predators and taking directions to locate food sources, required speakers and listeners to cooperate with one another. Wilson et al. (2014) acknowledge that evolution favored cooperation, as the likelihood of survival was greatest when groups of people worked together. Examples of activities important for group survival include childcare, hunting and gathering, cultivating foods, preparing shelter, and taking stances of offense or defense against other groups (Barnes-Holmes et al., 2018). All such activities enabled our ancestors to adapt to their environments more rapidly than the much slower process of genetic evolution, and all were likely to be most successful if speakers and listeners worked together (see Wilson et al., 2014). Relational frame theorists disagree about which process came first, cooperation or AARRing. Barnes-Holmes et al. maintain that cooperation was the outcome of early, basic forms of mutual entailment, as speakers came to influence the behaviors of listeners and listeners understood the behavior of speakers in certain contexts. Hayes and Sanford (2014) alternatively suggest that repertoires of social cooperation were established first, which then promoted mutually entailed responding. Regardless of the sequence in which the two processes emerged in our evolutionary history, it seems reasonable to conclude that many forms of social cooperation between people require repertoires of relational responding, and relational repertoires expanded the scope and diversity of opportunities for speakers and listeners to cooperate and collaborate. 
The list of cultural practices that require cooperation and collaboration between people—building bridges, roads, and neighborhoods; scientific discoveries and technological innovations; implementing government and education systems; and so on—could not be performed were it not for the uniquely human repertoire of relational behavior.

As was discussed previously, music seems to be an activity that, even in our ancestors, was and is today typically performed by or experienced in groups of people. For example, music and dance often occurred in conjunction in hunter-gatherer societies. People quickly learned that the more voices that joined in, the bigger the musical outcome, perhaps making it more advantageous for more members of a community to participate (Hodges, 2020). In addition, with more people, different roles or parts could be performed, thus diversifying and adding complexity to the musical outcome. People no doubt experimented with the coordination of different parts or passages executed by different members, and relied on social interactions between members to instruct or shape the coordination of multiple parts. The making of instruments, as previously noted, also expanded the musical outcome that could be produced by a group, and instruments could be created more rapidly and in more and more specialized manners when groups worked together. The execution of music by a group’s members must be synchronized and coordinated cooperatively within the group for the outcome to be successful, whether it is performing in a group or listening to music at a concert venue or religious establishment. Hodges (2020) acknowledges that not only did cooperative behavior among group members facilitate the expansion of music as a cultural practice, but music may in turn increase cooperation among group members. Military marches or drills, for example, were used historically during battle to direct the behavior of the soldiers on the battlefield, a context in which teamwork was necessary for survival. In educational settings for young children, music is often used to promote coordinated group behavior in games or chores. Protest songs have been used to facilitate group sit-ins and marches. In all of these cases, music appears to influence or facilitate cooperation among a group’s members. 
It may well be the case that, although cooperation allowed for advances in the development of music as an institution, variations were selected that likewise promoted or encouraged cooperation among groups of people.

To summarize, music can be regarded as a special form of relational responding—a symbolic inheritance system or AARRing—that may have both required and facilitated cooperation. It is seldom performed alone by one person, and a bigger outcome can be achieved with more people. So influential is music on the behavior of individuals in groups that scholars of music have described the highly coordinated, cooperative repertoires that music occasions as examples of a “collective consciousness” (Hodges, 2020, p. 66). For these reasons, from an evolutionary science perspective, music can be regarded as a cultural-level adaptation.

A Cultural Adaptation

Evolutionarily, many activities promoted the advancement and survival of a community when performed cooperatively and, as Wilson (2007, 2016) allows, permitted the development of innumerable cultural practices (see Morley, 2014). Opportunities to cooperate also seem to benefit people’s happiness and well-being. Lu and Argyle (1991), for example, found that people who engaged in regular, ongoing activities with others, both during work time and leisure, reported higher scores on standardized measures of happiness. These authors defined cooperation as “acting together, in a coordinated way at work, leisure, or in social activities, in the pursuit of shared goals, the enjoyment of the joint activity, or simply furthering the relationship” (Lu & Argyle, 1991, p. 1019). It may be that people who engage in team-oriented projects in the workplace, where employees work together toward a common goal, and who participate in formal and informal community group activities such as social clubs, religious organizations, volunteer clubs, or even informal social events with others, are happier. Indeed, people living in small communities where contingencies support caring for others report less stress and greater life satisfaction than others (Grinde et al., 2018). Wilson (2007) suggests that natural selection favored not only collaboration among group members but also loyalty or fidelity to the group. Wilson’s perspective runs parallel to that expressed by ethnomusicologists, or scholars who study the cultural and social aspects of music.

Hodges (2020) allows that the survival benefits of music and dancing performed by large groups of people are not necessarily apparent. In fact, the activities would have had high costs for our ancestors, as noisy activities could have attracted competitors and been physically exhausting. The benefits of music making must have outweighed these risks. Hodges further submits that music serves as a unifying force among groups of people, stating that

members of a tribe are often bound by common religious beliefs, that are often expressed through music. Members of one tribe must band together to fight off members of another tribe. Music gives courage to those going off to battle and it gives comfort to those who must stay behind. Much of the work of a tribal community requires the coordination of many laborers; music not only provides for synchrony of movement but also for relief from tedium. (p. 43)

As an example, Hodges narrates the story of a Swazi king who required his warriors to sing together to prevent them from fighting with one another. Even today, although many musicians practice or compose in solitude, the ultimate goal is that the music be both performed and enjoyed by groups. Brown (2000) and Wilson (2007) agree that for these reasons music can be considered not only a group adaptation but also a cultural adaptation because, as Wilson (2007) explains, it promotes group cohesion, affinity, and cooperation among group members, thereby promoting collective survival.

Ethnomusicologists have suggested that not only does music unify, but, due to the common emotional experience that people may have while engaging with it, it also may produce empathy. This reasoning is supported by research findings showing that people who scored high in trait empathy reported more intense emotions while listening to music (Vuoskoski & Eerola, 2012). Musicologists have even proposed that music represents a “virtual person” for people to identify with while they are listening (see Clarke et al., 2015). A more pragmatic position may be to look to the mechanisms of selection to explain the unifying effects that music may have. Because music is a complex form of AARRing (Barnes-Holmes et al., 2018), pieces of music have symbolic meaning as established by the contingencies of particular sociocultural communities. Because many people can experience music at one time, either as musicians or listeners, that culturally ascribed meaning can be experienced by many people. It is this shared experience, due to the relational functions of music, that may promote group cohesion or affinity. This collective meaning and shared experience further explain the consistent ways in which music has been shown to influence behaviors across individuals. For example, Greitemeyer (2009) reported that when people listened to music with prosocial lyrics relative to songs with neutral lyrics, they were more likely to leave large tips for restaurant employees. Advertisers take advantage of the fact that particular music may occasion specific behaviors by manipulating background music to influence the likelihood of purchasing behavior (e.g., Santos & Freire, 2013). Shared meaning or experience may at times occasion behaviors that may be harmful to an individual or group while at the same time promoting affinity among group members. A relationship between music and risky behaviors such as drug taking has been posited, for example (e.g., Coyne & Padilla-Walker, 2015). 
To this end, Clarke et al. (2015) ask how “shared feeling states, sensibilities and predispositions come about (through music), and how they can be cultivated and controlled” (p. 13)—a question similarly posed centuries ago by Plato and Aristotle, both of whom believed that music could create just and kind societies.

Implications for the Well-Being of Society

Understanding how contingencies of selection operate at the level of a culture may permit behavior scientists to enact culture-wide interventions to promote the well-being of a society (Wilson et al., 2014). In fact, this is the very goal of the relatively new wing of behavior analysis known as culturo-behavior science (Mattaini & Cihon, 2019). Relevant examples of such culture-wide interventions were first described in Walden II, Skinner’s (1948) novel about a utopian society designed to maximize reinforcers, enhance productivity, and limit punishers. Music played an important role in the community. Walden II’s citizens had the option to play in any number of musical groups regardless of ability and had the leisure time to do so. Frazier, the community’s ubiquitous founder, stated that

if you live in Walden II and like music, you may go as far as you like. I don’t mean a few minutes a day—I mean all the time and energy you can give to music and remain healthy. If you want to listen, there’s an extensive library of records and, of course, many concerts, some of them quite professional. . . . If you want to perform, you can get instruction on almost any instrument from other members—who get credits for it. If you have any ability, you can soon find an audience. We all go to concerts. We’re never too tired. (Skinner, 1948, pp. 81–82)

Music and art were such important features of the community that Frazier further attested that

we shall never produce so satisfying a world that there will be no place for art. On the contrary, Walden II has demonstrated very nicely that as soon as the simple necessities of life are obtained with little effort, there’s an enormous welling up of artistic interest. (Skinner, 1948, p. 116)

The accessibility of music in Walden II is similar to that of the Garden in Epicurean society, a parallel drawn by Neuringer and Englert (2017). As explained by Neuringer and Englert, the ancient Greek philosopher Epicurus was concerned with how to live the “good life” and defined “good” as caring for one’s community. Epicurus established a meeting place called the Garden, located outside of the city walls of Athens, where people from all classes could meet and dine together and enjoy conversation, art, and literature. According to Neuringer and Englert, Walden II and the Garden both illustrate how the availability of pleasurable stimuli in a community, including opportunities for social relationships and engagement in the arts, can promote the well-being of a community’s members. Neuringer and Englert’s analysis coincides with that of Biglan et al. (2020), who opine that a society that maximizes reinforcers and reduces stressors is likely to thrive. Opportunities for people to create, listen to, and enjoy music together—working cooperatively in a shared experience—may be a step toward this goal.

Interestingly, Skinner’s (1948) and Epicurus’s visions resonate with a recent body of results disseminated by the NEA in the United States, which showed that people who engage in musical activities are also highly engaged in other ways with their communities. Specifically, the NEA (2007) reported that one half of all attendees of performing arts events engage in volunteer or charity work in their communities, in contrast to 20% of all nonattendees. In addition, attendees of performing arts events reported engaging in physical exercise and outdoor recreational activities at double the rate of individuals who do not attend performing arts events. Thus, those who engage in musical events in their communities appear to be more involved in other activities that support their own well-being, as well as that of their community. The 2007 NEA report states that “art is not escapism, but an invitation to activism”; in short, healthy communities depend on active citizens (p. 1). Music seems to characterize both.

Grant (2010) elaborates on an additional advantage that opportunities for music may afford for a community—namely, the promotion of sustainability. Grant defines sustainability as the ability to engage in behaviors that reinforce the behaviors of the current population and are repeatable over long time spans without having harmful effects on future generations. In outlining the numerous ways in which finite natural resources are being used in increasing quantities, Grant distinguishes between a steady-state economy and a growth economy. A steady-state economy is defined as a system in which natural resource reinforcer inputs remain constant over time, whereas a growth economy is one in which natural resource supply continuously increases, as does production and consumption, until a system collapses. Sustainable practices are consistent with a steady-state economy. Grant distinguishes between “resource-heavy” and “resource-light” reinforcers, suggesting that a society plentiful in resource-light or resource-free reinforcers is a step toward achieving a steady-state economy.

Music and the other arts, according to Grant (2010), may play a special role in the promotion of a steady-state economy or sustainable cultural practices, as the availability of music and other artistic activities may help facilitate a shift from material to nonmaterial reinforcers. Grant contends that nonmaterial reinforcers, particularly those that might be obtained by interacting with others, such as charity events, group celebrations, and social relationships, may help create sustainable behavior. He further suggests that by making underconsumed and resource-light (or resource-free) reinforcers more available, sustainability will be enhanced. In fact, underconsumed and resource-light reinforcers figured prominently in a number of rich and diverse historical traditions, Grant contends, noting that for many religious organizations the ideal life was characterized by charity and group celebrations rather than the pursuit of material wealth. Grant describes the “potential for a more reinforcing world through a transition to a culture of acquired tastes and pleasures” (p. 15), such as that depicted in bohemian societies that are committed to the arts. Indeed, Walden II was characterized by a reduction in resource-intensive labor, and residents had time to pursue interests in the literary, visual, and performing arts, for which opportunities were plentiful. Grant acknowledges that aesthetic consumption skills must be taught, however—a case in point for arts education.

In addition to its conceptualization as a resource-light or resource-free reinforcer, engagement in which could promote a society’s sustainability, music has been identified as a means of breaking down barriers between people, including those related to ethnicity, age, and social class (Clarke et al., 2015). As an example, the West-Eastern Divan Orchestra is a youth orchestra that was founded in 1999 for the sole purpose of bringing together young Arab and Israeli musicians. The orchestra was named after Goethe’s collection of poems West-Eastern Divan, which endorses the concept of one world culture. Barenboim (2009) suggests that although the orchestra itself cannot accomplish peace, it can create a space for listening to others and arousing curiosity about those from cultural backgrounds different from one’s own. Musicians involved in the orchestra reported that the experience altered their preconceived notions and stereotypes regarding either Israeli or Arab culture and inspired a respect for people’s differences (Cheah, 2009). In a band or orchestra, contingencies select different topographies for the different instrumentalists in the group, the result being the larger aggregate product of the performance of some piece of music. In this example of a metacontingency, the individual behavioral contingencies support listening and working cooperatively toward a common goal. In fact, manipulations of interlocking behavioral contingencies in the laboratory have been shown to promote cooperation among research participants (see Morford & Cihon, 2013; Vichi et al., 2009) in simulations that would seem to parallel the same behavioral processes involved in the West-Eastern Divan Orchestra.

Because music is a symbolic inheritance system, the sociocultural contingencies of a given community may assign meaning to music, such that particular music comes to “stand for,” or be symbolic of, highly abstract ideas. There are many instances throughout history in which sociocultural contingencies established relational frames between specific music and democratic ideals, such that music has been associated with a number of movements for social change (see Rehfeldt et al., 2020). Weissman (2010) distinguished between music that merely weighs in on social issues and music that is written or performed deliberately in an attempt to establish social change. The freedom songs of the civil rights movement in the United States, influenced by African American spirituals, helped to articulate and reinforce the core beliefs of the movement (Rosenthal & Flacks, 2011). For example, 100,000 individuals sang the song “We Shall Overcome” at the rally ending the 1963 March on Washington (Marable, 1984). Such music motivated activists during long marches and provided courage in the face of opposition (Library of Congress, n.d.). Rosenthal and Flacks (2011) note that the transportable nature of music allows social activists to “carry beliefs and loyalties with them in their everyday routines” (p. 127). Behavior analytically, this means that the transportable nature of music makes it possible to introduce the functions of musical stimuli into the environment at almost any time, such that its meaning can be experienced recurrently and rapidly by large groups.

Other examples of the use of music to enact widespread social change include its use to affect participation in democratic elections. “Rock the Vote,” a nonpartisan, not-for-profit project, mobilizes performances by musicians in the United States and United Kingdom to encourage people to vote in presidential elections. Adebayo (2017) describes the dissemination of pop songs carrying messages of peace throughout Nigeria during elections. Programmed radio play was intended to promote peaceful, nonviolent behavior during the elections. The author suggests that the music created a sense of community that discouraged violence. Those interviewed said that the music played a huge role in helping them choose nonviolent participation and that the songs “kept ringing in their heads” as they were approached by political groups to scuttle the elections (Adebayo, 2017, p. 74). This shows how the indirect functions of musical stimuli can be actualized in such a way as to occasion similar, values-consistent behavior across groups of people.

The contextual-behavioral science conceptualization that we propose here, in which music composes a cultural symbolic inheritance system built on an innate, biological capacity, fits neatly with the views of some music anthropologists and musicologists. For example, Morley (2014) stated,

The production and perception of gestural (vocal, orofacial, and corporeal) expression of emotion in this system involves the priming of the rhythmic and emotional systems. Rhythmic sequences, and the prosodic and rhythmic content of tonal sequences, prime this system and each other, resulting in a multimodal relationship between rhythmic and emotional content. This takes the form of auditory, visual and kinesthetic expression of emotion, and it would appear that what emerged over the course of evolution of [Humans] was the ability to deliberately use this system along with increasing control over the form, range, and duration of these expressive gestures. It is proposed here that the culturally-shaped melodic, rhythmic behaviors that we call music, and semantic, lexical linguistic abilities later emerged as specialized behaviors building upon the foundations of this system of vocal and kinesthetic communication of emotion. (p. 167)


We have explored the evolutionary significance of music-related behaviors through the lens of contextual-behavioral science and explained how music, given its presence in many cultural practices, may be considered a cultural-level adaptation. We discussed music as a complex form of AARRing that requires cooperation and may in turn promote cooperation among members of a community. Because of the symbolic meaning assigned to particular types of music by a sociocultural community, and because of the shared experience that people may have while producing or listening to music, music may serve to unify groups of people. We provided examples of instances in which music has occasioned similar behaviors across multiple individuals in ways that collectively promoted peace or social change.

Toward the end of his career, Skinner lamented that the science of behavior had not made enough contributions toward the modification of cultural practices that would promote the well-being of all (see Chance, 2007). With music serving as such a ubiquitous presence in everyday life, it is easy to take it for granted. However, evidence suggests a relationship between engagement in music and caring for one’s community. In efforts to promote the health and vitality of a culture, culturo-behavior scientists should systematically evaluate how the cultural transmission of music across communities contributes to the well-being of all of its members and promotes survival and longevity. Indeed, Lebanese poet Kahlil Gibran (2011) noted that “music is the language of the spirit. It opens the secret of life bringing peace, (and) abolishing strife” (p. 11).