In the past, researchers assumed that the ability of humans to synchronize bodily movement to a beat, as in dance, was unique in the animal kingdom (e.g., Bispham, 2006; Wallin, Merker, & Brown, 2000, p. 12; Zatorre, Chen, & Penhune, 2007). However, subsequent research has shown entrainment in various species of parrots (Hasegawa, Okanoya, Hasegawa, & Seki, 2011; Patel, Iversen, Bregman, & Schulz, 2009a; Schachner, Brady, Pepperberg, & Hauser, 2009), lending support to suggestions that vocal learners—humans, parrots, and perhaps a few other species, such as elephants or cetaceans—might be uniquely adapted for entrainment (Fitch, 2012; Merchant & Honing, 2014; Merker, Madison, & Eckerdal, 2009; Patel, 2006, 2014; Patel & Iversen, 2014; Patel, Iversen, Bregman, & Schulz, 2009b; Trainor, Gao, Lei, Lehtovaara, & Harris, 2009).

But recent results have pushed the envelope even further. A California sea lion was trained to match head-bobs to an auditory beat, and furthermore it transferred this ability to new tempos and complex musical stimuli (Cook, Rouse, Wilson, & Reichmuth, 2013). Evidence is also emerging in apes. Large and Gray (2015) reported that a bonobo spontaneously synchronized drumming behavior with a human experimenter, and Hattori, Tomonaga, and Matsuzawa (2013) found that a chimpanzee performing a tapping task spontaneously aligned her tapping to a task-irrelevant auditory beat. In addition, pilot data reported by Bregman, Iversen, Lichman, Reinhart, and Patel (2013) suggest that a domestic horse may be able to synchronize trotting to music. This opens the possibility that a wide range of animals may be capable of synchronization.

Some authors are pursuing what this means for the evolution of complex rhythmic and musical abilities in humans (Hoeschele, Merchant, Kikutchi, Hattori, & ten Cate, 2015; Honing & Ploeger, 2012; Honing, ten Cate, Peretz, & Trehub, 2015; Merchant & Honing, 2014; Patel & Iversen, 2014), but in this study we focus on animal entrainment as a question of interest in its own right. If a faculty for entrainment is evolutionarily old and widespread in the animal kingdom, it may be implicated in a range of adaptive behaviors for interacting with the environment and with conspecifics.

Although we could search for criteria that apply to the specific animals shown so far to entrain, and that exclude all others, such a strategy would still take as its starting point the idea that beat-matching is rare and only possible under certain narrow conditions. In this article we instead challenge the idea that any sharp distinction can be made between animals that entrain and animals that do not. We begin by examining the extent of findings that have purported to show a failure of entrainment in various species.

Negative evidence?

Although there was early interest in the possibility of animal entrainment (Craig, 1916, 1917; Wheeler’s, 1917, remarks on antelopes and pelicans), the more recent literature on this topic has generally accepted it as established that most animals do not entrain (Bispham, 2006; Fitch, 2012; Greenfield, 1994; Hoeschele et al., 2015; Merchant & Honing, 2014; Merker, Madison, & Eckerdal, 2009; Patel, 2006; Patel et al., 2009a, b; Schachner et al., 2009; Trainor et al., 2009; Wallin et al., 2000, p. 12; Zatorre et al., 2007). As we will review in a later section, positive evidence is accruing for entrainment in certain species, but here we challenge the overall assumption that these cases are rare and that a few examples contrast against a background of lack of entrainment in the animal kingdom. The assumption traces at least as far back as Wallin et al., and appears to have emerged from an attempt to encapsulate the difference between humans’ rich musical capabilities and other animals’ lack thereof. However, the specific claim about an absence of animal entrainment (as contrasted to more complex musical abilities) has been based on very little empirical evidence.

Some authors (e.g., Hasegawa et al., 2011; Patel, 2014) have cited Zarco, Merchant, Prado, and Mendez (2009) as showing a failure to train rhesus macaques to beat-match (for similar results, see Konoike, Mikami, & Miyachi, 2012; Merchant, Zarco, Perez, Prado, & Bartolo, 2011). However, these animals did learn to entrain, although with less precision than humans (most noticeably during a continuation phase after the stimulus ceased). Moreover, although the animals’ responses occurred after rather than on the beat, the delay was considerably shorter than the same animals’ simple reaction time to the same stimulus, indicating that they were not just reacting to each stimulus as a separate event. In addition, these studies used discrete button-push responses to a short series of tones. In contrast, successful entrainment with other species has involved continuous rhythmic stimuli for longer periods of time, and continuous, oscillatory, self-guided behaviors that the animal can bring into phase with the stimulus. These methods are different enough that we cannot conclude that macaques have been proven to be poor entrainers (cf. Large & Gray, 2015, p. 3). Furthermore, a recent report, which we discuss below, describes macaques entraining spontaneously (Nagasaka, Chao, Hasegawa, Notoya, & Fujii, 2013), which would seem to rule out macaques as a case of negative evidence.

To date, only one large-scale survey has attempted to establish an empirical case for a widespread paucity of entrainment in the animal kingdom. We will consider this study in some depth. Schachner et al. (2009) reported a search of YouTube videos for the keyword “dance” coupled with specific animals, including those likely to have contact with humans, a variety of nonhuman primates, and known vocal mimickers matched to related nonmimickers. The search yielded 3879 videos, evenly split between mimicking and nonmimicking species. Of these, 33 videos showed evidence of entrainment. The species in those 33 videos included 14 species of parrot and one species of elephant, all vocal mimickers (see Stoeger & Manger, 2014, on vocal mimicking in elephants).

The authors gave particular mention to the failure to find any videos of entrainment in dogs, despite massive efforts by human trainers. A subculture of dog training, called canine freestyle, is devoted to training dogs to perform “dance” duets with their human handlers to music, yet Schachner et al.’s (2009) analysis of these videos did not show evidence of entrainment of the dogs’ footfalls to the music. It is doubtful, however, that this is a fair test of a dog’s ability to entrain. In canine freestyle, the dogs are generally trained on a series of large-scale moves (e.g., circling around the trainer in response to a hand signal), all taught and then chained as a sequence before the musical accompaniment is introduced. Training the dog to time its footfalls to the music is not part of the training procedure.

More seriously, regarding the attempt to broadly survey the abilities of animals, the selection criteria for Schachner et al. (2009) involved substantial bias, contrary to the authors’ claim. First, the options for robustly mimicking species are limited (Tyack, 2008), and a substantial proportion of them are the nearly 400 species of parrot (including true parrots, cockatoos, and New Zealand parrots). Parrots are already known from the published literature to do beat-matching, and furthermore are highly social animals who bond with their human caretakers and are responsive to social reward. This, then, was a slam-dunk for finding entrainment, unlike the search for entrainment in the much broader and less explored field of nonmimicking species.

Second, the impact of the celebrity status of Snowball the dancing cockatoo (Patel et al., 2009a) cannot be dismissed. This is likely to have dramatically influenced the types of animals that lay people will observe for evidence of dancing, will try to train to dance, and will choose to post on YouTube. In fact, of the 29 videos of parrots that showed entrainment, more than a third were cockatoos, and fully half of those were sulphur-crested cockatoos, the same species as Snowball. In contrast, the fact that budgerigars (also a species of parrot) can beat-match (Hasegawa et al., 2011) has escaped the notice of the general public, and the search did not yield a single video of dancing budgies. This undermines the authors’ implied argument that, if a species can beat-match, someone will have posted a video of it. The authors present as evidence against bias in the available pool the fact that nonmimickers outnumber mimickers on YouTube overall by 2:1. In fact, given the rarity of vocal mimicking in the animal kingdom, this is an extraordinarily high proportion of vocal mimickers, indicating substantial selection bias in what kinds of animal videos people choose to post.

Third, the keyword “dance” is itself highly biasing. Humans will perceive as dancing only those animals who are moving their bodies in certain ways, and who are moving to a music-like auditory stimulus. This will exclude virtually all cases of animals engaging in ordinary species-typical behavior, such as walking, pecking, licking, scratching, and so on.

In short, the argument implicit in the literature is that, if a wide range of animals were capable of entrainment, there would be reported evidence, so that the absence of evidence constitutes evidence of absence. Instead, we offer the alternate view that—far from due diligence having been done and having turned up little evidence—investigations of animal entrainment have barely begun.

Below we take this claim further, to argue that such evidence as there is suggests that entrainment may be much more widespread than has been thought.

The argument from neurological plausibility

As Patel (2014) has pointed out, the idea that the neurological preconditions for beat-matching may be evolutionarily very old and widespread dates back to Darwin (1871). An obvious candidate for such a neurological precondition is neural oscillations, which are ubiquitous in animal brains (cf. Bispham, 2006; Fitch, 2012; Large, 2008; Large & Snyder, 2009; Zanto, Snyder, & Large, 2006). These, in turn, are merely a subset of the physiological processes in the body that involve the mutual entrainment of oscillating processes. Indeed, as Glass (2001) put it, “physiological function derives from the interactions of these [rhythmic] cells with each other and with external inputs to generate the rhythms essential for life” (p. 279).

Neural oscillations occur when populations of neurons in the brain fire in synchrony, a phenomenon so widespread that it can be called an inherent principle of brain functioning. Coherently oscillating neural ensembles are found throughout the animal kingdom, indicating that they are a fundamental design feature. They are endemic in the brain, found in every region of cortex as well as in subcortical structures, and operate at a wide range of temporal scales, from seconds to milliseconds (Bragin, Engel, Wilson, Fried, & Buzsaki, 1999; Canolty & Knight, 2010; Ward, 2003). They have been implicated in a broad range of neurological and cognitive activities, including homeostasis, timing, perception, attention, motivation, motor control, language, and memory (Jensen, Kaiser, & Lachaux, 2007; Jutras & Buffalo, 2010; Knyazev, 2007; Singer, 1999; Ward, 2003; Wilson & Wilson, 2005). Furthermore, one temporal range of oscillations, gamma oscillations (~40 Hz), is believed to be involved in the synchronization of other processes (e.g., Fries, 2009; Nicolić, Fries, & Singer, 2013).

Of particular relevance here is the role of oscillators in sensory and motor processes and, crucially, the coordination between the two. Motor behavior is fundamentally rhythmic (Molinari, Leggio, Martin, Cerasa, & Thaut, 2003), and evidence is emerging that sensory processes may be, as well (e.g., Miller, Carlson, & McAuley, 2013). Canolty and Knight (2010) argued that the rhythmic, periodic quality inherent in motor systems, together with the evolution of sensory systems to serve motor control (i.e., their role as guidance systems for moving bodies), gives rise to an integrated system in which sensory information will be best processed if it is packaged by the sensory systems into “rhythmic volleys.”

Furthermore, sensory stimuli that are in fact objectively rhythmic cause the entrainment of brain oscillations, an effect that has been shown in humans, macaque monkeys, and zebrafish (Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008; Saleh, Reimer, Penn, Ojakangas, & Hatsopoulos, 2010; Sumbre, Muto, Baier, & Poo, 2008). In macaques and humans, at least, this propagation of timing extends up to and includes the motor system, as is evidenced by decreased reaction times under conditions of rhythmic input (Lakatos et al., 2008; Praamstra, Kourtis, Kwok, & Oostenveld, 2006; Saleh et al., 2010); the activation of motor planning areas by passive listening (Chen, Penhune, & Zatorre, 2008; Grahn & Brett, 2007), the time course of which suggests a predictive mechanism (Fujioka, Trainor, Large, & Ross, 2012); and the activation of cell populations in the premotor cortex of rhesus macaques that appear to be stimulus-predicting cells, firing in response to regularly timed visual or auditory stimuli (Merchant et al., 2015).

At a minimum, these findings show that rhythmic driving of motor systems by sensory systems extends fairly far back in the primate lineage, but it could still be argued to be an evolutionary development specific to that lineage (cf. Merchant & Honing, 2014), with convergent analogues in more distant lineages such as parrots. However, other considerations suggest a much older, shared neurological architecture.

Nonhuman animals are generally successful with interval timing (reproducing or categorizing single intervals between stimuli), which can be thought of as a precursor to entrainment (Merchant & Honing, 2014). Interval timing is governed by the basal ganglia and their major input area, the striatum (see Matell & Meck, 2004, and Meck, Penney, & Pouthas, 2008, for reviews), and Merchant, Harrington, and Meck (2013) proposed that timing mechanisms in general are mediated by a centralized “hub” involving the basal ganglia. The basal ganglia are also known to play a crucial role in motor control. Finally, the basal ganglia are found in all vertebrates, which suggests that the mechanisms for timing sensory and motor events may have been conserved over long evolutionary time scales. All this provides a plausible neurological framework within which entrainment abilities could easily arise by connecting the timing of sensory input and the timing of a repetitive motor behavior.

Beat-matching to nonauditory stimuli

In the face of the above considerations, one way to preserve the special status of beat-matching would be to theorize that it depends not just on general timing mechanisms, but on a specific sensory-to-motor pathway that exists in humans and only a few other species. In particular, as we noted above, a specialized auditory-to-motor pathway has been suggested, which would indicate that beat-matching will be most successful by far when the driving input is auditory. In apparent support of this, the human auditory system appears to greatly surpass the visual system in terms of its ability to match a beat (see Hove, Spivey, & Krumhansl, 2010, for a review).

However, the stimuli usually used to test entrainment in the visual modality (typically, blinking objects) are not optimal. With a stimulus that allows prediction of a collision, such as a finger tapping or a ball bouncing, entrainment can be nearly as good as with auditory input (Hove et al., 2010; Iversen, Patel, Nicodemus, & Emmorey, 2015). Interpersonal situations are also conducive to entrainment, with people entraining to each other spontaneously when walking, swinging their arms, or rocking in rocking chairs (Issartel, Marin, & Cadopi, 2007; Nessler & Gilliland, 2009; Richardson, Marsh, Isenhower, Goodman, & Schmidt, 2007). Furthermore, deaf people perform better with a traditional flashing stimulus than do hearing people, highlighting the role of perceptual learning in entrainment (Iversen et al., 2015).

In addition, other modalities, such as the somatosenses, can support beat-matching, suggesting that the perception and production of rhythm are centrally controlled and not modality-specific (cf. Penhune, Zatorre, & Evans, 1998; Phillips-Silver, Aktipis, & Bryant, 2010). Humans can match finger-taps to an auditory or a tactile metronome with equal fidelity (Elliott, Wing, & Welchman, 2010, 2011), and rhythmic vestibular input causes both adults and infants to interpret an ambiguously timed auditory beat as duple meter or triple meter (Phillips-Silver & Trainor, 2005, 2007, 2008; Trainor et al., 2009). Furthermore, the vestibular response induced by loud music can contribute to the impulse to dance (Todd & Cody, 2000). Also of interest is the fact that multiple modalities yield better motor synchronization than each modality in isolation (Elliott et al., 2010), and that rhythm in one modality can influence attention in another modality (Miller, Carlson, & McAuley, 2013).

We conclude from all this that the human ability to entrain is robustly multimodal. It is also worth noting that species may differ in which sensory modalities best support entrainment. Macaques appear to entrain better to visual than to auditory stimuli (Nagasaka et al., 2013; Zarco et al., 2009), and Merchant and Honing (2014) suggested that whereas apes, and humans in particular, evolved a strong facility for audiomotor entrainment, monkeys are more adapted for visuomotor entrainment. Both within humans and beyond, then, it does not appear that entrainment evolved specifically within an auditory-to-motor pathway to support vocal learning.

Entrainment in animals: What counts as evidence?

We turn our attention next to positive evidence of entrainment in the animal kingdom. However, because various theories have been proposed as to why animals might or might not entrain, and what should count as entrainment, it is not entirely clear where we should focus our attention. We therefore propose to cast a wide net in order to survey the phenomenon of beat-matching as completely as possible. In this section, we address several concerns about the evidence we will be surveying.

Patel et al. (2009b) proposed three criteria to distinguish a more-or-less human-like ability to beat-match, on the one hand, from a highly rigid, mechanical, involuntary type of entrainment, such as that exhibited by certain insects, on the other. These criteria are the ability to match stimuli that (1) are more complex than a simple pulse train, (2) range across a wide variety of tempos (a criterion also endorsed by Merchant & Honing, 2014), and (3) are in a different modality than the response, so that the response is not simply mimicry of the stimulus.

One difficulty with these criteria is that, although they were designed to exclude the narrow and mechanical cases, they largely fail to do so. Consider the case of fireflies. Fireflies do not emit a “metronome-like pulse train” (Patel et al., 2009b, Table 1), but instead have species-specific patterns of firing, and females respond preferentially to the pattern of flashing of males of their own rather than of other species (see Lewis & Cratsley, 2008, for a review). With respect to the second criterion, fireflies can produce a range of tempos as wide as 2:1 within a single species, varying with air temperature (e.g., Carlson, Copeland, Raderman, & Bulloch, 1976); but by itself this would likely not satisfy Patel’s second criterion, which seems to be aimed at a flexible ability to respond to changes in tempo within a single context. Instead, a more compelling critique is that the ability to entrain to a variety of tempos does not necessarily require any cognitive complexity. Indeed, the entrainment of “simple” oscillating systems in nature can exhibit highly complex dynamics, depending on the ratios of the tempos of the interacting systems (Glass, 2001, p. 280). Conversely, a limited tempo range for entrainment need not mean a lack of sophisticated cognitive beat processing, but instead could be imposed by, for example, a limited range of optimum tempos for motor control over a particular effector (Hasegawa et al., 2011; Konoike et al., 2012; Large & Gray, 2015). A similar critique applies to the third criterion. There is no reason why a neurologically simple creature couldn’t, for example, be wired to mechanically and involuntarily synchronize its wing-beats, or chirping, to visual flashes.

All this is not to suggest that fireflies have cognitively sophisticated control over their beat-matching, but rather that the criteria have missed the mark. They have not effectively captured the intuition that something is different about the case of fireflies. Instead, we suggest, the key feature here is that the synchronization ability is both involuntary and extremely narrow. It is, in the terms of Fodor’s (1983) modularity, informationally encapsulated. Much like the dance of the honeybee, which scientists are reluctant to consider as referential communication, the issue does not lie in the internal structure of the behavior. Instead, the problem lies in the fact that it functions like a reflex. It is a hardwired, dedicated-use mechanism that applies to only one stimulus and fails to interact with the rest of the cognitive system.

In a more recent article, Patel (2014) added the criterion of “prediction,” meaning that the animal’s timing is influenced by the ongoing timing of the stimulus, rather than each response being a reaction to each new stimulus. Stated in this way, without reference to a cognitive mechanism, this is definitional of what it means to entrain. We concur that evidence of true entrainment rather than reactive behavior is an important criterion. We would add only that it need not involve strict synchronization, wherein the behavioral onset exactly corresponds to the stimulus onset. Instead, other variants of entrainment, such as counterphasing or antisynchrony (i.e., the stimulus and behavior alternate; cf. Ravignani, 2015), or behavior onset slightly after stimulus onset, should count as entrainment, provided that the timing is too short to be plausibly due to simple reaction (which, as Patel points out, would need to be on the order of at least a few hundred milliseconds).
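The distinction between entrained and merely reactive timing can be made concrete with a small calculation. The following Python sketch is illustrative only: the function names, the simulated onset times, the 0.9 phase-locking cutoff, and the 250-ms reaction time are all assumptions introduced here, not values or methods taken from the studies cited. It expresses each response as a relative phase within the stimulus cycle, summarizes phase locking with a circular mean and vector strength, and compares the mean lag with a separately measured simple reaction time.

```python
import math

def relative_phases(stimulus_onsets, response_onsets):
    """Return each response's position within its stimulus cycle, as a fraction 0..1.

    0 means exactly on the beat and 0.5 means counterphase (alternation).
    Responses outside the first-to-last stimulus span are ignored.
    """
    phases = []
    for r in response_onsets:
        for s0, s1 in zip(stimulus_onsets, stimulus_onsets[1:]):
            if s0 <= r < s1:
                phases.append((r - s0) / (s1 - s0))
                break
    return phases

def circular_summary(phases):
    """Return (mean_phase, vector_strength) for phases given as fractions of a cycle.

    Vector strength is near 1 for tight phase locking and near 0 when
    responses are scattered evenly around the cycle.
    """
    x = sum(math.cos(2 * math.pi * p) for p in phases) / len(phases)
    y = sum(math.sin(2 * math.pi * p) for p in phases) / len(phases)
    mean_phase = (math.atan2(y, x) / (2 * math.pi)) % 1.0
    return mean_phase, math.hypot(x, y)

def mean_lag_ms(mean_phase, period_ms):
    """Convert a mean phase to a signed lag; negative values mean anticipation."""
    signed = mean_phase if mean_phase < 0.5 else mean_phase - 1.0
    return signed * period_ms

# Hypothetical data: a 500-ms beat, with every response 50 ms after the beat.
period_ms = 500.0
stimuli = [i * period_ms for i in range(10)]
responses = [s + 50.0 for s in stimuli[:-1]]

mean_phase, strength = circular_summary(relative_phases(stimuli, responses))
lag = mean_lag_ms(mean_phase, period_ms)

# Assumptions: the cutoff of 0.9 is arbitrary, and the reaction time would
# have to be measured separately for the same animal and stimulus.
assumed_simple_rt_ms = 250.0
too_fast_to_be_reactive = strength > 0.9 and abs(lag) < assumed_simple_rt_ms
```

A counterphased responder would instead show a mean phase near 0.5 together with a high vector strength. Whether a given lag is too short to be reactive still depends on the reaction time measured for the animal in question.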

However, Patel (2014) additionally describes prediction as involving a “mental model” of the timing. Unfortunately, this is a difficult criterion to assess. Oscillating physical systems in general entrain to one another, ranging from electrons to pendulums to asteroids (Strogatz, 2003). None of these possess mental models, yet their behavior appears to be “predictive.” Thus, we cannot tell from the entrainment behavior itself, no matter how precisely timed it is, whether it is driven by a mental model. We are therefore once more back to trying to assess how cognitively complex the mechanisms are that underlie a particular instance of entrainment, which cannot be done by referencing only the decontextualized characteristics of the behavior itself.
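The point that entrainment behavior alone does not reveal a mental model can be illustrated with a minimal simulation. The Python sketch below is a toy under stated assumptions, not a model proposed by any of the authors cited: a single phase oscillator with its own natural rate is pulled toward a periodic driver through a sine coupling term (an Adler- or Kuramoto-style rule), and every parameter value is chosen purely for illustration. With sufficient coupling the follower settles into a constant phase relation with the driver, even though it contains nothing resembling a representation of the beat.

```python
import math

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def phase_differences(driver_hz, natural_hz, coupling, seconds=20.0, dt=0.001):
    """Euler-integrate a driven phase oscillator and return the wrapped
    driver-minus-follower phase difference at every time step.

    Follower rule: d(theta_f)/dt = 2*pi*natural_hz + coupling * sin(theta_d - theta_f).
    All parameter values used below are illustrative assumptions.
    """
    theta_d, theta_f = 0.0, -1.0  # arbitrary starting offset of 1 radian
    diffs = []
    for _ in range(round(seconds / dt)):
        pull = coupling * math.sin(theta_d - theta_f)
        theta_d += 2 * math.pi * driver_hz * dt
        theta_f += (2 * math.pi * natural_hz + pull) * dt
        diffs.append(wrap(theta_d - theta_f))
    return diffs

def spread(values):
    """Range of the values; near zero when the phase difference has settled."""
    return max(values) - min(values)

# Driver at 2.0 Hz, follower naturally at 1.8 Hz (hypothetical values).
locked = phase_differences(2.0, 1.8, coupling=2.0)
drifting = phase_differences(2.0, 1.8, coupling=0.5)

tail = 5000  # the final 5 s of each 20-s run
locked_spread = spread(locked[-tail:])      # close to 0: constant phase lag
drifting_spread = spread(drifting[-tail:])  # large: the follower never locks
```

In this toy, locking occurs only when the coupling strength exceeds the difference between the two angular frequencies, so the same mechanism either entrains or drifts depending on a single parameter, with no change in cognitive complexity.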

A further problem arises when we consider whether to set aside vocal learners from the discussion, since all sides agree that these animals should be capable of beat-matching. The difficulty, though, is in deciding exactly what counts as a vocal learner. Parrots are exuberant mimics, frequently and spontaneously imitating non-species-typical sounds. Other species, including African elephants, Asian elephants, white whales, and harbor seals, have been documented mimicking only on an occasional basis, and it is unclear to what extent these species do so as a regular part of their behavioral ecology (Holden, 2006; Janik & Slater, 1997; Poole, Tyack, Stoeger-Horwath, & Watwood, 2005; Ralls, Fiorelli, & Gish, 1984; Ridgeway, Carder, Jeffries, & Todd, 2013; Stoeger et al., 2012). Still other species, including bottlenose dolphins and orcas, can be trained to mimic novel sounds, though again regular spontaneous mimicry is less certain (Foote et al., 2006; Richards, Wolz, & Herman, 1984). Then there is a much larger grouping of animals whose vocalizations are restricted to species-typical calls, but fully realized adult performance of those vocalizations is learned from conspecifics during development. These animals include the oscine songbirds, humpback whales, greater horseshoe bats, and others (Boughman, 1997, 1998; Catchpole & Slater, 1995; Jones & Ransome, 1993; Kroodsma & Miller, 1996; Noad, Cato, Bryden, Jenner, & Jenner, 2000; Payne, Tyack, & Payne, 1983). And finally, many species produce group-specific local variations of stereotypic repertoire vocalizations (see Tyack, 2008, for a review). These include additional species of birds and bats, and (though much is made of their vocal inflexibility) several species of primates, including pygmy marmosets, tamarins, and chimpanzees (Crockford, Herbinger, Vigilant, & Boesch, 2004; Elowson & Snowdon, 1994; Maeda & Masataka, 1987; Marshall, Wrangham, & Arcadi, 1999; Mitani & Gros-Louis, 1998; Smolker & Pepper, 1999; Snowdon & Elowson, 1999; Watwood, Tyack, & Wells, 2004). It is unclear where on this continuum to draw the line on vocal learners that should be capable of beat-matching.

Finally, Hoeschele et al. (2015) raised the issue of the relative merits of laboratory experiments versus naturalistic observations. The former can be criticized for creating unnatural demands that may not reveal an animal’s actual capabilities, whereas the latter can be criticized for a lack of rigorous control. Hoeschele et al. concluded, as do we, that both approaches are needed in order to make progress.

Given all of the considerations above, we propose that no cases be summarily excluded from the discussion. Instead, we choose to cast a broad net, considering all possible cases of entrainment in animals, to try to understand the range of phenomena involved.

Positive evidence

In fact, a surprisingly wide range of species have been reported to engage in synchronized behavior.

Beginning with automatic entrainment of behaviors in neurologically simple species, one point to note is the variety of species in which this occurs, and the variety of effectors and behaviors involved. These include bioluminescent flashing in fireflies, fish, and marine crustaceans (Buck & Buck, 1968; Morin, 1986; Woodland, Cabanban, Taylor, & Taylor, 2002); stridulation, or chirping, in limb-rubbing insects (Alexander & Moore, 1958; Greenfield & Schultz, 2008; Walker, 1969); croaking in frogs (Klump & Gerhardt, 1992; Wells, 1977); and claw-waving in crabs (Backwell, Jennions, Passmore, & Christy, 1998; see Greenfield, 1994, for a general review). This shows, at a minimum, that the mechanism for this kind of entrainment is not an isolated evolutionary occurrence, but rather has emerged repeatedly through convergent evolution. It further suggests that the raw neurological materials shared by species as diverse as invertebrates, fish, and tetrapods contain the prerequisites that make the emergence of entrainment a relatively simple matter. It is not implausible that this may form the most basic substrate of the flexible and voluntary forms of entrainment shown by more complex species.

Moving to more complex species, cetaceans such as dolphins and orcas have been observed to synchronize behaviors under a variety of circumstances, including schooling behavior, cooperative feeding, breathing while resting, mother–infant coordination, alliance behavior in males, displays by multiple males in the presence of females, and synchronized surfacing (Connor, Smolker, & Bejder, 2006; Heimlich-Boran, 1988; Mann & Smuts, 1998; Norris & Dohl, 1980; Norris & Schilt, 1988; Peddemors, 1990; Similä, 1997). The degree of coordination involved is noteworthy—in one study, male dolphin pairs surfacing in synchrony broke the surface on average within 120–150 ms of each other (Connor et al., 2006). Given that breaking the surface is a crude measure of synchrony, subject to variability from perturbations of the water surface, this is remarkable.

Several species of primates are known to perform vocal duetting, in which calls are synchronized or alternated between two conspecifics. These species include bonobos, gelada monkeys, gibbons, indris, langurs, siamangs, tarsiers, titi monkeys, and marmosets (Chivers, 1972; de Waal, 1988; Ellefson, 1974; Geissmann, 2000; Haimoff, 1986; Hohmann & Fruth, 1994; Kinzey & Robinson, 1983; Richman, 1976, 1978, 1987; Takehashi, Narayanan, & Ghazanfar, 2013; Tembrock, 1974; Tenaza, 1976). For most of these species there are not sufficient data to rule out a stimulus–response account, in which each individual responds to the most recent call of the other, with no synchronization mechanism controlling the timing. Takehashi et al. claimed that marmosets entrain their timing in turn-taking of calls, similar to the exquisitely timed turn-taking in human conversation (Wilson & Wilson, 2005), but their analysis failed to make the case for anything more than a simple call-and-response mechanism. (For the full argument, see http://languagelog.ldc.upenn.edu/nll/?p=7989.)

However, in two species, gelada monkeys and bonobos, a high degree of precision has been reported. Richman (1978, 1987) recorded gelada monkeys in naturalistic group settings in captivity, and reported the common occurrence of both counterphased (alternating) and phase-locked calls. Millisecond analysis of the sonograms showed that the timing is too precise to be accounted for by a stimulus–response explanation. A further interesting feature is that synchrony improves over successive calls, in one example starting at a 46-ms asynchrony, and achieving zero asynchrony after three calls.

In bonobos, two reports exist of in-phase and counterphase synchronized calls in the wild. Although neither study reported a quantitative analysis of the timing, both reported observations of very precise synchrony. “During choruses, staccato hooting of different individuals is almost perfectly synchronized so that one individual acts as the ‘echo’ of another, or emits calls at the same moments as another. The calls are given in a steady rhythm of about two per second” (de Waal, 1988, p. 203). Similarly, “there is a very short delay between the first and second animal and units are usually emitted in more or less precise alternation” (Hohmann & Fruth, 1994, p. 772).

These cases are of particular interest because they demonstrate vocal synchronizing in species that are not robust vocal learners. This is important because it indicates that the ability to entrain the timing of a given behavior need not depend on a great deal of voluntary control over the content of that behavior (see Janik & Slater, 2000, on the distinction between contextual learning—when to produce a call—vs. production learning—modulating the call itself). Control of the onset and offset of behaviors is present in virtually all mammals and birds, as has been extensively studied in the field of operant conditioning, even when there is little neurological elaboration for motor learning of the details of the behavior (see Burnstein & Wolff, 1967; Davis & Hubbard, 1973; Molliver, 1963; and Salzinger & Waller, 1962, for a few examples).

Furthermore, a chimpanzee, a bonobo, and Japanese macaques have been reported to spontaneously synchronize a nonvocal behavior to an ambient rhythmic stimulus. Hattori et al. (2013) trained three captive chimpanzees to alternately tap two keys on a piano keyboard. In a test phase, the same auditory notes as those to be tapped were played as irrelevant auditory stimuli during tapping, with an interstimulus interval (ISI) of 400, 500, or 600 ms. One of the three chimpanzees spontaneously synchronized to one of the three ISIs. In a different study, a captive bonobo spontaneously synchronized drum beats with those produced by a human experimenter, when the tempo was close to the animal’s own preferred spontaneous drumming speed (Large & Gray, 2015). Interestingly, this rate (270 bpm) was far faster than the rates typically used in animal entrainment studies. Finally, three Japanese macaques who were taught a button-tapping task spontaneously synchronized with a partner (Nagasaka et al., 2013). It is worth noting that the task required alternating between two buttons, creating an oscillatory movement, as well as involving an ongoing stimulus that the animal could progressively align to, distinguishing it from other button-press tasks with monkeys (see above). It is also worth noting that the studies with macaques and the bonobo both involved social stimuli, which may be an important motivating factor. These studies suggest that at least some individuals of these species, under some circumstances, are sensitive to an incoming rhythm and will spontaneously beat-match.
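Because the studies above report tempo in different units, a quick conversion helps in comparing them. The short Python sketch below uses hypothetical helper names and treats each interstimulus interval as the full beat-to-beat interval, which is a simplifying assumption rather than something stated in the original reports.

```python
def bpm_to_interval_ms(bpm):
    """Inter-beat interval in milliseconds for a tempo in beats per minute."""
    return 60_000.0 / bpm

def interval_ms_to_bpm(interval_ms):
    """Tempo in beats per minute for an inter-beat interval in milliseconds."""
    return 60_000.0 / interval_ms

# Bonobo drumming rate reported above: 270 bpm, about 222 ms between beats.
bonobo_interval_ms = bpm_to_interval_ms(270)

# Chimpanzee study intervals of 400, 500, and 600 ms, expressed as tempos
# (assumes each interval spans one full beat).
chimp_tempos_bpm = [interval_ms_to_bpm(isi) for isi in (400, 500, 600)]
```

Under that assumption, the three chimpanzee conditions correspond to 150, 120, and 100 bpm, so the bonobo's preferred rate was roughly twice as fast as the fastest of them.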

One other possible instance of spontaneous synchronization was reported by Bregman et al. (2013). They analyzed the footfalls of a horse trotting to music, and found preliminary evidence for synchronization, although the authors acknowledged that further control conditions would be needed to determine that the result was not coincidental.
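
One common way to quantify whether movements such as footfalls cluster at a consistent phase of a beat is circular statistics. The sketch below is not the analysis used by Bregman et al. (2013); it is an illustration under stated assumptions: event and beat times are in seconds, the beat is perfectly periodic, and the p-value uses the simple first-order approximation to the Rayleigh test. The names `vector_strength` and `rayleigh_p` are hypothetical.

```python
import math


def vector_strength(event_times, beat_period, beat_offset=0.0):
    """Mean resultant length R of event phases relative to the beat.

    R is near 1 when events fall at a consistent phase of the beat
    and near 0 when their phases are spread evenly around the cycle.
    """
    phases = [
        2 * math.pi * (((t - beat_offset) % beat_period) / beat_period)
        for t in event_times
    ]
    mean_cos = sum(math.cos(p) for p in phases) / len(phases)
    mean_sin = sum(math.sin(p) for p in phases) / len(phases)
    return math.hypot(mean_cos, mean_sin)


def rayleigh_p(r, n):
    """First-order approximation to the Rayleigh test p-value, exp(-n * r^2)."""
    return math.exp(-n * r * r)


# Illustrative data only (not from any study): a 0.5 s beat period.
locked = [0.5 * k + 0.02 for k in range(50)]    # events at a fixed phase of the beat
drifting = [0.53 * k for k in range(50)]        # slightly slower tempo, phase drifts

r_locked = vector_strength(locked, 0.5)
r_drifting = vector_strength(drifting, 0.5)
print(f"locked:   R = {r_locked:.2f}, p = {rayleigh_p(r_locked, 50):.3g}")
print(f"drifting: R = {r_drifting:.2f}, p = {rayleigh_p(r_drifting, 50):.3g}")
```

A high R shows only that the phase was consistent. It would also arise if the animal's natural tempo happened to match that of the music, which is why control conditions of the kind the authors call for remain necessary.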

Turning now to studies in which animals were deliberately taught to entrain, the button-press experiments with rhesus macaques, reviewed in the section on negative evidence, do provide partial evidence for entrainment. Furthermore, the parrots that have been studied, including budgerigars, gray parrots, and sulphur-crested cockatoos, were all able to entrain after a learning process. (In the case of Snowball the cockatoo, we do not know whether he received training, deliberate or inadvertent, from his human caretakers, but it seems likely that the behavior was not entirely spontaneous.)

Finally, as we mentioned before, in one case a nonmimic and nonprimate, the California sea lion Ronan, was trained to produce a body movement in synchrony with an auditory signal (Cook et al., 2013). Sea lions (members of the family Otariidae) are known to be vocally inflexible, in spite of suggestions that this has not been adequately shown (Merchant & Honing, 2014; Patel, 2014). They do not learn their vocalizations from conspecifics in infancy, and despite decades of observation, there is no evidence that they mimic vocalizations or other environmental sounds (Schusterman, 2008; see Reichmuth & Casey, 2014, for a review). It is true that the other pinniped families, the Odobenidae (walruses) and Phocidae (the 18 species of true seals), include species that possess greater vocal flexibility. Walruses, for example, can be trained to produce novel vocalizations of their own invention (Schusterman & Reichmuth, 2007), and elephant seals and harbor seals show spontaneous vocal learning (Sanvito, Galimberti, & Miller, 2007; Schusterman, 2008). One particularly well-known case is the harbor seal Hoover, who was cared for by humans in infancy and developed a striking ability to produce certain phrases spoken by his early caretaker (Ralls et al., 1984). But Hoover remains the only pinniped individual to ever show strong evidence of mimicry (see the review in Reichmuth & Casey, 2014). It is decidedly not the case that vocal learning is a shared characteristic of all pinnipeds, whose different families diverged from one another at least 23 million years ago. (In contrast, humans diverged from chimpanzees and bonobos only 6 million years ago, yet could hardly be more different from them in terms of vocal flexibility.) Thus, the case of Ronan presents a compelling example of an ability to learn beat-matching in a non-vocal-learner.

This is, as far as we know, the full extent of formal reports of synchronization in animals. However, informal observation of various rhythmic behaviors in animals suggests that entrainment may be occurring more widely, which merits closer study. Such behaviors include locomotor behaviors, such as herds or flocks trotting, galloping, or flying together (cf. Wheeler, 1917), and social vocalizations, whether chorusing or alternating. Further research will be needed to determine whether synchronization occurs in these situations across a broad range of species.

In sum, the possibilities for entrainment, whether spontaneous, as in group behavior, or learned, as in a laboratory setting, are very wide indeed. The time is ripe for research on a broad range of species, to explore whether, to what extent, and under what conditions each species is capable of entrainment.

Why do animals vary in entrainment?

In the title of this article, we ask why humans want to entrain, fireflies can’t help it, pet birds try, and sea lions have to be bribed. In this section, we propose some reasons for this variability.

The considerations raised in the previous sections flip the question of entrainment on its head. Rather than asking which animals can entrain and why, we should be asking, why isn’t entrainment the default for all animals when presented with rhythmic stimuli? For example, one of the interesting features of the training of Ronan the sea lion was that, although she eventually learned to beat-match spontaneously to novel stimuli, getting her to that point took months of training. Along similar lines, although Hattori et al. (2013) found one instance of spontaneous synchronization in a chimpanzee, two other chimpanzees, as well as the first chimpanzee at two of the three ISIs, showed no synchronization. That is, their motor systems were not always spontaneously driven by oscillatory coupling with the sensory input.

We suggest that this relative imperviousness to an incoming beat—surprising when coupled with a latent ability to entrain, as in the sea lion and the first chimpanzee—may actually be the inevitable outcome of bringing entrainment (and other forms of sensory guidance of behavior) under greater voluntary control. As animals evolved to become more cognitively sophisticated, they increased in their ability to allocate attention to one stimulus and neglect others, and in their ability to voluntarily start and stop a behavior in response to their own motives. Furthermore, this greater sophistication may lead human researchers to expect entrainment with stimuli and behaviors that are unsuitable for that animal, a point that has been made by Hoeschele et al. (2015, p. 2). This can occur when a particular motor behavior is not under fine-tuned control (e.g., sea lion flippers are under less refined motor control than the head, and an attempt to entrain flipper movement might have failed), when the tempo is too fast or too slow for the biomechanics of that effector or its neural control systems, or when the animal does not have the perceptual ability to extract the beat from a complex stimulus. All of these complicating factors make it less than inevitable that entrainment will happen simply because a rhythmic stimulus is present.

In the research cited above on rhythmic sensory input driving the motor system in humans and macaques (Lakatos et al., 2008; Praamstra et al., 2006; Saleh et al., 2010), it is particularly noteworthy that this occurred only when the subject was actively attending to the stimulus. Further evidence that complex brains are able to entrain to some stimuli but filter out others comes from findings that humans are less likely to entrain a rhythmic behavior with a partner they don’t like (Miles, Griffiths, Richardson, & Macrae, 2010), and that dolphins are more likely to synchronize during social than during nonsocial behaviors (Connor et al., 2006).

Compatible with this idea is a proposal by Schachner (2012), made in response to the data from horses reported by Bregman et al. (2013). Schachner proposed that animals may be able to beat-match if they are exposed to crucial developmental experiences (mirroring in some important respects the developmental experiences of vocal learners). Specifically, this would apply to animals raised in conditions that induce them to develop motor skills coupled to real-time feedback, such as those experienced by dressage horses, which would lead to a propensity to attend to the relevant stimuli. On this view, vocal learners are unique simply in that their natural developmental trajectory in the wild leads them to have this ability.

In addition to the animal funneling voluntary attention to the relevant stimulus, it is also crucial that the animal’s perceptual system be able to extract the rhythmic component from the stimulus. This should not be a problem for simple pulsing stimuli, particularly if Canolty and Knight (2010) are correct that animal sensory systems are prone to packaging incoming stimuli rhythmically. Less obvious, though, is the ability to “hear” the beat in music—that is, to build a structured mental representation of the stimulus that extracts the relevant rhythmic pulse from the additional, overlaid complexities. As Kung, Chen, Zatorre, and Penhune (2013) noted, “musical beat has no one-to-one relationship with auditory features—it is an abstract perceptual representation that emerges from the interaction between sensory cues and higher level cognitive organization” (p. 401). In Cook et al. (2013), the sea lion Ronan became expert at beat-matching to a relatively simple oscillating stimulus, yet was confused when introduced to human music. The animal had to be trained to hear the beat, through successive approximations of more complex musical exposures. But once trained to do so, she was able to transfer this ability to novel musical stimuli. In other words, she did not narrowly learn one song, but rather learned the auditory skill of extracting a beat from music.

This finding of trainability with Ronan the sea lion may shed light on the claim that rhesus macaques cannot hear the beat in a rhythmic stimulus, a claim that rests on the absence of an event-related potential (ERP) response to an oddball stimulus (Honing, Merchant, Háden, Prado, & Bartolo, 2012). It is possible that this absence of a response reflected a lack of relevant perceptual training. It would be of crucial interest to see whether macaques would show the ERP response after training similar to Ronan’s.

That this skill needed to be learned in an animal like Ronan perhaps resembles what happens in human development through exposure to heavily cadenced children’s songs and rhymes, as well as multimodal input such as being bounced by adults while listening (cf. Phillips-Silver & Trainor, 2005). Although newborns show ERP responses to deviations in a heard beat (Honing, Ladinig, Háden, & Winkler, 2009; Winkler, Háden, Ladinig, Sziller, & Honing, 2009; Zentner & Eerola, 2010), other studies have shown that learning to extract the beat from complex musical structures requires a long apprenticeship. For example, Drake, Jones, and Baruch (2000) demonstrated that the ability to attend to different levels in a hierarchy of rhythm in music increases with age in children and with musical training, and they proposed that these increased abilities are due to increasing reliance on multiple coupled oscillations rather than a single rate of oscillation by a single ensemble. In addition, cultural experience with a particular metrical pattern of music plays a large role in the ability to detect a temporal disruption in that pattern embedded in music (Hannon, Soley, & Ullal, 2012; Hannon & Trehub, 2005a, b), and mere exposure may need to occur in early childhood if it is to result in greater sensitivity to a complex rhythm (Hannon, Vanden Bosch der Nederlanden, & Tichko, 2012).

A similarly long apprenticeship may be required for learning to beat-match with one’s body, as can be attested by anyone who has watched a small child “dance” to music or attempt to coordinate patty-cake. Even among adults with a lifetime of cultural exposure to music, learning makes a difference. Miura, Kudo, Ohtsuki, and Kanehisa (2011) reported that dancers are better than nondancers at synchronizing body movement to a beat, and the difference between the groups varied with the type of movement required. All of this shows that developing the ability to hear a beat in a complex stimulus, and to synchronize body movement to that beat, is a prolonged process in humans, with learning playing a crucial role.

In addition to attending to the stimulus and knowing how to hear it, further complexities are introduced by the issue of whether an animal has voluntary control of particular effectors and of the movement patterns performed with them. For example, the rate of the incoming stimulus may be too different from the preferred rate of the motor system (Hasegawa et al., 2011; Konoike et al., 2012; Large & Gray, 2015), or the animal may not possess the neural sophistication to learn to rhythmically perform a behavior that is not in its natural repertoire. However, if the animal does have sufficient voluntary control, then even an animal that does not spontaneously entrain may be much more likely to learn to do so. By being able to control the starting and stopping of the behavior at will, and being able to modulate the speed to successively approximate the desired outcome, the animal can in essence choose to entrain once it “gets the point” of the task.
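
The idea that a stimulus rate can be too far from the motor system's preferred rate can be illustrated with a textbook phase-oscillator model. The sketch below is not a model proposed in the studies cited above; it assumes a single oscillator whose phase difference from the stimulus follows an Adler-type equation, integrated with a simple Euler step. The function name `locks_to_stimulus`, the coupling strength, and the rates in the example are all hypothetical choices made for illustration.

```python
import math


def locks_to_stimulus(preferred_hz, stimulus_hz, coupling=2.0, seconds=60.0, dt=0.001):
    """Return True if the oscillator's phase difference from the stimulus stops drifting.

    The phase difference psi obeys d(psi)/dt = detune - coupling * sin(psi),
    where detune is the gap between preferred and stimulus rates in rad/s.
    In this model, locking is possible only when |detune| <= coupling.
    """
    detune = 2 * math.pi * (preferred_hz - stimulus_hz)
    steps = int(seconds / dt)
    halfway = steps // 2
    psi = 0.0
    psi_at_halfway = 0.0
    for i in range(steps):
        if i == halfway:
            psi_at_halfway = psi
        psi += dt * (detune - coupling * math.sin(psi))  # Euler integration step
    # Entrained if the phase difference barely moved during the second half
    return abs(psi - psi_at_halfway) < 0.1


# Hypothetical preferred rate of 2.0 Hz (120 movements per minute)
near = locks_to_stimulus(2.0, 2.2)   # stimulus close to the preferred rate
far = locks_to_stimulus(2.0, 4.5)    # stimulus at 4.5 Hz, i.e., 270 bpm
print(f"stimulus near preferred rate: entrains = {near}")
print(f"stimulus far from preferred rate: entrains = {far}")
```

In this simplified model, a stronger coupling widens the range of stimulus rates to which the oscillator can lock. Voluntary modulation of movement speed, as described above, would correspond to shifting the preferred rate toward the stimulus.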

This emphasis on voluntary control and learning raises a question, though, in the case of humans. If greater voluntary control makes entrainment less automatic, and humans are experts at voluntary control, why are music and dance universal? The key factor may be motivational. Nonhuman animals can be trained to do beat-matching for reward, but at least some species, given the choice, prefer silence to music (McDermott & Hauser, 2007; but see Watanabe & Nemoto, 1998). Humans, on the other hand, enjoy rhythm and musical patterns, and therefore build these into their cultures, so that exposure to music and dance becomes a universal part of the human developmental experience. Indeed, this intensive, lifelong, universal overlearning of music and dance may lead to their becoming automatized to the point that humans can’t help but beat-match, an effect perhaps comparable to the Stroop effect in experienced readers. In fact, it has been shown that humans cannot help but entrain to one another, even when asked not to (Issartel, Marin, & Cadopi, 2007). However, if we can imagine a group of humans raised without any cultural input of music and dance, it is possible that they would show as little spontaneous beat-matching as a sea lion.

The role of motivation may also help to explain why highly social birds are the animals that have been discovered by lay people to have an ability to “dance.” These birds bond with their caretakers and are highly sensitive to social reward, making it particularly likely that they will pick up behaviors that humans find amusing. Motivation can also explain the spontaneous occurrence of synchronization in dolphins and orcas, who clearly have a specifically social motivation to do so (cf. Abramson, Hernández-Lloreda, Call, & Colmenares, 2013, on social imitation in orcas), and social motivation may also be a factor in the case of the bonobo who entrained to a human experimenter (Large & Gray, 2015).

To summarize, then, some species such as insects will produce a beat-matched repetitive behavior in response to a repetitive stimulus simply “because it’s there,” via relatively unmediated neural connections that foster entrainment of oscillations, whereas other species will filter out the stimulus via attentional mechanisms, fail to hear the beat embedded in the stimulus, choose not to initiate the behavior, or lack the motor control to produce a repetitive behavior that is not part of their natural repertoire. Greater neurological sophistication can lead to apparent failures of entrainment, but for reasons that yield insights about the animal’s cognitive architecture.

Conclusions

In sum, the evidence suggests that the range of animals that can entrain is much larger than has been believed, and that a great deal of further research will be needed across a range of species before we can make any broad claims about the rhythmic capabilities of animals.

The results of such research would have fundamental implications for understanding animal learning and behavior. By focusing too tightly on the role of sensorimotor entrainment in supporting human rhythm and music, and on the hunt for restricted neural adaptations that might be responsible, researchers have prematurely excluded a wide array of understudied animal behavior from the discussion.

It is easy to think of reasons why the entrainment of rhythmic activity to rhythmic stimuli in the environment would be advantageous. Larsson (2012, 2014) has argued that animals evolved to synchronize group locomotion and breathing in order not to mask important environmental sounds, and Ravignani, Bowling, and Fitch (2014) reviewed a range of hypotheses regarding the advantages of vocal chorusing. Other examples might include an aquatic animal swimming in choppy water, synchronizing limb strokes to the frequency of the waves; an infant riding on its mother, matching the tensing and relaxing of muscles to her movement, much like a horseback rider; herd movement, which may be more efficient when locomotion is synchronized, and which may also help to avoid collisions; synchronizing with prey to facilitate capture; coordinating group hunting; managing the ebb and flow of play behavior; minimizing conflict in group feeding situations; mating rituals; and copulatory behavior. In addition to these pragmatic advantages, synchronized behavior may be used by animals to signal affiliation with each other (enhancing attention and cooperation, and possibly even driving mirror systems), and also to signal that affiliation to other conspecific observers of the synchronized behavior.

In addition, the extent of entrainment in the animal kingdom has implications for our understanding of human music and dance. It is clear that dramatic evolutionary changes happened very recently, after human ancestors split from the other apes, to produce these complex and distinctive behaviors. Research on the cognition of music and dance has therefore naturally focused on identifying components that could form the basis of this recent development. But if the arguments we have put forth in this article are correct, then the ability to entrain is not one of these recent changes. Instead, its role in human music and dance is to form a very old substrate, shared by widely diverse creatures, and other factors must be sought to explain why music and dance are so remarkable and unique in the animal kingdom.