Music organizes a set of tones over time, creating musical phrases or motives. While the tones of a motive can vary in features such as pitch range, timbre, or intensity, the rhythmic organization of the tones and their durations varies mainly in relation to the tempo of the beat and its metrical structure. In fact, two musical universals found across cultures are that the sounds used to build motives take only a few durational values (mainly, their durations are set to two- or three-beat subdivisions) and that they are organized in metrical hierarchies (Savage, Brown, Sakai, & Currie, 2015). The combination of these two factors gives rise to the rhythmic structure of any musical excerpt. Here we explore whether the cognitive architecture required to process these structural features of music has deep biological roots that can be traced to other species. More specifically, we study whether a distantly related mammal, the rat (Rattus norvegicus), can detect the rhythmic structure of a song while ignoring changes in other features, such as pitch. Other fundamental aspects of rhythm processing, such as beat entrainment or regularity detection, have been successfully studied in birds, pinnipeds, and primates (for reviews, see Kotz, Ravignani, & Fitch, 2018; Wilson & Cook, 2016; ten Cate, Spierings, Hubert, & Honing, 2016). Here, however, we focus on a more complex aspect of rhythmic cognition: whether non-human animals can extract the temporal organization of the three durational values that define the rhythmic structure of a tune.

Rodents can correctly process some salient aspects of musical sequences. For instance, rats are able to discriminate between excerpts of Bach and Stravinsky (Otsuka, Yanagi, & Watanabe, 2009) or between Mozart and the Beatles (Okaichi & Okaichi, 2001). They also succeed in discriminating structured tunes on the basis of surface features, such as intensity, octave, timbre, and melodic organization (d’Amato & Salmon, 1982, 1984; Poli & Previde, 1991). However, what matters more for musical structure is how tones are organized into temporal patterns (Kotz, Ravignani, & Fitch, 2018). It is not clear whether non-human animals can detect this temporal information in order to integrate the structural and perceptual features of a complex auditory signal. Rats are sensitive to at least some aspects of the rhythmic organization of acoustic sequences: they can detect isochrony, discriminating regular from irregular tone sequences (Celma-Miralles & Toro, 2020). They also group sound sequences according to the principles described by the iambic-trochaic law (de la Mora, Nespor, & Toro, 2013; Toro & Nespor, 2015). In the present study, we aimed to go further by exploring whether rats can detect the underlying temporal pattern of a complex auditory signal independently of changes in surface features (such as pitch).

In our experiment, we tested whether rats can use rhythm (defined by the durations of the constituent tones relative to the tempo of the beat) to discriminate a familiar song from similar melodies. We familiarized the animals with an excerpt of the “Happy Birthday” song. After familiarization, the animals were presented with three types of test items: the same musical excerpt, a version of the excerpt that preserves the rhythm but not the melodic intervals, and versions of the excerpt that preserve the melodic intervals but scramble the rhythm. If the animals can detect the underlying rhythmic structure of the tune, they should recognize it independently of melodic changes. If, on the contrary, the animals focus on the melody of the tune, they should not detect rhythmic variations when the melodic intervals are preserved.

Materials and methods


Subjects were 40 female Long-Evans rats (4 months of age). They were caged in pairs. Rats were exposed to a light–dark cycle of 12 h/12 h in a pathogen-free room. They were food-deprived until they reached 85–90% of their free-feeding weight, but they had access to water ad libitum. Food was always administered after each training session.


For the familiarization, we used the second half of the “Happy Birthday” song (see Fig. 1a). This tune is made up of 13 tones, contains all the pitches of the Western major scale, and the tonic (C6) occupies a central position in the frequency range of the sounds. The tones were synthesized with MuseScore 2.3.2 with the timbre of an acoustic piano. Every tone had a 20-ms fade-in ramp and then faded out over the rest of its duration. The tones took eight different pitches: G5 (784.0 Hz), A5 (880 Hz), B5 (987.8 Hz), C6 (1046.5 Hz), D6 (1174.7 Hz), E6 (1318.5 Hz), F6 (1396.9 Hz), and G6 (1568.0 Hz). These frequencies fall within the hearing range of rats, which extends from approximately 200 Hz to 80–90 kHz (Fay, 1988). Each 13-tone sequence lasted 5.156 s and contained three kinds of rhythmic figures (i.e., the notes could take three different durations): one half note (857.14 ms), eight quarter notes (428.57 ms), and four eighth notes (214.29 ms). The beat (one quarter note) occurred at a rate of 2.33 Hz, that is, at a metronome tempo of 140 bpm.
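The note durations follow directly from the tempo: at 140 bpm one quarter-note beat lasts 60/140 s, about 428.57 ms, and the half and eighth notes are two times and one half of that. The short Python sketch below reproduces this arithmetic from the counts given above. It computes nominal note values only and ignores any rendering details of the synthesized audio; the dictionary and function names are illustrative.

```python
# Nominal note durations at the stimulus tempo reported in the text (140 bpm).
TEMPO_BPM = 140

# Length of each rhythmic figure in quarter-note beats.
BEATS_PER_FIGURE = {"half": 2.0, "quarter": 1.0, "eighth": 0.5}

# Number of tones of each figure in the 13-tone excerpt, as stated in the text.
FIGURE_COUNTS = {"half": 1, "quarter": 8, "eighth": 4}


def beat_ms(tempo_bpm: float) -> float:
    """Duration of one quarter-note beat in milliseconds."""
    return 60_000.0 / tempo_bpm


def figure_ms(figure: str, tempo_bpm: float = TEMPO_BPM) -> float:
    """Nominal duration of a rhythmic figure in milliseconds."""
    return BEATS_PER_FIGURE[figure] * beat_ms(tempo_bpm)


if __name__ == "__main__":
    for name in ("half", "quarter", "eighth"):
        print(f"{name:>7}: {figure_ms(name):.2f} ms")
    print(f"beat rate: {TEMPO_BPM / 60:.2f} Hz")
    n_tones = sum(FIGURE_COUNTS.values())
    n_beats = sum(BEATS_PER_FIGURE[f] * n for f, n in FIGURE_COUNTS.items())
    print(f"{n_tones} tones spanning {n_beats:g} beats")
```

Running it prints 857.14, 428.57, and 214.29 ms for the three figures, a beat rate of 2.33 Hz, and 13 tones spanning 12 beats.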

Fig. 1

Music scores and sound envelopes (amplitude over time) of the excerpts used in the experiment: (a) the familiar excerpt, (b) the isotonic rhythm excerpt, and (c) the rhythmically scrambled excerpt. The isotonic rhythm excerpt preserved the rhythmic structure of the original tune but reduced its melodic organization to a single pitch (C6). The rhythmically scrambled excerpt maintained the melodic organization but scrambled the rhythmic structure.

For the test, we used two kinds of unfamiliar stimuli: the isotonic rhythm excerpt and the rhythmically scrambled excerpt. The isotonic rhythm excerpt was obtained by replacing every tone of the familiarization song with the pitch C6 (Fig. 1b). We thus eliminated the melodic intervals of the musical excerpt, creating a constant-pitch version while preserving its rhythmic organization. The rhythmically scrambled excerpt was obtained by randomly swapping the rhythmic figures within the song, that is, by interchanging the durations of its tones (Fig. 1c). Thus, the rhythmically scrambled excerpt preserved the melodic intervals of the song but distorted its rhythmic organization.
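Both manipulations can be expressed as simple operations on a list of (pitch, duration) pairs: the isotonic version keeps every duration and sets every pitch to C6, while the scrambled version keeps the pitch order and permutes the durations. The Python sketch below assumes a hypothetical transcription of the excerpt (`EXCERPT`), chosen only to be consistent with the pitch set and with the one half, eight quarter, and four eighth notes described above; the actual score is the one shown in Fig. 1a. The random seed and the rule of redrawing a shuffle that happens to reproduce the original rhythm are illustrative assumptions, not the authors' procedure.

```python
import random

Note = tuple[str, float]  # (pitch name, duration in quarter-note beats)

# HYPOTHETICAL transcription of the 13-tone excerpt, for illustration only.
# It has 1 half, 8 quarter, and 4 eighth notes, as stated in the text.
EXCERPT: list[Note] = [
    ("G5", 0.5), ("G5", 0.5), ("G6", 1.0), ("E6", 1.0), ("C6", 1.0),
    ("B5", 1.0), ("A5", 1.0), ("F6", 0.5), ("F6", 0.5), ("E6", 1.0),
    ("C6", 1.0), ("D6", 1.0), ("C6", 2.0),
]


def isotonic(notes: list[Note], pitch: str = "C6") -> list[Note]:
    """Keep every duration but replace every pitch with a single one."""
    return [(pitch, dur) for _, dur in notes]


def rhythm_scrambled(notes: list[Note], seed: int = 0) -> list[Note]:
    """Keep the pitch order but randomly reassign the durations among the tones."""
    durations = [dur for _, dur in notes]
    if len(set(durations)) < 2:
        raise ValueError("need at least two different durations to scramble")
    rng = random.Random(seed)
    shuffled = durations[:]
    # Redraw until the new rhythm actually differs from the original one.
    while shuffled == durations:
        rng.shuffle(shuffled)
    return [(pitch, dur) for (pitch, _), dur in zip(notes, shuffled)]
```

For example, `[d for _, d in isotonic(EXCERPT)]` equals the original duration pattern, and the pitch sequence of `rhythm_scrambled(EXCERPT)` equals the original pitch sequence. Because durations are only permuted, the scrambled excerpt also keeps the same total length as the familiar one.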


We used modular response boxes (reference LE1005; Panlab S. L., Barcelona, Spain) equipped with a pellet feeder. A photoelectric detector attached to the feeder registered the rats’ nose-poking responses. We presented the auditory stimuli using Electro Voice (s-40) speakers located next to the boxes, at an intensity of approximately 68 dB. Each response box was placed inside a larger soundproof box. A custom-made program (RatboxCBC v.2) controlled the presentation of stimuli, recorded the nose-poking responses, and delivered the food reward (sugar pellets; F0021-D, Bilaney Consultants) during the study.


Before the experiment began, rats were trained to put their nose into the feeder to obtain food pellets. They learned this behavior within the first session. After this first day, we used a familiarization procedure to repeatedly present the target song to the animals. We thus familiarized the animals with the “Happy Birthday” excerpt over 26 sessions (one session per day). In each session, rats were placed individually in a response box. The familiar excerpt was presented 40 times per session, with an 8-s silent interval between excerpts. During this interval, rats received food pellets if they poked their nose into the feeder; one pellet was delivered after each nose-poking response. After the familiarization, we ran a test session in which we presented 42 sequences. Half of the sequences were familiar excerpts, and, as during familiarization, the animals received reinforcement for responses after these sequences. The other half were test sequences: seven familiar excerpts, seven isotonic rhythm versions, and seven rhythmically scrambled versions of the excerpt. No reinforcement was delivered after these test sequences. The order of the sequences was semi-randomized within the test session so that no more than two items of the same type occurred in a row. We also avoided the alternation of the three types of stimuli. All the experimental procedures were conducted in accordance with Catalan, Spanish, and European guidelines and received the necessary approval from the ethics committee of the Universitat Pompeu Fabra and the Generalitat de Catalunya (protocol number 9068).
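One simple way to satisfy the semi-randomization constraint is rejection sampling: shuffle the 42 trials and redraw until no label occurs more than twice in a row. The Python sketch below is an assumption about how such an order could be generated, not the RatboxCBC implementation. It treats reinforced and non-reinforced familiar trials as separate labels (the text does not say whether they counted as the same type), and it does not model the additional constraint against alternating the three stimulus types, whose exact definition is not given. The label names are hypothetical.

```python
import random
from itertools import groupby

# Trial composition of the 42-trial test session described in the text.
# Hypothetical labels: "fam_R" = reinforced familiar trials,
# "fam_T" = non-reinforced familiar test trials.
SESSION = {"fam_R": 21, "fam_T": 7, "isotonic": 7, "scrambled": 7}


def max_run(seq: list[str]) -> int:
    """Length of the longest run of identical consecutive labels."""
    return max(len(list(group)) for _, group in groupby(seq))


def semi_random_order(counts: dict[str, int], max_repeat: int = 2,
                      seed: int = 0, max_tries: int = 100_000) -> list[str]:
    """Shuffle trials until no label repeats more than `max_repeat` times in a row."""
    rng = random.Random(seed)
    trials = [label for label, n in counts.items() for _ in range(n)]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if max_run(trials) <= max_repeat:
            return trials[:]
    raise RuntimeError("no valid order found; relax the constraint or raise max_tries")
```

With 21 reinforced trials among 42, only a small fraction of random shuffles meets the constraint, but each attempt is cheap, so the loop typically finishes well within `max_tries`. A constructive generator would be preferable if further constraints were added.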


A repeated-measures ANOVA with the within-subject factor Test Stimuli (familiar song, isotonic rhythm, rhythmically scrambled song) was applied to the number of responses the rats gave to the three stimuli. Shapiro-Wilk tests indicated that the nose-poking responses to each test stimulus did not deviate significantly from normality (all p > .05), and Mauchly’s test indicated no violation of sphericity (p > .05). The animals responded differently to the three types of test stimuli (F(2,78)=9.885, p < .001, ηp² = .202; see Fig. 2). Post hoc pairwise comparisons with Bonferroni correction revealed a higher mean number of nose-poking responses for the rhythmically scrambled excerpts (M=31.57, SD=9.33) than for the familiar song (M=24.75, SD=7.35; MD=6.82, p < .001, 95% CI [3.60, 10.05]) and for the isotonic rhythm excerpts (M=27.52, SD=9.76; MD=4.05, p = .035, 95% CI [0.22, 7.89]). The mean number of responses to the isotonic rhythm excerpts and to the familiar song did not differ (MD=2.77, p = .376, 95% CI [-1.66, 7.21]). Thus, the animals discriminated the rhythmically scrambled excerpt from the two excerpts that maintained the rhythmic organization of the song. In contrast, they did not discriminate between the two excerpts that differed in their melodic intervals but kept the same rhythmic structure.
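The reported degrees of freedom and effect sizes can be checked from the design alone. For a one-way repeated-measures ANOVA with k = 3 conditions and n = 40 rats, df = (k − 1, (k − 1)(n − 1)) = (2, 78), and the reported effect size matches partial eta squared, ηp² = F·df_effect / (F·df_effect + df_error). The same formula reproduces the .355 reported below for the Test Stimuli × Preference interaction of the mixed-design ANOVA: the Greenhouse-Geisser ε multiplies both degrees of freedom, so it cancels and the uncorrected df (4, 74) can be used. The Python sketch below is a consistency check on the published numbers under these standard formulas, not a reanalysis of the raw data.

```python
def rm_anova_df(k: int, n: int) -> tuple[int, int]:
    """Effect and error df for a one-way repeated-measures ANOVA (k conditions, n subjects)."""
    return k - 1, (k - 1) * (n - 1)


def mixed_interaction_df(k: int, g: int, n: int) -> tuple[int, int]:
    """Interaction and error df for a mixed ANOVA (k within levels, g groups, n subjects)."""
    return (k - 1) * (g - 1), (k - 1) * (n - g)


def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p value for m comparisons, capped at 1."""
    return min(1.0, p * m)


if __name__ == "__main__":
    df1, df2 = rm_anova_df(k=3, n=40)
    print(df1, df2, round(partial_eta_squared(9.885, df1, df2), 3))
    # Greenhouse-Geisser epsilon scales both df equally, so it cancels in the ratio.
    di, de = mixed_interaction_df(k=3, g=3, n=40)
    print(di, de, round(partial_eta_squared(10.17, di, de), 3))
```

This prints `2 78 0.202` and `4 74 0.355`. The `bonferroni` helper also shows why several post hoc comparisons are reported as p = 1.000: with three pairwise tests, any unadjusted p above about .33 is capped at 1.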

Fig. 2

Individual and mean numbers of nose-poking responses (with standard error bars) to the three types of test stimuli. Filled dots represent the responses of individual rats. The significance level of the paired t-tests is marked with asterisks: n.s. indicates p > .05, * indicates p ≤ .05, and *** indicates p ≤ .001.

To explore possible individual differences in the cues used to discriminate the test items, we classified the rats according to their responses to the unfamiliar stimuli. The large majority of the rats (N=30) responded more to the rhythmically scrambled stimuli than to the isotonic rhythm stimuli. However, a group of animals (N=8) showed the opposite pattern, responding more to the isotonic rhythm stimuli than to the rhythmically scrambled stimuli. Only two rats gave the same number of responses to both types of unfamiliar stimuli (see Supplementary Fig. 1). The difference in the number of animals preferring the rhythmically scrambled versus the isotonic rhythm stimuli was significant (χ²=24.26, p < .005). A mixed-design ANOVA with the within-subject factor Test Stimuli (familiar song, isotonic rhythm, rhythmically scrambled song) and the between-subject factor Preference (rhythmically scrambled, isotonic, indifferent) was applied to the rats’ responses. Greenhouse-Geisser corrections were applied because Mauchly’s test indicated a violation of sphericity (p = .013), and Bonferroni corrections were applied to the post hoc paired t-tests. Shapiro-Wilk tests indicated that each group’s nose-poking responses to the test stimuli did not deviate significantly from normality (all p > .05), and Levene’s tests indicated no violation of the equality of variances (all p > .05). We observed no significant main effects but a significant interaction between Test Stimuli and Preference (F(3.29,60.93)=10.17, p < .001, ηp² = .355; see Fig. 3). The eight rats that gave more responses to the isotonic test excerpts (M=37.25, SD=10.11) discriminated them significantly from the rhythmically scrambled (M=27.00, SD=10.93; MD=10.25, p < .001, 95% CI [4.62, 15.88]) and from the familiar excerpts (M=22.38, SD=6.80; MD=14.88, p < .001, 95% CI [6.37, 23.39]). However, this group of animals did not discriminate the rhythmically scrambled from the familiar stimuli (MD=4.63, p = .316, 95% CI [-2.36, 11.61]).
This suggests that these rats were focusing on the melodic organization of the tune instead of its rhythmic structure. The opposite pattern was observed in the group of 30 animals, which produced more responses to the rhythmically scrambled stimuli (M=33.37, SD=8.61) than to the familiar (M=25.27, SD=7.66; MD=8.10, p < .001, 95% CI [4.50, 11.71]) and the isotonic stimuli (M=25.23, SD=8.40; MD=8.13, p < .001, 95% CI [5.23, 11.04]), without discriminating the isotonic from the familiar stimuli (MD=-0.03, p = 1.000, 95% CI [-4.43, 4.36]). The two rats that gave the same number of responses to the isotonic stimuli (M=23.00, SD=1.41) and the rhythmically scrambled stimuli (M=23.00, SD=1.41; MD=0.00, p = 1.000, 95% CI [-11.26, 11.26]) did not discriminate the familiar stimuli (M=26.50, SD=4.95) from either the isotonic stimuli (MD=3.50, p = 1.000, 95% CI [-13.52, 20.52]) or the rhythmically scrambled stimuli (MD=3.50, p = 1.000, 95% CI [-10.48, 17.48]).
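Because the three preference groups partition the 40 rats, the overall means from the repeated-measures analysis should equal the group means weighted by group size. The Python sketch below performs this check with the values reported in the text; differences in the second decimal are expected, since the group means are themselves rounded. The dictionary keys are illustrative labels.

```python
# Group sizes and mean responses per test stimulus, as reported in the text.
GROUPS = {
    "scrambled-preference": (30, {"familiar": 25.27, "isotonic": 25.23, "scrambled": 33.37}),
    "isotonic-preference": (8, {"familiar": 22.38, "isotonic": 37.25, "scrambled": 27.00}),
    "indifferent": (2, {"familiar": 26.50, "isotonic": 23.00, "scrambled": 23.00}),
}

# Overall means reported for the whole sample (N = 40).
OVERALL = {"familiar": 24.75, "isotonic": 27.52, "scrambled": 31.57}


def pooled_means(groups: dict[str, tuple[int, dict[str, float]]]) -> dict[str, float]:
    """Size-weighted average of the group means for each test stimulus."""
    total_n = sum(n for n, _ in groups.values())
    stimuli = next(iter(groups.values()))[1].keys()
    return {s: sum(n * means[s] for n, means in groups.values()) / total_n for s in stimuli}


if __name__ == "__main__":
    for stim, value in pooled_means(GROUPS).items():
        print(f"{stim:>9}: pooled {value:.3f} vs reported {OVERALL[stim]:.2f}")
```

All three pooled means agree with the reported overall means to within 0.01, which supports the internal consistency of the group-level and whole-sample statistics.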

Fig. 3

Mean number of responses (and standard error bars) to familiar, isotonic, and rhythmically scrambled stimuli grouped by response patterns. The significance level of the paired t-tests is marked with asterisks: n.s. indicates p > .05, and *** indicates p ≤ .001


We explored whether rats could discriminate a familiar song from a version that preserved the rhythm but not the melody, and from a version that preserved the melody but not the rhythm. The results suggest that most animals discriminated among these complex auditory sequences by focusing on their underlying rhythmic structure. In the main analysis, we observed no differences between the original excerpt and its constant-pitch version (the isotonic rhythm excerpt), whereas both differed from the version that disrupted the rhythmic organization of the song. Importantly, we observed individual differences in the cues on which the animals focused to process the tune. Although most rats relied on the rhythmic structure of the tune to discriminate among the unfamiliar stimuli, eight out of 40 animals seem to have focused on its melodic organization. This suggests that there might be individual differences in the strategies used to process complex auditory signals and discriminate musical tunes. Nevertheless, the general tendency we observed was to focus on the rhythmic structure of the song rather than on its melodic organization.

Previous studies reported that rats found it difficult to discriminate between sequential patterns of notes (d’Amato & Salmon, 1984) and to distinguish a tune from its reversed sequence (Poli & Previde, 1991). In these studies, the rodents seem to have relied on a single pitch of the sequences or on a difference in timbre, rather than on their underlying structure, to perform the discrimination. In our study, however, the order of the pitches and the timbre were identical in the familiar song and its rhythmically scrambled version; the only difference between these two excerpts was the duration pattern of their tones. Under these conditions, our rats seem to have relied on the rhythmic structure, rather than on the melodic organization of the song, to discriminate among the excerpts. It remains an open question, however, whether the group of rats that did not discriminate the rhythmically scrambled from the familiar stimuli was attending to the whole frequency contour or just to a single pitch, as in Experiment 3 of d’Amato and Salmon (1984). If they were focusing on particular tones, the rats that did not discriminate between any of the stimuli could have based their responses solely on the pitch C6, which is present in both unfamiliar stimuli.

Our study suggests that rats are sensitive to the rhythmic structure of a familiar tune. One possibility is that, to discriminate between the stimuli, rats used the metrical organization of the three durational values that the tones of the excerpts could take. This would mean that they were sensitive to the temporal changes in the tune regardless of the pitch of the tones. In fact, they did not discriminate the familiar tune from the version in which all notes were set to a single pitch (the constant-pitch version). This suggests that when the underlying structure of an auditory sequence provides enough information, rodents can compensate for changes in absolute frequency (e.g., Crespo-Bojorque & Toro, 2016). Another possibility is that rats focused only on the first or last few durational values of the unfamiliar excerpts to detect whether they matched those of the familiar song. This alternative implies that rats can discriminate the temporal organization of a few durational values without processing higher-level metrical structure. This is a limitation of the present study: we cannot ascertain whether rats attended to the whole rhythmic structure or only to the temporal organization of the first or last few tones. One way to resolve this issue would be to test rats with unfamiliar excerpts that scramble only the beginning, the middle, or the end of the tune.
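The control proposed above amounts to permuting durations only inside one window of the tune while leaving the rest of the rhythm intact. A minimal Python sketch, assuming the rhythm is stored as a list of durations in beats; the duration pattern, the function name, and the window boundaries are hypothetical illustrations (the pattern merely matches the counts given in the Methods), not a tested experimental design.

```python
import random

# HYPOTHETICAL duration pattern (in quarter-note beats) for a 13-tone excerpt;
# it has 1 half, 8 quarter, and 4 eighth notes, as stated in the Methods.
DURATIONS = [0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 2.0]


def scramble_window(durations: list[float], start: int, stop: int,
                    seed: int = 0) -> list[float]:
    """Permute durations[start:stop] so that the window changes; keep the rest untouched."""
    window = durations[start:stop]
    if len(set(window)) < 2:
        raise ValueError("window needs at least two different durations to be scrambled")
    rng = random.Random(seed)
    shuffled = window[:]
    # Redraw until the permuted window actually differs from the original.
    while shuffled == window:
        rng.shuffle(shuffled)
    return durations[:start] + shuffled + durations[stop:]


if __name__ == "__main__":
    n = len(DURATIONS)
    windows = {"beginning": (0, 4), "middle": (4, 9), "end": (n - 4, n)}
    for label, (a, b) in windows.items():
        print(label, scramble_window(DURATIONS, a, b))
```

Because only the chosen window is permuted, each variant keeps the total duration and the rhythm outside the window, so a difference in responses across the three variants would indicate which part of the tune the animals rely on.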

Despite this limitation, and in contrast with the findings reported by Poli and Previde (1991), our main results indicate that rats’ discriminative behavior might go beyond the surface features of the musical excerpts (such as fundamental frequency or timbre) and be based on the rhythmic structure of the tunes (or parts of them). This suggests striking abilities in rodents for processing complex auditory signals, which it would be interesting to explore using more naturalistic and biologically meaningful stimuli. Interestingly, we observed that the rodents responded more (produced more nose-poking responses) to the unfamiliar excerpts than to the familiar ones. A similar preference for novel stimuli has been reported in rats in the domain of taste (Kalat, 1974; Welker & King, 1962) and in the spatial rearrangement of objects (Pisula & Siegel, 2005; Pisula, Stryjek, & Nalecz-Tolak, 2006), as well as in newborn chicks during difficult tasks (Bateson & Jaeckel, 1976; Santolin, Rosa-Salva, Vallortigara, & Regolin, 2016). The higher response rates in our experiment may therefore reflect a response to the novel durations of the tones that made up the tune.

The fact that the ability to detect the rhythmic organization of tones in a sequence (be it the whole sequence or part of it) is present in a distantly related mammal suggests that certain features of musicality may have evolved independently across species to process the relevant sensory information in their environments. More importantly, it suggests that the cognitive architecture required to process core aspects of musical universals (i.e., the temporal organization, in metrical hierarchies, of a few durational values constrained to two- or three-beat subdivisions; Savage et al., 2015) might have deep biological roots shared across species. In fact, studies with avian species suggest strong rhythmic processing abilities, probably linked to their natural vocalizations (Slabbekoorn & ten Cate, 1999; Hoeschele, Merchant, Kikuchi, Hattori, & ten Cate, 2015). Beyond the temporal organization of rhythms studied here, other universal properties of music, such as pitch identification or melodic organization, might be found in the animal kingdom (e.g., Fitch, 2006; Hauser & McDermott, 2003; Honing, ten Cate, Peretz, & Trehub, 2015), which opens the door to a better understanding of the phylogenetic origins of music.