Instrumental music and spoken language have many obvious differences, ranging from the acoustic structure of their fundamental building blocks (e.g., tones vs. phonemes or syllables) to the kinds of meanings that sequences convey to listeners (Slevc & Patel, 2011). Yet a growing body of research suggests that the cognitive processing of instrumental music and of language has more in common than one might initially suspect. (Henceforth in this article, music refers to instrumental music, and language refers to ordinary language, i.e., not poetry, chant, or other stylized forms.) Hidden links between musical and linguistic cognition have been found at several levels of language processing, including syntactic, semantic, prosodic, phonological, and affective (e.g., Flaugnacco et al., 2015; Habib et al., 2016; Koelsch et al., 2004; Koelsch, Gunter, Wittfoth, & Sammler, 2005; Kunert, Willems, Casasanto, Patel, & Hagoort, 2015; Kunert, Willems, & Hagoort, 2016; Lima & Castro, 2011; Liu, Patel, Fourcin, & Stewart, 2010; Musso et al., 2015; Patel, Peretz, Tramo, & Labreque, 1998; Slevc, Rosenberg, & Patel, 2009; Thompson, Schellenberg, & Husain, 2004; for recent debate, see Collins, Tillmann, Barrett, Delbé, & Janata, 2014; Kunert & Slevc, 2015; Peretz, Vuvan, Lagrois, & Armony, 2015; Tillmann & Bigand, 2015).

The purpose of this short essay is to point out one implication of these connections for research on the evolution of spoken language processing: music can be used in comparative (cross-species) studies to investigate the evolution of cognitive mechanisms involved in language. For example, if a specific aspect of music processing is known to have a (nontrivial) link to linguistic syntactic processing, then one can study this aspect of music processing in other species to gain insight into the evolutionary precursors of syntactic processing. This is of interest because music offers certain advantages for cross-species research. The raw materials of music (individual tones) can be relatively simple stimuli, and neuroscientific and behavioral research suggests that pitch perception of individual tones is similar in humans and other mammals (Bendor & Wang, 2005; Song, Osmanski, Guo, & Wang, 2016). Tones lack the acoustic complexity of spoken syllables and the semantic properties of words, yet in musical contexts humans perceive tones in terms of rich hierarchical relations and implicit structural norms (Jackendoff & Lerdahl, 2006; Krumhansl, 2015; Patel, 2003, 2008). Children acquire these norms much as they acquire linguistic structural norms, that is, without formal instruction.

To take one example, Corrigall and Trainor (2014) showed that children who grow up hearing Western tonal music can identify out-of-key chords in novel melodies by the age of 5 years. Interestingly, the researchers also showed that at a slightly earlier age (4.5 years on average), Western children show an event-related potential (ERP) response to such chords in passive listening tasks (e.g., while watching a silent movie). In contrast, 8-month-old infants do not appear to be sensitive to musical key structure (Trainor & Trehub, 1992), which indicates that this sensitivity is not hardwired at birth. Instead, the sensitivity likely reflects implicit knowledge that develops through exposure to the native musical system. Acquiring this knowledge does not require formal musical training and may instead rely on statistical learning, a cognitive mechanism thought to span both music and spoken language. Infants, for example, have been shown to extract statistical regularities from both syllable and tone sequences (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999). Although statistical learning may help the human mind acquire implicit knowledge of the norms of harmonic structure, evidence from behavioral and neural studies suggests that adults process harmonic structure hierarchically (e.g., Koelsch, Rohrmeier, Torrecuso, & Jentschke, 2013; Lerdahl & Krumhansl, 2007; cf. Rohrmeier, 2011), and that this processing overlaps and interacts with the processing of grammatical relationships in language (for empirical evidence, see Fedorenko, Patel, Casasanto, Winawer, & Gibson, 2009; Koelsch, Gunter, et al., 2005; Kunert et al., 2015, 2016; Musso et al., 2015; Slevc et al., 2009; Van de Cavey & Hartsuiker, 2016).
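The kind of statistical regularity at issue in the Saffran studies is the transitional probability between adjacent elements, TP(B | A) = count(A followed by B) / count(A). Within a recurring "word" (of syllables or tones), transitional probabilities are high, whereas across word boundaries they drop. The following is a minimal illustrative sketch of that computation, not the procedure used in those studies. The tone names, the two-word inventory, the stream, and the fixed segmentation threshold are all hypothetical choices made for illustration; the threshold rule in particular is a simplification, since the infant studies showed sensitivity to TP differences, not the use of a fixed cutoff.

```python
from collections import Counter


def transitional_probabilities(seq):
    """Return TP(b | a) = count(a, b) / count(a as the first element of a pair)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])  # the final element never begins a pair
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(seq, tps, threshold=0.9):
    """Hypothetical rule: posit a boundary wherever TP falls below `threshold`."""
    words, current = [], [seq[0]]
    for a, b in zip(seq, seq[1:]):
        if tps[(a, b)] < threshold:
            words.append(tuple(current))
            current = []
        current.append(b)
    words.append(tuple(current))
    return words


# Hypothetical tone "words" concatenated into a continuous stream with no pauses.
W1, W2 = ("A", "D", "B"), ("C", "F", "E")
stream = [tone for word in (W1, W2, W1, W1, W2, W2, W1, W2) for tone in word]

tps = transitional_probabilities(stream)
print(tps[("A", "D")], tps[("B", "C")])  # within-word vs. across-boundary TP
print(segment(stream, tps))
```

In this toy stream, within-word transitions (e.g., A to D) have a TP of 1.0, while transitions across word boundaries (e.g., B to C, at 0.75) are lower. That drop is the cue a statistical learner could use to recover the tone "words" without any explicit marking of boundaries.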

It is thus of considerable interest to know whether other species can acquire sensitivity to harmonic structure in music. This would provide a novel way to study cognitive and neural mechanisms relevant to the evolution of linguistic syntactic processing (cf. Fitch, 2014). Prior work with nonhuman animals has examined syntactic processing using nonlinguistic sounds (e.g., Gentner, Fenn, Margoliash, & Nusbaum, 2006; ten Cate & Okanoya, 2012; cf. Ravignani, Sonnweber, Stobbe, & Fitch, 2013), but this work often relies on extensive training, which differs from the spontaneous acquisition of linguistic and musical structural knowledge observed in humans. There is thus an untapped line of cross-species research: exploring the extent to which other animals, like humans, can acquire implicit knowledge of musical harmonic structure through extended exposure to music (i.e., over several years, spanning birth to adulthood).

I would like to suggest that one species that may prove particularly useful in addressing this issue is the domestic dog (Canis familiaris). Dogs have lived with humans for thousands of years and attend to human behavior and social cues to a degree that can surpass that of chimpanzees (Kirchhofer, Zimmermann, Kaminski, & Tomasello, 2012; Rosati, Santos, & Hare, 2010). In the West, dogs are often raised in households where music is frequently heard, alongside human children who spontaneously acquire implicit knowledge of musical harmonic structure from this exposure. Despite the growing number of music CDs for dogs (which are premised on the idea that dogs experience music much as we do), we actually know very little about how dogs perceive music. The hearing range and frequency resolution of dogs seem sufficient for basic music perception (Anrep, 1920; Heffner, 1983). Furthermore, dogs appear to be significantly superior to monkeys in auditory short-term memory, which would facilitate the learning of musical patterns (Kuśmierek & Kowalska, 1998; Kuśmierek, Kowalska, & Mishkin, 1999; Scott, Mishkin, & Yin, 2012). Do dogs, like humans, develop implicit knowledge of musical harmonic structure through exposure to music over several years? (If not, our music may sound to them the way atonal music sounds to us.)

One way to address this question is to study neural responses to out-of-key chords in dogs using ERPs, as Corrigall and Trainor (2014) did with young children. Such experiments require only passive listening (with no behavioral response). Out-of-key chords elicit characteristic ERP responses in humans, and one could look for analogs of these responses in dogs. (Recent studies have shown that ERP methods can be used with awake dogs, e.g., Howell, Conduit, Toukhsati, & Bennett, 2012; Kujala et al., 2013; Törnqvist et al., 2013.) Another option for neural studies is fMRI. This technique has recently been used to study voice-sensitive cortical regions in awake, unanesthetized dogs trained to lie still in an MRI scanner (Andics, Gácsi, Faragó, Kis, & Miklósi, 2014). Using the canine auditory fMRI method pioneered by Andics et al., one could determine whether dogs, like humans, show increased activity in inferior frontal brain regions when hearing harmonically complex versus simple music, which would suggest cognitive processing of harmonic structure (cf. Patel, 2003; Tillmann et al., 2006). (Another interesting use of fMRI would be to examine activity in the mesolimbic reward pathway when dogs hear music that is frequently played in their households, to determine whether they, like their owners, derive pleasure from such music; cf. Zatorre & Salimpoor, 2013.)

Of course, behavioral studies would also be important (e.g., discrimination studies testing whether dogs respond differentially depending on whether a novel musical sequence contains one or more out-of-key chords). Demonstrating sensitivity to harmonic structure in dogs would be a first step toward investigating whether they, like humans, develop the ability to process music hierarchically through extended exposure to it. Such investigations of hierarchical processing need not focus only on pitch structure; they could also examine rhythmic processing (e.g., the perception of metrical structure; Fitch, 2013; Honing, Merchant, Háden, Prado, & Bartolo, 2012; Schachner, Brady, Pepperberg, & Hauser, 2009). Given the growing body of cognitive research with dogs, including studies of how they perceive human speech, faces, and emotional expressions (e.g., Huber, Racca, Scaf, Virányi, & Range, 2013; Müller, Schmitt, Barber, & Huber, 2015; Ratcliffe & Reby, 2014; cf. Stewart et al., 2015), and of how they respond vocally to human music (Yuan, Rosenberg, & Patel, 2016), research on canine music perception is hopefully not far off.
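Building stimuli for such discrimination or ERP studies requires deciding which chords count as out-of-key. One simple criterion, assumed here for illustration, treats a chord as in-key when every one of its pitch classes belongs to the diatonic scale of a major key. The sketch below applies that rule. The representation of chords as lists of MIDI note numbers, the restriction to major keys, and the function names are all hypothetical choices, not the stimulus design of Corrigall and Trainor (2014) or of any other study cited here. A real stimulus set would also need to handle minor keys and chromatic chords that listeners may still hear as in-key.

```python
# Semitone offsets of the major scale relative to its tonic.
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def diatonic_pitch_classes(tonic_pc):
    """Pitch classes (0 = C, ..., 11 = B) of the major key on `tonic_pc`."""
    return {(tonic_pc + step) % 12 for step in MAJOR_SCALE_STEPS}


def out_of_key_positions(chords, tonic_pc):
    """Indices of chords containing any pitch class outside the major key.

    Each chord is a sequence of MIDI note numbers (an illustrative assumption);
    octave is ignored by reducing each note to its pitch class.
    """
    key = diatonic_pitch_classes(tonic_pc)
    return [i for i, chord in enumerate(chords)
            if any(note % 12 not in key for note in chord)]


# Hypothetical progression: C, F, G, then a D-flat major chord, then C.
progression = [
    (60, 64, 67),  # C major
    (65, 69, 72),  # F major
    (67, 71, 74),  # G major
    (61, 65, 68),  # D-flat major (contains C-sharp/D-flat and G-sharp/A-flat)
    (60, 64, 67),  # C major
]

print(out_of_key_positions(progression, tonic_pc=0))  # judged against C major
```

Because the check depends on the key, the same progression can yield different answers. Judged against C major, only the D-flat chord is flagged. Judged against G major (tonic_pc=7), the F major chord is flagged as well, because F natural lies outside that key.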

Taking a step back, the larger point is that studies of language evolution would benefit from knowing whether nonhuman animals, like human children, can acquire implicit knowledge of the structural rules of a language-like communication system via extended exposure to that system (without explicit training) during development. Instrumental music provides an opportunity to study this issue because it is a rule-governed system with cognitive parallels to language (e.g., in syntactic processing) but without the complexities of lexical semantics (Patel, 2008). Dogs are an interesting species to study because they are often raised in households where music is frequently heard. Furthermore, because domestic dogs are typically raised by humans (vs. by other dogs), much of their social attention and behavior is directed toward humans, which could be an important factor in developing sensitivity to human music (cf. ten Cate, Spierings, Huber, & Honing, 2016).

In closing, I suggest that cross-species studies of music cognition, which have recently attracted growing interest (e.g., Fitch, 2015; Hoeschele et al., 2015; Patel, 2014), have much to offer the cognitive study of language evolution.