Neural systems for vocal learning in birds and humans: a synopsis
- Cite this article as:
- Jarvis, E.D. J Ornithol (2007) 148: 35. doi:10.1007/s10336-007-0243-0
I present here a synopsis of a hypothesis that I derived on the similarities and differences of vocal learning systems in vocal learning birds for learned song and in humans for spoken language. This hypothesis states that vocal learning birds (songbirds, parrots, and hummingbirds) and humans have comparable specialized forebrain regions that are not found in their close vocal non-learning relatives. In vocal learning birds, these forebrain regions appear to be divided into two sub-pathways, a vocal motor pathway mainly used to produce learned vocalizations and a pallial–basal–ganglia–thalamic loop mainly used to learn and modify the vocalizations. I propose that humans have analogous forebrain pathways within and adjacent to the motor and pre-motor cortices, respectively, used to produce and learn speech. Recent advances have supported the existence of the seven cerebral vocal nuclei in the vocal learning birds and the proposed brain regions in humans. The results in birds suggest that the reason why the forebrain regions are similar across distantly related vocal learners is that the vocal pathways may have evolved out of a pre-existing motor pathway that predates the ancient split from the common ancestor of birds and mammals. Although this hypothesis will require the development of novel technologies to be fully tested, the existing evidence suggests that there are strong genetic constraints on how vocal learning neural systems can evolve.
Keywords: Singing, Speaking, Evolution, Song nuclei, Auditory pathway
Vocal learning birds, songbirds in particular, have been extensively used as a model system to study neural mechanisms of vocal learning as it relates to speech acquisition in humans (Jarvis 2004a, b). This neurobiology sub-field began in the 1970s with the first discovery of a non-human vocal learning system, that of canaries (Serinus canaria) (Nottebohm et al. 1976). Since then, nearly a thousand papers have been published on vocal learning systems in birds (Pubmed and Scirus searches; keywords, song–system–brain–avian). However, little attempt was made to link neural systems for vocal learning in birds with that for spoken language in humans (Doupe and Kuhl 1999). Making such links was hampered by several factors, including uncertainty on the telencephalic homologies between birds and mammals, lack of broadly agreed-upon definitions for song, speech, and language and what makes language special, and lack of sufficient data and synthesis on the neural pathways for vocal learning across bird orders and for speech learning in humans.
Some of these limitations have been overcome in recent years. First, a revision of the nomenclature and understanding of the avian brain has resulted in a consensus view that birds and mammals have homologous pallidal, striatal, and pallial subdivisions in their cerebrums, of which the latter two contain the vocal learning regions (Reiner et al. 2004; Jarvis et al. 2005). However, the pallial subdivision in mammals, the cortex, is layered in its cellular organization whereas in birds it is nuclear, which makes comparisons difficult at the level of one-on-one homologies or analogies. Second, a greater understanding of birdsong behavior has allowed for more informative comparisons with human speech (Doupe and Kuhl 1999; Hauser et al. 2002; Okanoya 2007), although many open questions still remain. For the sake of brain comparisons, I simply define ‘song’ in the vocal learning birds and ‘speech’ in humans as analogous behaviors, and ‘spoken language’ in humans as synonymous with speech. Third, gene expression mapping studies have led to important discoveries on the vocal neural systems across vocal learning bird orders (Jarvis et al. 2000) and brain imaging studies in humans have allowed a more accurate identification of brain areas for spoken language (Gracco et al. 2005). Based upon these advances, I derived a hypothesis on the similarities and differences of brain pathways for song in vocal learning birds and spoken language in humans. Here, I present a synopsis of that hypothesis, some of the evidence for it, and some new findings since it was first reported in 2004 (Jarvis 2004a, b).
Vocal learning is the ability to modify the acoustic and/or syntactic structure of sounds produced, including imitation and improvisation. It is distinct from auditory learning, which is the ability to make associations with sounds heard, though vocal learning depends upon auditory learning (Konishi 1965). Vocal learning is one of the most critical behavioral substrates for spoken human language; with it, humans have the ability to imitate speech sounds heard individually and sequentially, and modify them through auditory feedback. Vocal learning, however, is not synonymous with spoken language, in that spoken language includes many other features such as grammar and recursion (Hauser et al. 2002). That is, different vocal learning species imitate and modify sounds to various degrees, with humans being the most prolific. Despite these differences, most, if not all, vertebrates are capable of auditory learning, but few are capable of vocal learning. The latter has been found in three distantly related groups of mammals (humans, bats, and cetaceans) and three distantly related groups of birds (parrots, hummingbirds, and songbirds) (Nottebohm 1972; Janik and Slater 1997). Recent studies have also discovered evidence for vocal learning in seals (Sanvito et al. 2007) and elephants (Poole et al. 2005). However, it is only in humans and the three vocal learning bird groups that the brain pathways for learned vocalization have been studied.
Vocal learning brain pathways in birds and humans
Full names of the abbreviated brain regions:
Lateral magnocellular nucleus of anterior nidopallium
Central nucleus of the anterior arcopallium
Central nucleus of the anterior arcopallium, dorsal part
Magnocellular nucleus of anterior nidopallium
Central nucleus of the anterior arcopallium, ventral part
Mesencephalic lateral dorsal nucleus
Medial magnocellular nucleus of anterior nidopallium
Caudal medial arcopallium
Magnocellular nucleus of the anterior striatum
Anterior cingulate cortex
Oval nucleus of the mesopallium
Anterior insula cortex
Oval nucleus of the anterior nidopallium
Caudal medial nidopallium
Anterior supplementary motor area
Caudal dorsal nidopallium
Intermediate dorsal lateral nidopallium
Area X of the striatum
Interfacial nucleus of the nidopallium
Central nucleus of the lateral nidopallium
Tracheosyringeal subdivision of the 12th nucleus
Medial nucleus of dorsolateral thalamus
Dorsal medial nucleus of the midbrain
Magnocellular nucleus of the dorsomedial thalamus
Robust nucleus of the arcopallium
Dorsal lateral prefrontal cortex
Vocal nucleus of the arcopallium
Face motor cortex
Vocal nucleus of the anterior mesopallium
HVC (a letter-based name)
Vocal nucleus of the anterior nidopallium
Vocal nucleus of the anterior striatum
Lateral nucleus of the anterior nidopallium
Vocal nucleus of the lateral nidopallium
Lateral nucleus of the anterior mesopallium
Vocal nucleus of the medial mesopallium
Vocal nucleus of the medial nidopallium
The major differences among vocal learning birds are in the connections between the posterior and anterior vocal pathways (Jarvis and Mello 2000). In songbirds, the posterior pathway sends input to the anterior pathway via HVC to Area X; the anterior pathway sends output to the posterior pathway via lateral MAN (LMAN) to RA and medial MAN (MMAN) to HVC (Fig. 1c; Foster and Bottjer 2001). In contrast, in parrots, the posterior pathway sends input into the anterior pathway via ventral AAC (AACv, parallel of songbird RA) to NAO (parallel of songbird MAN) and MO; the anterior pathway sends output to the posterior pathway via NAO to NLC (parallel of songbird HVC) and to AAC (Fig. 1a; Durand et al. 1997).
In humans, imaging and lesion studies have revealed cortical, striatal, and thalamic regions that are active and necessary for learning and production of language (reviewed in Jarvis 2004a, b; and see below). However, ethical and practical issues prevent connectivity tract-tracing experiments on humans. Some post-mortem neuro-degeneration studies have been conducted in humans and many tract-tracing studies have been performed on adjacent non-vocal pathways in vocal non-learning mammals. Based upon these comparisons, it appears that the avian posterior vocal pathways are similar to mammalian motor cortico-brainstem pathways, where, in humans, I propose an analogous posterior vocal pathway consists of the face motor cortex that projects to nucleus ambiguus (Am) of the medulla (Fig. 1d; Kuypers 1958a); Am, the parallel of avian nXIIts, projects to the muscles of the larynx, the main mammalian vocal organ (Zhang et al. 1995; Jurgens 1998). Non-human primates, like chickens, do not have such a projection (Kuypers 1958a, b). See Jarvis (2004b) for a detailed description of analogous cell types.
The avian anterior vocal pathways are similar in connectivity to mammalian cortical–basal ganglia–thalamic–cortical loops (Bottjer and Johnson 1997; Durand et al. 1997; Jarvis et al. 1998; Perkel and Farries 2000). In this regard, I proposed that a strip of adjacent premotor cortex in humans that is required for speech learning and syntax production makes up the cortical part of a speech loop. This cortical strip extends from the anterior insula (aINS), through Broca’s area, the anterior dorsal lateral prefrontal cortex (aDLPFC), and the anterior pre-supplementary motor area (aSMA), to the anterior cingulate (aCC; Fig. 1d). This strip, I argue, is analogous to the avian pallial anterior vocal nuclei (i.e., parrot MO and NAO). As in non-human primates and in vocal learning birds, I proposed that this cortical strip projects to the anterior-most region of the striatum (aSt), the anterior striatum to the globus pallidus (GP), the pallidus to the anterior dorsal thalamus (aT), and the dorsal thalamus back up to the cortical strip (Fig. 1d), all regions required for speech learning and syntax (described below).
Because connections between the posterior and anterior vocal pathways differ between songbirds and parrots, comparisons between them and mammals will also differ. In mammals, layer 5 neurons of motor cortex have axon collaterals, one of which projects into the striatum while another projects to the medulla and spinal cord (Alexander and Crutcher 1990; Reiner et al. 2003). This pattern is different from the songbird, where a specific cell type of HVC, called X-projecting neurons, projects to Area X in the striatum separately from neurons of RA of the arcopallium that project to the medulla (Fig. 1c). This pattern is also different from the parrot, where AAC of the arcopallium has two anatomically separate neuron populations, AACd that projects to the medulla and AACv that projects to anterior pallial vocal nuclei NAO and MO (Fig. 1a; Durand et al. 1997). The output of mammalian anterior pathways is proposed to be the collaterals of layer 3 and upper layer 5 neurons that project to other cortical regions and the striatum (Reiner et al. 2003; Jarvis 2004b).
Functions of vocal brain areas in birds and humans
There are some gross similarities in behavioral deficits following lesions in specific brain areas of vocal learning birds (experimentally placed) and of humans (due to stroke or trauma). Lesions to songbird posterior nuclei HVC and RA (Nottebohm et al. 1976; Simpson and Vicario 1990), on the left side in canaries, cause deficits similar to those found after damage to left human face motor cortex, this being muteness for learned vocalizations, i.e., for speech (Valenstein 1975; Jurgens et al. 1982; Jurgens 1995). Lesions to parrot NLC even cause deficits in producing the correct acoustic structure of learned human speech in parrots (Lavenex 2000). Lesions to the face motor cortex in chimpanzees and other non-human primates do not affect their ability to produce vocalizations (Kuypers 1958b; Jurgens et al. 1982; Kirzinger and Jurgens 1982). Lesions to avian nXIIts and DM and mammalian Am and PAG result in muteness in both vocal learners and non-learners (Brown 1965; Nottebohm et al. 1976; Seller 1981; Jurgens 1994, 1998; Esposito et al. 1999).
Lesions to songbird MAN cause deficits that are most similar to those found after damage to anterior parts of the human premotor cortex, this being disruption of imitation and/or induction of sequencing problems. In birds and humans, such lesions do not prevent the ability to produce learned song or speech. In humans, these deficits are called verbal aphasias and verbal amusias (Benson and Ardila 1996). Damage to the left side often leads to verbal aphasias, whereas damage to the right can lead to verbal amusias (Berman 1981). The deficits in humans, however, are more complex. Specifically, lesions to songbird LMAN (Bottjer et al. 1984; Nottebohm et al. 1990; Scharff and Nottebohm 1991; Kao et al. 2005) and to the human insula and Broca’s (Mohr 1976; Benson and Ardila 1996; Dronkers 1996) lead to poor imitation with sparing or even inducing more stereotyped song or speech. In addition, lesions to Broca’s and/or DLPFC (Benson and Ardila 1996) lead to poor syntax production in construction of phonemes into words and words into sentences. Lesions to DLPFC also result in uncontrolled echolalia imitation, whereas lesions to aSMA and anterior cingulate result in spontaneous speech arrest, lack of spontaneous speech, and/or loss of emotional tone in speech, but with imitation preserved (Nielsen and Jacobs 1951; Barris et al. 1953; Rubens 1975; Valenstein 1975; Jonas 1981). Lesions to songbird MMAN lead to a decreased ability in vocal learning and some disruption of syntax (Foster and Bottjer 2001).
Lesions to songbird Area X and to the human anterior striatum do not prevent the ability to produce already learned speech, but do result in disruption of vocal learning and disruption of some syntax in birds (Sohrabji et al. 1990; Scharff and Nottebohm 1991; Kobayashi et al. 2001) or verbal aphasias and amusias in humans (Mohr 1976; Bechtereva et al. 1979; Leicester 1980; Damasio et al. 1982; Alexander et al. 1987; Cummings 1993; Speedie et al. 1993; Lieberman 2000). Humans can have a combination of symptoms (Mohr 1976) perhaps because, as in non-human mammals, large cortical areas send projections that converge onto relatively smaller striatal areas (Beiser et al. 1997). Not many cases have been reported of lesions to the human globus pallidus leading to aphasias (Strub 1989), but the fact that this can occur suggests some link with a striatal vocal area in humans. In vocal learning birds, the pallidal neurons appear to be within the striatal vocal nucleus (Durand et al. 1997; Farries and Perkel 2002).
Similar to a preliminary report on songbird DLM (Halsema and Bottjer 1991), damage to anterior portions of the human thalamus leads to verbal aphasias (Graff-Radford et al. 1985). In humans, thalamic lesions can lead to temporary muteness followed by aphasia deficits that are sometimes greater than after lesions to the anterior striatum or premotor cortex. This greater deficit may occur because there is further convergence of inputs from the striatum to the globus pallidus and then from the globus pallidus to the thalamus (Beiser et al. 1997).
Results of lesion studies overlap with brain activation studies. In vocal learning birds, all seven comparable cerebral vocal nuclei display vocalizing-driven expression of egr-1, an immediate early gene (Jarvis and Nottebohm 1997; Jarvis et al. 1998, 2000; Jarvis and Mello 2000); expression of immediate early genes is responsive to changes in neural activity. Likewise, premotor neural firing has been found in several posterior and anterior vocal nuclei when a bird sings (McCasland 1987; Yu and Margoliash 1996; Hessler and Doupe 1999; Hahnloser et al. 2002). The firing in songbird HVC and RA correlates with sequencing of syllables and syllable structure, respectively, whereas firing in Area X and LMAN is much more varied and, in LMAN, it correlates with song variability. Stimulation of HVC with electrical pulses during singing temporarily disrupts song output, i.e., causes song arrest (Vu et al. 1998).
In humans, the face motor cortex is always activated with speech tasks (Petersen et al. 1988; Rosen et al. 2000; Gracco et al. 2005). For the proposed language strip, production of verbs and complex sentences can be accompanied by activation in all or a subregion of this strip (Fig. 1d) (Petersen et al. 1988; Poeppel 1996; Price et al. 1996; Crosson et al. 1999; Wise et al. 1999; Papathanassiou et al. 2000; Rosen et al. 2000; Palmer et al. 2001; Gracco et al. 2005). Activation in Broca’s, DLPFC, and aSMA is higher when speech tasks are more complex, including learning to vocalize new words or sentences, sequencing words into complex syntax, producing non-stereotyped sentences, and thinking about speaking (Hinke et al. 1993; Poeppel 1996; Buckner et al. 1999; Bookheimer et al. 2000). As in the vocal nuclei of birds, premotor speech-related neural activity has been found in Broca’s area (Fried et al. 1981). Further, low threshold electrical stimulation to the face motor cortex, Broca’s, or the aSMA causes speech arrest or generation of phonemes or words (Jonas 1981; Fried et al. 1991; Ojemann 1991, 2003).
In non-cortical areas, speech production is accompanied by activation of the anterior striatum and the thalamus (Wallesch et al. 1985; Klein et al. 1994; Wildgruber et al. 2001; Gracco et al. 2005). Low threshold electrical stimulation to ventral lateral and anterior thalamic nuclei, particularly in the left hemisphere, leads to word repetition, speech arrest, speech acceleration, spontaneous speech, anomia, or verbal aphasia (Johnson and Ojemann 2000). The globus pallidus can also show activation during speaking (Wise et al. 1999). In non-human mammals and in birds, PAG and DM, and Am and nXIIts display premotor vocalizing neural firing (Larson 1991; Larson et al. 1994; Zhang et al. 1995; Dusterhoft et al. 2004) and/or vocalizing-driven gene expression (Jarvis et al. 1998, 2000; Jarvis and Mello 2000).
Taken together, the lesion and brain activation findings are consistent with the idea that songbird HVC and RA are more similar in their functional properties to face motor cortex than to any other human brain area, and that songbird MAN, Area X, and the anterior part of the dorsal thalamus are more similar in their properties to parts of the human premotor cortex, anterior striatum, and ventral lateral/anterior thalamus, respectively. The findings are consistent with the presence in humans of a posterior-like vocal motor pathway and an anterior-like vocal premotor pathway that are similar to the production and learning pathways of vocal learning birds. A difference between birds and humans appears to be the greater complexity of the deficits found after lesions in humans.
The auditory system
The source of auditory input into the vocal pathways of vocal learning birds is unclear. In songbirds, proposed routes include the HVC shelf into HVC, the RA cup into RA, Ov or CM into NIf, and from NIf dendrites in L2 (Wild 1994; Fortune and Margoliash 1995; Vates et al. 1996; Mello et al. 1998). However, the location of the vocal nuclei relative to the auditory regions differs among vocal learning groups. In songbirds, the posterior vocal nuclei are embedded in the auditory regions; in hummingbirds, they are situated more laterally, but still adjacent to the auditory regions; in parrots, they are situated far laterally and physically separate from the auditory regions (Fig. 1a–c). At a minimum, the auditory input must take different routes to enter the posterior vocal nuclei of each group.
In humans, primary auditory cortex information is passed to secondary auditory areas, which include Wernicke’s area (Fig. 1d). Damage to this area leads to auditory aphasias, sometimes called fluent aphasia. A patient can speak well, but produces highly verbal nonsense speech. One reason for this symptom is that the vocal pathways may no longer receive feedback from the auditory system. Bilateral damage to primary auditory cortex and Wernicke’s area also leads to full auditory agnosia, the inability to consciously recognize any sounds (speech, musical instruments, natural noises, etc.) (Benson and Ardila 1996). Information from Wernicke’s area has been proposed to be passed to Broca’s area through arcuate fibers in a caudal-rostral direction (Geschwind 1979), but for many years such a pathway had not been proven. Recently, this hypothesis was tested in experiments with stimulation electrodes in patients undergoing surgery, which revealed a functional bi-directional axon pathway between Wernicke’s and Broca’s areas (Matsumoto et al. 2004).
No one has tested whether lesions to avian secondary auditory areas result in fluent song aphasias. Yet, lesions to songbird NCM and CM result in a significant decline in the ability to form auditory memories of songs heard (MacDougall-Shackleton et al. 1998; Gobes and Bolhuis 2007). It is difficult to ascertain how non-human animals, including birds, perceive sensory stimuli, and therefore it is difficult to make comparisons with humans in regard to perceptual auditory deficits.
Evolution of vocal learning systems from a common motor pathway
Given that the auditory pathways in avian, mammalian, and reptilian species are similar, whether or not a given species is a vocal learner, this suggests that the auditory pathway in vocal learning birds and in humans was inherited from their common stem-amniote ancestor, thought to have lived ∼320 million years ago (Evans 2000). Having a cerebral auditory pathway would explain why non-human mammals, including dogs, exhibit auditory learning, including learning to understand the meaning of human speech, although with less facility than a human. For vocal learning pathways, because the connections of the anterior and posterior vocal pathways in vocal learning birds bear some resemblance to those of non-vocal pathways in both birds and mammals, pre-existing connectivity could have been a genetic constraint for the evolution of vocal learning (Durand et al. 1997; Farries 2001; Lieberman 2002; Jarvis 2004a, b). In terms of function, recent results suggest that vocal nuclei of vocal learning birds are embedded within at least seven brain areas active during the production of limb and body movements (Feenders, Liedvogel, Rivas, Zapka, Horita, Tremere, Hara, Wada, Mouritsen, and Jarvis, submitted). The same movement-associated brain areas are also found in vocal non-learning birds, such as Ring Doves (Streptopelia risoria). Like the vocal nuclei, their activation is independent of auditory input and correlates with the amount of movement performed. These findings led to a motor theory for the origin of vocal learning, whereby in the avian brain a pre-existing motor system in a vocal non-learner ancestor is proposed to consist of seven brain regions distributed across mesopallial, nidopallial, arcopallial, and striatal brain subdivisions, and separated into two pathways: an anterior pre-motor pathway that forms a pallial–basal–ganglia–thalamic–pallial loop and a posterior motor pathway that sends descending projections to brainstem and spinal cord pre-motor neurons.
Then, a mutational event or events might have caused descending projections of avian arcopallium neurons that normally synapse onto non-vocal pre-motor neurons to instead synapse onto vocal nXIIts motor neurons in vocal learners. Thereafter, cerebral vocal brain regions could have developed out of adjacent motor brain regions using the pre-existing connectivity. Such a mutational event would be expected to occur in genes that regulate synaptic connectivity of pallial motor neurons to α-motor neurons. This theory can also be applied to the proposed human posterior and anterior vocal pathways used for spoken language, as these regions are either embedded within or adjacent to motor and pre-motor pathways. Various parts of this hypothesis can be verified or falsified with connectivity, lesion, and brain activation experiments on adjacent brain areas in vocal non-learning birds, with studies of brain areas for vocal learning in other mammalian vocal learners, and with gene manipulation experiments on genes that control pallial to brainstem neural connectivity in birds and mammals.
I thank Dr. Miriam Rivas for performing the in situ hybridizations of the parrot brain sections.