Basal Ganglia: Songbird Models
Keywords: Reinforcement Learning, Inverse Model, Auditory Feedback, Efference Copy, Hebbian Learning
Songbirds produce complex vocalizations, a behavior that depends on the ability of juveniles to imitate the song of an adult. Song learning relies on a specialized basal ganglia-thalamocortical loop. Several computational models have examined the role of this circuit in song learning, shedding light on the neurobiological mechanisms underlying sensorimotor learning.
Songbirds use learned vocalizations to communicate during courtship or aggressive behaviors. These vocalizations, called song, require fast coordination of laryngeal and respiratory muscles. Songbirds learn their song as juveniles through a long process comprising two sequential phases: the juvenile first listens to and memorizes one or more tutor songs and then uses auditory feedback to match its song to the memorized model through trial and error.
BG Functions in Songbirds: Experimental Background
The BG loop in songbirds is necessary for song learning but not for song production per se. BG neurons respond selectively to playbacks of the bird’s own song (Doupe and Solis 1997) and were initially thought to convey auditory feedback signals to be compared with a stored template of the tutor song. However, no neuronal correlate of such a comparison has been found in the BG-thalamocortical loop (Leonardo 2004). Auditory feedback-related activity has, however, been reported in an upstream cortical nucleus (HVC), where similar neuronal responses are observed during syllable production and during playback (“mirror neurons,” Prather et al. 2008). During song production, the BG-thalamocortical loop introduces motor variability that allows vocal exploration (Kao et al. 2005; Olveczky et al. 2005), as needed in a reinforcement learning (RL) framework, and can guide adaptive changes in song to minimize errors (Andalman and Fee 2009).
Reinforcement Learning in Songbirds
Most models of BG-dependent learning in songbirds are cast in an RL framework (but see Ganguli and Hahnloser 2013). In these models, a timing signal is assumed to be produced in HVC, whose output is sent to the BG to serve as a clock input during learning. Based on this clock input, the RL circuit learns to produce the correct motor gestures at each time step. In the RL framework, the agent learns to correlate random motor explorations with fluctuations in a reward signal in order to select the motor patterns that lead to the highest reward. For example, changes in the song (the motor output) that increase the expected reward should be retained, whereas song changes that lower the expected reward should be discarded. Implementing such an RL framework in the song-related neuronal circuitry requires four components: (1) an “actor” that produces the song; (2) a mechanism for exploration, implemented in a variability circuit (Fee and Goldberg 2011), a searcher (Doya and Sejnowski 1998), or an experimenter (Fiete et al. 2007); (3) a comparator circuit or “critic” (Doya and Sejnowski 1998) that computes the reward signal by evaluating the produced song with respect to the memorized template; and (4) a learning mechanism that modifies the motor output over time.
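The four components listed above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not any of the cited models: the song is reduced to one scalar motor command per clock tick, the critic's reward is taken to be the negative squared distance to the template, and the function name `learn_song`, the running-average baseline, and all parameter values are hypothetical choices made for illustration.

```python
import random

def learn_song(template, renditions=2000, noise=0.2, rate=0.01, seed=0):
    """Toy actor / explorer / critic loop; names and values are illustrative."""
    rng = random.Random(seed)
    actor = [0.0] * len(template)      # (1) actor: one motor command per clock tick
    baseline = [0.0] * len(template)   # running estimate of the expected reward
    for _ in range(renditions):
        for t, target in enumerate(template):   # t stands in for the HVC clock input
            explore = rng.gauss(0.0, noise)      # (2) exploration: random motor variability
            gesture = actor[t] + explore         # gesture actually produced
            reward = -(gesture - target) ** 2    # (3) critic: assumed match-to-template score
            surprise = reward - baseline[t]      # reward fluctuation around expectation
            baseline[t] += 0.1 * surprise        # slowly track the expected reward
            # (4) learning: keep explorations that raised reward, undo those that lowered it.
            # Dividing by noise**2 makes the step size independent of the noise amplitude.
            actor[t] += rate * surprise * explore / noise ** 2
    return actor

template = [0.5, -0.3, 0.8]            # hypothetical memorized tutor song
learned = learn_song(template)
```

After training, `learned` should lie close to `template`, up to residual jitter caused by the ongoing exploration.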
Models of Song Learning and the Involvement of the BG-Thalamocortical Loop
Fiete et al. (2007) elaborated on the Doya and Sejnowski model in a biologically realistic network of spiking neurons. While the location of the critic in their model was only speculative, the other elements were similar to those of Doya and Sejnowski (1998): plasticity took place at HVC-to-RA synapses, and LMAN was again the search element, here called the “experimenter.” Besides its biological realism, their model differs from previous models in two respects: it uses a learning algorithm based on node perturbation, in which LMAN neurons perturb the activity of RA neurons rather than their synaptic inputs from HVC (Fig. 2), and it relies on an online reward signal delivered during song. With this learning algorithm, the circuit learns a simple song in a biologically plausible number of song renditions.
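The idea of node perturbation can be sketched in a few lines of Python. This is a rate-based simplification with a single linear "RA" unit, not the spiking network of Fiete et al. (2007); the reward function, the baseline, the HVC activity patterns, and every name and parameter value below are assumptions made for illustration. The point it shows is that noise is injected into the unit's activity (one number per time step) rather than into each weight, and that each weight change is the product of the reward fluctuation, the perturbation, and the presynaptic activity.

```python
import random

def node_perturbation(hvc, target, renditions=3000, noise=0.2, rate=0.01, seed=1):
    """Learn HVC-to-RA weights for one linear RA unit (illustrative sketch)."""
    rng = random.Random(seed)
    n_inputs = len(hvc[0])
    weights = [0.0] * n_inputs          # plastic HVC-to-RA synapses
    baseline = [0.0] * len(hvc)         # expected reward at each time step
    for _ in range(renditions):
        for t, x in enumerate(hvc):
            xi = rng.gauss(0.0, noise)                          # LMAN-like perturbation of RA activity
            ra = sum(w * xj for w, xj in zip(weights, x)) + xi  # perturbed RA output
            reward = -(ra - target[t]) ** 2                     # online reward (assumed form)
            surprise = reward - baseline[t]
            baseline[t] += 0.1 * surprise
            for j in range(n_inputs):
                # Weight change = reward fluctuation x node perturbation x presynaptic activity.
                weights[j] += rate * surprise * xi * x[j] / noise ** 2
    return weights

# Hypothetical HVC activity (rows = time steps) and the RA output to be learned.
hvc = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
target = [0.5, 0.2, 0.5, 0.8]           # produced exactly by weights [0.5, -0.3, 0.8]
weights = node_perturbation(hvc, target)
```

Because only one noise source per neuron is needed, the number of random variables that must be correlated with reward grows with the number of neurons rather than the number of synapses.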
Template Comparison, Efference Copy, or Inverse Model?
In the RL models discussed above, a global reward signal evaluating auditory feedback quality is delivered by the critic. Because of delays that are inherent in the transformation of neural activity into vocal gestures and in the auditory processing of produced sounds, the reward signal is necessarily delayed with respect to the neuronal activity underlying the rewarded gesture. In other words, the reward is likely temporally imprecise. To overcome this delay problem, an eligibility trace can be used (Doya and Sejnowski 1995; Fiete et al. 2007).
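A small Python sketch shows why an eligibility trace helps with a delayed reward. The exponentially decaying form of the trace, the decay value, and the function names are assumptions for illustration; the source does not specify the trace shape used in the cited models. Without a trace, activity and a reward arriving three steps later never coincide, so no credit is assigned; with a trace, a decaying memory of the activity is still present when the reward arrives.

```python
def eligibility_trace(activity, decay=0.5):
    """Leaky running sum of past activity; decay in [0, 1) is an assumed value."""
    trace = 0.0
    history = []
    for a in activity:
        trace = decay * trace + a     # old activity fades, new activity is added
        history.append(trace)
    return history

def credit(activity, reward, decay=0.5):
    """Total credit: reward at each step multiplied by the trace at that step."""
    trace = eligibility_trace(activity, decay)
    return sum(r * e for r, e in zip(reward, trace))

activity = [0, 0, 1, 0, 0, 0, 0, 0]   # gesture-related activity at step 2
reward   = [0, 0, 0, 0, 0, 1, 0, 0]   # reward arrives 3 steps later

no_trace_credit = sum(r * a for r, a in zip(reward, activity))   # 0: no overlap in time
trace_credit = credit(activity, reward)                          # 0.5 ** 3 = 0.125
```

The trade-off is that a slowly decaying trace bridges longer delays but also credits neighboring gestures, which is one way to read the statement that the reward is temporally imprecise.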
In summary, the models we have presented address the question of how sensorimotor learning leads to a faithful copy of a previously imprinted tutor song. The BG-thalamocortical loop plays the role of the actor-critic reinforcement circuit in these models, although learning of an inverse model, in the BG or elsewhere, may also contribute to imitation. Whether these two types of learning coexist in the same neural circuits and how they may be combined to achieve song learning remain to be elucidated in future theoretical work.
- Andalman AS, Fee MS (2009) A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors. Proc Natl Acad Sci USA 106:12518–12523
- Doya K, Sejnowski T (1995) A novel reinforcement model of birdsong vocalization learning. Adv Neural Inf Process Syst 7:101–108
- Ganguli S, Hahnloser RHR (2013) Bird song learning without reinforcement: the Hebbian self-organization of sensorimotor circuits. Soc Neurosci Abstr 105.12/YY4
- Leonardo A (2004) Experimental test of the birdsong error-correction model. Proc Natl Acad Sci USA 101:16935–16940