1 Introduction

Hearing and communication present a variety of challenges to the auditory system. For a stimulus to be heard and understood, the auditory brain must transform a time-varying acoustic stimulus into a perceptual representation; that is, a sound.

Auditory perception is associated with a number of computational processes, which may act in parallel or in series, including perceptual grouping, decision-making, attention, and categorization. (1) Perceptual grouping is a form of feature-based stimulus segmentation that determines whether acoustic events will be grouped into a single sound or be segregated into distinct sounds (Bregman 1990). (2) Auditory decision-making is a computational process in which the brain interprets sensory information in order to detect, discriminate, or identify the source or content of auditory stimuli (Gold and Shadlen 2007). Did I hear the stimulus? From where and whom did it come? What does it tell me? How can I use this information to plan an action? (3) Although attention is not always necessary, our awareness of a sound can be influenced by attention (Alain and Arnott 2000; Micheyl et al. 2003; Fritz et al. 2005; Shinn-Cunningham 2008; Snyder et al. 2012; Gutschalk et al. 2015). For example, we can choose whether to listen to, or ignore, the first violin, the string section, or even the whole orchestra. Likewise, we can selectively attend to the particular features in a person’s voice that allow us to identify the speaker. (4) In auditory categorization, sounds are classified based on their acoustic features or more processed forms of information (e.g., semantic knowledge), providing an efficient means to interact with stimuli in our environment (Ashby and Berretty 1997; Gifford et al. 2014). For example, when we hear the word “Groningen” from different speakers, we can categorize the gender of each speaker based on the pitch of the speaker’s voice. On the other hand, to analyze the linguistic content transmitted by a speech sound, we can ignore the unique pitch, timbre, etc. of each speaker and categorize the sound into the distinct word category “Groningen”.

It is thought that the neural computations and processes that mediate auditory perception are found in the ventral auditory pathway (Rauschecker and Scott 2009; Romanski and Averbeck 2009; Hackett 2011; Bizley and Cohen 2013). In rhesus monkeys, this pathway begins in core auditory cortex; specifically, in primary auditory cortex (A1) and the rostral field. These core areas project to the middle-lateral (ML) and anterolateral (AL) belt regions of auditory cortex. In turn, these belt regions project directly and indirectly to the ventrolateral prefrontal cortex (vlPFC).

It is important to comment briefly on the contribution of the dorsal (“spatial”) pathway to auditory perception (Rauschecker 2012; Cloutman 2013). Spatial information can act as a grouping cue to assist the segregation of an acoustic stimulus into discrete sounds. For example, when a rhythmic sequence of identical sound bursts is presented from a single location, it is often perceived as one source. However, when the sound sequence is presented from two different locations, it can be perceived as two sounds (Middlebrooks and Onsan 2012; Middlebrooks and Bremen 2013). Such findings suggest that a mixture of spatial and non-spatial auditory cues from both the dorsal and ventral pathways may be needed in order to create a coherent auditory-perceptual representation that guides behavior.

Nonetheless, in this review, we focus on the hierarchical processing that occurs at different stages in the ventral auditory pathway. In particular, we identify (or, at least, suggest) the unique contributions of these different processing stages to auditory perception and categorization, with an acknowledgment that associating any single brain region with a particular computation oversimplifies the complexity of the auditory brain. Indeed, it is well known that neurons become increasingly sensitive to more complex stimuli across the early stages of the ventral auditory pathway (e.g., between the core and belt regions of the auditory cortex). For example, core neurons are more sharply tuned for tone bursts than neurons in the lateral belt, whereas lateral-belt neurons are more selective for particular spectrotemporal features of complex sounds, such as vocalizations (Rauschecker and Tian 2000). Here, though, we review hierarchical processing in the ventral pathway by focusing on those studies in which neural activity was recorded while a listener was engaged in an auditory task.

2 Neural Correlates of Auditory Perception Along the Ventral Auditory Pathway

A1’s role in auditory perception is controversial. Part of that controversy stems from the putative role of A1 in processing auditory “objects” (Nelken 2008). We will take the position that auditory objects are analogous to perceptual representations (i.e., sounds) (Bizley and Cohen 2013). As a consequence of this definition, if a neuron encodes an auditory object, it should be modulated by a listener’s perceptual reports. That is, by holding a stimulus constant and testing whether neural activity is modulated by a listener’s reports, neural activity that is associated with the acoustic features of the stimulus can be dissociated from neural activity associated with the perceptual report. Thus, neurons with complex tuning properties, or even those modulated by components of a task (Brosch et al. 2005; Fritz et al. 2005), may contribute to the construction of a perceptual representation, but by themselves they do not offer direct evidence of a perceptual representation.

A recent body of literature implicates A1 in auditory perceptual decision-making (Riecke et al. 2009; Kilian-Hutten et al. 2011; Niwa et al. 2012; Riecke et al. 2012; Bizley et al. 2013). In one study, ferrets were asked to report changes in a sound’s pitch, and it was found that both local-field potentials and spiking activity in A1 were modulated by the ferrets’ pitch judgments. Niwa and colleagues have also shown that A1 single-unit activity is modulated by monkeys’ reports during a task in which the monkeys reported whether or not a sound was amplitude modulated. Finally, human-imaging studies have revealed that regions of core auditory cortex are modulated by listeners’ reports of the identity of an ambiguous speech sound. However, a different body of work suggests that A1 does not encode auditory decisions. For example, when monkeys discriminate between two types of acoustic flutter, A1 activity is not modulated by the monkeys’ choices (Lemus et al. 2009b).

What could be the bases for these apparently divergent sets of findings? We posit that these differences can be attributed to several non-exclusive possibilities. One possibility concerns the relative perceptual and cognitive demands of the behavioral task: tasks with different demands might differentially engage neurons in core auditory cortex (Bizley and Cohen 2013; Nienborg and Cumming 2014). A second possibility concerns how choice-related activity itself is analyzed. In choice analyses, it is imperative to restrict the analysis to those trials in which neural modulation related to choice can be clearly dissociated from the auditory stimulus. If this analysis is not carefully conducted, apparent choice activity may be conflated with stimulus-related activity. Finally, choice-related activity may not reflect a causal contribution of the auditory cortex to decision-making but may simply reflect feedback from higher choice-sensitive areas (Nienborg and Cumming 2009) or the structure of the correlated noise (Nienborg et al. 2012).
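One common way to separate choice from stimulus in such analyses is to compute a choice probability separately for each stimulus condition. The sketch below is only an illustration under stated assumptions, not the analysis used in any of the cited studies: the function names, the trial format (stimulus identifier, choice label "A" or "B", spike count), and the use of the area under the ROC curve as the metric are all hypothetical choices made here.

```python
from itertools import product


def choice_probability(counts_choice_a, counts_choice_b):
    """Area under the ROC curve for spike counts grouped by choice.

    Both lists must come from trials with the SAME stimulus, so that any
    difference between them cannot be explained by the stimulus itself.
    Returns 0.5 when the two distributions are indistinguishable.
    """
    if not counts_choice_a or not counts_choice_b:
        raise ValueError("need at least one trial for each choice")
    wins = 0.0
    # Probability that a random choice-A trial has a higher count than a
    # random choice-B trial, with ties counted as one half.
    for a, b in product(counts_choice_a, counts_choice_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(counts_choice_a) * len(counts_choice_b))


def choice_probability_by_stimulus(trials):
    """Compute one choice probability per stimulus condition.

    `trials` is an iterable of (stimulus_id, choice, spike_count) tuples
    (a hypothetical format), where choice is "A" or "B". Stimuli for
    which the listener made only one of the two choices are skipped,
    because choice and stimulus cannot be dissociated there.
    """
    grouped = {}
    for stimulus_id, choice, spike_count in trials:
        by_choice = grouped.setdefault(stimulus_id, {"A": [], "B": []})
        by_choice[choice].append(spike_count)
    return {
        stimulus_id: choice_probability(g["A"], g["B"])
        for stimulus_id, g in grouped.items()
        if g["A"] and g["B"]
    }
```

Grouping by stimulus before comparing choices is the point of the sketch: pooling trials across stimuli would let stimulus-driven differences in firing rate masquerade as choice-related activity.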

In the belt regions (ML and AL) of the ventral pathway, several lines of study from our laboratory suggest that neural activity is not modulated by listeners’ choices (but see Niwa et al. 2013). While monkeys categorized speech sounds, we tested whether neural activity was modulated by the monkeys’ categorical judgments. We found that AL neurons were not modulated by the monkeys’ reports (Tsunada et al. 2011, 2012). In a separate line of studies, we asked monkeys to listen to a sequence of tone bursts and report whether the sequence had a “low” or “high” pitch. We found that neither ML nor AL activity was modulated by the monkeys’ choices (Tsunada et al., in press).

Do neurons in the different belt regions contribute differentially to auditory perception? The preliminary findings from our low-high pitch study suggest that AL neurons, but not ML neurons, might represent the sensory information (evidence) used to inform the monkeys’ perceptual decisions. Specifically, AL neurometric sensitivity appeared to be correlated with both psychometric sensitivity and the monkeys’ choices. Consistent with these findings, AL may play a causal role in these auditory judgments: microstimulation of an AL site tends to shift the monkeys’ reports toward the pitch associated with the site’s frequency tuning.
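The comparison between neurometric and psychometric sensitivity can be illustrated with a deliberately simplified sketch. It is not the method of the low-high pitch study: the fixed spike-count criterion, the function names, and the data layout (a mapping from stimulus level to per-trial spike counts or to per-trial "low"/"high" reports) are all assumptions made here for illustration.

```python
from math import sqrt


def neurometric_curve(counts_by_level, criterion):
    """Fraction of trials per stimulus level whose spike count exceeds
    `criterion`, i.e. how often a hypothetical observer reading only
    this neuron would report "high"."""
    return {
        level: sum(count > criterion for count in counts) / len(counts)
        for level, counts in counts_by_level.items()
    }


def psychometric_curve(choices_by_level):
    """Fraction of trials per stimulus level on which the listener
    reported "high"."""
    return {
        level: sum(choice == "high" for choice in choices) / len(choices)
        for level, choices in choices_by_level.items()
    }


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def neuro_psycho_correlation(counts_by_level, choices_by_level, criterion):
    """Correlate the two curves across the stimulus levels they share."""
    neuro = neurometric_curve(counts_by_level, criterion)
    psycho = psychometric_curve(choices_by_level)
    levels = sorted(set(neuro) & set(psycho))
    return pearson_r([neuro[k] for k in levels], [psycho[k] for k in levels])
```

A high correlation in such a sketch would indicate only that the neuron's stimulus sensitivity tracks the listener's; on its own it says nothing about whether the neuron's activity varies with choice when the stimulus is held constant.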

Although a single cortical locus of decision-making has yet to materialize, decision-related activity is seen throughout the frontal lobe. vlPFC neurons are strongly modulated by monkeys’ choices (Russ et al. 2008; Lee et al. 2009). Neural activity in the inferior frontal lobe of the human cortex is also modulated by choice when listeners judge the content of ambiguous speech sounds (Binder et al. 2004). Neural correlates of listeners’ decisions on auditory-flutter stimuli have also been observed in the ventral premotor cortex (Lemus et al. 2009a). Interestingly, as noted above, the dorsal pathway also contributes to auditory perception; consistent with that notion, activity in the human parietal lobe is modulated by listeners’ choices (Cusack 2005).

In summary, we propose a model in which auditory information is hierarchically organized and processed in the ventral pathway. In early parts of the auditory cortex, neural activity encodes the acoustic features of an auditory stimulus and becomes increasingly sensitive to complex spectrotemporal properties (Rauschecker and Tian 2000). In later regions of the auditory cortex, this information informs perceptual judgments. However, neural activity that reflects a listener’s perceptual judgments does not become apparent until the frontal lobe.

3 Category Representation in the Ventral Auditory Pathway

Next, we review the manner in which auditory-category information is hierarchically organized in the ventral auditory pathway. In brief, we will highlight how feature-based categories are represented early in the pathway, whereas later parts of the pathway represent “abstract” categories, which combine acoustic information with mnemonic, emotional, and other information sources.

In core auditory cortex, neural activity codes the category membership of simple feature conjunctions. For example, categorical representations of frequency contours have been identified (Ohl et al. 2001; Selezneva et al. 2006). Neurons in these studies encode the direction of a frequency contour (increasing or decreasing), independent of its specific frequency content. These categories may be found in the firing rates of individual neurons or may be observed as a result of population-level computations.

Categories for more complex stimuli, such as speech sounds and vocalizations, can be found in the lateral belt (Chang et al. 2010; Steinschneider et al. 2011; Tsunada et al. 2011; Steinschneider 2013). For example, AL neurons respond categorically, and in a manner consistent with listeners’ behavioral reports, to morphed versions of two speech sounds (“bad” and “dad”). AL neurons also respond categorically to species-specific vocalizations; however, the degree to which AL (and listeners) can categorize these vocalizations is constrained by the vocalizations’ acoustic variability (Christison-Lagay et al. 2014). In humans, the superior temporal gyrus is categorically and hierarchically organized by speech sounds (Binder et al. 2000; Chang et al. 2010; Leaver and Rauschecker 2010): phoneme categories are found in the middle aspect; word categories in the anterior-superior aspect; and phrases in the most anterior aspect (DeWitt and Rauschecker 2012; Rauschecker 2012).

Beyond the auditory cortex, neurons represent categories that are formed based on the abstract information that is transmitted by sounds. vlPFC neurons represent the valence of food-related calls (e.g., high quality food vs. low quality food) (Gifford et al. 2005). That is, vlPFC neurons encode the “referential” information that is transmitted by vocalizations, independent of differences in their acoustic properties. Prefrontal activity also contributes to the formation of categories that reflect the emotional valence of a speaker’s voice (Fecteau et al. 2005) as well as the semantic information transmitted by multisensory stimuli (Joassin et al. 2011; Werner and Noppeney 2011; Hu et al. 2012). Together, these studies are consistent with the idea that the ventral auditory pathway is an information-processing pathway in which increasingly complex stimuli and categories are processed in a hierarchically organized manner.

4 Future Questions

Of course, several fundamental questions remain. First, as alluded to above, how feedforward versus feedback information contributes to neural correlates of perceptual judgments remains an open question. Second, the degree to which different types of auditory judgments differentially engage different components of the ventral pathway has yet to be fully articulated. A third question is how the different computational processes (e.g., perceptual grouping, attention, decision-making, and categorization) that underlie auditory perception interact with one another. For example, it remains an open issue whether and how attention differentially modulates neural correlates of auditory perception at different hierarchical levels of the ventral auditory pathway (e.g., A1 versus AL) (Atiani et al. 2014). Another example is the potential interaction between auditory perceptual grouping and decision-making. Finally, it is important to identify the manner by which the dorsal and ventral auditory pathways interact in order to form a consistent and coherent representation of the auditory scene.