Neural representations of faces and limbs neighbor in human high-level visual cortex: evidence for a new organization principle
- Cite this article as:
- Weiner, K.S. & Grill-Spector, K. Psychological Research (2013) 77: 74. doi:10.1007/s00426-011-0392-x
Neurophysiology and optical imaging studies in monkeys and functional magnetic resonance imaging (fMRI) studies in both monkeys and humans have localized clustered neural responses in inferotemporal cortex selective for images of biologically relevant categories, such as faces and limbs. Using higher resolution (1.5 mm voxels) fMRI scanning methods than past studies (3–5 mm voxels), we recently reported a network of multiple face- and limb-selective regions that neighbor one another in human ventral temporal cortex (Weiner and Grill-Spector, Neuroimage, 52(4):1559–1573, 2010) and lateral occipitotemporal cortex (Weiner and Grill-Spector, Neuroimage, 56(4):2183–2199, 2011). Here, we expand on three basic organization principles of high-level visual cortex revealed by these findings: (1) consistency in the anatomical location of functional regions, (2) preserved spatial relationship among functional regions, and (3) a topographic organization of face- and limb-selective regions in adjacent and alternating clusters. We highlight the implications of this structure in comparing functional brain organization between typical and atypical populations. We conclude with a new model of high-level visual cortex consisting of ventral, lateral, and dorsal components, where multimodal processing related to vision, action, haptics, and language converges in the lateral pathway.
A brief history regarding the organization of hand- and face-selective neurons in monkeys: scattered, then clustered, then columned
Scattered: hand- and face-selective neurons discovered and then neglected for more than a decade
These findings of hand- and face-selective units were not received with much fervor because the definition of macaque IT cortex as a visual area was controversial in and of itself (Gross, 2008 for review). In fact, even though Gross and colleagues were the first to systematically measure the visual properties of IT cortex (Gross et al., 1969, 1972, 1967), it was more than a decade before any group replicated the receptive field properties of IT neurons (Richmond, Wurtz, & Sato, 1983), let alone the finding of hand- and face-selective neurons within this cortical expanse. Contributing to the controversy was the perceived sparsity of hand- and face-selective neurons. For example, Gross et al. (1969) reported only one hand-selective unit (of 51 recorded) in the original study, and then three hand- and three face-selective neurons (out of 205 that were visually responsive in TE) in the second study (Gross et al., 1972). Such small samples suggested that these cells were randomly scattered throughout IT cortex without any general organization principle, in stark contrast to the systematic organization of striate cortex (Hubel & Wiesel, 1962, 1968).
Clustered: face-selective cells are clumped together across several anatomical locations and hand-selective units are hard to find
Throughout the 1980s, the study of face-selective cells became more widely accepted and researchers began documenting functional properties of these cells related to different aspects of face processing. It was unknown (and still is presently) what an appropriate control stimulus is to compare to complex stimuli, such as faces and hands. Many of these studies defined face-selective units as those cells that fired more vigorously to the presentation of face images relative to both the spontaneous activity of the cell, as well as to a variety of control images, which could span from brushes (Gross et al., 1972) to ‘3D junk objects’ (Perrett, Rolls, & Caan, 1982; Rolls, 1984). Cells were also typically tested for additional selectivity features, such as responses to oriented bars, 2D shapes of various colors, aversive stimuli (such as images of snakes), as well as to tactile and auditory stimuli (Desimone, Albright, Gross, & Bruce, 1984). Finally, in order to be considered a face-selective neuron, a further criterion was added where a given cell needed to respond at least two times higher to faces than to the most effective control stimulus (e.g. Perrett et al., 1982). Once face-selective neurons were identified in this manner, they were reported to respond comparably across image formats (photograph, drawing) and across face species (human and monkey; Bruce, Desimone, & Gross, 1981; Desimone et al., 1984). Studies also showed that responses of face-selective cells decreased when parts of the face were removed or scrambled (Bruce et al., 1981; Desimone et al., 1984; Perrett et al., 1982), were modulated by the relative distance between internal facial features (Yamane, Kaji, & Kawano, 1988), and were tuned to specific face viewpoints (Perrett et al., 1982; Desimone et al., 1984). Importantly, these properties were maintained across changes in size and position suggesting a high-level representation (Desimone et al., 1984). 
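The selectivity criterion described above can be written down as a short check. The following is a minimal sketch only: the function name, the firing rates, and the choice to compare raw rates (rather than baseline-subtracted rates) are assumptions made for illustration, and the original studies differed in their exact procedures.

```python
def is_face_selective(face_rate, control_rates, spontaneous_rate, ratio=2.0):
    """Sketch of a two-part criterion for a face-selective unit.

    face_rate        -- mean firing rate to faces (spikes/s)
    control_rates    -- dict mapping control stimulus name -> mean rate
    spontaneous_rate -- mean spontaneous (baseline) rate
    ratio            -- required ratio relative to the best control
                        (2.0 follows the 'at least two times' criterion)
    """
    # The most effective control stimulus sets the bar to beat.
    best_control = max(control_rates.values())
    # The unit must fire above its own spontaneous activity ...
    exceeds_baseline = face_rate > spontaneous_rate
    # ... and respond at least `ratio` times the best control response.
    return exceeds_baseline and face_rate >= ratio * best_control


# Hypothetical firing rates, invented purely for illustration.
controls = {"brush": 12.0, "3D junk object": 18.0, "oriented bar": 9.0}
strong_unit = is_face_selective(40.0, controls, spontaneous_rate=5.0)
weak_unit = is_face_selective(30.0, controls, spontaneous_rate=5.0)
```

In this toy case the first unit passes (40 is more than twice 18) and the second does not, which shows how strongly the outcome depends on which control stimuli happen to be included.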
It is critical to note that some cells illustrated selectivity for features, not the composite face, where these neurons illustrated comparable (and sometimes higher) responses to particular face features, such as the eyes or hair alone (Bruce et al., 1981; Perrett et al., 1982) compared to responses to the whole face (Perrett et al., 1982).
In addition to new insights regarding the functional properties of face-selective neurons, researchers began reporting larger populations of face-selective cells than the initial measurements, as well as documenting a correspondence between a particular anatomical location and a resulting cluster of face-selective cells. For example, in 1972, Gross and colleagues reported only 3 face-selective neurons out of 205 measured (~1.5%) in posterior TE. Twelve years later, recording from a more anterior location, this number increased to 34% (17/50 neurons; Desimone et al., 1984). Shortly thereafter, Baylis, Rolls, and Leonard (1987) showed that more face-selective neurons were clustered on the upper and lower banks of the STS than in ventral IT by measuring functional properties of neurons within different cytoarchitectonic subdivisions of the STS (see Fig. 2b for this anatomical distinction adapted from Saleem, Suzuki, Tanaka, & Hashikawa, 2000). Specifically, Baylis and colleagues showed that area TPO in the upper bank of the STS (44/244; 18% face-selective) and areas TEa (53/250; 21%) and TEm (51/232; 22%) in the lower bank of the STS contained higher concentrations of face cells than other neighboring areas in the upper and lower banks of the STS (Baylis et al., 1987; parcellation terminology from Seltzer & Pandya, 1978; Fig. 2b).
Columnar organization: a general organization principle in macaque high-level visual cortex
In addition to the reports of clustered face-selective cells in particular anatomical locations (Baylis et al., 1987), Perrett et al. (1984) also illustrated evidence for a potential columnar organization within the STS for face-selective cells, as well as additional neurons selective for moving bodies (Perrett et al., 1985a, b). We interpret their proposal of IT organization to have three general features. First, selective cells are clustered in small patches (0.5–2 mm in diameter) on the cortical surface (Perrett et al., 1984, 1985a). Second, within each of these patches, columns extend as much as 2 mm downward with cells illustrating similar stimulus selectivities on a given vertical electrode penetration. Third, nearby columns illustrate associated selectivities—for example, for different rotations of the head (Perrett et al., 1985b). Tanaka and colleagues extended Perrett’s findings in a series of influential studies demonstrating a general columnar organization of IT cortex, whereby cells preferring similar features tended to cluster in vertical columns perpendicular to the cortical surface about 0.4 mm in diameter (Fujita, Tanaka, Ito, & Cheng, 1992; Tanaka, 1996; Tanaka, Saito, Fukada, & Moriya, 1991; Wang, Tanaka, & Tanifuji, 1996). Tanaka’s group showed this columnar organization for hand-selective cells (Tanaka et al., 1991), moderately complex features (Fujita et al., 1992; Tanaka et al., 1991), and face viewpoint (Wang et al., 1996). In addition to this fine-scale organization, Harries and Perrett (1991) also reported a larger scale organization in macaque STS. They reported clusters of face cells approximately 3–4 mm in diameter along the STS, with a periodic organization in which dense clusters of face-selective cells alternated with clusters of cells that were not face-selective generating an ‘inter-cluster distance’ on the order of 3 mm (Fig. 2c adapted from Perrett, Hietanen, Oram, & Benson, 1992).
There are two key differences in findings across groups. First, Perrett’s sets of recordings were performed on the upper banks of the STS (Harries & Perrett, 1991; Perrett et al., 1985a; Perrett et al., 1984; Fig. 2b, c), while Tanaka’s recordings were in anterior TE (Fujita et al., 1992; Tanaka et al., 1991; Wang et al., 1996; Fig. 2b, c). Second, though both sets of findings conclude with a columnar organization, the definition of a column and the theory associated with each definition differ. In Perrett’s definition, many columns make up one cluster with a particular stimulus-selectivity, which then produces a large-scale periodic organization of multiple face patches in the STS. Tanaka’s columns represent a general organization principle in IT cortex for the representation of object features with no additional macroscopic structure. Nevertheless, Tanaka and colleagues suggested that faces may have separate representations which represent facial features and configurations that are not shared by other objects (Tanaka, 1996). Though both the theory and definition of columnar organization differ across groups, converging results across these studies indicate a fine-scale (Fujita et al., 1992; Perrett et al., 1984, 1985a; Tanaka et al., 1991; Wang et al., 1996) and a potentially larger-scale structure (Harries & Perrett, 1991; Fig. 2c) in the organization of cells selective for faces, hands, and moving bodies in different portions of the temporal lobe that had not been documented before.
Summarizing the organization of face- and hand-selective neurons in monkey STS and IT cortex from their discovery until the advent of fMRI in the early 1990s
When Gross and colleagues began to study IT cortex in the late 1960s and early 1970s (see Timeline), they presented data from hand- and face-selective neurons together. Though hand-selective neurons were discovered first, the study of face-selective neurons solidified a niche in visual neuroscience 20 years later, while the study of hand- and other limb-selective neurons evaded researchers. In the late 1980s, there seemed to be a general correspondence between anatomical location and clustering of face-selective cells with a potential columnar organization. By the mid 1990s, the pairing of neurophysiological recording and optical imaging enabled the observation of a columnar structure in monkey IT. Furthermore, there was a reappearance of reports of neurons selective for static hands (Tanaka et al., 1991), the entire body (Wachsmuth, Oram, & Perrett, 1994) and moving hands (Perrett, Mistlin, Harries, & Chitty, 1990). During the same time period, Perrett and collaborators also reported larger clusters of face-selective cells in macaque STS that were 3–4 mm wide with an inter-cluster distance of 3 mm, illustrating a putative periodicity of face-selective cells along the STS (Harries & Perrett, 1991). Though the organization of macaque face- and hand-selective cells vastly evolved from randomly scattered, to clustered, to periodically clustered with a columnar organization, how this organization related to human cortex was still largely unknown.
The fusiform face area: a trend begins for the study of category-selective regions in humans
With the advent of fMRI in 1992 (Kwong et al., 1992; Ogawa et al., 1992), a trend emerged in the mid 1990s where researchers began to non-invasively map face-selective regions in the human brain. These studies were inspired by neurophysiological findings in monkeys described above, as well as behavioral and neural findings from neuropsychological case studies and invasive measurements in both patient and typical populations. Neuropsychological studies of face-blindness, or prosopagnosia (Bodamer, 1947), suggested that damage to ventral occipitotemporal cortex, especially the right fusiform gyrus, resulted in specific deficits in face recognition that do not generalize to other modalities (Damasio, Damasio, & Van Hoesen, 1982) or to other classes of visual stimuli such as objects or tools (Benton, 1980; Damasio et al., 1982; De Renzi, 1986; Hecaen & Angelergues, 1962; Landis, Cummings, Christen, Bogen, & Imhof, 1986; McNeil & Warrington, 1993; Sergent & Signoret, 1992). Furthermore, subdural recordings of neurons in human patients illustrated face-selective responses in both VTC and LOTC. Using single-unit methods, Ojemann, Ojemann, and Lettich (1992) showed that neurons in the human right middle and superior temporal gyri responded more during tasks associated with matching facial identity and facial expression than during object naming or matching. When measuring subdural field potentials, a series of studies reported higher responses to faces compared to words, letter strings, numbers, colors, scrambled stimuli, and objects on the fusiform and inferotemporal gyri across hemispheres (Allison et al., 1994a; Allison, McCarthy, Nobre, Puce, & Belger, 1994b; Nobre, Allison & McCarthy, 1994).
In typical populations, positron emission tomography (PET; Clark et al., 1996; Haxby et al., 1991, 1994; Sergent, Ohta, & MacDonald, 1992) studies reported functionally dissociable face-selective regions along the fusiform gyrus: the posterior fusiform gyrus and occipitotemporal sulcus activated during tasks of face matching and face gender discrimination (Haxby et al., 1994; Sergent et al., 1992), while the right mid-fusiform gyrus was activated during face identification (Sergent et al., 1992). Motivated by these findings of face-sensitive regions in VTC, early fMRI studies measured BOLD responses to images of faces compared to those of scrambled faces, textures, common objects, or consonant strings and found a network of regions that responded more strongly to intact faces spanning the fusiform, inferotemporal, and inferior occipital gyri, as well as the superior temporal sulcus (Clark et al., 1996; Puce, Allison, Asgari, Gore, & McCarthy, 1996; Puce, Allison, Gore, & McCarthy, 1995; see Timeline).
However, in 1997, a new trend emerged when Kanwisher, McDermott, and Chun (1997) introduced the functional localizer approach to examine the properties of face-selective regions. By first identifying a particular region of interest (ROI) in each subject with one set of functional scans (e.g. images of faces > images of objects), Kanwisher and colleagues then used a variety of different types of images similar to those used in the early Gross and Perrett studies (e.g. faces with eyes removed, scrambled internal features, etc.) to examine the functional properties of these regions in independent sets of experiments. In doing so, they reported a single area in the fusiform gyrus specialized for perceiving faces: ‘Our strategy was to ask first whether any regions of occipitotemporal cortex were significantly more active during face than object viewing; only one such area (in the fusiform gyrus) was found consistently across most subjects’ (p. 4303). This led to the conclusion of a single area selective for faces on the fusiform gyrus, labeled area FF (or the Fusiform Face Area, FFA; Kanwisher et al., 1997).
For comparison, the first fMRI retinotopic mapping study to use cortical surface visualizations (published 2 years prior to the seminal FFA work) identified areas V1-V4v ventrally and V1-V3 dorsally (7 total maps; Sereno et al., 1995). Presently, neuroscientists have identified a series of eight maps extending ventrally from V1 to the temporal lobe (V1v-V3v, hV4, VO-1/2, PHC-1/2; Arcaro, McMains, Singer, & Kastner, 2009; Brewer, Liu, Wade, & Wandell, 2005; Wandell, Dumoulin, & Brewer, 2007), 12 maps extending dorsally into the parietal lobe (V1d-V3d, V3A, V3B, V7, IPS-1/5, SPL-1; Konen & Kastner, 2008; Silver, Ress, & Heeger, 2005; Swisher, Halko, Merabet, McMains, & Somers, 2007; Tootell et al., 1998), and more than four maps laterally (LO-1/2; TO-1/2; pMSTv, pFST, pV4t; Amano, Wandell, & Dumoulin, 2009; Huk, Dougherty, & Heeger, 2002; Kolster, Peeters, & Orban, 2010; Larsson & Heeger, 2006). Conservatively, that is three times as many visual field maps as reported around 1995–1997. Notably, definitions of several of these maps have been revisited and re-parcellated as both methods and empirical ideas evolve (e.g. V4/V8 vs. hV4/VO-1/VO-2: Brewer et al., 2005; Hadjikhani, Liu, Dale, Cavanagh, & Tootell, 1998; V4d vs. LO-1/LO-2: Hansen, Kay, & Gallant, 2007; Larsson & Heeger, 2006; Tootell & Hadjikhani, 2001; Wade, Augath, Logothetis, & Wandell, 2008). Yet in the same passage of time, the concept of a single FFA has largely remained unrevised, even with improvements in scanning methods and visualizations illustrating more than one face-selective region on the fusiform (Fig. 3b, c). For example, it is not uncommon for research groups to refer to several face-selective regions spanning different anatomical locations (sometimes from the posterior fusiform gyrus all the way to the tip of the temporal lobe) together as the FFA (Fig. 3b). 
Other times, research groups separate some face-selective regions from one another (AFP1 and AFP2, Tsao, Moeller, & Freiwald, 2008; Fig. 3c), yet still combine multiple fusiform regions together into a single FFA despite the comparable anatomical distances separating each pair of regions (Fig. 3c). Such variability in FFA definitions illustrates the need for a parcellation framework to implement consistent parcellation practices across research groups.
Despite these issues, a number of face- and body part-selective regions have been identified and widely examined in addition to the EBA and FFA in VTC (fusiform body area, FBA; Peelen & Downing, 2005; Schwarzlose, Baker, & Kanwisher, 2005) and LOTC (occipital face area, OFA; Gauthier, Skudlarski, Gore, & Anderson, 2000), as well as in the posterior superior temporal sulcus (pSTS; Puce et al., 1995). Most recently, fMRI studies have identified an increasing number of face- and body-selective regions in high-level visual cortex, including two face-selective regions on the fusiform gyrus (FFA-1 and FFA-2; Pinsk et al., 2009), a region in anterior temporal cortex 40 mm in front of the more anterior fusiform face-selective activation (Kriegeskorte, Formisano, Sorger, & Goebel, 2007; Nestor, Plaut, & Behrmann, 2011; Pinsk et al., 2009; Rajimehr, Young, & Tootell, 2009; Tsao et al., 2008), and two regions on the anterior and middle STS (Calder et al., 2007; Pinsk et al., 2009; Winston, Henson, Fine-Goulden, & Dolan, 2004). Likewise, fMRI studies of body part-selective regions have documented more than one activation on the fusiform gyrus (FBA-1 and FBA-2; Pinsk et al., 2009), as well as focal selectivity for specific body parts in LOTC and VTC for hands, torsos, and legs (Bracci, Ietswaart, Peelen, & Cavina-Pratesi, 2010; Chan, Kravitz, Truong, Arizpe, & Baker, 2010; Op de Beeck, Brants, Baeck, & Wagemans, 2010; Orlov, Makin, & Zohary, 2010).
Is there a consistent spatial organization of face- and limb-selective regions in ventral temporal cortex?
If so, does this organization principle of a reliable spatial relationship among face- and limb-selective regions extend to lateral occipitotemporal cortex?
Summary of recent findings
Face- and limb-selective regions alternate throughout ventral temporal and lateral occipitotemporal cortices
Applying higher-resolution fMRI (1.5 mm voxels) than past studies (3–5 mm voxels) in a series of experiments, we examined the spatial characteristics of face- and limb-selective activations implementing a different approach than typically used. Presently, researchers commonly label any face-selective voxels in the fusiform gyrus as ‘FFA’ and any body part-selective voxels in LOTC as ‘EBA’ (Figs. 3, 4). Such an approach results in extensive variability in the anatomical location of these areas across subjects and research groups (Figs. 3, 4). This variability can lead to an inconsistent spatial relationship among functional regions, which in turn affects the interpretation of the organization. Often, this inconsistency is interpreted to reflect substantial inter-subject variability of activations in human high-level visual cortex.
Our new approach (Weiner & Grill-Spector, 2010, 2011) to systematically parcellate face- and body part-selective regions uses well-known principles that are used to parcellate early retinotopic areas. We delineate activations in single subjects on their cortical surfaces using anatomical and functional criteria, creating boundaries between functionally defined regions when there is a change in selectivity. Face-selective regions were defined by higher BOLD responses to images of faces compared to images of limbs, flowers, cars, guitars, and houses (t > 3, P < 0.002, voxel level), and limb-selective regions were identified by comparing BOLD responses to images of limbs with responses to images of faces, flowers, cars, guitars, and houses (t > 3, P < 0.002, voxel level; see Weiner & Grill-Spector, 2010, 2011 for details). We chose these comparison stimuli as they are each of a visually coherent category and provide a broad baseline of comparison objects. Limbs were used as representative body part stimuli because they are the most common stimuli used to localize the EBA and FBA (Supplemental Table 1 from Weiner & Grill-Spector, 2011). These contrasts typically yield multiple activations rather than a single EBA and FFA (a fact often illustrated in prior figures, but not addressed in print; Figs. 3, 4). In order to implement consistent parcellation across subjects, we distinguished regions with the same selectivity from one another if they were anatomically segregated and contained a region with a different selectivity between them. If no intervening clusters were present, activations were merged if they were in close proximity to one another. We then examined the spatial relationship of face- and limb-selective regions relative to (1) each other, (2) known visual field maps, and (3) other known functionally defined regions such as hMT+ that are associated with stable anatomical landmarks.
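The thresholding and merging rules just described can be illustrated on a toy example. The sketch below is not the analysis code used in the cited studies: it reduces the cortical surface to a one-dimensional strip of positions, uses invented t-values, and introduces a hypothetical `max_gap` parameter to stand in for 'close proximity', which the text does not quantify. Only the t > 3 threshold comes from the text.

```python
import numpy as np

T_THRESHOLD = 3.0  # voxel-level threshold quoted in the text (t > 3)
NONE, FACE, LIMB = 0, 1, 2


def label_positions(t_faces, t_limbs, thr=T_THRESHOLD):
    """Label each position as face-selective, limb-selective, or neither."""
    t_faces = np.asarray(t_faces, dtype=float)
    t_limbs = np.asarray(t_limbs, dtype=float)
    labels = np.full(t_faces.shape, NONE, dtype=int)
    labels[(t_faces > thr) & (t_faces >= t_limbs)] = FACE
    labels[(t_limbs > thr) & (t_limbs > t_faces)] = LIMB
    return labels


def find_runs(labels):
    """Return contiguous runs of one selectivity as (label, start, end)."""
    runs, start = [], None
    for i, lab in enumerate(labels):
        if start is not None and lab != labels[start]:
            runs.append((int(labels[start]), start, i - 1))
            start = None
        if start is None and lab != NONE:
            start = i
    if start is not None:
        runs.append((int(labels[start]), start, len(labels) - 1))
    return runs


def merge_runs(runs, max_gap=2):
    """Merge neighboring runs of the same selectivity separated by a small
    unlabeled gap. Runs with a differently selective run between them are
    never adjacent in the list, so they stay distinct."""
    merged = []
    for lab, start, end in runs:
        if merged and merged[-1][0] == lab and start - merged[-1][2] - 1 <= max_gap:
            merged[-1] = (lab, merged[-1][1], end)
        else:
            merged.append((lab, start, end))
    return merged


# Invented t-values along a strip of cortex (illustration only).
t_faces = [4.1, 3.5, 1.0, 3.8, 0.2, -1.0, 0.5, 4.4]
t_limbs = [-2.0, -1.0, 0.3, -0.5, 3.9, 4.2, 1.1, -1.5]

labels = label_positions(t_faces, t_limbs)
regions = merge_runs(find_runs(labels))
```

Here the two nearby face-selective runs at the start of the strip are merged, whereas the face-selective position at the end remains a separate region because a limb-selective run lies between them.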
Alternating and adjacent face- and limb-selective regions in occipitotemporal cortex
Reliable anatomical and functional boundaries to delineate face- and limb-selective regions in ventral temporal cortex
Location of face- and limb-selective regions in Talairach space (SDs across 9–11 subjects)
Finding a consistent anatomical location and spatial relation among functional activations is important because these two criteria reflect fundamental cortical organization principles. Importantly, these two principles have been used to parcellate cortex in the macaque (Felleman & Van Essen, 1991) and are evident among early retinotopic areas across primate species. For example, in humans, there is a series of retinotopic maps in each hemisphere extending from V1 dorsally in a specific order with particular characteristics: a hemi-field representation (V1), two mirror-reversed quarter-fields (V2d, V3d), and a second hemi-field representation (V3A). Since this mapping is consistent across subjects, researchers are able to define these regions identically in individual subjects. Our data extend these principles of consistent anatomical location and spatial relation among activations to high-level regions and provide strong evidence for a parsimonious organization principle applicable to the entire visual system. In contrast, domain-specificity (which is the principle used to derive the FFA and EBA) proposes separate organization principles across visual cortex: retinotopy in early and intermediate visual areas and functionally specialized modules for a select number of categories in high-level visual cortex (Kanwisher, 2010). As an outcome, the visual system is dichotomized where it is highly organized across early visual areas and rather disorganized across high-level visual areas. However, this disorganized principle of high-level visual cortex is problematic as we have shown that prior definitions of the FFA violate these two principles of anatomical location and spatial relationship, where both the anatomical location of the FFA and the spatial relationship between the FFA and FBA change from subject to subject (Peelen & Downing, 2005; Peelen et al., 2006; Pinsk et al., 2009; Schwarzlose et al., 2005).
The present data clarify this discrepancy by showing that there are actually two face-selective regions 15 mm apart on the fusiform instead of a single FFA, where these regions are reliably separable by the limb-selective OTS.
These findings in VTC lead to an important question: Does this consistent topographic relationship among face- and limb-selective regions extend to LOTC? If so, this would suggest that the parsimonious organization principle discovered in VTC is generalizable throughout high-level visual cortex. We recently addressed this question empirically (Weiner & Grill-Spector, 2011) and summarize our results below.
Reliable anatomical and functional boundaries to delineate face- and limb-selective regions in lateral occipitotemporal cortex
To test whether the consistent topographic relationship among face- and limb-selective regions observed in VTC extends to other portions of the brain, we measured face- and limb-selective responses in each subject’s LOTC using the same analyses described in the prior section. In addition, we also localized hMT+ in each subject because it is adjacent to these activations and is associated with a specific anatomical location on the ascending limb of the posterior inferotemporal sulcus (Amano et al., 2009; DeYoe et al., 1996; Dumoulin et al., 2000; Tootell et al., 1995; light blue outlined in black in Fig. 1). Thus, it serves as a reliable anchor from which to generate functional boundaries that are closely linked to the underlying anatomy (see Weiner & Grill-Spector, 2011 for details). Similar to the analysis of VTC organization, we examined the spatial organization of limb- and face-selective regions relative to each other in LOTC, as well as relative to (1) anatomical landmarks, (2) hMT+, and (3) known visual field maps.
As in VTC, we illustrate three important findings regarding the functional organization of LOTC (Fig. 6; Supplementary Figs. 3, 4): (1) there are several face- and limb-selective activations in LOTC in distinct anatomical locations, (2) they have a consistent spatial organization relative to each other, as well as (3) relative to hMT+. Specifically, we do not find evidence for one EBA in LOTC, as is commonly reported, but rather a series of limb-selective activations located around the perimeter of hMT+ (illustrated by a dotted black line in Fig. 6) where each is associated with a distinct anatomical landmark and consistent spatial relation to hMT+ (Table 1 for Talairach coordinates). The first activation is located on the lateral occipital sulcus/inferior portion of the middle occipital gyrus (LOS/MOG) and is posterior to hMT+. The second activation is located on the inferior temporal gyrus (ITG) and inferior to hMT+. The third activation is located on the middle temporal gyrus (MTG) and anterior to hMT+. The crescent organization surrounding hMT+ is reproducible over a span of 3 years (Supplementary Fig. 3) and a variety of contrasts using different body part and control stimuli (Supplementary Fig. 4). Furthermore, using anatomical landmarks and the spatial relationship to hMT+ to define the original limb-selective ROIs accurately predicts functional differences across these ROIs 3 years later (Supplementary Fig. 4).
Notably, there is also a consistent organization of LOTC limb-selective regions relative to face-selective regions as illustrated in Fig. 6 (see also Supplemental Fig. 1 from Weiner & Grill-Spector, 2010). In particular, there is a ring organization of alternating face- and limb-selective regions surrounding, but largely not encroaching into, hMT+. Specifically, the face-selective pSTS is superior to hMT+ and located between the limb-selective LOS/MOG and limb-selective MTG. Likewise, the face-selective IOG is located between the limb-selective LOS/MOG and the limb-selective ITG, as well as located on the inferior corner of hMT+.
Taken together, the alternating series of face- and limb-selective regions in VTC extends to LOTC, indicating that the topographic relationship between face- and limb-selective regions generalizes across high-level visual cortex. Furthermore, our data illustrate there is not one EBA in LOTC and not one FFA in VTC, but rather a fine-scale spatial organization of these activations relative to one another and specific anatomical landmarks. It is possible that this organization has previously been missed for methodological reasons such as scanning with larger voxels (>3 mm) and the use of inplane visualizations. We directly relate the stability of our higher-resolution measurements to measurements with larger voxels and different visualizations below.
Methods and measurements produce theories: the way in which data is acquired, analyzed, and visualized can lead to misleading interpretations
Recommendations for methodological decisions in fMRI data analysis pipelines in high-level visual cortex
Smaller voxels (1–2 mm): reduces effects of partial voluming and susceptibility artifacts
No spatial smoothing: smoothing results in inaccurate localization practices and potential averaging of regions that are distant along the cortical surface
Gray matter segmentation: reduces partial voluming effects across gray matter (cortex) and white matter (axons); restricts data analysis to the tissue containing neurons
Single subject analyses: functional data stay true to the gyral and sulcal patterns of a given brain, improving the accuracy of localization
Inflated cortical surface: accurately depicts the spatial relationship among cortical locations typically obscured in inplane or volume views due to the complexity of cortical folds
Taken together, the exploration of these methodological issues (Supplementary Materials) indicates that the factor most detrimental to accurately examining spatial organization of functional activations is spatial smoothing. Even researchers using large voxels and inplane visualizations can implement the parcellation methods used here as long as spatial smoothing is not used (Fig. 8). Importantly, precisely defining functional regions at the correct spatial scale and anatomical location identifies functional distinctions among activations that are lost when data are spatially smoothed or inaccurately combined (e.g. Figs. 3, 4). For example, in VTC, mFus-faces shows more fMRI-adaptation to repeated images than pFus-faces, illustrating a potential hierarchical organization extending from V1 into VTC based on adaptation characteristics (Fig. 9a; adapted from Weiner et al., 2010). Similarly, functional differences are found in LOTC, whereby LOS-, ITG- and MTG-limbs illustrate different retinotopic properties. There is a strong contralateral bias in LOS-limbs, which decreases progressively to MTG-limbs, with a concomitant increase in foveal bias (Fig. 9b; adapted from Weiner & Grill-Spector, 2011).
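A toy calculation may help to show why smoothing can erase the boundary between neighboring activations. The sketch below is an illustration under stated assumptions, not a reproduction of the supplementary analyses: the t-values are invented, the cortex is reduced to a one-dimensional strip, and a three-point normalized kernel stands in for the Gaussian kernel typically used in fMRI preprocessing.

```python
import numpy as np


def count_clusters(mask):
    """Count contiguous runs of True values in a 1D boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.concatenate(([False], mask))
    # A cluster begins wherever False is followed by True.
    return int(np.sum(~padded[:-1] & padded[1:]))


# Invented face-selectivity t-values along a strip of cortex:
# two supra-threshold activations separated by one sub-threshold position.
t_profile = np.array([0.0, 0.0, 6.0, 6.0, 2.0, 6.0, 6.0, 0.0, 0.0])

# Small normalized smoothing kernel (assumed stand-in for a Gaussian).
kernel = np.array([0.25, 0.5, 0.25])
smoothed = np.convolve(t_profile, kernel, mode="same")

threshold = 3.0  # same t > 3 threshold used for the contrasts in the text
clusters_raw = count_clusters(t_profile > threshold)
clusters_smoothed = count_clusters(smoothed > threshold)
```

In this example the unsmoothed profile contains two separate supra-threshold clusters, while the smoothed profile contains one, because the sub-threshold position between them is pulled above threshold by its neighbors. Real data are noisier and two-dimensional, so the size of the effect will depend on the kernel width relative to the distance between activations.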
Theoretical implications and discussion
Neural representations of faces and limbs: cortical neighbors in lateral and ventral high-level visual cortex
We expand below on the implications of these findings: (1) as a new general organization principle in high-level visual cortex, (2) for the comparison of cortical organization between typical and atypical populations, and (3) in relation to the organization of face- and body part-selective regions in non-human primates.
Topographic relationship among face- and limb-selective regions as a new organization principle in human high-level visual cortex
Faces and limbs are ecologically and socially relevant classes of visual stimuli with a statistical regularity in their visual appearance: heads are most often above bodies and limbs are often just offset from the body. The present data show that this consistent topography of faces and limbs in stimulus space is reflected in cortical space, where face- and limb-selective regions show a consistent topographic arrangement. A well-known example illustrating a correspondence between stimulus space and neural representation is a retinotopic map, where two adjacent points in visual space are projected to two adjacent points on the retina. This retinotopic relationship then extends to visual cortex where adjacent points on the retina map to adjacent locations on the cortical surface (Wandell & Winawer, 2011 for review). Unlike retinotopic mapping methods, our stimuli are not presented in a way that smoothly varies the relative positions of faces and limbs. Instead, all our stimuli are presented in the center of the visual field, in line with the early studies from Kanwisher and colleagues. However, when one sees an image of a face, it is understood that the body (limbs included) is underneath it (and vice versa). We propose that the regularity with which faces, limbs, and bodies are presented relative to one another in everyday life has been incorporated into the visual system, resulting in a map of alternating face- and limb-selective regions throughout high-level visual cortex. A recent paper supports this proposal, illustrating a topographic organization for body part representation (upper/lower half of the face, arms, legs, and torso) in human LOTC (Orlov et al., 2010). However, Orlov and colleagues did not propose a parcellation scheme or report the regularity of face- and limb-selective organization relative to anatomical landmarks.
Thus, our data introduce a new set of principles, and in turn an unconsidered organization, of high-level visual cortex where there is a systematic and alternating representation of faces and limbs in predictable anatomical locations.
We label these face- and limb-selective activations as ‘regions’ rather than ‘areas’ and preface the type of stimulus selectivity with the anatomical loci of these regions most consistently found across subjects (e.g. pFus-faces or OTS-limbs). Such a labeling reflects the preserved spatial relationship of anatomical landmarks, as well as the alternating stimulus selectivity. Since large-scale neuroanatomy is stable, the spatial relationship of anatomical landmarks will be preserved, as will the relationship among the associated functional regions. For example, anatomically, the ITG will always be anterior to the IOG. Functionally, then, ITG-limbs will always be anterior to IOG-faces. This principle is also applicable within a given anatomical structure such as the fusiform gyrus where mFus-faces will always be anterior to pFus-faces. Consequently, using these labels will increase the generalizability within a subject across time (Supplementary Figs. 1–4), across subjects, and across research groups examining either typical or atypical subject populations.
Anatomical location and spatial relationship of high-level visual regions are important for the comparison of cortical organization between typical and atypical populations
The present data indicate a correspondence between gross anatomical landmarks and a given functional region in high-level visual cortex, which allows the potential integration of identified regions with underlying anatomical structure (such as cytoarchitecture) in future studies of typical and clinical populations. The utility of this prospect is illustrated in a recent study examining cytoarchitectonic differences of the fusiform gyrus in post-mortem brains of autistic and typical subjects (van Kooten et al., 2008). Specifically, van Kooten et al. (2008) write: ‘The fusiform face area (FFA) within the (fusiform gyrus) could not be identified separately because neither gross anatomical landmarks nor cytoarchitectonic criteria have been established in the literature to identify the FFA within the (fusiform gyrus) in human post-mortem brains’ (p. 989). Thus, the present findings now make it possible for future studies to link known functional properties with underlying neuroanatomical structures in high-level visual cortex of either typical or atypical populations. The present mapping methods are conducted in individual subjects with high-resolution fMRI to ensure precise functional localization that respects the nuances of each subject’s anatomy, which group analyses do not allow. Nevertheless, there is promise in combining the multiple mapping methods used here across individual subjects into an average functional brain that respects the macro-anatomical structure of each subject, as recently illustrated using a probabilistic atlas approach across subjects (Frost & Goebel, 2011).
In addition to gross anatomical landmarks, we also provide a precise model of high-level visual cortex documenting a preserved spatial relationship among regions (Fig. 10), which can be compared with cortical organization in clinical populations. Present fMRI work on the cortical consequences of deficits in face perception examines the presence, size, connections, or functional properties of a specific activation in patient populations (Golarai et al., 2010; Rossion et al., 2003; Schiltz et al., 2006; Thomas et al., 2009). However, how category-selective regions are organized relative to other high-level visual regions is also critical. For example, some clinical conditions may be associated with spatial reorganization of these regions. If so, the measurements proposed here and the comparison to the standard model (Fig. 10) could be used as diagnostics for identifying the condition in an individual subject.
Clustered and connected: fMRI and connectivity studies in non-human primates show an interconnected system of face-selective regions—but what about limbs?
Technological advancements enabling fMRI of awake, behaving non-human primates (Logothetis, Guggenberger, Peled, & Pauls, 1999) reveal as many as six face-selective patches in specific anatomical locations in the macaque temporal lobe (Freiwald & Tsao, 2010; Freiwald, Tsao, & Livingstone, 2009; Hadj-Bouziane, Bell, Knusten, Ungerleider, & Tootell, 2008; Hoffman, Gothard, Schmid, & Logothetis, 2007; Ku, Tolias, Logothetis, & Goense, 2011; Pinsk et al., 2009; Pinsk, DeSimone, Moore, Gross, & Kastner, 2005; Tsao, Freiwald, Knutsen, Mandeville, & Tootell, 2003; Tsao, Freiwald, Tootell, & Livingstone, 2006; Tsao et al., 2008). These patches contain a large proportion of face-selective neurons, ranging between 52% and 97% of visually-responsive neurons (Freiwald & Tsao, 2010; Freiwald et al., 2009; Moeller, Freiwald, & Tsao, 2008; Tsao et al., 2006), which is much higher than the 20–35% reported by early neurophysiology studies (Baylis et al., 1987; Desimone et al., 1984). Microstimulation methods further demonstrate that these face-selective patches are interconnected, creating an extended face-selective network (Moeller et al., 2008). But, what about limbs?
Just as the study of face-selective neurons gained popularity before the study of body part-selective neurons (see History section), there have been more fMRI studies examining face-selective regions in the monkey than regions selective for images of limbs or bodies. However, a few studies using either fMRI or optical imaging in monkeys report body part-selective patches cortically proximate to face-selective clusters (Borra, Ichinohe, Sato, Tanifuji, & Rockland, 2010; Pinsk et al., 2005, 2009; Sato, Uchida, & Tanifuji, 2008; Tsao et al., 2003), suggesting that the adjacent and alternating relationship among face- and limb-selective activations reported here may extend to monkeys. Most recently, using fMRI-guided neurophysiology, Bell et al. (2011) examined the distribution of neurons within and outside face- and limb-selective regions localized with fMRI, reporting three relevant findings. First, the concentration of selective neurons within face- and limb-selective regions was higher than for object- and place-selective regions. This supports our present stance that face- and limb-selective neural responses are good comparator systems for one another. Second, there were higher percentages of selective neurons within a given region than just outside (1–4 mm) or far outside (>4 mm) it, with the highest proportions in the center of a region. This indicates that the ‘inter-cluster’ cortex between regions selective for faces and limbs contains selective neurons, but in smaller concentrations than within regions. Third, the proportions of recorded cells outside a region corresponded nicely to those percentages reported from the early studies of face-selective cells reviewed here (Perrett et al., 1982; Baylis et al., 1987; Desimone et al., 1984).
These results indicate the utility of studying face- and limb-selective regions together, as well as the benefit of using high-resolution fMRI in humans (without spatial smoothing) to target the regions of interest, since the highest concentration of selective neurons is likely to be in the center of fMRI activations. Further, these data support both modular and distributed elements in the organization of face- and body part-selective responses, which is consistent with other recent studies that used anatomical tracers in monkeys (Borra et al., 2010), as well as our results of a sparsely-distributed organization in humans revealed by high-resolution fMRI measurements (Weiner & Grill-Spector, 2010).
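The distance-based comparison reported by Bell et al. (2011), in which recording sites are grouped as within a region, just outside it, or far outside it, can be sketched as a simple binning procedure. The code below is a minimal illustration and not the authors' analysis: the function names and the input format are hypothetical, and sites between 0 and 1 mm outside a region border are assigned to the nearer bin as an assumption, since only the 1–4 mm and >4 mm ranges are stated above.

```python
from collections import defaultdict


def distance_bin(distance_mm):
    """Assign a recording site to a bin by its distance outside the
    fMRI-defined region border (values <= 0 mean inside the region).
    Treating 0-1 mm as 'near' is an assumption of this sketch."""
    if distance_mm <= 0:
        return "within"
    if distance_mm <= 4:
        return "near"   # just outside the region (up to 4 mm)
    return "far"        # far outside the region (>4 mm)


def selective_fraction_by_bin(sites):
    """Compute the fraction of category-selective neurons per distance bin.

    `sites` is an iterable of (distance_mm, is_selective) pairs, a
    hypothetical input format chosen for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_selective, n_total]
    for distance_mm, is_selective in sites:
        label = distance_bin(distance_mm)
        counts[label][0] += int(is_selective)
        counts[label][1] += 1
    return {label: sel / total for label, (sel, total) in counts.items()}


# Made-up toy input, used only to show the call; these are not real data.
toy_sites = [
    (-1.0, True), (-0.5, True),
    (2.0, True), (3.0, False),
    (6.0, False), (7.0, False), (8.0, True),
]
fractions = selective_fraction_by_bin(toy_sites)
print(fractions)
```

With real recordings, a decreasing fraction from the within bin to the far bin would correspond to the pattern summarized in the second finding above.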
An important question remaining for future studies is: What are the mechanisms that generate the cortical correspondence among face- and limb-selective regions? One possibility is that neural responses to face and limb stimuli develop over time due to their joint frequency in the environment. This suggests that the selectivity of face and limb regions results from the ecological relevance of faces and limbs, as well as their high frequency in the natural world. However, reports of face-selective neurons in monkeys as young as five and a half weeks (Rodman, Scalaidhe, & Gross, 1993; Rodman, Skelly, & Gross, 1991) raise the possibility of an early maturation of these activations or even an innate bias for these stimuli. A related question is whether adjacent face- and limb-selective patches reflect two neighboring but separate cortical systems for face and limb processing, or a single system of alternating (but interconnected) face- and limb-selective regions that share connections at their boundaries. Some clues regarding this question come from microstimulation experiments, where microstimulating face-selective clusters in monkeys yields activation in other face-selective sites, but also extends outside their boundaries (Moeller et al., 2008)—where the present data would suggest a limb-selective region. Future work using a combination of methods such as fMRI, microstimulation, and single unit recording may address the transition between each stage of organization from single neurons to columns, to functional regions, to adjacent and alternating networks in IT cortex. These future studies will shed light on the organizational mechanisms across micro- and macro-level scales.
A new three-stream model of high-level visual cortex
Why might high-level visual cortex contain multiple face- and limb-selective regions?
The ventral stream: the role of ventral temporal cortex in recognition and memory
The ventral stream extends from early visual areas to ventral aspects of the occipital and temporal lobe (Fig. 1). Lesion studies in monkeys and neuropsychological studies in humans have established that VTC is involved in visual recognition, documenting that damage to different portions of the temporal lobe produces specific deficits in object and/or face recognition (Damasio et al., 1982; Farah, 1990; Goodale, Milner, Jakobson, & Carey, 1991; Rossion et al., 2003; Sergent & Signoret, 1992; Ungerleider & Mishkin, 1982). Consistent with these reports, functional neuroimaging studies show that activations in VTC are correlated with successful recognition (Bar et al., 2001; Grill-Spector, Kushnir, Hendler, & Malach, 2000; Moutoussis & Zeki, 2002). For example, face-selective regions in lateral VTC show higher responses for the successful perception of faces during illusory and ambiguous stimuli (Andrews, Schluppeck, Homfray, Matthews, & Blakemore, 2002; Hasson, Hendler, Ben Bashat, & Malach, 2001; Tong, Nakayama, Vaughan, & Kanwisher, 1998), as well as for detection and identification of faces (Grill-Spector, Knouf, & Kanwisher, 2004). Given this role of VTC in visual recognition, and the adjacency of limb-selective regions relative to face-selective regions, we predict that the limb-selective OTS is involved in recognition of body parts, which can be tested in future research. In addition to the fine-grained parcellation of face- and limb-selective regions based on both anatomical and functional boundaries in VTC, there are also functional differences between more general anatomical subdivisions of VTC.
Specifically, lateral VTC (from the occipitotemporal sulcus to mid-fusiform sulcus) illustrates qualitatively different temporal dynamics than medial VTC (from the mid-fusiform sulcus to parahippocampal gyrus) during prolonged presentations of various visual stimuli (Gilaie-Dotan, Nir, & Malach, 2008) and repetitions of visual stimuli across different timescales (Weiner et al., 2010). Based on these results, we have recently proposed that lateral VTC is involved in perception, whereas medial VTC is a gateway between perception and memory (Weiner et al., 2010). Future studies will help elucidate behavioral consequences of these organizational differences and how they affect aspects of perception and memory.
The dorsal stream: the role of posterior parietal cortex in position, motion, spatial working memory and attention, form, and action
The dorsal stream extends from early visual areas to the dorsal aspects of the occipital lobe and into the parietal lobe. Prevailing views implicate the dorsal stream in different aspects of spatial vision (Ungerleider & Mishkin, 1982), visually guided actions toward objects (Goodale et al., 1991), and even time (Battelli, Pascual-Leone, & Cavanagh, 2007). Here, we focus only on the posterior aspect of the parietal lobe, as the processes within this cortical region are largely visual in nature. We recently reported a limb-selective region (limb-selective IPS) consistently overlapping visual field map V7 (also referred to as IPS-0; Swisher et al., 2007), where this limb-selective IPS is sensitive to the position of the limb in the visual field (Weiner & Grill-Spector, 2011). In addition to selectivity for static limb images, posterior parietal cortex in and around V7 has also been implicated in different aspects of spatial working memory, attention, and motion (Konen & Kastner, 2008; Orban et al., 2006; Silver et al., 2005; Tootell et al., 1998; Xu & Chun, 2006), indicating the integration of several computational processes within this cortical region. Indeed, a series of studies examining the neural processing of limb actions have documented a clear anatomical and functional dissociation of parietal cortex where posterior IPS regions are involved in the observation and visual guidance of limb movements (a combination of position, motion, and limb form), while the anterior IPS regions are more involved in the execution of limb movements themselves (Filimon, Nelson, Huang, & Sereno, 2009; Levy, Schluppeck, Heeger, & Glimcher, 2007).
Such results are in line with patient studies reporting that focal damage to posterior parietal cortex produces specific deficits in identifying and pointing to body parts, either their own (autotopagnosia; De Renzi, 1982; Ogden, 1985) or others’ (heterotopagnosia; Auclair, Noulhiane, Raibaut, & Amarenco, 2009; Cleret de Langavant, Trinkler, Cesaro, & Bachoud-Levi, 2009). Whether this perceptual deficit is a direct result of local cortical damage or reflects a disruption of connections within the extended cortical network of limb-selective regions in the ventral or lateral pathways is an open question. We propose that the posterior parietal cortex (in the vicinity of V7) is a transitional stage in the dorsal pathway functioning to convert visual inputs into action outputs, whereas the anterior IPS is more involved in the actions themselves. Relevant to the topics in this Special Issue, future research will elucidate whether the limb-selective IPS reflects visual processing associated with the form of the limb itself, or reflects a visual representation embodied in the context of an action representation.
The lateral stream: the role of lateral occipitotemporal cortex in form, motion, and multimodal processing
Traditionally, the visual system is divided into ventral and dorsal pathways, where area MT is typically assigned to the dorsal stream consistent with its anatomical location in the monkey (Ungerleider & Mishkin, 1982). However, in humans, MT is farther from parietal cortex, located more inferiorly in the posterior inferior temporal sulcus (Dumoulin et al., 2000; Tootell & Taylor, 1995). This difference in the anatomical location of MT, as well as the more inferior positioning of the ventral stream and more superior location of the dorsal stream in humans compared to monkeys, has been proposed to reflect the cortical expansion accommodating emergent language properties in humans (Orban, Van Essen, & Vanduffel, 2004; Ungerleider, Courtney, & Haxby, 1998). Expanding on these proposals, we suggest that this difference reflects a lateral pathway in the human brain incorporating different aspects of vision, action, and language. For example, our present measurements document face- and limb-selective regions radiating around both MT and MST (Weiner & Grill-Spector, 2011; Fig. 10 for schematic). This organization seems to be specific to humans, as fMRI studies in non-human primates illustrate face- and body part-selective regions cortically distant from MT, located more ventrally in portions of TEO and TE (Fig. 2; Tsao et al., 2003; Pinsk et al., 2009). We propose that the organization of face- and limb-selective regions around hMT+ is a unique feature of the human lateral surface and expand on recent results from neuropsychology and neuroimaging studies providing evidence that LOTC is functionally distinct from the dorsal and ventral streams.
Neuropsychology studies examining face and body part processing suggest that damage to LOTC results in perceptual deficits separate from processing associated with dorsal or ventral high-level visual cortex. Damage to the lateral surface near the limb-selective LOS results in a general body agnosia with impairments in body part, but not object or face part, discrimination (Moro et al., 2008). Compared with deficits in body part localization and ownership associated with damage to posterior parietal cortex discussed above (Auclair et al., 2009; Cleret de Langavant et al., 2009; De Renzi, 1982; Ogden, 1985), these results illustrate a dissociation between the dorsal and lateral streams within the domain of body part processing. Within the domain of face processing, a variety of cortical lesions spanning different aspects of the ventral and lateral streams can each produce impairments in holistic face processing (Busigny, Joubert, Felician, Ceccaldi, & Rossion, 2010; Van Belle et al., 2011). However, lesions to the IOG are associated with selective impairments in discriminating face parts, but not object or body parts (Moro et al., 2008), which suggests functional differences between the ventral and lateral streams. These findings from neuropsychological studies implicate the cortical expanse posterior to MT (LOS for body parts and the IOG for faces; Fig. 10) in processing the visual form of the face and body. Paired with the fact that this portion of lateral occipital cortex also selectively responds to images of objects and shapes across multiple visual cues (Grill-Spector, 2003; Grill-Spector et al., 1998; Mendola, Dale, Fischl, Liu, & Tootell, 1999; Vinberg & Grill-Spector, 2008), this suggests that regions posterior to MT are responsible for coding visual form more generally (Fig. 11).
Comparatively, the MTG, which is anterior to hMT+, is not strictly visual in nature, but shows polymodal response properties involved in different aspects of vision, action, and language—a feature that further distinguishes LOTC from the other processing streams. Based on our measurements, we refer to the MTG as limb-selective. However, prior studies also implicate the MTG and nearby ITG in executing hand movements (Astafiev, Stanley, Shulman, & Corbetta, 2004; Orlov et al., 2010), haptically exploring objects (Amedi, Malach, Hendler, Peled, & Zohary, 2001), and responding to tactile stimulations of the hand relative to the foot (Beauchamp, Laconte, & Yasar, 2009; Beauchamp, Yasar, Kishan, & Ro, 2007). The MTG has also been shown to code the rationality of movements (e.g. the mapping of action to meaning; Jastorff, Clavagnier, Gergely, & Orban, 2010), as well as the mapping of sounds to meaning (Glasser & Rilling, 2008; Wong, Chandrasekaran, Garibaldi, & Wong, 2011), suggesting that it may be an anatomical locus for the integration of gesture and language processing (Nelissen et al., 2010). Taken together, these studies indicate that the MTG may be a convergence zone of action representation embodying information across visual, tactile, haptic, and motor domains with potential roles also in language processing and social communication, which is in line with previous proposals (Beauchamp & Martin, 2007; Martin, 2007). Overall, these data suggest that LOTC is organized differently than ventral or dorsal high-level visual cortex with distinct functions that separate it from either pathway. Future research using multiple functional and anatomical methods will support or refute our proposal of the lateral pathway as a distinct processing stream.
Conclusions and future directions
The present work illustrates that face- and limb-selective regions are topographically organized throughout high-level visual cortex. These data provide the first framework for consistent parcellation of high-level visual regions outside visual field maps. Importantly, implementing this parcellation framework has generated a new model of high-level visual cortex containing three processing streams extending dorsally, laterally, and ventrally, which are separable based on anatomical and functional criteria. Further, our results suggest that the statistical regularity with which faces, limbs, and bodies are presented relative to one another in the natural world has been incorporated into the visual system in high-level representations of alternating maps within these three separate processing streams. The anatomical location of each region within its particular stream, as well as its spatial relationship to other known surrounding functional regions, may be related to the particular role of each region in either distinct aspects of vision, action, haptics, memory, and language, or combinatorial aspects across these modalities. The new three-stream model and systematic parcellation framework described here motivate future research both to examine how and why neural representations of faces and limbs cortically neighbor one another, and to test visual and polymodal properties of different regions guided by the predictions of the model. These future directions will determine how these pathways interact and converge to embody different aspects of vision, action, and language.
This work was supported by NSF BCS grant 0920865 and Round 4 Bio-X IIP award. We thank Charlie Gross for valuable feedback on the history section of the manuscript.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.