Introduction

Electrophysiological recordings, optical imaging, and functional magnetic resonance imaging (fMRI) in nonhuman primates reveal individual neurons and clusters of neurons in inferotemporal (IT) cortex that respond preferentially to static and dynamic images of biologically relevant categories, such as faces and limbs (see Gross & Sergent, 1992; Tsao & Livingstone, 2008 for reviews). In humans, fMRI studies report activations in ventral temporal and lateral occipitotemporal cortices (VTC and LOTC, respectively; Fig. 1) with higher blood oxygen level dependent (BOLD) responses to images of faces and limbs than to images from a variety of control categories (see Op de Beeck, Haushofer, & Kanwisher, 2008; Peelen & Downing, 2007 for reviews). Even though hand- and face-selective neurons were first discovered over 40 years ago (Gross, Bender, & Rocha-Miranda, 1969; Gross, Rocha-Miranda, & Bender, 1972), the organization principles underlying these responses, and how they relate to face, limb, and body perception, are still unknown. Recent research has begun to shed light on the organization of face- and limb-selective activations in human LOTC and VTC, which we review here. This paper is organized into four main sections: (1) a brief history of the cortical organization of face- and limb-selective responses in monkeys and humans, including a Timeline summarizing this progression of knowledge; (2) a report of our recent findings (Weiner & Grill-Spector, 2010, 2011) of alternating face- and limb-selective regions in human LOTC and VTC using high-resolution fMRI; (3) a theoretical discussion of how these findings reveal a previously unconsidered organization principle of high-level visual cortex; and (4) a new model linking these findings across the ventral, lateral, and dorsal pathways of high-level visual cortex.

Fig. 1

Anatomical delineations of lateral occipitotemporal cortex (LOTC) and ventral temporal cortex (VTC). a LOTC (dashed outline) is the portion of cortex bounded by the lateral occipital sulcus (LOS), inferotemporal gyrus (ITG), middle temporal gyrus (MTG), and posterior superior temporal sulcus (STS). b VTC (dashed outline) is bounded by the occipitotemporal sulcus (OTS), middle of the fusiform gyrus just anterior to the mid‐fusiform sulcus (FG and mFus‐sulcus), collateral sulcus (CoS), and the posterior fusiform gyrus

History

A brief history regarding the organization of hand- and face-selective neurons in monkeys: scattered, then clustered, then columned

Scattered: hand- and face-selective neurons discovered and then neglected for more than a decade

In the early 1960s, little was known about how the visual system combines information to process complex shapes. At the time, Hubel and Wiesel (1965) proposed a hierarchy of sensory processing in the geniculo-striate system of the cat in which visual processing became more complex at each successive stage. In their scheme, this processing stream ended at area 19 (visual area III; V3), which, as Hubel and Wiesel acknowledged, could not account for the complex computations involved in object perception. Ablation studies provided some insight into processing beyond V3: removal of macaque IT, which was considered ‘association cortex’ at the time, produced specific deficits in visual recognition (Mishkin, 1966). Drawing on these two findings, as well as on his own clinical observations of a variety of visual agnosias (an inability to recognize visually presented objects) resulting from cortical lesions in humans, the neurophysiologist and neuropsychologist Jerzy Konorski proposed the existence of a face-selective ‘field’ lateral to V3 in visual cortex. Konorski further theorized the existence of regions selective for other ‘special’ classes of stimuli, such as limbs, words, and places. He referred to these regions as gnostic fields and to the neurons within them as gnostic units (Konorski, 1967). Around the same time, Charlie Gross was using single-unit electrophysiology to examine the visual properties of neurons within IT cortex of the macaque monkey (Gross, Schiller, Wells, & Gerstein, 1967; Fig. 2a). Because of this expertise, Gross was asked to review Konorski’s book for Science (Gross, 1968). A year after publishing this review, Gross et al., (1969) reported the first hand-selective neuron while measuring the properties of IT cells.
In this original study, a hand-selective neuron was defined as a cell responding more vigorously to silhouettes of hands than to a variety of other 2D shapes, such as circles, rectangles, and flower-like configurations. In a subsequent study, Gross et al., (1972) extended these findings by recording additional hand-selective neurons alongside face-selective neurons (defined in a comparable fashion) in the posterior portion of macaque TE (Fig. 2a), a cytoarchitectonic subdivision of the temporal lobe (von Bonin & Bailey, 1947).

Fig. 2

Schematic depicting the location of face‐selective cells in monkey superior temporal sulcus (STS) and inferotemporal (IT) cortex. a Top Lateral view of a macaque brain with the fundus of the STS unfolded and shaded in gray. Approximate locations of visual areas V1 and V4 are indicated, in addition to the superior temporal polysensory area (STP) in the upper bank of the STS, as well as IT areas TEO and the posterior and anterior ventral subdivisions of TE (TEpv and TEav). b Example coronal section illustrating the relationship between the upper bank, fundus, and lower bank of the STS where face cells have commonly been found. The location of the section is indicated by the vertical line in the lateral view in a. Early studies from Gross and colleagues typically recorded from the lower bank of the STS (Gross et al., 1969, 1972; Desimone et al., 1984), while the early studies from Perrett and Rolls recorded from the upper bank (Perrett et al., 1982, 1984, 1985; Rolls, 1984; Baylis et al., 1987). Area TPO mentioned in the text is a cytoarchitectonic subdivision of the upper bank, while areas TEa and TEm are adjacent subdivisions of the lower bank (Seltzer and Pandya, 1978; Baylis et al., 1987). Arrows indicate boundaries between cortical areas. Solid lines indicate lips and fundi of the sulci. Image adapted from Saleem et al., 2000. c The superior temporal sulcus has been enlarged from the image in a to illustrate recording sites from numerous studies that found face‐selective cells throughout STS and IT cortex. Dotted red outline indicates the clusters of cells identified by Harries and Perrett (1991). Image adapted from Perrett et al., 1992 with permission from authors

These findings of hand- and face-selective units were not received with much enthusiasm because the definition of macaque IT cortex as a visual area was itself controversial (see Gross, 2008 for review). In fact, even though Gross and colleagues were the first to systematically measure the visual properties of IT cortex (Gross et al., 1967, 1969, 1972), more than a decade passed before any group replicated the receptive field properties of IT neurons (Richmond, Wurtz, & Sato, 1983), let alone the finding of hand- and face-selective neurons within this cortical expanse. Contributing to the controversy was the apparent sparsity of hand- and face-selective neurons. For example, Gross et al., (1969) reported only one hand-selective unit (of 51 recorded) in the original study, and then three hand- and three face-selective neurons (out of 205 that were visually responsive in TE) in the second study (Gross et al., 1972). Such small numbers suggested that these cells were randomly scattered throughout IT cortex without any general organization principle, in stark contrast to the systematic organization of striate cortex (Hubel & Wiesel, 1962, 1968).

Clustered: face-selective cells are clumped together across several anatomical locations and hand-selective units are hard to find

Throughout the 1980s, the study of face-selective cells became more widely accepted, and researchers began documenting functional properties of these cells related to different aspects of face processing. It was unclear then (and remains unclear) what constitutes an appropriate control stimulus for complex stimuli such as faces and hands. Many of these studies defined face-selective units as cells that fired more vigorously to face images than both their spontaneous activity and a variety of control images, which ranged from brushes (Gross et al., 1972) to ‘3D junk objects’ (Perrett, Rolls, & Caan, 1982; Rolls, 1984). Cells were also typically tested for additional selectivity features, such as responses to oriented bars, 2D shapes of various colors, aversive stimuli (such as images of snakes), and tactile and auditory stimuli (Desimone, Albright, Gross, & Bruce, 1984). Finally, a further criterion was added: to be considered face-selective, a cell had to respond at least twice as strongly to faces as to the most effective control stimulus (e.g. Perrett et al., 1982). Face-selective neurons identified in this manner were reported to respond comparably across image formats (photograph, drawing) and across species (human and monkey faces; Bruce, Desimone, & Gross, 1981; Desimone et al., 1984). Studies also showed that responses of face-selective cells decreased when parts of the face were removed or scrambled (Bruce et al., 1981; Desimone et al., 1984; Perrett et al., 1982), were modulated by the relative distance between internal facial features (Yamane, Kaji, & Kawano, 1988), and were tuned to specific face viewpoints (Perrett et al., 1982; Desimone et al., 1984). Importantly, these properties were maintained across changes in size and position, suggesting a high-level representation (Desimone et al., 1984).
Notably, some cells were selective for facial features rather than for the composite face: these neurons responded to particular features alone, such as the eyes or hair, as strongly as (and sometimes more strongly than) to the whole face (Bruce et al., 1981; Perrett et al., 1982).
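The unit-classification criteria summarized above can be expressed as a simple decision rule. The following Python sketch is illustrative only: the function name, the use of mean firing rates (rather than, for example, baseline-subtracted responses), and the example numbers are assumptions, not the procedure of any particular study.

```python
def is_face_selective(face_rate, spontaneous_rate, control_rates, min_ratio=2.0):
    """Hypothetical helper: classify a unit from mean firing rates (spikes/s).

    A unit counts as face-selective if its response to faces exceeds its
    spontaneous rate, exceeds its response to every control stimulus, and is
    at least `min_ratio` times its response to the most effective control.
    """
    best_control = max(control_rates)
    return (
        face_rate > spontaneous_rate
        and face_rate > best_control
        and face_rate >= min_ratio * best_control
    )


# Hypothetical units: faces at 30 spikes/s vs. a best control of 12 -> selective;
# faces at 20 vs. 12 is stronger, but not twice as strong -> not selective.
print(is_face_selective(30.0, 5.0, [10.0, 12.0, 8.0]))  # True
print(is_face_selective(20.0, 5.0, [10.0, 12.0, 8.0]))  # False
```

Setting `min_ratio=1.0` reduces the rule to the weaker criterion of a larger response to faces than to every control image.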

In addition to these new insights into the functional properties of face-selective neurons, researchers began reporting larger populations of face-selective cells than in the initial measurements, as well as a correspondence between particular anatomical locations and clusters of face-selective cells. For example, in 1972, Gross and colleagues reported only 3 face-selective neurons out of 205 visually responsive cells (~1.5%) in posterior TE. Twelve years later, in recordings from a more anterior location, this proportion had increased to 34% (17/50 neurons; Desimone et al., 1984). Shortly thereafter, Baylis, Rolls, and Leonard (1987) measured the functional properties of neurons within different cytoarchitectonic subdivisions of the STS and showed that face-selective neurons were more densely clustered on the upper and lower banks of the STS than in ventral IT (see Fig. 2b for this anatomical distinction, adapted from Saleem, Suzuki, Tanaka, & Hashikawa, 2000). Specifically, Baylis and colleagues showed that area TPO in the upper bank of the STS (44/244; 18% face-selective) and areas TEa (53/250; 21%) and TEm (51/232; 22%) in the lower bank of the STS contained higher concentrations of face cells than other neighboring areas in the upper and lower banks of the STS (Baylis et al., 1987; parcellation terminology from Seltzer & Pandya, 1978; Fig. 2b).
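As a quick check on the proportions quoted above (and on the 1/51 figure from Gross et al., 1969), each percentage is simply the number of selective cells divided by the number of cells sampled. The counts below are taken from the text; the dictionary labels are only descriptive.

```python
def percent(selective, sampled):
    """Percentage of sampled cells that were selective, rounded to 0.1%."""
    return round(100 * selective / sampled, 1)


# (selective cells, cells sampled), as reported in the text
reports = {
    "Gross et al. (1969), hand-selective, IT": (1, 51),
    "Gross et al. (1972), face-selective, posterior TE": (3, 205),
    "Desimone et al. (1984), face-selective, more anterior site": (17, 50),
    "Baylis et al. (1987), face-selective, TPO": (44, 244),
    "Baylis et al. (1987), face-selective, TEa": (53, 250),
    "Baylis et al. (1987), face-selective, TEm": (51, 232),
}

for study, (k, n) in reports.items():
    print(f"{study}: {k}/{n} = {percent(k, n)}%")
```

The output reproduces the ~1.5%, 34%, 18%, 21%, and 22% values reported above, underscoring the more than twentyfold increase between the 1972 and 1984 samples.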

Taken together, by the end of the 1980s, several research groups had replicated the localization of face-selective neurons in both macaque IT and STS and had begun documenting that these cells were clustered. Still, neurons selective for static images of hands and limbs eluded researchers with the methods available at the time. For example, Perrett et al., (1982) reported five units that responded to images of hands, but their responses to hands were lower than those to faces (i.e. these neurons were actually face-selective). Thus, whether limb-selective neurons existed, and if so, how they were spatially organized in cortex relative to face-selective neurons, remained unknown.

figure a

Columnar organization: a general organization principle in macaque high-level visual cortex

In addition to the reports of clustered face-selective cells in particular anatomical locations (Baylis et al., 1987), Perrett et al., (1984) provided evidence for a potential columnar organization of face-selective cells within the STS, as well as of neurons selective for moving bodies (Perrett et al., 1985a, b). We interpret their proposal of IT organization as having three general features. First, selective cells are clustered in small patches (0.5–2 mm in diameter) on the cortical surface (Perrett et al., 1984, 1985a). Second, within each of these patches, columns extend as much as 2 mm in depth, with cells along a given vertical electrode penetration showing similar stimulus selectivities. Third, nearby columns show related selectivities, for example, for different rotations of the head (Perrett et al., 1985b). Tanaka and colleagues extended Perrett’s findings in a series of influential studies demonstrating a general columnar organization of IT cortex, whereby cells preferring similar features tended to cluster in columns about 0.4 mm in diameter oriented perpendicular to the cortical surface (Fujita, Tanaka, Ito, & Cheng, 1992; Tanaka, 1996; Tanaka, Saito, Fukada, & Moriya, 1991; Wang, Tanaka, & Tanifuji, 1996). Tanaka’s group showed this columnar organization for hand-selective cells (Tanaka et al., 1991), moderately complex features (Fujita et al., 1992; Tanaka et al., 1991), and face viewpoint (Wang et al., 1996). In addition to this fine-scale organization, Harries and Perrett (1991) reported a larger-scale organization in macaque STS: clusters of face cells approximately 3–4 mm in diameter along the STS, arranged periodically such that dense clusters of face-selective cells alternated with clusters of cells that were not face-selective, with an ‘inter-cluster distance’ on the order of 3 mm (Fig. 2c adapted from Perrett, Hietanen, Oram, & Benson, 1992).

There are two key differences between the findings of these groups. First, Perrett’s recordings were performed on the upper bank of the STS (Harries & Perrett, 1991; Perrett et al., 1985a; Perrett et al., 1984; Fig. 2b, c), while Tanaka’s recordings were in anterior TE (Fujita et al., 1992; Tanaka et al., 1991; Wang et al., 1996; Fig. 2b, c). Second, although both sets of findings point to a columnar organization, the definition of a column, and the theory associated with each definition, differ. In Perrett’s definition, many columns make up one cluster with a particular stimulus selectivity, and these clusters produce a large-scale periodic organization of multiple face patches in the STS. Tanaka’s columns represent a general organization principle in IT cortex for the representation of object features, with no additional macroscopic structure. Nevertheless, Tanaka and colleagues suggested that faces may have separate representations of facial features and configurations that are not shared by other objects (Tanaka, 1996). Despite these differences in theory and definition, converging results across these studies indicate a fine-scale (Fujita et al., 1992; Perrett et al., 1984, 1985a; Tanaka et al., 1991; Wang et al., 1996) and a potentially larger-scale structure (Harries & Perrett, 1991; Fig. 2c) in the organization of cells selective for faces, hands, and moving bodies in different portions of the temporal lobe, which had not been documented before.

Summarizing the organization of face- and hand-selective neurons in monkey STS and IT cortex from their discovery until the advent of fMRI in the early 1990s

When Gross and colleagues began to study IT cortex in the late 1960s and early 1970s (see Timeline), they presented data from hand- and face-selective neurons together. Although hand-selective neurons were discovered first, the study of face-selective neurons became firmly established in visual neuroscience 20 years later, whereas hand- and other limb-selective neurons continued to elude researchers. In the late 1980s, there seemed to be a general correspondence between anatomical location and clustering of face-selective cells, with a potential columnar organization. By the mid-1990s, combining neurophysiological recordings with optical imaging enabled the observation of a columnar structure in monkey IT. Furthermore, reports reappeared of neurons selective for static hands (Tanaka et al., 1991), the entire body (Wachsmuth, Oram, & Perrett, 1994), and moving hands (Perrett, Mistlin, Harries, & Chitty, 1990). During the same period, Perrett and collaborators also reported larger clusters of face-selective cells in macaque STS that were 3–4 mm wide with an inter-cluster distance of 3 mm, suggesting a periodicity of face-selective cells along the STS (Harries & Perrett, 1991). Although the picture of how macaque face- and hand-selective cells are organized had evolved from randomly scattered, to clustered, to periodically clustered with a columnar organization, how this organization related to human cortex was still largely unknown.

The fusiform face area: a trend begins for the study of category-selective regions in humans

With the advent of fMRI in 1992 (Kwong et al., 1992; Ogawa et al., 1992), a trend emerged in the mid-1990s in which researchers began to noninvasively map face-selective regions in the human brain. These studies were inspired by the neurophysiological findings in monkeys described above, as well as by behavioral and neural findings from neuropsychological case studies and invasive measurements in both patient and typical populations. Neuropsychological studies of face blindness, or prosopagnosia (Bodamer, 1947), suggested that damage to ventral occipitotemporal cortex, especially the right fusiform gyrus, results in specific deficits in face recognition that do not generalize to other modalities (Damasio, Damasio, & Van Hoesen, 1982) or to other classes of visual stimuli such as objects or tools (Benton, 1980; Damasio et al., 1982; De Renzi, 1986; Hecaen & Angelergues, 1962; Landis, Cummings, Christen, Bogen, & Imhof, 1986; McNeil & Warrington, 1993; Sergent & Signoret, 1992). Furthermore, intracranial recordings in human patients revealed face-selective responses in both VTC and LOTC. Using single-unit methods, Ojemann, Ojemann, and Lettich (1992) showed that neurons in the human right middle and superior temporal gyri responded more during tasks involving matching of facial identity and facial expression than during object naming or matching. Using subdural field potential recordings, a series of studies reported higher responses to faces than to words, letterstrings, numbers, colors, scrambled stimuli, and objects on the fusiform and inferotemporal gyri in both hemispheres (Allison et al., 1994a; Allison, McCarthy, Nobre, Puce, & Belger, 1994b; Nobre, Allison & McCarthy, 1994).
In typical populations, positron emission tomography (PET; Clark et al., 1996; Haxby et al., 1991, 1994; Sergent, Ohta, & MacDonald, 1992) studies reported functionally dissociable face-selective regions along the fusiform gyrus: the posterior fusiform gyrus and occipitotemporal sulcus were activated during face matching and face gender discrimination tasks (Haxby et al., 1994; Sergent et al., 1992), while the right mid-fusiform gyrus was activated during face identification (Sergent et al., 1992). Motivated by these findings of face-sensitive regions in VTC, early fMRI studies compared BOLD responses to images of faces with responses to scrambled faces, textures, common objects, or consonant strings and found a network of regions spanning the fusiform, inferotemporal, and inferior occipital gyri, as well as the superior temporal sulcus, that responded more strongly to intact faces (Clark et al., 1996; Puce, Allison, Asgari, Gore, & McCarthy, 1996; Puce, Allison, Gore, & McCarthy, 1995; see Timeline).

In 1997, a new trend emerged when Kanwisher, McDermott, and Chun (1997) introduced the functional localizer approach to examine the properties of face-selective regions. After first identifying a particular region of interest (ROI) in each subject with one set of functional scans (e.g. images of faces > images of objects), Kanwisher and colleagues used a variety of image types similar to those used in the early Gross and Perrett studies (e.g. faces with eyes removed, scrambled internal features, etc.) to examine the functional properties of these regions in independent sets of experiments. In doing so, they reported a single area in the fusiform gyrus specialized for perceiving faces: ‘Our strategy was to ask first whether any regions of occipitotemporal cortex were significantly more active during face than object viewing; only one such area (in the fusiform gyrus) was found consistently across most subjects’ (p. 4303). This led to the conclusion that a single area on the fusiform gyrus is selective for faces, labeled area FF (or the Fusiform Face Area, FFA; Kanwisher et al., 1997).
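The logic of the functional localizer approach, defining an ROI with one set of scans and then measuring its responses in independent data, can be sketched in a few lines of Python. Everything here is hypothetical (voxel values, condition names, the t > 3 cutoff), and the sketch omits the statistical modeling used in real fMRI analyses; it is meant only to show why selecting and testing on separate data keeps the ROI definition from biasing the test.

```python
from statistics import mean


def define_roi(localizer_t, threshold=3.0):
    """Select voxels whose localizer contrast (e.g. faces > objects) exceeds threshold."""
    return [voxel for voxel, t in enumerate(localizer_t) if t > threshold]


def roi_responses(test_runs, roi):
    """Average each condition's response over ROI voxels, using independent runs."""
    return {condition: mean(values[v] for v in roi)
            for condition, values in test_runs.items()}


# Hypothetical data for five voxels.
localizer_t = [0.5, 4.2, 5.1, 2.9, 6.0]   # t-values from the localizer scans
test_runs = {                               # responses from separate scans
    "intact faces":   [0.1, 1.2, 1.4, 0.6, 1.3],
    "faces, no eyes": [0.1, 0.8, 0.9, 0.5, 1.0],
    "objects":        [0.2, 0.4, 0.3, 0.5, 0.5],
}

roi = define_roi(localizer_t)
print(roi)  # [1, 2, 4]
for condition, response in roi_responses(test_runs, roi).items():
    print(f"{condition}: {response:.2f}")
```

Because the ROI is chosen from the localizer scans alone, a difference between, say, intact faces and faces without eyes in the test scans is not inflated by the voxel selection itself.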

However, several data points from the initial measurements of the FFA suggest otherwise. First, the original report illustrates multiple face-selective activations on the fusiform gyrus within the same subject, as well as vastly different loci of activation on the fusiform gyrus across subjects (figure replicated in Fig. 3a). This difference is reflected in the reported FFA Talairach coordinates, which vary by nearly 40 mm in the anterior–posterior dimension (even within the same subject across hemispheres; S8 from Kanwisher et al., 1997). To put this measurement in perspective, the average length of the fusiform gyrus is 50 mm, and the distance between V1 and MT is on the order of 50–60 mm (Tootell & Taylor, 1995), suggesting that the cortical expanse reported by Kanwisher and colleagues as a single brain area could contain multiple visual areas. Second, additional regions outside the fusiform show the same face selectivity (Fig. 3a). Third, regions labeled as the ‘FFA’ in some subjects have nearly the same Talairach coordinates as ‘other face activation loci’ in other subjects (e.g. the FFA of S8 at 40, −39, −6 mm and another activation of S5 at 40, −30, −9 mm). Fourth, the apparent consistency of one fusiform activation in 1997 might reflect the limitations of the functional mapping methods of that time rather than a principle of brain organization.
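To make the third point concrete, the straight-line distance between the two Talairach coordinates quoted above can be computed directly. This is a minimal Python sketch (it uses `math.dist`, available from Python 3.8); it treats Talairach space as Euclidean and ignores anatomical variability across subjects.

```python
import math

# Talairach coordinates (x, y, z in mm) as quoted in the text
s8_ffa = (40, -39, -6)     # labeled 'FFA' in subject S8
s5_other = (40, -30, -9)   # labeled as another face activation in subject S5

distance_mm = math.dist(s8_ffa, s5_other)  # sqrt(0**2 + 9**2 + 3**2)
print(f"{distance_mm:.1f} mm")              # 9.5 mm

# Express the separation in voxels for the 3-5 mm voxel sizes typical of early studies
for voxel_mm in (3, 5):
    print(f"{voxel_mm} mm voxels: {distance_mm / voxel_mm:.1f} voxels apart")
```

A separation of roughly two to three voxels leaves little basis for labeling one activation ‘FFA’ and the other not.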

Fig. 3

The many faces of the FFA. Researchers identify more than one region on the fusiform but typically refer to them all as the FFA because there have been no established criteria for accurate parcellation. a Two example subjects from Kanwisher et al., (1997). There is extensive variability in the location of the labeled FFA (defined from faces > objects, indicated by arrows) in both the superior–inferior dimension and the anterior–posterior dimension. This difference is hard to see with one axial slice. b Left Goesaert and Op de Beeck (2010) refer to three anatomically distinct face‐selective patches as the FFA (defined from faces > hands, torsos, buildings, and skyscrapers). Right Grill‐Spector et al., (2004) show two anatomically segregated regions and label them the FFA (defined from faces > objects). c Tsao et al., 2008 report two anterior temporal face‐selective patches, AFP1 and AFP2 (defined from faces > objects), but still label two similarly separate face‐selective regions on the fusiform as the FFA. Images adapted with permission from authors

For comparison, the first fMRI retinotopic mapping study to use cortical surface visualizations (published 2 years before the seminal FFA work) identified areas V1-V4v ventrally and V1-V3 dorsally (7 total maps; Sereno et al., 1995). Presently, neuroscientists have identified a series of eight maps extending ventrally from V1 to the temporal lobe (V1v-V3v, hV4, VO-1/2, PHC-1/2; Arcaro, McMains, Singer, & Kastner, 2009; Brewer, Liu, Wade, & Wandell, 2005; Wandell, Dumoulin, & Brewer, 2007), 12 maps extending dorsally into the parietal lobe (V1d-V3d, V3A, V3B, V7, IPS-1/5, SPL-1; Konen & Kastner, 2008; Silver, Ress, & Heeger, 2005; Swisher, Halko, Merabet, McMains, & Somers, 2007; Tootell et al., 1998), and more than four maps laterally (LO-1/2; TO-1/2; pMSTv, pFST, pV4t; Amano, Wandell, & Dumoulin, 2009; Huk, Dougherty, & Heeger, 2002; Kolster, Peeters, & Orban, 2010; Larsson & Heeger, 2006). Conservatively, that is three times as many visual field maps as were reported around 1995–1997. Notably, definitions of several of these maps have been revisited and re-parcellated as both methods and empirical ideas have evolved (e.g. V4/V8 vs. hV4/VO-1/VO-2: Brewer et al., 2005; Hadjikhani, Liu, Dale, Cavanagh, & Tootell, 1998; V4d vs. LO-1/LO-2: Hansen, Kay, & Gallant, 2007; Larsson & Heeger, 2006; Tootell & Hadjikhani, 2001; Wade, Augath, Logothetis, & Wandell, 2008). Yet over the same period, the concept of a single FFA has remained largely unrevised, even as improvements in scanning methods and visualizations have revealed more than one face-selective region on the fusiform (Fig. 3b, c). For example, it is not uncommon for research groups to refer to several face-selective regions spanning different anatomical locations (sometimes from the posterior fusiform gyrus all the way to the tip of the temporal lobe) collectively as the FFA (Fig. 3b).
At other times, research groups separate some face-selective regions from one another (AFP1 and AFP2; Tsao, Moeller, & Freiwald, 2008; Fig. 3c), yet still combine multiple fusiform regions into a single FFA, despite comparable anatomical distances separating each pair of regions (Fig. 3c). Such variability in FFA definitions illustrates the need for a framework that enforces consistent parcellation practices across research groups.
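The visual field map counts given above can be tallied explicitly. The Python sketch below expands the ranges as written (e.g. V1v-V3v as V1v, V2v, V3v and IPS-1/5 as IPS-1 through IPS-5); this expansion is our reading of the notation, and it reproduces the eight ventral and twelve dorsal maps stated in the text.

```python
maps_1995 = 7  # Sereno et al. (1995): V1-V4v ventrally and V1-V3 dorsally

ventral = ["V1v", "V2v", "V3v", "hV4", "VO-1", "VO-2", "PHC-1", "PHC-2"]
dorsal = ["V1d", "V2d", "V3d", "V3A", "V3B", "V7",
          "IPS-1", "IPS-2", "IPS-3", "IPS-4", "IPS-5", "SPL-1"]
lateral = ["LO-1", "LO-2", "TO-1", "TO-2", "pMSTv", "pFST", "pV4t"]

# Conservative count: only four lateral maps, as in "more than four maps laterally"
conservative = len(ventral) + len(dorsal) + 4
full = len(ventral) + len(dorsal) + len(lateral)

print(conservative, round(conservative / maps_1995, 1))  # 24 3.4
print(full, round(full / maps_1995, 1))                  # 27 3.9
```

Even the conservative tally exceeds three times the 1995 count, consistent with the statement in the text.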

Several years after the discovery of the FFA, the functional localizer approach was also used to identify a separate cortical module selective for the human body, labeled the extrastriate body area (EBA; Downing, Jiang, Shuman, & Kanwisher, 2001). Downing and colleagues compared images of headless bodies with an array of control stimuli, motivated by the findings of Wachsmuth et al., (1994), who found neurons in macaque STS that responded more robustly to images of headless bodies than to faces, whole bodies, and 3D objects. Downing et al., (2001) reported a single continuous region in human extrastriate cortex near hMT+ that responded more strongly to bodies and body parts than to objects and object parts (Fig. 4a).

Fig. 4

The many faces of the EBA. a Three example subjects from Downing et al., (2001). The combination of coronal volume‐based or inplane visualizations with large voxels and spatial smoothing obstructs the view of the underlying anatomical structures, as well as the precise spatial organization of the EBA relative to hMT+. b The spatial relationship between the EBA and hMT+ changes with different visualizations. On the sagittal slice (left) the EBA (red) is largely posterior and overlaps with hMT+ (yellow; from Downing et al., 2007), while on the cortical surface (right) the EBA (red) appears to surround hMT+ (yellow) in a ring‐like structure (from Spiridon et al., 2006). Images adapted with permission from authors

The organization of the EBA raises concerns parallel to those raised for the FFA. First, when the data are restricted to a series of coronal slices acquired with large functional voxels, as shown in Fig. 4a, it is difficult to determine whether the observed activation is one contiguous region or a series of regions. Second, it is also hard to determine the spatial organization of this functionally defined activation relative to adjacent regions from a single coronal slice. For example, in the rightmost image in Fig. 4a, the EBA (red) appears to be superior to hMT+ (green), but on sagittal slices the EBA also seems to extend ventrally beneath hMT+ (Fig. 4b, left; Downing, Wiggett, & Peelen, 2007). Further, on 3D surface visualizations, the EBA appears to surround hMT+ in a ring-like organization (Fig. 4b, right; Spiridon, Fischl, & Kanwisher, 2006), suggesting that the apparent relative organization of activations depends on the type of data visualization used. Note that adjacency on the brain volume can be misleading. Because of gyral and sulcal folding patterns, regions that appear to be nearby on the brain volume or on an inplane slice can actually be quite distant on the cortical surface (Fig. 5). These issues are exacerbated when fMRI acquisitions use large voxels and researchers spatially smooth data on the brain volume. The combination of these procedures can merge distant cortical activations into what appears to be a single cluster on the brain volume.
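The merging effect of spatial smoothing can be illustrated with a toy one-dimensional simulation. In this Python sketch all values are hypothetical: two 3 mm activations separated by a 5 mm gap are sampled at 1 mm, smoothed with an 8 mm FWHM Gaussian kernel, and thresholded at half of the profile’s maximum (a stand-in for a statistical threshold). Clusters are counted before and after smoothing.

```python
import math


def gaussian_kernel(fwhm_mm, step_mm=1.0):
    """Discrete Gaussian kernel normalized to sum to 1, truncated at 3 sigma."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    radius = math.ceil(3.0 * sigma / step_mm)
    weights = [math.exp(-((k * step_mm) ** 2) / (2.0 * sigma ** 2))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, kernel):
    """Convolve a 1D profile with a symmetric kernel, treating samples beyond the ends as 0."""
    r = len(kernel) // 2
    n = len(profile)
    return [sum(w * profile[i + k - r]
                for k, w in enumerate(kernel) if 0 <= i + k - r < n)
            for i in range(n)]


def count_clusters(profile, rel_threshold=0.5):
    """Count contiguous runs of samples above rel_threshold * max(profile)."""
    cutoff = rel_threshold * max(profile)
    clusters, inside = 0, False
    for value in profile:
        above = value > cutoff
        if above and not inside:
            clusters += 1
        inside = above
    return clusters


# Hypothetical 40 mm strip sampled at 1 mm: activations at 12-14 mm and 20-22 mm.
profile = [0.0] * 40
for i in list(range(12, 15)) + list(range(20, 23)):
    profile[i] = 1.0

print(count_clusters(profile))                                # 2
print(count_clusters(smooth(profile, gaussian_kernel(8.0))))  # 1
```

With a narrower kernel (for example, 2 mm FWHM) the same toy profile keeps its two activations separate, which illustrates why smaller voxels and less smoothing preserve fine-scale structure.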

Fig. 5

A problem with volume‐based data visualizations is resolved by cortical surface visualizations in single subjects. Left Example axial slice from a single subject. Middle Zoomed portion surrounding the posterior inferotemporal sulcus indicated by the dotted red outline. Two regions of interest (green, red) are illustrated in different anatomical locations that would appear to be one contiguous region using large functional voxels (3–5 mm on a side) and spatial smoothing (e.g. Figs. 3, 4). Notably, neurons that are close to one another in volume space because of sulcal and gyral folding patterns may perform different functions (e.g. Fig. 9). Right Inflated cortical surface illustrating the precise anatomical locations of these ROIs. The distance along the gray matter between these two ROIs is 15 mm, compared with 5 mm in volume space
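The discrepancy described for Fig. 5 can be reproduced with a toy cross-section of a sulcus. In this Python sketch the geometry is hypothetical, chosen so that the numbers echo the caption: the cortical sheet runs 5 mm down one bank, 5 mm across the fundus, and 5 mm up the opposite bank. Distance through the volume is the straight line between the two ROIs, whereas distance on the cortical surface follows the folded sheet (`math.dist` requires Python 3.8 or later).

```python
import math


def path_length(points):
    """Distance traveled along a polyline, i.e., within the folded cortical sheet."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical 2D cross-section (mm): down one sulcal bank, across the fundus, up the other bank.
sheet = [(0.0, 0.0), (0.0, -5.0), (5.0, -5.0), (5.0, 0.0)]
roi_a, roi_b = sheet[0], sheet[-1]   # two ROIs at the lips of opposite banks

print(math.dist(roi_a, roi_b))  # 5.0  (straight line through the volume)
print(path_length(sheet))       # 15.0 (along the cortical surface)
```

On a real cortical surface mesh, the along-sheet distance would be computed as a geodesic (for example, a shortest path over mesh edges) rather than along a hand-built polyline.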

Despite these issues, beyond the EBA and FFA, a number of face- and body part-selective regions have been identified and widely examined in VTC (fusiform body area, FBA; Peelen & Downing, 2005; Schwarzlose, Baker, & Kanwisher, 2005) and LOTC (occipital face area, OFA; Gauthier, Skudlarski, Gore, & Anderson, 2000), as well as in the posterior superior temporal sulcus (pSTS; Puce et al., 1995). Most recently, fMRI studies have identified an increasing number of face- and body-selective regions in high-level visual cortex, including two face-selective regions on the fusiform gyrus (FFA-1 and FFA-2; Pinsk et al., 2009), a region in anterior temporal cortex 40 mm anterior to the more anterior fusiform face-selective activation (Kriegeskorte, Formisano, Sorger, & Goebel, 2007; Nestor, Plaut, & Behrmann, 2011; Pinsk et al., 2009; Rajimehr, Young, & Tootell, 2009; Tsao et al., 2008), and two regions on the anterior and middle STS (Calder et al., 2007; Pinsk et al., 2009; Winston, Henson, Fine-Goulden, & Dolan, 2004). Likewise, fMRI studies of body part-selective regions have documented more than one activation on the fusiform gyrus (FBA-1 and FBA-2; Pinsk et al., 2009), as well as focal selectivity for specific body parts (hands, torsos, and legs) in LOTC and VTC (Bracci, Ietswaart, Peelen, & Cavina-Pratesi, 2010; Chan, Kravitz, Truong, Arizpe, & Baker, 2010; Op de Beeck, Brants, Baeck, & Wagemans, 2010; Orlov, Makin, & Zohary, 2010).

Critically, despite the discovery of many face- and body part-selective regions in high-level visual cortex, there is no theoretical model of the spatial layout of these regions relative to each other and to anatomical landmarks. To address this gap in knowledge, we recently conducted a series of experiments using high-resolution fMRI to systematically examine the fine-scale spatial organization of both face- and limb-selective regions in VTC and LOTC, motivated by the following questions:

  1. Is there a consistent spatial organization of face- and limb-selective regions in ventral temporal cortex?

  2. If so, does this organization principle of a reliable spatial relationship among face- and limb-selective regions extend to lateral occipitotemporal cortex?

Summary of recent findings

Face- and limb-selective regions alternate throughout ventral temporal and lateral occipitotemporal cortices

In a series of experiments using higher-resolution fMRI (1.5 mm voxels) than past studies (3–5 mm voxels), we examined the spatial characteristics of face- and limb-selective activations with a different approach from the one typically used. Presently, researchers commonly label any face-selective voxels in the fusiform gyrus as ‘FFA’ and any body part-selective voxels in LOTC as ‘EBA’ (Figs. 3, 4). This approach results in extensive variability in the anatomical location of these areas across subjects and research groups (Figs. 3, 4). Such variability can produce an inconsistent spatial relationship among functional regions, which in turn affects how the organization is interpreted. Often, this inconsistency is taken to reflect substantial inter-subject variability of activations in human high-level visual cortex.

Our new approach to systematically parcellating face- and body part-selective regions (Weiner & Grill-Spector, 2010, 2011) applies the well-known principles used to parcellate early retinotopic areas. We delineate activations in single subjects on their cortical surfaces using anatomical and functional criteria, drawing boundaries between functionally defined regions wherever selectivity changes. Face-selective regions were defined by higher BOLD responses to images of faces than to images of limbs, flowers, cars, guitars, and houses (t > 3, P < 0.002, voxel level). Limb-selective regions were identified by comparing BOLD responses to images of limbs with responses to images of faces, flowers, cars, guitars, and houses (t > 3, P < 0.002, voxel level; see Weiner & Grill-Spector, 2010, 2011 for details). We chose these comparison stimuli because each forms a visually coherent category and together they provide a broad baseline of comparison objects. Limbs were used as representative body part stimuli because they are the stimuli most commonly used to localize the EBA and FBA (Supplemental Table 1 from Weiner & Grill-Spector, 2011). These contrasts typically yield multiple activations rather than a single EBA and FFA; this is often visible in prior figures but has not been addressed in print (Figs. 3, 4). To parcellate consistently across subjects, we treated two regions with the same selectivity as distinct if they were anatomically segregated and a region with a different selectivity lay between them. If no intervening cluster was present, activations in close proximity to one another were merged. We then examined the spatial relationship of face- and limb-selective regions relative to (1) each other, (2) known visual field maps, and (3) other functionally defined regions, such as hMT+, that are associated with stable anatomical landmarks.
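The parcellation rule described above can be sketched in a simplified form. This is a hypothetical one-dimensional illustration, not the authors' code. It assumes that voxel-wise t-values for the face and limb contrasts have already been sampled at consecutive points along a path on the cortical surface, and it uses the t > 3 threshold from the text. The parameter `max_gap_mm` is a made-up stand-in for "close proximity", which the text does not quantify. Points passing both contrasts are treated as unlabeled, which is a simplification. Anatomical criteria are ignored entirely.

```python
from itertools import groupby

def label_points(t_face, t_limb, t_thresh=3.0):
    """Label each surface point 'F' (face-selective), 'L' (limb-selective), or '.'."""
    labels = []
    for tf, tl in zip(t_face, t_limb):
        is_face, is_limb = tf > t_thresh, tl > t_thresh
        if is_face and not is_limb:
            labels.append("F")
        elif is_limb and not is_face:
            labels.append("L")
        else:
            labels.append(".")  # not selective, or passes both contrasts (simplification)
    return labels

def parcellate(labels, spacing_mm=1.5, max_gap_mm=3.0):
    """Return (label, start, stop) regions along the path; stop is exclusive.

    Same-label runs are merged only if no run with a different selectivity lies
    between them and the unlabeled gap is at most max_gap_mm (hypothetical value).
    """
    regions = []
    pos = 0
    for label, run in groupby(labels):
        start = pos
        pos += len(list(run))
        if label == ".":
            continue
        # regions[-1] is the nearest selective run to the left. If it has the same
        # label, only unlabeled points separate the two runs.
        if regions and regions[-1][0] == label:
            gap_mm = (start - regions[-1][2]) * spacing_mm
            if gap_mm <= max_gap_mm:
                regions[-1][2] = pos
                continue
        regions.append([label, start, pos])
    return [tuple(r) for r in regions]

print(parcellate(list("..FFF.FF...LLL..FFF")))
# -> [('F', 2, 8), ('L', 11, 14), ('F', 16, 19)]
```

In this example the two nearby face runs are merged into one region. The last face run stays separate because a limb-selective region lies between it and the first. On a real cortical surface, the same logic would operate on mesh neighborhoods rather than on a single path.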

Alternating and adjacent face- and limb-selective regions in occipitotemporal cortex

Visualizing face- and limb-selective activations on the cortical surface reveals multiple face- and limb-selective clusters throughout occipitotemporal cortex. Together, these clusters form a continuous topographic representation extending from lateral occipitotemporal cortex into ventral temporal cortex. Figure 6 illustrates this organization on the inflated cortical surfaces of three individual subjects and shows four notable findings. First, there are multiple face- and limb-selective regions with a periodic organization throughout occipitotemporal cortex. Second, face- and limb-selective regions complement one another: the ‘inter-cluster distance’ between two face-selective regions is commonly filled by a limb-selective region, and vice versa. Third, this organization is consistent across individual subjects (see also Weiner & Grill-Spector, 2010, 2011; Weiner, Sayres, Vinberg, & Grill-Spector, 2010) and is reliable across experimental paradigms, tasks, and time (Fig. 7). Fourth, several face- and limb-selective regions radiate in a ring-like organization around a well-known functional region, the human motion-selective complex (hMT+; dotted black line in Fig. 6). The location of hMT+ has been widely examined and is associated with a particular anatomical landmark, the posterior inferotemporal sulcus (Dumoulin et al., 2000; Fig. 1). This arrangement therefore suggests that each of these face- and body part-selective regions can also be associated with anatomical landmarks. Below, we describe the reliable anatomical and functional boundaries that divide this map of face- and limb-selective regions, first in VTC and then in LOTC.

Fig. 6
figure 6

Face‐ and limb‐selective regions alternate in ventral temporal cortex (VTC) and lateral occipitotemporal cortex (LOTC). Face‐selective and limb‐selective activations on the inflated right cortical surfaces of three example subjects. In each inset, black rectangles indicate the imaged region of VTC and LOTC in these higher‐resolution functional scans (1.5 × 1.5 × 3 mm voxels). hMT+ is indicated by the dotted black outline. In LOTC (superior portion of each image), face‐ and limb‐selective regions radiate around hMT+ in an alternating fashion. In VTC (inferior portion of each image), this alternation among face‐ and limb‐selective regions continues. For clarity, voxels responding comparably to both faces and limbs are not colored separately. Acronyms: OTS: occipitotemporal sulcus; ITS: inferotemporal sulcus; STS: superior temporal sulcus

Fig. 7
figure 7

Stable response amplitudes to object categories across experiments. Left Zoomed portion of an inflated right hemisphere schematically illustrating the locations of four ROIs in ventral temporal cortex: a pFus‐faces, b OTS‐limbs, c mFus disk ROI, and d mFus‐faces. All ROIs were defined functionally from localizer scans except for the disk ROI, which was defined as a 10 mm diameter disk on the cortical surface in the anatomical extent separating mFus‐faces and pFus‐faces. ROIs were defined from one session and response amplitudes were extracted from three independent experiments either from the same day (event‐related) or five months later (two block design experiments). The event‐related experiment used four categories, while the other experiments used six. Responses are relative to a fixation baseline and averaged across hemispheres and subjects. Error bars indicate SEMs across subjects. Adapted from Weiner and Grill‐Spector (2010)

Reliable anatomical and functional boundaries to delineate face- and limb-selective regions in ventral temporal cortex

In VTC, we find alternating face- and limb-selective regions along the posterior 30 mm of the fusiform gyrus, extending laterally into the occipitotemporal sulcus (OTS; inferior activations in Fig. 6). This organization has two features that prior studies have not documented. First, there are two anatomically distinct face-selective regions on the fusiform gyrus (Weiner & Grill-Spector, 2010; Weiner et al., 2010). One lies on the posterior fusiform gyrus, and we refer to it as pFus-faces. The other lies on the mid-fusiform sulcus, and we refer to it as mFus-faces. The centers of these face-selective regions are spatially dissociable: mFus-faces is 15 mm anterior to pFus-faces (Talairach coordinates in Table 1). Further, these two face-selective regions are separated by regions with different selectivity. One is a limb-selective region (Fig. 7b) located laterally on the OTS. We refer to it as OTS-limbs rather than the FBA because it seldom extends onto the fusiform gyrus, even in the original reports (see Peelen & Downing, 2005; Schwarzlose et al., 2005). The other is a more medial fusiform activation that responds more strongly to many object categories than to scrambled versions of these images (Fig. 7c; previously referred to as mFus-objects; see Grill-Spector, 2003). Second, the face- and limb-selective regions have a preserved spatial relationship relative to one another across subjects and hemispheres. mFus-faces is anterior and medial to a more posterior and lateral OTS-limbs, and pFus-faces is consistently posterior and medial to a more anterior and lateral OTS-limbs (Fig. 6). The locations of these activations are also spatially reliable relative to the nearby visual field maps hV4, VO-1, and VO-2 (see Weiner & Grill-Spector, 2010). Importantly, these activations are reproducible over a span of 3 years, across imaging resolutions, and across different contrasts used for localization (Supplementary Figs. 1, 2 and Supplementary Materials). These reliable anatomical and functional boundaries indicate a previously undocumented topographic relationship among face- and limb-selective regions in VTC.

Table 1 Location of face- and limb-selective regions in Talairach space (SDs across 9–11 subjects)

Finding a consistent anatomical location and spatial relation among functional activations is important because these two criteria reflect fundamental principles of cortical organization. These two principles have been used to parcellate cortex in the macaque (Felleman & Van Essen, 1991) and are evident among early retinotopic areas across primate species. For example, in humans, a series of retinotopic maps in each hemisphere extends dorsally from V1 in a specific order with particular characteristics. These are a hemi-field representation (V1), two mirror-reversed quarter-fields (V2d, V3d), and a second hemi-field representation (V3a). Because this mapping is consistent across subjects, researchers can define these regions in the same way in each individual subject. Our data extend these principles of consistent anatomical location and spatial relation among activations to high-level regions. They provide strong evidence for a parsimonious organization principle applicable to the entire visual system. In contrast, domain specificity (the principle used to derive the FFA and EBA) proposes separate organization principles across visual cortex (Kanwisher, 2010). These are retinotopy in early and intermediate visual areas, and functionally specialized modules for a select number of categories in high-level visual cortex. As a result, the visual system is dichotomized into highly organized early visual areas and rather disorganized high-level visual areas. However, this view of high-level visual cortex as disorganized is problematic. As we have shown, prior definitions of the FFA violate the two principles of anatomical location and spatial relationship. Both the anatomical location of the FFA and the spatial relationship between the FFA and FBA change from subject to subject (Peelen & Downing, 2005; Peelen et al., 2006; Pinsk et al., 2009; Schwarzlose et al., 2005). The present data resolve this discrepancy by showing that there are actually two face-selective regions 15 mm apart on the fusiform gyrus, rather than a single FFA, and that these regions are reliably separated by the limb-selective region OTS-limbs.

These findings in VTC lead to an important question: Does this consistent topographic relationship among face- and limb-selective regions extend to LOTC? If so, this would suggest that the parsimonious organization principle discovered in VTC is generalizable throughout high-level visual cortex. We recently addressed this question empirically (Weiner & Grill-Spector, 2011) and summarize our results below.

Reliable anatomical and functional boundaries to delineate face- and limb-selective regions in lateral occipitotemporal cortex

To test whether the consistent topographic relationship among face- and limb-selective regions observed in VTC extends to other portions of the brain, we measured face- and limb-selective responses in each subject’s LOTC. We used the same analyses described in the previous section. In addition, we localized hMT+ in each subject because it is adjacent to these activations. hMT+ is also associated with a specific anatomical location on the ascending limb of the posterior inferotemporal sulcus (Amano et al., 2009; DeYoe et al., 1996; Dumoulin et al., 2000; Tootell et al., 1995; light blue outlined in black in Fig. 1). It therefore serves as a reliable anchor from which to generate functional boundaries that are closely linked to the underlying anatomy (see Weiner & Grill-Spector, 2011 for details). As in the analysis of VTC organization, we examined the spatial organization of limb- and face-selective regions in LOTC relative to each other. We also examined it relative to (1) anatomical landmarks, (2) hMT+, and (3) known visual field maps.

As in VTC, we report three important findings regarding the functional organization of LOTC (Fig. 6; Supplementary Figs. 3, 4). First, there are several face- and limb-selective activations in LOTC in distinct anatomical locations. Second, these activations have a consistent spatial organization relative to each other. Third, they have a consistent spatial organization relative to hMT+. Specifically, we do not find evidence for a single EBA in LOTC, as is commonly reported. Instead, we find a series of limb-selective activations located around the perimeter of hMT+ (dotted black line in Fig. 6). Each is associated with a distinct anatomical landmark and has a consistent spatial relation to hMT+ (see Table 1 for Talairach coordinates). The first activation is located on the lateral occipital sulcus/inferior portion of the middle occipital gyrus (LOS/MOG), posterior to hMT+. The second is located on the inferior temporal gyrus (ITG), inferior to hMT+. The third is located on the middle temporal gyrus (MTG), anterior to hMT+. This crescent organization surrounding hMT+ is reproducible over a span of 3 years (Supplementary Fig. 3). It is also reproducible across a variety of contrasts using different body part and control stimuli (Supplementary Fig. 4). Furthermore, limb-selective ROIs originally defined from anatomical landmarks and their spatial relationship to hMT+ accurately predict functional differences across these ROIs 3 years later (Supplementary Fig. 4).

Notably, LOTC limb-selective regions also have a consistent organization relative to face-selective regions, as illustrated in Fig. 6 (see also Supplemental Fig. 1 from Weiner & Grill-Spector, 2010). In particular, alternating face- and limb-selective regions form a ring that surrounds hMT+ but largely does not encroach on it. Specifically, the face-selective pSTS is superior to hMT+ and located between the limb-selective LOS/MOG and the limb-selective MTG. Likewise, the face-selective IOG is located between the limb-selective LOS/MOG and the limb-selective ITG, at the inferior corner of hMT+.

Taken together, the alternating series of face- and limb-selective regions in VTC extends to LOTC. This indicates that the topographic relationship between face- and limb-selective regions generalizes across high-level visual cortex. Furthermore, our data show that there is neither a single EBA in LOTC nor a single FFA in VTC. Instead, these activations have a fine-scale spatial organization relative to one another and to specific anatomical landmarks. This organization may have been missed previously for methodological reasons, such as scanning with larger voxels (>3 mm) and the use of inplane visualizations. Below, we directly relate the stability of our higher-resolution measurements to measurements with larger voxels and different visualizations.

Methods and measurements produce theories: the way in which data are acquired, analyzed, and visualized can lead to misleading interpretations

Theoretical interpretations of functional organization measured with fMRI depend on a variety of factors, such as the way in which data are acquired, analyzed, and visualized. In Table 2, we summarize methodological recommendations to improve the mapping of functional activations in high-level visual cortex. When possible, scanning with smaller functional voxels (1–2 mm) is encouraged. Compared with larger voxels, the higher spatial resolution reduces the effects of partial voluming and susceptibility artifacts (Supplementary Fig. 5; Winawer, Horiguchi, Sayres, Amano, & Wandell, 2010), which in turn increases the spatial specificity of measurements. In particular, artifacts produced by the transverse sinus can affect the detection of functional activations in the posterior fusiform, inferior occipital, and inferotemporal gyri (Supplementary Fig. 5; Winawer et al., 2010). Furthermore, when data are restricted to gray matter, the organization measured with small (~1.5 mm) and large (~3–4 mm) voxels is similar. However, spatial smoothing combined with not segmenting gray from white matter produces spatially inaccurate measurements (Fig. 8; Supplementary Fig. 6). In addition, 3D surface visualizations in single subjects provide a bird’s-eye view of the global organization of high-level visual cortex. They are not restricted to a particular slice orientation, as inplane and volume visualizations are.
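A toy one-dimensional simulation shows how spatial smoothing can merge neighboring regions, as in Fig. 8. The profile below is synthetic, sampled every 1.5 mm, and all values are arbitrary and not taken from the data. It contains two face-selective segments with an arbitrary value of 6, a threshold of 3, and a 3 mm gap between them. The profile is convolved with a normalized Gaussian kernel whose width is given as full width at half maximum (FWHM), using σ = FWHM / (2√(2 ln 2)). The sketch then counts contiguous suprathreshold runs.

```python
import math

def gaussian_smooth(values, fwhm_mm, spacing_mm):
    """Convolve a 1D profile with a normalized Gaussian kernel (zero padding at the ends)."""
    if fwhm_mm <= 0:
        return list(values)
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / spacing_mm  # in samples
    radius = math.ceil(4 * sigma)
    kernel = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        smoothed.append(acc / norm)
    return smoothed

def count_clusters(values, threshold):
    """Number of contiguous runs of samples strictly above threshold."""
    clusters, inside = 0, False
    for v in values:
        if v > threshold and not inside:
            clusters += 1
        inside = v > threshold
    return clusters

SPACING_MM = 1.5
THRESHOLD = 3.0
# Synthetic profile: two 4-sample (6 mm) face-selective segments separated by a
# 2-sample (3 mm) gap, embedded in zeros. Values are arbitrary.
profile = [0.0] * 10 + [6.0] * 4 + [0.0] * 2 + [6.0] * 4 + [0.0] * 10

for fwhm in (0.0, 4.0, 8.0):
    n = count_clusters(gaussian_smooth(profile, fwhm, SPACING_MM), THRESHOLD)
    print(f"FWHM {fwhm:.0f} mm: {n} suprathreshold cluster(s)")
```

In this particular profile, the two segments stay separate without smoothing and with a 4 mm kernel. With an 8 mm kernel they merge into one cluster. Whether smoothing merges real regions depends on their separation and amplitude, and Fig. 8 adds the further effect of not restricting data to gray matter. The sketch is only meant to show the mechanism.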

Table 2 Recommendations for methodological decisions in fMRI data analysis pipelines in high-level visual cortex
Fig. 8
figure 8

Alternation of face‐ and limb‐selective regions is also evident using larger functional voxels and inplane visualizations, but not with spatial smoothing. An example inplane slice from subject S3 acquired with voxels eight times as large (3.75 × 3.75 × 4 mm) as our HR‐fMRI scans. Left Face‐selective regions (red). Middle Face‐selective regions with limb‐selective regions (green), and their overlap (yellow). Labeling of face‐selective regions is possible using limb‐selective regions as a guide (and vice versa). Right With spatial smoothing and not restricting data to gray matter, however, mFus‐ and pFus‐faces merge to a single region, and OTS‐ and ITG‐limbs merge. The top rightmost image is smoothed with a 4 mm kernel and the bottom rightmost image is smoothed with an 8 mm kernel

Fig. 9
figure 9

Functional differences among VTC and LOTC activations that would be missed if regions were defined as FFA and EBA. a Difference in responses to blocks of nonrepeated compared to repeated images (fMRI‐adaptation level) averaged across categories and subjects. fMRI‐adaptation was significantly larger in mFus‐faces than pFus‐faces (* P < 0.03), illustrating functional differences between these ROIs (adapted from Fig. 3, Weiner et al., 2010). b Mean responses across subjects to limbs presented contralaterally versus ipsilaterally (contralateral bias) and limbs presented foveally versus contralaterally (foveal bias). The limb‐selective LOS/MOG illustrates a significantly greater contralateral bias than foveal bias (* P < 0.05), while the ITG and MTG do not (adapted from Weiner & Grill‐Spector, 2011). Error bars in both panels indicate SEMs across subjects

Taken together, our exploration of these methodological issues (Supplementary Materials) indicates that spatial smoothing is the factor most detrimental to accurately examining the spatial organization of functional activations. Even researchers using large voxels and inplane visualizations can implement the parcellation methods used here, provided that spatial smoothing is not applied (Fig. 8). Importantly, defining functional regions precisely, at the correct spatial scale and anatomical location, reveals functional distinctions among activations. These distinctions are lost when data are spatially smoothed or inaccurately combined (e.g. Figs. 3, 4). For example, in VTC, mFus-faces shows more fMRI-adaptation to repeated images than pFus-faces (Fig. 9a; adapted from Weiner et al., 2010). This illustrates a potential hierarchical organization, based on adaptation characteristics, extending from V1 into VTC. Similarly, functional differences are found in LOTC, where LOS-, ITG-, and MTG-limbs have different retinotopic properties. LOS-limbs has a strong contralateral bias, which decreases progressively toward MTG-limbs with a concomitant increase in foveal bias (Fig. 9b; adapted from Weiner & Grill-Spector, 2011).
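The quantities in Fig. 9 are differences between condition responses, summarized as a mean and an SEM across subjects. The sketch below shows that arithmetic. It assumes per-subject response amplitudes (for example, percent signal change) have already been extracted for each ROI, and for Fig. 9a averaged across categories. The function names are hypothetical. The significance tests reported in the text are not reproduced.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs at least two subjects."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def adaptation_level(nonrepeated, repeated):
    """Per-subject fMRI-adaptation: response to nonrepeated minus repeated blocks."""
    return [n - r for n, r in zip(nonrepeated, repeated)]

def contralateral_bias(contra, ipsi):
    """Per-subject response to contralateral minus ipsilateral limb images."""
    return [c - i for c, i in zip(contra, ipsi)]

def foveal_bias(foveal, contra):
    """Per-subject response to foveal minus contralateral limb images."""
    return [f - c for f, c in zip(foveal, contra)]
```

For a single ROI, `mean_sem(adaptation_level(nonrep, rep))` would give one bar and its error bar in Fig. 9a. Here `nonrep` and `rep` are hypothetical lists holding one amplitude per subject.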

Theoretical implications and discussion

Neural representations of faces and limbs: cortical neighbors in lateral and ventral high-level visual cortex

The current paper elaborates on our recent findings of a series of alternating and adjacent face- and limb-selective regions organized topographically in VTC and LOTC (summarized in Fig. 10). Specifically, each face- and limb-selective region is situated in a particular anatomical location and has a consistent spatial relationship to neighboring high-level visual regions. This consistent spatial relationship also applies to the location of face- and limb-selective regions relative to visual field maps in VTC and LOTC. These findings indicate a single organization principle that extends from early to high-level visual cortex (see Weiner & Grill-Spector, 2010, 2011).

Fig. 10
figure 10

Summary schematic depicting the organization of face‐ and limb‐selective regions throughout high‐level visual cortex. Inset indicates the anatomical location of the summary schematic on the cortex. LOTC: Face‐ and limb‐selective regions radiate around the perimeter of hMT+, which can be further divided into MT and MST (Amano et al., 2009). Each of these face‐ and limb‐selective activations is situated in a different anatomical location where the spatial relationship among activations is preserved. Importantly, no face‐ or limb‐selective voxels are found in the center of hMT+ (which is also the location of the upper vertical meridian shared between MT and MST). VTC: This alternation of face‐ and limb‐selective regions extends ventrally where two face‐selective activations on the fusiform are separated by a limb‐selective region located in the OTS

We expand below on the implications of these findings: (1) as a new general organization principle in high-level visual cortex, (2) for the comparison of cortical organization between typical and atypical populations, and (3) in relation to the organization of face- and body part-selective regions in non-human primates.

Topographic relationship among face- and limb-selective regions as a new organization principle in human high-level visual cortex

Faces and limbs are ecologically and socially relevant classes of visual stimuli with a statistical regularity in their visual appearance: heads are most often above bodies, and limbs are often just offset from the body. The present data show that this consistent topography of faces and limbs in stimulus space is reflected in cortical space, where face- and limb-selective regions have a consistent topographic arrangement. A retinotopic map is a well-known example of a correspondence between stimulus space and neural representation. Two adjacent points in visual space are projected onto two adjacent points on the retina. This retinotopic relationship extends to visual cortex, where adjacent points on the retina map to adjacent locations on the cortical surface (Wandell & Winawer, 2011 for review). Unlike in retinotopic mapping, our stimuli are not presented in a way that smoothly varies the relative positions of faces and limbs. Instead, all our stimuli are presented in the center of the visual field, in line with the early studies from Kanwisher and colleagues. However, when one sees an image of a face, it is understood that the body (limbs included) is underneath it, and vice versa. We propose that the visual system has incorporated the regularity with which faces, limbs, and bodies appear relative to one another in everyday life. The result is a map of alternating face- and limb-selective regions throughout high-level visual cortex. A recent paper supports this proposal by illustrating a topographic organization of body part representations (upper/lower half of the face, arms, legs, and torso) in human LOTC (Orlov et al., 2010). However, Orlov and colleagues did not propose a parcellation scheme or report the regularity of face- and limb-selective organization relative to anatomical landmarks. Thus, our data introduce a new set of principles, and in turn a previously unconsidered organization, of high-level visual cortex. In this organization, faces and limbs are represented systematically and alternately in predictable anatomical locations.

We label these face- and limb-selective activations as ‘regions’ rather than ‘areas’ and preface the type of stimulus selectivity with the anatomical loci of these regions most consistently found across subjects (e.g. pFus-faces or OTS-limbs). Such a labeling reflects the preserved spatial relationship of anatomical landmarks, as well as the alternating stimulus selectivity. Since large-scale neuroanatomy is stable, the spatial relationship of anatomical landmarks will be preserved, as will the relationship among the associated functional regions. For example, anatomically, the ITG will always be anterior to the IOG. Functionally, then, ITG-limbs will always be anterior to IOG-faces. This principle is also applicable within a given anatomical structure such as the fusiform gyrus where mFus-faces will always be anterior to pFus-faces. Consequently, using these labels will increase the generalizability within a subject across time (Supplementary Figs. 1–4), across subjects, and across research groups examining either typical or atypical subject populations.

Anatomical location and spatial relationship of high-level visual regions are important for the comparison of cortical organization between typical and atypical populations

The present data indicate a correspondence between gross anatomical landmarks and a given functional region in high-level visual cortex. This correspondence may allow identified regions to be integrated with underlying anatomical structure (such as cytoarchitecture) in future studies of typical and clinical populations. A recent study examining cytoarchitectonic differences of the fusiform gyrus in post-mortem brains of autistic and typical subjects illustrates the utility of this prospect (van Kooten et al., 2008). Specifically, van Kooten et al. (2008) write: ‘The fusiform face area (FFA) within the (fusiform gyrus) could not be identified separately because neither gross anatomical landmarks nor cytoarchitectonic criteria have been established in the literature to identify the FFA within the (fusiform gyrus) in human post-mortem brains’ (p. 989). The present findings now make it possible for future studies to link known functional properties with underlying neuroanatomical structures in high-level visual cortex of either typical or atypical populations. The present mapping methods are conducted in individual subjects with high-resolution fMRI. This ensures precise functional localization that respects the nuances of each subject’s anatomy, which group analyses do not allow. Nevertheless, the multiple mapping methods used here could be combined across individual subjects into an average functional brain that respects the macro-anatomical structure of each subject. This has recently been illustrated using a probabilistic atlas approach (Frost & Goebel, 2011).

In addition to gross anatomical landmarks, we also provide a precise model of high-level visual cortex documenting a preserved spatial relationship among regions (Fig. 10). This model can serve as a reference for comparing cortical organization in clinical populations. Current fMRI work on the cortical consequences of deficits in face perception examines the presence, size, connections, or functional properties of a specific activation in patient populations (Golarai et al., 2010; Rossion et al., 2003; Schiltz et al., 2006; Thomas et al., 2009). However, how category-selective regions are organized relative to other high-level visual regions is also critical. For example, some clinical conditions may be associated with a spatial reorganization of these regions. If so, the measurements proposed here, together with a comparison to the standard model (Fig. 10), could serve as diagnostics for identifying the condition in an individual subject.

Clustered and connected: fMRI and connectivity studies in non-human primates show an interconnected system of face-selective regions—but what about limbs?

Technological advances enabling fMRI in awake, behaving non-human primates (Logothetis, Guggenberger, Peled, & Pauls, 1999) have revealed as many as six face-selective patches in specific anatomical locations in the macaque temporal lobe (Freiwald & Tsao, 2010; Freiwald, Tsao, & Livingstone, 2009; Hadj-Bouziane, Bell, Knusten, Ungerleider, & Tootell, 2008; Hoffman, Gothard, Schmid, & Logothetis, 2007; Ku, Tolias, Logothetis, & Goense, 2011; Pinsk et al., 2009; Pinsk, DeSimone, Moore, Gross, & Kastner, 2005; Tsao, Freiwald, Knutsen, Mandeville, & Tootell, 2003; Tsao, Freiwald, Tootell, & Livingstone, 2006; Tsao et al., 2008). In these patches, face-selective neurons make up between 52% and 97% of visually responsive neurons (Freiwald & Tsao, 2010; Freiwald et al., 2009; Moeller, Freiwald, & Tsao, 2008; Tsao et al., 2006). This is much higher than the 20–35% reported by early neurophysiology studies (Baylis et al., 1987; Desimone et al., 1984). Microstimulation methods further demonstrate that these face-selective patches are interconnected, forming an extended face-selective network (Moeller et al., 2008). But what about limbs?

Just as the study of face-selective neurons gained popularity before the study of body part-selective neurons (see History section), more fMRI studies in the monkey have examined face-selective regions than regions selective for images of limbs or bodies. However, a few studies using either fMRI or optical imaging in monkeys report body part-selective patches cortically proximate to face-selective clusters (Borra, Ichinohe, Sato, Tanifuji, & Rockland, 2010; Pinsk et al., 2005, 2009; Sato, Uchida, & Tanifuji, 2008; Tsao et al., 2003). These reports suggest that the adjacent and alternating relationship among face- and limb-selective activations reported here may extend to monkeys. Most recently, Bell et al. (2011) used fMRI-guided neurophysiology to examine the distribution of selective neurons within and outside face- and limb-selective regions localized with fMRI. They reported three relevant findings. First, the concentration of selective neurons within face- and limb-selective regions was higher than within object- and place-selective regions. This supports our present stance that face- and limb-selective neural responses are good comparator systems for one another. Second, the percentage of selective neurons was higher within a given region than just outside (1–4 mm) or far outside (>4 mm) it, with the highest proportions in the center of a region. This indicates that the ‘inter-cluster’ distance between regions selective for faces and limbs contains selective neurons, but in lower concentrations than within the regions. Third, the proportions of selective cells recorded outside a region corresponded closely to the percentages reported by the early studies of face-selective cells reviewed here (Perrett et al., 1982; Baylis et al., 1987; Desimone et al., 1984). These results show the utility of studying face- and limb-selective regions together. They also show the benefit of using high-resolution fMRI in humans (without spatial smoothing) to target regions of interest, since the highest concentration of selective neurons is likely to be in the center of fMRI activations. Further, these data support both modular and distributed elements in the organization of face- and body part-selective responses. This is consistent with other recent studies that used anatomical tracers in monkeys (Borra et al., 2010). It is also consistent with our finding of a sparsely distributed organization in humans revealed by high-resolution fMRI measurements (Weiner & Grill-Spector, 2010).

An important question for future studies is: What mechanisms generate the cortical correspondence among face- and limb-selective regions? One possibility is that neural responses to face and limb stimuli develop over time because faces and limbs frequently appear together in the environment. On this account, the selectivity of face and limb regions would result from the ecological relevance of faces and limbs, as well as their high frequency in the natural world. However, reports of face-selective neurons in monkeys as young as five and a half weeks (Rodman, Scalaidhe, & Gross, 1993; Rodman, Skelly, & Gross, 1991) raise the possibility of an early maturation of these responses or even an innate bias for these stimuli. A related question is whether adjacent face- and limb-selective patches reflect two neighboring but separate cortical systems for face and limb processing, or a single system of alternating (but interconnected) face- and limb-selective regions that share connections at their boundaries. Some clues come from microstimulation experiments in monkeys: microstimulating face-selective clusters activates other face-selective sites, but the activation also extends beyond their boundaries (Moeller et al., 2008), into cortex where the present data would suggest a limb-selective region. Future work combining methods such as fMRI, microstimulation, and single-unit recording may address the transitions between successive scales of organization in IT cortex: from single neurons to columns, to functional regions, to adjacent and alternating networks. Such studies will shed light on the organizational mechanisms operating at both micro- and macro-level scales.

A new three-stream model of high-level visual cortex

Why might high-level visual cortex contain multiple face- and limb-selective regions?

In this section, we propose a model of high-level visual cortex explaining the multiplicity of face- and limb-selective regions in different anatomical locations (Fig. 11). Specifically, we elaborate on how the fine-scale organization summarized here and explored relative to visual field maps in our prior papers (Weiner & Grill-Spector, 2010, 2011) illustrates three anatomically and functionally distinct (but interacting) pathways extending ventrally, laterally, and dorsally in human high-level visual cortex.

Fig. 11

A three-stream model of high‐level visual cortex. The model is divided into three pathways, dorsal, lateral, and ventral, extending from early visual cortex. The parcellation of each pathway is guided by specific anatomical boundaries and functional differences, either visual or multimodal in nature. Gray arrows indicate interactions between pathways, while black arrows indicate transitions of function.

The ventral stream: the role of ventral temporal cortex in recognition and memory

The ventral stream extends from early visual areas to ventral aspects of the occipital and temporal lobes (Fig. 1). Lesion studies in monkeys and neuropsychological studies in humans have established that VTC is involved in visual recognition: damage to different portions of the temporal lobe produces specific deficits in object and/or face recognition (Damasio et al., 1982; Farah, 1990; Goodale, Milner, Jakobson, & Carey, 1991; Rossion et al., 2003; Sergent & Signoret, 1992; Ungerleider & Mishkin, 1982). Consistent with these reports, functional neuroimaging studies show that activations in VTC are correlated with successful recognition (Bar et al., 2001; Grill-Spector, Kushnir, Hendler, & Malach, 2000; Moutoussis & Zeki, 2002). For example, face-selective regions in lateral VTC respond more strongly when faces are successfully perceived in illusory and ambiguous stimuli (Andrews, Schluppeck, Homfray, Matthews, & Blakemore, 2002; Hasson, Hendler, Ben Bashat, & Malach, 2001; Tong, Nakayama, Vaughan, & Kanwisher, 1998), as well as during detection and identification of faces (Grill-Spector, Knouf, & Kanwisher, 2004). Given this role of VTC in visual recognition, and the adjacency of limb-selective to face-selective regions, we predict that the limb-selective OTS is involved in recognizing body parts, a prediction that future research can test. In addition to the fine-grained parcellation of face- and limb-selective regions based on both anatomical and functional boundaries in VTC, there are also functional differences between coarser anatomical subdivisions of VTC.
Specifically, lateral VTC (from the occipitotemporal sulcus to the mid-fusiform sulcus) exhibits qualitatively different temporal dynamics from medial VTC (from the mid-fusiform sulcus to the parahippocampal gyrus) during prolonged presentations of various visual stimuli (Gilaie-Dotan, Nir, & Malach, 2008) and during repetitions of visual stimuli across different timescales (Weiner et al., 2010). Based on these results, we have recently proposed that lateral VTC is involved in perception, whereas medial VTC is a gateway between perception and memory (Weiner et al., 2010). Future studies will help elucidate the behavioral consequences of these organizational differences and how they affect aspects of perception and memory.

The dorsal stream: the role of posterior parietal cortex in position, motion, spatial working memory and attention, form, and action

The dorsal stream extends from early visual areas through dorsal aspects of the occipital lobe into the parietal lobe. Prevailing views implicate the dorsal stream in different aspects of spatial vision (Ungerleider & Mishkin, 1982), visually guided actions toward objects (Goodale et al., 1991), and even the timing of visual events (Battelli, Pascual-Leone, & Cavanagh, 2007). Here, we focus only on the posterior portion of the parietal lobe, as the processes within this region are largely visual in nature. We recently reported a limb-selective region (limb-selective IPS) that consistently overlaps visual field map V7 (also referred to as IPS-0; Swisher et al., 2007) and is sensitive to the position of the limb in the visual field (Weiner & Grill-Spector, 2011). In addition to selectivity for static limb images, posterior parietal cortex in and around V7 has also been implicated in spatial working memory, attention, and motion (Konen & Kastner, 2008; Orban et al., 2006; Silver et al., 2005; Tootell et al., 1998; Xu & Chun, 2006), indicating the integration of several computational processes within this cortical region. Moreover, a series of studies examining the neural processing of limb actions has documented a clear anatomical and functional dissociation within parietal cortex: posterior IPS regions are involved in the observation and visual guidance of limb movements (a combination of position, motion, and limb form), whereas anterior IPS regions are more involved in executing the limb movements themselves (Filimon, Nelson, Huang, & Sereno, 2009; Levy, Schluppeck, Heeger, & Glimcher, 2007).
Such results are in line with patient studies reporting that focal damage to posterior parietal cortex produces specific deficits in identifying and pointing to body parts, either one's own (autotopagnosia; De Renzi, 1982; Ogden, 1985) or those of others (heterotopagnosia; Auclair, Noulhiane, Raibaut, & Amarenco, 2009; Cleret de Langavant, Trinkler, Cesaro, & Bachoud-Levi, 2009). Whether this perceptual deficit is a direct result of local cortical damage or reflects a disruption of connections within the extended cortical network of limb-selective regions in the ventral or lateral pathways is an open question. We propose that posterior parietal cortex (in the vicinity of V7) is a transitional stage in the dorsal pathway that converts visual inputs into action outputs, whereas anterior IPS is more involved in the actions themselves. Relevant to the topics in this Special Issue, future research should determine whether the limb-selective IPS reflects visual processing of the form of the limb itself, or a visual representation embodied in the context of an action representation.

The lateral stream: the role of lateral occipitotemporal cortex in form, motion, and multimodal processing

Traditionally, the visual system is divided into ventral and dorsal pathways, with area MT typically assigned to the dorsal stream, consistent with its anatomical location in the monkey (Ungerleider & Mishkin, 1982). However, in humans, MT lies farther from parietal cortex, more inferiorly in the posterior inferior temporal sulcus (Dumoulin et al., 2000; Tootell & Taylor, 1995). This difference in the anatomical location of MT, as well as the more inferior position of the ventral stream and more superior position of the dorsal stream in humans compared to monkeys, has been proposed to reflect cortical expansion accommodating the emergence of language in humans (Orban, Van Essen, & Vanduffel, 2004; Ungerleider, Courtney, & Haxby, 1998). Expanding on these proposals, we suggest that this difference reflects a lateral pathway in the human brain that incorporates different aspects of vision, action, and language. For example, our present measurements document face- and limb-selective regions radiating around both MT and MST (Weiner & Grill-Spector, 2011; see Fig. 10 for a schematic). This organization appears to be specific to humans, as fMRI studies in non-human primates show face- and body part-selective regions cortically distant from MT, located more ventrally in portions of TEO and TE (Fig. 2; Tsao et al., 2003; Pinsk et al., 2009). We propose that the organization of face- and limb-selective regions around hMT+ is a unique feature of the human lateral surface. Below, we expand on recent results from neuropsychological and neuroimaging studies providing evidence that LOTC is functionally distinct from the dorsal and ventral streams.

Neuropsychological studies of face and body part processing suggest that damage to LOTC produces perceptual deficits distinct from those associated with damage to dorsal or ventral high-level visual cortex. Damage to the lateral surface near the limb-selective LOS results in a general body agnosia, with impaired discrimination of body parts, but not of object or face parts (Moro et al., 2008). Compared with the deficits in body part localization and ownership associated with damage to posterior parietal cortex discussed above (Auclair et al., 2009; Cleret de Langavant et al., 2009; De Renzi, 1982; Ogden, 1985), these results illustrate a dissociation between the dorsal and lateral streams within the domain of body part processing. Within the domain of face processing, lesions spanning different aspects of the ventral and lateral streams can each produce impairments in holistic face processing (Busigny, Joubert, Felician, Ceccaldi, & Rossion, 2010; Van Belle et al., 2011). However, lesions to the IOG are associated with selective impairments in discriminating face parts, but not object or body parts (Moro et al., 2008), which suggests functional differences between the ventral and lateral streams. These neuropsychological findings implicate the cortical expanse posterior to MT (the LOS for body parts and the IOG for faces; Fig. 10) in processing the visual form of the face and body. Together with the finding that this portion of lateral occipital cortex also responds selectively to images of objects and shapes defined by multiple visual cues (Grill-Spector, 2003; Grill-Spector et al., 1998; Mendola, Dale, Fischl, Liu, & Tootell, 1999; Vinberg & Grill-Spector, 2008), these results suggest that regions posterior to MT code visual form more generally (Fig. 11).

By comparison, the MTG, which lies anterior to hMT+, is not strictly visual: it shows polymodal response properties involved in different aspects of vision, action, and language, a feature that further distinguishes LOTC from the other processing streams. Based on our measurements, we refer to the MTG as limb-selective. However, prior studies also implicate the MTG and nearby ITG in executing hand movements (Astafiev, Stanley, Shulman, & Corbetta, 2004; Orlov et al., 2010), haptically exploring objects (Amedi, Malach, Hendler, Peled, & Zohary, 2001), and responding to tactile stimulation of the hand relative to the foot (Beauchamp, Laconte, & Yasar, 2009; Beauchamp, Yasar, Kishan, & Ro, 2007). The MTG has also been shown to code the rationality of movements (e.g., the mapping of action to meaning; Jastorff, Clavagnier, Gergely, & Orban, 2010), as well as the mapping of sounds to meaning (Glasser & Rilling, 2008; Wong, Chandrasekaran, Garibaldi, & Wong, 2011), suggesting that it may be an anatomical locus for the integration of gesture and language processing (Nelissen et al., 2010). Taken together, these studies suggest that the MTG may be a convergence zone for action representation, embodying information across visual, tactile, haptic, and motor domains, with potential roles in language processing and social communication, in line with previous proposals (Beauchamp & Martin, 2007; Martin, 2007). Overall, these data suggest that LOTC is organized differently from ventral or dorsal high-level visual cortex, with distinct functions that separate it from either pathway. Future research using multiple functional and anatomical methods will support or refute our proposal of the lateral pathway as a distinct processing stream.

Conclusions and future directions

The present work illustrates that face- and limb-selective regions are topographically organized throughout high-level visual cortex. These data provide the first framework for consistent parcellation of high-level visual regions outside visual field maps. Importantly, implementing this parcellation framework has generated a new model of high-level visual cortex containing three processing streams extending dorsally, laterally, and ventrally, which are separable based on anatomical and functional criteria. Further, our results suggest that the statistical regularity with which faces, limbs, and bodies appear relative to one another in the natural world has been incorporated into the visual system in high-level representations of alternating maps within these three processing streams. The anatomical location of each region within its stream, as well as its spatial relationship to surrounding functional regions, may be related to its particular role, either in distinct aspects of vision, action, haptics, memory, and language, or in combinations across these modalities. The new three-stream model and systematic parcellation framework described here motivate future research both to examine how and why neural representations of faces and limbs neighbor one another cortically, and to test the visual and polymodal properties of different regions guided by the predictions of the model. Pursuing these directions will reveal how these pathways interact and converge to embody different aspects of vision, action, and language.