Introduction

Functional magnetic resonance imaging (fMRI) has enabled the acquisition of whole-brain images of neural activity in humans and other animals. The technique has been used to functionally localize brain regions, with particular success in localizing regions selective for different visual categories, including face-, body-, object-, and place-selective areas in humans1,2,3 and non-human primates (NHP)4,5,6,7,8.

Human fMRI studies typically use the endogenous contrast agent deoxyhemoglobin and measure the blood-oxygen-level-dependent (BOLD) signal. BOLD has been used in humans with a wide variety of experimental designs, including rapid event-related designs that give researchers great flexibility. In particular, rapid event-related fMRI enables condition-rich designs intended for pattern-information analyses9. BOLD fMRI has also been utilised in NHP studies (see below); however, many NHP studies have used the exogenous contrast agent monocrystalline iron oxide nanoparticle (MION) to increase the sensitivity of the measured signal. MION reflects blood volume, rather than blood oxygenation. Vanduffel et al.10 compared the use of BOLD vs. MION in block-design experiments in awake macaque monkeys to map the brain areas selective for motion. Their results not only matched monkey electrophysiology and human fMRI results, but also showed greater spatial localization and a larger contrast increase with MION than with BOLD. More recently, block designs combined with MION have predominantly been used to localize fMRI-defined category-selective areas in macaques (for example4,7,8,11).

MION’s slower haemodynamic response is unproblematic in the context of block designs. Leite et al.12 compared MION with BOLD in macaques using visual checkerboard stimuli with varying presentation durations. They found that MION increased the functional sensitivity for stimuli presented at long durations, but that brief or rapidly repeated stimulus presentations led to a greater attenuation of the signal compared to BOLD, consistent with a linear model capturing the dispersion of the response over time. This suggests that MION might be less sensitive for rapid event-related designs, whose high-temporal-frequency effects might not pass through the low-temporal-frequency filter of the MION response. However, event-related designs have previously been used successfully in MION fMRI studies (for example13,14,15,16,17).

To understand the functional homologies and analogies between the human and the NHP brain, it would be desirable to use the same contrast mechanism in both species (note, however, that alternative comparative neuroimaging approaches have been utilised; see ref. 18 and, more recently, ref. 19). Given that administering MION is an invasive procedure, not approved for use in humans, BOLD in NHPs might be the best approach for interspecies comparisons. Indeed, Pinsk et al.5,6 investigated visual category-selectivity in monkeys using BOLD with block designs, and BOLD has also been used in event-related designs in monkeys previously20,21,22. Interestingly, the faster temporal response of BOLD fMRI might be beneficial in the context of rapid event-related designs.

Here, we explore block-design and rapid event-related (rER) BOLD fMRI in awake macaques using visual images of real-world stimuli including human and animal faces, human and animal bodies, objects, and places. In the rER experiment, each stimulus was presented for 0.5 s, with a 2.5 s inter-stimulus interval (see Methods). We therefore define our event-related design as ‘rapid’ on the basis that the interval between successive stimulus presentations was shorter than the duration of the haemodynamic response function23. We selected these stimuli because they have been shown to evoke strong category-selective visual responses in higher-order visual areas in both humans and macaques (see above). We found clear and strong visual responses and some selectivity to categories, consistent with findings reported in previous studies, in both our block- and rER experiments. However, in the rER experiment, even after censoring scan volumes where our behavioural performance criteria were not met, and after substantial averaging, responses were quite noisy compared to (a) human rapid event-related BOLD fMRI24, (b) monkey rapid event-related MION fMRI14, and (c) our own block-design experiment. We cannot rule out that factors related to the suboptimal behavioural performance of our animals may have affected the responses we obtained. For example, the majority of fMRI studies involving macaques use some form of fluid restriction to motivate the animals, whereas our animals had daily unlimited access to water and were motivated by smoothie rewards during the experiments. Nevertheless, eye fixations within the animals’ fixation window showed reasonable position stability (see Table S1 in Supplementary Information for summary statistics of the eye-tracking data). To account for our rER fMRI results further, given that collecting additional MRI data using MION was not an option under our study’s project license, we conducted a set of simulations of the BOLD and MION response during rapid event-related experiments. Our results extend previous findings by Leite and Mandeville25 by considering all frequencies of stimulus presentation and more conservative assumptions about the MION response shape. Our simulations suggest that the benefits of the greater amplitude of the MION response outweigh the loss of contrast caused by greater temporal smoothing. MION dominated BOLD in functional sensitivity across the entire range of temporal frequencies that matter for event-related experiments, and across a wide range of reasonable assumptions about the relative amplitude of the MION and BOLD responses. Together, our experiments using BOLD with block and rER designs, and our simulations showing the greater efficiency of MION relative to BOLD in rER designs, suggest that future NHP studies aiming to perform BOLD fMRI might benefit from employing a block design, whereas rER NHP studies should consider the use of MION. We conclude that although any two of the three elements (rER, BOLD, NHP) have been shown to work well together, the combination of all three is particularly challenging.

Results

We collected fMRI data while three macaque monkeys (M1–M3) viewed visual images presented at the centre of a computer monitor.

We first ran a block-design fMRI experiment whose data were also used to define the regions of interest (ROIs) used in our event-related fMRI experiment, conducted a few months afterwards. In the event-related experiment, we sought to probe the emergence of selectivity to different object categories in the ventral visual stream, using a stimulus set that has been successfully employed in NHP and human studies previously14,24,26. Additionally, given the predominantly categorical organisation of face-selective regions in the macaque superior temporal sulcus (STS)7,27, we were particularly interested in evaluating the (dis)similarities in the activity patterns9 elicited by the individual images in these regions.

Block-design experiment

We found strong visual responses for most hemispheres, in the occipital and temporal lobes (Figs. 1 and 2). Furthermore, we identified anterior, middle, and posterior face-selective regions in the STS (see Fig. 3 for M1).

Figure 1

Visual activation (ON > OFF contrast) maps in the block-design experiment. Left panel: Uncorrected data are presented on the inflated left and right hemispheres for the three monkeys (M1–M3). For the generation of z-statistic activation maps, see Methods. Results are displayed on surface representations transformed to standard monkey space (MACAQUE-F99; ref. 28). The inflated hemispheres were generated using Caret5 (http://www.nitrc.org/projects/caret/; see ref. 29). A: anterior; P: posterior. Right panel: The cluster-corrected z-statistic maps for the three monkeys.

Figure 2

Regions of interest (anatomically-derived) and mask-generation pipeline. (A) The regions of the ventral visual stream delineated according to the cortical parcellations described in the atlases of refs. 30,31. (B) We used the uncorrected ON > OFF results from the block-design experiment to generate 2 mm radius spherical ROI masks (appearing as white patches, highlighted with green rings for illustrative purposes) within the original anatomical mask, around the peak ON > OFF voxel, separately for each monkey (see also Fig. 1). For ease of presentation, the figure shows the results for M1’s left hemisphere only. D: dorsal; A: anterior.

Figure 3

Face-selectivity maps in the block-design experiment. Data are presented on the left and right hemispheres for M1. (A) Activation maps for the faces > places contrast. Top panel: Uncorrected z-statistic maps displayed on lateral surface representations transformed to a standard monkey brain28. In accordance with Tsao et al.8, we found regions activated by images of faces in the posterior, middle and anterior parts of the STS. We generated 2 mm radius spheres around the peak face-selective voxels in the posterior, middle and anterior STS (the spheres’ rough locations for M1 are highlighted here with cyan, blue and black rings, respectively). Middle panel: Coronal-plane views (Y coordinates in MACAQUE-F99 space) of the spherical ROI masks (in green) are also shown for M1. Note that the left hemisphere is shown on the right of each image (radiological display convention). Bottom panel: The cluster-corrected z-statistic maps for M1. By comparison with the uncorrected data, the anterior face patches do not survive the correction for multiple comparisons. (B) Activation maps for the faces > objects contrast: The STS face-selective regions identified by contrasting faces and objects were either in identical locations or in very close proximity to the regions revealed by the faces > places contrast (especially in the left hemisphere; compare with panel A). Overall, in accordance with ref. 4, the faces > places contrast produced larger regions.

In particular, to localise face-selective regions in each animal, we contrasted face and place stimuli using the data from the present block-design experiment. To increase the chances of capturing the face-selective locations in all monkeys, we selected a liberal threshold of z = 1.6. Consistent with the ‘face patches’ reported previously7,8, we found brain regions with face selectivity in the posterior, middle and anterior parts of the STS. As shown in Fig. 3A (M1’s data; threshold set to z = 2.3 for clarity of presentation), we identified a large region of face-selectivity that covers the fundus and the lower lip of the middle STS and likely corresponds to Tsao et al.’s8 middle fundus (MF) and middle lateral (ML) temporal face patches. Furthermore, we found a small anterior face region at the fundus of the STS (likely corresponding to AF in ref. 8) and a small region located more ventrally in the anterior STS (corresponding to AM in ref. 8). The face-selective voxels we found in posterior STS varied slightly in location compared with ref. 8: our experiment revealed a posterior face patch in the STS fundus, whereas Tsao et al. reported this patch to be closer to the STS lip. In the hemispheres where we did not identify a face-selective region (as was the case in M3, where no face-selective voxels were found in anterior or posterior left STS), we used the coordinates from another monkey in our study to generate the mask for the specific missing ROI.

The face-selective regions we found were in close proximity to the ones identified when contrasting faces and objects (Fig. 3B), further confirming the strong selectivity for face stimuli in macaque STS, as well as providing reassurance that the faces > places contrast we used is appropriate for revealing face selectivity4.

Event-related experiment: univariate analysis

Figure 4 shows percent signal change data across all ROIs for the event-related experiment. The bars depict data averaged across the individual stimuli within each category, then averaged across sessions (all subjects). We found no category-selectivity in early visual areas V1 or V2. Rather, category-selectivity seems to emerge for the first time at higher levels of visual processing. Specifically, a significant main effect of stimulus category was first observed in area V4 (F(3,75) = 3.15, p = 0.030). As Fig. 4A shows, V4 responses to the images of places were greater than the responses to the rest of the images. Farther along the ventral stream, a significant main effect of category (F(3,75) = 2.83, p = 0.044) was found in area TEO, where responses to body-part images were greater than those to the other categories. Area TEm showed a significant main effect of category (F(3,75) = 2.9, p = 0.041) with a preference for body parts. We did not find a significant main effect in TEpd (F(3,75) = 2.19, p = 0.096); however, responses to body parts were greater than those to the other categories. Moving more anteriorly in IT cortex, into TEad and TEa, we observed responses lower than baseline for almost all stimulus categories and no significant category-selectivity in either region (p’s > 0.05).

Figure 4

Percent signal change for the event-related experiment. Data are shown across the ventral visual stream (A) and the functionally-defined face-selective regions in STS (B). Results were derived from our GLM analysis (contrast: each individual image versus baseline). Bars depict data averaged across the individual images within a category, averaged across sessions and across monkeys. The two hemispheres were merged into a single ROI. Error bars show the standard error of the mean across sessions. V1, V2, and V4 responded significantly to images of all categories. V4, TEO, TEm, and the posterior and middle face-selective regions (marked by asterisks) responded distinctly to different categories (ANOVA main effect, significant at p < 0.05).

We also extracted data from the posterior, middle and anterior face-selective ROIs. In the posterior face-selective ROI, we found a significant main effect of category (F(3,75) = 4.46, p = 0.006), with greater responses to face images. In the middle face-selective STS area, we found a significant main effect of category (F(3,75) = 2.95, p = 0.038), with greater responses to body-part images. Finally, in the anterior STS area, we found greater responses to faces compared with the other categories, but this difference did not reach statistical significance (Fig. 4B). Overall, Fig. 4 suggests that, even after substantial averaging across the responses to different stimuli within a category, we did not observe strong category selectivity, indicating that our rapid event-related BOLD data were very noisy.

Finally, as in the block-design experiment, we contrasted face and place stimuli to generate face-selectivity maps in the event-related experiment. We selected a threshold of z = 1.6 (uncorrected). As shown in Fig. 5, face-selective regions emerged in reasonable STS locations; however, activations were weaker than in the block-design experiment (Fig. 3A). Furthermore, in contrast to the block-design results, no activated voxels survived cluster correction here.

Figure 5

Activation maps for the faces > places contrast in the event-related experiment. Uncorrected z-statistic maps for the left and right hemispheres of M1 are displayed on lateral surface representations transformed to a standard monkey brain28. In contrast to the block-design experiment (Fig. 3A), no voxels survived cluster correction here.

Event-related experiment: representational similarity analysis

Representational dissimilarity matrices (RDMs)9 did not exhibit any discernible structure. Figure 6 shows RDMs from bilateral V1, posterior face-selective regions, and anterior face-selective regions of M1. In particular, there was no apparent clustering of patterns by category as observed in human fMRI and macaque cell recordings24. The cross-validated Mahalanobis distances in V1 were strong and uniform across the images, consistent with distinct response patterns to individual images. In the face-selective regions, the responses were less uniform across the images, suggesting some categorical structure encoded in these regions. However, the brain responses were likely too weak to yield interpretable structure in the RDMs. Overall, measuring the detailed response patterns elicited by particular stimuli, as is standardly done in humans, was difficult given the combined challenges of a rapid event-related design, BOLD contrast, and NHP fMRI. These analyses suggest that condition-rich pattern-information analyses, as routinely performed in humans on the basis of BOLD fMRI data, may be more challenging in monkey rapid event-related BOLD fMRI, particularly with inconsistent behaviour. A condition-rich design reduces the number of repetitions that are possible for a given stimulus within the scan duration, and pattern-information analyses rely on fine-grained activity patterns across voxels, so both place high demands on data quality.

Figure 6

Representational similarity analysis. Examples of representational dissimilarity matrices (RDMs) in bilateral V1 (left), posterior face-selective regions (middle), and the anterior face-selective region (right) of M1. RDMs consist of pairwise cross-validated Mahalanobis distances between all 48 images we used.

MION dominates BOLD for simulated event-related monkey fMRI

Our BOLD fMRI rapid event-related response estimates are noisy, suggesting that BOLD rapid event-related designs, although successful in humans, are challenging in monkeys. An important question is whether rapid event-related designs might work better in monkeys when MION is used.

Rapid event-related designs can work with MION12,13,14,25. However, it is unclear how the larger amplitude of the MION response (which helps sensitivity; see Fig. 7A) trades off against its larger temporal width (which might reduce the differential sensitivity to fast-switching stimuli in rapid event-related designs). Leite and Mandeville25 argued, on the basis of simulations, that MION benefits more than BOLD from randomization of the stimulus timing, which moves effect energy into lower temporal-frequency bands. Even if high-temporal-frequency effects are significantly attenuated in MION fMRI, they could still be stronger than in BOLD fMRI. The power spectrum of the MION model response and linear-model simulations indeed suggest that MION should have greater sensitivity than BOLD in general, i.e., for any type of design12,25. However, the MION model used by Leite et al.12,25 has a sharp onset, which, on the one hand, might not realistically reflect the actual MION response and, on the other, might enable the response model to transmit more high-temporal-frequency information than the actual MION response.

Figure 7

Linear response simulation suggests that MION affords greater sensitivity than BOLD in a fast-switching rapid event-related design. (A) Impulse response function models for BOLD (blue) and MION (red). MION has a larger and wider response than BOLD. The BOLD model is from Boynton et al.32. The MION model is from Leite et al.12, except that the sharp onset has been replaced with the onset of the BOLD model (see Fig. 8 for the onsets), making the comparison more realistic and conservative with respect to the prospects of MION. Responses are normalized so that the BOLD response peaks at 1. (B) Contrast time course. The design consists of two stimuli presented in alternation with a switch every 2 s (4 s period, 1/4 Hz). The volume TR is 2 s. The panel shows the contrast time course, i.e. the stimulus contrast time course convolved with BOLD (blue) and MION (red) impulse response functions. This illustrates that, even for this very fast switching design, where the broad MION response cancels much effect energy, the greater amplitude of the MION response still yields more effect energy than the BOLD response. MION (for equal additive noise) yields greater functional contrast for rapidly switching designs. (C) Periodograms of the impulse response functions (red for MION, blue for BOLD) and the stimulus contrast time course (black) show the full-spectrum dominance of MION over BOLD. (D) The size of the standard-error bars we expect to obtain with each of the two methods. The greater effect energy for MION translates to substantially smaller standard-error bars.

To assess more conservatively whether MION is theoretically superior to BOLD even for rapid event-related designs, we performed analyses and simulations using a modified version of the Leite et al.12 model. We extended Leite et al.12,25 by replacing the sharp onset of the MION impulse response function with a smooth onset, matching the onset to the BOLD model of Boynton et al.32 (Fig. 7A). We simulated an extremely fast rapid event-related design, in which two experimental conditions (e.g. two stimuli) switch back and forth, with each being presented for the duration of just one acquisition volume (2 s) (Fig. 7B). Even for this fast-switching design, our simulation shows that MION yields substantially better sensitivity to the contrast between the two conditions (reflected in smaller error bars, Fig. 7D), essentially replicating the effects predicted by Leite and Mandeville25.
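The logic of this simulation can be sketched in a few lines of Python. The following is a minimal illustration, not the code behind Fig. 7: the gamma-shaped BOLD model, the 12 s MION decay constant, and the splicing of the BOLD onset onto the MION response are all simplifying assumptions standing in for the Boynton et al.32 and modified Leite et al.12 models.

```python
import numpy as np
from scipy.stats import gamma

dt = 0.1                               # temporal resolution (s)
t = np.arange(0, 60, dt)               # time axis (s)

# BOLD impulse response: a gamma function peaking around 3 s, standing
# in for the Boynton et al. model, normalised to a peak of 1.
bold = gamma.pdf(t, a=4.0, scale=1.0)
bold /= bold.max()

# MION impulse response: the same smooth onset as BOLD, scaled to a
# 1.8x peak, followed by a slow return to baseline (the 12 s decay
# constant is an assumption standing in for the modified Leite model).
t_peak = t[np.argmax(bold)]
mion = np.where(t <= t_peak, 1.8 * bold, 1.8 * np.exp(-(t - t_peak) / 12.0))

# Fast-switching design: two conditions alternate every 2 s (4 s period,
# 1/4 Hz), for 600 s.
contrast = np.repeat(np.tile([1.0, -1.0], 150), int(2.0 / dt))

# Contrast time courses: the design convolved with each impulse response.
c_bold = np.convolve(contrast, bold)[: contrast.size] * dt
c_mion = np.convolve(contrast, mion)[: contrast.size] * dt

# Effect energy: for equal additive noise, more energy means smaller
# standard errors on the estimated condition contrast.
print('BOLD effect energy:', np.sum(c_bold ** 2))
print('MION effect energy:', np.sum(c_mion ** 2))
```

With these assumed shapes, the printed effect energies illustrate the trade-off described above: the broad MION response loses high-temporal-frequency contrast, but its larger amplitude more than compensates.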

To address this question more generally, we analysed the impulse responses of BOLD and MION in the frequency domain (Fig. 8). We added versions of the BOLD and MION impulse response functions with an even smoother onset than that of Boynton et al.32. The results demonstrate that MION has full-spectrum dominance over BOLD.

Figure 8

MION affords greater sensitivity than BOLD under conservative assumptions about the onset for arbitrarily rapid event-related designs. (A) Different impulse response function models considered for BOLD (blue) and MION (red). The most conservative models (dashed lines in saturated red and blue) have a smooth onset (identical for BOLD and MION), which transmits less high-temporal-frequency effect energy than the Boynton et al.32 BOLD model, and much less than the Leite et al.12 MION model. (B) Periodograms show that for conventional as well as our more conservative smooth-onset models, MION dominates BOLD in terms of its transmission of effect energy across the full spectrum of temporal frequencies. We therefore expect that MION will provide greater sensitivity to effects of interest, no matter how rapid the event-related design.

We tested the robustness of MION’s full-spectrum dominance to changes in the assumed factor by which the peak of the MION response exceeds that of the BOLD response. The previous simulations assumed that the MION response peaks 1.8 times higher than the BOLD response (when the temporal noise is equated). We relaxed this assumption by gradually lowering the peak amplitude of the MION response in the simulation (not shown). MION maintained its full-spectrum dominance over BOLD for rapid event-related designs down to a factor of 1.5 (peak of MION/peak of BOLD) for the smooth-onset variants of both impulse response functions. In sum, the simulations suggest that MION robustly dominates BOLD under conservative assumptions. We expect that MION will yield greater sensitivity no matter what experimental design is used.
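The frequency-domain test and the amplitude-ratio sweep can be sketched in the same framework. Again this is illustrative only: with these toy impulse-response shapes, the exact ratio at which dominance is lost will differ from the value reported above.

```python
import numpy as np
from scipy.stats import gamma

dt = 0.1
t = np.arange(0, 60, dt)
bold = gamma.pdf(t, a=4.0, scale=1.0)
bold /= bold.max()                                 # unit-peak BOLD model
t_peak = t[np.argmax(bold)]
mion_shape = np.where(t <= t_peak, bold,
                      np.exp(-(t - t_peak) / 12.0))  # unit-peak MION model

freqs = np.fft.rfftfreq(t.size, d=dt)
band = (freqs > 0) & (freqs <= 0.5)    # frequencies relevant to rER designs

def amplitude_spectrum(irf):
    """Magnitude of the impulse response's Fourier transform."""
    return np.abs(np.fft.rfft(irf)) * dt

# Lower the assumed MION/BOLD peak ratio and test whether MION still
# transmits at least as much effect energy at every in-band frequency.
for ratio in (1.8, 1.5, 1.2):
    dominant = np.all(amplitude_spectrum(ratio * mion_shape)[band]
                      >= amplitude_spectrum(bold)[band])
    print(f'peak ratio {ratio:.1f}: in-band dominance -> {dominant}')
```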

Discussion

We collected BOLD fMRI data from awake behaving macaque monkeys, focusing on the occipital and temporal cortices in two independent experiments: a block-design experiment and a rapid event-related experiment. Each experiment used an independent set of visual images of real-world stimuli. In both experiments, we found visual responses and category selectivity. However, the effects were noisier than expected, especially in the event-related experiment.

In the block-design experiment, we found strong visual responses in the occipital and temporal lobes of all three subjects. Furthermore, we were able to identify bilateral anterior, middle, and posterior face-selective regions in most of the subjects. These face-selective regions were in the expected locations, but less specific than those reported in the literature using MION. For example, Tsao et al.8 found six face patches in each hemisphere, with two patches in each of the anterior, middle, and posterior parts of the STS. Here, we found correspondence in the anatomical locations, with some subjects showing two patches in each portion of the STS, but these were not easily identifiable in all subjects, possibly due to the lower functional contrast of BOLD. However, this could also be because Tsao and colleagues collected more volumes per subject, or had more stimulus repetitions, in their block-design experiments. We also cannot rule out the possibility that the task performance of our monkeys (who were not fluid-restricted during testing) may have negatively affected our observations. For example, on average, we had to exclude about 15% of the trials corresponding to our collected MRI volumes because the monkeys did not sustain fixation during those volumes (see Methods). In the remaining data, eye fixations showed reasonable position stability within the fixation window (see Table S1 in Supplementary Information), although the eye-monitoring sampling rate we used (20 Hz) may have missed some saccades.

In the event-related experiment, the stimulus-evoked BOLD responses were substantially noisier. Using an ROI approach, we considered ventral-stream brain regions from early visual cortex to anterior IT. In early visual areas, we found strong visual responses but no significant category selectivity. Beyond early visual cortex, we found that category-selectivity begins to emerge. Specifically, we found some category selectivity in V4, TEO, and TEm, as well as in the face-selective regions in the STS. However, in more anterior IT regions, such as TEad and TEa, we found no evidence of category-selective responses. This could be related to the relatively weaker fMRI signal found in these regions, or to our rapid event-related data being of lower quality and not very reliable, but it could also be because these regions are more involved in distinguishing identities within a particular category (e.g., refs. 33,34,35; but see ref. 4). RSA using noise-covariance-normalised distances (crossnobis distances) on face-selective regions revealed some differences between early visual areas and face regions. The RDMs appeared qualitatively different across areas, but the pattern dissimilarities within a brain region were too noisy for detailed analyses of the representational geometries. Brain activity patterns in early visual areas were strongly dissimilar across stimulus conditions. In the face-selective regions, by contrast, there was only very weak structure suggesting some information about stimulus category.

Overall, we found strong visual responses and some category-selectivity in both our block- and event-related designs. However, the data were noisy even after artefact rejection and substantial averaging, which we attribute to the lower contrast-to-noise ratio of BOLD compared with MION, the smaller brains of NHPs, and eye-movement- and motion-related artefacts.

Collecting MRI data using MION was not an option under the project license of our study; therefore, we performed simulations based on the known response properties of BOLD and MION to compare the response profiles of the two contrast mechanisms. Considering the slower temporal response profile of MION, and previous findings of a more attenuated differential response with MION than with BOLD at faster rates of stimulus switching12, one might expect BOLD to work better than MION for rapid event-related designs. However, our simulations suggest that, at every timescale of stimulus presentation, MION dominates BOLD in terms of sensitivity.

Our ability to draw clear conclusions in this study was somewhat compromised by the relatively poor quality of the data compared with other similar studies combining NHPs with fMRI. A significant contributor was the inconsistent behavioural performance of the monkeys, and the consequent need for a larger-than-normal fixation window. Many studies that combine fMRI and NHPs use fluid-restriction regimens to train and later motivate the animals. Our animals were provided with free access to water for most of the day, and we instead relied on a fruit smoothie, as a special reward, to motivate them. It is possible that this method of reinforcement was insufficient for this particular context. With this in mind, it is essential to choose the best experimental procedures available to maximise the chances of obtaining high-quality data. Here, we can offer some insights. First, the stronger BOLD responses we measured in the block-design experiment compared with the rapid event-related experiment suggest that block designs may be a better choice than rapid event-related designs when using BOLD fMRI in NHPs, particularly when MION and/or more efficient training approaches are not an option. Second, our simulations suggest that MION, despite its more prolonged response, may be preferable to BOLD even for rapid event-related designs. Finally, and perhaps most importantly, our data clearly argue for the need to achieve a consistent level of high performance in the animals. Unlike in electrophysiological studies, for example, where individual trials can easily be removed without affecting overall data quality, in fMRI it is essential to have the animals performing strongly throughout a scan session. Achieving this level of performance may require additional time and effort if fluid-restriction paradigms are not an option.

Methods

Subjects and housing

All experimental procedures were performed in accordance with the guidelines and regulations of the UK Animals (Scientific Procedures) Act of 1986. The Project License was reviewed by the University of Oxford Animal Care and Ethical Review Committee, and the UK Home Office approved and licensed all procedures. Three male rhesus macaque monkeys (M1–M3; mean age: 7 years; mean weight: 12.5 kg) were used in the experiments. M1 and M2 were pair-housed and M3 was singly housed, with a 12-hour light/dark cycle (lights on 07:00–19:00). All three animals had unlimited access to water and regular visual contact with human staff. The animals were surgically implanted with an MRI-compatible head post (Rogue Research, Montreal, Canada) under aseptic conditions and general anaesthesia (see ref. 36). After recovery, the animals were trained to sit in a primate chair in the ‘sphinx’ position with their heads fixed.

Training task and stimuli

Monkeys were trained to fixate a cue that appeared in the centre of a computer screen. Training took place inside a mock scanner to acclimatize the monkeys to the scanner environment and noise. The images had a size of 11° by 11° of visual angle. Monkeys received a smoothie reward for maintaining fixation within a 5° by 5° rectangular window at the centre of each image. The frequency of reward increased over time as the monkey maintained unbroken fixation. Stimulus presentation, eye-fixation monitoring, and reward delivery were controlled by PrimatePy, a custom-made programme based on PsychoPy37 (for details on PrimatePy, see ref. 38). In the scanner, stimuli were presented centrally via an LCD projector onto a rear-projection screen.
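PrimatePy itself is not publicly documented, so the sketch below only illustrates the general trial logic using standard PsychoPy calls; the `get_eye_position` and `deliver_reward` helpers and the exact reward schedule are hypothetical placeholders, not PrimatePy's actual implementation.

```python
from psychopy import core, visual

win = visual.Window(size=(1280, 1024), units='deg',
                    monitor='testMonitor', fullscr=True)
image = visual.ImageStim(win, image='stimulus.png', size=(11, 11))  # 11 x 11 deg

FIX_HALF_WIDTH = 2.5   # half-width of the 5 x 5 deg central fixation window
EYE_FS = 20.0          # eye-tracker sampling rate (Hz)

def get_eye_position():
    """Hypothetical stand-in for the 20 Hz eye-tracker readout (degrees)."""
    return 0.0, 0.0

def deliver_reward():
    """Hypothetical stand-in for triggering the smoothie-reward delivery."""
    print('reward')

fixation_run = 0.0     # duration of unbroken fixation so far (s)
reward_interval = 4.0  # assumed schedule: rewards arrive faster over time
clock = core.Clock()
while clock.getTime() < 10.0:         # one illustrative fixation epoch
    image.draw()
    win.flip()
    x, y = get_eye_position()
    if abs(x) <= FIX_HALF_WIDTH and abs(y) <= FIX_HALF_WIDTH:
        fixation_run += 1.0 / EYE_FS
        if fixation_run >= reward_interval:
            deliver_reward()
            fixation_run = 0.0
            reward_interval = max(1.0, reward_interval - 0.5)
    else:
        fixation_run, reward_interval = 0.0, 4.0   # fixation broken: reset
    core.wait(1.0 / EYE_FS)           # poll eye position at roughly 20 Hz
win.close()
```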

In the block-design experiment, we used a subset of the stimuli used by our group previously39. Here, our monkeys were presented with images of faces, objects, places, and scrambled versions of objects. In the event-related experiment, the monkeys were presented with a different stimulus set (a subset of the stimuli used in refs. 14,24,26) that consisted of 48 images of human and animal faces and body parts, man-made and natural objects, and places.

Block-design fMRI procedure

Each image in the ‘ON’ blocks was presented for 0.4 s, with a 0.5 s inter-stimulus interval (ISI). The duration of an ON block was 32 s. ON blocks were interleaved with blank, ‘OFF’, blocks (16 s). Images had a size of 11° of visual angle. A scanning session included a total of 900–2000 volumes, and we collected data in 5 sessions for each monkey. For M1, we collected a total of 7600 volumes; for M2 a total of 6100 volumes; for M3 a total of 7000 volumes.

Event-related fMRI procedure

In each run (consisting of 117 volumes), each of the 48 images was presented once, in randomized order. Each image was presented for 0.5 s, the ISI was 2.5 s, and 30 null trials (blank, isoluminant grey) lasting 2.5 s were interleaved at random time points within the run. A scanning session included a total of 1170–1638 volumes (10–14 runs). For M1, we collected a total of 20358 volumes in 14 sessions; for M2, a total of 12987 volumes in 9 sessions; for M3, a total of 3744 volumes in 3 sessions.
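For illustration, a run's event schedule could be generated as below. This is a schematic reconstruction rather than PrimatePy's scheduling code, and it leaves out any lead-in/lead-out time that makes up the remainder of the 117 volumes per run.

```python
import random

N_IMAGES, N_NULL = 48, 30
STIM_DUR, ISI, NULL_DUR = 0.5, 2.5, 2.5      # seconds

# One run: each of the 48 images once, with 30 null trials interleaved
# at random positions.
trials = [('image', i) for i in range(N_IMAGES)] + [('null', None)] * N_NULL
random.shuffle(trials)

t, schedule = 0.0, []
for kind, idx in trials:
    if kind == 'image':
        schedule.append((t, f'image_{idx:02d}'))
        t += STIM_DUR + ISI                  # 0.5 s stimulus + 2.5 s ISI
    else:
        schedule.append((t, 'null'))         # blank, isoluminant grey
        t += NULL_DUR
print(f'trial time: {t:.1f} s (~{t / 2.0:.0f} of the 117 volumes at TR = 2 s)')
```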

MR data acquisition and pre-processing

Data for both experiments were collected using a horizontal 3 T MRI scanner and a four-channel phased-array receiver coil, together with a radial transmission coil (Windmiller Kolster Scientific). For the MRI data acquisition, we used an echo planar imaging (EPI) sequence with the following imaging parameters: voxel size = 1.5 mm isotropic, repetition time (TR) = 2 s, 32 slices, echo time (TE) = 29 ms, flip angle = 78°.

Raw data were reconstructed offline using a sensitivity encoding (SENSE; see ref. 40) reconstruction method in Matlab, to reduce ghosting artefacts41 (Offline SENSE GUI, Windmiller Kolster Scientific, Fresno, CA). To reduce artefacts caused by body motion42, we further used motion-correction algorithms43 as follows: all volumes within a run were aligned slice-by-slice to the single volume identified as having the least amount of motion (least variance from the mean). The aligned data obtained in the same session were merged into a single 4D NIFTI file using the Functional MRI of the Brain (FMRIB) Software Library (FSL; www.fmrib.ox.ac.uk/fsl)44. In FSL, the 4D data were skull-stripped and subjected to spatial smoothing (full-width at half-maximum of 3 mm), intensity normalisation, and high-pass filtering (cutoff 60 s). Finally, we spatially co-registered the functional data to a standard anatomical template (MACAQUE-F99; ref. 28) using an affine transformation.
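For readers who wish to reproduce the FSL stages, they map onto standard command-line tools, sketched here via Python's subprocess. File names are placeholders, the earlier SENSE reconstruction and slice-wise realignment are Matlab steps not shown, and the exact options used in the original pipeline are assumptions; the sigma conversions follow FSL's conventions (FWHM to Gaussian sigma, and a highpass sigma of cutoff/(2 × TR) volumes).

```python
import subprocess

def run(cmd):
    """Run one FSL command, echoing it first."""
    print(' '.join(cmd))
    subprocess.run(cmd, check=True)

TR, CUTOFF_S, FWHM_MM = 2.0, 60.0, 3.0
sigma_mm = FWHM_MM / 2.3548                  # FWHM -> Gaussian sigma (mm)
hp_sigma_vols = CUTOFF_S / (2.0 * TR)        # FSL highpass sigma, in volumes

run(['bet', 'func4d', 'func4d_brain', '-F'])                  # skull-strip 4D data
run(['fslmaths', 'func4d_brain', '-s', f'{sigma_mm:.3f}',     # ~3 mm FWHM smoothing
     'func4d_sm'])
run(['fslmaths', 'func4d_sm', '-inm', '10000', 'func4d_in'])  # intensity normalisation
run(['fslmaths', 'func4d_in', '-Tmean', 'tmean'])             # temporal mean
run(['fslmaths', 'func4d_in', '-bptf', f'{hp_sigma_vols:.1f}', '-1',
     '-add', 'tmean', 'func4d_hp'])                           # 60 s highpass
run(['flirt', '-in', 'example_func', '-ref', 'F99_template',  # affine co-registration
     '-omat', 'func2std.mat', '-dof', '12'])
```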

Eye-movements and motion artefacts

We analysed eye-tracking data to identify and exclude trials in which the monkeys fixated outside a ±5° window from the fixation cue (i.e., were not looking at the presented stimulus, which had a size of 11° by 11°). Eye-movements were monitored using an MR-compatible LED infrared camera (MRC Systems GmbH, Germany). Eye position was calibrated at the beginning of each session; this calibration procedure was part of the animals’ regular behavioural training in the mock scanner. In the scanner, eye-movements were recorded at 20 Hz, that is, ~40 samples were obtained in each volume. A trial was excluded if the subject did not look at the stimulus for more than half the time (i.e., >1 s) in the corresponding volume. For each subject, the mean percentage of volumes per session with detected broken fixations was: M1 = 16.3% (standard error of the mean (SEM) = 2.4%); M2 = 17.1% (SEM = 7.0%); M3 = 10.0% (SEM = 6.7%). Trials during which broken fixations were detected were not modelled in the general linear model (see ‘Event-related fMRI: data analysis’ below).

For each session, we identified the volumes containing large motion artefacts (volumes whose variance exceeded the mean of the head-motion estimate by more than two standard deviations). The mean percentage of volumes identified per session was: M1 = 3.0% (SEM = 0.3%); M2 = 4.3% (SEM = 0.8%); M3 = 2.3% (SEM = 0.6%). Motion outliers were modelled as nuisance regressors in the main analysis.
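Both censoring criteria reduce to simple per-volume thresholding. A minimal sketch, assuming the eye trace is stored as an (n_volumes × 40 samples × 2 coordinates) array in degrees and head motion as one variance value per volume (the array layout is an assumption for illustration):

```python
import numpy as np

def broken_fixation_volumes(eye_deg, half_window=5.0):
    """Flag volumes where gaze was outside the +/-5 deg window for more
    than half of the ~40 samples (i.e. > 1 s of a 2 s volume)."""
    outside = np.any(np.abs(eye_deg) > half_window, axis=2)  # per sample
    return outside.mean(axis=1) > 0.5                        # per volume

def motion_outlier_volumes(motion_var):
    """Flag volumes whose head-motion variance exceeds the session mean
    by more than two standard deviations."""
    return motion_var > motion_var.mean() + 2.0 * motion_var.std()

# Toy example: 117 volumes, 40 eye samples (20 Hz x 2 s TR) per volume.
rng = np.random.default_rng(0)
eye = rng.normal(0.0, 2.0, size=(117, 40, 2))     # x/y position in degrees
var = rng.gamma(2.0, 1.0, size=117)               # per-volume motion variance
print(broken_fixation_volumes(eye).sum(), 'fixation-break volumes')
print(motion_outlier_volumes(var).sum(), 'motion-outlier volumes')
```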

Regions of interest

The ventral visual stream is considered to be a visual object-recognition pathway45, consistent with key findings of object-selective responses in monkey inferior temporal (IT) cortex46,47. We considered ventral visual stream areas V1, V2, V4, TEO (posterior IT cortex) and TE (anterior IT). We examined subdivisions of TE, that is, posterior TE (TEpd), middle TE (TEm), and anterior TE (TEad and TEa). The TEO and TE anatomical masks were delineated according to the macaque cortical parcellations in the Saleem and Logothetis30 atlas. The V1, V2, and V4 anatomical masks were delineated according to the cortical parcellations in Van Essen et al.31.

Furthermore, recent NHP studies have revealed that the superior and inferior banks of the macaque STS contain several fMRI-identified face-selective regions4,5,6,7,8. To reveal such face-selective regions, we contrasted face and place stimuli (from our Block-design experiment data) similarly to previous studies4,48,49.

To compare category-selectivity across different parts of the brain in our event-related experiment, we equated the size of all our ROIs50. Specifically, for V1, V2, V4, TEO and TE, we created a 2 mm radius spherical mask around the voxel with peak visual activation (ON > OFF contrast in our block-design experiment) within each area (Figs. 1 and 2). Note that within V4, the spherical masks for the three monkeys were located in the ventral portion of V451,52. For the functionally-defined, face-selective STS areas, we created a 2 mm radius spherical mask around the peak face-selective voxel (faces > places contrast in the block-design experiment) in the posterior, middle, and anterior STS (Fig. 3A).

The mask generation pipeline that was applied to all ROIs across both hemispheres for each animal was as follows. Within a given mask, a sphere was generated around the peak visual- or the peak face-selective voxel from our block-design experiment across all sessions within each animal. Before extracting fMRI data from the spherical masks, masks were co-registered to each individual scanning session’s example functional image to align with the functional space of each session. Final spherical ROIs had approximately equal volume (~30 mm3) across animals. We chose spheres of this size so that our spherical ROIs approximately matched the volume of our smallest functionally-defined region (a cluster of face-selective voxels in the anterior STS).
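The sphere-generation step can be sketched as follows, assuming the z-statistic map and anatomical mask are available as NumPy arrays on the 1.5 mm voxel grid; this is an illustrative reconstruction, not the original mask-generation code.

```python
import numpy as np

VOXEL_MM = 1.5

def spherical_roi(zmap, anat_mask, radius_mm=2.0):
    """2 mm radius spherical mask around the peak-z voxel within anat_mask."""
    masked = np.where(anat_mask, zmap, -np.inf)
    peak = np.unravel_index(np.argmax(masked), zmap.shape)   # peak voxel
    grid = np.indices(zmap.shape)
    dist_mm = np.sqrt(sum((g - p) ** 2 for g, p in zip(grid, peak))) * VOXEL_MM
    return dist_mm <= radius_mm

# Toy example. Note that the voxel count (and hence the exact mask
# volume) depends on how voxel centres are counted against the radius.
rng = np.random.default_rng(0)
zmap = rng.normal(size=(64, 64, 32))
anat = np.zeros(zmap.shape, dtype=bool)
anat[20:40, 20:40, 10:20] = True                             # anatomical mask
roi = spherical_roi(zmap, anat)
print(roi.sum(), 'voxels =', roi.sum() * VOXEL_MM ** 3, 'mm^3')
```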

Event-related fMRI: data analysis

We used custom-written code in Matlab to temporally co-register the stimulus presentation times with the fMRI volumes and eye-tracking recordings. Statistical analyses were performed using FSL’s FMRI Expert Analysis Tool (FEAT) Version 6.00 by estimating a general linear model (GLM). For each session, each image was modelled as an explanatory variable (EV, i.e., regressor). The monkeys’ head-motion outliers (see above) were included in the GLM as additional EVs of no interest. All EVs were convolved with a haemodynamic response function (HRF) adjusted to reflect the macaque BOLD HRF, which is faster than in humans: we used a gamma HRF with a 3 s mean lag and a 1.5 s standard deviation (see refs. 20,53,54). We set up one contrast (stimulus > baseline) for each of our 48 images. Z-statistic images were derived from the EVs as follows: each EV in the design matrix resulted in a parameter estimate (PE) image indicating the fit of the waveform model to the data in each voxel. A PE image was converted to a t-statistic image by dividing the PE by its standard error (derived from the residual noise after the complete model was fit). T-statistic images were then converted to z-statistic images following standard statistical transformations. The beta weight for each stimulus EV, within each ROI, was extracted and converted to % signal change using the Featquery tool in FSL. In every ROI, we averaged the data from the two hemispheres.
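The macaque-adjusted HRF and the PE → t → z pipeline can be made concrete with a short sketch. A gamma density with a 3 s mean lag and 1.5 s standard deviation corresponds to shape k = (3/1.5)² = 4 and scale θ = 1.5²/3 = 0.75 s; the toy design and noise level below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gamma, norm, t as t_dist

TR, dt, n_vols = 2.0, 0.1, 117

# Gamma HRF with mean lag 3 s and SD 1.5 s:
# mean = k*theta and var = k*theta**2 give k = 4, theta = 0.75 s.
tt = np.arange(0, 20, dt)
hrf = gamma.pdf(tt, a=4.0, scale=0.75)
hrf /= hrf.sum()

# Toy stimulus EV: 0.5 s events at assumed onsets, sampled at 10 Hz,
# convolved with the HRF, then downsampled to the TR grid.
ev = np.zeros(int(n_vols * TR / dt))
for onset in np.arange(10.0, 220.0, 30.0):          # illustrative onsets (s)
    ev[int(onset / dt): int((onset + 0.5) / dt)] = 1.0
reg = np.convolve(ev, hrf)[: ev.size][:: int(TR / dt)]

# One-voxel GLM: parameter estimate (PE), standard error from the
# residuals, then the t -> z conversion used for the statistic images.
X = np.column_stack([reg, np.ones_like(reg)])       # stimulus EV + intercept
y = 0.8 * reg + np.random.default_rng(1).normal(0, 0.1, reg.size)
beta, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = y.size - X.shape[1]
se = np.sqrt((ssr[0] / dof) * np.linalg.inv(X.T @ X)[0, 0])
t_stat = beta[0] / se                               # PE / standard error
z_stat = norm.ppf(t_dist.cdf(t_stat, dof))          # t-statistic -> z-statistic
print(f't = {t_stat:.2f}, z = {z_stat:.2f}')
```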

Event-related fMRI: representational similarity analysis

To perform representational similarity analysis (RSA)9 on face-selective regions and early visual cortex, we extracted the pre-processed fMRI data from these ROIs. ROIs included bilateral anterior, middle, and posterior face-selective regions, and early visual regions bilateral V1 and V2. For the face-selective regions, we used the localization procedure described above, and created new spherical masks with a 5 mm radius. We used larger spherical masks with more voxels for the RSA to improve sensitivity of these analyses, since cross-validated distance measures are based on voxel patterns rather than simply the mean activation in an area. For the early visual regions, we used anatomically-derived masks from ref. 31.

We loaded unsmoothed data from each spherical ROI for each face-selective region in each subject into Matlab and constructed GLMs using custom-written Matlab code. The EVs were modelled as above, with each image modelled as one variable for each run. Each run was modelled separately, in order to perform cross-validation across runs within each session. A GLM-based analysis was performed on each run for each animal and each ROI, producing a vector of beta weights per ROI. We used these beta weights to produce representational dissimilarity matrices (RDMs). We used the cross-validated Mahalanobis (or crossnobis) distance as the distance measure in the RDMs, representing the dissimilarity between two sets of voxel-wise brain patterns55,56.

The crossnobis estimate of the squared Mahalanobis distance was computed as follows:

$$\mathrm{crossnobis}({\boldsymbol{b}}_{k},{\boldsymbol{b}}_{j})=({\boldsymbol{b}}_{k,A}-{\boldsymbol{b}}_{j,A})\,{\Sigma}_{A}^{-1}\,{({\boldsymbol{b}}_{k,B}-{\boldsymbol{b}}_{j,B})}^{T}$$

where \({\boldsymbol{b}}_{k}\) and \({\boldsymbol{b}}_{j}\) are the vectors of beta weights (fMRI voxel activation patterns) to be compared for images k and j, A denotes the training set, B denotes the test set, \({\Sigma}_{A}\) is the noise covariance matrix estimated from the residuals of the GLM for this ROI in the training set A (see ref. 56), and T denotes the transpose. The crossnobis distance was computed between each pair of images in each ROI.

Cross-validation was performed across runs within each session. Given that trials with excessive eye-movements were excluded (see above and Supplementary Information), not every run included a trial for each image, and it was therefore not possible to use a leave-one-run-out method for computing the crossnobis distance. Instead, we used a split-half approach to estimate the pairwise crossnobis distances. For each session, we randomly assigned half the runs as the training set A and half the runs as the test set B (with one run left out of the analysis when there was an odd number of runs within a session). This was done 50 times to produce 50 cross-validated distances, and the distances were averaged across cross-validation folds. The same procedure was performed for each session, and the RDMs were averaged across sessions. The noise covariance matrix was estimated from the training data. To produce this matrix, we obtained the residuals R estimated from the GLM for an ROI, which form a T (number of time points) × P (number of voxels) matrix. The P × P noise covariance matrix can then be estimated as \(\Sigma =\frac{1}{T}{{\boldsymbol{R}}}^{T}{\boldsymbol{R}}\).
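A minimal sketch of the split-half crossnobis computation, assuming beta patterns are stored per run as (n_images × n_voxels) arrays and the GLM residuals as a (T × P) array (in the actual analysis the residuals come from the training half; here a single toy residual matrix is used for brevity):

```python
import numpy as np

def crossnobis_rdm(betas_a, betas_b, residuals):
    """Pairwise crossnobis distances between image patterns.
    betas_a, betas_b: (n_images, n_voxels) patterns from each half;
    residuals: (T, P) GLM residuals used for the noise covariance."""
    T = residuals.shape[0]
    cov = residuals.T @ residuals / T            # Sigma = (1/T) R^T R
    cov_inv = np.linalg.pinv(cov)                # pseudo-inverse for stability
    n = betas_a.shape[0]
    rdm = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1, n):
            d_a = betas_a[k] - betas_a[j]        # training-half difference
            d_b = betas_b[k] - betas_b[j]        # test-half difference
            rdm[k, j] = rdm[j, k] = d_a @ cov_inv @ d_b
    return rdm

# 50 random split-half folds, averaged (toy data: 10 runs, 48 images,
# 120 voxels, 500 residual time points).
rng = np.random.default_rng(0)
runs = [rng.normal(size=(48, 120)) for _ in range(10)]
resid = rng.normal(size=(500, 120))
folds = []
for _ in range(50):
    order = rng.permutation(len(runs))
    half_a = np.mean([runs[i] for i in order[:5]], axis=0)
    half_b = np.mean([runs[i] for i in order[5:]], axis=0)
    folds.append(crossnobis_rdm(half_a, half_b, resid))
rdm = np.mean(folds, axis=0)
print(rdm.shape)                                 # (48, 48)
```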

As in the univariate analysis, the RDMs were averaged across hemispheres which gave three RDMs for face-selective regions and three RDMs for early visual areas for each monkey.