Working memory (WM) is a multifaceted cognitive construct that comprises short-term storage, rehearsal, interference resolution, decision-making, and response operations. Edward Smith was one of the early pioneers in the use of functional neuroimaging to understand the physiological basis of WM (Jonides et al., 1993; Smith et al., 1995); his enduring legacy is the emphasis on dissociating these components of WM from one another and isolating their neural properties and substrates (Smith & Jonides, 1998, 1999; Wager & Smith, 2003). A large portion of the effort of isolating WM components has been directed at elucidating the neural basis for the short-term storage of information. Some of the earliest studies used a cognitive-subtraction logic, comparing activity across tasks matched for executive demands, but with differential demands on storage, in order to distinguish storage from rehearsal and executive functions. These studies revealed storage-related activity in posterior regions, although some additionally showed similar activity profiles in prefrontal cortex (PFC; Smith & Jonides, 1998, 1999). These findings are broadly consistent with the finding that storage operations are largely intact in patients with lesions to PFC (D’Esposito & Postle, 1999).

Other attempts to identify the neural substrates of short-term storage processes have involved the use of different stimulus domains as memoranda (Lepsien & Nobre, 2007; Ranganath, Cohen, Dam, & D’Esposito, 2004), the inclusion of interfering information to tax or disrupt storage (Artchakov et al., 2009; Sakai, Rowe, & Passingham, 2002; Yoon, Curtis, & D’Esposito, 2006), or the manipulation of WM load (Braver et al., 1997; Jha & McCarthy, 2000; Jonides et al., 1997; Leung, Seelig, & Gore, 2004; Todd & Marois, 2004). The results from this work continue to be equivocal; some studies have emphasized the role of lateral PFC in storage operations and suggested that perceptual regions only transiently represent memoranda (Leung, Gore, & Goldman-Rakic, 2002; Munk et al., 2002), whereas others have suggested that storage occurs primarily within content-specific posterior regions (Postle, 2006). The most recent attempts to identify storage-related activity with neuroimaging have employed multivariate decoding methods. These studies have identified sustained distributed patterns of activity in posterior cortices that contain information about items retained in WM (Christophel, Hebart, & Haynes, 2012; Harrison & Tong, 2009; Riggall & Postle, 2012; Serences, Ester, Vogel, & Awh, 2009), suggesting an overlap between the neural substrates of perception and WM storage (D’Esposito, 2007; Pasternak & Greenlee, 2005; Postle, 2006). However, it is unclear whether these patterns persist in the face of subsequent perceptual input, which is a key criterion for WM storage.

In addition to depending on local activity, WM relies on communication between regions (Fuster, Bauer, & Jervey, 1985). Multivariate neuroimaging analyses have recently allowed researchers to investigate interregional communication with fMRI (Friston, 1994; Rissman, Gazzaley, & D’Esposito, 2004; Roebroeck, Formisano, & Goebel, 2005). Several studies have pointed to the importance of communication between PFC and posterior regions during WM (Cohen, Sreenivasan, & D’Esposito, 2012; Fiebach, Rissman, & D’Esposito, 2006; Gazzaley, Rissman, & D’Esposito, 2004; Liebe, Hoerzer, Logothetis, & Rainer, 2012; Salazar, Dotson, Bressler, & Gray, 2012); however, it remains unclear how interactions between these regions support WM processes. Measures of directed connectivity, which support inferences about directional influences (Friston, Harrison, & Penny, 2003; Roebroeck et al., 2005), may help resolve this question by distinguishing between (1) transient, passive activity in perceptual regions as a result of input from PFC regions involved in storage, and (2) active storage operations in perceptual regions that influence incoming sensory processing and relay that information to PFC.

In the present study, we sought to investigate the role of posterior perceptual regions in WM storage using a two-pronged approach. First, we sought to demonstrate that population activity within perceptual cortex is tuned to the sensory features of the memory item—a key prediction of models that emphasize an active role for perceptual regions in memory-guided processes (Chelazzi, Duncan, Miller, & Desimone, 1998; Desimone & Duncan, 1995). Morphing software was used to create a set of faces with features that varied parametrically with respect to a designated “target” face. Participants held the target face in WM, viewed a series of probe faces, and responded to indicate whether each probe face was a match to the target.

Previous studies have used manipulations of probe stimuli to infer storage and maintenance processes (Jonides, Smith, Marshuetz, Koeppe, & Reuter-Lorenz, 1998) and have typically found elevated responses to “match” relative to “nonmatch” probes (Druzgal & D’Esposito, 2001; Jiang, Haxby, Martin, Ungerleider, & Parasuraman, 2000; Miller & Desimone, 1994). However, match probes may exhibit qualitatively different perceptual responses than nonmatch probes do, for several reasons: Match and nonmatch probes may engage different cognitive processes (St. James & Eriksen, 1991; Zhang, Leung, & Johnson, 2003), or match probes may activate attentional resources that result in preferential downstream processing of match stimuli. Alternatively, increased match responses may be the result of WM storage operations; if perceptual neurons are preferentially tuned to target-specific features, the response to any probe face should reflect the degree to which visual features of the target are present in the probe. Demonstrating this type of graded (quantitatively varying) response as a function of physical similarity to the maintained memory item would provide a critical constraint on the role of perceptual regions in WM by indicating that these regions contain information about the features of the memory item. In principle, one might predict the opposite pattern—that tuning in perceptual neurons would result in reduced responses to targets, as well as increased response with decreasing physical similarity to the target face, due to adaptation of the neurons that represent the target and its features. 
However, on the basis of work demonstrating that tuning results in increased responses to task-relevant information (Chelazzi et al., 1998; David, Hayden, Mazer, & Gallant, 2008), as well as our own work demonstrating that electrophysiological responses increased as probes increased in similarity to a target held in WM (Sreenivasan, Sambhara, & Jha, 2011), we predicted that perceptual response would be increased for probes that were similar to the target face, and decreased for dissimilar probes. To ensure that any observed modulations as a function of similarity to the target face were specific to WM, we included a control task that included similar bottom-up stimulus information without any WM requirements.

In a second analysis, we used directed functional connectivity measures to identify the direction of information flow between perceptual regions demonstrating sensitivity to the target features and higher-order regions that were also recruited by the task. We reasoned that if the decision criteria for the task were based on a comparison between the visual input and the WM representation stored in perceptual regions, then WM decision processes should rely on feedforward information transfer. Alternatively, comparisons between probes and the stored target representation could occur in multimodal association cortex, resulting in downstream modulation of perceptual regions. In this case, we would see greater evidence for feedback information flow.

Method

Participants

A group of 18 healthy adults from the University of California, Berkeley, community was recruited to participate in this study. A previous electrophysiological study with a similar paradigm (Sreenivasan et al., 2011) revealed a large effect size (ηp² > .75) using 15 participants. A conservative estimate based on these data suggested that ~15 participants would be sufficient to achieve acceptable power in the present study. The data from two individuals were discarded prior to the analysis: One individual did not properly follow the task instructions, and another exhibited excessive head motion (with more than 10 % of functional data censored due to motion). This left 16 participants (10 female) between the ages of 18 and 32 (mean = 22 years). All of the participants were right-handed, with normal or corrected-to-normal vision and no history of neurological injury or medications with psychoactive, cardiovascular, or homeostatic effects. Written informed consent was obtained according to procedures approved by the Committee on the Protection of Human Subjects at UC Berkeley.

Stimuli

In the prescan, memory, and control tasks, the stimuli were black-and-white line drawings of faces. Four faces were chosen from a large collection of unique faces that were assembled from a set of face features (eyes, noses, and mouths) from the Mac-a-Mug software package (Sheherazam software). These four faces were divided into two morph pairs. Each morph pair was loaded into the Morph 2.5 package (Gryphon software), which can be used to create a series of images that define a transformation of a start image into a destination image. Four intermediate images were saved in 20 % increments along the morph continuum between the two faces in a morph pair, yielding two morph sets. In each set, one of the two original faces in the morph pair was designated as the target face, and the four intermediate morph faces were designated as the 80 %, 60 %, 40 %, and 20 % faces, with the name indicating the percentage of target features present in that face. Critically, each individual feature of a morph face retained a percentage of the corresponding original target feature. We operationally define the term “feature similarity” to indicate the overlap in features between a given face and the target face. At the same time, each feature in a given face was qualitatively different from the corresponding features of the other morph faces within its morph set. Two other faces that did not contain any of the features of the faces in either morph set were designated as the novel faces, one for each set. Seven participants performed the experiment with Morph Set 1, and the remaining nine performed the experiment with Morph Set 2. The two morph sets are shown in Fig. 1.

Fig. 1

Face stimuli. Seven participants performed the task with Morph Set 1, and nine performed the task using Morph Set 2. In the memory task, the target face was maintained in WM while faces from the morph set were presented sequentially in a pseudorandom order. Participants responded to each face with a “target”/“nontarget” judgment. The control task required participants to determine whether or not the faces were rotated. All faces (except the novel face, which was not analyzed) contained a percentage of target facial features (80 %, 60 %, 40 %, and 20 %, respectively; see the Method section for details)

In the face localizer task, face stimuli were grayscale photographs of male and female faces with hair, ears, and neck cropped. Scene stimuli were grayscale photographs of outdoor scenes.

Experimental procedure

Participants completed two brief prescan tasks, during which they were familiarized with the face stimuli and encoded the target face. Following these tasks, they performed alternating blocks of a continuous recognition task (memory task) in which they responded to the presentation of the target face, and a perceptual task in which they responded to slight rotations of the face stimuli (control task). Participants also completed one run of a face localizer task designed to identify face-processing regions.

Prescan tasks

The prescan tasks followed the same procedure as in Sreenivasan et al. (2011). Briefly, participants were exposed to the stimuli and encoded the target face in the context of two matching tasks. In the first matching task, two faces were presented simultaneously; the face on the left was always the target, and the face on the right was either the target (50 % of trials) or the 80 %, 60 %, or 40 % face (each presented on 16.7 % of trials). The second matching task was similar to the first, except that both faces varied on each trial. In both matching tasks, each trial required a buttonpress indicating whether the two faces were identical (50 % of trials) or different (50 % of trials). The face pairs were presented for 1,500 ms, followed by a 750-ms delay. Participants received feedback on each trial. The participants viewed a total of 720 faces across both matching tasks, half of which were targets, and the remaining half of which were divided evenly among 80 %, 60 %, and 40 % faces.

Memory and control tasks

The memory and control tasks were performed in alternating runs in the scanner. Both tasks consisted of a series of centrally presented faces: Target, 80 %, 60 %, 40 %, 20 %, and novel faces were each presented with equal probability. The presentation order was counterbalanced to control for sequence effects, and the target was not repeated on consecutive trials. Each face was presented for 750 ms, with an interstimulus interval of 1,250 ms. In order to optimize parameter estimation for our events of interest, face stimuli were interspersed with null events, as determined using Optseq (http://surfer.nmr.mgh.harvard.edu/optseq/). Our predictions concerned the faces that contained a percentage of the target features; thus, the novel face was included as an event of no interest. In the memory task, participants were instructed to use their right hand to push one button for target faces and another for nontarget faces. In the control task, they used the same buttons to indicate whether each face was tilted 2° (16.7 % of trials; tilted left or right with equal frequency) or upright. The bottom-up stimulus information was identical across tasks, with the exception of the slight tilt in 16.7 % of the faces in the control task. Equating the bottom-up information allowed us to attribute differences as a function of task to the influence of WM, ruling out the possibility that our results were due to differences between the stimuli in the absence of WM. It should be noted that the representation of faces in temporal regions is largely invariant to subtle positional transformations (Andrews & Ewbank, 2004). One participant completed three scanning runs of the memory and control tasks, and another completed two; all other participants completed four runs of each task.

The memory task was designed to encourage participants to maintain an active representation of the target throughout each run in order to distinguish targets from nontarget faces. Studies examining the effect of active representations on perceptual processing typically employ a delayed match-to-sample task; however, these tasks can be performed using familiarity-based cues (Miller & Desimone, 1994). Importantly, WM maintenance, whether it involves recently encountered information or activated portions of long-term memory (Cowan, 1993; Lewis-Peacock & Postle, 2008), may rely on different neural substrates than are recruited by passive, familiarity-based strategies (Speer, Jacoby, & Braver, 2003). Thus, the memory task was designed to promote active WM maintenance by (1) using stimuli with a high degree of similarity, which necessitated a highly selective memory template for the target, and (2) familiarizing participants with most of the stimuli during the prescan tasks, precluding them from using a sense of familiarity to identify targets. Consistent with our aim of promoting active WM maintenance, a debriefing asking participants to reveal any strategies they may have used indicated that participants did not consciously use familiarity as a strategy.

Face localizer task

Following the memory and control tasks, participants completed a single scanning run of a face localizer task. For this task, 16-s blocks of rapidly presented face stimuli were interspersed with 16-s blocks of rapidly presented scenes and with blank 16-s baseline blocks, while participants indicated stimulus repetitions with a buttonpress.

Data acquisition and preprocessing

Imaging data were collected on a 3 Tesla Siemens MAGNETOM Trio MR scanner equipped with a 12-channel head coil. Functional data were acquired with a T2*-weighted gradient echoplanar imaging sequence with fat saturation. The parameters of the acquisition sequence were as follows: repetition time (TR) = 2,000 ms, echo time (TE) = 33 ms, flip angle = 74°, 96 × 96 in-plane matrix, 26 slices (5 % slice gap) acquired in descending order approximately 10° from axial, voxel size = 2.33 × 2.33 × 3.0 mm³. These parameters yielded near-whole-brain coverage, although slices were positioned for maximal coverage of the inferior temporal and lateral prefrontal cortices, at the expense of full coverage of orbitofrontal, anterior temporal, and posterior parietal cortices. An MP-RAGE T1-weighted sequence (parameters: TR = 2,300 ms, TE = 2.98 ms, matrix = 160 × 240 × 256, isotropic 1.0-mm³ voxels) was used to acquire a high-resolution anatomical scan for coregistration of the functional data. The face localizer run consisted of 168 volumes; all memory and control task runs consisted of 160 volumes.

All image processing was conducted in AFNI (http://afni.nimh.nih.gov/afni/; Cox, 1996). Functional volumes were slice-time corrected and aligned to the third volume of the first imaging run. Nonbrain voxels were removed from both the functional and structural volumes. The functional volumes underwent a rigid-body alignment to the structural image and spatial smoothing with a 4-mm full-width-at-half-maximum kernel.

Univariate analysis

A general linear model (GLM) was used to estimate response amplitude for each stimulus type separately for the memory and control tasks. Each event was modeled with the canonical gamma HRF. Regressors of no interest included trends up to order 3 and estimated head motion parameters. In addition to modeling each stimulus type separately, we tested two contrasts corresponding to competing hypotheses about the role of perceptual regions in WM. The first contrast (physical similarity) was designed to isolate activity specific to the feature similarity between the probe and the target face. We defined a linear contrast, in which the stimulus overlap values (.2, .4, .6, .8, and 1.0) were used as the weights on the corresponding stimulus types. The second contrast was designed to identify activity related to participants’ subjective estimates of “targetness” (perceptual similarity) derived from their behavioral responses; for each participant, the proportion of “target” responses to a given stimulus type was used as the weight for that stimulus type. Weights for different stimulus types in the perceptual similarity contrast are shown in Supplementary Table S1 for each participant. Trials with response time (RT) <200 ms were added as nuisance events in the perceptual similarity contrast; incorrect trials were additionally included as nuisance events in the physical similarity contrast (including incorrect trials yielded similar results).
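The two sets of contrast weights described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the input format for behavioral responses, and the example values are hypothetical, and the sketch applies the weights exactly as listed because the text does not state whether they were mean-centered.

```python
import numpy as np

# Stimulus types of interest, ordered by decreasing overlap with the target.
STIM_TYPES = ["target", "80", "60", "40", "20"]

# Physical similarity: proportion of target features in each stimulus type.
physical_weights = np.array([1.0, 0.8, 0.6, 0.4, 0.2])


def perceptual_weights(responses):
    """Proportion of "target" responses for each stimulus type.

    `responses` (hypothetical format) maps each stimulus type to a list of
    booleans, where True means the participant responded "target".
    """
    return np.array([np.mean(responses[s]) for s in STIM_TYPES])


def contrast_estimate(betas, weights):
    """Weighted sum of the per-stimulus parameter estimates."""
    return float(np.dot(weights, np.asarray(betas, dtype=float)))


# Made-up responses for one participant, for illustration only.
example_responses = {
    "target": [True, True, True, False],
    "80": [True, False, False, False],
    "60": [False, False, False, False],
    "40": [False, False, False, False],
    "20": [False, False, False, False],
}
example_weights = perceptual_weights(example_responses)
```

Because the perceptual weights come from each participant's own responses, they differ across participants, whereas the physical weights are fixed by the morph percentages.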

In addition to evaluating the results of the above GLMs within participant-specific regions of interest (ROIs), we conducted a whole-brain analysis. Participants’ parameter estimates were transformed to standard space (AFNI’s standard Talairach atlas) using the transform defined by AFNI’s @auto_tlrc function, and paired comparisons between contrasts of interest were conducted using AFNI’s 3dttest++ function. Whole-brain maps were overlaid on the average of participants’ normalized anatomical scans.

The GLM for the face localizer task contained separate regressors for face and scene blocks. GLM parameters were as described above, except that face and scene blocks were modeled as 16-s boxcar functions convolved with the canonical HRF.

Directed functional connectivity analysis

Directed functional connectivity measures were established using Granger causality (GC) analysis, which can support inferences about directed influences between regions from fMRI data (Ding, Bressler, Yang, & Liang, 2000; Roebroeck et al., 2005). GC analysis was implemented via the Granger Causal Connectivity Analysis toolbox (Seth, 2010) for MATLAB (MathWorks, Natick, MA), in order to identify information flow between posterior face-processing regions and frontal regions. Time series data from an entire scanning run (160 TRs) were extracted for each participant and ROI (see below), mean-centered, and detrended. The memory and control runs were analyzed separately. Each run was considered a realization of a common underlying stochastic process (Ding et al., 2000), and analyzed accordingly by removing the ensemble mean from each run and using a first-order autoregressive model (Seth, 2010). Our measure of directed connectivity was the difference between the two directional GC estimates for a pair of regions (the Granger difference of influence, or DOI, value), which is thought to be a better approximation of the true directions of influence for fMRI data (Roebroeck et al., 2005). A potential limitation of applying GC analysis to BOLD data is that regional differences in hemodynamic response may result in spurious connectivity results (Friston, 2009; Smith et al., 2011), but subsequent work has shown that this concern is mitigated in practice (Deshpande, Sathian, & Hu, 2010; Schippers, Renken, & Keysers, 2011; Seth, Chorley, & Barnett, 2013). This concern can be further mitigated by examining differences between the results of GC analyses across experimental conditions (Roebroeck et al., 2005), thus making this technique a useful method for assessing directed interregional influences (Wen, Rangarajan, & Ding, 2013). Accordingly, our analyses focused on differences across the memory and control tasks.
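To make the DOI measure concrete, the following is a minimal NumPy sketch of a bivariate, first-order Granger analysis that treats runs as realizations of one process. It is not the toolbox implementation used in the study: the function names and the simulated data are hypothetical, detrending is omitted, and the sketch assumes at least two runs so that an ensemble mean can be removed.

```python
import numpy as np


def _residual_variance(target, predictors):
    """Residual variance of a least-squares fit of target on predictors."""
    coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return np.var(target - predictors @ coef)


def granger_doi(x_runs, y_runs):
    """Return F(x->y) - F(y->x) for a first-order autoregressive model.

    x_runs, y_runs: arrays of shape (n_runs, n_timepoints), one ROI each.
    """
    x = np.asarray(x_runs, dtype=float)
    y = np.asarray(y_runs, dtype=float)
    # Mean-center each run over time, then remove the ensemble mean
    # (the average across runs at each time point).
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Pool lag-1 pairs across runs without crossing run boundaries.
    x_prev, x_now = x[:, :-1].ravel(), x[:, 1:].ravel()
    y_prev, y_now = y[:, :-1].ravel(), y[:, 1:].ravel()
    both_prev = np.column_stack([x_prev, y_prev])
    # GC magnitude: log ratio of restricted to full residual variance.
    f_x_to_y = np.log(_residual_variance(y_now, y_prev[:, None])
                      / _residual_variance(y_now, both_prev))
    f_y_to_x = np.log(_residual_variance(x_now, x_prev[:, None])
                      / _residual_variance(x_now, both_prev))
    return f_x_to_y - f_y_to_x


# Simulated example (made-up data): x drives y with a one-sample lag.
rng = np.random.default_rng(0)
n_runs, n_timepoints = 4, 160
x_sim = rng.standard_normal((n_runs, n_timepoints))
y_sim = np.zeros_like(x_sim)
y_sim[:, 1:] = 0.8 * x_sim[:, :-1] + 0.5 * rng.standard_normal(
    (n_runs, n_timepoints - 1))
doi = granger_doi(x_sim, y_sim)
```

In this simulation the DOI is positive because the past of x improves prediction of y but not the reverse; swapping the two arguments flips the sign.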

Spatial correlation

We conducted an exploratory analysis to examine the relationship between the spatial patterns of activation for the different stimulus types and how this relationship was affected by WM. This analysis complements and extends the univariate analysis because it considers the multivariate pattern of BOLD activity elicited by each stimulus type rather than a single value for activity magnitude. First, the parameter estimates for each voxel within an ROI were vectorized (Aguirre, 2007). This procedure was conducted separately for each stimulus type. We calculated the correlation (Pearson’s r) between the vector for each stimulus type and the vector for the target face. The resulting coefficient was a measure of the similarity between the activity pattern elicited by a given face and the activity pattern elicited by the target face. In order to conduct statistical comparisons, correlation coefficients were transformed to z-scores using the Fisher transformation.
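The pattern-similarity computation can be illustrated with a short sketch. The function name and the dictionary input format are assumptions made for illustration; each entry is taken to hold the vector of parameter estimates across the voxels of one ROI for one stimulus type.

```python
import numpy as np


def pattern_similarity_z(patterns, target_key="target"):
    """Fisher-z correlation between each stimulus pattern and the target's.

    `patterns` (hypothetical format) maps stimulus type to a 1-D vector of
    voxelwise parameter estimates from the same ROI.
    """
    target = np.asarray(patterns[target_key], dtype=float)
    similarity = {}
    for name, pattern in patterns.items():
        if name == target_key:
            continue  # r = 1 with itself, so the z value would be infinite
        r = np.corrcoef(target, np.asarray(pattern, dtype=float))[0, 1]
        similarity[name] = np.arctanh(r)  # Fisher transformation
    return similarity


# Made-up four-voxel patterns, for illustration only.
example_patterns = {"target": [1, 2, 3, 4], "80": [1, 3, 2, 4]}
example_z = pattern_similarity_z(example_patterns)
```

The target is skipped because its correlation with itself is 1 by definition, which has no finite Fisher-transformed value.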

Region of interest definition

Face-specific perceptual processing was isolated by defining individual-participant fusiform face area (FFA) ROIs from the face localizer task data. First, a bilateral anatomical inferior temporal mask was created on a standard brain using the unthresholded Harvard–Oxford Probabilistic Brain Atlas (provided by the Harvard Center for Morphometric Analysis via the FSL analysis package: http://fsl.fmrib.ox.ac.uk/fsl/). This mask was transformed to individual participants’ native space using the inverse of the transform from standard to native space. Face-processing modules were identified as clusters of voxels within the mask that showed greater activity for faces relative to scenes in the localizer GLM (t > 4.0, corresponding to an uncorrected p < .0001). Clusters were examined visually to identify bilateral clusters corresponding to the FFA (in one participant, only the right FFA could be identified), and the 30 voxels with the largest t value for the faces-versus-scenes contrast within these clusters were used to create each participant’s bilateral FFA ROI.
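The final voxel-selection step can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name and array inputs are hypothetical, and the sketch assumes that the 30 voxels are chosen across the identified clusters combined rather than per hemisphere.

```python
import numpy as np


def top_voxel_roi(t_map, cluster_mask, n_voxels=30):
    """Boolean ROI of the n_voxels highest-t voxels inside cluster_mask."""
    t_map = np.asarray(t_map, dtype=float)
    cluster_mask = np.asarray(cluster_mask, dtype=bool)
    # Voxels outside the clusters can never be selected.
    masked = np.where(cluster_mask, t_map, -np.inf)
    top_idx = np.argsort(masked, axis=None)[::-1][:n_voxels]
    roi = np.zeros(t_map.shape, dtype=bool)
    roi.flat[top_idx] = True
    # Guard for clusters that contain fewer than n_voxels voxels.
    return roi & cluster_mask


# Made-up 4 x 4 x 4 volume, for illustration only.
example_t = np.arange(64).reshape(4, 4, 4)
example_mask = example_t % 2 == 0
example_roi = top_voxel_roi(example_t, example_mask, n_voxels=5)
```

Fixing the number of voxels keeps ROI size constant across participants, so differences in averaged responses are not driven by differences in ROI extent.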

Results

Behavioral performance

The signal detection metric d-prime (d') was used to measure participants’ sensitivity for detecting the target face in the memory task and the rotated face in the control task. Detection sensitivity was comparable across memory and control tasks (memory task [mean d' ± SEM]: 1.79 ± 0.16; control task: 1.91 ± 0.14; p > .5, Cohen’s d = 0.15; all Cohen’s ds in this article are corrected for dependence between the means; see Morris & DeShon, 2002). RT measures were also comparable across the tasks (memory task [mean RT ± SEM]: 406 ± 14 ms; control task: 411 ± 20 ms; p > .75, d = 0.09).
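For reference, d' is the difference between the z-transformed hit rate and false alarm rate. The sketch below uses only the Python standard library; the function name is hypothetical, and the log-linear correction for extreme rates is an assumption, since the text does not say how hit or false alarm rates of 0 or 1 were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    both rates away from 0 and 1; this correction is an assumption here.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Made-up trial counts, for illustration only.
chance_level = d_prime(50, 50, 50, 50)
above_chance = d_prime(80, 20, 20, 80)
```

In the memory task a hit is a "target" response to the target face, and in the control task a hit is a "tilted" response to a rotated face, which puts sensitivity in the two tasks on a common scale.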

fMRI results

Our first prediction was that WM for the target face would result in an evoked face response in the FFA that was proportional to the probe’s feature similarity to the target face. Regression parameter estimates (β values) for each of the five stimulus types of interest (target, 80 %, 60 %, 40 %, and 20 %) were averaged over the voxels in each participant’s bilateral FFA ROI and entered into a two-way repeated measures analysis of variance (ANOVA) with the factors Stimulus Type and Task (memory and control). In accordance with our prediction, we found a significant Stimulus Type × Task interaction [F(4, 60) = 2.97, p = .029, ηp² = .16; see Fig. 2a]. Planned one-way follow-up ANOVAs demonstrated that feature similarity modulated the FFA response in the memory condition [F(4, 60) = 9.43, p < .001, ηp² = .39], but not in the control condition [F(4, 60) = 1.75, p = .15, ηp² = .11], indicating that the predicted modulation was a result of the feature similarity between the probe and the activated WM template and was not due to any physical properties of the faces themselves. For completeness, we also report the main effects of task [F(1, 15) = 3.04, p = .10, ηp² = .17] and stimulus type [F(4, 60) = 6.10, p < .001, ηp² = .29].

Fig. 2

Region-of-interest results. a Fusiform face area response was modulated by the degree of feature overlap between the probe face and the target face being held in WM. This effect was specific to the memory task. b The pattern of modulation was more consistent with a linear effect, in which response amplitude was determined by the sensory feature similarity between the probe and the target face (physical similarity contrast), rather than a subjective contrast in which the response amplitude was determined by participants’ behavioral responses to each of the probe face stimulus types (perceptual similarity contrast). This linear effect was only present during the memory task. All error bars represent SEMs; asterisks indicate p < .005

In a supplementary analysis, we examined whether similar effects of WM could be observed in early visual cortex. Bilateral early visual cortex ROIs were created from the ROI corresponding to early visual regions in the Harvard–Oxford Probabilistic Brain Atlas, thresholded at 50 %. This mask was transformed to individual-participant space. The data extracted from this early visual mask were also examined for Stimulus × Task interactions using a two-way repeated measures ANOVA. We found no evidence for a main effect of stimulus type [F(4, 60) = 0.44, p > .77, ηp² = .03] or an interaction between stimulus type and task [F(4, 60) = 1.49, p > .21, ηp² = .09]. Together, our results suggest that WM for faces results in templates that are tuned for face features, without systematically modulating downstream population activity. However, this does not rule out the possibilities that subpopulations of early visual neurons coding for the simple features could be tuned to target-specific features (Gratton, Sreenivasan, Silver, & D’Esposito, 2013), that our manipulation modulated communication between higher and lower visual regions (Al-Aidroos, Said, & Turk-Browne, 2012), or that we lacked sufficient power to detect weaker modulations of population activity.

We also conducted a complementary exploratory multivariate analysis, in which we examined the similarity between the FFA activity patterns elicited by probe faces and the FFA activity pattern elicited by the target face, and how this relationship was modulated by WM. Z-transformed correlation coefficients were entered into the same Stimulus Type × Task ANOVA described earlier. Since the correlation of the activity pattern elicited by the target face to itself is, by definition, 1, corresponding to a z score of positive infinity, this condition was removed from the ANOVA, resulting in four stimulus types in the analysis. The results were qualitatively similar to those from our univariate analysis; we observed a trend toward a significant interaction of stimulus type and task [F(3, 45) = 2.16, p = .11, ηp² = .13], as well as a trend toward a significant linear contrast for the memory condition [F(1, 15) = 4.11, p = .06, ηp² = .22], but no such trend for the control condition [F(1, 15) = 0.004, p > .95, ηp² < .001], suggesting that multivoxel patterns of activation displayed increasing similarity to the target pattern as the physical similarity to the target increased, but that this relationship was specific to WM.

Our next prediction concerned the specific role of perceptual regions in WM. We compared the parameter estimates of the physical similarity contrast, in which the weight on each stimulus type was the sensory feature similarity to the target face, with the parameter estimates of the perceptual similarity contrast, in which the weight on each stimulus type was determined by participants’ subjective similarity, as indicated by their behavioral responses (see the Method section). The physical similarity contrast corresponded to our prediction that FFA neurons stored a veridical representation of the target face; FFA neurons preferentially tuned to target features should result in a population response to a probe that is a linear function of the physical similarity between the probe and the target features. The perceptual similarity contrast corresponded to the alternative prediction that graded responses in FFA reflected the outcome of decision processes that compared sensory input to WM representations stored in multimodal association cortex. The parameter estimates for each contrast were averaged over the voxels in each participant’s FFA ROI. Consistent with our prediction, the average parameter estimate for the physical similarity contrast in the memory condition was significantly greater than the average estimate for the perceptual similarity contrast in that same condition [two-tailed paired t test: t(15) = 3.69, p = .002, d = 2.07; see Fig. 2b], and significantly greater than the parameter estimate for the physical similarity contrast in the control condition [t(15) = 3.36, p = .004, d = 0.85].

To provide further evidence for our hypothesis that perceptual regions store features of WM items, we examined the relationship between WM and directed connectivity between perceptual regions and higher-order regions. Our prediction was that if the pattern of modulation with target feature similarity represented the storage of target features, WM decision and response processes should depend on comparison operations within face-processing regions, and thus be associated with communication from face-processing regions to control and decision-making regions. In contrast, information flow from higher-order regions to face-processing regions would suggest that the pattern of results observed above may reflect downstream effects of decision processes, such as elevated attentional priority (Chelazzi et al., 1998; Liu, Hospadaruk, Zhu, & Gardner, 2011). First, we identified regions involved in the maintenance of target features with a group analysis contrasting the physical similarity contrast for the memory task with the physical similarity contrast for the control task. The resulting group map (Fig. 3a) was thresholded at p < .001 and a cluster size of 20 voxels, to achieve an α level of .01 as determined by AFNI’s 3dClustSim function, which estimates the probability of false positives using Monte Carlo simulations of noise distributions with the same estimated smoothness as the data. Supporting the results from our ROI analysis, bilateral regions of fusiform gyrus showed significant activation for the linear (memory > control) contrast. Other regions identified through this analysis included dorsal and ventral prefrontal regions, as well as regions of basal ganglia and thalamus (Table 1).

Fig. 3

Whole-brain results. a A broad network of regions exhibited a greater linear effect during the memory task than during the control task, including bilateral PFC subregions, basal ganglia, and thalamus (see Table 1). b Directed functional connectivity between bilateral fusiform gyrus and combined dorsal PFC regions predicted the participants’ behavioral performance during the memory task, indicating that information flow from perceptual to dorsal PFC regions is optimal for task performance

Table 1 Regions demonstrating increasing activity with increasing similarity to the target face

On the basis of known anatomical connections between PFC and inferior temporal regions (Pandya, Dye, & Butters, 1971; Pandya & Kuypers, 1969), as well as the well-established importance of PFC in WM (D’Esposito, Postle, Jonides, & Smith, 1999; Jha, Fabian, & Aguirre, 2004; Jonides, Smith, Marshuetz, Koeppe, & Reuter-Lorenz, 1998; Wager & Smith, 2003; Zanto, Rubens, Thangavel, & Gazzaley, 2011), our connectivity analyses focused on prefrontal regions identified in the group analysis: bilateral inferior frontal junction (IFJ), bilateral inferior frontal gyrus/insula (IFG/insula), dorsal anterior cingulate/presupplementary motor area (dACC/preSMA), and right inferior frontal gyrus (IFG). Masks from the group activation maps for these regions were transformed into native space, and time-series data were extracted from each mask for each participant and entered into a bivariate GC analysis with the time series from the reverse-normalized bilateral fusiform group mask. GC analyses were conducted separately for the memory and control tasks (see the Method section).
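The logic of a bivariate GC analysis between two regional time series can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure described in the Method section: it assumes a single lag, demeaned series without an intercept, an influence measure defined as the log ratio of residual sums of squares, and a difference of influence (DOI) defined so that positive values mean fusiform-to-prefrontal influence dominates. All function names and the simulated data are hypothetical.

```python
import math
import random


def _demean(series):
    m = sum(series) / len(series)
    return [v - m for v in series]


def granger_influence(source, target):
    """Lag-1 Granger influence source -> target: ln(RSS_restricted / RSS_full)."""
    x, y = _demean(source), _demean(target)
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    syy = sum(v * v for v in y_lag)
    sxx = sum(v * v for v in x_lag)
    syx = sum(p * q for p, q in zip(y_lag, x_lag))
    c1 = sum(p * q for p, q in zip(y_now, y_lag))
    c2 = sum(p * q for p, q in zip(y_now, x_lag))
    # Restricted model: the target is predicted from its own past only.
    a_r = c1 / syy
    rss_r = sum((t - a_r * p) ** 2 for t, p in zip(y_now, y_lag))
    # Full model: the source's past is added (2 x 2 normal equations).
    det = syy * sxx - syx * syx
    a = (c1 * sxx - c2 * syx) / det
    b = (c2 * syy - c1 * syx) / det
    rss_f = sum((t - a * p - b * q) ** 2
                for t, p, q in zip(y_now, y_lag, x_lag))
    return math.log(rss_r / rss_f)


def difference_of_influence(fusiform, pfc):
    """Positive values: fusiform -> PFC influence exceeds PFC -> fusiform."""
    return granger_influence(fusiform, pfc) - granger_influence(pfc, fusiform)


def simulate(n=500, seed=0):
    """Synthetic series in which the 'fusiform' signal drives the 'PFC' signal."""
    rng = random.Random(seed)
    fusiform, pfc = [0.0], [0.0]
    for _ in range(n - 1):
        fusiform.append(0.5 * fusiform[-1] + rng.gauss(0, 1))
        # fusiform[-2] is the fusiform value from the previous time step.
        pfc.append(0.2 * pfc[-1] + 0.8 * fusiform[-2] + rng.gauss(0, 1))
    return fusiform, pfc


fusiform, pfc = simulate()
print(difference_of_influence(fusiform, pfc) > 0)
```

Because the simulated fusiform series drives the simulated PFC series, the DOI comes out positive. With real data, one DOI value per participant and task would be entered into the group-level tests.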

The PFC ROIs were grouped according to well-described functions, resulting in a ventral selection network (IFG/insula + IFG; Badre & Wagner, 2007; D’Esposito et al., 1999; Jha et al., 2004; Thompson-Schill, D’Esposito, Aguirre, & Farah, 1997) and a dorsal maintenance/cognitive control network (IFJ + dACC/preSMA; Brass, Derrfuss, Forstmann, & von Cramon, 2005; Derrfuss, Brass, Neumann, & von Cramon, 2005; Wager & Smith, 2003), although the results did not differ if the regions were considered individually. Average Granger DOI values between these PFC networks and the bilateral fusiform region were examined for specificity to WM. Neither the dorsal nor the ventral PFC network exhibited greater connectivity during the memory task than during the control task (ps > .5). Given that the control task also required cognitive control resources, however, this was not particularly surprising. Critically, we found a significant relationship between fusiform–dorsal prefrontal Granger DOI values and behavioral performance (d') during the memory task (r = .89, p < .001, 95 % confidence interval = [.68, .96]; see Fig. 3b), indicating that greater influence from perceptual regions to dorsal frontal regions was related to increased behavioral performance during WM. This relationship was specific to WM; the magnitude of this correlation was significantly greater (p < .001; Steiger, 1980) than the correlation between the fusiform–dorsal prefrontal Granger DOI and d' during the control task (r = –.13, p = .63). The corresponding fusiform–ventral frontal correlation during memory did not reach significance (r = .26, p = .32) and was significantly weaker than the fusiform–dorsal frontal correlation (p = .002).

We further examined whether the relationship between fusiform–dorsal prefrontal DOI and behavior was likely to be driven by local activity differences within either region. Mean activity in fusiform and dorsal prefrontal ROIs showed no relationship with behavior (fusiform: r = .23, p > .39, 95 % confidence interval = [–.30, .65]; dorsal prefrontal: r = –.26, p > .33, 95 % confidence interval = [–.67, .27]), suggesting that the observed relationship between Granger DOI and behavior was not an artifact of changes in activity magnitude.
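Confidence intervals of the kind reported for these correlations are commonly obtained with the Fisher z transformation. The sketch below assumes that method and a sample of 16 participants (inferred from the 15 degrees of freedom in the paired t tests); neither assumption is confirmed by the text, and the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z transform."""
    z = math.atanh(r)                    # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = pearson_ci(0.23, 16)
print(f"[{lo:.2f}, {hi:.2f}]")  # [-0.30, 0.65]
```

Under these assumptions, the fusiform activity correlation (r = .23) yields the interval reported above.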

Discussion

Our findings demonstrate that visual regions are tuned to the features of items held in WM, indicating that these regions store feature information that is subsequently compared to sensory input. Recent work using decoding methods has similarly suggested that perceptual regions store WM representations (Harrison & Tong, 2009; Riggall & Postle, 2012; Serences et al., 2009); however, a critical advance of the present work is our finding that tuning for memoranda persists despite intervening sensory input. Miller and colleagues (Miller, Erickson, & Desimone, 1996) have previously found that WM activity in inferior temporal cortex does not persist across interfering input, and used this finding to contrast stable prefrontal representations with transient sensory representations. Thus, an important challenge for recent work suggesting that stable visual WM representations are sustained in visual cortices (e.g., Harrison & Tong, 2009; Serences et al., 2009) lies in demonstrating that these representations can withstand interference. One study observed that when participants were cued to remember the orientation of the first of two sequentially presented gratings, the orientation of the first grating could still be predicted from the activity pattern of early visual cortex during the blank delay interval (Harrison & Tong, 2009). However, unlike studies explicitly examining interference resolution in WM (Sreenivasan & Jha, 2007), the stimuli in this study were not designed to interfere with one another. In fact, the gratings were oriented orthogonally to one another and were presented in quick succession during the encoding phase of the task. In the present study, by demonstrating that tuning for the sensory features of the target face persisted across the task, we can infer that sensory representations persist across intervening input that is highly similar to the memoranda.

Several other studies have shown that WM operations modulate sensory input (Gazzaley, Cooney, McEvoy, Knight, & D’Esposito, 2005; Miller & Desimone, 1994; Peters, Roelfsema, & Goebel, 2012; Sreenivasan, Katz, & Jha, 2007; Sreenivasan, Sambhara, & Jha, 2011), but it was previously unclear whether these modulations represented transient downstream effects of top-down communication or resulted from local storage operations within perceptual cortices. The present results inform this question by providing evidence that feedforward information flow (i.e., from perceptual to prefrontal regions) is associated with better WM performance. Whereas our finding focuses on the direction of information flow, previous work has shown that visual WM precision is related to the quality of sensory representations in visual cortex (Emrich, Riggall, LaRocque, & Postle, 2013; Ester, Anderson, Serences, & Awh, 2013). Our study lacked an adequate proxy for the quality of sensory representation, limiting our ability to address this question. Taken together, our results support a model of WM wherein comparisons between stored WM representations and sensory input occur locally within perceptual regions before being relayed to PFC. An open question is how the computations that transform veridical sensory responses into subjective judgments of “targetness” and, eventually, binary match/non-match decisions are implemented, although evidence from decision-making work broadly implicates interactions between medial PFC and the reward system (Deco, Rolls, Albantakis, & Romo, 2013; Gold & Shadlen, 2007).

The pattern of modulation in the FFA for the memory condition is strikingly similar to previous theoretical and empirical accounts of attention (David, Hayden, Mazer, & Gallant, 2008; Tsotsos et al., 1995), in which visual neurons act as matched filters, elevating responses to attended features and suppressing responses to unattended features. Although the relationship between our findings and attentional tuning is merely qualitative, it adds to the growing literature connecting downstream effects of WM and attention (Awh & Jonides, 2001; Awh, Vogel, & Oh, 2006; Gazzaley & Nobre, 2012; Kuo, Stokes, & Nobre, 2012; Soto, Llewelyn, & Silvanto, 2012). Although our hypothesis did not explicitly address the question of whether tuning for target features within perceptual regions would rely on enhancement of target features or suppression of nontarget features, a comparison of the memory and control conditions (Fig. 2a) suggests that the pattern of modulation due to target feature similarity is largely due to suppression relative to baseline (control) activity.

Importantly, we were able to rule out the alternative explanation of our results—that our observed modulation due to target feature similarity was a downstream result of participants’ judgment of a probe’s “targetness,” instead of an indicator of the comparison process itself. We distinguished between these alternatives by fitting a GLM using the physical feature similarity values for each of the stimulus types as the regression weights and comparing the resulting parameter estimates to those computed from a GLM in which participants’ subjective measures of each stimulus type’s “targetness” were used as the weight on the stimulus types. The physical similarity contrast produced significantly larger parameter estimates, indicating that FFA responses were driven by the physical similarity between target and probe. It should be noted that the directed connectivity results also supported the notion that feature similarity-based modulations were fed forward to higher regions as opposed to indicating the outcome of WM decision processes.

Although our task does not resemble a typical WM task, we argue that it nevertheless taps into crucial aspects of WM. In addition to being necessary for representing recently encountered sensory information, WM is critical for actively representing information from long-term stores that is necessary to carry out the current goals (Cowan, 1993). This was demonstrated in a study that used multivoxel patterns of activity recorded during a task requiring the retrieval of long-term representations to predict when those same items were held in WM (Lewis-Peacock & Postle, 2008). This study stands as an effective demonstration of the overlap between long-term representations and WM representations. Indeed, much of the information we represent in WM is information with which we are familiar; thus, familiarity with items does not preclude the need for WM in order to actively represent those items in goal-oriented contexts. In the case of our study, the judgments required from our participants were very difficult—significantly more so than a standard delayed recognition task in which nonmatch probes are quite dissimilar to the sample item. We thus argue that our task, if anything, is more likely to require the active representation of information that is the hallmark of WM.

It is critical to distinguish between the idea that comparison between probes and sensory target representations is fed forward to PFC and the idea that WM does not rely on feedback (i.e., top-down) information flow. Although our results suggest the former, they certainly do not imply the latter; in fact, evidence was comparable for both feedback and feedforward information flow during the memory task. That is, the Granger DOI averaged across participants was not significantly different from zero. However, although the overall Granger DOI value did not indicate a preference for feedforward over feedback processing, individual differences in the direction and magnitude of this relationship were related to behavioral performance, with feedforward communication being optimal for behavior. We cannot rule out the possibility that evidence for feedforward versus feedback processing reflects differential strategic approaches by our participants. Nevertheless, our finding suggests that ongoing information about the comparison between target and probe relies on sensory representations and is fed forward to PFC. However, it is quite plausible that tuning in face-processing regions is a result of preparatory feedback signals from PFC (Lee & D’Esposito, 2012; Miller, Vytlacil, Fegen, Pradhan, & D’Esposito, 2011; but see Sugase-Miyamoto, Liu, Wiener, Optican, & Richmond, 2008, for the notion that matched filter operations during WM may be a local phenomenon), and an important area of research revolves around identifying regions of PFC that send feedback signals, as well as those that receive information from downstream regions to arrive at a decision. Targeted disruption of selected PFC regions and/or directed models that take into account multiple interregional influences simultaneously may be necessary to tease apart the relative roles of feedback and feedforward communication during WM.