The human visual system exhibits a great deal of flexibility with how visual objects are processed—we have the ability to both recognize the abstract category to which the object belongs (e.g., dog) and/or identify the specific exemplar to which it corresponds (e.g., Fido). However, abstract category recognition and specific exemplar identification place contradictory demands on how visual stimuli should be processed (see Fig. 1; Marsolek, 2003). For example, a subsystem for abstract category recognition should activate the same dog representation for visually distinct images of dogs (e.g., sheepdogs and poodles), which implies that it would not effectively activate different representations for visually similar exemplars within the category dog (e.g., my sheepdog and my neighbor’s sheepdog both activate the same dog representation). Moreover, a subsystem for specific exemplar identification should activate different representations for even visually similar exemplars within the category (e.g., my sheepdog and my neighbor’s sheepdog activate the Fido and Rex representations, respectively), which implies that it would not effectively activate the same representation for visually distinct images of dogs (e.g., sheepdogs and poodles). In other words, conflict arises between these two tasks because abstract categorization needs a representational scheme that generalizes across dissimilar objects whereas specific identification needs a representational scheme that preserves differences between even similar objects.

Fig. 1

Recognizing objects can be conceptualized as mapping from points in image space (retinotopic input representations for a visual-form subsystem; left side of the figure) to points in a long-term memory space (output representations from visual-form subsystems; right side of the figure). According to the dissociable visual subsystems theory, contradictory mapping strategies are used by the AC and SE subsystems. The AC subsystem generalizes across visual differences, so that objects from the same category are mapped to the same representation even when they are visually dissimilar. The SE subsystem preserves visual differences, so that all objects from the same category are mapped to different representations even when they are visually similar.

Some theories posit that a single object processing system can provide both aspects of object representation by using relatively abstract visual object representations (e.g., Amira, Biederman, & Hayworth, 2012; Biederman, 1987, 2013; Biederman & Bar, 1999; Biederman & Cooper, 2009; Cooper, Biederman, & Hummel, 1992; Hayworth & Biederman, 2006; Hummel & Biederman, 1992; Hummel & Stankiewicz, 1996; Wagemans, Gool, & Lamote, 1996), such as structural descriptions that are parts based and invariant to metric shape changes. Other single-system theories posit that these visual abilities are accomplished using relatively specific visual object representations (e.g., Bülthoff & Edelman, 1992; Gauthier et al., 2002; Palmeri & Tarr, 2008; Poggio & Edelman, 1990; Tarr, 1995; Tarr & Gauthier, 1998; Tarr, Williams, Hayward, & Gauthier, 1998; Ullman, 1996), such as image-based representations that are whole based and vary with metric shape changes. Yet other single-system theories posit that visual object representations are encoded in ways that enable both relatively abstract and relatively specific representations (e.g., Farah, 1992; Hayward & Williams, 2000; Palmeri & Tarr, 2008; Tarr & Bülthoff, 1995).

Alternatively, the fundamental conflict between categorization and differentiation could be resolved by using multiple visual subsystems with complementary abilities. Evidence suggests that there are two dissociable neural subsystems that operate with different relative efficiencies in the left and right cerebral hemispheres (Burgund & Marsolek, 1997, 2000; Harvey & Burgund, 2012; Marsolek, 1995, 1999; Marsolek & Burgund, 1997, 2003, 2005, 2008; McMenamin, Deason, Steele, Koutstaal, & Marsolek, 2015). In particular, an abstract-category (AC) subsystem operates with greater efficiency in the left hemisphere (LH) than in the right hemisphere (RH), and a specific-exemplar (SE) subsystem operates with greater efficiency in the RH than in the LH. The AC subsystem represents objects in a features-based manner that is relatively abstract, so that the same representation is activated by different exemplars within a category or different views of the same exemplar. In contrast, the SE subsystem represents objects in a whole-based manner that is relatively specific, so that different representations are activated by different exemplars or different views of the same exemplar.

Much of the evidence for multiple subsystems has been obtained using repetition priming experiments involving familiar visual objects (e.g., Beeri, Vakil, Adonsky, & Levenkron, 2004; Burgund & Marsolek, 2000; Harvey & Burgund, 2012; Koutstaal et al., 2001; Marsolek, 1999; Marsolek & Burgund, 2003; McMenamin et al., 2015; Simons, Koutstaal, Prince, Wagner, & Schacter, 2003). However, Marsolek and Burgund (2008) corroborated this pattern of hemispheric asymmetries using two versions of a visual working memory task designed to engage the AC and SE subsystems, respectively. Abstract categories of unfamiliar objects were created for this study by adding a brick, wedge, or pyramid to an unfamiliar base shape, such that categories consisted of three related yet distinct object exemplars (see Figure S1; Marsolek & Burgund, 2008). On each trial, participants maintained a “cue” object in visual working memory for 10 seconds, and then they were prompted to decide whether it matched a second “probe” object. During the abstract visual form (AVF) variant of the task, participants judged whether the cue and probe belonged to the same abstract category (see Fig. 2a). In each same response trial, the cue and probe were different exemplars, so that an SE subsystem would not be useful for determining the correct response. During the specific visual form (SVF) variant of the task, participants judged whether the cue and probe were the same specific exemplar (see Fig. 2b). In each different response trial, the cue and probe belonged to the same abstract category, so that an AC subsystem would not be useful for determining the correct response. To test hemispheric asymmetries in visual processing, the probe object was presented in a manner that advantaged either the LH (i.e., briefly in the right visual field) or the RH (i.e., briefly in the left visual field). 
The results supported the multiple subsystems theory: performance on the AVF task was better following LH than RH probe presentations, and performance on the SVF task was better following RH than LH probe presentations.

Fig. 2

Depiction of the experimental paradigm

The multiple subsystems theory posits two parallel visual subsystems to resolve the contradictory demands between abstract category recognition and specific exemplar identification. Unfortunately, in many situations this simply moves the conflict between AVF and SVF processing demands further “downstream”—the conflicting demands between perceptual representations are replaced by conflict between the outputs of two processing subsystems. For example, consider the visual working memory tasks used by Marsolek and Burgund (2008). During a same response trial of the SVF task, cue and probe objects are the same with respect to both category and exemplar, so both the AC and SE subsystems would indicate that the participant’s response should be same. However, during a different response trial of the SVF task, cue and probe objects are the same with respect to category but different with respect to exemplar; this means that the AC subsystem would indicate same and the SE subsystem would indicate different. In this case, an additional mechanism is needed to resolve the conflict between outputs so the correct subsystem will guide the participant’s response. Similar patterns of high and low conflict exist in the AVF task (see Table 1), such that the pattern of conflict between subsystems across these four conditions is orthogonal to task variant (AVF or SVF) and trial type (Same, Different).

Table 1 Conditions with low/high conflict between subsystems
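
The pattern of conflict across the four conditions follows directly from the task design. The following is a minimal sketch, assuming each subsystem simply reports same or different (the AC subsystem by comparing categories, the SE subsystem by comparing exemplars). The function and variable names are illustrative and do not come from the original study.

```python
def subsystem_outputs(same_category, same_exemplar):
    """Idealized outputs: AC compares categories, SE compares exemplars."""
    ac = "same" if same_category else "different"
    se = "same" if same_exemplar else "different"
    return ac, se

# (task, trial type) -> (cue and probe share a category?, share an exemplar?)
# These relationships are taken from the description of the two task variants.
CONDITIONS = {
    ("AVF", "Same"): (True, False),
    ("AVF", "Different"): (False, False),
    ("SVF", "Same"): (True, True),
    ("SVF", "Different"): (True, False),
}

def conflict_level(task, trial_type):
    """High conflict when the two idealized subsystems disagree."""
    ac, se = subsystem_outputs(*CONDITIONS[(task, trial_type)])
    return "high" if ac != se else "low"

conflict_table = {cond: conflict_level(*cond) for cond in CONDITIONS}
for (task, trial_type), level in conflict_table.items():
    print(f"{task}-{trial_type}: {level} conflict")
```

Under these assumptions the subsystems disagree only when the cue and probe share a category but not an exemplar, which is the case on AVF-Same and SVF-Different trials.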

Previous tests of the dissociable subsystems theory have focused on establishing that there are dissociable subsystems, ignoring the need for mechanism(s) that resolve conflict between them. To remedy that situation, we collected fMRI data while participants performed the visual working memory tasks from Marsolek and Burgund (2008) to identify the mechanism(s) for conflict resolution required by the dissociable subsystems theory. Of particular interest was the frontoparietal “control” network centered on the bilateral intraparietal sulcus (IPS) that previous functional neuroimaging studies have identified in working memory tasks (Pessoa, Gutierrez, Bandettini, & Ungerleider, 2002; Seeley et al., 2007). These regions are positioned in a manner to implement a combination of proactive conflict resolution (e.g., using the current task demands to direct top-down biasing of the AC/SE subsystems to reduce anticipated conflict; Gazzaley & Nobre, 2012; Padmala & Pessoa, 2011; Yantis & Serences, 2003) and/or reactive conflict resolution (e.g., engaging conflict resolution mechanisms only after AC/SE subsystems have conflicting outputs; Sebastian et al., 2013; Wendelken, Ditterich, Bunge, & Carter, 2009).

Regions performing proactive conflict resolution were identified by measuring how task demands altered functional connectivity during the maintenance phase of the working memory task—a proactive control region should exhibit high connectivity with LH visual regions during the AVF task and high connectivity with RH visual regions during the SVF task. Regions performing reactive conflict resolution were identified by increased probe-evoked activation on trials with high conflict (AVF-Same, SVF-Different) relative to those with low conflict (AVF-Different, SVF-Same; see Table 1). Moreover, the within-category visual similarity between cue and probe stimuli should affect the amount of conflict between subsystems differently during the AVF task (in the same response trials) and the SVF task (in the different response trials). Increasing the within-category visual similarity of the cue and probe stimuli should benefit same responses in the AVF task but impair different responses in the SVF task, whereas decreasing the visual similarity should impair same responses in the AVF task but benefit different responses in the SVF task. Indeed, our previous behavioral work indicated that visual similarity predicted performance in the two tasks in these opposite ways (Morseth, Burgund, & Marsolek, 2012). This means that increased visual similarity should decrease conflict between the two subsystems during the AVF-Same trials (both subsystems are pushed toward the correct same response, so the conflict from an SE subsystem’s output decreases). In addition, increased visual similarity should increase conflict between the two subsystems during the SVF-Different trials (both subsystems are pushed toward an incorrect same response, so the conflict from an AC subsystem’s output increases). Accordingly, the effects of visual similarity on neural measures of reactive conflict processing were assessed separately for the AVF-Same and SVF-Different conditions.

Method

Participants

Thirty-six male participants (mean age 22 years [±2.5], range 19–30 years) with normal or corrected-to-normal vision and no reported neurological or psychiatric disease were recruited from the University of Minnesota and Macalester College communities. All participants were right-handed (mean laterality quotient = 0.87, based on the Edinburgh Handedness Inventory; Oldfield, 1971). Right-handed males were tested because they tend to exhibit more consistent hemispheric asymmetries than women or left-handed males (Hellige, 1993). The project was approved by the University of Minnesota and Macalester College’s Institutional Review Boards, and all participants provided written informed consent prior to participation.

Materials

Stimuli were the line drawings of 240 objects used by Marsolek and Burgund (2008). Eighty object categories of three exemplars each were created by adding a brick, pyramid, or wedge to 80 different base shapes (see Figure S1). A separate sample of right-handed male participants (N = 24) viewed pairs of objects within the same category and rated how “visually similar the objects are to each other” on a scale of 1 (not at all similar) to 7 (very similar). These ratings were averaged across participants to create a similarity score for every pair of objects within each category.

Procedure

Working memory tasks

Participants performed two visual working memory tasks based on Marsolek and Burgund (2008), as depicted in Fig. 2. Both tasks were presented using PsyScope software (Cohen, MacWhinney, Flatt, & Provost, 1993), and stimuli were viewed on a projection screen using a mirror mounted to the head coil. Each trial began with a centrally presented fixation point (500 ms), and then a cue object appeared (1 s), followed by the working memory maintenance period (11 s of blank screen). Then, a centrally presented probe object appeared for 1 s after the maintenance period, and participants made a button-press response. The intertrial interval was randomly jittered (2–6 s).

The two tasks were administered across four separate runs with the order of the tasks (AVF, AVF, SVF, SVF or SVF, SVF, AVF, AVF) counterbalanced across participants. During the AVF task, participants decided whether the probe and cue objects were visually similar enough that they belonged to the same abstract object category or dissimilar enough that they belonged to different abstract object categories. In same trials, the two objects belonged to the same abstract category, but they were not the same exemplar; in different trials, the two objects belonged to different abstract categories. In this way, an SE subsystem would not be useful for enabling the correct judgment. During the SVF task, participants decided whether the probe and cue objects were the same specific object exemplar or different specific object exemplars. In same trials, the two objects were the same images; in different trials, the two objects were different specific exemplars that both belonged to the same abstract category. In this way, an AC subsystem would not be useful for enabling the correct judgment. Participants responded by pressing one button with their right index finger to indicate same and another button with their right middle finger to indicate different. Each run had 16 trials (eight same, eight different), and participants performed two runs of each task variant (AVF, SVF). Trials were pseudo-randomized within runs to ensure that there were no more than three consecutive same or different trials.
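
One simple way to obtain a trial order that satisfies the run-length constraint is rejection sampling: shuffle the 16 trials and keep the first order with no run longer than three. The sketch below is an illustration only; the original study used PsyScope, and the text does not state how the constraint was enforced. The function names are hypothetical.

```python
import random

def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

def pseudo_randomize(n_same=8, n_different=8, max_run=3, rng=None):
    """Shuffle until no more than max_run consecutive trials share a type."""
    rng = rng or random.Random()
    trials = ["same"] * n_same + ["different"] * n_different
    while True:
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)

# Seeded generator so the example is reproducible.
order = pseudo_randomize(rng=random.Random(0))
print(order)
```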

Participants practiced both tasks outside the scanner before the study began (four trials of each task), and then again while in the scanner immediately before the two runs of the task (two trials of the relevant task). Feedback was given on every practice trial to ensure that participants understood the task demands.

Object localizer task

Immediately following the working memory runs, participants completed two runs of a functional localizer task in order to identify regions activated by visual objects. During each run, participants viewed 10 alternating blocks of familiar and scrambled objects. Each block consisted of 24 line drawings of familiar objects taken from Snodgrass and Vanderwart (1980) or 24 visually scrambled versions of these objects presented at 2 Hz for 12 s. Scrambled objects were created by randomly rearranging nine squares of a 3 × 3 grid superimposed on each picture. At the end of each block, participants pressed a button to indicate whether all the images had been real or scrambled. Objects were not repeated; thus, a total of 240 unique objects and their scrambled counterparts were presented across the two runs.
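
The grid-scrambling step can be sketched as follows. This is a hedged illustration in NumPy, assuming each picture is an array whose height and width are divisible by three; the function name `scramble_grid` is hypothetical, and the original stimulus-generation code is not described in the text.

```python
import numpy as np

def scramble_grid(image, grid=3, rng=None):
    """Randomly rearrange the grid x grid tiles of an image array."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image sides must be divisible by the grid size")
    th, tw = h // grid, w // grid
    # Cut the image into tiles in row-major order.
    tiles = [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    # Draw a random tile order and reassemble the image row by row.
    order = rng.permutation(len(tiles))
    rows = [np.hstack([tiles[i] for i in order[r * grid:(r + 1) * grid]])
            for r in range(grid)]
    return np.vstack(rows)

# Toy 6 x 6 "picture" so that each tile is 2 x 2 pixels.
image = np.arange(36).reshape(6, 6)
scrambled = scramble_grid(image, rng=np.random.default_rng(0))
```

A random permutation can occasionally leave some tiles in place; whether such cases were excluded in the original stimuli is not stated.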

MRI data acquisition

MRI data were collected using a 3 Tesla Siemens TRIO scanner (Siemens Medical Systems, Erlangen, Germany) with a 12-channel phased-array head coil. Each session began with the acquisition of an MPRAGE anatomical scan (1.0 mm isotropic voxels). Each of the subsequent functional runs collected EPI data with TR = 2.0 s, TE = 30 ms, and FOV = 220 mm. Each EPI volume contained 34 axial slices with thickness of 3.5 mm and voxels measuring 3.5 mm × 3.5 mm in plane. Each run of the working memory task acquired 189 volumes, and each run of the object localizer acquired 122 volumes.

Data analysis

Behavioral analysis

Error rates and response times were analyzed separately in two repeated-measures analyses of variance (ANOVAs) that included Task (AVF vs. SVF) and Probe Type (same vs. different) as within-participant variables. Incorrect-response trials and trials with response times more than 2.5 standard deviations from the grand mean of correct trials (computed across all conditions and participants) were excluded from the analysis of response times. In addition, two participants were excluded from the analysis of response times because their response times were not recorded correctly due to a programming error.
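
The response-time exclusion rule can be illustrated with a short sketch. It assumes the trials from all participants and conditions have been pooled into flat arrays, and it uses the sample standard deviation (`ddof=1`), which is an assumption because the text does not say which estimator was used. The function name and the data values are made up for illustration.

```python
import numpy as np

def trim_response_times(rt_ms, correct, n_sd=2.5):
    """Return a mask of trials kept for the response-time analysis."""
    rt_ms = np.asarray(rt_ms, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Grand mean and SD are computed over correct trials only.
    grand_mean = rt_ms[correct].mean()
    grand_sd = rt_ms[correct].std(ddof=1)
    within = np.abs(rt_ms - grand_mean) <= n_sd * grand_sd
    # Keep trials that are both correct and within the cutoff.
    return correct & within

# Hypothetical pooled data: one very slow correct trial, one incorrect trial.
rts = [900, 950, 1000, 920, 980, 940, 960, 910, 990, 3000, 930]
acc = [True] * 10 + [False]
keep = trim_response_times(rts, acc)
```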

Functional MRI preprocessing

Preprocessing of the functional and anatomical MRI data used the AFNI (http://afni.nimh.nih.gov/afni/) and SPM (http://www.fil.ion.ucl.ac.uk/spm/) software packages. The first three volumes of each functional run were discarded to account for equilibration effects. Slice-timing correction used Fourier interpolation to align the onset times of every slice in a volume to the first acquisition slice. A six-parameter rigid body transformation corrected head motion within and between runs by spatially registering each volume to the first volume. None of the participants exhibited excessive head motion (>3.5 mm within-run total movement). The SPM package was used to skull strip the high-resolution anatomical scans. A 12-parameter affine transformation registered each participant’s anatomical scan with the TT_N27 template (AFNI package) for normalization to Talairach space. The same transformation was applied to the functional data. Functional data were resampled to a grid with 3-mm isotropic voxels, and a 6 mm full-width half-maximum (FWHM) Gaussian filter was used to spatially smooth all volumes. The average intensity at each voxel in each run was scaled to 100.

Analysis of functional MRI signals

Functional signals were analyzed in two complementary analyses in order to identify regions that exhibited proactive and/or reactive control on the AC and SE subsystems. Regions exhibiting proactive control were identified using the localizer task to create left and right lateralized regions of interest (ROIs) that correspond to the AC and SE object recognition subsystems, and then comparing the functional connectivity of these two regions during the maintenance phase of each trial. Proactive control regions should exhibit relatively greater connectivity to the LH/AC region (relative to the RH/SE subsystem) during the AVF task and relatively greater connectivity to the RH/SE region (relative to the LH/AC subsystem) during the SVF task. Regions exhibiting reactive control over the AC and SE subsystems were identified using the Task-by-Probe type interaction to find regions where probe-evoked activation was greater on high-conflict trials (AVF-Same, SVF-Different) relative to probe-evoked activation on low-conflict trials (AVF-Different, SVF-Same trials).

Activation at every voxel was analyzed for each participant using a multiple regression model in AFNI. Signal drift and motion artifacts were residualized from the functional time course using constant, linear, and quadratic polynomial covariates for each run and six additional regressors corresponding to rigid-body head motion parameters. The residual signals from this GLM were used in subsequent analyses of activation and functional connectivity during the localizer and task.
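
The nuisance regression described above can be sketched for a single run with ordinary least squares. This is a NumPy stand-in for illustration, not the AFNI implementation: the function name, the simulated signal, and the simulated motion parameters are all assumptions.

```python
import numpy as np

def residualize_run(signal, motion):
    """Remove polynomial drift and motion from one run.

    signal: (n_timepoints,) or (n_timepoints, n_voxels)
    motion: (n_timepoints, 6) rigid-body motion parameters
    """
    n = signal.shape[0]
    t = np.linspace(-1.0, 1.0, n)
    # Constant, linear, and quadratic drift terms for this run.
    drift = np.column_stack([np.ones(n), t, t ** 2])
    design = np.column_stack([drift, motion])
    beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - design @ beta

# Simulated run: 189 acquired volumes minus the 3 discarded volumes.
rng = np.random.default_rng(0)
n_volumes = 186
motion = rng.normal(scale=0.2, size=(n_volumes, 6))
trend = 2.0 * np.linspace(-1.0, 1.0, n_volumes)
signal = 100.0 + trend + motion @ rng.normal(size=6) + rng.normal(size=n_volumes)
residual = residualize_run(signal, motion)
```

Because the design includes a constant term, the residual has zero mean and is uncorrelated with every nuisance regressor.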

Defining inferotemporal regions of interest

The object localizer task was used to create bilateral inferotemporal visual ROIs. Regressors were created for the object and scrambled-object conditions by convolving the stimulus timecourses (i.e., boxcar functions) with the BLOCK hemodynamic response function using AFNI’s 3dDeconvolve program. The beta scores for the object and scrambled-object regressors were compared at every voxel using univariate random effects analysis across participants. The object representation regions were identified as clusters where objects evoked more activity than scrambled objects, t(35) > 9.0, cluster extent > 30 voxels. Given that our goal was to explore functional hemispheric asymmetries in these ROIs, it was important to ensure that the two regions were of comparable shape and location across hemispheres. To that end, the final ROIs were defined as the intersection of the cluster in each hemisphere with the contralateral cluster mirrored across the left–right axis. This resulted in two bilateral ROIs that are equal in size, shape, and location.
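
The mirrored-intersection step can be sketched on a boolean mask. The example assumes the voxel grid is symmetric about the midline along its first axis, so that flipping that axis mirrors left and right; which half is the left hemisphere depends on the image orientation, so the halves are simply called low-x and high-x here. The function name and the toy mask are hypothetical.

```python
import numpy as np

def symmetric_rois(object_mask, x_axis=0):
    """Intersect a bilateral mask with its left-right mirror image."""
    mirrored = np.flip(object_mask, axis=x_axis)
    symmetric = object_mask & mirrored
    n_x = object_mask.shape[x_axis]
    mid = n_x // 2
    index = np.arange(n_x)
    # Reshape the x index so it broadcasts along the chosen axis.
    shape = [1] * object_mask.ndim
    shape[x_axis] = -1
    low_half = (index < mid).reshape(shape)
    high_half = (index >= n_x - mid).reshape(shape)
    return symmetric & low_half, symmetric & high_half

# Toy 2-D mask with two clusters of slightly different shape.
mask = np.zeros((6, 3), dtype=bool)
mask[0:2, 0:2] = True   # cluster in the low-x half
mask[4:6, 1:3] = True   # cluster in the high-x half
low_roi, high_roi = symmetric_rois(mask)
```

By construction the two resulting ROIs are mirror images of each other, so they match in size and shape.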

Estimating single-trial activation timecourses

Our analyses of maintenance-period connectivity and effects of visual similarity on probe-evoked activity required estimates of activation in every voxel for each individual trial. However, given the slow temporal nature of the hemodynamic response, responses from consecutive trials may overlap with one another during event-related designs. Therefore, we employed a deconvolution approach that other studies have used to successfully account for overlapping signals between consecutive trials of an event-related design (Mumford, Turner, Ashby, & Poldrack, 2012; Turner, Mumford, Poldrack, & Ashby, 2012). A separate deconvolution analysis was performed for each trial. In each of these deconvolution analyses, a single trial of interest was modeled using the cubic spline basis function for 24 seconds following cue onset to capture the activity across all phases of the trial, including transient responses locked to the cue and probe onset as well as sustained signal during the maintenance period. Using the cubic spline basis function to model the BOLD response for each trial allowed the activation timecourse to be estimated without requiring assumptions regarding the shape of the hemodynamic response. All other trial types were modeled using six other cubic spline basis functions (AVF-Same, AVF-Different, AVF-Incorrect, SVF-Same, SVF-Different, and SVF-Incorrect trials) created using every trial other than the trial of interest. This allows one to estimate the activation evoked on a single trial while accounting for overlapping signals from adjacent trials. The result was an estimated activation timecourse for 24 seconds locked to trial onset for each trial at every voxel (see Footnote 1).

Functional connectivity with inferotemporal regions of interest

Functional connectivity between the inferotemporal ROIs and every other voxel in the brain was measured as correlated activation across trials using a variant of the beta-series method (Rissman, Gazzaley, & D’Esposito, 2004). Maintenance-phase activation was estimated from each single-trial activation timecourse by averaging over a window beginning 4 seconds post cue onset (to account for hemodynamic lag relative to trial onset) and ending 10 seconds post cue onset (to ensure that it did not contain any contributions from the probe stimulus). However, the earlier timepoints in this window also contained large, transient cue-locked responses, so the activation scores at each timepoint were z-scored across trials to equate signal magnitude for each timepoint in the window and ensure that those timepoints did not exert undue influence over the window average. We refer to this time period as the maintenance phase because the time window predominantly covers activity during cue maintenance, but it undoubtedly contains some signals related to cue processing (e.g., visual onset of the cue and initial encoding). However, our hypotheses regarding proactive control do not require a clear distinction between mechanisms operating on cue-related signals versus a “pure” maintenance signal.
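
The windowing and z-scoring steps can be sketched as follows. The example assumes single-trial timecourses sampled every 2 s (the TR) from cue onset and treats both window edges as inclusive; neither detail is stated explicitly in the text. The function name and the simulated data are illustrative.

```python
import numpy as np

def maintenance_activation(timecourses, times_s, start_s=4.0, end_s=10.0):
    """Average z-scored activation over the maintenance window.

    timecourses: (n_trials, n_timepoints) for one voxel or ROI
    times_s: (n_timepoints,) seconds from cue onset
    """
    window = (times_s >= start_s) & (times_s <= end_s)
    windowed = timecourses[:, window]
    # z-score each timepoint across trials so that large cue-locked
    # transients do not dominate the window average.
    z = (windowed - windowed.mean(axis=0)) / windowed.std(axis=0, ddof=1)
    return z.mean(axis=1)

# Simulated data: 32 trials, 24 s sampled every 2 s.
rng = np.random.default_rng(0)
times_s = np.arange(0.0, 24.0, 2.0)
timecourses = rng.normal(size=(32, times_s.size))
timecourses[:, 2] += rng.normal(5.0, 3.0, size=32)  # transient at 4 s
maintenance = maintenance_activation(timecourses, times_s)
```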

Functional connectivity was defined as the correlation of maintenance-phase activation across trials between each of the bilateral inferotemporal ROIs and each of the other gray-matter voxels. The bilateral ROIs should be positively correlated with one another, so their patterns of connectivity with the rest of the brain should be highly similar. To isolate the connectivity patterns that differed across the two inferotemporal ROIs, partial correlations were used to measure connectivity between each region of interest and each voxel by removing variance that could be explained by the contralateral region of interest.

The result was four connectivity maps for each participant—connectivity during each of the two tasks (AVF and SVF) for each of the two regions of interest (the LH region as a seed with the RH partialled out; the RH region as a seed with the LH partialled out). The voxel-wise correlation maps were Fisher-transformed to reduce bias in the correlation coefficients (Silver & Dunlap, 1987), and hemispheric asymmetry maps were calculated as the voxel-wise difference between maps with LH and RH seeds. Regions where hemisphere asymmetry maps differed between AVF and SVF tasks were identified using a paired t test. Correction for multiple comparisons was performed by estimating the FWHM smoothness of spatial noise using AFNI’s 3dFWHMx program (7.71 mm × 7.73 mm × 7.39 mm) and then using AFNI’s 3dClustSim program to perform Monte Carlo simulations and determine the cluster extent threshold necessary to achieve a whole-brain corrected alpha ≤0.01. Based on these simulations, all statistical maps were thresholded at p < .005 (uncorrected) with a cluster extent ≥46 voxels.
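
For a single voxel, the partial correlation, Fisher transform, and hemispheric asymmetry score can be sketched as below. This is an illustration under stated assumptions: the partial correlation is computed by regressing the contralateral ROI out of both the seed and the voxel, and the simulated signals (with more trials than in the actual design) are made up so that the voxel tracks the LH seed. The function names are hypothetical.

```python
import numpy as np

def partial_corr(seed, target, control):
    """Correlation of seed and target after removing the control signal."""
    design = np.column_stack([np.ones_like(control), control])

    def resid(y):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return y - design @ beta

    return np.corrcoef(resid(seed), resid(target))[0, 1]

def asymmetry(lh_roi, rh_roi, voxel):
    """Fisher-transformed LH connectivity minus RH connectivity."""
    z_lh = np.arctanh(partial_corr(lh_roi, voxel, rh_roi))
    z_rh = np.arctanh(partial_corr(rh_roi, voxel, lh_roi))
    return z_lh - z_rh

# Simulated maintenance-phase activation across trials.
rng = np.random.default_rng(0)
n_trials = 200
shared = rng.normal(size=n_trials)        # signal common to both ROIs
lh_unique = rng.normal(size=n_trials)
lh = shared + lh_unique
rh = shared + rng.normal(size=n_trials)
voxel = lh_unique + 0.5 * rng.normal(size=n_trials)
asym = asymmetry(lh, rh, voxel)
```

A positive score indicates stronger connectivity with the LH seed. In the analysis described above, such scores would be computed per task, and the AVF and SVF asymmetry maps compared with a paired t test across participants.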

Measuring conflict effects using probe-evoked activation

The signal evoked by probe stimuli was measured using the GLM framework to model the response for each trial with a cubic spline basis function. Separate models were fit for six trial types: AVF-Same, AVF-Different, AVF-Incorrect, SVF-Same, SVF-Different, and SVF-Incorrect trials. Activation evoked by the onset of the probe stimulus was measured by averaging the signal from 16 to 18 seconds after cue onset (i.e., 5 seconds after probe onset) to account for hemodynamic lag given that the probe stimuli appeared 11 seconds into each trial. The effects of task (AVF vs. SVF), probe type (same vs. different), and the task-by-probe-type interaction were measured on the probe-evoked activation using univariate voxel-wise random effects across participants. Correction for multiple comparisons was performed by estimating the FWHM smoothness of spatial noise using AFNI’s 3dFWHMx program (7.71 mm × 7.73 mm × 7.39 mm) and then using AFNI’s 3dClustSim program to perform Monte Carlo simulations and determine the cluster extent threshold necessary to achieve a whole-brain corrected alpha ≤0.01. Based on these simulations, all statistical maps were thresholded at p < .005 (uncorrected) with a cluster extent ≥46 voxels.

Measuring effects of cue–probe similarity on probe-evoked activation

Given the prediction that the visual similarity of the cue and probe will have opposite effects on reactive conflict signals during the AVF and SVF tasks, we tested whether probe-evoked activity on each trial correlated with the cue–probe similarity ratings. For each region exhibiting the effect of reactive control, the probe-evoked activation was estimated for each high-conflict trial (i.e., AVF-Same and SVF-Different) by averaging the single-trial timecourse from 16 to 18 seconds after cue onset (i.e., 5 seconds after probe onset) and across voxels within each region of interest. Spearman correlations were used to measure whether the probe-locked activity on each trial was related to the visual similarity scores on each trial. The correlations were calculated separately for each participant and condition and then Fisher-transformed for group analysis.
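
The per-participant correlation step can be sketched with NumPy alone. The example computes a Spearman correlation as the Pearson correlation of average ranks (so tied similarity scores are handled) and then applies the Fisher transform. The function names and the toy values are illustrative, not data from the study.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_fisher_z(similarity, activation):
    """Spearman rho between two trial-wise vectors and its Fisher z."""
    rho = np.corrcoef(average_ranks(similarity),
                      average_ranks(activation))[0, 1]
    return rho, np.arctanh(rho)

# Toy trial-wise values for one participant and one condition.
similarity = [1, 2, 3, 4, 5, 6]
activation = [6.0, 5.0, 4.0, 3.0, 1.0, 2.0]
rho, z = spearman_fisher_z(similarity, activation)
```

A negative value, as in this toy case, is the direction predicted for AVF-Same trials; a positive value is predicted for SVF-Different trials.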

The group analysis used a repeated-measures MANOVA to test whether the similarity correlations had the expected pattern across conditions (i.e., negative relationship between similarity and activity during AVF-Same trials, positive relationship between similarity and activity during SVF-Different trials). The within-participant factor was Task (AVF, SVF), and similarity correlations from all regions of interest served as the dependent variables. A significant effect of Task was followed up by paired t tests separately for each region of interest.

Results

Behavioral responses

Analysis of error rates revealed a main effect of Task, F(1, 35) = 20.75, p < .001, such that errors were more frequent during the AVF task (18%) than the SVF task (12%). An effect of Probe Type (same vs. different) was not detected, F(1, 35) < 1. The Task by Probe Type interaction was significant, F(1, 35) = 4.92, p = .033, in the predicted direction, in which errors were more common in the high-conflict conditions (AVF-Same, 19%; SVF-Different, 14%) than in the low-conflict conditions (AVF-Different, 16%; SVF-Same, 9%).

Similar effects appeared in the analysis of response times. A main effect of Task, F(1, 33) = 7.46, p = .010, indicated that response times were slower during the AVF task (968 ms) than the SVF task (911 ms). An effect of Probe Type was not detected, F(1, 33) < 1. The Task by Probe Type interaction was significant, F(1, 33) = 11.52, p = .002, in the predicted direction, such that responses were slower in the high-conflict conditions (AVF-Same, 995 ms; SVF-Different, 939 ms) than in the low-conflict conditions (AVF-Different, 951 ms; SVF-Same, 892 ms).

Effect of conflict during working memory maintenance

Fig. 3 depicts the bilateral object representation regions identified with the localizer for functional connectivity analyses. Each region contained 466 voxels centered on fusiform gyrus (centers of mass located at ±39, -51, -11), and extending throughout the ventral visual stream.

Fig. 3

Inferotemporal regions of interest identified by the object localizer task. Lateralized regions of interest identified by the object localizer task (left), and group-averaged evoked timecourses from each region of interest (right) to illustrate the similar temporal onset of activity across conditions. Shaded regions correspond to the maintenance phase of the trial

The connectivity asymmetry maps using these two regions differed between the AVF and SVF tasks in the bilateral IPS (see Fig. 4). Follow-up simple effect analyses (see Table 2) indicated that both of these IPS regions were more connected to the left inferotemporal ROI than the right inferotemporal ROI during the AVF task (cluster centroid) t(35)s > 2.81, and more connected to the right inferotemporal ROI than the left inferotemporal ROI during the SVF task (cluster centroid) t(35)s < -4.26. This is consistent with the interpretation that bilateral IPS uses current task demands to apply top-down proactive control over the AC and SE subsystems to minimize anticipated conflict between the two subsystems with respect to the current task demands. Alternatively, it may reflect changes to bottom-up processing such that the visual subsystem that is most useful for the current task becomes more actively connected to bilateral IPS.

Fig. 4

Regions with an effect of task on functional connectivity with inferotemporal regions of interest. The color scale for effects overlaid on the brain depicts the task-by-asymmetry interaction measured by the t score of the double-difference score (AVF/LH connectivity - AVF/RH connectivity) - (SVF/LH connectivity - SVF/RH connectivity). (Color figure online)

Table 2 Regions exhibiting proactive conflict resolution

It is important to note that activity was greater in the right inferotemporal ROI than the left during both the AVF and the SVF tasks (see Fig. 3), according to tests on the windowed averages (see Table 2). This is in contrast to our prediction based on previous neuroimaging studies of AC and SE subsystems using repetition-priming paradigms (e.g., Koutstaal et al., 2001; McMenamin et al., 2015; Simons et al., 2003). According to those studies, activity in the right inferotemporal ROI should be greater than the left during the SVF task, and activity in the left inferotemporal ROI should be greater than the right during the AVF task. Thus, one might question whether the inferotemporal ROIs used in the connectivity analysis correspond to AC and SE subsystems. To address this concern, we conducted a secondary analysis using independent AVF and SVF activity to define inferotemporal seed ROIs (see Supplemental Materials). Although results from this analysis were not particularly strong, we found a pattern of altered connectivity across tasks in bilateral IPS, as well as dmPFC, that replicated the results obtained using localizer-defined seed regions.

Effect of conflict on probe-evoked activation

Probe-evoked activation did not exhibit any main effects of Task or Probe Type, but several regions in the frontoparietal control network (including bilateral IPS, left ventrolateral PFC, dorsomedial PFC, and left caudate) had Task-by-Probe type interactions (see Fig. 5). Follow-up analyses indicated that the interaction in six of the seven regions showing this effect was consistent with increased activation during conflict trials relative to nonconflict trials. In particular, simple effects (see Table 3) indicated that Same probes elicited significantly greater activity than Different probes during the AVF task, but significantly less activity than Different probes during the SVF task. The seventh region, the cuneus, exhibited a Task-by-Probe type interaction in the opposite direction: Different probes elicited significantly more activity than Same probes during the AVF task, but there was only a weak trend for Same probes to elicit more activity than Different probes during the SVF task. The pattern in the other six regions is consistent with the frontoparietal network coming online to perform reactive conflict resolution between the AC and SE subsystems during conflict situations. Moreover, the IPS regions identified in this analysis exhibited considerable anatomical overlap with those identified in the connectivity analysis. The left IPS region from the probe-evoked analysis contained 66% (46 of 70 voxels) of the left IPS region from the connectivity analysis, and the right IPS region from the probe-evoked analysis contained 51% (42 of 83 voxels) of the right IPS region from the connectivity analysis.
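
The overlap percentages reported above follow from a simple set calculation, sketched below. The function name and the toy voxel coordinates are hypothetical; only the voxel counts (46 of 70) are taken from the text, and the sketch assumes each region is represented as a set of voxel index triples.

```python
def overlap_percent(region_a, region_b):
    """Percentage of the voxels in region_b that also fall inside region_a."""
    shared = region_a & region_b
    return 100.0 * len(shared) / len(region_b)


# Toy voxel sets built so that the counts match the left IPS example:
# the connectivity-defined region has 70 voxels, 46 of which are shared.
connectivity_region = {(i, 0, 0) for i in range(70)}
probe_region = {(i, 0, 0) for i in range(24, 120)}

left_overlap = overlap_percent(probe_region, connectivity_region)
print(f"left IPS overlap: {left_overlap:.0f}%")
```

The same calculation with 42 shared voxels out of 83 yields the right IPS figure.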

Fig. 5

Probe-evoked conflict signals. Regions exhibiting a significant task-by-probe type interaction (left) with group-averaged evoked timecourses from each region to illustrate the similar temporal characteristics of probe-evoked activity across conditions. Shaded regions correspond to the probe phase of the trial. The color scale for effects overlaid on the brain depicts the Task-by-Probe interaction measured by the t score on the double-difference score: (AVF_Same - AVF_Different) - (SVF_Same - SVF_Different). (Color figure online)

Table 3 Regions exhibiting reactive conflict resolution

Ruling out reaction time confounds

The interaction observed in this frontoparietal network closely resembles the pattern of response times observed in the behavioral data. To rule out the possibility that the activation differences in these regions were due to uninteresting differences associated with response time (e.g., deliberation time or motor preparation), we performed a follow-up GLM analysis that included response-time modulated regressors implemented via the AM2 option in AFNI’s 3dDeconvolve. Each of the original regressors was paired with a predictor whose amplitude was modulated by the response time on each trial, which removes from the original regressors the variance in each condition that can be explained by trial-to-trial variation in response time. In the 34 participants with reliable response-time data, this analysis revealed a Task-by-Probe type interaction in a similar frontoparietal network, with a pattern very similar to that of the initial analysis (see Table S1 in the supplemental materials). Given this finding, it is unlikely that the activation pattern was driven by response-time changes across conditions.
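
The logic of pairing each regressor with a response-time modulated predictor can be sketched as follows. This is a simplified illustration and not AFNI's implementation: the function name, onsets, and response times are hypothetical, onsets are assumed to be expressed in TR indices, and convolution with a hemodynamic response model is omitted. The one property the sketch is meant to show is that the modulator is mean-centered, so the unmodulated regressor still captures the average response in the condition.

```python
from statistics import mean


def am2_style_regressors(onsets_tr, response_times, n_trs):
    """Return (unmodulated, rt_modulated) stick regressors, one value per TR.

    The modulated regressor carries each trial's response time minus the
    condition mean, so it is orthogonal to the mean response by construction.
    """
    rt_mean = mean(response_times)
    unmodulated = [0.0] * n_trs
    modulated = [0.0] * n_trs
    for onset, rt in zip(onsets_tr, response_times):
        unmodulated[onset] += 1.0
        modulated[onset] += rt - rt_mean
    return unmodulated, modulated


# Hypothetical condition with three probes at TRs 1, 4, and 7.
unmod, mod = am2_style_regressors([1, 4, 7], [0.5, 1.0, 1.5], n_trs=10)
print(unmod)
print(mod)
```

Entering both columns in the GLM lets trial-to-trial response-time variance load on the modulated regressor rather than on the condition estimate of interest.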

Effect of visual similarity on probe-evoked conflict signals

Given the hypothesis that visual similarity of the cue and probe would have opposite effects on reactive conflict signals evoked by probes during AVF-Same and SVF-Different trials, we correlated visual similarity scores with activation in the six reactive conflict regions (see Table 3; bilateral IPS, left vlPFC, anterior caudate, dmPFC, and cerebellum). The repeated-measures MANOVA had a significant effect of Task, F(6, 30) = 3.51, p = .01, indicating that the similarity correlations differed during the AVF and SVF tasks (see Fig. 6). Follow-up analyses emphasized the importance of the bilateral IPS in conflict processing because the correlation was significant in the left IPS, t(35) = 2.97, p = .01, right IPS, t(35) = 2.48, p = .02, and caudate, t(35) = 3.19, p < .01, but not in the dmPFC, t(35) = 1.84, p = .07, left vlPFC, t(35) = 1.20, p = .24, or cerebellum, t(35) = 0.46, p = .66. These results from bilateral IPS indicated the predicted negative relationship between similarity and activity during AVF-Same trials and the predicted positive relationship between similarity and activity during SVF-Different trials.
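
The predicted sign reversal can be made concrete with a small sketch. The text does not specify the correlation measure or how trials were aggregated, so the use of Pearson's r on per-trial values is an assumption, and the function names and data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def task_difference(sim_avf, act_avf, sim_svf, act_svf):
    """Similarity-activity correlation in SVF-Different minus AVF-Same trials."""
    return pearson_r(sim_svf, act_svf) - pearson_r(sim_avf, act_avf)


# Made-up per-trial similarity scores and probe-evoked activity for one region.
similarity = [1.0, 2.0, 3.0]
avf_same_activity = [6.0, 4.0, 2.0]       # falls as similarity rises
svf_different_activity = [2.0, 4.0, 6.0]  # rises as similarity rises

diff = task_difference(similarity, avf_same_activity,
                       similarity, svf_different_activity)
print(f"SVF-Different minus AVF-Same correlation: {diff:.2f}")
```

A positive difference in this sketch corresponds to the predicted pattern: a negative similarity-activity relationship in AVF-Same trials and a positive one in SVF-Different trials.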

Fig. 6

Effect of similarity on probe-evoked activity in each region of interest

Discussion

In the present study, we sought to understand how the visual system reconciles the conflicting demands of categorizing and differentiating objects. To do so, we collected fMRI data while participants performed visual working memory tasks that required the use of abstract visual form (AVF) processing or specific visual form (SVF) processing. The behavioral and neural results were consistent with previous reports of dissociable neural subsystems for AC and SE visual shape representation (e.g., Burgund & Marsolek, 2000; Marsolek, 1995, 1999; Marsolek & Burgund, 1997, 2005, 2008) and provide insight into the mechanisms used to resolve conflict between the subsystems and coordinate their output with respect to current task demands. In particular, a network centered on the bilateral IPS was critical for providing proactive coordination of the subsystems and reactive conflict resolution.

The bilateral IPS regions identified during proactive and reactive conflict are frequently implicated as a key information-processing hub in the so-called dorsal attention network (DAN; Corbetta & Shulman, 2002). The DAN is a frontoparietal network with a characteristic pattern of functional connectivity at rest (Yeo et al., 2011) and activation during visual working memory tasks, but it may serve a more general purpose of coordinating the orientation of attention to the external world (Dosenbach et al., 2007; Seeley et al., 2007; Smith et al., 2009). Previous reports have also found that the IPS is involved in the representation of current tasks and goals (Dosenbach et al., 2006; Harding, Yücel, Harrison, Pantelis, & Breakspear, 2015; Harding, Harrison, Breakspear, Pantelis, & Yücel, 2014; Waskom, Kumaran, Gordon, Rissman, & Wagner, 2014) and can apply top-down attentional biases to visual cortex (Beck & Kastner, 2009; Bray, Almas, Arnold, Iaria, & MacQueen, 2015; Padmala & Pessoa, 2011), consistent with the present findings that task demands altered the connectivity between IPS and inferotemporal object representation areas.

No regions exhibited a mismatch effect of greater probe-evoked activity following a different probe trial relative to a same trial. Instead, there was a widespread interaction of probe type (same vs. different) with task (AVF vs. SVF) in the direction indicative of reactive conflict processing. Most of these regions—including the bilateral IPS, caudate, and dorsomedial PFC—are frequently included in a frontoparietal executive control network that is active during cognitive tasks (Fox et al., 2005; Seeley et al., 2007). Of particular interest are the conflict signals in the left vlPFC because previous research has found that nearby regions of the left vlPFC play an important role in stimulus selection and the inhibition of incorrect representations (Badre & Wagner, 2007; Higo, Mars, Boorman, Buch, & Rushworth, 2011).

The existence of reactive conflict signals indicates that there is a limit to the amount of proactive control that can be applied to prepare systems for upcoming stimuli. Our previous report found that the AC and SE subsystems are only weakly modular (McMenamin et al., 2015), so we hypothesize that it is difficult to selectively engage one subsystem without processing taking place in the other (Marsolek & Burgund, 1997). An interesting direction for future research is to explore how proactive and reactive processes interact, particularly given their shared neural substrates.

Moreover, the magnitude of conflict signals in these regions varied from trial to trial in relation to the visual similarity of cue–probe stimulus pairs. During an AVF-Same trial, conflict arose because the AC subsystem indicated same and the SE subsystem indicated different; similarly, conflict arose during an SVF-Different trial because the AC subsystem indicated same and the SE subsystem indicated different (see Table 1). Increasing the similarity of the cue–probe stimulus pair pushed the SE subsystem response toward same, reducing the conflict in the former condition (AVF-Same) and increasing the conflict in the latter condition (SVF-Different). Accordingly, probe-evoked signals had a negative correlation with similarity during AVF-Same trials and a positive correlation with similarity during SVF-Different trials. Overall, the magnitude of the correlation was greater during the SVF task, consistent with the idea that proactive conflict resolution had been applied to emphasize the contributions of the SE subsystem during the SVF task and deemphasize them during the AVF task.

An alternative interpretation of this interaction is that activation of these regions does not provide an index of conflict between AC and SE subsystems. Instead, it may provide an index of an object similarity metric that changes based on current task demands. For example, participants may use a “similarity to category prototype” measure for doing the AVF task (resulting in greater activity during the AVF-Same than AVF-Different trials), and a “distance between exemplars” measure for the SVF task (resulting in greater activity during the SVF-Different than SVF-Same trials). This interpretation is particularly appealing given that such a similarity measure is conceptually related to the notion of error or surprise that has been localized to nearby portions of prefrontal cortex (Wessel, Danielmeier, Morton, & Ullsperger, 2012). In other words, the former interpretation emphasizes competition between representations and the latter emphasizes competition between processes. Regardless of which interpretation is adopted, the activity in this network is consistent with the use of multiple visual processing subsystems and a control mechanism in the bilateral IPS.

Until recently, an important concern regarding the evidence for the dissociable subsystems theory was that the LH advantages in AVF tasks may reflect linguistic processing of the names associated with the familiar visual objects (Curby, Hayward, & Gauthier, 2004; Simons et al., 2003) rather than processing of abstract visual-shape information as posited in the dissociable subsystems theory. However, a study examining working memory for unfamiliar objects with no names and viewed only one time provided evidence against this possibility (Marsolek & Burgund, 2008). The same set of unfamiliar, unnamed objects was used in the present report to reduce the possibility of linguistic processing.

We observed greater activity in the right inferotemporal ROI than in the left inferotemporal ROI in both the AVF and SVF visual working memory tasks. This may be surprising given past research using repetition priming of familiar objects, which has indicated greater evidence of SE priming in the right hemisphere than in the left but greater evidence of AC priming in the left hemisphere than in the right (e.g., Koutstaal et al., 2001; McMenamin et al., 2015; Simons et al., 2003). How should the different results be reconciled? We suspect that the differences in the experimental paradigms are responsible. In repetition priming, initial encoding of some objects enables their representations to be utilized, and subsequently differences in activity are measured between objects that have been primed and objects that have not been primed. When appropriate conditions are included, repetition priming of familiar objects can be used to isolate activity that is due solely to visual object processing (see McMenamin et al., 2015), enabling differential asymmetries of visual object subsystems to be observed. Our visual working memory paradigm using unfamiliar objects is different in important ways. Comparisons are made between two shapes that have no preexisting representations to be activated. Thus, neural activity in inferotemporal areas likely reflects a high degree of interactivity with other areas (e.g., IPS, according to our results) rather than solely visual shape processing. This may be the reason why there were no differential asymmetries in inferotemporal activity in the present study. Why was activity greater in the right hemisphere than in the left overall? We suspect that the use of novel objects is responsible, given evidence that visual processing of novel shapes is more effective in the right hemisphere than in the left (e.g., Marsolek, Schacter, & Nicholas, 1996).

We note that, if this explanation for the lack of differential asymmetries is correct, we should slightly revise the interpretation of the visual working memory results obtained by Marsolek and Burgund (2008). The differential hemispheric asymmetries measured using the divided-visual-field paradigm may not reflect asymmetric visual object subsystems directly but instead how asymmetric visual subsystems interact with other brain areas.

Alternatively, the lack of differential asymmetries in inferotemporal activity may reflect the manner in which the ROI analysis was performed. When ROIs are defined from functional localizers, it is typical that they are defined individually and without the constraint that they be symmetrical. Constraining the ROIs to be symmetrical was needed for the present study, and this may have resulted in relatively poor localization of the important voxels for visual object processing (e.g., Brett, Johnsrude, & Owen, 2002; Swallow, Braver, Snyder, Speer, & Zacks, 2003). We suspect, however, that this was not the case in the present study. Our connectivity analyses did reveal asymmetry simple effects in either direction depending on task (left hemisphere greater than right hemisphere in the AVF task and right hemisphere greater than left hemisphere in the SVF task), which provides evidence against the possibility that we systematically missed the mark for identifying the important functional regions.

Previous behavioral and neuroimaging studies have indicated that the visual system accommodates the conflicting demands of object categorization and exemplar differentiation by using two asymmetric visual subsystems. However, a critical component of the dissociable subsystems theory is that the outputs from different subsystems will often be in conflict with one another, so an additional processing module is needed to adjudicate between their outputs based on current task demands. The present study provided evidence for two conflict resolution methods: proactive conflict control was implemented by bilateral IPS using current task demands to favor processing in the AC or SE subsystem, and reactive conflict control was implemented by a broad frontoparietal network (including bilateral IPS) to reconcile conflict between AC and SE subsystems once it was detected. These results support our theory that there is an important reason why dissociable AC and SE subsystems operate in the brain: they implement contradictory processes to effectively achieve the goals of recognizing abstract categories and identifying specific exemplars.