This research is concerned with identifying brain regions that are engaged when students have to deploy their mathematical knowledge in non-routine versus routine ways. Our goal is to acquire information that will guide development of a model of metacognitive activity in mathematical problem solving. We will both examine predefined brain regions (see Fig. 1) that past theory has related to routine mathematical tasks and identify additional regions that are engaged by the non-routine tasks in this research.

Fig. 1

The six predefined regions in the experiment. The Motor region is a 12.8-mm (high) by 15.6 × 15.6-mm² region centered at Talairach coordinates ±41,–20,50 spanning Brodmann areas 1, 2, and 3. The posterior superior parietal lobule (PSPL) is a 12.8-mm (high) by 12.5 × 12.5-mm² region centered at Talairach coordinates ±19,–68,55 in Brodmann area 7. The horizontal intraparietal sulcus (HIPS) is a 12.8-mm (high) by 12.5 × 12.5-mm² region centered at Talairach coordinates ±34,–49,45 in Brodmann area 40. The lateral inferior prefrontal cortex (LIPFC) is a 12.8-mm (high) by 15.6 × 15.6-mm² region centered at Talairach coordinates ±43,23,24 spanning Brodmann areas 9 and 46. The posterior parietal cortex (PPC) is a 12.8-mm (high) by 15.6 × 15.6-mm² region centered at Talairach coordinates ±23,–63,40 spanning Brodmann areas 7 and 39. The angular gyrus (ANG) is a 12.8-mm (high) by 12.5 × 12.5-mm² region centered at Talairach coordinates ±41,–65,37 in Brodmann area 39

Considerable research has studied the neural basis of what might be characterized as routine use of mathematical knowledge. The greatest amount of research has gone into understanding the role of various parietal regions in arithmetic tasks (e.g., Castelli, Glaser, & Butterworth, 2006; Eger, Sterzer, Russ, Giraud, & Kleinschmidt, 2003; Isaacs, Edmonds, Lucas, & Gadian, 2001; Molko et al., 2003; Naccache & Dehaene, 2001; Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004; Pinel, Piazza, Le Bihan, & Dehaene, 2004). The triple-code theory (e.g., Dehaene & Cohen, 1997) proposes that basic numerical knowledge is distributed over three brain regions that code for different aspects of number knowledge: the horizontal intraparietal sulcus (HIPS) that processes numerical quantities, a left perisylvian language network that is involved in the verbal processing of numbers, and a ventral occipital-parietal region that processes visual representations of digits. In related work, Dehaene et al. (2003) identified three parietal regions that will be of interest to us. In addition to HIPS, there is the angular gyrus (ANG) that is part of the perisylvian language network, and the posterior superior parietal lobule (PSPL, not part of the original triple-code theory) that supports attentional orientation on the mental number line and other spatial processing.

The prefrontal cortex is also involved in mathematical performance. Broca’s area is part of the perisylvian language network identified in the triple-code theory. There is a region of the lateral inferior prefrontal cortex (LIPFC) that is particularly involved in more advanced tasks involving topics like algebra, geometry, or calculus (e.g., Krueger et al., 2008; Qin, Anderson, Silk, Stenger, & Carter, 2004; Ravizza, Anderson, & Carter, 2008; Sohn, Goode, Koedinger, Stenger, Carter, & Anderson, 2004). It also appears to play a key role in retrieval of arithmetic facts and semantic facts (Danker & Anderson, 2007; Menon, Rivera, White, Glover, & Reiss, 2000).

We have developed ACT-R (Anderson, Bothell, Byrne, Douglass, Lebiere, & Qin, 2004; Anderson, 2007) models of equation solving (Anderson, 2005; Ravizza et al., 2008) and mental multiplication (Rosenberg-Lee, Lovett, & Anderson, 2009). These models particularly emphasize the contribution of the LIPFC and a region of the posterior parietal cortex (PPC), which is about 2 cm away from each of the three parietal regions (HIPS, PSPL, and ANG) identified by Dehaene et al. (2003). This region is activated when mental representations are being manipulated (e.g., Carpenter, Just, Keller, Eddy, & Thulborn, 1999; Zacks, Ollinger, Sheridan, & Tversky, 2002). In a variety of experiments studying tasks like algebra equation solving and geometry proof generation (see Anderson, 2007, for a review), activity in the PPC proves to be the best correlate of problem complexity, while activity in the LIPFC proves to be the best correlate of student proficiency. According to the ACT-R theory, the connection between complexity and PPC holds because its activity reflects how much the mental representation of the problem is manipulated in solving it (see Footnote 1). The connection between proficiency and LIPFC holds because its activity reflects the amount of declarative retrieval, which decreases as students develop procedural mastery of a new algorithm and no longer need to retrieve task instructions.

The predefined regions that we will work with (Fig. 1) are those that have been localized in Dehaene et al. (2003) or the ACT-R algebra studies. The predefined regions for the triple-code theory were of similar size to the ACT-R regions and were centered at the coordinates reported in Dehaene et al. (2003) and Cohen, Kadosh, Lammertyn, & Izard (2008). Rosenberg-Lee et al. (2009) compared the activity in HIPS, PSPL, ANG, PPC, and LIPFC in mental multi-digit multiplication. They found that, while there were differences, all four of the LIPFC, PPC, HIPS, and PSPL showed strong engagement that increased with task difficulty. In contrast, ANG was deactivated and was not affected by condition. As in all of our research on algebra, the left hemisphere LIPFC, PPC, HIPS, and PSPL gave stronger responses than their right hemisphere homologs, although the right hemisphere regions gave similar patterns of response. The left ANG is often activated in language tasks, and the triple-code theory only concerns left ANG (in contrast to HIPS and PSPL, which are bilateral). Rosenberg-Lee et al. did not find a differential response between left and right ANG. Inspection of algebra data from our laboratory (e.g., Anderson, Qin, Sohn, Stenger, & Carter, 2003) confirms the result of deactivation in the region of the angular gyrus and no differential left-right response.

The common feature of most of the math tasks studied is that they involve participants executing a known algorithm to find a result. However, not all use of mathematics is algorithmic, and it is often necessary to adapt one's mathematical knowledge to novel situations. Success in solving novel problems involves the metacognitive processes of monitoring and reflecting on the problem-solving process. Metacognition has been a topic of interest in research on mathematics education, in particular whether and how students monitor their problem solving, adapt their solution methods, and check whether their solutions make sense (e.g., Schoenfeld, 1987).

This paper will study how the regions in Fig. 1 respond to routine problems and non-routine problems that require modifying solution methods. We will also see if other regions are engaged by our non-routine problems. Given that the regions in Fig. 1 were identified in studies of routine mathematical activities, we suspect they will be again engaged by our routine problems. There is some reason to suspect that the ANG may also be engaged to serve the metacognitive activities of monitoring and reflecting. Regions close to the right ANG have been found to play a variety of metacognitive functions (Decety & Lamm, 2007, for a review), although there are a number of other ways of characterizing the function of the right temporoparietal junction (e.g., Legrand & Ruby, 2009).

Another region that is potentially involved in metacognitive function is Brodmann Area 10 or frontopolar cortex (FPC), particularly its lateral portion (Christoff & Gabrieli, 2000; Fletcher & Henson, 2001). A number of converging lines of research suggest that this region of the brain may be critical in the ability to extend knowledge. Ramnani and Owen (2004) stress that this region is unique in being connected exclusively with supramodal regions of the cortex. They suggest that this region is involved in the integration of information from multiple regions. There are numerous other theories of the role of this region, all of which have elements of metacognition, such as monitoring of memory retrieval (e.g., Buckner, 2003; Rugg, Henson, & Robb, 2003), branching (Koechlin & Hyafil, 2007), shifting between external and internal processes (Burgess, Gilbert, Okuda, & Simons, 2006), relational integration (Christoff et al., 2001), and after-task processing in memory experiments (e.g., Johnson, Raye, Mitchell, Greene, Cunningham, & Sanislow, 2005; Raye, Johnson, Mitchell, Greene, & Johnson, 2007; Reynolds, McDermott, & Braver, 2006).

In an effort to contrast cognitive and metacognitive processes in mathematics, we developed pyramid problems. Table 1 reproduces the instruction provided to participants for solving these problems. These problems require only middle-school mathematics to solve. As students work with pyramid problems they quickly master the algorithm for solving these problems. Nonetheless, students can still be placed in situations that require they extend this knowledge and most students can do so with at least some success.

Table 1 Instructions given to participants on pyramid problems

Anderson (2007) reported an initial behavioral study with these problems. Students had little difficulty solving regular versions of these problems in the three forms:

8$4 = x (solve-for-value)

7$x = 18 (solve-for-height)

x$4 = 22 (solve-for-base)

Students initially solve both solve-for-value and solve-for-height problems by performing repeated addition. In the solve-for-value case, students keep adding until the number of additions equals the height while in the solve-for-height case they keep adding until the sum equals the value. The solve-for-base problems cannot be dealt with so simply because there is no base to start the addition. Most students start by using a guess-and-check strategy, guessing a base, checking that, and guessing again if necessary. Their initial guesses are close (for instance, 6 for x$4 = 22) and they sensibly adjust wrong guesses (e.g., since 6 produces too small an answer for x$4 = 22, they try 7, which they often do not even test but just assume will be correct).
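The three regular forms and the strategies just described can be sketched in Python. This is a minimal sketch, assuming the definition implied by the examples in the text (a base$height value is the sum of `height` terms that start at the base and decrease by one, so that 6 is too small for x$4 = 22 and 7 is correct). The function names, the starting guess, and the search limits are hypothetical choices for illustration and are not a model of how participants choose their guesses.

```python
def pyramid_value(base, height):
    """Solve-for-value: add `height` terms, starting at `base` and decreasing by 1."""
    total, term = 0, base
    for _ in range(height):
        total += term
        term -= 1
    return total


def solve_height(base, value, max_height=100):
    """Solve-for-height: keep adding terms until the running sum equals `value`."""
    total, term = 0, base
    for height in range(1, max_height + 1):
        total += term
        term -= 1
        if total == value:
            return height
    return None  # no positive-integer height up to max_height gives this value


def solve_base(height, value, first_guess=1, max_guesses=1000):
    """Solve-for-base by guess-and-check: raise the guess while the result is too small."""
    guess = first_guess
    for _ in range(max_guesses):
        result = pyramid_value(guess, height)
        if result == value:
            return guess
        if result > value:
            return None  # overshot, so no integer base works
        guess += 1
    return None


print(pyramid_value(8, 4))   # 8 + 7 + 6 + 5 = 26
print(solve_height(7, 18))   # 7 + 6 + 5 = 18, so the height is 3
print(solve_base(4, 22))     # 7 + 6 + 5 + 4 = 22, so the base is 7
```

The guess-and-check loop relies on the value increasing with the base for a fixed height, which is why an overshoot can end the search.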

The experiment to be reported here contrasts students solving regular problems like the above with exception problems. Regular problems are characterized by involving only small positive integer values for base and height with a single unknown. Exception problems involve going outside these constraints. There are a substantial variety of exception problems and each requires the student to think about how to extend their understanding of pyramid problems. We use four types of exception problems: problems involving fractions for bases, problems involving negative numbers for bases or heights, problems involving large numbers for bases or heights, and problems involving a repetition of the variable as in x$x = 15.
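To make the four exception types concrete, the sketch below evaluates one example of each under the same repeated-addition reading of the $ operator. The closed form height × base − height × (height − 1)/2 is simply the arithmetic-series identity for that sum; it is an assumption of this sketch and is not part of the instructions given to participants. Negative heights are left out because this section does not say how they are defined. All function names are hypothetical.

```python
from fractions import Fraction


def pyramid_closed_form(base, height):
    """Sum of `height` terms base, base-1, ...: height*base - height*(height-1)/2."""
    return height * base - Fraction(height * (height - 1), 2)


def solve_repeated_variable(value, max_x=100):
    """Find a positive integer x with x$x == value, e.g. x$x = 15."""
    for x in range(1, max_x + 1):
        if pyramid_closed_form(x, x) == value:
            return x
    return None


print(pyramid_closed_form(151, 2))              # large base: 151 + 150 = 301
print(pyramid_closed_form(8, 21))               # large height: -42
print(pyramid_closed_form(Fraction(11, 2), 4))  # fractional base 5 1/2: 16
print(pyramid_closed_form(-9, 4))               # negative base: -42
print(solve_repeated_variable(15))              # x$x = 15 gives x = 5
```

Using `Fraction` keeps the fractional-base case exact, so the results can be compared with integers without rounding concerns.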

We collected verbal protocols from pilot students solving these problems. The following verbal protocol of one student solving 6$5 illustrates the solution of a regular problem:

“Six plus four plus three plus two – four, fifteen, three, eighteen plus two is 20”

That is, the student basically gives a partial record of his calculations. The student said nothing further upon receiving confirmatory feedback and just went on to the next problem. In contrast, consider the protocol of the same student as he successfully solved –9$4:

“So wait it's one less so it's minus ten…minus nine plus minus ten plus minus eleven is three and then negative… minus 42”

In this case, in addition to the record of calculations there is an initial recognition of difficulty. Also after solving the problem the student continues to think about it even though he got it right:

“You need to explain the rule”

Consider another student on the problem X$4 = X, which he solved incorrectly, entering –2 rather than 2:

“different… Oh, that’s interesting…x, 3x +6” and after the feedback:

“no, oh 2, 2, 2! Shit!”

Two metacognitive features show up in the protocols for exception problems, both of which are intuitively reasonable—an early recognition that this was a different kind of problem and various assessment remarks after the solution. We expected that these exception problems would engage brain regions in addition to those engaged for regular problems and that this engagement would extend after the problem was solved. This research will contrast regions engaged during execution of a routine procedure with regions engaged when the problem involved rethinking the existing procedure. We will call these Cognitive and Metacognitive regions. The Cognitive regions might be somewhat more active in solving exception problems than routine problems because exception problems tend to involve somewhat more complex computations, but their activity should drop after solving the problem. In contrast, Metacognitive regions should show much greater activation for exception problems and maintain that activation into after-task reflection. Less than 15% of all problems were exceptions and they were quite varied so that they would continue to engage metacognitive activities throughout the experiment.

Our research will contrast regions that reflect a Cognitive or a Metacognitive pattern of activity. As Fig. 2 illustrates, participants are given feedback after their response and time to consider this feedback. Letting “Before” denote activity before the answer is generated and “After” the activity after the answer is generated (so, for instance, Before-Regular denotes activity for regular problems before the answer is generated) we expect one of the following two orderings of the conditions:

Fig. 2

An illustration of the sequence of events for each problem. The problem began with a 4-s fixation and then was followed by a problem that stayed on the screen until the participant answered or until 30 s were up. Participants responded by clicking the answer in the keypad with a mouse. This was followed by feedback on the correct answer and its derivation. After seeing the feedback for 5 s, participants were given a repetition-detection task for 12 s. In this task, letters appeared on the screen at the rate of 1 per 1.25 s. Participants were instructed to click an onscreen button each time they detected a pair of letters that were the same. The function of this task was to distract the participant from the previous problem and return them to a relatively common neutral state

Cognitive pattern

Before-Regular > After-Exception – Cognitive regions should cease engagement upon solution of a problem and, even if Exception problems involve somewhat more computation, there should be relatively little activation after the solution of the problem.

Metacognitive pattern

After-Exception > Before-Regular – Exception problems should evoke reflection after as well as during the solution of the problem. In contrast, regular problems should involve little reflection at any point.

Thus, the contrast between After-Exception and Before-Regular offers a simple diagnostic of whether a region is Cognitive or Metacognitive. We will restrict this contrast to correctly solved problems to avoid any complications that error processing might add to the analysis, but we will also look at activity on error trials.
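The diagnostic can be illustrated with a short sketch of a paired contrast across participants. It assumes that each participant contributes one engagement estimate per condition and that the contrast is tested on the per-participant differences. The function names and the four data points are made up for illustration and are not results from the experiment.

```python
import math
import statistics


def paired_contrast_t(after_exception, before_regular):
    """t statistic and df for the per-participant After-Exception minus Before-Regular differences."""
    diffs = [a - b for a, b in zip(after_exception, before_regular)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


def pattern_label(t_value):
    """Direction of the contrast only; significance must be judged separately."""
    return "Metacognitive" if t_value > 0 else "Cognitive"


# Made-up engagement estimates (percent signal change) for four hypothetical participants.
after_exc = [0.10, 0.05, 0.12, 0.08]
before_reg = [0.30, 0.28, 0.35, 0.25]
t, df = paired_contrast_t(after_exc, before_reg)
print(pattern_label(t), df)
```

A negative t corresponds to the Cognitive ordering (Before-Regular greater than After-Exception) and a positive t to the Metacognitive ordering.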

Methods

Participants

Twenty right-handed participants (eight female and 12 male, 18–31 years of age, mean = 23.8) were recruited from the subject population at Carnegie Mellon University. Each participant took part in one fMRI scan that lasted approximately 90 min.

Procedure

Participants responded by using a mouse to select numbers from a numeric keypad on the screen. They were given practice on using the keypad when in the scanner. A scanning session consisted of seven blocks. Each block involved a series of problems presented according to the procedure in Fig. 2. The first block was a warm-up block that involved 18 regular problems (i.e., small positive numbers solving for the value, base, and height) and was presented during the structural image acquisitions. The subsequent six blocks occurred during the functional scanning. Each block contained a mixture of 21 problems. The first problem was always a regular problem for warm-up purposes and was not analyzed. The remaining problems were a randomly ordered sequence of the following problem types:

  1. Regular: Twelve problems were created using small positive integers for base (4 to 9) and height (2 to 5). These consisted of four problems each that required solving for base, solving for height, and solving for value.

  2. Exception: Four problems consisted of one each of the following:

    (a) A problem with a large base (in the hundreds) or large height (in the teens or twenties) (e.g., 151$2 = X or 8$21 = X),

    (b) A problem with a fractional base (e.g., 5½$4 = X),

    (c) A problem with a negative base or height (e.g., –9$3 = X, 9$–3 = X),

    (d) A problem with a repetition of X in two positions (e.g., X$4 = X, X$X = 21).

    Over the course of the six blocks, within each of categories (a)–(c), two problems each would involve solving for base, solving for height, and solving for value. For category (d), the variable appeared twice in each pair of positions. These subcategories within the four exception categories were randomly ordered over the blocks.

  3. Font: To have regular problems that would be surprising like exception problems but would not pose a cognitive challenge, we created four very odd fonts and presented regular problems in these fonts and in colors other than the normal black. Over the course of the experiment, each font-color choice would be tested twice solving for base, height, and value. These problems proved indistinguishable from regular problems in their behavioral and imaging profiles in the predefined regions (Fig. 1) and were collapsed with regular problems in all reported analyses.
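The composition of one functional block can be sketched as follows. This is an illustration under stated assumptions: the count of four font problems per block is inferred from the totals given above (21 problems per block, one warm-up, 12 regular, four exception), and a uniform shuffle stands in for "randomly ordered" because no ordering constraints are described. The function name and labels are hypothetical.

```python
import random


def build_block(rng):
    """One functional block: a warm-up regular problem, then 20 shuffled problems."""
    rest = ["regular"] * 12 + ["exception"] * 4 + ["font"] * 4
    rng.shuffle(rest)
    return ["warm-up"] + rest


block = build_block(random.Random(0))  # seed chosen arbitrarily for repeatability
print(len(block), block.count("exception"))
```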

fMRI analysis

Images were acquired using gradient-echo echo-planar imaging (EPI) on a Siemens 3 T Allegra scanner using a standard RF head coil (quadrature birdcage), with 2-s repetition time (TR), 30-ms echo time (TE), 70° flip angle, and 20-cm field of view (FOV). The experiment acquired 34 axial slices on each TR using a 3.2-mm slice thickness and a 64 × 64 matrix. This produces voxels 3.2 mm high and 3.125 × 3.125 mm² in plane. The anterior commissure-posterior commissure (AC-PC) line was on the 11th slice from the bottom scan slice. Acquired images were analyzed using the NIS system. Functional images were motion-corrected using six-parameter 3D registration (AIR; Woods, Grafton, Holmes, Cherry, & Mazziotta, 1998). All images were then co-registered to a common reference structural MRI by means of a 12-parameter 3D registration and smoothed with a 6-mm full-width-half-maximum 3D Gaussian filter to accommodate individual differences in anatomy.
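The voxel dimensions follow directly from the acquisition parameters, and the same arithmetic appears to account for the region sizes quoted in the Fig. 1 caption. The sketch below is a worked check, assuming no gap between slices and assuming that the predefined regions are four slices high and either five or four voxels wide in plane (an inference from the reported sizes, not something stated in the text). The constant and function names are hypothetical.

```python
FOV_MM = 200.0          # 20-cm field of view
MATRIX = 64             # 64 x 64 in-plane acquisition matrix
SLICE_MM = 3.2          # slice thickness
N_SLICES = 34

in_plane_mm = FOV_MM / MATRIX          # in-plane voxel width
coverage_mm = N_SLICES * SLICE_MM      # extent covered by the slice stack (no gap assumed)


def region_size_mm(n_slices, n_in_plane):
    """Height and in-plane width (mm) of a region given its size in voxels."""
    return n_slices * SLICE_MM, n_in_plane * in_plane_mm


print(in_plane_mm)              # 3.125
print(region_size_mm(4, 5))     # (12.8, 15.625), matching the 15.6-mm regions
print(region_size_mm(4, 4))     # (12.8, 12.5), matching the 12.5-mm regions
```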

All of our analysis will be on percent change in the BOLD response from the first scan of the 4-s fixation period. To test the predictions outlined in the introduction, we extracted an estimate of the engagement of these regions before and after the response. To separate these two estimates from any activity that occurred during the actual response generation, we used three regressors: one for the variable period up to 2 s before response completion, one for the 2 s up to response completion, and one for the 8 s after response completion. These regressors were created by taking boxcar functions for the three periods and convolving them with a hemodynamic response function. We used the standard hemodynamic function in the ACT-R model (Anderson, Carter, Fincham, Ravizza, & Rosenberg-Lee, 2008), a gamma function with an index parameter of six and a scale parameter of 0.75 s (see Footnote 2). Three such regressors were entered into a design matrix (Friston, 2006) for each of five conditions (regular versus exception crossed with correct versus error, plus a fifth condition to deal with cases where participants timed out and never generated a response). Finally, a set of regressors for a quadratic function was added for each block to extract any general trends. Ignoring the time outs and block trends, this yielded 12 beta weights for each participant: the four conditions crossed with the three intervals. Analyses (random-effects model at the group level of analysis) will focus on the beta weights for the variable period before the response and the 8-s period after the response. We refer to these beta weights as estimates of engagement for that region.
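The construction of the three regressors for a single trial can be sketched as follows. This is a minimal sketch under stated assumptions: the gamma function is taken to have the unnormalized form (t/s)^a · exp(−t/s) with a = 6 and s = 0.75 s, which peaks at a × s = 4.5 s. The exact form and normalization used in the analysis are not given in this section. The function names, the 0.1-s integration step, and the example trial timing are hypothetical.

```python
import math

TR = 2.0  # seconds per scan


def hrf(t, index=6.0, scale=0.75):
    """Assumed gamma shape (t/scale)**index * exp(-t/scale), unnormalized."""
    if t < 0:
        return 0.0
    x = t / scale
    return x ** index * math.exp(-x)


def boxcar_regressor(onset, offset, n_scans, dt=0.1):
    """Convolve a boxcar on [onset, offset) with the HRF and sample once per scan."""
    n_steps = max(0, round((offset - onset) / dt))
    values = []
    for scan in range(n_scans):
        t = scan * TR
        # Midpoint rule over the boxcar interval.
        total = sum(hrf(t - (onset + (k + 0.5) * dt)) for k in range(n_steps)) * dt
        values.append(total)
    return values


def trial_regressors(problem_onset, response_time, n_scans):
    """Before (up to 2 s pre-response), response (last 2 s), and after (8 s) regressors."""
    before = boxcar_regressor(problem_onset, response_time - 2.0, n_scans)
    response = boxcar_regressor(response_time - 2.0, response_time, n_scans)
    after = boxcar_regressor(response_time, response_time + 8.0, n_scans)
    return before, response, after


# Hypothetical trial: problem appears at 4 s (after fixation), answer completed at 14 s.
before, response, after = trial_regressors(4.0, 14.0, n_scans=20)
```

In a full design matrix, columns like these would be built for every trial of a condition and summed, with the fitted beta weights serving as the engagement estimates.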

We performed three analyses on the imaging data:

  1. Predefined Regions: The first analysis focused on the six regions in Fig. 1: HIPS, PSPL, and ANG from the triple-code theory and a Motor region controlling the hands, LIPFC, and PPC from the ACT-R theory. The ACT-R regions have been used many times (for a review see Anderson, 2007). The locations of triple-code regions are from Dehaene et al. (2003) and were first used in our research in Rosenberg-Lee et al. (2009). However, we updated the coordinates of the HIPS region to reflect the larger meta-analysis in Cohen et al. (2008).

  2. 408 Regions Exploratory: This analysis used 408 regions that we have used in a multivariate pattern recognition study (e.g., Anderson, Betts, Ferris, & Fincham, 2010). These regions were created by evenly distributing 4 × 4 × 4 voxel cubes (a voxel is 3.2 mm high by 3.125 mm long and wide) over the 34 slices of the 64 × 64 acquisition matrix. Between-region spacing was 1 voxel in the x- and y-directions in the axial plane, and one slice in the z-direction. We applied a mask of the structural reference brain and excluded regions where fewer than 70% of the region's original 64 voxels survived.

  3. Voxel-Level Exploratory: The other exploratory analyses will be at the individual voxel level. They include a contrast of Before-Exception and Before-Regular. This exploratory analysis looked for regions of at least 15 contiguous voxels that showed a voxel-wise significance of 0.00001 for the difference between Before-Exception and Before-Regular. Using these values results in a brain-wide significance estimated to be less than 0.00001 by simulation (Cox, 1996; Cox & Hyde, 1997).
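The layout of the exploratory grid in the second analysis can be sketched as below. This is an illustration under stated assumptions: cubes are taken to start at voxel index 0 on every axis, so the number of candidate cubes before masking depends on that choice, and `brain_mask` is a hypothetical set of voxel indices standing in for the structural reference mask. The function names are also hypothetical.

```python
def cube_starts(n_voxels, cube=4, gap=1):
    """Start indices of `cube`-wide blocks separated by `gap` voxels along one axis."""
    return list(range(0, n_voxels - cube + 1, cube + gap))


def candidate_cubes(nx=64, ny=64, nz=34):
    """All 4 x 4 x 4 cubes on the acquisition grid, before any brain masking."""
    return [(x, y, z)
            for z in cube_starts(nz)
            for y in cube_starts(ny)
            for x in cube_starts(nx)]


def keep_region(start, brain_mask, cube=4, min_fraction=0.70):
    """Keep a cube only if at least `min_fraction` of its voxels lie inside the mask.

    `brain_mask` is a hypothetical set of (x, y, z) voxel indices inside the brain.
    """
    x0, y0, z0 = start
    inside = sum((x, y, z) in brain_mask
                 for x in range(x0, x0 + cube)
                 for y in range(y0, y0 + cube)
                 for z in range(z0, z0 + cube))
    return inside >= min_fraction * cube ** 3


print(len(cube_starts(64)), len(cube_starts(34)))  # cubes per in-plane axis and per slice axis
```

With 64 voxels per cube, the 70% rule means a cube needs at least 45 voxels inside the mask to be retained.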

Results

Behavioral results

Participants were correct on 90.3% of the regular problems but only 42.9% of the exception problems, a highly significant difference (t(19) = 12.55, p < .0001). The error rate on exception problems ranged relatively uniformly across participants from 17 to 79%. We excluded from further analysis the 4.9% of problems that resulted in errors due to timing out. Of the remaining problems, correct regulars averaged 9.77 s, correct exceptions averaged 16.02 s, incorrect regulars averaged 12.75 s, and incorrect exceptions averaged 19.02 s. These times display main effects of problem type (F(1, 19) = 113.36, p < .0001) and correctness (F(1, 19) = 48.21, p < .0001) and no significant interaction (F(1, 19) = .01).

Imaging analysis: predefined regions

The first analysis will look at the BOLD response in the left and right homologs of the six predefined regions given in Fig. 1. While the real interest is in the other regions, we will use the motor region to illustrate how the analysis applies. Figure 3a illustrates the analysis for the left motor region for regular and exception problems that were solved correctly. In this figure and others like it, for illustrative purposes only, the BOLD values from all trials have been rescaled (see Anderson et al., 2008) so that the scan of the response is always at the mean duration of that condition, that is, five scans for correct regular problems and eight scans for correct exception problems (marked by stars in the figure). This makes the illustrative BOLD responses both stimulus-locked and response-locked. The BOLD values are plotted for the two scans before the problem (during which the warning appeared) through ten scans after the response. The measure plotted is the percent change from a baseline defined by the first and last scan in the graph. The last scan is the beginning of the warning for the next trial. Also plotted is the mean magnitude of engagement during the three periods (Before response, Response period, and After response), estimated as described in the Methods section. The solid line is the predicted hemodynamic response, scaled to the mean length of the conditions in the same way as the empirical data. The good fit to the data suggests that our estimation of the three engagement magnitudes captures the main effects in the data. Sensibly, engagement is low before the response, spikes during the period of response generation, and is low again during the 8 s after the response.

Fig. 3

Activity in the motor regions associated with the hands. a The average BOLD responses for correct regular and exception problems. The dotted lines connect the actual mean responses and show the standard errors of the means. The stars indicate when the responses were submitted. The faded boxcar lines give the average engagement estimated for the Before, Response, and After periods. The solid line gives the BOLD response predicted by convolving a hemodynamic response function with the engagement functions. The correlation between observed and predicted BOLD response is displayed (for the motor region only, we assume that engagement begins with the onset of the warning, reflecting anticipatory mouse movement; for other regions, activity is estimated to begin with problem onset). b Mean engagement in Before and After periods as a function of hemisphere, correct versus error, and regular versus exception problems. Arrows connect the Before-Regular and After-Exception correct conditions that compose the critical contrast in diagnosing Cognitive versus Metacognitive patterns. z in the inset brain slice (radiological convention: image left = participant’s right) is at x = y = 0 in Talairach coordinates

While the hemodynamic responses in Fig. 3a are illustrative, the statistical analyses will be on estimates of engagement during the Before and After intervals (ignoring the Response engagement for the 2 s before the response was completed). Figure 3b illustrates the mean engagements, averaged over participants, both for left and right motor regions and for both correct and incorrect answers. For this and subsequent regions we performed a 2 x 2 x 2 x 2 analysis of variance (see Table 2) where the factors were Hemisphere (left versus right), Accuracy (correct versus error), Type (regular versus exception), and Period (Before versus After). The analysis in the case of the motor region produced only one strongly significant effect – that of hemisphere (F(1,19) = 22.10, p < .0005). Not surprisingly, since the responding hand is the right, the right motor region shows little activity. While the Before and After engagement in the left hemisphere is significantly greater than in the right, it is small relative to the left’s engagement during the Response period (see Fig. 3a).

Table 2 Analyses of variance (F’s with 1 and 19 df) and critical contrasts (t’s with 19 df)

The results in Fig. 3 can be viewed as a sanity check on the analysis, as there is no reason to expect any effects of factors other than hemisphere for the motor region and a strong reason to expect a larger effect in the left hemisphere that would spike during the response interval. The relatively small Before engagement probably reflects mouse movements, perhaps in anticipation of response generation. Also, the relatively small After engagement probably reflects mouse movements, perhaps in anticipation of the repetition detection task.

The results of complete statistical analyses for the six predefined regions are in Table 2a. The results for the other regions are quite different from the motor region. None shows significant effects of hemisphere. In contrast, they show significant or near significant effects of correctness (errors greater than corrects), type (exception greater than regular), and (except for ANG) of period (Before greater than After). They all show a significant period-by-correctness interaction such that the greater response to errors only occurs after the response has been made. Table 2a also reports the results of a two-tailed t-test of the difference between After-Exception and Before-Regular, which is the critical test identified in the introduction for distinguishing Cognitive and Metacognitive regions. The PPC, the LIPFC, the PSPL, and the HIPS show significant effects in the Cognitive direction of Before-Regular being greater than After-Exception. In contrast, ANG shows a strong effect in the opposite direction (p < .005).

The PPC, LIPFC, PSPL, and HIPS all show the same direction of effects and are strongly intercorrelated (mean correlation of the eight Before and After engagement values (averaging over left and right) is .933, with a range from .841 to .981) while they are much more weakly correlated with ANG (mean .423, range from .185 to .591). Figure 4 illustrates the BOLD values for the left LIPFC, which is quite representative of the intercorrelated four regions, and Fig. 5 illustrates the pattern in right ANG, which is different. The LIPFC (Fig. 4) shows strong engagement during the problem solving and low engagement afterwards. The mean .04% engagement after successful solution of regular problems is only marginally different from 0 (t(19) = 2.01, p < .1). While there is a difference between exception and regular problems, it is relatively small.

Fig. 4

Activity in the lateral inferior prefrontal regions associated with controlled retrieval in the ACT-R theory. a BOLD responses in left hemisphere. b Mean engagement in left and right hemispheres (see caption of Fig. 3 for detail)

Fig. 5

Activity in the angular gyrus whose left hemisphere is associated with verbal processing of numbers in the triple-code theory. a BOLD responses in right hemisphere. b Mean engagement in left and right hemispheres (see caption of Fig. 3 for detail)

The results for ANG (Fig. 5) are quite different from the LIPFC. For corrects (Fig. 5a) the Before engagement is not significantly different from the After engagement. The difference between exceptions and regulars is quite large and results in a highly significant difference between Before-Regular and After-Exception, but in the opposite direction of the other regions. The magnitude of the left engagement is marginally larger (p < .1) than that of the right, but hemisphere is involved in no interactions (see Footnote 3). The left ANG is often distinguished from the right in many theories including the triple code, but the pattern of ANG effects in this experiment is basically the same in the two hemispheres.

Imaging analysis: exploratory analysis with 408 regions

The ANG is the only predefined region to display the Metacognitive Pattern defined in the introduction. To find further Metacognitive (or Cognitive) regions we performed an exploratory analysis using 408 regions that cover the entire brain as described in the Methods section. For each of the 408 regions we calculated the difference between the critical conditions of Before-Regular and After-Exception for the correct data. Using the Bonferroni correction for multiple comparisons, a t of ±4.81 is required for significance. Eleven regions exceeded this threshold (see Table 3). Four of these regions were located in the bilateral premotor areas and showed significantly greater activation for the Before period for regular problems than the After period for exception problems and they all give very similar patterns of engagement. Three parietal regions, the last two of which overlap with the predefined left and right ANG, show patterns of engagement similar to Fig. 5. Two regions in the frontopolar cortex (FPC) and two visual regions also show greater engagement for After-Exceptions than Before-Regulars. One of the visual regions is the left fusiform and the other is an occipital region just posterior to the fusiform. These visual areas overlap substantially with the region in the triple-code theory of Dehaene and Cohen that processes visual representations of digits. Having already examined the ANG, we will report the effects for the premotor regions, frontopolar, and fusiform in more detail. We averaged the two premotor regions on the left, the two on the right, the two FPC regions, and the two fusiform regions. Complete results of statistical tests involving these regions are included in Table 2b.
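The Bonferroni threshold quoted above can be checked numerically. The sketch below assumes a two-tailed family-wise alpha of .05 divided across the 408 regions and 19 degrees of freedom, and it integrates the Student t density with Simpson's rule rather than calling a statistics library. The alpha level and the function names are assumptions of this sketch, since the text gives only the resulting t value.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = total * h / 3
    return 1.0 - 2.0 * central_area


alpha_per_region = 0.05 / 408          # Bonferroni-corrected two-tailed alpha
print(alpha_per_region)                # about 1.2e-4
print(two_tailed_p(4.81, 19))          # should be close to the corrected alpha
```

If the two printed values agree to within rounding of the threshold, the reported ±4.81 is consistent with a .05 family-wise level under these assumptions.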

Table 3 Exploratory regions, Brodmann areas, Talairach coordinates, and t values of the after-exception minus before-regular contrast for corrects

Figure 6 presents the results for the left and right premotor regions. The pattern of engagement in the Before and After periods shows substantial correlation with PPC, LIPFC, PSPL, and HIPS (mean correlation .892) and much less correlation with ANG (mean .061). For corrects the premotor areas are strongly engaged in the Before period and very weakly engaged in the After period. The mean After engagement for correct regulars (.02%) is not significantly different from zero (t(19) = 1.16, p > .25). These regions contrast with the other regions in showing no significant difference between regular problems and exceptions. This pattern of results yields a highly significant contrast between Before-Regular and After-Exception.

Fig. 6

Activity in premotor regions. a BOLD responses in the two right hemisphere regions. b Mean engagement in the two left and two right hemisphere regions (see caption of Fig. 3 for detail)

Figure 7 provides information about the average engagement of the two frontopolar regions and the two visual-fusiform regions, which give somewhat similar effects. The mean engagement in the fusiform correlates .907 with the frontopolar engagement. Their mean correlation with ANG is .742 while their mean correlation with the other non-motor predefined regions is just –.101. Both are more engaged by errors, by exceptions, and in the After period, and show a strong interaction between correctness and period. Figure 7a illustrates the mean response of FPC for corrects. This region actually shows a negative engagement in the Before period for regular problems which is marginally significant (t(19) = –1.93, p < .10).

Fig. 7

Activity in inferior regions. a BOLD responses in frontopolar regions. b Mean engagement in frontopolar and fusiform regions (see caption of Fig. 3 for detail)

The exploratory regions in Figs. 6 and 7 provide a striking contrast. While there are small effects of errors in Fig. 6, the majority of variance among conditions is due to the greater activity before than after the response. On the other hand, the regions in Fig. 7 show strong effects of error and problem type and show more activation after the response. To determine how typical these regions might be of general activity in the 408 regions we performed a factor analysis of the 8 magnitudes of engagement (excluding the magnitude for the response scan between the Before and After periods) for each of the 408 regions. Figure 8a shows, after a rotation to produce the best illustration, the scores for the first two factors (Factor 1 and Factor 2) that account for 72% of the variance. Factor 1 is much like the pattern in Fig. 6 for the premotor regions. The figure also shows a hypothetical Cognitive Factor that is simply +1 before the response and -1 after. The correlation between Factor 1 and the Cognitive Factor is .974. Factor 2 shows a pattern somewhat like Fig. 7 for the frontopolar and fusiform regions. Figure 8a also illustrates a Metacognitive Factor that is engaged by exceptions and by errors in the After period. The Metacognitive factor adds the exception engagement to the error engagement for the last condition (After response to exception errors) in Fig. 8a. The correlation between the Metacognitive factor and Factor 2 is .913. Unlike Factors 1 and 2, the Cognitive and Metacognitive factors are not perfectly orthogonal but show a negative correlation of –.378. Factor 2 may be the Metacognitive factor distorted under the constraint of extracting orthogonal factors in the factor analysis.
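The two idealized factors can be written out explicitly. The sketch below is one coding consistent with the description above, not necessarily the authors' exact scores: the Cognitive factor is +1 before and -1 after the response, and the Metacognitive factor is assumed to add one unit for an exception problem and one unit for an error in the After period. The condition ordering and function names are illustrative.

```python
import math
from itertools import product

# The 8 conditions: period x problem type x correctness
# (the response scan between the two periods is excluded).
conditions = list(product(["Before", "After"],
                          ["Regular", "Exception"],
                          ["Correct", "Error"]))

def cognitive(period, problem, outcome):
    """+1 before the response, -1 after, as stated in the text."""
    return 1.0 if period == "Before" else -1.0

def metacognitive(period, problem, outcome):
    """Assumed coding: exception engagement plus After-period error engagement."""
    exception = 1.0 if problem == "Exception" else 0.0
    late_error = 1.0 if (period == "After" and outcome == "Error") else 0.0
    return exception + late_error

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

cog = [cognitive(*c) for c in conditions]
meta = [metacognitive(*c) for c in conditions]
r = pearson(cog, meta)
print(round(r, 3))  # -0.378
```

With this coding the correlation between the two idealized factors is -2/√28 ≈ -.378, matching the value reported above, which supports reading the Metacognitive factor as a simple sum of the two engagements.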

Fig. 8

a Factors 1 and 2 show the rotated scores for the first two factors identified in the factor analysis of the 408 exploratory regions. The Cognitive and Metacognitive Factors are idealized scores that correlate strongly with Factors 1 and 2. b Distribution of 51,239 voxels in terms of their correlation with the Cognitive and Metacognitive factors, color-coded by total variance explained. c Color coding of categories for the 11,994 positively responding voxels with variance explained greater than .841. d Brain distribution of voxels in b with R² > .841. Negatively responding voxels are in black and positively responding voxels use the color coding in c. The value of z at each brain slice (shown in radiological convention: image left = participant’s right) is at x = y = 0 in Talairach coordinates

Individual voxel analysis of the cognitive and metacognitive factors

While this final analysis is at the voxel level, the intent is to shed more light on the global organization of the brain in service of both cognition and metacognition. Given the interrelatedness of these functions, and the prior research discussed below in the general discussion, the expectation is that various cognitive-to-metacognitive gradients should be observed. To this end, we calculated the correlation of each voxel with the Cognitive and Metacognitive factors in Fig. 8a. Figure 8b provides a scatterplot of the two correlations for all 51,239 voxels. The outer boundary in this scatterplot reflects the theoretical bound on the maximum correlations. It has a negative principal axis because of the negative correlation between the two factors. Figure 8c presents only those voxels for which the mean engagement over the 8 conditions was greater than zero and with R² greater than .841, which corresponds to a significance of p < .01. Of the 32,820 positively responding voxels, 11,994 have R² > .841. Since we would expect only about 328 voxels (1% of 32,820) with R² > .841 by chance, it is clear that these two factors account for real variance in the brain, just as the factor analysis implied.
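The link between the .841 cutoff and p < .01 can be reproduced under a standard assumption that the text does not spell out: each voxel's 8 engagement values are regressed on the two factors plus an intercept, with independent Gaussian noise. Under that null model the variance explained follows a Beta(k/2, (n − k − 1)/2) distribution, which has a closed-form tail when k = 2. The variable names below are illustrative.

```python
n_conditions = 8   # engagement magnitudes per voxel
n_predictors = 2   # Cognitive and Metacognitive factors
alpha = 0.01

# With k = 2 predictors the null distribution of the variance explained is
# Beta(1, (n - k - 1) / 2), so P(R2 > x) = (1 - x) ** ((n - k - 1) / 2).
# Solving P(R2 > x) = alpha for x gives the cutoff.
resid_df = n_conditions - n_predictors - 1
r2_threshold = 1.0 - alpha ** (2.0 / resid_df)

# Voxel counts taken from the text.
n_positive = 32820
observed = 11994
expected_by_chance = alpha * n_positive

print(f"threshold: {r2_threshold:.4f}")
print(f"expected by chance: {expected_by_chance:.0f}, observed: {observed}")
```

Under these assumptions the cutoff comes out at roughly .8415, in line with the reported .841, and the chance expectation of about 328 voxels is more than an order of magnitude below the 11,994 observed.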

Figure 8c breaks these voxels into six color-coded 60° regions:

  1. Cognitive: The 60° region from –45° to +15° where the correlation with the Cognitive factor is near 1.

  2. Mixed: The 60° region from +15° to +75° where the correlations with the Cognitive and Metacognitive factors are about equal.

  3. Metacognitive: The 60° region from +75° to +135° where the correlation with the Metacognitive factor is near 1.

  4. Anti-cognitive: The 60° region from +135° to +195° where the correlation with the Cognitive factor is near –1.

  5. Negative: The 60° region from +195° to +255° where the correlation is about equally negative with the Cognitive and Metacognitive factors.

  6. Anti-metacognitive: The 60° region from +255° to +315° (or –45°) where the correlation with the Metacognitive factor is near –1.
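The six sectors listed above amount to a simple angular classification. The sketch below assumes the angle is measured counterclockwise in the scatterplot plane, with the Cognitive correlation on the horizontal axis and the Metacognitive correlation on the vertical axis; the text does not state this convention, and the function name is hypothetical.

```python
import math

# Sector labels in counterclockwise order, starting at -45 degrees.
SECTORS = ["Cognitive", "Mixed", "Metacognitive",
           "Anti-cognitive", "Negative", "Anti-metacognitive"]

def classify(r_cog, r_meta):
    """Assign a voxel to one of six 60-degree sectors.

    r_cog and r_meta are the voxel's correlations with the Cognitive and
    Metacognitive factors (assumed x and y coordinates, respectively).
    """
    angle = math.degrees(math.atan2(r_meta, r_cog))  # in (-180, 180]
    shifted = (angle + 45.0) % 360.0                 # 0 now means -45 degrees
    # The final "% 6" guards against a float result of exactly 360.0.
    return SECTORS[int(shifted // 60.0) % 6]

# A voxel perfectly tracking one factor correlates -.378 with the other.
print(classify(1.0, -0.378))   # Cognitive
print(classify(-0.378, 1.0))   # Metacognitive
print(classify(0.7, 0.7))      # Mixed
```

Because the two factors correlate –.378, a voxel that perfectly follows one factor does not sit on an axis, which may be why the Cognitive and Metacognitive sectors are centered near –15° and +105° instead of 0° and 90°.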

Many of the voxels in the Anti-cognitive to Anti-metacognitive range in Fig. 8b are eliminated in Fig. 8c because they had a negative mean response.

Figure 8d shows the locations of all voxels with R² > .841, whether positively or negatively responding. The positively responding regions use the same color-coding as in Fig. 8c, while the negatively responding voxels are in black. Many of the negatively responding voxels are in classic default network regions (e.g., Fair et al., 2008; Raichle & Snyder, 2007) such as the medial prefrontal and posterior cingulate. The positively responding voxels include many of the regions already identified. The motor region is not among the regions that show strong correlation with the Cognitive and Metacognitive factors. The LIPFC, PPC, PSPL, and HIPS overlap Cognitive and Mixed voxels. The premotor is a purely Cognitive region. The figure also shows three regions that have already been found to show metacognitive profiles: ANG, FPC, and fusiform. In every case the patterns are roughly bilaterally symmetric. Figure 8d indicates an additional Metacognitive region that did not emerge in the prior analyses: part of the superior prefrontal gyrus (SPFG) in Brodmann Area 8. Separate left, right, and medial SPFG regions show the metacognitive pattern.

SPFG regions are also identified in an exploratory analysis that contrasts Before-Exception with Before-Regular (see Table 3c). Figure 9 illustrates the response in the SPFG. While the SPFG has a strong correlation with the Metacognitive profile (r = .883) and shows similar effects to other metacognitive regions, it differs from the ANG (Fig. 5) and FPC (Fig. 7) regions in that it shows significantly greater engagement in the Before period than in the After period for corrects (t(19) = 3.29, p < .005). This results in a smaller contrast between After-Exception and Before-Regular for corrects, which is why it was not identified in the earlier 408-region exploratory analysis. In fact, the contrast only reaches conventional levels of significance for the left SPFG (Table 3c).

Fig. 9

Activity in SPFG regions in Fig. 8d. a BOLD responses in right hemisphere. b Mean engagement in left and right hemispheres (see caption of Fig. 3 for detail)

General discussion

The analyses in this paper depend on comparing engagement before and after response generation. We separately estimated an engagement for response generation so as not to contaminate either of these measures with the role the region might play in response generation. The ability to clearly separate the engagement of the motor region into expected components in Fig. 3 is one indication of the success of the procedure. Another indication of the success of this separation is that, while many regions showed differences between errors and non-errors in the After period, no region showed a significant difference in engagement to corrects versus errors in the Before period, when participants did not yet know the accuracy of their response.

These pyramid problems nicely separate regions that show a Cognitive profile (strong engagement during problem solving, independent of challenge, and disengagement afterwards) and regions that show a Metacognitive profile (strong engagement during challenging parts of problem solving and continued reflection afterwards). Figure 8d reveals a cognitive-to-metacognitive gradient in the frontal cortex moving posterior to anterior. Others have noted a similar concrete-to-abstract organization of the prefrontal cortex (e.g., Badre, 2008; Buckner, 2003; Christoff & Gabrieli, 2000; Petrides, 2005). Figure 8d would also appear to reveal another cognitive-to-metacognitive gradient in the parietal cortex going medial to lateral. There are some reports of a similar organization in other tasks (e.g., Asari, Konishi, Jimura, & Miyashita, 2005). There are projections between many prefrontal regions and parietal regions (e.g., Petrides & Pandya, 1984; Selemon & Goldman-Rakic, 1988) including connections between premotor and superior parietal (Wise, Boussaoud, Johnson, & Caminiti, 1997), but apparently no direct connections from the frontopolar cortex to the parietal cortex (Petrides & Pandya, 1984).

It would be wrong to conclude that all of the Cognitive and Metacognitive regions are equivalent. On the Cognitive side, the ACT-R theory implies different roles for Cognitive regions like the LIPFC and PPC and the triple-code theory implies distinct functions for the HIPS and PSPL. On the Metacognitive side, it is interesting that two of the regions identified are part of the triple-code theory: ANG and the fusiform gyrus. The fusiform gyrus is supposed to encode visual form and has been shown to be active in many reading tasks (McCandliss, Cohen, & Dehaene, 2003), particularly on the left. This region is almost certainly not performing a metacognitive function per se but rather is being recruited in this experiment for more visual inspection in service of metacognitive functions. According to the triple-code theory, the left ANG is involved in verbal processing of numbers, also a relatively routine cognitive activity. Perhaps the greater activity in the left ANG reflects this verbal processing. The bilateral Metacognitive pattern may reflect a second function of this region that is overlaid on the verbal processing in the left.

It is of interest that three of the regions identified as metacognitive in this task, the ANG, the FPC, and the SPFG, are adjacent to regions typically identified as default network and which respond negatively in our experiment (see Fig. 8d). These default regions have also been associated with cognitive functions like theory of mind and prospection (Buckner & Carroll, 2007; Spreng, Mar, & Kim, 2009). This suggests there may be an abstraction continuum that goes beyond the three metacognitive regions we found and that the highest levels of abstraction may not be activated by any of our mathematical problems.

We think the three Metacognitive regions identified in this experiment are serving general higher-level functions, but these functions can be separated, just as the functions of the different Cognitive regions have been separated. Even in this experiment, the three regions show differences in their pattern of engagement to correct exception problems. The ANG (Fig. 5) shows no significant difference in engagement before versus after response generation. The FPC (Fig. 7) shows significantly greater engagement after than before. The SPFG (Fig. 9) shows significantly greater engagement before response generation than afterwards. Below we will discuss these three regions and their possible functions in more detail.

Although the left ANG shows somewhat stronger engagement than the right, the profiles for the two hemispheres are highly correlated (r = .955 for Fig. 5b). We have suggested that the left ANG is engaged both for verbal processing and metacognitive processing. The right ANG is close to the regions of the right temporoparietal junction that have been associated with theory of mind and reorientation of attention. While the right ANG tends to be about 2 cm away from the center cited for the temporoparietal junction (ANG is a little higher and more posterior), its coordinates do overlap with some studies reported in the meta-analysis of Decety and Lamm (2007). Theory of mind studies typically require participants to interpret the intentions of others. For instance, one theory-of-mind study (Ruby & Decety, 2003) found this region was more active when participants took another’s perspective rather than their own. This suggests that activation in the region might reflect trying to understand the experimenter’s intentions in the definition of pyramid problems in order to correctly extend it to these exception problems. Certainly, the verbal protocols contained references to the experimenter’s intentions, as in “You need to explain the rule”. The reorientation of attention function attributed to the right temporoparietal junction may also be involved because these exception problems typically involve taking a different approach to their solution. In addition, a right ANG region, basically identical to the predefined region, is activated by a mismatch between one’s actions and the consequences of these actions (Farrer et al., 2008). Under these various interpretations, it makes sense that the ANG is heavily engaged both before and after response generation.

The greater involvement of the FPC region in the After period is consistent with other studies that find it highly engaged in reflection. In episodic memory experiments, it also gives a late positive fMRI response that can peak after the motor area controlling response generation peaks (e.g., Reynolds et al., 2006; Rugg et al., 2003). In other memory research, Johnson et al. (2005) and Raye et al. (2007) describe a “refreshing” process that shares features with reflection on problem solving. Their paradigm involves presenting participants with a sequence of items, usually a new word to study, but sometimes a cue to refresh the previous word. If the FPC region principally serves a reflection function, then it is reasonable that it should be much more engaged after response generation. However, it is not just a region that is engaged after any response, since it does not show significant engagement after a correct response to a regular problem.

The SPFG region has been shown to play a role in spatial working memory tasks (e.g., Casey et al., 1998; Courtney, Ungerleider, Keil, & Haxby, 1996; Postle, Berger, Taich, & D’Esposito, 2000; Rowe, Toni, Josephs, Frackowiak, & Passingham, 2000) and reasoning tasks (e.g., Goel & Dolan, 2004; Parsons & Osherson, 2001), particularly inductive reasoning. Neither of these processes is what one would consider a classic metacognitive function. One might speculate the SPFG is activated by the need to reason about how to extend regular solutions to exception cases by analogy to the solution for regular problems. Alternatively, this region might be engaged in spatial reasoning about the pyramid metaphor for these mathematical problems. Psychometric studies tend to find substantial correlations among spatial ability, reasoning, and mathematical ability (Lohman, 1988). Both the reasoning and the spatial interpretations would imply that the SPFG would be engaged more by the immediate challenge of solving a pyramid problem and so we might expect it to be more active before the response is determined.

Our suggested interpretations for the ANG, FPC, and SPFG are consistent with other interpretations in the literature and these interpretations generally see these regions as having a role in higher-level cognitive processing. The postulated functions for each region can be interpreted as general-purpose computation that is not specific to metacognitive processing. In this experiment, the critical metacognitive demands involve reflecting on one’s processing, both concurrently to adapt the solution method for an exception problem and after the fact to tune one’s methods given the feedback. These regions are like the fusiform in that they perform general-purpose functions that are called upon to meet the metacognitive demands of solving these pyramid problems.

The results of this experiment have important implications for how to model metacognitive processing in mathematics. Just as processing of regular problems proved to require the postulation of multiple modules in either the triple-code theory or ACT-R, these results suggest that one needs to incorporate multiple separate modules that perform different functions when confronted with a situation that evokes metacognition. We might conjecture that the exception problems in this experiment evoke a “theory of mind” module (ANG) to interpret the intentions of the experimenter in defining pyramid problems, a “reasoning” module (SPFG) to determine the implications of the definition of a pyramid problem, and a “reflection” module (FPC) to modify one’s understanding given the feedback. Figure 8d suggests that these pyramid problems evoke a wide pattern of activity with many different brain regions performing the distinct functions required to solve these problems.