Reasoning refers to the cognitive processes engaged during methodical problem solving. An important component of reasoning is meta-reasoning, which involves assessing the quality of one’s judgments and cognitive processes (Metcalfe & Wiebe, 1987; Salvi et al., 2016). Though metacognition and meta-reasoning are both concerned with awareness of one’s cognitive processes, metacognition research has focussed on memory and general knowledge (Berardi-Coletta, 1995), whereas meta-reasoning research has focussed on monitoring and control processes during problem solving (Ackerman & Thompson, 2017). One’s ability to meta-reason is important because it informs decisions about whether to engage in problem solving, about effort investment during problem solving, and retrospective confidence about solving outcomes (Ackerman, 2014; Ackerman & Thompson, 2017; Payne & Duggan, 2011).

According to Ackerman and Thompson (2017), the first stage of meta-reasoning involves making a judgment of solvability (JOS). A JOS indicates one’s beliefs about whether a problem is solvable and/or whether one can solve it (Ackerman & Beller, 2017; Metcalfe & Wiebe, 1987). JOSs inform decisions to either engage in the problem or to give up on it. Misjudging problem solvability can have negative consequences such as wasting time attempting to solve unsolvable problems, or prematurely abandoning solvable problems (Ackerman & Thompson, 2017; Payne & Duggan, 2011; Toplak et al., 2014). To further our understanding of this first stage of meta-reasoning, our study sought to provide a detailed investigation of JOSs, focussing on whether they are discriminating and predictive of later problem-solving outcomes.

Are JOSs Discriminating?

Intuitive judgments can be made quickly and accurately and without analytic engagement (Kahneman, 2003; Lieberman, 2000). However, the evidence regarding whether intuitive JOSs are sensitive to a problem's actual solvability has been mixed. Several studies have found that JOSs can discriminate between solvable and unsolvable problems (e.g., Balas et al., 2011; Bolte & Goschke, 2005; Novick & Sherman, 2003; Topolinski & Strack, 2009; Topolinski et al., 2016; Undorf & Zander, 2017). Other studies have failed to find such effects (e.g., Ackerman & Beller, 2017), or have reported that JOSs are discriminating only when certain problem-solving task conditions are met (e.g., Lauterman & Ackerman, 2019; Valerjev & Dujmović, 2020). Whether JOSs are found to be discriminating may depend in part on how investigators treat problems that are spontaneously solved during the JOS task. Often, researchers use "insight" problems to measure JOSs—these are short, verbal problems (such as anagram solving or the remote associates task) that are solved in a sudden, non-incremental way (Bowden & Jung-Beeman, 2003; Weisberg, 1992). Solutions to insight problems are usually found without much deliberate analytic engagement (Metcalfe, 1986; Metcalfe & Wiebe, 1987). Where insight differs from intuition is that insight involves retrieval of the solution, whereas intuition is based on a "gut feeling" that a solution to the problem exists (Stanovich & West, 2000; Topolinski & Strack, 2009). An advantage of using insight problems to measure JOSs is that such problems can be processed rapidly. Use of analytic problems typically requires more solving time (De Neys, 2006), which limits the number of JOSs that can be captured in a single experiment without fatiguing participants (Healy et al., 2004). Thus, we used anagrams as our problem-solving task, which allowed us to capture more JOSs than would be possible in an experiment that used analytic problems.

Despite the merits of using insight problems to measure JOS discrimination, solutions to insight problems can arise spontaneously during the JOS task (e.g., Novick & Sherman, 2003). Consequently, significant JOS discrimination may be attributable at least in part to participants spontaneously solving some of the problems during their presentation, rather than because they had accurate intuitions about their solvability. JOSs are intended to capture participants’ intuitions before reasoning occurs (Ackerman & Thompson, 2017), thus spontaneous solutions arising before/during the JOS would confound the measurement of JOS discrimination. For instance, Topolinski et al. (2016, Experiment 7) found that JOS discrimination was only marginally significant when anagrams that participants spontaneously solved during the JOS task were excluded from analysis. Therefore, our study reassessed whether JOSs are discriminating when problems solved during the JOS process are excluded.

Do JOSs Predict Later Problem Solving?

People generally avoid expending cognitive effort on problems they deem themselves unlikely to solve (De Neys et al., 2013; Payne & Duggan, 2011)—a process known as effort regulation. If intuition about problem solvability guides decisions about effort regulation, then people should exert more time and effort solving problems that they deem to be solvable (Ackerman & Thompson, 2017). Because longer processing time is associated with better reasoning performance (e.g., Pennycook et al., 2015), greater effort expenditure should lead to more successful problem solving.

Surprisingly, the few studies that have evaluated whether JOSs predict problem-solving success and effort regulation have yielded mixed findings. Judging a problem as solvable (vs. unsolvable) has been found to predict successful problem solving in some studies (Markovits et al., 2015; Siedlecka et al., 2016), but not others (Ackerman & Beller, 2017; Lauterman & Ackerman, 2019). Moreover, Lauterman and Ackerman found that participants who judged a problem as solvable later spent more time attempting to solve it, regardless of its actual solvability.

The same methodological issues noted above for measuring JOS discrimination apply equally to measuring JOS predictiveness. Specifically, spontaneous problem solving during the JOS task will exaggerate how well JOSs predict later problem solving. This issue may contribute to the mixed findings regarding JOS predictiveness. Thus, another aim of our study was to evaluate whether JOSs are predictive of effort regulation and problem-solving success after accounting for spontaneously solved items.

Approaches to Measuring JOS Discrimination and Predictiveness

To measure JOSs, researchers typically aim to choose a problem duration that will limit spontaneous solving during the JOS task. However, if the problems are presented too briefly, participants may revert to using unreliable heuristic cues to make their decisions (Ackerman, 2019; Benjamin, 2005; Kahneman et al., 1982), or may even engage in random responding that would reduce the accuracy and predictiveness of JOSs. An alternative is to present problems for longer but to allow participants to report whenever they have solved a problem during its presentation. This method enables the researcher to examine whether JOS discrimination and predictiveness are limited to solved problems or extend to unsolved problems. A second advantage of this method is that participants can be given more time to make their JOSs without them outsourcing their cognitive efforts to unreliable heuristics.

Some studies have ignored the possibility of spontaneous solutions or have merely assumed that the selected problem duration prevents them (e.g., Balas et al., 2011; Bolte & Goschke, 2005; Siedlecka et al., 2016; Valerjev & Dujmović, 2020). Topolinski et al. (2016, Experiments 6 and 7) instructed participants to report any spontaneous solutions to the anagram after each JOS trial, and then discarded trials where participants had solved the anagram. However, their participants only provided solutions to spontaneously solved problems; JOS predictiveness was not measured. Thus, participants' ability to solve the problems they judged as "solvable" (but did not report having spontaneously solved) was not assessed.

Most studies that have not measured spontaneous solving have interleaved the JOS and problem-solving tasks (e.g., Balas et al., 2011; Topolinski & Strack, 2009; Valerjev & Dujmović, 2020), such that on each trial participants made a JOS and then immediately attempted to solve the problem. In an interleaved paradigm, JOSs may be influenced by solving attempts, and vice versa. For instance, if a participant judges a problem as solvable, and then solves the problem, that serves as metacognitive feedback that the JOS was well calibrated. As a result, interleaved paradigms may lead to higher levels of JOS discrimination because participants can adjust their JOS calibration in light of their solving outcomes. Additionally, participants may exert more effort solving problems they have judged to be solvable, leading to more success and thus also rendering JOSs more predictive. In short, interleaved paradigms may allow reasoners to bootstrap their JOS intuitions, which in turn might improve JOS discrimination and predictiveness.

In contrast, other studies have used a two-phase paradigm. In a JOS phase, participants make JOSs for the entire set of reasoning problems. The JOS phase is then followed by a solving phase, in which participants attempt to solve some or all of these problems (e.g., Ackerman & Beller, 2017; Lauterman & Ackerman, 2019). The JOS phase is intended to capture intuitive judgments, and the solving phase is intended to capture solving outcomes and how well they are predicted by the JOSs. Our study used the two-phase paradigm.

Is JOS Discrimination and Predictiveness Trainable?

To date, studies examining JOSs have focussed on identifying factors that may influence or bias JOSs, such as problem length, difficulty, and fluency (e.g., Balas et al., 2011; Lauterman & Ackerman, 2019; Topolinski et al., 2016; Valerjev & Dujmović, 2020). Research has yet to examine whether JOSs are trainable in ways that increase how discriminating they are, and how predictive they are of later problem solving. Research on the effects of training on metacognition has largely been conducted in the metamemory domain (e.g., Dunlosky & Rawson, 2012; Koriat et al., 2002; West & Mulligan, 2019). Our study examined the impact of training on meta-reasoning, by measuring whether practice with longer-duration anagram problems in the JOS phase enhances JOS discrimination and/or predictiveness.

Overview

We examined JOS discrimination and JOS predictiveness using a two-phase paradigm. Anagrams were used so that many brief assessments of solvability could be collected. In the JOS phase, equal numbers of solvable and unsolvable anagrams were presented in each of four blocks. In the training groups, the first block presented each anagram for 16 s, and anagram presentation duration was then halved across the three subsequent blocks to allow a parametric examination of the effect of duration on JOS discrimination and predictiveness. This design resulted in the briefest blocks using 2 s and 4 s anagram durations, consistent with the durations used in prior studies (e.g., Lauterman & Ackerman, 2019; Novick & Sherman, 2003; Topolinski et al., 2016). After the anagram disappeared, participants quickly judged the anagram as either solvable, unsolvable, or already solved.

For the blocks with longer-duration anagrams, participants are likely to move from simply making a JOS to attempting to solve the anagrams. This should result in a higher rate of "already solved" JOSs. Nonetheless, "solvable" JOSs should accurately capture intuition regardless of whether solving efforts have not yet begun (shorter-duration anagrams) or have begun but have not yielded solutions (longer-duration anagrams). Starting with longer-duration anagrams was expected to provide the training groups with more solving successes that might increase participants' motivation to provide rational JOSs, help them generate a better intuitive sense of an anagram's solvability (Schuster et al., 2020), and help them to regulate their JOSs (Leopold & Leutner, 2015; Leutner et al., 2007). Examining JOS discrimination and predictiveness across a range of durations, rather than choosing an arbitrary "gold standard" duration, also served to increase generality. In the solving phase, participants then attempted to solve each of the solvable anagrams within 45 s (Experiments 1 and 2), or they received both solvable and unsolvable anagrams and solving time was self-regulated (Experiment 3).

We report three experiments. Experiment 1 determined whether “solvable” JOSs in a training group were discriminating and predictive after excluding anagrams classified by the participant as “already solved” during the JOS task. We also examined how anagram duration affects JOS discrimination and the rate of “already solved” JOSs. Experiment 2 compared the training group to a no-training group that consistently received short (2 s) duration anagrams, to allow us to measure the effect of longer-duration training. In Experiment 3, we modified the two-phase paradigm to allow effort-regulation and solving performance to vary in the solving phase, to examine whether JOSs predict self-regulation of effort investment in anagram solving. Here, we included both solvable and unsolvable anagrams in the solving phase, and no time limit was imposed on solving. Our experiments build on Topolinski et al.’s (2016) initial explorations of JOSs by measuring and considering “already solved” JOSs, by examining the links between JOSs and later solving outcomes, and by exploring the effects of training using longer-duration anagrams.

Experiment 1

Experiment 1 explored whether already-solved (AS) and solvable (S) JOSs in the JOS phase discriminate between solvable and unsolvable anagrams. The subsequent solving phase allowed us to explore whether these JOSs predicted successful problem solving, as well as whether not solvable (NS) JOSs predicted problem-solving failures. During the JOS phase, anagrams were presented for 16 s in block 1, 8 s in block 2, 4 s in block 3, and 2 s in block 4. In the solving phase, participants attempted to solve each solvable anagram within 45 s.

Both AS and S JOSs were expected to discriminate solvable from unsolvable anagrams, and JOS discrimination was expected to decrease across blocks as anagram duration decreased. We also expected that JOSs would be more discriminating when anagrams receiving AS JOSs were included in the discrimination measures than when they were excluded. In turn, we expected that anagrams receiving AS JOSs would typically be solved in the solving phase; indeed, this serves as a manipulation check that participants used the AS JOS response option appropriately. We also evaluated whether anagrams receiving S JOSs were associated with greater solving-phase success, and whether anagrams receiving NS JOSs were associated with lower solving-phase success.

Method

The experiment was preregistered on Open Science Framework (OSF) at https://osf.io/zuqnw.

Participants

Participants (N = 122) were recruited through Amazon's Mechanical Turk (MTurk) via TurkPrime (Litman et al., 2017) and each received USD $2.25. We excluded 22 participants who met more than one pre-registered exclusion criterion (correctly solving less than 10% of anagrams, not completing the study, failing an attention check, or a study completion time more than 2 SD from the mean). The final sample was 100 participants (55 female, 44 male, 1 other; mean age = 39.76, SD = 12.35), in line with our preregistration.

Stimuli

Because anagram solving depends to some degree on how frequently the solution word appears in the English language (Johnson, 1966; Mayzner & Tresselt, 1958), a set of 75 solvable 5-letter anagrams was selected from a corpus of frequently used words (Word Frequency Data, 2016). The anagrams were subjected to Gilhooly's (1978) bigram analysis, which indicated that each word had a single anagram solution. The anagrams were piloted online (N = 94) and 40 were selected to be used in the study. The anagrams were then sorted into 4 sets of 10 roughly equated on solvability (each set had a mean solving rate of roughly 77%, and solving rates ranged from 50 to 100%). To create the set of 40 unsolvable anagrams, the letters in pseudowords created using Wuggy (Keuleers & Brysbaert, 2010) were randomly shuffled using an online character randomizer (Shuffle Characters in Text, 2010), and were then randomly and evenly allocated to the 4 sets. Assignment of sets to blocks was counterbalanced across participants via Latin square.
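For concreteness, a minimal sketch of how unsolvable items of this kind can be generated by shuffling pseudoword letters is shown below; the example pseudowords, the fixed seed, and the helper name are illustrative assumptions rather than the actual materials or the online randomizer used in the study.

```python
import random

def scramble(pseudoword: str, rng: random.Random) -> str:
    """Return a random reordering of the letters of a pseudoword."""
    letters = list(pseudoword)
    rng.shuffle(letters)
    return "".join(letters).upper()

rng = random.Random(42)  # fixed seed only so the example is reproducible
pseudowords = ["zereb", "brelk", "snilt"]  # hypothetical Wuggy-style items
unsolvable_anagrams = [scramble(w, rng) for w in pseudowords]
print(unsolvable_anagrams)  # three scrambled, unsolvable letter strings
```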

Procedure

The experiment was conducted online using Qualtrics software (Qualtrics, 2019). For the JOS phase, participants were instructed that they would be presented with a sequence of letters on each trial (i.e., an anagram), some of which could be rearranged to spell a word (e.g., DSTMI—MIDST) and hence were “solvable”, and others of which did not have a solution word (e.g., ZEREB) and hence were “unsolvable”. Their task was to make one of three solvability judgments for each anagram in the allotted time: “YES it is solvable”, “NO it is not solvable”, or “I have already solved it”. They were told that the anagram duration would decrease across four blocks as follows: 16 s, 8 s, 4 s, 2 s. Participants were also forewarned that they would later have 45 s to attempt to solve each solvable anagram.

On each of the 80 JOS phase trials, an anagram was presented for the duration specified for that block. Once the anagram disappeared, the 3 JOS options appeared as response boxes, and participants had 3 s to click on a response. If they failed to make their JOS within 3 s, a message appeared asking them to respond within 3 s. This message remained on the screen for 4 s to discourage participants from continuing to try to solve anagrams after they disappeared. After making their JOS, participants pressed an arrow button to submit their response, and then the next trial began. If participants made a JOS but did not submit it within 3 s, it was still recorded; this occurred on an average of 0.4% of trials in Experiment 1, 1.3% in Experiment 2, and 0.7% in Experiment 3. Before commencing the task, participants completed 10 practice JOS trials (5 solvable, 5 unsolvable) at the 16 s duration. They then attempted to solve the 5 solvable anagrams, each within 45 s.

The solving phase immediately followed the JOS phase. The solvable anagrams from the JOS phase were presented sequentially in a random order, each for 45 s (due to a programming error, only 39 of the 40 solvable anagrams were presented). Participants had 45 s to type the solution into a response box (minimum allowed was 3 s) and to then press the “Next” button to proceed. The 45 s time limit was selected based on a pilot study with a 60 s time limit in place; here the mean response time plus 2 SD was roughly 45 s, so this time limit ensured adequate solving time for the majority of trials/participants. On average, responses to anagrams were made within 45 s on 93% of trials in Experiment 1 and 94% of trials in Experiment 2 (among retained participants). If participants did not respond within 45 s, any response they entered was recorded and the solving phase progressed.

Results

JOS Phase

Participants’ ability to distinguish solvable anagrams from unsolvable anagrams (i.e., JOS discrimination) was assessed by measuring whether their hit rate (i.e., judging a solvable anagram to be solvable) exceeded their false alarm rate (i.e., judging an unsolvable anagram to be solvable). Hits and false alarms were converted to proportions by dividing them by the total number of JOS phase trials in which participants entered a response within the 3 s time limit following anagram presentation. These proportions were calculated across AS + S JOSs (Fig. 1), as would be the case if participants were not offered the option of reporting spontaneous solving, and separately for AS JOSs (Fig. 2a) and S JOSs (Fig. 3a).
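As a concrete illustration of these measures, the sketch below computes hit and false-alarm proportions from hypothetical trial-level data. It assumes that hits are computed over responded-to solvable anagrams and false alarms over responded-to unsolvable anagrams, which is how we read the procedure; the column names and example values are illustrative only and are not our analysis code.

```python
import pandas as pd

# Hypothetical JOS-phase trials on which a response was entered within 3 s.
trials = pd.DataFrame({
    "solvable": [True, True, True, False, False, False],  # actual anagram status
    "jos":      ["AS", "S", "NS", "S", "NS", "NS"],        # judgment given
})

def discrimination(df: pd.DataFrame, counted_jos: set) -> pd.Series:
    """Hit and false-alarm proportions for the JOSs in counted_jos
    (e.g., {"AS", "S"} for the combined measure, {"S"} for S JOSs alone)."""
    hits = df.loc[df["solvable"], "jos"].isin(counted_jos).mean()
    false_alarms = df.loc[~df["solvable"], "jos"].isin(counted_jos).mean()
    return pd.Series({"hits": hits, "false_alarms": false_alarms})

print(discrimination(trials, {"AS", "S"}))  # combined AS + S measure
print(discrimination(trials, {"S"}))        # S JOSs considered alone
```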

Fig. 1 Experiment 1: Mean proportions of hits and false alarms for AS + S JOSs in the JOS phase (Bars show 95% CI of each mean)

Fig. 2 Experiments 1–3: Mean proportions of hits and false alarms for AS JOSs in the JOS phase (Bars show 95% CI of each mean)

Fig. 3 Experiments 1–3: Mean proportions of hits and false alarms for S JOSs in the JOS phase (Bars show 95% CI of each mean)

Each dependent variable was analyzed using a 2(discrimination: hits, false alarms) × 4(block: 1–4) repeated-measures ANOVA. Table 1 provides the complete ANOVA results. The two key effects reviewed below are whether each JOS discriminated solvable from unsolvable anagrams (i.e., the main effect of discrimination) and whether JOS discrimination decreased across blocks (i.e., the interaction between discrimination and block).
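As a minimal sketch of this analysis, the 2 × 4 repeated-measures ANOVA could be run on a long-format table of per-participant proportions, for example with statsmodels; the simulated data and variable names below are purely illustrative, and the paper does not specify the software actually used.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulate one proportion per participant x discrimination (hit vs. false
# alarm) x block cell; a real analysis would use the observed proportions.
rng = np.random.default_rng(0)
rows = []
for pp in range(1, 21):
    for block in range(1, 5):
        for disc, base in [("hit", 0.75), ("false_alarm", 0.25)]:
            rows.append({"participant": pp, "block": block,
                         "discrimination": disc,
                         "proportion": float(np.clip(base + rng.normal(0, 0.1), 0, 1))})
df = pd.DataFrame(rows)

# Main effects of discrimination and block, plus their interaction.
res = AnovaRM(df, depvar="proportion", subject="participant",
              within=["discrimination", "block"]).fit()
print(res.anova_table)
```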

Table 1 Experiment 1: JOS phase ANOVA results

Participants’ AS + S JOSs were highly discriminating; averaged across blocks, hits (M = .73, SD = .16) were significantly greater than false alarms (M = .24, SD = .18). JOS discrimination also interacted with block, reflecting reduced discrimination across blocks as anagram durations were reduced (see Fig. 1); however, pairwise comparisons showed that discrimination was significant in each block (ps < .001). The interaction was followed up using linear contrasts, given our parametric manipulation of anagram duration. The results are shown in Table 2. The significant interaction between block and discrimination indicated that the linear effect across anagram durations differed for hits versus false alarms (see Fig. 1). Hits decreased about .04 per block as anagram duration was halved, whereas false alarms increased about .04 per block (both linear effects were significant).

Table 2 Experiment 1: JOS phase linear contrast ANOVAs

When AS JOSs and S JOSs were analyzed separately, the same patterns occurred: a significant main effect of discrimination and an interaction with block. Each JOS was again discriminating at each duration (ps < .001). For both AS JOSs and S JOSs, the linear contrast analyses showed that the linear effect of block was significant, as was the linear interaction between block and discrimination. For AS JOSs, the decrease in discrimination across blocks was due to a linear decrease in hits, whereas false alarms did not increase across blocks. For S JOSs, the reverse pattern was found: the decrease in discrimination across blocks was due to a linear increase in false alarms, whereas hits did not decrease across blocks. We discuss this novel pattern further in the General Discussion.

Solving Phase

We devised two measures to assess how well JOSs predict later problem-solving success. The solving rates for each JOS were similar across blocks; therefore, we averaged across JOS phase blocks in our solving phase analyses (our Supplementary Materials provide the block-wise means).

Our first measure, proportion solved, was calculated as the number of anagrams solved during the solving phase that had received a given JOS during the JOS phase, divided by the total number of anagrams that had received that JOS during the JOS phase. For example, if a participant went on to solve 6 out of 10 anagrams to which they had made S JOSs, their proportion solved in the solving phase would be .6 for S JOSs. The proportions solved for each JOS were thus independent of one another, and each could range from 0 to 1. Our Supplementary Materials provide a full illustrative example.
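A minimal sketch of this first measure, using hypothetical records for a single participant (the column names and counts are illustrative only):

```python
import pandas as pd

# Each row is one solvable anagram: the JOS it received in the JOS phase and
# whether it was solved in the solving phase.
records = pd.DataFrame({
    "jos":    ["AS"] * 5 + ["S"] * 10 + ["NS"] * 10,
    "solved": [True] * 5 + [True] * 6 + [False] * 4 + [True] * 5 + [False] * 5,
})

# Proportion solved for each JOS: anagrams solved after receiving that JOS,
# divided by all anagrams that received that JOS (here, 6/10 = .6 for S JOSs).
print(records.groupby("jos")["solved"].mean())
```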

The mean proportion solved as a function of JOS (AS vs. S vs. NS) was analyzed using a repeated-measures ANOVA, which was significant, F(2, 150) = 27.41, MSE = 1.01, p < .001, η2p = .27 (see Fig. 4a for the means). Pairwise multiple comparisons established that participants solved a greater proportion of anagrams that had received AS JOSs compared to either S JOSs (p < .001) or NS JOSs (p < .001), confirming that AS JOSs were predictive of later solving. In sharp contrast, solving rates were not significantly higher for anagrams that had received S JOSs versus NS JOSs (p = .67); thus, S JOSs were discriminating during the JOS phase but were not predictive of later solving.

Fig. 4 Experiments 1–3: Mean proportions of solvable anagrams solved in the solving phase (Bars show 95% CI of each mean)

Our second measure for assessing how well JOSs predict later problem-solving success was solved versus not solved outcomes. This measure was calculated as the proportion of anagrams that were solved versus not solved in the solving phase that had received a given JOS. For each JOS, the proportion solved was calculated by dividing the total number of solved anagrams given that JOS by the total number of anagrams solved, and the proportion not solved was calculated by dividing the total number of not solved anagrams given that JOS by the total number of anagrams not solved. For example, if a participant solved 10 anagrams, and 6 of those anagrams had received S JOSs, the proportion solved in the solving phase for S JOSs would be .60. Likewise, if a participant failed to solve 10 anagrams, and 2 of those unsolved anagrams had received S JOSs, the proportion not solved in the solving phase for S JOSs would be .20. Thus, the two proportions were independent and could each range from 0 to 1 for each JOS, allowing us to compare them directly (see Fig. 5a for the means). Our Supplementary Materials provide a full illustrative example.
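The second measure conditions on the solving-phase outcome rather than on the JOS. A minimal sketch, again using hypothetical records with illustrative values:

```python
import pandas as pd

records = pd.DataFrame({
    "jos":    ["AS"] * 5 + ["S"] * 10 + ["NS"] * 10,
    "solved": [True] * 5 + [True] * 6 + [False] * 4 + [True] * 5 + [False] * 5,
})

# Among solved anagrams, the proportion that had received each JOS; likewise
# for not-solved anagrams. The two columns use separate denominators.
solved = records.loc[records["solved"], "jos"].value_counts(normalize=True)
not_solved = records.loc[~records["solved"], "jos"].value_counts(normalize=True)
print(pd.DataFrame({"solved": solved, "not solved": not_solved}).fillna(0))
```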

Fig. 5 Experiments 1–3: Mean proportion of solved versus not solved outcomes for solvable anagrams (Bars show 95% CI of each mean)

Anagrams that had received an AS JOS were more frequent among solved anagrams than among not-solved anagrams in the solving phase, F(1, 93) = 53.70, MSE = 3.75, p < .001, η2p = .37, whereas this was not the case for anagrams that received S JOSs, F(1, 93) = 0.37, MSE = 0.02, p = .55, η2p = .004. On the other hand, NS JOSs were more frequent among not-solved than among solved anagrams, F(1, 93) = 56.40, MSE = 2.90, p < .001, η2p = .38. In sum, AS and NS JOSs reliably predicted later solving outcomes, but S JOSs did not.

Discussion

In the JOS phase, participants’ JOSs accurately discriminated between solvable and unsolvable anagrams, even at our briefest anagram duration of 2 s. Importantly, this pattern held for S JOSs after excluding trials that led to AS JOSs. The S JOS pattern establishes that above-chance discrimination of JOSs can occur apart from trials in which participants have spontaneously solved problems during the JOS task. In contrast, Topolinski et al. (2016, Experiments 6 and 7) found that JOS discrimination was only marginally significant after excluding already-solved anagrams.

In the solving phase, AS JOSs were predictive of later solving success (and NS JOSs were predictive of later solving failure). Most surprisingly, we found that S JOSs were not predictive of later solving success. Participants solved more anagrams given AS JOSs than anagrams given either S or NS JOSs, but solving rates were not higher for anagrams given S JOSs rather than NS JOSs. In addition, solved outcomes were not more frequent than not-solved outcomes for anagrams that received S JOSs, unlike for AS JOSs. Thus, although S JOSs were discriminating, they were not associated with later problem-solving success.

Experiment 2

Although Experiment 1 provided new insights about JOS discrimination and predictiveness, its design did not allow us to gauge whether the 'training' we provided through the inclusion of longer-duration anagrams increased S JOS discrimination. Therefore, in Experiment 2 we manipulated the presence versus absence of longer-duration anagrams across groups. The training group was identical to Experiment 1, whereas in the no-training group anagrams were presented for 2 s in all four blocks. This design allowed us to test whether the training group showed greater JOS discrimination than the no-training group in blocks 1–3 after the same amount of task experience. It also enabled us to test whether training improved JOS discrimination in block 4. The solving phase was the same as Experiment 1; thus, the Experiment 2 design also allowed us to test whether training with longer-duration anagrams in the JOS phase influences later solving performance.

We expected the training group to show greater AS JOS and S JOS discrimination than the no-training group. Because longer duration anagrams provide problem-solving successes during the JOS task, participants may use these successes to better regulate their JOSs on trials where they are unable to solve the anagram during its presentation, resulting in improved discrimination. In turn, training with longer-duration anagrams was expected to result in AS and S JOSs being more predictive of anagram solving during the solving phase.

Method

The experiment was preregistered on OSF at https://osf.io/cq2kb.

Participants

We tested another 238 MTurk workers, as per Experiment 1. Data for the training and no-training groups were collected in turn (back-to-back). We excluded 56 participants who met more than one pre-registered exclusion criterion. The final sample consisted of 182 participants (101 female, 81 male; mean age = 41.54, SD = 13.39), 91 per group, in line with our preregistration.

Stimuli

The Experiment 1 stimuli were used.

Procedure

The Experiment 1 procedure was used, except the anagrams were presented for 2 s in each block in the no-training group; participants were informed of this duration. All 40 solvable anagrams were shown in the solving phase.

Results

JOS Phase

JOS discrimination was measured as in Experiment 1. The combined AS + S JOS results replicated Experiment 1 and are presented in our Supplementary Materials. The means for AS JOSs are provided in Fig. 2b–c, and for S JOSs in Fig. 3b–c. The measures were computed and analyzed as per Experiment 1. Table 3 provides the ANOVA results, and Table 4 provides the linear contrast results.

Table 3 Experiment 2: Results for JOS phase ANOVAs
Table 4 Experiments 2 and 3: JOS phase linear contrast ANOVAs

Training Group

In the training group, the discrimination pattern for AS JOSs and S JOSs fully replicated Experiment 1. In both cases, JOSs distinguished between solvable and unsolvable anagrams, and discrimination decreased across blocks but was significant at each duration (ps ≤ .001). The linear effect of block was significant for both AS JOSs and S JOSs, as was the linear interaction between block and discrimination. The decrease in discrimination across blocks was again due to a linear decrease in hits (rather than an increase in false alarms) for AS JOSs, and to a linear increase in false alarms (rather than a decrease in hits) for S JOSs.

No-Training Group

In the no-training group, JOS discrimination was significant for both AS JOSs and S JOSs. However, unlike in the training group, here the discrimination by block interactions were not significant. Discrimination was significant in each block for each measure, except in block 1 for S JOSs (p = .09).

Did Longer-Duration Anagrams Improve JOS Discrimination in Blocks 1–3?

We next gauged whether training enhanced JOS discrimination in blocks 1–3 relative to the no-training group, using a 2(discrimination: hits vs. false alarms) × 3(block: 1–3) × 2(group: training vs. no-training) mixed-factor ANOVA for each JOS measure. The complete ANOVA results are reported in Table 5. Of central interest was the three-way interaction. For AS JOSs, this interaction was significant, indicating that longer-duration anagrams in the training group improved AS JOS discrimination. However, the three-way interaction was not significant for S JOSs: presenting longer-duration anagrams in blocks 1–3 (as opposed to 2 s anagrams) did not result in more discriminating S JOSs in these blocks.

Table 5 Experiment 2: JOS phase discrimination ANOVAs in blocks 1–3

Did Training Improve JOS Discrimination in Block 4?

We next focused on block 4 to determine whether training improved JOS discrimination where both groups received 2 s duration anagrams. For each JOS measure, we ran a 2(discrimination: hits, false alarms) × 2(group: training vs. no-training) mixed-factor ANOVA (see Table 6). The effect of interest was the interaction, which was not significant either for AS or S JOSs. Thus, training with longer-duration anagrams, relative to 2 s anagrams, did not improve discrimination for either AS or S JOSs.

Table 6 Experiments 2 and 3: Results for JOS phase discrimination ANOVAs in block 4

Solving Phase

Our solving phase analyses again averaged across the JOS phase blocks. Therefore, when we refer to the effect of training versus no-training on JOS predictiveness, we are referring to the general effect of experience with longer-duration anagrams on solving outcomes. The solving phase analyses followed Experiment 1, except group was added as a between-subjects factor. The means for the proportion solved measure appear in Fig. 4b. The 3(JOS: AS, S, NS) by 2(group: training vs. no-training) ANOVA revealed a significant main effect of JOS, F(2, 192) = 13.78, MSE = 0.45, p < .001, η2p = .13. The proportion of anagrams solved was greater for anagrams that had received AS JOSs in the JOS phase rather than either S JOSs (p = .001) or NS JOSs (p < .001). In contrast, participants were equally likely to solve anagrams that received S JOSs or NS JOSs in the JOS phase (p = 1.00). Thus, replicating Experiment 1, S JOSs were not predictive of greater problem-solving success. The group main effect was not significant, F(1, 96) = 2.00, MSE = 0.28, p = .16, η2p = .02. Strikingly, training did not improve how well JOSs predicted later solving: JOS predictiveness was similar across groups, F(2, 192) = 1.37, MSE = 0.04, p = .26, η2p = .01 for the interaction.

The solved versus not-solved outcome measure means appear in Fig. 5b, and Table 7 provides the 2(outcome: solved, not solved) × 2(group: training, no-training) ANOVAs. The main effects of group are not of interest because they average across outcomes. AS JOSs were more frequent among solved anagrams than among not-solved anagrams, and this effect was larger in the training group, resulting in a significant interaction (though prediction was significant in each group, ps ≤ .03). Anagrams given S JOSs were not significantly more frequent among solved anagrams than among not-solved anagrams. Here, outcome interacted with group, but the outcome difference did not reach significance for either group (ps ≥ .05). NS JOSs were more frequent among not-solved anagrams than among solved anagrams, and outcome interacted with group; this effect was significant in the training group (p < .001) but not in the no-training group (p = .42). Thus, training with longer-duration anagrams in blocks 1–3 enhanced the predictiveness of AS and NS JOSs but not S JOSs.

Table 7 Experiments 2 and 3: Results for solved versus not-solved outcomes ANOVAs

Discussion

Replicating Experiment 1, participants’ AS JOSs and S JOSs both discriminated solvable from unsolvable anagrams. Experiment 2 extended this finding by establishing that both JOSs were discriminating even in a no-training group where anagram duration was always 2 s during the JOS phase. Experiment 2 also confirmed that presenting longer-duration anagrams in blocks 1–3 led to more discriminating AS JOSs. Importantly, however, this was not the case for S JOSs. Thus, inclusion of longer-duration anagrams increased the likelihood of spontaneous anagram solving, but it did not improve S JOS discrimination. In fact, training did not result in greater discrimination in block 4 (2 s anagrams for both groups) for either S or AS JOSs.

The solving phase for the training group replicated Experiment 1. Participants were more likely to solve anagrams that had been given AS JOSs than either S or NS JOSs. AS and NS JOSs predicted later anagram solving successes and failures, respectively. Importantly, replicating Experiment 1, S JOSs were not predictive of later problem-solving outcomes. Additionally, Experiment 2 showed that although training improved the predictiveness of AS and NS JOSs, it did not do so for S JOSs. Training also did not result in a greater proportion of anagrams solved, regardless of JOS.

In Experiments 1 and 2, the majority of anagrams were solved no matter the JOS (.71–.94; see Fig. 4a and b). Participants were informed that each anagram was solvable, and were given 45 s to solve each one. These design elements may have increased solving efforts and, in turn, solving successes, which may have limited our ability to detect effects of training on JOS predictiveness. Experiment 3 revisited JOS predictiveness under conditions designed to reduce solving phase success, which also enabled us to examine how training during the JOS phase influences effort regulation in the solving phase.

Experiment 3

The JOS phase in Experiment 3 was identical to Experiment 2, allowing us to test the replicability of our findings with respect to JOS discrimination and the impact of training on JOS discrimination. However, the solving phase was modified to allow us to examine the generality of our findings regarding JOS predictiveness and to investigate the effects of training on how well JOSs predict effort regulation. In Experiment 3, the solving phase included 5 solvable and 5 unsolvable anagrams from each block of the JOS phase (rather than including only the 10 solvable anagrams from each block). In addition, we allowed participants to self-regulate their problem-solving effort: they could spend as much or as little time as they wished attempting to solve each anagram. On each trial, they either typed in the anagram solution, passed, or indicated that the anagram was not solvable (dubbed a not-solvable response). The inclusion of unsolvable anagrams, the ability to pass and make not-solvable responses, and the option to respond before 45 s had elapsed if no solution was found were expected to reduce the solving rate relative to Experiments 1 and 2. By lowering the solving rate, Experiment 3 was expected to provide a stronger test of JOS predictiveness, and of the potential effects of training on JOS predictiveness.

These modifications to the two-phase paradigm also provided new measures of the link between JOSs and later problem solving. One new measure was how long participants took to solve solvable anagrams, and another was how long they took to make not-solvable responses to unsolvable anagrams. The latter provides a novel measure of effort regulation that allowed us to examine, for example, whether participants spent longer solving anagrams when they had given an AS or S JOS versus a NS JOS, and whether training further impacted their effort regulation. We were also able to examine whether NS JOSs were associated with faster not-solvable responses for unsolvable anagrams, and whether training strengthened this effect. A third new measure was the rate of not-solvable responses itself, which provided a parallel window onto these same questions.

Method

The experiment was preregistered on OSF at https://osf.io/bzuqc.

Participants

A total of 357 additional MTurk workers were tested. Allocation to the training or no-training groups was randomized. We increased the sample size for each group by 50 given that the solving phase now included 5 rather than all 10 solvable anagrams from each block of the JOS phase. Here, 60 participants were excluded for meeting more than one pre-registered exclusion criterion. The final sample consisted of 297 participants (221 female, 76 male; mean age = 42.25, SD = 12.90): 150 in the training group and 147 in the no-training group, in line with our preregistration.

Stimuli

The Experiment 1 and 2 stimuli were used.

Procedure

The procedure followed Experiment 2, except for modifications to the solving phase that enabled us to measure regulation of problem-solving effort. The solving phase now consisted of one of two sets of 20 solvable and 20 unsolvable anagrams from the JOS phase. To this end, a random half of the anagrams from each block were assigned to each set, and the set used in the solving phase was counterbalanced across participants. The solving phase instructions informed participants that half the anagrams were solvable, and half were not. They were told that they had as much time to try to solve each anagram as they wished, and they were instructed to either type in a solution, type the letter "P" to pass if they believed the anagram was solvable but were unable to solve it, or type the letter "N" for "not solvable" if they believed the anagram was unsolvable. Participants were given 5 JOS practice trials, and 5 solving practice trials using the same anagrams from the JOS practice trials.

Results

Experiment 3 was analyzed as per Experiment 2, with additional analyses of unsolvable anagrams in the solving phase, and of self-regulated solving times.

JOS Phase

JOS discrimination means are provided in Fig. 2d–e for AS JOSs and in Fig. 3d–e for S JOSs (see Table 8 for ANOVA results and Table 4 for the linear contrasts).

Table 8 Experiment 3: Results for JOS phase ANOVAs

Training Group

The discrimination pattern for AS JOSs and S JOSs replicated Experiments 1 and 2. In each case, AS and S JOSs were both discriminating, and discrimination decreased across blocks but was significant at each duration (ps ≤ .001). For both AS JOSs and S JOSs, linear contrast analyses showed that the linear effect of block was significant, but only AS JOSs had a significant linear interaction between block and discrimination. For AS JOSs, although hits and false alarms both decreased linearly, the decrease in discrimination across blocks was greater for hits than for false alarms (but both were significant). For S JOSs, although the interaction was not significant, the analyses showed a similar pattern to Experiments 1 and 2: the change in discrimination across blocks was driven by a significant linear increase in false alarms (p < .001), whereas the decrease in hits was not significant (p = .09).

No-Training Group

The discrimination pattern in the no-training group also largely replicated Experiment 2. Discrimination was significant for each JOS, did not interact significantly with block, and was significant in each block for each JOS, except in block 4 for S JOSs (p = .42).

Did Longer-Duration Anagrams Improve JOS Discrimination in Blocks 1–3?

The pattern of three-way interactions between discrimination, block, and group across JOSs replicated Experiment 2 (see Table 9). Although AS JOS discrimination decreased across blocks in the training group, longer-duration anagrams still led to significantly greater discrimination across blocks 1–3 for AS JOSs. In the no-training group, discrimination did not increase across blocks, and was significantly weaker than in the training group. The three-way interaction was not significant for S JOSs.

Table 9 Experiment 3: JOS phase discrimination ANOVAs in blocks 1–3

Did Training Improve JOS Discrimination in Block 4?

Training with longer-duration anagrams did not significantly improve discrimination for either AS or S JOSs (see Table 6), replicating Experiment 2.

Solving Phase

Solving phase analyses followed Experiment 2, with additional analyses given the self-regulated elements. Note that "pass" responses were too rare to analyze separately. As expected, the change to a self-regulated solving phase reduced the mean solving rate in Experiment 3 (M = .65, SD = .20) relative to Experiments 1 and 2 (M = .77, SD = .23), t(775) = 7.09, p < .001. This reduction of solving rate should make it easier to detect an impact of JOS on solving phase outcomes.

Proportion Solved

The proportion solved means for solvable anagrams appear in Fig. 4c. The 3(JOS: AS, S, NS) × 2(group: training, no-training) ANOVA revealed a significant main effect of JOS, F(2, 400) = 168.01, MSE = 12.70, p < .001, η2p = .46. The proportion of anagrams solved was greater for anagrams that received AS JOSs compared to either S JOSs (p < .001) or NS JOSs (p < .001). Unlike in Experiments 1 and 2, here the proportion of anagrams solved was greater for anagrams receiving S JOSs than NS JOSs (p < .001). As in Experiment 2, the main effect of group was not significant, F(1, 207) = 0.51, MSE = 0.07, p = .48, η2p = .002, nor was the interaction, F(2, 400) = 0.01, MSE = 0.001, p = .99, η2p < .001. Thus, training did not increase the overall proportion of anagrams solved in the solving phase, nor was the predictiveness of JOSs greater in the training group.

Solved vs. Not Solved Outcomes for Solvable Anagrams

The solved versus not solved outcome means appear in Fig. 5c, and the 2(outcome: solved vs. not solved) × 2(group: training vs. no-training) ANOVA results appear in Table 7. Incorrect solutions, "pass" responses, and not-solvable responses were all counted as not-solved outcomes. AS JOSs were more frequent among solved anagrams than among not-solved anagrams, and training interacted with outcome such that the effect was larger in the training group (though each was significant, ps < .001). In contrast, S JOSs did not significantly predict solving phase outcome, and the interaction with training was also not significant. NS JOSs were more frequent among not-solved anagrams than among solved anagrams, and here the outcome by group interaction was (just) significant (and the effect was significant for each group, ps < .001). Thus, the predictiveness of NS JOSs was again enhanced by training.

Not-Solvable Responses to Solvable vs. Unsolvable Anagrams

The proportion of not-solvable responses in the solving phase was calculated as the number of anagrams given a not-solvable response in the solving phase that had received a given JOS during the JOS phase, divided by the total number of anagrams that had received that JOS during the JOS phase. For example, if a participant gave a not-solvable response in the solving phase for 5 out of 10 anagrams to which they had given an NS JOS in the JOS phase, their proportion of not-solvable responses in the solving phase for NS JOSs was .5. These proportions were independent across JOSs, allowing direct comparisons of the mean proportion of anagrams given a not-solvable response in the solving phase as a function of JOS. Due to the rarity of AS and S JOSs for unsolvable anagrams, these proportions were pooled across AS + S JOSs.
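A minimal sketch of this measure, with hypothetical counts chosen to match the example above (5 of 10 NS-judged anagrams receiving a not-solvable response):

```python
import pandas as pd

# Each row is one solving-phase anagram: the JOS it received (AS and S pooled,
# as in the analysis) and whether the participant typed "N" (not solvable).
records = pd.DataFrame({
    "jos":         ["AS+S"] * 10 + ["NS"] * 10,
    "ns_response": [False] * 8 + [True] * 2 + [True] * 5 + [False] * 5,
})

# Not-solvable responses given that JOS divided by all anagrams given that JOS
# (here, .2 for AS + S JOSs and .5 for NS JOSs).
print(records.groupby("jos")["ns_response"].mean())
```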

A 2(JOS: AS + S, NS) × 2(anagram type: solvable, unsolvable) × 2(group: training, no-training) mixed-factor ANOVA was conducted on not-solvable responses (Table 10). The means appear in Fig. 6. Here, we were interested in whether not-solvable responses were more likely for anagrams given NS JOSs than AS + S JOSs, whether anagram type strengthened the likelihood of not-solvable responses for NS JOSs relative to AS + S JOSs, and whether training moderated the latter interaction. There was a significant main effect of JOS; not-solvable responses were more likely for anagrams given NS JOSs than AS + S JOSs. JOS interacted with anagram type; the difference in the proportion of not-solvable responses between AS + S and NS JOSs was greater for solvable anagrams (though it was significant for both solvable and unsolvable anagrams; ps < .001). JOS also interacted with group; although both groups showed more not-solvable responses at test for anagrams given NS JOSs than for AS + S JOSs (ps < .001), this pattern was more robust in the training group.

Table 10 Experiment 3: Proportion of not-solvable responses ANOVA results
Fig. 6 Experiment 3: Mean proportions of not-solvable outcomes to solvable anagrams (Bars show 95% CI of each mean)

Finally, the three-way interaction with anagram type was also significant. This interaction was followed up with separate interaction contrasts for the training and no-training groups (see Table 11). For each group, not-solvable responses were more likely to be provided for anagrams given NS JOSs than AS + S JOSs (i.e., a main effect of JOS). The interaction of JOS and anagram type was significant only in the no-training group, and reflected a smaller difference between JOSs for unsolvable than for solvable anagrams (though both were significant; ps < 0.001).

Table 11 Experiment 3: Proportion of not-solvable responses interaction contrasts by group

Do JOSs Predict Self-Regulated Response Times in the Solving Phase?

Whether JOSs predict solving response times was analyzed as per the not-solvable response analyses. Because solution times were positively skewed, a base 10 logarithm transformation was applied to normalize the distribution. Thus, descriptive statistics are presented in seconds, but inferential statistics used the transformed means.
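As a brief illustration of this transformation (the values below are invented; the analyses used participants' observed solution times):

```python
import numpy as np

rt_seconds = np.array([4.2, 6.8, 9.5, 14.1, 38.7])  # hypothetical solution times

# Descriptive statistics were reported in seconds; inferential tests used the
# base-10 log of each time, which compresses the skewed tail of the distribution.
log_rt = np.log10(rt_seconds)
print(round(rt_seconds.mean(), 2), round(log_rt.mean(), 3))
```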

Solved Anagrams

Mean response times to correctly solve solvable anagrams appear in Fig. 7. The 3(JOS) × 2(group) ANOVA revealed a main effect of JOS, F(2, 280) = 81.15, MSE = 1.80, p < .001, η2p = .37. AS JOSs were associated with shorter solution times than both S JOSs (p < .001) and NS JOSs (p < .001). In contrast, solution times were similar for anagrams that received S JOSs versus NS JOSs (p = 1.00). The main effect of group was not significant, F(1, 140) = 2.84, MSE = 0.28, p = .09, η2p = .02. The interaction fell just short of significance, F(2, 280) = 2.96, MSE = 0.07, p = .053, η2p = .02.

Fig. 7 Experiment 3: Mean solving times for anagrams (Bars show 95% CI of each mean)

Not-Solvable Responses to Solvable vs. Unsolvable Anagrams

Mean response times for making not-solvable responses appear in Fig. 8. Table 12 shows the complete 2(JOS: AS + S, NS) × 2(anagram type: solvable, unsolvable) × 2(group: training, no-training) mixed-factor ANOVA results. All three main effects were significant: not-solvable responses were faster for anagrams assigned NS JOSs than AS + S JOSs (JOS main effect), not-solvable responses were faster for solvable than unsolvable anagrams (anagram type main effect), and not-solvable response times were faster in the training than the no-training group (group main effect). The interaction of JOS and anagram type was significant; the not-solvable response time difference between JOSs was larger for unsolvable than solvable anagrams (but both were significant; ps ≤ .003). The interaction between JOS and group was significant; the not-solvable response time difference was larger for the no-training group than for the training group (both ps < .001). Thus, training reduced the difference in not-solvable response times for NS versus AS + S JOSs. The remaining effects were not significant.

Fig. 8 Experiment 3: Mean response time for not-solvable responses (Bars show 95% CI)

Table 12 Experiment 3: Mean response time for not-solvable responses ANOVA results

Discussion

In terms of JOS discrimination, Experiment 3 replicated Experiment 2; AS and S JOSs were both discriminating, even in the no-training group where anagram duration was 2 s in all blocks. Longer-duration anagrams in blocks 1–3 increased the likelihood of AS JOSs rather than enhancing the discrimination of S JOSs, and training again did not increase discrimination in the final 2 s block for either S or AS JOSs.

Allowing participants to self-regulate their solving efforts (and the inclusion of unsolvable anagrams) reduced solving rates in Experiment 3, and thus provided a stronger test of whether JOSs (especially S JOSs) predict problem solving. AS JOSs and NS JOSs again predicted solving successes and failures, respectively, but even when self-regulation was permitted, S JOSs did not predict solved versus not-solved outcomes. As was found in Experiment 2, training improved the predictiveness of AS JOSs and NS JOSs, but not of S JOSs. Interestingly, in Experiment 3 solving rates for S JOSs were higher than for NS JOSs, unlike in Experiments 1 and 2. However, training still did not impact solving phase performance.

The design of Experiment 3 also allowed us to assess how often each JOS was associated with not-solvable responses in the solving phase. Solvable anagrams that received NS JOSs in the JOS phase received more not-solvable responses than anagrams that received AS + S JOSs, and this difference was greater in the training group. For unsolvable anagrams, both groups made more not-solvable responses for anagrams given NS JOSs than AS + S JOSs, but this difference was again larger in the training group. Adding to Lauterman and Ackerman's (2019) finding that a "not solvable" initial JOS predicts a "not solvable" final JOS, training with longer-duration anagrams enhanced NS JOS predictiveness.

Experiment 3 also measured self-regulated response times during the solving phase. Unsurprisingly, anagrams that were reported to have been spontaneously solved during the JOS phase (AS JOSs) yielded the fastest solution times. However, solution times were similar for S JOSs and NS JOSs, and training with longer-duration anagrams did not impact this pattern. We also found faster not-solvable responses in the solving phase following NS JOSs than AS + S JOSs, particularly for unsolvable anagrams and for the no-training group. Lauterman and Ackerman (2019) reported that effort regulation following an S JOS was similar for solvable and unsolvable problems, suggesting that making an S JOS for an unsolvable problem may lead solvers to perseverate on unsolvable problems. Our findings support theirs, and further establish that differences in not-solvable response times between AS + S JOSs and NS JOSs are reduced via training with longer-duration anagrams.

In sum, Experiment 3 replicated Experiments 1 and 2 in terms of JOS discrimination and the ability of JOSs to predict solved versus not-solved outcomes. Adapting the two-phase paradigm to allow effort regulation and solving performance to vary extended our understanding of JOSs by revealing that S JOSs can be associated with a higher solving rate than NS JOSs, whereas anagrams given NS JOSs showed lower solving rates and faster "not solvable" responses at test. Further, we found novel evidence that the ability of JOSs to predict the rate and speed of "not solvable" responses was influenced by training, such that training led to quicker and higher rates of not-solvable responses for anagrams given NS JOSs.

General Discussion

Three experiments provided an in-depth investigation of the first stage of meta-reasoning—judgments of whether problems are solvable or not. In our two-phase paradigm, participants first made JOSs to solvable and unsolvable anagram problems, and this JOS phase was followed by a solving phase. During the JOS phase, an 'already solved' (AS) JOS option was provided to allow participants to indicate having solved an anagram at this stage. A two-phase paradigm allows participants to focus on making intuitive JOSs in the JOS phase (at least at briefer anagram durations) and to focus on trying to solve the anagrams in the solving phase. Providing an AS JOS option allowed us to parse out solved anagrams from our discrimination measure in the JOS phase. Because JOSs are meant to be intuitive judgments (and intuition about problem solvability should precede solving; Ackerman & Thompson, 2017), it is important to separate intuitive JOSs from problems solved during the JOS process. This separation enabled us to more cleanly measure whether S JOSs predict solving outcomes and effort regulation. We also examined the effects of training on JOS discrimination and predictiveness, by presenting anagrams for longer durations at first (16 s), which then halved across blocks (8 s, 4 s, 2 s). In Experiments 2 and 3 we compared JOSs in these training groups to no-training groups in which anagram duration was always 2 s. Below, we discuss in turn JOS discrimination and whether it was improved by training. We then discuss whether JOSs were predictive of later problem-solving outcomes and effort regulation, and whether these outcomes benefitted from training. Finally, we discuss the potential value of future research comparing two-phase and interleaved paradigms for capturing the initial stages of meta-reasoning.

JOS Discrimination

Our experiments provide evidence that participants’ intuitions can discriminate solvable from unsolvable anagrams. AS and S JOSs were both found to be discriminating, even at our briefest anagram duration (2 s). Importantly, discrimination remained above chance when we excluded the anagrams that participants reported having solved during the task (i.e., those receiving AS JOSs).

Previous studies have reached different conclusions regarding the ability of solvable JOSs to discriminate solvable from unsolvable problems. Studies in which solvable JOSs were found to be discriminating did not allow participants to report having solved the problems during the JOS process (Balas et al., 2011; Bolte & Goschke, 2005; Novick & Sherman, 2003; Topolinski & Strack, 2009; Undorf & Zander, 2017). When already-solved items were reported and removed from analysis, Topolinski et al. (2016, Experiment 7) found that participants were only marginally sensitive to anagram solvability. Our experiments found that S JOS discrimination was significant, though it was notably weaker than AS JOS discrimination. Our η2p effect sizes for S JOS discrimination ranged from .09 to .11 in our no-training groups and from .22 to .37 in our training groups, whereas Topolinski et al.'s was .06. Topolinski et al. also had a smaller sample and fewer JOS trials, thus their study may have lacked power. Regardless, our study is the first to provide clear evidence that S JOSs can be discriminating, even after excluding solutions arising during the JOS task.

An interesting question our study cannot address is which stimulus features participants use to distinguish solvable from unsolvable anagrams. Perhaps their intuitions are sensitive to certain diagnostic letter combinations or to differences in bigram frequencies. Another possible mechanism underlying S JOS discrimination is the unconscious activation of semantic representations (Bowers et al., 1990) that would indicate an anagram is solvable. Future research should investigate the stimulus features that drive S JOS discrimination.

Even at our briefest anagram duration (2 s), AS JOSs were reported for 22–48% of the solvable anagrams during the JOS phase. Given that intuition about solvability precedes solving, we would expect a “solvable” JOS to have preceded the solution on these trials (Ackerman & Thompson, 2017; Bowers et al., 1990). Thus, a potential disadvantage of removing AS JOS trials is that it may remove trials on which accurate intuitions occurred. In doing so, we may underestimate S JOS discrimination (and, in turn, how well S JOSs predict solving outcomes, as discussed below). An alternative approach to capturing intuitive S JOSs is to establish, for each participant, a problem duration brief enough that they no longer report AS JOSs. However, our concern with this approach is that very brief problem durations may lead participants to rely on irrelevant cues when making their JOSs, or simply to respond randomly. Thus, there are pros and cons to both approaches, and future research should compare them.

Turning to the impact of training, in Experiments 2 and 3 we confirmed, as expected, that AS and S JOS discrimination were more accurate in blocks 1–3 in our training group (who received longer-duration anagrams in these blocks) than in our no-training group. We expected that the training group would use their greater solving success as feedback to help calibrate their “solvable” intuitions. However, training did not improve AS JOS or S JOS discrimination in the final block relative to the no-training group. Perhaps, then, the training group simply shifted their efforts toward solving the anagrams when longer-duration anagrams were provided, rather than toward improving the accuracy of their intuitive judgments. If so, our ‘training’ may not have helped participants learn to regulate their meta-reasoning during blocks 1–3. We recommend that future research consider alternative means of enhancing JOS discrimination. For example, it could be worthwhile to examine the effects of providing explicit feedback about JOS accuracy (i.e., indicating after each JOS whether the anagram was solvable or unsolvable), particularly given that prior studies have shown that trial-by-trial feedback improves discrimination in meta-memory tasks (e.g., Arnold et al., 2013; Higham, 2007; Sharp et al., 1988).

We consistently found that JOS discrimination in the training group weakened linearly across blocks. Interestingly, this decrease in discrimination took a different form for AS and S JOSs. For AS JOSs, it reflected a linear decrease in hits across blocks (while false alarms remained similar), whereas for S JOSs, it reflected a linear increase in false alarms (while hits remained similar). AS JOSs and S JOSs thus appear to be affected differently by training, although why this occurs remains to be determined. Regardless, this novel dissociation is indicative of a qualitative rather than quantitative difference between AS and S JOSs, as one would expect if AS JOSs reflect actual solving whereas S JOSs capture intuitions about solvability. This difference between AS and S JOSs, coupled with their different predictiveness of solving outcomes (as discussed next), helps rule out the possibility that AS JOSs are simply stronger S JOSs.
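To illustrate why both patterns weaken discrimination, consider a generic signal-detection index such as

\[
d' = \Phi^{-1}(\text{hit rate}) - \Phi^{-1}(\text{false-alarm rate}),
\]

where \(\Phi^{-1}\) is the inverse of the standard normal cumulative distribution function (we use this index purely for illustration; it is not necessarily the measure reported in our analyses). A decline in hits with stable false alarms (the AS JOS pattern) and a rise in false alarms with stable hits (the S JOS pattern) both shrink this difference, even though they involve changes to different response classes.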

JOS Predictiveness

Our study also clarified whether intuitions about problem solvability, as measured by JOSs, predict later reasoning performance and effort regulation. One of our key measures compared how often anagrams were solved as a function of whether they had received AS versus S versus NS JOSs in the JOS phase. In Experiments 1 and 2, anagrams that received AS JOSs were more likely to be solved than those that had received S or NS JOSs. But anagrams that received S JOSs were not more likely to be solved than those that received NS JOSs, consistent with prior evidence that JOSs are limited in their ability to predict reasoning success (e.g., Ackerman & Beller, 2017; Lauterman & Ackerman, 2019). However, because participants in Experiments 1 and 2 knew that each anagram was solvable, they may have put equal effort into solving each anagram for the full 45 s regardless of their JOS. In Experiment 3 we included both solvable and unsolvable anagrams in the solving phase, and participants decided how much time to spend on their solving attempts. These conditions reduced the solving rate relative to Experiments 1 and 2, thus providing more room for solving rate to vary as a function of JOS. Here, the solving rate was higher for anagrams that had received S rather than NS JOSs.

The extent to which solving-phase effort and outcomes are influenced by memory for one’s JOS is another important issue for future research to tackle. Remembering having indicated that an anagram is solvable is likely to increase one’s efforts to solve it. In our two-phase paradigm, the delay between JOSs and solving attempts should reduce the likelihood that participants’ solving efforts are solely determined by memory for the JOS, at least relative to an interleaved paradigm in which each JOS is immediately followed by a solving attempt. However, the extent to which participants attempt to align their solving efforts with their intuitive judgments remains unknown. If intuitions about solvability are stable over time (Stagnaro et al., 2018), then the intuition that a problem is solvable might recur when the same problem is presented in the solving phase, even if the solver does not remember the JOS or intuition they experienced earlier for that problem.

Our second measure of solving outcomes considered whether, for each JOS, solved outcomes were more likely than not-solved outcomes. We consistently found that solvable anagrams given AS JOSs were more frequent among solved anagrams than among not-solved anagrams, whereas solvable anagrams given NS JOSs were more frequent among not-solved anagrams than among solved anagrams. But critically, even though S JOSs were discriminating, they were not more common among solved anagrams than not-solved anagrams. Some prior research suggests that solvable JOSs predict solving outcomes (Markovits et al., 2015; Siedlecka et al., 2016). We found that removing AS JOS trials eliminated S JOS predictiveness, suggesting that the effect in these studies may have arisen due to the inclusion of problems solved during the JOS task. Therefore, we recommend that where spontaneous solving is possible, an AS JOS option be provided to enable participants’ intuitions to be separated from their solutions.

On the other hand, as discussed earlier, removing AS JOS trials may lead us to underestimate the predictiveness of S JOSs, given that intuitive feelings of solvability likely precede AS JOSs. Had we used problem durations short enough to eliminate AS JOSs, we might have obtained more solved than not-solved outcomes for S JOSs, so long as participants did not resort to random guessing or response biases.

As was true in Topolinski et al. (2016), we did not measure whether participants’ AS JOSs were accompanied by a solution at that time. Consequently, it remains unclear whether AS JOSs reflect high-confidence intuitions or actual solving. However, because solving rates were highest and solving times fastest for anagrams given AS JOSs, we suspect that AS JOSs typically reflect genuine solving. Nonetheless, future JOS research using a two-phase design could explore this question by asking participants to report solutions to anagrams they indicate having already solved during the JOS phase.

Experiment 3 also assessed the rate of “not solvable” responses during the solving phase. This rate was higher for anagrams given NS JOSs than for those given either of the other JOSs (i.e., AS + S JOSs). This finding is in line with Lauterman and Ackerman’s (2019) evidence that initial JOSs predict final JOSs (i.e., a participant’s final judgment about whether an unsolved problem was solvable). Similarly, “not solvable” response times during the solving phase were shorter for anagrams given NS JOSs than for those given AS + S JOSs, in line with Lauterman and Ackerman’s finding that participants spend more time on problems they judge to be solvable, regardless of their actual solvability. Together, these findings indicate that JOSs can predict later effort regulation and can help problem solvers allocate their effort efficiently (i.e., so as not to waste effort on unsolvable problems).

Next, we consider the impact of training on JOS predictiveness. Did exposure to blocks of longer-duration anagrams make JOSs more predictive of solving-phase outcomes? In general, we did not find this to be the case. However, in Experiment 3 only the training group produced more solved than not-solved outcomes for solvable anagrams given AS JOSs. Of course, this difference is not surprising given that longer-duration anagrams should result in more solving during the solving phase. We also found more not-solved than solved outcomes for solvable anagrams given NS JOSs in the training group, but not in the no-training group. Deliberating longer about an anagram’s solvability without finding a solution may lead participants to judge it as not solvable during the solving phase (Payne & Duggan, 2011). Given that the training group had longer to deliberate solvability in blocks 1–3, they may have exhausted all letter arrangements for some anagrams during the JOS phase and thus defaulted to “not solvable” responses for them in the solving phase.

Importantly, training did not result in more solved than not-solved outcomes for anagrams given S JOSs. Earlier, we suggested that longer-duration anagrams may lead the training group to shift toward solving the anagrams rather than merely assessing solvability, thus robbing them of opportunities to learn how to regulate their JOSs. This might also explain why we did not detect an effect of training on S JOS predictiveness. Earlier, we also suggested that providing trial-level accuracy feedback after each JOS might improve S JOS discrimination. However, participants might use their memory for this feedback to regulate their efforts in the solving phase, rather than relying on their intuition. If so, then providing feedback might actually undermine solving performance. To assess this possibility, future research could examine whether providing trial-level feedback about JOS accuracy for one set of problems affects discrimination for another set of problems presented without feedback, and whether JOSs predict solving outcomes selectively for the latter set.

Paradigms for Measuring JOSs

Our use of a two-phase paradigm, in conjunction with collecting AS JOSs, enabled us to separate the effects of intuitions from those of deliberate solving. However, it remains to be determined whether participants use memory for their JOSs to regulate their solving attempts, and whether memory for JOSs also affects JOS predictiveness. In an interleaved paradigm, a solving attempt immediately follows each JOS, so memory for the JOS likely influences one’s problem-solving efforts. We are currently comparing these two paradigms.

Implications for Learners

Our results have some clear implications for learners. For instance, students taking timed tests need to learn how to regulate their time and effort strategically to maximize their performance. The ability to discriminate between questions they can versus cannot answer enables students to direct their effort toward solvable problems. However, our studies suggest that merely judging a problem to be solvable (an S JOS) was not predictive of later problem-solving success. Such judgments can also mislead effort regulation: reasoners take longer to abandon problems they judged to be solvable, especially under greater time pressure. An important direction for future research is to investigate how to train and optimize JOSs so that they appropriately shift effort and increase successful solving.

Conclusion

Our study establishes that meta-reasoning judgments about solvability are sensitive to whether a problem is actually solvable and can sometimes influence the subsequent regulation of problem-solving effort. We found that these judgments remained accurate when spontaneously solved items were excluded. Meta-reasoning research is still in its early stages, and our study highlights the need to measure solving during the JOS process, both because of its effects on measures of intuition and because of its effects on later problem-solving performance. Our findings also point to an interesting discrepancy regarding judgments of solvability, namely that they can be discriminating and yet not predictive of later solving. More research is needed to examine the generality of our findings as a function of the type of problems being solved and of the paradigm used to measure JOS discrimination and predictiveness.