The feeling of knowing (FOK) is a subjective state of confidence in the availability of information in memory, even when it cannot currently be accessed. Following Hart (1965), empirical studies of FOKs have asked individuals to attempt to recall information when prompted by a cue, followed by a rating of confidence in the FOK that scales the likelihood of being able to later recognize the sought-after information. In episodic memory experiments, the cue is usually information paired with the sought-after target during encoding (e.g., studying a face–name pair, then trying to recall the name when shown the face).

Introduction

Theoretical bases of FOKs

Metacognitive research largely concerns itself with identifying influences on the magnitude of metacognitive judgments (such as FOKs) and the predictive accuracy of those judgments for subsequent cognitive performance (see Dunlosky & Metcalfe, 2009, for an introductory review). Depending on how they are scaled, these two variables (FOK magnitude and FOK accuracy) are essentially independent of one another. FOK magnitude has been shown to be influenced by multiple variables, including the familiarity of the cue that is used to generate the FOK (e.g., Metcalfe, Schwartz, & Joaquim, 1993) and the retrieved information elicited by the cue (the accessibility hypothesis; Koriat, 1995; Koriat & Levy-Sadot, 2001). Predictive accuracy is traditionally defined by FOK resolution, that is, the within-person correlation of variation in FOKs for different items with recognition memory outcomes for those items. Typically, resolution is measured by ordinal Goodman–Kruskal gamma correlations of the FOKs with recognition accuracy, computed separately for each participant (Gonzalez & Nelson, 1996). [Footnote 1]
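As an illustration of how resolution is typically quantified, the following minimal sketch computes a Goodman–Kruskal gamma for a single participant from item-level FOKs and recognition outcomes. The function name and the example data are hypothetical and are not taken from the study; pairs tied on either variable are ignored, which is the standard definition of gamma.

```python
from itertools import combinations


def goodman_kruskal_gamma(foks, outcomes):
    """Return (C - D) / (C + D) over all item pairs, or None if undefined.

    C counts concordant pairs (higher FOK goes with the better outcome),
    D counts discordant pairs; pairs tied on either variable are skipped.
    """
    concordant = 0
    discordant = 0
    for (f1, o1), (f2, o2) in combinations(zip(foks, outcomes), 2):
        product = (f1 - f2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # gamma is undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data for one participant: FOK ratings (0-100) and
# recognition accuracy (1 = correct, 0 = incorrect) for four items.
foks = [10, 40, 80, 90]
recognized = [0, 1, 0, 1]
print(goodman_kruskal_gamma(foks, recognized))  # -> 0.5
```

In practice the function would be applied separately to each participant's unrecalled items, and the resulting gammas would then be averaged or tested against zero.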

The accessibility hypothesis on FOKs argues that they are influenced by the amount of information accessed, whether or not it derives from the originally encoded target. We adopted an alternative, accessibility-based perspective that individuals construct FOKs on the basis of weighting multiple cues, so that FOK magnitude and resolution depend on the accessed cues that are regarded by the rater as relevant to the criterion outcome. False memories for encoding contexts can influence FOKs and constrain FOK accuracy (Koriat, 1995). Conversely, experimental conditions that increase access to diagnostic cues (i.e., cues that derive from the encoded information in memory and signify later recognition success) will increase FOK resolution when raters base their FOKs upon them (e.g., Schacter & Worling, 1985).

Both FOK magnitude and FOK resolution are influenced by the quality of the original encoding (e.g., Lupker, Harbluk, & Patrick, 1991; T. O. Nelson, Leonesio, Shimamura, Landwehr, & Narens, 1982; Thomas, Bulevich, & Dubois, 2012). For example, both mean FOKs and FOK resolution are higher for items studied with multiple presentations, relative to items studied only once (Carroll & Nelson, 1993; Hertzog, Dunlosky, & Sinclair, 2010). Conversely, divided attention at the time of encoding impairs subsequent FOK resolution (Sacher, Taconnat, Souchay, & Isingrini, 2009).

The typical method of computing FOK resolution, correlating FOKs with recognition memory accuracy, merely contrasts FOKs for correctly recognized items against FOKs for incorrectly recognized items. Thus, it cannot evaluate whether gradations in FOK magnitude are associated with gradations in recollective experiences for correctly recognized items. Functionally, this limitation implies that most of the evidence regarding FOK resolution in the literature to date has implicitly concerned discriminating low from high FOKs, which could be driven primarily by what Liu, Su, Xu, and Chan (2007) described as the distinction between “definitely knowing that one doesn’t know” versus other FOK states. However, for items that were correctly recognized on the criterion test, Hicks and Marsh (2002) demonstrated that a remember–know judgment after each forced choice recognition item test correlated with FOKs. This finding showed that FOKs after failed recall tests forecast subsequent recollection experiences during a recognition test.

We have replicated this association of FOKs with remember–know judgments (MacLaverty & Hertzog, 2009) and extended it to confidence judgments for recognition test answers (henceforth, CJs; Hertzog, Dunlosky, & Sinclair, 2010). As with Hicks and Marsh’s (2002) findings with the remember–know procedure, this correlation was driven by variation in FOKs and CJs within the class of correctly recognized items alone (Eakin, Hertzog, & Harris, 2013; Hertzog et al., 2010), showing that above-chance FOK–CJ resolution cannot be produced by merely discriminating memory successes from memory errors. In fact, FOKs have no reliable correlation with CJs for items that are incorrectly recognized, consistent with the arguments that the FOK–CJ relationship is generated by the degree of the encoded cue–target relations that are recollected during the FOK judgment (when the target is absent) and that it is diagnostic for later recollective experiences at the time of the recognition test (see also Souchay, Moulin, Clarys, Taconnat, & Isingrini, 2007). Moreover, the effect is observed for different types of stimuli, including verbal paired associates and a face–name learning task, in which faces serve as the cues for recall and FOK (Eakin et al., 2013). Eakin et al. also showed that the FOK–CJ correlation for correctly recognized face–name pairs was observed for both episodic (previously unknown) and semantic (i.e., normatively famous) faces and names.

This pattern of effects for correctly recognized items validates FOK experiences beyond what can be obtained by the traditional means of discriminating recognition successes from recognition failures. More generally, above-chance FOK–CJ correlations are consistent with the view that the amount and quality of information accessed during an FOK-initiated retrieval search influence gradations in the FOKs (Hertzog et al., 2010; Koriat, 1995). The present study further establishes and clarifies the connections between FOK states, recognition accuracy, and recognition memory CJs.

Noncriterial recollection and strategy recall

The major goal of this study was to evaluate a hypothesis regarding the diagnostic cues that people can access in order to enhance FOK accuracy. The noncriterial-recollection hypothesis (Brewer, Marsh, Clark-Foos, & Meeks, 2010) is an accessibility view stipulating that FOKs are based in part on retrieving information about either the original encoding context or target features other than the criterion target itself (e.g., Parks, 2007). For example, the participant might recollect emotional reactions to the cue–target combination, or that the target reminded one of a past event, and access to such information is predicted to boost the FOK magnitude. Noncriterial recollection could influence FOK magnitude because access to contextual detail about encoding or about features of the target can occur even when people cannot recall the target itself (Cook, Marsh, & Hicks, 2006). Consistent with this hypothesis, Brewer et al. found that recollection of the source context or other item characteristics influences FOKs for unrecalled targets. Thomas, Bulevich, and Dubois (2011) showed that remembering the emotional valence of an unrecalled target increases both FOK magnitudes and FOK resolution. They also showed that explicit instructions to recall target valence prior to the FOK increased the FOK–memory correlation, suggesting that a controlled retrieval search is part of the process of making an accurate FOK.

Unlike in previous studies, we tested the noncriterial-recollection hypothesis for FOKs and FOK accuracy by focusing on retrieval of the encoding strategies that had been generated during study. In particular, during study, individuals were instructed to generate mediators for new associations between normatively unrelated nouns. Immediately after the cued-recall attempt, participants were then prompted to recall the mediator that they had originally generated during study. We hypothesized that recall of accurate detail about the original associative mediator would increase FOK magnitudes. Retrieving the original mediator, even when the target itself could not be accessed at the time of the FOK, was hypothesized to be a potent cue influencing FOKs in standard paired-associate tasks. Given that successful retrieval of the original mediator (vs. unsuccessful retrieval) is also related to memory for the sought-after target (Dunlosky, Hertzog, & Powell-Moman, 2005), we expected that this cue would also be diagnostic of subsequent recognition performance, and hence also boost FOK resolution.

To evaluate the noncriterial-recollection hypothesis, we directly estimated the relationship between mediator retrieval during cued recall and any subsequent FOKs and their resolution. We assessed strategy recall by using a mediator-report-and-recall method (Dunlosky et al., 2005) for verbal paired-associate (noun–noun) items. After studying each item, participants described the image (or other mediating strategy) that they had generated, if any. As with the previously cited source memory experiments (e.g., Brewer et al., 2010; Cook et al., 2006), when individuals could not recall the target during the cued-recall test, for some items they could still report access to aspects of the prior encoding operations, such as partial access to constructed encoding strategies that had been generated to form a new association between the paired words (Dunlosky et al., 2005; Hertzog, Fulton, Mandviwala, & Dunlosky, 2013). Although this is infrequent, target recall failures occur even when individuals are able to provide verbatim recall of their original encoding strategy. Target recall failures are more likely when only gist-consistent or partial descriptions of the mediator are accessed. Between-item variability in the recall of aspects of the original encoding strategies is therefore a candidate source of cues influencing FOKs and FOK resolution in associative memory tasks.

It is also plausible that access to aspects of original encoding would be considered useful information by persons making FOKs, especially when individuals are instructed to use mediational strategies to assist with associative learning. We expected that participants in this study would be likely to deem successful retrieval of original encoding outcomes as being diagnostic of future recognition memory success, leading them to use that information when making FOK judgments.

We also experimentally manipulated two variables that were likely to influence the quality of the associative encoding on the basis of strategy use: item concreteness and repetitions. We instructed individuals to use interactive imagery to study normatively unrelated verbal paired-associate items (either concrete–concrete [e.g., TICK–SPOON] or abstract–abstract [e.g., LIBERTY–PASSION] items). It is more difficult to generate and retrieve imagery mediators for abstract pairs, because imageable tokens must be generated for each abstract concept (Paivio, 2007; Yuille, 1973). Using imagery for abstract pairs is therefore less likely to lead to successful associative recall, in part due to reduced access to the mediator during the test (Hertzog et al., 2013). Items were presented either once or three times, given that this manipulation influences memory, FOK magnitudes, and FOK accuracy (Hertzog, Dunlosky, & Sinclair, 2010; T. O. Nelson et al., 1982). Cook et al. (2006) also demonstrated that repeated presentations increase the likelihood of source recollection in the absence of target recall.

After a one-week delay following the original encoding (to bring recognition memory performance for thrice-presented items away from ceiling; see T. O. Nelson et al., 1982), participants returned to the lab for the recall test. They were cued with one word from a pair (e.g., TICK) and asked to recall its associate. We also asked them to provide FOKs and to report what they could remember about the mediator that they had generated during encoding.

Research hypotheses

The critical questions for this experiment concerned the relations of strategy recall to FOKs and CJs. Our test of the noncriterial-recollection hypothesis stipulated three effects regarding prediction of recognition memory performance by FOKs: (1) Remembering the original mediator, in whole or in part (which we shall refer to as strategy recall), will increase FOK magnitudes relative to trials in which nothing about the mediator can be recalled; (2) strategy recall will predict recognition memory for unrecalled items; and (3) strategy recall will statistically account for, or mediate (MacKinnon, 2008), the relationship of FOKs to recognition memory for unrecalled items. To foreshadow our results, this experiment shows that manipulating item concreteness and repetition affects noncriterial access to the original encoding strategies, which in turn influences FOK magnitudes and accounts for the prediction of recognition memory by FOKs.

With respect to CJs, the hypotheses of interest were that (4) FOKs would predict CJs for correctly recognized items; (5) strategy recall would also predict CJs for those items; and (6) strategy recall would account for the relationship of FOKs to CJs. However, an alternative possibility was that additional (unmeasured) cues besides strategy recall were accessed when making FOKs, so that both FOKs and strategy recall would independently predict correct-recognition CJs.

Statistical approach

We tested these hypotheses by using multilevel regression models to evaluate simultaneously the influences of multiple cues on FOKs, recognition memory accuracy, and CJs. This statistical procedure has been successfully employed to evaluate multiple variables’ influences on judgments of learning (e.g., Hertzog, Sinclair, & Dunlosky, 2010; Hines, Touron, & Hertzog, 2009). For instance, Tauber and Rhodes (2012) used multilevel regression to show that a memory-for-past-test heuristic is only one of multiple influences on multitrial judgments of learning (see also Hertzog, Hines, & Touron, 2013).

This approach has three major advantages. First, it generates regression models that estimate the magnitudes of influence of multiple cues on metacognitive judgments, including the proper standard errors of estimation for these effects. Second, one can evaluate whether a cue (such as recollection of the original encoding strategies) statistically mediates the relation of other cues and experimentally manipulated variables to metacognitive judgments and to memory outcomes. [Footnote 2] We used multilevel regression to test whether the effects of FOKs on recognition memory and CJs would be statistically independent of recall of encoding strategies at the time of the cued-recall test. This approach can be used to falsify the hypothesis that recall of the original encoding outcomes is a sufficient explanation of both FOKs and their predictive validity (for either recognition memory or CJs), in favor of the alternative hypothesis that multiple influences, including recalled encoding strategies, have effects on FOKs. [Footnote 3]

Third, the fact that FOKs are evaluated for unrecalled items implies that each person has in principle a different set of unrecalled items that remain for further analysis of FOK magnitudes and FOK accuracy. In the present experiment, participants would also fail to generate a mediator for some items, further reducing the available item pool on an idiosyncratic basis. The possible biasing influences of residual item sets are typically ignored in metacognitive research; multilevel models that use item as an explicit factor in the analysis help to control for these differences and ensure that the predictors of FOKs are not artifacts of which items survive the screening criteria.

Method

Participants

Undergraduate students at Kent State University and the Georgia Institute of Technology received course credit for participating in the study. In all, 45 young adults were included in the analyses. A total of 69 students were recruited for the study, of whom 15 did not return for the second session and nine did not recall enough mediators (5 % minimum) to be included in the analysis.

Materials

A list of 80 noun pairs, 40 concrete and 40 abstract (see the online supplemental materials, Appendix A), was chosen from the University of South Florida Free Association Norms (D. L. Nelson, McEvoy, & Schreiber, 1998) and the MRC Psycholinguistic Database (Fearnley, 1997) and verified with the ListCheck Pro 1.2 program (Eakin, 2010).

Design

The experiment had a 2 (Concreteness: concrete, abstract) × 2 (Presentation: one, three) within-subjects design.

Procedure

Items were presented in a random order at study, either once or three times (under the constraint that an item could not be presented twice in a row), for 30 s each. The instructions acknowledged that multiple encoding strategies exist, but participants were instructed to generate an interactive image if possible. Participants practiced using interactive imagery with three concrete and three abstract word pairs. Then they were presented with the experimental list and were prompted to give an oral description of the imagery mediator, which was digitally recorded, after each item studied. After a seven-day delay, participants returned to the lab and went through the main task, which included a phase of cued recall, FOK, and encoding strategy report, followed by a recognition memory phase, all of which were self-paced. During cued recall, individuals typed in the associated target words after being shown the cues, which were presented in a random order. They then were again shown the cue and provided an FOK on a 0 %–100 % confidence scale. After the FOK, they were prompted to report anything that they could recall about the strategy that they had generated at study. Target recall was scored as correct if the first three letters of the typed response were correct. This method is fast and automated, yet it has high convergent validity with other measures of recall, such as coded oral recall protocols (e.g., Dunlosky et al., 2005).
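The automated scoring rule for target recall can be sketched as follows. This is a minimal illustration rather than the scoring program used in the study: the function name is hypothetical, and the case-insensitive comparison and whitespace trimming are assumptions that the procedure does not specify.

```python
def score_recall(response, target, n_letters=3):
    """Return True if the first n_letters of the typed response match the target.

    Assumptions (not stated in the procedure): matching ignores letter case
    and leading/trailing whitespace, and responses shorter than n_letters
    are scored as incorrect.
    """
    typed = response.strip().lower()
    expected = target.strip().lower()
    return len(typed) >= n_letters and typed[:n_letters] == expected[:n_letters]


# Hypothetical responses to the cue TICK, whose target is SPOON.
print(score_recall("spon", "SPOON"))  # misspelled but scored correct
print(score_recall("fork", "SPOON"))  # scored incorrect
```

A rule of this kind tolerates spelling errors late in the word, which is presumably why it converges well with hand-coded recall protocols.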

After participants had completed cued recall, FOKs, and strategy recall reports for all of the items, they were given a four-alternative forced choice recognition test, in which the cue was presented with its target and three randomly selected targets from other pairs, under the constraint that each target was used equally often as a recognition lure. After each recognition test probe, individuals rated their confidence in the correctness of their selection on a 0 %–100 % scale. The FOK and CJ procedures were modeled after those of Hertzog, Dunlosky, and Sinclair (2010), which can be consulted for additional procedural details.
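The article does not describe how the equal-use constraint on lures was implemented. One simple construction that satisfies it, offered here only as an assumption-laden sketch, is to shuffle the targets once and take each item's lures from the next three positions in the shuffled order (wrapping around). The function name and the example words are hypothetical, and targets are assumed to be unique.

```python
import random


def assign_lures(targets, n_lures=3, seed=None):
    """Assign n_lures foils to each target so every target is a lure equally often.

    Targets are shuffled once; the lures for the item at position i are the
    targets at positions i+1 ... i+n_lures (modulo the list length). Each
    target therefore serves as a lure exactly n_lures times and never
    appears as a lure on its own trial.
    """
    order = list(targets)
    if len(order) <= n_lures:
        raise ValueError("need more targets than lures per trial")
    random.Random(seed).shuffle(order)
    n = len(order)
    return {
        target: [order[(i + k) % n] for k in range(1, n_lures + 1)]
        for i, target in enumerate(order)
    }


# Hypothetical target words; the real list contained 80 pairs.
lures = assign_lures(["SPOON", "PASSION", "RIVER", "DOCTRINE", "CANDLE"], seed=1)
for target, foils in lures.items():
    print(target, foils)
```

This construction trades some randomness for a guaranteed balance: lure sets are determined by a single random permutation rather than sampled independently on each trial.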

Strategy recall was obtained by matching the oral descriptions at study and test, coding for no mediator at study, verbatim recall, gist recall, partial recall, commission errors, and omission errors (see Dunlosky et al., 2005, for more details). A summary of the coding scheme is available (see the online supplemental materials, Appendix B). For purposes of this study, we mapped encoding strategy outcomes on an ordinal scale from the highest fidelity of description recall to the lowest: 4 = verbatim recall, 3 = gist recall, 2 = partial recall, 1 = omission errors or commission errors. Treatment of commission errors as low strategy recall is the most defensible scaling of recall outcomes, although it could limit FOK–strategy relations because (1) commission errors in target recall are often accompanied by high FOKs (Krinsky & Nelson, 1985) and (2) commission errors for encoding strategies could be regarded by participants as accurately recalled details about the original encoding. As such, this scaling of strategy recall might dilute somewhat the connection between perceived recollection of the original encoding outcomes and FOKs.
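The ordinal scaling just described amounts to a simple lookup from coded outcome to score. In the sketch below, the category labels are hypothetical names chosen for illustration (the actual coding scheme is summarized in Appendix B of the supplemental materials), and items with no mediator at study are returned as None on the assumption that they are excluded from analysis.

```python
# Ordinal scores for strategy recall outcomes, highest fidelity first.
# The string labels are illustrative, not the coders' actual category names.
STRATEGY_RECALL_SCORE = {
    "verbatim": 4,
    "gist": 3,
    "partial": 2,
    "omission": 1,
    "commission": 1,  # commission errors are treated as low strategy recall
}


def score_strategy_recall(code):
    """Map a coded outcome to its ordinal score, or None if no mediator was generated."""
    if code == "no_mediator":
        return None  # assumed to be excluded rather than scored
    return STRATEGY_RECALL_SCORE[code]


coded_items = ["verbatim", "omission", "no_mediator", "partial", "commission"]
print([score_strategy_recall(code) for code in coded_items])  # -> [4, 1, None, 2, 1]
```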

Statistical methods

We used SAS PROC GLIMMIX (SAS Institute, 2008) to analyze the dependent variables in a generalized mixed model (Littell, Milliken, Stroup, & Wolfinger, 2000). For the categorical dependent variable of associative recognition success, a logit link function was employed. For other variables, a Gaussian (normal) distribution with an identity link was used. In these analyses, individual items (nested within the concreteness independent variable) were modeled as having specific effects on the dependent variables. Hence, any significant effects of concreteness, repetition, and mediator recall statistically control for item-specific influences on the dependent variable. In addition to the usual homoscedastic residual error variance, we also modeled a random effect for (person) intercepts (individual differences), retaining the parameter if it was reliably different from zero. An alpha level of .05 was used for all significance tests. To aid in interpreting the results, we computed an effect size difference in the fitted marginal means, where applicable, using Cohen’s (1988) d statistic, which scales mean differences in error standard deviation units (pooled intercept and residual variance). Cohen’s benchmarks for large, medium, and small effects are 0.8, 0.5, and 0.2, respectively.
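The effect size computation can be sketched as follows. The sketch assumes that "pooled intercept and residual variance" means the sum of the two variance components, which is one plausible reading; the function name and the numerical values are hypothetical and are not estimates from this study.

```python
import math


def cohens_d(mean_a, mean_b, intercept_var, residual_var):
    """Scale a difference in fitted marginal means by the error SD.

    Assumption: the error SD is the square root of the summed random
    intercept variance and residual variance from the mixed model.
    """
    error_sd = math.sqrt(intercept_var + residual_var)
    return (mean_a - mean_b) / error_sd


# Hypothetical values: marginal means of 60 and 50 on the 0-100 FOK scale,
# intercept variance of 100, and residual variance of 300 (error SD = 20).
print(cohens_d(60.0, 50.0, 100.0, 300.0))  # -> 0.5
```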

We also estimated multilevel structural regression models in the Mplus 7.0 program (Muthén & Muthén, 1998–2007). This approach allowed us to accomplish two additional aims. First, we were able to estimate direct (partial regression coefficients), indirect (effects of one variable on another mediated by an intervening variable), and total (the sum of the direct and indirect effects; see Cheong & MacKinnon, 2012) effects, and to get standard errors (and significance tests) for the indirect effects. This feature made it possible to address questions about the degree to which strategy recall mediated the effects of such independent variables as repetition and concreteness on FOKs. Second, Mplus produces standardized regression estimates for both the within-person (item) and between-person (person) levels of the multilevel model. Standardization in Mplus is achieved by partitioning the total covariance matrix into within-person and between-person submatrices, and then rescaling the regression coefficients with the appropriate estimates of the variables’ standard deviations (SDs). For item-level regression coefficients, the rescaling is done by means of the associated ratio of item-level SDs (i.e., β × SD_x(w) / SD_y(w), where β is the estimated regression coefficient, SD_x(w) is the estimated within-person SD of the predictor, and SD_y(w) is the within-person SD of the criterion). For between-person regression coefficients, rescaling is done by the analogous ratio of between-person standard deviations. This feature allowed us to evaluate the relative magnitudes of the effects of different variables on metacognitive judgments (FOKs and CJs).
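For reference, the rescaling and the effect decomposition described in this paragraph can be written compactly. The between-person expression simply mirrors the within-person one, as the text indicates. The symbols a, b, and c′ are introduced here for illustration and assume the simplest case of a single mediator, in which the indirect effect is the product of the path from the independent variable to the mediator (a) and the path from the mediator to the outcome (b); c′ denotes the direct effect.

```latex
% Standardized coefficients at the within-person (w) and between-person (b) levels
\beta^{\mathrm{std}}_{w} = \beta_{w}\,\frac{SD_{x(w)}}{SD_{y(w)}},
\qquad
\beta^{\mathrm{std}}_{b} = \beta_{b}\,\frac{SD_{x(b)}}{SD_{y(b)}}

% Decomposition of effects for a single mediator (illustrative notation)
\text{indirect} = a \cdot b,
\qquad
\text{total} = c' + a \cdot b
```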

Results and discussion

The target recall results were fully consistent with those of earlier studies (see the online supplemental materials, Appendix C, Table 1), showing greater recall for concrete (vs. abstract) items and for three (vs. one) repetitions (e.g., Dunlosky et al., 2005; Hertzog et al., 2010; Hertzog et al., 2013). FOK states are defined as confidence that a target that cannot be accessed is available in memory and will be recognized later. Hence, as is traditional in this area of research, the analyses that we report all exclude trials resulting in successful target recall (on average, targets were recalled on 28 % of the trials), analyzing only data for trials in which targets were not recalled. [Footnote 4] We also excluded items for which individuals did not report generating a mediator. Consistent with our earlier work (Hertzog et al., 2013), successful mediator production was relatively common. The mean proportion of items generating mediator descriptions was .95 (SD = .07), with the values ranging from .63 to 1.0 across all participants.

Table 1 reports strategy recall, in terms of the average level, and recognition accuracy, as proportions correct, as a function of concreteness and repetition. For archival purposes, we also report the mean FOKs and mean CJs and their SDs in this table. Note that the low mean levels of strategy recall reflect the fact that the modal outcome for unrecalled items was an omission or commission error (Dunlosky et al., 2005; Hertzog et al., 2013) for the generated mediator (M = .80, SD = .12). Nevertheless, verbatim or gist recall of the original encoding strategy (M = .10, SD = .08) still occurred following target recall failures.

Table 1 Mean feeling-of-knowing judgments (FOKs), strategy recall, recognition, and confidence judgments as a function of concreteness and repetition

It would be typical in the metacognitive literature to use aggregated person-level means for the variables reported in Table 1 as dependent measures; for example, by analyzing each person’s proportion correct in the associative recognition task. We forgo this approach because of our use of multilevel models for each variable, using item-level data.

The use of item-level data for recognition memory ran into the problem that six items were correctly recognized by all participants, and thus had to be deleted from the analysis of recognition memory success in order to obtain converging multilevel regression solutions. To preserve comparability of the results across the different dependent variables, we deleted the data for these six items from all of the multilevel regression analyses reported in this article, including the ones analyzing FOKs and CJs.

Strategy recall

We begin with an analysis of strategy recall, because this variable is central to most of the major predictions about FOKs that we described in the Research Hypotheses section. We expected that the likelihood of recalling properties of the mediators (i.e., strategy recall) would be influenced by the independent variables of concreteness and repetition. The generalized mixed model predicting the strategy recall variable (see Table 2, columns 2 and 3) showed that concreteness and repetition both influenced strategy recall, controlling for the significant specific item effects (i.e., some items afforded more memorable encodings than others). Table 3 reports the random effects for each model. The first data row of Table 3 reports the unconditioned model (estimating only a person intercept and residual variance, without any experimental effects); reductions in residual variance in the models that included independent variables, relative to this baseline, enabled us to compute a pseudo-R² statistic (Snijders & Bosker, 1999). The full regression model included a residual variance and a significant random effect for intercepts, indicating reliable individual differences in average levels of strategy recall. The fixed effects for concreteness, repetition, and their interaction accounted for about 56 % of the variance in strategy recall; after including the intercept variance, the model accounted for 71 % of the variance in strategy recall.
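The exact pseudo-R² computation is not spelled out in the text. One common form, the level-1 proportional reduction in unexplained variance described by Snijders and Bosker, compares the summed intercept and residual variances of a fitted model with those of the unconditioned model. The sketch below illustrates that form only; the function name and the variance values are hypothetical, and the calculation used for the percentages reported here may differ.

```python
def pseudo_r2(null_intercept_var, null_residual_var,
              model_intercept_var, model_residual_var):
    """Proportional reduction in (intercept + residual) variance vs. the null model."""
    null_total = null_intercept_var + null_residual_var
    model_total = model_intercept_var + model_residual_var
    return 1.0 - model_total / null_total


# Hypothetical variance components (not taken from Table 3):
# unconditioned model: intercept 200, residual 600 (total 800)
# fitted model:        intercept 180, residual 500 (total 680)
print(round(pseudo_r2(200.0, 600.0, 180.0, 500.0), 3))  # -> 0.15
```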

Table 2 F tests for the fixed effects of items, concreteness, and repetition on strategy recall and FOK magnitudes from the mixed-model analyses
Table 3 Random variance components for the mixed models predicting strategy recall and feeling-of-knowing (FOK) magnitude

Figure 1 shows the corresponding marginal means and standard errors for the strategy recall variable. Recalling something about the encoding strategy was far more likely for items presented three times (M = 1.72, SE = 0.04) than for items presented once (M = 1.15, SE = 0.03), d = 0.75, a medium-to-large effect. In terms of odds ratios, strategy recall success (attaining either verbatim or gist recall of originally encoded mediators) was three times more likely when items were presented thrice instead of once. Recall of encoding outcomes was also on average more likely for concrete items (M = 1.51, SE = 0.04) than for abstract items (M = 1.36, SE = 0.03), d = 0.20. The Concreteness × Repetition interaction was also reliable (see Fig. 1), indicating that repetition effects were larger for concrete items, d = 0.89, than for abstract items, d = 0.51.

Fig. 1 Effects of concreteness and repetition on strategy recall. The error bars represent standard errors of the means

In general, then, recall of the original encoding strategies for unrecalled targets occurred, varied within persons, and was influenced by independent variables that have been shown in other studies to influence FOKs. Thus, the quality of strategy recall is a candidate variable to explain variation in FOKs for unrecalled items.

FOK magnitude

Before evaluating the main research hypotheses pertaining to FOKs and strategy recall, we first analyzed FOKs for unrecalled items without reference to encoding outcomes. The mixed-model results (Table 2, Model 2) showed reliable effects of items, concreteness, and repetition, along with a Concreteness × Repetition interaction. FOKs were therefore sensitive to the independent variables (see Fig. 2). Concreteness on average generated a small effect, d = 0.18, whereas repetition generated a medium-sized effect, d = 0.67. The reliable interaction reflected larger repetition effects on FOKs for concrete items, d = 0.80, than for abstract items, d = 0.53. The model also included a random effect of FOK intercepts, reflecting individual differences in the mean FOKs (Table 3, Model 2). The pseudo-R² indicated that the experimental factors (items, concreteness, and repetition) accounted for about 11 % of the total variance in FOKs. Including the random intercept variance, the model accounted for about half of the variance in FOKs, showing that individual differences in the mean FOKs were a substantial source of FOK variance.

Fig. 2 Effects of concreteness and repetition on FOK magnitude. The error bars represent standard errors of the means

To evaluate the main hypotheses, a critical next step was to consider the contribution of strategy recall to FOKs. In particular, the noncriterial-recollection hypothesis stipulates that strategy recall will have a strong relationship to FOKs. As a preliminary step, we computed the average FOK at each level of strategy recall (see Fig. 3). This plot suggested a strong relationship between the two variables. The biggest difference in FOKs was between strategy recall errors (omissions and commissions) and some level of mediator recall, which is consistent with the prediction from our first hypothesis. The plot indicated little distinction between gist and verbatim recall of encoding strategies in their effects on FOKs. This pattern is not necessarily unexpected for imagery mediators: a retrieved image might be equally likely to generate a gist or a verbatim verbal description at the time of cued recall (Hertzog et al., 2013), with either recollective experience generating relatively high FOKs. Nevertheless, we opted to continue to use the four-level graded strategy recall variable in the further analyses of strategy recall–FOK relationships.

Fig. 3 Effects of strategy recall on FOK magnitude. The error bars represent standard errors of the means

We then added the graded strategy recall variable to the mixed model predicting FOKs (Table 2, Model 3). The model included two variables capturing different aspects of graded strategy recall: a person-centered variable measuring within-person variation in strategy recall for different items (i.e., item-to-item variability in strategy recall for a given person), and a grand-mean-centered variable that captured between-person variation in each person’s mean level of strategy recall. These two variables, reflecting between-person and within-person sources of item variance in strategy recall, are statistically independent (see Singer, 1998).
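The two centered predictors can be constructed as in the following sketch. The function name and the toy data are hypothetical, and the grand mean is taken here as the mean of the person means, which is an assumption (it could also be computed over all observations). Because the within-person deviations sum to zero for every person, they carry no information about the person means, which is why the two predictors are uncorrelated.

```python
from collections import defaultdict
from statistics import mean


def center_strategy_recall(rows):
    """Split each strategy recall score into within- and between-person parts.

    rows: iterable of (person_id, score) pairs, one per item.
    Returns a list of (person_id, within, between) tuples, where
      within  = score - that person's mean (person-centered), and
      between = that person's mean - grand mean (grand-mean-centered).
    """
    rows = list(rows)
    scores_by_person = defaultdict(list)
    for person, score in rows:
        scores_by_person[person].append(score)
    person_means = {p: mean(s) for p, s in scores_by_person.items()}
    grand_mean = mean(list(person_means.values()))  # assumption: mean of person means
    return [
        (person, score - person_means[person], person_means[person] - grand_mean)
        for person, score in rows
    ]


# Toy data: two participants, two unrecalled items each (scores on the 1-4 scale).
data = [("p1", 1), ("p1", 3), ("p2", 2), ("p2", 4)]
for row in center_strategy_recall(data):
    print(row)
```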

We initially included all higher-order interaction terms with the two strategy recall variables, but then trimmed nonsignificant effects in the reported final model. As compared to Model 2, Model 3 increased R² by 13 % by adding the fixed effects associated with strategy recall (see Table 3, Model 3). The model revealed a robust effect of item-level strategy recall on FOKs, β = 19.7, SE = 1.1. Within an individual, an increase in the level of strategy recall (e.g., from strategy recall failure to partial mediator recall) increased FOK confidence by about 20 %. This effect was moderated by concreteness. Figure 4 shows that the fitted linear effects for strategy recall were stronger for concrete items than for abstract items.

Fig. 4

Fitted regression lines for the interactive effects of strategy recall and concreteness on feeling-of-knowing (FOK) magnitudes

In contrast, person-level effects of strategy recall were not statistically significant, indicating that individual differences in mean levels of strategy recall did not greatly influence individual differences in mean FOKs. Overall, these results indicated that the within-person variability in strategy recall across items was a more important influence on FOKs than were the between-person differences in strategy recall. In sum, the signature feature of these results was a very large effect of item-level variation in strategy recall on FOKs for unrecalled items in all experimental conditions, with a magnified effect size for concrete items. This outcome verified a key prediction of the noncriterial-recollection hypothesis.

Including mediator recall in the model reduced residual variance, and therefore increased statistical power. Nevertheless, the F tests for the repetition and concreteness main effects were reduced in magnitude as compared to Model 2, and the Concreteness × Repetition interaction was eliminated. The fitted marginal mean difference in FOK confidence between concrete and abstract items was only 2.4 % (d = 0.09) when strategy recall was included in the model, as compared to a 14 % (d = 0.18) difference when it was not in the model. Likewise, the repetition effect on FOK magnitudes, controlling for strategy recall, was reduced to 9.2 %, d = 0.34, as compared to the previous effect, d = 0.67, when strategy recall variables were not part of the model. It appeared that strategy recall statistically mediated some of the effects of concreteness and repetition on FOKs.

This inference was supported by a structural regression model with estimated indirect effects run in the Mplus program. The estimated standardized direct effect of strategy recall on FOKs was .50, larger than the standardized direct effects of concreteness (.04) and repetition (.17). The indirect effects of concreteness and repetition mediated by encoding strategy recall were .04 and .19, respectively, both of which were reliably greater than zero, p < .05. Thus, about half of the total effect of each independent variable on FOKs was mediated by encoding strategy recall.
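The "about half" statement can be checked from the reported standardized coefficients. The sketch below assumes a single-mediator decomposition in which the total effect is the sum of the direct and indirect effects; the function name is hypothetical, and the calculation is a back-of-the-envelope check rather than a reproduction of the Mplus output.

```python
def proportion_mediated(direct, indirect):
    """Share of the total effect carried by the indirect path.

    Assumes total effect = direct effect + indirect effect.
    """
    total = direct + indirect
    return indirect / total

# Standardized effects reported in the text.
concreteness_share = proportion_mediated(direct=0.04, indirect=0.04)
repetition_share = proportion_mediated(direct=0.17, indirect=0.19)

print(round(concreteness_share, 2))  # 0.5
print(round(repetition_share, 2))    # 0.53
```

Under this assumption, strategy recall carries 50 % of the concreteness effect and roughly 53 % of the repetition effect.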

These outcomes support the noncriterial-recollection hypothesis, showing that FOKs in episodic memory tasks are strongly influenced by access to the outcomes of the encoding operations carried out one week earlier. The effect of strategy recall on the FOKs found in this study appears to be larger than the FOK-related effects found in studies that have used the accessibility of ancillary encoding-context features (Brewer et al., 2010) or the accessibility of a single manipulated target feature (e.g., emotional valence; Thomas et al., 2011) to evaluate the noncriterial-recollection hypothesis. We speculated that participants in this experiment routinely regarded recovered detail about the original encoding experience as being diagnostic of later target recognition and often based their FOKs on this source of information.

Associative recognition accuracy

Table 4 (columns 2 and 3, Model 1) reports the F tests from the SAS PROC GLIMMIX analysis of recognition memory success (for previously unrecalled items only), after logit transformation of that binary dependent variable. Controlling for significant item differences in recognition memory success, reliable main effects of concreteness and repetition emerged, as well as a reliable Concreteness × Repetition interaction. On average, concrete items were more likely to be correctly recognized than were abstract items, thrice-presented items were more likely to be correctly recognized than were once-presented items, and the latter effect was larger for concrete than for abstract items. Table 5 (Model 1) reports the estimated random effects for this model.

Table 4 F tests for the generalized mixed models using item, concreteness, repetition, and strategy recall to predict logit-transformed recognition accuracy
Table 5 Random variance components for the generalized mixed models predicting recognition accuracy

FOK–recognition accuracy relationships

It is traditional to evaluate FOK resolution with respect to recognition accuracy by computing ordinal within-person gamma correlations and analyzing them as the dependent variable. As expected, repetition did affect gamma correlations (see the online supplemental materials, Appendix D, Table 4). We focus, however, on the use of multilevel regression models in SAS PROC GLIMMIX with logit-transformed recognition accuracy as the dependent variable because of its advantages for evaluating the linkage of strategy recall to FOK accuracy.
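For readers unfamiliar with the traditional resolution measure, the following minimal Python sketch computes a Goodman–Kruskal gamma for a single participant. The function name and the example ratings are hypothetical and are not data from this study; returning None when every pair is tied is one convention, assumed here.

```python
def goodman_kruskal_gamma(fok, recognized):
    """Ordinal association between FOK ratings and recognition outcomes.

    fok: list of FOK ratings, one per item.
    recognized: list of 0/1 recognition outcomes for the same items.
    Pairs tied on either variable are ignored, as in the standard
    definition: gamma = (C - D) / (C + D).
    """
    concordant = discordant = 0
    n = len(fok)
    for i in range(n):
        for j in range(i + 1, n):
            product = (fok[i] - fok[j]) * (recognized[i] - recognized[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return None  # gamma is undefined when all pairs are tied
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ratings for one participant (five unrecalled items).
fok = [20, 80, 50, 90, 60]
recognized = [0, 1, 1, 1, 0]
gamma = goodman_kruskal_gamma(fok, recognized)  # (5 - 1) / (5 + 1)
```

In this example, five of the six untied item pairs are ordered consistently with the recognition outcome and one is not, giving a gamma of about .67. The statistic is computed once per participant, which is why it cannot by itself separate item-level from person-level influences.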

We started by adding FOKs to the model already reported. Our earlier analysis with FOKs as the dependent variable had shown reliable random effects in intercepts (individual differences in mean FOKs), so it was important to isolate the item-level and person-level FOK effects on recognition accuracy. We again used person-centered and grand-mean-centered FOK variables to accomplish this partition.

The initial analysis included all higher-order interactions involving both FOK variables (e.g., Concreteness × Item-Level FOKs), but none of these interactions were statistically significant, so they were trimmed from the model. Table 4, Model 2, reports the F tests for the effects remaining in the trimmed model. Table 5 reports the estimated random effects. We found reliable effects of item-level variation in FOKs on recognition accuracy, consistent with the gamma correlations. Higher FOKs were associated with higher likelihoods of recognition memory accuracy, β = 0.008, SE = 0.003. In contrast, mean FOKs did not reliably predict individual differences in recognition memory.

The next step was to add strategy recall to the model. Again, we entered item-level strategy recall, person-level strategy recall, and all associated interactions into the model. None of the interactions were statistically significant. Table 4, Model 3, reports the F tests for fixed effects in the trimmed model. Note that reliable effects of both encoding recall variables on recognition memory success were apparent. Within an individual, items for which aspects of the original encoding could be recalled were more likely to be recognized than were items generating less retrieved detail, β = 0.48, SE = 0.14. Between individuals, persons with higher levels of mediator recall were more likely to successfully recognize items that they had not previously recalled, β = 2.40, SE = 0.73. This finding corroborates our second hypothesis, demonstrating a substantial relationship between strategy recall and the recognition of previously unrecalled items. Hence, strategy recall is a diagnostic cue that could account for FOK accuracy.
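Because these coefficients are on the logit scale, their size is easier to judge as odds ratios. The sketch below converts the reported item-level coefficients; it assumes that FOKs were rated on a 0 to 100 scale, that strategy recall was coded in one-unit steps, and that each coefficient refers to a one-unit change in its predictor. These scaling assumptions are ours and should be checked against the Method section.

```python
import math

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of correct recognition
    for a change of `delta` units in a predictor with logit slope `beta`."""
    return math.exp(beta * delta)

# Item-level FOK effect (beta = 0.008), for a 10-point increase in FOK.
fok_or = odds_ratio(0.008, delta=10)

# Item-level strategy recall effect (beta = 0.48), for a one-level increase.
strategy_or = odds_ratio(0.48)

print(round(fok_or, 2))       # 1.08
print(round(strategy_or, 2))  # 1.62
```

Under these assumptions, a 10-point increase in FOK corresponds to roughly 8 % higher odds of correct recognition, whereas a one-level increase in strategy recall corresponds to roughly 62 % higher odds.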

Controlling the item-level strategy recall variable completely eliminated the significant effect of item-level FOKs on item recognition memory. This important outcome verifies our third hypothesis, suggesting that noncriterial recollection of the original encoding details fully mediated the predictive accuracy of FOKs for recognition memory. To further evaluate this hypothesis, we ran an Mplus model using strategy recall as the mediator of FOKs’ relationship to recognition memory success. Whereas the direct effect of FOKs on recognition memory just missed significance when controlling for strategy recall (standardized effect = .05, SE = .03, p = .06), the standardized indirect effect (.03, SE = .01) mediated by strategy recall was statistically significant, p < .05. The standardized .08 total effect of FOKs on recognition accuracy (SE = .02) was reliably greater than zero, p < .05.Footnote 5

An interesting interpretational twist on these analyses is that one can also argue that FOKs, although influenced by encoding recall, do not fully capture the potential of strategy recall as a cue for generating accurate FOKs, given that strategy recall predicted recognition memory independently of FOKs. This outcome suggests that participants’ reliance on this cue to make FOKs was inconsistent across trials, and highlights the idea that this type of metacognitive monitoring potentially could be enhanced by improving attention to the available diagnostic cues.

In sum, then, the predictive accuracy of FOKs for recognition memory success appears to be generated in large part by strategy recall, consistent with the noncriterial-recollection hypothesis (Brewer et al., 2010). However, FOKs failed to benefit fully from the available cues of strategy recall, repetition, and concreteness, all of which predicted recognition success independently of FOKs.

FOK–CJ relationships

To address our second group of hypotheses, pertaining to CJs, we next evaluated the predictive validity of FOKs for recollective experiences at the time of the recognition memory test, using correct associative recognition trials only. First, as predicted, the FOK–CJ gamma correlations were reliably greater than zero (see the online supplemental materials, Appendix E, Table 5). We were interested, however, in hypotheses about strategy recall and the FOK–CJ relationships that cannot be assessed with these gamma correlations. Specifically, we hypothesized that strategy recall would also predict recollection during the forced choice recognition test, and, given the relationship of strategy recall to FOKs, would therefore mediate, at least in part, the prediction of recognition memory CJs by FOKs.

The first model (Table 6, Model 1) simply included the item effects and the two experimentally manipulated variables, concreteness and repetition. We detected significant random effects in intercepts, indicating substantial individual differences in the mean CJs (see Table 7). The main effects of both independent variables were robust, and their interaction just missed statistical significance. Concrete items led to higher levels of confidence in recognition decisions (M = 88.5, SE = 2.0) than did abstract items (M = 76.0, SE = 2.0), d = 0.49, and thrice-presented items led to higher confidence (M = 92.2, SE = 2.1) than did once-presented items (M = 72.3, SE = 2.0), d = 0.78. The trend for an interaction reflected larger repetition benefits on confidence for abstract items, d = 0.86, than for concrete items, d = 0.71.

Table 6 F tests for mixed models using item, concreteness, repetition, and strategy recall to predict confidence judgments for correctly recognized items
Table 7 Random variance components and coefficients of determination for the mixed models with confidence judgments

We interpreted gradations in the CJs for correctly recognized items as reflecting the degree of recollective experience at the time of the forced choice recognition test, including recollective support for recall-to-reject processes (e.g., Cohn & Moscovitch, 2007; Gallo, Bell, Beier, & Schacter, 2006; Yonelinas, 2001). Given that the targets were previously unrecalled for the items included in these analyses, recall-to-reject in this context would most likely reflect a process by which recollective detail of the encoding context was first triggered when the foils (incorrect alternatives) were presented during the forced choice recognition test. Thus, unlike in a yes/no recognition task, recollection during the forced choice task would include both recollection of the original cue–target encoding and recollection triggered by recall of the foils and their originally paired cues.

Next, we entered item-level and person-level FOKs and the associated interactions into the model. The analysis detected a reliable effect of item-level FOKs on CJs, qualified by interactions of the item-level FOKs with repetition and concreteness. Similarly, we found a significant person-level FOK effect on CJs, qualified by an interaction of the person-level FOKs with repetition (see Table 6, Model 2).

To help clarify the repetition-related interactions, we ran multilevel models separately for the once-presented and thrice-presented items. Item-level FOKs had a larger regression coefficient for once-presented items, relative to items presented three times (β = 0.153, SE = 0.050, vs. β = 0.061, SE = 0.043); indeed, the effect was not reliable for thrice-presented items. Likewise, person-level FOKs tended to generate a larger effect for once-presented items, β = 0.243, SE = 0.132, than for thrice-presented items, β = 0.090, SE = 0.081.
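The reliability of each coefficient can be gauged informally from its ratio to its standard error. The sketch below uses a large-sample normal approximation with a two-tailed criterion of 1.96; this is an assumption for illustration only, because the reported tests were F tests from the mixed models, whose degrees of freedom are not reproduced here.

```python
def wald_z(beta, se):
    """Ratio of a coefficient to its standard error."""
    return beta / se

CRITICAL_Z = 1.96  # two-tailed .05 criterion under a normal approximation

# Coefficients and standard errors reported in the text.
item_once = wald_z(0.153, 0.050)     # about 3.06
item_thrice = wald_z(0.061, 0.043)   # about 1.42
person_once = wald_z(0.243, 0.132)   # about 1.84
person_thrice = wald_z(0.090, 0.081) # about 1.11

exceeds = {
    "item-level, once-presented": item_once > CRITICAL_Z,
    "item-level, thrice-presented": item_thrice > CRITICAL_Z,
    "person-level, once-presented": person_once > CRITICAL_Z,
    "person-level, thrice-presented": person_thrice > CRITICAL_Z,
}
```

By this rough criterion, only the item-level effect for once-presented items clearly exceeds the threshold, in line with the statement that the item-level effect was not reliable for thrice-presented items.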

These results were consistent with the idea that very high levels of recollection for correctly recognized items blunted the connection between FOKs and CJs for the thrice-presented items, given the strong effect of repetition on mean FOKs in the previous analysis. By this interpretation, the interaction does not imply qualitative shifts in the basis for FOK–CJ relations in the different repetition conditions.

In general, these results supported our fourth hypothesis of FOK–CJ relations for correctly recognized items, consistent with previous research (Eakin et al., 2013; Hertzog, Dunlosky, & Sinclair, 2010): Access to information about encoding strategies at the time of the FOK forecasts recognition states that generate higher confidence in the accuracy of the forced choice recognition discrimination. This outcome sets the stage for a test of the noncriterial-recollection account of FOK relations to recognition memory CJs.

Strategy recall, FOKs, and CJs

To evaluate the contributions of strategy recall to these effects, we added item-level strategy recall and person-level strategy recall to the previous model. Table 6, Model 3, shows that doing so resulted in significant prediction of CJs by item-level strategy recall, but not at the person level. Furthermore, the effect of FOKs on CJs was reduced but not eliminated by adding strategy recall to the model, suggesting that strategy recall partially mediated the predictive validity of FOKs for CJs for correct recognition responses. In contrast, strategy recall had little impact on the repetition-related effects in Model 2.

We evaluated the indirect effects of item-level FOKs on CJs as mediated by strategy recall by modeling the item-level data using Mplus. Repetition and concreteness both had relatively robust direct effects on CJs (standardized effects of .33 and .24, respectively). The standardized direct effects of item-level FOKs and item-level strategy recall were reliable, but weaker (.08 and .07, respectively).Footnote 6 The standardized total effect of strategy recall on CJs was .11 (SE = .23). The standardized indirect effect mediated through FOKs was smaller, but reliably greater than zero, effect = .04, SE = .02, p < .05.

These results support the more modest version of the noncriterial-recollection hypothesis. Strategy recall is indeed one of the cues accounting for the predictive validity of FOKs for recollective experiences during the recognition test. However, other cues must also be operating to generate FOK–CJ relationships.

The fact that strategy recall accounts for most of the FOK–recognition memory correlations, but not for most of the FOK–CJ correlations for correct trials, supports the argument that recognition success and CJs reveal different aspects of recognition memory with which to validate FOKs. As has been noted previously, awareness that one has accessed little or no information about the target after cued recall failure could lead to a phenomenal experience of knowing that one does not know (Liu et al., 2007), which cannot account for correlations between FOKs and CJs for correct recognition trials. Forced choice recognition success requires access to specific information about the original cue–target relationship that could in principle be triggered at the time of the FOK by the recall cue, and strategy recall is either the principal source of this information or a correlate of most available sources. However, although strategy recall does predict CJs for correct recognition trials, it does not account in full for the observed predictive validity of FOKs for CJs. Thus, FOKs must be, ipso facto, influenced by other cues that foster a positive FOK–CJ relationship. A number of candidate cues that were not directly measured in this study could in principle influence FOKs and FOK accuracy, including (1) cue familiarity, promoted in this experiment by the manipulation of repetition (e.g., Metcalfe et al., 1993); (2) access to target features not integrated into the encoded mediator when making the FOK (Thomas et al., 2011); and (3) recollection of other aspects of the encoding context besides the mediator itself.

However, it is also clear that FOKs did not achieve the level of predictive validity for CJs that was in principle possible, given the magnitude of the observed relationships of the available cues of repetition, concreteness, and strategy recall to CJs, as well as the substantial residual variance in CJs implicitly related to other, unmeasured influences. One possible explanation for the limited FOK–CJ relationship (aside from poor monitoring or suboptimal rating scale behavior) concerns the types of recollective experience that are not shared by cue-generated FOKs and recognition tests. Recollective experiences during the recognition test can also derive from the foil-induced recall-to-reject mechanisms cited earlier, and these contributions to recognition test confidence cannot in principle be anticipated at the time of the FOK (when the foils are not yet known; as in Thiede & Dunlosky, 1994). One can draw an analogy here to work on the difference between immediate judgments of learning (JOLs), on the one hand, and delayed JOLs and FOKs, on the other hand (Eakin & Hertzog, 2012). Immediate JOLs are insensitive to cue set size and target set size effects that influence implicit retrieval interference during a cued-recall test (see T. O. Nelson & Dunlosky, 1991). In contrast, both delayed JOLs and FOKs, which are influenced by the accessibility of information during cued recall, are sensitive to these retrieval-based effects.

Limitations and conclusions

The major finding of this study was that the mediators that participants produce at encoding play a large role in statistically explaining variation in FOKs, the resolution of FOKs, and (to a lesser degree) the relationship between FOKs and CJs in a subsequent associative recognition test. As such, the degree of strategy recall appears to be one of the pathways by which noncriterial recollection influences FOKs and FOK accuracy.

These results do not necessarily generalize to all other studies of episodic FOK accuracy, given that in this study we instructed individuals to use encoding strategies and required individuals to report their encoded mediators at study and after cued recall. This procedure could have increased the salience of strategy recall as a potential cue for making FOKs. Hence, we cannot conclude at present that encoding strategy recall influences FOKs in tasks in which participants spontaneously generate the strategies without experimenter intervention. Note, however, that people do spontaneously generate mnemonic strategies, including imagery, during encoding (e.g., Dunlosky & Hertzog, 2001), and that other evidence suggests that recalling spontaneous strategy use may affect FOKs. Hosey, Peynircioğlu, and Rabinovitz (2009) requested post hoc justifications for face–name FOKs from their participants, and they found that individuals often reported access to aspects of encoding when they made high FOKs. Hertzog, Sinclair, and Dunlosky (2010) used retrospective strategy reports to measure spontaneous strategy use when learning verbal paired associates. They showed that the reported encoding strategies correlated with JOLs made immediately after each item was encoded. One could even argue that the effects seen in the present study might be stronger under conditions in which individuals spontaneously used strategies for some but not all items, so that strategy use would contribute to subsequent item-level variability in recognition and in recognition CJs. New research will be needed to investigate whether FOKs are influenced by spontaneous strategy use, and the degree of influence that this might have on FOK accuracy.

We also acknowledge that the strategy recall–FOK relationships observed in this study were inherently correlational. Although they are consistent with the interpretation that strategy recall in this task has a causal influence on FOKs, we cannot rule out the generic rival explanation that unmeasured cues that are correlated with strategy recall are the actual basis for the strategy recall–FOK relationships observed in this study. We are justified in concluding that the observed relationship is independent of the manipulated variables of repetition and concreteness, and that some of the effects of those independent variables on FOKs covary with strategy recall, rendering strategy recall a plausible candidate explanation of FOK variance.

Scientists have speculated about whether FOKs might be based on nonanalytic feelings of warmth that could derive from indirect (and perhaps unconscious) access to the target (e.g., Metcalfe, 2000) or implicit influences on cue familiarity (Jameson, Narens, Goldfarb, & Nelson, 1990), as opposed to being based on information generated by explicit retrieval searches prompted by the FOK. The present results demonstrate that one product of an explicit retrieval search, strategy recall, appears to have a strong influence on FOKs. This outcome is therefore consistent with what Nelson and Narens (1990) once termed the “no-magic” account of FOKs. The retrieval of information about the mediators created by implementing instructed strategy use apparently influences FOKs for unrecalled items. Access to the original encoding strategies accounts for much of the correlation of FOKs with subsequent recognition memory success and for some of the correlation of FOKs with CJs for correct recognition trials.