Simultaneous utilization of multiple cues in judgments of learning
There is much evidence that metacognitive judgments, such as people’s predictions of their future memory performance (judgments of learning, JOLs), are inferences based on cues and heuristics. However, relatively little is known about whether and when people integrate multiple cues in one metacognitive judgment or focus on a single cue without integrating further information. The current set of experiments systematically addressed whether and to what degree people integrate multiple extrinsic and intrinsic cues in JOLs. Experiment 1 varied two cues: number of study presentations (1 vs. 2) and font size (18 point vs. 48 point). Results revealed that people integrated both cues in their JOLs. Experiment 2 demonstrated that the two word characteristics concreteness (abstract vs. concrete) and emotionality (neutral vs. emotional) were integrated in JOLs. Experiment 3 showed that people integrated all four cues in their JOLs when manipulated simultaneously. Finally, Experiment 4 confirmed integration of three cues that varied on a continuum rather than in two easily distinguishable levels. These results demonstrate that people have a remarkable capacity to integrate multiple cues in metacognitive judgments. In addition, our findings render an explanation of cue effects on JOLs in terms of demand characteristics implausible.
Keywords: Metamemory, Judgments of learning, Cue integration
Metacognition, the ability to think about one’s own thoughts and cognitions, is among the most fascinating aspects of the human mind. In experimental psychology, the study of metacognition is inextricably linked to obtaining metacognitive judgments. Classic metacognitive judgments include retrospective confidence judgments, feeling of knowing judgments, and judgments of learning (e.g., Dunlosky & Tauber, 2014; Koriat, 2007). In using these judgments, much has been learned about metacognition. Most importantly, research demonstrated that people do not have privileged access to their cognitions. Instead, people infer the state of their own cognitive systems using cues and heuristics. For instance, Koriat’s (1997) cue-utilization theory suggests that people base predictions of future memory performance (judgments of learning, JOLs) on different types of cues. Intrinsic cues, such as word frequency or concreteness, are inherent to study materials. In contrast, extrinsic cues, such as presentation time or number of study presentations, are bound to specific study conditions. Based on a review of early JOL studies, Koriat (1997) argued that JOLs are usually sensitive to intrinsic cues but often insensitive to extrinsic cues (but see Dunlosky & Matvey, 2001; Jang & Nelson, 2005).
Over the years, considerable evidence for the impact of various specific cues on metacognitive judgments has accumulated. For instance, Rhodes and Castel (2008) found that the font size of to-be-remembered words influences JOLs, Zimmerman and Kelley (2010) demonstrated that JOLs take account of the word characteristic emotionality, and Mueller, Tauber, and Dunlosky (2013) summarized results showing that associative relatedness between members of a word pair affects JOLs. By comparison, the question of whether and how multiple cues combine to affect metacognitive judgments has received less attention (Rhodes, 2016). Resolving this question is essential for two reasons. First, evidence that a cue affects JOLs when manipulated in isolation does not necessarily imply that this cue affects JOLs in situations where multiple cues are available (e.g., Undorf & Erdfelder, 2013). As important, focusing on single cues falls short of understanding metacognition in everyday situations that usually provide learners with multiple potentially relevant cues (Rhodes, 2016). In the current set of experiments, we therefore empirically evaluate the integration of multiple cues in JOLs.
Cue integration in judgments
The idea that judgments are based on cues and heuristics is not unique to metacognitive judgments but is commonly held in research on judgment and decision-making (JDM; e.g., Koriat, 1997, 2015). In JDM research, the question of how and when people integrate multiple pieces of information in their judgments has been a focus for more than seven decades. Following the seminal work of Brunswik (1944, 1955), numerous studies have investigated the integration of multiple cues in diverse judgmental contexts, including clinical judgments, diagnoses, lie detection, predictions, and staff decisions. Reviews of this research (e.g., Karelaia & Hogarth, 2008) documented high multiple correlation coefficients for predicting judgments by the cues that are available in the judgment context. These correlations are interpreted as indicating additive integration or compensatory use of cues (e.g., Brehmer, 1994; Einhorn, Kleinmuntz, & Kleinmuntz, 1979).
This conclusion, however, has been challenged for theoretical and methodological reasons since the heuristics and biases program (cf. Tversky & Kahneman, 1974). Theoretically, it has been argued that human judgment focuses on one cue in a noncompensatory fashion and follows this cue without integrating further information because people’s cognitive resources are limited (Gigerenzer, Todd, & the ABC Research Group, 1999; Todd, Gigerenzer, & the ABC Research Group, 2012). Methodologically, it has been argued that high multiple correlation coefficients of the linear model can occur even when people actually use one-cue strategies (Bröder, 2000; Martignon & Hoffrage, 2002).
There is an emerging consensus that people use simple one-cue heuristics under certain circumstances, such as when they are under time pressure or when they have to retrieve information from memory (for an overview, see Pachur & Bröder, 2013). At the same time, additive integration of multiple cues occurs in various judgment domains (e.g., Anderson, 1981, 2013).
From the perspective of JDM research, there is good reason to suspect that JOLs integrate multiple cues in an additive manner. This is because cue information is often readily available when making JOLs, meaning that there are no explicit search costs for retrieving cue information from the environment or from memory (e.g., Platzer, Bröder, & Heck, 2014). For instance, information about whether an item is printed in large or small font, is presented for study for 2 or 8 s, or evokes an emotional reaction is an integral part of studying it (salience differences between cues notwithstanding). However, it is also plausible to predict that JOLs rely on simple one-cue heuristics. The reason for this is that JOLs refer to a relatively noisy environment (i.e., one’s own memory performance). Consequently, cue validities are hard to learn, and people might rely on the most obvious and valid cues. Moreover, in judgment domains with high stochastic uncertainty, simple and frugal heuristics were found to be particularly robust (Gigerenzer & Brighton, 2009). In sum, there are two possibilities, both of which are plausible from a JDM perspective: When making JOLs, people might integrate multiple cues or focus on single cues.
Cue integration in JOLs
Prior JOL studies that focused on cue integration investigated JOLs made during multitrial learning (e.g., Hertzog, Hines, & Touron, 2013; Hines, Hertzog, & Touron, 2015; Tauber & Rhodes, 2012). In these experiments, people studied the same material in two or more study-test trials. Results revealed that JOLs from later study trials were based on multiple cues, including prior JOLs, prior memory performance, and prior recognition confidence judgments. However, it is possible that all these cues are specific instantiations of a single general cue (e.g., subjective item difficulty). These studies therefore do not provide strong evidence for the integration of multiple cues in JOLs outside of multitrial learning.
At the same time, several observations are relevant to the issue of cue integration in JOLs, although they were obtained in studies that addressed different research questions. All these observations came from experiments that manipulated two or more cues in within-participant designs. In an early experiment with extrinsic cues, Zechmeister and Shaughnessy (1980) demonstrated that number of study presentations and repetition of items in a massed or distributed fashion both affected JOLs (notably, JOLs were higher for massed items than for distributed items, whereas the opposite pattern was found for memory performance; see also Son & Simon, 2012). More recent studies revealed that JOLs were sensitive to the jointly manipulated cues font format (standard vs. aLtErNaTiNg) and font size (Rhodes & Castel, 2008), number of study presentations and font size (Kornell, Rhodes, Castel, & Tauber, 2011), or font style and font size (Price, McElroy, & Martin, 2016). In contrast, Benjamin (2005) demonstrated that unspeeded JOLs for word pairs were affected by target duration but not by cue duration, whereas the reverse was true for speeded JOLs (for similar findings, see Metcalfe & Finn, 2008). Similarly, Pyc and Rawson (2012) found that JOLs were sensitive to criterion level (number of correct recalls during practice) but were insensitive to the lag between practice trials, and Tauber and Rhodes (2010, Experiment 4) revealed that list length but not list order affected JOLs. With pictorial stimuli, Besken (2016, Experiment 3) found an effect of presentation format (intact vs. degraded) on JOLs, but no effect of the matching of a preceding contour. In summary, while some studies hinted that people integrated two extrinsic cues in JOLs, other studies showed that one cue affected JOLs while another cue was ignored.
A similar picture emerges from experiments that manipulated two or more intrinsic cues. In paired-associate learning, Begg, Duft, Lalonde, Melnick, and Sanvito (1989) found that cue concreteness and target concreteness both affected JOLs, whereas Illman and Morrison (2011) reported that JOLs were sensitive to cue imageability and target age of acquisition but not to cue age of acquisition and target imageability. Also, Hourihan, Fraundorf, and Benjamin (2017) revealed that JOLs were sensitive to word frequency but insensitive to valence and arousal.
Finally, most studies that manipulated one extrinsic and one intrinsic cue alluded to cue integration. When manipulating presentation time and relatedness (Jang & Nelson, 2005; Koriat, 1997), number of study presentations and relatedness (Jang & Nelson, 2005), font size and relatedness (Price & Harrison, 2017; Rhodes & Castel, 2008), announced retention interval and relatedness (Koriat, Bjork, Sheffer, & Bar, 2004, Experiment 3b), and font format and relatedness (Mueller et al., 2013), both cues affected JOLs. Similarly, Mueller and Dunlosky (2017, Experiment 6) showed that both font color (associated with an induced belief) and stimulus category (word vs. nonword) affected JOLs. In Mueller, Dunlosky, Tauber, and Rhodes’s (2014) Experiment 1, an interactive effect of font size and stimulus category on JOLs indicated cue integration. Peynircioglu, Brandler, Hohman, and Knutson (2014) revealed a similar interaction between presentation modality (visual vs. auditory) and syntax (coherent vs. re-arranged) on JOLs for musical pieces. Other studies manipulating extrinsic and intrinsic cues showed that cues were ignored. Magreehan, Serra, Schwartz, and Narciss (2016) reported that JOLs were insensitive to readability but sensitive to relatedness. Susser and Mulligan (2015, Experiment 2) found that writing hand (dominant vs. nondominant) but not word frequency affected JOLs. Studies manipulating reward and relatedness showed that college students and ninth-graders based JOLs on both cues (e.g., Koriat, Ackerman, Adiv, Lockl, & Schneider, 2014; Soderstrom & McCabe, 2011). In contrast, fifth-graders and sixth-graders usually focused on the one cue that was more salient and integrated the two cues only after a training procedure designed to foster cue integration (Koriat et al., 2014).
Taken together, previous JOL studies that varied two or more cues suggest that people sometimes ignore available cues when making JOLs. However, for some of the cues, it remains unclear whether people ignored them because multiple cues were available or whether people would have ignored them even when manipulated in isolation (e.g., lag between practice trials, readability, matching of a preceding contour). More importantly, the studies revealing that two cues affected JOLs do not strictly warrant the conclusion of individual participants integrating cue information in their JOLs. The reason is that cue effects were tested at the aggregate level, and therefore may occur even if each individual participant based his or her JOLs on only one cue but different individuals focused on different cues. As an illustration, consider Kornell et al.’s (2011) experiment, in which number of study presentations and font size both affected JOLs. It is of course possible that all or most participants based their JOLs on both cues (i.e., integrated the two cues). However, the general pattern of results is also consistent with some participants basing their JOLs on number of study presentations only and other participants basing their JOLs on font size only. Consequently, prior studies do not provide conclusive evidence for cue integration in JOLs. Furthermore, most of the studies did not specifically target the question of cue integration.
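The aggregation problem described above can be made concrete with a small simulation. The following Python sketch is purely illustrative: the sample of 20 participants, the JOL values of 40 and 60, and all function names are hypothetical and not taken from any of the cited studies. It constructs a sample in which every participant uses exactly one cue and shows that both cues nevertheless produce an effect on mean JOLs.

```python
from itertools import product
from statistics import mean

# Design cells: (number of study presentations, font size in points).
CELLS = list(product((1, 2), (18, 48)))

def jol_presentations_only(presentations, font_size):
    """Hypothetical participant who attends to study presentations only."""
    return 60 if presentations == 2 else 40

def jol_font_only(presentations, font_size):
    """Hypothetical participant who attends to font size only."""
    return 60 if font_size == 48 else 40

# Made-up sample: half the participants use one cue, half use the other.
participants = [jol_presentations_only] * 10 + [jol_font_only] * 10

def aggregate_mean(cell_filter):
    """Mean JOL over all participants and all cells that pass the filter."""
    return mean(p(*cell) for p in participants for cell in CELLS if cell_filter(cell))

# Aggregate-level main effects (difference in mean JOLs between cue levels).
effect_presentations = aggregate_mean(lambda c: c[0] == 2) - aggregate_mean(lambda c: c[0] == 1)
effect_font = aggregate_mean(lambda c: c[1] == 48) - aggregate_mean(lambda c: c[1] == 18)

def uses_both(p):
    """True if this participant's JOLs are higher at the 'high' level of both cues."""
    pres = mean(p(2, f) for f in (18, 48)) - mean(p(1, f) for f in (18, 48))
    font = mean(p(n, 48) for n in (1, 2)) - mean(p(n, 18) for n in (1, 2))
    return pres > 0 and font > 0

n_integrators = sum(uses_both(p) for p in participants)

print(effect_presentations, effect_font, n_integrators)
```

In this constructed sample, both aggregate effects are positive although no single participant integrates the two cues, which is why aggregate-level effects alone cannot establish cue integration.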
The current study
The experiments reported here systematically investigated whether and to what degree people integrate multiple cues in JOLs. We only selected cues that are well known to affect JOLs when manipulated in isolation. Thus, if people ignore cues, it is reasonable to conclude that this is due to multiple cues being available. We tested the generality of our findings in four ways. First, we simultaneously varied two cues in Experiments 1 and 2 and more than two cues in Experiments 3 and 4. Second, we manipulated extrinsic cues in Experiment 1, intrinsic cues in Experiment 2, and combinations of intrinsic and extrinsic cues in Experiments 3 and 4. Also, we used two discrete cue levels in Experiments 1 to 3, but varied all cues continuously in Experiment 4. Finally, we included not only cues that are known to have similar effects on JOLs and memory performance (number of study presentations, concreteness, emotionality) but also a cue that is known to affect JOLs but usually has no or minimal effects on memory performance (font size).
If the information provided by multiple cues is integrated in JOLs, (1) JOLs should be sensitive to the cues at the aggregate level and (2) individual-level analysis should reveal that a large number of participants base their JOLs on at least two cues. In contrast, if cue integration in JOLs does not occur, individual-level analysis should reveal that participants base their JOLs on only one cue.
In Experiment 1, we varied the two extrinsic cues number of study presentations and font size. Participants studied words that were presented either once or twice. Half the once and twice presented words were printed in a smaller font and the rest were printed in a larger font. Participants made a JOL after the presentation of each word and, after the study phase, completed a free recall test. This procedure was similar to that of a previous experiment by Kornell et al. (2011, Experiment 1). Kornell et al., however, focused on the stability bias—namely, the observation that JOLs underestimate the beneficial effect of future study opportunities—and therefore asked participants to make JOLs only after the first study presentation of each word, knowing whether or not they would study it again. In contrast, in Experiment 1, we focused on JOLs that were made after the study of each item was completed. We expect that both number of study presentations and font size affect JOLs at the aggregate level. If information from these two extrinsic cues is integrated in JOLs, individual-level analyses should reveal that a large number of participants base their JOLs on both cues.
Participants and materials
Participants were 53 University of Mannheim undergraduates. Stimuli were 56 German 5–10 letter nouns. All normed values were taken from Võ et al. (2009). Words were of moderate concreteness (M = 4.05, SD = 0.62), neutral valence (M = 0.03, SD = 0.24), and moderate arousal (M = 2.71, SD = 0.41). Four additional words served as primacy buffers and were not included in the analysis.
The experiment consisted of a study phase and a free recall test. Instructions informed participants that they would study 60 words and would be asked to recall as many words as they could remember in a final test. Participants were also told that after the presentation of each word, they would be asked to estimate the probability of recalling it in the test phase. For each participant, 30 randomly chosen words (two buffer and 28 target items) were presented once for study, and the remaining 30 words were presented twice. Of both the words presented once and the words presented twice, a randomly selected half was displayed in a small Arial font (18 point), and the other half was displayed in a large Arial font (48 point). Each word remained on the screen for 3 s. Immediately after studying each word, the JOL prompt “The chance to recall (0%–100%)?” appeared on the screen, and participants typed any whole number from 0 to 100. Consequently, we obtained one JOL for words that were presented once and two JOLs for words that were presented twice. A 100-ms blank screen preceded the presentation of each word. Following the study phase, participants performed a numerical filler task for 3 min. Finally, they were asked to write down as many of the words from the study phase as they could remember, in any order. Participants were given 5 min for free recall.
Results and discussion
A 2 (number of study presentations) × 2 (font size) ANOVA on recall performance revealed that actual memory was significantly better for words studied twice than for words studied once, F(1, 52) = 150.06, p < .001, ηp² = .74. Neither the main effect of font size nor the interaction was significant, both Fs < 1.
To analyze cue integration on the individual level, we coded participants as having based JOLs on number of study presentations if their JOLs were higher for words studied twice than for words studied once. Likewise, participants were coded as having based JOLs on font size if their JOLs were higher for words presented in a large font than for words presented in a small font. Results revealed that 32 participants (60.38%) integrated number of study presentations and font size in their JOLs (binomial test: p < .001). The remaining participants based their JOLs on either number of study presentations (7 participants) or font size (12 participants) or on neither cue (2 participants).
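The coding rule and the binomial test described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis script. The function names and the example JOLs are hypothetical, and each word is assumed to contribute a single JOL. The chance probability of .25 (each cue coded as used with probability .5) is also an assumption, because the text does not state the null hypothesis of the test.

```python
from math import comb
from statistics import mean

def uses_cue(jols_high, jols_low):
    """Code a cue as used if the mean JOL is higher at its 'high' level."""
    return mean(jols_high) > mean(jols_low)

def classify(trials):
    """trials: list of (presentations, font_size, jol) for one participant.

    Returns (uses number of presentations, uses font size).
    """
    twice = [j for n, f, j in trials if n == 2]
    once = [j for n, f, j in trials if n == 1]
    large = [j for n, f, j in trials if f == 48]
    small = [j for n, f, j in trials if f == 18]
    return uses_cue(twice, once), uses_cue(large, small)

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Made-up JOLs for one participant (one word per design cell).
example = [(1, 18, 30), (1, 48, 40), (2, 18, 50), (2, 48, 65)]
both = classify(example)

# 32 of 53 participants coded as using both cues, tested against the
# ASSUMED chance rate of .5 * .5 = .25.
p_value = binomial_upper_tail(32, 53, 0.25)
print(both, p_value < 0.001)
```

Under the assumed chance rate, the one-sided tail probability is far below .001, which is consistent with the reported test result.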
In summary, replicating Kornell et al. (2011), both number of study presentations and font size affected JOLs at the aggregate level. While number of study presentations affected recall performance, font size left recall performance unchanged. As in previous studies (e.g., Kornell et al., 2011; Rhodes & Castel, 2008), a large font increased overconfidence in JOLs. Two complementary individual-level analyses revealed that a narrow majority of participants integrated both cues in their JOLs. Experiment 2 investigated whether cue integration in JOLs would extend to two intrinsic cues that have similar effects on JOLs and recall performance.
In Experiment 2, we manipulated the two intrinsic cues concreteness and emotionality. Participants studied words that were either abstract or concrete. Half the words of each level of concreteness were neutral and the rest were emotional. Neutral words were low in arousal and neutral in valence, whereas emotional words were high in arousal and either positive or negative in valence. Hence, as has been done in prior studies, we manipulated arousal and valence jointly to maximize effects of emotionality on JOLs (Tauber & Dunlosky, 2012; Zimmerman & Kelley, 2010; but see Hourihan et al., 2017). If information from the two cues is integrated in JOLs, individual-level analyses should reveal that a large number of participants base their JOLs on both cues.
Participants and materials
Participants were 55 University of Mannheim undergraduates. Stimuli were 56 German 5–10 letter nouns. All normed values were taken from Võ et al. (2009). Half the words (28 words) were abstract and half were concrete. Mean imagery value was 2.49 (SD = 0.35) for abstract words and 5.65 (SD = 0.54) for concrete words (rated on a 7-point scale, 1 = low imageability to 7 = high imageability). Half the abstract words (14 words) were low in arousal (M = 2.10, SD = 0.23; rated on a 5-point scale, 1 = low arousal to 5 = high arousal) and neutral in valence (M = 0.10, SD = 0.21; 7-point scale, −3 = very negative through 0 = neutral to 3 = very positive), while the rest were high in arousal and either positive (seven words, arousal: M = 2.76, SD = 0.31, valence: M = 2.19, SD = 0.15) or negative (seven words, arousal: M = 4.05, SD = 0.50, valence: M = −2.12, SD = 0.13). The same was true for concrete words: Half (14 words) were low in arousal and neutral (arousal: M = 1.88, SD = 0.19, valence: M = 0.12, SD = 0.24), while the rest were high in arousal and either positive (seven words, arousal: M = 3.51, SD = 0.48, valence: M = 2.22, SD = 0.12) or negative (seven words, arousal: M = 3.84, SD = 0.36, valence: M = −2.09, SD = 0.11). Four additional words that differed in concreteness and emotionality served as primacy buffers and were not included in the analysis.
The procedure was identical to that of Experiment 1, except that each word was presented only once in a 26-point font.
Results and discussion
A 2 (concreteness) × 2 (emotionality) ANOVA on recall performance revealed that actual memory was significantly better for concrete words than for abstract words, F(1, 54) = 57.35, p < .001, ηp² = .52. Recall performance was also significantly higher for emotional words than for neutral words, F(1, 54) = 17.41, p < .001, ηp² = .24. The interaction was not significant, F < 1.
Data from Experiment 2 demonstrated that both concreteness and emotionality affected JOLs and recall performance at the aggregate level. Individual-level analyses revealed that about one half of the participants integrated both cues in their JOLs. Experiment 2 thus replicated Experiment 1’s finding of cue integration. It is worth noting that, unlike in Experiment 1, both cues were intrinsic and had similar effects on JOLs and actual memory. This illustrates the robustness of additive cue integration in JOLs. Nevertheless, it is still possible that cue integration in JOLs is limited to two cues and does not occur with more than two cues. Experiment 3 therefore tested whether JOLs would integrate information from the four cues used in Experiments 1 and 2 when manipulated simultaneously.
In Experiment 3, we varied the extrinsic cues from Experiment 1 (number of study presentations, font size) and the intrinsic cues from Experiment 2 (concreteness, emotionality). This allowed us to investigate whether people integrate up to four extrinsic and intrinsic cues in JOLs. As in the previous experiments, individual-level analyses are used to assess cue integration.
Participants and materials
Participants were 50 University of Mannheim undergraduates. Stimuli were 64 German 5–10 letter nouns. As in Experiment 2, half the words (32 words) were abstract (M = 2.48, SD = 0.36) and half were concrete (M = 5.66, SD = 0.52). Half the abstract words were of low arousal and neutral (16 words, arousal: M = 2.09, SD = 0.22; valence: M = 0.14, SD = 0.22), while the rest were of high arousal and either positive (eight words, arousal: M = 2.74, SD = 0.29, valence: M = 2.19, SD = 0.14) or negative (eight words, arousal: M = 3.96, SD = 0.41, valence: M = −2.10, SD = 0.13). The same was true for concrete words: Half were low in arousal and neutral (16 words, arousal: M = 1.89, SD = 0.18, valence: M = 0.15, SD = 0.24), while the rest were high in arousal and either positive (eight words, arousal: M = 3.43, SD = 0.49, valence: M = 2.27, SD = 0.18) or negative (eight words, arousal: M = 3.85, SD = 0.47, valence: M = −2.10, SD = 0.10). Four additional words served as primacy buffers and were not included in the analysis.
The procedure was the same as in Experiment 1, except that for each participant, one randomly chosen fourth of the words from each of the four combinations of concreteness and emotionality were presented (1) once in a small font, (2) once in a large font, (3) twice in a small font, and (4) twice in a large font.
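The per-participant randomization described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the word labels are placeholders rather than the actual stimuli, the function name is hypothetical, and primacy buffers are left out.

```python
import random
from collections import Counter

# Presentation conditions: (number of presentations, font size in points).
CONDITIONS = [(1, 18), (1, 48), (2, 18), (2, 48)]

def assign_conditions(words_by_cell, rng):
    """Give each presentation condition one fourth of the words in every cell."""
    assignment = {}
    for cell, words in words_by_cell.items():
        shuffled = words[:]
        rng.shuffle(shuffled)
        quarter = len(shuffled) // len(CONDITIONS)
        for i, condition in enumerate(CONDITIONS):
            for word in shuffled[i * quarter:(i + 1) * quarter]:
                assignment[word] = condition
    return assignment

# Placeholder labels: 16 words per concreteness x emotionality cell (64 total).
words_by_cell = {
    (conc, emo): [f"{conc}_{emo}_{i}" for i in range(16)]
    for conc in ("abstract", "concrete")
    for emo in ("neutral", "emotional")
}

# A separate seeded generator stands in for one participant's randomization.
assignment = assign_conditions(words_by_cell, random.Random(1))
counts = Counter(assignment.values())
print(counts)
```

Each presentation condition receives four words from every cell, so all four cues are fully crossed within each participant.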
A 2 (number of study presentations) × 2 (font size) × 2 (concreteness) × 2 (emotionality) ANOVA on recall performance revealed that actual memory was significantly better for words studied twice than for words studied once, F(1, 48) = 193.98, p < .001, ηp² = .80. A significant main effect of font size revealed better memory for words presented in a large font than for words presented in a small font, F(1, 48) = 8.98, p = .004, ηp² = .16. Concerning concreteness, memory was better for concrete words than for abstract words, F(1, 48) = 51.83, p < .001, ηp² = .52. Finally, a significant main effect of emotionality revealed better memory for emotional words than for neutral words, F(1, 48) = 11.70, p = .001, ηp² = .20. No other effects were significant, all Fs < 2.96.
An individual-level analysis based on simple mean differences suggested that 26 participants (52.00%) integrated all four cues in their JOLs (binomial test: p < .001). Eighteen participants (36.00%) integrated three cues in their JOLs (2 participants: number of study presentations, font size, and concreteness; 5 participants: number of study presentations, font size, and emotionality; 6 participants: number of study presentations, concreteness, and emotionality; 5 participants: font size, concreteness, and emotionality). The remaining 6 participants (12.00%) integrated two cues in their JOLs (2 participants: number of study presentations and emotionality; 2 participants: number of study presentations and concreteness; 1 participant: font size and emotionality; 1 participant: concreteness and emotionality). An individual-level analysis of effect sizes with the criterion of |d| ≥ .2 revealed that 22 participants (44.00%) integrated all four cues, that 17 participants (34.00%) integrated three cues, and that 10 participants (20.00%) integrated two cues. The remaining participant (2.00%) used only one cue for making JOLs.
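The effect size criterion can be illustrated with a short Python sketch. The text does not specify how each participant's d was computed. The version below assumes the difference in mean JOLs between the two levels of a cue, divided by the pooled standard deviation across items. The function names and the example JOLs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(high, low):
    """Standardized mean difference using the pooled sample variance.

    Assumes the pooled variance is nonzero.
    """
    pooled_var = (
        (len(high) - 1) * variance(high) + (len(low) - 1) * variance(low)
    ) / (len(high) + len(low) - 2)
    return (mean(high) - mean(low)) / sqrt(pooled_var)

def cues_used(jols_by_cue, criterion=0.2):
    """Return the cues whose effect on JOLs meets |d| >= criterion."""
    return [
        cue for cue, (high, low) in jols_by_cue.items()
        if abs(cohens_d(high, low)) >= criterion
    ]

# Made-up JOLs for one participant: (JOLs at 'high' level, JOLs at 'low' level).
example = {
    "number of study presentations": ([60, 70, 80], [40, 50, 60]),
    "font size": ([55, 65, 75], [50, 60, 70]),
    "concreteness": ([50, 60, 70], [50, 60, 70]),
}
used = cues_used(example)
print(used)
```

In this example the participant would be coded as integrating two cues, because the concreteness effect falls below the criterion.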
Experiment 3 showed that all four cues affected JOLs and recall performance at the aggregate level. Individual-level analyses revealed that cue integration in JOLs is exceptional. First, 88.00% (simple mean difference analysis) or 78.00% (effect size analysis) of participants based their JOLs on three or four cues. Moreover, all (or all but one) participants integrated at least two cues in their JOLs.
However, although these impressive results apparently demonstrate a high capacity for cue integration, two issues may be raised. First, we manipulated each cue in two easily distinguishable levels. There is a danger, then, that judgments are responses to the demand characteristics of the situation rather than reflections of the psychological variable of interest (Orne, 1962). This means that participants might base their JOLs on a plausible ad hoc hypothesis about the pattern of memory predictions that the experimenter expects (for the impact of ad hoc theories on JOLs, see Mueller & Dunlosky, 2017). Obviously, this is of concern not only for the current experiments but also for other JOL studies. When compared to experiments that vary a single cue, our Experiment 3 alleviated this concern because simultaneously manipulating four cues probably reduced the saliency of each cue. Nevertheless, dichotomous cues might have triggered demand characteristics.
Moreover, in Experiment 3, the selection of words was nonrepresentative in that we (a) excluded words of intermediate concreteness and (b) varied concreteness and emotionality orthogonally. Orthogonal manipulations of stimulus features destroy the correlational structure of the environment. From a Brunswikian perspective, this may produce processes different from those in a natural domain (e.g., Dhami, Hertwig, & Hoffrage, 2004). For instance, strong positive correlations between cues render some information redundant and enhance the accuracy of one-cue judgment strategies, whereas negative correlations increase conflict and boost compensatory decision making (e.g., Bettman, Johnson, Luce, & Payne, 1993; Fasolo, McClelland, & Lange, 2005). Hence, it is worthwhile to see whether cue integration would also occur in a representative design. Thus, Experiment 4 investigated cue integration in JOLs with nonorthogonal cues that vary on a continuum.
First, though, an unexpected result from Experiment 3 deserves comment: Words presented in a large font were better recalled than words presented in a small font (for a similar finding, see Price et al., 2016). However, this finding did not replicate in Experiments 1 or 4 and thus is not considered further.
In Experiment 4, words were presented in eight different font sizes between 18 point and 48 point. We selected a representative sample of words that varied across a wide range of concreteness and emotionality. Thus, Experiment 4 assessed whether or not participants base their JOLs on multiple nonorthogonal cues that vary on a continuum. Individual differences were taken into account by using multilevel regression models that estimated subject-to-subject variation in mean JOLs or recall and in the impact of font size, concreteness, and emotionality on JOLs or recall. To facilitate comparisons across experiments, we also report individual-level analyses that are parallel to those provided for Experiments 1 to 3.
Participants and materials
Participants were 48 University of Mannheim undergraduates. We selected a representative sample of 5–10 letter nouns from Võ et al. (2009). To this end, we first divided all nouns into six levels of concreteness (i.e., the one sixth of nouns with lowest concreteness, one sixth of nouns with next lowest concreteness, etc.). Second, we divided all nouns into six levels of arousal (i.e., the one sixth of nouns with lowest arousal, one sixth of nouns with next lowest arousal, etc.). We then selected 64 words while ensuring that the percentage of words in each combination of concreteness and arousal level matched the respective percentage in the Võ et al. (2009) word norms. Mean values of concreteness and arousal and the concreteness–arousal correlation were similar for the selected words and for all words (concreteness: 4.25 vs. 4.29, arousal: 2.76 vs. 2.78, correlation: −.10 vs. −.12). Four additional words served as primacy buffers and were not included in the analysis.
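The stratified selection described above can be sketched in Python. This is an illustration under assumptions, not the authors' procedure: the norm values are randomly generated stand-ins for the Võ et al. (2009) norms, the function names are hypothetical, and the largest-remainder rounding of cell quotas is one of several ways to match the cell percentages with only 64 words.

```python
import random
from collections import defaultdict

def sextile_levels(values):
    """Rank-based level (0-5) per value: six groups of roughly equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, i in enumerate(order):
        levels[i] = rank * 6 // len(values)
    return levels

def stratified_sample(norms, n_select, rng):
    """norms: list of (word, concreteness, arousal). Returns n_select words."""
    conc = sextile_levels([c for _, c, _ in norms])
    arou = sextile_levels([a for _, _, a in norms])
    cells = defaultdict(list)
    for (word, _, _), c, a in zip(norms, conc, arou):
        cells[(c, a)].append(word)
    # Proportional quota per cell, rounded with the largest-remainder method.
    exact = {cell: n_select * len(ws) / len(norms) for cell, ws in cells.items()}
    quota = {cell: int(x) for cell, x in exact.items()}
    leftover = n_select - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for cell in by_remainder[:leftover]:
        quota[cell] += 1
    return [w for cell, ws in cells.items() for w in rng.sample(ws, quota[cell])]

# Hypothetical norms: 600 placeholder words with random ratings.
rng = random.Random(0)
norms = [(f"word{i}", rng.uniform(1, 7), rng.uniform(1, 5)) for i in range(600)]
selected = stratified_sample(norms, 64, rng)
print(len(selected))
```

Because quotas are proportional to cell sizes, the selected words approximately preserve the joint distribution of concreteness and arousal, including their correlation.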
The procedure was the same as in Experiment 3, except that all words were presented once and that, for each participant, eight randomly chosen words each were presented in font sizes of 18, 21, 24, 27, 31, 36, 41, and 48 point.
Results and discussion
We used multilevel regression (R packages lme4 and lmerTest; Bates, Maechler, Bolker, & Walker, 2015; Kuznetsova, Brockhoff, & Christensen, 2016; R Core Team, 2016) to evaluate the impact of concreteness, emotionality, and font size on JOLs and recall performance. We specified random intercepts for participants and uncorrelated random effects for concreteness, emotionality, font size, and their interactions. All predictors were centered. Recall performance was modeled with a logistic regression model.
A multilevel logistic regression of recall performance revealed that both concreteness and emotionality but not font size affected actual memory, concreteness: z = 6.04, p < .001; emotionality: z = 3.47, p < .001; font size: z = 1.54, p = .123. On average, each one-unit increase in concreteness multiplied the odds of recall by 1.24, and each one-unit increase in emotionality multiplied the odds of recall by 1.27. No other effects were significant.
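To make the odds ratios concrete, the following Python sketch converts them into recall probabilities. The baseline recall probability of .30 is a hypothetical value chosen for illustration, and the function name is not from the source. The calculation also ignores the random effects of the multilevel model.

```python
from math import exp, log

def updated_probability(p, odds_ratio, units=1.0):
    """Recall probability after multiplying the odds by odds_ratio ** units."""
    odds = p / (1 - p) * odds_ratio ** units
    return odds / (1 + odds)

# Logit-scale slope implied by the reported odds ratio for concreteness.
b_concreteness = log(1.24)

# Hypothetical baseline recall probability of .30.
p_after_one = updated_probability(0.30, 1.24)             # about .35
p_after_three = updated_probability(0.30, 1.24, units=3)  # about .45
print(round(b_concreteness, 3), round(p_after_one, 3), round(p_after_three, 3))
```

An odds ratio is multiplicative on the odds scale, so the change in recall probability per unit depends on the baseline and is not a constant increment.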
As an analog of the individual-level analyses based on simple mean differences, we coded participants as having based JOLs on a particular cue if that cue revealed a positive regression weight in a multiple linear regression predicting their JOLs from all three cues. Results revealed that 29 participants (60.42%) integrated all three cues in their JOLs (binomial test: p < .001). Thirteen participants (27.08%) integrated two cues in their JOLs (5 participants: concreteness and emotionality; 2 participants: font size and emotionality; 6 participants: font size and concreteness). Five participants (10.42%) based their JOLs on only one cue (font size: 2 participants; emotionality: 2 participants; concreteness: 1 participant). The remaining participant used none of the cues. For the individual analysis based on effect size, we tested each participant’s standardized regression weight for the respective cue against Cohen’s effect size convention for small effects in measures of association (|r| ≥ .10). According to this criterion, 9 participants (18.75%) integrated all three cues, and 16 participants (33.33%) integrated two cues, whereas 13 participants (27.08%) used only one cue, and 10 participants (20.83%) used none of the cues.
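The individual-level classification can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the data for the single simulated participant are hypothetical, the effect size criterion is implemented as a standardized weight of at least .10 in the positive direction, and the chance level of .5³ = .125 used for the binomial tail is our assumption, because the text does not report the test probability.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers using the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def standardized_weights(cues, jols):
    """Standardized regression weights of one participant's JOLs on all cues."""
    names = list(cues)
    z = {name: standardize(cues[name]) for name in names}
    zy = standardize(jols)
    n = len(jols)
    # With z-scored variables the normal equations reduce to R_xx * beta = r_xy.
    r_xx = [[sum(p * q for p, q in zip(z[a], z[b])) / n for b in names]
            for a in names]
    r_xy = [sum(p * q for p, q in zip(z[a], zy)) / n for a in names]
    return dict(zip(names, solve(r_xx, r_xy)))


def cues_used(weights, minimum=None):
    """Cues counted as used: weight > 0 (sign criterion) or >= minimum."""
    if minimum is None:
        return [name for name, beta in weights.items() if beta > 0]
    return [name for name, beta in weights.items() if beta >= minimum]


def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n trials with success rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# One simulated participant (hypothetical data): JOLs rise with concreteness
# and emotionality and fall slightly with font size.
rng = random.Random(7)
font = [18, 21, 24, 27, 31, 36, 41, 48] * 8
rng.shuffle(font)
cues = {
    "concreteness": [rng.uniform(1, 7) for _ in range(64)],
    "emotionality": [rng.uniform(1, 5) for _ in range(64)],
    "font size": font,
}
jols = [50 + 4 * c + 3 * e - 0.2 * f
        for c, e, f in zip(cues["concreteness"], cues["emotionality"], font)]
weights = standardized_weights(cues, jols)
used = cues_used(weights)  # sign criterion
# 29 of 48 participants with three positive weights, assumed chance = .125.
p_all_three = binomial_upper_tail(29, 48, 0.5 ** 3)
```

Calling `cues_used(weights, minimum=0.10)` applies the stricter effect size criterion to the same weights, which is why fewer participants count as integrating all three cues under that rule.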
In Experiment 4, the majority of participants integrated two or three cues in their JOLs. This showed that people can integrate nonorthogonal cues that vary on a continuum. However, the number of participants that integrated all available cues in their JOLs was lower in Experiment 4 than in Experiment 3 (according to both classification criteria), indicating that nonorthogonal cues that vary on a continuum are more difficult to integrate than orthogonal cues that vary in two discrete levels. Nevertheless, Experiment 4 showed that, in representative designs, additive integration of multiple cues in JOLs occurs.
Our finding that emotionality affected JOLs when manipulated on a continuum is somewhat at odds with Hourihan et al.’s (2017) finding that JOLs were sensitive to word frequency but insensitive to valence and arousal when all three cues were manipulated on a continuum. Importantly, in our Experiment 4, effects of font size, concreteness, and emotionality on JOLs remained when we included word frequency in the analysis. Given that ranges of word frequency were comparable in Hourihan et al.’s and in our experiment, differences in results are probably due to the fact that Hourihan et al. decomposed emotionality into arousal and valence. Perhaps the higher salience of emotionality as a whole fosters its integration in JOLs (for salience effects on JOLs, see Castel, McCabe, & Roediger, 2007; Dunlosky & Matvey, 2001; Koriat et al., 2014). However, regardless of the specific reasons for the discrepancy in findings, our results are compatible with Hourihan et al.’s conclusion that emotionality affects JOLs through a cognitive rather than a physiological mechanism. Similarly, our conclusion that emotionality can affect JOLs when manipulated on a continuum holds in the light of Hourihan et al.’s findings.
General discussion
In the current study, we investigated whether and to what degree multiple cues combine to affect JOLs. The question of cue integration has been a major focus in judgment research, but has received relatively little attention in metacognition research. In fact, it was mostly tangential to the focus of prior JOL studies (for notable exceptions, see Hertzog et al., 2013; Hines et al., 2015; Tauber & Rhodes, 2012). In four experiments, we simultaneously varied up to four intrinsic and extrinsic cues of diverse validities in both orthogonal (Experiments 1–3) and representative designs (Experiment 4).
At the aggregate level, every single cue that was manipulated affected JOLs. This pattern of results closely mirrored findings from JOL experiments that manipulated each cue in isolation. In contrast, only three of the cues (number of study presentations, concreteness, and emotionality) consistently affected recall performance. The remaining cue, font size, affected recall performance in Experiment 3 but not in Experiments 1 or 4. With the exception of Price et al. (2016), previous studies found that font size did not affect recall performance (e.g., Kornell et al., 2011; Rhodes & Castel, 2008). This demonstrates that cue integration held regardless of whether cues had similar effects on JOLs and actual memory or not. In the current research, another difference between predicted and actual memory was that concreteness and emotionality interacted to influence JOLs but not recall performance in Experiments 1 and 4.
Crucially, although effects of two or more simultaneously varied cues on JOLs at the aggregate level point to cue integration, such findings do not strictly warrant the conclusion that individual participants integrate cues in their JOLs. The reason is that significant effects of two or more cues at the aggregate level may also occur if individual participants based their JOLs on only a single cue and different participants based their JOLs on different cues. However, individual-level analyses on the basis of mean differences and effect size measures argued against this possibility. In all experiments, about half of the participants or more based their JOLs on two or more cues, depending on the criterion used to classify participants. One important finding of the present study was that increasing the number of manipulated cues from two to four did not impair cue integration in JOLs. In contrast, a representative design with nonorthogonal cues that varied on a continuum reduced cue integration in JOLs as compared to orthogonal designs with only two cue levels.
In addition to demonstrating that metacognitive judgments integrate information from multiple cues, the present study has relevance for other theoretical issues. First, when manipulating a single cue in two easily distinguishable levels, strong demand characteristics might threaten the validity of JOLs. Specifically, cue effects on JOLs might be based on participants’ hypotheses about what pattern of recall predictions would make a success of the experiment (Orne, 1962; see also Mueller & Dunlosky, 2017). The current Experiment 3 alleviates this concern in demonstrating that effects of four different cues on JOLs persisted when manipulated in the context of multiple varying cues. Even more compelling evidence came from Experiment 4, in which three nonorthogonal cues that varied on a continuum were integrated in JOLs. Although we cannot rule out the possibility that demand effects played a role in Experiments 3 and 4, we think that varying multiple cues greatly reduces the danger of demand effects on JOLs.
As a related point, the availability of multiple cues may affect the extent to which explicit beliefs about memory govern JOLs. Some previous studies that manipulated single cues such as concreteness (Witherby & Tauber, 2016) or font size (Mueller et al., 2014) have found that metamemory beliefs were the sole basis of JOLs (but see, e.g., Besken & Mulligan, 2013; Undorf & Erdfelder, 2015; Undorf & Zander, 2017; Undorf, Zimdahl, & Bernstein, 2017). These findings support the notion that people deliberately search for variability across items and base their JOLs on activated or newly formed beliefs about how item characteristics or experimental manipulations may affect memory (analytic processing theory; Dunlosky & Tauber, 2014; Mueller et al., 2013). It remains to be seen how analytic processing theory may fare in experiments with multiple varying cues, in which relevant beliefs are probably harder to activate or develop. Perhaps the availability of multiple cues fosters the reliance of JOLs on nonanalytic, experience-based processes such as fluency (Koriat, 1997; see also Undorf & Erdfelder, 2015). This speculation awaits further research.
Finally, our experiments touched on the idea that JOLs are sensitive to intrinsic cues but insensitive to extrinsic cues (Koriat, 1997). The finding that JOLs equally integrated intrinsic and extrinsic cues converges with previous results that questioned this hypothesis (Dunlosky & Matvey, 2001; Jang & Nelson, 2005).
The current study clearly demonstrated that people have the ability to integrate multiple cues in JOLs. Still, much remains to be learned about cue integration in metacognitive judgments. For instance, it will need to be explored whether integrating multiple cues fosters JOL accuracy. In the current study, we found a positive correlation between resolution and the number of cues used as bases for JOLs in Experiment 4 (r = .34, p = .017, d = 0.37). In Experiments 1 (r = .25, p = .08, d = 0.26), 2 (r = .04, p = .752, d = 0.04), and 3 (r = .21, p = .152, d = 0.21), however, correlations were numerically positive but not statistically different from zero. Clearly, more research is needed to determine whether cue integration has positive effects on JOL accuracy.⁴ Also, it would be worthwhile to more thoroughly examine the limits of cue integration in JOLs. Moreover, it remains to be determined what factors influence how much weight people give to individual cues when integrating multiple cues in their JOLs. Finally, future research may explore how task features and individual differences affect cue integration in JOLs.
In summary, the current work demonstrated that people integrate multiple cues in JOLs. Theoretically, these findings are relevant for understanding how JOLs are formed. Practically, the present results advance knowledge about metacognition in everyday learning, where multiple potentially relevant cues are available. Also, they demonstrate parallels between metacognitive judgments and judgments about the external world and may therefore contribute to developing an integrative theoretical view of cognition and metacognition.
While most cues can be easily classified as either intrinsic or extrinsic in the sense of Koriat’s (1997) cue-utilization theory of JOLs, classification of font size is ambiguous (see also Dunlosky & Matvey, 2001). Considering font size as an intrinsic cue is justified, in one sense, by the fact that it concerns properties inherent to the item itself (Rhodes, 2016; Rhodes & Castel, 2008). However, font size fits at least as well in the category of extrinsic cues inasmuch as it is a feature that is randomly assigned to items and therefore specific to particular study conditions. For the present study, we reserve the term intrinsic for cues that are inseparable from the item (such as concreteness or emotionality) and therefore consider font size an extrinsic cue. Importantly, the classification of font size is not critical to our general conclusion.
In their analyses, Koriat et al. (2014) did not differentiate between related and unrelated word pairs but between pairs with below-median and above-median self-paced study time. We interpret this median split as indicative of relatedness because several studies demonstrated shorter study time for related pairs than for unrelated pairs (e.g., Undorf & Ackerman, 2017; Undorf & Erdfelder, 2015).
One recall sheet was lost before recall performance was coded.
We thank an anonymous reviewer for suggesting this analysis.
This research was supported by a starter grant from the University of Mannheim to the first and second authors. Preparation of this article was completed in part while the first author was a visiting scholar in the Faculty of Industrial Engineering and Management of the Technion−Israel Institute of Technology, Haifa, Israel. We thank Rakefet Ackerman, Edgar Erdfelder, Ido Erev, and Asher Koriat for helpful comments on this research.
- Anderson, N. H. (1981). Foundations of information integration theory. New York, NY: Academic Press.
- Bröder, A. (2000). A methodological comment on behavioral decision research. Psychologische Beiträge, 42(4), 645–662.
- Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.
- Dunlosky, J., & Matvey, G. (2001). Empirical analysis of the intrinsic-extrinsic distinction of judgments of learning (JOLs): Effects of relatedness and serial position on JOLs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(5), 1180–1191. https://doi.org/10.1037/0278-7393.27.5.1180
- Dunlosky, J., & Tauber, S. K. (2014). Understanding people’s metacognitive judgments: An isomechanism framework and its implications for applied and theoretical research. In T. J. Perfect & D. S. Lindsay (Eds.), The Sage handbook of applied memory (pp. 444–464). Los Angeles, CA: Sage.
- Fasolo, B., McClelland, G. H., & Lange, K. A. (2005). The effect of site design and interattribute correlations on interactive web-based decisions. In C. P. Haugtvedt, K. A. Machleit, & R. F. Yalch (Eds.), Online consumer psychology: Understanding and influencing consumer behavior in the virtual world (pp. 325–342). Mahwah, NJ: Erlbaum.
- Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple heuristics that make us smart. New York, NY: Oxford University Press.
- Koriat, A., Ackerman, R., Adiv, S., Lockl, K., & Schneider, W. (2014). The effects of goal-driven and data-driven regulation on metacognitive monitoring during learning: A developmental perspective. Journal of Experimental Psychology: General, 143(1), 386–403. https://doi.org/10.1037/a0031768
- Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2016). lmerTest: Tests in linear mixed effects models (R Package Version 2.0-33) [Computer software]. Retrieved from https://CRAN.R-project.org/package=lmerTest
- Mueller, M. L., Dunlosky, J., Tauber, S. K., & Rhodes, M. G. (2014). The font-size effect on judgments of learning: Does it exemplify fluency effects or reflect people’s beliefs about memory? Journal of Memory and Language, 70(1), 1–12. https://doi.org/10.1016/j.jml.2013.09.007
- R Core Team. (2016). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.
- Rhodes, M. G. (2016). Judgments of learning: Methods, data, and theory. In J. Dunlosky & S. K. Tauber (Eds.), The Oxford handbook of metamemory (pp. 65–80). New York, NY: Oxford University Press.
- Soderstrom, N. C., & McCabe, D. P. (2011). The interplay between value and relatedness as bases for metacognitive monitoring and control: Evidence for agenda-based monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1236–1242. https://doi.org/10.1037/a0023548
- Witherby, A. E., & Tauber, S. K. (2016). The concreteness effect on judgments of learning: Evaluating the contributions of fluency and beliefs. Memory & Cognition, 639–650. https://doi.org/10.3758/s13421-016-0681-0