The font size effect is a widely studied metamemory illusion, in which larger fonts elicit higher judgments of learning (JOLs; i.e., self-estimates of one’s future memory) but not better memory performance than smaller fonts (Rhodes & Castel, 2008). This effect has been replicated in many experiments, mostly with unrelated word lists (e.g., Halamish, 2018; Hu et al., 2016; Kornell et al., 2011; Mueller et al., 2014; Rhodes & Castel, 2008; Susser et al., 2013; Tatz & Peynircioğlu, 2020; Undorf & Zimdahl, 2019; Yang et al., 2018), and occasionally with related or unrelated word pairs (e.g., Chen et al., 2019; Double, 2019; Luna et al., 2018; Price & Harrison, 2017; Rhodes & Castel, 2008; Wang et al., 2020). A recent meta-analysis of the font size effect (Chang & Brainerd, 2022) pointed out that related word lists were so far a missing type of study materials in this line of research. This type of material, however, can provide important theoretical and empirical implications for the font size effect. Whereas prior studies showed that the JOL effects of font size are not mitigated by other item-specific cues like intra-item relation (e.g., Price & Harrison, 2017; Rhodes & Castel, 2008), it is unknown whether the same pattern holds for relational cues like inter-item relation. Thus, the use of related lists can provide unique insights into how people integrate item-specific cues like font size versus relational cues like inter-item relation in making JOLs. Also, because to-be-remembered information is often embedded in a web of semantic relations in real life, it is critical to investigate the font size effect under similar conditions.

Given that related lists can provide instructive findings about the font size effect while they have not yet figured in this line of research, the current study aimed to fill this gap by examining the JOL and recall effects of font size with both related and unrelated word lists. Below, we first highlight the difference between intra-item relations and inter-item relations in metamemory research. For the reader’s convenience, we will refer to them as pair relatedness and list relatedness in the following text, respectively. Then, we delineate some theoretical hypotheses about the JOL and recall effects of font size under conditions of list relatedness. Last, we briefly explain the three experiments in the current study.

Pair relatedness versus list relatedness in metamemory research

In the metamemory literature, pair relatedness (i.e., the relation between cue and target within a word pair) is a well-established cue that affects both JOLs and recall (e.g., Koriat, 1997; Mueller et al., 2013; Undorf & Erdfelder, 2015). Related word pairs (e.g., duke–prince) robustly elicit higher JOLs and better recall than unrelated word pairs (e.g., brush–cube). The present study targeted a different type of item relatedness, which is relations among the items on lists of single words. We refer to this as list relatedness (as opposed to pair relatedness).

According to Koriat’s (1997) cue-utilization account of JOLs, list relatedness and pair relatedness are distinct types of cues. In this account, people base their JOLs on three types of cues: intrinsic cues that pertain to the characteristics of study items, extrinsic cues that pertain to study conditions and encoding operations, and mnemonic cues that pertain to subjective encoding experience. As a word pair is a composite item during encoding, the relation between the cue and target within a word pair is a classic intrinsic cue. However, relations among items of a single-word list would be extrinsic cues, because they stimulate relational processing across items, which is an encoding operation applied by learners (Matvey et al., 2006).

Although the JOL effects of list relatedness have been studied less frequently than those of pair relatedness, list relatedness has also been found to affect JOLs. For instance, in Matvey et al.’s (2006) study, participants made item-by-item JOLs for categorized lists and unrelated lists, followed by free recall. JOLs were higher for categorized lists than for unrelated lists, although the effect was smaller than that of pair relatedness. Meanwhile, list relatedness, like pair relatedness, also affects recall accuracy (see also Kausler, 1974; Mandler, 1967), and the effect of list relatedness on actual recall is larger than its effect on JOLs. Moreover, Hourihan and Tullis (2015) showed that JOLs were sensitive to category size: JOLs were higher for categorized lists with 12 exemplars than for those with four exemplars, tracking the pattern in actual recall. In brief, there is an important theoretical difference between pair relatedness and list relatedness, and prior studies provide some evidence that list relatedness, like pair relatedness, affects both JOLs and recall.

Do the JOL effects of font size persist under conditions of list relatedness?

A few studies show that the font size effect occurs for both related and unrelated word pairs (Price & Harrison, 2017; Rhodes & Castel, 2008; Wang et al., 2020). In particular, Price and Harrison (2017) and Rhodes and Castel (2008) found that when font size and pair relatedness were factorially manipulated, JOLs were higher for both the larger fonts and the related word pairs. Moreover, the effects of font size and pair relatedness did not interact. Notably, pair relatedness is a more diagnostic cue than font size: Related word pairs robustly elicit both higher JOLs and better recall than unrelated pairs (e.g., Koriat, 1997; Mueller et al., 2013; Undorf & Erdfelder, 2015), whereas larger fonts only increase JOLs relative to smaller fonts (see Footnote 1). Thus, these results show that the JOL effects of font size remain robust when pair relatedness is present, even though font size is a less diagnostic cue than pair relatedness.

However, whether the pattern is the same with list relatedness remains an open question. On the one hand, it is possible that the JOL effects of font size will not be mitigated by list relatedness. As mentioned above, according to Koriat’s (1997) cue-utilization account, pair relatedness is an intrinsic cue whereas list relatedness is an extrinsic cue. Because extrinsic cues usually have weaker influences on JOLs (Koriat, 1997), it is reasonable to expect that the effects of font size on JOLs, which are not modified by pair relatedness (an intrinsic cue), would also not be modified by list relatedness (an extrinsic cue). Additionally, the cue-integration framework (Peynircioğlu & Tatz, 2019; Undorf & Bröder, 2020; Undorf et al., 2018) also predicts that the font size effect will persist under conditions of list relatedness. According to this account, people simultaneously process and integrate multiple cues when making JOLs, and hence, JOLs are sensitive to multiple cues if these cues affect JOLs when manipulated in isolation. In support of this prediction, prior studies have shown that the effect of font size remains robust when other, more diagnostic, cues (e.g., word frequency, word concreteness, pair relatedness) are provided along with font size (Fan et al., 2021; Price & Harrison, 2017; Rhodes & Castel, 2008; Undorf et al., 2018).

On the other hand, there are other reasons to expect that list relatedness might influence the JOL effects of font size that are rooted in the nature of encoding. Here, it is critical to pay attention to the fact that pair relatedness and font size are both item-specific cues that are processed with respect to individual list items, whereas list relatedness is processed across individual list items. In the memory literature, there is often a trade-off between those two types of processing, which are usually referred to as item-specific and relational encoding (e.g., Hunt & Einstein, 1981). Therefore, list relatedness shifts encoding in the direction of relational processing, which may dilute the effects of item-specific perceptual features such as font size. To sum up, there are contrasting theoretical predictions for the effect of font size on JOLs when list relatedness is an available cue: (a) It may still persist because list relatedness is an extrinsic cue that usually has weaker JOL effects than intrinsic cues, and the cue-integration framework predicts that people can simultaneously integrate multiple cues when making JOLs; or (b) it may be reduced by list relatedness, as the relational processing provoked by list relatedness may attenuate the processing of item-specific perceptual features.

Does font size affect recall with related lists?

Although many prior experiments have shown that font size has little-to-no effects on recall, it is important to point out that prior font size experiments relied overwhelmingly on unrelated word lists (e.g., Kornell et al., 2011; Mueller et al., 2014; Rhodes & Castel, 2008; Susser et al., 2013). Thus, these studies leave open the question of whether there is a memory effect (as opposed to a JOL effect) of font size with related word lists.

In the memory literature, some findings suggest that there may indeed be such an effect (Arndt & Reder, 2003; Brainerd et al., 2002). For instance, Arndt and Reder (2003) used different font types for the words on Deese–Roediger–McDermott (DRM; Deese, 1959; Roediger & McDermott, 1995) lists. DRM lists consist of a series of words (e.g., bed, pillow, nap) that are forward associates of a common unpresented word (the critical distractor; e.g., sleep). Thus, DRM lists tend to elicit strong relational encoding, which often leads participants to falsely remember the critical distractor as being on the list. Arndt and Reder observed that presenting DRM list words in different fonts reduced false recognition of critical distractors. This seems to demonstrate that the different perceptual features associated with fonts enhance item-specific processing, which is complementary to the relational processing provoked by the high level of list relatedness.

In that connection, Hunt and his colleagues demonstrated that high levels of engagement of both item-specific and relational processing are most beneficial for memory performance (e.g., Hunt & Einstein, 1981; Hunt & Seta, 1984). Therefore, it is possible that font size manipulations have effects similar to those of font type manipulations: They may strengthen item-specific processing with related lists and thus produce better memory. Note that if font size manipulations did enhance item-specific processing, such effects would be less likely to benefit recall for unrelated lists. This is because related lists favor relational processing, whereas unrelated lists tend to provoke more item-specific processing than relational processing. Thus, any enhancement of item-specific processing induced by font size may be redundant with unrelated lists.

The current study

As discussed above, list relatedness is a distinct type of cue from pair relatedness, and it has not yet figured in research on the font size effect. Moreover, there are contrasting theoretical predictions about whether font size affects JOLs when list relatedness is simultaneously manipulated, and it also remains an open question whether font size affects recall with related lists. The present study was designed to resolve those uncertainties by testing (a) whether the JOL effects of font size are robust under conditions of list relatedness, and (b) whether font size affects recall for related lists. To achieve these goals, we factorially manipulated list relatedness and font size in three free recall experiments. For the font size manipulation, we presented list words in either 18-pt font or 48-pt font. For the list relatedness manipulation, we used both related lists (categorized lists in Experiments 1 and 2 and DRM lists in Experiment 3) and unrelated lists. To vary the salience of list relatedness, we presented related and unrelated lists in a blocked manner in Experiment 1 and in a mixed manner in Experiments 2 and 3.

Experiment 1

Method

Participants

Fifty-one undergraduates (Mage = 19.90, SDage = 1.97) participated in Experiment 1. All participants were compensated with extra course credits. We conducted a sensitivity analysis with G*Power 3.1 (Faul et al., 2007), which suggested that our sample size provided sufficient power (1 − β > .80) to detect a small-sized (ηp2 = .04) main effect of font size or interactions between font size and list relatedness in mixed analyses of variance (ANOVAs).
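For readers who want to relate the ηp2 = .04 benchmark to the effect size metric typically entered into power software, the conventional conversion to Cohen's f is worked out below. This is only an illustration under the standard definition of f; it is not a reconstruction of the authors' G*Power settings, which the text does not report.

```latex
f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}} = \sqrt{\frac{.04}{.96}} \approx .20
```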

Materials

The materials were 64 English vocabulary words, evenly divided into four 16-word lists. Two of the four lists were categorized, with each list including a category label (e.g., animal) and 15 categorical exemplars (e.g., squirrel, cat, elephant) of the given category. The categorized lists were retrieved from Van Overschelde et al.’s (2004) category norms. We used 16-item lists because the four-word categorized lists in Matvey et al. (2006) produced relatively weak effects on JOLs, and Hourihan and Tullis (2015) showed that longer lists exhibited stronger effects. Additionally, we presented the category label as the first word, so as to provoke strong relational processing. The other two lists were semantically unrelated and consisted of words randomly selected from unused categories in Van Overschelde et al. (2004). Within each list, half of the words were presented in 18-pt Arial font whereas the other half were presented in 48-pt Arial font. We controlled word length, word frequency (Durda & Buchanan, 2006), and word concreteness (Brysbaert et al., 2014) between related and unrelated lists and between 18-pt and 48-pt words. The word lists used in all experiments and the relevant attribute values for the lists are available online (https://osf.io/7j2pv).

Procedure

The experiment was constructed in Qualtrics. Each participant studied four 16-word lists, evenly distributed across two blocks, with two related lists in one block and two unrelated lists in the other. The order of the blocks and the order of lists within each block were counterbalanced between participants.
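The counterbalancing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Qualtrics implementation: the list labels and the function name are hypothetical, and the rotation through eight cells (2 block orders × 2 list orders in each block) is an assumption, since the text states only that block order and within-block list order were counterbalanced between participants.

```python
from itertools import product

# Hypothetical list labels; the actual word lists are on the OSF page cited above.
RELATED = ["related_A", "related_B"]
UNRELATED = ["unrelated_A", "unrelated_B"]


def blocked_design(participant_index):
    """Return the two blocks (each a pair of list labels) for one participant."""
    # Assumed scheme: 2 block orders x 2 list orders per block = 8 cells,
    # assigned by rotating through participants.
    cells = list(product([True, False], [False, True], [False, True]))
    related_first, swap_related, swap_unrelated = cells[participant_index % len(cells)]
    related = RELATED[::-1] if swap_related else list(RELATED)
    unrelated = UNRELATED[::-1] if swap_unrelated else list(UNRELATED)
    # Blocked design: one block holds both related lists, the other both unrelated lists.
    return [related, unrelated] if related_first else [unrelated, related]
```

Under this assumed scheme, every run of eight consecutive participants covers each combination of block order and list order exactly once.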

The experiment consisted of two blocks, with the second block immediately following the first. The two blocks followed the same study-buffer-test sequence and differed only in the word lists presented during the study phase. At the beginning of each block, participants were informed that they would study two word lists and would make a JOL for each word after it was presented. They were also informed that they would take a memory test after studying the two lists. During the study phase, all the words were presented at a 2-s rate, and there was a 5-s interval between consecutive lists. After each word was presented for 2 seconds, the word disappeared and participants were asked to make a JOL, that is, to judge the likelihood that they would be able to recall it later, on a 0–100 scale (0 = not likely at all, 100 = very likely). They were required to type a whole number between 0 and 100 and were encouraged to use the full scale. There was no time limit for making JOLs. In each block, after the two word lists were studied, participants worked on a buffer task (simple mathematical problems) for 2 minutes. Then, they were given 3 minutes to complete a free recall test. No feedback was provided for the recall test.

Results

In this and the following experiments, we first report the results for JOLs and then for recall. Although JOL resolution is not of primary interest in the current study, we report those results for archival purposes. Here, JOL resolution indicates whether items that are recalled with higher probabilities are rated with higher JOLs, which is indexed by Goodman–Kruskal gamma correlations between JOLs and recall. All the ANOVAs were conducted with the afex package in R (Singmann et al., 2015), and the Goodman–Kruskal gamma correlations were calculated using the Hmisc package in R (Harrell, 2019). Outliers were identified and removed if they were 1.5 interquartile ranges (IQRs) above or below the median (Höhne & Schlosser, 2018). Based on this criterion, no participant was excluded in Experiment 1.
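The two computations named above can be illustrated with a short Python sketch. It is not the authors' R code: the function names are hypothetical, the gamma function follows the textbook definition (concordant minus discordant pairs, divided by their sum, with tied pairs ignored), and the quartile method used for the IQR is an assumption because the text does not specify one.

```python
from itertools import combinations
from statistics import median, quantiles


def goodman_kruskal_gamma(jols, recalled):
    """Gamma = (C - D) / (C + D) over all item pairs; tied pairs are ignored."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        direction = (j1 - j2) * (r1 - r2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # not calculable, e.g., all items recalled or none recalled
    return (concordant - discordant) / (concordant + discordant)


def flag_outliers(scores, k=1.5):
    """Flag scores more than k IQRs above or below the median."""
    # Assumption: default quartile method of statistics.quantiles.
    q1, _, q3 = quantiles(scores, n=4)
    center, spread = median(scores), k * (q3 - q1)
    return [abs(score - center) > spread for score in scores]
```

The `None` branch shows why gamma can be incalculable for a participant in a given condition: when every item in that condition was recalled (or none was), or when all JOLs are identical, every pair is tied and the denominator is zero.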

JOL results

The descriptive results for JOLs in Experiment 1 are presented in Table 1. The JOLs were submitted to a 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block order: related/unrelated, unrelated/related) mixed ANOVA. The ANOVA revealed a main effect of list relatedness, F(1, 49) = 28.64, MSE = 207.00, ηp2 = .37, p < .001, and a main effect of font size, F(1, 49) = 8.47, MSE = 22.59, ηp2 = .13, p = .005, as JOLs were overall higher for the related lists and the larger font. There was no interaction between list relatedness and font size, F(1, 49) = .00, MSE = 17.75, ηp2 = .00, p = .978.
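As a reading aid, the reported partial eta squared values can be recovered from each F ratio and its degrees of freedom using the standard identity below. The worked case uses the main effect of list relatedness reported above; the identity is general and not specific to this study.

```latex
\eta_p^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2}
         = \frac{28.64 \times 1}{28.64 \times 1 + 49}
         = \frac{28.64}{77.64} \approx .37
```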

Table 1 Means and standard deviations of judgments of learning and recall in Experiment 1

In addition, there was a List Relatedness × Block Order interaction, F(1, 49) = 9.49, MSE = 207.00, ηp2 = .16, p = .003. Post hoc t tests (see Footnote 2) revealed that related lists elicited higher JOLs only when they were presented before unrelated lists, t(49) = 6.57, p < .001, but not when they were presented after unrelated lists, t(49) = 1.48, p = .145. Additionally, there was a List Relatedness × Font Size × Block Order interaction, F(1, 49) = 5.91, MSE = 17.75, ηp2 = .11, p = .019. When the related lists were presented in the first block and the unrelated lists in the second, the JOL effect of font size was not significant in either unrelated lists, t(49) = .32, p = .748, or related lists, t(49) = 2.15, p = .072. However, when the unrelated lists were presented first, there was a significant effect of font size on JOLs in unrelated lists, t(49) = 3.20, p = .010, but not in related lists, t(49) = .96, p = .455. Thus, for lists presented in the first block, font size had a significant JOL effect with unrelated lists but not with related lists. For lists presented in the second block, font size had no JOL effect with either list type.

Recall results

Table 1 also summarizes the descriptive results for the proportion of correct recall in Experiment 1. A 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block order: related/unrelated, unrelated/related) mixed ANOVA for recall revealed a list relatedness main effect, F(1, 49) = 124.99, MSE = .02, ηp2 = .72, p < .001, and a font size main effect, F(1, 49) = 10.35, MSE = .01, ηp2 = .17, p = .002. There was also a List Relatedness × Font Size interaction, F(1, 49) = 28.15, MSE = .01, ηp2 = .36, p < .001, as words in the smaller font were recalled better than words in the larger font in related lists, t(49) = 6.60, p < .001, but not in unrelated lists, t(49) = −1.75, p = .087. Thus, although font size did not affect recall for unrelated lists, as reported in many prior studies (e.g., Hu et al., 2016; Kornell et al., 2011; Rhodes & Castel, 2008; Yang et al., 2018), the smaller font actually led to better recall in related lists.

JOL resolution results

The JOL resolution data can be found in Table 2. A 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block order: related/unrelated, unrelated/related) mixed ANOVA was conducted on JOL resolution, with five participants’ data removed because their gamma correlations were not calculable in at least one condition. The ANOVA revealed a main effect of font size, F(1, 44) = 11.90, MSE = .10, ηp2 = .21, p = .001, as resolution was better for the smaller font than for the larger font. The main effect of list relatedness, F(1, 44) = 2.24, MSE = .10, ηp2 = .05, p = .142, and the List Relatedness × Font Size interaction, F(1, 44) = .73, MSE = .09, ηp2 = .02, p = .396, were not significant.

Table 2 Means and standard deviations of JOL resolution for all experiments by list relatedness and font size

Experiment 2

Experiment 1 showed that font size had a significant JOL effect with unrelated word lists but not with related lists in the first block, and font size had no significant JOL effect with either related or unrelated lists in the second block. Thus, the effects of font size on JOLs seem to be mitigated by list relatedness. Additionally, although font size did not affect recall for unrelated lists, the smaller font produced better recall for related lists.

In Experiment 1, related lists and unrelated lists were presented in a blocked manner, so participants saw a set of lists that were homogeneously related or homogeneously unrelated. In Experiment 2, we increased the salience of list relatedness by presenting related and unrelated lists within each block. Given that the JOL-making process is comparative and inferential in nature, we expected that the mixed-list design would enhance participants’ awareness of variations in list relatedness and prompt them to weigh relatedness more in JOLs. In short, our goal was to further examine the JOL and recall effects of font size when list relatedness is made more salient.

Method

Participants, materials, and procedure

Fifty-one undergraduates (Mage = 19.82, SDage = 1.14) participated in Experiment 2. All participants were compensated with extra course credits. The sample size, materials, and procedure were identical to those of Experiment 1, with only one change: The lists were presented in a mixed rather than blocked manner, such that one related list and one unrelated list were presented in each block.

Results

JOL results

The descriptive results for JOLs in Experiment 2 can be found in Table 3. The JOLs were submitted to a 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block: first, second) × 2 (list order: related/unrelated, unrelated/related) mixed ANOVA. Because Experiment 1 demonstrated considerable differences between blocks, we included block as a factor in the analyses. Additionally, because we presented a related list and an unrelated list in each block, with the order of lists counterbalanced across participants, we also included list order as a factor.

Table 3 Means and standard deviations of judgments of learning and recall in Experiment 2

The ANOVA revealed a main effect of list relatedness, F(1, 49) = 65.17, MSE = 209.89, ηp2 = .57, p < .001, as related lists produced higher JOLs than unrelated lists, and a main effect of block, F(1, 49) = 5.35, MSE = 238.07, ηp2 = .10, p = .025, as JOLs were higher in the first block than the second. The main effect of font size was not significant, F(1, 49) = 2.96, MSE = 107.10, ηp2 = .06, p = .092.

The ANOVA also revealed a List Relatedness × Font Size interaction, F(1, 49) = 4.46, MSE = 37.78, ηp2 = .08, p = .040. Post hoc tests revealed that the larger font only produced significantly higher JOLs for related lists, t(49) = 3.12, p = .010, but not for unrelated lists, t(49) = .36, p = .721. Moreover, there was a List Relatedness × Font Size × Block interaction, F(1, 49) = 4.22, MSE = 21.59, ηp2 = .08, p = .045. Font size affected JOLs for related lists only in the second block, t(49) = 2.64, p = .044, but not in the first block, t(49) = 1.98, p = .106, and font size had no JOL effect for unrelated lists in either block, ts = 1.10 and −.58, ps = .367 and .564.

Additionally, the ANOVA revealed a List Relatedness × Block interaction, F(1, 49) = 4.92, MSE = 161.23, ηp2 = .09, p = .031, as the list relatedness effect was stronger in the second block, t(49) = 14.70, p < .001, than in the first block, t(49) = 9.00, p < .001. There was also a List Relatedness × List Order interaction, F(1, 49) = 10.89, MSE = 209.89, ηp2 = .19, p = .002, as the list relatedness effect was stronger when a related list preceded an unrelated list, t(49) = 16.81, p < .001, than when an unrelated list preceded a related list, t(49) = 7.01, p = .003. Last, there was a List Relatedness × Block × List Order interaction, F(1, 49) = 21.29, MSE = 161.23, ηp2 = .30, p < .001. Specifically, in the first block, the effect of list relatedness was significant when a related list preceded an unrelated list, t(49) = 8.85, p < .001, but not when an unrelated list preceded a related list, t(49) = −.64, p = .524. However, in the second block, the effect of list relatedness was significant regardless of list order, ts = 5.16 and 4.81, ps < .001.

Recall results

The descriptive results for the proportion of correct recall in Experiment 2 are displayed in Table 3. Based on the outlier exclusion criterion described in Experiment 1, one participant with 100% recall accuracy was identified as an outlier and was removed from the analyses. The qualitative pattern of the results remained the same before and after the removal of the outlier. A 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block: first, second) × 2 (list order: related/unrelated, unrelated/related) mixed ANOVA revealed a main effect of list relatedness, F(1, 48) = 242.61, MSE = .03, ηp2 = .84, p < .001, and a main effect of font size, F(1, 48) = 4.24, MSE = .02, ηp2 = .08, p = .045, as recall was higher for related lists and for the smaller font.

There was also a List Relatedness × Font Size interaction, F(1, 48) = 11.57, MSE = .02, ηp2 = .19, p = .001, as the smaller font produced better recall for related lists, t(48) = 3.69, p = .001, but not for unrelated lists, t(48) = −1.02, p = .312. Further, there was a List Relatedness × Block interaction, F(1, 48) = 4.55, MSE = .03, ηp2 = .09, p = .038, and a List Relatedness × List Order interaction, F(1, 48) = 7.00, MSE = .03, ηp2 = .13, p = .011. The list relatedness effect was stronger in the second block, t(48) = 12.30, p < .001, than in the first block, t(48) = 9.61, p < .001, and it was stronger when the related list preceded the unrelated list, t(48) = 14.78, p < .001, than when the order was reversed, t(48) = 8.21, p < .001. In brief, the recall results in Experiment 2 are highly consistent with those of Experiment 1.

JOL resolution results

The JOL resolution data are presented in Table 2. We no longer included list order and block as factors in the ANOVA because splitting data by block and list order would render nearly half of the participants’ gamma correlations incalculable in at least one condition. Thus, a 2 (relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) repeated-measures ANOVA was conducted, with one participant’s data removed as an outlier and three additional participants’ data removed because gamma correlations were not calculable for them in at least one condition. The ANOVA revealed the same pattern as in Experiment 1: There was a main effect of font size, F(1, 46) = 6.28, MSE = .17, ηp2 = .12, p = .016, as resolution was better for the smaller font than for the larger font. The main effect of list relatedness, F(1, 46) = .17, MSE = .15, ηp2 = .00, p = .682, and the List Relatedness × Font Size interaction, F(1, 46) = .29, MSE = .16, ηp2 = .01, p = .592, were not significant.

Experiment 3

Experiment 2 showed that the JOL and recall effects of font size were both modified by list relatedness when list relatedness was manipulated in a mixed-list design. However, we observed a puzzling result: Font size produced a significant JOL effect for related lists in the second block. We suspect that this result is a statistical aberration, considering that the JOL effects of font size were not significant for related lists in Experiment 1 and in the first block of Experiment 2. To resolve the uncertainty, we conducted Experiment 3 to (a) replicate the results of Experiments 1 and 2 and (b) examine whether provoking stronger relational processing would completely eliminate the JOL effects of font size.

To stimulate stronger relational processing in related lists, we replaced categorized lists with DRM lists, which are traditionally administered in false memory experiments. DRM lists produce higher levels of false memory than categorized lists, indicating that they trigger even stronger relational processing that focuses on connecting the gist across individual items (e.g., Brainerd & Reyna, 2005). The reason is that there are multiple overlapping semantic relations among list words (e.g., category membership, situational membership, synonymy) rather than only one type of relation (Brainerd et al., 2008, 2020; Cann et al., 2011). As our principal interest does not lie with false memory, we presented the critical distractor of each DRM list as the first word on the list to enhance relational processing, just as we did with the category labels for the categorized lists in Experiments 1 and 2.

Method

Participants, materials, and procedure

Forty-nine undergraduates (Mage = 19.38, SDage = 1.27) participated in Experiment 3. All participants were compensated with extra course credits. The sensitivity analysis indicated that our sample size provided sufficient power (1 − β > .80) to detect a small effect size (ηp2 = .04) for the main effect of font size or the interaction between font size and list relatedness in mixed ANOVAs. The materials and procedure were the same as in Experiment 2, except for one change: We replaced categorized lists with DRM lists that were chosen from the Appendix of Roediger et al. (2001). Thus, among the four 16-word to-be-remembered lists, half were DRM lists, with each list including a critical distractor and the 15 forward associates of the critical distractor. The other two lists were semantically unrelated lists that consisted of words randomly selected from other unused DRM lists. No participant was excluded in Experiment 3 based on the outlier exclusion criterion described in Experiment 1.

Results

JOL results

The descriptive results for JOLs in Experiment 3 appear in Table 4. We conducted a 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block: first, second) × 2 (list order: related/unrelated, unrelated/related) mixed ANOVA on JOLs. The ANOVA revealed a main effect of list relatedness, F(1, 47) = 13.12, MSE = 242.36, ηp2 = .22, p < .001, whereas there was again no significant main effect of font size, F(1, 47) = .001, MSE = 151.59, ηp2 = .00, p = .975. Additionally, there was a List Relatedness × List Order interaction, F(1, 47) = 33.47, MSE = 242.36, ηp2 = .42, p < .001, and a List Relatedness × List Order × Block interaction, F(1, 47) = 7.34, MSE = 295.89, ηp2 = .14, p = .009. In the first block, JOLs were higher for related lists than for unrelated lists regardless of list order, ts = 5.51 and 3.39, ps < .002. However, in the second block, related lists only elicited higher JOLs when presented before the unrelated lists, t(47) = 3.82, p < .001, but not when presented after the unrelated lists, t(47) = 1.15, p = .258.

Table 4 Means and standard deviations of judgments of learning and recall in Experiment 3

Recall results

Table 4 lists the descriptive results for the proportion of correct recall in Experiment 3. Those data were submitted to a 2 (list relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) × 2 (block: first, second) × 2 (list order: related/unrelated, unrelated/related) mixed ANOVA. The ANOVA revealed a main effect of list relatedness, F(1, 47) = 116.84, MSE = .04, ηp2 = .71, p < .001, and a main effect of font size, F(1, 47) = 7.96, MSE = .02, ηp2 = .14, p = .007. Additionally, there was a List Relatedness × Font Size interaction, F(1, 47) = 5.19, MSE = .03, ηp2 = .10, p = .027, as the smaller font improved recall for related lists, t(47) = 4.67, p < .001, but not for unrelated lists, t(47) = −.52, p = .609. Again, the recall results in Experiment 3 were highly consistent with Experiments 1 and 2.

JOL resolution results

The descriptive data for JOL resolution are presented in Table 2. Four participants’ gamma correlations were not calculable in at least one condition, and thus, their data were removed from the analyses. Owing to the same reason described in Experiment 2, list order and block were not included as factors in the ANOVA for JOL resolution. A 2 (relatedness: related, unrelated) × 2 (font size: 18 pt, 48 pt) repeated-measures ANOVA revealed a main effect of relatedness, F(1, 44) = 8.74, MSE = .13, ηp2 = .17, p = .005, as JOL resolution was worse for related lists than for unrelated lists. The main effect of font size, F(1, 44) = .05, MSE = .13, ηp2 = .00, p = .819, and the List Relatedness × Font Size interaction, F(1, 44) = 2.00, MSE = .12, ηp2 = .04, p = .165, were not significant.

General discussion

In the present study, we factorially manipulated list relatedness and font size and examined (a) whether the JOL effects of font size persist when list relatedness is varied, and (b) whether font size affects recall accuracy with related word lists. The effects of font size on JOLs and recall in Experiments 1–3 are summarized in Table 5. Across these experiments, we found that both the JOL and recall effects of font size were modified by list relatedness. For JOLs, the effects of font size were moderated or eliminated when list relatedness was varied simultaneously. For recall, the effects of font size were only found for related lists. Below, we first discuss the JOL results and their implications. Then, we turn to the recall results. Last, we make some recommendations for future research on the font size effect and metamemory in general.

Table 5 Summary of the effects of font size on JOLs and recall in Experiments 1–3

The JOL effects of font size depend on list relatedness

In Experiment 1, list relatedness was manipulated in a blocked-list design such that there were either two categorized lists or two unrelated lists in each block. As can be seen in Table 5, in the first block of Experiment 1, we found that the larger font led to significantly higher JOLs for unrelated lists but not for categorized lists, and the JOL effects of font size disappeared in the second block. Experiments 2 and 3 provide additional support that the JOL effects of font size are mitigated by list relatedness. In those two experiments, lists were presented in a mixed manner, as a related list (categorized or DRM list) and an unrelated list were intermixed in each block. Such a mixed-list design should highlight the variability in list relatedness, and thus list relatedness should be a more salient cue than in the blocked-list design. As shown in Table 5, there was no effect of font size on JOLs for either related or unrelated lists in the first block of both Experiments 2 and 3. However, in the second block of Experiment 2, we observed an interaction between font size and list relatedness, as the larger font led to higher JOLs for related lists but not for unrelated lists. As we did not replicate the interaction in Experiment 3, we think it is most likely that this interaction was merely a statistical aberration.

To sum up, the three experiments provided converging evidence that the JOL effects of font size depend on list relatedness: The effect is either moderated or eliminated when list relatedness is manipulated in parallel with font size; that is, differences in relatedness remove some or all of the variance that is normally due to differences in font size. To the best of our knowledge, related word lists have not figured in prior research on the font size effect. Thus, our study provides novel evidence that list relatedness may be a boundary condition for this effect. Moreover, our study supports the notion that there is a trade-off between item-specific and relational processing when making JOLs. Specifically, the availability of a relational cue (list relatedness) reduced the effects of an item-specific cue (font size). This has important implications for the cue-integration framework of JOLs (Peynircioğlu & Tatz, 2019; Undorf & Bröder, 2020; Undorf et al., 2018). To recap, this account predicts that people can integrate multiple cues when making JOLs, and thus JOLs should reflect the effects of multiple cues simultaneously if those cues affect JOLs when manipulated in isolation. In that connection, prior findings that the JOL effects of font size persist when font size and pair relatedness are manipulated simultaneously are in harmony with this account (e.g., Price & Harrison, 2017; Rhodes & Castel, 2008). However, our results showed that list relatedness could eliminate the JOL effects of font size. This suggests that even if some cues have robust JOL effects when manipulated individually, their effects can be overshadowed by the effects of other preferentially processed cues that are simultaneously present. Specifically, there is likely a trade-off between item-specific cues like font size and relational cues like list relatedness, in that the effects of some item-specific cues can be mitigated by the presence of relational cues.

The recall effects of font size depend on list relatedness

Across the three experiments, we found a robust recall effect of font size. Specifically, the smaller font always led to better recall than the larger font for related lists but not for unrelated lists. Here, although the classic finding is that font size affects JOLs but not recall, we stress that most prior studies used unrelated word lists as study materials (for a review, see Chang & Brainerd, 2022). On the one hand, our findings were consistent with the prior findings that font size has minimal effects on recall for unrelated lists. On the other hand, our results demonstrated that font size can affect recall for related lists.

The recall advantage of the smaller font has also been reported in some prior studies (Halamish, 2018; Undorf & Zimdahl, 2019). According to Halamish (2018), smaller fonts may improve memory because they are processed less fluently than larger fonts, which triggers more effortful processing. In that connection, many studies provide evidence that smaller fonts are processed less fluently than larger fonts (Susser et al., 2013; Undorf et al., 2017; Yang et al., 2018). Therefore, it is likely that smaller fonts produce a “desirable difficulty” effect (Bjork, 1994): They reduce processing fluency and provoke more thorough processing, which in turn benefits memory performance.

Whereas Halamish (2018) reported that the recall benefit of smaller fonts was not robust and was eliminated by the solicitation of JOLs, we found that the smaller font consistently led to better recall even when JOLs were administered. Again, the different patterns may be due to the difference in study materials, as Halamish used pure unrelated lists whereas we used both related and unrelated lists. A possible reason for our more robust recall effect for the smaller font with related lists is that the item-specific processing triggered by the smaller font complements the relational processing triggered by list relatedness. As suggested by Hunt and his colleagues, memory is optimal when encoding is characterized by high levels of both item-specific and relational processing (e.g., Hunt & Einstein, 1981; Hunt & Seta, 1984). While related lists clearly favor relational processing, unrelated lists primarily provoke item-specific processing. Thus, the enhancement of item-specific processing induced by the smaller font should be complementary for related lists but not for unrelated lists.

The finding that the smaller font produced better recall for related lists has important practical implications. In daily life, to-be-remembered information is often presented as part of related materials. For instance, students need to remember multiple concepts that are relevant to a topic in textbooks, and advertisers list multiple advantages of a product in advertisements. In such circumstances, particularly important information is often highlighted in a larger font. However, this may not optimize memory, as our results suggest that the smaller font enhanced memory for related items relative to the larger font. Further, Experiments 1 and 2 also showed that the smaller font produced higher JOL resolution, suggesting that it also helps people make more accurate estimates of their future memory. This is also potentially beneficial for memory outcomes, as metacognitive monitoring can guide people’s study strategies (Nelson & Leonesio, 1988; Thiede & Dunlosky, 1999). Last but not least, it is worth mentioning that although some prior studies suggest that perceptual manipulations that reduce processing fluency (e.g., disfluent font type, blurring) produce desirable difficulty effects (Diemand-Yauman et al., 2011; Geller et al., 2018; Rosner et al., 2015; Sungkhasettee et al., 2011), there is also contrary evidence (Huff et al., 2022; Taylor et al., 2020; Yue et al., 2013). Thus, whether smaller fonts produce robust memory benefits requires further examination.

Future directions

The current study examined the font size effect under the novel condition of list relatedness (as opposed to pair relatedness). An obvious next step would be to investigate the same questions with more complex materials. For example, future research could focus on textual materials. In that regard, Luna et al. (2019) examined the font size effect with related and unrelated sentences and found the standard pattern in which JOLs were higher for larger-font sentences, but cued recall was not affected by font size. However, Luna et al. manipulated relatedness within sentences rather than across sentences, so their relatedness manipulation was still focused on item-specific rather than relational processing. Thus, it is an open question whether the results remain the same when relatedness is manipulated across sentences. Additionally, Luna et al. manipulated font size at the sentence level rather than at the word level. For the sake of comparability with the extant literature and ecological validity, it is important to discover whether results change when font size is manipulated for specific words within the sentences.

It would also be instructive to manipulate a wider range of font sizes. As Chang and Brainerd’s (2022) meta-analysis showed, the memory effects of font size vary with the specific font sizes being compared, such that results differ when very small fonts are compared to intermediate-size fonts versus when the smaller and larger fonts are both in the intermediate range. Therefore, it is particularly important to identify which font sizes produce the optimal memory outcomes when compared to the most conventionally used font size in typographical practices.

Last and perhaps most important, future studies should aim to develop a more coherent theoretical framework for JOLs that weights different types of cues differently in cue integration. In our studies, font size was deemphasized in the JOL process when list relatedness was simultaneously varied, suggesting that relational cues were preferentially processed relative to certain item-specific cues. Why? An obvious possibility is that list relatedness is more diagnostic of eventual memory accuracy than font size. However, prior studies demonstrated that the JOL effects of font size are still robust when other more diagnostic cues are manipulated. For instance, Price and Harrison (2017) and Rhodes and Castel (2008) showed that font size and pair relatedness had independent effects on JOLs, and Undorf et al. (2018) showed that font size, number of study presentations, concreteness, and emotionality all affected JOLs, and their effects did not interact. Thus, it seems that diagnosticity is not sufficient for a cue to be preferentially processed. It would be important to (a) recognize that cues are not necessarily weighted equally during cue integration, and (b) investigate which cues are naturally weighted more heavily than the others.