Introduction

Over recent years, there has been increasing scientific interest in the potential benefits of reading fiction for a range of psychological outcomes. Whilst the majority of empirical research has tested the idea that reading fiction promotes social cognitive abilities, including theory of mind and empathy (e.g. Bal & Veltkamp, 2013; Djikic et al., 2013b; Kidd & Castano, 2013), and related outcomes such as moral cognition (e.g. Johnson et al., 2013; Koopman, 2015), benefits of reading fiction have also been reported for other outcomes, for instance need for cognitive closure (Djikic et al., 2013a), creativity (Black & Barnes, 2021), and personality change (Djikic et al., 2009). Across outcome variables, the majority of experimental studies have investigated the effects of reading short fictional narratives. A meta-analysis concluded that such experiments on average yield small-sized benefits for social cognition (Dodell-Feder & Tamir, 2018).

However, recently proposed models by Consoli (2018) and Mar (2018), which outline the conditions under which reading (fictional) narratives can lead to psychological benefits, call the validity of these findings into question. According to these models, effects in the sense of deep learning are predicted only after frequent exposure to fictional stories over a prolonged period of time, not after a single brief reading assignment. One way to test this prediction is to investigate the effects of lifetime exposure to print fiction: instead of testing outcomes after a reading task, researchers could assess the amount of fiction participants have read in their lifetime so far and examine whether this amount is associated with the purported benefits. This research agenda requires valid indicators of lifetime fiction reading. In this article, we report a study investigating the validity of three such indicators.

According to a meta-analysis by Mol and Bus (2011), the most frequently applied indicators of print exposure are self-report scales, book counting, and author recognition tests. With self-report scales, participants report on their own reading preferences and/or habits, often by responding to items on a rating scale. Prior to the publication of the Self-Report Habit Index for Reading (SRHI-R; Schmidt & Retelsdorf, 2016), empirically validated self-report instruments did not exist, so researchers relied on bespoke scales (for examples see Acheson et al., 2008; Spear-Swerling et al., 2010). Schmidt and Retelsdorf (2016) reported some evidence for the criterion-related validity of the SRHI-R, as this questionnaire was a stronger predictor of reading achievement and decoding speed than self-reported reading frequency. However, the SRHI-R was no longer a significant predictor of reading achievement when intrinsic reading motivation was included as a predictor. Furthermore, the SRHI-R assesses general reading habits, not fiction reading specifically. Hence, in the present study we used a bespoke single-item self-report scale. We decided on a single item because a multi-item instrument would have required additional piloting to clarify its factor structure.

Book counting, understood as the number of fiction (or non-fiction) books in one's home (in other words, the size of one's home library), is typically used within large sociological surveys as an indicator of the home literacy environment. Within Bourdieu's (1984) classic cultural reproduction theory, home libraries are regarded as a component of a family's cultural capital. Elite families are thought to provide their children with high-status cultural signals, including a large home library, in order to convince teachers of their children's academic excellence. This is supposed to motivate teachers to give these children extra pedagogical support and thereby secure educational benefits in the long run. More contemporary views, such as scholarly culture theory (Evans et al., 2010), assume that raising children in "bookish" environments, including large home libraries, provides the foundation of their trait-level tastes, skills, and knowledge, which is expected to promote future educational and occupational achievements. In sum, both cultural reproduction theory and scholarly culture theory predict that the number of books in one's home is positively linked with reading skills. An important methodological limitation of book counting is that previous studies have not counted participants' books precisely but asked for rough estimates. For instance, in the study reported by Sikora et al. (2019), participants chose from the following response options: 10 books or fewer; 11 to 25 books; 26 to 100 books; 101 to 200 books; 201 to 500 books; more than 500 books. To increase precision and obtain an indicator of fiction exposure specifically, the present study asked participants to count the fiction books in their homes and provide the exact figure.

Author recognition tests (ARTs), first introduced by Stanovich and West (1989), attempt to provide an objective measure of lifetime print exposure. In this task, respondents must identify the real authors from a list of names that includes both real authors and non-authors (so-called foils). The more authors are accurately recognised, the higher the estimated lifetime print exposure. The presumed relation between author recognition and reading amount rests on the assumption that people encode author names for the texts they read; thus, the more they read, the more author names they should recognise. However, test scores are culturally and temporally sensitive, meaning that recognition of authors varies considerably across countries and even over short periods of time (McCarron & Kuperman, 2021; Moore & Gordon, 2015). This demonstrates the need to regularly develop updated versions for given cultural contexts. Therefore, in the present study we applied the Author Recognition Test–Genres (ART-G; Mar & Rain, 2015), as it is the most recent version and provides separate scores for exposure to fiction and non-fiction.

The above-mentioned meta-analysis by Mol and Bus (2011) provides some evidence on the validity of print exposure measures. It synthesised 99 studies investigating leisure reading among preschoolers, kindergartners, school children, and higher education students. Based on, first, the inter-correlations of different types of measures and, second, the correlations of these measures with reading skills, the authors concluded that print exposure checklists (e.g. ARTs) and book counting have better validity than self-report measures, the latter being particularly prone to social desirability biases. Despite the importance of its findings, this meta-analysis was published a decade ago and is limited to child, adolescent, and young adult samples. Hence, we conducted a comprehensive and updated literature search that also considered middle-aged and older adults.

In order to determine criterion-related validity, understood as the degree to which a measure is associated with a behavioural manifestation of the construct to be measured, we chose vocabulary, defined as word knowledge, as the criterion for print exposure. This is because word knowledge is regarded as a central component of reading comprehension (e.g. Perfetti & Stafura, 2014), implying that vocabulary should improve as a result of frequent print exposure (Cunningham, 2005). A database search was carried out using PsycINFO and Web of Science, with the search terms "(author recognition test OR print exposure OR home literacy environment OR reading frequency OR leisure time reading) AND (vocabulary OR word knowledge)". Further studies were identified via reference lists and article recommendations on journal webpages. In total, we identified 117 studies which reported concurrent correlations between at least one print exposure index and vocabulary in participants' primary language, and which were not included in the meta-analysis by Mol and Bus (2011). Table 1 provides an overview including correlation coefficients with vocabulary; for an extended version also listing inter-correlations of print exposure measures and correlations involving measures of divergent validity, see https://osf.io/ytudn/.

Table 1 Overview of studies that provide information on the validity of print exposure measures in terms of concurrent correlations with primary language vocabulary; not included are studies synthesised by Mol and Bus (2011).

The vast majority of investigations (82 of the 117 studies in Table 1) have studied child samples up to primary school age, whereas studies with adolescents of secondary school age (8 of 117) and young or middle-aged adults (25 of 117) are less frequent. In addition, there is an apparent lack of studies with older adults (2 of 117: Payne et al., 2012; Veldre et al., 2021). Investigating this population is especially informative for recent models on the effects of reading (fictional) narratives by Consoli (2018) and Mar (2018), since the amount of fiction read accumulates over the lifespan, so that effects of fiction exposure should be larger in older adults than in younger samples.

Regarding criterion-related validity, the following correlation coefficients between print exposure measures and vocabulary have been observed (see Table 1): for ARTs, 34 correlation coefficients range between .05 (yes/no vocabulary test in Brysbaert et al., 2020, Study 1) and .70 (print exposure index composed of ART and magazine recognition test in Schmidtke et al., 2018), with an interquartile range of .34 to .58. For book count, 71 correlation coefficients range between −.12 (Peeters, Verhoeven, de Moor, et al., 2009a) and .63 (home literacy environment index including book count and parent-reported frequency of reading to child in Griffin & Morrison, 1997), with an interquartile range of .12 to .38. For self-report measures (or parent-report measures in case of younger child samples), 257 correlation coefficients range between −.36 (correlation controlled for age in Grant, 2012, Study 2) and .78 (Pratheeba & Krashen, 2013), with an interquartile range of .06 to .31. Taken together, the pattern of correlations confirms the earlier conclusions by Mol and Bus (2011) that ARTs and book counting have better validity than self-report indicators. Beyond this, the current literature review seems to suggest that ARTs have even better criterion-related validity than book counting.

However, the majority of extant work has relied on a combination of self-report scales with either author recognition tests or book counting, whereas interrelations between all three indicators have rarely been tested; Table 1 shows that only four of the 117 studies included all three types of indicators (namely Burris et al., 2019; Grolig et al., 2017, 2019; and Zhang et al., 2018), and that all of these studies worked with child samples. Moreover, these studies applied title recognition tests instead of ARTs, so they do not speak to ARTs directly. For studies applying more than one type of index, the following inter-correlations were reported (see https://osf.io/ytudn/): 15 correlation coefficients addressing the association between ARTs and self-/parent-report measures range between .03 (time spent reading online in Chen & Fang, 2015) and .50 (frequency of reading for pleasure in Lee et al., 1997), with an interquartile range of .16 to .41; and 55 correlation coefficients concerning the relation between self-/parent-report measures and book count range between −.02 (Torppa et al., 2007) and .73 (score containing book count and parent-reported frequency of shared reading in O'Brien et al., 2020), with an interquartile range of .20 to .46. No studies reported the association of book count with ARTs. It would therefore be important to assess construct validity, i.e. whether the three types of indicators measure the same or different constructs, more extensively.

A previously understudied question is whether print exposure measures have divergent validity, that is, whether they correlate only weakly with measures of theoretically unrelated constructs (Campbell & Fiske, 1959). In fact, the assessment of divergent validity should be an integral part of any validation process (see also Hodson, 2021): strictly speaking, assumptions about a measure's convergent validity rest on a comparison of associations with indicators reflecting similar constructs on the one hand and associations with indicators reflecting dissimilar constructs on the other. A measure is said to have good convergent validity if the former associations are considerably higher than the latter. Nevertheless, within the 117 studies listed in Table 1, we identified only 68 correlation coefficients between vocabulary on the one hand and constructs thought to be associated with vocabulary to a lesser extent than print exposure on the other (see https://osf.io/ytudn/). These divergent measures reflected various behaviours and skills, including non-verbal intelligence, numeracy, and memory. Correlation coefficients ranged between −.28 (rapid automatised naming test in Zhang et al., 2020) and .61 (inference-making ability in Sénéchal et al., 2018; IQ in Sparks et al., 2014), with an interquartile range of .10 to .31. This seems to suggest that only ARTs may have divergent validity, since ARTs typically correlated more strongly with vocabulary than the divergent measures did, whereas book counting and self-/parent-report measures did not typically exceed correlations of .31. Yet this conclusion may be premature given the relative scarcity and heterogeneity of investigations into divergent validity. More research in this area would be desirable.

Finally, the field has focused on the amount of lifetime reading in general rather than fiction reading specifically: only 10 of the 117 studies reported in Table 1 looked at fiction exposure (namely Brysbaert et al., 2020, Studies 1, 3, 4, 5; Chen & Fang, 2015; Grant, 2012, Studies 2, 3; Mar & Rain, 2015; Pfost et al., 2013; Spear-Swerling et al., 2010). Hence, conclusions specific to the assessment of fiction exposure currently lack empirical support.

In the present article we report a study investigating the three main indicators of lifetime exposure to written fiction in a sample of older adults (here defined as individuals between 50 and 80 years of age). We examined the construct validity of a self-report scale and book counting, two types of measures for which validity evidence is comparatively sparse, especially regarding fiction exposure in older adults, against the fiction sub-score of an ART, for which validity is supported by a comparatively larger evidence base. Convergent construct validity was investigated in terms of bivariate correlations of the ART-G fiction sub-score with the self-report scale and book counting. Divergent validity was tested through bivariate correlations of the ART-G non-fiction sub-score with the self-report scale and book counting. In addition, we examined criterion-related validity by determining the value of each indicator in predicting performance on a vocabulary test.

We aimed to answer the following research questions:

  1. How strongly are self-report scales and counting fiction books correlated with fiction author recognition lists on the one hand and non-fiction author recognition lists on the other?

    More positive correlations with fiction author recognition than with non-fiction author recognition would suggest that self-report scales and counting fiction books measure the same construct as the ART-G fiction sub-score, and hence, the most parsimonious indicator might be sufficient to assess lifetime exposure to print fiction.

  2. Which of the three indicators demonstrates the strongest positive association with word knowledge?

    This indicator can be regarded as the one with the best criterion-related validity.

Methods

This study utilised a correlational design and was approved by the Research Ethics Committee of the School of Psychology at the University of Kent before study commencement. The sample overlaps with that reported as Study 2 in Wimmer et al. (2021), which investigated relations of the ART-G subscales with empathy, theory of mind, general world knowledge, and imaginative skills. Unlike the present study, Wimmer et al. (2021) did not include self-report measures of fiction reading or book counting.

Participants

A total of N = 337 participants were recruited via Prolific Academic, the University of the Third Age (https://u3a.org.uk/), and local social media/web pages. Participants were eligible if they were native English speakers between 50 and 80 years of age. Participants were excluded from analyses if they did not report their age (N = 5), did not pass an attention check item interspersed within the survey (N = 11), or selected more than two foils (non-author names) in the ART-G (N = 15). This resulted in a final sample of N = 306, of whom 281 were recruited from Prolific Academic and 25 from one of the other sources named above. The mean participant age was 59.29 years (SDage = 7.01), and 60.5% were female. All respondents gave written informed consent prior to data collection and were compensated with a payment of £10.00, either via bank transfer or a digital shopping voucher. Post hoc power analyses using SPSS 27 showed that the final sample size had a power of 1 − β > .99 to detect a medium-sized correlation of rho = .30 in a two-tailed test at a significance level of p < .05, and a power of 1 − β = .41 to detect a small-sized correlation of rho = .10 in the same test.
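As a rough cross-check, these power values can be approximated outside SPSS via the Fisher z transformation of the correlation coefficient. The following minimal Python sketch recovers the reported figures; it is an approximation, and SPSS 27's power procedure may differ marginally in its assumptions.

```python
# Post hoc power for a two-tailed test of a correlation, approximated via
# the Fisher z transformation (a sketch; not the SPSS 27 procedure itself).
from math import atanh, sqrt
from scipy.stats import norm

def correlation_power(rho: float, n: int, alpha: float = .05) -> float:
    """Approximate power to detect a population correlation rho with n
    participants in a two-tailed test at significance level alpha."""
    z_rho = atanh(rho)                 # Fisher z of the assumed effect
    se = 1 / sqrt(n - 3)               # standard error of Fisher z
    z_crit = norm.ppf(1 - alpha / 2)   # two-tailed critical value
    ncp = z_rho / se                   # expected value of the test statistic
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

print(correlation_power(.30, 306))  # > .99, as reported
print(correlation_power(.10, 306))  # approx. .42 (reported as .41)
```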

Assessment measures

Lifetime exposure to print

The ART-G (Mar & Rain, 2015) provided the first indicator of reading habits. Respondents were asked to recognise author names from a list that included 110 fiction authors and 50 non-fiction authors (targets), as well as 40 non-authors (foils). Fiction and non-fiction sub-scores were calculated from the number of selected authors for each genre; i.e., the fiction sub-score is the sum of correctly identified fiction authors, and the non-fiction sub-score is the sum of correctly identified non-fiction authors. In contrast to the scoring procedure of the original ART by Stanovich and West (1989), foils were not subtracted from hits, because the ART-G materials contain no instructions to do so. Since we excluded participants who selected more than two foils, the penalty for foil checking was already very strict (see above). Hence, the final sample for analyses had limited variance in ART-G foil selections, and further control measures did not seem necessary. Split-half reliability (Guttman split-half coefficient; test halves were composed using the odd-even method) was .96 for the fiction sub-score and .86 for the non-fiction sub-score.
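For transparency, the Guttman split-half coefficient for a given split equals twice the complement of the ratio of the summed half-test variances to the total-score variance. The Python sketch below illustrates the computation for an odd-even split; the item matrix and variable names are hypothetical, not taken from the study materials.

```python
# Guttman split-half coefficient for an odd-even split:
#   2 * (1 - (var(half_A) + var(half_B)) / var(total)).
# A sketch only; `fiction_item_matrix` below is a hypothetical name.
import numpy as np

def guttman_split_half(items: np.ndarray) -> float:
    """items: participants x items matrix of 0/1 recognition scores."""
    half_a = items[:, 0::2].sum(axis=1)   # odd-numbered items
    half_b = items[:, 1::2].sum(axis=1)   # even-numbered items
    total = half_a + half_b
    return 2 * (1 - (half_a.var(ddof=1) + half_b.var(ddof=1))
                / total.var(ddof=1))

# e.g. guttman_split_half(fiction_item_matrix)  # -> .96 in the present data
```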

Book counting served as the second measure of print exposure. Participants were given the following instruction: "How many fiction books do you have at home? Fiction books include novels such as crime novels, romantic novels, science fiction novels, but also short stories, comics/graphic novels, fairy tales, storybooks (often for children), theatre plays, poetry, etc. Please also include fiction e-books you may have on your e-book reader. If you live with other people, please only count the books that belong to you (i.e. reflect YOUR reading preferences). If you have more than 160 fiction books in your house, you can stop counting when you have reached 160 fiction books." Participants were explicitly asked to give an accurate response and were reminded that failure to do so would make the study results useless. The threshold of 160 books was based on the finding that British households have on average 143 books (SD = 179; Sikora et al., 2019), although this figure does not differentiate between fiction and non-fiction. Since no information is available on the proportion of fiction versus non-fiction books typically owned, we arbitrarily assumed that, on average, 50% of books at home are fiction (M = 72, SD = 80) and 50% are non-fiction (M = 72, SD = 80). Participants with more than (M + 1 SD =) 152 fiction books can therefore be considered to score more than one standard deviation above the mean. Assuming that the number of books in one's home is normally distributed and that the current sample was representative, collecting precise book counts from everyone scoring below 152 meant that we could gather exact figures from approximately 84% of the sample. The cut-off of 160 rather than 152 was used to give participants a round number, which made the instructions easier to follow, and to obtain exact figures from slightly more than 84% of the sample. The threshold was considered acceptable because it kept the counting task manageable for participants and thereby discouraged inaccurate responses.
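The distributional arithmetic behind this cut-off can be checked in a few lines. The sketch below simply evaluates the normal model assumed above for fiction books, N(M = 72, SD = 80); it is a verification of the reasoning, not part of the original analysis.

```python
# Checking the arithmetic behind the 160-book cut-off under the assumed
# model of fiction books ~ N(72, 80). Purely illustrative.
from scipy.stats import norm

fiction_books = norm(loc=72, scale=80)
print(fiction_books.cdf(152))  # approx. .841: expected share below M + 1 SD
print(fiction_books.cdf(160))  # approx. .864: expected share below cut-off
print(fiction_books.sf(160))   # approx. .136: expected share at the ceiling
```

The last figure corresponds to the 13.5% benchmark referred to in the Results.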

Thirdly, participants self-reported on their reading frequency by responding to “About how often do you read a fiction book?” using a six-point Likert scale with response options being 1 = “less than once a month”, 2 = “once a month”, 3 = “more than once a month”, 4 = “once a week”, 5 = “more than once a week”, and 6 = “every day”.

Vocabulary

An adapted version of the vocabulary subtest of the Wechsler Abbreviated Scale of Intelligence–Second Edition (WASI-II; Wechsler, 2011) reflected the breadth of participants’ vocabulary and overall word comprehension. Respondents had to provide a written definition of 31 words presented to them. The time limit for each word was 30 s. Correct responses were awarded a score of 2, partly correct responses were coded 1, and incorrect responses received a score of 0. A sum score with a possible range of 0 to 62 served as dependent measure. Split-half reliability (Guttman split-half coefficient; test halves were composed using the odd-even method) was .84.

In the original version, the examiner administers the vocabulary test in a face-to-face session with the participant. The test was adapted to the online setting of the study: participants received the same instructions and items, and were given the same time limit as in the original version. In contrast to the original version, instructions were presented in written form via a Qualtrics survey instead of orally by an examiner, and participants typed their responses into a text field instead of answering orally.

Procedure

Volunteers participated online, via the Qualtrics platform. After providing informed consent, participants completed the vocabulary test, ART-G, book counting survey, and self-report reading scale in that order. Finally, they provided their demographics and were debriefed in written form and remunerated. Participants also completed other tasks reported in Wimmer et al. (2021). The entire study took approximately 90 min per participant.

Data analysis

Full data and analysis scripts are available on the Open Science Framework (see https://osf.io/ytudn/ for data and https://osf.io/sb7xz/ for analysis scripts). Unless otherwise stated, statistics were computed using SPSS 27. After computing descriptive statistics for our key variables, we conducted Kolmogorov–Smirnov tests to check whether the measures of print exposure were normally distributed. Associations between all print exposure measures and vocabulary were analysed through bivariate Spearman correlations. Next, we compared the correlation coefficients observed for the ART-G fiction sub-score with those observed for the ART-G non-fiction sub-score using an online calculator (https://www.psychometrica.de/correlation.html). We also checked whether the three indicators of fiction exposure and the vocabulary sum score were associated with age (using bivariate correlations) or gender (using independent-samples t-tests), to determine whether analyses would have to be controlled for age and/or gender. Associations of the three indicators of fiction exposure with the vocabulary sum score were tested using a hierarchical linear regression with the vocabulary sum score as the outcome variable. The self-report reading frequency scale was entered as the first predictor, followed by the book count, and finally the ART-G fiction sub-score. Unless otherwise mentioned, we adopted the standard 5% significance level.
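Comparing two correlations that share a variable (here, the two ART-G sub-scores correlated with the same third measure) requires a test for dependent correlations. One established option is the z test of Meng, Rosenthal, and Rubin (1992), sketched below in Python. Whether the online calculator implements exactly this procedure is an assumption on the sketch's part, and the example values in the final comment are placeholders, not results from this study.

```python
# z test for two dependent correlations sharing one variable
# (Meng, Rosenthal, & Rubin, 1992). The formula is stated for Pearson
# correlations; applying it to Spearman rhos is a common approximation.
from math import atanh, sqrt
from scipy.stats import norm

def meng_z(r_y1: float, r_y2: float, r_12: float, n: int):
    """Compare r(Y, X1) against r(Y, X2), given r(X1, X2) and sample
    size n. Returns the z statistic and its two-tailed p value."""
    r_sq_bar = (r_y1 ** 2 + r_y2 ** 2) / 2
    f = min((1 - r_12) / (2 * (1 - r_sq_bar)), 1.0)  # capped at 1
    h = (1 - f * r_sq_bar) / (1 - r_sq_bar)
    z = (atanh(r_y1) - atanh(r_y2)) * sqrt((n - 3) / (2 * (1 - r_12) * h))
    return z, 2 * norm.sf(abs(z))

# Placeholder values, e.g. an indicator's correlation with the ART-G
# fiction (r_y1) vs non-fiction (r_y2) sub-score:
# z, p = meng_z(r_y1=.48, r_y2=.30, r_12=.688, n=306)
```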

Results

Descriptive statistics for all independent variables and the dependent measure are summarised in Table 2. Significant Kolmogorov–Smirnov tests indicated that none of the indicators of print exposure under investigation was normally distributed (all ps < .001). In line with this, 30.4% of the current sample reported having 160 or more fiction books in their homes, whereas the expected percentage under a normal distribution is 13.5%. Hence, interrelations among the ART-G fiction and non-fiction sub-scores, the book count, the self-report scale on the frequency of reading fiction, and the vocabulary sum score were tested using bivariate Spearman correlations, as described above (see Fig. 1 for illustrations). The significance level was adjusted for multiple comparisons using the Bonferroni method, resulting in pcrit = .005. All correlation coefficients were significant (all ps < .0004) and of small to large size (see Table 2). The ART-G fiction and non-fiction sub-scores were strongly positively correlated. Nevertheless, the ART-G fiction sub-score was significantly more positively correlated with the book count and the self-report scale than the ART-G non-fiction sub-score was (Zs > 4.70).

Table 2 Descriptive statistics and bivariate correlations for indicators of print exposure and vocabulary
Fig. 1

Regression plots illustrating inter-correlations of the indicators of print exposure under investigation

Further analyses revealed that none of the independent variables or the dependent measure was associated with age (ps > .051). The vocabulary sum score did not differ by gender (p = .425). However, there were significant gender differences for the self-report scale on frequency of reading fiction, t(267.84) = −2.192, p = .029, d = −0.255, book count, t(267.48) = −2.329, p = .021, d = −0.269, and the ART-G fiction sub-score, t(294.12) = −2.185, p = .030, d = −0.243. Females scored consistently higher than males: means (SDs) were 2.92 (2.02) vs 2.43 (1.79) for the self-report scale, 80.56 (64.83) vs 63.53 (61.03) for book count, and 28.91 (18.43) vs 24.79 (14.45) for the ART-G fiction sub-score.

As outlined above, a hierarchical regression tested the predictive power of each of the three indicators of fiction exposure in explaining variance in the vocabulary sum score (see Table 3 and Fig. 2). The indicators of print exposure were Z-standardised, and gender was contrast-coded (−.50 = male, .50 = female). To control for the observed gender differences, the two-way interactions of gender with the self-report scale, book count, and ART-G fiction sub-score were included in the baseline model, alongside the ART-G non-fiction sub-score to control for effects of non-fiction exposure. The self-report scale was entered as a predictor in the second model, book count was added in the third model, and the fourth model included all three predictors. Multicollinearity was acceptable (all variance inflation factors [VIFs] < 3). R2 increased significantly in each model. The self-report scale was a significant predictor in the second model but lost significance when book count was added in the third model. Book count, in turn, significantly predicted vocabulary in the third model but lost significance when the ART-G fiction sub-score was added in the fourth model. Thus, when all variables, including the ART-G non-fiction sub-score, were entered, the ART-G fiction sub-score remained the only significant predictor of vocabulary (p < .001).
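A minimal Python re-expression of this model sequence is given below, assuming a data frame `df` with hypothetical column names. The original analysis was run in SPSS 27 (scripts at the OSF link above), so this sketch is illustrative rather than the actual analysis code.

```python
# Illustrative re-expression of the hierarchical regression (the actual
# analysis used SPSS 27). `data.csv` and all column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # hypothetical file with the study variables

# Z-standardise the exposure indicators; contrast-code gender (-.50/.50).
for col in ["self_report", "book_count", "art_fiction", "art_nonfiction"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()
df["gender_c"] = df["gender"].map({"male": -0.50, "female": 0.50})

base = ("vocabulary ~ art_nonfiction + gender_c:self_report"
        " + gender_c:book_count + gender_c:art_fiction")
steps = ["", " + self_report", " + self_report + book_count",
         " + self_report + book_count + art_fiction"]
previous_r2 = 0.0
for i, step in enumerate(steps, start=1):
    model = smf.ols(base + step, data=df).fit()
    print(f"Model {i}: R2 = {model.rsquared:.3f} "
          f"(Delta R2 = {model.rsquared - previous_r2:.3f})")
    previous_r2 = model.rsquared
```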

Table 3 Summary of stepwise multiple regression for the vocabulary sum score (N = 293)
Fig. 2

Regression plots illustrating the relationship of the three indicators of fiction exposure with performance in a vocabulary test

Discussion

There has been a recent increase in research interest in the potential benefits of reading fiction. According to contemporary models (Consoli, 2018; Mar, 2018), investigating the effects of lifetime exposure to written fiction in older adults seems particularly promising. Such a research agenda requires validated indicators of lifetime exposure to print fiction. The present study is the first to examine the three main types of indicators, namely self-report, author recognition test, and book counting, as applied to fiction reading rather than reading in general, in a sample of older adults. We investigated convergent and divergent construct validity of the self-report scale and book counting through bivariate correlations with fiction author recognition on the one hand and non-fiction author recognition on the other, and criterion-related validity via associations of each indicator of fiction exposure with vocabulary test scores.

Our first research question addressed whether the self-report scale and book counting are more positively associated with fiction author recognition than with non-fiction author recognition. Such a pattern would indicate that self-report scales and counting fiction books reflect the same or a similar construct as the ART-G fiction sub-score, so that researchers could confine themselves to applying the most efficient measure without loss of information; in that case, using multiple measures would not provide additional information about participants' engagement with fiction. The correlations observed were all statistically significant and ranged between rho = .206 (self-report scale with the ART-G non-fiction sub-score) and rho = .688 (ART-G fiction with non-fiction sub-score), thus ranging from small to large. Importantly, the self-report scale and book counting correlated consistently more strongly with the ART-G fiction sub-score than with the ART-G non-fiction sub-score. In general, the current interrelations of print exposure indicators fall within the range of coefficients identified in our review of the literature, though consistently above the 75th percentile. The finding that the present associations were slightly higher than those typically found in earlier studies could trace back either to the age of our sample, which is higher than in most previous studies, or to the fact that we examined fiction exposure specifically rather than general print exposure.

Whilst the above-mentioned statistics suggest that the self-report scale and book counting have satisfactory levels of convergent validity (Frey, 2018), the variance shared with the ART-G fiction sub-score (estimated as the square of the respective correlation coefficient) ranges from 19% to 26%, leaving between 74% and 81% of the variance in each indicator unexplained. This means that the constructs assessed by the three measures overlap partially but are far from congruent (by congruency we mean a shared variance of 100%, or approaching 100% given the noise typically present in empirical observations, rather than the partial overlap found in the current study). Thus, we cannot conclude from the present data that the three indicators can be used interchangeably. Researchers who would like to obtain a comprehensive picture of participants' lifetime exposure to print fiction might therefore want to apply all three indicators, that is, a self-report measure, an author recognition test, and book counting.

The second research question dealt with the strength of associations between each of the three indicators on the one hand and vocabulary test scores on the other. Since vocabulary is considered a central component of reading comprehension (Perfetti & Stafura, 2014), and good levels of reading comprehension at least partly trace back to frequent reading (Perfetti, 1985; Torppa et al., 2020), we assumed that the relations between the indicators and vocabulary test scores would be informative of the indicators' criterion-related validity. Analyses revealed that, when gender and non-fiction exposure were controlled for, the self-report scale had the lowest predictive value, as its contribution was no longer significant once book counting was added. Book counting proved to be the indicator with the second-best predictive value, as it outperformed the self-report scale but lost significance when the ART-G fiction sub-score was included as a further predictor. Finally, the ART-G fiction sub-score showed the highest criterion-related validity, since it emerged as the only significant predictor in the regression model including all three measures of exposure to print fiction. The model including the ART-G fiction sub-score also had the highest R2, meaning it explained the most variance in the vocabulary test score. This underlines the predictive power of the ART-G fiction sub-score.

Compared with the earlier evidence summarised in Table 1, the current correlations between indicators of fiction exposure and vocabulary are similar in size, though above the median. Again, the slightly higher coefficients could be related either to the age of our sample or to the present focus on fiction exposure. Still, it is reassuring that, according to the present findings, applying print exposure measures to index fiction exposure in older adults is associated not with reduced but, if anything, with better validity.

Additional results partly confirmed and partly deviated from earlier findings. On the one hand, females scored higher than males on all three indicators of exposure to written fiction. This resonates with the well-established finding that females have a stronger preference for fiction texts than males (e.g. Thums et al., 2021). On the other hand, in the current sample none of the measures of fiction exposure was related to age. This conflicts with the results of Grolig et al. (2020), where author recognition test scores increased with age. The difference between the findings by Grolig et al. (2020) and the present results is likely to reflect differences in the samples' age ranges: participants in Grolig et al. (2020) were between 13 and 77 years old, whereas the current sample covered a much narrower age range, from 50 to 79 years. The comparatively lower age variance in the present study might have made age effects more difficult to detect. Whilst we deliberately focused on an older target group to capture lifetime experience with fiction, research on fiction exposure across the entire lifespan would indeed require samples covering the full range of literate ages.

Although the research reported here makes novel contributions to the field of fiction research in several respects, a few limitations should be acknowledged. First, the skewed distribution of the book count may raise some concerns about the reliability of book counting. The highest possible score of 160 was reported by 30.4% of the sample, more than twice the 13.5% expected under a normal distribution. On the one hand, it is possible that the number of fiction books is not normally distributed in our older adult population (who are likely to have accumulated larger home libraries over their lifetimes than the younger samples tested in most previous studies), so that assuming a normal distribution was incorrect in the first place. On the other hand, it cannot be ruled out that some participants did not actually count their books until they reached 160 but instead made a rough guess, thereby overestimating the number of their books. Unfortunately, we do not have data, such as response times, that could be used to test this potential explanation. Future investigations could compare self-reported book counts with those recorded by researchers during home visits in order to gauge the reliability of self-reported book counts. Figure 1 also reveals that a considerable percentage of the sample (7.2%) reported having zero fiction books in their home, which may be counter-intuitive, as everyday experience suggests that people typically own at least a small number of books. This result is possibly associated with the target group of the current investigation. Older adults are likely to change their housing situation to something more age-appropriate; downsizing or moving to a retirement home means adjusting to less personal space, so that one might have to divest oneself of personal belongings, including books. In that case, book counting would not reliably reflect lifetime print exposure in this population. Targeted research is needed to clarify this.

Second, the use of a single-item self-report scale likely limited the reliability of this measure. Also, since Schmidt and Retelsdorf (2016) found the SRHI-R, another self-report indicator of print exposure (see Introduction), to be confounded with reading motivation, the same may apply to the current study. We opted for a single-item scale for three reasons: the lack of validated multi-item self-report questionnaires of fiction exposure, a shortage of resources to pilot a new multi-item instrument, and the previous successful application of a bespoke single-item self-report scale in a study with older adults by Payne et al. (2012). The first reason has since become moot, as Kuijpers et al. (2020) have developed the Reading Habits Questionnaire to assess fiction and non-fiction exposure. This questionnaire was published after the planning stage of the current study (end of 2019/beginning of 2020), and we were not aware of it until after data collection was completed.

Another limitation relates to the variable used to assess criterion-related validity, namely vocabulary or word knowledge. First, some researchers assume that good word knowledge is the result of frequent reading (e.g. Perfetti, 1985; Perfetti & Stafura, 2014), whereas others postulate different relationships between reading frequency and word knowledge. For instance, performance in a vocabulary test has been shown to predict reading comprehension (e.g. Laufer & Aviad-Levitzky, 2017; Ouellette, 2006), which suggests that word knowledge is a precursor rather than an outcome of reading behaviour. Hence, it remains disputable whether word knowledge is a suitable criterion variable for reading exposure. Second, even if one accepts word knowledge as an appropriate criterion of print exposure, it does not provide a criterion of fiction exposure in particular. General word knowledge should improve through any kind of reading, but a specific benefit of reading fiction cannot currently be assumed. At present, however, we simply do not know which indicators are suitable criteria for fiction exposure. This may be related to the fact that empirical fiction research is still in its infancy, even though research activities are increasing. Only when research has identified robust outcomes of reading fiction will we know which measures to apply as external criteria for fiction exposure.

To conclude, the present study found evidence that self-report measures, book counting, and author recognition tests have good levels of construct validity as indicators of exposure to written fiction. However, the three indicators overlap only partially and therefore cannot be used interchangeably. To obtain a comprehensive picture of participants' fiction exposure, researchers are encouraged to apply all three indicators. Of the measures under investigation, the ART-G fiction sub-score showed the highest criterion-related validity: it remained the only significant predictor of word knowledge both when the effects of gender and non-fiction reading were controlled for and when all indicators were entered into a regression model. Thus, we recommend that researchers include all three measures of fiction exposure if they have the resources to do so, and that they confine themselves to the ART-G if they can include only a single indicator. In future work, it would be worthwhile to examine the reliability of book counting more closely and to validate a multi-item self-report scale. Furthermore, estimates of criterion-related validity would benefit from the identification of reliable outcomes of fiction reading.