Accurate early screening is particularly important for providing effective early intervention for dyslexic readers, and some have suggested that family history could serve as a proxy for identification of dyslexia. Family history has been considered as a contributing risk factor for dyslexia. Scottish ophthalmologist James Hinshelwood’s descriptions of dyslexia in the early twentieth century included reports that, in some cases, dyslexia occurs in families (Hinshelwood, 1900, 1911). Recent studies provide conflicting evidence regarding the potential usefulness of family history as a predictor in screening for dyslexia (Carroll et al., 2014; Dilnot et al., 2017; Thompson et al., 2015). In a high-risk sample, while family history was a significant predictor of dyslexia during the preschool years, by school entry, family history no longer was predictive, and other measures, including letter knowledge and phonological awareness, were better predictors of dyslexia (Thompson et al., 2015).

Among 7- to 9-year-old children, family history was a significant predictor of reading accuracy after controlling for early speech and language (Carroll et al., 2014). In contrast, family risk of dyslexia did not predict reading readiness (a composite of word reading, letter-sound knowledge, phoneme deletion, and rapid automatized naming) once other risks were controlled (Dilnot et al., 2017). A meta-analysis of the effects of family history on reading found an average prevalence rate of dyslexia of 45% (varying between 26 and 66%) in children with a first-degree relative with dyslexia, compared to just under 12% risk of dyslexia in samples of children without such a family history, though samples of dyslexic readers without a family history were not examined (Snowling & Melby-Lervåg, 2016). None of these previous studies reported sensitivity, specificity, and area under the curve (AUC) analyses for family history as a risk factor for dyslexia.

In this report, we examine the classification accuracy of family history as a screening measure for dyslexia using a unique population, an epidemiologic sample of 398 school children representative of children entering public kindergarten, including typical readers (TR) and dyslexic readers (DR), followed longitudinally from age 5 through adulthood. Our study addresses three fundamental questions: What is the classification accuracy of family history as a screening measure for dyslexia in young children? How do dyslexia and family history compare when predicting longitudinal changes in reading skills from childhood to adolescence? What is the longitudinal predictive value of family history within typical readers (TR) and dyslexic readers (DR) considered separately?

Methods

Participants

We use data from The Connecticut Longitudinal Study, an epidemiologic sample survey of schoolchildren representative of children entering public kindergarten (Shaywitz et al., 1990). Of the 398 participants with complete data, 52.8% are females and 47.2% males. The sample contains European Americans or Whites (85.2%), African Americans or Blacks (11.8%), Asians (1.0%), Hispanics or Latinos (2.0%), and other children with unreported race or ethnicity (0.3%). The composition of this sample was similar to the racial and ethnic composition of the USA at the time of the study. All participants were primary English speakers. This cohort, assembled from a 2-stage probability sample, has been followed longitudinally from school entry into adulthood to study the development of reading, learning, and attention (Ferrer et al., 2007, 2010, 2015; Shaywitz et al., 1990, 1992a, 1992b, 1992c, 1999). Parents or caretakers provided written consent for their children to participate in the study, and children also provided assent. The study was approved by the Institutional Review Board at Yale University and was conducted in accordance with the ethical principles that have their origin in the Declaration of Helsinki and are consistent with good clinical practices and applicable laws and regulations.

Measures

Before kindergarten entry, the participants’ parents completed the Yale Children’s Inventory (YCI), a comprehensive overview of the child’s prenatal and perinatal history; the family’s behavioral, cognitive, and medical history; the child’s behavior, development, language, habits, and preschool experiences; parental education and employment; and significant life events (Shaywitz et al., 1986, 1988, 1992a, 1992b). Reading skills were measured using the WJ Reading Cluster (composite of Letter-Word Identification, Word Attack, and Passage Comprehension subtests) from the Woodcock–Johnson Psycho-Educational Test Battery (WJ; Woodcock & Johnson, 1989), and IQ was measured using the Wechsler Intelligence Scale for Children–Revised (WISC-R; Wechsler, 1981). To compare family history and an evidence-based early screening measure for identification as at-risk for dyslexia, we used the kindergarten and first-grade teachers’ ratings from the Shaywitz DyslexiaScreen (Shaywitz, 2016).

Criteria for Family History and Dyslexia

Family history criteria were established by responses to two questions from the YCI specifically related to family history of dyslexia: Did the family member have (1) trouble reading and (2) trouble spelling? Three criteria were set for trouble reading, trouble spelling, and trouble reading or spelling. The respondent provided answers (coded as 1 for yes; 0 for no) for each family member group: siblings, parents, or grandparents, for each criterion. Two variables (coded as 1 for yes; 0 for no) indicated whether any 1st-degree relatives (parent or sibling) had trouble reading, trouble spelling, and trouble reading or spelling. The last criterion indicated whether any 1st- or 2nd-degree relative (parent, sibling, or grandparent) had trouble reading or spelling. The 1st- and 2nd-degree relative definition yielded a family history positive group (FH+, n = 119) and a family history negative group (FH−, n = 279).
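The coding scheme described above can be illustrated with a minimal sketch. The data layout (a dictionary of 0/1 answers per relative group and question) and the names `any_trouble`, `family_history_flags`, and `example` are hypothetical and chosen only for illustration; they do not reflect the YCI's actual data format.

```python
# Hypothetical layout: one 0/1 answer per relative group and question.
FIRST_DEGREE = ("parents", "siblings")
ALL_RELATIVES = ("parents", "siblings", "grandparents")


def any_trouble(responses, groups, questions):
    """Return 1 if any listed relative group answered 'yes' (1) to any listed question."""
    return int(any(responses[g][q] == 1 for g in groups for q in questions))


def family_history_flags(responses):
    """Derive composite family history indicators from the raw 0/1 answers."""
    return {
        "first_degree_reading": any_trouble(responses, FIRST_DEGREE, ("reading",)),
        "first_degree_spelling": any_trouble(responses, FIRST_DEGREE, ("spelling",)),
        "first_degree_either": any_trouble(
            responses, FIRST_DEGREE, ("reading", "spelling")
        ),
        # Positive family history in the broadest sense used in the text:
        # any 1st- or 2nd-degree relative with trouble reading or spelling.
        "fh_positive": any_trouble(responses, ALL_RELATIVES, ("reading", "spelling")),
    }


# Illustrative (made-up) child: only a grandparent had trouble spelling.
example = {
    "parents": {"reading": 0, "spelling": 0},
    "siblings": {"reading": 0, "spelling": 0},
    "grandparents": {"reading": 0, "spelling": 1},
}
flags = family_history_flags(example)
```

In this made-up case the child would be family history positive under the 1st- and 2nd-degree definition but negative under every 1st-degree definition, which shows how the choice of relative group changes group membership.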

Dyslexia was defined using the WJ Reading Cluster scores and the WISC-R Full Scale IQ score. Dyslexic children met criteria based on low achievement (Reading Cluster Age Standard score <90) or IQ-achievement discrepancy criteria (a Reading Cluster score >1.5 standard deviations lower than that predicted by Full Scale IQ) in grade 2 or 4 (Ferrer et al., 2015). Both definitions validly identify children as poor readers, and there is little evidence of differences between subgroups formed by one definition versus the other (Shaywitz et al., 1992a, 1992b, 1992c). This definition of dyslexia status yielded a dyslexic readers group (DR, n = 97) and a typical readers group (TR, n = 301).
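The two-part rule above can be sketched as follows. This is only an illustration under stated assumptions: the score predicted from Full Scale IQ (`predicted_reading`) and the spread used to scale the 1.5 threshold (`residual_spread`) are taken as inputs from a regression fitted elsewhere, since the regression itself is not specified here, and all function and parameter names are hypothetical.

```python
def meets_criteria(reading_score, predicted_reading, residual_spread,
                   low_cutoff=90.0, discrepancy_units=1.5):
    """Apply the low-achievement OR IQ-achievement discrepancy rule for one grade."""
    # Low achievement: Reading Cluster standard score below 90.
    low_achievement = reading_score < low_cutoff
    # Discrepancy: observed score more than 1.5 spread units below the
    # score predicted from Full Scale IQ (prediction is an assumed input).
    discrepancy = (predicted_reading - reading_score) > discrepancy_units * residual_spread
    return low_achievement or discrepancy


def is_dyslexic_reader(grade_2, grade_4):
    """Classify as DR if the criteria are met in grade 2 or in grade 4.

    Each argument is a (reading_score, predicted_reading, residual_spread) tuple.
    """
    return meets_criteria(*grade_2) or meets_criteria(*grade_4)


# Made-up values: typical in grade 2, discrepant (20 > 1.5 * 10) in grade 4.
child_is_dr = is_dyslexic_reader((104.0, 108.0, 10.0), (100.0, 120.0, 10.0))
```

Because the two criteria are joined with a logical OR, a child with a reading score above 90 can still be classified as a dyslexic reader when that score falls far enough below the IQ-based prediction.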

Beginning with its first description over a century ago (Morgan, 1896), continuing through the early part of the twentieth century (Hinshelwood, 1917) and through the beginning of the twenty-first century (Lyon et al., 2003), dyslexia has always been defined as an unexpected difficulty in reading in a person who has the intelligence to be a much better reader. Ferrer and associates provided empiric evidence for dyslexia’s unexpected nature (Ferrer et al., 2010), and recent federal law (“First Step Act,” 2018) has codified dyslexia as “an unexpected difficulty in reading for an individual who has the intelligence to be a much better reader.” This definition fits some current revised methods to identify dyslexia based on discrepancy: “…seriously low reading ability, average or better cognitive ability, and a standard score difference of 15 to 29 points [for likely] and 30 points or more [for very likely]” (Hammill & Allen, 2020).

The definition used in our study follows directly from over a century of dyslexia research and conceptualizes dyslexia as an unexpected difficulty in reading. Investigators, including ourselves, have operationalized the definition to include unexpected for ability and unexpected for age. Specifically, dyslexic readers were identified by an observed Woodcock–Johnson Reading Cluster score 1.5 standard errors below the score predicted from their Full Scale IQ or with a Reading Cluster score below 90. Both of these definitions validly identify children as dyslexic, and there is little evidence of differences between subgroups of children formed with one criterion versus the other (Shaywitz et al., 1992c). We have used this operational definition in many previous peer-reviewed publications (Estrada et al., 2018; Ferrer et al., 2015; Herrera-Araujo et al., 2017; S. E. Shaywitz et al., 2003).

Statistical Analysis

To examine the classification accuracy of the family history definitions (trouble reading, trouble spelling, and trouble reading or spelling) for each family member group as the sole predictor of reader group status (TR and DR), we performed classification analyses yielding the following statistics: sensitivity, specificity, area under the curve (AUC), 95% confidence intervals, and p value from ROC (receiver operating characteristic curve) analysis.
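As an illustration of the statistics named above, the following minimal sketch computes sensitivity, specificity, and a rank-based AUC for a single 0/1 predictor. It does not reproduce the confidence intervals or p values from the ROC analysis, the function names are hypothetical, and the data are made up for illustration rather than taken from the study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for 0/1 predictions against 0/1 status."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # dyslexic, flagged
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # dyslexic, missed
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # typical, not flagged
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # typical, flagged
    return tp / (tp + fn), tn / (tn + fp)


def rank_auc(scores, actual):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, a in zip(scores, actual) if a == 1]
    neg = [s for s, a in zip(scores, actual) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = positive family history / dyslexic reader, 0 = otherwise.
family_history = [1, 1, 0, 0, 0, 1, 0, 0]
reader_status = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(family_history, reader_status)
auc = rank_auc(family_history, reader_status)
```

Sensitivity is the quantity of primary interest for screening because it reflects how many dyslexic readers the predictor would actually flag; specificity reflects how many typical readers it correctly leaves unflagged.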

To characterize the normative differences from grades 1 to 9 for the WJ Reading Cluster scores, we carried out a repeated-measures ANOVA, with reader group and family history serving as between-subject effects, and grades (1, 3, 5, 7, and 9) as the repeated measure. The two between-group main effects comparing the overall reader groups (TR vs. DR) and the overall family history groups (FH− vs. FH+) were tested using the pooled within-subjects standard deviation as the denominator for the calculated Hedges’ g effect sizes (Clearinghouse, 2017). Two simple main effects were calculated for the differences between the FH− and FH+ groups within the DR and the TR groups using the pooled within-subject standard deviation, the F statistic of the multivariate model, and Hedges’ g effect sizes. To control for type I error across the four comparisons, the criterion for statistical significance was set at p = .05/4 = .0125.
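The effect size and the corrected significance criterion can be sketched as follows. This is a minimal illustration, assuming the common small-sample correction factor 1 − 3/(4N − 9) for Hedges' g; the pooled standard deviation is taken as an input because its value is not given here, and the function names and the numeric inputs to `hedges_g` are hypothetical. The three p values are those reported for the family history comparisons in the Results.

```python
def hedges_g(mean_1, mean_2, pooled_sd, n_1, n_2):
    """Standardized mean difference with a small-sample bias correction."""
    d = (mean_1 - mean_2) / pooled_sd             # uncorrected standardized difference
    correction = 1.0 - 3.0 / (4.0 * (n_1 + n_2) - 9.0)
    return correction * d


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance criterion under a Bonferroni correction."""
    return alpha / n_comparisons


# Criterion for the four planned comparisons: .05 / 4 = .0125.
criterion = bonferroni_alpha(0.05, 4)

# Family history p values reported in the Results.
reported_p = {
    "FH overall": 0.002,
    "FH within TR": 0.015,
    "FH within DR": 0.61,
}
significant = {name: p < criterion for name, p in reported_p.items()}

# Made-up inputs, for illustration only.
g_example = hedges_g(110.0, 100.0, 10.0, 20, 20)
```

Under the corrected criterion, the within-TR comparison (p = .015) falls just short of significance even though it would pass an uncorrected .05 threshold.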

Finally, to examine the predictive utility of the family history variable (positive family history (for any 1st- and 2nd-degree relative)) alone, an early screening measure (Shaywitz DyslexiaScreen) alone, and the combined screen and family history variables on dyslexia status, we used ROC curve analysis. Results of the three models for kindergarten and first grade are reported in Table 3.

Results

Table 1 reports analyses for sensitivity, specificity, and area under the curve (AUC using receiver operating characteristic curves). True positives (TP, dyslexic readers classified as dyslexic by positive family history), true negatives (TN, typical readers classified as typical), false positives (FP, typical readers classified as dyslexic), and false negatives (FN, dyslexic readers classified as typical) are presented for each family member grouping (parents, siblings, 1st-degree relatives, and grandparents). These analyses are carried out considering family history definitions based on trouble reading, trouble spelling, and the combined trouble reading or spelling.

Table 1 Family history classification analysis of dyslexia

Sensitivities (correct classification of DRs as dyslexic) range from 5% for grandparents using trouble reading as the family history criterion to a maximum of 51% for the combination of 1st- and 2nd-degree relatives, using trouble with reading or spelling as the criterion for positive family history. Specificities (correct classification of TRs as typical readers) were substantially higher, ranging from 77% for 1st- and 2nd-degree relatives using trouble with reading or spelling as the criterion for positive family history to 97% for 1st-degree relatives using trouble spelling as the family history criterion. The largest observed AUC was 80% (p < .001), which was obtained for the sibling family member group using trouble spelling. The smallest AUCs were obtained for the grandparent family member group: 55%, p = .51, for trouble reading; 59%, p = .11, for trouble reading or spelling; and 62%, p = .05, for trouble spelling.

Results of repeated-measures ANOVA examining longitudinal differences in reading scores for the reading and family history groups from grades 1 to 9 are presented in Table 2. The overall difference in reading achievement between typical and dyslexic readers (Figure 1, panel A) was very large and statistically significant (p < .001, effect size (ES) = 1.42 standard deviations). The overall difference between the family history groups (FH+ = having a 1st- or 2nd-degree relative with dyslexia; FH− = not) (Figure 1, panel B) was also statistically significant (p = .002, ES = .59). The FH− and FH+ difference within typical readers (Figure 1, panel C, upper) was small and, after adjusting for multiple comparisons, not statistically significant (p = .015, ES = .29), as was the FH− and FH+ difference within dyslexic readers (Figure 1, panel C, lower) (p = .61, ES = .22). The grade main effect was the only within-subject effect that was statistically significant (p < .001). Inspection of panels A, B, and C in Figure 1 indicates that reading scores tend to decline slightly from grades 1 to 5 and increase from grades 5 to 9 in each comparison. Although statistically significant, the differences across grades are small relative to the differences due to groups. Notably, none of the interactions is statistically significant, indicating that differences between reading groups, family history groups, and between grades are independent of each other.

Table 2 Repeated-measures analysis of variance
Fig. 1

Panel A is a plot of means ± 95% confidence intervals of WJ Reading Cluster scores for typical and dyslexic readers over grades 1 to 9. The grand means over grades are 109.8 and 88.8 for the TR and DR groups, respectively. The difference between the TR and DR of 20.9 is statistically significant (F(1, 394) = 273.8, p < .001). Panel B is a plot of means ± 95% confidence intervals of WJ Reading Cluster scores for FH− and FH+ groups over grades 1 to 9. The grand means over grades are 107.3 and 98.6 for FH− and FH+, respectively. The difference between the FH− and FH+ of 8.6 is statistically significant (F(1, 394) = 9.9, p = .002). Panel C is a plot of means ± 95% confidence intervals of WJ Reading Cluster scores for FH− and FH+ groups over grades 1 to 9 within the TR group and within the DR group. For the TR group, the FH− grand mean over grades is 110.8, and the FH+ grand mean over grades is 106.5. The difference between the FH− and FH+ of 4.3 is not statistically significant (F(5, 390) = 2.86, p = .015). For the DR group, the FH− grand mean over grades is 90.5, and the FH+ grand mean over grades is 87.2. The difference between the FH− and FH+ of 3.3 is not statistically significant (F(5, 390) = .72, p = .61).

Finally, results of ROC curve analyses examining the predictive utility of the family history variable (positive family history for any 1st- and 2nd-degree relative) alone, the early screening measure alone, and the combined early screening measure and family history variables on dyslexia status for kindergarten and first grade are reported in Table 3. As a reference, the table also includes classification accuracy values from the screen manual (Shaywitz, 2016, p. 10). These results indicate that the screener is superior to family history in sensitivity but inferior in specificity. Moreover, although the screener has a higher false-positive rate than family history, the predictive indices for the screener alone are very similar to those obtained by adding family history, which contributes little added value.

Table 3 At-risk classification summary statistics for family history alone, screener alone, and their combination in kindergarten and grade 1

Discussion

In this paper, using a longitudinal epidemiologic sample of schoolchildren, we examine the classification accuracy of family history as a screening measure for at-risk status for dyslexia. In addition, we investigate the predictive value of an evidence-based early screening measure (Shaywitz DyslexiaScreen) for identifying at-risk status and determine whether family history provides added value.

Using an epidemiologic sample of 398 children followed from age 5 through adulthood, we found that sensitivity of family history for predicting dyslexia was unacceptably low for all family member groups, even when using the highest sensitivity (combining 1st- and 2nd-degree relatives). These results indicate that an evidence-based screener is superior to family history in sensitivity, the primary metric used in screening. ROC curves for family history alone, early screening measure alone, and the combination of the screener and family history indicate that predicting dyslexia using family history does not improve the value of using the screener alone and, in fact, for first-grade data, the addition of family history to the early screener appears to make the prediction worse.

Our report is the first to include sensitivity, specificity, and AUC analyses in an epidemiologic sample examining the classification accuracy of family history for determining at-risk status for dyslexia. Consistent with previous analyses based on growth modeling (Ferrer et al., 2015), the present analyses indicate that the persistent normative reading achievement gap between typical and dyslexic readers from first grade to adolescence is more than twice that of the persistent achievement gap between individuals with and without family history of dyslexia. The significant overall FH achievement gap and the nonsignificant FH differences when examining within TR and DR groups are consistent with the low sensitivity of family history as a predictor of dyslexia and undermine the use of family history for universal screening.

One important way to advance our understanding of the current results would be to conduct analyses of sensitivity and specificity separately by the various SES, sex, and ethnic groups. These additional results would expand the overall results and illuminate the usefulness of family history for different groups. Furthermore, they may also affect the ecological validity of the results and translational practicality of using family history during screening for educators working in diverse areas. In our current sample, however, including SES, sex, and ethnic groups in the analyses would result in such small subgroups and reduced power that statistical comparisons would be compromised. Future research using large epidemiologic samples that oversample small groups should pursue such analyses.

We conclude that the proposed use of positive family history as a proxy for dyslexia is unwarranted. Moreover, if family history were to be used for screening to determine at-risk status for dyslexia, this could have harmful consequences. For example, many children who are dyslexic may never be identified by their schools, as they and their parents may not be aware of any family history of difficulties in reading or spelling, an issue especially problematic for children from single-parent households and for children from economically disadvantaged circumstances.