Background

Repetitive head impacts (RHI) from American football have been associated with later-life cognitive symptoms [1–6] and chronic traumatic encephalopathy (CTE) [5, 7–9]. We use the term RHI herein to refer to environmental exposures to repetitive impacts, hits, or blows to the head. These impacts can result in symptomatic traumatic brain injuries (e.g., concussion) and/or less discrete cumulative effects on the brain. The 2021 National Institute of Neurological Disorders and Stroke consensus diagnostic criteria for traumatic encephalopathy syndrome (TES) describe the clinical disorder associated with neuropathologically diagnosed CTE [10]. Impairments in memory and/or executive function are core cognitive features of the TES criteria. The specificity of these impairments to CTE and/or RHI is not clear, as similar impairments are known to develop in other neurodegenerative diseases (e.g., Alzheimer’s disease [AD], frontotemporal dementia) [11, 12]. The TES criteria are informed by retrospective reports from informants of brain donors [10, 13], and prospective objective neuropsychological data on individuals exposed to RHI are lacking. The patterns of cognitive impairments in populations at risk for CTE (e.g., former American football players) are not known. This has led to diagnostic challenges for neuropsychologists and other clinicians.

Neuropsychological evaluation is an integral component of the clinical evaluation of neurodegenerative diseases. This is evidenced by clinical diagnostic criteria for neurodegenerative diseases requiring the presence of cognitive impairment on neuropsychological testing [14–18], often defined by standardized test score(s) of 1 to 1.5 standard deviations below the normative mean [14, 17]. The TES research diagnostic criteria emphasize the need for comprehensive neuropsychological testing to substantiate the presence of cognitive impairment [10]. Neuropsychological test scores serve as outcome measures for large-scale multi-center clinical trials of disease-modifying therapies [19]. It is important that cognitive profiles of populations with exposure to RHI, such as former football players, are delineated for research and clinical purposes.

Research studies on the neuropsychological test performance of older former football players have been limited. Schaffert and colleagues conducted a critical review of 22 studies published between 2013 and 2019 on neuropsychological function in former National Football League (NFL) players [3]. Some, but not all, of the studies found evidence for cognitive impairment, most consistently in verbal episodic memory. There were inconsistencies in the domains impaired across studies. That review highlighted the limitations of the research on this topic that include (1) small sample sizes (e.g., n = 9 former NFL players); (2) unknown exposure status of the comparison groups; (3) lack of consistent reporting of effect sizes; (4) substantial variation in the extent and quality of the neuropsychological test battery and associated norming practices; and (5) restricted focus on former professional American football players.

The objective of this study was to characterize the neuropsychological test performance of a large sample of former college and professional football players from the DIAGNOSE CTE Research Project [20]. Neuropsychological function across major cognitive domains was assessed and we report the sample raw and T-scores derived from age, sex, and/or education normative data. Rates of impairment by test and cognitive domain are reported.

Methods

Participants and study design

Participants were from the Diagnostics, Imaging, and Genetics Network for the Objective Study and Evaluation of Chronic Traumatic Encephalopathy (DIAGNOSE CTE) Research Project [20]. The objectives of the DIAGNOSE CTE Research Project are to develop in vivo biomarkers for CTE, characterize its clinical presentation, and refine and validate clinical research diagnostic criteria. The study enrolled 240 male participants, ages 45–74, including 120 former NFL players, 60 former college football players, and 60 asymptomatic men without a history of RHI or TBI. All participants volunteered to participate as a part of a research study and they were compensated $500 for their time. Evaluations and participation were not done as part of clinical care or medico-legal purposes. Baseline evaluations were completed between September 2016 and February 2020. Inclusion criteria included no contraindications for MRI, lumbar puncture, or PET procedures; English as the primary language; and consent to all study procedures. Because RHI can often result in TBI, TBI was not exclusionary in the former football players. The former college football players must have played ≥ 6 years of organized football with ≥ 3 years at the college level. Former professional football players must have played ≥ 12 years of organized football, including ≥3 in college and ≥4 seasons in the NFL. Although recruitment for the former football players was not based on cognitive (or neuropsychiatric) status, most football players had subjective cognitive concerns at the time of study screening based on the AD8 Dementia Screening Interview and Cognitive Change Index (Supplemental Table 1).
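The football-exposure thresholds above can be written as a simple screening rule. The following is a minimal sketch, not the study's actual screening procedure: the `PlayHistory` record and `football_group` function are hypothetical names, only the age and years-of-play thresholds are encoded (MRI, lumbar puncture, PET, language, and consent criteria are omitted), and the requirement that the college group have no NFL seasons is an assumption based on the later description of that group as lacking professional experience.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayHistory:
    """Hypothetical screening record; field names are illustrative."""
    age: int              # age at enrollment, in years
    years_organized: int  # total years of organized football
    years_college: int    # years played at the college level
    nfl_seasons: int      # seasons played in the NFL


def football_group(h: PlayHistory) -> Optional[str]:
    """Return 'professional', 'college', or None if neither rule is met."""
    # Enrollment age range reported for the study: 45-74 years.
    if not 45 <= h.age <= 74:
        return None
    # Former professional: >= 12 years organized, >= 3 college, >= 4 NFL seasons.
    if h.years_organized >= 12 and h.years_college >= 3 and h.nfl_seasons >= 4:
        return "professional"
    # Former college: >= 6 years organized with >= 3 at the college level.
    # Assumption: no NFL seasons for this group.
    if h.nfl_seasons == 0 and h.years_organized >= 6 and h.years_college >= 3:
        return "college"
    return None
```

Checking the professional rule first keeps the two groups mutually exclusive, since any former professional player also satisfies the college thresholds.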

The criteria for the asymptomatic unexposed were no self-reported diagnosed history of TBI of any severity at study screening; no participation in organized contact and collision sports (including American football), military combat, or any other activity that can result in RHI; absence of self-reported formal diagnosis or treatment of psychiatric illness or cognitive impairment; and no self-reported cognitive, behavioral, or mood symptoms at study telephone screening (Supplemental Table 1). They had to have a body mass index ≥24 in order to facilitate matching on body habitus to the former football players. All participants were required to have an informant and adequate decisional capacity at the time of their baseline visit to participate. Additional details of enrollment criteria and recruitment methods have been reported [20]. The neuropsychological test performance of the unexposed asymptomatic men is presented and qualitatively described in the supplemental material (Supplemental Tables 2 and 3). These data are not presented in the main text, and statistical tests that compare the former American football players to the asymptomatic unexposed men on neuropsychological outcomes were not conducted because recruitment of participants for the DIAGNOSE CTE Research Project was based on our risk factor of interest (i.e., elite football play with RHI exposure) and symptoms (i.e., unexposed men must have been asymptomatic at screening). This recruitment strategy was designed for biomarker development [20]. However, it is problematic when examining clinical measures as outcomes because estimates of group differences are magnified.

Participants were evaluated at Boston University Chobanian & Avedisian School of Medicine (with MRI conducted at Brigham and Women’s Hospital); Cleveland Clinic Lou Ruvo Center for Brain Health in Las Vegas; Mayo Clinic Arizona (with PET scans at Banner Alzheimer’s Institute); or NYU Langone Medical Center. Participants underwent a 2-day baseline study visit that included a comprehensive neuropsychological examination and other procedures. All sites received approval from their Institutional Review Boards. Participants provided written informed consent. Research was completed in accordance with the Declaration of Helsinki.

Sample size

The final sample size included 59 former college football players and 111 former professional football players. (As shown in the supplement, there were 57 asymptomatic unexposed men.) The sample was reduced after exclusion of participants (across all study groups) for missing data on the primary objective neuropsychological tests (n=5) and suboptimal performance validity (n=8). Three had missing data on the Golden Stroop Color-Word Test due to colorblindness and were excluded from Golden Stroop Color-Word statistics but were not excluded otherwise.
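The sample flow can be checked with a few lines of arithmetic. The sketch below uses only the enrollment and final counts reported in the text. The per-group differences are derived by subtraction and are not broken down by exclusion reason in the source. The variable names are illustrative.

```python
# Enrolled and final counts as reported in the text.
enrolled = {"professional": 120, "college": 60, "unexposed": 60}
final = {"professional": 111, "college": 59, "unexposed": 57}

# Exclusions across all study groups, by reason.
excluded_missing = 5   # missing data on primary neuropsychological tests
excluded_validity = 8  # suboptimal performance validity

# Per-group reductions, derived by subtraction (not reported by reason).
excluded_by_group = {g: enrolled[g] - final[g] for g in enrolled}
total_excluded = sum(excluded_by_group.values())

# The reductions should account for every reported exclusion.
assert total_excluded == excluded_missing + excluded_validity

football_n = final["professional"] + final["college"]
print(excluded_by_group, total_excluded, football_n)
```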

Objective neuropsychological evaluation

Participants completed an in-person baseline neuropsychological test battery using standard paper-pencil tests administered by fully trained examiners [20]. A complete list of the domains assessed and neuropsychological tests administered is presented in Table 1.

Table 1 DIAGNOSE CTE Research Project Neuropsychological Baseline Measures

Neuropsychological measures were selected to ensure harmonization with data-sharing platforms, such as the National Alzheimer’s Coordinating Center (NACC). Many instruments and methodologies that overlap with the NACC Uniform Data Set (UDS) v.3.0 were selected [29, 30]. Measures include those that assess cognitive domains relevant to the features described in neuropathologically confirmed cases of CTE as reported by informants of brain donors [5, 7, 10, 13] and that are part of the TES research diagnostic criteria [10, 13]. Domains assessed included attention, visual scanning, and psychomotor speed; executive functions; learning and episodic memory (verbal and visual); language; and visuospatial abilities. Tests of memory and executive functions were overrepresented given these domains are known to be adversely affected by exposure to RHI [3, 10, 13]. Measures of performance validity and estimated pre-morbid intelligence were administered. Raw scores for all tests were generated according to NACC or test manual protocols. For all tests, the primary raw score outcome was total correct, with the exception of the Trail Making Test, where completion time (in seconds) served as the primary outcome (number of errors is also reported). Neuropsychological test raw scores were converted to T-scores using normative data that accounted for age, sex, and education. A small number of tests only accounted for age or age and education. Table 1 provides the normative data source. A T-score ≤ 35 (i.e., 1.5 standard deviations [SD] below the normative mean) was considered impaired [14, 17, 31]. The T-score range was restricted to 20–80 to limit skewed distributions of the data and outliers.
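The scoring convention can be illustrated with a short sketch. A T-score has a normative mean of 50 and an SD of 10, so 1.5 SD below the mean is 50 − 15 = 35. The code assumes a simple mean/SD normative cell; the actual norms in Table 1 may use regression-based or tabled adjustments, and the function names and example values here are hypothetical.

```python
def raw_to_t(raw: float, norm_mean: float, norm_sd: float,
             higher_is_better: bool = True,
             t_min: float = 20.0, t_max: float = 80.0) -> float:
    """Convert a raw score to a T-score (mean 50, SD 10), restricted to 20-80.

    norm_mean and norm_sd describe a hypothetical normative cell
    (e.g., matched on age, sex, and education).
    """
    z = (raw - norm_mean) / norm_sd
    # For timed tests such as Trail Making, a longer completion time is
    # worse, so the sign of z is flipped.
    if not higher_is_better:
        z = -z
    t = 50.0 + 10.0 * z
    return max(t_min, min(t_max, t))


def is_impaired(t_score: float, cutoff: float = 35.0) -> bool:
    """Impaired if the T-score is at or below the cutoff (1.5 SD below mean)."""
    return t_score <= cutoff
```

For example, with an illustrative normative mean of 8 words and SD of 2, a raw recall of 5 words gives z = −1.5 and a T-score of 35, which meets the impairment threshold.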

Suboptimal performance validity was defined by below-criterion performance on two out of the following three performance validity measures: Trial 2 of the Test of Memory Malingering (TOMM), reliable number span (modified due to use of the UDS Number Span task), and Neuropsychological Assessment Battery (NAB) List Learning Recognition Hits. Established cutoffs for defining performance validity failure on these measures were used but are not disclosed here to preserve test integrity. While failure of one performance validity test can be indicative of invalidity [32], our decision adheres to the revised Slick criteria (Sherman et al.) for identification of malingered neurocognitive dysfunction, which advise failure of 2+ performance validity tests [33]. Of the sample with complete neuropsychological data, 10 (4.2%) had suboptimal performance on the TOMM, 8 (3.4%) had a below-cutoff score on the reliable number span, and 50 (21.3%) fell below cutoff on the NAB Recognition Hits trial. Fifty (21.3%) failed exactly one performance validity measure, six (2.6%) failed two, and two (0.9%) failed all three. Of note, four participants had one performance validity test missing but were above the cutoff on the other two indices. Taken together, for the current sample, a total of 8 participants were excluded for suboptimal performance validity.
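The exclusion rule can be sketched as a count of failed validity measures. Because the cutoffs are not disclosed, the sketch takes pass/fail flags as input instead of scores. The function name and the use of `None` for a missing measure are assumptions made for illustration.

```python
from typing import Optional, Sequence


def suboptimal_validity(failed: Sequence[Optional[bool]],
                        min_failures: int = 2) -> bool:
    """Flag suboptimal validity when at least `min_failures` measures fail.

    Each entry is True (below cutoff), False (above cutoff), or None
    (measure missing). Missing measures are not counted as failures.
    """
    n_failed = sum(1 for f in failed if f is True)
    return n_failed >= min_failures


# Order of flags: TOMM Trial 2, reliable number span, NAB Recognition Hits.
print(suboptimal_validity([True, False, True]))   # two failures
print(suboptimal_validity([True, False, False]))  # one failure
print(suboptimal_validity([False, None, False]))  # one measure missing
```

Under this rule, a participant with one missing measure who is above the cutoff on the other two is retained, which matches the handling described above.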

Sample characteristics

Semi-structured interviews were performed, supplemented by online questionnaires, to collect data on demographics, medical and psychiatric history, athletic history, and other variables not relevant to the present study. An aliquot of whole blood was used for APOE genotyping. Race and ethnicity were self-reported. A majority of the sample was Black or White. There was insufficient representation of other racial groups to statistically examine them separately. All racial groups are presented in Table 2.

Table 2 Sample characteristics

Statistical analyses

Descriptive statistics were used to characterize the neuropsychological test raw and T-scores. Rates of impairment on each test and/or test indices are reported, based on the above-described cutoffs (e.g., T-score ≤ 35). A neuropsychological domain was considered impaired if at least two tests within that domain had a T-score ≤ 35. This was only done for the attention, visual scanning, and psychomotor speed domain; the executive function domain; and the episodic memory domain, as there was an insufficient number of tests for the language and visuospatial domains (as designed). While we report the frequency of those who had one impaired score, our interpretation of an impaired domain is based on 2+ tests falling below the threshold given the high base rates of test impairments on large batteries among normative individuals [31–35]. For the memory domain, measures that counted towards impairment included the long delay recall trials from the Brief Visuospatial Memory Test-Revised, Neuropsychological Assessment Battery List Learning task, and Craft Story 21 Recall (Paraphrase). Analysis of covariance controlling for age compared the former college and professional football players on the neuropsychological test raw scores. Statistical analyses were conducted using IBM SPSS Statistics, version 27.
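The domain-level rule can be expressed in a few lines. This is a sketch of the stated definition (at least two tests in a domain with a T-score ≤ 35), not the SPSS syntax used in the study. The function names and the example T-scores are hypothetical.

```python
from typing import Sequence

CUTOFF = 35.0   # T-score at or below this value is impaired
MIN_TESTS = 2   # number of impaired tests required for an impaired domain


def n_impaired_tests(t_scores: Sequence[float], cutoff: float = CUTOFF) -> int:
    """Count the tests in a domain with a T-score at or below the cutoff."""
    return sum(1 for t in t_scores if t <= cutoff)


def domain_impaired(t_scores: Sequence[float],
                    cutoff: float = CUTOFF,
                    min_tests: int = MIN_TESTS) -> bool:
    """A domain is impaired when at least `min_tests` tests are impaired."""
    return n_impaired_tests(t_scores, cutoff) >= min_tests


# Hypothetical memory-domain T-scores for one participant, in the order
# BVMT-R delay, NAB List Learning long delay, Craft Story delay (paraphrase).
memory_t = [48.0, 33.0, 35.0]
print(n_impaired_tests(memory_t), domain_impaired(memory_t))
```

Requiring two impaired tests instead of one trades some sensitivity for protection against the high base rate of isolated low scores on large batteries.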

Results

Sample characteristics are in Table 2. The sample included 170 former American football players (111 former professional football players, 59 former college football players). The sample of football players was 57.5 (SD=8.1) years old and had 16.7 (SD = 1.5) years of education, and 55 (32.4%) were Black or African American. Tables 3 and 4 show the neuropsychological test performance of the former American football players. Analysis of covariance controlling for age showed a statistically significant difference between the former college and professional football player groups on only three of the primary neuropsychological tests, all memory (Supplemental Tables 4 and 5), though the significant differences would not have survived multiple comparison adjustments. For this reason, the former college and professional American football players were combined and described as a single group.

Table 3 Baseline neuropsychological test performance of former American football players
Table 4 T-score distributions of baseline neuropsychological test performance of former American football players

As previously described, participants who had suboptimal performance validity on 2+ measures were excluded from the sample (n=8). However, there remained four participants who had suboptimal scores on TOMM Trial 2 (scores ranged from 32 to 38). These four participants were retained given their adequate performances on the other two validity tests, including reliable number span (scores ranged from 7 to 11) and NAB List Learning Recognition Hits (percentiles ranged from 13 to 50).

Estimated premorbid intelligence

Based on the Wide Range Achievement Test, 4th Edition (WRAT-4) standard score, the estimated premorbid intelligence of the former football players fell in the average psychometric range. Fourteen had below average standard scores (i.e., <85). Five (2.9%) reported a diagnostic history of a learning disability, two of whom had a below average standard score on the WRAT-4.

Learning and episodic memory

The sample mean T-scores for NAB List Learning Trials 1–3 and NAB List Learning Short and Long Delay recall trials were all ~40. Thirty-six (21.2%) had impaired episodic memory, representing the domain with the highest rate of impairment. Eighty-eight (51.8%) had at least one test impaired in episodic memory. Of the sample, impairments were frequent on the NAB List Learning Trials 1–3 (30.6%, n=52) and on the NAB List Learning Short Delay (37.6%, n=64) and Long Delay recall trials (44.7%, n=76). The participants recalled a mean of 5.2 (of 12) words on the long delay recall trial. On the recognition trial, 23 participants (13.8%) had impaired false positive errors (mean = 4.8, SD = 3.8) and 33 participants (19.8%) had impaired recognition hits (mean = 10.4, SD = 1.4).

Compared with learning and memory for unstructured verbal stimuli, learning and memory of structured contextualized information (i.e., a story) were better. The sample mean T-scores were in the average psychometric range for Craft Story 21 Immediate and Delay Recall trials (for both paraphrase and verbatim). Rates of impairments were approximately 15% (n=25) for Craft Story 21 Recall Immediate and Delay trials with impairment rates highest for Craft Story 21 Recall Delay Paraphrase (18.8%, n=32).

There was better visual than verbal memory test performance. The sample mean T-scores on indices of learning and episodic memory for figures (BVMT-R) fell in the average psychometric range. Of the sample, 21.8% (n=37) and 17.1% (n=29) had impairments on BVMT-R Trials 1–3 and BVMT-R Delay Recall, respectively. Recognition hits (mean = 5.7, SD = 0.7) and false alarms (mean = 0.1, SD = 0.4) were overall intact with few participants having scores in the impaired range (n = 6 [3.5%] for hits, n = 4 [2.4%] for false alarms).

Executive functions

Mean T-scores were in the average psychometric range and 12 (7.1%) had impaired executive function. Forty-five (26.5%) had at least one test impaired in this domain. Rates of impairments were highest for Trail Making Test Part B (20.6%). On Trail Making Test Part B, 37 had one error, 14 had two errors, and 9 had more than two errors. Fewer than 10% of the sample had impaired performance on each of the other tests.

Attention, visual scanning, psychomotor speed

The sample mean T-scores fell in the average psychometric range for all neuropsychological tests administered in this domain. Twenty-one (12.4%) participants were impaired in this domain. Fifty-five (32.4%) had at least one test impaired, and rates of impairments ranged from 2.4% (n=4) on UDS Number Span Forward total correct trials to 18.8% (n=32) on Trail Making Test Part A. On Trail Making Test Part A, 29 participants had one error and two participants had two errors.

Language

The sample mean T-scores for measures of semantic fluency (Animal Fluency) and confrontation naming (Multilingual Naming Test, MINT) were in the average psychometric range. Regarding rates of impairments, 21.2% (n=36) and 17.1% (n=29) of the sample were impaired on the MINT and Animal Fluency, respectively.

Visuospatial

Only 7.1% (n=12) were impaired on the Judgment of Line Orientation test. Gross visuospatial abilities on the BVMT-R Copy were intact as raw scores ranged from 9 to 12 (of 12).

Multidomain impairments

We examined rates of multidomain impairments among the memory, attention, visual scanning and psychomotor speed, and executive function domains. Based on our definition of impairment (i.e., 2+ tests impaired), 25 (14.7%) had 1 domain impaired and 20 (11.8%) of the football players had 2 or more domains that were impaired.
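The multidomain tally follows from applying the domain rule to each participant and counting impaired domains. The sketch below is illustrative only: the participant records are hypothetical, and `pct` simply reproduces the reported percentages from the reported counts and the sample size of 170.

```python
from collections import Counter
from typing import Dict, List

CUTOFF = 35.0   # T-score at or below this value is impaired
MIN_TESTS = 2   # impaired tests required for an impaired domain


def impaired_domains(scores: Dict[str, List[float]]) -> List[str]:
    """Return the domains with at least MIN_TESTS T-scores <= CUTOFF."""
    return [domain for domain, ts in scores.items()
            if sum(1 for t in ts if t <= CUTOFF) >= MIN_TESTS]


def tally(participants: List[Dict[str, List[float]]]) -> Counter:
    """Count participants with 0, 1, or 2+ impaired domains."""
    counts: Counter = Counter()
    for scores in participants:
        n = len(impaired_domains(scores))
        counts["0" if n == 0 else "1" if n == 1 else "2+"] += 1
    return counts


def pct(n: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * n / total, 1)


# Hypothetical T-scores for three participants across the three domains.
sample = [
    {"memory": [30, 34, 50], "attention": [45, 52], "executive": [48, 51]},
    {"memory": [30, 34, 50], "attention": [33, 35], "executive": [48, 51]},
    {"memory": [44, 47, 50], "attention": [45, 52], "executive": [48, 34]},
]
print(tally(sample))
print(pct(25, 170), pct(20, 170))
```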

Discussion

This study examined the neuropsychological test performance of 170 male former college (n=59) and professional (n=111) football players (ages 45–74), most of whom had subjective cognitive concerns. Impairments were identified using established normative data that account for age, sex, and education. Episodic memory was the most frequently impaired cognitive domain, particularly memory of unstructured verbal information (i.e., NAB List Learning). Compared with unstructured verbal stimuli, learning and recall of contextual verbal stimuli (i.e., Craft stories) and visual information (i.e., BVMT-R figures) were better, but impairments were still frequent. Other domains with impairments included attention and psychomotor speed (i.e., Trail Making Test Part A) and set-shifting and mental flexibility (Trail Making Test Part B). With the exception of Trail Making Test Part B, performances on tests of executive functions and on visual-perceptual abilities were otherwise preserved.

The results of this study have several implications. Previous research has shown that more than one-third of NFL retirees report being “extremely concerned” about memory and thinking skills [36]. A majority of this sample also had subjective cognitive concerns. Our finding that performance on memory tests was the most frequently impaired is similar to other neuropsychological studies of former NFL players [3]. The mean performance on the word list learning test was at a level of impairment comparable to what is seen in patients with mild cognitive impairment (MCI) [37]. This finding, in combination with smaller reductions in scores on psychomotor speed, confrontation naming, and semantic fluency, suggests a neuropsychological profile that resembles an amnestic form of MCI in this sample of former college and professional football players with a mean age of 58, similar to what has been suggested by other investigators [38].

The 2021 NINDS Consensus Diagnostic Criteria for TES include impairments in episodic memory and/or executive functions as core clinical features [10]. One surprising result from this study was that, with the exception of Trail Making Test Part B, performance on tests of executive functions was relatively preserved. While this finding might provide additional support for a neurocognitive profile consistent with amnestic MCI, it might also be an effect of some of the well-known limitations in the neuropsychological assessment of executive functions. Studies that have examined many of the most commonly used tests of executive functions find only modest correlations among the tests suggesting that these functions are difficult to measure as they do not combine neatly into a unitary factor [39]. There are also indications that tests of executive functions often fail to correspond to behavioral ratings of dysexecutive behavior, raising questions about the ecological validity of the measures [40]. In theory, one would expect individuals with the “neurobehavioral dysregulation” of TES to be impaired most specifically on measures of impulsive responding. There was no evidence of impairment in this study on tasks like the Golden Stroop Color Word Interference measure, a well-known index of cognitive impulsivity.

This study included former college and professional football players. There was no statistically significant difference (with consideration of multiple comparisons) between the former college and professional American football players across any of the neuropsychological tests, but there were trends for worse performance in former professional football players. Former professional American football players, and primarily former NFL players, have been the focus of studies on the long-term neuropsychological consequences of American football play [3]. This is one of the first studies to feature middle aged to older adult former college football players without subsequent professional experience or other RHI exposure after college. From a public health perspective, it is critical to elucidate the long-term health outcomes of college football players given that approximately 800,000 student athletes have played college football in the USA since 1960, 250,000 of whom are currently older than 60 years of age [41]. Moreover, a recent health outcome survey study found a significantly higher prevalence of cognitive impairment disorders in former college football players compared to the general population, a finding similar to previous studies of former NFL players [41].

A challenge in the field of neuropsychology is the appropriate selection of normative data to derive standardized scores to establish levels of impairment. Here, normative data used included those from the specific test manuals, as well as from the NACC for UDS measures. A majority of normative data accounted for age, sex, and education. However, there were variations in normative adjustments across tests that could have influenced impairment rates by test and domain. Race-based norming was not performed. Race-based norming has been incorporated into the training and practice of neuropsychology since at least the 1990s (e.g., Heaton Norms) based on the assumption that race may be a proxy for socioeconomic factors associated with cognitive function. For people who identify as Black, race-based norming results in a stricter threshold needed to be designated as cognitively impaired compared with Whites. The differential treatment of Blacks when scoring and interpreting neuropsychological tests has been a controversial practice [42]. Recently, the NFL ended its use of race-based neuropsychological test norms to determine monetary compensation as part of the NFL Concussion Settlement. The use of race-based norms as part of a rigid algorithm that is devoid of clinical judgment to determine compensation perpetuates systemic racial injustice and inequity [43]. Prior to consideration of normative data, a majority of neuropsychological tests were developed in White populations, placing Black Americans at a disadvantage from the outset. A study is currently underway that is modeling the neuropsychological differences by race in this sample, along with relevant psychosocial, socioeconomic, social, and health factors that might explain observed differences.

There are limitations to the present findings. The asymptomatic unexposed men were required to have no reported symptoms to be eligible for the DIAGNOSE CTE Research Project. While recruitment of the former football players was not based on symptomatic status, most have subjective cognitive (and neuropsychiatric) concerns. This design is appropriate for biomarker development but it limits meaningful comparisons and interpretations on neuropsychological measures between groups as any observed differences could be biased by our recruitment methods. For this reason, statistical comparisons of the former American football players and the asymptomatic unexposed men were not performed [20]. The use of normative data circumvents limitations of study design and informs on rates of neuropsychological impairments among former elite football players. Our ability to make inferences on whether impairments are from pathology and a function of exposure to RHI is challenging given the recruitment design, lack of biomarkers, and in the context of the test performance of the asymptomatic unexposed men. Although impairments were generally infrequent in the unexposed men, approximately 25% and 21% were impaired on BVMT-R Learning Trials and NAB List Learning Long Delay Recall trial, respectively. While the presence of neurological disease in this group cannot be ruled out, it might also be a function of the number of neuropsychological tests administered [31, 44–46]. In the present battery of close to 15 separate correlated indices, rates of impairments in the entire sample might be inflated due to type I error. We also excluded eight participants who had evidence of suboptimal performance on 2+ performance validity tests, based on the revised Slick criteria [33]. Four participants in the sample had suboptimal performance on the TOMM Trial 2 but not on any of the remaining validity indices. We acknowledge that failure of just one validity test can be indicative of invalidity and could have contributed to inflated impairment rates [32]. However, the use of 2+ tests to define invalidity is more stringent, followed recommended guidelines, and performance invalidity rates in this sample were overall low and did not influence the results.

The current study did not include a disease comparison group (e.g., Alzheimer’s disease), which is needed to determine the specificity of the observed neuropsychological profiles and facilitate differential diagnosis. The sample includes individuals who volunteered to participate in research. Most of the male former football players had concerns about their cognitive function, mood, and/or behavior. External validity to the general football population, as well as to women and other athlete populations is limited. We used 1.5 SD below the normative mean to define impairment, a generally accepted convention [14, 17]. We recognize that a continuum exists. Finally, cognitive function was measured using traditional paper-and-pencil tests that might have lacked adequate sensitivity to capture certain impairments. The absence or low rate of impairments in certain domains (e.g., executive functions, visuospatial abilities) might be related to measurement. While digital phenotyping currently lacks clinical applicability, it is an exciting avenue of future research.

Conclusions

In this sample of 170 male former elite American football players, a comprehensive neuropsychological assessment revealed most frequent impairments in learning and recall for unstructured verbal stimuli. Continued efforts are needed to characterize the neuropsychological profile of individuals exposed to RHI to assist neuropsychologists and other clinicians in disease detection and differential diagnosis. Additional research that includes a disease comparison (e.g., Alzheimer’s disease) and examines causes of neuropsychological impairment in this population is needed. Development of tests sensitive to the specific executive functions disturbed in this population is also an important target for future research. Such development should include and extend beyond traditional paper-and-pencil tests which might not be adequate for the identification of certain impairments in this population.