Gene Expression Alterations in Peripheral Blood Following Sport-Related Concussion in a Prospective Cohort of Collegiate Athletes: A Concussion Assessment, Research and Education (CARE) Consortium Study

Rapid diagnosis of concussion is essential to effective treatment and recovery. Concussion biomarker research has focused primarily on blood-based protein assays to detect markers of brain injury. However, transcriptomic data provide insight into the complex biological response to concussion. In this study, we performed RNA-seq transcriptome analysis of whole blood in a large cohort of concussed and control collegiate athletes who were participating in the Concussion Assessment, Research and Education (CARE) Consortium. In this multicenter prospective cohort study, blood samples were collected from collegiate athletes at preseason (baseline), post-injury (0–6 hours), 24–48 hours post-injury, time of symptom resolution, 7 days after unrestricted return to play, and 6 months post-injury. RNA sequencing was performed on samples from 230 concussed, 130 contact control, and 102 non-contact control athletes. Differential gene expression analysis was performed at each timepoint relative to baseline. Deconvolution analysis was used to identify differences in immune cell types. We identified key genes and pathways that were activated in response to concussion. Cytokine and immune response signaling pathways were activated immediately after concussion, but at later timepoints these pathways appeared to be suppressed relative to contact controls. RNA-seq data also revealed that the proportion of neutrophils increased and the proportion of natural killer cells decreased in the blood following concussion. Transcriptome signatures in the blood reflect the known pathophysiology of concussion and may be useful for defining the immediate biological response and the time course for recovery. In addition, the identified immune response pathways and changes in immune cell type proportions following concussion could inform future treatment strategies.


Introduction
Concussion is a type of mild traumatic brain injury caused by a blow to the head, or a hit to the body, that produces a sudden movement of the head and brain. The sudden movement of the brain inside the skull injures neural cells and blood vessels, resulting in altered brain chemistry and brief loss of normal brain function [1]. Evaluation for a suspected concussion immediately after injury typically involves symptom assessment. Common symptoms include a brief loss of consciousness, nausea, headache, blurred vision, or confusion [1]. Symptoms attributed to concussion injury are greatest within the first 7 to 10 days and for most patients are resolved by one month, although a minority of patients may have symptoms that persist for several months or longer [2,3]. Concussion is a common injury in many collegiate sports [4]. Unfortunately, concussions are underreported, not only because the symptoms can be subtle or may not be apparent immediately following injury, but also because some athletes want to remain in competition [5,6]. Rapid identification of concussion is important because a delay in diagnosis can prolong recovery. Additionally, individuals who return to play before they are fully recovered are at increased risk of sustaining another, more serious brain injury [7]. Because of the challenges related to concussion diagnosis, molecular-based approaches that provide complex biological information are necessary to advance concussion research and identify potential diagnostic and/or prognostic biomarkers of injury.
Given that blood-based assays have identified potential protein biomarkers of concussion injury, we hypothesize that gene expression-based diagnostic or prognostic biomarkers may also exist [8]. Our objective in this study was to identify longitudinal gene expression changes in peripheral blood that are initiated post-injury and that are relevant to concussion response and recovery. We analyzed samples from the Concussion Assessment, Research and Education (CARE) Consortium, which was formed to further the study of concussion neurobiology and the consequences of exposure to repetitive head impacts [9]. We anticipate that post-injury gene expression signatures will point to biological processes that are informative for long-term recovery prognosis.
RNA sequencing (RNA-seq) is a powerful experimental approach that summarizes the transcriptome of cells and can be used to infer the expression of genes in a tissue or blood sample. This study describes the initial findings from RNA-seq analysis on concussed individuals, spanning preseason baseline and multiple post-injury timepoints. This study also introduces a comprehensive dataset that will be publicly available and will serve as a valuable resource for researchers investigating the consequences of head impacts, traumatic brain injuries, and gene expression biomarkers.

Methods
Study participants and sample collection. For this study, whole blood was collected from a cohort of 552 collegiate varsity and military academy cadet athletes participating in various competitive sports between 2015 and 2019. Samples were drawn into PAXgene tubes (BD Biosciences, Franklin Lakes, NJ) at six timepoints: seasonal baseline (Base), at the start of the athletic season before injury; post-injury (PostInj), taken within six hours of injury; 24-hour (24h), taken 24 to 48 hours after injury; asymptomatic (Asymp), when an athlete begins return-to-play progression; seven days post unrestricted return to play (7PostUR); and six months (6Mo) from the date of injury [9]. Athletes were divided into three groups based on injury status: non-contact controls (NCC) were athletes who did not participate in contact sports; contact controls (CCT) were athletes who participated in contact sports but did not sustain a concussion; and injured (INJ) athletes were those who sustained a concussion [9]. In CCTs, the timing of blood draws after Base was approximated by pairing each individual with an INJ athlete on the same team. We removed any individual from the CCT and INJ groups who did not have a baseline sample. As no preseason baseline blood draws were collected for the NCCs, these individuals were not paired to any other participant for timespan matching. During collection, the time between NCC blood draws was approximated against the distribution of time between INJ follow-up appointments, at the discretion of the institution managing the respective NCC participant. During analysis, the initial blood draw for each NCC participant was redefined as Base. Other blood draws for the same NCC participant were then paired separately with the Base sample. The full study description, along with the concussion criteria and recovery protocol, has been detailed earlier [9].
This study was approved by the Indiana University School of Medicine institutional review board and the Human Research Protection Office at the US Army Medical Research and Materiel Command. Written informed consent was obtained from all participants.
Sequencing library preparation. Total RNA was extracted from blood cells using the PAXgene Blood RNA kit (Qiagen, Germantown, MD), followed by DNase I treatment to remove contaminating genomic DNA. Dual-indexed, strand-specific cDNA libraries were prepared from eluted total RNA using the Kapa mRNA HyperPrep kit (Kapa Biosystems, Wilmington, DE) along with the QIAseq FastSelect Human Globin removal kit (Qiagen). Libraries were prepared in a 96-well plate using a Biomek FxP Laboratory Automation Workstation (Beckman Coulter Life Sciences, Indianapolis, IN). Each plate was pooled using the QIAgility Automation System (Qiagen). Pooled libraries were loaded onto a flow cell that was sequenced with a 2 × 150 bp paired-end configuration on a NovaSeq 6000 instrument (Illumina, Inc., San Diego, CA).
Gene expression quantification and differential expression analysis. Sequence reads from RNA-seq experiments were aligned to the human genome (hg38) using STAR v2.5.2b. Gene expression levels were quantified by counting the number of RNA fragments aligned to exonic regions of genes using the program featureCounts [10]. The data were analyzed as individual timepoints compared to baseline to avoid eliminating participants with missing timepoint data. Differential expression analysis was performed with edgeR using negative binomial generalized log-linear modeling (GLM) and likelihood ratio tests [11,12]. When calculating distribution parameters with the estimateDisp function, the robust option was used to reduce the influence of extreme outliers. Genes with very low read counts were removed before differential expression analysis to reduce the number of individual statistical tests performed and to avoid inflated significance values. Genes were filtered out if they had less than 1 count per million mapped fragments (CPM) in a minimum number of samples at each timepoint. Given the large number of samples in each group, we defined the minimum number of samples as 25% of the smallest group in the comparison (i.e., INJ vs. CCT). At each timepoint, the minimum sample thresholds (N) were: PostInj (48); 24h (58); Asymp (61); 7PostUR (57); and 6Mo (42). Differential expression analysis was performed at each timepoint compared to baseline, and a contrast between the INJ and CCT groups was calculated. By first modeling within the CCT and INJ groups, and then contrasting them, we controlled for gene expression changes resulting from contact sport participation and not associated with injury. Background levels of gene expression differences at the PostInj timepoint were defined by comparing CCT and NCC participants. In all comparisons, genes with Benjamini-Hochberg false discovery rate (FDR) ≤ 0.05 were considered significant.
Gene ontology analysis. Gene ontology (GO) analysis is a type of enrichment analysis in which the top differentially expressed genes in an experiment, defined by a significance cutoff, are matched against reference gene lists that have been annotated to biological terms or functions. Enrichment tests determine whether a term is significant by comparing the number of matched genes in a list to a random background. GO analysis was performed in R using the clusterProfiler package [13]. Differentially expressed genes with FDR ≤ 0.05 were converted to Entrez gene IDs using Biomart [14]. Term enrichment at the PostInj timepoint was determined using clusterProfiler to search the following reference lists: biological process, cellular component, molecular function, and Kyoto Encyclopedia of Genes and Genomes (KEGG). Only terms with Benjamini-Hochberg FDR ≤ 0.05 were considered significant. Terms with enrichment lists containing ≥ 50% common genes were merged.
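The low-count filter described in the differential expression paragraph above can be illustrated with a minimal sketch. This is not the authors' edgeR code: the function names, the toy gene labels, and the toy group sizes are hypothetical, and two details are assumptions, namely that a gene is kept when it reaches at least 1 CPM in at least the minimum number of samples, and that 25% of the smallest group is rounded up.

```python
import math
from typing import Dict, List


def min_sample_threshold(group_sizes: List[int], fraction: float = 0.25) -> int:
    """Return `fraction` of the smallest group, rounded up (rounding is an assumption)."""
    return math.ceil(fraction * min(group_sizes))


def filter_low_counts(counts: Dict[str, List[int]],
                      min_samples: int,
                      cpm_cutoff: float = 1.0) -> List[str]:
    """Keep genes with CPM >= cpm_cutoff in at least min_samples samples."""
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = total mapped fragments in that column.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = []
    for gene, row in counts.items():
        n_pass = sum(
            1 for c, lib in zip(row, lib_sizes)
            if lib > 0 and c * 1e6 / lib >= cpm_cutoff
        )
        if n_pass >= min_samples:
            kept.append(gene)
    return kept


# Toy count matrix (hypothetical genes, four samples of 1,000,000 fragments each).
toy_counts = {
    "GENE_A": [500, 400, 600, 450],
    "GENE_B": [0, 1, 0, 0],  # reaches 1 CPM in one sample only
    "GENE_C": [999_500, 999_599, 999_400, 999_550],
}

kept = filter_low_counts(toy_counts, min_samples=2)
threshold = min_sample_threshold([40, 30])  # hypothetical group sizes
print(kept, threshold)
```

In this toy example GENE_B is dropped because it passes the CPM cutoff in only one of the required two samples. Scaling the sample threshold with the smallest group keeps the filter from discarding genes that are expressed in only one arm of a comparison.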
Gene set enrichment analysis. Gene set enrichment analysis (GSEA), which is not dependent on a significance cutoff, is used when few or no statistically significant differentially expressed genes have been identified. Instead, genes are ordered by significance and a running score is obtained as matching proceeds down the full list. Genes at all timepoints were ranked by the -log(p-value) from differential expression analysis multiplied by the sign of the fold change, and were analyzed using GSEA v4.1.0 [17].
Deconvolution analysis. Deconvolution analysis is the process by which cell type proportions are estimated from bulk RNA-seq data based on marker gene expression. Deconvolution analysis was performed with CIBERSORTx using raw read counts and default software normalization [18]. A GLM was used to test the difference between estimated percentages of cell types output by CIBERSORTx. The cell type percentage of the Base sample and the group were used as covariates to predict the cell type percentage of the timepoint sample. All timepoints were tested, and cell types with Benjamini-Hochberg FDR ≤ 0.05 at a given timepoint were considered significant.
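The gene ranking used as GSEA input can be sketched as follows. This is an illustrative sketch only: the function names and toy values are hypothetical, the logarithm base (10) is an assumption because the text does not specify it, and the small floor on the p-value is added here solely to avoid taking the logarithm of zero.

```python
import math
from typing import List, Tuple


def signed_rank_metric(log_fc: float, p_value: float, floor: float = 1e-300) -> float:
    """Return sign(log fold change) * -log10(p-value)."""
    p = max(p_value, floor)  # guard against log10(0); an added safeguard
    sign = (log_fc > 0) - (log_fc < 0)
    return sign * -math.log10(p)


def rank_genes(results: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Order genes from most significantly up to most significantly down."""
    scored = [(gene, signed_rank_metric(lfc, p)) for gene, lfc, p in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy differential expression results: (gene, log fold change, p-value).
toy_results = [
    ("GENE_DOWN", -0.8, 1e-4),
    ("GENE_FLAT", 0.1, 0.5),
    ("GENE_UP", 1.2, 1e-6),
]

ranked = rank_genes(toy_results)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

Because the metric carries the direction of change, strongly upregulated genes land at the top of the list and strongly downregulated genes at the bottom, which is what allows an enrichment score to be positive or negative.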
Role of the funding source.The funders had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Results
A total of 2,489 blood samples were collected from 552 athletes across all groups. Participants without baseline samples in the CCT and INJ groups were removed, leaving 130 CCT, 230 INJ, and 102 NCC individuals with a combined total of 2,125 blood samples. Participant demographics are in Table 1. Because some sample collections were missed, unaccounted for, improperly recorded, or failed quality control, the number of samples at each timepoint differed between groups. During the study, nine CCT participants sustained a concussion and were subsequently reclassified as INJ participants; as a result, samples from these athletes were present in both the CCT and INJ groups. Therefore, the baseline samples for these nine participants were duplicated for the reclassified sample sets, while the other blood draws for these participants (40 CCT and 33 INJ) remained unique in the dataset. In addition, one CCT athlete served as a control in two different seasons; the first CCT sample set consisted of five blood draws and the second set consisted of two blood draws. The same baseline sample was used for both of these sample sets.
To investigate how sport-related concussion altered gene expression patterns in peripheral blood over time, we performed differential gene expression analysis on the RNA-seq data at each of the sampled timepoints. The highest number of differentially expressed genes occurred at the PostInj timepoint (N = 860, FDR ≤ 0.05), and that number was reduced roughly 100-fold by the 24h timepoint (N = 8). Volcano plots of differentially expressed genes at all follow-up timepoints are shown in Fig. 2. Lists of differentially expressed genes, fold changes, and significance values for each timepoint are provided in Supplementary Data 1.
A known consequence of brain injury is membrane damage in neuronal cells that triggers ionic flux and disrupts calcium metabolism and calcium-dependent signaling [19]. The cellular response to restore homeostasis entails activating ion pumps, including calcium pumps, which in turn consume ATP and starve the brain of energy [19]. Amongst the differentially expressed genes, we observed that multiple genes related to calcium metabolism were altered at the PostInj timepoint, including CAMK2G, CAMKK2, and CAMKK1, which were all upregulated in INJ participants. Additionally, expression of many solute transporters was altered, including four members of the SLC22 family (SLC22A15, SLC22A16, SLC22A1, and SLC22A4) that transport carnitine, which is used in cells to transport long-chain fatty acids into mitochondria for energy production [20]. Together, the upregulated genes we observed related to calcium and energy metabolism suggested a compensatory effect following injury and matched the pathophysiology reported for concussions.
We also investigated gene expression of known potential protein biomarkers for traumatic brain injury diagnosis [21]. Genes for two FDA-approved traumatic brain injury biomarkers used in the i-STAT TBI plasma test (Abbott), GFAP and UCH-L1, did not meet the minimum expression threshold for analysis at any timepoint. The MAPT gene encoding Tau also did not meet the minimum expression threshold for analysis at any timepoint. NEFL, encoding neurofilament light chain, was expressed at all timepoints, but no significant differences were observed between INJ and CCT participants.
To explore the biological function of differentially expressed genes after concussion, we performed GO term and KEGG pathway enrichment analysis on the differentially expressed genes at the PostInj timepoint. The top two biological processes were neutrophil activation and neutrophil-mediated immunity (Fig. 3A, Supplementary Data 2). Several other significant biological processes were also related to immune response, which is consistent with inflammation as a mechanism of neuronal tissue damage in concussion injuries [22,23]. In addition, we observed significant gene expression differences in multiple interleukin receptor genes at the PostInj timepoint, including IL1R1, IL1R2, IL1RAP, and IL2RB, which is consistent with an acute inflammatory response and upregulated cytokine production that has previously been reported in traumatic brain injury studies [24,25]. Several other biological processes related to signal transduction pathways were found, such as regulation of GTPase activity and protein phosphorylation. Likewise, enriched KEGG pathways included natural killer cell mediated cytotoxicity, MAPK signaling pathway, and NOD-like receptor signaling activity (Fig. 3B, Supplementary Data 2).
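The enrichment test behind results such as these is commonly implemented as a one-sided hypergeometric (over-representation) test: given a background of annotated genes, how surprising is the observed overlap between the differentially expressed list and a term's gene set? The sketch below illustrates that calculation only; the function name and the toy numbers are hypothetical, and it is an assumption that this matches the exact settings used in the study.

```python
from math import comb


def enrichment_p_value(n_background: int, n_term: int,
                       n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: annotated genes in the background
    n_term:       background genes annotated to the term
    n_selected:   differentially expressed genes (all within the background)
    n_overlap:    selected genes that are annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the term, 3 selected, 2 overlapping.
p_toy = enrichment_p_value(n_background=10, n_term=4, n_selected=3, n_overlap=2)
print(round(p_toy, 4))
```

In practice one p-value is computed per term and the full set is then adjusted for multiple testing, which is where the Benjamini-Hochberg FDR threshold of 0.05 applies.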
The small number of differentially expressed genes after the 24h timepoint prohibited GO analysis. Therefore, to compare enriched cellular processes and pathways at all timepoints relative to the baseline, we performed GSEA. Enrichment results using hallmark gene sets are shown in Fig. 4A. Similar to the findings from GO analysis and KEGG pathways, GSEA also showed that the top-ranked pathways immediately following concussion were related to upregulation of immune-related signaling [26][27][28][29][30]. In addition, differentially expressed genes identified at the PostInj timepoint included multiple genes downstream of JAK, such as members of the PI3K-AKT, MAPK, and STAT signaling pathways. These genes include JAKMIP1, JAKMIP2, PRR5L, MAPK13, STAT6, and BCL6. Additionally, two regulatory subunits of protein phosphatase PP2, PPP2RB2 and PPP2R5E, were differentially expressed; PP2, a serine/threonine phosphatase, targets the Raf, MEK, and AKT signaling cascades. At later timepoints, we also observed significant enrichment for "TNFa signaling via NF-kB", "inflammatory response", and "IL6 JAK STAT3 signaling"; however, the enrichment scores were now negative.
To determine whether the observed reversal in enriched pathway scores at later timepoints might be related to recovery of injured athletes after being removed from play, we compared enriched processes at the PostInj timepoint between CCT and NCC participants (Fig. 4B). Interestingly, we observed that the immune signaling processes "TNFa signaling via NF-kB", "inflammatory response", and "IL6 JAK STAT3 signaling" were also positively enriched in CCT participants. Therefore, it appears that athletes participating in contact sports exhibit higher activation of certain immune signaling processes compared to athletes participating in non-contact sports. These immune signaling processes become further elevated immediately following concussion and are then downregulated below CCT levels during recovery. GSEA results also indicate that some altered immune signaling pathways in the INJ group appear to remain repressed compared to the CCT group up to 6 months following a concussion (Fig. 4A).
To confirm our observations from the GSEA Hallmark gene lists, we also performed GSEA with gene lists from WikiPathways and Biocarta [31,32]. Similar findings were observed in both WikiPathways (Fig. 5A) and Biocarta (Fig. 5B), in that gene sets were positively enriched (FDR ≤ 0.05) at the PostInj timepoint and negatively enriched at later timepoints. Observed pathways included those associated with cytokine production and inflammatory response. Full results from GSEA analyses are provided in Supplementary Data 3.
Because we observed differential expression of genes that are important in immune signaling, we asked whether there were any changes in circulating cell type populations in response to concussion. To address this question, we performed deconvolution analysis on RNA-seq counts using CIBERSORTx to estimate immune cell proportions. We then evaluated differences in cell type proportions between INJ and CCT groups at all timepoints using a GLM. Only two cell types, both at the PostInj timepoint, were determined to be differentially proportioned: neutrophils were more prevalent (FDR 2.3E-2) and natural killer cells were less prevalent (FDR 2.49E-5) in the INJ group. No differences in cell type proportions were observed at later timepoints. Our finding that the neutrophil proportion increased within 6 hours following concussion is consistent with a previous report that also found an increase in neutrophils at the site of injury following mild traumatic brain injury [33].
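The baseline-adjusted group comparison used for cell type proportions can be sketched as a linear model in which the follow-up percentage is predicted from the baseline percentage and a group indicator. The sketch below is a simplification rather than the authors' analysis: it assumes a Gaussian model with identity link (ordinary least squares), it returns coefficients only and does not compute p-values or FDR, and every function name and data value is hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_group_effect(base_pct: Sequence[float],
                     followup_pct: Sequence[float],
                     is_injured: Sequence[int]) -> List[float]:
    """Fit followup ~ intercept + base + group via the normal equations.

    Returns [intercept, baseline coefficient, group coefficient].
    """
    rows = [[1.0, b, float(g)] for b, g in zip(base_pct, is_injured)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, followup_pct)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Noise-free toy data built with a known group effect of +4 percentage points.
base = [50.0, 55.0, 60.0, 52.0, 58.0, 61.0]
injured = [0, 0, 0, 1, 1, 1]
followup = [5.0 + 0.9 * b + 4.0 * g for b, g in zip(base, injured)]

coef = fit_group_effect(base, followup, injured)
print([round(c, 6) for c in coef])
```

Including the baseline percentage as a covariate means the group coefficient reflects a shift at follow-up that is not explained by where each athlete started, which is the quantity of interest when asking whether a cell type changes after injury.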

Discussion
The results of this study provide new insights into the biological responses following concussion injury by comparing gene expression changes from five post-injury timepoints against a participant's baseline sample. We found a high number of gene expression changes in peripheral blood cells immediately following sport-related concussion, many of which were consistent with a major immune signaling response. We also identified compensatory changes in genes associated with calcium and energy metabolism that are consistent with the pathophysiology of concussion. Our finding that expression of genes which mediate immune signal transduction was enhanced after injury was further supported by GO analysis and GSEA. Immune signaling processes initiated immediately following concussion were largely suppressed within 24 hours and remained repressed compared to contact controls during the 6-month recovery period. Lastly, we found that the proportion of neutrophils in peripheral blood was higher in injured participants than in contact control athletes.
One strength of our study is that RNA-seq technology allowed quantification of significantly more genes compared to earlier studies that used microarray-based chips. However, some of these earlier small studies noted that differentially expressed genes persisted days or months after injury, whereas we found only a few differentially expressed genes 24 hours following concussion [30,34]. One potential explanation for the observed differences in gene expression at later timepoints is the higher biological variability in our study, which derives from the complex study design that spans multiple sports and institutions. Nevertheless, while a relatively small number of gene expression differences that we could not observe may persist, our findings provide evidence that a short-term surge in gene expression triggers a powerful immune response. This conclusion is consistent with the findings of at least one other study of sport-related concussions [30]. Notably, we did not observe changes in the expression of genes coding for known blood biomarkers of concussion injury. This finding suggests that protein and molecular biomarkers of concussion in peripheral blood are not related to gene expression changes in blood cells, and instead likely result from release by injured tissues, changes in protein metabolism, or other sources. Therefore, RNA-seq of peripheral blood samples may yield complementary, yet distinct, diagnostic targets compared to blood-based protein biomarker testing.
One limitation of our study is that peripheral blood samples do not reflect the physiological environment of a concussion as well as brain tissue from the site of injury. However, blood is easily accessible for diagnostic testing and is therefore of interest for identifying potential biomarkers for concussion diagnosis and monitoring recovery. Another limitation is that individuals who have suffered a sport-related concussion may also have sustained a blow to the body that might have contributed to gene expression changes in blood. We expect, however, that the wide variety of contact sports and the use of teammates as contact controls in this study minimized identification of non-concussion-related gene expression changes. Additionally, while sex is known to influence concussion recovery, we did not investigate the influence of sex on gene expression following a concussion. Because our study objective was to identify common gene expression changes, we first compared differential gene expression in individuals with their own preseason baseline sample; thus, the changes we identified are independent of sex. Finally, our study would be strengthened by additional samples and more complete time courses to fully explore the longitudinal effects of concussion on gene expression. Increasing the sample size would increase the statistical power for identifying small differences in gene expression and could improve the findings at individual timepoints as well.
To our knowledge, this study is the largest concussion transcriptome study to date. Our findings confirm results from several smaller studies and expand on the existing knowledge base by showing trends in gene expression following injury and concussion-related pathways during recovery. We anticipate that the dataset we provide herein will be a rich source of information that advances research in new strategies for the treatment of concussion injury.

Declarations
Patient metadata is currently accessible as part of CARE 1.0 on the Federal Interagency Traumatic Brain Injury Research Informatics System (FITBIR). Gene expression data in the form of raw counts will be available on FITBIR following acceptance of this manuscript. Raw sequence data will be made available on dbGaP. Criteria for access are determined by each of the entities hosting the data and the governance of their respective regulatory committees.
Supplementary Files

Figure 1 Distribution

Figure 4 Gene

Figure 5 Gene

Table 1
Cohort demographics of CARE participants. We constructed 130 CCT and 230 INJ sample sets, each having a baseline blood draw and at least one sample from a later timepoint. A summary of sample numbers is provided in Table 2, and the distribution of participant sample sets is shown in Fig. 1. NCC samples were used to represent time-based gene expression variance. For each of the 102 NCC participants, the first sample drawn was designated as the Base sample, and each subsequent sample for that participant was individually paired with the Base sample as a separate sample set. As a result, there were 428 NCC sample sets, each with only two samples, where the baseline sample may have been duplicated in another NCC sample set from the same athlete.