Ethics
Ethical approval for this study was provided through a central institutional review board (Western IRB 1271583). Written, informed consent was obtained for all participants. Written assent was provided by participants under 18 years of age. The study was registered in the clinicaltrials.gov registry (NCT02901821).
Participants
This multicenter study included 112 individuals, ages 8–24 years, with a clinical diagnosis of mTBI, as defined by the 2016 Concussion in Sport Group [15]. The participants were enrolled from emergency departments, sports medicine clinics, urgent care centers, concussion specialty clinics, and outpatient primary care clinics at initial clinical presentation (within 14 days of injury) and were repeatedly assessed for symptoms, balance, cognitive test performance, and saliva ncRNA levels up to 60 days post-injury. The cohort was divided into PPCS (n = 32) and non-PPCS (n = 80) groups based on self-reported symptom scores. PPCS was defined using the upper 95% confidence interval of the mean symptom severity score on the Post-Concussion Symptom Scale (PCSS) from 170 age-matched participants without mTBI (score ≥ 5) [16]. The first symptom report ≥ 21 days post-injury was used to determine PPCS status. A cut-off of 21 days was chosen based on the literature showing that the majority of children (75.6%) report concussion recovery within two weeks, but symptom change flattens between two and four weeks [17]. This threshold resulted in a percentage of PPCS participants (28.6%; n = 32) consistent with existing literature [18, 19]. Participants were enrolled at six institutions: Adena Health System (n = 14), Colgate University (n = 7), Penn State College of Medicine (n = 69), State University of New York (SUNY) Buffalo Medical University (n = 3), SUNY Upstate Medical University (n = 3), and Vanderbilt University (n = 16). Participants meeting the following criteria were excluded: non-English speaking, neurologic injury (e.g., intracranial bleeding, spinal cord injury, skull fracture), periodontal disease, upper respiratory infection, secondary oropharynx injury, baseline hearing/vision loss, and drug or alcohol dependency.
Additional exclusion criteria included presentation for clinical care > 14 days after injury (n = 17), incomplete symptom reports necessary for PPCS classification (n = 111), and falling outside the desired age range (n = 16; Supplemental Fig. 1).
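The PPCS classification rule described above can be illustrated with a minimal Python sketch. It is not the study code: the function name, the input layout (a list of day/score pairs), and the handling of participants without an eligible report are assumptions made for illustration. Only the two constants (PCSS score ≥ 5, first report ≥ 21 days post-injury) come from the text.

```python
from typing import List, Optional, Tuple

PPCS_SCORE_CUTOFF = 5   # PCSS severity score cut-off stated in the text
MIN_DAYS_POST_INJURY = 21  # first report at or after this day decides status


def classify_ppcs(reports: List[Tuple[int, int]]) -> Optional[bool]:
    """Classify one participant from (days_post_injury, pcss_score) pairs.

    Returns True for PPCS, False for non-PPCS, and None when no report
    exists at or after day 21 (hypothetical handling: such participants
    cannot be classified).
    """
    eligible = sorted(r for r in reports if r[0] >= MIN_DAYS_POST_INJURY)
    if not eligible:
        return None
    _, first_score = eligible[0]  # earliest eligible report
    return first_score >= PPCS_SCORE_CUTOFF
```

For example, a participant with reports on days 3, 22, and 45 is classified from the day-22 score alone, regardless of later improvement.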
Samples were divided into a training set (184 samples (58% of total); PPCS = 53, non-PPCS = 131), an evaluation set (72 samples (23% of total); PPCS = 27, non-PPCS = 45), and a semi-naïve testing set (62 samples (19% of total); PPCS = 18, non-PPCS = 44). The training set was used for ncRNA feature selection and algorithm creation. The testing set was used to validate the accuracy of resulting predictive algorithms. The evaluation set was used to minimize bias that could arise from class imbalance by shifting the probability threshold of the classifier away from the standard value of 0.5, while avoiding artificial performance inflation [20]. While the samples in the testing set were naïve, a subset of the participants from whom they were derived were not (i.e., 37/112 participants were represented in both training and testing sets). Samples were grouped by age, sex, and PPCS-status and assigned randomly across sets. A maximum of five samples per participant was allowed in training and testing sets, with remaining samples being incorporated into the evaluation set. First, the prognostic accuracy of ncRNAs was compared against the Zemek 12-point risk score [21], employing samples with complete data for “history of concussion” and “medical diagnosis of chronic headaches or migraines” in addition to age, sex, and select symptom information (218 samples, PPCS = 62, non-PPCS = 156). Next, the ability of ncRNAs to differentiate recovered and non-recovered participants ≥ 21 days post-injury was compared against computerized cognitive test and balance scores (77 samples, PPCS = 17, non-PPCS = 60).
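As an illustration of shifting a classifier's probability threshold on a held-out evaluation set, the Python sketch below picks the cut-off that maximizes Youden's J (sensitivity + specificity − 1). The text does not state which criterion was used to move the threshold, so Youden's J is an assumption chosen for illustration, as are the function name and input format.

```python
from typing import List


def choose_threshold(probs: List[float], labels: List[int]) -> float:
    """Return the probability cut-off maximizing Youden's J.

    probs  : predicted probability of the positive class (PPCS) per sample
    labels : 1 for PPCS, 0 for non-PPCS
    A sample is called positive when its probability is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_cutoff, best_j = 0.5, float("-inf")
    for cutoff in sorted(set(probs)):  # candidate cut-offs: observed scores
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff
```

With an under-represented positive class, predicted probabilities tend to sit below 0.5, so a cut-off selected this way typically lands below the default. Selecting it on data separate from the testing set keeps the final accuracy estimate from being tuned to the test samples.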
Measures
Medical/demographic characteristics were collected from each participant via survey at enrollment. For children ≤ 12 years of age, parents assisted with survey completion. Concussion-related symptoms were self-reported on a 7-point scale (0–6) using the PCSS [22]. These survey characteristics enabled recapitulation of all nine predictors (each having 0–2 risk points for PPCS) from the Zemek 12-point risk score model. The nine predictors and PCSS counterparts are: age group (three bins, 5–18), sex, prior concussion and symptom duration, migraine history, answering questions slowly (“Feeling slowed down”), tandem stance balance errors (“Balance problems”), headache (“Headache”), sensitivity to noise (“Sensitivity to noise”), and fatigue (“Fatigue or low energy”). Balance and cognitive function were assessed using the ClearEdge system (Quadrant Biosciences Inc., Syracuse, NY) [23]. Body sway was measured in eight stances: two-legs eyes-open (TLEO), tandem-stance eyes-open (TSEO), two-legs eyes-closed (TLEC), tandem-stance eyes-closed (TSEC), two-legs eyes-open on foam pad (TLEOFP), two-legs eyes-closed on foam pad (TLECFP), tandem-stance eyes-open on foam pad (TSEOFP), and tandem-stance eyes-closed on foam pad (TSECFP). The computerized cognitive assessment included simple reaction time (SRT1), procedural reaction time (PRT), go/no-go (GNG), and a repeat of simple reaction time (SRT2) [24]. The Minimal Detectable Change (MDC) values [25, 26] for cognitive and balance tests were used to determine whether a participant’s change in performance from enrollment to follow-up was a real change, or whether it fell within the 95% confidence interval of random measurement error. As we have previously described [16], non-fasting saliva samples (n = 505) were collected from all participants (n = 112) using OraCollect Swabs (DNA Genotek, Ottawa, Canada).
RNA sequencing was performed at a depth of 10 million reads per sample, using 50 base-pair single-end reads, on an Illumina NextSeq 500 instrument. Fastq files were aligned to the following databases: miRBase22 (miRNAs), RefSeq v90 (small nucleolar RNAs; snoRNA), and piRBase v2 (piwi-interacting RNA). To allow for efficient and meaningful alignment from piRBase, highly similar sequences were reduced using hierarchical clustering. Resulting sequences were termed wiRNAs. Aligned reads were filtered to remove low counts (< 0.01% of total reads per RNA category), normalized using total sum scaling, and transformed with the inverse hyperbolic sine to correct for skew.
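The count-processing steps above (low-count filter, total sum scaling, inverse hyperbolic sine transform) can be sketched in Python as follows. This is an illustrative reading, not the study pipeline. Several details are assumptions: the filter is applied to each feature's summed count across all samples within one RNA category, scaling uses the per-sample total of retained features, and proportions are multiplied by a hypothetical `scale` factor (default 1e6) before the transform.

```python
import math
from typing import Dict

Counts = Dict[str, Dict[str, float]]  # sample id -> feature id -> read count


def normalize_counts(counts: Counts,
                     min_fraction: float = 1e-4,  # 0.01% of total reads
                     scale: float = 1e6) -> Counts:
    """Filter, total-sum scale, and asinh-transform one RNA category."""
    # Total reads in this RNA category across all samples.
    grand_total = sum(sum(feats.values()) for feats in counts.values())

    # Summed count of each feature across samples.
    feature_totals: Dict[str, float] = {}
    for feats in counts.values():
        for name, value in feats.items():
            feature_totals[name] = feature_totals.get(name, 0) + value

    # Keep features reaching the minimum share of total reads.
    kept = [name for name, total in feature_totals.items()
            if total >= min_fraction * grand_total]

    normalized: Counts = {}
    for sample, feats in counts.items():
        sample_total = sum(feats.get(name, 0) for name in kept)
        normalized[sample] = {
            name: (math.asinh(scale * feats.get(name, 0) / sample_total)
                   if sample_total else 0.0)
            for name in kept
        }
    return normalized
```

Unlike a logarithm, the inverse hyperbolic sine is defined at zero, so features with no reads in a sample need no pseudocount.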
PPCS versus non-PPCS comparisons
Three ncRNA sub-types (miRNA, snoRNA, wiRNA) were compared between PPCS and non-PPCS groups at two time-points: (1) initial (≤ 14 days post-injury) and (2) follow-up (≥ 21 days post-injury). Mean symptom, balance performance, and cognitive performance scores were also compared between PPCS and non-PPCS groups at each time-point. To identify changes in ncRNA levels during “typical” recovery, a paired test was used to identify RNAs that were differentially expressed between “follow-up” and “initial” samples within the non-PPCS group only.
Feature selection
Training data were processed through a custom, multifold feature selection pipeline in R (caret package) consisting of neural network- and random forest-based algorithms. Top features appearing in > 50% of the folds were combined with ncRNAs identified from differential expression and penalized generalized linear model (GLM) analyses. Penalized GLMs identified ncRNA predictors associated with symptom scores, balance and cognitive test performance, and injury-associated risk factors. RNAs with significant Pearson correlation coefficients (p < 0.05, unadjusted) were chosen from linear regression models (for numeric response variables), along with the three highest ranked RNAs in terms of variable importance from logistic regression models (for binary response variables) with “fair” predictive accuracy (kappa > 0.20). The reduced feature set was used to train the PPCS algorithm (see below). Recursive feature elimination was used to further refine the panel. At each iteration, the feature resulting in maximum weighted algorithm performance upon omission was removed until optimal performance was reached. A gradient-boosted machine (GBM) model was used to rank the final features in order of importance.
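The rule of retaining features that appear in more than 50% of folds can be shown with a short Python sketch. The study used R with the caret package, so this is only an illustration of the selection-frequency idea. The function name and the feature identifiers in the usage note are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def stable_features(fold_selections: List[Iterable[str]],
                    min_fraction: float = 0.5) -> List[str]:
    """Return features selected in strictly more than min_fraction of folds.

    fold_selections holds, for each fold, the names of the top features
    chosen by the per-fold selection algorithm.
    """
    n_folds = len(fold_selections)
    # set() ensures a feature is counted at most once per fold.
    frequency = Counter(name for fold in fold_selections for name in set(fold))
    return sorted(name for name, count in frequency.items()
                  if count / n_folds > min_fraction)
```

For four folds selecting `["miR-A", "miR-B"]`, `["miR-A", "sno-C"]`, `["miR-A", "miR-B"]`, and `["miR-B", "sno-C"]`, the function keeps `miR-A` and `miR-B` (three of four folds each) and drops `sno-C`, which appears in exactly half of the folds and therefore does not exceed the 50% cut-off.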
Prognostic algorithm development
To create a prognostic algorithm capable of predicting PPCS status, a training set of 113 non-PPCS and 53 PPCS samples (collected within 14 days of injury from 72 and 28 participants, respectively) was used to train a radial support vector machine (rSVM) algorithm (Supplemental Table 1a). Performance was evaluated using AUC from repeated tenfold cross-validation along with sensitivity, specificity, positive predictive value, and negative predictive value. A naïve testing set of 44 non-PPCS and 18 PPCS samples from 33 and 16 participants, respectively (Supplemental Table 1b), was used to validate algorithm performance. Stratified random sampling in R was used to ensure age- and sex-matching of the PPCS and non-PPCS groups across the training, evaluation, and testing sets, as well as equal % PPCS across sets. Sampling was performed only once to avoid bias and to maintain a truly naïve testing set. To compare the ncRNA algorithm with an existing clinical assessment tool, rSVM models were trained using features from the Zemek 12-point risk score model. Performance was assessed through AUC on tenfold cross-validation. A third model was generated combining the risk score with ncRNAs.
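For reference, the four threshold-dependent performance measures named above follow directly from the confusion matrix. The Python sketch below uses their standard definitions. The function name and the 0/1 label encoding (1 = PPCS) are assumptions for illustration.

```python
from typing import Dict, List


def diagnostic_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV for binary labels (1 = PPCS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # PPCS samples correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-PPCS samples correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged samples that are PPCS
        "npv": ratio(tn, tn + fn),          # cleared samples that are non-PPCS
    }
```

Sensitivity and specificity do not depend on the share of PPCS samples, whereas PPV and NPV do, which is one reason for keeping the percentage of PPCS samples equal across sets.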
Identifying mTBI recovery
The same feature selection pipeline was used to select ncRNAs capable of objectively identifying individuals with symptom recovery. In addition to age, individual cognitive and balance test scores were used as features in a random-forest model. Predictive capability of cognitive and balance testing was compared with that of ncRNAs by performing repeated tenfold cross-validation. The cross-validation approach was chosen due to the reduced number of participants (78/112) for whom complete balance and cognitive test results were available at initial and follow-up time-points. To increase fidelity of group assignment, samples with an associated PCSS score within two of the threshold score (n = 5) were also excluded. A set of 60 non-PPCS and 17 PPCS samples from 58 and 15 participants, respectively, was used (Supplemental Table 1c).
Statistical analysis
R version 3.6.1 was used for all statistical analyses. The data were analyzed by paired (e.g., initial vs. follow-up time points) or unpaired (e.g., PPCS vs. non-PPCS) t tests, one-way ANOVA in the case of multiple groups, or the Mann–Whitney test in the case of non-normally distributed data. A Chi-squared test with Yates correction was used for nominal data. Differential expression analysis was performed using the DESeq2 package (version 1.24.0), where p values were obtained by the Wald test. Multiple testing correction was achieved with the Benjamini–Hochberg method. Algorithm performance was evaluated by AUC and statistically compared using the method of DeLong. Unless otherwise noted, * denotes p ≤ 0.05, ** denotes p ≤ 0.01, and *** denotes p ≤ 0.001.
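The Benjamini–Hochberg correction mentioned above can be written in a few lines. The Python sketch below implements the standard step-up adjustment for illustration only. The analyses themselves were run in R, and the function name is an assumption.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Each raw p value is multiplied by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as raw p values decrease. For the raw values 0.01, 0.04, 0.03, and 0.005, the adjusted values are 0.02, 0.04, 0.04, and 0.02.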
Power analysis and sample size software (NCSS PASS 2019, Chapter 260) was used to determine that the sample size in the training set provided 99% power to detect a difference between the null hypothesis, AUC = 0.68, taken from the Zemek 12-point risk score model validation AUC, and the alternative hypothesis, AUC = 0.856, estimated from our previously published research [14]. A two-sided z test was used with α = 0.05 for continuous data with equal variances and binomial outcomes. The testing cohort achieved 74% power to differentiate the ncRNA model performance (AUC = 0.87) from the Zemek risk score model (AUC = 0.68).