A novel prediction score determining individual clinical outcome 3 months after juvenile stroke (PREDICT-score)

Background Juvenile strokes (< 55 years) account for about 15% of all ischemic strokes. Structured data on clinical outcome in these patients are sparse. Here, we aimed to fill this gap by systematically collecting relevant data and modeling a juvenile stroke prediction score for the 3-month functional outcome.
Methods We retrospectively integrated and analyzed clinical and outcome data of juvenile stroke and TIA patients treated at the LMU University Hospital, LMU Munich, Munich. Good outcome was defined as a modified Rankin Scale score of 0–2 or return to the pre-stroke level of function. We analyzed candidate predictors and developed a predictive model. Predictive ability was assessed using the area under the ROC curve (AUROC) and a visual representation of the calibration. The model was validated internally.
Results 346 patients were included in the analysis. We observed a good outcome in n = 293 patients (84.7%). The prediction model for an unfavourable outcome had an AUROC of 89.1% (95% CI 83.3–93.1%). The model includes age, NIHSS, ASPECTS, blood glucose and type of vessel occlusion as predictors of the individual patient outcome.
Conclusions Here, we introduce the highly accurate PREDICT-score for the 3-month outcome after juvenile stroke derived from clinical routine data. The PREDICT-score might be helpful in guiding individual patient decisions and designing future studies but needs further prospective validation, which is already planned.
Trial registration The study has been registered at https://drks.de (DRKS00024407) on March 31, 2022.


Introduction
Stroke is one of the leading causes of death and permanent disability worldwide, accounting for approximately 6.5 million deaths and approximately 140 million disability-adjusted life years [1]. It is primarily a disease of the elderly, although around 15% of strokes occur in people under the age of 55. Moreover, the incidence in this age group has increased by up to 40% in recent years [2,3].
The etiology of juvenile stroke usually differs from that of stroke in older patients. A particular challenge is that the etiology is much more diverse and remains unknown in many cases (up to 30%) [2].
Despite the low prevalence of stroke at a younger age, the individual and socioeconomic consequences are substantial because patients live with the sequelae for longer [4,5].
For this reason, it is essential to determine predictors of outcome after juvenile stroke. To date, there are no studies dedicated to the outcome of juvenile strokes. Validated clinical parameters can enable personalized decisions and lay the foundation for future clinical trials. This study aims to address this gap by modeling a multivariable juvenile stroke prediction score for functional outcome at 3 months after stroke, using a combined set of clinical and paraclinical data.

Ethics statement
Ethical approval for the retrospective analysis of data was obtained from the local ethics committee at LMU Munich. The study was conducted according to the Declaration of Helsinki.

Study design and patients
We retrospectively collected clinical, imaging and laboratory data in juvenile stroke and transient ischemic attack (TIA) patients who were hospitalized at the stroke unit of the LMU University Hospital, LMU Munich, Munich, Germany between Jan 01, 2011 and Mar 31, 2020. Study size was defined by the number of patients treated during this period. Data were extracted from the clinical database by trained personnel and integrated into our study database.
Ischemic stroke was defined by a sudden focal neurologic deficit lasting more than 24 h with no sign of acute intracranial bleeding on cerebral imaging at admission. TIA was defined as a brief episode of focal loss of brain function that lasted less than 24 h, thought to be due to ischemia, localized to a region of the brain supplied by one vascular system and for which no other cause could be found [6]. Trained stroke neurologists performed physical and neurological examinations on admission and treated patients according to current guidelines for the management of stroke during their in-hospital stay. Reperfusion therapy by intravenous thrombolysis with a recombinant tissue plasminogen activator, mechanical thrombectomy or both was performed as appropriate.

Selection of candidate predictors
A systematic literature review was conducted to identify potential predictor variables of functional outcome in juvenile stroke or transient ischemic attack (TIA). For further consideration, we selected variables that are collected as part of clinical routine in the majority of stroke patients and have been reported to be associated with poor outcome. The final selection was based on the expert opinion of the authors and comprises the well-known predictor variables in stroke care.
Variables were divided into four categories. The first consisted of preadmission factors, including age, previous stroke or TIA, and the time from symptom onset to admission. The second category comprised clinical, imaging, and laboratory findings at admission, including clinical severity measured by the National Institutes of Health Stroke Scale (NIHSS) score, systolic blood pressure, blood glucose level, the Alberta Stroke Program Early CT Score (ASPECTS) or the posterior circulation ASPECTS (pc-ASPECTS), and large vessel occlusion (LVO). LVO was defined as proximal artery occlusion suitable for thrombectomy. The third category included the results of diagnostic investigations during the in-hospital stay, such as mean carotid artery intima-media thickness (IMT) on ultrasonography, the presence and severity of a patent foramen ovale [PFO, examined by transesophageal echocardiography (TEE)], the presence of an atrial septal aneurysm (ASA), the CHA2DS2-VASc score, and the underlying aetiology. The fourth category consisted of the treatment given, including intravenous thrombolysis with a recombinant tissue plasminogen activator and vessel occlusion measured by the modified Thrombolysis in Cerebral Infarction (mTICI) score.
Age, CHA2DS2-VASc score, time from symptom onset to admission, NIHSS score, blood pressure, glucose level, ASPECTS and mean IMT were analyzed as continuous variables, while the variable previous stroke or TIA was dichotomized. As the aetiology of juvenile stroke is more heterogeneous than that of stroke in older patients, we included the presence of a cervical artery dissection, moyamoya disease and vasculitis as independent aetiologies in our data collection, in addition to the underlying Trial of Org 10172 in Acute Stroke Treatment (TOAST) mechanisms [7].

Outcome
All stroke patients were asked to participate in a structured clinical follow-up 3 months after stroke. Trained personnel assessed the 3-month functional outcome either during an outpatient visit or via a structured follow-up telephone interview. The assessors were blinded to the clinical data from the in-hospital stay. A favorable outcome at 3 months was defined as a modified Rankin Scale (mRS) score of 0-2 or return to the pre-stroke level of function. Higher values on the mRS were deemed unfavorable outcomes.
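The outcome definition above can be written as a small rule. The following Python sketch is illustrative only and is not the study's code (the analyses were done in R). The function name and the reading of "return to baseline" as a 3-month mRS no worse than the pre-stroke mRS are assumptions.

```python
from typing import Optional


def is_favourable(mrs_3m: int, mrs_pre: Optional[int] = None) -> bool:
    """Return True for a favourable 3-month outcome.

    Favourable means mRS 0-2, or (assumption) a 3-month mRS that is no
    worse than the pre-stroke mRS when a baseline value is available.
    """
    if not 0 <= mrs_3m <= 6:
        raise ValueError("mRS must be between 0 and 6")
    if mrs_3m <= 2:
        return True
    # Return to baseline: only applicable if a pre-stroke mRS is known.
    return mrs_pre is not None and mrs_3m <= mrs_pre
```

For example, a patient with a pre-stroke mRS of 3 and a 3-month mRS of 3 would count as favourable under this reading, whereas a 3-month mRS of 3 without a known baseline would not.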

Statistical analysis
Patient characteristics, clinical parameters and outcomes were analyzed descriptively using total numbers and percentages or medians and interquartile ranges (IQR). Univariate Odds Ratios with 95% CIs for an unfavourable outcome were calculated using univariate logistic regression. P values of the respective Wald tests were also added.
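For a single binary predictor, the odds ratio from a univariate logistic regression coincides with the cross-product ratio of the 2x2 table, and the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts. The Python sketch below illustrates that special case; it is not the study's R code, and the function name and the cell layout are assumptions made for the example.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio, 95% Wald CI and two-sided Wald p-value from a 2x2 table.

    a: predictor present, unfavourable   b: predictor present, favourable
    c: predictor absent, unfavourable    d: predictor absent, favourable
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided p-value of the Wald z statistic under the standard normal.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), (lower, upper), p_value
```

Continuous predictors such as age or NIHSS require fitting the regression itself; the closed form above applies to binary predictors only.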

Missing data
Patients with missing outcome information were excluded from the data set. Missing observations for candidate predictors were imputed five times with the Multiple Imputation by Chained Equations (MICE) algorithm, using the random forest method in the R package mice [8]; the imputation models used all other predictors in the data set as well as the outcome. Random forest imputation is known for its robustness and its ability to handle complex interactions and nonlinear relationships in the data [9]. Imputed data were used for the multivariate modeling but not for the univariate analyses (e.g., those shown in Table 1).

Model development
As prior research has shown that machine learning models are not superior to regression analysis [10-12], multivariate logistic regression analyses were performed to assess the association of candidate predictors with the 3-month functional outcome endpoint.
To derive an appropriate prediction model, we created all possible models on each of the five imputation data sets using the R package glmulti [13]. We then carried the variables with a model-averaged term importance above 0.8 forward for further modeling, using Akaike's Information Criterion (AIC) for variable selection in each imputation data set.
Linearity of the relationship between the log odds of the outcome and the continuous predictors was checked graphically. Outliers and influential observations were identified using Cook's distance and standardized residuals. Collinearity was assessed by calculating the Variance Inflation Factor, and goodness of fit was evaluated via the Hosmer-Lemeshow test.
To pool the models from the imputation data sets into one final model, we used the extended Median-P-Rule, which also performs very well when categorical variables are used [14]. This method is included in the R package psfmi [15].
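Model-averaged term importance is commonly computed as the sum of the Akaike weights of all candidate models that contain a given term. The Python sketch below shows that calculation under this assumption; it does not reproduce glmulti, and the function name and the AIC values in the usage note are hypothetical.

```python
import math
from typing import Dict, FrozenSet


def term_importance(aic_by_model: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Sum of Akaike weights over all models that contain each term.

    Keys are the sets of terms in each candidate model, values are the
    corresponding AICs.
    """
    best = min(aic_by_model.values())
    # Relative likelihood of each model compared with the best one.
    raw = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(raw.values())
    weights = {m: r / total for m, r in raw.items()}
    terms = set().union(*aic_by_model.keys())
    return {t: sum(w for m, w in weights.items() if t in m) for t in sorted(terms)}
```

With hypothetical AICs of 300 (null model), 250 (NIHSS only), 298 (glucose only) and 249 (both), NIHSS would receive an importance close to 1 and glucose an importance of roughly 0.62, so only NIHSS would pass a 0.8 threshold.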
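The basic idea of the Median-P-Rule is to summarize each predictor by the median of its P values across the imputation data sets. The Python sketch below shows only this basic idea, not the extended rule or the psfmi implementation; the function name and the significance threshold `alpha` are assumptions.

```python
from statistics import median
from typing import Dict, List


def median_p_rule(p_values: Dict[str, List[float]], alpha: float = 0.05) -> Dict[str, dict]:
    """Pool per-imputation P values by their median.

    p_values maps each predictor to its P values, one per imputed data set.
    alpha is an assumed threshold for keeping a predictor.
    """
    pooled = {}
    for name, ps in p_values.items():
        p_med = median(ps)
        pooled[name] = {"p_median": p_med, "keep": p_med < alpha}
    return pooled
```

With five imputations the median is simply the third smallest P value, so a single unstable imputation cannot decide whether a predictor is kept.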

Predictive ability and validation
Model discrimination was visualized by plotting the ROC curve and calculating the Area Under the ROC curve (AUROC) with corresponding DeLong 95% confidence intervals (CI). Calibration was assessed by plotting the mean observed probability against the mean predicted probability in each decile. Perfect calibration is displayed as a straight line passing through zero with a gradient of one.
The model was validated internally by performing a bootstrap validation of the final model using 1000 bootstrap samples to obtain an optimism-corrected AUROC.
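Both measures can be computed directly from observed outcomes and predicted probabilities. The Python sketch below is an illustration and not the study's R code: it computes the AUROC as the proportion of event/non-event pairs that are ranked correctly, and the calibration points by tenths of ranked predictions. The function names are assumptions, and the DeLong confidence interval is not included.

```python
from typing import List, Sequence, Tuple


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random event has a higher prediction than a
    random non-event; ties count as one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_by_tenths(y: Sequence[int], p: Sequence[float],
                          groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed proportion) per group of
    patients ranked by predicted probability."""
    ranked = sorted(zip(p, y))
    n = len(ranked)
    points = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if chunk:  # skip empty groups when there are fewer patients than groups
            mean_pred = sum(pi for pi, _ in chunk) / len(chunk)
            observed = sum(yi for _, yi in chunk) / len(chunk)
            points.append((mean_pred, observed))
    return points
```

Plotting the returned pairs against the diagonal gives the kind of calibration plot described above; points on the diagonal indicate agreement between predicted and observed risk.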
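One common way to obtain an optimism-corrected AUROC is the bootstrap procedure in which the whole model-building step is repeated in each resample and the average drop in performance from the resample to the original data is subtracted from the apparent performance. The Python sketch below assumes this procedure; it is not the study's R code. The `fit` and `predict` callables and the toy model that merely picks the best single column (a stand-in for the logistic regression) are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Proportion of correctly ranked event/non-event pairs (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auroc(X: List[Sequence[float]], y: List[int],
                             fit: Callable, predict: Callable,
                             n_boot: int = 1000, seed: int = 0) -> float:
    """Apparent AUROC minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auroc(y, predict(fit(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # resample lacks one outcome class; skip it
            continue
        model_b = fit(Xb, yb)
        # Performance in the resample minus performance on the original data.
        optimism.append(auroc(yb, predict(model_b, Xb)) - auroc(y, predict(model_b, X)))
    if not optimism:
        raise ValueError("no usable bootstrap samples")
    return apparent - sum(optimism) / len(optimism)


def fit_best_column(X, y):
    """Toy model (assumption): index of the column with the highest AUROC."""
    return max(range(len(X[0])), key=lambda j: auroc(y, [row[j] for row in X]))


def predict_column(model, X):
    """Use the selected column itself as the prediction."""
    return [row[model] for row in X]
```

Calling `optimism_corrected_auroc(X, y, fit_best_column, predict_column)` returns the corrected estimate for the toy model; for a regression model, `fit` would have to repeat every data-driven modeling step, including variable selection.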
All analyses were performed using R version 4.3.1.

Data sharing
The data of this study are available on site from the corresponding author upon reasonable request. This work is reported according to the suggestions made in the TRIPOD statement [16].

Results
From Jan 01, 2011 to Mar 31, 2020, the inclusion criteria of juvenile stroke or TIA were met by 388 consecutive patients treated at the Department of Neurology of the LMU University Hospital. Data of these patients were collected from clinical routine documentation. For 42 patients the 3-month outcome was not available. These observations were excluded and the data of 346 patients were included in the final analysis (see Fig. 1). Table 1 shows the baseline characteristics of the final cohort and the univariate associations of the candidate predictors with the patient outcome 3 months after stroke or TIA as Odds Ratios (OR) with 95% confidence intervals (95% CI) and the P values of the respective Wald tests. Age, NIHSS, glucose level, ASPECTS, mean IMT, etiology, intravenous thrombolysis and vessel occlusion were significant predictors in the univariate analysis.

Missing data
Data were missing mainly for the candidate predictors systolic blood pressure at admission (n = 128, 37%), mean IMT (n = 73, 21%) and ASPECTS (n = 42, 12%). Unfortunately, systolic blood pressure was not systematically recorded from 2011 through 2014; thus, it is missing more frequently than other values. Systolic blood pressure was missing significantly more frequently in patients with a favourable outcome (40.3% vs. 18.9%, P = 0.0049), while mean IMT was missing significantly more often in patients with an unfavourable outcome (16.0% vs. 49.1%, P < 0.0001). There was no significant difference in missing values for ASPECTS between patients with a favourable and an unfavourable outcome (11.3% vs. 17.0%, P = 0.3449). All missing values were imputed using the MICE algorithm.

Multivariate analysis
In the multivariate logistic regression analyses, vessel occlusion, NIHSS, ASPECTS, and blood glucose level were the variables with a model-averaged term importance above 0.8 in each imputation data set. The variable age had a model-averaged term importance above 0.8 in four out of the five imputation data sets.
The models that included age in addition to vessel occlusion, NIHSS, ASPECTS, and blood glucose level had lower AICs in each of the five imputation data sets; thus, we included age in the final model. In each imputation data set, the continuous predictors age and blood glucose level were assessed for their functional form using plots of the observed log odds versus predictor value. The linearity assumption was not violated. Absolute values of standardized residuals were never larger than three, indicating that no single observation had an overly high impact on the model's fit. A sensitivity analysis excluding five outliers with a Cook's distance of more than 0.04 did not result in different predictors or in changed model coefficient estimates. There were no indicators of overdispersion or collinearity. The Hosmer-Lemeshow goodness-of-fit test was not significant.
In a final step, we pooled the model with the five predictors vessel occlusion, NIHSS, ASPECTS, blood glucose level and age. The model estimates, ORs and P values are given in Table 2. The AUROC of the model is 89.1% (95% CI 83.3-93.1%). The ROC curve (Fig. 2) shows the model's very good discrimination. The calibration plot indicates that the model is well calibrated, with an intercept of 0.0009 and a slope of 0.994 (Fig. 3).
The internal validation via bootstrapping resulted in an optimism corrected AUROC of 87.5%.
The individual predicted probability of an unfavourable outcome can be calculated as exp(PREDICT Score)/(1 + exp(PREDICT Score)). However, the resulting probability needs to be interpreted with the low overall proportion of patients with an unfavourable outcome (15.3%) in mind.
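A logistic regression model turns its linear predictor into a probability through the inverse logit, exp(x)/(1 + exp(x)). The Python sketch below shows this transformation in a numerically stable form. It assumes that the PREDICT score is the linear predictor built from the Table 2 coefficients, which are not reproduced here; the function name is hypothetical.

```python
import math


def predicted_probability(linear_predictor: float) -> float:
    """Inverse logit exp(x) / (1 + exp(x)), written to avoid overflow."""
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    e = math.exp(linear_predictor)
    return e / (1.0 + e)
```

A linear predictor of 0 corresponds to a probability of 50%, and a linear predictor equal to log(0.153/0.847) corresponds to the cohort's overall proportion of unfavourable outcomes, 15.3%.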

Discussion
To our knowledge, there is so far no tool that predicts outcomes specifically for juvenile stroke patients. This represents a significant gap in patient care and clinical research, as younger patients in particular need a valid prediction to adjust their family and work circumstances if needed. In our cohort, the PREDICT score presented in this work predicts the 3-month outcome of juvenile stroke with high accuracy, using the mRS with a cutoff at 0-2 for favorable outcomes. The mRS is the most frequently [...]
One limitation of our work is the retrospective collection of data from a single center. Recalibration of the score might be indicated in different settings, depending on the proportion of unfavourable outcomes in the individual hospital or care unit. In addition, the data were collected over an almost 10-year period (Jan 01, 2011 to Mar 31, 2020). This rather long period was necessary to reach a minimum number of unfavourable outcomes, which are rather rare in juvenile stroke patients, for stable model estimation. Therefore, we cannot rule out that changes in stroke treatment over time affected the outcome. Although patient data were well documented, we had to impute substantial parts for three candidate predictors [systolic blood pressure (n = 128, 37%), mean IMT (n = 73, 21%) and ASPECTS (n = 42, 12%)].
The major limitation of our work is the absence of a validation in an independent cohort. However, data for an external and temporal validation will be collected from routine care data in a structured manner in our institution as well as our partner institutions. The protocol of this validation study has already been published [18]. With these data we will also be able to perform additional subgroup analyses, for example by age group and etiology.
Presence and severity of a PFO might be a predictor of interest in further research. In our data we observed a nonsignificant but potentially substantial protective effect of a relevant PFO compared to no PFO/small PFO without ASA [OR: 0.58 (95% CI 0.19-1.75)], which appears counterintuitive. However, in our cohort, patients who were not examined with a TEE had an increased risk of an unfavourable outcome [OR: 1.70 (95% CI 0.92-3.15)]. This can in part be explained by the fact that TEE was not regularly performed when the cause of stroke was already known. It would be interesting to know whether a PFO might be an independent predictor of the outcome even when the cause of stroke is known.
The selected predictors in the PREDICT score appear plausible, as they were found to be predictive in earlier research on functional outcome after stroke, e.g. age, NIHSS and glucose [10,19,20]. Systolic blood pressure, however, was found to be predictive in earlier research but not in our data [10,21]. This might be due to the high percentage of missing values (37%) or to a lower relevance of this predictor for younger patients. We hope to clarify this matter using data from the planned validation cohort. The accuracy of the PREDICT-score in our patient cohort is comparable to or even better than that of, for example, recent prediction models for elderly patients based on MRI and a clinical deep learning model, which reached AUROCs of 90% and 68% [10,11,22].

Conclusion
Here we introduce the highly accurate PREDICT-score for 3-month outcome after juvenile stroke derived from clinical routine data. The PREDICT-score might be helpful in guiding individual patient decisions and designing future studies but needs further prospective validation.

Fig. 1 Flow of patients through the study and outcome status

Fig. 3 Calibration plot: graphical representation of the predicted probability of an unfavourable outcome against the observed probability of an unfavourable outcome. Patients were ranked in order of predicted probability of an unfavourable outcome and divided into tenths. The dots represent the mean risks for each tenth; the dotted line represents perfect calibration.

Table 1
Baseline characteristics and outcome of patients in total numbers (%) or median and interquartile ranges (IQR)

Table 2
Final multivariate logistic regression model for outcome 3 months after stroke