Introduction

Posterior circulation (pc) strokes are frequently associated with poor outcome [8, 17]. Recently reported results from the Basilar Artery International Cooperation Study (BASICS) indicate that functional outcome may still depend to a large extent on the initial clinical presentation and imaging findings and, to a lesser extent, on the specific therapeutic strategy [14, 32]. Predicting functional outcome based on the initial clinical and imaging findings might therefore (1) allow for a prognosis of the patient’s long-term functional status, (2) optimize triage for additional MR imaging diagnostics and allocation of best possible medical care [36], and (3) facilitate required adaptations in the patient’s social environment, including arrangements of long-term care, at an early stage.

Binary quantification of early ischemic changes using the posterior circulation Acute Stroke Prognosis Early CT Score (pc-ASPECTS) was shown to predict functional outcome in patients with suspected pc ischemia [27, 30]. However, conventional binary classifications of pc-ASPECTS regions do not consider all information available from the imaging data: the prognostic value carried by changes in texture and small shifts of grey level distributions remains unused. The accuracy of conventional pc-ASPECTS ratings is also affected by the limited sensitivity of the human eye for subtle early ischemic changes. Moreover, visual assessments of non-contrast CT (NCCT) images suffer from inter- and intra-reader variability and are often impaired by beam-hardening artifacts in the posterior fossa [9, 27, 33].

The integration of clinical data, mainly the baseline National Institutes of Health Stroke Scale (NIHSS), was shown to improve discriminatory power [16]. However, although it is the most widely used scoring system in patients with acute ischemic stroke, the NIHSS has weaknesses when applied to pc strokes, partly because deficits that are typical for pc strokes, such as truncal ataxia, dysphagia and diplopia, are not assessed. This explains why patients with pc stroke can have a high probability of an unfavorable outcome at 90 days despite relatively low NIHSS scores at admission [31] and underlines the need for a combined approach of imaging evaluation and clinical scoring [16].

We therefore propose a machine learning (ML)-based evaluation of multidimensional quantitative image features from pc-ASPECTS regions in admission NCCTs combined with clinical data to predict functional outcomes in patients with acute pc strokes.

Materials and methods

The anonymized data used for training and validation of algorithms that support the findings of this study are available from the corresponding author upon reasonable request.

This multi-center retrospective study was approved by the Ethics Committee of the University of Hamburg and the Hamburg Chamber of Physicians, Hamburg, Germany, and the Ethics Committee of the University of Muenster and the Westfalian Chamber of Physicians, Muenster, Germany, and written informed consent was waived by the institutional review boards. All study protocols and procedures were conducted in accordance with the Declaration of Helsinki.

Patient characteristics

The study cohort includes consecutive patients with suspected posterior circulation ischemia admitted between April 1, 2010, and February 28, 2019 at two tertiary care stroke centers. Inclusion criteria for this study were (1) documented occlusion of the basilar or intracranial vertebral artery; (2) NCCT performed on admission within 6 h of symptom onset; (3) availability of modified Rankin Scale (mRS) after 90 days (mRS90). Patients were excluded in case of poor imaging quality (artifacts from movement and implants). In total, 172 patients met the inclusion criteria and were selected for the imaging-based analysis. Complete clinical data including NIHSS at admission were available for 149 patients that were selected for all models employing clinical data at admission.

Image acquisition

NCCT scans with head images obtained from the vertex to the skull base were acquired on a 128-slice dual-source CT scanner (Somatom Definition Flash; Siemens Healthcare GmbH) with tube voltage 120 kV, tube current 340 mA, 5.0 mm slice reconstruction, < 0.5 mm in-plane resolution, as well as on an iCT 256™ scanner (Philips Healthcare, Best, The Netherlands) with tube voltage 120 kV, tube current 300 mA, 4.0 mm slice reconstruction and < 0.5 mm in-plane resolution.

Visual pc-ASPECTS rating

For all admission NCCT scans, pc-ASPECTS was conventionally assessed by two neuroradiologists in a consensus rating approach (UH, PS: 8 years of clinical experience in diagnostic neuroradiology in acute care full-service hospitals). pc-ASPECTS allots the posterior circulation 10 points. One point each is subtracted for early ischemic changes on NCCT in the left or right thalamus, cerebellum, or posterior cerebral artery territory, respectively, and two points each for early ischemic changes in any part of the midbrain or pons. A pc-ASPECTS score of 10 indicates absence of visible posterior circulation ischemia, whereas a score of 0 indicates early ischemic changes in all pc-ASPECTS territories [7, 27].
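The scoring rule described above can be written down in a few lines. The following is a minimal sketch for illustration only; the region keys and the function name `pc_aspects` are hypothetical labels chosen here, not identifiers used in the study.

```python
# Points subtracted per affected region, following the scoring rule in the text:
# 1 point each for thalamus, cerebellum and PCA territory (left/right),
# 2 points each for midbrain and pons. Region keys are illustrative names.
PC_ASPECTS_WEIGHTS = {
    "thalamus_l": 1, "thalamus_r": 1,
    "cerebellum_l": 1, "cerebellum_r": 1,
    "pca_l": 1, "pca_r": 1,
    "midbrain": 2, "pons": 2,
}


def pc_aspects(affected_regions):
    """Return the pc-ASPECTS score (0-10) for a collection of affected regions."""
    affected = set(affected_regions)  # a region counts only once
    unknown = affected - PC_ASPECTS_WEIGHTS.keys()
    if unknown:
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    return 10 - sum(PC_ASPECTS_WEIGHTS[region] for region in affected)


if __name__ == "__main__":
    print(pc_aspects([]))                      # no visible early ischemic change
    print(pc_aspects(["pons", "thalamus_l"]))  # pons (2) plus left thalamus (1)
```

Because the eight weights sum to 10, early ischemic changes in all territories yield a score of 0, in agreement with the definition.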

Image pre-processing and pc-ASPECTS feature extraction

To (1) extract information from standardized pc-ASPECTS maps and (2) reduce potential bias in quantitative texture analysis, all NCCT images were registered to a custom MNI (Montreal Neurological Institute)-152 CT reference atlas [10] using two-step affine algorithms. Registration success was visually verified by two Neuroradiologists (UH, PS). Standardized pc-ASPECTS area maps (thalamus left/right (l/r), pons, midbrain, territory of the posterior cerebral artery (PCA) l/r, cerebellum l/r) were derived as follows: First, an experienced Neuroradiologist (UH) performed manual segmentations of the respective regions on the original NCCT images of 63 healthy subjects using Analyze 11.0 Software (Biomedical Imaging Resource, Mayo Clinic, Rochester, MN) [3]. Second, manual segmentations were transformed into standard space by utilizing transformation matrices obtained from image registration to the custom MNI-152 CT reference atlas [10]. Third, all segmentations were added and final standard maps were defined by applying 50% cut-off points.
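The third step (summing the transformed segmentations and applying a 50% cut-off) can be illustrated as follows. This is a minimal sketch using small nested lists in place of registered image volumes; the function name `consensus_map` is hypothetical, and whether a voxel at exactly 50% overlap is included is an assumption made here, since the text does not specify it.

```python
def consensus_map(masks, cutoff=0.5):
    """Combine binary masks (lists of rows of 0/1) into one standard map.

    A voxel is kept when the fraction of subjects whose segmentation
    contains it is at least `cutoff` (inclusive threshold is an assumption).
    """
    n_subjects = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    result = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Fraction of subjects in which this voxel was segmented
            overlap = sum(mask[i][j] for mask in masks) / n_subjects
            row.append(1 if overlap >= cutoff else 0)
        result.append(row)
    return result


if __name__ == "__main__":
    # Three toy "subjects", each with a 1 x 3 mask already in standard space
    toy_masks = [[[1, 1, 0]], [[1, 0, 0]], [[1, 1, 0]]]
    print(consensus_map(toy_masks))
```

In the study, the same operation would run over the 63 transformed segmentations of each region rather than over three toy masks.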

Quantitative image features were extracted using the PyRadiomics Python package v2.1.0 [35] with its proposed default settings. Extracted features comprised 252 first-order features (18 based on unfiltered images, 144 on wavelet decompositions, 90 on log-sigma Laplacian of Gaussian filtered images) and 966 texture features (82 based on unfiltered images, 544 on wavelet decompositions, 340 on log-sigma Laplacian of Gaussian filtered images). In total, 1218 quantitative image features were extracted from each of the 1376 included pc-ASPECTS areas.
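The reported totals follow directly from the per-filter counts and from the cohort size of 172 patients with eight pc-ASPECTS regions each. The short check below only reproduces this arithmetic; the dictionary keys and variable names are illustrative.

```python
# Feature counts per image type as stated in the text
first_order = {"original": 18, "wavelet": 144, "log_sigma": 90}
texture = {"original": 82, "wavelet": 544, "log_sigma": 340}

n_patients = 172  # patients in the imaging-based analysis
n_regions = 8     # pc-ASPECTS regions per patient

n_first_order = sum(first_order.values())       # 252
n_texture = sum(texture.values())               # 966
features_per_region = n_first_order + n_texture  # 1218
n_areas = n_patients * n_regions                 # 1376
features_per_patient = features_per_region * n_regions  # 9744

if __name__ == "__main__":
    print(n_first_order, n_texture, features_per_region)
    print(n_areas, features_per_patient)
```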

Statistical analysis

Univariate logistic regression analysis was conducted based on the entire dataset to investigate conventional odds ratios of the clinical predictors (NIHSS at admission, pc-ASPECTS and age) for good outcome (mRS90 ≤ 2). Using fivefold cross validation, univariate (conventional pc-ASPECTS ratings) and multivariate logistic regression models (conventional pc-ASPECTS ratings, NIHSS at admission and age) were trained to predict functional outcome at dichotomized mRS90 levels of ≤ 2, ≤ 3, ≤ 4 and ≤ 5 (survival).

Imaging-based machine learning prediction of dichotomized mRS90 levels was evaluated using Random Forest algorithms (Python scikit-learn environment v0.20.3 [24]) in a fivefold nested cross validation approach [15]. Random forest classifiers have a comparatively low tendency to overfit [4] and support classification tasks also for data sets comprising numerous and heterogeneous predictors. For each study patient, quantitative image features of the eight pc-ASPECTS regions were evaluated for their ability to predict functional outcome (9744 image features in total per patient). Hyperparameter tuning of the random forest classifiers (total number of features, number of trees, maximum depth of the tree, minimum number of samples to split an internal node, number of features considered for splitting (mtry), minimum number of samples at leaf node) was conducted using grid search algorithms on each training data set within the nested cross-validation layers. Parameters at initiation were set to scikit-learn default values. Selection of features with the highest predictive value was conducted separately for each training data set of the fivefold cross-validation sample split according to Gini impurity measures [18]. For the integrated model, predicted probabilities for good outcome of the logistic regression model using clinical data and of the imaging-based machine learning classifier were averaged.
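The structure of the nested cross-validation and the averaging step of the integrated model can be sketched without any machine learning library. The code below is an illustration under stated assumptions, not the study's implementation: the splits are unshuffled and unstratified, the number of inner folds (five) is assumed because the text does not state it, and all function names are hypothetical. Model fitting and grid search would take place where the inner folds are consumed.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for a simple, unshuffled k-fold split."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def nested_folds(n_samples, outer_k=5, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for nested cross validation.

    Inner splits partition only the outer training indices, so the outer test
    fold is never seen during hyperparameter tuning or feature selection.
    """
    for outer_train, outer_test in k_fold_indices(n_samples, outer_k):
        inner_splits = []
        for train_pos, val_pos in k_fold_indices(len(outer_train), inner_k):
            inner_train = [outer_train[p] for p in train_pos]
            inner_val = [outer_train[p] for p in val_pos]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


def integrated_probability(p_clinical, p_imaging):
    """Average the predicted probabilities of the clinical and imaging models."""
    return [(c + i) / 2 for c, i in zip(p_clinical, p_imaging)]


if __name__ == "__main__":
    for outer_train, outer_test, inner in nested_folds(149):
        print(len(outer_train), len(outer_test), len(inner))
    print(integrated_probability([0.2, 0.8], [0.4, 0.6]))
```

Keeping tuning and feature selection inside the inner loop is what allows the outer test folds to give an unbiased performance estimate.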

Receiver operating characteristic (ROC) curves were used to determine the optimal cut-off values according to Youden’s index. For predictive models, ROC curves were generated from results of all cross-validation sets. Confidence intervals (CI) for sensitivities and specificities were bootstrapped (2000 replicates, pROC v1.15 R-package [29]). Bonferroni adjustments were applied to control for alpha error inflation. Furthermore, the classifiers were analyzed using sensitivity, specificity, accuracy, maximum Youden index, positive predictive value, negative predictive value (ThresholdROC v2.8 R-package) and Matthews correlation coefficient (MCC) [20] metrics (psychometric v2.2 R-package). MCC evaluates all fields of the confusion matrix and is considered a favorable measure for unbiased comparisons of binary classifiers [25]. Due to the relatively low class imbalance for all mRS90 cut-off values (event rates for mRS90 ≤ 2: 33%; ≤ 3: 40%; ≤ 4: 56%; ≤ 5: 74%), no additional data augmentation for reducing bias from class imbalance was performed.
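For reference, Youden’s index is J = sensitivity + specificity − 1, and MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The study computed these metrics with R packages; the following Python sketch only illustrates the definitions on toy data. The function names and the example labels and probabilities are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting 1 for probabilities >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, Youden's J and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "mcc": mcc,
    }


def best_threshold(y_true, y_prob):
    """Return the observed probability that maximizes Youden's J as cut-off."""
    candidates = sorted(set(y_prob))
    return max(
        candidates,
        key=lambda t: metrics(*confusion_counts(y_true, y_prob, t))["youden"],
    )


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 1, 1]                # toy outcome labels
    y_prob = [0.1, 0.2, 0.6, 0.4, 0.7, 0.8, 0.9]  # toy predicted probabilities
    cut_off = best_threshold(y_true, y_prob)
    print(cut_off, metrics(*confusion_counts(y_true, y_prob, cut_off)))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it remains informative when event rates differ between mRS90 cut-offs.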

A graphical flow chart of the proposed ML-based algorithm for prediction of the clinical outcome is depicted in Fig. 1.

Fig. 1

Schematic overview of proposed imaging-based outcome prediction pipeline. CV cross validation set, mRS modified Rankin Scale, pc-ASPECTS posterior circulation Acute Stroke Prognosis Early CT Score, ROC receiver-operating-characteristic

Results

Our analysis included NCCT images of 1376 pc-ASPECTS regions extracted from 172 patients (77 females, median age 74 years, interquartile range (IQR): 61–79 years) with acute stroke in the posterior circulation. NIHSS assessments were available for 149 patients. Median NIHSS score at admission was 15 (IQR 5–42). In total, 94 patients (54.7%) underwent successful recanalization with a TICI (thrombolysis in cerebral infarction) score ≥ 2b, and 79 patients (45.9%) were treated with intravenous thrombolysis (Table 1). A favorable outcome of mRS ≤ 2 at day 90 was reached by 57 patients (33.1%).

Table 1 Patient characteristics

Logistic regression analysis

Logistic regression for mRS ≤ 2 (good outcome) of the conventional predictors (conventional pc-ASPECTS, NIHSS at admission and age) on the entire dataset showed significant coefficients for pc-ASPECTS and NIHSS at admission (p value < 0.05), whereas age was not significantly associated with good outcome (Table 2). Optimal cut-off values (Youden’s index) indicate that patients with pc-ASPECTS ≥ 8 have a significantly higher probability of achieving mRS ≤ 2, with an odds ratio of 11.07 (95% CI [2.55; 48.02]). Patients with NIHSS at admission < 10 also have a significantly higher chance of good outcome, with an odds ratio of 16.17 (95% CI [7.01; 37.32]).
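For a single dichotomized predictor, the odds ratio of a univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, OR = (a·d)/(b·c), with a Wald-type 95% CI of exp(ln OR ± 1.96·√(1/a + 1/b + 1/c + 1/d)). The sketch below illustrates this relation only; the cell counts are invented for demonstration and are not the study data, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald-type CI from a 2 x 2 table.

    a: predictor present, good outcome    b: predictor present, poor outcome
    c: predictor absent,  good outcome    d: predictor absent,  poor outcome
    All cells must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the study)
    print(odds_ratio_with_ci(a=20, b=10, c=5, d=25))
```

A confidence interval that excludes 1, as in both intervals reported above, corresponds to a significant association at the chosen level.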

Table 2 Logistic regression of conventional predictors for mRS90 ≤ 2 (good outcome)

Predictive models for functional outcome

Areas under the receiver operating characteristic curves (ROC AUCs) of the test sets using conventionally rated pc-ASPECTS in a univariate logistic regression ranged from 0.63 (95% CI [0.59; 0.67]) for mRS ≤ 4 to 0.68 (95% CI [0.63; 0.72]) for mRS ≤ 5. Pure imaging-based machine learning classifier ROC AUCs were lowest for mRS ≤ 4 with 0.81 (95% CI [0.78; 0.84]) and highest for mRS ≤ 5 with 0.87 (95% CI [0.85; 0.90]). Employing multidimensional conventional predictors (conventional pc-ASPECTS, NIHSS at admission and age) yielded ROC AUCs of 0.73 (95% CI [0.69; 0.77]) for mRS ≤ 5 (lowest) to 0.85 (95% CI [0.82; 0.88]) for mRS ≤ 2 (highest). Overall, the highest predictive performance was observed for the combined clinical data and machine learning-based model with ROC AUCs of 0.83 (95% CI [0.80; 0.87]) for mRS ≤ 5 (lowest) to 0.90 (95% CI [0.88; 0.92]) for mRS ≤ 2 (highest) (Figs. 2 and 3 and Table 3). Results show that the predictive performance of machine learning-based evaluation of quantitative image features was higher compared to the predictive value of conventional pc-ASPECTS metrics (p values < 0.05). If combined with additional clinical data (NIHSS at admission and age), the conventional prediction model achieved slightly better metrics for differentiating lower mRS values (≤ 2 and ≤ 3); however, these differences were not significant. For the mRS ≤ 4 and ≤ 5 classification tasks, quantitative image feature-based algorithms reached higher performance with significant differences in all metrics for mRS ≤ 5 (survival). The integrated model employing information from conventional pc-ASPECTS ratings, clinical data and machine learning-based evaluation of quantitative image features showed a trend towards superior results versus conventional pc-ASPECTS and clinical data for all mRS cut-offs. Improvements were observed for ROC AUC in mRS ≤ 2 prediction (ROC AUC = 0.90 vs. 0.85, p value < 0.05) and for all metrics in mRS ≤ 5 prediction (ROC AUC = 0.83 vs. 0.73, p value < 0.05).

Feature importance analyses of the mean top 300 predictors of all training data sets show that the pc-ASPECTS regions with the highest predictive power are cerebellum (30%), midbrain (29%) and thalamus (27%). The largest share of predictive value was derived from wavelet (40%) and log-sigma (38%) filtered images. Unfiltered original images contributed 22% to total predictive power (Fig. 4). Within feature classes, texture metrics and first-order statistics were used in equal proportions.

Fig. 2

Imaging-based prediction of outcome in patients with posterior circulation stroke at admission. ROC curves, AUCs and maximum Youden index at different mRS cut-off values for respective binary classification tasks. A Univariate logistic regression models employing conventional pc-ASPECTS ratings; B pure imaging-based random forest machine learning algorithms. Results are based on nested 5-fold cross validation of 172 patients from 2 different centers. Bonferroni corrections have been applied to account for alpha spending error. CI confidence interval, d days, mRS modified Rankin Scale, pc-ASPECTS posterior circulation Acute Stroke Prognosis Early CT Score, ROC AUC receiver-operating-characteristic area-under-the-curve

Fig. 3

Imaging and clinical data-based prediction of outcome in patients with posterior circulation stroke at admission. ROC curves, AUCs and maximum Youden index at different mRS cut-off values for respective binary classification tasks. A Multivariate logistic regression models employing conventional predictors of outcome (conventional pc-ASPECTS ratings, NIHSS at admission, age) and B Combined models utilizing information derived from conventional predictors and machine learning-based image analysis. Bonferroni corrections have been applied to account for alpha spending error. Results are based on nested five-fold cross validation of 149 patients from two different centers. CI confidence interval, d days, mRS modified Rankin Scale, NIHSS National Institutes of Health Stroke Scale, pc-ASPECTS posterior circulation Acute Stroke Prognosis Early CT Score, ROC AUC receiver-operating-characteristic area under the curve

Table 3 Classification performance of imaging-based outcome prediction

Fig. 4

Predictive value of quantitative image features. Pie charts show regional distribution of features and applied filters in utilized top-300 predictors. Results are based on nested five-fold cross validation of 172 patients from two different centers. PCA: Posterior cerebral artery

Discussion

In this study, we developed a machine learning approach for predicting functional outcome of patients with posterior circulation stroke based on multidimensional quantitative image analysis of pc-ASPECTS regions in admission NCCTs and basic clinical data available at admission. The study is based on a cohort of 172 patients, of which 57 (33.1%) achieved a favorable outcome of mRS ≤ 2 at day 90. This corresponds to the results of the BASICS trial with a recently reported total share of 32.7% for mRS ≤ 2 at day 90 (35.1% in the intervention arm vs. 30.1% in the control arm) in patients with basilar artery occlusions [32].

Conventional logistic regression and cut-off point optimization confirmed that high pc-ASPECTS (optimal cut-off at pc-ASPECTS ≥ 8) and low NIHSS at admission (optimal cut-off at NIHSS < 10) are significant and independent predictors of good outcome. These results are in line with the findings of other studies [2, 12].

The proposed ML approach employing quantitative image features provided high discriminatory accuracy between good and poor functional outcome at different mRS thresholds; observed performance metrics were superior or equal to conventional clinical and imaging-based assessments. For predicting mRS ≤ 2, ROC AUC, sensitivity and specificity were 0.90, 81% and 85% for the integrated machine learning classifier; 0.85, 80% and 83% for conventional pc-ASPECTS with clinical data; and 0.64, 68% and 57% for conventional pc-ASPECTS alone. Our analysis also showed that employing multidimensional clinical predictors (conventional pc-ASPECTS, NIHSS at admission and age) improved accuracy with a statistically significant increase in ROC AUC compared to using conventional pc-ASPECTS alone. However, even the solely imaging-based machine learning approach achieved higher discriminatory power than conventional pc-ASPECTS.

Earlier studies have investigated the predictive power of conventional pc-ASPECTS ratings based on different imaging modalities and clinical parameters [2, 16, 19, 21, 30]. Lin et al. [16] report ROC AUC for pc-ASPECTS and NIHSS at admission of 0.69 and of 0.77 if both parameters are combined. Other works focused on outcome prediction after endovascular therapy and show ROC AUC of 0.74 for NIHSS at admission and 0.72 for pc-ASPECTS [21]. Based on CT perfusion imaging parameters, pc-ASPECTS ROC AUC was reported to achieve 0.64 (mean transit time) to 0.82 (cerebral blood volume) [2]. pc-ASPECTS based on DWI was shown to be a predictor of clinical outcome with ROC AUC of 0.82 [19]. To date, all published studies employ conventional regression analysis. None of the published analyses investigated the discriminatory power of ML-based quantitative image assessment in a train, validation and test approach.

Our study has the following limitations: first, generalizability might be limited due to the retrospective nature with inherent selection bias and its relatively small sample size. An expansion of sample size in a prospective study design would certainly contribute to further improving generalizability of results. However, low variability of results across different validation sets suggests sufficient robustness for assessing general feasibility and limitations of the approach. Second, differences in recanalization treatment may have influenced patients’ outcomes. However, our results were confirmed throughout the whole patient collective despite different recanalization results, even though our approach included only variables available at admission. This observation indicates that clinical and imaging data at admission might already include information regarding probabilities of specific treatment strategies (e.g., decision for mechanical thrombectomy based on age, NIHSS, pc-ASPECTS). Furthermore, our findings are supported by the results of the BASICS trial that did not report a significant difference in functional outcome for patients treated with endovascular therapy plus best medical management vs. best medical management alone [32]. Third, limitations typically associated with quantitative radiomics-based image analysis and classification may compromise generalizability of the results [1, 5, 6, 11, 13]. These limitations include differences in image acquisition settings, for example size of the field of view or gantry tilt, and under- or overfitting of machine learning algorithms. Bias from these factors was minimized through employment of standardized NCCT scans and the application of Random Forest algorithms that are comparatively stable with regard to overfitting. The risk of overfitting was also reduced by evaluating multiple different models in a nested cross-validation approach. Due to standardized and calibrated quantitative imaging parameters and signal intensity processing of CT scanners, we assume negligible bias on classifier performance in a generalized setting. Fourth, with NCCT being the most widely performed brain-imaging technique in acute pc stroke settings [9], our analysis did not integrate CT angiography, CT perfusion or MR imaging. An extension to these imaging modalities could have further improved the results [22, 26, 27, 28], as NCCT images only offer limited sensitivity for detecting ischemia compared to, e.g., diffusion-weighted imaging [34]. However, both scores, conventional ASPECTS (Alberta Stroke Program Early CT Score) for anterior circulation and pc-ASPECTS, are originally based on evaluations of acute NCCT scans. NCCT scans at admission are fast and the technique is available in most hospitals. Furthermore, NCCT imaging is a fundamental part of most standard-of-care stroke protocols. Fifth, the acquisition resolution of NCCT scans was limited to < 5 mm in slice thickness. The utilization of higher resolution images could improve classification performance. Sixth, the manual definition of pc-ASPECTS areas still implies a certain degree of observer-dependence within the machine learning process. To minimize its influence, we derived standard maps from delineations obtained from 63 healthy subjects. Further, it was shown that radiomic features are comparatively stable with regard to variations in segmentations [23, 37].

Conclusion

We developed a machine learning-based classifier that predicts functional outcome of acute posterior circulation stroke patients based on quantitative multidimensional analysis of pc-ASPECTS regions. We observed higher classification performance metrics than achieved in conventional clinical and imaging-based assessments. The proposed algorithm might therefore (1) allow for an early prognosis of the patient’s long-term functional status, (2) optimize triage for additional MR imaging diagnostics and allocation of best possible medical care [36], and (3) facilitate required arrangements of the patient’s social environment at an early stage.