Introduction/Methods

A combination of developments in stroke interventions, biomedical informatics, and computer science in the last 10 years has transformed the way we approach the management of stroke. The publication of five landmark trials in 2015 established mechanical thrombectomy as the preferred treatment for selected patients [1,2,3,4,5]. However, while interventions like thrombolytics and thrombectomy have improved patient outcomes, significant gaps remain in our ability to select patients for specific treatments and predict complications. The proliferation of medical data and advances in computational techniques have created new ways to address these gaps. From 2012 to 2014, the proportion of US hospitals using an electronic health record increased from 44% to 97% [6]. As a result, more digital medical data are being generated at an increasing pace. The opportunity and challenge of leveraging this exponentially increasing amount of information are often described as the problem of big data.

Techniques to manage and analyze big data have also advanced rapidly. In 2015, Google Brain released TensorFlow [7], a machine learning software package that helped usher in the era of widely available deep learning algorithms. These algorithms, which have revolutionized technologies such as self-driving cars and virtual assistants, have also found success in the medical domain with applications ranging from drug discovery to imaging analysis to seizure detection. Big data and the novel computational techniques required to process it have been used in the domain of stroke management to identify additional patients who may benefit from acute intervention, standardize the detection of large vessel occlusions (LVOs), predict the location and extent of hemorrhagic transformation, stratify stroke risk, and make personalized treatment recommendations.

This narrative review highlights recent applications of big data and machine learning in stroke management. A review of the literature was conducted by searching PubMed using ((“big data” OR “machine learning”) AND “stroke”) filtered to results published in the last 10 years. The results were sorted by “Best Match” and reviewed by the primary author for relevancy. Further publications were added to the review based on the authors’ expert opinion.

Big Data

The proliferation of diverse digital medical data such as electronic medical records, digital imaging, genomics, and research registries has outpaced our capacity to analyze the data. While the term big data is often used when discussing the opportunities and challenges associated with the exponential accumulation of data, it has been difficult to settle on a commonly agreed-upon definition. To address this, the National Institute of Standards and Technology (NIST) published a report in 2019 aimed at better defining big data. Broadly, four characteristics of data describe a big data problem: volume, velocity, variety, and variability (Table 1) [8]. The NIST report specifically does not define concrete metrics for each of these variables, nor does it prescribe requirements of magnitude before data can be considered “big.” Rather, they are meant as an aid to think about a class of problems inadequately addressed by traditional analytic methods. Medical data, when considered in this context, is big data [9]. The large numbers of patients and variables pose a volume problem. Methods are needed to integrate a variety of data sources, including charts, imaging, genetics, and administrative data. The amount of data is rapidly accumulating (velocity), and, with the development of new tests, imaging modalities, and multi-omics testing, the nature of the data is rapidly changing as well (variability).

Table 1 Four V characteristics defining big data

Investigations using big data are facilitated by novel computational techniques often referred to as artificial intelligence (AI) or machine learning (ML, Table 2, Glossary). Biomedical studies have traditionally used statistical methods to analyze a handful of variables derived from structured clinical data that was hand-collected to either test a hypothesis or estimate a parameter of a model. For example, an investigation may require an estimation of hematoma volume. While delineating the precise three-dimensional extent of a hemorrhage can be done by humans, it is time-consuming and not feasible to perform by hand beyond several hundred scans. As a result, estimation methods such as ABC/2 (height × width × depth, divided by 2) were developed to produce fast approximations [10]. In contrast, automated techniques based upon machine learning can reliably estimate hematoma volumes and can be scaled up to efficiently process thousands to millions of images, which is helpful for conducting research with large data sets [11].
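
To make the trade-off concrete, the following minimal sketch (synthetic data only, not drawn from the cited studies) compares the ABC/2 approximation against a direct voxel count on a simulated hematoma segmentation.

```python
# Minimal illustration (synthetic data) of the ABC/2 approximation versus a
# direct voxel count on a segmented hematoma.
import numpy as np

# Hypothetical CT volume: 1 mm isotropic voxels containing an ellipsoidal "hematoma".
shape = (64, 64, 64)
zz, yy, xx = np.indices(shape)
center = np.array([32, 32, 32])
semi_axes = np.array([10.0, 15.0, 20.0])   # mm, along z, y, x
mask = (((zz - center[0]) / semi_axes[0]) ** 2 +
        ((yy - center[1]) / semi_axes[1]) ** 2 +
        ((xx - center[2]) / semi_axes[2]) ** 2) <= 1.0

voxel_volume_ml = 1.0 / 1000.0             # 1 mm^3 = 0.001 mL

# "Ground truth": count every segmented voxel.
true_volume_ml = mask.sum() * voxel_volume_ml

# ABC/2: product of the three orthogonal diameters, divided by 2.
coords = np.where(mask)                                   # voxel indices of the hematoma
extents_mm = [c.max() - c.min() + 1 for c in coords]      # largest diameter along each axis
abc2_volume_ml = np.prod(extents_mm) / 2.0 * voxel_volume_ml

print(f"Voxel count volume: {true_volume_ml:.1f} mL")
print(f"ABC/2 estimate:     {abc2_volume_ml:.1f} mL")
```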

Table 2 Glossary of commonly used terms

Common AI and ML algorithms are described in Table 3, and further reading can be found in [12]. These algorithms are used not only for parameter estimation but also for tasks like classification and clustering, and they can be incorporated into clinical decision tools. The goal of classification is to predict the value of an output variable (sometimes referred to as a label), such as clinical outcome or development of hemorrhagic transformation. The performance of machine learning methods is often evaluated based on their ability to accurately predict on a validation dataset, as opposed to traditional statistical methods, which typically focus on the estimation of the parameters of a model using all the available data.
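
The following sketch, using synthetic data as a stand-in for tabular clinical variables, illustrates this evaluation convention: the model is fit on a training split and judged by its predictive performance on a held-out validation split.

```python
# Hedged sketch (synthetic data) of the classification workflow described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Stand-in for tabular clinical features and a binary label
# (e.g., hemorrhagic transformation: yes/no).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Training AUC:   {train_auc:.2f}")
print(f"Validation AUC: {val_auc:.2f}")   # the figure usually reported in ML studies
```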

Table 3 Common artificial intelligence and machine learning techniques

Two categories of ML methods are notable for their typically superior performance in complex classification tasks and as such are commonly found in recent studies using machine learning. The first category is ensemble methods, a family of algorithms that function by aggregating the predictions of a collection of simple classifiers to produce a more accurate prediction. Examples of ensemble methods include random forest and gradient boosting. The second category of methods is collectively termed deep learning. Deep learning uses a multi-layered artificial neural network (ANN) to capture complex relationships in the training data to make predictions. While the concept of an artificial neural network dates back to the first mathematical models of the neuron in 1943 [13], the ability to capture complex relationships in the data using a multi-layered model was only recently made possible by the development of widely available and sufficiently powerful graphics processing units (GPUs) and parallel processing algorithms to perform the enormous number of calculations necessary to train the model. While deep learning models can demonstrate outstanding predictive performance, the sheer complexity of the models renders them nearly uninterpretable to humans, and, as such, a deep learning algorithm effectively becomes a black box. Making deep learning models human interpretable is an ongoing area of research [14].
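
As an illustration of the first category, the sketch below (synthetic data) compares a single decision tree with a random forest and a gradient boosting ensemble; the scikit-learn estimators and the dataset are illustrative choices, not those of any cited study.

```python
# Illustrative comparison (synthetic data) of a single simple classifier with the
# two ensemble families named above: bagged trees (random forest) and boosting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=1)

for name, clf in [("single decision tree", DecisionTreeClassifier(random_state=1)),
                  ("random forest", RandomForestClassifier(n_estimators=300, random_state=1)),
                  ("gradient boosting", GradientBoostingClassifier(random_state=1))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])
    print(f"{name:22s} validation AUC = {auc:.2f}")
```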

New challenges arise when addressing big data problems using AI and ML. Unlike traditional datasets, big data cannot be feasibly curated by hand, and mis-formatted or missing data are common. This must be addressed automatically, and the choice of method for handling missing data may bias the results [15]. As the number of variables in a dataset increases, the space of possible models increases exponentially as well, and it becomes easy to overfit a model to training data, resulting in a model that does not generalize and performs poorly when attempting to make predictions on validation data. Methods for selecting models that are neither too complex nor too simple, or for reducing the number of variables, help address this problem.
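
A minimal example of both issues, again on synthetic data, is shown below: missing values are imputed automatically within a pipeline (a choice that can itself bias results), and cross-validation selects the degree of regularization so the model is neither too complex nor too simple.

```python
# Minimal sketch (synthetic data) of automatic missing-data handling and
# complexity selection by cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=40, n_informative=5,
                           random_state=0)
# Introduce missing values at random, as commonly seen in uncurated EHR extracts.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # choice of strategy can bias results
    ("model", LogisticRegression(penalty="l1", solver="liblinear")),
])
# Cross-validation chooses how strongly to regularize (stronger penalty -> simpler model).
search = GridSearchCV(pipe, {"model__C": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="roc_auc")
search.fit(X, y)
print("Best C:", search.best_params_["model__C"], "CV AUC:", round(search.best_score_, 2))
```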

Although electronic health records collectively contain data on hundreds of millions of patients, the amount of data that is practically available for a given study in a specific disease is likely more limited. A combination of factors such as disease prevalence, patient consent to research, and data sharing agreements limits the amount of data available. In practice, studies in a particular medical domain may range from tens to hundreds of patients on the low end (observational or prospective studies) to hundreds of thousands of patients in large national cohorts, registries, or biobanks. The availability of amalgamated registries (e.g., Epic Cosmos) raises the possibility of research using data from hundreds of millions of patients [16]. In stroke, efforts to increase the availability of imaging data for research are underway as part of the NIH Stroke Trials Network (StrokeNet) [17].

Human Bias in Machine Learning

While ML is useful for finding otherwise unknown associations between variables, it does not inherently understand the plausibility or context of the learned associations. Though it is possible to infer causal relationships from observational data alone using machine learning, the field of causal ML is still growing, and the majority of machine learning algorithms do not infer causality [18, 19]. As such, a researcher must be thoughtful about the selection of input variables and about making inferences about causality from learned models. A model trained using clinical data that includes ethnicity, socioeconomic status, or hospital location might capture the effect of health disparities rather than the biological associations the researcher intended to study. For example, in a machine learning model using clinical data to predict outcome in a Chinese stroke cohort, significant clinical factors included the specific hospital at which the patient was treated [20]. One explanation is that different hospitals delivered different quality stroke care, which translated to different outcomes. Another explanation could be that sicker patients went to specific hospitals. The learned model is thus confounded by the inclusion of the presenting hospital in the input.

There exists a harmful misperception that decisions made with algorithmic assistance are not susceptible to human biases. Because these algorithms rely on models learned from data collected and curated in a biased society, they are manifestly as susceptible to systemic societal bias and inequity as human-made decisions. For example, studies using data from personal fitness trackers are likely to be biased toward people who are wealthy enough to afford them. Underserved communities are likely to be underrepresented in datasets, which are typically collected from academic research institutions. The topics of fairness, social justice, and bias in machine learning are increasingly researched, and methods to measure and correct for bias have been introduced [21]. In order to do so, terms such as fairness, bias, and protected attribute must first be explicitly defined in a computational context. Then, a fairness metric must be designed to quantify the degree of fairness so that a machine learning algorithm can be designed to optimize for it. Unfortunately, there are many possible ways to define fairness mathematically, some of which are mutually exclusive [22]. As such, it falls on the researcher to choose the most appropriate definition of fairness for the application at hand. Once a metric of fairness has been defined, a machine learning algorithm can then search for a model that maximizes fairness. Training data must contain sufficient and unbiased representation of protected groups to allow for accurate training. Expressly including measures of social determinants of inequity may improve the fairness of AI models [23].
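
As a small illustration of why fairness must first be made computable, the sketch below (synthetic predictions and a hypothetical binary protected attribute) calculates two common but potentially conflicting fairness metrics, demographic parity difference and equal opportunity difference.

```python
# Hedged sketch of how a fairness notion must be made computable before it can be
# optimized. The group variable, outcomes, and predictions here are synthetic.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
group = rng.integers(0, 2, n)            # protected attribute (two groups, coded 0/1)
y_true = rng.integers(0, 2, n)           # observed outcome
# A hypothetical model whose positive-prediction rate differs by group.
y_pred = (rng.random(n) < np.where(group == 1, 0.45, 0.30)).astype(int)

def demographic_parity_difference(y_pred, group):
    """Difference in positive prediction rates between groups."""
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(1) - tpr(0))

print("Demographic parity difference:", round(demographic_parity_difference(y_pred, group), 3))
print("Equal opportunity difference: ", round(equal_opportunity_difference(y_true, y_pred, group), 3))
```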

Trust in novel medical interventions stems from high-quality clinical trials that balance measured and unmeasured confounders; in AI/ML, unmeasured confounders can yield apparently more accurate models that more fully incorporate the human bias already contained in the data. The CONSORT guidelines provide evidence-based recommendations for transparency and completeness of reporting of randomized clinical trials [24]. As AI and ML methods are increasingly applied to medical problems in the age of big data, an extension of trial guidelines is needed to address the unique challenges of interpreting these studies. Because AI algorithms often construct complex predictive models using large numbers of variables and complicated processing pipelines, it can be difficult for a human reviewer to judge the models for bias. CONSORT-AI is an extension of the CONSORT guidelines that provides reporting guidance for trials incorporating AI and machine learning to address these concerns [25]. In particular, the CONSORT-AI extension stipulates that authors should make clear how an AI algorithm is integrated into the trial setting, how input data are acquired, how poor-quality or missing data are handled, how much human input is involved in handling the data, how the algorithm output is used in the trial, an analysis of errors, and whether code is accessible. SPIRIT-AI is a complementary extension of guidelines for the reporting of clinical trial protocols [26]. STARD-AI, an effort to develop consensus guidelines on the reporting of AI diagnostic accuracy studies, is currently underway [27]. In a review of 41 randomized controlled trials using AI or machine learning for medical decisions, no trials were found to have met all of the CONSORT-AI guidelines for reporting, and most trials failed to discuss how they handled poor-quality or missing data and failed to assess performance errors [28]. This review provides a broad overview of the different areas in which artificial intelligence and machine learning have been applied to stroke research but will not explicitly evaluate each study on the basis of CONSORT-AI, as most studies referenced are not clinical trials.

Acute Treatment of Stroke

The widespread adoption of thrombolytics and thrombectomy for the acute treatment of stroke poses new big data challenges. Of particular importance is the timely selection of eligible patients, which is critical to the success of these treatments. Trials have demonstrated the effectiveness and safety of alteplase within 4.5 h of stroke onset [29,30,31], while an optimal window for tenecteplase remains unclear [32,33,34,35,36]. Although early trials failed to find a benefit from thrombectomy, subsequent landmark trials demonstrated clear improvement in neurologic outcome in patients treated with thrombectomy up to 24 h after symptom onset [1,2,3,4,5, 37,38,39,40,41]. Robust, standardized patient selection using automated image processing algorithms such as RAPID [42] played a significant role in the success of subsequent trials.

The determination of time of stroke onset can be unreliable in a significant proportion of patients who present to the ED with an acute stroke, due either to stroke onset during sleep or to unwitnessed strokes in which the patient is not cognitively intact or able to communicate the time of stroke onset. The proportion of patients who present with a wake-up stroke ranges from 14 to 24%, with a smaller fraction of additional patients who present with non-wake-up strokes with unknown time of onset [43, 44]. If the time of stroke onset could be estimated for these patients from other data, some of them might benefit from acute stroke interventions.

To address this problem, MRI diffusion weighted imaging (DWI) and fluid attenuated inversion recovery (FLAIR) mismatch has been proposed as an imaging biomarker for predicting time of stroke onset within 4.5 h and has been studied in the context of the WAKE-UP trial [45]. While neuroradiologist protocols have been developed to classify the presence or absence of DWI-FLAIR mismatch, human readers were found to be around 60% sensitive and 80% specific in predicting stroke onset < 4.5 h [46]. ML methods are well suited to this problem and could potentially improve upon human predictive performance. Indeed, several studies, using region of interest (ROI)-based or deep-learning-extracted imaging features, have classified time of stroke onset as less than or greater than 4.5 h with better sensitivity and specificity than manual DWI-FLAIR mismatch protocols [47,48,49]. Another study used quantitative imaging features (radiomics) to measure the degree of DWI-FLAIR mismatch beyond the typical human-adjudicated categories of absent, subtle, or obviously present [50]. These studies are limited in their sample sizes, which are on the order of hundreds of stroke MRIs, but demonstrate how machine learning methods can leverage the full depth of imaging data to improve on manual reading.

The risk–benefit discussion with patients for acute interventions is informed by our assessment of their current deficits as well as an estimation of what their stroke might look like if it were to complete (i.e., all ischemic tissue became infarcted). The current clinical standard for predicting final stroke volume uses standardized thresholds on diffusion and perfusion maps; these thresholds, however, are susceptible to artifacts, have not been validated in a large cohort, and do not capture individual variability in physiology [51]. ML methods, particularly deep learning methods such as convolutional neural networks (CNN, Table 3), can better predict final stroke volume compared to the current clinical standard [52,53,54,55,56]. While these methods perform well with large infarcts, they are less accurate with smaller infarcts (e.g., < 20 mL infarct volume). The incorporation of anatomical information about each voxel and the probability of infarct at that location, in addition to DWI and PWI data, can increase the performance of predictive models [57]. Deep learning methods have also been able to predict tissue at risk using arterial spin labeling MRI, circumventing the need for intravenous contrast for perfusion imaging [58]. Most studies on automated segmentation and prediction of infarct volume have thus far been limited by the small amount of training data, with only tens to hundreds of samples available. There is evidence that training on larger datasets from repositories produces models that perform better than models trained on smaller, single-center datasets [59].
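
For orientation, the following schematic (random stand-in data; the architecture, channel names, and sizes are illustrative assumptions rather than any published model) shows the general form of a convolutional network that maps co-registered multi-channel input slices to a per-voxel probability of final infarction.

```python
# Schematic only: a small fully convolutional network of the general kind used in
# the cited studies, not a reproduction of any published model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_infarct_cnn(input_shape=(64, 64, 4)):
    """4 input channels could stand for DWI, ADC, Tmax, and CBF slices (assumed names)."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # per-voxel infarct probability
    return tf.keras.Model(inputs, outputs)

model = build_infarct_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random stand-in data: 8 image slices and matching follow-up infarct masks.
x = np.random.rand(8, 64, 64, 4).astype("float32")
y = (np.random.rand(8, 64, 64, 1) > 0.9).astype("float32")
model.fit(x, y, epochs=1, batch_size=4, verbose=0)
print(model.predict(x[:1], verbose=0).shape)   # (1, 64, 64, 1): a predicted infarct probability map
```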

Intracranial vessel and perfusion imaging are critical to decision-making for thrombectomy, and big data has made possible the development of artificial intelligence imaging interpretation and decision aids to streamline the process of making timely decisions for intervention, particularly in resource-limited settings. Software suites, such as RAPID (iSchemaView), e-Stroke Suite (Brainomix), and VIZ.ai, provide interpretations of perfusion and vessel imaging [60]. Software such as RAPID automatically calculates diffusion and perfusion parameters, performs rudimentary segmentation to produce useful clinical maps such as diffusion-perfusion (core-penumbra) mismatch, calculates ASPECTS, and predicts large vessel occlusion [42, 61]. The development of this software was made possible using datasets from thrombectomy trials, and the software was subsequently used in the decision-making process in trials such as EXTEND IA, DEFUSE 3, and DAWN, allowing for standardization of the information available across multiple medical centers [1, 40, 41]. These software suites have since been introduced into clinical practice: RAPID has been deployed to more than 1800 hospitals worldwide, and Brainomix has won a tender to be deployed to the national healthcare system in Hungary [62, 63].

Recent additions of automated ASPECTS calculation in these software systems typically use random forests or CNNs to make their predictions, and several are available commercially, including RAPID ASPECTS and Brainomix e-ASPECTS [64]. Similar work has been done using CNNs for LVO detection [65] and has been commercialized in VIZ.ai LVO/CTP, though RAPID uses a non-machine learning–based algorithm for this purpose. VIZ.ai LVO detection is only trained to detect occlusions at the carotid terminus, M1, and M2 locations, and, in real-world studies, performs better at detecting carotid terminus (100% sensitivity) and M1 (93% sensitivity) occlusions compared to M2 (50% sensitivity for proximal, 28% sensitivity for distal) [66]. Specificities were reported in the 90% range, and, given that most imaging studies in the dataset did not contain an LVO, positive predictive values were only in the 30–40% range. A similar study looking at RAPID’s detection of intracranial LVOs demonstrated sensitivity of 95–96%, with specificity of 74–79% without and with the inclusion of M2 occlusions, respectively [67]. A CNN-based method was also developed for automatically detecting LVOs on digital subtraction angiography instead of CT scans for the purposes of standardization in thrombectomy studies; however, on average the predicted locations differed from ground truth by about 1.2 cm for carotid terminus occlusions and 1.9 cm for distal occlusions [68].
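
The low positive predictive values despite high sensitivity and specificity follow directly from the low prevalence of LVO among scanned patients, as the back-of-the-envelope calculation below illustrates (the prevalence figures are assumptions for illustration, not values from the cited studies).

```python
# Why ~90% specificity and high sensitivity can still yield a PPV in the 30-40%
# range when only a small fraction of scanned patients actually have an LVO.
def ppv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence              # true positives per scanned patient
    fp = (1 - specificity) * (1 - prevalence)  # false positives per scanned patient
    return tp / (tp + fp)

for prev in (0.05, 0.10, 0.20):                # assumed LVO prevalence among scans
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.93, 0.90, prev):.0%}")
```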

In practice, algorithmically detected LVOs still require expert validation but have a role in the triage of strokes in limited-resource settings where expert radiologist or neurologist review may not be quickly available. In this sense, these algorithms help address the velocity of data arriving in the modern stroke code. Work has been done to expand automated algorithms to detect other neurological problems, such as intracranial hemorrhage, fracture, or mass effect, from imaging data [69, 70]. These algorithms can be helpful for screening in settings with limited access to expert interpretation.

Management of Complications

Early neurologic complications of acute stroke include cerebral edema, hemorrhagic transformation, and early post-stroke seizures. The ability to accurately predict complications before they happen or early in their course allows for the implementation of early interventions to minimize the amount of irreversible neurologic injury. While a clinician may be able to qualitatively estimate the risk of developing malignant edema or hemorrhagic transformation based on clinical characteristics and approximation tools such as the ASPECTS score, there is no standardized set of biomarkers against which predictive performance can be evaluated, and studies of single biomarkers can produce conflicting results [71]. Challenges include the limitations of approximation tools (e.g., the ASPECTS score measures only the MCA territory) and the inability of any single biomarker to capture the heterogeneity of the problem. Machine learning can provide a standardized method to integrate multiple and more sophisticated biomarkers to estimate the risk of developing such complications [72], which can inform decisions on monitoring, reperfusion therapy, and goals of care.

Artificial intelligence is helpful for recognizing complications of stroke interventions. ANNs (Table 3) were used to predict the presence or absence of hemorrhagic transformation at 48 h from clinical and demographic variables, achieving an AUC of 0.84 [73]. While the study authors did not compare performance against non-ML-based methods, previous work on other datasets using clinical biomarkers as risk scores achieved AUCs ranging from 0.50 to 0.86 [74,75,76]. As such, ML algorithms likely achieve performance comparable to clinical risk scores. The study also confirmed that important predictors of hemorrhagic transformation include stroke severity as represented by the NIH Stroke Scale (NIHSS), cardioembolism as the stroke etiology, blood glucose, and systolic blood pressure. Long short-term memory (LSTM) neural networks, a deep learning algorithm that incorporates time series data, were used on the temporal information stored in the perfusion signal on MRIs from before reperfusion therapy to predict the extent and location of hemorrhagic transformation at 24 h after reperfusion therapy [77, 78]. Compared to traditional ML methods, LSTMs demonstrated superior performance on classification of hemorrhagic transformation, achieving an AUC on a voxel-by-voxel basis of 0.89. From a clinical perspective, this algorithm is able to predict not only whether hemorrhagic transformation will happen but where in the brain it will happen. Knowing the likely location and extent of hemorrhagic transformation would allow a physician to stratify the clinical significance of a potential hemorrhage and, thus, would inform decisions on reperfusion therapy. If hemorrhagic transformation does occur, current guidelines generally recommend management similar to that of spontaneous ICH [76]. Given the different comorbidities and etiologies of hemorrhagic transformation, however, more work is needed to identify areas in which management might differ. While hematoma expansion has been identified as a modifiable factor that can improve outcomes in spontaneous ICH [79], the same has not been established in hemorrhagic transformation. Identifying modifiable risk factors for worse outcomes in hemorrhagic transformation could inform specific management practices in that setting.
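
The sketch below outlines the general LSTM idea with synthetic stand-in data: each voxel contributes a perfusion time series, and the network outputs the probability that the voxel will show hemorrhagic transformation. Shapes, sizes, and data are illustrative assumptions rather than the published pipeline.

```python
# Hedged sketch of voxel-wise prediction from a dynamic perfusion signal using an LSTM.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_voxels, timepoints = 2048, 60            # e.g., 60 dynamic perfusion frames per voxel
voxel_series = np.random.rand(n_voxels, timepoints, 1).astype("float32")
transformed = (np.random.rand(n_voxels, 1) > 0.9).astype("float32")   # 1 = transforms at 24 h

inputs = tf.keras.Input(shape=(timepoints, 1))
h = layers.LSTM(32)(inputs)                # summarizes the temporal perfusion pattern
h = layers.Dense(16, activation="relu")(h)
outputs = layers.Dense(1, activation="sigmoid")(h)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(voxel_series, transformed, epochs=1, batch_size=256, verbose=0)
print(model.predict(voxel_series[:5], verbose=0).ravel())   # per-voxel probabilities
```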

A similar rationale applies to predicting malignant cerebral edema (edema severe enough to cause mass effect and neurologic injury) as to predicting hemorrhagic transformation. The Monro-Kellie doctrine provides a mathematical basis for the estimation of intracranial pressure and suggests that an estimation of intracranial CSF volume, or reserve, may predict the development of malignant edema [80]. In a study of hemispheric stroke patients, automated image processing was used to extract features representing intracranial reserve from baseline and 24-h CT scans, and these features were used to train a logistic regression model to predict the development of malignant cerebral edema (defined as either needing decompressive hemicraniectomy or death related to at least 5 mm of midline shift of the brain) with better accuracy than clinical variables alone [81]. In the future, such algorithms could predict deterioration and anticipate the need for additional surveillance or targeted treatments. By catching deterioration early, or before it happens, clinicians can intervene early to limit the amount of neurologic injury that would otherwise be caused.

Seizures can complicate ischemic strokes as well as hemorrhagic strokes. Lobar location and ICH (as opposed to ischemic infarct) are both associated with increased seizure risk after stroke [82]. Risk scores for prediction of late seizures include the SeLECT score in ischemic stroke (AUC 0.76) [83] and the CAVE score in ICH (AUC 0.69) [84]. Early seizures after ICH are associated with worse quality of life; unfortunately, so is the use of prophylactic anti-seizure medication in unselected patients [85, 86]. As such, being able to predict early seizures may be helpful in selecting appropriate patients for closer monitoring or selective prophylactic management. In spontaneous ICH, gradient boosting has been shown to have improved performance at predicting early seizures compared to a subset of the CAVE score, achieving an AUC of 0.79 compared to 0.72 [87]. A meta-analysis of studies on seizures in ischemic stroke found that risk factors for early seizures included cortical involvement, severe stroke, hemorrhagic transformation, age < 65, large lesion, and presence of atrial fibrillation [88], though only one study evaluated the predictive accuracy of a risk score for early seizures [89]. That study compared several risk scores using discretized clinical variables and achieved an AUC of 0.73 using a subset of 5 variables. Instead of manually choosing variables to consider, however, an ML approach such as decision trees could automatically find the most discriminative variables. ML to predict seizures could make antiseizure medication treatment more precisely targeted while sparing potential adverse effects in patients less likely to have a seizure in the future.
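
The sketch below (synthetic data and hypothetical variable names) illustrates how a tree ensemble can rank candidate predictors of early seizures automatically instead of relying on hand-chosen, discretized variables.

```python
# Illustrative sketch: gradient boosting ranks candidate predictors by importance.
# Variable names are hypothetical and the label is synthetic, loosely tied to a
# few of the variables for demonstration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 1500
X = pd.DataFrame({
    "cortical_involvement": rng.integers(0, 2, n),
    "nihss_admission": rng.integers(0, 30, n),
    "hemorrhagic_transformation": rng.integers(0, 2, n),
    "age": rng.normal(70, 12, n),
    "lesion_volume_ml": rng.exponential(30, n),
    "atrial_fibrillation": rng.integers(0, 2, n),
})
# Synthetic label generated from a few of the variables above.
logit = (1.2 * X["cortical_involvement"] + 0.05 * X["nihss_admission"]
         + 0.8 * X["hemorrhagic_transformation"] - 4)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
for name, imp in sorted(zip(X.columns, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:28s} importance = {imp:.2f}")
```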

While non-invasive scalp EEG can reliably detect sufficiently large seizures affecting the cortex, hippocampal seizures are often more difficult to detect without invasive electrodes. This is relevant to ischemic strokes because the hippocampus is particularly vulnerable to ischemic insults. The ability to detect deep hippocampal seizures from non-invasive EEG would be safer and better tolerated by patients. An ensemble CNN-based algorithm was able to detect hippocampal epileptiform activity from scalp EEG alone, achieving an AUC of 0.89 at detecting individual hippocampal epileptiform events recorded from invasive electrodes [90]. The algorithm was able to distinguish temporal lobe epilepsy from healthy controls with AUCs of 0.88 and 0.95 in two separate validation data sets. ML can help identify subtle patterns on the scalp EEG, not detectable by humans, that are predictive of deeper hippocampal seizures, avoiding the need for invasive monitoring.

Stroke Outcomes and Prognosis

ML has been used to predict length of stay, functional outcome, and risk for readmission, which can be helpful in discharge planning and care coordination. Imaging, text analysis, and structured clinical data have all been used to predict outcome [91]. To simplify the classification task, outcome is often defined as a binary variable, where a favorable outcome is a modified Rankin Scale (mRS) score ≤ 2 and an unfavorable outcome is an mRS score > 2. The ASTRAL score is an integer-based scoring system derived from logistic regression on clinical variables present on admission and predicts the probability of unfavorable outcome at 3 months with an AUC of 0.90 and maximum accuracy around 0.8 in a pooled validation cohort [92]. It has also been used to predict 5-year dependence and mortality with similar performance [93]. A support vector machine (SVM, Table 3) model trained using anatomical information on the extent of infarcts in conjunction with patient age and NIHSS on admission predicted favorable outcome with an accuracy of 0.85 [94]. Neural networks had superior performance compared to the ASTRAL score at predicting favorable outcome at 3 months [95]. Instead of clinical data, one study used natural language processing (NLP) on MRI radiology reports to predict outcome, achieving an AUC of 0.78 with random forest and 0.80 with a CNN [96]. In a study predicting outcome at 90 days using combined clinical, multimodal imaging, and angiographic data, a gradient boosting algorithm found that NIHSS at 24 h, premorbid mRS, and final infarct volume were the most important predictors of long-term outcome, and a combined multimodal model achieved an AUC of 0.85 [97]. Overall, machine learning methods perform as well as or better than the ASTRAL score at predicting 3-month functional outcome. In terms of mortality, ensemble machine learning methods such as random forest and gradient boosting have demonstrated increased predictive accuracy compared to logistic regression in the prediction of mortality after rehabilitation, increasing AUC from 0.74 to 0.92 [98]. Machine learning methods perform better than simple integer-based scores at predicting outcome and can help in planning for the recovery process.
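
The following brief sketch (synthetic data) shows the outcome definition used in most of these studies: the ordinal mRS is binarized at ≤ 2 before a model's predictions are scored by AUC.

```python
# Small sketch of the favorable/unfavorable outcome definition and AUC scoring.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
mrs_90day = rng.integers(0, 7, 500)                  # ordinal mRS, 0-6
favorable = (mrs_90day <= 2).astype(int)             # binary label used in most studies

# Stand-in for a model's predicted probability of a favorable outcome.
predicted_prob = np.clip(0.8 - 0.1 * mrs_90day + rng.normal(0, 0.15, 500), 0, 1)
print("AUC for favorable outcome:", round(roc_auc_score(favorable, predicted_prob), 2))
```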

While other outcome measures such as the Barthel Index (BI) and NIHSS exist, most stroke trials have used mRS as the primary outcome as it appears to correlate most closely with patient-reported quality-of-life metrics, such as the Stroke Impact Scale (SIS) [99]. While mRS correlates well with quality of life on a population level, it cannot account for personal values, which may dramatically impact health-related quality of life (HRQoL) on an individual level. A variety of methods for estimating multi-domain HRQoL are available, including the NIH Patient Reported Outcomes Measurement Information System (PROMIS), Neuro-QOL (a set of measures similar to PROMIS that were validated for proxy report and in patients with neurological diseases), and EuroQOL. A substantial amount of multi-domain HRQoL data is available for patients with stroke [100,101,102]. However, machine learning methods generally perform better with simple classification tasks compared to prediction of multi-domain scores [103]. Attempts to use ML to study HRQoL often rely on simplifying HRQoL scales such as the SIS into a composite score and further binarizing the composite score into good response or poor response [104]. Other strategies include limiting the number of domains being investigated and focusing on unsupervised instead of supervised learning (Table 2, Glossary). For example, clustering algorithms have been used to identify distinct phenotypes of 4-domain HRQoL responses after subarachnoid hemorrhage [105]. Overall, computational models remain a poor substitute for compassionate care when discussing detailed prognosis with patients or family.
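
As a small illustration of the unsupervised strategy, the sketch below clusters synthetic four-domain HRQoL scores into a few response phenotypes; the domain names and data are placeholders, not the measures used in the cited study.

```python
# Hedged sketch: cluster multi-domain HRQoL scores into response phenotypes
# rather than predicting a single composite score.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
scores = pd.DataFrame(
    rng.normal(50, 10, size=(300, 4)),
    columns=["physical", "cognitive", "emotional", "social"],   # hypothetical T-score-like scales
)
X = StandardScaler().fit_transform(scores)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(scores.groupby(labels).mean().round(1))    # mean domain profile of each phenotype
```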

From a systems improvement and resource utilization perspective, outcome metrics like length of hospital stay and 30-day readmission rates are also important. Unfortunately, attempts to predict length of stay [106] and 30-day readmission rates [107] have not been as successful. These outcome measures likely depend on other factors that are not well captured in clinical data, such as hospital administrative policies, the availability and quality of disposition facilities, and the support systems in place for a patient after hospital discharge. Further work will likely need to better characterize these social factors and inequities in order to provide more accurate predictions. Machine learning may also be less useful for prediction tasks that lack biologically plausible predictors, depend on data that are neither documented in nor inferable from the electronic health record, and are more closely associated with social determinants (e.g., resources for continued medical care).

Stroke Prevention

Management decisions on secondary prevention of stroke may be made on up to 690,000 patients with acute ischemic stroke and 240,000 patients with transient ischemic attack each year in the USA [108], while primary prevention applies to the entire population. As such, even small increases in the performance of risk prediction using big data and machine learning have the potential to benefit a large number of people. Although the use of risk scores in patient-provider communication does not appear to change patient beliefs or behavior for stroke prevention [109], individualized predictions of stroke risk can, nevertheless, be helpful for the provider in recommending initiation of preventative therapy. Decisions on the use of anti-platelet therapy for primary prevention of stroke often depend on the calculation of risk scores such as the ASCVD score or the Framingham stroke risk profile, which are derived using Cox regression [108, 110]. A study of more than 500,000 Chinese patients used an ensemble method to combine Cox regression predictions with gradient boosting predictions, increasing the positive predictive value (PPV) for future stroke by 1% compared to Cox regression alone [111]. A similar study of 57,000 hypertensive patients in China found that gradient boosting predicted subsequent stroke within 3 years with a better AUC than the Framingham stroke risk profile [112]. While a 1% increase in PPV may seem small, when applied to an eligible population in the hundreds of millions, it could mean that an additional million people are appropriately screened for preventative therapy.
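
The population-scale arithmetic behind this argument is straightforward, as the rough calculation below shows; the population and baseline PPV figures are round illustrative assumptions, not values from the cited studies.

```python
# Rough arithmetic: a 1 percentage point gain in PPV, applied to tens of millions
# of people flagged as high risk, changes the absolute numbers substantially.
screened_high_risk = 50_000_000        # hypothetical number flagged for preventive therapy
ppv_baseline, ppv_improved = 0.10, 0.11  # assumed baseline and improved PPV

additional_true_positives = screened_high_risk * (ppv_improved - ppv_baseline)
print(f"Additional correctly identified people: {additional_true_positives:,.0f}")
```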

Automated methods for data extraction can be helpful due to the large number of patients in population risk factor studies. An automated NLP algorithm was found to be superior to manual coders in the detection of stroke comorbidities in data from the Sentinel Stroke National Audit Programme in the UK [113]. Another study used ML on administrative data and echocardiogram reports to identify likely cardioembolic strokes for the purposes of ensuring appropriate follow-up [114]. When using automated methods, however, it is important to understand the source of input data. Depending on the country, sources like billing or administrative data may misrepresent the prevalence of risk factors compared to clinical notes or structured data, as coders may over-code to maximize reimbursements [115]. This, in turn, may result in biased predictive models.
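
The sketch below (toy sentences and hypothetical labels) shows the general form of such an NLP pipeline: free-text snippets are vectorized and a classifier flags documentation of a comorbidity such as atrial fibrillation. It is not the algorithm used in the cited studies.

```python
# Minimal illustration of text-based comorbidity detection with a bag-of-words model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "history of atrial fibrillation on apixaban",
    "irregularly irregular rhythm consistent with afib",
    "no history of arrhythmia",
    "sinus rhythm on telemetry throughout admission",
]
train_labels = [1, 1, 0, 0]   # 1 = atrial fibrillation documented (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

# Probability of [no afib, afib] for a new sentence.
print(clf.predict_proba(["paroxysmal atrial fibrillation noted on prior ECG"]))
```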

The nature of genetics studies necessitates a big data approach; because of the large amount of data contained in an individual human genome, a large study population is needed to identify relevant genetic markers. An in-depth review of computational techniques for genomic analysis is outside the scope of this paper. The role of genetics in stroke risk remains an active area of research. Genetics research has identified at least 35 genetic loci that are associated with increased stroke risk as well as a number of inherited stroke syndromes, such as CADASIL, CARASIL, and PADMAL [116]. The MEGASTROKE study, a genome-wide association study (GWAS) of more than half a million patients, discovered 22 new loci associated with stroke risk, adding to the 10 previously known [117]. While work remains to validate these discovered associations, their discovery advances our progress toward stratifying individual genetic risk and using that information to guide surveillance or preventative therapy. A major weakness of many GWAS studies is the biased representation of ethnicities, which can limit generalizability [118]. For example, of the half a million patients included in the MEGASTROKE study, the majority were European, and fewer than 2000 were Latin American. This can bias the discovered associations toward polymorphisms disproportionately affecting Europeans and miss important polymorphisms affecting Latin Americans.

Future Directions

The current time-from-onset thresholds in guidelines for the use of thrombolytics are, unfortunately, a relic of the design of randomized controlled trials. Discontinuity analysis suggests there is little difference in outcomes shortly before and shortly after the time thresholds of 3 h or 4.5 h [119]. While current work focuses on predicting stroke onset within the first few hours after symptom onset, future work could eliminate the need for strict time-based exclusion criteria altogether. Instead, image processing techniques could shift the decision-making paradigm from a time-based approximation of the likelihood to benefit to a tissue-based one, which would better account for individual variability in the rates of stroke progression.

Algorithms for the prediction of LVOs and segmentation of stroke volumes are increasingly available in clinical practice and can be helpful for decision support in limited-resource settings. Further gains in predictive accuracy are needed to increase their clinical utility. As these algorithms become increasingly deployed in the clinical setting, proactive machine learning can help identify weak spots in the collected data (e.g., patients for whom a prediction of an LVO is more uncertain) and guide a continuous cycle of data augmentation and model evaluation to effectively improve algorithm performance [120].
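
One simple version of this idea is uncertainty sampling, sketched below with synthetic data: cases whose predicted probability lies closest to 0.5 are queued for expert review and folded back into the next training round. The data, model, and threshold logic are illustrative assumptions rather than any deployed system.

```python
# Hedged sketch of uncertainty-based case selection for continuous model improvement.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=3000, n_features=15, random_state=0)
X_labeled, y_labeled = X[:500], y[:500]          # initial training pool
X_stream = X[500:]                               # incoming, not-yet-reviewed cases

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_labeled, y_labeled)
probs = model.predict_proba(X_stream)[:, 1]
uncertainty = np.abs(probs - 0.5)                # 0 = most uncertain

# The most uncertain cases would be prioritized for expert labeling, then folded
# back into the training set in the next evaluation cycle.
review_queue = np.argsort(uncertainty)[:20]
print("Indices queued for expert review:", review_queue[:10], "...")
```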

Beyond making accurate predictions of risk, future work with big data may help establish precise treatment recommendations. For example, current guidelines only provide standardized blood pressure recommendations for both acute treatment and prevention. In the acute stroke setting, exceeding individualized autoregulatory blood pressure goals may be associated with worse outcomes [121]. In patients with intracranial atherosclerotic disease, while systolic blood pressure under 140 mm Hg has been associated with a lower rate of stroke recurrence, it remains unclear whether there is a subset of patients who might benefit from more permissive hypertension in the long term [122]. Instead of a general recommendation for long-term blood pressure control, a big data-driven ML approach may be able to identify individualized goals for blood pressure.

Despite the rapid adoption of big data and ML into everyday life (e.g., virtual assistants, semi-autonomous driving, social media feeds, bank fraud detection, and AI-generated art and writing), their adoption into medicine has lagged behind. Barriers to adoption include the lack of transparently reported prospective studies, concerns about the generalizability of models developed with research data to real-world applications, concerns about whether algorithms can explain their decision-making, concerns about bias in training data as well as population shifts over time, potential liability, and technical issues with implementation involving security, privacy, and interoperability [123, 124]. A multi-faceted approach is necessary to address these diverse challenges. Future studies should strive to incorporate more prospective clinical trials and conform to CONSORT-AI guidelines on reporting to achieve transparency. Future work on machine learning and, particularly, deep learning should explore methods to increase human interpretability and build trust. Medical information technology infrastructure must evolve toward interoperability, using standards such as Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR). Legislation is needed to promote standardization and data sharing while protecting privacy and security.

Conclusion

Machine learning and big data analytics are rapidly developing assets for improving the acute management and prevention of stroke. These algorithms can help identify additional patients who may benefit from intervention, automate and standardize the detection of LVOs to facilitate the triage of patients, predict the development of hemorrhagic transformation or malignant edema, better stratify risk for stroke prevention, and make personalized treatment recommendations. Some ML and AI techniques are already being introduced into clinical practice for neuroimaging. High-quality clinical trials with transparent, AI-conscious reporting are needed to explicitly evaluate their utility for patient care. Barriers to the adoption of big data and AI in medicine will need to be addressed to benefit from these advances.