1 Introduction

“The essence of practicing medicine has been obtaining as much data about the patient’s health or disease as possible and making decisions based on that. Physicians have had to rely on their experience, judgement, and problem-solving skills while using rudimentary tools and limited resources.” [1]

Precision medicine aims to individualize prevention, diagnostics, and therapeutics by understanding differences in individuals’ genetics, lifestyle, and environment [2]. In recent years, we have witnessed an unprecedented push toward a more data-driven approach in healthcare that promises to take precision medicine to the next level, in part through artificial intelligence (AI). Simply put, AI can be understood as a set of sophisticated computational methods that seek to mimic human cognitive functions, including visual perception, speech recognition, and decision-making [3, 4]. AI uses certain machine learning (ML) algorithms to “learn” features from large datasets [3] and recognize patterns that are often invisible to the human eye [5,6,7]. Capitalizing on the availability of big data and ever-increasing computational power and storage capacities [1, 8], these novel tools seek to improve population health and well-being and to reduce healthcare costs.

A surge in scientific publications documents the potential to harness artificial intelligence in healthcare to prevent, diagnose, and treat diseases [9]. One of the pressing disease areas in focus for AI researchers is stroke, a leading cause of disability and mortality worldwide [3, 8]. Researchers aim to develop applications to optimize stroke diagnosis, treatment, and rehabilitation [10,11,12], and they also use AI to better understand risk. Several well-established risk prediction models have been developed as tools for stroke prevention [13]. Prevention plays an instrumental role in reducing the global burden of stroke [14], and the strategic adoption and development of AI-driven prediction tools can contribute substantially to this mission [1, 13]. These new tools open welcome opportunities and introduce new questions for us, of course. We find ourselves only at the beginning of this exciting journey that will without a doubt confront us with novel ethical, societal, and regulatory challenges.

This chapter surveys the global burden of stroke and describes current practices for reducing stroke incidence and stroke mortality rates. In particular, the chapter reviews how ML applications are applied to stroke risk prediction and prevention and identifies important technological and methodological challenges for using AI in these contexts. The chapter concludes by drawing the readers’ attention to some of the questions and ethical challenges that arise as clinicians widely adopt ML-based applications in practice.

2 Burden of Stroke

Stroke is one of the leading causes of disability and mortality worldwide [14,15,16,17]. Even though a decrease in stroke mortality and incidence rates was observed from 1990 to 2016, absolute numbers show an increase in stroke-related mortality and disability [15, 16]. The absolute number of people affected by stroke almost doubled during this time [16], with incidence rates in low- to middle-income countries exceeding those observed in high-income countries [18]. Researchers estimate that in 2016, there were over 80 million people affected by stroke, many of them younger than 70 years of age [15, 16]. In 2017, Europe counted 1.5 million stroke diagnoses and nine million stroke survivors, with 1.2 million experiencing severe limitations in their activities of daily living [19]. That same year, 0.4 million people died because of stroke [19]. The increase in absolute numbers is largely attributed to population aging and growth [20, 21]. Yet, a noteworthy increase was also recorded in stroke incidence rates in younger age groups (15- to 49-year-olds) [16].

The global increase in stroke incidents poses major challenges for healthcare systems, and these challenges extend beyond a patient’s hospital stay. Patients who survive a stroke long to return to normality [22]. However, following hospital discharge, stroke survivors and their families must cope with the aftermath of stroke. People who suffered a stroke often experience more or less severe physical, cognitive, and emotional deficits that may limit their ability to perform certain activities in daily life [23, 24]. As a result, they remain at least partially dependent on an informal caregiver, usually a family member or partner [25]. Stroke survivors and informal caregivers commonly report physical, emotional, social, and financial challenges and concerns [26, 27]. They also face service deficiencies in health and social care, limited options for service offers outside of healthcare, and a paucity of options for continuity of care. All of this lays an additional burden on those affected by stroke, leaving them frustrated and under emotional strain [27].

In addition to the impact of stroke on individuals, societies are faced with the economic burden of stroke [28]. Healthcare utilization, informal care provision, and the loss of productivity in the workforce contribute to these rising costs [21, 29, 30]. A recent study analyzing stroke-related costs for 32 European countries estimates that total costs added up to €60 billion in 2017. This includes €27 billion (45%) incurred by healthcare systems, €5 billion (8%) incurred by social care systems, an estimated €16 billion (27%) for informal care costs, and €13 billion (20%) owed to lost productivity due to early death or absence from work [19]. While lower total costs to the healthcare system have been reported for the United States for 2014/2015 [31], per capita healthcare-related spending on stroke was higher in the USA than in Europe [19]. Stroke-related healthcare costs per stroke survivor were similar in the USA and Europe [19].

3 Stroke Prevention: A Public Health Priority

As the global stroke burden increases, researchers and policymakers call for more efficient stroke prevention and management strategies and improved access to stroke services [16, 17, 32, 33]. In 2006, the World Health Organization (WHO) highlighted neurological disorders, including stroke, as a public health priority [34]. With its Global Status Report on Noncommunicable Diseases 2014, WHO aimed to unite and support nations in the fight against stroke and vascular diseases [32, 33].

There is common agreement that prevention is one of the most promising strategies, if not the most promising, to reduce the burden of stroke [16, 35,36,37]. It is well established that there are non-modifiable (e.g., sex, gender, genetics) and modifiable (e.g., smoking, physical inactivity) risk factors for stroke [38, 39]. Modifiable risk factors are the obvious targets of stroke prevention efforts. In an international case-control study, researchers found that ten risk factors (history of hypertension, current smoking, waist-to-hip ratio, diet risk score, regular physical activity, diabetes mellitus, binge alcohol consumption, psychosocial stress and depression, cardiac diseases, and ratio of apolipoproteins B to A1) were associated with 90% of the risk of stroke [39]. The authors concluded that lifestyle interventions targeting blood pressure reduction, smoking cessation, and the promotion of physical activity and a healthy diet could help to significantly reduce the burden of stroke.
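A figure such as "90% of the risk" is a population attributable risk: the share of cases that would not occur if the exposure were removed. As a rough illustration, the sketch below applies Levin's formula for a single exposure. The prevalence and relative risk are hypothetical values chosen for the example; they are not taken from [39], which combined several risk factors and worked from case-control data.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: fraction of cases attributable to a single exposure.

    prevalence     -- proportion of the population exposed (0..1)
    relative_risk  -- risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs for illustration only (not values reported in the chapter).
paf = population_attributable_fraction(prevalence=0.30, relative_risk=2.5)
print(f"Attributable fraction: {paf:.1%}")
```

The formula makes explicit that a common exposure with a moderate relative risk can account for a larger share of cases than a rare exposure with a high relative risk.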

There are two main approaches in stroke prevention [40]: population-wide prevention strategies and prevention strategies that target high-risk individuals. Population-wide strategies aim at modifying behavioral and lifestyle risk factors in the entire population to promote health maintenance [41]. In doing so, they can also contribute to preventing other diseases and chronic conditions (e.g., hypertension and diabetes mellitus) that constitute known stroke risk factors [14]. Recent advances in our ability to accurately assess individual risk for cardiovascular diseases have motivated some countries to prioritize risk-based screening approaches to identify individuals at risk [42, 43].

Despite a formal distinction between these two approaches, it is important to note that stroke risk is a continuum with no determined threshold at which certain interventions are automatically indicated. Therefore, it may not be appropriate to categorize individuals into low-, moderate-, and high-risk groups when communicating absolute cardiovascular risk [44]. To effectively reduce stroke incidence and mortality rates, efforts must be undertaken to educate the general population about known behavioral risk factors [14, 43]. In addition, inexpensive screening strategies should be adopted to assist clinicians in identifying and protecting high-risk individuals [14, 43].

4 The Advent of Data-Driven Risk Prediction Models

Early prediction of stroke risk is the cornerstone of stroke prevention [45]. Identifying individuals who could benefit most from specific therapeutics or interventions helps them get the care they need and simultaneously helps avoid unnecessary treatments for others [10, 46, 47]. To date, several well-established statistically derived risk prediction models have been developed to provide long-term risk prediction [42, 45, 48, 49]. Clinicians commonly rely on these models to assess long-term risk, because the models provide parameters that are easy to interpret, such as odds ratios, relative risks, and hazard ratios [50]. However, these traditional models are subject to several limitations. They can, for example, only include a small number of risk factors (predictors) and generally do not include image-based morphological characteristics [13, 50, 51], behavioral risk factors (except smoking), or independent genetic factors [43]. Moreover, traditional approaches rely on certain assumptions of linearity, thus forcing models to behave in a certain way [51]. Often, traditional models are not generalizable across different populations due to the specific characteristics of the cohorts from which they were derived [13]. This may lead clinicians to over- or underestimate risk for their patients [52].
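To make the interpretability and the linearity assumption concrete, the following minimal sketch shows how a logistic regression coefficient translates into an odds ratio. The coefficient is a hypothetical value for systolic blood pressure and does not come from any of the cited models.

```python
import math


def odds_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Odds ratio implied by a logistic regression coefficient
    for a change of `delta` units in the predictor."""
    return math.exp(coefficient * delta)


# Hypothetical coefficient: change in log-odds per mmHg of systolic blood pressure.
beta_sbp = 0.02

or_10 = odds_ratio(beta_sbp, 10)  # effect of +10 mmHg
or_20 = odds_ratio(beta_sbp, 20)  # effect of +20 mmHg

print(f"OR for +10 mmHg: {or_10:.3f}")
print(f"OR for +20 mmHg: {or_20:.3f}")
```

Because the model is linear on the log-odds scale, the odds ratio for +20 mmHg is exactly the square of the odds ratio for +10 mmHg, whatever the starting blood pressure. This is what makes the parameter easy to read, and also what "forces the model to behave in a certain way."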

Researchers are now trying to use ML in cardiovascular diseases and stroke risk assessment to overcome some of the challenges associated with traditional risk prediction models. ML methods use computational algorithms to relate all or some predictor variables of a given set to an outcome variable [50]. Classification and regression are the two primary tasks performed by ML-based algorithms [13]. Put simply, classification tasks categorize input data into predefined labels or outcomes (e.g., event or no event), whereas regression tasks predict some real-valued output (e.g., real-valued percentage risk between 0% and 100%). Despite various commonalities, ML differs from traditional statistical approaches in some aspects [53,54,55]. Contrary to classical statistics, ML is a data-driven approach that does not rely on a predefined model and assumption of data normality [53, 56]. Moreover, unlike traditional statistics which are focused on the “typical patient,” ML is capable of making inferences at the individual level, taking into account individual differences in the data [53]. ML is also inherently a multivariate approach that can be used to analyze complex and heterogeneous kinds of data and incorporate them into risk prediction models, making it a promising solution for stroke risk prediction [53, 54, 57].
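The distinction between the two tasks can be sketched with a toy risk model. The intercept, weights, feature names, and decision threshold below are all hypothetical and serve only to show the two kinds of output; they are not a validated stroke model.

```python
import math

# Hypothetical parameters for illustration only.
INTERCEPT = -6.0
WEIGHTS = {"age_decades": 0.6, "smoker": 0.7, "hypertension": 0.9}


def predicted_risk(features: dict) -> float:
    """Regression-style output: a real-valued risk between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def predicted_label(features: dict, threshold: float = 0.1) -> str:
    """Classification-style output: one of two predefined labels."""
    return "event" if predicted_risk(features) >= threshold else "no event"


patient = {"age_decades": 6.5, "smoker": 1, "hypertension": 1}
print(f"Predicted risk: {predicted_risk(patient):.1%}")
print(f"Predicted label: {predicted_label(patient)}")
```

The label is derived from the same underlying score; the choice of threshold is a separate clinical decision and is not learned from the data in this sketch.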

Studies investigating the use of these techniques in cardiovascular diseases and stroke prediction indicate that ML-based approaches can boost prediction accuracy. A recently published review found that the most common ML-based algorithms used in cardiovascular risk assessment are support vector machines, artificial neural networks, linear and logistic regression, and tree-based algorithms, such as random forests and gradient tree boosting [13]. In their review, Jamthikar et al. further showed that ML-based algorithms performed better compared to traditional regression-based methods for risk assessment, and that including both image-based features and conventional cardiovascular risk factors drives prediction accuracy. Indeed, imaging plays a pivotal role in cardiovascular and stroke risk detection. Ultrasound, in particular carotid ultrasound screening, can also easily be performed in routine clinical practice—unlike other non-invasive techniques, such as computed tomography or magnetic resonance imaging [47]—making ultrasound an invaluable tool for stroke prevention. In line with these findings, Ambale-Venkatesh et al. [58] emphasized the importance of subclinical disease markers obtained from imaging, electrocardiography, and blood tests. The authors found that ML in conjunction with deep phenotyping (i.e., multiple evaluations of different aspects of a specific disease process) enhanced prediction accuracy of cardiovascular events compared to traditional risk scores.

Several other studies provide similar evidence. In a prospective cohort study using routine clinical data, for example, researchers compared four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) to an established algorithm (American College of Cardiology guidelines) for first cardiovascular event prediction over 10 years [46]. Their findings show that ML techniques outperformed the established algorithm, leading to a significantly more accurate risk prediction. Similarly, a team of researchers demonstrated that their hybrid ML approach to stroke prediction significantly reduced the false-negative rate in comparison to conventional approaches, while the overall error increased only slightly [59]. In addition to increasing prediction accuracy, authors also recognize the potential of ML-based approaches to help identify new potential risk factors and to generate a better understanding of the role of novel biomarkers [59, 60].
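The trade-off between false-negative rate and overall error can be illustrated with a small confusion-matrix calculation. The labels and predictions below are invented for the example and do not reproduce the data or the method of [59].

```python
def error_rates(y_true, y_pred):
    """Return (false_negative_rate, overall_error) for binary labels (1 = event)."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    false_negative_rate = fn / (tp + fn) if (tp + fn) else 0.0
    overall_error = (fp + fn) / len(y_true)
    return false_negative_rate, overall_error


# Invented example: 10 individuals, 4 of whom had an event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
conservative_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses most events
sensitive_model = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]     # catches more events, more false alarms

fnr_a, err_a = error_rates(y_true, conservative_model)
fnr_b, err_b = error_rates(y_true, sensitive_model)
print(f"conservative: FNR={fnr_a:.2f}, error={err_a:.2f}")
print(f"sensitive:    FNR={fnr_b:.2f}, error={err_b:.2f}")
```

In a prevention setting, a missed high-risk individual is usually costlier than a false alarm, so a lower false-negative rate can justify a somewhat higher overall error.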

5 From Data-Driven Risk Prediction to Stroke Prevention

Accurate risk prediction allows clinicians and patients to act. Enabled by advances in AI technologies that can analyze vast volumes of health data in an efficient and accurate manner [4], precision medicine aims to provide treatment and prevention tailored to individuals’ variability in genetics, environment, and lifestyle [1]. At present, doctors recommend lifestyle changes to their patients, advising them to change known, modifiable risk factors to prevent stroke. Yet, their advice often goes unheeded. We should eat healthy, refrain from smoking and eschew excessive alcohol consumption, exercise regularly, stay hydrated, and the list goes on and on. To adhere to all these health-promoting recommendations in a world full of competing priorities, temptation, and imposed restrictions (e.g., financial constraints, poor access) may be too much to ask and simply not a realistic goal for many people. Earlier work has shown that there are incongruities between what people know they should do and their actual health behavior. So even though interventions (e.g., public health campaigns) may help to improve people’s knowledge, these interventions may ultimately fail to induce, and more importantly, sustain behavior change—a phenomenon commonly referred to as the knowledge-behavior gap [61, 62].

Precision medicine is a promising approach to bridge this gap. It enables physicians and researchers to predict more accurately which prevention strategies will be most effective for which groups of people [1]. Understanding their natural predisposition to stroke may, in turn, motivate individuals to take on a more active role in their own health to reduce their individual stroke risk [14, 63]. In this context, the potential of mobile monitoring devices with real-time feedback systems has been highlighted as a tool for stroke prevention [10, 60, 64,65,66,67]. However, despite the promise these novel technologies hold for enabling personalized risk assessment and promoting stroke prevention, achieving stroke prevention via these means will largely depend on patients’ acceptance and uptake of the technology. Tran et al. investigated chronic patients’ perceptions of wearable biometric monitoring devices and AI systems that enable remote measurement and analysis of patient data in real-time [68]. In addition to capturing the perceived benefits and dangers of using these new technologies, the authors also assessed patients’ readiness for using them. Their findings indicated that only half of the patients who participated in the study viewed digital tools and AI in healthcare as an opportunity, while 11% even considered them a danger, fearing that these will lead to the replacement of humans. In light of these findings, it is not surprising that 35% of patients indicated that they would refuse to integrate such devices into their care. More research is needed to better understand individuals’ underlying motivations and fears that influence their attitudes toward the use of mobile monitoring devices and AI in healthcare. It is currently also unclear how well these new tools will be received by healthcare professionals. So, while AI-powered technologies are evolving rapidly, providing unprecedented opportunities for precision medicine in stroke prevention, the integration of these technologies into clinical practice raises several questions.

A project that will shed light on some of these questions is PRECISE4Q, a project funded under the European Union’s Horizon 2020 Research and Innovation Program [69,70,71]. PRECISE4Q aims to identify and quantify risk factors and individual risk factor patterns. To do so, it combines heterogeneous data from a variety of sources, including large retrospective longitudinal stroke registry data, biobank data, and insurance data. What distinguishes PRECISE4Q from many other efforts in the field is its hybrid modeling approach, which combines ML methods and theory-driven (mechanistic modeling) approaches to risk prediction. Within the course of the project, a Digital Stroke Patient Platform will be established to collect and integrate large-scale data sets. This platform will also feature novel hybrid model architectures, structured prediction models, complex deep learning and gradient boosting models, as well as Clinical Decision Support Systems (CDSS) for stroke risk assessment, treatment outcomes, rehabilitation programs, and a socio-economic planning tool. A thorough validation of the models is planned with clinical data generated by prospective clinical studies and retrospective analyses of health registries, cohort studies, health insurance data, and electronic health records. The CDSS envisioned by PRECISE4Q will allow clinicians to simulate how an individual’s stroke risk will evolve and change under different circumstances over time. In other words, clinicians will be able to simulate how different risk factors (e.g., smoking) will contribute to disease occurrence and how the individual will respond to different possible interventions (e.g., lifestyle intervention, medication). This will assist them in providing individuals with tailored recommendations based on their natural predisposition. For individuals, this means that they will learn not only their individual stroke risk but also what they can do to reduce this risk.

Another promising avenue for future research is the use of natural language processing to automatically extract information on lifestyle modification assessment and/or advice in clinical practice from electronic health records [72,73,74]. Such analyses can provide an objective evaluation of current clinical practice and improve our understanding of the timing of lifestyle modification and patient, clinic, and provider characteristics that are associated with or predictive of lifestyle modification documentation [73]. Understanding how and when clinicians assess lifestyle modification and provide advice to patients holds important implications for the development of prevention strategies. These insights can inform the improvement of care delivery and documentation in practice. Combining tools aimed at understanding current clinical practice with sophisticated risk prediction models, such as the ones described earlier, constitutes an opportunity to deepen our understanding of stroke prevention.

6 Technological, Methodological, and Ethical Challenges

Machine learning holds great promise for stroke prevention, yet it is also subject to some challenges and limitations. There are three common areas of challenges that clinicians and researchers should be mindful of as they seek to maximize the advantages of ML in stroke prevention, and in healthcare more generally: (1) challenges in data sourcing; (2) challenges in application development; (3) challenges in deployment in clinical practice [75]. Given that patients’ health and well-being are at stake, it is of critical importance to investigate the technological and methodological challenges that arise at each stage and to consider their potential real-life consequences. It is also important to note that challenges occurring at one stage may have consequences for the subsequent stages. Challenges and limitations at the stage of data sourcing, for example, inevitably affect application development and deployment in clinical practice.

6.1 Data Sourcing

High-quality big data is key to accurate predictions. To develop ML systems that can be deployed in clinical practice, a continuous supply of large datasets is needed to train, validate, and improve algorithms [3, 76]. Yet, inadequate access to well-established patient and population-based datasets constitutes a major challenge for many data scientists and developers working with ML [13]. These professionals lack access to data partly because effective data sharing is currently not sufficiently incentivized by the medical scientific community [3, 10, 13, 77]. International research collaborations can help to mitigate this challenge. In the long run, effective data sharing strategies also need to be in place to facilitate and incentivize data sharing across institutions.

Another challenge to data sourcing relates to data protection and privacy regulations. Personal data are often subject to protective regulations that may impede data sharing. The European General Data Protection Regulation (GDPR), for example, entails a comprehensive set of regulations for the collection, storage, and use of personal information that will affect AI implementation in healthcare in several ways [76, 78]. The GDPR requires that individuals give explicit and informed consent before any organization collects personal data. It also grants individuals the right to track what data organizations are collecting about them, and it empowers them to direct an organization to discard their data. While these regulations rightly aim to protect patient privacy, they of course also impose certain restrictions on researchers and clinicians who seek to utilize these data. At present, the long-term impact of the GDPR and similar regulations on the implementation of AI in healthcare remains to be seen.

Closely related to data sourcing, data harmonization across different sources can also be quite problematic for data scientists. Given that very few studies provide comprehensive datasets for large numbers of participants, collaborative efforts are currently underway in the scientific community to harmonize and synthesize heterogeneous data across studies [79]. However, data harmonization is a time-consuming task that demands significant technological and scientific investments [80, 81].

6.2 Application Development

As outlined in this chapter, there is substantial evidence to suggest that ML-based algorithms can provide robust and accurate models for cardiovascular and stroke risk assessment, and can often outperform traditional regression-based approaches. Yet, there are several potential challenges and pitfalls to be mindful of when it comes to developing applications based on these algorithms. One of the key challenges in application development is algorithmic bias, which leads to systemic and unfair discrimination against certain individuals or groups of individuals [82, 83]. Even if no discrimination is intended, we know that the way data is collected, selected, prepared, and used to train ML-based algorithms can introduce bias [82]. Datasets used to develop stroke risk prediction models may, for example, suffer from missing data, misclassification, and measurement error, which can lead researchers and clinicians to make inaccurate predictions for subgroups of patients [84]. In other words, bias can occur when data sources do not reflect the true epidemiology within a given demographic [75]. As an example, consider that cardiovascular disease is often underdiagnosed in women because their symptoms are described as atypical [85, 86]. Using such data to train ML-based algorithms may further reinforce this trend.

It has also been shown that ML methods perform poorly on imbalanced datasets, as they will be biased towards the majority group [59, 87, 88]. In other words, insufficient training samples and imbalanced class distribution will limit predictive performance in cases of rare occurrences [89]. In the case of stroke risk prediction, this may, for instance, pose limitations when we aim to develop predictive models for younger populations since the vast majority of available records likely describe older age groups [89]. Even though several balancing techniques have been developed, it is still a challenge to detect and address this bias in ML models [88].
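A short sketch shows why imbalance is deceptive and what one of the simplest balancing techniques, random oversampling, does. The dataset is synthetic (95 records without an event, 5 with), and the helper function is a hypothetical illustration rather than a technique taken from the cited studies.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Synthetic, heavily imbalanced dataset: record ids 0..94 have no event, 95..99 do.
samples = list(range(100))
labels = [0] * 95 + [1] * 5

# A model that always predicts "no event" looks accurate but finds no events at all.
majority_accuracy = labels.count(0) / len(labels)

balanced_samples, balanced_labels = random_oversample(samples, labels)
counts = Counter(balanced_labels)
print(f"Accuracy of always predicting 'no event': {majority_accuracy:.0%}")
print(f"Class counts after oversampling: {dict(counts)}")
```

Oversampling only repeats existing minority records; it adds no new information about rare cases. It should also be applied to the training split only, so that duplicated records do not leak into the evaluation data.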

But what does persistent algorithmic bias mean in practice? Algorithmic bias can cause enormous harm and contribute to increasing existing health inequalities in the real world [83]. A prominent example is the case of racial bias in commercial algorithms used in the U.S. healthcare system. In their 2019 study, Obermeyer et al. [90] found evidence indicating that a widely used algorithm was significantly biased against black patients. Due to this racial bias, a significantly lower number of black patients were identified for extra care. The authors demonstrated that bias occurred because the algorithm predicted healthcare costs rather than illness, not accounting for the fact that unequal access to care means that healthcare spending is lower for black patients than for white patients. The study carried out by Obermeyer et al. [90] serves as a striking example of how ML-based algorithms can reinforce existing inequalities and cause harm. It also raises the question: how many biased algorithms are still out there operating day in, day out? Importantly, this kind of bias is by no means limited to the US or to US race demographics. Similar problems can just as well be embedded in European algorithms, hiding similar (or different) kinds of social disparity.
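The mechanism, in which a proxy target (cost) stands in for the quantity of interest (illness), can be reproduced with a toy example. The patient records below are invented: both groups have identical illness burden, but group B incurs 30% lower spending at every level of illness. This is an illustration of the mechanism only, not a reconstruction of the algorithm studied in [90].

```python
# Invented records: same illness burden in both groups, lower spending in group B.
patients = [
    {"group": "A", "conditions": 5, "cost": 10000},
    {"group": "A", "conditions": 4, "cost": 8000},
    {"group": "A", "conditions": 3, "cost": 6000},
    {"group": "A", "conditions": 2, "cost": 4000},
    {"group": "B", "conditions": 5, "cost": 7000},
    {"group": "B", "conditions": 4, "cost": 5600},
    {"group": "B", "conditions": 3, "cost": 4200},
    {"group": "B", "conditions": 2, "cost": 2800},
]


def select_for_extra_care(records, key, k):
    """Pick the k records ranked highest on the chosen target variable."""
    return sorted(records, key=lambda p: p[key], reverse=True)[:k]


def share_of_group(selected, group):
    """Fraction of the selected records that belong to the given group."""
    return sum(p["group"] == group for p in selected) / len(selected)


by_cost = select_for_extra_care(patients, "cost", k=4)
by_need = select_for_extra_care(patients, "conditions", k=4)

print(f"Group B share when ranking by cost:    {share_of_group(by_cost, 'B'):.0%}")
print(f"Group B share when ranking by illness: {share_of_group(by_need, 'B'):.0%}")
```

Nothing in the ranking step refers to group membership; the disparity enters entirely through the choice of target variable.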

6.3 Deployment in Clinical Practice

Finally, the practical implementation of AI technologies in healthcare is not without its own challenges [76, 91]. Trust plays a fundamental role in the implementation process. To obtain acceptance, AI-powered tools must first gain healthcare providers’ and patients’ trust [92]. As a first important step to gaining trust, tools should comply with existing data protection requirements and be transparent as to how outcomes and recommendations are derived [75]. However, at present, many ML models are considered black boxes that do not explain how their predictions are derived in a way that humans can grasp [93]. Unlike well-established regression-based methods where a clear relationship can be observed between the input variables and the output variable, the internal workings of ML algorithms are not easy to interpret for most clinicians [10]. As a result, clinicians may be wary of ML-based algorithms and reluctant to adopt them in practice [13]. This may also have to do with the fact that clinicians owe their patients explanations as to how certain recommendations were derived. Patients may, in turn, be more likely to follow recommendations regarding stroke prevention if they receive a clear explanation of why certain prevention measures (e.g., exercise regime, medication) are preferable over others in their particular situation. Even though concepts like AI explainability, interpretability, and transparency have gained traction in the scientific community, there is a need for strengthening cooperation among medical practitioners and data scientists to tackle these issues in a collaborative manner [13].
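One family of explainability techniques treats the model as a black box and probes it from the outside. The sketch below uses permutation importance, a technique chosen here for illustration and not discussed in the chapter: each feature is shuffled in turn, and the resulting drop in accuracy indicates how much the model relies on it. The model, feature names, and data are hypothetical stand-ins.

```python
import random


def black_box_model(row):
    """Stand-in for an opaque model; callers are assumed not to see this rule."""
    return 1 if row["systolic_bp"] >= 140 else 0


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy when the values of one feature are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / repeats


# Hypothetical data: one clinically plausible feature and one irrelevant feature.
rows = [{"systolic_bp": 120 + 5 * i, "shoe_size": 38 + i % 8} for i in range(12)]
labels = [black_box_model(r) for r in rows]

importance = {
    feature: permutation_importance(black_box_model, rows, labels, feature)
    for feature in ("systolic_bp", "shoe_size")
}
for feature, value in importance.items():
    print(f"{feature}: mean accuracy drop {value:.2f}")
```

Such scores describe what the model depends on, not why a given recommendation is right for an individual patient, so they complement rather than replace a clinician's explanation.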

There is also uncertainty regarding who can be held liable for adverse events that result from the use of ML-based algorithms. This uncertainty may, in turn, hamper trust and impede the adoption of these technologies in practice [75]. This point is also linked to clinical validation and efficacy. To foster trust in ML-based algorithms, data scientists and researchers have to show that their algorithms yield accurate predictions and that they can be integrated into clinical practice securely and efficiently for the benefit of patients [10]. In the case of stroke risk prediction and prevention, this means that novel ML-based approaches will have to compete against established models to win over clinicians’ and patients’ trust. Clinicians and patients, in turn, will have to exercise good judgment about what and whom to trust.

7 Conclusion

Novel ML-driven approaches to stroke risk prediction allow researchers to overcome some of the challenges frequently associated with traditional risk prediction models. Capitalizing on the advantages of ML, physicians and researchers will also be able to predict more accurately which type of interventions will be most effective for which groups of people. This will, in turn, help them to provide patients with tailored recommendations based on their natural predisposition, empowering them to reduce their individual risk of suffering a stroke. Yet, while ML methods offer unprecedented opportunities for precision medicine in stroke prevention, several technological and methodological challenges remain. As outlined in this chapter, challenges can be grouped into three broad categories: (1) challenges in data sourcing, (2) challenges in application development, and (3) challenges in deployment in clinical practice.

Having identified some of the opportunities and challenges of machine learning in stroke risk prediction and prevention, it is time to ask ourselves what impact these dynamics will have on individuals and the delivery of care, more generally. Even though it will certainly take some time before ML-based tools can (at least partially) replace established approaches for stroke risk assessment and prevention, we should already prepare for the questions that will arise as these applications are broadly adopted in practice: how will they impact the doctor-patient relationship? How will they affect public trust in the healthcare system? As great strides are made in precision medicine for stroke, how can we ensure everyone will benefit from these gains—what about low- to middle-income countries where stroke incidence rates exceed those observed in high-income countries? What about individuals who refuse to have their data collected and analyzed? These and several other questions raise important ethical concerns that require further investigation. Only by committing to ethical conduct, methodological rigor, and patient safety will we harness the full potential of data-driven predictive modeling in stroke.

Funding

This work was supported by funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 777107 (PRECISE4Q).