Importance of knowing risk for future events
The ability to accurately predict the risk of clinical events in patients with HF provides important information to providers and patients alike. Determining the risk of an event helps identify the patients who are most (or least) likely to benefit from targeted interventions to prevent future events. Assessment of a patient’s risk is particularly valuable when the proposed treatment is costly or associated with serious side effects. When integrated with clinical, social, and other information, knowledge of a patient’s risk of future events gives providers a framework for therapeutic recommendations and helps patients and their families plan for the future.
Approaches to predicting risk
Risk of future events can be determined in a variety of ways, including assessment of patient characteristics, clinical status, test results, or biomarkers. Risk scores that combine information from several of these domains using standard statistical methods have also been developed to predict risk in patients with HF [8,9,10,11,12,13]. These risk scores, however, demonstrate only modest predictive power, especially when applied to populations other than the one used for their derivation [14,15,16,17]. A recent analysis of several commonly used risk scores, including the CHARM (Candesartan in Heart Failure-Assessment of Reduction in Mortality), GISSI-HF (Gruppo Italiano per lo Studio della Streptochinasi nell’Infarto Miocardico-Heart Failure), MAGGIC (Meta-analysis Global Group in Chronic Heart Failure), and SHFM (Seattle Heart Failure Model) scores, in a large European registry of patients with HF found that all of them had only modest accuracy in predicting risk of death at 1 and 2 years [18]. The authors concluded that the limited accuracy of these scores makes physicians reluctant to use them in clinical practice and that more precise predictive tools are needed.
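Most of the studies discussed here report discrimination as an AUC (equivalently, a C statistic for a binary outcome): the probability that a randomly chosen patient who had the event was assigned a higher risk than a randomly chosen patient who did not. As a minimal sketch of that definition, the Python function below counts correctly ordered case-control pairs, with ties counted as one half; the function name and the toy data are illustrative only and do not come from any cited study.

```python
from itertools import product


def auc(scores, outcomes):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    scores:   predicted risk for each patient (higher means riskier)
    outcomes: 1 if the event (e.g., death within 1 year) occurred, else 0
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the event")
    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0  # case correctly ranked above the control
        elif case_score == control_score:
            concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(controls))


# Toy data (made up): three patients who died, three who survived.
risk = [0.9, 0.4, 0.7, 0.3, 0.5, 0.2]
died = [1, 1, 1, 0, 0, 0]
print(auc(risk, died))  # 8 of 9 pairs ordered correctly -> 0.888...
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported for the HF risk scores in this review should be read.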
Machine learning for predicting incident HF risk
Machine learning techniques have been used to predict cardiovascular (CV) events, including the risk of incident HF. In a study of participants in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort, Ambale-Venkatesh et al. used random survival forests (RSF) to predict six CV outcomes, including new-onset HF [19]. They identified the 20 variables most predictive of each outcome from a total of 735 variables derived from imaging studies, health questionnaires, and biomarker analyses. Although the model predicted incident HF with an AUC of 0.84, this was only a modest improvement over the MESA-HF score (AUC 0.80). A limitation of this model is that its top predictive variables included several laboratory values that are not widely available, e.g., tumor necrosis factor alpha soluble receptor and interleukin-2 soluble receptor. The model has not yet been validated in external populations, and its generalizability is questionable. Segar et al. used a similar RSF technique to identify predictors of incident HF among patients with type 2 diabetes mellitus (DM) in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial [20]. The RSF models demonstrated somewhat better discrimination than the Cox-based method (C-index 0.77 [95% CI 0.75–0.80] vs. 0.73 [0.70–0.76], respectively). From the identified predictors, they created an integer-based risk score for 5-year HF incidence, the WATCH-DM (Weight [BMI], Age, hyperTension, Creatinine, HDL-C, Diabetes control [fasting plasma glucose], QRS Duration, MI, and CABG) risk score. Each 1-unit increment in the score is associated with a 24% higher relative risk of HF within 5 years, and the score can easily be applied in clinical practice. Both the risk score and the RSF-based prediction model were externally validated in a cohort of individuals with DM from the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) [21].
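Because the WATCH-DM score is reported as a per-unit relative risk, comparing two patients requires compounding that effect over the score difference. The sketch below assumes the 24% increase applies multiplicatively and uniformly across the whole score range (a log-linear relationship, as in a proportional hazards model); the actual WATCH-DM point assignments are not reproduced, and the patient scores are hypothetical.

```python
PER_UNIT_RR = 1.24  # 24% higher relative risk per 1-unit score increment (from the text)


def relative_risk(score_high: int, score_low: int, per_unit_rr: float = PER_UNIT_RR) -> float:
    """Relative risk of 5-year incident HF between two WATCH-DM scores.

    Assumption: the per-unit effect compounds multiplicatively (log-linear),
    so a difference of k points corresponds to per_unit_rr ** k.
    """
    return per_unit_rr ** (score_high - score_low)


# Hypothetical patients scoring 10 and 5 points (a 5-point difference).
print(round(relative_risk(10, 5), 2))  # -> 2.93
```

Under this assumption a 5-point difference corresponds to roughly a 2.9-fold relative risk, noticeably more than the 2.2-fold (1 + 5 × 0.24) obtained by adding the percentages linearly.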
Machine learning for predicting future events in patients with heart failure
Researchers have also applied ML to predict morbidity and mortality in patients with known HF (Table 1). Although numerous models have been created to predict HF readmissions, most have shown only limited discrimination. Frizzell et al. used several ML algorithms to predict all-cause readmission within 30 days of discharge from an HF hospitalization in patients included in the GWTG-HF registry [22]. All of the models developed in this study showed modest discriminatory power, with C statistics consistently around 0.62. Awan et al. reported a similar AUC of 0.62 using a multi-layer perceptron-based approach to predict 30-day HF readmission or death in patients older than 65 years admitted with HF [25]. Golas et al. demonstrated a modest improvement in risk prediction using several deep learning algorithms in HF patients admitted within a large healthcare system [24]. Their best-performing model, a deep unified network built from more than 3500 variables in the electronic health record (EHR), achieved an AUC of 0.705 for predicting 30-day readmission. To our knowledge, these models have not been externally validated or directly compared with traditional risk assessment tools, so it is not known whether they can be applied to broad HF populations.
Table 1 Characteristics and outcomes in selected studies using machine learning to predict mortality and hospitalization in patients with heart failure

Machine learning models have, overall, been more successful at predicting mortality in patients with HF. In 2018, Ahmad et al. applied random forest modeling to identify predictors of 1-year survival in patients enrolled in the Swedish HF Registry [28]. Their model demonstrated excellent discrimination for survival, with a C-statistic of 0.83. They also used cluster analysis to identify 4 clinically relevant subgroups of HF with marked differences in 1-year survival and response to therapy. Samad et al. used ML techniques to combine a panel of 57 echocardiographic measurements with clinical variables to predict 5-year mortality in all patients undergoing echocardiography, including 15,492 patients with HF. For the patients with HF, their random forest model achieved an AUC of 0.80, a significant improvement over the Seattle HF Model (AUC 0.63) [9, 30]. Using a deep learning algorithm, Kwon et al. also demonstrated better predictive performance than traditional risk scores for in-hospital, 1-year, and 3-year mortality in patients with acute HF [29•]. For in-hospital mortality, their algorithm achieved an AUC of 0.88, compared with 0.728 for the GWTG-HF score [10]. It also predicted long-term mortality well, with AUCs of 0.782 for 1-year and 0.813 for 3-year mortality, outperforming the MAGGIC score, which had AUCs of 0.718 and 0.729 for 1- and 3-year mortality, respectively [13]. Although these models have shown success at risk prediction, they have not been externally validated and have not yet been widely incorporated into clinical care.
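For survival outcomes such as the 1-year survival modeled by Ahmad et al., the C-statistic is typically Harrell's concordance index, which extends the AUC to follow-up data in which some patients are censored before an event can be observed. The following is a minimal sketch assuming right-censored data, with tie handling simplified by skipping pairs that have equal follow-up times; the toy data are made up, and this is not the cited authors' implementation.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time (e.g., days) for each patient
    events: 1 if death was observed at that time, 0 if censored
    risk:   predicted risk (higher means expected to die sooner)

    A pair (i, j) is comparable when patient i died before patient j's
    follow-up ended; it is concordant when i also has the higher risk.
    Pairs with tied follow-up times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (made up): follow-up in days, event flags, predicted risks.
times = [2, 5, 3, 8, 6]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.3, 0.7, 0.2, 0.4]
print(round(harrell_c_index(times, events, risk), 3))  # -> 0.833 (5 of 6 pairs)
```

Censored patients still contribute as the later member of a pair, which is why the index uses more of the data than simply discarding anyone lost to follow-up.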
Jing et al. recently demonstrated that, in addition to risk-stratifying patients, ML can be used to identify the individuals most likely to benefit from evidence-based therapies [31]. Using EHR data, they trained ML models to predict 1-year all-cause mortality in 26,971 patients with HF. Model inputs included clinical variables, diagnostic codes, electrocardiographic and echocardiographic measurements, and 8 evidence-based “care gaps”: flu vaccine, blood pressure <130/80 mmHg, A1c <8%, cardiac resynchronization therapy, and active medications (angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker/angiotensin receptor-neprilysin inhibitor, aldosterone receptor antagonist, hydralazine, and beta-blocker). Their best-performing model achieved an AUC of 0.77 for 1-year all-cause mortality, superior to the Seattle HF Model (AUC 0.57) [9]. Of the 13,238 living patients, 2844 (21.5%) were predicted to die within 1 year based on the estimated mortality rate. Simulated closure of the 8 care gaps in these patients produced a 1.7% absolute reduction in the predicted mortality rate, with 231 additional patients predicted to survive beyond 1 year. These findings require prospective evaluation and external validation, but they are promising and highlight the potential of ML methods to guide clinical action and to identify the patients most likely to benefit from optimization of evidence-based therapies.
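The care-gap simulation can be thought of as counterfactual scoring: the trained model is re-run with each modifiable input set to its guideline-concordant value, and patients whose predicted outcome changes are counted. The sketch below illustrates that idea with a toy logistic model; the `Patient` fields, the gap names, the log-odds values in `GAP_LOGIT`, and the 0.5 decision threshold are all hypothetical and are not the models or thresholds used by Jing et al.

```python
import math
from dataclasses import dataclass, replace

# Hypothetical log-odds added to 1-year mortality for each open care gap
# (illustrative values, not estimates from any study).
GAP_LOGIT = {"flu_vaccine": 0.2, "bp_control": 0.3, "beta_blocker": 0.5}


@dataclass(frozen=True)
class Patient:
    baseline_logit: float  # hypothetical: contribution of all non-modifiable inputs
    gaps: frozenset        # names of care gaps that are currently open


def predicted_risk(p: Patient) -> float:
    """1-year mortality probability from a toy logistic model."""
    logit = p.baseline_logit + sum(GAP_LOGIT[g] for g in p.gaps)
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_gap_closure(patients, threshold=0.5):
    """Count predicted deaths before and after closing every open care gap."""
    before = sum(predicted_risk(p) >= threshold for p in patients)
    closed = [replace(p, gaps=frozenset()) for p in patients]  # counterfactual inputs
    after = sum(predicted_risk(p) >= threshold for p in closed)
    return before, after


cohort = [
    Patient(-0.3, frozenset({"beta_blocker"})),
    Patient(1.0, frozenset({"flu_vaccine"})),
    Patient(-2.0, frozenset()),
    Patient(-0.1, frozenset({"flu_vaccine", "bp_control"})),
]
print(simulate_gap_closure(cohort))  # -> (3, 1)
```

In the toy cohort, closing all gaps moves two of the three predicted deaths below the threshold. Estimates of this kind treat the modeled associations as if they were causal effects, which is one reason prospective evaluation is needed.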
Derivation and validation of MARKER-HF
The Machine learning Assessment of RisK and EaRly mortality in Heart Failure (MARKER-HF) risk score for predicting mortality in patients with HF was developed using non-parametric methods that incorporate interactions between variables with prognostic value [32]. It was derived and internally validated in patients identified at the first mention of an HF diagnosis, in either the in-patient or out-patient setting, in the University of California, San Diego (UCSD) Healthcare System EHR. Patients who met entry criteria were divided into training and validation cohorts. MARKER-HF was constructed in the training cohort using a boosted decision tree model trained to discriminate between patients at the extremes of risk of death: patients who died within 90 days were considered “high risk,” and those known to be alive after 800 days were designated “low risk.” The model was built from complete blood count, comprehensive metabolic panel, vital sign, electrocardiogram, and echocardiogram data, all obtained within 7 days of the patient’s index HF event. The number of variables was chosen to balance inclusiveness, so that patients with missing data would not be lost, against the need to minimize overfitting, which could reduce the accuracy of the score in other populations. Ultimately, a composite of eight variables (diastolic blood pressure, creatinine, blood urea nitrogen, hemoglobin, white blood cell count, platelets, albumin, and red blood cell distribution width) that discriminated with a high degree of accuracy between low- and high-risk patients was identified. Predictive accuracy was demonstrated across the entire spectrum of risk and was confirmed in the validation cohort. Comparison of MARKER-HF predicted survival in the training and validation cohorts is depicted in Fig. 2, Panel A. No evidence of over-training was detected.
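The extreme-risk labeling used to train MARKER-HF can be sketched as a small function that decides which patients enter the training set and with which class. The 90-day, 800-day, and 7-day cutoffs come from the description above; the `Record` fields, the handling of boundary days, and the assumption that measurements may fall up to 7 days on either side of the index event are illustrative choices, and the boosted decision tree itself is not shown.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_RISK_DAYS = 90   # died within 90 days of the index HF event -> "high risk"
LOW_RISK_DAYS = 800   # known alive after 800 days -> "low risk"
MAX_LAG_DAYS = 7      # inputs must be measured within 7 days of the index event


@dataclass
class Record:
    days_to_death: Optional[int]  # None if no death recorded (hypothetical field)
    days_followed: int            # last day known alive, used when no death recorded
    lab_lag_days: int             # days between index event and measurements


def training_label(r: Record) -> Optional[int]:
    """Return 1 (high risk), 0 (low risk), or None (excluded from training)."""
    if abs(r.lab_lag_days) > MAX_LAG_DAYS:
        return None  # measurements not close enough to the index event
    if r.days_to_death is not None and r.days_to_death <= HIGH_RISK_DAYS:
        return 1
    last_alive = r.days_to_death if r.days_to_death is not None else r.days_followed
    if last_alive > LOW_RISK_DAYS:
        return 0
    return None  # intermediate outcome or insufficient follow-up


records = [
    Record(30, 30, 2),       # early death
    Record(None, 1000, 1),   # alive at day 1000
    Record(400, 400, 0),     # intermediate outcome
    Record(None, 500, 3),    # censored too early to call low risk
    Record(20, 20, 10),      # labs drawn too late
    Record(900, 900, 0),     # died, but only after day 800
]
print([training_label(r) for r in records])  # -> [1, 0, None, None, None, 0]
```

Patients with intermediate outcomes are excluded only from training; the fitted model still produces a score for every patient, which is how accuracy can then be assessed across the full spectrum of risk.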
The ability of a risk score to maintain its accuracy in clinically relevant subgroups of the HF population and in other independent populations is essential if it is to be widely used in clinical practice. Consequently, we assessed the predictive accuracy of MARKER-HF in subgroups defined by sex, race, in-patient vs. out-patient status, and acuity at the time of identification (determined by whether the index diagnosis was HF or pulmonary edema) and found that it performed equally well in all subgroups. External validation demonstrated that MARKER-HF maintained its predictive accuracy in populations followed in the University of California, San Francisco (UCSF) Healthcare System and in the European BIOSTAT-CHF Registry [32] (Fig. 2, Panel B).
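A subgroup analysis of this kind amounts to computing the same discrimination metric separately within each stratum and checking that it does not degrade. A minimal sketch with made-up scores and hypothetical in-patient/out-patient labels follows; it illustrates the idea only and is not the published analysis code.

```python
from collections import defaultdict


def auc(scores, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        return None  # AUC is undefined without both outcome classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(scores, outcomes, groups):
    """Compute AUC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, outcomes, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}


# Toy data (made up) with a hypothetical care-setting label per patient.
scores = [0.8, 0.3, 0.6, 0.2, 0.9, 0.4, 0.7, 0.5]
outcomes = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["inpatient"] * 4 + ["outpatient"] * 4
print(auc_by_subgroup(scores, outcomes, groups))
# -> {'inpatient': 1.0, 'outpatient': 0.75}
```

Small subgroups give noisy AUC estimates, so in practice each subgroup value would be reported with a confidence interval before concluding that performance is equivalent.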
Finally, to determine whether MARKER-HF was superior to other predictors of risk, we compared its performance with that of NT-proBNP, a well-validated HF biomarker, and with other risk scores used to predict mortality. Although MARKER-HF scores tracked reasonably well with NT-proBNP levels, MARKER-HF proved a much more reliable predictor of mortality than the biomarker, which had an AUC of 0.69 (Fig. 2, Panel C). NT-proBNP had been excluded as a covariate during the derivation of MARKER-HF because of its low availability (~50%). When it was added to the eight MARKER-HF variables, predictive accuracy did not improve significantly. MARKER-HF was also superior to the Intermountain Risk Score (IMRS) [7], the Get With the Guidelines-HF (GWTG-HF) score [11], and the Acute Decompensated Heart Failure Registry (ADHERE) risk score [8] in predicting mortality risk in the UCSD, UCSF, and BIOSTAT-CHF populations.
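Whether adding a variable such as NT-proBNP significantly improves a score, or whether one score outperforms another, is usually judged by comparing AUCs on the same patients. The text does not state which test was used (DeLong's method is a common choice); as a generic alternative, the sketch below computes a paired percentile bootstrap confidence interval for the AUC difference. The function name, the 2000 resamples, the fixed seed, and the toy data are assumptions for illustration.

```python
import random


def auc(scores, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(score_a, score_b, outcomes, n_boot=2000, seed=0):
    """Paired percentile bootstrap 95% CI for AUC(score_a) - AUC(score_b)."""
    n = len(outcomes)
    if not 0 < sum(outcomes) < n:
        raise ValueError("need both events and non-events")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        y = [outcomes[i] for i in idx]
        if not 0 < sum(y) < n:
            continue  # AUC is undefined without both outcome classes
        a = auc([score_a[i] for i in idx], y)
        b = auc([score_b[i] for i in idx], y)
        diffs.append(a - b)  # same resample for both scores -> paired comparison
    diffs.sort()
    k = int(0.025 * n_boot)
    return diffs[k], diffs[n_boot - 1 - k]


# Toy data (made up): model_score separates perfectly, biomarker only partly.
died = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_score = [0.9, 0.8, 0.85, 0.7, 0.2, 0.3, 0.1, 0.4, 0.35, 0.25]
biomarker = [0.5, 0.2, 0.6, 0.3, 0.55, 0.4, 0.45, 0.1, 0.65, 0.35]
lo, hi = bootstrap_auc_difference(model_score, biomarker, died)
print(round(lo, 2), round(hi, 2))
```

If the 95% interval for the difference excludes zero, the first score discriminates significantly better at roughly the 5% level; an interval that straddles zero indicates no significant difference.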
Future directions
By virtue of their ability to analyze large databases and find unsuspected associations between covariates, ML methods offer a powerful new approach to risk assessment in patients with HF. The ability to accurately predict the risk of future events is critical for providers making recommendations about the advisability of specific therapies and for patients and their families who must plan for the future. In this review, we have outlined criteria for extracting and managing data that we believe are essential in developing ML-based risk scores. In all cases, external validation in independent populations is needed to determine generalizability, and comparison with available tools is required to know whether new scores are superior to other approaches. To be effective, risk scores should require a manageable number of variables that are easily accessible and widely collected in routine patient care, so as not to constrain their use.
Finally, although this review has focused on the use of ML to develop hospitalization and mortality risk scores for patients with HF, these methods can be applied to a variety of other situations in this or other populations. There is a great need for novel approaches to calculating the risk of other adverse events known to occur in patients with HF, such as stroke, atrial fibrillation, and sudden cardiac death, as well as the risk of adverse consequences of specific therapies designed to treat these conditions. Machine learning approaches may also be useful in future HF clinical trials by helping to determine which patients to enroll.