Key points

Using machine learning, we found that bleeding history and socioeconomic status are important features for predicting SSRI-related bleeding. Neural network models incorporating genomic features are planned for future analyses.

Introduction

The advent of modern medicines has improved the lives of millions worldwide. In the United States (US), more than one billion medications are prescribed in a single year [1]. Medications are prescribed with the intent of improving patients’ lives, yet unintended adverse drug events (ADEs) may occur. ADEs cause approximately 1.3 million emergency department visits and 350,000 hospitalizations each year in the US [2]. These hospitalizations are often prolonged and may precipitate secondary health problems [3]. The Agency for Healthcare Research and Quality reported an 11.3% increase in hospitalizations that involved an ADE present upon admission in the US between 2010 and 2014 [4]. The mean cost per hospital stay also increased by 15% for ADEs that were present on admission but doubled if they originated during the hospital stay [4].

Studies have shown that approximately 80% of ADEs are predictable, and more than 40% of ADE-attributable healthcare costs are preventable [5, 6]. The ability to predict and prevent ADEs in clinical practice would minimize harm and the associated financial burden. Traditional efforts have focused mainly on system measures, such as electronic prescribing and automated dispensing, to minimize human error, but these measures do not account for the underlying ADE risk of individual patients [7]. Precision medicine may play a key role in preventing ADEs through a holistic review of patients’ sociodemographic, clinical, and omics profiles to predict the risk of future ADEs at the time of prescribing or admission [8, 9].

A use case of precision medicine in ADE research is the prediction of bleeding events after exposure to selective serotonin reuptake inhibitors (SSRIs), a rare but debilitating side effect that can cause significant morbidity and hospitalization [10, 11]. SSRIs are commonly prescribed to manage psychiatric conditions such as depressive and anxiety disorders across all ages [12], as well as off-label for conditions such as post-stroke recovery [13]. The pharmacologic properties of SSRIs stem from their ability to increase serotonergic activity at neuronal synapses [14]. However, off-target effects have been observed, including reductions in platelet serotonin content of 80–90% with sustained SSRI exposure [15,16,17]. Serotonin changes in the platelet microenvironment are postulated to explain the higher rate of coronary artery events in depressed geriatric patients, the antithrombotic effects of SSRIs, and the increased bleeding risk with SSRI exposure [18, 19]. SSRIs may further compound bleeding risk by increasing gastric acid secretion and inhibiting cytochrome P450 (CYP) enzymes [11, 19], while patient-level differences in CYP-enzyme genetic variants explain interindividual pharmacokinetic differences and bleeding risks [20]. Therefore, in this study, we employed machine learning (ML) techniques to account for these complex relationships in the prediction of SSRI-associated bleeding events, leveraging the large datasets collected by the All of Us (AoU) Research Program for model development and validation [21].

Methods

Data source

The AoU program, a National Institutes of Health (NIH) initiative [22], aims to enhance healthcare by facilitating precision medicine research, recruiting more than one million participants nationwide, and providing researchers with access to participants’ electronic health records (EHR) and survey data to define clinical features and outcomes for prediction model development [23]. The AoU program began in May 2018 and continues to recruit individuals 18 years of age or older across more than 340 recruitment sites in the US [23]. All data (EHR and surveys) are organized with the Observational Medical Outcomes Partnership (OMOP) common data model v5.2 [24]. This study did not require Institutional Review Board approval because the authors had no direct interaction with participants and all data were de-identified by the AoU research team. All researchers must adhere to the AoU Data User Code of Conduct to uphold data privacy and confidentiality.

Study design and sample

Participants who received clopidogrel, warfarin, or an SSRI (citalopram, escitalopram, fluoxetine, fluvoxamine, paroxetine, sertraline, or vortioxetine) were identified from the EHR. Clopidogrel and warfarin were analyzed concurrently with the SSRIs to serve as positive controls. The OMOP concept identifications (IDs) used to identify exposure to these drugs are listed in eTable 1 of the Supplement. We created a total of nine individual drug cohorts and one combined SSRI cohort comprising all patients receiving any SSRI. Each cohort was used to create an independent prediction model for the respective medication (individual SSRIs, all SSRIs combined, clopidogrel, and warfarin). To ensure adequacy of EHR data for analysis, eligible patients were required to have at least one recorded visit to the EHR institution during the 365 days before the index date and at least one visit record during the follow-up period.

  1. Index date: The index date, also known as the cohort entry date, is the first drug exposure date of each medication for the respective drug cohorts. The index date was identified using dispensing and administration records. To reduce the risk of immortal time bias, prescription records were not used to define index dates.

  2. Follow-up period: The follow-up period was defined by continuous records of dispensing, administration, and prescription of the medications of interest. Follow-up continued until the occurrence of a bleeding event or until there was no evidence of medication exposure for ≥ 90 days. For the combined SSRI cohort, SSRI switching served as an additional criterion for determining the follow-up end date. Cohort re-entry was permitted (a minimal sketch of this episode construction follows the list).
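To make the episode-construction logic concrete, the sketch below collapses per-person exposure records into follow-up episodes using a 90-day gap rule with cohort re-entry; the DataFrame and column names (`person_id`, `exposure_date`) are assumptions for illustration, not the exact AoU implementation.

```python
import pandas as pd

GAP_DAYS = 90  # follow-up ends after >= 90 days without evidence of exposure

def build_episodes(exposures: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-person exposure records into follow-up episodes.

    `exposures` is assumed to hold one row per dispensing/administration
    record with columns `person_id` and `exposure_date` (hypothetical names).
    A new episode (cohort re-entry) starts whenever the gap between
    consecutive records reaches GAP_DAYS.
    """
    df = exposures.copy()
    df["exposure_date"] = pd.to_datetime(df["exposure_date"])
    df = df.sort_values(["person_id", "exposure_date"])
    gap = df.groupby("person_id")["exposure_date"].diff().dt.days
    # The first record for a person, or a gap of >= 90 days, opens a new episode.
    df["episode"] = (gap.isna() | (gap >= GAP_DAYS)).cumsum()
    episodes = (
        df.groupby(["person_id", "episode"])
          .agg(index_date=("exposure_date", "min"),      # first exposure = index date
               last_exposure=("exposure_date", "max"))
          .reset_index()
    )
    # Follow-up extends GAP_DAYS beyond the last observed exposure record.
    episodes["followup_end"] = episodes["last_exposure"] + pd.Timedelta(days=GAP_DAYS)
    return episodes
```

In practice, dispensing, administration, and prescription records would be combined before this step, and bleeding events or (for the combined SSRI cohort) SSRI switches occurring before `followup_end` would further truncate follow-up.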

Bleeding event outcome algorithm

Bleeding events were identified during the follow-up period. All healthcare data were stored using standard OMOP concept IDs across the different domains (e.g., SNOMED codes for the “Condition” domain and RxNorm codes for active ingredients in the “Drug” domain). Accordingly, the OMOP concept IDs for bleeding were translated from validated ICD-9-CM and ICD-10-CM bleeding codes [25, 26], excluding trauma-related bleeding events, using the concept set builder toolkit in the Observational Health Data Sciences and Informatics ATLAS program [27] and applying the recommended practices for defining ADEs [28]. The OMOP concept IDs are presented in eTable 2 of the Supplement.
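The ICD-to-OMOP translation can also be approximated outside ATLAS with the standard OMOP vocabulary tables; the sketch below assumes the `concept` and `concept_relationship` tables have been loaded into pandas DataFrames and that a list of validated ICD-10-CM bleeding codes is available (variable names are placeholders).

```python
import pandas as pd

def icd10_to_standard_concepts(icd_codes, concept: pd.DataFrame,
                               concept_relationship: pd.DataFrame) -> pd.DataFrame:
    """Translate ICD-10-CM codes to standard (SNOMED) OMOP concept IDs.

    Relies on the 'Maps to' relationship in the OMOP vocabulary, which links
    source-vocabulary concepts to their standard equivalents.
    """
    source = concept.loc[
        (concept["vocabulary_id"] == "ICD10CM")
        & (concept["concept_code"].isin(icd_codes)),
        ["concept_id", "concept_code"],
    ]
    maps_to = concept_relationship[
        concept_relationship["relationship_id"] == "Maps to"
    ]
    mapped = source.merge(maps_to, left_on="concept_id", right_on="concept_id_1")
    standard = mapped.merge(
        concept, left_on="concept_id_2", right_on="concept_id",
        suffixes=("_source", "_standard"),
    )
    return standard[["concept_code_source", "concept_id_standard", "concept_name"]]
```

Trauma-related concepts would then be removed from the resulting concept set before it is applied to records in the “Condition” domain.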

Features

A total of 88 features were selected according to clinicians’ advice and literature review [29]. We included sociodemographic information, past medical history, substance use behaviors, and concurrent drug use as features in all models. Three additional groups of features, totaling 16 features, were specific to the combined SSRI models: current SSRI use, the SSRI used just before the newly prescribed SSRI, and the number of prior SSRI switches. Features were derived from longitudinal EHR data as well as cross-sectional survey data collected during AoU recruitment. All EHR-derived features, other than concurrent drug use, were determined during the period prior to the index date. Concurrent drug use takes a value between 0 and 1, where 0 indicates no overlap and 1 indicates complete (100%) overlap between the concurrent drug and the studied drug during the follow-up period. The features are listed in Table 1; more detailed information regarding the source of each feature (EHR or survey) and, where applicable, the corresponding OMOP concept IDs is included in eTable 3 of the Supplement.
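As an illustration of the concurrent drug use feature, the fraction of the follow-up period during which a concurrent drug overlaps with the studied drug can be computed as below; the interval boundaries and names are assumptions for illustration rather than the study’s exact implementation.

```python
from datetime import date

def overlap_fraction(followup_start: date, followup_end: date,
                     conc_start: date, conc_end: date) -> float:
    """Fraction of the follow-up period covered by a concurrent drug.

    Returns 0.0 for no overlap and 1.0 when the concurrent drug spans
    the entire follow-up period.
    """
    followup_days = (followup_end - followup_start).days
    if followup_days <= 0:
        return 0.0
    overlap_start = max(followup_start, conc_start)
    overlap_end = min(followup_end, conc_end)
    overlap_days = max(0, (overlap_end - overlap_start).days)
    return overlap_days / followup_days

# Example: a concurrent drug covering half of a 100-day follow-up period.
print(overlap_fraction(date(2020, 1, 1), date(2020, 4, 10),
                       date(2020, 2, 20), date(2020, 6, 1)))  # 0.5
```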

Table 1 The list of a priori selected features and their respective feature clusters

Machine learning approaches

We developed and validated four ML algorithms commonly used in binary classification tasks: logistic regression (LR), decision trees (DT), random forest (RF), and extreme gradient boosting (XGBoost). The selection of the ML algorithms was informed by previous ML-based studies in ADE prediction [30]. LR was included as it is the dominant model used on EHR data for predicting ADEs and in other clinical prediction models [30]. Each dataset was randomly divided into training and test data using a ten-fold stratified cross-validation method. Missing data were imputed using the Scikit-Learn [31] SimpleImputer method, with the mode and median used for categorical and continuous features, respectively. To address concerns about imbalanced datasets, the effectiveness of randomly oversampling the minority class was tested for each dataset and ML model. Descriptions of the ML algorithms are provided in the eMethods of the Supplement.
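A minimal sketch of this training setup is shown below, using scikit-learn, imbalanced-learn, and XGBoost; the synthetic data, hyperparameters, and the treatment of all features as continuous are placeholders rather than the study’s exact configuration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline as ImbPipeline
from xgboost import XGBClassifier

# Placeholder feature matrix and binary bleeding outcome (replace with cohort data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)

# In the study, mode imputation was applied to categorical features and median
# imputation to continuous features; here all columns are treated as continuous.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=200),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, clf in models.items():
    pipe = ImbPipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("oversample", RandomOverSampler(random_state=42)),  # optional minority-class oversampling
        ("model", clf),
    ])
    aucs = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean cross-validated AUC = {aucs.mean():.3f}")
```

Because the oversampler sits inside the pipeline, resampling is applied only to the training folds, leaving each validation fold untouched.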

Prediction performance evaluation

To assess the performance of each prediction model, we used the area under the receiver operating characteristic curve (AUC score) as well as sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and the F1 score. These metrics were assessed at the optimized threshold defined by Youden’s index [32].
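A sketch of the threshold selection and metric calculation is shown below, assuming held-out predicted probabilities `y_prob` and true labels `y_true` (placeholder names):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def youden_threshold_metrics(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Pick the threshold maximizing Youden's J (sensitivity + specificity - 1)
    and report performance metrics at that threshold."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j = tpr - fpr                      # Youden's J statistic at each threshold
    best = np.argmax(j)
    y_pred = (y_prob >= thresholds[best]).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "threshold": thresholds[best],
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        "lr_plus": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_minus": (1 - sens) / spec if spec > 0 else float("inf"),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```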

Feature cluster importance and clinical significance

We calculated feature importance based on a combination of statistical and pharmacological information. When features are correlated with one another, their individual importance estimates can be diluted and overlooked. To reduce the likelihood of this occurrence, we first grouped the features into clusters based on pharmacological and clinical relationships and then interpreted the clinical importance of related features in predicting bleeding events (Table 1). This was accomplished by iteratively removing each cluster individually, with replacement, and quantifying the impact on the AUC score for each ML model. Cluster removals that resulted in a > 0.01 decline in AUC score were classified as important [33]. We defined clinically significant feature clusters using a stricter threshold: a > 0.01 decline in AUC score in at least 3 of the 4 ML models (frequency ≥ 0.75).
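A sketch of this cluster-removal procedure is shown below; the cluster dictionary and the `fit_and_score` helper (returning a cross-validated AUC for a given feature set) are placeholders, not the study’s implementation.

```python
import pandas as pd

AUC_DROP_THRESHOLD = 0.01  # decline in AUC marking a cluster as important

def cluster_importance(X: pd.DataFrame, y, clusters: dict, fit_and_score) -> dict:
    """Quantify each feature cluster's importance by removing it (one cluster
    at a time, with replacement) and measuring the resulting AUC drop.

    `clusters` maps cluster name -> list of column names in X;
    `fit_and_score(X, y)` is a hypothetical helper returning a
    cross-validated AUC for one ML model.
    """
    baseline_auc = fit_and_score(X, y)
    importance = {}
    for name, cols in clusters.items():
        reduced_auc = fit_and_score(X.drop(columns=cols), y)
        drop = baseline_auc - reduced_auc
        importance[name] = {"auc_drop": drop,
                            "important": drop > AUC_DROP_THRESHOLD}
    return importance
```

Running this over all four ML models and counting how often each cluster is flagged (0.25 per model) yields the frequencies summarized in the radar plots described below.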

Statistical analysis

We summarized the total number of participants and bleeding events with counts and percentages as descriptive statistics. For model performance metrics, we focused on reporting the AUC and the Youden’s index-optimized sensitivity and specificity. The importance of each feature cluster was summarized in radar plots based on the frequency (range: 0–1) with which its removal resulted in a > 0.01 decline in AUC score across the four models for each cohort. Data were accessed with Google BigQuery and analyzed using Python version 3.7.12 in an integrated Jupyter Notebook environment. Results were reported in compliance with the AoU Data and Statistics Dissemination Policy, which prohibits the display of participant counts ranging from 1 to 20.
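A minimal matplotlib sketch of such a radar plot is shown below; the cluster names and frequency values are hypothetical placeholders, not study results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical frequencies: fraction of the four ML models in which removing
# the cluster dropped the AUC score by > 0.01.
clusters = ["Bleeding history", "Socioeconomic status", "Antithrombotics",
            "Health literacy", "Substance use"]
freq = [1.0, 0.75, 0.5, 0.25, 0.25]

angles = np.linspace(0, 2 * np.pi, len(clusters), endpoint=False).tolist()
values = freq + freq[:1]    # repeat the first value to close the polygon
angles = angles + angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=1.5)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(clusters)
ax.set_ylim(0, 1)
ax.set_title("Feature cluster importance (frequency across ML models)")
plt.show()
```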

Results

Descriptive statistics

At the time of analysis, there were 329,038 participants in the registered tier AoU dataset version R2021Q3R2, with up to 271,124 participants having both EHR and survey data. We identified 2,159 participants with reliable data for clopidogrel exposure, 1,855 for warfarin, 3,151 for citalopram, 2,597 for escitalopram, 2,719 for fluoxetine, 117 for fluvoxamine, 1,100 for paroxetine, 4,052 for sertraline and 149 for vortioxetine.

The average age at the index date was 49.4 years for the SSRI cohorts, compared with 63.1 years for clopidogrel and 60.2 years for warfarin. More female participants received SSRIs, except for citalopram, which included a much larger proportion of male than female participants (65.1% vs 33.0%). All cohorts comprised a much larger proportion of White participants, ranging from 69.8% (paroxetine) to 81.2% (vortioxetine), than of other races. The descriptive statistics for each cohort are summarized in Table 2.

Table 2 Descriptive statistics of each drug cohort

The proportion of bleeding events after drug exposure was 10.8% for clopidogrel and 15.8% for warfarin. Across individual SSRIs, the percentages of bleeding events ranged from 6.0% in escitalopram to 9.1% in citalopram. When combining all the SSRIs into a single combined SSRI cohort, there were 10,362 participants exposed to at least one of the seven SSRIs, with 9.6% experiencing a bleeding event upon SSRI exposure. These statistics are summarized in Table 3.

Table 3 Cohort size, number of bleeding events, and best model performance metrics for each drug cohort

Model performance

Datasets without feature selection and without oversampling of the minority class were selected as the primary inputs for each of the ML models. A total of 40 models, four for each of the 10 cohorts, were developed. The models for fluvoxamine and vortioxetine were excluded due to the small number (n < 150) of participants in these cohorts relative to the other drugs; nevertheless, these participants were still included in the combined SSRI cohort. Table 3 summarizes the best-performing model with its AUC score and the corresponding Youden’s index-optimized sensitivity and specificity for each drug cohort. The hyperparameters of the best-performing models are summarized in eTable 4 of the Supplement. Figure 1 summarizes the AUC score for each individual drug as well as for the dataset with all SSRIs combined. The AUC scores and other metrics for each ML model and drug, for datasets with feature selection and with oversampling of the minority class, can be found in eTables 5–13 of the Supplement.

Fig. 1

Receiver operating characteristic curves with area under the curve (AUC) scores. A higher AUC score represents better model performance. Baseline characteristics of participants in each cohort served as features for bleeding event prediction with logistic regression (LR), decision tree (DT), random forest (RF) and extreme gradient boosting (XGB) machine learning models

Feature clustering and importance

In total, 15 clusters summarized the 88 features (Table 1). For this analysis, three clusters comprising 16 features (current SSRI use, the SSRI used just before the newly prescribed SSRI, and the number of prior SSRI switches) were not examined, as they were only present in the combined SSRI models. Bleeding history and socioeconomic status were the top two most important clusters across all cohorts (Fig. 2). In fact, removal of the bleeding history cluster caused a > 0.01 decline in AUC scores across all four ML models (LR, DT, RF and XGBoost) for all cohorts except sertraline (3 models, frequency: 0.75) and escitalopram (2 models, frequency: 0.5) (Fig. 2).

Fig. 2

The importance of each feature cluster was summarized as radar plots based on the frequency (range: 0–1) of resulting in a > 0.01 decline in AUC score across four machine learning (ML) models (logistic regression, decision tree, random forest, and extreme gradient boosting) for each cohort. The larger the chart area, the more important the feature cluster was across all cohorts (0.25 = important in one ML model, 0.50 = important in two ML models, 0.75 = important in three ML models, 1 = important for all four ML models)

Clinically significant feature clusters

Bleeding history was a clinically significant feature cluster for all drugs except escitalopram, for which health literacy was the only clinically significant feature cluster. Antithrombotics were clinically significant for warfarin, while the socioeconomic status features (highest education level, employment status, annual household income, and health insurance) were significant for the fluoxetine and combined SSRI cohorts (Table 4).

Table 4 Clinically significant feature clusters for each drug cohort

Discussion

We developed ML models with close to moderate predictive performance for SSRI-associated bleeding using data from the NIH AoU Research Program, as part of what will be a larger precision medicine endeavor. The AoU database allowed us to create models incorporating not only clinical information from the EHR but also sociodemographic characteristics from survey data, including income, health literacy, and education level. More importantly, we created our models with the goal of eventually implementing them in clinical practice, allowing evaluation of patient-specific factors and an individualized bleeding risk score for each SSRI so that the therapy with the lowest possible risk can be selected. Thus, most of our features were selected to ensure that they can be feasibly obtained in clinical settings.

Multiple meta-analyses have demonstrated an augmented risk of gastrointestinal (GI) bleeding with SSRIs, especially when taken concurrently with a non-steroidal anti-inflammatory drug (NSAID) [34,35,36]. Another meta-analysis demonstrated an increased risk of intracerebral and intracranial hemorrhage (ICH) with SSRIs, although these bleeding events were rare [37]. SSRI treatment has been estimated to increase non-specific, global bleeding risk by 36% [10]. Despite the literature establishing SSRI bleeding risk, studies have not extensively examined actionable risk factors to prevent bleeding ADEs. To our knowledge, this is the first ML prediction model developed specifically for bleeding events associated with SSRIs.

Prior bleeding history was identified as clinically significant in almost all drug cohorts, the exception being escitalopram, although bleeding history remains arguably important for escitalopram as well, given that significant changes in AUC were found in two of its four ML models. This is unsurprising, as bleeding history is a component of bleeding risk stratification tools for other clinical settings, such as HAS-BLED, RIETE, and VTE-BLEED [29, 38]. It further evidences the importance of evaluating predisposing bleeding risk factors prior to SSRI prescribing. Socioeconomic status was identified as a clinically important feature cluster in the fluoxetine cohort and the combined SSRI cohort. This is an important finding, as hospital admissions due to antidepressant-related ADEs have been found to be higher in patients from low-income areas [39], and the need for antidepressants may be greater in low-income populations [40]. Patients with low socioeconomic status often receive lower-quality health care and unstandardized care coordination, which has contributed to suboptimal use of medications [41, 42]. Health literacy, based on survey data, was also deemed clinically significant in the escitalopram cohort. Health literacy affects a person’s capability to interpret and act on health information [43, 44]. Patients with poorer health literacy frequently misunderstand drug information, including that for over-the-counter drugs [45, 46], which can lead to unintended yet preventable adverse drug events, especially in underserved communities [47, 48]. These findings support the need to examine sociodemographic factors when evaluating ADE risk at the time of prescribing, as well as interventions to improve patients’ understanding of their medications.

Surprisingly, concurrent antithrombotic use was defined as clinically important only for the warfarin cohort, and concurrent NSAID use was not found to be clinically significant in our ML models, which is inconsistent with previous studies evaluating bleeding risk with SSRIs [34,35,36, 49]. This may be explained by the incomplete nature of EHR data (which were used to quantify these features), a consequence of patients visiting multiple health institutions for care and prescription filling. This presents a significant challenge for the implementation of clinical prediction models in routine clinical practice, especially if the use of real-world EHR data for feature extraction and engineering is desired. Nevertheless, there is great research potential in this field if clinicians and health informaticians work together. For example, clinicians routinely perform medication reconciliation, a process of comparing a patient's medical record against external lists of medications obtained from various sources to determine the most precise and complete list of all medications, including their names, dosages, frequencies, and routes of administration. Health informaticians design and maintain the electronic health system and have expertise in extracting real-world EHR data to train and implement clinical prediction models. Collaboration between the two professions can facilitate the development of clinically actionable prediction models and optimize patient health outcomes. Therefore, we emphasize that our findings do not indicate that concurrent medications and comorbidities are less important for predicting ADEs. Rather, they uncover the limitations of EHR data, barriers to training and implementing clinical prediction models in real-world practice, and other modifiable risk factors that clinicians should consider addressing.

While the AUC scores and Youden’s index-optimized sensitivity and specificity for each drug cohort are modest, the performance of the models established in this study is comparable to that of previously validated prediction models for clinically relevant bleeding. In the AMADEUS study, CHADS2, CHA2DS2-VASc and HAS-BLED scores were evaluated for their ability to predict bleeding in enrolled patients [50]. The best-performing model, and the only one of the three recommended for bleeding risk assessment, was HAS-BLED, which demonstrated modest performance in predicting clinically relevant bleeding, with an AUC of 0.60. Of note, bleeding event prediction in that study was performed in patients with atrial fibrillation treated with anticoagulants; thus, its findings are likely not directly comparable to ours. Nevertheless, this illustrates that our models demonstrate performance at least comparable to prediction models currently used in clinical settings.

Developing ML models on EHR data to predict ADEs has been of interest to the research community. Zhao et al. tested multiple ML models, including regression, decision trees, AdaBoost, and random forest, on EHR data to predict ADEs [51]. They showed that, with careful feature selection, ML models can achieve promising accuracy, as high as 85%, in predicting ADEs [51]. Given the widespread understanding of regression models across health disciplines, these models are predominantly used on EHR data for predicting ADEs [30], with LR found to perform similarly to other ML models across multiple clinical prediction studies [52], as further verified by our findings. Future ADE studies can continue to explore LR with more optimal EHR features, such as the most recent laboratory results and the current medication list at the time of an office or pharmacy visit.

This study does have some limitations. As explained previously, there are inherent limitations to using EHR databases retrospectively for ADE research. Selection of participants and identification of ADEs are challenging, as it is difficult to ascertain the information necessary for a thorough causality assessment. Poor-quality data collected from EHR sources designed for purposes other than research, as well as missing data, may lead to selection bias and information bias. We therefore applied recommended practices to address these inherent limitations, employing strategies such as defining the index date as the first drug exposure date to reduce the risk of immortal time bias [28]. We also designed the follow-up period carefully and treated drug exposure as a time-varying feature, considering factors such as gaps in medication records and initiation of other drugs, rather than assuming that the initial exposure remained constant throughout the follow-up period. Feature selection and clusters were determined a priori, which could have excluded important features identifiable with empirical methods, and the definition of clinically significant features requires further optimization. Nonetheless, the rich data made available by the AoU program allowed us to make robust predictions with reasonable sample sizes while performing hypothesis-generating research for further evaluation in prospective studies.

Conclusion

We observed that bleeding history, socioeconomic status, and health literacy were important factors that may predict bleeding associated with SSRI use. This work contributes to the larger conversation on judicious use of medications and the importance of optimizing non-drug treatment modalities, such as psychotherapy, lifestyle management, and psychosocial interventions, whenever possible. Public health interventions that focus on increasing health literacy and providing more health care resources in low-income neighborhoods would go a long way toward reducing adverse drug events worldwide. Although our models performed comparably to, or better than, many existing clinical models, we expect improvements in their performance with the inclusion of genomic features and pharmacokinetic drug interactions [53], alongside optimization of real-world medication and health outcome data from the EHR. We will also explore deep learning models, such as recurrent neural networks, to better capture the granularity of medication changes (dose and frequency) that may be important for ADE prediction.