Background

Anticoagulant drugs, including vitamin K antagonists (VKAs) and direct oral anticoagulants (DOACs), have been extensively applied in clinical practice [1,2,3,4]. Compared with VKAs, DOACs, which specifically target factor Xa or thrombin, have advantages such as predictable pharmacokinetics and pharmacodynamics, fewer food-drug and drug-drug interactions, and no requirement for routine laboratory monitoring of coagulation parameters [5,6,7,8]. Because of their noninferior efficacy and superior safety, DOACs are recommended in an increasing number of clinical settings to overcome the limitations of VKAs. Data show that the use of DOACs has increased steadily over the years [9, 10].

As the first approved factor Xa inhibitor among DOACs, rivaroxaban has been widely used to reduce the risk of stroke and systemic embolism in patients with nonvalvular atrial fibrillation (NVAF), to prevent venous thromboembolism (VTE) in adult patients undergoing elective hip or knee replacement surgery, and for prophylaxis of recurrent acute deep venous thrombosis (DVT) and pulmonary embolism (PE) [11,12,13,14,15]. Some studies have shown that DOACs, represented by rivaroxaban, are associated with significantly lower mortality and fewer intracranial hemorrhages than vitamin K antagonists. However, the risk of hemorrhage remains a concern and has a significant impact on the health of patients receiving long-term treatment [16, 17]. Recently, several groups have focused on the incidence of hemorrhage with rivaroxaban [18,19,20,21,22]. For example, the Ichiro Sakuma group analyzed the dataset of the Expand Study to explore the bleeding risk among Japanese patients with NVAF after the use of rivaroxaban [18]. The results showed a 1.2% incidence rate of major bleeding, including a 0.5% incidence rate of intracranial hemorrhage. Unlike warfarin, rivaroxaban lacks both monitoring methods for assessing hemorrhage risk and specific antidotes for use during bleeding events. Therefore, it is of great importance to establish an early-warning model to precisely predict the hemorrhage risk associated with rivaroxaban.

Geriatric patients, who usually take rivaroxaban for a long time and who also have decreased renal and liver functions, are more vulnerable to hemorrhage and have a poor prognosis [23]. In this study, we investigated the risk factors that contributed to the hemorrhagic events of geriatric patients who were over 70 years old and took rivaroxaban continuously for more than 3 months. Conventional logistic regression and eXtreme Gradient Boosting (XGBoost) were utilized to establish the hemorrhage risk prediction model, of which the XGBoost-based machine learning model showed a better prediction accuracy. This study will provide a novel strategy to guide the individualized administration of rivaroxaban for geriatric patients and thus promote protocols for the safe use of rivaroxaban in clinical practice.

Methods

Study population and follow-up of hemorrhage

A total of 798 geriatric patients treated with rivaroxaban for anticoagulation therapy at the Chinese PLA General Hospital from 2011 to 2021 were enrolled for the establishment of hemorrhage predictive models based on multivariate logistic regression, random forest and XGBoost. In addition, 94 geriatric patients treated with rivaroxaban for anticoagulation therapy at the Chinese PLA General Hospital from 2021 to 2023 were enrolled as an external validation set for the random forest and XGBoost-based models. Data were collected from our clinical database by retrieving the outpatient, physical examination and inpatient records of these patients. The inclusion criteria were as follows: age ≥ 70 years and continuous oral administration of rivaroxaban for ≥ 3 months, with complete clinical data. Patients were excluded if they had severe trauma or major surgery within the previous 6 months or switched to other anticoagulant drugs during the treatment period. Patients whose clinical data were incomplete or had obvious recording errors were also excluded. Finally, the cohort was established for further risk analysis.

Through a well-established clinical follow-up system (telephone follow-up, outpatient and inpatient examinations), the hemorrhage information of geriatric patients was constantly tracked and recorded. Adverse hemorrhagic events, including gastrointestinal hemorrhage, intracranial hemorrhage, ocular hemorrhage, urinary bleeding, pulmonary hemorrhage, nasal hemorrhage, gingival hemorrhage, and knee joint cavity hemorrhage, were detected and recorded by doctors and nurses through clinical observation, measuring the red blood cell counts in the patients’ urine and feces, and monitoring serum hemoglobin levels.

Clinical data collection

According to previous studies [18, 23, 24] and the characteristics of our clinical data, this study collected a total of 27 clinical indicators for risk analysis, including basic information of patients (gender, age and BMI), medication information (rivaroxaban dose and combination therapy with antiplatelet drugs), underlying diseases and previous surgical history (hypertension, diabetes, high triglyceride, high cholesterol, abnormal LDL-cholesterol, lowest hemoglobin, lowest blood platelet, coronary disease, heart failure, valvulopathy, percutaneous coronary intervention (PCI), apoplexy, hemorrhage history, coagulopathy), coagulation function indices (thrombin time (TT), activated partial thromboplastin time (APTT), international normalized ratio (INR) and D-dimer), liver function (aspartate aminotransferase (AST) and alanine aminotransferase (ALT)), and renal function (blood urea nitrogen (BUN) and creatinine). Among them, information on antiplatelet drugs, rivaroxaban dose and hemorrhage was collected from the time the elderly patients began treatment. The coagulation function indices, including TT, APTT, INR and D-dimer, were obtained from the first physical examination of each patient after continuous administration for more than three months. The other clinical indicators were collected before the patients started taking rivaroxaban.

Random forest model and XGBoost model for prediction of hemorrhage risk

Random forest and XGBoost models were developed and validated with R software (version 3.6.1). Briefly, the R package “missForest” (version 1.4) was used to impute missing values, and ggplot2 graphics (version 3.3.6) was used for data analysis. To establish a hemorrhagic risk prediction model for rivaroxaban, 798 geriatric patients were randomly divided into a training cohort and a testing cohort at a ratio of 85:15. Tenfold cross-validation was performed for hyperparameter tuning of the machine learning models.
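The 85:15 split and the tenfold cross-validation described above can be illustrated with a minimal sketch. It is written in Python rather than R, uses only the standard library, and is not the authors' code: the seed, the round-down rule for the training set size, and the function names `split_indices` and `k_folds` are assumptions made for illustration.

```python
import random

def split_indices(n, train_fraction, seed):
    """Shuffle the indices 0..n-1 and cut them once into training and testing lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = int(n * train_fraction)  # assumption: round down to a whole patient
    return indices[:n_train], indices[n_train:]

def k_folds(indices, k):
    """Deal the indices into k folds whose sizes differ by at most one."""
    return [indices[i::k] for i in range(k)]

# 798 patients and the 85:15 ratio come from the text; the seed is arbitrary.
train, test = split_indices(798, 0.85, seed=0)
folds = k_folds(train, 10)
print(len(train), len(test))
print([len(fold) for fold in folds])
```

Under these assumptions the training cohort holds 678 patients and the testing cohort 120. In each cross-validation round, one fold would serve as the validation part and the other nine as the fitting part.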

The random forest model was built as previously reported [25, 26]. Specifically, N denotes the number of samples in the original training set, and M is the number of features. At each tree node, m features are randomly selected, where m should be much smaller than M. The best split is calculated from these m features using the Gini index. Out-of-bag (OOB) error was used to assess the generalization ability of the model. The number of decision trees constructed in this study was 500, and four variables were randomly selected at each decision tree node. The importance of each variable was subsequently measured by the mean decrease in accuracy and the mean decrease in Gini, which quantify how much each variable contributes to the RF model. The final model estimates the importance of each predictor by checking how much the prediction error increases when the values of that predictor are permuted. The R packages “randomForest” (version 4.6.14), “Boruta” (version 7.0.0), and “caret” (version 6.0–90) were used to develop and validate the random forest model.
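The node-splitting rule described above (draw m of the M features at random, then pick the split with the lowest Gini impurity) can be sketched as follows. This is an illustrative Python sketch, not the randomForest implementation used in the study. The toy rows, the two hypothetical features (a hemoglobin-like and an INR-like value) and the function names are assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, feature, threshold):
    """Size-weighted Gini impurity of the two children of a threshold split."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels, m, rng):
    """Evaluate every observed threshold on m randomly chosen features."""
    candidates = rng.sample(range(len(rows[0])), m)
    best = None
    for feature in candidates:
        for threshold in sorted({x[feature] for x in rows}):
            score = split_impurity(rows, labels, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best  # (impurity, feature index, threshold)

# Hypothetical toy data: (hemoglobin-like value, INR-like value); 1 = hemorrhage.
rows = [(130, 1.0), (125, 1.2), (95, 1.1), (90, 0.9)]
labels = [0, 0, 1, 1]
best = best_split(rows, labels, m=2, rng=random.Random(0))
print(best)
```

In this toy example both features are candidates (m = M = 2), so the result does not depend on the random draw. In the study, four of the 27 variables were drawn at each node.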

The XGBoost model uses a gradient boosting framework and is also a decision tree-based ensemble method. For the XGBoost model, we ran 100 repetitions of a tenfold cross-validation experiment, each with a different random undersampling and a random set of parameters. The optimal set of parameters was then selected as the one with the smallest multiclass logloss. After hyperparameter optimization, we used the following parameters: eval_metric = rmse, max_depth = 6, eta = 0.11, subsample = 0.60, colsample_bytree = 0.58, min_child_weight = 2, and max_delta_step = 1. The R package “xgboost” (version 1.5.2.1) was used to develop and validate the XGBoost model. We additionally performed sensitivity analyses in which we limited the study group to patients younger than 100 years old and reported the results of the sensitivity analyses separately.
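Two ingredients of the tuning procedure above, random undersampling and the logloss criterion, can be sketched in Python. This is a hedged illustration, not the authors' R code: the text does not state the undersampling ratio, so the 1:1 ratio here is an assumption, and `log_loss` shows only the binary case of the multiclass logloss. The 112 events among 798 patients are the cohort-level counts and are used only to make the example concrete.

```python
import math
import random

def undersample(labels, rng):
    """Keep every minority-class index plus an equally sized random draw
    from the majority class (assumed 1:1 ratio)."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept

def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

labels = [1] * 112 + [0] * 686  # cohort-level counts from the text
kept = undersample(labels, random.Random(0))
print(len(kept), sum(labels[i] for i in kept))
print(log_loss([1, 0], [0.5, 0.5]))
```

Repeating the draw with a different seed in each of the 100 repetitions would give each cross-validation experiment its own balanced subsample.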

Statistical analyses

Statistical software SPSS 15.0 was utilized to perform Fisher’s exact test, chi-square tests, and univariate and multivariate logistic regression analysis. Random forest and XGBoost were conducted using R software, and a p value < 0.05 was defined as statistically significant.

Ethical considerations

The study was conducted in accordance with the principles of the Declaration of Helsinki. The research proposal was approved by the Ethics Committee and the Institutional Review Committee of Chinese PLA General Hospital (Approval Number: S2022-045–01).

Results

Patient characteristics

A total of 1289 geriatric patients were treated with rivaroxaban for anticoagulation therapy. Moreover, 491 patients were excluded according to the exclusion criteria (Fig. 1). The baseline characteristics of the patients in the training and testing set are summarized in Table 1. Overall, this study enrolled a total of 798 geriatric patients suffering from diseases such as NVAF, VTE, PE or DVT who were treated with rivaroxaban for anticoagulation therapy. Among them, 112 patients (14.0%) had adverse hemorrhagic events during the treatment. The population distribution of 27 clinical indicators, including gender, age, BMI, rivaroxaban dose, antiplatelet drugs, hypertension, diabetes, high triglyceride, high cholesterol, LDL-cholesterol abnormal, lowest hemoglobin, lowest blood platelet, coronary disease, heart failure, valvulopathy, PCI, apoplexy, hemorrhage history, coagulopathy, TT, APTT, INR, D-dimer, AST, ALT, BUN and creatinine, is listed in Table 1.

Fig. 1 The flowchart of patient selection

Table 1 Distribution of patients’ characteristics and prognosis analysis for the establishment of predictive models

Hemorrhage types of geriatric patients

The statistical results of the hemorrhage types are listed in Fig. 2. Among the 112 geriatric patients with hemorrhage events, 79 patients (68.14%) had gastrointestinal hemorrhage events, 17 patients (15.04%) had intracranial hemorrhage, 6 patients (5.31%) had ocular hemorrhage, 5 patients (4.42%) had urinary hemorrhage events, 3 patients (2.65%) had pulmonary hemorrhage, 2 patients (1.77%) had nasal hemorrhage events, 2 patients (1.77%) had gingival hemorrhage events, and only one patient (0.88%) had knee joint cavity hemorrhage. Gastrointestinal and intracranial hemorrhage were the most common bleeding types caused by rivaroxaban.

Fig. 2 Percentage of hemorrhage types

Analysis of hemorrhagic risk factors and construction of a prediction model by conventional logistic regression

As shown in Table 1, univariate logistic regression showed that age (P value = 0.002, OR: 4.13, 95% CI: 1.65–10.62), BMI (P value = 0.020, OR: 0.47, 95% CI: 0.25–0.89), antiplatelet drugs (P value = 0.007, OR: 1.76, 95% CI: 1.17–2.66), lowest hemoglobin (P value = 0.003, OR: 0.12, 95% CI: 0.03–0.48), coronary disease (P value = 0.016, OR: 1.98, 95% CI: 1.13–3.45), apoplexy (P value = 0.004, OR: 1.81, 95% CI: 1.21–2.71), hemorrhage history (P value = 0.001, OR: 2.33, 95% CI: 1.39–3.89) and coagulopathy (P value = 0.015, OR: 2.38, 95% CI: 1.19–4.75) differed significantly between the hemorrhage and nonhemorrhage groups. Then, a multivariate logistic regression analysis was performed by incorporating the variables that were significant in the univariate logistic regression. As a result, the lowest hemoglobin level and hemorrhage history were identified as important risk factors in the dataset (Table 2). The hemorrhage risk of patients with a lowest hemoglobin level ≥ 120 g/L was 22% of that in patients with a lowest hemoglobin level < 120 g/L (P value = 0.041, 95% CI: 0.05–0.94). Meanwhile, the hemorrhage risk of patients with a history of hemorrhage was 2.04 times that of patients without a history of hemorrhage (P value = 0.024, 95% CI: 1.10–3.78). A hemorrhage prediction model was developed by multivariate logistic regression, and the ROC curve indicated that this model had only moderate discrimination, with an AUC of 0.679 (Fig. 3). To improve predictive accuracy, machine learning algorithms were further explored to construct predictive models.
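The relation between a logistic regression coefficient, its odds ratio and the 95% confidence interval can be made concrete with a short Python sketch. It assumes the reported intervals are Wald intervals, that is, symmetric on the log-odds scale. That assumption is not stated in the text, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_scale_summary(ci_low, ci_high):
    """Recover the coefficient and its standard error from a 95% CI of an
    odds ratio, assuming the CI is symmetric on the log scale."""
    beta = (math.log(ci_low) + math.log(ci_high)) / 2
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se

def odds_ratio_ci(beta, se):
    """Odds ratio and Wald 95% CI from a coefficient and its standard error."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

# Multivariate results reported in the text.
for name, low, high in [("hemorrhage history", 1.10, 3.78),
                        ("lowest hemoglobin >= 120 g/L", 0.05, 0.94)]:
    beta, se = log_scale_summary(low, high)
    odds_ratio, lo, hi = odds_ratio_ci(beta, se)
    print(name, round(odds_ratio, 2), round(lo, 2), round(hi, 2))
```

Under this assumption, the geometric midpoints of the two reported intervals round to 2.04 and 0.22, which agrees with the effect sizes given for hemorrhage history and lowest hemoglobin.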

Table 2 Multivariate logistic regression

Fig. 3 ROC curve of the hemorrhage prediction model constructed by multivariate logistic regression, with an AUC of 0.679

Analysis of hemorrhagic risk factors and construction of a prediction model based on XGBoost

A cohort of 94 geriatric patients treated with rivaroxaban was enrolled as an external validation set (Table 3). Initially, random forest was used to build a predictive model, which had an AUC of 0.672 for the test set and 0.610 for the external validation cohort and thus showed poorer discriminatory power than the model built by multivariate logistic regression. XGBoost was then used to establish the prediction model. First, feature selection by XGBoost identified 13 important variables predisposing patients to hemorrhage: lowest blood platelet count, BMI, APTT, TT, D-dimer, lowest hemoglobin, creatinine, INR, ALT, AST, hemorrhage history, BUN and apoplexy (Fig. 4). Because it is difficult to measure BMI in bedridden geriatric patients, the missing values were concentrated mainly in the BMI variable. The missForest package was used for missing value imputation with default parameters. Then, these risk factors were used to develop a prediction model. In the internal validation on the testing dataset, the resulting model had an AUC of 0.776 (95% CI: 0.687, 0.864), with an accuracy, sensitivity, precision, recall, and F-1 of 0.771, 0.804, 0.921, 0.904, and 0.859, respectively (Fig. 5). In the independent external validation cohort, the model displayed decreased discrimination, with an AUC of 0.689 (95% CI: 0.518, 0.860) and an accuracy, sensitivity, precision, recall, and F-1 of 0.830, 0.905, 0.905, 0.905, and 0.905, respectively (Fig. 5). For the sensitivity analyses, we repeated the above methodology for study groups limited to patients younger than 100 years. The same model training process was applied, and the resulting models were evaluated on both the internal testing dataset and the external validation dataset to assess their robustness. The AUC of the internal testing dataset was 0.784 (95% CI: 0.677–0.891) and the AUC of the external validation dataset was 0.647 (95% CI: 0.451–0.843).
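The classification metrics reported above follow from a confusion matrix, and F-1 is the harmonic mean of precision and sensitivity. The Python sketch below shows the standard definitions. The confusion-matrix counts are hypothetical and chosen only for illustration, since the text does not report the underlying counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    return accuracy, sensitivity, precision

def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)

# Hypothetical counts, not taken from the study.
accuracy, sensitivity, precision = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(accuracy, sensitivity, precision, f1_score(precision, sensitivity))

# Reported values from the text: precision and sensitivity of the XGBoost model.
print(round(f1_score(0.921, 0.804), 3))  # internal testing dataset
print(round(f1_score(0.905, 0.905), 3))  # external validation cohort
```

With the reported internal precision (0.921) and sensitivity (0.804), the harmonic mean rounds to 0.859. With the external values (0.905 and 0.905), it is 0.905. Both match the reported F-1 values.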

Table 3 Distribution of patients’ characteristics and prognosis analysis for external validation set

Fig. 4 Feature importance in the XGBoost model

Fig. 5 ROC curves for evaluating the discrimination performance of the XGBoost model in the testing and external validation datasets. The AUC was 0.776 in the testing dataset and 0.689 in the external validation dataset
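The AUC values quoted for the ROC curves can be read as the probability that a randomly chosen patient with hemorrhage receives a higher predicted risk than a randomly chosen patient without hemorrhage. The Python sketch below computes the AUC through this pairwise definition. It is an illustration with made-up scores, not the procedure used to produce the figures.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted risks for five patients; 1 = hemorrhage.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
print(auc(labels, scores))
```

In this toy example, five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6. A value of 0.5 corresponds to random ranking.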

Discussion

Progressive increases in the incidence rates of NVAF, VTE, PE and DVT are observed with advancing age [27,28,29,30]. As previously reported, approximately one-third of all patients with AF are aged 80 or older, and the proportion of this age group will continue to rise. Moreover, the incidence of a first episode of DVT or PE in patients aged over 80 years is estimated to be 6 times higher than that in patients less than 50 years of age. Because of its better benefit-risk profile in comparison with vitamin K antagonists, rivaroxaban is recommended in the daily clinical treatment of geriatric patients for anticoagulation therapy [11,12,13,14,15]. However, hemorrhagic events, as one of the major side effects caused by rivaroxaban, pose a great threat to patient health. Geriatric patients show different pathophysiologic characteristics from young patients, such as changes in body composition (mass and muscle), impairment of liver and renal function and the presence of underlying diseases or comorbidities [27, 31]. These characteristics make geriatric patients more vulnerable to hemorrhage by affecting drug absorption, distribution, metabolism and elimination. Thus, evaluating the potential hemorrhagic risk factors and establishing a related predictive model are important for ensuring the clinically safe use of rivaroxaban in geriatric patients.

In our study, we built a cohort of 798 geriatric patients who had received continuous rivaroxaban treatment for more than 3 months to establish the hemorrhagic predictive model. A total of 112 geriatric patients (14.0%) had hemorrhagic events during treatment, which is higher than the 8.4% hemorrhagic rate observed in the elderly cohort (median age of 71.66 years) reported by Hou et al. [23]. This may be attributed to the different age composition of the two cohorts: the median age in our cohort was 92 years, about 20 years older than that in Hou’s cohort. According to the hemorrhagic location, all hemorrhagic events could be classified into 8 types (gastrointestinal hemorrhage, intracranial hemorrhage, ocular hemorrhage, urinary bleeding, pulmonary hemorrhage, nasal hemorrhage, gingival hemorrhage and knee joint cavity hemorrhage). Among them, gastrointestinal and intracranial hemorrhage were the most common hemorrhage types, accounting for 68.14% and 15.04%, respectively. These results indicate that the hemorrhage rate may increase with age and that daily clinical observation and routine monitoring of serum hemoglobin levels and red blood cell counts in the urine and feces are crucial for the safety of geriatric patients.

Candidate variables for the model were chosen from variables collected at the baseline study visit based on existing evidence and clinical relevance. Considering the influencing factors of hemorrhage reported in previous studies and some indicators closely related to bleeding according to clinical experience, 27 clinical indicators were selected in our study. Most clinical indicators, except for rivaroxaban dose and hemorrhage information, were registered before patients took rivaroxaban for the first time. Considering the possible effects of rivaroxaban dose on hemorrhage, related information from doctors’ advice in the outpatient medical record was also incorporated into predictive indicators. Hemorrhagic events occurring after the use of rivaroxaban were truthfully reported with a well-established clinical follow-up system. The frequency of telephone follow-up was once every 3 months. However, it should be noted that some potential indicators, such as the level and activity of factor Xa, might be missed. The adverse effect of factor Xa inhibitors is naturally relevant to the level and activity of factor Xa, which is rarely tested for in cardiovascular patients in most hospitals.

The univariate logistic regression revealed that age, BMI and antiplatelet drugs were significantly correlated with bleeding risk, which is consistent with previous studies of rivaroxaban and other DOACs [32,33,34]. Due to multiple types of clinical indicators collected in our study, lowest hemoglobin, coronary disease, apoplexy, hemorrhage history and coagulopathy were also identified as risk factors with statistical significance. The lowest hemoglobin level and hemorrhage history, which were not evaluated in previous studies, were further indicated as independent risk factors for bleeding by multivariate logistic regression. The hemorrhage prediction model constructed by multivariate logistic regression after incorporating the lowest hemoglobin level and hemorrhage history as influencing factors showed a less satisfactory AUC of 0.679.

To further optimize the prediction model, random forest was applied to build a model with an AUC of 0.672 in the internal validation of the testing dataset and 0.610 in the external validation cohort, showing poorer discrimination than multivariate logistic regression. XGBoost, an emerging machine learning algorithm based on gradient boosting, was proposed by Tianqi Chen in 2016 and shows an excellent ability to customize the loss function, apply regularization terms, and process sparse features and missing data [35, 36]. These abilities allow the model to use variables flexibly in different areas of the output space, thus realizing automatic feature selection and the fitting of high-order interactions [37, 38]. To the best of our knowledge, the XGBoost machine learning approach has never been used before to build a bleeding risk prediction model of rivaroxaban in geriatric patients. The hemorrhage prediction model constructed by XGBoost showed the best discrimination, with an AUC of 0.776 (95% CI: 0.687, 0.864) in the internal validation of the testing dataset and 0.689 (95% CI: 0.518, 0.860) in the independent external validation cohort. Out of the 27 clinical indicators, XGBoost identified 13 important variables predisposing patients to hemorrhage, including lowest blood platelet count, BMI, APTT, TT, D-dimer, lowest hemoglobin, creatinine, INR, ALT, AST, hemorrhage history, BUN and apoplexy. Coagulation function indicators, including APTT, TT and INR, were also evaluated as risk factors by Student's t test in a previous study reported by Liang et al. [24]. The results of the XGBoost model suggest that periodic monitoring of the coagulation indicators (platelet, APTT, TT, D-dimer and INR) and BMI is important to reduce the occurrence of hemorrhagic events in geriatric patients receiving continuous rivaroxaban treatment. When a geriatric patient's coagulation function is abnormal or their BMI is too low, doctors should pay more attention to the potentially higher incidence of bleeding events.

The XGBoost model showed a better capacity than the random forest model for predicting hemorrhage in geriatric patients treated with rivaroxaban. Random forest and XGBoost are both decision tree ensemble algorithms, but they use the training data in different ways. XGBoost is a gradient boosting-based decision tree ensemble designed to be highly efficient and scalable. Because the gradient of the data is considered for each tree, the calculation is faster and the predictions are more accurate than those of random forest [39]. In addition, compared with random forest, XGBoost shows resistance to overfitting in datasets with imbalanced feature/outcome ratios and offers hyperparameters that allow tuning for imbalanced datasets [40]. Therefore, the model built by XGBoost should be employed to predict hemorrhage risk in geriatric patients with long-term rivaroxaban treatment. We believe this XGBoost model will promote personalized medication of patients treated with rivaroxaban and contribute to its safe administration.

Limitations

There are some limitations in our study. First, it is a single-center study. A total of 798 patients in the training and testing cohorts were from the same hospital. Moreover, considering the inclusion of multiple kinds of clinical indicators, the sample size was relatively small. These limitations may limit the interpretation of the results and lead to a wide confidence interval for the XGBoost-based bleeding prediction model. Second, clinical information, especially hemorrhagic history, was reported by the patients and susceptible to recall bias. In addition, geriatric patients over 90 years old are mostly bedridden, which resulted in missing BMI values. Third, although multivariate logistic regression analysis could adjust the influence of confounding factors to some extent, more thorough investigation of confounding factors should be conducted to further strengthen the conclusions and provide a better understanding of the underlying relationships. Above all, more solid conclusions need to be confirmed in future studies.

Conclusions

In conclusion, 112 geriatric patients (14.0%) receiving long-term rivaroxaban treatment had adverse bleeding events. Gastrointestinal and intracranial hemorrhage, the main hemorrhage types, together accounted for 83.18% of the bleeding events. Three hemorrhage prediction models were constructed by applying multivariate logistic regression, random forest and XGBoost. Among them, the XGBoost model performed best, with good discrimination and accuracy, and identified lowest blood platelet count, BMI, APTT, TT, D-dimer, lowest hemoglobin, creatinine, INR, ALT, AST, hemorrhage history, BUN and apoplexy as the most contributing features. The XGBoost predictive model will contribute to the safe clinical use of rivaroxaban for geriatric patients.