Introduction

Postpartum hemorrhage (PPH) is a severe complication during delivery. It is defined by The American College of Obstetricians and Gynecologists as the cumulative blood loss greater than or equal to 1000 ml or accompanied by symptoms or signs of hypovolemia within 24 h after birth (including intrapartum loss) regardless of route of delivery1, and is one of the main causes of global maternal mortality. According to the World Health Organization, PPH is associated with about 20% of maternal deaths annually2. In the United States, the maternal mortality rate attributable to PPH is 11.2%3. In China, the maternal mortality rate is 0.183/10004, with PPH accounting for one-third of maternal deaths. In developing countries, the mortality rate of PPH is higher3, and the numbers are on the rise5,6,7.

Cesarean delivery (CD) is a predominant, independent risk factor for PPH8. In Israel, the rate of PPH after CD is 9.6%9, while it is 6.5% in China10. In the United States and India, blood transfusion rates after CD are 3.2% and 12.2%, respectively11,12. Therefore, the risk factors for PPH after CD should be explored to enable early identification and to develop a risk-factor model for PPH after CD. Extensive studies have evaluated the risk factors for PPH after CD13,14,15,16, but few of them have integrated these risk factors to build PPH risk-factor models. Furthermore, these studies have drawbacks, such as small sample sizes or low areas under the curve (AUC)17,18,19.

Machine learning (ML) is a new artificial intelligence discipline that is widely used to inform the management of diseases in the early stages20,21,22. For instance, Zheutlin19 established a PPH prediction model using a gradient boosted decision tree and achieved an AUC value of 0.71. Segar21 developed a novel machine learning-derived model to predict heart failure in diabetic patients and attained an AUC value of 0.74. Kang18 used a support vector machine and random forest models to screen septic shock in an emergency department with an AUC value of 0.83. Therefore, the aim of this study was to use ML algorithms to identify risk factors and build a risk-factor model for PPH after CD.

Methods

Study participants

Data were obtained from the medical big data platform of the medical data science academy, Chongqing Medical University (Chongqing, China), which includes seven medical institutions. All seven medical institutions are affiliated hospitals or teaching hospitals of Chongqing Medical University and operate in a similar manner. The CD electronic medical data were collected from January 1st, 2015, to June 1st, 2020. Blood loss volume data were extracted from the electronic medical record. Briefly, blood loss volume was quantified in all cases by measuring the blood collected in the suction apparatus and by weighing lap pads, towels, gauzes and drapes. A coagulation examination is routinely performed after patients are admitted to the hospitals. The inclusion criteria for this study were as follows: (1) Patients with CD; (2) Gestational age > 20 weeks; (3) Age ≥ 18 years. The exclusion criteria were as follows: (1) Patients with coagulation dysfunction23,24,25; (2) Patients with preoperative anticoagulant therapy and hemorrhagic diseases24,25; (3) Those missing all clinical information. This study protocol was reviewed and approved by the Ethics Committee of Chongqing Medical University and, with its approval, informed consent was not required. All methods were performed in accordance with the Declaration of Helsinki and the relevant guidelines.

Potential risk factors

During hospitalization, the clinical information of the patients, including laboratory examination records, imaging examination records, and the diagnosis and treatment process, was recorded on the medical big data platform. Most of the variables were obtained before the CD, and their definitions and access methods are shown in Supplementary Table 1. Based on previous studies8,18,26,27,28,29,30, a total of 56 potential risk factors were obtained from the medical record systems. Of these, 10 factors had a missing rate of more than 30% and were excluded. Therefore, 46 factors were collected, including all first blood routine examination indices on admission, coagulation indices and pregnancy indices. These included gravidity, number of previous deliveries, number of CD, prothrombin time (PT), thrombin time (TT), activated partial thromboplastin time (APTT), fibrinogen, neutrophil ratio (NEUT%), neutrophil count (NEUT#), monocyte ratio (MONO%), monocyte count (MONO#), basophil ratio (BASO%), basophil count (BASO#), eosinophil ratio (EO%), eosinophil count (EO#), lymphocyte ratio (LYMPH%), lymphocyte count (LYMPH#), white blood cell count (WBC), red blood cell count (RBC), mean corpuscular volume (MCV), hemoglobin concentration (HGB), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), platelet count (PLT), mean platelet volume (MPV), platelet distribution width (PDW), platelet larger cell ratio (P-LCR), coefficient of variation of red blood cell volume distribution width (RDW-CV), anemia before delivery, thrombocytopenia, gestational age, gestational hypertension, gestational diabetes mellitus, pregnancy with uterine fibroids, amniotic fluid index (AFI), estimated neonatal weight, preeclampsia, placental abruption, placenta previa, pre-labor rupture of membranes, uterine rupture, umbilical cord around the neck, placenta accreta, uterine atony, anesthesia, and pelvic adhesion.
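
The missing-rate screen described above (factors with more than 30% missing values were excluded) can be illustrated with a minimal sketch. The variable names `var_a` and `var_b`, the toy values, and the helper functions are hypothetical and are not taken from the study data or code.

```python
from typing import Dict, List, Optional, Tuple

# Threshold taken from the text: factors with a missing rate above 30% were excluded.
MISSING_THRESHOLD = 0.30


def missing_rate(values: List[Optional[float]]) -> float:
    """Return the fraction of entries that are missing (encoded here as None)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def split_by_missingness(
    columns: Dict[str, List[Optional[float]]],
    threshold: float = MISSING_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split variable names into (kept, dropped) by their missing rate."""
    kept, dropped = [], []
    for name, values in columns.items():
        if missing_rate(values) > threshold:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical toy data (not from the study).
toy = {
    "var_a": [11.2, 10.9, None, 11.5, 12.0, 11.1, 10.8, 11.9, 11.3, 11.0],  # 10% missing
    "var_b": [None, None, 12.0, None, 9.5, None, 10.1, 11.0, 8.7, 9.9],     # 40% missing
}
kept, dropped = split_by_missingness(toy)
print("kept:", kept, "dropped:", dropped)
```

In this sketch the retained variables would then be passed on to imputation, which the study performed with the miss-forest algorithm.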

Statistical analysis

All statistical analyses were performed in R for Windows (version 3.6.1, https://www.r-project.org/) and SPSS 24.0 (IBM Corporation, Armonk, NY, USA). Data were presented as counts with percentages for categorical variables, and as median with inter-quartile range (IQR) or mean with standard deviation for continuous variables. Variables with a missing rate of more than 30% were excluded, while the miss-forest algorithm was used to impute the variables with missing rates of less than 30%. Propensity score matching (PSM) was used to balance the large difference in proportions between CD patients with PPH and CD patients without PPH at a ratio of 1 to 4, with maternal age and BMI as matching factors. Moreover, participants were randomized into the training and test sets using a random number table. The Mann–Whitney U test and t-test were used to analyze continuous variables, whereas the Chi-square test was used for all categorical variables. For multi-variable analysis, the least absolute shrinkage and selection operator (LASSO) and logistic regression analysis were performed. Variance inflation factor (VIF) was used to assess multi-collinearity between variables, with VIF > 10 indicating collinearity. Then, extreme gradient boosting (XGBoost), random forest (RF), classification and regression trees (CART) and artificial neural network (ANN) were used to develop a risk-factor model. The model’s performance was assessed by its specificity, precision, recall, F1 score and AUC; a larger value indicates a higher performance31. Significance was established at P < 0.05. Evaluation metrics for model performance were as follows:

$$ Precision = \frac{TP}{{TP + FP}} $$
$$ Recall = \frac{TP}{{TP + FN}} $$
$$ F1 = \frac{2 \times Precision \times Recall}{{Precision + Recall}} $$

TP: true positive numbers; TN: true negative numbers; FP: false positive numbers; FN: false negative numbers.
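
As a worked illustration of the formulas above, the following minimal sketch computes the metrics from the four confusion-matrix counts. Specificity is listed among the evaluation metrics but is not written out above; the sketch assumes the conventional definition TN / (TN + FP). The function name and the example counts are hypothetical and are not results from this study.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall, specificity and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed conventional definition; not spelled out in the text.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, tn=180, fp=20, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```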

Results

A total of 15,275 patients, including 701 patients with PPH after CD and 14,574 patients without PPH after CD, met the inclusion criteria. The CD rate in these hospital units was 33.36%. Propensity score matching (PSM) was used to match 2797 patients without PPH after CD (control group) with the 701 patients with PPH after CD (study group). Eventually, a total of 3498 participants, including 701 (20%) patients with PPH after CD and 2797 (80%) patients without PPH after CD, were included in this study. There were 3457 (98.83%) patients of the Chinese Han nationality. The other nationalities were represented by 41 (1.17%) patients, with nine patients in the study group and 32 in the control group. The flow chart for screening the study participants is shown in Fig. 1.
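
The 1:4 matching step can be sketched as follows. The text does not state the matching algorithm or whether a caliper was applied, so this example assumes greedy nearest-neighbour matching without replacement on a precomputed propensity score, with a hypothetical caliper of 0.05. All identifiers and scores are invented for illustration.

```python
from typing import Dict, List


def greedy_match(
    case_scores: Dict[str, float],
    control_scores: Dict[str, float],
    k: int = 4,
    caliper: float = 0.05,
) -> Dict[str, List[str]]:
    """Match each case to up to k controls by nearest propensity score.

    Controls are used at most once (matching without replacement), and a
    control is accepted only if its score lies within the caliper.
    """
    available = dict(control_scores)  # controls not yet matched
    matches: Dict[str, List[str]] = {}
    for case_id, score in case_scores.items():
        # Rank the remaining controls by distance to this case's score.
        ranked = sorted(available, key=lambda c: abs(available[c] - score))
        chosen = [c for c in ranked[:k] if abs(available[c] - score) <= caliper]
        for c in chosen:
            del available[c]
        matches[case_id] = chosen
    return matches


# Hypothetical propensity scores (not from the study).
cases = {"case1": 0.30, "case2": 0.60}
controls = {
    "c1": 0.29, "c2": 0.31, "c3": 0.33, "c4": 0.28,
    "c5": 0.58, "c6": 0.61, "c7": 0.90,
}
matches = greedy_match(cases, controls)
print(matches)
```

Under a scheme of this kind, a case can receive fewer than four controls when too few lie within the caliper. That is one possible reason, not confirmed by the text, why 701 cases were matched to 2797 controls rather than 701 × 4 = 2804.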

Figure 1

Flowchart for screening study participants.

The patients were randomized into the training set (n = 2448, 70%) and the test set (n = 1050, 30%). The training set had 477 patients with PPH after CD and 1971 patients without PPH after CD. The test set had 224 patients with PPH after CD and 826 patients without PPH after CD. Univariate analysis revealed 28 significant variables between the study and the control groups, while 18 variables were insignificant (Table 1). Statistically different variables were further identified by LASSO to find the optimal value of lambda by balancing accuracy and simplicity. The log of the optimal value of lambda was 10 (Supplementary Fig. 1). Thus, 10 significant variables (Table 2) were retained and analyzed by logistic analysis. The risk factors associated with PPH after CD included pregnancy with uterine fibroids, anemia before delivery, placenta previa, placenta accreta, placental abruption, uterine atony, small for gestational age, prolonged PT, prolonged TT, and low fibrinogen.
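
The 70/30 partition reported at the start of the paragraph above can be reproduced arithmetically with a short sketch. The study used a random number table; the seeded pseudo-random shuffle below is a stand-in, and the seed value and function name are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    ids: Sequence[int],
    train_fraction: float = 0.7,
    seed: int = 2020,  # hypothetical seed, chosen only for reproducibility
) -> Tuple[List[int], List[int]]:
    """Shuffle participant IDs and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor of 70% of the cohort
    return shuffled[:n_train], shuffled[n_train:]


# 3498 matched participants, as reported in the text.
train_ids, test_ids = split_train_test(range(3498))
print(len(train_ids), len(test_ids))
```

With 3498 participants, taking the floor of 70% gives 2448 training and 1050 test participants, matching the set sizes reported above.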

Table 1 Univariate analysis of CD-PPH and CD no-PPH in the training set.
Table 2 Factors associated with CD-PPH in the training set.

To establish a risk-factor model, the 10 independent risk factors were used as input variables in the ML algorithm, with PPH after CD as the outcome event (yes = 1, no = 0). In the training set, Logistic, XGBoost, RF, CART, and ANN models had AUC of 0.893 (0.875–0.911), 0.879 (0.859–0.898), 0.957 (0.950–0.965), 0.862 (0.842–0.883), and 0.893 (0.875–0.911), respectively, and 0.851 (0.823–0.880), 0.857 (0.828–0.887), 0.893 (0.867–0.918), 0.849 (0.818–0.880), and 0.891 (0.866–0.916), respectively, in the test set (Table 3, Fig. 2). The F1 score of the RF model in the training and test sets was 0.708 (Table 3). Since multiple variables were present during CD, we also constructed the risk-factor model based on the potential risk factors before CD. The results were similar to those obtained by the RF model (Supplementary Table 2).
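
For readers less familiar with the AUC values compared above, the following minimal sketch shows one standard way to compute an AUC: the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half. The labels and scores are hypothetical. The confidence intervals reported in the text would need an additional method, such as bootstrapping or DeLong's method, which is not shown here.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Compute the AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = PPH after CD, 0 = no PPH) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.3f}")
```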

Table 3 Evaluation of model fitness in training and test sets.
Figure 2

AUC values for five models in the training and test sets.

Evaluation indices indicated that RF outperformed the other models. Based on the Gini coefficient, the 10 variables were ranked as follows: placenta accreta, placenta previa, gestational age, PT, TT, fibrinogen, anemia before delivery, uterine atony, placental abruption and pregnancy with uterine fibroids (Fig. 3). A web-based tool was developed based on these 10 risk factors (https://cqmugj.shinyapps.io/pph_after_cd/). Furthermore, another web-based tool was constructed based on seven risk factors available before CD (https://cqmugj.shinyapps.io/pph_after_cd_2/).
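
The Gini-based ranking mentioned above is not described in detail in the text. In random forests, a common approach scores each variable by the decrease in Gini impurity produced by the splits that use it, accumulated over a tree and averaged over all trees. The sketch below shows that quantity for a single binary split. The outcome labels and the split are hypothetical, and this is not presented as the exact computation used in the study.

```python
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels: 1 - p1^2 - p0^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def gini_decrease(
    parent: Sequence[int], left: Sequence[int], right: Sequence[int]
) -> float:
    """Impurity decrease from splitting a parent node into left and right children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 3 of 8 patients have the outcome (1 = PPH, 0 = no PPH).
parent = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical split on a binary risk factor (left: present, right: absent).
left = [1, 1, 1, 0]
right = [0, 0, 0, 0]
decrease = gini_decrease(parent, left, right)
print(f"Gini decrease = {decrease:.5f}")
```

Here the parent impurity is 0.46875 and the weighted child impurity is 0.1875, so the split reduces impurity by 0.28125. Variables whose splits yield larger decreases rank higher.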

Figure 3

Variable importance score in the RF model.

Discussion

In this study, we report a risk-factor model for PPH after CD based on ML algorithms. Forty-six variables were collected and analyzed by univariate analysis. Among them, 28 variables showed a significant difference between the two groups, and 10 independent risk factors were retained after multi-variable analysis. Finally, five risk-factor models were successfully developed to discriminate between PPH after CD and no-PPH after CD.

Many studies have investigated PPH after CD13,14,15,16. In America, a study revealed that placenta previa is a risk factor for PPH after CD15. According to an Australian study, general anesthesia is closely correlated with PPH16. Furthermore, many studies found that pregnant women with hypertensive disorders, and placenta accreta spectrum disorders were more likely to suffer from PPH10,13,14,15,16. However, a limited number of studies have integrated these factors to develop a risk-factor model19.

In this study, placenta previa, gestational age, PT, TT, fibrinogen, anemia before delivery, placenta accreta, uterine atony, placental abruption, and pregnancy with uterine fibroids were identified as risk factors for PPH after CD. Most of these independent risk factors have previously been reported8,10,13,14,17,29. For example, studies from Southern Ethiopia and Mali both showed that mothers with antepartum anemia are more likely to suffer from PPH32,33. The effect of placenta previa has also been reported in many studies15,34,35. A study from China found that patients with complete placenta previa had a higher risk of PPH than those with incomplete or no placenta previa34. Therefore, the guidelines36 usually recommend planned cesarean delivery for better maternal outcomes35. Placenta accreta refers to the penetration of placental villi into part of the muscular layer of the uterine wall. The implanted part of the placenta fails to detach itself during childbirth. Manual separation of the placenta may damage the myometrium, resulting in severe bleeding, perforation or maternal death37. Placental abruption is characterized by the partial or total detachment of the placenta from the uterine wall before the delivery of the fetus. In China, placental abruption has an incidence of 0.46%–2.1%38, compared with 0.3%–1.2% in other countries (US, Canada, Sweden, Norway, Denmark, Finland and Spain)39. It is a critical complication during pregnancy (after 20 weeks) or delivery40. Uterine atony is one of the most common causes of PPH. With prolonged labor, some pregnant women lose effective pressure on blood vessels, resulting in PPH41. Previous studies have shown that small for gestational age is associated with PPH42,43. Furthermore, premature delivery with gestational hypertension, severe preeclampsia, and placenta previa can significantly increase the risk of PPH44,45,46. Fibrinogen is involved in platelet aggregation during secondary hemostasis.
High prenatal fibrinogen levels are associated with a low incidence of PPH47,48. However, the effects of uterine fibroids on the risk of PPH after CD remain poorly understood. Some studies reported that uterine fibroids do not increase the risk of obstetric complications49,50. In contrast, other studies found that uterine fibroids with a diameter of over 5 cm are a risk factor for PPH after CD30,51. Our results were in accordance with the latter findings. A possible explanation is that uterine fibroids impair uterine contractility, and those with a larger diameter and specific position affect uterine contractions, increase dystocia, and increase PPH risk52. The activities of coagulation factors I, II, V, VII and X in plasma can be reflected by PT and TT. We found that PT and TT were significantly prolonged in the PPH after CD group, indicating impaired coagulation in the study group compared to the control group, which may be associated with massive hemorrhage53.

Studies have developed risk-factor models for PPH after CD. Zheutlin’s model19, which included 24 unique features from the United States, had an AUC of 0.71 (95%CI: 0.69–0.72), while Wu’s54 model, which was built using 35 radiomic features, had an AUC value of 0.83 (95% CI: 0.75–0.91). Several studies have also constructed prediction models for blood transfusion after CD. Ahmadzia’s8 model, which was based on prenatal and intrapartum variables from 19 medical institutions in the United States, demonstrated an AUC value of 0.83 (95% CI: 0.81–0.84). Kang’s18 model, which involved 5 risk factors from South Korea, had an AUC value of 0.83 (95% CI: 0.70–0.92). The parameters of these models are shown in Supplementary Table 3. Compared to the aforementioned models, the risk-factor model constructed using RF in this study performed better.

Based on 10 influencing factors, we developed two web-based tools for PPH after CD, which could be applied in the participating hospital units. Healthcare workers or patients can obtain a probability score for PPH by entering all or some of the 10 factors. For high-risk patients, reasonable measures could be taken for PPH prevention, such as ameliorating the patient’s anemia, correcting the patient’s coagulation disorders with medication before CD, and preparing additional plasma or serum in advance. Many of the 10 influencing factors could not be modified. Therefore, further studies should focus on improving the performance of this model by increasing the sample size, supplementing other types of patient populations (such as patients with coagulation disorders), and screening other influencing factors (such as age and ethnicity).

In conclusion, we identified several risk factors for PPH after CD, including pregnancy with uterine fibroids, anemia before delivery, placenta previa, placenta accreta, placental abruption, uterine atony, small for gestational age, prolonged PT, prolonged TT, and low fibrinogen. Furthermore, we developed a risk-factor model to predict the risk of PPH after CD.

Limitations

This study has several limitations. First, all of the data were obtained from hospital units in southwest China, which may have caused a selection bias and a clustering effect in our model. Second, variables with a missing rate ≥ 30% were not included in this study. Therefore, further analysis should be performed to establish whether the excluded factors are associated with PPH after CD. Third, some variables, such as gestational age, had a relatively high missing rate. Although we used the miss-forest algorithm to fill in the missing data, this may still lower the validity of the model. Fourth, some risk factors, such as PPH history, urgent or elective CD, cervical dilation, first or second stage of delivery, and fibroid size, could not be obtained from the platform, reducing the clinical utility of the model. Fifth, this study did not strictly adhere to the TRIPOD statement: no external independent data were available to validate our results, which limits the generalizability of the model. Sixth, most of the variables in the final model could not be modified, and just a few variables were available during the CD; thus, it is difficult to apply the model in clinical decision-making. Admittedly, the protocols were not the same in all seven hospitals, and we did not distinguish between elective and urgent surgeries, or between induced and spontaneous onset of labour according to labour stage. Finally, this study lacked data regarding indications for CD. Therefore, there is heterogeneity in our study, and more studies are needed to support our results.