Introduction

Gestational diabetes mellitus (GDM) is one of the most common complications of pregnancy and is caused by the combined effects of environmental and genetic factors. GDM increases the risk of maternal infection and preeclampsia and is associated with premature birth, fetal malformation, and macrosomia; it also significantly increases the risk of type 2 diabetes mellitus (T2DM) and metabolic diseases in mothers and offspring [1]. With rapid socioeconomic development and improved living standards, the prevalence of GDM is continuously increasing. According to data from the International Diabetes Federation (IDF) [2], the global prevalence of GDM was 16.7% in 2021. It has become a significant cause of maternal and child mortality worldwide. Thus, early diagnosis and treatment of GDM are crucial to reducing short- and long-term complications in the mother and child [3].

A previous study [4] found that, before diagnosis, women at high risk of GDM already tended to have elevated blood glucose, and that this hyperglycemic environment had adverse effects on the fetus. Therefore, identifying pregnant women at high risk of GDM in early pregnancy and providing targeted intervention could reduce the occurrence of the disease, thereby reducing maternal and fetal complications and improving prognosis. Although some early-pregnancy GDM prediction models have been proposed, most have not been widely applied clinically. This study aimed to explore the risk factors for GDM and to construct a new and accurate GDM risk prediction model that would improve the specificity and sensitivity of GDM prediction and provide a theoretical basis for early screening, monitoring, and intervention in high-risk patients.

Materials and methods

General data: A total of 6000 pregnant women who underwent a routine prenatal examination in Qinhuangdao Maternal and Child Health Hospital (Qinhuangdao City, Hebei Province, China) from January 2016–2022 were retrospectively selected and randomly divided into a modeling cohort (4200 cases) and a validation cohort (1800 cases) at a ratio of 7:3. The results of the oral glucose tolerance test (OGTT) at 24–28 weeks of pregnancy were recorded, and patients were categorized into non-gestational diabetes mellitus (NGDM) and gestational diabetes mellitus (GDM) groups. The modeling cohort comprised 2975 NGDM cases and 1225 GDM cases, while the validation cohort comprised 1281 NGDM cases and 519 GDM cases. This study was approved by the ethics committees of Qinhuangdao First Hospital and Qinhuangdao Maternal and Child Health Hospital.
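The random division into 4200 modeling and 1800 validation cases can be illustrated with a minimal sketch. The paper does not describe the randomization procedure or software option used, so the function name `split_cohort`, the integer case identifiers, and the seed are assumptions made only for illustration.

```python
import random


def split_cohort(case_ids, n_model, seed=0):
    """Randomly split case identifiers into modeling and validation cohorts.

    The seed and the shuffle-then-slice approach are assumptions; the paper
    only states that the division was random.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)          # random permutation of all cases
    return shuffled[:n_model], shuffled[n_model:]


# Hypothetical identifiers 0..5999 standing in for the 6000 pregnant women.
modeling, validation = split_cohort(range(6000), n_model=4200)
print(len(modeling), len(validation))
```

Fixing the seed keeps the split reproducible, which matters when the same validation cohort is later used to check the model.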

The results of the OGTT at 24–28 weeks of pregnancy were evaluated according to the GDM diagnostic criteria recommended by the International Association of Diabetes and Pregnancy Study Groups [5]: fasting blood glucose ≥ 5.1 mmol/L, 1 h blood glucose ≥ 10.0 mmol/L after the glucose load, and 2 h blood glucose ≥ 8.5 mmol/L; the diagnosis was confirmed when blood glucose levels met one or more criteria.
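The diagnostic rule above can be written as a short function. This is a sketch only: the function name `is_gdm` is hypothetical, all glucose values are assumed to be in mmol/L, and all three thresholds are treated as inclusive (≥), as in the published IADPSG recommendation.

```python
# OGTT thresholds in mmol/L; all treated as inclusive (assumption stated above).
FASTING_CUTOFF = 5.1
ONE_HOUR_CUTOFF = 10.0
TWO_HOUR_CUTOFF = 8.5


def is_gdm(fasting, one_hour, two_hour):
    """Return True when at least one OGTT value meets its threshold."""
    return (
        fasting >= FASTING_CUTOFF
        or one_hour >= ONE_HOUR_CUTOFF
        or two_hour >= TWO_HOUR_CUTOFF
    )


# Hypothetical OGTT results, for illustration only.
print(is_gdm(4.8, 9.2, 7.9))   # no threshold met
print(is_gdm(5.3, 9.2, 7.9))   # fasting value meets its threshold
```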

Exclusion criteria: ① Diabetes combined with pregnancy and overt diabetes during pregnancy; ② History of glucocorticoid application; ③ Hypertension; ④ Polycystic ovary syndrome; ⑤ Connective tissue diseases; ⑥ Liver and kidney diseases; ⑦ Other chronic diseases and pregnancy complications; ⑧ Recent history of acute infection; ⑨ Pregnant women with missing medical records.

Methods

Determination of basic human body parameters: age, height, pre-pregnancy weight, mid-pregnancy weight, and other general conditions were recorded. The body mass index (BMI) was calculated using the formula: BMI = weight (kg)/height² (m²).
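As a worked instance of the BMI formula, the following sketch assumes weight in kilograms and height in meters; the function name `bmi` and the example values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical pre-pregnancy measurements, for illustration only.
print(round(bmi(60.0, 1.60), 2))
```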

Laboratory test data at 8–12 weeks of gestation were collected, including glycated hemoglobin (HbA1c), triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), uric acid (UA), blood urea nitrogen (BUN), creatinine (CREA), alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyltransferase (GGT), free triiodothyronine (FT3), free thyroxine (FT4), and thyroid-stimulating hormone (TSH).

Statistical methods: All statistical analyses were conducted using SPSS 25.0 software. Measurement data were expressed as mean ± standard deviation (\(\overline{x}\)±sd), and the two-sample t-test was used for comparison between pregnant women in the GDM and NGDM groups; enumeration data were expressed as [cases (%)], and the χ² test was used for comparison between groups. Indicators that differed significantly were entered into logistic regression to assess their association with GDM, and the independently associated indicators were then used in the prediction model. The discriminative power of the model was assessed using the receiver operating characteristic (ROC) curve. The Hosmer–Lemeshow (H–L) test was used to evaluate the calibration of the model. P < 0.05 indicated a statistically significant difference.

Results

Comparison of general and laboratory indices of pregnant women in the modeling and validation cohorts

In the modeling cohort, no significant differences were detected between the NGDM and GDM groups in height, weight gain in the second trimester, or first-trimester FT4, TSH, ALT, AST, BUN, and TC (P ≥ 0.05), while significant differences were noted in age, pre-pregnancy weight, mid-pregnancy weight, pre-pregnancy BMI, HbA1c, FT3, GGT, UA, CREA, TG, HDL-C, LDL-C, fasting plasma glucose (FPG), and blood glucose at 1 h and 2 h of the OGTT (P < 0.05; Table 1).

Table 1 Comparison of the results of general pregnancy status and laboratory indicators in the modeling cohort

In the validation cohort, no significant differences were detected between the NGDM and GDM groups in mid-gestational weight gain, FT4, TSH, AST, BUN, CREA, TC, and LDL-C (P ≥ 0.05). Conversely, age, height, pre-pregnancy weight, mid-pregnancy weight, pre-pregnancy BMI, HbA1c, FT3, ALT, GGT, UA, TG, HDL-C, FPG, and blood glucose at 1 h and 2 h of the OGTT differed significantly between the two groups (P < 0.05; Table 2).

Table 2 Comparative results of general conditions and laboratory indicators of pregnant women in the validation cohort

Multivariate logistic regression analysis and establishment of the prediction model

Multivariate logistic regression analysis was conducted on the variables that differed significantly in the modeling cohort (P < 0.05; Table 1). The results showed that age, BMI, first-trimester HbA1c, UA, TG, and HDL-C were independent risk factors for GDM (P < 0.05; Table 3), with HDL-C entering the model with a negative coefficient; these variables were therefore included in the logistic regression model: Y = 1/{1 + exp[−(−18.373 + age × 0.065 + BMI × 0.030 + first-trimester HbA1c × 2.519 + UA × 0.014 + TG × 0.224 − HDL-C × 0.635)]}.
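The regression equation can be evaluated directly, as in the following sketch. The intercept and coefficients are those reported above; the function name `gdm_risk` and the example patient are hypothetical. The units are assumptions: age in years, BMI in kg/m², HbA1c in %, UA in μmol/L, and TG in mmol/L follow the cutoffs quoted elsewhere in the paper, while mmol/L for HDL-C is assumed because the paper does not state it.

```python
import math

# Intercept and coefficients as reported in the logistic regression equation.
INTERCEPT = -18.373
COEF_AGE = 0.065
COEF_BMI = 0.030
COEF_HBA1C = 2.519
COEF_UA = 0.014
COEF_TG = 0.224
COEF_HDL_C = -0.635   # negative sign: higher HDL-C lowers the predicted risk


def gdm_risk(age, bmi, hba1c, ua, tg, hdl_c):
    """Predicted probability Y of GDM from first-trimester predictors.

    Units are assumptions (see the paragraph above): years, kg/m^2, %,
    umol/L, mmol/L, mmol/L.
    """
    linear = (
        INTERCEPT
        + COEF_AGE * age
        + COEF_BMI * bmi
        + COEF_HBA1C * hba1c
        + COEF_UA * ua
        + COEF_TG * tg
        + COEF_HDL_C * hdl_c
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient, for illustration only.
risk = gdm_risk(age=30, bmi=22.0, hba1c=5.0, ua=220.0, tg=1.5, hdl_c=1.8)
print(f"predicted probability of GDM: {risk:.3f}")
```

The probability threshold that separates high-risk from low-risk women is not reported in this section, so the sketch returns the probability rather than a classification.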

Table 3 Results of the multivariate logistic regression analysis

The ability of each of the above indicators to predict the risk of GDM was analyzed using ROC curves; the predictive ability of every individual indicator was low (Table 4 and Fig. 1).

Table 4 The ability of each indicator to predict the risk of GDM
Fig. 1 ROC curve of each indicator predicting the risk of GDM

Evaluation of the predictive performance of the GDM prediction model. Discrimination test: The ROC curve was plotted with sensitivity as the ordinate and 1 − specificity as the abscissa. The area under the ROC curve (AUC) was 0.803 (P < 0.01), and the 95% confidence interval (CI) was 0.788–0.817. The Youden index was 0.455, the sensitivity was 72.0%, and the specificity was 73.5% (Table 4 and Fig. 1). Calibration test: The H–L test was used to assess the goodness-of-fit of the model. The values for the GDM risk prediction model were χ² = 4.436, P = 0.816; thus, no significant difference was detected between the predicted and the actual observed values (P > 0.05).
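The discrimination measures reported here follow from simple definitions: the Youden index is sensitivity + specificity − 1, and the AUC equals the probability that a randomly chosen GDM case receives a higher predicted risk than a randomly chosen NGDM case. The sketch below illustrates both; the analysis in the paper was run in SPSS, so the function names and the toy scores are assumptions made for illustration.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1 (both as fractions)."""
    return sensitivity + specificity - 1.0


def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 for GDM and 0 for NGDM.
    The pairwise loop is quadratic, which is fine for a small illustration.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Reported sensitivity and specificity of the modeling cohort.
print(round(youden_index(0.720, 0.735), 3))

# Toy predicted risks and outcomes, for illustration only.
print(auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```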

Evaluation of the GDM risk prediction model in the validation cohort: The first-trimester GDM prediction model was applied to the validation cohort, and the ROC curve was constructed. The AUC was 0.782 (P < 0.01), and the 95% CI was 0.759–0.806. The Youden index was 0.424, the sensitivity was 68.6%, and the specificity was 73.8% (Table 4 and Fig. 2). The H–L test for the first-trimester GDM risk prediction model gave χ² = 5.591, P = 0.693; thus, there was no significant difference between the predicted and the actual observed values (P > 0.05).
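The P values of the H–L test can be checked from the reported χ² statistics. The sketch below assumes the conventional grouping into 10 risk groups, which gives 10 − 2 = 8 degrees of freedom; the paper does not state the number of groups, so this is an assumption. For an even number of degrees of freedom the upper-tail probability of the χ² distribution has a closed form, which avoids any external library. Under this assumption the results are close to the reported P = 0.816 and P = 0.693.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, even df only.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0          # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i    # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


HL_DF = 8  # assumption: 10 risk groups minus 2

print(round(chi2_upper_tail_even_df(4.436, HL_DF), 3))   # modeling cohort
print(round(chi2_upper_tail_even_df(5.591, HL_DF), 3))   # validation cohort
```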

Fig. 2 Area under the ROC curve of the validation cohort

Discussion

GDM risk prediction model

GDM has gained increasing attention due to its hazardous outcomes and long-term adverse effects on mothers and offspring. Early detection and standardized management of GDM are essential to improve maternal and fetal outcomes [6]. Several groups have established GDM prediction models for early pregnancy in order to identify high-risk women and intervene early, thereby reducing the occurrence of the disease and its complications and improving maternal and child outcomes. Sweeting et al. [7] included previous GDM, family history of diabetes, age, race, parity, and BMI in their model, and the AUC was 0.88. When the model incorporated new maternal markers, such as pregnancy-associated proteins, lipocalin-2, and triglycerides, the AUC was 0.91. The extended Sweeting model identified pregnant women at high risk of GDM more accurately than the original model but lacks external validation [8]. The prediction model of Teede et al. [9] includes previous GDM, family history of diabetes, maternal age, pre-pregnancy BMI, and race. This model was simple and suitable for clinical application, but its predictive efficiency was low, with an AUC of 0.70. Wang et al. [10] applied four methods to establish a GDM risk prediction model in early pregnancy. The scoring model was simple to calculate, but its AUC was 0.772, and its prediction performance was poor. The logistic regression model had a more complicated formula but a higher accuracy; the AUCs of the training and validation sets were 0.799 and 0.834, respectively. Although the machine learning models had a high accuracy, applying them in clinical practice was challenging.

Some early-pregnancy GDM prediction models have a good prediction performance but have not been widely used in clinical practice. The study of GDM prediction models in China started late, and a first-trimester prediction model that could serve as a valuable preliminary screening tool for pregnant women has not yet been established. This retrospective study analyzed the data of 6000 pregnant women. Based on the clinical characteristics and first-trimester laboratory results of these women, a risk prediction model for GDM in the first trimester was established through logistic regression. The final model included six predictors: age, pre-pregnancy BMI, first-trimester HbA1c, UA, TG, and HDL-C. The AUC of the modeling cohort was 0.803 (95% CI: 0.788–0.817), with a sensitivity of 72.0% and a specificity of 73.5%. When the equation was applied to the validation cohort, the AUC was 0.782 (95% CI: 0.759–0.806), the sensitivity was 68.6%, and the specificity was 73.8%. The P values of the H–L test for both the modeling and validation cohorts were > 0.05, indicating that the predictive model established in this study had a good fit.

Correlation between clinical features and laboratory indicators in the first trimester and GDM

Some studies [11] have shown that the risk of GDM increases linearly with the age of pregnant women, and the prevalence of GDM also increases with maternal age [12]. Li et al. [13] found that advanced age, pre-pregnancy overweight, and a history of diabetes in first-degree relatives are associated with an increased risk of GDM. In early pregnancy, age and pre-pregnancy BMI are independent risk factors for GDM, and the risk of GDM in overweight/obese women aged ≥ 35 years is 2.45 times that of normal women [14]. Our results were consistent with these findings: age and pre-pregnancy BMI were independent risk factors for the occurrence and development of GDM. However, the ability of age and pre-pregnancy BMI alone to predict GDM was low, with AUCs of 0.583 and 0.618, respectively, and low sensitivity and specificity.

In early pregnancy, high concentrations of TSH and FT3 and low concentrations of FT4 have been associated with an increased risk of GDM; pregnant women with a high FT3/FT4 ratio are more likely to develop GDM than normal pregnant women [15, 16]. Moreover, thyroid peroxidase antibody (TPOAb) positivity was also associated with an increased risk of GDM [14]. A retrospective analysis of 626 subjects [17] showed that elevated UA levels in early pregnancy were positively associated with GDM risk. High UA at 13–18 weeks of gestation is a risk factor for GDM, and in pregnant women aged ≥ 35 years, serum UA has a stronger correlation with GDM [18]. Li et al. [19] showed that high UA levels at 16–18 weeks of gestation were positively and independently associated with an increased risk of GDM, and that the risk in the highest quartile was 55.7% higher than that in the lowest quartile. In the present study, no significant difference was detected in the levels of FT4 and TSH between the two groups of pregnant women. The levels of FT3 and UA in the GDM group were significantly higher than those in the NGDM group. However, after adjusting for age, pre-pregnancy BMI, and first-trimester HbA1c, TG, HDL-C, and other factors, no correlation was established between FT3 level and GDM, while UA level remained correlated with GDM and was an independent risk factor for GDM. When first-trimester UA was > 226.55 μmol/L, the likelihood of GDM was high; the AUC was 0.693, indicating a certain predictive ability.

HbA1c reflects the average blood glucose level over the past 3 months. The HbA1c of pregnant women with GDM was significantly higher than that of pregnant women with normoglycemia. Women with higher HbA1c in the first trimester had a high risk of developing GDM [20]. Kattini et al. [21] found that the risk of GDM increased when the HbA1c level was > 5.7%, and all patients with GDM could be identified when the level was > 6.0%. FPG, OGTT 1 h blood glucose, OGTT 2 h blood glucose, and HbA1c in early pregnancy are critical predictors of GDM, among which the 1 h blood glucose level has the most significant predictive value [22]. Another study [23] found that the levels of TC and TG were significantly different between the GDM and the non-GDM groups. Cao et al. [24] speculated that, compared to normal pregnant subjects, TG, TC, low-density lipoprotein (LDL), and very low-density lipoprotein (VLDL) were significantly higher in GDM patients, whereas the high-density lipoprotein (HDL) concentration was lower. In this study, first-trimester HbA1c, TG, and HDL-C were independent risk factors for GDM, but HDL-C had no independent predictive effect on GDM. The AUCs of first-trimester HbA1c and TG were 0.722 and 0.692, respectively, and the optimal cutoff points for predicting GDM were 5.05% and 1.53 mmol/L, respectively. Thus, focusing on the glucose and lipid metabolism of pregnant women in the first trimester to prevent the occurrence of GDM is imperative.

A single index, such as first-trimester HbA1c, can predict the occurrence of GDM to some extent, but its sensitivity and specificity are low, and whether it can serve as a stand-alone predictor needs to be investigated further. Moreover, the current study found that, compared to individual indicators, the risk prediction model combining age, pre-pregnancy BMI, and first-trimester laboratory indicators increased the AUC from 0.583–0.722 to 0.803 and also improved the sensitivity and specificity.

This study mainly used factors that are easy to obtain, identify, and intervene on, such as the results of early-pregnancy checkups, as predictors, and evaluated thyroid function indicators and UA in the first trimester as candidate predictors for the GDM risk prediction model, to provide a basis for the identification of groups at high risk of GDM. Nevertheless, the present study has several limitations. First, only pregnant women from Qinhuangdao City were included, so the findings cannot be generalized directly to other regions. Second, although the model fit was good, the AUCs of the modeling cohort and the validation cohort were 0.803 (95% CI: 0.788–0.817) and 0.782 (95% CI: 0.759–0.806), respectively, indicating only moderate predictive power. Third, the prediction model has not been validated externally and requires further testing in other populations.