Introduction

Gestational diabetes mellitus (GDM) is characterised by any degree of glucose intolerance that either develops or is first identified during pregnancy [1]. It encompasses cases of previously undiagnosed glucose intolerance that may have existed before or emerged during pregnancy, regardless of subsequent management approaches, such as dietary modification or insulin therapy, and whether the condition persists post-pregnancy [2]. Regional disparities in GDM prevalence are evident, with the highest rates found in the Middle East and North Africa (12.9%), followed by Southeast Asia (11.7%), the Western Pacific (11.7%), and South and Central America (11.2%), and the lowest rates in North America and the Caribbean (7.0%) and Europe (5.8%) [3]. GDM is a widespread pregnancy complication, affecting 1–14% of pregnancies worldwide, with variations influenced by patient ethnicity and diagnostic criteria [4, 5]. The impact of GDM on maternal and fetal health is significant, often leading to preterm delivery, cesarean section, and excessive fetal growth, as well as hyperinsulinemia, hypoglycemia, and hyperbilirubinemia in newborns [6, 7, 8]. Additionally, GDM can progress to Type 2 Diabetes Mellitus (T2DM) and is associated with birth-related complications, visceromegaly, fetal macrosomia, and an increased risk of metabolic disorders for both mother and child, including hypertension, obesity, and metabolic syndrome [9, 10].

The precise pathophysiological mechanisms of GDM remain incompletely understood, but hormonal imbalances, impaired insulin sensitivity, and pancreatic β-cell malfunction are suggested contributors [11]. About 16% of pregnancies globally are affected by hyperglycemia, and 84% of these cases are classified as GDM [12]. GDM significantly contributes to the onset of T2DM in both mothers and offspring, emphasising the importance of effectively managing blood glucose levels during pregnancy to prevent and reduce the prevalence of T2DM in future generations [13]. Historically, screening for GDM relied on medical history, previous obstetric outcomes, and family history of T2DM. However, this approach had a failure rate of approximately 50% in detecting GDM among pregnant women. In 1973, a pivotal study recommended adopting the 50 g 1-h oral glucose tolerance test as a screening tool, which is now used by approximately 95% of obstetricians in the United States for GDM screening. In 2014, the U.S. Preventive Services Task Force (USPSTF) recommended GDM screening for all pregnant women at 24 weeks [12, 14, 15].

Early screening and diagnosis of GDM are crucial for reducing the risks of pregnancy-related complications, such as macrosomia, preterm birth, pre-eclampsia, and neonatal intensive care admissions [14, 16]. Existing diagnostic tools have limitations in this regard. To enhance the prediction of GDM, clinical, sociodemographic, and anthropometric data have been employed in traditional regression analysis-based clinical risk prediction models. Recent advancements in machine learning promise to increase the accuracy of disease perception, diagnosis, and management. For instance, Belsti et al. [17] used a predictive analysis on antenatal care records. Their model achieved 85% accuracy, 90% precision, 78% recall, 84% F1-score, 81% sensitivity, 90% specificity, 92% positive predictive value, 78% negative predictive value, and a Brier Score of 0.39, surpassing the performance of traditional statistical methods. Most outcome prediction models enable early intervention in high-risk women and cost-effective screening by identifying low-risk individuals, potentially eliminating the need for glucose tolerance tests [18]. This review explores the effectiveness of machine learning algorithms in detecting GDM, incorporating relevant studies and data on their application for GDM detection.
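
The performance measures quoted above all derive from a confusion matrix, apart from the Brier score, which is computed from predicted probabilities. The following is a minimal sketch of the standard definitions, not the evaluation code of any reviewed study; the counts and probabilities are hypothetical. Under these definitions, recall and sensitivity share one formula, as do precision and positive predictive value.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # GDM cases predicted as GDM
    fp: int  # non-GDM pregnancies predicted as GDM
    tn: int  # non-GDM pregnancies predicted as non-GDM
    fn: int  # GDM cases predicted as non-GDM


def classification_metrics(c: ConfusionCounts) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)  # same formula as positive predictive value
    recall = c.tp / (c.tp + c.fn)     # same formula as sensitivity
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": c.tn / (c.tn + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * precision * recall / (precision + recall),
    }


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical counts for 200 pregnancies, not taken from any reviewed study.
example = ConfusionCounts(tp=40, fp=10, tn=130, fn=20)
metrics = classification_metrics(example)

# Hypothetical predicted GDM probabilities and observed outcomes (1 = GDM).
example_brier = brier_score([0.9, 0.2, 0.6], [1, 0, 0])
```

A lower Brier score indicates better-calibrated probabilities, whereas the other measures are better when higher.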

Methodology

Literature search strategy

A literature search was carried out to review the role of machine learning algorithms in the early detection of GDM and their impact on fetomaternal outcomes. The following databases were searched: PubMed, Scopus, Web of Science, and Google Scholar. The search covered studies published between 2000 and September 2023. The following search string was used: (“machine learning”[MeSH Terms] OR (“machine”[All Fields] AND “learning”[All Fields]) OR “machine learning”[All Fields]) AND (“algorithms”[MeSH Terms] OR “algorithms”[All Fields]) AND (“diabetes, gestational”[MeSH Terms] OR (“diabetes”[All Fields] AND “gestational”[All Fields]) OR “gestational diabetes”[All Fields] OR (“gestational”[All Fields] AND “diabetes”[All Fields] AND “mellitus”[All Fields]) OR “gestational diabetes mellitus”[All Fields]).

Inclusion and exclusion criteria

Articles were included if they met the following criteria:

  • Published in English.

  • Peer-reviewed original studies.

  • Focused on applying machine learning algorithms in the context of GDM.

  • Included information on using machine learning in detecting or predicting GDM.

The exclusion criteria were:

  • Systematic analyses, meta-analyses, reviews, conference abstracts, case reports, editorials, and letters.

  • Studies that did not provide relevant information or data on the topic.

Study selection

Two independent reviewers (NA & EK) initially screened titles and abstracts to identify potentially relevant articles. Full-text articles were then retrieved for further evaluation. Discrepancies were resolved through discussion, and a third reviewer (GO) was consulted when necessary.

Data extraction

Data were extracted from the selected articles, including study design, sample size, characteristics of the study population, machine learning algorithms employed, predictive variables used, outcomes measured, and reported results.

Data synthesis

The findings from the selected studies were synthesised to provide an overview of the current evidence regarding the role of machine learning algorithms in the early detection of GDM and their impact on fetomaternal outcomes. Common themes, trends, and methodological differences were identified. Results were analysed and presented in a clear and organised manner.

Results

The studies in this review focused on predicting and detecting GDM through machine learning algorithms (See Table 1). Most were retrospective studies; others were cohort studies, and two were randomised clinical trials. The populations studied vary in size, from smaller cohorts of just a few thousand individuals to larger populations exceeding 30,000. The studies reviewed utilised diverse machine learning algorithms, including Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Logistic Regression, Lasso-Logistics, Gradient Boosting Decision Tree (GBDT), Deep Neural Network (DNN), Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), and various ensemble methods such as Light Gradient Boosting Machine (LGBM) and Extreme Gradient Boosting (XGBoost). Data sources include pregnancy registries, perinatal databases, clinical records, and data from health institutions or hospitals.

Table 1 Characteristics of reviewed studies

Model performance and comparison

The studies conducted by Kang et al. (2023) and Yunzhen et al. (2020) demonstrated notable outcomes in terms of model performance and comparison [21, 32]. Kang et al. [21] conducted an analysis comparing the effectiveness of two machine learning algorithms, namely Light Gradient Boosting Machine (LGBM) and XGBoost, in predicting GDM. This study revealed that XGBoost consistently outperformed LGBM when evaluated across diverse cohorts and time points, positioning it as a promising choice for the prediction of GDM. In contrast, Yunzhen et al. [32] explored the potential of machine learning methods to surpass traditional logistic regression in GDM prediction. Their results, however, indicated that several machine-learning methods fell short of outperforming logistic regression.
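
Head-to-head comparisons of this kind usually score each model on the same held-out cases with a threshold-free measure. As an illustration only, the sketch below computes the area under the ROC curve (AUC) directly from its rank interpretation; the choice of AUC as the comparison metric, the labels, and both sets of scores are assumptions made for the example and are not taken from Kang et al. or Yunzhen et al.

```python
def roc_auc(scores, labels):
    """Probability that a random positive case outranks a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels (1 = GDM) and risk scores from two models.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a_scores = [0.81, 0.30, 0.65, 0.42, 0.15, 0.38, 0.22, 0.55]
model_b_scores = [0.70, 0.45, 0.40, 0.52, 0.20, 0.61, 0.35, 0.48]

auc_a = roc_auc(model_a_scores, labels)
auc_b = roc_auc(model_b_scores, labels)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; on large cohorts a sort-based implementation would be preferable.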

Jie et al. [24] implemented diverse machine learning algorithms, including Logistic Regression, Gradient Boosting Decision Tree, XGBoost, and LightGBM. The outcome was a model with high accuracy, precision, and recall, demonstrating the potential of these algorithms to enhance GDM prediction and risk assessment. Complementing this, Yang-Ting et al. [30] introduced a clinically cost-effective 7-variable Logistic Regression model. This simplified approach offers a promising avenue for GDM prediction, making it accessible and practical for clinical applications.

Early prediction

Gabriel et al. [19] employed Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), Decision Trees (DT), Support Vector Machines (SVMs), and Multi-Layer Perceptron (MLP) models to predict GDM in the early stages of pregnancy from regular examinations. The results showed that the developed ML models and the proposed data augmentation method achieved excellent predictive performance for GDM. Similarly, Jenny et al. [23] introduced a novel machine learning-based stratification system. The study utilised linear and non-linear tree-based regression models, including XGBoost, and demonstrated a straightforward method for implementing proportionate care delivery based on existing features in GDM clinics. The machine learning-based stratification system identified patients at risk of high blood glucose levels, enhancing the ability to tailor care interventions. Furthermore, Yi-xin et al. [22] used XGBoost to forecast GDM risk. In the first cohort, the model demonstrated moderate performance at pregnancy initiation and good-to-excellent performance by the end of the first trimester. However, in the second cohort, the trained XGBoost model exhibited moderate performance. The primary objective of the prospective cohort study conducted by Jingyuan Wang et al. [33] was to develop and verify an early prediction model for GDM using machine learning algorithms. Various machine learning algorithms, including Logistic Regression (LR), Random Forest (RF), Artificial Neural Network (ANN), and SVM, were employed in the study. The study findings indicate that the constructed New-Stacking model theoretically aimed for optimal specificity, accuracy, and AUC. Nonetheless, the SVM model demonstrated superior performance, specifically in sensitivity.

Jesús et al. [20] conducted a study to address the barriers to early detection of GDM in pregnant Mexican women. The study employed a machine-learning-driven method to select the best predictive variables for GDM risk. The identified variables included age, family history of type 2 diabetes, previous diagnosis of hypertension, pregestational body mass index, gestational week, parity, birth weight of the last child, and random capillary glucose.

Subsequently, an artificial neural network approach was used to build the AI-based prediction model. The developed model reached an accuracy of 70.3% and a sensitivity of 83.3%. These results indicate the model’s effectiveness in identifying pregnant women at high risk of developing GDM. Moreover, de Freitas et al. [31] conducted a study aiming to better characterise GDM in pregnant women using Attenuated Total Reflection Fourier-transform infrared (ATR-FTIR) spectroscopy. The study employed chemometric approaches, integrating feature selection algorithms along with discriminant analysis methods such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Support Vector Machines (SVM). The results obtained by Genetic Algorithm Linear Discriminant Analysis (GA-LDA) were reported as the most satisfactory, achieving accuracy, sensitivity, and specificity of 100%.

Results in diverse populations

Mukkesh Kumar et al. [26] conducted a cohort study to evaluate the predictive ability of the existing UK National Institute for Health and Care Excellence (NICE) guidelines for assessing GDM using machine learning. This study employed the CatBoost gradient boosting algorithm and the Shapley feature attribution framework for predictive modelling. The findings revealed that the existing UK NICE guidelines were insufficient to assess GDM risk in Asian women. Furthermore, the non-invasive predictive model developed in this study demonstrated superior performance to the current state-of-the-art machine learning models in predicting GDM. Similarly, Mukkesh Kumar et al. [27] built a preconception-based GDM predictor to enable early intervention. The study also aimed to assess the associations of top predictors with GDM and adverse birth outcomes. Participants were recruited from multi-ethnic groups (Chinese, Malay, Indian, or any combination of these three ethnicities). The study employed an evolutionary algorithm-based automated machine learning (AutoML) approach, incorporating the SHAP (SHapley Additive exPlanations) framework and TPOT (Tree-based Pipeline Optimization Tool). The study successfully devised a population-based predictive care solution, utilising an AutoML approach, to assess the risk of developing GDM among Asian women in the preconception period. Although the approach was effective in some contexts, the findings revealed that these algorithms were insufficient for accurately assessing GDM risk in some ethnic groups of women. This study highlights the need for population-specific considerations when addressing GDM.
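
Shapley-based attribution, as used in both studies above, assigns each predictor its average marginal contribution to a prediction across all orderings in which the predictors could be revealed. The sketch below computes exact Shapley values for a toy risk function; it is an illustration of the underlying idea, not the SHAP library or the models of Kumar et al. The function `toy_risk`, its coefficients, the feature names, and the baseline and patient values are all hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution by averaging over all feature orderings.

    Features not yet revealed in an ordering keep their baseline value.
    Cost grows factorially with the number of features, so this only
    suits toy examples.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(features):
    # Hypothetical risk score with an age-by-BMI interaction; not a fitted model.
    score = 0.02 * features["age"] + 0.03 * features["bmi"]
    if features["age"] > 35 and features["bmi"] > 30:
        score += 0.2
    return score + (0.1 if features["family_history"] else 0.0)


baseline = {"age": 28, "bmi": 23, "family_history": 0}
patient = {"age": 38, "bmi": 32, "family_history": 1}
attributions = shapley_values(toy_risk, patient, baseline)
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split equally between age and BMI because the two features trigger it symmetrically.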

Predictive models for specific cohorts

Yuhan et al. [28] conducted a randomised clinical trial to apply machine learning techniques to develop a Clinical Decision Support System (CDSS). The objective was to predict the risk of GDM, specifically in a high-risk group of women with overweight and obesity. The study employed both Random Forest and Logistic Regression models for prediction. The study successfully developed a simple yet effective model utilising machine learning algorithms to predict the risk of GDM in the first trimester. Notably, the model achieved this without relying on blood examination indexes. Li-Li et al. [29] conducted a retrospective study to investigate the application of a machine learning algorithm for predicting GDM in early pregnancy. The machine learning algorithm employed in the study was the Random Forest regression algorithm. Notably, the model identified body weight at birth and the mother’s weight as strongly predictive variables for GDM. Additionally, other variables such as colpomycosis, kidney disease, the number of births by the mother, regular menstruation, blood type, and hepatitis consistently ranked among the top 20 most influential factors and were found to be linked to GDM in the study.

Clinical data and treatment modality

Lauren et al. [25] conducted a population-based cohort study to investigate whether clinical data at different stages of pregnancy could predict the treatment modality for GDM. The focus of the study was on predicting the risk of requiring pharmacologic treatment beyond medical nutrition therapy (MNT) in pregnant women diagnosed with GDM. The study employed transparent and ensemble machine learning methods for predictive modelling, incorporating LASSO regression and a super learner. The super learner included classification and regression tree, LASSO regression, random forest, and extreme gradient boosting algorithms. The study’s findings demonstrated reasonably high predictability for GDM treatment modality at GDM diagnosis and maintained high predictability at 1 week after GDM diagnosis. In parallel, Jenny et al. [23] demonstrated the development of an innovative method for implementing proportionate care delivery based on existing features within GDM clinics. For predictive modelling, the study employed linear and non-linear tree-based regression models, including XGBoost, evaluated with metrics such as MSE (Mean Squared Error), R2 (R-squared), and MAE (Mean Absolute Error). The findings suggest that such a machine learning-based stratification system could provide an effective and practical approach for tailoring care interventions based on existing features within GDM clinics, potentially improving patient outcomes and resource allocation.
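
The three regression metrics named above can be computed directly from paired observed and predicted values. The following is a minimal sketch of the standard formulas, assuming a continuous target such as a blood glucose reading; the values are hypothetical and are not drawn from Jenny et al.

```python
def regression_metrics(actual, predicted):
    """Return MSE, MAE, and R-squared for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    # R-squared: share of variance in the observed values explained by the model.
    r2 = 1.0 - (mse * n) / total_ss
    return {"mse": mse, "mae": mae, "r2": r2}


# Hypothetical blood glucose readings (mmol/L) and model predictions.
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
scores = regression_metrics(actual, predicted)
```

MSE penalises large errors more heavily than MAE, so reporting both gives a sense of whether a model's errors are dominated by a few large misses.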

Discussion

The studies reviewed here encompass various methodologies, underlining the multifaceted nature of GDM prediction. One striking trend within this collection of studies is the detailed comparison of machine learning algorithms. Algorithms like XGBoost and Logistic Regression have demonstrated their effectiveness in GDM prediction [29]. However, it is essential to recognise that there is no one-size-fits-all solution. While XGBoost displayed superiority in several studies, comprehending the strengths and weaknesses of different algorithms becomes crucial for optimising predictive models within various contexts.

The importance of early prediction for effective GDM management cannot be overstated, and it is evident in the significant emphasis placed on this aspect in the reviewed studies [25, 34] (Fig. 1). The rationale behind early prediction lies in the potential to initiate timely interventions and provide personalised care to pregnant women at risk of developing GDM. The complications associated with GDM can have profound and long-lasting effects on both the mother and child, making early detection a critical component of effective healthcare [35]. This emphasis on early prediction is reflected in the proliferation of diverse models designed to forecast GDM risk during the early stages of pregnancy. The variety of models exemplified by the comprehensive work of Gabriel Cubillos et al. [19] underscores the collective ambition within the scientific community to enhance the accuracy and reliability of GDM predictions. The study by Gabriel Cubillos and their team is particularly noteworthy as it prioritised early prediction and explored the potential of different machine-learning models [19]. They expanded the toolkit for healthcare providers and researchers by developing and optimising twelve distinct models. These models are fine-tuned to deliver high prediction performance during the early stages of pregnancy. This multi-pronged approach allows for more comprehensive risk assessment, increasing the chances of timely interventions. The focus on early prediction is not only about identifying cases but also about developing a deeper understanding of the factors and variables that contribute to the development of GDM [36]. By emphasising the importance of early detection, these studies pave the way for tailoring interventions that can prevent or mitigate the impact of GDM. The ultimate goal is to improve maternal and fetal health outcomes by making proactive, personalised care a standard practice in obstetrics.

Fig. 1 Translating machine learning predictions into clinical interventions for gestational diabetes

Studies within this review underscore the importance of tailoring predictive models to specific populations and demographic groups when addressing the prediction and early detection of GDM [19, 23, 30]. These studies highlight that a one-size-fits-all approach is insufficient, and demographic-specific considerations are essential for constructing accurate predictive models. Mukkesh Kumar et al. [26] have made a particularly striking contribution by shedding light on the limitations of employing uniform guidelines for diverse populations, specifically emphasising the challenges faced by Asian women. Their findings reveal that traditional, broadly applicable guidelines may not adequately capture the unique risk factors and nuances associated with GDM in Asian populations. This study emphasises the necessity of considering ethnicity, genetics, and other demographic-specific factors when constructing predictive models for GDM. By doing so, healthcare providers can better identify at-risk individuals within these populations and tailor interventions and care strategies to their specific needs. Similarly, the research conducted by Yuhan Du et al. (2022) provides a compelling illustration of the potential for augmenting prediction accuracy by focusing on high-risk groups [28]. In this case, the study zeroes in on women who are overweight or obese, a demographic with a higher susceptibility to GDM. By developing a specialised clinical decision support system for this specific cohort, the study recognises the unique risk profile of these individuals. This targeted approach can enhance prediction accuracy, ensuring women at the highest risk receive the necessary attention, interventions, and care. These findings indicate the importance of healthcare equity, emphasising that predictive models must be sensitive to the diversity of the populations they serve. The one-size-fits-all approach is no longer adequate, as demographic factors significantly determine GDM risk. Future research and healthcare initiatives should take these demographic-specific considerations into account when designing predictive models, ultimately leading to more accurate risk assessment and better-tailored interventions.

Lauren et al. (2022) and Jenny et al. (2022) made substantial contributions to the field by emphasising the importance of integrating clinical data into the predictive models for GDM [23, 25]. These studies provide valuable insights into how leveraging clinical data can enhance the treatment and care delivery for individuals diagnosed with GDM, ultimately improving patient outcomes. The integration of clinical data into predictive models offers several crucial advantages. First and foremost, it enables healthcare providers to personalise and optimise the treatment and care for pregnant individuals diagnosed with GDM. By considering clinical data such as responsiveness to medical nutrition therapy, they can tailor interventions to each patient’s specific needs. This individualised approach is essential, as GDM management can vary significantly from one person to another [37]. Furthermore, incorporating clinical data fosters a more patient-centred approach to care. It ensures that the treatment plan aligns with the patient’s specific health profile, preferences, and response to interventions. This patient-centred approach can improve patient satisfaction, compliance, and overall well-being. Jenny et al. [23] introduced the concept of proportionate care delivery based on available clinical data. This innovative approach streamlines care and ensures that resources are allocated efficiently, addressing patients’ needs more effectively [30]. By leveraging existing clinical data, healthcare providers can identify individuals at risk of high blood glucose levels, enabling proactive intervention and reducing the likelihood of complications associated with uncontrolled GDM.

Nonetheless, it is essential to acknowledge that challenges persist within GDM prediction. A common challenge encountered is the extensive array of variables associated with GDM [38]. The condition’s multifaceted nature means numerous factors must be considered, making the selection and weighting of these variables complex [39]. While studies like that of Jie et al. (2022) have demonstrated the potential of different machine-learning models, addressing this variable complexity remains a significant challenge [24]. Researchers must continue refining their models and methodologies to accurately incorporate the full spectrum of relevant variables. Moreover, the application of ensemble methods, such as stacking, underscores the aspiration to enhance predictive performance. While these methods promise to improve accuracy, they also introduce additional layers of complexity. Studies must balance model sophistication and practicality, ensuring that predictive models can be effectively implemented in real-world clinical settings.

As technology and healthcare data evolve, future research can leverage emerging opportunities. Integrating real-time data from wearable devices, exploring genetic data, and incorporating a more comprehensive range of health-related information are all promising avenues for improving predictive models. These advanced data sources have the potential to provide a more holistic understanding of GDM risk, leading to more accurate and timely predictions [40]. Furthermore, future research should consider the holistic context in which GDM occurs. Focusing on patient-centred outcomes and the social determinants of GDM can deepen our understanding of this condition. Factors such as access to healthcare, socioeconomic status, and lifestyle can significantly impact an individual’s risk of developing GDM. By considering these broader determinants, researchers and healthcare providers can develop more comprehensive and effective management strategies that address the medical aspects and the social and environmental factors influencing GDM.

Limitations and strengths of review

This review explores various studies on predicting and detecting GDM through machine learning methods. It encompasses a wide range of study designs, population groups, and machine learning algorithms, providing an inclusive overview of this field’s current state of research. However, the studies included in this review span across different geographical regions and demographic profiles. While this diversity enriches the scope of the review, it can simultaneously limit the generalizability of findings. GDM risk factors and predictive models may exhibit variations among populations, and the review would benefit from a more thorough discussion of the implications arising from this variability. Additionally, this review primarily relies on studies published in English, which might introduce publication bias, potentially overlooking negative or inconclusive results less readily available in English literature.

Conclusion

The prediction and early detection of GDM through machine learning techniques is a dynamic and evolving field. This review shows significant findings and trends across diverse studies, shedding light on the potential and challenges within this domain. The significance of early prediction in facilitating effective GDM management is striking, with numerous studies committed to crafting models capable of identifying GDM risk in the early stages of pregnancy. XGBoost emerged prominently as a consistent performer, showcasing superior predictive capabilities across various cohorts and time points. These models create opportunities for timely interventions and personalised care, ultimately improving outcomes for both mothers and infants. Nevertheless, the challenges at hand are notable. The vast array of variables associated with GDM poses a substantial hurdle in the quest for accurate prediction models. The selection and weighting of these variables remain intricate tasks, necessitating ongoing research and innovation in feature engineering. Furthermore, the emphasis on tailoring predictive models to specific populations, evident in studies focusing on Asian women or high-risk groups, underscores the importance of demographic-specific considerations. Predictive models must adapt to these groups’ unique characteristics and risk factors. The practicality of implementing proportionate care delivery based on readily available clinical data underscores the value of leveraging existing resources effectively. As technology and healthcare data continue to advance, there is an opportunity for future research to harness real-time data from wearable devices and genetic information to enhance predictive models further. These emerging data sources could revolutionise GDM prediction and early intervention. Focusing on patient-centred outcomes and exploring the role of social determinants in GDM prediction can deepen our understanding of this condition and pave the way for more comprehensive and effective management strategies that consider both medical variables and the broader contexts in which GDM occurs. This review offers valuable insights and directions for future studies in GDM prediction through machine learning techniques.