Background

Gestational diabetes mellitus (GDM) is defined as elevated blood glucose levels that are first diagnosed in pregnancy [1]. Women with GDM are more likely to experience caesarean section or preterm delivery, and babies born to women with GDM are at greater risk of macrosomia, respiratory distress, neonatal jaundice, admission to neonatal care and type 2 diabetes in later life [2, 3]. In addition to these adverse outcomes during pregnancy and delivery, women with GDM have an estimated 8-fold higher risk of developing type 2 diabetes than women who have not had GDM [4]. Up to 70% of women with GDM go on to develop type 2 diabetes, with the risk greatest in the first five years after pregnancy and plateauing at around 10 years [5, 6]. A diagnosis of GDM therefore represents an opportunity for interventions to reduce type 2 diabetes risk [7].

Around 14% of pregnant women worldwide are thought to be affected by GDM, but differences in screening approaches and diagnostic criteria result in variable estimates [8]. The diagnostic criteria used by clinicians vary considerably worldwide and have also changed over time. In the past, criteria were either based on those for glucose intolerance in non-pregnant individuals or set according to their prediction of the mother's future risk of type 2 diabetes; more recently, there has been an increasing focus on diagnostic thresholds based on their predictive value for adverse outcomes in pregnancy [9].

A clear understanding of the prevalence of GDM is essential at local and national level so that health care interventions for this group can be planned, financed and delivered. A recent review of 51 population-based studies worldwide estimated global prevalence to be 4.4% (95% CI 4.3–4.4%) [10]. Our recent meta-analysis of studies from developed countries in Europe yielded a prevalence estimate of 5.4% (95% CI 3.8–7.8%) [11], and another review of data from all European countries reported a prevalence of 10.9% (95% CI 10.0–11.8%) [12]. A meta-analysis in Eastern and South-eastern Asia yielded an estimate of 10.1% (95% CI 6.5–15.7%) [13], and another in Africa reported a prevalence of 13.6% (95% CI 11.0–16.2%) [14]. However, there has been no review of the prevalence of GDM specifically in the US or Canada. We therefore conducted a systematic review and meta-analysis of observational studies that assessed the prevalence of GDM in the general population of pregnant women in the US or Canada, regardless of the diagnostic criteria used. We calculated an overall prevalence estimate for GDM and examined variables that could have influenced this estimate.

Methods

The systematic review and meta-analysis were conducted according to the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines [13].

Data sources

A search was carried out in the MEDLINE, CINAHL, Health Source and PsycInfo databases in June 2023, with no restriction on publication date. For each database the following search terms were used: (prevalence or incidence) and (gestational diabetes or diabetes in pregnancy or gestational diabetes mellitus) and (United States or America or US* or Canada).

Study selection

The titles and abstracts of all articles were screened by one author (DS), and independent second screening was split between two other authors, with JE screening half and CE screening the other half. Full texts were retrieved for studies considered relevant and for those with insufficient information to judge relevance. These were checked against the inclusion criteria by CE and independently by JE. Where the authors disagreed about the inclusion of a paper, a consensus was reached through discussion of the full text. The reference lists of included papers were checked to identify other potentially relevant papers; experts in the field were not contacted because of the time-consuming nature of this process.

Articles were required to meet the following inclusion criteria.

Study Design

Observational study published in English.

Population

General population of pregnant women living in the US or Canada. In this context, general population referred to a sample of women not defined by clinical or other non-demographic characteristics.

Outcome measures

Prevalence of GDM diagnosed by universal screening carried out in the second or third trimester, using either an oral glucose tolerance test (OGTT) alone (one-step screening) or two-step screening with a glucose challenge test (GCT) followed by an OGTT.

Data extraction and quality assessment

Data from included papers were extracted by two authors (half by CE and half by JE) using a data extraction form based on the template provided by the Centre for Reviews and Dissemination [14]. The extracted data were independently checked by two other authors (KB, RA). The following information was recorded for each included study: first author, journal name, year of publication, country, dates of data collection, study sample type, study design, age range of sample, ethnicity, body mass index (BMI), sample size, type of screening and diagnostic test carried out, and diagnostic criteria used for GDM.

The outcome measures extracted were the number and proportion of the sample with GDM and, where reported, these measures stratified by demographic factors such as ethnicity and age. Ethnic make-up of the sample was defined as unknown or mixed, unless one ethnic group comprised more than 70% of the sample, in which case it was allocated to that ethnic grouping.
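The ethnicity rule above can be expressed as a small function. This is a minimal sketch, assuming each sample's ethnic composition is available as proportions per group; the function name `classify_ethnicity` and the group labels are hypothetical and do not come from the review's extraction form.

```python
from typing import Mapping, Optional

def classify_ethnicity(shares: Optional[Mapping[str, float]], threshold: float = 0.70) -> str:
    """Assign a sample to an ethnic grouping using the 'more than 70%' rule.

    `shares` maps hypothetical group labels to their proportion of the sample (0-1).
    Missing composition data are treated as 'unknown or mixed'.
    """
    if not shares:
        return "unknown or mixed"
    group, share = max(shares.items(), key=lambda item: item[1])
    # Strictly more than the threshold is required, as described in the text.
    return group if share > threshold else "unknown or mixed"

# Hypothetical samples.
print(classify_ethnicity({"Hispanic or Latino": 0.74, "White": 0.20, "Other": 0.06}))
print(classify_ethnicity({"White": 0.55, "Black": 0.30, "Other": 0.15}))
```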

Where confidence intervals for prevalence estimates were not reported, we calculated them where possible. Where more than one paper had been published from the same sample, only the paper reporting the most complete and definitive results was included. Where a study reported prevalence estimates according to different diagnostic criteria, only one estimate was included in the analysis to avoid dependency effects; to maximise comparability, we selected the estimate derived from the criteria most commonly used in the other papers included in the review. For studies reporting multiple prevalence estimates by other factors, such as age or year, an average of the estimates was calculated and used in the analysis.
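For studies that did not report a confidence interval, an interval can be derived from the number of women with GDM and the sample size. The sketch below uses the Wilson score interval purely as an illustration; the text does not state which method was used, so this choice is an assumption. It also shows an unweighted average of stratum-specific estimates, which is our assumed reading of "an average of the estimates". All function names are hypothetical.

```python
import math
from statistics import mean

Z95 = 1.959964  # two-sided 95% standard normal quantile

def wilson_ci(events: int, n: int, z: float = Z95) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval (assumed method)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half

def average_strata(proportions: list[float]) -> float:
    """Unweighted mean of stratum-specific prevalence estimates (assumption: no weighting)."""
    return mean(proportions)

# Hypothetical study: 69 of 1,000 women with GDM.
p, low, high = wilson_ci(69, 1000)
print(f"{p:.3f} (95% CI {low:.3f} to {high:.3f})")
# Hypothetical study reporting prevalence separately for two calendar years.
print(round(average_strata([0.052, 0.061]), 4))
```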

Included studies were quality-assessed using a checklist based on the example published by the Joanna Briggs Institute [15], which was specifically designed for assessing quality in systematic reviews of prevalence and incidence. Quality assessment was completed for all included papers by one author (CE), and a list of all identified weaknesses was compiled and then discussed by two authors (CE and JE). Papers with significant weaknesses were excluded, one such weakness being a participation rate of less than 70%. Participation rates can be defined in many ways, but for this review the participation rate (recalculated during data extraction where necessary and possible) was the proportion of eligible women sampled who completed testing for GDM. Papers were also excluded if the sample size was less than 500, if it was not clear that screening was universal, or if it was not possible to determine whether the population was a ‘general’ population. Other, less important weaknesses were common: not explicitly reporting women’s gestation at testing, limited description of the study sample, not reporting differences between participants and non-participants, not reporting who carried out glucose testing, and not reporting confidence intervals. Papers with these weaknesses were retained.
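The exclusion rules for significant weaknesses can be summarised as a simple screening function. This is an illustrative sketch only: the `StudyAppraisal` record and its field names are hypothetical, and in practice each judgement (for example, whether screening was universal) was made by the reviewers rather than computed.

```python
from dataclasses import dataclass

@dataclass
class StudyAppraisal:
    """Hypothetical record of the quality-assessment fields used for exclusion."""
    name: str
    eligible: int                    # eligible women sampled
    tested: int                      # women who completed testing for GDM
    sample_size: int
    universal_screening_clear: bool
    general_population_clear: bool

def exclusion_reasons(s: StudyAppraisal) -> list[str]:
    """Return the significant weaknesses found; an empty list means the study is retained."""
    reasons = []
    # Assumption: an unknown (zero) denominator is treated as failing the participation check.
    participation = s.tested / s.eligible if s.eligible else 0.0
    if participation < 0.70:
        reasons.append("participation rate < 70%")
    if s.sample_size < 500:
        reasons.append("sample size < 500")
    if not s.universal_screening_clear:
        reasons.append("universal screening unclear")
    if not s.general_population_clear:
        reasons.append("general population unclear")
    return reasons

print(exclusion_reasons(StudyAppraisal("Hypothetical A", 1000, 900, 900, True, True)))
print(exclusion_reasons(StudyAppraisal("Hypothetical B", 1000, 600, 450, True, False)))
```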

Data synthesis and analysis

The meta-analysis was carried out using the Comprehensive Meta-Analysis software version 3.3.070 (Biostat, Englewood, NJ). The proportion of women or deliveries with GDM in each study was transformed into a logit event rate effect size, and its standard error was calculated [16]. After analysis, the pooled logits were back-transformed to proportions. Combined effect sizes were calculated both including and excluding outlying logit event rates. No significant differences were found between these analyses, so the outliers were initially retained.
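The logit transformation and back-transformation can be written out explicitly. This sketch assumes the usual large-sample variance of a logit event rate, 1/events + 1/(n - events); it is not taken from the Comprehensive Meta-Analysis software, and studies with zero events (for which the logit is undefined and a continuity correction is normally applied) are not handled. The function names are hypothetical.

```python
import math

def logit_event_rate(events: int, n: int) -> tuple[float, float, float]:
    """Return (logit, variance, standard error) of a study's event rate."""
    if not 0 < events < n:
        # Zero or 100% prevalence would need a continuity correction (not shown).
        raise ValueError("requires 0 < events < n")
    p = events / n
    y = math.log(p / (1 - p))            # logit event rate
    var = 1 / events + 1 / (n - events)  # approximate sampling variance on the logit scale
    return y, var, math.sqrt(var)

def inverse_logit(y: float) -> float:
    """Back-transform a (pooled) logit to a proportion."""
    return 1 / (1 + math.exp(-y))

# Hypothetical study: 52 of 1,000 deliveries with GDM.
y, var, se = logit_event_rate(52, 1000)
print(round(y, 3), round(se, 3), round(inverse_logit(y), 3))
```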

A random effects model was used to combine studies for significance testing and for moderator analysis in a meta-regression, allowing for random differences between studies due to factors such as variation in procedures, measures or settings, in addition to differences due to sampling error. This accords with evidence suggesting that the variability in reported prevalence of GDM may be due to different methodologies and criteria [2]. The Q test was used to assess the homogeneity of studies; its null hypothesis is that the variability of the effect sizes is due to sampling error only. If the assumption of homogeneity is violated, sources of variation can be explored by studying moderator variables. Categorical moderator variables were analysed using an analysis of variance for meta-analysis, with tests of interaction used to explore differences between subgroups. The between-group homogeneity statistic (QB) reflects the amount of heterogeneity that can be attributed to the moderator variable, the within-group homogeneity statistic (QW) indicates the degree of heterogeneity that remains within each category, and the I² statistic shows the proportion of the variation that is due to heterogeneity rather than sampling error. Finally, a weighted multiple regression was carried out to assess which moderator variables made the greatest contribution to the variability in prevalence of GDM.
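Random-effects pooling of logit event rates, together with the Q test and I² statistic, might look like the following. It is a minimal sketch assuming the DerSimonian-Laird method-of-moments estimator of the between-study variance (τ²), which is one common choice; we do not claim it reproduces the exact computations of the software used. The p-value for Q would be obtained by comparing Q with a chi-square distribution on k - 1 degrees of freedom, which is omitted to keep the example dependency-free. The function name is hypothetical.

```python
import math
from typing import Sequence

Z95 = 1.959964

def random_effects_dl(y: Sequence[float], v: Sequence[float]) -> dict[str, float]:
    """DerSimonian-Laird random-effects pooling of logit event rates.

    y: study logits; v: their within-study variances.
    """
    k = len(y)
    w = [1 / vi for vi in v]                                  # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                             # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0             # share of variation beyond sampling error
    w_re = [1 / (vi + tau2) for vi in v]                      # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    expit = lambda x: 1 / (1 + math.exp(-x))                  # back-transform to a proportion
    return {
        "Q": q, "df": df, "tau2": tau2, "I2": i2,
        "prevalence": expit(y_re),
        "ci_low": expit(y_re - Z95 * se_re),
        "ci_high": expit(y_re + Z95 * se_re),
    }

# Hypothetical logits and variances for four samples.
result = random_effects_dl([-2.9, -2.7, -3.1, -2.4], [0.004, 0.010, 0.006, 0.008])
print({key: round(value, 4) for key, value in result.items()})
```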

Results

Description of included studies

Figure 1 shows a PRISMA flow diagram of studies identified by the search. The search identified 4,229 abstracts, of which 504 were potentially relevant after title and abstract screening. The full-text articles were retrieved and assessed against the inclusion criteria, and 54 were retained for quality assessment. Following assessment, a further 25 articles were excluded for the following reasons: eight were subsets or repeated samples of other included studies [17,18,19,20,21,22,23,24], six were cohort studies in which participants were invited to take part but eligibility criteria and/or participation rate were unclear [25,26,27,28,29,30], two had sample sizes of less than 500 [31, 32], four provided insufficient information on how the sample was derived [33,34,35,36], four used different methods to diagnose GDM within the same study without separate reporting [37,38,39,40], and one did not provide the required unadjusted data [41].

Fig. 1 PRISMA flow diagram showing study selection

The remaining 29 studies yielded prevalence estimates for 36 separate samples of women, pregnancies or deliveries, with a total sample size of 1,550,917 [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71]. The characteristics of the studies are presented in Table 1. In general, the studies fell into one of two categories. Some reported data that had been collected specifically to estimate GDM prevalence or that were available through other related ad hoc research projects. Others reported analyses of routinely collected data available either as part of national datasets or to support the operation of large Health Maintenance Organisations (HMOs).

Table 1 Characteristics of studies included in the meta-analysis

Nine studies (11 samples) were from Canada; the rest were from the US. Most studies used a two-step screening strategy, with all women screened first with a GCT, followed by an oral glucose tolerance test (OGTT) if indicated. A one-step screening strategy was used in nine samples. Thresholds for GDM diagnosis with an OGTT also varied, and we divided the studies into five categories according to the diagnostic cut-offs used (Table 1).

The most commonly used diagnostic criteria [72] were those of the National Diabetes Data Group (NDDG) which were used to diagnose GDM in ten studies as part of two-step screening and one study using a one-step strategy. Carpenter-Coustan criteria were used in eight studies, all using a two-step strategy. Two studies used O’Sullivan criteria within a two-step strategy, and five used thresholds according to Canadian guidelines (1998) [73], all of which one used a two-step strategy. The IADPSG criteria were applied in three studies, all using a one-step strategy. The diagnostic thresholds used by studies in this meta-analysis are shown in Table 2.
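The practical effect of different diagnostic thresholds can be illustrated with the commonly cited cut-offs for three of the criteria named above (plasma glucose in mg/dL). This is an illustrative sketch rather than a description of any included study: the dictionary and function names are hypothetical, only the thresholds and the number of abnormal values required are taken from the published criteria, and the O'Sullivan and Canadian (1998) criteria are omitted. Because the Carpenter-Coustan thresholds are lower than the NDDG thresholds at every time point, the same 100-g OGTT result can meet one set of criteria but not the other.

```python
# Thresholds in mg/dL and the minimum number of values that must meet or exceed them.
CRITERIA: dict[str, tuple[tuple[int, ...], int]] = {
    # 100-g, 3-hour OGTT: fasting, 1 h, 2 h, 3 h
    "Carpenter-Coustan": ((95, 180, 155, 140), 2),
    "NDDG": ((105, 190, 165, 145), 2),
    # 75-g, 2-hour OGTT: fasting, 1 h, 2 h
    "IADPSG": ((92, 180, 153), 1),
}

def meets_gdm_criteria(glucose: tuple[float, ...], criterion: str) -> bool:
    """Return True if the OGTT readings meet the named diagnostic criterion."""
    thresholds, min_abnormal = CRITERIA[criterion]
    if len(glucose) != len(thresholds):
        raise ValueError("glucose readings do not match the OGTT protocol for this criterion")
    abnormal = sum(g >= t for g, t in zip(glucose, thresholds))
    return abnormal >= min_abnormal

# Hypothetical 100-g OGTT result: abnormal under Carpenter-Coustan, normal under NDDG.
reading = (100, 185, 160, 130)
print(meets_gdm_criteria(reading, "Carpenter-Coustan"), meets_gdm_criteria(reading, "NDDG"))
```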

Table 2 Mean prevalence of GDM by several moderator variables for studies using two-step screening strategy

Mean prevalence of GDM

The overall mean prevalence of GDM in the meta-analysis including all studies was 6.9% (95% CI: 5.7–8.3). There were three outliers identified: studies that yielded prevalence estimates of 23.3%, 24.1% and 27.4%, all of which used IADPSG diagnostic thresholds. When these outliers were excluded, the prevalence estimate was 5.8% (95% CI: 5.0–6.8). Because this difference was not statistically significant, the outliers were initially retained in subsequent analyses. However, there was a statistically significant difference between studies that used a one-step and a two-step screening strategy. The mean GDM prevalence using a one-step strategy was 13.7% (95% CI: 10.7–17.3) compared with 5.2% (95% CI: 4.4–6.1) for studies using a two-step strategy. For this reason, all subsequent analyses were conducted using only studies that used a two-step strategy (with the result that the outliers were also excluded).
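As a rough check on the one-step versus two-step comparison, a between-subgroup Q statistic (QB) can be approximated from the pooled estimates and 95% confidence intervals reported above. This sketch assumes the intervals are symmetric on the logit scale (as they are when logits are pooled) and uses inverse-variance weights for the two subgroup summaries; it is an approximation from published summaries, not a re-analysis of the study-level data, and the function names are hypothetical.

```python
import math

Z95 = 1.959964

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def subgroup_from_ci(est: float, low: float, high: float) -> tuple[float, float]:
    """Recover (logit estimate, SE) from a pooled proportion and its 95% CI.

    Assumes the CI is symmetric on the logit scale.
    """
    return logit(est), (logit(high) - logit(low)) / (2 * Z95)

def q_between(groups: list[tuple[float, float]]) -> tuple[float, int, float]:
    """Between-subgroup Q from subgroup summaries (logit, SE); df = number of groups - 1."""
    w = [1 / se ** 2 for _, se in groups]
    pooled = sum(wi * y for wi, (y, _) in zip(w, groups)) / sum(w)
    qb = sum(wi * (y - pooled) ** 2 for wi, (y, _) in zip(w, groups))
    df = len(groups) - 1
    # For df = 1 the chi-square tail probability equals the two-sided normal p-value.
    p = math.erfc(math.sqrt(qb / 2)) if df == 1 else float("nan")
    return qb, df, p

one_step = subgroup_from_ci(0.137, 0.107, 0.173)
two_step = subgroup_from_ci(0.052, 0.044, 0.061)
qb, df, p = q_between([one_step, two_step])
print(round(qb, 1), df, p)
```

With these inputs QB is roughly 40 on one degree of freedom, which is consistent with the statistically significant difference reported.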

Moderator analyses

Table 2 shows the effect of different moderators on the prevalence estimate. As would be expected, the estimate varied with the diagnostic criteria used: prevalence was highest when the Carpenter-Coustan criteria were used and lowest with the NDDG criteria. There were no statistically significant differences in mean GDM prevalence between studies carried out in the US and Canada. Prevalence estimates tended to be higher the later the data collection period started, but there was no significant trend according to when the data collection period ended. Only 15 and 11 studies, respectively, reported the mean age and the proportion of nulliparous women in the sample; studies with higher proportions of nulliparous women and those with a mean age under 30 had lower GDM prevalence estimates, but these differences were not statistically significant. The ethnic composition of 19 of the samples was mixed or unknown. GDM prevalence estimates were slightly higher for the five samples comprising over 70% First Nations women and the three comprising over 70% Hispanic or Latino women, although these differences were not statistically significant.

The 15 studies using routinely collected data yielded prevalence estimates approximately 2 percentage points lower than those from the other studies. The GDM prevalence estimate in studies whose denominator excluded women with pre-existing diabetes was 1.6 percentage points higher than in studies that included these women, but the difference was not statistically significant. The estimate from the 12 studies in which the sample was defined as pregnancies or deliveries, so that a woman could be included more than once, was similar to that from studies where it was stated or implied that each woman could be included for only one pregnancy or delivery (n = 13).

Multivariate analysis

Based on the moderator analysis, a weighted multiple regression was performed to explore which moderator variables made the greatest contribution to the variability in prevalence of GDM (Table 3). Correlations between the variables were examined to inform variable selection for the multivariate analysis, but no statistically significant correlations were found. Diagnostic criteria, start of the data collection period, whether routinely collected data were used, and how the sample was defined were statistically significant in the moderator analyses and were included in the final multiple regression model.
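A weighted (random-effects) meta-regression of logit prevalence on moderator variables can be sketched with NumPy. This assumes a method-of-moments estimate of the residual between-study variance and weights of 1/(vᵢ + τ²); categorical moderators such as diagnostic criteria would be entered as dummy-coded columns. The function name and inputs are hypothetical, and this is not presented as the algorithm implemented in the software used for the review.

```python
import numpy as np

def meta_regression(y, v, X):
    """Method-of-moments random-effects meta-regression (weighted least squares).

    y: study logits; v: within-study variances; X: moderator columns (no intercept).
    Returns coefficients (intercept first), their SEs, residual Q_E, its df and tau^2.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    k, p = X.shape

    # Fixed-effect fit gives the residual heterogeneity statistic Q_E.
    W = np.diag(1 / v)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta_fe = XtWX_inv @ X.T @ W @ y
    resid = y - X @ beta_fe
    q_e = float(resid @ W @ resid)
    df = k - p

    # Method-of-moments estimate of the residual between-study variance.
    P = W - W @ X @ XtWX_inv @ X.T @ W
    tau2 = max(0.0, (q_e - df) / np.trace(P))

    # Random-effects weights 1 / (v_i + tau^2) give the final coefficients.
    W_re = np.diag(1 / (v + tau2))
    cov = np.linalg.inv(X.T @ W_re @ X)
    beta = cov @ X.T @ W_re @ y
    return beta, np.sqrt(np.diag(cov)), q_e, df, tau2

# Hypothetical data: one dummy moderator (1 = Carpenter-Coustan, 0 = NDDG)
# and the start year of data collection centred on 2000.
y = [-3.0, -2.9, -2.6, -2.5, -2.8, -2.4]
v = [0.004, 0.006, 0.005, 0.003, 0.008, 0.004]
X = [[0, -5], [0, 0], [1, 3], [1, 8], [0, 5], [1, 10]]
beta, se, q_e, df, tau2 = meta_regression(y, v, X)
print(beta.round(3), round(q_e, 2), df, round(tau2, 4))
```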

Table 3 Univariate and weighted multiple regression of GDM prevalence

The results of the meta-regression indicated that, overall, the covariates explained 57% of the total observed variability (R² analog = 0.57; QR = 71.97, df = 8, p < 0.001). However, the test of residual heterogeneity was statistically significant (QE = 1163.83, df = 14, p < 0.001, I² = 98.8%), confirming that some variability in the data was not explained by the moderator variables. Of the variables that were significant in the univariate analysis (diagnostic criteria, start period of data collection, routine dataset, how the sample was defined), only diagnostic criteria and period of data collection remained significant when the other variables were held constant.
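The reported I² follows directly from the residual heterogeneity statistic and its degrees of freedom. The R² analog is commonly defined as the proportion of the between-study variance explained by the covariates; we assume that definition here, as the values of τ² are not reported in the text.

```latex
\[
I^2 = \frac{Q_E - df}{Q_E} \times 100\% = \frac{1163.83 - 14}{1163.83} \times 100\% \approx 98.8\%
\]
\[
R^2_{\text{analog}} = 1 - \frac{\tau^2_{\text{residual}}}{\tau^2_{\text{total}}} = 0.57
\]
```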

Discussion

This meta-analysis of 32 samples of pregnant women in the US and Canada yielded prevalence estimates for GDM of 11.8% using a one-step screening strategy and 5.0% using a two-step screening strategy, with an overall estimate of 5.9%. The overall estimate was higher than estimates from meta-analyses in Europe (5.4%) [9] and globally (4.4%) [11], but lower than that for Eastern and South-eastern Asia (10.1%) [10]. A higher estimate associated with a one-step screening strategy was also observed in the European and Asian studies, with our one-step and two-step estimates again higher than the respective estimates for Europe but lower than those for Asia [9, 11].

The methods of this systematic review were robust and followed a pre-determined protocol. Independent reviewers screened all results returned by the search, and decisions on the inclusion of papers were discussed and made by two authors. Limitations of the review include that non-English language papers were excluded, experts in the field were not contacted, grey literature was not identified, and data from each paper were extracted by only one author.

The higher prevalence observed in the US and Canada in the present review compared with Europe [9] may reflect differences in the prevalence of obesity between these populations. Women who are obese have significantly increased odds of developing gestational diabetes even after controlling for confounders [74]. In 2021, 41.8% of women in the USA and 22% of women in Canada were obese, whereas rates in the developed European countries included in the European systematic review [9] ranged from 9.7% in Italy to 20.4% in the UK, with an average of 16.3% [World Obesity 2021].

Differences in prevalence estimates between studies in this review were related not only to the screening approach (one-step or two-step) but also to the use of different diagnostic thresholds, with estimates obtained using NDDG thresholds and Canadian guidelines significantly lower than those using Carpenter-Coustan thresholds. The IADPSG thresholds yielded very high estimates, as has consistently been reported [75]. When stratified by diagnostic category, US and Canadian estimates in our meta-analysis were higher for two of the three categories that could be directly compared with the European study, further supporting the suggestion of underlying differences in GDM prevalence between these regions linked to obesity prevalence. The effect of diagnostic category on GDM prevalence was less pronounced in the multivariate meta-regression. This was also the case for a later start of data collection, which was associated with increased GDM prevalence in univariate analysis but showed no independent effect after adjustment for diagnostic category in the multivariate analysis. However, the defined periods of data collection were relatively wide, so a temporal trend of increasing prevalence cannot be ruled out. Samples of women with a mean age over 30 years yielded higher estimates of GDM than samples with a lower mean age, although the difference was not statistically significant. Fewer than half of the studies reported age, making it difficult to assess the effect of age on our results or to compare them with other studies.

One of the challenges of a meta-analysis is the heterogeneity of methods used in different studies. We attempted to include studies that used similar methods in order to minimise differences in prevalence estimates due to differences in settings, procedures and clinical factors. We defined a general population of pregnant women as one that was not considered high risk or defined by other clinical characteristics. This could be a geographical (neighbourhood, regional or national) population, or the catchment population of a single medical centre or hospital or of a group of them, provided that they did not serve a high-risk group. However, studies differed according to whether they used routinely collected data, where the denominator could be very large and included all enrolled women, or data collected within a specific research study, where women often needed to be recruited and to give consent. Studies using routinely collected data tended to produce lower estimates. Furthermore, some of these studies used data from large Health Maintenance Organisations, whose populations are not necessarily socio-demographically representative of the overall population and tend to be relatively affluent.

This review has shown that technical differences in how the denominator or sample is defined can also have substantial effects on prevalence estimates. Most studies in this review used pregnant women as the sample, with some restricting this to primiparous women. Where pregnancies or deliveries were the sampling unit, either the first or a randomly selected delivery in the study period might be used, while other studies could include the same woman twice. Furthermore, not all studies excluded stillbirths or explicitly indicated that analyses were restricted to singleton pregnancies. It was not possible to perform moderator analyses on all of these differences because the requisite information was not always available, but we did show that studies using pregnancies or deliveries as the sampling unit yielded lower estimates overall, and that excluding women with pre-existing diabetes from the denominator substantially increased the prevalence estimate. Given the increasing prevalence of prediabetes and diabetes in women of reproductive age, this methodological detail could become increasingly important. The complexities of defining and diagnosing GDM highlighted in this review are likely to persist as technology in this area develops. Continuous glucose monitoring has recently been shown to potentially detect abnormal glucose levels in women with a negative OGTT result [76], and HbA1c has previously been considered and used as a diagnostic tool [77]. These developments further highlight the need for clarity in the conduct and reporting of epidemiological research on GDM, so that new technology can be evaluated and compared with more established diagnostic tools.

Conclusion

This meta-analysis points to a slightly higher prevalence of GDM in the US and Canada than in Europe. However, much of the variability between estimates remained unexplained in the meta-regression. The combined effects of technical methodological differences and variation in the composition of different samples are likely to account for a high proportion of this residual variability. This strengthens the case for standardised epidemiological protocols for estimating the prevalence of GDM, so that trends over time can be monitored accurately and meaningful local, national and international comparisons can be made.