Introduction

Gestational diabetes mellitus (GDM) is the commonest medical condition in pregnancy [1], affecting 2–9% of pregnancies [2, 3], and its prevalence is increasing [4]. Precise data on prevalence are lacking, not least because of the lack of international agreement regarding diagnosis.

Gestational diabetes mellitus is associated with a number of adverse fetal and maternal outcomes, and many have argued that detecting GDM and treating maternal hyperglycaemia may benefit mother and baby both during and immediately after pregnancy [5]. GDM is also a risk factor for subsequent type 2 diabetes in the mother, so a diagnosis of GDM may provide an opportunity to intervene through lifestyle modification to prevent or delay its onset.

Until recently there was a dearth of good-quality evidence to demonstrate that screening and treatment to reduce maternal hyperglycaemia improves outcomes. Consequently, there has been considerable professional disagreement and concomitant variation in screening practice. The American College of Obstetricians and Gynecologists recommended selective screening until 1994 but now recommends universal screening in certain high-risk settings [6]. The American Diabetes Association (ADA) recommended universal screening in 1996 but then revised its recommendations in 1997, suggesting selective screening of women at high risk of GDM [7]. The Australasian Diabetes in Pregnancy Society recommended that all pregnant women should be considered for screening dependent on the availability of resources [2]. Others, such as the US Preventive Services Task Force and the 2003 National Institute for Health and Clinical Excellence (NICE) Antenatal Care guideline, have questioned the role for any screening because of lack of evidence to support its use [8, 9]. Most recently, the International Association of Diabetes and Pregnancy Study Groups has recommended measuring either fasting or random plasma glucose or glycated haemoglobin in either all women or high-risk women at booking, depending on the population risk, followed by universal testing with an oral glucose tolerance test (OGTT) between 24 and 28 weeks [10].

The Australian Carbohydrate Intolerance Study (ACHOIS) was a high-quality randomised clinical trial that demonstrated that active treatment of GDM in pregnant women whose fasting glucose concentration and 2 h post-75 g glucose challenge concentration were less than 7.8 mmol/l and 7.8–11.0 mmol/l, respectively, was associated with a lower rate of serious perinatal complications than routine care (1% versus 4%, p = 0.01) [11].

Following the publication of the ACHOIS study [11], NICE commissioned a rapid update of their antenatal care guidance [12] alongside the development of new guidance for diabetes in pregnancy [13] and concluded that screening and treatment for GDM was cost-effective for the National Health Service (NHS). Since the publication of this guideline, a US study examining the effects of active treatment of mild GDM was reported by Landon et al. [14]; this study found no significant difference in the composite primary outcomes but, as in the ACHOIS study, Landon et al. found significant differences in several pre-specified secondary outcomes including lower mean birthweight, fewer large-for-gestational-age infants, fewer instances of macrosomia and fewer cases of shoulder dystocia and Caesarean delivery [14]. Both the ACHOIS and Landon et al. studies have been used in this analysis to estimate the effects of GDM treatment.

The economic model produced for NICE adopted a population perspective. Given the wide variation in GDM prevalence across the UK [15], however, the most cost-effective screening strategy might vary according to the local prevalence. Many of the screening strategies in the NICE model included GDM risk factors, either alone or in combination. To estimate GDM detection rates and the proportion of women identified for subsequent testing, a model was developed of the relationship between prevalence and the positive and negative predictive values of these risk factors. In practice, this involved a considerable simplification of the complex relationship between risk factors and prevalence. Several risk factors are not independent (e.g. age and BMI) and many are non-dichotomous variables (e.g. the risk of GDM increases with increasing BMI). Within different populations, the proportion of women with and without various risk factors varies even if the overall prevalence is similar. For example, a population of older pregnant women with a small proportion of ‘higher-risk’ ethnic groups may have the same GDM prevalence as a younger population with a higher proportion of ‘higher-risk’ ethnic groups. Therefore, the efficacy of various risk-factor-based screening strategies cannot be readily established from such a model.

A more practical drawback with such a population-based approach is that there are readily identifiable low-risk women in high-risk prevalence areas and vice versa. For such women a screening strategy based on population prevalence rather than their individual risk may be sub-optimal. Therefore, in this paper we explore the cost-effectiveness of screening and treatment of GDM based on a woman’s hypothetical individual risk of disease. The relevance of such an approach is strengthened by a recently published study that attempts to estimate the risks of GDM based on patient characteristics and medical history [16].

Methods

Model description

We developed a probabilistic decision analytic economic model of screening and treatment for GDM to evaluate the cost–utility of eight screening strategies (including a no screening/treatment strategy) at different levels of individual risk (Table 1). This is an extension of the model developed previously for the NICE guidance, which is described in detail elsewhere [12, 13]. The NICE guidance compared 21 screening strategies in addition to no screening. An executable version of the Excel model is available from www.ncc-wch.org.uk. Screening strategies were based on various risk factors (age, ethnicity, BMI and family history) and/or blood tests (random blood glucose [RBG], fasting plasma glucose [FPG] and a 1 h 50 g glucose challenge test [GCT]) followed by a diagnostic test (2 h 75 g OGTT). In the NICE model, a relationship was assumed between GDM prevalence and the proportion of the population that would be identified as being at ‘higher risk’ by the risk strategies and would therefore require screening. In this model, risk is determined exogenously and risk factor strategies are therefore not relevant.

Table 1 Screening and diagnostic strategies used in cost-effectiveness analysis

The screening strategies selected for this study each comprise a screening blood test, a diagnostic blood test, or both in sequence. They are the strategies from the NICE model that did not include an assessment of risk factors; the choice of strategies in the NICE model was guided by clinical opinion of current practice in the NHS. Our model estimates the cost-effectiveness of these strategies across a range of individual risk from 0% to 15% at intervals of 0.1%.

The 75 g 2 h OGTT was selected as the gold standard diagnostic test, and we assume it has a sensitivity and specificity of 100% for diagnosing GDM. Although the OGTT does not, in practice, have perfect sensitivity and specificity, it is a reasonable assumption given that this is the test against which the accuracy of other blood tests is referenced. The sensitivity and specificity of the screening blood tests are given at the foot of Table 1 [17–19].
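The way in which a screening test's accuracy interacts with a woman's prior risk can be illustrated with a minimal Python sketch that applies Bayes' theorem to obtain predictive values. The sensitivity and specificity used here are hypothetical placeholders (they are not the Table 1 values), and the function name is ours for illustration only.

```python
def predictive_values(risk, sensitivity, specificity):
    """Return (PPV, NPV) of a screening test at a given prior risk of GDM."""
    true_pos = risk * sensitivity
    false_neg = risk * (1 - sensitivity)
    true_neg = (1 - risk) * specificity
    false_pos = (1 - risk) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test accuracy (an assumption, NOT taken from Table 1).
SENS, SPEC = 0.80, 0.80

for risk in (0.01, 0.05, 0.15):
    ppv, npv = predictive_values(risk, SENS, SPEC)
    print(f"risk={risk:.0%}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With fixed test accuracy, the positive predictive value rises with individual risk, which is one reason why the preferred strategy may change across the risk range.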

The basic structure of the decision tree used in developing the model is shown in Fig. 1. The costs and outcomes of treatment are important determinants of the cost-effectiveness of screening, with the benefit of screening predicated on clinical and cost-effective treatment. Similarly, the cost-effectiveness of treatment is dependent on patient identification through screening and diagnosis at an acceptable opportunity cost. Therefore, treatment following screening and diagnosis is included in our model. In this analysis, the treatment has been modelled as far as possible according to the protocol used in the ACHOIS study [11].

Fig. 1

The basic model structure. For clarity, the tree is depicted as screen versus no screen, although the actual model has seven screening branches for each of the strategies (1–7) in Table 1. [+] indicates a sub-tree that has been collapsed for clarity. This tree is identical to the Treat ‘diet’ sub-tree. SPC, serious perinatal complication

Clinical effectiveness

A recently published study included a meta-analysis of five studies examining the effect of treatment on GDM [20]. Of these studies, only ACHOIS [11] and Landon et al. [14] were adjudged to have adequate randomisation. Therefore, in the base-case analysis, clinical effectiveness of treatment has been estimated based on the pooled results from these two studies. Inclusion criteria for these studies differed slightly. In both studies patients were selected based on a two-step diagnosis using GCT followed by OGTT; in ACHOIS [11] the 2 h cut-off value for diagnosis was set at 7.8 mmol/l while in Landon et al. [14] the 1 h cut-off value was 10.0 mmol/l, the 2 h value was 8.6 mmol/l and the 3 h value was 7.8 mmol/l. In the absence of data showing the relationship between diagnostic criteria and treatment effectiveness it was assumed in this analysis that treatment effect would be similar for the patients in both studies. Sensitivity analysis is presented showing the cost-effective strategies based on non-pooled results from both ACHOIS [11] and Landon et al. [14].

Costs

Costs are given in 2009 prices and denominated in UK pounds. Costs have not been discounted because they are all assumed to occur at, or close to, the time of screening or birth. They are taken from published UK sources where possible and reflect an NHS (or third-party payer) perspective. Cost inputs into the model are listed in Electronic supplementary material (ESM) Table 1. A weighted average cost for a serious perinatal complication was estimated using a cost for each individual component weighted according to its relative frequency (ESM Table 1). The model also included the costs of other outcomes from the ACHOIS [11] and Landon et al. [14] studies. Maternal outcomes included pre-eclampsia and the need for induction of labour and/or Caesarean section, all of which have cost implications. Jaundice requiring phototherapy and admission to neonatal nursery were also included. The costs associated with any adverse outcome do not factor in the costs associated with litigation or compensation. Litigation costs are typically excluded from economic evaluations in healthcare and as such we are following standard practice. Cost-effectiveness studies (as with clinical studies) are usually predicated on care being provided in a non-negligent fashion.

The cost-effectiveness of a healthcare intervention is determined by the societal willingness to pay for an additional unit of health benefit. In the UK, NICE methodology recommends an advisory threshold of £20,000 per additional quality-adjusted life-year (QALY) gained [21]. Decision-makers in other healthcare systems may choose other threshold values. Probability values for the decision model are derived from the literature as shown in ESM Table 2.

QALYs

In the model, effectiveness was measured in discounted QALYs, using an annual discount rate of 3.5% in accordance with NICE methods guidance [21]. A QALY loss associated with a serious perinatal complication was estimated. Each of the individual components of the composite outcome was assigned a QALY loss. A weighted average QALY loss for a serious perinatal complication was then estimated based on the relative frequency of each individual component (Table 2). The QALY loss associated with a stillbirth or neonatal death was estimated at 25 QALYs, which is an approximation of the discounted QALYs from a life expectancy of 80 years lived in full health.
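The discounting step can be sketched as follows. This is an illustrative calculation only: it assumes one QALY accrues at the end of each year of life, a timing convention that is our assumption and may differ from the convention used in the model.

```python
def discounted_years(years, rate=0.035):
    """Present value of one QALY per year for `years` years.

    Assumes each year's QALY is received at the end of that year
    (a hypothetical timing convention chosen for illustration).
    """
    return sum(1.0 / (1.0 + rate) ** t for t in range(1, years + 1))


# 80 years in full health, discounted at 3.5% per annum.
full_health = discounted_years(80)
print(f"Discounted QALYs over 80 years: {full_health:.1f}")
```

Under this convention the result is somewhat above the rounded figure of 25 QALYs used in the model, which is consistent with that figure being described as an approximation.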

Table 2 Serious perinatal outcomes pooled from data from ACHOIS [11] and Landon et al. [14] and combined across control and intervention groups

The QALY loss from shoulder dystocia and birth trauma is likely to be relatively small as most infants born with these complications do not suffer significant long-term morbidity. Estimates of the QALY loss from shoulder dystocia were calculated based on the QALY loss associated with brachial plexus injuries, one of the most important fetal complications of shoulder dystocia, affecting 4–16% of cases [22–24]. Most of these resolve without disability, with permanent brachial plexus dysfunction occurring in less than 10% [25]. Culligan et al. [26] estimated a health-state utility of 0.6 for permanent brachial plexus injury (mild to moderate, and including quality of life of mother and child) and a health-state utility of 0.99 for brachial plexus injuries that resolve within 2 months.

  • QALY loss from permanent brachial plexus injury:

    $$ {\hbox{Life expectancy at birth}} \times (1 - 0.6) = 32{\hbox{ QALYs}} $$
    $$ {\hbox{Discounted at }}3.5\% {\hbox{ per annum}} = 11{\hbox{ QALYs}} $$
  • QALY loss from brachial plexus injury that resolves within 2 months:

    $$ (1 - 0.99) \times 2/12 = 0.0017{\hbox{ QALYs}} $$

A weighted QALY loss for all shoulder dystocia was then estimated (see ESM Table 3). An identically weighted QALY loss for birth trauma was assumed.
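The weighting can be sketched in Python as below. The two QALY losses are those derived above. The probabilities are illustrative values chosen from within the ranges quoted in the text and are not the weights in ESM Table 3. The sketch also assumes, for simplicity, that shoulder dystocia without brachial plexus injury carries no QALY loss.

```python
QALY_LOSS_PERMANENT = 11.0    # from the text: permanent injury, discounted
QALY_LOSS_TRANSIENT = 0.0017  # from the text: injury resolving within 2 months


def shoulder_dystocia_qaly_loss(p_bpi, p_permanent):
    """Expected QALY loss per case of shoulder dystocia.

    p_bpi:       probability of brachial plexus injury given shoulder dystocia
    p_permanent: probability that such an injury is permanent
    Cases without brachial plexus injury are assumed to incur no loss.
    """
    loss_given_bpi = (p_permanent * QALY_LOSS_PERMANENT
                      + (1 - p_permanent) * QALY_LOSS_TRANSIENT)
    return p_bpi * loss_given_bpi


# Hypothetical weights (assumptions, NOT the ESM Table 3 values):
# 10% lies within the 4-16% range quoted; 10% is the quoted upper bound.
print(shoulder_dystocia_qaly_loss(p_bpi=0.10, p_permanent=0.10))
```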

In addition, data from the ACHOIS [11] study on women’s health-state utility were used in the calculation of maternal QALYs. These utilities were taken from a subgroup of women with high loss to follow-up. As a result these estimates may be subject to bias. Maternal utility estimates from this study are assumed to reflect any QALY loss to the mother associated with any maternal complication or any reduction in health-related quality of life experienced by the mother as a result of any adverse effect experienced by her baby.

Sensitivity analysis

Probabilistic sensitivity analysis is used to estimate the probability that a strategy will be cost-effective at a given willingness to pay for each additional unit of benefit. For each increment of individual risk, 10,000 simulations were undertaken in which probabilistic parameter values were sampled from a predefined probability distribution. For each simulation, standard methods of economic evaluation were used to exclude strategies that were dominated (in the strict or extended sense) [27], that is, those strategies that were less effective and more costly or which had a higher cost per effect than more effective alternatives. Incremental cost-effectiveness ratios were then calculated for the remaining strategies, with the most effective strategy within the maximum willingness-to-pay threshold (£20,000 per QALY in this case) being the preferred option. The probability distributions used for each variable in the model are shown in ESM Table 2.
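The procedure described above can be sketched in Python as follows. This is a minimal illustration and not the Excel model itself: the function names, the three example strategies and the normal distributions in `sample_strategies` are all hypothetical, and the real inputs and distributions are those in ESM Tables 1 and 2.

```python
import random

THRESHOLD = 20_000  # GBP per QALY gained, the advisory threshold cited in the text


def efficiency_frontier(strategies):
    """Return [(cost, qalys, name)] after removing dominated strategies."""
    points = sorted(
        ((cost, qalys, name) for name, (cost, qalys) in strategies.items()),
        key=lambda p: (p[0], -p[1]),
    )
    # Strict dominance: moving up in cost, keep only options that add QALYs.
    frontier = []
    for point in points:
        if not frontier or point[1] > frontier[-1][1]:
            frontier.append(point)
    # Extended dominance: ICERs must not fall along the frontier.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            low, mid, high = frontier[i - 1], frontier[i], frontier[i + 1]
            icer_mid = (mid[0] - low[0]) / (mid[1] - low[1])
            icer_high = (high[0] - mid[0]) / (high[1] - mid[1])
            if icer_mid > icer_high:
                del frontier[i]
                changed = True
                break
    return frontier


def preferred_strategy(strategies, threshold=THRESHOLD):
    """Most effective non-dominated strategy with an ICER within the threshold."""
    frontier = efficiency_frontier(strategies)
    best = frontier[0]
    for prev, cur in zip(frontier, frontier[1:]):
        icer = (cur[0] - prev[0]) / (cur[1] - prev[1])
        if icer <= threshold:
            best = cur
    return best[2]


def sample_strategies(rng):
    """One draw of (cost, QALYs) per woman. Hypothetical values, NOT model inputs."""
    return {
        "no screening": (0.0, 0.0),
        "FPG then OGTT": (rng.gauss(10, 2), rng.gauss(0.0010, 0.0003)),
        "OGTT": (rng.gauss(25, 3), rng.gauss(0.0014, 0.0004)),
    }


def probability_cost_effective(sample_fn, n_sim=10_000, seed=0):
    """Share of simulations in which each strategy is the preferred option."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sim):
        choice = preferred_strategy(sample_fn(rng))
        counts[choice] = counts.get(choice, 0) + 1
    return {name: n / n_sim for name, n in counts.items()}


print(probability_cost_effective(sample_strategies))
```

In the analysis reported here, a calculation of this kind is repeated at each increment of individual risk, so that the probability of each strategy being cost-effective can be plotted against risk.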

To reflect any concern about heterogeneity arising from pooling results from ACHOIS [11] and Landon et al. [14], the model was also run using the results of these studies separately. Finally, an analysis was undertaken in which the impact of test acceptance by women was explored.

Results

The results using the base-case data from both studies are presented in Fig. 2. The strategy that has the greatest likelihood of being cost-effective is dependent on the risk of each individual woman. When GDM risk is <1% then the no screening/treatment strategy is the most likely to be cost-effective; where risk is between 1.0% and 4.2% then FPG followed by OGTT is most likely to be cost-effective; and where risk exceeds 4.2%, OGTT alone is most likely to be cost-effective.

Fig. 2

Base-case analysis based on pooled data from ACHOIS [11] and Landon et al. [14]. Light grey, FPG + OGTT; mid-grey, OGTT; black, no screening

ACHOIS and Landon et al. studies analysed separately

ESM Figs 1 and 2 show the difference between ACHOIS [11] and Landon et al. [14] when analysed separately. These analyses allow the importance of neonatal death to model results to be assessed as there was a large difference in the point estimate of neonatal mortality of untreated GDM in these studies. Using ACHOIS [11] data, no screening only appears cost-effective if a woman’s risk of GDM is less than 0.6%. With a risk of disease between 0.6% and 2.4% a sequential strategy of FPG followed by OGTT in those with a positive FPG is the most cost-effective. Where the risk of GDM is >2.4% then OGTT is the preferred strategy.

If the model is populated using Landon et al. [14] data, then no screening is optimal where GDM risk is <4%. Between 4% and 12.7%, FPG followed by OGTT in those with a positive FPG is most cost-effective. Above 12.7%, OGTT is preferred on economic grounds.
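The risk thresholds reported for the pooled and separate analyses (all assuming 100% test acceptance) can be collected into a simple lookup, sketched below in Python. The function and dictionary names are ours, and the treatment of a risk falling exactly on a boundary is an assumption, as the text does not specify it.

```python
# Thresholds reported in the Results, expressed as proportions:
# (upper limit for no screening, upper limit for FPG followed by OGTT).
THRESHOLDS = {
    "pooled": (0.010, 0.042),
    "ACHOIS": (0.006, 0.024),
    "Landon": (0.040, 0.127),
}


def reported_strategy(risk, dataset="pooled"):
    """Strategy most likely to be cost-effective at a given individual risk."""
    no_screen_limit, fpg_limit = THRESHOLDS[dataset]
    if risk < no_screen_limit:
        return "no screening"
    if risk <= fpg_limit:  # boundary handling is an assumption
        return "FPG then OGTT"
    return "OGTT"


for name in THRESHOLDS:
    print(name, reported_strategy(0.03, name))
```

For a woman with a 3% risk, the three data sets point to three different strategies, which shows how sensitive the result is to the assumed treatment effect.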

Test acceptance rates

The above analyses made the assumption that all women invited for screening or diagnostic blood tests for GDM during pregnancy would be willing and able to attend these appointments. This assumption is a key driver of the relative cost-effectiveness of different testing strategies. The effect of changes in the test acceptance rates is shown in ESM Fig. 3. In this scenario, test acceptance rates were estimated as per the NICE model (Table 3). Where the risk is <1.6%, no screening is the most cost-effective. With a risk of disease between 1.6% and 3.6% RBG followed by a confirmatory OGTT if positive is preferred. Above a 3.6% risk of disease, a GCT followed by OGTT if positive is the most cost-effective strategy.

Table 3 Estimated blood test acceptance rates (% of women invited for test who will attend)

Discussion

This study has assessed the cost-effectiveness of screening for GDM and has shown that the preferred screening option is dependent on a woman’s individual (hypothetical) risk, the estimated reduction in perinatal death rate and the acceptability of the test to the woman.

All the analyses suggest that there is some level of risk below which it is not cost-effective to screen, although the precise level will depend on the QALY loss experienced by missed cases. The key difference in the separate analyses of the ACHOIS [11] and Landon et al. [14] studies is the weighting given to perinatal death in the calculation of the QALY loss from a serious perinatal complication. The higher the weight given to perinatal death, the greater the QALY gain from treatment and the lower the threshold risk at which testing all women with OGTT becomes optimal. This explains the different results from these analyses and, unsurprisingly, the risk at which OGTT becomes optimal in the pooled analysis lies somewhere in between.

Where test acceptance is assumed to be 100%, there is always an intermediate level of risk between the alternatives of no screening and OGTT where a sequential strategy of FPG followed by OGTT is optimal. FPG has reasonable sensitivity and at relatively low levels of risk this means that the additional number of GDM cases that would be missed is small. It is also a lower cost strategy than testing all women with OGTT. Nevertheless, it remains cost-effective to confirm a positive FPG with OGTT, as the additional cost of testing is more than offset by the saving realised by not treating false positives.
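The cost argument for confirmatory testing can be illustrated with a short Python sketch. All numerical inputs are hypothetical (they are not the ESM Table 1 costs or the Table 1 test accuracies). The cost of the FPG itself is omitted because it is common to both options, and only costs are compared, not health effects.

```python
def cost_after_fpg(risk, sens, spec, cost_ogtt, cost_treat, confirm):
    """Expected downstream cost per woman screened with FPG.

    With confirmation, every FPG-positive woman has an OGTT (assumed to be
    100% accurate, as in the text) and only true cases are treated.
    Without confirmation, every FPG-positive woman is treated.
    """
    true_pos = risk * sens
    false_pos = (1 - risk) * (1 - spec)
    if confirm:
        return (true_pos + false_pos) * cost_ogtt + true_pos * cost_treat
    return (true_pos + false_pos) * cost_treat


# Hypothetical inputs for illustration only.
args = dict(risk=0.03, sens=0.80, spec=0.80, cost_ogtt=20.0, cost_treat=500.0)
print("confirm with OGTT:", cost_after_fpg(confirm=True, **args))
print("treat all FPG positives:", cost_after_fpg(confirm=False, **args))
```

Confirmation lowers cost whenever the cost of the additional OGTTs is smaller than the cost of treating the false positives that would otherwise be treated.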

The acceptability of different screening strategies, perhaps measured in clinical practice by attendance rates, may also make a considerable difference to what is considered cost-effective. This is important because it is unlikely that universal testing will be fully achievable because not all women will be willing or able to attend for screening tests. The sensitivity analyses also show that the acceptability of the tests is important in determining the most cost-effective option for a given risk.

Where test acceptance is no longer assumed to be 100%, then a combination of test sensitivity and test acceptability will determine the number of missed cases, which explains why OGTT alone no longer appears as a cost-effective strategy when the assumption of 100% test acceptance is relaxed. With our assumptions about test acceptability, the GCT identifies most GDM cases; however, at lower levels of risk the difference in absolute numbers detected by GCT and the cheaper RBG is small. This explains why RBG followed by OGTT is cost-effective for some intermediate level of risk. As we assume that OGTT would have higher test acceptability as part of a sequential testing strategy where the woman has already had one positive result, it is cost-effective to use OGTT to confirm any positive RBG or GCT before treatment. This is because the confirmatory test reduces the cost of treating false positives.
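The interaction between sensitivity and acceptance can be sketched as follows. The rates used are hypothetical and are not the Table 3 estimates. They are chosen only to show how a sequential strategy can detect more cases than OGTT alone once acceptance is below 100%.

```python
def detected_fraction(ogtt_accept, screen_sens=1.0, screen_accept=1.0):
    """Fraction of true GDM cases that end up diagnosed.

    For OGTT alone, leave the screening arguments at their defaults.
    OGTT sensitivity is assumed to be 100%, as in the text.
    """
    return screen_accept * screen_sens * ogtt_accept


# Hypothetical rates (assumptions, NOT the Table 3 values).
ogtt_alone = detected_fraction(ogtt_accept=0.40)
gct_then_ogtt = detected_fraction(ogtt_accept=0.90,
                                  screen_sens=0.80, screen_accept=0.70)

print(f"OGTT alone:    detected {ogtt_alone:.1%}, missed {1 - ogtt_alone:.1%}")
print(f"GCT then OGTT: detected {gct_then_ogtt:.1%}, missed {1 - gct_then_ogtt:.1%}")
```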

A limitation of this study is the uncertainty about the QALY gain from treatment; in particular, the uncertainty surrounding the number of perinatal deaths that would be averted as a result of GDM treatment. Such an important treatment effect on GDM-related perinatal mortality may not be observed in clinical practice [28]; however, a conservative assumption of relatively low QALY losses for other serious perinatal complications will offset this to some extent.

A further limitation is that the model is based on treatment effects observed in women diagnosed with mild gestational diabetes using a sequential two-step GCT and OGTT diagnosis, and explicitly excluding those with more severe disease [11, 14]. Although these are the best data for treatment effect, it cannot automatically be assumed that women identified by alternative strategies would experience an identical treatment effect size. Furthermore, a larger treatment effect size might be expected than recorded in the trials when the full disease spectrum is considered, which is relevant to a population screening programme.

One potentially important outcome for the detection of GDM is the identification of women who are at high risk of subsequent type 2 diabetes or at high risk of GDM in subsequent pregnancies. Both lifestyle and pharmacological interventions, some of which have been undertaken in women with a previous history of GDM, are highly effective in reducing the incidence of type 2 diabetes [16]. Furthermore, a previous diagnosis of GDM should prompt regular screening for type 2 diabetes to identify this at an early stage before it becomes symptomatic or is associated with the development of diabetic complications. Both prevention and early identification of type 2 diabetes should result in both clinical benefits and potential cost savings. The model does not address the potential QALY gains of screening in terms of subsequent pregnancies or reduced or delayed progression to type 2 diabetes.

Currently there is little consensus about the optimal screening strategy for GDM. Previous recommendations have varied from no screening to universal screening. It is only recently that the clinical effectiveness of treatment for GDM has been established and consequently there are only limited data on the cost-effectiveness of the screening and treatment of GDM. While clinical effectiveness is a necessary condition for cost-effectiveness, it is not sufficient. Resources are finite and have competing uses. Demonstrating a benefit from a particular use of resources does not mean that an even greater benefit could not be derived if those resources were deployed elsewhere.

The NICE guideline model attempted to evaluate how the cost-effectiveness of screening varied with disease prevalence but made assumptions to simplify the complex relationship between GDM and associated risk factors. This relationship was no longer relevant in the extended model presented here, although a screening strategy based on individual risk implicitly uses a form of risk factor screening because it is patient characteristics that determine the individual’s risk. Screening based on individual risk is potentially more sophisticated and cost-effective than the dichotomous approaches to risk factor screening that are widely discussed in the literature. A recently published risk prediction model [29] is an example of the type of approach that could be used to determine the risk of an individual patient on which the most cost-effective screening strategy could then be based.

A major strength of this analysis is that the value of screening is considered within the context of the potential improvement in health outcomes for both mother and baby. In other studies of screening for GDM [1, 5, 30–35] the investigators did not consider the implications for health outcomes through modelling treatment; these studies estimated the cost per correct diagnosis only. Cost-effectiveness conclusions based on such a measure are usually flawed, because they ignore the impact on health outcomes of treatment and any concomitant health gains arising from additional detected cases. Consequently health planners can only draw limited conclusions from them when deciding how to allocate resources.

Conclusion

The trade-off between detection and unnecessary testing is at the heart of the economic problem of developing a screening strategy for any condition or disease. Recent NICE guidance proposed a screening test based on a population prevalence approach. The current study suggests that if a woman’s individual risk of GDM could be accurately predicted, then healthcare resource allocation could be improved by providing an individualised screening strategy.

This study suggests that while some form of screening is usually cost-effective, the optimal strategy varies according to the woman’s individual risk (or pre-test probability of disease). When risk of GDM is high, a highly specific and sensitive strategy is optimal; conversely, when the risk is very low, the most cost-effective strategy is to do nothing. Even when a test is capable of detecting GDM accurately, in a low prevalence population the benefits of identifying and treating cases can be outweighed by the costs of doing so. This analysis shows that a screening programme tailored to the individual risk of each patient could enhance cost-effectiveness.