Background

The main consequence of threatened preterm labor (TPL) is preterm birth, which is the leading cause of neonatal mortality and severe morbidity. Preterm birth is defined as birth before 37 weeks of gestation, and it is generally stratified into three groups according to gestational age: 24–27 weeks (extremely preterm), 28–31 weeks (very preterm) and 32–34 weeks (moderate preterm) [1].

In developed countries, spontaneous preterm birth occurs in 6–13% of pregnancies [1]. TPL is considered the cause of preterm birth in 45% of cases, the other causes being preterm premature rupture of the membranes in 25% and maternal or fetal infections in 30% of cases [2]. TPL is also one of the main causes of hospitalization during pregnancy, and leads to substantial costs, estimated at $820 million in the United States of America [3]. Treatment consists of prolonging pregnancy with tocolysis, administering corticosteroids to reduce neonatal mortality and morbidity in case of preterm delivery, and sometimes transferring the woman to a specialized center [4]. Studies show that 75–95% of women with threatened preterm labor do not deliver within 7 days, and 40% even deliver at term [5, 6]. Furthermore, 44% of these women have at least two subsequent admissions for preterm labor, leading to additional costs [6]. It therefore appears important to identify true TPL early in order to decrease neonatal morbidity and mortality, to avoid unnecessary treatment and the maternal morbidity induced by antepartum bed rest [7], and to reduce costs.

According to the literature, diagnostic tests such as cervical length measurement, qualitative cervicovaginal fetal fibronectin (fFN), and cervicovaginal interleukin-6 (IL-6) have been proven to increase accuracy when predicting premature birth [8,9,10,11,12,13,14]. However, no diagnostic strategy is clearly recommended by international guidelines [15,16,17]. Cervical length measurement has proven to be a more efficient strategy than medical examination for predicting preterm birth in symptomatic women [14, 17], but currently neither cervical length measurement nor qualitative cervicovaginal fFN can be recommended, and further investigation is required [18].

These diagnostic methods have been assessed in several medico-economic analyses based on decision analysis models, whose results showed they could be accurate enough to be cost-effective [19,20,21,22,23]. However, newer diagnostic tests, such as quantitative fetal fibronectin or proteins in maternal serum, which have also shown significant results, were not included in previous cost-effectiveness studies [24, 25]. Subsequent antepartum hospitalizations were not taken into account either. Given the large number of strategies to consider and the lack of consensus on the optimal strategy to recommend, the objective of this study was to compare seven diagnostic methods in terms of cost and effectiveness, using a decision analysis model, in women with a singleton pregnancy presenting with threatened preterm labor.

Methods

Study population

A Medline literature search was performed. It was restricted to studies written in English or in French from 2004 onwards, and used “preterm labor, cervical length, fetal fibronectin, preterm birth” as key words. Most of the published studies considered women (1) with a singleton pregnancy, (2) hospitalized for TPL between 24 and 34 weeks gestational age (GA), with symptoms indicating threatened preterm delivery based on the presence of regular uterine contractions and intact membranes, with possible cervical change but without advanced cervical dilation (< 3 cm), (3) with a preterm delivery (PTD) occurring within 7 days of the initial hospitalization, and (4) without severe maternal disease such as severe gestational arterial hypertension, pre-eclampsia, eclampsia, preterm premature rupture of the membranes or placenta previa, and without termination of pregnancy for maternal or fetal medical reasons.

Only the studies using these inclusion criteria were taken into account in our study.

General description of the model

A cost-effectiveness analysis was conducted using a decision analysis model. Seven diagnostic tests among women presenting with TPL were compared until hospital discharge. The choice between the seven tests was represented by a decision node. All clinical events in each strategy were then associated with estimated conditional transition probabilities. At the end of each branch of the decision tree, two payoffs were assigned, corresponding to the total cost of care and the effectiveness. The decision tree was built and analyzed using TreeAge Pro 2017 software (TreeAge Software, Inc., Williamstown, MA).

Description of strategies

Cervical length (CL) measured by transvaginal ultrasonography, defined as positive if CL < 25 mm, was considered as the reference strategy (Sref) because it appears to be the strategy most widely used by French health care providers.

Other alternatives were:

S2: a qualitative rapid fetal fibronectin (fFN) test, defined as positive if fFN ≥ 0.05 µg/ml;

S3: a quantitative fetal fibronectin test, defined as positive if fFN ≥ 200 ng/ml;

S4: a cervical interleukin-6 (IL-6) test, defined as positive if IL-6 ≥ 210 pg/ml;

S5: a combination test associating CL measured by transvaginal ultrasonography, plasma regulated on activation, normal T-cell expressed and secreted (RANTES) and plasma interleukin-10, defined as positive if CL ≤ 18 mm, plasma RANTES ≥ 49 293 pg/ml and plasma interleukin-10 ≥ 48 pg/ml;

S6: CL measured by transvaginal ultrasonography, defined as positive if CL < 15 mm;

S7: a test associating CL and qualitative fFN, defined as positive if CL < 15 mm or if CL is 16–30 mm and qualitative fFN ≥ 0.05 µg/ml.

Description of the decision tree

For each strategy compared, women with a positive test result, indicating a risk of delivery within 7 days, were hospitalized and treated. Treatment was defined as the administration of tocolytic agents and steroids, combined with the transfer of women to a perinatal center depending on the GA (Fig. 1). The possibility that a woman had a positive result but did not deliver within the first 7 days was also modelled. In this case, a state-transition Markov model was used to simulate the probability of giving birth until 37 weeks gestation (Fig. 2). Four health states and one absorbing state were modelled: home monitoring, new hospitalisation, delivery with severe neonatal morbidity, delivery without severe neonatal morbidity and delivery with death of the new-born. At each new one-week cycle, women could move from one state to another according to predefined transition probabilities, either until preterm delivery or until delivery at 37 weeks. A similar follow-up was modelled in case of negative test results.
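The weekly state-transition logic can be illustrated with a small cohort simulation. This is only a sketch and not the TreeAge model used in the study: the state names, the transition probabilities and the choice to treat all three delivery outcomes as terminal are assumptions made for illustration.

```python
# Hypothetical state names for the five modelled states.
STATES = [
    "home", "hospital", "delivery_no_morbidity",
    "delivery_severe_morbidity", "neonatal_death",
]

# Hypothetical weekly transition probabilities (each row sums to 1).
P = {
    "home": {"home": 0.90, "hospital": 0.06, "delivery_no_morbidity": 0.03,
             "delivery_severe_morbidity": 0.008, "neonatal_death": 0.002},
    "hospital": {"home": 0.50, "hospital": 0.30, "delivery_no_morbidity": 0.15,
                 "delivery_severe_morbidity": 0.04, "neonatal_death": 0.01},
}
# Assumption of this sketch: every delivery outcome is terminal.
for state in STATES[2:]:
    P[state] = {state: 1.0}


def run_cohort(start, n_cycles):
    """Propagate a cohort distribution through n_cycles one-week cycles."""
    dist = {s: start.get(s, 0.0) for s in STATES}
    for _ in range(n_cycles):
        nxt = {s: 0.0 for s in STATES}
        for src, mass in dist.items():
            for dst, p in P[src].items():
                nxt[dst] += mass * p
        dist = nxt
    return dist


# Example: a woman discharged home at 30 weeks, followed for 7 weekly cycles.
final = run_cohort({"home": 1.0}, n_cycles=7)
print({s: round(p, 4) for s, p in final.items()})
```

In this sketch, the share of the cohort still in "home" or "hospital" after the last cycle would be counted as delivering at 37 weeks.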

Fig. 1 Decision analysis model. vpp: positive predictive value; vpn: negative predictive value

Fig. 2 Markov model

Model parameters

Two types of transition probabilities have to be distinguished: parameters that had to be estimated using data from the literature search, and those that were introduced directly into the model, based on validated national sources.

The parameters estimated from literature data concerned the probability of having a positive or a negative diagnostic test. For each of the seven diagnostic tests, this probability was estimated from a contingency table built from the sensitivity and specificity of the test and an incidence of 9.7%, defined as the median of the values found in the literature (Table 1). It was therefore possible to estimate the probability of having a preterm delivery or not within 7 days in case of a positive test (true positive and false positive, respectively) and of a negative test (false negative and true negative, respectively) (Fig. 1).
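The contingency-table step can be sketched as follows. The 9.7% incidence is the value quoted above; the sensitivity and specificity in the example call are placeholders rather than values from Table 1, and the function name `contingency` is hypothetical.

```python
def contingency(prevalence, sensitivity, specificity):
    """Derive the probability of a positive test and the predictive values."""
    tp = prevalence * sensitivity               # delivery within 7 days, test positive
    fn = prevalence * (1 - sensitivity)         # delivery within 7 days, test negative
    fp = (1 - prevalence) * (1 - specificity)   # no delivery, test positive
    tn = (1 - prevalence) * specificity         # no delivery, test negative
    p_positive = tp + fp
    return {
        "p_positive": p_positive,
        "ppv": tp / p_positive,   # P(delivery within 7 days | positive test)
        "npv": tn / (fn + tn),    # P(no delivery within 7 days | negative test)
    }


# Placeholder sensitivity and specificity, for illustration only.
result = contingency(prevalence=0.097, sensitivity=0.75, specificity=0.80)
print({k: round(v, 3) for k, v in result.items()})
```

The positive and negative predictive values obtained this way play the role of the branch probabilities labelled vpp and vpn in Fig. 1.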

Table 1 Values, ranges, distributions and references for parameters used in the decision tree

Data directly based on national validated sources included:

  • The probability of serious neonatal adverse events, taken from the results of EPIPAGE-2, a national cohort [26]. These events were defined as perinatal death or severe neonatal morbidity (severe bronchopulmonary dysplasia, severe necrotizing enterocolitis, severe retinopathy of prematurity, severe cerebral abnormalities on cranial ultrasonography).

  • The probability of subsequent hospitalizations after discharge, taken from the data collected by a national medico-administrative database, the PMSI (Programme de médicalisation du système d’information). This database is used to determine the activity-based payment for hospitals in France. The reliability and validity of the PMSI data have already been assessed [27].

The parameters used in the model, the ranges over which they were tested and their sources are shown in Table 1.

Analysis

Effectiveness

The effectiveness endpoint was the number of serious adverse events concerning the new-born, including either perinatal death or severe neonatal morbidity. We considered a score of 1 for death or severe neonatal morbidity and 0 otherwise.

Costs

The economic analysis was performed from the French healthcare system perspective. Only direct medical costs were taken into account. Costs were expressed in Euros (€) for the year 2012. Costs were not updated given the stability of prices in France (average annual variation of the consumer price index less than 1% between 2012 and 2017) [28].

The mothers’ and the newborns’ hospitalizations were identified using their associated Diagnosis Related Groups (DRG). Their costs were estimated using the national cost survey sample named ‘Echelle nationale des coûts’ (ENC) [29]. The ENC estimates the production costs of hospitalisation from a sample of public and private care centers. Costs were categorized as medical costs (consumables, diagnostic tests, drugs, human resources) and structure costs (laundry, catering, general logistics, depreciation and maintenance). Home follow-up care costs were estimated from the national health insurance reimbursement for midwife consultations. All economic data are presented in Table 2.

Table 2 Values, ranges, distributions and references for economic parameters used in the decision tree (costs 2012, €)

Cost-effectiveness analysis

All strategies were compared with each other. Strategies were ranked from the least to the most costly. Strategies that were more costly and less effective (i.e. presenting a higher number of serious adverse events) than the next alternative were excluded by simple dominance. Strategies presenting a higher incremental cost-effectiveness ratio (ICER) than that of the next most effective alternative were excluded by extended dominance. ICER was calculated according to the following formula:

$$\text{ICER} = \frac{\text{Mean cost}_{\text{test } n} - \text{Mean cost}_{\text{test } n-1}}{\text{Mean effectiveness}_{\text{test } n} - \text{Mean effectiveness}_{\text{test } n-1}}$$
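The ranking and exclusion rules described above can be sketched in a few lines. This is an illustrative implementation, not the TreeAge procedure: the `Strategy` type and the example figures are hypothetical, and the ICER is expressed here as extra cost per adverse event avoided, so that the denominator is positive when the costlier strategy is the more effective one.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    cost: float    # mean cost per mother-child pair
    events: float  # mean serious adverse events per pair (lower is better)


def icer(new, old):
    """Extra cost per adverse event avoided when moving from old to new."""
    return (new.cost - old.cost) / (old.events - new.events)


def efficiency_frontier(strategies):
    """Remove dominated strategies (assumes no two strategies are identical)."""
    # Simple dominance: another strategy is no more costly and no less
    # effective, and strictly better on at least one of the two criteria.
    kept = [
        s for s in strategies
        if not any(
            o.cost <= s.cost and o.events <= s.events
            and (o.cost < s.cost or o.events < s.events)
            for o in strategies
        )
    ]
    kept.sort(key=lambda s: s.cost)  # least to most costly
    # Extended dominance: drop a strategy whose ICER exceeds that of the
    # next, more effective strategy, then re-check the shortened list.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(kept) - 1):
            if icer(kept[i], kept[i - 1]) > icer(kept[i + 1], kept[i]):
                del kept[i]
                changed = True
                break
    return kept


# Hypothetical figures, for illustration only.
example = [
    Strategy("A", 100.0, 0.10),
    Strategy("B", 150.0, 0.09),
    Strategy("C", 160.0, 0.05),
    Strategy("D", 200.0, 0.06),
]
print([s.name for s in efficiency_frontier(example)])
```

When one strategy is both the least costly and the most effective, every other strategy is removed by simple dominance and no ICER needs to be computed.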

Four cost-effectiveness analyses were performed: one for each GA group (24–27, 28–31 and 32–34 weeks), and one for the whole period of 24–34 weeks GA, which was obtained by adjusting the cost and effectiveness results of each GA group by the proportion of mothers in each of these periods. As the time frame was less than one year, costs and effectiveness were not discounted.
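The pooling over the whole 24–34 week period amounts to a weighted average of the group-specific results. A minimal sketch, in which the function name `pool_by_ga` and all figures are placeholders rather than study data:

```python
def pool_by_ga(results, proportions):
    """Weighted average of group-specific results by share of mothers."""
    total = sum(proportions[group] for group in results)
    return sum(results[group] * proportions[group] for group in results) / total


# Hypothetical mean costs per GA group and hypothetical shares of mothers.
costs = {"24-27": 10.0, "28-31": 20.0, "32-34": 30.0}
shares = {"24-27": 0.2, "28-31": 0.3, "32-34": 0.5}
print(pool_by_ga(costs, shares))
```

The same function would apply to the effectiveness results, using the same proportions.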

Sensitivity analyses

Three deterministic sensitivity analyses were performed to test the robustness of the model. The first analysis concerned the incidence of preterm birth which was fixed at 5% and then at 15%.

The second analysis concerned the diagnostic performances of the tests. The maximum values of sensitivities and specificities were first simultaneously tested, and a similar analysis was then performed with their minimum values.

Probabilistic analyses

A Monte Carlo simulation was also performed. Its results are plotted on a cost-effectiveness plane divided into four quadrants [30]. Because the effectiveness measure in our study counted severe neonatal events, a positive incremental effect (ΔE > 0) corresponds to more adverse events, i.e. a worse outcome. The northeast (NE) quadrant contained situations where incremental costs and effects were both positive (ΔE > 0 and ΔC > 0), indicating that the new test was dominated by the alternative test. The southwest (SW) quadrant contained the opposite situation, where the new test dominated the alternative (ΔE < 0 and ΔC < 0). The northwest (NW) quadrant corresponded to the situation where incremental costs were positive and incremental effects negative (ΔE < 0 and ΔC > 0), indicating that a trade-off needed to be made and an ICER had to be calculated. Finally, the southeast (SE) quadrant contained situations with more adverse events but cost savings (ΔE > 0 and ΔC < 0) [31]. The distributions of transition probabilities and costs (Table 1) were sampled in 5000 iterations.
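The quadrant bookkeeping can be sketched as below. In the study, each iteration resamples the model inputs and re-evaluates the decision tree; this sketch instead draws the incremental cost and effect directly from hypothetical normal distributions, purely to show how the proportions per quadrant are obtained. The function names and distribution parameters are assumptions.

```python
import random


def quadrant(d_effect, d_cost):
    """Classify a pair (incremental adverse events, incremental cost).

    The x-axis is the incremental effect and the y-axis the incremental cost.
    Points lying exactly on an axis are assigned to the south or west side.
    """
    north_south = "N" if d_cost > 0 else "S"
    east_west = "E" if d_effect > 0 else "W"
    return north_south + east_west


def simulate(n_iter=5000, seed=0):
    """Return the proportion of simulated pairs falling in each quadrant."""
    rng = random.Random(seed)
    counts = {"NE": 0, "NW": 0, "SE": 0, "SW": 0}
    for _ in range(n_iter):
        d_effect = rng.gauss(-0.003, 0.004)  # hypothetical incremental events
        d_cost = rng.gauss(-1000.0, 800.0)   # hypothetical incremental cost
        counts[quadrant(d_effect, d_cost)] += 1
    return {q: c / n_iter for q, c in counts.items()}


print(simulate())
```

With the convention used here, the SW proportion is the estimated probability that the new test dominates the alternative.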

Results

Baseline cost-effectiveness analysis

Results showed that at 24–34 weeks GA, the strategy defined by CL < 15 mm, or a positive qualitative fetal fibronectin test when CL was between 16 and 30 mm (S7), was the least costly and the most effective strategy (because it was associated with the lowest number of serious neonatal adverse events). It dominated the reference strategy (CL < 25 mm) and all other alternatives (Table 3). Depending on the strategy compared, the rate of perinatal death or severe neonatal morbidity decreased by 9 to 15% (from 2.33 to 4.2 serious neonatal adverse events avoided per 1000 new-borns) and costs decreased by 25 to 31% (from €1107 to €1481 saved per mother–child).

Table 3 Cost and effectiveness of seven diagnostic strategies for threatened preterm labor at 24–34 weeks gestational age

Similar results were found for each gestational age group (24–27; 28–31; 32–34). Results also showed that the earlier the prematurity, the higher the number of avoided serious adverse neonatal events when strategies were compared (Additional file 1: Supplement A).

Deterministic sensitivity analyses

The deterministic analyses concerning the incidence of preterm birth confirmed the efficiency of the association of CL and qualitative fFN (S7) compared with the other strategies. The analyses using the minimum and the maximum values of the diagnostic test performances also confirmed this result (Table 4).

Table 4 Deterministic sensitivity analyses

Probabilistic analysis

Table 5 indicates, for each GA group, the proportion of points (pairs of incremental costs and effectiveness) falling in each of the four quadrants. These points were obtained by comparing the most efficient strategy, identified in the baseline cost-effectiveness analysis, with each of the six other diagnostic tests.

Table 5 Probabilistic analysis (5000 iterations): proportions (%) of pairs of incremental cost and incremental severe adverse neonatal events associated with CL < 15 mm or CL [16–30 mm] and fFN qualitative compared to each of the six other strategies

The results showed that when the association of CL and qualitative fFN was compared with CL < 15 mm (S6), most of the pairs of incremental costs and effectiveness were contained in the SW quadrant of the cost-effectiveness plane (ΔE < 0 and ΔC < 0, i.e. showing a higher effectiveness and cost-savings associated with CL and qualitative fFN): at 24–27 GA, the probability that the association of CL and qualitative fFN dominates CL < 15 mm was estimated to be 90%; at 28–31 and 32–34 GA, the analysis depicted a probability of 92 and 96% respectively.

More uncertainty was observed when the association of CL and qualitative fFN was compared with the five other strategies (Sref, S2, S3, S4 and S5): at 24–27 weeks GA, the pairs of incremental results were split between the SW and NW quadrants, with proportions ranging between 49 and 71% depending on the strategy. At 28–31 weeks, this proportion varied between 48 and 73%. At 32–34 weeks, the range was 55–84%. Moreover, the association of CL and qualitative fFN (S7) was dominated (NE quadrant) by S3, S4 and S5 in almost half of the cases, whatever the GA (Table 5 and Additional file 2: Supplement B).

Discussion

Summary of key findings

The results showed that, among seven diagnostic strategies in singleton pregnancy presenting TPL between 24 and 34 weeks GA, CL less than 15 mm or a positive qualitative fFN when CL was between 16 and 30 mm (S7) was the most efficient diagnostic strategy. It led to a reduction in neonatal morbidity and mortality and to significant cost savings compared with all other alternative strategies (up to €1481 saved per mother–child and up to 4.2 serious neonatal adverse events avoided per 1000 new-borns). The deterministic and probabilistic analyses confirmed the domination of the association of CL and qualitative fFN over the other tests, and especially over CL < 15 mm, which was the least effective and the most costly strategy. This can be explained by the poor sensitivity of CL < 15 mm alone compared with the combination of tests.

Comparison with other studies

To the best of our knowledge, relatively few cost-effectiveness analyses on this topic have been performed. Most of the studies did not include the combination of CL and qualitative fFN [20,21,22,23], except the study conducted by van Baaren et al. in the Netherlands, which found that testing fFN in women with CL between 10 and 30 mm was the most efficient strategy [19]. Both decision analysis models, theirs and ours, provide arguments in favor of this strategy for the international medical community [15].

Previous observational studies showed that CL combined with fFN could improve the identification of women with a low risk of delivering spontaneously within 7 days [8, 32,33,34] and thus reduce costs and the number of hospitalizations [21]. The clinical trial conducted by Ness et al. showed that CL combined with fFN was also associated with reduced evaluation time in triage for women with CL ≥ 30 mm [36]. Consequently, this strategy (CL less than 15 mm or a positive qualitative fFN when CL was between 16 and 30 mm (S7)) could be easily applied in current obstetric practice, whatever the type of center. However, it requires the application of the standardized protocol described by Schmitz et al. [35], as clinicians must sample qualitative fFN and then measure CL by transvaginal ultrasound before making a decision.

Strengths and limitations

The main strength of our study is that it was based on reliable data from three official and validated sources: the EPIPAGE-2 cohort, which provided neonatal morbidity and mortality data; the PMSI, which is a national medico-administrative database; and the national cost survey sample, which provided costs from public and private care centers [29]. These data gave us the opportunity to provide detailed results according to GA. The stratified cost-effectiveness analysis showed that the strategy combining CL with qualitative fFN had a positive economic and medical impact in each GA group: at 24–27 weeks GA, the number of serious adverse neonatal events was much higher than at 28–31 and 32–34 weeks GA, with an overall cost not exceeding €1500 per mother–child. This overall cost should be weighed against the cost of complications associated with high prematurity [36, 37] and the cost of follow-up for these children over a longer period of time. The national data sources also provided parameters robust enough to implement a Markov model in our decision tree, in order to take into account the complexity of mother–child management and to avoid underestimating the costs associated with subsequent hospitalizations after discharge. Another strength was the selection of studies: contrary to the study conducted by van Baaren, only studies using the same population inclusion criteria were included in our analyses, which limited selection bias.

Our analysis does have some limitations. Firstly, our population did not include all preterm births: women presenting either a disease associated with a high risk of preterm birth or a multiple pregnancy were excluded from our analyses because these medical situations do not require the use of diagnostic tests for TPL. Moreover, the data on diagnostic performance used in our study were all derived from observational studies, which can be prone to bias. A further limitation is the choice of the reference strategy in our baseline cost-effectiveness analysis. Currently, no diagnostic strategy is clearly recommended by international guidelines, and our choice of cervical length as the reference strategy may be controversial. However, this choice had no effect on our results because all the other strategies were dominated.

Another limitation concerns the strategies modelled: even though we modelled the use of new diagnostic tests such as quantitative fetal fibronectin or proteins in maternal serum, the combination of quantitative fFN and CL was not included in the model due to a lack of data.

For the economic analysis, only direct medical costs in public hospitals were taken into account, which raises the question of the transferability of the results to other countries where the organization and the financing of care are different. We also chose not to perform a cost-utility analysis because the main goal of our work was to assess the clinical consequences associated with threatened preterm labor. Prematurity is associated with long-term neuro-motor, sensory and cognitive disabilities. Given the economic consequences linked to these impairments, an analysis using Quality Adjusted Life Years (QALY) would have been relevant. Unfortunately, the medical and economic data required for this type of analysis were not available, and obtaining them would have required further studies.

Conclusion

The strategy that combines CL with qualitative fFN, defined as positive if CL < 15 mm or if CL is 16–30 mm and qualitative fFN is positive, appeared to be the most efficient strategy. Our findings could lead to a significant reduction in medical costs. Furthermore, this decision analysis provides arguments for establishing new guidelines and for informing the daily practice of clinicians in regional perinatal networks. Indeed, our suggested strategy is based on current obstetric practices and represents no additional cost compared with the diagnostic test currently most widely used in France. This test can be implemented whatever the level of the maternity center, which would make it easier to direct women at high risk of delivery towards a center equipped to optimize the health of premature newborns.