Key Points for Decision Makers

This study provides evidence from large multinational samples to generalise the results of previous, smaller-scale research on patient preferences for assistive reproductive therapies.

Effectiveness of treatment was the most (or second most) important attribute across all samples, but we do not see a preference for effectiveness to the exclusion of other considerations, or a wide gap between effectiveness and other attributes in terms of relative importance.

Respondents placed significant value on access to treatment, reflected in the ‘option value’ of treatment, but also had a substantial willingness to pay for improvements in the effectiveness of treatment, a greater degree of shared decision making, and among some respondents, less discomfort in treatment.

Respondents in the Chinese sample were insensitive to the cost of treatment in their choices, although their preferences across the other attributes were broadly similar to those of the other samples. We hypothesise that this result may reflect some social desirability bias that discouraged respondents in this sample from considering cost in their choice of treatment.

1 Introduction

Cultural, demographic and other trends over recent decades have led to later childbearing as well as increasing obesity rates, a rise in sexually transmitted diseases and decreasing sperm quality [1,2,3]. Together, these have meant that an increasing number of prospective parents are experiencing subfertility, defined as an inability to achieve a clinical pregnancy after trying for more than 12 months [4]. A recent review found that in more developed countries, 12-month infertility rates ranged from 3.5 to 16.7%, and that 40–70% of those affected sought medical treatment for infertility [5].

Assistive reproductive therapies (ART) can help individuals or couples who have difficulty conceiving naturally to get pregnant and carry a baby to term. However, while ART improve a couple’s chance of conceiving a child, different forms of treatment are associated with differing effectiveness, risks and convenience. These characteristics can affect patient preferences for specific forms of ART. An understanding of how patients prioritise and trade off between the positive and negative aspects of treatments is essential to ensuring that their care is aligned with their preferences and therefore provides the greatest value. Research has also suggested that physicians and patients often differ in their views of the most important characteristics of treatment, including infertility treatments [6,7,8]. A misaligned understanding of patient preferences can undermine shared decision making, as providers may emphasise aspects of treatment that are less important to patients, or misunderstand acceptable trade-offs between different aspects of treatment.

Previous research on preferences for fertility treatment has found that effectiveness, in terms of the probability of a live birth, is typically (but not invariably) the single most important characteristic of a treatment to patients, but that patients are willing to accept some reduction in the probability of a live birth to improve other aspects of treatment. These other aspects include lower risks of adverse events, more convenient modes of administration and a greater degree of shared decision making [6, 7, 9,10,11]. Most of these studies, though, have been relatively small by the current standards of stated preference research [12], usually with 200 respondents or fewer, and each has drawn on a single country. This limits the ability to generalise the results across patients or to understand how treatment preferences might differ across national contexts. Additionally, many studies in this area have not included cost as an attribute, which is arguably a significant shortcoming given that patients in many countries must finance their own fertility treatment [13,14,15].

Our primary objective was to quantify the relative importance of different aspects of fertility treatment in a more generalisable context, including in terms of willingness to pay (WTP) for different characteristics of ART. To this end, we undertook a large discrete choice experiment (DCE) with people who had experience of subfertility or ART in five countries/regions: the USA, the UK, the Nordic region (Denmark, Norway, Sweden and Finland), Spain and China.

2 Methods

We administered an online survey in two sections. In the first section, respondents were asked about their demographic characteristics, experience with subfertility, their attitudes towards infertility and their willingness to contribute to a publicly funded ART programme. These results are reported elsewhere [16]. In the second section, respondents who indicated that they had tried for more than 12 months to have a baby or had sought medical treatment to get pregnant were presented with a series of DCE tasks to elicit their preferences over different aspects of ART. The survey itself is available as Electronic Supplementary Material (ESM).

A DCE is a quantitative approach to eliciting individuals’ preferences over health care interventions or health states. Respondents are asked to choose their most preferred option from a choice set of two or more alternatives described in terms of a common set of attributes and differing attribute levels. This methodology has previously been applied in the context of infertility and ART [6, 7, 9,10,11] as well as in other healthcare contexts [12, 17].

2.1 Design of the DCE

The attributes included in the DCE were derived through reference to previous DCEs and other preference studies in this area [6, 7, 9, 11, 18, 19]. Attributes were selected to encourage consideration of trade-offs between treatment effectiveness, risk of adverse effects, treatment (dis)comfort and (in)convenience, and cost per ART cycle. We also sought to understand the importance of patient centredness or shared decision making in treatment decisions relative to other attributes.

Attribute levels were selected by the authors to cover the range of salient levels, with reference to the previous DCEs mentioned above and with unstructured input and review from a clinical expert in reproductive medicine. The midpoint cost per cycle in each country was based on indicative costs from IVF-Worldwide [20], updated to 2020 costs using country-specific price indices. IVF-Worldwide cost estimates were calculated according to a methodology reported by Collins [21] and include country-specific costs associated with the initial consultation, basic in vitro fertilisation treatment, intracytoplasmic sperm injection, hormonal drugs, embryo freezing, other investigations and regulatory fees. Estimates do not include productivity or other indirect costs. Upper and lower levels were defined as ± 40% of the midpoint, reflecting the reported variation in UK cost per cycle [20]. Costs were presented to respondents in country-specific currencies but converted to Euro using XE.com historical exchange rates at the time of data analysis for comparability between study countries. The attributes and attribute levels in the experimental design are shown in Table 1, and the country-specific costs are shown in Table 2.
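As a minimal illustration of the cost-level arithmetic, the Python sketch below derives the lower, middle and upper levels from a midpoint using the ± 40% rule and converts them to Euro. The midpoint (5000 in local currency) and the exchange rate (1.15 Euro per unit) are hypothetical placeholders rather than the values in Table 2, applying a single historical rate is an assumption about how the XE.com conversion was done, and the function names are illustrative only.

```python
def cost_levels(midpoint: float, spread: float = 0.40) -> tuple[float, float, float]:
    """Return (lower, middle, upper) cost-per-cycle levels.

    The lower and upper levels sit `spread` (here 40%) below and above the midpoint.
    """
    return (midpoint * (1.0 - spread), midpoint, midpoint * (1.0 + spread))


def to_euro(amount_local: float, eur_per_unit: float) -> float:
    """Convert a local-currency amount to Euro at a single (hypothetical) exchange rate."""
    return amount_local * eur_per_unit


# Hypothetical example: midpoint of 5000 in local currency, 1 unit = 1.15 Euro.
levels_local = cost_levels(5000.0)
levels_euro = [round(to_euro(c, 1.15), 2) for c in levels_local]
print(levels_local)  # lower, middle and upper levels in local currency
print(levels_euro)   # the same levels expressed in Euro
```

Because the spread is proportional, the same rule gives each country a range whose width scales with its own midpoint.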

Table 1 Attributes and levels in the discrete choice experiment
Table 2 Indicative cost per cycle by study country

We used Ngene™ software (ChoiceMetrics Pty Ltd; Sydney, New South Wales, Australia) version 1.2.1 to generate a D-efficient fractional factorial experimental design, based on a main-effects model focusing on the independent effect of each attribute on choice. We assumed non-informative priors in developing the design. We produced a 36-choice-set design with two treatment alternatives in each set and included a fixed ‘no treatment’ alternative in each choice task, with all attribute levels set to zero (i.e. no change from the current state). An example DCE task is shown in Box 1. The attributes and levels presented in the tasks were described to respondents in the introduction to the DCE, which is included as part of the survey in the ESM.

Box 1 Sample discrete choice experiment task

2.2 Survey

Samples were recruited from general population survey panels maintained by Dynata™ in the USA, the UK, Denmark, Norway, Sweden, Finland, Spain and China. These countries were chosen to represent a diverse cross-section of cultural attitudes and preferences towards infertility and ART. Because of the small populations of the individual Nordic countries, participants from these countries were pooled into a combined Nordic sample. Nationally representative samples in terms of age and sex were recruited in each country/region and supplemented by an ‘over-sample’ of reproductive-age respondents to ensure sufficient statistical power for the DCE phase of the study. This ‘over-sample’ is not nationally representative, as it is based on individuals with self-reported experience of subfertility or ART. The supplementary sample size was informed by recent practice in DCE elicitations [12].

Individuals who had previously registered with Dynata™ received an e-mail inviting them to learn more about this study. An accompanying link took them to an online participant information sheet (PIS), which outlined the purpose of the study and provided a link to the questionnaire. The PIS, questionnaire, and the statistical analysis plan were reviewed and approved by the University of East Anglia Faculty of Medicine and Health Science Ethics Committee, Norwich UK (reference 201819-090).

Each respondent saw 11 choice sets: ten unique sets plus one repeated set to test respondent consistency. The unique sets were selected ‘dynamically’: the ten sets with the fewest responses up to that point in data collection were drawn from the full experimental design, so that each set was seen a similar number of times across all respondents. In the repeated set, an earlier task was re-presented with the order of the two treatment alternatives reversed; in all cases, the third task was reversed and re-presented as the eighth task. Respondents who did not choose the same alternative (including ‘no treatment’) in both tasks were flagged as potentially non-attentive [22]. We recorded completion times and flagged respondents who completed the questionnaire in less than half the median completion time for their country. All respondents were included in the primary analysis, but respondents flagged as both fast and inconsistent were excluded in a sensitivity analysis.
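The task-allocation and flagging rules above can be sketched in Python as follows. This is an illustrative reconstruction, not the survey platform’s code: the function names are hypothetical, ties between equally exposed sets are assumed to be broken at random, and ‘A’, ‘B’ and ‘none’ are assumed labels for the two treatments and ‘no treatment’.

```python
import random
import statistics
from collections import Counter

N_UNIQUE = 10   # unique choice sets shown to each respondent
REPEAT_OF = 3   # the third task is repeated...
REPEAT_AT = 8   # ...as the eighth task, with A and B swapped


def select_sets(exposure: Counter, design_ids: list[int], rng: random.Random) -> list[int]:
    """Pick the N_UNIQUE least-shown sets (ties broken at random) and record the exposure."""
    ids = list(design_ids)
    rng.shuffle(ids)  # random order, so the stable sort breaks ties at random
    chosen = sorted(ids, key=lambda s: exposure[s])[:N_UNIQUE]
    exposure.update(chosen)  # increment the count for each selected set
    return chosen


def task_sequence(unique_sets: list[int]) -> list[tuple[int, bool]]:
    """Return 11 (set_id, swapped) tasks: 10 unique sets plus the repeated, swapped task."""
    seq = [(s, False) for s in unique_sets]
    seq.insert(REPEAT_AT - 1, (unique_sets[REPEAT_OF - 1], True))
    return seq


# In the repeated task A and B trade places, so a consistent 'A' reappears as 'B'.
SWAP = {"A": "B", "B": "A", "none": "none"}


def is_consistent(first_choice: str, repeat_choice: str) -> bool:
    """True if the repeated (swapped) task picks the same underlying alternative."""
    return first_choice == SWAP[repeat_choice]


def fast_flags(times_by_country: dict[str, list[float]]) -> dict[str, list[bool]]:
    """Flag completion times below half of the country-specific median."""
    return {
        country: [t < 0.5 * statistics.median(times) for t in times]
        for country, times in times_by_country.items()
    }
```

A respondent would then be excluded from the sensitivity analysis only when both flags apply, that is, when `is_consistent` returns False and the fast flag is True.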

Respondents were asked for demographic details including their age group, highest level of education and income category. Each version of the questionnaire presented five income categories. In the UK, these categories were presented in £15,000 intervals (< 15,000; 15,000–30,000; 30,000–45,000; 45,000–60,000; > 60,000), and the other versions used roughly equivalent intervals in local currency.

A small convenience sample (N = 25) was recruited from each country/region to pilot the full survey. Respondents were asked to rate the difficulty and length of the survey on a 5-point Likert scale, from very easy/short to much too difficult/long. The pilot identified an issue around the currency symbols presented to respondents in China; this was corrected in the final version. Likert ratings of the length and difficulty of the survey did not flag concerns: 7% of pilot respondents found the survey ‘long’ or ‘very long’ and 9% found it ‘difficult’ or ‘very difficult’.

2.3 Statistical Analysis

Prior to modelling DCE responses, we generated descriptive statistics of the frequency of ‘no treatment’ choices and tested for non-trading or dominant preferences to confirm the theoretical validity of the elicitation. Respondents with a dominant preference always choose the alternative that maximises or minimises the level of a single attribute, such as treatment effectiveness, without regard to the level of other attributes such as cost or discomfort. Strictly dominant preferences are inconsistent with the theory of compensatory decision making that underlies DCE methods, and an ‘excessive’ proportion of dominant preferences may invalidate a DCE. However, such preferences are not ‘irrational’, and they are almost impossible to definitively identify in a fractional factorial design where respondents see only a subset of all possible attribute-level combinations [23, 24]. As such, we note the proportion of respondents with potentially dominant preferences but do not exclude these respondents from the analysis.
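A minimal sketch of the dominance check follows, under stated assumptions: a respondent is flagged for an attribute if, in every task, they chose a treatment alternative holding the best level of that attribute. Treating any ‘no treatment’ choice as breaking the pattern, and counting tasks in which A and B tie on the attribute as consistent with dominance, are assumptions of this sketch, and the numeric level codes in the test data are hypothetical. As noted above, passing this check identifies only ‘potentially’ dominant preferences.

```python
from collections.abc import Mapping, Sequence


def potentially_dominant(
    tasks: Sequence[tuple[Mapping[str, Mapping[str, float]], str]],
    attribute: str,
    maximise: bool = True,
) -> bool:
    """True if every choice picked a treatment with the best level of `attribute`.

    Each task is (alternatives, choice): `alternatives` maps 'A'/'B' to attribute
    levels and `choice` is 'A', 'B' or 'none'.
    """
    pick_best = max if maximise else min
    for alternatives, choice in tasks:
        if choice not in alternatives:  # 'no treatment' breaks the pattern (assumption)
            return False
        best = pick_best(a[attribute] for a in alternatives.values())
        if alternatives[choice][attribute] != best:
            return False
    return True
```

Because each respondent sees only ten unique sets, a respondent may pass this check by chance, which is consistent with not excluding flagged respondents.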

Discrete choice experiment responses were analysed by country/region. Cost was included in the analysis as a continuous variable and all other attributes were effects coded to allow for non-linear preferences over the levels of the different attributes. The middle level of each attribute was used as the reference level, except for the two-level discomfort attribute, where ‘mild’ was used as the reference.
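Effects coding differs from dummy coding in how the reference level is represented. In the Python sketch below (a hypothetical helper, not part of the analysis code), each non-reference level has its own column and the reference level is coded -1 in every column, so the reference level’s part-worth equals minus the sum of the other coefficients. It uses the shared decision making levels (‘none’, ‘some’, ‘full’, with ‘some’ as reference) and the two-level discomfort attribute (‘mild’ as reference).

```python
def effects_code(level: str, levels: list[str], reference: str) -> dict[str, int]:
    """Effects-code one attribute level.

    Each non-reference level gets its own column; the reference level is coded -1
    in every column, so its part-worth equals minus the sum of the other coefficients.
    """
    non_reference = [lv for lv in levels if lv != reference]
    if level == reference:
        return {lv: -1 for lv in non_reference}
    return {lv: int(lv == level) for lv in non_reference}


# Shared decision making: 'some' (the middle level) is the reference.
sdm_levels = ["none", "some", "full"]
for lv in sdm_levels:
    print(lv, effects_code(lv, sdm_levels, reference="some"))

# Two-level discomfort attribute: 'mild' is the reference.
print(effects_code("strong", ["mild", "strong"], reference="mild"))
```

Unlike dummy coding, this keeps the reference level’s part-worth identifiable, which is what allows the non-linear pattern across all three levels to be plotted.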

We specified an additive, main effects utility function for the treatment alternatives A and B, and specified the ‘no treatment’ alternative C as the reference alternative:

$$U\left(A\right)={\alpha }_{A}+{\beta }_{1}\times \mathrm{Effect}10+{\beta }_{2}\times \mathrm{Effect}40+{\beta }_{3}\times \mathrm{Complications}2+{\beta }_{4}\times \mathrm{Complications}8+{\beta }_{5}\times \mathrm{Discomfort}.\mathrm{Strong}+{\beta }_{6}\times \mathrm{SharedDM}.\mathrm{None}+{\beta }_{7}\times \mathrm{SharedDM}.\mathrm{Full}+{\beta }_{8}\times \mathrm{Injections}1+{\beta }_{9}\times \mathrm{Injections}5+{\beta }_{10}\times \mathrm{CycleCost}.\mathrm{Euros}/100,$$
$$U\left(B\right)={\alpha }_{B}+{\beta }_{1}\times \mathrm{Effect}10+{\beta }_{2}\times \mathrm{Effect}40+{\beta }_{3}\times \mathrm{Complications}2+{\beta }_{4}\times \mathrm{Complications}8+{\beta }_{5}\times \mathrm{Discomfort}.\mathrm{Strong}+{\beta }_{6}\times \mathrm{SharedDM}.\mathrm{None}+{\beta }_{7}\times \mathrm{SharedDM}.\mathrm{Full}+{\beta }_{8}\times \mathrm{Injections}1+{\beta }_{9}\times \mathrm{Injections}5+{\beta }_{10}\times \mathrm{CycleCost}.\mathrm{Euros}/100,$$
$$U(C)=0,$$

where \({\alpha }_{A}\) and \({\alpha }_{B}\) are treatment-specific constants, representing the utility of treatment relative to no treatment, independent of attribute levels. We averaged these treatment-specific constants, \(\overline{\alpha }=({\alpha }_{A}+{\alpha }_{B})/2\), to represent the value of having treatment options, independent of the characteristics of those treatments. We refer to this value as ‘option value’.
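To make the role of the treatment-specific constants concrete, the Python sketch below evaluates U(A), U(B) and U(C) = 0 under a multinomial logit, where each alternative’s choice probability is exp(V) divided by the sum of exp(V) over the three alternatives, and computes the averaged constant used as ‘option value’. All coefficient and attribute values are hypothetical, not the estimates in Table 5, and only a subset of the attributes is included; in the mixed-logit models the probabilities would additionally be averaged over coefficient draws.

```python
import math


def utility(asc: float, betas: dict[str, float], x: dict[str, float]) -> float:
    """Additive main-effects utility: constant plus the sum of coefficient x coded level."""
    return asc + sum(b * x.get(name, 0.0) for name, b in betas.items())


def logit_probabilities(v_a: float, v_b: float, v_c: float = 0.0) -> list[float]:
    """Multinomial-logit probabilities for A, B and 'no treatment' (U(C) = 0)."""
    m = max(v_a, v_b, v_c)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in (v_a, v_b, v_c)]
    total = sum(exps)
    return [e / total for e in exps]


def option_value_utility(alpha_a: float, alpha_b: float) -> float:
    """'Option value' in utility units: the average of the two treatment constants."""
    return (alpha_a + alpha_b) / 2.0


# Hypothetical coefficients and effects-coded levels; cost is in Euros/100.
betas = {"Effect10": -0.8, "Effect40": 0.7, "CycleCost": -0.05}
x_a = {"Effect10": 0, "Effect40": 1, "CycleCost": 20.0}
x_b = {"Effect10": 1, "Effect40": 0, "CycleCost": 10.0}
v_a = utility(1.2, betas, x_a)
v_b = utility(1.0, betas, x_b)
print(logit_probabilities(v_a, v_b))
print(option_value_utility(1.2, 1.0))
```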

In the first instance, we used separate multinomial logit models to estimate the part-worth utilities of each attribute level for each country/region. To allow for unobserved heterogeneity, we also tested random-parameters (mixed) logit models [25]. We assigned a normal distribution to all parameters except cost and generated 1000 Halton draws. We modelled cost as a fixed (non-random) parameter to facilitate estimation of WTP for changes in attribute levels.

Where heterogeneity in the random coefficients was statistically significant, we used the Krinsky–Robb approach to estimate non-parametric 95% confidence intervals around the point estimate, based on the 2.5th and 97.5th percentiles of the random coefficient draws [26]. Where heterogeneity was not statistically significant, we used the standard error of the point estimate. The Akaike Information Criterion was used to compare the fit of the multinomial logit and mixed-logit models.
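The simulation and interval steps can be sketched in Python as below, assuming a normally distributed random coefficient with a hypothetical mean and standard deviation. The sketch maps one-dimensional Halton (base 2) points to normal draws via the inverse normal CDF and takes the 2.5th and 97.5th percentiles of those draws, following the description above; it also shows the standard-error interval and the AIC formula (2k − 2 ln L). It is not the mlogit implementation, which handles the draws internally during estimation, and a full Krinsky–Robb procedure would additionally sample parameter vectors from the estimated sampling distribution of the coefficients.

```python
from statistics import NormalDist, quantiles


def halton(index: int, base: int = 2) -> float:
    """Element `index` (1-based) of the one-dimensional Halton (van der Corput) sequence."""
    result, f, i = 0.0, 1.0, index
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result


def coefficient_draws(mean: float, sd: float, n_draws: int = 1000, base: int = 2) -> list[float]:
    """Map Halton points in (0, 1) to a normal random coefficient: beta = mean + sd * z."""
    std_normal = NormalDist()
    return [mean + sd * std_normal.inv_cdf(halton(i, base)) for i in range(1, n_draws + 1)]


def percentile_interval(draws: list[float]) -> tuple[float, float]:
    """2.5th and 97.5th percentiles (first and last of 39 cut points at 2.5% steps)."""
    cuts = quantiles(draws, n=40)
    return cuts[0], cuts[-1]


def se_interval(estimate: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% interval from a point estimate and its standard error."""
    return estimate - 1.96 * se, estimate + 1.96 * se


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; lower values indicate better penalised fit."""
    return 2 * n_params - 2 * log_likelihood
```

Halton points cover the unit interval more evenly than pseudo-random numbers, which is why a moderate number of draws (here 1000) is usually considered sufficient.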

To understand heterogeneity in preferences by respondent characteristics, we estimated a series of models including the main effects as above as well as an interaction term between each of the main effects and a specific characteristic flag (e.g. \({\beta }_{1}\times \mathrm{Effect}10+{\gamma }_{1}\times \mathrm{Effect}10\times \mathrm{Female}\)). A separate model was estimated for each characteristic. These interaction terms capture the difference in part-worth utilities between the specified subgroup and the remainder of the sample. We combined the national samples into a single dataset to test the impact of respondent characteristics other than nationality, specifically: ‘fast’ vs ‘non-fast’ responders; female vs male individuals; higher (quintiles 4 and 5) vs lower income; inconsistent vs consistent in the repeated task; not in a long-term relationship vs in a long-term relationship; and received ART vs no ART. Note that we rescaled the cost attribute to better highlight differences between subgroups. We also tested the impact of excluding respondents flagged as jointly ‘fast and inconsistent’ from the national samples in a sensitivity analysis.
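The interaction specification can be sketched as a data-preparation step: each coded attribute column is multiplied by a 0/1 subgroup flag, and the flagged subgroup’s part-worth is the main-effect coefficient plus the interaction coefficient. The helper names and the column-naming convention (‘Effect10:Female’) are hypothetical.

```python
def with_interactions(row: dict[str, float], flag: int, flag_name: str) -> dict[str, float]:
    """Add attribute x subgroup-flag columns to one coded alternative (flag is 0 or 1)."""
    out = dict(row)
    for name, value in row.items():
        out[f"{name}:{flag_name}"] = value * flag
    return out


def subgroup_partworth(beta_main: float, beta_interaction: float) -> float:
    """Part-worth for the flagged subgroup; the rest of the sample keeps beta_main."""
    return beta_main + beta_interaction


# Hypothetical row for an alternative with the 10% effectiveness level, female respondent.
print(with_interactions({"Effect10": 1, "Effect40": 0}, 1, "Female"))
```

For respondents outside the subgroup the interaction columns are all zero, so the interaction coefficient measures the subgroup’s deviation from the remainder of the sample.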

The relative importance of the main effects for each sample was estimated as the absolute difference in the part-worth utility of the most preferred and least preferred levels of each attribute, as a share of the sum of differences across all attributes. Under this approach, attributes with a greater absolute difference in utility are relatively more important than attributes with a smaller absolute difference in utility [27].
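A minimal Python sketch of the relative-importance calculation follows. It assumes that, for effects-coded attributes, the reference level’s part-worth (minus the sum of the other coefficients) is included when finding the most and least preferred levels, and that the continuous cost attribute contributes |β_Cost| multiplied by the presented cost range; the coefficients and cost range are hypothetical.

```python
def reference_partworth(coefs: dict[str, float]) -> float:
    """Under effects coding, the reference level's part-worth is minus the sum of the others."""
    return -sum(coefs.values())


def level_range(coefs: dict[str, float]) -> float:
    """Utility range across all levels of an effects-coded attribute (reference included)."""
    partworths = list(coefs.values()) + [reference_partworth(coefs)]
    return max(partworths) - min(partworths)


def relative_importance(ranges: dict[str, float]) -> dict[str, float]:
    """Each attribute's utility range as a share of the summed ranges."""
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}


# Hypothetical coefficients; cost is continuous, so its range is |beta| x (max - min cost).
ranges = {
    "effectiveness": level_range({"Effect10": -0.8, "Effect40": 0.7}),
    "shared_dm": level_range({"SharedDM.None": -0.4, "SharedDM.Full": 0.3}),
    "cost": abs(-0.05) * (70.0 - 30.0),  # cost levels in Euros/100
}
print(relative_importance(ranges))
```

Because the shares depend on the level ranges shown to respondents, they are conditional on the design, as noted for Fig. 3.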

Finally, the implied WTP for a change in attribute levels was estimated using the cost attribute to express the part-worth utilities in Euro. We estimated WTP using Small and Rosen’s compensating variation approach [28]:

$${\mathrm{WTP}}_{x}=\frac{1}{-{\beta }_{\mathrm{Cost}}}\left({v}_{x1}-{v}_{x0}\right),$$

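where \({\beta }_{\mathrm{Cost}}\) is the coefficient on the cost attribute and \({v}_{x0}\) and \({v}_{x1}\) are the part-worth utilities before and after a change in the level of attribute x. Given our main-effects specification, \(({v}_{x1}-{v}_{x0})\) is equivalent to \(({\beta }_{x1}-{\beta }_{x0})\), where \({\beta }_{x0}\) is the part-worth utility of the reference level of attribute x (under effects coding, minus the sum of the coefficients on the attribute’s other levels) and \({\beta }_{x1}\) is that of the new level. We estimated WTP for the ‘option value’ of treatment as \(\frac{\overline{\alpha }}{{-\beta }_{\mathrm{Cost}}}\).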
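The WTP formula can be sketched in Python as below. Because cost enters U(A) and U(B) as Euros/100, the cost coefficient is per 100 Euro, so the sketch multiplies the ratio by 100 to express WTP in Euro; this rescaling is our reading of the utility specification rather than a step stated explicitly above. The coefficients are hypothetical, and the reference (‘some’) part-worth is recovered from the effects-coded coefficients.

```python
def wtp(v_new: float, v_ref: float, beta_cost: float, cost_unit: float = 100.0) -> float:
    """WTP (Euro) for moving from the reference level to a new level.

    beta_cost is the coefficient on cost measured in units of `cost_unit` Euro
    (Euros/100 in U(A) and U(B)), so the utility ratio is rescaled by cost_unit.
    """
    return cost_unit * (v_new - v_ref) / (-beta_cost)


def option_value_wtp(alphas: list[float], beta_cost: float, cost_unit: float = 100.0) -> float:
    """WTP for the option of treatment: mean treatment-specific constant over -beta_cost."""
    return cost_unit * (sum(alphas) / len(alphas)) / (-beta_cost)


# Hypothetical effects-coded shared-decision-making coefficients and cost coefficient.
b_none, b_full, b_cost = -0.4, 0.3, -0.05
b_some = -(b_none + b_full)  # reference ('some') part-worth under effects coding
print(wtp(b_full, b_some, b_cost))               # WTP to move from 'some' to 'full'
print(option_value_wtp([1.2, 1.0], b_cost))      # WTP for the option of treatment
```

A negative result, for example `wtp(b_none, b_some, b_cost)`, corresponds to a willingness to pay to avoid moving away from the reference level.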

All analyses were conducted in R statistical software, version 4.0.5 [29]. The mlogit [30] package was used to model choices, and the ggplot2 [31] and ggpubr [32] packages were used to produce the figures.

3 Results

The survey was administered in February 2020, and age–sex quotas for all samples were filled within 2 weeks of sending the first invitations. The characteristics of 7565 respondents who indicated that they had tried to have a baby and experienced 12 months or more of infertility, received medical assistance to try to get pregnant, or both, are shown in Table 3 by country/region. For most countries/regions, the number of participants peaked between 31 and 45 years of age, although the Chinese sample was slightly younger than the others, peaking between 26 and 35 years of age. The largest proportions of the Chinese and Nordic samples, 52% and 35%, respectively, were in the third of the five income categories, corresponding to a UK income of £30,000–45,000. The largest proportions of the Spanish (51%) and UK (33%) samples were in the second category (corresponding to £15,000–30,000), and the largest proportion of the USA sample (24%) was in the fourth category (corresponding to £45,000–60,000).

Table 3 Respondent counts and characteristics by country/region

Response behaviours are summarised in Table 4. Median DCE completion times ranged from 1½ to 2½ minutes, and approximately 20% of respondents had a completion time of less than half their country-specific median. Inconsistency in the repeated task was between 32 and 40% across the countries/regions in the survey. The Nordic (13%) and UK (13%) samples had significantly higher proportions of joint ‘fast and inconsistent’ respondents than China (8%), USA (10%) and Spain (9%).

Table 4 Response behaviours by country/region

Twelve per cent of all choices were for ‘no treatment’, and analysis of variance showed that the proportion of ‘no treatment’ choices was significantly lower in China (7.1%) relative to other regions (13.6–18.4%). Six hundred and six respondents (8.0%) chose ‘no treatment’ in a majority of the tasks they saw (six or more ‘no treatment’ choices out of 11 tasks) and 256 (3.4%) chose ‘no treatment’ in all tasks. Logistic regression showed a statistically significant association between choosing no treatment in a majority of tasks and increasing respondent age, and that female individuals were significantly more likely to choose no treatment than male individuals, as were respondents in the lowest income category relative to the middle-income category.

There was some evidence of dominant preferences around effectiveness and discomfort. Of all respondents, 7.8% and 5.7% always chose the alternative that maximised the level of effectiveness or minimised the level of discomfort, respectively. The proportion of respondents with a dominant preference for these two attributes was substantially and significantly greater than for the other attributes (< 1% for all other attributes). Analysis of variance showed that the proportion of respondents with a potentially dominant preference for maximising effectiveness was significantly greater in the UK (11.6%) than other countries, with no significant differences between other regions (5.9–8.1%). Conversely, the proportion with a potentially dominant preference for minimising discomfort was significantly greater in China (10.4%) and Spain (8.0%) relative to other countries (1.3–3.4%).

3.1 Preference Modelling

The mixed-logit models had the best fit for each country/region. Country-specific coefficients, or “part-worth utilities”, are presented in Table 5 and illustrated in Fig. 1. In this figure, an upward-sloping line indicates that a higher level of the attribute was preferred, whilst a downward-sloping line indicates that a lower level was preferred. Most point estimates were statistically significant, with the exception of preferences over the number of daily injections in some samples and, most notably, the cost attribute in the Chinese sample. We observed significant heterogeneity in preferences over effectiveness, discomfort and shared decision making. The confidence interval around discomfort crossed zero in all samples, but the other attributes remained significant for most samples.

Table 5 Model coefficients and p-values by sample
Fig. 1

Part-worth utilities and confidence intervals by attribute and country/region

Heterogeneity between subgroups is summarised in Fig. 2, with point estimates and 95% confidence intervals as well as the proportion of respondents in each subgroup. The greatest deviations were between fast (less than half median completion time) and ‘non-fast’ respondents, and between consistent and inconsistent respondents. We return to the issue of these respondents in a sensitivity analysis below. Among female respondents, preferences for different attribute levels moved in the same direction as male respondents but tended to be relatively stronger. For example, female respondents derived greater positive utility from the higher level of effectiveness and greater negative utility from a lower level of effectiveness relative to the remainder of the sample. There are examples of statistically significant divergences in the strength of preference amongst the other subgroups, but these are relatively small in absolute terms.

Fig. 2

Preference heterogeneity by respondent subgroups (subgroup as proportion of all respondents). Fast Completers completed the discrete choice experiment in less than half the country-specific median completion time (19% of respondents), relative to ‘non-fast’ respondents. ART assistive reproductive therapies, Females female individuals (57% of respondents) relative to all other respondents (including “no answer”), High Income income quintiles 4 and 5 (29% of respondents) relative to quintiles 1–3, Inconsistent chose a different alternative in the repeated task (37% of respondents) relative to those who were consistent in their choice, No LT relationship not in a long-term relationship (7% of respondents) relative to those in a long-term relationship or married, Received ART previously received medical assistance (59% of respondents) relative to those who did not receive assistance

Table 5 and Fig. 3 show the relative contribution of each attribute to overall utility, conditional on the ranges presented to respondents. Effectiveness was the most important attribute in most countries and the number of injections was the least important attribute in most countries. The importance of cost was highly variable across samples, from statistically insignificant in China to almost as important as effectiveness in the USA. The importance of (dis)comfort was also variable, from the most important factor in China to relatively unimportant in the UK and the USA. The degree of shared decision making was more important than cost in the Nordic sample and was also relatively important in the UK and the USA, but relatively unimportant in China.

Fig. 3

Attribute relative importance by country/region. Values show attribute relative share of change in aggregate utility from least preferred to most preferred scenario. DM decision making

Estimates of WTP, including the ‘option value’ of treatment, are shown in Table 6 and illustrated in Fig. 4; China is excluded because preferences over cost were not statistically significant in this sample. A WTP greater than zero indicates a willingness to pay to secure a change from the reference level, whilst values less than zero indicate a willingness to pay to avoid a move from the reference level [28]. These results imply a substantial option value of treatment, ranging from €11,000 in the Nordic countries to more than €28,000 in the USA. In terms of treatment attributes, the greatest WTP was associated with improved effectiveness (likelihood of live birth): respondents from the Nordic countries, Spain and the UK were willing to pay between €3000 and €3500 for a 15-percentage-point improvement in effectiveness (or to avoid a 15-percentage-point reduction), whilst respondents from the USA were willing to pay more than €6000. Respondents from the Nordic countries and the USA were also willing to pay up to €3000 to move from ‘some’ to ‘full’ shared decision making, but were less willing to pay to move from ‘no’ to ‘some’ shared decision making. More generally, there was a WTP for a greater degree of shared decision making and for a reduction in treatment discomfort, although significant heterogeneity means that the WTP for reduced discomfort was not significantly different from zero in all samples. Respondents from the USA had a substantially greater WTP than other respondents for the option of treatment and for gains in effectiveness but were similar to other respondents with respect to WTP for changes in other attributes.

Table 6 Willingness to pay and 95% confidence intervals by attribute level and country/region (Euro)

As noted above, fast and inconsistent respondents showed significant divergence from other respondents. However, excluding the 861 respondents (11.3% of all respondents) who completed the DCE tasks in less than half of the median time for their country and who were inconsistent in the repeated task improved the Akaike Information Criterion in all models but did not substantively alter the pattern or magnitude of preferences or estimates of WTP, including the insignificant preference over cost in the Chinese sample. The results of this secondary analysis are available in the ESM.

4 Discussion

We find that the effectiveness of treatment was the most (or second most) important attribute across all samples. However, notwithstanding its relative importance and substantial WTP in all regions, we do not see a dominant preference for effectiveness to the exclusion of other considerations, or a wide gap between effectiveness and other attributes in terms of relative importance. Rather, we see evidence of simultaneous consideration of cost (in most regions), as well as aspects such as the (dis)comfort of treatment and the degree of shared decision making in treatment (Fig. 4).

Fig. 4

Willingness to pay by attribute and region, excluding China. Confidence intervals shown in red cross zero and are considered statistically insignificant

Respondents placed a significant value on access to treatment, reflected in the ‘option value’ of treatment, but also had a substantial WTP for improvements in the effectiveness of treatment, a greater degree of shared decision making, and among some respondents, less discomfort in treatment. In general, respondents from the USA had the greatest WTP for improvements in effectiveness and, along with respondents from the Nordic countries, a greater degree of shared decision making. Respondents from other countries had a similar pattern but typically a lower absolute WTP.

The insensitivity of Chinese respondents to the cost of treatment led to a small and insignificant coefficient on the cost attribute and it was not appropriate to use it in WTP calculations. However, we observe that preferences over other attributes in the Chinese sample were broadly consistent with expectations. As in the other samples, we see that effectiveness was quite important, whilst the number of daily injections was relatively unimportant. Excluding potentially inattentive respondents did not change this pattern of preferences or the price insensitivity in this sample.

We have no data to explain why Chinese respondents were price insensitive over the range of costs presented. Given that the Chinese sample had the lowest proportion of respondents flagged as ‘fast and inconsistent’, we do not believe that this unexpected result is driven by a high proportion of inattentive respondents. In addition, apart from low price sensitivity, there was no obvious evidence of random or irrational responses, such as a high proportion of statistically insignificant attributes or ‘objectively irrational’ preferences such as a preference for less effective treatments. An alternative hypothesis is that the range of costs presented to the Chinese sample was too narrow for respondents to form a significant preference over it, so that respondents disregarded cost in their choices. We used the same proportional range (± 40%) that produced statistically significant price sensitivity in the other samples, but if the midpoint estimate was artificially low for China (perhaps as a result of subsidised treatment costs), this range may have been inappropriate. Finally, this result may reflect some ‘social desirability’ bias (SDB), whereby respondents may have felt pressured by cultural expectations to prioritise children over wealth, leading to price insensitivity in their (hypothetical) responses. ‘Social desirability’ bias may be particularly likely in the context of emotive topics such as parenting and infertility.

The hypothesis of ‘cultural’ price insensitivity driven by a greater degree of SDB in the Chinese sample is consistent with results from the societal WTP portion of this survey [16], where Chinese public respondents had the highest stated maximum WTP for a national ART programme by a substantial margin. ‘Social desirability’ bias could also explain the substantial treatment-specific constants, or ‘option value’, in the Chinese sample, as consistently choosing any treatment over no treatment, regardless of the attribute levels of those treatments, would inflate the ‘option value’. These results are also consistent with a recent environmental valuation study that found SDB was associated with inflated valuations [33], and with other research finding that SDB was relatively stronger in more “collectivistic” countries [34].

To the extent that the price insensitivity of the Chinese sample reflects some degree of SDB, a key limitation of our study was that we did not anticipate and control for this possibility. We do not, however, see substantive differences between the preferences of the Chinese sample and those of the other samples for the other attributes. This suggests that although SDB may have led many Chinese respondents to disregard the cost attribute, this bias does not appear to have carried over to the other attributes. Indeed, the high importance of discomfort in the Chinese sample indicates that respondents were willing to trade off some chance of conception for a less uncomfortable treatment, suggesting their decisions were not driven by a sense of ‘conception at any cost’. There is evidence that anonymous online surveys can reduce SDB [35], but future research should seek more effective methods to mitigate this bias.

Another potential limitation is that respondents could have interpreted the description of no treatment as offering “no improved chance of pregnancy” to mean “no chance of conception”. This could have unintentionally encouraged some respondents to choose one of the treatment options over no treatment, thus inflating the option value of treatment. However, because the likelihood of spontaneous conception after more than 12 months of trying is typically less than 10% [36], the practical difference between “no improved chance” and “no chance” is relatively small: for more than 90% of people, the outcome of no treatment will be the same (no pregnancy). For this reason, we believe that any bias in our estimates of option value arising from this wording will be minimal.
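The argument that the bias is small can be made explicit as an upper bound. In the minimal sketch below, we assume (hypothetically) that utility is linear in the chance of pregnancy measured in percentage points, that the misreading lowers the perceived no-treatment chance by at most the spontaneous conception rate p0, and that option value in money is the treatment constant divided by −β_cost. Under these assumptions the inflation is at most the marginal WTP per percentage point multiplied by p0. The function name and coefficients are illustrative only.

```python
def option_value_inflation(beta_eff: float, beta_cost: float, p0_points: float) -> float:
    """Upper bound (in money) on option-value inflation if 'no improved chance'
    was read as 'no chance'.

    Hypothetical assumptions, for illustration only:
    - utility is linear in the chance of pregnancy, in percentage points;
    - the misreading lowers the perceived no-treatment chance by p0_points;
    - money-metric option value is the treatment constant divided by -beta_cost.
    """
    if beta_cost >= 0:
        raise ValueError("the cost coefficient must be negative")
    utility_shift = beta_eff * p0_points  # absorbed by the treatment constant
    return utility_shift / -beta_cost


# Hypothetical coefficients; p0 capped at the 10-point ceiling cited in the text.
for p0 in (2.0, 5.0, 10.0):
    print(p0, option_value_inflation(beta_eff=0.05, beta_cost=-0.001, p0_points=p0))
```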

Putting the overall results into the context of the existing literature, the relatively low importance of the number of injections and the high importance of the likelihood of success and of cost observed here are consistent with Musters et al. [10], who reported that an additional daily injection did not alter women’s treatment preferences, whereas cost and the live birth rate did. Their study was conducted in the Netherlands and included 206 respondents. The authors found that, on average, respondents were only willing to pay €1000 if it was associated with an improvement of at least 6% in the live birth rate. Similarly, Palumbo et al. [9] reported a willingness to pay of between €100 and €300 for a 1–2% improvement in effectiveness. They also found that positive doctor-patient information sharing was more important to patients than treatment comfort.
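One way to compare the two studies is to convert their figures into an implied WTP per percentage point of effectiveness. The sketch below is derived arithmetic, not a reported result: it assumes the “%” improvements are percentage points, that WTP is linear in effectiveness, and, because the pairing of amounts and gains in Palumbo et al. is not stated here, it takes the widest possible bracket.

```python
def wtp_per_point(wtp_eur: float, gain_points: float) -> float:
    """Implied willingness to pay per percentage point of effectiveness."""
    return wtp_eur / gain_points


# Musters et al. [10]: EUR 1000 only for a gain of at least 6 points,
# so EUR 1000 / 6 is an upper bound on the average WTP per point.
musters_max = wtp_per_point(1000, 6)

# Palumbo et al. [9]: EUR 100-300 for a 1-2 point gain. The pairing of
# amounts and gains is not stated here, so take the widest bracket.
palumbo_min = wtp_per_point(100, 2)
palumbo_max = wtp_per_point(300, 1)

print(f"Musters: at most EUR {musters_max:.0f} per point")
print(f"Palumbo: EUR {palumbo_min:.0f} to {palumbo_max:.0f} per point")
```

On this reading the two brackets overlap (roughly €50 to €167 per point).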

van Empel et al. [6] specifically tested the impact of ‘patient centredness’ on patient and physician preferences for fertility care. They did not include a cost attribute but asked respondents to trade off the pregnancy rate against process aspects of treatment, including travel time to the clinic, the physician’s attitude toward the patient, the information provided to the patient and the continuity of care. They found that patients were willing to accept up to a 10% lower pregnancy rate for a friendly and interested physician, and for clear and customised information on treatment. Physicians given the same tasks and asked to anticipate patient responses underestimated the value of clear and customised information by more than 40% (a 5.5% trade-off in the pregnancy rate compared with the patients’ 9.6%). Our results are consistent with this finding, but we see a non-linear value to shared decision making: moving from ‘none’ to ‘some’ shared decision making was considerably less valuable than moving from ‘some’ to ‘full’ in all samples. Shared decision making, though, is a difficult concept to quantify, particularly compared with some of the other attributes in the DCE. Different participants may have had varying perceptions of what represented an acceptable degree of shared decision making in this context, and future research should seek to understand which ART decisions patients are most interested in sharing and which they prefer to delegate to their physician.
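A non-linear pattern of this kind can only be detected if shared decision making enters the model as a categorical attribute (for example, dummy-coded with ‘none’ as the reference level), so that each step has its own estimated increment; a single linear term would force the two steps to be equal. The sketch below assumes such a coding, which is not confirmed in this section, and uses hypothetical part-worths chosen only to mimic the qualitative pattern described. It computes the value of each step in utility and, optionally, in money.

```python
from typing import Optional


def step_values(part_worths: dict[str, float], order: list[str],
                beta_cost: Optional[float] = None) -> dict[str, float]:
    """Utility gain (or WTP, if a negative cost coefficient is given) from
    moving up each adjacent step of an ordered categorical attribute."""
    steps = {}
    for lower, upper in zip(order, order[1:]):
        gain = part_worths[upper] - part_worths[lower]
        steps[f"{lower} -> {upper}"] = gain if beta_cost is None else gain / -beta_cost
    return steps


# Hypothetical dummy-coded part-worths, with 'none' as the reference level.
shared_decision_making = {"none": 0.0, "some": 0.15, "full": 0.60}
levels = ["none", "some", "full"]

print(step_values(shared_decision_making, levels))
print(step_values(shared_decision_making, levels, beta_cost=-0.001))
```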

5 Conclusions

This study provides evidence from large multi-national samples to generalise the results of previous smaller scale research around patient preferences for ART. We find that the direction of preferences over attribute levels is relatively uniform across the countries/regions in the sample, but that the relative importance of those attributes can differ substantially. We also see that respondents balanced concerns for treatment effectiveness with other considerations, including the cost and (dis)comfort of treatment, and the degree of shared decision making. Moreover, we find a substantial ‘option value’ to treatment, demonstrating the value of access to ART to those with experience of subfertility.