Introduction

Integrated care programmes that focus on multi-morbidity often include a package of complex and multifaceted interventions that have multiple aims [1, 2]. Such aims include improved population health, better patient experience, cost reduction (known as the Triple Aim [3]), and better experience of providing care (known as the Quadruple Aim [4]). Therefore, evaluations of such models of care require the measurement of a broad spectrum of outcomes that go beyond traditional health outcomes like health-related quality of life and longevity [5].

However, not all aims are expected to have the same importance to the different stakeholders involved in the design, provision, financing, and receipt of integrated care for persons with multi-morbidity. For example, patients may assign higher importance to experience with care than clinicians with overburdened workloads, while payers may be more sensitive to costs than other stakeholders. Discordance in preferences complicates the decision-making process [6]. International experience and scientific evidence show that the success of integrated care models is highly dependent on the alignment of stakeholder preferences for the model’s aims and achievements [7,8,9]. Hence, it is important to elicit their preferences and take them into account when designing and assessing integrated care for multi-morbidity [9].

A common technique to measure preferences in healthcare delivery is a discrete choice experiment (DCE), in which respondents are asked to make a number of choices between two hypothetical options characterised by attributes with differing levels [10]. A DCE forces respondents to make trade-offs between multiple elements or aims of a health care intervention [11]. Especially in integrated care this is important as interventions focus on improving outcomes beyond the quality-adjusted life year (QALY). An advantage of a DCE is that, by making patient-preferences so explicit, it makes it possible to incorporate them in decision-making [12]. To study heterogeneity in choices, one can use a latent class analysis to identify underlying subgroups of respondents with similar preferences and characterised by background characteristics. This information can be used to better understand differences in preferences between stakeholders and further align them.

The aim of this study was (1) to investigate to what extent outcomes beyond health are valued and (2) to study the heterogeneity of preferences for outcome measures of integrated care among stakeholders involved in integrated care. The outcome measures included in the preference study were physical functioning, psychological well-being, social relationships & participation, enjoyment of life, resilience, person-centeredness, continuity of care, and total health and social care costs. Respondents were recruited among persons with multi-morbidity, partners & other informal caregivers, professionals, payers, and policymakers. This is the first DCE study including such a wide variety of outcome measures relevant to integrated care and such a diversity of stakeholders involved in integrated care.

Methods

Context of the DCE

This study took place in the context of the EU-funded SELFIE2020 project, in which we aimed to elicit preferences for outcome measures of integrated care that could be used in Multi-Criteria Decision Analysis [13], see box 1. In the current study, preference data from Dutch stakeholders involved in integrated care were used.

Box 1 Information on the SELFIE project

Attributes and levels

The development of attributes and attribute levels (see Table 1) consisted of two steps. First, a longlist with potentially relevant attributes was composed using four methods: (1) a literature review of outcome measures used in (integrated) care, (2) national workshops with patients, informal caregivers, professionals, payers and policymakers in the eight countries in the SELFIE project to discuss outcomes of integrated care, (3) eight focus groups with individuals with multi-morbidity to discuss what outcomes of integrated care matter to them [14], and (4) a review of outcomes being used in the 17 integrated care programmes in Europe that were evaluated in the SELFIE project. The second step was to shorten the list, a process that was guided by multiple criteria, including relevance to multi-morbidity in different contexts and population groups, non-redundancy, operationality, and preference independence [13]. The levels to describe the attributes were defined such that they represent the full range of the scale from worst to best, with an intermediate level in between. The wording of the levels was based on validated questionnaires that are used to measure these outcomes in empirical research [13]. For costs we used estimates of the mean total health and social care costs for people with multi-morbidity in the Netherlands (middle level), which we increased and decreased by 20%.

Table 1 Attributes and levels

Design

Given the large number of attributes, in combination with the three possible levels for each, a full factorial design that includes all 6561 possible alternatives (i.e. 3^8: 8 attributes with 3 attribute levels each) would not be feasible. To reduce this set of combinations to a manageable number, we used specialised software to select the most informative combinations of attribute levels per choice question, using Bayesian design algorithms that maximise the D-efficiency for a pre-specified conditional logit main-effects model [15,16,17]. Maximizing the D-efficiency involves minimizing the confidence sphere around the complete set of model parameters in this logit model. Priors for the weights of the attribute-levels, as required for an efficient optimisation approach, were obtained from literature [18]. To further improve the efficiency of the parameter estimates obtained from the DCE, the overall DCE design comprised ten different sub-designs. This means that instead of using 1 design for all respondents, we constructed 10 different sub-designs, and each respondent is only asked to complete one, randomly chosen, sub-design that consists of a pre-specified number of 18 choice tasks [15]. The informative priors from the literature were updated using the answers of the first 50 respondents from each stakeholder group, to create a more efficient DCE design for the remaining respondents in the stakeholder group.
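To illustrate why a full factorial design is impractical, the size of the design space can be enumerated directly. The following is a minimal sketch only: the counts of attributes, levels, sub-designs and choice tasks are taken from the text, whereas the level coding (1 to 3), the function name `assign_subdesign` and the seed are illustrative assumptions and do not reproduce the specialised design software used in the study.

```python
import itertools
import random

N_ATTRIBUTES = 8           # from the text
N_LEVELS = 3               # from the text
N_SUBDESIGNS = 10          # from the text
TASKS_PER_SUBDESIGN = 18   # from the text

# Enumerate every profile of the full factorial; levels are coded 1..3 (assumed coding).
profiles = list(itertools.product(range(1, N_LEVELS + 1), repeat=N_ATTRIBUTES))
n_profiles = len(profiles)

# Number of distinct unordered pairs of profiles that could form a two-alternative task.
n_pairs = n_profiles * (n_profiles - 1) // 2


def assign_subdesign(rng):
    """Randomly pick one of the sub-designs (numbered 1..N_SUBDESIGNS) for a respondent."""
    return rng.randint(1, N_SUBDESIGNS)


rng = random.Random(2017)  # arbitrary seed, for reproducibility of the illustration only
print(n_profiles, n_pairs, assign_subdesign(rng))
```

Even before the design constraints are applied, the number of candidate pairs runs into the millions, whereas each respondent answers only 18 choice tasks; this is the gap that an efficient design is meant to bridge.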

When scanning the subsets of the full-choice design to find a D-optimal design we imposed two design constraints to reduce the complexity and to avoid unrealistic choice tasks. First, the highest level of enjoyment of life and lowest level of psychological well-being, and vice versa, were never combined within a single choice option, i.e., ‘Seldom or never stressed, worried, listless, anxious, and down’ (highest level of psychological well-being), and ‘No or barely any pleasure and happiness in life’ (lowest level of enjoyment of life) could never be part of the same programme description. Second, in each choice task either three or four attributes needed to have the same level for each alternative to reduce overall complexity and improve response efficiency [19].
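The two design constraints described above can be expressed as a simple validity check on a candidate choice task. This is a hedged sketch: the attribute names come from the text, but the dictionary representation, the coding of levels as 1 (worst) to 3 (best), and the function names `profile_is_realistic` and `task_is_valid` are assumptions made for illustration, not the implementation of the design software.

```python
ATTRIBUTES = [
    "physical functioning",
    "psychological well-being",
    "social relationships & participation",
    "enjoyment of life",
    "resilience",
    "person-centeredness",
    "continuity of care",
    "total health and social care costs",
]
WORST, BEST = 1, 3  # assumed level coding


def profile_is_realistic(profile):
    """Constraint 1: the best level of psychological well-being or enjoyment of
    life is never combined with the worst level of the other attribute."""
    levels = {profile["psychological well-being"], profile["enjoyment of life"]}
    return levels != {WORST, BEST}


def task_is_valid(option_a, option_b):
    """Constraint 2: exactly three or four attributes share the same level
    across the two alternatives; both alternatives must also be realistic."""
    overlap = sum(option_a[name] == option_b[name] for name in ATTRIBUTES)
    return (
        profile_is_realistic(option_a)
        and profile_is_realistic(option_b)
        and overlap in (3, 4)
    )


# Hypothetical example: programme B differs from programme A on four attributes.
option_a = {name: 2 for name in ATTRIBUTES}
option_b = dict(option_a)
for name in ATTRIBUTES[:4]:
    option_b[name] = 3
print(task_is_valid(option_a, option_b))
```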

Questionnaire

The DCE questionnaire was self-administered and web-based. After informed consent, the meaning of the attributes and levels was explained. Each choice task described two alternatives with eight attributes of varying levels. These two alternatives represented two integrated care programmes with different outcome-profiles. These were labelled ‘Care programme A’ and ‘Care programme B’ (see Fig. 1 for an example choice task). Respondents were asked to complete two ‘warm-up’ DCE choice-tasks before the main choice tasks. This familiarised respondents with the attributes and levels and prepared them for the full set of 18 choice tasks. The choice tasks were presented in three groups of six choice tasks each, with a few general demographic or health-related questions in between to reduce the repetitive nature of the choice tasks. The questionnaire concluded with debriefing questions related to the stakeholder perspectives the respondents identified themselves with and the ease of understanding and completion of the choice tasks.

Fig. 1 Example choice task DCE

The DCE questionnaire was pilot tested with six patients, including think-aloud sessions to see if there were elements that needed clarification. After the pilot study, small changes were made to the visual design of the questionnaire to enhance its clarity.

Subject recruitment and data collection

To recruit a representative group of different stakeholders involved in integrated care, we aimed to recruit 750 respondents among persons with multi-morbidity (n = 150), informal caregivers of persons with multi-morbidity (n = 150), professionals (n = 150), payers (n = 150) and policymakers (n = 150). In July 2017, members of an online marketing research panel who were persons with multi-morbidity, informal caregivers, or professional care providers were invited to complete the questionnaire. Payers and policymakers were invited via the same panel organisation, but since it was difficult to reach 150 respondents in these groups, recruitment was supplemented by personal invitations of payers and policymakers in the network of the researchers of the SELFIE project, followed by snowballing. Between July 2017 and July 2018, we approached healthcare payers such as health insurance companies and departments of municipalities responsible for paying social care. To include policymakers, we invited public servants working at the ministry of health, provincial or local governments, official governmental advisory bodies, mayors, aldermen, and city councillors with health and/or social care in their portfolio. Each participant was asked to confirm that they belonged to their assigned stakeholder group and to respond from that perspective. They were also invited to indicate one or more other stakeholder group(s) that they felt they belonged to as well. It was made impossible to fill in the questionnaire using a mobile phone or tablet as the choice task would not be fully visible.

Statistical analysis

First, a Bayesian mixed logit model (MIXL), using diffuse priors for the mean values of the random coefficients, was used to analyse the data of all respondents simultaneously. This model allowed all utility coefficients to be randomly distributed and estimated a full covariance matrix among them. A burn-in phase of 10,000 Markov Chain Monte Carlo (MCMC) draws was used, followed by 30,000 draws to reliably approximate the posterior. The relative importance of each attribute (i.e., outcome measure of integrated care) was calculated as the coefficient of its best level (level 3) divided by the sum of the best-level coefficients of all attributes.
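The relative importance calculation can be sketched in a few lines. The function name `relative_importance` is an assumption made for illustration. The example uses only the two figures reported in the note to Fig. 2 (a level-3 coefficient of 5.571 for enjoyment of life and a sum of 25.164 across all outcomes); the remaining seven outcomes are lumped into one entry rather than invented individually.

```python
def relative_importance(level3_coefs):
    """Divide each level-3 coefficient by the sum of all level-3 coefficients."""
    total = sum(level3_coefs.values())
    return {name: coef / total for name, coef in level3_coefs.items()}


# Figures from the note to Fig. 2; the second entry is the remainder of the
# reported total (25.164 - 5.571), not a coefficient of any single outcome.
level3_coefs = {
    "enjoyment of life": 5.571,
    "seven other outcomes combined": 25.164 - 5.571,
}

weights = relative_importance(level3_coefs)
print(round(weights["enjoyment of life"], 3))
```

By construction the weights sum to 1, which makes them directly comparable across outcome measures.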

Second, a latent class model was used to model heterogeneity across individuals with a discrete distribution over a set of classes, and individuals were sorted into a set of classes based on their observed choice data [20]. Latent class analysis is an extension of the standard logit model and is used to identify unobserved groups of similar individuals (latent classes) with homogeneous preferences based on observed variables. These distinct groups can possess (widely) different preferences regarding integrated care. Furthermore, latent class modelling is probabilistic, which means that respondents are allocated to the group they are most likely to be a member of. Subsequently, posterior analysis can be used to describe differences in characteristics across groups. Initially, we compared solutions with 2 to 9 classes for the best statistical fit based on the Bayesian Information Criterion (BIC) and Consistent Akaike Information Criterion (CAIC), with a lower value implying a better fit [21]. We also considered theoretical interpretability and the size of the classes to see if another number of classes would be more logical based on the observed variables. For each respondent, the posterior probability that (s)he belongs to each latent class was calculated and each respondent was assigned to the class with the highest probability. Class membership was based solely on estimated preferences from the DCE. To assess whether differences between classes were significant, we conducted chi-square tests (categorical variables), one-way ANOVA tests (continuous variables) and Kruskal-Wallis tests (non-parametric test for continuous variables).
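The model-selection criteria and the posterior class assignment follow standard formulas, sketched below. This is an illustration under stated assumptions, not the lclogit implementation: the function names, the class shares and the likelihood values are hypothetical, and whether the sample size entering BIC and CAIC is the number of respondents or the number of choice observations is a modelling choice that the text does not specify.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 LL + k ln(N); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def caic(log_likelihood, n_params, n_obs):
    """Consistent Akaike Information Criterion: -2 LL + k (ln(N) + 1)."""
    return -2.0 * log_likelihood + n_params * (math.log(n_obs) + 1.0)


def posterior_membership(class_shares, choice_likelihoods):
    """Bayes' rule: P(class | choices) is proportional to
    class share times the likelihood of the observed choices in that class."""
    joint = [s * l for s, l in zip(class_shares, choice_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_class(posteriors):
    """Return the 1-based index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__) + 1


# Hypothetical respondent: equal prior class shares and made-up likelihoods
# of the respondent's 18 choices under each of four classes.
shares = [0.25, 0.25, 0.25, 0.25]
likelihoods = [1e-6, 4e-6, 2e-6, 1e-6]
posteriors = posterior_membership(shares, likelihoods)
print(posteriors, assign_class(posteriors))
```

The maximum posterior probability per respondent is also a useful diagnostic: values close to 1 indicate a clear-cut assignment, whereas values close to 1 divided by the number of classes indicate that the respondent fits several classes about equally well.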

Third, as the observed preference heterogeneity could be related to stakeholders having multiple roles, we investigated which other perspectives the stakeholders identified themselves with, in a descriptive analysis.

Analyses were performed in Stata 16.0, using the bayesmixedlogit module specified with Metropolis-within-Gibbs sampling and default (uninformative) priors for the MIXL model and the lclogit procedure for the latent class models.

Results

Study population

In total, 935 persons started the questionnaire and gave informed consent, of whom 739 (79%) finished all DCE scenarios. The mean time to complete the questionnaire was approximately 20 min. A total of 705 respondents spent at least 5 min filling in the questionnaire, and all further analyses were performed on this group.

Table 2 summarizes the characteristics of the respondents. Their mean age was 49.6 years, 54.2% were female, and the majority (73.1%) of respondents were employed. The respondents’ highest attained educational level was relatively high. Of the respondents, 31% reported no health problems.

Table 2 Respondent characteristics

DCE preferences across all stakeholders

The results of the Bayesian MIXL (Table 3) showed that, for every attribute, levels 2 and 3 differed from level 1. The coefficients had the expected positive sign and the coefficients of level 3 were always larger than those of level 2. This means that levels 2 and 3 were valued higher than level 1 and that level 3 was valued higher than level 2. On average, the respondents assigned the highest relative importance to enjoyment of life, followed by psychological well-being, and resilience (Fig. 2). The least important outcome was total health and social care costs. However, the standard deviations of all attribute levels indicated a wide variation in preferences among respondents (Table 3).

Table 3 Attribute-level coefficients of the Bayesian MIXL model
Fig. 2 Relative importance of the outcome measures. (Note: All relative importance weights sum up to 1. The relative importance of each outcome measure was based on the coefficient of its attribute-level 3 divided by the sum of all level 3 coefficients. For instance, ‘Enjoyment of life’ had a coefficient of 5.571 for its level-3 attribute (see Table 3), and it yielded a relative importance weight of 5.571/25.164 = 0.221, where 25.164 was the sum of level-3 coefficients across all domains.)

Grouping stakeholders with similar preferences

Based on the information criteria (BIC and CAIC), the latent class model with four classes provided the best model fit. Based on the class probabilities, 20% (n = 140) of all respondents were assigned to class 1, 39% (n = 273) to class 2, 18% (n = 126) to class 3, and 24% (n = 166) to class 4. The average of the respondents’ maximum posterior class membership probabilities was 0.82 (SD = 0.17, median = 0.87), ranging from 0.74 for class 1 to 0.89 for class 4. Figure 3 presents the class-specific preference coefficients. The estimates in classes 1, 2 and 3 had the expected direction, i.e., respondents preferred a higher level of each outcome. In class 4 the preferences for physical functioning, psychological well-being and social relationships & participation were not statistically significant. A table with all coefficients, including standard errors and p-values, is presented in online Appendix 1.

Fig. 3 Results latent class analysis: coefficients of attribute-level 3. *Significant preference within the class (P < 0.05)

Compared to the other classes, class 1 respondents had the highest estimates for the experience with care outcomes (continuity of care and person-centeredness). In class 2, respondents assigned a relatively higher weight to enjoyment of life, followed by psychological well-being, than the other classes. Respondents in class 3 had a stronger preference for physical health than the other classes, followed by three outcomes related to mental health, i.e., psychological well-being, enjoyment of life and resilience. The coefficients in class 4 were overall quite small, which indicates that the preferences were less consistent than in the other classes.

Characteristics of stakeholders with similar preferences

Table 4 presents the background characteristics of stakeholders in the four latent classes. All classes included representatives from all primary stakeholder groups, although professionals were overrepresented in class 1 (26%), followed by policymakers (25%). Class 1 had the highest share of females (62%). Also, health was best in this class compared to the other classes based on both self-perceived general health (mean 2.54 and 44% of respondents who answered “excellent” or “very good”) and the mean number of health conditions (1.55).

Table 4 Class-specific respondent characteristics

In class 2 the stakeholder groups were quite evenly distributed, with somewhat more professionals (23%). This class consisted of respondents with the lowest mean age (47.6 years). Furthermore, the educational level and employment rate were the highest in this class. 75% of the respondents had a high educational level and 79% currently had a paid job.

Persons with multi-morbidity were more frequently a member of class 3 (29%). This class, which predominantly consisted of males (59%), also had a worse health status than classes 1 and 2.

Class 4 had the highest share of informal caregivers (34%). The respondents in this class were on average older (53.7 years), less educated, in worse general health (31% “fair” or “poor” health) and had more health problems (mean 2.23 health problems) than respondents in the other classes. Of the respondents in this class, 30% found the questionnaire difficult or very difficult, although the time to completion was the shortest of all classes (18.2 min).

Multiple perspectives per stakeholder

Of the respondents, 48% chose no additional stakeholder perspective that they identified themselves with, 39% selected only one additional perspective, and 13% identified with two or more additional perspectives. In the entire sample, 18% of the persons with multi-morbidity identified themselves as informal caregiver (Table 5). Of the informal caregivers, 29% identified themselves as person with multi-morbidity. Of the professionals, 30% identified themselves with the perspective of informal caregiver, and 59% of the payers viewed themselves as policymaker. Of the policymakers, 24% selected informal caregiver as additional perspective.

Table 5 Overlap in perspectives of respondents

When comparing the four classes (Table 4), class 4 had the lowest percentage of respondents that chose no additional perspective (43%). In all four classes ‘Informal caregiver’ was the most frequent additional perspective and in classes 1, 3 and 4 this was followed by person with multi-morbidity. There were no marked differences between the classes in the additional perspectives that were chosen.

Discussion

Interpretation of the main findings

This study investigated stakeholders’ preferences for outcomes of integrated care for persons with multi-morbidity using a DCE. Based on the mixed logit model results of the pooled data, which showed that all attribute levels were statistically different from level 1, it was established that all outcome measures and all levels of the outcome measures influenced stakeholders’ choices. This means that stakeholders took all outcome measures into account when deciding which care programme they preferred. Enjoyment of life, psychological well-being and resilience were deemed most important and total health and social care costs least important, but there was a lot of variation in preferences.

When respondents were divided into four classes using latent class analysis, we could identify a class that assigned a relatively higher weight to the two experience with care outcomes, i.e., continuity of care and person-centeredness, a class that emphasized the importance of enjoyment of life and psychological well-being, a class that was more focused on physical functioning and a class with inconsistent preferences. Each of the classes included persons with multi-morbidity as well as informal caregivers, professionals, payers and policymakers, suggesting that differences between the classes were not particularly driven by stakeholder group. One of the reasons that differences in preferences cannot be directly related to stakeholder perspective might be that respondents have multiple roles, as was shown by their self-reported additional stakeholder perspectives. Many informal caregivers, payers and policymakers were patients themselves.

Nevertheless, some stakeholders were overrepresented in some classes. The class that focused more on experience with care (class 1) included relatively more professionals and policymakers, i.e., stakeholders that are used to monitoring process outcomes as indicators of the quality of care. In the Dutch context, where patients are free to choose between care providers, professionals are incentivized to improve patients’ experience and satisfaction with their services to increase their market share. This is emphasized by payers who consider quality indicators when contracting providers. The class that focused on physical and mental health (class 3) had the highest share of persons with multi-morbidity, resulting in the highest share of persons who reported having more than one health problem. A likely explanation is that people with immediate concerns about their health prefer outcomes related to these domains in contrast to experience with care or cost outcomes. In the class with less consistent preferences (class 4), persons with multi-morbidity and informal caregivers were overrepresented. Respondents in this class found the questionnaire difficult, which has likely contributed to the inconsistency. The fact that they had the shortest completion time might illustrate this difficulty.

The largest class (class 2), with approximately 39% of the sample, consisted of respondents that put much weight on enjoyment of life. The respondents were quite evenly distributed over the stakeholder groups. The respondents in this class were relatively younger, more highly educated, healthier or more likely to have a paid job than respondents in classes 3 and 4. Their lesser experience with (physical) health problems might explain their higher valuation of enjoyment of life.

Comparison with other research

In contrast to our study, most previous DCE studies include the perspective of one stakeholder group, e.g., patients or healthcare workers, or compare the preferences of two stakeholder groups [22, 23]. Furthermore, much health-related DCE research includes attributes related to characteristics of the new therapies or drugs (i.e., structure-attributes such as waiting time till appointment, care provider/setting or process-attributes such as shared decision making) [24], whereas in the current study we included outcomes of the intervention.

In a previous paper, covering preference data from 8 European countries (including these Dutch data) [18], we also compared different stakeholder groups directly and reported considerable within-country agreement between stakeholders involved in integrated care with enjoyment of life ranking first and costs ranking last. However, we also found that patients assigned significantly higher values to physical functioning than professionals in five countries, which is in line with our finding that class 3, which focused more on physical health, contained the highest proportion of persons with multi-morbidity.

Similar to our study, other studies acknowledge the importance of measuring a broader set of outcomes than merely the physical and mental health outcomes that are traditionally included in health-related quality of life [25, 26]. This is required to fully capture the outcomes that interventions are trying to achieve. The discussion on outcomes beyond the quality-adjusted life year (QALY) largely concentrates on interventions in the care sector, such as elderly care or care for physically or mentally disabled people. In that context, a lot of attention is being paid to well-being outcomes, for which several questionnaires were developed in recent years [27]. Well-known instruments include the Adult Social Care Outcomes Toolkit (ASCOT) [28] and the ICEpop CAPability measure for Older people (ICECAP-O) [29, 30]. The ASCOT focuses on social care related quality of life and, similar to our outcome measures, also includes ‘social participation’ as one of its 8 domains. The ICECAP-O is conceptually based on the capability approach and one of the five domains covered in this instrument is ‘enjoyment’, which received the highest weight in our study. A more recent instrument is the Well-being Of Older People measure [31], which captures relevant well-being domains for older people, among whom multi-morbidity is common, and includes e.g., ‘resilience and acceptance’ and ‘social contacts’. These outcomes were also included in the current DCE, in which resilience was among the 3 outcomes that received the highest importance. Another example is the extension of the EQ-5D into the EuroQOL Health and Well-being (EQ-HWB) [32], which also includes outcomes in social care and carers’ quality of life. Similarly, in our study we included social relationships and participation.

Strengths and limitations

This study is one of the first that elicited weights for a set of outcomes that goes beyond health, and requires trade-offs between health, well-being, experience, and costs to obtain weights. It included a sufficiently large representation of multiple stakeholder groups involved in integrated care for multi-morbidity. Furthermore, the inclusion of a variety of background characteristics and information on self-perceived health, allowed us to investigate the differences between stakeholders that had different opinions on the importance of the outcome measures. What is unique for this study is that we also asked the stakeholders for other roles they might have. We have learned that the additional perspective(s) that were chosen did not explain the variation in preferences between the classes.

Several limitations of the current study should also be mentioned. First, the survey could only be completed using a computer and not via a mobile phone or tablet. Therefore, younger persons may be underrepresented. Second, although the sample is quite large, payers are less well represented among the stakeholder groups. Moreover, although a DCE is a widely used method to elicit preferences in health care and health care delivery, it also has its limitations. One of the main concerns regarding DCEs is the external validity due to hypothetical bias, i.e., the disparity between stated preferences based on hypothetical DCE questions and revealed preferences based on actual choices in real life [33, 34]. Recently, a number of case studies reported a high external validity of DCEs, with over 90% of the individual choices correctly predicted, thus suggesting a high degree of confidence [35]. However, this research does not pertain to integrated care, nor does it use outcome measures as attributes.

For future research, it would be interesting to further investigate the reasons behind differences in preferences. Preferences are, for example, likely to be influenced by a person’s own experiences, or the experiences of significant others [36, 37]. In the current study we did not explicitly ask about this.

Implications

Information about these preferences can be used in the design of new integrated care initiatives by concentrating on interventions that specifically aim to improve well-being and by better targeting interventions to patients’ preferences. As this study has shown that patients’ preferences cannot be presumed based on their characteristics, obtaining more insight into an individual’s preferences should be an important part of a shared decision-making process.

The preferences can also be used in health care evaluation. Currently, health care evaluation focuses mainly on health status, life expectancy and QALYs, although there is a demand for tools that incorporate multiple outcomes that emerge from interventions with benefits beyond health [38, 39]. The current study showed that outcomes related to well-being, and mental health in particular, were highly valued. More specifically, enjoyment of life received much weight both in the full-sample analysis and in 3 out of 4 latent classes, in which it received the highest or second-highest weight. Yet it is not a common outcome measure in health care research. Future evaluations of integrated care interventions that measure a similar set of outcomes can make use of the weights obtained in this study. That enables the calculation of weighted outcomes which can be combined in an overall value score using multi-criteria decision analysis (MCDA) [11].
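As an illustration of how such weights might enter an overall value score, a simple weighted-sum form of MCDA is sketched below. This is an assumption about one common aggregation rule, not a description of the MCDA procedure used in the SELFIE project. The weight of 0.221 for enjoyment of life is taken from Fig. 2; the lumping of the other outcomes into one entry, the performance scores (standardised to a 0 to 1 scale) and the function name `overall_value` are hypothetical.

```python
def overall_value(weights, performance):
    """Weighted-sum value score: sum over outcomes of weight times standardised
    performance. Both dictionaries must cover the same outcome measures."""
    if set(weights) != set(performance):
        raise ValueError("weights and performance must cover the same outcomes")
    return sum(weights[name] * performance[name] for name in weights)


# Weight for enjoyment of life from Fig. 2; the remaining weight (1 - 0.221)
# is lumped together for brevity.
weights = {"enjoyment of life": 0.221, "seven other outcomes combined": 0.779}

# Hypothetical standardised performance (0 = worst, 1 = best) of one programme.
performance = {"enjoyment of life": 0.8, "seven other outcomes combined": 0.5}

print(round(overall_value(weights, performance), 4))
```

In an actual evaluation each of the eight outcomes would carry its own weight and performance score, and the scores of the integrated care programme and its comparator would be computed in the same way before being compared.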

Conclusion

Stakeholders involved in integrated care for multi-morbidity value the outcome measure ‘enjoyment of life’ most and ‘total health and social care costs’ least. There is considerable heterogeneity in preferences, with a group of stakeholders assigning relatively higher importance to experience with care outcomes, a group assigning relatively higher importance to enjoyment of life and psychological well-being and a group focusing more on physical health. Differences in preferences were only weakly related to whether respondents were patients, informal caregivers, professionals, payers or policymakers, as many stakeholders have multiple roles. This heterogeneity in preferences underlines the need to measure a wide range of different outcome measures when evaluating integrated care, including well-being outcomes and experience with care outcomes.