Introduction

Evaluating the benefits of health treatments can assist the allocation of scarce health-care resources by maximizing health benefits. The effectiveness of health-care interventions is currently preferably measured in terms of quality-adjusted life years [1, 2]. Quality-adjusted life years combine the quality and quantity of life into a one-dimensional outcome. Commonly used scales to assess health-related quality of life (QoL) are generic utility measures, like the EQ-5D [3]. These QoL measures provide utilities for different levels of a predefined set of domains (e.g., mobility). They focus on domains of QoL that can be expected to be affected by health-care interventions and are therefore often labeled as health-related QoL measures. An increasingly common critique is that such utility measures are too narrowly focused and do not capture all domains relevant to QoL [4, 5]. For example, these measures mainly focus on determining the physical effects of cure-related treatments and do not detect important effects of health-care interventions in the care sector on mental and social domains of QoL [6, 7]. A worrying consequence is that the effects of health-care interventions are not captured as comprehensively as possible, which results in suboptimal measures of the effectiveness of health-care interventions. Therefore, measures need to go beyond these scales. Unfortunately, there is no consensus in the scientific literature on the core domains of QoL [5, 8–10]. To identify these core domains, we conducted a three-stage Delphi-procedure among different groups of experts. The current article outlines the outcomes of this Delphi-procedure.

Delphi consensus procedure

Delphi consensus procedures have proven to be a valuable tool in gaining insight into health-related issues [11–13]. The selection of experts is critical to the success of the Delphi technique in providing in-depth understanding of scientific questions. Although a Delphi-procedure does not require representative sampling, it does require the careful selection of panel members who are information- and experience-rich [14–16]. In many studies, multiple groups of experts are included to capture a broad spectrum of insights and information [13, 17]. Accordingly, we included five groups of experts: patients, family of patients, clinicians, scientists and the general public. We did not include two other groups that could be seen as informative, namely board members of health-care insurers and policy makers, because workers in these professions are expected to base their judgments not on their own opinion but on existing information (e.g., scientific outcomes, statements of medical professionals). We will focus on the differences and similarities between all five groups.

Methods

Before conducting a three-stage Delphi-procedure, we performed an extensive analysis of existing (health-related) QoL measurements to identify potential health-related QoL domains. This search provides solid input for the Delphi-procedure [18]. Our search was intended to be an open process that is able to identify all potential domains of health-related QoL (e.g., irrespective of level of abstractness). We did use a broad and general conceptual framework to structure the extensive number of domains we found during our search; we used the definition of health of the World Health Organization (WHO) [19] as guidance. That is, in accordance with the WHO definition, we perceive health-related QoL as a state of complete positive physical, mental, and social well-being. We used this broad conceptual framework, instead of a more specific theoretical model, because there is a degree of consensus within the scientific field of population health around this definition; concerning more specific theoretical models, there is much more debate and ambiguity [20]. We also used the criterion that domains could be influenced by health-care interventions (e.g., medical interventions, psychological interventions). The domains that we identified during our search formed the input for the first round of the Delphi-procedure.

Just as important, during the first questionnaire round of the Delphi-procedure, participants were encouraged to freely express their own beliefs. That is, they were encouraged to mention additional health-related QoL domains not found during the search or to comment on the description of domains. These suggestions made by the participants during the first round were used to adjust the second-round questionnaire. Finally, the experts re-evaluated the second-round outcomes in the third round. This iterative approach allows participants to adjust their opinions when needed and to obtain feedback about the opinions of other people. All experts remained anonymous to minimize group pressure and conformity bias [21, 22].

Analysis of existing measurements

We identified QoL domains that are part of existing health-related QoL utility measures and domains that are part of overall QoL, satisfaction or well-being measures. Health-related QoL utility measures mostly originate from an economic scientific tradition [23]. Overall QoL measures can originate from multiple backgrounds (e.g., medicine, psychology, sociology, economics).

(Health-related) QoL utility measures

The most commonly used health-related QoL utility measures, as previously identified [3, 24–26], are the EQ-5D, SF-6D, HUI2, HUI3, 15D, QWB-SA and AQoL. We examined these questionnaires and specified which domains are present in these measures.

General QoL measures

To get a broader and more diverse overview of domains of health-related QoL, we also looked at the QoL questionnaires included in the online archive of the Australian Centre on Quality of Life [27]. This database includes hundreds of questionnaires on health-related QoL, QoL, well-being, wellness and life satisfaction. Nine categories are specified (e.g., normal population, cognitive disability, children/adolescents). We selected the questionnaires from the category ‘normal population’ (762 questionnaires). For each scale, a short description and a reference to the original article are given in the online archive. We excluded questionnaires that still explicitly focused on specific groups (e.g., specific patient groups) and health-related utility measures already identified in the above-mentioned search. We divided all remaining questionnaires into two categories: (a) questionnaires that considered QoL in general and (b) questionnaires that considered specific domains within QoL (e.g., anxiety).

Based on the title of the questionnaire and a short description given in the online archive, we were able to see whether a scale considered QoL in general or specific domains. If a scale focused on QoL in general, we looked up the original article to determine which domains were central in the measure (Footnote 1). We identified 53 general questionnaires on QoL (of the 762 questionnaires in the online archive) that matched our criteria. For these questionnaires, we scrutinized the original article for the QoL domains. Concerning questionnaires that focus on specific QoL domains, we identified a total of 484 questionnaires that consider one or two specific domains of QoL (of the 762 questionnaires in the online archive). We used the title and general description of each scale to identify the specific QoL domains. We checked whether the scales were intended to be used in the general population.

Summarizing outcomes of analysis of existing measurements

An extensive list of domains was obtained. We excluded domains that were mentioned multiple times or domains that overlapped. Next, we excluded domains if they were so specific that they were only applicable to specific subgroups (e.g., satisfaction with meditation options). As mentioned above, the domains should also be expected to be influenced by health-care interventions. Finally, we ended up with 40 health-related QoL domains (see Table 1). These domains formed the input for the first round of the Delphi-procedure.

Table 1 Final list of domains based on analysis of existing (health-related) QoL measurements

First round

First-round procedure and participants

Five different groups of participants were recruited to take part in the Delphi-procedure (see Table 2 for demographics): (a) patients: people who, at the moment of recruitment or in the previous year, had an acute/chronic physical/mental disease, were terminally ill or underwent fertility treatment; (b) family members of patients: people who had a family member who can be categorized as being a patient; (c) clinicians: people who had been working with patients/clients for at least 2 years. This was a diverse group; in the first round, the group consisted of 19 clinicians, 5 physiotherapists, 5 nurses and 5 psychologists. They had an average of 17.79 years of professional experience with patients/clients (SD = 10.41); (d) scientific experts: prominent researchers from all over the world on the topics of QoL, well-being and health-related QoL. We personally approached the following people: (1) all first authors of the articles that are part of our analysis of existing general QoL measurements who stated in their online CV that QoL/well-being is their current main interest; (2) the editorial boards of five prominent scientific journals on QoL/well-being (i.e., Applied Psychology: Health and Well-Being, International Journal of Wellbeing, Psychology of Well-Being, Quality of Life Research, and Journal of Happiness Studies). This was a mixed group; in the first round, the group consisted of 8 people with an economics background, 8 with a psychological background and 16 remaining participants with varied backgrounds such as epidemiology, philosophy, psychiatry, marketing and public health; (e) general population: people who did not fit the criteria of the above-mentioned groups.

Table 2 Respondents of three-stage Delphi-procedure

Patients, family of patients and the general population were recruited by means of calls on Dutch websites of patient organizations and calls in local Dutch newspapers. Clinicians were personally approached. In exchange for participation in the total Delphi-procedure, we contributed 7.50 euro to a charity fund of their choice (except for scientific experts). All participants received a link to the first online Delphi questionnaire by e-mail.

First-round questionnaire

The first-round Internet-based questionnaire started with some background information about our project and about QoL. It was communicated that we defined health-related QoL as a state of complete positive physical, mental and social well-being (see definition of health of the World Health Organization [19]). We also emphasized that the domains should be able to be influenced by health-care interventions. Next, participants answered 40 questions; these covered the domains of health-related QoL that were identified during our analysis of existing measurements (see Table 1). Participants were asked to indicate to what extent they perceived each domain as an important part of health-related QoL (endpoints 1 [not important] to 4 [very important]). These 40 domains were categorized, based on content, into five categories: physical, mental, social, domains on the interface of mental and social well-being, and remaining domains. These five categories were presented to participants in random order, and within each category the domains were also presented in random order. At the end of each list of domains belonging to a certain category, participants were encouraged to write down any comments. In addition, at the end of the questionnaire, participants were asked if they thought that domains were missing and, if so, were encouraged to write down these domain(s) and provide a short explanation. Finally, demographic questions were posed.

First-round analyses

First, we analyzed the answers given by the five groups on the 40 domains of health-related QoL. For all domains within each group, we determined whether or not there was consensus on the importance of a specific domain. In all the rounds, we used median scores (Mdn) to determine consensus. We did not use means, because means are more sensitive to extreme scores than Mdn scores, and are therefore less appropriate to determine the presence of consensus within groups [16]. We classified three outcomes in the first round: (a) consensus that a domain is highly important (Mdn = 4); (b) consensus that a domain is less or moderately important (Mdn < 3); (c) no consensus that a domain is highly important (Mdn ≥ 3 and <4). In addition, we analyzed the answers to all open-ended questions for all five groups together. Three different researchers looked at the open-ended answers to exclude interpretation bias [12, 13]. Suggestions that were mentioned by at least two participants were processed. This resulted in several changes (see Table 3).
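As an illustration of the first-round rule described above, the following minimal Python sketch classifies one group's ratings of a single domain by its median. The function name, the outcome labels and the example ratings are hypothetical and not taken from the study data.

```python
from statistics import mean, median

def classify_first_round(ratings):
    """Classify one group's 4-point ratings of one domain by their median."""
    mdn = median(ratings)
    if mdn == 4:
        return "consensus: highly important"
    if mdn < 3:
        return "consensus: less or moderately important"
    # Remaining case: 3 <= Mdn < 4
    return "no consensus on high importance"

# Hypothetical ratings: one extreme score lowers the mean but not the median.
ratings = [4, 4, 4, 4, 1]
print(mean(ratings), median(ratings), classify_first_round(ratings))
```

The example also shows why the median is less sensitive to extreme scores than the mean: a single rating of 1 pulls the mean down to 3.4, while the median stays at 4.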

Table 3 New, deleted, merged, and altered domains in the second and third round of Delphi-procedure

Second round

Second-round procedure and participants

All people who participated in the first round were invited to participate in the second round. This round started 6 weeks after the first. Each group was asked to re-rate the first-round domains for which that group had not reached consensus on whether they were highly important (Mdn ≥ 3 and <4), and to rate the added and altered domains. For domains that were re-rated, a summary report was presented to each participant showing their own first-round answer, the average of the group and a frequency chart of the group’s answers.

Second-round questionnaire

In the second-round questionnaire, the domains were also ordered in five categories. In the second-round questionnaire, we used a 7-point scale (endpoints 1 [totally not important] to 7 [very important]). We included a more detailed response scale to get more variance in the answers.

Second-round analyses

In this round, we determined consensus by means of both Mdn scores and Inter-Quartile Deviations (IQDs). IQDs are commonly used to determine consensus [11, 16]. We included IQDs because our 7-point scale allowed for a meaningful interpretation of this outcome. IQD represents the distance between the twenty-fifth percentile and the seventy-fifth percentile values in opinions, with a smaller IQD indicating larger consensus. An IQD ≤ 1 can be considered as good consensus on a 7-point Likert scale [15]. We classified three outcomes: (a) consensus and agreement that a domain is highly important (IQD ≤ 1 and Mdn ≥ 6); (b) consensus that a domain is less to moderately important (IQD ≤ 1 and Mdn ≤ 5); (c) no consensus (IQD > 1).
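The second-round rule can be sketched in Python as follows. This is an illustrative sketch only: the text does not state how the percentiles were computed, so the 'inclusive' quartile method used here is an assumption, and the handling of a median strictly between 5 and 6 (possible with an even number of respondents) is likewise an assumption, since the three outcomes above do not cover it. Function name and labels are hypothetical.

```python
from statistics import median, quantiles

def classify_second_round(ratings):
    """Classify one group's 7-point ratings of one domain by IQD and median."""
    # Quartile method is an assumption; the source does not specify one.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqd = q3 - q1  # distance between 25th and 75th percentile
    mdn = median(ratings)
    if iqd > 1:
        return "no consensus"
    if mdn >= 6:
        return "consensus: highly important"
    if mdn <= 5:
        return "consensus: less to moderately important"
    # Assumption: 5 < Mdn < 6 is not covered by the stated criteria.
    return "consensus: importance unclassified"

# Hypothetical ratings from eight respondents in one group
print(classify_second_round([6, 6, 7, 7, 6, 5, 7, 6]))
```

A different quartile convention could move borderline domains across the IQD ≤ 1 threshold, so the chosen method would need to be fixed before comparing groups.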

Third round

Third-round procedure and participants

All people who participated in round two were invited to participate in round three. This round started 4 weeks after the previous round. Each group was asked to re-rate the domains of round two for which no consensus was reached. Again, a summary report of the results of the previous round was presented.

Third-round questionnaire

No domains were changed or added to the third-round questionnaire compared with the second-round questionnaire. The response scales were identical to round two. In the third round, we also asked respondents to indicate which five domains they perceived as most important aspects of QoL of all 42 domains. Next, participants were asked to rank these five domains from least important to most important.

Third-round analyses

The same criteria as used in the second round (i.e., Mdn scores and IQDs) were applied to determine consensus and agreement. In addition, for each group, we determined which domains were mentioned most often in the list of five most important domains (Footnote 2).

Results

We analyzed the results for the five groups separately. In Table 4, the results of all three Delphi-procedure rounds are presented.

Table 4 Means and other results of Three-Stage Delphi-procedure for all five groups

First-round results

Patients and family members of patients already agreed in the first round on the high importance of a quarter of the presented domains. The other groups reached consensus on the high importance of only a few domains in round one. There is no domain for which all five groups reached consensus on the high importance. However, all groups, except the scientists, agreed on the high level of importance of the domains: self-acceptance, self-esteem and autonomy.

Second-round results

The outcomes show that the scientists in particular perceive far fewer domains as highly important than the other groups. In addition, both scientists and the general population reached consensus that several domains are not important, while the other groups mainly reached consensus on the high level of importance of domains. Looking at specific domains, differences between groups are also clearly present. For example, the domain ‘emotional control’ is perceived as highly important by patients and family of patients, but as not highly important by scientists and the general population. There are also similarities between the groups. For example, almost all groups agree on the high level of importance of ‘being able to perform activities of daily living that are important to you’ and ‘enjoying the little things in life’.

Third-round results

Third-round results of five groups

A striking finding is the large absolute number of domains for which no consensus is attained by scientific experts. They did not reach consensus on 20 domains, while for the other groups, this was the case for fewer than 10 domains. The third round mainly proved effective in reaching consensus on a large absolute number of domains for family of patients.

At the end of round three, we wanted to get a better understanding of the similarities and differences between all five groups. Because many groups perceived a large number of domains as highly important, we made a further selection: within each group, we looked at the 10 domains with the highest mean ratings on which consensus and agreement were attained that they were highly important (Footnote 3). This provided a rather diverse picture. The five groups strongly agree on ‘self-acceptance,’ as this domain is rated highly by all groups. The following domains are rated highly by the majority of groups (i.e., three or four groups): being able to perform activities of daily living that are important to you, independence, mental balance, self-esteem, acceptation of the situation, enjoying the little things in life, autonomy, purpose in life, satisfaction with daily activities and satisfaction with life roles.
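The selection step described above, taking each group's highest-rated consensus domains and then counting in how many groups each domain appears, might be sketched as follows. All names and ratings are hypothetical, the example uses two groups and a cut-off of two domains instead of five groups and ten domains, and the set of consensus domains is assumed to have been determined beforehand with the Mdn and IQD criteria.

```python
from collections import Counter
from statistics import mean

def top_rated_domains(ratings, consensus_high, n=10):
    """Return up to n consensus-high domains of one group, highest mean first."""
    means = {d: mean(ratings[d]) for d in consensus_high}
    return sorted(means, key=means.get, reverse=True)[:n]

def groups_per_domain(top_by_group):
    """Count in how many groups' top lists each domain appears."""
    return Counter(d for tops in top_by_group.values() for d in set(tops))

# Hypothetical 7-point ratings for two groups
patients = {"self-acceptance": [7, 7, 6], "mobility": [5, 6, 5], "autonomy": [6, 7, 6]}
scientists = {"self-acceptance": [6, 7, 6], "vitality": [6, 6, 6]}
top_by_group = {
    "patients": top_rated_domains(patients, {"self-acceptance", "autonomy"}, n=2),
    "scientists": top_rated_domains(scientists, {"self-acceptance", "vitality"}, n=2),
}
shared = groups_per_domain(top_by_group)
print(shared.most_common())
```

Domains whose count equals the number of groups correspond to those rated highly by all groups; counts of three or four would correspond to the majority of five groups.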

Five most important domains

In Table 5, the 10 domains that were most frequently mentioned in the list of five most important domains of each group are given. ‘Self-esteem’ and ‘good social contact’ were part of the list of all groups. The following domains were part of the list of the majority of the groups (i.e., three or four groups): self-acceptance, independence, being able to perform activities of daily living that are important to you, optimism, autonomy and purpose in life (Footnote 4). All these outcomes closely resemble the described outcomes based upon the highest mean ratings.

Table 5 Frequencies of domains mentioned in the list of five most important domains in the third round of the Delphi-procedure

Discussion

The Delphi-procedure we performed aimed to identify the essential domains of QoL that are important in the context of health-care interventions. Consequently, our Delphi-procedure shows which domains should potentially be included in generic preference-based (utility) scales to comprehensively measure health-related QoL. Generic preference-based measures are used for evaluation of (cost-) effectiveness of health-care interventions, capturing meaningful within-person change over time when it occurs.

Five different groups of experts rated more than 40 potential domains of health-related QoL. The results showed that all five groups agreed on few domains. That is, only ‘self-acceptance’ is part of the highest mean ratings of all groups. When looking at the list of five most important domains, ‘self-esteem’ and ‘good social contact’ are the only two domains on which all five groups agree that they are highly important. Interestingly, these domains cover aspects that can be classified as mental and social phenomena and not as typical physical domains that are part of health utility measures, like the EQ-5D and SF-6D (i.e., mobility, vitality, dealing with somatic complaints). Looking at the more extensive lists of domains that the majority of the groups see as highly important (i.e., three or four groups), we again see that most of the domains concern mental and social phenomena. Moreover, the typical physical domains used in health-related QoL utility measures (like ‘vitality’ and ‘mobility’) are not present in these more extensive lists. In sum, mental and social domains are perceived as more essential than physical domains across all five stakeholder groups. This conclusion can have practical implications for future (cost-) effectiveness studies concerning health-care interventions. It can be stated that adding (more) mental and social domains to existing health-related QoL utility measures will result in a more comprehensive operationalization of health-related QoL. This will ultimately facilitate the allocation of health-care resources to interventions that are most effective in increasing people’s (health-related) QoL in relation to their costs.

An important point of consideration is that we know little about what participants are thinking when they provide scores on health states in existing health-related QoL measures. It is possible that respondents, when faced with a health state with substantial physical impairments (e.g., impaired vision, impaired cognition), readily infer that in such a state social interaction is limited and self-esteem is low, and therefore provide a low score on that state. So it is possible that existing generic preference-based measures already implicitly, to some extent, include social and mental domains. Our Delphi-procedure does, however, show the importance of providing more explicit attention to social and mental domains.

Limitations

We chose to present participants at the start of the Delphi study with a list of domains extracted from existing questionnaires. Participants did have the opportunity to add, delete or alter domains. However, by presenting participants with a predefined set of domains, we might have steered the participants’ way of thinking about health-related QoL. An alternative would have been to simply ask participants to list essential domains of health-related QoL. We chose to use a predefined set of domains to give our study a solid base [18], and we believe that providing no list of domains could have been too cognitively demanding, resulting in a small list that includes only the domains that come to mind most easily. A second potential point of concern is that we did not explicitly select a specific theoretical framework at the beginning of our Delphi-procedure to guide our study. We used a broad conceptual framework, the WHO definition of health. By using existing measures as input for our Delphi-procedure, we have implicitly incorporated the specific theoretical frameworks of health-related QoL underlying these existing measures. For undertaking future steps, it can be helpful to use a more specific theoretical model. That is, the next step is to narrow down the list of 25 domains (see Table 5), resulting from this Delphi study, to create a new and comprehensive measure of health-related QoL. Utility measures require a limited number of questions/domains to be included [28]. A more detailed theoretical model on health-related QoL could provide informative input for this search process.

Theoretical issues

What sort of theoretical issues does a more specific theoretical model on health-related QoL need to resolve? An important theoretical issue is that the domains that are part of our Delphi-procedure differ in level of abstractness. Some concern people’s general life views (e.g., feeling like an autonomous person) and others are very concrete and objectively measurable (e.g., being able to walk). It is possible that some specific domains are part of another larger and more abstract domain [29]. In addition, the domains also differ in the extent to which they are ‘proximal’ or ‘distal’. Proximal refers to domains such as activities of daily living, while distal refers to more fundamental domains that influence the extent to which one can perform activities of daily living. Functional limitations may, for example, influence whether one is capable of engaging in self-care. A second theoretical issue concerns social domains; some scientists suggest that social interaction should be omitted from the domains included in a measure of (health-related) QoL. It is argued that social interaction affects health and health affects social interaction, and therefore, social interaction should be measured separately [30]. A final theoretical issue that needs to be addressed is whether our new utility scale is intended to capture people’s capabilities (e.g., coping abilities) or their functioning. This distinction is made in the Capability Theory of Sen [31]. Focusing on capabilities is a paradigm that is given increasing attention in health economics [32]. We see strong added value in focusing on people’s capabilities; it makes it possible to capture the extent to which people are able to autonomously cope with life’s ever-changing physical, mental and social challenges.

Future steps

In order to develop a new QoL utility measure, several follow-up steps need to be taken to narrow down the number of domains. We believe that the first step should be to make a rough selection of health-related QoL domains (present in the Delphi-procedure) and construct concrete questions that capture these domains. The next step will be testing these questions in a large sample of respondents. For example, a factor analysis can test which domains, if any, overlap, and correlations with existing QoL/well-being measures can show which domains have the best level of validity in capturing physical, mental and social phenomena of QoL. We will include a diverse array of measures: questionnaires on health-related QoL and more general QoL/well-being measures, and both multi-attribute and one-dimensional questionnaires. This enables us to create a smaller, more operationalizable set of domains.

Conclusion

Inevitably, health-care resources are scarce. Evaluating the benefits of health-care interventions assists the allocation of these scarce resources and helps to maximize health benefits. An increasingly common critique is that traditionally used scales are too narrowly focused, resulting in suboptimal measures of effectiveness in which not all relevant domains of QoL are captured. Therefore, measures need to go beyond these scales. Unfortunately, there is no consensus on the core domains of QoL. The current three-stage Delphi consensus study among patients, family of patients, clinicians, scientists and the general public shows that measures need to put more emphasis on mental and social domains to capture aspects of QoL that are essential to people.