Key Points
Patients, clinicians, and experts from across the world were supported by international organizations in defining what aspects of health should be considered (and how they should be measured) when assessing the added value of services for adults, in particular for public and primary health care.
The result is a brief yet comprehensive set of outcomes and valid, reliable, well supported and readily available measurement tools to support the delivery of health care that results in added value for patients.

Introduction

Health reforms worldwide aim at strengthening the orientation of health care systems for the provision of value-based health care by linking payments to outcomes, rather than processes of care [1, 2]. In this context, value is defined as the ratio between outcomes achieved (e.g., health improvements) and resources employed to achieve those outcomes (e.g., costs). There is broad consensus that such outcomes should include patient-reported outcomes of health status, for example, functional status or health-related quality of life as well as more traditional measures, such as survival and morbidity [3, 4]. When value is based on outcomes that truly represent patients’ priorities, this can create a context in which incentives are properly aligned across all health care stakeholders for the creation of value for patients [1, 4]. Although the evidence base for the feasibility and impact of value-based health care is still limited [5], the potential of the approach is raising great interest, particularly among numerous early adopters [6].

A key step in the implementation and evaluation of value-based health care is the identification of the outcomes that define value for a specific population. The systematic identification of such outcome sets has been under-researched. Notable exceptions are efforts by trialists to identify core outcome sets for research purposes, which have crystallized in the COMET initiative [7, 8], as well as specific efforts by the World Health Organization (WHO) to promote the development of sets of functional outcomes for populations with specific conditions [9]. In parallel, through the Patient-Reported Indicator Surveys initiative, the OECD is leading the development, standardization, and implementation of a new generation of indicators that measure the outcomes and experiences of health care that matter most to people [10].

Although a large number of institutions and professional organizations are supporting the development of the relevant metrics, a coherent and comprehensive approach has been lacking. The International Consortium for Health Outcomes Measurement (ICHOM) initiative has set out to fill this gap by promoting and facilitating the development of standardized outcome sets as benchmark criteria when measuring value in health care. Thirty-two global outcome sets have been developed thus far, covering conditions responsible for over half of the global burden of disease [11].

The approach thus far, however useful, has important limitations. The disease orientation of the available sets makes it difficult to fully appraise aspects of health beyond those directly linked to the presence of disease. Furthermore, it leaves out aspects of overall health that are not specific to a particular condition, such as those relevant to people with multimorbidity and polypharmacy [12]. These limitations undermine the appraisal of the value of services oriented towards health promotion and prevention, such as public health and primary care services [13]. Moreover, the disease orientation is at odds with the renewed recognition of the essential role that these services play within health care systems and the support for primary care and public health functions as the core of integrated health services and one of the three pillars of the WHO’s primary health care approach [14]. The lack of a standard set for monitoring patient-centred outcomes in a way that is not dependent on disease status and/or health services is a significant limiting factor for developing the evidence base for value-based healthcare.

Aims

Our aim was to define a minimum Overall Adult Health Standard Set that will enable outcome measurement in routine clinical practice to improve decision making between providers and patients aged 18 years or older, to facilitate quality improvement, and to allow for benchmarking across organizations. Specific objectives included the identification of a parsimonious, consensus-based set of outcomes, and the identification of a set of variables to be systematically collected to enable case-mix adjustment to support comparison across providers and health care systems.

Methods

Design

A panel of professionals and patients was convened by invitation based on a snowballing approach. To ensure a wide variety of both expertise and geographical representation, panel members with a range of professional expertise, including patient advocacy, primary care and public health professionals, health services administration, and research and outcomes measurement were recruited from 13 countries. The panel took part in eight videoconferences. An executive team (JMV, JBG, AA, SM, AC, SW, LM, AJ) supported, coordinated, and guided the panel’s activities.

A structured, consensus-driven modified three-round Delphi approach was implemented from May 2017 to December 2019. This approach has been successfully applied to the development of 35 population-specific outcomes sets now covering over 50% of the world’s burden of disease, with a number of others currently in development [15, 16].

As outlined in further detail in the following sections, decision making by the panel was facilitated by the executive team through: (a) a series of reviews of (i) the relevant literature in academic databases (Medline, Embase and PsycINFO [through Ovid], CINAHL [EBSCOhost], and ProQuest) and grey literature; (ii) patient surveys by the WHO, the World Bank, the Organization for Economic Cooperation and Development (OECD), and the Commonwealth Fund; and (iii) existing ICHOM standard sets [17] (executive team, supplemented with input from panel members); (b) application of prespecified criteria for short-listing (executive team); (c) circulation of documentation to the panel members, subsequently discussed via teleconference (either of the whole panel or small groups of panel members depending on complexity and amount of information and according to panel members’ expertise); and (d) formal surveys using prespecified thresholds for agreement (working group) [Fig. 1].

Fig. 1 Methods for the development of the standard set. PROMs patient-reported outcome measures

The process was iterated for the identification of (1) outcomes; (2) measures; (3) case-mix variables; and (4) timing of all the relevant measurements. From May 2017 to December 2019, the panel was convened for eight full working group videoconferences and ten breakout group teleconferences. A patient validation survey and an open review survey gathered further input from patients, experts, and interested stakeholders.

Identification and Selection of Outcomes

To further clarify the scope of the work, the panel discussed and reached consensus on a conceptual framework proposed by the executive team and built upon existing outcomes frameworks [1, 18, 19]. The framework included three broad outcome domains that were consistent with the preventative scope of the services for which the set should be suitable (current health status, future health status, and modifiable predictors of future health status), alongside examples of possible outcomes (electronic supplementary Fig. 1).

Subsequent searches of the literature for overall health outcomes and their definitions (electronic supplementary Table 1) allowed the identification of articles for initial review by the executive team, who selected relevant articles according to predefined eligibility criteria (Box 1).

Box 1 Eligibility criteria for the documents retrieved in the literature searches

In preparation for the abstract reviews, three researchers (AA, JBG, JMV) applied eligibility criteria to the same abstracts and discussed disagreements until achieving agreement among researchers, reaching a kappa score >0.7, which occurred after review of 60 abstracts. Eligibility criteria were applied to all the retrieved documents. Outcomes were extracted from eligible documents using a structured proforma and were presented to the working group members, who voted for inclusion in the standard outcome set those outcomes that they considered to (1) represent the end results of care (rather than the process of care); (2) be important for patients; (3) be feasible to measure accurately; and (4) be modifiable with quality improvement efforts. Working group members voted on all outcomes with an explicit threshold for their inclusion in the standard set. For an outcome to be included, it had to be rated between 7 and 9 on a 9-point relevance scale by at least 80% of working group members, where 9 was the highest possible value. For an outcome to be excluded, it had to be rated between 1 and 3 on the same scale by at least 80% of respondents. Any outcome that met neither criterion was considered inconclusive and was voted upon again in a subsequent voting round. In each round, panel members also considered whether two or more outcomes could be consolidated because of their overlap. This set of outcomes was considered provisional and contingent on the availability of relevant instruments.
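The inclusion and exclusion thresholds described above can be read as a simple decision rule applied to each outcome's ratings. The following Python sketch is illustrative only and is not the software used by the panel; the function name `classify_outcome` and the parameter `threshold_pct` are hypothetical, and the sketch assumes that every rating is an integer on the 1 to 9 scale and that the 80% threshold is inclusive.

```python
from typing import Sequence


def classify_outcome(ratings: Sequence[int], threshold_pct: int = 80) -> str:
    """Classify one outcome from panel ratings on the 9-point relevance scale.

    Hypothetical helper: returns "include", "exclude", or "inconclusive".
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 1-9 scale")

    n = len(ratings)
    n_high = sum(1 for r in ratings if 7 <= r <= 9)  # rated 7-9
    n_low = sum(1 for r in ratings if 1 <= r <= 3)   # rated 1-3

    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    if n_high * 100 >= threshold_pct * n:
        return "include"
    if n_low * 100 >= threshold_pct * n:
        return "exclude"
    # Neither criterion met: the outcome goes to a further voting round.
    return "inconclusive"
```

Under these assumptions, an outcome rated 7 to 9 by exactly 8 of 10 respondents would be classified as included, whereas an outcome with ratings spread across the scale would be carried over to the next round.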

An online patient survey was conducted to gather patients’ opinions about the draft set of health outcomes recommended by the panel. For pragmatic reasons, the survey was conducted in the US and Wales (UK). The survey was active for 6 weeks from 3 October 2019 via the Qualtrics survey platform. Patients were asked to rate each proposed outcome on a 9-point scale, where 9 indicates an outcome of the greatest importance to patients and 1 the least importance.

Identification and Selection of Measures

Searches of the literature for overall health outcomes measures were initially undertaken by the executive team. Relevant articles were selected according to predefined eligibility criteria (Box 1). In preparation for the abstract reviews, three researchers (AA, JBG, JMV) applied eligibility criteria to the same abstracts and discussed disagreements until achieving agreement among researchers, reaching a kappa score >0.7. Eligibility criteria were applied to all the retrieved documents. Outcome measures were extracted from eligible documents following Tool Selection Methodology and were presented to the working group members, who voted for inclusion in the standard outcome set. The panel recognized that defining a set that would produce separate multi-item measures for all the proposed outcomes would result in an unreasonable burden on respondents, which would be a barrier for use in routine clinical practice. It was therefore unanimously agreed to select measures that would cover all the proposed domains without necessarily eliciting separate scores for each of them (e.g., a measure may include an item on the outcome ‘pain’, thereby ensuring the coverage of that outcome, without necessarily eliciting a distinct ‘pain’ score).

In a sequential process, panel members firstly selected candidate generic measures (those not being specific for any population, disease-specific or otherwise, or a priori defined outcomes), and then additional measures were selected according to outcomes that were not covered by the generic measures. The processes described above for outcomes were then followed for measures. The criteria applied by the expert panel included coverage of multiple outcomes; availability of evidence on reliability (>0.7), validity [20], and sensitivity to change; time of administration; feasibility of implementation within diverse, international, clinical settings; availability in English and other languages (no minimum number was defined); and minimization of financial barriers to using the measures. The panel voted on competing outcome sets, including high-scoring measures as supplemented by additional ones to ensure the comprehensiveness of each set.
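The sequential logic described above (generic measures first, then additional measures for any outcomes left uncovered) amounts to a coverage check. The Python sketch below illustrates that idea under stated assumptions: the function name `uncovered_outcomes`, the measure names, and the coverage mapping are all hypothetical placeholders and do not describe the content of any real instrument or the panel's actual ratings.

```python
from typing import Dict, Iterable, List, Set


def uncovered_outcomes(
    required: Iterable[str],
    coverage: Dict[str, Set[str]],
    selected: Iterable[str],
) -> List[str]:
    """Return the required outcomes not covered by any selected measure."""
    covered: Set[str] = set()
    for measure in selected:
        covered |= coverage[measure]
    return sorted(set(required) - covered)


# Hypothetical example data (placeholders, not real instrument content).
required = ["pain", "fatigue", "sleeping", "seeing", "hearing"]
coverage = {
    "generic_measure_A": {"pain", "fatigue"},
    "supplementary_measure_B": {"sleeping"},
    "single_item_seeing": {"seeing"},
    "single_item_hearing": {"hearing"},
}

# Step 1: check which outcomes the generic measure leaves uncovered.
gaps = uncovered_outcomes(required, coverage, ["generic_measure_A"])

# Step 2: add supplementary measures until no gap remains.
remaining = uncovered_outcomes(required, coverage, list(coverage))
```

In this toy example, `gaps` lists the outcomes that would prompt the search for additional measures, and `remaining` is empty once the supplementary measures and single items are added.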

Identification and Selection of Case-Mix Variables

The aim was to identify a parsimonious consensus-based and harmonized set (in relation to existing standard sets) of case-mix factors for which there was evidence of association with the proposed outcomes and which could be reliably measured across diverse international clinical settings. All documents in the initial search for outcome indicators that included an outcome present in the final set were considered eligible. Case-mix variables were then extracted and the panel was asked to prioritize those factors for which there was a stronger association with the proposed outcomes following steps described above for outcomes and measures.

Identification and Selection of Timepoints

The timepoints for each recommended outcome and measure were extracted from the documents included in the previous steps. The executive team presented a suggested framework for data collection, which was discussed among working group members.

Open Review and Approval of Standard Set

An open review survey was distributed via the ICHOM electronic newsletter, the panel members’ networks, and organizations, to professionals working in or involved with healthcare whose primary role was not that of a patient. Respondents were asked for their feedback on the standard set as a whole (outcomes, measures, case-mix, and timepoints). The online survey was distributed using the Qualtrics survey platform from 3 October 2019 and remained active for 6 weeks. The final standard set was agreed after consideration of the open review results in a final survey of panel members.

Results

Identification and Selection of Outcomes

Of 4927 articles retrieved from the literature searches, 2079 documents were deemed eligible. Initially, 301 potentially relevant outcomes were identified from the literature review and supplemental sources. Three rounds of Delphi were necessary based on the prespecified criteria. A total of 34 outcomes emerged, which were first consolidated into 22 outcomes (electronic supplementary Table 2). Over 90% of participants in the patient survey (77 complete responses; Wales, UK 68%; US 32%) agreed that these outcomes were either somewhat important or most important (electronic supplementary Table 2).

After further consolidation, the final set included 16 outcomes organized into four domains: (1) overall health (general health); (2) physical health (general physical health, physical functioning, mobility, seeing, hearing, fatigue, pain); (3) mental health (general mental health, vitality, sleeping, symptoms of depression, symptoms of anxiety); and (4) social health (general social health, interpersonal functioning, work) [electronic supplementary Table 3].
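For implementers who wish to represent the final set in software, the domains and outcomes listed above could be held in a simple mapping. The Python sketch below is one possible representation, not a format defined by the standard set; the names `OUTCOME_SET` and `count_outcomes` are hypothetical.

```python
from typing import Dict, List

# Domains and outcomes as listed in the text above.
OUTCOME_SET: Dict[str, List[str]] = {
    "overall health": ["general health"],
    "physical health": [
        "general physical health",
        "physical functioning",
        "mobility",
        "seeing",
        "hearing",
        "fatigue",
        "pain",
    ],
    "mental health": [
        "general mental health",
        "vitality",
        "sleeping",
        "symptoms of depression",
        "symptoms of anxiety",
    ],
    "social health": [
        "general social health",
        "interpersonal functioning",
        "work",
    ],
}


def count_outcomes(outcome_set: Dict[str, List[str]]) -> int:
    """Total number of outcomes across all domains."""
    return sum(len(outcomes) for outcomes in outcome_set.values())
```

Counting the entries gives 1 + 7 + 5 + 3 = 16 outcomes across four domains, which matches the totals reported above.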

Additional outcomes still considered at this stage but subsequently excluded because of lack of a suitable tool included resilience, patient health and health care capabilities (including ‘knowledge, skills, and confidence’, ‘health literacy’, and ‘involvement and participation in health care’), and selected outcomes of physical health (fitness), mental health (emotional support, substance and drug use), and social health (social isolation, discrimination).

Identification and Selection of Measures

Initially, 130 potentially relevant measures were identified from 2079 eligible documents. After applying the prespecified criteria, 23 measures were evaluated by the panel in the modified Delphi process. The measures with the strongest support included PROMIS Global Health-10 (38%) [21], PROMIS-29 (29%) [22], FACT-GP (14%) [23], RAND 36 (10%) [24], WHOQOL-BREF (5%) [25] and EQ-5D-5L (5%) [26] [electronic supplementary Table 4]. To balance comprehensiveness against the burden of administration, the panel subsequently considered two alternative sets, each designed by supplementing one of the two most promising measures with additional measures to ensure comprehensiveness of the set: one including PROMIS Global Health-10 and WHO-DAS-12 [27], and another composed of FACT-GP and EQ-5D-5L. Both sets also included WHO-5 [28], as well as single items for seeing and hearing, because they covered outcomes (notably sleeping, vitality and positive aspects of health, as well as hearing and seeing) that were not adequately covered by the other proposed measures. The first set was endorsed by 68% of panel members, while 32% endorsed the alternative set.

The final set consisted of three multi-item measures and two single-item measures covering all the outcomes in the standard set (Table 1). The PROMIS Global Health v1.2–10 (PROMIS-10) has 10 items and elicits two scores: global physical health and global mental health. The WHO-5 Well Being Index (WHO-5) has 5 items and elicits a single score of quality of life. The WHO Disability Assessment Schedule 2.0–12 has 12 items covering the domains of understanding and communicating; moving and getting around; attending to one’s hygiene, dressing, eating and staying alone; interacting with other people; domestic responsibilities, leisure, work and school; and joining in community activities and participating in society. It elicits a single measurement of functioning. The single item on seeing corresponds to the ‘global vision rating’ from the National Eye Institute Visual Function Questionnaire (NEI-VFQ-25) [29], while the item on hearing corresponds to the ‘general condition of hearing’ in the National Health and Nutrition Examination Survey (NHANES) 2019–2020 in the US [30]. Of note, the global vision rating has also been used in NHANES.

Table 1 Domains, outcomes, and measures of the Overall Adult Health Standard Set

Identification and Selection of Case-Mix Variables

Exactly 100 candidate demographic, clinical and treatment-related, and lifestyle factors were identified through literature search, previous ICHOM standard sets, registry review, and working group feedback. At the end of the Delphi process, 13 case-mix factors were selected: (1) demographic (age, sex, level of education, marital status, employment status, housing status); (2) clinical (comorbidities, body mass index, blood pressure, cardiovascular risk [based on sex, age, smoking status, blood pressure, cholesterol and diabetes status]); and (3) lifestyle (smoking status, alcohol intake and physical exercise) [Table 2]. Dietary and eating habits, although deemed relevant, were excluded because of lack of a suitable measure.

Table 2 Factors for case-mix adjustments

Identification and Selection of Timepoints

Recommendations regarding timing of measurement of different variables and instruments were based on literature review and extensive discussions among the working group members. There was consensus that the variables age and sex should be collected at baseline. The following variables should be measured at baseline and then annually: PROMIS-10, WHO-5, WHO-DAS-12, self-reported hearing and seeing, and all case-mix variables except age and sex (Fig. 2). This provides an adequate balance between administrative burden and a frequency of measurement sufficient to detect relevant clinical changes. Timing of the collection of comorbid conditions should be synchronized with administration of other datasets.
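The recommended timing can be summarized as a small collection schedule. The Python sketch below is a minimal illustration, assuming that follow-up is tracked in whole years since baseline; the names `BASELINE_ONLY`, `BASELINE_AND_ANNUAL`, and `items_due` are hypothetical and are not part of the standard set.

```python
from typing import List

# Collected once, at baseline.
BASELINE_ONLY = ["age", "sex"]

# Collected at baseline and then annually.
BASELINE_AND_ANNUAL = [
    "PROMIS-10",
    "WHO-5",
    "WHO-DAS-12",
    "self-reported hearing",
    "self-reported seeing",
    "case-mix variables other than age and sex",
]


def items_due(years_since_baseline: int) -> List[str]:
    """List what should be collected at a given annual timepoint.

    Assumption: timepoints are whole years, with 0 meaning baseline.
    """
    if years_since_baseline < 0:
        raise ValueError("years_since_baseline must be zero or positive")
    due = list(BASELINE_AND_ANNUAL)
    if years_since_baseline == 0:
        due = list(BASELINE_ONLY) + due
    return due
```

For example, `items_due(0)` would return the full baseline collection, while `items_due(2)` would omit age and sex.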

Fig. 2 Time points for the administration of the Overall Adult Health Standard Set. PROMIS-10 10-item Patient-Reported Outcomes Measurement Information System, WHO-DAS-12 12-item World Health Organization Disability Assessment Schedule, WHO-5 5-item World Health Organization Well-Being Index. PROMs include: PROMIS-10, WHO-DAS-12, and WHO-5. Patient form includes age, sex, level of education, marital status, employment status, housing status, comorbidities, smoking, alcohol intake, and physical activity. Provider form includes body mass index, blood pressure, and cardiovascular risk

Public Consultation

A total of 110 participants across 25 countries (UK 30%, The Netherlands 16%, Australia 12%, US 10%, other 22%) participated in the consultation. About half (51%) of all participants were healthcare professionals (others: researchers 25%, healthcare administrators 14%, policy advisors 9%, advocacy professionals 1%, and industry/commercial representatives 1%). A large majority of respondents agreed with the proposed minimum set for measuring overall health status (89%), physical health (83%), mental health (79%), social health (86%), and case-mix variables (73%) [electronic supplementary Table 5].

Discussion

An overall adult health standard set has been developed to support routine outcome monitoring regardless of a patient’s underlying health status, presence of particular conditions or receipt of specific health care interventions, filling a specific gap for primary and preventive care services. The set includes health outcome measures and relevant case-mix variables. The set is to be used as a whole, eliciting a profile of the individual across all the relevant outcomes based on scores for the specific outcomes of general health (WHO-5), physical health and mental health (PROMIS-10), and supplemented with scores for WHO-DAS-12.

Significant strengths of this outcome set include the standardized, comprehensive, and tested approach that was used in its development; the involvement of patients and experts; and its orientation to supporting the delivery of value-based health care. However, some limitations of our approach need to be acknowledged. Our searches were pragmatic to allow us to process a very large body of literature. Although we cannot rule out missing a relevant outcome, measure, or case-mix factor, the systematic effort and triangulation with input from experts, professionals, and patients make it less likely. We limited our searches to the English language because of the extensive nature of the reviews. It also remains to be established whether we were successful in devising a system that is applicable to the widest range of countries and situations. Although panel members were based in a number of countries across four continents, not all geographical locations were equally represented. Furthermore, the patient survey was administered in only two countries (US and Wales). Our criteria for prioritization of measures may have favored fixed short instruments over systems based on computerized adaptive testing, which support efficient yet flexible measurement [31]. However, the inclusion of a short form that is part of one such system (PROMIS) facilitates alignment with PROMIS metrics and measurement approaches in the near future [32]. In addition, the current set can be used in organizations and countries that do not have advanced electronic systems. Related to this, the set has prioritized standardized tools over individualized tools [33, 34]; although the latter may be perceived as advantageous by some clinicians [35], they lack the metric robustness and comparability across patients that are intrinsic to the value-based healthcare approach [1].

A further limitation is that a number of outcomes have been excluded because of the lack of readily available instruments. The standard set should be reviewed and updated in the near future to scan for suitable measures that may have become available for those outcomes. While this limits the comprehensiveness of the core set, it also suggests, by way of triangulation, that those outcomes may have somewhat less support for their inclusion. Finally, in order to avoid imposing an excessive burden on respondents, the set does not elicit separate scores for each distinct outcome, but rather across the whole range. However, responses to individual items can possibly also be used to approximate measurements of each outcome, while also allowing for detailed examination of potential problems to support a patient-centred approach to care planning. This was perceived as a key contribution of WHO-DAS-12 to the set. However, it remains a possibility in future revisions of the standard set to reconsider whether the other two instruments may provide an even more parsimonious yet sufficient characterization of the status of the individual.

Previous outcome sets are condition-specific in the adult population and this is the first study to report the development of a specific standard set of measures for overall adult health [36]. This set fills a key gap in the implementation of the value-based approach to primary care and public health. In addition, by virtue of being applicable to all individuals regardless of their health status, it also offers a new approach for benchmarking and facilitates standardization across other ICHOM core sets. Furthermore, this set can serve as a core foundation for modular add-ons of more specific sets, enabling comparisons within and across institutions and across conditions. In this sense, this standard is similar to ICHOM’s Overall Pediatric Health and Older Person Standard Sets [37], and if implemented in unison they would offer a comprehensive approach for the evaluation of outcomes across the lifespan. Further work will be needed to maximize alignment of the sets, with particular attention to transitions.

Implementation of the set could improve patient care by using innovative and multipurpose approaches, including clinical encounters, electronic health records, patient-reported outcome measures feedback, and others [38–40]. However, in practice, the utility of the standard set is still to be determined through ongoing evaluation of its implementation [41]. Further research is needed to evaluate the implementation of this set, with particular emphasis on cultural relevance, given the limited scope of the patient validation survey, and on the feasibility and utility of using the set across multiple settings in conjunction with disease-specific sets. Further potential applications include research, whether to study the impact of broad-ranging health care interventions or to study protective factors in health-disease processes.

Conclusion

This Overall Adult Health Standard Set provides a new approach for appraising both positive aspects of health and key impacts of disease, using available and accessible measurement tools. The use of this standard set in the delivery of care can support aligning health services stakeholders’ incentives with patients’ needs and the creation of value for patients.