Key Points for Decision-Makers

Use of the EQ-5D for COA was more common in the HTA review (18%) than in the regulatory review (5%) and SLR (8%).

The proportion of approved drugs which reported EQ-5D data for COA differed between HTA agencies and between regulatory agencies, suggesting that different standards were used by stakeholders to assess clinical benefit of treatments.

Study sponsors should be mindful of the limitations of the EQ-5D as a tool to assess clinical benefit.

1 Introduction

The EQ-5D is a concise, generic patient-reported outcome (PRO) measure of health consisting of five dimensions of health status (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) and a visual analogue scale (VAS) [1]. Two versions of the EQ-5D are available: the 3-Level EQ-5D (EQ-5D-3L), with three severity levels for each dimension, and the 5-Level EQ-5D (EQ-5D-5L), with five severity levels [1]. The EQ-5D is the most commonly cited preference-based measure of health status in health technology assessment (HTA) guidelines [2] and is mainly utilised for economic evaluations. There is, however, potential for the EQ-5D to be useful in clinical outcomes assessment (COA) due to its widespread use and ease of administration. Use of the EQ-5D in this capacity could have important statistical implications for trial design and analysis, including its place in the hierarchy of endpoints and whether trials need to be powered to detect statistically significant changes or differences in the EQ-5D. Although regulatory and HTA agencies recognise the importance of PRO data, the extent to which the EQ-5D is used for COA is unclear.

The objective of this study was to conduct a series of three structured, parallel reviews to identify the prevalence with which EQ-5D data are evaluated by health authorities in clinical benefit assessments of drug technologies, and the EQ-5D’s use in a non-economic context to support the communication of value of drug treatments.

2 Methods

2.1 Literature Search

2.1.1 HTA Review

HTA decision and supporting documents published between 1 January 2019 and 15 January 2021 were manually downloaded from the websites of five HTA bodies: the National Institute for Health and Care Excellence (NICE; www.nice.org.uk/), the Haute Autorité de Santé (HAS; www.has-sante.fr/), the Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG; www.iqwig.de/en/), the Gemeinsamer Bundesausschuss (G-BA; www.g-ba.de/) and the Institute for Clinical and Economic Review (ICER; https://icer.org/). Searches were conducted on 15 January 2021, except for G-BA, for which the search was conducted on 11 October 2021. Electronic keyword searching was then conducted for the following terms: ‘EQ-5D’, ‘EQ5D’, ‘EQ-5D-3L’, ‘EQ-5D-5L’, ‘EuroQoL’, ‘EQ-VAS’ and ‘EQVAS’.
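The keyword screening step could be automated along the following lines. This is a minimal sketch, not the procedure used in the review: the function name `mentions_eq5d` and the use of a case-insensitive regular expression are assumptions, and only the seven search terms are taken from the text above.

```python
import re

# The seven search terms listed in the text above.
SEARCH_TERMS = ["EQ-5D", "EQ5D", "EQ-5D-3L", "EQ-5D-5L", "EuroQoL", "EQ-VAS", "EQVAS"]

# Case-insensitive alternation over the escaped terms. Assumption: the review
# does not state whether its keyword matching was case-sensitive.
PATTERN = re.compile("|".join(re.escape(term) for term in SEARCH_TERMS), re.IGNORECASE)


def mentions_eq5d(text: str) -> bool:
    """Return True if the document text contains any of the search terms."""
    return PATTERN.search(text) is not None
```

Text extracted from PDF documents may contain typographic hyphens in place of the plain ASCII hyphen, so a practical implementation would probably need to normalise hyphens before matching.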

2.1.2 Regulatory Review

The PROLABELS™ (ePROVIDE™) database was used to identify European Medicines Agency (EMA) European Public Assessment Report (EPAR) product information [which includes the Summary of Product Characteristics (SmPC)] and assessment reports, and Food and Drug Administration (FDA) product labelling which included EQ-5D-related terminology for all new drug approvals and modifications to existing drug approvals published between 1 January 2016 and 25 February 2021. Biosimilars, generics and minor modifications to the license which did not present EQ-5D claims to an existing label were excluded. As this search only identified documents which mentioned EQ-5D in the drug labels, a manual search of the FDA website Drugs@FDA (www.accessdata.fda.gov) was later performed to identify the supporting documentation (medical and statistical reviews) for all new drug approvals between 1 January 2016 and 16 March 2021. Electronic keyword searching was then conducted to identify documents with EQ-5D-related terminology.

2.1.3 Systematic Literature Review

A systematic literature review (SLR) was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [3]. An electronic search was performed on 12 January 2021 in Embase and MEDLINE (OvidSP) databases, using two broad sets of terms to identify publications reporting EQ-5D and clinical trial data. Searches were limited to English language and publication type was limited to randomised controlled trials (RCTs) and single-arm trials. Manuscripts were limited to the last 5 years and conference abstracts to the last 2 years. No limit was placed on country of study origin. The search strategy was validated by cross-referencing search strategies with previously published SLRs and by ensuring known studies were identified. Manual searching was also performed to identify proceedings from conferences of interest. A complete copy of the search strategy used for each electronic database is reported in Online Resource 1.

2.2 Study Selection and Quality Assessment

2.2.1 HTA and Regulatory Reviews

All retrieved documents were reviewed by one analyst (with 10% of the documents reviewed by a second analyst). Records were included or excluded according to pre-specified eligibility criteria. Inclusion criteria included drug technologies intended for human use, and EQ-5D data (utility index and/or EQ-VAS), outside of the context of economic evaluation, reported in guidance documents (HTA review), EPARs and product labelling (regulatory review) and their supporting documents for new drug approvals or modifications to existing drug approvals. Any disagreements between analysts were resolved through discussion until a consensus was reached.

2.2.2 SLR

After deduplication, two analysts independently reviewed the retrieved records. Records were included or excluded according to pre-specified inclusion and exclusion criteria. Inclusion criteria included drug technologies intended for human use and studies reporting EQ-5D data (utility index and/or EQ-VAS) from clinical trials. Non-human studies, observational studies, reviews, EQ-5D data reported only in the context of economic evaluation, EQ-5D-Y data and non-English language studies were excluded. Titles and abstracts were reviewed for all the retrieved records, and full-text articles were obtained for the included records for evaluation in a full-text review against the eligibility criteria. Any disagreements between analysts were resolved through discussion until a consensus was reached. Retrieved studies were critically appraised by a single reviewer for methodological quality using the Cochrane Risk of Bias tool (RoB2) for randomised controlled trials [4] and the Risk of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool for single-arm trials [5].

2.3 Data Extraction and Synthesis

Data from the included studies were extracted by one analyst and quality assured by a second analyst. For the HTA and regulatory reviews, data were extracted from the guidance/product labelling documents, and additional data were extracted from supporting documents (e.g. NICE committee papers, G-BA Tragende Gründe zum Beschluss and Zusammenfassende Dokumentation, and FDA medical and statistical reviews), where available. As G-BA technology appraisals (TAs) were identified at a later date, abbreviated data extractions were performed for G-BA TAs in the HTA review, whereby only differences between data reported in linked G-BA and IQWiG documents (i.e. reporting the same product and indication) were extracted to avoid duplication of data. Extracted data included study/drug assessment details, drug therapeutic area, source and type of EQ-5D data, and main comments and criticisms by the HTA/regulatory agency about the presented PRO data. Where outcome data were missing, they were extracted as ‘not reported’. Data were presented descriptively, using a combination of narrative synthesis, summary tables and graphs to present frequencies of EQ-5D use. No statistical comparative analyses were performed. Trends in the use of EQ-5D as a COA for drug technologies were examined across therapeutic areas (using NICE categories for consistency across the reviews). Differences between data reported in linked IQWiG and G-BA documents for German HTA submissions were also descriptively analysed.

3 Results

3.1 Literature Search

3.1.1 HTA Review

During the 2-year study period, 1072 TAs were published (G-BA n = 61, HAS n = 672, ICER n = 16, IQWiG n = 223 and NICE n = 100). In total, 1329 HTA decision and supporting documents were identified via manual searches. The flow of studies through identification and study selection can be found in Fig. 1. Keyword searching and full-text screening excluded 746 and 285 records, respectively. The most common reasons for exclusion at the full-text stage were no EQ-5D data reported (n = 193) and EQ-5D data reported only in the context of an economic evaluation (n = 87). Overall, 298 documents from 195 TAs were included in the HTA review. In total, 16 of the 60 included G-BA TAs provided additional EQ-5D data to linked IQWiG TAs and were extracted. A list of included references is provided in Online Resource 2. Of the 43 included NICE TAs, 19 reported EQ-5D data in supporting documentation only (appraisal committee papers and/or final appraisal committee papers) and not in the final appraisal guidance document.

Fig. 1

Flowchart of the selection process for studies included in the health technology assessment review. Abbreviations: G-BA, Gemeinsamer Bundesausschuss; HAS, Haute Autorité de Santé; HTA, health technology assessment; ICER, Institute for Clinical and Economic Review; IQWiG, Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen; NICE, National Institute for Health and Care Excellence; TA, technology appraisal

3.1.2 Regulatory Review

During the 5-year study period, 1055 drugs were approved or reviewed by the EMA and FDA (EMA n = 320 and FDA n = 735). A total of 17 EMA records were identified through electronic searching of the PROLABELS database; no FDA records were identified via this search. All data relating to FDA-approved drugs were obtained from supporting documentation (medical and statistical reviews of the drug approval package) and not from the product labels themselves. In total, 903 documents were downloaded and screened. Keyword search and full-text screening excluded 781 and 40 records, respectively. Overall, 82 documents reporting on 50 drugs were included (Fig. 2). A full list of the references included in the regulatory review can be viewed in Online Resource 3.

Fig. 2

Flowchart of the selection process for studies included in the regulatory review. Abbreviations: EMA, European Medicines Agency; FDA, Food and Drug Administration

3.1.3 SLR

A total of 4248 references were identified via electronic and manual searches, with 3755 excluded after title and abstract screening. Subsequently, 493 full-text references were screened for eligibility. A total of 164 records were excluded for the following reasons: inappropriate study design (n = 49), falling outside of the study timeframe (n = 44), no outcome data of interest (n = 33), no EQ-5D data reported (n = 18), duplicate (n = 15) and no intervention of interest (n = 5). Therefore, 329 references were included in the SLR. However, data were extracted from 328 references, as two references reported data from the same study. The flow of included studies can be found in Fig. 3 and a list of the included references in Online Resource 4. Of the 259 included RCTs assessed using the Cochrane RoB2 tool, 47% had a low risk of bias, 35% raised some concerns and 17% had a high risk of bias. Of the 70 non-randomised studies included and assessed using the ROBINS-I tool, 4% were deemed to have low risk of bias, 84% had moderate risk, 9% had serious risk and 3% did not have sufficient information for risk of bias assessment.
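The record counts in the paragraph above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts as reported for the SLR selection process.
identified = 4248
excluded_title_abstract = 3755

# Reasons for exclusion at the full-text stage.
full_text_exclusions = {
    "inappropriate study design": 49,
    "outside study timeframe": 44,
    "no outcome data of interest": 33,
    "no EQ-5D data reported": 18,
    "duplicate": 15,
    "no intervention of interest": 5,
}

# Records remaining after each screening stage.
full_text_screened = identified - excluded_title_abstract
excluded_full_text = sum(full_text_exclusions.values())
included = full_text_screened - excluded_full_text

# The included references split into RCTs and non-randomised studies.
rcts, non_randomised = 259, 70
```

With these inputs the stages reconcile: 493 references reach full-text screening, 164 are excluded and 329 are included, which matches the sum of the 259 RCTs and 70 non-randomised studies.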

Fig. 3

Flowchart of the selection process for studies included in the systematic literature review

3.2 Type of EQ-5D Measures Reported

3.2.1 HTA Review

Overall, the EQ-VAS was the measure most frequently mentioned in HTA appraisal documents, appearing in 68% of all TAs which reported unique data (n = 103/151), and was mostly used in German TAs, in 100% (n = 78) of IQWiG and 94% (n = 15/16) of G-BA TAs (Table 1). When the EQ-5D measure was specified, both utility index and EQ-VAS scores were reported most frequently in NICE and HAS submissions (28% and 46%, respectively). Across all HTA agencies, TAs reporting all three EQ-5D measures (utility index, dimension levels and EQ-VAS) were least frequent, totalling 1% (n = 2/151) of all appraisals. The only difference between linked IQWiG and G-BA appraisals was that one G-BA appraisal also reported EQ-5D utility index data in addition to the EQ-VAS [6].

Table 1 Type of EQ-5D measure reported in the HTA, regulatory and systematic literature reviews

3.2.2 Regulatory Review

Overall, presentation of both the EQ-5D utility index and EQ-VAS was most frequent, in 32% (n = 16/50) of all labels and supporting documents, followed by EQ-5D utility index only in 20% (n = 10/50) of all labels and supporting documents (Table 1). A similar trend was observed in FDA labelling supporting documents, with the EQ-5D utility index and EQ-VAS, and the EQ-5D utility index alone reported in 26% and 23% of documents, respectively. EQ-5D type was not mentioned in 23% (n = 8/35) of FDA documents. In EMA documents, reporting of both EQ-5D utility index and EQ-VAS was most common (47%) followed by utility index, EQ-VAS and dimensions (27%) (Table 1).

Fig. 4

Version of EQ-5D reported in the health technology assessment review, regulatory review and systematic literature review. Abbreviations: EMA, European Medicines Agency; FDA, Food and Drug Administration; G-BA, Gemeinsamer Bundesausschuss; HAS, Haute Autorité de Santé; HTA, health technology assessment; ICER, Institute for Clinical and Economic Review; IQWiG, Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen; NICE, National Institute for Health and Care Excellence; NR, not reported; SLR, systematic literature review

3.2.3 SLR

Amongst the 328 extracted records, the reporting of both EQ-5D utility index and EQ-VAS was most frequent, in 36% (n = 119/328) of studies, followed by EQ-5D utility index only and EQ-VAS only, each in 21% (Table 1). Utility index and dimensions were reported together in only 2% (n = 7/328) of studies.

3.3 Version of EQ-5D Reported

3.3.1 HTA Review

Overall, most appraisals (n = 91, 60%) did not report the EQ-5D version; non-reporting was most common in G-BA (n = 11, 69%), IQWiG (n = 69, 87%) and ICER (n = 2, 67%) appraisals. EQ-5D-3L and EQ-5D-5L were reported in 16% (n = 24) and 24% (n = 37) of all appraisals, respectively (Fig. 4). One NICE appraisal reported both versions of the EQ-5D, as data were presented from two different RCTs [7]. The version most frequently reported by HAS was the EQ-5D-5L (n = 5; 45%), whereas NICE most frequently reported the EQ-5D-3L (n = 20; 46%).

3.3.2 Regulatory Review

In total, when the version was specified, EQ-5D-5L was most frequently reported, in 29 (58%) drug approvals, compared with the EQ-5D-3L, cited in 10 (20%) drug approvals (Fig. 4). Overall, the EQ-5D version was not specified in 22% (n = 11/50) of all drug approvals. In EMA documents, EQ-5D-5L was reported more often than EQ-5D-3L (53% versus 33% of labels). In FDA supporting documents, EQ-5D-5L was reported for most drugs (n = 21, 60%), with 26% not stating the version used, and only 14% reporting use of EQ-5D-3L.

3.3.3 SLR

The largest group of studies in the SLR (n = 132, 40%) did not specify which version of the EQ-5D instrument was used (Fig. 4). When the version was specified, use of the EQ-5D-3L was marginally greater than that of the EQ-5D-5L (31% versus 29% of studies).

3.4 Drug Therapeutic Area

Across all three reviews, cancer was the most frequently reported drug therapeutic area, in 67% (n = 101/151) of HTA TAs, 48% (n = 24/50) of labels in the regulatory review and 32% (n = 106/328) of records included in the SLR (Table 2).

Table 2 Drug therapeutic area reported in HTA, regulatory and systematic literature reviews

3.5 Acceptability of EQ-5D/PRO Data

3.5.1 HTA Review

Overall, 104 of 195 TAs recommended the technologies (HAS n = 10, ICER n = 3, IQWiG n = 50, G-BA n = 30 and NICE n = 41). Of those, PRO data (not just EQ-5D) were accepted as providing evidence of clinical benefit in 87 (HAS n = 8, ICER n = 3, IQWiG n = 42, G-BA n = 27 and NICE n = 7). PRO data had a specific impact on the decision in 13 TAs (IQWiG n = 12, G-BA n = 2 and NICE n = 1).

Among drugs assessed by HAS, a key critique of the committee was the inability of the PRO data to demonstrate benefit of treatment due to the exploratory nature of studies, lack of robust statistical analyses or large amounts of missing data. For ICER TAs, PRO data reported for all drugs were accepted by ICER, but none had any specific impact (i.e. evidence of added benefit) on the final decision. For example, in the assessment of crizanlizumab (Adakveo®, Novartis AG) and voxelotor (Oxbryta®, Global Blood Therapeutics Inc), the committee were unsure whether the lack of improvement in quality of life was due to the therapies having no real benefit or the inability of the instruments (including EQ-5D) to detect improvements [8]. Among the drugs assessed by IQWiG, the committee commented on the evidence of benefit according to PRO data, lack of usable PRO data, use of Hedges' g for interpreting statistical significance, and high risk of bias associated with the PRO data. A commonly highlighted issue was the unsuitability of the minimally important difference reference used for the EQ-VAS, as the study reported in Pickard et al. [9] used a cross-sectional rather than a longitudinal design. Finally, although almost all assessed drugs were recommended by NICE, committee comments for seven drugs suggested that the EQ-5D was not deemed adequate for capturing quality of life in conditions with fluctuating symptom severity, such as migraine, where responses measured on a single day would not accurately represent long-term quality of life [10,11,12,13,14,15,16,17].
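For readers unfamiliar with Hedges' g, the sketch below shows the standard textbook form of the statistic: a standardised mean difference with a small-sample correction. It is an illustration only. The function name, argument names and example values are assumptions, and the review does not describe how IQWiG computed or interpreted the statistic in individual appraisals.

```python
import math


def hedges_g(mean_1: float, sd_1: float, n_1: int,
             mean_2: float, sd_2: float, n_2: int) -> float:
    """Standardised mean difference between two arms with small-sample correction."""
    # Pooled standard deviation across the two arms.
    pooled_sd = math.sqrt(
        ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    )
    # Uncorrected standardised mean difference (Cohen's d).
    cohens_d = (mean_1 - mean_2) / pooled_sd
    # Commonly used approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return correction * cohens_d


# Hypothetical EQ-VAS change scores in two trial arms (invented values).
g = hedges_g(mean_1=70.0, sd_1=10.0, n_1=50, mean_2=65.0, sd_2=10.0, n_2=50)
```

Because the result is expressed in standard deviation units, it can be interpreted without an instrument-specific minimally important difference.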

3.5.2 Regulatory Review

Among drugs assessed by the FDA, no EQ-5D data were referenced in any of the product labels; however, information on the acceptability of EQ-5D and PRO data was reported in supporting documents. Several documents stated that EQ-5D/PRO data may not be suitable for assessment of clinical benefit. For example, while PRO data reported in the FDA’s multidiscipline review for niraparib (Zejula, GSK) were seen to complement safety, radiographic and survival data, the inclusion of PRO results in the product labelling was not recommended for multiple reasons, including instrument suitability and limitations with the statistical analysis of these data, such as no adjustment for multiplicity [18]. For EMA-assessed drugs, similar criticisms of PRO/EQ-5D data referring to limitations with study design, statistical analysis (including those performed post hoc and without adjustments for multiplicity) and the sensitivity of EQ-5D were cited. Consequently, the value of PRO data for the benefit–risk analysis was perceived as limited for some drugs [19,20,21].

4 Discussion

The purpose of this research was to investigate the extent of use of the EQ-5D for COA outside of an economic setting, using HTA and regulatory information alongside an SLR of published literature. While reporting of PRO has been compared between HTA agencies and published clinical trials [22], to the authors’ knowledge, this is the first publication to compare the reporting of PRO (specifically EQ-5D) for COA between HTA and regulatory agencies and published clinical trials.

Results of the HTA review showed infrequent use of the EQ-5D outside of economic evaluations. This may be due to a lack of guidance or regulation of the reporting of EQ-5D in labelling, or because in conditions with small expected therapeutic benefit, the EQ-5D is unlikely to be sensitive enough to capture improvement. It may also be because the EQ-5D is not considered an adequate COA tool for clinical evaluation of treatment benefit. Similarly, the regulatory review suggested that EQ-5D was used to support labelling claims in a minority of EMA decisions, with no EQ-5D data reported in FDA product labels, only in supporting documents. This finding mirrors that of a previous review of the use of PRO endpoints in EMA and FDA decisions relating to oncology treatments between 2012 and 2016 [23]. While newer guidance is expected to stress the importance of COA, this finding may be explained by the 2009 FDA guidelines for the use of PRO to support labelling claims [24], which state a preference for PRO measures which are population-, treatment- or disease-specific, and developed with patient involvement. Additionally, FDA guidance published in 2021 specifically for oncology indications states that the choice of PRO must be justified and that the PRO must be able to measure clinically relevant outcomes reliably [25]. This is likely because regulatory agencies such as the FDA are concerned with the risk–benefit profile of treatments, and therefore prefer PROs that are able to distinguish adverse side effects of the treatment [24]. Furthermore, EUnetHTA guidance recommends use of the EQ-5D for deriving QALYs and a disease-specific questionnaire to capture patient perceptions, making HTA agencies aware of the shortcomings of the EQ-5D as a PRO [26]. HTA agencies may consider a broader set of criteria including the burden of a condition and the impact on quality of life from the condition and its treatment. In addition, HTA agencies may value consistency in decision-making, and the use of generic PRO measures can facilitate comparison across different diseases.

The EQ-5D was originally developed as a simple measure that could be used alongside more detailed health-related quality of life (HRQoL) measures providing a ‘common core’ for HRQoL comparisons and could be used as a tool to facilitate QALY-type calculations [27]. As its use has grown substantially over the years and across many health conditions [28], so too has evidence of its validity [29] and the contexts in which it is used. Its simplicity makes it an attractive measure for inclusion in clinical trials, and it is clear from this review that those EQ-5D data are not just being used to inform economic evaluations but are also, to a modest extent, being presented as a COA measure. Some stated concerns impacting the use of EQ-5D related to its sensitivity. Whilst the validity of EQ-5D has been established in many conditions, it is also recognised that there are some conditions where it performs poorly on psychometric assessments [30]. A possible approach to improve the content validity of EQ-5D is to extend the descriptive system by developing additional dimensions, also referred to as ‘bolt-ons’ [31]. These may render the EQ-5D more sensitive to specific conditions; however, substantial research is required for each bolt-on to establish its validity and utility value sets, and their acceptability by healthcare decision-makers is not yet well established.

It remains debatable, however, whether the EQ-5D should be used for COA. This review has found that, with the exception of HTAs in Germany where the EQ-VAS was predominantly used, reporting of the EQ-5D index and EQ-VAS data together was the most common approach for COA across the HTA, regulatory and systematic literature reviews. EQ-5D index values are not strictly a patient’s self-assessment of their own health. Rather, they are a composite of two elements: (i) the patient’s self-assessment of their own health using the five dimensions of the descriptive system and (ii) a measure of society’s preferences to avoid different severities and effects of ill health. Whilst not without controversy, the arguments for the use of societal values in valuing health states have been well rehearsed [32]. However, there is no clear reason why these arguments should extend to the clinical assessment of the health status of patients. Furthermore, the principal method used to value EQ-5D health states, the time trade-off, has been designed to reflect the healthcare decision-making under constrained budgets (e.g. whether it is preferable to invest in healthcare that extends life or improves HRQoL and all of the combinations in between). Why should this type of exercise be used to inform the clinical efficacy or effectiveness of treatment?
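The two-element structure of the index described above can be made concrete with a small sketch. The weights below are invented purely for illustration and are not a published value set; the simple additive form is also an assumption, as published value sets may include further terms.

```python
from typing import Dict, Mapping

DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

# HYPOTHETICAL societal weights (decrements from full health) per dimension
# and severity level, using EQ-5D-3L-style levels 1 to 3.
# These numbers are invented and are NOT a published value set.
HYPOTHETICAL_VALUE_SET: Dict[str, Dict[int, float]] = {
    "mobility": {1: 0.0, 2: 0.05, 3: 0.20},
    "self_care": {1: 0.0, 2: 0.04, 3: 0.15},
    "usual_activities": {1: 0.0, 2: 0.03, 3: 0.10},
    "pain_discomfort": {1: 0.0, 2: 0.08, 3: 0.25},
    "anxiety_depression": {1: 0.0, 2: 0.06, 3: 0.20},
}


def index_value(
    profile: Mapping[str, int],
    value_set: Mapping[str, Mapping[int, float]] = HYPOTHETICAL_VALUE_SET,
) -> float:
    """Combine element (i), the patient's profile, with element (ii), societal weights."""
    return 1.0 - sum(value_set[dim][profile[dim]] for dim in DIMENSIONS)


# Element (i): one patient's self-reported levels on the five dimensions.
patient = {
    "mobility": 2,
    "self_care": 1,
    "usual_activities": 2,
    "pain_discomfort": 3,
    "anxiety_depression": 1,
}
patient_index = index_value(patient)
```

The same self-reported profile yields a different index whenever a different set of weights is supplied, which illustrates why the index is not purely the patient's own assessment.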

Conversely, it could be argued that the EQ-5D index is a way of weighting the levels and dimensions of the EQ-5D to generate an aggregate index and that, despite being based on preferences from society, the index recognises that the different dimensions do not have the same weight. Many instruments, such as the EORTC QLQ-C30, simply aggregate scores across dimensions and assume they have equal weight [33]. Alternative approaches have, however, been proposed for meaningfully aggregating responses to the dimensions, for example, the Pareto Classification of Health Change [34]. Whilst this has been used to analyse EQ-5D in routine outcome measurement, based on this review, its adoption in analysis of clinical trial data appears to be limited.
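The Pareto Classification of Health Change compares a respondent's EQ-5D profile at two time points without applying any weights. The sketch below follows the classification as it is commonly described (better, worse, no change or mixed); the function name and the representation of a profile as a tuple of five levels are assumptions made for illustration.

```python
from typing import Sequence


def pchc(baseline: Sequence[int], follow_up: Sequence[int]) -> str:
    """Classify the change between two EQ-5D profiles.

    Each profile lists the level reported on the five dimensions, in the same
    order, where a lower level means fewer problems.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("Profiles must cover the same dimensions")
    improved = any(after < before for before, after in zip(baseline, follow_up))
    worsened = any(after > before for before, after in zip(baseline, follow_up))
    if improved and worsened:
        return "mixed"      # better on some dimensions, worse on others
    if improved:
        return "better"     # better on at least one dimension, worse on none
    if worsened:
        return "worse"      # worse on at least one dimension, better on none
    return "no change"
```

For example, a respondent moving from profile (2, 1, 2, 3, 1) to (1, 1, 2, 2, 1) would be classified as "better", whereas a move to (1, 1, 3, 3, 1) would be "mixed" because mobility improves while usual activities worsen.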

It should also be noted that the above limitations of the EQ-5D index do not apply to the EQ-VAS component of the EQ-5D, which offers a rating of one’s own health directly by the respondent. The influence of the EQ-VAS data as a COA in regulatory and HTA decision-making appears limited. EQ-VAS data were almost exclusively reported in assessments in Germany; however, the data were most commonly criticised on the basis of concerns around the definitions of minimally important difference included in the documentation.

5 Conclusions

Overall, findings suggest that the EQ-5D had been used for COA in 18% of all HTA technology appraisals published during the 2-year study period, most frequently as the EQ-VAS in German TAs. The EQ-5D was also used to support labelling claims in a minority of EMA decisions, in 5% of all new and modified drug approvals during the 5-year study period, with no EQ-5D data reported in FDA product labelling. The SLR found that the EQ-5D had been used for COA in clinical trials in just 8% of the published literature retrieved from MEDLINE and Embase. There were varying reasons for non-acceptance of EQ-5D data across regulatory agencies, such as issues with statistical analysis and study design, alongside concerns about sensitivity of EQ-5D to detect health status change in certain disease areas. In conclusion, findings from the three separate reviews (regulatory, HTA and systematic literature review) suggest that there is currently limited use of EQ-5D outside of economic evaluations.
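The three headline percentages can be reproduced from counts reported in the Results. The pairing of numerators and denominators below is an assumption inferred from those counts rather than a calculation stated explicitly by the authors.

```python
def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Included TAs over all TAs published during the 2-year study period.
hta_share = percent(195, 1072)

# Included drugs over all drugs approved or reviewed by the EMA and FDA.
regulatory_share = percent(50, 1055)

# Extracted references over all references identified by the searches.
slr_share = percent(328, 4248)
```

Under this assumption the results are 18%, 5% and 8%, matching the figures quoted for the HTA review, regulatory review and SLR, respectively.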