Plain English summary

Idiopathic pulmonary fibrosis (IPF) is a progressive and fatal lung disease with a high symptom burden, which has a considerable impact on health-related quality of life (HRQoL). Numerous questionnaires have been developed to evaluate HRQoL and derive health state utility values (HSUVs), which represent an individual's preference for a particular health state. However, different questionnaires may produce different results in the same individual, and these differences arise primarily from their descriptive systems (the questions used to describe health). Consequently, it is important to understand these differences when choosing a questionnaire for economic evaluation. Our study aimed to compare the EQ-5D-5L and AQoL-8D to ascertain how well each derives HSUVs in an Australian cohort of persons living with IPF.

Our results demonstrated reasonable agreement between the two instruments, with mean HSUVs of 0.65 for the EQ-5D-5L and 0.69 for the AQoL-8D. There were, however, some fundamental differences, which led us to conclude that the EQ-5D-5L performed better than the AQoL-8D in this population. This may be attributable to the high symptom burden associated with IPF and the inherent sensitivity of the EQ-5D-5L to the physical aspects of HRQoL.

Introduction

Idiopathic pulmonary fibrosis (IPF) is the most frequent type of interstitial lung disease in older adults. It is characterised by progressive fibrosis and scarring of lung tissue, invariably leading to declining lung function, respiratory failure, and death [1,2,3]. Given the natural progression of the disease, IPF is associated with a high symptom burden, typified by chronic cough and progressive shortness of breath, both of which have a substantial impact on health-related quality of life (HRQoL) [3].

HRQoL is an important aspect of health economic assessments of interventions to manage IPF. It has become increasingly important given the expanding landscape of research into IPF therapies, especially considering the high costs of treatment and the heterogeneity of clinical outcomes, which may be masked by the adverse effects of the therapies under assessment. A diverse range of patient-reported outcome measures (PROMs) have been used to quantify HRQoL in persons with IPF [4]. While there is no gold standard for measuring HRQoL in persons with IPF, it is important to ensure that the instrument used is sensitive enough to quantify changes in health status related to the intervention under investigation [5]. Many disease-specific instruments are currently used for IPF, but none of these are preference-based [4]. Preference-based PROMs, and in particular multi-attribute utility instruments (MAUIs), are recommended for economic evaluations because they generate health state utility values (HSUVs). HSUVs are an important metric used to estimate quality-adjusted life years (QALYs) [5]. Numerous MAUIs have been developed for this purpose. To derive HSUVs, these instruments use two components: a descriptive system, which includes questions that describe a person's health, and a utility algorithm, which translates the question responses into a value (HSUV) on a scale anchored at 0.00 (death) and 1.00 (best health); values can also be negative, representing health states considered worse than death [6]. A recent review of national health technology assessment guidelines in several countries demonstrated that a few MAUIs dominate: the EuroQol 5-dimension suite of instruments (EQ-5D): 85%; Short Form-6 Dimension (SF-6D): 32%; Health Utilities Index (HUI): 29%; Quality of Wellbeing (QWB): 9%; and Assessment of Quality of Life (AQoL): 6% [7].
Each MAUI may, however, produce different HSUVs in the same individual, primarily as a result of differences in their descriptive systems [6]. Thus, it is important to understand these differences when choosing the appropriate MAUI for health economic evaluations. Although the EQ-5D suite of instruments is cited as the most widely used and the most recommended or preferred by health funding agencies, recent studies have demonstrated that it may not be the most suitable in all disease conditions [7, 8]. Few studies have used MAUIs to assess HRQoL in individuals with IPF, and most of these used the EQ-5D suite of instruments [4]. The AQoL-8D, developed more recently to address deficiencies in the descriptive systems of existing MAUIs, is often used in the Australian context; however, it has not been assessed for suitability in IPF [9]. No studies have compared MAUIs to assess their relative performance and the influence of their descriptive systems in the context of IPF.
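As noted above, HSUVs are combined with time to estimate QALYs. As a minimal illustration, the Python sketch below weights each period's HSUV by the years spent in that state. The function name `qalys` and the example inputs are hypothetical, and the sketch assumes constant utility within each period and no discounting; real economic evaluations usually discount future QALYs.

```python
def qalys(hsuvs, years):
    """Undiscounted QALYs: each period's HSUV multiplied by its duration in years.

    Hypothetical helper for illustration; assumes utility is constant within a period.
    """
    if len(hsuvs) != len(years):
        raise ValueError("need exactly one duration per HSUV")
    # Negative HSUVs (states worse than death) reduce the total.
    return sum(u * t for u, t in zip(hsuvs, years))
```

For instance, `qalys([0.65, 0.50], [1, 2])` corresponds to 0.65 × 1 + 0.50 × 2 = 1.65 QALYs.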

The aim of this study was to compare the performance of the EQ-5D-5L and the AQoL-8D in measuring HSUVs in an Australian cohort of persons living with IPF. Specifically, we conducted a head-to-head comparison of the two MAUIs, considering the practicality of the questionnaires, the level of agreement between them, and their test performance, namely internal consistency and construct validity.

Methods

Study participants and data collection

Participants for this study were recruited between August 2018 and December 2019 from the Australian IPF Registry (AIPFR) [10, 11]. The AIPFR is a national, multi-centre, prospective registry of IPF patients facilitated by the Lung Foundation of Australia. The recruitment methodology for the AIPFR has been described previously [10, 11], and details are also provided in the supplement. Participation was voluntary and based on informed consent, and participants could withdraw at any time without giving a reason.

Data were collected using a predesigned survey instrument, which captured socio-demographic and clinical information and incorporated the EQ-5D-5L and AQoL-8D. Data for the St. George's Respiratory Questionnaire (SGRQ), Hospital Anxiety and Depression Scale (HADS), University of California San Diego Shortness of Breath Questionnaire (SOBQ) and pulmonary function tests (PFTs) were obtained from the AIPFR database; for each, we used the assessment completed closest to the survey date, provided it was within 12 months of the survey. For comparison, demographic and clinical data on survey non-responders were also obtained from the AIPFR database.

Health-related quality of life measures

Table 1 summarises the characteristics of the MAUIs and disease-specific instruments used in this study.

Table 1 Characteristics of the health-related quality of life instruments used in this study

MAUIs

EQ-5D-5L

The EQ-5D-5L was developed to address the limited sensitivity of its predecessor, the EQ-5D-3L [12]. In addition to generating HSUVs, the EQ-5D-5L includes a visual analogue scale (EQ-VAS) on which patients rate their current health from 0 (worst) to 100 (best) [12]. Although the valuation process for the EQ-5D-5L has been completed in Australia, the value set has yet to be published [11, 13]. To estimate EQ-5D-5L HSUVs, we therefore used utility weights for Australia developed in an earlier study [11, 13]. To test the robustness of these HSUVs, we conducted a sensitivity analysis using estimates generated with the crosswalk method of van Hout et al. [14] and with the United Kingdom (UK) value set for the EQ-5D-5L [15].

AQoL-8D

The AQoL-8D is the latest version of the AQoL suite of instruments. It was developed to improve the sensitivity of the AQoL to the psychosocial domains of HRQoL [9, 16]. AQoL-8D HSUVs were calculated using a scoring algorithm incorporating Australian weights [17].

Comparator HRQoL instruments

Because the EQ-5D-5L and AQoL-8D are preference-based instruments, we compared them with non-preference-based HRQoL measures used in IPF: the disease-specific SGRQ [18, 19] and SOBQ [20], and the HADS [21]. Scores for the SGRQ and SOBQ were presented as quartiles.

Disease severity

Several disease severity classification systems have been used for IPF [22, 23]. We used three measures: (1) the Gender, Age, Physiology (GAP) staging [24]; (2) the Composite Physiological Index (CPI) [25]; and (3) the forced vital capacity as a percent predicted (FVC%) [26]. These are fully described in the supplement.

Medications

Treatments were categorised in accordance with international guidelines for IPF as (1) conditional recommendation for use (the antifibrotics pirfenidone and nintedanib); (2) conditional recommendation for use based on limited evidence (N-acetylcysteine and anti-reflux medications); and (3) strong recommendations against use (prednisolone, warfarin, and azathioprine) [27, 28].

Statistical analysis

Descriptive statistics

Statistical analyses were conducted using R and Stata statistical software [29, 30]. Participants for whom an HSUV could be generated for one or both instruments (AQoL-8D or EQ-5D-5L) were included in the analysis. Two-sample t-tests or chi-squared tests were used, as appropriate, to compare (1) responders and non-responders to the survey, (2) participants with complete PFTs and those with missing or incomplete PFTs, and (3) participants with and without comparator HRQoL data. A p-value < 0.05 was considered statistically significant. Participant characteristics are presented as means and standard deviations (SD) or medians and interquartile ranges (IQR) for continuous variables, and as counts and proportions for categorical variables.

HSUVs for the EQ-5D-5L and AQoL-8D and EQ-VAS scores, overall and by participant characteristics, were summarised as means with 95% confidence intervals (95%CI) and as medians (IQR). Ceiling and floor effects for both instruments were evaluated by calculating the proportion of participants in the best and worst possible health states, defined as an HSUV of 1.00 and ≤ 0.00, respectively. The distribution of responses across levels was also examined for every dimension of the EQ-5D-5L and AQoL-8D.
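The ceiling and floor definitions above reduce to simple proportions. A minimal Python sketch, assuming HSUVs are held in a plain list of floats with no missing values (the function name `ceiling_floor` is hypothetical):

```python
def ceiling_floor(hsuvs):
    """Return (ceiling, floor) proportions for a list of HSUVs.

    Ceiling: HSUV exactly 1.00 (best possible state).
    Floor: HSUV at or below 0.00 (death or worse than death).
    """
    n = len(hsuvs)
    ceiling = sum(1 for u in hsuvs if u == 1.0) / n
    floor = sum(1 for u in hsuvs if u <= 0.0) / n
    return ceiling, floor
```

With 20 of the 156 EQ-5D-5L respondents scoring 1.00, the ceiling proportion is 20/156 ≈ 0.128, which rounds to the 13% reported in the Results.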

Questionnaire practicality

Given the debilitating nature of IPF, an important criterion for evaluation is the practicality of the questionnaire. First, we evaluated completion rates by counting the number of fully completed questionnaires and the number with sufficient information for utility calculation. Second, given the disabling symptoms associated with the disease, we examined whether any items in either instrument had fewer extreme (severe) responses than expected, as an indication of how carefully participants responded while experiencing symptoms.

Agreement between instruments

Pairwise agreement between the HSUVs generated by each instrument for individual participants was first assessed using a scatterplot. Bland–Altman plots were then used to assess agreement by plotting the difference between the two instruments' HSUVs against their mean, together with the 95% limits of agreement [31]. Intraclass correlation coefficients (ICCs) were then calculated using a two-way random effects model with average measures and absolute agreement, in accordance with the nonparametric nature of the data [32]. An ICC < 0.50 indicates poor agreement; 0.50–0.75 moderate; 0.75–0.90 good; and > 0.90 excellent agreement [33]. We also compared scores across all instruments and disease severity measures for participants who demonstrated floor and ceiling effects [34]. Lastly, we evaluated the influence of sociodemographic and clinical covariates on HSUVs using Tobit models [35].
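The agreement statistics described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the R or Stata code used in the study, and both function names are hypothetical. `bland_altman` returns the bias, the 95% limits of agreement (bias ± 1.96 SD of the paired differences) and the share of differences inside them; `icc_2k` computes the Shrout and Fleiss ICC(2,k) (two-way random effects, absolute agreement, average measures) from ANOVA mean squares for two instruments scoring the same participants.

```python
import statistics


def bland_altman(x, y, z=1.96):
    """Bias (mean of x - y), limits of agreement, and share of differences inside them."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, lower, upper, inside


def icc_2k(x, y):
    """ICC(2,k): two-way random effects, absolute agreement, average measures (k = 2)."""
    rows = list(zip(x, y))  # one row per participant, one column per instrument
    n, k = len(rows), 2
    grand = (sum(x) + sum(y)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)            # between participants
    ss_cols = n * sum((statistics.mean(c) - grand) ** 2 for c in (x, y))  # between instruments
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols                                 # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)
```

Because absolute agreement penalises systematic offsets, a constant shift between instruments lowers ICC(2,k) even when rankings agree perfectly: adding 0.10 to every value in `[0.2, 0.5, 0.8, 1.0]` gives an ICC of 0.98 rather than 1.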

Test performance

Internal consistency

Internal consistency was assessed using Cronbach's alpha, for each instrument and for the items within each dimension of the AQoL-8D. Values > 0.7 were considered to indicate acceptable reliability [36].
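Cronbach's alpha for k items is α = k/(k − 1) × (1 − Σ item variances / variance of the total score). A minimal Python sketch of that formula, assuming complete responses with no missing items (the function name `cronbach_alpha` is hypothetical):

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` holds one list of responses per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    item_variance_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's summed score
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))
```

Under the > 0.7 criterion, a dimension with an alpha such as the 0.22 reported for AQoL-8D senses would fall well short of acceptable reliability.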

Construct validity

To assess convergent validity, we measured the strength of correlation between the two MAUIs, and between each MAUI and the other HRQoL measures, using Spearman's rank correlation coefficient [37]. A Spearman's rho ≥ 0.80 or ≤ −0.80 was considered a very strong association; 0.60 to 0.79 or −0.60 to −0.79 a strong association; 0.40 to 0.59 or −0.40 to −0.59 a moderate association; and between −0.40 and 0.40 a weak association [38].
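A standard-library Python sketch of Spearman's rho, computed as the Pearson correlation of average ranks so that tied HSUVs are handled, together with the strength bands above. The function names are hypothetical, and classifying |rho| = 0.40 as moderate is an assumption, since the stated bands meet at that boundary.

```python
import math
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rho):
    """Label |rho| with the cut-points in the text; the sign only gives direction."""
    r = abs(rho)
    if r >= 0.80:
        return "very strong"
    if r >= 0.60:
        return "strong"
    if r >= 0.40:
        return "moderate"
    return "weak"
```

With these cut-points, the reported correlation of 0.80 between the AQoL-8D and EQ-5D-5L is labelled very strong, and 0.79 strong.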

To assess divergent validity, we evaluated known-group validity and the ability of the instruments to detect clinically relevant differences, specifically in relation to FVC%, GAP and CPI. For known-group validity, we used the Kruskal–Wallis rank test to assess differences in HSUVs between groups defined by each clinical variable [37]. To assess the ability of the instruments to detect clinically relevant differences, we estimated the effect size (ES), the relative efficiency (RE) with the EQ-5D-5L as the reference, and the area under the receiver operating characteristic curve (AUC) [37]. An RE > 1 would indicate that the AQoL-8D is more efficient than the EQ-5D-5L in distinguishing between known groups and clinical levels [37].
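The known-group statistics can be sketched as follows. This Python sketch rests on assumptions that the text does not spell out: the Kruskal–Wallis H is computed without a tie correction; RE is taken as the ratio of the comparator's test statistic to the reference instrument's (a common convention); ES is Cohen's d with a pooled SD; and AUC is the Mann–Whitney probability that a randomly chosen member of the higher-utility group scores above one from the lower-utility group. All function names are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) for HSUVs split into clinical groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def relative_efficiency(stat_comparator, stat_reference):
    """RE = comparator statistic / reference statistic; > 1 favours the comparator."""
    return stat_comparator / stat_reference


def auc(higher_group, lower_group):
    """P(score from higher_group > score from lower_group), ties counted as 1/2.
    Equivalent to the Mann-Whitney U statistic divided by n1 * n2."""
    wins = 0.0
    for p in higher_group:
        for q in lower_group:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(higher_group) * len(lower_group))


def effect_size(group_a, group_b):
    """Standardised mean difference (Cohen's d with pooled SD)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)
```

With the EQ-5D-5L as the reference, `relative_efficiency(h_aqol, h_eq5d)` (hypothetical variable names) below 1 would favour the EQ-5D-5L, the direction reported in the Results.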

Results

Participants’ characteristics

Tables 2 and S1 summarise participant and non-participant characteristics. The response rate was 56% (Figure S1). Of the 162 respondents, 156 completed the EQ-5D-5L and 157 the AQoL-8D. Persons who did not participate in the study (n = 126) were older and had more comorbidities than responders. Participants with lung function data (n = 105) and comparator HRQoL data (n = 129) were more likely to be on antifibrotic medication (Table S1).

Table 2 Participant characteristics

The mean (SD) age of participants was 73.8 (7.6) years, and 80% were aged 65–85 years. Most participants were male (61%), Caucasian (90%) and lived in major cities (61%); the largest share (41%) were from New South Wales. Three-fifths (60%) were on antifibrotic treatment, and 80% had ≥ 1 comorbidity.

The mean (SD) GAP index, FVC% and CPI were 4 (1), 87.6 (22.4) and 36.0 (13.8), respectively. Mean (SD) scores for total SGRQ and SOBQ were 46.0 (20.6) and 40.2 (27.6), respectively. The HADS detected depression and anxiety in 24% and 16% of participants, respectively.

Questionnaire practicality

Completion of the questionnaires

Of the 162 participants, 97% completed the AQoL-8D with sufficient data for utility derivation but only 85% fully completed the questionnaire. For the EQ-5D-5L, 96% completed the questionnaire.

Item responses

For the EQ-5D-5L, less than 1% of participants reported severe problems with pain/discomfort (PD) or self-care (SC) (Table S2), while 67% reported slight or no problems with PD and 86% with SC. For mobility and anxiety/depression (AD), 1% reported severe problems, while 62% and 87%, respectively, reported slight or no problems. For usual activities (UA), 4% reported severe problems and 66% reported slight or no problems.

For the AQoL-8D, < 2% of participants reported severe problems with mental health, happiness, relationships, self-worth, or senses. For pain and coping, 4% and 6%, respectively, responded at the severe level. Across all dimensions, 64–77% of participants rated themselves as having slight or no deficits/problems.

Agreement between instruments

Fig. 1A and 1B show the distributions of HSUVs for the EQ-5D-5L and AQoL-8D, both of which were left-skewed. Table 3 provides summary statistics for the instruments. The EQ-5D-5L exhibited a wider range of values (−0.57 to 1.00), with 4% of participants (n = 6) scoring below 0 (floor effect) and 13% (n = 20) scoring 1.00 (ceiling effect). AQoL-8D scores ranged from 0.16 to 1.00, with only 1% (n = 2) demonstrating a ceiling effect. Mean (SD) values for the EQ-5D-5L, AQoL-8D and EQ-VAS were 0.65 (0.28), 0.69 (0.20) and 69 (18), respectively. The scatterplot of the two instruments (Fig. 1C) showed clustering in the upper right quadrant, corresponding to HSUVs higher than 0.50 for the EQ-5D-5L and higher than 0.70 for the AQoL-8D. Agreement between the two instruments was good, with an ICC of 0.84 (95%CI, 0.78–0.89). The Bland–Altman plot (Fig. 2) was consistent with this, showing a negative mean difference (−0.04) between the two instruments, with 92.1% of the differences lying within the limits of agreement (−0.39 to 0.30).

Fig. 1

Distribution of scores for AQoL-8D, EQ-5D-5L and EQ-VAS

Table 3 Summary statistics for AQoL-8D, EQ-5D-5L, and EQ-VAS
Fig. 2

Bland–Altman plot of the differences between AQoL-8D and EQ-5D-5L utilities against their means

Tables 4 and 5 compare participants with ceiling and floor effects on the EQ-5D-5L with respect to the EQ-VAS, AQoL-8D, disease-specific HRQoL instruments and disease severity measures. Of the 20 participants reporting perfect health on the EQ-5D-5L, almost all (n = 18) had lower AQoL-8D scores, driven by the AQoL-8D mental super-dimension (MSD), whose scores ranged from 0.33 to 0.87 with a mean (SD) of 0.41 (0.21). Overall, concordance between lung function variables and the HRQoL measures varied. Similar trends were noted for participants with floor effects. The participant with the lowest EQ-5D-5L utility (−0.57) did not have correspondingly low lung function measures; however, they recorded the worst SGRQ total (84), activity (100) and symptoms (97) scores, and a poor impact domain score (70). The poor SGRQ impact score corresponded with a low AQoL-8D MSD score (0.05) and poor HADS depression (12) and anxiety (16) scores, indicating moderate to severe depression and anxiety. While this participant recorded a low EQ-VAS score (32), it was not the lowest recorded.

Table 4 Comparison of participants with perfect health based on the EQ-5D-5L

Table 6 provides summary statistics for AQoL-8D and EQ-5D-5L HSUVs and EQ-VAS scores by participant characteristics. Males generally had higher HSUVs on both instruments and higher EQ-VAS scores. While no distinct trend in mean HSUVs was observed by age group, participants in the youngest age group (≤ 65 years) had the lowest HSUVs on both instruments and the lowest EQ-VAS scores. Mean HSUVs and EQ-VAS scores decreased with increasing disease severity, as measured by FVC%, GAP stage and CPI score. Participants with better SGRQ, SOBQ and HADS scores had higher HSUVs and EQ-VAS scores. HSUVs and EQ-VAS scores decreased with an increasing number of comorbidities. Participants on antifibrotic medication consistently had higher HSUVs on both instruments and higher EQ-VAS scores than those not receiving antifibrotics. Conversely, participants on medications in the categories "conditional recommendation for use" and "strong recommendations against use" had lower HSUVs on both instruments and lower EQ-VAS scores than those not on these medications. Employed participants had higher HSUVs and EQ-VAS scores than unemployed and retired participants.

Table 5 Comparison of participants with the floor effect on the EQ-5D-5L
Table 6 AQoL-8D and EQ-5D-5L utility scores and EQ-5D visual analogue scale scores stratified by participant characteristics

Univariable Tobit models (Table S3) indicated that the disease severity measures (based on PFTs), > 2 comorbidities, employment status, and medications in the categories "strong recommendations against use" and "conditional recommendations for use" were statistically significant predictors of HSUVs for both instruments, consistent with the descriptive analysis. Unlike the EQ-5D-5L, the AQoL-8D showed statistically significant associations between HSUVs and all age groups (reference age group ≤ 65 years), whereas the EQ-5D-5L, but not the AQoL-8D, showed a statistically significant association between HSUVs and BMI. In the multivariable models (Tables S4–S6), both instruments showed similar significant associations with disease severity, > 2 comorbidities, and employment status, with effects for the most part larger for the EQ-5D-5L. The AQoL-8D, however, showed additional statistically significant associations with age group and with medications in the category "strong recommendations against use" (Table 6).

Test performance

Internal consistency

Cronbach's alpha (Table S7) for the EQ-5D-5L and AQoL-8D was 0.83 (95%CI, 0.79–0.87) and 0.95 (95%CI, 0.94–0.96), respectively. Within the AQoL-8D, Cronbach's alpha was between 0.80 and 0.90 for all dimensions except coping (0.59) and senses (0.22).

Construct validity

The AQoL-8D was very strongly correlated with the EQ-5D-5L (rho = 0.80). The AQoL-8D physical super-dimension (PSD) was more strongly associated with the EQ-5D-5L utility (0.79) than the MSD was (0.74). The EQ-VAS was strongly associated with both the AQoL-8D (0.66) and the EQ-5D-5L (0.63). The SOBQ and SGRQ were more strongly associated with the EQ-5D-5L, and the HADS and SGRQ impact domain with the AQoL-8D. Further details are provided in Table S8.

Both instruments detected statistically significant differences in HSUVs between groups defined by the clinical variables. The effect size between groups was larger for the EQ-5D-5L, as was the AUC, indicating greater sensitivity to differences in HSUVs between groups, and the RE indicated that the EQ-5D-5L was more efficient than the AQoL-8D in detecting differences between groups. Full details are provided in Table S9.

Discussion

Given the importance of health economic evaluations in health financing decisions, especially with the expanding landscape of treatments for IPF, the selection of a preference-based PROM for research is a critical undertaking. Consequently, our study directly compared the AQoL-8D and EQ-5D-5L for measuring HRQoL in persons with IPF in Australia. There was reasonable agreement between the two instruments; however, there were some fundamental differences. One key difference was the greater sensitivity of the AQoL-8D to the psychosocial aspects of HRQoL. This was further confirmed when the instruments were compared with other HRQoL measures: the EQ-5D-5L was highly correlated with the SOBQ and the Activity domain of the SGRQ, while the AQoL-8D was more strongly associated with the Impact domain of the SGRQ and the HADS. In contrast, the EQ-5D-5L showed greater sensitivity and efficiency than the AQoL-8D in discriminating HRQoL between clinical groupings.

Our study demonstrated that, in this cohort, the practicality of both instruments was similar: completion rates were sufficient for estimating HSUVs, and extreme responses were comparable for both instruments. Closer examination of fully completed questionnaires showed an 11 percentage-point difference in completion rates favouring the EQ-5D-5L, despite the AQoL-8D being administered first. This is consistent with published literature in a similarly aged population [39] and was expected given the difference in questionnaire length: 35 items with a completion time of 5.5 min for the AQoL-8D versus 5 items with a completion time of 1 min for the EQ-5D-5L [5].

There was reasonable agreement between the EQ-5D-5L and AQoL-8D. First, the mean and median HSUVs for the two instruments were similar, differing by 0.04 and 0.02, respectively. While there is limited evidence on the minimally important difference (MID) for the AQoL-8D or EQ-5D-5L in IPF, these differences fall within the MIDs reported for these instruments in the general Australian population (0.06 (0.03–0.08)) [40] and in a Canadian IPF cohort (0.01–0.05) [41], indicating that the two measures described health status consistently. The Bland–Altman plot, ICC and regression analysis provided further evidence of agreement between the AQoL-8D and EQ-5D-5L. While our study demonstrated reasonable agreement, previous studies evaluating the two instruments have shown larger discrepancies, with lower HSUVs from the AQoL-8D than from the EQ-5D-5L [34, 39, 41]. We attribute this difference to disease- or population-specific characteristics: in those studies, deficits may have been predominantly psychosocial, to which the AQoL-8D is more responsive [42, 43], whereas in IPF the symptom-related deficits are predominantly physical [4], to which the EQ-5D-5L is more responsive [43, 44]. This was further substantiated by the convergent validity analysis, specifically the associations between each MAUI and the disease-specific or symptom-related measures of HRQoL: the EQ-5D-5L was strongly associated with the activity domain of the SGRQ and the SOBQ, while the AQoL-8D was strongly associated with the impact domain of the SGRQ and the HADS.

Notwithstanding these similarities, there were notable differences that provide insight into the suitability of the AQoL-8D and EQ-5D-5L in an IPF cohort. First, the EQ-5D-5L produced a wider range of HSUVs (−0.57 to 1.00 vs 0.16 to 1.00) and larger proportions of participants with floor (4%) and ceiling (13%) effects. This suggests that the EQ-5D-5L may not be sufficiently sensitive in mild disease but may be more responsive than the AQoL-8D in severe disease, possibly because of the high, physically debilitating symptom burden in persons with severe IPF compared with milder disease. Conversely, in this cohort, the AQoL-8D appeared to be a more robust measure for milder disease: among participants who scored full health (1.00) on the EQ-5D-5L, AQoL-8D HSUVs ranged from 0.68 to 1.00, with most of the deficit attributable to the AQoL-8D MSD (psychosocial). While there is no comparable study of the AQoL-8D in an IPF cohort, recent research with the EQ-5D-5L has shown similar ceiling effects in patients with milder disease [41], corroborating our findings.

An important characteristic of a PROM is the ability to differentiate between known groups that are clinically different. To assess this, we focussed on clinically relevant variables. Both instruments detected HSUV differences between groups for the variables studied. The EQ-5D-5L showed a larger ES, higher sensitivity (AUC) and greater efficiency (RE) than the AQoL-8D for clinical groups based on lung function testing. Consistent with this, our regression analysis showed larger effect sizes with the EQ-5D-5L than with the AQoL-8D for GAP, FVC% and CPI. Of note, however, the AUCs for both the EQ-5D-5L and the AQoL-8D were less than 0.75, indicating lower than optimal discriminatory power [45]. While not ideal, this is expected, as generic instruments may not be sensitive enough to detect small changes in disease-specific or clinical parameters, which is why they are recommended for use alongside disease-specific instruments in IPF cohorts [4, 41]. Conversely, the AQoL-8D showed higher sensitivity and efficiency in differentiating groups classified by the HADS and SGRQ, consistent with its responsiveness to the psychosocial aspects of HRQoL.

While there are no established standards for assessing HRQoL in IPF [4], particularly with preference-based instruments, instrument selection should be guided by sensitivity to the unique characteristics of IPF patients and to the specific changes expected from the interventions being evaluated. Although no instrument is perfect [5, 8], instruments with low sensitivity to changes in health states attributable to an intervention, or that are not suited to the specific population, may introduce unwanted bias into decision-making [5, 8]. The EQ-5D-5L may be better suited to our IPF cohort, primarily because of the evidence supporting its practicality, the wide observed range of HSUVs and its superior divergent validity, the last of which is of utmost importance when evaluating new treatments or interventions. While we acknowledge that the EQ-5D-5L may not fully capture the psychosocial aspects of HRQoL, the mean and median HSUVs from the two instruments were similar and within MIDs, suggesting that this deficit may not be the primary driver of HSUVs, especially given that HSUVs were higher with the AQoL-8D. We do not, however, disregard the advantages of the AQoL-8D, and we recommend that the two instruments be used together whenever possible, especially in cohorts with milder disease.

This study generated the first HSUVs for an Australian cohort of persons living with IPF and is the first to compare the AQoL-8D and EQ-5D-5L in a cohort of persons with IPF. This will be useful for future economic evaluations and adds to the limited evidence on preference-based instruments in IPF. There are, however, some limitations. The first is the small cohort size, although, as IPF is a rare disease, this is consistent with other research [4]. Recent research estimates that approximately 11,000 persons are living with IPF in Australia [46], which implies a margin of error of 7–8% at a 95% confidence level for our cohort. Beyond sample size, our cohort may not fully represent the Australian IPF population, as both the AIPFR and the survey were opt-in. Older persons and those with more severe disease may therefore be under-represented in our cohort, which may underestimate the effect of IPF on HSUVs. However, in an earlier study we compared our results with those from other countries and found them to be analogous [11].
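The 7–8% figure can be checked with the usual margin of error for a proportion with a finite population correction. This Python sketch assumes p = 0.5 (the most conservative choice), z = 1.96 for 95% confidence, and n = 162 respondents out of N ≈ 11,000; the text does not state the exact formula used, and the function name is hypothetical.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion, with a finite population correction (FPC).

    p = 0.5 maximises p * (1 - p), giving the widest (most conservative) margin.
    """
    se = math.sqrt(p * (1 - p) / n)                       # standard error of the proportion
    fpc = math.sqrt((population - n) / (population - 1))  # shrinks the SE for finite N
    return z * se * fpc
```

`margin_of_error(162, 11000)` is about 0.076, roughly 7.6%, within the stated 7–8% range.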

A second limitation is the cross-sectional design. First, this limits our assessment of construct validity with respect to the instruments' sensitivity to changes over time, which is relevant in the context of economic evaluation; this will be a subject of our continued research. Second, we used lung function and disease-specific HRQoL data recorded within 12 months of survey completion. While this may be acceptable in most cases, disease progression varies considerably, and this window may not be ideal for rapid progressors [47].

A further limitation is that we did not compare the instruments on content and structural validity. Although this is an essential part of validating an instrument for use in a specific population, it was not the aim of this study, and a comprehensive validation of both instruments will be the focus of future work. Our analysis nevertheless showed that the behaviour of the two instruments in this cohort was in line with previous evaluations of content validity, which demonstrated a predisposition of the EQ-5D to measure physical deficits/attributes of HRQoL and of the AQoL to measure psychosocial deficits/attributes [5,6,7,8,9, 43, 44].

Finally, the EQ-5D-5L HSUVs rested on assumptions, as they were estimated using Australian utility weights from an earlier study rather than a published value set; however, the sensitivity analysis conducted in our previous study demonstrated that values generated with the crosswalk method [14] and the UK value set were similar to estimates generated with the Australian value set [4]. Nevertheless, we will update the analysis once a published value set is available from EuroQol, although we do not expect this to change our conclusions.

Conclusion

In selecting a MAUI for economic evaluation in a specific disease area, it is important to understand the instruments' descriptive systems and their innate characteristics as they relate to the disease being evaluated. Our study, the first of its kind, aimed to assess this for the AQoL-8D and EQ-5D-5L. Based on the criteria evaluated, our findings suggest that the EQ-5D-5L is the preferred instrument for use in IPF, given its inherent sensitivity to the physical attributes of HRQoL and the debilitating physical effects of IPF symptoms.