In the first survey, data editing eliminated 14.9 % and 11.0 % of the Australian and US respondents, respectively, leaving usable samples of 1,430 and 1,460 respondents. Age/sex distributions for both ‘public’ and ‘patient’ samples are reported in Table 2. They are almost identical in the two countries, reflecting the use of demographic-based quotas. Unreported results show that the numbers of public respondents completing only high school, holding a diploma or trade certificate, and completing university are almost identical in the Australian sample but skewed towards high-school completions in the US (42.4 %, 23.1 % and 34.5 % for the three US categories, respectively). Because of the quotas, the numbers of respondents in each of the seven disease areas are very similar, varying from 148 to 179 per category. By comparison with the US, Australian men are overrepresented in every disease category. However, the differences are unimportant in the context of this study as representative samples are not strictly necessary for a comparison of instruments.
Table 3 reports summary statistics. The scores in the two countries are very similar. The maximum difference between mean scores is 0.03 (EQ-5D, public). Mean scores for the EQ-5D, Health Utilities Index (HUI) 3 and AQoL-8D are also very similar, particularly in the ‘public’ sample. However, the distributions of scores are dissimilar. The standard deviation around the mean varies by more than 100 % between the SF-6D/15D and the HUI 3. The EQ-5D has a pronounced ceiling effect, with about 40 % of respondents in both countries recording no disutility. In contrast, <10 % of public respondents recorded maximum scores on the SF-6D and AQoL-8D. In the total sample (public plus patients), only 0.5 % and 1.4 % of respondents recorded scores below 0.4 on the 15D and SF-6D, respectively, whereas more than 10 % were assigned scores below 0.4 by the AQoL-8D and by the HUI 3.
The Pearson correlations between MAU instrument scores are reported in the top right-hand side of Table 4. The average of the correlations involving each instrument is shown in the final column of the table. It represents a summary measure of the convergence of each MAU instrument with the remaining five instruments. The results are similar in the two countries. The lowest correlation in both is between the QWB and EQ-5D (0.65 in both countries). The highest correlations are 0.82 and 0.84 between 15D and HUI 3 (Australia) and 15D and AQoL-8D (US). The average correlation with other instruments is highest in both countries for the 15D (0.79, 0.80; Australia/US), followed by AQoL-8D (0.77, 0.79; Australia/US). However, with the exception of the QWB there is little difference between the averages.
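The summary measure in the final column of Table 4 can be illustrated with a minimal sketch. The instrument labels and scores below are hypothetical and purely illustrative; they are not study data, and the function names are assumptions introduced here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_correlations(scores):
    """For each instrument, average its correlation with every other instrument."""
    names = list(scores)
    return {
        a: mean([pearson(scores[a], scores[b]) for b in names if b != a])
        for a in names
    }


# Hypothetical utilities for five respondents on three instruments.
toy_scores = {
    "A": [0.90, 0.75, 0.60, 0.40, 0.85],
    "B": [0.95, 0.80, 0.70, 0.55, 0.88],
    "C": [0.80, 0.78, 0.50, 0.45, 0.70],
}
avg = average_correlations(toy_scores)
for name, value in avg.items():
    print(f"{name}: average correlation with the others = {value:.2f}")
```

With six instruments, each average would be taken over five pairwise correlations, as described above.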
While the Pearson correlation is the conventional test of convergent validity, a more stringent test is the use of the ICC, which tests the association between absolute scores. It differs from the Pearson correlation if the line of best fit between the variables is not Y = X; that is, if the implicit scales of the variables differ. ICCs between MAU instruments are shown in the bottom left-hand side of Table 4. They are (necessarily) smaller than the Pearson correlations. The average ICC for the 15D drops from the highest to the lowest position, reflecting the compressed range of scores it predicts. The largest average ICC in both countries is 0.69 for AQoL-8D, followed by 0.65 (0.67) for the EQ-5D in Australia (US).
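The difference between the two statistics can be seen in a small sketch. The text does not state which ICC form was used, so the code assumes a two-way, single-measure, absolute-agreement ICC; the data are hypothetical, with the second instrument a perfectly linear but compressed version of the first.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_agreement(x, y):
    """Two-way, single-measure, absolute-agreement ICC for two instruments.

    Assumed form (not confirmed by the text):
    (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]
    # Mean squares for respondents (rows), instruments (columns) and residual.
    msr = k * sum((r - grand) ** 2 for r in row_means) / (n - 1)
    msc = n * sum((c - grand) ** 2 for c in col_means) / (k - 1)
    sse = sum(
        (v - row_means[i] - col_means[j] + grand) ** 2
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
    )
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: 'compressed' spans half the range of 'wide'.
wide = [0.2, 0.4, 0.6, 0.8, 1.0]
compressed = [0.5 + 0.5 * u for u in wide]

r = pearson(wide, compressed)
icc = icc_agreement(wide, compressed)
print(f"Pearson r = {r:.3f}, ICC = {icc:.3f}")
```

Here the line of best fit is Y = 0.5 + 0.5X rather than Y = X, so the Pearson correlation stays at 1 while the ICC falls well below it, which is the mechanism behind the drop in the 15D's ranking.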
Pairwise GMS regressions are reported for each combination of instruments for both countries in the Appendix. The country results are again almost identical. There is a maximum difference in the b coefficients between the two countries of only 7.6 % (1.83 vs. 1.70; Australia/US) in the regression of QWB on HUI 3. R² coefficients are higher than in the two five-instrument studies reported earlier, reflecting the wider range of observations in the first survey.
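For readers unfamiliar with the technique, the following sketch assumes that GMS denotes geometric mean squares regression, in which the slope is the geometric mean of the ordinary least squares slope of Y on X and the reciprocal of the slope of X on Y. The function names and data are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean


def _sums(x, y):
    """Centred sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares slope from regressing y on x."""
    _, _, sxx, _, sxy = _sums(x, y)
    return sxy / sxx


def gms_fit(x, y):
    """Intercept a and slope b of the geometric mean squares line y = a + b * x."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    b = copysign(sqrt(syy / sxx), sxy)  # the slope takes the sign of the covariance
    a = my - b * mx
    return a, b


# Hypothetical utilities from two instruments for the same five respondents.
x = [0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.55, 0.72, 0.78, 0.93, 0.97]

a, b = gms_fit(x, y)
_, b_reverse = gms_fit(y, x)
print(f"y on x: b = {b:.3f}; x on y: b = {b_reverse:.3f}; product = {b * b_reverse:.3f}")
```

Unlike ordinary least squares, the fitted line is the same whichever instrument is treated as the dependent variable: the two slopes are reciprocals, so b = 1.00 is a symmetric benchmark.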
Perfect prediction of the marginal change in one MAU instrument by another implies b = 1.00 in the relevant pairwise regression. Table 5 reports the deviation from this benchmark, measured as the larger marginal change divided by the smaller, times 100. The lowest deviations are associated with QWB, AQoL-8D and EQ-5D, indicating greater predictive validity by these instruments when each is judged by the remaining instruments.
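The deviation measure can be written as a short function. This is one reading of the description above: the text does not say whether Table 5 reports the ratio itself or its excess over 100, so the sketch returns the ratio. The function name and slopes are hypothetical.

```python
def marginal_change_ratio(b):
    """Larger marginal change divided by the smaller, times 100.

    For a pairwise slope b, a one-unit change in one instrument corresponds
    to a change of b in the other, so the ratio is max(b, 1 / b) * 100.
    """
    if b <= 0:
        raise ValueError("the slope must be positive")
    return 100.0 * max(b, 1.0 / b)


# Hypothetical slopes: 1.25 and 0.80 are reciprocals, so they deviate equally.
for slope in (1.00, 1.25, 0.80):
    print(f"b = {slope:.2f} -> ratio = {marginal_change_ratio(slope):.1f}")
```

Taking the larger over the smaller makes the measure independent of which instrument is placed on which axis.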
From Table 1, AQoL-8D has high face validity, particularly in the psychosocial dimensions, which include 24 of its 35 items. More formal evidence of content validity is presented in Table 6, which reports the Pearson correlation between the dimensions of the SF-36, the three SWB instruments, and the MAU instruments. The table excludes the SF-6D: as it is derived from the SF-36, its correlation with the SF-36 dimensions is an invalid comparator. From Table 6, the AQoL-8D has the highest correlation with each of the psychosocial dimensions. The difference is particularly marked for mental health, where the AQoL-8D correlation is 0.27 and 0.22 points above the average correlation coefficient in the two countries, respectively. In the physical domain, the correlation is higher for general health but below the average for physical function and pain. However, in these cases the correlation is still sufficient to indicate sensitivity to these dimensions. The correlation between the AQoL-8D physical super-dimension and the SF-36 physical component summary (PCS) was 0.80, which indicates that AQoL-8D is sensitive to the physical dimensions, but that the overall correlation with the full AQoL-8D is reduced because of the increased breadth of the content.
The correlation between the MAU instruments and the three SWB instruments reported in the last three lines of Table 6 is lower than between the MAU instruments. While the three SWB instruments measure closely related constructs, they differ. Nevertheless, the correlation between them and the MAU instruments is similar. The lowest correlation in the Australian sample occurs with the EQ-5D, and in the US with the QWB. The highest correlation in both countries with all three instruments is with AQoL-8D. Its average correlation across the three instruments of 0.65 is 48 % above the average correlation of 0.44 for the remaining instruments.
In the second survey, 385 (different) Australian public respondents were invited to complete a baseline survey and two further surveys spaced a fortnight apart. A total of 224 people completed the second-stage survey and all of these respondents completed the third-stage survey. Overall, therefore, 58 % of initial respondents completed all three surveys. The sample contained the same number of men and women (112); approximately 20 % were from each of the age cohorts below 34, 35–44, 45–54, 55–64 and 65+ years. Educational status was also spread: 35 % had completed only high school; 35 % had additional non-university qualifications and 30 % had a bachelor’s degree or above from a university.
Table 7 reports the mean scores of the AQoL-8D and its dimensions at each stage of the survey and the ICCs between the three stages. The standard error of each mean was 0.01. Mean values are relatively stable over the 4-week retest period but increase by a small, statistically significant amount for AQoL-8D and each of its dimensions, with the exception of independent living and happiness. The largest increases are for mental health (4.9 %) and senses (4.7 %). AQoL-8D increases by 4.1 %. For group data, a correlation of at least 0.7 is recommended as evidence of satisfactory reliability, and each of the ICCs in Table 7 exceeds this threshold, with the exception of the senses dimension. Coefficients of 0.9 are considered satisfactory at the individual level for clinical purposes. The AQoL-8D coefficient of 0.89 is close to this higher threshold.
Cronbach alpha coefficients were calculated from the MIC database and are reported in the last two columns of Table 7. AQoL-8D alphas are very high in both countries (0.96). The recommended value of 0.7 is also achieved by each of the AQoL-8D dimensions, with the exception of senses. This truncated dimension includes vision, hearing and communication, and the results suggest that there is not a strong underlying construct corresponding with these. However, the items were retained due to their intrinsic importance.
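As an illustration of the internal-consistency statistic, the sketch below computes Cronbach's alpha from the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The item responses are hypothetical and are not drawn from the MIC database.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of responses per item, all covering the same respondents
    in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(responses) for responses in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five respondents to a three-item dimension.
toy_items = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 2.0, 3.0, 5.0, 5.0],
    [1.0, 3.0, 3.0, 4.0, 4.0],
]
alpha = cronbach_alpha(toy_items)
print(f"alpha = {alpha:.2f}; meets the 0.7 threshold: {alpha >= 0.7}")
```

Alpha rises with the shared variance among items, so a dimension whose items tap weakly related attributes, as suggested for senses, would be expected to score lower.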