Patient-reported outcome measures (PROMs) have the potential to help improve health and care services by focusing attention on what matters most to people receiving treatment [1–3].

If different measures capture the same underlying change, the changes they record should be highly correlated and similar in magnitude. Many measures have been developed, but the changes measured by different instruments do not agree well [4]. Direct comparisons between two measures show the extent of agreement between them, but cannot show whether one measure is better than another. For this we need a gold standard for comparison.

In this study, we set out to compare the changes following hip and knee replacement surgery as measured by two generic PROMs – EQ-5D-3L [5] and howRu [6] – using condition-specific measures – Oxford Hip Score (OHS) [7] and Oxford Knee Score (OKS) [8] – for comparison.

We compared similar cohorts from two existing databases, NHS PROMs and MyClinicalOutcomes (MCO), as a natural experiment. Since 2009, all patients having hip and knee replacement surgery paid for by the NHS have been asked to complete EQ-5D-3L and the Oxford scores before and six months after surgery. Anonymised results are published for further analysis. This programme has led to more than 60 research papers [9]. MCO has built a database covering a wide range of patients, for whom it has collected howRu and the Oxford scores [10]. We extracted the subset who had hip or knee replacement surgery. This allows EQ-5D-3L to be compared with howRu by seeing how each performs against the same condition-specific measures in similar cohorts of patients.

The measures

The OHS [7] and the OKS [8] are condition-specific PROMs for the evaluation of joint replacement implants and techniques. Each measure has 12 items, with five responses each. Each item is scored on a 0–4 scale. The score for each item is added, giving an overall score on a scale from 0 (worst possible score) to 48 (best possible score).

EQ-5D-3L [5] is a generic PROM with two parts, the EQ-5D Index and a visual analogue scale (EQ-VAS). The EQ-5D Index is derived from 5 items: mobility (walking about), self-care (washing and dressing), usual activities (e.g. work, study, housework, family or leisure activities), pain or discomfort, and anxiety or depression. Each item has 3 possible responses. The EQ-5D Index is derived by applying weights to each response based on valuations derived from a population survey. The NHS PROMs programme uses the UK tariff [11]. These weights purport to represent the perspective of society as a whole. The range of possible scores for the EQ-5D Index is from −0.594 (worst state) to 1.0 (best state), with death allocated a value of 0. The EQ-VAS is a 20 cm visual analogue scale with a range from 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ-VAS is intended for use as a quantitative measure of health outcome as judged by the individual respondents [12, 13].

HowRu [6] is a short generic patient-reported measure of health-related quality of life, with 4 items: pain or discomfort; feeling low or worried; limited in what you can do; need help from others. Each item has four possible responses: extreme, quite a lot, a little, and none. These are scored from 0 (extreme) to 3 (none). The summary howRu score is the sum of the item scores, giving a scale with 13 possible values with a range from 0 (4 × extreme) to 12 (4 × none).

Previous studies have compared howRu with SF12 [6] and with EQ-5D [14], and show that howRu has comparable overall performance at a single point in time. HowRu is considerably shorter than EQ-5D with 37 words vs. 230 words and has been validated for use at the individual patient level [15]. Since the original publication of howRu [6], some small changes have been made. The original item "Dependent on others" has been changed to "Require help from others", to improve understanding. The user instructions have been simplified from "Circle one face on each line to tell us how you are today" to "Choose one answer to each question". The main question "How are you today?" has been qualified by adding "(past 24 hours)" to clarify that it means this day rather than right now. These changes have slightly changed the word counts (see Fig. 1).

Fig. 1 howRu instrument

All of these instruments were developed as measures of patient benefit, so we might expect that they would show a similar level of improvement and be highly correlated. However, condition-specific measures only take account of those aspects of each patient’s health directly associated with the condition being treated, while generic measures have a more holistic view, including co-morbidities. For this reason, condition-specific measures usually show larger improvements after surgery than generic measures [3].


The data collected in the NHS PROMs programme cover all hospitals providing hip and knee replacements paid for by the NHS. Most data are collected using paper booklets. Pre-operative questionnaires are completed at a pre-operative assessment clinic or on admission. Post-operative questionnaires are mailed to each patient’s home address 6 months later.

To use the MCO web-based system, patients register, complete the appropriate condition-specific measure (here, OHS or OKS) and howRu, and consent to share their health information with their medical team. Patients are issued new question sets every three months and are shown feedback indicating both the absolute change and the rate of change in their score. The MCO data for this study were collected between August 2011 and October 2013 and are not publicly available.

The MCO system had 1,696 patients with an OHS and 1,395 patients with an OKS. Of these, 178 hip replacement patients and 103 knee replacement patients had both pre-operative and post-operative ratings. The proportion is relatively low because most patients also completed NHS PROMs surveys for hip and knee replacement operations, which involved duplication of the OHS and OKS scores. Entries with matched pre-operative ratings and post-operative ratings at 5, 6 or 7 months, for both howRu and OHS or OKS as appropriate, were selected for analysis. Where more than one set of post-operative ratings was available, we selected the one closest to 6 months after the operation. All patient records that were incomplete for any reason were excluded from the analysis. This yielded data on 74 hip replacements and 42 knee replacements.

The original scores for both NHS and MCO records were used without case-mix adjustment.

Each instrument uses a different scale, which complicates comparison between results using different instruments (Table 1). We transformed each scale arithmetically to provide a common 0–100 scale from minimum (0) to maximum (100).

Table 1 Range of possible scores for each instrument
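The rescaling can be illustrated with a minimal Python sketch. It assumes the arithmetic transformation is a simple linear (min-max) mapping, which the text implies but does not spell out; the score ranges are those given in the description of the measures, while the names `SCALE_RANGES` and `to_common_scale` are illustrative only.

```python
# Score ranges (worst, best) as described for each instrument in the text.
# The dictionary and function names are illustrative, not from the study.
SCALE_RANGES = {
    "OHS": (0.0, 48.0),
    "OKS": (0.0, 48.0),
    "EQ-5D Index": (-0.594, 1.0),
    "EQ-VAS": (0.0, 100.0),
    "howRu": (0.0, 12.0),
}


def to_common_scale(score, instrument):
    """Linearly map a raw score onto 0 (worst) to 100 (best).

    Assumption: a plain min-max transformation is used.
    """
    low, high = SCALE_RANGES[instrument]
    if not low <= score <= high:
        raise ValueError(f"{score} is outside the {instrument} range")
    return 100.0 * (score - low) / (high - low)


# Example: a mid-range Oxford Hip Score and the best possible howRu score.
ohs_mid = to_common_scale(24, "OHS")
howru_best = to_common_scale(12, "howRu")
```

Because the mapping is linear, a change score can be rescaled in the same way by dividing the raw change by the width of the scale and multiplying by 100.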

We used Excel or Stata/IC for Windows 12.1 to calculate the distributions, means, standard deviations and correlations for each measure.

The generic measures are compared with condition-specific measures in the following ways.

  • The proportion of patients reporting improvements using each measure.

  • Pre-op and post-op scores for each measure.

  • The mean change between each patient’s pre-operative and post-operative scores for each measure, using the 0–100 scale.

  • Correlation of the change between pre-operative and post-operative scores for each generic measure with the relevant condition-specific measure.
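The comparisons listed above can be sketched in a few lines of Python. The data below are made up purely for illustration and are not taken from either cohort; the helper names `change_scores` and `pearson_r` are likewise hypothetical, and we assume a Pearson correlation, which the text does not state explicitly.

```python
from math import sqrt


def change_scores(pre, post):
    """Outcome for each patient: post-operative minus pre-operative score."""
    return [after - before for before, after in zip(pre, post)]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores on the 0-100 scale for four patients (illustration only).
ohs_pre, ohs_post = [30.0, 40.0, 35.0, 50.0], [80.0, 85.0, 70.0, 90.0]
howru_pre, howru_post = [40.0, 50.0, 45.0, 60.0], [75.0, 90.0, 70.0, 95.0]

ohs_change = change_scores(ohs_pre, ohs_post)
howru_change = change_scores(howru_pre, howru_post)

# Proportion of patients reporting any improvement, and the mean change.
proportion_improved = sum(c > 0 for c in ohs_change) / len(ohs_change)
mean_ohs_change = sum(ohs_change) / len(ohs_change)

# Correlation of the change on the generic measure with the specific one.
r_change = pearson_r(ohs_change, howru_change)
```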


Table 2 shows the number of patients in each cohort and the proportion of patients who have shown improvement for each measure with the 95 % confidence limits.

Table 2 Percentage of patients reporting any improvement and 95 % confidence intervals
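For readers who wish to reproduce this kind of interval, the sketch below computes a 95 % confidence interval for a proportion. The method used for Table 2 is not stated in the text, so the Wilson score interval here is an assumption, and the count of improved patients in the example is hypothetical (only the cohort size of 74 comes from the text).

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 gives an approximate 95 % interval. This is one common
    choice; the study may have used a different method.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 68 of 74 patients report an improvement.
lower, upper = wilson_interval(68, 74)
```

The Wilson interval behaves better than the simple normal approximation when the proportion is close to 0 or 1, which is typical of improvement rates after joint replacement.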

Table 3 shows, for each cohort and measure, the mean pre-operative and post-operative scores and the mean change after surgery (the outcome), calculated as the post-operative score minus the pre-operative score. These are shown transformed to a common 0–100 scale. The same data using the original scales are provided in Additional file 1.

Table 3 Mean pre-op and post-op scores and the mean change after surgery (post-op score minus pre-op score) using 0–100 scales

The use of the 0–100 scale allows a comparison of the outcome as measured by each instrument for each type of operation (Fig. 2). For hip replacement, EQ-5D shows an improvement of 26.0, compared with 42.2 for OHS (62 % of the OHS score) for the NHS cohort. HowRu shows an improvement of 32.5 compared with 43.9 for OHS (74 %) for the MCO cohort.

Fig. 2 Relative size of scores before and six months after hip and knee replacement surgery as measured by different instruments on a common 0–100 axis, where 0 represents the worst state on each scale and 100 represents the best possible score. OxS refers to Oxford Hip Score for hip replacement and Oxford Knee Score for knee replacement; EQ5D-Ind refers to the EQ-5D Index score and EQ-VAS the EQ-5D Visual Analogue Score. OxS-MCO and howRu scores are from MyClinicalOutcomes data, others are from the NHS PROMs data

For knee replacement, EQ-5D shows an improvement of 18.9, compared with 31.8 for OKS (59 %) for the NHS cohort. HowRu shows an improvement of 25.6 compared with 36.4 for OKS (70 %) for the MCO cohort.
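The percentages quoted for hips and knees are simply the generic improvement divided by the condition-specific improvement within the same cohort. The short Python check below reproduces them from the mean changes given in the text; the variable names are illustrative.

```python
# Mean change on the 0-100 scale, as quoted in the text:
# (operation, cohort, generic measure) -> (generic change, Oxford change)
mean_changes = {
    ("hip", "NHS", "EQ-5D Index"): (26.0, 42.2),
    ("hip", "MCO", "howRu"): (32.5, 43.9),
    ("knee", "NHS", "EQ-5D Index"): (18.9, 31.8),
    ("knee", "MCO", "howRu"): (25.6, 36.4),
}

# Generic improvement as a percentage of the condition-specific improvement.
relative_improvement = {
    key: round(100.0 * generic / specific)
    for key, (generic, specific) in mean_changes.items()
}

for (operation, cohort, measure), pct in relative_improvement.items():
    print(f"{operation:4s} {cohort} {measure}: {pct} % of the Oxford score change")
```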

The MCO patients show greater improvement than the NHS patients, which may reflect differences between the two populations. Relative to the condition-specific measure, the howRu instrument shows a greater improvement than EQ-5D.

The correlations for each measure within each cohort are shown in Table 4 for the scores before surgery and 6 months after surgery. Table 5 and Fig. 3 show the correlation of the change or outcome of surgery, as measured by each instrument.

Table 4 Correlations between condition-specific and generic scores
Table 5 Correlations of differences between post-op and pre-op scores
Fig. 3 Correlations of the outcome of hip and knee replacement surgery as measured by Oxford Hip and Knee Scores (OxS), EQ-5D Index, EQ-VAS (for NHS PROMs data) and howRu (for MCO data)

The correlations of OHS and OKS with howRu are higher than the corresponding correlations with the EQ-5D Index. Tables 4 and 5 also give z-tests comparing the correlations: correlations with howRu are statistically significantly higher than those with the EQ-5D Index for the outcome of hip and knee replacement, and pre-operatively for knee replacement. The correlations of OHS and OKS with howRu are much higher, and statistically significantly higher, than those with the EQ-VAS. For example, considering the outcomes of hip surgery, a correlation of r = 0.77 (OHS vs. howRu) explains 59 % of the variance (r²), while a correlation of r = 0.64 (OHS vs. EQ-5D Index) explains 41 % of the variance and a correlation of r = 0.33 (OHS vs. EQ-VAS) explains only 11 % of the variance.
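The variance explained follows directly from squaring each correlation, and a z-test for two correlations from independent cohorts can be sketched as below. The exact form of the z-test used in Tables 4 and 5 is not described in the text, so the Fisher r-to-z approach here is an assumption. The MCO hip cohort size (74) comes from the text, whereas the NHS cohort size in the example is a hypothetical placeholder.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and a two-sided p-value.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = atanh(r1), atanh(r2)
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return z, p_two_sided


# Percentage of variance explained (r squared) for the hip outcomes quoted.
variance_explained = {
    "OHS vs. howRu": round(100 * 0.77 ** 2),
    "OHS vs. EQ-5D Index": round(100 * 0.64 ** 2),
    "OHS vs. EQ-VAS": round(100 * 0.33 ** 2),
}

# Hypothetical NHS sample size; only the MCO figure (74) is from the text.
HYPOTHETICAL_NHS_N = 1000
z_stat, p_value = compare_independent_correlations(0.77, 74, 0.64, HYPOTHETICAL_NHS_N)
```

With cohorts of very different sizes, the standard error is dominated by the smaller cohort, so the modest MCO sample largely determines how precisely the two correlations can be distinguished.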


In a previous paper [14], we compared and discussed the differences between howRu and EQ-5D in a study of the same population. That study showed that howRu is shorter, has better readability statistics and a higher completion rate, uses a wider range of states and has a smaller ceiling effect than EQ-5D.

This study suggests that, for similar types of patient, howRu shows larger relative improvements, compared with condition-specific measures, than the EQ-5D Index and much larger improvements than EQ-VAS. HowRu also shows higher correlations for the surgery outcome, that is, the difference between pre-operative and post-operative scores.

One explanation for these differences may be the noise introduced by the weighting system or tariff used to calculate the EQ-5D Index scores. This view is supported by the release of the new tariff for EQ-5D-5L [16], which has substantial differences from that used for the 3L version [17].

The scores calculated in this paper for NHS patients, covering a 6-month period without risk adjustment, are very similar to those presented in the final published results for the whole year 2011–12, which include risk adjustment [18].

The condition-specific scores show high levels of improvement (the means are between 31.8 and 43.9 on the 0–100 scale). Generic measures such as EQ-5D Index and howRu capture each patient’s symptoms and disability from any cause (not just hips or knees). These show substantial but smaller improvements (between 18.9 and 32.5). On all measures, the results at six months are better for hips than for knees.

The improvements measured by EQ-VAS (10.2 for hips and 4.6 for knees) are much lower than for EQ-5D Index. EQ-VAS also shows low correlations with the EQ-5D Index. These large differences between EQ-VAS and EQ-5D Index were known in the 1990s for patients with rheumatic disease such as those having hip and knee replacement [19]. The new EQ-5D-5L version [16] with more response levels may have better properties [20].

Feng, Parkin and Devlin [21] investigated the performance of the EQ-VAS in the NHS PROMs programme with similar results to those presented here and suggested that the results might be improved by providing better guidance on collection and coding. Our view is that EQ-VAS is measuring something substantially different from the other measures. EQ-VAS asks the patient to rate their health state on a scale with end points of best and worst imaginable health states. This implies inclusion of aspects such as prognosis (including that of other comorbidities), social deprivation and optimism, which are not covered by the other measures and may not be changed by joint replacement.

Hip and knee replacements are major operations with substantial costs in terms of both money and post-operative recovery periods. For these, and indeed all operations, patients, surgeons and commissioners need to know the likelihood of a favourable outcome. However, preliminary analysis of the first three years' results of the NHS PROMs programme has shown little impact on hospital performance [22]. This may in part be because information feedback was slow. For example, the final results for operations performed in 2009 were not released until August 2011. Furthermore, these results were issued using a complex interactive spreadsheet (the PROMs Score Comparison Tool) [23] that is difficult to use.

Each measure uses a different scale range, which creates a barrier to comparison and understanding [24]. Transformed 0–100 scales, shown in Table 3 and Fig. 2, are much easier to interpret than the original scales when comparing mean scores. To illustrate this, Table 6 shows the original and the 0–100 scales for the average change as measured by the Oxford scores, EQ-5D and EQ-VAS for NHS hip and knee replacements. The original scales are shown in Additional file 1.

Table 6 Comparison of mean improvement between pre-op and post-op measure on the original and transformed (0–100) scales

Limitations of this study include the modest number of MCO patients analysed. However, confidence intervals show that the numbers are statistically precise enough for our purposes. Case-mix adjustment was not applied to the scores [25]. The mean pre-operative condition-specific scores for the MCO cohorts are not significantly different from the NHS scores, but the postoperative scores are higher than the corresponding NHS scores (p < 0.05). This may be because the MCO patients comprise a different population from the NHS group, being younger [26], less deprived [27], more self-selecting and self-motivated [28], all of which may contribute to better outcome. NHS patients may have more co-morbidity, which might increase the gap between condition-specific and generic outcomes.


In this study, howRu, a generic measure, tracked improvement following hip and knee replacement surgery more closely than EQ-5D when the OHS and OKS were used as the gold standard. Given the wide use of EQ-5D, we recommend that larger studies confirm or refute these findings.