Background

Appropriate patient-reported outcome measures (PROMs) are imperative for the evaluation of clinical practice and research [1]. This is exemplified by the increasing use of such measures during the last two decades. Examples of widely used measures within musculoskeletal care include the Victorian Institute of Sports Assessment questionnaires for patellar and Achilles tendinopathy [2, 3] (VISA and VISA-A questionnaires), the Oxford hip and knee scores [4, 5], and the Disabilities of the Arm, Shoulder and Hand score (DASH) [6].

A range of PROMs have also been published specifically for evaluating patients following a rupture of their Achilles tendon. Examples of these include the American Orthopaedic Foot and Ankle Hindfoot score [7], Thermann score [8] and Leppilahti score [9]. However, the development of these measures has predominantly been based on expert opinion, with a lack of data to support their validation [10]. In 2007 a research group addressed this gap through development of a new patient-reported outcome measure, with supporting validation data [10]. This outcome measure was the 'Achilles tendon Total Rupture Score' (ATRS).

The ATRS contains ten items, to which patients respond using an 11-grade Likert scale by ticking one box labelled 0-10. A score of zero is equivalent to a patient having major limitations/symptoms and a score of ten is equivalent to a patient having no limitations or symptoms. The score was originally developed and evaluated in Swedish, using a sample of patients aged 20-70 years with an acute Achilles tendon rupture. The score evaluates the constructs of 'symptoms and physical activity' through five questions addressing symptoms and five questions addressing physical activity.
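The item structure described above can be illustrated with a short scoring sketch. It assumes that the total ATRS is the simple sum of the ten item responses, giving a range of 0 to 100; the summation rule and the function name atrs_total are assumptions made for illustration and are not stated in this section.

```python
from typing import Sequence

N_ITEMS = 10                 # the ATRS contains ten items
ITEM_MIN, ITEM_MAX = 0, 10   # each item is answered on an 11-grade scale, 0-10


def atrs_total(responses: Sequence[int]) -> int:
    """Return the summed ATRS total for one patient (assumed range 0-100)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    for value in responses:
        if not ITEM_MIN <= value <= ITEM_MAX:
            raise ValueError(f"item response {value} is outside {ITEM_MIN}-{ITEM_MAX}")
    # Assumption: the total is the simple sum of the ten item scores,
    # so a higher total means fewer limitations or symptoms.
    return sum(responses)
```

Under this assumption, a patient ticking ten on every item would score 100 (no limitations or symptoms) and a patient ticking zero on every item would score 0 (major limitations/symptoms).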

The purpose of this study was to evaluate the internal consistency, convergent validity and responsiveness of the ATRS within a UK population of patients who had sustained an isolated acute Achilles tendon rupture.

Methods

Population identification

Ethical approval for this study was gained. Between August 2007 and June 2009, 70 patients presenting at the University hospital with an isolated, acute (< 10 days) midsubstance rupture of the Achilles tendon were screened. The diagnosis was established through subjective history and physical examination to confirm a palpable gap and a positive Thompson test [11].

Patients with delayed presentation, bilateral ruptures or other serious lower limb injury were excluded. In addition, patients who were unable to read the English language and therefore unable to complete the questionnaires were also excluded. Of the 70 screened patients, six were excluded. Three were excluded because they were unable to complete the questionnaires and three declined to complete the questionnaires.

Following non-operative or operative management of their Achilles tendon rupture, all patients were managed using an immediate weight-bearing protocol. This consisted of wearing an ankle foot orthosis with inserted heel wedges for eight weeks, followed by a standardised physical therapy programme.

Questionnaire follow up

All patients were routinely evaluated at baseline, six weeks, three months, six months and nine months, using PROMs, which included the ATRS, EQ 5D and Disability Rating Index (DRI).

In contrast to the ATRS, the DRI is not disease specific and has been validated for use in a range of orthopaedic presentations [12]. It is a self-administered questionnaire with twelve questions regarding common physical activities, to which patients respond using a 100 mm visual analogue scale. There are two anchor points, 'without difficulty = 0' and 'not at all = 100'. The EQ 5D was also completed by patients to obtain a global measure of health [13]. The EQ 5D comprises five questions, each of which asks the patient to respond with one of three options. The combination of these responses results in a single health index score.

These scores were selected instead of the VISA-A and Foot and Ankle Orthopaedic Score, which were used by the developing authors, because the VISA-A is a score for patients with Achilles tendinopathy rather than rupture, and the Foot and Ankle Orthopaedic Score has little validation data in relation to Achilles tendon ruptures [14].

Sample size

There is no agreed optimum method for determining an appropriate sample size to evaluate aspects of validity for patient-reported outcome measures [15]. However, 50 patients have been advocated as the minimum requirement by previous studies evaluating aspects of validity [16]. Our planned case series of 64 patients would provide sufficient power to investigate important aspects of validity for the ATRS, and allow for 20% loss to follow-up.

ATRS analysis plan

Internal consistency was defined as the extent to which individual items of the ATRS correlate with each other [10, 15]. This is an important aspect of validity because it determines if all the items within the ATRS are measuring the same construct. This construct was defined as 'symptoms and physical activity' by the authors who developed the ATRS. Internal consistency was evaluated using Cronbach's alpha at each time point within SPSS (v.17.0). Values between 0.7 and 0.9 are regarded as satisfactory [17].
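The internal consistency statistic can be illustrated with a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The analysis in this study was run in SPSS; the function name cronbach_alpha and the layout of the input (one row per respondent, one column per item) are assumptions made for illustration only.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of item scores."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(rows[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

For the ATRS, each row would hold one patient's ten item responses at a given time point, and the resulting value would be compared with the 0.7 to 0.9 range regarded as satisfactory.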

Convergent validity was defined as the extent to which the ATRS correlated with measures consistent with its theoretically derived construct. This was evaluated by correlating the ATRS with the overall DRI and EQ 5D scores, the subdivisions of the EQ 5D related to 'mobility' and 'usual activities', and the three subdivisions of the DRI (1: common basic activities of daily life, 2: more demanding daily physical activities, 3: work related and more vigorous activities). Following an assessment of data distribution using a Shapiro-Wilk analysis, the Spearman's rank correlation coefficient was used within SPSS (v.17.0) at each time point. The minimum acceptable correlation coefficient was defined as 0.7 [15].
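The rank correlation used for convergent validity can be sketched as follows. This is a minimal illustration of Spearman's rank correlation (the Pearson correlation of average ranks, with ties sharing their mean rank), not the SPSS procedure used in the study. The function names are hypothetical, and applying the 0.7 criterion to the magnitude of the coefficient is an assumption, made because higher DRI scores indicate greater disability and the ATRS and DRI are therefore expected to correlate negatively.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


def meets_criterion(rho: float, minimum: float = 0.7) -> bool:
    """Assumption: the 0.7 criterion applies to the magnitude of rho."""
    return abs(rho) >= minimum
```

With paired ATRS and DRI totals from the same patients at one time point, spearman_rho(atrs_scores, dri_scores) would be expected to return a negative value if the two scores move in opposite directions.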

Responsiveness was defined as the ability of the ATRS to detect clinically important changes over time [15]. For the ATRS to be responsive it needs to demonstrate a lack of 'floor and ceiling' effects, in that respondents should not cluster at the minimum or maximum value at each time point. We recorded floor and ceiling effects as being present if more than 15% of respondents achieved the lowest or highest possible scores [15]. This was followed by a relative efficiency calculation to analyse responsiveness of the ATRS versus the EQ 5D and DRI according to Barr et al. [18]. Using this method a score of greater than 1 would indicate the ATRS was more responsive than the EQ 5D and DRI and a score less than 1 would indicate the ATRS to be less responsive than the EQ 5D and DRI.
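The floor and ceiling criterion described above can be sketched as a simple proportion check. The function name floor_ceiling and the returned dictionary keys are hypothetical; the 15% threshold is taken from the text, and the lowest and highest possible scores must be supplied for the instrument in question.

```python
from typing import Dict, Sequence, Union

THRESHOLD_PERCENT = 15.0  # effect is present if MORE than 15% of respondents


def floor_ceiling(
    scores: Sequence[float], lowest: float, highest: float
) -> Dict[str, Union[float, bool]]:
    """Percentage of respondents at the lowest and highest possible scores."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    floor_percent = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling_percent = 100.0 * sum(1 for s in scores if s == highest) / n
    return {
        "floor_percent": floor_percent,
        "ceiling_percent": ceiling_percent,
        # Strictly greater than the threshold, as in the stated criterion.
        "floor_effect": floor_percent > THRESHOLD_PERCENT,
        "ceiling_effect": ceiling_percent > THRESHOLD_PERCENT,
    }
```

As an illustration with invented data, a group of ten respondents in which two score the maximum would give a ceiling percentage of 20% and would therefore be flagged as showing a ceiling effect.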

Results

Baseline demographics

The baseline demographics of the 64 patients can be found in Table 1. The sample reflects the broad population of patients who sustain this injury. The age range was 21 to 79 years and all of the patients sustained a complete rupture of the tendon. Nineteen of the 64 patients received operative management. All 64 patients completed the above three questionnaires at baseline, which recorded their pre-injury scores. One patient was lost to follow-up at six weeks (63 patients, 98%). A further three were lost to follow-up at the three month time point (60 patients, 94%) and a further two patients at six months (58 patients, 91%). Fifty-six patients (88%) completed the final follow-up questionnaires at nine months.

Table 1 Baseline demographics

Descriptive summary

The minimum, maximum, mean and standard deviation for the ATRS, EQ 5D and DRI at each time point can be found in Table 2, in addition to the descriptive data for the separate operative and non-operative groups. This illustrates that within the ATRS score there is a wider spread of scores at each time point, in comparison to the DRI and EQ 5D scores, which will be further discussed in the final conclusions.

Table 2 Overall ATRS, DRI and EQ-5D scores at each time point

Internal consistency

The internal consistency for the ten items of the ATRS at each time point was high (Cronbach's alpha > 0.7). For pre-injury scores this was 0.98, decreasing to 0.89 at six weeks and three months and increasing to 0.95 and 0.94 at the six and nine month time points, respectively (Table 3).

Table 3 Cronbach's alpha for the ATRS at each time point

Convergent validity

There were statistically significant (p < 0.001) correlations between the ATRS and DRI scores at each time point, with correlation coefficients ranging from -0.5 to -0.9, with the exception of the nine month time point, as shown in Table 4. Table 4 also shows the 95% confidence intervals for each time point.

Table 4 Convergent validity

Table 4 illustrates, at each time point, the correlation coefficients for each construct of the DRI against the ATRS, and for the overall EQ 5D and its two subdivisions against the ATRS. In relation to the overall EQ 5D score and its two sub-divisions, the correlations did not reach the 0.7 value pre-defined in the analysis section. Regarding the three DRI sub-divisions, this value was also not reached by the first sub-division of 'common basic activities of daily life'. This criterion was only met within the last time point of the second and third sub-divisions of 'more demanding daily physical activities' and 'work related and more vigorous activities'.

Responsiveness

Table 5 illustrates the percentage of reported responses at the top (ceiling) of the total possible scores for the ATRS, DRI and EQ 5D and the percentage of reported responses at the bottom (floor) of the possible score for the ATRS, DRI and EQ 5D. All three scores demonstrated a ceiling effect (defined as > 15% respondents) for reported pre-injury scores, which was highest for the more generic quality of life measure, EQ 5D. The ATRS and DRI scores did not demonstrate a ceiling effect at any other time point. This was in contrast to the EQ 5D, which demonstrated further ceiling effects at the six and nine month time points. None of the three outcome measures demonstrated any floor effects.

Table 5 Percentage of ATRS, EQ-5D and DRI respondents at either the floor or ceiling of the score

Table 6 shows the relative efficiency of the ATRS in relation to the EQ 5D and DRI at each time point. On all occasions the ATRS demonstrated greater responsiveness when compared to the DRI and EQ 5D. At the six month time point the ATRS was 2.1 times more responsive than the EQ 5D and at nine months it was four times more responsive. The same trends were evident when compared to the DRI, but to a lesser extent, with the ATRS being 1.3 times more responsive at six months and 2.7 times more responsive at nine months.

Table 6 Relative efficiency of the ATRS across all time points

Discussion

The ATRS was published in 2007, and advocated by the authors as the only validated PROM available to evaluate patients following a rupture of their Achilles tendon [10]. There have been no subsequent validation studies. Therefore, this study represents the first paper to investigate aspects of validity, outside of the developing centre.

We investigated aspects of internal consistency, convergent validity and responsiveness of the ATRS, using a sample of 64 patients. The ATRS was found to have high internal consistency at each time point (Cronbach's alpha between 0.89 and 0.95). This finding is also consistent with the original development article, which reported a Cronbach's alpha of 0.96. However, a result above 0.90 has been debated within the literature as being an indication that the outcome is too homogeneous [19, 20]. The implication is that further item reduction may be appropriate [1].

Convergent validity of the ATRS was evaluated against the DRI score. Correlation coefficients between the DRI and ATRS demonstrated statistically significant correlations between the two scores at each time point. However, the confidence intervals around these coefficients were wide, with only the six and nine month time points demonstrating a correlation coefficient of at least 0.7. These wide confidence intervals may be the result of the limited sample size of this study. Alternatively, they may reflect an element of heterogeneity amongst the sample. For example, the inclusion of patients managed both operatively and non-operatively, and of patients with co-morbidities such as asthma or diabetes, may affect the distribution of PROM scores.

These results do however provide some evidence that the ATRS is measuring similar aspects of outcome when compared to the DRI. The main limitation with this methodology was highlighted by the developing authors of the score, who acknowledged that this element of validity should be interpreted with caution as there was no existing gold-standard PROM with which to compare.

Further exploring correlations of the ATRS with the DRI, we next investigated if the ATRS correlated more strongly with aspects of the three DRI sub-divisions. To further analyse aspects of convergent validity, the ATRS was also correlated against the EQ-5D and two of its subdivisions evaluating 'mobility' and 'usual activities'. Again the confidence intervals were large across the time points and scales evaluated. The size of the correlations did not fulfil the pre-defined criterion of 0.7 within the EQ-5D or its subdivisions. Within the DRI this criterion was met by the second and third sub-divisions at the six and nine month time points. These results were anticipated by the authors to an extent because the EQ-5D measures more generic quality of life, as opposed to the alternative construct of physical activity measured by the ATRS. Again the key limitation of these correlations is that the three scores are measuring only similar constructs as opposed to exact constructs and, with the large confidence intervals reported, a larger sample may be required. A 'foot and ankle' specific PROM may provide a more exact construct for comparison, but as described in the methods section there is also a distinct lack of robustly-developed outcome measures in this area [14].

The more specific ATRS outcome measure demonstrated greater responsiveness than the more generic DRI and EQ-5D scores at each time point. These results were in keeping with the original development article. The level of responsiveness was only marginally greater than that of the DRI and EQ-5D up until the three month time point, with greater levels of responsiveness evident at the six month and nine month time points. This may be representative of the greater ceiling effects seen within the EQ-5D and DRI scores. There are many methods available to determine responsiveness. The relative efficiency method was used as opposed to more routinely reported effect sizes because it does not require parametric assumptions.

Conclusion

This study provides further evidence regarding the validity of a newly developed measurement tool. Overall the ATRS demonstrated high internal consistency and responsiveness in comparison to the EQ-5D and DRI. It has also demonstrated minimal ceiling effects (within pre-injury scores only), and no floor effects. There was limited evidence of convergent validity with the DRI and EQ-5D, although these tools do measure slightly different constructs. The descriptive data in Table 2 will provide researchers with a useful resource of estimates of range and spread of scores across multiple time points, but further research will be required to determine minimally important differences within this score.

Future areas of research could explore the use of this tool in the separate operative and non-operative populations, and within patients with chronic rupture presentations. While there is certainly scope to further explore aspects of validity of this new score, ideally with even larger samples, this study is a positive step towards the use of a universal measure of outcome for patients with a rupture of the Achilles tendon.