Instruments
Achilles Tendon Total Rupture Score
The ATRS has been developed to evaluate patient-reported outcome after the treatment of acute total Achilles tendon rupture. The score consists of ten items focusing on one dimension related to symptoms and physical activity. Each item can be graded on a scale of 0 to 10. A summation of the grades gives a total score, ranging from 0 to 100, where a lower score represents greater physical limitation [25].
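As a minimal sketch of the scoring rule described above, the total can be computed by summing the ten item grades. The function name `atrs_total` and the example responses are hypothetical and serve only as illustration; they are not taken from the study.

```python
def atrs_total(items):
    """Sum ten ATRS item grades (each 0-10) into a total score from 0 to 100."""
    if len(items) != 10:
        raise ValueError("the ATRS consists of exactly ten items")
    if any(not 0 <= grade <= 10 for grade in items):
        raise ValueError("each item grade must lie between 0 and 10")
    return sum(items)


# Made-up responses for one patient (illustration only, not study data).
example_grades = [7, 8, 6, 9, 10, 5, 4, 3, 6, 7]
total = atrs_total(example_grades)
```

A lower total indicates greater physical limitation, so a patient grading every item 0 would score 0 and a patient grading every item 10 would score 100.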
EuroQoL-5 Dimension Questionnaire
The EQ-5D covers five dimensions (mobility, self-care, everyday activities, pain/discomfort, and anxiety/depression), and each dimension has one item. Each item is rated by the patient on a three-level (EQ-5D-3L) or five-level (EQ-5D-5L) scale [10], which yields a five-digit number that describes the health state, with one digit for the level in each dimension. In this study, the EQ-5D-3L was used; there are \(3^5 = 243\) different possible health states from which a preference-based single index (utility value) can be derived. The single index is obtained by comparing the five-digit number with the average health state valuation of a population sample generated with the time trade-off (TTO) method or visual analogue scale (VAS) method. The EQ-5D single index ranges from 0 to 1 (although negative values are possible), with 0 regarded as “equal to death” and 1 as “the best imaginable health state” [10, 12].
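The count of possible EQ-5D-3L health states can be checked by enumerating every five-digit profile. The sketch below assumes the usual coding of levels as 1, 2, and 3, so that "11111" denotes no problems in any dimension. It does not assign utility values, since these require a published value set that is not reproduced here.

```python
from itertools import product

# Dimension labels as named in the text; levels coded 1-3 (assumed convention).
DIMENSIONS = (
    "mobility",
    "self-care",
    "everyday activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)

# Every combination of one level per dimension, written as a five-digit string.
states = [
    "".join(str(level) for level in profile)
    for profile in product(LEVELS, repeat=len(DIMENSIONS))
]
n_states = len(states)  # 3 ** 5
```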
The data set
Data were collected from a randomised controlled trial (RCT) conducted to evaluate patient-reported outcomes after stable surgical repair with early loading of the tendon in patients with an acute Achilles tendon rupture [26]. One hundred patients (86 men, 14 women; age, 18–65 years) were recruited from a centre in Sweden between April 2009 and October 2010, and randomised to either a surgical (n = 49) or non-surgical (n = 51) treatment group. Patient-reported outcome was assessed for all patients using the self-rated ATRS and EQ-5D during follow-up at 3, 6, and 12 months after treatment. Twelve patients were excluded because of re-rupture or loss to follow-up. In the surgical group, two patients were excluded before the first follow-up and another four after the follow-up at 3 months. In the non-surgical group, four patients were excluded before the first follow-up and one each after the follow-ups at 6 and 12 months. A final total of 274 paired ATRS and EQ-5D assessments [26] were included in the present study to analyse the statistical relationship between the scores.
Ethical approval was obtained from the Regional Ethical Review Board in Gothenburg, Sweden (Diarienr: 032-09).
Data analysis
The study utilised a direct mapping approach, which is based on a model of the EQ-5D utility scores (“Dolan tariff” [11]) with the ATRS scores as predictors: \({\text{EQ5D}}\;{\text{score}}=f({\text{ATRS}}).\)
As there is no theoretical model to guide the model selection for mapping between the instruments, data mining techniques were needed to identify an algorithm that maps ATRS scores onto the utility scores. A standard approach for choosing between different candidate models is cross-validation [2], where the sample is randomly split into two sub-samples. A regression model is estimated on one sub-sample, referred to as the “training sample”, and the results from that regression are used to predict the outcome in the other sub-sample, referred to as the “validation sample”. The accuracy of the model is assessed by a measure of the deviation between the predicted and actual outcomes in the “validation sample”, and the model with the lowest prediction error is then regarded as the best fit.
A modified and more efficient version of the cross-validation approach named K-fold cross-validation was used. The full sample is split into K sub-samples (often 5 or 10) and K − 1 sub-samples function as the “training samples”, while one sub-sample functions as the “validation sample”. The model results from the regressions of the training samples are used to predict the outcome in the validation sample and this is repeated K times, where each sub-sample functions as the “validation sample” in one of the repeats [8].
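The partitioning step of K-fold cross-validation can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the helper `k_fold_indices`, the choice K = 5, and the random seed are all hypothetical, and the sample size of 274 is used only because it matches the number of paired assessments.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (training, validation) index lists for K-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random assignment to sub-samples
    folds = [indices[i::k] for i in range(k)]  # K roughly equal sub-samples
    for i, validation in enumerate(folds):
        # The remaining K - 1 sub-samples together form the training sample.
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation


splits = list(k_fold_indices(n=274, k=5))
```

Each observation appears in the validation sample exactly once across the K repeats, which is what makes the approach more efficient than a single split.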
The model selection is further complicated by the fact that the ATRS data can be summarised in many different ways, e.g., using the summation score of all ten items (from 0 to 100), as shown in Fig. 1, as the sole predictor, using the ten items as separate continuous or categorical predictors, or using a subset of the ten items as predictors. Each of these alternative summaries of the ATRS was regarded as the independent variable(s), i.e., the predictor(s), of a separate model specification (Table 1). For each model specification, we performed the K-fold cross-validation and measured the absolute errors \(e_i = |y_i - \hat{y}_i|\), i.e., the absolute deviation between the observed outcome \(y_i\) and the predicted outcome \(\hat{y}_i\) in the “validation samples”. The lower the mean absolute error (MAE), the better the predictive accuracy of the model.
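The comparison of model specifications by cross-validated MAE can be sketched as below. Everything in the example is an assumption made for illustration: the data are synthetic, the two single-predictor specifications are hypothetical and do not correspond to the models in Table 1, and the helpers `fit_line` and `cv_mae` are not the authors' code. Absolute errors are pooled over all validation samples before averaging.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_mae(x, y, k=5, seed=0):
    """Mean absolute error over the validation samples of K-fold CV."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    abs_errors = []
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        a, b = fit_line([x[j] for j in training], [y[j] for j in training])
        abs_errors += [abs(y[j] - (a + b * x[j])) for j in validation]
    return sum(abs_errors) / len(abs_errors)


# Synthetic data (NOT study data): utility is driven by items 4-6 plus noise.
rng = random.Random(1)
items = [[rng.randint(0, 10) for _ in range(10)] for _ in range(274)]
utility = [0.2 + 0.02 * (r[3] + r[4] + r[5]) + rng.gauss(0, 0.05) for r in items]

# Two hypothetical ways of summarising the ATRS as a single predictor.
specs = {
    "total score": [sum(r) for r in items],
    "sum of items 4-6": [r[3] + r[4] + r[5] for r in items],
}
mae = {name: cv_mae(x, utility) for name, x in specs.items()}
best = min(mae, key=mae.get)  # specification with the lowest MAE
```

Because the synthetic utilities were generated from items four to six only, the subset specification attains the lower MAE here; with real data the ranking is an empirical question.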
Table 1 Different model specifications
Table 1 shows that model E performed best, with a slightly smaller mean absolute error than the most basic model A. Model E is also a relatively parsimonious model, including three items from the ATRS to explain EQ-5D scores: items four, five, and six, which a stepwise regression identified as the three most statistically significant (and most influential) ATRS items for predicting the EQ-5D score. The items describe experienced limitation due to pain in the calf/Achilles tendon/foot (item four), during activities of daily living (item five), and when walking on uneven ground (item six).