Introduction

There is widespread recognition that assessment of patient outcome following total hip and total knee arthroplasty (THA and TKA, respectively) should employ patient-reported outcome (PRO) measures. These tools allow a more patient-centred view in treatment evaluation [1–3], and advocates suggest that they provide a remarkably sophisticated evaluation of whether a treatment has worked in the (important) sense of whether or not the patient feels better, and how much better [4]. Consequently, a number of disease- and joint-specific PRO assessment instruments have been developed for use with orthopaedic conditions [5–8]. These outcome questionnaires focus mainly on the patient's function in typical activities of daily living (ADLs), pain intensity or joint stiffness. They are often employed in tandem with more generic health outcome instruments such as the SF-36, which in addition to assessing physical health incorporates questions on psychosocial aspects of general health. Some generic tools, such as the SF-12, have separate summary scores for physical and mental health. Such tools have been shown to display good divergent validity [9], in that there is very little overlap between the physical and mental component questions and thus between the summary scores. Interestingly, for disease-specific scores that do not have specific mental health components, significant correlations between psychological variables and disease-specific variables have been demonstrated [10–12]. This association is to some extent expected: poor physical outcome and pain after THA/TKA can cause psychological distress and reduce quality of life, or, alternatively, poor psychological status can result in worse physical outcome by interfering with the patient's compliance with treatment [13] and by affecting pain coping strategies [14]. Such causal dependency is probably bidirectional, with the two directions difficult to separate.
An alternative explanation for the overlap between mental and physical health parameters in these assessment tools, however, is a failure of the patient-reported outcome measure to discriminate between the overlapping constructs, that is, poor divergent validity [15, 16]. A lack of divergent validity limits the interpretability of such scales, since the resulting scores blend different constructs. Poor outcome scores can then reflect poor physical outcome, poor psychological status, or both. If one wishes to assess physical function in isolation, it is clearly desirable to use a tool that separates physical from psychological variables as well as possible.

Thresholds for correlations as indicators of divergent validity are rarely explicitly stated in the literature. However, some studies suggest that correlations below 0.30 indicate divergent validity [17, 18], whereas correlations above 0.40 are considered as indicating convergent validity [19].
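As an illustration of how these thresholds might be applied, the following minimal sketch labels a correlation coefficient. The function name, the "indeterminate" label for the band between the two cut-offs, and the use of the absolute value of r are assumptions made for this example and are not taken from the cited studies.

```python
def classify_correlation(r, divergent_below=0.30, convergent_above=0.40):
    """Label a correlation coefficient using the thresholds quoted in the text.

    Assumption: the sign of r is ignored, since a strong negative correlation
    indicates overlap between two scales just as a strong positive one does.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude < divergent_below:
        return "divergent"
    if magnitude > convergent_above:
        return "convergent"
    # Values between the two thresholds are not covered by either rule.
    return "indeterminate"


if __name__ == "__main__":
    for r in (0.20, 0.35, -0.60):
        print(r, classify_correlation(r))
```

Under these assumptions, a scale intended to measure only physical function would ideally fall into the "divergent" band when correlated with a psychological scale.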

The aim of this study was to evaluate the divergent validity of the WOMAC score and the Forgotten Joint Score, and to investigate correlations with psychological variables after joint arthroplasty.

Patients and methods

Sample

All patients who had undergone THA or TKA at our institution within the previous five years were considered for enrolment and were approached for study participation at their follow-up visits in 2008.

Inclusion criteria were: unilateral THA (cemented Stuemer-Weber hip stem, uncemented Fitmore cup, Zimmer) or unilateral TKA (cemented LCS complete, DePuy), primary arthroplasty surgery, no previous THA or TKA surgery.

Sociodemographic and clinical data including sex, age, education, type and location of implant and time since surgery were collected. Patients were sent the questionnaires and an informed consent form via mail. A reminder call was made to those patients who did not send back the questionnaires within eight weeks. If there was no response for another four weeks they were excluded. Reasons for not participating in the study were recorded.

Ethical approval for this study was obtained from the ethics committee of the canton of St Gallen, Switzerland.

Assessment instruments

Forgotten joint score-12

The Forgotten Joint Score-12 (FJS-12) is a recently published PRO measure to assess joint awareness in hips and knees during various activities of daily living [6]. It consists of 12 questions and is scored using a 5-point Likert response format with the raw scores transformed onto a 0–100 point scale. High scores indicate good outcome. The FJS has been shown to have a low ceiling effect and discriminates well between good, very good and excellent outcome after THA and TKA. It has shown high internal consistency (Cronbach’s Alpha 0.95) and discriminates well in known group comparisons [6].
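The transformation onto the 0–100 scale can be sketched as follows. The text above does not give the formula, so the details here are assumptions: each item is coded from 0 (best response) to 4 (worst response), the mean item score is rescaled to 0–100 and then reversed so that high scores indicate good outcome, and all 12 items are answered. The function name is hypothetical.

```python
def fjs12_score(items):
    """Convert 12 item responses into a 0-100 score (higher = better).

    Assumption: items are coded 0 (best) to 4 (worst) on the 5-point
    Likert format; missing responses are not handled in this sketch.
    """
    if len(items) != 12:
        raise ValueError("expected 12 item responses")
    if any(not 0 <= x <= 4 for x in items):
        raise ValueError("item responses must be between 0 and 4")
    mean_item = sum(items) / len(items)
    # Rescale the 0-4 mean to 0-100 and reverse the direction.
    return 100.0 - mean_item * 25.0


if __name__ == "__main__":
    print(fjs12_score([0] * 12))  # best possible responses
    print(fjs12_score([2] * 12))  # mid-scale responses
```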

Western Ontario and McMaster Universities Osteoarthritis Index

The Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index is a widely used outcome measure in patients with lower limb osteoarthritis (OA) [5]. It consists of 24 questions covering three dimensions: pain (five questions), stiffness (two questions) and function (17 questions). Scale scores are derived by adding up the item scores. High scores indicate poor outcome. The WOMAC OA index has been extensively tested for validity, reliability, feasibility and responsiveness for measuring changes after different OA interventions [5, 20–22].
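The summation into subscale scores can be sketched as below. The item order (pain first, then stiffness, then function) and the function name are assumptions for illustration. No item range is checked because the response format used is not stated here.

```python
# Number of items per dimension, as described in the text.
WOMAC_SUBSCALES = {"pain": 5, "stiffness": 2, "function": 17}


def womac_scores(items):
    """Sum item responses into subscale scores and a total (higher = worse).

    Assumption: `items` lists the 24 responses in the order
    pain, stiffness, function.
    """
    expected = sum(WOMAC_SUBSCALES.values())
    if len(items) != expected:
        raise ValueError(f"expected {expected} item responses")
    scores = {}
    start = 0
    for name, count in WOMAC_SUBSCALES.items():
        scores[name] = sum(items[start:start + count])
        start += count
    scores["total"] = sum(items)
    return scores


if __name__ == "__main__":
    print(womac_scores([1] * 24))
```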

Brief symptom inventory

The Brief Symptom Inventory (BSI) [23] is a psychological self-report symptom scale developed as a short-form version of the SCL-90-R [24]. It is widely used in various medical fields to assess current psychological status and distress. The 53 items are grouped in nine symptom scales (somatisation, obsessive-compulsive behaviour, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and psychoticism) and three global indices, Global Severity Index (GSI) as a global distress measure, Positive Symptom Distress Index (PSDI), and Positive Symptom Total (PST). Scale scores are derived from mean item scores. High scores indicate high psychological symptom burden.

Catastrophising scale

The catastrophising scale is part of the Coping Strategies Questionnaire developed by Rosenstiel and Keefe [25]. It comprises six items assessing catastrophising as a pain-related coping strategy characterised by a feeling of being overstrained and a pessimistic future perspective. The scale scores are derived from adding up the items. A high score indicates poor coping.

Statistical analysis

Sample characteristics are presented as percentages or as means with standard deviations and ranges. To determine associations between the administered scales (WOMAC score, FJS-12, BSI, Catastrophising scale), Pearson correlation coefficients were calculated. Two multiple linear regression models were used to investigate the impact of sociodemographic and clinical variables and of the psychological scales (BSI and Catastrophising scale), one for the WOMAC and one for the FJS-12 score. In these models, adjusted R-squared (R²) indicates the proportion of variance explained by the independent variables (predictors) in the model. Variables with a significant association (p < 0.05) with the WOMAC or the FJS-12 in univariate analysis were considered for inclusion in the multivariate regression model. In a first block of predictors, the patient characteristics sex, education, and location were included. In a second block of predictors, the psychological scales (BSI scales and the Catastrophising scale) were included using a forward selection procedure.
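For readers unfamiliar with these two statistics, the sketch below shows the standard textbook formulas for the Pearson correlation coefficient and for adjusted R². It is not the software used in the study, the function names are hypothetical, and the values in the usage example are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def adjusted_r_squared(r_squared, n, p):
    """Adjust R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Invented toy data, not study data.
    print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
    print(adjusted_r_squared(0.5, n=11, p=2))
```

The adjustment penalises R² for each additional predictor, which matters when a second block of predictors is added to a model by forward selection.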

Results

Sample characteristics

A total of 356 patients were contacted in a mail survey in August 2008. Of these, 243 (68.3%) returned the questionnaires along with written informed consent. Reasons for not participating in the study, as ascertained by phone call, were: refusal of participation (42 patients; 11.8%), wrong address (29 patients; 8.1%), death (22 patients; 6.2%), cognitive impairment (3 patients; 0.8%), moving abroad (1 patient; 0.3%) and unknown reasons (16 patients; 4.5%). Mean patient age was 70.6 years (SD 11.3) and 120 patients (49.4%) were female. THA had been performed in 157 patients (64.6%) and TKA in 86 (35.4%). For further details see Table 1.

Table 1 Descriptive statistics for clinical and socio-demographic variables (n = 243)

Correlations between FJS-12, WOMAC, BSI and the catastrophising-scale

Correlation coefficients for the relationship between WOMAC, FJS-12, BSI scales and catastrophising scale are presented in Table 2. Highest correlations for the FJS-12 were found for Catastrophising (r = −0.60), BSI-Somatisation (r = −0.49) and the BSI-GSI (r = −0.44). WOMAC-Total also showed the strongest relation to Catastrophising (r = 0.79), BSI-Somatisation (r = 0.63) and BSI-GSI (r = 0.54). For comparison, correlations between the WOMAC subscales (pain, stiffness, and function) were between r = 0.80 and r = 0.91.

Table 2 Correlations between WOMAC, FJS-12, Catastrophising and BSI

Multivariate analysis of the FJS-12 and the WOMAC score

Sex, education and location of implant (hip or knee) have previously been shown to impact on the FJS-12 and WOMAC-Total score [6]. These variables were included as predictors in two separate linear regression models, with the WOMAC total score and FJS-12 as the dependent variables. The global distress scale of the BSI (BSI-GSI) as well as BSI-Somatisation and the Catastrophising scale were included as predictors in both models.

Overall, the demographic and psychological variables explained 38% of the variance in the FJS-12 and 68% of the variance in the WOMAC score. Gender, education, and implant location (hip or knee replacement) each explained similarly small proportions of the variance in the two scores (gender: 1.8% of FJS-12 and 1.9% of WOMAC-Total; education: 1.8% of FJS-12 and 2.4% of WOMAC-Total; implant location: 2.7% of FJS-12 and 5.0% of WOMAC-Total). Larger discrepancies between WOMAC and FJS-12 were seen in the amount of variance explained by the BSI-GSI scale (17.4% of FJS-12 and 26.0% of WOMAC-Total), the Catastrophising scale (12.6% of FJS-12 and 28.3% of WOMAC-Total), and the BSI-Somatisation scale (1.6% of FJS-12 and 4.7% of WOMAC-Total) (Table 3 and Figure 1).
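The reported proportions can be checked by simple addition, as in the sketch below. The percentages are those stated in the paragraph above. The variable and function names are hypothetical.

```python
# Percentage of variance explained, as reported in the text:
# predictor -> (FJS-12, WOMAC-Total)
EXPLAINED = {
    "sex": (1.8, 1.9),
    "education": (1.8, 2.4),
    "implant location": (2.7, 5.0),
    "BSI-GSI": (17.4, 26.0),
    "Catastrophising": (12.6, 28.3),
    "BSI-Somatisation": (1.6, 4.7),
}
PSYCHOLOGICAL = ("BSI-GSI", "Catastrophising", "BSI-Somatisation")


def totals(predictors):
    """Sum the explained variance over the given predictors for both scores."""
    fjs = sum(EXPLAINED[name][0] for name in predictors)
    womac = sum(EXPLAINED[name][1] for name in predictors)
    return round(fjs, 1), round(womac, 1)


all_fjs, all_womac = totals(EXPLAINED)
psy_fjs, psy_womac = totals(PSYCHOLOGICAL)

if __name__ == "__main__":
    print("All predictors:", all_fjs, all_womac)
    print("Psychological scales only:", psy_fjs, psy_womac)
```

The sums over all predictors (37.9% and 68.3%) agree with the rounded totals of 38% and 68%. The psychological scales alone account for 31.6% of the variance in the FJS-12 and 59.0% in the WOMAC-Total score.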

Table 3 Multiple linear regression model for FJS-12 and WOMAC-Total
Figure 1 Explained and unexplained variance for the WOMAC (1a) and FJS-12 scores (1b)

Discussion

This study investigated the associations between psychological parameters and physical outcome assessed by two PRO instruments, the WOMAC score and the FJS-12. We found high correlations between disease-specific outcome measures and several of the assessed psychological domains. Multivariate regression showed that catastrophising, psychological distress and somatisation explained almost 60% of the variance in the WOMAC score beyond the known covariates of sex, implant location and education. We found the same predictor set for the FJS-12; however, psychological parameters accounted for only about half as much variance in the FJS-12 as in the WOMAC score.

Our findings indicate a substantial lack of divergent validity of the WOMAC score and, to a lesser extent, of the FJS-12. The variance proportions estimated with the help of the regression models suggest a substantial overlap between the orthopaedic and psychological scales. The lack of divergent validity becomes even more evident when the high correlations among the WOMAC subscales themselves (above 0.80) are compared with the correlations of the WOMAC total score with the psychological scores (up to 0.79).

This substantial overlap with psychological status is not reflected in the names of the WOMAC scales (pain, stiffness, function), which somewhat misleadingly suggest that only physical, joint-related characteristics are measured. The same is true for the FJS-12, which refers to joint awareness, although the term joint awareness seems more closely related to psychological aspects.

We also found that the location of joint arthroplasty (hip or knee) explained no more than 5% of the variance in either the FJS-12 or the WOMAC score. This is interesting, as it is well accepted that outcome differs between total hip and total knee arthroplasty populations [26, 27]. In contrast, the proportions explained by the psychological scales were larger by a factor of about 10 (for both FJS-12 and WOMAC). Thus, our data indicate a stronger association between psychological factors and joint-related outcomes than between outcome and the type of joint replaced.

Our findings are in line with other results from the literature. Escobar et al. [15] investigated the association between WOMAC scores and the different subscales of the SF-36. They showed that both psychosocial and physical SF-36 scales correlated with the WOMAC score in a similar way. The WOMAC function subscale demonstrated the same correlation with both SF-36 social and physical function scores. WOMAC stiffness was equally correlated with the SF-36 role-physical function score and mental health score. Similarly, Wolfe [16] highlighted that divergent validity of the WOMAC may be compromised by factors such as fatigue, symptom counts, depression, and low back pain.

The strong correlation between physical and psychological scales found here and in other studies [28–30] may partially be explained by causal interdependencies that have been suggested by several longitudinal studies.

Sharma et al. [31] demonstrated that mental health measured with the SF-36 predicted subsequent improvement in physical function in TKA, results in line with Brander et al. [32], who showed that preoperative depression substantially influences Knee Society Rating Scale function scores five years post-operatively. In contrast, Lingard et al. [33] found (in a large prospective observational study) that although psychological distress decreased post-operatively, pre-operative levels of distress were not related to post-operative improvement (change in pain and function).

Lopez-Olivo et al. [12] found a strong correlation between pre-operative psychological status and post-operative physical function at 6 months. Education, coping style and locus of control over health at baseline explained 22% of variance in WOMAC pain at follow-up. A similar predictor-set explained 19% of the WOMAC function scale and 36% of the total score of the Knee Society Rating Scale.

Our study was based on a cross-sectional design, which is reasonable for the investigation of divergent validity. However, it does not allow for causal interpretation of the associations between orthopaedic outcomes and psychological variables. A further limitation is the small number of predictors in our models, which left a large proportion of variance unexplained. Additional predictors that may be of interest for future research include patient activity level, social support, cognitive function, range of motion and joint stability.

A particular strength of this study is the use of a comprehensive and detailed assessment of psychological status (BSI and the Catastrophising Scale from the Coping Strategies Questionnaire). These scales are more differentiated and comprehensive than other tools such as the SF-36 which has previously been employed to assess psychosocial characteristics of arthroplasty populations.

Conclusion

We found a substantial overlap between physical and psychological patient-reported symptoms in an arthroplasty population, i.e. orthopaedic PRO measures were strongly associated with psychological PRO measures, indicating poor divergent validity. Whereas this may also reflect existing causal dependencies, it impairs valid measurement of orthopaedic outcome. Divergent validity is an important psychometric characteristic of PRO instruments that is required to guarantee accurate assessment of specific orthopaedic outcomes.

Problematically, the category names of the orthopaedic outcome scales suggest measurement of specific constructs such as pain, stiffness, function or joint awareness but they appear to be strongly associated with patients’ psychological status. Our findings suggest that the names of certain orthopaedic scales do not adequately reflect the constructs assessed with these scales.