Background

Primary total hip joint replacement (THR) is an effective intervention for severe osteoarthritis (OA) of the hip that relieves pain, improves physical function, and enhances health-related quality of life (HRQoL). Traditionally, outcomes of surgery for OA of the hip have been evaluated with functional scoring systems, such as the Harris Hip score [1-3].

During the last decade patient-reported outcomes, such as HRQoL, have gained importance in the assessment of outcome after surgery for OA of the hip [4-7]. The two most commonly used questionnaires to assess the outcome of hip surgery are the generic Medical Outcomes Short Form 36 Health Survey (SF-36) and the more disease-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [4-6, 8-11].

Pain is a cardinal symptom of OA of the hip and probably the most important factor in deciding whether to operate. Questionnaires developed specifically to assess pain may therefore complement the WOMAC and the SF-36 in patients with OA, and may be more sensitive to change. The Brief Pain Inventory (BPI) is a self-administered questionnaire developed to assess pain and its impact [12]. It was developed for cancer pain, but has also been used in other chronic pain conditions [13-18].

The reliability, validity and responsiveness of the BPI, or a shortened version of it, have recently been reported after drug interventions in patients with OA of unspecified joints [15, 16], but its psychometric properties have not been documented in homogeneous samples of patients with OA of the hip, or after surgical intervention for OA of the hip. If the BPI is to be used in such a setting, these properties need to be documented. The aim of the present study was to assess the psychometric properties of the BPI in patients with OA of the hip undergoing THR.

Materials and methods

Subjects and study design

The study was a prospective multicenter study in six hospitals in three Norwegian counties. We included consecutive patients >18 years old who were accepted to the waiting list for primary THR and were sufficiently proficient in Norwegian to complete questionnaires. Between June 2003 and June 2004, 356 patients were invited, and 250 (70%) agreed to participate and responded at baseline (Figure 1).

Figure 1. Flow chart.

In each participating hospital a project contact in the orthopedic unit identified patients fulfilling the inclusion criteria. All participants responded to a package of questionnaires at (i) acceptance to the waiting list for surgery (baseline), and (ii) 1 year after surgery. At baseline, we mailed the questionnaire to the subjects shortly after an ambulatory visit.

Among the 250 baseline respondents, we excluded those who had new joint replacement surgery (n = 4), were not scheduled for surgery (n = 3), had severe complications after surgery (n = 2), had another dominating disease (n = 2), had died (n = 2) or chose another hospital for surgery (n = 1). The remaining 236 baseline respondents received questionnaires 1 year after surgery, of whom 203 responded (80% of all baseline respondents). A sample size calculation was performed when the study was planned. The study was powered to detect a change in HRQoL of 0.5 SD with a power of 0.9 at a 5% significance level. In a paired analysis, this would require at least 43 patients. To allow for loss to follow-up and for comparisons of changes in subgroups, we chose to include about 250 patients.
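The figure of 43 patients can be reproduced with the usual normal-approximation formula for a paired design, n = ((z(1-α/2) + z(1-β)) / d)², where d is the detectable change in SD units. The sketch below assumes that the 0.5 SD refers to the SD of the paired differences and that the normal approximation (rather than an exact paired t-test calculation, which gives a slightly larger n) was used; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(effect_size_sd: float, power: float = 0.9, alpha: float = 0.05) -> int:
    """Pairs needed to detect a mean change of `effect_size_sd` SDs with a
    two-sided test, using the normal approximation (assumption)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 1.28 for power = 0.9
    n = ((z_alpha + z_beta) / effect_size_sd) ** 2
    return math.ceil(n)                 # round up to whole patients


print(paired_sample_size(0.5))  # 43
```

The unrounded value is about 42.03, so rounding up gives the 43 patients stated above.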

The Regional Committee for Medical Research Ethics and the Norwegian Social Science Data Services approved the study.

Questionnaires

At baseline, patients completed a questionnaire that comprised data on gender, age, marital status, cohabitation, education level, employment status, type of surgery, duration of pain in the joint, and number of years with ambulation problems. In addition, patients completed the Brief Pain Inventory (BPI) [12], the WOMAC [19, 20], and the SF-36 [21-23] questionnaires.

The brief pain inventory

The BPI is a short, self-administered questionnaire with 11 items, which was designed to evaluate the intensity of, and the impairment caused by, pain. Originally, the BPI was developed to evaluate cancer pain, but it has been shown to be a valid and reliable instrument for chronic non-cancer pain [13, 14, 17]. All BPI items are scored using rating scales. Four items measure pain intensity (pain now, average pain, worst pain, and least pain) using 0 ("no pain") to 10 ("pain as bad as you can imagine") numeric rating scales, and seven items measure the level of interference with function caused by pain (general activity, mood, walking ability, normal work, relations with other persons, sleep, and enjoyment of life) using 0 (no interference) to 10 (complete interference) rating scales.

The items are aggregated into two dimensions: (1) the Pain severity index, the sum of the four pain intensity items, and (2) the Function interference index, the sum of the seven pain interference items [18, 24, 25]. Missing values were handled as recommended by the developers of the BPI [26]. All four severity items must have been completed to compute the pain severity index. The function interference index is scored as the mean of the completed item scores multiplied by seven, provided that more than 50% (that is, at least four of seven) of the items have been completed [26]. We used the Norwegian translation of the BPI, which has documented reliability and validity [24].
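The scoring and missing-data rules described above can be expressed as two small functions. This is a sketch based only on the rules stated in this paragraph, not the developers' own scoring code [26]; the function names and the use of `None` for an unanswered item are illustrative assumptions.

```python
from typing import Optional, Sequence

Item = Optional[int]  # a 0-10 rating, or None if the item was left blank


def pain_severity_index(items: Sequence[Item]) -> Optional[int]:
    """Sum of the four intensity items (worst, least, average, now); range 0-40.
    Returns None unless all four items were answered."""
    if len(items) != 4:
        raise ValueError("expected 4 severity items")
    if any(x is None for x in items):
        return None
    return sum(items)


def function_interference_index(items: Sequence[Item]) -> Optional[float]:
    """Mean of the answered interference items multiplied by 7; range 0-70.
    Returns None unless at least 4 of the 7 items were answered."""
    if len(items) != 7:
        raise ValueError("expected 7 interference items")
    answered = [x for x in items if x is not None]
    if len(answered) < 4:
        return None
    return sum(answered) / len(answered) * 7
```

For example, interference ratings of 3, 4, blank, 5, 6, blank, 2 give five answered items with a mean of 4.0, so the index is 28.0; with only three answered items the index is left missing.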

The womac osteoarthritis index

The WOMAC is a validated and sensitive instrument that can detect clinically important changes following a variety of interventions for OA [19]. It is a three-dimensional, disease-specific, self-administered instrument [19, 20] consisting of 24 items that evaluate pain (five items), stiffness (two items), and overall level of physical function (17 items). Items are rated on a five-point scale (0 = none, 1 = mild, 2 = moderate, 3 = severe, 4 = extreme). The item scores are aggregated into three subscale scores, pain, stiffness, and physical function, each calculated as the mean of the item scores in that dimension. Finally, all subscales were rescaled to 0-10 to ease interpretation, where 10 represents maximal problems and 0 no problems [27].

For this study, patients were asked to respond to each item with respect to the hip joint that was to be replaced and with reference to the past 48 hours. We used the Norwegian Likert scale version 3.1 [28, 29].
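A subscale score on the 0-10 metric can be sketched as the mean item score (0-4) multiplied by 10/4. Two points are assumptions not stated in the text: that the recoding to 0-10 is a simple linear rescaling, and that a subscale is left missing if any of its items is missing (the WOMAC user guide has its own missing-data rules, which are not reproduced here). The function and dictionary names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

# Number of items per WOMAC subscale (Likert version, each item scored 0-4)
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "physical_function": 17}


def womac_subscale_0_10(subscale: str, items: Sequence[Optional[int]]) -> Optional[float]:
    """Mean item score (0-4) linearly rescaled to 0-10, where 10 = maximal problems.
    Assumption: returns None if any item is missing."""
    expected = WOMAC_ITEMS[subscale]
    if len(items) != expected:
        raise ValueError(f"{subscale} has {expected} items")
    if any(x is None for x in items):
        return None
    return mean(items) * 10 / 4
```

A patient rating the five pain items 2, 3, 2, 1 and 2 has a mean of 2.0 and therefore a pain subscale score of 5.0.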

Medical outcomes study short form (SF-36)

The SF-36 questionnaire consists of 36 items that evaluate eight conceptual domains of HRQoL: general health (GH), physical functioning (PF), mental health (MH), role limitations - physical (RP), role limitations - emotional (RE), vitality (VT), bodily pain (BP), and social functioning (SF) [22]. The SF-36 is a widely used measure of HRQoL with documented validity and reliability in various languages and populations [22, 23]. This instrument has previously been used in patients with OA [4-6, 8-10, 30-33]. The Norwegian translation of the standard SF-36 version 1.1 was used and scored on a 0-100 scale, with higher scores indicating a better HRQoL [34].
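The 0-100 scale scores come from the standard linear transformation in the SF-36 scoring manual: (raw score - lowest possible raw score) / (possible raw score range) × 100. The sketch applies it only to the physical functioning (PF) scale, whose ten items are coded 1-3 with higher values meaning less limitation, so no item recoding is needed. Other scales require item recoding and missing-data rules that are not shown; the function names are hypothetical.

```python
from typing import Sequence


def sf36_transform(raw_score: float, lowest_possible: float, possible_range: float) -> float:
    """Linear 0-100 transformation of a raw SF-36 scale score."""
    return (raw_score - lowest_possible) / possible_range * 100


def physical_functioning(items: Sequence[int]) -> float:
    """PF scale: 10 items coded 1-3, so the raw sum ranges from 10 to 30."""
    if len(items) != 10 or any(x not in (1, 2, 3) for x in items):
        raise ValueError("expected 10 PF items coded 1-3")
    return sf36_transform(sum(items), lowest_possible=10, possible_range=20)
```

A raw PF sum of 20 (for instance, "limited a little" on every item) lies halfway through the possible range and therefore scores 50.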

Statistical analysis

Descriptive statistics are presented as mean (SD) or numbers (percentages). Groups were compared using the chi-square test, independent samples t-test, or Mann-Whitney U test, as appropriate. Internal consistency reliability for the BPI, WOMAC and SF-36 scales at baseline was assessed using Cronbach's coefficient alpha [35]. A commonly accepted requirement for internal consistency reliability is an alpha of at least 0.70 [36]. A floor effect occurs when a high proportion of the respondents score at the minimum of the instrument, and a ceiling effect when a high proportion score at the maximum. Because the patients' perceived pain and HRQoL scores were expected to differ greatly before and after surgery, we estimated the floor and ceiling effects for the BPI, WOMAC and SF-36 at baseline and 1 year after surgery. Such effects restrict the change in scores that can be observed in one direction, and hence limit an instrument's ability to capture change.
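Both statistics in this paragraph are simple to compute. The sketch below uses the standard formula alpha = k/(k - 1) × (1 - Σ item variances / variance of the total score) for complete data, and reports floor and ceiling effects as the percentage of respondents at the scale minimum and maximum; these percentages could then be compared with the 15% criterion cited in the Discussion [40]. Function names and the data layout (one row per respondent, one column per item) are illustrative assumptions.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data; rows = respondents, columns = items."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def floor_ceiling(scores: Sequence[float], minimum: float, maximum: float) -> Tuple[float, float]:
    """Percentage of respondents at the scale minimum (floor) and maximum (ceiling)."""
    n = len(scores)
    floor = 100 * sum(s == minimum for s in scores) / n
    ceiling = 100 * sum(s == maximum for s in scores) / n
    return floor, ceiling
```

Note that for the BPI and WOMAC, where higher scores mean more problems, the "floor" corresponds to the best possible state (no pain), whereas for the SF-36 it corresponds to the worst.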

Construct validity of the BPI was assessed by Pearson's product-moment correlation coefficient between the BPI scale scores and scores on the WOMAC and SF-36 scales, using the baseline data in this study. Before this analysis, based on assessment of the content of the items on the scales, we hypothesized that (1) the Bodily pain (BP) scale of the SF-36 and the Pain subscale of the WOMAC would represent constructs similar to the Pain severity index of the BPI, and (2) the SF-36 BP, PF and RP scales, and all three WOMAC subscales, would correspond with the Function interference index of the BPI. Higher correlations (r > 0.4) between scales assessing similar constructs than between scales assessing dissimilar constructs would support the construct validity of the BPI.

Responsiveness was evaluated longitudinally in patients undergoing THR, by investigating whether the instruments were sensitive to change following the intervention. Responsiveness was further investigated in subgroups of respondents, stratified according to their response to an item on change in overall health during the past year. We used item two of the SF-36 questionnaire as the rating of overall change: "Compared to 1 year ago, how would you rate your health in general now? (1 = much better, 2 = somewhat better, 3 = about the same, 4 = somewhat worse, 5 = much worse)". The respondents were categorized as having better (1 to 2), unchanged (3) or worse (4 to 5) health status [36, 37]. In the assessment of correlations between indices and of responsiveness, we standardized the analysis by including only respondents with valid scores on all scales of the three questionnaires (n = 161).
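The categorization of SF-36 item two can be written as a small mapping. In the Results, the unchanged and worse categories are combined into a single group; the function name is hypothetical.

```python
from typing import Optional


def health_change_group(item2: Optional[int]) -> Optional[str]:
    """Map SF-36 item 2 (1 = much better ... 5 = much worse) to a change group."""
    if item2 in (1, 2):
        return "better"
    if item2 == 3:
        return "unchanged"
    if item2 in (4, 5):
        return "worse"
    return None  # missing or out-of-range response
```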

To assess the magnitude of the responsiveness, we calculated the effect size (ES), standardized response mean (SRM), and Responsiveness Index (RI) [36-38]: ES = (mean 1 year after - mean at baseline)/SD at baseline; SRM = (mean 1 year after - mean at baseline)/SD of the change scores; and RI = (mean 1 year after - mean at baseline)/SD of the change scores in patients with unchanged health status.
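The three indices share a numerator (mean change) and differ only in the SD used as denominator. The sketch follows the formulas literally, with paired, complete baseline and follow-up scores; as a consequence, an improvement on the BPI or WOMAC (a fall in score) gives negative values, whereas the indices in this study appear to be reported with improvement as positive. Function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """ES: mean change divided by the SD of the baseline scores."""
    return (mean(followup) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """SRM: mean change divided by the SD of the individual change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


def responsiveness_index(baseline: Sequence[float], followup: Sequence[float],
                         stable_changes: Sequence[float]) -> float:
    """RI: mean change divided by the SD of change scores among patients
    reporting unchanged health."""
    return (mean(followup) - mean(baseline)) / stdev(stable_changes)
```

With baseline scores 10, 12, 14 and follow-up scores 4, 5, 9, the mean change is -6; ES is -6/2 = -3.0 and SRM is -6/1 = -6.0, showing how strongly the SRM depends on the consistency of individual changes.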

Data were analyzed using SPSS for Windows version 13.0 (SPSS Inc., Chicago, Ill.). We chose a 5% significance level using two-sided tests.

Results

In total, 203 patients completed the study 1 year after THR, of whom 143 (70%) were female. The patients completing the study had a mean age of 69 years (SD 10); 67% were married or cohabiting, and 61% were retired. Respondents and non-respondents did not differ in baseline characteristics, although the non-respondents tended to be slightly older and more often retired (Table 1).

Table 1 Patient characteristics at baseline for respondents included in the analysis 1 year after hip joint replacement surgery and those excluded after baseline response, mean (SD) unless otherwise stated.

At baseline, internal consistency reliability, as assessed with Cronbach's α, was >0.80 for the BPI pain severity and function interference indices and for the WOMAC and SF-36 subscales, except for the pain and stiffness subscales of the WOMAC and the BP and GH scales of the SF-36 (0.79, 0.70, 0.68 and 0.69, respectively) (Table 2).

Table 2 Psychometric properties at baseline and 1 year after primary hip joint replacement

None of the BPI indices had marked floor or ceiling effects at baseline. One year after THR, however, both the BPI pain severity index and the function interference index showed a floor effect, but neither showed a ceiling effect. On the WOMAC subscales, floor and ceiling effects were minor at baseline. One year after THR, the floor effect was larger on all three WOMAC subscales, most markedly on the pain and stiffness subscales. At baseline, floor and ceiling effects on the SF-36 were most marked on the RP and RE scales, and 1 year after THR there were marked ceiling effects on the SF and RE scales (Table 2).

In total, 161 of the 203 patients had valid scores on all dimensions of the three questionnaires at baseline and 1 year after THR. The correlations between the two BPI indices and the scales of the WOMAC and SF-36 partially supported our hypotheses (Table 3). Correlations of the BPI pain severity index with the WOMAC pain subscale and the SF-36 BP scale were high, in line with hypothesis (1). In addition, the WOMAC physical function subscale was highly correlated with the BPI pain severity index (r = 0.57). Further, the BPI function interference index was highly correlated with all WOMAC subscales except stiffness, in accordance with hypothesis (2). The BPI function interference index correlated moderately to highly with the BP and PF scales, as hypothesized, but weakly with the RP scale. Its correlations with the VT, SF and MH scales were also moderate to high. Correlations between scales hypothesized not to correspond were lower (Table 3).

Table 3 Pearson's product-moment correlation coefficient between the Brief Pain Inventory pain (BPI) severity index and function interference index with subscales of the WOMAC and the SF-36 questionnaires for patients with OA of the hip at baseline (N = 161)

For the BPI, the responsiveness indices (ES, SRM and RI) for change from baseline to 1 year after THR in the total sample were large, with minimum values of 1.57 for the pain severity index and 1.52 for the function interference index (Table 4). The responsiveness indices for the three WOMAC subscales were also large, with minimum values ranging from 1.69 to 2.84.

Table 4 Responsiveness indices (Effect Size (ES), Standardized Response Mean (SRM) and Responsiveness Index (RI)), for change from baseline to 1 year after primary hip joint replacement surgery (N = 161).

On the eight SF-36 scales, the responsiveness indices varied more. For the PF, RP and BP scales, all responsiveness indices were above 0.94; for the VT and SF scales they ranged from 0.77 to 0.85, except for an RI of 1.10 for VT. The remaining SF-36 scales, GH, RE and MH, were less responsive, with responsiveness indices from 0.24 to 0.52.

In the analysis stratified according to the rating of overall health change, 133 patients reported better and 28 reported unchanged or worsened overall health compared with 1 year before. Those who reported improved overall health had large values (>0.80) on all responsiveness indices for the BPI pain severity and function interference indices, all WOMAC subscales, and all SF-36 scales except GH, RE and MH. All responsiveness indices in this group were larger than in the group with unchanged or worsened overall health (Table 5).

Table 5 Responsiveness indices (Effect Size (ES), Standardized Response Mean (SRM) and Responsiveness Index (RI)), for change from baseline to 1 year after hip joint replacement surgery, according to perceived global change.

For those with unchanged or worsened overall health the responsiveness indices indicated an improvement in pain and HRQoL, with large responses on the BPI pain severity and function interference indices, the three WOMAC subscales, and for some of the SF-36 scales most related to physical health (PF, RP and BP). For the other SF-36 scales the effects were moderate (VT and SF), small (RE) or unchanged (MH). For the GH scale, the responsiveness indices changed in the opposite direction, suggesting a slight deterioration.

Discussion

In the present study we have documented the psychometric properties of the BPI in patients with OA of the hip, using standard methodology for assessing internal consistency reliability, validity and responsiveness. The BPI showed satisfactory internal consistency reliability, with Cronbach's alpha >0.80 for the assessment of both pain and the impact of pain [36]. The pattern of observed correlations between the BPI indices and the WOMAC and SF-36 scales measuring similar constructs was generally in line with expectations, supporting the construct validity of the BPI in this setting. Further, the BPI was responsive and detected change in pain and the impact of pain from before to 1 year after THR in a homogeneous sample of patients with OA. The responsiveness of the BPI pain severity and function interference indices was at the level of the three WOMAC subscales and the PF, RP and BP scales of the SF-36, and in line with previous studies using the WOMAC [11, 39].

The internal consistency reliability for the BPI pain severity and function interference indices was in line with previous reports in patients with chronic non-malignant pain [14], patients with OA [16], and patients undergoing cardiac surgery [18]. The high Cronbach's alpha in both dimensions of the BPI suggests that both indices are sufficiently unidimensional to permit scoring of the items as two composites.

The pattern of associations between the two BPI indices and the corresponding and non-corresponding scales of the WOMAC and SF-36 largely supported the construct validity of the BPI. Our hypotheses were fairly crude, being based on judgment of the content of items and scales, and could not be expected to capture all associations exactly. At the same time, the correlations suggest some overlap between the instruments: the BPI indices to some extent duplicate the BP scale of the SF-36 and the WOMAC pain subscale.

A previous study proposed 15% as a critical value for floor and ceiling effects [40]. In the present study, the BPI indices showed almost no floor or ceiling effects at baseline. One year after THR, the two BPI indices showed floor effects of 21% and 24%, respectively, and no ceiling effects. These results can be explained by the natural course in patients with OA who undergo THR: the lowest possible score, zero, corresponds to a patient reporting "no pain" on the pain severity index and "does not interfere" on the function interference index. For the WOMAC, the floor effect after THR was very large for the pain and stiffness subscales, which have fewer items than the physical function subscale. For the SF-36, the RP and RE scales had excessive floor and ceiling effects at baseline and 1 year after THR, and the BP and SF scales also showed excessive ceiling effects after THR, as in a previous study [11]. The large floor and ceiling effects may be related to the low number of possible values on these scales, as the RE, RP, BP and SF scales of the SF-36 either have few items or score each item on a binary scale.

The responsiveness of a measure is commonly appraised by comparing an observed change with an independent criterion, such as the patient-perceived transition in health [37]. In the present study, we used an item from the SF-36 that is not included in the scoring of the SF-36 scales, and categorized the respondents as having better or unchanged/worse health status based on their responses on a five-point Likert scale. The analysis stratified by rating of overall health change showed that the responsiveness indices (ES, SRM and RI) were large for the WOMAC subscales and the BPI pain severity and function interference indices, and largest on both questionnaires among patients who reported improved health status. A previous study of the responsiveness of the WOMAC and SF-36 after THR also reported good responsiveness of the WOMAC subscales and the physical domains of the SF-36 [11].

The results in the unchanged/worsened group were contrary to what would be expected: they suggested some improvement, although less than in the improved group. This may be due to the crudeness of the rating of overall health change, in an intervention that was very effective for the majority of patients. Alternatively, it may reflect recall bias, or other changes in health unrelated to OA or THR in this sample of elderly people with considerable comorbidity. Hence, the disease-specific questionnaires may capture improvements even when patients perceive their overall health as unchanged or worse.

The responsiveness of the BPI has previously not been reported among patients with OA undergoing THR. However, responsiveness of the BPI has been assessed in two previous studies: in patients undergoing cardiac surgery and patients with OA in a clinical trial of controlled-release oxycodone [16, 18]. Both studies reported moderate to large responsiveness indices, supporting the responsiveness of the BPI.

Compared with the WOMAC and SF-36, the BPI is short and easy to complete and score. In the present study there was little difference in the proportion of missing change scores between the scales of the different instruments. Previous studies have shown that the BPI is a feasible instrument among patients with pain. Pain is a cardinal symptom in patients with OA of the hip and an important indication for THR. Therefore, a systematic evaluation of self-reported pain and its impact using the BPI could complement assessment by the physician [41]. Further, the BPI can quantify changes in pain in a meaningful way and enable comparison of results between studies [41]. Because of its brevity, the BPI may also be valuable in a daily diary context.

Some limitations of our study should be mentioned. We assessed construct validity by investigating the pattern of associations between the different scales. Other forms of validity, such as known-groups validity or criterion validity, could have been assessed, but we did not consider any of the available variables suitable for such analyses. We also did not assess the factor structure of the BPI, which could have been done with confirmatory factor analysis. However, a two-factor structure of the BPI has been reported in several previous studies [13, 18, 25, 42], and we believe such an analysis would have contributed little to this paper. Responsiveness was assessed using an item from the SF-36 on change in general health as a marker of overall health change; this was the best available marker of overall health outcome. We considered using a responsiveness index that relates change to an instrument's minimally important change, but because this has not been reported for the BPI, we were unable to present it. Further, we did not ask patients separate questions about changes in their physical health, mental health, pain or other components or symptoms, and therefore cannot determine which components of health status were most important for the result. Finally, the unchanged/worsened group was small, which reduced the statistical power of the subgroup analyses.

The BPI showed acceptable internal consistency reliability, construct validity and responsiveness in assessment of pain and impact of pain among patients with OA undergoing THR. We conclude that the BPI is a short instrument that can be used as an alternative or complement to more established instruments in this patient group, though the instrument offers few advantages over, and duplicates scales of, more comprehensive instruments, such as the WOMAC and SF-36.