Introduction

Neck pain is a common and burdensome condition costing millions of dollars annually in compensation and treatment [1–5]. Despite an extensive scientific literature, the optimal treatment for neck pain remains uncertain [6]. One approach to evaluating treatments when there are associated risks or uncertainties is decision-analytic modelling [7–9]. A decision-analytic model quantifies trade-offs inherent in alternative treatment strategies, often including treatment benefits, adverse events, inconveniences, and costs.

Preference-based measures of health-related quality-of-life (HRQoL) (or utilities) are important inputs into clinical decision analyses. In health-related applications, these measures quantify individuals’ values about the desirability of health states along a scale, usually anchored at 0, corresponding to death, and 1, corresponding to good health [7, 10]. Resulting scores are used as weights to calculate quality-adjusted life years in economic evaluations and are also useful to compare the relative impact of disease states [10, 11].

The only source of directly elicited preference scores for neck pain is the Beaver Dam Health Outcomes Study [12–16]. However, certain features of this study limit the validity of its preference scores for decision or cost-effectiveness analyses of neck pain treatments. First, the characteristics of the respondents with neck pain, including type of neck pain, were not described. Second, a notable portion of respondents were not likely to be experiencing neck pain when surveyed, since scores were elicited from persons affected by neck pain in the past year, and spine pain (whether cervical or lumbar) has a relapsing and remitting course [2, 17–20]. Third, respondents included only persons affected by severe neck pain, whereas most persons seeking treatment for neck pain have mild to moderate neck pain [2, 21].

Because direct measurement of health state preference scores can be cognitively challenging and time-consuming, there has been considerable attention to methods for deriving preference scores from conventional psychometric quality-of-life instruments [22]. Generic health status instruments have been used to measure quality-of-life in patients with musculoskeletal conditions; however, these instruments may not adequately detect dysfunction associated with neck pain. For example, a recent study determined that the SF-36 detected less disability and demonstrated less responsiveness than an upper extremity instrument in patients with upper extremity disorders, despite the fact that both instruments had comparable standardized response means [23]. Neck-specific quality-of-life instruments may be more suitable for deriving preference scores in persons with neck pain, and an important step towards this goal is to explore the relationship between these two measures of HRQoL.

The primary objective of this study was to directly elicit neck pain patients’ preference scores for their current health for use in a decision-analytic model and future cost-effectiveness analyses. The secondary objective was to investigate the association between such preference scores and patients’ level of disability related to neck pain, using a psychometrically validated disease-specific health status instrument (the Neck Disability Index), to explore the feasibility of developing a mapping algorithm between the two methods.

Methods

Patient sample

Patients were recruited from outpatient multidisciplinary rehabilitation and chronic pain clinics in two geographic regions. In California, these were consecutive persons presenting for neck pain lasting at least 3 months (‘chronic neck pain’ sample). In Ontario, these were consecutive persons presenting for neck pain following a motor vehicle accident (MVA) (‘neck pain after a MVA’ sample). Patients were excluded if their neck pain was caused by an identifiable aetiology (e.g., fracture, infection, tumour, inflammatory arthritides, or myelopathy), if they were not fluent in English, or if they were aged <18 years.

Interviews

Semi-scripted, face-to-face interviews were conducted in private rooms by trained interviewers. Selected interviews were audio-taped and reviewed for quality, particularly initial interviews conducted by each interviewer. Patients performed several tasks. First, they completed a questionnaire about their current health that was used to construct their individualized current health state description card (Appendix A). Second, they rank-ordered their current health and three short-term (4-week) and three long-term (lifetime) hypothetical health states that were pertinent to a decision analysis of neck pain treatments [24]. Third, they rated their current and the hypothetical health states using the rating scale (RS) method. Finally, their utilities for these health states were elicited using the standard gamble (SG) method. The patients’ current health was framed as a short-term health state because spine pain tends to have a remitting, relapsing course [2, 17–20].

Health state description cards

Interviewers constructed individualized health state description cards for each patient’s current health using pre-fabricated printed materials. Each patient’s current health state description card was based on her/his responses to the five items contained in the questionnaire completed at the start of the interview (Appendix A). Each item corresponded to a body function, or a domain of activity and participation: (1) neck pain intensity; (2) personal care; (3) work (or school) and household chores; (4) leisure, social and family activities; and (5) emotions related to health [25, 26]. The items had five response options that described levels of impairment or functioning. This standardized format was also used to describe the hypothetical health states, such that patients could compare their current and the hypothetical health states across uniform attributes and levels of attributes. This reduced the cognitive burden of comparing health states that differed by aetiology (i.e., musculoskeletal, cardiovascular, cerebrovascular, gastro-intestinal).

Rank ordering of health states

Patients vertically ranked the health state descriptions in order of desirability. Short-term (including patients’ current health) and long-term health states were rank-ordered separately. Obvious inconsistencies (such as ranking a clearly superior health state as worse than a clearly inferior state) were flagged, and patients were invited to reconsider their responses.

Rating scale preference scores

Rating scale preference scores were obtained by using a 100-point vertical visual analogue scale called a ‘feeling thermometer’ [27, 28]. Scores for short-term (including patients’ current health) and long-term health states were obtained separately. The lowest value on the feeling thermometer was 0 (anchored by the least desirable health state) and highest value was 100 (anchored by good health). Patients placed each health state card next to the number that best represented their preference for that health state, relative to the anchor states. This number was then divided by 100 to obtain a preference score. In those patients who considered a health state to be worse than death, death and the remaining health states were assessed along a scale anchored by the health state considered worse than death and good health. These scores were then linearly transformed [29] onto the conventional death–good health (0.0–1.0) preference scale using this equation:

$$ {\text{Transformed\_Score}}_{{{\text{health\_state}}}} = {\frac{{ ( {\text{Score}}_{{{\text{health\_state}}}} - {\text{Score}}_{\text{death}} )}}{{(1 - {\text{Score}}_{\text{death}} )}}} $$
(1)

The transformed utilities could then be pooled with the health state utilities of patients who considered death to be the worst health state.
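
The linear transformation in Eq. 1 can be illustrated with a short sketch. The function name and the example scores below are hypothetical and are not taken from the study data; the sketch only assumes that all scores have already been divided by 100.

```python
def rescale_to_death_anchor(score: float, score_death: float) -> float:
    """Apply Eq. 1: map a score measured on the scale anchored by the
    worse-than-death state (0.0) and good health (1.0) onto the
    conventional death (0.0) to good health (1.0) scale."""
    if not 0.0 <= score_death < 1.0:
        raise ValueError("score_death must lie in [0, 1)")
    return (score - score_death) / (1.0 - score_death)


# Hypothetical patient: death placed at 0.20, current health at 0.60,
# and the worse-than-death anchor state at 0.00 on the thermometer.
print(rescale_to_death_anchor(0.60, 0.20))  # current health, about 0.5
print(rescale_to_death_anchor(0.20, 0.20))  # death maps to 0.0
print(rescale_to_death_anchor(0.00, 0.20))  # anchor state becomes negative
```

Good health stays at 1.0 and death lands at 0.0, so any state the patient rated below death receives a negative score.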

Standard gamble preference scores

We elicited SG utility scores for the long-term (lifetime) hypothetical health states preferred to death using the conventional SG procedure. Patients were asked to consider a choice between a certainty of living in the health state under consideration for the rest of their life or a hypothetical treatment. The treatment offered a chance (p) of good health (utility score = 1.0) and a risk (1 − p) of immediate death (utility score = 0.0). We used a probability wheel as a visual aid to help patients understand risks inherent in the SG [27, 28]. The probability of the outcomes associated with the hypothetical treatment was varied using a converging ping-pong strategy until the patient was indifferent between the certainty and treatment [28]. According to decision theory, the utility of the long-term health state $h_{\text{LT}}$ was equivalent to the probability of good health at this indifference point:

$$ h_{\text{LT}} = (p)(1.0) + (1 - p)(0.0) = p $$
(2)
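
As a rough illustration of how a converging ping-pong search can locate the indifference point in Eq. 2, the sketch below simulates a respondent with a fixed latent utility. The step size, the alternation rule, and the deterministic respondent are all assumptions made for illustration; they are not the interview script used in the study.

```python
def ping_pong_indifference(latent_utility_pct: int, step: int = 5) -> float:
    """Alternate high and low probabilities of good health (in percent)
    until the offers bracket the respondent's indifference point.

    The simulated respondent (hypothetical) prefers the gamble whenever
    the offered probability exceeds latent_utility_pct.
    """
    low, high = 0, 100        # gamble rejected at `low`, accepted at `high`
    offer_high = True         # start from the favourable end
    while high - low > step:
        p = high - step if offer_high else low + step
        prefers_gamble = p > latent_utility_pct
        if prefers_gamble:
            high = p          # indifference point lies below this offer
        else:
            low = p           # indifference point lies at or above this offer
        offer_high = not offer_high
    # Report the midpoint of the final bracket as the utility (Eq. 2).
    return (low + high) / 2 / 100


print(ping_pong_indifference(78))  # offers 95, 5, 90, 10, ... converge near 0.78
```

Working in integer percentages avoids floating-point drift when stepping through the offers.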

We used a slightly different procedure to elicit utilities from patients who considered a long-term health state worse than death. The hypothetical treatment offered a chance (p) of good health and a risk (1 − p) of ending in the health state considered worse than death (rather than the risk of immediate death, as in the conventional SG described above) [7]. Under this arrangement, utilities for the long-term health states and death were assessed along a health state worse than death–good health scale. We linearly transformed these utilities onto the conventional death–good health (0.0–1.0) scale using Eq. 1 in order to pool with the long-term health state utilities of patients who considered death to be the worst health state.

We used the chained SG to elicit utilities for patients’ current health (defined as a short-term health state of 4 weeks’ duration) and the short-term hypothetical health states (Fig. 1) [30–32]. In the first step, a short-term anchor health state $h_{A}$ (considered worse than all other short-term health states) replaced immediate death (Fig. 1a, c). We described health state $h_{A}$ as ‘excruciating pain that is poorly controlled by medication while being completely dependent on others for all activities’ (Appendix B). The utility of a short-term health state $h_{\text{ST}}$ (including patients’ current health) was then:

$$ h_{\text{ST}} = (p)(1.0) + (1 - p)(h_{A} ) $$
(3)

Fig. 1 Stages of the chained standard gamble procedure

In the second step of the chained gamble, the short-term health state $h_{\text{ST}}$ was related to the death–good health scale by evaluating the utility of the anchor health state $h_{A}$ using the conventional SG procedure. This provided the value for $h_{A}$ which was then used to solve for $h_{\text{ST}}$ in Eq. 3 (Fig. 1b). In patients who considered one of the long-term health states to be worse than death, the anchor health state $h_{A}$ was evaluated in relation to the long-term health state considered to be worse than death and good health (Fig. 1d) [32, 33]. The utility of death was also assessed in relation to the health state worse than death and good health (Fig. 1e) in order to linearly transform these latter patients’ utilities onto the death–good health scale, using Eq. 1.
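
The two steps of the chained gamble can be traced numerically. The sketch below combines Eq. 3 with the Eq. 1 transformation; the function names and all indifference probabilities are hypothetical values chosen for illustration.

```python
def rescale_to_death_anchor(score: float, score_death: float) -> float:
    """Eq. 1: move a score from the worse-than-death to good health
    scale onto the death (0.0) to good health (1.0) scale."""
    return (score - score_death) / (1.0 - score_death)


def chained_sg_utility(p_short: float, h_anchor: float) -> float:
    """Eq. 3: utility of a short-term state, given the indifference
    probability from step 1 and the anchor state's utility from step 2."""
    return p_short * 1.0 + (1.0 - p_short) * h_anchor


# Patient A (hypothetical): anchor state preferred to death.
# Step 1 gives p = 0.80; step 2 (conventional SG) gives h_A = 0.50.
print(chained_sg_utility(0.80, 0.50))  # about 0.90

# Patient B (hypothetical): a long-term state is worse than death, so the
# anchor state and death are both scored against that state (Fig. 1d, e).
h_anchor_raw, death_raw = 0.10, 0.40
h_anchor = rescale_to_death_anchor(h_anchor_raw, death_raw)  # about -0.50
print(chained_sg_utility(0.10, h_anchor))  # about -0.35, i.e. below death
```

Because Eq. 1 and Eq. 3 are both linear and good health maps to 1.0 under Eq. 1, transforming the anchor utility before chaining gives the same result as chaining first and transforming afterwards.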

Socio-demographic and clinical variables

Neck pain intensity was measured on a 10-centimetre visual analogue scale anchored at ‘No Pain’ and ‘Pain as Bad as it Could Be’. Disability related to neck pain was measured using the Neck Disability Index, a validated 10-item instrument where higher scores indicate greater disability [34–36]. The Von Korff Pain Grade was used to classify neck pain and associated disability into grades of severity [37–39]. Patients also completed a co-morbidity questionnaire [40, 41] and the 36-item Medical Outcomes Study Short Form (SF-36v2) [42, 43]. SF-6D (indirectly measured) preference scores were estimated from SF-36 data [22]. Depressive symptoms were assessed with the Center for Epidemiologic Studies-Depression (CES-D) scale [44–47], where scores ≥16 suggest clinically significant depressive symptoms.

Data preparation

We deleted observations based on three pre-defined rules: (1) the interview was rated as ‘Poor’ or ‘Very Poor’ [29], which occurred whenever interviewers judged that a patient did not understand the tasks or provide thoughtful responses; (2) death was assigned a score of 1.0; or (3) preference scores were invariant at 1.0 (i.e., patients assigned a preference score of 1.0 to all the health states) [48].
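
The three deletion rules can be expressed as a simple filter. The record layout and field names below are hypothetical (the study's actual data structure is not described); the sketch returns every rule that applies, since a single observation may meet more than one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interview:
    patient_id: str
    quality_rating: str             # interviewer's rating, e.g. "Good" or "Poor"
    death_score: float              # preference score assigned to death
    state_scores: Dict[str, float]  # preference score per health state


def exclusion_reasons(interview: Interview) -> List[str]:
    """Return the pre-defined deletion rules that this interview meets."""
    reasons = []
    if interview.quality_rating in {"Poor", "Very Poor"}:
        reasons.append("rule 1: interview rated Poor or Very Poor")
    if interview.death_score == 1.0:
        reasons.append("rule 2: death assigned a score of 1.0")
    scores = interview.state_scores.values()
    if scores and all(s == 1.0 for s in scores):
        reasons.append("rule 3: scores invariant at 1.0")
    return reasons


# Hypothetical records for illustration only.
sample = [
    Interview("A01", "Good", 0.0, {"current": 0.7, "stroke": 0.2}),
    Interview("A02", "Poor", 0.0, {"current": 1.0, "stroke": 1.0}),
]
retained = [iv for iv in sample if not exclusion_reasons(iv)]
print([iv.patient_id for iv in retained])  # only A01 is retained
```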

Analyses

Sample size estimate

Based on requirements for estimating mean preference scores and fitting regression models with ≤10 factors [49], and predicting a loss of 10% of total observations (due to incomplete data or deletions for the above pre-defined rules), we sought to enrol 110 patients in each sample.

Preference scores

Preference scores were pooled across samples and stratified by Von Korff Pain Grades. Descriptive statistics summarized the scores within grades. An extension of Cuzick’s non-parametric method was used to test for trend in preference scores across grades [50, 51].
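
For readers unfamiliar with the trend test, the sketch below implements the basic form of Cuzick's statistic using average ranks. It is not the extension used in the study [50, 51] and it omits any tie correction to the variance; the function names and the example scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def cuzick_trend(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Basic Cuzick trend test with group scores 1, 2, ..., k.

    Returns (z, two-sided p-value from the normal approximation).
    """
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    t_stat = weighted_n = weighted_n_sq = 0.0
    start = 0
    for score, g in enumerate(groups, start=1):
        rank_sum = sum(ranks[start:start + len(g)])
        t_stat += score * rank_sum
        weighted_n += score * len(g)
        weighted_n_sq += score ** 2 * len(g)
        start += len(g)
    expected = 0.5 * (n_total + 1) * weighted_n
    variance = (n_total + 1) / 12 * (n_total * weighted_n_sq - weighted_n ** 2)
    z = (t_stat - expected) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical preference scores for three increasing pain grades.
z, p = cuzick_trend([[0.90, 0.80, 0.85], [0.70, 0.60, 0.65], [0.40, 0.50, 0.30]])
print(round(z, 3), round(p, 4))
```

A negative z indicates that scores fall as the grade rises, which is the direction expected for preference scores across grades of neck pain and disability.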

Explanatory regression models

We used multivariable ordinary least squares (OLS) regression models to quantify the association between patients’ directly measured preference scores for their health and their Neck Disability Index scores. Analyses were conducted on the chronic neck pain and neck pain following a MVA samples independently, to determine whether the association differed by type of neck pain. Demographic factors (age, gender) and clinical factors (neck pain duration, depressive symptoms, co-morbidity) were included in the models based on their reported associations with neck pain [4, 5, 17–19, 21].
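
A crude (single-predictor) version of such a model can be written in closed form. The sketch below is illustrative only: the data are hypothetical, the function name is an assumption, and the adjusted models with demographic and clinical covariates would require a multivariable fit that is not shown here.

```python
from statistics import mean
from typing import Dict, Sequence


def crude_ols(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Fit y = intercept + slope * x by ordinary least squares and
    report R-squared and adjusted R-squared for one predictor."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor
    return {"intercept": intercept, "slope": slope, "r2": r2, "adj_r2": adj_r2}


# Hypothetical Neck Disability Index scores and rating scale scores.
ndi = [10, 20, 30, 40, 50]
rs = [0.85, 0.70, 0.62, 0.45, 0.30]
fit = crude_ols(ndi, rs)
print({name: round(value, 4) for name, value in fit.items()})
```

A negative slope corresponds to lower preference scores at higher levels of disability.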

We ran diagnostic tests on the models using conventional criteria and tested statistical assumptions for linear regression models [49, 52]. The homoscedasticity assumption was violated in the SG score models [53]. Several approaches for dealing with heteroscedasticity were applied, but none corrected the problem. We therefore fit a Tobit model as a sensitivity analysis to test the robustness of the SG score OLS model parameter estimates, since the Tobit model is used as an approach to deal with heteroscedasticity [53, 54]. All analyses were performed using SAS software (version 9.1, 2002–2003; SAS Institute, Inc., USA).

Results

Patient samples

We recruited patients from April 23, 2004 to December 21, 2005 and enrolled 104 patients with chronic neck pain and 116 patients with neck pain following a MVA (Fig. 2). All patients were interviewed within 2 weeks of their initial assessment for a course of neck pain treatment. Mean interview duration was 77 min (Standard Deviation [SD] = 17.55). Patients with neck pain after a MVA tended to have lower levels of education, higher levels of neck pain intensity, and more clinically significant depressive symptoms than patients with chronic neck pain (Table 1).

Fig. 2 Participant flow and number included in analyses. Note: in some cases, the sum of the individual n for ‘Reasons for exclusion from analysis’ may exceed the total n excluded from analyses because selected observations were excluded for more than one reason

Table 1 Frequencies and means for socio-demographic and clinical variables

Data preparation

We excluded two chronic neck pain patients from the RS score analyses, and four chronic neck pain patients and eight patients with neck pain following a MVA from the SG score analyses (Fig. 2). In patients with neck pain after a MVA, 32 of 116 patients included in the RS analyses and 26 of 108 patients included in the SG analyses considered Major Stroke Disability to be a long-term health state worse than death. In the chronic neck pain patients, 43 of 102 patients included in the RS analyses, and 41 of 100 patients included in the SG analyses considered Major Stroke Disability to be the least desirable long-term health state.

Preference scores

Neck pain patients’ scores for their current health varied widely: RS scores ranged from −0.14 to 0.99 and SG scores ranged from −3.73 to 1.00 (Table 2). Strictly interpreted, negative preference scores mean that some patients considered their current (short-term) health to be worse than death. SF-6D scores showed less variation, ranging from 0.30 to 0.96 (Table 2). A ceiling effect was noted for SG scores, in which scores in 22 patients were equal to 1.00 (Table 2). Mean and median RS, SG, and SF-6D scores of patients with neck pain following a MVA (RS mean = 0.51 [SD = 0.28]; RS median = 0.51 [IQR = 0.49]; SG mean = 0.67 [SD = 0.50]; SG median = 0.78 [IQR = 0.37]; SF-6D mean = 0.56 [SD = 0.12]; SF-6D median = 0.57 [IQR = 0.14]) were consistently lower than those of chronic neck pain patients (RS mean = 0.65 [SD = 0.28]; RS median = 0.70 [IQR = 0.39]; SG mean = 0.72 [SD = 0.33]; SG median = 0.79 [IQR = 0.34]; SF-6D mean = 0.63 [SD = 0.11]; SF-6D median = 0.62 [IQR = 0.15]) (Fig. 3). RS, SG, and SF-6D scores showed a gradient in the expected direction across grade of neck pain and disability (Table 2) (RS scores: Z = −7.42, P < 0.0001; SG scores: Z = −3.02, P = 0.001; SF-6D scores: Z = −9.53, P ≤ 0.0001).

Table 2 Neck pain patients’ preference scores for their current health stratified by grades of neck pain and disability
Fig. 3 Box plots of neck pain patients’ preference scores for their current health. MVA = motor vehicle accident. Note: a standard gamble score outlier (−3.73) was removed from the neck pain after MVA group for the purpose of plotting data in this figure

We examined the consequence of not restricting the lower bound of the preference scales (i.e., allowing negative RS and SG scores) by setting all negative RS and SG scores to zero. This resulted in predictably higher mean and median values: RS mean = 0.57 (SD = 0.29); RS median = 0.60 (IQR = 0.53); SG mean = 0.67 (SD = 0.30); SG median = 0.77 (IQR = 0.40).
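
The effect of flooring negative scores at zero can be seen in a small example. The scores below are hypothetical and were chosen only to include one negative value; they are not study data.

```python
from statistics import mean, median
from typing import List, Sequence


def floor_at_zero(scores: Sequence[float]) -> List[float]:
    """Replace every negative preference score with 0.0 (death)."""
    return [max(0.0, s) for s in scores]


# Hypothetical standard gamble scores, one of them worse than death.
raw_scores = [-0.35, 0.40, 0.78, 0.90, 1.00]
floored_scores = floor_at_zero(raw_scores)

print(round(mean(raw_scores), 3), round(mean(floored_scores), 3))
print(median(raw_scores), median(floored_scores))
```

Flooring can only raise or preserve each score, so the mean can never decrease; the median changes only when the negative scores reach the middle of the distribution.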

Explanatory regression models

Tobit model parameter estimates for SG scores did not differ considerably from the overall representation given by the OLS model parameter estimates. That is, the direction, order of magnitude, and statistical significance of β coefficients were consistent across OLS and Tobit models. We therefore focused on OLS model results for ease of comparison.

RS and SG scores that patients assigned to their current health were negatively associated with their level of neck pain disability measured by the Neck Disability Index (Tables 3, 4). The magnitude of association between RS scores that chronic neck pain patients assigned to their current health and their level of disability related to neck pain changed from the crude to the adjusted model. In contrast, the magnitude of association remained consistent across crude and adjusted models in patients with neck pain following a MVA. An association was also observed between RS scores and depressive symptoms in chronic neck pain patients, but no such association was observed in patients with neck pain following a MVA. The adjusted R² statistic showed that the adjusted model for chronic neck pain patients explained more variance than the crude model, whereas no additional variance was explained by the adjusted model for patients with neck pain after a MVA.

Table 3 Ordinary least squares regression model parameter estimates: persons with chronic neck pain
Table 4 Ordinary least squares regression model parameter estimates: persons with neck pain after a motor vehicle accident

There were differences across neck pain patient samples in the association between SG scores and neck pain disability as well. No association was observed in chronic neck pain patients. There was a weak, negative association observed in patients with neck pain after a MVA. As also observed in the RS-adjusted model described above, (1) the strength of the association did not change in the adjusted model for patients with neck pain following a MVA, whereas it did change in chronic neck pain patients, and (2) the adjusted model for SG scores of the chronic neck pain patients showed an association with depressive symptoms, but this did not hold for patients with neck pain after a MVA.

The adjusted R² statistic for both samples suggested that Neck Disability Index scores and covariates included in the SG score models collectively explained little variance beyond measurement noise. This is in contrast to the adjusted R² statistic values for the RS score adjusted models described above, in which at least 31% and 51% of the variance are explained in persons with chronic neck pain and those with neck pain following a MVA, respectively.

Discussion

Neck pain patients’ preference scores for their current health were directly elicited using the RS and SG. Though preference scores tended to decrease as neck pain and disability increased, wide variation was observed within grades of neck pain and disability. The association between patients’ preference scores for their current health and disability related to neck pain was also investigated. A more consistent relationship was observed in patients with neck pain following a MVA compared to patients with chronic neck pain. There was also a more consistent relationship between patients’ level of neck pain disability and RS scores that they assigned to their current health, compared to SG scores. The association between preference scores and depressive symptoms was considerable in patients with chronic neck pain, whereas it was negligible in patients with neck pain after a MVA.

There are plausible explanations for these findings. Persons with acute neck pain may be more focused on their pain and disability, compared to patients with chronic neck pain who may have adapted [55]. The greater influence of depressive symptoms on chronic neck pain patients’ preferences for their current health may be explained by reported associations between indicators of poor psychological health and neck pain [4, 5]. Depressive symptoms have been associated with slower recovery from neck pain, and psychological functioning has been observed to be negatively impacted as symptom duration increases [56–58].

Our observation that a larger portion of variance was explained by disability related to neck pain in RS score regression models compared to SG score models was also observed in another study of patients with chronic musculoskeletal pain [59]. One explanation may be that the RS and Neck Disability Index share a similar response format, in which respondents endorse hierarchically ordered responses. In contrast, the SG requires respondents to reveal their preferences by making choices [60]. The SG score also incorporates respondents’ risk attitude, through its scaling procedure which is framed in terms of uncertainty [61, 62]. Furthermore, the effects of probability transformation, contingent weighting, and loss aversion that are associated with the SG’s approach of measuring utilities tend to enhance risk aversion, leading to upward bias [31, 60]. Because there is no risk in the RS procedure, RS scores tend to be systematically lower than SG scores, as observed [63, 64].

All directly and indirectly measured mean preference scores in our study (except chronic neck pain patients’ SG scores) were lower than those reported by the Beaver Dam Study (0.77 [95% CI: 0.72, 0.81]) [15]. One explanation may be that a sizeable portion of Beaver Dam respondents may not have had neck pain when surveyed. Also, we deleted observations with invariant SG scores, and some have suggested that invariant scores account for a portion of high SG scores reported in the literature [48]. We also did not delete extreme outliers identified as points of influence, nor did we restrict the lower bound of negative preference scores. There are multiple ways to deal with influential outliers [65] and negative preference scores [66, 67], and we chose to present our data without additional deletions beyond pre-defined rules. Finally, we measured patients’ preferences for their own health within a set of hypothetical health states that included a long-term health state that one-third of patients considered worse than death [68, 69]. When we linearly transformed these patients’ scores and pooled them with those of the remaining patients, overall mean scores were pulled downward.

There were other limitations associated with our study. Though our selection criteria for chronic neck pain patients included neck pain of at least 3 months duration, some patients reported a duration of <3 months. This may be a measurement artefact, in which patients’ current (aggravated) neck pain episode, within the fluctuating (relapsing, remitting) course of chronic neck pain, was most prominent in these patients’ minds (and thus the stimulus for their response to the question on neck pain duration). Measurement error may have resulted from the manner in which interviews were conducted, including the order in which health states were presented to respondents [70], and wording used to describe the health states [71]. Also, sample sizes limited the number of differences across samples that could be adjusted for. Thus, while several differences were adjusted for, there were insufficient observations to correct for all differences, including education, effect of medication, geographic region, country, and health care system.

These preference scores suggest that the HRQoL in persons seeking care for neck pain is considerably diminished. The average SG score that chronic neck pain patients assigned to their current health is nearly equivalent to mean SG scores assigned by patients with chronic, stable angina (0.81) and patients with Crohn’s disease (0.79) to their own health [72, 73]. Mean RS scores that patients with neck pain following a MVA assigned to their health are comparable to those of persons with myalgia and arthralgia from Lyme disease (0.54), while their mean SG score approaches that of Type I diabetes patients with a functioning kidney transplant (0.72) [74, 75]. These comparisons are instructive, but not entirely valid, since neck pain patients’ preferences for their episodic (short-term) health state are compared to preferences of patients with chronic (long-term) conditions.

Our study provides directly elicited preferences for health states related to chronic neck pain and neck pain following a MVA that can be used as quality-of-life weights for future decision and cost-effectiveness analyses of neck pain treatments. We also provide a preliminary examination into the feasibility of developing a mapping algorithm between directly measured neck pain patients’ preferences and the most widely used neck-specific instrument: the Neck Disability Index.