Background

Patients seeking health care for chronic musculoskeletal (MSK) pain report problems across a variety of everyday health domains, including physical activity, vitality, mental well-being, sleep, work, and social relationships [1,2,3,4]. They also report a wide range of treatment goals, and goals set by patients with low back pain have been shown to span a mixture of health domains [3, 5, 6]. In response to these challenges, a set of core outcome measures covering a range of health domains, including pain, disability, and health-related quality of life (HRQoL), has been suggested [7,8,9]. Studies that investigate the association between outcome measures report only a fair to moderate relationship, which indicates dissociations between pain, disability and HRQoL [10,11,12]. Despite this, most prognostic studies define a single primary outcome, and patients are evaluated on the same outcome dimension irrespective of their treatment goals and expectations.

A large variety of individual prognostic factors have been associated with improvement in patients with neck pain [13,14,15,16,17,18]. Age, pain intensity, disability, previous neck pain history, widespread pain and expectations all have an individual association with pain intensity, disability, and global perceived change as outcomes [14,15,16,17,18]. The same prognostic factors have all been tested for inclusion in prognostic models, where the focus is not the individual predictor effect but rather the performance of a combination of prognostic factors as a whole [19]. We recently externally validated and updated an existing prognostic model for patients with neck pain [20]. The model includes seven predictors and was developed to predict improvement after 12 weeks using Global perceived effect (GPE) as the outcome measure. However, it is unknown how well a single outcome measure captures the breadth of goals reported by patients with neck pain. Few studies have investigated whether the predictive performance of prognostic factors or models varies across outcomes representing different health domains [19]. In existing prognostic model studies, a pre-specified set of candidate prognostic factors is defined, a separate prognostic model is then developed for each outcome, and the predictive performance of each model is reported [21, 22]. Studies rarely explore whether the predictors included in a model predict outcomes representing different health domains equally well (predictive strength), or how well a model predicts different outcome measures (predictive ability). It is likely that the predictive strength of a prognostic factor or the predictive ability of a model will differ depending on the construct and health domain of the outcome measure selected. As patients may pursue treatment goals related to diverse outcomes including pain, disability and HRQoL, clinicians need to know to what extent prognostic factors and prognostic models relate to each of these.
Thus, exploring the predictive performance of prognostic factors and models for outcomes that measure different health domains may provide a more comprehensive picture of prognosis and insight into how well the predictions match the wide range of patients' goals.

The aims of this study were to (1) examine the association among commonly used outcomes for neck pain (i.e., pain intensity, disability and HRQoL), (2) investigate if the predictive ability of a recently developed prognostic model for GPE of neck pain differs across outcome measures (i.e., pain intensity, disability and HRQoL), and (3) explore the predictive strength of the included predictors across outcome measures.

Methods

Study design and setting

This study was part of a one-year prospective observational cohort study that aimed to identify prognostic factors for neck pain patients in chiropractic practice in Norway [20]. The study was reported according to the STROBE statement [23].

The study was approved by The Norwegian Regional Committee for Medical and Health Research Ethics (2015/89).

Recruitment of patients and study samples

Chiropractors invited consecutive patients with neck pain to participate in the study from September 2015 until May 2016. The chiropractors (n = 71) were located across Norway, representing both urban and rural settings. Prior to inclusion, patients received oral and written information about the study from the chiropractor. We invited all patients presenting with neck pain as a primary or secondary complaint to participate. The participants were included regardless of neck pain classification, pain duration and time since last chiropractic consultation or treatment. Thus, the neck pain could be a first episode or part of an episodic or persistent pain complaint. Inclusion criteria were as follows: age above 18, adequate understanding of the Norwegian language to complete questionnaires, and owning and being able to operate a mobile phone. Exclusion criteria were serious pathologies such as suspected inflammatory disorders, fractures, infection, malignancy, or nerve root involvement requiring referral to surgery. In addition, we asked the included chiropractors to report the reason why a patient did not want to participate or was not invited to the study. We attempted to contact all non-responders by phone and/or mail in order to collect information on the reason for dropping out. Participants signed a written consent form.

Measurements

The data collection included self-reported questionnaires at baseline and after 12 weeks, as described in detail previously [20]. Treatment was not affected by study participation. We chose the 12-week follow-up as the endpoint [24,25,26].

Patient-reported baseline information

We included predictors from our previous external validation and update of a prognostic model developed by Schellingerhout and colleagues [20, 27]. The predictors of the updated model were pain patterns of neck pain the previous year and expected pain patterns of neck pain the upcoming year (described below), radiating pain to the shoulder and/or elbow (yes/no), number of MSK pain-sites (0–10), educational level (low, medium and high) and physical leisure activity. We measured the physical leisure activity predictor on a 5-point ordered scale (Never/Less than once a week/Once a week/2–3 times a week/More than 3 times a week). To distinguish no activity from activity, we dichotomized the physical leisure activity predictor into ‘doing activity once or more per week’ versus ‘doing activity never or less than once per week’ (≥ 1 per week/ < 1 per week) [20]. The included predictors represent a variety of known and well-documented demographic and psychosocial prognostic factors that reflect different aspects of health domains [20].
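The dichotomization of physical leisure activity described above can be illustrated with a minimal sketch. The function name `active_weekly` and the list `ACTIVITY_LEVELS` are hypothetical names introduced here for illustration; the study itself performed its analyses in Stata, and only the five response labels and the cut-point are taken from the text.

```python
# Ordered response options of the 5-point physical leisure activity item.
ACTIVITY_LEVELS = [
    "Never",
    "Less than once a week",
    "Once a week",
    "2-3 times a week",
    "More than 3 times a week",
]


def active_weekly(response):
    """Return True for activity once or more per week, False otherwise."""
    # Positions 0 and 1 are '< 1 per week'; positions 2 to 4 are '>= 1 per week'.
    return ACTIVITY_LEVELS.index(response) >= 2


# Hypothetical responses (not study data).
for answer in ["Never", "Once a week", "More than 3 times a week"]:
    print(answer, "->", active_weekly(answer))
```

Keeping the ordered labels in a single list makes the cut-point explicit, so it could be moved in a sensitivity analysis without touching the rest of the code.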

Pain patterns were measured by a self-reported visual trajectory pattern questionnaire [20]. The questionnaire had descriptions of five different patterns of neck pain that aim to characterize patients’ neck pain the past year (Previous pattern) or expectations of neck pain the upcoming year (Expected pattern) [20]. The five pain patterns were based on existing literature on trajectory patterns [28]. These pain patterns illustrate an increasing severity from the Single pain episode to the Severe ongoing pain pattern [29, 30].

Clinician-reported baseline information

The chiropractors recorded the consultation-type, i.e. the point in the clinical course of treatment at which participants were recruited: “First-time consultation” described patients recruited at the first visit, “Follow-up consultation” described patients recruited during a clinical course of treatment, and “Maintenance consultation” described patients visiting the chiropractor regularly at pre-planned time points [31].

Outcomes

The outcome measures covered the health domains of pain intensity, disability, and HRQoL.

Neck pain intensity was measured by a numeric rating scale (NRS) ranging from 0, indicating ‘No pain’, to 10, indicating ‘worst pain imaginable’ [32]. We used the score at week 12 as the pain intensity outcome. The NRS has been shown to have good test–retest reliability, construct validity and responsiveness for pain intensity [32].

The Neck Disability Index questionnaire (NDI) was used to assess disability [33]. It consists of 10 items evaluating function, pain, sleep quality and work ability, each scored from 0–5, with a sum-score range of 0 to 50 points. A higher score indicates more disability. The NDI has been reported to be a reliable, valid and responsive outcome measure in various neck pain populations, including different neck pain conditions [34,35,36].

EQ-5D was used to assess HRQoL [37]. It evaluates 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. We used the version with a 5-level response and the scoring algorithm from the United Kingdom to calculate a health state index ranging from 0 (equivalent to being dead) to 1 (full health). EQ-5D was developed to be relevant to a wide range of health conditions [37] and has been reported to be a reliable and valid outcome measure for neck pain [38, 39]. Patient participation was not blinded, as all outcome measures and prognostic factors were patient-reported. However, patients completed all questionnaires independently and in the absence of the chiropractor or researchers.

Statistical analysis

We present patient characteristics and outcomes as frequencies with percentages, means and standard deviations (SD). The outcome change score is presented as the median with interquartile range (IQR) for pain intensity, NDI and EQ-5D. The statistical difference between baseline and follow-up measurements was tested by paired t-tests for all outcome measures.

We assessed the association between the change scores of the outcome scales at 12 weeks with Pearson’s correlation coefficient (r). The strength of r was interpreted according to the coefficient values < 0.3 (weak), 0.3 to 0.7 (moderate) and ≥ 0.7 (strong) [40, 41]. In addition, we used Lowess plots to visualize the relationship between pairs of outcome change scores. For each outcome, we determined effect sizes by dividing the mean change between baseline and 12-week follow-up by the SD of the baseline score. Cohen’s guidelines define an effect size of about 0.1 as small, 0.3 as medium, and 0.5 or higher as large [42].
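The correlation and effect size calculations described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's Stata code; the function names and the five-patient data set are hypothetical, and the boundary handling (0.3 counted as moderate, 0.7 as strong) follows the bands stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def interpret_r(r):
    """Label |r| with the bands from the text: < 0.3, 0.3 to < 0.7, >= 0.7."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


def effect_size(baseline, follow_up):
    """Mean change (follow-up minus baseline) divided by the baseline SD."""
    n = len(baseline)
    mean_base = sum(baseline) / n
    mean_follow = sum(follow_up) / n
    sd_base = sqrt(sum((b - mean_base) ** 2 for b in baseline) / (n - 1))
    return (mean_follow - mean_base) / sd_base


# Hypothetical scores for five patients (not study data).
pain_base = [6, 4, 7, 3, 5]
pain_week12 = [3, 3, 4, 2, 4]
ndi_base = [14, 8, 20, 6, 12]
ndi_week12 = [9, 7, 13, 6, 10]

# Change scores: negative values mean lower pain or disability at 12 weeks.
pain_change = [f - b for b, f in zip(pain_base, pain_week12)]
ndi_change = [f - b for b, f in zip(ndi_base, ndi_week12)]

r = pearson_r(pain_change, ndi_change)
print("r =", round(r, 2), interpret_r(r))
print("effect size (pain) =", round(effect_size(pain_base, pain_week12), 2))
```

With this sign convention a decrease in pain gives a negative effect size; reporting the absolute value is an equally common choice.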

We conducted a series of multivariable linear regression analyses using block-wise entry to investigate the predictive performance of the individual predictors and of the model for each outcome measure. We divided the selected predictors into four blocks based on the health domains used when updating the prognostic model [20]. First, we entered each single block (Block 1 to 4) into the model and determined its isolated contribution to prediction. Thereafter, we sequentially combined the blocks in four steps and determined the combined contribution to prediction of each model. The final model comprised all 4 blocks. Block 1 included the pain patterns to account for previous and expected pain symptoms; Block 2 included radiating pain and number of MSK pain-sites to account for additional pain history; Block 3 included education level, physical leisure activity and the interaction term ‘physical leisure activity#number of MSK pain-sites’ to account for sociodemographic variables; Block 4 included consultation-type.
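The block-wise entry procedure can be illustrated with a small self-contained sketch: each block is first fitted alone, and the blocks are then combined in order of entry, with the adjusted R2 recorded at every step. This is an illustration under stated assumptions, not the study's Stata code. The helper names, the toy data and the use of only two blocks are hypothetical, and the pain pattern is treated as a numeric score for brevity, whereas categorical predictors would normally be entered as indicator variables.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def adjusted_r2(rows, y):
    """Fit ordinary least squares with an intercept; return adjusted R2."""
    n, p = len(rows), len(rows[0])
    design = [[1.0] + list(row) for row in rows]
    k = p + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * dj for bj, dj in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical toy data (not study data); outcome is on a 0-100 scale.
data = {
    "previous_pattern": [1, 2, 3, 4, 5, 2, 3, 4, 1, 5],
    "pain_sites": [0, 3, 1, 4, 2, 5, 1, 3, 2, 6],
    "radiating_pain": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
}
outcome = [10, 26, 32, 50, 54, 30, 34, 46, 14, 62]
blocks = {
    "Block 1": ["previous_pattern"],
    "Block 2": ["pain_sites", "radiating_pain"],
}


def rows_for(columns):
    """Build one row of predictor values per patient for the given columns."""
    return [list(values) for values in zip(*(data[c] for c in columns))]


# Isolated contribution of each block, then blocks combined in order of entry.
single = {name: adjusted_r2(rows_for(cols), outcome) for name, cols in blocks.items()}
entered, cumulative = [], {}
for name, cols in blocks.items():
    entered += cols
    cumulative[name] = adjusted_r2(rows_for(entered), outcome)

for name in blocks:
    print(name, round(single[name], 3), round(cumulative[name], 3))
```

Because adjusted R2 penalizes the number of parameters, it can stay flat or even fall when a block with little predictive value is added, which is why it is a convenient yardstick for comparing models of different size.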

To clarify the predictive contribution of the baseline value of outcomes for each outcome measure, we formed a fifth block (Block 5). Block 5 included the baseline value of either pain intensity (Block 5a), NDI (Block 5b) or EQ-5D (Block 5c). We added each of these blocks separately to the final model (i.e., Block 5a to the final model; Block 5b to the final model; Block 5c to the final model). In addition, we investigated the individual predictive performance of each of the three blocks, Block 5a, Block 5b and Block 5c.

We report the predictive performance as predictive ability and strength of association. For the predictive ability, the adjusted R2, i.e. the proportion of the variance in outcome explained by the prognostic factors, was compared across models. Furthermore, we report the relationship between predictors and outcomes as the strength of association (beta coefficients with 95% confidence intervals (CIs)). In all regression analyses, we transformed both baseline outcome variables and outcomes to a continuous scale of 0–100, i.e. (score/max of scale score) × 100, to allow comparison of model explanation and strength of association across outcomes. We assessed the normality assumptions for linear regression visually for each model based on the residual plots and Q-Q plots. We investigated the linear association between predictors and outcomes by added-variable plots. The amount of missing data was small, and no predictor had more than 2.9% missing values. We assumed missing values were due to random processes, as the main reason for missing data was incomplete paper-based questionnaires rather than refusal of patients or chiropractors to fill in questionnaires. For missing values of the predictors, we used multiple imputation. For each outcome, we examined whether the available sample size was sufficient for exploring prognostic models [43]. We used the method by Riley et al. to calculate the sample size required for multivariable linear regression modeling [44]. In the present study, seven candidate predictors (comprising 18 parameters) were selected a priori based on an updated prognostic model for neck pain [20]. We pre-specified the anticipated R2 (0.8) and used the mean and standard deviation of the outcomes in this study sample. This indicated that a sample size of 254 was required. Our total sample comprised 941 patients and was therefore within acceptable limits.
We set the significance level at 5% for all tests and performed all analyses in STATA/SE 16 (StataCorp, College Station, TX).
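The rescaling of outcomes to a common 0–100 scale, (score/max of scale score) × 100, can be written out as a short sketch. The dictionary `SCALE_MAX` and the function `to_percent_scale` are hypothetical names; the maxima are the scale ranges given in the Outcomes section (NRS 0–10, NDI 0–50, EQ-5D index 0–1), and the scores in the example are invented.

```python
# Maximum possible score of each instrument, as described in the Outcomes section.
SCALE_MAX = {"pain_nrs": 10, "ndi": 50, "eq5d": 1}


def to_percent_scale(score, instrument):
    """Express a raw score as a percentage of the instrument's maximum."""
    return score / SCALE_MAX[instrument] * 100


# Hypothetical raw scores for one patient (not study data).
for instrument, raw in [("pain_nrs", 5), ("ndi", 20), ("eq5d", 0.75)]:
    print(instrument, to_percent_scale(raw, instrument))
```

After this transformation a regression coefficient has the same unit (percentage points of the scale) for every outcome, which is what makes the strength of association comparable across pain intensity, NDI and EQ-5D. Note that the direction of the scales still differs: higher values mean worse pain and disability but better HRQoL.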

Results

Baseline data were collected from 1313 patients, of whom 941 (72%) responded at the 12-week follow-up and constituted the study sample used for the analyses in this study (see flowchart in Additional file 3). Study participants and non-responders (n = 372, 28%) were comparable in terms of demographics, neck pain symptoms and history, general health and psychosocial factors, with only minor differences observed (Table 1).

Table 1 Baseline characteristics of the study population (n = 941) and Non-responders at 12-week follow-up (n = 372)

Outcome change score and correlations

There were small to moderate improvements for all outcomes from baseline to 12 weeks (P < 0.001). The mean (SD) pain intensity decreased from 4.7 (2.4) to 2.7 (2.1), NDI decreased from 11.5 (6.6) to 9.4 (6.4) and EQ-5D utility score increased from 0.85 (0.13) to 0.88 (0.11). The median (IQR) outcome change scores were -2 (-4 to 0), -2 (-4 to 1) and 0.01 (-0.02 to 0.05), respectively.

There was, however, large inter-individual variation in change scores, as illustrated by the Lowess plots (Additional file 1). The plots and the Pearson correlation coefficients between change scores on the outcome scales at 12 weeks revealed the strongest correlations between NDI and EQ-5D (r = 0.57) and between NDI and pain intensity (r = 0.53), while the weakest correlation was between EQ-5D and pain intensity (r = 0.39). The effect sizes for pain intensity, NDI and EQ-5D were 0.76, 0.46 and 0.23, respectively.

Predictive performance of models

The residuals showed no strong evidence of a violation of the assumptions for linear regression for any of the models.

In general, all single blocks contributed to the explained variance of all outcomes, and the relative contribution of the entered blocks of predictors was quite similar across outcomes (Additional file 2). The pain patterns (Block 1) and the baseline variable of the corresponding outcome measure (Block 5) explained most of the variance. Regarding the predictive ability of baseline values of the respective outcome measures (Block 5), baseline pain intensity contributed less to the explained variance than baseline NDI and EQ-5D. The single Blocks 2 (radiating pain and number of MSK pain-sites) and 3 (education level and physical leisure activity) had similar contributions to the explained variance across outcomes. The single Block 4 (consultation-type) did not contribute significantly to the explained variance.

Figure 1 and Tables 2, 3 and 4 present the block-wise regression models with pain intensity, NDI and EQ-5D as outcomes, respectively. During the block-wise entry, the adjusted R2 values were largely unaltered from Block 1 until Block 4 for all outcomes. The final model had the highest explained variance, regardless of which baseline outcome variable was included. When we added the baseline variable of the corresponding outcome measure to the final model (Block 1 to 5), the adjusted R2 values ranged from 0.26 to 0.60 across outcomes. NDI was consistently the outcome with the highest explained variance compared to pain intensity and EQ-5D, with larger adjusted R2 values for single blocks as well as for the final model.

Fig. 1 The predictive performance (Adjusted R2) when sequentially combining the five blocks for each outcome

Table 2 The explained variance (Adjusted R2) and associations between pain intensity as outcome and predictors (entered in 5 blocks) explored by linear regression analysis (n = 941)
Table 3 The explained variance (Adjusted R2) and associations between disability as outcome and predictors (entered in 5 blocks) explored by linear regression analysis (n = 939)
Table 4 The explained variance (Adjusted R2) and associations between EQ-5D as outcome and predictors (entered in 5 blocks) explored by linear regression analysis (n = 941)

The impact of the baseline values for different outcomes was further explored in Table 5. Adding baseline values for NDI to the final model (Block 1–4) resulted in the largest increase in adjusted R2 across all outcomes. Baseline pain intensity had little impact, even when pain intensity was the outcome.

Table 5 The explained variance (Adjusted R2) between baseline outcome variables, predictors and NDI, EQ-5D or pain intensity as outcome after 12 weeks explored by linear regression analysis

Predictors in the models

In the final models (including Block 5), the pain patterns (Block 1) and the corresponding outcome variable (Block 5) were significantly associated with all outcomes (Tables 2, 3 and 4). Of the other predictors, consultation-type (Block 4) was significantly associated with NDI and EQ-5D as outcomes. Number of pain sites was significantly associated with pain intensity and NDI as outcomes. For all outcomes, the association with education level, radiating pain to the shoulder and/or elbow and physical leisure activity was weak. The 95% CIs were wide for all predictors across outcomes; thus, comparisons should be interpreted with caution.

Discussion

This study found weak to moderate associations between improvements on the pain intensity, NDI and EQ-5D outcome instruments. We also found that the prognostic model developed for the prediction of GPE showed large differences in total explained variance across the three outcomes (pain intensity, NDI and EQ-5D). The prognostic model showed poorer predictive ability for pain intensity than for both NDI and EQ-5D. For all outcomes, the impact of the individually entered blocks of predictors was quite similar. Among the investigated predictors, pain patterns accounted for the largest explained variance in all outcomes.

Our results show that, with the chosen set of predictors, disability is more accurately predicted than pain intensity and EQ-5D. The difference in explained variance between pain intensity and NDI is in keeping with previous studies using pain intensity and disability as outcomes [19, 21, 22, 45]. One study included a large number of psychological candidate prognostic factors [21], while another more recent study included a wide selection of biopsychosocial candidate predictors [22]. Hence, it is reasonable to suggest that the consistent differences in performance with different outcomes are due to the measurement properties of the various outcomes.

One possible explanation for the lower performance of the prognostic model using pain intensity as outcome may be the poor reliability of a single pain measure [46, 47]. As neck pain is reported to be episodic or fluctuating [48,49,50], it is less likely that an improvement of symptoms is captured by a single time point measurement on a group level. The phenomena captured by the composite measures (e.g., NDI and EQ-5D) might be less subject to such variation over time. Another explanation may be construct differences between outcome measures [47]. Multidimensional outcomes may better capture the complexity of the individual’s experience of neck pain than single-dimension outcomes such as pain intensity [47, 51].

MSK patients emphasize pain as an important goal, and pain is commonly evaluated in clinical practice [2]. Accordingly, randomized clinical trials use pain intensity as an outcome. Although it seems relevant to assess pain, the poorer ability to capture improvement may lead to weaknesses in the trials. This is a challenge that needs attention and further study, for instance by exploring why changes in pain are difficult to predict and how pain intensity can best be used as an outcome measure.

The lower correlation between pain intensity and either of the two other outcomes than between NDI and EQ-5D further supports that pain intensity is either less reliable or has a different construct. Although pain intensity is a simple one-item measure, pain affliction may be modified by numerous other factors such as expectations, pain beliefs and behaviors [52, 53]. The more concrete items included in NDI and EQ-5D may be less vulnerable to such modulation, and some of the differences may possibly be related to this effect.

Independent of outcome measure, the pain patterns and baseline values of outcomes were the predictors that contributed the most to prediction. Previously, neck pain history (including previous episodes and duration) and future expectations assessed by traditional measures (i.e., numeric scales) have been found to be predictors of GPE, pain and disability [17, 19, 22, 30, 45]. These findings support our results on pain history and expectations as consistent prognostic factors. Similar to systematic reviews on prognostic factors, we also found baseline pain intensity and disability to be robust predictors of outcomes [14, 16].

Although it can be argued that there is an implicit perception that pain intensity is included in the pain patterns (Block 1), we found that pain intensity, as a single predictor or when added to the final model, contributes only a little to prediction, regardless of outcome measure. The associations with outcome for the remaining predictors (radiating pain to shoulder and/or elbow, physical leisure activity, MSK pain-sites and education level) were less consistent, in line with previous research on prognostic factors and models for neck pain [18, 20, 22]. Consultation-type has previously been shown to be associated with outcome [20], but not to interact with other predictors [20, 30]. A recent study used a set of predefined prognostic factors to develop prognostic models for recovery of patients with neck pain [22]. This study used pain, disability, and perceived effect as outcomes. Like us, they found the prognostic model with disability as outcome to have the best predictive performance, but found only neck pain duration to be a consistent predictor across all outcomes. This emphasises that a prognostic factor or model derived for a specific outcome does not fully represent all health domains, and thus models need validation on the outcome they are intended for.

Strengths and limitations

One limitation concerns the loss to follow-up. Since 28% of the included patients did not respond at the 12-week follow-up, a possibility of attrition bias exists. The main reason for not answering the questionnaire at the 12-week follow-up was lack of the time required to complete questionnaires. However, the non-responders did not differ significantly from the analyzed sample, suggesting that attrition bias may not have substantially influenced the results. Another concern is that the inclusion criteria of owning a mobile phone and adequately understanding the Norwegian language may introduce selection bias. The most common reasons for non-participation were ‘patient did not wish to participate’ followed by ‘chiropractor forgot or did not have time to ask’; thus, we believe the risk of selection bias due to these parameters is low. The included predictors were derived using the single-item GPE as outcome in a previous study [20]. We were therefore unable to directly compare results to other studies because there is no obvious way of transforming GPE to a comparable scale. However, as GPE does not represent a specific health domain, GPE as outcome did not favor any of the investigated outcomes. A strength is that the inclusion criteria were broad rather than strict, as in RCTs, and this can result in a more heterogeneous patient population. Consequently, this may support the generalization of the results to patients with neck pain seen in primary health care. We included consultation-type because the participating patients were not included at a uniform time (zero time). Patients who seek care across health care settings experience different phases of their neck pain course (i.e., acute, recurring, or persistent), which is a challenge for clinicians regarding prognosis. Therefore, in the regression analyses of this study, we accounted for these differences at inception by adding the variable consultation-type.
We found that consultation-type did not interact with the included prognostic factors, but it seems that the inception time is related to outcome. Although not useful on an individual level, consultation-type may be one way to obtain additional prognostic information in a setting where the patient population is heterogeneous. The Visual trajectory pattern questionnaire has not been validated. However, it is quite similar to a questionnaire used for low back pain that seems to capture people's prospectively measured course [29]. Also, the relationship with patient-reported outcomes in the expected direction provides support for the responses to the questionnaire being meaningful.

Conclusions

The highest correlation between outcome change scores was found between NDI and EQ-5D, with lower associations for pain intensity. The prognostic model also showed the best performance for NDI as outcome and the poorest for pain intensity. The predictive impact of the predictors was consistent across all outcomes. These results suggest that we need more knowledge on the reasons for the variation in predictive performance across outcomes.