Background

It is well understood that “measuring health is the first step to understanding health and understanding health is the first step to improving health” [1]. Patient-reported outcome measures (PROMs) present the opportunity to capture information directly from the patient that can help clinicians and researchers understand the impact of disease, treatment, and health status directly as the patient perceives it [2]. Moreover, PROMs play a critical role in supporting shared decision-making and personalized goal-setting between a patient and provider [2, 3]. In high-burden diseases with multifactorial causes, such as neck pain, PROMs present an opportunity to capture information that can inform the development of individualized evidence-based interventions, assess responsiveness to treatment and inform prognosis beyond traditional objective assessments [4].

Various evidence-based interventions have been recommended for the treatment of neck pain including treatments provided by a variety of interdisciplinary clinicians. However, due to the lack of standardization of PROMs across disciplines and in many cases even within a single discipline, there is difficulty in comparing the outcomes of these interventions [5]. This heterogeneity of measures makes it challenging to quantitatively evaluate which treatments are effective, their use in clinically meaningful research and comparison of findings between studies [3]. To that end, the expansion of electronic health record capabilities and data management allow the aggregation of large scale data collection and patient reported outcome integration at an unprecedented level. However, with the continued heterogeneity of measurement use in patients with neck pain and without minimal mandates, the ability to use this data to improve patient outcomes and advance the field will remain suboptimal.

Standardized PROM use has the potential to complement a clinician’s experience and expertise with an objective assessment of a patient’s status as they perceive it, assist with shared decision making, detect improvement in function, and provide informative large scale data to drive value based care pathways and quality improvement [6]. Various professional organizations, including the American Physical Therapy Association(APTA) have included recommendations for standardized PROM collection within published clinical practice guidelines(CPGs) including those specific to neck pain [4]. Despite open access to these guidelines, continued inconsistencies and lack of standardization in PROMs exist. To that end, these inconsistencies subsequently reduce the value of PROMs within physical therapy and across other professions [7].

Continued challenges to their implementation into clinical practice has been attributed to multifaceted barriers including lack of time to complete questionnaires, administrative burden, and lack of knowledge on how to translate data to knowledge [7, 8]. Additionally, without standardization of PROMs, patients may face “survey fatigue”. This combined with a clinician’s potential lack of knowledge on how to use the results to inform their clinical decision making further enhances the patients’ assumptions that they provide little value to their care. To that end, it’s critical to consider a PROMs measurement characteristics such as validity, consistency, feasibility, interpretability, and responsiveness. Therefore, a thoughtful, pragmatic, and evidence-informed selection process will ultimately influence the extent that the measure will be valuable, useful, and informative in clinical practice [1, 9].

In 2019, Chiarotto described three-steps to guide selection of the most appropriate PROM for a particular context [10]. Understanding what you want to measure and for what purpose, reviewing the literature, and assessing the quality of the measurement tool of interest were recommended steps to ensure what matters most to patients is captured [10]. Additionally, utilization of a conceptual model and framework to guide appropriate patient-reported outcome selection has also been suggested [11]. Physical therapists are one of the primary non-operative providers for patients with neck pain [12]. Accordingly, patients with neck pain account for approximately 20% of patients referred to outpatient physical therapy [13]. The first step to making recommendations for a set of PROMs to be used for patients with neck pain is to understand the breadth of PROMs within the profession. Secondly, it’s critical to understand what patient populations and clinical context these PROMs are reported in the literature. Therefore, the purpose of this study was to identify PROMs that are reported in patients with neck pain receiving physical therapy interventions and to provide guidance for physical therapists and other practitioners on PROM selection in this patient population.

Methods

Review design

The protocol for this systematic review was designed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guidelines [14] and is registered with the International Prospective Register of Systematic Reviews (PROSPERO) database (CRD42023391158) [15]. We collaborated with a research librarian (SH) to develop an appropriate search strategy and management of the literature review.

Data sources and search strategy

We searched MEDLINE (OVID), Embase (Elsevier), CINAHL Complete (EBSCOhost), and Web of Science (Clarivate) on September 13, 2022, using a combination of keywords and database-specific subject headings for the following concepts: neck pain, including any conditions that had a primary symptom of pain, and specific outcomes identified of interest by the group. An additional modified filter from the COnsensus-based Standards for the selection of health status Measurement Instruments (COSMIN) was used to further limit studies that mentioned reliability and validity information [16]. No restrictions were placed by date or language. The search was limited to only systematic review and guideline publications using two Canadian Agency for Drugs and Technologies in Health (CADTH) search hedges, which were only modified to remove the health technology assessment terms. The search strategies were peer-reviewed by another librarian with expertise in systematic review searches prior to execution using the Peer Review of Electronic Search Strategies (PRESS) checklist [17]. The full, reproducible search strategies for all included databases are available in supplementary material 1.

Inclusion and exclusion criteria

The inclusion criteria for this study were systematic reviews of patients of any age or sex with neck pain receiving a physical therapy intervention or interventions commonly performed by a physical therapist. Studies included in this review must have met the additional criteria of reported outcomes in patients 18 years or older, patients with neck pain or cervicogenic headaches, and at least one patient-reported outcome measure recorded. The exclusion criteria applied in this study were if the study design was anything other than a systematic review of studies that used an experimental, quasi-experimental, or observational design, patients evaluated had neck pain with another spine-related condition such as low back pain, the intervention was provided by a chiropractor or the patient population included patients with neck pain who had neurologic deficits, severe cardiovascular diseases, serious pathology (e.g., malignancy, infection, cancer, inflammatory arthritis, fractures, upper cervical instability, etc.).

Study selection and data extraction

After databases were searched, titles and abstracts of studies were uploaded into Covidence. The article selection process was completed in two phases. In the first phase, two authors (MR and MH) performed independent reviews of titles and abstracts in Covidence using the predefined inclusion and exclusion criteria above. Articles were moved to full-text review if one or both authors found the article potentially relevant. In the second phase, the same two authors independently reviewed full-text articles for eligibility. Any conflicts were resolved by discussion between authors. Three reviewers (MR, MH, MS) performed independent data extraction with a checked final review performed by a single reviewer (MR). Data extraction was performed using a Population, Intervention, Comparison, Outcome (PICO) format with elements representing author, year, title, databases searched, study type, number of included studies, population, intervention, comparators, and patient-reported outcome measures evaluated. Any other measures included in the reviews were also extracted.

Data analysis and synthesis

Studies included in this review were evaluated from December 2022 to February 2023. The primary purpose of this review was to describe PROMs in physical therapy research and practice through qualitative synthesis. Therefore, we did not perform a meta-analysis of the data. For the qualitative synthesis, we described the studies by publication year, clinical population, study type, number of studies included in the review, and the outcomes reported in each study. We reported the frequency of PROMs by the constructs of disability, pain intensity, psychosocial factors, and quality of life. These were described according to their context of use (diagnosis, prognosis, and/or risk) within the included reviews.

Risk of bias

Two review authors (CH and JM) independently assessed the risk of bias in included reviews using the Assessment of Multiple Systematic Reviews 2 (AMSTAR 2) tools. AMSTAR 2 is a validated instrument that uses 16 questions to assess systematic reviews that include randomized and non-randomized studies of healthcare interventions, or both [18]. The included studies were appraised according to AMSTAR 2 guidance and rated the reviews into four categories: “high”, “moderate”, “low”, and “critically low” in overall confidence. We considered the potential impact of an inadequate rating for each item individually. Particularly, we took into account the critical domains, which include whether or not a protocol was registered before the commencement of the review, the adequacy of the literature search, the justification for excluding individual studies, the risk of bias from individual studies being included in the review, consideration of the risk of bias when interpreting the results of the review and the assessment of the presence and likely impact of publication bias. Disagreements between the review authors over the risk of bias in particular studies were resolved by consensus.

Results

Study characteristics

The electronic search resulted in an initial 9457 articles (Fig. 1). After 2454 duplicates were removed, 7003 articles were included for abstract and title review. Eighty-eight articles met the inclusion/exclusion criteria and were included in full-text retrieval. One study was excluded due to lack of full-text availability. After a full-text review, a total of 50 studies were excluded. This was due to the wrong patient population (19), wrong intervention (17), wrong study design (6), wrong comparator (3), wrong outcomes (2), non-English (2), and wrong setting (1).

Fig. 1
figure 1

PRISMA flow diagram of screened and eligible citations

Thirty-seven studies were included in the final review, were published between 2015 and 2022 and included a total of 31 distinct PROMs reported across all studies. Detailed characteristics of the included studies and PROMs are described and summarized in Table 1. Of the studies extracted for final review, 17 were systematic reviews and 20 were systematic reviews with meta-analyses. The mean number of studies included within each review was 13 (range 4–51). 70% of reviews included individuals with non-specific neck pain (acute, sub-acute, chronic), 27% included study populations specifically with whiplash-associated disorder (WAD), 27% included systematic reviews of individuals with radiating pain (radicular), and 22% of studies included populations consistent with cervicogenic headache. There were fourteen studies that included more than one study population within their review. There were a total of thirty-one PROMs reported across the thirty-seven studies in patients with non-specific neck pain, WAD, radiating pain and cervicogenic headache. Four patient-reported outcome constructs were identified amongst the included measures (Table 2). This included the constructs of disability, pain intensity, psychosocial factors, and QoL.

Table 1 Description and characteristics of included reviews
Table 2 The top five patient-reported outcome measure stratified by construct a

Patient-reported outcome measure constructs

Disability

Details of the PROMs included in three or more reviews are presented in Fig. 2. Of the eleven PROMs that were represented in three or more reviews, 45% (n = 5) of the PROMs assessed disability. The most frequently reported PROMs measuring disability included the Neck Disability Index (NDI), Neck Pain Questionnaire (NPQ), Patient Specific Functional Scale (PSFS), Neck Pain and Disability Scale (NPAD), and Disabilities of the arm, shoulder and hand (DASH). The NDI was represented in 89% (n = 33) of studies and was the most frequently included measure of disability within our review. This was followed by the NPQ, PSFS and DASH represented in 35%, 14% and 8% of studies respectively. The aforementioned PROMs context of use was for diagnosis and prognosis in patients with neck pain and cervicogenic headache.

Fig. 2
figure 2

The 11 most frequently reported PROMs. *DASH, Disabilities of the Arm, Shoulder and Hand; FABQ, Fear Avoidance Belief Questionnaire; NDI, Neck Disability Index; NPAD, Neck Pain and Disability Scale; NPQ, Neck Pain Questionnaire; NPRS, Numeric Pain Rating Scale; PSFS, Patient Specific Functional Scale; SF-12, Short Form Health Survey-12; SF-36, Short Form Health Survey-36; TSK, Tampa Scale of Kinesiophobia; VAS, Visual Analog Scale

Pain intensity

13% (n = 4) of all included PROMs (n = 31) measured the construct of pain intensity. Four measures of pain intensity were represented in this review and only two PROMs were represented in three or more reviews. These included the visual analog scale (VAS) and numeric pain rating scale (NPRS). The VAS was the most frequently utilized pain intensity measure and was represented in 76% (n = 28) of included studies. The NPRS was included in 73% (n = 27) of studies.

Psychosocial

Ten measures of psychosocial function were represented in the included studies, with two PROMs represented in three or more reviews. These included the Fear Avoidance Belief Questionnaire (FABQ) and Tampa Scale of Kinesiophobia (TSK). The FABQ was the most utilized of the two measures, representing 14% (n = 5) of all reviews. The second most frequently utilized measure of psychosocial function was the TSK, represented in 8% (n = 3) of all reviews.

Quality of life

Four QoL measures were represented in the included 31 PROMs with two PROMs present in three or more reviews. These included the Short Form Health Survey-12(SF-12) and Short Form Health Survey-36(SF-36). The SF-36 was the most frequently utilized QoL measure and was represented in 30% (n = 11) of included reviews. The SF-12 was the second most frequently utilized QoL measure and was included in 11% (n = 4) of reviews with risk of bias ratings of low (2) and high (2).

Risk of bias

A summary of the results from the critical appraisal of 37 studies using the AMSTAR2 are described in Fig. 3 with full details provided in supplementary materials 1. Confidence in the results were rated as critically low [19, 20], low [21,22,23,24,25,26,27,28,29,30,31,32], moderate [12, 33,34,35,36,37,38,39], and high [40,41,42,43,44,45,46,47,48,49,50,51,52,53,54]. The methodological weaknesses in the critically low and low rated studies are considered critical domains by AMSTAR2. These included a failure to adequately investigate publication bias and its impact on the results (9 studies), a lack of consideration of risk of bias when interpreting the results of the review (5 studies), or insufficient justification for excluding individual studies (3 studies). Studies rated as moderate were lacking information in more than one of the non-critical domains. This included not performing study selection in duplicate (1 study), not performing data extraction in duplicate (8 studies), lack of reporting on sources of funding for the studies included in the review (14 studies), not assessing the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis (3 studies), lack of satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review (9 studies), or failing to report any potential sources of conflict of interest, including funding (4 studies).

Fig. 3
figure 3

AMSTAR2(A MeaSurement Tool to Assess systematic Reviews) Confidence Ratings of included reviews

Discussion

The purpose of this review was to identify PROMs that are reported in patients with neck pain receiving physical therapy interventions and to provide guidance for physical therapists and other practitioners on PROM selection for patients with neck pain. Similar to the findings described in the Academy of Orthopaedic Physical Therapy(AOPT) Neck Pain CPG revision, our review found that the NDI was the most commonly utilized PROM [4]. The NDI has demonstrated high-quality evidence of good to excellent internal consistency, moderate to excellent test-retest reliability, and moderate quality evidence of poor to moderate responsiveness in patients with neck disorders [55].

Second and third to the NDI in the frequency of use for evaluation of perceived disability, were the NPQ and PSFS respectively. These findings were not surprising as they were consistent with another systematic review [55] that found the NDI and NPQ to be the most frequently utilized PROMs in physical therapy practice for patients with neck disorders. Bobos et al. demonstrated the NPQ to be the second most frequently evaluated PROM assessing disability in individuals with neck pain and moderate quality evidence demonstrating good to excellent internal consistency and good test-retest reliability [55]. Although the PSFS was originally developed to be used across a variety of conditions, moderate quality evidence of high test-retest reliability (ICC = 0.82 for cervical radiculopathy) of the PSFS has been found in patients with neck disorders [55, 56].

For the construct of pain intensity, both the VAS and NPRS were reported in over three quarters of the studies in our review. Not surprisingly, an international survey of researchers determined the NPRS to be the most widely used measure in primary care for patients presenting with neck pain [57]. The NPRS has demonstrated high-to-moderate-quality evidence of moderate to strong (0.58 to 0.93) test-retest reliability with a moderate association of concurrent construct validity between the NDI and VAS of r = 0.36 to 0.69 in patients with neck pain [57].

An in-depth understanding of the outcomes associated with physical therapy interventions in the treatment of neck pain is critical to enhancing the quality and effectiveness of clinical practice. Although there is no substitute for clinical experience and an evidence-based objective examination, the knowledge gathered from understanding a patient’s health status as they perceive it is equally, if not more important to recognize.

It’s critical to note the use of these instruments in clinical care. Standardized PROMs are intended to improve patient-centered care, measure intervention effectiveness, inform clinical decision making, quality improvement initiatives, and enhance shared decision making between the patient and the clinician. A recent systematic review summarizing patients’ experiences and perspectives of PROMs in clinical care found that patients’ perceived benefits of PROMs included a sense of empowerment, providing information to inform clinical planning, assessment, diagnosis and individualized treatment. However they also noted some common barriers to engagement including the PROMs perceived relevance, utility of questions, understanding the measures purpose and concerns about how information is applied clinically [58]. In accordance with patients, most clinicians value PROMs as long as they can be useful during the decision-making process. Noted barriers to their use include not having the infrastructure in place for data collection and when collection of PROMs disrupts their normal workflow [59].

Our review has some noted strengths. First, our study was a review of reviews, resulting in our confidence in the results of our study. Moreover, our findings were consistent with what has been reported in the recent literature. Following Cochrane guidance, our study methodology was thorough and robust creating the platform to capture as many relevant reviews as possible that met our a priori defined inclusion and exclusion criteria. Additionally, this study only reviewed systematic reviews, therefore it is possible that other PROMs have been evaluated for patients with neck pain or cervicogenic headache which have not been previously included in a systematic review analysis. However, it would be anticipated that the most frequently utilized PROMs for patients with neck pain would have been included in the selected reviews.

Our review also has some noted limitations which are important to acknowledge. A limitation of our study is a bias in established measures being reported at higher rates. For example, the NDI and the VAS were initially published in 1991 and 1921 respectively [60, 61]. Comparatively speaking, other disability measures such as the NPQ and the NPAD were published in 1994 and 1999 respectively [62, 63]. For other domains, the SF-36, SF-12, TSK and FBQ were published between 1991–1995 [64,65,66,67]. Therefore, no outcome measures in our study that were included in three or more reviews have been published in the last 20 years.

Additionally, there were PROMs that were not reported in our review that have emerged recently. Patient-Reported Outcomes Measurement Information System (PROMIS) measures were not reported in any of the included studies in this review. The PROMIS PROMs are gaining increasing popularity in clinical practice and research due to their psychometric properties and their ability to compare patient health and treatment outcomes across the continuum of care. A recent systematic review by Young et al. found that the PROMIS-Physical Function(PROMIS-PF) and PROMIS-Pain interference(PROMIS-PI) demonstrate moderate to strong correlations with the NDI, VAS, and SF-12 [68]. Additionally, there is increasing interest in lifestyle behaviors related to neck pain [69,70,71,72]. PROMs related to lifestyle behaviors such as sleep, which has been shown to contribute to neck pain intensity and outcomes, were not found within our review [72].

There were several research gaps that were identified by our study that highlight key areas for future research. Although there was moderate consistency in the reporting of PROMs within the disability and pain intensity constructs, our review found much lower rates of reporting and higher variability within the psychosocial construct. Psychosocial measures represented 32% of all PROMs however these varied greatly (10 total) with 80% (8 out of 10) used in only one review across the 37 included studies. Given the prevalence of psychosocial factors that may influence neck pain intensity [73, 74],prognosis [4, 20, 74, 75], and treatment approaches [38, 76], these data suggest that psychosocial measures are infrequently and inconsistently used when evaluating patients with neck pain. Thus, it seems reasonable to suggest that a gap exists on which measures assessing psychosocial factors are psychometrically supported and valuable in clinical practice, therefore resulting in this mass heterogeneity. This finding suggests a need for future research and specific recommendations for psychosocial PROMs that may be used in clinical practice and research in patients with neck pain.

Another measure which was not found in our review, is the Optimal Screening for Prediction of Referral and Outcome Yellow Flag (OSPRO–YF) tool. Although psychological characteristics can present independently, for example as either depression or anxiety, in patients with chronic pain they often coexist [77]. The evaluation of multiple domains of psychological distress including depression, anxiety, and pain catastrophizing allows for increased effectiveness and efficiency in discriminating between patients who may be at risk for poor outcomes. This comprehensive evaluation also allows for classifying pain phenotypes and identifying those who would benefit from targeted treatment interventions such as cognitive behavioral therapy or psychologically informed treatment [77,78,79]. Due to this, considering a multidimensional tool which evaluates a global psychological profile would inform a clinician if specific targeted interventions would be beneficial for their patient. The OSPRO-YF tool was originally published in 2016 and combines 11 unidimensional psychological questionnaires into 3 domains (negative mood, fear avoidance and positive affect). This single questionnaire has been shown to have good accuracy estimating individual, full-length psychological questionnaire scores for depressive symptoms, anxiety, anger, fear-avoidance beliefs, kinesiophobia, catastrophizing, self-efficacy, and pain acceptance in those with neck pain [80,81,82].

In contrast to the findings within the psychosocial construct, our review found consistency of PROMs within the construct of QoL, with 41% of studies reporting the use of the SF-36 or the shorter version SF-12. However, both measures have associated costs, are lengthy and have high clinician and patient burden. Therefore clinically, we are not able to confidently recommend the SF-36 or 12 without taking these barriers into consideration and understanding contextual factors including the resources that are available to a clinic setting or clinician.

There are several key implications of our findings. First, our study highlights the need for minimal mandates of PROMs that capture the full spectrum of neck pain-related constructs, including psychosocial factors. This has important implications for both clinical practice and research, as a comprehensive understanding of patients’ health as they perceive it is crucial for providing optimal care, facilitating shared-decision making, measuring intervention effectiveness, informing clinical decision making and high quality research. Second, our review identified a core set of patient-reported outcome measures that demonstrate clinical value to clinicians and patients. We provide recommendations of these measures in clinical practice and research settings, aiming to improve the standardization and comparability of patient-reported outcomes across studies and interventions.

Conclusion

There is great variability in PROMs used for patients with neck pain in physical therapy research and clinical practice. Based on these findings, we suggest that future research for neck pain evaluate PROMs that are most reported and psychometrically supported in the literature while considering clinician and patient burden. Based on the findings from this review, in the context of other literature, we recommend a core set of PROMs evaluating disability and pain intensity. This includes the NDI and NPRS or VAS. Assessment of patient QoL is critical, however recommendations for QoL PROMs need to be considered in the context of available resources and administrative burden. The findings from this review provides empirical evidence to assist in informing clinicians and researchers on the use of patient-reported outcome measures for patients with neck pain seeking physical therapy. Further research is needed to confidently recommend a QoL and psychosocial measure for patients presenting with neck pain. Other measures that were not included in this review but should be further evaluated for patients with neck pain are the PROMIS-PF, PROMIS-PI and the OSPRO-YF tool.