Background

Patient-reported outcome (PRO) instruments are important for assessing the symptoms and/or impact of symptoms on patients’ lives [1,2,3,4,5,6]. Part of developing or adapting a PRO instrument includes the selection of a response scale (e.g., verbal rating scale [VRS], numeric rating scale [NRS], visual analogue scale [VAS]). Despite the importance of response scale selection for PRO instruments, there seems to be little empirical basis for the type of response scale selected, the response options, the visual orientation of the scale (i.e., vertical vs. horizontal), or the response scale verbal anchors. When considering a response scale for a PRO instrument, it is important to consider the context of use including population of interest, therapeutic area, and study implementation. Such considerations are central because response scales contribute to the precision as well as the performance of an instrument in the clinical trial setting, such as the ability to detect changes with treatment [1, 7].

Response scale selection is a critical aspect of PRO instrument development and has broad downstream implications for the usability of the measure from the patient perspective, the level of precision with which the construct of interest is measured, and the quantitative properties of the outcome score including range, standard deviation, scoring, and score interpretation guidelines, as well as the responsiveness of the measure to detect change [8]. Additional complicating factors such as placement and exact wording of response anchors, cultural comparability/translatability of the format and wording, and ability to migrate the scale to various modes of data collection (paper, electronic) should also be considered.

Pain is subjective in nature and is measured by patient report of intensity, among other subjective qualities. There is an abundance of existing literature on pain measurement, from expert opinions to consensus guidelines to empirical studies, all of which compare the suitability of response scale types within a particular context of use. In the PRO assessment field, 5- and 7-point VRS, 11-point NRS (particularly recommended for use in pain measurement but used in other areas as well [9]), and 10 cm/100 mm VAS are commonly used for adult assessments. VAS, NRS, and VRS response scales are generally reliable and valid, and are usually part of primary outcome measure(s) in clinical trials of chronic pain treatments. For these reasons, it is important to consider the empirical evidence as well as precedent, especially when considering the development of a new instrument.

Although response scales in pain measurement are well-explored, the existing literature should be further compared and analyzed for key themes and recommendations. Existing Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) guidelines and review articles indicate the NRS as the gold standard in measurement of pain intensity [9, 10]; however, much of the key research was published prior to the 2009 issuance of the Food and Drug Administration’s (FDA) Guidance for Industry titled “Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims” [1]. As such, a review to determine the state of the evidence was necessary in order to guide the ongoing research at the Critical Path Institute’s PRO Consortium [11], as well as to inform the PRO assessment field during a period of expanding instrument development. The purpose of this paper is to describe a comprehensive review of the scientific literature aimed at identifying response scale options common in pain measures for the adult population, to summarize the empirical evidence in support of response scale selection, and to provide recommendations for this context of use.

Methods

Search procedure, terms, and strategy

The literature review was part of a larger study to summarize the available empirical evidence to support response scale selection during the development of new PRO instruments. The targeted search strategy included formal guidelines or review articles on the selection of response scales and response scale methodology (not specific to PRO instruments) and evidence on the selection of response scales for use in PRO instruments. Evidence was assembled and collated based on pre-determined categories such as selection of response scales for use in PRO instruments based on therapeutic area (e.g., pain). This paper focuses on the identification and review of literature that addressed response scale selection and use for PRO-based pain assessment in adults. To provide a comprehensive review, references included review articles, consensus guidelines, and expert opinion pieces as well as primary research studies (from the last 10 years) on response scale choice for pain assessment. The search databases included Embase and MEDLINE (2004–2014; only articles published in English) and presentation abstracts from two recent annual meetings/conferences of the International Society for Pharmacoeconomics and Outcomes Research (2013–2014) and International Society for Quality of Life Research (2012–2013). The types of included references were PRO instrument articles with information on response scale selection in adults. Additional supplementary searches (e.g., reviewing reference lists of articles that met the initial search criteria) were also conducted to ensure no relevant publications were missed. These supplementary sources were not limited by publication date, and included the reference lists of key articles, publications not included in the search databases, and websites for major PRO-related working groups and consortia (e.g., PROMIS, NIH Toolbox, Medical Outcomes Study, Neuro-QoL, ASEQ-ME, EORTC, EuroQol Group, and FACIT Measurement System).

For abstracts and full-text articles, the search results were examined by two independent reviewers to determine whether the article should be included in the review; in the case of non-agreement, a third senior reviewer made the final judgement. Articles were excluded if they provided no direct or indirect evidence relevant to the search objectives, were not applicable to PRO instrument development, or addressed a therapeutic area not pre-specified for inclusion. Once articles fitting the search criteria were identified, the relevant data were extracted and summarized. The extraction tables included data on the study objective, study design, study population, therapeutic area, name of PRO instrument, type of response scale, and empirical evidence for response scale selection. To further characterize the quality of included articles, several different efforts were employed. First, included articles were categorized as providing either direct evidence or indirect evidence. Direct evidence was defined as evidence that provided a direct answer to the research question of interest. For example, direct evidence articles compared empirically the relative robustness or merits of two different response scale types within the same study/population. Indirect evidence was defined as important, relevant evidence that should be considered in the review and overall conclusions, but that does not directly answer the research question or hypothesis. Second, an assessment was made of the quality of the data presented and the strength of results and recommendations for each article included in the review. Each article was assigned a grade based on the type of article and strength of the data, from A (primary research; reflects the strongest empirical evidence for response scale recommendation) to D (review or expert opinion; reflects the weakest empirically-based evidence). 
In summary, the methods were comprehensive and covered several search objectives; the detailed literature review methods and processes are summarized elsewhere [12].
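
As an illustration only, the two-reviewer screening rule described above can be sketched in Python. The function name `screening_decision` and its parameters are hypothetical and were not part of the review; the sketch assumes that each reviewer's judgement is a simple include/exclude decision.

```python
from typing import Optional


def screening_decision(reviewer_a: bool,
                       reviewer_b: bool,
                       senior_reviewer: Optional[bool] = None) -> bool:
    """Return True if an article is included in the review.

    reviewer_a, reviewer_b: independent include (True) / exclude (False) votes.
    senior_reviewer: final judgement, needed only when the two votes differ.
    All names here are illustrative assumptions, not the authors' tooling.
    """
    # When the two independent reviewers agree, their shared vote stands.
    if reviewer_a == reviewer_b:
        return reviewer_a
    # On disagreement, the third (senior) reviewer makes the final judgement.
    if senior_reviewer is None:
        raise ValueError("reviewers disagree; a senior reviewer judgement is required")
    return senior_reviewer
```

In this sketch the senior reviewer's judgement is consulted only when the two independent reviewers disagree, which mirrors the procedure described in the text.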

Results

Our review of the literature providing an empirical basis for response scale selection yielded pain (adult population) as the most prevalent area (Gries K, Safikhani S, Pease S, Harrington M, Rudell K, Berry P, Crescioni M, Patel M: Comprehensive literature review to characterize response scale types in patient-reported outcome measures, Unpublished observations). As a result of the abstract review, we identified 18 review articles/consensus guidelines/expert opinion pieces and 38 primary research studies for full text review, providing insights into the optimal response scale selection for pain assessment (Fig. 1). Further examination of the full text articles narrowed these down to 14 and 29, respectively.

Fig. 1 Screening and review process for pain-related references*. *The top row covers all initial seven search objectives and proceeds accordingly

Response scale types used in adult pain studies

In the adult population, there were 14 review articles/consensus guidelines/expert opinion pieces and 29 primary research studies, providing insights into optimal response scale choice for pain assessment. There were seven individual response scales identified in this comprehensive review across all search objectives (i.e., VAS, VRS, NRS, Faces, other graphical, binary, and Likert or Likert-type scales). Based on review, it appears that adult pain studies have historically used simple single-item measures of pain with a VAS, NRS, or VRS as their primary outcome, albeit sometimes in the context of a multi-item instrument (e.g., the Brief Pain Inventory). There was no widely accepted standard for clinical pain assessment that would facilitate the comparison of response scale performance across trials (Table 1). Further descriptions of the response scale types are summarized elsewhere (Gries K, Safikhani S, Pease S, Harrington M, Rudell K, Berry P, Crescioni M, Patel M: Comprehensive literature review to characterize response scale types in patient-reported outcome measures, Unpublished observations).

Table 1 Summary of key evidence to support response scale selection in pain

Summary of key evidence regarding numeric rating scales in the adult pain population

In reviewing the 29 articles providing direct empirical evidence, we identified 11 articles that directly compared the NRS with other types of response scales in either a validation study or a direct methodological comparison (Table 2). The following observations regarding the NRS suggest that the NRS is potentially superior to the VRS or VAS. The NRS was preferred over the VAS because it was easier to administer and score [13]; was preferred by patients [14]; was found to be more responsive [15]; and had both higher patient acceptability and better psychometric properties [16]. The NRS was found to be more responsive than a six-point VRS [17]; the most feasible and discriminative self-report scale as compared to a VAS or 5-point VRS [18]; and more sensitive and discriminative than a binary scale [19]. One study that compared an NRS to a Faces scale (a type of graphical scale that uses photographs or pictures showing a continuum of facial expressions) found both scales to be equally well understood [20], although a population of Swahili-speaking patients preferred the Faces scale over the NRS [21].

Table 2 Summary of key evidence to support the numeric rating scale (NRS) in pain

Summary of key evidence regarding visual analog scales in the adult pain population

Seven studies directly compared the use of the VAS with other response scales. The following observations were made to support the use of the VAS. The VAS was found to be more sensitive than a four-point VRS [22]; a VAS with intermediate verbal anchors enhanced scale comprehension compared to a traditional VAS [15]; and the VAS was found to be equally well accepted and understood as a Faces scale [23]. One study found that a four-point VRS had lower scale failure rates (i.e., one or both tests were not completed correctly) than a VAS [24]. Table 3 summarizes the key evidence regarding VAS in the assessment of pain for adults.

Table 3 Summary of key evidence to support the visual analog scale (VAS) in pain

Summary of key evidence regarding verbal rating scales in the adult pain population

Three studies provided empirical evidence regarding the direct comparison of the VRS with the NRS and VAS. The following observations were made to support the use of the VRS. In 25 patients with chronic low back pain, intra-rater correlations between a VRS and VAS showed no significant difference between raters, thus demonstrating appropriate reliability [25]. A five-point VRS and VAS were administered in a cross-sectional study of patients with chronic, nociceptive, and neuropathic pain [26]. The VAS cut-off positions relative to the VRS labels and non-linear properties indicated different meanings of the rated pain intensity [26]. The VRS, NRS, and VAS were all sensitive to change for pain assessment, but the VAS was more difficult for patients in the study to understand [27]. The VRS showed substantial discrimination between pain words and was not dependent on the level of education of the patients [27]. Table 4 summarizes the key evidence regarding VRS in the assessment of pain for adults.

Table 4 Summary of key evidence to support the verbal rating scale (VRS) in pain

Summary of results in pain

Several review articles support the usefulness, reliability, and validity of each of the scale types (i.e., the VAS, VRS, NRS) for the self-assessment of pain in adults [28, 29]. The majority of the reviewed articles, 7 out of 13, identified the NRS as the most appropriate response scale for the assessment of pain in adults, including the IMMPACT guidelines for pain measurement in clinical trials and the National Institutes of Health Toolbox pain assessment [9, 10, 30,31,32,33].

The NRS circumvents problems with translation, administration, and scoring that may occur with paper administration of the VAS, for example when respondents draw ambiguous lines. Electronic administration of the VAS may now avoid some of the issues stemming from paper administration; however, potential issues remain because screen sizes differ, specifically that the length of the VAS line may vary across devices.
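
To make the device-dependence concrete, the following minimal Python sketch shows one way a VAS mark could be scored as a proportion of the rendered line rather than as an absolute distance. The function `vas_score` and its parameter names are hypothetical and do not come from any reviewed instrument; the 0 to 100 scoring range is assumed by analogy with the 100 mm paper VAS.

```python
def vas_score(mark_position: float, line_length: float) -> float:
    """Convert a mark on a VAS line to a 0-100 score.

    mark_position: distance of the mark from the left anchor.
    line_length: rendered length of the line, in the same unit
        (e.g., millimetres on paper, pixels on a screen).
    Both names are illustrative assumptions.
    """
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    if not 0 <= mark_position <= line_length:
        raise ValueError("mark_position must lie on the line")
    # Scoring the proportion of the line makes the result independent
    # of how long the line happens to be on a given device.
    return 100.0 * mark_position / line_length


# A mark 35 mm along a 100 mm paper line and a mark 210 px along a
# 600 px on-screen line both yield the same score.
paper = vas_score(35, 100)
screen = vas_score(210, 600)
```

Proportional scoring of this kind removes only the arithmetic dependence on line length; it does not establish that respondents use lines of different lengths in an equivalent way, which remains an empirical question.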

Another consideration is that a VAS is not appropriate for telephone interview-based or interactive voice response (IVR) system-based data collection. The VAS relies on a respondent’s ability to view the VAS and mark the position on it (via a pencil/pen, touch with a stylus or finger, or a mouse click) that represents his or her answer to the specific question. Hence, the VAS is limited to paper and screen-based electronic modes of data collection that enable that to occur.

In summary, both types of articles, reviews and empirical studies, provide evidence that the 11-point NRS is likely the optimal self-report response scale to evaluate pain among adult patients without cognitive impairment. Furthermore, the 11-point NRS is the easiest to implement and utilize on electronic data collection systems that time- and date-stamp the respondent’s data entry, which is recommended within FDA’s PRO Guidance for supporting PRO-based labelling claims [1].

Discussion

In this study, we conducted a comprehensive evaluation of the scientific literature to identify response scale option types for pain studies and to review and summarize available empirical evidence for each type of response scale by context of use, to enhance response scale selection for newly developed self-reported pain instruments in the adult population. With the publication of the IMMPACT guidelines in 2005 [9], the 11-point (i.e., 0 to 10) NRS was recommended as a core outcome measure of chronic pain treatment trials. Subsequently, the NRS became the gold standard for pain assessment in clinical trials; however, it is by no means universal. For example, some analgesics may have differential efficacy across pain types, and neuropathic and musculoskeletal pain demonstrate different characteristics. Nevertheless, extensive empirical evidence has been generated since the 2005 publication of the IMMPACT guidelines, and our review demonstrated that the recommended NRS response scale still appears to have the best performance when directly compared to other commonly used response scale types in the self-assessment of pain. NRS measures tend to be preferred over VAS measures by patients [9]. VAS measures often result in more missing and incomplete data than NRS measures; this is most likely due to the easier-to-understand and less abstract nature of NRS measures [9]. Our findings suggested that pain assessment typically consists of single-item questions, which typically use either an NRS or a VAS.

The 2014 FDA Guidance for Industry, “Analgesic Indications: Developing Drug and Biological Products” [34], provides recommendations to sponsors on the development of prescription drugs that are the subject of new drug applications for the management of acute and chronic pain as well as the management of breakthrough pain. The FDA recommends the use of a well-defined and reliable PRO instrument to assess the subject’s pain intensity. Because pain is a subjective experience, the choice of an adequate instrument to measure the primary endpoint is critical to demonstrating the efficacy of an analgesic. Therefore, it is important to consider whether a well-defined and reliable instrument already exists or can be developed for an analgesic study. It is also important that measures be based on scales or instruments that have been adequately developed for use in the population to be studied and that the instruments are appropriate for use in the setting of a clinical trial to measure change over time. The FDA’s recommendations highlight the importance of choosing an adequate response scale when selecting a pain instrument for a clinical trial.

Of course, this study has limitations. First, this review does not address the considerations for measuring the various phenotypic manifestations of pain. Pain can be categorized according to its duration, acute or chronic, as well as other characteristics, such as breakthrough pain and episodes of acute pain that occur in the context of otherwise well-controlled, chronic pain. The recommendations emerging from this review speak broadly to the concept of pain and do not reflect an examination of those intricacies due to the lack of empirical evidence in the literature. In addition, this review did not discriminate between the choice of response scale for a single stand-alone item or for more than one item in a multi-dimensional pain inventory. Also, the review and merit of multi-domain PRO instruments which include pain were not included in this review. Nor does this review address the question of whether pain should be measured unidimensionally or across multiple domains.

Further, this review does not attempt to address other aspects of pain endpoints, such as recall period or worst vs. average pain intensity that may impact instrument validity. The PRO instrument’s recall period for assessing pain should be appropriate for the type of pain studied and the planned study design. The FDA recommends use of an instrument that asks the subject to assess his or her worst pain over a relatively short time period, and no longer than the past 24 h, with the assessment occurring at the same time [34].

Additionally, this review focused on adult pain and does not attempt to address other populations (e.g., cognitively impaired, pediatric), or other reporters (e.g., observer- or clinician-rated). In cases of young children or subjects who cannot provide self-report, observers (e.g., parents, caregivers) can report on observable indicators (e.g., events, behaviors, or signs) of pain (e.g., wincing, crying, or squirming). However, an observer cannot validly rate a subject’s pain intensity, and the FDA does not consider an instrument that requires an observer to do so to be well-defined or reliable [34].

Lastly, this literature review was conducted in early 2015 and was limited to articles published in English during a 10-year timespan from 2004 through 2014. Although a brief scan of the literature since 2014 did not find key evidence that would change the conclusions, the authors recognize that ending the structured review at 2014 is a limitation of this manuscript.

Conclusions

Overall, the literature supports the NRS as the preferred scale for pain measurement; the available empirical evidence demonstrates its superior performance to other response scale types in this context of use. It is simple, straightforward, and easy to interpret. This review includes evidence from a wide range of painful conditions, including acute pain as well as chronic neuropathic and musculoskeletal indications, suggesting a relatively broad generalizability of the findings. Recommendations for future research include well-designed head-to-head comparisons of response scales used in pain measures, structured reviews of pain assessment in other populations (e.g., pediatric, geriatric), and further exploration of the measurement considerations for the various phenotypic manifestations of pain.