Key Summary Points

Why carry out this study?

Patients with prurigo nodularis (PN) have impaired quality of life due to severe itch, but options for quantifying their experiences of itch are limited.

This study aimed to validate the peak pruritus numeric rating scale (PP NRS), a single-item patient-reported outcome measure for assessing maximum itch severity, in patients with PN.

What was learned from the study?

Adults with PN who participated in qualitative interviews reported that itch was their worst symptom overall and demonstrated good understanding of the PP NRS.

In a psychometric evaluation using data from a phase 2 trial in adults with PN, the PP NRS showed good reliability, validity, and responsiveness.

This study confirms that the PP NRS is a content-valid and reliable PRO measure that can be used in clinical trials.

Introduction

Prurigo nodularis (PN) is a chronic, intensely pruritic, inflammatory skin disease characterized by the presence of numerous hyperkeratotic and/or fibrotic nodules, commonly located on the extremities and trunk [1]. When treating PN, the primary goal is to reduce skin inflammation and break the itch–scratch cycle [2]. This goal can be difficult to attain, as there are currently no US Food and Drug Administration (FDA)-approved therapies for PN, and patients are often inadequately managed on off-label therapies, some of which can cause severe side effects [3].

PN has a devastating impact on quality of life [3,4,5]. Compared to people without PN, people with PN are more likely to have comorbidities and use more healthcare resources [6,7,8]. They also have higher rates of anxiety and depression [9] and often suffer from disrupted sleep [10,11,12]. However, PN remains understudied compared to other inflammatory skin diseases, and there is an unmet need for validated patient-reported outcome (PRO) measures to assess the impact of the disease on quality of life and to demonstrate the benefits of novel therapeutics from the patient’s perspective [13].

The peak pruritus numerical rating scale (PP NRS) is a single-item PRO measure for assessing the maximum severity of itch in people with pruritic skin disorders. It has been validated in atopic dermatitis (AD) [14] and plaque psoriasis [15], but not in PN. To determine whether the PP NRS is suitable for use in clinical trials of treatments for improving itch in PN, we evaluated its content validity through qualitative interviews and assessed its psychometric properties using data from a recent phase 2 trial of the humanized anti-human IL-31RA monoclonal antibody nemolizumab in people with moderate-to-severe PN [16]. We also derived a threshold for meaningful within-patient change.

Methods

The human studies reported here were performed in accordance with the Helsinki Declaration of 1964 and its later amendments. Institutional review board/ethics committee approval was obtained (Advarra Inc, protocol #PRE00528446), and all participants provided written informed consent to participate.

Qualitative Interviews

The PP NRS assesses the intensity of itch “at the worst moment during the previous 24 h” on a scale of 0 (“no itch”) to 10 (“worst itch imaginable”). Content validity of the PP NRS was evaluated through qualitative telephone interviews with US adults with PN who scored the PP NRS as ≥ 7 points at screening. Interview participants were recruited via patient associations, patient panels, and social media by convenience sampling. Interviews were conducted in English by researchers experienced in qualitative research (CS, LS, RC, RW; one male, three female) using a semi-structured interview guide. The interviews combined concept elicitation with cognitive debriefing of the PP NRS. Participants were asked which symptoms of PN they suffered from and which of these they considered to be the worst. They then completed a paper copy of the PP NRS and answered questions relating to ease of use and understanding. They also answered questions about the PP NRS score change that would reflect a meaningful improvement, starting from the PP NRS score they recorded during the interview.

The interviews were audio-recorded and transcribed. After removal of any personally identifiable information, the interview transcripts were uploaded, coded, and analyzed by content analysis using ATLAS.ti (ATLAS.ti Scientific Software Development, Berlin, Germany). Quantitative data (including PP NRS responses) were collected on paper forms, which were entered into DataFax (DF/Net Research, Seattle, WA) to convert the data into an electronic format. The data were then analyzed using SAS® version 9.4 (SAS Institute, Cary, NC). The protocol for the interviews was approved by an institutional review board (Advarra, Columbia, MD).

Psychometric Evaluation of the PP NRS

Study Design

Psychometric properties of the PP NRS were analyzed using data from a phase 2 multicenter, randomized, placebo-controlled, double-blinded trial of nemolizumab in adults with severe pruritus (NCT03181503) [16]. Patients received subcutaneous injections of nemolizumab 0.5 mg/kg or placebo at day 1 (baseline), week 4, and week 8. PRO assessments were performed at day 1 and at weeks 4, 8, 12, and 18. The trial protocol was approved by the ethics committee at each participating institution.

Assessments

Patients completed the PP NRS daily in the morning throughout the trial. Other PRO assessments included the Average Pruritus (AP) NRS, AP Verbal Rating Scale (VRS), and PP VRS [17]; Dynamic Pruritus Score (DPS) [18]; and Dermatology Life Quality Index (DLQI) [19] (Table S2 in the supplementary material). Clinician-reported outcomes included the Investigator Global Assessment (IGA) and Prurigo Activity Score (PAS) [20], now known as the Prurigo Activity and Severity Score [21].

Psychometric Analyses

Test–retest reliability was examined using data for “stable” patients, defined on the basis of no score change for the IGA or for the PAS item on number of prurigo lesions in a representative area (hereafter “number of lesions”) between baseline and week 4 or a score change of ≤ 1 point on the AP VRS or PP VRS. Data were analyzed by paired t test, and intraclass correlation coefficients (ICCs) were estimated using a two-way mixed-effects analysis of variance (ANOVA) model with absolute agreement [22, 23]. Agreement based on ICC values was categorized as < 0 = none, 0–0.20 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial, > 0.80 = near-perfect [24].
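
As an illustration of the reliability analysis described above, the following is a minimal sketch rather than the SAS code used in the study. It assumes the single-measurement, absolute-agreement form of the ICC computed from two-way ANOVA mean squares, and it assumes that the stated category bounds are inclusive upper limits. The function names and the example scores are hypothetical.

```python
from typing import Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]]) -> float:
    """Single-measurement, absolute-agreement ICC from two-way ANOVA mean squares.

    `scores` holds one row per patient and one column per time point
    (for example, baseline and week 4). Assumed form, not the study's SAS code.
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of time points
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


def agreement_category(icc: float) -> str:
    """Map an ICC to the agreement categories listed in the text.

    Assumption: each stated upper bound is treated as inclusive.
    """
    if icc < 0:
        return "none"
    if icc <= 0.20:
        return "slight"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "substantial"
    return "near-perfect"


# Hypothetical scores: every patient is exactly 1 point higher at week 4.
example = [[1, 2], [2, 3], [3, 4]]
example_icc = icc_absolute_agreement(example)
example_label = agreement_category(example_icc)
```

Because absolute agreement penalizes systematic shifts between time points, the hypothetical example yields an ICC below 1 even though the rank order of patients is unchanged.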

Convergent and divergent validity was assessed by calculating Spearman’s rank-order correlation coefficients between PP NRS average weekly score and PP VRS, AP VRS, and AP NRS average weekly scores; DLQI total score; DLQI item 1 (itch) score; IGA score; and scores for the PAS items on number of lesions, percentage of prurigo lesions with excoriations/crusts (hereafter “lesions with excoriations/crusts”), and percentage of healed prurigo lesions (hereafter “healed lesions”). Spearman’s rank-order correlation coefficients were calculated for IGA and PAS number of lesions scores to assess divergent validity, and for PP VRS, AP VRS, AP NRS, and DLQI scores to assess convergent validity. Correlations (r) were categorized as < 0.30 = weak, ≥ 0.30 to < 0.50 = moderate, and ≥ 0.50 = strong [25]. It was hypothesized that PP NRS score would have moderate to strong correlations with the PROs (which measure similar constructs) and weak correlations with the clinician-reported outcomes (which measure more distal constructs).
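
The correlation step can be sketched as follows. This is an illustrative implementation, not the study's SAS procedure: it computes Spearman's coefficient as the Pearson correlation of average ranks and applies the stated cut-offs to the absolute value of r, which is an assumption made so that negative coefficients are classified by magnitude. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank-order correlation as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_strength(r: float) -> str:
    """Classify r with the cut-offs in the text, using |r| (assumption)."""
    magnitude = abs(r)
    if magnitude < 0.30:
        return "weak"
    if magnitude < 0.50:
        return "moderate"
    return "strong"
```

For example, `correlation_strength(spearman_rho(pp_nrs, dlqi_total))` would label the association between two hypothetical score lists `pp_nrs` and `dlqi_total`.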

Known-groups validity was assessed by comparing baseline PP NRS scores between subgroups of patients categorized according to PP VRS, AP VRS, and AP NRS average weekly scores, as well as DLQI total score, DLQI item 1 score, IGA score, and PAS item scores. Subgroups were compared by ANOVA adjusted for multiple comparisons by the Scheffé method [26].
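
The known-groups comparison can be illustrated with a one-way ANOVA followed by Scheffé pairwise comparisons. The sketch below is an assumption about how such a comparison may be set up, not the study's SAS code. It computes the F statistic and, for each pair of subgroups, the Scheffé statistic (the squared mean difference scaled by the within-group mean square and divided by k − 1), which is compared with a critical F value that the caller supplies from an F table. Function names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (F statistic, within-group mean square) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within


def scheffe_pairwise(
    groups: Sequence[Sequence[float]], f_critical: float
) -> Dict[Tuple[int, int], Tuple[float, bool]]:
    """Scheffé test for every pair of groups.

    `f_critical` is the critical value of F with (k - 1, N - k) degrees of
    freedom at the chosen significance level; it must be looked up by the caller.
    Returns {(i, j): (statistic, is_significant)}.
    """
    k = len(groups)
    _, ms_within = one_way_anova(groups)
    means: List[float] = [sum(g) / len(g) for g in groups]
    results = {}
    for i, j in combinations(range(k), 2):
        diff = means[i] - means[j]
        scale = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
        statistic = diff ** 2 / scale / (k - 1)
        results[(i, j)] = (statistic, statistic > f_critical)
    return results


# Hypothetical baseline scores for three severity subgroups.
subgroups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat, _ = one_way_anova(subgroups)
# 5.14 is the tabulated 5% critical value of F with 2 and 6 degrees of freedom.
pairwise = scheffe_pairwise(subgroups, f_critical=5.14)
```

Dividing each pairwise statistic by k − 1 is what makes the Scheffé procedure conservative relative to unadjusted pairwise tests.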

To analyze responsiveness, we first calculated Spearman’s rank-order correlation coefficients between change from baseline to week 4 in PP NRS average weekly score and changes from baseline to week 4 for other outcome measures. Outcome measures for which r was ≥ 0.3 were included as anchors in the responsiveness analysis. For each anchor, ANOVA was used to compare changes in PP NRS average weekly scores from baseline to week 4 between patients whose score on the anchor was improved and those whose score was unchanged or worsened. Effect sizes and standardized response means were calculated.
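
The text does not give formulas for the effect size or the standardized response mean. The sketch below assumes the conventional definitions: the effect size divides the mean change by the SD of baseline scores, and the standardized response mean divides the mean change by the SD of the change scores. Function names and example values are hypothetical.

```python
import statistics
from typing import List, Sequence


def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change from baseline (negative values indicate less itch)."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the SD of baseline scores (assumed definition)."""
    changes = change_scores(baseline, follow_up)
    return statistics.mean(changes) / statistics.stdev(baseline)


def standardized_response_mean(
    baseline: Sequence[float], follow_up: Sequence[float]
) -> float:
    """Mean change divided by the SD of the change scores (assumed definition)."""
    changes = change_scores(baseline, follow_up)
    return statistics.mean(changes) / statistics.stdev(changes)


# Hypothetical weekly average scores for four patients in an "improved" subgroup.
baseline_scores = [8, 9, 7, 10]
week4_scores = [4, 6, 3, 5]
es = effect_size(baseline_scores, week4_scores)
srm = standardized_response_mean(baseline_scores, week4_scores)
```

Both statistics are negative when scores fall, matching the sign convention of the changes reported for the PP NRS.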

Statistical Procedures

Analyses were conducted with SAS version 9.4 using data from all randomized patients who contributed baseline data for the PP NRS. All statistical tests were at a two-sided significance level of 0.05. Missing data were not imputed.

Meaningful Within-Patient Change

As recommended by the FDA [27], anchor-based methods were used in the primary quantitative analysis to estimate meaningful within-patient change thresholds for the PP NRS. Estimates were derived on the basis of the mean change in PP NRS score from baseline to week 4 in subgroups of patients who experienced meaningful change on other outcome measures. The anchor-based analyses were supplemented using distribution-based methods to calculate 0.5 × standard deviation (SD) and the standard error of measurement (SEM = SD × √[1 − ICC], where the ICC was calculated using data for patients with no change in PP VRS score between baseline and week 4). Anchor- and distribution-based estimates were triangulated with meaningful improvement data from the qualitative interviews to derive a threshold for meaningful within-patient change.
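
The anchor-based and distribution-based calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify which SD enters the SEM formula, so the SD is passed in by the caller, and the rule for flagging meaningful change on an anchor is represented by a hypothetical list of booleans. All names and values are illustrative.

```python
import math
import statistics
from typing import Sequence


def anchor_based_estimate(
    changes: Sequence[float], anchor_improved: Sequence[bool]
) -> float:
    """Mean change in score among patients flagged as meaningfully changed on an anchor."""
    selected = [c for c, flag in zip(changes, anchor_improved) if flag]
    return statistics.mean(selected)


def half_sd(scores: Sequence[float]) -> float:
    """Distribution-based estimate: 0.5 x sample SD of the scores."""
    return 0.5 * statistics.stdev(scores)


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD x sqrt(1 - ICC), as given in the text."""
    return sd * math.sqrt(1 - icc)


# Hypothetical change scores and anchor flags for four patients.
changes = [-4, -1, -3, 0]
flags = [True, False, True, False]
anchor_estimate = anchor_based_estimate(changes, flags)

# Hypothetical inputs chosen for easy arithmetic, not values from the trial.
sem = standard_error_of_measurement(sd=1.0, icc=0.75)
```

Distribution-based values of this kind describe measurement precision rather than patient-perceived benefit, which is why they supplement rather than replace the anchor-based estimates.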

Results

Qualitative Interviews

Participants

The 21 interview participants had a mean age of 53.1 years (range 27–76) and 15 (71%) were female. Eighteen participants (86%) were White, two (10%) were Black, and one (5%) had mixed Black/Asian heritage. Only one participant was newly diagnosed with PN (within the previous 6 months). Most participants rated the severity of their PN at the time of the interview as either moderate (n = 7/21, 33%) or severe (n = 10/21, 48%).

Concept Elicitation

Itch, pain, bleeding/scabbing, and dry skin were reported by all participants. Other frequently reported symptoms were lumps/bumps (20 participants, 95%), having a crust on the skin (20 participants), burning (19 participants, 90%), stinging (19 participants), lesions/sores (18 participants, 86%), and skin discoloration (18 participants). Of the 18 participants who indicated what their worst symptoms were, 15 (83%) mentioned itch as their worst symptom or one of their worst symptoms. Many participants described their itch using terms such as “painful”, “excruciating”, and “sore”. Others described it as “uncontrollable”, “inescapable”, “intense”, or “extreme”. A few described it as “uncomfortable” or “annoying”.

Of the 15 participants who described the duration of their itch, 8 (53%) reported that it was “constant”, “present all the time,” or lasted “all day”. Three participants (20%) reported that the duration varied or was inconsistent or unpredictable. Fourteen of the 17 participants who discussed the frequency of their itch (82%) stated that they experienced it every day. The other three participants reported having itch “4 days per week”, “several times per week”, and “2 to 3 days per month”. The most frequently reported locations of itch were the arms (8 of 13 participants who reported on location, 62%) and legs (5 participants, 38%). Three participants (23%) described experiencing itch all over their bodies.

Cognitive Debriefing of the PP NRS

When asked to score the PP NRS during the interview, 7 participants (33%) recorded a score of 10, 3 participants (14%) a score of 9, 7 participants a score of 8, 3 participants a score of 7, and 1 participant (5%) a score of 6. The mean (SD) score was 8.6 (1.2).

Most participants (n = 20/21, 95%) demonstrated that they understood the intended meaning of the PP NRS and had no issues with comprehension. All 14 participants who responded when probed further on their understanding of “at the worst moment” demonstrated understanding of this part of the question.

Eighteen of the 19 participants who provided feedback on the clarity of the PP NRS (95%) reported that it was clear and indicated that there was nothing that was difficult to understand or respond to. One participant (5%) expressed that it would be difficult to “describe something that would be in the middle of the scale.” All 16 participants who responded to probing on the “no itch” anchor and all 13 who responded to probing on the “worst itch imaginable” anchor were able to describe the anchor as intended and had no issues with the meaning.

Eighteen of the 20 participants who described their understanding of the 24-h recall period (90%) described it as the past day or as the period from the same time the previous day. Two participants misunderstood the recall period as “Just a few hours ago, when I was having an episode” or “when […] my family were yelling at me for touching my skin”.

Nineteen participants (90%) reported having no issue with selecting a response. Some participants explained that it was “easy” and that there was nothing confusing about the scale. However, two participants (10%) described having an issue when selecting a response. One stated that “it’s so hard to put a number on the itch” and the other explained how distractions during the day meant that they could “more easily give […] a range than […] a certain number”.

Psychometric Evaluation

Patients

The present analysis included 67 of 70 patients randomized in the phase 2 trial (Table 1). Mean (SD) age was 55.7 (16.0) years. Sixty percent of patients were female and most (97%) were White. Most patients (58%) had fewer than 100 nodules on their body. IGA score at baseline was 3 (moderate) for 52% of patients and 4 (severe) for 48%. The mean (SD) number of prurigo lesions in a representative area was 20.2 (16.0). Three-quarters of patients had more than 50% prurigo lesions with excoriations/crusts, and more than half of patients had less than 25% healed lesions.

Table 1 Demographics and baseline clinical characteristics of the phase 2 trial sample

Psychometric Properties

Test–retest reliability of the PP NRS was substantial based on PROs and slight to fair based on clinician-reported outcomes: ICC values for PP NRS scores at baseline and week 4 were 0.73 in patients classified as stable by PP VRS average weekly score, 0.76 by AP VRS average weekly score, 0.20 by PAS number of lesions, and 0.25 by IGA score (Table 2).

Table 2 Test–retest reliability of the PP NRS in stable patients

The PP NRS demonstrated convergent and divergent validity. At baseline, PP NRS average weekly score was strongly correlated with PP VRS average weekly score (r = 0.75), AP VRS average weekly score (r = 0.73), DLQI item 1 score (r = 0.54), and AP NRS average weekly score (r = 0.85); moderately correlated with DLQI total score (r = 0.36) and IGA score (r = 0.33); and weakly correlated with PAS item scores (r = − 0.09 to 0.05) (Table 3).

Table 3 Convergent and divergent validity of the PP NRS at baseline

PP NRS average weekly scores were higher in subgroups of patients with worse itch based on PP VRS, AP VRS, and AP NRS scores than in subgroups with less severe itch (p < 0.0001) (Table 4). Known-groups validity was also demonstrated on the basis of DLQI total score (p = 0.0054), DLQI item 1 score (p < 0.0001), and IGA score (p = 0.0027), but not based on PAS item scores.

Table 4 Known-groups validity of the PP NRS at baseline

A preliminary analysis was conducted to identify outcome measures to include in the responsiveness analysis (those for which r was ≥ 0.3). Change in PP NRS average weekly score from baseline to week 4 was strongly correlated with changes in PP VRS average weekly score (r = 0.88), AP VRS average weekly score (r = 0.84), DLQI item 1 score (r = 0.69), IGA score (r = 0.55), and DPS (r = − 0.65) (Table S1 in the supplementary material). The correlation with PAS number of lesions was weak (r = 0.29). PAS number of lesions was therefore excluded from the responsiveness analysis.

The PP NRS demonstrated responsiveness based on changes in other outcome measures with correlation coefficients ≥ 0.3 (Table 5). Decreases in mean PP NRS weekly average score were significantly higher in patients classified as “improved” (− 3.8 to − 4.8) than in those classified as “worsened/no change” (− 0.2 to − 2.1; p = 0.0010 for PAS lesions with excoriations/crusts, p < 0.0001 for all other outcome measures). Effect sizes were also higher for “improved” subgroups (− 3.7 to − 4.7) than for “worsened/no change” ones (− 0.2 to − 2.0).

Table 5 Responsiveness of the PP NRS from baseline to week 4

Meaningful Within-Patient Change

Qualitative Approach

Participants were asked to consider the PP NRS score change they would view as meaningful, starting from the score they registered during the interview. Of the 18 participants who reported on the smallest score change they would be satisfied or content with when receiving a new treatment, one (6%) indicated that they would be satisfied or content with a 1-point improvement, two (11%) a 2-point improvement, two (11%) a 3-point improvement, and 13 (72%) a more than 3-point improvement (Fig. S1 in the supplementary material). The average smallest score improvement they would be satisfied or content with when receiving a new treatment was 4.9 points.

Twenty of the 21 participants were further probed on meaningfulness of 1-, 2-, and 3-point score changes. Three-quarters of participants (n = 15/20, 75%) indicated that a 3-point improvement would be meaningful. They described that a 3-point improvement would mean “less itch on [their] skin”, “fewer lesions”, or “no burning and no itching”. Most participants (n = 14/20, 70%) expressed that a 2-point improvement would be meaningful. They explained that this would mean “less itch”, “itch severity and nodules reduced”, “less discomfort”, “feeling more comfortable to go out with friends”, or “be[ing] happy”. Only a minority of participants (n = 8/21, 38%) indicated that a 1-point improvement would be meaningful. Participants expressed that a 1-point improvement would mean “feeling more comfortable” or “experiencing more calmness”, or that it would be “encouraging knowing that your itch is getting milder”.

Quantitative Approach

In anchor-based analyses, meaningful within-patient change for PP NRS score was estimated as − 2.2 based on the PP VRS, − 2.4 based on the AP VRS, − 4.6 based on the IGA, − 2.0 based on the DPS, and − 1.9 based on the DLQI (Table 6). Distribution-based estimates for the PP NRS were 0.49 for 0.5 × SD and 0.52 for SEM.

Table 6 Responder definitions for change in PP NRS from baseline to week 4

Triangulation of Estimates

Results of the anchor-based, distribution-based, and qualitative analyses were triangulated to obtain a range of thresholds for meaningful within-patient change on the PP NRS. The results of this triangulation suggested that a 2- to 5-point decrease in PP NRS score is a meaningful improvement for the target population.

Discussion

PN is characterized by an intractable itch–scratch cycle that can ruin patients’ lives [2, 12, 28]. Validated PRO measures for assessing itch [11, 12] are needed to properly evaluate the patient benefit of new therapeutic options. Here, we conducted a qualitative and psychometric evaluation of the PP NRS as part of a stepwise strategy for establishing its suitability and fitness for purpose for clinical trials in PN.

Participants in the qualitative interviews experienced myriad skin symptoms, the worst of which was itching. They demonstrated a good understanding of the PP NRS, including the recall period, anchors, and response scale. Moreover, most participants expressed that the PP NRS was clear and that it was straightforward to select a response.

The qualitative study found that a 3-point reduction in PP NRS score would be a meaningful improvement for most participants. However, the results also suggested that a 3-point improvement would not necessarily be a change that patients would be satisfied or content with. Indeed, 72% of participants would expect a change of more than 3 points on the PP NRS with a new treatment.

The PP NRS had good test–retest reliability, with ICC values for the PP VRS and AP VRS above the standard threshold of 0.70 [29]. Reliability based on PAS number of lesions and IGA scores was weaker, although this was not surprising, because the aspects of PN these clinician-reported outcomes capture are more closely related to disease activity than to itch. The PP NRS also had good construct validity, and known-groups validity was demonstrated on the basis of PROs and some but not all clinician-reported outcomes. For each of the anchors included in the responsiveness analysis, the PP NRS was able to differentiate between patients whose score on the anchor improved and those whose score was unchanged or worsened.

The identified 2- to 5-point meaningful within-patient change threshold is aligned with the threshold of a 4-point decrease in PP NRS score derived from anchor-based analyses in adults with moderate-to-severe AD [14] and plaque psoriasis [15]. The 4-point threshold has been accepted by European and US regulators in other skin conditions as a meaningful within-patient change threshold to demonstrate treatment effect [30,31,32,33].

Until recently, validated PRO measures for assessing itch in PN were not available. However, the worst itch (WI) NRS [34] has now been validated using data from a placebo-controlled trial of the small-molecule inhibitor serlopitant in patients with treatment-refractory PN [35]. The PP NRS and WI NRS have similar response scales and recall periods and appear to be conceptually equivalent based on cognitive debriefing and usability testing in US adults with moderate-to-severe PN [36]. However, they differ in their wordings: the PP NRS uses itch “at the worst moment” and the WI NRS uses itch “at its most intense”. For economic modelling and comparative effectiveness analyses, an equivalence study would be useful to provide evidence that the two itch NRSs measure the same concept and provide comparable data.

Limitations of the present study include the small sample sizes (21 interview participants and 67 trial patients). Also, the 4-week time interval in the test–retest analysis is longer than the typical 1 to 2 weeks between assessments [37], although the effect of this difference is likely conservative. Finally, the lack of analyses by sex potentially limits the generalizability of our findings, although the female predominance in our study samples is aligned with the higher prevalence of PN in women compared to men [1, 9].

Conclusion

This study provides strong evidence that the PP NRS is a content-valid and reliable PRO measure that can be used by adults with PN in clinical trials. Moreover, it provides a meaningful within-patient change threshold estimate for the PP NRS, based on triangulation of quantitative estimates and meaningful improvement data derived from a qualitative analysis.