Key Summary Points

Why carry out this study?

Patients with prurigo nodularis (PN) often report sleep impairment, in many cases because of severe night-time itching.

However, there are no validated patient-reported outcome (PRO) measures for quantifying patient experiences of sleep disturbance in PN.

This study aimed to validate the single-item Sleep Disturbance Numerical Rating Scale (SD NRS) in PN.

What was learned from this study?

Qualitative interviews showed that the SD NRS is a clear and well-defined tool for assessing sleep disturbance in adults with PN, and psychometric analyses of clinical trial data showed that it is a reliable, valid, and responsive measure of day-to-day fluctuations in PN-related sleep disturbance.

This study confirms that the SD NRS is a fit-for-purpose PRO measure for assessing treatment outcomes in PN trials.

Introduction

Prurigo nodularis (PN) is an inflammatory skin disease characterized by chronic pruritus and multiple hyperkeratotic nodules on the extremities and trunk [1]. While the pathogenesis of PN is poorly understood, it is believed to involve neural dysregulation and infiltration of the skin by immune cells [2, 3].

Extreme itching has a severe negative effect on the quality of life of patients with PN [4,5,6]. Adults with PN have higher rates of anxiety and depression [7] and often suffer from sleep problems [5, 8,9,10,11], which can be so severe as to require medical treatment [9]. These sleep problems have been linked to the evening and night-time itching from which many patients suffer [11].

Although sleep quality and duration can be assessed objectively using methods such as polysomnography, there is value in using patient-reported outcome (PRO) measures to capture patient experiences of sleep disturbance and daytime sleepiness. PRO measures used to assess sleep in people with PN include single-item numerical rating scales (NRS) [12] and the 19-item Pittsburgh Sleep Quality Index (PSQI) [13], which has been used in several PN studies [5, 9, 14]. However, options are limited and, to our knowledge, none of the available PRO measures have been validated in PN.

The Sleep Disturbance Numerical Rating Scale (SD NRS) is a single-item PRO measure for assessing day-to-day fluctuations in sleep disturbance in people with pruritic skin disorders. Content validity of the SD NRS has been established in atopic dermatitis (AD) [15]. The SD NRS has also been psychometrically validated in AD [16]. However, it has not been determined whether the SD NRS is suitable for use in patients with PN. The aim of this study was to evaluate content validity of the SD NRS in PN through qualitative interviews and to investigate its psychometric properties using data from a recent phase 2 trial [17]. A further aim was to derive a threshold for meaningful within-patient change by triangulating estimates from qualitative and quantitative analyses.

Methods

The human studies reported here were performed in accordance with the Helsinki Declaration of 1964 and its later amendments. All participants provided written informed consent.

Qualitative Interviews

The SD NRS assesses sleep loss due to PN during the previous night on a scale of 0 (“no sleep loss”) to 10 (“I did not sleep at all”). Content validity of the SD NRS was evaluated through qualitative telephone interviews with a convenience sample of 21 US adults with PN who had an SD NRS score of ≥ 4 at screening. Interview participants were recruited via patient associations, patient panels, and social media.

Interviews were conducted in English by researchers experienced in qualitative research (C.S., L.S., R.C., and R.W.; one male and three female) using a semi-structured interview guide. The interviews combined concept elicitation with cognitive debriefing of the SD NRS. They were audio-recorded with participants’ permission and transcribed verbatim. During the interviews, participants were asked which impacts of PN they suffered from and which of these they considered to be the worst. They then completed a paper copy of the SD NRS and answered questions relating to ease of use and understanding. They also answered questions about the SD NRS score change that would reflect a meaningful improvement, starting from the SD NRS score they recorded during the interview.

Audio files from the interviews were reviewed by the research team to ensure alignment with the interview guide and study objectives. Interview transcripts were also reviewed for quality assurance, to remove any patient-identifiable information and correct obvious transcription errors. Content analysis of interview transcripts was then performed using ATLAS.ti (ATLAS.ti Scientific Software Development, Berlin, Germany). Quantitative data (including SD NRS responses) were collected on paper forms and entered into DataFax (DF/Net Research, Seattle, WA) for conversion into electronic format. The data were then analyzed using SAS® version 9.4 (SAS Institute, Cary, NC).

Ethical approval of the interview study was received from Advarra (Columbia, MD; reference number Pro00044847).

Psychometric Evaluation of the SD NRS

Study Design

The psychometric properties of the SD NRS were analyzed using data from a phase 2 randomized controlled trial of nemolizumab in adults with PN (NCT03181503) [17], for which ethical approval was obtained at each participating site. Participants received nemolizumab 0.5 mg/kg or placebo at day 0 (baseline), week 4, and week 8. PRO assessments were performed at different study visits from baseline to week 18.

Assessments

Participants completed the SD NRS daily in the morning from baseline to week 4 using an electronic device. Other assessments included in this analysis are summarized in Table S1 in the electronic supplementary material. The primary endpoint of the phase 2 trial was a PRO, the Peak Pruritus Numerical Rating Scale (PP NRS), which assesses the intensity of pruritus “at the worst moment” during the previous 24 h on a scale of 0 (“no itch”) to 10 (“worst itch imaginable”). Other PRO assessments included the Average Pruritus Numerical Rating Scale (AP NRS) [18], Average Pruritus Verbal Rating Scale (AP VRS) [18], Peak Pruritus Verbal Rating Scale (PP VRS) [18], Dynamic Pruritus Score (DPS) [19], and Dermatology Life Quality Index (DLQI) [20]. Clinician-reported outcome measures were the Investigator Global Assessment (IGA) and a 7-item version of the Prurigo Activity Score (PAS) [21], which is now known as the Prurigo Activity and Severity Score [22] and includes descriptive items on lesion type, number, and distribution.

Psychometric Properties

Test–retest reliability of the SD NRS was examined using data for “stable” participants with no score change on the PAS item on number of prurigo lesions in a representative area (hereafter “number of lesions”) or the IGA between baseline and week 4 and a score change of ≤ 1 point on the PP VRS and AP VRS. Data for baseline and week 4 were analyzed by paired t test, and the intraclass correlation coefficient (ICC) was estimated using a two-way mixed-effects analysis of variance (ANOVA) model with absolute agreement [23, 24]. ICC values were categorized as less than 0 = no agreement, 0–0.20 = slight agreement, 0.21–0.40 = fair agreement, 0.41–0.60 = moderate agreement, 0.61–0.80 = substantial agreement, and 0.81–1.00 = near-perfect agreement [25].
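The test–retest ICC and its agreement categories can be illustrated with the minimal sketch below. It assumes the single-measure, absolute-agreement ICC computed from a two-way ANOVA decomposition of a participants × occasions table (baseline, week 4); the text does not state whether single- or average-measure ICCs were reported, and the function names are hypothetical. For this form, the point estimate is the same whether the participant effect is treated as random or mixed; only the interpretation differs.

```python
from statistics import mean


def icc_a1(scores):
    """Single-measure, absolute-agreement ICC from a two-way ANOVA.

    scores: one list per participant, one value per occasion
            (e.g., [baseline, week 4] SD NRS weekly averages).
    """
    n = len(scores)        # participants (rows)
    k = len(scores[0])     # occasions (columns)
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares into participants, occasions, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic shifts between occasions count as error
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_category(icc):
    """Map an ICC to the agreement labels quoted in the text."""
    if icc < 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if icc <= upper:
            return f"{label} agreement"
    return "near-perfect agreement"
```

For example, `agreement_category(0.76)` returns "substantial agreement" and `agreement_category(0.87)` returns "near-perfect agreement".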

Construct validity was assessed by calculating Spearman’s rank-order correlation coefficients between SD NRS average weekly score at baseline and PP VRS and AP VRS average weekly scores; DLQI total score; scores for PAS items on number of lesions, percentage of prurigo lesions with excoriations/crusts (hereafter “lesions with excoriations/crusts”), and percentage of healed prurigo lesions (hereafter “healed lesions”); IGA score; and PP NRS and AP NRS average weekly scores. Correlations were categorized as r < 0.3 = weak, 0.3 ≤ r ≤ 0.7 = moderate, 0.7 < r ≤ 0.9 = strong, and r > 0.9 = very strong [26]. It was hypothesized that SD NRS score would have moderate to strong correlations with other PRO measures, including the PP NRS, and weak correlations with the clinician-reported outcome measures.
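A minimal sketch of the correlation step follows: Spearman's coefficient computed as Pearson's correlation of ranks, with tied values sharing their average rank, plus the strength labels quoted above. Applying the labels to |r| rather than r is an assumption, consistent with the Results describing r = −0.54 as moderate; the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(r):
    """Label correlation strength by |r| using the cut-offs in the text."""
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a <= 0.7:
        return "moderate"
    if a <= 0.9:
        return "strong"
    return "very strong"
```

Because only ranks enter the calculation, any monotone relationship (for example, y = x²) gives ρ = 1, which suits ordinal scales such as the SD NRS and PAS items.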

Known-groups validity was assessed by calculating mean baseline SD NRS scores for subgroups of participants categorized according to PP VRS, AP VRS, and AP NRS average weekly scores, as well as IGA score, DLQI total score, and PAS item scores. The subgroups used were based on clinically relevant severity categories or on quartiles of the distribution. Subgroups were compared using ANOVA models adjusted for multiple comparisons by the Scheffé method [27].
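The known-groups comparison can be illustrated with a one-way ANOVA followed by Scheffé pairwise contrasts. This is a sketch of the general technique, not the trial's analysis code. To stay within the Python standard library, the critical value of F(k − 1, N − k) at the chosen α is a hypothetical parameter supplied by the caller (e.g., from an F table) rather than computed.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA on baseline SD NRS scores, one list per severity subgroup.

    Returns (F, df_between, df_within, ms_within).
    """
    means = [mean(g) for g in groups]
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def scheffe_pairs(groups, f_crit):
    """Flag subgroup pairs whose mean difference passes the Scheffé criterion.

    f_crit: critical F(k - 1, N - k) at the chosen alpha (caller-supplied).
    """
    means = [mean(g) for g in groups]
    _, df_between, _, ms_within = one_way_anova(groups)
    flags = {}
    for i, j in combinations(range(len(groups)), 2):
        se2 = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
        f_pair = (means[i] - means[j]) ** 2 / se2
        # Scheffé: a pairwise contrast is significant if its F exceeds (k - 1) * F_crit
        flags[(i, j)] = f_pair > df_between * f_crit
    return flags
```

The Scheffé criterion is conservative because it protects every possible contrast among subgroups, not only the pairwise ones, which is one reason adjacent severity categories may fail to separate even when the overall ANOVA is significant.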

In an analysis of responsiveness, PP VRS and AP VRS average weekly scores, PAS item scores, IGA score, DPS score, and PP NRS and AP NRS average weekly scores were first screened as potential anchors. For each outcome measure, Spearman’s rank-order correlation coefficient (r) with SD NRS average weekly score was calculated for score changes from baseline to week 4; outcome measures with |r| ≥ 0.30 were used as anchors in the responsiveness analysis. For each anchor, ANOVA was used to compare changes in SD NRS average weekly score from baseline to week 4 between participants whose anchor score improved and those whose anchor score was unchanged or worsened. Effect sizes were calculated as the mean difference in average weekly score between baseline and week 4 divided by the standard deviation of the average weekly score at baseline [28]. The standardized response mean was calculated as the mean difference in average weekly score between baseline and week 4 divided by the standard deviation of that difference.
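The anchor screening rule and the two standardized change statistics can be sketched as follows. The sketch assumes sample standard deviations (n − 1 denominator), which the text does not specify; the function names are hypothetical. The example correlations in the usage note are taken from the Results.

```python
from statistics import mean, stdev


def select_anchors(correlations, threshold=0.30):
    """Keep candidate anchors whose |r| with SD NRS change meets the threshold."""
    return [name for name, r in correlations.items() if abs(r) >= threshold]


def effect_size(baseline, week4):
    """Mean change from baseline divided by the SD of baseline scores."""
    changes = [w - b for b, w in zip(baseline, week4)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, week4):
    """Mean change from baseline divided by the SD of the change scores."""
    changes = [w - b for b, w in zip(baseline, week4)]
    return mean(changes) / stdev(changes)
```

For instance, `select_anchors({"PP NRS": 0.86, "DPS": -0.54, "PAS number of lesions": 0.29})` keeps PP NRS and DPS and drops the PAS number-of-lesions item. The two statistics differ only in their denominator: the effect size scales change by between-participant spread at baseline, whereas the standardized response mean scales it by the consistency of change across participants.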

Meaningful Within-Patient Change

In accordance with US Food and Drug Administration (FDA) recommendations on PRO measures [29], anchor-based methods were used to determine meaningful within-patient change thresholds for the SD NRS. The responder definition was estimated separately for each anchor, as the mean change in SD NRS score from baseline to week 4 among participants with a decrease in PP VRS score of > 0–1 point; a decrease in AP VRS score of > 0–1 point; a 1-point decrease in IGA score; a week 4 DPS score of 5 (“slightly improved”); a 4- to 6-point decrease in DLQI total score; or a 2- to 4-point decrease in PP NRS score. To supplement the anchor-based analysis, 0.5 × standard deviation and the standard error of measurement (SEM = standard deviation × √[1 − ICC], where the ICC was that for participants with no change in PP VRS score between baseline and week 4) were calculated in a distribution-based analysis. Anchor- and distribution-based estimates were triangulated with meaningful improvement data from the qualitative interviews to derive a threshold for meaningful within-patient change.
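The anchor- and distribution-based estimates described above reduce to simple arithmetic, sketched below. The text does not say which standard deviation enters 0.5 × SD or the SEM; this sketch assumes the SD of baseline scores. The inclusive anchor window and all function names are hypothetical illustrations.

```python
from math import sqrt
from statistics import mean, stdev


def anchor_based_estimate(sd_nrs_changes, anchor_changes, lower, upper):
    """Mean SD NRS change among participants whose anchor change lies in [lower, upper].

    For example, a 2- to 4-point decrease in PP NRS corresponds to lower=-4, upper=-2.
    """
    in_band = [s for s, a in zip(sd_nrs_changes, anchor_changes) if lower <= a <= upper]
    return mean(in_band)


def half_sd(baseline_scores):
    """Distribution-based estimate: 0.5 x SD (baseline SD assumed here)."""
    return 0.5 * stdev(baseline_scores)


def sem(baseline_scores, icc):
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return stdev(baseline_scores) * sqrt(1 - icc)
```

With hypothetical SD NRS changes `[-3, -1, -4, 0]` and PP NRS changes `[-2, 0, -4, 1]`, only the first and third participants fall in the 2- to 4-point decrease band, giving an anchor-based estimate of −3.5. Distribution-based values describe measurement noise rather than patient-perceived benefit, which is why they serve only as a floor when triangulating.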

Statistical Procedures

Analyses were conducted with SAS® version 9.4 (SAS Institute, Cary, NC) using quantitative data collected during the qualitative interviews (including SD NRS responses) and data from all randomized participants in the phase 2 trial who had baseline SD NRS data. All statistical tests were two-sided with a significance level of 0.05.

Results

Qualitative Interviews

Concept Elicitation

The 21 interview participants had a mean age of 53.1 years (range 27–76) and 15 (71%) were female. Eighteen participants (86%) were white, two (10%) were Black, and one (5%) had mixed Black/Asian heritage. Only one participant was newly diagnosed with PN (within the previous 6 months). Most participants rated the severity of their PN at the time of the interview as either moderate (n = 7 out of 21, 33%) or severe (n = 10 out of 21, 48%).

All participants reported sleep disturbance and an impact on daily life. Of the 17 participants who discussed the worst impacts of PN, 2 (12%) identified sleep disturbance as the worst. Of the 17 participants who discussed whether itch directly caused the impacts they experienced, 12 (71%) attributed their sleep disturbance to itch.

A total of 19 participants (90%) had problems falling asleep, with 6 of these (32%) reporting that it took them an hour or more to get to sleep. Of the 19 participants (95%) who discussed nighttime awakenings, 18 tended to wake up at least once a night. The typical total duration of nighttime awakenings was 30 min or less for 9 participants (50%); 30 min to 1 h for 1 participant (6%); and an hour or more for 5 participants (28%). For 3 participants (17%), the duration of nighttime awakenings varied depending on when they had flare-ups. Of the 19 participants who commented on the effect of PN on the quality of their sleep, 16 (84%) mentioned that their sleep was negatively affected by PN. Some participants described their sleep as “not good,” “poor,” or “very poor.” One participant (5%) described “tossing” without waking up, while another described feeling “restless.” Six participants (32%) described how PN affected their tiredness, using terms such as “tired,” “very tired,” “always tired,” and “exhausted.”

Cognitive Debriefing of the SD NRS

When asked to score the SD NRS during the interview, 9 participants (43%) recorded a score of 7–9. One participant recorded a score of 0 (“no sleep loss”); no participants recorded a score of 10 (“I did not sleep at all”). The mean (standard deviation) score was 5.1 (2.7).

Because of time constraints and participant fatigue, not all cognitive debriefing probes were asked of all participants, and one participant did not complete the cognitive debriefing. The other 20 participants (95%) demonstrated that they understood the SD NRS. Of the 20 participants who discussed the clarity of the SD NRS, 19 (95%) reported that it was clear; one participant (5%) found it difficult to classify their sleep numerically. Of the 21 participants, 15 (71%) responded when probed on the “no sleep loss” and “I did not sleep at all” anchors, and all 15 were able to convey the intended meaning.

All 14 participants (67%) who described their understanding of the recall period correctly interpreted “last night” as the time between going to bed and waking up in the morning. Of the 20 participants who discussed ease of selecting a response, 19 (95%) reported having no issues, with some describing it as “easy.” One participant (5%) indicated that it was easy to select a response but difficult to explain what the number meant to them.

Patients

The analysis included 67 of the 70 participants randomized in the phase 2 trial (Table 1). Mean (standard deviation) age was 55.7 (16.0) years. Of the patients, 60% were female, and most (97%) were white. Most patients (58%) had fewer than 100 nodules on their body. IGA score at baseline was 3 (moderate) for 52% of patients and 4 (severe) for 48%. The mean (standard deviation) number of prurigo lesions in a representative area was 20.2 (16.0). Three-quarters of patients had excoriations/crusts on > 50% of their prurigo lesions, and more than half had < 25% healed lesions.

Table 1 Demographics and baseline clinical characteristics of the phase 2 trial sample

Psychometric Evaluation

Test–retest reliability of the SD NRS was substantial (ICC = 0.76) when stable participants were defined using PP VRS average weekly score and near-perfect (ICC = 0.87) when they were defined using AP VRS average weekly score (Table 2). When stability was defined using clinician-reported outcomes, test–retest reliability was moderate (ICC = 0.48 for the PAS item on number of lesions; ICC = 0.51 for IGA).

Table 2 Test–retest reliability of the SD NRS in stable participants

At baseline, SD NRS average weekly score was strongly correlated with AP NRS average weekly score (r = 0.76); moderately correlated with PP NRS average weekly score (r = 0.69), PP VRS average weekly score (r = 0.53), AP VRS average weekly score (r = 0.57), and DLQI total score (r = 0.32); and weakly correlated with IGA score (r = 0.26) and PAS item scores (r = −0.08 to 0.23; Table 3).

Table 3 Convergent and divergent validity of the SD NRS at baseline

Known-groups validity of the SD NRS was demonstrated based on PP VRS score (p = 0.0004), AP VRS score (p < 0.0001), AP NRS score (p < 0.0001), DLQI total score (p = 0.0233), and IGA score (p = 0.0156), but not based on PAS item scores (Table 4).

Table 4 Known-groups validity of the SD NRS at baseline

A preliminary analysis was conducted to identify outcome measures for inclusion in the responsiveness analysis (those for which |r| ≥ 0.3). Change in SD NRS average weekly score from baseline to week 4 was strongly correlated with changes in PP NRS average weekly score (r = 0.86), AP NRS average weekly score (r = 0.85), PP VRS average weekly score (r = 0.74), and AP VRS average weekly score (r = 0.73), and moderately correlated with changes in IGA score (r = 0.53) and DPS score (r = −0.54; Table S2 in the electronic supplementary material). Correlations were also moderate (r = 0.50–0.51) for the PAS items on lesions with excoriations/crusts and healed lesions. A weak correlation (r = 0.29) was observed for the PAS item on number of lesions, which was therefore excluded from the responsiveness analysis.

The SD NRS demonstrated responsiveness based on changes in the outcome measures with |r| ≥ 0.3 (Table 5). Decreases in mean SD NRS weekly average score were significantly greater in participants classified as “improved” (−2.9 to −4.2) than in those classified as “worsened/unchanged” (−0.1 to −1.7; p = 0.0029 for AP NRS average weekly score, p = 0.0005 for DPS score, p = 0.0002 for PAS lesions with excoriations/crusts and healed lesions, and p < 0.0001 for all other outcome measures). Effect sizes were also larger in magnitude for “improved” (−1.6 to −2.4) than for “worsened/unchanged” (0.0 to −0.9) participants.

Table 5 Responsiveness of the SD NRS from baseline to week 4

Meaningful Within-Patient Change

Qualitative Approach

Participants were asked to consider the SD NRS score change they would view as meaningful when receiving a new treatment, starting from the score they registered during the interview. Two participants indicated that they would be satisfied or content with a 1-point improvement, 3 participants a 2-point improvement, 9 participants a 3-point improvement, and 5 participants a > 3-point improvement (2 participants did not respond; Fig. S1 in the electronic supplementary material). On average, the smallest score change considered meaningful was 3.2 points.

Participants were further probed on the meaningfulness of 1-, 2-, and 3-point score changes. When asked to consider a 1-point improvement in SD NRS score, 9 participants (43%) expressed that it would be meaningful. Different participants explained how it would mean that they could “sleep longer,” experience a “little less itching” or a “little more relief,” or feel that it was “getting better.” Most participants (18, 86%) indicated that a 2-point improvement would be meaningful, because it would, for example, mean that they could “sleep more,” “sleep more soundly,” or “concentrate a little better”; that they would experience “less itching”; or that their condition “wouldn’t feel as severe.” One participant explained how it would mean “waking up in a lot better mood.” All 20 participants who considered a 3-point improvement indicated that it would be meaningful (one participant did not provide a response). Five of these participants (25%) indicated that it would mean having more energy or less fatigue. One participant described how a 3-point improvement would “make me feel better in the morning, more energetic and maybe put a smile on my face even.”

Quantitative Approach

In anchor-based analyses, the meaningful change threshold for the SD NRS score was estimated as −2.9 based on the PP NRS, −1.8 based on the PP VRS, −2.1 based on the AP VRS, −1.4 based on the DPS, −2.0 based on the DLQI, and −4.0 based on the IGA (Table 6). The distribution-based analysis yielded lower estimates of 0.96 for 0.5 × standard deviation and 0.93 for SEM.
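As a rough arithmetic check on the two distribution-based estimates, assume (the text does not state this exactly) that both used the same standard deviation and that the SEM used the ICC of 0.76 reported for participants who were stable on the PP VRS. The 0.5 × SD estimate then implies SD ≈ 1.92, and the SEM follows:

```latex
\begin{aligned}
0.5 \times SD &= 0.96 \;\Rightarrow\; SD \approx 1.92 \\
\mathrm{SEM} &= SD \times \sqrt{1 - \mathrm{ICC}}
  \approx 1.92 \times \sqrt{1 - 0.76}
  \approx 1.92 \times 0.49
  \approx 0.94
\end{aligned}
```

This agrees with the reported SEM of 0.93 to within rounding of the SD and ICC, and both distribution-based values sit well below the anchor-based estimates of −1.4 to −4.0 points.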

Table 6 Estimates for meaningful within-patient change in SD NRS from baseline to week 4

Triangulation of Estimates

When estimates derived from the qualitative and quantitative analyses were triangulated to obtain a range of thresholds for meaningful within-patient change on the SD NRS, a 2- to 4-point decrease in SD NRS score was judged to be a meaningful improvement for people with PN.

Discussion

In patients with PN, the intractable itch-scratch cycle often leads to sleep problems [30, 31]. Indeed, improving sleep ranks as the next most important treatment priority for patients with PN after controlling itch and improving PN lesions [32]. Because sleep problems are so common (one study estimated that they affect over 40% of patients with PN [10]), evaluations of the patient benefit of new therapeutic options should include assessment of the impact of PN on sleep. This requires validated PRO measures for assessing sleep problems [8,9,10]. Here, we conducted a qualitative and psychometric evaluation of the SD NRS, which, to our knowledge, is the first PRO measure to be validated for assessing sleep disturbance in PN. All the qualitative interview participants experienced sleep disturbance as a consequence of PN, including problems falling asleep and nighttime awakenings. They generally demonstrated a good understanding of the SD NRS, including its recall period and anchors, and had no problems selecting a response. In psychometric analyses, the SD NRS showed good test–retest reliability, with ICC values > 0.70 when stability was defined using the PP VRS and AP VRS. Construct validity was also good: the moderate to strong correlations between SD NRS scores and itch-related outcomes support the idea that sleep disturbance is a proximal impact of pruritus in PN. Known-groups validity was demonstrated based on PROs but on only a minority of clinician-reported outcomes (the IGA, but not the PAS items). Similarly, in a previous validation study of the SD NRS in AD, known-groups validity was demonstrated only based on PROs and not based on clinician-reported outcomes [16].

There is limited qualitative research describing the burden of PN. However, our findings are consistent with quantitative literature demonstrating reduced quality of life, sleep disturbance, and depression among patients with PN [5,6,7,8,9,10,11].

When comparing score changes from baseline to week 4, the SD NRS showed moderate to strong correlations with other PRO measures and with clinician-reported outcomes except for the PAS item on number of lesions. In a subsequent responsiveness analysis, the SD NRS was able to differentiate between participants whose given anchor score improved and those whose anchor score was unchanged or worsened.

Establishing thresholds for meaningful within-patient change is essential for being able to properly interpret PRO scores [33, 34]. By triangulating the various estimates, it was judged that a 2- to 4-point decrease in SD NRS score represents a meaningful within-patient improvement. This estimate is comparable to the range for meaningful within-patient change derived from a similar analysis of the SD NRS in adults with moderate-to-severe AD [16]. It is also similar to the 4-point thresholds derived from anchor-based analyses of other 11-point NRS used in pruritic skin diseases, including the PP NRS (in AD [35] and plaque psoriasis [36]), the Worst Itch Numeric Rating Scale (in AD [37]), and the Skin Pain Numeric Rating Scale (in AD [37]).

Limitations of the present study include the fact that the psychometric validation was based on data collected up to week 4, which is a short period for a chronic disease such as PN. It remains to be established whether the psychometric properties of the SD NRS, especially responsiveness, hold up over a longer period of time. Also, the interview sample was small, although saturation of concepts was reached after two-thirds of the sample had been interviewed. Moreover, the 4-week interval between assessments in the test–retest analysis is longer than the typical interval of 1–2 weeks [38]. Finally, no other sleep measures were used in the validation of the SD NRS because none were included in the trial used for the present analysis.

Conclusions

PN is extremely difficult to treat. Recently, the FDA approved the injectable biologic dupilumab, offering new hope to patients with PN. Validated PRO measures are essential for properly evaluating the efficacy of new and emerging therapies for PN. The present analysis provides evidence that the SD NRS is a reliable and valid PRO measure for capturing changes in PN-related sleep loss in daily practice and clinical trials. The present results confirm the link between pruritus and sleep disturbance in PN and identify the minimum SD NRS score changes representing meaningful within-patient change. Ongoing research aims to confirm the psychometric properties and meaningful change thresholds of the SD NRS in PN using phase 3 clinical trial data. Importantly, this ongoing validation will use data from other PRO measures that capture sleep problems.