Key Summary Points

Why carry out this study?

To develop a fit-for-purpose (well-defined, reliable and valid) scale to assess itch intensity in clinical trial settings for children aged 6–11 years.

To provide a responder definition to identify children with a meaningful change in itch.

What was learned from the study?

The Worst Itch Scale is a well-defined, reliable, sensitive and valid scale for evaluating worst itch intensity in children aged 6–11 years with severe AD.

The most appropriate within-patient threshold for defining a clinically relevant response was a ≥ 3–4-point change on the Worst Itch Scale score.

Digital Features

This article is published with digital features, including a video abstract to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.20655396.

Introduction

Atopic dermatitis (AD) is a chronic, relapsing, inflammatory skin condition affecting patients of all ages [1]. Moderate-to-severe AD is characterised by intense, persistent and debilitating itch (pruritus), which can have a profoundly negative impact on patients’ lives [2,3,4,5]. Reducing itch in patients with AD is a major therapeutic goal and an important marker of treatment benefit [6]. However, itch can be accurately and reliably evaluated only by the patient. For adults and adolescents with moderate-to-severe AD, the Peak Pruritus Numerical Rating Scale (NRS) has been established as a reliable, sensitive and validated tool allowing patients to evaluate worst itch intensity [7, 8]. In adults, a change from baseline of ≥ 2–4 points in Peak Pruritus NRS has been defined as a clinically meaningful improvement in itch [8].

Recent attempts to address the lack of available assessment tools for children have included modifications of existing tools (such as ItchyQoL and ItchyQuant) for use with patients as young as 6–7 years of age [9] and the generation and validation of the PROMIS Itch Questionnaire-Child (PIQ-C) for itch intensity and impact in children ≥ 8 years of age [10]. The Worst Itch Scale was developed as a fit-for-purpose (i.e. well-defined, reliable and valid) scale to assess itch intensity in clinical trial settings [11] for children aged 6–11 years. The scale measures itch intensity during daytime and nighttime (when itch can be most intense and cause sleep disturbances). The purpose of the current study was to develop and evaluate this scale, and to empirically derive a responder definition (i.e. a clinically meaningful within-patient improvement) for worst itch in this age group, which can be used to identify children who have experienced a meaningful improvement in itch.

Methods

Development and Content Validation of Worst Itch Scale

Three iterative rounds of qualitative semi-structured interviews, comprising concept elicitation and cognitive debriefing, were conducted to develop and evaluate the content validity of the Worst Itch Scale. Children were eligible to participate in the study if they met the following criteria, as reported by their caregiver (parent or guardian): between the ages of 6 and 11 years, inclusive; diagnosed with AD (or eczema) by a physician; experienced symptoms of AD for a period of at least 1 year; recently (in the previous month) experienced at least moderate itching related to AD, as reported by the caregiver during screening in response to the question, ‘Over the last month, would you describe your child’s dermatitis/eczema-related itching as mild, moderate, severe or extremely severe, at its worst?’; inadequate response to topical AD medication(s) in the previous 6 months, as reported by the caregiver during screening in response to the question, ‘In the past 6 months, have any prescription or over-the-counter topical medications completely cleared up your child’s eczema?’ (possible responses were yes or no).

The first round of interviews aimed to identify any initial problems with question wording or response options. The second and third rounds were intended to provide further conceptual support, optimise items, test the adequacy of modifications and collect additional qualitative data about final items before quantitative evaluation.

Psychometric Evaluation of Worst Itch Scale

Psychometric evaluation of the Worst Itch Scale was conducted using data from the randomised, double-blinded, placebo-controlled LIBERTY AD PEDS (R668-AD-1652; ClinicalTrials.gov identifier: NCT03345914) phase 3 trial of dupilumab in paediatric patients aged 6–11 years with severe AD that was inadequately controlled with topical medications or for whom topical treatment was medically inadvisable [12]. The study was conducted in accordance with the provisions of the Declaration of Helsinki, the International Council for Harmonisation Good Clinical Practice (ICH GCP) guideline and applicable regulatory requirements; the protocol was reviewed and approved by institutional review boards/ethics committees. For all patients, written informed consent was obtained from a parent or legal guardian and written informed assent was obtained from the patient. The full study design of LIBERTY AD PEDS has been reported previously [12].

The Worst Itch Scale was included as an outcome in LIBERTY AD PEDS. The two Worst Itch Scale items (described below) were completed by children once daily in the evening via electronic diary throughout the study. Whenever possible, the child read and completed the two items alone. When required, a caregiver (parent or other) read the questions and response options aloud to the child, but caregivers were instructed not to influence or question the responses provided by the child.

The psychometric evaluation of the Worst Itch Scale was performed in accordance with classical psychometric test theory and guidelines recommended by the US Food and Drug Administration (FDA) [13] and was based on evaluations previously conducted in adolescents and adults with AD [7, 8]. Psychometric evaluation included assessment of test–retest reliability, construct validity, known-groups validity and responsiveness. Data were pooled across treatment arms.

Several patient-reported outcome (PRO) and clinician-reported outcome (ClinRO) measures from LIBERTY AD PEDS were used for the psychometric evaluation of the Worst Itch Scale. The PRO measures included Patient Global Impression of Disease (PGID), Patient Global Impression of Change (PGIC), Children’s Dermatology Life Quality Index (CDLQI), SCORing Atopic Dermatitis (SCORAD) itch Visual Analogue Scale (VAS), SCORAD sleeplessness VAS and Patient-Oriented Eczema Measure (POEM). The PGID asked participants about their itching in the last 7 days and was scored on a 5-point scale (not itchy at all, a little itchy, medium itchy, pretty itchy, very itchy). The PGIC was designed to measure perceived change in itching since starting medication and was scored on a 5-point scale (much better, a little better, the same, a little worse, much worse). The ClinRO measures included SCORAD objective score, Eczema Area and Severity Index (EASI) and Investigator’s Global Assessment (IGA).

Test–retest reliability intraclass correlation coefficients (ICCs) for the weekly average of the daily worst itch were computed using scores from the last 2 weeks of the treatment period (Week 15 [test] and Week 16 [retest]), when a patient’s underlying condition and intensity of symptoms are expected to remain stable, and using baseline (test) and Week 2 (retest) scores of a subgroup of patients with no change on the PGID (Table 1). Although no standards exist for judging the magnitude of reliability coefficients for individual items, it is generally recommended that ICCs should be ≥ 0.70 for multi-item scales [14].
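As an illustration of the test–retest calculation, the sketch below computes an ICC from paired test and retest scores. The ICC form used in the study is not stated here, so the two-way random-effects, absolute-agreement, single-measure form is an assumption, and the function name `icc_2_1` and the example scores are hypothetical.

```python
from typing import Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC
    for two occasions (assumed form; not confirmed by the study)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores from at least two patients")
    n, k = len(test), 2  # n patients, k = 2 occasions (test and retest)
    rows = list(zip(test, retest))
    grand_mean = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(test) / n, sum(retest) / n]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-patient mean square
    ms_cols = ss_cols / (k - 1)                # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical weekly averages for three patients at test and retest.
week_15 = [2.0, 5.0, 8.0]
week_16 = [3.0, 6.0, 9.0]
icc = icc_2_1(week_15, week_16)
meets_criterion = icc >= 0.70  # the recommended minimum cited in the text
```

Under this form, a uniform shift between the two occasions lowers the coefficient slightly, because absolute agreement penalises systematic differences as well as random error.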

Table 1 Summary of key measurement properties of the weekly average of daily worst itch scores in paediatric patients with atopic dermatitis aged 6–11 years

Establishing a Clinically Meaningful Within-Patient Change Threshold for Worst Itch Scale

Analyses were conducted using LIBERTY AD PEDS data pooled across treatment arms to define a clinically meaningful within-patient change (i.e. responder definition) using both anchor- and distribution-based methods. Consistent with recommendations that the anchor-based method is preferred [13, 15, 16] and patient-reported global status measures are the most appropriate [17], the primary anchor was a PGID improvement of 1 point. Supportive anchors were: PGIC improvement ‘a little better’ or ‘much better’; EASI response 50–74, 75–89 and 90–100 (50–74%, 75–89% and 90–100% improvement in EASI from baseline, respectively); IGA score 0 or 1; IGA improvement ≥ 2 points. The distribution-based one-half standard deviation (SD) and standard error of measurement (SEM) of the Worst Itch Scale score at baseline were calculated.
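The two distribution-based quantities can be sketched as follows. The exact SEM formula used in the study is not given here; the conventional definition, SEM = SD × √(1 − reliability), is assumed, with a reliability coefficient such as the test–retest ICC supplied by the user. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import stdev
from typing import Sequence, Tuple


def distribution_based_estimates(
    baseline_scores: Sequence[float], reliability: float
) -> Tuple[float, float]:
    """Return (one-half SD, SEM) for a set of baseline scores.

    Assumes the conventional formula SEM = SD * sqrt(1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sd = stdev(baseline_scores)  # sample SD (n - 1 denominator)
    half_sd = 0.5 * sd
    sem = sd * sqrt(1.0 - reliability)
    return half_sd, sem


# Hypothetical baseline weekly averages and a hypothetical reliability of 0.75.
half_sd, sem = distribution_based_estimates([6.0, 7.0, 8.0, 9.0, 10.0], 0.75)
```

With a reliability of 0.75 the two estimates coincide, since √(1 − 0.75) = 0.5; higher reliability yields an SEM below one-half SD.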

Results

Development and Content Validation of Worst Itch Scale

Three iterative rounds of interviews were conducted with a total of 22 children with AD aged 6–11 years (mean [SD] age 8.7 [1.8] years; 54.5% male) and their caregivers (one parent or guardian). The study sample was racially and ethnically diverse, with approximately 55% White, 23% Black, 18% Hispanic and 5% other, as reported by the caregiver. Among the child participants, 14 (63.6%) had moderate AD-related itch and 8 (36.4%) had severe AD-related itch, per caregiver report. Efforts were made to recruit a diverse group of participants with regard to age, given the developmental and cognitive differences expected in children across the span of 6–11 years of age.

In addition to the Peak Pruritus NRS, which has been used to assess itch in adult and adolescent patients with AD [8, 18], several new draft items were developed in collaboration with clinical outcome assessment experts. These draft items offered different options for item wording (e.g. severity/intensity of worst itch, frequency of itching/scratching), response scale (e.g. 0–10 NRS, 4-point verbal rating scale, with and without figures) and recall period (24 h, last night and today). Across the three rounds of interviews, 4 children (3 aged 6 years and 1 aged 7 years) preferred that the interviewer read the items aloud while they followed along silently; all 4 were able to follow along and respond to the questions as the interviewer read.

Poorer performing items were deleted. Respondent feedback guided the selection of items for revision and additional testing, and confirmed the final items. For example, none of the participants could accurately interpret and use the recall period description ‘the previous 24 h’ when selecting a response. The recall period description ‘from the time you went to bed last night until right now’ was lengthy, wordy and confusing to almost all participants. Since many children had difficulty accurately selecting a score to represent worst itch over the entire 24-h time period, the most appropriate format for the Worst Itch Scale consisted of two separate items: one asking about ‘worst itching’ experienced ‘last night’ and one asking about ‘worst itching’ experienced ‘today’. Both items were rated by the child using an 11-point NRS on which 0 = ‘no itching’ and 10 = ‘worst itching possible’, with figures depicting the 0 and 10 anchors on the scale. Both items tested well within the context of the interviews, including participants who had limited reading ability (e.g. younger ages, poor readers). The highest (i.e. worst) response of the two items was taken as the daily worst itch score. In the LIBERTY AD PEDS study, the daily worst itch scores were averaged over 1-week intervals to take into account the potential variation in itch between the daily scores.
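The scoring rule described above (the higher of the two item responses per day, averaged over a week) can be sketched as follows. The handling of missing diary entries is not described here, so the rules in the comments are assumptions, and the function names and diary values are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def daily_worst_itch(last_night: Optional[int], today: Optional[int]) -> Optional[int]:
    """Return the higher (worse) of the two 0-10 item responses for one day."""
    answered = [s for s in (last_night, today) if s is not None]
    for score in answered:
        if not 0 <= score <= 10:
            raise ValueError("NRS responses must lie between 0 and 10")
    # Assumption: a day with only one completed item uses that item;
    # a day with no completed items is treated as missing.
    return max(answered) if answered else None


def weekly_average(daily_scores: Sequence[Optional[int]]) -> Optional[float]:
    """Average the available daily worst itch scores within one week."""
    available = [s for s in daily_scores if s is not None]
    # Assumption: no minimum number of completed diary days is enforced.
    return mean(available) if available else None


# Hypothetical diary for one week: (last night, today) per day.
diary = [(3, 5), (6, 4), (None, 7), (2, 2), (None, None), (8, 6), (5, 5)]
daily = [daily_worst_itch(night, day) for night, day in diary]
week_score = weekly_average(daily)
```

In this hypothetical week the daily scores are 5, 6, 7, 2, missing, 8 and 5, so the weekly average over the six available days is 5.5.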

Psychometric Evaluation of Worst Itch Scale

Patient Characteristics

The analysis sample included 361 randomised patients (mean [SD] age 8.4 [1.7] years; 49.9% male) from the LIBERTY AD PEDS study [12] who received at least one dose of dupilumab or placebo and had at least one post-baseline Worst Itch Scale assessment during the treatment period. The study sample was racially and ethnically diverse, with approximately 69% White, 17% Black or African American, 8% Asian and 5% other. Patient demographics and background have been reported previously [12].

Test–Retest Reliability

Test–retest reliability ICCs for the weekly average of the daily worst itch during the last 2 weeks of the treatment period (Week 15 [test] and Week 16 [retest]), and at the start of the study (baseline [test] and Week 2 [retest]), were above the 0.70 criterion in both cases, at 0.95 and 0.76, respectively, indicating that the weekly average of the daily worst itch scores was stable during the time when the patients’ disease was stable.

Construct Validity

As expected, strong, positive correlations (r ≥ 0.50; based on Cohen’s guidelines [19]) were observed at baseline between the weekly average of the daily worst itch score and the SCORAD itch item (r, 0.69), PGID (r, 0.65) and the CDLQI itch item (r, 0.56) (Table 1). By Week 16, the magnitude of correlation coefficients increased in strength to 0.78, 0.67 and 0.58, respectively. Correlations with the ClinRO instruments were lower than with the PRO measures at both timepoints, with small correlations (r < 0.30; based on Cohen’s guidelines [19]) at baseline and moderate correlations (0.30 ≤ r < 0.50; based on Cohen’s guidelines [19]) at Week 16.
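The thresholds applied above can be expressed as a small helper. The cut-offs are those quoted in the text from Cohen’s guidelines; applying them to the absolute value of r is an assumption for handling negative correlations, and the function name is hypothetical.

```python
def cohen_strength(r: float) -> str:
    """Label a correlation coefficient using the cut-offs quoted in the text."""
    magnitude = abs(r)  # assumption: sign is ignored when judging strength
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient cannot exceed 1 in magnitude")
    if magnitude >= 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    return "small"


# Baseline correlations reported above: SCORAD itch item, PGID, CDLQI itch item.
baseline_labels = [cohen_strength(r) for r in (0.69, 0.65, 0.56)]
```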

Known-Groups Validity

Known-groups analyses of variance (ANOVAs) were conducted to evaluate the discriminating ability of the weekly average worst itch score at baseline and Week 16 (Table 1). Results for the known-groups comparisons were in the anticipated direction and statistically significant for all omnibus tests and most pairwise tests (except for the pairwise comparisons at baseline, likely due to lack of variability).

Sensitivity to Change

Responsiveness or sensitivity to change was evaluated by computing correlations of change from baseline to Week 16 in the weekly average of the daily worst itch scores and the supporting outcome measures, effect-size estimates of change and ANOVAs (Table 1).

Clinically Meaningful Within-Patient Change Threshold

The response threshold estimates and distribution-based estimates are presented in Table 2. The primary anchor was a 1-point improvement in the PGID. The responder definition estimated as the mean of the weekly average of the daily worst itch scores corresponding to the primary anchor was 2.84. The responder definitions based on other anchors (PGIC, EASI response and IGA score) ranged from 2.43 for the PGIC to 4.80 for EASI 90–100. The highest estimates obtained (4.49, 4.71 and 4.80) were from the most stringent criteria (PGIC ‘much better’, IGA score 0/1 and EASI 90–100, respectively). As expected, the distribution-based one-half SD and SEM were much lower (less than 1.0) than the anchor-based threshold estimates.
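The anchor-based estimate can be illustrated with the sketch below, which takes the mean improvement in the weekly average score among patients who meet the anchor criterion. This is a simplified reading of the approach; the record type, field names and patient values are hypothetical, and the study’s actual analysis may differ in detail.

```python
from statistics import mean
from typing import Iterable, NamedTuple


class PatientChange(NamedTuple):
    itch_improvement: float  # baseline minus Week 16 weekly average (positive = less itch)
    pgid_improvement: int    # PGID categories improved from baseline (positive = better)


def anchor_based_threshold(
    patients: Iterable[PatientChange], anchor_improvement: int = 1
) -> float:
    """Mean itch improvement among patients matching the anchor criterion."""
    group = [
        p.itch_improvement
        for p in patients
        if p.pgid_improvement == anchor_improvement
    ]
    if not group:
        raise ValueError("no patients meet the anchor criterion")
    return mean(group)


# Hypothetical patients; only the first two improved by exactly 1 PGID category.
patients = [
    PatientChange(3.0, 1),
    PatientChange(2.5, 1),
    PatientChange(5.0, 2),
    PatientChange(0.5, 0),
]
threshold = anchor_based_threshold(patients)
```

Restricting the group to patients with exactly a 1-point PGID improvement targets the smallest change patients themselves perceive, which is why more stringent anchors produce higher estimates.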

Table 2 Estimates for thresholds for meaningful within-person change on the weekly average of the daily worst itch score using data from LIBERTY AD PEDS (R668-AD-1652; NCT03345914)

Discussion

The results of this study show that the Worst Itch Scale is a fit-for-purpose, well-defined and reliable tool to evaluate intensity of worst itch among paediatric patients aged 6–11 years with AD. The findings from the anchor- and distribution-based response analyses suggest that the most appropriate definition of a clinically meaningful response on the weekly average of the daily worst itch scores is ≥ 3–4 points.

Cognitive interviews were conducted to develop and evaluate the content validity of the Worst Itch Scale. Children were included across the whole range of 6–11 years and the children in the lower end of the range (6–7 years), for whom self-assessment of itch is particularly challenging, were well represented by 6 of the 22 participants (27.3%). The most appropriate format for the Worst Itch Scale consisted of two separate items: one asking about ‘worst itching’ experienced ‘last night’ and one asking about ‘worst itching’ experienced ‘today’. The results of the psychometric evaluation of the weekly average of the daily worst itch scores using data from the LIBERTY AD PEDS clinical trial [12] provide confirmation of strong measurement properties in children aged 6–11 years with severe AD.

Test–retest reliability was well above the recommended 0.70, indicating that the weekly average of the daily worst itch scores was stable over the time when the patients’ disease was stable. The results consistently supported the construct validity of the weekly average of the daily worst itch scores, demonstrating that the weekly average provides a measure of itch intensity from the patients’ perspective. All correlations were statistically significant, and the largest convergent correlations were, as expected, between the weekly average of the daily worst itch scores and the PRO measures designed to assess itch intensity or severity. Correlations with EASI total score and IGA score were lower, as might be expected for ClinRO instruments, which were not developed for the assessment of symptoms from the patient perspective and assess multiple clinical signs with varying association to pruritus. Furthermore, correlations tended to be lower at baseline than at Week 16, again, as expected, due to relatively low variability in scale scores resulting from study inclusion criteria designed to enrol patients with severe AD.

In the known-groups analyses, the consistent patterns of scores and highly significant differences between patients at the levels of the known groups tested at both timepoints (except for the pairwise comparisons at baseline) provide strong support for the discriminating ability of the weekly average of daily worst itch scores. Additionally, the patterns of mean change, which were in the anticipated direction and statistically significantly different across the PGID subgroups, as well as the strong correlations and large effect sizes of change, provide compelling evidence to support the responsiveness of the weekly average of the daily worst itch score. As expected, correlations between change in the weekly average of daily worst itch scores and change scores in the PRO measures, which assessed similar constructs, tended to be moderate to strong (r ≥ 0.30), ranging from 0.45 to 0.70. These were stronger than those observed between the change in weekly average of daily worst itch scores and ClinRO measures, which ranged from 0.40 to 0.46. Overall, the results indicated that improvements in worst itch intensity reported by patients using the weekly average of the worst itch score corresponded with improvements reported by patients and clinicians, and were also related to changes in similar concepts as assessed by different instruments. Although the Worst Itch Scale score provides only a single score of itch intensity, it did compare well with CDLQI. The moderate correlation with CDLQI suggests that the worst itch score reflects the impact of itch on a patient’s quality of life. However, it is important that the true impact of itch on quality of life is accurately captured using CDLQI or another itch-specific impact measure.

The results of the anchor-based analyses using the primary patient-reported anchor, PGID, as well as supportive PRO and ClinRO anchors, provide evidence that a range of ≥ 3–4 points is an appropriate threshold for identifying clinically meaningful improvement in the weekly average of the daily worst itch. The distribution-based estimates of less than 1 point were lower than the anchor-based estimates. As the anchor-based method is preferred by the FDA [13], this was taken as the primary method. Thresholds for meaningful improvement in itch derived in children aged 6–11 years are comparable to those observed in adults [8].

A limitation of this analysis was that thresholds were derived empirically from data for patients with severe AD aged 6–11 years, and so may not be appropriate for extrapolation to other AD severity classifications or conditions. Construct validity was assessed using correlations of the Worst Itch Scale with several items, including the CDLQI itch item and SCORAD itch VAS. The CDLQI and SCORAD measures have been fully validated [20]; however, the individual itch items have not been validated separately.

Conclusion

The Worst Itch Scale is a well-defined, reliable, sensitive and valid scale for evaluating worst itch intensity in children aged 6–11 years with severe AD. The most appropriate within-patient threshold for defining a clinically relevant response was a ≥ 3–4-point change on the Worst Itch Scale.