Key Summary Points

Why carry out this study?

Assessing patient-centric outcomes in atopic dermatitis (AD) using patient-reported outcome (PRO) questionnaires is important in evaluating symptoms and their associated burden on daily life.

There is a need for guidance on translating PRO scores into clinically meaningful severity strata.

What was learned from this study?

Results from the linking analyses provide severity strata for scores generated by three novel PRO questionnaires (Worst Pruritus Numerical Rating Scale, Atopic Dermatitis Symptom Scale, and Atopic Dermatitis Impact Scale), for adolescents and adults with moderate to severe AD.

These severity strata can be used to inform clinical research and clinical practice treatment decisions.

Introduction

Atopic dermatitis (AD) is a chronic inflammatory skin disease affecting up to 30% of children and 10% of adults across different countries worldwide [1]. In addition to skin manifestations, AD is characterized by pruritus, skin pain, and sleep impacts [2]. There are currently no widely accepted biomarkers or objective measures of these symptoms; thus, they are best measured by patients themselves using patient-reported outcome (PRO) questionnaires. In addition, prior research reported only modest correlations between clinician-evaluated AD lesions and patient-reported symptoms such as itch and pain [3, 4]. Therefore, it is important to assess these aspects using PRO questionnaires.

Prior to questionnaire development, a review of existing AD-specific instruments was conducted to determine whether there were any existing tools that could be implemented in a clinical trial for the assessment of symptoms and impacts of moderate to severe AD in adolescents and adults [5]. While AD-specific PRO questionnaires were developed and evaluated by other groups [6,7,8,9,10] [including the consensus group to harmonize core outcome measures for atopic eczema/dermatitis (HOME) [11, 12]], none of the reviewed questionnaires met the criteria of the research team for the evaluation of daily and weekly symptoms and impacts in a clinical trial setting for the target patient population. Thus, three novel PRO questionnaires for adolescents and adults with moderate to severe AD were developed on the basis of best measurement practices summarized in the US Food and Drug Administration’s 2009 PRO guidance [13,14,15]: Worst Pruritus Numerical Rating Scale (WP-NRS), Atopic Dermatitis Symptom Scale (ADerm-SS), and Atopic Dermatitis Impact Scale (ADerm-IS). Evidence of content validity [5, 16], psychometric performance, and score interpretation guidance (e.g., meaningful within person change) [17] has been demonstrated in adolescents and adults with moderate to severe AD.

It is important to translate PRO questionnaire scores into easily understandable reference points for clinicians and patients (i.e., interpretability), as performed for other patient- and clinician-reported assessments for AD [18,19,20]. This research sought to define severity strata not previously reported for the three PRO questionnaires described above that assess the daily/weekly signs, symptoms, and impacts of moderate to severe AD [5, 16, 17].

Methods

Data

Data from a global, randomized, double-blind, placebo-controlled, multi-center phase 3 clinical trial (NCT03568318) involving 901 adolescents and adults with moderate to severe AD were used for this analysis. Ethical review at each clinical site was completed for the clinical trial study protocol, informed consent forms, and recruitment materials before enrollment (see Supplementary Material, Table 1 for details); the study design and patient population were described previously [21]. Participants with scores from the target PROs (WP-NRS, ADerm-SS, ADerm-IS) at baseline and one follow-up timepoint (week 2, 4, or 16) were included in the analyses.

Measures

Figure 1 summarizes the content of the three target PRO assessments and the scores they generate. Details for each questionnaire are also provided below.

Fig. 1

Target questionnaires and scores. (a) Questionnaire items use a 0–10 numerical rating scale. (b) Item completed daily. (c) To avoid repeated measurement of concepts potentially assessed by other instruments in clinical trials, the ADerm-SS TSS-7 was developed by summing items that assess concepts not measured by clinician-reported questionnaires. ADerm-IS Atopic Dermatitis Impact Scale, ADerm-SS Atopic Dermatitis Symptom Scale, TSS-7 7-Item Total Symptom Score, WP-NRS Worst Pruritus Numerical Rating Scale

The WP-NRS is a single-item PRO questionnaire designed to assess the severity of worst/maximal itch over the past 24 h on an 11-point numerical rating scale (NRS), with scores ranging from 0 (No itch) to 10 (Worst imaginable itch). Higher scores indicate more severe itch.

The ADerm-SS is an 11-item PRO questionnaire designed to assess 11 signs and symptoms of AD at their worst over a 24-h recall period. All items are scored on an 11-point NRS from 0 [no (sign/symptom concept)] to 10 [worst possible (sign/symptom concept)]. This analysis focused on two scores calculated for the ADerm-SS: a single-item score for skin pain (ADerm-SS Skin Pain) and a seven-item Total Symptom Score (ADerm-SS TSS-7). The ADerm-SS Skin Pain score is the score of Item 3 and ranges from 0 to 10, with higher scores indicating worse skin pain. The ADerm-SS TSS-7 is calculated as the sum of Items 1–7. The ADerm-SS TSS-7 score ranges from 0 to 70, with higher scores indicating worse AD symptoms.
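As an illustration only, the ADerm-SS scoring rule described above can be written as a short Python sketch. The function name `score_aderm_ss` and the choice to reject incomplete or out-of-range entries are assumptions made for this example; they are not part of the published scoring materials.

```python
from typing import Dict, Sequence


def score_aderm_ss(items: Sequence[int]) -> Dict[str, int]:
    """Score one ADerm-SS diary entry (hypothetical helper).

    `items` holds the 11 item responses in questionnaire order
    (Item 1 first), each on the 0-10 numerical rating scale.
    """
    if len(items) != 11:
        raise ValueError("ADerm-SS has 11 items")
    if any(not 0 <= x <= 10 for x in items):
        raise ValueError("each item must be between 0 and 10")
    return {
        "skin_pain": items[2],   # Item 3, range 0-10
        "tss7": sum(items[:7]),  # sum of Items 1-7, range 0-70
    }
```

For example, an entry with responses 1, 2, 3, 4, 5, 6, 7 on Items 1–7 gives a Skin Pain score of 3 and a TSS-7 of 28, regardless of the responses to Items 8–11.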

The ADerm-IS is a 10-item PRO questionnaire designed to assess a variety of impacts that patients experience from their AD across both a 24-h recall period (daily Items 1–3) and 7-day recall period (weekly Items 4–10). Three domain scores were calculated for the ADerm-IS: Sleep, Daily Activities, and Emotional State. The ADerm-IS Sleep, Daily Activities, and Emotional State scores were calculated as the sum of Items 1–3, Items 4–7, and Items 8–10, respectively. The ADerm-IS Sleep score ranges from 0 to 30, with higher scores indicating greater impacts on sleep. The ADerm-IS Daily Activities score ranges from 0 to 40, with higher scores indicating greater impacts on daily activities. The ADerm-IS Emotional State score ranges from 0 to 30, with higher scores indicating greater emotional impacts.
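The three ADerm-IS domain scores follow the same summation pattern. The sketch below is illustrative; the function name `score_aderm_is` and the separation of the input into daily and weekly item lists are assumptions made for this example.

```python
from typing import Dict, Sequence


def score_aderm_is(daily: Sequence[int], weekly: Sequence[int]) -> Dict[str, int]:
    """Compute ADerm-IS domain scores (hypothetical helper).

    `daily` holds Items 1-3 (24-h recall) and `weekly` holds Items 4-10
    (7-day recall), each on the 0-10 numerical rating scale.
    """
    if len(daily) != 3 or len(weekly) != 7:
        raise ValueError("expected 3 daily items and 7 weekly items")
    if any(not 0 <= x <= 10 for x in list(daily) + list(weekly)):
        raise ValueError("each item must be between 0 and 10")
    return {
        "sleep": sum(daily),                  # Items 1-3, range 0-30
        "daily_activities": sum(weekly[:4]),  # Items 4-7, range 0-40
        "emotional_state": sum(weekly[4:]),   # Items 8-10, range 0-30
    }
```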

The Patient Global Impression of Severity (PGIS) was used as an anchor variable in the analyses and is a single-item PRO questionnaire assessing overall current disease severity on a 7-point verbal response scale where 0 indicates “absent: no symptoms” and 6 indicates “very severe: cannot be ignored and markedly limits my daily activities.” In these analyses, the PGIS was collapsed into five categories to simplify interpretation (Fig. 2). This assessment was developed to align with regulatory expectations and guidance [22] on what constitutes a good anchor measure; specifically, it assesses the patient’s current state to minimize the influence of recall, it is easy to interpret and correlated with the target assessments, and it was completed at comparable timepoints during the trial.

Fig. 2

Patient Global Impression of Severity categories used for defining severity strata. PGIS Patient Global Impression of Severity. Absent, light peach with grey text; Minimal, peach with grey text; Mild, orange with white text; Moderate, light red with white text; Severe, dark red with white text
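The collapsing of the 7-point PGIS into five categories can be sketched as a lookup table. The endpoints (0 = absent, 6 = very severe) and the merging of Moderate with Moderately severe and of Severe with Very severe are taken from the text and figure captions; the numeric codes assigned to the intermediate responses (1 through 5) are an assumption made for this example, as is the function name `collapse_pgis`.

```python
# Collapsed category for each raw PGIS response.
# Codes 1-5 are assumed to follow the order Minimal, Mild, Moderate,
# Moderately severe, Severe; only 0 and 6 are stated explicitly in the text.
COLLAPSED_PGIS = {
    0: "Absent",
    1: "Minimal",
    2: "Mild",
    3: "Moderate",  # Moderate
    4: "Moderate",  # Moderately severe
    5: "Severe",    # Severe
    6: "Severe",    # Very severe
}


def collapse_pgis(raw_score: int) -> str:
    """Return the collapsed five-level category for a raw 0-6 PGIS response."""
    if raw_score not in COLLAPSED_PGIS:
        raise ValueError("PGIS response must be an integer from 0 to 6")
    return COLLAPSED_PGIS[raw_score]
```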

Statistical Methods

Linking is a statistical method that maps values from an anchor instrument to the equivalent values on the target measure (or vice versa) [23]. Equipercentile linking, the method most commonly used in studies of linking [24, 25], was used to map WP-NRS, ADerm-SS, and ADerm-IS scores to collapsed PGIS severity categories. Analyses were conducted using pooled data from several timepoints (baseline, weeks 2, 4, and 16) to ensure scores represented the full range of response options on the target assessments and the PGIS. For target assessments completed daily, scores from the earliest day of each corresponding weekly visit window were used. Pearson correlations and 95% confidence intervals (CI) were calculated between PGIS and the scores from the target PRO questionnaires for both age groups to confirm the suitability of the PGIS as an anchor for the linking analyses. To evaluate whether a pooled severity strata set could be created, adult and adolescent correlation coefficients and their 95% CIs were assessed for similarity. No imputation of missing data was conducted for any of the questionnaires.
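To make the idea of equipercentile linking concrete, the following Python sketch maps each anchor category to the target score that has the same cumulative percentile rank. It is a simplified, unsmoothed version that works on integer scores and is not the SAS implementation used in the study; the function names and the toy data in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def cumulative_proportions(values: Sequence[int], levels: Sequence[int]) -> List[float]:
    """Return the proportion of `values` at or below each level, in order."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, level) / n for level in levels]


def equipercentile_upper_bounds(
    anchor: Sequence[int],
    target: Sequence[int],
    anchor_levels: Sequence[int],
    target_max: int,
) -> Dict[int, int]:
    """Link anchor categories to target scores by matching percentile ranks.

    For each anchor level k, find the smallest integer target score s with
    P(target <= s) >= P(anchor <= k). That score serves as the upper bound
    of the stratum linked to category k.
    """
    if len(anchor) != len(target):
        raise ValueError("anchor and target must come from the same observations")
    if any(not 0 <= s <= target_max for s in target):
        raise ValueError("target scores must lie between 0 and target_max")
    target_scores = list(range(target_max + 1))
    anchor_cdf = cumulative_proportions(anchor, anchor_levels)
    target_cdf = cumulative_proportions(target, target_scores)
    bounds = {}
    for level, p in zip(anchor_levels, anchor_cdf):
        bounds[level] = next(s for s, q in zip(target_scores, target_cdf) if q >= p)
    return bounds
```

With invented data, where the anchor is coded 0 to 4 for the five collapsed categories, `equipercentile_upper_bounds([0, 1, 1, 2, 2, 2, 3, 3, 4, 4], [0, 1, 2, 3, 4, 4, 6, 7, 9, 10], [0, 1, 2, 3, 4], 10)` returns `{0: 0, 1: 2, 2: 4, 3: 7, 4: 10}`. Each value is the highest target score assigned to the corresponding anchor category in this toy example; these numbers are not study results.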

Score intervals were estimated separately for adolescents and adults, and then qualitatively evaluated to identify the severity strata that were applicable to both adults and adolescents. Specifically, a score of 0 was utilized to indicate “absent” unless results suggested otherwise. Additionally, upper severity thresholds were averaged between adults and adolescents, then rounded down to the nearest integer. If rounding down caused overlap with adjacent severity strata, the averaged value was rounded up.
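The averaging and rounding rule for pooled thresholds can be sketched as follows. The text does not define "overlap" formally; this example interprets it as a rounded-down upper bound that does not exceed the upper bound of the preceding (less severe) stratum. The function name and input values are hypothetical, and the qualitative review step described above is not represented.

```python
import math
from typing import List, Sequence


def pool_upper_thresholds(adult: Sequence[float], adolescent: Sequence[float]) -> List[int]:
    """Pool age-specific upper thresholds, ordered from least to most severe.

    Each pair is averaged and rounded down. If the rounded-down value would
    not exceed the previous pooled bound (our reading of "overlap"), the
    average is rounded up instead.
    """
    if len(adult) != len(adolescent):
        raise ValueError("both age groups need the same number of strata")
    pooled: List[int] = []
    for a, b in zip(adult, adolescent):
        mean = (a + b) / 2
        value = math.floor(mean)
        if pooled and value <= pooled[-1]:
            value = math.ceil(mean)  # rounding down collided with the prior stratum
        pooled.append(value)
    return pooled
```

With invented inputs, adult bounds of 0, 3, 6, 8, 10 and adolescent bounds of 0, 2, 5, 8, 10 pool to 0, 2, 5, 8, 10, whereas adult bounds of 0, 1, 2 and adolescent bounds of 0, 0, 1 pool to 0, 1, 2 because rounding down would otherwise repeat a bound.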

Finally, the agreement of severity strata with the PGIS for adolescents and adults was assessed by the weighted kappa statistic (κ). All analyses were conducted in SAS, version 9.4 (SAS Institute, Cary, NC, USA).
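For readers unfamiliar with the weighted kappa statistic, a minimal Python sketch is given below. The weighting scheme is not stated in the text, so linear disagreement weights are assumed here, with quadratic weights available as an option; the function name and signature are hypothetical and this is not the SAS procedure used in the study.

```python
from typing import Sequence


def weighted_kappa(
    ratings_a: Sequence[int],
    ratings_b: Sequence[int],
    n_categories: int,
    weights: str = "linear",
) -> float:
    """Weighted Cohen's kappa for two ordinal classifications coded 0..k-1.

    `ratings_a` could hold the collapsed PGIS category and `ratings_b` the
    severity stratum assigned from a PRO score, one pair per observation.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    k, n = n_categories, len(ratings_a)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            distance = abs(i - j) / (k - 1)
            w = distance if weights == "linear" else distance ** 2
            observed_disagreement += w * observed[i][j] / n
            expected_disagreement += w * row_totals[i] * col_totals[j] / n ** 2
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed_disagreement / expected_disagreement
```

Under either weighting, identical classifications yield κ = 1, and disagreements between adjacent categories are penalized less than disagreements between distant ones.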

Results

Participant Demographics and Baseline Health Characteristics

The total sample analyzed (N = 882) included adults (n = 769) and adolescents (n = 113) with moderate to severe AD. The mean age of participants at baseline was 34.1 ± 15.0 years (range 12–75 years), and over half the sample was male (60.8%) and white (71.4%) (Table 1). On the basis of the clinician-assessed validated Investigator Global Assessment of AD (vIGA-AD), 46.8% and 53.2% of the sample were rated as moderate and severe at baseline, respectively (similar percentages were observed for both the adult and adolescent subgroups). Baseline scores on the PGIS and target PRO questionnaires were also similar for adults and adolescents (Table 2).

Table 1 Demographic characteristics of the sample
Table 2 Score distribution of assessments at baseline (N = 882) for M16-047

Correlation of PGIS Anchor to the Target Assessments

The WP-NRS, ADerm-SS Skin Pain and TSS-7, and ADerm-IS domain scores were moderately to strongly correlated (r ≥ 0.49) with the PGIS for both adolescents and adults (see Supplementary Material, Table 2). While correlations were lower for the adolescent sample [range 0.49 (week 2 ADerm-SS Skin Pain and week 4 ADerm-IS Emotional State) to 0.70 (week 16 WP-NRS and ADerm-SS Skin Pain)] than for adults [range 0.61 (week 2 ADerm-IS Sleep) to 0.79 (week 16 ADerm-SS TSS-7)], all correlations were at least moderate. Furthermore, the 95% CIs for adults and adolescents overlapped for all scores for at least one timepoint, supporting the consistency of the correlations. Therefore, the PGIS was determined to be an acceptable anchor for equipercentile linking.
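The comparison of correlation confidence intervals between age groups can be illustrated with a short Python sketch. The method used to compute the 95% CIs is not stated in the text; the Fisher z-transformation is assumed here as one common choice, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for r via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def intervals_overlap(ci_a: Tuple[float, float], ci_b: Tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

For instance, `intervals_overlap(fisher_ci(0.60, 100), fisher_ci(0.70, 700))` compares two hypothetical groups; the correlations and sample sizes in that call are invented for illustration and are not taken from the supplementary tables.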

Identification of PRO Severity Strata by Age Group Separately and Combined

Identified severity strata anchored to the PGIS are presented in Fig. 3 for adults and adolescents separately. In general, the severity strata are consistent across both adults and adolescents, and therefore Fig. 3 also presents the pooled severity strata applicable to both adults and adolescents after qualitative evaluation. Agreement between the pooled adult and adolescent severity strata and the PGIS severity categories was acceptable (κ > 0.4) [26] when the strata were applied to either age group (Table 3).

Fig. 3

Severity strata anchored to collapsed Patient Global Impression of Severity categories (adults and adolescents and pooled). Severity strata are based on the collapsed PGIS. Absent, light peach with grey text; Minimal, peach with grey text; Mild, orange with white text; Moderate, light red with white text; Severe, dark red with white text. The Moderate stratum corresponds to PGIS of Moderate and Moderately severe; the Severe stratum corresponds to PGIS of Severe and Very severe. ADerm-IS Atopic Dermatitis Impact Scale, ADerm-SS Atopic Dermatitis Symptom Scale, PGIS Patient Global Impression of Severity, TSS-7 7-Item Total Symptom Score, WP-NRS Worst Pruritus Numerical Rating Scale

Table 3 Weighted kappa statistics

Discussion

The results presented can be used to interpret scores generated by the WP-NRS, ADerm-SS, and ADerm-IS, which were developed to be completed by adolescents and adults with moderate to severe AD [5, 16, 17] in a clinical trial setting. Results were largely consistent between the adult and adolescent samples, though correlations between scales were lower for adolescents, and threshold scores for adolescents tended to be less severe than those for adults. There may be subtle differences in symptom experience and burden between adults and adolescents with AD [27, 28], which may affect how they score their condition. Another explanation for the difference between age groups is that the ADerm-SS and ADerm-IS scales were originally developed on the basis of an adult content validation study and were geared toward the experience of this age group [5]. Follow-up qualitative research was conducted with adolescents with moderate to severe AD to confirm that the questionnaires capture the patient experience for this younger group [16]; however, differences in disease perception between age groups require further research.

The verbal response scale utilized by the PGIS provides a framework for interpreting the scores of the target assessments, which is useful in understanding the clinical meaning of the scores in relation to patients’ overall AD severity. Severity strata have been defined for other clinical outcome assessments, including PRO questionnaires in AD [18,19,20, 29,30,31], and this information can be used as a benchmark for comparing scores between PRO questionnaires and for interpreting scores in future clinical research for similar patients. The current strata for the WP-NRS and ADerm-SS Skin Pain scores are similar to results presented for other comparable PRO questionnaires [30, 31] that use a 0–10 NRS; specifically, scores of less than 4 and 3 are associated with milder itch and skin pain, respectively. For the ADerm-SS TSS-7 and ADerm-IS domain scores, the severity strata presented here may help researchers determine screening criteria and endpoints for future clinical research on moderate to severe AD.

While there are other analytic methods for linking [32, 33], the equipercentile linking method provides an automated, non-parametric, data-driven approach to identify severity strata that requires fewer restrictions and distributional assumptions [32,33,34] compared with anchor-based methods [18, 20, 29]. One strength of the current analyses was the ability to pool data from several timepoints during the clinical trial to ensure scores captured the full range of response options on both the PGIS and the target assessments. Another strength is the large sample and the inclusion of both adolescents and adults, which allowed separate investigation of the two age groups and maintained adequate data volume across each PGIS severity level. One limitation of these analyses is the use of clinical trial data collected among individuals with moderate to severe AD at baseline; thus, it is unknown whether these results are generalizable to other contexts of use or broader patient populations. Another limitation is the use of the PGIS anchor (with its 7-point response scale collapsed into five categories), which differs from other anchors used in prior AD research that categorize patients as clear, mild, moderate, or severe. Therefore, these analyses should be replicated using additional anchors and real-world data to confirm whether the severity bands are the same for a broader sample of individuals with AD (e.g., across gender, race, or region) in other settings. While using one cohesive set of interpretability bands for all age groups is practical, investigators may consider applying the separate bands provided for each age group. In addition, further evaluation is needed if applying these severity strata to a target patient population that differs from those with moderate to severe AD, such as patients with mild AD or general pruritus.

While there are other disease-specific assessments for AD, most are intended to be completed during clinic visits and have longer recall periods to accommodate a less frequent administration schedule. If the research goal is to evaluate the signs, symptoms, and impacts of moderate to severe AD on a more granular level, then the current daily/weekly diary assessments could be useful in understanding how the severity of these concepts is experienced by patients in their daily lives.

Conclusions

The current analyses provide severity strata to interpret scores generated by the WP-NRS, ADerm-SS, and ADerm-IS completed by adolescents and adults with moderate to severe AD. These strata may help inform future research, including clinical trial endpoints and clinical practice treatment targets, and help patients and clinicians understand research findings to participate in shared decision making.