Key Summary Points

Why carry out this study?

Existing patient-reported outcome measures (PROMs) are unlikely to be considered fit for purpose, from a regulatory perspective, for assessing Chronic Hand Eczema (CHE) sign/symptom severity in clinical trials of CHE treatments or in clinical practice for the management of CHE.

Other existing PROMs used in CHE are either not disease specific (e.g., the Dermatology Life Quality Index) or primarily provide an assessment of quality of life rather than a comprehensive assessment of the key signs/symptoms associated with CHE (e.g., the Quality of Life in Hand Eczema).

This study aimed to assess the content validity and psychometric validity of the Hand Eczema Symptom Diary (HESD), a new PROM designed to assess CHE sign/symptom severity and measure changes in these over time in CHE clinical trials and clinical practice.

What was learned from the study?

Content validity of the 6-item HESD (assessing itch, pain, cracked skin, redness, dryness and flaking) was confirmed in cognitive debriefing interviews with CHE patients. Psychometric validation activities provided strong evidence of construct validity, reliability and ability to detect change for the HESD Itch score, HESD Pain score and HESD score, and established within-individual responder definitions for these scores.

Introduction

Chronic Hand Eczema (CHE) is an inflammatory skin disease of the hands or wrists, often caused by contact dermatitis and characterized by recurring flares and poor prognosis [1]. One-year prevalence of CHE is estimated to be 4.7–9.1% [2,3,4], and the life-time prevalence of severe to very severe HE has been reported to be 1.9% [2]. Approximately 70.0% of CHE patients have moderate to severe disease, which can persist over several years [5]. CHE is characterized by signs of erythema, infiltration, hyperkeratosis and vesicles. Secondary signs include scaling, fissures and erosions, and the condition may be exacerbated by infections [3, 6]. Core symptoms include itch and pain, which are reported to negatively impact patients’ psychological wellbeing, physical functioning, daily life activities and the ability to work [1, 7,8,9,10].

Patient-reported outcome measures (PROMs) are valuable for supporting clinical trial endpoints for new therapies alongside clinical endpoints. Regulatory guidance indicates that fit-for-purpose PROMs must measure ‘concepts of interest’ that are clinically relevant and important to how patients ‘feel and function’, with evidence of validity, reliability and ability to detect change in the target population [11,12,13,14]. Previous qualitative research, comprising a literature review and concept elicitation (CE) interviews with 20 CHE patients in the USA and five expert dermatologists, was conducted to better understand the lived experience of CHE patients [9]. The signs/symptoms most frequently reported by patients included itch, pain, cracked skin, redness, thickened skin, swelling and bleeding.

The key signs and symptoms identified in this previous qualitative work were used to draft items to form the Hand Eczema Symptom Diary (HESD). The HESD is a new, CHE-specific PROM designed to assess the worst severity of CHE signs and symptoms for use in clinical trials of treatment interventions for CHE and in clinical practice for the management of CHE. Other existing measures of symptoms and/or health-related quality of life (HRQoL) used in CHE are either not specific to CHE (e.g., the Dermatology Life Quality Index [DLQI]) or have limitations that make them less likely to be accepted by regulatory authorities such as the US FDA as a measure of symptom severity. The Quality of Life in Hand Eczema (QOLHEQ) [15], for example, is a PROM that includes some measurement of the signs/symptoms of CHE but is primarily an assessment of HRQoL.

The aims of this research were to evaluate the content validity of the HESD through qualitative patient interviews, and then to evaluate its psychometric properties and estimate within-individual responder definition thresholds for meaningful change using data from two clinical trials.

Methods

Study Design

This study consisted of two core activities: qualitative cognitive debriefing (CD) interviews with adult CHE patients to evaluate content validity of the HESD and psychometric validation using data from two clinical trials, pooled across treatment groups.

The study activities were performed in accordance with the Helsinki Declaration of 1964 and its later amendments. All participants provided written informed consent indicating that their data would be used for medical research purposes and that the results may be published. Ethical approval and oversight for the qualitative interviews were provided by Copernicus Group Independent Review Board (CGIRB; reference ADE1-17-162). Ethical approval for the psychometric validation activities was obtained as part of the phase 2b (NCT03683719) and phase 3 (NCT04871711) trials from the independent institutional review boards (IRBs) and ethics committees.

Instrument Development

The initial draft HESD included 13 items designed to assess the severity of CHE signs/symptoms. Patients rate each sign/symptom, at its worst, over the past 24 h using a 0–10 numeric rating scale (NRS). The decision to focus on ‘worst severity’ was to align with the US FDA’s stated preference for the assessment of symptoms ‘at their worst’ when label claims are sought [13]. A 24-h recall period was selected as CHE signs/symptoms can fluctuate daily, and the measure was intended to track sign/symptom severity over time. For the 0–10 NRS, two options for the upper anchor label were tested (i.e., ‘severe’ and ‘worst imaginable’) during the CD interviews, with the aim of selecting the most appropriate option from the patient perspective. The lower anchor was ‘no (sign/symptom)’ for all HESD items. The HESD was subject to multiple rounds of CD and revision, including input from expert dermatologists at key stages to ensure clinical insight into the interpretation of findings (Fig. 1).

Fig. 1 HESD development overview

Qualitative Interviews

The content validity of the draft 13-item HESD was assessed in semi-structured, qualitative, face-to-face interviews with adults with CHE in the US. The interviews comprised CD activities to assess whether the HESD items were understood and relevant and whether the measure captured all sign/symptom concepts important to CHE patients. The interviews were approximately 60 min in total, with 30 min focused on CD of the HESD (the remainder was focused on debriefing an impact measure). The interviews were conducted in two rounds by trained qualitative interviewers, allowing the instrument to be modified and the updated version tested between rounds.

Sample and Recruitment

Participants for the interviews were identified by a recruitment agency in the US (MedQuest, New York, NY, USA) via referring primary care physicians and dermatology specialists. To be eligible, patients had to be at least 18 years of age and have a clinician-confirmed diagnosis of CHE defined according to the guidelines for management of HE (ICD-10 codes: L20, L23, L24, L25, L30) [16]. Quotas were used to ensure the inclusion of patients with different levels of disease severity according to a standard physician global assessment (PGA) and from different regions of the USA to account for potential seasonal and regional differences. To capture a diversity of patient perspectives, quotas were also set with a view to recruiting a mix of patients with allergic, irritant and atopic CHE subtypes. A total of 20 CHE patients (split across the different subtypes) was expected to be sufficient to assess content validity [17]. All interview participants were compensated for their participation.

Interview Procedure

Participants were asked to complete the HESD on a handheld device using a ‘think aloud’ approach and to share their thoughts as they read each instruction/item and selected each response. Participants were asked detailed questions about their interpretation and understanding of the instructions and item wording, relevance of concepts and the appropriateness of response options and recall period.

To determine the most appropriate wording to use for the response scale, half the participants (n = 10) were first debriefed on the ‘severe’ upper anchor label, whereas the other half (n = 10) were first debriefed on the ‘worst imaginable’ upper anchor label. Following debriefing of the HESD in its entirety, participants were then presented with one or more items with the opposite upper anchor label and asked to indicate their preference and why. Usability of the handheld device was also explored.

Qualitative Analysis

All interviews were audio-recorded and transcribed verbatim, with identifiable information redacted. Transcripts were imported into Atlas.Ti Scientific Software for analysis [18]. Qualitative analysis of the verbatim transcripts involved assigning dichotomous codes to each item, instruction, response option and recall period to indicate whether it was understood, relevant and appropriate and why.

Psychometric Validation

The 11-item version of the HESD that emerged from the content validity interviews was included as an exploratory endpoint in a phase 2b dose-ranging trial conducted to evaluate the efficacy and safety of delgocitinib cream in patients with mild to severe CHE (NCT03683719). Data from the trial were used to support item reduction, development of scoring and initial evaluation of psychometric properties (see Table S1 for more detail on methods). Analyses were conducted in the full sample (N = 258). The study design and eligibility criteria for the phase 2b trial have been previously described [19].

Psychometric results from the phase 2b trial were reviewed by the US FDA, who recommended deleting additional items (those assessing the least commonly reported signs/symptoms). The FDA also recommended using Patient Global Impression (PGI) items more specific to the targeted measurement concepts in the anchor-based analyses conducted to confirm within-individual responder definitions for the HESD scores (e.g., for the HESD Itch score, a PGI-Severity [PGI-S] item asking about severity of itch rather than just severity of CHE). Following this feedback, additional items were removed, resulting in the 6-item HESD, and specific PGI items were developed for itch and pain.

The 6-item HESD was included as a secondary endpoint in a phase 3, randomized, double-blind, vehicle-controlled, parallel-group, multi-site trial evaluating the efficacy and safety of twice-daily application of delgocitinib cream 20 mg/g versus cream vehicle over a 16-week treatment period in adults with moderate to severe CHE (NCT04871711; DELTA 1). Data from the first 280 participants with an IGA-CHE score at Baseline and Week 16 were used to perform confirmatory evaluation of the measurement properties of the 6-item HESD, including anchor- and distribution-based analyses to support interpretation of change. The HESD was completed on an electronic device (eDiary) starting at least 7 days prior to Baseline and then every evening until Week 16.

Eligibility Criteria

Participants in the phase 3 trial were required to have: a diagnosis of CHE, defined as HE that has persisted for > 3 months or returned twice or more within the last 12 months; moderate to severe CHE at screening and Baseline according to the Investigator Global Assessment for Chronic Hand Eczema (IGA-CHE; score of 3 or 4); a HESD Itch score (weekly average) of ≥ 4 points for the 7 days preceding Baseline; and a documented recent history of inadequate response to treatment with topical corticosteroids (TCS) (at any time within 1 year before the screening visit) or for whom TCS were documented to be otherwise medically inadvisable (e.g., due to important side effects or safety risks).

Trial Assessments

Other clinical outcome assessment (COA) measures administered alongside the HESD during the phase 3 trial were used to support construct validity analyses, define participants with stable CHE for test-retest reliability analysis and define participants who experienced change. Details of the other COA measures included in analyses are provided in Table 1.

Table 1 Overview of other relevant COA instruments included in the phase 3 clinical trial

Statistical Methods

Table 2 details the psychometric analyses used in this study. All analyses were conducted in the psychometric analysis population (comprising the first 280 randomized participants with an IGA-CHE assessment at Baseline and Week 16) using data from Week 4, unless otherwise specified. Week 4 was selected because this timepoint was expected to provide a wider distribution of scores than other timepoints such as Baseline or Week 16. The sample size of the psychometric analysis population was chosen so that the mean change in HESD scores in the ‘moderately improved’ anchor group could be estimated with a 95% confidence interval (CI) width of less than 1 point (< 25.0% of the expected mean change of 4). This calculation assumed the same proportion of participants rated ‘moderately improved’ and the same standard deviation (SD) as in the phase 2b sample, with allowance for dropout or violation of assumptions. All statistical analyses were detailed a priori in a psychometric analysis plan and conducted in accordance with European Medicines Agency (EMA) and US FDA standards and US FDA guidance for PROMs and other COAs [11, 14, 22, 23].
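The sample size logic above can be illustrated with a short Python sketch. It assumes a normal approximation (z = 1.96) for the 95% CI of a mean, so the full CI width is 2 × 1.96 × SD / √n; the trial may have used a t-based or otherwise different calculation. The function names and the inputs (SD of change 2.5, 25% of participants ‘moderately improved’, 10% dropout) are hypothetical illustration values, not phase 2b estimates.

```python
import math


def n_for_ci_width(sd: float, max_width: float, z: float = 1.96) -> int:
    """Smallest n for which the full CI width 2*z*sd/sqrt(n) is below max_width.

    Uses a normal approximation (assumption); a t-based interval would need a slightly larger n.
    """
    bound = (2 * z * sd / max_width) ** 2
    return math.floor(bound) + 1  # strictly greater than the bound


def total_sample_size(sd: float, max_width: float, prop_group: float, dropout: float = 0.0) -> int:
    """Scale the anchor-group n up by the expected share of that group and by expected dropout."""
    n_group = n_for_ci_width(sd, max_width)
    return math.ceil(n_group / prop_group / (1.0 - dropout))


# Hypothetical inputs: SD of change = 2.5, CI width < 1 point,
# 25% of participants 'moderately improved', 10% dropout.
print(n_for_ci_width(2.5, 1.0))                 # 97
print(total_sample_size(2.5, 1.0, 0.25, 0.10))  # 432
```

Working from the required size of the anchor group outward makes explicit how strongly the total depends on the assumed share of ‘moderately improved’ participants.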

Table 2 Summary of psychometric analyses performed in the phase 3 clinical trial data

Results

Qualitative Interviews

Interview Sample Characteristics

A total of 20 adults with CHE from the US were interviewed (n = 10 in each interview round). The interview sample represented a diverse range of demographic and clinical characteristics (Table 3).

Table 3 Demographic and clinical characteristics of interview participants

Cognitive Debriefing of HESD

The items developed to assess all sign/symptom concepts identified as important during CE were generally found to be relevant and well understood (Table 4). Items assessing itch, redness, roughness and dryness were relevant for all participants, regardless of CHE subtype. Most participants correctly understood and accurately used the recall period throughout the HESD (≥ 75.0% for each item). All participants (n = 20/20) demonstrated an understanding of both the ‘severe’ and ‘worst imaginable’ options for the upper anchor label of the response scale. However, most (n = 10/16, 62.5%) reported a preference for the ‘severe’ option because it better reflected how they perceived their experience of signs/symptoms (n = 5) and was easier for them to understand (n = 3). All participants asked found the touchscreen to be responsive (n = 6/6), the device light and easy to hold (n = 10/10), the font size appropriate and easy to read (n = 13/13) and navigating between the items to be easy (n = 13/13).

Table 4 Overview of HESD item understanding and relevance

Findings from the first round of interviews suggested several signs/symptoms were closely related or conceptually equivalent. To explore item redundancy and inform item reduction, seven participants (n = 7/10) in round two were asked to group the dermatological signs/symptoms they perceived to overlap conceptually. The most frequently reported conceptual overlaps were between flaking and cracking (n = 6/7), heat and burning (n = 5/7), hardness and roughness (n = 5/7) and hardness and thickening (n = 5/7).

Based on the findings from round one, input from the expert dermatologists and further consideration of the earlier CE findings, two additional items were added to assess concepts reported during the interviews not captured by the initial HESD (‘skin hardness’ and ‘oozing/weeping’). The original 13 items were retained for further evaluation in round two without revisions.

Additional modifications were made to the HESD following round two. Items assessing skin ‘hardness’ and ‘roughness’ were removed due to their similarity/overlap with skin ‘thickening’. Thickening was retained as it reflected the terminology used most frequently by participants when describing this sign/symptom. The item assessing ‘blisters’ was also removed because of its low relevance during CE and input from the expert dermatologists that blisters are relatively rare and typically representative of severe disease and therefore would not be relevant to most CHE patients. As blisters do not change rapidly from day to day and are a sign that is easily observed by clinicians, it was decided they would be more appropriate to evaluate in clinician-completed assessments, such as the IGA-CHE and Hand Eczema Severity Index (HECSI). Additionally, the item assessing ‘heat’ was removed due to its low relevance during CE and overlap with ‘burning’. This resulted in the 11-item HESD taken forward for psychometric validation in the phase 2b trial.

Modifications to HESD Following Initial Psychometric Validation and Regulatory Feedback

Based on the findings of the initial psychometric validation using the phase 2b data but importantly also considering the findings of the qualitative research and clinical input, the HESD was reduced from 11 to 8 items. Items assessing ‘bleeding’ and ‘oozing/weeping’ were removed because of poor item performance (i.e., heavily skewed response distributions across all timepoints [range 42.9–78.2%]; see Fig. S1) and low correlations (range 0.36–0.66; Table S2) with other items in the measure. Expert clinical input suggested these signs/symptoms were more characteristic of a severe CHE flare rather than daily experience and could likely be adequately assessed through clinician-completed assessments. Additionally, the qualitative findings and high correlations (r = 0.90; Table S2) suggested ‘pain’ and ‘burning’ were very closely conceptually related to the point of redundancy. Expert clinical input confirmed that ‘pain’ was more important to assess from a clinical perspective and therefore ‘burning’ was removed.

Following US FDA feedback, the HESD was further reduced from 8 to 6 items. Items assessing ‘swelling’ and ‘thickening’ were removed because the US FDA had questioned their importance to CHE patients, given that < 50.0% of interview participants spontaneously reported experience of these symptoms (Table 4) and response distributions for the phase 2b trial indicated that > 15.0% reported ‘no sign or symptom’ at Baseline for these items (Fig. S1). This resulted in the 6-item HESD brought forward for psychometric validation in the phase 3 trial (Fig. 2).

Fig. 2 Six-item HESD conceptual framework

Psychometric Validation of the HESD in Phase 3 Clinical Trial

For the psychometric results, we focus on the results from the evaluation of the final 6-item version in the phase 3 clinical trial. However, results from the validation of the 8-item HESD using the phase 2b data were comparable and equally strong and are summarized in Table S1.

Trial Sample Characteristics

Key demographic and clinical characteristics of the phase 3 sample are provided in Table 5. The sample included more female (65.7%) than male participants and most were clinically classified as Fitzpatrick skin types II or III (43.2% and 41.1%, respectively).

Table 5 Demographic and clinical characteristics for the psychometric analysis population in the phase 3 trial

Stage 1: Item Properties

Quality of completion

Quality of completion was relatively high across all timepoints, decreasing only slightly over time. For the 7 days prior to the Baseline visit, 80.7% (n = 226) had no missing HESD daily item scores, and for the 7 days prior to the Week 16 visit, 72.9% (n = 204) had no missing HESD daily item scores. Over 94.0% had at least four complete days of HESD daily scores for a given week and most participants were missing just 1 or 2 days in each week.
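The weekly scores used throughout this section can be sketched as follows. This is a hypothetical scoring routine, not the published HESD scoring algorithm: it assumes the daily HESD score is the mean of the six 0–10 item ratings (with a day counted as missing if any item is missing), that the HESD Itch and Pain scores average the single corresponding item, and that a weekly average requires at least four non-missing days, a minimum suggested by the 4-day Spearman-Brown check reported later in this section. `MIN_DAYS`, `daily_hesd_score` and `weekly_average` are illustrative names.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical minimum number of completed days for a valid weekly average.
MIN_DAYS = 4


def daily_hesd_score(items: Sequence[Optional[float]]) -> Optional[float]:
    """Daily HESD score as the mean of six 0-10 item ratings (assumed rule).

    Returns None if the day is incomplete (any of the six items missing).
    """
    if len(items) != 6 or any(v is None for v in items):
        return None
    return mean(items)


def weekly_average(daily: Sequence[Optional[float]], min_days: int = MIN_DAYS) -> Optional[float]:
    """Average of the non-missing daily scores in a 7-day window, or None if too few days."""
    observed = [d for d in daily if d is not None]
    if len(observed) < min_days:
        return None
    return mean(observed)


# Example: a week of daily itch ratings with two missing days
# (the weekly average is the mean of the five observed days).
itch_week = [6, 7, None, 5, 6, None, 6]
print(weekly_average(itch_week))
```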


Item-response distributions

For all items, post-Baseline item responses were distributed across the response scale and in line with expectations (see Figures S2, S3 and S4). Ceiling effects (> 15.0% scoring the highest possible score) were only detected during the Baseline week for ‘cracked skin’ at Day −2 (16.0%) and Day −1 (16.1%) and for ‘dryness’ from Day −7 to Day −1 (17.1–18.9%). These effects were limited and not a concern, as participants were expected to have more severe CHE at entry. From Week 4 onwards, floor effects (> 15.0% scoring the lowest possible score) became dominant, reflecting improvements associated with treatment.
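The floor and ceiling criterion used here (> 15.0% of responses at the lowest or highest point of the 0–10 NRS) is simple to compute per item and day. A minimal sketch; `floor_ceiling` is a hypothetical helper and the responses are invented.

```python
from typing import Dict, Sequence


def floor_ceiling(responses: Sequence[int], lowest: int = 0, highest: int = 10,
                  cutoff: float = 15.0) -> Dict[str, object]:
    """Percentage of responses at each end of a 0-10 NRS and whether each exceeds the cutoff."""
    n = len(responses)
    pct_floor = 100.0 * sum(r == lowest for r in responses) / n
    pct_ceiling = 100.0 * sum(r == highest for r in responses) / n
    return {
        "floor_pct": pct_floor,
        "ceiling_pct": pct_ceiling,
        "floor_effect": pct_floor > cutoff,
        "ceiling_effect": pct_ceiling > cutoff,
    }


# Invented Baseline-week ratings for one item; 3 of 10 are at the top of the scale.
print(floor_ceiling([10, 10, 9, 8, 7, 10, 5, 6, 8, 9]))
# {'floor_pct': 0.0, 'ceiling_pct': 30.0, 'floor_effect': False, 'ceiling_effect': True}
```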

Stage 2: Dimensionality and Scoring


Inter-item correlations

Inter-item correlations were calculated using data from Day −1 of Week 4 to provide an initial exploration of dimensionality. All inter-item correlations were moderate to high (0.71–0.89) and none exceeded 0.90, suggesting that all items are closely related but that no item pair is so closely related as to suggest redundancy (Table S3).
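A sketch of the redundancy screen described above: compute pairwise Pearson correlations between items and flag any pair above 0.90. The correlation type (Pearson rather than, for example, polychoric) and the helper names are assumptions for illustration, and the toy data are invented.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(items: Dict[str, List[float]], threshold: float = 0.90) -> List[Tuple[str, str, float]]:
    """Return item pairs whose correlation exceeds the redundancy threshold."""
    flagged = []
    for a, b in combinations(items, 2):
        r = pearson(items[a], items[b])
        if r > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Invented toy data: 'pain' is a rescaled copy of 'itch', so that pair is flagged.
toy = {
    "itch": [1, 2, 3, 4, 5],
    "pain": [2, 4, 6, 8, 10],
    "dryness": [5, 3, 4, 1, 2],
}
print(redundant_pairs(toy))  # [('itch', 'pain', 1.0)]
```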


Rasch analysis

Rasch analysis provided further support for a single unidimensional structure of the HESD, with all items showing acceptable fit statistics (see Online Appendix 1 for full details). Some evidence of person misfit was found for the infit and outfit statistics for the overall model; however, the patterns were not of concern given the low proportion of participants with high fit statistics. Item characteristic curves provided evidence that the response scale for each item was appropriate, with distinct peaks for most response options on the 11-point NRS. Local independence of item responses was demonstrated for all items except ‘dryness’ and ‘flaking’.

Stage 3: Reliability and Validity of Scores


Internal consistency reliability

Internal consistency was examined using data from Day −1 of Week 4 to assess the homogeneity of the items contributing to the HESD weekly average score (Table S4). Cronbach’s alpha was high (0.96), well above the a priori threshold of > 0.70, indicating good internal consistency. Recalculating the alpha coefficient with each item deleted in turn gave slightly lower values for all items (0.95–0.96), supporting retention of all items.
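Cronbach’s alpha and the alpha-if-item-deleted check can be computed from per-item score vectors using alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). A minimal Python sketch using sample variances throughout; the function names and the item scores are hypothetical.

```python
from statistics import variance
from typing import Dict, List


def cronbach_alpha(items: List[List[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

    items: one list of scores per item, each with one entry per participant.
    Needs at least two items.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def alpha_if_deleted(named_items: Dict[str, List[float]]) -> Dict[str, float]:
    """Alpha recomputed with each item removed in turn."""
    return {
        name: cronbach_alpha([scores for other, scores in named_items.items() if other != name])
        for name in named_items
    }


# Invented scores for three items from four participants.
items = {"itch": [1, 2, 3, 4], "pain": [2, 1, 4, 3], "redness": [1, 2, 3, 4]}
print(round(cronbach_alpha(list(items.values())), 3))  # 0.892
print(alpha_if_deleted(items))
```

If deleting an item raised alpha, that item would be a candidate for removal; the pattern reported above (no increase on deletion) is what supports keeping all six.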


Test-retest reliability

Test-retest reliability results for participants defined as stable were ‘excellent’ (ICC range 0.89–0.94) for the weekly average HESD Itch score, HESD Pain score and HESD score between Weeks 2 and 4 and between Weeks 4 and 8, irrespective of the anchor used to define stability. Pearson’s correlation coefficients (range 0.84–0.91) were similar to the ICCs, providing further evidence of strong test-retest reliability for the HESD scores (Table 6). The Spearman-Brown Prophecy Formula, used to approximate the reliability of a weekly summary score created from fewer than 7 days of data, indicated that a summary score based on just 4 days would still have good reliability (> 0.83).
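The Spearman-Brown step-down mentioned above can be sketched as follows. It assumes the formula is applied to the reliability of the full 7-day average with a length factor of days/7, which is one common way to use it; the text does not state the exact inputs, and the 0.90 below is a hypothetical value, not a study result.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy: reliability after multiplying the number of
    averaged measurements by `length_factor` (values < 1 shorten it)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def reliability_for_days(week_reliability: float, days: int, full_days: int = 7) -> float:
    """Predicted reliability of an average over `days` days, stepping down from a 7-day average."""
    return spearman_brown(week_reliability, days / full_days)


# Hypothetical 7-day reliability of 0.90 (not a study estimate).
print(round(reliability_for_days(0.90, 4), 2))  # 0.84
```

For this hypothetical input the predicted 4-day reliability is exactly 36/43; stepping down to a single day and back up to 4 days gives the same answer, because Spearman-Brown factors compose multiplicatively.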

Table 6 Weekly average HESD Itch, HESD Pain and HESD score intra-class correlation coefficient estimates of test-retest reliability

Convergent validity

Correlations were examined between the HESD Itch score, HESD Pain score and HESD score and the DLQI (total score, symptoms and feeling subscale, and item 1: itch, pain, soreness, stinging) at Week 4 (Table 7). All convergent correlations were moderate (range 0.53–0.64) for all three HESD scores and exceeded the hypothesized threshold, providing evidence of convergent validity.

Table 7 Correlation of HESD scores with convergent measures at Week 4

Known-groups validity

HESD Itch scores, HESD Pain scores and HESD scores were compared among groups who differed in severity as defined by their Patient Global Assessment (PaGA), IGA-CHE and respective Itch, Pain and HESD PGI-S scores. For the HESD Itch score, there was a pattern of significantly higher mean scores (indicating worse itch) for participants who also scored higher (worse) on the PaGA, IGA-CHE and Itch PGI-S (p < 0.001 for all), with the expected monotonic increases across severity groups (Table 8). Effect sizes indicated that differences between adjacent groups were moderate to large (ES > 0.69). Similar patterns of results were found for the HESD Pain scores (Table S5) and HESD scores (Table S6), providing evidence of known-groups validity for all three HESD scores.
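The text does not state which effect-size formula was used for adjacent severity groups. A common choice, assumed here, is Cohen’s d with a pooled SD; the sketch computes it for each adjacent pair of groups ordered from least to most severe. Function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def cohens_d(more_severe: List[float], less_severe: List[float]) -> float:
    """Standardized mean difference using the pooled sample SD (assumed ES definition)."""
    n1, n2 = len(more_severe), len(less_severe)
    pooled_var = ((n1 - 1) * variance(more_severe) + (n2 - 1) * variance(less_severe)) / (n1 + n2 - 2)
    return (mean(more_severe) - mean(less_severe)) / sqrt(pooled_var)


def adjacent_effect_sizes(groups: List[List[float]]) -> List[float]:
    """groups are ordered from least to most severe; one d per adjacent pair."""
    return [cohens_d(groups[i + 1], groups[i]) for i in range(len(groups) - 1)]


# Invented HESD Itch scores for three severity groups (e.g., mild, moderate, severe).
groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
print(adjacent_effect_sizes(groups))  # [2.0, 2.0]
```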

Table 8 Known groups validity for the HESD itch scores at Week 4

Ability to detect change

Changes in HESD Itch scores, HESD Pain scores and HESD scores between Baseline and Week 16 were compared among participants defined as ‘improved’, ‘stable’ and ‘worsened’ on the PaGA, IGA-CHE and the respective Itch, Pain and HESD PGI-S and PGI-C. These results provide evidence that the three HESD scores can detect change over time, regardless of the rating used to define change. As shown in Table 9, for all anchors there was strong evidence that the HESD Itch score is responsive to improvements over time, with consistently large within-group effect sizes in the ‘improved’ groups (ES ≥ 2.67) compared with moderate to large effect sizes in the ‘stable’ group (ES range 0.73–1.12). The number of participants who worsened was extremely low for the PaGA, IGA-CHE and Itch PGI-S, so results for these groups should be interpreted with caution. Differences between change groups were statistically significant, and effect sizes between groups were mostly large. Results were similar and equally strong for the HESD Pain scores (Table S7) and HESD scores (Table S8).
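Likewise, the within-group effect-size formula is not specified in the text. One common definition, assumed in this sketch, divides the mean change by the SD of the Baseline scores and reports the magnitude, so that improvements (negative changes on the HESD) yield positive values. The function name and scores are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def within_group_es(baseline: List[float], follow_up: List[float]) -> float:
    """Magnitude of mean change divided by the Baseline SD (assumed ES definition)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return abs(mean(changes)) / stdev(baseline)


# Invented weekly-average HESD Itch scores for a small 'improved' group.
baseline = [6.0, 7.0, 8.0]
week16 = [2.0, 3.0, 5.0]
print(round(within_group_es(baseline, week16), 2))  # 3.67
```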

Table 9 HESD itch scores ability to detect change between Baseline and Week 16

Interpretation of scores

Anchor correlations showed that all anchors were sufficiently correlated with their target scores (range 0.54–0.70) to support meaningful change analysis. Mean change in HESD scores was calculated for participants defined as ‘minimally’ or ‘moderately’ improved on the PGI-S (Itch, Pain, HESD), PGI-C (Itch, Pain, HESD), PaGA and IGA-CHE to define within-group change thresholds. This provided a range of plausible values for the meaningful change threshold of a within-individual responder definition: HESD Itch score (−2.8 to −5.4), HESD Pain score (−2.4 to −5.4) and HESD score (−2.7 to −5.2). A correlation-weighted average of these estimates was then produced, with each estimate weighted according to the strength of its anchor’s correlation with the target score. This produced values of −4.3 for the HESD Itch score, −4.4 for the HESD Pain score and −4.2 for the HESD score. Empirical cumulative distribution function (eCDF) and probability density function (PDF) plots showed that these thresholds would classify > 50.0% of ‘moderately’ and ‘much improved’ participants (based on the anchors) as improved while classifying few ‘stable’ participants as improved (see Fig. 3 as an example; see Figs. S5–S9 in the online supplementary material for the remaining eCDF plots). Between-group minimal important differences (MIDs) were also estimated, with plausible ranges of −1.45 to −2.50 for the HESD Itch score, −0.87 to −3.04 for the HESD Pain score and −1.54 to −2.51 for the HESD score.
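The correlation-weighted average described above weights each anchor-based mean-change estimate by the strength of that anchor’s correlation with the target score. A minimal sketch; the function name is hypothetical and the (estimate, correlation) pairs are invented, not the study’s values.

```python
from typing import List, Tuple


def correlation_weighted_threshold(estimates: List[Tuple[float, float]]) -> float:
    """Weighted mean of anchor-based change estimates.

    estimates: (mean_change, anchor_correlation) pairs; each estimate is
    weighted by its anchor's correlation with the target score.
    """
    total_weight = sum(r for _, r in estimates)
    return sum(r * change for change, r in estimates) / total_weight


# Invented (mean change, anchor correlation) pairs, not the study's estimates.
anchors = [(-3.0, 0.5), (-5.0, 0.7), (-4.0, 0.6)]
print(round(correlation_weighted_threshold(anchors), 1))  # -4.1
```

Weighting by anchor correlation lets the anchors most closely related to the target score contribute most to the pooled threshold.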

Fig. 3 Empirical cumulative distribution function of HESD Itch change scores by Itch PGI-S group at Week 16

The estimates derived from these anchor-based methods were triangulated to form a recommended threshold for a responder definition. Estimates were summarized on forest plots to identify convergence around a small range of values (see Fig. 4 as an example; see Figs. S10 and S11 in the online supplementary material for the remaining forest plots). A within-individual responder definition of −4.0 for each HESD score was recommended for consistency with thresholds used for similar scores in related populations [38] as well as for ease of use/interpretation.
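Applying the recommended responder definition is a simple comparison on weekly average change scores. The sketch below assumes change is computed as follow-up minus Baseline weekly average (so improvement is negative). It also evaluates the empirical CDF at the threshold, i.e., the proportion of each anchor group classified as responders, which is the quantity an eCDF plot such as Fig. 3 displays. Function names, group labels and change values are hypothetical.

```python
from typing import Dict, List

# Within-individual responder definition recommended for each HESD score.
RESPONDER_THRESHOLD = -4.0


def is_responder(baseline_avg: float, followup_avg: float, threshold: float = RESPONDER_THRESHOLD) -> bool:
    """True if the change in weekly average score (follow-up minus Baseline) is at or below the threshold."""
    return (followup_avg - baseline_avg) <= threshold


def ecdf_at(changes: List[float], x: float) -> float:
    """Empirical CDF of the change scores evaluated at x (share of changes <= x)."""
    return sum(c <= x for c in changes) / len(changes)


def share_responding(changes_by_group: Dict[str, List[float]],
                     threshold: float = RESPONDER_THRESHOLD) -> Dict[str, float]:
    """Proportion of each anchor group classified as responders."""
    return {group: ecdf_at(changes, threshold) for group, changes in changes_by_group.items()}


# Invented change scores for two anchor groups.
example = {
    "much improved": [-6.0, -5.0, -4.5, -3.0],
    "stable": [-1.0, 0.0, -4.2, 0.5],
}
print(share_responding(example))  # {'much improved': 0.75, 'stable': 0.25}
print(is_responder(7.0, 3.0))     # True
```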

Fig. 4 Forest plot showing different within-group mean change and distribution-based estimates of meaningful change for the HESD Itch score

Discussion

The HESD was developed based on in-depth qualitative research with the population of interest (CHE patients), thus following best practice methods for development of PROMs [9, 11,12,13,14]. As reported here, in-depth cognitive interviews were then used to confirm strong item relevance and comprehension, confirming the content validity of the measure. This was followed by psychometric evaluation using data from two clinical trials. All items were found to be highly relevant to most patients in both the qualitative and psychometric work and well understood by patients in the qualitative interviews. There were low levels of missing data throughout the phase 3 trial, providing evidence the diary is not overly burdensome to complete daily, even for a 16-week trial. Evaluation of item response distributions suggests the response scale can capture variability in severity of CHE signs/symptoms as well as changes in sign/symptom severity over time.

The final 6-item measure demonstrated strong validity, reliability and ability to detect change over time in the specific context of adult CHE patients with moderate to severe signs/symptoms. Inter-item correlations provided evidence that all items are adequately related and can be grouped into a single unidimensional summary score. The underlying structure of the HESD was further supported by Rasch analysis, with all items showing acceptable item fit statistics. Internal consistency reliability was very high and was not improved by removing any item, providing further support that the HESD score (weekly average) is unidimensional and that all items assess a single underlying trait. Test-retest reliability results for all three HESD scores (weekly average) were very strong, irrespective of the timepoints used or how stability of CHE severity was defined.

Convergent validity findings were consistent with a priori hypotheses concerning relationships between the HESD scores and measures of related concepts (DLQI total score, DLQI symptoms and feeling subscale and DLQI item 1: itch, pain, soreness, stinging). Known-group analyses provided evidence that the HESD scores can discriminate among patients who differ in CHE severity. The expected pattern of monotonically increasing HESD scores across severity groups was identified, and effect sizes indicated that differences between adjacent groups were moderate to large. Importantly, the HESD scores were shown to be sensitive to improvements in CHE severity, with large effect sizes within groups defined as ‘improved’ and between ‘improved’ and ‘stable’ groups. The triangulation of various anchor-based methods supported a responder definition of −4.0 as the threshold for defining within-individual clinically meaningful improvements in all three HESD scores in patients with moderate to severe CHE.

Strengths of this research include the capture of insights from 40 CHE patients through separate qualitative concept elicitation (n = 20) [9] and cognitive debriefing (n = 20) interviews during instrument development and content validity testing. Results from the psychometric validation conducted in two separate clinical trial populations were very consistent, providing confidence that the measurement properties are robust and not specific to a single sample. Furthermore, psychometric evaluation of the HESD was conducted in accordance with US FDA guidance for assessing measurement properties of PROMs [11, 14, 22, 23].

However, there are a few limitations to the study. The HESD was developed and tested only with patients in the US. To establish broader relevance across countries, there would be value in assessing content validity in a non-US population. For both psychometric validation populations, participants were predominantly white/Caucasian. Future confirmation of psychometric validity in larger proportions of non-white participants would be of value. Furthermore, although the interview sample included participants with three of the most prominent subtypes of CHE (i.e., allergic, atopic and irritant), it was not feasible to include all subtypes in the qualitative research. However, findings were consistent across the three prominent subtypes, providing some confidence in the generalizability of findings. Finally, all psychometric evaluation to date has been performed in clinical trial samples. If the HESD is to be used in real-world studies or in general clinical practice, further evaluation in a ‘real-world’ sample would be beneficial to confirm generalizability of the measurement properties.

Conclusion

The HESD is the first CHE-specific PRO diary measure developed and validated in line with regulatory guidance to specifically assess the severity of core signs and symptoms of CHE. Content validity of the HESD was confirmed in CD interviews with CHE patients, and psychometric validation activities showed evidence of strong construct validity, reliability and ability to detect change for the HESD Itch score, HESD Pain score and HESD score. These analyses also indicated that an improvement of ≥ 4 points in 7-day average HESD scores represents a clinically meaningful change.