
1 Introduction

The burden of dermatologic diseases is well documented. In 2010, nonmelanoma skin diseases were the fourth-leading cause of nonfatal disease burden at the global level [1]. Atopic dermatitis (AD) and chronic hand eczema (CHE) are among the most common types of dermatologic disease. A large, web-based survey conducted in 2016 in eight countries estimated AD prevalence in the past 12 months ranging from 4.3 to 16.7%; point prevalence estimates ranged from 2.1 to 8.1% [2]. Hand eczema (HE) is common, but the prevalence of CHE is difficult to estimate because many affected individuals do not seek treatment. HE accounts for 9–35% of all occupational disease and affects an estimated 2–10% of the general population [3]. Dermatologic conditions have a significant impact on health-related quality of life (HRQOL). AD and CHE often cause constant, intense itching, highly visible symptoms (e.g. redness, flaking, bleeding from scratching), and impaired psychosocial and work functioning [4, 5]. Psychiatric comorbidities, including depression, anxiety, and suicidal ideation, are more common in individuals with AD than in the general population, even among patients with clinically mild or moderate disease [6, 7]. CHE is also associated with symptoms of anxiety and depression [8] and impairment in HRQOL, work productivity, and the performance of nonwork activities [9]. Despite the burden associated with AD and CHE, health care providers may underestimate the severity and impact of the symptoms and the stigma of having a visible skin condition [4].

Primary endpoints in clinical trials of AD and CHE are usually clinician-reported outcome (ClinRO) measures. Several ClinRO scales have been developed to combine assessment of different aspects of a dermatologic condition, such as extent or severity, into an overall score (e.g. Eczema Area and Severity Index [EASI]). These scales are intended to be objective measures of disease; however, few of the ClinRO measures commonly used in dermatology have been adequately validated. Evidence-based decision making in the treatment of dermatologic diseases is challenged by a lack of clinical outcome measures with demonstrated validity, reliability, responsiveness, and interpretability [10, 11].

Comparisons of the inter- and intrarater reliability of commonly used skin ClinRO measures such as the EASI, objective Scoring Atopic Dermatitis (SCORAD), and Investigator Global Assessment (IGA) highlight shortcomings in the reliability and consistency of these scales in assessing patients with AD [12]. Furthermore, the IGA has historically been defined by a particular sponsor for use in a particular trial or context, resulting in variation in IGA versions; only recently has a validated IGA been published for use in AD (Validated Investigator Global Assessment for Atopic Dermatitis [vIGA-AD]) [13]. In recognition of the challenges of evaluating outcomes in AD, the Harmonising Outcome Measures for Eczema (HOME) initiative was founded in 2008 with the aim of standardizing a core set of outcomes that should be assessed in clinical trials and routine practice to support evidence-based decision making [14].

Patient-reported outcomes (PROs) provide an important complement to ClinROs in both clinical trials and routine practice. Key symptoms and impacts of AD and CHE, such as pruritus, sleep disturbance, and interference with activities, are difficult or impossible for clinicians to assess. Additionally, the meaningfulness of clinical improvements can only be assessed by study participants [15]. The use of PROs helps clinicians, regulators, and other stakeholders understand patients’ experiences with the symptoms and impacts of a disease. Under the Patient-Focused Drug Development initiative, the US Food and Drug Administration (FDA) is urging the use of patient experience data in drug development and evaluation, most recently through the 21st Century Cures Act and the sixth authorization of the Prescription Drug User Fee Act (PDUFA VI) [16]. HRQOL data, as assessed by patient-reported outcome measures (PROMs), are also increasingly expected and considered in health technology evaluations by bodies such as Germany’s Institute for Quality and Efficiency in Health Care (IQWiG) and the UK’s National Institute for Health and Care Excellence (NICE). However, a systematic literature review of randomized controlled dermatology-related clinical trials found that PROs were included in some form in only 25.6% of 125 trials conducted between 1994 and 2001 [15]. (It should be noted that this review was completed before the US FDA’s guidance on the use of PROs to support potential claims in product labeling was issued in 2009.)

The objective of this study was to conduct a review of the literature to identify and evaluate PROMs used in studies of adults with AD or CHE. Our aim was to understand how the key symptoms and impacts of these conditions are assessed and to explore any gaps in the measures in use.

2 Methods

A structured review was conducted to identify PROMs used or developed for use in adults with AD or CHE (see Online Resource 1). Relevant articles were identified for review through searches of the PubMed database, using structured search strategies. To capture PROMs used in studies of the more recently developed or approved drugs for AD or CHE, the PubMed search was limited to clinical trials of treatments indexed since 2006. The search strategy was also limited to studies published in the English language and conducted in humans (versus animal research). In addition, searches were conducted of the ClinicalTrials.gov website (for interventional studies indexed from 2012 to 2017), FDA and European Medicines Agency (EMA) regulatory guidance documents, and the labeling of drugs approved by the FDA or EMA for AD or CHE. Finally, medical reviews from the summary basis of approval from the FDA and European public assessment reports (EPARs) from the EMA for each approved product were examined to document whether label claims were granted based on PROs.

The most commonly used and evaluated measures identified in the initial review were then the focus of a more detailed review of their use in AD and CHE. A dermatology-specific instrument, an itch-specific instrument, an AD-specific instrument, and a CHE-specific instrument were chosen for the detailed review. Additional targeted searches were conducted in PubMed to identify studies evaluating or employing the measures of interest. The development, validation, and use of these PROMs in AD and CHE were described.

3 Results

3.1 Structured Literature Review

Among the 213 potentially relevant PubMed abstracts identified during the structured literature review, 37 studies using PROMs or describing the development or validation of a PROM were gathered for full-text review. Of these 37 studies, four were excluded after full-text review, for the following reasons: two studies did not include any PROM, one study did not evaluate a pharmaceutical treatment for AD or CHE, and one study did not include adult patients. Among the 64 ClinicalTrials.gov entries reviewed, 29 were determined to be relevant. In addition, five AD drug labels from the FDA and the EMA were reviewed. No CHE drugs had been approved by the FDA or the EMA at the time the review was conducted. Table 1 summarizes the relative frequency of the measures used in the identified studies.

Table 1 Measures identified by source

3.2 Regulatory Label Review

Table 2 summarizes the PRO results reported in FDA labels for AD treatments. The dupilumab label included a claim of reduction in itch using a Peak Pruritus Numeric Rating Scale (NRS; 0–10, with 10 being the worst pruritus), the tacrolimus label included a claim of improvement in patient evaluation of pruritus using a 10-cm visual analog scale (VAS; with 10 cm being the worst itch imaginable), and the pimecrolimus label included a claim of improvement in pruritus (specific means of assessing this outcome were not reported).

Table 2 FDA PRO label language for noncorticosteroid products recently approved for AD

Table 3 summarizes the PRO results in EMA and country-specific regulatory documents for AD and CHE. The dupilumab EMA label included claims of improved patient-reported symptoms based on the Pruritus NRS, as well as sleep, HRQOL, anxiety, and depression based on the Patient-Oriented Eczema Measure (POEM), the Dermatology Life Quality Index (DLQI), and the Hospital Anxiety and Depression Scale. The tacrolimus EMA label included a claim of improved HRQOL as indicated by the DLQI and the Children’s DLQI. The alitretinoin UK, Canada, and Israel labels in CHE included claims of improvement in a patient global assessment of symptoms.

Table 3 EMA and country-specific PRO label language for noncorticosteroid products recently approved for atopic dermatitis

3.3 Detailed Patient-Reported Outcome Measures Review

Based on the findings related to PROM use in the structured review, the subsequent in-depth review focused on four measures: the dermatology-specific DLQI, the itch-specific Pruritus/Itch NRS, the AD-specific POEM, and the CHE-specific Quality of Life in Hand Eczema Questionnaire (QOLHEQ). The DLQI and Pruritus NRS are not tied to a single skin disease and could be used in either AD or CHE, whereas the POEM is an AD-specific measure and the QOLHEQ is HE-specific. Table 4 summarizes the key characteristics of these measures, and Table 5 summarizes their psychometric properties as reported in the literature.

Table 4 Summary of key characteristics of PROMs of interest
Table 5 Summary of psychometric properties reported in the literature for PROMs reviewed

3.3.1 Dermatology Life Quality Index

The DLQI is a 10-item dermatology-specific QOL assessment with a 1-week recall period [17], and is the most frequently used HRQOL measure in dermatology clinical trials [18]. The DLQI assesses symptoms and feelings, daily activities, leisure, work and school, personal relationships, and adverse effects of treatment. Nine of the ten items have four response options: ‘not at all’, ‘a little’, ‘a lot’, and ‘very much’. The remaining item first asks whether work or study has been prevented and then (if ‘no’) to what degree the skin condition has been a problem at work or study (‘a lot’, ‘a little’, or ‘not at all’). Individual item scores are summed to obtain a total DLQI score that can range from 0 to 30, with higher scores indicating worse HRQOL. The DLQI may also be analyzed based on its six subscores (symptoms and feelings, daily activities, leisure, work and school, personal relationships, adverse effects of treatment). Hongbo et al. [19] developed banding of DLQI scores to facilitate their clinical interpretation, with scores of 0–1 indicating that a skin condition has no impact on HRQOL, scores of 2–5 indicating a small impact, scores of 6–10 indicating a moderate impact, scores of 11–20 indicating a large impact, and scores of 21–30 indicating an extremely large impact.
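The scoring and banding rules described above can be illustrated with a short sketch. It assumes that each of the ten items has already been coded 0 to 3 (a coding inferred here from the 0 to 30 total range); the function names are hypothetical, and the handling of missing or 'not relevant' responses is not covered.

```python
from typing import Sequence

# Score bands as summarized above from Hongbo et al. [19]:
# (inclusive upper bound of the band, interpretation).
DLQI_BANDS = [
    (1, "no impact"),
    (5, "small impact"),
    (10, "moderate impact"),
    (20, "large impact"),
    (30, "extremely large impact"),
]


def dlqi_total(item_scores: Sequence[int]) -> int:
    """Sum ten item scores, each assumed to be coded 0-3 (assumption)."""
    if len(item_scores) != 10:
        raise ValueError("the DLQI has 10 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)


def dlqi_band(total: int) -> str:
    """Map a total DLQI score (0-30) to its interpretation band."""
    if not 0 <= total <= 30:
        raise ValueError("total DLQI score must be between 0 and 30")
    for upper_bound, label in DLQI_BANDS:
        if total <= upper_bound:
            return label
    raise AssertionError("unreachable: all totals 0-30 are covered")


# Hypothetical respondent, for illustration only.
example_items = [2, 1, 1, 0, 1, 0, 1, 0, 0, 2]
example_total = dlqi_total(example_items)
example_band = dlqi_band(example_total)
```

For this hypothetical respondent the item scores sum to 8, which falls in the 6–10 band (moderate impact).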

3.3.1.1 Use in Atopic Dermatitis (AD)

DLQI content was generated with input from 120 patients representing more than 30 different dermatology subgroups, including nine patients with AD and ten patients with ‘other eczema’ [17]. The measure is widely used and has been implemented in many studies of moderate-to-severe AD. In a systematic review of randomized, controlled trials in AD conducted between 2000 and 2014, the DLQI was used in over half of the 36 trials that used an HRQOL measure [20]. Furthermore, the DLQI is recommended by the HOME initiative as one of the best available measures to assess HRQOL in AD [18].

The psychometric properties (reliability, validity, and ability to detect change) of the DLQI have been demonstrated in patients with AD [21–30]. Two review articles provided a thorough overview of the use of the DLQI and its psychometric properties [22, 23], both concluding that the DLQI showed adequate levels of internal reliability, test–retest reliability, validity, and sensitivity to change. The DLQI’s test–retest reliability has been investigated in several studies and found to be generally high (i.e. Pearson correlation coefficient or intraclass correlation coefficient [ICC] > 0.70) [22, 23]. A Spanish study in a sample of 114 AD patients reported a test–retest ICC of 0.77 over a 1-week interval for a clinically stable subgroup [25]. In addition, several studies have estimated internal consistency (Cronbach’s α) of the DLQI in a range of dermatological conditions [22, 23]. In these studies, Cronbach’s α values ranged between 0.75 and 0.92, indicating that the items are sufficiently related to form a scale. Several of these studies included AD patients; for example, among a mixed sample of 237 patients with AD or psoriasis (48% AD) in Spain, Cronbach’s α was 0.83 [25].

The construct validity of the DLQI has been extensively evaluated. Basra et al. [22] identified 37 different articles reporting the correlation of the DLQI with generic, dermatology-specific, and disease-specific measures, of which 11 studies examined construct validity of the DLQI in patients with AD. These studies showed that the DLQI varies in the strength of its association with other PRO instruments in line with the similarity of the constructs assessed. Two studies of people with AD found that correlation of the DLQI was stronger with the 36-Item Short Form Health Survey (SF-36) Mental Component Summary than the SF-36 Physical Component Summary (PCS) [26, 27]. This finding is expected, given that the PCS addresses physical limitations, which are not a key feature of AD. Other studies in AD populations found correlations between the DLQI and the POEM (r = 0.78; p < 0.001) [28] and the DLQI and the SCORAD (r = 0.42, p < 0.001) [29].

The DLQI’s responsiveness is also well established. Basra et al. [22] reported that most of the 33 efficacy studies in which the DLQI had been used between 1994 and 2007 showed that the DLQI detected change in patients before and after treatment. The authors highlighted 17 studies, covering a range of dermatologic conditions (most commonly psoriasis), as particularly relevant to demonstrating the responsiveness of the DLQI. Badia et al. [25] evaluated the responsiveness of the Spanish DLQI in a sample of 114 adults with eczema who were treated with topical corticosteroids. Over the 21-day study period, mean DLQI scores decreased significantly from 4.5 to 1.6 (p < 0.001), yielding a large effect size of 0.82. Furthermore, among seven published clinical trials that included the DLQI (see Online Resource 2 and Online Resource 3), all studies showed improvements in DLQI scores after treatment, indicating that the DLQI is able to detect change associated with treatment in patients with moderate-to-severe AD. Studies of the biologic drugs dupilumab [31, 32] and nemolizumab [33] showed statistically significant and clinically meaningful improvement in DLQI scores for the treated versus placebo groups.

A 2008 review of DLQI validation studies that used both anchor- and distribution-based methods to estimate thresholds for interpretability of overall DLQI scores in specific skin conditions (e.g. inflammatory conditions, psoriasis, hyperhidrosis, and chronic idiopathic urticaria) found estimates for meaningful change of between 2.2 and 6.9 [22]. More recently, an anchor-based method was used to estimate a threshold for meaningful change in a sample of 192 patients with 20 chronic and acute skin diseases, including psoriasis (50.5%), acne (21.9%), and eczema (12.5%) [30]. This study demonstrated that a small change (based on a change of 2 or 3 on a 15-point Patient Global Rating of Change scale) was associated with a mean DLQI change score of 3.3 (n = 31). The authors recommended a threshold of 4 points for evaluating meaningful change in DLQI scores over time.
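As an illustration of how such a threshold might be applied to individual patients, the sketch below classifies a change in total DLQI score against the recommended 4-point threshold. Treating a change of exactly 4 points as meaningful is an assumption made here, and the function name and labels are hypothetical.

```python
DLQI_MEANINGFUL_CHANGE = 4  # points; threshold recommended in [30]


def classify_dlqi_change(
    baseline: int, follow_up: int, threshold: int = DLQI_MEANINGFUL_CHANGE
) -> str:
    """Classify the change in total DLQI score between two assessments.

    Lower DLQI scores indicate better HRQOL, so a decrease is an improvement.
    A change equal to the threshold is counted as meaningful (assumption).
    """
    for score in (baseline, follow_up):
        if not 0 <= score <= 30:
            raise ValueError("total DLQI scores must be between 0 and 30")
    change = follow_up - baseline
    if change <= -threshold:
        return "meaningful improvement"
    if change >= threshold:
        return "meaningful worsening"
    return "no meaningful change"


# Hypothetical patients, for illustration only.
improved = classify_dlqi_change(baseline=12, follow_up=8)
stable = classify_dlqi_change(baseline=12, follow_up=9)
```

Here a drop from 12 to 8 reaches the threshold, whereas a drop from 12 to 9 does not.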

3.3.1.2 Use in Chronic Hand Eczema (CHE)

Among observational studies of CHE, the DLQI is the most frequently used PROM [34, 35]. Studies using the DLQI have established that CHE has a significant impact on HRQOL [36, 37], and increasing levels of CHE severity and productivity loss are associated with higher DLQI scores (indicating lower HRQOL).

The DLQI is a generic dermatology-related QOL measure, but it is not clear if it covers all of the key concepts relevant to CHE. There is no documented evidence that the development of the DLQI included patients with CHE, although of 120 patients who provided input, 10 had ‘other eczema’ (eczema other than AD) [17]. The psychometric properties of the DLQI have been demonstrated in patients with CHE [35, 38–42]. However, an alternative, six-item version of the DLQI with revised scoring has been recommended for the HE population based on a Rasch analysis [41]. In this version of the DLQI, items assessing personal relationships and interference with certain activities (shopping or looking after home or garden/social or leisure activities) were removed.

Reilly et al. [38] evaluated the DLQI in a randomized controlled trial (RCT) of pimecrolimus cream 1% in 257 people with mild or moderate CHE. For all DLQI subscores, except adverse effects of treatment, low DLQI scores (indicating better HRQOL) were predicted by low IGA, Total Signs and Symptoms (TSS), and Subject’s Overall Self-Assessment (SOSA) scores (p < 0.01 to < 0.0001). Improvements in IGA, TSS, and SOSA were significant predictors of improvement in all DLQI scores (p < 0.03 to < 0.0001).

Furthermore, DLQI scores have been found to correlate with other measures in observational studies, further establishing its construct validity in CHE. Agner et al. [34] found a median DLQI score of 8 in 416 patients with HE referred in Europe, and a significant correlation with disease severity as measured by the clinician-reported Hand Eczema Severity Index (HECSI; p < 0.001). Cvetkovski et al. [35] found a mean DLQI score of 7.8 in Danish patients with severe occupational HE, and there was a clear correlation of worsening DLQI scores with increasing HE severity. Depressive symptoms as measured by the Beck Depression Inventory II were strongly associated with impaired HRQOL as measured by the DLQI. High DLQI scores (indicating more impact on HRQOL) also were associated with prolonged sick leave and unemployment in patients with occupational HE [35].

A comparison of four methods of assessing HE severity, including DLQI, was conducted in 119 patients with moderate-to-severe HE from Denmark, Germany, and The Netherlands [40]. Objective HE severity assessment was performed by physicians using the HECSI and the Physician Global Assessment (PGA; 1 = almost clear, 2 = mild, 3 = moderate, 4 = severe). Patients completed the DLQI and a Clinical Photo Guide (patients selected the photo of HE most like their own from an array of four photos depicting HE of worsening severity). When correlations among the measures were assessed, all six pairwise correlation coefficients between the tested methods were statistically significant. Correlations between the DLQI and the three other HE measures were the weakest (r range 0.30–0.45), although statistically significant. The correlation between the HECSI and the PGA was highest (r = 0.82) [40]. These results indicate that the DLQI assesses concepts that are different from those assessed by objective measures of HE severity, and even from another subjective measure focusing on the appearance of HE.

Other analyses have demonstrated the DLQI’s reliability in CHE, but results related to the measure’s ability to detect change are limited and have been mixed. Among patients with stable CHE, there were no significant changes in DLQI scores from baseline to day 22, or baseline to week 26 [38]. In an RCT of 319 patients with moderate or severe CHE randomized to three different doses of alitretinoin or placebo (in which 51.4% of patients completed DLQI questionnaires), changes in DLQI scores from baseline were not statistically significant, possibly because the study lacked statistical power. In contrast, based on data from a clinical study of pimecrolimus cream 1% versus placebo in CHE, treatment success was a significant predictor of improvement in DLQI scores (p < 0.03 to < 0.0001) for all but the personal relationships score [38]. This study did not report DLQI score changes or differences between the treatment and placebo groups.

3.3.2 Pruritus/Itch Numeric Rating Scale

While no development history for a Pruritus NRS item is available, it is not uncommon for relatively simple symptom assessments to be lacking both a published development history and standard wording. A typical NRS is a scale from 0 to 5, or 0 to 10, with verbal anchors. For example, a pain NRS might have anchors of no pain for 0 and the worst pain you can imagine for 10.

The validity and psychometric properties of a Pruritus NRS have been demonstrated in pruritic conditions [43, 44]. A validation study was sponsored by the International Forum for the Study of Itch and assessed the reliability of a pruritus intensity VAS (100-mm line with anchors of no itch and worst itch imaginable), NRS (0–10, with anchors of 0 = no itch and 10 = worst itch imaginable), and verbal response scale (VRS; 4-point scale, 0 = no itch, 1 = low itch, 2 = moderate itch, 3 = severe itch) in 471 adults with chronic itch (mean age 58.4 years). Participants assigned a score representing the intensity of their symptoms using each of the three scales. All tools were found to have high reliability and concurrent validity (r > 0.8; p < 0.01), and mean values of all scales were highly correlated. In addition, the psychometric properties of an 11-point Pruritus NRS with anchors of 0 = no itching and 10 = worst itch imaginable were evaluated in a phase II study of baricitinib in patients with psoriasis [44]. Patients indicated their worst level of itching due to psoriasis in the past 24 h. Test–retest reliability was good (ICC range 0.71–0.74). Correlations with the DLQI scores were strong (r ≥ 0.80 at week 12), as were correlations in changes in the Itch NRS and DLQI (r ≥ 0.71), supporting the construct validity of the Itch NRS. A 4-point change was found to demonstrate clinically meaningful improvement in itch severity (corresponding to notable clinical improvements in psoriasis) after 12 weeks of treatment [44].

3.3.2.1 Use in AD

A Pruritus NRS has been used in three AD trials [31, 32, 45]. In these trials, the Pruritus NRS detected statistically significant between-group differences and identified treatment responders.

3.3.2.2 Use in CHE

In a survey study, the most commonly reported symptoms of patients with CHE were dryness/flaking (81%), itchiness (75%), and cracking/tearing of the skin (71%), with itchiness and cracking of the skin being the most bothersome symptoms [5]. Among the clinical studies of CHE that were identified in this review, a study of pimecrolimus versus placebo used a 4-point NRS of 0 (absent) to 3 (severe) to assess pruritus, and found significant between-group differences [46].

3.3.3 Patient-Oriented Eczema Measure (POEM)

The POEM is a 7-item tool for assessing patient-reported severity of AD that is used in clinical practice and clinical trials to assess AD symptoms and sleep interference [28]. Specifically, the POEM items assess the frequency of dryness, itching, flaking, cracking, sleep disturbance, bleeding, and weeping/oozing because of eczema during the past week. Response options are 0 = no days, 1 = 1–2 days, 2 = 3–4 days, 3 = 5–6 days, and 4 = every day, and scores range from 0 to 28. Higher scores indicate a greater frequency of AD symptoms and sleep disturbance. The POEM, developed as an AD-specific measure, has not been used in CHE populations.
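The POEM scoring rule described above can be sketched as follows. The sketch assumes that each item is recorded as the number of days (0 to 7) on which the symptom occurred during the past week and maps it to the 0 to 4 response codes listed above; the function and variable names are hypothetical.

```python
from typing import Mapping

# The seven symptoms assessed by the POEM, as listed above.
POEM_ITEMS = (
    "dryness",
    "itching",
    "flaking",
    "cracking",
    "sleep disturbance",
    "bleeding",
    "weeping/oozing",
)


def poem_item_score(days: int) -> int:
    """Map days with the symptom in the past week to the 0-4 item score.

    0 days -> 0, 1-2 days -> 1, 3-4 days -> 2, 5-6 days -> 3, 7 days -> 4.
    """
    if not 0 <= days <= 7:
        raise ValueError("days must be between 0 and 7")
    return (days + 1) // 2


def poem_total(days_by_item: Mapping[str, int]) -> int:
    """Sum the seven item scores to obtain a total score from 0 to 28."""
    missing = [item for item in POEM_ITEMS if item not in days_by_item]
    if missing:
        raise ValueError(f"missing POEM items: {missing}")
    return sum(poem_item_score(days_by_item[item]) for item in POEM_ITEMS)


# Hypothetical respondent, for illustration only.
example_days = {
    "dryness": 7,
    "itching": 5,
    "flaking": 2,
    "cracking": 0,
    "sleep disturbance": 3,
    "bleeding": 0,
    "weeping/oozing": 1,
}
example_poem_total = poem_total(example_days)
```

For this hypothetical respondent the item scores are 4, 3, 1, 0, 2, 0, and 1, giving a total of 11 out of a possible 28.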

The POEM is an established PRO instrument and its use as an outcome measure to assess patient-reported symptoms in clinical trials is recommended by several international bodies, including the HOME initiative. The instrument content was generated and refined based on input of patients with AD, thus establishing content validity [28]. The measurement properties of the POEM, including reliability, construct validity, and the ability to detect change, have been adequately demonstrated in the literature [11, 18, 28, 31, 47, 48]. As part of a systematic literature review, Schmitt et al. [11] reviewed the validity, reliability, sensitivity to change, and ease of use of 20 AD severity measures, including the POEM. The authors concluded that, of the 20 instruments reviewed, only the POEM, SCORAD, and EASI could be recommended for use based on being evaluated sufficiently and performing adequately. In another systematic literature review of patient-reported symptom measures conducted as part of the HOME initiative, of the 18 instruments reviewed, only five symptom measures, one of which was the POEM, had been sufficiently validated to be considered potentially appropriate for use as a patient-reported measure in clinical trials [18]. The POEM has also shown adequate internal consistency, with a Cronbach’s α of 0.88 among a sample of 200 adult and pediatric patients with AD [28]. Its test–retest reliability was assessed in 50 patients with AD over a 24- to 48-h interval, with a mean difference between total scores over time of 0.04 (standard deviation 1.32). Scores were the same on both administrations in 33 (66%) of the 50 patients, within 2 points in 46 (92%) of the patients, and within 3 points in 49 (98%) of the patients, confirming acceptable test–retest reliability [28].

Construct validity for the POEM has been demonstrated by correlations between POEM total scores and DLQI total scores (r = 0.78), a patient global assessment of disease severity (rated on a 5-point scale—clear, mild, moderate, severe, or very severe) (r = 0.81), and a patient global assessment of overall bother related to eczema (rated on a 0–10 scale) (r = 0.84) [28]. Coutanceau and Stalder [47] also assessed the level of association between several AD severity measures (including the POEM) and HRQOL (DLQI). The POEM showed higher correlations with the Patient-Oriented SCORAD and adapted Self-Administered EASI (correlations between 0.72 and 0.79) than with the clinician-reported SCORAD (correlations between 0.58 and 0.66). The correlations between total scores on the POEM and DLQI were 0.64 at baseline and 0.66 at 4- to 8-week follow-up.

Preliminary evidence of the POEM’s ability to detect change was demonstrated as part of the initial instrument validation study [28]. A sample of 40 newly referred patients receiving treatment for AD who completed the POEM at clinic presentation and at weeks 1 and 4 of treatment had a decrease (improvement) in mean POEM total score, as well as in the individual item scores, over the 4-week period [28]. The responsiveness of the POEM to treatment benefit in moderate-to-severe AD has been demonstrated in three randomized placebo-controlled trials of dupilumab [31, 32]. In all three studies, the POEM detected significant changes after treatment, as well as significant between-group differences.

3.3.4 Quality of Life in Hand Eczema Questionnaire

The QOLHEQ was developed in German with input from patients with CHE in Germany, and simultaneously translated into several languages. The QOLHEQ assesses hand eczema-specific HRQOL over the past 7 days and “includes all impairments or limiting conditions caused by the health state of an individual [with hand eczema]” [49]. The QOLHEQ has 30 items in four domains—symptoms, emotions, functioning, and treatment/prevention—and asks patients to consider the level of bother related to ‘the skin condition of their hands’ during the past 7 days. Response options are a 5-point VRS (never, rarely, sometimes, often, all the time).

Initial item generation for the QOLHEQ did not involve concept elicitation interviews with patients. Experts developed the draft items based on reviews of the literature and existing dermatology-specific HRQOL measures, and the researchers prespecified the measure’s domains (symptoms, emotions, functioning, and treatment/prevention) before beginning the development process. Nevertheless, content validity of the measure in the CHE population was supported with focus groups (n = 34), during which the comprehensibility and completeness of the draft measure were reviewed. In a preliminary psychometric evaluation of the QOLHEQ conducted in a longitudinal validation study of German patients with CHE (n = 316), internal consistency, test–retest reliability, construct validity, and discriminant validity were found to be acceptable [50]. Responsiveness to change was demonstrated among a subset of 154 patients who reported CHE severity that was much improved or much worse over a period of 4–6 weeks. The QOLHEQ was more sensitive to change in CHE severity than the DLQI, Skindex-17, or EuroQol-5 Dimensions (EQ-5D) [50]. Validation studies have been conducted in a cross-cultural setting and with a Japanese version of the measure [51, 52], providing additional support for its construct validity. The QOLHEQ also has been used in a 5-year registry evaluating the management of patients with CHE [53].

4 Discussion

This study aimed to explore the key symptoms and impacts associated with AD and CHE in adult patients, review existing dermatology-specific PROMs used in the literature, and identify any gaps in the measures in use.

Based on the reviews conducted, several PROMs have been used to assess AD and CHE in clinical studies. Measures used included multidimensional assessments of HRQOL that were either AD-specific, skin-specific, or generic measures and single-item scales of key symptoms of AD using either an NRS, VAS, or VRS. The most frequently used measures in adult AD were the DLQI for HRQOL and single-item pruritus scales.

In CHE, clinical studies of alitretinoin used a patient global assessment of CHE control/severity consisting of a categorical scale (cleared, almost cleared, mild, moderate, severe), a pruritus VRS, a pain VAS, and the Skindex-29. A study of pimecrolimus 1% in CHE used a 4-point VRS for pruritus severity and burning severity, where 0 = absent and 3 = severe [46].

In clinical studies of both AD and CHE, symptoms and dermatology-related QOL in the domains of daily activities, leisure, work and school, personal relationships, feelings, and adverse effects of treatment are most commonly evaluated using the DLQI. Similarly, recent AD trials have employed a single-item, patient-reported 11-point Pruritus NRS. The AD-specific POEM is used often in AD trials to evaluate the frequency of specific symptoms (dryness, itching, flaking, cracking, bleeding, and weeping/oozing), as well as sleep disturbance. The CHE-specific QOLHEQ evaluates the level of bother of specific symptoms (pain, itch, affected sleep, fissuring, redness, bleeding, and dryness), as well as the impact of CHE on emotions, functioning, and treatment and prevention. Three of these measures have been included in regulatory labels of AD drugs: a Pruritus NRS (0–10 scale) for the FDA and EMA (dupilumab), the DLQI for the EMA (tacrolimus and dupilumab), and the POEM for the EMA (dupilumab).

Prior research has highlighted the limitations of clinician assessments in dermatology and has suggested that patient experience data may be underrepresented in dermatology in general [12, 54]. Although only patients can accurately report the intensity of symptoms such as pruritus and pain—which likely are among the most bothersome symptoms associated with dermatologic diseases [55]—primary endpoints in clinical trials of AD, CHE, and other dermatologic diseases have traditionally been ClinROs [54].

The results of this review suggest that specific AD (e.g. itching, flaking, cracking) and CHE (e.g. pain, itching, fissuring) symptoms are being assessed with PROMs in increasing numbers of clinical trials. The use of these assessments appears to be part of a broader trend of more consistent assessment of symptoms using PROMs alongside clinician-assessed signs in clinical trials. In addition, as therapeutic strategies in dermatology become more targeted toward specific dermatologic symptoms and toward diseases affecting specific sites (e.g. CHE), future research should explore, through PROs, patients’ experiences with these symptoms and site-specific diseases and the changes with treatment that are most meaningful to them.

The assessment of PROs is evolving to better characterize the key symptoms and impacts that patients with dermatologic conditions experience, and regulatory agencies have adopted a more patient-focused view of treatment benefit. Regulators increasingly expect evidence of treatment benefit not only in the primary symptom (e.g. pruritus) but also in secondary symptoms of AD. To explore a regulatory perspective, this review investigated PRO-related label claims, which usually are based on at least secondary endpoints in phase III clinical trials and, for the FDA, tend to rely on symptom-focused measures.

Some limitations of this study must be acknowledged. The literature search used a structured rather than a fully systematic search strategy and was restricted to a single bibliographic database (PubMed), so some relevant studies may have been missed. In addition, studies were reviewed and selected for inclusion by a single reviewer. Finally, the definition of CHE is not standardized in the literature, potentially influencing patients’ impressions and descriptions of symptoms and limiting the comparability of findings between studies.

5 Conclusions

The reliance on ClinRO measures as the basis for primary endpoints in clinical trials in AD and CHE suggests that health care providers and the industry may be missing crucial information about treatment effectiveness and burden of disease from the patient perspective. It is important to capture the key symptoms reported by patients with AD and CHE to fully characterize the burden of these diseases and the potential for improvement with treatment. Preliminary research suggests that the key symptoms and impacts of AD and CHE differ, and the need for disease-specific PROMs for hand (and foot) eczema should be considered and based on further exploration of the experience of patients with site-specific eczema.