Key Points
There is limited information on how the severity of atopic dermatitis is rated by patients and their physicians in the real world.
This study shows a discrepancy between how physicians and patients rate the severity of atopic dermatitis; patients focus more on skin-related quality-of-life outcomes and physicians focus more on sleep disturbance.
Better communication between patients and physicians should be encouraged to ensure the management of atopic dermatitis is directed towards the needs of the patient.


Atopic dermatitis (AD), a systemic inflammatory disease with autoimmune components, is a severe form of eczema pathologically characterized by skin barrier disruption mediated by a type 2 helper T-cell immune response [1]. Atopic dermatitis is part of the atopic march that may result from underlying biologic mechanisms [2], and while AD can develop in adults, onset is generally during early childhood and may extend into adult life in approximately half of the cases [3]; it has been estimated that the lifetime prevalence of AD in adults is 2–10% [4, 5]. In the absence of a cure, AD requires treatment over the lifetime of the patient.

In addition to the hallmark clinical symptoms of eczema and intense pruritus, AD is associated with a substantial and multi-faceted patient-reported burden that encompasses sleep disturbances, anxiety/depression, reduced function/productivity, and a lower quality of life (QoL) [6–14]. The presence of these effects of AD highlights the importance of understanding disease severity and impact from the patient’s perspective and the need to ensure that disease management is tailored to the needs of the individual patient.

Although AD presents in moderate and severe forms in 20–37% and 10–34% of patients, respectively [15, 16], there is limited understanding of how physicians assess the severity of AD in routine clinical practice and how this relates to patients' self-reported disease severity. This challenge of appropriately and consistently assessing disease severity is exacerbated by the availability of more than 20 measures that can be used to assess severity, which are mainly used in clinical trials [17, 18], and the lack of a consensus on how to define the severity of AD. While cut-off values for defining severity have been suggested for multi-component clinical scales such as the Eczema Area and Severity Index (EASI) [19] and Scoring Atopic Dermatitis (SCORAD) [20], these scales are predominantly based on specific clinical signs, and the use of these measures may not necessarily be appropriate for daily clinical practice, nor may they adequately capture factors that patients use to define severity. Simpler global impression scales may be more readily interpretable, but the wide variability in content and use in the absence of standardization limits their value in defining severity and making comparisons across populations and studies [21].

There is a need to better understand how AD severity is rated in clinical practice, and the objective of this study was to evaluate the level of agreement between physician- and patient-rated AD severity, and to identify factors that may be associated with discordance between these severity ratings. The study also evaluated physician awareness of clinical and patient-reported outcome measures available to assess disease severity and the impact of AD on QoL.


Data Source and Populations

The Adelphi AD Disease Specific Programme (DSP) is a cross-sectional real-world survey fielded between the fourth quarter of 2014 and the first quarter of 2015, which captures data from the perspective of physicians and their consulting patients [22]. The collection of patient data is compliant with the Health Insurance Portability and Accountability Act of 1996 and the Health Information Technology for Economic and Clinical Health Act of 2009.

Physicians were screened and recruited to form nationally representative samples, subject to meeting the DSP inclusion criteria. For the current study, eligible physicians were primary care physicians (PCPs; including internal medicine specialists), dermatologists, or allergists/immunologists who had qualified between 1978 and 2011; were practicing in the USA with active involvement in the pharmaceutical management of patients with AD; and saw a minimum number of patients with moderate-to-severe AD per month (five or more for PCPs, six or more for dermatologists, and 15 or more for allergists/immunologists), with severity assessed by the physician.

For inclusion in the current AD DSP analysis, patients were required to be adults aged 18 years or older with, in the physician’s subjective opinion, a confirmed diagnosis or history of moderate or severe AD and not currently enrolled in a clinical trial. Patients who the physicians subjectively considered to have mild AD at the time of consultation could be included, provided that at some point previously in the course of their disease they were considered to have moderate or severe AD.



Physicians were requested to complete a detailed patient record form (PRF) for the next five prospective patients eligible for inclusion. Each specialist physician was requested to provide two additional prospective PRFs for patients currently receiving systemic immunosuppressant (IM) therapy. The PRFs captured characteristics of the AD and its treatment including the use of IMs, as well as information on the interference of itch and sleep disturbance on the patient’s daily activities in response to the question “Based on your discussions with the patient or perceptions, during the last week how much interference has each of the following aspects of the patient’s condition caused to their activities of daily living (excluding work)?” Physicians indicated interference using a Likert scale from 0 (none at all) to 6 (complete interference), with a cut-off value ≥4 considered substantial interference. Current severity of AD was rated by physicians based on the question “What is your overall assessment of the severity of atopic dermatitis symptoms in this patient currently based on your own definitions of mild, moderate and severe?”

Physicians also evaluated the individual components of the EASI [23], including body regions, area score, and severity scores, which were then used to calculate the EASI score for each patient. The EASI score provides an overall assessment of disease severity. In addition, physicians were asked to report their awareness and frequency of use of a range of clinical and patient-reported outcome measures that may be used for assessing AD severity and disease activity including: EASI; SCORAD (a clinical measure of AD severity) [20]; Patient-Oriented Eczema Measure (POEM; evaluates the presence and impact of AD signs and symptoms) [24]; Dermatology Life Quality Index (DLQI; evaluates the impact of skin conditions on health-related QoL) [25]; peak pruritus numerical rating scale (assesses severity of the worst itch in the past 24 h) [26]; and Pruritus 5D (assesses five dimensions of background itch: degree, duration, direction, disability, and distribution) [27].
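
As a rough illustration of how an EASI total can be assembled from the components the physicians recorded, the sketch below follows the commonly published scoring scheme: four body regions, four clinical signs each graded 0–3, an area score of 0–6 per region, and region weights of 0.1, 0.2, 0.3, and 0.4. The paper does not state exactly how the DSP computed the score, so the weights, the function name `easi_score`, and all dictionary keys are assumptions made for illustration, and the example patient is hypothetical.

```python
# Region weights from the commonly published EASI scheme for patients aged
# 8 years and older (an assumption here; the paper does not list them).
REGION_WEIGHTS = {
    "head_neck": 0.1,
    "upper_limbs": 0.2,
    "trunk": 0.3,
    "lower_limbs": 0.4,
}
SIGNS = ("erythema", "induration_papulation", "excoriation", "lichenification")


def easi_score(regions):
    """Return an EASI total (0-72) from per-region sign grades and area scores.

    `regions` maps each region name to a dict holding the four sign grades
    (0-3 each) and an `area` score (0-6). All key names are hypothetical.
    """
    total = 0.0
    for name, weight in REGION_WEIGHTS.items():
        entry = regions[name]
        area = entry["area"]
        if not 0 <= area <= 6:
            raise ValueError(f"area score out of range for {name}: {area}")
        sign_sum = 0.0
        for sign in SIGNS:
            grade = entry[sign]
            if not 0 <= grade <= 3:
                raise ValueError(f"{sign} grade out of range for {name}: {grade}")
            sign_sum += grade
        # Region contribution = sum of sign grades x area score x region weight.
        total += sign_sum * area * weight
    return total


# Hypothetical patient, for illustration only (not study data).
example = {
    "head_neck": {"erythema": 2, "induration_papulation": 1,
                  "excoriation": 1, "lichenification": 0, "area": 2},
    "upper_limbs": {"erythema": 2, "induration_papulation": 2,
                    "excoriation": 1, "lichenification": 1, "area": 3},
    "trunk": {"erythema": 1, "induration_papulation": 1,
              "excoriation": 1, "lichenification": 0, "area": 2},
    "lower_limbs": {"erythema": 2, "induration_papulation": 1,
                    "excoriation": 2, "lichenification": 1, "area": 3},
}
print(round(easi_score(example), 1))
```

Because the lower limbs carry the largest weight, the same sign grades and area score contribute four times as much there as on the head and neck.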


Patients were invited to complete a patient self-completion (PSC) form. Those who accepted completed the PSC independently of the physician and returned it in a sealed envelope. The PSC consisted of questions eliciting information on demographics and details relating to their AD, and patients self-rated their current AD severity in response to the question “Generally how bad was your atopic dermatitis in the past 24 hours?” with possible responses of mild, moderate, and severe. The PSC also included validated measures that assessed QoL and impairment of daily work productivity. Quality of life was assessed using the DLQI (score range 0–30 with higher scores indicating a greater impact on QoL) [25] and the generic five-dimension EuroQol (EQ-5D-3L) [28], with health utility scores estimated using the US value set [29]. The Work Productivity and Activity Impairment Questionnaire for Specific Health Problems (WPAI:SHP) [30] was used to evaluate the effects of AD on productivity. The WPAI:SHP assesses the impact of a specific disease on work productivity and non-work activity during the previous 7 days, and in the current analysis, the overall work impairment item was used to evaluate the effects of AD among patients who were employed.
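
For readers unfamiliar with the WPAI, the overall work impairment score combines time missed from work (absenteeism) with reduced productivity while at work (presenteeism). The sketch below uses the standard published WPAI scoring rule as we understand it; the function and parameter names are hypothetical, and the paper does not describe any study-specific handling of incomplete responses.

```python
def wpai_overall_work_impairment(hours_missed, hours_worked, productivity_rating):
    """Return overall work impairment (%) for an employed respondent.

    hours_missed        -- hours of work missed because of the health problem
    hours_worked        -- hours actually worked in the past 7 days
    productivity_rating -- 0-10 rating of how much the problem affected
                           productivity while working
    Returns None when no hours were scheduled, since the score is undefined.
    """
    if not 0 <= productivity_rating <= 10:
        raise ValueError("productivity_rating must be between 0 and 10")
    if hours_missed < 0 or hours_worked < 0:
        raise ValueError("hours must be non-negative")
    scheduled = hours_missed + hours_worked
    if scheduled == 0:
        return None
    absenteeism = hours_missed / scheduled      # share of scheduled time missed
    presenteeism = productivity_rating / 10     # impairment while working
    # Impairment while working only applies to the time actually worked.
    overall = absenteeism + (1 - absenteeism) * presenteeism
    return 100 * overall


# Hypothetical respondent: 4 h missed, 36 h worked, productivity rating 5/10.
print(wpai_overall_work_impairment(4, 36, 5))
```

Scores are expressed as percentages, with higher values indicating greater impairment.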

Statistical Analyses

Descriptive statistics were used to characterize the number and proportions of patients whose ratings matched or disagreed with physician ratings. Physician and patient ratings reflect perceptions of ‘current’ AD severity, and thus could be rated as mild despite the exclusion of patients with a history of only having mild AD. A weighted kappa coefficient was derived to determine the magnitude of the inter-rater agreement between the patient and physician ratings [31], and a concordance variable was based on the observed categories of patient assessment of severity higher (vs. physician), physician assessment of severity higher (vs. patient), and matched (reference group).
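
The concordance variable and the weighted kappa described above can be sketched as follows. This is a minimal illustration, not the Stata code used in the study: it assumes linear weights (the paper does not say whether linear or quadratic weights were applied), the function names are hypothetical, and the 3 × 3 table holds made-up counts rather than the study's cross-tabulation.

```python
LEVELS = ("mild", "moderate", "severe")


def concordance_category(patient, physician):
    """Classify one patient-physician pair relative to the 'matched' reference."""
    diff = LEVELS.index(patient) - LEVELS.index(physician)
    if diff > 0:
        return "patient_higher"
    if diff < 0:
        return "physician_higher"
    return "matched"


def weighted_kappa(table):
    """Linearly weighted kappa for a square table[patient][physician] of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # disagreement weight: 0 on the diagonal
            observed += weight * table[i][j]
            expected += weight * row_tot[i] * col_tot[j] / n  # chance-expected count
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1.0 - observed / expected


# Hypothetical counts, rows = patient rating, columns = physician rating.
hypothetical_table = [
    [40, 10, 0],
    [15, 60, 5],
    [0, 5, 15],
]
print(round(weighted_kappa(hypothetical_table), 2))
print(concordance_category("moderate", "severe"))
```

With linear weights, a one-category disagreement (for example, mild vs. moderate) is penalized half as much as a two-category disagreement (mild vs. severe), which suits an ordinal three-level scale.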

Bivariate analyses identified variables that showed significant differences in patterns of agreement. For categorical data, Pearson’s Chi-square tests were performed to test for significant differences between subgroups. For ordinal data and for continuous data not considered to be normally distributed, Kruskal–Wallis tests were performed. All statistical tests performed were two-sided using a 5% significance level. Multinomial logistic regression was used to identify factors independently associated with a higher patient or physician assessment of severity using ‘matched’ as the base outcome, expressed as relative risk ratios (RRR) with their 95% confidence intervals (CIs). Variables were selected for inclusion in the model based on the bivariate analysis results combined with disease knowledge, which helped anticipate likely correlations between variables. Standard errors were adjusted in the model to allow for the intra-group correlation (or clustering) of patients within physicians, using the Huber and White sandwich estimator of variances [32]. Where data were missing for a specific variable, that patient was excluded from analyses pertaining to that variable, with no imputation of missing data. Multivariate analysis was performed on the set of patients that had complete data on the set of variables being analyzed. Goodness of fit for the logistic regression analysis was assessed using methods described by Cox and Snell and Cragg-Uhler/Nagelkerke as an indicator of effect sizes [33, 34]. All analyses were performed using Stata Statistical Software: Release 14 (StataCorp LP, College Station, TX, USA).
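
The two goodness-of-fit measures named above are both computed from the log-likelihoods of the fitted model and of an intercept-only (null) model. The sketch below shows the standard formulas; the function names and the numeric inputs are hypothetical and are not taken from the study's model output.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell pseudo-R2: 1 - (L_null / L_model) ** (2 / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Cragg-Uhler/Nagelkerke pseudo-R2: Cox and Snell rescaled by its maximum."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical log-likelihoods and sample size, for illustration only.
ll_null, ll_model, n = -100.0, -90.0, 100
print(round(cox_snell_r2(ll_null, ll_model, n), 3))
print(round(nagelkerke_r2(ll_null, ll_model, n), 3))
```

The Cox and Snell statistic cannot reach 1 for a categorical outcome, which is why the Cragg-Uhler/Nagelkerke version divides by the maximum attainable value and is always at least as large.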


Physician Demographic Characteristics and Awareness of Outcomes Measures

The physician population consisted of 102 PCPs (including 52 internal medicine specialists), 75 dermatologists, and 25 allergists/immunologists. Overall, 120 (59.4%) were male, and they represented all geographic regions [East, n = 69 (34.2%); South, n = 28 (13.9%); Midwest, n = 58 (28.7%); West, n = 47 (23.3%)]; 93 (46.0%) qualified between 1978 and 1990, 65 (32.2%) between 1991 and 2000, and 44 (21.8%) between 2001 and 2011. Dermatologists and allergists generally reported greater awareness relative to PCPs across all the clinical and patient-reported outcomes measures (Table 1). However, even among dermatologists and allergists/immunologists, the awareness of key measures of itch, such as the pruritus numerical rating scale, Pruritus 5D, and POEM, was low (Table 1). Additionally, there was low use of all measures, including the pruritus numerical rating scale and the DLQI to assess the impact of AD on the patient’s daily life, both of which were commonly used by <5% of physicians across specialties (Table 1). While EASI was the most frequently used measure, with 69 physicians overall (34.2%) reporting use, it was commonly used by only 34 physicians (16.8%).

Table 1 Awareness of outcomes measures used for the assessment of atopic dermatitis

Patient Demographic and Disease Characteristics

The physicians provided PRFs for 1196 patients, 678 (56.7%) of whom completed a corresponding PSC, provided a rating of their current AD severity, and were included in the analysis. Of these, 369 were female (54.4%) and 525 were White (77.4%), with a mean age of 39.3 years, and the majority (315/492 with data available; 64.0%) had been diagnosed with AD as adults (Table 2). Approximately half of the patients (n = 312/629 with data available; 49.6%) were currently flaring (Table 2), and the mean body surface area affected was 15.2% (Electronic Supplementary Material (ESM) Table 1).

Table 2 Demographic and clinical characteristics of the study population, overall and by severity rating agreement between patients and their treating physicians

Severity Ratings

Level of Agreement

As shown in Table 3, of the 678 patients, 237 (35.0%) rated their AD as mild, 343 (50.6%) as moderate, and 98 (14.5%) as severe. In contrast, the physicians rated AD as mild in 175 patients (25.8%), as moderate in 404 (59.6%), and as severe in 99 (14.6%). The concordance variable showed that 465 patient ratings (68.6%) matched the physician ratings. Where discordance was present, patients tended to rate their disease severity lower than their treating physicians: in 137 cases (20.2%), the physician rated severity higher than the patient, and in 76 cases (11.2%), the patient rated severity higher than the physician. These ratings resulted in an overall severity discordance in 213 cases (31.4%), with a weighted kappa of 0.52 (bootstrapped 95% CI 0.518–0.525), indicating a moderate level of agreement.
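
The counts reported in this paragraph can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the marginal and concordance counts quoted in the text; the variable and function names are hypothetical. The weighted kappa itself cannot be recomputed this way because it requires the full patient-by-physician cross-tabulation, which is not reproduced in the text.

```python
TOTAL = 678

patient_counts = {"mild": 237, "moderate": 343, "severe": 98}
physician_counts = {"mild": 175, "moderate": 404, "severe": 99}
concordance_counts = {"matched": 465, "physician_higher": 137, "patient_higher": 76}


def pct(count, total=TOTAL):
    """Percentage of the analysis population, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Each set of counts should account for every patient exactly once.
for counts in (patient_counts, physician_counts, concordance_counts):
    assert sum(counts.values()) == TOTAL

discordant = concordance_counts["physician_higher"] + concordance_counts["patient_higher"]
print({k: pct(v) for k, v in concordance_counts.items()}, pct(discordant))
```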

Table 3 Physician- vs. patient-reported severity of atopic dermatitis

Bivariate Analysis

While race (White vs. other) was identified as being a significant demographic factor for discordance in AD severity ratings (p = 0.0099), neither specific ethnicity nor other demographic variables showed significance (Table 2). Among the clinical characteristics, neither time since diagnosis nor age at diagnosis was associated with patterns of agreement, although AD on the head and neck showed a significant association, as did the baseline symptom (day-to-day) of permanent scarring (Table 2). Although the baseline symptom (day-to-day) of papules was not deemed significant, the p value of 0.0500 was indicative of a possible relationship (ESM Table 1).

Patterns of agreement between physician and patient ratings showed no differences based on physician specialty (p = 0.6781; Table 2) or objective AD severity rating assessed by EASI score (p = 0.5308; ESM Table 1), percent body surface area affected (p = 0.9872; ESM Table 1), or current use of IMs (p = 0.9197; Table 2). In contrast, agreement patterns differed significantly by the degree to which AD interfered with sleep (Fig. 1a; ESM Table 2), with less matching of severity ratings when there was sleep interference. Additionally, QoL was significantly lower, assessed using the DLQI (Fig. 1b; ESM Table 3; higher scores) and the EQ-5D-3L (Fig. 1c; ESM Table 3; lower scores), among patients who rated their disease severity higher than their physician’s rating. Similarly, among employed patients, overall work impairment on the WPAI:SHP was significantly greater for patients who gave higher ratings of their AD severity than their physician (Fig. 1d; ESM Table 3).

Fig. 1

Agreement patterns based on subjective measures. Sleep disturbance interference in (a) was rated by the clinician on a scale of 0–6. Dermatology Life Quality Index (DLQI), 5-dimension EuroQol (EQ-5D), and Work Productivity and Activity Impairment Questionnaire for Specific Health Problems (WPAI:SHP) were patient-reported outcomes. CI confidence interval, SD standard deviation

Multivariate Analysis

Several variables that either trended toward significance or were identified as significant in the bivariate analysis were not significant in the regression analysis, including the baseline symptoms (day-to-day) of papules and permanent scarring, and AD on head/neck body regions (Fig. 2). However, two variables were significantly associated with severity rating discordance: patient-reported QoL assessed using the DLQI, and sleep disturbance (Fig. 2). Patients were more likely to rate higher AD severity than their physician if they also reported worse QoL on the DLQI (RRR 1.04, 95% CI 1.00–1.08; p = 0.046) (Fig. 2a). Correspondingly, physicians were less likely to rate AD as more severe than the patient rating if the patient reported worse QoL (RRR 0.94, 95% CI 0.90–0.99; p = 0.017) (Fig. 2b). Conversely, physicians were more likely to rate higher AD severity than the patient when there was greater physician-reported sleep disturbance (RRR 1.71, 95% CI 1.01–2.89; p = 0.044) (Fig. 2b). Goodness-of-fit estimates using the methods of Cox and Snell (R² = 0.049) and Cragg-Uhler/Nagelkerke (R² = 0.061) were indicative of small effect sizes.
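
To help interpret the relative risk ratios, the sketch below shows two routine manipulations. Both rest on assumptions not confirmed by the paper: that the confidence intervals are Wald intervals on the log scale, and that the DLQI entered the model as a continuous score so that its RRR applies per one-point increase. The function names are hypothetical, and because the published values are rounded, any p value recovered this way is only approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_from_ci(rrr, lower, upper):
    """Approximate the log-scale SE, z statistic, and two-sided p from an RRR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(rrr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


def scale_rrr(rrr_per_unit, units):
    """RRR implied for a difference of `units`, assuming a log-linear effect."""
    return rrr_per_unit ** units


# DLQI estimate for patients rating severity higher than their physician.
se, z, p = wald_from_ci(1.04, 1.00, 1.08)
print(round(p, 3))
# Implied RRR for a 5-point DLQI difference under the per-point assumption.
print(round(scale_rrr(1.04, 5), 2))
```

Under these assumptions, an RRR close to 1 per DLQI point can still correspond to a noticeable difference across the 0–30 range of the scale, which is worth keeping in mind alongside the small pseudo-R² values.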

Fig. 2

Relative risk ratios for higher severity assessment and their 95% confidence intervals. Cox and Snell’s R²: 0.049; Cragg-Uhler/Nagelkerke R²: 0.061. DLQI Dermatology Life Quality Index, *p < 0.05


Results from this study show a discordance between patient- and physician-reported AD severity, with an inter-rater agreement that was only of moderate magnitude (weighted kappa = 0.52); almost one-third of patients rated the severity of their AD differently from their physicians. These results are consistent with those of Torrelo et al. [35], who also found only moderate agreement between patient and physician perceptions of AD severity and a kappa coefficient that was identical to that of the current study. Such discordance between patient and physician perceptions of disease is common in the clinical setting and has been previously reported in other chronic conditions including osteoarthritis [36], rheumatoid arthritis [37], painful diabetic peripheral neuropathy [38], and psoriasis [39].

The use of bivariate analysis in this study was important not only for initially identifying significant variables that contribute to the observed disparity, but also for showing which variables were not significant. The patterns of agreement did not appear to be dependent on physician specialty, suggesting that physicians may have a similar misunderstanding of the patient’s perspective regardless of their specialty. Additionally, three key objective clinical measures also did not appear to influence pattern of agreement, i.e., the extent of AD as measured by body surface area, use of IMs, and EASI score, which itself has been proposed as a measure of disease severity [19]. In contrast, variables that evaluated the impact on the patient demonstrated significance, including sleep disturbance, QoL using both a dermatology-specific measure and a generic measure, and work productivity.

In the multivariate model, day-to-day (baseline) symptoms of papules and permanent scarring were not significant, although a previous study using multiple regression analysis reported that the specific clinical signs of excoriations, erythema, and edema/papulation were independent predictors of patient-rated disease severity [40]. However, in that study, these signs accounted for only 25% of the variation in patient-reported severity, with the patient-reported severity primarily based on the bothersomeness of the condition rather than overall disease activity. Additionally, in the present study, excoriations, erythema, and edema/papulation were captured as part of the EASI score rather than as individual symptoms. Of the variables that did demonstrate significance in bivariate analysis and were included in the multivariate model, the only two that retained significance were patient-reported QoL assessed using a dermatology-specific measure (DLQI), and sleep disturbance, which is both a frequent complaint of patients with AD and a component of their disease perception [11]. The finding that these two variables retained significance suggests that QoL-related issues contribute to patient perceptions of their overall disease severity and are likely to be key drivers of discordance between the patient and physician perspectives. The importance of QoL and sleep is not surprising given the impact that AD, and the associated pruritus in particular, has on these outcomes [12, 14, 41, 42], and that sleep disturbance has also been reported to be a significant predictor of QoL in patients with AD [42].

Because QoL does not necessarily correlate with disease activity assessed using standard clinical measures [43], its identification as a factor contributing to the discordance between physician and patient perceptions suggests a need for the incorporation of QoL evaluation when assessing patients with AD. Such QoL assessment is clinically relevant considering that awareness and use of clinical and patient-reported AD measures among physicians was generally low. To our knowledge, this is the first study to evaluate the extent of awareness and clinical use of these measures among physicians. The low awareness even among dermatologists for measures that evaluated pruritus was surprising, especially given the importance of itch to the burden of AD including daily effects on sleep disturbance and functional impairment [10, 14]. Such low awareness of AD measures also indicates that physician estimation of severity may not necessarily be based on quantitative measures of disease activity, further suggesting that clinical assessment may be less than optimal. Importantly, these results also indicate that physicians may not be having meaningful conversations with patients about QoL or the impact of AD on the daily life of the patient. The patient impact likely represents an important indicator of a need for treatment because patients may be more concerned about what they can do and how they are perceived than by an estimated value on a severity scale.

To put the DSP physician-perceived severity findings into context, information regarding overall physician workload (from the same physicians that completed the DSP), in terms of the total numbers of AD patients seen over a 2-week period split into mild, moderate, and severe, was used to produce an estimate of the overall distribution of AD. This indicated proportions of patients with mild, moderate, and severe AD of 53.9, 42.0, and 4.1%, respectively. In addition, using information regarding total numbers of patients managed over a 12-month period by each physician included in the DSP, and taking into account the total number of physicians of each type in the USA, we estimated that the proportion of patients treated by PCPs, dermatologists, and allergists/immunologists was 66, 28, and 5% of patients with AD, respectively. Results indicate that when AD severity is projected to the total population, severe AD represents a smaller proportion of the overall AD population than the sample included in this analysis.

Strengths and Limitations

The strengths of this study are that the patients had a clinically confirmed diagnosis of AD, and that these patients reflect the consulting population from real-world clinical practice, providing insight into how patients who met the inclusion criteria and their consulting physician rate AD severity. However, interpretation and generalizability of these results should also take into account the study limitations, including that this analysis was limited only to patients with a history of moderate-to-severe AD. In addition, disease severity rating by the patient was based on personal judgment, rather than an objective measure such as POEM. Although this may limit the generalizability of the findings to other populations, this was considered important to determine patient perceptions in this real-world setting. Another limitation is that data acquisition was reliant on the accuracy of the physician’s report. There is also the potential for selection bias because those who agreed to participate may have characteristics different from those who did not agree. Last, the DSP is a cross-sectional study, and thus relationships should be considered associative rather than causal.


A discordance between patient and physician rating of AD severity was observed in almost one-third of patients in this study, with QoL-related outcomes appearing to be the primary driver of this discordance. Patients tended to rate their disease as less severe than their treating physicians did, highlighting a need for greater communication between patients with AD and their treating physicians and confirming the importance of including the patient perspective when making management decisions. In addition, there is a general lack of awareness and use of clinical and patient-reported measures for assessing the severity and impact of AD, suggesting a need for greater education of physicians regarding the availability and use of such measures for the assessment of AD.