Key Summary Points

Why carry out this study?

Participation in clinical trials often requires time-consuming travel to academic centers or urban areas, which may lead to lower rates of participation and enrollment of less diverse subjects.

Teledermatology has the potential to bridge this gap; however, remote assessments should be reliable and robust across patients with different skin tones and clinical assessors of different experience levels.

This study sought to determine whether psoriasis severity could be reliably assessed remotely through digital images, and whether patient skin tone or the experience level of the assessor impacted this reliability.

What was learned from the study?

Overall, all digital image-based assessors showed good agreement with a face-to-face assessor when evaluating patients with psoriasis across a range of disease severity.

Remote and face-to-face assessments demonstrated good concordance regardless of patient skin tone or the training level of the assessors.

Our pilot study lays the groundwork for further expanding telehealth-based clinical trials for patients of different skin tones in underserved areas.

Introduction

Telemedicine has the potential to broaden access to both dermatologic care and clinical research. Over the past two decades, these services have demonstrated value by helping improve patient–provider visit efficiency, reducing time to treatment, and allowing patients living in underserved areas to access dermatologic care and specialty consults [1,2,3,4]. Telehealth can also be used to augment traditional clinical trial methodologies by expanding access to new patient populations and enhancing recruitment, retention, and real-time data collection [5].

Psoriasis is a chronic, inflammatory skin disease that manifests as thick, red, scaly plaques on the body [6]. Treatment is often complex and involves visual, long-term monitoring of symptoms [7, 8]. The most widely used system for measuring psoriasis severity is the Psoriasis Area and Severity Index (PASI) score, which combines graded lesion severity with the extent of involved body area across the scalp, upper limbs, trunk, and lower limbs [9]. In digital image-based PASI (DIB-PASI), digital photographs are used to score patients remotely. Previous research on the accuracy of DIB-PASI methods has shown good concordance between DIB-PASI and face-to-face PASI (FTF-PASI) scores [8, 10, 11]. Additionally, clinical outcomes did not differ between patients with psoriasis treated using an entirely online, collaborative connected-health model involving DIB-PASI and those who were managed in person [12].

Specifically, DIB-PASI services have the potential to improve the recruitment and retention of patients with psoriasis who do not live near academic medical centers and could not otherwise enroll. DIB-PASI can facilitate access to trials for these patients, while decreasing costs incurred by repeated visits and travel time [13,14,15]. By eliminating the need for on-site visits, DIB-PASI decreases the patient burden and could potentially mitigate the high drop-out rate seen in traditional trials [16]. Improved retention rates help avoid delays and can significantly minimize research costs [17].

The benefits of DIB-PASI are especially relevant for clinical trials concerning psoriasis, which requires regular follow-up to track progress and evaluate treatment response. In these patient populations, tele-assessments such as DIB-PASI have been shown to substantially decrease total patient turnaround time with no difference in efficacy compared with face-to-face consultations [18, 19]. In addition, visit compliance is significantly improved by associated reductions in patient travel time [10, 20]. In psoriatic populations, teledermatology applications have also been successfully used for high-need home monitoring and full-service online care management [11, 12]. Thus, DIB-PASI represents a cost-effective, practical way to include patients in geographically remote areas or disparate living conditions without compromising medical care or research quality. As a result, improved recruitment could allow for the study of diverse and previously underrepresented patients, yielding trial cohorts that are more representative of the disease population.

In this pilot study, we evaluate the reliability of digital photography to assess psoriasis severity. Our work expands on previous research to provide a more complete comparison of DIB-PASI and FTF-PASI scoring by examining clinical experience as well as evaluating the potential impact of patient skin tone on score reliability.

Methods

Study Design and Participants

Patients with plaque psoriasis were recruited from the Department of Dermatology, University of California, San Francisco for the clinical trial: “A Single-Center, Open-Label Study to Assess Improvement in Psychosocial and Occupational Dimensions with Adalimumab Treatment of Moderate-to-Severe Psoriasis evaluated with the Work Productivity and Activity Impairment-Specific Health Problem (WPAI-SHP), Psychological General Well-Being (PGWB), Psoriasis Quality of Life-12 (PQOL-12), and Dermatology Life Quality Index (DLQI).” Fourteen patients with moderate-to-severe plaque psoriasis were recruited and treated with adalimumab for 52 weeks. Eligibility criteria included being of age 18 or older and having physician-diagnosed plaque psoriasis.

Face-to-face visits were performed according to the study protocol. During each visit, FTF-PASI scores were assessed by a clinical research fellow with 2 years of clinical experience in PASI scoring. At the same visit, digital images of all PASI-scored body sites excluding the scalp, which was obstructed by hair, were taken for DIB-PASI scoring. Each patient generally had six photos taken by a clinical research coordinator: full front, upper body front, lower body front, full back, upper body back, and lower body back. Patient skin tone was categorized as “light” (n = 8) for Fitzpatrick I–II phototypes, and “medium-to-dark” (n = 6) for Fitzpatrick III or greater. Participants whose photos were judged to be of sufficient quality for scoring, on the basis of lighting, sharpness, and color clarity, were included in the study (Fig. 1). Of the 210 total sets of images, 198 (94.3%) were determined to be high quality and retained for our analysis. Images of these subjects from weeks 0, 12, and 24 were reviewed by four independent assessors (an attending physician, two clinical fellows, and a medical student) with 6, 3, 2, and 1 years of clinical experience in PASI scoring, respectively. For each week, all assessors used the images to score each participant’s upper limbs, trunk, and lower limbs across the four PASI components to derive a DIB-PASI score. All assessors completed the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) PASI score training prior to completing assessments for the study. Because no digital images were taken for the scalp body site, FTF-PASI scores for this component were used in the calculation of the total DIB-PASI score.
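The scalp substitution described above can be illustrated with a minimal sketch of the standard PASI calculation. The region weights (0.1, 0.2, 0.3, 0.4), the 0 to 4 severity grades, and the 0 to 6 area score are the conventional PASI definitions. The function names and the example scores are hypothetical and are not taken from the study data or the authors' code.

```python
# Conventional PASI region weights; "scalp" follows this paper's naming of the head region.
REGION_WEIGHTS = {"scalp": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}


def region_score(region, erythema, induration, desquamation, area):
    """Weighted contribution of one body region to the total PASI."""
    for severity in (erythema, induration, desquamation):
        if not 0 <= severity <= 4:
            raise ValueError("severity grades run from 0 to 4")
    if not 0 <= area <= 6:
        raise ValueError("area score runs from 0 to 6")
    return REGION_WEIGHTS[region] * (erythema + induration + desquamation) * area


def total_dib_pasi(image_scores, ftf_scalp):
    """Total DIB-PASI: three image-scored regions plus the face-to-face scalp score."""
    total = region_score("scalp", *ftf_scalp)
    for region in ("upper_limbs", "trunk", "lower_limbs"):
        total += region_score(region, *image_scores[region])
    return total


# Hypothetical scores as (erythema, induration, desquamation, area) per region.
example_images = {
    "upper_limbs": (2, 2, 1, 2),
    "trunk": (3, 2, 2, 3),
    "lower_limbs": (2, 3, 2, 4),
}
print(round(total_dib_pasi(example_images, ftf_scalp=(1, 1, 1, 1)), 1))  # 19.8
```

Because the scalp carries the smallest weight, the shared face-to-face scalp score contributes at most 7.2 of the 72 possible points to each DIB-PASI total.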

Fig. 1

Examples of digital images for DIB-PASI scoring. a Example digital photographs taken of a patient with psoriasis, front and back. b Example digital photographs taken of another patient with psoriasis, front and back. Identifying region redacted

Compliance with Ethics Guidelines

This study received approval from the UCSF IRB (#16-21005) and was performed in accordance with the Declaration of Helsinki 1964 and its later amendments. All subjects provided informed consent to participate in the study.

Analysis Methods

Statistical programming and analyses were performed using R 4.0.3. Results were considered statistically significant at p < 0.05.

The intraclass correlation coefficient (ICC) was used to evaluate the concordance of DIB-PASI scores with their corresponding FTF-PASI scores. The ICC measures score reliability by comparing the variability of different scores assigned to the same participant with the total variation across all scores and all participants. ICC values < 0.50 were categorized as “poor agreement,” 0.50 ≤ ICC < 0.75 as “moderate agreement,” 0.75 ≤ ICC < 0.90 as “good agreement,” and ICC ≥ 0.90 as “excellent agreement” [21,22,23]. ICC estimates and their 95% confidence intervals were calculated using an absolute-agreement, two-way mixed-effects model with error adjustment for the number of raters compared in a given analysis (α = 0.05). Missing values and corresponding rows were omitted from analysis.
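As an illustration of the calculation described above, the following Python sketch computes an absolute-agreement ICC from a two-way ANOVA decomposition and maps it to the agreement categories used in this study. It is not the authors' R code. The single-rater form of the coefficient is an assumption, since the text does not state whether single or average measures were used. Confidence intervals are omitted, and the function names are hypothetical.

```python
def icc_absolute_agreement(rows):
    """Single-rater, absolute-agreement ICC from a two-way ANOVA decomposition.

    rows: one list per rated item, holding one score per rater.
    Rows containing None (a missing score) are dropped before analysis.
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    n, k = len(complete), len(complete[0])  # n items, k raters
    grand = sum(sum(r) for r in complete) / (n * k)
    row_means = [sum(r) / k for r in complete]
    col_means = [sum(r[j] for r in complete) / n for j in range(k)]

    # Partition the total sum of squares into item, rater, and residual parts.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in complete for v in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater variance stays in the denominator, so systematic offsets lower the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_category(icc):
    """Map an ICC value to the agreement categories defined in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"
```

With this form, two raters who rank patients identically but differ by a constant offset do not reach perfect agreement. For example, scores of 1, 2, 3 from one rater against 2, 3, 4 from another give an ICC of about 0.67, which is why an absolute-agreement definition suits a comparison of remote and in-person scores.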

Results

Participants

We recruited patients and conducted study visits. A total of 14 eligible patients with plaque psoriasis, 10 male-identified and 4 female-identified, agreed to participate in the study. Age ranged from 26 to 53 years old with an average of 37.7 years. Age, gender identification, and baseline FTF-PASI scores at week 0 of each patient are described in Table 1. Digital images of sufficient quality for PASI scoring were available for 13, 12, and 14 patients for weeks 0, 12, and 24, respectively. All clinical assessors completed PASI assessments for each patient at each timepoint.

Table 1 Description of the gender identification, age, and baseline FTF-PASI score for 14 recruited participants

Overall Trends and Concordance in PASI Scores

In general, the distributions of PASI scores were similar across assessors (Fig. 2a), and median DIB-PASI and FTF-PASI scores trended close together. The full distribution of PASI scores ranged from 0 to 41.6. The median PASI score across all patients was 13.8 at week 0, 3.5 at week 12, and 1.8 at week 24. As illustrated in Fig. 2, median PASI decreased over time, reflecting clinical improvement that was consistent across assessors, both in person and digitally.

Fig. 2

Overall PASI score distribution and assessor concordance. a Box plots showing distribution and range of PASI scores recorded by the face-to-face (FTF, gray) and four image-based assessors (numbered 1–4, colored). b Bar plots showing overall intraclass correlation (ICC, a measure of concordance) for each of the DIB-PASI assessors with the in-person assessor (colored bars), as well as a combined group concordance (gray bar). Error bars show 95% confidence interval. c Line plot showing ICC values across all time points, with colors corresponding to each of the assessors

To analyze the concordance between FTF-PASI and DIB-PASI scores, we calculated the intraclass correlation coefficient, a measure of agreement between observers that has been used in other studies of remote monitoring of psoriasis [21,22,23]. The concordance of all scores was high, with an ICC value of 0.82 (Fig. 2b, gray bar; p < 0.0001), indicating good agreement between FTF-PASI and DIB-PASI scores from all assessors. When analyzed at the level of individual assessors, we found that all DIB assessors also demonstrated good agreement with the FTF assessor. Concordance was lowest at week 0 (ICC 0.64), when the distribution of PASI scores was widest, but this was still considered moderate agreement.

Concordance by PASI Component and Region

The PASI is scored on four clinical components: erythema (redness), induration (thickness), desquamation (scaling), and degree of involvement for each affected body region (area). To understand the relative contribution of each component to concordance as previously observed, we evaluated their individual ICC values between FTF-PASI and DIB-PASI.

Scores of the four components showed good agreement for induration (ICC 0.84, p < 0.0001), desquamation (ICC 0.77, p < 0.0001), and area (ICC 0.78, p < 0.0001), and moderate agreement for erythema (ICC 0.72, p < 0.0001). ICC values were further analyzed for each individual assessor (Fig. 3a, colored bars); all showed moderate or good agreement.

Fig. 3

Concordance of PASI scores by clinical component and body region. a ICC values of PASI components (redness, thickness, scaling, area) and body regions (upper limb, trunk, lower limb; scalp excluded, see Methods). Gray bars denote combined concordance of all DIB-PASI assessors compared with the FTF-PASI assessor, while colored bars denote individual assessors. Error bars denote 95% confidence intervals

The measurements of the clinical PASI components are taken at four body regions: the scalp, upper extremities, trunk, and lower extremities [20]. We similarly evaluated the ICC values of each region between FTF-PASI and DIB-PASI scores. We found consistently good to moderate agreement across each body region: trunk (ICC 0.80, p < 0.0001), upper extremities (ICC 0.74, p < 0.0001), and lower extremities (ICC 0.73, p < 0.0001). A one-way between-subjects analysis of variance (ANOVA) found that different body areas were not significantly associated with differences in mean concordance values (F = 4.066, p = 0.393).

Concordance by Patient Skin Tone and Assessor Experience

The presentation of psoriasis has been shown to vary between ethnic groups, often with important implications for the treatment and management of nonwhite individuals [24]. To determine if the concordance between FTF-PASI and DIB-PASI scores was affected by skin tone, patients were classified as having a light or medium-to-dark skin tone by an experienced clinical assessor based on Fitzpatrick phototype. Of the 14 participants, 8 had light skin tones (Fitzpatrick I–II) and 6 had medium-to-dark skin tones (Fitzpatrick III and above). Both groups showed good agreement between FTF-PASI and DIB-PASI, with ICC values of 0.80 and 0.83 respectively. A one-way between-subjects ANOVA showed no significant differences in concordance associated with either skin tone group (F = 4.572, p = 0.1) (Fig. 4).

Fig. 4

PASI concordance by patient skin tone and assessor training level. a ICC values grouped by patient skin tone (light and medium-to-dark). Gray bars denote combined concordance of all DIB-PASI assessors compared with the FTF-PASI assessor, while colored bars denote individual assessors. Error bars denote 95% confidence intervals. b ICC values by training level (attending, fellow, student). Error bars denote 95% confidence intervals

Finally, we analyzed ICC values for concordance across levels of clinical training. The fellow level of training showed the highest concordance with the in-person assessor (ICC 0.86), followed by the attending (ICC 0.80), and student (ICC 0.75) training levels. These ICC values indicate that the high concordance between FTF-PASI and DIB-PASI holds at the individual assessor level as well as the overall group level.

Discussion

To our knowledge, this is the first study to assess the level of concordance between PASI scores measured during face-to-face visits and those determined using digital images across different levels of clinical experience and patient skin tones. Overall, we demonstrate a high level of concordance between face-to-face assessment and assessment by digital images alone. In particular, good agreement was achieved regardless of patient skin tone and clinical training.

Erythema was the only PASI component that showed moderate agreement instead of good agreement (defined by ICC ≥ 0.75). Additionally, when evaluating each component at the individual assessor level, the least experienced assessor showed moderate instead of good agreement for area. Given the use of area as a multiplicative factor in the calculation of PASI, even a small difference in measurement can have a relatively large effect on overall scores. While no difference was observed between groups in this study, future implementations of DIB-PASI may benefit from clear protocols or automated computer algorithms for evaluating area.
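The multiplicative role of area can be made concrete with the conventional PASI formula, in which each region r contributes its weight times the sum of the three severity grades (each 0 to 4) times the area score (0 to 6). The grades in the worked example below are hypothetical and chosen only for illustration. They are not taken from the study data.

```latex
\begin{align*}
\mathrm{PASI} &= \sum_{r} w_r \,(E_r + I_r + D_r)\, A_r,
  \qquad w_{\text{head}} = 0.1,\; w_{\text{upper}} = 0.2,\; w_{\text{trunk}} = 0.3,\; w_{\text{lower}} = 0.4 \\[4pt]
% Hypothetical lower-limb grades: E = I = D = 3
\text{lower limbs, } A = 3 &: \quad 0.4 \times (3 + 3 + 3) \times 3 = 10.8 \\
\text{lower limbs, } A = 4 &: \quad 0.4 \times (3 + 3 + 3) \times 4 = 14.4 \\
\text{difference} &: \quad 14.4 - 10.8 = 3.6
\end{align*}
```

Under these assumed grades, a single-step disagreement in the lower-limb area score shifts the total by 3.6 points, whereas a single-step disagreement in one severity grade at A = 3 shifts it by only 0.4 × 1 × 3 = 1.2 points.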

DIB-PASI also showed good agreement with FTF-PASI across the range of clinical experience. The fellow level of training showed the highest concordance with the in-person assessor (ICC 0.86), followed by attending (0.80) and student (0.75) training levels. Even though the higher training levels (i.e., attending and fellow) did correspond to greater concordance, good agreement was still obtained at the student level. These results suggest that even 1 year of PASI scoring experience could lead to good agreement between digital and face-to-face assessments, and that reliable digital image-based scores need not be obtained solely by highly experienced physicians.

Disparities in dermatological diagnosis and treatment of patients with nonwhite skin are well documented [24,25,26,27,28]. For example, skin of color remains greatly underrepresented in teaching images, and nearly half of dermatologists in one study felt their training was inadequate for confident diagnosis in patients with skin of color [25]. As PASI scoring is widely used for tracking disease severity and serves as a key metric in clinical trials to assess treatment efficacy [29], it is vital that digital image-based PASI scoring is reliable for patients with nonwhite skin. In our study, FTF-PASI was concordant with DIB-PASI for patients with light skin tones as well as those with medium-to-dark skin tones, suggesting that 2D images can be reliably assessed for patients of different skin tones. It is important to note, however, that the accuracy of FTF-PASI scoring itself may still be affected by a patient’s skin tone, as PASI components such as erythema can present differently in skin of color. Although current datasets for artificial intelligence in dermatology lack diversity [28], an advantage of digital images is that future advances in these algorithms may enable more accurate assessment of psoriasis severity in skin of color.

Limitations

Our pilot study is limited by its relatively small sample size of 14 patients. However, as this study was longitudinal with each patient analyzed by five clinical assessors across multiple PASI subcomponents, the total number of observations analyzed exceeded 1000. This pilot study thus provides a stepping stone toward larger studies with more patients and a greater number of clinical assessors. The overall proportion of quality images in our study was high (94.3%), but the lack of high-quality images of the scalp region restricts our conclusions to PASI scoring over the upper limbs, trunk, and lower limbs. Additionally, we used standardized photographs taken in the clinic. We did not assess mobile photographs taken by patients at home, which are typically of more variable quality. However, developments in machine learning and artificial intelligence algorithms can help identify low-quality images and assist patients in obtaining high-quality images [30].

Conclusions

Clinical trials in dermatology are not diverse, with substantial underrepresentation of nonwhite patients [26]. Additionally, the burden of traveling to academic centers is a major obstacle for clinical trial recruitment of patients from underrepresented or marginalized communities. Remote DIB-PASI services can provide a cost-effective, accessible alternative to face-to-face visits needed for clinical research to improve recruitment and retention in clinical trials for chronic conditions such as psoriasis.

To effectively reduce these disparities, DIB-PASI scoring should agree with in-person assessments, be applicable to different skin tones, and be robust to varying assessor experience levels. Our results are consistent with previous research showing that PASI scores can be determined with a high level of accuracy by clinical assessors using digital images [8, 11, 31]. Notably, our pilot study goes a step further by demonstrating that a high level of concordance can be achieved across different patient skin tones and assessor experience levels.

Reliable remote assessment is a vital step toward addressing the disparities in dermatology. Our pilot study used in-person digital photography in the office as a proof-of-concept, but advances in mobile phone capabilities and artificial intelligence algorithms [30] will soon enable patients with limited in-person access to dermatologists to obtain accurate images from their own homes for clinical assessment [32]. These advances lay the groundwork for reliable and high-quality teledermatological care and clinical research for diverse and remote patient populations.