Take-home message

Why carry out this study?

The clinical manifestations of plaque psoriasis have been shown to severely impact patients’ quality of life and emotional well-being; therefore, it is important to measure patient experience alongside clinical parameters in the evaluation of treatments for plaque psoriasis.

The Psoriasis Symptoms and Impacts Measure (P-SIM) is a novel patient-reported outcome tool developed to specifically capture patients’ experiences of the key signs, symptoms and impacts of plaque psoriasis; here, three key items of the P-SIM (itching, skin pain and scaling) were completed at pre-defined patient visits using a hand-held device on a 0–10 numerical rating scale.

In this research, the psychometric properties of the P-SIM were evaluated using data from BE RADIANT, a phase 3b, multi-centre, randomised, secukinumab-controlled, double-blinded, parallel-group trial that investigated the efficacy and safety of bimekizumab in the treatment of moderate to severe plaque psoriasis.

What was learned from this study?

In these analyses, the itching, skin pain and scaling items of the P-SIM demonstrated good reliability, validity and sensitivity to change in the assessment of patients' experiences of key psoriasis signs, symptoms and impacts when completed at pre-defined patient visits on a hand-held device over a period of 48 weeks.

Anchor-based analyses confirmed that a four-point decrease in item scores of the P-SIM was indicative of a marked clinically meaningful improvement when completed at pre-defined patient visits on a hand-held device; this threshold can be used to assess treatment effects over 48 weeks in patients with moderate to severe plaque psoriasis.

Introduction

Psoriasis is a chronic, immune-mediated disease which manifests in the skin or joints [1]. Plaque psoriasis makes up approximately 80–90% of all cases of psoriasis and typically presents as red, raised patches of skin with silvery-white scales and well-defined edges [2]. The clinical manifestations of plaque psoriasis have been shown to severely impact patients’ quality of life and emotional well-being [3].

To date, several patient-reported outcome (PRO) measures have been developed to assess the symptoms and functional impacts of skin diseases, the most commonly used being the Dermatology Life Quality Index (DLQI) [4]. However, many PRO tools, such as the DLQI, do not accurately capture the signs and symptoms of psoriasis reported by patients because of long recall periods, the exclusion of certain symptoms, a lack of specificity and a lack of measurements to assess symptom severity [5].

For these reasons, the Psoriasis Symptoms and Impacts Measure (P-SIM) was developed to specifically capture patients’ experiences of key signs and symptoms of plaque psoriasis. The P-SIM was designed as an electronic diary with a 24 h recall period. It consists of 14 items which aim to cover the breadth of psoriasis symptoms; these items were derived from literature searches, clinical expert opinion and interviews with patients [6]. Content validation of the P-SIM has previously been conducted alongside the BE ABLE 2 phase 2b study (NCT03010527) [6, 7]. Additionally, psychometric properties of the P-SIM have been evaluated based on blinded data from the BE VIVID (NCT03370133) and BE READY (NCT03410992) phase 3 trials of bimekizumab, in which daily assessments using the P-SIM were conducted [8].

Here, we evaluate the psychometric properties of the P-SIM in the context of BE RADIANT (NCT03536884), a phase 3b, multi-centre, randomised, secukinumab-controlled, double-blinded, parallel-group trial investigating the efficacy and safety of bimekizumab in the treatment of moderate to severe plaque psoriasis.

Unlike in the BE VIVID and BE READY clinical trials, in which the P-SIM was completed daily, in BE RADIANT, three of the 14 P-SIM items (itching, skin pain and scaling) were completed by patients on electronic tablets at pre-defined study visits. It is therefore necessary to further evaluate the instrument’s properties in the context of BE RADIANT, to assess whether the three selected P-SIM item scores can reliably measure symptoms in patients with moderate to severe plaque psoriasis, and to assess the reliability of a single measurement rather than a weekly average. Responder definition (RD) thresholds for the P-SIM items are also estimated.

Methods

Study Design

Analyses were conducted using blinded data from BE RADIANT, in which patients were randomised 1:1 to bimekizumab 320 mg every 4 weeks (Q4W) or secukinumab 300 mg weekly to Week 4 then Q4W (Fig. 1). At Week 16, patients receiving bimekizumab either continued on Q4W dosing or switched to dosing every 8 weeks (Q8W). Adults with moderate to severe plaque psoriasis were enrolled, with baseline Psoriasis Area and Severity Index (PASI) ≥ 12, body surface area (BSA) affected by psoriasis ≥ 10%, and Investigator’s Global Assessment (IGA) score ≥ 3 on a 5-point scale.

Fig. 1

In BE RADIANT, patients were randomised 1:1 to bimekizumab 320 mg Q4W or secukinumab 300 mg weekly to Week 4 then Q4W. Patients randomised to bimekizumab 320 mg Q4W either continued to receive Q4W dosing at Week 16 or switched to Q8W maintenance dosing. Secukinumab dosing was 300 mg at Weeks 0, 1, 2, 3 and 4, and Q4W thereafter. After completing the 48-week double-blinded period, patients could enrol in a 96-week open-label extension period to assess the long-term safety, tolerability and efficacy of bimekizumab. Patients who did not enrol in the open-label extension underwent a safety follow-up visit 20 weeks after their last dose of study treatment. PASI 100: 100% improvement from baseline in Psoriasis Area and Severity Index; Q4W: every 4 weeks; Q8W: every 8 weeks

The study protocol, amendments and patient informed consent were reviewed by a national, regional or Independent Ethics Committee or Institutional Review Board. BE RADIANT (NCT03536884) was conducted in accordance with the current version of the applicable regulatory and International Conference on Harmonisation–Good Clinical Practice requirements, the ethical principles that have their origin in the principles of the Declaration of Helsinki, and the local laws of the countries involved.

Measurements and Outcomes

The full P-SIM consists of 14 items, which are listed in Table 1. Each item was scored for severity or impact level at its worst over the past 24 h on a scale from 0 (no symptoms/impacts) to 10 (very severe). In BE RADIANT, three symptom items (skin itching, skin pain and skin scaling) from the full P-SIM were assessed, as highlighted in Table 1. These three items were used as key secondary endpoints in the BE VIVID and BE READY phase 3 trials, because they are subjective concepts that are most accurately assessed directly by the patient. These concepts can complement clinician-reported concepts and provide a holistic and comprehensive assessment of psoriasis signs and symptoms. It was therefore deemed appropriate to include only these items of the P-SIM for assessment in the BE RADIANT trial. Additionally, including only three of the 14 P-SIM items reduced the time burden on patients.

Table 1 Items of the P-SIM

The three-item P-SIM was completed by patients on electronic tablets at baseline and at Weeks 4, 8, 12, 16, 32 and 48, as shown in Table 2. Other PRO instruments (DLQI and the Patient Global Assessment of Psoriasis [PGAP]) and clinician-reported outcome (ClinRO) instruments for the assessment of treatment efficacy (PASI and IGA) were also used in BE RADIANT. PGAP and DLQI were administered at the same pre-defined study visits as the P-SIM, and PASI and IGA scores were recorded at every on-site study visit, as shown in Table 2. The PGAP consists of a multiple-choice question answered on a verbal rating scale: “How severe are your psoriasis-related symptoms right now?” The patient may respond that they have “no symptoms” (1), “mild symptoms” (2), “moderate symptoms” (3), “severe symptoms” (4), or “very severe symptoms” (5). The DLQI consists of 10 items, each scored from 0 to 3, with 3 representing the highest impact. A total score, ranging from 0 to 30, is obtained by adding the 10 item scores together, with higher scores indicating increased impact on health-related quality of life [9]. In this analysis, DLQI total score and DLQI Item 1 score (question: “Over the last week, how itchy, sore, painful or stinging has your skin been?”; answers: “Very much”, “A lot”, “A little” or “Not at all”) were used as part of the psychometric validation. Details of PASI and IGA score measurements can be found in the Supplementary Methods.

Table 2 Timing and frequency of completed measures through treatment period

Statistical Analysis

Analyses were performed including all randomised patients, unless otherwise noted. Only blinded datasets were used; therefore, treatment group allocation was not considered in any of the analyses. Observed case data were used for the analyses, and no imputation was performed for missing data.

Compliance and Completion Rates

The compliance rate for each item at a given visit was calculated as the number of patients who had a non-missing item score for that visit divided by the number of randomised patients who were receiving study treatment or who had a non-missing P-SIM score at that visit. The completion rate for each item at a given visit was calculated as the number of patients with a non-missing item score at that visit divided by the number of patients randomised at baseline.
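The two rates differ only in their denominators, which a short sketch can make concrete. The function names, the `on_treatment` flag and the example values below are illustrative assumptions, not part of the study dataset; because all three items were completed together, a recorded item score is used here as a stand-in for "a non-missing P-SIM score".

```python
from typing import Optional, Sequence


def completion_rate(scores: Sequence[Optional[int]], n_randomised: int) -> float:
    """Patients with a non-missing item score divided by all randomised patients."""
    n_completed = sum(score is not None for score in scores)
    return n_completed / n_randomised


def compliance_rate(scores: Sequence[Optional[int]],
                    on_treatment: Sequence[bool]) -> float:
    """Patients with a non-missing item score divided by patients expected to respond.

    A patient counts as expected at the visit if they were still receiving
    study treatment or if a score was recorded for them (assumed definition).
    """
    n_completed = sum(score is not None for score in scores)
    n_expected = sum(score is not None or treated
                     for score, treated in zip(scores, on_treatment))
    return n_completed / n_expected


# Hypothetical visit: five randomised patients, None marks a missing score.
visit_scores = [7, None, 4, None, 2]
still_on_treatment = [True, True, True, False, False]

print(compliance_rate(visit_scores, still_on_treatment))
print(completion_rate(visit_scores, n_randomised=5))
```

Patients who have left the study and have no score drop out of the compliance denominator but remain in the completion denominator, which is why completion can fall well below compliance late in a trial.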

Psychometric Validation

Construct Validity

Convergent validity was assessed by calculating the correlation between the three P-SIM item scores and scores from the other outcomes measuring similar concepts. Spearman’s rank correlation coefficients and corresponding p values were calculated between the P-SIM itching, skin pain and scaling item scores and PROs (DLQI total score, DLQI Item 1 score and PGAP score) and ClinROs (PASI total score and IGA score) at baseline and Weeks 16, 32 and 48. Only pairs of outcome measures assessed at the same visit (shown in Table 2) were included in this analysis. A correlation coefficient > 0.3 and ≤ 0.5 signified moderate convergent validity and a coefficient > 0.5 signified strong convergent validity [10]. It was hypothesised that the three items of the P-SIM would have moderate to strong correlations with PASI, IGA, DLQI and PGAP total scores, apart from baseline correlations between the three items of the P-SIM and both PASI and IGA. These baseline correlations were expected to be slightly weaker because the inclusion criteria of BE RADIANT (IGA was required to be moderate or severe and PASI ≥ 12) limit the variability of these measures. It was also hypothesised that the itching and skin pain items of the P-SIM would have a strong correlation with DLQI Item 1.
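As an illustration of the convergent validity criteria, the sketch below computes a Spearman rank correlation (Pearson correlation of average ranks, so ties are handled) and labels it with the thresholds stated above. It is a minimal standard-library sketch, not the analysis code used in the study; the function names and the example scores are hypothetical, and p values are not computed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def convergent_strength(rho):
    """Label a coefficient using the thresholds given in the text."""
    if rho > 0.5:
        return "strong"
    if rho > 0.3:
        return "moderate"
    return "weak"


# Hypothetical same-visit scores for six patients.
itching = [8, 6, 7, 3, 5, 2]
dlqi_total = [20, 15, 18, 6, 9, 4]
rho = spearman(itching, dlqi_total)
print(rho, convergent_strength(rho))
```

Only patients with both measures recorded at the same visit should be passed in, mirroring the pairing rule described above.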

To assess the known-groups validity of P-SIM items (ability to differentiate between different patient subgroups), the mean scores of each of the three P-SIM items at Weeks 16, 32 and 48 were compared between subgroups of patients based on absolute PASI total score thresholds (≤ 1, > 1 to ≤ 3, > 3 to < 5 and ≥ 5) and IGA scores (0: “clear”; 1: “almost clear”; 2 or 3: “mild or moderate”; and 4: “severe”). PASI and IGA were used for the assessment of known-groups validity as they are well-accepted clinical measures of psoriasis disease severity and were used to define primary efficacy endpoints in bimekizumab phase 3 studies. As expected due to the inclusion criteria of the BE RADIANT trial (PASI ≥ 12, IGA ≥ 3), there was low variability in scores for both measures at baseline; assessment of known-groups validity therefore focussed on Weeks 16, 32 and 48. The absolute PASI cut-off values considered to define subgroups (1, 3 and 5) have been shown to provide reliable estimates of disease activity that can be used to define treatment goals for psoriasis treatment and facilitate clinical decision-making [11]. Additionally, the absolute PASI cut-off value of 12 was used to define moderate to severe plaque psoriasis in the BE RADIANT inclusion criteria. The response options of the 5-point IGA scale were used as the criteria to define subgroups of patients with clinically different psoriasis severity levels, an approach shown to be a valid and reliable measure of psoriasis severity [12].

It was hypothesised that patients with higher PASI or IGA scores would have higher mean and median P-SIM item scores. p values from an analysis of variance (ANOVA; F test) and the Kruskal–Wallis test were used to compare distributions between the known groups.
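The ANOVA comparison of known groups reduces to a ratio of between-group to within-group mean squares. The following is a minimal sketch of that F statistic only, assuming hypothetical item scores grouped by PASI subgroup; the function name and data are illustrative, and the p value (which requires the F distribution) is not computed here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of score lists, one list per subgroup."""
    all_scores = [score for group in groups for score in group]
    grand_mean = fmean(all_scores)
    group_means = [fmean(group) for group in groups]

    # Variation of subgroup means around the grand mean, weighted by subgroup size.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variation of individual scores around their own subgroup mean.
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical itching scores for three PASI subgroups of increasing severity.
scores_by_pasi_group = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(scores_by_pasi_group))
```

In practice a statistics library would supply both tests with p values, for example `scipy.stats.f_oneway` and `scipy.stats.kruskal`; the rank-based Kruskal–Wallis test is the more natural companion for 0–10 item scores that may be skewed.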

Sensitivity to Change

Spearman correlation coefficients and corresponding p values were calculated between changes from baseline to Weeks 16, 32 and 48 in P-SIM item scores and those in other relevant outcomes [PASI (percent and absolute change), IGA, PGAP, DLQI Item 1 and DLQI total score]. A correlation coefficient ≥ 0.3 demonstrated acceptable sensitivity to change [13, 14].

Test–Retest Reliability

For each of the three items of the P-SIM, test–retest reliability was assessed using intraclass correlation coefficients (ICCs). These were calculated as the ratio of between-patient variance to the total variance [15]. ICCs were calculated at time intervals between baseline and Week 4 and between Weeks 12 and 16 among stable study participants (defined as those with the same PGAP score over these intervals). An ICC value ≥ 0.70 demonstrated acceptable test–retest reliability [15].
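The text defines the ICC as the ratio of between-patient variance to total variance but does not state which ICC form was used. The sketch below assumes a one-way random-effects, single-measurement ICC estimated from two visits, and filters to stable patients by requiring an unchanged PGAP score. The record field names (`score_w12`, `pgap_w12`, and so on) and the data are hypothetical.

```python
def stable_pairs(records):
    """Keep (first, second) item scores for patients whose PGAP score did not change."""
    return [(r["score_w12"], r["score_w16"])
            for r in records
            if r["pgap_w12"] == r["pgap_w16"]]


def icc_oneway(pairs):
    """One-way random-effects ICC for two measurements per patient (assumed form)."""
    n, k = len(pairs), 2
    patient_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(patient_means) / n

    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, patient_means)) / (n * (k - 1))

    # Between-patient variance component; within-patient variance is ms_within.
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)


# Hypothetical Week 12 and Week 16 records; the last patient is not stable.
records = [
    {"score_w12": 4, "score_w16": 5, "pgap_w12": 2, "pgap_w16": 2},
    {"score_w12": 6, "score_w16": 6, "pgap_w12": 3, "pgap_w16": 3},
    {"score_w12": 2, "score_w16": 3, "pgap_w12": 1, "pgap_w16": 1},
    {"score_w12": 8, "score_w16": 7, "pgap_w12": 4, "pgap_w16": 4},
    {"score_w12": 7, "score_w16": 2, "pgap_w12": 4, "pgap_w16": 2},
]
icc = icc_oneway(stable_pairs(records))
print(icc, icc >= 0.70)
```

Restricting to stable patients matters because any true change between visits inflates the within-patient variance and lowers the ICC for reasons unrelated to measurement error.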

Determination of Responder Definition Thresholds

Anchor-based methods were used to define thresholds for within-patient clinically meaningful changes, in accordance with the FDA PRO guidance [16]. Distribution-based methods were also used to provide additional information about the variability of the relevant scale scores, as per the FDA guidance [16]. Triangulation was conducted by examining the results from all the anchor-based analyses, with the support of the estimates from the distribution-based methods, to define a range of values which could be used to interpret within-patient meaningful change scores for each of the three P-SIM items. Anchor-based analyses were conducted at Weeks 16, 32 and 48. At each of these visits, only anchors with Spearman correlations ≥ 0.30 and with sample size ≥ 50 were considered for the RD determination. The anchors chosen are shown in Table 3. Empirical cumulative distribution functions (eCDF) and probability density function (PDF) curves of observed changes for each of the three P-SIM items were plotted separately at Weeks 16, 32 and 48 for each response group for each of the anchors. Distribution-based analyses were conducted using the standard error of measurement and using the standard deviation (SD) of the item scores at baseline and pre-defined study visits. If the SD was vastly different over time, the baseline SD was used. If not, the mean of all SDs across the relevant time points was used.
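The anchor-based and distribution-based quantities described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis code: the anchor labels and data are hypothetical, and the standard error of measurement uses the conventional formula SD × √(1 − reliability), which the text does not spell out.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean, stdev


def mean_change_by_anchor(changes, anchor_groups):
    """Mean change from baseline in an item score within each anchor response group."""
    grouped = defaultdict(list)
    for change, group in zip(changes, anchor_groups):
        grouped[group].append(change)
    return {group: fmean(values) for group, values in grouped.items()}


def ecdf(changes, threshold):
    """Empirical CDF: proportion of patients whose change is <= threshold."""
    return sum(change <= threshold for change in changes) / len(changes)


def standard_error_of_measurement(scores, reliability):
    """Conventional SEM: sample SD scaled by sqrt(1 - reliability) (assumed formula)."""
    return stdev(scores) * sqrt(1 - reliability)


# Hypothetical changes from baseline (negative = improvement) and anchor groups.
changes = [-4, -5, 0, -1, -3]
groups = ["one level improvement", "one level improvement",
          "no meaningful change", "no meaningful change",
          "one level improvement"]

print(mean_change_by_anchor(changes, groups))
print(ecdf(changes, -4))  # share of patients improving by at least four points
```

Evaluating `ecdf` over a grid of thresholds, separately for each anchor group, gives the curves whose separation is inspected when choosing a threshold; the reliability passed to the SEM function could be, for example, a test–retest ICC.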

Table 3 Final response categorisation by anchor for RD threshold analyses

Results

Patient Disposition and Baseline Characteristics

In BE RADIANT, 743 patients were randomised. Baseline demographics and disease characteristics are shown in Table 4. At baseline, the mean age of study participants was 45.0 years and the majority of participants were male (65.4%) and white (93.5%). The largest proportion of study participants was from North America (39.2%). Mean scores for the itching, skin pain and scaling items were 6.7, 4.6 and 6.7, respectively. Mean BSA affected by psoriasis was 24.3% and the mean PASI score was 19.9.

Table 4 Patient demographics and baseline disease characteristics

P-SIM Compliance Rates and Completion Rates

At each study visit, all participants completed either all three P-SIM items or none (i.e., there was no item-level missingness). P-SIM compliance rates for itching, skin pain and scaling ranged from 98.9% at baseline to 96.9% at Week 48, with an overall average across all visits of 99.1% (Supplementary Table 1). P-SIM completion rates for itching, skin pain and scaling were also high (above 92.4%) at each time point apart from Week 48 (54.5%), with an overall average across all visits of 90.8% (Supplementary Table 2). The lower completion rate at Week 48 was due to the 252 study participants who had not yet reached the Week 48 visit at the time of the interim data-cut used in these analyses.

Psychometric Validation

Construct Validity

As hypothesised, the three P-SIM items had moderate to strong correlations with PGAP, DLQI total score and DLQI Item 1 at baseline and Weeks 16, 32 and 48 (Supplementary Table 3). A weak correlation (≤ 0.30) was observed at baseline between the three P-SIM item scores and the PASI and IGA scores, as expected due to study inclusion criteria. At Weeks 16, 32 and 48, the three P-SIM items had moderate to strong correlations with PASI total score and IGA, with the exception of the skin pain item score at Week 16, for which correlations were below the acceptable threshold of 0.3.

When examining known groups defined using PASI total score (Fig. 2) and IGA score (Fig. 3), the P-SIM responded as expected across all three items: patients in subgroups with more severe disease reported higher P-SIM item scores at Weeks 16, 32 and 48. Statistically significant differences in P-SIM item scores were observed between the known groups across all three P-SIM items.

Fig. 2

P-SIM item scores at Week 16, 32 and 48 by PASI total score. a P-SIM Item 1—itching. b P-SIM Item 3—skin pain. c P-SIM Item 5—scaling. Black circular markers indicate mean scores and blue circular markers indicate outliers. PASI Psoriasis Area and Severity Index, P-SIM Psoriasis Symptoms and Impacts Measure

Fig. 3

P-SIM item scores at Week 16, 32 and 48 by IGA score subgroup. a P-SIM Item 1—itching. b P-SIM Item 3—skin pain. c P-SIM Item 5—scaling. Black circular markers indicate mean scores and blue circular markers indicate outliers. IGA Investigator’s Global Assessment, P-SIM Psoriasis Symptoms and Impacts Measure

Sensitivity to Change

Spearman correlation coefficients between changes from baseline in the three P-SIM items and in PRO measurements (PGAP, DLQI Item 1 and DLQI total score) were strong (> 0.5) or moderate (> 0.3 and ≤ 0.5) at Weeks 16, 32 and 48 (Table 5). In contrast, correlations with changes in ClinROs were weak (≤ 0.30), except for the P-SIM scaling item with PASI total score at Week 48 (0.31), which was expected because PASI includes a scaling component.

Table 5 Spearman correlations between changes from baseline in P-SIM item scores and other clinician- and patient-reported outcomes

Test–Retest Reliability

Only ICCs calculated between Weeks 12 and 16 are reported. Between baseline and Week 4, the sample size was small (n = 79), and patients may have been misclassified as “stable” because both study treatments were highly effective in relieving symptoms in the first few weeks of treatment: some symptoms may have improved without changing the PGAP category. It was therefore not possible to draw conclusions from the ICCs calculated between baseline and Week 4. Between Weeks 12 and 16, ICC values were above the acceptability threshold of 0.70 (Table 6).

Table 6 Test–retest reliability of each P-SIM item score using intraclass correlation coefficients by response category ‘improvement level 1’

Determination of Responder Definition Thresholds

Among all proposed anchors, only PGAP score, DLQI Item 1 score and DLQI total score were included in the anchor-based analysis. For PASI percent and absolute change from baseline and IGA change from baseline, correlation coefficients with changes from baseline in P-SIM item scores were well below the acceptable threshold of 0.30 [13, 14] across all time points; it was therefore deemed inappropriate to include them in the anchor-based analysis.

Table 7 shows the mean changes in P-SIM items by response category used for each anchor. The mean changes obtained for one level of improvement were considered to be appropriate potential RD thresholds for the three items of the P-SIM, because the eCDF curves for this group consistently separated from those for the ‘no meaningful change’ groups (Supplementary Figs. 1 and 2).

Table 7 Mean changes in P-SIM item by response category within each anchor and each time point

Values obtained from the PGAP and DLQI Item 1 score change from baseline were prioritised for four main reasons: the PGAP is the preferred anchor of the US FDA; both the PGAP and DLQI Item 1 are measured using ordinal scales with directly interpretable verbal descriptors; the resulting false positive rate, which should be minimised, was highest with the ‘no meaningful improvement’ group based on change in DLQI total score; and these anchors assess proximal concepts that are similar to the concepts measured by the selected P-SIM items.

The P-SIM item RD thresholds found using mean score changes ranged from 2.40 (Item 3: skin pain at Week 16) to 5.82 (Item 5: scaling at Week 48) (Table 7). Distribution-based values were lower than those obtained with the anchor-based approach and were considered as lower bounds of possible RD thresholds that would be less conservative. The mean changes in P-SIM item scores obtained with a one-point improvement on DLQI Item 1 and PGAP were close to the four-point FDA-recommended threshold, which was confirmed upon inspection of the eCDF and PDF plots (the three P-SIM item eCDF curves for PGAP and DLQI Item 1 are presented in Supplementary Figs. 1 and 2, respectively). Based on these results, it was determined that a four-point improvement threshold represented a marked clinical improvement in patients with moderate to severe plaque psoriasis.

Discussion

Convergent validity assessment showed that the three items of the P-SIM had moderate to strong correlations with PGAP score, DLQI Item 1 score and DLQI total score, but weak correlations with PASI score and IGA score at baseline and Week 16 (for the skin pain item only). The weak correlations at baseline could be explained by the low variability of these anchors at baseline due to BE RADIANT inclusion criteria. The exact reason for the weak correlation between the skin pain item and PASI and IGA at Week 16 is unknown; however, it could be because ClinROs mainly focus on the appearance of the skin and would therefore capture skin pain inadequately. Overall, these results indicate that the three P-SIM items have good convergent validity.

Statistically significant score differences were observed across the three items of the P-SIM between known groups based on PASI total score and IGA score, demonstrating the ability of the P-SIM to discriminate between known subgroups.

When examining sensitivity to change, correlation coefficients between changes in the P-SIM items and PRO measures were strong or moderate at Weeks 16, 32 and 48. However, correlation coefficients were weak between the P-SIM items and ClinRO measures (except for the scaling item and PASI total score at Week 48). Such low correlations between these clinician-rated measures and P-SIM items are likely due to two reasons. First, the variability of change scores from baseline was greatly reduced due to the use of highly effective treatments in this study, resulting in almost no patients with worsening and only a very small percentage of patients presenting no meaningful improvement based on these clinical measures. Additionally, the ClinRO scales may be less sensitive in distinguishing between different levels of improvement as compared to the PRO scales considered.

Generally, the P-SIM items demonstrated good test–retest reliability, with all ICC values well above the acceptability threshold of 0.70 between Week 12 and Week 16, demonstrating that the P-SIM is consistent in its measurement of signs, symptoms and impacts of plaque psoriasis.

RD thresholds were estimated in accordance with the FDA guidance [16]. For the anchor-based approach, the mean item score changes in the one level of improvement anchor group were lower for the skin pain item than for itching or scaling. This was likely due to the higher proportion of patients reporting little or no skin pain at baseline, whereas higher levels of scaling and itching were observed at baseline. This floor effect may impact the magnitude of reductions seen in the skin pain item scores, relative to the itching and scaling items.

Limitations of this study include the high proportion of white (93.5%) patients included in the randomised set. The lack of racial diversity in the randomised set limits the generalisability of the results. For example, post-inflammatory hyper- and hypopigmentation are generally of more concern in dark-skinned individuals [17]. Another potential limitation is the low completion rate at Week 48. It was necessary to determine the RD in advance of database unblinding to minimise bias. Therefore, all analyses were conducted on a blinded interim dataset. As a result, 252 patients had not yet completed Week 48 at the time of the data-cut-off, impacting the completion rates at Week 48. However, results at Week 32 were generally consistent with those at Week 48, indicating that the impact of this lower completion rate at Week 48 is likely minimal.

In BE RADIANT, three symptom items (itching, skin pain and scaling) from the full P-SIM were assessed. Although other items of the P-SIM, such as redness and lesions, are also subjective concepts which could be assessed by the patient, changes in patient experiences of itching, skin pain and scaling are key outcomes in the evaluation of efficacy of treatments for plaque psoriasis and have previously been used to provide patient-reported evidence complementary to traditional ClinRO outcomes. This assertion is supported by findings from concept elicitation research reported in the literature [5, 18].

In this study, a high proportion of patients had nail and scalp involvement (51.5% and 93.4%, respectively). Both nail and scalp involvement have been shown to severely impact patient quality of life; in future studies, the P-SIM should be used to assess experiences of patients with lesions involving the nails and scalp [19]. Based on this psychometric validation, a four-point RD threshold on an 11-point rating scale is proposed. Responder analyses using the four-point threshold to assess treatment effect may need to be restricted to patients reporting a score of ≥ 4 at baseline, because patients with lower baseline scores cannot achieve a four-point decrease. Complementary analyses describing the proportion of patients reaching a score of 0 (no symptom) would allow all symptomatic patients to be included.
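The two complementary responder analyses described above can be sketched as follows. This is a minimal illustration, assuming paired baseline and follow-up item scores on the 0–10 scale; the function names and example data are hypothetical and no treatment-group comparison is implied.

```python
def responder_rate(baseline, follow_up, threshold=4):
    """Share of patients improving by >= threshold points.

    Only patients with a baseline score of at least `threshold` are included,
    since a lower baseline score cannot decrease by that many points.
    """
    eligible = [(b, f) for b, f in zip(baseline, follow_up) if b >= threshold]
    responders = sum((b - f) >= threshold for b, f in eligible)
    return responders / len(eligible)


def symptom_free_rate(baseline, follow_up):
    """Share of patients symptomatic at baseline (score > 0) who reach a score of 0."""
    symptomatic = [(b, f) for b, f in zip(baseline, follow_up) if b > 0]
    return sum(f == 0 for _, f in symptomatic) / len(symptomatic)


# Hypothetical skin pain scores for five patients.
baseline_scores = [7, 5, 3, 8, 0]
week16_scores = [2, 3, 0, 4, 0]

print(responder_rate(baseline_scores, week16_scores))
print(symptom_free_rate(baseline_scores, week16_scores))
```

The patient with a baseline score of 3 is excluded from the first analysis but counted in the second, which is the reason for reporting both.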

A strength of this psychometric validation work was the completion of the P-SIM during patient visits to the study site based on a pre-defined schedule. This resulted in all three item scores being obtained from a single assessment, which may have improved the completion rates as compared to a more frequent assessment. Additionally, the use of verbal rating scales as anchors in this analysis may have more accurately captured patient experiences of psoriasis, especially for complex concepts such as skin pain.

Conclusions

In conclusion, the three items of the P-SIM assessed here demonstrated robust psychometric properties, including test–retest reliability, convergent and known-groups validity and sensitivity to change in the concepts that they were intended to measure. The anchor-based analysis in this study suggested that a four-point threshold for the three P-SIM items can be used to define marked within-patient meaningful change in patients with psoriasis treated over a period of 48 weeks.