Key Summary Points

Multidimensional composite outcome measures are commonly used to assess the disease activity of psoriatic arthritis (PsA) in clinical trials; however, these measures are considered less feasible for real-world practice, which has led to the development of abbreviated composite outcome measures.

Some retrospective data sources, such as electronic health records, may rely on outcome measures not fully validated in PsA; however, global scales and patient-reported pain, which are validated in PsA, are often available.

A 3-item Global Assessment and Pain (GAP) composite endpoint, incorporating Physician Global Assessment, Patient Global Assessment, and patient-reported pain, was found to have good discrimination and a high degree of responsiveness in PsA, comparable to the Psoriatic Arthritis Disease Activity Score (PASDAS) and higher than the Disease Activity Index for PsA (DAPSA).

The GAP composite could be a useful alternative to assess PsA disease activity and address important clinical questions regarding outcome and impact when doing retrospective analyses from existing datasets, such as electronic health records.

Introduction

The heterogeneous clinical presentation of psoriatic arthritis (PsA) has prompted the development of several multidimensional continuous composite measures of disease activity such as the Psoriatic Arthritis Disease Activity Score (PASDAS) [1], GRAppa Composite scorE (GRACE) [1], Composite Psoriatic Disease Activity Index (CPDAI) [2], and the Disease Activity Index for Psoriatic Arthritis (DAPSA) [3, 4]. Another binary multidimensional composite, Minimal Disease Activity (MDA), has been recommended, along with DAPSA, to identify states of low disease activity (LDA) and to serve as therapy targets for PsA [5, 6]. The PASDAS and GRACE have been shown to be the most sensitive for detecting treatment effect [7,8,9]. However, these composites are mostly reserved for clinical trials because they are considered less feasible for routine clinical care [10].
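As a concrete illustration of how a binary composite such as MDA is derived, the sketch below applies the commonly published MDA definition: a patient is in MDA when at least 5 of 7 criteria are met (tender joints ≤ 1, swollen joints ≤ 1, PASI ≤ 1 or body surface area ≤ 3%, patient pain VAS ≤ 15, patient global activity VAS ≤ 20, HAQ-DI ≤ 0.5, tender entheseal points ≤ 1). The class and field names are hypothetical, and the thresholds should be checked against the original definition [5] before any real use.

```python
from dataclasses import dataclass


@dataclass
class PatientVisit:
    """Hypothetical container for the seven MDA inputs at one visit."""
    tjc: int               # tender joint count
    sjc: int               # swollen joint count
    pasi: float            # Psoriasis Area and Severity Index
    bsa: float             # body surface area affected by psoriasis, %
    pain_vas: float        # patient pain, 0-100 mm VAS
    patga_vas: float       # patient global disease activity, 0-100 mm VAS
    haq_di: float          # HAQ-DI, 0-3
    tender_entheses: int   # number of tender entheseal points


def meets_mda(v: PatientVisit) -> bool:
    """Return True when at least 5 of the 7 MDA criteria are satisfied."""
    criteria = [
        v.tjc <= 1,
        v.sjc <= 1,
        v.pasi <= 1 or v.bsa <= 3,   # skin criterion: either PASI or BSA
        v.pain_vas <= 15,
        v.patga_vas <= 20,
        v.haq_di <= 0.5,
        v.tender_entheses <= 1,
    ]
    return sum(criteria) >= 5        # True counts as 1 when summed


visit = PatientVisit(tjc=0, sjc=1, pasi=2.0, bsa=5.0, pain_vas=10,
                     patga_vas=15, haq_di=0.25, tender_entheses=0)
print(meets_mda(visit))  # True (6 of 7 criteria met)
```

Because MDA collapses seven domains into a yes/no state, it is well suited to defining a treatment target but, unlike the continuous composites, it cannot express how far a patient is from that target.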

Shortened composites that could be more practical in a real-world setting have since been introduced [11]. The 3-item Visual Analog Scale (3-VAS) comprises a Physician Global Assessment (PhGA), Patient Global Assessment (PatGA), and Patient Skin VAS, while the 4-item VAS (4-VAS) comprises a PhGA, Patient Pain, Patient Joint Activity, and Patient Skin VAS [11]. Data to date have shown that these composites provide an accurate assessment of PsA disease activity, and it has been postulated that their use may improve patient care through wider use of more feasible instruments in routine clinical practice [12,13,14,15,16]. A GRAPPA-OMERACT initiative is currently updating its recommendations on PsA composites for clinical trial use, and the 3-VAS and 4-VAS are being considered as candidates [17].

However, both the 3-VAS and 4-VAS include a specific skin assessment; skin involvement is an important aspect of PsA but may be less likely to be assessed as an individual component in routine rheumatology clinical practice. Global assessments, such as PhGA and PatGA, along with patient-reported pain scores, are more commonly found in rheumatology-focused electronic medical record data in the USA, and each is considered validated in PsA [18,19,20]. A 3-item Global Assessment and Pain (GAP) composite endpoint, incorporating PhGA, PatGA, and patient-reported pain, would capture a holistic assessment of musculoskeletal, extra-articular, and skin symptoms from both the physician and patient perspectives, together with an evaluation of pain. Although pain has been identified as one of the most important symptoms for patients with PsA, an independent pain assessment has been incorporated only in the DAPSA, Routine Assessment of Patient Index Data 3 (RAPID3), 4-VAS, and MDA [12, 21, 22]. The objective of this report is to describe the responsiveness and discrimination of the GAP composite, as well as its construct validity through correlation with other PsA composite endpoints and patient-reported quality-of-life endpoints, using data from the phase 3 clinical trial program of ixekizumab (Ixe) in PsA.
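The exact GAP scoring is given in Supplementary Table S1 and is not reproduced in the text. Purely to make the idea of a three-component composite concrete, the sketch below assumes that PhGA, PatGA, and pain are each recorded on a 0–100 VAS and that GAP is their unweighted mean (range 0–100). Both the equal weighting and the scale are our assumptions, and `gap_score` is a hypothetical name, not the authors' implementation.

```python
def gap_score(phga: float, patga: float, pain: float) -> float:
    """Hypothetical GAP score: unweighted mean of three 0-100 VAS components.

    ASSUMPTION: the published scoring is in Supplementary Table S1; the
    equal weighting and 0-100 range used here are illustrative only.
    """
    components = (phga, patga, pain)
    for value in components:
        if not 0 <= value <= 100:
            raise ValueError(f"VAS component out of range 0-100: {value}")
    return sum(components) / len(components)


print(gap_score(40, 60, 50))  # 50.0
```

Averaging (rather than summing) keeps the composite on the same scale as its components, which makes values easier to read alongside the individual VAS scores.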

Methods

Study Design and Patients

The analyses reported here are post hoc analyses of data from two randomized, double-blind, phase 3 clinical trials of Ixe in patients with active PsA. Details of these trials have been reported previously; in brief, SPIRIT-P1 (NCT01695239) [23, 24] enrolled biologic-naïve patients, and SPIRIT-P2 (NCT02349295) [25, 26] enrolled patients with prior inadequate response or intolerance to one or two tumor necrosis factor inhibitors (TNFi). SPIRIT-P1 studied 80 mg Ixe every 2 weeks (Ixe Q2W) or every 4 weeks (Ixe Q4W) after a 160 mg starting dose, placebo, or adalimumab (Ada). Patients who remained on placebo or Ada at week 24 were re-randomized to receive Ixe Q2W or Ixe Q4W after a starting dose of 160 mg. SPIRIT-P2 studied Ixe Q2W or Ixe Q4W after a 160 mg starting dose, or placebo.

Assessments

The composite endpoints used in this analysis were the Global Assessment and Pain (GAP), DAPSA [3], cDAPSA [4], DAPSA28 [27], PASDAS [1], and MDA [5]. GAP, all DAPSA endpoints, and PASDAS were calculated post hoc. The components and scoring of each composite included in the analyses are described in Supplementary Table S1. The DAPSA variations were included because little information is currently available on their performance characteristics. PASDAS was chosen as the representative multidimensional continuous composite because prior data showed it to be one of the most sensitive composites for detecting treatment effect, and the performance characteristics of other multidimensional composites in Ixe clinical trial data have been reported previously [9].
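For readers less familiar with the DAPSA family, the sketch below computes DAPSA and cDAPSA from their standard published components: 68 tender joint count, 66 swollen joint count, PatGA and patient pain on 0–10 scales, plus C-reactive protein (CRP) in mg/dL for DAPSA only. The function names are hypothetical, DAPSA28 is not shown, and the authoritative component definitions for this analysis are those in Supplementary Table S1.

```python
def cdapsa(tjc68: int, sjc66: int, patga_0_10: float, pain_0_10: float) -> float:
    """cDAPSA: tender + swollen joint counts + PatGA + pain (0-10 scales)."""
    return tjc68 + sjc66 + patga_0_10 + pain_0_10


def dapsa(tjc68: int, sjc66: int, patga_0_10: float, pain_0_10: float,
          crp_mg_dl: float) -> float:
    """DAPSA: the cDAPSA components plus CRP expressed in mg/dL.

    If CRP is recorded in mg/L, divide it by 10 before calling.
    """
    return cdapsa(tjc68, sjc66, patga_0_10, pain_0_10) + crp_mg_dl


print(cdapsa(10, 5, 6.0, 7.0))  # 28.0
```

Because the joint counts enter unweighted, DAPSA-type scores are dominated by articular disease, which is consistent with the more unidimensional character of DAPSA noted in the Discussion.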

Other outcomes used to assess correlation with GAP included the Psoriasis Area and Severity Index (PASI), range 0–72 [28]; the Nail Psoriasis Severity Index (NAPSI), range 0–160 [29]; and the Leeds Enthesitis Index (LEI), range 0–6 [30], with higher scores reflecting worse skin, nail, and enthesitis disease activity, respectively. Correlation of GAP with patient-reported quality-of-life measures was assessed using the Health Assessment Questionnaire–Disability Index (HAQ-DI), range 0–3, with higher scores indicating worse disability [31], and the Short Form-36 Physical Component Summary (SF-36 PCS), range 0–100, with higher scores reflecting better health status [32].

Statistical Analyses

The mean score for each composite measure was calculated at various time points through week 52 (Supplementary Table S2). PASDAS, SF-36 PCS, and NAPSI were not available at week 8. Treatment comparisons (Ixe Q4W and Ixe Q2W vs placebo) were made for each continuous composite at each time point during the 24-week placebo-controlled period using a mixed model for repeated measures, separately for SPIRIT-P1 and SPIRIT-P2. Data from patients classified as inadequate responders at week 16 were censored after week 16 through week 24.
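The censoring rule for inadequate responders can be expressed as a small data-handling step applied before modeling. The sketch below is our reading of that rule, using a hypothetical per-patient dictionary of week to score; it does not reproduce the mixed model for repeated measures itself.

```python
from typing import Dict, Optional


def censor_after_week16(
    scores_by_week: Dict[int, Optional[float]],
    inadequate_responder: bool,
    cutoff_week: int = 16,
    end_week: int = 24,
) -> Dict[int, Optional[float]]:
    """Set observations after `cutoff_week` through `end_week` to None
    for a patient classified as an inadequate responder at `cutoff_week`."""
    if not inadequate_responder:
        return dict(scores_by_week)  # untouched copy for responders
    return {
        week: (None if cutoff_week < week <= end_week else score)
        for week, score in scores_by_week.items()
    }


patient = {0: 42.0, 4: 30.0, 16: 35.0, 20: 20.0, 24: 18.0}
print(censor_after_week16(patient, inadequate_responder=True))
# {0: 42.0, 4: 30.0, 16: 35.0, 20: None, 24: None}
```

Treating censored visits as missing (rather than deleting the patient) lets a repeated-measures model still use the pre-week-16 observations.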

Correlations among the composites, and between each composite and individual physician- or patient-assessed measures, were estimated using Pearson or Spearman rank correlation, as appropriate, in the combined Ixe treatment groups with both studies integrated. Correlations were interpreted by Evan’s criteria as small (0.2), moderate (0.5), or large (0.8) effects [33]. The association of GAP with MDA was assessed using a t test or Mann–Whitney test, as appropriate.
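A minimal, dependency-free sketch of the two correlation coefficients follows: Pearson's r, and Spearman's rho computed as Pearson's r on ranks (tied values receive their average rank). The `strength_label` helper treats the benchmark values quoted above (0.2, 0.5, 0.8) as lower bounds of the small, moderate, and large categories; that banding, and all function names, are our assumptions. The example values are illustrative, not trial data.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def strength_label(r: float) -> str:
    """ASSUMPTION: 0.2 / 0.5 / 0.8 used as lower bounds of each category."""
    a = abs(r)
    if a >= 0.8:
        return "large"
    if a >= 0.5:
        return "moderate"
    if a >= 0.2:
        return "small"
    return "negligible"


gap = [10, 25, 30, 45, 60]            # illustrative values only
pasdas = [2.0, 3.1, 3.0, 5.2, 6.4]
rho = spearman_rho(gap, pasdas)
print(rho, strength_label(rho))  # 0.9 large
```

Spearman's rho is the usual choice when scores are skewed or bounded, as many VAS-based measures are, because it depends only on the ordering of values.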

Change in GAP at weeks 24 and 52 in patients who reached LDA, as defined by DAPSA ≤ 14 [22], cDAPSA ≤ 13 [22], or PASDAS ≤ 3.2 [1], was compared with that in patients who did not reach LDA, using a t test or Mann–Whitney test, as appropriate.
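The sketch below shows the two ingredients of this comparison: flagging LDA with the cut-offs stated above, and computing the test statistics used to compare change in GAP between LDA and non-LDA groups. The text does not say which t test variant was used; Welch's unequal-variance t statistic is our assumption. The Mann–Whitney U is computed by direct pair counting (ties count 0.5). p-values are not computed here, and the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence

# LDA cut-offs as stated in the Methods text
LDA_THRESHOLDS = {"DAPSA": 14.0, "cDAPSA": 13.0, "PASDAS": 3.2}


def in_lda(measure: str, score: float) -> bool:
    """True when `score` is at or below the LDA cut-off for `measure`."""
    return score <= LDA_THRESHOLDS[measure]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U for sample `a`: count of pairs with a > b, ties = 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
```

In use, each patient's week 24 DAPSA, cDAPSA, or PASDAS would be passed to `in_lda` to split the cohort, and the two lists of GAP changes from baseline would then be passed to `welch_t` or `mann_whitney_u`. Pair counting in `mann_whitney_u` is O(n × m), which is adequate for trial-sized groups.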

The effect size (ES) and standardized response mean (SRM) were estimated for all continuous composite measures from the complete integrated dataset using the following definitions: ES = (mean at baseline − mean at week X)/SD (baseline); SRM = (mean at baseline − mean at week X)/SD (change from baseline at week X) [34]. Comparisons among the composite measures were made by within-group paired t tests. Effect size or SRM values > 0.8 were considered large.
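The two responsiveness formulas above differ only in their denominators: ES scales the mean change by the baseline SD, whereas SRM scales it by the SD of the individual changes. The sketch below implements both definitions as written, assuming paired baseline and week X values from the same patients (lower scores indicating improvement, so positive values mean improvement). Function names are hypothetical, and the example values are illustrative, not trial data.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], week_x: Sequence[float]) -> float:
    """ES = (mean at baseline - mean at week X) / SD at baseline."""
    return (mean(baseline) - mean(week_x)) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float],
                               week_x: Sequence[float]) -> float:
    """SRM = (mean at baseline - mean at week X) / SD of change from baseline.

    Assumes `baseline` and `week_x` are paired values from the same patients.
    """
    if len(baseline) != len(week_x):
        raise ValueError("baseline and week_x must be paired")
    changes = [w - b for b, w in zip(baseline, week_x)]
    return (mean(baseline) - mean(week_x)) / stdev(changes)


def is_large(value: float) -> bool:
    """ES or SRM values above 0.8 were considered large."""
    return value > 0.8


baseline = [40.0, 50.0, 60.0]   # illustrative values only
week_24 = [20.0, 35.0, 30.0]
es = effect_size(baseline, week_24)
srm = standardized_response_mean(baseline, week_24)
print(round(es, 2), round(srm, 2), is_large(es), is_large(srm))  # 2.17 2.84 True True
```

Because SRM uses the variability of the change itself, it rewards measures whose improvement is consistent across patients, which is why ES and SRM can rank composites differently.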

Statistical analyses were performed using SAS® software version 9.4 or higher (SAS Institute).

Ethical Approval

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

Composite Outcomes During Placebo-Controlled Period

The total number of patients included in the analyses was 316 from SPIRIT-P1 (Ixe Q4W = 107; Ixe Q2W = 103; placebo = 106) and 363 from SPIRIT-P2 (Ixe Q4W = 122; Ixe Q2W = 123; placebo = 118). The studies’ baseline demographics and clinical characteristics have been reported previously [23, 25].

In patients receiving Ixe Q4W (the labelled dose), the GAP composite showed statistically significant improvement from baseline vs placebo at all time points up to week 24, with separation from placebo as early as week 1 (Table 1). The same pattern was seen with cDAPSA and DAPSA28 (Table 1), as well as with PASDAS and DAPSA, which have been reported previously [9].

Table 1 Mean change from baseline to week 24 for continuous PsA composite total scores

Composite Outcomes Over Time

In patients randomized to Ixe who remained on it through week 52, there was continued improvement in disease activity as assessed by GAP over the 52 weeks. Similar improvement was seen with the other continuous composite measures (Fig. 1).

Fig. 1

Mean observed value of psoriatic arthritis composites over time in ixekizumab-treated patients. The figure shows combined Ixe Q4W and Q2W treatment arms from SPIRIT-P1 and SPIRIT-P2. cDAPSA Clinical Disease Activity Index for Psoriatic Arthritis, DAPSA Disease Activity Index for Psoriatic Arthritis, DAPSA28 Disease Activity Index for Psoriatic Arthritis based on 28 joint count, GAP Global Assessment and Pain Composite, PASDAS Psoriatic Arthritis Disease Activity Score, Wk week

Correlation of GAP Composite and Other PsA Composite and Clinical Outcomes

A strong correlation was observed between GAP and PASDAS (r = 0.81–0.92), and moderate-to-strong correlations were seen between GAP and DAPSA (r = 0.49–0.81), cDAPSA (r = 0.49–0.80), and DAPSA28 (r = 0.46–0.83) (Table 2). Patients achieving MDA at weeks 8, 16, 24, and 52 had lower mean (SD) GAP scores (11.8 [7.4], 10.4 [7.3], 7.9 [4.8], and 7.4 [6.0], respectively) than patients not achieving MDA at the same weeks (36.5 [18.4], 36.3 [19.5], 31.0 [16.1], and 30.0 [14.7]); p < 0.0001 for all.

Table 2 Correlation of GAP with other composite endpoints: patients randomized to Ixe and remained on it through week 52

GAP showed moderate correlations with HAQ-DI and SF-36 PCS, low correlations with physician assessments of psoriasis (PASI, NAPSI), and low-to-moderate correlation with physician assessment of enthesitis (Table 3). All correlations with SF-36 PCS are negative because higher SF-36 PCS values indicate improvement, whereas lower values of all the comparison endpoints indicate improvement. Across the continuous composite endpoints, correlations with HAQ-DI and SF-36 PCS were highest for PASDAS, followed by GAP, then the three DAPSA variations. Correlations with PASI and NAPSI were similar for PASDAS and GAP, and lower for the DAPSA variations. Correlation with LEI was highest for PASDAS, followed by the DAPSA variations, and lowest for GAP (Table 3).

Table 3 Correlation of PsA composites with individual physician- and patient-reported endpoint: patients randomized to Ixe who remained on it through week 52

GAP Outcomes in Patients with Low Disease Activity

In patients who achieved LDA states at week 24, percentage improvements of 76–79% in GAP were seen at week 24 (Fig. 2). A significantly greater improvement in GAP was seen in the groups achieving LDA states compared to those who did not (p ≤ 0.001) (Fig. 2). Similar results were seen at week 52 (Supplementary Fig. S1).

Fig. 2

Mean change from baseline to week 24 in GAP composite by LDA for cDAPSA, DAPSA, and PASDAS. cDAPSA Clinical Disease Activity Index for Psoriatic Arthritis, DAPSA Disease Activity Index for Psoriatic Arthritis, GAP Global Assessment and Pain Composite, LDA low disease activity, n sample size, sd standard deviation

Effect Size and Standardized Response Means

All composite measures had large ES and SRM at both weeks 24 and 52. The highest values were associated with GAP and PASDAS (Table 4).

Table 4 Summary of effect size and standardized response mean (observed data)

Discussion

Following recent reports of the potential benefits of abbreviated PsA composite measures for use in routine clinical practice, we introduced an alternative 3-item composite (GAP) and described its performance characteristics. The analyses demonstrate that the GAP composite has good discrimination and a high degree of responsiveness in PsA, comparable to PASDAS and higher than all the DAPSA composites assessed. The greater similarity of GAP's performance characteristics to those of PASDAS than to those of DAPSA likely reflects that both GAP and PASDAS capture the multidimensional nature of PsA, whereas DAPSA is more unidimensional and focused on articular disease. PASDAS and GAP also share the inclusion of both patient and physician global scores; during the development of PASDAS, the patient and physician global VAS scores were noted to account for the majority of the variance in the total score [1]. In our analysis, GAP also showed low-to-moderate correlation with physician-assessed enthesitis, moderate correlation with patient-reported physical function (HAQ-DI) and physical health status (SF-36 PCS), and low correlation with physician-assessed psoriasis.

Other abbreviated PsA composites emerging as potential candidates for routine clinical practice (e.g., 3-VAS, 4-VAS, RAPID3) have also shown similar or higher ES and SRM compared with other multidimensional PsA composites when assessed in observational studies. The ES and SRM for the 3-VAS and 4-VAS were found to be higher than those for cDAPSA but similar to those for PASDAS [11]. The 3-VAS and 4-VAS have also been found to be moderately correlated with HAQ-DI and SF-36 [11, 15]. In data from a clinical trial (TICOPA), RAPID3 was highly correlated with PASDAS and had a higher ES than DAPSA, whereas in an observational study (LOPAS II) its responsiveness was similar to that of DAPSA [22].

The magnitudes of the ES and SRM values from our analyses are similar to those reported for PsA composites in other clinical trials [7, 8]. However, observational studies have generally found lower ES/SRM values [11, 15, 16]. This could be related to differing clinical characteristics of the patient populations or to alternative methodologic approaches for calculating the metrics [35]. As such, performance characteristics should be compared across studies with caution.

There is an advantage in incorporating into a feasible composite measure a holistic assessment of PsA disease activity, balanced between the physician and patient perspectives, together with patient-reported pain, as pain has been identified as the priority treatment outcome for patients [21]. A PatGA has the further benefit of capturing patient-reported fatigue, which has also been identified as an important outcome in PsA and recommended as one of the core domains that should routinely be assessed [36]. Patient-reported fatigue has been identified as one of the main factors contributing to variability in the PatGA scale, and fatigue specifically has been shown to contribute to discordance between patient and physician global assessments [37].

The impetus for the development of abbreviated composites is to enhance the feasibility of assessment in routine clinical practice. The GAP composite comprises outcome measures that are more commonly available in some real-world data sources than measures of all individual PsA domains; thus, the GAP composite could be a useful alternative for assessing PsA disease activity in retrospective analyses of such data sources, e.g., electronic medical records in the USA.

A potential limitation of this analysis is that the data derive from Ixe clinical trials, which studied populations with highly active PsA; the performance characteristics of the measures assessed could differ in a population with less active disease. We also could not directly compare the GAP composite with other candidate abbreviated PsA composites (e.g., 3-VAS, 4-VAS, RAPID3), because the Skin VAS, a component of both VAS composites, was not assessed in the SPIRIT trials, and RAPID3 was not measured. Future research should establish disease activity thresholds for the GAP composite and derive minimal clinically important improvement values. Testing GAP in other clinical trials and in observational settings, and comparing it with the 3-VAS/4-VAS and RAPID3 in the same data source, would also add value. Lastly, for optimal use of the GAP composite in the real-world setting, reducing the clinician's assessment of all individual PsA domains to one overarching global score relies on the assumption that all articular and extra-articular components have been evaluated on the basis of a comprehensive history and physical examination.

Conclusion

The GAP composite offers an alternative abbreviated composite endpoint that includes components commonly found in electronic health records, has performance characteristics comparable to those of PASDAS, and is feasible to use in a real-world setting. The GAP composite could be used to address important clinical questions regarding outcomes and the impact of PsA in existing datasets.