Introduction

Immune-mediated inflammatory diseases (IMIDs) such as rheumatoid arthritis (RA) and Sjogren’s syndrome (SS) belong to a group of chronic and highly disabling inflammatory conditions [1]. Recent findings that these clinically dissimilar diseases share similar immune dysregulation and molecular drivers of inflammation have sparked an interest in the development of novel therapies that may be used across inflammatory diseases regardless of the specific diagnosis [2, 3]. Selection for such targeted treatment would be based on patients’ response to the novel drugs, which would be determined by these molecular drivers [3].

This innovative approach mirrors the new ‘tissue-agnostic’ drug development paradigm in oncology where targeted therapies are developed based on molecular markers rather than organ or tissue type [4, 5]. Tissue-agnostic drug development has already shown considerable promise in oncology with the FDA granting accelerated approvals for drugs such as Keytruda (pembrolizumab) and Vitrakvi (larotrectinib) for the treatment of solid tumours [6]. The EMA recently granted the conditional approval of Vitrakvi [7].

Tissue-agnostic therapies in IMIDs may be evaluated in innovative clinical trials such as biomarker-adaptive and basket trials. Biomarker-adaptive trials incorporate adaptive clinical trial methodology to modify the trials according to the accumulating outcome data [8]. In basket trials, patients are primarily grouped according to molecular drivers rather than their specific diagnosis [9, 10]. The expectation is that group sensitivities to the therapies can be assessed and compared, and the populations most likely to benefit from treatment identified [9, 11]. The use of basket trials has increased over the past 5 years and is set to increase rapidly over the next few decades as the design becomes more widely adopted [12].

To facilitate cross-disease comparisons, it is essential that trial data from the patient groups are comparable. However, at present, a wide variation exists in the outcomes, endpoints and measures selected for use in drug trials. It should be noted that there is a distinction between the terms ‘outcome’ and ‘endpoint’ [13]. According to the NIH Collaboratory ‘….outcome usually refers to the measured variable (e.g. peak volume of oxygen (VO2) or PROMIS Fatigue score), whereas an endpoint refers to the analysed parameter (e.g. change-from-baseline at 6 weeks in mean PROMIS Fatigue score)….’ [13] The variations in outcomes and endpoints measured in trials make it difficult to compare and/or synthesise outcome data within and across IMIDs [14]. As a result, there may be variations in the trial data submitted to support applications for drug approvals and health technology assessment making head-to-head comparisons of drug efficacy and cost-effectiveness analyses challenging.
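The outcome/endpoint distinction quoted above can be made concrete with a short sketch. The scores below are hypothetical illustrative values, not data from any trial, and the function name is an assumption introduced only for this example. The outcome is the measured per-patient score; the endpoint is the analysed parameter, here the mean change from baseline at 6 weeks.

```python
from statistics import mean

# Hypothetical PROMIS Fatigue scores (the measured OUTCOME), paired per patient.
baseline_scores = [60.0, 55.0, 62.0, 58.0]
week6_scores = [54.0, 53.0, 57.0, 56.0]


def mean_change_from_baseline(baseline, follow_up):
    """Return the ENDPOINT: the mean of per-patient (follow-up minus baseline)."""
    if len(baseline) != len(follow_up):
        raise ValueError("each patient needs both a baseline and a follow-up score")
    return mean(f - b for b, f in zip(baseline, follow_up))


endpoint = mean_change_from_baseline(baseline_scores, week6_scores)
print(endpoint)
```

Two trials could measure the same outcome yet analyse different endpoints (for example, a different time point or a responder proportion), which is one reason cross-trial comparison is difficult.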

Core outcome sets (COS), which propose a minimum set of outcomes to measure and report for all trials in specific condition(s), have been developed to assist with the standardisation of outcomes measured in clinical trials [14, 15]. However, there may be variations in the COS proposed for different IMIDs and by different organisations due to differing foci and interests. There is therefore a need to identify appropriate outcomes and endpoints to measure across IMIDs in innovative tissue-agnostic trials.

The Birmingham National Institute for Health Research (NIHR) Biomedical Research Centre for Inflammation was founded to improve healthcare for patients with chronic immune-mediated inflammatory diseases, by developing and accelerating access to new diagnostic tests and new therapies. A programme of observational and experimental clinical trials will be undertaken to achieve this focusing on several IMIDs. The target IMIDs include the following: (i) rheumatoid arthritis (RA); (ii) juvenile idiopathic arthritis (JIA); (iii) ankylosing spondylitis (AS); (iv) psoriatic arthritis (PsA); (v) Sjogren’s syndrome (SS); (vi) Crohn’s disease (CD); (vii) ulcerative colitis (UC); (viii) uveitis (Uv); (ix) systemic lupus erythematosus (SLE) including juvenile SLE (jSLE); (x) autoimmune hepatitis (AIH); and (xi) primary sclerosing cholangitis (PSC). This review therefore focuses on these 11 IMIDs.

The specific objectives were:

  (i) Identify and map core outcome sets (COS) across 11 IMIDs in order to facilitate the selection of relevant outcomes across the conditions for innovative trials of tissue-agnostic drug therapies.

  (ii) Compare outcomes or endpoints recommended by the US Food and Drug Administration (FDA) and European Medicines Agency (EMA) to identify and highlight similarities and differences.

Methods

This study was conducted in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [16] (see PRISMA checklist). Ethical approval was not required for this study as it did not use patient data.

Two reviewers (OLA, LFR) systematically searched four online resources from inception to 28th December 2019, namely the (i) Core Outcome Measures in Effectiveness Trials (COMET), (ii) International Consortium for Health Outcomes Measurement (ICHOM), (iii) European Medicines Agency (EMA) and (iv) US Food and Drug Administration (FDA) databases.

Search strategy

The search on the COMET database [17] was restricted by selecting 23 relevant disease terms from the ‘disease name’ menu (Additional file 1). The other databases did not have this function; therefore, the 11 disease terms listed above were entered directly into their search boxes. Guidance documents were obtained by specifically searching the ‘Guidance, Compliance, & Regulatory Information’ section of the FDA [18] and the ‘Scientific Guidelines, the Clinical Efficacy and Safety Guidelines’ section of the EMA website [19].

Selection of publications

Studies archived on the COMET database were eligible if they reported preliminary or definitive COS and outcome measures established through ranked consensus-based methodologies [14]. Purely methodological studies, COS study protocols, and reviews of outcomes, outcome measures or symptoms that did not report a consensus-based approach were excluded. Articles reporting COS developed for trials of non-pharmaceutical interventions were also excluded. Published COS from the ICHOM and regulatory guidance provided by the EMA and FDA databases were eligible if they focussed on the conditions of interest.

Initial screening of all titles and abstracts was independently conducted by the reviewers (LFR, OLA). The full texts of publications potentially meeting the eligibility criteria were obtained and independently reviewed by the same reviewers. The reasons for exclusion at this stage were documented. At each stage, disagreements regarding eligibility were resolved through discussion and, if necessary, consultation with a third reviewer (MC). We conducted a hand search of the reference lists and a citation search of the included publications.

Quality assessment

The methodological quality of the included peer-reviewed articles was independently assessed by the reviewers based on the items of the COS–Standards for Development recommendations (COS–STAD) checklist [20]. Differences in assessments were discussed and resolved.

Data extraction strategy

An electronic form was designed, piloted and used for data extraction by the reviewers (LFR, OLA). Data from all the COS publications (peer-reviewed COS articles and the regulatory guidance documents) were extracted verbatim by the two reviewers and cross-checked by a member of the research team (AR) for accuracy. Where available, the reviewers extracted information on:

  (i) Recommended core outcomes, endpoints and measures.

  (ii) Target patient populations (age, gender, inflammatory condition(s)), study design (e.g. interviews, focus groups, Delphi), contributing stakeholders (e.g. patients, healthcare professionals, carers), geographical location of stakeholders and setting for COS use.

  (iii) Methods used to derive, prioritise and/or select the final list of COS, endpoints and/or measures.
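The extraction items listed above could be represented as a simple record structure. The sketch below is illustrative only: the class, field and function names are hypothetical, the example values are placeholders, and the actual electronic form used in this study is not described beyond the items listed.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One row of a hypothetical extraction form (names are illustrative)."""
    publication: str
    # (i) Recommended core outcomes, endpoints and measures
    core_outcomes: List[str] = field(default_factory=list)
    endpoints: List[str] = field(default_factory=list)
    measures: List[str] = field(default_factory=list)
    # (ii) Population, design, stakeholders, location and setting
    conditions: List[str] = field(default_factory=list)
    age_group: Optional[str] = None
    gender: Optional[str] = None
    study_design: Optional[str] = None
    stakeholders: List[str] = field(default_factory=list)
    stakeholder_location: Optional[str] = None
    setting: Optional[str] = None
    # (iii) Methods used to derive, prioritise and/or select the final list
    selection_methods: Optional[str] = None


def unreported_items(record):
    """Return the names of fields the publication left unreported."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) is None or getattr(record, f.name) == []]


example = ExtractionRecord(
    publication="pub_A (hypothetical)",
    core_outcomes=["pain", "fatigue"],
    conditions=["RA"],
    study_design="Delphi",
    stakeholders=["patients", "healthcare professionals"],
)
print(unreported_items(example))
```

Keeping unreported items explicit, rather than silently empty, reflects the "where available" qualifier: absence of information in a publication is itself recorded.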

Data synthesis and presentation

The extracted outcomes and endpoints from the COS articles and regulatory guidance documents were grouped by OLA and LFR into sub-domains and domains based on their classification by the source publications. Where there were discrepancies in the domain and sub-domain classifications by different publications, the reviewers discussed and chose the most appropriate for this study. The reviewers inductively grouped domains into broad categories after completing data extraction based on characteristics of the domains.

A matrix was created for each condition displaying the core outcomes, endpoints and any recommended outcome measures extracted from each publication. Publications were arranged according to the COS’s target study design (e.g. longitudinal, clinical trial). These matrices were combined to form a single matrix showing all core outcomes, endpoints and measures recommended across all inflammatory conditions.
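The matrix construction described above can be sketched as follows. The records are hypothetical placeholders (labels such as `pub_A` do not refer to any included publication), and the function names are assumptions made for illustration. One matrix is built per condition, and the per-condition matrices are then merged into a single outcome-by-condition matrix.

```python
from collections import defaultdict

# Hypothetical extraction rows: (condition, publication label, recommended outcome).
records = [
    ("RA", "pub_A", "pain"),
    ("RA", "pub_A", "physical function"),
    ("RA", "pub_B", "pain"),
    ("PsA", "pub_C", "pain"),
    ("PsA", "pub_C", "fatigue"),
]


def per_condition_matrices(rows):
    """Build, for each condition, a mapping outcome -> set of publications."""
    matrices = defaultdict(lambda: defaultdict(set))
    for condition, publication, outcome in rows:
        matrices[condition][outcome].add(publication)
    return matrices


def combine(matrices):
    """Merge the per-condition matrices into outcome -> {condition: publications}."""
    combined = defaultdict(dict)
    for condition, matrix in matrices.items():
        for outcome, publications in matrix.items():
            combined[outcome][condition] = sorted(publications)
    return combined


combined = combine(per_condition_matrices(records))
for outcome in sorted(combined):
    print(outcome, combined[outcome])
```

In the combined matrix, the number of conditions listed against an outcome indicates how widely that outcome is recommended across conditions.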

For the second study objective, the extracted outcomes, endpoints and measures recommended by the FDA and EMA in scientific guidance documents were separately compared to highlight similarities and differences. The findings were presented in a table. The matrix and the final tables were cross-checked by AR for accuracy.

Results

Characteristics of included publications

The selection process is depicted in a PRISMA flow diagram (see Fig. 1). Table 1 summarises the 44 included publications (peer-reviewed COS articles and regulatory guidance documents) and provides details on the characteristics of the included publications [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64]. See Supplementary Table 2 for further details.

Fig. 1 PRISMA flow diagram

Table 1 Characteristics of included publications

Number of publications included

COS were found for all conditions except AIH and PSC. A total of 92 publications from COMET were screened and of these 30 were included [22,23,24,25, 27, 28, 30, 31, 33, 35, 37,38,39,40,41,42, 45, 46, 48,49,50,51, 54,55,56, 58,59,60,61, 64]. EMA guidance for AS, CD, JIA, PsA, RA, SLE and UC [21, 26, 32, 34, 43, 52, 62], FDA guidance for RA, SLE and UC [44, 53, 63] and one document from the ICHOM database [47] were included. No regulatory guidance was found for Uv or SS. Following reference list and citation searching, three articles [29, 36, 57] were retrieved bringing the total number of publications included in the final analysis to 44 [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64]. RA had the highest number of relevant publications (10 in all) whilst the only publication included for uveitis actually relates to JIA-related uveitis [64].

Study populations and settings

In terms of study populations, thirty-seven publications were associated with COS for adults or mixed populations [21,22,23,24,25,26, 28,29,30, 34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57, 60,61,62,63] whilst seven related to COS specifically for paediatric patients [27, 31,32,33, 58, 59, 64].

All the COS were designed/recommended for use in clinical trials and longitudinal observational studies (with the exception of the COS specifically developed for AS registries by Zochling et al. [24]). Five of these COS were also recommended for use in routine clinical practice for AS [23], perianal CD [28], inflammatory bowel disease (IBD) [30], PsA [35] and RA [50]. The ICHOM COS was developed for use in clinical trials and routine practice for all ‘inflammatory arthritis’ [47].

Quality assessment

Consensus methods used

All the COS articles reported employing consensus methods—nine studies (28%) used Delphi/modified Delphi methods, 11 (34%) used nominal group technique and 12 (38%) employed unspecified consensus methods. The included articles generally provided adequate information pertaining to scope (COS-STAD items 1–4).
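The percentages above follow from the reported counts if the denominator is taken to be the sum of the three categories. That denominator is an assumption, as the text does not state it explicitly. A minimal check:

```python
# Counts as reported in the text; the denominator (their sum) is an assumption.
counts = {
    "Delphi/modified Delphi": 9,
    "nominal group technique": 11,
    "unspecified consensus methods": 12,
}

total = sum(counts.values())

# Round half up to the nearest whole percentage.
shares = {method: int(100 * n / total + 0.5) for method, n in counts.items()}

for method, share in shares.items():
    print(f"{method}: {counts[method]}/{total} = {share}%")
```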

Stakeholder involvement

Information relating to stakeholder involvement (COS-STAD items 5–7) and consensus process (COS-STAD items 8–11) was sometimes less detailed, making it difficult to assess the degree of stakeholder involvement and the robustness of the consensus process. Whilst clinical experts were involved in the COS development process for 34 studies, only 14 publications explicitly reported patient involvement, with one reporting that patients were not involved [64]. Details about the characteristics of panels or working groups were often limited, making it difficult to ascertain the inclusion of patients and their specific involvement. The period in which COS were developed seemed to influence the reporting of patient involvement. COS published within the last decade (such as Nikiphorou et al. [49] and Radner et al. [50] for RA; Orbai et al. [39] and Tillett et al. [41] for PsA) were more likely to report patient involvement explicitly than older ones (such as Felson et al. for RA [45] and Gladman et al. [37] for PsA). However, we were unable to rule out the possibility that the developers of some of the older COS involved patients to some degree without reporting this in their publication. See Additional file 2 for further details. It should be noted that the regulatory guidance documents did not report the use of any consensus process or stakeholder involvement to inform the recommendations provided.

Core outcomes proposed across the inflammatory conditions

Core outcomes proposed

Table 2 summarises the core outcomes extracted from the COS articles and the regulatory guidance documents across the nine included inflammatory conditions. Outcomes such as disease activity, joint/structural damage, pain, fatigue, quality of life, physical function, work limitation/productivity, steroid use and biomarkers (acute phase reactants) were recommended across the majority of the conditions. Psychosocial function, psychological and emotional wellbeing were the least frequently recommended ‘generic’ outcomes across the conditions. As expected, outcomes such as rectal bleeding, which is specific to UC, and sicca symptoms, which relate to SS, had very low frequencies.

Table 2 Core outcomes proposed across inflammatory conditions

Approach to outcome recommendations

One of the issues identified by this review was the difference in approach by the various COS developers and regulatory bodies. Regulatory bodies often suggested a list of ‘primary’ and ‘secondary’ endpoints from which trialists may make selections. On the other hand, COS developers propose a minimum set of outcomes (core items) to be measured in trials, sometimes complemented by optional or ‘outer core’ items [39].

Terminological inconsistencies

Another observation was the inconsistency in terminologies used by both regulatory bodies and COS developers. For example, the study by Heijde et al. [22] used the terms ‘measures’, ‘endpoints’ and ‘domains’ interchangeably to refer to outcomes such as pain and physical function [22]. Similarly, the 2015 EMA guidance for SLE [52] used the terms ‘outcomes’ and ‘endpoints’ interchangeably. The report stated ‘primary outcomes’ before going on to discuss ‘secondary endpoints’ [52].

Differences in recommendations

There were sometimes differences in the recommendations by regulatory bodies and COS developers. For example, the 2018 EMA guidance for Crohn’s disease [26] recommended fistula healing (demonstrated by MRI) as the primary endpoint for fistulising perianal Crohn’s disease whilst the COS developed by Shahan et al. [28], for the same population, included fistula response on MRI as optional [28].

There were disparities in recommendations for the use of biomarkers as outcomes or measures, with some studies cautioning against their use in specific patient subpopulations or disease stages. For instance, Ruemmele et al. [31] noted that C-reactive protein (CRP) is not elevated in all patients with active Crohn’s disease, limiting its usefulness, and although superior to CRP, faecal calprotectin has large variability in results and low responsiveness [31].

Outcome measures proposed across the target IMIDs

Outcome measures proposed across the target IMIDs can be found in Additional file 3.

Availability of outcome measures

It was observed that COS research groups tended to focus initially on achieving consensus and publishing their COS before commencing work on outcome measures to recommend in subsequent publications. For instance, Heijde et al. only reported the COS for AS in their initial article in 1997 [22]. However, 2 years later they published their work on outcome measures [23]. A similar scenario was observed with PsA where an earlier paper authored by Gladman et al. for the OMERACT PsA Working Group only reported COS [35] whilst a subsequent article presented outcome measures [36]. The latest publication from the group reported an update of the PsA COS and intimated that a thorough investigation of available measures would be commenced [38].

Information about validity of outcome measures

Although the COS studies suggested outcome measures for the majority of the proposed COS, information about the validity of these measures was patchy. Whilst the regulatory guidance and a few studies such as Gladman [36] explicitly discussed the available evidence of the validity of the measures proposed, the majority of the studies did not. Therefore, the basis of their recommendations was unclear, and this might explain the heterogeneity that was found in the recommendations.

Comparison of FDA and EMA recommendations

We were only able to compare FDA and EMA recommendations extracted for RA, SLE, jSLE and UC as these were the only conditions that had published FDA guidance documents. The findings are presented in Table 3.

Table 3 Comparison of FDA and EMA guidance

Comparison of guidance for RA

Comparing the FDA 2013 guidance for RA [65] with the corresponding EMA 2017 document [66], there were three key differences. Whilst the FDA regards clinical response measured by the ACR20 as a key domain for RA, and clinical remission as a secondary domain, the EMA considers clinical remission as a primary endpoint and does not recommend improvement in measures such as ACR20 as primary endpoints as their ‘clinical relevance may not be immediately clear’ [65, 66]. In addition, the FDA guidance considered improvement in physical function as a key domain to assess whilst the EMA considered it as a secondary endpoint [65, 66]. However, both recommended the HAQ-DI for the assessment of physical function [65, 66].

Comparison of guidance for SLE

The FDA 2010 guidance for SLE and the corresponding EMA 2015 guidance considered the assessment of disease activity index (DAI), and reduction in flares as primary endpoints [67, 68]. Similar measures including the BILAG, SLEDAI, SLAM and ECLAM were recommended by both regulatory bodies to assess these two endpoints [67, 68]. However, whilst the FDA guidance regarded a reduction in concomitant steroids as a primary endpoint and the assessment of damage as a secondary endpoint, the order was reversed in the EMA guidance. Both documents recommended that the SLICC/ACR Damage Index is used to assess damage over a minimum period of 12 months [67, 68]. The FDA opinion was that there were no optimal measures for fatigue and so did not recommend any PRO measures [68]. On the other hand, the EMA recommended combining the SF-36 with any of the SLE-specific measures and also the FACIT-F or the BFI for the assessment of fatigue [67].

Interestingly, the FDA and EMA recommendations for jSLE matched well as both referenced the Paediatric Rheumatology International Trials Organization (PRINTO) COS [59, 67, 68]. This was the only instance where either regulatory body directly referenced a COS.

Comparison of guidance for UC

The key difference between the FDA and EMA guidance for UC was their position on the use of endoscopic remission as a primary endpoint/outcome [69, 70]. The FDA stated that ‘there are currently limitations of histological scoring systems and of community standards for definitions of histological improvement; thus, there are currently no criteria for histological assessment of mucosal healing’ and recommends endoscopic remission as a secondary endpoint [70]. On the other hand, the EMA considered the proportion of patients with endoscopic remission as a primary endpoint [69]. Both regulatory bodies felt there were issues with using the total Mayo score due to the inclusion of the physician’s global assessment [69, 70]. The FDA suggested using a modified Mayo or modified UCDAI score (omitting the physician’s global assessment) whilst the EMA stated that the total Mayo score ‘is not of primary interest’ [69, 70]. Again, the FDA did not consider any PROM as suitable for evaluating the signs and symptoms of UC whilst the EMA recommended validated PROMs such as the IBDQ as a secondary endpoint [69, 70].

Discussion

This systematic review has identified and mapped, for the first time, existing COS currently recommended for efficacy trials across multiple immune-mediated inflammatory diseases and compared outcomes and/or endpoints recommended by FDA and EMA for similarities and differences.

COS were found for all the conditions except AIH and PSC. The COS found for uveitis was specifically for JIA-related uveitis [64]. Outcomes such as disease activity, joint/structural damage, pain, fatigue, quality of life, physical function, work limitation/productivity, steroid use and biomarkers (acute phase reactants) were recommended across the majority of the conditions and should be considered when designing basket trials for tissue-agnostic drug development involving patients with inflammatory diseases. For basket trials, trialists should consider using these common outcomes identified across the conditions in this review as a minimum set and supplement them with other outcomes as required for each condition. This would facilitate the comparison of outcomes across IMIDs in basket trials. The review also provides a useful repository of COS for inflammatory diseases and regulatory guidance.
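The "common minimum set plus condition-specific supplements" approach can be sketched as below. The common list is taken from the outcomes named in this review; the supplements shown are only the two condition-specific examples mentioned in the Results (rectal bleeding for UC, sicca symptoms for SS) and are not a complete or recommended set. The function and variable names are hypothetical.

```python
# Common outcomes named in this review as recommended across most conditions.
COMMON_OUTCOMES = [
    "disease activity",
    "joint/structural damage",
    "pain",
    "fatigue",
    "quality of life",
    "physical function",
    "work limitation/productivity",
    "steroid use",
    "biomarkers (acute phase reactants)",
]

# Illustrative supplements only; a real trial would draw these from each
# condition's own COS and the relevant regulatory guidance.
CONDITION_SPECIFIC = {
    "UC": ["rectal bleeding"],
    "SS": ["sicca symptoms"],
}


def outcomes_by_basket(conditions):
    """Return, per condition, the common set followed by any supplements."""
    return {c: COMMON_OUTCOMES + CONDITION_SPECIFIC.get(c, []) for c in conditions}


plan = outcomes_by_basket(["RA", "UC", "SS"])
for condition, outcomes in plan.items():
    print(condition, len(outcomes), outcomes[-1])
```

Because every basket shares the same leading set of outcomes, results for those outcomes remain directly comparable across conditions, while the supplements capture disease-specific effects.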

There were significant similarities and differences in FDA and EMA recommendations. The only instance where either regulatory body directly referenced a COS was for jSLE—both referenced the PRINTO COS.

The relatively voluminous literature for some of the conditions, notably for RA and PsA, attests to considerable progress in the recommendation of outcome measures for these conditions. On the other hand, our review highlights the research effort required to produce COS for other conditions, particularly uveitis and Sjogren’s syndrome, for which we found very limited published information.

The differences in approach and inconsistent terminologies used by the regulators and COS developers might explain the disparities we sometimes found in the recommendations. Efforts should be made to harmonise the terminologies used by all the organisations. The fact that there was only one instance of the FDA and the EMA directly referencing a COS also indicates the need for increased collaboration across regulators and COS developers and inclusion of regulators in COS development.

Fewer than half of the COS publications explicitly reported patient involvement and, when presented, details of this involvement were often vague, with the exception of Tillett et al. [41]. The implication of this is that some outcomes included in the COS might not be meaningful to, or highly prioritised by, patients. The selection of stakeholder-relevant outcomes and the need for patient involvement in regulatory decision-making are increasingly recognised as important [71,72,73].

The main limitation of this study is its reliance on the information explicitly provided in the included publications. For instance, although we noticed a tendency for more recent publications to detail patient involvement in the development of COS, we were unable to rule out the possibility that the developers of some of the older COS involved patients to some degree without reporting this in their publication.

Another limitation of the study is the lack of publications for some conditions such as PSC and the overrepresentation of RA. However, we have ensured that our tables present the results in a manner that clearly reflects this issue. As FDA guidance documents were not available for all the conditions, we were unable to directly compare the recommendations provided by the FDA and EMA for a number of the conditions.

The scope of this review was determined by our programme-specific requirements. Therefore, our findings and conclusions may not be applicable to research that involves a different selection of IMIDs. As the purpose of the review is to facilitate the selection of outcomes across several IMIDs for basket trials, there may be differences between the outcomes recommended in this review and previously published disease-specific COS.

Despite these limitations, by allowing comparison of COS across conditions, this review could facilitate the selection of commonly relevant outcomes that may be measured in tissue-agnostic trials. Measuring the same outcomes across the conditions would demonstrate more accurately the similarities or variations in the response to drug interventions between patient groups. This information could also guide the subsequent recommendations for drug approval. However, further work needs to be done to address the gaps identified, especially relating to outcome measures to use in trials. The review highlights the need for greater collaboration between regulatory bodies and COS developers so that stronger and more uniform recommendations can be made, which may facilitate the adoption of COS. There is also a need for collaboration on the development of COS for routine care, which is particularly important for real-world evidence (RWE) generation [74].

Conclusions

Tissue-agnostic drug development, which utilises current advances in precision medicine such as basket trials, has the potential to usher in a new era of drug development in IMIDs. The measurement of a core set of outcomes across the conditions in such trials could facilitate the collection of more robust efficacy data by enabling direct comparisons between patient groups. This information could potentially improve and strengthen subsequent drug approvals, recommendations and labelling claims. Outcomes such as disease activity, joint/structural damage, pain, fatigue, quality of life, physical function, work limitation/productivity, steroid use and biomarkers (acute phase reactants) should be considered when designing basket trials for tissue-agnostic drug development involving patients with inflammatory diseases. There is a need for increased collaboration between regulators and COS developers and inclusion of regulators as key stakeholders in COS development to enhance the quality of COS.