Background

Multicentre randomised trials are complex and expensive projects. Improving the efficiency and quality of trial conduct is important for patients, funders, researchers, clinicians and policy-makers [1]. A key factor in the successful planning and delivery of multicentre trials is how well sites meet their targets in recruiting and retaining participants, and in collecting high-quality, complete data in a timely manner [2]. Collecting and monitoring easily accessible data relevant to the performance of sites has the potential to improve the efficiency and success of trial management. Ideally, such performance metrics should provide information that quickly identifies potential problems so they can be mitigated or avoided, thereby minimising their impact and improving the efficiency of trial conduct.

We are not aware of any standardised metrics for monitoring site performance in multicentre trials. A recent query to all UK Clinical Research Collaboration (UKCRC)-registered Clinical Trials Units (CTUs) revealed that many units routinely collect and report data for each site in a trial, such as numbers randomised, case report forms (CRFs) returned, data quality, missing primary outcome data, and serious breaches. However, how such data are used to assess and manage performance varies widely [3,5,6,7]. Agreeing on a small number of site performance metrics that could be easily collected, presented and monitored in a standardised way by a trial manager or trial co-ordinator would provide a potentially useful tool for improving the efficiency of trial conduct.

Currently, trial teams, sponsors, funders and oversight committees monitor site performance and trial conduct based primarily on recruitment [8]. Whilst clearly important, recruitment is not the only performance indicator that matters for a successful trial. Using a range of additional metrics covering data quality, protocol compliance and participant retention would give a better overall measure of the performance of each trial site and of the trial as a whole. To be low cost and efficient, the number of metrics monitored at any one time should be limited to no more than 8 to 12 [9]. We conducted a systematic review to identify performance metrics that have been used, or proposed, for monitoring or measuring performance at sites in multicentre randomised trials.
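To make this concrete, the minimal Python sketch below shows how a trial team might compute one such metric, recruitment against an agreed target, for each site. All names, fields and the 80% threshold are illustrative assumptions, not metrics or thresholds drawn from the literature reviewed here.

from dataclasses import dataclass

@dataclass
class SiteRecruitment:
    site_id: str
    months_open: float           # months the site has been open to recruitment
    participants_randomised: int
    monthly_target: float        # agreed per-site recruitment target

def recruitment_ratio(site: SiteRecruitment) -> float:
    # Observed recruitment as a proportion of the total expected to date.
    expected = site.monthly_target * site.months_open
    return site.participants_randomised / expected if expected else 0.0

sites = [
    SiteRecruitment("Site A", months_open=6, participants_randomised=30, monthly_target=5),
    SiteRecruitment("Site B", months_open=6, participants_randomised=12, monthly_target=5),
]

for s in sites:
    ratio = recruitment_ratio(s)
    flag = "review" if ratio < 0.8 else "on track"   # 0.8 is an arbitrary example threshold
    print(f"{s.site_id}: {ratio:.0%} of target ({flag})")

A comparable calculation could be repeated for retention, CRF return or protocol compliance, keeping the total number of monitored metrics within the 8 to 12 suggested above.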

Methods

We performed a systematic review to identify metrics that have been used or proposed for monitoring or measuring performance at individual sites in multicentre randomised trials.

Criteria for potentially eligible studies

Studies were potentially eligible for inclusion if they:

  • Reported one or more site performance metric, either used or proposed for use, specifically for the purpose of measuring individual site performance

  • Were multicentre randomised trials, or concerning multicentre trials

  • Were published in English

  • Related to randomised trials involving humans

We included studies in which the strategy for monitoring site performance was randomly allocated. We anticipated that there might be studies in which the adoption of an individual performance metric had been tested by randomly allocating sites to use that particular metric or not. Studies relevant to both publicly funded and industry-funded trials were included.

Search strategy

We searched the Cochrane Library and five biomedical bibliographic databases (CINAHL, Excerpta Medica database (EMBASE), Medical Literature Analysis and Retrieval System Online (Medline), Psychological Information Database (PsycINFO) and Scopus) and Google Scholar from 1980 to 2017 week 07. The search strategy is provided as an Appendix (Table 3).

Selection of studies

Two reviewers (KW, JT) independently assessed for inclusion the titles and abstracts identified by the search strategy. If there was disagreement about whether a record should be included, we obtained the full text.

We sought full-text copies for all potentially eligible records, and two reviewers (KW, JT) independently assessed these for inclusion. Disagreements were resolved by discussion, and if agreement could not be reached the study was independently assessed by a third reviewer (LD). Multiple reports of the same study were linked together.

Data extraction and data entry

Two reviewers (KW, JT) extracted data independently onto a specifically designed data extraction form. In the few cases where the full text was not available (n = 9), data were extracted from the title and abstract only. Data were entered into an Excel spreadsheet and checked.

Data were extracted on the design of the randomised trial (participants, intervention, control, number of sites and target sample size) and on whether the performance metric(s) were theoretical or applied. For each performance metric we collected data that included: a verbatim description of the metric; how the metric was measured or expressed; the timing of the measurement and during which phase of the study it was taken; who measured the metric; whether a threshold existed to trigger action and, if so, what the threshold was and what action it triggered; and whether the metric was recommended by the authors.
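As a hypothetical illustration only, the extracted attributes for a single performance metric could be held in a structured record such as the Python sketch below; the field names mirror the items listed above, but the structure and example values are assumptions made for illustration rather than part of the review itself.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedMetric:
    verbatim_description: str        # metric as described by the study authors
    how_measured: str                # how the metric was measured or expressed
    timing_and_phase: str            # when measured, and during which study phase
    measured_by: str                 # who measured the metric
    threshold: Optional[str]         # threshold that triggers action, if any
    action_triggered: Optional[str]  # action taken if the threshold is breached
    recommended_by_authors: bool     # whether the study authors recommended the metric

example = ExtractedMetric(
    verbatim_description="Number of case report forms returned within 30 days",
    how_measured="Proportion of expected CRFs returned on time",
    timing_and_phase="Monthly, during the recruitment phase",
    measured_by="Trial manager",
    threshold="< 90% returned on time",
    action_triggered="Contact site to investigate delays",
    recommended_by_authors=True,
)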

Data analysis

We described the flow of studies through the review, with reasons for removal or exclusion, using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance [10]. The characteristics of each study were described and tabulated. Analyses were descriptive only, with no statistical analyses anticipated.

Results

The database search identified 3365 records, of which 177 were duplicates, leaving 3188 records screened for eligibility (Fig. 1). At screening, we obtained full-text copies for 147 records to determine eligibility. For a further seven records full-text copies were unavailable, so screening was based on the abstract only. Of these full-text copies and abstracts (for papers where the full text was unavailable), there was disagreement on three papers. Following discussion, two papers were accepted for inclusion [11, 12] and one paper was excluded [13].

Fig. 1 Flow diagram

Twenty-one studies were agreed for inclusion, of which 14 proposed performance metrics and seven used performance metrics (Table 1). These 21 studies reported a total of 117 performance metrics. The median number of performance metrics reported per study was 8 (range 1–16). Those 117 metrics were then screened to exclude any judged as lacking sufficient clarity, unrelated to individual site performance, too specific to an individual trial's methodology, or pertaining to clinical outcomes rather than trial performance. This left 87 performance metrics to be considered for use in day-to-day trial management. The metrics broadly fell into six main categories: assessing site potential before recruitment starts; and monitoring recruitment, retention, quality of data collection, quality of trial conduct, and trial safety (Table 2).

Table 1 Characteristics of included studies
Table 2 Examples of performance metrics within each identified category

Discussion

As far as we are aware, this is the first systematic review to identify and describe metrics that have been proposed or used to monitor site performance in multicentre randomised trials. It provides a list of performance metrics which can contribute to developing an agreed set of performance metrics for use in day-to-day trial management. We identified 87 performance metrics, which fell broadly into six main categories.

A strength of our study was the comprehensive search of the literature.

In planning this systematic review we envisaged that we would identify studies that had evaluated individual performance metrics, either by implementing them mid-way through a study or, ideally, by randomising individual sites to use of a particular metric or not. Unfortunately, there was a paucity of such studies. Most studies suggested performance metrics on a purely theoretical basis and did not provide data on the actual use of the suggested metrics. The main limitation of our study was the lack of studies implementing performance metrics and reporting the effects of their use; published work on this topic is limited, which is perhaps surprising given that informal assessment of how sites perform in multicentre trials is common.

This list of performance metrics contributed to the development of a Delphi survey sent to trial managers, UKCRC CTU directors and key clinical trial stakeholders, which is reported elsewhere. Participants were invited through the UK Trial Managers’ Network (UK TMN) and the UK Clinical Research Collaboration (UKCRC) CTU Network. Three Delphi rounds were used to steer participants towards consensus, refining the list of performance metrics, and the reasons for their decisions were documented. Finally, data from the Delphi survey were presented to stakeholders at a priority-setting expert workshop, providing participants with the opportunity to express their views, hear different perspectives and think more widely about the monitoring of site performance. This was used to establish consensus among experts on the top key performance metrics, expected to number around 8–12.
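As a purely illustrative sketch, the tallying of ratings within a single Delphi round might resemble the Python below; the 1–9 rating scale, the 70% agreement threshold and the example metrics are assumptions made for illustration and are not the criteria or results of the survey reported elsewhere.

# ratings[metric] is a list of 1-9 importance scores from individual respondents
ratings = {
    "Recruitment against target": [9, 8, 9, 7, 8, 9],
    "CRF return within agreed time": [7, 8, 6, 9, 7, 5],
    "Number of protocol deviations": [4, 6, 5, 7, 3, 6],
}

def proportion_rating_high(scores, cutoff=7):
    # Proportion of respondents rating the metric at or above the cutoff.
    return sum(score >= cutoff for score in scores) / len(scores)

# Metrics meeting the (assumed) consensus threshold are carried into the next round.
shortlist = [
    metric for metric, scores in ratings.items()
    if proportion_rating_high(scores) >= 0.70
]
print(shortlist)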

Conclusions

This study provides trialists, for the first time, with a comprehensive overview of performance metrics that have been proposed or used in the context of multicentre randomised trials. It will assist future work to develop a concise, practical list of performance metrics that could be used in day-to-day trial management to improve the performance of individual sites. This has the potential to reduce both the financial cost of delivering a multicentre trial and the research waste and delay in scientific progress that result when trials fail to meet their recruitment target, are poorly conducted, or have inadequate data.