Background

Clinical trial registries were established to improve transparency and completeness in the reporting of clinical trials [1,2,3,4,5,6]. Since they were established, a number of policies have been implemented to encourage or mandate their use, and this has led to substantial growth in the number of trials that have been registered [7,8,9,10,11]. For example, since 2005, prospective trial registration has been a condition for publication in member journals of the International Committee of Medical Journal Editors (ICMJE) [1, 12]. The European Union and USA have also passed legislation requiring prospective registration of clinical trials involving drugs or devices [13].

Clinical trial registries make it possible to measure biases in the reporting of clinical trials that arise from non-publication, delayed publication, or incomplete publication of results [14]. Studies examining these issues rely on the ability to link the original trial registration to the subsequent published article. These links can be established automatically if the publication abstract or metadata includes the registry identifier [15, 16]. However, if the identifier is not included by trial investigators or added by journals, the links must be created manually, either by searching and inference or by contacting investigators directly. Despite the number of studies that have examined reporting biases by linking trial registry entries and publications, the linking processes are variable and poorly described.
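When the abstract or metadata carries the registry identifier, automatic linking largely reduces to pattern matching. The Python sketch below extracts candidate identifiers from abstract text. The patterns are simplified assumptions covering three registry formats (ClinicalTrials.gov NCT numbers, ISRCTN numbers, and ANZCTR ACTRN numbers); they are not the expressions used by any of the studies reviewed, and `extract_registry_ids` is a hypothetical helper name.

```python
import re

# Simplified, assumed identifier patterns for three registries. Real
# identifiers may appear with spaces, hyphens, or other variants.
REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "ISRCTN": re.compile(r"\bISRCTN\d{8}\b"),
    "ANZCTR": re.compile(r"\bACTRN\d{14}\b"),
}


def extract_registry_ids(text: str) -> dict[str, list[str]]:
    """Return identifiers found in text, grouped by registry, in order of
    first appearance and without duplicates."""
    found: dict[str, list[str]] = {}
    for registry, pattern in REGISTRY_PATTERNS.items():
        # dict.fromkeys removes repeats while preserving order.
        ids = list(dict.fromkeys(pattern.findall(text)))
        if ids:
            found[registry] = ids
    return found


abstract = (
    "This randomised trial was registered at ClinicalTrials.gov "
    "(NCT01234567) and the ISRCTN registry (ISRCTN12345678). "
    "See NCT01234567 for the protocol."
)
print(extract_registry_ids(abstract))
```

A match only proposes a candidate link between a publication and a registry entry; it does not confirm that the registered trial is the one reported.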

Clinical trial registries are also a critical source of information for systematic reviewers, who use them to augment bibliographic database searches when compiling evidence from clinical trials [17,18,19]. Systematic reviewers may follow links from published trial reports to their registry entries to fill gaps where information is missing or incompletely reported. They may also search trial registries independently to identify additional trials [20, 21] and follow links from the registry to reports of those trials.

Our aim was to quantify the processes that have been used to link clinical trial registries with published results and to examine the use and utility of automatic linkage over time. To do this, we conducted a systematic review of all studies examining a cohort of clinical trials to identify links from clinical trial registries to bibliographic databases and from bibliographic databases to clinical trial registries, following a published systematic review protocol [22].

Methods

Inclusion criteria and search strategy

We identified all primary studies that examined links between any of the registries in the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) and published articles in bibliographic databases. Studies were excluded if there was no English-language version, if they did not unambiguously report the total number of clinical trials for which links were identified, if they were reporting on a specific clinical trial, or if the identification of links was not the primary focus of the study. Studies that did not unambiguously report the processes used to identify links were included in the review but excluded from the analyses.

PubMed and Embase were searched from inception to May 27, 2016 [23, 24]. The search strategy was developed with the assistance of a medical research librarian, and details are described in a previously published protocol [22]. The full search strategy for both databases is provided in Additional files 1 and 2. The strategy also included searching the reference lists of all studies to identify relevant articles not captured by the original search. Duplicates were removed using digital object identifiers and by manually comparing titles, authors, publication dates, and article metadata. All identified studies were screened for inclusion by two reviewers, and disagreements were resolved through discussion.

Data extraction

Two reviewers evaluated all the included studies to extract relevant information from the studies and resolved ambiguities by discussion. For each study, the following information was extracted: (a) number of reported clinical trials, (b) number of published articles, (c) trial registries used, (d) the study purpose (such as publication bias, outcome reporting bias, or assessing the publication rate of registered trials), (e) application domain (any constraints such as journal lists, conditions, or specialties), (f) processes for identifying links, and (g) proportions of links found using each process.

The processes used to identify links were categorised as one of three types: automatic, inferred, and inquired. Automatic links were defined as those found by any process that used the unique registry identifier to establish the link to or from a bibliographic database without the need for a search or inquiry. This included searching PubMed for registry identifiers to find published articles for cohorts of registry entries, and using identifiers in the metadata, abstract, or full text of published articles to find registry entries for cohorts of published articles. Inferred links were defined as those found by any manual process in which investigators searched for matches across databases using characteristics of the trial, such as the names of the investigators, the titles and acronyms associated with the trial, location, sample size, or population, intervention, or measurable outcome information, to find a match in a bibliographic database or trial registry. Inquired links were defined as those found by any manual process in which the study authors attempted to contact the investigators or authors of a trial to request or confirm the presence or absence of a registry entry or published article for each included trial.
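The registry-to-publication direction of automatic linking, searching PubMed for a registry identifier, can be sketched as building a query for the NCBI E-utilities esearch endpoint. This Python sketch only constructs the request URL and sends nothing. Two things in it are assumptions rather than details from the included studies: the `[si]` (Secondary Source ID) field tag as the place where PubMed indexes registry numbers, and the hypothetical helper name `build_registry_id_query`. Individual studies may have searched other fields or all fields.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (no request is sent in this sketch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_registry_id_query(registry_ids: list[str], field: str = "si") -> str:
    """Build an esearch URL that looks up PubMed records containing any of
    the given registry identifiers in the chosen search field.

    The default field tag "si" (Secondary Source ID) is an assumption about
    where registry numbers are indexed; "tiab" (title/abstract) is one
    alternative a study might choose instead.
    """
    if not registry_ids:
        raise ValueError("at least one registry identifier is required")
    term = " OR ".join(f"{rid}[{field}]" for rid in registry_ids)
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    # urlencode percent-encodes the brackets and turns spaces into "+".
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_registry_id_query(["NCT01234567", "NCT07654321"]))
```

Batching several identifiers into one OR query keeps the number of requests low, but a hit still has to be attributed to the specific identifier it matched before it counts as a link.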

Data synthesis and analysis

We examined the proportions of links that were identified through each of these three processes. For the studies that used both automatic and manual processes, we applied linear regression with publication year as the independent variable to determine whether the utility of automatic processes (the proportion of links found automatically relative to the proportion that required manual processes) had increased over time. We did not undertake a pooled analysis of the utility of automatic links because many studies did not specify the proportions found by each process and because the study designs were heterogeneous. All statistical analyses were conducted using SPSS statistical software version 24.0 (IBM, Armonk, NY).
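The analyses in this review were run in SPSS; the Python sketch below only illustrates the same kind of model, an ordinary least squares regression of the percentage of links found automatically on publication year, reporting the slope (β, in percentage points per year) and R². The data points are hypothetical, not values extracted from the included studies, and the sketch omits the p-value, which would additionally require the t-distribution.

```python
from dataclasses import dataclass


@dataclass
class RegressionResult:
    slope: float       # beta: change in percentage points per year
    intercept: float
    r_squared: float


def fit_trend(years: list[float], proportions: list[float]) -> RegressionResult:
    """Ordinary least squares fit of proportion (%) on publication year."""
    n = len(years)
    if n != len(proportions) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("publication years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, proportions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(years, proportions))
    ss_tot = sum((y - mean_y) ** 2 for y in proportions)
    # R^2 is undefined when all proportions are equal; 0.0 by convention here.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return RegressionResult(slope, intercept, r_squared)


# Hypothetical data: publication year and % of links found automatically.
years = [2009, 2011, 2012, 2014, 2015, 2016]
pct_auto = [20.0, 35.0, 18.0, 40.0, 22.0, 30.0]
result = fit_trend(years, pct_auto)
print(f"beta = {result.slope:.2f} points/year, R^2 = {result.r_squared:.2f}")
```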

The protocol for this systematic review was published in 2016 [22] (see Additional file 3). We did not register the systematic review with PROSPERO because PROSPERO requires at least one outcome of direct patient or clinical relevance, and this review does not examine one.

Results

The initial search returned 11,986 results (after non-English articles were excluded), which produced 9486 articles after de-duplication (Fig. 1) [25]. After screening titles and abstracts, 348 studies remained, and of these, 81 studies were included in the review. One study considered links from both cohorts of registry entries and cohorts of published articles [15, 26], giving a total of 82 analyses. Excluded studies included conference abstracts, studies in which the proportions of registry entries or published articles that were identified were ambiguous [27,28,29], and studies that considered reporting biases but could not be included because the linking was atypical or no linking was performed [30,31,32,33]. Some studies were excluded because they did not measure links between trial registries and bibliographic databases but instead considered links to or from other sources of clinical trial information. These included links to or from protocols [34,35,36,37], conference or meeting abstracts [38,39,40,41,42], internal company documents [17], Food and Drug Administration (FDA) documents or new drug approvals [43,44,45,46,47], or other databases of published articles [48, 49].

Fig. 1 PRISMA flow diagram of study selection for a search and screening process that resulted in the inclusion of 81 studies

Studies identifying published articles from cohorts of registry entries

We identified 43 studies that examined links from registries to published articles, typically with the aim of examining publication bias or outcome reporting bias (Table 1). The application domains varied by type of study (e.g., terminated and withdrawn trials [50, 51], trials funded by specific organisations or from certain countries [52, 53]) and by specialty and condition (e.g., paediatric or surgical trials [54, 55]). The most commonly studied registry was ClinicalTrials.gov alone (35 studies), followed by some or all of the registries in the WHO ICTRP (8 studies). The most commonly examined bibliographic databases were PubMed alone (22 studies), followed by Embase in combination with PubMed or other bibliographic databases (20 studies). The cohorts of registry entries ranged in size from 34 to 8907 entries (median 305). The median proportion of registry entries for which published articles were found was 47%, ranging from 4% (2 published articles in a cohort of 46 registry entries) to 76% (47 published articles in a cohort of 62 registry entries).

Table 1 Characteristics of 43 analyses identifying published articles from cohorts of trial registry entries

The processes used to identify links between clinical trial registries and published articles varied across the set of studies (Figs. 2 and 3). The most common process was to use a combination of automatic and manual processes (24/43, 56%), followed by manual processes only (11/43, 26%), and automatic processes only (3/43, 7%). There were five studies for which the process for identifying published articles was not clear or not provided.

Fig. 2 The processes used to identify links in 81 included studies: studies that used automatic links only (red), both automatic and manual processes (purple), manual processes only (blue), and studies that did not report the processes used (grey)

Fig. 3 The proportions of published articles identified in cohorts of registry entries (top; 43 studies, ranging from 34 to 8907 registry entries) and the proportions of registry entries found in cohorts of published articles (bottom; 39 studies, ranging from 54 to 698 articles), for studies that considered only automatic links (red) and all other studies (blue). Circle areas are proportional to study size

Of the 24 studies that looked for published articles for a cohort of registry entries using both manual and automatic processes, 12 specified the number of published articles identified via each process (Fig. 4). Among these studies, automatic links identified between 13 and 42% (median 23%) of the published articles, and manual processes found a further 5–42% (median 17%) of articles that were not available via automatic links.
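The per-study figures above split each cohort into links found automatically and further links found only by manual processes. The Python sketch below shows that calculation and the cross-study medians, using hypothetical counts rather than data from the included studies; `process_split` is our own illustrative name. Because each median is taken over its own column, the two medians need not sum to any single study's total.

```python
from statistics import median


def process_split(cohort_size: int, found_auto: int,
                  found_manual_only: int) -> tuple[float, float]:
    """Return (% of cohort linked automatically,
    % linked only through manual processes)."""
    if cohort_size <= 0:
        raise ValueError("cohort size must be positive")
    if found_auto + found_manual_only > cohort_size:
        raise ValueError("more links than cohort entries")
    return (100 * found_auto / cohort_size,
            100 * found_manual_only / cohort_size)


# Hypothetical studies: (registry entries, articles found automatically,
# further articles found only through inference or inquiry).
studies = [(200, 46, 34), (120, 30, 12), (500, 110, 90)]
splits = [process_split(*s) for s in studies]
print("median automatic %:", median(a for a, _ in splits))
print("median additional manual %:", median(m for _, m in splits))
```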

Fig. 4 The proportions of published articles found in cohorts of registry entries (12 studies, top) and the proportions of registry entries found in cohorts of published articles (16 studies, bottom), by automatic links (grey) and manual processes (blue)

We found no evidence of a change in the overall proportion of publications that could be found via automatic links. A linear regression over the 12 studies, with publication year as the independent variable, indicated no significant trend in the proportion of available links that could be identified by automatic processes (R² = 0.02, p = 0.36, β = 1.28% increase per year).

Studies identifying registry entries from cohorts of publications

There were 39 studies that considered cohorts of publications and identified associated registry entries in one or more of the WHO ICTRP clinical trial registries (Table 2). The cohorts ranged from 51 to 698 published articles (median 181). These studies also covered a range of application domains, varying by the selection of journal, discipline, or study design [56,57,58,59,60,61,62]. The most commonly used bibliographic database was PubMed alone (19 studies), followed by PubMed in combination with other bibliographic databases (7 studies). To identify registrations, the studies most commonly searched ClinicalTrials.gov in combination with other registries (25 studies), followed by all trial registries included in the WHO ICTRP (9 studies). The median proportion of published articles for which registry entries were identified was 54%, ranging from 10% (8 registrations from a cohort of 83 published articles) to 99% (75 registrations from a cohort of 76 published articles).

Table 2 Characteristics of 39 analyses identifying trial registry entries from cohorts of published articles

The processes used to identify links between clinical trial registries and published articles varied across the set of studies (Figs. 2 and 3). The most common process was to use a combination of automatic and manual processes (21/39, 54%), followed by automatic processes only (9/39, 23%), and manual processes only (2/39, 5%). There were 7 studies for which the processes used to identify registry entries were not clear or not provided.

Of the 21 studies that looked for registry entries among a cohort of published articles and used both manual and automatic processes, 16 reported the number of registry entries found using each process (Fig. 4). Among these studies, automatic links identified between 8 and 97% (median 49%) of registry entries and the manual processes identified between 0 and 28% (median 10%) additional entries.

We found no evidence of a change in the overall proportion of published articles for which registry entries could be found via automatic links. A linear regression over the 16 studies, with publication year as the independent variable, indicated no significant trend in the proportion of links that could be identified via automatic processes (R² = 0.01, p = 0.73, β = 1.40% increase per year).

Discussion

In this systematic review, we found that investigators use both automatic and manual processes to link registry entries and publications and that automatic links could be used to identify some but not all links between registry entries and published articles. We found no evidence that the utility of automatic processes had increased over time.

To the best of our knowledge, no other systematic review has examined the utility of automatic links between trial registries and bibliographic databases. Previous studies that examined the availability of automatic links provided a broad analysis of automatic links made available through ClinicalTrials.gov and PubMed but did not systematically evaluate the proportion of links that could additionally be resolved using manual processes [15, 16, 63]. Other systematic reviews have examined reporting biases as a topic and included subsets of the studies we included [14, 64], but focused on publication rates and the completeness and consistency of outcome reporting, which we did not evaluate here. Our review adds to this area of research by compiling information about a broader group of studies and synthesising what is known about the utility of automatic links, and the need for supplementing automatic processes with manual processes, in studies that rely on links between trial registries and bibliographic databases.

Implications

Our results indicate that automatic links alone are a useful but insufficient process for measuring rates of registration and publication or associated biases. Relying on automatic links alone to estimate the rate of non-publication will likely over-estimate that rate. When monitoring compliance with prospective registration of clinical trials, or monitoring publication practices and patterns, the limits of automatic links should be considered.
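The direction of this bias follows from simple arithmetic. For a cohort of N registry entries, let A be the number with a published article found via automatic links and M the number found only through manual processes (inference or inquiry); these symbols are introduced here for illustration. The two estimates of the non-publication rate differ by exactly the share found manually, which cannot be negative:

```latex
\hat{u}_{\mathrm{auto}} = 1 - \frac{A}{N}, \qquad
\hat{u}_{\mathrm{all}}  = 1 - \frac{A + M}{N}, \qquad
\hat{u}_{\mathrm{auto}} - \hat{u}_{\mathrm{all}} = \frac{M}{N} \ge 0 .
```

For a hypothetical cohort in which 23% of entries are linked automatically and a further 17% only manually (close to the medians reported in the Results for cohorts of registry entries), the automatic-only estimate of non-publication would be 77% against 60% when manual processes are included.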

In general, the proportion of links identified by automatic processes was lower in studies that started with a cohort of registry entries and aimed to identify published articles than in studies that started with a cohort of published articles and aimed to identify registrations. This may be because some journals have not yet established standards for registration [65] or have not implemented standards for including registry identifiers in the information they pass to bibliographic databases.

The results also have implications for systematic reviews. Technologies for automating or supporting systematic reviews rarely use information from clinical trial registries to improve the searching or screening processes [66] or the prioritisation or scheduling of systematic review updates. Because systematic reviews are already time-consuming [67, 68], the additional manual effort needed to link trial registry entries with their published results may have hindered the development of tools based on this linkage. Areas for development include processes in which systematic reviewers compare published reports with information in a registry or use trial registries to identify trials not found in bibliographic databases. Machine-readable information linking all published studies with all registry entries would remove these barriers and may provide the catalyst for increased use of registries in the searching, screening, and prioritising of systematic reviews.

Recommendations

We recommend continued pressure to ensure that journals and publishers adhere to standards of reporting that require unique trial identifiers to be specified in the abstract of the article and reported as part of the metadata provided to bibliographic databases. Trial investigators should also be encouraged to update registry entries with links to published results when journals do not provide the information to bibliographic databases. As we move into an era where the structured reporting of clinical trial results and individual participant data become the standard for responsible clinical trial reporting [69], the inability to automatically identify all sources of information about a clinical trial hinders our ability to reuse and synthesise results across trials. Given the number of extra links that could be identified by examining the full text of articles, we also recommend that journals ensure that clinical trial identifiers are included in the abstract or metadata provided to bibliographic databases.

We additionally recommend a standardised method for identifying links between registry entries and published articles that, for the time being, includes manual validation and checking and avoids drawing conclusions based only on automatic links. A standardised method should include details about what elements of a registry entry should be used to search for published articles and a standard definition for what constitutes published results. Standard reporting for these studies should include the number of registry entries for which searches were performed, the proportion that were identified by automatic links, by inference or by inquiry, and the full details of the dates of trial completion and the length of follow-up. Presenting studies in terms of the time to publication rather than the presence or absence of publication would make a greater proportion of the studies comparable and amenable to meta-analysis.
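The reporting elements recommended above could be captured in a simple structured record. The Python sketch below is a hypothetical illustration, not an existing reporting standard: the class `LinkageReport`, its field names, and the helper `days_to_publication` are our own assumptions. It reports the share of searched entries linked by each process and computes time to publication, returning None for unpublished trials, which a time-to-event analysis would treat as censored.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class LinkageReport:
    """Hypothetical minimal reporting record for a linkage study."""
    entries_searched: int
    linked_automatic: int
    linked_inferred: int
    linked_inquired: int

    def proportions(self) -> dict[str, float]:
        """Share of searched entries linked by each process (0 to 1)."""
        if self.entries_searched <= 0:
            raise ValueError("entries_searched must be positive")
        linked = self.linked_automatic + self.linked_inferred + self.linked_inquired
        if linked > self.entries_searched:
            raise ValueError("more links than searched entries")
        n = self.entries_searched
        return {
            "automatic": self.linked_automatic / n,
            "inferred": self.linked_inferred / n,
            "inquired": self.linked_inquired / n,
            "unlinked": (n - linked) / n,
        }


def days_to_publication(completion: date, publication: date | None) -> int | None:
    """Days from trial completion to publication; None if not (yet)
    published, i.e. a censored observation in a time-to-event analysis."""
    return None if publication is None else (publication - completion).days


report = LinkageReport(entries_searched=100, linked_automatic=23,
                       linked_inferred=15, linked_inquired=2)
print(report.proportions())
print(days_to_publication(date(2015, 3, 1), date(2016, 3, 1)))
```

Recording counts rather than only percentages keeps each study's denominator visible, which is what pooling or meta-analysis across studies would need.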

Limitations

There are two limitations to this review. First, the exclusion of studies for which there was no English language version available meant that we may have missed some studies examining WHO ICTRP registries from countries where English is not the primary language. Second, we used the publication year of the studies as a proxy for estimating changes in the proportions of links identified by each process without considering the period of study that each of the studies covered. This was necessary because a substantial proportion of studies did not report the range and distribution of publication and registration dates in the cohorts they examined, and this may have influenced our analysis of the trends in the utility of the automatic processes.

Conclusions

In this systematic review, we have quantified the use and utility of the processes that are used to link trial registries to bibliographic databases. The results indicate that manual processes are still used extensively and that the gap between what can be identified via automatic processes and what must be identified via manual processes persists. Future improvements in the quality of automatic linking between clinical trial registries and bibliographic databases should come from continued pressure on journals to enforce policies and practices to consistently include registry identifiers in published reports.