Introduction

High-quality randomised controlled trials are considered the gold standard research approach to identify causality or demonstrate treatment efficacy. There are many treatment uncertainties in neonatal practice [1] that would benefit from being subjected to high-quality randomised clinical trials [2]. However, the high cost of undertaking large and methodologically robust trials [3] means that only a small number are undertaken each year: the median cost of randomised controlled trials has been estimated to range between US$43 and US$103,254 per participant [4], and publicly funded pragmatic neonatal trials cost £1.5–2 million [5]. A key driver of cost in clinical trials is data collection; the mean cost of trial data collection using conventional Case Record Forms has been estimated at €1135 per participant [6]. More efficient approaches to data collection, for example using electronic Case Record Forms [6] and routinely available clinical data [7], provide opportunities to reduce costs and to facilitate neonatal trials that improve the limited evidence base upon which much of neonatal care currently relies.

Methods to increase the efficiency of clinical trial data collection have been described by organisations such as the Institute of Medicine [8] and the Clinical Trials Transformation Initiative [9]; these include targeted collection of common core data items and extraction of trial data from existing sources, such as Electronic Patient Record (EPR) systems or disease registries. These approaches are most likely to be applicable to pragmatic trials [10]. The use of existing ‘real-world’ data sources such as these provides additional advantages: they can provide up-to-date incidence estimates for baseline and outcome event rates to better inform sample size calculations, and the accuracy and completeness of key data items can be estimated in advance from historical data, both to inform trial feasibility at the planning stage and to address widely held concerns about the poor quality of data from existing sources [11]. However, because not all data items held within a routinely recorded database or registry will be relevant to clinical trials, the data items that are ‘core’ [9] for clinical trials in a particular clinical area need to be established. Established approaches exist for defining Core Outcome Sets [12], but not for core non-outcome data for clinical trials, such as baseline or background data and items used in randomisation.

An increasing proportion of neonatal Cochrane reviews are inconclusive because of insufficient high-quality data from randomised trials [2]. Neonatal care in the United Kingdom is well placed to develop large, efficient trials that use existing data: all infants admitted for National Health Service (NHS) neonatal care in England, Scotland and Wales have clinical data recorded in a summary EPR system as part of routine clinical care, and predefined data [13] are extracted to form the National Neonatal Research Database (NNRD). The effectiveness and efficiency of using routinely recorded clinical data held in the NNRD for data-enabled neonatal trials are currently being investigated [14]. We hypothesised that a set of common data items has been reported across neonatal trials that impact clinical practice; the aim of this study was to identify these common neonatal data items. As there is no established approach for identifying common baseline data items, we undertook a systematic review to identify baseline data items reported in neonatal trials. A secondary aim was to quantify the completeness of these commonly reported items in the NNRD, to inform whether the NNRD could be used as the sole or principal data source for clinical trials.

Methods

Systematic review

To identify data commonly reported in neonatal trials we conducted a systematic review of neonatal clinical trials published in high-impact journals. We developed a protocol with explicitly defined objectives, information to be extracted, and statistical methods. We prospectively registered the protocol with PROSPERO International Prospective Register of Systematic Reviews, registration number CRD42016046138 (https://www.crd.york.ac.uk/prospero), registered on 17 August 2016.

We searched the four most highly cited general medical journals that publish neonatal trials [15] (New England Journal of Medicine, Lancet, British Medical Journal and Journal of the American Medical Association) over a 10-year period from 1 January 2006 to 31 December 2015, using the PubMed database. The PubMed search strategy is described in Additional file 1. We included randomised clinical trials written in English that tested an intervention delivered to newborn infants in a neonatal unit setting, with no restriction on disease area or treatment type. Prior to data extraction, we amended the inclusion criteria to include trials of infants born at more than 34 weeks’ gestation, so that the results would be more generalisable to neonatal trials. We did not include trials in which an intervention was applied to a pregnant mother and infant outcomes were reported. Two authors (SJ and CG) independently screened each potentially relevant record and reviewed the full text where necessary to assess eligibility. Discrepancies between the authors were resolved through discussion.

Two authors (SJ, CG) independently extracted the following items from included clinical trials: baseline items, items used in stratification or minimisation (randomisation), and items used to adjust primary outcomes. Other study characteristics that we extracted included whether the trial was multicentre and whether it involved preterm or term infants. Outcome data were not extracted as these are the subject of other parallel work [16]. A comprehensive list of reported data items and frequencies was extracted. Items were combined where appropriate; for example, administration of different medications was combined into the item ‘medications’. Preterm studies were defined as studies involving babies with a gestational age of less than 37 weeks or weighing less than 1500 g and term studies as studies on babies born at or above 37 weeks’ gestation. A formal risk of bias assessment was not conducted as the interest of this study was limited to the data collected, not the interventions or the measure of efficacy.
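The counting and combining steps described above can be illustrated with a short Python sketch. It assumes each trial's extracted items are held as a set of strings and that a hypothetical `COMBINE` mapping merges related items (here, individual drugs into 'medications'); the mapping and the trial records are invented for illustration and are not the review's extraction data. The default 20% threshold matches the cut-off used for the completeness analysis.

```python
from collections import Counter

# Hypothetical mapping that merges related items into one combined item,
# e.g. individual drugs into 'medications' (illustrative only).
COMBINE = {
    "caffeine": "medications",
    "indomethacin": "medications",
    "ibuprofen": "medications",
}


def tally_items(trials: dict[str, set[str]]) -> Counter:
    """Count how many trials report each (combined) data item.

    A trial contributes at most once per combined item, so two drugs that
    both map to 'medications' in the same trial count as one report.
    """
    counts: Counter = Counter()
    for items in trials.values():
        counts.update({COMBINE.get(item, item) for item in items})
    return counts


def common_items(counts: Counter, n_trials: int, threshold: float = 0.2) -> list[str]:
    """Items reported by at least `threshold` of trials, most frequent first."""
    selected = [item for item, n in counts.items() if n / n_trials >= threshold]
    return sorted(selected, key=lambda item: (-counts[item], item))


# Invented example records (not data from the review).
trials = {
    "A": {"gestational age", "sex", "birth weight", "caffeine"},
    "B": {"gestational age", "sex", "indomethacin", "ibuprofen"},
    "C": {"gestational age", "birth weight"},
    "D": {"sex"},
    "E": {"gestational age", "mode of delivery"},
}
counts = tally_items(trials)
print(common_items(counts, len(trials), threshold=0.4))
```

With these invented records, a threshold of 0.4 keeps the four items reported by at least two of the five trials; trial B's two drugs add only a single report of 'medications'.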

Data completeness

Data completeness in the NNRD was examined for infants born in England, Scotland and Wales from 1 January 2015 to 31 December 2015, for the first seven postnatal days. The NNRD contains over 400 data items per baby; data held in the NNRD are extracted from individual infants’ EPR data routinely recorded by healthcare professionals as part of clinical care. Details of the Neonatal Dataset are searchable online [13], and descriptive data for infants within the NNRD are available in [17]. We calculated the completeness in the NNRD of each data item reported by at least 20% of the clinical trials included in the systematic review.

We defined incompleteness as an empty field or an implausible value. Where an item identified through the systematic review (for example, birth weight) directly matched a corresponding NNRD field, its completeness was calculated directly. Where an item identified in the systematic review mapped to several NNRD fields (for example, respiratory support maps to several NNRD fields, including use of respiratory support, mode of ventilation, non-invasive respiratory support, nitric oxide, tracheostomy and surfactant [13]), a record was considered complete if at least one of the possible NNRD fields held a value that was neither missing nor implausible (according to the neonatal dataset data dictionary definition).
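The item-to-field mapping and the 'at least one valid value' rule above can be sketched as follows. Field names such as `birth_weight_g` and `ventilation_mode`, their plausibility ranges and the example records are hypothetical placeholders, not the Neonatal Dataset data dictionary definitions [13]; the sketch only shows how completeness is counted once missing and implausible values are both treated as incomplete.

```python
from typing import Callable, Optional

# Hypothetical plausibility rules per NNRD-style field; placeholders only,
# not the Neonatal Dataset data dictionary definitions.
PLAUSIBLE: dict[str, Callable[[object], bool]] = {
    "birth_weight_g": lambda v: 200 <= v <= 7000,
    "ventilation_mode": lambda v: v in {"CPAP", "HFOV", "SIMV"},
    "nitric_oxide": lambda v: v in {"yes", "no"},
}

# A review item maps either to one field (direct match) or to several fields.
ITEM_FIELDS: dict[str, list[str]] = {
    "birth weight": ["birth_weight_g"],
    "respiratory support": ["ventilation_mode", "nitric_oxide"],
}


def is_valid(field: str, value: Optional[object]) -> bool:
    """True if the value is present (not None or empty) and plausible for the field."""
    if value is None or value == "":
        return False
    return PLAUSIBLE[field](value)


def completeness(records: list[dict], item: str) -> float:
    """Percentage of records with at least one valid value across the item's fields."""
    fields = ITEM_FIELDS[item]
    complete = sum(any(is_valid(f, rec.get(f)) for f in fields) for rec in records)
    return 100 * complete / len(records)


# Invented example records; absent keys are treated as missing.
records = [
    {"birth_weight_g": 980, "nitric_oxide": "no"},
    {"birth_weight_g": None, "ventilation_mode": "CPAP"},
    {"birth_weight_g": 15000, "ventilation_mode": "HFOV"},  # implausible weight
    {"birth_weight_g": 3400},
]
print(completeness(records, "birth weight"))         # 50.0
print(completeness(records, "respiratory support"))  # 75.0
```

The implausible birth weight counts as incomplete for 'birth weight', while the third record still counts as complete for 'respiratory support' because one of its mapped fields holds a valid value.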

Results

Systematic review

We identified 161 articles in the literature search. We excluded 117 articles leaving 44 eligible to be included in the review (Fig. 1). Twenty-nine studies included only preterm babies, six only term babies and nine studies included both term and preterm babies (Table 1). The majority of studies (91%) were multicentre trials and overall included 30,968 participants (Table 1).

Fig. 1 Flow of studies through the systematic review

Table 1 The identified studies and their characteristics

The median number of baseline data items reported in the 44 included trials was 12. Gestational age, sex and birth weight were collected as baseline items for 42 of 44 studies (Table 2). Fourteen data items were reported by at least 20% of studies; 66 baseline data items were reported by one study alone (Additional file 2: Table S1). No study reported all 14 of the most common data items.

Table 2 Data items reported in more than 20% of studies and stratified by the age of the study participants

Sixteen stratification items were reported by 35 trials. Neonatal unit identifier (57%) and gestational age (39%) were the items most commonly used for stratification during randomisation. Two (13%) of these stratification items were reported by more than 20% of trials and 9 (56%) were reported by one study only. Twenty-four items were reported by 33 trials to adjust analyses of the primary outcome. Of these, 3 (13%) were reported by more than 20% of all trials and 12 (50%) were reported by one study only. Eight (50%) stratification and 9 (38%) adjustment items were among the 14 most commonly reported baseline data items. A full list of all common items can be found in Additional file 2: Tables S1–S4.
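The percentages above can be reproduced from the counts. Both 2/16 and 3/24 equal exactly 12.5% and are reported as 13%, which implies rounding half up; Python's built-in `round()` rounds halves to even and would give 12, so this sketch uses the `decimal` module instead.

```python
from decimal import ROUND_HALF_UP, Decimal


def pct(k: int, n: int) -> int:
    """Percentage of k out of n, rounded half up to a whole number."""
    exact = Decimal(100 * k) / Decimal(n)
    return int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# (count, denominator) pairs taken from the paragraph above
reported = {
    "stratification items in >20% of trials": (2, 16),
    "stratification items in one study only": (9, 16),
    "adjustment items in >20% of trials": (3, 24),
    "adjustment items in one study only": (12, 24),
    "stratification items among the top 14": (8, 16),
    "adjustment items among the top 14": (9, 24),
}
for label, (k, n) in reported.items():
    print(f"{label}: {k}/{n} = {pct(k, n)}%")
```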

Data completeness

In 2015, 96,699 infants were admitted to 180 neonatal units in England, Wales and Scotland. Admitted infants received 472,187 days of neonatal care during the first 7 days following birth (data not shown).

The completeness of common data items in the NNRD is summarised by age group in Table 3. Data completeness in the NNRD is 99.9% for gestational age at birth, 99.9% for sex, 100% for birth weight, 99.7% for multiple birth and 100% for respiratory support on day 1 (Table 3). The majority of data items were more than 90% complete; the exceptions were maternal ethnicity (70.2%), mode of delivery (81.4%) and Apgar score at 5 min (79.1%). Completeness was higher for preterm babies than for term babies across all data items (mean completeness 94.4% vs 89.2%) (Table 3).

Table 3 Data completeness in the National Neonatal Research Database (NNRD) for the data items reported in 20% of studies or more

Discussion

We have identified a common set of non-outcome data items reported in high-impact neonatal trials. We find that 12 of these 14 data items can be obtained from the NNRD with high completeness for most items (Table 3). The common data items identified here have previously been validated against independently collected trial data [17] where they were shown to be highly accurate and complete in the NNRD. This supports the assertion that non-outcome data held in the NNRD can be used to support large, efficient neonatal trials. We recognise that the trials included in the systematic review also reported a wide range of additional non-outcome data items that were not included in the common set identified here. In planning future pragmatic neonatal trials, the completeness and accuracy of additional data items critical to the integrity of a planned trial can be evaluated using approaches similar to those applied here. However, the finding that reported data items were variable even between similar trials (Additional file 2: Table S2) suggests that some reported data items may not have been critical to trial integrity, and that harmonisation of non-outcome data items may improve the consistency and efficiency of future neonatal trials. The common non-outcome data items we identify here, and their completeness and accuracy [17] in the NNRD, can be used to assess the suitability and feasibility of using the NNRD and other similar routinely recorded data sources for neonatal trials.

Data completeness in the NNRD has previously been calculated by Battersby et al. [17] in relation to a single clinical trial between 2008 and 2015. For the common data items examined in both studies (multiple birth, gestational age, sex and birth weight), percentage completeness in that study was very similar to that found in the present study, indicating that data completeness within the NNRD for these items is consistent over time. The present study builds upon this work by examining completeness for a wider range of empirically identified non-outcome data items, thereby extending the relevance of these results to a wider range of potential clinical trials. For large neonatal trials in the United Kingdom, we demonstrate that the core non-outcome data items identified here are held in the NNRD to a high degree of completeness. For some core non-outcome data items, such as gestational age at birth, we show that the likelihood of missing data in clinical trials using the NNRD is small. These results can be used to develop and apply targeted approaches to improving the recording of critical data items with lower completeness, such as mode of delivery.

Common datasets in other clinical and research areas have been identified using a variety of methods. Doods et al. [62] identified common data groups and elements for feasibility analysis in cardiovascular medicine, diabetes, inflammatory diseases, oncology and neurology through the use of an expert panel, but did not review the literature or include expertise from outside the field. This study identified a wide range of laboratory tests for feasibility studies. Diagnostic test data were not identified as commonly reported non-outcome data items in our systematic review of large neonatal trials, indicating that such data items are less relevant to the pragmatic neonatal trials that are the focus of this work. Sheehan et al. [63] outline previously developed common data element sets and some of the challenges inherent in adopting and using such sets. Chari et al. [64] conducted a systematic review of trials and observational studies to identify common data elements in chronic subdural haematoma studies and, in keeping with our results, identified a core set of commonly reported non-outcome items. The approach that we used was a more limited systematic review of trials published in high-impact journals. This approach was chosen a priori to focus on data items reported in trials that influence neonatal practice. This was a pragmatic decision with limitations: by limiting our review to general medical journals, we may have missed influential trials published in specialty journals, and we have not sampled the range of data items reported in smaller trials. Furthermore, no approach to date has sought parent or patient views on the importance of different non-outcome data items; this may be important given the different priorities identified by these groups compared with health professionals and researchers [65].

The examples cited here demonstrate the interest in, and potential value of, common sets of non-outcome data items across different specialties. The development of an established methodological approach, analogous to that developed by the COMET initiative [12], would increase the consistency, robustness and comparability of such endeavours in future.

Our study has focussed on defining the data items usually recorded at baseline or used as explanatory data items in clinical trials. To the best of our knowledge, there have been no previous attempts to identify core non-outcome trial data items such as these. We included the most common data items used in randomisation, which are often selected to conduct pre-specified subgroup analyses, and those used to adjust analyses of the primary outcome. These items are often overlooked when exploring the impact of data quality in trials, despite the importance of their completeness for preserving statistical power and avoiding misinterpretation of results. We did not focus on outcome data items because the methodology to identify these data is well developed and such work is underway in neonatal medicine [16]. A limitation of our study is that data may have been selectively reported, thus introducing bias; however, this risk is lessened because the review processes of the included journals are designed to ensure that items listed in the trial protocol are presented in the main trial outcomes publication. A further limitation of our study was that some items identified were dichotomous, for example presence or absence of infection prior to trial enrolment, and it was not possible to calculate completeness for such items because absence of the condition is not always actively recorded. Age was found to be a common data item; however, it is calculated using gestational age, which is highly complete in the NNRD, and completeness for age was therefore not calculated. An additional limitation stems from the fact that some data items collected in clinical trials did not directly align with data items in the NNRD; there may therefore be a loss of information from aggregating several data items into a common data item held by the NNRD to assess data quality.

Furthermore, included trials used different approaches to ascertain commonly reported data items; for example, the most commonly reported data item, gestational age, may be derived from maternally reported data, ultrasound measurement or clinical evaluation. Data held within the NNRD are extracted from routine clinical information used to inform clinical care; these clinically relevant data may be more appropriate for pragmatic trials than the more granular data items reported in trials. Differences between trials and routinely recorded data sources in how data items are ascertained and synthesised have the potential to introduce biases into clinical trials seeking to use such routinely recorded data. Where such differences are randomly distributed between trial arms, the impact may be limited to lower precision rather than systematic bias in favour of one trial arm. Further exploration is needed to understand how to accurately assess and synthesise similar data items and to quantify the direction and magnitude of potential biases.

It is important to note that some NNRD data items had between 10 and 30% missing data. The implications of such degrees of missingness depend on the role of the data item in the trial, but are likely to include a loss of precision [66]. Baseline variables are used in pre-specified statistical analyses of outcomes so that treatment effects can be estimated more precisely. Where baseline data are missing, methods exist that allow incomplete baseline variables to be included without excluding participants with missing values, and so retain some of the gain in precision. This applies to individually randomised trials; incomplete baseline data may have a greater impact in trials that randomise centres as clusters, when baseline completeness varies by centre. Baseline variables are also used to describe the trial population, for example to allow readers to judge generalisability, and a high level of baseline completeness may be important for this purpose. Finally, baseline variables are important for subgroup analyses, and missing data may limit such analyses. The results presented here will allow the impact of different degrees of missingness in neonatal trials to be further explored and modelled, to better understand which trials are most suitable for using routinely recorded data. The more widespread use of routinely collected data for clinical trials also has the potential to improve the recording of such data [67]. Another limitation is that we did not evaluate the accuracy of common non-outcome data items in the NNRD in this study, although this has recently been undertaken [17]. Completeness and accuracy are key factors in determining the suitability of using routinely recorded clinical data for clinical trials and should be evaluated for all data items deemed critical to any trial seeking to use such data.
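As an illustration only (the paragraph above does not say which methods are meant), one approach described in the trials methodology literature for individually randomised trials is to replace a missing baseline value with the mean of the observed values, pooled across arms, and optionally to add a missing-data indicator as an extra covariate. Because randomisation makes baseline values independent of allocation, no participant needs to be excluded. A minimal sketch of the covariate preparation step, using invented values and a hypothetical helper name:

```python
from statistics import mean
from typing import Optional


def impute_with_indicator(values: list[Optional[float]]) -> tuple[list[float], list[int]]:
    """Mean-impute a partially missing baseline covariate and flag missing values.

    The overall observed mean (pooled across trial arms) is used as the fill
    value; the indicator marks which participants had the value imputed.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


# Invented Apgar scores at 5 min, two missing
apgar5 = [9, None, 7, 8, None]
imputed, missing = impute_with_indicator(apgar5)
print(imputed)  # [9, 8, 7, 8, 8]
print(missing)  # [0, 1, 0, 0, 1]
```

In the missing-indicator variant, both `imputed` and `missing` would then be entered in the adjusted outcome model alongside treatment arm; fitting that model is not shown here.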

The clinical and economic efficiency of using routinely recorded common data items has been demonstrated by trials that have used common registries such as SWEDEHEART [68, 69]. Common data items, as identified here and in core outcome sets [70], can be used to ensure that existing primary data capture systems such as EPR systems and registries capture appropriate data for trials, and to inform the planning of such trials. High accuracy and completeness of data are critical for trials; however, it may not be feasible to evaluate such metrics for all data items within a database or registry, and common data items and core outcome sets can be used to target quality assessment at the data items most critical to a range of clinical trials. Ongoing data-enabled pilot trials that use routinely recorded data held in the NNRD [15] should provide prospective data regarding the feasibility of such an approach in the neonatal field.

Conclusion

Neonatal trials in high-impact journals report a common set of non-outcome data items in their primary publications. In the UK, our study indicates that these core non-outcome data can be obtained from the NNRD; the feasibility and efficiency of using routinely recorded EPR data, such as those held in the NNRD, for neonatal clinical trials, rather than collecting these items anew, should be examined. We suggest that when planning primary data collection systems such as EPR systems, registries or clinical databases, consideration is given to fostering a culture of completeness and ensuring that important items are accurately and completely captured.