Background

Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease of unknown aetiology. RA is difficult to diagnose because disease expression varies widely between patients [1–3]. Most outcome measures reported in RA trials are composites. The two most commonly used are the Disease Activity Score for 28 joints (DAS28), a continuous measure of current disease activity, and the American College of Rheumatology (ACR) response criteria, a binary indicator of change in disease activity over time. ACR20, ACR50 and ACR70 responses are defined as at least 20 %, 50 % and 70 % improvement, respectively, in both the tender and swollen joint counts and in at least three of the five remaining ACR core set measures [4, 5].
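
To make the binary ACR definition concrete, the sketch below classifies one patient as an ACR20/50/70 responder from baseline and follow-up values of the seven core set measures. The dictionary keys are hypothetical names chosen for illustration, and treating a zero baseline as "no improvement" is an assumption of this sketch, not part of the published criteria.

```python
# Hypothetical keys for the seven ACR core set measures; for every measure
# a lower value means less disease activity, so a decrease is an improvement.
JOINT_COUNTS = ("tender_joints", "swollen_joints")
OTHER_MEASURES = ("patient_global", "physician_global", "pain",
                  "haq_disability", "acute_phase")


def pct_improvement(baseline: float, follow_up: float) -> float:
    """Percentage improvement from baseline (positive means better)."""
    if baseline == 0:
        return 0.0  # assumption: no measurable improvement from a zero baseline
    return 100.0 * (baseline - follow_up) / baseline


def acr_response(baseline: dict, follow_up: dict, threshold: float = 20.0) -> bool:
    """ACR20 (threshold=20), ACR50 (50) or ACR70 (70) responder status:
    at least `threshold` % improvement in both joint counts and in at
    least three of the five remaining core set measures."""
    joints_ok = all(pct_improvement(baseline[m], follow_up[m]) >= threshold
                    for m in JOINT_COUNTS)
    n_other = sum(pct_improvement(baseline[m], follow_up[m]) >= threshold
                  for m in OTHER_MEASURES)
    return joints_ok and n_other >= 3


baseline = dict(tender_joints=10, swollen_joints=8, patient_global=60,
                physician_global=50, pain=70, haq_disability=1.5, acute_phase=30)
week_24 = dict(tender_joints=7, swollen_joints=6, patient_global=45,
               physician_global=40, pain=50, haq_disability=1.4, acute_phase=29)
print(acr_response(baseline, week_24))        # True  (ACR20)
print(acr_response(baseline, week_24, 50.0))  # False (ACR50)
```

Because the response is derived from seven measures, a patient missing any one of them cannot be classified without some form of imputation, which is the issue examined in this review.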

A composite endpoint (or outcome) combines several single endpoints into one outcome. Composite endpoints have been discussed extensively in the trial literature; for example, they are used as time-to-event endpoints in cardiovascular, cancer, diabetes and HIV studies [6, 7], where the composite may combine mortality with non-fatal events such as hospitalisation and cardiac arrest in patients with chronic heart disease [8]. The main advantages of composite outcomes are statistical efficiency and increased precision of risk ratios and other parameter estimates, which arise from a higher event rate and in turn reduce the sample size needed when designing the trial [9]. Senn and Julious, however, criticised the use of composite response measures, arguing that their use should be carefully thought through and accompanied by consideration of their components [10].

Randomised controlled trials (RCTs) are the gold standard study design for evaluating treatment efficacy. In trials in which the outcome is measured repeatedly on the same patients over time, some patients nearly always have missing values by the end of follow-up. A missing value is an observation that was intended to be collected from a study subject but, for a variety of reasons, was not [11]. When imputation is not used, missing data reduce the size of the analysed sample and hence the statistical power to detect effects. In addition, the remaining analysed sample may no longer be representative of the recruited sample, which may bias the treatment effect estimates. In two-arm trials, for example, these losses and biases can differ between arms and arise for reasons connected with the changing disease outcome, increasing the potential for incorrect or misleading conclusions from the randomised comparison if the missing data are handled inappropriately.

A survey of RCTs in all medical fields published in four major medical journals in 1999 found that a quarter of trials had more than 10 % of primary outcome responses missing [12]. A similar review in 2004 [13] found that 89 % of trials reported partially missing data, that is, some but not all data were available for an individual. The review also found unexpectedly frequent use of overly simple methods for handling missing data that ignored the partially available data. Furthermore, 79 % of trials did not report a sensitivity analysis, as recommended by the Consolidated Standards of Reporting Trials (CONSORT) statement [14, 15].

This paper was motivated by current practice for handling and reporting missing data in RA trials, which has typically relied on outdated single imputation methods such as last observation carried forward (LOCF). Non-responder imputation (NRI) is another single imputation method, which classifies a subject with a missing binary or categorical outcome as a non-responder. NRI therefore assumes that missing values are treatment failures, and this assumption goes unquestioned unless a sensitivity analysis is also undertaken to explore its impact on the results. The aims of this review were to assess the range of missing data rates in primary composite outcomes and to document current practice for handling and reporting missing data in published RA trials, compared with CONSORT recommendations.
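
The two single imputation methods can be sketched in a few lines. This is an illustration only, assuming a hypothetical data layout in which each patient's visits are a list and a missing value is `None`; it is not code from any of the reviewed trials.

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: each missing value (None) is replaced
    by the most recent observed value; values before the first observation stay missing."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def nri(responses: List[Optional[bool]]) -> List[bool]:
    """Non-responder imputation: a missing binary response counts as a non-response."""
    return [False if r is None else r for r in responses]


# DAS28 at weeks 0, 12 and 24 for a patient who dropped out after week 12
print(locf([5.8, 4.1, None]))            # [5.8, 4.1, 4.1]
# ACR20 status at week 24 for four patients, two of whom are missing
responses = nri([True, None, False, None])
print(sum(responses) / len(responses))   # 0.25
```

LOCF implicitly assumes no change after dropout, and NRI assumes every missing patient failed; in both cases the imputed values are then treated as if they had been observed.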

Methods

Data sources and search strategy

Trials were identified by searching PubMed and other resources (hand-searched individual journal websites, Web of Science, Cochrane Central Register of Controlled Trials and Google Scholar). The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines were followed for reporting the review methodology [16]. The search terms are given in Additional file 1: section A.

Selection of studies

Studies were included in the review if they met all of the following criteria: phase 3, double-blind RCT conducted in adults with RA; English-language paper published between January 2008 and December 2013 in one of four rheumatology journals (Annals of the Rheumatic Diseases, Arthritis & Rheumatism, Arthritis Research & Therapy and Rheumatology) or four high-impact general medical journals (Lancet, New England Journal of Medicine, Journal of the American Medical Association and the British Medical Journal); and a composite outcome measure reported as the primary outcome.

Data extraction

Data extracted from the papers included: year of publication; journal; source of funding; primary and secondary outcomes; trial design; sample size calculation and whether it allowed for dropout; amount of missing information (proportion of missing outcome data in each treatment group after randomisation); method of handling missing primary outcome data; analysis population (intention-to-treat (ITT), modified ITT, per protocol, complete or available cases); statistical method used to analyse the primary outcome; sensitivity analyses; participant flow diagram (e.g. number randomised in each group, number included in the ITT analysis, number completing, and number withdrawn or lost to follow-up); and study follow-up time. ITT is an approach to analysing trial data in which subjects are analysed in their original randomised group, irrespective of whether they received the intervention.

Statistical analysis

Data manipulation and analyses were carried out in Stata (version 13.0, StataCorp, College Station, TX). The proportion of missing primary outcome data at the primary time point was defined as one minus the number of patients who completed the trial divided by the number of patients in the ITT analysis. The treatment group presented here combines all active treatment groups in each trial; e.g. if a trial had three arms, two active and one placebo, the numbers in the two active arms were combined. To estimate the differential rate of missing primary outcome data, the relative rate of missing data was defined as the rate of missing primary outcome data in the placebo group divided by that in the treatment group. Summary statistics were used to describe the characteristics of the included studies. Chi-square or Fisher’s exact tests were used to compare proportions between categories.
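
The two derived quantities can be written as p_missing = 1 − n_completed / n_ITT and relative rate = p_missing(placebo) / p_missing(pooled active arms). The analyses themselves were run in Stata; the Python sketch below only restates these definitions, and the `Arm` record and the trial counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arm:
    n_itt: int          # patients in the ITT analysis
    n_completed: int    # patients who completed the trial
    placebo: bool = False


def missing_proportion(n_itt: int, n_completed: int) -> float:
    """Proportion missing at the primary time point: 1 - completed / ITT."""
    return 1.0 - n_completed / n_itt


def relative_missing_rate(arms: List[Arm]) -> float:
    """Missing rate in the placebo arm(s) divided by the rate in the pooled
    active arms; assumes at least one arm of each kind and some missing data
    in the active arms."""
    rates = {}
    for is_placebo in (True, False):
        group = [a for a in arms if a.placebo == is_placebo]
        rates[is_placebo] = missing_proportion(sum(a.n_itt for a in group),
                                               sum(a.n_completed for a in group))
    return rates[True] / rates[False]


# Hypothetical three-arm trial: two active doses are pooled, as in the review
trial = [Arm(100, 88), Arm(100, 84), Arm(100, 76, placebo=True)]
print(round(relative_missing_rate(trial), 2))  # 0.24 / 0.14 -> 1.71
```

Pooling the active arms by summing counts, rather than averaging their proportions, weights each arm by its size, which matches combining "the numbers in the two active treatment groups".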

Results

Study selection

The initial database search identified 297 unique studies, of which 196 were selected for full-text examination. A further 145 articles were excluded at this stage, leaving 51 papers (see Fig. 1); most were excluded because the study was not phase 3 or was published in another journal.

Fig. 1

PRISMA flow diagram for study selection

Characteristics of included studies

Of the 51 trials published between 2008 and 2013, 23 (45 %), 11 (22 %) and 17 (33 %) were published in 2008–2009, 2010–2011 and 2012–2013, respectively (see Table 1). Fifty of the 51 trials (98 %) had a parallel design; of these, 26 (52 %) were two-arm comparisons, 14 (28 %) three-arm comparisons and the remaining 10 (20 %) had more than three arms.

Table 1 Characteristics of the trials included in the review

The binary ACR20 responder index was the most frequently reported primary composite outcome, used in 35 trials (69 %), followed by DAS28-ESR in 12 trials (24 %), the ACR50 responder index in three trials (6 %) and DAS28-CRP in one trial. The DAS28 measures are continuous composites, and the ACR indices are binary composites.
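
For comparison with the binary ACR indices, the sketch below computes the continuous DAS28-ESR from its four components using the standard published formula. It returns `None` when any component is missing, to show that a single missing constituent leaves the whole composite missing; this handling is an illustrative choice, not a recommendation.

```python
import math
from typing import Optional


def das28_esr(tjc28: Optional[int], sjc28: Optional[int],
              esr: Optional[float], global_health: Optional[float]) -> Optional[float]:
    """DAS28-ESR = 0.56*sqrt(TJC28) + 0.28*sqrt(SJC28) + 0.70*ln(ESR) + 0.014*GH,
    with ESR in mm/h (> 0) and patient global health GH on a 0-100 mm scale.
    Returns None if any component is missing."""
    if any(x is None for x in (tjc28, sjc28, esr, global_health)):
        return None
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr) + 0.014 * global_health)


print(round(das28_esr(4, 4, 20.0, 50.0), 2))  # 4.48
print(das28_esr(4, 4, None, 50.0))            # None: ESR not measured
```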

The median trial duration was 24 weeks (interquartile range (IQR) 14–26 weeks). Across trials, the mean age and mean disease duration of participants at baseline ranged from 46.4 to 60.0 years and from 0.2 to 16.9 years, respectively.

CONSORT flow diagram

A participant flow diagram was reported in 43 trials (84 %). Use of the recommended flow diagram rose from 74 % in 2008–2009 (before the CONSORT revision in 2010) to 93 % in 2010–2013 (after the revision). All nine trials published in the general medical journals reported a flow diagram, whereas eight trials published in the specialist rheumatology journals did not.

Sample size calculation

Forty-three trials (84 %) reported a sample size calculation. Of these, only 9 (21 %) allowed for dropout in the power calculation. The anticipated dropout rate ranged from 5 to 25 %, and 4 of these 9 trials (44 %) underestimated the dropout rate (Fig. 2). The proportion of trials reporting a sample size calculation was similar before and after the CONSORT revision (87 % versus 82 %, p = 0.715). Thirty-seven trials (73 %) reported both a sample size calculation and a participant flow diagram.
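
The consequence of underestimating dropout can be sketched numerically. The example below assumes one common convention for the allowance (recruit n / (1 − d)) and a normal-approximation power formula for comparing two response proportions; the response rates, sample size and dropout figures are hypothetical, not taken from any reviewed trial.

```python
import math
from statistics import NormalDist


def inflate_for_dropout(n_evaluable: int, dropout: float) -> int:
    """Patients to recruit so that about n_evaluable remain after the
    anticipated dropout proportion (convention assumed here: n / (1 - d))."""
    return math.ceil(n_evaluable / (1.0 - dropout))


def power_two_proportions(p1: float, p2: float, n_per_arm: float,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two response
    proportions (normal approximation, unpooled variance)."""
    z = NormalDist()
    se = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return z.cdf(abs(p1 - p2) / se - z.inv_cdf(1 - alpha / 2))


# Hypothetical design: ACR20 rates of 50 % vs 30 %, 120 evaluable patients per arm
recruit = inflate_for_dropout(120, 0.10)   # allows for 10 % dropout -> 134 per arm
completers = recruit * (1 - 0.25)          # but 25 % actually drop out
print(recruit,
      round(power_two_proportions(0.5, 0.3, 120), 2),
      round(power_two_proportions(0.5, 0.3, completers), 2))
```

With fewer completers than planned, the achieved power falls below the design target; single imputation restores the nominal sample size but not the lost information.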

Fig. 2

Observed and anticipated dropout rates. Each dot represents one trial; only 9 trials included a dropout rate in the sample size calculation

Reporting of both a sample size calculation and a participant flow diagram tended to improve after the introduction of the revised CONSORT statement (Table 2), although the improvement was not statistically significant at the 5 % level. Before the CONSORT revision, 16 of 23 trials (70 %) reported both a sample size calculation and a participant flow diagram; after the revision, the proportion was slightly higher, 21 of 28 (75 %).
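
The review ran its Fisher's exact tests in Stata. As an independent illustration of how such a comparison works, the pure-Python sketch below computes a two-sided Fisher's exact p-value for a 2×2 table, applied here to the before/after counts for reporting both items; it is not the authors' code, and its p-value is not reported in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # small relative tolerance so ties in probability are not lost to rounding
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: before / after the CONSORT revision; columns: reported both items / did not
print(fisher_exact_two_sided(16, 23 - 16, 21, 28 - 21))
```

The observed counts lie close to those expected under no association, so the p-value is well above 0.05, consistent with the lack of a significant improvement.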

Table 2 Conformity with missing data-related CONSORT items by year of trial publication

Intention-to-treat analysis

Most trials stated that the primary analyses were based on the ITT population: 44 trials (86 %) reported using ITT analyses, while the remaining 7 (14 %) analysed the primary outcome in a modified ITT population, defined as all randomised patients who had taken at least one dose of study medication and had a baseline and at least one other visit; see Additional file 1: Table S1.

Extent of missingness in the primary composite outcomes (reporting of missing data)

Missing values were present in the primary composite outcome in all 51 trials. The median rate of missing primary composite outcome data was 17 % (IQR 10–25 %), with a wide range from 2.1 to 52.7 %. Typically, therefore, 17 % of the primary composite outcome data in ITT analyses were imputed, and the proportion was considerably higher in some trials. The rate of imputed primary outcome data was >30 % in 9 trials (18 %), >20–30 % in 11 trials (22 %), 10–20 % in 18 trials (35 %) and <10 % in 13 trials (25 %).

Imputation details of the primary composite endpoints (missing data handling)

Among the 38 trials with binary composite outcomes, imputation of the whole composite used NRI in 29 trials (76 %) and LOCF in 6 (16 %). The remaining 3 trials used no imputation method, although they had 10–13 % missing primary outcome data. Similarly, in the 13 trials that reported DAS28 (a continuous composite with mixed continuous and binary constituents), both single imputation methods were prominent, with more trials using NRI, 9 (69 %), than LOCF, 4 (31 %).

Some trials used both imputation methods for their outcomes. For example, of the 38 trials that reported using NRI, 23 (61 %) also used LOCF. One trial used both LOCF and multiple imputation (MI). MI is a general statistical method for analysing incomplete data. It imputes each missing value several times, each imputation being a value drawn at random from a distribution of plausible values determined from the observed data; the completed datasets are then analysed separately and the results combined [11].
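
The combining step of MI is usually done with Rubin's rules: the pooled estimate is the mean of the m completed-data estimates, and its total variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The sketch below implements only this pooling step; the imputation model that generates the completed datasets is not shown, and the numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m >= 2 completed-data analyses with Rubin's rules.
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b


# Hypothetical treatment differences in ACR20 response from m = 3 imputed datasets
est, total_var = rubins_rules([0.20, 0.24, 0.22], [0.0016, 0.0016, 0.0016])
print(round(est, 3), round(total_var, 5))
```

The between-imputation term is what single imputation omits: analysing one filled-in dataset reports only the within-imputation variance, which is why methods such as LOCF and NRI understate uncertainty.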

Four trials reported using LOCF to impute the joint count components only [17–20]. A further two trials reported imputing the components of the ACR core set using LOCF [21, 22], while using NRI for patients missing the whole primary endpoint (ACR20 responder index).
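
The hybrid strategy described above, LOCF at the component level followed by NRI for the whole composite, can be sketched as follows. The visit layout and measure names are hypothetical; `locf_components` defaults to the joint counts, as in [17–20], and could list all seven core set measures to mimic [21, 22].

```python
from typing import Dict, List, Optional, Sequence

JOINT_COUNTS = ("tender_joints", "swollen_joints")
OTHER_MEASURES = ("patient_global", "physician_global", "pain",
                  "haq_disability", "acute_phase")


def last_observed(history: List[Optional[float]]) -> Optional[float]:
    """LOCF value at the final visit: the most recent non-missing value."""
    observed = [v for v in history if v is not None]
    return observed[-1] if observed else None


def acr20_with_hybrid_imputation(baseline: Dict[str, float],
                                 histories: Dict[str, List[Optional[float]]],
                                 locf_components: Sequence[str] = JOINT_COUNTS) -> bool:
    """ACR20 at the final visit: LOCF for the listed components only,
    then NRI if any component is still missing."""
    final = {m: last_observed(h) if m in locf_components else h[-1]
             for m, h in histories.items()}
    if any(final[m] is None for m in JOINT_COUNTS + OTHER_MEASURES):
        return False  # NRI applied to the whole composite

    def improved(m: str) -> bool:
        return baseline[m] > 0 and (baseline[m] - final[m]) / baseline[m] >= 0.20

    return (all(improved(m) for m in JOINT_COUNTS)
            and sum(improved(m) for m in OTHER_MEASURES) >= 3)


baseline = dict(tender_joints=10, swollen_joints=8, patient_global=60,
                physician_global=50, pain=70, haq_disability=1.5, acute_phase=30)
# Visits at weeks 12 and 24; the joint counts were not recorded at week 24
histories = dict(tender_joints=[6, None], swollen_joints=[5, None],
                 patient_global=[50, 40], physician_global=[45, 35],
                 pain=[60, 50], haq_disability=[1.4, 1.4], acute_phase=[29, 28])
print(acr20_with_hybrid_imputation(baseline, histories))                      # True
print(acr20_with_hybrid_imputation(baseline, histories, locf_components=()))  # False
```

The same patient is a responder under one rule and a non-responder under the other, which illustrates why the imputation level (component or whole composite) needs to be reported.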

Simple univariate methods were used to analyse the binary primary composite outcomes: the Cochran–Mantel–Haenszel test in 24 trials (47 %), a simple descriptive comparison (Fisher’s exact or chi-square test) in 15 trials (29 %) and a binomial comparison in 4 trials (see Table 3). Only one trial used the repeated measurements of the outcome over time, in a mixed model analysis [23].
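
For readers less familiar with the most common test, the sketch below computes the Cochran–Mantel–Haenszel chi-square statistic (1 degree of freedom, with continuity correction) for responder/non-responder counts stratified by a baseline factor. The strata and counts are hypothetical; the test assumes the imputed responses are already in place and adds no uncertainty for them.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d) = [[a, b], [c, d]]


def cmh_test(tables: Sequence[Table], correction: bool = True) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel chi-square (1 df) for K stratified 2x2 tables.
    Rows: treatment / control; columns: responder / non-responder."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                                   # E(a) under H0
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    dev = abs(sum_a - sum_e) - (0.5 if correction else 0.0)
    stat = max(dev, 0.0) ** 2 / sum_v
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


# Hypothetical ACR20 counts in two baseline strata
print(cmh_test([(30, 10, 10, 30), (25, 15, 12, 28)]))
```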

Table 3 Test statistics used to analyse the primary outcome

Differential rate of missing outcomes between treatment and placebo groups

Of the 51 trials, 34 (67 %) were placebo controlled. There were notable differences in the rate of missing primary composite outcome data between treatment and placebo groups: the median rate was 14 % in the treatment groups and 24 % in the placebo groups. A formal comparison in these 34 trials gave a relative rate of 1.61 (95 % CI: 1.29, 2.02), indicating that the rate of missing primary composite outcome data was on average 61 % higher in the placebo group than in the treatment group (Fig. 3). In the remaining 17 trials, which were not placebo controlled, the median dropout rate was 11 %, lower than in the placebo-controlled trials. Table 4 shows the relative rate of missing primary composite outcome data by length of follow-up. The excess dropout in the placebo group appeared slightly stronger in trials with 6 months or longer follow-up, although this trend was not statistically significant (trend test p = 0.296).

Fig. 3

Differential rate of missing data for the primary composite outcome in placebo controlled trials. The line of equality represents no difference between groups; each dot represents one trial. There are more data points above the line, which indicates that trials generally have a higher rate of dropout in the placebo group than in the treatment group

Table 4 Rates of missing data in placebo relative to treatment group in placebo controlled trials (n = 34)

Sensitivity analysis for assessing the handling of missing data

Of the 51 trials, 14 (27 %) reported a sensitivity analysis (see Additional file 1: Table S1 for full descriptions). Sensitivity analyses were reported in 38 % (11/29) of trials with more than 15 % missing primary outcome data and in 14 % (3/22) of trials with less than 15 %.
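
A simple sensitivity analysis for a binary primary outcome re-estimates the treatment difference under different assumptions about the missing responses. The sketch below contrasts NRI in both arms, a complete-case analysis and a worst case for treatment (missing treated patients fail, missing placebo patients respond). The scenarios and data are hypothetical illustrations and are not the analyses reported by the 14 trials.

```python
from typing import Dict, List, Optional


def response_rate(responses: List[Optional[bool]], missing_as: Optional[bool]) -> float:
    """Response proportion with missing values (None) counted as failures
    (missing_as=False, i.e. NRI), as successes (True), or excluded (None)."""
    if missing_as is None:
        observed = [r for r in responses if r is not None]
        return sum(observed) / len(observed)
    return sum(missing_as if r is None else r for r in responses) / len(responses)


def sensitivity_analysis(treatment: List[Optional[bool]],
                         placebo: List[Optional[bool]]) -> Dict[str, float]:
    """Treatment-minus-placebo difference in response rate under several
    assumptions about the missing responses (treatment arm, placebo arm)."""
    scenarios = {
        "NRI in both arms": (False, False),
        "complete cases only": (None, None),
        "worst case for treatment": (False, True),
    }
    return {name: response_rate(treatment, t) - response_rate(placebo, p)
            for name, (t, p) in scenarios.items()}


# Hypothetical trial with more missing responses in the placebo arm
treatment = [True] * 12 + [False] * 6 + [None] * 2
placebo = [True] * 5 + [False] * 10 + [None] * 5
for name, diff in sensitivity_analysis(treatment, placebo).items():
    print(f"{name}: {diff:+.2f}")
```

Here the estimated difference is +0.35 under NRI, +0.33 for complete cases and +0.10 in the worst case; reporting such a range shows readers how much the conclusion depends on the missing data assumptions.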

Missing data mechanism

None of the trials discussed the impact that the handling of missing data might have had on the primary analysis. A first step is to report the assumed missing data mechanism, which describes the relationship between the probability that data are missing and the observed and unobserved data. Only four trials in this review mentioned the missing data mechanism, and two of these gave further details in supplementary material [23–26].
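
Why the mechanism matters can be shown with a small simulation. The sketch below generates a standard normal outcome with true mean 0 and compares the complete-case mean when values are missing completely at random (MCAR) with the case where higher values, read here as higher disease activity, are more likely to be missing (missing not at random, MNAR). The missingness probabilities and thresholds are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean
from typing import Tuple


def complete_case_means(n: int = 20_000, seed: int = 1) -> Tuple[float, float]:
    """Complete-case mean of a standard normal outcome (true mean 0) when
    values go missing MCAR versus MNAR."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # MCAR: every value has the same 30 % chance of being missing
    mcar = [v for v in y if rng.random() >= 0.30]
    # MNAR: values above 0.5 (say, higher disease activity) go missing 60 % of the time
    mnar = [v for v in y if not (v > 0.5 and rng.random() < 0.60)]
    return mean(mcar), mean(mnar)


mcar_mean, mnar_mean = complete_case_means()
print(f"MCAR: {mcar_mean:+.3f}  MNAR: {mnar_mean:+.3f}")
```

Under MCAR the observed data remain representative and the complete-case mean stays close to 0; under this MNAR scenario it is pulled clearly below 0, and no method that uses only the observed values can detect this without an explicit assumption about the mechanism.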

Discussion

Our results show that the trials in this review had a wide range of rates of missing values in the primary composite outcome. The methods used to handle missing data were often not clearly reported, particularly for the components of the composite, and the reporting of these trials often did not follow the recommendations of the CONSORT guidelines [14, 15]. Most trials provided a flow diagram giving the number of patients included in the analysis of the overall primary composite outcome. Although informative, such a diagram does not include all the required information: for example, subjects may still be in the trial, and so be counted at the primary time point, yet have a missing measurement. Furthermore, as these trials are currently reported, we know how many subjects are missing data on the overall composite, but not which of the individual components are missing.

The level of missing composite outcome data varied from 2 to 53 %, and typically 17 % of primary composite outcome data in ITT analyses were imputed. The extent of missing data in the components of the composite was unknown. This is crucial information, as the individual components may have different amounts of missing data, or partially available data that could be used to inform the primary outcome. We found differential rates of missing data between treatment and placebo groups in the placebo-controlled trials: the rate was on average 61 % higher in the placebo group than in the treatment group.

Most of the trials (43/51) provided a power calculation. However, only 9 trials included a dropout rate in their power calculation, and 44 % of these underestimated the dropout rate. The remaining 34/43 trials did not allow for any dropout, yet the median rate of missing primary outcome data was high, 18 % (IQR 12–25 %), which clearly reduces the power of the study; the use of single imputation could be argued, unsatisfactorily, to overcome this loss of sample size. Moreover, some trials analysed a modified ITT population, which further reduces the numbers compared with an ITT analysis. Although restricting analysis to a modified ITT population is not expected to introduce bias per se, it will most likely reduce the power of the trial, may nonetheless introduce bias and could be contrary to the spirit of the ITT principle [27].

As recently as 2013, methods known to bias trial results, i.e. LOCF and NRI, continued to be used, and articles published in 2013 (n = 10) reported the use of both methods in the primary analyses. Single imputation methods, as widely used in this area, are inappropriate for handling missing outcome data in trials because they underestimate the true variability in the data [28]; if missing data are inadequately handled in the primary analysis, the conclusions may therefore be misleading. Other approaches, such as multiple imputation, are strongly recommended by experts in the field of missing data [11, 29, 30].

Some trials in this review reported using LOCF to impute missing joint count constituents of composite outcomes while using NRI for subjects missing the whole primary composite outcome. Surprisingly, most trials did not discuss the missing data mechanism. The choice of method for handling missing data rests on an assumption about the missingness [12], which needs to be discussed in sufficient detail for transparency. It is also important to explore and report the pattern and mechanism of missingness. In our results, 6 trials mentioned the missing data mechanism.

We also found that few trials reported a sensitivity analysis: 14 (27 %). A sensitivity analysis, in addition to the main analysis, is generally recommended to assess the robustness of the findings to plausible assumptions about the missing data [29, 31] and to increase confidence in the validity and generalisability of the results. Moreover, primary outcome data were typically imputed at a rate 61 % higher in the placebo group than in the active treatment group, without the underlying assumptions being challenged.

Our results are similar to those of other studies reporting missing data in primary outcome measures. A recent meta-analysis by Hewitt et al. [32] of quality of life outcomes in musculoskeletal conditions found attrition rates ranging from 4 to 28 %. It also showed differential rates of missing data between arms, ranging from 1 to 14 % in the treatment groups and from 3 to 25 % in the control groups. Our findings also resemble those of reviews carried out a decade ago. Baron et al. [33] found high percentages of missing data on structural outcomes in trials in rheumatic disease, and these trials did not adhere to the ITT principle. The review by Wood et al. [13] also found that inappropriate methods were used to handle missing primary outcome data, and a similarly low percentage of trials reporting sensitivity analyses (21 % versus 27 % in our review).

Our study has some limitations. First, we searched only four rheumatology and four high-impact general medical journals, thereby excluding RA trials published elsewhere, including those in lower-impact journals. Secondly, we excluded 54 articles published in other journals that did not endorse the CONSORT statement, although most of these articles did not report a composite outcome at the primary time point. Thirdly, we restricted our search to trials that reported a composite outcome as the primary endpoint. In addition, because journal space is limited, there may be discrepancies between the methods actually used and those reported. Finally, these are RA-specific trials, so the findings may not generalise to other disease areas where the primary outcomes are not composites.

This study adds to the existing evidence on the reporting and handling of missing outcome data. However, these findings require further investigation, because it is not known whether it is better to impute the whole composite outcome or to impute the individual components and then calculate the composite. Furthermore, there are no guidelines on how to handle missing data in composite outcomes derived from several variables. We therefore designed a simulation study, informed by these results, to answer this question.

Conclusions

This review highlights improvements in rheumatology trial practice since the revision of the CONSORT guidelines in reporting power calculations and participant flow diagrams. However, the reporting and handling of missing data in composite outcomes and their components need to improve. In particular, sensitivity analyses should be more widely used in RA trials, because imputation is widespread, its assumptions and the resulting variability are left unchallenged, and missing data rates are differentially higher in placebo groups in this area.

Recommendations to improve the reporting and handling of missing data in composite outcome data in RA trials

Our recommendations are as follows:

  • Report missing data for the composite and for each of its components during follow-up, in a table showing the proportion missing in each arm at each time point.

  • Choose an appropriate method for handling missing data by considering the range of currently available methods, including multiple imputation and mixed effects models.

  • Discuss the potential missing data mechanism and the observed pattern of missingness to support the chosen methods.

  • As recommended in the CONSORT guidelines, conduct a sensitivity analysis to show the robustness of the primary analysis to the assumptions that are made when handling missing data.
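
The first recommendation can be implemented with a short tabulation over long-format trial data. The sketch below assumes a hypothetical record layout of (arm, week, measure, value), with `None` for a missing value, and reports the percentage missing for each arm, time point and measure, including the composite itself.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical long-format record: (arm, week, measure, value or None if missing)
Record = Tuple[str, int, str, Optional[float]]


def missing_table(records: List[Record]) -> Dict[Tuple[str, int, str], float]:
    """Proportion of missing values for each (arm, week, measure) cell."""
    counts = defaultdict(lambda: [0, 0])          # key -> [n_missing, n_total]
    for arm, week, measure, value in records:
        cell = counts[(arm, week, measure)]
        cell[0] += value is None
        cell[1] += 1
    return {key: n_miss / n_total for key, (n_miss, n_total) in sorted(counts.items())}


def print_missing_table(table: Dict[Tuple[str, int, str], float]) -> None:
    print(f"{'arm':<10}{'week':>5}  {'measure':<16}{'% missing':>10}")
    for (arm, week, measure), prop in table.items():
        print(f"{arm:<10}{week:>5}  {measure:<16}{100 * prop:>9.1f}%")


records = [
    ("active", 24, "tender_joints", 4.0), ("active", 24, "tender_joints", None),
    ("active", 24, "DAS28-ESR", 3.1), ("active", 24, "DAS28-ESR", None),
    ("placebo", 24, "tender_joints", 6.0), ("placebo", 24, "tender_joints", None),
    ("placebo", 24, "DAS28-ESR", None), ("placebo", 24, "DAS28-ESR", None),
]
print_missing_table(missing_table(records))
```

Listing the composite alongside its components shows at a glance when the composite is missing more often than any single component, for example because different patients miss different constituents.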

Abbreviations

ACR, American College of Rheumatology; ANCOVA, analysis of covariance; ANOVA, analysis of variance; BMJ, British Medical Journal; CONSORT, Consolidated Standards of Reporting Trials; CRP, C-reactive protein; DAS28, Disease Activity Score for 28 joints; ESR, erythrocyte sedimentation rate; IQR, interquartile range; ITT, intention-to-treat; JAMA, Journal of the American Medical Association; LOCF, last observation carried forward; NEJM, New England Journal of Medicine; NRI, non-responder imputation; PRISMA, Preferred Reporting Items for Systematic reviews and Meta-Analyses; RA, rheumatoid arthritis; RCT, randomised controlled trial