Introduction

Infection is a major cause of morbidity and mortality in patients with rheumatoid arthritis (RA) [1, 2]. The increased incidence has been attributed to the disease itself, associated factors such as smoking and immunosuppressive therapy, or a combination of these. Glucocorticoid (GC) therapy, still widely used in the treatment of RA [3], is thought to be associated with an increased infection risk as well as other well-established adverse effects [4]. GCs are known to impair phagocyte function and suppress cell-mediated immunity, thereby plausibly increasing the risk of infection [5]. However, the extent to which GC therapy contributes to the observed increased risk in RA is not clear.

Surprisingly, despite six decades of clinical experience [6], no good summary estimates of infectious risk associated with GC therapy in RA populations exist. Systematic reviews have been performed to address the efficacy of GC therapy [7], as well as multiple safety outcomes from RCTs in RA populations [8, 9]. Reviews of safety issues from observational studies tend to be narrative (rather than systematic) reviews, despite the recognition that observational data must complement RCT data when assessing the harms of drug treatments [10]. No systematic reviews or meta-analyses exist that focus on the infection risk associated with GC therapy by combining evidence from RCTs and observational studies.

Our primary aim was to perform a systematic literature review and meta-analysis (where appropriate) of RCTs and observational studies to assess the association between systemic GC therapy and the risk of infection in patients with RA, compared with patients with RA not exposed to GC therapy. Secondary aims were to examine the influence of study design, definition of GC exposure, and type of infection.

Materials and methods

Search strategy

A search was conducted in MEDLINE, EMBASE, CINAHL, and the Cochrane Central Register of Controlled Trials (Clinical Trials; CENTRAL) database to January 2010 to identify studies among populations of patients with RA that reported a comparison of infection incidence between patients treated with GC therapy and patients not exposed to GC therapy.

Published studies were identified by using separate search strategies for RCTs and observational studies. The full search strategy can be found in Additional file 1. In brief, all GC RCTs for RA were sought. Observational studies were identified by using the broad keyword areas of "rheumatoid arthritis," "infection," and "antirheumatic therapy," limiting the search to epidemiologic studies. An initial search strategy of "GC therapy," as opposed to "antirheumatic therapy," missed many studies in which the association between GCs and infections was reported, but in which GC therapy was not included in the title, abstract, or as a key word. Exposure was limited to systemic GC therapy: studies that reported only intra-articular steroids were excluded. We considered only articles published in English because of the need to screen large numbers of publications by using the complete manuscript. Hand searching of reference lists from obtained articles and selected review articles also was performed. Abstract-only publications and unpublished studies were not considered. No authors were contacted for additional information.

Study selection

The first selection, based on title and abstract, was done by one reviewer (WGD). Studies conducted exclusively in non-RA populations were excluded. Studies with designs other than RCTs, case-control, or cohort studies were excluded at this stage, as were studies of nonsystemic GC therapy. RCTs that did not randomize GC therapy were excluded. Case-control studies defined by any outcome except infection also were excluded. The full manuscripts of all remaining articles were obtained. Any uncertainty during initial screening led to retention of the article for eligibility assessment.

Eligibility assessment was then performed independently by two reviewers (WGD and MH), applying the following final study-inclusion criteria. For RCTs: (1) study population of patients with RA or undifferentiated inflammatory polyarthritis, (2) exposure to systemic GC therapy (that is, excluding intra-articular and tendon-sheath injections) in one arm and nonexposure in a further study arm (that is, in which the only major difference between the arms was the use of GC), and (3) reporting of infection numbers or rates in the two relevant study arms. If studies reported additional arms examining the effect of an alternative active treatment, data were analyzed only for the arms comparing GC therapy with no-GC exposure. If studies were explicit in describing the methods by which they captured infection, nonreporting of infection within the results was assumed to represent no infections in either group. Absent reporting of infection that was in any way ambiguous led to exclusion of the study. Studies that reported only adverse events leading to drug discontinuation were included, although grouped separately. For observational studies: (1) assessment of infection risk in a population (or subpopulation) of patients with RA or undifferentiated inflammatory polyarthritis, (2) use of a cohort or case-control design to conduct data analysis, and (3) provision of a relative-risk or rate-ratio estimate for the association between systemic GC therapy and infection with a corresponding 95% confidence interval (or sufficient data to calculate this) were required. These criteria allowed inclusion of open-label extension studies if they analyzed infection risk with GC therapy compared with no-GC therapy. Helicobacter pylori infection was excluded. Disagreements were resolved by discussion.

Data extraction and meta-analysis

Data on the number of infections or the estimated relative risks were extracted by one reviewer (WGD), along with characteristics of the studies. Extracted data were cross-checked against notes made by both reviewers during the eligibility assessment, with resolution by discussion in the few instances of disagreement. Information on categorization of GC exposure and types of infection was collected.

Meta-analysis was conducted for RCTs and observational studies separately. RCT meta-analysis was performed initially including all studies, followed by a series of a priori sensitivity analyses. In the main analysis, all GC-treated arms were combined. Because of the low number of events and the sensitivity of the default weighting (the inverse of the variance of the logarithm of the odds ratio) to the definition of infection (for example, serious or not serious), alternative weighting was performed by number of patients, then by estimated person years of follow-up. To avoid excluding studies in which zero events were found in both arms, a sensitivity analysis was performed after adding 0.5 to all cells of the 2 × 2 table. Additional sensitivity analyses included limiting studies to GC doses of < 10 mg prednisolone equivalent (PEQ), limiting outcomes to serious infections, and excluding studies reporting only events leading to study withdrawal. If studies reported more than one type of infection, sensitivity analyses were performed to examine the influence of using alternative definitions. Different analysis methods were considered, given the statistical challenge of rare events [11], including the Mantel-Haenszel odds ratio (with and without zero-cell correction), inverse variance, and weighting by study size.
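The pooling options named above can be illustrated with a minimal sketch. It is not the Stata code used for the review. The function names and example counts are hypothetical, and the rule of adding 0.5 to all four cells of any table containing a zero is an assumption about how the correction would be applied.

```python
import math


def log_or_and_variance(a, b, c, d, correction=0.5):
    """Return (log odds ratio, variance) for one 2 x 2 table.

    a, b: infections / no infections in the GC arm
    c, d: infections / no infections in the comparison arm
    Assumption: the correction is added to all four cells of any
    table that contains a zero cell.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_or_inverse_variance(tables, correction=0.5):
    """Fixed-effect pooled OR with a 95% CI, weighting by 1 / var(log OR)."""
    weighted_sum = total_weight = 0.0
    for table in tables:
        log_or, var = log_or_and_variance(*table, correction=correction)
        weighted_sum += log_or / var
        total_weight += 1 / var
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return tuple(math.exp(x) for x in (pooled, pooled - 1.96 * se, pooled + 1.96 * se))


def pooled_or_mantel_haenszel(tables):
    """Mantel-Haenszel pooled OR without zero-cell correction."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical trials: (GC events, GC non-events, control events, control non-events)
example_tables = [(5, 95, 4, 96), (0, 50, 0, 50), (2, 48, 3, 47)]
print(pooled_or_inverse_variance(example_tables))
print(pooled_or_mantel_haenszel(example_tables))
```

In this sketch, a trial with no events in either arm adds nothing to the Mantel-Haenszel sums, whereas the corrected inverse-variance estimate retains it with a log odds ratio of zero and a large variance.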

A meta-analysis of all observational studies was performed, stratified by study design (cohort and case control). If several strata of exposure (for example, 0 to 5, 5 to 10, and > 10 mg PEQ) were presented in the absence of an overall effect measure, one reported category was selected for the meta-analysis. If three categories were reported, the middle category was chosen. If only two categories were reported, the category with the larger number of patients or person time was selected. Random-effects models were used to account for between-study heterogeneity by using the DerSimonian and Laird method [12]. Similarity between the risk ratio and the odds ratio was assumed because infectious events were considered rare. Again, several a priori sensitivity analyses were conducted. With respect to exposure, dose-specific analyses were performed, as well as limiting analysis to studies considering only current GC exposure. Adjusted and unadjusted analyses were considered separately, as well as exploration of the impact of different components of multivariate adjustment (age and sex, disease severity, disease duration, comorbidity, and other RA therapies). Several specific outcomes were considered separately, including all-site serious infections, lower-respiratory-tract infections, tuberculosis, herpes zoster, and postoperative infections. In response to reviewers' comments, we also performed a sensitivity analysis of serious infections reported in prospective studies.
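As an illustration of the random-effects step, the following sketch implements the DerSimonian and Laird moment estimator on the log scale. The study estimates are hypothetical, and recovering each standard error from a reported 95% confidence interval is an assumption about how study-level data might be prepared, not a description of the authors' code.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(log_effects, variances):
    """Return (pooled log effect, its standard error, tau^2)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    scale = sum_w - sum(w * w for w in weights) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    return pooled, math.sqrt(1 / sum(re_weights)), tau2


# Hypothetical study-level estimates: (RR, lower 95% limit, upper 95% limit)
studies = [(1.4, 1.1, 1.8), (2.1, 1.5, 2.9), (1.2, 0.8, 1.8)]
log_rr = [math.log(rr) for rr, _, _ in studies]
var = [se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
pooled, se, tau2 = dersimonian_laird(log_rr, var)
print(math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se), tau2)
```

When the studies agree closely, the estimated between-study variance is truncated to zero and the result coincides with the fixed-effect inverse-variance estimate.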

Funnel plots were created to examine the potential for small study effects [13]. Statistical heterogeneity was assessed by using the Cochrane I2 statistic [14], in which I2 > 50% represents substantial heterogeneity. All analyses were conducted by using Stata/SE version 11.
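For orientation, the I2 statistic can be computed from Cochran's Q as sketched below. The function names and inputs are hypothetical, and the effects are assumed to be on the log scale with known within-study variances.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """I2 as a percentage: the share of variation beyond that expected by chance."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical log relative risks and their variances
print(i_squared([0.30, 0.65, 0.10, 0.90], [0.02, 0.03, 0.05, 0.04]))
```

Values above 50 would be read as substantial heterogeneity under the threshold stated above.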

Results

A total of 1,568 records were identified through parallel database searching (Figure 1). The results were loaded into an electronic bibliographic management system (EndNote). After removal of duplicates, 1,309 studies were identified and screened by one reviewer (WGD). A total of 430 full-text articles were then assessed for eligibility by two reviewers (WGD and MH). In total, 21 RCTs [15-35] and 42 observational studies [36-77] (33 cohort, nine case-control) were included in the analysis. Details of the studies are described in Tables 1 and AF2 (Additional file 2).

Figure 1

Flow chart demonstrating study selection. GC, glucocorticoid; RA, rheumatoid arthritis; RCT, randomized controlled trial.

Table 1 Summary of GC RCTs reporting infection outcomes

There were 1,963 patients included in the 21 RCTs and 526,629 in the 42 observational studies. The mean study duration was 41 weeks for the RCTs, and the median follow-up time was 1.93 person years per patient for the 30 observational cohort studies for which follow-up time was available.

Main results

RCTs

Among 1,026 GC-treated patients, 59 infections (5.8%) were found, compared with 51 infections (5.4%) among 937 non-GC patients. Ten of 21 studies had no reported infections in either arm, and four further studies had no infections in one of the two arms. The estimated relative risk of infection associated with GC therapy was 0.97 (0.69, 1.36) (Figure 2). No evidence of statistical heterogeneity was present among the included trials (I2 = 0.0%).

Figure 2

Meta-analysis of infection risk in randomized controlled trials of systemic glucocorticoid therapy.

Observational studies

Systemic GC therapy was associated with an increased risk of infections in observational studies (RR, 1.67 (1.49, 1.87)). Risk estimates differed by study design, with cohort studies generating an RR of 1.55 (1.35, 1.79) and case-control studies, 1.95 (1.61, 2.36) (Table 2; Figure 3). However, evidence was noted of substantial statistical heterogeneity (I2 = 76% for observational studies overall, 71% for cohort studies, and 79% for case-control studies).

Table 2 Study design factors within observational studies and their influence on relative risk of infection associated with glucocorticoid therapy
Figure 3

Meta-analysis of infection risk in observational studies, stratified by study design (1, cohort; 2, case-control).

Sensitivity analyses

RCTs

Sensitivity analyses using alternative weighting, different statistical methods of dealing with low event numbers, limiting to studies with a placebo rather than active comparator, and limiting to doses < 10 mg PEQ led to no major change in the results (Additional file 3). Too few studies reported exclusively serious infections, with too few events in those studies, to warrant a robust meta-analysis [18-20]. Studies considered to report predominantly nonserious infection generated an RR of 1.05 (0.89, 1.24). One study included methotrexate in addition to GC therapy in the treatment arm [15]. Exclusion of this study generated an RR of 0.83 (0.57, 1.21).

Observational studies

Stratification by dose category showed a positive dose-response effect. Studies with average doses of < 5 mg PEQ generated an RR of 1.37 (1.18, 1.58) compared with an RR of 1.93 (1.67, 2.23) for 5 to 10 mg PEQ. Only one study reported an RR for doses between 10 and 20 mg PEQ (RR, 2.97 (1.89, 4.67)) [68]. Limiting analyses to dose categories above a certain threshold also led to a dose response: RR, 2.46 (2.08, 2.92) for dose categories > 5 mg PEQ, RR 2.97 (2.39, 3.69) for dose categories > 10 mg PEQ, and RR 4.30 (3.16, 5.84) for dose categories > 20 mg PEQ. Doses of < 10 mg PEQ had a pooled estimate of 1.61 (1.42, 1.84), higher than the risk for studies of dosages < 5 mg PEQ.

Adjustment for age and sex led to an RR of 1.78 (1.58, 2.01) compared with no adjustment (RR 1.32 (0.97, 1.80)) (Table 2). Adjustment for direct measures of disease severity did not lead to much change in the risk estimates when compared with estimates not adjusted for direct measures of disease severity. Disease duration also had little impact on the RR. Adjustment for co-morbidity and for other RA therapies (disease-modifying antirheumatic drugs (DMARDs) and/or biologics) led to estimates ~40% higher than the unadjusted estimates. Limiting analysis to studies defining GC exposure as "current use" generated an RR of 1.70 (1.47, 1.97) (Table 2).

GC therapy was associated with an increased risk of all-site serious infection (RR, 1.89 (1.60, 2.24)), lower-respiratory-tract infections (RR, 2.10 (1.52, 2.91)), tuberculosis (RR, 1.74 (1.09, 2.76)), herpes zoster (RR, 1.74 (1.28, 2.36)) and, to a lesser extent, postoperative infections (RR, 1.38 (1.02, 1.86)). The risk of serious infections persisted when analysis was restricted to prospective studies (RR, 1.70 (1.14, 2.55)). Even with stratification by outcome, notable statistical heterogeneity remained across outcomes (I2 = 82%, 51%, 28%, 86% and 0, respectively).

Publication bias

The funnel plot of RCTs (Figure 4a) was roughly symmetrical, with all studies falling within the 95% CI. The funnel plot for observational studies was less symmetrical and had more outliers (Figure 4b). The Egger test for publication bias was nonsignificant for both the RCTs (P = 0.936) and observational studies (P = 0.174 for cohort studies and P = 0.576 for case-control studies).
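The regression behind the Egger test can be sketched as follows. Each study's standardized effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The function name and data are hypothetical. The P value, which would come from comparing the intercept divided by its standard error with a t distribution on n - 2 degrees of freedom, is deliberately left out to keep the sketch dependency-free.

```python
import math


def egger_regression(effects, standard_errors):
    """Regress effect / SE on 1 / SE; return (intercept, slope, SE of intercept)."""
    x = [1 / se for se in standard_errors]
    z = [y / se for y, se in zip(effects, standard_errors)]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Ordinary least-squares standard error of the intercept
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    intercept_se = math.sqrt(residual_ss / (n - 2) * (1 / n + x_bar ** 2 / sxx))
    return intercept, slope, intercept_se


# Hypothetical log relative risks and standard errors
log_rr = [0.42, 0.55, 0.31, 0.80, 0.65]
se = [0.10, 0.18, 0.12, 0.35, 0.25]
print(egger_regression(log_rr, se))
```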

Discussion

RCTs and observational studies generated different estimates of infection risk associated with GC therapy. The RCT meta-analysis suggested a null association between GC therapy and infection risk (RR, 0.97 (0.69, 1.36)). The confidence interval included both clinically meaningful increased risks (up to 35% increase) and decreased risks (up to a 30% reduction), making the result inconclusive. The observational studies provided an overall RR of 1.67 (1.49, 1.87), suggesting a significant, clinically important increased risk. However, significant heterogeneity was found within the studies. Even after performing multiple sensitivity analyses around exposure definition, outcome, and adjustment for confounders, marked heterogeneity remained a problem. Nonetheless, most analyses of observational studies reported an increased risk of infection, which conflicts with the result of the RCTs. The dose of GC therapy varied both within and between RCTs and observational studies and may contribute to our observed result. However, we were able to perform meta-analyses within both study designs to investigate the risk associated with daily doses ≤ 10 mg PEQ. The differential results between study designs remained. Although it is not yet clear to what extent the risk of infection is influenced by historic (or cumulative) GC therapy, patients in the observational studies are likely to have had longer cumulative exposure than are patients within the short-duration RCTs. This difference may go some way to explaining the apparent discrepancy in the results from the two study designs.

Both study designs had major limitations when addressing infection risk. The big challenges in RCTs were poor reporting of methods and results and the statistical challenge of rare outcomes. For observational studies, heterogeneity, lack of detailed reporting, confounding, and bias (in particular publication bias) were particularly problematic. Other factors affecting the results and interpretation included variability of sampling frame, inclusion and exclusion criteria, definition of comparison groups, and time-varying GC exposure.

Reporting of methods and results in RCTs

GC exposure was usually well defined within RCTs. On occasions, additional GC therapy was allowed at the discretion of the treating physician, and this was rarely quantified. In contrast, safety outcomes from RCTs lacked any standardized reporting of methods or results. Methods sections at times omitted any mention of safety assessment [30, 78] or were too vague to be helpful (for example, "records of ... adverse reactions... were kept") [79]. In the results sections, selective reporting was problematic and included reporting of only pre-selected events (for example, fractures and ophthalmologic complications [80]), events known to be associated with GC therapy [17], events occurring in more than two patients [29], or events leading to withdrawal. Reporting only events with a frequency beyond a certain threshold would miss rare events, potentially imbalanced across multiple studies. Withdrawal studies (in which reporting was complete) provided measures of relative risk that could be included in the analysis. Importantly, exclusion of these studies in a sensitivity analysis did not change the overall results. Vague reporting was also common. Phrases such as "no meaningful toxicities were reported by the participants in either group" [81] or "the proportion of patients who reported adverse reactions [did not] differ between groups according to type of treatment" [79] did not provide sufficient information on infections to warrant inclusion. Reporting of symptoms rather than diagnoses meant we had to decide subjectively (but independently) whether infections were present. We sought to include studies with an infection incidence of zero, only if this was explicit or could be confidently inferred. Although this was ambiguous at times, the use of two independent reviewers made study selection more robust.

Reporting of adverse drug reactions or side effects (with assumed causality) rather than all adverse events (in which causality is not assumed) was common. For a common event such as infection, causality is difficult to establish. Recent guidelines advise "terms that do not imply causality (such as 'adverse events') should be the default term to describe harms, unless causality is reasonably certain" [82].

Nonstandardized reporting in RCTs was a major problem in collating information. Different definitions of infection meant that summary risk estimates were averaged across different outcomes. We attempted to perform sensitivity analyses limited to serious or nonserious infections but were limited by low numbers. Underreporting of nonserious infections was likely: nonserious respiratory infections account for 300 to 400 general practice consultations annually per 1,000 registered patients in the United Kingdom [83]. Applying these rates to the RCTs, for example in the 2-year study of 192 patients by Wassenberg [29], we might expect > 100 nonserious infections. The reported number of infections was only seven.
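The arithmetic behind this expectation can be written out as a short worked example. It assumes, as the text does, that the general-practice consultation rate can be applied directly to the trial population, and it treats each consultation as one nonserious infection. The function name is illustrative only.

```python
def expected_consultations(rate_per_1000_per_year, patients, years):
    """Expected count when a population rate is applied to a trial cohort."""
    return rate_per_1000_per_year / 1000 * patients * years


# 192 patients followed for 2 years, at 300 to 400 consultations per 1,000 per year
low = expected_consultations(300, 192, 2)
high = expected_consultations(400, 192, 2)
print(round(low), round(high))  # roughly 115 to 154, against seven reported
```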

Rare events in RCTs

Much debate has occurred about the analytic and methodologic challenges of conducting meta-analyses to examine rare outcomes [11]. We used a variety of techniques including the Mantel-Haenszel odds ratio (with and without zero-cell correction), inverse variance, and weighting by study size to explore sensitivity to change. Although all methods failed to show a definite harmful or protective effect of GC therapy, all analyses included clinically important harms and benefits within the confidence intervals. GC therapy might be associated with a ≤ 35% increased risk of infection, or a 30% reduction. Although GCs are widely thought to increase the risk of infection, it is plausible that they might decrease the risk at these lower doses by controlling disease severity. The broad confidence intervals that span regions of clinically important effects in both directions are a consequence of low numbers of events, despite a meta-analysis of all existing studies.

Inconsistent capture or reporting of infections has an impact on the weighting of studies within a meta-analysis. Fewer events within a study result in an increased variance and thus a lower weighting. We therefore applied alternative weightings including total number of patients and estimated total person time, so studies with high numbers of patients but few infections would contribute more weight to the meta-analysis. For example, a 2-year study of 250 patients with one discontinuation for infection [33] contributed only 2.7% weight to the original meta-analysis, but increased to 17.6% when weighted by numbers of patients or 23.2% by person-time. The absence of a significantly increased risk in these sensitivity analyses is reassuring, although again, we cannot conclude that GCs are not associated with an increased (or decreased) risk of infection: the confidence intervals included up to a 70% increased or decreased risk, which is clinically meaningful.
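The effect of the weighting scheme on a large trial with few events can be illustrated with a sketch. The trial counts are hypothetical and are not those of the included studies. The variance formula is the usual large-sample variance of the log relative risk, and adding 0.5 to each cell when an arm has no events is an assumption.

```python
def inverse_variance_weight(events_gc, n_gc, events_ctl, n_ctl):
    """1 / var(log RR); 0.5 is added to each cell when an arm has no events."""
    if events_gc == 0 or events_ctl == 0:
        events_gc, events_ctl = events_gc + 0.5, events_ctl + 0.5
        n_gc, n_ctl = n_gc + 1, n_ctl + 1
    variance = 1 / events_gc - 1 / n_gc + 1 / events_ctl - 1 / n_ctl
    return 1 / variance


def shares(raw_weights):
    """Rescale raw weights so that they sum to 1."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]


# Hypothetical trials: (GC events, GC patients, control events, control patients)
trials = [(1, 125, 0, 125), (10, 40, 8, 40), (6, 30, 7, 30)]
years = [2.0, 0.5, 1.0]  # hypothetical follow-up per patient

iv_shares = shares([inverse_variance_weight(*t) for t in trials])
size_shares = shares([t[1] + t[3] for t in trials])
person_time_shares = shares([(t[1] + t[3]) * y for t, y in zip(trials, years)])
print(iv_shares, size_shares, person_time_shares)
```

In this example, the first trial carries little weight under inverse-variance weighting because it has a single event, but dominates once weights depend on patient numbers or person-time.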

Heterogeneity in observational studies

Although RCTs have some heterogeneity, for example in background therapy or entry criteria, the variability in observational studies is much wider. The observational studies reflected a wide range of settings and populations, including year of recruitment, disease duration, disease severity, GC therapy practice, co-therapy, co-morbidity, geography, health-care systems, and recruitment methods (for example, single-center surgical experience, administrative database, biologics register). Each has its own implication for risk estimates, but the multiple domains of difference meant that much heterogeneity existed within the studies. Even after stratification within any chosen domain, many differences remained in the other areas of potential heterogeneity, and the I2 values often remained high. Nonetheless, within this heterogeneity, the direction of effect typically suggested an increased risk associated with GC therapy, with only six of 42 studies reporting a relative risk of < 1. Statistical heterogeneity thus likely arose from different effect sizes.

It has been argued that meta-analysis of published nonexperimental data should be abandoned [84]. Others argue that careful consideration of sources of heterogeneity within a systematic review can offer more insights than the "mechanistic calculation of an overall measure of effect, which will often be biased" [85]. We ran many stratified analyses to consider the impact of these possible factors, producing some useful results, such as demonstrating a dose response.

Lack of detailed reporting in observational studies

Clear reporting of methods and results was a problem in observational studies as well as in RCTs, in particular, the definition of GC exposure and methods of risk attribution. This is important for GC therapy in RA because of its intermittent pattern of use and multiple routes of administration. GC therapy was rarely the primary exposure of interest in these observational studies, but merely one of many possible exposures or covariates, perhaps explaining the lack of detail. Methods sections rarely reported clearly on how GC exposure was captured, although each study design provided certain opportunities for defining exposure (for example, prescription databases, clinician reporting, or case-note review). Without clarity about exposure, interpreting the many study results was challenging. Even when the source of exposure was clearly described, the definitions for "GC exposed" were rarely consistent. GC exposure was variously defined as ever exposed during the study period [37], exposed at study baseline [36], or recent [75] or current exposure [39] at the time of infection. Even within exposure categories, definitions varied. For example, current exposure at the time of infection included definitions of GC prescriptions within 30 days of the event, 45 days, and beyond. Risk windows used in the analyses included "on drug" [39, 59], "on drug plus lag window" [68, 71], and "ever exposed" [36, 66]. Such analytic variability can produce different results even within one study [86]. Exploration of dose within observational studies was restricted by reporting. We were able to explore a possible dose-response only in studies that stratified by dose. The time period over which average dose was considered also varied, as it did for yes/no definitions of exposure, adding further heterogeneity. Definition and sources of outcomes as well as methods of verification (when undertaken) also varied between studies. Sources of infection ranged from electronic medical records, through case-note review or direct clinician reporting, to linkage with national inpatient registers.

Several risk estimates had to be excluded because of problems with reporting, including typographic errors with point estimates outside of confidence intervals, and absent confidence intervals around reported point estimates [39, 87]. Other studies reported average GC dose for cohorts of patients, but the absence of absolute patient numbers receiving GC therapy prevented inclusion.

Confounding and bias in observational studies

Confounding by disease severity, whereby patients with more-severe disease (and thus at a higher risk of infection) are more likely to receive steroids, was a major concern. This potential bias is unavoidable in observational drug studies. Confounding by contraindication was another possibility, in which patients with high comorbidity or frailty are considered too high risk for traditional DMARDs, and are instead treated with GCs. Within the meta-analysis, we stratified studies into those that reported unadjusted and adjusted risk estimates. Interestingly, the adjusted analyses provided a higher estimate of risk than did the unadjusted analyses, contrary to what we expected. If high disease severity and high comorbidity were reasons for receiving GC therapy (and both are independent risk factors for infection), we would have expected the adjusted analyses to move toward the null. However, clinical decisions are complex, and more than these two variables are considered, leaving the possibility of residual confounding.

Publication bias is an important consideration, present at several levels. First, researchers who found a positive "statistically significant" association between GC therapy and infection risk may be more inclined to include this result in their article. Indeed, 23 of 42 observational studies had statistically significant increased risks, with several just reaching the threshold of significance.

Second, techniques such as forward or backward selection for multivariate analysis automatically reject nonsignificant results. If GC therapy was only one of many covariates of interest, it is plausible that only the significant results were reported. We found examples of studies in which GC therapy was included in a multivariate model, but no subsequent GC risk estimate was reported [88]. At times, it was explicitly reported that no association was found, but either no measure of effect was provided [89-93], or only a P value > 0.05 was reported [94]. Exclusion of these null studies would result in a false inflation of the summary risk estimate and is a major concern.

Third, having discovered a significant association, researchers may be more inclined to submit for publication.

Fourth, reviewers may be more inclined to accept. Publication bias means that the infection risk with GC therapy is likely to be less than the estimated RR of 1.67. Unfortunately, we cannot know how far correction for publication bias would move the result toward the null.

Quality of included studies

When combining multiple studies, we must consider not only the results from those individual studies, but also the quality of the studies. At present, no accepted instruments are available to assess the quality of studies that evaluate harms [82, 95]. We did attempt to assess the included studies according to scales but found that the scores oversimplified the limitations, lacked discrimination between studies, and missed other important factors. For example, the McHarm scale [96] scores reporting of both serious and severe harms as well as deaths. Very few of the observational studies had the primary aim of examining the safety of GC therapy, and thus did not consider severity or death. The Newcastle Ottawa Scale [97] includes a domain about comparability, or adjustment for confounders. The majority of studies adjusted for confounders, yet wide variation existed in the covariates used. We have listed the confounders adjusted for within Table AF2 to provide the reader with study-specific details and performed sensitivity analyses by using different adjustments. Ascertainment of exposure and outcome [97], as already discussed, was challenging to assess because GC therapy was only one of many covariates and often not the primary exposure of interest. Such lack of detail meant that we were limited in generating a meaningful or accurate score. Nonetheless, no studies appeared to have different methods of ascertainment of the exposure or outcome for cases and controls, or for exposed and comparison cohorts.

Conclusions

Given these numerous problems with both study designs in assessing the infection risk with GC therapy, how can we best summarize? The interventional nature of RCTs provides an opportunity to isolate and examine the effect of therapy. To overcome the problem of small numbers of events in individual studies, meta-analysis can collate results and enhance this useful experimental study design to address safety. Multiple analytic models all reached the same broad estimate, providing reassurance. Unfortunately, all estimates were derived from the selected studies after exclusion of studies of lower-quality methods and reporting. The results are valid only if the included studies were representative of all studies, and this is something we cannot assess. Of greater concern was the outcome ascertainment and reporting, which was generally of poor quality. The clear variation in methods of ascertainment and reporting within our included studies, plus likely underreporting, leads to anxiety about the meta-analysis result. The observational studies are harder still to untangle. Many issues cloud the picture, in particular methods for defining exposure and risk attribution, residual confounding, and publication bias. Replication of results does not allay these concerns. We must conclude that the risk of infection associated with systemic GC therapy in patients with RA remains uncertain, despite six decades of clinical experience. However, one consistent finding is that we cannot rule out the possibility of a clinically important increased risk, from either the RCTs or the observational studies. Improved, standardized reporting of harms [98] and improved access to patient-level, time-dependent data from RCTs would improve the ability to assess adequately the risks of specific adverse events. 
Within observational studies, clear definitions of drug exposure and risk attribution, as well as reporting of effect sizes, irrespective of statistical significance [99], would advance our knowledge.

Figure 4

Funnel plots of risk ratios in (a) RCTs and (b) observational studies, stratified by study design.