Introduction

Crohn’s disease (CD) is an inflammatory condition that can affect any part of the gastrointestinal tract. It is characterised by chronic transmural inflammation, that is, inflammation extending through the full thickness of the intestinal wall. Crohn’s disease typically follows one of three behaviour patterns: inflammatory only, stricturing, or penetrating [1]. Penetrating disease is typically characterised by the formation of fistulae (a fistula is an abnormal connection between two epithelial surfaces). Fistulae can form between intestinal loops (enteroenteric), between the intestine and the skin (enterocutaneous), or between the anorectum and the buttock skin (perianal). The incidence of perianal fistulae in CD is around 30% [2].

Perianal fistulae are typically managed with sepsis control (incision and drainage of any abscess and placement of a seton) and with immune modulation using drugs such as azathioprine or infliximab (anti-TNF-α therapy) [3, 4]. A number of alternative surgical procedures might also be considered [3]. In severe cases, a stoma might be offered, often as a prelude to proctectomy [4]. This condition can have a significant impact on patients’ quality of life [5,6,7], and as few as one in three patients achieve long-term healing of their fistulae [8]. Consequently, health care costs of anal fistulae in CD are high, largely because of drug therapies [9, 10]. It is not surprising that this condition has been identified as a research priority in two recent research priority setting exercises [11, 12].

The aetiology of CD is complex and multifactorial. Recent genomic studies have identified several susceptibility loci [13,14,15], several of which are implicated in aberrant immune responses. Environmental factors such as smoking are thought to play a key part in disease behaviour [16], as is an altered intestinal microbiome [17, 18]. These are baseline disease or demographic factors that might be implicated in disease behaviour and prognosis. On top of these systemic mechanisms, localised mucosal damage and aberrant or failed repair mechanisms likely contribute to the persistence of fistulae [2, 19].

Randomised controlled trials (RCTs) are the gold standard in clinical research, and these are sorely needed to guide treatment of fistulating perianal CD. To design trials, we need to balance prognostic factors across study arms to limit confounding and produce reliable results [20].
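Stratified permuted-block randomisation is one standard way to balance known prognostic factors across trial arms. The Python sketch below is illustrative only and is not taken from any included study; the stratum labels (fistula anatomy and rectal involvement, two candidate factors discussed in this review), the block size of 4, and the arm names "A" and "B" are all assumptions.

```python
import random
from collections import defaultdict


def make_stratified_block_randomiser(block_size=4, arms=("A", "B"), seed=None):
    """Return a function that allocates the next patient within a stratum.

    Each stratum draws allocations from its own shuffled blocks, and every
    block contains each arm equally often, so arms stay balanced within strata.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    remaining = defaultdict(list)  # stratum -> unused allocations in current block

    def allocate(stratum):
        if not remaining[stratum]:
            # Start a new balanced block for this stratum and shuffle it.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            remaining[stratum] = block
        return remaining[stratum].pop()

    return allocate


# Hypothetical strata: (fistula anatomy, rectal involvement).
allocate = make_stratified_block_randomiser(block_size=4, seed=2016)
for anatomy, rectal in [("complex", True), ("simple", False), ("complex", True)]:
    print(anatomy, rectal, allocate((anatomy, rectal)))
```

Balance is guaranteed only at the end of each completed block within a stratum, which is why small block sizes are often preferred when many strata are used.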

The aim of the present study was to systematically review the literature and identify baseline prognostic factors relevant to the treatment of fistulating perianal CD.

Materials and methods

This review was registered on the PROSPERO database (CRD42016050316) and conducted in line with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines using a predefined protocol.

The inclusion criteria were: publication during or after 1980; study size ≥50 patients with rectovaginal or perianal fistulae; fistulae caused by CD; patients aged 16 years or over; and fistulae as the baseline health state (start point [20]) of the study. The exclusion criteria were: CD without fistulae; paper reports only interventions rather than demographic or disease status covariates; and paper reports only treatment outcomes rather than analysing them by demographic or disease status factors. Publications not in English were also excluded because of resource constraints.
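The screening rules can be written as a single predicate, which makes explicit that a paper had to meet every inclusion criterion and trigger none of the exclusions. This is a minimal sketch; the ScreenedStudy fields are hypothetical, and actual screening involved reviewer judgement that a boolean flag cannot capture.

```python
from dataclasses import dataclass


@dataclass
class ScreenedStudy:
    # Hypothetical summary fields for one screened paper.
    publication_year: int
    n_patients_with_cd_fistulae: int      # rectovaginal or perianal fistulae caused by CD
    minimum_patient_age: int
    fistulae_at_baseline: bool            # fistulae are the study's start point
    analysed_by_baseline_factors: bool    # outcomes analysed by demographic/disease factors
    in_english: bool


def is_eligible(study: ScreenedStudy) -> bool:
    """Apply the inclusion and exclusion criteria to one paper.

    'CD without fistulae' is excluded implicitly by requiring at least
    50 patients with CD fistulae present at baseline.
    """
    return (
        study.publication_year >= 1980
        and study.n_patients_with_cd_fistulae >= 50
        and study.minimum_patient_age >= 16
        and study.fistulae_at_baseline
        and study.analysed_by_baseline_factors
        and study.in_english
    )


print(is_eligible(ScreenedStudy(2005, 81, 18, True, True, True)))
```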

Information sources were MEDLINE (1946 to October 26, 2016) and Embase (1974 to October 26, 2016) via Ovid. Searches, which used no limits, combined thesaurus and free-text terms (see Fig. 1).

Fig. 1 Search terms used in paper selection

Results from bibliographic databases were combined with papers identified through secondary searches of bibliographies and with papers of known relevance identified by clinical topic experts, and duplicates were removed. Titles and abstracts were screened against the eligibility criteria (by GB), with secondary review and resolution of queries (by ML and DH). Potentially eligible full texts were retrieved and the process repeated, with reasons for rejection recorded.

Data were extracted into predesigned tables (by GB) and findings confirmed (by ML). We extracted data on patient demographics and on specific details of their condition, including age, sex, smoking status, duration of disease, location of disease, number of fistulae, treatments, and outcome data on ‘response’ or ‘healing’ (that is, fistula closure, no further discharge from the fistula, or no fistula recurrence, however defined). Risk of bias (RoB) in individual studies was assessed by two reviewers (GB and ML) using the Quality In Prognosis Studies (QUIPS) tool [21]. This tool assesses six domains: study participation, study attrition, prognostic factor measurement, outcome measurement, study confounding, and statistical analysis and reporting. We recorded the statistical methods used and summary measures, however presented, including odds ratios, relative risks, hazard ratios with confidence intervals, and tests of significance (p values). We conducted a narrative (descriptive) synthesis with results structured by type of prognostic factor.

Results

The PRISMA study selection flow chart is shown in Fig. 2.

Fig. 2 PRISMA flow diagram

Study comparisons

Searches identified 997 papers. Following removal of duplicates and the addition of papers from secondary searches, 923 were screened for inclusion. Forty-seven papers were reviewed at full-text level. Thirty-four papers were rejected at this stage for the following reasons: no prognostic factors reported (n = 11), <50 patients with fistulae caused by CD (n = 9), CD without fistulae (n = 4), fistulae were an endpoint (n = 3), development of fistulae was a factor in the natural history of Crohn’s disease (n = 2), narrative review (n = 3), or systematic review (n = 2). This left 13 papers for qualitative synthesis.
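As a simple consistency check on the flow diagram, the reasons for full-text exclusion should sum to the number of papers rejected, and the remainder should equal the number of included papers. A short Python tally using the counts reported above:

```python
# Full-text exclusion counts as reported in this review.
exclusion_reasons = {
    "no prognostic factors reported": 11,
    "<50 patients with fistulae caused by CD": 9,
    "CD without fistulae": 4,
    "fistulae were an endpoint": 3,
    "fistula development was a factor in CD natural history": 2,
    "narrative review": 3,
    "systematic review": 2,
}

full_text_reviewed = 47
excluded = sum(exclusion_reasons.values())
included = full_text_reviewed - excluded
print(f"excluded={excluded}, included={included}")  # excluded=34, included=13
```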

Study demography and design

Of the 13 studies identified, 2 were published between 1995 and the end of 1999 [22, 23], 7 between 2000 and the end of 2009 [24,25,26,27,28,29,30], and 4 between 2010 and 2014 [31,32,33,34]. Studies and characteristics are summarised in Table 1.

Table 1 Summary of included papers and characteristics

All studies took place in the USA (n = 3) [23, 27, 30] or Europe: Germany (n = 3) [22, 25, 28], France (n = 2) [32, 34], the UK (n = 1) [24], the Netherlands (n = 1) [31], Austria (n = 1) [29], Spain (n = 1) [26], and Portugal (n = 1) [33]. The institutional setting was a teaching hospital in all cases.

Ten of the studies were prospective: either observational (n = 8) [22, 25, 26, 28,29,30, 33, 34] or RCTs (n = 2) [23, 31]. The remaining 3 studies were retrospective [24, 27, 32]. Follow-up periods ranged from 7 weeks to 27.3 years.

Studies used a range of statistical methods to evaluate results: Fisher’s exact test (n = 9) [23,24,25, 27,28,29,30,31, 33], chi-square test (n = 7) [23, 25,26,27, 30, 31, 33], mean with standard deviation (n = 5) [26, 29, 31, 33, 34], Mann–Whitney U test (n = 4) [24, 28, 31], Kaplan–Meier method (n = 4) [22, 25, 32, 34], log-rank test (n = 4) [22, 25, 32, 34], Cox proportional hazards regression model (n = 3) [22, 32, 34], 95% confidence intervals (n = 2) [26, 33], odds ratios (n = 2) [23, 33], Wilcoxon rank tests (n = 2) [22, 28], median with interquartile range (n = 2) [31, 32], log-likelihood ratio (n = 1) [26], Kruskal–Wallis test (n = 1) [25], Kolmogorov–Smirnov test (n = 1) [33], and Hardy–Weinberg test (n = 1) [33]. Statistical methods and potentially confounding variables recorded are shown in Table 2.
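Fisher’s exact test was the most frequently used method. As a minimal illustration of what it computes for a 2×2 table (for example, healed vs. not healed, by presence or absence of a binary prognostic factor), the Python sketch below uses only the standard library. The counts are invented for illustration and do not come from any included study; in practice a maintained routine such as SciPy’s scipy.stats.fisher_exact would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The p value sums the probabilities of all
    tables that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p = sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(p, 1.0)


# Illustrative counts only:
# rows = factor present / absent, columns = healed / not healed.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

Because the test conditions on the margins, it stays valid with very small groups, which is one reason it appears so often in studies with few variant carriers or few events.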

Table 2 Additional baseline demographics and statistical tests used to assess prognostic factors

Outcomes

Identified prognostic factors were related to various outcome measures defined differently in the 13 papers. Common outcome terms were healing, response, complete response, partial response, and recurrence. A summary of various definitions and common ‘headings’ used is presented in Table 3.

Table 3 Common outcome groups and definitions used

Bias

Overall risk of bias was judged to be low for 7 studies [26, 28, 29, 31,32,33,34, 36] and moderate for 6 studies [23,24,25, 30, 37] [24]. Study attrition was typically low. The domains most commonly at high risk of bias were study confounding (n = 5) [22, 24, 25, 28, 30] and statistical analysis and reporting (n = 6) [26, 30,31,32,33, 37]. The full risk of bias assessment is shown in Table 4.

Table 4 Risk of bias using QUIPS tool

Prognostic factors

Prognostic factors were divided into those associated with patient characteristics, disease characteristics, and environmental characteristics. These are summarised in Table 5.

Table 5 Studies and prognostic factors assessed

Patient characteristics

Two papers found that patient sex was significantly associated with outcome. An RCT of infliximab versus placebo (n = 94) found that males were significantly more likely than females to reach the primary endpoint (p < 0.001 in males versus p = 0.28 in females) [23]. Another study (n = 81) found that time to fistula closure was significantly shorter for men than for women (11.7 vs. 21.0 months; p = 0.03; HR 0.59, 95% CI 0.36–0.96) [34]. Three papers found no significant association between sex and outcome. One trial (n = 70) found that sex was not significantly associated with ‘response’ (p = 0.74) [31], and another study (n = 108) found no difference between the sexes (p > 0.05) [26]. A retrospective study (n = 156) found that sex was not a significant prognostic factor (p = 0.12; HR 1.46, 95% CI 0.89–2.35) [32].
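Because several included studies reported hazard ratios with 95% confidence intervals, it can be useful to check that an interval and its p value agree. The Python sketch below recovers an approximate two-sided p value from a ratio estimate and its CI; it assumes the published intervals are Wald intervals symmetric on the log scale, which the papers do not state.

```python
import math


def wald_p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value for a ratio (HR or OR) from its 95% CI.

    Assumes a Wald interval symmetric on the log scale, so the standard
    error of log(estimate) is the log-scale CI width divided by 2 * z_crit.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Values reported above for sex as a prognostic factor.
print(round(wald_p_from_ratio_ci(0.59, 0.36, 0.96), 3))  # [34], reported p = 0.03
print(round(wald_p_from_ratio_ci(1.46, 0.89, 2.35), 3))  # [32], reported p = 0.12
```

The recomputed values (roughly 0.035 and 0.13) are close to the reported p values, suggesting these intervals are internally consistent; small discrepancies are expected because published HRs and CIs are rounded.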

Only 1 study (n = 108) assessed age as a prognostic factor, and it did not find age to be significant (p > 0.05) [26].

Race was evaluated in 1 study (n = 70) as ‘Caucasian versus other’ and was found not to be a significant predictor of healing (p = 0.39) [31].

Studies did not clearly report baseline or historical medication use; where it was reported, it was recorded only as previous or current use of immunosuppression, and it was therefore not included in this review.

Genetics

Two papers evaluated the clinical response to antibiotic therapy of NOD2/CARD15 variant carriers versus wild-type patients. One study (n = 54) found that complete fistula response was more likely in wild-type patients (33 vs. 0%, p = 0.02) [28]. The other (n = 203) found that patients without the mutation were more likely to show clinical improvement when treated with antibiotics (7.7 vs. 40.5%, p = 0.041) [33]. Both studies relied on fistula drainage and had small numbers in the variant carrier group; their results should therefore be interpreted with caution.

Disease duration and location

A prospective observational study (n = 52) found that the duration of fistulating disease was a significant prognostic factor, although the strength and direction of the association were not clearly reported (p = 0.04) [29]. Two prospective studies found that the duration of perianal fistulating disease was not significant, although again the measures used to assess this were unclear [26, 28]. A retrospective study (n = 226) found no significant association between fistula healing and the duration of CD [27].

Two papers reported that patients with ileal involvement (in association with perianal disease) were significantly more likely to have better outcomes than those with other disease distributions. One RCT (n = 94) noted that complete fistula response was more likely in those with ileal and colonic disease (OR 5.1, p = 0.01) than in those with isolated colonic disease (OR 2.3, p = 0.35) [23]. A retrospective study (n = 156) found on univariate analysis that patients with ileocolonic disease were more likely to achieve fistula closure (HR 1.59 (1.08–2.34), p = 0.017) than those with colonic disease (HR 0.86 (0.58–1.27), p = 0.54) [32]. On multivariate analysis, ileocolonic disease remained positively associated with fistula healing (HR 1.88 (1.08–3.32), p = 0.025). This finding was not replicated in 1 prospective study (n = 81) or 1 retrospective study (n = 226), which found no association between fistula healing and the initial site of CD [27, 34]. Three studies found that rectal involvement in CD was a predictor of poor fistula healing [24, 25, 30].

Fistulae anatomy

Three papers identified complexity of fistula anatomy as a prognostic factor. Prospective studies found that, compared with simple fistulae, complex fistulae required more treatments (n = 86; p = 0.02) [36] and took longer to heal (15.3 vs. 2 months; n = 81; p < 0.001; HR 0.31, 95% CI 0.16–0.62) [34]. A retrospective study (n = 156) found that simple fistulae were associated with fistula closure (HR 2.53, 95% CI 1.43–4.45, p = 0.006) [32]. Another study (n = 147) found a non-significant trend towards worse outcomes at 5 years for complex versus simple fistulae (p = 0.2113) [25].

One study (n = 224) found that patients with multiple fistulae were less likely to achieve healing than those with a single fistula (48.6 vs. 28.2%, p < 0.05) [30]. This finding was not consistent across studies [24, 25].

The presence of a rectovaginal fistula was not found to be a prognostic factor for overall perianal fistula healing (n = 81) [27].

Environmental characteristics

Six studies evaluated smoking, and none of these found it to be a significant prognostic factor [26,27,28,29, 31, 34]. This is summarised in Table 6.

Table 6 Studies assessing smoking as a prognostic factor in outcome of perianal Crohn’s fistulae

Discussion

To our knowledge, this is the first systematic review to assess prognostic factors in fistulating perianal CD. It identified candidate prognostic factors including NOD2/CARD15 genotype, duration of fistulating disease, distribution of CD, and fistula anatomy. These require further robust assessment before they can be used to inform research or clinical practice. The challenges to prognostic research in this field are many, including the lack of standardised outcome measures and of consistent timing of outcome measurement.

NOD2/CARD15 variants had a significant association with fistula response to antibiotics in 2 studies [28, 33]. Prior work has found associations between disease severity and expression of the various alleles, particularly with aggressive luminal disease requiring early and repeated surgery [38,39,40]. This suggests that NOD2/CARD15 variants are plausibly related to the prognosis of fistulating perianal CD, although there is currently insufficient evidence to understand the strength of association or any modulating factors.

Duration of fistulating disease was significant in 1 study (with unclear direction), but not in 2 others. Long-standing fistulae have been shown to undergo epithelialisation and behave in a similar fashion to skin, which may reduce their ability to heal [41,42,43]. If tract epithelialisation is the underlying mechanism, then it may be reasonable to consider fistula duration as a prognostic factor (or as a proxy for one).

Disease distribution is possibly a prognostic factor, with ileal disease associated with a better prognosis and colonic or rectal disease associated with a worse prognosis. Guidelines advocate early assessment for proctitis in Crohn’s fistulae, as this impacts clinical strategy and outcome [4, 44, 45]. Proctitis has been associated with higher rates of proctectomy in previous studies, suggesting that this factor has a role in predicting outcomes in these patients [46].

The behaviour of the fistulating process, in terms of both complexity and number of fistulae, is most likely a factor in healing. Patients with complex anatomy (multiple branching tracts crossing large proportions of the anal sphincter) are at risk of recurrent sepsis [47]. Unfortunately, the terminology used to define ‘complex’ and ‘simple’ is not standardised across the literature. Complexity of fistula anatomy involves more than location and number of branches: magnetic resonance imaging can also assess the volume and length of fistula tracts [48]. It is plausible that a longer or larger-volume fistula tract could take longer to heal than a shorter or lower-volume one. Tract length and volume are therefore potentially important prognostic markers that merit further assessment.

Patient demographics including sex may not have a role to play; the majority of studies reviewed found no relationship between sex and outcome, and those that did identify statistical differences obtained conflicting results. This may reflect sampling issues.

None of the studies reviewed found smoking to be a significant prognostic factor for fistula outcomes. Smoking has been shown to be associated with poor disease control, and smoking cessation is widely advised in CD [49,50,51]. Given this, it is interesting that smoking was not a significant factor here. This could be for a number of reasons: bias in study design, for example in how smoking was defined (patient report vs. carbon monoxide testing) or in sample size or sampling; the absence of a mechanistic role for smoking in the formation of perianal fistulae; or disease that is already ‘bad’, so that smoking has no additive effect.

The number of prognostic factors identified was limited by the number of studies reporting baseline factors with appropriate analysis. Even if cohorts had been well described, a meta-analysis would not have been possible in this setting because there was little consistency across study endpoints. There were 5 major groups of outcome (healed, response, complete response, partial response, and recurrence), with an average of 4 definitions for each outcome. The definition of recurrence was fairly consistent across studies. Definitions of healing included an asymptomatic fistula, a fistula that did not drain on compression, and a change in the Perianal Disease Activity Index (PDAI). These are relatively subjective measures (even the PDAI has subjective elements [52]), assessed at a single time point. These issues need to be addressed before further studies are undertaken.

There are limitations to consider in this review. Initial screening and data extraction by a single reviewer increased the possibility that relevant reports were discarded [53, 54]. Despite this, we had multiple checks in place to support the single-reviewer process, including screening of discarded abstracts for key papers by a second reviewer. This, coupled with support from clinical topic experts and a robust bibliography search, makes us confident that we identified the majority of papers reporting prognostic factors.

This study used a broad search strategy to identify as many candidate papers as possible and used a tool appropriate for the assessment of prognostic factors (QUIPS). The validity of the findings is supported by the prognostic role of some reported factors in other aspects of inflammatory bowel disease. There are diminishing marginal returns from searching databases in addition to MEDLINE and Embase, with some, such as CINAHL, rarely retrieving unique references in many topic areas [55, 56]. For this reason, we believe our search strategy is associated with a low risk of bias.

It is important that any future prognostic study captures the above factors and uses a standardised, well-defined outcome measure. A well-conducted cohort study would allow all of these factors to be properly assessed using appropriate multivariate statistical models [57, 58]. Given the prevalence and incidence of perianal CD, it might be possible to use the resulting data to inform novel study designs. A clear understanding of confounding factors might allow trials within cohorts, Bayesian modelling, or interrupted time series to be used as alternatives to classical trial designs.

Conclusions

This systematic review has identified potential prognostic markers for outcomes in fistulating perianal CD, including genetic factors and disease behaviour. We cannot, however, draw robust conclusions from this heterogeneous group of studies. We recommend that future studies include well-characterised cohorts and use a consistent endpoint for reporting.