Background

Tuberculosis remains a leading cause of morbidity and mortality, especially in high-burden regions of Asia and Africa. In China, the prevalence of active pulmonary tuberculosis in 2010 among those older than 15 years was 459/100,000, and the prevalence of smear-positive pulmonary tuberculosis was 66/100,000. [1] Up to 30% of patients with tuberculosis have tuberculous pleural effusion (TPE), in which extrapulmonary involvement causes pleural effusions [2]. Properly treating pleural effusions requires determining whether the effusions are TPEs or another type of effusion.

The gold standard for diagnosing TPE is the isolation of Mycobacterium tuberculosis (M. tuberculosis) from samples of either pleural effusion or pleural biopsy. Culture offers 100% diagnostic specificity, but it usually takes several weeks, delaying diagnosis and increasing the risk that patients are lost to follow-up. In addition, pleural biopsy is invasive and technically demanding, particularly in children, such that success can depend strongly on the individual performing the biopsy. [3] Detecting granulomas in pleural biopsies can diagnose TPE with approximately 95% specificity, [2,3,4] but the sensitivity of culture- or granuloma-based methods is limited. [5] Although image-guided biopsies and local anesthetic thoracoscopic (LAT) biopsies can substantially improve sensitivity compared with blind pleural biopsy, neither technique is recommended as the first procedure for patients presenting with pleural effusions. This highlights the need for alternative, less invasive diagnostic strategies.

TPE is largely the result of pathological immune reactions associated with an increase in cytokines, including interleukins (ILs). [6] ILs are secreted proteins that bind to specific receptors and help mediate communication among leukocytes. For example, IL-12 is essential for initially activating interferon (INF)-γ-mediated T cell responses to primary M. tuberculosis infection. [7, 8] ILs can promote various types of inflammatory responses, playing a role in activation-induced death of skin keratinocytes, mucosal epithelial cells, and T cells. [9] Evidence that pleural levels of some ILs are elevated in patients with TPE has led investigators to explore their potential for differentiating TPE from other types of pleural effusion. Most studies have looked at only one or a few ILs, and some studies looking at the same ILs have arrived at different conclusions. This led us to systematically review the literature and meta-analyze available data to gain a more comprehensive understanding of the potential of ILs for diagnosing TPE.

Methods

Search strategy and study selection

The systematic review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. [10] Two investigators independently searched PubMed, EMBASE, Web of Knowledge, CNKI, WANFANG, and WEIPU databases to identify studies assessing the role of ILs in diagnosing TPE published up to January 2017. Before the full search, we performed a preliminary search to decide on the ILs to include in the review. The following search terms were used: “interleukins or IL” and “IL-2 or IL-6 or IL-12 or IL-12p40 or IL-18 or IL-27 or IL-33” and “tuberculosis” and “pleural effusion/pleural fluid” and “sensitivity or specificity or accuracy”. Reference lists in retrieved studies and review articles were examined manually to identify additional studies.

Two authors (NZ and CW) independently assessed each study for eligibility; disagreements were resolved by consensus. Studies were included if they fulfilled all the following criteria: (1) the work was an original research article published in English or Chinese, (2) human samples were analyzed, (3) standard methods were used to definitively diagnose the type of effusion as TPE or other type, and (4) data sufficient for calculating specificity and sensitivity were reported. Conference proceedings, letters to the editor, and studies including fewer than 10 patients with TPE were excluded.

Quality assessment and data extraction

The same two authors (NZ and CW) assessed the quality of included studies using the Quality Assessment of Diagnostic Accuracy Studies-2 tool (QUADAS-2). [11] For each criterion, a response of “yes” was assigned if it was fulfilled; “unclear”, if doubt existed whether it was fulfilled; or “no” if it was not fulfilled. The following data were retrieved from each study: authors, country, publication year, population characteristics, testing methods, cut-off value, methodological quality, and 2-by-2 tables showing rates of true positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs).

Statistical analysis

Data were compiled in Excel, then transferred to Review Manager 5.3 (The Cochrane Collaboration, Copenhagen, Denmark) and STATA Version 12.0 (Stata Corp., College Station, TX) for statistical analysis. For each study, sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) were calculated, together with 95% confidence intervals (CIs). A summary ROC (SROC) curve was generated for each IL across studies, [12] from which a single test threshold value was determined and used to calculate sensitivity and specificity. [13] Overall diagnostic performance for that IL was assessed as the area under the SROC curve (AUC).
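The per-study measures listed above follow directly from each 2-by-2 table. The sketch below is illustrative only: it is not the Review Manager or STATA routine used in the analysis, the function name `diagnostic_metrics` and the example counts are hypothetical, and the 0.5 continuity correction and the Wald interval on the log scale are assumptions rather than choices reported in the text.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn, z=1.96):
    """Accuracy measures for a single 2-by-2 table."""
    # Assumption: add 0.5 to every cell when any cell is zero, so that
    # ratios and logarithms stay defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)             # diagnostic odds ratio

    # Wald 95% interval for the DOR, built on the log scale.
    se_ln_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ln_dor = math.log(dor)
    dor_ci = (math.exp(ln_dor - z * se_ln_dor),
              math.exp(ln_dor + z * se_ln_dor))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
        "dor_ci": dor_ci,
    }


# Hypothetical study: 100 patients with TPE and 100 with other effusions.
example = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in example.items():
    print(name, value)
```

Note that the DOR equals PLR divided by NLR, which is why the three measures move together across studies.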

The Q test and inconsistency index (I²) were used to detect potential heterogeneity in the natural logarithm of DOR (lnDOR) meta-analyzed across studies. [14] Presence of implicit cut-off point effects and correlation between sensitivity and specificity were assessed by calculating the Spearman rank correlation coefficient for each IL. Deeks’ funnel plot and Egger’s test were used to detect publication bias [15]. All statistical tests were two-sided, with P < 0.05 taken as the threshold of significance.
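As a rough illustration of the heterogeneity statistics named above, the following sketch computes Cochran's Q and the inconsistency index from per-study lnDOR values using inverse-variance weights. The function names and the example numbers are hypothetical, and the P value for Q (a chi-square test with k - 1 degrees of freedom) is left out because it needs a statistics library.

```python
import math


def ln_dor_with_variance(tp, fp, fn, tn):
    """lnDOR and its approximate variance for one 2-by-2 table."""
    cells = [tp, fp, fn, tn]
    # Assumption: 0.5 continuity correction when any cell is zero.
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells
    ln_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return ln_dor, variance


def cochran_q_and_i2(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of variation beyond chance, floored at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical 2-by-2 tables (tp, fp, fn, tn) for three studies of one IL.
tables = [(45, 3, 5, 47), (30, 10, 12, 40), (60, 2, 4, 70)]
effects, variances = zip(*(ln_dor_with_variance(*t) for t in tables))
print(cochran_q_and_i2(effects, variances))
```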

Results

Our systematic review included 38 studies examining the ability of pleural concentrations of IL-2, IL-6, IL-12, IL-12p40, IL-18, IL-27, and IL-33 to diagnose TPE. [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53] Other ILs listed in Table 1 were excluded from the meta-analysis because relevant data were available from fewer than 3 studies [54,55,56,57,58] (Fig. 1). Two authors (NZ and CW) assessed studies for possible overlap in the populations analyzed. Data were pooled from overlapping populations as long as the different studies reported on different ILs or IL combinations. Otherwise, if studies with overlapping populations reported on the same IL or IL combination, only the data from the largest study were used.

Table 1 Clinical summary of all studies
Fig. 1 Flow diagram of study selection. QUADAS: Quality Assessment of Diagnostic Accuracy Studies

Study characteristics

Table 1 summarizes clinical characteristics of patients in the 38 studies that were used for quantitative meta-analysis [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]. Average sample size was 98 (range, 43 to 431) for each IL (Table 2). Twenty-three studies stated that the pleural effusion samples were collected before any drug treatment [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38], while the remaining 15 studies did not report such information [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]. Diagnosis of TPE or other type of pleural effusion was based only on clinical course in 5 studies, [22, 23, 28, 35, 41] i.e. on clinical presentation, pleural fluid analysis, radiology and responsiveness to anti-tuberculosis chemotherapy. Diagnosis was based on bacteriology, histology or both (gold standard) in 11 studies. In the remaining 21 studies, some patients were diagnosed with TPE based on clinical course and others based on the gold standard. One study [51] did not report the diagnostic standard for TPE. All but 3 studies [35, 37, 52] measured IL levels using enzyme-linked immunosorbent assays (ELISA), with the remaining 3 studies using radioimmunoassays.

Table 2 Diagnostic performance of interleukins from individual studies

Determination of statistical pooling model

Diagnostic studies are typically meta-analyzed using an SROC-based fixed-effects model, [59] a random-effects model using a bivariate normal approximation, [60] or a hierarchical SROC (HSROC)-based full Bayesian [61] or empirical Bayes method [62]. In our study, lnDOR heterogeneity was statistically significant and associated with high I² values for most ILs (Table 3). These indications of substantial heterogeneity in lnDOR made the use of an SROC-based fixed-effects model inappropriate [63].

Table 3 Statistical measures of heterogeneity, cut-off effect, and publication bias for each interleukin

The possible presence of implicit cut-off point effects was examined for each included IL, using the Spearman rank correlation between sensitivity and specificity (Table 3). A negative correlation was found for most ILs, indicating no detectable implicit cut-off point effect. Therefore, we used a random-effects model to estimate the mean sensitivity and specificity and associated CIs.
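The rank correlation used in this check can be sketched without external libraries. The code below is a minimal illustration, not the STATA procedure behind Table 3: the helper names are hypothetical, the sensitivity and specificity values are invented for the example, and no P value is computed.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical per-study estimates for one IL.
sensitivities = [0.80, 0.88, 0.93, 0.97, 1.00]
specificities = [0.99, 0.90, 0.96, 0.85, 0.92]
print(spearman_rho(sensitivities, specificities))
```

The coefficient is undefined when every study reports the same sensitivity or the same specificity, since the rank variance is then zero; a fuller implementation would guard against that case.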

Diagnostic accuracy

These data were meta-analyzed using a random-effects model (Table 3). Fig. 2 summarizes the sensitivities and specificities for IL-27 and IL-18 reported by each study. (Results for the other ILs are reported in Additional file 1: Figure S1.) Sensitivity of IL-27 ranged from 0.80 to 1.00, and the pooled value was 0.93 (95%CI 0.90–0.95). Sensitivity of IL-18 ranged from 0.44 to 0.97, and the pooled value was 0.87 (95%CI 0.79–0.92). Specificity of IL-27 varied from 0.85 to 0.99, and the pooled value was 0.95 (95%CI 0.90–0.98). Specificity of IL-18 varied from 0.82 to 1.00, and the pooled value was 0.92 (95%CI 0.88–0.95). The pooled parameters for all included ILs are shown in Table 4.
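To give a sense of what random-effects pooling of a proportion involves, the sketch below applies a univariate DerSimonian-Laird estimator to logit-transformed sensitivities. This is a deliberately simplified stand-in and should not be read as the model fitted in this analysis, which may have pooled sensitivity and specificity jointly; the function name `pool_logit_dl` and the study counts are hypothetical.

```python
import math


def pool_logit_dl(events, totals, z=1.96):
    """Pool proportions (e.g. TP / (TP + FN)) with DerSimonian-Laird weights.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau squared).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # Assumption: continuity correction when a proportion is 0 or 1.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        logits.append(math.log(e / (n - e)))
        variances.append(1 / e + 1 / (n - e))

    # Fixed-effect step, needed to obtain Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Method-of-moments estimate of the between-study variance.
    k = len(logits)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(t):
        return 1 / (1 + math.exp(-t))

    return inv_logit(mu), inv_logit(mu - z * se), inv_logit(mu + z * se), tau2


# Hypothetical true-positive counts and numbers of TPE patients per study.
print(pool_logit_dl(events=[45, 28, 60, 37], totals=[50, 35, 62, 40]))
```

Pooling on the logit scale keeps the back-transformed interval inside the 0 to 1 range, which matters when sensitivities sit close to 1.00.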

Fig. 2 Forest plot of the sensitivities and specificities. a. interleukin-27, b. interleukin-18. The calculated pooled mean with corresponding confidence interval is also reported

Table 4 Pooled means of sensitivity and specificity, diagnostic odds ratio (DOR), area under the curve (AUC), and calculated likelihood ratios for each interleukin

Unlike a traditional ROC plot, each data point on an SROC curve represents a separate study, allowing the curve to provide an overall assessment of diagnostic performance. Plotting the rate of TP against the rate of FP gave curves showing AUCs of 0.95 for IL-18 and IL-27 (Fig. 3). Among all ILs, IL-27 showed the highest overall accuracy, with a sensitivity of 93% and specificity of 95%.

Fig. 3 Summary receiver operating characteristic (SROC) curve for all the interleukins included

Study quality and publication bias

QUADAS-2 assessment of included studies showed that most studies had low risk of bias (Fig. 4). Both Egger’s and Deeks’ tests suggested no evidence of bias among the studies for any of the ILs meta-analyzed (Table 3). Funnel plots indicated low risk of publication bias (Additional file 1: Figure S2).

Fig. 4 Summary of QUADAS-2 assessments of included studies. QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies-2. Patient Selection: Describe methods of patient selection; Index Test: Describe the index test and how it was conducted and interpreted; Reference Standard: Describe the reference standard and how it was conducted and interpreted; Flow and Timing: Describe any patients who did not receive the index tests or reference standard or who were excluded from the 2 × 2 table, and describe the interval and any interventions between index tests and the reference standard

Discussion

Assaying pleural levels of ILs may be a cost-effective and minimally invasive alternative to traditional tests for differentiating TPE from other types of pleural effusion. Our meta-analysis of the available evidence suggests that IL-27 and IL-18 show relatively high diagnostic accuracy for TPE, while five other well-studied ILs do not (IL-2, IL-6, IL-12, IL-33 and IL-12p40). Even IL-27 and IL-18 do not appear to have adequate diagnostic potential on their own, so they would need to be used in conjunction with other methods or conventional markers.

Our meta-analysis showed that IL-2, despite being centrally involved in the regulation of immune tolerance and activation, [64] is associated with quite low sensitivity and specificity. This may reflect the fact that IL-2 data were available from only 5 studies, all of which were conducted in China. Future work, preferably in Caucasians and other groups of Asians, should investigate the diagnostic potential of this IL.

DOR combines sensitivity and specificity into a single indicator of test performance. [65] Higher DOR indicates better discriminatory test performance. Mean DOR was 64.12 for IL-18 and 227.9 for IL-27, indicating high overall accuracy. Potentially more clinically meaningful than DOR are likelihood ratios. [66] A likelihood ratio >10 or <0.1 corresponds to at least a 10-fold difference between the pre- and post-test odds that a condition is present. Of the ILs meta-analyzed here, only IL-18 and IL-27 had PLRs >10, suggesting that a positive test result for these ILs indicates a relatively high probability of TPE. In addition, IL-27 was associated with an NLR of 0.08, meaning that a negative IL-27 result is about 0.08 times as likely in a patient with TPE as in a patient without TPE. This may be sufficient for ruling out TPE in the clinic.
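How far a likelihood ratio moves the probability of TPE depends on the probability before testing, because the ratio acts on odds rather than on probabilities. The sketch below shows the conversion; the function name `post_test_probability` and the 30% pre-test probability are hypothetical choices for illustration, while the NLR of 0.08 is the pooled IL-27 value quoted above.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed pre-test probability of TPE of 30% (illustrative only).
pre_test = 0.30
after_negative = post_test_probability(pre_test, 0.08)   # NLR for IL-27
after_positive = post_test_probability(pre_test, 10.0)   # a PLR at the >10 threshold
print(round(after_negative, 3), round(after_positive, 3))
```

Under this assumed 30% starting point, a negative result with an NLR of 0.08 lowers the probability of TPE to roughly 3%, and a positive result with a PLR of 10 raises it to roughly 81%. A different pre-test probability gives different post-test values from the same likelihood ratios.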

Pleural levels of a number of biomarkers have been proposed as aids in the diagnosis of TPE, including adenosine deaminase (ADA) and interferon-γ (INF-γ), both of which are present in patients with TPE at significantly higher concentrations than in patients with other types of pleural effusion. The diagnostic performance determined here for IL-18 and IL-27 compares favorably with that of ADA and INF-γ. Meta-analyses [67, 68] indicate that these latter two assays on their own are associated with the following diagnostic indices: sensitivity, 0.89 (95%CI 0.87–0.91) and 0.92 (95%CI 0.90–0.93); specificity, 0.97 (95%CI 0.96–0.98) and 0.90 (95%CI 0.89–0.91); PLR, 23.45 (95%CI 17.31–31.78) and 9.03 (95%CI 7.19–11.35); NLR, 0.11 (95%CI 0.07–0.16) and 0.10 (95%CI 0.07–0.14); and DOR, 272.7 (95%CI 147.5–504.2) and 110.08 (95%CI 69.96–173.20). Although the available evidence suggests that IL-18 and IL-27 may have higher accuracy than ADA, the higher cost and greater complexity of measuring IL-27 and IL-18 may limit their practical applicability. [69, 70] In addition, the combination of positive IL-27 and positive ADA values has been reported to reach a sensitivity of 100% for the identification of TPE. [16, 31, 46] Our meta-analysis, combined with previous ones, suggests that combining IL-18 and IL-27 with INF-γ and ADA may strengthen TPE diagnosis. Further studies should determine the diagnostic accuracy of IL-27 and IL-18 in combination with each other or with ADA or INF-γ.

Our meta-analysis suggests an association between elevated levels of at least certain pleural ILs and TPE. TPE has been characterized as a hypersensitive T cell reaction to mycobacteria or antigens in the pleural space, leading to the accumulation of protein-rich fluid. [6] ILs are divided into different families based on sequence homology, receptor chains or functional properties. IL-18 and IL-33 belong to the IL-1 family, [71] which contains inflammatory mediators playing a major role in early innate immune responses. IL-6, which belongs to a cytokine family of the same name, is a multifunctional, pleiotropic regulator of immune responses, acute-phase responses, hematopoiesis, and inflammation. [72] IL-2, a member of the γ-chain cytokine family, is produced mainly by CD4+ and CD8+ T cells and is essential for Treg cell development. [73] Although both blood and pleural fluid samples can be processed for all ILs, these assays cannot differentiate drug-resistant TB and consequently cannot replace appropriate microbiological and molecular investigations. Future work is needed to examine how ILs may affect onset and/or progression of TPE and the probable association between ILs and drug sensitivity of TB.

To ensure reliable results, we meta-analyzed only ILs for which sensitivity and specificity data were available from at least 3 studies. As a result, we did not analyze several ILs for which levels appear to be elevated in tuberculosis [74], including IL-8 [57] and IL-22 [58]. Further work should examine the diagnostic potential of these ILs. More work should also examine the diagnostic performance of these and other ILs in combination, which we could not do for lack of studies including such combinations.

Our meta-analysis has additional limitations. First, exclusion of conference abstracts, letters to journal editors and unpublished data may have given rise to publication bias, such that our results overestimate actual diagnostic performance. Second, patients were diagnosed with TPE based on both bacteriological and histological assessment in only a few studies; in most studies, patients were diagnosed on the basis of one or the other, alone or in combination with clinical course, and they were diagnosed based solely on clinical course in a few studies. This increases risk of misclassification bias. Third, description of methodology was incomplete in many studies, leading to a QUADAS-2 assessment of “unclear”. In addition, we did not perform meta-regression analysis to determine the source of heterogeneity, because of the limited number of studies included. Our results highlight the need for more rigorous studies of ILs in the diagnosis of TPE. Future work should also examine the diagnostic potential of IL levels in serum, since most studies have focused on pleural levels.

Conclusion

The available evidence suggests that assaying pleural levels of certain ILs may aid in the diagnosis of TPE when used in combination with other biomarkers and approaches. By confirming such diagnosis, ILs may help avoid the need for more invasive diagnostic procedures.