Introduction

Tuberculosis is a major health problem and one of the leading causes of death worldwide. An estimated 1.3 million tuberculosis-related deaths occurred in 2020 [1]. Among patients with pulmonary tuberculosis, 20–50% have a negative sputum acid-fast stain, a condition known as smear-negative pulmonary tuberculosis (SNPTB) [2,3,4,5,6,7,8,9,10]. Although the gold standard for diagnosing pulmonary tuberculosis is the isolation of Mycobacterium tuberculosis by culture, sputum collection may not be feasible or adequate in some situations, particularly in SNPTB patients. Thus, the diagnosis of SNPTB remains a clinical challenge. Furthermore, delayed tuberculosis diagnosis leads to disease progression and increased community spread.

Interferon-gamma release assays (IGRAs) are among the diagnostic methods that might help diagnose SNPTB. IGRAs, including the QuantiFERON-TB Gold In-Tube test (QFT-GIT), QuantiFERON-TB Gold Plus (QFT-plus), and T-SPOT.TB, are in vitro blood tests that detect interferon-gamma (IFN-γ) released by T cells stimulated with M. tuberculosis antigens. These tests have been approved by the U.S. Food and Drug Administration (FDA) to diagnose latent tuberculosis infection (LTBI) [11] and have been increasingly used for this purpose. IGRAs appear to have a stronger predictive value than tuberculin skin tests for the later development of active tuberculosis [12].

However, the role of IGRAs in diagnosing active tuberculosis remains unclear. Previous systematic reviews and meta-analyses of the diagnostic accuracy of IGRAs for active pulmonary tuberculosis, regardless of sputum smear results [13,14,15,16], and for extra-pulmonary tuberculosis [17, 18] demonstrated moderate diagnostic performance. The role of IGRAs in SNPTB has not been well described. Thus, we conducted a systematic review and meta-analysis to explore the diagnostic accuracy of IGRAs for the diagnosis of smear-negative pulmonary tuberculosis.

Methods

This systematic review and meta-analysis was conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement. The protocol was registered with PROSPERO (CRD42021274653).

Search strategy and eligibility criteria

We performed a comprehensive search of databases from each database’s inception to April 5, 2021, without language restriction. The databases included Ovid MEDLINE(R) and Epub Ahead of Print, In-Process & Other Non-Indexed Citations, and Daily, Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, and Scopus. The search strategy was designed and conducted by an experienced librarian (LP) with input from the study’s principal investigator. Controlled vocabulary supplemented with keywords was used to search for studies on the sensitivity and specificity of IGRAs for active tuberculosis disease in adult patients. The full strategy, listing all search terms used and how they were combined, is available in Additional file 1: Table S1.

Studies were included if they: (1) studied the diagnostic accuracy of IGRA tests, including QFT-GIT, T-SPOT.TB, or QFT-plus, for the diagnosis of active pulmonary tuberculosis; (2) provided sufficient information to evaluate the sensitivity and specificity of IGRA tests for the diagnosis of SNPTB; and (3) included adult participants aged ≥ 15 years. Studies were excluded if they: (1) did not follow the manufacturer's instructions or cut-off values (≥ 0.35 IU/ml for QFT-GIT and QFT-plus, and ≥ 6 spots for T-SPOT.TB); (2) performed IGRA tests on specimens other than blood; (3) included patients with latent tuberculosis infection in the analyses; (4) included patients who had received anti-tuberculosis drugs for > 14 days; (5) performed the IGRA as a second sequential test that depended on the result of a first test; (6) included fewer than 10 patients with SNPTB; or (7) had full articles that could not be accessed.

Data extraction

Two authors (AP and SC) independently reviewed the titles and abstracts of all articles retrieved from the systematic search to exclude irrelevant studies. In case of disagreement, the discordant articles were included in the full-text review. The full articles of included studies from the first step were independently reviewed by two authors (PL and PT) to select eligible studies and abstract data. The reviewers resolved disagreements regarding study selection and data abstraction by discussion. The reviewers also manually reviewed the references of included studies and the previous systematic reviews to identify additional eligible studies. The reliability of study selection was assessed using the percent of agreement and \(\kappa\) statistic.
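
As an illustration of how the two agreement measures relate, the following minimal Python sketch computes percent agreement and Cohen's \(\kappa\) from a 2 × 2 table of two reviewers' include/exclude decisions. The function name and the counts are hypothetical and are not the screening data of this review.

```python
def agreement_and_kappa(both_include, only_r1_include, only_r2_include, both_exclude):
    """Return (percent agreement, Cohen's kappa) for two reviewers.

    The four arguments are the cells of the 2 x 2 agreement table.
    All names here are illustrative assumptions, not taken from the review.
    """
    a, b, c, d = both_include, only_r1_include, only_r2_include, both_exclude
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty agreement table")

    # Observed agreement: proportion of articles with identical decisions.
    observed = (a + d) / n

    # Agreement expected by chance, from each reviewer's marginal rates.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical screening counts, for illustration only.
observed, kappa = agreement_and_kappa(20, 5, 5, 70)
print(f"percent agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Because \(\kappa\) discounts the agreement expected by chance, it is lower than the raw percent agreement whenever most articles are excluded by both reviewers.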

The following information was abstracted from each study: author names, year of publication, study design, the country in which the study was conducted, type of IGRA test, and participant characteristics (number of participants, age, sex, proportion of confirmed pulmonary tuberculosis patients, proportion of patients with smear-negative pulmonary tuberculosis, participant immune status, history of previous tuberculosis infection, and history of BCG vaccination).

Outcome

The primary outcomes were the diagnostic accuracy variables, including sensitivity, specificity, positive likelihood ratio (LR), negative LR, and diagnostic odds ratio (DOR). The numbers of true positives, true negatives, false positives, and false negatives were abstracted for each IGRA test against the reference test.
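
The accuracy measures listed above follow directly from the four cells of a 2 × 2 table. The Python sketch below shows the standard per-study definitions; the class and function names and the example counts are hypothetical (chosen to give a sensitivity of 0.77 and a specificity of 0.70) and are not data from any included study. It does not reproduce the hierarchical pooling described under Statistical analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test vs. reference standard counts (hypothetical container)."""
    tp: int  # index test positive, disease present
    fp: int  # index test positive, disease absent
    fn: int  # index test negative, disease present
    tn: int  # index test negative, disease absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Compute per-study diagnostic accuracy measures from a 2 x 2 table."""
    # Ratios are undefined or degenerate with empty cells; meta-analyses often
    # apply a continuity correction, but this sketch simply refuses such tables.
    if min(t.tp, t.fp, t.fn, t.tn) == 0:
        raise ValueError("zero cell: ratios need a continuity correction")

    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    # The DOR equals LR+ / LR-, written here as the cross-product ratio.
    dor = (t.tp * t.tn) / (t.fp * t.fn)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": dor,
    }


# Hypothetical counts, for illustration only.
for name, value in accuracy_measures(TwoByTwo(tp=77, fp=30, fn=23, tn=70)).items():
    print(f"{name}: {value:.2f}")
```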

Study quality assessment

The risk of bias in each eligible study was independently evaluated by two authors (PL and PT) using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Disagreements between the two reviewers were settled by discussion with a third reviewer (TP). We classified a study as having low risk of bias if it was judged as having low risk of bias in all domains of the risk of bias evaluation [19].

Statistical analysis

We used symmetric hierarchical summary receiver operating characteristic (HSROC) models to jointly estimate sensitivity and specificity, positive and negative likelihood ratios, and DOR [20]. We drew HSROC curves from these estimates and plotted the sensitivity and specificity reported by each included study. The area under the receiver operating characteristic curve (AUROC) was also evaluated for each test. We were unable to pool estimates when the number of studies was fewer than 4. We were also unable to examine potential publication bias using funnel plot symmetry and Deeks' funnel plot asymmetry test because the number of studies was too small (< 20). We conducted the following pre-specified subgroup analyses: (1) studies at low risk vs. at risk of bias and (2) studies conducted in high vs. low tuberculosis burden countries. Subgroup analysis of T-SPOT.TB could not be performed because of the low number of studies. Because HIV infection and immunocompromised status are associated with false-negative IGRA results [12, 13], we performed a sensitivity analysis excluding studies conducted in patients with HIV infection or immunocompromised status [21, 22]. Stata version 17 (StataCorp LLC, College Station, TX) was used for all statistical analyses.

Results

Study selection and study characteristics

Of 1,312 articles retrieved from the systematic search, 1,182 were excluded through title and abstract screening. Of 127 articles that underwent full-text review, a total of 16 studies were included in the final analysis (Fig. 1). The percent of agreement and \(\kappa\) statistic for study selection were 95% and 0.77, respectively.

Fig. 1 Study selection

Fourteen studies [2,3,4,5,6,7,8,9,10, 23,24,25,26,27] reported the diagnostic accuracy of QFT-GIT, while 5 studies [10, 25, 26, 28, 29] reported the diagnostic accuracy of T-SPOT.TB. No study of QFT-plus was identified. A total of 1,204 and 2,658 SNPTB patients were included for the QFT-GIT and T-SPOT.TB tests, respectively. The studies were conducted in 10 different countries, of which 7 studies (44%) were conducted in high tuberculosis burden countries, according to the World Health Organization Global Tuberculosis Report 2020 [2, 3, 7, 9, 25, 28, 29]. Characteristics of the included studies are shown in Table 1. The diagnostic criteria used as the reference standard for active pulmonary tuberculosis in each study are described in Table 2. The absolute numbers of true positives, true negatives, false positives, and false negatives regarding SNPTB in each study are shown in Additional file 1: Table S2.

Table 1 Characteristics of included studies
Table 2 Diagnostic criteria for active pulmonary tuberculosis in each included study

Study quality

Of the 16 included studies, 8 (50%) met the low risk of bias criteria. Of the 8 studies at risk of bias, 4 used a case–control design. Another 3 studies raised concerns regarding the applicability of the IGRA test because they treated indeterminate results as negative, and 1 study classified patients with a negative tuberculosis culture as the non-tuberculosis group without mentioning bronchoscopy, even though it was conducted in a high tuberculosis burden area. The summary and details of the risk of bias assessment are shown in Fig. 2 and Additional file 1: Table S3.

Fig. 2 Summary of assessment of study quality using QUADAS-2 tool stratified by each QUADAS-2 item

Diagnostic accuracy

For the diagnosis of SNPTB, QFT-GIT had a sensitivity of 0.77 (95% CI 0.71–0.82), specificity of 0.70 (95% CI 0.58–0.80), DOR of 8.03 (95% CI 4.51–14.31), positive LR of 2.61 (95% CI 1.80–3.80), negative LR of 0.33 (95% CI 0.25–0.42), and AUROC of 0.81 (95% CI 0.77–0.84) (Table 3 and Fig. 3). T-SPOT.TB had a sensitivity of 0.74 (95% CI 0.71–0.78), specificity of 0.71 (95% CI 0.49–0.86), DOR of 6.96 (95% CI 2.31–20.98), positive LR of 2.53 (95% CI 1.26–5.07), negative LR of 0.36 (95% CI 0.24–0.55), and AUROC of 0.77 (95% CI 0.73–0.80) (Table 3 and Fig. 3).

Table 3 Diagnostic accuracy of QuantiFERON-TB Gold In-Tube and T-SPOT.TB
Fig. 3 Hierarchical summary receiver operating characteristic (HSROC) plots showing the summary operating point (red square), 95% confidence interval (yellow dashed line), and HSROC curve (green solid line) of (A) QuantiFERON-TB Gold In-Tube and (B) T-SPOT.TB for the diagnosis of smear-negative pulmonary tuberculosis. Open circles represent the individual studies included in the meta-analysis, with circle size representing the sample size of each study

Subgroup and sensitivity analyses

The sensitivity and negative LR of QFT-GIT appeared consistent across almost all subgroup analyses. However, the specificity, positive LR, and DOR seemed lower in the subgroup of studies conducted in high tuberculosis burden countries than in low tuberculosis burden countries. The sensitivity analysis excluding studies conducted in patients with HIV infection or immunocompromised status demonstrated the robustness of the diagnostic accuracy of QFT-GIT for smear-negative pulmonary tuberculosis, with a sensitivity of 0.79 (95% CI 0.73–0.84), specificity of 0.72 (95% CI 0.59–0.83), DOR of 8.85 (95% CI 5.36–18.11), positive LR of 2.86 (95% CI 1.86–4.40), and negative LR of 0.29 (95% CI 0.22–0.38) (Table 3).

Discussion

Our systematic review and meta-analysis revealed comparable diagnostic performance for SNPTB diagnosis between QFT-GIT and T-SPOT.TB. However, both tests appear insufficient for ruling in or ruling out SNPTB. The sensitivities and specificities of QFT-GIT and T-SPOT.TB were in the range of 0.7–0.8. The specificity of QFT-GIT was lower in the subgroup of studies conducted in high tuberculosis burden areas, at 0.57 (95% CI 0.42–0.71).
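
To illustrate why likelihood ratios of this size shift diagnostic probability only modestly, the Python sketch below applies Bayes' theorem in odds form to the pooled QFT-GIT likelihood ratios reported in the Results (positive LR 2.61, negative LR 0.33). The function name and the pre-test probabilities (10% and 50%) are illustrative assumptions, not values from the included studies.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pre-test odds x likelihood ratio.
    """
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")

    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled QFT-GIT likelihood ratios from the Results section.
LR_POSITIVE = 2.61
LR_NEGATIVE = 0.33

# Assumed pre-test probabilities, chosen only for illustration.
for pre_test in (0.10, 0.50):
    after_positive = post_test_probability(pre_test, LR_POSITIVE)
    after_negative = post_test_probability(pre_test, LR_NEGATIVE)
    print(
        f"pre-test {pre_test:.0%}: "
        f"positive result -> {after_positive:.1%}, "
        f"negative result -> {after_negative:.1%}"
    )
```

Under these assumed pre-test probabilities, a negative QFT-GIT result lowers a 10% pre-test probability to roughly 3.5%, whereas at a 50% pre-test probability a negative result still leaves a probability of disease of about 25%.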

Previous studies also demonstrated moderate diagnostic performance of IGRAs for active pulmonary tuberculosis in populations that included smear-positive patients [14, 15]. Our study revealed similar diagnostic performance in SNPTB despite the potentially lower mycobacterial burden in these patients. Our findings suggest that IGRA results may not correlate with sputum smear status. A previous study using in-house IGRAs found that the positive rate of IGRAs did not differ significantly between patients with positive and negative sputum smears [30]. In contrast to our study, a previous systematic review and meta-analysis showed that T-SPOT.TB had lower specificity than QFT-GIT for active tuberculosis diagnosis [14]. The difference in inclusion criteria might explain the contradiction, as that meta-analysis included studies of LTBI. Including LTBI may not represent the diagnostic algorithm used in routine practice and may result in an underestimation of specificity.

The diagnostic performance of IGRAs for SNPTB does not seem high enough to diagnose or rule out active tuberculosis. Because IFN-γ can be detected at any stage across the spectrum of tuberculosis disease, the tests cannot distinguish active from latent infection, particularly in areas with a high prevalence of tuberculosis [31]. Previous systematic reviews and meta-analyses demonstrated lower specificity of IGRAs for the diagnosis of active tuberculosis in low- and middle-income countries [13, 31]. Our findings are consistent with these results, as the specificity of QFT-GIT was lower in the subgroup analysis of high tuberculosis burden countries. IGRAs may not be beneficial in this situation.

Our study reported pooled sensitivities of 0.77 and 0.74 for QFT-GIT and T-SPOT.TB, respectively. This means that approximately 25% of patients with SNPTB had negative IGRA results. As described above, IGRAs measure levels of IFN-γ released from T lymphocytes. Several factors affecting T-cell function might be associated with false-negative IGRA results. Advanced age and low peripheral lymphocyte counts have been proposed as risk factors for false-negative IGRA results [22]. Low peripheral lymphocyte counts could plausibly reduce IFN-γ production in response to specific Mycobacterium antigens. Although low peripheral lymphocyte counts may correlate with advanced age, previous studies demonstrated that advanced age was associated with false-negative T-SPOT.TB results even after optimization of lymphocyte counts [32]. HIV infection was also associated with false-negative IGRAs [22]. In our study, the overall sensitivity was higher than that reported in previous studies conducted in HIV populations [2]. The mechanism of this association remains controversial, as it is unclear whether the CD4+ count directly affects the performance of IGRAs [33].

The main limitation we encountered in conducting this review is the heterogeneity of the diagnostic gold standard for SNPTB. We cannot emphasize enough the diagnostic challenge of SNPTB. In our clinical practice, patients with a negative acid-fast stain should undergo further diagnostic procedures, such as bronchoscopy, to obtain adequate specimens for mycobacterial culture. Unfortunately, the procedure might not be practical in some clinical settings. Therefore, our study included a variety of gold standard definitions. This may result in over- or underestimation of the diagnostic accuracy.

There are other limitations worth noting. First, no study of QFT-plus was included. QFT-plus might have higher diagnostic accuracy than QFT-GIT and T-SPOT.TB. One meta-analysis showed that the sensitivity of QFT-plus for detecting tuberculosis infection is higher than that of QFT-GIT [34]. The higher sensitivity of QFT-plus may be because this test detects both CD4 and CD8 T-cell responses. Further studies evaluating the accuracy of QFT-plus in smear-negative pulmonary tuberculosis are warranted. Second, only 50% of the included studies met the low risk of bias criteria. Nevertheless, subgroup analysis showed no difference in results between the low-risk and at-risk-of-bias groups. Finally, because of the low number of included T-SPOT.TB studies, we could not perform subgroup analyses regarding the risk of bias, tuberculosis burden, and proportion of confirmed cases.

Conclusions

Our systematic review and meta-analysis showed that IGRAs have suboptimal accuracy for diagnosing or ruling out SNPTB. Nevertheless, IGRAs might be valuable tests for excluding tuberculosis among patients with low pre-test probability, especially in countries with a low tuberculosis burden.