Introduction

Tuberculosis (TB) is an infectious disease caused by the acid-fast bacillus Mycobacterium tuberculosis (M. tuberculosis) [1]. Although global incidence is declining, TB remains a pandemic. With 10.4 million new cases in 2015 and an estimated 2–3 billion infected people, of whom 10–15% are likely to develop clinical TB, the disease burden is still substantial; TB caused 1.79 million deaths in 2015 and ranks among the top ten causes of death worldwide [1]. The advances in combating global TB are being compromised by the rise in multidrug-resistant TB (MDR-TB), extensively drug-resistant TB (XDR-TB), socioeconomic difficulties, and high prevalence of human immunodeficiency virus (HIV) in the endemic areas [1,2,3]. Extra-pulmonary TB (EPTB) develops readily in patients with concurrent immunosuppression; it is seen in over 50% of patients with concurrent HIV infection [4]. Distribution of disease, treatment, and outcome are strongly correlated with poverty, and most deaths are medically preventable [5]. Immigration and travel continuously introduce M. tuberculosis into the developed world and sustain its prevalence in Western countries [6,7,8].

TB spreads via respiratory droplets and subsequently infects the lower airways and the associated lymphatic system [9]. The lungs are most susceptible, but TB infections may spread throughout the body by either lymphatic or hematogenous dissemination [10,11,12]. In this way, TB may cause a broad spectrum of symptoms, from vague constitutional symptoms to specific symptoms of focal organ-related disease. Host immune responses drive several of the pathophysiologic processes seen in TB. This is illustrated histologically by the presence of caseous necrosis, Langhans giant cells, and epithelioid histiocytes [13, 14]. The clinical course varies greatly and the classification of tuberculous disease is extensive, relying on etiological and clinical parameters [15]. The clinical patterns of TB infection are diverse and will not be discussed further.

Diagnosis and treatment response evaluation of TB rely on indirect serological and immunological observation, radiological assessment of abnormal tissue morphology, and direct light microscopy [16, 17]; such evaluation is as important as it is difficult [18,19,20,21]. A variety of factors influence the diagnostic approach, e.g. the broad spectrum of tuberculous disease and the limitations of current laboratory methods [22, 23]; a common denominator for problems relating to laboratory diagnostics and evaluation is the isolation of the bacterium itself [22]. Thus, the conventional chest x-ray (CXR) is currently the cornerstone in screening, diagnosis, and evaluation of TB, although findings are normal in 15% of the diseased population, and in an even higher proportion of immunosuppressed individuals [16]. CXR has further limitations, i.e. the limited topographical area covered and the potential lag time between disease activity and alterations in tissue morphology [15]. More advanced imaging modalities like computed tomography (CT) and magnetic resonance imaging are also used in TB diagnostics, especially for specific organ assessment in pulmonary TB as well as EPTB [15]. In practice, evaluation of TB is therefore usually based on the presence of acid-fast bacilli in sputum and on radiological findings. However, only 20–55% of patients with active pulmonary TB have sputum containing acid-fast bacilli, and radiographic changes are not always characteristic enough to form the basis of a definitive diagnosis [21]. EPTB is especially difficult to diagnose and evaluate [24, 25]. Evaluating treatment response is particularly challenging in immunocompromised patients. With morphologic imaging, this is due to the relative tardiness or complete absence of the usual imaging features. The various serological assays may be unable to differentiate active infection from latent disease, and reduced host response in immunocompromised patients as well as subsets of resistant strains may influence results [15, 26].

Recent studies have suggested that combined positron emission tomography/CT with 18-fluorine-fluorodeoxyglucose (FDG-PET/CT) might predict response to anti-tuberculous treatment (ATT) early [23, 27]. The extensive treatment regimen for TB comprises a combination of anti-tuberculous drugs and antibiotics over several months and is costly and inconvenient. Detecting early response could limit the time spent on futile medication when treating non-responders, and subsequently facilitate more rapid alterations of the treatment strategies, including swift and effective targeting of drug-resistant TB by early recognition of treatment failure to benefit both patients and health care facilitators.

The progressing drug resistance amongst strains of M. tuberculosis (MDR-TB and XDR-TB) calls for vigilance in clinical practice, as infections by resistant strains tend to persist after completion of conventional ATT [1, 28]. Response evaluation is, therefore, of special importance in cases with highly suspected or proven drug resistance. FDG-PET/CT has shown the ability to assess lesion activity in both MDR-TB and XDR-TB [22, 29]. Confirming MDR-TB by conventional laboratory methods may take 6 to 16 weeks and is still unsuccessful in 30% of cases. In clinical settings with acute TB, such lag time is not acceptable and may prove fatal [22]. FDG-PET/CT may provide more prompt assessment of the lesions' metabolic activity and of subsequent changes over time.

The aim of this study was to assess the value of FDG-PET or FDG-PET/CT for evaluating treatment response and clinical outcome in TB-infected patients.

Materials and methods

The Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) statement [30] was applied for this article. A review protocol was not elaborated and ethical review was deemed unnecessary.

Data source

The population, intervention, comparison, outcome (PICO) framework [31] of this study was to determine if FDG-PET or FDG-PET/CT can be used to evaluate treatment response in humans infected with TB. We performed a comprehensive search in Medline/PubMed and Embase databases for relevant articles using both subject headings and text words. The search was conducted on 7 March 2017. Keywords included “PET-CT”, “FDG” and “tuberculosis” along with their derivatives. The search was limited to human studies and English language papers. A full search strategy is provided in Online Resource 1. The results were collected, merged, and filtered using the computer software EndNote X8 (Clarivate Analytics, Philadelphia, PA).

Study selection

The selection of articles was conducted by two authors (HS, TS) individually using Covidence online software [32]. Inclusion criteria were original studies with humans infected with M. tuberculosis undergoing ATT with a minimum sample size of 10 patients. All subjects had to receive a minimum of one FDG-PET/CT or FDG-PET scan, and follow-up had to be evaluated using clinical outcome. Reviews, case reports, poster presentations, conference abstracts and other contributions not being original, peer-reviewed papers were excluded. The primary selection was based on article title and abstract. Full texts of the remaining articles were then appraised. Remaining articles were reviewed in detail by the authors and included in this review. Eligible studies were included in a meta-analysis if at least one post-baseline FDG-PET/CT or FDG-PET scan was also performed. Any discrepancies between reviewers were resolved by consensus and a third investigator (SH) when needed.

Efforts to obtain supplementary data and quality assessment

If data were missing, insufficient or equivocal, efforts were made to contact the corresponding author. The quality and risk of bias for each article were assessed using the QUADAS-2 tool [33].

Statistical analysis

Percentage change in maximum standardized uptake value (SUVmax) from baseline was meta-analyzed with a fixed-effects model using the inverse variance method, supplemented by a forest plot [34] to graphically display the point estimates and the respective 95% confidence intervals (CI) on a per-study basis as well as summarized across studies. The heterogeneity of the studies was assessed with the I-squared statistic [35], and publication bias was assessed visually with a funnel plot [36, 37]. All analyses were done with STATA/MP 14.2 (StataCorp, College Station, Texas 77845 USA); for the meta-analysis, the metan package was applied.
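For readers less familiar with the method, the quantities named above can be written out as follows. This is a sketch in generic textbook notation (the symbols are assumptions introduced here, not taken from the original analysis), and the exact internals of the metan package may differ in detail. Here k is the number of study populations, and \hat{\theta}_i and SE_i are the per-study mean percentage change and its standard error.

```latex
% Percentage change in SUVmax for one patient (baseline vs. follow-up scan)
\Delta\mathrm{SUV}_{\max} = 100 \times
  \frac{\mathrm{SUV}_{\max}^{\text{follow-up}} - \mathrm{SUV}_{\max}^{\text{baseline}}}
       {\mathrm{SUV}_{\max}^{\text{baseline}}}

% Inverse-variance weights and fixed-effects pooled estimate
w_i = \frac{1}{SE_i^{2}}, \qquad
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \,\hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad
SE(\hat{\theta}) = \frac{1}{\sqrt{\sum_{i=1}^{k} w_i}}

% 95% confidence interval (normal approximation)
\hat{\theta} \pm 1.96 \, SE(\hat{\theta})

% Cochran's Q and the I-squared statistic
Q = \sum_{i=1}^{k} w_i \,(\hat{\theta}_i - \hat{\theta})^{2}, \qquad
I^{2} = \max\!\left(0, \frac{Q - (k-1)}{Q}\right) \times 100\%
```

Under this scheme, studies with narrow confidence intervals dominate the pooled estimate, which is worth keeping in mind when the study populations differ widely in size.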

Results

Medline/PubMed generated 480 results, Embase generated 1568 results, and the primary selection resulted in exclusion of 1927 articles not relevant to the subject (Fig. 1). Full text appraisal of 121 articles resulted in exclusion of 112, and the remaining nine papers were reviewed by the authors and included in this review.

Fig. 1 PRISMA-based flow chart [20]

The nine included articles (Table 1) were published between 2009 and 2016. Five (56%) articles were prospective cohorts, two (22%) were randomized controlled trials, one (11%) was a prospective observational study, and one (11%) was a retrospective case control study. One used PET, while eight used PET/CT. A total of 332 patients were included in these studies (range 19–131). Sixty-seven patients (20%) were lost to follow-up and thus 265 patients completed ATT according to study-protocol. All articles included in this review sought to assess the ability of FDG-PET or FDG-PET/CT to evaluate treatment response in patients with TB undergoing ATT.

Table 1 Study and patient characteristics

Four studies [18, 21, 23, 25] presented change in SUVmax from baseline and were included in our meta-analysis (Fig. 2). The number of included patients in these studies ranged from 21 to 131, and the estimated overall percentage change in SUVmax was − 54.38% (95% CI − 57.81 to − 50.96). All patients in these studies were HIV negative and, with few exceptions [25], without any concomitant malignancy. The article by Demura et al. [21] included patients infected with either M. tuberculosis or M. avium (MAC); therefore, a subgroup analysis was performed to only include patients infected with M. tuberculosis. In the study by Malherbe et al. [23], two patient cohorts were followed, one from South Korea (N = 14), and one from South Africa (N = 99). These groups were analyzed separately and indicated by “Malherbe 2016a-b”, respectively.

Fig. 2 Forest plot on ∆SUVmax. CI confidence interval, ES estimate

The two articles by Sathekge et al. [22, 24] exclusively involved HIV-positive patients and, by study design, patients underwent only one FDG-PET/CT. SUVmax and number of involved lymph nodes (LN) were compared with treatment outcome to investigate sensitivity and specificity of various cut-off values. These studies were not included in our meta-analysis because of the concomitant HIV infection, the binary (responder/non-responder) classification of ATT response, and the presentation of only a single FDG-PET/CT scan.

The remaining three studies [27, 29, 38] presented only qualitative/nominative results. The sample size was between 19 and 35 patients. They were heterogeneous in design, type of TB, and comorbidities. The data presented were quantitatively insufficient and incompatible with the meta-analysis. The findings were illustrated using a variety of nominative scales categorizing ATT response. Corresponding authors of these articles were contacted and quantitative numerical data (SUVmax) were requested, but without success. The study by Coleman et al. [29] had macaques as primary subjects and humans were included from a parallel study; a subgroup analysis was, therefore, performed.

Assessing quality and risk of bias

Individual assessment of the included articles was performed using the QUADAS-2 tool (Fig. 4); the assessment of each QUADAS-2 domain for individual articles and possible sources of bias is available as Online Resource 2. The funnel plot (Fig. 3) of the meta-analyzed studies was asymmetrical. Three of the five study populations were outside the pseudo 95% confidence limits, which indicates possible heterogeneity and publication bias. The PRISMA checklist is provided in Online Resource 3.

Fig. 3 Funnel plot of the studies included in meta-analysis
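To illustrate how the pseudo 95% confidence limits of a funnel plot are read, the following minimal sketch checks which study populations fall outside them. It uses the per-study estimates and 95% CIs quoted in the Discussion and the pooled estimate from the Results. Two assumptions are made here and are not from the original analysis: standard errors are back-calculated from the reported CIs as (upper − lower) / (2 × 1.96), and the limits are taken as the pooled estimate ± 1.96 × SE. The names `STUDIES` and `outside_funnel` are illustrative only.

```python
Z95 = 1.96        # normal quantile assumed for the reported 95% CIs
POOLED = -54.38   # pooled percentage change in SUVmax reported in the Results

# (estimate, lower 95% CI, upper 95% CI) as quoted in the Discussion
STUDIES = {
    "Malherbe 2016a": (-52.04, -56.73, -47.36),
    "Malherbe 2016b": (-29.04, -52.21, -5.88),
    "Dureja [18]": (-56.13, -62.21, -50.05),
    "Martinez [25]": (-34.38, -48.71, -20.05),
    "Demura [21]": (-89.13, -100.00, -73.90),
}


def outside_funnel(studies, pooled=POOLED):
    """Return the studies lying outside pooled +/- 1.96 * SE (pseudo 95% limits)."""
    flagged = []
    for name, (estimate, lower, upper) in studies.items():
        se = (upper - lower) / (2 * Z95)  # assumption: symmetric normal-theory CI
        if abs(estimate - pooled) > Z95 * se:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(outside_funnel(STUDIES))
```

Under these assumptions, three of the five study populations are flagged, in line with the statement above.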

Discussion

General statements of principal findings

The goal of this study was to evaluate treatment response and clinical outcome in TB-infected patients using FDG-PET or FDG-PET/CT. The literature search strategy and selection process resulted in nine articles.

The four studies included in the meta-analysis compared changes in SUVmax between baseline scans and follow-up scans. The results showed a pooled change in SUVmax of − 54.38% (95% CI − 57.81, − 50.96). This may indicate that FDG-PET or FDG-PET/CT could be useful for evaluation of disease activity and monitoring of treatment response in patients with various forms of TB. Three of the four studies included in the meta-analysis demonstrated clinical correlation between patient outcome and changes in SUVmax. The three studies by Chen et al. [38], Coleman et al. [29], and Stelzmueller et al. [27] all concluded that FDG-PET/CT was potentially beneficial when assessing ATT response, but only Chen et al. applied a consistent clinical outcome reference standard. The two articles by Sathekge et al. [22, 24] concluded that FDG-PET/CT may separate responders from non-responders by correlating SUV-data with sputum cultures, clinical course, and radiological assessment, but neither presented data on semi-quantitative changes to corroborate these conclusions.

Details on included studies

Malherbe et al. [23] followed two cohorts, which showed overall changes in SUVmax of − 52.04% (95% CI − 56.73, − 47.36) (2016a) and − 29.04% (95% CI − 52.21, − 5.88) (2016b); see also Fig. 2. The FDG-PET/CT results were compared with a sputum culture reference standard at month 6, and a correlation between lesion activity and detection of M. tuberculosis in sputum cultures was found. Overall clinical outcomes were also correlated with FDG-PET/CT findings: patients with so-called mixed imaging responses (novel FDG-avid lesions or foci with increased intensity in follow-up scans) were more likely to have an unfavorable outcome, i.e. 12/20 (60%) patients with treatment failure or recurrent disease demonstrated this pattern compared to 21/76 (28%) cured patients. Similarly, 12/76 (16%) cured patients demonstrated a so-called resolved pattern (normalized FDG uptake in all lesions on follow-up scans) compared to only 1/12 (8%) patients with recurrent disease, and none of the patients in whom treatment failed. The strength of the study lies in the large cohort size, but it is weakened by the lack of a standardized treatment protocol.

In the study by Dureja et al. [18], changes in SUVmax between sequential scans were assessed along with visual analogue scale (VAS) score, erythrocyte sedimentation rate (ESR), Eastern Cooperative Oncology Group (ECOG) score [39], and reports of subjective improvement in patients with uncomplicated spinal TB. Results showed a pooled change in SUVmax of − 56.13% (95% CI − 62.21, − 50.05), which correlated with the clinical improvement on VAS score, but not the ESR, at 6, 12, and 18 months. The study concluded that SUVmax can be used as a quantitative biomarker for metabolic activity in spinal tuberculosis and that radiological findings correlated well with the clinical outcome. A weakness is the large attrition of patients during follow-up (N = 15).

Martinez et al. [25] performed FDG-PET/CT scans at baseline and 1 month after ATT initiation and showed a pooled change in SUVmax of − 34.38% (95% CI − 48.71, − 20.05). No consistent reference standard was used during follow-up. The study concluded that change in SUVmax was a marker of chemotherapeutic effect, especially in patients with EPTB; 18/20 patients were considered cured clinically/serologically, and all displayed some decrease of SUVmax, whereas the remaining two patients with increased SUVmax later had their diagnoses revised to lymphoma and MDR-TB, respectively. The strength of this article was the homogeneity of the study population and scan protocol.

The TB-infected patients in the study by Demura et al. [21] showed a pooled change in SUVmax of − 89.13% (95% CI − 100.00, − 73.90). The patients underwent both FDG-PET and high-resolution CT (HRCT) scans, and changes were clinically evaluated. No other reference method was defined. The conclusion of the study was that FDG-PET/CT could be useful for diagnosis and monitoring of treatment response in patients with pulmonary TB. The strength of the article was the homogeneity of the study population, treatment, and scan protocol, but it was limited by the small study population due to the large attrition and by the lack of clinical correlation.
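The per-study estimates quoted above can be combined with an inverse-variance fixed-effects model, as in the following minimal sketch. It is not the original Stata/metan analysis. Standard errors are back-calculated from the reported 95% CIs as (upper − lower) / (2 × 1.96), which is an assumption made here, and the names `STUDIES`, `se_from_ci`, and `fixed_effect` are illustrative only.

```python
import math

Z95 = 1.96  # normal quantile assumed for the reported 95% CIs

# (estimate, lower 95% CI, upper 95% CI) in percent, as quoted in the text
STUDIES = {
    "Malherbe 2016a": (-52.04, -56.73, -47.36),
    "Malherbe 2016b": (-29.04, -52.21, -5.88),
    "Dureja [18]": (-56.13, -62.21, -50.05),
    "Martinez [25]": (-34.38, -48.71, -20.05),
    "Demura [21]": (-89.13, -100.00, -73.90),
}


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z95)


def fixed_effect(studies):
    """Inverse-variance fixed-effects pooling with Cochran's Q and I-squared."""
    estimates = [est for est, _, _ in studies.values()]
    weights = [1.0 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies.values()]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - Z95 * pooled_se,
        "ci_high": pooled + Z95 * pooled_se,
        "i_squared": i_squared,
    }


if __name__ == "__main__":
    for key, value in fixed_effect(STUDIES).items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions, the sketch should land close to the pooled change of − 54.38% (95% CI − 57.81, − 50.96) and the I² of about 90% reported in this review. It also shows how strongly the larger Malherbe cohort and the Dureja study weigh on the pooled value.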

Three studies [27, 29, 38] compared changes between two FDG-PET/CT scans using a variety of nominative scales. In the article by Chen et al. [38], 28 MDR-TB infected patients completed treatment and were classified as either treatment success (N = 24) or failure (N = 4) using sputum culture conversion as reference standard. FDG-PET/CT was the best method for early prediction of treatment results and long-term outcome, i.e. at 2 months FDG-PET/CT demonstrated 96% sensitivity for predicting treatment success, and 79% specificity for predicting treatment failure. Similar results were achieved with CT, but not until the 6-month scan. No SUVmax values for individual patients were presented in the article.

In the study by Coleman et al. [29], 19 patients infected with XDR-TB and treated with linezolid monotherapy were randomly assigned to four groups with different intervals between scans. Apart from scans at baseline and month 6, additional scans were performed 2 months prior to ATT initiation (N = 5) or after initiation of ATT at month 1 (N = 4), month 2 (N = 4), or month 3 (N = 5). One subject was excluded after the baseline scan. SUVmax results were only graphically represented. The reference standard was sputum culture. Of the 19 subjects included in the study, all except two showed a consistent decrease in inflammatory activity. The study concluded that FDG-PET/CT imaging might be used to quantitatively measure early drug efficacy against TB in humans.

Stelzmueller et al. [27] examined changes in lesion activity in 35 patients, of whom 13 had comorbidities. The scan protocol was not standardized and subjects received 2–6 scans. Outcome was defined by changes in lesion activity and categorized into groups accordingly, with no definitive reference standard. Results were presented as remission of disease (N = 15), residual disease (N = 16), or progression of disease (N = 4). The study stated that FDG-PET/CT could be useful to evaluate response to therapy and help define duration of treatment, but urged the need for further studies to investigate this. The comorbidities could have confounded the results, whereas the retrospective study design and heterogeneity in the timing of scans could have introduced bias.

Sathekge et al. published two articles [22, 24] in which they assessed SUVmax and the number of involved lymph node (LN) basins using FDG-PET/CT and CT, and they correlated findings with clinical outcome, classifying patients as responders (N = 28) or non-responders (N = 16). In both studies, SUVmax and the number of involved (PET-positive) LN basins were significantly higher in non-responders compared to responders. In their 2011 study, a cut-off of 5 FDG-avid LN basins separated responders from non-responders with a sensitivity and specificity of 88 and 81%, respectively, and a negative predictive value of 93%. Other parameters like viral load and CD4 status remained non-significant. In their 2012 study, an SUVmax cut-off of 4.5 separated responders from non-responders with a sensitivity and specificity of 95 and 85%, respectively, whereas corresponding values for CT-based response evaluation were 88 and 66%, respectively. The reference standard was sputum culture, complemented by clinical assessment and radiological findings when needed. The studies concluded that FDG-PET/CT has the potential to become a valuable tool for visualizing the number of involved LN basins to separate responders from non-responders.
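As a reminder of how cut-off-based figures such as these are derived, the sketch below computes sensitivity, specificity, and predictive values from a 2×2 table. The counts are purely hypothetical and are not taken from the studies by Sathekge et al., whose per-patient data are not reproduced here. "Positive" is assumed to mean a scan at or above the chosen cut-off, and the target condition is assumed to be non-response. The class name `TwoByTwo` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 table for a dichotomized test against a reference standard."""
    tp: int  # non-responders at or above the cut-off
    fp: int  # responders at or above the cut-off
    fn: int  # non-responders below the cut-off
    tn: int  # responders below the cut-off

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow
    table = TwoByTwo(tp=18, fp=10, fn=2, tn=40)
    print(f"sensitivity {table.sensitivity():.0%}, specificity {table.specificity():.0%}")
    print(f"PPV {table.ppv():.0%}, NPV {table.npv():.0%}")
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of non-responders in the sample, so they transfer poorly between populations with different treatment failure rates.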

The nine articles are heterogeneous in design and patient selection and, thus, not fully comparable. However, there is a consistent trend suggesting FDG-PET/CT may be used for evaluation of ATT response. A limited number of articles are published on this subject, with varying demographic populations, ethnicity, socioeconomics, and geographical origin across the studies. FDG-PET/CT might have the potential to shorten treatment time and individualize treatment protocol depending on lesion activity [21, 27]. Larger studies are needed to confirm this. When FDG-PET/CT was compared to other radiological modalities, it proved more sensitive [18, 27], with the earliest changes seen in scans taken 4 weeks after initiation of ATT [25]. Some tuberculous lesions do not shrink and can even grow during treatment regardless of efficacy. Imaging modalities that solely examine lesion morphology and not lesion metabolism might, therefore, be misleading [40]. In the early stages of disease, metabolic alterations in tissue are seen prior to macroscopic changes [15]; FDG-PET/CT has the potential to discover these changes in lesion activity much earlier than other imaging methods [27]. A lower uptake of FDG, and subsequently lower SUVmax on follow-up scans, indicates that ATT is effective and that the treatment strategy should be continued. Conversely, unaltered or increased SUVmax suggests insufficient ATT response and argues for changes in the treatment regimen [21]. FDG-PET/CT proves useful in evaluating activity and response to ATT, but a definite diagnosis based on culture and tissue is recommended [21]. An overarching issue is the general applicability of SUVmax-based response evaluation, which remains controversial. Potential problems related to FDG injection include extravasation at the injection site, decay between dose measurement and time of injection, and residual activity in the syringe. By adhering to the EANM/EARL standards, these challenges could be overcome [41]. Patient factors, for example, body weight, tissue composition, and blood glucose level, as well as technical issues such as organ or lesion movement during scanning, could also affect results [42].
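To make these sources of error concrete, the sketch below computes a body-weight-normalized SUV and shows where decay and syringe residual enter the calculation. It is a simplified illustration, not a description of any scanner's or any included study's processing. It assumes the image concentration is expressed at scan time, the residual is measured at injection time, and tissue density is 1 g/mL. All function names and input values are hypothetical.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def decay(activity_mbq, minutes):
    """Activity remaining after `minutes` of physical decay."""
    return activity_mbq * 0.5 ** (minutes / F18_HALF_LIFE_MIN)


def net_injected_mbq(measured_mbq, minutes_to_injection, residual_mbq):
    """Delivered dose: decay the measured dose to injection time, then
    subtract the syringe residual (assumed measured at injection time)."""
    return decay(measured_mbq, minutes_to_injection) - residual_mbq


def suv_bw(conc_kbq_per_ml, injected_mbq, uptake_minutes, weight_kg):
    """Body-weight SUV with the injected dose decayed to scan time."""
    dose_at_scan_kbq = decay(injected_mbq, uptake_minutes) * 1000.0
    return conc_kbq_per_ml * (weight_kg * 1000.0) / dose_at_scan_kbq


def percent_change(baseline, follow_up):
    """Percentage change from baseline, as used for the SUVmax response."""
    return 100.0 * (follow_up - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical patient: 70 kg, 300 MBq measured 10 min before injection,
    # 5 MBq left in the syringe, 60 min uptake, lesion at 12 kBq/mL
    dose = net_injected_mbq(300.0, 10.0, 5.0)
    corrected = suv_bw(12.0, dose, 60.0, 70.0)
    naive = suv_bw(12.0, 300.0, 60.0, 70.0)  # ignores pre-injection decay and residual
    print(f"corrected SUV {corrected:.2f} vs naive SUV {naive:.2f}")
```

Ignoring pre-injection decay and residual activity overstates the delivered dose and therefore understates SUV. If the size of that error differs between the baseline and follow-up scans, it feeds directly into the apparent percentage change.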

Comorbidities

Comorbid patients usually have a more progressive disease course and higher mortality [43, 44]. The risk of developing TB is much higher in HIV-infected individuals [1]. HIV co-infection impairs the ability of the immune system to contain the TB infection and accelerates the clinical course of HIV [22]. Early detection of treatment failure is, therefore, especially important in this patient group. Our findings suggest that ATT evaluation by FDG-PET/CT shows similar response patterns (Fig. 2) regardless of site of TB and comorbidities, which may indicate that FDG-PET/CT could be used for assessment of different forms of tuberculous disease that previously depended on separate diagnostic approaches.

Strengths and weaknesses of the study

The strength of this review lies in the extensive literature search strings and article selection. The search terms were very broad and article selection was performed by two separate authors, which increased the likelihood of including all relevant papers. Corresponding authors of studies lacking relevant data [27, 29, 38] were contacted. To maximize transparency, the PRISMA statement structure was applied and the included articles were assessed using the QUADAS-2 tool to identify possible sources of bias. Potential weaknesses of this study are the following:

  i. The considerable heterogeneity in the meta-analyzed studies (I² = 90.1%).

  ii. The potential risk of bias in included articles (Fig. 4 and Online Resource 2).

  iii. The lack of a gold standard as well as inter-study inconsistency regarding diagnostic strategy and consensus regarding clinical outcome and definition of treatment response.

  iv. The assumed confounding roles of varying geographical and socioeconomic factors.

  v. The use of only the Medline/PubMed and Embase databases; some gray literature may have been missed.

  vi. The significant variations in ATT protocols and manifestation of TB [pulmonary (N = 4), EPTB (N = 1), and both or unspecified (N = 6)] across studies. Furthermore, the FDG-PET/CT scans were performed within substantially variable timelines and intervals (Table 1). This may confound results in the given study and further render results less applicable to TB in general.

Fig. 4 Graphical display of QUADAS-2 results [33]

Conclusion, unanswered questions and future research

The included articles indicate that FDG-PET or FDG-PET/CT is capable of evaluating tuberculous lesion activity, with special clinical significance in early ATT response evaluation. Further research is needed to clarify more precisely the clinical value of FDG-PET/CT in predicting the course of tuberculous disease and its potential for individual assessment of ATT response. This might facilitate a more optimized or personalized treatment regimen based on (semi)quantitative analysis, comorbidity, location, and dissemination of disease.

The included articles all identify FDG-PET/CT as a potential surrogate marker for ATT response, with decreasing SUVmax indicating favorable clinical outcome. However, the precise predictive value, sensitivity, and specificity remain uncertain. Also, the optimal time interval between baseline scan and response evaluation scan remains unclear and should be evaluated further. Finally, several variables may influence the results, so the establishment and evaluation of more standardized approaches is imperative.

Another matter to be addressed is the potential of novel quantification methods that may overcome the inherent weaknesses of SUV-based quantification. Clinically, a simple numerical cut-off value would be of practical importance for the clinicians to differentiate responders and non-responders and evaluate treatment response. However, establishing such dichotomous cut-offs of SUV-based measurements is probably not valid; we have previously addressed both the challenges in using SUV-based quantification, especially in response evaluation, and the potential of novel quantitative techniques. Assessment of global disease burden in essence presents the clinicians with a single number expressing the degree of disease activity either globally (systemic), or within a single organ while at the same time attenuating some of the difficulties with SUV, e.g. applying comparable regions of interest in sequential scans and partial volume correction [42, 45]. This may be of interest in TB, both locally in the lungs or systemically in disseminated EPTB. This has not yet been evaluated in TB, but results on lung inflammation and mesothelioma have shown the methodology to be feasible in the lungs [46, 47].

As mentioned, one of the inherent challenges of FDG is its relative non-specificity. This may affect the assessment of lung lesions suspicious for malignancy in TB-endemic areas and vice versa. In theory, multiple time point imaging enhances differentiation between benign and malignant tissue due to differences in FDG metabolism between cancer cells and inflammatory cells. However, several investigators have questioned the value of this time-consuming technique, and further studies are needed for clarification [48, 49].

These unresolved issues and controversies may all be addressed in a well-designed prospective multicenter study adhering to EARL standards. Although novel tracers are in the pipeline, the potential of FDG has not yet been fully explored, so work should currently focus on this ubiquitous tracer before too much effort is invested in other tracers.