Introduction

The pathology of progressive brain radiation necrosis (BRN) primarily involves inflammation and angiogenesis, in which cytokines, chemokines, and vascular endothelial growth factor are upregulated [1,2,3,4,5,6,7]. Inflammation and angiogenesis cause breakdown of the blood–brain barrier, resulting in contrast-enhanced lesions and perilesional edema. However, recurrent tumors also display these findings on computed tomography (CT) and magnetic resonance imaging (MRI), so distinguishing BRN from tumor progression (TP) on conventional radiological imaging is challenging. Although pathological diagnosis remains the gold standard, obtaining tissue samples is invasive, even by stereotactic biopsy. Moreover, needle biopsy carries a risk of misdiagnosis because BRN is typically a heterogeneous lesion in which radiation necrosis and tumor cells coexist [8]. Ideally, BRN would be diagnosed by less-invasive radiological examinations that, unlike needle biopsy, evaluate the whole lesion. Recently, bevacizumab was shown to markedly reduce brain edema and improve patients’ clinical status, making it a promising novel treatment for BRN [9,10,11,12]. Because bevacizumab delays surgical wound healing, patients diagnosed with BRN by surgical biopsy must wait for the wound to heal before receiving bevacizumab. In contrast, bevacizumab could be administered immediately after BRN is diagnosed by noninvasive radiological imaging.

Over the last several decades, various functional imaging and nuclear medicine techniques have been developed, and they appear useful for differentiating between BRN and TP. For example, MR spectroscopy (MRS) and diffusion-weighted imaging (DWI) provide qualitative data without contrast media. Perfusion imaging depicts cerebral blood flow or cerebral blood volume (CBV) using contrast media. In addition, single photon emission CT (SPECT) and positron emission tomography (PET) provide metabolic data using various tracers. Although these imaging studies are useful for differentiating between BRN and TP, it remains unclear which is preferable. Hence, this systematic review evaluates the diagnostic accuracy of radiological imaging for differentiating between BRN and TP.

Methods

Search strategy

We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [13]. Our review question (RQ) was structured using the patient, exposure, comparison, and outcome (PECO) approach. Our RQ was, “Are radiological imaging studies useful for distinguishing BRN from TP in brain tumor patients treated with radiotherapy who exhibit clinical or radiological disease progression?” Although many hospitals own CT and MRI equipment, SPECT and PET are less common. Hence, we categorized the radiological examinations into two groups: CT and MRI as conventional radiological imaging (RQ1), and SPECT and PET as nuclear medicine imaging (RQ2). Our medical librarians conducted a comprehensive systematic search of the PubMed, Cochrane Library, and Japan Medical Abstracts Society databases up to March 2015. Additional file 1 presents the search keywords. Several new PET tracers have been developed in recent years; however, it is too early to assess their ability to differentiate between BRN and TP, because a systematic review requires a sufficient number of studies. Hence, “fluorodeoxyglucose”/“FDG” and “amino acid”/“methionine” were included in the keywords. These tracers have long been used, so an adequate number of studies was expected for the systematic review. Two reviewers (MF and KY for RQ1, and NN and TS for RQ2) screened the studies and determined which to include for each RQ. Eligible studies investigated the diagnostic accuracy of radiological imaging methods for differentiating between BRN and TP and were written in English or Japanese. Eligible participants were patients who underwent radiotherapy for brain tumors. We excluded case reports, letters to the editor, and conference abstracts, as well as studies without sufficient information to construct a 2 × 2 table.

Quality assessment and data analysis

The reviewers assessed the quality of individual studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) checklist [14]. The QUADAS-2 tool comprises four domains: patient selection, index test, reference standard, and flow and timing. QUADAS-2 separates study quality into “risk of bias” and “applicability.” We judged the risk of bias using signaling questions, and applicability according to concerns that a study did not match the RQ. Each domain was assessed for risk of bias, and the first three domains were also assessed for concerns about applicability. The risk of bias and applicability were assessed by the reviewers for each RQ. In addition to the QUADAS-2 assessment, indirectness, inconsistency, and imprecision were assessed for the body of evidence.
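The structure of the checklist described above can be sketched as a small data type. This is an illustrative sketch, not part of the review's workflow: the class name `Quadas2Assessment` and the example study are hypothetical, and the three judgement levels (low, high, unclear) follow the QUADAS-2 convention. The validation encodes the rule that risk of bias is rated for all four domains, whereas applicability is rated for the first three only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four QUADAS-2 domains, in checklist order.
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
# Applicability concerns are rated for the first three domains only.
APPLICABILITY_DOMAINS = DOMAINS[:3]
# Judgement levels used for both risk of bias and applicability concerns.
LEVELS = ("low", "high", "unclear")


@dataclass
class Quadas2Assessment:
    """Hypothetical container for one study's QUADAS-2 ratings."""
    study_id: str
    risk_of_bias: Dict[str, str] = field(default_factory=dict)
    applicability: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the ratings do not follow the QUADAS-2 layout."""
        if set(self.risk_of_bias) != set(DOMAINS):
            raise ValueError("risk of bias must be rated for all four domains")
        if set(self.applicability) != set(APPLICABILITY_DOMAINS):
            raise ValueError("applicability is rated for the first three domains only")
        for rating in [*self.risk_of_bias.values(), *self.applicability.values()]:
            if rating not in LEVELS:
                raise ValueError(f"unknown rating: {rating!r}")

    def high_risk_domains(self) -> List[str]:
        """Domains judged at high risk of bias, in checklist order."""
        return [d for d in DOMAINS if self.risk_of_bias.get(d) == "high"]


# Hypothetical example: a retrospective study with a post hoc cutoff and a
# clinical-follow-up reference standard.
example = Quadas2Assessment(
    study_id="hypothetical-study",
    risk_of_bias={
        "patient selection": "high",
        "index test": "high",
        "reference standard": "high",
        "flow and timing": "unclear",
    },
    applicability={
        "patient selection": "low",
        "index test": "high",
        "reference standard": "high",
    },
)
example.validate()
print(example.high_risk_domains())  # → ['patient selection', 'index test', 'reference standard']
```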

We used Cochrane Collaboration Review Manager 5 (Review Manager. Version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014) to analyze the data of each study. The sensitivity, specificity, and accuracy, with 95% confidence intervals (CI), were calculated and evaluated by visual inspection of forest plots. For the quantitative synthesis, we performed a bivariate random-effects meta-analysis of diagnostic accuracy and constructed summary receiver operating characteristic (SROC) curves with R software version 3.4.3 (https://www.R-project.org/), using the mada package provided by CRAN (The Comprehensive R Archive Network; https://cran.r-project.org/). The “reitsma” function (https://www.rdocumentation.org/packages/mada/versions/0.5.8/topics/reitsma) was used to produce summary estimates of sensitivity and specificity [15], and the “madauni” function (https://www.rdocumentation.org/packages/mada/versions/0.5.8/topics/madauni) was used for the diagnostic odds ratio (DOR). Furthermore, a subanalysis of the quantitative synthesis was performed by tumor type (gliomas and metastatic brain tumors).
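For a single study, the quantities above follow directly from its 2 × 2 table. The Python sketch below is illustrative only and is not the Review Manager or mada implementation. It assumes, consistent with the interpretation in the Discussion, that a positive index test means “BRN diagnosed”; to avoid a clash with the abbreviation TP (tumor progression), the cells are named `true_pos`, `false_pos`, `false_neg`, and `true_neg`. The Wilson score interval for sensitivity and specificity, the log-scale interval for the DOR, and the 0.5 continuity correction are choices made for this sketch; the software used in the review may compute intervals differently. The table counts are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def wilson_ci(successes: int, n: int, z: float = Z95):
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diagnostic_metrics(true_pos, false_pos, false_neg, true_neg, correction=0.5):
    """Accuracy measures from one 2 x 2 table.

    Assumption: a 'positive' index test means 'BRN diagnosed', and the
    reference standard classifies each lesion as BRN or tumor progression.
    """
    sens = true_pos / (true_pos + false_neg)
    spec = true_neg / (true_neg + false_pos)
    acc = (true_pos + true_neg) / (true_pos + false_pos + false_neg + true_neg)
    cells = (true_pos, false_pos, false_neg, true_neg)
    # Add a continuity correction to every cell if any cell is zero, so the
    # DOR and its log-scale standard error stay finite.
    if 0 in cells:
        cells = tuple(x + correction for x in cells)
    a, b, c, d = cells
    dor = (a * d) / (b * c)
    se_log_dor = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    dor_ci = (math.exp(math.log(dor) - Z95 * se_log_dor),
              math.exp(math.log(dor) + Z95 * se_log_dor))
    return {
        "sensitivity": (sens, wilson_ci(true_pos, true_pos + false_neg)),
        "specificity": (spec, wilson_ci(true_neg, true_neg + false_pos)),
        "accuracy": acc,
        "dor": (dor, dor_ci),
    }


# Hypothetical table: 20 lesions with BRN and 10 with tumor progression.
metrics = diagnostic_metrics(true_pos=18, false_pos=3, false_neg=2, true_neg=7)
print(metrics["sensitivity"][0], metrics["specificity"][0], metrics["dor"][0])  # → 0.9 0.7 21.0
```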

Results

Search results

Our database search for RQ1 yielded 188 papers, and 13 additional records were identified from literature reviews. Of these 201 papers, we excluded 34 duplicates and 141 papers that were case reports, had incompatible content, or lacked adequate information. In the first screening, we identified 26 papers for full-text assessment. In the second screening, six papers were excluded because the numbers of patients with true/false positive and negative results could not be identified or a 2 × 2 table could not be constructed. Finally, we included 20 studies in the qualitative synthesis (Fig. 1; Table 1). The database search for RQ2 yielded 239 papers, and 16 additional papers were identified from review articles. Of these 255 papers, we excluded 37 duplicates and 154 papers that were case reports, had incompatible content, or lacked adequate information. We selected 64 papers for full-text screening, of which 38 were excluded because a 2 × 2 table could not be constructed. Finally, we included 26 studies in the RQ2 meta-analysis (Fig. 1; Table 2).
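The screening counts reported above can be checked arithmetically. The helper below is a hypothetical illustration that uses only the numbers stated in this section.

```python
def prisma_flow(database: int, other: int, duplicates: int,
                excluded_screening: int, excluded_full_text: int):
    """Return (records identified, full texts assessed, studies included)."""
    identified = database + other
    full_text = identified - duplicates - excluded_screening
    included = full_text - excluded_full_text
    return identified, full_text, included


print(prisma_flow(188, 13, 34, 141, 6))   # RQ1 → (201, 26, 20)
print(prisma_flow(239, 16, 37, 154, 38))  # RQ2 → (255, 64, 26)
```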

Fig. 1

Flow diagrams of the study selection for RQ1 (conventional radiological imaging) and RQ2 (nuclear medicine imaging)

Table 1 Summary of studies for RQ1 (conventional radiological imaging)
Table 2 Summary of studies for RQ2 (nuclear medicine imaging)

Meta-analysis

For RQ1, gadolinium (Gd)-enhanced MRI, DWI, MRS, and CT/MR perfusion were identified as methods for diagnosing BRN. The Gd-MRI analysis included four studies [16,17,18,19], the DWI analysis included two studies [20, 21], and the MRS analysis included nine studies [20, 22,23,24,25,26,27,28,29]. The CT and MR perfusion analyses included one [30] and eight studies [20, 21, 25, 31,32,33,34,35], respectively. Three of these studies also evaluated combinations of multiple imaging methods (DWI and MRS, DWI and perfusion MRI, or DWI, MRS, and perfusion MRI) [20, 21, 28]. Additional file 2 describes the characteristics of the studies included in the analysis of each modality. Figure 2 shows forest plots of each study in RQ1. In the 26 studies for RQ2, SPECT with 201Tl, 99mTc-methoxyisobutylisonitrile (MIBI), or 99mTc-glucoheptonate (GHA) and PET with 18F-fluorodeoxyglucose (FDG), 11C-methionine (MET), 18F-fluoroethyltyrosine (FET), or 18F-boronophenylalanine (BPA) were used to differentiate between BRN and TP. The analyses of 201Tl-, 99mTc-MIBI-, and 99mTc-GHA-SPECT included six studies [19, 36,37,38,39,40], two studies [40, 41], and one study [42], respectively. The analyses of 18F-FDG-, 11C-MET-, 18F-FET-, and 18F-BPA-PET included nine studies [37, 39, 43,44,45,46,47,48,49], eight studies [48, 50,51,52,53,54,55,56], three studies [57,58,59], and one study [60], respectively. Additional file 2 describes information about each study. Figure 3 shows forest plots of each study in RQ2.

Fig. 2

The forest plot of each study for RQ1 (conventional radiological imaging)

Fig. 3

The forest plot of each study in RQ2 (nuclear medicine imaging)

Figure 4 shows the pooled estimates of diagnostic accuracy and the SROC curves of the radiological imaging techniques. Combined imaging (DWI and MRS, DWI and perfusion MRI, or DWI, MRS, and perfusion MRI) exhibited the highest sensitivity (96%; 95% CI: 83–99%), and 18F-FET-PET exhibited the highest specificity (95%; 95% CI: 61–99%); both therefore had high DORs. Conversely, Gd-enhanced MRI had the lowest sensitivity (63%; 95% CI: 28–89%) and 18F-FDG-PET the lowest specificity (72%; 95% CI: 64–79%), which contributed to their low DORs. Although combined imaging (DWI and MRS, DWI and perfusion MRI, or DWI, MRS, and perfusion MRI) had the highest DOR among all radiological imaging techniques, the DORs of perfusion MRI, DWI, and MRS alone were not high (perfusion MRI: 3.5, DWI: 3.4, and MRS: 3.0; Fig. 4).
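A DOR combines sensitivity and specificity into a single measure, DOR = [sensitivity/(1 − sensitivity)]/[(1 − specificity)/specificity], which is why a method can reach a high DOR through either property. The sketch below pools log DORs across studies with the DerSimonian-Laird random-effects estimator, a univariate approach of the kind offered by mada's `madauni`; it does not reproduce the bivariate `reitsma` model used for the pooled sensitivity and specificity. The study tables, the function names, and the 0.5 continuity correction are assumptions of this sketch.

```python
import math
from typing import Iterable, Tuple

Z95 = 1.959964  # two-sided 95% normal quantile
Table = Tuple[int, int, int, int]  # (true_pos, false_pos, false_neg, true_neg)


def log_dor_and_variance(table: Table, correction: float = 0.5):
    """Log diagnostic odds ratio and its approximate variance for one study."""
    cells = tuple(float(x) for x in table)
    # Continuity correction on all cells if any cell is zero (sketch choice).
    if 0 in cells:
        cells = tuple(x + correction for x in cells)
    a, b, c, d = cells
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird_dor(tables: Iterable[Table]):
    """Random-effects pooled DOR (DerSimonian-Laird), its 95% CI, and tau^2."""
    estimates = [log_dor_and_variance(t) for t in tables]
    ys = [y for y, _ in estimates]
    vs = [v for _, v in estimates]
    k = len(ys)
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    scale = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = 0.0 if k < 2 or scale <= 0 else max(0.0, (q - (k - 1)) / scale)
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return (math.exp(y_re),
            (math.exp(y_re - Z95 * se_re), math.exp(y_re + Z95 * se_re)),
            tau2)


# Hypothetical study tables.
homogeneous = [(18, 3, 2, 7)] * 3
heterogeneous = [(18, 3, 2, 7), (5, 5, 5, 5)]
print(dersimonian_laird_dor(homogeneous))
print(dersimonian_laird_dor(heterogeneous))
```

With identical studies the between-study variance τ² is zero and the pooled DOR equals the single-study DOR (21 here); heterogeneous studies yield τ² > 0, which down-weights every study and widens the confidence interval.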

Fig. 4

Pooled estimates of the diagnostic accuracy and summary receiver operating characteristic curves of the radiological imaging in all included studies

In the subanalysis by tumor type, 23 studies included only gliomas and eight studies included only metastatic brain tumors. In addition, 14 studies included patients with various brain tumors; of these, nine studies could be categorized into patients with glioma and patients with metastatic brain tumors. Excluding imaging methods evaluated in only a single study, Gd-enhanced MRI, MRS, perfusion MRI, combined imaging (DWI and MRS, DWI and perfusion MRI, or DWI, MRS, and perfusion MRI), SPECT with 201Tl and 99mTc, and PET with 18F-FDG, 11C-MET, and 18F-FET were quantitatively synthesized in the subanalysis for gliomas (Fig. 5). Combined imaging (DWI and MRS, DWI and perfusion MRI, or DWI, MRS, and perfusion MRI) exhibited the highest sensitivity (97%; 95% CI: 80–100%), and 18F-FET-PET exhibited the highest specificity (99%; 95% CI: 91–100%), resulting in higher DORs than those of the other imaging methods for gliomas. Conversely, among imaging for gliomas, Gd-enhanced MRI had the lowest sensitivity (48%; 95% CI: 8–90%) and 18F-FDG-PET the lowest specificity (70%; 95% CI: 58–81%), and these two methods had low DORs. In the subanalysis of metastatic brain tumors, Gd-enhanced MRI, perfusion MRI, 201Tl-SPECT, 18F-FDG-PET, and 11C-MET-PET were included in the meta-analysis (Fig. 6). Perfusion MRI exhibited the highest sensitivity (95%; 95% CI: 72–99%) but the lowest specificity (59%; 95% CI: 40–76%) among imaging for metastatic brain tumors. Overall, the DORs were almost the same among these five imaging methods. Gd-enhanced MRI and 18F-FDG-PET had lower diagnostic accuracy for differentiating between BRN and TP in patients with glioma than in patients with metastatic brain tumors. However, we observed no difference in diagnostic accuracy between gliomas and metastatic brain tumors for perfusion MRI, 201Tl-SPECT, and 11C-MET-PET.

Fig. 5

Pooled estimates of the diagnostic accuracy and summary receiver operating characteristic curves of the radiological imaging in studies for gliomas

Fig. 6

Pooled estimates of the diagnostic accuracy and summary receiver operating characteristic curves of the radiological imaging in studies for metastatic brain tumors

Quality assessment

In this study, we assessed the risk of bias in accordance with QUADAS-2 (Fig. 7). Regarding patient selection, no randomized studies were included. Nine prospective cohort studies were identified [18, 28, 29, 32, 36, 37, 39, 46, 47], and the remaining 36 studies were retrospective. Of the 36 retrospective studies, 10 enrolled patients consecutively [19,20,21, 42, 43, 45, 48, 52, 58, 59]. Regarding the index test, the cutoff values of diagnostic parameters were preset and prospectively assessed in two studies, but without blinding [22, 58]. In 28 other studies, cutoff values were determined retrospectively together with the diagnostic accuracy; of these 28 studies, the diagnostic parameters were measured blindly in only five [16, 17, 31, 42, 46]. Only six studies used histopathology as the reference standard for all patients [16, 17, 21, 30, 47, 48], whereas two studies used clinical diagnosis as the reference standard [20, 42]. The remaining studies used clinical diagnosis as the reference standard for some patients; in these studies, the clinical diagnosis was based on clinical and imaging follow-up. Radiation necrosis was diagnosed if the clinical course was stable and/or if the lesion remained stable, shrank, or disappeared on follow-up imaging. In most studies, the follow-up period was > 6 months. Only one study reviewed the reference standard blindly [16]. Regarding applicability, patient selection was applicable to the RQ, but nonblinded review of index tests and retrospectively set cutoff values were not, because of a high risk of bias favoring the index tests. Furthermore, studies that used clinical diagnosis as the reference standard had a high risk of bias and were not applicable to the RQ, because radiological imaging data were usually included in the clinical diagnosis.

Fig. 7

Clustered bar graphs of quality results on the QUADAS-2 criteria tool

Several factors contributed to indirectness. As mentioned in the subanalysis, various brain tumors were included in the studies. Regarding the index test, parameters and cutoff values differed among studies using the same imaging modality. Notably, six different parameters were used among the MRS studies, and four among the perfusion MRI studies. Regarding cutoff values, the lesion-to-normal (L/N) ratio, the parameter most often used with 11C-MET-PET, was used in four studies; however, the cutoff values differed among these studies. Sensitivity was inconsistent across studies with Gd-MRI, MRS, 201Tl-SPECT, and 18F-FDG-PET: for each of these methods, one study reported low sensitivity, whereas the remaining studies reported high sensitivity. Regarding imprecision, most included studies had wide 95% CIs because of small sample sizes. Notably, 33 (71.7%) studies included fewer than 50 patients, lesions, or scans, and only one study included more than 100 lesions. Small sample sizes may also introduce bias toward the inclusion of specific patients.
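The link between small samples and wide confidence intervals can be illustrated with hypothetical studies that all observe the same sensitivity of 90%. This sketch uses the Wilson score interval, an assumption made here for illustration; the included studies and Review Manager may report intervals computed by other methods.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def wilson_ci(successes: int, n: int, z: float = Z95):
    """Wilson score interval for a binomial proportion successes / n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical studies that all observe a sensitivity of 90%.
widths = {}
for detected, n in [(9, 10), (45, 50), (90, 100)]:
    lo, hi = wilson_ci(detected, n)
    widths[n] = hi - lo
    print(f"{detected}/{n}: sensitivity 0.90, 95% CI {lo:.2f}-{hi:.2f}")
```

The same point estimate carries much less certainty at n = 10 than at n = 100, which is why pooling many small studies still yields wide intervals for some modalities.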

Discussion

The meta-analysis revealed a trend for sensitivity to be higher than specificity in all radiological imaging methods; that is, TP was occasionally misdiagnosed as BRN by these imaging methods. 18F-FET-PET and 99mTc-MIBI-SPECT exhibited high DORs. These nuclear medicine techniques reflect cellular processes such as amino acid transport and P-glycoprotein-mediated transport; however, they have not gained widespread use because they require expensive, specialized equipment and facilities. Conversely, the combination of DWI, MRS, and perfusion imaging exhibited the highest DOR among all imaging studies. Thus, even with MRI alone, combining multiple parameters, including lesional metabolism and blood flow, enhanced diagnostic accuracy and facilitated the differentiation between BRN and TP with conventional radiological imaging. In the subanalysis, Gd-enhanced MRI and 18F-FDG-PET had low DORs and were of little use for differentiating between BRN and TP in patients with glioma. In metastatic brain tumors, however, the DORs did not differ among the radiological imaging methods. Hence, in metastatic brain tumors, BRN could be diagnosed with any of the radiological imaging methods, including Gd-enhanced MRI, whereas in gliomas, specific modalities such as combined imaging or newer nuclear medicine techniques are needed to diagnose BRN.

In this review, many studies had a risk of bias. No randomized controlled trials were included, and only the nine prospective cohort studies had a low risk of bias in patient selection [18, 28, 29, 32, 36, 37, 39, 46, 47]. In addition, 26 (56.5%) studies were retrospective without consecutive enrollment and may therefore have enrolled a particular population of patients. Only two studies preset a cutoff value for the best discrimination between BRN and TP [22, 58]. Retrospectively set cutoff values may overestimate diagnostic accuracy and should be prospectively validated in future studies. Regarding the reference standard, histology was obtained from all patients in only six studies (13%) [16, 17, 21, 30, 47, 48]. In studies using clinical diagnosis as the reference standard, BRN was primarily diagnosed if the clinical status and radiologically identified lesions remained stable for > 6 months. Hence, the index test and the reference standard may have been confounded, because imaging findings contributed to the clinical diagnosis.
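Why retrospectively set cutoffs tend to overestimate accuracy can be shown with a small simulation. In this hypothetical sketch, the imaging parameter carries no information (values for BRN and TP are drawn from the same normal distribution), the cutoff is chosen post hoc by maximizing Youden's index (an assumption; the included studies may have used other criteria), and its performance is then checked on independent data. The apparent accuracy in the derivation sample exceeds chance, whereas the validated accuracy is expected to stay close to zero.

```python
import random
from typing import List, Tuple


def youden(brn: List[float], tumor: List[float], cutoff: float) -> float:
    """Youden's J = sensitivity + specificity - 1, calling values >= cutoff BRN."""
    sens = sum(v >= cutoff for v in brn) / len(brn)
    spec = sum(v < cutoff for v in tumor) / len(tumor)
    return sens + spec - 1


def best_cutoff(brn: List[float], tumor: List[float]) -> float:
    """Post hoc cutoff maximizing J in the same sample it is evaluated on."""
    # float("inf") calls every lesion negative, so J = 0 is always attainable.
    candidates = sorted(set(brn) | set(tumor)) + [float("inf")]
    return max(candidates, key=lambda c: youden(brn, tumor, c))


def simulate(n_per_group: int = 20, reps: int = 200, seed: int = 1) -> Tuple[float, float]:
    """Mean apparent vs. independently validated J for an uninformative marker."""
    rng = random.Random(seed)

    def draw() -> List[float]:
        # BRN and tumor-progression values come from the SAME distribution,
        # so the true J of any cutoff is 0.
        return [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]

    apparent, validated = [], []
    for _ in range(reps):
        train_brn, train_tumor = draw(), draw()
        cut = best_cutoff(train_brn, train_tumor)
        apparent.append(youden(train_brn, train_tumor, cut))
        validated.append(youden(draw(), draw(), cut))
    return sum(apparent) / reps, sum(validated) / reps


apparent_j, validated_j = simulate()
print(f"apparent J {apparent_j:.2f}, validated J {validated_j:.2f}")
```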

Regarding indirectness, various brain tumors were included. The development of radiation necrosis reportedly correlates with the total radiation dose, fraction size, treatment duration, and irradiated volume [61]; these radiotherapy factors differ between the regimens applied to gliomas and to metastatic brain tumors. In addition, tumor cells and necrosis usually coexist in varying proportions in gliomas after radiotherapy. Such mixed lesions make it challenging to distinguish between BRN and TP, even by histological examination. Thus, a systematic review should ideally analyze the diagnostic accuracy of radiological imaging separately for gliomas and metastatic brain tumors. Notably, diagnostic parameters differed among studies using the same imaging method. Moreover, even when the same parameter was used with the same imaging method, cutoff values differed among studies, as with the L/N ratios for 11C-MET-PET. Imprecision should also be considered when interpreting the results: in this review, strong evidence could not be obtained because the quantitative synthesis combined studies with small sample sizes. We focused on PET with glucose and amino acid tracers because several studies using these tracers have been published, making them suitable for meta-analysis. However, recent PET studies with new tracers, such as 18F-DOPA, reported good results in differentiating between BRN and TP [62, 63]. Once adequate studies have accumulated, the diagnostic accuracy of PET with new tracers should be investigated in a meta-analysis. Recently, a PET/MRI study reported that FDG-PET/MRI could predict local tumor control after stereotactic radiosurgery in patients with brain metastases [64]. Moreover, Jena et al. used PET/MRI to differentiate between BRN and TP in patients with glioma [65, 66].

Notably, PET/MRI can evaluate lesions simultaneously with several parameters, including not only tracer uptake but also the apparent diffusion coefficient (ADC), chemical shifts, and CBV. Given that combined imaging with DWI, MRS, and/or perfusion MRI showed the highest diagnostic accuracy in this review, PET/MRI may also show high diagnostic accuracy in a future systematic review.

Conclusions

This systematic review of imaging for diagnosing BRN identified 20 studies of conventional radiological imaging and 26 studies of nuclear medicine imaging. All studies had small sample sizes, and many carried a risk of bias and indirectness. Therefore, it is difficult to draw a firm conclusion as to which imaging study is best for diagnosing BRN. In patients with glioma, Gd-enhanced MRI and 18F-FDG-PET performed poorly for diagnosing BRN, whereas in metastatic brain tumors the diagnostic ability was almost the same among the included imaging methods. Combined imaging methods that include metabolic and blood flow imaging demonstrated the highest DOR among all imaging studies. The development of multiparametric imaging techniques could enhance the diagnostic accuracy for differentiating between BRN and TP in the future.