Introduction

Neoadjuvant chemotherapy (NAC) is the initial therapy for patients with inoperable or locally advanced breast cancer [1] and enables more patients with operable but large primary tumours to be treated with breast-conserving surgery [2]. Given that a non-negligible proportion of patients treated with NAC do not achieve an optimal response or have subsequent disease progression, accurate assessment of the therapeutic response is important to reduce toxicity from ineffective chemotherapy and to guide selection of an alternative treatment option. Changes in tumour morphology and size on breast magnetic resonance imaging (MRI) after several cycles of NAC are widely used markers for assessment of the treatment response and prediction of patient outcome [3]. However, the predictive value of MRI differs across breast cancer subtypes [4, 5], and there is limited evidence for its prognostic value. In addition, MRI techniques do not allow evaluation of newly developing distant metastasis during NAC.

Decreased glucose metabolism within breast cancer tissue on 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) is a useful indicator to assess the effectiveness of NAC [6]. Several meta-analyses have reported that 18F-FDG PET/CT scans performed during or after NAC could predict the final pathological response after completion of NAC [7,8,9,10]. A meta-analysis directly comparing PET/CT and breast MRI reported that PET/CT has a higher sensitivity and specificity for assessment of pathological response than conventional MRI when performed before 3 cycles of NAC [11]. More recently, accumulating evidence has suggested that assessment of the metabolic response using 18F-FDG PET or PET/CT has prognostic significance in breast cancer patients who underwent NAC [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32].

The use of 18F-FDG PET or PET/CT for response evaluation is not yet established in clinical practice [1], in part because evidence is lacking on whether changing treatment plans according to the results of response-evaluation PET scans improves clinical outcomes [33]. However, prior to designing clinical trials evaluating the use of PET for response-adaptive treatment, a thorough review of the currently available data regarding the correlations between metabolic responses evaluated using PET or PET/CT scans and disease recurrence or survival, and of associated risk stratification of breast cancer patients during or after NAC, is warranted. In addition, differences in the timing of PET scans, response criteria and their threshold values across the available studies and their potential effects on survival also need to be assessed. Therefore, we performed a systematic review and meta-analysis of the currently available literature on the prognostic value of 18F-FDG PET or PET/CT for treatment response evaluation in breast cancer patients who underwent NAC.

Materials and methods

This meta-analysis adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [34]. The protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO) network (registration no. CRD42020188979).

Literature search and data extraction

The PubMed, Embase, and Cochrane Library databases were searched from inception to June 4, 2020. Search queries included the related terms ‘breast cancer’, ‘18F-FDG PET’, ‘neoadjuvant therapy’, and ‘prognosis’, which are described in Additional file 1. There was no language restriction for the electronic searches. The references of the extracted articles were also examined to identify additional relevant articles.

The inclusion criteria were based upon the Patient/Intervention/Comparator/Outcome/Study design (PICOS) criteria as follows [34]: (1) female ‘patients’ with breast cancer; (2) 18F-FDG PET, PET/CT, or PET/MRI during or after NAC as ‘intervention’; (3) no ‘comparator’ for the study; (4) overall survival (OS) and disease-free survival (DFS) as ‘outcome’; and (5) prospective or retrospective studies published as original articles as ‘study design’. The exclusion criteria were as follows: (1) small sample size (< 10 patients); (2) other publication types including conference abstracts, review articles, editorials, and letters; (3) papers irrelevant to the research question; (4) insufficient information regarding survival analysis provided for the study; and (5) overlapping study populations. When study populations may have overlapped, we selected the publication with the largest population.

Data extraction and quality assessment

The outcomes, study characteristics, and patient characteristics of the included studies were extracted using a standardised form. The methodological quality was appraised using the Quality in Prognostic Studies (QUIPS) tool based on key questions of prompting items and considerations for six bias domains: study participation, study attrition, prognostic factor measurement, outcome measurement, study confounding, and statistical analysis and reporting [35]. Study selection, data extraction, and quality assessment were performed by two independent reviewers (S.H. and J.Y.C.), with any discrepancy resolved through discussion.

Statistical analyses

Results from the survival analyses within individual articles, including survival rates, univariate and multivariate hazard ratios (HRs), and p values from log-rank tests were extracted. When the Kaplan-Meier curves were provided without corresponding HRs, survival probability at each time point was extracted by means of Engauge Digitizer software (version 10.4, http://markummitchell.github.io/engauge-digitizer/) and individual patient data were reconstructed using the methodology proposed by Guyot et al. [36]. Then, Cox regression analyses were performed to derive HRs and their 95% confidence intervals (CIs); if no events were observed in one arm, Cox regression with Firth’s penalised likelihood was used.

The HRs and their 95% CIs from the univariate Cox regression analyses comparing metabolic responders and poor responders on PET scans were pooled meta-analytically using the DerSimonian-Laird method for calculating weights. If multiple HRs for a single PET parameter were provided in an individual study due to different cut-offs or regions of interest, we selected the HR with the best prognostic value for the meta-analyses. Of note, the terms ‘interim PET’ and ‘post-treatment PET’ were defined as PET studies conducted during (i.e., after one, two, or three cycles) and after NAC, respectively. Higgins I2 statistics were used to assess heterogeneity [37]. Funnel plots with Egger’s test were drawn to identify the presence of publication bias [38]. The ‘survHE’, ‘coxphf’, ‘meta’, and ‘metafor’ packages in R (R Foundation for Statistical Computing, version 3.6.0) were used for the statistical analyses.
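For readers unfamiliar with the pooling step, the following is a minimal Python sketch of DerSimonian-Laird random-effects pooling of HRs on the log scale, together with the Higgins I2 statistic. It is an illustration under stated assumptions rather than the code used in this study, which relied on the R packages named above. The function names and the three example HRs are hypothetical, and the standard errors are assumed to be recoverable from symmetric 95% CIs on the log scale.

```python
import math

Z_975 = 1.959964  # standard normal quantile used for a 95% CI


def log_hr_and_se(hr, ci_low, ci_high):
    """Convert a hazard ratio and its 95% CI to (log HR, standard error)."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)


def dersimonian_laird(studies):
    """Pool (hr, ci_low, ci_high) tuples with DerSimonian-Laird weights.

    Returns the pooled HR, its 95% CI, tau^2 and Higgins I^2 (in percent).
    """
    if len(studies) < 2:
        raise ValueError("at least two studies are needed for pooling")
    effects = [log_hr_and_se(*s) for s in studies]
    y = [e[0] for e in effects]                 # log hazard ratios
    v = [e[1] ** 2 for e in effects]            # within-study variances
    w = [1.0 / vi for vi in v]                  # inverse-variance weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)               # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]      # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "hr": math.exp(y_re),
        "ci": (math.exp(y_re - Z_975 * se_re), math.exp(y_re + Z_975 * se_re)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical per-study HRs with 95% CIs (illustrative values only,
# not taken from the included studies).
example = [(0.20, 0.08, 0.50), (0.25, 0.10, 0.62), (0.18, 0.05, 0.65)]
result = dersimonian_laird(example)
print(result)
```

When the between-study variance is estimated as zero, as in this hypothetical example, the random-effects weights reduce to the ordinary inverse-variance weights.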

Results

Study characteristics

The PRISMA study selection process is described in Fig. 1. The initial literature search yielded 1682 articles. After removing 437 duplicates, screening of the remaining 1245 titles and abstracts yielded 37 potentially eligible papers. We excluded 16 of the 37 articles for the following reasons: palliative chemotherapy (n = 2), neoadjuvant endocrine therapy only (n = 1), no survival analysis (n = 3), overlapping patient populations (n = 8), PET for baseline assessment (n = 1), and only kinetic analyses of dynamic PET scans (n = 1). Thus, 21 studies with 1630 patients were included in the qualitative synthesis [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Eleven studies were prospectively conducted, whereas ten were retrospective. For the quantitative synthesis, we included only studies where HRs for metabolic responses assessed by PET scans either during or after NAC were available. A total of 17 studies (1279 patients) were included in the quantitative synthesis [12, 16,17,18,19,20,21, 23,24,25,26,27,28,29,30,31,32]. Of note, there was one study in which Kaplan-Meier curves were separately plotted according to the therapeutic regimen [19]; these patients were incorporated into the meta-analysis as separate cohorts. Tables 1 and 2 summarise the patient and study characteristics.

Fig. 1 PRISMA flow chart showing the study selection process

Table 1 Study characteristics of the included studies
Table 2 Patient characteristics in the included studies

Quality assessment

The quality assessment performed using the QUIPS tool is illustrated in Fig. 2. The number of studies at risk of bias in each domain, and the reasons, were as follows. Five studies presented a moderate risk in selection of participants because of the retrospective study designs, lack of clarity regarding whether the patients were enrolled in a consecutive manner, and/or unclear inclusion and exclusion criteria [12, 23, 25, 26, 32]. All studies were rated as having a low risk of attrition bias. For prognostic factor measurement, nine studies had a high risk of bias due to the use of data-dependent cut-off values [12,13,14, 18, 20, 24, 27, 31, 32]. Regarding outcome measurements, fifteen studies had a moderate risk of bias because the definition or methods for measuring disease recurrence were unclear [12,13,14,15,16, 22,23,24, 26,27,28,29,30,31,32]. Six studies presented a moderate risk of confounding bias as no adjustment for potential confounders was performed [12, 18, 20, 21, 24, 28]. With regard to the statistical analysis domain, five studies had a moderate risk of bias as it was unclear which variables were incorporated into the multivariate analyses, or too many variables were included in the multivariate analyses considering the number of patients in the study population [13, 15, 17, 19, 30]. One study had a high risk of bias in the statistical analyses due to possible selective reporting, as the results of survival analysis were provided for only a subset of the study population, and not the whole population [12].

Fig. 2 Quality assessment using the QUIPS tool

Qualitative synthesis

The outcomes of the included articles are summarised in Table 3. PET scanning for evaluation of the patients’ response to NAC was performed during (interim PET) or after treatment (post-treatment PET) in 12 and 10 studies, respectively. The most widely evaluated PET parameter was the percent reduction of maximum standardised uptake value (SUVmax) compared to baseline SUVmax (%ΔSUVmax), followed by SUVmax and complete metabolic response (CMR) on interim or post-treatment PET scans. Of note, %ΔSUVmax is defined as (SUVmax at baseline PET − SUVmax at interim or post-treatment PET) / SUVmax at baseline PET × 100%. The CMR in the included studies was defined as negative FDG uptake [21], or according to the European Organisation for Research and Treatment of Cancer (EORTC) or Positron Emission Tomography Response Criteria in Solid Tumours (PERCIST) criteria [17, 26, 30]. Regarding the timing of interim assessment, PET or PET/CT was performed after two cycles of NAC in five studies [17,18,19, 30, 32], one cycle in four [20, 21, 28, 31], and three cycles in two studies [13, 29]. Specifically, higher %ΔSUVmax at interim evaluation was significantly associated with longer DFS in seven of ten studies [13, 17,18,19,20,21, 28, 29, 31, 32], and longer OS in two of six studies [13, 17, 20, 21, 28, 29]. In addition, CMR on interim PET scans was significantly associated with longer DFS in one of two studies [17, 30], and lower SUVmax on interim PET scans was significantly associated with longer DFS in one study [18]. Regarding post-treatment PET scans, %ΔSUVmax was a significant prognostic factor for DFS in four of six studies [12, 13, 17, 22, 25, 27]. Lower SUVmax was significantly associated with longer DFS in all five studies [12, 17, 22, 23, 25]. CMR was correlated with better DFS in three studies [16, 17, 26]; however, CMR was no longer statistically significant in the multivariate analyses of two of them [17, 26]. Two studies reported that metabolic tumour volume and total lesion glycolysis and their reduction rates on post-treatment PET scans were significant predictors for better DFS [22, 25].
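The definition of %ΔSUVmax given above can be written as a short Python sketch. The function names, the example SUVmax values, and the cut-off are hypothetical and serve only to illustrate the arithmetic; the included studies used different and often data-dependent cut-offs.

```python
def percent_delta_suvmax(baseline_suvmax, follow_up_suvmax):
    """Percent reduction of SUVmax relative to the baseline PET scan."""
    if baseline_suvmax <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return (baseline_suvmax - follow_up_suvmax) / baseline_suvmax * 100.0


def is_metabolic_responder(baseline_suvmax, follow_up_suvmax, cutoff_percent):
    """Classify a patient as a responder if the reduction reaches the cut-off.

    The cut-off is a free parameter here; no single value is implied.
    """
    return percent_delta_suvmax(baseline_suvmax, follow_up_suvmax) >= cutoff_percent


# Hypothetical patient: SUVmax falls from 8.0 at baseline to 2.0 at interim PET.
reduction = percent_delta_suvmax(8.0, 2.0)
responder = is_metabolic_responder(8.0, 2.0, cutoff_percent=50.0)
print(reduction, responder)
```

A negative value indicates that SUVmax increased relative to baseline.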

Table 3 Summary of outcomes in the included studies

Five studies exclusively included a specific subtype, either triple-negative (TN) or hormone receptor-positive/HER2-negative (HR+/HER2−). In the two studies of the HR+/HER2− subtype [18, 20], higher %ΔSUVmax on interim PET assessment was significantly associated with better DFS and OS. Likewise, in the three studies of the TN subtype [19, 21, 27], higher %ΔSUVmax on interim or post-treatment PET scans was a significant predictor of longer DFS.

Quantitative synthesis

Meta-analytical pooling of HRs for interim and post-treatment PET analyses on DFS and OS was performed. With regard to the influence of metabolic responses on DFS, the pooled HR for interim PET scans was 0.21 (95% CI, 0.14–0.32; Fig. 3a) with no heterogeneity (I2 = 0%). No publication bias was shown in the funnel plot and Egger’s test (P = 0.8654; Fig. 4a). PET response analyses performed after one, two, and three cycles of NAC showed comparable prognostic values for DFS with pooled HRs of 0.18 (95% CI, 0.09–0.35), 0.25 (95% CI, 0.14–0.45), and 0.22 (95% CI, 0.09–0.51), respectively (P for subgroup difference = 0.7661). The pooled HR for metabolic responses on post-treatment PET/CT was 0.31 (95% CI, 0.21–0.46; Fig. 3b). No heterogeneity was found (I2 = 0%), and no publication bias was present (P = 0.3199; Fig. 4b). No statistical difference was found between the pooled HRs of interim and post-treatment PET response analyses (P = 0.1942). For studies using combined PET/CT, the pooled HRs for interim and post-treatment PET/CT were 0.23 (95% CI, 0.15–0.37) and 0.30 (95% CI, 0.20–0.43), respectively. The results of subgroup analyses according to PET response parameters are provided in Table 4.
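As a rough illustration of how two pooled estimates can be compared, the Python sketch below applies a two-sided z-test to the difference of two log HRs, with standard errors back-calculated from the reported 95% CIs. This is an assumption-laden approximation and not necessarily the procedure used in this study: it works from rounded published values and treats the two pooled estimates as independent, so it is not expected to reproduce the reported P value exactly. The function name is hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile used for a 95% CI


def compare_pooled_hrs(first, second):
    """Two-sided z-test for the difference between two pooled log HRs.

    Each argument is a tuple (hr, ci_low, ci_high) with a 95% CI.
    Returns (z, p). Assumes the two estimates are independent.
    """
    def to_log(hr, ci_low, ci_high):
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
        return math.log(hr), se

    y1, se1 = to_log(*first)
    y2, se2 = to_log(*second)
    z = (y1 - y2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Pooled DFS estimates quoted in the text: interim PET vs post-treatment PET.
z, p = compare_pooled_hrs((0.21, 0.14, 0.32), (0.31, 0.21, 0.46))
print(round(z, 2), round(p, 3))
```

With only two subgroups, this z-test is equivalent to a between-subgroup Q test with one degree of freedom, which is why it serves as a convenient plausibility check.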

Fig. 3 Forest plots showing the pooled HRs of the PET response on interim (a) and post-treatment (b) 18F-FDG PET scans for disease-free survival

Fig. 4 Funnel plots of studies assessing the PET response on interim (a) and post-treatment (b) 18F-FDG PET scans for disease-free survival

Table 4 Subgroup analysis according to PET timing and parameters

Among the nine studies assessing the prognostic value of %ΔSUVmax for DFS on interim PET scans, six included patients with initial clinical stage II–III cancers; %ΔSUVmax was a significant predictor of DFS in stage II–III breast cancer with a pooled HR of 0.21 (95% CI, 0.13–0.34; Additional file 2: Fig. S1). Of these, four studies included either ER+/HER2− or triple-negative breast cancer populations; %ΔSUVmax was also a significant prognostic factor in stage II–III ER+/HER2− and triple-negative breast cancer (pooled HRs = 0.20 [95% CI, 0.08–0.47] and 0.26 [95% CI, 0.11–0.61], respectively). Meta-regression analyses were performed according to clinical variables (including age, stage, histologic type and grade, receptors, subtypes, and pathological complete response) for these nine studies, and no variable was found to significantly influence the HRs (Additional file 3: Table S1).

With regard to the influence of metabolic response on OS, the pooled HR for the interim PET scans was 0.20 (95% CI, 0.09–0.44; Fig. 5a), and no heterogeneity was present (I2 = 0%). The pooled HR for the metabolic response on post-treatment PET scans was 0.26 (95% CI, 0.14–0.51; Fig. 5b). No heterogeneity was found (I2 = 0%). No statistical difference was found between the pooled HRs of interim and post-treatment PET response analyses (P = 0.6137). Publication bias could not be evaluated because of the limited number of included studies. No statistical difference was found upon subgroup analyses in accordance with PET parameters (Table 4), although the paucity of studies limited the statistical power. For studies using combined PET/CT, the pooled HRs for interim and post-treatment PET/CT were 0.33 (95% CI, 0.08–1.37) and 0.24 (95% CI, 0.11–0.51), respectively.

Fig. 5 Forest plots showing the pooled HRs of the PET response on interim (a) and post-treatment (b) 18F-FDG PET scans for overall survival

Discussion

In this meta-analysis, we assessed the prognostic value of 18F-FDG PET or PET/CT for evaluation of the response to treatment in breast cancer patients. Our major findings were as follows: (1) PET metabolic response during and after NAC is a significant predictor of disease recurrence and death; (2) more evidence is available regarding PET scans for prediction of DFS than OS (18 and 7 studies included in meta-analyses for DFS and OS, respectively); (3) %ΔSUVmax was the most frequently evaluated response parameter in both interim and post-treatment PET scans; (4) no differences were found between the prognostic values of interim and post-treatment PET scans in terms of both DFS and OS; (5) regarding the timing of the interim assessment, PET was most commonly performed after 2 cycles (followed by PET after 1 cycle) and yielded significant prognostic values; and (6) in more than half of the included studies, the cut-off values for each PET parameter were data-dependent and therefore differed greatly across studies.

Notably, a metabolic response identified on PET scans is significantly associated with the pathological response in surgical specimens after NAC [7,8,9], a well-established predictor of patient outcomes [39]. In addition, in clinical practice, it is plausible that 18F-FDG PET or PET/CT has incremental prognostic value over the pathological response of breast cancer, for three reasons: (1) PET scans enable early assessment of patient responses to NAC, which may support decisions to cease ineffective treatment and select alternative treatment options, whereas pathological response can only be assessed after completion of surgical resection; (2) twelve of the 15 included studies in which either multivariate Cox regression analyses or subgroup analyses were performed reported that the metabolic response had prognostic significance independent of the pathological response [13,14,15,16,17, 19, 22,23,24,25,26,27,28,29, 31], although we could not pool HRs from multivariate Cox regression in the meta-analysis because the variables included in the models differed widely across studies, which would directly affect the HR values; and (3) the results of our meta-regression indicated that pathological complete response did not influence the pooled HR, which may indirectly support the independent prognostic role of the metabolic response.

Upon thorough inspection of the clinical characteristics of the included studies, our study population mainly consisted of stage II–III breast cancer patients, which was consistent with the types of patients who typically receive NAC [33]. We also found higher proportions of patients with HER2+ and triple-negative subtypes in the included studies (HER2+ regardless of hormonal receptor status: 26% [402/1558]; TN: 28% [374/1357]) than in the general breast cancer population: according to cancer statistics published by the US National Cancer Institute, the prevalence of the HER2+ and triple-negative subtypes was 14% and 10%, respectively [40]. HER2+ and triple-negative subtypes are aggressive, with patients typically presenting with higher FDG uptake at baseline. Therefore, these subtypes of breast cancer are promising targets for evaluation of the metabolic response using 18F-FDG PET or PET/CT [33]. In addition, our analyses indicated that a metabolic response on interim PET also has prognostic significance in the ER+/HER2− group [18, 20], a subtype in which MRI is of limited utility for evaluation of patient responses to NAC [4, 5].

The studies included in our qualitative synthesis varied widely in terms of PET timing and the PET criteria used to define a metabolic response. %ΔSUVmax, the percent reduction of SUVmax between the baseline and interim or post-treatment PET scans, is the most frequently evaluated parameter and is associated with disease recurrence and survival. This ‘ratio’ has the advantage that it may offset the potential effect of noise, reconstruction, image sampling, and smoothing on SUVmax, as long as the PET scans at baseline and during or after NAC are performed using the same scanner and protocols; when they are not, the applicability of results across PET facilities may be limited [41].

The number of studies and the prognostic significance were comparable for interim and post-treatment PET. Because it allows early response evaluation and subsequent modification of treatment, interim PET may have greater clinical value than post-treatment PET. Regarding the specific timing of interim PET, there were no apparent differences between the prognostic values of interim PET assessments performed at different times during NAC; however, the number of studies was insufficient to assess statistical significance. We found that in the majority of studies addressing interim PET scans the response was assessed after 1–2 cycles of NAC, and evidence of their prognostic value was found. Moreover, previous meta-analyses reported better predictive values for pathological response when PET scans were performed after 1–2 cycles of NAC than after 3 cycles [7, 10]. Given that early assessment of the response to NAC is important for timely modification of the therapeutic strategy, it might be advisable for interim PET to be performed after 1–2 cycles.

Our study had several limitations. First, a substantial portion of the included studies were performed retrospectively. Second, there was considerable heterogeneity in tumour subtype, PET scan timing, and response parameters among the studies. Therefore, caution is required when considering the applicability of our pooled estimates. Third, approximately half of the included studies used data-dependent cut-off values for the assessment of PET parameters (i.e., the optimal cut-off on receiver operating characteristic analysis for predicting pathological response), which may overestimate the prognostic value of 18F-FDG PET or PET/CT. Fourth, the number of studies included in the meta-analysis of OS was small, though the pooled HR was statistically significant. However, DFS is regarded as a valid surrogate for OS, the assessment of which requires long-term follow-up [42].

Conclusions

A metabolic response to NAC as detected by 18F-FDG PET or PET/CT is a significant prognostic factor in terms of DFS and OS. The meta-analytically pooled HRs for DFS and OS did not differ significantly between interim and post-treatment PET scans. %ΔSUVmax, defined as the percent reduction of SUVmax compared with that obtained from the baseline PET, is the most widely evaluated PET response parameter. For the interim assessment of patient responses to NAC, PET scans were commonly performed after 1–2 cycles of NAC and provided significant prognostic values. Evaluation of the metabolic response to NAC may be helpful not only in HER2+ or triple-negative subtypes, which are known to be FDG-avid, but also in hormone receptor-positive tumours. These results suggest that 18F-FDG PET or PET/CT may provide accurate risk stratification of breast cancer patients and support risk-adapted therapeutic management based on metabolic response in clinical practice or trials.