Radiofrequency and Microwave Ablation Compared to Systemic Chemotherapy and to Partial Hepatectomy in the Treatment of Colorectal Liver Metastases: A Systematic Review and Meta-Analysis

Purpose To assess safety and outcome of radiofrequency ablation (RFA) and microwave ablation (MWA) as compared to systemic chemotherapy and partial hepatectomy (PH) in the treatment of colorectal liver metastases (CRLM). Methods MEDLINE, Embase and the Cochrane Library were searched. Randomized trials and comparative observational studies with multivariate analysis and/or matching were included. Guidelines from National Guideline Clearinghouse and Guidelines International Network were assessed using the AGREE II instrument. Results The search revealed 3530 records; 328 were selected for full-text review; 48 were included: 8 systematic reviews, 2 randomized studies, 26 comparative observational studies, 2 guideline-articles and 10 case series; in addition 13 guidelines were evaluated. Literature to assess the effectiveness of ablation was limited. RFA + systemic chemotherapy was superior to chemotherapy alone. PH was superior to RFA alone but not to RFA + PH or to MWA. Compared to PH, RFA showed fewer complications, MWA did not. Outcomes were subject to residual confounding since ablation was only employed for unresectable disease. Conclusion The results from the EORTC-CLOCC trial, the comparable survival for ablation + PH versus PH alone, the potential to induce long-term disease control and the low complication rate argue in favour of ablation over chemotherapy alone. Further randomized comparisons of ablation to current-day chemotherapy alone should therefore be considered unethical. Hence, the highest achievable level of evidence for unresectable CRLM seems reached. The apparent selection bias from previous studies and the superior safety profile mandate the setup of randomized controlled trials comparing ablation to surgery. Electronic supplementary material The online version of this article (10.1007/s00270-018-1959-3) contains supplementary material, which is available to authorized users.


Introduction
Colorectal cancer is the second most common cause of cancer-related death in developed countries and the third most common malignancy worldwide [1]. Roughly 50% of patients develop colorectal liver metastases (CRLM), yet only a minority (10-15%) can undergo partial hepatectomy (PH). Five-year survival following PH ranges between 31 and 58% in carefully selected patients [2,3]. The remainder is usually offered chemotherapy and/or local tumour ablation alone or in combination with PH. Radiofrequency ablation (RFA) and microwave ablation (MWA) in particular are commonly employed and widely available. Median overall survival (OS) following systemic treatment nowadays reaches 20-22 months in patients who receive sequential chemotherapy regimens, often with biological agents; 5-year survival remains < 15% [4-8]. Five-year survival following ablation varies between 17 and 53% [9-13]. Although recent studies [13-16] have reported similar survival for patients treated with thermal ablation or PH, interventional radiology and surgical oncology communities generally state that thermal ablation cannot be considered an alternative to PH. They recommend the use of open, laparoscopic or percutaneous RFA and MWA for small CRLM (≤ 3 cm) in patients who are unsuitable for resection due to (1) an impaired general health status (age, comorbidities), (2) a history of extensive abdominal surgery, (3) the presence of lesions with an unfavourable location or (4) an insufficient future liver remnant to resect all lesions [11,17,18]. In light of these recommendations the Dutch National Health Care Institute (ZiNL) and representatives from the Dutch societies for interventional radiology, surgical and medical oncology commissioned a systematic review and meta-analysis with the following research questions: (1) what is the evidence regarding safety and effectiveness for RFA and MWA in the treatment of CRLM? and (2) what is the status of RFA and MWA in international guidelines?

Search Strategies
The search strategies and inclusion criteria were based on the following PICOS question: P (population): patients with resectable and unresectable CRLM; I (intervention): RFA and MWA; C (comparison): for resectable disease PH and for unresectable disease systemic chemotherapy; O (outcomes): critical endpoints were OS, complications and quality of life (QoL), important endpoints were disease-free survival (DFS), local progression-free survival (LPFS), and ablation-site recurrence rate (ASR); S (study designs): (systematic reviews), randomized studies, controlled studies, comparative observational studies with multivariate analysis and/or matching, and non-comparative studies if an insufficient number of comparative studies was found. To assess the relative importance of outcomes (critical, important but not critical or limited) the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach was used [19].
We used Cochrane systematic review methods to identify studies that met the inclusion criteria. MEDLINE, Embase and the Cochrane Library (Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effectiveness, Health Technology Assessment database, CENTRAL) were searched (last update September 26th 2017) using a combination of text words and medical subheadings (search strategies: Table 3 online appendix). No time limit was used.
Searches were limited to studies involving humans and published in English or Dutch. Abstracts were only taken into consideration when their methodological quality could be sufficiently evaluated and data extraction could be entirely completed. Studies also describing primary liver tumours and/or non-colorectal liver metastases were only included if data about CRLM could be extracted separately. Only studies reporting on the following outcomes were considered: (1) critical outcomes: OS, QoL and complications; (2) important outcomes: DFS, LPFS, ASR.

Study Selection and Quality Criteria
All retrieved studies were evaluated for inclusion by two reviewers (JV, KHH) independently. First, studies were evaluated on title and abstract. Studies potentially eligible for inclusion were ordered in full text for a comprehensive evaluation.
For the included studies, the methodological quality was evaluated independently using the AMSTAR tool for systematic reviews and the risk of bias tool of the Cochrane Collaboration for randomized trials and controlled studies. For uncontrolled studies (including case series) the following criteria were judged: adequate definition of disease, clear baseline characteristics, inclusion of a representative cohort, adequate disease confirmation using validated methods, standardized data collection and objective outcome measurement.
All discrepancies were resolved by consensus. If no consensus was reached, the opinion of a third researcher (LGF) was the overriding factor.

Data Extraction
Data were extracted by one reviewer (KHH or LGF) and checked by a second (JV). The results were displayed as described in the article, allowing for recalculations based on the data extracted from the article if needed.

Data Analysis
Based on clinical criteria, such as population, intervention, control group and outcome, an assessment was made whether the studies were sufficiently comparable to perform a meta-analysis. A random effects model was chosen, unless there was no statistical heterogeneity. Individual results were presented in a forest plot. The following comparisons and outcomes allowed for a meta-analysis: (1) RFA versus PH alone regarding OS, DFS, LPFS, 30-day mortality and complications, and (2) RFA + PH versus PH alone regarding OS, DFS, LPFS and 60-day mortality. For time-to-event outcomes (survival), the generic inverse variance method was used. Only corrected hazard ratios (HR; e.g. based on a multivariate analysis) were entered. For dichotomous results (complications), the Mantel-Haenszel method was used to calculate risk ratios (RR). When ≥ 10 studies were available for inclusion in the meta-analysis a funnel plot was used to assess for publication bias. The meta-analysis was conducted using Review Manager 5.3.
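The generic inverse variance approach described above can be illustrated with a minimal sketch. It assumes each study contributes a log hazard ratio and its standard error, and it assumes the DerSimonian-Laird estimator for the between-study variance; the function name and the input values are hypothetical and are not data from the included studies.

```python
import math

def pool_inverse_variance(estimates, random_effects=True):
    """Pool (log_hr, se) pairs with inverse-variance weights.

    Returns (pooled HR, lower 95% limit, upper 95% limit, tau^2).
    With random_effects=True the between-study variance tau^2 is
    estimated with the DerSimonian-Laird method (an assumption here).
    """
    y = [log_hr for log_hr, _ in estimates]
    w = [1.0 / se ** 2 for _, se in estimates]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    tau2 = 0.0
    if random_effects and len(estimates) > 1:
        # Cochran's Q measures the spread around the fixed-effect estimate.
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(y) - 1)) / c)
        # Random-effects weights add tau^2 to each within-study variance.
        w = [1.0 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se_pooled = math.sqrt(1.0 / sum(w))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            tau2)

# Hypothetical inputs: (log HR, standard error) for three studies.
example = [(math.log(1.10), 0.25), (math.log(1.60), 0.30), (math.log(0.95), 0.20)]
hr, low, high, tau2 = pool_inverse_variance(example)
print(f"pooled HR = {hr:.2f} (95% CI {low:.2f}-{high:.2f}), tau^2 = {tau2:.3f}")
```

When the studies agree closely, the estimated between-study variance is zero and the result coincides with a fixed-effect analysis.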

Levels of Evidence
To assign a level of evidence, the GRADE system was used taking into account the quality assessment and the results from data extraction [20,21]. We classified the level of evidence into 4 GRADE categories: high, moderate, low and very low (Table 1). Quality elements evaluated for downgrading were study limitations, inconsistency, indirectness, imprecision and publication bias.
Two independent researchers graded the evidence levels (JV, KHH). If consensus was not reached, the opinion of a third independent researcher was decisive (LGF). The reasons for assigning evidence levels were documented.
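The downgrading logic can be sketched as simple bookkeeping. The sketch assumes the usual GRADE convention that randomized evidence starts at high and observational evidence at low, and that each serious concern lowers the level by one step and each very serious concern by two; the function and domain names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_level(randomized, downgrades):
    """Return the GRADE category after downgrading.

    downgrades maps a quality element (e.g. "imprecision") to
    0 (no concern), 1 (serious) or 2 (very serious).
    """
    start = 3 if randomized else 1  # index into LEVELS
    index = max(0, start - sum(downgrades.values()))
    return LEVELS[index]

# A randomized trial with serious imprecision ends at "moderate";
# adding serious indirectness lowers it further to "low".
print(grade_level(True, {"imprecision": 1}))
print(grade_level(True, {"imprecision": 1, "indirectness": 1}))
```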

Guidelines
(Inter)national guidelines about RFA and MWA for CRLM were searched in the following databases: National Guideline Clearinghouse and Guidelines International Network, as well as on websites of (inter)national guideline organizations and scientific societies. Two reviewers (JV, LGF) selected and judged the guidelines using the AGREE II instrument (Table 2 online appendix) [22]. If consensus was not reached, the opinion of a third independent researcher (KHH) was decisive.

Results
The literature search resulted in 3530 records. After excluding 1121 duplicate papers and 459 documents written in a non-English language, a total of 1950 unique references remained (Fig. 1). Based on title and abstract 1622 references were excluded. A total of 328 articles were selected for full-text review. This led to the exclusion of 280 articles for the following reasons: single cohort without comparison (n = 115); wrong comparator, comparison, intervention or outcome (n = 48); no separate results for CRLM (n = 22); systematic review without quality appraisal (n = 20); narrative review (n = 17); observational study without matching or multivariate analysis (n = 16); and other (n = 42) (Table 4 online appendix). A total of 48 articles were included: eight systematic reviews, two randomized studies, twenty-six comparative observational studies and ten case series. Two references were included as guidelines. Seven out of eight systematic reviews were classified as high quality [1-3, 9, 23-25]; one was judged as poor quality [26] (Fig. 2).
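The selection flow reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
records = 3530
duplicates = 1121
non_english = 459
unique = records - duplicates - non_english

excluded_on_title_abstract = 1622
full_text = unique - excluded_on_title_abstract

full_text_exclusions = {
    "single cohort without comparison": 115,
    "wrong comparator, comparison, intervention or outcome": 48,
    "no separate results for CRLM": 22,
    "systematic review without quality appraisal": 20,
    "narrative review": 17,
    "observational study without matching or multivariate analysis": 16,
    "other": 42,
}
included = full_text - sum(full_text_exclusions.values())

# Included articles by type: systematic reviews, randomized studies,
# comparative observational studies, case series, guideline articles.
by_type = [8, 2, 26, 10, 2]

print(unique, full_text, included, sum(by_type))
```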
An updated search resulted in three new comparative observational studies [13,27,28]. Due to slow recruitment the EORTC-CLOCC trial was downgraded to a phase II study [29].
Twenty-four observational studies compared RFA for unresectable CRLM to PH for resectable disease (Fig. 4). Fourteen studies compared RFA with surgery alone [13, 30-42], eight studies compared RFA + PH with PH alone [13,15,16,18,27,28,43,44], and four studies compared RFA to RFA + PH or PH alone [13, 45-47]. A total of 5020 patients were included in these observational studies (RFA: N = 1103; RFA + PH: N = 541; PH alone: N = 3376). For none of these studies could it be excluded that therapy selection was based on patient and/or tumour characteristics and/or physician preference (confounding by indication). Moreover, the methods used to describe outcomes were heterogeneous and, although all included studies used multivariate analysis or data matching based on prognostic factors, these factors differed from study to study. None of the studies blinded patients or outcome assessors. In eleven studies, data collection was retrospective.
Table 1 (fragment) Quality of life: there are no comparative studies on the effect of RFA or MWA.
* GRADE definitions: high quality: further research is very unlikely to change our confidence in the estimate of effect (randomized controlled trials); moderate quality: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate (controlled trials, no randomization); low quality: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate (observational studies); very low quality: any estimate of effect is very uncertain (any other type).
a Serious imprecision: in case of low optimal information size (OIS; number of included patients did not meet sample size), dichotomous outcomes, low number of events, wide confidence intervals with uncertainty about magnitude of effect, or when there is a lot of variation in the effects among the participants in continuous measures.
b Serious indirectness: very important differences in populations, interventions, outcome measures, or indirect comparisons.

Overall Survival

RFA Plus PH Versus PH Alone
Seven observational studies (N = 1918) reported corrected hazard ratios and allowed for pooling of OS results (Fig. 6) [13,15,16,18,27,45,46]. No significant difference in OS was found (HR = 1.24; 95%CI 0.84-1.84). One other article reported only non-corrected hazard ratios; treatment type was not associated with prognosis based on univariate analysis. Adding this study to the meta-analysis did not meaningfully alter the results (HR = 1.27; 95%CI 0.90-1.81) [47]. Govindarajan et al. reported the OS for recurrent CRLM, and did not detect a significant difference between PH and PH + RFA for either solitary CRLM (p = 0.49) or multiple CRLM (p = 0.18) [43].
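To make the link between a confidence interval that includes 1 and a non-significant result explicit, the following sketch back-calculates an approximate standard error, z-score and two-sided p-value from a reported hazard ratio and its 95% confidence interval. It assumes the interval is symmetric on the log scale and normally distributed; the helper name is hypothetical and the output is an approximation, not a value reported in the source.

```python
import math

def z_and_p_from_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate z-score and two-sided p-value from an HR and its 95% CI."""
    # The CI width on the log scale equals 2 * z_crit * standard error.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Pooled OS estimate for RFA + PH versus PH alone as reported above.
z, p = z_and_p_from_ci(1.24, 0.84, 1.84)
print(f"z = {z:.2f}, p = {p:.2f}")
```

For the pooled estimate above, the resulting p-value lies well above 0.05, in line with the reported absence of a significant difference.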
In the EORTC-CLOCC trial, no significant difference in chemotherapy-induced toxicity between the groups was found [29]. In the observational studies comparing RFA and PH alone, complications were more common after PH than after RFA (10 studies; RR = 0.47; 95%CI 0.28-0.78) (Fig. 10) [29]. With 110 out of 119 patients included in the analysis, overall quality of life decreased by 27 points on average after the procedure, partially recovered (to 10 points below baseline) before the start of chemotherapy (4-8 weeks after RFA) and recovered completely thereafter. No formal statistical comparison was done.
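As an illustration of how a pooled risk ratio for complications is formed, the sketch below implements the fixed-effect Mantel-Haenszel estimator. It is a simplification: the random-effects weighting used in the review is not reproduced, and the event counts are hypothetical rather than taken from the included studies.

```python
def mantel_haenszel_rr(tables):
    """Fixed-effect Mantel-Haenszel pooled risk ratio.

    tables: list of (events_rfa, n_rfa, events_ph, n_ph) per study.
    A value below 1 means fewer events in the RFA group.
    """
    numerator = sum(a * n_ph / (n_rfa + n_ph) for a, n_rfa, c, n_ph in tables)
    denominator = sum(c * n_rfa / (n_rfa + n_ph) for a, n_rfa, c, n_ph in tables)
    return numerator / denominator

# Hypothetical complication counts for two studies.
studies = [(5, 50, 10, 50), (10, 100, 20, 100)]
print(mantel_haenszel_rr(studies))
```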

RFA Versus PH Alone
Three and five observational studies (N = 406 and N = 1253), respectively, reported corrected hazard ratios for DFS [30,36,37,46,47] [34]. Most studies did not report corrected data for the number of recurrences. However, Gleisner et al. performed a matched-control and propensity score analysis [46]. At 1 year any disease recurrence was more commonly detected after RFA compared to PH alone (66 vs. 24%; p < 0.001) with a high rate of ASR after RFA (41 vs. 2%; p < 0.001). Lee et al. also included a propensity score analysis; the ASR rate was higher after RFA compared to resection (p = 0.021) [36].

MWA
One randomized controlled trial (RCT) compared MWA to hepatectomy in 30 patients with resectable CRLM (Fig. 15) [48]. The absence of an intention-to-treat analysis makes this study at high risk of bias; 25% (10/40) of the randomized patients were not included in the analysis and the precise randomization method remains unclear.
One observational study compared MWA + PH to PH alone in 53 consecutive patients with at least 5 bilobar CRLM [49]. MWA was performed for unresectable lesions. Another observational study compared a group of 20 patients who underwent MWA for multiple unresectable CRLM with two historical cohorts: 36 patients who had resection and 25 patients who only received systemic treatment [50]. Both studies are at risk of bias due to the absence of a randomization process and the retrospective data collection (Fig. 16). Finally, an additional ten case series were included (N = 689) (Fig. 17) [51-60]. In seven of these, the majority of patients underwent combined resections + MWA [51-55, 57, 59]. Seven studies have a high risk of bias due to retrospective data collection and/or contamination of results after complementary PH [51-55, 57, 59]; in the three other studies risk of bias remains unclear because selection bias cannot be excluded [56,58,60]. Only two studies separately reported results for solitary CRLM [56,58]. The last updated search revealed no extra articles for MWA.
Engstrand et al. reported a 4-year OS of 41% for the MWA group versus 4% in the historical cohort treated with chemotherapy alone [50]. Treatment modality was found to be a prognostic factor in multivariate analysis (HR = 0.56; 95%CI 0.33-0.96). The 4-year OS in the PH alone cohort was 70%, but no formal statistical comparison was reported.

Guidelines
The search for guidelines resulted in 15 references, out of which two were excluded because they were updated by a more recent version [61,62]. Thirteen references were evaluated based on their full text; all were included and assessed according to the AGREE II instrument (Table 2 online appendix) [63-75]. In 4 guidelines RFA and MWA were not mentioned [63-66]. In 1 guideline RFA was mentioned but without clear recommendations [67]. The American College of Radiology (ACR) guideline does not include specific recommendations, but RFA was described as unsuitable for CRLM, although scientific support for this statement is lacking [68]. The US National Comprehensive Cancer Network (NCCN) guidelines do not provide well-defined recommendations for RFA and MWA, although they do write the following: "The panel does not consider ablation to be a substitute for resection in patients with completely resectable disease. In addition, resection or ablation (either alone or in combination with resection) should be reserved for patients with disease that is completely amenable to local therapy. Use of surgery, ablation, or the combination, with the goal of less-than-complete resection/ablation of all known sites of disease, is not recommended" [69,70]. References to the EORTC-CLOCC trial and to several observational studies were used to support these statements [3, 29, 46, 76-80]. The European Society for Medical Oncology (ESMO) considers RFA suitable for CRLM < 4 cm if surgery is contraindicated and refers to the EORTC-CLOCC trial and a systematic review [29,71,78].
The UK National Institute for Health and Care Excellence (NICE) guideline considers the current evidence on safety and efficacy adequate to support the use of this procedure in patients unfit or otherwise unsuitable for hepatic resection, or in those who have previously had hepatic resection, provided that normal arrangements are in place for clinical governance, consent and audit [72]. The Scottish Intercollegiate Guidelines Network (SIGN) recommends that ablation should be considered for CRLM [73,81]. The Belgian Health Care Knowledge Center (KCE) recommends the use of RFA in combination with PH to preserve sufficient future liver remnant and refers to the NICE, SIGN and CCO guidelines [74]. The most comprehensive recommendations were reported in the Dutch Comprehensive Cancer Centre (IKNL) guideline: thermal ablation cannot be considered a substitute for resection, but represents a suitable treatment option for unresectable CRLM if the goal is a complete eradication of all lesions with curative intent [75]. Percutaneous ablation can be considered for patients who are less suitable for surgery because of advanced age, comorbidity, unfavourable location or a history of extensive abdominal surgery. The ablation technique of first choice is RFA. MWA can be considered a good alternative, especially for lesions in proximity to large blood vessels, where the heat-sink effect (heat being carried away by the flowing blood) may enable tumour cells to survive after RFA. IKNL refers to the EORTC-CLOCC trial, the Cochrane review and several observational studies [3, 26, 29, 82-85].

Discussion
In contrast to the many available comparative observational studies and case series on thermal ablation for CRLM, the literature to reliably assess its effectiveness compared to chemotherapy and surgery is limited. Although one RCT was identified for RFA [29], GRADE evaluation required downgrading the quality of evidence regarding OS. When comparing RFA (± PH) + chemotherapy to chemotherapy alone, quality was downgraded to moderate, especially because both the optimal information size (OIS; the number of included patients did not meet the sample size) and the relative risk reduction (RRR = 100 × [1 − upper limit of the 95%CI for the HR (0.88)] = 12%) were too low (serious imprecision; Table 1). When comparing RFA + chemotherapy to chemotherapy alone, quality was further downgraded to low, because a substantial part of the ablated patients also underwent PH (serious indirectness). However, the remarkable differences in 8-year OS (35.9 vs. 8.9%) and 8-year DFS (22.3 vs. 2.0%) between the ablation arm and the chemotherapy-alone arm seem to validate the eradication of all macroscopically visible CRLM and to justify the adoption of thermal ablation for unresectable CRLM [29]. The very serious risk of bias of the one MWA trial required downgrading to very low-quality evidence.
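The relative risk reduction used in the imprecision judgement follows directly from the upper confidence limit of the hazard ratio. The short sketch below reproduces that arithmetic; the function name is illustrative.

```python
def relative_risk_reduction(hr_upper_limit):
    """Smallest plausible relative risk reduction (%) given the upper 95% CI limit of the HR."""
    return 100 * (1 - hr_upper_limit)

# Upper limit of the 95% CI for the HR as quoted in the text.
rrr = relative_risk_reduction(0.88)
print(f"RRR = {rrr:.0f}%")
```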
Comparing PH alone for resectable lesions with RFA for unresectable lesions, RFA was associated with significantly fewer complications but also with an inferior survival. In contrast, RFA in addition to PH for patients with unresectable disease resulted in a survival comparable to that of resection alone for patients with resectable disease. In other words, for patients with unresectable disease, in whom palliative chemotherapy used to be the only treatment option, RFA is able to offer a DFS and OS comparable to or approaching that of surgical candidates. Out of the eight studies published after 2012, seven showed a similar OS when comparing ablation (± PH) to PH alone (Figs. 5, 6), which may point to improvements in ablative technique. Although MWA compared to chemotherapy alone was associated with a superior OS for patients with unresectable CRLM, this is based on a single retrospective study at risk of bias due to the absence of randomization, which seriously lowers the quality of evidence [50].
In contrast to RFA, the number of comparative studies for MWA was limited. For this reason, we incorporated more restrictions for the RFA studies, including only RCTs and observational studies that performed either case matching or multivariate analysis for prognostic factors.
The included observational studies were by definition all confounded by indication, since ablation was only performed for unresectable lesions. Reasons for choosing ablation over PH were comorbidity (0-41%), inadequate future liver remnant and/or technical factors such as difficult anatomical location (5-67%), patient's choice (0-61%) or extrahepatic disease for studies where this was no exclusion criterion (0-19%). Two other methods to adjust for confounding, namely restricting inclusion to patients from one prognostic category (for example bilobar CRLM) or stratification into subgroups, were not allowed, because these methods only take one prognostic factor into account. All outcome measures were heterogeneously reported and follow-up periods ranged between 19 and 61 months in observational studies on RFA. The documentation of tumour load and disease status was highly variable, as were the definitions of progression-, recurrence- and disease-free survival.
The reporting of complications was heterogeneous, which is why it is difficult to identify the most frequent complications for thermal ablation. Of the 24 observational studies, only two were published prior to 2008. In recent years, several technical advancements were implemented in the field of RFA, although the same can be assumed for surgical techniques. The impact of these two older reports on the global results is probably limited. For MWA this effect may be greater, because the only RCT was published in 2000 and one of two observational studies in 2006. Although technical factors such as an unfavourable anatomical location were used as reasons to choose thermal ablation, clear definitions for resectability were not provided in any of the included studies, with the exception of Ruers et al., who defined resectability as "the possibility to completely resect all CRLM" [29]. For this reason, subgroup analysis was impossible and the risk for potential confounding by indication remains high. In the thermal ablation studies, the number of procedures necessary to reach local control was heterogeneously reported.
At the time of literature review, there was only one series comparing RFA to MWA for CRLM [86]. Among 243 patients there were no differences regarding OS and ASR between RFA and MWA (p = 0.559 and 0.078, respectively), although the complication rate for peribiliary CRLM was higher after MWA (p = 0.002).
Conclusions drawn from previous meta-analyses are comparable to ours with regard to patients with resectable CRLM, but differ for patients with unresectable disease. The review from Sutherland et al. [25] (published in 2006) was probably too old to find sufficiently relevant studies. Belinson et al. [2] and Cirocchi et al. [3] concluded: "Evidence from the included studies are insufficient to recommend RFA for a radical oncological treatment of CRLMs". Gurusamy et al. did not find any RCTs [9]. Bala et al. [1] and Loveman et al. [23] found one RCT for MWA (Shibata et al. [48] published in 2000) and concluded: "Evidence is insufficient to show whether microwave coagulation brings any significant benefit in terms of survival or recurrence compared with conventional surgery for CRLM patients". Smith et al. [24] did not assess RFA separately. Pathak et al. [26] were more positive in their conclusions, although their analysis primarily included case series.
The results from this analysis should be judged with caution. Although the evidence was systematically obtained, there are no guarantees that all available evidence was identified. Furthermore, the inclusion of observational studies increases the risk for publication bias, for which objective indications were detected for the complication rate. Although (for RFA) only studies using randomization, matching or multivariate analysis were included, this does not exclude residual confounding.
To conclude, this article is the first systematic review that supports the widespread adoption of thermal ablation to treat small unresectable CRLM. The (1) recently published long-term survival results from the EORTC-CLOCC trial [29], the (2) comparable survival results after ablation versus resection for the series reported after 2012, the (3) comparable survival after ablation + resection versus resection alone, the (4) potential to induce long-term disease control and the (5) low complication rates all argue in favour of thermal ablation over chemotherapy alone. Further randomized comparisons of thermal ablation with curative intent to current-day palliative chemotherapy alone should therefore be considered unethical. As a consequence, the highest achievable evidence level for unresectable CRLM seems to have been reached.
Although ablation for unresectable CRLM seems inferior to PH for resectable lesions, the lower complication rate combined with the apparent selection bias stresses the need to conduct a randomized controlled trial. Currently, PH for resectable CRLM is being challenged by thermal ablation in a large multicentre, phase III, randomized controlled trial (COLLISION trial; NCT03088150). This study assesses overall and disease-free survival, time to (local) progression, primary and assisted technique efficacy rates, adverse events, quality of life and incremental costs.