Rectal cancer exerts a large healthcare burden globally, ranking among the cancers with the highest prevalence and mortality.1 Standard management of locally advanced rectal cancer (LARC) involves neoadjuvant therapy followed by total mesorectal excision (TME).2 Some patients achieve a complete clinical response (cCR) post-neoadjuvant therapy, where no tumor is detectable on clinical examination, endoscopy and imaging.3 In such cases, TME may be unnecessary and exposes patients to perioperative morbidity, mortality, and potential long-term sexual, urinary and bowel dysfunction.4,5 Non-operative management or ‘watch and wait’ (W&W) offers comparable disease-free survival (DFS) rates6,7,8,9,10,11 and, owing to avoidance of surgery, has been associated with improved quality of life (QoL) compared with TME.12,13 Although W&W carries a 20–25% risk of local regrowth that necessitates intensive surveillance,10,11 successful surgical salvage is possible in over 90% of cases.7,11

Total neoadjuvant therapy (TNT), combining neoadjuvant chemoradiotherapy (nCRT) and upfront chemotherapy, has demonstrated improved rates of DFS, cCR, and pathological complete response (pCR) in LARC.14,15,16 Accordingly, TNT use has increased dramatically in recent years.17 Given the increasing rates of cCR, risks of surgery, and similar W&W DFS, it is possible that more patients and clinicians will consider W&W as the primary treatment option.

Global treatment costs of colorectal cancer are projected to surpass billions of US dollars and a co-ordinated international effort to mitigate the rising cost is warranted.18 LARC management is expensive given frequent use of combination treatment modalities and complex surgery.19 Surgery is associated with significant costs, attributable to inpatient hospital stay, surgical supplies, operating theatre expenses, and high overall complication rates.20 By avoiding surgery and inpatient hospital care, W&W has the potential to be a substantially cost-effective and cost-saving intervention.21,22,23 However, to capture recurrences early, W&W protocols involve rigorous surveillance through frequent multimodal imaging, blood tests, endoscopies and office visits, which adds to the financial burden.

The objective of this systematic review was to identify the economic impact of W&W, versus standard of care, in patients who have achieved cCR following neoadjuvant therapy for LARC.

Methods

This systematic review of healthcare economic evaluations (EEs) followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)24 and Synthesis without Meta-Analysis (SWiM) guidelines,25 and was prospectively registered with PROSPERO (CRD42024513874).

Eligibility Criteria

Eligible participants were adult (>16 years) primary LARC patients who received neoadjuvant therapy (chemotherapy, radiotherapy, or both). The intervention studied was W&W versus standard of care (TME with or without adjuvant chemotherapy). The primary outcome was the (incremental) cost-effectiveness ratio, and secondary outcomes were the net financial costs of the interventions.

Complete (cost-effectiveness analysis, cost-utility analysis, cost-benefit analysis, and cost-minimization analysis) and partial (cost analysis) EEs with varying healthcare perspectives, time horizons, settings, and discount rates were eligible. Exclusion criteria included pediatric population, malignancy proximal to the rectosigmoid junction, distant metastasis, or recurrent LARC. Case reports, editorials, letters, systematic reviews, comments, mini-reviews, book chapters and conference abstracts were excluded.

Search Strategy

The PubMed, OVID Embase, OVID Medline, and Cochrane Library CENTRAL databases were searched from inception to 21 February 2024, and updated on 26 April 2024, for studies of any design, in any setting, without language or publication restrictions. Keywords were related to ‘rectal neoplasms’, ‘watchful waiting’, ‘organ preservation’, ‘economics’, and ‘cost’. The full search strategy was developed with input from a database librarian (electronic supplementary material [ESM] 1). Supplementary searching included reviewing reference lists of included articles, consulting subject experts, and screening grey literature.

Study Selection

Studies were screened using the Covidence software (Veritas Health Innovation, Melbourne, VIC, Australia). After duplicate removal, two reviewers independently screened all titles, abstracts, and full texts, resolving disagreements by consensus, arbitrated by a third reviewer.

Data Extraction

Baseline study characteristics and outcome data were extracted independently by two reviewers using a standardized form, resolving discrepancies by consensus, arbitrated by a third reviewer. Study characteristics included author details, publication year, country, healthcare setting, study period, type of EE, analytical approach, study perspective, time horizon, discount rate, population and intervention characteristics. Outcome data extracted were the mean costs, effectiveness, and uncertainty analysis.

Quality Assessment

The methodological quality of trial-based EEs was examined using the British Medical Journal (BMJ) checklist,26 which contains 35 items evaluating study design, data collection, analysis and interpretation. Model-based EEs were assessed using the Philips checklist,27 which consists of 58 items assessing model structure, data, and consistency. Reporting of EEs was evaluated using the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) 2022 checklist, which includes 28 items intended to standardize and enhance transparency in reporting.28

Two reviewers independently assessed the methodological and reporting quality, resolving disagreements by consensus, arbitrated by a third reviewer. Each item for the CHEERS, BMJ, and Philips checklists was recorded as ‘yes’ (1 point), ‘no’ (0 points), or ‘not applicable’ to assess completeness. As there are no validated scoring metrics for these checklists, grading systems or percentage cut-offs were not used.26,27,28,29

Data Analysis

Due to jurisdiction-specific factors such as location, healthcare system, time horizon, and perspective, there can be considerable heterogeneity in outcomes of EEs. Guidelines discourage pooling of primary outcomes when studies vary in their clinical setting or methodology,30 which precluded meta-analysis. Instead, synthesis of economic evidence followed established guidelines, employing structured narrative synthesis to present study characteristics, methodological quality, and outcomes.25 For review purposes, published costs were adjusted for inflation and purchasing power parity with a validated online calculator (https://eppi.ioe.ac.uk/costconversion/) based on OECD data, targeting 2022 Australian dollars (AUS$) when reference data were available.31
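The two-step adjustment described above (inflation to a target price year, then purchasing power parity conversion to a target currency) can be sketched as follows. This is a minimal illustration rather than the calculator's actual implementation: the function name `convert_cost`, the use of a GDP deflator index for the inflation step, and every numeric input are assumptions chosen for illustration and are not OECD values.

```python
def convert_cost(cost, deflator_source_year, deflator_target_year,
                 ppp_source, ppp_target):
    """Adjust a published cost to a target price year and currency.

    Step 1: inflate within the source country using the ratio of its
            price index in the target year to that in the source year.
    Step 2: convert to the target currency using the ratio of PPP
            rates (local currency units per international dollar).
    """
    inflated = cost * (deflator_target_year / deflator_source_year)
    return inflated * (ppp_target / ppp_source)


# Hypothetical inputs for illustration only (not real OECD data):
# a cost of 10,000 in the source currency, a price index rising
# from 100 to 110, a source PPP of 0.75 and a target PPP of 1.50.
example = convert_cost(10_000, 100.0, 110.0, 0.75, 1.50)
print(round(example, 2))
```

Because both steps are simple ratios, the order in which they are applied does not change the result in this sketch; real calculators may differ in which country's index is applied.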

Publication Bias

Publication bias was assessed by searching the grey literature and conference abstracts not proceeding to publication, analyzing sponsorship of included studies and any associated outcome differences, and examining the presence and results of uncertainty analysis.30

Results

Study Selection

Of 1548 articles retrieved, 519 duplicates were removed; of the 1029 articles screened by title and abstract, 1002 were irrelevant. Twenty-seven articles proceeded to full-text review and 9 met the inclusion criteria. Studies were excluded due to wrong study design (n = 17) and wrong patient population (n = 1). Supplemental searches identified three more studies, resulting in a total of 12 studies for inclusion in this systematic review (Fig. 1).
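As a quick consistency check, the study flow reported above can be reproduced with simple arithmetic. The variable names are illustrative; the counts are those stated in the paragraph.

```python
# Counts as reported in the study selection paragraph.
retrieved = 1548
duplicates = 519
excluded_title_abstract = 1002
excluded_full_text = {"wrong study design": 17, "wrong patient population": 1}
from_supplemental_search = 3

# Each stage is the previous stage minus its exclusions.
screened = retrieved - duplicates
full_text = screened - excluded_title_abstract
included_from_databases = full_text - sum(excluded_full_text.values())
total_included = included_from_databases + from_supplemental_search

print(screened, full_text, included_from_databases, total_included)
```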

Fig. 1 PRISMA diagram. PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Economic Study Characteristics

Table 1 summarizes the characteristics of the included studies. Studies were published between 2016 and 2024 and originated from eight countries: United States,22,23 The Netherlands,32,33 Spain,34,35 Germany,36,37 Australia,38 New Zealand,39 United Kingdom,21 and Japan.40 Seven studies were cost-effectiveness analyses (CEA),21,22,23,32,34,35,37 while the remainder were cost analyses (CA).33,36,38,39,40 Seven studies adopted a model-based approach.21,22,23,32,34,36,37

Table 1 Characteristics of the included studies (n = 12)

Comparators to W&W were abdominoperineal resection (APR) in three studies,36,37,40 APR and low anterior resection (LAR) in three studies,22,23,38 and unspecified in the remaining studies. Rodriguez-Pascual et al. compared W&W with standard and robotic resection.34 In the article by Wurschi et al., the W&W cohort received TNT whereas the surgical comparator received nCRT.37 For the remainder, both cohorts received nCRT. Three studies specified low rectal cancer requiring APR,36,37,40 while the remaining studies investigated rectal cancer of any height. In all studies, only patients with cCR were offered W&W, whereas patients undergoing surgery may have achieved pCR,35,38,39 incomplete clinical response,32,33,37,40 or cCR,21,22,23,34,36 potentially biasing the outcomes.

Methodological Details

Table 2 summarizes the methodological details of the included studies. Eleven studies adopted a hospital or third-party payer (TPP) perspective,21,22,23,32,33,34,35,36,38,39,40 while Wurschi et al. adopted a patient perspective.37 When unspecified, perspective was inferred from costing information. Various time horizons were explored: 2 years,33 3 years,35,38 5 years,23,36,37,39,40 and lifetime.21,22,32,34 Costs and effects were discounted in seven studies, ranging from 1.5 to 4% depending on study jurisdiction.21,22,23,32,34,36,40 Model types included decision tree,36 Markov,22,23,32,34,37 or both.21
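To illustrate why the choice of discount rate matters when comparing studies, the sketch below computes the present value of a cost stream at the lowest and highest rates reported above (1.5% and 4%). The function name `present_value`, the 5-year stream of equal annual costs, and the convention of leaving the first year undiscounted are assumptions for illustration; the included studies may have applied other conventions.

```python
def present_value(annual_values, rate):
    """Discount a stream of yearly values to present value.

    The value at index 0 (the first year) is left undiscounted;
    each later year t is divided by (1 + rate) ** t.
    """
    return sum(v / (1 + rate) ** t for t, v in enumerate(annual_values))


# Hypothetical stream: 1,000 currency units per year for 5 years.
costs = [1000.0] * 5

pv_low = present_value(costs, 0.015)   # lowest reported rate
pv_high = present_value(costs, 0.04)   # highest reported rate
print(round(pv_low, 2), round(pv_high, 2))
```

A higher rate shrinks later costs more strongly, so strategies whose costs fall late in the time horizon (for example, prolonged surveillance) look relatively cheaper under higher discount rates.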

Table 2 Methodological quality and details of the included economic evaluations (n = 12)

Resource use in trial-based EEs was estimated from patient records and local institutional policies. Transition probabilities in model-based EEs were sourced from published literature, population statistics, practice guidelines, clinical trials, and institutional databases. Cost estimates reflected jurisdiction-specific payment systems, except in one Spanish study, where costs were derived from the US.34 Utilities were predominantly sourced from published literature, with institutional databases also used in two studies34,35 and expert elicitation in one study.32 Detailed data sources for modeling studies are provided in Table 3.

Table 3 Input parameters for modeling studies (n = 7)

Model-based uncertainty was assessed with deterministic sensitivity analysis (DSA) to investigate parameter uncertainty in six studies,21,22,23,32,36,37 probabilistic sensitivity analysis (PSA) to investigate simultaneous joint parameter uncertainty in four studies,21,22,23,34 and scenario analysis to investigate model assumptions in three studies.21,23,32 Subgroup analyses were conducted in three studies to investigate heterogeneity.21,35,37 Three trial-based EEs reported statistical analysis: standard deviations,33 p-values,39 or both.35

Quality Assessment

Studies were heterogeneous in reporting and methodological quality. Table 2 displays the assessment for each study. The completeness of reporting ranged between 48 and 86% using the CHEERS 2022 checklist. Figure 2 demonstrates the proportion of studies that satisfied each item in the checklist. Completeness for the BMJ checklist for trial-based EEs was between 55 and 82%, and between 60 and 87% for the Philips checklist for model-based EEs. The full quality assessment matrix for each study is displayed in ESM 2.

Fig. 2 Number of studies reporting CHEERS items (green, yes; red, no; grey, NA). CHEERS Consolidated Health Economic Evaluation Reporting Standards, NA not available

Cost Effectiveness

Seven studies (six modeling, one trial) evaluated cost effectiveness.21,22,23,32,34,35,37 Outcomes are summarized in Table 4 for model-based EEs and in Table 5 for trial-based EEs. Across time horizons of 3 years to lifetime, all studies consistently identified W&W to be dominant over surgical comparators, offering lower costs and higher quality-adjusted life-years (QALYs) from both TPP21,22,23,32,34,35 and patient perspectives.37 Standardized incremental costs of W&W ranged from AUS$1141 to AUS$192,145 (3–50%) less per patient (i.e. cost saving) from the TPP perspective and AUS$3203 (25%) less from the patient perspective. The incremental QALYs associated with W&W ranged from 0.089 to 2.03 more QALYs than surgery.
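The notion of dominance used above can be made concrete with a short sketch. An intervention is dominant when it is both less costly and more effective than its comparator, in which case no incremental cost-effectiveness ratio (ICER) needs to be reported. The function name `classify` and the example inputs are hypothetical and are not drawn from any included study.

```python
def classify(delta_cost, delta_qaly):
    """Classify an intervention against its comparator.

    delta_cost: intervention cost minus comparator cost
    delta_qaly: intervention QALYs minus comparator QALYs
    Returns a (label, icer) pair; icer is None when no ratio applies.
    """
    if delta_cost < 0 and delta_qaly > 0:
        return "dominant", None        # cheaper and more effective
    if delta_cost > 0 and delta_qaly < 0:
        return "dominated", None       # costlier and less effective
    if delta_qaly == 0:
        return "equal effect", None    # ratio undefined
    # Otherwise a trade-off exists and the ICER is informative.
    return "trade-off", delta_cost / delta_qaly


# Hypothetical inputs for illustration only.
print(classify(-5000.0, 0.5))
print(classify(20000.0, 0.5))
```

In the trade-off case, the ICER would be compared with a jurisdiction-specific willingness-to-pay threshold, which this sketch deliberately leaves out.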

Table 4 Summary of primary outcomes—model-based (n = 7)
Table 5 Summary of primary outcomes—trial-based (n = 5)

In US studies over 5-year and lifetime horizons, W&W demonstrated 40–50% lower costs and superior QALYs compared with both LAR and APR.22,23 Similarly, Spanish studies over 3-year and lifetime horizons demonstrated W&W dominance over surgical resection, including robotic resection in one study.34,35 The study by Ferri et al. raised a methodological concern, as it appeared to apply utilities derived from Short-Form 36 (SF-36) questionnaires at 12 months across all 3 years without elaboration, potentially affecting the reliability of QALY estimates.35 Rodriguez-Pascual et al. used US costing estimates, potentially limiting applicability to the Spanish jurisdiction.34 Dutch and UK studies found W&W to be dominant over surgery over a lifetime,21,32 despite challenges in obtaining appropriate W&W utilities due to the lack of published literature. Hendriks relied on expert elicitation at the author’s institution32 and Rao et al. relied on proxy data from prostate cancer literature.21

One German study provided the only patient perspective, demonstrating W&W dominance over APR across a 5-year time horizon.37 However, W&W patients received TNT versus nCRT in the APR comparator, potentially limiting applicability as this may not reflect clinical practice.

Cost

Five studies performed CA comparing W&W with surgery: four trial-based EEs33,38,39,40 and one model-based EE.36 All were performed from TPP or hospital perspectives, with time horizons ranging from 2 to 5 years. W&W consistently showed lower costs compared with surgery across all studies. Standardized cost differences ranged from AUS$17,945 to AUS$37,010 (40–61% less costly).

A Dutch study with a 2-year time horizon demonstrated mean hospital costs of AUS$15,176 (95% confidence interval [CI] $13,895–16,456) for W&W and AUS$38,675 (95% CI $35,291–42,060) for surgery, translating to an AUS$23,499 (61%) cost reduction per patient.33 Similarly, a small (n = 10) trial-based Australian study with a 3-year time horizon showed mean costs of AUS$55,315 for W&W and AUS$92,325 for surgery, resulting in an AUS$37,010 (40%) cost reduction per patient.38

Three studies compared costs with a 5-year time horizon. In Japan, mean costs were AUS$18,858 for W&W and AUS$36,803 for APR, with a cost saving of AUS$17,945 (49%) per patient.40 In New Zealand, mean costs were NZ$47,906 for W&W and NZ$70,760 for surgery, resulting in a NZ$22,854 (32%) cost saving per patient.39 This was the only study to factor in neoadjuvant treatment costs, potentially increasing net financial costs compared with other studies. A German decision tree model found W&W costs were €6344 and APR costs were €14,511, saving €8167 (56%) per patient.36
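The per-patient savings quoted in the cost analyses above can be reproduced from the reported mean costs. The sketch below assumes that each percentage is the saving expressed relative to the surgical cost, which matches the reported figures; the helper name `saving` and the study labels are illustrative.

```python
# (W&W mean cost, surgery mean cost, currency) as reported in the text.
reported = {
    "Netherlands, 2 years": (15_176, 38_675, "AUS$"),
    "Australia, 3 years": (55_315, 92_325, "AUS$"),
    "Japan, 5 years": (18_858, 36_803, "AUS$"),
    "New Zealand, 5 years": (47_906, 70_760, "NZ$"),
    "Germany, 5 years": (6_344, 14_511, "€"),
}


def saving(ww_cost, surgery_cost):
    """Return the absolute saving and the saving as a whole-number
    percentage of the surgery cost."""
    diff = surgery_cost - ww_cost
    return diff, round(100 * diff / surgery_cost)


for study, (ww, surgery, currency) in reported.items():
    diff, pct = saving(ww, surgery)
    print(f"{study}: {currency}{diff:,} ({pct}%)")
```

Note that the New Zealand and German figures are in their original currencies, so their absolute savings are not directly comparable with the AUS$ values.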

The varying approaches used with respect to time horizon, discounting, and management of inflation limited the value of across-study comparisons of absolute and incremental costs.

Heterogeneity

Three studies performed subgroup analyses. Ferri et al. found greater cost savings with the W&W approach for low rectal tumors compared with medium-high rectal tumors, likely due to increased surgical complexity and risk of post-surgical complications.35 Wurschi et al. examined patient costs and found employed W&W patients had twice the cost saving compared with retired patients, likely due to reduced productivity losses.37 Rao et al. examined three cohorts: healthy 60- and 80-year-old males, and comorbid 80-year-old males, finding W&W dominant across all cohorts.21

Uncertainty

Ten studies conducted statistical or sensitivity analysis to address uncertainty.21,22,23,32,33,34,35,36,37,39 DSA across six studies demonstrated key parameters impacting outcomes were rates of local regrowth21,22,23 and distant metastasis following W&W,22,23 salvage surgery,22 perioperative mortality21 and utilities for W&W and surgical comparators.21,22,23,32,37 Doubling W&W costs or decreasing surgical costs by 90% could have altered outcomes in two studies.36,37

PSA was performed in four studies.21,22,23,34 US studies demonstrated W&W to be dominant over 5-year and lifetime horizons in almost all simulations.22,23 Rao et al. demonstrated W&W’s dominance with high certainty (>70%), with increasing certainty among older and comorbid patients.21 Rodriguez-Pascual et al. showed W&W to be cost saving in almost all simulations and mostly increasing QALYs when compared with standard and robotic resection (87% and 55% of simulations, respectively); however, there were concerns regarding costing data and utility acquisition methods.34

Three model-based studies performed scenario analyses, examining different patient populations,32 adjuvant chemotherapy timings,23 and surveillance protocols,21 all showing W&W dominance in all scenarios examined. Trial-based EEs demonstrated statistically significant results, with p-values <0.05 in two studies35,39 and non-overlap of calculated confidence intervals in one study.33

Publication Bias

Searching of the grey literature and conference abstracts not proceeding to publication did not reveal any undiscovered articles. Analysis of sponsorship demonstrated three articles with academic affiliations,22,35,36 two with government affiliations,21,23 and one with industry affiliation.22 There were no differences in outcome based on sponsorship, and all sponsored articles included appropriate uncertainty analysis.

Discussion

EE is crucial in the assessment of new health technologies and health protocol implementation, enabling decision makers and policy developers to review the impact of interventions and allocate scarce healthcare resources efficiently. In this first systematic review of the global economic impact of W&W, 12 eligible studies of varying reporting quality and methodological designs were identified. Seven CEAs and five CAs all reported improved cost-effectiveness and cost-saving associated with W&W across a variety of time horizons and perspectives.

A key strength of this review was the comprehensive search strategy and broad eligibility criteria facilitating inclusion of different types of EEs globally. A consistent direction of effect favoring W&W as the dominant strategy suggests that the conclusions could be applicable internationally across heterogeneous health systems and patient populations. Multidimensional assessment of methodological and reporting quality allowed recognition and highlighting of high-quality EEs.

Of the 12 studies, 11 reported costs from TPP or hospital perspectives, while only one utilized the patient perspective. None comprehensively assessed societal costs, including indirect and intangible costs, such as productivity loss associated with frequent appointments during intensive W&W surveillance, or the psychological burden associated with the uncertainty of cancer prognosis. Additionally, none assessed implementation and maintenance costs of W&W pathways that may require significant coordination of the patient’s clinical journey, frequently necessitating a dedicated cancer care coordinator.41 Further research should explore the impact of a broader perspective and of broader cost considerations for W&W on its cost effectiveness. However, as two studies indicated that large changes in costs would be required to alter the conclusion of cost effectiveness of W&W versus surgery, even accounting for a more costly W&W pathway may not alter the dominance of W&W.

Considering the applicability of the results of this review to current LARC management is important. Neoadjuvant therapy included TNT in only one study.37 Given current practice recommendations and trends towards adoption of TNT,2,17 with more intensive upfront chemotherapy use and higher cCR rates, it remains unclear if the cost effectiveness of W&W will persist in TNT patient populations. In the single study utilizing TNT prior to W&W, the surgical comparator received nCRT, limiting its relevance to real-world practice, where clinicians are deciding on W&W for patients treated with TNT alone.37 Model-based CEAs examining TNT versus nCRT followed by TME from a TPP perspective found TNT was the dominant strategy over a 5-year time horizon,42,43 but did not explore the impact of W&W policies in the cohort. Finally, trial-based EE comparators in this review varied, and given incomplete clinical response is associated with poorer oncological outcomes compared with cCR, comparisons of cCR in W&W to incomplete clinical response in surgery may have biased results towards W&W. Additionally, comparison of pCR in surgery with cCR in W&W may have biased results towards surgery. Given it may be impractical or impossible to randomize patients to W&W versus surgery, high-quality prospective cohort data with standardized comparators are needed to allow accurate decision making.

This systematic review has limitations. First, the jurisdiction-specific nature of EEs results in inherent heterogeneity. This trade-off between local applicability and global generalizability precludes formal meta-analysis. As such, results were narratively synthesized, with a consistent direction of effect favoring W&W across all evaluations. Methodological and reporting quality varied significantly, with several studies failing to follow the majority of reporting guidelines.28 Future research should prioritize adherence to minimum reporting standards and good practice guidelines to improve quality, standardization, and transparency.

A limitation of summarizing outcomes of model-based EEs is that many used the same sources, or other model-based EEs, for their input parameters—namely transition probabilities and health state utilities. Given analogous inputs, it may not be surprising that the results themselves tended towards similar outcomes. Therefore, if an evidence source that underpins multiple EEs is inaccurate, there is concern that the multiplicity of analyses may amplify an erroneous conclusion rather than provide independent verification, reinforcing the need for thorough critical appraisal of the underlying input sources in addition to the economic modeling methodology. Moreover, due to limited literature, some studies used health utility data derived from prostate cancer literature or expert opinions, potentially introducing bias. Future research focusing on patient-reported outcomes and QoL would improve the accuracy of these decision-making models. Nevertheless, the identification of model structure and relevant input parameters may inform future model development.

In half of the model-based CEAs, DSA suggested potential outcome differences with varying local and distant recurrence rates post W&W. Although the thresholds exceeded published rates, long-term prospective follow-up data are limited. Additionally, recent literature suggests local recurrence following cCR may be a significant and independent risk factor for distant metastasis and that leaving the undetectable primary tumor in situ until recurrence occurs may result in poorer oncological outcomes.44,45 Despite salvage surgery being successful in almost all cases of local regrowth, more extensive surgeries may be required to achieve adequate local control. All DSAs suggested patient utilities for W&W and post-surgery may have changed the model outcome, highlighting the need for high-quality studies to refine these key parameters. Despite these limitations, PSA consistently supported W&W as the dominant strategy with high certainty.

Quality assessment tools in EE have several inherent pitfalls. No standardized tool exists, leading to the development of multiple checklists.29 The BMJ and Philips checklists were chosen as they were the most commonly used for trial- and model-based EEs, respectively;29 however, the subjective nature of these checklists results in high interrater variability, limiting the ability to provide reliable and consistent results.29 In this review, two independent reviewers performed each element of quality assessment, and disagreements were resolved either by consensus or a third reviewer, which helped reduce bias and systematic errors.30 Because no validated scoring systems of these checklists exist, it is important to emphasize that scores do not imply quality.26,27,28,29 Therefore, grading systems or arbitrary percentage cut-offs were not employed; instead, ESM 2 presents the complete matrix of quality assessment.

The results of this systematic review on the economic impact of W&W following neoadjuvant therapy for LARC suggest that W&W is likely cost effective and cost saving compared with surgery; however, caution is warranted given the small number of studies, clinical heterogeneity, and variable methodological quality of the included studies. Given these considerations, shared patient/clinician decision making is imperative. Nevertheless, our findings may aid the development of new decision-making models and healthcare resource planning. Future research on patient-relevant health outcomes and societal cost effectiveness of W&W, particularly in the setting of TNT, is needed to further inform patients, clinicians, and policy makers.