Problem Definition and Analyses
A decision analytic model was created to compare the cost, effectiveness, and cost-effectiveness of two strategies for screening for BE: random 4-quadrant forceps biopsies alone (FB) vs. forceps biopsy with WATS3D (FB + WATS3D). The simulated cohort consisted of 60-year-old white males with GERD not previously screened for BE. Age 60 was chosen to leverage results of previous cost-effectiveness models for surveillance of BE sponsored by the National Cancer Institute [15]. Results were calculated with Microsoft Excel 2016 (Microsoft Corporation, Redmond, WA, USA) and TreeAge Pro 2019 (TreeAge Software, Williamstown, MA, USA).
Effectiveness was measured in quality-adjusted life years (QALYs), which weight years of life by their health state utilities. Health state utilities represent patient preferences for different states of health, ranging from 0 (death) to 1 (full health). Costs were assessed from the third-party payer perspective. All costs were adjusted to 2019 US$ based on the Medical Care Services component of the Consumer Price Index [21]. Both cost and effectiveness were discounted at the standard 3% per year. Following recommendations, incremental cost-effectiveness analysis was performed using two thresholds for cost-effectiveness: $100,000/QALY and $150,000/QALY [22,23,24]. Supplementary analyses were conducted to determine the number needed to screen to avert 1 cancer and the number needed to screen to avert 1 cancer death.
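The discounting and threshold comparison described above can be sketched in Python. This is a minimal illustration, not the authors' Excel or TreeAge implementation: the function names are hypothetical, and the convention that year 0 is left undiscounted is an assumption the text does not state.

```python
DISCOUNT_RATE = 0.03                  # 3% per year, as stated in the text
THRESHOLDS = (100_000.0, 150_000.0)   # US$ per QALY, as stated in the text


def present_value(yearly_values, rate=DISCOUNT_RATE):
    """Discount a stream of yearly costs or QALYs to present value.

    Assumption: the first element occurs in year 0 and is not discounted.
    """
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(yearly_values))


def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio in US$ per QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("this sketch only handles a positive QALY gain")
    return delta_cost / delta_qaly


def cost_effective_at(delta_cost, delta_qaly, thresholds=THRESHOLDS):
    """Map each willingness-to-pay threshold to True or False."""
    ratio = icer(delta_cost, delta_qaly)
    return {t: ratio <= t for t in thresholds}
```

Here `delta_cost` and `delta_qaly` stand for the discounted differences between FB + WATS3D and FB alone; a strategy can be cost-effective at $150,000/QALY while failing the $100,000/QALY threshold.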
No Institutional Review Board approval was necessary as there was no use of individual-level data, and therefore no human subjects.
Overview of Model
BE detected by positive FB or WATS3D was referred to surveillance, with treatment of future dysplasia. Surveillance outcomes were taken from two published models described in the literature and on the National Cancer Institute’s Cancer Intervention and Surveillance Modeling Network (CISNET) website [15].
Because incremental cost-effectiveness analysis was used, analysis was only done to compute differences between the two strategies. When patients would have identical results for both strategies, cost and effectiveness were not calculated since they would cancel in the incremental analysis. Thus, no calculations were made for anyone with a positive FB screen since they would be sent for surveillance in both strategies. Similarly, anyone with a negative FB and a negative WATS3D would not go into surveillance regardless of strategy. Patients with a negative FB and discordant positive WATS3D would be entered into a surveillance protocol. These cases had to be modeled and the cost and effectiveness calculated. For cases of true negative FB but false positive WATS3D, we assumed they would go into surveillance with FB and later be removed after two rounds of negative FB “confirmed” the false positive status of the original WATS3D screen. For cases of false negative FB and true positive WATS3D, we assumed they would remain in surveillance with future FB surveillance confirming the presence of BE. A summary of how each case is handled is found in Table 1.
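The case handling described above can be written as a small decision function. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical, and `has_be` represents true disease status, which is known only inside the model.

```python
def incremental_handling(fb_positive: bool, wats_positive: bool, has_be: bool) -> str:
    """Describe how one screened patient enters the incremental analysis."""
    if fb_positive:
        # Sent to surveillance under both strategies, so the difference is zero.
        return "not modeled: surveillance in both strategies"
    if not wats_positive:
        # Negative on both tests: no surveillance under either strategy.
        return "not modeled: no surveillance in either strategy"
    if has_be:
        # False negative FB, true positive WATS3D.
        return "modeled: remains in surveillance, later FB confirms BE"
    # True negative FB, false positive WATS3D.
    return "modeled: removed after two rounds of negative FB"
```

Only the two discordant branches (negative FB, positive WATS3D) contribute cost and effectiveness to the incremental results.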
Table 1 Overview of how each screening approach handles different test results
Model Parameters
All input parameters for the model are described below and summarized in Table 2.
While there are many estimates of the prevalence of BE in the general and GERD populations, there is a paucity of literature on the proportion of FB screenings that yield a positive result. Rubenstein et al. used a database of over 150,000 patients undergoing their first endoscopy and presented the results of BE detection by sex and GERD as indication for endoscopy, by decade of life [25]. For white males with GERD, the proportion with FB positive results was 6.0% for age 40–49, 9.3% for age 50–59, and about 8.4% for age 60–69. Averaging the results for age 40–49 and 50–59, then 50–59 and 60–69, yields estimates of 7.65% for age 50 and 8.85% for age 60. We used 8.85% for our base case value, and 7.65% for the low end of the range for sensitivity analysis. For the high end, we used 10.05%, which is the same distance from the base case as the low end of the range.
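The arithmetic behind the base case and its sensitivity range can be checked with a few lines of Python. The variable names are illustrative; the input percentages are those quoted from Rubenstein et al. above.

```python
# FB-positive percentages for white males with GERD, by decade of life
fb_positive_pct = {"40-49": 6.0, "50-59": 9.3, "60-69": 8.4}

# Value at age 50: midpoint of the two adjacent decades (low end of range)
low_pct = (fb_positive_pct["40-49"] + fb_positive_pct["50-59"]) / 2

# Value at age 60: midpoint of the two adjacent decades (base case)
base_pct = (fb_positive_pct["50-59"] + fb_positive_pct["60-69"]) / 2

# High end: same distance above the base case as the low end is below it
high_pct = base_pct + (base_pct - low_pct)

print(round(low_pct, 2), round(base_pct, 2), round(high_pct, 2))  # 7.65 8.85 10.05
```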
Adding WATS3D to FB has been shown to increase the yield of positive screens for BE. Over time, changes including increased brush size and optimized computer algorithmic analysis with further machine learning have improved the performance of WATS3D. For our model, we chose to use data from the most recent and largest study of WATS3D, which incorporated the enhanced three-dimensional computer analysis system and the larger sampling brush [19]. In this study, WATS3D increased the overall detection of BE by 213% when used adjunctively in screening [19]. Given the uncertainty around the two key WATS3D parameters, additional yield and false positive rate, we calculated incremental cost-effectiveness results across a wide range of values for each. For additional yield, we used the result from Smith et al. [19] along with one-half and one-third of the published result as possible outcomes for the model (213%, 106.5%, 71%). The false positive rate for WATS3D benefits from the test being read only by expert WATS3D pathologists at one laboratory. However, no data currently exist to estimate the rate of false positives for WATS3D. For false positives, we considered 5%, 15%, and 25% as possible outcomes. We then calculated the incremental cost-effectiveness ratios for all nine combinations of these values. While this two-way sensitivity analysis became our primary analysis, we considered the center cell (106.5% additional yield, 15% false positive WATS3D) to be our base case for the purpose of one-way sensitivity analyses on other parameters and for ease of exposition. The Smith study also suggests a much higher prevalence of BE than previously thought, and this is supported by other data. While a Swedish study showed just 1.6% prevalence in the general population [26], a US study in first-time screening colonoscopy patients aged 40 or over found a 6.8% prevalence [27].
A mathematical model of the US population, aligned with data from the Surveillance, Epidemiology and End Results registry (SEER), arrived at an estimated prevalence of 5.6%, supporting a higher prevalence than previously thought [28]. A modeling study by Hur et al. estimated the prevalence of BE and the prevalence of GERD by decade of life [29]. Averaging the results for ages 50–59 and 60–69, adjusting to reflect the 5.6% prevalence in the USA, and combining this with the prior estimates of a relative risk of 6.0 for patients with chronic GERD symptoms, we calculated the prevalence of BE in white males age 60 with chronic GERD to be 18.7% [26, 28, 29]. Compared to the 8.85% estimated FB positives (see above), this suggests the true prevalence of BE in 60-year-old white males with GERD to be 111.3% greater than that detected by FB alone (18.7/8.85 = 2.113). Our base case of 106.5% additional yield, less the 15% of those positives assumed to be false (106.5% × 0.85), would result in 90.5% true additional BE detected.
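The nine-cell grid and the base-case figures above can be reproduced with a short sketch. One interpretation is assumed here: the false positive rate is applied as a share of the additional WATS3D positives, which is consistent with the 90.5% figure in the text. All names are hypothetical.

```python
from itertools import product

ADDITIONAL_YIELDS = (2.13, 1.065, 0.71)      # 213%, 106.5%, 71% over FB alone
FALSE_POSITIVE_SHARES = (0.05, 0.15, 0.25)   # assumed share of added positives


def true_additional_yield(additional_yield, fp_share):
    """Additional true BE detected, relative to FB positives (assumed form)."""
    return additional_yield * (1.0 - fp_share)


# All nine combinations used in the two-way sensitivity analysis
grid = {
    (y, fp): true_additional_yield(y, fp)
    for y, fp in product(ADDITIONAL_YIELDS, FALSE_POSITIVE_SHARES)
}

base_case = grid[(1.065, 0.15)]          # about 0.905, i.e. 90.5%
prevalence_gap = 0.187 / 0.0885 - 1.0    # about 1.113, i.e. 111.3%
```

The base-case true additional yield (about 90.5%) sits below the 111.3% gap implied by the prevalence estimate.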
Surveillance
The cost and effectiveness of BE surveillance with treatment for LGD vs. natural history were taken from modeling studies [15]. There were two different models that looked at surveillance with treatment for LGD. One was developed by researchers from Erasmus University and the University of Washington (Erasmus/UW), and the other by researchers at Massachusetts General Hospital (MGH). These two models are described in the supplement for Kroep et al. and at the National Cancer Institute’s CISNET site [15]. These models followed non-dysplastic Barrett’s esophagus (NDBE) patients from age 60 to 100 for natural history and for surveillance with RFA for LGD. Cost was computed using 2015 US$, and effectiveness was measured in QALYs. The Erasmus/UW model estimated the additional cost and effectiveness of surveillance at $3733 and 0.2215 QALYs, while the MGH model estimated these parameters at $5255 and 0.2048 QALYs. We used the average of the two models and adjusted the cost to 2019 US$.
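Averaging the two models is simple arithmetic; a minimal sketch follows. The inflation factor is deliberately left as a required argument, because the text gives the index used (Medical Care Services component of the CPI) but not its 2015-to-2019 ratio. The function and key names are hypothetical.

```python
ERASMUS_UW = {"cost_2015": 3733.0, "qaly": 0.2215}
MGH = {"cost_2015": 5255.0, "qaly": 0.2048}


def average_surveillance_increment(models, cpi_ratio_2015_to_2019):
    """Return (mean cost in 2019 US$, mean QALY gain) across the models.

    cpi_ratio_2015_to_2019 must be supplied by the caller; it is not
    reported in the text.
    """
    n = len(models)
    mean_cost_2015 = sum(m["cost_2015"] for m in models) / n
    mean_qaly = sum(m["qaly"] for m in models) / n
    return mean_cost_2015 * cpi_ratio_2015_to_2019, mean_qaly
```

Passing a ratio of 1.0 returns the unadjusted 2015 averages, $4494 and 0.21315 QALYs.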
The Erasmus/UW and MGH models have estimated the number of cancers and cancer deaths for natural history and for surveillance with treatment for LGD for a cohort of 60-year-old males with newly diagnosed BE [15]. The MGH model estimated EAC incidence at 8.5% for natural history and 4.6% for surveillance with treatment, compared with 6.8% and 3.1% for the Erasmus/UW model. Thus, surveillance with treatment reduced EAC incidence by 45.9% and 54.4% for MGH and Erasmus/UW, respectively. For EAC-related death, MGH estimated 5.7% for natural history and 1.9% for surveillance with treatment, compared to 4.9% and 1.5% for Erasmus/UW. This corresponds to reductions of 66.7% and 69.4% for the two models. These rates of incidence and death, and the reductions in both from surveillance relative to natural history, could then be multiplied by our model’s estimate of the number of additional BE cases placed in surveillance rather than left to natural history because of the additional yield of WATS3D plus FB vs. FB alone. These results were expressed as the number of EAC cases and EAC deaths averted per 1000 screened. These numbers were then inverted to arrive at the number needed to screen overall using WATS3D plus FB to avert 1 additional cancer and 1 cancer death.
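The relative reductions quoted above, and the step from events averted to number needed to screen, can be sketched as follows. The model rates come from the text; the function names are hypothetical, and `additional_be_per_1000` is an input that the decision model would supply, not a value reported here.

```python
# (natural history, surveillance with treatment) rates from the two models
MODELS = {
    "MGH": {"eac": (0.085, 0.046), "eac_death": (0.057, 0.019)},
    "Erasmus/UW": {"eac": (0.068, 0.031), "eac_death": (0.049, 0.015)},
}


def relative_reduction(natural_history, surveillance):
    """Fractional reduction in events due to surveillance with treatment."""
    return (natural_history - surveillance) / natural_history


def averted_per_1000(additional_be_per_1000, natural_history, surveillance):
    """Events averted per 1000 screened, given added true BE cases per 1000."""
    return additional_be_per_1000 * (natural_history - surveillance)


def number_needed_to_screen(events_averted_per_1000):
    """Invert events averted per 1000 screened to screens per event averted."""
    return 1000.0 / events_averted_per_1000


for name, outcomes in MODELS.items():
    for outcome, (nh, sv) in outcomes.items():
        print(name, outcome, f"{100 * relative_reduction(nh, sv):.1f}%")
```

The loop prints 45.9% and 66.7% for MGH and 54.4% and 69.4% for Erasmus/UW, matching the reductions stated in the text.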
Costs
Costs of WATS3D and FB were based on Medicare reimbursement rates. The cost of WATS3D was based on 4 Current Procedural Terminology (CPT) codes: 88104, 88305, 88312, and 88361. The 88361 code was multiplied by 4 as there are usually 4 immunohistochemical (IHC) stains. The 2019 Medicare reimbursement total was $780. We assumed all patients were biopsied with forceps. While this assumption is unnecessary for FB costs, which are handled the same in both strategies and cancel out, it may overestimate the cost of WATS3D plus FB if, in practice, not all patients would be biopsied. The cost for surveillance FB of false positive WATS3D included the facility charge (APC 5301/CPT 43239), physician charge (CPT 43239), and pathology charges (CPT 88305 + 88312) for each specimen jar. An unpublished analysis of Medicare claims data conducted for CDx Diagnostics by CodeMap found that on average 3.1 specimen jars were used for forceps biopsy. However, cases that were false positive WATS3D and true negative FB are likely to include many cases with a very short segment to biopsy, where fewer jars might be used. Nonetheless, we chose to be conservative and use 3.1 jars, biasing results against use of WATS3D. We assumed surveillance of these false positives would be done at 3 and 6 years, following standard intervals for surveillance of NDBE.
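The structure of these cost calculations can be sketched as below. Individual reimbursement rates are not reported in the text (only the $780 WATS3D total), so every rate is a required argument; the function names are hypothetical, and discounting each surveillance round from its year back to the time of screening is an assumption.

```python
def wats3d_cost(rate_88104, rate_88305, rate_88312, rate_88361, ihc_stains=4):
    """Sum of the four CPT components, with 88361 billed once per IHC stain."""
    return rate_88104 + rate_88305 + rate_88312 + ihc_stains * rate_88361


def fb_surveillance_round_cost(facility, physician, rate_88305, rate_88312, jars=3.1):
    """One FB surveillance endoscopy: facility, physician, pathology per jar."""
    return facility + physician + jars * (rate_88305 + rate_88312)


def false_positive_surveillance_cost(round_cost, years=(3, 6), rate=0.03):
    """Present value of the two surveillance rounds for a false positive."""
    return sum(round_cost / (1.0 + rate) ** year for year in years)
```

Lowering `jars` below 3.1 reduces the false positive penalty, which is why the 3.1-jar assumption is conservative with respect to WATS3D.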
Sensitivity Analysis
One-way sensitivity analysis was performed on all model parameters. Two-way sensitivity analysis was performed on the two key, uncertain WATS3D parameters: additional yield and false positive rate. Probabilistic sensitivity analysis was performed with each parameter modeled as a probability distribution, with 1000 trials to determine the proportion of the time each strategy was cost-effective for a range of willingness-to-pay values.
Probability and utility parameters were modeled as beta distributions, and cost parameters as gamma distributions. The base values were used for the means of these distributions, and the standard deviations were calculated by treating the low and high values of the one-way sensitivity analysis as spanning 4 standard deviations, akin to a 95% confidence interval [30]. The exception was the parameter for the additional yield of WATS3D. Its low and base values were set at one-third and one-half of the value found in a recent large study, and its high value at the study value itself [19]. Accordingly, we used a triangular distribution with these three values as parameters (minimum, most likely, and maximum).
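The distribution set-up described above can be sketched with the Python standard library. This is an illustration only, not the TreeAge implementation: it assumes method-of-moments fitting of the beta and gamma parameters from a mean and standard deviation, and it samples just two of the model's parameters. All function names are hypothetical.

```python
import random


def sd_from_range(low, high):
    """Treat the one-way sensitivity range as spanning 4 standard deviations."""
    return (high - low) / 4.0


def beta_params(mean, sd):
    """Method-of-moments (alpha, beta) for a beta distribution."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    if common <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * common, (1.0 - mean) * common


def gamma_params(mean, sd):
    """Method-of-moments (shape, scale) for a gamma distribution."""
    return (mean / sd) ** 2, sd ** 2 / mean


def draw_cost(rng, mean, sd):
    """One gamma draw for a cost parameter."""
    shape, scale = gamma_params(mean, sd)
    return rng.gammavariate(shape, scale)


def draw_trial(rng):
    """One probabilistic draw of two illustrative parameters."""
    a, b = beta_params(0.0885, sd_from_range(0.0765, 0.1005))
    return {
        "fb_positive": rng.betavariate(a, b),
        # random.triangular takes (low, high, mode)
        "additional_yield": rng.triangular(0.71, 2.13, 1.065),
    }


rng = random.Random(0)                      # fixed seed for reproducibility
trials = [draw_trial(rng) for _ in range(1000)]
```

Each of the 1000 trials would then be run through the decision model, and the share of trials in which a strategy is cost-effective tallied at each willingness-to-pay value.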