Introduction

Breast cancer is the most common cancer among patients in the USA and has a 12.8% lifetime risk [1]. When diagnosed at early stages (stages I–IIIA), 5-year survival from breast cancer is over 95% [2]. A component of high survival rates in early stage breast cancer is timely access to surgical treatment including breast conserving surgery (BCS) or mastectomy [3]. For patients with early stage breast cancer, these surgical options have equivalent long-term outcomes when BCS is followed by adjuvant radiation [4, 5]. Several large studies including one sampling over 1 million patients using the National Cancer Database (NCDB) have described shifting trends for the relative usage of mastectomy and BCS between 1998 and 2011 [6]. However, prior studies including ones using the NCDB undersample patients at risk for poor outcomes like rural residents, people of color, and the uninsured [7]. As the NCDB only includes data from patients treated at Commission on Cancer Certified facilities, which are less likely to be located in rural or low-income areas, large groups of vulnerable patients are missing from such analyses [8]. The result of this underrepresentation is reduced understanding of cancer care delivery trends in populations with barriers to high quality cancer care.

Population-based cancer registries are an important source of cancer surveillance data. Cancer registries in the USA accredited by the North American Association of Central Cancer Registries (NAACCR) collect data on the first course of treatment received for chemotherapy, radiation therapy, hormone therapy, and surgery where applicable for certain cancer types [9]. This systematic collection of treatment information represents a potentially underutilized data source in understanding initial patterns of healthcare utilization among cancer patients. Because statewide registries are typically a complete census of all individuals diagnosed with cancer in a given state, they include groups underrepresented in other types of cancer research. For instance, rural cancer patients, those who are uninsured, and cancer patients who lack continuous insurance coverage are often excluded from or underrepresented in insurance claims-based research [10]. The availability of treatment information contained in cancer registries for all cancer patients introduces a means to understand population level patterns in cancer care delivery with a health equity lens because it mitigates selection bias issues induced by data sources that rely on insurance coverage or treatment at particular accredited facilities.

A key concern limiting the value of cancer registry surgical treatment data for research has been the quality of the data across diverse populations, particularly with regard to under-ascertainment of procedures. Some prior validation studies of statewide cancer registries have focused on cancer registry data quality for radiation, chemotherapy, and hormone therapy [9, 11, 12, 13, 14]. Therefore, validity of surgical treatment in state-based registries specifically among populations who experience health inequities, like rural and Black patients, remains unclear. In a state-based cancer registry validation study among adolescents and young adults that compared North Carolina state cancer registry data to Medicaid and private insurance claims, investigators found high agreement between the information contained in the cancer registry and the claims data for radiation therapy, moderate agreement for chemotherapy, and low agreement for hormone therapy [13]. Further, some previous validation studies that did include surgical treatment have been restricted to an older Medicare insured population. A prior study assessed the validity of treatment data through Medicare claims for registries included in the Surveillance, Epidemiology, and End Results program and indicated that because of low sensitivity, treatment data from the registries for chemotherapy, radiation, and hormone therapy should not be used for population-based assessments of treatment trends [15]. Since surgery often occurs early in the course of treatment and is a discrete event, data captured in state-based registries may be superior to that for chemotherapy or hormonal therapy, but this hypothesis remains untested. The usage of multipayer insurance claims as a gold standard allows unique assessment of state-based cancer registry validity across all age groups and multiple demographic populations.

High quality state-based cancer registries capture the full population of those needing cancer treatment, regardless of insurance status. Because of their representativeness, state-based cancer registries have the potential to illuminate treatment trends in groups historically underrepresented in health services research. Using a combination of Medicaid, Medicare, and private insurance claims as a gold standard, we sought to evaluate the validity of the surgical treatment information contained in a statewide cancer registry, with a focus on understanding the validity of this information for two critical populations affected by health inequities: rural and Black patients.

Materials and methods

Study population

We used data from the University of North Carolina Cancer Information and Population Health Resource (CIPHR). CIPHR includes a data linkage between the North Carolina Central Cancer Registry (NC CCR) and insurance claims from Medicaid (2003 to 2012), Medicare (2003 to 2017), and commercial insurance claims (2003 to 2017). The details of this data linkage have been described previously [16]. Since 2008, the North Carolina Central Cancer Registry has met the highest standard of certification from the North American Association of Central Cancer Registries indicating completeness, accuracy, and timeliness of information to calculate cancer incidence statistics [13]. From a CIPHR query, we identified all patients over the age of 18 diagnosed with clinical American Joint Committee on Cancer (AJCC) stage I–III invasive breast cancer between 2003 and 2016 in the North Carolina Central Cancer Registry. We excluded patients with ductal carcinoma in situ or metastatic disease as determined by AJCC stage 0 or IV in the cancer registry due to inconsistent surgery indication in this population [17, 18]. We then restricted to patients who were continuously enrolled in at least one of the linked insurance programs for ≥ 1 year after their cancer diagnosis (Table 1). Due to the necessity for a defined time window to observe the surgery in the insurance claims data, cases were excluded if they were uninsured at diagnosis and later became insured, or if they switched insurance types during the 1-year period. Patients dually eligible for Medicaid and Medicare were included.

Table 1 Demographic characteristics of study population

Key variables and covariates

Our primary outcome of interest was receipt of breast cancer surgery. Our secondary outcome was surgery type: either BCS or mastectomy. In the North Carolina Central Cancer Registry, receipt of surgery was defined using the North American Association of Central Cancer Registries-specified data fields (Item #670) for first course of surgical treatment. Surgery codes 20–24 indicated BCS and codes 30–80 indicated mastectomy. In the insurance claims data, for all three payers, surgery was determined by examining the outpatient and inpatient files for Current Procedural Terminology (CPT) codes and International Classification of Diseases (ICD-9 and ICD-10) procedure codes. For patients who had claims for more than one breast cancer surgery within the 1-year follow-up, we classified the last procedure type a patient had within 12 months of diagnosis as their surgical treatment. The last procedure was selected a priori to reduce the miscategorization of excisional biopsies as definitive surgical treatment.
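The classification rules above can be sketched in Python. This is a minimal illustration, not the study's actual code: the function names are hypothetical, the 12-month window is approximated as 365 days starting on the diagnosis date, registry codes outside 20–80 are treated as "no definitive surgery type" (an assumption the text does not spell out), and each claim is assumed to have already been mapped from its CPT or ICD procedure code to "BCS" or "mastectomy".

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def registry_surgery_type(code: int) -> Optional[str]:
    """Map a registry surgery code (Item #670) to a surgery type.

    Codes 20-24 are BCS and codes 30-80 are mastectomy, as stated in the
    text. Returning None for any other code is an assumption.
    """
    if 20 <= code <= 24:
        return "BCS"
    if 30 <= code <= 80:
        return "mastectomy"
    return None


def claims_surgery_type(
    diagnosis: date, procedures: List[Tuple[date, str]]
) -> Optional[str]:
    """Return the type of the LAST surgery claim within 12 months of diagnosis.

    `procedures` holds (service date, surgery type) pairs. The 365-day
    window is an assumed reading of "within 12 months of diagnosis".
    """
    window_end = diagnosis + timedelta(days=365)
    in_window = [(d, t) for d, t in procedures if diagnosis <= d <= window_end]
    if not in_window:
        return None  # no surgery claim observed in the window
    # Select the most recent procedure in the window.
    return max(in_window, key=lambda pair: pair[0])[1]
```

For a patient with a BCS claim followed by a mastectomy claim within the window, `claims_surgery_type` returns "mastectomy", which mirrors the a priori choice of the last procedure as the definitive surgical treatment.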

We included several variables in stratified analyses. Insurance type was defined as Medicaid, Medicare, commercial insurance, or multiple payers including those dually enrolled in Medicaid and Medicare Fee-For-Service. These categorizations were determined based on insurance information at diagnosis as determined by the insurance claims enrollment files. For patients with multiple payers, both sets of claims were assessed for presence of surgical procedures. A surgery was counted in the category of the payer that primarily paid for the procedure. From the NC CCR we also included categories for registry abstracted race (Black and non-Black). We focused on Black patients in the stratified analysis because they are a population affected by inequities in breast cancer treatment and outcomes in the state of North Carolina [19]. Using residential address at diagnosis, we identified census tracts and categorized each tract as urban or rural using the 2010 US Department of Agriculture Rural–Urban Commuting Area (RUCA) codes. Addresses in census tracts with codes 1–3 were categorized as urban and those in tracts with codes 4–10 were categorized as rural. This dichotomization is recommended by the Federal Office of Rural Health Policy and has previously been used in CIPHR data [20, 21]. Relative frequencies of covariates were calculated for the total early stage breast cancer population in the registry, the population that had 12 months continuous enrollment in insurance claims, and the population that was unable to be linked to claims.
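The urban and rural dichotomization described above is a simple threshold rule and can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, the input is taken to be the primary (integer) RUCA code of the census tract, and any value outside 1–10 is rejected rather than assigned to a category, since the text does not say how uncoded tracts were handled.

```python
def ruca_category(primary_ruca_code: int) -> str:
    """Classify a census tract as urban or rural from its primary RUCA code.

    Codes 1-3 are urban and codes 4-10 are rural, following the
    dichotomization described in the text. Raising an error for other
    values is an assumption made for this illustration.
    """
    if 1 <= primary_ruca_code <= 3:
        return "urban"
    if 4 <= primary_ruca_code <= 10:
        return "rural"
    raise ValueError(f"RUCA code outside 1-10: {primary_ruca_code}")
```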

Statistical analyses

Our first objective was to evaluate the quality of the surgical treatment information contained in the NC CCR data. To do this we first calculated the sensitivity, positive predictive value, specificity, negative predictive value, and Kappa statistics for the receipt of any surgical treatment versus not receiving surgical treatment using the insurance claims data as the presumed gold standard. Sensitivity was defined as the probability that patients were coded as having surgery in the cancer registry given that they had an insurance claim for surgery. Positive predictive value was the probability that patients had an insurance claim for surgery given that they were coded as having surgery in the registry. Specificity was defined as the probability that patients did not have a code in the registry for surgery given that they did not have an insurance claim for surgery. The Kappa statistic is a measure of agreement between categorical variables that accounts for the potential that agreement occurred by chance [22]. Kappa statistics greater than or equal to 0.81 were considered strong agreement [23]. We also conducted several stratified analyses to determine if the measures of reliability differed among groups especially affected by cancer inequities in treatment outcomes: Black patients and rural patients. We also evaluated if measures of reliability differed across insurance type. Finally, to evaluate temporal consistency of the data, we examined the change in measures over the study period.
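The measures defined above can all be computed from a 2 × 2 cross-tabulation of registry against claims. The following Python sketch is illustrative only and is not the study's analysis code: the function and argument names are hypothetical, the negative predictive value uses the standard definition (no claim for surgery given no surgery code in the registry), which the text does not state explicitly, and the Kappa statistic is the usual Cohen formulation for two raters. Confidence intervals are omitted.

```python
from typing import Dict


def validity_measures(
    both: int, registry_only: int, claims_only: int, neither: int
) -> Dict[str, float]:
    """Compute validity measures with claims as the presumed gold standard.

    both          -- surgery in registry AND in claims (true positives)
    registry_only -- surgery in registry, no claim (false positives)
    claims_only   -- claim for surgery, none in registry (false negatives)
    neither       -- no surgery in either source (true negatives)
    """
    n = both + registry_only + claims_only + neither
    sensitivity = both / (both + claims_only)
    specificity = neither / (neither + registry_only)
    ppv = both / (both + registry_only)
    npv = neither / (neither + claims_only)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (both + neither) / n
    registry_yes = both + registry_only
    claims_yes = both + claims_only
    expected = (
        registry_yes * claims_yes + (n - registry_yes) * (n - claims_yes)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }
```

Because Kappa corrects for chance agreement, it can be low even when sensitivity and positive predictive value are high, for example when nearly all patients have surgery recorded in both sources and the few discordant records fall mostly in one off-diagonal cell.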

In secondary analyses, we evaluated the sensitivity, specificity, positive predictive value, and negative predictive value of type of surgery received, either mastectomy or BCS. These analyses were restricted to those who were coded as having received surgery in both the NC CCR and claims. This restriction was necessary because calculating agreement between the registry and claims data requires information on surgery type from both sources. For both analyses on overall receipt of surgery and surgery type, we calculated the Kappa coefficient to evaluate the amount of agreement between the claims data and the North Carolina Central Cancer Registry. In these analyses, sensitivity was defined as the probability that a patient was coded as receiving BCS in the registry given that she had an insurance claim for BCS. Specificity was defined as the probability a patient was coded as having received a mastectomy in the cancer registry given that she had an insurance claim for mastectomy. Positive predictive value was the probability a patient had a claim for BCS given that she was coded as having BCS in the registry. Negative predictive value was the probability a patient had a claim for mastectomy given that she was coded as having mastectomy in the registry.

We also conducted a sensitivity analysis to examine the robustness of our results to the choice of surgical procedure from the claims data. Instead of assigning the patient the last surgical procedure that occurred within a year of diagnosis, we assigned the first surgical procedure within 12 months captured in the insurance claims among patients who had multiple surgical procedures recorded during that time.
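The comparison between the two selection rules can be sketched as follows. This is a hypothetical illustration with made-up records, not study data: each patient is represented by the registry surgery type and a non-empty, chronologically ordered list of claim surgery types already restricted to the 12-month window, and the function name is an assumption.

```python
from typing import List, Tuple

# (registry surgery type, chronologically ordered claim surgery types)
Patient = Tuple[str, List[str]]


def discordance(patients: List[Patient], rule: str) -> float:
    """Proportion of patients whose claims surgery type differs from the registry.

    rule -- "first" uses the earliest claim procedure, "last" the latest.
    """
    index = {"first": 0, "last": -1}[rule]
    mismatches = sum(
        1 for registry_type, claims in patients if claims[index] != registry_type
    )
    return mismatches / len(patients)


# Made-up example: one patient had BCS followed by a completion mastectomy.
example = [
    ("mastectomy", ["BCS", "mastectomy"]),
    ("BCS", ["BCS"]),
    ("BCS", ["BCS", "BCS"]),
    ("mastectomy", ["mastectomy"]),
]
first_rule = discordance(example, "first")
last_rule = discordance(example, "last")
```

In this toy example the patient whose initial BCS was followed by a mastectomy is discordant under the first-procedure rule and concordant under the last-procedure rule, which is the kind of case the a priori choice of the last procedure was meant to handle.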

Results

In total, 63,361 patients diagnosed with stage I–III breast cancer were captured in the cancer registry. Of these patients, 35,626 did not link to claims, and 916 did not have one year of continuous enrollment following diagnosis. This resulted in a study population of 26,819 patients who had 12-month continuous enrollment and linkage to insurance claims. The study population was mostly Medicare insured (65%), followed by commercially insured (23%) and Medicaid insured (4%) (Table 1). Compared to the overall breast cancer population in NC, the population that linked to claims had a comparable proportion of Black and rural patients but had an older age distribution and an overrepresentation of Medicare enrollees (Table 1). Approximately 9% (n = 2,387) of patients in the study had multiple insurance types at diagnosis; of those with multiple insurance types, 1,923 (81%) were dually enrolled in Medicaid and Medicare Fee-For-Service.

A total of 2,090 (9%) patients had codes for multiple procedures for different types of surgery in the insurance claims data. In the overall sample in the cancer registry, 19% were Black patients, 19% lived in rural areas, and 44% were over the age of 65. In the claims-linked sample, 17% were Black patients, 22% lived in rural areas, and 68% were 65 or older.

Of the included 26,819 patients, 23,125 had both a claim for surgery and a surgery listed in the cancer registry. The overall sensitivity was 97.9% (95% CI 97.8%, 98.1%) and positive predictive value was 93.2% (95% CI 93.0%, 93.4%). This high sensitivity and positive predictive value in identifying patients who had definitive surgery was robust in key populations including patients in rural areas, Black patients, and Medicare insured patients (Table 2).

Table 2 Comparison of receipt of breast cancer surgery in the registry and in the insurance claims stratified by key characteristics

When examining the receipt of surgery by insurance type, sensitivity was lower for Medicaid insured patients at 93.8% (95% CI 91.6%, 95.5%). This contrasts with sensitivity of 97.5% (95% CI 97.1%, 97.9%) for commercially insured patients and 98.3% (95% CI 98.1%, 98.5%) for Medicare insured patients. Positive predictive value for Medicaid insured patients was markedly lower at 70.5% (95% CI 69.2%, 71.8%) compared to 92.7% (95% CI 92.4%, 92.9%) for commercially insured patients and 94.5% (95% CI 94.2%, 94.7%) for Medicare insured patients. Of note, for commercial insurance in particular, the registry recorded procedures for 428 patients for whom there was no corresponding commercial claim, yielding a specificity of 14.6% (95% CI 11.6%, 18.0%).

Sensitivity in identifying patients who had surgery increased marginally over time from 2003 to 2016 (Table 2). In total, 1,526 patients (5.8%) had neither a claim nor a flag in the cancer registry for surgery. Overall, 1,857 patients had discordance between surgery recorded in the registry and surgery identified in claims, which resulted in an overall specificity of 47.6% (95% CI 45.8%, 49.3%). In cases where there was discordance, the registry was more likely to report the presence of the procedure when there was no corresponding insurance claim.

In analyses examining the type of surgery received among those who had data in both the registry and the claims, there was high agreement between the surgery type recorded in the cancer registry and the type in the insurance claims (Table 3). The probability that a woman had a code for mastectomy in the registry given that she had an insurance claim for mastectomy was 96% or greater for rural patients, Black patients, and Medicare insured patients. The probability that a woman had a code for BCS in the registry given that she had an insurance claim for BCS was 95% for all three groups. Kappa statistics over 0.9 for all key groups indicated very high agreement. The Kappa value for Medicaid insured patients remained high overall at 0.82 (95% CI 0.77, 0.6) but was notably lower than the Kappa values for Medicare (Kappa: 0.91, 95% CI 0.91, 0.92) and privately insured patients (Kappa: 0.91, 95% CI 0.91, 0.93). Over the 2003 to 2016 study period, sensitivity, specificity, and Kappa values increased.

Table 3 Comparison of type of surgery received among patients with surgery data in both registry and insurance claims stratified by key characteristics

In sensitivity analyses examining the match between the cancer registry and claims data for the type of surgery received there were notable differences by whether the first or last occurring procedure in the insurance claims was used. In contrast to our a priori use of the last recorded surgery in the analysis above, when examining the first procedure listed in the claims data, 10.2% of patients had discordance between the registry and claims. When examining the last procedure received, only 4.3% of patients were discordant between their surgery type recorded in the registry and what was recorded in the insurance claims. The Kappa statistic for the last procedure was 0.91 (95% CI 0.90, 0.92) compared to a Kappa statistic of 0.77 (95% CI 0.76, 0.79) when the first procedure was used.

Discussion

In this cohort of patients with stage I–III breast cancer in North Carolina, we evaluated the validity of surgical treatment data using insurance claims as the presumed gold standard. Counter to our expectation, there was evidence of variation in the quality of claims data for identifying surgical treatment for breast cancer. While concordance between the registry and claims was high for Medicare and private insurance, there was evidence that Medicaid claims may undercount surgeries relative to cancer registry data. We found ≥ 90% sensitivity and PPV overall for the receipt of surgery; however, the PPV was considerably lower for Medicaid insured patients at 71%. When evaluating validity of type of surgery received, among those who had data in both the registry and the claims, sensitivity and PPV were ≥ 95%. Our analyses suggested that in this state-based cancer registry, the detection of surgical treatment and differentiation between surgery types was reliable, particularly for sensitivity and positive predictive value, and could be used in studies evaluating patterns in surgical treatment initiation.

While the cancer registry was overall able to accurately report the presence of surgery that also occurred in the claims, there was far more variation regarding the absence of surgery. There were more instances of the registry reporting a surgery for which there was no corresponding claim than instances of an insurance claim for surgery that did not appear in the registry. This is contrary to our original hypothesis that, in cases of discordance, the claims data would be more likely to report a procedure that did not appear in the registry. This finding resulted in low specificity, NPV, and Kappa statistics for the analyses on the presence or absence of surgery.

This investigation reported higher sensitivity of cancer registry data for identifying surgery than prior studies have reported for other components of cancer treatment including chemotherapy, radiation, and hormone therapy [9, 13, 15]. For example, in a validation study using SEER-Medicare, the sensitivity for identifying individuals who had received chemotherapy was 88% [24]. Prior studies examining the validity of surgical treatment of breast cancer have primarily focused on SEER registries using Medicare insurance claims as a gold standard. In our analysis, the North Carolina cancer registry’s ability to distinguish between types of surgery marginally exceeded the ability to identify different types of breast cancer surgical treatment previously described in a SEER-Medicare linkage in the early 1990s, which reported 95% overall agreement between SEER registries and Medicare for patients who received a mastectomy and 91% agreement for patients who had BCS [25]. The high Kappa statistics for BCS and mastectomy were comparable to those seen in a similar validation study conducted using a Canadian cancer registry and administrative claims [26]. We additionally found low specificity for distinguishing patients who did not have definitive surgery. This finding is also in line with a prior SEER-Medicare investigation that described low agreement and a moderate amount of discordance between the SEER registries and the Medicare claims for patients who did not receive surgery [27]. A unique strength of this investigation was the ability to use multipayer claims as a gold standard to include a wider range of insurance types and ages for validation than prior studies.

Of note, both for distinguishing between patients who had surgery and those who did not, and for identifying type of surgery received among those who had data in both the registry and claims, Kappa statistics and PPV were lower for patients insured by Medicaid than for those insured by other payers. This discrepancy could be due to incomplete capture of information for patients dually enrolled in Medicare Advantage, since data used in this investigation included claims only for fee-for-service Medicare. Other possible explanations for this finding include receipt of surgery by these patients at institutions that were not aware of the patient’s Medicaid eligibility and did not attempt to file a claim, choice of an institution not to file a claim for other financial reasons, or misclassification by registry staff of diagnostic procedures, such as excisional biopsies, as surgeries. Overall, our findings suggest that users of Medicaid claims data may need to proceed with some caution when ascertaining patterns of surgical care among Medicaid patients. Additionally, for commercially insured patients there was a sizeable subset of patients for whom the registry had reported a surgery for which there was no corresponding claim.

This study has several notable strengths. This investigation used multiple insurance payers and included individuals dually enrolled in Medicaid and Medicare Fee-For-Service to validate the surgical information contained in a state-based cancer registry for breast cancer patients. The sizeable number of patients of all ages included in this study allowed for stratified analyses of key populations especially affected by cancer inequities, namely Black patients and rural-residing patients. This study also has several limitations. Although we used insurance claims data as the gold standard in our analyses, there were instances of the cancer registry reporting the presence of procedures that were not identified in claims, potentially indicating an imperfect gold standard. Further, the insurance claims may misclassify types of surgeries as these data were designed for billing and not research purposes. Additionally, we were only able to evaluate validity among patients with linkage to insurance claims; thus, we could not evaluate the validity of treatment information for uninsured patients, so the generalizability of these results cannot be confirmed. In the registry, 3% of patients were uninsured and 8% had unknown or missing insurance status, and registry surgical data quality in this population remains unknown. Related to limited inferences for Medicaid and uninsured populations, the exclusion criteria for this study excluded patients who may have been uninsured at diagnosis and later became Medicaid insured at time of surgical treatment. Further, we were unable to distinguish between unilateral and bilateral surgical procedures in this validation analysis.

Overall, the robustness of the findings for both identifying patients who received surgical treatment and for the type of treatment received has implications for health equity research in cancer care delivery. State-based cancer registries have uniform reporting standards that are independent of a patient’s insurance type or treating facility. Because of this, though we were unable to directly assess the validity in the uninsured, investigations that use treatment data from state-based cancer registries are not limited by the selection bias that exists in investigations that are contingent upon insurance enrollment or treatment at certain accredited facilities [28]. This is especially consequential because studies of healthcare utilization using insurance claims data often require a certain amount of continuous enrollment to observe the treatment, thus inherently excluding individuals with transient insurance coverage. The ability to study cancer treatment trends, particularly in populations underrepresented in cancer research, is especially urgent following the widespread loss of employer sponsored health insurance due to COVID-19 [29]. The validity of the surgical information observed in this study suggests that state-based cancer registries can be an important source of data for understanding population level patterns in cancer care delivery in populations underrepresented in cancer research.