SEER patient cohort selection
Retrospective patient-level data from the SEER 18 registries database (with additional treatment fields on radiation therapy) were used for multi-state modeling of disease progression. Eligible cases were women with histologically confirmed grade I, II, or III DCIS as first primary, diagnosed between 1992 and 2016, aged ≥ 40 years at diagnosis, and with known laterality, local treatment status (surgery and radiotherapy), survival time, and cause of death. Cases were excluded if any of the following applied: iIBC ≤ 2 months following DCIS, as this might signify upstaging of the DCIS lesion to invasive carcinoma; death from any cause ≤ 6 months following DCIS diagnosis; synchronous diagnosis of contralateral invasive carcinoma (cIBC); Paget’s disease; treatment with postmastectomy radiation therapy; or no treatment received because of comorbidities or refusal (as coded in SEER). Figure 1 shows the numbers of cases excluded.
Capturing local invasive recurrences in SEER
To understand the impact of the 2007 changes in SEER coding rules, which may have led to under-reporting of subsequent iIBC following DCIS, we calculated the annual iIBC incidence density rate for the 5 years before and after 2007. For each annual period, the rate is the number of iIBC events in that period divided by the total person-time contributed by the at-risk population during the same period. Rates are presented for the full cohort (all risk groups) and by treatment group to account for changing treatment patterns.
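The incidence density calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input layout (calendar year, person-years at risk, event count), the per-1000-person-years scaling, and the example numbers are all assumptions made for the sketch and are not SEER data.

```python
from collections import defaultdict


def annual_incidence_density(records, scale=1000.0):
    """Return {year: iIBC events per `scale` person-years}.

    `records` is an iterable of (calendar_year, person_years, events)
    tuples; several tuples may refer to the same year (hypothetical layout).
    """
    events = defaultdict(int)
    person_years = defaultdict(float)
    for year, py, ev in records:
        events[year] += ev
        person_years[year] += py
    # Rate = events / person-time at risk, computed per calendar year.
    return {
        year: scale * events[year] / person_years[year]
        for year in sorted(events)
        if person_years[year] > 0
    }


# Hypothetical illustrative numbers only.
example = [(2006, 1500.0, 12), (2006, 500.0, 3), (2008, 2500.0, 10)]
rates = annual_incidence_density(example)
```

Running the same function on records restricted to one treatment group would give the treatment-specific rates mentioned in the text.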
Model building and statistical analysis
The multi-state model structure includes six mutually exclusive states and the seven transitions between them (Fig. 2). The effects of baseline patient, disease, and treatment characteristics on each transition were assessed using multivariable Cox proportional hazards regression models. The selected covariates included age at diagnosis (40–49, 50–69, 70–74, 75–79, ≥ 80 years), diagnosis year (1992–2016), race (Hispanic and non-Hispanic white, Hispanic and non-Hispanic black, other [Asian, Native American, Pacific Islander]), grade (I, II, III), lesion size (< 2 cm, ≥ 2 cm), estrogen receptor (ER) status, and local treatment strategy (no local treatment, breast-conserving surgery [BCS] only, BCS followed by radiotherapy [RT], mastectomy). Data were complete for all variables (age, diagnosis year, treatment strategy) except ER status, lesion size, and race. Missing observations were imputed with the substantive model compatible fully conditional specification method, using the completely observed covariates (age, diagnosis year, treatment strategy) and the outcome (time, event). This method allows greater flexibility for non-linear models such as the Cox model, in that partially observed covariates are imputed in a manner compatible with non-linear covariate effects. The R package smcfcs version 1.4.0 was used.
To address possible confounding by indication, i.e., systematic differences between patients undergoing different treatment strategies, propensity scores (PS) were calculated for each individual. The propensity score is an individual’s probability of receiving a given treatment conditional on their pre-treatment characteristics (age, diagnosis year, grade, race, lesion size, and ER status). Because four treatment strategies were compared, generalized boosted regression models were used to compute PS weights that balance the distribution of these characteristics between treatment and comparison groups. The mean standardized effect size and the Kolmogorov-Smirnov statistic were used to choose the number of iterations that gave the best balance. Average treatment effect (ATE) analysis was conducted to determine the relative effectiveness of no intervention, BCS, BCS+RT, and mastectomy on average in the population. For each transition-specific Cox proportional hazards model, individuals were weighted by the inverse of the probability of receiving the treatment they actually received. Doubly robust estimation controlled for any covariates with residual imbalance. PS analysis was conducted using the R package twang version 1.5.
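The inverse probability weighting step can be illustrated with a short sketch. It assumes the per-treatment probabilities have already been estimated (in the study, by generalized boosted models); the function name, the data layout, and the probabilities below are hypothetical and serve only to show how an ATE weight follows from a propensity score.

```python
def ate_weights(received, propensities):
    """Inverse-probability-of-treatment weights for an ATE analysis.

    `received` lists each individual's actual treatment; `propensities`
    lists, per individual, a dict mapping every treatment strategy to its
    estimated probability (assumed to be supplied by a fitted PS model).
    """
    weights = []
    for treatment, ps in zip(received, propensities):
        p = ps[treatment]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"propensity must lie in (0, 1], got {p}")
        # Weight = 1 / P(receiving the treatment actually received).
        weights.append(1.0 / p)
    return weights


# Hypothetical probabilities for one covariate profile (sum to 1).
ps = {"none": 0.05, "BCS": 0.25, "BCS+RT": 0.5, "mastectomy": 0.2}
weights = ate_weights(["BCS", "BCS+RT", "mastectomy", "none"], [ps] * 4)
```

Individuals who received a treatment that was unlikely for their profile (here, no local treatment) get the largest weights, which is why balance diagnostics and doubly robust adjustment are typically paired with this approach.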
To address violation of the proportional hazards assumption for some predictors in the Cox model for the transition from DCIS diagnosis to iIBC, and to address the Markov assumption, time to iIBC was split at 5 years post-DCIS. Therefore, the following multi-state transitions were modeled: T1. DCIS diagnosis → iIBC ≤ 5 years following diagnosis; T2. DCIS diagnosis → iIBC > 5 years following DCIS diagnosis; T3. DCIS diagnosis → cIBC; T4. DCIS diagnosis → death; T5. iIBC ≤ 5 years following diagnosis → death; T6. iIBC > 5 years following diagnosis → death; T7. cIBC → death. Intermediate lesions, such as a subsequent diagnosis of DCIS during follow-up after the initial DCIS, were not considered in the model.
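One way to picture the 5-year split of time to iIBC is as a counting-process layout in which each woman contributes a row to T1 and, if still at risk at 5 years, a second row to T2. The sketch below is an assumption about how such a split could be encoded, not a description of the authors' data preparation; the function name and the row format (transition, start, stop, status) are hypothetical.

```python
SPLIT_YEARS = 5.0  # cut point taken from the text


def split_iibc_follow_up(time_years, iibc_event, split=SPLIT_YEARS):
    """Split DCIS -> iIBC follow-up into T1 (<= 5 y) and T2 (> 5 y) rows.

    Returns a list of (transition, start, stop, status) tuples, where
    status is 1 if iIBC occurred at `stop` and 0 if censored.
    """
    status = int(bool(iibc_event))
    if time_years <= split:
        # Event or censoring within the first 5 years: only T1 is at risk.
        return [("T1", 0.0, time_years, status)]
    # Still event-free at 5 years: censored for T1, then enters risk for T2.
    return [
        ("T1", 0.0, split, 0),
        ("T2", split, time_years, status),
    ]


early = split_iibc_follow_up(3.2, True)
late = split_iibc_follow_up(8.0, True)
censored = split_iibc_follow_up(8.0, False)
```

Under this layout, covariate effects are estimated separately for the two periods, which is one common way to relax proportional hazards for a single transition.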
Conditional transition probabilities were computed for each treatment strategy cohort (except mastectomy) and for the sub-cohort of patients with low-risk features (Hispanic and non-Hispanic white women aged 50–69 at diagnosis with ER+, grade I or II, ≤ 2 cm DCIS lesions). Cox models stratified by transition were fitted to obtain cumulative transition hazards, which were then transformed into conditional transition probabilities using the Aalen-Johansen estimator. State occupation probabilities at different time points following DCIS diagnosis could be derived from these values. Data preparation and multi-state modeling were done using the R package mstate version 0.2.11.
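The Aalen-Johansen step, which turns cumulative transition hazards into transition probabilities, is a product over event times of the matrices I + dA(t), where dA(t) holds the hazard increments at time t. The sketch below shows this on a simplified three-state model (DCIS, iIBC, death) rather than the full model of Fig. 2; the state coding and hazard increments are hypothetical, and in practice the increments would come from the fitted Cox models.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def aalen_johansen(n_states, hazard_increments):
    """Product-integral of (I + dA) over ordered event times.

    `hazard_increments` is a list with one dict per event time, mapping
    (from_state, to_state) to the increment of the cumulative hazard.
    Returns the matrix P where P[g][h] = P(in state h at end | state g at 0).
    """
    prob = identity(n_states)
    for increments in hazard_increments:
        step = identity(n_states)
        for (src, dst), d_hazard in increments.items():
            step[src][dst] += d_hazard   # probability mass moving out of src
            step[src][src] -= d_hazard   # each row of I + dA sums to 1
        prob = matmul(prob, step)
    return prob


# Simplified, hypothetical three-state example.
DCIS, IIBC, DEATH = 0, 1, 2
increments = [{(DCIS, IIBC): 0.1}, {(DCIS, DEATH): 0.2, (IIBC, DEATH): 0.5}]
P = aalen_johansen(3, increments)
occupation_from_dcis = P[DCIS]  # approximately [0.72, 0.05, 0.23]
```

Because everyone starts in the DCIS state, the first row of the matrix gives the state occupation probabilities at the end of the interval.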
PS-matched groups were also created for comparison when calculating the transition probabilities derived from the multi-state models. The n = 338 individuals in the low-risk non-intervention group were matched 1:2 to individuals in each of the low-risk treatment groups using the “nearest neighbour” method in the R package MatchIt version 3.0.2. Exact matching was specified on year of diagnosis, age at diagnosis, and grade. Differences in iIBC at 5 years between low-risk PS-matched treatment groups were also evaluated using hazard ratios with 95% CIs derived from Cox proportional hazards models.
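The combination of exact matching with 1:2 nearest-neighbour matching on the propensity score can be sketched as below. This is a simplified greedy procedure without replacement, written for illustration; it is not claimed to reproduce the matching package's internals, and the field names (`id`, `ps`, `year`, `age`, `grade`) and example records are hypothetical.

```python
def match_nearest(focal, pool, ratio=2, exact_keys=("year", "age", "grade")):
    """Greedy nearest-neighbour PS matching within exact-match strata.

    `focal` (e.g. the non-intervention group) and `pool` (one treatment
    group) are lists of dicts with an 'id', a propensity score 'ps', and
    the exact-match fields. Pool members are used at most once.
    """
    available = list(pool)
    matches = {}
    for unit in focal:
        # Candidates must agree exactly on every exact-match field.
        candidates = [c for c in available
                      if all(c[k] == unit[k] for k in exact_keys)]
        # Rank candidates by distance on the propensity score.
        candidates.sort(key=lambda c: abs(c["ps"] - unit["ps"]))
        chosen = candidates[:ratio]
        matches[unit["id"]] = [c["id"] for c in chosen]
        used = {c["id"] for c in chosen}
        available = [c for c in available if c["id"] not in used]
    return matches


# Hypothetical records for illustration.
focal = [{"id": "n1", "ps": 0.30, "year": 2005, "age": 60, "grade": 1}]
pool = [
    {"id": "a", "ps": 0.28, "year": 2005, "age": 60, "grade": 1},
    {"id": "b", "ps": 0.35, "year": 2005, "age": 60, "grade": 1},
    {"id": "c", "ps": 0.31, "year": 2005, "age": 60, "grade": 2},
    {"id": "d", "ps": 0.50, "year": 2005, "age": 60, "grade": 1},
]
matched = match_nearest(focal, pool)
```

In this example, unit `c` has the closest propensity score after `a` but is excluded because its grade differs, which shows how exact matching constrains the nearest-neighbour search.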
All statistical analyses were performed with R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).