We included all women aged 15–99 years at diagnosis who were diagnosed with malignant, invasive breast cancer during 2000–2007 and registered by the Northern and Yorkshire Cancer Registry Information Service (NYCRIS), a population-based cancer registry covering 12% of the English population. Patients were followed up until 31 December 2007. Ascertainment of vital status was considered complete for all patients.
Each patient was allocated a socio-economic deprivation score according to her area (Lower Super Output Area) of residence at the time of diagnosis, using the English Indices of Multiple Deprivation (IMD) 2001 (income domain). These scores were categorised according to the quintiles of their national distribution.
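The categorisation step can be sketched as follows. This is only an illustration: the function name and the four cut points are invented (the real national quintile boundaries are not given here), and the sketch assumes that a higher income-domain score means greater deprivation.

```python
from bisect import bisect_left

# Hypothetical national quintile boundaries for the income-domain score.
# These four numbers are invented for illustration only.
EXAMPLE_NATIONAL_CUTPOINTS = [0.05, 0.09, 0.14, 0.23]


def deprivation_quintile(score, cutpoints=EXAMPLE_NATIONAL_CUTPOINTS):
    """Map an area-level score to a group from 1 to 5.

    Assumes higher scores mean more deprivation, so group 1 is the
    least deprived and group 5 the most deprived. A score equal to a
    cut point falls in the lower group.
    """
    return bisect_left(cutpoints, score) + 1


if __name__ == "__main__":
    for s in (0.03, 0.10, 0.50):
        print(s, deprivation_quintile(s))
```

Using fixed national cut points, rather than quintiles computed within the study population, means the five groups need not be equally sized in the registry data.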
Each patient was allocated one of the four broad tumor TNM stages using a restrictive approach.
Information on surgical treatment was retrieved from a routinely collected national hospital dataset (Hospital Episode Statistics, HES). We retained surgical treatments recorded from 1 month before to 6 months after the cancer diagnosis. The treatment (OPCS-4) codes were categorized based on recommendations made by the Site-Specific Clinical Reference Group (SSCRG) for breast cancer (Appendix 1). These categories were then dichotomized into ‘major treatment’ (axillary dissection or other axillary nodal procedures, breast conserving surgery, mastectomy, and plastic surgery) and ‘minor or no surgery’ (other surgical procedures and none).
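A minimal sketch of the time window and the dichotomization is given below. The category labels, the function name, and the conversion of "1 month" and "6 months" into 30 and 183 days are assumptions made for illustration; the actual mapping from OPCS-4 codes to categories is the one in Appendix 1 and is not reproduced here.

```python
from datetime import date, timedelta

# Hypothetical labels for the categories grouped as 'major treatment'.
MAJOR_CATEGORIES = {
    "axillary_dissection",
    "other_axillary_nodal_procedure",
    "breast_conserving_surgery",
    "mastectomy",
    "plastic_surgery",
}


def classify_treatment(diagnosis_date, procedures, days_before=30, days_after=183):
    """Return 'major' or 'minor_or_none' for one patient.

    `procedures` is a list of (procedure_date, category) pairs.
    Only procedures inside the window around diagnosis are retained;
    the window lengths in days are an assumed reading of
    '1 month before' and '6 months after'.
    """
    start = diagnosis_date - timedelta(days=days_before)
    end = diagnosis_date + timedelta(days=days_after)
    for proc_date, category in procedures:
        if start <= proc_date <= end and category in MAJOR_CATEGORIES:
            return "major"
    return "minor_or_none"


if __name__ == "__main__":
    dx = date(2005, 3, 1)
    print(classify_treatment(dx, [(date(2005, 4, 10), "mastectomy")]))
```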
We estimated net survival from breast cancer, for each deprivation group and by stage, using the Pohar-Perme estimator implemented in the Stata package stns.
The assumed causal relationships between variables are represented by a Directed Acyclic Graph (DAG) (Fig. 1, Appendix 2). Our main exposure of interest, the patient’s deprivation level, is assumed to causally influence the age at which a woman was diagnosed with breast cancer, her comorbidity, the thoroughness of the disease investigation, stage at diagnosis, the treatment received, and survival status after the cancer diagnosis. Year and region of diagnosis were considered as baseline confounders. Unmeasured factors such as the quality of investigation and comorbidity were also incorporated in the DAG (shown in grey). The omission of variables and arrows also represents our causal assumptions, e.g. we assume that the quality of the investigation does not affect survival except through its effect on stage at diagnosis.
We examined what proportions of the deprivation gap in survival were explained separately by tumor stage and treatment. Because of our data structure (in particular, the existence of important mediator-outcome confounders affected by exposure, the likely presence of many interactions and the fact that our outcome is binary) we focused on the decomposition of the total causal effect (TCE) into what have recently been termed randomized interventional analogues of natural direct and indirect effects, henceforth RIANDE and RIANIE [21–23].
The RIANDE and RIANIE can be estimated with an extension of Robins’ g-computation formula, implemented using Monte Carlo simulation in the Stata command gformula. We chose this method because its flexible modelling allows interactions and other non-linearities. Although flexible in terms of parametric modelling assumptions, this method relies on the assumption of no unaccounted confounding of the exposure–mediator, mediator–outcome or exposure–outcome relationships.
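The logic of Monte Carlo g-computation for these effects can be illustrated with a toy Python sketch. It is not the gformula implementation and does not use the study data: every model and coefficient below is invented, the exposure is reduced to a binary indicator, and the models are taken as known rather than fitted. The sketch draws the mediator (major treatment) from its distribution under one exposure level, independently of the simulated intermediate confounder (late stage), which is what makes the effects "randomized interventional" analogues.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def draw(p, rng):
    """Bernoulli draw with success probability p."""
    return 1 if rng.random() < p else 0


# Invented models: x = deprived (0/1), c = baseline confounder,
# l = late stage (confounder affected by exposure), m = major treatment.
def p_late_stage(x, c):
    return expit(-1.0 + 0.8 * x + 0.3 * c)


def p_major_treatment(x, l, c):
    return expit(1.5 - 0.5 * x - 1.0 * l - 0.2 * c)


def p_death(x, l, m, c):
    return expit(-2.0 + 0.4 * x + 1.5 * l - 1.0 * m + 0.2 * c)


def mean_risk(x, x_star, n, rng):
    """Average risk of death with exposure set to x and the mediator
    drawn from its distribution under exposure x_star, given c."""
    total = 0.0
    for _ in range(n):
        c = draw(0.5, rng)
        l = draw(p_late_stage(x, c), rng)            # stage under x
        l_star = draw(p_late_stage(x_star, c), rng)  # separate draw under x_star
        m = draw(p_major_treatment(x_star, l_star, c), rng)
        total += p_death(x, l, m, c)                 # average the predicted risk
    return total / n


def interventional_effects(n=20000, seed=1):
    rng = random.Random(seed)
    q00 = mean_risk(0, 0, n, rng)
    q10 = mean_risk(1, 0, n, rng)
    q11 = mean_risk(1, 1, n, rng)
    riande = logit(q10) - logit(q00)  # exposure changes, mediator distribution fixed
    rianie = logit(q11) - logit(q10)  # mediator distribution changes, exposure fixed
    # The sum of the two components is what the text refers to as the TCE.
    return {"RIANDE": riande, "RIANIE": rianie, "TCE": riande + rianie}


if __name__ == "__main__":
    print(interventional_effects())
```

In a real analysis the three models would be fitted to the data, could include interactions and splines, and uncertainty would typically be obtained by bootstrapping the whole procedure.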
We conducted three analyses to investigate the mediating roles of stage and treatment (Appendix 3, Appendix 4). In the first, we estimated the proportion of the effect of deprivation on survival that was mediated by differences in stage at diagnosis, i.e. the ratio of the effect of deprivation on the log odds of death mediated by stage (the RIANIE) to the total effect of deprivation on the log odds of death (the total causal effect, TCE, which is the sum of the RIANIE and the RIANDE, the effect not mediated by stage). In the second analysis, we estimated the proportion of the effect of deprivation on the log odds of death that was mediated by differences in treatment. Stage at diagnosis was here treated as a confounder of the relationship between treatment and survival that is itself affected by deprivation. Such a confounder is handled using an extension of the g-computation formula [24, 25]. In the third analysis, we estimated the proportion of the effect of deprivation on treatment that was mediated by differences in stage.
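The decomposition and the proportion mediated described above can be written compactly. The symbol PM is introduced here for convenience and is not taken from the original text; all effects are on the log odds scale.

```latex
\[
\mathrm{TCE} = \mathrm{RIANDE} + \mathrm{RIANIE},
\qquad
\mathrm{PM} = \frac{\mathrm{RIANIE}}{\mathrm{TCE}}
            = \frac{\mathrm{RIANIE}}{\mathrm{RIANDE} + \mathrm{RIANIE}}
\]
```

The ratio is only readily interpretable as a proportion when the RIANDE and RIANIE have the same sign; otherwise PM can fall outside the interval from 0 to 1.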
Because the deprivation gap in survival varies by time since diagnosis, the binary survival outcome (dead vs. alive) was stratified according to time since diagnosis: at 6 months, at 1 year given (conditioning on) 6-month survival, at 3 years given 1-year survival, and at 5 years given 3-year survival. The analyses were performed separately on each of these four binary survival outcomes, in order to disentangle the early from the late mediating effects of stage and treatment on the deprivation gap in survival.
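The construction of the four conditional outcomes can be sketched as below. The function name and the treatment of censoring are assumptions: the text does not state how patients whose follow-up ended inside a window were handled, so this sketch simply marks them as not contributing to that window.

```python
# Windows in years since diagnosis: (start, end). A patient contributes
# to a window only if she survived beyond its start.
WINDOWS = [(0.0, 0.5), (0.5, 1.0), (1.0, 3.0), (3.0, 5.0)]


def conditional_outcomes(time_years, died):
    """Return one entry per window: 1 = died in the window,
    0 = alive at the end of the window, None = does not contribute.

    `time_years` is the follow-up time; `died` is True if follow-up
    ended with death and False if it ended with censoring.
    """
    outcomes = []
    for start, end in WINDOWS:
        if time_years <= start:
            outcomes.append(None)   # died or censored before this window
        elif time_years > end:
            outcomes.append(0)      # known to be alive at the end
        elif died:
            outcomes.append(1)      # death inside the window
        else:
            outcomes.append(None)   # censored inside the window (assumed excluded)
    return outcomes


if __name__ == "__main__":
    print(conditional_outcomes(2.0, True))
```

Each non-missing column of outcomes then defines the analysis population and binary outcome for one of the four separate mediation analyses.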
We used multinomial regression to model stage at diagnosis (four categories) and logistic regression to model treatment and survival status. Age at diagnosis was modelled using restricted cubic splines.
Single stochastic imputation within the g-computation procedures was used to handle missing stage (8%). All variables in the models (including vital status), the exact length of follow-up and the detailed treatment categories were included in the imputation model.