Causal inference with recurrent and competing events

Many research questions concern treatment effects on outcomes that can recur several times in the same individual. For example, medical researchers are interested in treatment effects on hospitalizations in heart failure patients and sports injuries in athletes. Competing events, such as death, complicate causal inference in studies of recurrent events because once a competing event occurs, an individual cannot have more recurrent events. Several statistical estimands have been studied in recurrent event settings, with and without competing events. However, the causal interpretations of these estimands, and the conditions that are required to identify these estimands from observed data, have yet to be formalized. Here we use a formal framework for causal inference to formulate several causal estimands in recurrent event settings, with and without competing events. When competing events exist, we clarify when commonly used classical statistical estimands can be interpreted as causal quantities from the causal mediation literature, such as (controlled) direct effects and total effects. Furthermore, we show that recent results on interventionist mediation estimands allow us to define new causal estimands with recurrent and competing events that may be of particular clinical relevance in many subject matter settings. We use causal directed acyclic graphs and single world intervention graphs to illustrate how to reason about identification conditions for the various causal estimands based on subject matter knowledge. Furthermore, using results on counting processes, we show that our causal estimands and their identification conditions, which are articulated in discrete time, converge to classical continuous time counterparts in the limit of fine discretizations of time. We propose estimators and establish their consistency for the various identifying functionals. 
Finally, we use the proposed estimators to compute the effect of blood pressure lowering treatment on the recurrence of acute kidney injury using data from the Systolic Blood Pressure Intervention Trial.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10985-023-09594-8.


Introduction
Practitioners and researchers are often interested in treatment effects on outcomes that can recur in the same individual over time. Such outcomes include hospitalizations in heart failure patients (Anker and McMurray, 2012), fractures in breast cancer patients with skeletal metastases (Chen and Cook, 2004) and rejection episodes in recipients of kidney transplants (Cook and Lawless, 1997). However, in many studies of recurrent events, individuals may also experience competing events, such as death. These events may substantially complicate causal inference.
For example, consider investigators concerned with the effects of over-treatment of older adults with antihypertensive agents and aspirin. Over-treatment might lead to episodes of syncope (dizziness) caused by blood pressure becoming too low (in turn, possibly leading to injurious falls). Suppose these investigators conduct a randomized controlled trial in a sample of patients admitted to nursing homes with a history of cardiovascular disease and currently taking antihypertensives and aspirin. Patients are then randomly assigned to either discontinue or continue both treatments (a similar intervention is considered in Reeve et al. (2020)). Assume no participant moves out of the nursing home or disenrolls from the study. After 1000 days, the investigators calculate the average number of syncope episodes in each arm and find that this average is lower in the treatment discontinuation arm than in the continuation arm. However, they also calculate the proportion of deaths in each arm and find that mortality was higher in the discontinuation arm than in the continuation arm.
In this example, all-cause mortality is a competing event for the outcome of interest (number of recurrent syncope episodes) because once an individual dies they cannot subsequently experience syncope.¹ Due to the randomized design and the fact that no participant was lost to follow-up, the analysis these investigators conducted indeed has a "causal" interpretation: their results support an average protective effect of treatment discontinuation on the number of syncope episodes. However, analogous to previous arguments in the case where the outcome of interest is an incident (rather than a recurrent) event subject to competing events (Young et al., 2020; Stensrud et al., 2020b), this "protection" is difficult to interpret in light of the finding that mortality risk is increased by treatment discontinuation. The lower number of syncope episodes in the discontinuation arm might only be due to the treatment effect on mortality: individuals who die earlier have less time in which to experience syncope.
The importance of characterizing the causal interpretation of statistical estimands is increasingly acknowledged (European Medicines Agency, 2020). In a series of articles by the Recurrent Event Qualification Opinion Consortium (Schmidli et al., 2021; Wei; Fritsch et al., 2021), six candidate causal estimands were proposed in recurrent event settings with competing events, defined by counterfactual contrasts under different treatment scenarios in the following: 1) the expected number of events in the study population; 2) the expectation over a composite of the recurrent and competing events in the study population; 3) the expected number of events under an intervention which prevents the competing event from occurring in the study population; 4) the expected number of events in a subset of the study population consisting of the principal stratum of individuals that would survive regardless of treatment; 5) the ratio of the expected number of recurrences to the restricted mean survival by the end of follow-up in the study population; and 6) the ratio of the expectation over a composite of the recurrent and competing events to the restricted mean survival by the end of follow-up in the study population.
In addition to defining these various counterfactual estimands, Schmidli et al. (2021) considered some aspects of their differences in interpretation, as well as, for some of the estimands, approaches to statistical analysis. However, they did not consider the assumptions needed to identify any of these counterfactual estimands in a given study with a function of the observed data. Once a causal estimand is chosen, this identification step is required to justify a choice of approach to statistical analysis. Furthermore, Schmidli et al. (2021) did not consider how underlying questions about treatment mechanism may be important to the choice of estimand in recurrent events studies when treatment has a causal effect on competing events, as illustrated in the example above.
In this work, we formalize the interpretation, identification and estimation of various counterfactual estimands in recurrent event settings with competing events using counterfactual causal models (Robins, 1986; Pearl, 2009; Richardson and Robins, 2013b; Robins and Richardson, 2011; Robins et al., 2020). Building on ideas in Young et al. (2020) and Stensrud et al. (2020b) for the case where the outcome of interest is an incident event (e.g. diagnosis of prostate cancer), we show that several of these estimands in recurrent event settings can be interpreted as special cases of causal effects from the mediation literature (total, controlled direct, and separable effects) by conceptualizing the competing event as a time-varying "mediator" (Robins and Greenland, 1992; Robins and Richardson, 2011; Robins et al., 2020). We give identification conditions and derive identification formulas for these estimands and demonstrate how single world intervention graphs (SWIGs) (Richardson and Robins, 2013b) can be used to reason about identification conditions with subject matter knowledge. Our results will also formalize the counterfactual interpretation of statistical estimands for recurrent events from the counting process literature (Cook and Lawless, 2007; Andersen et al., 2019), which has not adopted a formal causal (counterfactual) framework for motivating results.
The article is organized as follows. In Sec. 2, we present the structure of the observed data, without the complication of loss to follow-up. In Sec. 3, we define and describe several causal estimands for recurrent events in settings with competing events. In Sec. 4, we consider how to treat the censoring of events, including by loss to follow-up. In Sec. 5, we discuss identifiability conditions and give identification formulas for the proposed causal estimands. Furthermore, we demonstrate the convergence of discrete-time estimands to continuous-time estimands, and establish the correspondence between the discrete-time identification conditions and the classical independent censoring assumption in event history analysis.² In Sec. 6, we describe statistical methods for the proposed estimands, and establish conditions for their consistency. In Sec. 7, we illustrate our results using the motivating example of over-treatment in older adults and syncope with simulated data. Finally, in Sec. 8, we provide a discussion.

Factual data structure
Consider the hypothetical randomized trial described in Sec. 1 in which $n$ i.i.d. individuals, indexed by $i \in \{1, \dots, n\}$ and currently taking aspirin and antihypertensives, are assigned to a treatment arm $A \in \{0, 1\}$ ($A = 0$ indicates assignment to discontinuation of both aspirin and antihypertensives, $A = 1$ assignment to continuing both medications). Because the individuals are i.i.d., we suppress the subscript $i$. Let $k \in \{0, \dots, K + 1\}$ denote $K + 2$ consecutive ordered intervals of time comprising the follow-up (e.g. days, weeks, months), with time interval $k = 0$ corresponding to the interval of treatment assignment (baseline) and $k = K + 1$ corresponding to the last possible follow-up interval, beyond which no information has been recorded. Without loss of generality, we choose a timescale such that all intervals have a duration of 1 unit of time until Sec. 5.4.
Let $Y_k \in \{0, 1, 2, \dots\}$ denote the cumulative count of new syncope events by the end of interval $k$ and $D_k \in \{0, 1\}$ an indicator of death by the end of interval $k$. Define $D_0 \equiv Y_0 \equiv 0$, that is, individuals are alive and have not yet experienced any post-treatment recurrent events at baseline. Let $L_0$ be a vector of baseline covariates measured before the treatment assignment $A$, capturing pre-treatment common causes of hypotension-induced syncope and death, for example baseline age and previous history of injurious falls (Nevitt et al., 1991). For $k > 0$, let $L_k \in \mathcal{L}$ denote a vector of time-varying covariates measured in interval $k$, e.g. containing updated blood pressure measurements.³
² While these results are shown for recurrent event outcomes, they also apply to the more classical competing event setting described in Young et al. (2020), which constitutes a special case of the current work.
³ Our presentation focuses on intention-to-treat effects by defining $A$ as an indicator of baseline assignment to a particular treatment arm. Our results trivially extend to accommodate effects of adherence to a particular protocol at baseline by instead taking $A$ to be the actual treatment strategy followed at baseline and by including common causes of treatment adherence, syncope and death in $L_0$. In either case, indicators of time-varying adherence to the protocol may be important to include in $L_k$, $k > 0$, for the purposes of identification to be discussed in later sections.
Figure 1. An example of a causal model describing the relation between treatment discontinuation of antihypertensive agents and aspirin ($A$), recurrent episodes of syncope ($Y_k$) and survival ($D_k$).
The history of a random variable through $k$ is denoted by an overbar (i.e. $\bar{Y}_k \equiv (Y_0, \dots, Y_k)$ and $\bar{L}_k \equiv (L_0, \dots, L_k)$) and future events are denoted by underbars (i.e. $\underline{Y}_k \equiv (Y_k, \dots, Y_{K+1})$). We assume no loss to follow-up until Sec. 4. We assume that variables are temporally (and topologically) ordered as $D_k, Y_k, L_k$ within each follow-up interval. We adopt the notational convention that any variable with a negative time index occurring in a conditioning set is taken to be the empty set $\emptyset$ (e.g. $P(A = a \mid L_{-1}, B) = P(A = a \mid B)$ for an event $B$).
An individual cannot experience recurrent events after a competing event, such as death, has occurred: if an individual experiences death at time $k^\dagger$, then $Y_j = Y_{k^\dagger}$ and $D_j = 1$ for all $j \geq k^\dagger$. Thus, our observed data setting differs from the 'truncation by death' setting, in which the outcome of interest is considered to be undefined after an individual experiences the competing event (Young et al., 2020; Young and Stensrud, 2021; Stensrud et al., 2020a). For example, quality of life in cancer patients is only defined for individuals who are alive, whereas the number of new syncope episodes of a deceased individual is defined and equal to zero.
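The constraints on the observed data described above can be made concrete with a small simulation. The following is only a minimal sketch and not the data-generating model used in this article: the function name `simulate_individual`, the per-interval probabilities `p_death` and `p_event`, and the restriction to at most one new event per interval are all assumptions introduced for illustration.

```python
import random


def simulate_individual(K, p_death, p_event, rng):
    """Simulate (D_k, Y_k) for k = 0, ..., K + 1 for one individual.

    D_k is an indicator of death by the end of interval k (absorbing) and
    Y_k is the cumulative count of recurrent events by the end of interval k.
    Assumption of this sketch: at most one new event per interval.
    """
    D = [0]  # D_0 = 0: alive at baseline
    Y = [0]  # Y_0 = 0: no post-treatment events at baseline
    for _ in range(1, K + 2):
        # Within an interval, D_k is ordered before Y_k.
        died = D[-1] == 1 or rng.random() < p_death
        D.append(1 if died else 0)
        # No new recurrent events can be counted once death has occurred.
        new_event = 0 if died else int(rng.random() < p_event)
        Y.append(Y[-1] + new_event)
    return D, Y


rng = random.Random(0)
D, Y = simulate_individual(K=9, p_death=0.1, p_event=0.3, rng=rng)
```

Whatever the random draws, the returned count process stays constant from the interval of death onwards, which is the property that distinguishes this setting from 'truncation by death'.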
In what follows, we will use causal directed acyclic graphs (DAGs) (Pearl, 2009) to represent the underlying data-generating models. We assume that the DAG represents a Finest Fully Randomized Causally Interpreted Structural Tree Graph (FFRCISTG) model (Robins, 1986; Richardson and Robins, 2013b). Furthermore, we will assume that statistical independencies in the data are faithful to the DAG (see Appendix C for the definition of faithfulness that we adopt here). An example of a DAG, encoding a set of possible assumptions on the data-generating law for the treatment discontinuation trial described in Sec. 1, is shown in Fig. 1.

Counterfactual estimands
In this section, we consider various counterfactual estimands in settings with recurrent and competing events. We propose extensions of previously considered counterfactual estimands that quantify causal effects on incident failure in the face of competing events (Stensrud et al., 2020b; Young et al., 2020; Stensrud et al., 2021) to the recurrent events setting. This includes a new type of separable effect, inspired by the seminal decomposition idea of Robins and Richardson (2011), that may disentangle the treatment effect on recurrent syncope from its effect on survival. We also discuss additional counterfactual estimands in recurrent event settings, considered in Schmidli et al. (2021) and in the counting process literature (Cook and Lawless, 2007).
We denote counterfactual random variables by superscripts, such that $Y_k^a$ is the recurrent event count that would be observed at time $k$ had, possibly contrary to fact, treatment been set to $A = a$. By causal effect, we mean a contrast of counterfactual outcomes in the same subset of individuals.
3.1. Total effect. The counterfactual marginal mean number of recurrent events by time $k$ under an intervention that sets $A$ to $a$ is $E[Y_k^a]$. In turn, the counterfactual contrast
$$E[Y_k^{a=1}] \text{ vs. } E[Y_k^{a=0}] \tag{1}$$
quantifies a causal effect of treatment assignment on the mean number of recurrent events by $k$. Schmidli et al. (2021) referred to this effect as the 'treatment policy' estimand. However, in order to understand the interpretational implications of choosing this effect measure when competing events exist, it is important to understand that Eq. 1 also coincides with an example of a total effect as historically defined in the causal mediation literature (Robins and Greenland, 1992; Young et al., 2020).
In our running example, the total effect quantifies the effect of treatment (dis)continuation ($A$) on recurrent syncope ($Y_k$) through all causal pathways, including pathways through survival ($D_k$), as depicted by all directed paths connecting $A$ and $Y$ nodes intersected by $D$ nodes in the causal diagram in Fig. 1 (Young et al., 2020). Therefore, a non-null value of the total effect is not sufficient to conclude that the treatment exerts direct effects on syncope (outside of death): the total effect may also (or only) be due to an (indirect) effect on survival, keeping individuals at risk of syncope for a longer (or shorter) period of time.
In addition to the total effect on recurrent syncope ($Y_k$), we might consider the total (marginal) effect of treatment on survival, given by the marginal contrast in cumulative incidences
$$E[D_k^{a=1}] \text{ vs. } E[D_k^{a=0}]. \tag{2}$$
However, simultaneously considering the total effect of treatment on syncope and on survival is still insufficient to determine by which mechanisms the treatment affects syncope and death. For example, suppose that all individuals in treatment arm $A = 1$ experience the competing event shortly after treatment initiation. In this case, no recurrent events would be recorded in this treatment arm. Clearly, in this setting it would not be possible for an investigator to draw any conclusions about the mechanism by which treatment acts on the recurrent event outside of the competing event.
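As a toy illustration of why the two total effects are best read side by side, the sketch below computes arm-specific averages in a hypothetical randomized trial with no loss to follow-up, the setting of Sec. 1 in which such averages have a causal interpretation. The eight records and the helper `arm_means` are invented for illustration and are not data or code from this article.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical records (A, Y_K, D_K): treatment arm, cumulative number of
# syncope episodes and death indicator by the end of follow-up.
records = [
    (1, 3, 0), (1, 2, 0), (1, 4, 0), (1, 1, 1),   # continuation arm
    (0, 0, 1), (0, 1, 1), (0, 2, 0), (0, 1, 1),   # discontinuation arm
]


def arm_means(records, a):
    """Return (mean number of events, proportion dead) in arm A = a."""
    events = [y for (arm, y, d) in records if arm == a]
    deaths = [d for (arm, y, d) in records if arm == a]
    return mean(events), mean(deaths)


mean_y1, risk_d1 = arm_means(records, 1)
mean_y0, risk_d0 = arm_means(records, 0)

total_effect_y = mean_y1 - mean_y0   # contrast in mean number of events
total_effect_d = risk_d1 - risk_d0   # contrast in cumulative incidence of death
```

In these made-up data the discontinuation arm has fewer syncope episodes (a difference of 1.5 events) but a higher proportion of deaths (a difference of 0.5), so the lower event count alone says nothing about whether discontinuation protects against syncope by any route other than shortened survival.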
3.2. Controlled direct effect. Consider the counterfactual mean number of events under an intervention that prevents the competing event from occurring and sets treatment to $a$, $E[Y_k^{a,\bar{d}=0}]$, where the overline in the superscript denotes an intervention on all respective intervention nodes in the history of the counterfactual. The contrast
$$E[Y_k^{a=1,\bar{d}=0}] \text{ vs. } E[Y_k^{a=0,\bar{d}=0}] \tag{3}$$
quantifies a causal effect of treatment assignment on the mean number of recurrent events by $k$ under an additional intervention that somehow "eliminates competing events". Schmidli et al. (2021) referred to this effect as the 'hypothetical strategy' estimand. However, it is useful to notice that the effect (3) coincides with an example of a controlled direct effect as defined in the causal mediation literature (Robins and Greenland, 1992; Young et al., 2020).
In our example, the controlled direct effect isolates direct effects of treatment on recurrent syncope by considering a (hypothetical) intervention which prevents death from occurring in all individuals. An important reservation against the controlled direct effect is that it is often difficult to conceptualize an intervention which prevents the competing event from occurring (Young et al., 2020). For example, there exists no practically feasible intervention that can eliminate death due to all causes. Without clearly establishing the intervention being targeted, the interpretation of the direct effect is ambiguous and its role in informing decision-making is unclear. The unclear role of the controlled direct effect in decision-making was reiterated by Schmidli et al. (2021) in their discussion of the 'hypothetical strategy', although the authors did not discuss the role of the estimand in clarifying the mechanism by which treatment affects the outcome.
Eq. 4 is referred to as the modified treatment assumption and is described in detail in Stensrud et al. (2020a). This assumption is conceivable in our example by defining $A_Y$ and $A_D$ as a physical decomposition of the original study treatment $A$ and by taking $M_Y \equiv A_Y$, $M_D \equiv A_D$. Specifically, define $A_Y$ as an indicator of assignment to continuing ($A_Y = 1$) versus discontinuing ($A_Y = 0$) only antihypertensives and $A_D$ as an indicator of assignment to continuing ($A_D = 1$) versus discontinuing ($A_D = 0$) only aspirin. An individual receiving $A_Y = A_D = 0$ in this example has received the same treatment as $A = 0$ (assignment to discontinue both antihypertensives and aspirin) and an individual receiving $A_Y = A_D = 1$ the same treatment as $A = 1$ (assignment to continue both antihypertensives and aspirin). While a physical treatment decomposition is one way in which assumption (4) may hold, it may also hold for modified treatments that are not a physical decomposition (Stensrud et al., 2021, 2020a).
The marginal mean number of events under a hypothetical intervention where we jointly assign $A_Y = a_Y$ and $A_D = a_D$ for any combination of $a_D \in \{0, 1\}$ and $a_Y \in \{0, 1\}$ is $E[Y_k^{a_Y,a_D}]$. Contrasts of this estimand for different levels of $A_Y$ and $A_D$ constitute particular examples of separable effects (Stensrud et al., 2020b, 2021), a type of interventionist mediation estimand (Robins and Richardson, 2011; Robins et al., 2020; Didelez, 2019). For example, the separable effect of $A_Y$ evaluated at $a_D = 0$,
$$E[Y_k^{a_Y=1,a_D=0}] \text{ vs. } E[Y_k^{a_Y=0,a_D=0}], \tag{5}$$
quantifies the effect of only treating with antihypertensives versus neither antihypertensives nor aspirin. These estimands correspond to the effects of joint interventions on the modified treatments $A_Y$ and $A_D$, even when the modified treatment assumption (4) does not hold. If we had access to a four-arm randomized trial in which individuals are observed under all four treatment combinations $(A_Y, A_D) \in \{0, 1\}^2$, and there were no loss to follow-up, these effects could easily be identified and estimated by two-way comparisons of the four different treatment combinations. However, because we only observe two of the four treatment combinations in the trial described in Sec. 2, namely $A_Y = A_D = 1$ and $A_Y = A_D = 0$, the separable effects target effects that require identifying assumptions beyond those that hold by design in this two-arm trial. We will consider these assumptions in Sec. 5.3.
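To make the "two-way comparisons" in the hypothetical four-arm trial concrete, the following sketch computes separable effects as differences of arm-specific means. It is an illustration under stated assumptions only: the four mean event counts are made up, the function names are hypothetical, and effects are taken on the difference scale although other contrasts could be used.

```python
# Hypothetical mean numbers of recurrent events by time k in a four-arm
# trial, keyed by (a_Y, a_D). The values are invented for illustration.
mean_events = {
    (1, 1): 2.5,   # continue antihypertensives and aspirin (same as A = 1)
    (1, 0): 2.0,   # continue antihypertensives only
    (0, 1): 1.5,   # continue aspirin only
    (0, 0): 1.0,   # discontinue both (same as A = 0)
}


def separable_effect_of_a_y(mean_events, a_d):
    """Effect of the A_Y component with the A_D component held at a_d."""
    return mean_events[(1, a_d)] - mean_events[(0, a_d)]


def separable_effect_of_a_d(mean_events, a_y):
    """Effect of the A_D component with the A_Y component held at a_y."""
    return mean_events[(a_y, 1)] - mean_events[(a_y, 0)]


sep_y = separable_effect_of_a_y(mean_events, a_d=0)
sep_d = separable_effect_of_a_d(mean_events, a_y=1)
total = mean_events[(1, 1)] - mean_events[(0, 0)]
```

On the difference scale, the separable effect of $A_Y$ at $a_D = 0$ and the separable effect of $A_D$ at $a_Y = 1$ add up to the contrast between the arms $(1, 1)$ and $(0, 0)$, which are the only two arms observed in the two-arm trial.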
Similarly to the case for the total effect, the contrast [...] captures causal pathways that involve both syncope and death, such as the path [...]. [...] captures exclusively the effect of antihypertensives on syncope not mediated by survival. We can formalize this statement using the condition of strong $A_Y$ partial isolation, inspired by Stensrud et al. (2021): a treatment decomposition satisfies strong $A_Y$ partial isolation if there are no causal paths from $A_Y$ to $D_k$ for all $k$. Under strong $A_Y$ partial isolation, Eq. 5 captures only treatment effects on the recurrent event not via treatment effects on competing events, and is therefore a direct effect. In our example on treatment discontinuation, strong $A_Y$ partial isolation likely fails, as syncope may increase the risk of death via injurious falls. Therefore, Eq. 5 also captures effects of $A_Y$ on $Y_k$ via $D_k$, and cannot be interpreted as a direct effect outside of $D_k$. A brief account of how to interpret the separable effects as direct and indirect effects under isolation conditions is given in Appendix A, and is discussed in detail for the competing risks setting in Stensrud et al. (2021).
3.4. Estimands with composite outcomes. Schmidli et al. (2021) proposed the estimands
$$\frac{E[Y_k^a]}{E[\mu_k^a]} \tag{7}$$
and
$$\frac{E[Y_k^a + D_k^a]}{E[\mu_k^a]}, \tag{8}$$
where $\mu_k^a = \sum_{i=0}^{k} I(D_i^a = 0)$ is the counterfactual restricted survival under an intervention that sets treatment to $a$.⁴ They referred to Eq. 7 as the 'while alive strategy' estimand. A contrast in Eqs. 7-8 under different levels of $a$ captures treatment effects both on syncope and on the competing event.
Different types of composite outcomes have also been suggested. For example, Schmidli et al. (2021) described the estimand $E[Y_k^a + D_k^a]$, which could also be extended by multiplying $D_k^a$ or $Y_k^a$ by a weight. Likewise, Claggett et al. (2018) introduced a reverse counting process, which can be formulated as $E[(1 - D_k^a)(M - Y_k^a)]$ for recurrent outcomes. The estimand is the expectation over a counting process which starts at $M$ and decrements in steps of one every time the recurrent event occurs. If the terminating event occurs, the process drops to zero. There are common limitations to all estimands in this subsection:
(1) None of them can be used to draw formal conclusions about the mechanism by which the treatment affects the recurrent event and the competing event, for the same reason as the total effect (Sec. 3.1).
(2) The estimands (implicitly or explicitly) assign weight to the competing and recurrent events by combining them into a single effect measure. However, the choice of 'weights' is not obvious and can differ on a case-by-case basis.
(3) The estimands represent a coarsening of the information in the cumulative incidence and mean frequency, and therefore provide less information than simultaneously inspecting the mean frequency of syncope and the cumulative incidence of death. Inspecting the mean frequency and cumulative incidence curves separately gives the additional advantage of showing the (absolute) magnitude of each estimand separately as functions of time, which is not visible from the composite estimand alone.
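The coarsening described in limitation (3) can be seen in a small numerical sketch. The code below computes, for one hypothetical treatment arm, the ratio of the mean number of recurrences to the mean restricted survival and the analogous ratio for the composite of recurrences and death, as described verbally in Sec. 1. The four trajectories and all function names are assumptions made for illustration, and the code treats the arm's data as if they were the counterfactual outcomes of interest.

```python
def mean(values):
    return sum(values) / len(values)


def restricted_survival(death_path):
    """Number of intervals i = 0, ..., k with D_i = 0 (alive)."""
    return sum(1 for d in death_path if d == 0)


# Hypothetical arm: each entry is ((D_0, ..., D_k), Y_k).
arm = [
    ([0, 0, 0, 0], 2),
    ([0, 0, 1, 1], 1),
    ([0, 1, 1, 1], 0),
    ([0, 0, 0, 1], 3),
]

mean_events = mean([y for (_, y) in arm])                          # 1.5
mean_survival = mean([restricted_survival(d) for (d, _) in arm])   # 2.5
mean_composite = mean([y + d[-1] for (d, y) in arm])               # 2.25

while_alive = mean_events / mean_survival
composite_while_alive = mean_composite / mean_survival
```

The single number `while_alive` (0.6 here) is compatible with many different pairs of mean frequency and survival, which is why it carries less information than the two curves inspected separately.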
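3.5. Estimands that condition on the event history. The counterfactual intensity of the recurrent event process is defined as
$$E[Y_k^a - Y_{k-1}^a \mid \bar{L}_{k-1}^a, \bar{Y}_{k-1}^a]. \tag{9}$$
Eq. 9 is a discrete-time intensity of $Y_k^a$, conditional on the past history of recurrent events and measured covariates. One could then consider contrasts such as
$$E[Y_k^{a=1} - Y_{k-1}^{a=1} \mid \bar{L}_{k-1}^{a=1} = \bar{l}_{k-1}, \bar{Y}_{k-1}^{a=1} = \bar{y}_{k-1}] \text{ vs. } E[Y_k^{a=0} - Y_{k-1}^{a=0} \mid \bar{L}_{k-1}^{a=0} = \bar{l}_{k-1}, \bar{Y}_{k-1}^{a=0} = \bar{y}_{k-1}]. \tag{10}$$
⁴ In continuous time, the restricted survival can be written as $\int_0^t I(T^D \geq s)\,ds$, where $T^D$ is the time of the competing event. Taking the expectation gives $\int_0^t S(s)\,ds$ for survival function $S(t)$, which is the restricted mean survival in continuous time (Aalen et al., 2008).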
However, because Eq. 9 conditions on the history of the recurrent event process up to time $k$, Eq. 10 generally cannot be interpreted as a causal effect, even though it is a contrast of counterfactual outcomes. This is because it compares different groups of individuals: those with a particular recurrent event and covariate process history under $a = 1$ versus those with that same history under $a = 0$. Thus, a nonnull value of Eq. 10 does not imply that $A$ has a nonnull causal effect on $Y$ at time $k$. This is analogous to the difficulty in causally interpreting contrasts of hazards for survival outcomes, and has already been discussed extensively in the literature (Robins, 1986; Young et al., 2020; Martinussen et al., 2020; Hernán, 2010; Stensrud and Hernán, 2020; Stensrud et al., 2020a).
3.6. Principal stratum estimand. Schmidli et al. (2021) also considered the principal stratum estimand
$$E[Y_k^a \mid D_k^{a=0} = D_k^{a=1} = 0]. \tag{11}$$
Contrasts of Eq. 11, given by
$$E[Y_k^{a=1} \mid D_k^{a=0} = D_k^{a=1} = 0] \text{ vs. } E[Y_k^{a=0} \mid D_k^{a=0} = D_k^{a=1} = 0],$$
correspond to principal stratum effects, e.g. the survivor average causal effect (Robins, 1986; Frangakis and Rubin, 2002; Schmidli et al., 2021). The principal stratum estimand targets an unknown (possibly non-existent) subset of the population and therefore its role in decision-making is ambiguous (Robins, 1986; Robins et al., 2007; Joffe, 2011; Dawid and Didelez, 2012; Stensrud et al., 2020b; Stensrud and Dukes, 2021). Integrally linked to the unknown nature of the population to whom a principal stratum effect refers, this estimand depends on cross-world independence assumptions for identification that cannot be falsified in any real-world experiment.

Censoring
Define $C_{k+1}$, $k \in \{0, \dots, K\}$, as an indicator of loss to follow-up by $k + 1$ such that, for an individual with $C_k = 0$, $C_{k+1} = 1$, the outcome (and covariate) processes defined in Sec. 2 are only fully observed through interval $k$. Loss to follow-up (e.g. due to failure to return for study visits) is commonly understood as a form of censoring. We adopt a more general definition of censoring from Young et al. (2020) which captures loss to follow-up but also possibly other events, depending on the choice of estimand.
Definition 1 (Censoring, Young et al. (2020)). A censoring event is any event occurring in the study by $k + 1$, for any $k \in \{0, \dots, K\}$, that ensures the values of all future counterfactual outcomes of interest under $a$ are unknown even for an individual receiving the intervention $a$.
Loss to follow-up by time $k$ is always a form of censoring by the above definition. However, other events may or may not be defined as censoring events depending on the choice of causal estimand. For example, competing events are censoring events by the above definition when the controlled direct effect is of interest, but are not censoring events when the total effect is of interest (Young et al., 2020). This is because the occurrence of a competing event at time $k^\dagger$ prevents knowledge of $Y_{k^\dagger}^{a,\bar{d}=0}$, but does not prevent knowledge of $Y_{k^\dagger}^{a}$. By similar arguments, competing events are not censoring events when separable effects are of interest because they do not involve counterfactual outcomes indexed by $\bar{d} = 0$ ("elimination of competing events"). When loss to follow-up is present in a study, we will define all effects relative to interventions that include "eliminating loss to follow-up", with the added superscript $c = 0$ to denote relevant counterfactual outcomes, e.g. $Y_k^{c=0}$. The identification assumptions outlined below are sufficient for identifying estimands with this additional interpretation. Young et al. (2020) discuss additional assumptions that would allow an interpretation without this additional intervention on loss to follow-up. In Sec. 5.4, we establish the correspondence between the notion of censoring adopted in this article and the classical independent censoring assumption in event history analysis.

Identification of the counterfactual estimands
In this section, we give sufficient conditions for identifying the total, controlled direct and the separable effects as functionals of the observed data. Proofs can be found in Appendix B.
5.1. Total effect. Consider the following conditions for $k \in \{0, \dots, K\}$: exchangeability (Eqs. 12-13), positivity (Eqs. 14-15) and consistency (Eq. 16).
Exchangeability. Eq. 12 states that the baseline treatment is unconfounded given $L_0$. This holds by design with $L_0 = \emptyset$ when treatment assignment $A$ is (unconditionally) randomized. Eq. 13 states that the censoring is unconfounded.
Positivity. Eq. 14 states that for every level of the baseline covariates, there are some individuals that receive either treatment. This will hold by design in a trial where $A$ is assigned by randomization, such as in the motivating example. The second assumption (Eq. 15) requires that, for any possible observed level of treatment and covariate history amongst those remaining alive and uncensored through $k$, some individuals continue to remain uncensored through $k + 1$ with positive probability.
Consistency. Eq. 16 links the counterfactual variables to the observed data for individuals whose observed treatment and censoring status agree with the intervention.
Let $\Delta X_k = X_k - X_{k-1}$ denote an increment of the process $X$. In Appendix B we show that, under assumptions Eqs. 12-16, $E[\Delta Y_i^{a,c=0}]$ is identified by Eq. 17, which is an example of a g-formula (Robins, 1986). An equivalent formulation, Eq. 18, is an example of an inverse probability weighted (IPW) identification formula. In turn, because $Y_0 \equiv 0$, the total effect defined in Eq. 1 under an additional intervention that "eliminates loss to follow-up" can be expressed as contrasts of
$$E[Y_k^{a,c=0}] = \sum_{i=1}^{k} E[\Delta Y_i^{a,c=0}]$$
for different levels of $a$, with $E[\Delta Y_i^{a,c=0}]$ identified by Eq. 17 or Eq. 18. In the survival setting, with support $Y_k \in \{0, 1\}$, Eq. 17 corresponds to Eq. 30 in Young et al. (2020). A key difference from the survival setting is that the conditional probability of new recurrent events now depends on the history of the recurrent event process, which may take many possible levels, whereas in the survival setting considered by Young et al. (2020), the terms of the relevant g-formula are restricted to those with fixed event history consistent with no failure ($Y_k = 0$). The identification formula for the total effect on the competing event (Eq. 2) is shown in Appendix B.
5.1.1. Graphical evaluation of the exchangeability conditions. In Fig. 2 we show a single world intervention graph (SWIG) for the intervention considered under the total effect. This is a transformation (Richardson and Robins, 2013b,a) of the causal DAG in Fig. 1, which also includes unmeasured variables illustrating sufficient data-generating mechanisms under which exchangeability conditions Eqs. 12-13 would be violated. In particular, Eqs. 12-13 can be violated by the presence of unmeasured confounders (common causes of treatment, loss to follow-up, and outcomes) such as $U_{AY}$ or $U_{CY}$ in Fig. 2 (A). This is well known, and demonstrates how SWIGs can be used to reason about the identification conditions.
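The graphical reasoning described here can be mimicked in code for a deliberately simplified point-treatment graph. The sketch below is not the SWIG of Fig. 2: the node names, the two toy graphs and the helper functions are assumptions made for illustration. It checks d-separation with the standard moralized ancestral graph criterion and reads the baseline exchangeability condition as independence between $A$ and the counterfactual outcome given $L_0$, with the fixed node `a` carrying the arrows that leave treatment.

```python
def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(edges, x, y, given):
    """True if x and y are d-separated given `given` in the DAG `edges`."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    keep = ancestors(parents, {x, y} | set(given))
    # Moralize the ancestral graph: link each node to its parents and
    # link every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = sorted(parents.get(v, ()))
        for i, p in enumerate(ps):
            nbrs[v].add(p)
            nbrs[p].add(v)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Delete the conditioning set and look for any remaining path x - y.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen and v not in blocked:
            seen.add(v)
            stack.extend(nbrs[v] - blocked)
    return True


# Toy split-node graph: the random node "A" keeps incoming arrows, the
# fixed node "a" keeps outgoing arrows; "U_Y" affects the outcome only.
base = [("L0", "A"), ("L0", "Y(a)"), ("a", "Y(a)"), ("U_Y", "Y(a)")]
# Adding an unmeasured common cause of treatment and outcome.
confounded = base + [("U_AY", "A"), ("U_AY", "Y(a)")]

holds = d_separated(base, "A", "Y(a)", {"L0"})
violated = not d_separated(confounded, "A", "Y(a)", {"L0"})
```

In the toy graphs, an unmeasured cause of the outcome alone leaves the independence intact, whereas an unmeasured common cause of treatment and outcome opens a path that conditioning on `L0` does not block.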
However, Eqs. 12-13 are not violated by unmeasured common causes of the outcomes $Y_k$ and $D_k$ (such as $U_Y$, $U_{DY}$ and $U_D$ in Fig. 2 (B)), which we often expect to be present in practice. Examples of common causes of recurrent events and death include prognostic factors related to disease progression, such as the previous history of injurious falls. In contrast, the controlled direct effect and separable effects are not identified in the presence of such unmeasured common causes of syncope and death, as we will see next.
Figure 2. Identification of total effect. Unmeasured variables are denoted by $U_{\bullet}$. (A) shows unmeasured confounders (common causes of treatment, loss to follow-up, and outcomes), which violate the exchangeability conditions Eqs. 12-13 through the red paths: $U_{AY}$ violates Eq. 12 and $U_{CY}$ violates Eq. 13. (B) shows unmeasured effect modifiers, which are common in practice. In the motivating example, this could for example be the history of prior injurious falls. The action of such unmeasured effect modifiers, shown by blue paths, does not violate any of the exchangeability conditions Eqs. 12-13.
5.2. Controlled direct effects. The identification of the (controlled) direct effect (Eq. 3) proceeds analogously to the total effect, with the main difference being that we also intervene to remove the occurrence of the competing event. This amounts to re-defining the censoring event as a composite of loss to follow-up and the competing event. The identification conditions then take an analogous form for $k \in \{1, \dots, K\}$, with exchangeability conditions that now also involve the competing event.
We also assume positivity (Eq. 14) and a modified version of the consistency assumption in Eq. 16, which requires us to conceptualize an intervention on the competing event (this consistency assumption is given in Appendix B).
Under assumptions Eqs. 20-21, an identification formula is given by or equivalently by the IPW formula where we have defined For survival outcomes (Y k ∈ {0, 1}), Eq. 22 reduces to Eq. 23 in Young et al. (2020). In the absence of death and loss to follow-up and for randomized treatment assignment, both the total effect (Eq. 18) and controlled direct effect (Eq. 23

5.2.1. Graphical evaluation of the exchangeability conditions. Examples of unmeasured variables which violate the exchangeability conditions Eqs. 19-20 are shown in Fig. 3. Importantly, Eq. 20 is violated by unmeasured common causes U DY of D and Y . Therefore, the exchangeability assumption for the controlled direct effect (Eq. 20) is stronger than the exchangeability assumption for the total effect (Eq. 13).

5.3. Separable effects. We begin by assuming the following three identification conditions. Exchangeability Eqs. 24-25 imply the exchangeability conditions for total effect (Eqs. 12-13) due to the decomposition rule of conditional independence. , can violate exchangeability (Eq. 20). Positivity for all a ∈ {0, 1}, k ∈ {0, . . ., K + 1} and L k ∈ L. We also assume the positivity and consistency assumptions (Eqs. 14-15 and Eq. 16). Eq. 26 requires that, for any possibly observed level of measured time-varying covariate history among those who remain uncensored through each follow-up time, there are individuals with A = 0 and A = 1.
Consider a setting in which the A Y and A D components are assigned independently one at a time. We require the following dismissible component conditions to hold for all k ∈ {0, . . ., K}: where we have supposed that Under the identification conditions for separable effects and the modified treatment assumption (Eq. 4), we have Eq. 31 can also be written in IPW form as where we have defined The identification formula for separable effects on the competing event was first shown in Stensrud et al. (2021) and can also be found in Appendix B.

5.3.1. Graphical evaluation of the identification conditions. The exchangeability conditions (Eqs. 24-25) can be evaluated in a similar way as for the total effect in Fig. 2. However, identification of the separable effects also requires the dismissible component conditions to hold. These conditions can be evaluated in a DAG representing a four-armed trial in which the A D and A Y components can be assigned different values (Stensrud et al., 2021), shown in Fig. 4. Here, we see that the unmeasured common causes of D and Y violate the dismissible component conditions. Thus, adjusting for the baseline covariates L 0 is needed not only to ensure exchangeability with respect to the baseline treatment, but also to capture common causes of the recurrent event and the failure time.
5.4. Correspondence with continuous-time estimands. Up to this section, we have considered a fixed time grid in which the duration of each interval is 1 unit of time. In this section we will allow the spacing of the intervals to vary in order to consider the limit of fine discretizations of time. Let the endpoints of the intervals k ∈ {0, . . ., K + 1} correspond to times {0, t 1 , . . ., t K+1 } ⊆ [0, ∞). As before, we assume the duration of all intervals is equal, and denote this by ∆t.
We can associate the counterfactual quantities considered thus far in discrete time with corresponding quantities in the counting process literature.An overview of the corresponding quantities is presented in Table 1.
Table 1. Correspondence between discrete time quantities and continuous-time quantities. C and T D are the time of loss to follow-up and time of the competing event respectively, and L t is the process of measured covariates by time t. Factual quantities are variables that take their natural values, i.e. quantities that are not subject to any counterfactual intervention. These are different from observed quantities, which only contain the factual events that have been recorded in subjects that are under follow-up. The use of the superscript c in the right column represents a complete process, that is, a process where no individuals are lost to follow-up.

Discrete quantity
Quantity in the counting process literature Counterfactual quantity Observed quantity

5.4.1. Correspondence of identification conditions. In the counting process literature, it is usual (see e.g. Aalen et al. (2008) and Cook and Lawless (2007, Eq. 7.22)) to identify the intensity of the complete (i.e. uncensored) counting process as a function of the intensity of the observed (i.e. censored) counting process, using the independent censoring assumption where A corresponding formulation of Eq. 34 within the discrete-time framework is We assume that the possibility of experiencing more than one recurrent event during a single interval becomes negligible, i.e. ∆Y j ∈ {0, 1}, for fine discretizations. Thus, for small ∆t, Eq. 35 is closely related to We show in Appendix C that when the random variables in Eq. 36 are generated under an FFRCISTG model, and when consistency (Eq. 16) and faithfulness hold, then exchangeability with respect to censoring (Eq. 13) is implied by Eq. 36. In words, this result states that a discrete-time analog of the independent censoring assumption implies the absence of backdoor paths between C a,c=0 i and Y a,c=0 j for all i ≤ j in Fig. 2. However, the reverse implication does not follow, as effects of C i on future ∆Y j (i.e. the presence of a path C i → ∆Y j for i ≤ j in a DAG) violate Eq. 36 without violating Eq. 13. The path C i → ∆Y j for i ≤ j could represent the presence of concomitant care which affects the recurrent outcome, and the consequences of such a path for the interpretation of discrete versus continuous-time estimands are clarified in Sec. 5.4.3. A similar correspondence of identification conditions exists for the competing event (Robins and Finkelstein, 2000), and is stated in Appendix C.

5.4.2. Correspondence of identification formulas. In this section, we consider identifying functionals in the limit of fine discretizations of time. Justifications for the results are given in Appendix B.
In the limit of fine discretizations, E[Y a,c=0 k ] can be formulated as where , which heuristically means that ) .
Here, F B t denotes the filtration generated by the collection of variables and processes B. Eq. 37 is equivalent to The product-integral terms are covariate-specific survival functions with respect to the censoring event. Eq. 39 corresponds to Eq. 7.29 in Cook and Lawless (2007) and targets a setting commonly called 'dependent censoring' in the counting process literature.
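The role of the product-integral terms as limits of discrete-time survival products can be illustrated numerically. The following Python sketch assumes, purely for illustration, a constant censoring hazard (the rate `alpha_c` and both function names are hypothetical and do not appear in the paper): the product of one-step probabilities of remaining uncensored over a grid of width ∆t approaches exp(−alpha_c · t) as the grid is refined.

```python
import math


def discrete_survival(alpha_c: float, t: float, n_intervals: int) -> float:
    """Product over intervals of P(uncensored in interval | uncensored before).

    Assumes a constant hazard alpha_c, so each one-step conditional
    probability is approximated by 1 - alpha_c * dt.
    """
    dt = t / n_intervals
    surv = 1.0
    for _ in range(n_intervals):
        surv *= 1.0 - alpha_c * dt
    return surv


def continuous_survival(alpha_c: float, t: float) -> float:
    """Product-integral limit for a constant hazard: exp(-alpha_c * t)."""
    return math.exp(-alpha_c * t)


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        approx = discrete_survival(0.5, 2.0, k)
        exact = continuous_survival(0.5, 2.0)
        print(k, approx, abs(approx - exact))
```

The printed discrepancy shrinks as the number of intervals grows, which is the sense in which the discrete-time weights converge to their continuous-time counterparts.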
Under the strengthened independent censoring assumption which implies Eq. 34 without any covariates (L t = ∅), we have that W C,t = 1 with L t = ∅ and consequently Eq. 37 reduces to Eq. 41 corresponds to Eq. 13 in Cook and Lawless (1997). Further, if we define censoring as the first occurrence of loss to follow-up and death, then Eq. 40 becomes and Eq. 41 reduces further to Eq. 43 is the continuous-time limit of the controlled direct effect (Eq. 23) if Eq. 42 is satisfied for fine discretizations. It corresponds to the quantity R(t) in Andersen et al. (2019) and is cited by Cook and Lawless (1997) as a measure of the expected number of events for subjects at risk over the entire observation period, under the condition that the recurrent event is independent of the competing event.
The continuous-time limit of the identification formula for separable effects (Eq.32) is given by The weight W D (a Y , a D ) takes the form , .The mathematical characterization of the limit , where π • L D,j is defined in Eq. 33, depends on what type of process L D is.Many applications are covered when L D is a marked point process on a finite mark space.That is, L D takes values in a finite number of marks but can jump between marks over time.We will assume L D is such a process in Sec. 6.
Finally, a product integral representation of the total effect on the competing event is given in Appendix B.
In Table 2, we show an overview of the correspondence between the causal estimands discussed in Sec. 5 and common estimands that appear in the statistical literature.

5.4.3. Differences in interpretation. In the counting process formalism of recurrent events, N c t is interpreted as the factual number of recurrent events that occurred by time t in a given individual, regardless of whether or not those events were observed by the investigators. Likewise, the observed counting process N t is interpreted as the number of events that were recorded while the subject was under follow-up. If study participants receive concomitant care in addition to the intervention of interest by virtue of being under follow-up, then individuals who are lost to follow-up may have different outcomes N c t compared to subjects under follow-up due to the termination of such concomitant care. This violates the independent censoring condition Eq. 34. Therefore, E[N c t ] is not identified when concomitant care under follow-up affects future outcomes N c t without additional strong assumptions. In other words, one cannot make inference on individuals who do not receive concomitant care by only observing individuals who do receive concomitant care.

Table 2. A mapping of common recurrent events estimands in the literature to their counterfactual definition of risk. The third row shows a new proposed estimand, the separable effects for recurrent events, based on Stensrud et al. (2020b).

Description; Alternative terminology
Row 1. Expected event count without elimination of competing events; Marginal mean (Cook and Lawless, 1997), treatment policy strategy (Schmidli et al., 2021)
Row 2. Expected event count with elimination of competing events; Cumulative rate (Ghosh and Lin, 2000; Cook and Lawless, 1997), hypothetical strategy (Schmidli et al., 2021)
Row 3. Expected event count under a decomposed treatment; Does not correspond to classical estimands
By contrast, the counterfactual quantity Y c=0 k is often interpreted as the number of recurrent events that would be observed by time k under an intervention which prevented individuals from being lost to follow-up, i.e. in a pseudopopulation where all individuals receive the same level of concomitant care.E[Y c=0 k ] is still identified under effects of concomitant care on recurrent syncope, i.e. the arrows c a,c=0 k = 0 → Y a,c=0 k in Fig. 2 do not violate the exchangeability condition (Eq.13).In the special case where concomitant care does not affect future recurrent events, the interpretations of E[Y c=0 k ] and E[N c t k ] coincide.Similar arguments are given by Young et al. (2020), Sec. 5, for the incident event setting.

Estimation
The identification formulas motivate a variety of estimators that have been presented in the literature; examples can be found in Young et al. (2020);Stensrud et al. (2021); Martinussen and Stensrud (2021).In survival and event history analysis, researchers have traditionally been accustomed to estimands and estimators defined in continuous time.We mapped out correspondences between the discrete-time identification formulas and their continuous-time limits in Sec.5.4.2.In the following, we will present estimators that build on these correspondences.
We will consider the following general estimator applicable to several of the estimands considered above, Here, Ŷt is an estimator of a counterfactual mean frequency function under interventions of interest and Ŝ is needed to define the system in Eq. 44.Finally, D is an estimator of a counterfactual competing event process under interventions of interest.
The stochastic differential equation in Eq. 44 is uniquely determined by the integrators. Thus, presenting different estimators in this form amounts to presenting different integrators.
6.1. Risk set estimators. Identification formulas of the form Eq. 37, where the integrator conditions on the at-risk event T D ≥ u, C ≥ u, motivate the risk set estimators where Z i = I(T D,i ≥ •, C i ≥ •) is the at-risk process. Here, N i t = N c,i t∧C i is the observed counting process for the recurrent event, N D,i t = I(T D,i ≤ t, T D,i < C i ) is the observed counting process for death, and Ri , Ri,D are estimated weight processes of individual i (see Table 3).

6.2. Horvitz-Thompson and Hajek estimators. Identification formulas of the form Eq. 39 motivate Hajek estimators and Horvitz-Thompson estimators, which give the integrators where

Rj
s I(A j = a) gives Hajek estimators, and H s = 1 gives Horvitz-Thompson estimators.
Ri is an estimated weight process for individual i (see Table 3). The estimator defined by Eq. 44 may be unfamiliar to some practitioners, but it has the following properties:
• The estimator is generic in the sense that, given weight estimators, it can be used to estimate the total effect, the controlled direct effect, the separable effect, and other composite estimands (e.g. the 'while alive' strategy) as defined in Sec. 3.
• Eq. 44 is easy to solve on a computer, as it defines a recursive equation that can be solved using e.g. a for loop. General software that can be used to solve systems like Eq. 44 is available at github.com/palryalen/.
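To make the recursion concrete, here is a minimal Python sketch of such a for loop. It rests on an assumption about the form of the system: that Ŷ accumulates increments of a recurrent-event integrator scaled by the left limit of Ŝ, that Ŝ decreases by its left limit times the increment of a competing-event integrator, and that D̂ accumulates that same product, starting from (0, 1, 0). The function name and the arrays `dB` and `dAD` (increments of the two integrators at ordered jump times) are hypothetical, and this is not the interface of the software at github.com/palryalen/.

```python
from typing import List, Sequence, Tuple


def solve_system(
    dB: Sequence[float], dAD: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Solve the assumed system dY = S_- dB, dS = -S_- dAD, dD = S_- dAD.

    dB[k] and dAD[k] are the increments of the (assumed) integrators at the
    k-th ordered jump time. Returns the paths of Y, S and D at those times.
    """
    if len(dB) != len(dAD):
        raise ValueError("dB and dAD must have the same length")
    y, s, d = 0.0, 1.0, 0.0  # initial values (Y, S, D) = (0, 1, 0)
    y_path, s_path, d_path = [], [], []
    for db, da in zip(dB, dAD):
        s_prev = s               # left limit S_{t-}
        y += s_prev * db         # mean frequency increment
        d += s_prev * da         # competing event increment
        s = s_prev * (1.0 - da)  # survival update
        y_path.append(y)
        s_path.append(s)
        d_path.append(d)
    return y_path, s_path, d_path


if __name__ == "__main__":
    Y, S, D = solve_system([1.0, 1.0], [0.5, 0.5])
    print(Y, S, D)
```

Under this assumed form, Ŝ + D̂ = 1 at every time point, which gives a quick sanity check on any implementation.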
In Theorem 1 in Appendix D we provide convergence results for the estimators in Eqs.44-45 for the case when the true weights are not known, but estimated.Convergence is guaranteed when the weight estimators Rt , RD t , and Rt converge in probability to the true weights for each fixed t, which is established for the additive hazard weight estimator we will consider in Sec. 6.3 (Ryalen et al., 2019, Theorem 2).
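The distinction between the Horvitz-Thompson and Hajek normalizations in Sec. 6.2 can be illustrated on a simple weighted mean. The Python sketch below is a generic illustration with hypothetical toy inputs, not the estimator of Eq. 46: the Horvitz-Thompson form divides the weighted sum in a treatment arm by the total sample size, whereas the Hajek form divides by the sum of the weights in that arm.

```python
from typing import Sequence


def horvitz_thompson(
    y: Sequence[float], a: Sequence[int], w: Sequence[float], arm: int
) -> float:
    """Weighted sum over individuals in `arm`, divided by the sample size."""
    n = len(y)
    total = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == arm)
    return total / n


def hajek(
    y: Sequence[float], a: Sequence[int], w: Sequence[float], arm: int
) -> float:
    """Weighted sum over individuals in `arm`, divided by the sum of weights."""
    num = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == arm)
    den = sum(wi for ai, wi in zip(a, w) if ai == arm)
    return num / den


if __name__ == "__main__":
    y = [1.0, 3.0, 2.0, 4.0]   # toy outcomes
    a = [1, 1, 0, 0]           # toy treatment indicators
    w = [1.5, 3.0, 2.0, 2.0]   # toy inverse probability weights
    print(horvitz_thompson(y, a, w, arm=1), hajek(y, a, w, arm=1))
```

Because the Hajek form is a weighted average, it always lies within the range of the observed outcomes in the arm, which need not hold for the Horvitz-Thompson form.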
In Table 3 we present pairs of weights R i , R D,i , and Ri s that can be used in Eqs. 44-45 to estimate the total effect, the direct effect, and the separable effects as defined in Sec. 3. Define W D , the weights associated with the intervention that prevents death from other causes, similarly to the censoring weights in Eq. 38, , where Λ D t and Λ D|F t are the compensators of N D with respect to F C,D,A t and F L,Y,C,D,A t . W C and W D are the unstabilized versions of these weights, defined as .

Table 3. Weights R i , R i,D and R i in Eq. 45 and 46 for estimating the total effect, the direct effect, and the separable effects, respectively.

6.3. Estimating the weights. Suppose we have a consistent estimator of the propensity score π A , which will allow us to estimate the treatment weights in Table 3.
The time-varying weights in Table 3 solve the Doléans-Dade equation where W i is the weight of interest, N i is a counting process, α i and α * ,i are hazards, and θ i = α * ,i /α i .In Table 4, we present α i , α * ,i , and N i 's corresponding to the different weights in Table 3.We consider a weight estimator that is defined via plug-in of cumulative hazard estimates, where Âi and Â * ,i are cumulative hazard estimates of , where b is a smoothing parameter used to obtain the hazard ratio.
The weights only contribute to the estimator in Eq. 45 when individuals are at risk. This means that the counting process term in Eq. 48 can be neglected for the upper four weights in Table 4 (as N D,i •− and C i •− are identically equal to zero then). The solution to Eq. 48 is determined by the cumulative hazard estimates and the counting process. The smoothing parameter b contributes to the estimator only when N i jumps, which will not happen for the weights in the examples we consider. This is because the counting processes in Eq. 45 only jump when the respective N i 's are equal to zero. For the other weights, choosing b requires a trade-off between bias and variance; see Ryalen et al. (2019) for details.
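The plug-in weight recursion can be sketched in a few lines of Python. The sketch assumes that the weight equation has the usual likelihood-ratio form, in which the weight is multiplied by θ at each jump of N and otherwise grows at rate α − α* between jumps; this form, the function names, and the inputs are assumptions made for illustration and are not the interface of the ahw package. The arrays `dA` and `dA_star` hold increments of the two cumulative hazard estimates on a time grid, `dN` holds the jumps of the counting process, and `theta` holds the (smoothed) hazard ratio.

```python
import math
from typing import List, Sequence


def solve_weight(
    dA: Sequence[float],
    dA_star: Sequence[float],
    dN: Sequence[int],
    theta: Sequence[float],
) -> List[float]:
    """Recursively solve W = 1 + int W_-(theta - 1) dN + int W_- (dA - dA_star).

    All inputs are evaluated on a common, ordered time grid.
    """
    w = 1.0
    path = []
    for da, das, dn, th in zip(dA, dA_star, dN, theta):
        # Multiplicative update using the left limit of the weight.
        w = w * (1.0 + (th - 1.0) * dn + da - das)
        path.append(w)
    return path


def closed_form_no_jumps(alpha: float, alpha_star: float, t: float) -> float:
    """Solution for constant hazards when N does not jump before t."""
    return math.exp((alpha - alpha_star) * t)


if __name__ == "__main__":
    n = 10000
    dt = 1.0 / n
    path = solve_weight([0.2 * dt] * n, [0.5 * dt] * n, [0] * n, [2.5] * n)
    print(path[-1], closed_form_no_jumps(0.2, 0.5, 1.0))
```

When N does not jump, as for the weights discussed above, the hazard ratio never enters the recursion, and the solution is driven entirely by the cumulative hazard increments.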
To estimate W i L D ,t , the weights associated with L D , we suppose that there are m marks. We consider the counting processes {N i h } m h=1 that count the occurrence of each mark of individual i, having intensity where W i L D ,h,t solves Eq. 47 with We thus obtain an estimator of W i L D ,t by multiplying the estimators W i L D ,h,t , each of which solves Eq. 48. We present the choices of α i and α * ,i for the different weights in Table 4.
In summary, we suggest the following strategy for estimating the causal effects of interest:
• Identify the requisite weights from Table 3 and specify hazard models α i , α * ,i from Table 4.
• Solve Eq. 48 to obtain estimates of the weight processes.
• Obtain Ri , RD,i , R i from Eq. 45 or 46 by multiplying together the weight estimates of individual i according to Table 3.
• Solve Eq. 44 to obtain Ŷt (and/or Dt ), which estimates the expected number of events under the chosen intervention at t.
• Repeat the previous steps with a contrasting intervention on treatment to obtain the targeted causal contrast.
We use this estimation method in Sec. 7, assuming additive hazard models for the different α i 's and α * ,i 's. The estimators are implemented in the R packages transform.hazards and ahw, which are available at github.com/palryalen/. The code is found in the online supplementary material.
Table 4. Hazards and counting processes that define Eq. 47 for the different weights.

Weight
Hazards in Eq. 47

Illustrative example: a trial on treatment discontinuation
In this section, we illustrate an application of the concepts and estimators outlined in Secs.5-6 for the total effect, controlled direct effect and separable effect, using a simulated data example motivated by the hypothetical trial described in Secs.1-2.We consider a simplified setting where there is one binary pre-treatment and post-treatment common cause of future events (syncope and death), denoted by L 0 (old age at baseline) and L 1 (blood pressure after treatment initiation) respectively.
In the data-generating model, we first sampled L 0 , L 1 according to Next, we generated the processes (Y k , D k , C k ) using the hazards The implementation of the data generating mechanism is shown in the online supplementary material. The data generating process is consistent with the causal DAGs in Fig. 5. Fig. 5 encodes the assumption that the A Y component (antihypertensive treatment) affects death and recurrent syncope via its effects on blood pressure (L 1 ), while the A D component (aspirin) acts directly on survival and has no effect on the recurrence of syncope except through pathways intersected by survival (D k ). The parameters of the data generating mechanism were chosen such that antihypertensive treatment (A Y = 1) increases the risk of death through the pathway A Y → L 1 → Y k → D k+1 , i.e. by lowering blood pressure, which in turn may lead to syncope and subsequent injurious falls. This is seen in Fig. 6 (A), as individuals who received antihypertensives (A Y = 1), shown by the black and red curves, experience a larger number of syncope episodes. Next, treatment with aspirin decreases the risk of death through cardiovascular protection via the pathway A D → D k+1 . As illustrated by the crossing of the black and blue curves in Fig. 6 (B), the decreased risk of death due to aspirin through the pathway A D → D k+1 is compensated by the increased risk of death under antihypertensive treatment through A Y → L 1 → Y k → D k+1 . Therefore, discontinuation of antihypertensives only, i.e. (A Y = 0, A D = 1), gives the highest survival in this example. This illustrates the role and interpretation of the separable effect: even though a trial investigator only observes individuals in treatment levels A ∈ {0, 1}, the separable effects allow us to make inference under the hypothetical decomposed intervention (A Y = 0, A D = 1), which is not possible using conventional estimands such as the total effect or controlled direct effect.
By transforming the graphs in Fig. 5 to single world intervention graphs (Richardson and Robins, 2013b) corresponding to Figs. 2-4, it is straightforward to verify that the exchangeability (Eqs. 12-13, 19-20 and 24-25) and dismissible component conditions (Eqs. 27-30) are satisfied by the causal model. Because all positivity and consistency conditions also hold by construction in the data generating model, the total, controlled direct and separable effects are identified by the respective functionals given in Secs. 5.1-5.3.

] reflects that the term θ D is included for that estimand in Table 3, coming from the fact that W D (a Y , a D ) shares its jump times with N D .

Table 5. The left column includes the weights, the middle column includes the hazards that define Eq. 47, and the right column includes the parametric hazard models that were used in the data analysis.

Weight
Hazards Hazard models fitted ] for a Y = a D using the estimators described in Sec.6 for 1000 simulated individuals in each of the treatment groups A = 0 and A = 1.It follows from the data-generating model defined in the beginning of Sec. 7 and Table 5 that the assumed hazard models are correctly specified (in the case of W D , W D , and W D (a, a D )).We could allow for a more involved data generating mechanism with time-varying coefficients.The assumed hazard models would still provide consistent estimates if the additive structure is correctly specified.Thus, an investigator can adapt the estimators in the supplementary R markdown file to other recurrent event problems.For the given sample size, the point estimates shown in Fig. 6 are unbiased (see the data-generating mechanism described in Sec. 7).

Discussion
We have used a counterfactual framework to formally define estimands for recurrent outcomes that differ in the way they treat competing events.The controlled direct effect is a contrast of counterfactual outcomes which implies that competing events are considered to be a form of censoring.The total effect captures all causal pathways between treatment and the recurrent event, and the separable effect quantifies contrasts in expected outcomes under independent prescription of treatment components.
Further, we have given formal conditions for identifying these effects, and demonstrated how to evaluate the identification conditions in causal graphs.This allowed us to formally describe how the causal estimands map to classical statistical estimands for recurrent events based on counting processes in the limit of fine discretizations of time.
In settings with competing events, it is often of interest to disentangle the effect on the recurrent event from the effect on the competing event.The controlled direct effect often fails to do so in a scientifically insightful way, because it is not clear which intervention, if any, eliminates the occurrence of the competing event.The interpretation of the direct effect is therefore unclear.The separable effect corresponds (by design) to a well-defined intervention in which we conceptually decompose the original treatment into components, which we then assign independently of each other.The practical relevance of the estimand relies on the plausibility of modified treatments.The process of conceptualizing modified treatments can motivate future treatment development and sharpen research questions about mechanisms (Robins and Richardson, 2011;Robins et al., 2020;Stensrud et al., 2020b).
Stronger assumptions are needed to identify the (controlled) direct effect and separable effects compared to the total effect.For example, these estimands require the investigator to measure common causes of the recurrent event and failure time, even in an ideal randomized trial such as in Sec. 2. The need for stronger assumptions is far from unique to our setting, and it is analogous to the task of identifying per-protocol effects in settings with imperfect adherence and mediation effects.
The use of a formal (counterfactual) framework to define causal effects elucidates analytic choices regarding treatment recommendations.The formal causal framework makes it possible to define effects with respect to explicit interventions, and to explicitly state the conditions in which such effects can be identified from observed data.This also makes it possible to transparently assess the strength and validity of the identifying assumptions in practice.
Next, we use the identification conditions and the law of total probability (LOTP) to add variables sequentially to the conditioning set in temporal order. We have that P (∆Y a,c=0 k = ∆y k ) is given by The conditional independence relation in the third line follows because C 0 ≡ 0 deterministically. Iterating this procedure for time indices k ∈ {1, . . ., k} gives Positivity ensures that the conditioning sets on the RHS have a non-zero probability. Finally, by consistency we have that Next, we derive Eq. 18. By the presence of the indicator functions in Eq. 18 and by consistency (Eq. 16) we have that .
Next, using the law of total expectation, the above is equal to The numerator and denominator of the fraction in the final line, indicated by ( ), differ only by the time index of Y a,c=0 in the conditioning set.Using the fact that (which follows from Eq. 13) we have that ( ) = 1, and thus by consistency (Eq.16), .
Iterating this procedure from j = i − 1 to j = 0 gives Using the law of total expectation again, RHS is equal to Another IPW representation also exists.We have that .
Arguing iteratively from j = i − 1 to j = 0, the RHS is equal to Putting everything together, we have that To conclude, we remark that the exchangeability conditions Eqs. 12-13 and identification formulas Eq. 17-18 follow directly from a general identification result, Theorem 31 of Richardson and Robins (2013b), by choosing outcome B.1.1.Limit of fine discretizations.We begin by noting that ∆Y k in Eq. 53 can only be non-zero if the individual has not experienced the competing event by beginning of the time interval k − 1.Therefore, where we have used the laws of probability in the second line.Using Bayes' law sequentially, we have that To proceed, we define modified intensities of the recurrent event process Next, let π(•) = P (A = •) and consider the stabilized weights .
The weight W C,i is a ratio of Kaplan-Meier survival terms with respect to the censoring event. Let us also define the hazard of the competing event by Putting everything together, we have that Eq. 54 enables us to establish a correspondence with estimands in the counting process literature, as discussed in Sec. 5.4.2, and also motivates estimators that we described in Sec. 6. The product term in Eq. 54 is a survival term with respect to the competing event, and the expectation is over weighted increments of the recurrent syncope. In the limit of fine discretization of time, Eq. 54 converges to

B.1.2. Competing event. In order to identify E[D a,c=0 k ] from the observed data, we require the following two exchangeability assumptions instead of Eqs. 12 and 13 Using analogous arguments as for the recurrent event Y , identification of E[D a,c=0 k ] is achieved under Eqs. 55-56 and Eqs. 14-16 by Likewise, in the limit of fine discretizations of time, the cumulative incidence of the competing event is given by When treatment A is randomly assigned and Eq. 87 holds with L(t) = N c (t) = ∅ (which is the usual independent censoring condition in survival analysis without any covariates (Aalen et al., 2008)), then W A = W C,t = 1 and Eq. 58 reduces to This demonstrates sufficient conditions under which the discrete-time identification formula Eq. 29 in Young et al. (2020) converges to the usual representation of the cumulative incidence function in survival analysis.
B.2. Controlled direct effect.The identification conditions and identification formulas for direct effect are a special case of the identification results for total effect, redefining the censoring indicator as max(C i , D i ) (i.e. the first occurrence of the competing event and loss to follow-up), and re-defining the competing event as an event that almost surely does not occur.This gives us Next, we define to be modified intensities of the competing event process.This allows us to re-write Eq. 59 as where Under randomization of A and under the strong independent censoring assumption (Eq.40), W A = W C,i = W D,i = 1 and thus Eq. 60 converges to in the limit of fine discretizations of time.B.3.Separable effects.We begin by assuming the modified treatment assumption (Eq.4) and the identification conditions Exchangeability for all a ∈ {0, 1}, k ∈ {0, . . ., K + 1}. Consistency Consider the setting in which we conduct a four armed trial in which the A Y and A D are randomly assigned, independently of each other.We require the following dismissible component conditions to hold in the four armed trial To proceed, we introduce the following lemmas: Lemma 1.Under a FFRCISTG model, the dismissible component conditions (Eqs.27-30) imply the following equalities for a Y , a D ∈ {0, 1} ) , (68) Proof.We show the equality of Eq. 66, as Eqs.67-69 follow from analogous arguments, using Eqs.28-30 instead of Eq. 27.
The second and fourth line hold by consistency and by randomization of A Y and A D in the four armed trial.
Lemma 2. Suppose the exchangeability and positivity conditions Eqs. 24-25 and Eqs.62-64 hold.Define A = (A Y , A D ) and a = (a Y , a D ).We then have for all j ∈ {0, . . ., K + 1} that The quantities on the LHS are identified in the four armed trial, whereas the quantities on the RHS are identified in the two armed trial.
Proof.We show the equality for Eq.74, as Eqs.75-77 follow from analogous arguments using Eqs.28-30 instead of Eq. 27.We have that To derive the identification formula for separable effects, we proceed by sequential application of Bayes' theorem: (78) The quantities on RHS of Eq. 78 are identified in the four armed trial.The final identification formula for separable effects, which is a function of observed quantities in the two armed trial, follows directly from application of Lemma 3, which gives   Writing out RHS of Eq. 80 as a discrete sum over the density in Eq. 81, we have that RHS of Eq. 80 is equal to In the following result, we establish a correspondence between the exchangeability assumption and the classical independent censoring assumption in event history analysis.
Proof.Eq. 36 is equivalent to the statement ∆Y j ⊥ ⊥ C j | D j , L j−1 , Y j−1 , A for j ∈ {1, . . ., K + 1} .(86) Under faithfulness, a violation of Eq. 86 is equivalent to the existence of one of the three paths (1) (3) C k → Y k for some k ∈ {1, . . ., K + 1}, where X ∈ {L k , Y k−1 , D k , A}.Likewise, the violation of Eq. 36 is equivalent to the existence of one of the paths , A}.By the properties of transforming a DAG into a SWIG (Richardson and Robins, 2013a), and by consistency (Eq.16), the existence of (1') implies the existence of (1), and the existence of (2') implies the existence of (2).5 It follows that violation of Eq. 13 implies violation of Eq. 86, and consequently of Eq. 36.
An analogous relation exists between identification conditions for the competing event. The classical independent censoring assumption for the competing event takes the form where Since ∆D j ∈ {0, 1}, this can be written as Under faithfulness, exchangeability (Eq. 56) is implied by Eq. 88. The contrast of Eq. 2 and Eq. 4 in Robins and Finkelstein (2000) is similar to this correspondence.

3.3. Separable effects. Following Stensrud et al. (2020a,b), consider two modified treatments A Y and A D such that receiving A Y = A D = a results in the same outcomes as receiving A = a for a ∈ {0, 1}; formally, let M Y and M D be two random variables, and suppose that the following conditions hold for the original treatment A and the modified treatments A Y , A D : All effects of A, A Y and A D on Y k and D k , k ∈ {0, . . ., K}, are intersected by M Y and M D , respectively, and

Figure 3. Identification of the (controlled) direct effect.In contrast to Figs. 2 (A) and (B), unmeasured common causes of Y and D, shown by the red path D a,c=d=0 k+1 k and L D,k satisfying Eqs.29-30 respectively.These conditions require that L D,k captures all effects of A D on Y c=0 k+1 , whereas L Y,k captures all effects of A Y on D c=0 k+1 .An example of L Y,k could be blood pressure measured at time k, which influences the cardiac risk.In our example on treatment discontinuation, we suppose that Eqs.27-30 hold with L A D = ∅.A detailed discussion of the decomposition of L k into L D and L Y is given in Stensrud et al. (2021).

Figure 4. Graphical evaluation of the dismissible component conditions Eqs. 27-28 when only baseline covariates are measured. Examples of violations of the conditions are shown as red paths. (A) Dismissible component conditions can be violated by unmeasured mediators such as $M_{A_Y}$ and $M_{A_D}$: Eq. 27 is violated by the path $A_D \to M_{A_D} \to Y^{c=0}_{k+1}$ and Eq. 28 is violated by the path $A_Y \to M_{A_Y} \to D^{c=0}_{k+1}$. (B) Eq. 27 is violated by common causes of Y and D, such as the path $A_D \to D^{c=0}_{k+1} \leftarrow U_{DY} \to Y^{c=0}_{k+1}$, a collider path which opens when conditioning on $D^{c=0}_{k+1}$. Likewise, Eq. 28 is violated by the path $A_Y \to Y^{c=0}$

and $dA^D_t(a) = P(T^D \in [t, t + dt) \mid T^D \geq t, C \geq t, A = a)$. In this setting, $\Lambda^C_t$ and $\Lambda^{C|F}_t$ are the compensators of $N^C$ with respect to $\mathcal{F}^{C,D,A}_t$ and $\mathcal{F}^{L,Y,C,D,A}_t$

Figure 5. (A) Causal graph illustrating a hypothetical four-armed trial where $A_Y$ (antihypertensive agents) and $A_D$ (aspirin) are assigned freely. (B) The observed two-armed trial in which $A_Y$ and $A_D$ are assigned jointly ($A_Y \equiv A_D \equiv A$). Censoring nodes $C_k$ are not shown because they are not connected to any other nodes under the chosen data generating mechanism.

Figure 6. Different effects of treatment (dis)continuation of antihypertensives ($A_Y$) and aspirin ($A_D$) on recurrent syncope and survival are shown in (A) and (B), under a hypothetical data generating mechanism. The superscript $c = 0$, denoting an intervention to prevent loss to follow-up, has been suppressed in plot legends to reduce clutter. Estimates using the risk set estimator in Eq. 45 with a sample size of 1000 individuals in each treatment arm are shown in (C) and (D). Pointwise 95% confidence intervals obtained from a bootstrap sample of 400 are indicated by the shaded regions. The width of the confidence interval for Ê[D a Y =0,a D =1 k

$$
\begin{aligned}
= {} & \sum_{a} \sum_{c_i} \sum_{\Delta y_i} \sum_{d_i} \sum_{l_i} P(A = a, C_i = c_i, \Delta Y_i = \Delta y_i, D_i = d_i, L_i = l_i) \cdot \Delta y_i \\
& \times \frac{I(a = a_Y)}{P(A = a_Y \mid L_0 = l_0)} \times \frac{I(c_i = 0)}{\prod_{j=0}^{i} P(C_j = 0 \mid C_{j-1} = 0, D_{j-1} = d_{j-1}, L_{j-1} = l_{j-1}, Y_{j-1} = y_{j-1}, A = a_Y)} \\
& \times \frac{\prod_{j=0}^{i} P(D_j = d_j \mid C_j = 0, L_{j-1} = l_{j-1}, Y_{j-1} = y_{j-1}, D_{j-1} = d_{j-1}, A = a_D)}{\prod_{j=0}^{i} P(D_j = d_j \mid C_j = 0, L_{j-1} = l_{j-1}, Y_{j-1} = y_{j-1}, D_{j-1} = d_{j-1}, A = a_Y)} \\
& \times \frac{\prod_{j=0}^{i} P(L_{D,j} = l_{D,j} \mid L_{j-1} = l_{j-1}, Y_j = y_j, D_j = d_j, C_j = c_j, A = a_D)}{\prod_{j=0}^{i} P(L_{D,j} = l_{D,j} \mid L_{j-1} = l_{j-1}, Y_j = y_j, D_j = d_j, C_j = c_j, A = a_Y)} \\
= {} & E[\Delta Y_i^{a_Y, a_D, c=0}] .
\end{aligned}
$$

B.3.2. Limit of fine discretizations. To proceed, we define the weights $W_{L_D,i}(a_Y, a_D)$ = of probability, we may write Eq. 80 as E[Y a Y ,a D ,∆A D j (a Y )] × $E[W_A W_{C,i} W_{D,i}(a_Y, a_D) W_{L_D,i}(a_Y, a_D) \Delta Y_i \mid C_i = 0, D_{i-1} = 0, A = a_Y]$. (84)

Using analogous arguments where we start from Eq. 81 and replace $\Delta Y_k$ by $\Delta D_k$, we have that $E[D_k^{a_Y, a_D, c=0}]$ is identified by E[D a Y ,a D ,∆A D j (a Y )] × $E[W_A W_{C,i} W_{D,i}(a_Y, a_D) W_{L_D,i}(a_Y, a_D) \Delta D_i \mid C_i = 0, D_{i-1} = 0, A = a_Y]$. (85)

As expected, we recover the identification formulas for the total effect when choosing $a_Y = a_D = a$ in Eqs. 84 and 85.

Appendix C. Correspondence of the independent censoring assumption

We begin by introducing the notion of faithfulness, following Spirtes et al. (2000).

Definition 4. A law $P$ is faithful to a causal directed acyclic graph $G$ if for any disjoint sets of nodes $A, B, C$ we have that $A \perp\!\!\!\perp C \mid B$ under $P$ implies $(A \perp\!\!\!\perp C \mid B)_G$, where $(\cdot)_G$ denotes graphical d-separation.
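The graphical side of this correspondence, deciding whether a conditional independence such as Eq. 86 holds in a DAG by d-separation, can be checked mechanically. The following is a minimal sketch in Python using the standard moralized ancestral graph criterion. The function names and the single-interval toy graphs (node labels `A`, `L`, `D`, `C`, `dY`, `U`, `AD`, `Y`) are illustrative assumptions, not the graphs from the paper's figures.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `edges`.

    Uses the moralization criterion: restrict to the ancestral set of
    xs, ys and zs, marry co-parents, drop directions, delete zs, and
    check that no undirected path connects xs to ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    keep = ancestors(parents, xs | ys | zs)

    # Build the moral graph on the ancestral set.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pa = parents.get(v, set())
        for p in pa:                      # parent-child edges
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(pa, 2):  # marry co-parents
            nbrs[p].add(q)
            nbrs[q].add(p)

    # Search from xs while never entering the conditioning set zs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in ys:
            return False
        stack.extend(n for n in nbrs[v] if n not in zs and n not in seen)
    return True


# Hypothetical one-interval graph: censoring C depends only on measured A and L.
base = [("A", "C"), ("L", "C"), ("A", "dY"), ("L", "dY"),
        ("A", "D"), ("D", "dY")]
cond = {"D", "L", "A"}

print(d_separated(base, {"dY"}, {"C"}, cond))
# An unmeasured common cause U of C and dY opens a path.
print(d_separated(base + [("U", "C"), ("U", "dY")], {"dY"}, {"C"}, cond))
# A direct arrow C -> dY also breaks the separation.
print(d_separated(base + [("C", "dY")], {"dY"}, {"C"}, cond))

# Collider path AD -> D <- U -> Y: blocked marginally, open given D.
collider = [("AD", "D"), ("U", "D"), ("U", "Y")]
print(d_separated(collider, {"AD"}, {"Y"}, set()))
print(d_separated(collider, {"AD"}, {"Y"}, {"D"}))
```

In the toy graphs, the separation holds only when censoring has no unmeasured common cause with the outcome increment and no direct arrow into it. The last two calls show the collider behaviour: the path is blocked marginally and opens once `D` is conditioned on.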