Estimation of separable direct and indirect effects in a continuous-time illness-death model

In this article we study the effect of a baseline exposure on a terminal time-to-event outcome either directly or mediated by the illness state of a continuous-time illness-death process with baseline covariates. We propose a definition of the corresponding direct and indirect effects using the concept of separable (interventionist) effects (Robins and Richardson in Causality and psychopathology: finding the determinants of disorders and their cures, Oxford University Press, 2011; Robins et al. in arXiv:2008.06019, 2021; Stensrud et al. in J Am Stat Assoc 117:175–183, 2022). Our proposal generalizes Martinussen and Stensrud (Biometrics 79:127–139, 2023) who consider similar causal estimands for disentangling the causal treatment effects on the event of interest and competing events in the standard continuous-time competing risk model. Unlike natural direct and indirect effects (Robins and Greenland in Epidemiology 3:143–155, 1992; Pearl in Proceedings of the seventeenth conference on uncertainty in artificial intelligence, Morgan Kaufmann, 2001) which are usually defined through manipulations of the mediator independently of the exposure (so-called cross-world interventions), separable direct and indirect effects are defined through interventions on different components of the exposure that exert their effects through distinct causal mechanisms. This approach allows us to define meaningful mediation targets even though the mediating event is truncated by the terminal event. We present the conditions for identifiability, which include some arguably restrictive structural assumptions on the treatment mechanism, and discuss when such assumptions are valid. The identifying functionals are used to construct plug-in estimators for the separable direct and indirect effects. We also present multiply robust and asymptotically efficient estimators based on the efficient influence functions. 
We verify the theoretical properties of the estimators in a simulation study, and we demonstrate the use of the estimators using data from a Danish registry study.


Introduction
Mediation analysis is an important tool in medical and epidemiological research for understanding the mechanisms that contribute to the overall effect of a treatment or exposure on an outcome of interest. Within the causal inference literature on mediation analysis, the target estimands of interest are often the natural (pure) direct and indirect effects (Robins and Greenland 1992; Pearl 2001), which together provide a nonparametric decomposition of the total treatment effect. A comprehensive overview of mediation analysis methods from a causal inference perspective can be found in VanderWeele (2015).
In this paper we study a continuous-time illness-death process where the potential mediator is the illness state. We are interested in the direct and indirect effect of a baseline exposure on the terminal event, adjusted for a set of pre-exposure covariates. This type of target estimand is often relevant when analysing real-world data. We shall illustrate our method using a Danish registry study investigating the effects of dual antiplatelet therapy (DAPT) after myocardial infarction (MI) or stroke on mortality. DAPT is a treatment that combines aspirin and a second antiplatelet agent, which is often prescribed to MI or stroke patients to prevent blood clotting. It is well known that DAPT is associated with a lower risk of a recurrent cardiovascular event (Wallentin et al. 2009), which in turn is associated with increased mortality; this is the indirect effect of interest. At the same time DAPT has other effects that are associated with increased mortality, most notably it increases the risk of gastrointestinal bleeding (Kazi et al. 2015; Dinicolantonio et al. 2013); this is the direct effect of interest.
The conventional definition of natural direct and indirect effects is based on so-called cross-world quantities which require that we manipulate the mediator for each exposed individual to what would have occurred under non-exposure. Such quantities are not well defined in the illness-death setting since the mediator is effectively undefined when the terminal event occurs before the mediating event. This has implications for formulating the causal mediation targets of interest.
The term 'semi-competing risks' is often used in the literature when the outcome of interest is a non-terminal event that competes with a terminal time-to-event (Fine et al. 2001). We find that the definition of this term is unclear, as discussed in Stensrud et al. (2021), and will refrain from using it in this paper. We will use the term 'truncation' to describe the phenomenon when occurrence of the terminal event renders the intermediate event undefined, and the term 'illness-death process' to describe the underlying data structure.
The challenges that arise when defining mediation targets for the illness-death model are similar to the well-known challenges that arise when defining mediation targets for a survival outcome with a time-dependent mediator. Recent approaches in the literature redefine the target of interest beyond that of natural direct and indirect effects using randomized interventions (Zheng and van der Laan 2017; Lin et al. 2017), path-specific effects (Vansteelandt et al. 2019) or separable effects (Didelez 2019; Aalen et al. 2020). While the setting in these papers is more general in that they allow for adjustment for time-varying covariates, they assume that the mediator process is measured at discrete time-points, and are thus not directly applicable to our setting where we allow the mediator process to change in continuous time.
Similar to Didelez (2019) and Aalen et al. (2020) we propose a definition of the direct and indirect effects using a treatment separation approach which is commonly referred to as the 'separable effects' approach (Stensrud et al. 2022, 2021) or 'interventionist' approach (Robins and Richardson 2021) to causal mediation analysis. Based on an idea by Robins and Richardson (2011), this approach considers a hypothetical treatment decomposition under which it is possible to consider manipulations of the mediator independently of the treatment given. This is done by assuming that treatment has two binary components, a 'direct' one which is thought to affect the terminal event directly, and an 'indirect' one which only affects survival through its effect on the intermediate event, and that the two components can be intervened upon separately. This makes it possible to define meaningful mediation targets even when the mediating event is truncated by death. The aim of this paper is to show how this approach can be applied to the continuous-time illness-death setting, and to derive estimators using semiparametric theory. In particular, the identifiability conditions and estimators we propose in this paper are an extension of Martinussen and Stensrud (2023), who consider similar causal targets and estimators in a continuous-time competing risk model.
The paper is organized as follows: In Sect. 2 we introduce the irreversible illness-death model as a stochastic process and describe the observed data structure. In Sect. 3 we formulate the targets of interest and present the identifiability conditions. In Sect. 4 we derive the efficient influence functions and establish their multiple robustness properties. We also suggest two estimators: a plug-in estimator based on the identifying functional and a one-step estimator based on using the efficient influence function as an estimating equation. We examine the performance of the estimators in a simulation study in Sect. 5. Section 6 illustrates the methods in the Danish registry data application. In Sect. 7 we provide further discussion. Proofs and additional technical details are given in the Appendices.

Illness-death model
We consider an irreversible illness-death model, as depicted in Fig. 1. Following Andersen et al. (2012) the illness-death model is a stochastic process $\{X(t)\}_{t \in [0,\infty)}$ with right-continuous sample paths and state space $\{1, 2, 3\}$, where state 1 is the initial 'healthy' state, state 2 is the intermediate 'illness' state and state 3 corresponds to the absorbing state 'death'. We assume that $X(0) = 1$, i.e. all subjects start in the initial 'healthy' state. We further assume that 2 → 1 transitions are not possible, i.e. the process is irreversible. In our DAPT example a patient enters state 1 when experiencing a myocardial infarction (MI) for the first time. The patient stays in state 1 until they either die or experience a recurrent cardiovascular event. In the latter case the patient moves to state 2 where they remain until death.
Then the hazards for the transitions between states 1 → 2 , 1 → 3 and 2 → 3 , respectively, are defined as follows
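A standard way to write such transition hazards, in the spirit of Andersen et al. (2012), is sketched here. The symbols $\alpha_{jk}$, $\Lambda_{jk}$ and the history $\mathcal{F}_{t-}$ are notational assumptions made for this sketch, and conditioning on treatment and covariates is suppressed.

```latex
\alpha_{jk}(t) = \lim_{h \downarrow 0} \frac{1}{h}\,
  P\{X(t+h) = k \mid X(t) = j,\ \mathcal{F}_{t-}\},
\qquad
\Lambda_{jk}(t) = \int_0^t \alpha_{jk}(s)\,ds,
\qquad (j,k) \in \{(1,2),\ (1,3),\ (2,3)\}.
```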

Data structure
Let $A \in \{0, 1\}$ be a baseline treatment indicator, and $W \in \mathcal{W} = \mathbb{R}^d$ a vector of baseline covariates. The full uncensored data are $Z = \{T_2, T_1, \eta, A, W\} \sim Q$ where $Q$ is a probability distribution belonging to a non-parametric statistical model $\mathcal{Q}$. Let $\mu$ be the density of $W$ and $\pi(\cdot \mid W)$ be the conditional distribution of $A$ given $W$, which we will refer to as the propensity score. The underlying density $q$ of the data $Z$ under $Q$ factorizes as in Eq. (1), in which $S_1$ is the survival probability for the patients in state 1 and $S_2$ is the survival probability for patients in state 2. We also let $N_{13}(s) = I(T_2 \le s, \eta = 0)$, $N_{12}(s) = I(T_1 \le s, \eta = 1)$ and $N_{23}(s) = I(T_2 \le s, \eta = 1)$ denote the full-data counting processes corresponding to the transitions between states 1 → 3, 1 → 2 and 2 → 3, respectively. In our DAPT example $N_{13}$ is the counting process which jumps when a patient in the study dies without having a recurrent cardiovascular event. Further, $N_{12}$ jumps when a patient experiences a recurrent cardiovascular event, and $N_{23}$ jumps when a patient in the study dies after having experienced a recurrent cardiovascular event.
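To make the full-data structure concrete, the following is a minimal sketch of how the three counting processes could be evaluated on simulated data. It assumes constant transition hazards and the convention that the illness time is set to infinity when illness never occurs; neither is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def simulate_full_data(n, a12, a13, a23, rng):
    """Draw (T1, T2, eta) from an irreversible illness-death process.

    Assumption for illustration only: constant hazards a12, a13, a23.
    """
    # Time of leaving state 1 and type of exit (two competing exponentials).
    exit_time = rng.exponential(1.0 / (a12 + a13), size=n)
    eta = (rng.random(n) < a12 / (a12 + a13)).astype(int)  # 1 = illness first
    # Convention assumed here: T1 is infinite if the subject dies while healthy.
    t1 = np.where(eta == 1, exit_time, np.inf)
    # Death time: the exit time itself, or the illness time plus the state-2 sojourn.
    sojourn2 = rng.exponential(1.0 / a23, size=n)
    t2 = np.where(eta == 1, exit_time + sojourn2, exit_time)
    return t1, t2, eta

def counting_processes(t1, t2, eta, s):
    """Evaluate N13(s), N12(s) and N23(s) for every subject."""
    n13 = ((t2 <= s) & (eta == 0)).astype(int)  # death without prior illness
    n12 = ((t1 <= s) & (eta == 1)).astype(int)  # illness
    n23 = ((t2 <= s) & (eta == 1)).astype(int)  # death after illness
    return n13, n12, n23
```

At any time point each subject has at most one of N13 and N23 equal to one, and N23 can only have jumped if N12 has.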

Right censoring
We allow for right censoring with $C$ denoting the censoring variable corresponding to the time at which an individual would be lost to follow-up. Under right censoring we only observe $\tilde{T}_1 = T_1 \wedge C$, $\tilde{T}_2 = T_2 \wedge C$ and the indicators $\tilde{\delta} = I(T_2 < C)$ and $\tilde{\eta} = I(\tilde{T}_1 < \tilde{T}_2)$. The observed data may then be represented as $O = \{\tilde{T}_2, \tilde{\delta}, \tilde{T}_1, \tilde{\eta}, A, W\} \sim P$ where $P$ belongs to a non-parametric statistical model $\mathcal{P}$.
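The mapping from full data to observed data can be sketched as follows. The function and the name `delta_obs` for the death indicator are hypothetical, and the illness time is assumed to be infinite for subjects who never become ill.

```python
import numpy as np

def censor(t1, t2, c):
    """Map full data (T1, T2) and censoring times C to the observed data."""
    t1, t2, c = (np.asarray(x, dtype=float) for x in (t1, t2, c))
    t1_obs = np.minimum(t1, c)                 # observed illness time
    t2_obs = np.minimum(t2, c)                 # observed death time
    delta_obs = (t2 < c).astype(int)           # 1 if death is observed
    eta_obs = (t1_obs < t2_obs).astype(int)    # 1 if illness is observed
    return t1_obs, t2_obs, delta_obs, eta_obs

# Four subjects: ill then dead, dead while healthy,
# ill then censored, censored before any event.
t1_obs, t2_obs, delta_obs, eta_obs = censor(
    t1=[2.0, np.inf, 3.0, 6.0],
    t2=[5.0, 4.0, 8.0, 9.0],
    c=[10.0, 10.0, 5.0, 1.0],
)
```

A subject censored before illness has equal observed times, so the observed illness indicator is zero even if illness would have occurred later.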
We make the coarsening at random (CAR) assumption, i.e., we assume that the coarsening probabilities only depend on the data as a function of the observed data. This assumption is stated more formally in Appendix A. Under CAR we can define the increments of the censoring martingale in terms of the censoring counting process, which counts the observed censored observations up to and including time $s$, and the corresponding censoring intensity. We also define the probabilities of being uncensored for patients in state 1 and state 2, respectively.

Separable direct and indirect effects
To define our estimand of interest we will use the concept of separable effects (Robins and Richardson 2011; Robins et al. 2021; Stensrud et al. 2022), which was briefly introduced in Sect. 1. This approach to mediation analysis moves the focus from intervening on the mediator process, which is conceptually problematic in the illness-death setting, to interventions on different components of the treatment $A$. To make the treatment decomposition more explicit we will think of the treatment $A$ as having two binary components which we will denote $A_I$ and $A_D$. As depicted in Fig. 2 we will assume that the component $A_I$ only affects the terminal event through its effect on the intermediate event, and that the component $A_D$ only affects the terminal event directly. We will think of the corresponding four-arm trial as our 'target trial' and will define our target parameters based on the counterfactual variables defined by this target trial. In the observed data we have either $A_D = A_I = 1$ or $A_D = A_I = 0$, but we presume that an intervention is possible where $A_D \neq A_I$, i.e. the components could be set to different values. If such treatment components are assumed to exist and appropriate identification assumptions hold, then it is not necessary to conduct the four-arm target trial. In fact the target parameters may be identified from the observed two-arm trial under the assumptions stated in Lemma 1 below.
This way of thinking about mediation analysis in terms of 'separable effects' can be useful when investigators want to know whether a specific mechanism of exposure is associated with the outcome. Often the hypothesis of interest concerns a specific 'active ingredient' of the exposure which may be difficult or impossible to measure.
In our example from Sect. 1 DAPT has been shown to have a protective effect on recurrent cardiovascular events, and is therefore often prescribed to MI or stroke patients. However, DAPT is also associated with an increased risk of major bleeding (Wallentin et al. 2009). One of the primary forms of bleeding is gastrointestinal bleeding due to ulcers (Kazi et al. 2015; Dinicolantonio et al. 2013). We can then imagine a hypothetical treatment component $A_D$ which has the same effect as DAPT on mortality, but lacks any effect on cardiovascular events, and a hypothetical treatment component $A_I$ which has the same effect as DAPT on cardiovascular events but no direct effect on mortality. These treatment components do not necessarily correspond to meaningful real-world quantities. However, it can sometimes be useful to imagine them as hypothetical combination treatments. Assuming that gastrointestinal bleeding is the main effect of DAPT besides its effect on cardiovascular outcomes, the $A_I$ component would correspond to a modified treatment that does not promote ulcers. In practice, a drug that combines DAPT with an additional drug that promotes healing of ulcers and thereby nullifies the harmful effect of DAPT may resemble this hypothetical treatment. For instance, a recent Danish registry study has shown that proton pump inhibitors (PPIs) can induce ulcer healing among patients treated with DAPT (Sehested et al. 2019).
It is important to note that the validity of the approach does not depend on whether the treatment components correspond to meaningful real life quantities.The validity of the approach does however depend crucially on the assumption that the two treatment components can be manipulated separately which is a strong assumption.

Parameter of interest
For $j = 1, 2$ we let $T_j^{a_D,a_I}$ denote the counterfactual event times under an intervention that sets $A_D$ to $a_D$ and $A_I$ to $a_I$ and let $T_j^{a}$ denote the counterfactual event times under an intervention that sets $A = a$ in the observed two-arm trial.
Then, the separable direct effect (SDE) and separable indirect effect (SIE) of the illness-death model are respectively defined as
$$\mathrm{SDE}(\tau, a_I) = E\{I(T_2^{1,a_I} \le \tau)\} - E\{I(T_2^{0,a_I} \le \tau)\}, \quad a_I \in \{0, 1\}, \qquad (2)$$
and
$$\mathrm{SIE}(\tau, a_D) = E\{I(T_2^{a_D,1} \le \tau)\} - E\{I(T_2^{a_D,0} \le \tau)\}, \quad a_D \in \{0, 1\}, \qquad (3)$$
where $E(\cdot)$ denotes expectations computed under the data-generating distribution.
That is, the SDE is the counterfactual contrast under $A_D = 1$ and $A_D = 0$ when $A_I$ is fixed at some level $a_I$. The SIE is the counterfactual contrast under $A_I = 1$ and $A_I = 0$ when $A_D$ is fixed at $a_D$.
Note that the separable direct and indirect effects add up to the total treatment effect.
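The additivity can be spelled out by adding and subtracting a cross term. In the sketch below, $\Psi(\tau, a_D, a_I)$ is shorthand (assumed here) for $E\{I(T_2^{a_D,a_I} \le \tau)\}$, $\mathrm{TE}(\tau)$ is a label assumed for the total effect, and $T_2^{a,a} = T_2^{a}$ is taken for granted. Two decompositions are possible, depending on the level at which the other component is held fixed.

```latex
\begin{align*}
\mathrm{TE}(\tau) &= \Psi(\tau, 1, 1) - \Psi(\tau, 0, 0) \\
 &= \{\Psi(\tau, 1, 1) - \Psi(\tau, 0, 1)\} + \{\Psi(\tau, 0, 1) - \Psi(\tau, 0, 0)\}
  = \mathrm{SDE}(\tau, a_I = 1) + \mathrm{SIE}(\tau, a_D = 0) \\
 &= \{\Psi(\tau, 1, 0) - \Psi(\tau, 0, 0)\} + \{\Psi(\tau, 1, 1) - \Psi(\tau, 1, 0)\}
  = \mathrm{SDE}(\tau, a_I = 0) + \mathrm{SIE}(\tau, a_D = 1).
\end{align*}
```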

Identifiability conditions
In order to identify the parameters of the target trial given in Eqs. (2) and (3) we rely on Assumptions A.0-A.4 of Lemma 1. Assumption A.0 is a separable effects analog of the consistency assumption. Assumptions A.1-A.3 are standard assumptions for causal inference. Assumption A.4 comprises the so-called dismissible components conditions, which extend the dismissible components conditions in Martinussen and Stensrud (2023) to the illness-death setting. In particular, assumption (Δ1) states that the counterfactual hazards of the 1 → 2 transition are equal under all values of $a_D$, and assumption (Δ2) states that the counterfactual hazards of the 1 → 3 transition are equal under all values of $a_I$. Lastly, assumption (Δ3) states that the counterfactual hazards of the 2 → 3 transition are equal under all values of $a_I$. When the treatment components correspond to meaningful real-world treatments, the dismissible components conditions are empirically verifiable in future trials.

Under assumptions A.1-A.4 we have,
The dismissible components conditions are violated if the $A_D$ and $A_I$ components cannot be manipulated separately. In our DAPT example this would be the case if the biological pathways through which the medication affects MI or stroke are intertwined with the pathways through which it affects bleeding. The dismissible components conditions are also violated if there is an unmeasured common cause of the risk of the intermediate and the terminal event. This is similar to the classical 'no unmeasured mediator-outcome confounding' assumption which is needed to identify natural (in-)direct effects. In our DAPT example this would be the case if there is an unmeasured common cause of cardiovascular events such as MI or stroke, and death.
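The role of the dismissible components conditions can be illustrated numerically. The sketch below assumes constant, covariate-free transition hazards in each observed arm (a simplification not made in the text) and uses the standard expression for the risk of death in an illness-death process. Under the conditions described above, the 1 → 2 hazard is taken from the arm matching $a_I$ and the 1 → 3 and 2 → 3 hazards from the arm matching $a_D$. All function names and hazard values are hypothetical.

```python
import math

def death_risk(tau, a12, a13, a23):
    """P(T2 <= tau) for an illness-death process with constant hazards."""
    lam = a12 + a13
    healthy = math.exp(-lam * tau)  # still in state 1 at tau
    # Ill and alive at tau: int_0^tau exp(-lam*s) * a12 * exp(-a23*(tau-s)) ds
    if math.isclose(a23, lam):
        ill_alive = a12 * tau * math.exp(-lam * tau)
    else:
        ill_alive = a12 * (math.exp(-lam * tau) - math.exp(-a23 * tau)) / (a23 - lam)
    return 1.0 - healthy - ill_alive

def psi(tau, a_d, a_i, hazards):
    """Risk of death by tau under (a_D, a_I); hazards[arm] = (a12, a13, a23)."""
    a12 = hazards[a_i][0]   # 1 -> 2 hazard depends on a_I only
    a13 = hazards[a_d][1]   # 1 -> 3 hazard depends on a_D only
    a23 = hazards[a_d][2]   # 2 -> 3 hazard depends on a_D only
    return death_risk(tau, a12, a13, a23)

def sde(tau, a_i, hazards):
    return psi(tau, 1, a_i, hazards) - psi(tau, 0, a_i, hazards)

def sie(tau, a_d, hazards):
    return psi(tau, a_d, 1, hazards) - psi(tau, a_d, 0, hazards)

# Hypothetical arm-specific hazards (untreated arm 0, treated arm 1).
hazards = {0: (0.10, 0.05, 0.20), 1: (0.06, 0.04, 0.15)}
total_effect = psi(10.0, 1, 1, hazards) - psi(10.0, 0, 0, hazards)
```

With these inputs, `sde(10.0, 1, hazards) + sie(10.0, 0, hazards)` reproduces `total_effect`, in line with the additivity of the separable effects.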

Estimation
In this section we address the question of how to construct estimators of the estimand in Eq. (5), and thereby of the separable direct effect
$$\mathrm{SDE}(\tau, a_I) = \Psi(P; \tau, 1, a_I) - \Psi(P; \tau, 0, a_I), \qquad (6)$$
and its indirect counterpart. Efficient influence functions (EIFs) are an important concept in statistical theory for constructing estimators of causal parameters with desirable properties. In particular, estimators based on the EIF are locally efficient (Bickel et al. 1993). Moreover, they often exhibit multiple robustness properties in the sense that consistency of the estimator is preserved under misspecification of one or more components of the data distribution. Further, they are compatible with data-adaptive estimation of nuisance parameters provided certain rate conditions hold.
In this paper we focus on the first two properties and assume (semi-)parametric models for the nuisance parameters. In particular, in what follows, we let $\hat\Lambda_{12,n}$, $\hat\Lambda_{13,n}$, $\hat\Lambda_{23,n}$, $\hat\Lambda_{C,n}$, $\hat\pi_n$ denote (semi-)parametric estimators for the relevant components of the data distribution, and we let $\Lambda^*_{12}$, $\Lambda^*_{13}$, $\Lambda^*_{23}$, $\Lambda^*_{C}$ and $\pi^*$ denote the large sample limits in probability of the (possibly misspecified) estimators. We let $Q^*$ and $P^*$ denote the corresponding distributions of $Z$ and $O$ respectively. If our working model for $\Lambda_{12}$ is correctly specified then $\Lambda^*_{12} = \Lambda_{12}$, and the same holds for $\Lambda_{13}$, $\Lambda_{23}$, $\Lambda_C$ and $\pi$.
In Sect. 4.1 we derive the efficient influence function. In Sect. 4.2 we propose two types of estimators. The first is a 'plug-in' type estimator constructed by substituting estimators for the relevant part of the data distribution directly into (5). The second is a multiply robust estimator which uses the efficient influence function as an estimating equation. In Sect. 4.3 we provide details on how to construct estimators of their asymptotic variance.

Efficient influence function
Below we derive the EIF of the separable direct and indirect effects under a nonparametric model. We first derive the full-data efficient influence function and then, assuming CAR and Assumptions A.0-A.4 hold, map it to the observed data efficient influence function using results given in Tsiatis (2006). We also establish general multiple robustness properties that will be satisfied by any estimator which solves the EIF estimating function.
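Full-data efficient influence function
Here $E^*$ denotes the expectation computed under $Q^*$. In Appendix C we show that the efficient influence function for $\Psi$ at $Q^*$ is given by Eq. (8). Lemma 2 (Multiple robustness) gives conditions, holding for almost all $w$, under which the full-data efficient influence function has a multiply robust structure.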
Proof In Appendix E. ◻
The multiple robustness properties stated in the lemma above imply that the full-data influence function $\psi(Q^*)(Z)$ is a consistent estimating function of $\Psi(Q)$ when at most one of the transition intensities is inconsistently estimated.
Observed-data efficient influence function
Let $\Psi : \mathcal{P} \to \mathbb{R}$, where $P^* \mapsto \Psi(P^*; \tau, a_D, a_I) = E^* I(T_2^{a_D,a_I} \le \tau)$. In Appendix D we show that the observed data efficient influence function is given by Eq. (9), with $dM^*_{ij}(\cdot)$ denoting the observed-data martingale increments under $P^*$.
Lemma 3 (Multiple robustness) The observed-data efficient influence function admits a multiply robust structure in the sense that $E\{\psi(P^*)(O)\} = \Psi(P) - \Psi(P^*)$ if one of several conditions on the nuisance models holds.
This means that when the censoring distribution is correctly specified the same multiple robustness properties hold as in the full-data case. The censoring model and propensity score are allowed to be misspecified when all three transition intensities are correctly specified.
Efficient influence functions of the separable direct and indirect effects
Consider the mappings $P^* \mapsto \Psi_{\mathrm{SDE}}(P^*; \tau, a_I) = \Psi(P^*; \tau, 1, a_I) - \Psi(P^*; \tau, 0, a_I)$ for $a_I \in \{0, 1\}$ and $P^* \mapsto \Psi_{\mathrm{SIE}}(P^*; \tau, a_D) = \Psi(P^*; \tau, a_D, 1) - \Psi(P^*; \tau, a_D, 0)$ for $a_D \in \{0, 1\}$. It follows by the functional delta method that the efficient influence functions of the separable direct and indirect effects in (6) and (7) are given by $\psi_{\mathrm{SDE}}(P^*)(O; \tau, a_I) = \psi(P^*)(O; \tau, 1, a_I) - \psi(P^*)(O; \tau, 0, a_I)$ and $\psi_{\mathrm{SIE}}(P^*)(O; \tau, a_D) = \psi(P^*)(O; \tau, a_D, 1) - \psi(P^*)(O; \tau, a_D, 0)$, respectively, and will inherit the multiple robustness properties established in Lemma 3.

Plug-in (G-computation) estimator
A plug-in estimator estimates the relevant part of the distribution of $O$, in this case using the empirical distribution of $W$ and appropriate estimators $\hat\Lambda_{12,n}$, $\hat\Lambda_{13,n}$ and $\hat\Lambda_{23,n}$ of the transition intensities, and substitutes them in place of the unknown quantities in Eq. (5). This yields the estimator $\hat\Psi_n^{\text{Plug-in}}(\tau, a_D, a_I)$ given in Eq. (10). Equation (5) is also known as the G-computation formula (Robins 1986), and the estimator in (10) is also referred to as a G-computation estimator. Note that consistency of $\hat\Psi_n^{\text{Plug-in}}(\tau, a_D, a_I)$ depends on consistency of the estimators of all three transition intensities.

One-step estimator
As mentioned above the efficient influence function is useful for constructing multiply robust efficient estimators. One way of doing this is to use the influence function directly as an estimating equation (van der Laan and Robins 2003). Since the EIF in equation (9) is linear in the parameter of interest, this results in the estimator in Eq. (11), where $\psi_{\mathrm{SDE}}(P^*)(O; \tau, a_I) = \psi(P^*)(O; \tau, 1, a_I) - \psi(P^*)(O; \tau, 0, a_I)$ for $a_I \in \{0, 1\}$. The estimator in (11) is multiply robust. In particular it is consistent under misspecification of (i) $\Lambda_{12}$, (ii) $\Lambda_{13}$, (iii) $\Lambda_{23}$ or (iv) $\pi$ and $\Lambda_C$, as shown in Lemma 3.
Note that the estimator can be written as the plug-in estimator plus a correction term based on the efficient influence function. This approach is also referred to as a 'one-step' bias correction approach (Ibragimov and Has'minskii 1981; Pfanzagel and Wefelmeyer 1985), and we will refer to the estimator in (11) as a 'one-step' estimator.
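The generic form of a one-step correction can be sketched in a few lines. The function below is a hypothetical helper: it assumes that the efficient influence function has already been evaluated for every subject at the estimated nuisance parameters, which for the present problem requires the expression in Eq. (9). It also returns the influence-function-based standard error, which is only appropriate when all nuisance models are correctly specified.

```python
import numpy as np

def one_step(plug_in, eif_values):
    """One-step bias correction: plug-in estimate plus the mean of the EIF.

    plug_in    : plug-in estimate of the target parameter
    eif_values : estimated EIF evaluated at each observation (length n)
    """
    eif_values = np.asarray(eif_values, dtype=float)
    n = eif_values.size
    estimate = plug_in + eif_values.mean()
    # Standard error from the empirical variance of the influence function.
    std_error = eif_values.std(ddof=1) / np.sqrt(n)
    return estimate, std_error

# Toy numbers, for illustration only.
estimate, std_error = one_step(0.30, [0.10, -0.10, 0.06])
```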

Asymptotic variance
If all nuisance models are correctly specified, then a consistent estimator of the asymptotic variance can be obtained from the variance of the influence function.
However, if one or more of the nuisance models are misspecified then this variance estimator is no longer consistent, and other techniques must be used. Suppose we are willing to assume fully parametric models for all nuisance parameters. Then we can derive the asymptotic distribution of the estimators in (10) and (11) by stacking the corresponding unbiased estimating equations for the target and nuisance parameters, and applying standard estimating equation theory (Stefanski and Boos 2002). In particular, let $\hat\theta_n$ be the estimators of the parameters of interest and nuisance parameters that solve $n^{-1} \sum_{i=1}^{n} m(O_i, \hat\theta_n) = 0$ where $m(O, \theta)$ are the stacked estimating equations of both the parameter of interest and nuisance parameters. For the plug-in estimator in (10) this would be $\hat\theta_n = (\hat\Psi^{\text{Plug-in}}, \hat\Lambda_{12,n}, \hat\Lambda_{13,n}, \hat\Lambda_{23,n})$ and $m(O, \theta) = (P_{13}, S_{\Lambda_{12}}, S_{\Lambda_{13}}, S_{\Lambda_{23}})$ where $S_{\Lambda_{12}}$, $S_{\Lambda_{13}}$, $S_{\Lambda_{23}}$ are appropriate estimating equations for the transition hazards. Under suitable regularity conditions (Newey and McFadden 1994; van der Vaart 2000; Tsiatis 2006), we have that $\hat\theta_n$ is asymptotically normal. It is then possible to derive an analytic expression for the asymptotic variance of the estimators in (10) and (11) using the sandwich variance estimator.
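The sandwich construction for stacked estimating equations can be sketched generically. The code below is not the variance estimator for (10) or (11); it is a minimal illustration on a toy problem (joint estimation of a mean and a variance), with the Jacobian approximated by central finite differences. All names are hypothetical.

```python
import numpy as np

def sandwich_variance(m, theta_hat, data, eps=1e-6):
    """Sandwich variance for the solution of sum_i m(O_i, theta) = 0.

    m(data, theta) must return an (n, p) array with one row of stacked
    estimating functions per observation.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    values = m(data, theta_hat)
    n, p = values.shape
    # "Bread": minus the averaged Jacobian of m with respect to theta.
    bread = np.empty((p, p))
    for k in range(p):
        step = np.zeros(p)
        step[k] = eps
        upper = m(data, theta_hat + step).mean(axis=0)
        lower = m(data, theta_hat - step).mean(axis=0)
        bread[:, k] = -(upper - lower) / (2 * eps)
    # "Meat": averaged outer product of the estimating functions.
    meat = values.T @ values / n
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T / n

def m_mean_var(y, theta):
    """Toy stacked estimating functions for (mean, variance)."""
    return np.column_stack([y - theta[0], (y - theta[0]) ** 2 - theta[1]])

y = np.random.default_rng(1).normal(size=500)
theta_hat = np.array([y.mean(), y.var()])
cov = sandwich_variance(m_mean_var, theta_hat, y)
```

In this toy problem the first diagonal entry of `cov` agrees with the familiar variance of a sample mean, `y.var() / y.size`.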
When the nuisance models are e.g. Cox regression models we need to take into account the variability of the baseline hazards which may be nonparametrically estimated. Then the asymptotic distribution can be derived using the functional delta method (van der Vaart 2000). This expression becomes very complicated, especially for the one-step estimator, and deriving an explicit estimator of the variance goes beyond the scope of this paper.

Simulation study

Simulation study 1: empirical performance
Below, we report the results from a simulation study where the aim is to compare the finite sample performance of the plug-in estimator and the one-step estimator.
The data were generated by a simulation procedure based on the function $\text{expit}(x) = \{1 + \exp(-x)\}^{-1}$. Note that this corresponds to a scenario where treatment has a protective effect on both disease and death, and where the treatment effect on death is the same in diseased and disease-free subjects.
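For reference, the expit (inverse logit) function can be implemented as below. The two-branch form is a common way to avoid overflow for large negative arguments; it is a sketch and not taken from the authors' simulation code.

```python
import math

def expit(x):
    """Inverse logit: 1 / (1 + exp(-x)), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)
```

The function maps the real line to the unit interval, with `expit(0)` equal to one half and `expit(x) + expit(-x)` equal to one.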
An estimator for the propensity score was constructed using a logistic regression model with main effects only. For the transition hazards we constructed estimators using a Cox regression model with main effects only and for the censoring hazard we used a Cox model with no covariate effects. The dependency of $\Lambda_{23}$ on the time of reaching state 2 was handled by delayed entry. We considered 8 different scenarios: in scenario (i) all nuisance models were correctly specified, which is the case when all misspecification parameters of the data-generating mechanism equal zero, and in scenarios (ii)-(viii) we considered misspecifications of different combinations of the nuisance models by varying the values of these parameters accordingly. Additional details on the misspecified scenarios are given in Appendix G.
For each scenario we generated 1000 datasets from the simulation procedure with a sample size of 400.For each dataset we computed the plug-in estimator and the one-step estimator for the SDE along with the bootstrap variance for each estimator based on 250 replicates.The results of our simulation study are summarized in Figs. 3 and 4 where for all scenarios we report bias, empirical standard error, coverage of the 95 % Wald confidence interval and accuracy of the standard error estimator computed at time points t ∈ {1, 5, 10, 15, 20, 25}.
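The bootstrap standard error and Wald interval used for each simulated dataset can be sketched as follows. The function is a hypothetical helper that takes any estimator as a callable; in the setting above the estimator would refit all nuisance models on each resampled dataset, which is not shown here.

```python
import numpy as np

def bootstrap_wald_ci(data, estimator, n_boot=250, z=1.96, seed=0):
    """Nonparametric bootstrap standard error and Wald-type interval.

    data      : array with one row (or entry) per subject
    estimator : callable mapping a dataset to a scalar estimate
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimate = estimator(data)
    # Resample subjects with replacement and recompute the estimate each time.
    replicates = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    std_error = replicates.std(ddof=1)
    return estimate, std_error, (estimate - z * std_error, estimate + z * std_error)

# Toy usage with the sample mean standing in for the estimator.
sample = np.random.default_rng(2).normal(loc=2.0, size=400)
estimate, std_error, (lower, upper) = bootstrap_wald_ci(sample, np.mean)
```

Coverage in a simulation study is then the proportion of datasets for which the interval contains the true parameter value.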
As expected both the plug-in estimator and the one-step estimator are consistent in scenario (i) where all nuisance models are correctly specified and scenario (ii) where the propensity score and censoring models are misspecified. Moreover, the coverages are close to the nominal level. In scenarios (iii)-(v) where we consider misspecifications of at most one of the transition hazard models the one-step estimator provides a bias reduction over the plug-in estimator, as predicted by the multiple robustness properties in Lemma 3. In scenarios (vi)-(viii) where we consider misspecifications that go beyond the robustness properties of Lemma 3 both the plug-in estimator and the one-step estimator are biased, except in scenario (vi) where the plug-in estimator surprisingly appears unbiased. The one-step estimator is more variable than the plug-in estimator throughout all scenarios.
This simulation study confirms the multiple robustness properties of the one-step estimator derived in Sect. 4.1, which, along with the potential compatibility with data-adaptive estimation of nuisance parameters, highlights the real-world utility of the one-step estimator.

Simulation study 2: violation of assumptions
The dismissible components conditions in Lemma 1 are violated in the presence of an unmeasured common risk factor for illness and death.Below, we study such violations in a simulation study.
The data were generated by a simulation procedure with $W \sim \text{Bernoulli}(0.5)$. We varied the strength of the association with $U$ along the grid $\{-1, -0.9, \ldots, 0.9, 1\}$ and considered the four cases: (I) Protective treatment effect on disease and death, (II) Protective effect on disease and harmful effect on death, (III) Harmful effect on disease and protective effect on death and (IV) Harmful treatment effect on disease and death.
We constructed an estimator for the propensity score using a correctly specified logistic regression model. The censoring hazard was estimated using a Cox model with no covariate effects. The remaining nuisance models were estimated using Cox regression models adjusted for main effects of the observed variables. We generated 1000 datasets with a sample size of $n = 1000$. For each dataset we computed the plug-in estimator and the one-step estimator for the SDE evaluated at time point $t = 15$. The results are depicted in Fig. 5. It is seen that the bias increases with the magnitude of the association with the unmeasured common risk factor $U$. The direction of the bias depends on the effect of treatment on illness: when the treatment has a protective effect on disease the estimator is biased downwards, and when the treatment has a harmful effect on disease the bias is positive.

Real data application
Using data from the Danish nationwide registries we identified all hospital admissions for first-time acute myocardial infarction (MI) between 2010 and 2014. To get a more homogeneous study population we only included patients who were treated with a percutaneous coronary intervention (PCI). We also excluded patients with a preexisting alcohol abuse diagnosis or chronic kidney disease diagnosis and patients younger than 30 years or older than 100 years of age. We set the index date for inclusion at 30 days following discharge and excluded patients who died prior to the index date. We defined the treatment arm as those patients who picked up a prescription for DAPT before the index date and the untreated group as those who did not. Patients who were still alive by the end of 2019 were administratively censored. Among the 16,081 patients in the study population 3856 patients had a recurrent cardiovascular event (defined as a hospital diagnosis of MI, stroke or heart failure) and were subsequently censored, 968 patients died within follow-up without having a recurrent cardiovascular event and 1385 patients experienced a recurrent cardiovascular event and subsequently died within follow-up.
The cumulative hazard curves in Fig. 6 suggest that treatment reduces the risk of a recurrent cardiovascular event, overall mortality and death without a recurrent cardiovascular event. To assess how much of the effect of DAPT on mortality was mediated through recurrent cardiovascular events we estimated the separable direct and indirect effects. That is, we assume that the treatment has two components that could in principle be manipulated separately: one component $A_I$ which only affects the risk of a recurrent cardiovascular event directly and another component $A_D$ which affects mortality through other pathways. A possible interpretation of these treatment components was discussed in Sect. 3. We can then define the separable indirect effect as the effect under an intervention that fixes the treatment component affecting mortality through other pathways than recurrent cardiovascular events but varies the treatment component affecting cardiovascular events. Similarly, we can define the separable direct effect as the effect that fixes the treatment component affecting cardiovascular events and varies the component affecting mortality through other pathways.
We estimated the separable effects using the plug-in estimator and the one-step estimator presented in Sect. 4.2. Both estimators used semi-parametric working models for the nuisance parameters. In particular, we used Cox regression models for the three transition hazards. The models were adjusted for baseline age, sex, hypertension diagnosis, prior gastrointestinal bleeding, diabetes, chronic liver disease, cancer, atrial fibrillation, anemia, prior heart failure or stroke. We computed Wald-type point-wise confidence intervals based on 500 bootstrap data sets. The results of our analysis are presented in Figs. 7 and 8. In addition to the separable direct and indirect effects we have also depicted the total effect, cf. Eq. (4).
Our results suggest that the treatment reduces mortality both through recurrent cardiovascular events and through other pathways. That is, within the limitations of our study, we can conclude that the modified treatment that fixes the component affecting mortality through other pathways than recurrent cardiovascular events does not capture the entire protective effect of the treatment. In fact a substantial fraction of the protective effect of DAPT on mortality is a direct effect.
We recognize several potential limitations with our study. First, we likely have confounding by indication in that frail individuals are less likely to be prescribed the treatment. Therefore the drug will appear more effective than it actually is, also on non-cardiovascular mortality. This phenomenon is notoriously difficult to adjust for because of unmeasured confounding. Second, comorbidities such as diabetes status are essentially time-varying covariates. It is a major limitation of our method that we only adjust for baseline covariates. Third, a potential issue is that many cardiovascular events go undetected or are not entered into the registries, e.g. when a patient dies suddenly without prior hospital admission. Finally, the overall risk of bleeding, which is the main side effect of DAPT, is very low.

Relation to other approaches
The main difficulty when formulating causal mediation targets in the illness-death model is that the mediating event is truncated by the terminal event. In this paper we proposed causal mediation estimands using the concept of separable effects, which considers interventions on separate components of the treatment instead of interventions on the mediator. This approach avoids the conceptual issues that arise when the terminal event occurs before the mediator, rendering the mediator undefined. However, this comes at the cost of assuming that the treatment components can be manipulated separately, which may not always be appropriate.

Fig. 7 Estimates of the separable direct effect (SDE), separable indirect effect (SIE) and total effect (TE) using the one-step estimator. Solid lines represent effect estimates and dashed lines the corresponding 95% point-wise confidence intervals

Fig. 8 Estimates of the separable direct effect (SDE), separable indirect effect (SIE) and total effect (TE) using the plug-in estimator. Solid lines represent effect estimates and dashed lines the corresponding 95% point-wise confidence intervals
Depending on the causal question at hand there are other approaches in the literature that may be useful for defining mediation targets in the illness-death model. Valeri et al. (2021) propose randomized interventional direct and indirect effects. Instead of considering manipulations of the mediator, they consider stochastic interventions on the intermediate time-to-event distribution conditional on baseline covariates. The authors then define the 'stochastic direct effect' as the difference in survival across exposure groups under a stochastic intervention that fixes the intermediate time-to-event distribution to be the same in both exposure groups. The 'stochastic indirect effect' is defined as the difference in survival within an exposure group when the intermediate time-to-event distribution is varied. Their approach results in the same identifying functionals as in our paper, but under different identifiability conditions. Thus the target parameter in our paper can also be interpreted as an interventional effect.
A different alternative is principal stratification, which has often been advocated in the presence of truncation (Zhang and Rubin 2003; Comment et al. 2019). A recent paper by Gao et al. (2021) proposes a principal stratification approach for defining causal mediation effects in the subgroup where the intermediate event will happen before the potential terminal event when given either of two treatment options. This stratum corresponds to a multistate model where only the transitions from the 'healthy' state to the 'illness' state and from the 'illness' state to 'death' are involved, and thus their approach leads to a different identifying functional than the one in our paper. This method avoids the issues that arise when death occurs prior to the non-terminal event. However, a limitation is that the empirical usefulness of the estimand is debatable since the subgroup for which the estimand is defined can never be observed. Huang (2021) proposes a method for causal mediation with 'semicompeting risk data', based on counterfactual counting processes for the latent intermediate event and the terminal event. To circumvent the undefinability of the intermediate event the author assumes that if the intermediate event does not occur before the terminal event it would never occur within follow-up. The paper was accompanied by a number of commentaries (Stensrud et al. 2021; Fulcher et al. 2021; Chan et al. 2021) which argue that the identification assumptions are too restrictive for most practical contexts. As the author does not use a classical illness-death model framework, it is not clear to us how the identifying functional is connected to ours.

Conclusion and possible extensions
In this paper we proposed causal estimands for the separable direct and indirect effects of a baseline exposure on a terminal time-to-event outcome mediated by the illness state of a continuous-time illness-death process. We proposed a plug-in estimator based on the identifying functional, and a one-step estimator which solves the efficient influence function. We showed that the one-step estimator is multiply robust under appropriate regularity conditions, and we confirmed these theoretical properties in a simulation study which showed an impressive performance of the …

Let $\tau$ be a time horizon chosen such that there exists $\epsilon > 0$ with $P(C > \tau) > \epsilon > 0$. For any time $r \in [0, \tau]$ we define the set … In particular, when $C = r$ we observe the many-to-one mapping … and the observed data may be expressed as … The coarsening mechanism is monotone since $G_r(Z) \subseteq G_{r'}(Z)$ for $r > r'$. Following Tsiatis (2006), Chapter 9.3, the CAR assumption is formally defined by … where the coarsening hazard may be written … That is, if we assume (12), … where …

B Proof of Lemma 1
All transition probabilities of the illness-death model can be expressed in terms of the hazards for the transitions (see e.g. Putter et al. (2007)). For instance, the probability of going from state 1 directly to state 3, within a time interval $(s, t]$, can be expressed as … where we have omitted the baseline covariates for now. The probability of going from state 1 to state 3 moving through state 2, within a time interval $(s, t]$, can be expressed as … Then … where the last equality follows using (13). The first equality is by the law of iterated expectations. The second equality follows by using the representation in Eq. (13) under an intervention that sets $A_D = a_D$ and $A_I = a_I$. The third equality follows by applying the dismissible components conditions. The last equality follows by applying A.0-A.3.
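For reference, the standard hazard-based expressions for the two probabilities described at the start of this proof can be sketched as follows. The notation is an assumption and may differ from the displayed equations of the paper: $\lambda_{ij}$ denotes the transition hazards, $\lambda_{23}(u, u - r)$ is allowed to depend on the time since entering state 2 at time $r$, baseline covariates are omitted, and both probabilities are conditional on being in state 1 at time $s$.

```latex
\begin{align*}
P\bigl(1 \to 3 \text{ directly in } (s,t]\bigr)
  &= \int_s^t \exp\Bigl(-\int_s^r \{\lambda_{12}(u) + \lambda_{13}(u)\}\,du\Bigr)\,
     \lambda_{13}(r)\,dr, \\
P\bigl(1 \to 2 \to 3 \text{ in } (s,t]\bigr)
  &= \int_s^t \exp\Bigl(-\int_s^r \{\lambda_{12}(u) + \lambda_{13}(u)\}\,du\Bigr)\,
     \lambda_{12}(r)
     \int_r^t \exp\Bigl(-\int_r^v \lambda_{23}(u, u-r)\,du\Bigr)\,
     \lambda_{23}(v, v-r)\,dv\,dr.
\end{align*}
```

The first factor in each line is the probability of remaining in state 1 from $s$ to $r$; in the second line the inner integral is the probability of dying in $(r, t]$ after entering state 2 at time $r$.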

C Full data EIF
Let $Q_\epsilon$ be a parametric submodel with parameter $\epsilon \in \mathbb{R}$ which passes through $Q$ at $\epsilon = 0$. The corresponding tangent space $T^F$ is the closure of the linear span of the scores of the parametric submodels. Due to the factorization of the full-data density in (1) we can write this as the orthogonal sum … where the component spaces are indexed by all functions of $(u, a, w)$, of $(u, a, w)$ and of $(u, r, a, w)$, respectively. In particular the score of the parametric submodel can be written … where the scores for the distribution of $W$ and of $A$ given $W$ are the derivatives with respect to $\epsilon$ of the corresponding log-densities, and … By the Riesz representation theorem the efficient influence function can be characterized as any element $\psi \in T^F$ which is a pathwise derivative of the target parameter in the sense that … (14) for any one-dimensional submodel $Q_\epsilon$ with corresponding score for $Z$. Note that under the nonparametric model the full-data tangent space is the entire Hilbert space $L^2_0(Q)$ of measurable, mean-zero functions of $Z$ equipped with the covariance inner product. Then any pathwise derivative will trivially be contained in $T^F$. Hence we only need to check that the proposed EIF in (8) satisfies (14).
Consider first the left-hand side of (14). We may write … where the second equality follows by changing the order of integration. Consider now the right-hand side of (14). We have by iterated expectations and the properties of score functions that … Due to the representation of the scores for the transitions 1 → 2 and 1 → 3 in terms of the full-data martingales, the expectations in (16)-(18) are the covariances of martingale stochastic integrals. They can be computed by finding the expectation of the corresponding predictable covariation processes (Fleming and Harrington 1991). In particular, the predictable covariation process of $M^F_{ij}$ with itself is the compensator part of the martingale. The predictable covariation process of $M^F_{12}$ and $M^F_{13}$ is zero because the counting processes $N_{12}$ and $N_{13}$ by definition do not jump simultaneously.
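The martingale argument used here rests on a standard identity for stochastic integrals, sketched below under stated assumptions. The integrands $h$ and $g$ are hypothetical names for predictable, suitably bounded processes, $Y_i(u)$ is an assumed symbol for the indicator of being in state $i$ just before time $u$, and the hazards are written as functions of time only for simplicity.

```latex
\begin{align*}
E\Bigl[\int_0^t h(u)\,dM^F_{ij}(u) \int_0^t g(u)\,dM^F_{kl}(u)\Bigr]
  &= E\Bigl[\int_0^t h(u)\,g(u)\,d\langle M^F_{ij}, M^F_{kl}\rangle(u)\Bigr], \\
d\langle M^F_{ij}, M^F_{ij}\rangle(u) &= Y_i(u)\,\lambda_{ij}(u)\,du, \\
\langle M^F_{12}, M^F_{13}\rangle(u) &= 0 .
\end{align*}
```

The second line is the statement that the predictable variation equals the compensator when the compensator is continuous, and the third line reflects that $N_{12}$ and $N_{13}$ have no common jumps.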
Then we may write (16) as … and similarly for (17) … Finally we may rewrite (18) as … Hence we have shown that the proposed influence function is in fact the efficient full-data influence function.
… Suppose …, $\Lambda_{13}$ and $\Lambda_{23}$ are correctly specified, but $\Lambda_{12}$ is not. Then the terms (24) and (25) are zero, and we have …

Fig. 2 An informal causal diagram illustrating the relationship between the treatment components and the counting processes. The thick edges indicate a deterministic relationship

… )-(3) from the observed two-arm trial we need the following assumptions.

Lemma 1 (Identifiability) Suppose the following assumptions hold:
A.0 We assume that the interventions are such that …
A.1 Conditional exchangeability: …
A.2 Consistency: If an individual is observed to receive treatment $A = a$, then …
A.3 Positivity: … and … and …
A.4 Dismissible components conditions: … for all $t \in \mathbb{R}$, $r \in \mathbb{R}$, where $\lambda^{a_D, a_I}_{ij}(\cdot)$ denotes the transition hazards of the counterfactual illness-death process under an intervention that sets $A_D = a_D$ and $A_I = a_I$.
Then the separable direct and indirect effects are identified … and …

Proof In Appendix B. ◻

Fig. 3 Comparison of the G-computation (white rectangles) and one-step (black triangles) estimators of the SDE computed at time points t ∈ {2, 5, 10, 15, 20, 25} in terms of bias, empirical standard error, coverage of 95% confidence intervals and accuracy of the standard error estimator. This figure contains scenarios (i)-(iv) (Color figure online)

Fig. 4 Comparison of the G-computation (white rectangles) and one-step (black triangles) estimators of the SDE computed at time points t ∈ {2, 5, 10, 15, 20, 25} in terms of bias, empirical standard error, coverage of 95% confidence intervals and accuracy of the standard error estimator. This figure contains scenarios (v)-(vii) (Color figure online)

Fig. 5 Bias of the plug-in (white rectangles) and one-step (black triangles) estimators of the SDE computed at time point t = 15 under violation of the identification assumption

Fig. 6 Nelson-Aalen estimates of the cumulative hazards of MI (top left), overall mortality (top right) and death without recurrent MI (bottom) in our cohort. The red curves are the treatment arm and the black curves are the placebo arm. Along with the hazards (solid lines) are shown 95% confidence intervals (dashed lines) (Color figure online)

… $\lambda_{23}(u, u - r)\,du\;\lambda_{23}(s, s - r)\,ds\;\exp\Bigl(-\int_s^r \{\lambda_{12}(u) + \lambda_{13}(u)\}\,du\Bigr)\,\lambda_{12}(r)\,dr.$