Bias of the additive hazard model in the presence of causal effect heterogeneity

Hazard ratios are prone to selection bias, compromising their use as causal estimands. On the other hand, if Aalen’s additive hazard model applies, the hazard difference has been shown to remain unaffected by the selection of frailty factors over time. Then, in the absence of confounding, observed hazard differences are equal in expectation to the causal hazard differences. However, in the presence of effect (on the hazard) heterogeneity, the observed hazard difference is also affected by selection of survivors. In this work, we formalize how the observed hazard difference (from a randomized controlled trial) evolves by selecting favourable levels of effect modifiers in the exposed group and thus deviates from the causal effect of interest. Such selection may result in a non-linear integrated hazard difference curve even when the individual causal effects are time-invariant. Therefore, a homogeneous time-varying causal additive effect on the hazard cannot be distinguished from a time-invariant but heterogeneous causal effect. We illustrate this causal issue by studying the effect of chemotherapy on the survival time of patients suffering from carcinoma of the oropharynx using data from a clinical trial. The hazard difference can thus not be used as an appropriate measure of the causal effect without making untestable assumptions.


Introduction
Hazard ratios, often obtained by fitting a Cox proportional hazards model (Cox, 1972), are the most common effect measures when dealing with time-to-event data.However, the hazard ratio is prone to selection bias due to conditioning on survival and therefore not suitable for causal inference (Hernán, 2010;Stensrud et al., 2017;Aalen et al., 2015).It has been recommended to use other, better interpretable, estimands when interested in causal effects (Hernán, 2010;Bartlett et al., 2020;Stensrud et al., 2018;Young et al., 2020).Alternatively, using the additive hazard model can avoid interpretation issues (Aalen et al., 2015;Martinussen et al., 2020).In the nonparametric model proposed by Aalen (1989), the hazard rate of time t for individual i in the absence of time-dependent) covariates x i (t) (of dimension p) equals where the parameters β j (t) are arbitrary regression functions, allowing time-varying effects (Aalen et al., 2008).Restricted versions have been proposed by Lin and Ying (1994) and McKeague and Sasieni (1994), where all or some β j (t) are assumed to be constant over time.The cumulative regression function, B j (t) = t 0 β j (s)ds, can reveal whether the effect changes over time, e.g. in Aalen et al. (2008, Example 4.6).
For cause-effect relations that can be accurately described with Aalen's additive hazard model, the hazard difference is a collapsible measure (Martinussen and Vansteelandt, 2013).Then, in the absence of confounding, the hazard difference is an unbiased estimator for the causal effect, even in the case of unmeasured risk factors (Aalen et al., 2015).For this, it is necessary that the exposure effect on the hazard does not depend on unmeasured individual characteristics (modifiers) and thus are the same for all individuals.However, due to the fundamental problem of causal inference, the effect homogeneity assumption is untestable (even in the absence of any confounding) (Hernán and Robins, 2020).In our companion paper, we have shown that next to unmeasured risk factors, i.e., frailty, effect heterogeneity at the level of the individual hazard results in selection bias of observed hazard ratios (Post et al., 2022).In this work, we study the bias of the hazard difference observed when estimating the causal effect under an additive hazard model but in the presence of heterogeneity of effect (on the hazard).To do so, we formalize the cause-effect relations and define the causal hazard difference.First, we show that in the absence of confounding, the expectation of the hazard difference observed (in practice) equals a marginalized causal hazard difference.After that, we study how this marginalized causal hazard difference deviates from the causal hazard difference due to selecting individuals with favourable values of the effect modifier.We will illustrate how this selection can result in a non-linear cumulative regression function curve while the individual causal effects are constant.Finally, we discuss the analysis of the effect of treatment on survival with carcinoma of the oropharynx from the clinical trial also studied in Aalen (1989).

Notation and hazard differences
Probability distributions of factual and counterfactual outcomes are defined in the potential outcome framework (Rubin, 1974;Neyman, 1923).Let T i and A i represent the (factual) stochastic outcome and exposure assignment level of individual i.Let T a i equal the potential outcome of individual i under an intervention of the exposure to level a (counterfactual when A i = a).For those more familiar with the do-calculus, T a is equivalent to T | do(A = a) as e.g.derived in Pearl (2009, Equation 40) and Bongers et al. (2021, Definition 8.6).Throughout this work, we will assume causal consistency, i.e. if A i = a, then T a i = T Ai i = T i .Causal consistency implies that potential outcomes are independent of the exposure levels of other individuals (no interference).The hazard rate of the potential outcome can vary among individuals due to heterogeneity in risk factors U 0 as also considered by Aalen et al. (2015).The hazard difference of the potential outcomes with and without (a = 0) exposure might also vary among individuals due to an effect modifier U 1 .Therefore, the hazard rate of individual i at time t of the potential outcome under exposure to level a is a function of U 0i and U 1i and thus random and equals The hazard of the potential outcome T a i can be parameterized with a function that depends on U 0i , U 1i and a.We focus on cause-effect relations that can be parameterized with a structural causal model (SCM), that consists of a joint probability distribution of (N A , U 0 , U 1 , N T ) and a collection of structural assignments (for more details see Post et al. (2022, Section 2)) such that where , then the effect is equal for each individual, i.e. effect homogeneity, and Aalen's additive hazard model is well-specified.Otherwise, ] will be the estimand of interest.The latter contrast equals the difference between the expected hazard rate in the world where everyone is exposed to a and the world without exposure.It will be referred to as the causal hazard difference (CHD) defined in Definition 1.
Definition 1 Causal hazard difference The CHD for cause-effect relations that can be parameterized with SCM 2 equals Throughout this work, we abbreviate the Lebesque-Stieltjes integral of a function g with respect to probability law F X , g(x)dF X (x), as g(X)dF X .
The CHD thus equals the difference of hazard rates of the potential outcomes marginalized over the population distribution of (U 0 , U 1 ).However, the distribution of (U 0 , U 1 ) among survivors will differ over time (in all worlds), i.e. (U 0 , U 1 ) Therefore, the hazard rates, and thus the hazard difference, for the factual outcomes are affected by these conditional distributions of U 0 and U 1 .We refer to the expected value of the difference of the observed hazards as the observed hazard difference (OHD) presented in Definition 2.
Definition 2 Observed hazard difference The OHD at time t equals For a randomized controlled trial (RCT), without informative censoring, the OHD can be easily expressed in terms of potential outcomes.By causal consistency By design of the trial A ⊥ ⊥ T a (in SCM (2) equivalent to N A ⊥ ⊥ U 0 , U 1 , N T ), so The OHD at time t is thus equal to We refer to this effect measure as the marginal causal hazard difference (MCHD) defined in Definition 3.
Definition 3 Marginal causal hazard difference The MCHD at time t for cause-effect relations that can be parameterized with SCM 2 equals As the integration in Definition 1 is with respect to the population distribution of (U 0 , U 1 ), the OHD can deviate from the CHD.However, it has been explained by Aalen et al. (2015) that U 0 ⊥ ⊥ A | T ≥ t so that for degenerate U 1 , the MCHD equals the CHD so that the latter can be unbiasedly estimated from RCT data.In this paper, we formalize the MCHD (and thus, in the absence of confounding, the OHD) in case of effect heterogeneity (non-degenerate U 1 ).

Results
In the remainder of the paper, we will focus on binary exposures such that a ∈ {0, 1}.It is known from Aalen et al. (2015) that when As a result, the selection of frailty factors U 0 over time is similar for exposed and unexposed individuals and does not result in bias.In the absence of confounding (T a ⊥ ⊥ A), the independence and causal consistency imply Thus Aalen et al. (2015), implicitly showed that In Lemma 1, we show that the additive frailty remains exchangeable in the presence of effect heterogeneity at the hazard scale.
Lemma 1 If the cause-effect relations of interest can be parameterized with SCM (2), where ] as the conditional expectations will decrease over time (survival of less susceptible individuals).The selection of levels of U 0 is thus similar in the exposed and unexposed worlds.It indeed follows from Lemma 1 that in the absence of effect heterogeneity, the OHD from an RCT is an unbiased estimator of the CHD as shown by Aalen et al. (2015).However, if heterogeneity exists, there will also be a selection of U 1 in the exposed world where individuals with more favourable levels of U 1 are more likely to survive.As a result of this selection, the OHD over time no longer represents the (population) average effect.In this work, we consider hazard functions that satisfy Condition 1.
Condition 1 Hazard without infinite discontinuity In Theorem 1, we show that the OHD equals E f 1 (t, U 1 , 1) | T 1 ≥ t in absence of confounding for hazard functions that satisfy Condition 1, and thus deviates from the CHD equal to Theorem 1 If the cause-effect relations of interest can be parameterized with SCM (2), where which equals . So, the difference with the OHD varies over time and equals.
For this case, the conditional expectation E[U 1 | T 1 ≥ t] can be expressed in terms of the Laplace transform of U 1 , as shown in Lemma 2.
Lemma 2 If the cause-effect relations of interest can be parameterized with SCM (2), where and where We will apply this lemma to illustrate how effect heterogeneity can affect the cumulative regression function curve in case the causal effect is constant for each individual (f 1 (t, U 1i , 1) = U 1i ).Let the additive hazard effect modifier U 1 equal µ 1 (≤ 0, for individuals that benefit) with probability p 1 , µ 2 (≥ 0, for individuals that are harmed) with probability p 2 or 0 (for individuals that are not affected).We denote this distribution as the Benefit-Harm-Neutral, BHN(p 1 , µ 1 , p 2 , µ 2 ), distribution.Note that it is necessary that ∀t : P(f 0 (t, U 0 ) < µ 1 ) = 0 to guarantee that the hazard rate is positive for each individual.The Laplace transform of U 1 equals, So, by Lemma 2 with f 1 (1, t) = 1, the OHD from an RCT is equal to .
The OHD then deviates from the constant CHD equal to For cause-effect systems that meet the conditions of Theorem 1, the expectation of the integrated OHD (and thus of the integrated MCHD) equals Even though the CHD is constant, B(t) will not be linear and deviate from the function g(t) = tE[U 1 ] due to the selection effect (of U 1 ) over time.Three types of curves could be observed as shown in Figure 1 where for illustration, p 1 = p 2 = 0.5, so there exist only two levels for the modifier U 1 .First of all, let's consider the case where the exposure harms some individuals (for which U 1i = 1) while others don't respond to the exposure at all (U 1i = 0), see the orange line in Figure 1.Initially, B(t) evolves as tE[U 1 ] = 0.5t.However, the individuals harmed by the exposure are less likely to survive over time, so the curve's derivative decreases.In the end, only individuals with U 1i = 0 are expected to survive so that B(t) remains constant.Concluding that the exposure initially harms but loses effect over time is false for this case as the effect remains constant for each individual.
Secondly, when some individuals do benefit (U 1i = −0.25)while others are not affected (U 1i = 0), the derivative of B(t) evolves from −0.125 to −0.25 at the moment only those that benefit are expected to survive, as illustrated with the purple line in Figure 1.But, again, the effect for an individual is constant and does not become more beneficial over time.
Finally, and probably most interesting, different individuals in the population might have opposite effects (U 1i = 1 or U 1i = −0.1),as illustrated with the pink line in Figure 1.Initially, the integrated hazard differences increase as the expected effect is harmful.However, over time those individuals with U 1i = −0.1 are more likely to survive so that E[U 1 | T 1 ≥ 1] changes sign.Finally, only those that benefit are expected to survive, and the curve decreases with a derivative equal to −0.1.For this example, it would be false to conclude that the exposure first harms but becomes beneficial over time.
Similar patterns can be observed for a continuous U 1 distribution, in which case the E U 1 | T 1 ≥ t will keep decreasing.For example when (U 1 + ) ∼ Γ (k, θ), then L U1+ (c) = θk θc+1 .By lemma 2 The B(t) are presented over time in Figure 2 for θ = k = 1 and ∈ {0, 1 4 , 1 2 , 1}.Thus, if U 0 ⊥ ⊥ U 1 , then the MCHD will be less or equal to the CHD due to the selection of U 1 .Decreasing or constant B(t) curves that at some point increase again can then not be explained by the selection of U 1 since individuals with less beneficial values of U 1 are expected to survive shorter.However, if U 0 ⊥ ⊥ U 1 , such a pattern of the B(t) curve can be due to the selection of U 1 as, by Theorem 1, the deviation of the OHD from the CHD depends on f 0 and the joint distribution of (U 0 , U 1 ).
3.1 Dependent U 0 and U 1 The bivariate joint distribution function of U 0 and U 1 , F (U0,U1) , can be written using the marginal distribution functions and a copula C (Sklar, 1959).As such, and the Kendall's τ correlation coefficient of U 0 and U 1 can be written as a function of the Copula C (Nelsen, 2006).To illustrate how the dependence can affect the OHD for the settings presented in Figure 2, we use a Gaussian copula where Φ and Φ 2,ρ are the standard normal and standard bivariate normal with correlation ρ cumulative distribution functions, respectively.Then, For this example, we let where U 0i ∼ Γ (1, 1) so that the hazard is nonnegative for each individual.For ρ ∈ {−1, sin (−0.25π) , 0, sin (0.25π) , 1} (such that τ ∈ {−1, −0.5, 0, 0.5, 1}) and ∈ {0, 1 2 , 1}, 3. The conditional expectations are derived empirically from simulations (n = 10, 000), and the integral is approximated by taking discrete steps of size 0.1.All programming codes used for this work can be found online at https://github.com/RAJP93/CHD.The survival curves of the potential outcomes can be found in Figure 4.
The selection bias increases when ρ > 0 (compared to ρ = 0).On the other hand, for ρ < 0, the selection bias is smaller most of the time than for ρ = 0 since favourable U 1 are expected to occur with unfavourable levels of U 0 .Moreover, for ρ = −1, at larger t, we observe that the selection bias can even change sign, and the absolute value of the bias might be larger than for ρ = 0 for those t.For ρ = 0, the MCHD might thus be larger than the CHD, so the MCHD is not a lower bound for the CHD.Note that if U 0 ⊥ ⊥ U 1 , the integrated MCHD depends on the functional form of f 0 .In Figure 7 in Appendix B the results for f 0 (t, are presented where the effect of the dependence is small.
4 Case study: the Radiation Therapy Oncology Group trial Next, we consider a large clinical trial carried out by the Radiation Therapy Oncology Group as described by Kalbfleisch and Prentice (2002, Section 1.1.2and Appendix A) and also presented by Aalen (1989).From the patients with squamous cell carcinoma (a form of skin cancer) of 15 sites in the mouth and throat from 16 participating institutions, our focus is only on two sites (faucial arch and pharyngeal tongue) and patients from the six largest institutions.All participants were randomly assigned to radiation therapy alone or combined with a chemotherapeutic agent.So, we are interested in the causal effect of the chemotherapeutic agent in addition to radiation therapy on survival.If the causal mechanism can be parameterized with SCM (2) without effect heterogeneity, i.e.
and the randomization was properly executed, implying N Ai ⊥ ⊥ U 0i , N T i , then, by Theorem 1, the OHD equals the CHD.Furthermore, the CHD can be unbiasedly estimated by fitting Aalen's additive hazard model.We did so by using the aalen() function from the package timereg in R. The estimated cumulative regression function (and a corresponding 95% confidence interval) of treatment combined with a chemotherapeutic agent is presented by the black lines in Figure 5.
In the absence of effect heterogeneity (ignoring the statistical uncertainty), one could now conclude that initially adding the chemotherapy is expected to harm a patient as B(t) takes on positive values and that the exposure loses its harmful effect over time as the derivative of B(t) decreases over time.Following a similar reasoning, Aalen et al. (2008) concluded for the effect of N-stage (an index of lymph node metastasis) on survival based on the same dataset (while also including patients with a tumour located at the tonsillar fossa) that "The regression plot shows that this [non-significant P-value for a zero-effect of N-stage from a Cox analysis] is due to a strong initial positive effect being "watered down" by a lack of, or even a slightly negative effect after one year.Hence, not taking into consideration the change in effect over time may lead to missing significant effects.".However, when in reality the observed time-varying effect can also result from the modifier U 1i selection.For example, when λ a i (t) = f 0 (t, U 0i )+U 1i , and U 1i ∼ BHN(0.5, −0.1, 0.5, 0.4), by Theorem 1, this pattern is expected (see the green line in Figure 5) while the causal effect is constant for each individual.The CHD equals 0.15 at each time point, but over time individuals that are harmed by the chemotherapy (U 1i = 0.4) are less likely to survive so that the MCHD converges towards −0.1 (the effect for individuals that benefit from the chemotherapy).When we perform a stratified analysis by site in the oropharynx (where randomization remains), we observe that the effect of chemotherapy might have opposite effects for tumours located in the faucial arch and on the pharyngeal tongue, see Figure 6.The tumour location could thus be the individual modifier underlying the BHN distribution.For this case study, we cannot be sure whether the effect of chemotherapy depends on the tumour location due to statistical uncertainty.However, it became clear that when statistical uncertainty is not the issue, it will be impossible to distinguish between a time-varying causal effect and a selection effect (of an unmeasured modifier) from data.Both phenomena can give rise to the same MCHD curves.

B(t) B(t)
Fig. 6: Estimated B(t) and corresponding 95% confidence interval (black) for patients with tumours located at the faucial (left) and pharyngeal tongue (right), respectively.Furthermore, the B(t) for a homogeneous population is presented (green) equal to 0.4t (right) and −0.1t (left) for comparison

Discussion and concluding remarks
The additive hazard model gives better interpretable estimates of causal effects than the proportional hazard model (Aalen et al., 2015).As discussed by Aalen et al. (2015), the model assumes that the additive part of the hazard involving the exposure (or treatment) is not affected by any other individual feature.Otherwise, if such effect heterogeneity at the hazard scale exists, we have shown that the OHD is a biased measure for the CHD of interest.A time-varying OHD can thus be the result of either an actual time-varying causal effect of the selection of favourable effect-modifier levels over time.Therefore, it is impossible to distinguish these scenarios based on data without making untestable assumptions.It is important to remark that for cause-effect relations that can be parameterized with SCM (2) where U 1 is degenerate (in which case the OHD can be an unbiased estimator of the CHD), contrary to the individual hazard differences, the difference of the potential survival times, T 1 − T 0 can be heterogeneous.So, heterogeneous effects can also exist under Aalen's additive hazard model.
In the case study, we have illustrated that one should be very careful when concluding that the effect decreases over time based on the cumulative regression function, as this might result from the selection.The size of the bias depends on how much the distribution F U1|T 1 ≥t changes over time.In case the U 1 is low in variability, the size of the bias will be small.When analyzing data from an RCT with an additive hazard model, it can thus be helpful to adjust for potential effect modifiers to reduce the remaining variability of unmeasured effect modifiers.
Even in the absence of confounding, the hazard difference and the hazard ratio (as discussed in our companion paper Post et al. (2022)) are biased estimators of the causal effect.Instead, contrasts of the survival probabilities, the median, or the restricted mean survival time, can be unbiased estimators of causal estimands and should thus be used to quantify the causal effect on time-to-event outcomes as suggested by others (Hernán, 2010;Bartlett et al., 2020;Stensrud et al., 2018;Young et al., 2020).

A.2 Proof of Theorem 1
Proof By causal consistency (Hernán and Robins, 2020), As N A ⊥ ⊥ U 0 , U 1 , N T , there is no confounding and T a ⊥ ⊥ A, so that lim By the law of total probability, First, we focus on the integrand, as the average integrated conditional hazard over the interval increases (or decreases).Then, the limit and integral can be interchanged by directly applying the monotone convergence theorem.
For non-monotone conditional hazard functions, let h ≤ h, and note Using the power series definition of the exponential function, . By application of the dominated convergence theorem, we can change the order of the limit and integral and conclude, lim
So that the Laplace transform of f U 1 |T 1 ≥t can be written as .