The built-in selection bias of hazard ratios formalized using structural causal models

It is known that the hazard ratio lacks a useful causal interpretation. Even for data from a randomized controlled trial, the hazard ratio suffers from so-called built-in selection bias as, over time, the individuals at risk among the exposed and unexposed are no longer exchangeable. In this paper, we formalize how the expectation of the observed hazard ratio evolves and deviates from the causal effect of interest in the presence of heterogeneity of the hazard rate of unexposed individuals (frailty) and heterogeneity in effect (individual modification). For the case of effect heterogeneity, we define the causal hazard ratio. We show that the expected observed hazard ratio equals the ratio of expectations of the latent variables (frailty and modifier) conditionally on survival in the world with and without exposure, respectively. Examples with gamma, inverse Gaussian and compound Poisson distributed frailty and categorical (harming, beneficial or neutral) distributed effect modifiers are presented for illustration. This set of examples shows that an observed hazard ratio with a particular value can arise for all values of the causal hazard ratio. Therefore, the hazard ratio cannot be used as a measure of the causal effect without making untestable assumptions, stressing the importance of using more appropriate estimands, such as contrasts of the survival probabilities.


Introduction
When interested in time-to-event outcomes, ideally, one would like to know the hazard rates of an individual in the worlds with and without exposure.In practice, the focus is on the ratio of the expected hazard rates in both worlds.It is then standard practice to fit the observed hazard rates with a (time-invariant) Cox model (Cox, 1972).A decade ago, Hernán (2010) raised awareness that hazard ratios estimated from a randomized controlled trial (RCT) are not suited for causal inference.Firstly, the average hazard ratio could be uninformative as there will typically be time-varying period-specific hazard ratios.More importantly, even if period-specific hazard ratios are estimated, these can vary solely due to the loss of randomization over time by conditioning on survivors.The exposure assignment and risk factors become dependent when conditioning on individuals that survived t, i.e. survival time T ≥ t, even if these risk factors are unrelated to the exposure (Aalen et al., 2015).As a result, effect measures based on hazards can suffer from non-collapsibility (Daniel et al., 2021;Sjölander et al., 2016;Aalen et al., 2015;Martinussen and Vansteelandt, 2013).
In practice, the wrong estimand, a ratio of (partly) marginalized hazards, is defined, that by the noncollapsibility, deviates from the conditional (causal) hazard ratio.This contrast is referred to as the built-in selection bias of hazard ratios as the non-collapsibility results from conditioning on prior survival (Hernán, 2010;Aalen et al., 2015;Sjölander et al., 2016;Stensrud et al., 2018;Young et al., 2020;Martinussen et al., 2020).This bias should not be confused with confounding bias that is absent when using data from an RCT (Didelez and Stensrud, 2021).For exposure assignment A, and the potential survival time when the exposure is intervened on to a denoted by T a , the observed hazard ratio from an RCT satisfies (De Neve and Gerds, 2020; Martinussen et al., 2020).The observed hazard ratio thus equals the ratio of hazard rates at time t for the potential outcomes of individuals from different populations; those for which T a ≥ t and those for which T 0 ≥ t.As already indicated before, these populations will typically not be exchangeable in other risk factors, implying that an effect found cannot be (solely) assigned to the exposure.The effect does thus not reflect how the hazard rate of an individual is affected by exposure.Only for cause-effect relations such that (1) is time-invariant, the estimand can be interpreted as log(P(T a ≥ t)) log(P(T 0 ≥ t)) .
It has therefore been recommended to use better interpretable estimands such as contrasts of quantiles, the restricted mean survival or survival probabilities of the potential outcomes respectively (Hernán, 2010;Bartlett et al., 2020;Stensrud et al., 2018;Young et al., 2020), or the probabilistic index derived from the latter (De Neve and Gerds, 2020).Alternatively, one can avoid interpretation issues by using accelerated failure time models (Hernán, 2010;Hernán et al., 2005) or additive hazard models (Aalen et al., 2015;Martinussen et al., 2020).Nevertheless, particularly in medical sciences, marginal hazard ratios are still commonly presented by practitioners.To boost the recommended paradigm shift, we would like to quantify the contrast between the causal hazard ratio and the observed hazard ratio, i.e. the built-in selection bias.To do so, we start by presenting a general parameterization of cause-effect relations for time-to-event outcomes using a structural causal model.This parameterization clearly defines the causal effect of an exposure on an individual's hazard and, as such, allows us to define the causal hazard ratio.The quantitative examples in which the hazard under no exposure varies among individuals, i.e. frailty, as presented in the literature (Stensrud et al., 2017;Aalen et al., 2015;Balan and Putter, 2020) do fit in our framework, and we will formalize results for these examples.Additionally, we will extend these examples with causal effect heterogeneity, i.e. the causal effect on the hazard rate might vary between individuals (Stensrud et al., 2017).

Notation
In this work, probability distributions of factual and counterfactual outcomes are defined in terms of the potential outcome framework (Rubin, 1974;Neyman, 1990).Let T i and A i represent the (factual) stochastic outcome and exposure assignment level of individual i.Let T a i equal the potential outcome of individual i under an intervention of level a (counterfactual when A i = a).For those more familiar with the do-calculus, T a is equivalent to T | do(A = a) as e.g.derived in Pearl (2009, Equation 40) and Bongers et al. (2021, Definition 8.6).Throughout this work, we will assume causal consistency: if A i = a, then T a i = T Ai i = T i , implying that potential outcomes are independent of the exposure levels of other individuals (no interference).
The hazard rate of T a can vary among the individuals in the population of interest.We will parameterize this heterogeneity for hazards of T 0 using a random variable U 0i that represents the frailty of individual i (Balan and Putter, 2020).The absolute effect of an exposure on the hazard, i.e. the contrast between the hazard rates of T 0 i and T a i respectively, can depend on U 0i .However, there might also be (relative) effect heterogeneity that we parameterize using the random variable U 1i , giving rise to an individual-specific hazard ratio.The hazard of the potential outcome T a i can be parameterized with a function that depends on U 0i , U 1i and a.We focus on cause-effect relations that can be parameterized with a structural causal model (SCM)1 , that consists of a joint probability distribution of (N A , U 0 , U 1 , N T ) and a collection of structural assignments (f A , f λ ) such that where e. when a = 0 f λ does not contain U 1 , and Note that the data generating mechanism is described by this SCM as T Ai i = T i .For the distribution of data observed from an RCT, we have that N A ⊥ ⊥ U 0 , U 1 , N T , i.e. no confounding, as a result of the randomization.Note that λ a i (t) equals the hazard of the potential outcome of individual i under exposure a, i.e.
In this parameterization, U 0 thus results in heterogeneity of the hazard under no exposure between individuals, and the presence of U 1 results in heterogeneity of the effect of the exposure on the hazard between individuals.The SCM could be re-parameterized by including more details, e.g.measured risk factors, so that the remaining (conditional) unmeasured heterogeneity will be less variable.
Formulation of hazard rates of potential outcomes presented in the literature, e.g. by Aalen et al. (2015) and by Stensrud et al. (2017), naturally fit in this parameterization.However, the dependence of T a and T 0 beyond shared frailty is typically not specified.The causal relations in SCM (2) can be visualized in a graph.In Figure 1, one can observe this so-called Single-World Intervention Graph (SWIG) as proposed by Richardson and Robins (2013) unified with the traditional causal directed acyclic graph (DAG) (Pearl, 2009).
It is insightful to realize that T a i depends on a and the random variables U 0i , U 1i , N T i only, i.e.
for some function g, so by randomization However, it should be noted that as ) and thus on T a i .In the literature, ( 4) is often implicitly derived by recognizing that {T ≥ t} is a collider in Figure 1, and thus opens a back-door path between A and T a (Aalen et al., 2015;Sjölander et al., 2016).The bias that results from conditioning on this collider is often referred to as the built-in selection bias of the hazard ratio.More general, in causal inference, selection bias is caused by conditioning on a variable G that induces dependence between Y a and A, i.e.Hernán and Robins (2020, Chapter 8).By causal as is the case in Equation ( 4) for G = {T ≥ t}).However, the built-in selection bias is subtle as the dependence is a consequence of conditioning a potential outcome on information on its factual outcome, while it should be noted that (for an RCT), again as A more standard example of selection bias for survival analysis from an RCT is not accounting for informative censoring (C = 1) (Hernán et al., 2004;Howe et al., 2016), in which case When interested in the causal effect, we should try to express the distribution of potential outcomes in terms of the observed distribution as we will do in the next section.
The SWIG (black) extended with the DAG (green) for SCM (2), visualizing the causal relations between the exposure assignment A, the observed survival indicator {T ≥ t} and the potential survival time T a .

The causal hazard ratio
Parameterization of the cause-effect relations with SCM (2) allows us to define the causal hazard ratio.If then the individual causal effect is captured by f 1 (t, U 1i , a).The latter equals the ratio of the hazard of an individual's potential outcome when exposed and when non-exposed at time t, i.e.
In the case of homogeneous effects, f 1 (t, U 1i , a) = f 1 (t, a) is equal for all individuals and would thus be the estimand of interest.In the case of heterogeneity of effects, f 1 (t, U 1i , a) is the individual multiplicative causal effect.From a public health perspective, the ratio of the expected hazard in the world where everyone is exposed to a and the expected hazard in the world where all individuals are unexposed is of interest.This causal hazard ratio (CHR) of interest can be obtained as the ratio of the marginalized (over U 0 and U 1 ) conditional hazard rates in both worlds as presented in Definition 1.
Definition 1 Causal hazard ratio The causal hazard ratio (CHR) for cause-effect relations that can be parameterized with SCM 2 equals where we abbreviate the Lebesque-Stieltjes integral of a function g with respect to probability law F X , i.e. g(x)dF X (x), as g(X)dF X .
Note that, when U 0 ⊥ ⊥ U 1 , the CHR can differ from the expectation of an individual's multiplicative effect (E When the parameterization of the cause-effect relations as SCM (2) would be known, the CHR can be expressed in terms of the distribution of the data generating mechanism as presented in Theorem 1.
Theorem 1 If the cause-effect relations of interest can be parameterized with SCM (2), and Consider, for example, the commonly used frailty model where effect heterogeneity is absent, i.e.
The CHR equals the multiplicative effect that does not differ among individuals and equals f 1 (t, a).By applying Theorem 1, this CHR is indeed derived to equal It is important to note that (7) deviates from the observed hazard ratio equal to as we will elaborate on in Section 4. In summary, it becomes clear that to derive the CHR from data, inference on the distribution of the latent frailty U 0 and effect modifier U 1 must be made.When their distributions are known, inference can be drawn from observed data, even when the parameters of the distributions are unknown.Software available to estimate frailty parameters are described by Balan and Putter (2020), and such methods could also be adapted to estimate the latent modifier distribution.However, in practice, the distributions of these latent variables are unknown.Even in the case without causal effect heterogeneity, it is impossible to distinguish the presence of frailty from a time-dependent causal effect (Balan and Putter, 2020, Section 2.5).More precisely, different combinations of (varying) effect sizes and frailty distributions give rise to the same marginal distribution.The same holds for combinations that also involve effect modifiers.In the case of clustered survival data, at least theoretically, the shared frailty could be distinguished from violation of proportional hazards (Balan and Putter, 2020).Reasoning along the same lines, individual frailty and marginal time-varying effects could only be derived from effect heterogeneity in the case of recurrent events with stationary distributions.

Marginal causal hazard ratio
In Theorem 1, the actual CHR (see Definition 1) has been expressed in terms of the distribution of the observed data, and we concluded that these are typically not identifiable.Practitioners often compute the hazard ratio from data, which expectation is referred to as the observed hazard ratio (OHR) and, in the absence of informative censoring, equals To compare the OHR to the CHR that quantifies the causal effect of interest, the OHR should be expressed in terms of potential outcomes.For data from an RCT, by independence (5) and causal consistency, the OHR equals the marginal causal hazard ratio (MCHR), i.e.
This MCHR should not be confused with the 'marginal causal hazard ratio' defined by Martinussen et al. (2020) that equals and is not considered in this work.
We will study how the MCHR (and thus the OHR from an RCT) differs from the CHR over time.By the law of total probability, (9) equals As the integration in the result of Theorem 1 is with respect to the distribution of U 0 and U 1 , in the entire population, instead of those individuals for which T a ≥ t or T 0 ≥ t, the MCHR deviates from the CHR, i.e. the built-in selection bias of the hazard (Hernán, 2010;Aalen et al., 2015;Stensrud et al., 2018).
The problem induced for estimation of the CHR thus results from inference on the wrong estimand; the combined effect of the exposure of interest and the difference in latent frailty (and effect modification) distribution, is computed.This section will formalize how (9) deviates from the CHR.Let us again focus on cause-effect relations that can be parameterized with SCM (2) in which where the CHR equals E[f 1 (t, U 1 , a)].These would be the cause-effect relations for which the CHR is an appropriate causal effect measure.Moreover, we focus on hazard functions that satisfy Condition 1 and do thus not have an infinite discontinuity.
Condition 1 Hazard without infinite discontinuity The real value of the MCHR, that deviates from the CHR, is derived in Theorem 2.
Theorem 2 If the cause-effect relations of interest can be parameterized with SCM (2), where and Condition 1 holds, then The conditional expectations that determine the value of the MCHR equal weighted means of f 0 (t, u 0 )f 1 (t, u 1 , a) and f 0 (t, u 0 ) with weights P(T a ≥t|U0=u0,U1=u1) P(T a ≥t) and P(T 0 ≥t|U0=u0) P(T a ≥t) respectively.To develop our understanding of the bias when the OHR (assuming no confounding) is used to estimate the CHR, we will first continue to study the bias as a result of frailty and heterogeneity separately in the next two subsections.

Causal effect homogeneity
In the case of homogeneous multiplicative causal effects on the hazard, i.e. f 1 (t, U 1i , a) = f 1 (t, a), the ratio of the marginal hazard rates of individuals satisfying T a i ≥ t and of those T 0 i ≥ t equals f 1 (t, a) multiplied by a factor that depends on the difference in frailty distributions at time t in those two populations as derived in Corollary 1.
Corollary 1 If the cause-effect relations of interest can be parameterized with SCM (2), where and condition 1 holds then The conditional expectation E [f 0 (t, U 0 ) | T a ≥ t] can be seen as a reweighted mean of f 0 (t, u 0 ).These weights equal P(T a ≥t|U0=u0) and over time increase for favourable values of decreases over time.If Λ a (u 0 , t) > Λ 0 (u 0 , t), e.g. when ∀t : f 1 (t, a) > 1, then the weights P(T a ≥t|U0=u0) decrease more rapidly with u than the weights P(T 0 ≥t|U0=u0) , leading to On the contrary, when Λ a (u 0 , t) < Λ 0 (u 0 , t), then So in the case of effect homogeneity, at time t, the MCHR is larger than the CHR when Λ a (u 0 , t) < Λ 0 (u 0 , t) and smaller when Λ a (u 0 , t) > Λ 0 (u 0 , t).An example of the latter was showcased by Stensrud et al. (2017), where a model with f 1 (t, a) = 1.81 a , f 0 (u 0 , t) = u 0 λ 0 (t) and compound Poisson distributed frailty U 0 could well explain the decrease of the effect of hormone replacement therapy on coronary heart disease in postmenopausal women over time as observed from an RCT by the Woman Health Initiative.Furthermore, as emphasized by Hernán (2010) based on the same case-study, even when f 1 (t, a) is constant over time, the (bias of the) MCHR hazard is time-varying, so that the estimates can depend on the follow-up time of a study.
For frailty models as presented by Stensrud et al. ( 2017) and Aalen et al. (2015), where f 0 (t, U 0i ) = U 0i λ 0 (t), it has already been shown by Balan and Putter (2020) that E [U 0 | T ≥ t, A = a] can be expressed in terms of the Laplace transform of the frailty U 0 .Reasoning along the same lines, E [U 0 | T a ≥ t] is expressed in terms of the Laplace transform of the U 0 in Lemma 1.
Lemma 1 If the cause-effect relations of interest can be parameterized with SCM (2), where where As in this previous work (Balan and Putter, 2020, Figure 5), we present examples where U 0 follows a gamma, inverse Gaussian or compound Poisson distribution respectively.The parameterizations, corresponding Laplace transforms and expressions for E[U 0 | T a ≥ t] can be found in Appendix B. To illustrate the selection bias, we consider a binary exposure and let can be derived.The MCHR then follows from Corollary 1 (conditional hazard is monotone increasing).The expressions for these quantities are presented in Table 1.
How the MCHR deviates from c over time for c ∈ { 1 3 , 3}, and θ ∈ {0.5, 1, 2} is visualized in Figure 2.For both c = 3 and c = 1 3 the selection of individuals that survive time t result in a MCHR that evolves in the opposite direction of the causal effect, towards 1, √ c and √ c −1 respectively.For the case of a compound Poisson frailty, the logarithm of this latter limit is even opposite to the sign of the logarithm of the CHR due to the nonsusceptible individuals.For all types of frailty, the higher the variance of U 0 , the larger the difference between the MCHR and the CHR.For comparison we have also presented the survival curves of T 1 and T 0 in Figure 10 in Appendix C for the setting where θ 0 = 1.Note that for an RCT T a d = T | A = a by (2) and causal consistency.
Table 1: Conditional expectations and resulting MCHR when the CHR equals c for different frailty distributions such that E[U 0 ] = 1 and var(U 0 ) = θ 0 (in absence of effect modification).

Causal effect heterogeneity in the absence of frailty
Before we return to the general case presented in Theorem 2, let's consider the presence of effect heterogeneity in the absence of frailty, i.e.
is equal for all t, the MCHR is not as over time among the exposed individuals those that 'benefit' more are more likely to survive.The effect of this selection on the MCHR over time is formalized in Corollary 2.
Corollary 2 If the cause-effect relations of interest can be parameterized with SCM (2), where , and Condition 1 holds then ] as more weight is placed on lower values of f 1 (t, u 1 , a) that correspond to higher P(T a ≥ t | U 1 = u 1 ).Besides the selection of frailty factors, the selection of individual modifiers can also lead to selection bias of the estimated hazard ratio.
decreases over time irrespective of whether the exposure is beneficial or harming on average.For a hypothetical setting without frailty but with effect heterogeneity, the CHR at t is thus systematically underestimated when using the MCHR, so the exposure seems more 'beneficial' than it is.Such a difference has only been explained by the presence of frailty (Hernán, 2010;Stensrud et al., 2017).For binary exposures and SCMs that further restrict f 1 (t, u 1 , a) = u a 1 , by Lemma 1 (as which by Corollary 2 (conditional hazard is monotone increasing) equal the MCHRs, and are presented in Table 2. Additionally, we derived E[U 1 | T 1 ≥ t] for a setting where the multiplicative hazard effect modifier U 1 equals µ 1 (< 1, for individuals that benefit) with probability p 1 , µ 2 (> 1, for individuals that are harmed) with probability p 2 or 1 (for individuals that are not affected).We define this distribution as the Benefit-Harm-Neutral, BHN(p 1 , µ 1 , p 2 , µ 2 ), distribution.
Table 2: Conditional expectation of the individual effect modifier U 1 when the CHR equals c for different modifier distributions such that E[U 1 ] = c and var(U 0 ) = θ 1 (in absence of frailty).
For E[U 1 ] ∈ 1 3 , 3 , and θ 1 ∈ {0.5, 1, 2} the evolution of the conditional expectation is shown in Figure 3 for all four effect-modifier distributions.For the BHN distribution, when E[U 1 ] equals 3 and 1 3 , we fix p 1 = 0.05, µ 1 = 0.5 and p 1 = 0.9, µ 1 = 0.1 respectively.Expressions for p 2 and µ 2 such that E settings considered there is a point in time that the MCHR drops below 1.For the continuous distributions, the MCHR won't stop decreasing.The decreases for the gamma and compound Poisson settings are very similar, while for the inverse Gaussian setting, this goes a bit slower.For the discrete setting, the marginal causal hazard converges to µ 1 of 0.5 and 0.1, respectively.Again, as in the previous subsection, the higher the variability of the latent variable, the faster the MCHR deviates from the CHR.Only for the discrete effect modifier, the lines cross for the different variances for E[U 1 = 3], but this is the result of different fractions of individuals that are not affected by the exposure (as the mean and variance are coupled).

Causal effect heterogeneity in the presence of frailty
In the general case where effect heterogeneity and frailty are present, both principles affect the value of the MCHR.By Theorem 2, the ratio evolves as . The numerator depends on the joint distribution of U 0 and U 1 .For illustration, we again consider a binary exposure and let and, by Lemma 1, can be derived from the Laplace transforms of U 0 U 1 and U 0 respectively.

Independent U 0 and U 1
In the case of independence, the Laplace transform of the product equals E[L U0 (cU 1 )], which generally does not adopt a tractable form.The case with a discrete effect modifier, introduced in Section 4.2, forms an exception.If U 1 ∼ BHN(p 1 , µ 1 , p 2 , µ 2 ), then For the running example where are independent of the U 1 distribution these expectations are the same as presented in Table 1.
The MCHR and its limit can be derived from the expressions in Table 3 by applying Theorem 2 (conditional hazard is monotone increasing).Interestingly, for gamma frailty, this limit remains 1.For the inverse Gaussian frailty, the effect heterogeneity drastically change the limit from E[U 1 ] to √ µ 1 , which is always less than 1.Finally, for compound Poisson frailty the limit changes from The evolution of over time is visualized in Figure 4 for U 1 ∼ BHN(0.05,0.5, 0.82, 3.5), such that E[U 1 ] = 3 and var(U 1 ) = 1, and for U 1 ∼ BHN(0.9, 0.1, 0.03, 6.0), such that E[U 1 ] = 1 3 and var(U 1 ) = 1.
In the case the CHR is larger than 1, the selection of less susceptible individuals (frailty) that are harmed less (effect modifier) in the exposed world, both cause the MCHR to be smaller than the CHR.The MCHR decreases faster in the presence of effect heterogeneity.This explains the observation by Stensrud et al. (2017), "Interestingly, the magnitude of frailty bias is larger when a heterogeneous treatment effect is included", for a simulation with frailty and random individual hazard ratios such that E[λ 1 i (t)] = 1.81 > 1.For the gamma and compound Poisson frailty examples this effect is relatively small as 1 (for the presented p 1 , µ 1 , p 2 and µ 2 ).However, for inverse Gaussian frailty, the MCHR deviates much more from the CHR in the presence of effect heterogeneity.In Figure 5, the evolution of the MCHR is presented for a longer timescale, and the limits become apparent.If the CHR is smaller than 1, then the selection of less susceptible individuals (frailty) in the unexposed world and the selection of individuals that benefit more (effect modifier) in the exposed world have opposite effects on the MCHR.For this case of discrete effect modifiers, the MCHR first decreases by selecting individuals with more beneficial modifiers and later increases (above the CHR) when the frailty selection effect prevails.For the examples presented, the fraction p 1 = 0.9 of the population with µ 1 = 0.1 are expected to survive so that over time the MCHR will resemble the MCHR in the absence of effect heterogeneity for this subpopulation (with the CHR equal to 0.1).The limit for gamma frailty is still one, so the MCHR deviates less from the CHR due to the two opposed selection effects.The bias is strongly reduced for the inverse Gaussian frailty as the limit √ 0.1 is close to the actual CHR.Finally, for the compound Poisson frailty, the MCHR with effect heterogeneity crosses the MCHR in the absence of effect heterogeneity as the frailty bias is larger for a CHR of 0.1 compared to one of 1 3 .In summary, the bias for the CHR can further increase in the presence of effect heterogeneity, stressing the issues regarding the causal interpretation of OHRs (assuming no confounding).However, for beneficial exposures, the frailty bias can reduce in the presence of effect heterogeneity (e.g., Inverse Gaussian frailty), illustrating that there might be settings where the MCHR is close to the CHR.
3 (opaque orange), when U 0 follows a gamma (left), inverse Gaussian (middle) or compound Poisson (right) distribution with variance 0.5 (dotted), 1 (solid) or 2 (dashed) respectively.For comparison, the lines presented in Figure 2 as well as the limits of the MCHRs are represented by transparent lines.

Dependent U 0 and U 1
In case the multiplicative effect of the exposure on the hazard of susceptible individuals is expected to be higher or lower than for less susceptible individuals, the distribution of U 0 U 1 will be less or more variable than when the latent variables are independent.Every bivariate joint distribution function can be written using the marginal distribution functions and a copula C (Sklar, 1959).As such, and the Kendall's τ correlation coefficient of U 0 and U 1 can be written as a function of the Copula C (Nelsen, 2006).To study how the dependence can affect the MCHR for the setting presented in Figure 4, we use a Gaussian copula where Φ and Φ 2,ρ are the standard normal and bivariate normal with correlation ρ, cumulative distribution functions, respectively.For ρ ∈ {−1, sin(− π 4 ), 0, sin( π 4 ), 1} (such that τ ∈ {−1, −0.5, 0, 0.5, 1}) and var(U 0 ) = 1, E U 0 U 1 | T 1 ≥ t is derived empirically from simulations and are presented in Figure 6.All programming codes used for this work can be found online at https://github.com/RAJP93/CHR.The results were very similar when using a Frank, Clayton or Gumbel copula instead of the Gaussian copula.For reference, the evolution of E U 0 | T 0 ≥ t over time is also presented in Figure 6.The corresponding MCHRs are presented in Figure 7.Note that for ρ = 0, we recover the independent setting already shown in Figure 4 that can be used for comparison.First of all, it is important to recall that when U 0 and U 1 are dependent the CHR equals E[U 1 ] + cov(U 0 , U 1 ) as discussed in Section 3.For CHRs greater than 1, it becomes clear that the selection

Conditional expectation
Fig. 6: 3 (orange), U 0 follows a gamma (left), inverse Gaussian (middle) or compound Poisson (right) distribution and where the joint distribution of U 0 and U 1 follows from a Gaussian copula with varying Kendall's τ correlation coefficient (see legend).Furthermore, the evolution of (orange), U 0 follows a gamma (left), inverse Gaussian (middle) or compound Poisson (right) distribution and where the joint distribution of U 0 and U 1 follows from a Gaussian copula with varying Kendall's τ correlation coefficient (see legend).
effect is more serious for cases with a high (positive) correlation between U 0 and U 1 .The stronger selection effect is due to the higher variability of U 0 U 1 .For CHRs less than 1, this trend is only true at short timescales, after which the frailty selection effect takes over since for this example a large fraction of the individuals, p 1 = 0.9, the effect is the same (U 1 = 0.1).When we use a continuous gamma distributed U 1 instead, the frailty selection effect is less apparent, as shown in Figure 8.So far, for CHRs larger than 1, we have observed 3 (orange), U 0 follows a gamma (left), inverse Gaussian (middle) or compound Poisson (right) distribution and where the joint distribution of U 0 and U 1 follows from a Gaussian copula with varying Kendall's τ correlation coefficient (see legend). a monotonic MCHR.However, in the case of strong dependence between U 0 and U 1 (τ = −1, and τ = 1 for inverse Gaussian frailty) resulting in a non-monotonic trend for the MCHR.For a Gamma distributed U 1 , in the case of inverse Gaussian distributed U 0 with τ = 1, the MCHR even equals a monotonic increasing function over time as shown in Figure 8.
In this section, we have applied Theorem 1 to several examples to illustrate the deviation of the MCHR from the CHR.In summary, even when the CHR is constant, an OHR from an RCT equal to x (at time t) can occur for different CHR values when the (U 0 , U 1 ) distribution is unknown as summarized in Table 4.
Table 4: Assuming no confounding, an OHR (at time t) equal to x can occur for all values of a constant CHR as a result of selection of the frailty (U 0 ) or modifier factor (U 1 ) which might be dependent.

Cause
Presented examples Modifier selection gamma distributed modifier (Figure 8) CHR < x Frailty selection

Implications for the Cox model
We have demonstrated that in the presence of frailty and effect heterogeneity, even when the CHR is constant, the MCHR varies over time.Then, the proportional hazards assumption will not hold for an OHR from an RCT (that is in the absence of informative censoring equal to the MCHR as discussed at the start of Section 4).Despite the many options to deal with non-proportional hazards (see, e.g.Thernau and Grambsch (2000, Section 6.5) or Hess (1994); Bennett (1983); Wei and Schaubel ( 2008)), in the majority of epidemiological time-to-event studies, the traditional Cox's proportional hazard model (that is thus misspecified) is fitted.The logarithm of the Cox estimate can be interpreted as the logarithm of the OHR marginalized over the observed death times (Schemper et al., 2009), i.e.E[log(OHR(T )) | C = 0] for censoring indicator C. The logarithm of the Cox estimate obtained from an RCT thus equals a time-weighted average of the logarithm of the OHR.In the case of non-proportional hazards, even for uninformative censoring, the estimate is well-known to be affected by the censoring distribution.It differs from the average log hazard ratio E [log(OHR(T ))] (Schemper et al., 2009;Xu and O'Quigley, 2000;Boyd et al., 2012).Therefore, the bias of the Cox estimate, when the estimand is the CHR, will depend on the joint distribution of (U 0 , U 1 ) as well as the censoring distribution.In most cases considered in Section 4, the deviation of the OHR from the CHR increased over time.For uninformative censoring, the probability of censoring increases over time, so the Cox estimate is closer to the OHR at short times.In Figure 9, this is demonstrated for the gamma-frailty case (Figure 4, var(U 0 = 1)) by presenting empirically obtained E[log(OHR(T )) | C = 0] (with 1, 000, 000 replications) based on a varying follow-up time and loss to follow-up modelled with an exponential censoring-time distribution with different parameters.The built-in selection bias of the OHR (assuming no confounding) emphasizes the 0 2 4 6 8 10 1.0 1.5 2.0 2.5
importance of testing the proportional hazard assumption.Serious bias will result in a time-varying OHR, so that violation of the proportional hazard assumption can be verified when fitting a Cox model.However, when the actual CHR would be time-varying, the OHR can be approximately constant when the selection effect and the change in CHR roughly cancels out (see, e.g.Stensrud et al. (2018)).The data cannot be used to distinguish the latter case from the case where there is no heterogeneity and, thus, no selection effect.Similarly, as already mentioned at the of Section 3, when the OHR would vary over time, we can never conclude whether this is the result of a time-varying causal effect or due to selection.However, the proportional hazard assumption would be violated in this case, and a standard Cox model is inappropriate.

Discussion and concluding remarks
In this paper, we have formalized how heterogeneity leads to deviation of the MCHR (see Equation ( 9)) from the CHR of interest (see Definition 1) due to the selection of both the individual frailty factor (U 0 ) and the individual effect modifier (U 1 ).This work generalizes frailty examples presented in the literature (Hernán, 2010;Aalen et al., 2015;Stensrud et al., 2017), by considering the possibility of multiplicative effect (on the hazard) heterogeneity that also results in non-exchangeability of exposed and unexposed individuals over time.As a result of the individual effect modifier (U 1 ), the individuals that survive in the exposed groups are expected to benefit more (or suffer less) from the exposure.At the same time U 0 | T 1 ≥ t will have a different distribution than U 0 | T 0 ≥ t.When the CHR is larger than 1 and U 0 ⊥ ⊥ U 1 , the selection effects act in the same direction.On the other hand, when the CHR is smaller than 1 and U 0 ⊥ ⊥ U 1 , the selection effects can act in opposite directions so that the MCHR might be closer to the CHR than in the case without effect heterogeneity (see Figure 4).
For data from an RCT, in the absence of informative censoring, the OHR equals the studied MCHR so that all results directly relate to the OHR.For observational data, the OHR does not equal the MCHR due to confounding.However, when all confounders L are observed, i.e.T a ⊥ ⊥ A | L, one can study the conditional (on L) OHR that in turn is equal to the conditional MCHR.The presented theorems are valid while conditioning on L.
The intuition explained by Hernán (2010) suggests that the MCHR underestimates the effect size, while the sign of the logarithms of the MCHR and the CHR are equal.However, we have shown that in the presence of effect heterogeneity, an MCHR equal to x can occur both under CHR > 1 as well as CHR < 1 as summarized in Table 4. Therefore, OHRs from RCTs are not guaranteed to present a lower bound for the causal effect without making untestable assumptions on the (U 0 , U 1 ) distribution.We have derived how the MCHR will evolve due to the selection of frailty and effect modifiers in Theorem 1.However, in practice, only the evolution of the OHR can be found (assuming sufficient data is available).Even after assuming absence of confounding (e.g. in case of a RCT) the CHR is non-identifiable without making (untestable) assumptions on the (U 0 , U 1 ) distribution as discussed at the end of Section 3.
We can thus not distinguish between a time-varying CHR without selection of U 0 and U 1 or a constant CHR with selection, see e.g Stensrud and Hernán (2020).However, adjusting for other risk factors can lower the remaining variability of U 0 and U 1 so that the bias is reduced.Even for an RCT, it may thus help to focus on adjusted hazard ratios despite the absence of confounding.Nevertheless, adjusting for other risk factors will require more data and modelling decisions.We want to remark that although additive hazard models (when well-specified) do not suffer from the frailty selection as shown by Aalen et al. (2015), these models will suffer from latent modifier selection in the presence of effect heterogeneity ( Proof By the law of total probability, First we focus on the integrand, For monotonic (increasing or decreasing) conditional hazard functions if h 2 < h 1 , then as the average integrated conditional hazard over the interval increases (or decreases).Then, the limit and integral can be interchanged by directly applying the monotone convergence theorem.
For non-monotone conditional hazard functions, let h ≤ h, and note where max s∈(t,t+ h) Using the power series definition of the exponential function, Then we can change the order of the limit and integral by application of the dominated convergence theorem and conclude, Applying Bayes rule, we obtain Furthermore, The ratio of both gives us the result.

A.3 Proof of Corollary 1
Proof By Theorem 2, and equals

A.4 Proof of Lemma 1
Proof By Bayes rule, the probability density f U 0 |T a ≥t equals Such that the Laplace transform of U 0 | T a ≥ t can be written as .

C Survival curves examples
The survival curves of T 0 and T 1 for the examples presented in this paper can be expressed in terms of the Laplace transforms presented in the previous section as shown in Table 5, where Λ 0 (t) = t 3 60 .For the example where U 0 ⊥ ⊥ U 1 , the survival curve for T 1 is obtained empirically from simulation.For data from a RCT T a d = T | A = a.Fig. 12: Survival curves for T 0 (black), and 1 (colored) when λ a (t) i = U 0i (U 1i ) a t 2 20 for a unit-variance BHN distributed U 1 with E[U 1 ] = 3 (green) and E[U 1 ] = 1 3 (orange), when U 0 follows a gamma, inverse Gaussian or compound Poisson distribution (from left to right) with variance 1.The corresponding MCHR curves were presented in Figure 4. Fig. 13: Survival curves for T 0 (black), and T 1 (colored) when λ a i (t) = U 0i (U 1i ) a t 2 20 , for a unit-variance BHN distributed U 1 with E[U 1 ] = 3 (green) or E[U 1 ] = 1 3 (orange), when U 0 follows a gamma, inverse Gaussian or compound Poisson distribution (from left to right) with variance 1 and where the joint distribution of U 0 and U 1 follows from a Gaussian copula with varying Kendall's τ .The corresponding MCHR curves were presented in Figure 7. Fig. 14: Survival curves T 0 (black), and T 1 (colored) when λ a i (t) = U 0i (U 1i ) a t 2 20 , for a unit-variance gamma distributed U 1 with E[U 1 ] = 3 (green) or E[U 1 ] = 1 3 (orange), when U 0 follows a gamma, inverse Gaussian or compound Poisson distribution (from left to right) with variance 1. and where the joint distribution of U 0 and U 1 follows from a Gaussian copula with varying Kendall's τ .The corresponding MCHR curves were presented in Figure 8.