Statistics in Biosciences

Volume 5, Issue 2, pp 261–285

The Design of Intervention Trials Involving Recurrent and Terminal Events

Authors

  • Longyang Wu
    • Department of Statistics and Actuarial Science, University of Waterloo
  • Richard J. Cook
    • Department of Statistics and Actuarial Science, University of Waterloo
Article

DOI: 10.1007/s12561-013-9083-z

Cite this article as:
Wu, L. & Cook, R.J. Stat Biosci (2013) 5: 261. doi:10.1007/s12561-013-9083-z

Abstract

Clinical trials are often designed to assess the effect of therapeutic interventions on the incidence of recurrent events in the presence of a dependent terminal event such as death. Statistical methods based on multistate analysis have considerable appeal in this setting since they can incorporate changes in risk with each event occurrence, a dependence between the recurrent event and the terminal event, and event-dependent censoring. To date, however, there has been limited development of statistical methods for the design of trials involving recurrent and terminal events. Based on the asymptotic distribution of regression coefficients from a multiplicative intensity Markov regression model, we derive sample size formulas to address power requirements for both the recurrent and terminal event processes. We consider the design of trials for which separate marginal hypothesis tests are of interest for the recurrent and terminal event processes and deal with both superiority and non-inferiority tests. Simulation studies confirm that the designs satisfy the nominal power requirements in both settings, and an application to a trial evaluating the effect of a bisphosphonate on skeletal complications is given for illustration.

Keywords

Multistate model · Non-inferiority · Recurrent events · Sample size · Superiority

1 Introduction

1.1 Background

Clinical trials must be designed with appropriate power to address scientific needs, ethical demands, and financial restrictions. In parallel group randomized trials involving failure time outcomes, power objectives are typically met for a given model (e.g. the Cox model) by specifying the event rate in the reference arm, the clinically important effect, the censoring rate, and the size of the test, and then deriving a suitable sample size based on large sample theory [1]. Under this general framework, a number of authors have developed methods for planning trials based on analyses of the time to the first event [2–5].

Sample size formulas have been developed [6] for recurrent event outcomes based on mixed Poisson models with multiplicative rate functions [7, 8]. Power and sample size considerations were subsequently developed for more general multiplicative intensity-based models [9] using counting process theory [10]. Another approach to the analysis of recurrent event data in clinical trials is to use robust methods for the analysis of multivariate survival data [11] under a working independence assumption, and sample size formulas for this approach are available [12]. More recently there has been interest in trial design based on covariate-adjusted log-rank statistics for recurrent event analyses, and associated sample size formulas have been developed [13].

To date no methods have been developed for the design of clinical trials in which the aim is to test treatment effects on recurrent and terminal event processes. We address this problem under the framework of a Markov model with transient states corresponding to the recurrent events and a single absorbing state for death. The treatment effect on the recurrent events is formulated by specifying multiplicative intensity models with time-dependent strata based on the cumulative event history and a common treatment effect; this formulation is in the spirit of the Prentice et al. [14] approach to the analysis of recurrent events. Multiplicative intensity-based models are also adopted for mortality with the same stratification criteria. Under this formulation we derive the limiting values of the partial score statistics for the treatment effects on the recurrent and terminal event processes, along with their asymptotic variances under the null and alternative hypotheses. Sample size criteria are then obtained to satisfy power objectives when separate marginal hypothesis tests are of interest for the two types of event.

Non-inferiority designs are being used increasingly often in cancer and cardiovascular research [15, 16] since many treatments with proven efficacy are available and placebo-controlled trials are therefore unethical. In such settings new interventions are required to have some advantages over standard care, such as a lower cost, a lower rate of adverse events, or a less invasive mode of administration [17]. Rothmann et al. [15] provide an excellent discussion of the various approaches to hypothesis testing in the context of non-inferiority oncology trials, and extensions have recently been made for recurrent event analyses based on mixed Poisson models or robust marginal methods [18]. We consider design issues when there are superiority or non-inferiority hypotheses for the recurrent event and survival processes in the context of the multistate model.

The remainder of this paper is organized as follows. In the next subsection we give further background on the setting of palliative trials for patients with cancer metastatic to bone. In Sect. 2 we define notation, describe the multistate model, and derive the relevant partial score statistics. The limiting distributions of the partial score statistics are derived in Sect. 3, which facilitates the sample size calculations in Sect. 4. Simulation studies in Sect. 5 confirm that the empirical frequency properties are compatible with the nominal levels under the null hypothesis and that power requirements are met. An application is given in Sect. 6, and general remarks and topics for further research are discussed in Sect. 7.

1.2 Trial Design for Patients with Skeletal Metastases

Cancer patients with skeletal metastases are at increased risk of a variety of clinical events including pathological and nonpathological fractures, bouts of acute bone pain, and episodes of hypercalcemia. These events are typically grouped together to form a composite recurrent “skeletal related event”, which is used as a basis for evaluating treatments designed to reduce the occurrence of skeletal complications, help maintain functional ability and quality of life, and minimize health service utilization [19]. Because the patient population has metastatic cancer, they are also at considerable risk of death. In breast cancer, twelve-month survival in recent studies has been approximately 78.9 % in treated patients; in lung, prostate, and other solid tumors the twelve-month survival rates were 28.0 %, 66.0 %, and 33.6 %, respectively.

While bisphosphonate therapy is palliative and not expected to affect survival, an assessment of the effect on survival times is warranted for a complete evaluation of the consequences of treatment. Simultaneous consideration of treatment effects on the recurrent skeletal related events and survival is therefore essential, and analyses must accommodate a possible association between the recurrent event process and death.

2 Likelihood for Recurrent and Terminal Events

We adopt the framework of a continuous time multistate Markov process to jointly model the recurrent events and terminal event. Let {Zi(s), 0<s} denote this process for individual i with a countable number of states in the state space \(\mathcal{S}=\{0, 1, \ldots, D \}\) and a right continuous sample path. The integers 0,1,2,… represent the number of recurrent events experienced over time and D represents an absorbing death state. Figure 1 displays a multistate diagram for the recurrent and terminal event processes. If individual i is alive at time t and has experienced precisely j events over (0,t], then Zi(t)=j; if individual i dies at time s, then Zi(t)=D for t≥s. We assume that all subjects are in state 0 at time t=0, the time of randomization. Let vi be a binary treatment indicator for individual i such that vi=1 if individual i was randomized to the experimental treatment and vi=0 otherwise.
Fig. 1

Recurrent events with terminal event diagram representing the model formulation based on counting processes. λ0j(t)exp(βv), j=1,2,…, are the transition intensities for the recurrent events from state (j−1) to state j. State D represents the terminal event of death and γ0j(t)exp(θv), j=1,2,…, are the corresponding event-dependent transition intensities; for convenience state Ej is simply referred to as state j

Let Tij be the time individual i enters state j, j=1,…, and \(T_{i}^{d}\) their time of death, i=1,…,m. Let dNij(t)=I(Zi(t−)=j−1, Zi(t)=j) indicate that a (j−1)→j transition was made at time t for individual i, so dNij(t)=1 at t=tij and is zero otherwise, j=1,… . Let \(dN_{ij}^{d}(t)=I(Z_{i}(t^{-})=j-1, Z_{i}(t)=D)\) indicate that a (j−1)→D transition is made at time t (i.e. that the jth event was death). Let Ni(t)=(Nij(t),j=1,…) and \(N_{i}^{d}(t)=(N_{ij}^{d}(t), j=1,\ldots)\) jointly be the multivariate counting process for individual i. The history of the process is the information observed up to t and we let \({H}_{i}(t)=\{{N}_{i}(s), {N}_{i}^{d}(s), 0\leq s <t, v_{i} \}\) denote the history for individual i, i=1,…,m. A stochastic model for this multistate process must be assumed to derive sample size calculations. We formulate this model by specifying the respective intensity functions [21]. The intensities for event occurrence and death are defined as
$$\lambda_j\bigl(t \mid H_i(t)\bigr) = \lim_{\Delta t \downarrow 0} \frac{P\bigl(\Delta N_{ij}(t)=1 \mid H_i(t)\bigr)}{\Delta t}$$
and
$$\gamma_j\bigl(t \mid H_i(t)\bigr) = \lim_{\Delta t \downarrow 0} \frac{P\bigl(\Delta N^{d}_{ij}(t)=1 \mid H_i(t)\bigr)}{\Delta t}$$
respectively, where \(\varDelta N_{ij}(t)=N_{ij}((t+\varDelta t)^{-})-N_{ij}(t^{-})\) and \(\varDelta N_{ij}^{d}(t) = N_{ij}^{d}((t+ \varDelta t)^{-})-N_{ij}^{d}(t^{-})\) count the numbers of (j−1)→j and (j−1)→D transitions over [t,t+Δt), respectively.

Consider a study with planned follow-up over the interval (0,τ], where τ is called the administrative censoring time. Individuals may withdraw prematurely from a study and so we let \(\tau _{i}^{\dagger}\) be the random right censoring time and let \(\tau_{i}=\mathrm{min}(\tau_{i}^{\dagger},\tau)\) be the net censoring time for individual i; we let \(X_{i}=\min(T_{i}^{d},\tau_{i})\) denote the total time on study and \(\delta_{i}=I(X_{i}=T_{i}^{d})\) indicate whether the terminal event was observed. Let Yi(t)=I(t≤τi) indicate whether individual i is under observation at t and Yij(t)=I(Zi(t−)=j−1), j=1,…, indicate that individual i is at risk of a transition out of state j−1 at time t (i.e. they are at risk for the jth event of either type), so \(\bar{Y}_{ij}(t)=Y_{i}(t)Y_{ij}(t)\) indicates they are both at risk and under observation. Then \(d\bar{N}_{ij}(t)=\bar{Y}_{ij}(t)dN_{ij}(t)\) and \(d\bar {N}_{ij}^{d}(t)=\bar{Y}_{ij}(t)dN_{ij}^{d}(t)\) are the so-called observable counting processes for the recurrent and terminal events, respectively. The observed data can then be written \(\{d\bar{N}_{i}(s), d\bar {N}_{i}^{d}(s), Y_{i}(s), 0<s, v_{i}\}\), i=1,…,m. The history of the observable process is the information observed up to t and is denoted \(\bar{H}_{i}(t)=\{\bar{N}_{i}(s), \bar{N}_{i}^{d}(s), \bar{Y}_{i}(s), 0\leq s <t, v_{i} \}\), i=1,…,m.

Under conditionally independent censoring [22], the intensities for event occurrence and death of the observable processes are given by \(\bar{\lambda}_{j}(t|\bar{H}_{i}(t)) = \bar{Y}_{ij}(t) \lambda _{j}(t|H_{i}(t))\) and \(\bar{\gamma}_{j}(t|\bar{H}_{i}(t)) = \bar{Y}_{ij}(t) \gamma_{j}(t|H_{i}(t))\), respectively. Thus if individual i experienced Ji>0 recurrent events at times \(t_{i1}, \ldots, t_{i,J_{i}}\) over [0,Xi], their likelihood contribution is proportional to
$$\prod_{j=1}^{J_i} \bar\lambda_j\bigl(t_{ij} \mid \bar H_i(t_{ij})\bigr)\, \bigl[\bar\gamma_{J_i+1}\bigl(X_i \mid \bar H_i(X_i)\bigr)\bigr]^{\delta_i} \exp\Biggl(-\sum_{j=1}^{J_i+1} \int_{t_{i,j-1}}^{t_{ij}} \bigl[\bar\lambda_j\bigl(u \mid \bar H_i(u)\bigr)+\bar\gamma_j\bigl(u \mid \bar H_i(u)\bigr)\bigr]\,du\Biggr)$$
where ti0=0 and for notational convenience we let \(t_{i,J_{i}+1}=X_{i}\).
A specification is required for the intensity functions; here we adopt a multiplicative intensity Markov model [1] and set the two intensities to
$$\lambda_j\bigl(t \mid H_i(t)\bigr) = \lambda_{0j}(t)\exp(\beta v_i)$$
(1)
and
$$\gamma_j\bigl(t \mid H_i(t)\bigr) = \gamma_{0j}(t)\exp(\theta v_i)$$
(2)
where λ0j(t) and γ0j(t) are non-negative baseline intensity functions for the (j−1)→j and (j−1)→D transitions, respectively. Through the time-dependent stratification on the cumulative number of events, this model accommodates an association between the recurrent and terminal events. The multiplicative effect of vi is assumed to be constant (i.e. not event dependent) for the two processes to give a parsimonious parameterization of the treatment effect. This model was discussed by Prentice et al. [14] and is sometimes referred to as the stratified Andersen–Gill model [23].
The likelihood can be factored into two parts, one part involving β and the other part involving θ. The likelihood contribution for the recurrent event process involves β and is given by
$$\prod_{j=1}^{J_i} \lambda_{0j}(t_{ij})\exp(\beta v_i)\; \exp\Biggl(-\sum_{j=1}^{J_i+1}\bigl[\varLambda_{ij}(t_{ij})-\varLambda_{ij}(t_{i,j-1})\bigr]\Biggr)$$
(3)
where \(\varLambda_{ij}(t)=\int_{0}^{t}\lambda_{ij}(u)du\) is the cumulative intensity function for individual i in stratum j. The partial likelihood for a sample of size m is then the product of m such terms.
The partial score estimating function for β is then
$$U(\beta)=\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau} v_i\,\bigl[d\bar N_{ij}(u)-\bar Y_{ij}(u)\exp(\beta v_i)\,d\varLambda_{0j}(u)\bigr]$$
(4)
The Breslow profile estimate of dΛ0j(u) is
$$d\hat\varLambda_{0j}(u)=\frac{\sum_{i=1}^{m} d\bar N_{ij}(u)}{\sum_{i=1}^{m}\bar Y_{ij}(u)\exp(\beta v_i)}$$
(5)
and substituting (5) into (4) gives the “profile” partial score function
$$U(\beta)=\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[v_i-\frac{R_j^{(1)}(\beta,u)}{R_j^{(0)}(\beta,u)}\biggr]\,d\bar N_{ij}(u)$$
(6)
where \(R_{j}^{(a)}(\beta,u)=m^{-1}\sum_{i=1}^{m}\bar{Y}_{ij}(u) v_{i}^{a} \exp (\beta v_{i})\) and a=0,1. Similarly, we obtain the corresponding score functions for the terminal event intensities as
$$U^{d}(\theta)=\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[v_i-\frac{S_j^{(1)}(\theta,u)}{S_j^{(0)}(\theta,u)}\biggr]\,d\bar N^{d}_{ij}(u)$$
(7)
where \(S_{j}^{(a)}(\theta, u)=m^{-1}\sum_{i=1}^{m}\bar{Y}_{ij}(u) v_{i}^{a} \exp(\theta v_{i})\) and a=0,1. The score functions (6) and (7) are those of a stratified Cox regression model with one binary covariate. These two score functions form the basis of the partial score statistics we used to calculate sample size.

3 Asymptotic Properties of Partial Score Statistics

In this section, we investigate the asymptotic properties of the partial score statistics (6) and (7) under the null and alternative hypotheses. We focus on trials for which separate marginal hypothesis tests are of interest for the recurrent and terminal event processes; common type I and II error rates are assumed for the two tests, but accommodating different type I and II error rates is trivial. We suppose here that analyses are to be based on at most J events, but note that J can be chosen large enough to capture all events in any given setting with probability approaching one. Suppose the treatment effect is β0 in (1) under the null hypothesis and βA under the alternative hypothesis. Under regularity conditions A to D of Andersen and Gill [23] and the assumption that mP(Zi(t)=j|Zi(0)=0)→∞ for every j and t as m→∞, U(β0) is asymptotically equivalent to
$$\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[v_i-\frac{E_0\bigl(R_j^{(1)}(\beta_0,u)\bigr)}{E_0\bigl(R_j^{(0)}(\beta_0,u)\bigr)}\biggr]\,dM_{ij}(u)$$
(8)
where E0(⋅) is the expectation taken under the null hypothesis and
$$M_{ij}(t)=\bar N_{ij}(t)-\int_{0}^{t}\bar Y_{ij}(u)\exp(\beta_0 v_i)\,\lambda_{0j}(u)\,du$$
(9)
is the associated martingale under the null. Note that (8) is a sum of m independent and identically distributed random variables with expectation zero, so it follows from the central limit theorem that \(m^{-\frac{1}{2}}\) times (8) converges in distribution to a zero-mean normal random variable with asymptotic variance
$$V_0=\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[r_j^{(2)}(\beta_0,u)-\frac{\bigl\{r_j^{(1)}(\beta_0,u)\bigr\}^{2}}{r_j^{(0)}(\beta_0,u)}\biggr]\lambda_{0j}(u)\,du$$
(10)
where \(r_{j}^{(a)}(\beta_{0}, u)=E_{0} [R_{j}^{(a)}(\beta_{0}, u)]\), a=0,1,2. This asymptotic variance is similar to the expected information from a stratified Cox regression where the strata are defined by the state of the Markov process.
Under the same set of regularity conditions as under the null hypothesis, the partial score statistic (6) evaluated at β0 is asymptotically equivalent to
$$\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[v_i-\frac{E_A\bigl(R_j^{(1)}(\beta_0,u)\bigr)}{E_A\bigl(R_j^{(0)}(\beta_0,u)\bigr)}\biggr]\,d\bar N_{ij}(u)$$
(11)
under the alternative hypothesis, where the expectation is taken under the alternative hypothesis. Note that (11) is also a sum of m independent and identically distributed random variables, and it follows from the central limit theorem that \(m^{-\frac{1}{2}}\) times (11) converges in distribution to a Gaussian random variable with mean
$$m^{\frac{1}{2}}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[E_A\bigl(R_j^{(1)}(\beta_A,u)\bigr)-\frac{E_A\bigl(R_j^{(1)}(\beta_0,u)\bigr)}{E_A\bigl(R_j^{(0)}(\beta_0,u)\bigr)}\,E_A\bigl(R_j^{(0)}(\beta_A,u)\bigr)\biggr]\lambda_{0j}(u)\,du$$
(12)
If we let
$$H_{ij}(u)=v_i-E_A\bigl(R_j^{(1)}( \beta_0, u)\bigr)/E_A\bigl(R_j^{(0)}( \beta_0, u)\bigr) , $$
the asymptotic variance of \(m^{-\frac{1}{2}}\) times (11) is
$$V_A=\operatorname{Var}_A\Biggl(\sum_{j=1}^{J}\int_{0}^{\tau}H_{ij}(u)\,d\bar N_{ij}(u)\Biggr)$$
(13)
under the alternative.

Thus we have expressions for \(E_{A}(m^{-\frac{1}{2}}U(\beta_{0}))\) by (12), the asymptotic variance \(V_{0}=\operatorname{Var}_{0}(m^{-\frac{1}{2}}U(\beta_{0}))\) of the score statistic under the null by (10), and the asymptotic variance \(V_{A}=\operatorname{Var}_{A}(m^{-\frac {1}{2}}U(\beta_{0}))\) under the alternative by (13). These results will be used for the sample size calculations in the next section. Details on how the requisite expectations can be computed are given in the Appendix.

For the terminal event under the null hypothesis, \(m^{-\frac{1}{2}}\) times the partial score statistic can be shown to be asymptotically equivalent to
$$m^{-\frac{1}{2}}\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[v_i-\frac{E_0\bigl(S_j^{(1)}(\theta_0,u)\bigr)}{E_0\bigl(S_j^{(0)}(\theta_0,u)\bigr)}\biggr]\,dM^{d}_{ij}(u)$$
(14)
where
$$M^{d}_{ij}(t)=\bar N^{d}_{ij}(t)-\int_{0}^{t}\bar Y_{ij}(u)\exp(\theta_0 v_i)\,\gamma_{0j}(u)\,du$$
is the associated martingale process for the terminal event of subject i in stratum j, and \(\varGamma_{0j}(t)=\int_{0}^{t}\gamma_{0j}(u)du\) is the baseline cumulative intensity function for the terminal event in stratum j. The asymptotic variance of (14) is
$$V_0^{d}=\sum_{j=1}^{J}\int_{0}^{\tau}\biggl[s_j^{(2)}(\theta_0,u)-\frac{\bigl\{s_j^{(1)}(\theta_0,u)\bigr\}^{2}}{s_j^{(0)}(\theta_0,u)}\biggr]\gamma_{0j}(u)\,du$$
(15)
under the null, where \(s_{j}^{(a)}(\theta_{0}, u)=E_{0} [S_{j}^{(a)}(\theta_{0}, u)]\), a=0,1,2, and under the alternative hypothesis, \(m^{-\frac{1}{2}}\) times the partial score statistic (7) is asymptotically equivalent to
$$m^{-\frac{1}{2}}\sum_{i=1}^{m}\sum_{j=1}^{J}\int_{0}^{\tau}H^{d}_{ij}(u)\,d\bar N^{d}_{ij}(u)$$
(16)
The asymptotic variance of (16) is
$$V_A^{d}=\operatorname{Var}_A\Biggl(\sum_{j=1}^{J}\int_{0}^{\tau}H^{d}_{ij}(u)\,d\bar N^{d}_{ij}(u)\Biggr)$$
(17)
under the alternative, where
$$H^{d}_{ij}(u)=v_i-E_A\bigl(S_j^{(1)}(\theta_0,u)\bigr)/E_A\bigl(S_j^{(0)}(\theta_0,u)\bigr).$$

4 Sample Size Derivation Based on Partial Score Statistics

4.1 Sample Size for the Design of Superiority Trials

In this section, using the partial score statistics of Sect. 3, we use a score test to calculate sample size requirements for a clinical trial involving recurrent events and a terminal event. We illustrate this procedure by testing a treatment effect on the recurrent events. In superiority trials interest lies in demonstrating the effectiveness of a new therapy for both the recurrent event process and the terminal event. In particular, we consider the case where H0: β=β0 and HA: β<β0, where β0 is the null value and βA<β0 is the value under the alternative that represents the minimal clinically important treatment effect we wish to detect for the recurrent event process. If we assume a follow-up period (0,τ], then under the null hypothesis, the partial score statistic based on (6) is
$$Z=\frac{m^{-\frac{1}{2}}\,U(\beta_0)}{\sqrt{V_0(\beta_0)}}$$
(18)
which converges in distribution to a standard normal random variable.
The approximate one-sided 100α1 % level partial score test involves rejecting the null if \(Z< z_{\alpha_{1}}\), where zα is the 100α % percentile of the standard normal distribution. Under the alternative hypothesis, if we set the power to 100(1−α2) %, we require \(P(Z< z_{\alpha_{1}}|H_{A})=1-\alpha_{2}\). Straightforward calculations show that the required sample size m to detect the effect of a reduction in the intensity of events under the new treatment at the significance level of 100α1 % with power 100(1−α2) % is
$$m=\Biggl(\frac{z_{\alpha_1}\sqrt{V_0}+z_{\alpha_2}\sqrt{V_A}}{E_A[U_i(\beta_0)]}\Biggr)^{2}$$
(19)
where Ui(⋅) is the contribution of a single individual i to the partial score statistic (6).
Similarly, the required sample size for detecting superiority of the treatment on the terminal event with power 100(1−α2) % at size 100α1 % is
$$m^{d}=\Biggl(\frac{z_{\alpha_1}\sqrt{V_0^{d}}+z_{\alpha_2}\sqrt{V_A^{d}}}{E_A[U_i^{d}(\theta_0)]}\Biggr)^{2}$$
(20)
Then the minimum required sample size to detect the superiority of the new treatment on both the recurrent events and terminal event is max(m,md).
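Once the per-subject moments are available, the arithmetic behind (19) and (20) is straightforward; a Python sketch with illustrative values (the moments below are hypothetical, not derived from the paper):

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size(mu_A, V0, VA, alpha1=0.025, alpha2=0.20):
    """Eq. (19): m = ((z_{a1}*sqrt(V0) + z_{a2}*sqrt(VA)) / E_A[U_i])^2.
    z_p is the lower 100p% standard-normal percentile (negative for small p),
    V0 and VA are the per-subject score variances under the null and the
    alternative, and mu_A = E_A[U_i(beta_0)] is the per-subject mean of the
    score under the alternative (negative for an effective treatment), so
    the ratio is positive."""
    z1 = NormalDist().inv_cdf(alpha1)
    z2 = NormalDist().inv_cdf(alpha2)
    return ceil(((z1 * sqrt(V0) + z2 * sqrt(VA)) / mu_A) ** 2)

# illustrative (hypothetical) per-subject moments:
m = sample_size(mu_A=-0.05, V0=0.50, VA=0.45)
```

The same function applied to the terminal-event moments gives m^d of (20), and the design sample size is max(m, m^d).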

The asymptotic variances of the test statistics require specification of the baseline intensities, the effects of interest (e.g. βA and θA), and information regarding the censoring, including the time of administrative censoring and the rate of withdrawal. Parametric assumptions are typically made at the design stage of clinical trials, and with time-homogeneous baseline transition intensities λ0j(t)=λ0exp(ψβ(j−1)) and γ0j(t)=γ0exp(ψθ(j−1)), j=1,2,…, a transition probability matrix can be easily obtained for the control group which is governed by four parameters, λ0, ψβ, γ0, and ψθ (see Appendix). Historical data on the increase in risk of death with event occurrence and the increase in risk of future events with event occurrence inform the choice of ψβ and ψθ, leaving λ0 and γ0 to specify. Clinical researchers will typically be able to specify a mortality rate over a given period of time (0,τ], which represents a constraint on P0D(0,τ|v=0); see (29) in Appendix. Specification of the expected number of events over (0,τ] yields the second constraint, which enables specification of the process for the control arm. Specification of the treatment effects on the event and death intensities enables computation of the analogous transition probability matrix for the experimental arm. An assumption of a constant rate of withdrawal is typical in sample size calculations for survival endpoints, which can be modeled using an exponential distribution. All calculations required and outlined in the Appendix are then possible with this specification; code is available from the authors upon request to facilitate these calculations. The same procedure can be carried out for the non-inferiority designs described in the next section.
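For the time-homogeneous specification above, the transition probability matrix is the matrix exponential P(t|v) = exp(Q^v t) of the intensity matrix, a standard fact for time-homogeneous Markov chains. A Python sketch (illustrative rates; the paper's own code is in R and available from the authors) with states 0,…,J for the event count plus an absorbing death state:

```python
import numpy as np

J = 10                                   # maximum number of modelled events

def intensity_matrix(lam0, gam0, psi_b, psi_t, beta, theta, v):
    """(J+2)x(J+2) intensity matrix Q^v: states 0..J count recurrent events,
    state J+1 is the death state D, with lambda_{0j} = lam0*exp(psi_b*(j-1))
    and gamma_{0j} = gam0*exp(psi_t*(j-1)), j = 1..J, as in the text."""
    Q = np.zeros((J + 2, J + 2))
    for j in range(1, J + 1):            # transitions out of state j-1
        lam = lam0 * np.exp(psi_b * (j - 1) + beta * v)
        gam = gam0 * np.exp(psi_t * (j - 1) + theta * v)
        Q[j - 1, j] = lam                # (j-1) -> j   (next recurrent event)
        Q[j - 1, J + 1] = gam            # (j-1) -> D   (death)
        Q[j - 1, j - 1] = -(lam + gam)
    return Q

def mat_exp(A, squarings=20, terms=12):
    """P(t) = exp(Qt) via scaling and squaring with a short Taylor series,
    adequate for these small, bounded intensity matrices."""
    B = A / 2.0 ** squarings
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms + 1):
        T = T @ B / k
        E = E + T
    for _ in range(squarings):
        E = E @ E
    return E

# control-arm transition probabilities over (0, 1] with illustrative rates:
Q0 = intensity_matrix(lam0=0.7, gam0=0.25, psi_b=np.log(1.1), psi_t=np.log(1.1),
                      beta=np.log(0.8), theta=np.log(0.9), v=0)
P0 = mat_exp(Q0 * 1.0)                   # entry (k, l): P(Z(1) = l | Z(0) = k)
```

The entry P0[0, J+1] is the control-arm death probability constraint P0D(0,τ|v=0) discussed above.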

4.2 Sample Size for the Design of Non-inferiority Trials

In this section we address design issues when testing for non-inferiority of a new treatment for both recurrent events and the terminal event when compared to an existing active control. We adopt common notation to formulate the non-inferiority hypotheses [18]. Let LRR(C1/P1) denote the log-relative risk reflecting the effect of the active control (C) relative to a placebo (P) on the risk of events; the subscript ‘1’ on C1 and P1 denotes that this effect must be known or estimated from historical studies. Similarly, we let LRR(C2/P2) denote the effect of the active control relative to a placebo in the context of the planned study. We also let LRR(E2/P2) denote the log-relative risk for the planned new treatment versus a placebo. Though no placebo will be used in the planned study, it is helpful to make indirect comparisons with the effect of the active control relative to placebo. In particular, the non-inferiority trial is intended to show that the experimental treatment retains a prestated percentage of the active-control effect against placebo with a specified power and type I error rate. We formulate the non-inferiority hypotheses for the recurrent events as follows. Let δ0 be the percentage of the active-control effect relative to placebo necessary to retain for a non-inferiority claim for the new treatment. The null hypothesis can be formulated as
$$H_0:\ \mathrm{LRR}(E_2/C_2) \ge (1-\delta_0)\,\mathrm{LRR}(P_1/C_1)$$
(21)
which is to be tested against the alternative hypothesis
$$H_A:\ \mathrm{LRR}(E_2/C_2) < (1-\delta_0)\,\mathrm{LRR}(P_1/C_1)$$
(22)

For the purpose of sample size calculation, it is sometimes desirable to consider a particular value of LRR(E2/C2) in the alternative hypothesis, which may be expressed in terms of the percentage of the effect of the active control relative to placebo. We let δA denote the percentage of the active-control effect that the experimental treatment retains under the alternative, so that LRR(E2/C2)=(1−δA)LRR(P1/C1)<(1−δ0)LRR(P1/C1). In this study, we examine different values of δA in the sample size calculations.
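A quick numeric check of the retention algebra (Python sketch; δ0, δA, and the log 0.6 control effect are the values used in Sects. 4.2 and 5.2, and the constancy assumption LRR(C2/P2)=LRR(C1/P1) is imposed):

```python
from math import log, isclose

beta0 = log(1 / 0.6)            # beta0 = LRR(P1/C1) when LRR(C1/P1) = log 0.6
delta0, deltaA = 0.5, 0.9       # required / assumed retention fractions

boundary = (1 - delta0) * beta0 # null boundary for LRR(E2/C2), Eq. (21)
betaA = (1 - deltaA) * beta0    # value of LRR(E2/C2) under the alternative

# Under the constancy assumption LRR(C2/P2) = -beta0, the effect of the new
# treatment versus the (absent) placebo is obtained by indirect comparison:
lrr_E_vs_P = betaA - beta0      # = -deltaA * beta0, a risk reduction
retained = lrr_E_vs_P / (-beta0)  # fraction of the control effect retained
```

The computed retained fraction works out to δA, so the alternative value βA=(1−δA)β0 indeed corresponds to retaining 100δA per cent of the active-control effect.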

For testing non-inferiority of the treatment based on the recurrent event, we let β0=LRR(P1/C1) and evaluate the partial score statistic (6) at the boundary of the null hypothesis of (21). If we further suppose that the follow-up duration is (0,τ], the partial score statistic
$$Z=\frac{m^{-\frac{1}{2}}\,U\bigl((1-\delta_0)\beta_0\bigr)}{\sqrt{V_0\bigl((1-\delta_0)\beta_0\bigr)}}$$
(23)
then converges in distribution to a standard normal random variable Z, where V0(⋅) is the asymptotic variance of the partial score statistic under the null hypothesis according to (10). Based on a one-sided α1 level partial score test, to reject the null hypothesis with the power 1−α2, one can obtain the required sample size m for a non-inferiority test of the new treatment on the recurrent events as
$$m=\Biggl(\frac{z_{\alpha_1}\sqrt{V_0\bigl((1-\delta_0)\beta_0\bigr)}+z_{\alpha_2}\sqrt{V_A\bigl((1-\delta_0)\beta_0\bigr)}}{E_A\bigl[U_i\bigl((1-\delta_0)\beta_0\bigr)\bigr]}\Biggr)^{2}$$
(24)
where VA(⋅) is the asymptotic variance of the partial score statistic under the alternative hypothesis (13) and Ui(⋅) is the contribution of individual i to the partial score statistic (6). The expectation EA(⋅) is taken with respect to the model under the alternative as in (8) with βA=(1−δA)β0. The required sample size md for testing non-inferiority of new treatment on the terminal event may be obtained by replacing the corresponding quantities in (24) by the ones from the partial score statistic for the terminal event (7) as follows:
$$m^{d}=\Biggl(\frac{z_{\alpha_1}\sqrt{V_0^{d}\bigl((1-\delta_0^{d})\theta_0\bigr)}+z_{\alpha_2}\sqrt{V_A^{d}\bigl((1-\delta_0^{d})\theta_0\bigr)}}{E_A\bigl[U_i^{d}\bigl((1-\delta_0^{d})\theta_0\bigr)\bigr]}\Biggr)^{2}$$
(25)
where \(V_{0}^{d}(\cdot)\) and \(V_{A}^{d}(\cdot)\) are the asymptotic variances for the partial score statistics for the terminal event under the null and the alternative hypotheses, respectively; the expectation EA is taken with respect to the model for the terminal event under the alternative with \(\theta_{A}=(1-\delta ^{d}_{A})\theta_{0}\).

The minimum sample size required for testing non-inferiority of the new treatment on both the recurrent events and the terminal event is then max(m,md) for one-sided tests at level α1 with power 1−α2.

5 An Empirical Study of Frequency Properties

We simulate the Markov process with the multiplicative model of (1) for recurrent events and (2) for the terminal event. For planning purposes we set an upper limit on the number of states and set the maximum number of events to J=10; only approximately 2 % of patients had eight or more skeletal complications in Hortobagyi et al. [20]. For computational convenience, we further specify the intensity functions for the recurrent events (1) and the terminal event (2) as λ0j(t)=λ0exp(ψβ⋅(j−1)) and γ0j(t)=γ0exp(ψθ⋅(j−1)), j=1,…,10, respectively. The constants ψβ and ψθ represent the relative increase in the event and death intensities with the occurrence of each additional event. In the simulation study, we consider ψβ=log1.0=0 for a constant baseline intensity (rate) which is independent of the number of previous events, and ψθ=log1.0=0 to correspond to the setting where mortality is independent of event occurrence. We set ψβ=log1.1 to reflect the setting where the event intensity increases with each event and ψθ=log1.1 to correspond to the case where the mortality rate increases with event occurrence. The coefficients β and θ are the effects of the experimental treatment on the recurrent events and death, respectively, and are chosen to represent modest improvements.

The Markov model has eleven states (0,1,…,10) corresponding to the cumulative number of recurrent events and one absorbing state for death; we number these states 1 to 12 and consider a 12×12 transition intensity matrix, denoted \(\mathcal{Q}^{v}\) for an individual with vi=v, having (k,ℓ) entry \(q^{v}_{k\ell}\) given by λ0kexp(βv) for k=1,…,10 and ℓ=k+1, γ0kexp(θv) for k=1,…,10 and ℓ=12, −(λ0kexp(βv)+γ0kexp(θv)) for k=ℓ=1,…,10, and zero otherwise. The transition probability matrix has elements Pkℓ(t|v)=P(Z(t)=ℓ|Z(0)=k,v) and can be obtained as described in the Appendix. We further specify the baseline intensities λ0 and γ0 by setting the probability that the first transition for a control subject is a recurrent event to q=λ0/(γ0+λ0) and setting the probability that a control subject has died by t=1 to p, for some pre-specified values of p and q.
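In the special case ψβ=ψθ=0 (constant, event-independent intensities) the constraints (q, p) pin down λ0 and γ0 in closed form, since the death time is then exponential with rate γ0 regardless of the event history (ignoring the negligible J-event truncation); in general they must be solved numerically through the transition probability matrix. A Python sketch of the closed-form case:

```python
from math import log

def baseline_rates(p, q, tau=1.0):
    """Assuming psi_beta = psi_theta = 0: the death time is Exponential(gamma0),
    so P(death by tau) = 1 - exp(-gamma0*tau) determines gamma0, and the
    first-transition probability q = lambda0/(lambda0 + gamma0) then gives
    lambda0."""
    gamma0 = -log(1.0 - p) / tau
    lambda0 = gamma0 * q / (1.0 - q)
    return lambda0, gamma0

# e.g. 20% mortality by t = 1 and 3 of 4 first transitions being events:
lam, gam = baseline_rates(p=0.20, q=0.75)
```

Both constraints are recovered exactly by construction, which makes this a convenient starting value even when ψβ, ψθ are nonzero.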

5.1 Empirical Study of Superiority Designs

For simulation studies involving superiority designs, under the null hypothesis of no treatment effect we set β0=θ0=0. Under the alternative we set βA=log0.8 and θA=log0.9. The duration of the study is set to τ=1. A random censoring time is simulated for each individual using an exponential random variable with P(τi<1)=0.2. We investigate the performance of the proposed sample size calculations under different scenarios. For each setting the sample sizes are determined according to the formulas in (19) and (20). All simulations were implemented in R, and the coxph function in the survival package was used to obtain the partial score statistics. By setting the iter.max and init options to zero, the partial score statistics are obtained using the function coxph.detail. Under the null hypothesis, the variance of the partial score statistic was obtained by summing the observed information at each event time. Under the alternative, this variance was calculated using the sample variance of the partial score statistics at each event time. For each setting, we conducted 2000 replicates and report the percentage of replicates leading to rejection of the null hypothesis as the empirical type I error rate under the null hypothesis and as the power under the alternative. Table 1 displays the empirical type I error rate and power for the different superiority trial settings. The empirical type I error rates are consistent with the nominal level of 0.025. For testing superiority of a new treatment with respect to both the recurrent event and the terminal event, max(m,md) was the selected sample size, which ensured that the empirical powers satisfied the nominal requirement of 80 %.
Table 1

Sample sizes and empirical rejection rates for tests of superiority for recurrent and terminal events; βA=log(0.80) and θA=log(0.9); %REJ0 and %REJA are the empirical type I error rate and empirical power respectively; the nominal type I error rate is 2.5 % and the nominal power is 80 %

| ψβ  | Endpoint  | Setting | m (ψθ=1.0) | %REJ0 | %REJA | m (ψθ=1.1) | %REJ0 | %REJA |
|-----|-----------|---------|------------|-------|-------|------------|-------|-------|
| 1.0 | Recurrent | θ=θ0    | 728        | 2.45  | 84.45 | 771        | 2.00  | 83.10 |
| 1.0 | Recurrent | θ=θA    | 710        | 2.65  | 84.20 | 753        | 2.40  | 82.90 |
| 1.0 | Death     | β=β0    | 6636       | 2.40  | 80.35 | 6673       | 2.30  | 80.85 |
| 1.0 | Death     | β=βA    | 6740       | 2.50  | 80.50 | 6816       | 2.75  | 80.50 |
| 1.1 | Recurrent | θ=θ0    | 691        | 2.85  | 84.70 | 737        | 2.40  | 84.25 |
| 1.1 | Recurrent | θ=θA    | 674        | 2.70  | 84.15 | 719        | 2.10  | 84.25 |
| 1.1 | Death     | β=β0    | 6674       | 2.60  | 79.45 | 6691       | 2.25  | 81.15 |
| 1.1 | Death     | β=βA    | 6759       | 2.45  | 80.50 | 6836       | 2.40  | 83.00 |

Endpoint is the outcome used for the sample size calculation

Setting is the value of the parameter for the complementary outcome when testing the corresponding endpoint

5.2 Empirical Study of Non-inferiority Designs

In this section we present simulation studies conducted to validate the proposed sample size calculations for testing non-inferiority of the experimental treatment with respect to both the recurrent events and the terminal event, and demonstrate that the empirical rejection rates are consistent with the nominal levels. In particular, we set LRR(C1/P1)=log0.6 (βA) for the effect of the active control relative to placebo on the recurrent events and LRR(C1/P1)=log0.8 (θA) for the terminal event. We also make the constancy assumption, so that LRR(P2/C2)=LRR(P1/C1).

We consider designs where the aim is to demonstrate that the experimental treatment retains at least 50 per cent of the effect of the active control, so that δ0=0.5. In this simulation study, we consider one-sided tests with nominal type I error rate α1=0.025 and power set to 80 per cent (1−α2=0.8). The effect of the experimental treatment under the alternative hypothesis is represented by LRRA(E2/C2)=(1−δA)LRR(P1/C1), and we let 1−δA=0.90 and 1.00 to correspond to retention of 90 and 100 per cent of the active-control effect, respectively. The duration of the follow-up τ is set to 1. A random censoring time is simulated for each subject from an exponential distribution with parameter ρ, specified so that each subject may withdraw from the study with probability 0.20 (ρ=log(5/4)).
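The effect sizes and censoring parameter above follow from simple identities; a worked check (variable names are ours):

```python
import math

# Active control vs placebo effects taken from the text.
lrr_rec = math.log(0.6)    # LRR(C1/P1) for the recurrent events (beta_A)
lrr_death = math.log(0.8)  # LRR(C1/P1) for the terminal event (theta_A)

# Retention fractions: the non-inferiority null keeps delta_0 = 0.5 of the
# control effect; under the alternative the experimental arm retains
# 1 - delta_A = 0.9 or 1.0 of it.
delta0 = 0.5
lrr_null = (1 - delta0) * lrr_rec   # boundary effect for the recurrent events
lrr_alt_90 = 0.9 * lrr_rec          # 90 per cent retention alternative

# Exponential censoring: P(tau_i < 1) = 1 - exp(-rho) = 0.20 gives rho = log(5/4).
rho = math.log(5 / 4)
```

Note that a more negative log relative risk is a larger effect, so the 90 per cent retention alternative lies strictly inside the non-inferiority boundary.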

For each simulation setting, the sample size is determined according to formulas (24) and (25). The simulations were implemented in R, and the partial score statistics were obtained from the coxph function in the survival package with the iter.max option set to zero and the init option set to (1−δ0)βA. Under the null hypothesis, the corresponding variance was obtained by summing the observed information over the event times; under the alternative, it was estimated by the sample variance of the partial score contributions at the event times.

We conducted 2000 replicates; the percentage of replicates leading to rejection of the null hypothesis is the empirical type I error rate under the null and the empirical power under the alternative. Table 2 presents the empirical type I error rate and power for different non-inferiority configurations. The empirical type I error rates are all consistent with the nominal level of 0.025, and the empirical powers are close to the nominal levels for moderate and large sample sizes. For simultaneously testing non-inferiority with respect to both the recurrent events and the terminal event, max(m,md) equals the sample size calculated for the terminal event, and the corresponding empirical powers are consistent with the nominal level of 80 %. Additional simulation studies were conducted with larger effect sizes, yielding smaller sample sizes; these demonstrated excellent control of the type I error rate for both the recurrent event and terminal event analyses, with empirical power slightly above the nominal level.
Table 2

Sample sizes and empirical rejection rates for tests of non-inferiority for recurrent and terminal events; βA=log(0.60), θA=log(0.8) and δ0=0.50; %REJ0 and %REJA are the empirical type I error rate and empirical power respectively; the nominal type I error rate is 2.5 % and the nominal power is 80 %

| Endpoint  | Setting | m (1−δ^d_A=0.9) | %REJ0 | %REJA | m (1−δ^d_A=1.0) | %REJ0 | %REJA |
|-----------|---------|-----------------|-------|-------|-----------------|-------|-------|
| ψθ=1.0, ψβ=1.0, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 986             | 2.20  | 83.05 | –               | –     | –     |
| Recurrent | θ=θA    | 967             | 2.20  | 83.50 | 962             | 2.40  | 83.60 |
| Death     | β=β0    | 9665            | 2.65  | 81.20 | 6296            | 2.65  | 80.65 |
| Death     | β=βA    | 9850            | 2.65  | 81.40 | 6405            | 2.55  | 81.40 |
| ψθ=1.0, ψβ=1.0, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 664             | 2.55  | 83.70 | –               | –     | –     |
| Recurrent | θ=θA    | 657             | 2.65  | 83.35 | 655             | 2.75  | 82.60 |
| Death     | β=βA    | 9904            | 2.75  | 82.10 | 6429            | 2.85  | 81.85 |
| ψθ=1.0, ψβ=1.1, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 945             | 2.65  | 84.30 | –               | –     | –     |
| Recurrent | θ=θA    | 934             | 2.85  | 84.85 | 931             | 2.10  | 84.90 |
| Death     | β=β0    | 9669            | 2.30  | 80.15 | 6276            | 2.30  | 79.10 |
| Death     | β=βA    | 9860            | 2.25  | 82.35 | 6401            | 2.40  | 81.60 |
| ψθ=1.0, ψβ=1.1, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 639             | 2.60  | 83.75 | –               | –     | –     |
| Recurrent | θ=θA    | 631             | 2.90  | 84.40 | 629             | 2.30  | 84.70 |
| Death     | β=βA    | 9918            | 2.20  | 82.20 | 6438            | 2.75  | 81.55 |
| ψθ=1.1, ψβ=1.0, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 1042            | 2.15  | 82.50 | –               | –     | –     |
| Recurrent | θ=θA    | 1030            | 2.25  | 83.05 | 1027            | 2.05  | 82.10 |
| Death     | β=β0    | 9761            | 2.05  | 81.75 | 6322            | 2.65  | 80.05 |
| Death     | β=βA    | 9964            | 2.05  | 80.50 | 6475            | 2.75  | 79.35 |
| ψθ=1.1, ψβ=1.0, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 701             | 2.05  | 83.75 | –               | –     | –     |
| Recurrent | θ=θA    | 693             | 2.65  | 83.65 | 691             | 2.65  | 82.70 |
| Death     | β=βA    | 10029           | 2.75  | 79.55 | 6507            | 2.35  | 81.05 |
| ψθ=1.1, ψβ=1.1, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 1004            | 2.85  | 84.25 | –               | –     | –     |
| Recurrent | θ=θA    | 992             | 2.75  | 83.20 | 990             | 2.95  | 83.90 |
| Death     | β=β0    | 9752            | 2.05  | 80.52 | 6329            | 2.50  | 80.70 |
| Death     | β=βA    | 9986            | 2.25  | 80.30 | 6482            | 2.50  | 80.15 |
| ψθ=1.1, ψβ=1.1, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 678             | 2.60  | 83.70 | –               | –     | –     |
| Recurrent | θ=θA    | 670             | 2.35  | 84.70 | 665             | 2.75  | 83.65 |
| Death     | β=βA    | 10053           | 2.35  | 79.75 | 6526            | 2.25  | 80.80 |

Endpoint is the outcome used for the sample size calculation

Setting is the value of the parameter for the complementary outcome when testing the corresponding endpoint

At the request of a referee, we also examined the sensitivity of the sample size calculations to misspecification of the censoring process. We did this by applying the sample size formula of Sect. 4 and simulating the response processes accordingly, but generating withdrawal times from a Weibull distribution with shape a=2 and scale b=2.1199, chosen so that P(τi<1)=0.2 remains satisfied. The same parameter configurations adopted earlier in this section were used. The resulting empirical type I error rates and powers are displayed in Table 3. In general, the results suggest the proposed method is moderately robust to misspecification of the censoring distribution, as the empirical type I error rate and power remain quite close to the previous results. In a few cases the method yielded either lower power (as low as 75 %) or higher power (over 90 %), but given the degree of misspecification this seems in line with expectations.
Table 3

Sensitivity of empirical rejection rates for tests of non-inferiority for recurrent and terminal events to non-uniform withdrawal; βA=log(0.50), θA=log(0.5) and δ0=0.50; %REJ0 and %REJA are the empirical type I error rate (nominal level 2.5 %) and empirical power (nominal level 80 %) respectively; random early withdrawal time generated by a Weibull distribution with shape a=2 and scale b=2.1199

| Endpoint  | Setting | m (1−δA=0.9) | %REJ0 | %REJA | m (1−δA=1.0) | %REJ0 | %REJA |
|-----------|---------|--------------|-------|-------|--------------|-------|-------|
| ψθ=1.0, ψβ=1.0, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 610          | 2.50  | 86.70 | –            | –     | –     |
| Recurrent | θ=θA    | 594          | 2.40  | 84.45 | 591          | 2.60  | 85.80 |
| Death     | β=β0    | 1304         | 3.00  | 84.20 | 881          | 2.05  | 85.70 |
| Death     | β=βA    | 1346         | 2.05  | 85.30 | 910          | 2.75  | 69.00 |
| ψθ=1.0, ψβ=1.0, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 424          | 2.90  | 86.50 | –            | –     | –     |
| Recurrent | θ=θA    | 413          | 2.50  | 73.75 | 411          | 2.55  | 89.35 |
| Death     | β=βA    | 1358         | 2.30  | 84.80 | 918          | 2.35  | 86.25 |
| ψθ=1.0, ψβ=1.1, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 603          | 2.90  | 86.80 | –            | –     | –     |
| Recurrent | θ=θA    | 587          | 2.75  | 87.45 | 584          | 2.30  | 86.65 |
| Death     | β=β0    | 1313         | 2.40  | 83.00 | 887          | 2.55  | 86.10 |
| Death     | β=βA    | 1352         | 2.25  | 85.15 | 913          | 3.00  | 85.30 |
| ψθ=1.0, ψβ=1.1, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 421          | 2.20  | 87.75 | –            | –     | –     |
| Recurrent | θ=θA    | 410          | 2.00  | 72.90 | 408          | 2.30  | 89.40 |
| Death     | β=βA    | 1363         | 2.65  | 85.35 | 921          | 2.20  | 86.35 |
| ψθ=1.1, ψβ=1.0, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 662          | 2.27  | 87.85 | –            | –     | –     |
| Recurrent | θ=θA    | 645          | 2.15  | 89.00 | 642          | 2.20  | 88.60 |
| Death     | β=β0    | 1317         | 2.40  | 85.30 | 890          | 2.05  | 86.95 |
| Death     | β=βA    | 1366         | 1.90  | 85.25 | 923          | 2.05  | 85.85 |
| ψθ=1.1, ψβ=1.0, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 459          | 2.75  | 75.50 | –            | –     | –     |
| Recurrent | θ=θA    | 450          | 2.20  | 75.75 | 445          | 2.20  | 75.40 |
| Death     | β=βA    | 1379         | 2.15  | 86.95 | 932          | 2.35  | 87.75 |
| ψθ=1.1, ψβ=1.1, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 642          | 2.45  | 88.30 | –            | –     | –     |
| Recurrent | θ=θA    | 625          | 2.35  | 88.80 | 622          | 2.15  | 97.20 |
| Death     | β=β0    | 1320         | 2.90  | 87.00 | 926          | 2.85  | 88.25 |
| Death     | β=βA    | 1370         | 2.50  | 86.45 | 891          | 2.35  | 85.35 |
| ψθ=1.1, ψβ=1.1, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 447          | 2.40  | 76.60 | –            | –     | –     |
| Recurrent | θ=θA    | 444          | 2.25  | 75.60 | 442          | 2.40  | 90.50 |
| Death     | β=βA    | 1384         | 3.00  | 88.35 | 935          | 2.10  | 87.80 |

Endpoint is the outcome used for the sample size calculation

Parameter setting for the complementary outcome when testing the corresponding endpoint

Additional sensitivity studies explored the effect of more general history dependence by incorporating subject-specific frailties common to all transitions. If the variability of this frailty is small, the state dependence accommodated in the sample size calculations will be close to adequate and the resulting sample sizes will be reasonable. If this variance is large, a stronger dependence on the process history exists and the model is more seriously misspecified; poor frequency properties of the design and analysis will then result.
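As a rough illustration of this sensitivity check, one can simulate subject paths with a multiplicative frailty on every transition intensity. The gamma frailty distribution and the parameter names here are our choices for the sketch, not necessarily those of the authors.

```python
import random

def simulate_path(lam0, gam0, frailty_var, tau=1.0, J=10, seed=None):
    """Simulate one subject's Markov path with a gamma frailty
    (mean 1, variance frailty_var) multiplying every transition
    intensity.  Returns (recurrent event count, died before tau)."""
    rng = random.Random(seed)
    # Gamma with mean 1 and variance v has shape 1/v and scale v.
    u = rng.gammavariate(1.0 / frailty_var, frailty_var) if frailty_var > 0 else 1.0
    t, k = 0.0, 0
    while t < tau and k <= J:
        lam = u * lam0 if k < J else 0.0   # recurrent intensity from state k
        gam = u * gam0                      # death intensity from state k
        total = lam + gam
        t += rng.expovariate(total)        # holding time in the current state
        if t >= tau:
            break
        if rng.random() < gam / total:     # next transition is to the death state
            return k, True
        k += 1                             # next transition is a recurrent event
    return k, False
```

With frailty_var near zero this reduces to the time homogeneous model of Sect. 5.1; increasing it induces the extra history dependence discussed above.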

5.3 An Extension to Time Non-homogeneous Transition Intensities

It is apparent from the Appendix that a key step in the sample size derivation is the calculation of the transition probability matrix for a continuous-time Markov process. This Markov process has a finite state space with J+1 transient states and one absorbing state D corresponding to the terminal event. Let P(s,t|v) denote the (J+2)×(J+2) transition probability matrix for 0≤s≤t, with (k,l) entry
$$p_{kl}(s,t|v)=P\bigl(Z(t)=l\mid Z(s)=k, v\bigr) , $$
for l=k+1 or D, k=0,1,…,J. The process can be fully specified through its transition intensity matrix, as discussed in Cox and Miller [36]; let Q(v) denote the corresponding (J+2)×(J+2) transition intensity matrix with entries q_{kl}(t|v).

For the time homogeneous Markov process considered in our derivation, q_{kl}(t|v)=q_{kl}(v) and p_{kl}(s,t|v)=p_{kl}(t−s|v) for l=k+1 or D, k=0,1,…,J. The transition probability matrix is then \({P}(t|v)=\exp\{Q(v)t\}=\sum_{x=0}^{\infty}[{Q}(v)t]^{x}/x!\), which can be evaluated through a Jordan decomposition.
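For the small intensity matrices involved, the series can also be truncated directly rather than evaluated by a Jordan decomposition; a minimal sketch (not the authors' implementation), shown on a three-state example with one recurrent transition and an absorbing death state:

```python
def mat_exp(Q, t, terms=60):
    """exp(Q t) by truncated Taylor series -- adequate for small
    intensity matrices with modest |Q|t (not a general-purpose expm)."""
    n = len(Q)
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in P]
    for x in range(1, terms):
        # term <- term * Q * t / x, so term holds (Qt)^x / x!
        term = [[sum(term[i][k] * Q[k][j] * t / x for k in range(n))
                 for j in range(n)] for i in range(n)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return P

# 3-state example: state 0 -> 1 at rate 1.0, both transient states -> death at 0.1
Q = [[-1.1, 1.0, 0.1],
     [0.0, -0.1, 0.1],
     [0.0, 0.0, 0.0]]
P = mat_exp(Q, 1.0)
```

Because each row of a generator sums to zero, each row of the resulting transition probability matrix sums to one, and the absorbing row stays at the identity.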

The proposed method can be easily extended to accommodate time nonhomogeneous Markov processes through a transform of the original time scale. Suppose there exists a transformation t=g(u;ς) defining a time scale on which the process is homogeneous with transition intensity matrix Q0(v) given v. Then
$$ P(u_1,u_2|v)= \exp\bigl\{{Q}_0(v)\bigl[g(u_2;\varsigma)-g(u_1; \varsigma)\bigr ]\bigr\} . $$
(26)
We consider the exponential time transformation [26]
$$g(u;\varsigma)=\varsigma u^\varsigma, $$
under which the rate of the process is increasing (ς>1) or decreasing (ς<1); when ς=1 the process is time homogeneous. In the following simulation study we set ς=1.2 and calculated the transition probability matrix using (26). All sample size formulas remain applicable with the specified ς following the time transform. The simulation results in Table 4 indicate that the sample sizes derived from the proposed method achieve the nominal type I error rate (nominal level 2.5 %) and power (nominal level 80 %).
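To see how (26) operates, consider the simplest special case of a single absorbing transition with baseline rate γ0, for which the matrix exponential is scalar; the transform g and the interpretation of ς follow the text, while the function names are ours.

```python
import math

def g(u, s):
    """Exponential time transform g(u; s) = s * u**s from the text."""
    return s * u**s

def surv(u1, u2, gam0, s):
    """P(still in the initial state at u2 | in it at u1): the scalar
    version of (26) with a single transition at baseline rate gam0."""
    return math.exp(-gam0 * (g(u2, s) - g(u1, s)))

# With s = 1 the process is homogeneous: survival depends only on u2 - u1.
h1 = surv(0.0, 0.5, 0.1, 1.0)
h2 = surv(0.3, 0.8, 0.1, 1.0)
# With s = 1.2 the rate increases over time, so a later interval of the
# same length carries more risk.
a1 = surv(0.0, 0.5, 0.1, 1.2)
a2 = surv(0.3, 0.8, 0.1, 1.2)
```

The same comparison carries over entry-wise to the full matrix version of (26) used in the Table 4 simulations.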
Table 4

Empirical properties of design based on time nonhomogeneous Markov process with ς=1.2: sample sizes and empirical rejection rates for tests of non-inferiority for recurrent and terminal events; β0=θ0=0, βA=log(0.50), θA=log(0.5) and δ0=0.50; %REJ0 and %REJA are the empirical type I error rate (nominal level 2.5 %) and empirical power (nominal level 80 %) respectively

| Endpoint  | Setting | m (1−δA=0.9) | %REJ0 | %REJA | m (1−δA=1.0) | %REJ0 | %REJA |
|-----------|---------|--------------|-------|-------|--------------|-------|-------|
| ψθ=1.0, ψβ=1.0, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 657          | 2.70  | 81.35 | –            | –     | –     |
| Recurrent | θ=θA    | 648          | 2.60  | 80.85 | 647          | 3.10  | 81.10 |
| Death     | β=β0    | 1469         | 2.40  | 81.75 | 985          | 2.40  | 83.75 |
| Death     | β=βA    | 1513         | 1.95  | 82.30 | 1016         | 2.80  | 82.60 |
| ψθ=1.0, ψβ=1.0, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 452          | 3.05  | 82.85 | –            | –     | –     |
| Recurrent | θ=θA    | 446          | 2.75  | 83.45 | 445          | 2.60  | 83.55 |
| Death     | β=βA    | 1526         | 2.60  | 82.65 | 1024         | 2.35  | 82.85 |
| ψθ=1.0, ψβ=1.1, 1−δA=0.9 | | | | | | | |
| Recurrent | θ=θ0    | 647          | 2.45  | 81.15 | –            | –     | –     |
| Recurrent | θ=θA    | 639          | 2.35  | 81.70 | 637          | 2.05  | 80.35 |
| Death     | β=β0    | 1494         | 2.50  | 82.10 | 1002         | 2.70  | 82.60 |
| Death     | β=βA    | 1538         | 2.95  | 82.85 | 1032         | 3.20  | 83.80 |
| ψθ=1.0, ψβ=1.1, 1−δA=1.0 | | | | | | | |
| Recurrent | θ=θ0    | 446          | 2.50  | 82.85 | –            | –     | –     |
| Recurrent | θ=θA    | 441          | 2.15  | 83.50 | 440          | 2.35  | 83.60 |
| Death     | β=βA    | 1551         | 2.80  | 82.05 | 1041         | 2.45  | 81.85 |

Endpoint is the outcome used for the sample size calculation

Parameter setting for the complementary outcome when testing the corresponding endpoint

6 Trial Design in Cancer Metastatic to Bone

Hortobagyi et al. [20] report on the effectiveness of the bisphosphonate pamidronate for the prevention of skeletal related events in breast cancer patients with skeletal metastases. Here we report on analyses of these data to furnish information helpful for the design of a future study planned to have a one-year duration.

Figure 2 displays the estimates of the cumulative transition intensities for the placebo group for both event occurrence and death. Separate transition intensities were specified for the first to third events (i.e. \(\bar{\lambda}(t|\bar{H}_{i}(t)) = \bar{Y}_{ij}(t)\lambda_{0j}(t)\) where Ni(t)=j, j=0,1,2), but the baseline intensity was assumed to be the same for the fourth and subsequent events due to sparse data (i.e. \(\bar{\lambda}(t|\bar{H}_{i}(t)) = \bar{Y}_{ij}(t)\lambda ^{*}_{03}(t)\) if Ni(t)=j≥3). The risk of the first event appears roughly constant over two years and can be represented by a time homogeneous rate of λ0=1 with time measured in years. The slopes of the Nelson–Aalen estimates of the event intensities (left panel) increase with event occurrence, indicating increased risk of future events with each event occurrence. For design purposes a parsimonious representation is required, and fitting the regression model \(\bar{\lambda}_{j}(t|\bar{H}_{i}(t)) = \bar{Y}_{ij}(t)\lambda_{0}(t) \exp (\psi_{\beta}N_{i}(t^{-}))\) gives \(\widehat{\psi}_{\beta}= 1.41\). A similar model was specified for the death intensities, and the Nelson–Aalen estimates in the right panel of Fig. 2 reveal increasing risk of death with the occurrence of each event; fitting \(\bar{\gamma}_{j}(t|\bar{H}_{i}(t)) = \bar {Y}_{ij}(t)\gamma_{0}(t) \exp(\psi_{\theta}N_{i}(t^{-}))\) gives \(\widehat{\psi}_{\theta}= 1.36\). Based on the mortality rate over one year we set γ0=0.1. The censoring rate over the course of a planned study is assumed to be 10 % over the 24 months, suggesting ρ=0.5^{−1}log(10/9).
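Reading the fitted model as implying state-specific intensities λ_j = λ0·ψβ^j and γ_j = γ0·ψθ^j (our interpretation of the multiplicative fit; the names below are ours), the design inputs for the planned study can be tabulated quickly:

```python
# Fitted design inputs from the pamidronate data, per year of follow-up.
lam0, psi_b = 1.0, 1.41   # baseline skeletal-event rate and per-event inflation
gam0, psi_t = 0.1, 1.36   # baseline death rate and per-event inflation

# Intensities while in state j (j prior skeletal complications)
lam = [lam0 * psi_b**j for j in range(4)]
gam = [gam0 * psi_t**j for j in range(4)]
```

Both sequences increase geometrically with the event count, which is the parsimonious state dependence the design calculations exploit.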
Fig. 2

Nelson–Aalen estimates of the cumulative transition intensities for the placebo group in Hortobagyi et al. [20]

Scenario I: Consider the planning of a future study aiming to demonstrate that a new treatment is superior with respect to both the occurrence of skeletal complications and mortality. Two two-sided tests are to be conducted, and a Bonferroni adjustment gives each a 2.5 % level so that the overall type I error rate is controlled at 5 %. Suppose 90 % power is required to detect a 20 % reduction in the risk of recurrent events (βA=log0.80) and a 10 % reduction in mortality (θA=log0.90). We find minimum sample sizes of 700 and 707 individuals, respectively.

Scenario II: Suppose a non-inferiority design is of interest with margins of 50 % for both the recurrent events and death, the type I error rate for each test controlled at 2.5 %, and 80 % power desired for each test. Suppose the true effect of treatment corresponds to a 20 % loss of the active-control effect on survival and a 10 % loss of effect on the recurrent event outcome. To ensure 80 % power to claim non-inferiority, 9052 individuals will be required for the survival endpoint and 8506 individuals for the recurrent event outcome.

7 Discussion

This article has provided design criteria for randomized trials with the objective of comparing two treatment groups with respect to the incidence of recurrent events and a terminal event. The motivating setting involves the palliative treatment of cancer patients with skeletal metastases who are at risk of both skeletal related events and death. Recurrent and terminal events arise in many other settings in medical research, including transplant studies in which recipients may experience transient graft rejection episodes and total graft rejection [24]. In trials of treatments for advanced chronic obstructive pulmonary disease, patients are at risk of recurrent exacerbations and death [25].

The multistate framework adopted is appealing for modeling such processes because it structurally incorporates the terminal event as an absorbing state [22]. This is in contrast to many joint models, which incorporate an association between recurrent and terminal events through shared or correlated random effects arising from parametric models. The proposed analysis represents a compromise between intensity-based models reliant on full model specification and marginal models. The proposed recurrent event model is in line with the Prentice et al. [14] approach, in which the baseline intensity is stratified on the cumulative number of events, but has the added implicit condition that subjects must be alive to contribute to the risk set; such models are sometimes called "partially conditional" models. The terminal event state therefore enters the asymptotic calculations by reducing the expected size of the risk sets.

The Nelson–Aalen estimates of the cumulative transition intensities and the Aalen–Johansen estimates of the transition probability functions, which are computed under a Markov assumption, are robust in the sense that they remain consistent for non-Markov processes under independent censoring [27, 28]. This is not true of the estimates of treatment effect in multiplicative intensity-based models, where there is greater reliance on the model assumptions for valid interpretation of covariate effects. It would be of interest to study the performance of the separate and joint tests of treatment effect in this setting which involve no conditioning on the event history [29].

Between subject variation in risk of events routinely arises in recurrent event datasets and mixed Poisson models are often adopted since they account for this heterogeneity. The marginal intensity of mixed Poisson processes features a sudden change in risk following event occurrence [21]. This feature is present in the proposed multistate framework but the change in risk is not transient. Boher and Cook [30] showed empirically that the multistate analysis based on the Prentice et al. [14] formulation retains good control of the type I error rate even with naive (i.e. non-robust) variance estimation, so the multistate partially conditional analysis offers some protection against heterogeneity.

Mixed models have also been proposed by several authors for modeling the association between the recurrent and terminal events through correlated or shared random effects [3133]. Likelihood and semiparametric methods based on estimating functions can be used for analysis of a dataset, but parametric assumptions could be made to derive required sample sizes. We prefer the multistate framework however, since the terminal nature of death is reflected in its designation as an absorbing state. Moreover, with the multistate analysis in which we adopt time-dependent stratification on the cumulative number of events, our sample size formula is directly relevant for analyses based on the so-called Prentice–Williams–Peterson approach [14] to analyze recurrent events in the absence of mortality. While the multistate framework requires that more parameters be specified, the multiplicative increase in risk with event occurrence is seen in a diverse range of datasets and offers some degree of parsimony.

We have restricted attention to settings where the event times are at most right censored. Frequently recurrent events are not observed directly but are only detectable under careful examination in a clinic. Studies aiming to prevent the occurrence of skeletal metastases involve quarterly examinations of patients at which bone scans are conducted to assess whether new metastases have developed. The same multistate model can be used to characterize the incidence of skeletal metastases and death, but the onset times of the metastases become interval-censored. If the Markov framework remains appropriate, the methods of Kalbfleisch and Lawless [34] may be employed with the multistate model package msm in R/Splus. Sample size calculations must be suitably modified and this is a topic of ongoing research.

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (LW and RJC) and the Canadian Institutes for Health Research (RJC). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Jerry Lawless for helpful discussions, the editorial reviewers for helpful comments, and Novartis Pharmaceuticals for permission to use the data from the clinical trial.

Copyright information

© International Chinese Statistical Association 2013