The Design of Intervention Trials Involving Recurrent and Terminal Events
DOI: 10.1007/s12561-013-9083-z
- Cite this article as:
- Wu, L. & Cook, R.J. Stat Biosci (2013) 5: 261. doi:10.1007/s12561-013-9083-z
Abstract
Clinical trials are often designed to assess the effect of therapeutic interventions on the incidence of recurrent events in the presence of a dependent terminal event such as death. Statistical methods based on multistate analysis have considerable appeal in this setting since they can incorporate changes in risk with each event occurrence, a dependence between the recurrent event and the terminal event, and event-dependent censoring. To date, however, there has been limited development of statistical methods for the design of trials involving recurrent and terminal events. Based on the asymptotic distribution of regression coefficients from a multiplicative intensity Markov regression model, we derive sample size formulas to address power requirements for both the recurrent and terminal event processes. We consider the design of trials for which separate marginal hypothesis tests are of interest for the recurrent and terminal event processes and deal with both superiority and non-inferiority tests. Simulation studies confirm that the designs satisfy the nominal power requirements in both settings, and an application to a trial evaluating the effect of a bisphosphonate on skeletal complications is given for illustration.
Keywords
Multistate model; Non-inferiority; Recurrent events; Sample size; Superiority
1 Introduction
1.1 Background
Clinical trials must be designed with appropriate power to address scientific needs, ethical demands, and financial restrictions. In parallel group randomized trials involving failure time outcomes, power objectives are typically met for a given model (e.g. Cox model) by specifying the event rate in the reference arm, the clinically important effect, the censoring rate and the size of the test, and then by deriving a suitable sample size based on large sample theory [1]. Under this general framework, a number of authors have developed methods for planning trials based on analyses of the time to the first event [2–5].
Sample size formulas have been developed [6] for recurrent event outcomes based on mixed Poisson models with multiplicative rate functions [7, 8]. Power and sample size considerations were subsequently developed for more general multiplicative intensity-based models [9] using counting process theory [10]. Another approach to the analysis of recurrent event data in clinical trials is to use the robust methods for the analysis of multivariate survival data [11] under a working independence assumption, and sample size formulas for this approach are available [12]. More recently there has been interest in trial design based on covariate-adjusted log-rank statistics for recurrent event analyses, and associated sample size formulas have been developed [13].
To date no methods have been developed for the design of clinical trials in which the aim is to test treatment effects on both recurrent and terminal event processes. We address this problem under the framework of a Markov model with transient states corresponding to the recurrent events and a single absorbing state for death. The treatment effect on the recurrent events is formulated by specifying multiplicative intensity models with time-dependent strata based on the cumulative event history and a common treatment effect; this formulation is in the spirit of the Prentice et al. [14] approach to the analysis of recurrent events. Multiplicative intensity-based models are also incorporated for mortality with the same stratification criteria. Under this formulation we derive the limiting value of partial score statistics for the treatment effect on the recurrent and terminal event processes, along with the asymptotic variances under the null and alternative hypotheses. Sample size criteria are then obtained to satisfy power objectives when separate marginal hypothesis tests are of interest for the two types of event.
Non-inferiority designs are being used increasingly often in cancer and cardiovascular research [15, 16] since many treatments with proven efficacy are available and placebo-controlled trials are therefore unethical. In such settings new interventions are required to have some advantages over standard care, such as a lower cost, a lower rate of adverse events, or a less invasive mode of administration [17]. Rothmann et al. [15] provide an excellent discussion of the various approaches to hypothesis testing in the context of non-inferiority oncology trials, and extensions have recently been made for recurrent event analyses based on mixed Poisson models or robust marginal methods [18]. We consider design issues when there are superiority or non-inferiority hypotheses for the recurrent event and survival processes in the context of the multistate model.
The remainder of this paper is organized as follows. In the next subsection we give further background information on the setting of palliative trials for patients with cancer metastatic to bone. In Sect. 2 we define notation, describe the multistate model, and derive the relevant partial score statistics. The limiting distributions of the partial score statistics are derived in Sect. 3, which facilitates the sample size calculations in Sect. 4. Simulation studies in Sect. 5 confirm that the empirical frequency properties are compatible with the nominal levels under the null and that power requirements are met. An application is given in Sect. 6, and general remarks and topics for further research are discussed in Sect. 7.
1.2 Trial Design for Patients with Skeletal Metastases
Cancer patients with skeletal metastases are at increased risk of a variety of clinical events including pathological and nonpathological fractures, bouts of acute bone pain, and episodes of hypercalcemia. These events are typically grouped together to form a composite recurrent “skeletal related event” which is used as a basis for the evaluation of treatments designed to reduce the occurrence of skeletal complications in cancer patients to help maintain functional ability and quality of life and minimize health service utilization [19]. Because the patient population has metastatic cancer, they are also at considerable risk of death. In breast cancer, twelve month survival in recent studies has been approximately 78.9 % in treated patients; in lung, prostate and other solid tumors the 12 month survival rates were 28.0 %, 66.0 % and 33.6 % respectively.
While bisphosphonate therapy is palliative and not expected to impact survival, an assessment of the effect on survival times is warranted for a complete evaluation of the consequences of treatment. Simultaneous consideration of treatment effects on the recurrent skeletal related events and survival is therefore essential and analyses must accommodate a possible association between the recurrent event and terminal death process.
2 Likelihood for Recurrent and Terminal Events
Consider a study with planned follow-up over the interval (0,τ], where τ is called the administrative censoring time. Individuals may withdraw prematurely from a study and so we let \(\tau _{i}^{\dagger}\) be the random right censoring time and let \(\tau_{i}=\min(\tau_{i}^{\dagger},\tau)\) be the net censoring time for individual i; we let \(X_{i}=\min(T_{i}^{d},\tau_{i})\) denote the total time on study and \(\delta_{i}=I(X_{i}=T_{i}^{d})\) indicate whether the terminal event was observed. Let Y_{i}(t)=I(t≤τ_{i}) indicate whether individual i is under observation at t and Y_{ij}(t)=I(Z_{i}(t^{−})=j−1), j=1,… indicate that individual i is at risk of a transition out of state j−1 at time t (i.e. they are at risk for the jth event of either type), so \(\bar{Y}_{ij}(t)=Y_{i}(t)Y_{ij}(t)\) indicates they are both at risk and under observation. Then \(d\bar{N}_{ij}(t)=\bar{Y}_{ij}(t)dN_{ij}(t)\) and \(d\bar {N}_{ij}^{d}(t)=\bar{Y}_{ij}(t)dN_{ij}^{d}(t)\) are the so-called observable counting processes for the recurrent and terminal events, respectively. The observed data can then be written \(\{d\bar{N}_{i}(s), d\bar {N}_{i}^{d}(s), Y_{i}(s), 0<s, v_{i}\}\), i=1,…,m. The history of the observable process is the information observed up to t^{−} and is denoted \(\bar{H}_{i}(t)=\{\bar{N}_{i}(s), \bar{N}_{i}^{d}(s), \bar{Y}_{i}(s), 0\leq s <t, v_{i} \}\), i=1,…,m.
3 Asymptotic Properties of Partial Score Statistics
Thus we have expressions for \(E_{A}(m^{-\frac{1}{2}}U(\beta_{0}))\) by (12), the asymptotic variance \(V_{0}=\operatorname{Var}_{0}(m^{-\frac{1}{2}}U(\beta_{0}))\) of the score statistic under the null by (10), and the asymptotic variance \(V_{A}=\operatorname{Var}_{A}(m^{-\frac {1}{2}}U(\beta_{0}))\) under the alternative by (13). These results are used for the sample size calculations in the next section. Details on how the requisite expectations can be computed are given in the Appendix.
4 Sample Size Derivation Based on Partial Score Statistics
4.1 Sample Size for the Design of Superiority Trials
The asymptotic variances of the test statistics require specification of the baseline intensities, the effects of interest (e.g. β_{A} and θ_{A}), and information regarding censoring, including the time of administrative censoring and the rate of withdrawal. Parametric assumptions are typically made at the design stage of clinical trials, and with time-homogeneous baseline transition intensities λ_{0j}(t)=λ_{0}exp(ψ_{β}(j−1)) and γ_{0j}(t)=γ_{0}exp(ψ_{θ}(j−1)), j=2,3,…, a transition probability matrix is easily obtained for the control group; it is governed by four parameters, λ_{0}, ψ_{β}, γ_{0}, and ψ_{θ} (see the Appendix). Historical data on the increase in the risk of future events with event occurrence and the increase in the risk of death with event occurrence inform the choice of ψ_{β} and ψ_{θ}, respectively, leaving λ_{0} and γ_{0} to be specified. Clinical researchers will typically be able to specify a mortality rate over a given period of time (0,τ], which represents a constraint on P_{0D}(0,τ|v=0); see (29) in the Appendix. Specification of the expected number of events over (0,τ] yields the second constraint, which enables specification of the process for the control arm. Specification of the treatment effects on the event and death intensities enables computation of the analogous transition probability matrix for the experimental arm. A constant rate of withdrawal, which can be modeled using an exponential distribution, is typically assumed in sample size calculations for survival endpoints. All the required calculations outlined in the Appendix are then possible with this specification; code is available from the authors upon request to facilitate these calculations. The same procedure can be carried out for the non-inferiority designs described in the next section.
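As a rough illustration of the two design constraints just described, the following Python sketch integrates the Kolmogorov forward equations for the control arm and returns the probability of death and the expected number of events over (0,τ]. It is not the authors' code. The function name `control_arm_summary`, the truncation at J events (with no transitions out of the last state), the Runge-Kutta integrator, and the numerical inputs are all assumptions made for this example.

```python
import math

def control_arm_summary(lam0, gam0, psi_beta, psi_theta, tau=1.0, J=10, steps=2000):
    """Return (P(death by tau), E[number of events by tau]) for a control subject.

    State k = 0..J counts prior events; no exits from state J (assumed truncation).
    """
    lam = [lam0 * math.exp(psi_beta * k) for k in range(J)] + [0.0]
    gam = [gam0 * math.exp(psi_theta * k) for k in range(J)] + [0.0]

    def deriv(y):
        p = y[:J + 1]                                   # occupancy of alive states
        dp = [-(lam[0] + gam[0]) * p[0]]
        for k in range(1, J + 1):
            dp.append(lam[k - 1] * p[k - 1] - (lam[k] + gam[k]) * p[k])
        d_death = sum(g * pk for g, pk in zip(gam, p))  # flow into the death state
        d_events = sum(l * pk for l, pk in zip(lam, p)) # rate of new events
        return dp + [d_death, d_events]

    y = [1.0] + [0.0] * (J + 2)     # start event-free; last two slots accumulate
    h = tau / steps
    for _ in range(steps):          # classical fourth-order Runge-Kutta step
        k1 = deriv(y)
        k2 = deriv([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = deriv([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = deriv([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y[J + 1], y[J + 2]

# Hypothetical inputs: lam0 = 1.0, gam0 = 0.25, no dependence on event history.
death_prob, mean_events = control_arm_summary(1.0, 0.25, 0.0, 0.0)
```

In practice λ_{0} and γ_{0} would be adjusted, for example with a root finder, until both outputs match the mortality rate and expected event count specified by the clinical researchers.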
4.2 Sample Size for the Design of Non-inferiority Trials
For the purpose of sample size calculation, it is sometimes desirable to consider a particular value of LRR(E_{2}/C_{2}) in the alternative hypothesis, which may be expressed as a percentage of the effect of the active control relative to placebo. We let 1−δ_{A} denote the percentage of the active-control effect that the experimental treatment retains once the null hypothesis is rejected, so that LRR(E_{2}/C_{2})=(1−δ_{A})LRR(P_{1}/C_{1})<(1−δ_{0})LRR(P_{1}/C_{1}). In this study, we examine different values of δ_{A} in sample size calculations.
The minimum sample size required to test the non-inferiority of the new treatment with respect to both the recurrent events and the terminal event is max(m,m^{d}) for one-sided tests at level α_{1} with power 1−α_{2}.
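The sample size formulas themselves are not reproduced in this section, so the sketch below uses the generic score-test form m = (z_{1−α1}√V_{0} + z_{1−α2}√V_{A})² / μ_{A}², where μ_{A} stands for the limiting mean of m^{−1}U(β_{0}) under the alternative. That form is an assumption of this example, as are the function name `sample_size` and all numerical inputs; it is meant only to show how max(m,m^{d}) would be assembled once V_{0}, V_{A}, and μ_{A} are available for each endpoint.

```python
import math
from statistics import NormalDist

def sample_size(mu_a, v0, va, alpha1=0.025, alpha2=0.20):
    """Generic one-sided score-test sample size (assumed form, see lead-in)."""
    z1 = NormalDist().inv_cdf(1 - alpha1)   # critical value for the test size
    z2 = NormalDist().inv_cdf(1 - alpha2)   # quantile for the power requirement
    return math.ceil((z1 * math.sqrt(v0) + z2 * math.sqrt(va)) ** 2 / mu_a ** 2)

# Hypothetical design inputs for the recurrent-event and death endpoints.
m = sample_size(mu_a=0.10, v0=0.25, va=0.25)
m_d = sample_size(mu_a=0.03, v0=0.20, va=0.21)
m_required = max(m, m_d)                     # both power requirements are met
```

Taking the maximum means the endpoint with the weaker signal, typically mortality in this setting, drives the trial size.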
5 An Empirical Study of Frequency Properties
We simulate the Markov process with the multiplicative model of (1) for recurrent events and (2) for the terminal event. For planning purposes we set an upper limit on the number of states and set the maximum number of events to J=10; only approximately 2 % of patients had eight or more skeletal complications in Hortobagyi et al. [20]. For computational convenience, we further specify the intensity functions for the recurrent events (1) and for the terminal event (2) as λ_{0j}(t)=λ_{0}exp(ψ_{β}⋅(j−1)) and γ_{0j}(t)=γ_{0}exp(ψ_{θ}⋅(j−1)), j=1,…,10, respectively. The constants ψ_{β} and ψ_{θ} represent the relative increase in the event and death intensities with the occurrence of each additional event. In the simulation study, we consider ψ_{β}=log1.0=0 for a constant baseline intensity (rate) which is independent of the number of previous events and ψ_{θ}=log1.0=0 to correspond to the setting where mortality is independent of event occurrence. We set ψ_{β}=log1.1 to reflect the setting where the event intensity increases with each event and ψ_{θ}=log1.1 to correspond to the case where the mortality rate increases with event occurrence. The coefficients β and θ are the effects of the experimental treatment on recurrent events and death, respectively, and are chosen to represent modest improvements.
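A minimal Python sketch of how one subject's path could be generated under these intensities is given below. It is an illustration rather than the simulation code used for the study. The function name `simulate_subject`, the baseline values λ_{0}=1.0 and γ_{0}=0.25, the seed, and the rule that a subject who reaches J events simply remains in that state until censoring are assumptions; the treatment effects and the withdrawal rate log(5/4) are values quoted elsewhere in this section.

```python
import math
import random

def simulate_subject(rng, v, lam0, gam0, psi_beta, psi_theta,
                     beta, theta, tau=1.0, rho=math.log(5 / 4), J=10):
    """Simulate one path; returns (event_times, exit_time, died)."""
    censor = min(tau, rng.expovariate(rho))    # net censoring time
    t, k, events = 0.0, 0, []
    while k < J:                               # k = number of events so far
        lam = lam0 * math.exp(psi_beta * k + beta * v)
        gam = gam0 * math.exp(psi_theta * k + theta * v)
        t += rng.expovariate(lam + gam)        # sojourn time in state k
        if t >= censor:
            return events, censor, False       # censored before next transition
        if rng.random() < lam / (lam + gam):   # competing risks: event vs death
            events.append(t)
            k += 1
        else:
            return events, t, True
    return events, censor, False               # reached J events (assumed rule)

rng = random.Random(2013)
paths = [simulate_subject(rng, int(rng.random() < 0.5), 1.0, 0.25,
                          math.log(1.1), math.log(1.1),
                          math.log(0.80), math.log(0.90))
         for _ in range(1000)]
```

Each sojourn time is exponential with rate equal to the sum of the event and death intensities in the current state, which is what the time-homogeneous specification implies.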
The Markov model has eleven states (0,1,…,10) corresponding to the cumulative number of recurrent events and one absorbing state for death; we number these states 1 to 12 and consider a 12×12 transition intensity matrix denoted \(\mathcal{Q}^{v}\) for an individual with v_{i}=v having (k,ℓ) entry \(q^{v}_{kl}\) given by λ_{0k}exp(βv) for k=1,…,10 and ℓ=k+1, γ_{0k}exp(θv) for k=1,…,10 and ℓ=12, −(λ_{0k}exp(βv)+γ_{0k}exp(θv)) for k=ℓ=1,…,10, and zero otherwise. The transition probability matrix has elements P_{ℓk}(t|v)=P(Z(t)=ℓ|Z(t^{−})=k,v) and can be obtained as described in the Appendix. We further specify the baseline intensities λ_{0} and γ_{0} by setting the probability that the first event for a control subject is a recurrent event to p=λ_{0}/(γ_{0}+λ_{0}) and setting the probability that a control subject has died by t=1 to q, for some pre-specified values of p and q.
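The construction above can be sketched in Python as follows: build the 12×12 intensity matrix as described, evaluate exp(Qt) by a truncated Taylor series with scaling and squaring, and search for the (λ_{0}, γ_{0}) pair that matches given p and q. This is a hedged illustration, not the authors' implementation. The function names, the bisection bracket, and the values p=0.8 and q=0.2 are assumptions, and `P[k][l]` here denotes the probability of occupying state l at time t given state k at time 0 (0-based indices).

```python
import math

def intensity_matrix(lam0, gam0, psi_beta, psi_theta,
                     beta=0.0, theta=0.0, v=0, J=10):
    n = J + 2                       # J+1 event-count states plus death (last)
    Q = [[0.0] * n for _ in range(n)]
    for k in range(J):              # rows 1..10 in the text's 1-based numbering
        lam = lam0 * math.exp(psi_beta * k + beta * v)
        gam = gam0 * math.exp(psi_theta * k + theta * v)
        Q[k][k + 1], Q[k][n - 1], Q[k][k] = lam, gam, -(lam + gam)
    return Q                        # remaining rows are zero, as in the text

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, terms=20):
    """exp(Q t) via scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(x) for x in row) for row in Q) * t
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x * t / 2 ** s for x in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for x in range(1, terms + 1):   # accumulate A^x / x!
        term = [[val / x for val in row] for row in mat_mul(term, A)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):              # undo the scaling by repeated squaring
        P = mat_mul(P, P)
    return P

def calibrate(p, q, psi_beta, psi_theta, t=1.0, J=10):
    """Bisection on gam0 with lam0 = gam0 * p / (1 - p) so that P(dead by t) = q."""
    lo, hi = 1e-8, 10.0             # assumed bracket for gam0
    for _ in range(50):
        gam0 = 0.5 * (lo + hi)
        lam0 = gam0 * p / (1 - p)
        Q = intensity_matrix(lam0, gam0, psi_beta, psi_theta, J=J)
        if transition_matrix(Q, t)[0][J + 1] < q:
            lo = gam0
        else:
            hi = gam0
    return lam0, gam0

# Hypothetical targets: p = 0.8 and q = 0.2, with risks rising 10 % per event.
lam0, gam0 = calibrate(0.8, 0.2, math.log(1.1), math.log(1.1))
```

When ψ_{θ}=0 the death constraint is close to q=1−exp(−γ_{0}), so the search is only really needed when mortality depends on the event history.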
5.1 Empirical Study of Superiority Designs
Sample sizes and empirical rejection rates for tests of superiority for recurrent and terminal events; β_{A}=log(0.80) and θ_{A}=log(0.9); %REJ_{0} and %REJ_{A} are the empirical type I error rate and empirical power respectively; the nominal type I error rate is 2.5 % and the nominal power is 80 %
ψ_{β} | Endpoint^{†} | Setting^{‡} | ψ_{θ}=1.0: m | ψ_{θ}=1.0: %REJ_{0} | ψ_{θ}=1.0: %REJ_{A} | ψ_{θ}=1.1: m | ψ_{θ}=1.1: %REJ_{0} | ψ_{θ}=1.1: %REJ_{A} |
---|---|---|---|---|---|---|---|---|
1.0 | Recurrent | θ=θ_{0} | 728 | 2.45 | 84.45 | 771 | 2.00 | 83.10 |
1.0 | Recurrent | θ=θ_{A} | 710 | 2.65 | 84.20 | 753 | 2.40 | 82.90 |
1.0 | Death | β=β_{0} | 6636 | 2.40 | 80.35 | 6673 | 2.30 | 80.85 |
1.0 | Death | β=β_{A} | 6740 | 2.50 | 80.50 | 6816 | 2.75 | 80.50 |
1.1 | Recurrent | θ=θ_{0} | 691 | 2.85 | 84.70 | 737 | 2.40 | 84.25 |
1.1 | Recurrent | θ=θ_{A} | 674 | 2.70 | 84.15 | 719 | 2.10 | 84.25 |
1.1 | Death | β=β_{0} | 6674 | 2.60 | 79.45 | 6691 | 2.25 | 81.15 |
1.1 | Death | β=β_{A} | 6759 | 2.45 | 80.50 | 6836 | 2.40 | 83.00 |
5.2 Empirical Study of Non-inferiority Designs
In this section we present simulation studies conducted to validate the proposed sample size calculations for testing non-inferiority of the experimental treatment with respect to both the recurrent events and the terminal event. We demonstrate that the empirical rejection rates are consistent with the nominal levels. In particular, we set LRR(C_{1}/P_{1})=log0.6 (β_{A}) for the effect of the active control against placebo for the recurrent events and LRR(C_{1}/P_{1})=log0.8 (θ_{A}) for the terminal event. We also adopt the constancy assumption so that LRR(P_{2}/C_{2})=LRR(P_{1}/C_{1}).
We consider designs in which the aim is to demonstrate that the experimental treatment retains at least 50 per cent of the effect of the active control, so that δ_{0}=0.5. In this simulation study, we consider one-sided tests with a nominal type I error rate of α_{1}=0.025 and power set to 80 per cent (1−α_{2}=0.8). The effect of the experimental treatment under the alternative hypothesis is represented by LRR_{A}(E_{2}/C_{2})=(1−δ_{A})LRR(P_{1}/C_{1}) and we let 1−δ_{A}=0.90 and 1.00 to correspond to a retention of 90 and 100 per cent of the active-control effect, respectively. The duration of follow-up τ is set to 1. A random censoring process is simulated for each subject using an exponential distribution with parameter ρ, which is specified so that each subject withdraws from the study before τ with probability 0.20 (ρ=log5/4).
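As a worked check of these settings, the withdrawal rate follows from the exponential survivor function evaluated at τ=1, and the 50 per cent retention margin fixes the null values (1−δ_{0})β_{A} and (1−δ_{0})θ_{A} at which the partial score statistics are evaluated. The decimal values are rounded and are given here only for orientation.

\[
P(\tau^{\dagger}\leq\tau)=1-e^{-\rho\tau}=0.20 \;\Longrightarrow\; \rho=-\log(0.80)=\log(5/4)\approx0.223 \quad (\tau=1),
\]
\[
(1-\delta_{0})\beta_{A}=0.5\log(0.6)\approx-0.255, \qquad (1-\delta_{0})\theta_{A}=0.5\log(0.8)\approx-0.112 .
\]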
For each simulation setting, the sample size is determined according to formulas (24) and (25). The simulation was implemented in R and the partial score statistics were obtained using the coxph function in the survival package with the iter.max option set to zero and the init option set to (1−δ_{0})β_{A}. Under the null hypothesis, the corresponding variance was obtained by summing the observed information at each event time. Under the alternative hypothesis, this variance was calculated as the sample variance of the partial score statistics at all event times.
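To make the quantity being extracted concrete, the following Python sketch evaluates a Cox partial score and its observed information at a fixed coefficient value, which is conceptually what a fit with no iterations from a specified starting value returns. It is a simplified illustration for a single unstratified transition with one covariate and no tied event times, not a reproduction of the R analysis, which stratifies on the cumulative number of events. The function name `partial_score` and the toy data are assumptions.

```python
import math

def partial_score(times, status, v, beta0=0.0):
    """Cox partial score U(beta0) and observed information I(beta0); no ties."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    score, info = 0.0, 0.0
    for pos, i in enumerate(order):
        if not status[i]:
            continue                              # censored: no contribution
        risk = order[pos:]                        # subjects still at risk
        w = [math.exp(beta0 * v[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * v[j] for wj, j in zip(w, risk))
        s2 = sum(wj * v[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0                            # weighted mean covariate
        score += v[i] - mean
        info += s2 / s0 - mean ** 2               # weighted covariate variance
    return score, info

# Toy data: four subjects, one censored, treatment indicator v.
U, I = partial_score([1, 2, 3, 4], [1, 1, 0, 1], [1, 0, 1, 0], beta0=0.0)
z = U / math.sqrt(I)                              # standardized score statistic
```

For a non-inferiority test, `beta0` would be set to the margin value rather than zero.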
Sample sizes and empirical rejection rates for tests of non-inferiority for recurrent and terminal events; β_{A}=log(0.60), θ_{A}=log(0.8) and δ_{0}=0.50; %REJ_{0} and %REJ_{A} are the empirical type I error rate and empirical power respectively; the nominal type I error rate is 2.5 % and the nominal power is 80 %
Endpoint^{†} | Setting^{‡} | \(1-\delta^{d}_{A}=0.9\): m | \(1-\delta^{d}_{A}=0.9\): %REJ_{0} | \(1-\delta^{d}_{A}=0.9\): %REJ_{A} | \(1-\delta^{d}_{A}=1.0\): m | \(1-\delta^{d}_{A}=1.0\): %REJ_{0} | \(1-\delta^{d}_{A}=1.0\): %REJ_{A} |
---|---|---|---|---|---|---|---|
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 986 | 2.20 | 83.05 | |||
Recurrent | θ=θ_{A} | 967 | 2.20 | 83.50 | 962 | 2.40 | 83.60 |
Death | β=β_{0} | 9665 | 2.65 | 81.20 | 6296 | 2.65 | 80.65 |
Death | β=β_{A} | 9850 | 2.65 | 81.40 | 6405 | 2.55 | 81.40 |
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 664 | 2.55 | 83.70 | |||
Recurrent | θ=θ_{A} | 657 | 2.65 | 83.35 | 655 | 2.75 | 82.60 |
Death | β=β_{A} | 9904 | 2.75 | 82.10 | 6429 | 2.85 | 81.85 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 945 | 2.65 | 84.30 | |||
Recurrent | θ=θ_{A} | 934 | 2.85 | 84.85 | 931 | 2.10 | 84.90 |
Death | β=β_{0} | 9669 | 2.30 | 80.15 | 6276 | 2.30 | 79.10 |
Death | β=β_{A} | 9860 | 2.25 | 82.35 | 6401 | 2.40 | 81.60 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 639 | 2.60 | 83.75 | |||
Recurrent | θ=θ_{A} | 631 | 2.90 | 84.40 | 629 | 2.30 | 84.70 |
Death | β=β_{A} | 9918 | 2.20 | 82.20 | 6438 | 2.75 | 81.55 |
ψ_{θ}=1.1, ψ_{β}=1.0, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 1042 | 2.15 | 82.50 | |||
Recurrent | θ=θ_{A} | 1030 | 2.25 | 83.05 | 1027 | 2.05 | 82.10 |
Death | β=β_{0} | 9761 | 2.05 | 81.75 | 6322 | 2.65 | 80.05 |
Death | β=β_{A} | 9964 | 2.05 | 80.50 | 6475 | 2.75 | 79.35 |
ψ_{θ}=1.1, ψ_{β}=1.0, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 701 | 2.05 | 83.75 | |||
Recurrent | θ=θ_{A} | 693 | 2.65 | 83.65 | 691 | 2.65 | 82.70 |
Death | β=β_{A} | 10029 | 2.75 | 79.55 | 6507 | 2.35 | 81.05 |
ψ_{θ}=1.1, ψ_{β}=1.1, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 1004 | 2.85 | 84.25 | |||
Recurrent | θ=θ_{A} | 992 | 2.75 | 83.20 | 990 | 2.95 | 83.9 |
Death | β=β_{0} | 9752 | 2.05 | 80.52 | 6329 | 2.50 | 80.70 |
Death | β=β_{A} | 9986 | 2.25 | 80.30 | 6482 | 2.50 | 80.15 |
ψ_{θ}=1.1, ψ_{β}=1.1, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 678 | 2.60 | 83.70 | |||
Recurrent | θ=θ_{A} | 670 | 2.35 | 84.70 | 665 | 2.75 | 83.65 |
Death | β=β_{A} | 10053 | 2.35 | 79.75 | 6526 | 2.25 | 80.80 |
Sensitivity of empirical rejection rates for tests of non-inferiority for recurrent and terminal events to non-uniform withdrawal; β_{A}=log(0.50), θ_{A}=log(0.5) and δ_{0}=0.50; %REJ_{0} and %REJ_{A} are the empirical type I error rate (nominal level 2.5 %) and empirical power (nominal level 80 %) respectively; random early withdrawal time generated by a Weibull distribution with shape a=2 and scale b=2.1199
Endpoint^{†} | Setting^{‡} | 1−δ_{A}=0.9: m | 1−δ_{A}=0.9: %REJ_{0} | 1−δ_{A}=0.9: %REJ_{A} | 1−δ_{A}=1.0: m | 1−δ_{A}=1.0: %REJ_{0} | 1−δ_{A}=1.0: %REJ_{A} |
---|---|---|---|---|---|---|---|
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 610 | 2.50 | 86.70 | |||
Recurrent | θ=θ_{A} | 594 | 2.40 | 84.45 | 591 | 2.60 | 85.80 |
Death | β=β_{0} | 1304 | 3.00 | 84.20 | 881 | 2.05 | 85.70 |
Death | β=β_{A} | 1346 | 2.05 | 85.30 | 910 | 2.75 | 69.00 |
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 424 | 2.90 | 86.50 | |||
Recurrent | θ=θ_{A} | 413 | 2.50 | 73.75 | 411 | 2.55 | 89.35 |
Death | β=β_{A} | 1358 | 2.30 | 84.80 | 918 | 2.35 | 86.25 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 603 | 2.90 | 86.80 | |||
Recurrent | θ=θ_{A} | 587 | 2.75 | 87.45 | 584 | 2.30 | 86.65 |
Death | β=β_{0} | 1313 | 2.40 | 83.00 | 887 | 2.55 | 86.10 |
Death | β=β_{A} | 1352 | 2.25 | 85.15 | 913 | 3.00 | 85.30 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 421 | 2.20 | 87.75 | |||
Recurrent | θ=θ_{A} | 410 | 2.00 | 72.90 | 408 | 2.30 | 89.40 |
Death | β=β_{A} | 1363 | 2.65 | 85.35 | 921 | 2.20 | 86.35 |
ψ_{θ}=1.1, ψ_{β}=1.0, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 662 | 2.27 | 87.85 | |||
Recurrent | θ=θ_{A} | 645 | 2.15 | 89.00 | 642 | 2.20 | 88.60 |
Death | β=β_{0} | 1317 | 2.40 | 85.30 | 890 | 2.05 | 86.95 |
Death | β=β_{A} | 1366 | 1.90 | 85.25 | 923 | 2.05 | 85.85 |
ψ_{θ}=1.1, ψ_{β}=1.0, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 459 | 2.75 | 75.50 | |||
Recurrent | θ=θ_{A} | 450 | 2.20 | 75.75 | 445 | 2.20 | 75.40 |
Death | β=β_{A} | 1379 | 2.15 | 86.95 | 932 | 2.35 | 87.75 |
ψ_{θ}=1.1, ψ_{β}=1.1, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 642 | 2.45 | 88.30 | |||
Recurrent | θ=θ_{A} | 625 | 2.35 | 88.80 | 622 | 2.15 | 97.20 |
Death | β=β_{0} | 1320 | 2.90 | 87.00 | 926 | 2.85 | 88.25 |
Death | β=β_{A} | 1370 | 2.50 | 86.45 | 891 | 2.35 | 85.35 |
ψ_{θ}=1.1, ψ_{β}=1.1, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 447 | 2.40 | 76.60 | |||
Recurrent | θ=θ_{A} | 444 | 2.25 | 75.60 | 442 | 2.40 | 90.50 |
Death | β=β_{A} | 1384 | 3.00 | 88.35 | 935 | 2.10 | 87.80 |
Additional sensitivity studies involved exploration of the effect of more general history dependence through the incorporation of subject-specific frailties common to all transitions. Of course if the variability of this frailty is small the state-dependence accommodated in the sample size calculations will be close to adequate and the resultant sample sizes will be reasonable. If this variance is large, a stronger state dependence exists and the model is more seriously misspecified; poor frequency properties of the design and analysis will then result.
5.3 An Extension to Time Non-homogeneous Transition Intensities
For a time homogeneous Markov process considered in our derivation, q_{kl}(t|v)=q_{kl}(v) and p_{kl}(s,t|v)=p_{kl}(t−s|v) for l=k+1 or D, k=0,1,…,J. The transition probability matrix is then \({P}(t|v)=\exp\{Q(v)t\}=\sum_{x=0}^{\infty}[{Q}(v)t]^{x}/x!\) which can be evaluated through Jordan decomposition.
Empirical properties of design based on time nonhomogeneous Markov process with ς=1.2: sample sizes and empirical rejection rates for tests of non-inferiority for recurrent and terminal events; β_{0}=θ_{0}=0, β_{A}=log(0.50), θ_{A}=log(0.5) and δ_{0}=0.50; %REJ_{0} and %REJ_{A} are the empirical type I error rate (nominal level 2.5 %) and empirical power (nominal level 80 %) respectively
Endpoint^{†} | Setting^{‡} | 1−δ_{A}=0.9: m | 1−δ_{A}=0.9: %REJ_{0} | 1−δ_{A}=0.9: %REJ_{A} | 1−δ_{A}=1.0: m | 1−δ_{A}=1.0: %REJ_{0} | 1−δ_{A}=1.0: %REJ_{A} |
---|---|---|---|---|---|---|---|
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 657 | 2.70 | 81.35 | |||
Recurrent | θ=θ_{A} | 648 | 2.60 | 80.85 | 647 | 3.10 | 81.10 |
Death | β=β_{0} | 1469 | 2.40 | 81.75 | 985 | 2.40 | 83.75 |
Death | β=β_{A} | 1513 | 1.95 | 82.30 | 1016 | 2.80 | 82.60 |
ψ_{θ}=1.0, ψ_{β}=1.0, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 452 | 3.05 | 82.85 | |||
Recurrent | θ=θ_{A} | 446 | 2.75 | 83.45 | 445 | 2.60 | 83.55 |
Death | β=β_{A} | 1526 | 2.60 | 82.65 | 1024 | 2.35 | 82.85 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=0.9 | |||||||
Recurrent | θ=θ_{0} | 647 | 2.45 | 81.15 | |||
Recurrent | θ=θ_{A} | 639 | 2.35 | 81.70 | 637 | 2.05 | 80.35 |
Death | β=β_{0} | 1494 | 2.50 | 82.10 | 1002 | 2.70 | 82.60 |
Death | β=β_{A} | 1538 | 2.95 | 82.85 | 1032 | 3.20 | 83.80 |
ψ_{θ}=1.0, ψ_{β}=1.1, 1−δ_{A}=1.0 | |||||||
Recurrent | θ=θ_{0} | 446 | 2.50 | 82.85 | |||
Recurrent | θ=θ_{A} | 441 | 2.15 | 83.50 | 440 | 2.35 | 83.60 |
Death | β=β_{A} | 1551 | 2.80 | 82.05 | 1041 | 2.45 | 81.85 |
6 Trial Design in Cancer Metastatic to Bone
Hortobagyi et al. [20] report on the effectiveness of the bisphosphonate pamidronate for the prevention of skeletal related events in breast cancer patients with skeletal metastases. Here we report on analyses of these data to furnish information helpful for the design of a future study planned to have one year duration.
Scenario I: Consider planning a future study that aims to demonstrate that a new treatment is superior with respect to both the occurrence of skeletal complications and mortality. We suppose that the overall type I error rate is 5 % and that two two-sided tests are to be conducted, so that a Bonferroni adjustment yields a 2.5 % type I error rate for each hypothesis. Suppose 90 % power is required to detect a 20 % reduction (β_{A}=log0.80) in the risk of recurrent events and a 10 % reduction in mortality (θ_{A}=log0.90). We find minimum sample sizes of 700 and 707 individuals, respectively.
Scenario II: Suppose a non-inferiority design is of interest and we have margins of 50 % for both the recurrent events and death. Suppose the type I error rate for each test is controlled at 2.5 % and 80 % power is desired for each test. Suppose the true effect of treatment corresponds to a 20 % loss of the effect of the active control on survival and a 10 % loss of effect on the recurrent event outcome. To ensure 80 % power to claim non-inferiority for the survival endpoint, 9052 individuals will be required, and 8506 individuals will be required for the recurrent event outcome.
7 Discussion
This article has provided design criteria for randomized trials with the objective of comparing two treatment groups with respect to the incidence of recurrent events and a terminal event. The motivating setting involves the palliative treatment of cancer patients with skeletal metastases who are at risk of both skeletal related events and death. Recurrent and terminal events arise in many other settings in medical research, including transplant studies in which recipients may experience transient graft rejection episodes and total graft rejection [24]. In trials designed to investigate the effect of treatment for advanced chronic obstructive pulmonary disease, patients are at risk of recurrent exacerbations and death [25].
The multistate framework adopted is appealing for modeling such processes because it structurally incorporates the terminal events as an absorbing state [22]. This is in contrast to many joint models which incorporate an association between recurrent and terminal events through shared or correlated random effects arising from parametric models. The proposed analysis represents a compromise between use of intensity-based models reliant on full model specification and marginal models. The proposed recurrent event model is in line with the Prentice et al. [14] approach in which the baseline intensity is stratified on the cumulative number of events but has the added implicit condition that subjects must be alive to contribute to the risk set; they are sometimes called “partially” conditional models. The terminal event state therefore enters in the asymptotic calculations by reducing the expected size of the risk sets.
The Nelson–Aalen estimates of the cumulative transition intensities and the Aalen–Johansen estimates of the transition probability functions, which are obtained under a Markov assumption, are robust in the sense that they remain consistent for non-Markov processes under independent censoring [27, 28]. This is not true for the estimates of treatment effect in multiplicative intensity-based models, where there is greater reliance on the model assumptions for valid interpretation of covariate effects. It would be of interest to study the performance of the separate and joint tests of treatment effect in this setting, which involve no conditioning on the event history [29].
Between-subject variation in the risk of events routinely arises in recurrent event datasets, and mixed Poisson models are often adopted since they account for this heterogeneity. The marginal intensity of mixed Poisson processes features a sudden change in risk following event occurrence [21]. This feature is present in the proposed multistate framework but the change in risk is not transient. Boher and Cook [30] showed empirically that the multistate analysis based on the Prentice et al. [14] formulation retains good control of the type I error rate even with naive (i.e. non-robust) variance estimation, so the multistate partially conditional analysis offers some protection against heterogeneity.
Mixed models have also been proposed by several authors for modeling the association between the recurrent and terminal events through correlated or shared random effects [31–33]. Likelihood and semiparametric methods based on estimating functions can be used for analysis of a dataset, but parametric assumptions could be made to derive required sample sizes. We prefer the multistate framework, however, since the terminal nature of death is reflected in its designation as an absorbing state. Moreover, with the multistate analysis in which we adopt time-dependent stratification on the cumulative number of events, our sample size formula is directly relevant for analyses of recurrent events in the absence of mortality based on the so-called Prentice–Williams–Peterson approach [14]. While the multistate framework requires that more parameters be specified, the multiplicative increase in risk with event occurrence is seen in a diverse range of datasets and offers some degree of parsimony.
We have restricted attention to settings where the event times are at most right censored. Frequently recurrent events are not observed directly but are only detectable under careful examination in a clinic. Studies aiming to prevent the occurrence of skeletal metastases involve quarterly examinations of patients at which bone scans are conducted to assess whether new metastases have developed. The same multistate model can be used to characterize the incidence of skeletal metastases and death, but the onset times of the metastases become interval-censored. If the Markov framework remains appropriate, the methods of Kalbfleisch and Lawless [34] may be employed with the multistate model package msm in R/Splus. Sample size calculations must be suitably modified and this is a topic of ongoing research.
Acknowledgements
This research was supported by the Natural Sciences and Engineering Research Council of Canada (LW and RJC) and the Canadian Institutes for Health Research (RJC). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Jerry Lawless for helpful discussions, the editorial reviewers for helpful comments, and Novartis Pharmaceuticals for permission to use the data from the clinical trial.