Appendix A: Limiting distribution of the clustered rank statistic under \(H_1\)
For subunit j in cluster i that is randomized to arm k, let \(M_{ikj}(t)=N_{ikj}(t)-\int _0^t Y_{ikj}(s)d\Lambda _k(s)\) and \(M_{ik}(t)=\sum _{j=1}^{m_{ik}}M_{ikj}(t)\). By the definition of W,
$$\begin{aligned} W=&\sqrt{n}\{\sum _{i=1}^{n} \int _0^\infty \frac{H(t)}{Y_1(t)}dM_{i1}(t) -\sum _{i=1}^{n}\int _0^\infty \frac{H(t)}{Y_2(t)}dM_{i2}(t)\} \\&\quad + \sqrt{n}\int _0^\infty H(t)\{d\Lambda _1(t)-d\Lambda _2(t)\} \end{aligned}$$
Let \(\tau =\max \{t:S_1(t)S_2(t)G(t)>0\}\). Usually the upper limit of the support of the survival distributions exceeds the study period, which is the upper limit of the support of the censoring distribution, so that \(\tau \) equals the study period. For the log-rank statistic, as \(n\rightarrow \infty \), \(n^{-1}Y_k(t)\) and H(t) converge uniformly to \(y_k(t)={\bar{m}} p_kS_k(t)G(t)\) and
$$\begin{aligned} h(t)=\frac{{\bar{m}} p_1p_2S_1(t)S_2(t)G(t)}{p_1S_1(t)+p_2S_2(t)} \end{aligned}$$
in \([0,\tau ]\), respectively, so that we have
$$\begin{aligned} W=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\epsilon _{i} +\sqrt{n}\int _0^\infty h(t)\{d\Lambda _1(t)-d\Lambda _2(t)\}+o_p(1), \end{aligned}$$
where \(\epsilon _{i}=\epsilon _{i1} - \epsilon _{i2}\), \(\epsilon _{ik} = \sum _{j=1}^{m_{ik}}\epsilon _{ikj}\), and \(\epsilon _{ikj}=\int _0^\infty y_k(t)^{-1}h(t)dM_{ikj}(t)\).
Since \(\{\epsilon _{i}, i=1,...,n\}\) are independent random variables with mean 0, by the central limit theorem, W is approximately normal with mean \(\sqrt{n}\omega \), where \(\omega =\int _0^\infty h(t) \{d\Lambda _1(t)-d\Lambda _2(t)\}\), and variance \(\sigma ^2=\sigma _1+\sigma _2-2\sigma _{12}\), where
$$\begin{aligned} \sigma _k={\bar{m}}p_k(\sigma _{k1}^2-c_{k}) + \bar{\bar{m}}p_k^2c_{k} \\ \sigma _{k1}^2=\text{ var }(\epsilon _{ikj})=\int _0^\infty \frac{h^2(t)}{y_k(t)}d\Lambda _k(t) \end{aligned}$$
and
$$\begin{aligned} c_k=\text{ cov }(\epsilon _{ikj},\epsilon _{ikj'}) =\int _0^\infty \int _0^\infty \frac{h(t_1)h(t_2)}{y_k(t_1)y_k(t_2)}E\{dM_{ikj}(t_1)dM_{ikj'}(t_2)\} \end{aligned}$$
We can derive \(c_k\) directly. For \(j\ne j'\), by definition,
$$\begin{aligned}&dM_{ikj}(t_1)dM_{ikj'}(t_2) \\&\quad = dN_{ikj}(t_1)dN_{ikj'}(t_2) - Y_{ikj}(t_1)\lambda _k(t_1)dt_1dN_{ikj'}(t_2) \\&\qquad - Y_{ikj'}(t_2)\lambda _k(t_2)dt_2dN_{ikj}(t_1) + Y_{ikj}(t_1)\lambda _k(t_1)Y_{ikj'}(t_2)\lambda _k(t_2)dt_1dt_2 \end{aligned}$$
By similar arguments to those in the lemma of Jung (2008), we have
$$\begin{aligned}&E\{dN_{ikj}(t_1)dN_{ikj'}(t_2)\} \\&\quad = P(t_1\le T_{ikj}<t_1+dt_1, t_2\le T_{ikj'}<t_2+dt_2, \delta _{ikj}=1, \delta _{ikj'}=1) \\&\quad = y_k(t_1,t_2)\times \frac{P(t_1\le T_{ikj}<t_1+dt_1, t_2\le T_{ikj'}<t_2+dt_2, \delta _{ikj}=1, \delta _{ikj'}=1)}{y_k(t_1,t_2)}\\&\quad = y_k(t_1,t_2)\lambda _k(t_1,t_2)dt_1dt_2 \end{aligned}$$
where \(y_k(t_1,t_2) = E\{Y_{ikj}(t_1)Y_{ikj'}(t_2)\}=G(t_1,t_2)S_k(t_1,t_2)\). We can also derive
$$\begin{aligned}&E\{Y_{ikj}(t_1)\lambda _k(t_1)dt_1dN_{ikj'}(t_2)\}= y_k(t_1,t_2)\lambda _{k(2|1)}(t_1,t_2)\lambda _k(t_1)dt_1dt_2 \\&E\{Y_{ikj'}(t_2)\lambda _k(t_2)dt_2dN_{ikj}(t_1)\}= y_k(t_1,t_2)\lambda _{k(1|2)}(t_1,t_2)\lambda _k(t_2)dt_1dt_2 \end{aligned}$$
and
$$\begin{aligned}&E\{Y_{ikj}(t_1)\lambda _k(t_1)Y_{ikj'}(t_2)\lambda _k(t_2)dt_1dt_2\} = y_k(t_1,t_2)\lambda _k(t_1)\lambda _k(t_2)dt_1dt_2 \end{aligned}$$
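Therefore, collecting the four terms and writing \(dA_k(t_1,t_2)=\{\lambda _k(t_1,t_2)-\lambda _{k(2|1)}(t_1,t_2)\lambda _k(t_1)-\lambda _{k(1|2)}(t_1,t_2)\lambda _k(t_2)+\lambda _k(t_1)\lambda _k(t_2)\}dt_1dt_2\), we have
$$\begin{aligned} c_k = \int _0^\infty \int _0^\infty \frac{h(t_1)h(t_2)}{y_k(t_1)y_k(t_2)}y_k(t_1,t_2)dA_k(t_1,t_2) \end{aligned}$$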
Similarly we have
$$\begin{aligned} \sigma _{12}&=\text{ cov }(\epsilon _{i1j},\epsilon _{i2j'})\\&=\int _0^\infty \int _0^\infty \frac{h(t_1)h(t_2)}{y_1(t_1)y_2(t_2)}E\{dM_{i1j}(t_1)dM_{i2j'}(t_2)\}\\&=\int _0^\infty \int _0^\infty \frac{h(t_1)h(t_2)}{y_1(t_1)y_2(t_2)}y(t_1,t_2)dA_{12}(t_1,t_2) \end{aligned}$$
where \(y(t_1,t_2) = E\{Y_{i1j}(t_1)Y_{i2j'}(t_2)\}=G(t_1,t_2)S_{12}(t_1,t_2)\).
On the other hand, by definition,
$$\begin{aligned} \displaystyle {\hat{\sigma }}^2&=\frac{1}{n}\sum _{i=1}^n\left[ \int _0^\infty \frac{H(t)}{Y_1(t)} dM_{i1}(t) - \int _0^\infty \frac{H(t)}{Y_2(t)} dM_{i2}(t) \right. \\&\left. \quad +\int _0^\infty \frac{H(t)}{Y_1(t)}Y_{i1}(t)\{d\Lambda _1(t)-d{\hat{\Lambda }}(t)\} - \int _0^\infty \frac{H(t)}{Y_2(t)}Y_{i2}(t)\{d\Lambda _2(t)-d{\hat{\Lambda }}(t)\}\right] ^2 \end{aligned}$$
By the uniform convergence of \(n^{-1}Y_k(t)\) and \(Y_k(t)^{-1}dN_k(t)\) to \(y_k(t)\) and \(d\Lambda _k(t)\), respectively, \(d{\hat{\Lambda }}(t)\) uniformly converges to \(\{y_1(t)d\Lambda _1(t)+y_2(t)d\Lambda _2(t)\}/\{y_1(t)+y_2(t)\}\) in \([0,\tau ]\). Hence, we have
$$\begin{aligned} {\hat{\sigma }}^2=\frac{1}{n}\sum _{i=1}^n(\epsilon _{i}+\xi _{i})^2 +o_p(1) \end{aligned}$$
Here,
$$\begin{aligned}&\xi _{i}=\int _0^{\infty } \frac{h(t)}{\{y_1(t)+y_2(t)\}}\{Y_{i1}(t)\frac{y_2(t)}{y_1(t)} \\&\qquad + Y_{i2}(t)\frac{y_1(t)}{y_2(t)}\} \{d\Lambda _1(t)-d\Lambda _2(t)\} \end{aligned}$$
are negligible under a nearby alternative hypothesis. Therefore, \({\hat{\sigma }}^2=\frac{1}{n}\sum _{i=1}^n\epsilon _{i}^2 +o_p(1)\) converges to \(\sigma ^2\).
Appendix B: A simplified sample size formula under the nearby alternative hypothesis
We consider a proportional hazards model, \(\Delta =\lambda _1(t)/\lambda _2(t)\), and simplify the sample size formula under the nearby alternative hypothesis. Suppose \(S_1(t_1,t_2)\) and \(S_2(t_1,t_2)\) are both approximated by a common \(S(t_1,t_2)\). Under this assumption, we have \(\log \Delta \approx \Delta -1\) by the Taylor expansion and
$$\begin{aligned} \omega = (\Delta -1) \int _0^\infty S(t)G(t)d\Lambda (t) \approx (\log \Delta ) d \end{aligned}$$
where \(d=-\int _0^\infty G(t)dS(t)=P(T_{ij}<C_{ij})\) denotes the probability that a subunit experiences an event. Furthermore,
$$\begin{aligned}&\sigma _{k}^2 = p_{3-k}^2\int _0^\infty S(t)G(t)d\Lambda (t)=p_{3-k}^2d \\&c_k = p^2_{3-k} \int _0^\infty \int _0^\infty S(t_1,t_2)G(t_1, t_2)dA(t_1,t_2) \end{aligned}$$
and
$$\begin{aligned}&dA(t_1,t_2) = \{\lambda (t_1,t_2) - \lambda _{(1|2)}(t_1,t_2)\lambda (t_2) - \lambda _{(2|1)}(t_2,t_1)\lambda (t_1) + \lambda (t_1)\lambda (t_2)\}dt_1dt_2 \end{aligned}$$
Let \(c_w= \int _0^\infty \int _0^\infty S(t_1,t_2)G(t_1, t_2)dA(t_1,t_2)\) and \(c_b= \int _0^\infty \int _0^\infty S_{12}(t_1,t_2)G(t_1, t_2)dA_{12}(t_1,t_2)\). Then, we have
$$\begin{aligned} \sigma ^2=p_1p_2{\bar{m}} d\{1+(2p_1p_2\bar{\bar{m}}/{\bar{m}} -1)\rho _w - 2p_1p_2\rho _b\bar{\bar{m}}/{\bar{m}}\} \end{aligned}$$
where \(\rho _w=c_w/d\) and \(\rho _b=c_b/d\). Hence, under the nearby alternative hypothesis, (4) is expressed as
$$\begin{aligned} n = \frac{(z_{1-\alpha /2} + z_{1-\beta })^2}{{\bar{m}}dp_1p_2 (\log \Delta )^2}\text{ DE } \end{aligned}$$
where \(\text{ DE }=1+(2p_1p_2\bar{\bar{m}}/{\bar{m}} -1)\rho _w - 2p_1p_2\rho _b\bar{\bar{m}}/{\bar{m}}\).
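As a minimal sketch of how the simplified formula above might be evaluated, the following Python code computes DE and the number of clusters n. The function and argument names are hypothetical; `m_bar` and `m_bar2` stand for \({\bar{m}}\) and \(\bar{\bar{m}}\) (the first and second moments of the cluster size), and the numerical inputs in the example are illustrative only, not values from the paper.

```python
from math import log
from statistics import NormalDist


def design_effect(rho_w, rho_b, m_bar, m_bar2, p1):
    """DE = 1 + (2 p1 p2 m_bar2/m_bar - 1) rho_w - 2 p1 p2 rho_b m_bar2/m_bar."""
    p2 = 1.0 - p1
    ratio = m_bar2 / m_bar  # second moment of cluster size over its mean
    return 1.0 + (2.0 * p1 * p2 * ratio - 1.0) * rho_w - 2.0 * p1 * p2 * rho_b * ratio


def n_clusters(hazard_ratio, d, rho_w, rho_b, m_bar, m_bar2,
               p1=0.5, alpha=0.05, power=0.9):
    """Number of clusters from the simplified formula (not rounded up)."""
    p2 = 1.0 - p1
    std_normal = NormalDist()
    # z_{1-alpha/2} + z_{1-beta}, with power = 1 - beta
    z_sum = std_normal.inv_cdf(1.0 - alpha / 2.0) + std_normal.inv_cdf(power)
    base = z_sum ** 2 / (m_bar * d * p1 * p2 * log(hazard_ratio) ** 2)
    return base * design_effect(rho_w, rho_b, m_bar, m_bar2, p1)


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not taken from the paper).
    print(n_clusters(hazard_ratio=0.7, d=0.6, rho_w=0.05, rho_b=0.04,
                     m_bar=5.0, m_bar2=30.0))
```

One special case is easy to check by hand: with every cluster of size 2 (\({\bar{m}}=2\), \(\bar{\bar{m}}=4\)) and \(p_1=p_2=1/2\), the coefficient of \(\rho _w\) vanishes and DE reduces to \(1-\rho _b\).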
Appendix C: Calculation of parameters under practical settings given in Sect. 3.3
Under the assumption of common censoring within each cluster, we have \(G(t_1,t_2)=G(t_1\vee t_2)\). Further, with uniform accrual over an accrual period of length a followed by an additional follow-up period of length b, we have
$$\begin{aligned} G(t) = {I}(t<a+b) - \frac{t-b}{a}{I}(b\le t<a+b) \end{aligned}$$
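The censoring survival function above can be sketched in Python and combined with an exponential margin (the marginal distribution assumed in this appendix) to evaluate the event probability \(d=-\int _0^\infty G(t)dS(t)\) of Appendix B. The function names are hypothetical. The closed form used as a cross-check is not stated in the text; it is derived here from the fact that the censoring time is uniform on \([b,a+b]\), so that \(d=1-E(e^{-\lambda C})\).

```python
from math import exp


def censoring_survival(t, a, b):
    """G(t) = I(t < a+b) - ((t-b)/a) I(b <= t < a+b), for a > 0."""
    if t < b:
        return 1.0
    if t < a + b:
        return 1.0 - (t - b) / a
    return 0.0


def event_probability(lam, a, b, steps=20000):
    """d = integral over [0, a+b] of G(t) * lam * exp(-lam t), midpoint rule."""
    width = (a + b) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += censoring_survival(t, a, b) * lam * exp(-lam * t)
    return total * width


def event_probability_closed_form(lam, a, b):
    """Cross-check derived here: d = 1 - {exp(-lam b) - exp(-lam (a+b))} / (lam a)."""
    return 1.0 - (exp(-lam * b) - exp(-lam * (a + b))) / (lam * a)


if __name__ == "__main__":
    # Illustrative inputs only: hazard 0.2, accrual 2, follow-up 3.
    print(event_probability(0.2, 2.0, 3.0))
    print(event_probability_closed_form(0.2, 2.0, 3.0))
```

The numerical integral is kept alongside the closed form because the same quadrature loop carries over to margins for which no closed form is available.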
We assume Gumbel’s copula and exponential marginal distributions with hazard rate \(\lambda _k\). Using the same notation as in Sect. 3.3, the within-treatment-group joint survival function, joint density, and partial derivative are
$$\begin{aligned} S_k(t_1,t_2)&= \exp \left[ -\left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{\theta _w}\right] \\ f_k(t_1,t_2)&= \lambda _k^2S_k(t_1,t_2)(\lambda _kt_1)^{1/\theta _w-1}(\lambda _kt_2)^{1/\theta _w-1}\left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{2\theta _w-2} \\&\quad \times \left[ 1+\left( \frac{1}{\theta _w}-1\right) \left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{-\theta _w}\right] \\ \frac{\partial S_k(t_1,t_2)}{\partial t_1}&= -\lambda _k S_k(t_1,t_2)(\lambda _kt_1)^{1/\theta _w-1}\left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{\theta _w-1} \end{aligned}$$
Hence, we have
$$\begin{aligned} \lambda _k(t_1,t_2)&= \lambda _k^2(\lambda _kt_1)^{1/\theta _w-1}(\lambda _kt_2)^{1/\theta _w-1}\left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{2\theta _w-2} \\&\quad \times \left[ 1+\left( \frac{1}{\theta _w}-1\right) \left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{-\theta _w}\right] \\ \lambda _{k(1|2)}(t_1,t_2)&= \lambda _k(\lambda _kt_1)^{1/\theta _w-1}\left\{ (\lambda _kt_1)^{1/\theta _w}+(\lambda _kt_2)^{1/\theta _w}\right\} ^{\theta _w-1} \end{aligned}$$
Similarly for the inter-arm distributions, we have
$$\begin{aligned} S_{12}(t_1,t_2)&= \exp \left[ -\left\{ (\lambda _1t_1)^{1/\theta _b}+(\lambda _2t_2)^{1/\theta _b}\right\} ^{\theta _b}\right] \\ \lambda _{12}(t_1,t_2)&= \lambda _1\lambda _2(\lambda _1t_1)^{1/\theta _b-1}(\lambda _2t_2)^{1/\theta _b-1}\left\{ (\lambda _1t_1)^{1/\theta _b}+(\lambda _2t_2)^{1/\theta _b}\right\} ^{2\theta _b-2}\\&\quad \times \left[ 1+\left( \frac{1}{\theta _b}-1\right) \left\{ (\lambda _1t_1)^{1/\theta _b}+(\lambda _2t_2)^{1/\theta _b}\right\} ^{-\theta _b}\right] \\ \lambda _{12(1|2)}(t_1,t_2)&= \lambda _1(\lambda _1t_1)^{1/\theta _b-1}\left\{ (\lambda _1t_1)^{1/\theta _b}+(\lambda _2t_2)^{1/\theta _b}\right\} ^{\theta _b-1} \end{aligned}$$
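The Gumbel-copula expressions above lend themselves to a numerical sanity check. The sketch below, with hypothetical function names, codes the joint survival function, the joint hazard, and the conditional hazard for general rates `lam1`, `lam2` and dependence parameter `theta`. Setting `lam1 == lam2` and `theta` to \(\theta _w\) gives the within-arm case, while distinct rates with \(\theta _b\) give the inter-arm case. The finite-difference helpers are only a check that the hazards equal \(\{\partial ^2 S/\partial t_1\partial t_2\}/S\) and \(-\{\partial S/\partial t_1\}/S\); they are not part of the derivation. The code assumes \(t_1,t_2>0\) and \(0<\theta \le 1\).

```python
from math import exp


def _u(t1, t2, lam1, lam2, theta):
    """Inner sum (lam1 t1)^(1/theta) + (lam2 t2)^(1/theta)."""
    return (lam1 * t1) ** (1.0 / theta) + (lam2 * t2) ** (1.0 / theta)


def gumbel_survival(t1, t2, lam1, lam2, theta):
    """S(t1, t2) = exp[-{(lam1 t1)^(1/theta) + (lam2 t2)^(1/theta)}^theta]."""
    return exp(-(_u(t1, t2, lam1, lam2, theta) ** theta))


def joint_hazard(t1, t2, lam1, lam2, theta):
    """lambda(t1, t2): joint density divided by joint survival."""
    e = 1.0 / theta - 1.0
    u = _u(t1, t2, lam1, lam2, theta)
    return (lam1 * lam2 * (lam1 * t1) ** e * (lam2 * t2) ** e
            * u ** (2.0 * theta - 2.0) * (1.0 + e * u ** (-theta)))


def conditional_hazard_1(t1, t2, lam1, lam2, theta):
    """lambda_(1|2)(t1, t2): minus dS/dt1 divided by S."""
    e = 1.0 / theta - 1.0
    u = _u(t1, t2, lam1, lam2, theta)
    return lam1 * (lam1 * t1) ** e * u ** (theta - 1.0)


def mixed_partial(f, t1, t2, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dt1 dt2)."""
    return (f(t1 + h, t2 + h) - f(t1 + h, t2 - h)
            - f(t1 - h, t2 + h) + f(t1 - h, t2 - h)) / (4.0 * h * h)


def partial_1(f, t1, t2, h=1e-4):
    """Central finite-difference estimate of df / dt1."""
    return (f(t1 + h, t2) - f(t1 - h, t2)) / (2.0 * h)


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not from the paper).
    lam1, lam2, theta, t1, t2 = 0.10, 0.15, 0.5, 2.0, 3.0
    surv = lambda x, y: gumbel_survival(x, y, lam1, lam2, theta)
    print(joint_hazard(t1, t2, lam1, lam2, theta),
          mixed_partial(surv, t1, t2) / surv(t1, t2))
    print(conditional_hazard_1(t1, t2, lam1, lam2, theta),
          -partial_1(surv, t1, t2) / surv(t1, t2))
```

With `theta = 1` the copula reduces to independence, so the joint hazard equals `lam1 * lam2` and the conditional hazard equals `lam1`, which gives a quick way to catch transcription errors in the exponents.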
In addition, using the formulas given in Sect. 3.1, we have
$$\begin{aligned} \omega&= (\lambda _1-\lambda _2)\\&\quad \times \left\{ \int _0^{a+b} \frac{e^{-(\lambda _1-\lambda _2)t}}{(p_1e^{-\lambda _1t} + p_2e^{-\lambda _2t})^2}dt - \frac{1}{a}\int _b^{a+b} \frac{(t-b)e^{-(\lambda _1-\lambda _2)t}}{(p_1e^{-\lambda _1t} + p_2e^{-\lambda _2t})^2}dt \right\} \\ \sigma ^2_{k}&= p_{3-k}^2\lambda _k\\&\quad \times \left\{ \int _0^{a+b} \frac{e^{-(\lambda _k+2\lambda _{3-k})t}}{(p_1e^{-\lambda _1t} + p_2e^{-\lambda _2t})^2}dt - \frac{1}{a}\int _b^{a+b} \frac{(t-b)e^{-(\lambda _k+2\lambda _{3-k})t}}{(p_1e^{-\lambda _1t} + p_2e^{-\lambda _2t})^2}dt \right\} \end{aligned}$$
Appendix D: Relationship between sample sizes of cluster randomization study and subunit randomization study
For CRTs with a time-to-event endpoint, Li and Jung (2020) proposed calculating the required total number of clusters \(n_c\) as
$$\begin{aligned} n_c(\rho _w,{\bar{m}}, \bar{\bar{m}},p_1^c) = \frac{(z_{1-\alpha /2} + z_{1-\beta })^2}{{\bar{m}}dp^c_1p^c_2(\log \Delta )^2}\text{ IF } \end{aligned}$$
where IF is the corresponding inflation factor, which for equal allocation equals \(1+(\bar{\bar{m}}/{\bar{m}}-1)\rho _w\), as in the expression for \(n_c(\rho _w,{\bar{m}}, \bar{\bar{m}},1/2)\) in this appendix.
Subunit randomization and cluster randomization are equivalent in some special cases. First, an equally allocated SRT with sample size \(n_s\) and mean cluster size \({\bar{m}}\) is, if the inter-treatment ICC \(\rho _b = 0\), equivalent to an equally allocated CRT with a total of \(2n_s\) clusters and mean cluster size \({\bar{m}}/2\). Since \(E\{(m_i/2)^2\} = E(m_i^2)/4 = \bar{\bar{m}}/4\), this indicates that
$$\begin{aligned} 2n_s(\rho _w, 0,{\bar{m}}, \bar{\bar{m}},1/2) = n_c(\rho _w,{\bar{m}}/2, \bar{\bar{m}}/4,1/2) \end{aligned}$$
In addition, for equally allocated CRTs, we have
$$\begin{aligned} n_c(\rho _w,{\bar{m}}, \bar{\bar{m}},1/2)&= \frac{4(z_{1-\alpha /2} + z_{1-\beta })^2}{{\bar{m}}d(\log \Delta )^2}\{1+(\frac{\bar{\bar{m}}}{{\bar{m}}}-1)\rho _w\} \\&= \frac{1}{2} \times \frac{4(z_{1-\alpha /2} + z_{1-\beta })^2}{\frac{{\bar{m}}}{2}d(\log \Delta )^2}\{1+(\frac{\bar{\bar{m}}/4}{{\bar{m}}/2}-1)\rho _w + \frac{\bar{\bar{m}}}{2{\bar{m}}}\rho _w\}\\&\ge \frac{1}{2} n_c(\rho _w,{\bar{m}}/2, \bar{\bar{m}}/4,1/2)\\&\ge n_s(\rho _w, \rho _b,{\bar{m}}, \bar{\bar{m}},1/2) \end{aligned}$$
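The first inequality holds provided \(\rho _w\ge 0\), because dropping the term \(\frac{\bar{\bar{m}}}{2{\bar{m}}}\rho _w\) cannot increase the expression. The last inequality follows from the previous equation and the fact that \(n_s(\rho _w, \rho _b,{\bar{m}}, \bar{\bar{m}},p_1)\le n_s(\rho _w, 0,{\bar{m}}, \bar{\bar{m}},p_1)\) whenever \(\rho _b\ge 0\), since DE is decreasing in \(\rho _b\).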
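The relations in this appendix can be checked numerically. The sketch below uses hypothetical function names and absorbs the common factor \((z_{1-\alpha /2}+z_{1-\beta })^2/\{d(\log \Delta )^2\}\) into a constant `k`. It takes \(n_s\) from the simplified formula of Appendix B and, as an assumption read off the equal-allocation display above, uses \(\text{IF}=1+(\bar{\bar{m}}/{\bar{m}}-1)\rho _w\) for \(n_c\). It is only exercised here at \(p_1=1/2\).

```python
def n_s(rho_w, rho_b, m_bar, m_bar2, p1, k=1.0):
    """Clusters for subunit randomization, up to the common constant k."""
    p2 = 1.0 - p1
    ratio = m_bar2 / m_bar
    de = 1.0 + (2.0 * p1 * p2 * ratio - 1.0) * rho_w - 2.0 * p1 * p2 * rho_b * ratio
    return k * de / (m_bar * p1 * p2)


def n_c(rho_w, m_bar, m_bar2, p1, k=1.0):
    """Clusters for cluster randomization, up to the common constant k.

    Assumes IF = 1 + (m_bar2/m_bar - 1) rho_w, the equal-allocation form.
    """
    p2 = 1.0 - p1
    inflation = 1.0 + (m_bar2 / m_bar - 1.0) * rho_w
    return k * inflation / (m_bar * p1 * p2)


def check_relations(rho_w, rho_b, m_bar, m_bar2, tol=1e-12):
    """Return (equivalence holds, CRT needs at least as many clusters)."""
    equivalence = abs(2.0 * n_s(rho_w, 0.0, m_bar, m_bar2, 0.5)
                      - n_c(rho_w, m_bar / 2.0, m_bar2 / 4.0, 0.5)) < tol
    ordering = (n_c(rho_w, m_bar, m_bar2, 0.5)
                >= n_s(rho_w, rho_b, m_bar, m_bar2, 0.5) - tol)
    return equivalence, ordering


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not from the paper).
    print(check_relations(rho_w=0.05, rho_b=0.03, m_bar=6.0, m_bar2=40.0))
```

Under these definitions the difference \(n_c-n_s\) at equal allocation is proportional to \((\rho _w+\rho _b)\bar{\bar{m}}/(2{\bar{m}})\), so the ordering can only fail if the ICCs are allowed to be negative.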