A Simulation Study of Statistical Approaches to Data Analysis in the Stepped Wedge Design


Abstract

This paper studies model-based and design-based approaches for the analysis of data arising from a stepped wedge randomized design. Specifically, for different scenarios we compare robustness, efficiency, Type I error rate under the null hypothesis, and power under the alternative hypothesis for the leading analytical options including generalized estimating equations (GEE) and linear mixed model (LMM)-based approaches. We find that GEE models with exchangeable correlation structures are more efficient than GEE models with independent correlation structures under all scenarios considered. The model-based GEE Type I error rate can be inflated when applied with a small number of clusters, but this problem can be solved using a design-based approach. As expected, correct model specification is more important for LMM (compared to GEE) since the model is assumed correct when standard errors are calculated. However, in contrast to the model-based results, the design-based Type I error rates for LMM models under scenarios with a random treatment effect show Type I error inflation even though the fitted models perfectly match the corresponding data-generating scenarios. Therefore, greater robustness can be realized by combining GEE and permutation testing strategies.

Introduction

A stepped wedge cluster randomized trial design is a type of one-way crossover design in which each cluster starts under a reference or control condition and then crosses over to a treatment condition at a randomly determined time point [6]. By the final study time period, all clusters have crossed over and receive the treatment. The unique control-to-treatment crossover patterns are referred to as “sequences” (e.g., in Fig. 1, the stepped wedge design has 4 sequences). In contrast, in a parallel cluster randomized design, half of the clusters are (usually) randomly assigned to the intervention and half to the control at the beginning of the trial, with no planned crossover. A stepped wedge design also differs from a cluster randomized crossover design, in which each cluster is randomly assigned to cross over from control to treatment or treatment to control (possibly more than once). In both crossover and stepped wedge trials, a washout period may be included between intervention and control periods to ensure that one condition does not affect the other, or to allow individuals enrolled under one condition to complete their intervention before their cluster changes conditions. Figure 1 illustrates the settings for a traditional crossover design, a parallel design, and a stepped wedge design [6].

Fig. 1

Settings for traditional crossover and stepped wedge designs. “X” represents a treatment; “O” represents control

Stepped wedge cluster randomized trials have become increasingly popular in recent years for a number of reasons [13]. For example, in the field of HIV prevention and treatment, as governments and public health agencies have begun to focus on effective implementation of proven interventions, stepped wedge designed studies are often used during program roll-out to assess real-world effectiveness. In Killiam et al. [10], for instance, a stepped wedge design was used to evaluate whether integrating antiretroviral therapy (ART) into antenatal care clinics increased the proportion of HIV-infected pregnant women initiating ART during pregnancy, compared to the standard approach of referral for ART.

Another reason to consider a stepped wedge design is that it may be logistically or financially impossible to provide the intervention to all participants at once due to resource limitations or geographical constraints [1, 9, 13]. In this case, the stepped wedge design is feasible because only a small fraction of the clusters are required to initiate the intervention at each time point. Also, the stepped wedge design is useful when it is not ethical or practical to withhold or withdraw treatment, but logistical constraints prevent immediate provision of the intervention, since all participants are able to receive the intervention eventually [14]. For example, in the field of sexually transmitted infections (STI) prevention, Golden et al. [3] used a stepped wedge design to assess the impact of an intervention to reduce STI burden as the program was implemented across Washington state. Finally, the longitudinal nature of the stepped wedge design allows one to study changes in the effectiveness of the intervention over time by modeling the effects of time [7, 19].

Despite the increased adoption of stepped wedge designs, there are a number of important analytical issues that need additional careful study in order to provide practical recommendations. Key issues include the impact of a small number of clusters, and robustness to model assumptions such as additional sources of variation due to heterogeneity in time effects or in treatment effects. We are interested in the performance of marginal and random effect models for evaluating the treatment effect in the stepped wedge design from both a model-based (inference based on distributional assumptions) and design-based (inference based on reference to the permutation distribution implied by the study design) perspective (see Sect. 2.3 for further discussion). Ji et al. [9] found that model-based inference on the treatment effect in stepped wedge designs using linear mixed models (LMM) is sensitive to model mis-specification, such as failing to account for cluster-by-time interactions in the data. Therefore, there is a real practical risk that simple model-based inference may provide inaccurate standard errors and invalid Type I error rates. Ji et al. [9] also considered permutation tests and found that the permutation test provided tight control of Type I error rates under the scenarios they investigated. Thompson et al. [17] compared cluster-level parametric and non-parametric within-period estimates of treatment effects to the standard mixed-effect model-based inference. Ultimately, the parametric within-period model was not recommended due to its below-nominal coverage levels under some scenarios. The non-parametric within-period estimator was less efficient than the mixed-effect model approach when period effects were common to all clusters or the number of clusters varied. Furthermore, the estimate of the treatment effect from cluster-level methods was consistently larger than that from the mixed-effect model. Therefore, important gaps exist in terms of what conditions are required for the validity and efficiency of common alternative analysis methods.

In the current study, we conduct both model-based (asymptotic) and design-based analyses at the individual-level using linear mixed models (LMM) and generalized estimating equations (GEE) approaches under both null and alternative conditions for a variety of data-generating scenarios. We include scenarios with random treatment effects where the impact of the treatment depends on the specific cluster to which it is applied, a situation that was not investigated by Ji et al. [9]. We also consider a varying number of clusters. We specifically seek to characterize treatment effect estimation bias, standard error accuracy, and Type I error rates under null conditions, and power under alternative conditions for the different analysis strategies. In Sect. 2, we describe the simulation scenarios and models we use for our study. In Sect. 3, we present results comparing efficiency, robustness and power among the analysis methods. In Sect. 4, we summarize our findings and discuss future steps.

Methods

Data-Generating Model

We generated normally distributed data with an identity link corresponding to a balanced, complete, cross-sectional stepped wedge design with 5 time points (T = 5) and either twenty (I = 20) or forty clusters (I = 40). The design structure is shown in the third panel of Fig. 1. One hundred observations (n = 100) were generated for each cluster at each time point for a total sample size of N = 10,000 (I = 20) or 20,000 (I = 40). We envision common public health studies or clinical delivery investigations within health care systems where a moderate number of clusters (i.e., villages, hospitals) is available but relatively large populations are under study. Let \( Y_{ijt} \) be the response for individual j in cluster i at time t \( (i = 1, \ldots, 20 \text{ or } 40;\; j = 1, \ldots, 100;\; t = 1, \ldots, 5) \). We generate data from the model

$$ Y_{ijt} = \mu + a_{i} + \beta_{t} + X_{it} (\theta + c_{i} ) + e_{ijt}, $$
(1)

where μ is the overall mean, ai is a random effect for cluster i where ai ~ N(0, τ2), βt is the (categorical) fixed effect of time point t, Xit is the treatment indicator (0 = control; 1 = treatment) for cluster i at time t, θ is the fixed or average treatment effect, ci is a random cluster-specific treatment effect where ci ~ N(0, ν2), and eijt is a random error where eijt ~ N(0, σ2). We assume that Corr(ai, ci) = ρ (possibly 0) and eijt is independent of ai and ci.
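To make the data-generating process concrete, the following is a minimal R sketch of model (1) under the stepped wedge layout of Fig. 1. The function name simulate_sw and the even allocation of clusters to the four sequences are our assumptions for illustration, not the authors' code.

```r
## Minimal sketch of data-generating model (1), assuming clusters are
## allocated evenly to the T - 1 = 4 sequences (balanced, complete design).
simulate_sw <- function(I = 20, T = 5, n = 100,
                        mu = 0, beta = c(0, 0.2, 0.3, 0.4, 0.5),
                        theta = 0, tau2 = 4, nu2 = 0, rho = 0, sigma2 = 1) {
  ## Sequence s crosses over at time s + 1; all clusters start on control.
  seqs <- sample(rep(1:(T - 1), length.out = I))
  ## Draw (possibly correlated) random cluster and treatment effects (a_i, c_i).
  Sigma <- matrix(c(tau2, rho * sqrt(tau2 * nu2),
                    rho * sqrt(tau2 * nu2), nu2), 2, 2)
  re <- MASS::mvrnorm(I, mu = c(0, 0), Sigma = Sigma)
  dat <- expand.grid(j = 1:n, t = 1:T, i = 1:I)
  dat$X <- as.integer(dat$t > seqs[dat$i])      # treatment indicator X_it
  dat$Y <- mu + re[dat$i, 1] + beta[dat$t] +
    dat$X * (theta + re[dat$i, 2]) +
    rnorm(nrow(dat), sd = sqrt(sigma2))         # error e_ijt
  dat
}
```

With the default arguments this reproduces the null condition of scenario S1 (τ2 = 4, σ2 = 1, no random treatment effect); setting nu2 = 4 and rho = 0 or 0.3 gives the random treatment effect scenarios described below.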

Simulation Scenarios

Table 1 shows the nine data-generating scenarios used for our simulation studies. We investigated scenarios with different numbers of clusters (20 vs. 40) under the null condition to understand the effect of the number of clusters on the Type I error rate [11]. All scenarios contain a fixed treatment effect (θ = 0 under the null condition; under alternative conditions the value of θ varied by scenario to achieve power between 10 and 90%), a time effect (β1 = 0, β2 = 0.2, β3 = 0.3, β4 = 0.4, β5 = 0.5 for t = 1, …, 5 under all conditions), and a random cluster effect (τ2 = 4). The error variance (σ2) is equal to 1 in all simulations. For these variance components the intraclass correlation coefficient (ICC), defined as \( \tau^{2}/(\sigma^{2} + \tau^{2}) \), is equal to 0.8. A random treatment effect (ν2 = 4) is also included in some scenarios. When a random treatment effect is included, we allow it to be uncorrelated (ρ = 0) or correlated (ρ = 0.3) with the random cluster effect.

Table 1 Scenarios for simulation

We generate 500 realizations under each scenario, allowing Type I error rate estimates to be accurate to within approximately ± 0.02 due to Monte Carlo variation (at the nominal 0.05 level, two Monte Carlo standard errors are \( 2\sqrt{0.05 \times 0.95/500} \approx 0.019 \)).

Approaches to Analysis

We fit each simulated dataset in R using two GEE models (R package gee) and four LMM models (R package lme4) using both standard model-based inference and design-based inference. All models were fit to individual-level data. All inferences using GEE are based on robust (sandwich) variances [2]. We estimate bias, variance, and Type I error rate under the null hypothesis, and power under alternative hypotheses. Power was only investigated in the 40 cluster cases to focus on scenarios where the Type I error rates were (generally) close to nominal levels.

GEE Approaches

We investigate models with independent (G1) and exchangeable (G2) working correlation structures. The exchangeable working correlation structure, in which the correlation between observations within a cluster is assumed constant [11], is often chosen in the analysis of stepped wedge trials since it captures a common source of correlation. However, GEE is asymptotically robust to mis-specification of the working correlation structure because it uses robust (sandwich) variance estimates, which are valid provided the number of clusters is large [2].

We conduct both model-based and design-based tests using GEE. For the model-based tests, we compare the robust z-score (estimate divided by its robust standard error) of the intervention effect from the GEE analysis to the standard normal distribution. GEE tends to inflate Type I error rates when the number of clusters is small [15], so we expect that performance may be better for 40 clusters compared to 20 clusters.

For the design-based analyses, we permute the stepped wedge sequences among clusters and investigate the use of both the estimated intervention effect and the robust z-score as a test statistic. We reject the null when the test statistic from the observed dataset is smaller than the 2.5th percentile or larger than the 97.5th percentile of the permutation distribution.
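As an illustration, the following sketch implements this design-based test for data simulated by simulate_sw above. The helper names gee_stats, permute_X, and sw_perm_test, and the column layout (Y, t, X, i), are our assumptions, not the authors' code.

```r
library(gee)

## Fit the GEE model and return both candidate test statistics:
## the estimated intervention effect and its robust z-score.
gee_stats <- function(dat, corstr = "exchangeable") {
  fit <- gee(Y ~ factor(t) + X, id = i, data = dat,
             family = gaussian, corstr = corstr)
  cf <- summary(fit)$coefficients["X", ]
  c(est = unname(cf["Estimate"]), z = unname(cf["Robust z"]))
}

## Permute the stepped wedge sequences (crossover times) among clusters,
## preserving the overall stepped wedge pattern.
permute_X <- function(dat) {
  cross <- tapply(dat$t[dat$X == 1], dat$i[dat$X == 1], min)
  cross <- setNames(sample(unname(cross)), names(cross))
  as.integer(dat$t >= cross[as.character(dat$i)])
}

## Design-based test: reject when the observed statistic falls outside the
## central 95% of the permutation distribution (refits GEE B times; slow).
sw_perm_test <- function(dat, B = 1000, stat = c("est", "z")) {
  stat <- match.arg(stat)
  obs <- gee_stats(dat)[stat]
  perm <- replicate(B, {
    d <- dat
    d$X <- permute_X(d)
    gee_stats(d)[stat]
  })
  obs < quantile(perm, 0.025) || obs > quantile(perm, 0.975)
}

## e.g.: dat <- simulate_sw(I = 20); sw_perm_test(dat, B = 1000, stat = "z")
```

The same permutation machinery applies to the LMM analyses described below by replacing gee_stats with an lmer-based fit.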

LMM Approaches

Table 2 shows four LMM models fit to simulation scenarios S1–S9. Tests are again conducted using both model-based and design-based tests. We expect that LMM will be less robust to mis-specification of the variance structure than GEE since standard errors are computed under the assumed covariance for the outcomes. If the random effect model structure is mis-specified, then the model-based variance in LMM will be invalid and an inflated Type I error rate may result.

Table 2 Characteristics of linear mixed models (LMM) fit to scenarios S1–S9

For the design-based tests, similar to the approach outlined for GEE, we investigate both unstandardized and standardized intervention effect estimates as a test statistic and reject the null when the observed test statistic is smaller than the 2.5th percentile or larger than the 97.5th percentile of the permutation distribution.

Results

GEE Asymptotic Inference

In the scenarios that we studied, the GEE estimators with different correlation structures (G1 and G2) both give unbiased estimates of the treatment effect (Table 3). GEE with an exchangeable correlation structure (G2) leads to a smaller empirical variance compared to G1, indicating higher efficiency due to the fact that the exchangeable structure corresponds more closely to the true correlation structure and therefore provides an optimal weighted estimator. The efficiency advantage remained true even in scenarios with a random treatment effect (e.g., S2, S3, S5, S6), which does not correspond to a simple exchangeable correlation structure. For 20 clusters, the Type I error rate is inflated to approximately 0.10. As we change the number of clusters from 20 to 40, the estimated sandwich variance more closely approximates the true sampling variance, so the Type I error rate approaches 0.05. The presence of correlation between the random cluster and random treatment effects (e.g., S2 vs. S3 or S5 vs. S6) does not meaningfully affect the results.

Table 3 GEE results based on 500 simulations showing model-based results for θ, the treatment effect

Under alternative conditions (S7–S9), we choose an effect size of 0.08 for S7 and an effect size of 1.00 for S8 and S9 to investigate power. The GEE model with an exchangeable correlation structure (G2) is much more efficient than the GEE model with independent correlation structure (G1) due to the large ICC (see Sect. 2.2) in these data.

LMM Asymptotic Inference

Table 4 shows results from fitting LMM models L1–L4 to the nine scenarios. All models give unbiased treatment effect estimates under all scenarios. Also, the Type I error rates for models with a random treatment effect are all close to the nominal level of 0.05. Not surprisingly, the Type I error rate is significantly inflated for the model that assumes independent data (L1) under all scenarios. For analysis models without a random treatment effect (L1, L2), the Type I error rates are also far above the nominal level under simulation scenarios that include a random treatment effect (e.g., S2, S3, S5, S6). Interestingly, whether the random cluster effect and random treatment effect are modeled as correlated or not does not appear to affect the estimated Type I error rates (compare L3 vs. L4). In addition, the cross-simulation variance of the intervention effect estimate is not noticeably different between models L2–L4, suggesting that it is preferable to over-fit rather than under-fit the random effects structure. Based on the results in Tables 3 and 4, our simulations validate the theoretical prediction that correct model specification is more important for LMM than for GEE.

Table 4 LMM results based on 500 simulations showing model-based results for θ, the treatment effect

Under alternative conditions, we choose an effect size of 0.08 for S7 and an effect size of 0.80 for S8 and S9 to evaluate power. The power for L3 and L4 is similar; therefore, there is little difference in their efficiency.

GEE Permutation Test

In addition to the GEE model-based analysis shown in Table 3, we also conducted GEE design-based analyses. We provide results based on both the permutation distribution of the estimated treatment effect parameter (Table 5) as well as the permutation distribution of the robust z-statistic (Table 6).

Table 5 GEE results based on 500 datasets (1000 permutations/dataset), showing design-based results where the test statistic is the estimated intervention effect, θ
Table 6 GEE results based on 500 datasets (1000 permutations/dataset), showing design-based results where the test statistic is the robust z-score for the intervention effect

Table 5 illustrates several interesting findings. In scenarios with no random treatment effect (S1, S4), both G1 and G2 maintain the nominal Type I error rate and do not show the Type I error inflation with smaller numbers of clusters that was observed in the model-based analysis. Table 5 shows some evidence of a small Type I error inflation for G1 under scenarios that include random treatment effects (S2, S3, S5, and S6). However, the Type I error rate for the GEE model with exchangeable correlation structure (G2) is significantly inflated under scenarios with a random treatment effect.

Interestingly, when the permutation test is based on the robust z-statistic (Table 6), the Type I error rate inflation largely disappears. The Type I error rates for G1 are all close to 0.05. Model G2 now shows only slight Type I error inflation under scenarios that include a random treatment effect.

Under the alternative condition, we generated data with θ = 0.21 for S7 and θ = 0.80 for S8 and S9. Similar to the asymptotic results (Sect. 3.1), the model with exchangeable correlation structure (G2) has more power than the model with independent correlation structure (G1), although in Table 5 this is partly due to the inflated Type I error rate previously noted for G2.

LMM Permutation Test

We also conducted design-based tests using the permutation distribution of the LMM-based estimated treatment effects and z-statistics (Tables 7 and 8, respectively). We note that the treatment effect permutation distributions (but not the z-statistic permutation distributions) for models L1 and L2 are identical to those for models G1 and G2, respectively, and so the results in these simulations are similar.

Table 7 LMM results based on 500 datasets (1000 permutations/dataset), showing design-based results where the test statistic is the estimated intervention effect, θ
Table 8 LMM results based on 500 datasets (1000 permutations/dataset), showing design-based results where the test statistic is the z-score for the intervention effect

Table 7 shows permutation results using the treatment effect coefficients. Interestingly, the model that assumes independence (L1) shows little to no Type I error inflation, while all the non-independence models (L2–L4) show significant Type I error inflation under scenarios with random treatment effects. For this reason, power comparisons are difficult, except under scenario S7 (no random treatment effect), where we find that models that include a correlation structure similar to the data-generating mechanism have higher power than the independence model.

In Table 8, we see that design-based tests using z-statistics give similar results to the design-based tests based on coefficients (Table 7) for L1–L2 over all scenarios. In contrast, the Type I error inflation noted for models L3 and L4 under scenarios with a random treatment effect (S2, S3, S5, S6) in Table 7 is reduced, but not eliminated, using a design-based test using z-statistics (Table 8). In addition, unlike Table 7, there is some suggestion in Table 8 that increasing the number of clusters from 20 to 40 reduces the Type I error inflation for L2–L4 under scenarios with a random treatment effect. It is notable that, in contrast to the model-based LMM results, the Type I error rates for design-based tests using LMM models L3 and L4 give inflated Type I error rates under scenarios with a random treatment effect (S2, S3, S5, S6) even though the models perfectly match the corresponding scenarios.

Since L1–L4 all have nominal Type I error rates only under scenarios without a random treatment effect (S7), we compare power only under scenario S7. The power for L2, L3, and L4 is similar under S7, while the power for L1 is much lower.

Discussion

We conducted both model-based and design-based analyses of data from stepped wedge study designs to compare the robustness, efficiency, and Type I error rate under null conditions and power under alternative conditions among GEE and LMM models for each of nine data-generating scenarios. In general, for model-based analyses correct model specification is more important for LMM than for GEE, and over-specification of LMM models performs better than under-fitting. Specifically, the model-based results show that LMM models with random cluster and treatment effects produce similar levels of bias, efficiency, and Type I error as the correctly specified LMM model, even if there is no random treatment effect in the correctly specified model. In contrast, if a random treatment effect truly exists, the model-based results for an LMM without a random treatment effect show an inflated Type I error rate.

In model-based analyses, the number of clusters has a greater effect on Type I error rates in GEE than LMM. As we increase the number of clusters from 20 to 40, the model-based GEE simulations provide Type I error rates closer to the nominal level. In contrast, the Type I error rates for model-based LMM simulations are close to nominal levels even for 20 clusters when the analysis model matches the data-generating scenario. Several authors [11, 12, 18] have investigated the effect of various corrections to GEE when the number of clusters is small in the context of parallel-design cluster randomized trials. The application of these methods to stepped wedge trials has not been investigated, although Taljaard et al. [16] noted some of the risks associated with too few clusters in stepped wedge trials. Additional research on finite sample size corrections, the effect of the number of clusters, and the effect of (possibly variable) cluster size in the context of stepped wedge trials (with both linear and non-linear links) is needed.

Using permutation tests, the primary quantities of interest are the Type I error rate under null conditions and the power under alternative conditions. Permutation tests do not naturally provide estimates of the treatment effect or confidence intervals (although Hughes et al. [8] describe a design-based procedure for stepped wedge models that gives estimates, confidence intervals, and valid tests). Using a permutation procedure, GEE models show near-nominal Type I error rates under all the scenarios investigated when the permutation test is based on robust z-scores. In addition, Type I error rates are not sensitive to the number of clusters when the permutation distribution is used for testing. GEE models with an exchangeable working correlation structure show greater power than models with an independence working correlation structure. However, in scenarios with a random treatment effect, permutation tests from a GEE model with exchangeable correlation matrix show significantly inflated Type I error rates when based on the estimated treatment effect coefficient but only minor inflation when based on robust z-statistics.

Design-based tests using LMM models produced some surprising findings. When a random treatment effect is included in the data generation, the design-based tests using LMM models show inflated Type I error rates even if the underlying model is correctly specified. This is likely due to the fact that the inclusion of a random treatment effect leads to a different covariance matrix for each sequence [8] and thereby violates the assumption of exchangeability under the null hypothesis, which is required for permutation tests [4]. The magnitude of the effect of this violation on the Type I error rate will depend on the relative magnitude of the variance components.
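Concretely, under data-generating model (1) the covariance between outcomes of two distinct individuals \( j \ne j' \) in the same cluster is

$$ \operatorname{Cov}(Y_{ijt}, Y_{ij't'}) = \tau^{2} + (X_{it} + X_{it'})\,\rho\tau\nu + X_{it}X_{it'}\,\nu^{2}, $$

so when ν2 > 0 the within-cluster covariance matrix depends on the cluster's treatment pattern {Xit} and therefore differs across sequences; permuting the sequence labels then changes the joint distribution even under the null hypothesis θ = 0.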

In this study, we used large random effect variances relative to the error variance (e.g., ICC equal to 0.8). This approach has allowed us to identify scenarios that lead to Type I error inflation with a relatively limited number of simulations. However, this also suggests that in applying our results to applications with smaller (relative) random effect variances, researchers should be most concerned about the scenarios where we find moderate to large Type I error inflation. In addition, all the simulations presented here used a linear link and normal errors and are based on individual-level analyses of the data. Further research is needed on models with non-linear links (e.g., for binary outcomes) and on cluster-level methods, as noted by Thompson et al. [17].

We have shown areas of strength and weaknesses of model-based and design-based analyses of stepped wedge designs. We believe these results will help guide practitioners in choosing approaches to the analysis of data from stepped wedge designs.

Change history

  • 17 August 2020

    The original version of this article unfortunately contained an error in ‘R code’ under Appendix section.

References

1. Brown CA, Lilford RJ (2006) The stepped wedge trial design: a systematic review. BMC Med Res Methodol 6:54

2. Diggle P, Heagerty P, Liang K, Zeger S (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford

3. Golden MR, Kerani RP, Stenger M, Hughes JP, Aubin M, Malinski C, Holmes KK (2015) Uptake and population-level impact of expedited partner therapy (EPT) on Chlamydia trachomatis and Neisseria gonorrhoeae: the Washington State Community-level Randomized Trial of EPT. PLoS Med 12(1):e1001777

4. Good P (2005) Permutation, parametric and bootstrap tests of hypotheses, 3rd edn. Springer, New York

5. Hooper R, Teerenstra S, de Hoop E, Eldridge S (2016) Sample size calculation for stepped wedge and other longitudinal cluster randomized trials. Stat Med 35:4718–4728

6. Hussey MA, Hughes JP (2007) Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 28:182–191

7. Hughes JP, Granston TS, Heagerty PJ (2015) Current issues in the design and analysis of stepped wedge trials. Contemp Clin Trials 45:55–60

8. Hughes JP, Heagerty PJ, Xia F, Ren Y (2019) Robust inference in the stepped wedge design. Biometrics. https://doi.org/10.1111/biom.13106

9. Ji X, Fink G, Robyn PJ, Small DS (2017) Randomization inference for stepped-wedge cluster-randomized trials: an application to community-based health insurance. Ann Appl Stat 11:1–20

10. Killiam WP, Tambatamba BC, Chintu N, Rouose D, Stringer E, Bweupe M, Yu Y, Stringer JSA (2010) Antiretroviral therapy in antenatal care to increase treatment initiation in HIV-infected pregnant women: a stepped-wedge evaluation. AIDS 24:85–91

11. Leyrat C, Morgan EK, Leurent B, Kahan CB (2018) Cluster randomized trials with a small number of clusters: which analyses should be used? Int J Epidemiol 47:321–331

12. Li P, Redden DT (2015) Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials. BMC Med Res Methodol 15:38

13. Mdege ND, Man MS, Taylor CA, Torgerson DJ (2011) Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol 64:936–948

14. Rhoda DA, Murray DM, Andridge RR, Pennell ML, Hade EM (2011) Studies with staggered starts: multiple baseline designs and group-randomized trials. Am J Publ Health 101:2164–2169

15. Sharples K, Breslow N (1992) Regression analysis of correlated binary data: some small sample results for the estimating equation approach. J Stat Comput Simul 42:1–20

16. Taljaard M, Teerenstra S, Ivers NM, Fergusson DA (2016) Substantial risks associated with few clusters in cluster randomized and stepped wedge designs. Clin Trials 13:459–463

17. Thompson JA, Davey C, Fielding K, Hargreaves JR, Hayes RJ (2018) Robust analysis of stepped wedge trials using cluster-level summaries within periods. Stat Med 37:2487–2500

18. Westgate PM (2013) On small-sample inference in group randomized trials with binary outcomes and cluster-level covariates. Biom J 5:789–806

19. Woertman W, de Hoop E, Moerbeek M, Zuidema SU, Gerritsen DL, Teerenstra S (2013) Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol 66:752–758


Funding

This research was supported by the National Institute of Allergy and Infectious Diseases Grant AI29168 and PCORI contract ME-1507-31750.

Author information


Corresponding author

Correspondence to James P. Hughes.

Appendix: R, Stata, and SAS code

Here we present basic R, Stata, and SAS code for fitting common models for stepped wedge designs with cross-sectional data collection at each time point. See [5,6,7].

I. Linear mixed models

(1) Random cluster effect: \( Y_{ijt} = \mu + a_{i} + \beta_{t} + X_{it}\theta + e_{ijt} \)

(2) Random cluster and cluster × time effects: \( Y_{ijt} = \mu + a_{i} + \beta_{t} + b_{it} + X_{it}\theta + e_{ijt} \)

(3) Random cluster, cluster × time, and treatment effects (corr(ai, ci) = 0): \( Y_{ijt} = \mu + a_{i} + \beta_{t} + b_{it} + X_{it}(\theta + c_{i}) + e_{ijt} \)

(4) Random cluster, cluster × time, and treatment effects (corr(ai, ci) = ρ): \( Y_{ijt} = \mu + a_{i} + \beta_{t} + b_{it} + X_{it}(\theta + c_{i}) + e_{ijt}, \)

where ai ~ N(0, τ2), bit ~ N(0, γ2), ci ~ N(0, ν2), and eijt ~ N(0, σ2).
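The code below is a minimal lme4 sketch of models (1)–(4); the variable names (Y, period, cluster, X) are assumed for illustration and are not taken from the original code figure.

```r
library(lme4)

## dat: one row per individual, with outcome Y, treatment indicator X,
## and factors for cluster and period (time).
dat$cluster <- factor(dat$i)
dat$period  <- factor(dat$t)

m1 <- lmer(Y ~ period + X + (1 | cluster), data = dat)            # model (1)
m2 <- lmer(Y ~ period + X + (1 | cluster) +
             (1 | cluster:period), data = dat)                    # model (2)
m3 <- lmer(Y ~ period + X + (1 | cluster) + (0 + X | cluster) +
             (1 | cluster:period), data = dat)                    # model (3)
m4 <- lmer(Y ~ period + X + (1 + X | cluster) +
             (1 | cluster:period), data = dat)                    # model (4)

summary(m4)$coefficients["X", ]  # estimate, SE, and t value for theta
```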
II. Generalized estimating equation models

(5) Independent working correlation

(6) Exchangeable working correlation
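Analogously, a minimal gee sketch of models (5) and (6), with the same assumed variable names (note that gee requires rows for each cluster to be contiguous):

```r
library(gee)

g1 <- gee(Y ~ period + X, id = cluster, data = dat,
          family = gaussian, corstr = "independence")   # model (5)
g2 <- gee(Y ~ period + X, id = cluster, data = dat,
          family = gaussian, corstr = "exchangeable")   # model (6)

summary(g2)$coefficients["X", ]  # robust (sandwich) SE and z for theta
```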


Cite this article

Ren, Y., Hughes, J.P. & Heagerty, P.J. A Simulation Study of Statistical Approaches to Data Analysis in the Stepped Wedge Design. Stat Biosci 12, 399–415 (2020). https://doi.org/10.1007/s12561-019-09259-x


Keywords

  • Stepped wedge design
  • GEE
  • LMM
  • Permutation test
  • Simulation