Heterogeneous treatment effects and bias in the analysis of the stepped wedge design

Health Services and Outcomes Research Methodology

Abstract

The effect of an intervention in a stepped wedge design can vary across clusters or with time since exposure to treatment, but consequences of such heterogeneous treatment effects for the analysis of stepped wedge designs are not well recognized. In this article, we advance the idea that the stepped wedge design can be framed as a special case of a difference-in-differences design with staggered treatment exposure. Using this perspective, we show that the standard difference-in-differences regression approach estimates the average treatment effect of the treatment period with bias when treatment effects vary by cluster or time since exposure. We then use Monte-Carlo simulations of stepped wedge designs to examine the performance of the standard regression approach as well as alternative approaches that estimate treatment effects relative to intervention exposure. Simulation results confirm that estimates from the standard approach are biased when treatment effects are heterogeneous. Alternative regression approaches may perform better, but only if the research design allows estimation of the full treatment effect across all clusters. We conclude by offering recommendations for the design and analysis of stepped wedge trials.

Notes

  1. We would like to thank two anonymous reviewers for this suggestion.

References

  • Abraham, S., Sun, L.: Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Technical report, MIT (2018)

  • Allen, L.A., McIlvennan, C.K., Thompson, J.S., Dunlay, S.M., LaRue, S.J., Lewis, E.F., Patel, C.B., Blue, L., Fairclough, D.L., Leister, E.C., Glasgow, R.E., Cleveland, J.C., Phillips, C., Baldridge, V., Walsh, M.N., Matlock, D.D.: Effectiveness of an intervention supporting shared decision making for destination therapy left ventricular assist device. JAMA Int. Med. 178(4), 520 (2018). https://doi.org/10.1001/jamainternmed.2017.8713

  • Angrist, J.D., Pischke, J.S.: Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, Princeton (2008)

  • Callaway, B., Sant’Anna, P.H.: Difference-in-differences with multiple time periods and an application on the minimum wage and employment. Technical report, SSRN Working Paper 3148250 (2018)

  • Cameron, A.C., Miller, D.L.: A practitioner’s guide to cluster-robust inference. J. Human Resour. 50(2), 317–372 (2015). https://doi.org/10.3368/jhr.50.2.317

  • Davey, C., Hargreaves, J., Thompson, J.A., Copas, A.J., Beard, E., Lewis, J.J., Fielding, K.L.: Analysis and reporting of stepped wedge randomised controlled trials: Synthesis and critical appraisal of published studies, 2010 to 2014. Trials 16(1), 358 (2015). https://doi.org/10.1186/s13063-015-0838-3

  • Daw, J.R., Hatfield, L.A.: Matching and regression to the mean in difference-in-differences analysis. Health Services Res. 53(6), 4138–4156 (2018)

  • Ellenberg, S.S.: The stepped-wedge clinical trial. JAMA 319(6), 607 (2018). https://doi.org/10.1001/jama.2017.21993

  • Golden, M.R., Kerani, R.P., Stenger, M., Hughes, J.P., Aubin, M., Malinski, C., Holmes, K.K.: Uptake and population-level impact of expedited partner therapy (EPT) on Chlamydia trachomatis and Neisseria gonorrhoeae: the Washington State community-level randomized trial of EPT. PLOS Med. 12(1), e1001777 (2015). https://doi.org/10.1371/journal.pmed.1001777

  • Goodman-Bacon, A.: Difference-in-differences with variation in treatment timing. Tech. rep., NBER Working Paper No. 25018 (2018)

  • Hemming, K., Girling, A.: The efficiency of stepped wedge vs cluster randomized trials: Stepped wedge studies do not always require a smaller sample size. J. Clin. Epidemiol. 66(12), 1427 (2013)

  • Hemming, K., Haines, T.P., Chilton, P.J., Girling, A.J., Lilford, R.J.: The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ 350(1), h391–h391 (2015). https://doi.org/10.1136/bmj.h391

  • Holland, P.W.: Statistics and causal inference (with discussion). J. Am. Stat. Assoc. 81, 945–970 (1986). https://doi.org/10.2307/2289064

  • Huffman, M.D., Mohanan, P.P., Devarajan, R., Baldridge, A.S., Kondal, D., Zhao, L., Ali, M., Krishnan, M.N., Natesan, S., Gopinath, R., Viswanathan, S., Stigi, J., Joseph, J., Chozhakkat, S., Lloyd-Jones, D.M., Prabhakaran, D.: Effect of a quality improvement intervention on clinical outcomes in patients in India with acute myocardial infarction. JAMA 319(6), 567 (2018). https://doi.org/10.1001/jama.2017.21906

  • Hughes, J.P., Granston, T.S., Heagerty, P.J.: Current issues in the design and analysis of stepped wedge trials. Contemp. Clin. Trials 45, 55–60 (2015)

  • Hussey, M.A., Hughes, J.P.: Design and analysis of stepped wedge cluster randomized trials. Contemp. Clin. Trials 28(2), 182–191 (2007). https://doi.org/10.1016/j.cct.2006.05.007

  • Imbens, G.W., Rubin, D.B.: Rubin causal model. New Palgrave Dictionary Econ. pp. 1–10 (2017)

  • Kahn-Lang, A., Lang, K.: The promise and pitfalls of differences-in-differences: Reflections on ’16 and Pregnant’ and other applications. Technical report, National Bureau of Economic Research (2018)

  • Kennedy-Shaffer, L., Gruttola, V., Lipsitch, M.: Novel methods for the analysis of stepped wedge cluster randomized trials. Stat. Med. 39(7), 815–844 (2019). https://doi.org/10.1002/sim.8451

  • Lechner, M.: The estimation of causal effects by difference-in-difference methods. Found. Trends Econ. 4(3), 165–224 (2010). https://doi.org/10.1561/0800000014

  • Lindner, S., McConnell, K.J.: Difference-in-differences and matching on outcomes: a tale of two unobservables. Health Serv. Outcomes Res. Methodol. 19(2–3), 127–144 (2019). https://doi.org/10.1007/s10742-018-0189-0

  • Lindner, S., Kaufman, M.R., Marino, M., O’Malley, J., Angier, H., Cottrell, E.K., McConnell, K.J., DeVoe, J.E., Heintzman, J.R.: A Medicaid alternative payment model program in Oregon led to reduced volume of imaging services. Health Affairs 39(7), 1194–1201 (2020). https://doi.org/10.1377/hlthaff.2019.01656

  • Matthews, J.N.S., Forbes, A.B.: Stepped wedge designs: Insights from a design of experiments perspective. Stat. Med. 36(24), 3772–3790 (2017). https://doi.org/10.1002/sim.7403

  • Nickless, A., Voysey, M., Geddes, J., Yu, L.M., Fanshawe, T.R.: Mixed effects approach to the analysis of the stepped wedge cluster randomised trial-investigating the confounding effect of time through simulation. PLoS ONE 13(12), e0208876 (2018). https://doi.org/10.1371/journal.pone.0208876

  • Parent, B., LaGrone, L.N., Albirair, M.T., Serina, P.T., Keller, J.M., Cuschieri, J., Addison, E.J., Choe, L., Delossantos, G.B., Gaskill, C.E., Moon, S.D., MacDonald, J.T., Stolzberg, M.J., Eaton, E.G.V., Zech, J.M., Kritek, P.A.: Effect of standardized handoff curriculum on improved clinician preparedness in the intensive care unit. JAMA Surgery 153(5), 464 (2018). https://doi.org/10.1001/jamasurg.2017.5440

  • Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educat. Psychol. 66(5), 688–701 (1974). https://doi.org/10.1037/h0037350

  • Ryan, A.M., Burgess, J.F., Dimick, J.B.: Why we should not be indifferent to specification choices for difference-in-differences. Health Services Res. 50(4), 1211–1235 (2015). https://doi.org/10.1111/1475-6773.12270

  • Sikkens, J.J., van Agtmael, M.A., Peters, E.J.G., Lettinga, K.D., van der Kuip, M., Vandenbroucke-Grauls, C.M., Wagner, C., Kramer, M.H.: Behavioral approach to appropriate antimicrobial prescribing in hospitals. JAMA Int. Med. 177(8), 1130 (2017). https://doi.org/10.1001/jamainternmed.2017.0946

  • Sun, L., Abraham, S.: Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Technical report, MIT (2020)

  • Thompson, J., Davey, C., Fielding, K., Hargreaves, J., Hayes, R.: Robust analysis of stepped wedge trials using cluster-level summaries within periods. Stat. Med. 37(16), 2487–2500 (2018). https://doi.org/10.1002/sim.7668

  • Thompson, J.A., Fielding, K.L., Davey, C., Aiken, A.M., Hargreaves, J.R., Hayes, R.J.: Bias and inference from misspecified mixed-effect models in stepped wedge trial analysis. Stat. Med. 36(23), 3670–3682 (2017)

  • Woertman, W., de Hoop, E., Moerbeek, M., Zuidema, S.U., Gerritsen, D.L., Teerenstra, S.: Stepped wedge designs could reduce the required sample size in cluster randomized trials. J. Clin. Epidemiol. 66(7), 752–758 (2013)

  • Zeileis, A.: Econometric computing with HC and HAC covariance matrix estimators. J. Stat. Software (2004). https://doi.org/10.18637/jss.v011.i10

Acknowledgements

We would like to thank participants of the CHSE lunch box and two anonymous referees for helpful comments and suggestions.

Funding

The authors did not receive outside funding for this study.

Author information

Corresponding author

Correspondence to Stephan Lindner.

Ethics declarations

Conflict of interest

Authors Lindner and McConnell declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Derivation of weights based on Abraham and Sun (2018)

The first step is to partial out time fixed effects. For a random variable \(x_{it}\), denote its sample average for period \(t\) as \({\bar{x}}_{\cdot t} = 1/N\sum _i x_{it}\) and the demeaned value as \({\dot{x}}_{it} = x_{it} - {\bar{x}}_{\cdot t}\). After partialling out time fixed effects, regression (5) can be rewritten as follows:

$$\begin{aligned} {\dot{y}}_{it} = \theta {\dot{D}}_{it} + \zeta _{it}. \end{aligned}$$

The ordinary least squares estimate of \(\theta\) can then be written as follows:

$$\begin{aligned} {\hat{\theta }} = \frac{\sum _{t=1}^T\sum _{i=1}^N{\dot{y}}_{it}{\dot{D}}_{it}}{\sum _{t=1}^T\sum _{i=1}^N{\dot{D}}_{it}{\dot{D}}_{it}}. \end{aligned}$$
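As a numerical illustration of the partialling-out step (not part of the original derivation), the following minimal Python sketch compares the ratio formula above with an ordinary least squares fit that includes a full set of period dummies. The panel dimensions, the adoption periods in `start`, and the effect size of 0.5 are hypothetical values chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 8, 5

# Hypothetical staggered adoption: unit i is exposed from 0-based period start[i] on
start = np.array([1, 1, 2, 2, 3, 3, 4, 4])
d = (np.arange(n_periods)[None, :] >= start[:, None]).astype(float)

# Hypothetical outcomes: noise plus a constant effect of 0.5 when exposed
y = rng.normal(size=(n_units, n_periods)) + 0.5 * d

# (a) Demean y and D within each period, then apply the ratio formula
y_dot = y - y.mean(axis=0)
d_dot = d - d.mean(axis=0)
theta_demeaned = (y_dot * d_dot).sum() / (d_dot ** 2).sum()

# (b) OLS of y on D and one dummy per period (rows ordered unit by unit)
period_dummies = np.tile(np.eye(n_periods), (n_units, 1))
X = np.column_stack([d.ravel(), period_dummies])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
theta_dummies = coef[0]

print(theta_demeaned, theta_dummies)
```

Both routes should return the same coefficient up to floating-point error, which is the Frisch-Waugh-Lovell result that the demeaning step relies on.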

Taking the probability limit, the population regression coefficient is then:

$$\begin{aligned} \text {plim}\, {\hat{\theta }} = \frac{\sum _{t=1}^T E({\dot{y}}_{it}{\dot{D}}_{it})}{\sum _{t=1}^T E({\dot{D}}_{it}^2)}. \end{aligned}$$

Note that \(E(D_{it}-{\bar{D}}_{\cdot t}) = E(D_{it})-{\bar{D}}_{\cdot t}= {\bar{D}}_{\cdot t}- {\bar{D}}_{\cdot t}= 0\), and therefore, \(E({\dot{y}}_{it}{\dot{D}}_{it}) = E((y_{it}-{\bar{y}}_{\cdot t}){\dot{D}}_{it}) = E(y_{it}{\dot{D}}_{it}) - E({\bar{y}}_{\cdot t}{\dot{D}}_{it}) = E(y_{it}{\dot{D}}_{it}) - {\bar{y}}_{\cdot t}E(D_{it}-{\bar{D}}_{\cdot t}) = E(y_{it}{\dot{D}}_{it})\). Using this expression and the law of iterated expectations, the probability limit of the regression coefficient can be rewritten as:

$$\begin{aligned} \text {plim}\, {\hat{\theta }} = \frac{\sum _{t=1}^T E(E(y_{it}{\dot{D}}_{it}|J_i))}{\sum _{t=1}^TE({\dot{D}}_{it}^2)}. \end{aligned}$$

Next, we expand on the numerator, noting that \({\dot{D}}_{it}= (1-{\bar{D}}_{\cdot t})\) if \(D_{it}=1\) and \({\dot{D}}_{it}= -{\bar{D}}_{\cdot t}\) if \(D_{it}=0\):

$$\begin{aligned} \sum _{t=1}^T E(E(y_{it}{\dot{D}}_{it}|J_i))&= \sum _{t=1}^T \Bigg [ \sum _{j<t}P(J_i=j)(1-{\bar{D}}_{\cdot t})E(y_{it}|J_i=j, D_{it}=1) \\&\quad + \sum _{j\ge t}P(J_i=j)(-{\bar{D}}_{\cdot t})E(y_{it}|J_i=j, D_{it}=0)\Bigg ]. \end{aligned}$$

Note that \(\sum _{j<t}P(J_i=j) = P(J_i<t) = {\bar{D}}_{\cdot t}\) and conversely, \(\sum _{j\ge t}P(J_i=j)=(1-{\bar{D}}_{\cdot t})\). The weights in the two expressions thus sum to 0:

$$\begin{aligned} \sum _{j<t}P(J_i=j) (1-{\bar{D}}_{\cdot t}) - \sum _{j\ge t}P(J_i=j) {\bar{D}}_{\cdot t}= {\bar{D}}_{\cdot t}(1-{\bar{D}}_{\cdot t}) - (1-{\bar{D}}_{\cdot t}){\bar{D}}_{\cdot t}= 0. \end{aligned}$$

Therefore, the terms from the second expression can be distributed across the first expression. Further note that due to random assignment and no anticipation, \(E(y_{it}|J_i=j, D_{it}=0) = E(y^0_{it}|J_i=j, D_{it}=1)\), i.e., expected outcomes of untreated individuals are equal to expected potential outcomes of treated individuals if they were not exposed to treatment. Therefore, the numerator simplifies to:

$$\begin{aligned} \sum _{t=1}^T E(E(y_{it}{\dot{D}}_{it}|J_i)) = \sum _{t=1}^T \left( \sum _{j<t}P(J_i=j)(1-{\bar{D}}_{\cdot t})E(y^1_{it}-y^0_{it}|J_i=j, D_{it}=1)\right) . \end{aligned}$$

The expression \(E(y^1_{it}-y^0_{it}|J_i=j, D_{it}=1)\) is equal to the cluster- and time-since-first-exposure-specific average treatment effect on the treated, \(CATT_{jl}\). Further re-expressing \(P(J_i=j)= {\bar{D}}_{\cdot t}\cdot P(J_i=j|J_i<t)\), where \(P(J_i=j|J_i<t)\) is the probability of an individual \(i\) being in cluster \(j\) among individuals exposed to treatment in period \(t\), implies:

$$\begin{aligned} \sum _{t=1}^T E(E(y_{it}{\dot{D}}_{it}|J_i)) = \sum _{t=1}^T {\bar{D}}_{\cdot t}(1-{\bar{D}}_{\cdot t}) \left( \sum _{j<t}P(J_i=j|J_i<t)E(y^1_{it}-y^0_{it}|J_i=j, D_{it}=1)\right) . \end{aligned}$$

The final step is to express the equation in terms of time since exposure \(l\) instead of \(t\). Note that \(t=(j+l+1)\), and for a cluster \(j\), \(l\) ranges from zero to \(T-j-1\). It then follows:

$$\begin{aligned} \text {plim}\, {\hat{\theta }}&= \frac{\sum _{j=1}^{j_{\text {max}}}\sum _{l=0}^{T-j-1}{\bar{D}}_{\cdot (j+l+1)}(1-{\bar{D}}_{\cdot (j+l+1)}) P(J_i=j|J_i<(j+l+1)) CATT_{jl}}{\sum _{t=1}^TE({\dot{D}}_{it}^2)} \\&= \sum _{j=1}^{j_{\text {max}}}\sum _{l=0}^{T-j-1}\omega _{jl}CATT_{jl}, \end{aligned}$$

where

$$\begin{aligned} \omega _{jl} = \frac{{\bar{D}}_{\cdot (j+l+1)}(1-{\bar{D}}_{\cdot (j+l+1)}) P(J_i=j|J_i<(j+l+1))}{\sum _{t=1}^TE({\dot{D}}_{it}^2)} \end{aligned}$$

are the weights, which are not only a function of population fractions \(P(J_i=j|J_i<(j+l+1))\) but also of the treatment exposure variance \({\bar{D}}_{\cdot (j+l+1)}(1-{\bar{D}}_{\cdot (j+l+1)})\).
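To make the weights concrete, the following Python sketch evaluates \(\omega _{jl}\) for a layout with four equal-sized intervention cohorts and five periods, and checks numerically that the coefficient from the time-demeaned regression equals \(\sum \omega _{jl}CATT_{jl}\). It is an illustration under stated assumptions rather than the authors' code: cohort \(j\) is assumed to be first exposed in period \(j+1\), cohorts are equally sized so that \(P(J_i=j|J_i<t)\) is one over the number of exposed cohorts, and the \(CATT_{jl}\) values and period effects are randomly drawn, hypothetical numbers.

```python
import numpy as np

def stepped_wedge_weights(n_cohorts, n_periods):
    """Return ({(j, l): omega_jl}, exposure indicator) for equal-sized cohorts.

    Assumes cohort j (1-based) is first exposed in period j + 1,
    so that t = j + l + 1 as in the derivation.
    """
    cohorts = np.arange(1, n_cohorts + 1)
    periods = np.arange(1, n_periods + 1)
    treated = cohorts[:, None] < periods[None, :]    # D_it = 1 when j < t
    d_bar = treated.mean(axis=0)                     # share exposed per period
    exposure_var = d_bar * (1.0 - d_bar)             # D_bar_t * (1 - D_bar_t)
    denom = exposure_var.sum()                       # sum over t of E(D_dot_it^2)
    weights = {}
    for j in range(1, n_cohorts + 1):
        for l in range(n_periods - j):               # l = 0, ..., T - j - 1
            t = j + l + 1
            share_j = 1.0 / treated[:, t - 1].sum()  # P(J_i = j | J_i < t)
            weights[(j, l)] = exposure_var[t - 1] * share_j / denom
    return weights, treated

weights, treated = stepped_wedge_weights(n_cohorts=4, n_periods=5)

# Hypothetical cohort- and exposure-time-specific effects CATT_jl
rng = np.random.default_rng(1)
catt = {key: rng.normal() for key in weights}

# Noise-free outcomes: a period effect plus CATT_jl in exposed cells
period_effect = rng.normal(size=5)
y = np.tile(period_effect, (4, 1))
for (j, l), effect in catt.items():
    y[j - 1, j + l] += effect                        # column index t - 1 = j + l

# Coefficient after partialling out period fixed effects
d = treated.astype(float)
y_dot, d_dot = y - y.mean(axis=0), d - d.mean(axis=0)
theta_hat = (y_dot * d_dot).sum() / (d_dot ** 2).sum()
weighted_catt = sum(weights[k] * catt[k] for k in weights)

print(theta_hat, weighted_catt)
```

Under these assumptions, the first cohort's initial exposure period receives a weight of 0.3, the two cohorts exposed in period 3 receive 0.2 each, the three cohorts exposed in period 4 receive 0.1 each, and every cell in the final period receives zero weight because all cohorts are exposed and \({\bar{D}}_{\cdot T}(1-{\bar{D}}_{\cdot T})=0\). The last cohort therefore does not contribute to the estimate at all.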

Parameters used for the Monte Carlo simulation

See Table 7.

Table 7 Simulation parameters

Treatment schedule and treatment heterogeneity

In this appendix section, we describe treatment schedules for each of the four scenarios. In each table, rows denote intervention cohorts and columns show time. The numbers indicate the fraction of the treatment effect that participants are exposed to, where cells with a value equal to zero are pre-intervention periods and cells with a value larger than zero are treatment periods. Colors are used to distinguish between the four research designs: only black numbers are for the first research design (baseline design with four intervention cohorts and five time periods), black and teal numbers are for the second research design (longer treatment period), black and blue numbers are for the third research design (comparison group), and black, teal, blue and brown numbers are for the fourth research design (longer treatment period and comparison group).

The treatment schedule for the first scenario with homogeneous treatment effects across intervention cohort and no delay in treatment effect only has two values, zero (for control periods) and one (for treatment periods) (Table 8).

Table 8 Treatment schedule for the first scenario

The treatment schedule for the second scenario with heterogeneous treatment effects across cohorts (vertical heterogeneity) and no treatment effect delay has treatment periods with values above and below one (Table 9).

Table 9 Treatment schedule for the second scenario

The treatment schedule for the third scenario with homogeneous treatment effects across cohorts and treatment effect delay has some treatment periods with values less than one, indicating time periods after intervention onset where the full treatment effect is not yet realized (Table 10).

Table 10 Treatment schedule for the third scenario

The fourth scenario with heterogeneous treatment effects across cohorts and treatment effect delay has the most complicated treatment schedule. Values for all cohorts increase during the first three treatment periods due to the delay, and each cohort has different long-run treatment effects (Table 11).

Table 11 Treatment schedule for the fourth scenario

Based on these treatment schedules, and assuming the same number of participants per cluster and time period, the ATTP can be calculated by simply averaging the cluster- and time-since-exposure-specific treatment effects over the treatment periods in the previous tables. The following table shows ATTPs for scenarios 2 to 4 and research designs 1 and 2; ATTPs for research design 3 are identical to ATTPs for research design 1, while ATTPs for research design 4 are identical to ATTPs for research design 2. ATTPs for scenario 1 are all \(-0.001\) (Table 12).
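The calculation can be sketched in Python for the baseline design, reading the ATTP as the equal-weighted mean over treated cells. Several inputs are assumptions made for illustration: the full effect of \(-0.001\) is inferred from the scenario 1 ATTP, cohort \(j\) is assumed to start treatment in period \(j+1\), and the delayed schedule (one third, two thirds, then the full effect) uses hypothetical fractions that are not taken from Tables 8 to 11.

```python
import numpy as np

full_effect = -0.001   # assumed full treatment effect, inferred from the scenario 1 ATTP
n_cohorts, n_periods = 4, 5

cohorts = np.arange(1, n_cohorts + 1)[:, None]
periods = np.arange(1, n_periods + 1)[None, :]
exposure_time = periods - cohorts - 1      # l = t - j - 1; negative before onset
treated = exposure_time >= 0

# Scenario 1 style schedule: zero before onset, one in every treatment period
schedule_immediate = treated.astype(float)

# Hypothetical delay: fractions 1/3, 2/3, then 1 (illustrative values only)
ramp = np.clip((exposure_time + 1) / 3.0, 0.0, 1.0)
schedule_delayed = np.where(treated, ramp, 0.0)

def attp(schedule, treated, full_effect):
    """Mean effect over treated cells, assuming equal cell sizes."""
    return full_effect * schedule[treated].mean()

attp_immediate = attp(schedule_immediate, treated, full_effect)
attp_delayed = attp(schedule_delayed, treated, full_effect)

print(attp_immediate, attp_delayed)
```

With the zero/one schedule the sketch returns the full effect, matching the value reported for scenario 1. With the hypothetical delay the ATTP is smaller in magnitude, because early treatment cells carry only part of the effect.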

Table 12 ATTP for scenarios 2-4 and research designs 1-2

Cite this article

Lindner, S., McConnell, K.J. Heterogeneous treatment effects and bias in the analysis of the stepped wedge design. Health Serv Outcomes Res Method 21, 419–438 (2021). https://doi.org/10.1007/s10742-021-00244-w
