Handling Missing Data in Randomized Experiments with Noncompliance

Prevention Science 11, 384–396 (2010)

Abstract

Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both have not been routinely used in data analysis practice in prevention research. This paper shows that causal treatment effects can be identified and estimated relatively easily under various missing data assumptions while accounting for both noncompliance and missing outcomes. We review several assumptions about missing data in the presence of noncompliance, including latent ignorability as proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As a simple form of sensitivity analysis, we propose fitting the model under alternative missing data assumptions, which yields a range of causal effect estimates. In this way, we are less likely to settle for a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. Data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) are used as an example.
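For orientation, the estimand can be written compactly. The notation here is an assumption made for illustration and is not taken from the article body: $Z$ is the randomized assignment, $C$ the latent compliance type with values $c$ (complier) and $n$ (never-taker), $Y(z)$ the potential outcome under assignment $z$, and $\pi_c$ the proportion of compliers. With only these two strata, and with assignment having no effect on never-takers' outcomes, the intention-to-treat (ITT) effect is a scaled-down CACE (Bloom 1984; Angrist et al. 1996):

```latex
\begin{align}
\mathrm{CACE} &= E\left[\, Y(1) - Y(0) \mid C = c \,\right], \\
\mathrm{ITT}  &= \pi_c \, \mathrm{CACE} + (1 - \pi_c) \cdot 0
\quad \Longrightarrow \quad
\mathrm{CACE} = \frac{\mathrm{ITT}}{\pi_c}.
\end{align}
```

When some outcomes are missing, the quantities on the right-hand side are no longer estimable from the observed data alone, so a further assumption about the missingness mechanism is needed.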

References

  • Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.

  • Bloom, H. S. (1984). Accounting for no-shows in experimental evaluation designs. Evaluation Review, 8, 225–246.

  • Dunn, G., Maracy, M., Dowrick, C., Ayuso-Mateos, J. L., Dalgard, O. S., Page, H., et al. (2003). Estimating psychological treatment effects from a randomized controlled trial with both non-compliance and loss to follow-up. British Journal of Psychiatry, 183, 323–331.

  • Emsley, R., Dunn, G., & White, I. R. (2010). Mediation and moderation of treatment effects in randomised controlled trials of complex interventions. Statistical Methods in Medical Research. doi:10.1177/0962280209105014.

  • Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86, 365–379.

  • Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.

  • Frangakis, C. E., Rubin, D. B., & Zhou, X. H. (2002). Clustered encouragement design with individual noncompliance: Bayesian inference and application to advance directive forms. Biostatistics, 3, 147–164.

  • Hirano, K., Imbens, G. W., Rubin, D. B., & Zhou, X. H. (2000). Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics, 1, 69–88.

  • Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.

  • Ialongo, N. S., Werthamer, L., Kellam, S. G., Brown, C. H., Wang, S., & Lin, Y. (1999). Proximal impact of two first-grade preventive interventions on the early risk behaviors for later substance abuse, depression and antisocial behavior. American Journal of Community Psychology, 27, 599–642.

  • Imbens, G. W., & Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with non-compliance. Annals of Statistics, 25, 305–327.

  • Jo, B. (2002a). Statistical power in randomized intervention studies with noncompliance. Psychological Methods, 7, 178–193.

  • Jo, B. (2002b). Estimating intervention effects with noncompliance: Alternative model specifications. Journal of Educational and Behavioral Statistics, 27, 385–420.

  • Jo, B. (2008a). Bias mechanisms in intention-to-treat analysis with data subject to treatment noncompliance and missing outcomes. Journal of Educational and Behavioral Statistics, 33, 158–185.

  • Jo, B. (2008b). Causal inference in randomized experiments with mediational processes. Psychological Methods, 13, 314–336.

  • Jo, B., Asparouhov, T., Muthén, B. O., Ialongo, N. S., & Brown, C. H. (2008). Cluster randomized trials with treatment noncompliance. Psychological Methods, 13, 1–18.

  • Jo, B., & Vinokur, A. (2010). Sensitivity analysis and bounding of causal effects with alternative identifying assumptions. Journal of Educational and Behavioral Statistics, in press.

  • Kellam, S. G., Branch, J. D., Agrawal, K. C., & Ensminger, M. E. (1975). Mental health and going to school: The Woodlawn program of assessment, early intervention, and evaluation. Chicago: University of Chicago Press.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

  • Little, R. J. A., & Yau, L. (1998). Statistical techniques for analyzing data from prevention trials: Treatment of no-shows using Rubin’s causal model. Psychological Methods, 3, 147–159.

  • Mattei, A., & Mealli, F. (2007). Application of the principal stratification approach to the Faenza randomized experiment on breast self-examination. Biometrics, 63, 437–446.

  • Mealli, F., Imbens, G. W., Ferro, S., & Biggeri, A. (2004). Analyzing a randomized trial on breast self-examination with noncompliance and missing outcomes. Biostatistics, 5, 207–222.

  • Muthén, L. K., & Muthén, B. O. (1998–2009). Mplus user’s guide. Los Angeles: Muthén & Muthén.

  • Neyman, J. (1923). On the application of probability theory to agricultural experiments. Section 9 translated in Statistical Science, 5, 465–480 (1990).

  • O’Malley, A. J., & Normand, S. L. T. (2005). Likelihood methods for treatment noncompliance and subsequent nonresponse in randomized trials. Biometrics, 61, 325–334.

  • Peng, Y., Little, R. J., & Raghunathan, T. E. (2004). An extended general location model for causal inferences from data subject to noncompliance and missing values. Biometrics, 60, 598–607.

  • Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.

  • Rubin, D. B. (1980). Discussion of “randomization analysis of experimental data in the Fisher randomization test” by D. Basu. Journal of the American Statistical Association, 75, 591–593.

  • Rubin, D. B. (1990). Comment on “Neyman (1923) and causal inference in experiments and observational studies.” Statistical Science, 5, 472–480.

  • Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: CRC.

  • Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. Journal of the American Statistical Association, 101, 1398–1407.

  • Stuart, E. A., Perry, D. F., Le, H.-N., & Ialongo, N. S. (2008). Estimating intervention effects of prevention programs: Accounting for noncompliance. Prevention Science, 9, 288–298.

  • Werthamer-Larsson, L., Kellam, S. G., & Wheeler, L. (1991). Effect of first-grade classroom environment on child shy behavior, aggressive behavior, and concentration problems. American Journal of Community Psychology, 19, 585–602.

Author information

Corresponding author

Correspondence to Booil Jo.

Additional information

We appreciate helpful feedback from the Prevention Science Methodology Group. The study of the first author was supported by MH066319 and MH066247 from the National Institute of Mental Health. The work of the second author was conducted while she was a postdoctoral student at the George Washington University. Elizabeth M. Ginexi is now at the National Institute on Drug Abuse, Bethesda, MD.

Appendices

Appendix 1. PIRC Example Mplus Input Files

1.1 CACE Estimation Under MAR

    TITLE: CACE estimation under MAR
    DATA: FILE = ps09jhu.dat;
    VARIABLE:
      NAMES = Z S R shy6 shy0 male health black;
      USEV = Z S shy6 shy0 male health black;
      !USEOBS = (shy6 NE 999);
      CATEGORICAL = S;
      !binary compliance indicator S (0/1, missing=999)
      CLASSES = C(2); !two compliance strata
      MISSING = all (999); !missing values coded as 999
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
      %OVERALL%
      shy6 ON Z shy0-black;
      !shy6 regressed on randomization Z and covariates
      C#1 ON shy0-black;
      !compliance class C regressed on covariates
      %C#1%
      [S$1@-15]; !compliers
      shy6 ON Z;
      !compliers' outcome varies across Z
      %C#2%
      [S$1@15]; !never-takers
      shy6 ON Z@0;
      !never-takers' outcome is stable across Z (OER)
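The fixed thresholds `[S$1@-15]` and `[S$1@15]` are what tie the observed indicator S to the latent compliance class. As a minimal sketch, assuming the usual logit threshold parameterization for a binary indicator, in which P(S = 1 | class) = 1 / (1 + exp(τ)), a threshold of -15 makes S = 1 essentially certain and a threshold of +15 makes it essentially impossible. The function name is hypothetical.

```python
import math


def prob_indicator_is_one(threshold: float) -> float:
    """P(S = 1 | class) for a binary indicator with logit threshold tau.

    Assumes the threshold parameterization P(S = 1) = 1 / (1 + exp(tau)).
    """
    return 1.0 / (1.0 + math.exp(threshold))


# Thresholds fixed in the %C#1% (complier) and %C#2% (never-taker) sections.
p_complier = prob_indicator_is_one(-15.0)
p_never_taker = prob_indicator_is_one(15.0)

print(f"P(S=1 | complier)    = {p_complier:.7f}")
print(f"P(S=1 | never-taker) = {p_never_taker:.7f}")
```

In effect, units observed with S = 1 are placed in the complier class and units observed with S = 0 in the never-taker class, while units whose S is coded missing are classified probabilistically by the mixture model.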

1.2 CACE Estimation Under MAR II

    TITLE: CACE estimation under MAR II
    DATA: FILE = ps09jhu.dat;
    VARIABLE:
      NAMES = Z S R shy6 shy0 male health black;
      USEV = Z S R shy6 shy0 male health black;
      CATEGORICAL = S R;
      !binary missing indicator R for shy6
      CLASSES = C(2); !two compliance strata
      MISSING = all (999); !missing values coded as 999
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
      %OVERALL%
      shy6 ON Z shy0-black;
      C#1 ON shy0-black;
      R ON Z shy0-black;
      !R is related to observed information
      %C#1%
      [S$1@-15]; !compliers
      [R$1] (1);
      !R stable across C under the control (MAR)
      shy6 ON Z;
      R ON Z;
      !compliers' R status varies across Z
      %C#2%
      [S$1@15]; !never-takers
      [R$1] (1);
      !R stable across C under the control (MAR)
      shy6 ON Z@0; !OER
      R ON Z;
      !never-takers' R status varies across Z

1.3 CACE Estimation Under RER

    TITLE: CACE estimation under RER
    DATA: FILE = ps09jhu.dat;
    VARIABLE:
      NAMES = Z S R shy6 shy0 male health black;
      USEV = Z S R shy6 shy0 male health black;
      CATEGORICAL = S R;
      CLASSES = C(2); !two compliance strata
      MISSING = all (999); !missing values coded as 999
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
      %OVERALL%
      shy6 ON Z shy0-black;
      C#1 ON shy0-black;
      R ON Z shy0-black;
      !R is related to observed information
      %C#1%
      [S$1@-15]; !compliers
      [R$1];
      !R varies across C under the control
      shy6 ON Z;
      R ON Z;
      !compliers' R varies across Z
      %C#2%
      [S$1@15]; !never-takers
      [R$1];
      !R varies across C under the control
      shy6 ON Z@0; !OER
      R ON Z@0;
      !never-takers' R stable across Z (RER)

1.4 CACE Estimation Under SCR

    TITLE: CACE estimation under SCR
    DATA: FILE = ps09jhu.dat;
    VARIABLE:
      NAMES = Z S R shy6 shy0 male health black;
      USEV = Z S R shy6 shy0 male health black;
      CATEGORICAL = S R;
      CLASSES = C(2); !two compliance strata
      MISSING = all (999); !missing values coded as 999
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
      %OVERALL%
      shy6 ON Z shy0-black;
      C#1 ON shy0-black;
      R ON Z shy0-black;
      %C#1%
      [S$1@-15]; !compliers
      [R$1];
      !R varies across C under the control
      shy6 ON Z;
      R ON Z@0;
      !compliers' R stable across Z (SCR)
      %C#2%
      [S$1@15]; !never-takers
      [R$1];
      !R varies across C under the control
      shy6 ON Z@0; !OER
      R ON Z;
      !never-takers' R varies across Z
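The four input files share the outcome and compliance models and differ only in how the indicator R is treated. The sketch below tabulates those differences as read from the files above. The dictionary layout, key names, and helper function are illustrative and are not part of Mplus.

```python
# How each input file treats the indicator R, as read from Appendix 1.
# "equal": held equal across classes by the (1) label; "free": estimated per class;
# "@0": fixed at zero; None: R is not part of the model.
SPECS = {
    "MAR":    {"threshold": None,    "complier_R_on_Z": None,   "never_taker_R_on_Z": None},
    "MAR II": {"threshold": "equal", "complier_R_on_Z": "free", "never_taker_R_on_Z": "free"},
    "RER":    {"threshold": "free",  "complier_R_on_Z": "free", "never_taker_R_on_Z": "@0"},
    "SCR":    {"threshold": "free",  "complier_R_on_Z": "@0",   "never_taker_R_on_Z": "free"},
}


def class_specific_r_parameters(spec: dict) -> int:
    """Count the class-specific R parameters (thresholds plus Z slopes) left free."""
    if spec["threshold"] is None:
        return 0
    n_thresholds = 1 if spec["threshold"] == "equal" else 2
    n_slopes = sum(
        spec[key] == "free" for key in ("complier_R_on_Z", "never_taker_R_on_Z")
    )
    return n_thresholds + n_slopes


for name, spec in SPECS.items():
    print(f"{name:7s} {spec}  free R parameters: {class_specific_r_parameters(spec)}")
```

Each of the three files that model R leaves three class-specific R parameters free. They differ only in which single restriction is imposed: equal thresholds across classes (MAR II), or a zero Z slope for never-takers (RER) or for compliers (SCR).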

Appendix 2. PIRC Example Mplus Output Files (Key Model Parameter Estimates Only)

2.1 CACE Estimation Under MAR

                              Estimate     S.E.   Est./S.E.   P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)                    -0.553    0.229      -2.417     0.016
    SHY0                         0.228    0.052       4.372     0.000
    MALE                         0.223    0.112       1.982     0.047
    HEALTH                       0.305    0.211       1.443     0.149
    BLACK                        0.019    0.162       0.115     0.908
  Intercepts
    SHY6                         2.256    0.253       8.908     0.000
  Residual variances
    SHY6                         0.924    0.083      11.173     0.000
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                      0.000    0.000     999.000   999.000
  Intercepts
    SHY6                         1.431    0.212       6.764     0.000
Logistic regression of C on X
  C#1 ON
    SHY0                        -0.300    0.151      -1.987     0.047
    MALE                         0.135    0.290       0.465     0.642
    HEALTH                      -1.094    0.538      -2.035     0.042
    BLACK                       -1.044    0.412      -2.536     0.011
  Intercepts
    C#1                          1.437    0.499       2.878     0.004
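The `C#1 ON` block is a logistic regression for membership in class 1 (compliers). As a rough reading aid, the sketch below converts the reported intercept and slopes into a predicted complier probability. The covariate profile passed in is hypothetical, and the coding and scaling of the covariates are not given in this section, so the result is illustrative only.

```python
import math

# Estimates copied from the "Logistic regression of C on X" block above.
INTERCEPT = 1.437
SLOPES = {"shy0": -0.300, "male": 0.135, "health": -1.094, "black": -1.044}


def complier_probability(covariates: dict) -> float:
    """Inverse logit of the linear predictor for class 1 (complier) membership."""
    linear = INTERCEPT + sum(
        SLOPES[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical profile with every covariate equal to zero.
baseline = complier_probability({"shy0": 0, "male": 0, "health": 0, "black": 0})
print(f"P(complier | all covariates = 0) = {baseline:.3f}")
```

Negative slopes, such as those for HEALTH and BLACK, lower the predicted probability of being a complier relative to this baseline.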

2.2 CACE Estimation Under MAR II

                              Estimate     S.E.   Est./S.E.   P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)                    -0.553    0.229      -2.417     0.016
    SHY0                         0.228    0.052       4.372     0.000
    MALE                         0.223    0.112       1.982     0.047
    HEALTH                       0.305    0.211       1.443     0.149
    BLACK                        0.019    0.162       0.115     0.908
  Intercepts
    SHY6                         2.256    0.253       8.908     0.000
  Residual variances
    SHY6                         0.924    0.083      11.173     0.000
  R (missing indicator) ON
    Z                            0.952    0.395       2.409     0.016
    SHY0                         0.014    0.123       0.114     0.909
    MALE                        -0.226    0.278      -0.814     0.415
    HEALTH                      -0.212    0.402      -0.528     0.597
    BLACK                        0.781    0.324       2.412     0.016
  Thresholds (MAR: [R$1] (1) in input)
    R$1                         -0.902    0.453      -1.989     0.047
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                      0.000    0.000     999.000   999.000
  Intercepts
    SHY6                         1.431    0.212       6.764     0.000
  R (missing indicator) ON
    Z                            0.298    0.324       0.920     0.357
  Thresholds (MAR: [R$1] (1) in input)
    R$1                         -0.902    0.453      -1.989     0.047

2.3 CACE Estimation Under RER

                              Estimate     S.E.   Est./S.E.   P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)                    -0.586    0.251      -2.332     0.020
    SHY0                         0.227    0.052       4.368     0.000
    MALE                         0.225    0.112       2.004     0.045
    HEALTH                       0.306    0.212       1.441     0.149
    BLACK                        0.016    0.162       0.098     0.922
  Intercepts
    SHY6                         2.291    0.277       8.266     0.000
  Residual variances
    SHY6                         0.920    0.083      11.033     0.000
  R (missing indicator) ON
    Z                            1.171    0.537       2.178     0.029
    SHY0                        -0.003    0.124      -0.025     0.980
    MALE                        -0.219    0.280      -0.780     0.435
    HEALTH                      -0.294    0.417      -0.704     0.481
    BLACK                        0.717    0.343       2.092     0.036
  Thresholds
    R$1                         -0.757    0.525      -1.442     0.149
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                      0.000    0.000     999.000   999.000
  Intercepts
    SHY6                         1.437    0.210       6.848     0.000
  R (missing indicator) ON
    Z (RER)                      0.000    0.000     999.000   999.000
  Thresholds
    R$1                         -1.276    0.548      -2.328     0.020
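To see what the RER restriction implies on the probability scale, the sketch below converts the reported thresholds and Z slopes into P(R = 1) by class and arm, with covariates set to zero. It assumes the logit threshold parameterization P(R = 1) = 1 / (1 + exp(τ − βZ)). The helper name is hypothetical, and the covariate-zero profile is chosen only for simplicity.

```python
import math

# (threshold, slope of R on Z) copied from the RER output above.
RER_ESTIMATES = {
    "complier": (-0.757, 1.171),
    "never-taker": (-1.276, 0.000),  # slope fixed at zero by RER
}


def prob_r_is_one(threshold: float, slope: float, z: int) -> float:
    """P(R = 1 | class, Z = z) with all covariates equal to zero."""
    return 1.0 / (1.0 + math.exp(threshold - slope * z))


for stratum, (tau, beta) in RER_ESTIMATES.items():
    p_control = prob_r_is_one(tau, beta, 0)
    p_treated = prob_r_is_one(tau, beta, 1)
    print(f"{stratum:12s} control={p_control:.3f} treated={p_treated:.3f}")
```

Under this reading, the never-takers' probability is identical in the two arms by construction, while the compliers' probability is free to differ between arms.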

2.4 CACE Estimation Under SCR

                              Estimate     S.E.   Est./S.E.   P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)                    -0.477    0.205      -2.325     0.020
    SHY0                         0.228    0.053       4.332     0.000
    MALE                         0.215    0.113       1.899     0.058
    HEALTH                       0.301    0.210       1.436     0.151
    BLACK                        0.025    0.161       0.155     0.877
  Intercepts
    SHY6                         2.179    0.231       9.423     0.000
  Residual variances
    SHY6                         0.934    0.082      11.380     0.000
  R (missing indicator) ON
    Z (SCR)                      0.000    0.000     999.000   999.000
    SHY0                         0.065    0.128       0.507     0.612
    MALE                        -0.300    0.300      -0.999     0.318
    HEALTH                       0.001    0.443       0.002     0.999
    BLACK                        1.028    0.385       2.670     0.008
  Thresholds
    R$1                         -1.620    0.516      -3.139     0.002
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                      0.000    0.000     999.000   999.000
  Intercepts
    SHY6                         1.419    0.217       6.533     0.000
  R (missing indicator) ON
    Z                            0.866    0.413       2.097     0.036
  Thresholds
    R$1                         -0.021    0.605      -0.035     0.972
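Read together, the four outputs give the range of CACE estimates that the paper proposes as a simple sensitivity analysis. The sketch below collects the reported estimates and standard errors and summarizes them. The Wald-type 95% intervals (estimate ± 1.96 × S.E.) are an added convenience and are not reported in the output.

```python
# CACE estimates (SHY6 ON Z in the complier class) and standard errors,
# copied from the four outputs in Appendix 2.
CACE = {
    "MAR": (-0.553, 0.229),
    "MAR II": (-0.553, 0.229),
    "RER": (-0.586, 0.251),
    "SCR": (-0.477, 0.205),
}


def wald_interval(estimate: float, se: float, z: float = 1.96) -> tuple:
    """Approximate 95% interval: estimate +/- z * standard error."""
    return (estimate - z * se, estimate + z * se)


estimates = {name: est for name, (est, _) in CACE.items()}
lowest = min(estimates, key=estimates.get)
highest = max(estimates, key=estimates.get)

for name, (est, se) in CACE.items():
    lower, upper = wald_interval(est, se)
    print(f"{name:7s} CACE = {est:.3f}  approx. 95% interval ({lower:.3f}, {upper:.3f})")
print(
    f"Range across assumptions: {estimates[lowest]:.3f} ({lowest}) "
    f"to {estimates[highest]:.3f} ({highest})"
)
```

In this example the point estimates run from -0.586 (RER) to -0.477 (SCR), and each approximate interval excludes zero. The MAR and MAR II files give the same CACE estimate and standard error.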

Appendix 3. Artificial Data Analyses

For readers who are interested in hands-on experience, we provide an artificial data set, which can be obtained from the Prevention Science website (www.preventionresearch.org). The same Mplus input files provided in Appendix 1 can be used after changing the data file name (i.e., DATA: FILE = artif.dat;). The results of the artificial data analyses are provided below in Table 2.

Table 2 Artificial data: CACE estimates under different missing data assumptions (standard error in parentheses)

About this article

Cite this article

Jo, B., Ginexi, E.M. & Ialongo, N.S. Handling Missing Data in Randomized Experiments with Noncompliance. Prev Sci 11, 384–396 (2010). https://doi.org/10.1007/s11121-010-0175-4
