Abstract
Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both have not been routinely used in data analysis practice in the prevention research field. This paper shows that identification and estimation of causal treatment effects in the presence of both noncompliance and missing outcomes can be conducted relatively easily under various missing data assumptions. We review several assumptions on missing data in the presence of noncompliance, including latent ignorability proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As a simple form of sensitivity analysis, we propose the use of alternative missing data assumptions, which provides a range of causal effect estimates. In this way, we are less likely to settle for a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. Data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) are used as an example.
References
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
Bloom, H. S. (1984). Accounting for no-shows in experimental evaluation designs. Evaluation Review, 8, 225–246.
Dunn, G., Maracy, M., Dowrick, C., Ayuso-Mateos, J. L., Dalgard, O. S., Page, H., et al. (2003). Estimating psychological treatment effects from a randomized controlled trial with both non-compliance and loss to follow-up. British Journal of Psychiatry, 183, 323–331.
Emsley, R., Dunn, G., & White, I. R. (2010). Mediation and moderation of treatment effects in randomised controlled trials of complex interventions. Statistical Methods in Medical Research. doi:10.1177/0962280209105014.
Frangakis, C. E. & Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86, 365–379.
Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.
Frangakis, C. E., Rubin, D. B., & Zhou, X. H. (2002). Clustered encouragement design with individual noncompliance: Bayesian inference and application to advance directive forms. Biostatistics, 3, 147–164.
Hirano, K., Imbens, G. W., Rubin, D. B., & Zhou, X. H. (2000). Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics, 1, 69–88.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Ialongo, N. S., Werthamer, L., Kellam, S. G., Brown, C. H., Wang, S., & Lin, Y. (1999). Proximal impact of two first-grade preventive interventions on the early risk behaviors for later substance abuse, depression and antisocial behavior. American Journal of Community Psychology, 27, 599–642.
Imbens, G. W. & Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with non-compliance. Annals of Statistics, 25, 305–327.
Jo, B. (2002a). Statistical power in randomized intervention studies with noncompliance. Psychological Methods, 7, 178–193.
Jo, B. (2002b). Estimating intervention effects with noncompliance: Alternative model specifications. Journal of Educational and Behavioral Statistics, 27, 385–420.
Jo, B. (2008a). Bias mechanisms in intention-to-treat analysis with data subject to treatment noncompliance and missing outcomes. Journal of Educational and Behavioral Statistics, 33, 158–185.
Jo, B. (2008b). Causal inference in randomized experiments with mediational processes. Psychological Methods, 13, 314–336.
Jo, B., Asparouhov, T., Muthén, B. O., Ialongo, N. S., & Brown, C. H. (2008). Cluster randomized trials with treatment noncompliance. Psychological Methods, 13, 1–18.
Jo, B., & Vinokur, A. (2010). Sensitivity analysis and bounding of causal effects with alternative identifying assumptions. Journal of Educational and Behavioral Statistics, in press.
Kellam, S. G., Branch, J. D., Agrawal, K. C., & Ensminger, M. E. (1975). Mental health and going to school: The Woodlawn program of assessment, early intervention, and evaluation. Chicago: University of Chicago Press.
Little, R. J. A. & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.
Little, R. J. A., & Yau, L. (1998). Statistical techniques for analyzing data from prevention trials: Treatment of no-shows using Rubin’s causal model. Psychological Methods, 3, 147–159.
Mattei, A., & Mealli, F. (2007). Application of the principal stratification approach to the Faenza randomized experiment on breast self-examination. Biometrics, 63, 437–446.
Mealli, F., Imbens, G. W., Ferro, S., & Biggeri, A. (2004). Analyzing a randomized trial on breast self-examination with noncompliance and missing outcomes. Biostatistics, 5, 207–222.
Muthén, L. K., & Muthén, B. O. (1998–2009). Mplus user’s guide. Los Angeles: Muthén & Muthén.
Neyman, J. (1923). On the application of probability theory to agricultural experiments. Section 9 translated in Statistical Science, 5, 465–480 (1990).
O’Malley, A. J., & Normand, S. L. T. (2004). Likelihood methods for treatment noncompliance and subsequent nonresponse in randomized trials. Biometrics, 61, 325–334.
Peng, Y., Little, R. J., & Raghunathan, T. E. (2004). An extended general location model for causal inferences from data subject to noncompliance and missing values. Biometrics, 60, 598–607.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of “randomization analysis of experimental data in the Fisher randomization test” by D. Basu. Journal of the American Statistical Association, 75, 591–593.
Rubin, D. B. (1990). Comment on “Neyman (1923) and causal inference in experiments and observational studies.” Statistical Science, 5, 472–480.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: CRC.
Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate: Causal inference in the face of interference. Journal of the American Statistical Association, 101, 1398–1407.
Stuart, E. A., Perry, D. F., Le, H.-N., & Ialongo, N. S. (2008). Estimating intervention effects of prevention programs: Accounting for noncompliance. Prevention Science, 9, 288–298.
Werthamer-Larsson, L., Kellam, S. G., & Wheeler, L. (1991). Effect of first-grade classroom environment on child shy behavior, aggressive behavior, and concentration problems. American Journal of Community Psychology, 19, 585–602.
Additional information
We appreciate helpful feedback from the Prevention Science Methodology Group. The study of the first author was supported by MH066319 and MH066247 from the National Institute of Mental Health. The work of the second author was conducted while she was a postdoctoral student at the George Washington University. Elizabeth M. Ginexi is now at the National Institute on Drug Abuse, Bethesda, MD.
Appendices
Appendix 1. PIRC Example Mplus Input Files
1.1 CACE Estimation Under MAR
TITLE: CACE estimation under MAR
DATA: FILE = ps09jhu.dat;
VARIABLE:
  NAMES = Z S R shy6 shy0 male health black;
  USEV = Z S shy6 shy0 male health black;
  !USEOBS = (shy6 NE 999);
  CATEGORICAL = S;
  !binary compliance indicator S (0/1, missing=999)
  CLASSES = C(2); !two compliance strata
  MISSING = all (999); !missing values coded as 999
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  shy6 ON Z shy0-black;
  !shy6 regressed on randomization Z and covariates
  C#1 ON shy0-black;
  !compliance class C regressed on covariates
  %C#1%
  [S$1@-15]; !compliers
  shy6 ON Z;
  !compliers' outcome varies across Z
  %C#2%
  [S$1@15]; !never-takers
  shy6 ON Z@0;
  !never-takers' outcome is stable across Z (OER)
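The thresholds fixed at -15 and 15 in the class-specific parts of the input tie the observed compliance indicator S almost deterministically to the latent class. The following is a minimal Python sketch of that arithmetic, assuming the usual logit parameterization for a binary item, P(S = 1) = 1 / (1 + exp(threshold)). The function name `prob_indicator_is_one` is hypothetical and is not part of Mplus.

```python
import math


def prob_indicator_is_one(threshold: float) -> float:
    """P(S = 1) for a binary item with the given logit threshold.

    Assumes P(S = 1) = 1 / (1 + exp(threshold)), with no covariate effects on S.
    """
    return 1.0 / (1.0 + math.exp(threshold))


# [S$1@-15] in %C#1%: compliers receive the treatment almost surely.
p_complier = prob_indicator_is_one(-15.0)
# [S$1@15] in %C#2%: never-takers almost surely do not.
p_never_taker = prob_indicator_is_one(15.0)

print(f"P(S=1 | complier)    = {p_complier:.7f}")
print(f"P(S=1 | never-taker) = {p_never_taker:.7f}")
```

Under this assumption, any pair of large thresholds with opposite signs would serve the same purpose. The value 15 only needs to be large enough that the two probabilities are numerically indistinguishable from 1 and 0.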
1.2 CACE Estimation Under MAR II
TITLE: CACE estimation under MAR II
DATA: FILE = ps09jhu.dat;
VARIABLE:
  NAMES = Z S R shy6 shy0 male health black;
  USEV = Z S R shy6 shy0 male health black;
  CATEGORICAL = S R;
  !binary missing indicator R for shy6
  CLASSES = C(2); !two compliance strata
  MISSING = all (999); !missing values coded as 999
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  shy6 ON Z shy0-black;
  C#1 ON shy0-black;
  R ON Z shy0-black;
  !R is related to observed information
  %C#1%
  [S$1@-15]; !compliers
  [R$1] (1);
  !R stable across C under the control (MAR)
  shy6 ON Z;
  R ON Z;
  !compliers' R status varies across Z
  %C#2%
  [S$1@15]; !never-takers
  [R$1] (1);
  !R stable across C under the control (MAR)
  shy6 ON Z@0; !OER
  R ON Z;
  !never-takers' R status varies across Z
1.3 CACE Estimation Under RER
TITLE: CACE estimation under RER
DATA: FILE = ps09jhu.dat;
VARIABLE:
  NAMES = Z S R shy6 shy0 male health black;
  USEV = Z S R shy6 shy0 male health black;
  CATEGORICAL = S R;
  CLASSES = C(2); !two compliance strata
  MISSING = all (999); !missing values coded as 999
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  shy6 ON Z shy0-black;
  C#1 ON shy0-black;
  R ON Z shy0-black;
  !R is related to observed information
  %C#1%
  [S$1@-15]; !compliers
  [R$1];
  !R varies across C under the control
  shy6 ON Z;
  R ON Z;
  !compliers' R varies across Z
  %C#2%
  [S$1@15]; !never-takers
  [R$1];
  !R varies across C under the control
  shy6 ON Z@0; !OER
  R ON Z@0;
  !never-takers' R stable across Z (RER)
1.4 CACE Estimation Under SCR
TITLE: CACE estimation under SCR
DATA: FILE = ps09jhu.dat;
VARIABLE:
  NAMES = Z S R shy6 shy0 male health black;
  USEV = Z S R shy6 shy0 male health black;
  CATEGORICAL = S R;
  CLASSES = C(2); !two compliance strata
  MISSING = all (999); !missing values coded as 999
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  shy6 ON Z shy0-black;
  C#1 ON shy0-black;
  R ON Z shy0-black;
  %C#1%
  [S$1@-15]; !compliers
  [R$1];
  !R varies across C under the control
  shy6 ON Z;
  R ON Z@0;
  !compliers' R stable across Z (SCR)
  %C#2%
  [S$1@15]; !never-takers
  [R$1];
  !R varies across C under the control
  shy6 ON Z@0; !OER
  R ON Z;
  !never-takers' R varies across Z
Appendix 2. PIRC Example Mplus Output Files (Key Model Parameter Estimates Only)
2.1 CACE Estimation Under MAR
                        Estimate    S.E.  Est./S.E.  P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)              -0.553   0.229     -2.417    0.016
    SHY0                   0.228   0.052      4.372    0.000
    MALE                   0.223   0.112      1.982    0.047
    HEALTH                 0.305   0.211      1.443    0.149
    BLACK                  0.019   0.162      0.115    0.908
  Intercepts
    SHY6                   2.256   0.253      8.908    0.000
  Residual variances
    SHY6                   0.924   0.083     11.173    0.000
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                0.000   0.000    999.000  999.000
  Intercepts
    SHY6                   1.431   0.212      6.764    0.000
Logistic regression of C on X
  C#1 ON
    SHY0                  -0.300   0.151     -1.987    0.047
    MALE                   0.135   0.290      0.465    0.642
    HEALTH                -1.094   0.538     -2.035    0.042
    BLACK                 -1.044   0.412     -2.536    0.011
  Intercepts
    C#1                    1.437   0.499      2.878    0.004
2.2 CACE Estimation Under MAR II
                        Estimate    S.E.  Est./S.E.  P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)              -0.553   0.229     -2.417    0.016
    SHY0                   0.228   0.052      4.372    0.000
    MALE                   0.223   0.112      1.982    0.047
    HEALTH                 0.305   0.211      1.443    0.149
    BLACK                  0.019   0.162      0.115    0.908
  Intercepts
    SHY6                   2.256   0.253      8.908    0.000
  Residual variances
    SHY6                   0.924   0.083     11.173    0.000
  R (missing indicator) ON
    Z                      0.952   0.395      2.409    0.016
    SHY0                   0.014   0.123      0.114    0.909
    MALE                  -0.226   0.278     -0.814    0.415
    HEALTH                -0.212   0.402     -0.528    0.597
    BLACK                  0.781   0.324      2.412    0.016
  Thresholds (MAR: [R$1] (1) in input)
    R$1                   -0.902   0.453     -1.989    0.047
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                0.000   0.000    999.000  999.000
  Intercepts
    SHY6                   1.431   0.212      6.764    0.000
  R (missing indicator) ON
    Z                      0.298   0.324      0.920    0.357
  Thresholds (MAR: [R$1] (1) in input)
    R$1                   -0.902   0.453     -1.989    0.047
2.3 CACE Estimation Under RER
                        Estimate    S.E.  Est./S.E.  P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)              -0.586   0.251     -2.332    0.020
    SHY0                   0.227   0.052      4.368    0.000
    MALE                   0.225   0.112      2.004    0.045
    HEALTH                 0.306   0.212      1.441    0.149
    BLACK                  0.016   0.162      0.098    0.922
  Intercepts
    SHY6                   2.291   0.277      8.266    0.000
  Residual variances
    SHY6                   0.920   0.083     11.033    0.000
  R (missing indicator) ON
    Z                      1.171   0.537      2.178    0.029
    SHY0                  -0.003   0.124     -0.025    0.980
    MALE                  -0.219   0.280     -0.780    0.435
    HEALTH                -0.294   0.417     -0.704    0.481
    BLACK                  0.717   0.343      2.092    0.036
  Thresholds
    R$1                   -0.757   0.525     -1.442    0.149
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                0.000   0.000    999.000  999.000
  Intercepts
    SHY6                   1.437   0.210      6.848    0.000
  R (missing indicator) ON
    Z (RER)                0.000   0.000    999.000  999.000
  Thresholds
    R$1                   -1.276   0.548     -2.328    0.020
2.4 CACE Estimation Under SCR
                        Estimate    S.E.  Est./S.E.  P-Value
Latent class 1 (complier)
  SHY6 ON
    Z (CACE)              -0.477   0.205     -2.325    0.020
    SHY0                   0.228   0.053      4.332    0.000
    MALE                   0.215   0.113      1.899    0.058
    HEALTH                 0.301   0.210      1.436    0.151
    BLACK                  0.025   0.161      0.155    0.877
  Intercepts
    SHY6                   2.179   0.231      9.423    0.000
  Residual variances
    SHY6                   0.934   0.082     11.380    0.000
  R (missing indicator) ON
    Z (SCR)                0.000   0.000    999.000  999.000
    SHY0                   0.065   0.128      0.507    0.612
    MALE                  -0.300   0.300     -0.999    0.318
    HEALTH                 0.001   0.443      0.002    0.999
    BLACK                  1.028   0.385      2.670    0.008
  Thresholds
    R$1                   -1.620   0.516     -3.139    0.002
Latent class 2 (never-taker)
  SHY6 ON
    Z (OER)                0.000   0.000    999.000  999.000
  Intercepts
    SHY6                   1.419   0.217      6.533    0.000
  R (missing indicator) ON
    Z                      0.866   0.413      2.097    0.036
  Thresholds
    R$1                   -0.021   0.605     -0.035    0.972
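The sensitivity analysis idea can be made concrete by collecting the CACE estimates (SHY6 ON Z in the complier class) from the four outputs in this appendix and reporting their range. The Python sketch below only re-tabulates the estimates and standard errors printed above. The variable names are hypothetical, and it is a rough illustration rather than a formal bounding procedure.

```python
# CACE estimate and standard error under each missing data assumption,
# copied from the Mplus output in Sections 2.1 to 2.4.
cace = {
    "MAR": (-0.553, 0.229),
    "MAR II": (-0.553, 0.229),
    "RER": (-0.586, 0.251),
    "SCR": (-0.477, 0.205),
}

estimates = {name: est for name, (est, _se) in cace.items()}

# Assumptions giving the most negative and least negative effect estimates.
lowest = min(estimates, key=estimates.get)
highest = max(estimates, key=estimates.get)
spread = estimates[highest] - estimates[lowest]

for name, (est, se) in cace.items():
    print(f"{name:7s} CACE = {est:6.3f} (S.E. = {se:.3f})")
print(f"Range: {estimates[lowest]:.3f} ({lowest}) to "
      f"{estimates[highest]:.3f} ({highest}); width = {spread:.3f}")
```

Reading the four estimates side by side shows how much the conclusion depends on the missing data assumption, which is the point of reporting a range rather than a single value.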
Appendix 3. Artificial Data Analyses
For readers who want hands-on experience, we provide an artificial data set, which can be obtained from the Prevention Science website (www.preventionresearch.org). The Mplus input files provided in Appendix 1 can be used as they are after changing the data file name (i.e., DATA: FILE = artif.dat;). The results of the artificial data analyses are provided below in Table 2.
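For readers who keep the Appendix 1 inputs as text files, the file name change can also be scripted. The following Python sketch is one way to do it, assuming each input contains exactly one DATA: FILE statement ending in a semicolon. The helper name `use_data_file` is hypothetical and is not part of Mplus.

```python
import re


def use_data_file(mplus_input: str, new_file: str) -> str:
    """Return the Mplus input text with the DATA: FILE name replaced."""
    # Match "DATA: FILE = <name>;" and keep everything up to the file name.
    pattern = re.compile(r"(DATA:\s*FILE\s*=\s*)[^;]+;", flags=re.IGNORECASE)
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_file + ";", mplus_input
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one DATA: FILE statement, found {count}"
        )
    return updated


example = "TITLE: CACE estimation under MAR\nDATA: FILE = ps09jhu.dat;\n"
print(use_data_file(example, "artif.dat"))
```

The rest of the input (VARIABLE, ANALYSIS, and MODEL commands) is left untouched, so the same model specification is applied to the artificial data.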
Cite this article
Jo, B., Ginexi, E.M. & Ialongo, N.S. Handling Missing Data in Randomized Experiments with Noncompliance. Prev Sci 11, 384–396 (2010). https://doi.org/10.1007/s11121-010-0175-4