Power calculation in multiply imputed data

  • Ruochen Zha
  • Ofer Harel
Regular Article


Multiple imputation (MI) has proven to be an effective procedure for handling incomplete datasets. Compared with complete case analysis (CCA), MI is more efficient because it uses the information in incomplete cases, which CCA simply discards. A few simulation studies have shown that statistical power can be improved when MI is used. However, little is known about how much power can be gained. In this article, we build a general formula to calculate the statistical power when MI is used. Specific formulas are given for several different conditions. We demonstrate our findings through simulation studies and a data example.
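The abstract does not state the general power formula, so the following is only an illustrative sketch and not the authors' derivation. It assumes a two-sided Wald test with a normal reference distribution (MI inference usually uses a t reference distribution, which is ignored here) and pools variances with Rubin's rules, T = W + (1 + 1/m)B. The function names and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def rubin_total_variance(within, between, m):
    """Rubin's rules total variance: T = W + (1 + 1/m) * B."""
    return within + (1.0 + 1.0 / m) * between


def wald_power(effect, variance, alpha=0.05):
    """Approximate power of a two-sided Wald z-test.

    Assumption: the estimator is normal with the given variance; the
    t reference distribution normally used after MI is not modelled.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    shift = effect / variance ** 0.5      # standardized true effect
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


if __name__ == "__main__":
    effect = 0.5                          # hypothetical true effect
    cca_variance = 0.04                   # hypothetical CCA sampling variance
    # Hypothetical MI components: within- and between-imputation variance.
    mi_variance = rubin_total_variance(within=0.025, between=0.004, m=20)
    print("CCA power:", round(wald_power(effect, cca_variance), 3))
    print("MI power: ", round(wald_power(effect, mi_variance), 3))
```

With these made-up inputs the MI total variance is smaller than the CCA variance, so the approximate power is higher under MI. Whether that holds in a real analysis depends on the missingness pattern and the imputation model.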



The data used in this manuscript came from a grant (R01 MH077312) awarded to Dr. Golda Ginsburg by the National Institute of Mental Health (ClinicalTrials.gov identifier: NCT00847561).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. The University of Connecticut, Storrs, USA
