Analysis of Missing Data

  • John W. Graham
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


In this chapter, I first present older methods for handling missing data and then turn to the major new approaches, all of which make the missing at random (MAR) assumption. These include the EM algorithm for covariance matrices, normal-model multiple imputation (MI), and what I will refer to as full information maximum likelihood (FIML) methods. Before getting to these methods, however, I discuss the goals of analysis.
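As a rough illustration of how results from multiple imputation are usually combined, the sketch below applies the standard combining rules from Rubin (1987) to a set of per-imputation estimates. It is a minimal sketch, not the chapter's own code: the function name `pool_rubin` and the example numbers are hypothetical, and it assumes each imputed data set has already been analyzed to give one point estimate and one standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool m per-imputation estimates and standard errors (Rubin's rules).

    Returns the pooled point estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: simple average over the m imputed data sets.
    q_bar = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    u_bar = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance (m - 1 denominator).
    b = variance(estimates)
    # Total variance adds the between-imputation part, inflated by 1 + 1/m.
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total ** 0.5


# Hypothetical regression coefficients and standard errors from m = 5 imputations.
coef, se = pool_rubin([0.42, 0.39, 0.45, 0.40, 0.44],
                      [0.10, 0.11, 0.10, 0.12, 0.10])
print(f"pooled estimate = {coef:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error is larger than a typical single-imputation standard error whenever the estimates differ across imputations, which is how the procedure reflects the uncertainty due to missing data.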


Keywords: Auxiliary Variable; Full Information Maximum Likelihood; Complete Case Analysis; Imputation Model; Data Augmentation



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • John W. Graham, Department of Biobehavioral Health, The Pennsylvania State University, University Park, USA
