AStA Advances in Statistical Analysis, Volume 95, Issue 4, pp 351–373

Efficient ways to impute incomplete panel data

  • Kristian Kleinke
  • Mark Stemmler
  • Jost Reinecke
  • Friedrich Lösel
Original Paper

Abstract

We find that the multiple imputation procedures currently implemented in major statistical packages, and thus available to the vast majority of data analysts, are limited in their handling of incomplete panel data. We review various missing data methods that we deem useful for the analysis of incomplete panel data and discuss how some of the shortcomings of existing procedures can be overcome. In a simulation study based on real panel data, we illustrate the quality of these procedures and outline fruitful avenues for future research.

Keywords

Missing data; Multiple imputation; Panel data; Linear mixed effects models

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Kristian Kleinke (1)
  • Mark Stemmler (2)
  • Jost Reinecke (1)
  • Friedrich Lösel (2, 3)

  1. Faculty of Sociology and Centre for Statistics, University of Bielefeld, Bielefeld, Germany
  2. Institute of Psychology, University of Erlangen-Nuremberg, Nuremberg, Germany
  3. Institute of Criminology, University of Cambridge, Cambridge, UK
