
Behavioral Ecology and Sociobiology, Volume 65, Issue 1, pp 103–116

Model averaging, missing data and multiple imputation: a case study for behavioural ecology

  • Shinichi Nakagawa
  • Robert P. Freckleton
Original Paper

Abstract

Model averaging, specifically information theoretic approaches based on Akaike’s information criterion (IT-AIC approaches), has had a major influence on statistical practices in the field of ecology and evolution. However, a neglected issue is that, in common with most other model-fitting approaches, IT-AIC methods are sensitive to the presence of missing observations. The commonest way of handling missing data is complete-case analysis (the complete deletion from the dataset of cases containing any missing values). It is well known that this results in reduced estimation precision (or reduced statistical power) and biased parameter estimates; however, the implications for model selection have not been explored. Here we employ an example from behavioural ecology to illustrate how missing data can affect the conclusions drawn from model selection or from hypothesis testing. We show how missing observations can be recovered to give accurate estimates for IT-related indices (e.g. AIC and Akaike weight) as well as parameters (and their standard errors) by utilizing ‘multiple imputation’. We use this paper to illustrate key concepts from missing data theory and as a basis for discussing available methods for handling missing data. The example is intended to serve as a practically oriented case study for behavioural ecologists deciding how to handle missing data in their own datasets, and also as a first attempt to consider the problems of conducting model selection and averaging in the presence of missing observations.
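For readers unfamiliar with the two quantities named above, the following is a minimal sketch, not taken from the paper, of how Akaike weights are computed from a set of AIC values and how a parameter estimate and its standard error can be pooled across imputed datasets with the standard Rubin's rules. The function names and all numbers are hypothetical and chosen only for illustration; how the authors themselves combine AIC-based indices across imputations is not stated in this abstract.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores for a candidate model set into Akaike weights."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best model.
    rel_lik = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(variances) / m                                  # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total_var)


# Hypothetical AIC values for three candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])

# Hypothetical slope estimates and squared standard errors from m = 3 imputations.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what complete-case analysis and single imputation leave out: it inflates the pooled standard error to reflect uncertainty about the missing values themselves.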

Keywords

Data augmentation, Data deletion, Estimation bias, The rate of missing information, Expectation maximization, QAIC, EPP, MCMC, House sparrows

Notes

Acknowledgement

We thank Laszlo Garamszegi for the invitation to contribute to this special issue and for comments on a previous version of the manuscript. We are also grateful for the constructive comments of two anonymous referees. SN is grateful to Losia Lagisz and Mihoko Nakagawa for their support during the manuscript writing and is supported by the Marsden Fund (UOO0812). RPF is funded by a Royal Society University Research Fellowship.

Supplementary material

265_2010_1044_MOESM1_ESM.doc (254 kb)
S1 (DOC 253 kb)
265_2010_1044_MOESM2_ESM.zip (27 kb)
S2 (ZIP 26.9 kb)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Department of Zoology, University of Otago, Dunedin, New Zealand
  2. Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
