Abstract
Model averaging, specifically information theoretic approaches based on Akaike’s information criterion (IT-AIC approaches), has had a major influence on statistical practices in the field of ecology and evolution. However, a neglected issue is that, in common with most other model-fitting approaches, IT-AIC methods are sensitive to the presence of missing observations. The most common way of handling missing data is complete-case analysis (the deletion from the dataset of every case containing any missing value). It is well known that this results in reduced estimation precision (or reduced statistical power) and biased parameter estimates; however, the implications for model selection have not been explored. Here we employ an example from behavioural ecology to illustrate how missing data can affect the conclusions drawn from model selection or from hypothesis testing. We show how missing observations can be recovered to give accurate estimates for IT-related indices (e.g. AIC and Akaike weight) as well as parameters (and their standard errors) by utilizing ‘multiple imputation’. We use this paper to illustrate key concepts from missing data theory and as a basis for discussing available methods for handling missing data. The example is intended to serve as a practically oriented case study for behavioural ecologists deciding how to handle missing data in their own datasets, and also as a first attempt to consider the problems of conducting model selection and averaging in the presence of missing observations.
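As a rough illustration of two quantities named above, the following Python sketch computes Akaike weights from a set of AIC values and pools a single parameter estimate across imputed datasets using Rubin's rules (Rubin 1987). It is a minimal sketch and not the authors' implementation: the function names `akaike_weights` and `pool_rubin` and all numbers are hypothetical, and the abstract does not say how AIC values themselves are combined across imputations, so that step is left out.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for a candidate model set into Akaike weights."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best model.
    rel_lik = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total_var = within + (1 + 1 / m) * between      # total variance
    return q_bar, math.sqrt(total_var)


if __name__ == "__main__":
    # Hypothetical AIC values for three candidate models.
    print(akaike_weights([210.4, 212.1, 215.9]))
    # Hypothetical slope estimates and squared standard errors from five imputations.
    print(pool_rubin([0.42, 0.39, 0.45, 0.40, 0.44],
                     [0.010, 0.011, 0.009, 0.010, 0.012]))
```

The pooled standard error is larger than the average per-dataset standard error whenever the estimates differ between imputations, which is how multiple imputation carries the uncertainty due to the missing values into the final inference.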
References
Allison PD (2002) Missing data. Sage, Thousand Oaks
Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York
Barton K (2009) MuMIn: multi-model inference. In: R package version 0.12.0. http://r-forge.r-project.org/projects/mumin/
Biro PA, Dingemanse NJ (2009) Sampling bias resulting from animal personality. Trends Ecol Evol 24:66–67
Bolker BM (2008) Ecological models and data in R. Princeton University Press, Princeton
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin
Claeskens G, Hjort NL (2009) Model selection and model averaging. Cambridge University Press, Cambridge
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum, Hillsdale
Collins LM, Schafer JL, Kam CM (2001) A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Meth 6:330–351
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B Methodol 39:1–38
Freckleton RP (2010) Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1045-6
Garamszegi LZ (2010) Information-theoretic approaches to statistical analysis in behavioural ecology: an introduction. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1028-7
Garamszegi LZ, Moller AP, Torok J, Michl G, Peczely P, Richard M (2004) Immune challenge mediates vocal communication in a passerine bird: an experiment. Behav Ecol 15:148–157
Garamszegi LZ, Eens M, Hurtrez-Bousses S, Moller AP (2005) Testosterone, testes size, and mating success in birds: a comparative study. Horm Behav 47:389–409
Garamszegi LZ, Calhim S, Dochtermann N, Hegyi G, Hurd PL, Jorgensen C, Kutsukake N, Lajeunesse MJ, Pollard KA, Schielzeth H, Symonds MRE, Nakagawa S (2009a) Changing philosophies and tools for statistical inferences in behavioral ecology. Behav Ecol 20:1363–1375
Garamszegi LZ, Eens M, Torok J (2009b) Behavioural syndromes and trappability in free-living collared flycatchers, Ficedula albicollis. Anim Behav 77:803–812
Garland T, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge
Graham JW, Schafer JL (1999) On the performance of multiple imputation for multivariate data with small sample size. In: Hoyle R (ed) Statistical strategies for small sample research. Sage, Thousand Oaks
Griffith SC, Owens IPF, Burke T (1999) Environmental determination of a sexually selected trait. Nature 400:358–360
Griffith SC, Owens IPF, Thuman KA (2002) Extra pair paternity in birds: a review of interspecific variation and adaptive function. Mol Ecol 11:2195–2212
Hadfield JD (2008) Estimating evolutionary parameters when viability selection is operating. Proc R Soc B Biol Sci 275:723–734
Hadfield JD, Nakagawa S (2010) General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol 23:494–508
Harrell FEJ, with contributions from many other users (2008) Hmisc: Harrell miscellaneous. In: R package version 3.5-2. http://biostat.mc.vanderbilt.edu/s/Hmisc
Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton
Honaker J, King G (2009) What to do about missing data values in time series cross-section data. http://gking.harvard.edu/files/abs/pr-abs.shtml
Honaker J, King G, Blackwell M (2008) Amelia: Amelia II: a program for missing data. In: R package version 1.1-33. http://gking.harvard.edu/amelia
Horton NJ, Kleinman KP (2007) Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat 61:79–90
Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108
Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87:2626–2635
Little RJA (1988) A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 83:1198–1202
Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
Lukacs PM, Thompson WL, Kendall WL, Gould WR, Doherty PF, Burnham KP, Anderson DR (2007) Concerns regarding a call for pluralism of information theory and hypothesis testing. J Appl Ecol 44:456–460
McKnight PE, McKnight KM, Sidani S, Figueredo AJ (2007) Missing data: a gentle introduction. Guilford, New York
Nakagawa S (2004) A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav Ecol 15:1044–1045
Nakagawa S, Freckleton RP (2008) Missing inaction: the dangers of ignoring missing data. Trends Ecol Evol 23:592–596
Nakagawa S, Waas JR, Miyazaki M (2001) Heart rate changes reveal that little blue penguin chicks (Eudyptula minor) can use vocal signatures to discriminate familiar from unfamiliar chicks. Behav Ecol Sociobiol 50:180–188
Nakagawa S, Gillespie DOS, Hatchwell BJ, Burke T (2007a) Predictable males and unpredictable females: sex difference in repeatability of parental care in a wild bird population. J Evol Biol 20:1674–1681
Nakagawa S, Ockendon N, Gillespie DOS, Hatchwell BJ, Burke T (2007b) Does the badge of status influence parental care and investment in house sparrows? An experimental test. Oecologia 153:749–760
Ockendon N, Griffith SC, Burke T (2009) Extrapair paternity in an insular population of house sparrows after the experimental introduction of individuals from the mainland. Behav Ecol 20:305–312
R Development Core Team (2009) R: a language and environment for statistical computing, 282nd edn. R Foundation for Statistical Computing, Vienna
Richards SA, Whittingham MJ, Stephens PA (2010) Model selection and model averaging in behavioural ecology: the utility of the IT-AIC framework. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1035-8
Rubin DB (1976) Inference and missing data. Biometrika 63:581–590
Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489
Rushton SP, Ormerod SJ, Kerby G (2004) New paradigms for modelling species distributions? J Appl Ecol 41:193–200
Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall, London
Schafer JL (1999) Multiple imputation: a primer. Stat Meth Med Res 8:3–15
Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Meth 7:147–177
Stephens PA, Buskirk SW, Hayward GD, Del Rio CM (2005) Information theory and hypothesis testing: a call for pluralism. J Appl Ecol 42:4–12
Stephens PA, Buskirk SW, del Rio CM (2007) Inference in ecology and evolution. Trends Ecol Evol 22:192–197
Still AW (1992) On the number of subjects used in animal behaviour experiments. Anim Behav 30:873–880
Strimmer K, Rambaut A (2002) Inferring confidence sets of possibly misspecified gene trees. Proc R Soc Lond B Biol Sci 269:137–142
Symonds MRE, Moussalli A (2010) A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1037-6
van Buuren S, Groothuis-Oudshoorn K (2009) mice: Multivariate imputation by chained equations. In: R package version 1.21. http://www.stefvanbuuren.nl
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189
Acknowledgement
We thank Laszlo Garamszegi for the invitation to contribute to this special issue and for comments on a previous version of the manuscript. We are also grateful for the constructive comments of two anonymous referees. SN is grateful to Losia Lagisz and Mihoko Nakagawa for their support during the writing of the manuscript and is supported by the Marsden Fund (UOO0812). RPF is funded by a Royal Society University Research Fellowship.
Additional information
Communicated by L. Garamszegi
This contribution is part of the Special Issue “Model selection, multimodel inference and information-theoretic approaches in behavioural ecology” (see Garamszegi 2010).
Cite this article
Nakagawa, S., Freckleton, R.P. Model averaging, missing data and multiple imputation: a case study for behavioural ecology. Behav Ecol Sociobiol 65, 103–116 (2011). https://doi.org/10.1007/s00265-010-1044-7
DOI: https://doi.org/10.1007/s00265-010-1044-7