Model averaging, missing data and multiple imputation: a case study for behavioural ecology

  • Original Paper
Behavioral Ecology and Sociobiology

Abstract

Model averaging, specifically information-theoretic approaches based on Akaike’s information criterion (IT-AIC approaches), has had a major influence on statistical practice in ecology and evolution. However, a neglected issue is that, in common with most other model-fitting approaches, IT-AIC methods are sensitive to the presence of missing observations. The commonest way of handling missing data is complete-case analysis (deleting from the dataset every case that contains any missing value). It is well known that this results in reduced estimation precision (or reduced statistical power) and in biased parameter estimates; however, the implications for model selection have not been explored. Here we use an example from behavioural ecology to illustrate how missing data can affect the conclusions drawn from model selection or from hypothesis testing. We show how missing observations can be recovered by ‘multiple imputation’ to give accurate estimates of IT-related indices (e.g. AIC and Akaike weight) as well as of parameters (and their standard errors). We use this paper to illustrate key concepts from missing data theory and as a basis for discussing available methods for handling missing data. The example is intended to serve as a practically oriented case study for behavioural ecologists deciding how to handle missing data in their own datasets, and also as a first attempt to consider the problems of conducting model selection and averaging in the presence of missing observations.
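As a rough illustration of two quantities named in the abstract, the sketch below computes Akaike weights from a set of AIC values and pools a parameter estimate across imputed datasets using Rubin's rules. It is a minimal sketch and not the authors' code: the function names `akaike_weights` and `pool_rubin` and all numbers are hypothetical, chosen only for illustration.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: exp(-0.5 * delta_i) normalised to sum to 1,
    where delta_i is each model's AIC minus the smallest AIC."""
    best = min(aic_values)
    rel_lik = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled estimate
    within = sum(variances) / m                     # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total_var = within + (1 + 1 / m) * between      # total variance
    return q_bar, math.sqrt(total_var)


# Hypothetical AIC values for three candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])

# Hypothetical slope estimates and squared SEs from three imputed datasets.
slope, slope_se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(weights, slope, slope_se)
```

The pooled standard error exceeds the average per-dataset standard error because the between-imputation term carries the extra uncertainty due to the missing values.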

References

  • Allison PD (2002) Missing data. Sage, Thousand Oaks

  • Anderson DR (2008) Model based inference in the life sciences: a primer on evidence. Springer, New York

  • Barton K (2009) MuMIn: multi-model inference. In: R package version 0.12.0. http://r-forge.r-project.org/projects/mumin/

  • Biro PA, Dingemanse NJ (2009) Sampling bias resulting from animal personality. Trends Ecol Evol 24:66–67

  • Bolker BM (2008) Ecological models and data in R. Princeton University Press, Princeton

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin

  • Claeskens G, Hjort NL (2009) Model selection and model averaging. Cambridge University Press, Cambridge

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum, Hillsdale

  • Collins LM, Schafer JL, Kam CM (2001) A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Meth 6:330–351

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B Methodol 39:1–38

  • Freckleton RP (2010) Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol doi:10.1007/s00265-010-1045-6

  • Garamszegi LZ (2010) Information-theoretic approaches to statistical analysis in behavioural ecology: an introduction. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1028-7

  • Garamszegi LZ, Moller AP, Torok J, Michl G, Peczely P, Richard M (2004) Immune challenge mediates vocal communication in a passerine bird: an experiment. Behav Ecol 15:148–157

  • Garamszegi LZ, Eens M, Hurtrez-Bousses S, Moller AP (2005) Testosterone, testes size, and mating success in birds: a comparative study. Horm Behav 47:389–409

  • Garamszegi LZ, Calhim S, Dochtermann N, Hegyi G, Hurd PL, Jorgensen C, Kutsukake N, Lajeunesse MJ, Pollard KA, Schielzeth H, Symonds MRE, Nakagawa S (2009a) Changing philosophies and tools for statistical inferences in behavioral ecology. Behav Ecol 20:1363–1375

  • Garamszegi LZ, Eens M, Janos T (2009b) Behavioural syndromes and trappability in free-living collared flycatchers, Ficedula albicollis. Anim Behav 77:803–812

  • Garland T, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364

  • Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge

  • Graham JW, Schafer JL (1999) On the performance of multiple imputation for multivariate data with small sample size. In: Hoyle R (ed) Statistical strategies for small sample research. Sage, Thousand Oaks

  • Griffith SC, Owens IPF, Burke T (1999) Environmental determination of a sexually selected trait. Nature 400:358–360

  • Griffith SC, Owens IPF, Thuman KA (2002) Extra pair paternity in birds: a review of interspecific variation and adaptive function. Mol Ecol 11:2195–2212

  • Hadfield JD (2008) Estimating evolutionary parameters when viability selection is operating. Proc R Soc B Biol Sci 275:723–734

  • Hadfield JD, Nakagawa S (2010) General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol 23:494–508

  • Harrell FEJ, with contributions from many other users (2008) Hmisc: Harrell miscellaneous. In: R package version 3.5-2. http://biostat.mc.vanderbilt.edu/s/Hmisc

  • Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton

  • Honaker J, King G (2009) What to do about missing data values in time series cross-section data. http://gking.harvard.edu/files/abs/pr-abs.shtml

  • Honaker J, King G, Blackwell M (2008) Amelia: Amelia II: a program for missing data. In: R package version 1.1-33. http://gking.harvard.edu/amelia

  • Horton NJ, Kleinman KP (2007) Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat 61:79–90

  • Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108

  • Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87:2626–2635

  • Little RJA (1988) A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 83:1198–1202

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York

  • Lukacs PM, Thompson WL, Kendall WL, Gould WR, Doherty PF, Burnham KP, Anderson DR (2007) Concerns regarding a call for pluralism of information theory and hypothesis testing. J Appl Ecol 44:456–460

  • McKnight PE, McKnight KM, Sidani S, Figueredo AJ (2007) Missing data: a gentle introduction. Guilford, New York

  • Nakagawa S (2004) A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav Ecol 15:1044–1045

  • Nakagawa S, Freckleton RP (2008) Missing inaction: the dangers of ignoring missing data. Trends Ecol Evol 23:592–596

  • Nakagawa S, Waas JR, Miyazaki M (2001) Heart rate changes reveal that little blue penguin chicks (Eudyptula minor) can use vocal signatures to discriminate familiar from unfamiliar chicks. Behav Ecol Sociobiol 50:180–188

  • Nakagawa S, Gillespie DOS, Hatchwell BJ, Burke T (2007a) Predictable males and unpredictable females: sex difference in repeatability of parental care in a wild bird population. J Evol Biol 20:1674–1681

  • Nakagawa S, Ockendon N, Gillespie DOS, Hatchwell BJ, Burke T (2007b) Does the badge of status influence parental care and investment in house sparrows? An experimental test. Oecologia 153:749–760

  • Ockendon N, Griffith SC, Burke T (2009) Extrapair paternity in an insular population of house sparrows after the experimental introduction of individuals from the mainland. Behav Ecol 20:305–312

  • R Development Core Team (2009) R: a language and environment for statistical computing, 282nd edn. R Foundation for Statistical Computing, Vienna

  • Richards SA, Whittingham MJ, Stephens PA (2010) Model selection and model averaging in behavioural ecology: the utility of the IT-AIC framework. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1035-8

  • Rubin DB (1976) Inference and missing data. Biometrika 63:581–590

  • Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York

  • Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489

  • Rushton SP, Ormerod SJ, Kerby G (2004) New paradigms for modelling species distributions? J Appl Ecol 41:193–200

  • Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall, London

  • Schafer JL (1999) Multiple imputation: a primer. Stat Meth Med Res 8:3–15

  • Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Meth 7:147–177

  • Stephens PA, Buskirk SW, Hayward GD, Del Rio CM (2005) Information theory and hypothesis testing: a call for pluralism. J Appl Ecol 42:4–12

  • Stephens PA, Buskirk SW, del Rio CM (2007) Inference in ecology and evolution. Trends Ecol Evol 22:192–197

  • Still AW (1982) On the number of subjects used in animal behaviour experiments. Anim Behav 30:873–880

  • Strimmer K, Rambaut A (2002) Inferring confidence sets of possibly misspecified gene trees. Proc R Soc Lond B Biol Sci 269:137–142

  • Symonds MRE, Moussalli A (2010) A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1037-6

  • van Buuren S, Groothuis-Oudshoorn K (2009) mice: Multivariate imputation by chained equations. In: R package version 1.21. http://www.stefvanbuuren.nl

  • Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189

Acknowledgement

We thank Laszlo Garamszegi for the invitation to contribute to this special issue and for comments on a previous version of the manuscript. We are also grateful for the constructive comments of two anonymous referees. SN is grateful to Losia Lagisz and Mihoko Nakagawa for their support during the writing of the manuscript and is supported by the Marsden Fund (UOO0812). RPF is funded by a Royal Society University Research Fellowship.

Author information

Corresponding author

Correspondence to Shinichi Nakagawa.

Additional information

Communicated by L. Garamszegi

This contribution is part of the Special Issue “Model selection, multimodel inference and information-theoretic approaches in behavioural ecology” (see Garamszegi 2010).

Electronic supplementary material

Below is the link to the electronic supplementary material.

S1 (DOC 253 kb)

S2 (ZIP 26.9 kb)

About this article

Cite this article

Nakagawa, S., Freckleton, R.P. Model averaging, missing data and multiple imputation: a case study for behavioural ecology. Behav Ecol Sociobiol 65, 103–116 (2011). https://doi.org/10.1007/s00265-010-1044-7

  • DOI: https://doi.org/10.1007/s00265-010-1044-7
