Behavioral Ecology and Sociobiology, Volume 65, Issue 1, pp 91–101

Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error

Original Paper

Abstract

There has been a great deal of recent discussion of the practice of regression analysis (or more generally, linear modelling) in behaviour and ecology. In this paper, I highlight two factors that have been under-considered: collinearity and measurement error in predictors. I also consider what happens when both occur at the same time. I examine the consequences for conventional regression analysis (ordinary least squares, OLS) as well as for model averaging methods, typified by information theoretic approaches based around Akaike’s information criterion. Collinearity causes variance inflation of estimated slopes in OLS analysis, as is well known. In the presence of collinearity, model averaging reduces this variance for predictors with weak effects, but can also lead to parameter bias. When collinearity is strong or when all predictors have strong effects, model averaging relies heavily on the full model including all predictors, and hence its results and those of OLS are essentially the same. I highlight that it is not safe to simply eliminate collinear variables without due consideration of their likely independent effects, as this can lead to biases. Measurement error is also considered, and I show that it can lead to extreme biases when predictors are collinear, have strong effects, but differ in their degree of measurement error. I highlight techniques for diagnosing and dealing with these problems. These results reinforce that automated model selection techniques should not be relied on in the analysis of complex multivariable datasets.
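The two OLS effects described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the analysis used in the paper: two standard-normal predictors with correlation r = 0.9, true slopes of 1 for both, unit residual variance, and measurement error (standard deviation 1) added to the first predictor only. All function names and parameter values are hypothetical choices made for illustration.

```python
import random


def vif_two_predictors(r):
    """Variance inflation factor for either of two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ x1 + x2, solving the 2x2 normal equations on centred data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=20000, r=0.9, error_sd=1.0, seed=1):
    """Return OLS slopes fitted with error-free x1, then with error-prone x1."""
    rng = random.Random(seed)
    x1_true, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = r * z1 + (1 - r * r) ** 0.5 * z2  # corr(a, b) = r by construction
        x1_true.append(a)
        x2.append(b)
        y.append(1.0 * a + 1.0 * b + rng.gauss(0, 1))  # assumed true slopes: 1 and 1
    # Assumption: only the first predictor is observed with measurement error.
    x1_obs = [a + rng.gauss(0, error_sd) for a in x1_true]
    return ols_two_predictors(x1_true, x2, y), ols_two_predictors(x1_obs, x2, y)


if __name__ == "__main__":
    clean, noisy = simulate()
    print("VIF at r = 0.9:", round(vif_two_predictors(0.9), 2))
    print("slopes, x1 measured without error:", clean)
    print("slopes, x1 measured with error:   ", noisy)
```

Under these assumed settings, solving the population normal equations gives large-sample slopes of about 0.16 for the error-prone predictor and about 1.76 for its collinear partner, even though both true slopes are 1. The effect of the noisy predictor is attenuated and the missing effect is transferred to the predictor measured without error, which is the kind of bias the abstract warns about.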

Keywords

Regression; Model selection; Information theory

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
