Balancing Statistics and Ecology: Lumping Experimental Data for Model Selection

  • Nelly van der Hoeven
  • Lia Hemerik
  • Patrick A. Jansen


Ecological experiments often accumulate data through many replicate trials, each containing a limited number of observations, which are then pooled and analysed in search of a pattern. Replicating trials may be the only way to obtain sufficient data, yet lumping disregards the possibility that differences in experimental conditions between trials influence the overall pattern. This paper discusses how to deal with this dilemma in model selection. Three methods of model selection are introduced: likelihood-ratio testing, the Akaike Information Criterion (AIC) with or without small-sample correction (AICc), and the Bayesian Information Criterion (BIC). Subsequently, we apply the AICc method to an example of size-dependent seed dispersal by scatterhoarding rodents.
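For orientation, the three selection methods named above can be written down compactly in their standard textbook forms. The following is a minimal sketch, not code from the paper; the function names and the example numbers are hypothetical. Here `log_lik` is the maximised log-likelihood, `k` the number of fitted parameters and `n` the number of observations.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion: -2 log L + k log n."""
    return -2.0 * log_lik + k * math.log(n)


def lr_statistic(log_lik_simple, log_lik_extended):
    """Likelihood-ratio statistic for two nested models.

    Under the simpler model it is asymptotically chi-square distributed,
    with degrees of freedom equal to the number of extra parameters.
    """
    return 2.0 * (log_lik_extended - log_lik_simple)


# Hypothetical numbers, for illustration only.
example_aic = aic(-50.0, 2)
example_aicc = aicc(-50.0, 2, 40)
example_bic = bic(-50.0, 2, 40)
```

The correction term in AICc shrinks towards zero as `n` grows relative to `k`, so AIC and AICc agree for large samples.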

The example involves binary data on the selection and removal of Carapa procera (Meliaceae) seeds by scatterhoarding rodents in replicate trials during years of different ambient seed abundance. The question is whether there is an optimum size for seeds to be removed and dispersed by the rodents. We fit five models, varying from no effect of seed mass to an optimum seed mass. We show that lumping the data produces the expected pattern but gives a poor fit compared to analyses in which grouping levels are taken into account. Three methods of grouping were used: a fixed parameter value per group; a randomly drawn parameter value per group; and some parameters fixed per group with others constant across all groups. Model fitting with some parameters constant across all groups and others depending on the trial gives the best fit. The general pattern is, however, rather weak.
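The contrast between lumping and grouping can be illustrated with the simplest possible case: a constant removal probability, fitted either once to the pooled data or separately per trial. This is only a sketch under stated assumptions. The counts are invented, and the constant-probability model stands in for the five seed-mass models of the paper, which are not reproduced here.

```python
import math


def binom_loglik(removed, offered, p):
    """Binomial log-likelihood of `removed` out of `offered` seeds.

    The binomial coefficient is omitted; it is identical for both
    models below, so it cancels in the AICc comparison.
    """
    ll = 0.0
    if removed > 0:
        ll += removed * math.log(p)
    if offered > removed:
        ll += (offered - removed) * math.log(1.0 - p)
    return ll


def aicc(log_lik, k, n):
    """AIC with small-sample correction."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical (removed, offered) counts for four replicate trials.
trials = [(12, 20), (5, 20), (16, 20), (9, 20)]
n_obs = sum(offered for _, offered in trials)

# Lumped analysis: one removal probability shared by all trials.
p_lumped = sum(removed for removed, _ in trials) / n_obs
ll_lumped = sum(binom_loglik(r, m, p_lumped) for r, m in trials)

# Grouped analysis: a separate fixed probability for each trial.
ll_grouped = sum(binom_loglik(r, m, r / m) for r, m in trials)

aicc_lumped = aicc(ll_lumped, 1, n_obs)
aicc_grouped = aicc(ll_grouped, len(trials), n_obs)
print("AICc lumped:", round(aicc_lumped, 2))
print("AICc per trial:", round(aicc_grouped, 2))
```

With these invented counts the trials differ strongly, so the per-trial fit has the lower AICc despite its three extra parameters. With more homogeneous trials the penalty would favour the lumped fit.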

We explore how far models must differ for discrimination between them to be possible, using the minimum Kullback-Leibler distance as a measure of the difference. We then show by simulation that the differences are too small to discriminate at all between the five models tested at the level of replicate trials.
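For binary data, a minimum Kullback-Leibler distance of this kind can be computed directly. The sketch below assumes a hypothetical "true" model with an optimum seed mass (a logistic curve with a quadratic term, with invented coefficients and seed masses) and finds, by grid search, the constant-probability model closest to it. None of these values come from the paper.

```python
import math


def bernoulli_kl(p, q):
    """Kullback-Leibler distance from Bernoulli(p) (true) to Bernoulli(q)."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def optimum_model(mass, a=-0.5, b=0.08, c=-0.001):
    """Hypothetical removal probability with an optimum at mass = 40."""
    eta = a + b * mass + c * mass ** 2
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical seed masses at which seeds are offered.
masses = [10, 20, 30, 40, 50, 60, 70]
true_p = [optimum_model(m) for m in masses]


def total_kl(q):
    """Summed distance from the true model to a constant probability q."""
    return sum(bernoulli_kl(p, q) for p in true_p)


# Minimise over the single parameter of the simpler model by grid search.
grid = [i / 1000.0 for i in range(1, 1000)]
q_best = min(grid, key=total_kl)
min_kl = total_kl(q_best)
print("closest constant probability:", q_best)
print("minimum Kullback-Leibler distance:", min_kl)
```

For a constant-probability model the minimiser is simply the mean of the true probabilities; the grid search is used here only because it carries over to simpler models without a closed-form minimiser.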

We recommend a combined approach in which the level of lumping trials is chosen by the amount of variation explained in comparison to an analysis at the trial level. It is shown that combining data from different trials only leads to an increase in the probability of identifying the correct model with the AIC if the distance of all simpler (i.e. less extended) models to the simulated model is sufficiently large in each trial. Otherwise, increasing the number of replicate trials might even lead to a decrease in the power of the AIC.
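The last point can be illustrated with a small simulation. This is a rough sketch under strong assumptions, not the simulation reported in the paper: each trial is assumed to contribute one extra parameter to the extended model, and the per-trial likelihood-ratio statistic is approximated by a non-central chi-square variable with one degree of freedom, whose noncentrality stands in for the distance between the models. The function name and all numerical values are hypothetical.

```python
import random


def aic_power(n_trials, noncentrality, n_sim=2000, seed=1):
    """Fraction of simulations in which AIC prefers the extended model.

    AIC prefers the extended model when the summed likelihood-ratio
    statistic exceeds 2 per extra parameter, i.e. 2 * n_trials here.
    """
    rng = random.Random(seed)
    shift = noncentrality ** 0.5
    threshold = 2.0 * n_trials
    hits = 0
    for _ in range(n_sim):
        # Non-central chi-square with 1 df: (Z + sqrt(noncentrality))^2.
        stat = sum((rng.gauss(0.0, 1.0) + shift) ** 2
                   for _ in range(n_trials))
        if stat > threshold:
            hits += 1
    return hits / n_sim


for lam in (0.3, 3.0):
    for t in (1, 5, 20):
        print("noncentrality", lam, "trials", t, "power", aic_power(t, lam))
```

Under these assumptions the expected statistic per trial is one plus the noncentrality, while the AIC penalty per trial is two. Adding trials therefore helps when the per-trial noncentrality is well above one and hurts when it is well below one, which mirrors the pattern described above.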


AIC; Carapa procera; Kullback-Leibler distance; likelihood-ratio test; model selection; Myoprocta acouchy; non-central chi-square distribution; power; Red acouchy; scatterhoarding; seed dispersal; seed size





Copyright information

© Springer 2005

Authors and Affiliations

  • Nelly van der Hoeven (1)
  • Lia Hemerik (2)
  • Patrick A. Jansen (3)
  1. Department of Theoretical Biology, Leiden University, Leiden
  2. Department of Mathematical and Statistical Methods, Wageningen University, Wageningen
  3. Forest Ecology and Forest Management Group, Wageningen University, Wageningen
