Minimum message length clustering, environmental heterogeneity and the variable Poisson model

Abstract

One possible explanation of variation in vegetation is based on the variable Poisson model. In this model, the occurrence of each species is presumed to follow a Poisson distribution, but the value of the Poisson parameter for any species varies from point to point as a result of environmental variation. At the extreme, the habitat divides into areas favourable to a community and areas that are unfavourable, or at least unoccupied. The spatial area can then be viewed as a series of patches within which each species follows a Poisson distribution, although different patches may have different values of the Poisson parameter for any particular species.
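The consequence of patchiness for observed counts can be illustrated with a small sketch. It uses the standard result that pooling Poisson counts from patches with different rates gives a variance larger than the mean. The function names, patch weights and rates below are illustrative assumptions, not values from the paper.

```python
def mixture_mean_variance(weights, rates):
    """Mean and variance of counts pooled over Poisson patches.

    weights: relative area (or sampling share) of each patch (assumed values)
    rates:   Poisson parameter of the species in each patch (assumed values)
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * r for p, r in zip(probs, rates))
    # Within each patch the variance equals the rate (Poisson), so the pooled
    # variance is the pooled mean plus the between-patch variance of the rates.
    between = sum(p * (r - mean) ** 2 for p, r in zip(probs, rates))
    return mean, mean + between


def dispersion_index(weights, rates):
    """Variance-to-mean ratio; 1 for a single homogeneous Poisson patch."""
    mean, variance = mixture_mean_variance(weights, rates)
    return variance / mean


if __name__ == "__main__":
    # One homogeneous patch versus two equal patches with rates 1 and 9.
    print(dispersion_index([1.0], [5.0]))
    print(dispersion_index([0.5, 0.5], [1.0, 9.0]))
```

A ratio above 1 for the pooled counts is the kind of departure from a single Poisson distribution that the patch interpretation attributes to environmental heterogeneity.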

In this paper, I use a method of fuzzy clustering (mixture modelling) based on the minimum message length principle to examine the variation in the Poisson parameter of individual species. The method uses the difference between the message length for the null, 1-cluster case and the message length for the optimal cluster solution, appropriately normalised, as a measure of the amount of pattern an analysis captures. I also compare the Poisson results with results obtained by assuming that the within-patch distribution is Gaussian. The Poisson alternative consistently captures more pattern than the Gaussian, but at the expense of a much larger number of clusters. Overall, the Gaussian alternative is strongly supported. Other mechanisms that might introduce extra clusters, for example within-cluster correlation or spatial dependency between observations, would presumably apply equally to both models. In the limit, the variable Poisson model converges on the individualistic model of vegetation, and the Gaussian on something like the community unit model. With these data, the individualistic model is strongly rejected. Difficulties in comparing model classes mean that this conclusion must remain tentative.
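The pattern measure described above can be sketched as follows. This is a rough illustration only: the abstract does not state the message length formula or the normalisation, so the code assumes a crude two-part length (data cost at the maximum-likelihood rate, a cost for stating each item's cluster, and a BIC-style 0.5 log n per parameter), hard rather than fuzzy assignment, and normalisation by the 1-cluster length. All function names and the sample counts are hypothetical.

```python
import math


def poisson_nll(counts, rate):
    """Negative log-likelihood (in nits) of counts under Poisson(rate)."""
    return sum(rate - k * math.log(rate) + math.lgamma(k + 1) for k in counts)


def crude_length(groups):
    """Crude two-part length for a hard partition of counts into groups.

    Assumption: a stand-in for a real MML calculation, not the formula
    used in the paper.
    """
    n = sum(len(g) for g in groups)
    n_params = len(groups) + (len(groups) - 1)  # rates plus mixing proportions
    length = 0.5 * n_params * math.log(n)       # cost of stating parameters
    for g in groups:
        rate = max(sum(g) / len(g), 1e-9)       # ML rate, guarded against 0
        length += poisson_nll(g, rate)          # cost of the data
        length += -len(g) * math.log(len(g) / n)  # cost of cluster labels
    return length


def pattern_captured(null_length, best_length):
    """Length saved relative to the 1-cluster case (assumed normalisation)."""
    return (null_length - best_length) / null_length


if __name__ == "__main__":
    sparse = [0, 1, 0, 1, 0, 0, 1, 0]        # hypothetical counts, poor patch
    dense = [9, 11, 10, 12, 8, 10, 9, 11]    # hypothetical counts, good patch
    one_cluster = crude_length([sparse + dense])
    two_clusters = crude_length([sparse, dense])
    print(one_cluster, two_clusters, pattern_captured(one_cluster, two_clusters))
```

A positive value indicates that describing the data with clusters is shorter than describing them with a single distribution; extra clusters are only worthwhile when the saving in data cost outweighs the added parameter and labelling costs.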

Abbreviations

MML: Minimum Message Length

ptp: point-to-point

Author information

Corresponding author

Correspondence to M. B. Dale.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Dale, M.B. Minimum message length clustering, environmental heterogeneity and the variable Poisson model. COMMUNITY ECOLOGY 2, 171–180 (2001). https://doi.org/10.1556/ComEc.2.2001.2.4

Keywords

  • Fuzzy clustering
  • Gaussian distribution
  • Mixture modelling
  • Pattern