Abstract
This paper examines how we might test the continuum theory against the community unit theory. Adherence to one or the other of these models without testing simply assigns an extreme prior probability to the preferred option. The question can be rephrased to ask whether, for a set of observations, a single model is adequate or whether a mixture of models would be preferable. Judging between them involves first defining the nature of the model(s) to be fitted in each case and then comparing their complexity and quality of fit. Occam’s razor suggests that we should seek the simplest model with adequate fit, with parameters estimated with optimal precision. The simplest comparison of the two theories thus requires only the estimation of the number of clusters for the chosen model(s) of within-cluster variation. If a single cluster is of adequate quality then the continuum model is appropriate, while if several are needed then the community model is preferable for that particular dataset. Establishing the universal applicability of either model would require the investigation of many datasets.
There are several ways in which model quality can be assessed, and here I concentrate on the minimal message length principle, under which the length of the message is a function of the prior probability of the model and of its fit to the observed data, assuming the model to be correct. This principle has been shown to perform well when compared with other criteria.
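As a reminder of the general form, the two-part message length can be written as below. The symbols H (the model), D (the observed data) and I (the message length) are notation assumed here for illustration and are not taken from the paper.

```latex
\[
  I(H, D) \;=\; \underbrace{-\log P(H)}_{\text{cost of stating the model}}
          \;+\; \underbrace{-\log P(D \mid H)}_{\text{cost of the data given the model}}
\]
```

The preferred model is the one that minimises I(H, D). In a full minimal message length treatment the first term also reflects the precision to which each parameter is stated, which is where the "optimal precision" of the estimates enters.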
I first illustrate the procedure for making a choice between models, using a simple model, then examine two alternative formulations of within-cluster models which seem more appropriate, one static, the other dynamic.
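The choice between one cluster and several can be sketched in a few lines. This is a minimal illustration only, not the procedure or software used in the paper: it assumes one-dimensional data, Gaussian within-cluster variation, a plain EM fit, and a crude BIC-style two-part code length standing in for a proper minimal message length calculation. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def point_loglik(x, weights, means, variances):
    """Per-component log terms and their log-sum for one observation."""
    logs = [math.log(w) + gaussian_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return logs, top + math.log(sum(math.exp(l - top) for l in logs))


def fit_mixture(data, k, iters=100, floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood in nats."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: cut the sorted data into k equal-sized chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), floor)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        # E-step: posterior membership of each observation in each cluster.
        resp = []
        for x in xs:
            logs, total = point_loglik(x, weights, means, variances)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and variances (floored for stability).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, floor)
    return sum(point_loglik(x, weights, means, variances)[1] for x in xs)


def description_length(data, k):
    """Crude two-part code length: parameter cost plus negative log-likelihood."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return 0.5 * n_params * math.log(len(data)) - fit_mixture(data, k)


def preferred_cluster_count(data, max_k=3):
    """Number of clusters (1..max_k) with the shortest description length."""
    return min(range(1, max_k + 1), key=lambda k: description_length(data, k))


if __name__ == "__main__":
    # Synthetic, evenly spaced normal quantiles: one group, and two separated groups.
    base = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
    for name, sample in [("one group", base),
                         ("two groups", base + [x + 10 for x in base])]:
        print(name, preferred_cluster_count(sample, max_k=2))
```

Under these assumptions, a preferred count of one corresponds to the continuum reading of a dataset and a count greater than one to the community reading. The within-cluster model is the part the paper goes on to vary, so the Gaussian used here should be read as a placeholder.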
Acknowledgments
To Madhur Anand, Laco Mucina, David Goodall, and Rasmus Ejrnæs, my thanks for permission to use their data. They are not responsible for any misuse or misinterpretations. To R. Davis, my thanks for assistance with coupled hidden Markov models.
Rights and permissions
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dale, M.B. Continuum or community: a priori assumption or data-dependent choice? Community Ecology 4, 129–139 (2003). https://doi.org/10.1556/ComEc.4.2003.2.2
DOI: https://doi.org/10.1556/ComEc.4.2003.2.2