Abstract
Although the phytosociological literature contains many measures of similarity, almost all of them apply to data in which each describing attribute takes only a single value. In many cases, however, the attribute values carry a richer structure, arising either directly from the nature of the attributes or from relationships between the stands. In this paper, I first examine a range of possible sources of such structure in phytosociological data, and then propose a similarity measure general enough to apply to all of these variant types. Finally, I present some examples of applying such measures to frequency data from tropical grasslands and to successional data from subtropical rain forest.
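One familiar way to compare attributes whose values are sequences rather than single values is an edit (Levenshtein) distance. An example is the ordered series of states a stand passes through during succession. The sketch below illustrates that general idea only; it is not the measure proposed in this chapter. The unit insertion, deletion and substitution costs, the rescaling to a similarity in [0, 1], and the state labels in the usage lines are all hypothetical choices made for illustration.

```python
def edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Levenshtein-style distance between two sequences of attribute states.

    Returns the minimum total cost of insertions, deletions and substitutions
    needed to turn sequence `a` into sequence `b`. The costs are assumptions.
    """
    m, n = len(a), len(b)
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = [j * indel_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * indel_cost] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub_cost
            curr[j] = min(prev[j] + indel_cost,      # delete a[i-1]
                          curr[j - 1] + indel_cost,  # insert b[j-1]
                          prev[j - 1] + cost)        # match or substitute
        prev = curr
    return prev[n]


def edit_similarity(a, b):
    """Rescale the unit-cost distance to a similarity in [0, 1] (1 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical successional sequences for two stands
stand_1 = ["grass", "shrub", "pioneer_tree", "rainforest"]
stand_2 = ["grass", "pioneer_tree", "rainforest"]
print(edit_distance(stand_1, stand_2))    # 1
print(edit_similarity(stand_1, stand_2))  # 0.75
```

Here the two stands differ only by the missing "shrub" stage, so a single deletion reconciles them. Weighting substitutions by how different two states are would be one way to let the attribute structure enter the measure.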
Copyright information
© 1989 Kluwer Academic Publishers
About this chapter
Cite this chapter
Dale, M.B. (1989). Similarity measures for structured data: a general framework and some applications to vegetation data. In: Mucina, L., Dale, M.B. (eds) Numerical syntaxonomy. Advances in vegetation science, vol 10. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-2432-1_4
DOI: https://doi.org/10.1007/978-94-009-2432-1_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-7597-8
Online ISBN: 978-94-009-2432-1
eBook Packages: Springer Book Archive