Similarity measures for structured data: a general framework and some applications to vegetation data

Chapter
Numerical syntaxonomy

Part of the book series: Advances in vegetation science (AIVS, volume 10)

Abstract

Although many measures of similarity exist in the phytosociological literature, almost all of them apply to data in which each describing attribute takes only a single value. In many cases, however, the attribute values can have a richer structure, arising either directly from the nature of the attributes or from relationships between the stands. In this paper, I first examine a range of possible sources of such structure in phytosociological data, and then propose a similarity measure general enough to apply to all the variant types. Finally, I present some examples of applying such measures to frequency data from tropical grasslands and to successional data from subtropical rain forest.
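The abstract does not spell out the general measure, so the following is only a hedged illustration of the underlying idea: different attribute types, including structured ones, each get their own comparator, and the per-attribute similarities are combined into one stand-to-stand value. The Gower-style averaging rule, the comparator names (`match_similarity`, `range_similarity`, `sequence_similarity`, `stand_similarity`), the normalised Levenshtein edit distance used for sequence-valued (e.g. successional) attributes, and the example stands are all assumptions made for this sketch; it is not the measure proposed in the chapter.

```python
from typing import Callable, Dict, Sequence

# Hypothetical per-attribute comparators; each returns a similarity in [0, 1].


def match_similarity(a: str, b: str) -> float:
    """Single-valued nominal attribute: 1 if the states agree, else 0."""
    return 1.0 if a == b else 0.0


def range_similarity(a: float, b: float, value_range: float) -> float:
    """Single-valued quantitative attribute scaled by its range (assumes value_range > 0)."""
    return 1.0 - abs(a - b) / value_range


def levenshtein(s: Sequence[str], t: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, si in enumerate(s, start=1):
        curr = [i]
        for j, tj in enumerate(t, start=1):
            cost = 0 if si == tj else 1
            curr.append(min(prev[j] + 1,          # delete si
                            curr[j - 1] + 1,      # insert tj
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sequence_similarity(s: Sequence[str], t: Sequence[str]) -> float:
    """Structured (sequence-valued) attribute: edit distance rescaled to [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s, t) / longest


def stand_similarity(x: Dict, y: Dict, comparators: Dict[str, Callable]) -> float:
    """Average the per-attribute similarities over attributes recorded in both stands."""
    shared = [k for k in comparators if k in x and k in y]
    if not shared:
        raise ValueError("the stands share no attributes")
    return sum(comparators[k](x[k], y[k]) for k in shared) / len(shared)


# Hypothetical stands: a nominal attribute, a quantitative one (range assumed 40 m),
# and a successional sequence of dominant taxa coded as placeholder labels.
comparators = {
    "soil": match_similarity,
    "height": lambda a, b: range_similarity(a, b, 40.0),
    "succession": sequence_similarity,
}
stand_a = {"soil": "basalt", "height": 25.0, "succession": ["A", "B", "C"]}
stand_b = {"soil": "basalt", "height": 30.0, "succession": ["A", "C"]}
print(round(stand_similarity(stand_a, stand_b, comparators), 4))  # 0.8472
```

Here the sequence attribute contributes 1 - 1/3 because one deletion turns `["A", "B", "C"]` into `["A", "C"]`; a single-valued treatment of the same data would have to discard that ordering. Swapping in a different comparator for another kind of structure leaves the combining step unchanged, which is the design point the sketch is meant to show.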

Editor information

L. Mucina, M. B. Dale

Copyright information

© 1989 Kluwer Academic Publishers

About this chapter

Cite this chapter

Dale, M.B. (1989). Similarity measures for structured data: a general framework and some applications to vegetation data. In: Mucina, L., Dale, M.B. (eds) Numerical syntaxonomy. Advances in vegetation science, vol 10. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-2432-1_4

  • DOI: https://doi.org/10.1007/978-94-009-2432-1_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-7597-8

  • Online ISBN: 978-94-009-2432-1

  • eBook Packages: Springer Book Archive